Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize CUDAScopedContext #355

Conversation

makortel
Copy link

PR description:

This PR reorganizes CUDAScopedContext in two ways

  1. Split CUDAScopedContext to CUDAScopedContextAcquire and CUDAScopedContextProduce
    • Currently CUDAScopedContext has a mixture of possible operations that is based on how it was constructed. This is rather error prone, e.g. I remember forgetting several times to pass edm::WaitingTaskWithArenaHolder to it leading to weird crashes. Now CUDAScopedContextAcquire always requires the holder.
  2. Rename CUDAContextToken to CUDAContextState, and instead of explicitly saving the state of CUDAScopedContextAcquire into it, pass CUDAContextState to the constructor of CUDAScopedContextAcquire so that the latter can save the state on its destructor.
    • This way is much safer. The pattern is somewhat similar to std::mutex (as member variable) and locks (local variable)

PR validation:

Profiling workflow runs.

Motivation is that the acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.
Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).
@fwyzard
Copy link

fwyzard commented Jun 18, 2019

Validation summary

Reference release CMSSW_10_6_0 at b45186e
Development branch CMSSW_10_6_X_Patatrack at 828f536
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/4110ca453e58dc6903ff55dd22d58424d8352f4e/log .

@fwyzard
Copy link

fwyzard commented Jun 20, 2019

No changes in physics performance, as expected,

@fwyzard
Copy link

fwyzard commented Jun 20, 2019

No changes in performance observed on the T4:

  • reference: 981.2 ± 6.1 ev/s
  • changed: 982.8 ± 6.8 ev/s

Copy link

@fwyzard fwyzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing which is slightly confusing is the use of CUDAScopedContextProduce in an EDAnalyzer::analyze() method.

But I don't have a better name to suggest, so I'm fine with the changes.

@fwyzard fwyzard merged commit 957e184 into cms-patatrack:CMSSW_10_6_X_Patatrack Jun 20, 2019
@makortel
Copy link
Author

The only thing which is slightly confusing is the use of CUDAScopedContextProduce in an EDAnalyzer::analyze() method.

But I don't have a better name to suggest, so I'm fine with the changes.

I agree that it's a bit confusing. I could add CUDAScopedContextAnalyze that would differ from CUDAScopedContextProduce by not having the wrap() nor emplace(). If you think that would be useful, I can add it in a follow-up PR that introduces a pattern for a chain of CPU tasks and GPU "tasks" within a module.

@fwyzard
Copy link

fwyzard commented Jun 20, 2019

Yes, I think that would make things even clear, thank you.

fwyzard pushed a commit that referenced this pull request Oct 8, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Oct 19, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Oct 23, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard added a commit that referenced this pull request Nov 27, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Dec 25, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
* Split CUDAScopedContext to *Acquire and *Produce

The motivation is that acquire() and produce() need a different
functionality, and are constructed differently (e.g. acquire version
always needs the edm::WaitingTaskWithArenaHolder). This split should
make it more difficult to make mistakes. It should also make future
evolution, e.g. towards chains of TBB tasks alternating in CPU and GPU
work, easier.

* Rename CUDAContextToken to CUDAContextState, and change semantics

Now CUDAScopedContextAcquire takes it as a parameter to constructor,
and stores the state in its destructor (yielding RAII semantics).

* Document the constructors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants