NVProfilerService: improve the handling of modules and events #25749
The code-checks are being triggered in jenkins.
@cmsbuild, please test
@makortel FYI
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25749/8147
The tests are being triggered in jenkins.
A new Pull Request was created by @fwyzard (Andrea Bocci) for master. It involves the following packages: HeterogeneousCore/CUDAServices. @cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks. cms-bot commands are listed here
@makortel please review
Comparison job queued.
Comparison is ready Comparison Summary:
// there is a possible race condition among different threads processing different events;
// however, cudaProfilerStart() is supposed to be thread-safe and ignore multiple calls, so this should not be an issue.
if (std::all_of(streamFirstEventDone_.begin(), streamFirstEventDone_.end(), [](bool x){ return x; })) {
  globalFirstEventDone_ = true;
We discussed a bit with @Dr15Jones, and we think that streamFirstEventDone_ and globalFirstEventDone_ should be changed to use std::atomic<bool>. The vector<bool> is especially dangerous because it is internally a bit pattern, so setting one bit in the current thread may race with, and invalidate, an operation on a different bit of the same word in another thread.
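A minimal sketch of the suggested change, assuming one flag per stream and one thread writing each flag; the class `FirstEventTracker` and the helper `markAllConcurrently` are hypothetical illustrations, not the actual NVProfilerService code. Since `std::atomic<bool>` is neither copyable nor movable, the vector is constructed with its final size instead of being grown with `resize()` or `push_back()`, and every element is a separate object, so writes to different flags cannot clobber each other the way neighbouring bits of a `std::vector<bool>` can.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-stream flags discussed above.
class FirstEventTracker {
public:
  // std::atomic<bool> is neither copyable nor movable, so the vector is built
  // with its final size; each element is value-initialised to false.
  explicit FirstEventTracker(std::size_t nStreams) : streamFirstEventDone_(nStreams) {}

  // Called by the thread processing stream `sid`; it writes only its own flag,
  // which is a distinct object (unlike a single bit inside std::vector<bool>).
  void markFirstEventDone(std::size_t sid) {
    assert(sid < streamFirstEventDone_.size());
    streamFirstEventDone_[sid].store(true);
  }

  // Safe to call from any thread, since every load is atomic.
  bool allStreamsDone() const {
    return std::all_of(streamFirstEventDone_.begin(), streamFirstEventDone_.end(),
                       [](std::atomic<bool> const& x) { return x.load(); });
  }

private:
  std::vector<std::atomic<bool>> streamFirstEventDone_;
};

// Demo helper: one thread per stream marks its own flag concurrently.
void markAllConcurrently(FirstEventTracker& tracker, std::size_t nStreams) {
  std::vector<std::thread> threads;
  for (std::size_t sid = 0; sid < nStreams; ++sid)
    threads.emplace_back([&tracker, sid] { tracker.markFirstEventDone(sid); });
  for (auto& t : threads)
    t.join();
}
```

The existing lambda `[](bool x){ return x; }` would also compile against atomic elements through the implicit conversion to `bool`; the explicit `load()` just makes the atomic read visible at the call site.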
@makortel @Dr15Jones thanks for the suggestion.
Do the changes to the member variables (lines 296-297) and to the initialisation (lines 541-543) look correct?
Looks good to me, thanks. You could also use globalFirstEventDone_.compare_exchange_strong() here to call cudaProfilerStart() exactly once (that would also let you remove the comments above).
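A sketch of the `compare_exchange_strong()` suggestion, assuming a hypothetical `start` callback in place of `cudaProfilerStart()` so it builds without CUDA; `StartOnce` and `runConcurrently` are illustrative names only. Exactly one caller flips the flag from `false` to `true` and runs the callback; every other caller sees the exchange fail, so correctness no longer relies on `cudaProfilerStart()` ignoring repeated calls.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: runs `start` at most once, however many threads call maybeStart().
class StartOnce {
public:
  explicit StartOnce(std::function<void()> start) : start_(std::move(start)) {}

  // Returns true only in the single call that actually invoked `start`.
  bool maybeStart() {
    bool expected = false;
    // Atomically: if the flag is still false, set it to true and report success.
    if (globalFirstEventDone_.compare_exchange_strong(expected, true)) {
      start_();  // cudaProfilerStart() would go here in the real service
      return true;
    }
    // On failure, `expected` was overwritten with the current value of the flag.
    assert(expected);
    return false;
  }

  bool started() const { return globalFirstEventDone_.load(); }

private:
  std::function<void()> start_;
  std::atomic<bool> globalFirstEventDone_{false};
};

// Demo helper: launches nThreads threads that all call maybeStart();
// returns how many of them actually started.
int runConcurrently(StartOnce& once, int nThreads) {
  std::atomic<int> successes{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < nThreads; ++i)
    threads.emplace_back([&once, &successes] {
      if (once.maybeStart())
        ++successes;
    });
  for (auto& t : threads)
    t.join();
  return successes.load();
}
```

Note that the flag becomes `true` before the callback returns, so another thread observing `started()` should not assume the profiler is already running.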
Thanks for the suggestion.
I have also slightly changed the way I initialise the vector, as this "feels" better, though I expect the end result to be the same.
@fwyzard Could you add "in NVProfilerService" to the end of the PR title?
Branch updated from ef2f6cc to 341da25.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25749/8160
Pull request #25749 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.
@cmsbuild, please test
The tests are being triggered in jenkins.
Comparison job queued.
Comparison is ready Comparison Summary:
+1
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)
@fabiocos if 10.5.0-pre1 is not tagged yet, could you include this?
+1 @fwyzard, so you may use pre1 as the new base for further integration.
thank you
Take into account the proper number of modules: add one to pathsAndConsumes.allModules().size(), because it does not include the source module.
Add asserts to check that all ranges are properly closed.
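A sketch of both changes under explicit assumptions: the hypothetical `ModuleRangeTracker` takes `nModules` as a stand-in for `pathsAndConsumes.allModules().size()`, reserves one extra slot for the source, and assumes that module IDs (source included) fall in `[0, nModules]`; the actual CMSSW ID layout is not shown here. It counts open ranges per module and asserts, in the spirit of the checks described above, that every range has been closed. It is not thread-safe; a multi-threaded job would need per-stream instances or atomic counters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-module bookkeeping. `nModules` stands in for
// pathsAndConsumes.allModules().size(), which does not count the source,
// so one extra slot is reserved for it.
class ModuleRangeTracker {
public:
  explicit ModuleRangeTracker(std::size_t nModules) : open_(nModules + 1, 0) {}

  std::size_t size() const { return open_.size(); }

  // Assumed: module IDs, including the source, lie in [0, size()).
  void push(std::size_t moduleId) {
    assert(moduleId < open_.size());
    ++open_[moduleId];
  }

  void pop(std::size_t moduleId) {
    assert(moduleId < open_.size());
    assert(open_[moduleId] > 0 && "closing a range that was never opened");
    --open_[moduleId];
  }

  // True if every range opened so far has been closed.
  bool allClosed() const {
    for (std::size_t n : open_)
      if (n != 0)
        return false;
    return true;
  }

  // Intended to be called at the end of the job.
  void checkAllClosed() const { assert(allClosed() && "some ranges were left open"); }

private:
  std::vector<std::size_t> open_;  // number of currently open ranges per module
};
```

Sizing the table as `nModules` alone would make an access for the last ID out of range, which is the kind of off-by-one the extra slot avoids under the assumed ID layout.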
Optionally, delay starting the profiler until after the first event on each stream has completed. This requires running nvprof with the --profile-from-start off option.