Upgrade ConcurrentHadronizerFilter and ConcurrentGeneratorFilter to support concurrent runs #39374
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39374/32077
A new Pull Request was created by @wddgit (W. David Dagenhart) for master. It involves the following packages:
@SiewYan, @mkirsano, @Saptaparna, @cmsbuild, @alberto-sanchez, @menglu21, @GurpreetSinghChahal can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
@wddgit Could you add
I modified the title. Thanks. That is better.
mutable std::atomic<unsigned long long> greatestNStreamEndLumis_{0};
mutable std::atomic<bool> streamEndRunComplete_{true};
// The next two data members are thread safe and can be safely mutable because
// they are only modified/read in globalBeginRun and globalBeginLuminosityBlock.
Can we (in principle) have globalBeginRun and globalBeginLuminosityBlock transitions running simultaneously? (Obviously the lumi would have to belong to a different run.) Or two globalBeginRuns, or two globalBeginLuminosityBlocks?
This is an interesting question. I would be interested in hearing Chris's (@Dr15Jones) opinion about it.
The existing code in CMSSW serializes globalBeginLumi transitions (and all run transitions). The globalBeginLumi transition completes before the Framework even looks to see what transition should come next. The run concurrency PR currently under review continues that pattern for globalBeginRun. globalBeginLumi and globalBeginRun are serialized like input source activity. They will not run concurrently. It would not be trivial to change this. It would take some effort just to identify the things that would break if we changed this behavior. I think it is built into the current Framework implementation in multiple places.
On the other hand, the TWIKI that documents the multithreading design does not require this serialization: https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkTransitions
Maybe we should add this requirement to this TWIKI documentation?
If we want to make this code support that kind of concurrency in addition to adding support for run concurrency then I need to do more work on this PR (and in multiple other places probably in separate PRs later).
My personal guess is that the performance improvement this additional concurrency would allow is not worth the effort that would be required to implement it.
The global run/lumi transitions need to be serialized as the EventSetup system doesn't support concurrent determination of new IOVs.
Thanks. I think we should document that the global run/lumi transitions are serialized (I didn't read the twiki David pointed to carefully enough to say whether it would be implied by the text there already or not)
I added this additional requirement on global begin transitions to the TWIKI (in the bullet points under the figure; the next-to-last one is new). Feel free to edit it or let me know if you want me to reword it.
https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkTransitions
Thanks David, looks good.
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-33da22/27486/summary.html Comparison Summary
while (useInLumi_.load() == nullptr) {
}
How about a comment here explaining why the extra wait?
Done. Thanks.
useInLumi_.store(nullptr);
gen::GenStreamCache<HAD, DEC>* streamCachePtr = this->streamCache(id);
if (this->luminosityBlockCache(lumi.index())->useInLumi_ != streamCachePtr) {
  initLumi(streamCachePtr, lumi, es);
Just for clarity, should we have an else which sets the lumi cache useInLumi_ to nullptr? It isn't strictly needed, as the lumi cache will go away at the end of the luminosity block, but it might make the logic easier to understand and avoid possible uses of that cache later.
On second thought, resetting the lumi cache useInLumi_ here would require the variable to become an atomic, which really isn't worth it.
This transition will run multiple times (once per stream), possibly concurrently. useInLumi_ in the lumi cache is not atomic. Reading it here is OK, but writing would create a data race. I could make it atomic, but that does not seem worth the added overhead. I'm not sure whether your proposed extra code clarifies the logic or just leads the reader to wonder why it was set to null. As is, this lumi cache value is written in one line of code and read in only this one line of code.
I should have refreshed my window. I didn't see your second comment before my reply.
}

auto lumiCache = std::make_shared<gen::GenLumiCache<HAD, DEC>>();
lumiCache->useInLumi_ = useInLumi_.load();
If globalBeginLuminosityBlockProduce uses the lumi cache's useInLumi_, we could reset the member variable useInLumi_ here, which might make the logic easier to understand.
It is critically important that streamBeginLuminosityBlock uses the value from the lumi cache. That is the main reason I created the lumi cache, and using it only for that purpose emphasizes that requirement.
For globalBeginLuminosityBlockProduce, we could use either value (the module data member or the lumi cache member). To me it seems easier to use the class data member because it avoids the extra code to get the value out of the lumi cache, and globalBeginLuminosityBlock and globalBeginLuminosityBlockProduce run one immediately after the other. I don't feel strongly about it, though, so let me know if you want me to change it.
// The next two data members are thread safe and can be safely mutable because
// they are only modified/read in globalBeginRun and globalBeginLuminosityBlock.
mutable unsigned long long nGlobalBeginRuns_{0};
mutable unsigned long long nInitializedInGlobalLumi_{0};
Is this actually nInitializedInGlobalLumiAfterNewRun_?
You're right. That is a more accurate name. I will change it.
@@ -407,6 +449,9 @@ namespace edm {
  if (rCache->product_.compare_exchange_strong(expect, griproduct.get())) {
    griproduct.release();
  }
  if (cache == useInLumi_.load()) {
If useInLumi_ is set to nullptr in globalBeginLuminosityBlockProduce, will this ever happen?
Yes. streamEndLuminosityBlockSummary selects the stream cache to use in globalBeginLumi and that occurs before streamEndRun. So it will happen on the stream with the selected stream cache.
@cms-sw/generators-l2 Kind ping.
I ran runTheMatrix.py with all the tests (not just the limited ones) and threading enabled. The tests all passed (except for 3 known and unrelated failures). That test also included PR #38801 and PR #39491. The run concurrency pull request is now only waiting on approval of these 3 PRs. We are hoping to get it merged into 12_6_X soon so we have time to gain experience with it before 12_6_0 is finalized.
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
please test
-1 Failed Tests: RelVals-THREADING
Comparison Summary
Workflow failures are those that should get fixed with #39500.
please test with #39500 |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-33da22/27803/summary.html Comparison Summary
+1
PR description:
Upgrade the code in this module to support concurrent runs. When support for concurrent runs is added to the Framework in the near future, it will become possible for streamBeginRun and streamEndRun to run concurrently with other transitions. There is special code in this module which would be broken by this additional concurrency.
This should not affect the output of this module or its externally visible behavior. Results should be the same before and after this is merged.
@Dr15Jones and @makortel Could you take a look at this also? This is more a technical change related to Framework support of concurrency than a generator issue.
PR validation:
Given that this does not change the output or behavior of this module, I am relying on existing tests. The unit tests pass. I also ran existing tests under the debugger and with print statements manually added to verify things were working as expected.
If there are any additional tests that generator experts have available, it would be great if someone could run them to verify that I didn't inadvertently break something.