
Upgrade ConcurrentHadronizerFilter and ConcurrentGeneratorFilter to support concurrent runs #39374

Merged: 3 commits, Sep 28, 2022

Conversation

@wddgit (Contributor) commented Sep 12, 2022

PR description:

Upgrade the code in this module to support concurrent runs. When support for concurrent runs is added to the Framework in the near future, it will become possible for streamBeginRun and streamEndRun to run concurrently with other transitions. There is special code in this module which would be broken by this additional concurrency.
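
For illustration, here is a minimal hypothetical sketch of the kind of pattern this implies: a bookkeeping value updated from stream transitions has to become a std::atomic once those transitions may overlap. Only the member name mirrors the diff quoted later in this conversation; the struct, helper, and main() are invented for the example.

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

// Hypothetical sketch, not the actual ConcurrentGeneratorFilter code: a
// bookkeeping member updated from stream-end transitions that, with concurrent
// runs enabled, may execute in parallel with each other and with run transitions.
struct StreamEndBookkeeping {
  std::atomic<unsigned long long> greatestNStreamEndLumis_{0};

  void update(unsigned long long n) {
    unsigned long long current = greatestNStreamEndLumis_.load();
    // A plain "if (n > current) member = n;" on a non-atomic member would be a
    // data race once these transitions overlap; the compare-exchange loop keeps
    // the "store the largest value seen so far" update well defined.
    while (n > current && !greatestNStreamEndLumis_.compare_exchange_weak(current, n)) {
    }
  }
};

int main() {
  StreamEndBookkeeping bookkeeping;
  std::vector<std::thread> streams;
  for (unsigned long long s = 1; s <= 4; ++s) {
    streams.emplace_back([&bookkeeping, s] { bookkeeping.update(s * 10); });
  }
  for (auto& t : streams) {
    t.join();
  }
  std::cout << bookkeeping.greatestNStreamEndLumis_.load() << '\n';  // prints 40
}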

This should not affect the output of this module or its externally visible behavior. Results should be the same before and after this is merged.

@Dr15Jones and @makortel Could you take a look at this also? This is more a technical change related to Framework support of concurrency than a generator issue.

PR validation:

Given that this does not change the output or behavior of this module, I am relying on existing tests. The unit tests pass. I also ran existing tests under the debugger and with print statements manually added to verify things were working as expected.

If there are any additional tests that generator experts have available, it would be great if someone could run them to verify that I didn't inadvertently break something.

@cmsbuild (Contributor):

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39374/32077

  • This PR adds an extra 20KB to the repository

@cmsbuild (Contributor):

A new Pull Request was created by @wddgit (W. David Dagenhart) for master.

It involves the following packages:

  • GeneratorInterface/Core (generators)

@SiewYan, @mkirsano, @Saptaparna, @cmsbuild, @alberto-sanchez, @menglu21, @GurpreetSinghChahal can you please review it and eventually sign? Thanks.
@alberto-sanchez, @mkirsano this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release managers for this.

cms-bot commands are listed here

@wddgit (Contributor, Author) commented Sep 12, 2022

please test

@makortel (Contributor):

@wddgit Could you add ConcurrentHadronizerFilter and ConcurrentGeneratorFilter to the PR title?

@wddgit changed the title from "Upgrade to support concurrent runs" to "Upgrade ConcurrentHadronizerFilter and ConcurrentGeneratorFilter to support concurrent runs" on Sep 12, 2022
@wddgit (Contributor, Author) commented Sep 12, 2022

I modified the title. Thanks. That is better.

mutable std::atomic<unsigned long long> greatestNStreamEndLumis_{0};
mutable std::atomic<bool> streamEndRunComplete_{true};
// The next two data members are thread safe and can be safely mutable because
// they are only modified/read in globalBeginRun and globalBeginLuminosityBlock.
Contributor:

Can we (in principle) have globalBeginRun and globalBeginLuminosityBlock transitions running simultaneously (obviously the lumi would have to belong to a different run)? Or two globalBeginRuns, or two globalBeginLuminosityBlocks?

Contributor (Author):

This is an interesting question. I would be interested in hearing Chris's (@Dr15Jones) opinion about it.

The existing code in CMSSW serializes globalBeginLumi transitions (and all run transitions). The globalBeginLumi transition completes before the Framework even looks to see what transition should come next. The run concurrency PR currently under review continues that pattern for globalBeginRun. globalBeginLumi and globalBeginRun are serialized like input source activity. They will not run concurrently. It would not be trivial to change this. It would take some effort just to identify the things that would break if we changed this behavior. I think it is built into the current Framework implementation in multiple places.

On the other hand, the TWIKI that documents the multithreading design does not require this serialization: https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkTransitions
Maybe we should add this requirement to this TWIKI documentation?

If we want this code to support that kind of concurrency, in addition to adding support for run concurrency, then I need to do more work on this PR (and probably in multiple other places, in separate PRs later).

My personal guess is that the performance improvement this additional concurrency would allow is not worth the effort that would be required to implement it.
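
As a rough illustration of what "serialized like input source activity" means here, the sketch below is hypothetical and is not the framework's actual task-queue implementation: transitions submitted from any thread are executed strictly one at a time, so two global begin bodies never overlap even in a multi-threaded job.

#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Rough illustration only (not the actual edm implementation): global begin
// transitions submitted from any thread run strictly one at a time.
class SerialTransitionQueue {
public:
  void submit(std::function<void()> transition) {
    std::unique_lock<std::mutex> lock(mutex_);
    queue_.push(std::move(transition));
    if (running_) {
      return;  // the thread currently draining the queue will run it
    }
    running_ = true;
    while (!queue_.empty()) {
      auto task = std::move(queue_.front());
      queue_.pop();
      lock.unlock();
      task();  // only one transition body executes at any moment
      lock.lock();
    }
    running_ = false;
  }

private:
  std::mutex mutex_;
  std::queue<std::function<void()>> queue_;
  bool running_ = false;
};

int main() {
  SerialTransitionQueue globalQueue;
  std::vector<std::thread> threads;
  for (int run = 1; run <= 3; ++run) {
    threads.emplace_back([&globalQueue, run] {
      globalQueue.submit([run] { std::cout << "globalBeginRun " << run << '\n'; });
    });
  }
  for (auto& t : threads) {
    t.join();
  }
}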

Contributor:

The global run/lumi transitions need to be serialized as the EventSetup system doesn't support concurrent determination of new IOVs.

Contributor:

Thanks. I think we should document that the global run/lumi transitions are serialized (I didn't read the twiki David pointed to carefully enough to say whether that is already implied by the text there).

Contributor (Author):

I added this additional requirement on global begin transitions to the TWIKI (in the bullet points under the figure; the next-to-last one is new). Feel free to edit it, or let me know if you want me to reword it.

https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkTransitions

Contributor:

Thanks David, looks good.

@cmsbuild (Contributor):

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-33da22/27486/summary.html
COMMIT: a8f007e
CMSSW: CMSSW_12_6_X_2022-09-12-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39374/27486/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3618326
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3618296
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 212 log files, 49 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

while (useInLumi_.load() == nullptr) {
}

Contributor:

How about a comment here explaining why the extra wait?

Contributor (Author):

Done. Thanks.

useInLumi_.store(nullptr);
gen::GenStreamCache<HAD, DEC>* streamCachePtr = this->streamCache(id);
// Initialize this stream's hadronizer for the new lumi only if its cache is not
// the one already initialized for this lumi (recorded in the lumi cache).
if (this->luminosityBlockCache(lumi.index())->useInLumi_ != streamCachePtr) {
initLumi(streamCachePtr, lumi, es);
Contributor:

Just for clarity, should we have an else which sets the lumi cache useInLumi_ to nullptr? It isn't strictly needed, as the lumi cache will go away at the end of the luminosity block, but it might make the logic easier to understand and avoid possible uses of that cache later.

Contributor:

On second thought, resetting the lumi cache useInLumi_ here would require the variable to become an atomic, which really isn't worth it.

Contributor (Author):

This transition will run multiple times (once per stream), possibly concurrently. useInLumi_ in the lumi cache is not atomic. Only reading it here is OK, but writing would create a data race. I could make it atomic, but it does not seem worth adding the overhead of an atomic. I'm also not sure whether your proposed extra code clarifies things or just leads the reader to wonder why it was set to null. As is, this lumi cache value is written in one line of code and read in only this one line of code.
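
A tiny hypothetical example of the read/write distinction being made here (the type and variable names are illustrative, not the real cache classes): concurrent reads of a plain pointer are well defined as long as the only write happens before the readers start, whereas a write from the concurrently running stream transitions would be a data race unless the member became atomic.

#include <thread>
#include <vector>

// Illustrative only; this is not the real stream cache type.
struct StreamCache {};

int main() {
  StreamCache selected;
  StreamCache* useInLumi = &selected;  // the single write happens before any concurrent access

  std::vector<std::thread> streams;
  for (int s = 0; s < 4; ++s) {
    streams.emplace_back([&useInLumi, &selected] {
      // Concurrent reads of a plain pointer are fine as long as nobody writes.
      bool alreadyInitialized = (useInLumi == &selected);
      (void)alreadyInitialized;
      // useInLumi = nullptr;  // a write here from several streams would be a data race
    });
  }
  for (auto& t : streams) {
    t.join();
  }
}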

Contributor (Author):

I should have refreshed my window. I didn't see your second comment before my reply.

}

auto lumiCache = std::make_shared<gen::GenLumiCache<HAD, DEC>>();
// Record which stream cache was selected for this lumi; streamBeginLuminosityBlock
// compares against this per-lumi copy rather than the mutable module member.
lumiCache->useInLumi_ = useInLumi_.load();
Contributor:

If globalBeginLuminosityBlockProduce used the lumi cache's useInLumi_, we could reset the member variable useInLumi_ here, which might make the logic easier to follow.

Contributor (Author):

It is critically important that streamBeginLuminosityBlock uses the value from the lumi cache. That is the main reason I created the lumi cache, and using it only for that purpose emphasizes that requirement.

For globalBeginLuminosityBlockProduce, we could use either value (the module data member or the lumi cache member). To me it seems easier to use the class data member because it avoids the extra code to get the value out of the lumi cache, and globalBeginLuminosityBlock and globalBeginLuminosityBlockProduce run one immediately after the other. If you really want me to change this, let me know and I will; I don't feel strongly about it.
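
Putting the pieces of this discussion together, here is a hypothetical sketch of the intended flow (the types and method bodies are illustrative stand-ins, not the actual module code): the atomic module member is snapshotted into the lumi cache during the serialized global begin-lumi transition, and the per-stream transition only compares against that snapshot.

#include <atomic>

// Illustrative stand-ins, not the real gen::GenStreamCache / gen::GenLumiCache.
struct StreamCache {};

struct LumiCache {
  StreamCache* useInLumi_ = nullptr;  // written once in globalBeginLumi, read-only afterwards
};

struct ModuleSketch {
  std::atomic<StreamCache*> useInLumi_{nullptr};  // selected stream cache for the next lumi

  void globalBeginLuminosityBlock(LumiCache& lumiCache) {
    lumiCache.useInLumi_ = useInLumi_.load();  // snapshot into the per-lumi cache
  }

  void globalBeginLuminosityBlockProduce() {
    // Would use the module member here (it runs immediately after
    // globalBeginLuminosityBlock, so either value works) and then clears it.
    useInLumi_.store(nullptr);
  }

  void streamBeginLuminosityBlock(const LumiCache& lumiCache, StreamCache* streamCachePtr) const {
    if (lumiCache.useInLumi_ != streamCachePtr) {
      // initLumi(streamCachePtr, ...): this stream's cache was not the one
      // already initialized for this lumi.
    }
  }
};

int main() {
  ModuleSketch module;
  StreamCache selectedStream, otherStream;
  module.useInLumi_.store(&selectedStream);

  LumiCache lumiCache;
  module.globalBeginLuminosityBlock(lumiCache);
  module.globalBeginLuminosityBlockProduce();
  module.streamBeginLuminosityBlock(lumiCache, &otherStream);     // would initialize this stream
  module.streamBeginLuminosityBlock(lumiCache, &selectedStream);  // already initialized, skipped
}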

// The next two data members are thread safe and can be safely mutable because
// they are only modified/read in globalBeginRun and globalBeginLuminosityBlock.
mutable unsigned long long nGlobalBeginRuns_{0};
mutable unsigned long long nInitializedInGlobalLumi_{0};
Contributor:

Is this actually nInitializedInGlobalLumiAfterNewRun_?

Contributor (Author):

You're right. That is a more accurate name. I will change it.

@@ -407,6 +449,9 @@ namespace edm {
if (rCache->product_.compare_exchange_strong(expect, griproduct.get())) {
griproduct.release();
}
if (cache == useInLumi_.load()) {
Contributor:

If useInLumi_ is set to nullptr in globalBeginLuminosityBlockProduce will this ever happen?

Contributor (Author):

Yes. streamEndLuminosityBlockSummary selects the stream cache to use in globalBeginLumi and that occurs before streamEndRun. So it will happen on the stream with the selected stream cache.

@makortel (Contributor):

@cms-sw/generators-l2 Kind ping.

@wddgit (Contributor, Author) commented Sep 26, 2022

I ran runTheMatrix.py with all the tests (not just the limited ones) and threading enabled. The tests all passed (except for 3 known and unrelated failures). That test also included PR #38801 and PR #39491. The run concurrency pull request is now waiting only on approval of these 3 PRs. We are hoping to get it merged into 12_6_X soon so we have time to gain experience with it before 12_6_0 is finalized.

@wddgit (Contributor, Author) commented Sep 27, 2022

PR #39491 (the DQMStore fix) was merged last night and PR #38801 was approved by Core last night. The run concurrency PR is waiting only on this PR now.

@Saptaparna (Contributor):

+1

@cmsbuild (Contributor):

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta (Contributor):

please test
(let's refresh the 11-day-old tests before merging)

@cmsbuild (Contributor):

-1

Failed Tests: RelVals-THREADING
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-33da22/27801/summary.html
COMMIT: fac3b87
CMSSW: CMSSW_12_6_X_2022-09-27-1100/el8_amd64_gcc10
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39374/27801/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-THREADING

  • 23234.023234.0_TTbar_14TeV+2026D49+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D49+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal.log
  • 35034.035034.0_TTbar_14TeV+2026D77+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D77+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal.log
  • 28234.028234.0_TTbar_14TeV+2026D60+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D60+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal.log
Expand to see more relval errors ...

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3624368
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3624343
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 212 log files, 49 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@makortel (Contributor):

The workflow failures are the ones that should get fixed with #39500.

@perrotta (Contributor):

please test with #39500

@cmsbuild (Contributor):

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-33da22/27803/summary.html
COMMIT: fac3b87
CMSSW: CMSSW_12_6_X_2022-09-27-1100/el8_amd64_gcc10
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39374/27803/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 1 difference found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3624368
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3624343
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 212 log files, 49 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@perrotta (Contributor):

+1
