Run modules concurrently during global begin transitions #18451
Test getting products that are made only at end transitions, both during begin transitions and during Event processing.
Changed the module interface to allow querying which data products a module consumes for each transition type.
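As a rough illustration of what a per-transition "consumes" query could look like, here is a minimal Python sketch. It is not the framework's C++ interface: the `Transition` enum, the `Module` class, and the method names `consumes` and `items_to_get` are hypothetical names chosen for this example, as are the module and product labels.

```python
from collections import defaultdict
from enum import Enum, auto


class Transition(Enum):
    """Transition types a module can be asked about (illustrative set)."""
    EVENT = auto()
    BEGIN_RUN = auto()
    END_RUN = auto()
    BEGIN_LUMI = auto()
    END_LUMI = auto()


class Module:
    """Toy module that records which products it consumes per transition."""

    def __init__(self, label):
        self.label = label
        self._consumes = defaultdict(list)  # Transition -> list of product labels

    def consumes(self, transition, product_label):
        """Declare that this module reads product_label during transition."""
        self._consumes[transition].append(product_label)

    def items_to_get(self, transition):
        """Return the products a scheduler would need to prefetch for transition."""
        return list(self._consumes.get(transition, []))


# Hypothetical module and product labels, for illustration only.
mod = Module("lumiAnalyzer")
mod.consumes(Transition.EVENT, "tracks")
mod.consumes(Transition.END_LUMI, "lumiSummary")

print(mod.items_to_get(Transition.END_LUMI))
print(mod.items_to_get(Transition.BEGIN_RUN))
```

With a query like this, a scheduler can ask each module what to prefetch before running it on any transition, not only on the Event.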
In the future, the Principal will need to know whether it is at an end Run or end LuminosityBlock transition, so that items made only at an end transition are prefetched on that end transition.
Prefetching will be used for non-Event transitions, so we must make sure that ActivityRegistry signals meant for the Event are sent only during Event processing.
Requesting a data product without specifying a process name can give different results depending on whether the request happens before the end transition. This occurs when a module in the job puts its data into the Principal only at an end transition while the source contains a related data product from a previous process. A request made before the end transition returns the data product from the previous process, whereas a request made at the end transition returns the newly created data product. To accommodate prefetching of data products from Runs and LuminosityBlocks, we need to reset the cached lookup information at the end transition so that the newly created item can be obtained.
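The effect of the cached lookup can be shown with a toy model. The Python sketch below is only an illustration under simplifying assumptions, not the framework's implementation: the `Principal` class here, its methods, and the process and product names (`RECO`, `ANALYSIS`, `summary`) are all hypothetical.

```python
class Principal:
    """Toy Principal: products keyed by (label, process), newest process wins."""

    def __init__(self, process_history):
        self._history = list(process_history)  # oldest first, current job last
        self._products = {}                    # (label, process) -> value
        self._cache = {}                       # label -> process chosen by lookup

    def put(self, label, process, value):
        self._products[(label, process)] = value

    def get(self, label):
        """Look up by label only, caching which process satisfied the request."""
        if label not in self._cache:
            for process in reversed(self._history):
                if (label, process) in self._products:
                    self._cache[label] = process
                    break
            else:
                raise KeyError(label)
        return self._products[(label, self._cache[label])]

    def reset_lookup_cache(self):
        """Forget earlier resolutions, as would be done at the end transition."""
        self._cache.clear()


run = Principal(["RECO", "ANALYSIS"])
run.put("summary", "RECO", "from source")          # product read from the input
before = run.get("summary")                        # resolves to RECO and is cached

run.put("summary", "ANALYSIS", "made at end Run")  # module puts at end transition
stale = run.get("summary")                         # cache still points at RECO

run.reset_lookup_cache()
after = run.get("summary")                         # now resolves to ANALYSIS

print(before, "|", stale, "|", after)
```

Without the reset, the lookup keeps returning the product from the previous process even after the current job has produced its own.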
Use the new async version for global begin Run and LuminosityBlock transitions.
Extended the concurrent running of modules on global begin transitions to SubProcesses. Child SubProcesses are also run concurrently.
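The scheduling pattern can be sketched with Python's `asyncio`, purely as an analogy. The framework itself is C++ and schedules tasks on threads, whereas `asyncio` interleaves coroutines on a single thread. The dictionary layout, the function names, and the choice to start child SubProcesses only after the parent's modules have finished are assumptions made for this example.

```python
import asyncio


async def run_module(process_name, module_label, run_number, log):
    """Stand-in for one module's global begin Run work."""
    await asyncio.sleep(0)  # yield so sibling modules can interleave
    log.append((process_name, module_label, run_number))


async def global_begin_run(process, run_number, log):
    """Run all modules of a process concurrently, then all child SubProcesses."""
    await asyncio.gather(
        *(run_module(process["name"], label, run_number, log)
          for label in process["modules"])
    )
    # Assumption: children start once the parent's modules are done.
    await asyncio.gather(
        *(global_begin_run(child, run_number, log)
          for child in process["subprocesses"])
    )


# Hypothetical job layout: one parent process with two child SubProcesses.
job = {
    "name": "PARENT",
    "modules": ["a", "b"],
    "subprocesses": [
        {"name": "SUB1", "modules": ["c"], "subprocesses": []},
        {"name": "SUB2", "modules": ["d", "e"], "subprocesses": []},
    ],
}

log = []
asyncio.run(global_begin_run(job, 1, log))
print(log)
```

Each level waits only on its own group of tasks, so modules within a process, and sibling SubProcesses, are free to overlap.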
A new Pull Request was created by @Dr15Jones (Chris Jones) for master. It involves the following packages: FWCore/Framework. @cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here #13028
please test
The tests are being triggered in jenkins.
-1 Tested at: d677356 You can see the results of the tests here: I found the following errors while testing this PR. Failed tests: Build
I found an error when building: gmake[1]: Target 'PostBuild' not remade because of errors. gmake[1]: Leaving directory '/build/cmsbld/jenkins-workarea/workspace/ib-any-integration/CMSSW_9_1_X_2017-04-24-1100' config/SCRAM/GMake/Makefile.rules:2035: recipe for target 'src' failed gmake: *** [src] Error 2 gmake: Target 'all' not remade because of errors. gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2
Comparison not run due to Build errors (RelVals and Igprof tests were also skipped)
The build failed because of a problem in a python __init__.py file generated by scram.
please test
The RelVal failures are all from xrootd socket timeouts and are not caused by the changes in this pull request.
The AddOnTest failures are all from xrootd errors and are not caused by the changes in this pull request.
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request requires discussion in the ORP meeting before it's merged. @Muzaffar, @davidlange6, @smuzaffar
I rebuilt CMSSW using this change and ran the full runTheMatrix.py. The only errors were ones that already occur in the IBs.
please test
The tests are being triggered in jenkins.
From my run of the full runTheMatrix there were no problems, but I'll rerun the tests to see if we can get comparisons out.
Comparison job queued.
Comparison is ready Comparison Summary:
@davidlange6 given the unforeseen thread-safety problem with LumiProducer (the static analyzer doesn't catch it), do you want to roll back this change?
To be safe I will |
Modules are run concurrently during global begin Run and LuminosityBlock transitions.
We do not yet run concurrently during global end transitions because that would break the use of the DQMStore.