Decrease begin Run startup time for HLT #29492
Conversation
Added the needed dictionaries.
When many output modules were used in the HLT job, the begin Run was completely dominated by the calculation of the ParameterSetBlobs. This allows an option to have the ParameterSetBlobs created once and then shared by all the output modules.
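The compute-once-and-share idea behind the change can be illustrated with a toy sketch (plain Python, not CMSSW code; all class and variable names here are invented): many output modules ask a shared producer for the serialized parameter-set blob, and the expensive serialization runs only a single time.

```python
import threading

class ParameterSetBlobProducerToy:
    """Toy stand-in: computes the expensive blob once and shares it."""
    def __init__(self, pset):
        self._pset = pset
        self._blob = None
        self._lock = threading.Lock()
        self.compute_count = 0  # how many times the expensive work actually ran

    def blob(self):
        with self._lock:
            if self._blob is None:
                # Expensive serialization, done exactly once.
                self.compute_count += 1
                self._blob = repr(sorted(self._pset.items())).encode()
            return self._blob

class OutputModuleToy:
    """Each output module reuses the shared blob instead of rebuilding it."""
    def __init__(self, producer):
        self._producer = producer

    def begin_run(self):
        return self._producer.blob()

producer = ParameterSetBlobProducerToy({"threads": 4, "menu": "HLT"})
modules = [OutputModuleToy(producer) for _ in range(8)]
blobs = [m.begin_run() for m in modules]
assert producer.compute_count == 1        # blob built once, not 8 times
assert all(b == blobs[0] for b in blobs)  # all modules share the same blob
```

Without the shared producer, each of the 8 modules would perform the serialization itself, which is the begin Run cost the pull request removes.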
The code-checks are being triggered in jenkins. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29492/14723
|
A new Pull Request was created by @Dr15Jones (Chris Jones) for master. It involves the following packages: DataFormats/Common @perrotta, @smuzaffar, @Dr15Jones, @makortel, @emeschi, @cmsbuild, @slava77, @mommsen can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
The tests are being triggered in jenkins. |
@fwyzard FYI I found this from the configuration you sent me. |
thanks, looks like an interesting improvement ! |
In order to turn on the feature, I added the following to the test configuration I was using:

```python
process.psetMap = cms.EDProducer("ParameterSetBlobProducer")
process.PhysicsMuonsOutput.associate(cms.Task(process.psetMap))
```

I.e. I added the EDProducer as a Task to one of the EndPaths, and that made it available to all of the OutputModules. |
Yes, very interesting. Do I get it right that the whole configuration is stored as a blob in every output file?
Is this really necessary?
If the numbers quoted by @Dr15Jones are for a realistic HLT menu, we should re-evaluate the actual startup time and see if we can say something new about the amount of input buffer needed in the HLT…
To be followed up
On 16 Apr 2020, at 17:08, Remi Mommsen <notifications@github.com> wrote:

> This sounds like a great improvement. Thanks for looking into this.
> @smorovic, @emeschi, this could be of interest for you, too.
|
+1 |
Comparison job queued. |
@fwyzard thanks for the files. |
+core |
+1 |
@Dr15Jones |
I would add it to the |
I think Andrea's suggestion is a good one. It actually doesn't much matter since during Run and LuminosityBlock transitions modules are run in data dependency order, not in strict Path order. |
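The data-dependency point can be sketched with a toy example (plain Python using the standard library's `graphlib`, not framework code; the module names are invented): execution order at a transition is derived from the dependency graph, so the blob producer runs before any module that consumes its product, regardless of where it was attached in the configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical modules: two output modules both consume the product of
# "psetMap". Declaration order in a Path is irrelevant; the edges of the
# dependency graph decide the execution order.
deps = {
    "outputA": {"psetMap"},
    "outputB": {"psetMap"},
    "psetMap": set(),
}
order = list(TopologicalSorter(deps).static_order())
assert order.index("psetMap") < order.index("outputA")
assert order.index("psetMap") < order.index("outputB")
```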
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo (and backports should be raised in the release meeting by the corresponding L2) |
On a job with 4 threads I see a reduction in the startup time (measured from the …). The diff to the HLT configuration is attached. @Dr15Jones do you think this is coherent with your results? |
There are two things happening
The work I'm trying to do for allowing EventSetup modules to run concurrently should also allow the framework to do better scheduling at begin Run. |
OK, I'll try again |
I get similar results with the BU-FU appliance: 116 s, 113 s (no blob producer module). Average improvement: 39 seconds. As far as I know, the squid cache expiration in HLT is 30 seconds, and repeated attempts don't happen within that time frame. |
Is that only for the data with a short lifetime (i.e. IOVs) or also for the actual immutable payloads ? |
Another thing to keep in mind is that, when accessing data from the EventSetup, a module takes a lock. The framework doesn't know about the lock, so in multi-threaded running the framework can schedule multiple modules that all want the EventSetup lock, and no progress gets made on some of the threads while they wait for it. (This is again why I'm working on running EventSetup modules concurrently, and why we are adding 'consumes' to modules for EventSetup products.) The TimeReport does say how much time was spent waiting on the EventSetup lock. |
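The lock-contention effect described above can be demonstrated with a small toy (plain Python `threading`, not CMSSW code): four threads all want the same lock, so their critical sections serialize and the last thread spends most of its time waiting rather than working.

```python
import threading
import time

lock = threading.Lock()
durations = []

def worker():
    start = time.perf_counter()
    with lock:            # every worker wants the same shared lock
        time.sleep(0.05)  # pretend to read conditions data
    durations.append(time.perf_counter() - start)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The last thread to acquire the lock had to wait for the three earlier
# critical sections (3 * 0.05 s) before doing its own 0.05 s of "work":
# the job is effectively serialized despite having 4 threads.
assert max(durations) > 0.14
```

This is why scheduling several EventSetup-hungry modules at once buys little: the threads exist, but the shared lock forces them to take turns.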
@fwyzard good question. It was discussed in the context of lumi-based conditions and Dave Dykstra mentioned it, but he did not specify whether there is a different setting for IOVs and payloads. |
For a single-threaded job the time drops from …. For a multi-threaded job the time drops from …. So, the startup time with the new approach is independent of using multiple threads, while the original approach had some benefit from it (as expected following the explanation by Chris). In any case, the improvement is impressive :-) |
+1 |
PR description:
Running igprof on an example HLT configuration uncovered that the vast majority of the time spent in the begin Run transition went to all the ShmStreamConsumer OutputModules calculating the ParameterSetBlobs to be stored in the files. This pull request adds the option to run a ParameterSetBlobProducer at begin Run to create the ParameterSetBlobs once and then have all the OutputModules use that information. This decreased the time spent making the blobs from 147s in the original job to 4s.
PR validation:
The test configuration I was using ran fine both with and without ParameterSetBlobProducer added.