
Decrease begin Run startup time for HLT #29492

Merged — 3 commits merged into cms-sw:master on Apr 17, 2020

Conversation

Dr15Jones
Contributor

PR description:

Running igprof on an example HLT configuration uncovered that the vast majority of the time spent in the begin Run transition went to the ShmStreamConsumer OutputModules each calculating the ParameterSetBlobs to be stored in the files. This pull request adds the option to run ParameterSetBlobProducer at begin Run to create the ParameterSetBlobs once and then have all the OutputModules use that information. This decreased the time spent making the blobs from 147s in the original job to 4s.

PR validation:

The test configuration I was using ran fine both with and without ParameterSetBlobProducer added.

Added the needed dictionaries.
When many output modules were used in the HLT job, the begin Run
transition was completely dominated by the calculation of the ParameterSetBlobs.
This adds an option to have the ParameterSetBlobs created once and
then shared by all the output modules.
@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29492/14723

  • This PR adds an extra 40KB to the repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

DataFormats/Common
DataFormats/Provenance
EventFilter/Utilities
IOPool/Streamer

@perrotta, @smuzaffar, @Dr15Jones, @makortel, @emeschi, @cmsbuild, @slava77, @mommsen can you please review it and eventually sign? Thanks.
@makortel, @Martin-Grunewald, @rovere, @wddgit this is something you requested to watch as well.
@silviodonato, @dpiparo you are the release managers for this.

cms-bot commands are listed here

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Apr 16, 2020

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/5720/console Started: 2020/04/16 16:13

@Dr15Jones
Contributor Author

@fwyzard FYI I found this from the configuration you sent me.

@fwyzard
Contributor

fwyzard commented Apr 16, 2020

Thanks, looks like an interesting improvement!

@Dr15Jones
Contributor Author

In order to turn on the feature, I added the following to the test configuration I was using:

process.psetMap = cms.EDProducer("ParameterSetBlobProducer")
process.PhysicsMuonsOutput.associate(cms.Task(process.psetMap))

That is, I added the EDProducer via a Task to one of the EndPaths, which made it available to all of the OutputModules.

@mommsen
Contributor

mommsen commented Apr 16, 2020

This sounds like a great improvement. Thanks for looking into this.
@smorovic, @emeschi, this could be of interest for you, too.

@emeschi
Contributor

emeschi commented Apr 16, 2020 via email

@cmsbuild
Contributor

+1
Tested at: 2ea92c1
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-30d713/5720/summary.html
CMSSW: CMSSW_11_1_X_2020-04-16-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@smorovic
Contributor

@fwyzard thanks for the files.
We should install 11_1_0_pre6 on the Hilton and on the BU appliance that I use for tests as soon as it's available. It was being prepared this morning, so we can expect it soon.
I can run tests with scripts in the BU appliance. It is equivalent to the Hilton tests, so we can compare (though I expect the same conclusion). In the appliance test we can see the init period from the microstate plot. The event sample won't matter, since this all happens before the first event.

@makortel
Contributor

+core

@mommsen
Contributor

mommsen commented Apr 17, 2020

+1

@Martin-Grunewald
Contributor

@Dr15Jones
Since ConfDB is not yet task-aware, how should this new producer be included in a menu?
Adding it to each EndPath with an OutputModule?

@fwyzard
Contributor

fwyzard commented Apr 17, 2020

I would add it to the HLTriggerFirstPath, together with hltGetConditions and hltGetRaw .
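As a sketch, that suggestion could look like the fragment below. This is hypothetical: it reuses the psetMap label from the earlier example, assumes an existing cms.Process named process with an HLTriggerFirstPath (as in a full HLT menu dump), and uses the standard FWCore.ParameterSet.Config sequence API; the insert position within the path is arbitrary.

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical fragment: assumes `process` is an existing cms.Process
# holding a full HLT menu with an HLTriggerFirstPath.
process.psetMap = cms.EDProducer("ParameterSetBlobProducer")

# Schedule the producer on the first HLT path, next to hltGetConditions
# and hltGetRaw.
process.HLTriggerFirstPath.insert(0, process.psetMap)
```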

@Dr15Jones
Contributor Author

I would add it to the HLTriggerFirstPath, together with hltGetConditions and hltGetRaw .

I think Andrea's suggestion is a good one. It actually doesn't matter much, since during Run and LuminosityBlock transitions modules are run in data-dependency order, not in strict Path order.

@perrotta
Contributor

+1

  • It adds the option to run ParameterSetBlobProducer at begin Run to create the ParameterSetBlobs once and then have all the OutputModules use that information. This may speed up the begin Run startup time for HLT. The option is not enabled by default.
  • Reco is only affected by the needed modifications (a new parameter added) in EventFilter/Utilities/EvFOutputModule
  • Jenkins tests pass

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Contributor

fwyzard commented Apr 17, 2020

On a job with 4 threads I see a reduction in the startup time (measured from the %MSG-i ThreadStreamSetup pre-events message to the Begin processing the 1st record message) from 106 ± 2 s to 62 ± 2 s.

The diff to the HLT configuration is attached: [diff.txt](https://github.com/cms-sw/cmssw/files/4493310/diff.txt)

@Dr15Jones do you think this is coherent with your results?
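A measurement like the one above can be scripted by taking the timestamp difference between the two log messages. The sketch below is a generic helper, not part of CMSSW; the timestamp format and sample log lines are assumptions and should be adjusted to the actual MessageLogger output.

```python
import re
from datetime import datetime

START_MARKER = "ThreadStreamSetup"            # pre-events message
END_MARKER = "Begin processing the 1st record"

def startup_seconds(log_lines):
    """Seconds between the first START_MARKER and first END_MARKER lines.

    Assumes each relevant line carries a 'dd-Mon-yyyy HH:MM:SS' timestamp;
    adjust the regex for the actual log format.
    """
    stamp = re.compile(r"(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2})")
    times = {}
    for line in log_lines:
        for marker in (START_MARKER, END_MARKER):
            if marker in line and marker not in times:
                m = stamp.search(line)
                if m:
                    times[marker] = datetime.strptime(
                        m.group(1), "%d-%b-%Y %H:%M:%S")
    return (times[END_MARKER] - times[START_MARKER]).total_seconds()

# Fabricated example lines, for illustration only:
log = [
    "%MSG-i ThreadStreamSetup: 16-Apr-2020 16:13:05 CEST",
    "... conditions loading ...",
    "Begin processing the 1st record. 16-Apr-2020 16:14:47 CEST",
]
print(startup_seconds(log))  # → 102.0
```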

@Dr15Jones
Contributor Author

do you think this is coherent with your results ?

There are two things happening:

  1. When using the old code under multi-threading, several output modules could run concurrently at begin Run, each just doing the exact same work. So the biggest benefit happens in a 1-thread job (since everything is serialized and doing the same work over and over hurts the most). When using multiple threads, if the framework can find more work to do (based on data dependencies), then you can get a bigger 'win'.
  2. Conditions access can become the dominant part of the begin Run timing (I saw large variations in timing because of different response times from the frontier servers).

The work I'm trying to do for allowing EventSetup modules to run concurrently should also allow the framework to do better scheduling at begin Run.

@fwyzard
Contributor

fwyzard commented Apr 17, 2020

OK, I'll try again

  • with a single threaded job
  • with a multi-threaded job
  • with a single threaded job, without any actual events and conditions data
  • with a multi-threaded job, without any actual events and conditions data

@smorovic
Contributor

I get similar results with the BU-FU appliance.
With 12 FUs, running processes with 4 streams/threads (1 thread per hyperthread). I did two tries for each case.

116 s, 113 s (no blob producer module)
73 s, 78 s (module added)

Average improvement: 39 seconds.
Note: one HLT machine took longer than the others, but only in the "no blob producer" case. I ignored it for this result.

As far as I know, squid cache expiration is 30 seconds in HLT, and repeated attempts don't happen within that time frame.

@fwyzard
Contributor

fwyzard commented Apr 17, 2020

As far as I know, squid cache expiration is 30 seconds in HLT, and repeated attempts don't happen within that time frame.

Is that only for the data with a short lifetime (i.e. IOVs) or also for the actual immutable payloads?

@Dr15Jones
Contributor Author

Another thing to keep in mind: when accessing data from the EventSetup, the module takes a lock. The framework doesn't know about the lock, so in multi-threading it can schedule multiple modules that all want the EventSetup lock, and no progress gets made on some of the threads while they wait for it. (This is again why I'm working on running EventSetup modules concurrently, and why we are adding 'consumes' to modules for EventSetup products.) The TimeReport does say how much time was spent waiting on the EventSetup lock.
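The contention described above can be illustrated with a plain-Python sketch (this is not framework code; the lock simply stands in for the single EventSetup lock):

```python
import threading
import time

es_lock = threading.Lock()  # stand-in for the shared EventSetup lock

def module(results):
    # Each "module" needs EventSetup data, so it grabs the shared lock.
    # The scheduler doesn't know about the lock, so modules it runs
    # concurrently end up serialized here.
    with es_lock:
        time.sleep(0.05)  # simulated conditions access
        results.append(threading.get_ident())

results = []
threads = [threading.Thread(target=module, args=(results,)) for _ in range(4)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# Despite 4 worker threads, wall time is close to 4 * 0.05 s because the
# lock serializes the simulated EventSetup access.
print(len(results), elapsed > 0.18)  # → 4 True
```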

@smorovic
Contributor

@fwyzard good question. It was discussed in the context of lumi-based conditions, and Dave Dykstra mentioned it, but did not specify whether there is a different setting for IOVs and payloads.

@fwyzard
Contributor

fwyzard commented Apr 17, 2020

For a single threaded job the time drops from 150 ± 20s (best: 138s) to 63 ± 1s (best: 62s), with a gain of -87 ± 20s (best: -76s).

For a multi threaded job the time drops from 107 ± 3s (best: 103s) to 62 ± 2s (best: 60s), with a gain of -45 ± 3s (best: -43s).

So, with the new approach the startup time is independent of the number of threads, while the original approach got some benefit from multi-threading (as expected, following the explanation by Chris).

In any case, the improvement is impressive :-)

@silviodonato
Contributor

+1
wow! Impressive improvement!

@cmsbuild cmsbuild merged commit 3548ccd into cms-sw:master Apr 17, 2020
@Dr15Jones Dr15Jones deleted the fastPSetStorage branch April 21, 2020 14:21