Concurrent ExternalLHEProducer #28899
Conversation
If requested, use multiple tbb::tasks where each task runs the script. Each task is assigned its own directory to run the script. We then read the generated LHE files sequentially during event processing.
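The scheme described above — launch one task per concurrent generation, give each task its own working directory, then read the resulting files back sequentially — can be sketched as follows. The actual PR uses tbb::task inside CMSSW; this self-contained sketch substitutes std::thread so it compiles standalone, and the names `runScriptIn`, `generateConcurrently`, and the directory/file names are illustrative, not the real CMSSW code.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for "run the external script": each worker gets
// its own directory and writes an LHE-like file into it.
void runScriptIn(const std::string& dir, unsigned seed) {
  std::filesystem::create_directory(dir);
  std::ofstream out(dir + "/cmsgrid_final.lhe");
  out << "<LHEF seed=" << seed << ">\n";
}

// Launch one task per requested concurrent generation and wait for all of
// them; the produced files are then read sequentially during event
// processing from the returned directories.
std::vector<std::string> generateConcurrently(unsigned nTasks, unsigned baseSeed) {
  std::vector<std::thread> workers;
  std::vector<std::string> dirs;
  for (unsigned i = 0; i < nTasks; ++i) {
    dirs.push_back("lheevent_" + std::to_string(i));
    workers.emplace_back(runScriptIn, dirs.back(), baseSeed + i);
  }
  for (auto& w : workers) w.join();
  return dirs;
}
```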
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28899/13685
A new Pull Request was created by @Dr15Jones (Chris Jones) for master. It involves the following packages: GeneratorInterface/LHEInterface. @SiewYan, @efeyazgan, @mkirsano, @cmsbuild, @agrohsje, @alberto-sanchez, @qliphy can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
The tests are being triggered in jenkins.
+1
Comparison job queued.
Comparison is ready. Comparison Summary:
@Dr15Jones Thanks. Looks interesting! Can you give an example of how to test this? Will the LHE files produced in each thread be merged at the end? Current gridpacks (MadGraph, Powheg...) don't support multi-threading yet, so does this mean the gridpack needs to be copied and tarred/untarred multiple times?
Hi, I tried to produce a couple of Powheg ttbar events, and it seems to work :) I am just wondering about the use of consecutive random seeds: these would overlap when consecutive job numbers are used as seeds. How is it done in production, consecutive or "random" random seeds? @qliphy Yes, the gridpack is unpacked and used multiple times, once per directory. I am using this fragment:
@intrepid42 already answered these but just for completeness.
You can take any existing configuration and add
Yes
Yes. Now, as for the random numbers:
The framework already makes use of consecutive random numbers to handle random-number assignment for threading. The random number service assigns a single random-number seed to each module in a configuration. Then, for each stream in a job, that seed is modified by adding the stream index (a number from 0 to #streams - 1).
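The per-stream seed arithmetic just described, and the overlap concern raised earlier about consecutive job numbers, can be made concrete with a small sketch. The helpers `streamSeed` and `seedsOverlap` are illustrative, not the CMSSW service API.

```cpp
#include <cassert>
#include <set>

// Per-stream seed as described: the configured seed plus the stream
// index (0 .. nStreams-1).
unsigned streamSeed(unsigned configuredSeed, unsigned streamIndex) {
  return configuredSeed + streamIndex;
}

// With consecutive job numbers used as base seeds, job N's streams cover
// the range [N, N + nStreams - 1], so adjacent jobs reuse seeds.
bool seedsOverlap(unsigned jobA, unsigned jobB, unsigned nStreams) {
  std::set<unsigned> a;
  for (unsigned s = 0; s < nStreams; ++s)
    a.insert(streamSeed(jobA, s));
  for (unsigned s = 0; s < nStreams; ++s)
    if (a.count(streamSeed(jobB, s)))
      return true;
  return false;
}
```

For example, with 4 streams, jobs 100 and 101 share seeds 101-103, while jobs 100 and 104 are disjoint; base seeds therefore need to be spaced by at least the stream count.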
Thanks @Dr15Jones @intrepid42
(1) `curl -s --insecure https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/SMP-` then `cmsDriver.py Configuration/GenProduction/python/SMP-RunIISummer19UL17wmLHEGEN-00001-fragment.py --fileout file:SMP-RunIISummer19UL17wmLHEGEN-00001.root --mc --eventcontent RAWSIM,LHE --datatier GEN,LHE --conditions 106X_mc2017_realistic_v6 --beamspot Realistic25ns13TeVEarly2017Collision --step LHE,GEN --geometry DB:Extended --era Run2_2017 --python_filename SMP-RunIISummer19UL17wmLHEGEN-00001_1_cfg.py --no_exec --customise Configuration/DataProcessing/Utils.addMonitoring --customise_commands`
(2) Same as (1). I indeed see 4 threads running, each with its own gridpack being untarred and run. Probably this is because this example uses a very complicated physics process. I will have a look at a simpler case.
The output of (1):
The output of (2):
Thanks for doing the study! If we look just at the single-threaded case, we see that the CPU efficiency is poor.
I was able to replicate what you ran, and by watching the job with 'top' I was able to see that the generation step is incredibly CPU inefficient. It does lots of work with 'tar', and even when 'madevent' is running, that application is also very CPU inefficient. So given that the single-threaded job is so IO bound, it stands to reason that running 4 IO-bound jobs (which is what the changes to ExternalLHEProducer replicate) would be slower, since the 4 applications are fighting each other for the disk resources.

There is one thing to keep in mind: the workflow management system will not just run 1 single-threaded GEN job on a node, as a batch slot typically has 8 cores assigned to it. Instead, the workflow system is likely to assign 8 single-threaded GEN jobs to a node. Therefore a better test would be to compare the 4-threaded case to the time it takes to run 4 single-threaded jobs on the node.

As a test, I ran the configuration you created under 3 different conditions.
The results were as follows:
- Single threaded
- 4 threads with only ExternalLHEProducer
- 4 threads with ExternalLHEProducer and Pythia8ConcurrentHadronizerFilter
So using all the changes processed the events 3134.77/1699.53 = 1.8 times faster than the single-threaded case.
As a test, I ran 4 single-threaded jobs concurrently. The timings were:
Wallclock Total loop: 3737.87
Wallclock Total loop: 3663.9
Wallclock Total loop: 3780.75
So much slower than the case of just 1 job. Now, this did process 4x as many events as the previous test, but it does show that multiple IO-bound jobs interfere with one another.
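The two comparisons above reduce to simple arithmetic, sketched here with the wall-clock numbers quoted in this thread. This assumes every job processes the same number of events as the single-threaded baseline (the thread does not state per-job event counts), and that a batch of concurrent jobs finishes when its slowest member does; it is an illustration, not CMSSW code.

```cpp
#include <cassert>
#include <cmath>

// Speedup of the fully threaded job relative to the single-threaded
// baseline, from wall-clock times.
double speedup(double baselineSeconds, double newSeconds) {
  return baselineSeconds / newSeconds;
}

// Aggregate throughput ratio for k concurrent single-threaded jobs,
// each processing as many events as the baseline job: k times the
// events, delivered in the time of the slowest job.
double aggregateThroughputRatio(double baselineSeconds,
                                double slowestJobSeconds,
                                int k) {
  return k * baselineSeconds / slowestJobSeconds;
}
```

With the numbers above, speedup(3134.77, 1699.53) is about 1.84, and 4 concurrent single-threaded jobs deliver roughly 3.3 times the baseline throughput even though each individual job runs slower.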
Thanks @Dr15Jones, it makes sense. One question: you got two single-thread numbers, (1) and (2). The efficiencies differ a lot. Why?
So we each ran them on different machines. It looks like one machine has a much faster CPU than the other. If the IO systems on the two machines were about the same, then the machine with the slower CPU would have spent proportionally less of its time doing IO than the fast machine. That would mean the faster machine has a proportionally worse CPU utilization.
What further checks/code do you want in order to finish this pull request?
This PR looks good to me. Probably we can check not only GEN-only but also GEN+SIM, but that can be done later.
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @silviodonato, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)
+1 |
Backported from cms-sw#28899 Note: the macro CMSSW_SA_ALLOW is disabled.
Backported from cms-sw#28899 Note: a unit test using TestProcessor::testEndRun() is disabled.
+1
@silviodonato I guess you meant to +1 to #30481 instead?
Concurrent ExternalLHEProducer (backport #28899)
PR description:
Add the ability to use multiple TBB tasks to execute the script concurrently during the begin run phase.
PR validation:
The code compiles and the new unit tests succeed. In addition, I ran the code using an example job I had been given. If the new parameter is not used, the old code is used.