Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi thread lhe to increase GEN efficiency (master PR) #24653

Merged
merged 3 commits into from Nov 23, 2018

Conversation

perrozzi
Copy link
Contributor

originally opened as #24344

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @perrozzi for master.

It involves the following packages:

GeneratorInterface/LHEInterface

@alberto-sanchez, @cmsbuild, @qliphy, @perrozzi, @efeyazgan can you please review it and eventually sign? Thanks.
@alberto-sanchez, @agrohsje, @mkirsano this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@davidlange6
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 25, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/30610/console Started: 2018/09/25 14:54

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@perrozzi
Copy link
Contributor Author

started implementing what proposed in #24344 (comment)
need one more commit

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #24653 was updated. @alberto-sanchez, @cmsbuild, @qliphy, @perrozzi, @efeyazgan can you please check and sign again.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@covarell
Copy link
Contributor

covarell commented Oct 3, 2018

On how many generators has this been tested? Is there at least a successful test in MG and one in POWHEG?

@fabiocos
Copy link
Contributor

@efeyazgan @qliphy @alberto-sanchez where do we stand with this PR?

@qliphy
Copy link
Contributor

qliphy commented Oct 23, 2018

@fabiocos

We had a MG26X meeting last week and decide to merge first to 260 in next few weeks, and then hopefully can update to more recent version more quickly. The multi thread option involves also new features in MG263(+), as you can see in a related PR by Petr: cms-sw/genproductions#1935

So this is on going and need more tests after we merging to 260 from 242.

@qliphy
Copy link
Contributor

qliphy commented Oct 23, 2018

MG26X meeting on Oct 16, where you can find more info from the gdoc:
https://indico.cern.ch/event/746830/

@perrozzi
Copy link
Contributor Author

ciao, this PR is completely independent from all the other developments.
It adds a script, named run_generic_tarball_cvmfs_manual_multithread.sh
which is never user unless explicitly called in the ExternalLHEProducer.
In fact, the default script used is
scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh')
cfr
https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/B2G-RunIIFall17wmLHEGS-00003
although in cmssw there is still an old default name (should be probably updated) https://github.com/cms-sw/cmssw/blob/master/GeneratorInterface/LHEInterface/python/ExternalLHEProducer_cfi.py#L4

@perrozzi
Copy link
Contributor Author

in fact, this development is not only valid for madgraph, but also for powheg and any other generator using gridpacks through the ExternalLHEProducer interface

@perrozzi
Copy link
Contributor Author

ping

@efeyazgan
Copy link
Contributor

this should be now ready to be reviewed. I tested it and it does what promised:
using this script and specifying a number of cores > 1, a corresponding number of processes will be launched in parallel, produce event independently and then merged.
the handling of the random seeds is somehow sloppy rnum=$((rnum+10)) as for the multiple iterations used in mg5_amc to produce as many events are required. a proposal from Tommaso long time ago was to multiply the seed by 10 in central production.
the overhead to untar the gridpacks as many times as needed is not negligible for mg5_amc, as well as the time used to cleanup these directories after producing LHE files (to free up space in view of the GEN-SIM step). this might make difficult to setup the time/ev in McM... (things will become more effective with the so called read-only gridpacks)
for powheg should work much smoother already as of now.

Comments and questions are welcome.

Hi Luca (@perrozzi ), how did you test this? Could you please give some details (which MC, which system, what parameters etc)? Before moving further, we would like to test it using several different cases with MG5_aMC and Powheg (and may be other) at LO, NLO, etc. thanks.

@perrozzi
Copy link
Contributor Author

perrozzi commented Nov 2, 2018

Ciao Efe, I tested it by modifying the parameter scriptName of the ExternalLHEProducer as
scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs_manual_multithread.sh')
I used a few prepids with madgraph (for instance SMP-RunIIFall18wmLHEGS-00002) and POWHEG on lxplus. Let me know if you need other info.

@fabiocos
Copy link
Contributor

@perrozzi @efeyazgan @qliphy where do we stay with this PR?

@efeyazgan
Copy link
Contributor

@fabiocos , we still need to test it with several different configurations, however, it is ready to be merged as @perrozzi commented "this PR is completely independent from all the other developments. It adds a script, named run_generic_tarball_cvmfs_manual_multithread.sh
which is never user unless explicitly called in the ExternalLHEProducer."
It can't break anything.

@fabiocos
Copy link
Contributor

@efeyazgan thank you for the update, anyway I would prefer to merge something that is useful, not that just "does not break the toy". So I expect that you will sign the PR when all the pending tests will be completed.

@perrozzi
Copy link
Contributor Author

imho this kind of improvements should go in the release as soon as they work, because the extensive tests can be very quickly performed using the mcm validation, with a single click for each set of parameters, and computing can decide to put this option in production at any moment in time to increase the cpu efficiency...

@fabiocos
Copy link
Contributor

@perrozzi it depends on which level of test we are discussing, fixing some detail or probing basic functionalities. Anyway generators have not yet signed...

@efeyazgan
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@perrozzi
Copy link
Contributor Author

@perrozzi it depends on which level of test we are discussing, fixing some detail or probing basic functionalities. Anyway generators have not yet signed...

I would define them as "details". to decide which request can successfully use this feature, I feel like the only fully reliable way is to validate the request with McM, to use "central production-like" setup (although the I/O can depend by many different external/random conditions).

@fabiocos
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants