ConcurrentHadronizerFilter #28913

Merged
merged 7 commits into from Feb 19, 2020

Dr15Jones
Contributor

PR description:

Created the ConcurrentHadronizerFilter. This templated class is similar to HadronizerFilter except it is thread-safe and can run the Hadronizer concurrently for different events.

The only hadronizer that has been instantiated is Pythia8 using a dummy decayer class, ConcurrentExternalDecayDriver.

PR validation:

The code was tested using a production workflow snippet where the Pythia8Hadronizer was replaced with the ConcurrentPythia8Hadronizer. In the snippet, no external decayer was specified.

Dr15Jones and others added 6 commits February 10, 2020 17:21
This global module replicates the Hadronizer for each stream in
order to run them concurrently. This only works for thread-friendly
hadronizers, decayers and filters.
This is meant to give access to thread-friendly decayers. At the
moment none exist, so if used the object will throw an exception.
This makes use of the ConcurrentHadronizerFilter.
Use identical clones of the random engine in order to set up the
hadronizer and decayer on each LuminosityBlock boundary.
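
The commit notes above describe two ideas: one hadronizer replica per stream, and identical clones of the random engine used for the per-LuminosityBlock setup. The following is a minimal plain-Python sketch of that pattern, not the CMSSW implementation. `ToyHadronizer`, `clone_engine`, `run_stream`, and `N_STREAMS` are hypothetical names, Python's `random.Random` stands in for the framework's random engine, and events are assigned to streams statically for simplicity.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class ToyHadronizer:
    """Hypothetical stand-in for a thread-friendly hadronizer (not the CMSSW class)."""

    def __init__(self):
        self.setup_value = None

    def begin_lumi(self, engine):
        # The per-lumi setup consumes random numbers from the engine it is given.
        self.setup_value = engine.random()

    def hadronize(self, event_id):
        return (event_id, self.setup_value)


def clone_engine(engine):
    """Return an independent engine with exactly the same internal state."""
    clone = random.Random()
    clone.setstate(engine.getstate())
    return clone


N_STREAMS = 4  # assumed stream count, one hadronizer replica per stream
hadronizers = [ToyHadronizer() for _ in range(N_STREAMS)]

# At the lumi boundary every replica is set up from an identical clone,
# so all replicas end up in the same configuration.
lumi_engine = random.Random(12345)
for hadronizer in hadronizers:
    hadronizer.begin_lumi(clone_engine(lumi_engine))


def run_stream(stream_index):
    # A stream handles its own events one at a time; different streams run concurrently.
    hadronizer = hadronizers[stream_index]
    return [hadronizer.hadronize(e) for e in range(stream_index, 8, N_STREAMS)]


with ThreadPoolExecutor(max_workers=N_STREAMS) as pool:
    per_stream = list(pool.map(run_stream, range(N_STREAMS)))
```

Because each replica receives a clone rather than the shared engine, the setup does not depend on the order in which the replicas are initialized.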
@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28913/13703

  • This PR adds an extra 28KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

GeneratorInterface/Core
GeneratorInterface/ExternalDecays
GeneratorInterface/Pythia8Interface

@SiewYan, @efeyazgan, @mkirsano, @cmsbuild, @agrohsje, @alberto-sanchez, @qliphy can you please review it and eventually sign? Thanks.
@alberto-sanchez, @agrohsje, @mkirsano this is something you requested to watch as well.
@davidlange6, @silviodonato, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Feb 10, 2020

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/4584/console Started: 2020/02/10 21:06

@cmsbuild
Contributor

-1

Tested at: d49909b

CMSSW: CMSSW_11_1_X_2020-02-10-1100
SCRAM_ARCH: slc7_amd64_gcc820
You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-98755c/4584/summary.html

I found the following errors while testing this PR

Failed tests: ClangBuild

  • Clang:

I found a compilation error while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'

                 ^
/cvmfs/cms-ib.cern.ch/nweek-02615/slc7_amd64_gcc820/external/pythia8/243/include/Pythia8/SpaceShower.h:146:18: note: hidden overloaded virtual function 'Pythia8::SpaceShower::getSplittingProb' declared here: different number of parameters (5 vs 7)
  virtual double getSplittingProb( const Event& , int , int , int , string )
                 ^
In file included from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_1_X_2020-02-10-1100/src/GeneratorInterface/Pythia8Interface/plugins/Pythia8Hadronizer.cc:61:
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_1_X_2020-02-10-1100/src/GeneratorInterface/Core/interface/ConcurrentHadronizerFilter.h:139:20: error: 'callWhenNewProductsRegistered' following the 'template' keyword does not refer to a template
    this->template callWhenNewProductsRegistered([ptrThis](BranchDescription const& iBD) {
                   ^
/cvmfs/cms-ib.cern.ch/nweek-02615/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../include/c++/8.3.1/bits/unique_ptr.h:831:34: note: in instantiation of member function 'edm::ConcurrentHadronizerFilter::ConcurrentHadronizerFilter' requested here
    { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
                                 ^


@cmsbuild
Contributor

Comparison not run due to Build errors/Fireworks only changes/No short matrix requested (RelVals and Igprof tests were also skipped)

@Dr15Jones
Contributor Author

The test used the following

gen_cff.py

import FWCore.ParameterSet.Config as cms

externalLHEProducer = cms.EDProducer("ExternalLHEProducer",
    args = cms.vstring('/cvmfs/cms.cern.ch/phys_generator/gridpacks/2017/13TeV/powheg/V2/TT_hvq/patched/TT_hdamp_NNPDF31_NNLO_inclusive_patched_reducedPDFWeights.tgz'),
    nEvents = cms.untracked.uint32(5000),
    numberOfParameters = cms.uint32(1),
    outputFile = cms.string('cmsgrid_final.lhe'),
    scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh')
)

#Link to datacards:
#https://github.com/cms-sw/genproductions/blob/master/bin/Powheg/production/2017/13TeV/TT_hvq/TT_hdamp_NNPDF31_NNLO_inclusive.input

import FWCore.ParameterSet.Config as cms
from Configuration.Generator.Pythia8CommonSettings_cfi import *
from Configuration.Generator.MCTunes2017.PythiaCP5Settings_cfi import *
from Configuration.Generator.Pythia8PowhegEmissionVetoSettings_cfi import *
from Configuration.Generator.PSweightsPythia.PythiaPSweightsSettings_cfi import *

#generator = cms.EDFilter("Pythia8HadronizerFilter",
generator = cms.EDFilter("Pythia8ConcurrentHadronizerFilter",
    maxEventsToPrint = cms.untracked.int32(1),
    pythiaPylistVerbosity = cms.untracked.int32(1),
    filterEfficiency = cms.untracked.double(1.0),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    comEnergy = cms.double(13000.),
    PythiaParameters = cms.PSet(
        pythia8CommonSettingsBlock,
        pythia8CP5SettingsBlock,
        pythia8PowhegEmissionVetoSettingsBlock,
        pythia8PSweightsSettingsBlock,
        processParameters = cms.vstring(
            'POWHEG:nFinal = 2',          # number of final state particles
                                          # (BEFORE THE DECAYS) in the LHE
                                          # other than emitted extra parton
            'TimeShower:mMaxGamma = 1.0', # cutting off lepton-pair production
                                          # in the electromagnetic shower
                                          # to not overlap with ttZ/gamma* samples
            '6:m0 = 172.5',               # top mass
        ),
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CP5Settings',
            'pythia8PowhegEmissionVetoSettings',
            'pythia8PSweightsSettings',
            'processParameters'
        )
    )
)

genParticlesForFilter = cms.EDProducer("GenParticleProducer",
    abortOnUnknownPDGCode = cms.untracked.bool(False),
    saveBarCodes = cms.untracked.bool(True),
    src = cms.InputTag("generator", "unsmeared")
)

genParticlesForjetsForFilter = cms.EDProducer("InputGenJetsParticleSelector",
    excludeFromResonancePids = cms.vuint32(12, 13, 14, 16),
    excludeResonances = cms.bool(False),
    ignoreParticleIDs = cms.vuint32(1000022, 1000012, 1000014, 1000016, 2000012, 
        2000014, 2000016, 1000039, 5100039, 4000012, 
        4000014, 4000016, 9900012, 9900014, 9900016, 
        39),
    partonicFinalState = cms.bool(False),
    src = cms.InputTag("genParticlesForFilter"),
    tausAsJets = cms.bool(False)
)

ak8GenJetsForFilter = cms.EDProducer("FastjetJetProducer",
    Active_Area_Repeats = cms.int32(5),
    GhostArea = cms.double(0.01),
    Ghost_EtaMax = cms.double(6.0),
    Rho_EtaMax = cms.double(4.5),
    doAreaFastjet = cms.bool(False),
    doPUOffsetCorr = cms.bool(False),
    doPVCorrection = cms.bool(False),
    doRhoFastjet = cms.bool(False),
    inputEMin = cms.double(0.0),
    inputEtMin = cms.double(0.0),
    jetAlgorithm = cms.string('AntiKt'),
    jetPtMin = cms.double(3.0),
    jetType = cms.string('GenJet'),
    maxBadEcalCells = cms.uint32(9999999),
    maxBadHcalCells = cms.uint32(9999999),
    maxProblematicEcalCells = cms.uint32(9999999),
    maxProblematicHcalCells = cms.uint32(9999999),
    maxRecoveredEcalCells = cms.uint32(9999999),
    maxRecoveredHcalCells = cms.uint32(9999999),
    minSeed = cms.uint32(14327),
    nSigmaPU = cms.double(1.0),
    rParam = cms.double(0.8),
    radiusPU = cms.double(0.5),
    src = cms.InputTag("genParticlesForjetsForFilter"),
    srcPVs = cms.InputTag(""),
    useDeterministicSeed = cms.bool(True)
)

genHTFilter = cms.EDFilter("GenHTFilter",
    genHTcut = cms.double(649.0),
    jetEtaCut = cms.double(1000.0),
    jetPtCut = cms.double(650.0),
    src = cms.InputTag("ak8GenJetsForFilter")
)

LHEJetFilter = cms.EDFilter("LHEJetFilter",
    jetPtMin = cms.double(350.0),
    jetR = cms.double(0.8),
    src = cms.InputTag("externalLHEProducer")
)

ProductionFilterSequence = cms.Sequence(LHEJetFilter*generator*genParticlesForFilter*genParticlesForjetsForFilter*ak8GenJetsForFilter*genHTFilter)

test_cfg.py

import FWCore.ParameterSet.Config as cms

process = cms.Process("GEN")

process.load("gen_cff")
process.load('IOMC.RandomEngine.IOMC_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')

process.source = cms.Source("EmptySource")

process.maxEvents.input = 5000

process.options.wantSummary = True
process.options.numberOfThreads = 4

process.p = cms.Path(process.ProductionFilterSequence, cms.Task(process.externalLHEProducer))

@Dr15Jones
Contributor Author

It would be possible to allow our present Decayers to work with this class. I can see two ways to do it:

  1. We could use the code from FWCore/SharedMemory to run the decayers in a separate process. The downside is that we would have to serialize/deserialize all the data needed and created by the decayers. Given the serialization overhead, this would most likely only be beneficial at very high thread counts (say 100+).
  2. We could use the ExternalWork facility to execute the decayers in a separate TBB task where serialization of access to the decayer is handled by the appropriate SerialTaskQueue. (This is how the one:: module HadronizerFilter works internally.)

Of course the absolute best would be to have access to thread-friendly decayers.
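
Option 2 can be illustrated with a small plain-Python sketch: concurrent "streams" do their own work in parallel, while every call into a non-thread-safe decayer is funneled through a queue that runs one task at a time. This only shows the general pattern under stated assumptions and does not use the CMSSW ExternalWork or SerialTaskQueue APIs. `ToyDecayer`, `serial_queue`, and `hadronize` are hypothetical names. Unlike ExternalWork, the calling thread here simply blocks while it waits for the decay result.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyDecayer:
    """Hypothetical non-thread-safe decayer: it mutates shared state on every call."""

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def decay(self, particles):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        decayed = [p + 1 for p in particles]
        self.active -= 1
        return decayed


decayer = ToyDecayer()

# A single-worker executor acts as a serial task queue:
# submitted tasks run one at a time, in submission order.
serial_queue = ThreadPoolExecutor(max_workers=1)


def hadronize(event_id):
    # The concurrent part: each stream builds its own particle list.
    particles = [event_id, event_id * 10]
    # The serialized part: only the queue's worker thread ever touches the decayer.
    return serial_queue.submit(decayer.decay, particles).result()


with ThreadPoolExecutor(max_workers=4) as streams:
    results = list(streams.map(hadronize, range(8)))

serial_queue.shutdown()
```

The trade-off is visible in the structure: the hadronization step scales with the number of streams, but the decay step remains a serial bottleneck.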

@Dr15Jones
Contributor Author

@makortel FYI

@cmsbuild
Contributor

+1
Tested at: 31cfb66
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-98755c/4585/summary.html
CMSSW: CMSSW_11_1_X_2020-02-10-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-98755c/4585/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2694005
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2693658
  • DQMHistoTests: Total skipped: 346
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@silviodonato
Contributor

We need generators' review @alberto-sanchez @agrohsje @efeyazgan @mkirsano @qliphy @SiewYan

@agrohsje

+1
@Dr15Jones: Do you plan to work on the decays as mentioned above? (Following option 2?)
Would you mind giving an extended talk in GEN, starting with a more general introduction and then discussing your modifications for parallelizing?

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @silviodonato, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@Dr15Jones
Contributor Author

Do you plan to work on the decays as mentioned above? (Following option 2?)

I am willing to if that is seen as useful. The question I have is which configurations of Pythia8HadronizerFilter are actually used for production?

Would you mind giving an extended talk in GEN, starting with a more general introduction and then discussing your modifications for parallelizing?

That would be fine as well.

@silviodonato
Contributor

+1

@cmsbuild cmsbuild merged commit cd9d8d7 into cms-sw:master Feb 19, 2020
@Dr15Jones Dr15Jones deleted the concurrentPythiaHadronizer branch February 24, 2020 20:12
@jordan-martins
Contributor

jordan-martins commented Mar 12, 2020

Hi @Dr15Jones, BPH has some requests with low filter efficiency, so any gain in time/evt would be very welcome. I am sharing one cfg of interest [1]. Currently, this request produces ~35 events/lumisection in an 8hr condor job. The cfg uses Pythia8 initially, but then EvtGen (not Multi-Process capable) takes over to decay some particular particles. Another thing to note is that we do not use the Pythia8HadronizerFilter but rather the Pythia8GeneratorFilter.

Do you think we could see some gain here as well?

Many Thanks in advance,
Jordan.
@alberto-sanchez @qliphy @agrohsje

[1]
/afs/cern.ch/work/j/jordanm/public/any/CMSSW_10_2_20/src/BPH-RunIIFall18GS-00219_1_cfg.py

@Dr15Jones
Contributor Author

The problem is that EvtGen is not thread safe (as you pointed out) and therefore not amenable to the code I originally wrote. Properly handling this case would require writing the code I had originally thought would be necessary, which uses FWCore/SharedMemory to run all the generator code in a different process. Doing so would probably take about 2-4 weeks of work on my part.
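
The idea of running non-thread-safe generator code in a different process can be sketched in plain Python: the parent serializes the input, a child process does the work in isolation, and the parent deserializes the result. This is only an illustration of the serialize/deserialize round trip, not the FWCore/SharedMemory API. `CHILD_CODE` and `decay_in_child` are hypothetical names, and the sketch starts a new process per call, which a real implementation would presumably avoid by keeping a long-lived worker.

```python
import pickle
import subprocess
import sys

# Code run by the child process: read a pickled event from stdin,
# "decay" it, and write the pickled result to stdout.
CHILD_CODE = r"""
import pickle, sys
event = pickle.load(sys.stdin.buffer)
event["decayed"] = [p + 1000 for p in event["particles"]]
sys.stdout.buffer.write(pickle.dumps(event))
"""


def decay_in_child(event):
    """Send one event to a separate Python process and return its result."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=pickle.dumps(event),   # serialize everything the child needs
        stdout=subprocess.PIPE,
        check=True,
    )
    return pickle.loads(completed.stdout)  # deserialize everything it produced


result = decay_in_child({"id": 1, "particles": [11, 22]})
```

Every event pays for two serialization steps, which is the overhead mentioned earlier in the thread for the separate-process option.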

@Dr15Jones
Contributor Author

@jordan-martins wrote

BPH has some requests with low filter efficiency, so any gain in time/evt would be very welcome. I am sharing one cfg of interest. Currently, this request produces ~35 events/lumisection in an 8hr condor job. The cfg uses Pythia8 initially, but then EvtGen (not Multi-Process capable) takes over to decay some particular particles. Another thing to note is that we do not use the Pythia8HadronizerFilter but rather the Pythia8GeneratorFilter.

So I tried out the code in pull request #29445 using the configuration you pointed to. My first observation is that the configuration did not seem all that slow: it could do 2.98472 ev/s using 4 threads (although its CPU efficiency was very low). Using #29445 and 4 threads I got a 2.3x speed up (6.93303 ev/s) and very good CPU efficiency (based on watching top). I was limited to 4 threads since the VM I am using only has 4 threads. This code should scale well to 8 threads as well.
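
The quoted speed-up follows directly from the two measured rates. The short calculation below uses only the numbers from the comment above; the variable names and the event count used for the wall-time estimate are illustrative assumptions.

```python
baseline_rate = 2.98472  # ev/s with 4 threads, before #29445 (from the comment)
new_rate = 6.93303       # ev/s with 4 threads, using #29445 (from the comment)

speedup = new_rate / baseline_rate
print(f"speedup: {speedup:.2f}x")

# Illustrative only: wall time for an assumed batch of 5000 events at each rate.
n_events = 5000
baseline_time_s = n_events / baseline_rate
new_time_s = n_events / new_rate
print(f"time saved: {baseline_time_s - new_time_s:.0f} s")
```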

5 participants