Random crashes related to Pythia8GeneratorFilter #25638

Closed
Dr15Jones opened this issue Jan 12, 2019 · 14 comments

Comments

@Dr15Jones
Contributor

In the IB RelVals, we infrequently see crashes in jobs using Pythia8GeneratorFilter. The tracebacks are not always the same, but they all seem to relate to putting data into the edm::Event.

@cmsbuild
Contributor

A new Issue was created by @Dr15Jones Chris Jones.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Contributor Author

A recent traceback is from workflow 250206.118 (although similar crashes happen in other workflows as well):

#4  <signal handler called>
#5  0x00007f30ba097f3f in edm::OrphanHandle<edm::HepMCProduct> edm::Event::putImpl<edm::HepMCProduct>(unsigned int, std::unique_ptr<edm::HepMCProduct, std::default_delete<edm::HepMCProduct> >) () from /cvmfs/cms-ib.cern.ch/nweek-02558/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_CLANG_X_2019-01-10-2300/lib/slc7_amd64_gcc700/pluginGeneratorInterfacePythia8Filters.so
#6  0x00007f30ba094af4 in edm::GeneratorFilter<Pythia8Hadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02558/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_CLANG_X_2019-01-10-2300/lib/slc7_amd64_gcc700/pluginGeneratorInterfacePythia8Filters.so
#7  0x00007f30f6c7a1e2 in edm::one::EDFilterBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02558/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_CLANG_X_2019-01-10-2300/lib/slc7_amd64_gcc700/libFWCoreFramework.so

One aspect of the put is a call to copy or move the `edm::HepMCProduct`. So if its memory were corrupted, that could lead to a crash.

@Dr15Jones
Contributor Author

The line where the crash occurs for the traceback given above is

ev.put(std::move(bare_product), "unsmeared");

The edm::HepMCProduct is created two lines before and has taken ownership of the HepMC::GenEvent instance. If that HepMC::GenEvent were deleted earlier, it could account for the problem.
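To make the ownership concern concrete, here is a minimal sketch using toy types. ToyGenEvent, ToyProduct and makeProduct are hypothetical stand-ins invented for illustration; they are not the real HepMC::GenEvent or edm::HepMCProduct interfaces. The sketch only shows the invariant the real code needs: once the product owns the event, nothing else may delete it before the put.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for HepMC::GenEvent; not the real class.
struct ToyGenEvent {
  int eventNumber = 0;
};

// Hypothetical stand-in for edm::HepMCProduct: it owns the event it is given.
class ToyProduct {
public:
  explicit ToyProduct(std::unique_ptr<ToyGenEvent> evt) : evt_(std::move(evt)) {}
  const ToyGenEvent* event() const { return evt_.get(); }

private:
  std::unique_ptr<ToyGenEvent> evt_;
};

// Mimics the "create the product, then put it" sequence. Ownership moves from
// the generator's handle into the product, so there is exactly one owner.
std::unique_ptr<ToyProduct> makeProduct(std::unique_ptr<ToyGenEvent>& generatorEvent) {
  auto product = std::make_unique<ToyProduct>(std::move(generatorEvent));
  // generatorEvent is now null, so the generator side cannot delete the
  // event behind the product's back.
  return product;
}
```

If the generator kept a raw pointer instead and deleted the event before the put, the product would hold a dangling pointer, and any later copy or move of its contents could crash in the way the traceback suggests. Whether that is what actually happens here is not established.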

@Dr15Jones
Contributor Author

assign generator

@Dr15Jones
Contributor Author

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones
Contributor Author

The configuration of the module in the job which crashed is

cms.EDFilter("Pythia8HadronizerFilter",
    ExternalDecays = cms.PSet(
        EvtGen130 = cms.untracked.PSet(
            decay_table = cms.string('GeneratorInterface/EvtGenInterface/data/DECAY_2010.DEC'),
            list_forced_decays = cms.vstring(),
            operates_on_particles = cms.vint32(),
            particle_property_file = cms.FileInPath('GeneratorInterface/EvtGenInterface/data/evt.pdl')
        ),
        parameterSets = cms.vstring('EvtGen130')
    ),
    PythiaParameters = cms.PSet(
        parameterSets = cms.vstring(
            'pythia8CommonSettings', 
            'pythia8CUEP8M1Settings', 
            'pythia8aMCatNLOSettings', 
            'processParameters'
        ),
        processParameters = cms.vstring(
            'JetMatching:setMad = off', 
            'JetMatching:scheme = 1', 
            'JetMatching:merge = on', 
            'JetMatching:jetAlgorithm = 2', 
            'JetMatching:etaJetMax = 999.', 
            'JetMatching:coneRadius = 1.', 
            'JetMatching:slowJetPower = 1', 
            'JetMatching:qCut = 40.', 
            'JetMatching:doFxFx = on', 
            'JetMatching:qCutME = 20.', 
            'JetMatching:nQmatch = 5', 
            'JetMatching:nJetMax = 2'
        ),
        pythia8CUEP8M1Settings = cms.vstring(
            'Tune:pp 14', 
            'Tune:ee 7', 
            'MultipartonInteractions:pT0Ref=2.4024', 
            'MultipartonInteractions:ecmPow=0.25208', 
            'MultipartonInteractions:expPow=1.6'
        ),
        pythia8CommonSettings = cms.vstring(
            'Tune:preferLHAPDF = 2', 
            'Main:timesAllowErrors = 10000', 
            'Check:epTolErr = 0.01', 
            'Beams:setProductionScalesFromLHEF = off', 
            'SLHA:keepSM = on', 
            'SLHA:minMassSM = 1000.', 
            'ParticleDecays:limitTau0 = on', 
            'ParticleDecays:tau0Max = 10', 
            'ParticleDecays:allowPhotonRadiation = on'
        ),
        pythia8aMCatNLOSettings = cms.vstring(
            'SpaceShower:pTmaxMatch = 1', 
            'SpaceShower:pTmaxFudge = 1', 
            'SpaceShower:MEcorrections = off', 
            'TimeShower:pTmaxMatch = 1', 
            'TimeShower:pTmaxFudge = 1', 
            'TimeShower:MEcorrections = off', 
            'TimeShower:globalRecoil = on', 
            'TimeShower:limitPTmaxGlobal = on', 
            'TimeShower:nMaxGlobalRecoil = 1', 
            'TimeShower:globalRecoilMode = 2', 
            'TimeShower:nMaxGlobalBranch = 1', 
            'TimeShower:weightGluonToQuark = 1'
        )
    ),
    comEnergy = cms.double(13000.0),
    filterEfficiency = cms.untracked.double(1.0),
    maxEventsToPrint = cms.untracked.int32(1),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    pythiaPylistVerbosity = cms.untracked.int32(1)
)

@Dr15Jones
Contributor Author

In the configuration, HepMCFilter does not exist, so no filtering is done, which means the crash cannot be caused by memory problems related to the filter. Also, nAttempts does not exist, which means resetting of weights does not happen either. ExternalDecays is set, so the memory problem could be related to the use of a decayer.
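The elimination argument above can be written down as a small sketch. EnabledPaths and enabledPaths are hypothetical names made up for this illustration, and the set of strings is a toy model of the module's top-level parameter names, not the framework's parameter set class. The mapping from parameter to code path follows the reasoning in this comment, not a reading of the module source.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical summary of which optional code paths a configuration enables.
struct EnabledPaths {
  bool filtering;        // a "HepMCFilter" parameter is present
  bool weightResetting;  // an "nAttempts" parameter is present
  bool externalDecayer;  // an "ExternalDecays" parameter is present
};

// Derive the enabled paths from the names of the top-level parameters.
EnabledPaths enabledPaths(const std::set<std::string>& topLevelParameters) {
  auto has = [&](const char* name) { return topLevelParameters.count(name) > 0; };
  return EnabledPaths{has("HepMCFilter"), has("nAttempts"), has("ExternalDecays")};
}
```

Fed the top-level parameter names from the dumped configuration (ExternalDecays, PythiaParameters, comEnergy, filterEfficiency, maxEventsToPrint, pythiaHepMCVerbosity, pythiaPylistVerbosity), only externalDecayer comes out true, which is why the decayer is the remaining suspect.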

@Dr15Jones
Contributor Author

The decayer being used is here
https://github.com/cms-sw/cmssw/blob/1ada420edcc0650a777cee5b3850c12ca01c5aec/GeneratorInterface/EvtGenInterface/plugins/EvtGen/EvtGenInterface.cc

I did not see an obvious memory problem related to HepMC::GenEvent. Also, the lhef::LHEEvent is not even used in this module instance because it is only used by the Tauola decayer.

@fabiocos
Contributor

@Dr15Jones do I understand correctly that #25632 is related to the checks you are discussing in this issue, but is not guaranteed to be the solution to the problem itself? At least this is what I conclude from #25632 (comment).

@Dr15Jones
Contributor Author

@fabiocos yes

@dan131riley

A possibly related seg fault in GeneratorSmearedProducer, seen in slc7_amd64_gcc700 CMSSW_10_5_X_2019-05-02-1100 wf 1360.0:

Module: GeneratorSmearedProducer:generatorSmeared (crashed)

#5  0x00007f8ab3e639fc in HepMC::GenEvent::GenEvent(HepMC::GenEvent const&) () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_X_2019-04-28-0000/lib/slc7_amd64_gcc700/libSimDataFormatsGeneratorProducts.so
#6  0x00007f8ab3e5c900 in edm::HepMCProduct::HepMCProduct(edm::HepMCProduct const&) () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_X_2019-04-28-0000/lib/slc7_amd64_gcc700/libSimDataFormatsGeneratorProducts.so
#7  0x00007f8aaade01c0 in GeneratorSmearedProducer::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_X_2019-04-28-0000/lib/slc7_amd64_gcc700/pluginGeneratorInterfaceCore_plugins.so

Only one other thread active:

Module: Pythia8HadronizerFilter:generator

#4  0x00007f8aaea4316d in Pythia8::SimpleTimeShower::pTnext(Pythia8::Event&, double, double, bool, bool) () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw-patch/CMSSW_10_5_X_2019-05-02-1100/external/slc7_amd64_gcc700/lib/libpythia8.so
#5  0x00007f8aae877811 in Pythia8::PartonLevel::next(Pythia8::Event&, Pythia8::Event&) () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw-patch/CMSSW_10_5_X_2019-05-02-1100/external/slc7_amd64_gcc700/lib/libpythia8.so
#6  0x00007f8aae8e4a2e in Pythia8::Pythia::next() () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw-patch/CMSSW_10_5_X_2019-05-02-1100/external/slc7_amd64_gcc700/lib/libpythia8.so
#7  0x00007f8ab159080b in Pythia8Hadronizer::hadronize() () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_X_2019-04-28-0000/lib/slc7_amd64_gcc700/pluginGeneratorInterfacePythia8Filters.so
#8  0x00007f8ab15c5421 in edm::HadronizerFilter<Pythia8Hadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02574/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_5_X_2019-04-28-0000/lib/slc7_amd64_gcc700/pluginGeneratorInterfacePythia8Filters.so

@dan131riley

Another one, in slc7_amd64_gcc820 CMSSW_10_5_X_2019-05-05-2300, also wf 1360.0. I did check that 10_5 includes #25632. Only one active thread:

Module: Pythia8HadronizerFilter:generator (crashed)

#6  0x00007ff094018937 in edm::DataManagingProductResolver::checkType(edm::WrapperBase const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_10_5_X_2019-05-05-2300/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#7  0x00007ff094018a99 in edm::DataManagingProductResolver::setProduct(std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> >) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_10_5_X_2019-05-05-2300/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#8  0x00007ff094018b2c in edm::ProducedProductResolver::putProduct_(std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> >) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_10_5_X_2019-05-05-2300/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#9  0x00007ff094018c05 in edm::PuttableProductResolver::putProduct_(std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> >) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_10_5_X_2019-05-05-2300/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#10 0x00007ff0940908ed in edm::EventPrincipal::put(unsigned int, std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> >, edm::Hash<5>) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_10_5_X_2019-05-05-2300/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#11 0x00007ff0940ef8f2 in edm::Event::commit_aux(std::vector<edm::propagate_const<std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> > >, std::allocator<edm::propagate_const<std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> > > > >&, edm::Hash<5>*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_10_5_X_2019-05-05-2300/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#12 0x00007ff0940f011a in edm::Event::commit_(std::vector<unsigned int, std::allocator<unsigned int> > const&, edm::Hash<5>*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_10_5_X_2019-05-05-2300/lib/slc7_amd64_gcc820/libFWCoreFramework.so

@smuzaffar
Contributor

It looks like this has been fixed. Please open a new issue if needed.
