Segfault in BDHadronTrackMonitoringAnalyzer #31670

Closed
makortel opened this issue Oct 5, 2020 · 8 comments

@makortel
Contributor

makortel commented Oct 5, 2020

Many HI workflows (158.1, 158.2, 158.3, 159.1, 159.3, 159.4, 300.0, 301.0, 302.0, 130.0, 311.0, 312.0) have been crashing in step3 with a segfault in BDHadronTrackMonitoringAnalyzer since CMSSW_11_2_X_2020-10-03-1100.

#5  0x00002b144cec18c6 in BDHadronTrackMonitoringAnalyzer::analyze(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02649/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-04-0000/lib/slc7_amd64_gcc820/pluginValidationRecoBPlugins.so
#6  0x00002b13e8910b84 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02649/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-04-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#7  0x00002b13e88eb15e in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02649/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-04-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#8  0x00002b13e8851245 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02649/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-04-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#9  0x00002b13e88513fd in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02649/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-04-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#10 0x00002b13e8851706 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02649/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-04-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#11 0x00002b13e8852e0a in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02649/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-04-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
@cmsbuild
Contributor

cmsbuild commented Oct 5, 2020

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Oct 5, 2020

assign dqm

@cmsbuild
Contributor

cmsbuild commented Oct 5, 2020

New categories assigned: dqm

@jfernan2,@andrius-k,@fioriNTU,@kmaeshima,@ErnestaP you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor Author

makortel commented Oct 5, 2020

CMSSW_11_2_X_2020-10-03-1100 included two HI-related PRs: #31646 and #30898. Nevertheless, the cause of the crash is not immediately clear from either (and #30898 is causing configuration problems in 3 other HI workflows).

In any case, it would be good for BDHadronTrackMonitoringAnalyzer to report possible errors, e.g. via exceptions, instead of segfaulting.
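
A minimal standalone sketch of that idea, not the actual analyzer code and not a statement about what caused this crash: validate an input before dereferencing it and throw a descriptive exception when it is missing. `MaybeProduct`, `require`, and `countTracks` are hypothetical names invented for illustration. Inside CMSSW one would presumably check the real handle's validity and throw a framework exception rather than `std::runtime_error`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a framework handle (NOT the CMSSW edm::Handle API).
template <typename T>
class MaybeProduct {
public:
  MaybeProduct() = default;
  explicit MaybeProduct(const T* product) : ptr_(product) {}
  bool isValid() const { return ptr_ != nullptr; }
  // Dereferencing an invalid handle is undefined behaviour (typically a segfault).
  const T& operator*() const { return *ptr_; }

private:
  const T* ptr_ = nullptr;
};

// Return the product, or throw an exception naming the missing input.
template <typename T>
const T& require(const MaybeProduct<T>& handle, const std::string& label) {
  if (!handle.isValid()) {
    throw std::runtime_error("Required product '" + label + "' is missing");
  }
  return *handle;
}

// Hypothetical analysis step: fails with a clear message instead of crashing.
std::size_t countTracks(const MaybeProduct<std::vector<double>>& tracks) {
  return require(tracks, "tracks").size();
}
```

With a valid input `countTracks` returns the number of entries; with a default-constructed `MaybeProduct` it throws, so the job would stop with a message naming the missing input instead of a bare stack trace.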

@makortel
Contributor Author

makortel commented Oct 5, 2020

#31646, at least, is not the cause: with it reverted locally on top of CMSSW_11_2_X_2020-10-04-2300, step3 of 158.1 still crashes.

@makortel
Contributor Author

makortel commented Oct 5, 2020

Reverting #30898 locally did make step3 of 158.1 complete successfully.

@smuzaffar
Contributor

#31674 has fixed this issue

@Martin-Grunewald
Contributor

Possibly related problem:

It seems there is still a problem in an HIon workflow (HLT validation test suite):

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_11_2_X_2020-10-07-2300/slc7_amd64_gcc820/RelVal_RECO_HIon_MC.log

----- Begin Fatal Exception 08-Oct-2020 07:51:28 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 1 lumi: 12 event: 3302 stream: 3
   [1] Running path 'validation_step'
   [2] Calling method for module B2GHadronicHLTValidation/'b2gDiJetHLTValidation'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for a container with elements of type: reco::Jet
Looking for module label: ak8PFJetsPuppi
Looking for productInstanceName:

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
