DQMHcal crash in CMSSW_8_1_THREADED_X_2016-06-10-2300 #14898

Closed
bouril opened this issue Jun 15, 2016 · 8 comments

@bouril

bouril commented Jun 15, 2016

Build CMSSW_8_1_THREADED_X_2016-06-10-2300

4.45 RunJet2012A

DQMHcal problem

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc6_amd64_gcc530/CMSSW_8_1_THREADED_X_2016-06-10-2300/pyRelValMatrixLogs/run/4.45_RunJet2012A+RunJet2012A+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step3_RunJet2012A+RunJet2012A+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log

Thread 2 (Thread 0x7fb2c873f700 (LWP 32294)):
#0 0x0000003fccadf283 in poll () from /lib64/libc.so.6
#1 0x00007fb386dd9da4 in full_read.constprop () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#2 0x00007fb386dda00a in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#3 0x00007fb386dda18b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007fb34d040d9d in RawTask::_process(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/pluginDQMHcalTasksAuto.so
#6 0x00007fb34cf20d2a in hcaldqm::DQTask::analyze(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libDQMHcalCommon.so
#7 0x00007fb38c7f5a1a in edm::stream::EDAnalyzerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#8 0x00007fb38c7cf81f in edm::WorkerT<edm::stream::EDAnalyzerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#9 0x00007fb38c70972c in decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::doWork<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::doWork<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#10 0x00007fb38c7098f2 in bool edm::Worker::doWork<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#11 0x00007fb38c725e2e in decltype ({parm#1}()) edm::convertException::wrap<void edm::Path::processOneOccurrence<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(void edm::Path::processOneOccurrence<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#12 0x00007fb38c7260e2 in void edm::Path::processOneOccurrence<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#13 0x00007fb38c726505 in void edm::StreamSchedule::processOneEvent<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal&, edm::EventSetup const&, bool)::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#14 0x00007fb38c7267fb in decltype ({parm#1}()) edm::convertException::wrap<void edm::StreamSchedule::processOneEvent<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal&, edm::EventSetup const&, bool)::{lambda()#1}>(void edm::StreamSchedule::processOneEvent<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal&, edm::EventSetup const&, bool)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#15 0x00007fb38c726a6f in void edm::StreamSchedule::processOneEvent<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal&, edm::EventSetup const&, bool) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#16 0x00007fb38c78c612 in edm::EventProcessor::processEvent(unsigned int) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#17 0x00007fb38c78cd47 in edm::EventProcessor::processEventsForStreamAsync(unsigned int, std::atomic*) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#18 0x00007fb38c793524 in edm::StreamProcessingTask::execute() () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-10-2300/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#19 0x00007fb38b4d89b5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7fb36196fe00, parent=..., child=) at ../../src/tbb/custom_scheduler.h:463
#20 0x00007fb38b4d21a4 in tbb::internal::arena::process (this=this@entry=0x7fb38a01f780, s=...) at ../../src/tbb/arena.cpp:156
#21 0x00007fb38b4d1264 in tbb::internal::market::process (this=0x7fb38a047c00, j=...) at ../../src/tbb/market.cpp:502
#22 0x00007fb38b4cd736 in tbb::internal::rml::private_worker::run (this=0x7fb389e66080) at ../../src/tbb/private_server.cpp:275
#23 0x00007fb38b4cd989 in tbb::internal::rml::private_worker::thread_routine (arg=) at ../../src/tbb/private_server.cpp:228
#24 0x0000003fcce07aa1 in start_thread () from /lib64/libpthread.so.0
#25 0x0000003fccae8aad in clone () from /lib64/libc.so.6

@cmsbuild
Contributor

cmsbuild commented Jun 15, 2016

A new Issue was created by @bouril Dimitri Bourilkov.

@davidlange6, @smuzaffar, @Degano, @davidlt, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Contributor

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@dan131riley

@Dr15Jones this looks like another memory overrun, we're getting these in memcheck:

==83065== Thread 3:
==83065== Invalid read of size 2
==83065== at 0x4E51FD9D: getBunchNumber (EventFilter/HcalRawToDigi/interface/HcalHTRData.h:116)
==83065== by 0x4E51FD9D: RawTask::_process(edm::Event const&, edm::EventSetup const&) (DQM/HcalTasks/plugins/RawTask.cc:276)
==83065== by 0x4E618529: hcaldqm::DQTask::analyze(edm::Event const&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/2016-26/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-06-19-0000/lib/slc6_amd64_gcc530/libDQMHcalCommon.so)

In the log file there are lots of "unable to read", "could not get", "is not valid", etc. complaints indicating that analyzers aren't finding what they expected in the event, and it looks like the HcalTasks analyzer isn't properly protected against that--it is getting 0 length data packets, and here:

void HcalHTRData::adoptData(const unsigned short* data, int length) {

where the HcalHTRData class "adopts" the data it notices that the length is 0 and sets a flag that it's invalid:

if (m_rawLength<5) {
  m_formatVersion=-2; // invalid!
} else {

but that doesn't get used by the caller. I can put in a point fix, something like:

--- a/DQM/HcalTasks/plugins/RawTask.cc
+++ b/DQM/HcalTasks/plugins/RawTask.cc
@@ -263,7 +263,7 @@ RawTask::RawTask(edm::ParameterSet const& ps):
                         for (int is=0; is<HcalDCCHeader::SPIGOT_COUNT; is++)
                         {
                                 int r = hdcc->getSpigotData(is, htr, raw.size());
-                                if (r!=0)
+                                if (r!=0 || !htr.check())
                                         continue;
                                 HcalElectronicsId eid = HcalElectronicsId(
                                         constants::FIBERCH_MIN, constants::FIBER_VME_MIN,

That pattern is used in several other places, like

if (!htr.check()) continue;

so it should be safe, and is pretty minimal, but I'm wondering if there ought to be more protection somewhere upstream. Any advice?

-dan

@Dr15Jones
Contributor

I think you should create a pull request that uses the check and, in the comments of the pull request, raise the question of adding a check further upstream. That way the discussion can happen in the right place (with the pull request) and hopefully with the right people.

@smuzaffar
Contributor

+1

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.
