[CLANG_X] Segmentation violation in MillePedeDQMModule::dqmEndJob #38364

Closed
iarspider opened this issue Jun 14, 2022 · 13 comments · Fixed by #38367

Comments

@iarspider
Contributor

The test Calibration/TkAlCaRecoProducers/testCalibrationTkAlCaRecoProducers failed in the CLANG build:

#3  0x00002b4ddf54f72f in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_CLANG_X_2022-06-13-2300/lib/el8_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b4e07e6e3f4 in MillePedeDQMModule::dqmEndJob(dqm::implementation::IBooker&, dqm::implementation::IGetter&) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_CLANG_X_2022-06-13-2300/lib/el8_amd64_gcc10/pluginAlignmentMillePedeAlignmentAlgorithmAuto.so
#6  0x00002b4e07e710e8 in non-virtual thunk to DQMEDHarvester::endProcessBlockProduce(edm::ProcessBlock&) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_CLANG_X_2022-06-13-2300/lib/el8_amd64_gcc10/pluginAlignmentMillePedeAlignmentAlgorithmAuto.so
#7  0x00002b4dd609faec in edm::one::EDProducerBase::doEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_CLANG_X_2022-06-13-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so

Full log: link
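
For traces like the one above, the frame that follows `<signal handler called>` is the one where the signal was raised (here `MillePedeDQMModule::dqmEndJob`). A minimal sketch of picking that frame out of a saved log, assuming gdb-style `#N  0xADDR in function ...` lines; the helper name `crash_frame` is hypothetical and not part of CMSSW or cms-bot:

```python
import re

# Matches gdb-style frame lines such as
#   "#5  0x00002b4e07e6e3f4 in Foo::bar() () from /path/lib.so"
# and also address-less lines such as "#4  <signal handler called>".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.*)$")


def crash_frame(trace_lines):
    """Return (frame_number, frame_text) for the first frame listed after
    '<signal handler called>', or None if no such frame is present."""
    seen_handler = False
    for line in trace_lines:
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # skip thread headers, blank lines, timestamps
        number, text = int(match.group(1)), match.group(2)
        if seen_handler:
            return number, text
        if text.startswith("<signal handler called>"):
            seen_handler = True
    return None
```

Applied to the excerpt above, this returns frame 5, whose text starts with `MillePedeDQMModule::dqmEndJob`. In a multi-thread dump it stops at the first thread that contains a signal-handler frame.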

@cmsbuild
Contributor

A new Issue was created by @iarspider .

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@iarspider
Contributor Author

assign alca

@cmsbuild
Contributor

New categories assigned: alca

@yuanchao,@francescobrivio,@malbouis,@tvami you have been requested to review this Pull request/Issue and eventually sign? Thanks

@iarspider
Contributor Author

The same issue is observed in RelVal 1001.0.

@tvami
Contributor

tvami commented Jun 14, 2022

type trk

@cmsbuild added the trk label Jun 14, 2022
@tvami
Contributor

tvami commented Jun 14, 2022

Attention of @connorpa @antoniovagnerini @consuegs @mmusich

@mmusich
Contributor

mmusich commented Jun 14, 2022

@dmeuser please have a look.

@tvami
Contributor

tvami commented Jun 15, 2022

+alca

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

@aandvalenzuela
Contributor

aandvalenzuela commented Jun 15, 2022

Hi,

We have seen the same test Calibration/TkAlCaRecoProducers/testCalibrationTkAlCaRecoProducers failing in DEFAULT IBs on platform el8_ppc64le_gcc10:

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Wed Jun 15 17:30:19 CEST 2022
Thread 2 (Thread 0x3ff75d0a8460 (LWP 117413) "cmsRun"):
#0  0x00003fff81fa9510 in waitpid () from /lib64/libpthread.so.0
#1  0x00003fff79d811bc in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00003fff79d82d4c in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00003fff79d87014 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginFWCoreServicesPlugins.so
#4  0x00003fff82260bb0 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#5  0x00003fff81f99718 in start_thread () from /lib64/libpthread.so.0
#6  0x00003fff81eaab58 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x3fff8242b260 (LWP 117393) "cmsRun"):
#0  0x00003fff81e99250 in poll () from /lib64/libc.so.6
#1  0x00003fff79d8185c in full_read.constprop () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00003fff79d82e28 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00003fff79d862fc in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00003fff81e2ca7c in __memset_power8 () from /lib64/libc.so.6
#6  0x00003fff830de0d4 in TObjArray::Init(int, int) () from /cvmfs/cms-ib.cern.ch/week1/el8_ppc64le_gcc10/cms/cmssw-patch/CMSSW_12_5_X_2022-06-14-2300/external/el8_ppc64le_gcc10/lib/libCore.so
#7  0x00003fff830de224 in TObjArray::TObjArray(int, int) () from /cvmfs/cms-ib.cern.ch/week1/el8_ppc64le_gcc10/cms/cmssw-patch/CMSSW_12_5_X_2022-06-14-2300/external/el8_ppc64le_gcc10/lib/libCore.so
#8  0x00003fff8362d914 in TFile::Init(bool) () from /cvmfs/cms-ib.cern.ch/week1/el8_ppc64le_gcc10/cms/cmssw-patch/CMSSW_12_5_X_2022-06-14-2300/external/el8_ppc64le_gcc10/lib/libRIO.so
#9  0x00003fff8362f150 in TFile::TFile(char const*, char const*, char const*, int) () from /cvmfs/cms-ib.cern.ch/week1/el8_ppc64le_gcc10/cms/cmssw-patch/CMSSW_12_5_X_2022-06-14-2300/external/el8_ppc64le_gcc10/lib/libRIO.so
#10 0x00003fff791a828c in LegacyIOHelper::save(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libDQMServicesCore.so
#11 0x00003ff74cb1557c in DQMFileSaver::saveForOffline(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginDQMServicesComponentsPlugins.so
#12 0x00003ff74cb15d7c in non-virtual thunk to DQMFileSaver::endProcessBlock(edm::ProcessBlock const&) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/pluginDQMServicesComponentsPlugins.so
#13 0x00003fff84a08514 in virtual thunk to edm::one::impl::WatchProcessBlock<edm::one::EDAnalyzerBase>::doEndProcessBlock_(edm::ProcessBlock const&) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#14 0x00003fff84a03130 in edm::one::EDAnalyzerBase::doEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#15 0x00003fff849d8a7c in edm::WorkerT<edm::one::EDAnalyzerBase>::implDoEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#16 0x00003fff848678e0 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#17 0x00003fff84867dd8 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#18 0x00003fff84868554 in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#19 0x00003fff848686b0 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#20 0x00003fff840f628c in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreConcurrency.so
#21 0x00003fff82655bb4 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x3fff815dee00, this=0x3fff815d3e00) at /scratch/cmsbuild/jenkins_a/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_ppc64le_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/task_dispatcher.h:322
#22 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x3fff815d3e00) at /scratch/cmsbuild/jenkins_a/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_ppc64le_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#23 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /scratch/cmsbuild/jenkins_a/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_ppc64le_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/task_dispatcher.cpp:168
#24 0x00003fff84812fe8 in edm::EventProcessor::endProcessBlock(bool, bool) () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#25 0x00003fff84818e58 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/nweek-02737/el8_ppc64le_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-13-2300/lib/el8_ppc64le_gcc10/libFWCoreFramework.so
#26 0x000000001000ac24 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#27 0x00003fff8263a480 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /scratch/cmsbuild/jenkins_a/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_ppc64le_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/arena.cpp:698
#28 0x000000001000bd14 in main::{lambda()#1}::operator()() const ()
#29 0x00000000100097e4 in main ()

Current Modules:

Module: DQMFileSaver:dqmSaver (crashed)

A fatal system signal has occurred: segmentation violation
/afs/cern.ch/user/c/cmsbuild/CMSSW_12_5_X_2022-06-14-2300/src/Calibration/TkAlCaRecoProducers/test/testAlCaHarvesting.sh: line 12: 117393 Segmentation fault      (core dumped) cmsRun -e -j testPCLAlCaHarvesting.xml ${LOCAL_TEST_DIR}/testPCLAlCaHarvesting.py
Failure running testPCLAlCaHarvesting.py: status 139

We suspect it is related to this issue because we get the following error messages when reproducing the failure locally:

%MSG
STOP FILETC: no binary files                                 
%MSG-e MillePedeFileReader:   AlignmentProducerAsAnalyzer:SiPixelAliPedeAlignmentProducer@endProcessBlock  15-Jun-2022 17:30:19 CEST post-events
Could not read millepede result-file.
%MSG
%MSG-e MillePedeFileReader:   MillePedeDQMModule:SiPixelAliDQMModule@endProcessBlock  15-Jun-2022 17:30:19 CEST post-events
Could not read millepede result-file.
%MSG
%MSG-e SiPixelLorentzAnglePCLHarvester::dqmEndJob:   SiPixelLorentzAnglePCLHarvester:alcaSiPixelLorentzAngleHarvester@endProcessBlock  15-Jun-2022 17:30:19 CEST post-events
Failed to retrieve electron drift over depth for layer 1, module 1.

Full log: link
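
A side note on the `status 139` reported by the test script above: shells such as bash report a process killed by signal N as exit status 128 + N, so 139 corresponds to signal 11 (SIGSEGV), consistent with the "Segmentation fault" message. A minimal sketch of that decoding, with a hypothetical helper name; a program can also exit with a code above 128 on its own, so the result is only a hint:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-reported exit status.

    Assumes the bash convention that a status above 128 means the process
    was terminated by signal (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"


print(describe_exit_status(139))
```

On Linux this prints `killed by SIGSEGV`.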

@smuzaffar
Contributor

CMS Kibana shows that this unit test started failing in el8_ppc64le_gcc10/CMSSW_12_5_X_2022-06-13-2300 and later IBs. Could the change in #38273 be causing this issue?

@mmusich
Contributor

mmusich commented Jun 15, 2022

This was already fixed

@perrotta
Contributor

The fix should be #38367, which has been included since today's CMSSW_12_5_X_2022-06-15-1100 IB; the unit test has not failed from that IB onward.

7 participants