Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfaults in PluginFactoryBase::findPMaker() #32344

Closed
dan131riley opened this issue Nov 30, 2020 · 11 comments
Closed

segfaults in PluginFactoryBase::findPMaker() #32344

dan131riley opened this issue Nov 30, 2020 · 11 comments

Comments

@dan131riley
Copy link

We've been getting occasional segfaults in PluginFactoryBase::findPMaker(), including a recent one in the CLANG IB which seems especially informative:

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_11_3_CLANG_X_2020-11-29-2300/pyRelValMatrixLogs/run/23434.99_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU/step2_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU.log#/

Looking at the stack trace I'm wondering if we've got a race window before PluginMakerInfo.m_ptr gets zeroed by the constructor. Should we be using the zero_allocator recommended here:

https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Advanced_Idiom_Waiting_on_an_Element.html

Also, I have doubts about the correctness of this comment:

//NOTE: this has to be last since once it is non zero it signals
// that the construction has finished
std::atomic<void*> m_ptr;

The constructor initializer lists zero m_ptr, so we want that first, then setting it to non-zero in the body of the constructor is last. However, that would still leave a window open before the constructor is even called, which is why I think we need the zeroing allocator so that it's zeroed before being put into the vector.

Stack trace:

Thread 11 (Thread 0x2b978a800700 (LWP 7164)):
#3  0x00002b9756a825af in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b974e402600 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#6  0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#7  0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#8  0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so

Thread 10 (Thread 0x2b9789801700 (LWP 7163)):
#2  0x00002b9756a82260 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b97506cc54d in __lll_lock_wait () from /lib64/libpthread.so.0
#5  0x00002b97506c7eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#6  0x00002b97506c7daf in pthread_mutex_lock () from /lib64/libpthread.so.0
#7  0x00002b974e4062b8 in edmplugin::PluginManager::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#8  0x00002b974e402596 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#9  0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#10 0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#11 0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so

Thread 9 (Thread 0x2b9788e00700 (LWP 7162)):
#2  0x00002b9756a82260 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b97506cc54d in __lll_lock_wait () from /lib64/libpthread.so.0
#5  0x00002b97506c7eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#6  0x00002b97506c7daf in pthread_mutex_lock () from /lib64/libpthread.so.0
#7  0x00002b974e4062b8 in edmplugin::PluginManager::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#8  0x00002b974e402596 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#9  0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#10 0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#11 0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so


Thread 3 (Thread 0x2b9781b0b700 (LWP 6757)):
#0  0x00002b97506c9a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b975029185c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_pre6-slc7_amd64_gcc900/build/CMSSW_11_1_0_pre6-build/BUILD/slc7_amd64_gcc900/external/gcc/9.3.0/gcc-9.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00002b977bc652bb in std::thread::_State_impl<std::thread::_Invoker<std::tuple<OscarMTMasterThread::OscarMTMasterThread(edm::ParameterSet const&)::$_0> > >::_M_run() () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#4  0x00002b9750296af0 in std::execute_native_thread_routine (__p=0x2b9777c9bba0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#5  0x00002b97506c5ea5 in start_thread () from /lib64/libpthread.so.0
#6  0x00002b97509d896d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2b97525d6d00 (LWP 3644)):
#2  0x00002b9756a82260 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b9750a26f1d in __memcpy_ssse3 () from /lib64/libc.so.6
#5  0x00002b974e403252 in edmplugin::PluginFactoryBase::registerPMaker(void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#6  0x00002b97bf7de1a9 in _GLOBAL__sub_I_module.cc () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CMSTrackerPlugins.so
#7  0x00002b974dd689c3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#8  0x00002b974dd6d59e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00002b974dd687d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00002b974dd6cb8b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00002b974fda2fab in dlopen_doit () from /lib64/libdl.so.2
#12 0x00002b974dd687d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00002b974fda35ad in _dlerror_run () from /lib64/libdl.so.2
#14 0x00002b974fda3041 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#15 0x00002b974e409666 in edmplugin::SharedLibrary::SharedLibrary(std::filesystem::__cxx11::path const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#16 0x00002b974e40651e in edmplugin::PluginManager::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#17 0x00002b974e402596 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#18 0x00002b977bc9b907 in AttachSD::create(edm::EventSetup const&, SensitiveDetectorCatalog const&, edm::ParameterSet const&, SimTrackManager const*, SimActivityRegistry&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreSensitiveDetector.so
#19 0x00002b977bc77673 in RunManagerMTWorker::initializeG4(RunManagerMT*, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/libSimG4CoreApplication.so
#20 0x00002b977bc25266 in OscarMTProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_CLANG_X_2020-11-29-2300/lib/slc7_amd64_gcc900/pluginSimG4CoreApplicationPlugins.so

Current Modules:

Module: OscarMTProducer:g4SimHits (crashed)
Module: OscarMTProducer:g4SimHits
Module: OscarMTProducer:g4SimHits
Module: OscarMTProducer:g4SimHits
@cmsbuild
Copy link
Contributor

A new Issue was created by @dan131riley Dan Riley.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor

assign core

@cmsbuild
Copy link
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Copy link
Contributor

Thanks Dan, sounds plausible to me. @Dr15Jones, what do you think?

@Dr15Jones
Copy link
Contributor

I think Dan is right that there is a race condition there. However, I think I found two others.

itFound = m_plugins.find(iName);

Here the iterator could be to an object that hasn't yet been fully initialized so needs the same while loop.

The default value created here before being reset to the found value

loadables_[p] = ptr;

might be read here and returned from the function with a nullptr

auto itLoaded = loadables_.find(p);

@makortel
Copy link
Contributor

makortel commented Dec 1, 2020

(mostly for my education)

Here the iterator could be to an object that hasn't yet been fully initialized so needs the same while loop.

Is this because the call

std::string lib = PluginManager::get()->load(this->category(), iName).path().string();

ends up in
void PluginFactoryBase::registerPMaker(void* iPMaker, const std::string& iName) {
assert(nullptr != iPMaker);
m_plugins[iName].push_back(PluginMakerInfo(iPMaker, PluginManager::loadingFile()));
newPlugin(iName);
}

and even if that finishes for the calling thread, the first element in itFound->second could have been inserted by another thread and could still be under construction?

The default value created here before being reset to the found value

loadables_[p] = ptr;

By quick test it indeed appears that tbb::concurrent_unordered_map::operator[]() indeed calls default constructor first and then move-assigns the value. Would tbb::concurrent_unordered_map::emplace() do the update of the map atomically?

@Dr15Jones
Copy link
Contributor

Yes to the first question about the element not yet being fully constructed.

By quick test it indeed appears that tbb::concurrent_unordered_map::operator indeed calls default constructor first and then move-assigns the value. Would tbb::concurrent_unordered_map::emplace() do the update of the map atomically?

I'm in the process of making that change right now (and making the other changes pointed to by this issue).

@Dr15Jones
Copy link
Contributor

See #32359

@makortel
Copy link
Contributor

makortel commented Dec 2, 2020

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 2, 2020

This issue is fully signed and ready to be closed.

@makortel
Copy link
Contributor

There was another segfault in PluginFactoryBase::findPMaker() in the latest CLANG IB. I opened a new issue for that #32867

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants