segfaults in PluginFactoryBase::findPMaker() #32344
Comments
A new Issue was created by @dan131riley Dan Riley. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
assign core
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
Thanks Dan, sounds plausible to me. @Dr15Jones, what do you think?
I think Dan is right that there is a race condition there. However, I think I found two others.
Here the iterator could point to an object that hasn't yet been fully initialized, so it needs the same
The default value created here, before being reset to the found value, might be read here and returned from the function as a nullptr.
(mostly for my education)
Is this because the call
ends up in cmssw/FWCore/PluginManager/src/PluginFactoryBase.cc Lines 146 to 150 in 297e2a7
and even if that finishes for the calling thread, the first element in itFound->second could have been inserted by another thread and could still be under construction?
By quick test it indeed appears that
See #32359
+1
This issue is fully signed and ready to be closed.
There was another segfault in
We've been getting occasional segfaults in PluginFactoryBase::findPMaker(), including a recent one in the CLANG IB which seems especially informative: https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_11_3_CLANG_X_2020-11-29-2300/pyRelValMatrixLogs/run/23434.99_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU/step2_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU.log#/
Looking at the stack trace, I'm wondering if we've got a race window before PluginMakerInfo.m_ptr gets zeroed by the constructor. Should we be using the zero_allocator recommended here: https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Advanced_Idiom_Waiting_on_an_Element.html
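To make the idea concrete, here is a minimal sketch of the "waiting on an element" pattern using only the standard library rather than TBB. The names `Maker`, `SlotTable`, and `run_demo` are hypothetical and are not CMSSW or TBB code; the fixed-size array of pre-nulled slots is an assumed stand-in for what a zeroing allocator would give a concurrent container. Every slot is null before any thread can see it, the writer publishes the pointer only after the object is complete, and the reader treats null as "not ready yet" instead of returning it.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for a plugin maker; not the CMSSW type.
struct Maker {
  int id;
};

// Hypothetical table of slots. Each slot is explicitly set to nullptr in the
// constructor, before any other thread can reach the table, so a reader never
// observes an indeterminate pointer value.
class SlotTable {
public:
  SlotTable() {
    for (auto& slot : slots_) {
      slot.store(nullptr, std::memory_order_relaxed);
    }
  }

  // Writer side: the object must be fully built before this call.
  // The release store makes the earlier writes visible to an acquiring reader.
  void publish(std::size_t i, const Maker* m) {
    slots_[i].store(m, std::memory_order_release);
  }

  // Reader side: a null slot means "not published yet", so wait for it
  // rather than handing a nullptr back to the caller.
  const Maker* wait_for(std::size_t i) const {
    const Maker* m = nullptr;
    while ((m = slots_[i].load(std::memory_order_acquire)) == nullptr) {
      std::this_thread::yield();
    }
    return m;
  }

private:
  std::array<std::atomic<const Maker*>, 4> slots_;
};

// Runs one reader and one writer against slot 0 and returns the id the
// reader observed.
int run_demo() {
  SlotTable table;
  Maker maker{0};
  int seen = 0;

  std::thread reader([&] { seen = table.wait_for(0)->id; });
  std::thread writer([&] {
    maker.id = 42;             // finish initializing the object first
    table.publish(0, &maker);  // publish the pointer last
  });

  reader.join();
  writer.join();
  return seen;
}
```

Calling `run_demo()` from a `main` function returns 42 regardless of which thread runs first, because the reader cannot get past the null check until the release store has happened.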
Also, I have doubts about the correctness of this comment:
cmssw/FWCore/PluginManager/interface/PluginFactoryBase.h
Lines 59 to 61 in 297e2a7
The constructor's initializer list zeroes m_ptr, so we want that first; setting it to non-zero in the body of the constructor then comes last. However, that would still leave a window open before the constructor is even called, which is why I think we need the zeroing allocator so that it's zeroed before being put into the vector.
Stack trace: