Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase2TrackerClusterizer can not be a global::EDProducer #14334

Closed
Dr15Jones opened this issue May 2, 2016 · 11 comments
Closed

Phase2TrackerClusterizer can not be a global::EDProducer #14334

Dr15Jones opened this issue May 2, 2016 · 11 comments

Comments

@Dr15Jones
Copy link
Contributor

The class Phase2TrackerClusterizer can not be a global EDProducer since it is not thread safe. The Threaded IB is crashing with

#5  0x0000003c58e89e5d in _wordcopy_bwd_aligned () from /lib64/libc.so.6
#6  0x0000003c58e839aa in memmove () from /lib64/libc.so.6
#7  0x00007fd3d4db9e6e in std::vector<bool, std::allocator<bool> >::_M_fill_insert(std::_Bit_iterator, unsigned long, bool) () from /cvmfs/cms-ib.cern.ch/2016-18/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-04-29-1100/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#8  0x00007fd3abbe2583 in Phase2TrackerClusterizerArray::setSize(unsigned int, unsigned int) () from /cvmfs/cms-ib.cern.ch/2016-18/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-04-29-1100/lib/slc6_amd64_gcc530/pluginPhase2TrackerClusterizerPlugin.so
#9  0x00007fd3abbde8bf in Phase2TrackerClusterizer::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/2016-18/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-04-29-1100/lib/slc6_amd64_gcc530/pluginPhase2TrackerClusterizerPlugin.so
#10 0x00007fd3d4ddaf61 in edm::global::EDProducerBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/2016-18/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_THREADED_X_2016-04-29-1100/lib/slc6_amd64_gcc530/libFWCoreFramework.so

The problem is the class uses a pointer to a non-const object as a member data:

std::unique_ptr< Phase2TrackerClusterizerAlgorithm > clusterizer_;

which allows the code to break const correctness in the call to Phase2TrackerClusterizer::produce.

This code needs to be changed, probably to be a stream::EProducer<> instead.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2016

A new Issue was created by @Dr15Jones (Chris Jones).

@davidlange6, @smuzaffar, @Degano, @davidlt, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are list here #13029

@Dr15Jones
Copy link
Contributor Author

Assign reconstruction

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2016

New categories assigned: reconstruction

@cvuosalo,@slava77 you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones
Copy link
Contributor Author

@thomaslenzi

@thomaslenzi
Copy link
Contributor

Hi @Dr15Jones,
Could you tell me the commands you ran to that I can reproduce the error and debug. I tested with CMSSW_8_1_THREADED_X_2016-05-02-2300 and ran runTheMatrix.py --what upgrade -l 10600 just fine. Maybe there is a special command that need to be added in order to use the multithreading.

@Dr15Jones
Copy link
Contributor Author

One of the failures was workflow 106250.0 on step2:
https://cms-sw.github.io/relvalLogDetail.html#slc6_amd64_gcc530;CMSSW_8_1_THREADED_X_2016-05-02-1100

you can click on the cmd link next to the workflow name and it will show you exactly what commands were issued.

Beware, threading problems are not guaranteed to reproduce themselves since they depend on exactly what each thread is doing at a given moment.

@Dr15Jones
Copy link
Contributor Author

Just to be clear, the problem is multiple threads can be modifying the member data clusterizer_ which is of type Phase2TrackerClusterizerAlgorithm. You can easily see there is a problem just by changing all (and I mean all) member functions of Phase2TrackerClusterizerAlgorithm to be const. By doing that you'll see the code will not compile and therefore the class Phase2TrackerClusterizerAlgorithm is not const thread safe and can not be used in a global EDProducer.

Fortunately, the solution is simple, just switch to a stream EDProducer. A stream EDProducer gets multiple copies of the EDProducer made in the job but each copy is guaranteed to see one and only one event at a time and therefore doesn't have to worry about concurrency.

@thomaslenzi
Copy link
Contributor

I created a PR to fix the issue : #14366.

@Dr15Jones
Copy link
Contributor Author

Thanks for your prompt action, it is much appreciated.

@slava77
Copy link
Contributor

slava77 commented Jul 21, 2016

+1

fixed in #14366 (merged since a while now): changed to edm::stream::EDProducer

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants