Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread safety problem in PhysicsTools/MVAComputer #7332

Merged
merged 5 commits into from
Jan 24, 2015

Conversation

Dr15Jones
Copy link
Contributor

The CMSSW_7_4_THREADED_X IBs have been crashing due to MVA threading issues. The problem stemmed from the use of mutable in ProcForeach. These mutables were being read/written indirectly by GenericMVAJetTagComputer which is an EventSetup product. The simultaneous writes were causing the crashes.
The solution was to move the mutables to be external to the MVA class and instead be passed in as an argument to the functions. Encapsolating the loop state into variables that live on the stack allows each thread to have its own copy.

ProcForeach was using mutable member data to keep track of the
state of the looping it was doing. Given that multiple threads
could be updating those values concurrently, this lead to crashes.
The code has now been changed to store the looping state externally
via VarProcessor::LoopCtx on the stack inside MVAComputer::evalInternal.
This allows each thread to have its own copy.
Theoretically, making the allocator static could lead to an order
of initialization problem. In reality the default allocator is
stateless so wasn't a problem. However not being static simplifies
understanding the code.
Since non of the member functions are const we don't need to use
mutable.
@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_7_4_X.

Fix thread safety problem in PhysicsTools/MVAComputer

It involves the following packages:

PhysicsTools/MVAComputer

@cmsbuild, @vadler, @nclopezo, @monttj can you please review it and eventually sign? Thanks.
@ferencek, @acaudron, @pvmulder, @imarches this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.

@Dr15Jones
Copy link
Contributor Author

@Slava This problem was affecting RECO so you might wish to run your own tests.

The static analyzer found that the registries were being accessed
at event processing time by some modules. Given that we could get
a shared library load happening while some other code was reading
the registry, we needed to make the lookup and registration process
threads safe. This was easy to accomplish using tbb::concurrent_unordered_map.
@Dr15Jones
Copy link
Contributor Author

I took the opportunity to also fix a threading problem related to registration found by the static analyzer. I have no further changes I plan on making.

@cmsbuild
Copy link
Contributor

Pull request #7332 was updated. @cmsbuild, @vadler, @nclopezo, @monttj can you please check and sign again.

@Slava
Copy link

Slava commented Jan 22, 2015

@Dr15Jones you are tagging the wrong person

@Dr15Jones
Copy link
Contributor Author

@slava77 You might want to check this out since the problem was originally see in RECO.

@slava77
Copy link
Contributor

slava77 commented Jan 22, 2015

@cmsbuild please test

1 similar comment
@ghost
Copy link

ghost commented Jan 23, 2015

@cmsbuild please test

@cmsbuild
Copy link
Contributor

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #7332 was updated. @cmsbuild, @vadler, @nclopezo, @monttj can you please check and sign again.

When initializing the value of 'size' for the LoopCtx we need to
check that the 'size' has not yet been set. The bug only changed
the value of 'size' if it had already been set.
@Dr15Jones
Copy link
Contributor Author

@slava77 this last commit fixed the bug demonstrated by my unit test. Hopefully now everything should compare identically. Thanks for catching the problem.

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

Pull request #7332 was updated. @cmsbuild, @vadler, @nclopezo, @monttj can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Jan 24, 2015

@cmsbuild please test

maybe the bot listens to me

@slava77
Copy link
Contributor

slava77 commented Jan 24, 2015

nope, it didn't listen to me either :(

@cmsbuild
Copy link
Contributor

@slava77
Copy link
Contributor

slava77 commented Jan 24, 2015

Hi Chris,
your last fix did it. Now the comparisons are clean.

@Dr15Jones
Copy link
Contributor Author

Woo hoo! Thanks for doing all the checking.

@ktf
Copy link
Contributor

ktf commented Jan 24, 2015

Bypassing analysis signature. @monttj complain if not ok.

ktf added a commit that referenced this pull request Jan 24, 2015
Fix thread safety problem in PhysicsTools/MVAComputer
@ktf ktf merged commit 8f2894c into cms-sw:CMSSW_7_4_X Jan 24, 2015
@Dr15Jones Dr15Jones deleted the fixThreadSafetyOfMVAComputer branch March 4, 2015 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants