Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making the boosted double SV tagger thread-safe #12680

Conversation

ferencek
Copy link
Contributor

@ferencek ferencek commented Dec 6, 2015

fastjet::contrib::Njettiness is not thread-safe and therefore should not be used as a data member of an ESProducer. This PR addresses the problem by defining a local instance of fastjet::contrib::Njettiness instead of using a global instance defined as a data member of the CandidateBoostedDoubleSecondaryVertexComputer class.

This PR is urgent since it fixes CMSSW crashes occurring in the multi-threaded jobs.

Njettiness is not thread-safe and therefore should not be used as a data member
of an ESProducer
@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 6, 2015

A new Pull Request was created by @ferencek (Dinko Ferenček) for CMSSW_8_0_X.

It involves the following packages:

RecoBTag/SecondaryVertex

@cmsbuild, @cvuosalo, @davidlange6, @slava77 can you please review it and eventually sign? Thanks.
@ferencek, @acaudron, @pvmulder, @imarches this is something you requested to watch as well.
@slava77, @Degano, @smuzaffar you are the release manager for this.

Following commands in first line of a comment are recognized

  • +1|approve[d]|sign[ed]: L1/L2's to approve it
  • -1|reject[ed]: L1/L2's to reject it
  • assign <category>[,<category>[,...]]: L1/L2's to request signatures from other categories
  • unassign <category>[,<category>[,...]]: L1/L2's to remove signatures from other categories
  • hold: L1/all L2's/release manager to mark it as on hold
  • unhold: L1/user who put this PR on hold
  • merge: L1/release managers to merge this request
  • [@cmsbuild,] please test: L1/L2 and selected users to start jenkins tests
  • [@cmsbuild,] please test with cms-sw/cmsdist#<PR>: L1/L2 and selected users to start jenkins tests using externals from cmsdist

@slava77
Copy link
Contributor

slava77 commented Dec 6, 2015

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 6, 2015

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/10166/console

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 6, 2015

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 6, 2015

@Dr15Jones
Copy link
Contributor

Just to give some sense of urgency, over 120 workflows are failing in CMSSW_8_0_THREADED_X without the thread-safe fix.

@smuzaffar
Copy link
Contributor

I am approving it and starting 00h00 IB now

@smuzaffar
Copy link
Contributor

merge

cmsbuild added a commit that referenced this pull request Dec 8, 2015
…readSafe_from-CMSSW_7_6_1

Making the boosted double SV tagger thread-safe
@cmsbuild cmsbuild merged commit 97777d9 into cms-sw:CMSSW_8_0_X Dec 8, 2015
@cvuosalo
Copy link
Contributor

cvuosalo commented Dec 8, 2015

+1

For #12680 b7970d9

Urgent bug fix for to prevent multi-threaded jobs from crashing. There should be no change in monitored quantities. #12679 is the 76X version of this PR.

The code change is satisfactory, and Jenkins tests against baseline CMSSW_8_0_X_2015-12-04-1100 show no significant differences, as expected. An extended test of workflow 25202.0_TTbar_13 with 70 events against baseline CMSSW_8_0_X_2015-12-04-1100 also shows no significant differences. Timing measurements show no significant difference. A related module even seems to show a tiny timing decrease:

  delta/mean delta/orJob     original                   new       module name
  ---------- ------------     --------                  ----       ------------
   -0.066339      -0.00%        15.36 ms/ev ->        14.37 ms/ev pfBoostedDoubleSecondaryVertexAK8BJetTagsAK8

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 8, 2015

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_0_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @Degano, @smuzaffar

@smuzaffar
Copy link
Contributor

@Dr15Jones , we now have a new issue in threaded IBs.
cmsRun: Nsubjettiness/MeasureDefinition.cc:168: fastjet::contrib::TauPartition fastjet::contrib::MeasureDefinition::get_partition(const std::vector&, const std::vector&) const: Assertion `has_beam()' failed.

@ferencek
Copy link
Contributor Author

ferencek commented Dec 9, 2015

@smuzaffar, can you give an example of a workflow that is failing?

@smuzaffar
Copy link
Contributor

@ferencek , see the workflows here https://cms-sw.github.io/relvalLogDetail.html#slc6_amd64_gcc493;CMSSW_8_0_THREADED_X_2015-12-09-1100

for example 4.67, 4.68 (failing in step3)

@Dr15Jones
Copy link
Contributor

I think the problem is fastjet::contrib::DefaultMeasure::UpdateAxesFast

https://github.com/cms-externals/fastjet-contrib/blob/cms/v1.020/Nsubjettiness/MeasureDefinition.cc#L387

@cvuosalo
Copy link
Contributor

cvuosalo commented Dec 9, 2015

@Dr15Jones
Copy link
Contributor

I think I have the needed fix to fastjet-contrib

cms-externals/fastjet-contrib#9

@ferencek
Copy link
Contributor Author

ferencek commented Dec 9, 2015

@Dr15Jones, thanks for the fix. I wonder why this issue did not show up earlier since there are other places in CMSSW where Njettiness is used, in particular in the MiniAOD production. Also, what is the mechanism to propagate this fix upstream.

@ferencek ferencek deleted the BoostedDoubleSVTaggerV2-ThreadSafe_from-CMSSW_7_6_1 branch December 10, 2015 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants