make RU CSC segment algorithm reproducible by enforcing constness #19421

slava77 · 2017-06-24T07:29:25Z

Make the method buildSegments const and move all varying data members to an AlgoState that is percolated through all methods.

This is a somewhat mindless way to make each call independent and resolve the reproducibility problem when running the algorithm in multithreaded mode or on otherwise reordered events.
With this solution the reproducibility is effectively enforced by the compiler.

The code is called chamber by chamber. Clearly, some state in memory was changing between calls to build a segment in different chambers. After the constness is enforced, the order of calls between chambers or between events shouldn't matter.
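A minimal sketch of the pattern described above, not the actual `CSCSegAlgoRU` code: `SegmentBuilderSketch`, `Hit`, `tryHit`, and the fields of `AlgoState` are hypothetical names chosen for illustration. It shows a `const` entry point that builds a fresh state object on every call and hands it to the helper methods by reference, so nothing can leak from one call to the next.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a rec hit; not a CMSSW type.
struct Hit { double x; };

class SegmentBuilderSketch {
public:
  explicit SegmentBuilderSketch(double chi2Max) : chi2Max_(chi2Max) {
    assert(chi2Max > 0.);
  }

  // const: the compiler rejects any write to a data member from here.
  std::vector<double> buildSegments(const std::vector<Hit>& hits) const {
    AlgoState state;           // fresh for every call
    state.chi2Max = chi2Max_;  // always start from the configured value
    std::vector<double> out;
    for (const Hit& h : hits) tryHit(state, h, out);
    return out;
  }

private:
  // Everything that varies during one call lives here (assumed fields).
  struct AlgoState {
    double chi2Max = 0.;
    int nRelaxed = 0;
  };

  // Helpers are const too; they may only modify the state passed in.
  void tryHit(AlgoState& state, const Hit& h, std::vector<double>& out) const {
    if (h.x > state.chi2Max) {
      state.chi2Max *= 2;  // relaxing the cut affects this call only
      ++state.nRelaxed;
    }
    out.push_back(state.chi2Max);
  }

  const double chi2Max_;  // configuration, never modified after construction
};
```

With this layout, calling `buildSegments` on the same hits gives the same result regardless of what was processed before, which is the property the PR is after.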

Changes compared to the baseline (black, CMSSW_9_2_3_patch1) in wf 27411 (10 muons per event), in one thread:

Baseline CMSSW_9_2_3_patch1 comparison between single-thread run (black) and multi-thread (red, using 8 threads)

In this test the events are still somewhat in order, and on the same events there should be no differences. This explains the much smaller size of the changes in the multi-thread vs. single-thread comparison.

After the fix there are no differences in the cscSegments distributions in comparison between MT1 and MT8 runs.

cmsbuild · 2017-06-24T07:29:45Z

A new Pull Request was created by @slava77 (Slava Krutelyov) for master.

It involves the following packages:

RecoLocalMuon/CSCSegment

@perrotta, @cmsbuild, @slava77, @davidlange6 can you please review it and eventually sign? Thanks.
@ptcox, @bellan, @abbiendi, @jhgoh this is something you requested to watch as well.
@davidlange6 you are the release manager for this.

cms-bot commands are listed here

slava77 · 2017-06-24T07:29:52Z

@cmsbuild please test

cmsbuild · 2017-06-24T07:30:03Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/20898/console Started: 2017/06/24 09:31

cmsbuild · 2017-06-24T08:41:16Z

-1

Tested at: 404d1d9

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
821918a
You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19421/20898/git-log-recent-commits
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19421/20898/git-merge-result

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19421/20898/summary.html

I found the following errors while testing this PR

Failed tests: RelVals

RelVals:

When I ran the RelVals I found an error in the following workflows:
136.731 step3

runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2/step3_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2.log

cmsbuild · 2017-06-24T08:41:18Z

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

perrotta · 2017-06-24T08:42:56Z

RecoLocalMuon/CSCSegment/src/CSCSegAlgoRU.cc

-      chi2Norm_2D_ = 5*chi2Norm_2D_;
-      chi2_str_ = 100;
-      chi2Max = 2*chi2Max;
+    if(aState.doCollisions && search_disp && int(rechits.size()-used_rh)>2){//check if there are enough recHits left to build a segment from displaced vertices


Could you please also apply here the (unrelated) fix pointed out in #19081 (review)?

perrotta · 2017-06-24T09:13:29Z

The crash in wf 136.731 is already present in the latest CMSSW_9_2_X_2017-06-23-2300 IB, and is therefore unrelated to this PR. It quite likely originates from the merging of #19194

ptcox · 2017-06-24T09:50:29Z

Hey, Slava! Thanks for doing our work for us. This seems like a sledgehammer fix! I still want Nikolay to i) simplify the logic flow, ii) leave the config parameters const and not perform algebra on them, and iii) remove historical comments that have no relation to the current code. But it is great that you've solved the non-reproducibility like this. Thanks!

slava77 · 2017-06-24T15:16:55Z

Hi Tim,

Regarding the "sledgehammer fix": given the attempts by others to read into the code and find an issue, it felt much easier to make it reproducible from first principles than to spend much effort trying to understand the full logic of the algorithm. Just to be clear: are you OK with this fix, or would you rather we revert to the old algorithm and wait for Nikolay to provide fixes for the 3 items in your list? Please clarify. I'm fine either way, but we should know soon, before the release is built. Thank you.

--slava

ptcox · 2017-06-24T15:25:36Z

Hi Slava, I'm happy with your fix temporarily. Longer term I think we need the code cleaned up as we all see it needs. But I (and all CSC) are grateful to you that we won't have to just revert to the old algorithm. Thanks, Tim

slava77 · 2017-06-24T15:52:32Z

@cmsbuild please test

It looks like the failures in 136.731 are somewhat random (the baseline used in the last test, CMSSW_9_2_X_2017-06-23-2300, did not have the error).
Maybe it goes away.

cmsbuild · 2017-06-24T15:53:47Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/20900/console Started: 2017/06/24 17:55

cmsbuild · 2017-06-24T16:47:54Z

+1
Tested at: 404d1d9
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19421/20900/summary.html

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
821918a
You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19421/20900/git-log-recent-commits
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19421/20900/git-merge-result

cmsbuild · 2017-06-24T16:47:57Z

Comparison job queued.

slava77 · 2017-06-24T17:25:50Z

Here are some plots from running on 1K events with pt=1000 muons:

The trend seems to repeat (as in the PR description) that the CSC segments are becoming somewhat shorter, while more abundant

This change corresponds to somewhat better DyDz residuals (the effect is less pronounced on Dy or Dx)

There are more hits on tracks

and there is probably a higher efficiency (one bin here, not stat significant)

The more restrictive definition of efficiency (IIRC, the numerator requires a fraction of hits to be from muon sim hits) is clearly better by ~4-5% in the endcaps

q/pt pull (and other pulls) is not changing significantly

The plots above suggest to me that the behavior of the algorithm starting from a fixed initial state is appropriate. In other words, it does not look like the original version was starting from an incorrect initial point and then settling on a better point after a few segment fits, by virtue of settings changed in one fit being remembered in the next fit calls.
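To make the "remembered settings" concern concrete, here is a hedged toy example, not the `CSCSegAlgoRU` code: `StatefulFitSketch` and `countAccepted` are hypothetical, and the doubling of the cut only mirrors the kind of update seen in the removed `chi2Max = 2*chi2Max;` line. It shows how a cut that is relaxed in place and kept across calls makes the outcome depend on processing order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration of a fitter that keeps relaxed cuts between calls.
class StatefulFitSketch {
public:
  explicit StatefulFitSketch(double chi2Max) : chi2Max_(chi2Max) {
    assert(chi2Max > 0.);
  }

  // Non-const: a rejected candidate relaxes the cut, and the relaxed
  // value is what every later call sees.
  bool accept(double chi2) {
    if (chi2 < chi2Max_) return true;
    chi2Max_ = 2 * chi2Max_;
    return false;
  }

private:
  double chi2Max_;
};

// Count accepted candidates when they are processed in the given order,
// starting from an assumed initial cut of 10.
inline int countAccepted(const std::vector<double>& chi2s) {
  StatefulFitSketch fit(10.);
  int n = 0;
  for (double c : chi2s) n += fit.accept(c) ? 1 : 0;
  return n;
}
```

In this toy, the same two candidates give a different count depending on which one comes first, which is the kind of order dependence that reordered events or multithreading would expose.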

cmsbuild · 2017-06-24T17:32:45Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19421/20900/summary.html

There are some workflows for which there are errors in the baseline:
10824.0 step 3
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB is having errors in the relvals. The error does NOT come from this pull request

Comparison Summary:

You potentially added 3 lines to the logs
Reco comparison results: 931 differences found in the comparisons
DQMHistoTests: Total files compared: 21
DQMHistoTests: Total histograms compared: 1669851
DQMHistoTests: Total failures: 722
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 1668971
DQMHistoTests: Total skipped: 158
DQMHistoTests: Total Missing objects: 0
Checked 85 log files, 14 edm output root files, 21 DQM output files

slava77 · 2017-06-24T23:19:21Z

+1

for #19421 404d1d9

Jenkins tests pass, and comparisons with the baseline show small changes that start in cscSegments and propagate downstream.
Local tests with multimuon and high-pt muon samples without PU, and also ttbar and ZMM with PU35, show essentially the same, if not slightly better, performance related to the updates in CSC segment reco.

@perrotta I couldn't convince myself that the change mentioned in #19421 (comment) is required (it definitely would be if there was no int() on the left hand side already).

cmsbuild · 2017-06-24T23:20:42Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @smuzaffar

perrotta · 2017-06-25T09:50:23Z

Slava Krutelyov <notifications@github.com> wrote:

@perrotta I couldn't convince myself that the change mentioned in #19421 (comment) is required (it definitely would be if there was no int() on the left hand side already).

Fine with me, because here the result will not change. I simply find that casting to an int the whole difference between an unsigned type and an int is merely useless in this case, but in general also error prone.
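A small sketch of the three ways such a check can be written, to illustrate the point being discussed; the helper names are hypothetical and a `std::vector<int>` stands in for the rec-hit container. The "cast the difference" form matches the shape of `int(rechits.size()-used_rh)>2` in the diff; the "cast the operand" form is one possible alternative and is an assumption about what a cleanup might look like, not what #19081 proposes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// No cast: `used` is converted to std::size_t, so a negative difference
// wraps around to a very large value and the test passes by accident.
inline bool enoughLeftUnsigned(const std::vector<int>& hits, int used) {
  return hits.size() - used > 2;
}

// Cast the whole difference: the wrapped value is converted back to int.
// This conversion is implementation-defined before C++20, although common
// compilers give the expected negative number.
inline bool enoughLeftCastDifference(const std::vector<int>& hits, int used) {
  return int(hits.size() - used) > 2;
}

// Cast the operand first, so the subtraction itself is done in signed int.
inline bool enoughLeftCastOperand(const std::vector<int>& hits, int used) {
  return static_cast<int>(hits.size()) - used > 2;
}

// Usage: three hits, five marked as used (more than available).
inline void demoHitCounts() {
  const std::vector<int> hits(3);
  assert(enoughLeftUnsigned(hits, 5));         // wrapped, wrongly true
  assert(!enoughLeftCastDifference(hits, 5));  // -2 > 2 is false
  assert(!enoughLeftCastOperand(hits, 5));     // -2 > 2 is false
}
```

The last two forms agree here, which is consistent with the remark that the result will not change; the difference is that the operand cast never relies on unsigned wrap-around.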

ptcox · 2017-06-25T10:27:10Z

Hi Andrea, I agree with you and I'll make sure Nikolay includes cleaning this up in the more thorough revision of the code in the next few weeks. Tim

davidlange6 · 2017-06-25T13:28:08Z

+1

cmsbuild added this to the CMSSW_9_2_X milestone Jun 24, 2017

cmsbuild added comparison-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Jun 24, 2017

cmsbuild added tests-started and removed tests-pending labels Jun 24, 2017

cmsbuild added comparison-notrun tests-rejected and removed comparison-pending tests-started labels Jun 24, 2017

perrotta reviewed Jun 24, 2017

cmsbuild added comparison-pending tests-pending and removed comparison-notrun tests-rejected labels Jun 24, 2017

cmsbuild removed the tests-pending label Jun 24, 2017

cmsbuild added the tests-started label Jun 24, 2017

cmsbuild added tests-approved and removed tests-started labels Jun 24, 2017

cmsbuild added comparison-available and removed comparison-pending labels Jun 24, 2017

cmsbuild added fully-signed reconstruction-approved and removed pending-signatures reconstruction-pending labels Jun 24, 2017

slava77 mentioned this pull request Jun 24, 2017

RU CSC segment builder parameters initialization fix #19081

Closed

cmsbuild added orp-approved and removed orp-pending labels Jun 25, 2017

cmsbuild merged commit 59a599d into cms-sw:master Jun 25, 2017

slava77 mentioned this pull request Jun 28, 2017

Muon does not reproduce #18605

Closed
