Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

attempt to fix SiStrip DQM reproducibility issue: allign members common to root and full c++ #20050

Merged

Conversation

slava77
Copy link
Contributor

@slava77 slava77 commented Aug 3, 2017

DPGAnalysis/SiStripTools/interface/Multiplicities.h has different class data member content depending on the compilation mode (used for dictionary generation or for execution in algorithms at run time). This PR makes the layout for common data members to be the same.
This should hopefully resolve https://its.cern.ch/jira/browse/CMSTRACK-151

This is meant to solve the following problem
We frequently have SiStrip DQM plots coming and going altogether in AlCaReco/Run summary/SiStrip/MechanicalView folder.
These are produced by SiStripCalZeroBiasMonitorCluster, which is running in pathALCARECOSiStripCalZeroBias right after LargeSiStripClusterEvents.
The reason for empty plots is that an otherwise the same setup can have this module running when needed or not running at all by simply rerunning cmsRun with the same config.

The most direct reason for SiStripCalZeroBiasMonitorCluster to be on/off is that LargeSiStripClusterEvents filter can be on or off (while other modules preceding in the path appear to give consistently reproducible results).

I was not able to get a setup that leads to a reproducible bad behavior to be able to confirm from first principles where the problem is.
The only meaningful reason that I could see for this behavior was in this duality of the data layout in SingleMultiplicity class.
Note that the problem did not show up at noticeable level until December 9 2016,
even though the relevant change in the member data layout happened in 2014 in #3094.
Most likely the appearance of the issue is related to #16821 merged Dec 5.

The evidence of a fix to the problem is based on rerunning [extended] short matrix several times and on different machines to have better sampling using CMSSW_9_3_0_pre3 as the baseline.

  • the baseline setup has workflows with differences corresponding to SiStripCalZeroBiasMonitorCluster going off
  • the baseline+this PR has no workflows with differences corresponding to SiStripCalZeroBiasMonitorCluster going off
  • comparisons of baseline with "baseline+thisPR" show workflows where the baseline histograms are empty (SiStripCalZeroBiasMonitorCluster is off) while they are filled in "baseline+thisPR", which is the expected behavior of the fix.

A better solution is to properly decouple classes defined in DPGAnalysis/SiStripTools/interface/Multiplicities.h to be data formats and update the logic of ByMultiplicityEventFilter.
This will also remove the self-aware data product violation mentioned already in #16821 (comment)

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 3, 2017

A new Pull Request was created by @slava77 (Slava Krutelyov) for master.

It involves the following packages:

DPGAnalysis/SiStripTools

@cmsbuild, @monttj can you please review it and eventually sign? Thanks.
@ebrondol, @threus, @mmusich, @venturia this is something you requested to watch as well.
@davidlange6 you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor Author

slava77 commented Aug 3, 2017

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 3, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/22075/console Started: 2017/08/03 20:57

@slava77
Copy link
Contributor Author

slava77 commented Aug 3, 2017

@Dr15Jones please check this to perhaps confirm that the old code can have ill-defined behavior (e.g. ::mult method returning inappropriate data member/memory address value depending on the order of loaded libraries or some other external factors)
while the updated version should make it more stable

@Dr15Jones
Copy link
Contributor

Breaking the C++ 'one definition' rule is extremely tricky to determine what might happen. If any of these classes were used in a container or C array, bad things could happen.

Exactly what problem was this conditional compilation attempting to fix?

@Dr15Jones
Copy link
Contributor

If this object is used as a member data to another object (e.g. edm::Wrapper<>) then the padding will be different and things would also break.

@slava77
Copy link
Contributor Author

slava77 commented Aug 3, 2017 via email

@slava77
Copy link
Contributor Author

slava77 commented Aug 3, 2017 via email

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 3, 2017

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 3, 2017

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 3, 2017

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20050/22075/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 25
  • DQMHistoTests: Total histograms compared: 2643105
  • DQMHistoTests: Total failures: 44402
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2598522
  • DQMHistoTests: Total skipped: 181
  • DQMHistoTests: Total Missing objects: 0
  • Checked 102 log files, 14 edm output root files, 25 DQM output files

@slava77
Copy link
Contributor Author

slava77 commented Aug 3, 2017

@davidlange6
Copy link
Contributor

merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants