
Fix Phase2 memory corruption #16696

Merged
merged 2 commits into from Nov 26, 2016


kpedro88
Contributor

This PR addresses two of the memory corruption issues in Phase2 workflows, found by valgrind and noted in #16493 (which contains extensive discussion of the causes).

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for CMSSW_9_0_X.

It involves the following packages:

RecoParticleFlow/PFTracking
Utilities/BinningTools

@cmsbuild, @cvuosalo, @slava77, @monttj, @davidlange6 can you please review it and eventually sign? Thanks.
@mmarionncern, @rafaellopesdesa, @wddgit, @lgray, @Martin-Grunewald, @cbernet, @bachtis this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@kpedro88
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

cmsbuild commented Nov 20, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/16492/console

@cmsbuild
Contributor

Comparison job queued.

@monttj
Contributor

monttj commented Nov 21, 2016

+1

@cvuosalo
Contributor

@kpedro88: Why does this fix make small changes to tracking, jet, and tagging quantities? Changes are appearing in the Jenkins DQM plots.

@kpedro88
Contributor Author

@cvuosalo GenericBinFinderInZ is used in Phase2 tracking code. Before this fix, it could try to access uninitialized memory (reserved but never filled with push_back). I presume the use of uninitialized values happened fairly infrequently, or there would have been obvious physics performance indicators.
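
To illustrate the "reserved but not pushed back" distinction, here is a minimal standalone sketch. It is not CMSSW code: `reserveThenFill` and `isValidIndex` are hypothetical helpers made up for this example. The point is only that `reserve` grows capacity without creating elements, so an index at or beyond `size()` does not refer to an initialized value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from CMSSW): reserve room for n values,
// but append only `filled` of them.
std::vector<float> reserveThenFill(std::size_t n, std::size_t filled) {
  std::vector<float> v;
  v.reserve(n);  // capacity() becomes >= n, size() stays 0
  for (std::size_t i = 0; i < filled && i < n; ++i) {
    v.push_back(static_cast<float>(i));  // only push_back creates elements
  }
  return v;
}

// Hypothetical helper: an index is safe to read only if it is below size(),
// regardless of how much capacity was reserved.
bool isValidIndex(const std::vector<float>& v, std::size_t i) {
  return i < v.size();
}
```

With `reserveThenFill(4, 3)`, indices 0 through 2 are valid and index 3 is not, even though storage for it was reserved. Reading it through `operator[]` is undefined behavior, which is the kind of access valgrind reports.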

@slava77
Contributor

slava77 commented Nov 24, 2016 via email

@kpedro88
Contributor Author

It looks like an actual bug to me.

theNbins( last-first)
theBins.reserve(theNbins);
for (ConstItr i=first; i<last-1; i++) {

With that setup, theBins reserves one more entry than gets filled (the element at last-1 is never pushed back). GenericBinFinderInZ::binIndex(T z) can return any index up through theNbins-1, which can then be passed to GenericBinFinderInZ::binPosition(int ind) to access values in theBins. theBins[theNbins-1] is therefore uninitialized (according to both the above logic and valgrind).
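
The off-by-one can be reproduced in a simplified form. The sketch below is an assumption-laden stand-in, not the actual GenericBinFinderInZ: `BinCenters`, `buggy`, and `filled` are hypothetical names, a plain `std::vector<float>` of z positions replaces the iterator range, and `filled` shows one possible repair rather than necessarily the change made in this PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Simplified, hypothetical stand-in for the bin-finder constructor logic.
// `zs` plays the role of the z positions in the range [first, last).
struct BinCenters {
  std::vector<float> bins;

  // Mirrors the quoted pattern: reserve N entries, push back only N-1.
  static BinCenters buggy(const std::vector<float>& zs) {
    BinCenters b;
    b.bins.reserve(zs.size());
    for (std::size_t i = 0; i + 1 < zs.size(); ++i) {
      b.bins.push_back(zs[i]);  // the last position is never stored
    }
    return b;
  }

  // One possible repair (an assumption): store every position.
  static BinCenters filled(const std::vector<float>& zs) {
    BinCenters b;
    b.bins.reserve(zs.size());
    for (float z : zs) {
      b.bins.push_back(z);
    }
    return b;
  }

  // Bounds-checked access: at() throws std::out_of_range instead of
  // silently reading past size() the way operator[] would.
  float binPosition(std::size_t ind) const { return bins.at(ind); }
};
```

For three input positions, `buggy` leaves only two stored bins, so asking for the last bin index (2) throws here, whereas an unchecked `operator[]` would read the reserved but uninitialized slot. `filled` stores all three and returns the last position as expected.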

@slava77
Contributor

slava77 commented Nov 24, 2016 via email

@cvuosalo
Contributor

Tests in progress...

@cvuosalo
Contributor

+1

For #16696 4ce4f0f

Fixing two memory corruption problems seen in Phase 2 testing.

#16697 is the 81X version of this PR.

The code changes are satisfactory. Jenkins tests against baseline CMSSW_9_0_X_2016-11-19-1100 show no significant differences, except in Phase 2 workflows, where there are numerous tiny, insignificant differences. A test of workflow 1321.0_SingleMuPt100 with 1000 events against baseline CMSSW_9_0_X_2016-11-19-1100 shows no significant differences, while a test of Phase 2 workflow 23234.0_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D5_GenSimHLBeamSpotFull14+DigiFull_2023D5 with 150 events against baseline CMSSW_9_0_X_2016-11-21-1400 shows numerous tiny, insignificant differences in downstream Reco quantities related to jets, tracks, and vertices.

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_9_0_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar

@davidlange6 davidlange6 merged commit 6bbf43f into cms-sw:CMSSW_9_0_X Nov 26, 2016

6 participants