Fix Phase2 memory corruption #16696
A new Pull Request was created by @kpedro88 (Kevin Pedro) for CMSSW_9_0_X.
It involves the following packages:
RecoParticleFlow/PFTracking
@cmsbuild, @cvuosalo, @slava77, @monttj, @davidlange6 can you please review it and eventually sign? Thanks.
cms-bot commands are listed here #13028
@cmsbuild please test
The tests are being triggered in jenkins.
Comparison job queued.
+1
@kpedro88: Why does this fix make small changes to tracking, jet, and tagging quantities? Changes are appearing in the Jenkins DQM plots.
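@cvuosalo `GenericBinFinderInZ` is used in Phase2 tracking code. Before this fix, it could try to access uninitialized memory (`reserve`d but not `push_back`d). I presume the use of uninitialized values happened fairly infrequently or there would have been obvious physics performance indicators.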
On 11/23/16 4:34 PM, Kevin Pedro wrote:
> @cvuosalo <https://github.com/cvuosalo> `GenericBinFinderInZ` is used in
> Phase2 tracking code. Before this fix, it could try to access
> uninitialized memory (`reserve`d but not `push_back`d). I presume the
> use of uninitialized values happened fairly infrequently or there would
> have been obvious physics performance indicators.
`GenericBinFinderInZ` has been used in muon code for a long time (since CMSSW_1).
Did you find a bug, or is this a kludge that works around a problem
originating somewhere else?
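It looks like an actual bug to me.
`theNbins( last-first)`
`theBins.reserve(theNbins);`
`for (ConstItr i=first; i<last-1; i++) {`
With that setup, `theBins` reserves 1 more entry than gets filled (`last-1` is never pushed back). `GenericBinFinderInZ::binIndex(T z)` can return any index up through `theNbins-1`, which can then be used with `GenericBinFinderInZ::binPosition(int ind)` to access values in `theBins`. `theBins[theNbins-1]` is uninitialized (according to both the above logic and valgrind).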
On 11/23/16 4:58 PM, Kevin Pedro wrote:
> It looks like an actual bug to me.
>
> `theNbins( last-first)`
> `theBins.reserve(theNbins);`
> `for (ConstItr i=first; i<last-1; i++) {`
>
> With that setup, `theBins` reserves 1 more entry than gets filled
> (`last-1` is never pushed back). `GenericBinFinderInZ::binIndex(T z)`
> can return any index up through `theNbins-1`, which can then be used
> with `GenericBinFinderInZ::binPosition(int ind)` to access values in
> `theBins`. `theBins[theNbins-1]` is uninitialized (according to both the
> above logic and valgrind).
OK.
It looks like the muon code never used the `binPosition` method, so it never
stumbled on the `theBins` size mismatch.
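The size mismatch discussed above can be reproduced in isolation. The following is a minimal sketch, not the CMSSW code: it assumes plain `double` z positions instead of detector-layer iterators, and `fillBins` and `isFilled` are hypothetical names introduced only for illustration. The `fillLast` branch shows one possible remedy (pushing back the final entry); it is an assumption and not necessarily the change made in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the quoted constructor logic, using plain z
// positions. With fillLast == false the loop stops before the last element,
// as in the quoted snippet.
std::vector<double> fillBins(const std::vector<double>& z, bool fillLast) {
  assert(!z.empty());
  std::vector<double> bins;
  bins.reserve(z.size());  // allocates room for z.size() entries; size() stays 0
  for (auto i = z.begin(); i < z.end() - 1; ++i) {
    bins.push_back(*i);    // only z.size() - 1 entries are ever pushed
  }
  if (fillLast) {
    bins.push_back(z.back());  // assumed remedy: also fill the last bin
  }
  return bins;
}

// True if ind refers to an element that was actually pushed back.
// Indexing at or beyond size() is undefined behaviour, even when the
// storage has been reserved.
bool isFilled(const std::vector<double>& bins, int ind) {
  return ind >= 0 && static_cast<std::size_t>(ind) < bins.size();
}
```

With four input positions, `fillBins(z, false)` returns a vector of size 3, so index 3 (the analogue of `theNbins-1`) is out of range; `fillBins(z, true)` returns size 4 and index 3 is valid.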
Tests in progress...
+1
Fixing two memory corruption problems seen in Phase 2 testing. #16697 is the 81X version of this PR.
The code changes are satisfactory. Jenkins tests against baseline CMSSW_9_0_X_2016-11-19-1100 show no significant differences, except in Phase 2 workflows, where there are numerous tiny, insignificant differences.
A test of workflow 1321.0_SingleMuPt100 with 1000 events against baseline CMSSW_9_0_X_2016-11-19-1100 shows no significant differences, while a test of Phase 2 workflow 23234.0_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D5_GenSimHLBeamSpotFull14+DigiFull_2023D5 with 150 events against baseline CMSSW_9_0_X_2016-11-21-1400 shows numerous tiny, insignificant differences in downstream Reco quantities related to jets, tracks, and vertices.
This pull request is fully signed and it will be integrated in one of the next CMSSW_9_0_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar
This PR addresses two of the memory corruption issues in Phase2 workflows, found by valgrind and noted in #16493 (which contains extensive discussion of the causes).