New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing memory leak in new Tracking AlCa producer #24626
Fixing memory leak in new Tracking AlCa producer #24626
Conversation
The code-checks are being triggered in jenkins. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-24626/6529 |
A new Pull Request was created by @Sam-Harper (Sam Harper) for master. It involves the following packages: Calibration/TkAlCaRecoProducers @cmsbuild, @franzoni, @pohsun, @tocheng, @lpernie can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
The tests are being triggered in jenkins. |
@@ -108,22 +107,20 @@ void CalibrationTrackSelectorFromDetIdList::produce(edm::Event& iEvent, const ed | |||
} | |||
|
|||
// here there will be the selection | |||
hits.push_back(hit->clone()); | |||
hits.emplace_back(hit->clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @Sam-Harper @mmusich - this copy is not needed as the hits are again copied later in the code. perhaps just a std::vector<const TrackingRecHit*> filled with 'hit' rather than 'hit->clone()' would be a more efficient fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I'm not sure I follow. I dont see where it copies. I can see places where it looks like it copies but doesnt. ownHits is a edm::OwnVector which we pass pointers into (and their ownership) not a vector. And when we create the TrackCandidate its also not a copy, it moves the own vector into it (well swaps it). Is it due to the return of that function, I though return value optimization would take care of that although I'm far from an expert in these things.
Regardless which of us is correct, you raise a good point, we should have a vector of pointers here and only if we actually save a track should we start making copies, as its coded now its inefficient. I think I'll leave that as an exercise for @mmusich, my involvement is solely a minimal fix for the problems it was causing EGamma (well for CMS but Egamma was the canary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder who really is aware of the four(!) different semantics of OwnVector::push_back
(I was not).
https://cmssdt.cern.ch/dxr/CMSSW/source/DataFormats/Common/interface/OwnVector.h#288
In my opinion they are just confusing and obfuscating...
OwnVector is a remnant from the time when OO was supposed to "hide complexity" behind a common "simple", "natural" interface: failed! :-(look to its iterator)
anyhow this is how history goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed up at #24759
Comparison job queued. |
Comparison is ready Comparison Summary:
|
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2) |
@Sam-Harper thanks for the fix, I will move forward with this PR so as to avoid the problem in 10_3_0_pre5, and let @mmusich in case work further on the code optimization, if deemed necessary |
+1 |
No problem and a sensible strategy. Thanks! Btw as an aside my test job of one problematic file ended up consuming 32 gb of memory without this fix and 11gb with. |
Dear All,
The EGamma 2018 RelVal workflows have been crashing due to jobs existing their memory allocations since pre3. After investigation from the EGamma group, we have discovered that CalibrationTrackSelectorFromDetIdList (newly introduced for pre3) has a large memory leak where it leaks the hits it copies if it does not decide to save the track. For example after 1850 events for one of the problematic jobs, this module is responsible for 20% of the live memory.
http://sharper.web.cern.ch/sharper/cgi-bin/igprof-navigator/egMem1030pre3Live1851
With this PR, this consumption is reduced to a level below the measurable threshold.
http://sharper.web.cern.ch/sharper/cgi-bin/igprof-navigator/egMem1030pre3FixedLive1851
This PR fixes the memory leak in CalibrationTrackSelectorFromDetIdList by using smart rather than bare pointers. Could I request that the AlCa group encourages new code to follow this approach in the future, its good practice in modern c++ to do use smart pointers where possible and saves effort (such as this) in the long run.
More details:
https://sharper.web.cern.ch/sharper/cms/talks/egammaRelValMemLeak.pdf
On testing:
I have only tested that it fixes the memory leak, I did not look at the output of the module to ensure it was still the same. I would be surprised if it changed due to these technical changes but just to warn that bit hasnt been tested. If in few days my jobs finish successfully (ie they arent killed or anything) I can supply the relevant output root files if requested too but otherwise I dont intend to do anything more
Best,
Sam