Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trajectory cleanup #3010

Closed
wants to merge 34 commits into from
Closed

Conversation

VinInn
Copy link
Contributor

@VinInn VinInn commented Mar 25, 2014

Pure technical changes to cleanup Trajectory and TrackCandidate memory churn.
Still a lot can be done (for instance in Conversion code).
More will be done once fitters will be migrated to use standard TRH instead of TTRH.

In any case it is a sort of lost battle as classes whose objects are stored in vector in the event cannot be made non-copiable due to Reflex. Will try again with Root6.

No regression expected.
No regression observed.

bapavlov and others added 29 commits March 11, 2014 20:02
@cmsbuild
Copy link
Contributor

-1
When I ran the RelVals I found an error in the following worklfows:
4.22 step2

runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step2_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log

1000.0 step2

runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log

1001.0 step2

runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD.log

1003.0 step2

runTheMatrix-results/1003.0_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM/step2_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM.log

4.53 step3

runTheMatrix-results/4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODreHLT+HARVESTDreHLT/step3_RunPhoton2012B+RunPhoton2012B+HLTD+RECODreHLT+HARVESTDreHLT.log

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3010/643/summary.html

@VinInn
Copy link
Contributor Author

VinInn commented Mar 25, 2014

ok, strange but possible given the trace back (looks like a double delete)
I run myself limited matrix and succeeded

4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED - time date Tue Mar 25 10:05:34 2014-date Tue Mar 25 10:03:11 2014; exit: 0 0 0 0 0
4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODreHLT+HARVESTDreHLT Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED - time date Tue Mar 25 10:14:22 2014-date Tue Mar 25 10:03:15 2014; exit: 0 0 0 0
5.1_TTbar+TTbarFS+HARVESTFS Step0-PASSED Step1-PASSED - time date Tue Mar 25 10:05:09 2014-date Tue Mar 25 10:03:17 2014; exit: 0 0
8.0_BeamHalo+BeamHaloINPUT+DIGICOS+RECOCOS+ALCABH+HARVESTCOS Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED - time date Tue Mar 25 10:06:55 2014-date Tue Mar 25 10:03:20 2014; exit: 0 0 0 0 0
25.0_TTbar+TTbarINPUT+DIGI+RECO+HARVEST+ALCATT Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED - time date Tue Mar 25 10:15:22 2014-date Tue Mar 25 10:05:14 2014; exit: 0 0 0 0 0
1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED - time date Tue Mar 25 10:10:43 2014-date Tue Mar 25 10:05:35 2014; exit: 0 0 0 0 0
1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED - time date Tue Mar 25 10:10:12 2014-date Tue Mar 25 10:06:59 2014; exit: 0 0 0 0
7 7 6 6 4 tests passed, 0 0 0 0 0 failed

will check again,
checkdeps -a again
and see what's wrong

@VinInn
Copy link
Contributor Author

VinInn commented Mar 25, 2014

not sure what I could ever done to crash in PF though
#17 0x00007fa35bd55f9d in KDTreeLinkerTrackEcal::buildTree() () from /build/cmsbuild/jenkins-workarea/workspace/ib-integration-CMSSW_7_1_X-slc6_amd64_gcc481/CMSSW_7_1_X_2014-03-25-0200/lib/slc6_amd64_gcc481/libRecoParticleFlowPFProducer.so
#18 0x00007fa35bd537da in KDTreeLinkerBase::process() () from /build/cmsbuild/jenkins-workarea/workspace/ib-integration-CMSSW_7_1_X-slc6_amd64_gcc481/CMSSW_7_1_X_2014-03-25-0200/lib/slc6_amd64_gcc481/libRecoParticleFlowPFProducer.so
#19 0x00007fa35bd7b60c in PFBlockAlgo::findBlocks() () from /build/cmsbuild/jenkins-workarea/workspace/ib-integration-CMSSW_7_1_X-slc6_amd64_gcc481/CMSSW_7_1_X_2014-03-25-0200/lib/slc6_amd64_gcc481/libRecoParticleFlowPFProducer.so
#20 0x00007fa35be53666 in PFBlockProducer::produce(edm::Event&, edm::EventSetup const&) () from /build/cmsbuild/jenkins-workarea/workspace/ib-integration-CMSSW_7_1_X-slc6_amd64_gcc481/CMSSW_7_1_X_2014-03-25-0200/lib/slc6_amd64_gcc481/pluginRecoParticleFlowPFProducerPlugins.so

@VinInn
Copy link
Contributor Author

VinInn commented Mar 25, 2014

Sorry, I cannot reproduce the segfaults in my development area based on
/afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc490/cms/cmssw/CMSSW_7_1_X_2014-03-21-0200/

@VinInn
Copy link
Contributor Author

VinInn commented Mar 25, 2014

did
cmsrel CMSSW_7_1_X_2014-03-25-1400
cd CMSSW_7_1_X_2014-03-25-1400
554 18:42 cmsenv
556 18:43 cd src
559 18:44 git cms-merge-topic VinInn:TrajectoryCleanup
560 19:01 scram b -j 8
runTheMatrix.py --useInput=all --list=limited > & lim.log &
7 7 6 6 4 tests passed, 0 0 0 0 0 failed

@cmsbuild
Copy link
Contributor

-1
When I ran the RelVals I found an error in the following worklfows:
4.22 step2

runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step2_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log

1000.0 step2

runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log

1001.0 step2

runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVD.log

1003.0 step2

runTheMatrix-results/1003.0_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM/step2_RunMinBias2012A+RunMinBias2012A+RECODDQM+HARVESTDDQM.log

4.53 step3

runTheMatrix-results/4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODreHLT+HARVESTDreHLT/step3_RunPhoton2012B+RunPhoton2012B+HLTD+RECODreHLT+HARVESTDreHLT.log

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3010/650/summary.html

@VinInn
Copy link
Contributor Author

VinInn commented Mar 26, 2014

I can reproduce the crash with gcc4.8.1
this points to a difference in behavior, between 4.8.1 and 4.9.0, of either move semantics or atomics: most probably a combination of both.
Valgrind is running.

If I cannot find a fix (and a reasonable justification) in a day or so, I will cherry pick few urgent mods that are in here and make a different pull request with those.

@VinInn
Copy link
Contributor Author

VinInn commented Mar 26, 2014

so Valgrind on gcc490 gives no error of any sort
valgrind on gcc481 produces

==2917== Invalid read of size 8
==2917==    at 0x32DB5562: Trajectory::lastMeasurement() const (intrusive_ptr.hpp:150)
==2917==    by 0x32DB3516: CosmicMuonTrajectoryBuilder::buildSecondHalf(Trajectory&) (CosmicMuonTrajectoryBuilder.cc:573)
==2917==    by 0x32DB48CB: CosmicMuonTrajectoryBuilder::trajectories(TrajectorySeed const&) (CosmicMuonTrajectoryBuilder.cc:367)
==2917==    by 0x15CC35C8: MuonTrackFinder::reconstruct(edm::Handle<edm::View<TrajectorySeed> > const&, edm::Event&) (MuonTrackFinder.cc:95)

as predicted an intrusive ref-counting that is not doing its job...

we do not define constructors and assignment for
ConstReferenceCountingPointer
we rely on the behavior of boost::intrusive_ptr
(have to get rid of those!)
so most probably move (in gcc4.8) decrement the counter

@cmsbuild
Copy link
Contributor

Pull request #3010 was updated. @civanch, @Dr15Jones, @danduggan, @ianna, @mdhildreth, @cmsbuild, @anton-a, @thspeer, @rovere, @deguio, @slava77, @Degano, @ojeda, @ktf, @nclopezo can you please check and sign again.

@@ -0,0 +1,9 @@
import FWCore.ParameterSet.Config as cms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VinInn - could you, please, delete this file from PR. It looks like this file is misplaced. There is another one like that in Configuration/Geometry already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge/rebase mess.
will try to rebase.
thanks for spotting this...

@VinInn VinInn closed this Mar 27, 2014
@VinInn VinInn deleted the TrajectoryCleanup branch July 13, 2016 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants