Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending the Phase2 2023D4 Workflow to Reco (up to Harvesting) steps #16407

Merged
merged 3 commits into from Nov 8, 2016

Conversation

boudoul
Copy link
Contributor

@boudoul boudoul commented Nov 1, 2016

Titles says it all

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2016

A new Pull Request was created by @boudoul (boudoul) for CMSSW_8_1_X.

It involves the following packages:

Configuration/PyReleaseValidation
RecoPixelVertexing/PixelTriplets

@cvuosalo, @fabozzi, @cmsbuild, @srimanob, @slava77, @hengne, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @makortel, @GiacomoSguazzoni, @rovere, @VinInn, @Martin-Grunewald, @dgulhan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@boudoul
Copy link
Contributor Author

boudoul commented Nov 1, 2016

Adding few watchers not in the usual list : @kpedro88 @delaere @atricomi @ebrondol

@boudoul
Copy link
Contributor Author

boudoul commented Nov 1, 2016

Tested with one of the 2024D4 WF (number 21200 )

@slava77
Copy link
Contributor

slava77 commented Nov 1, 2016

@boudoul maybe we should add D4 to the short matrix already

@slava77
Copy link
Contributor

slava77 commented Nov 1, 2016

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/16116/console

@boudoul
Copy link
Contributor Author

boudoul commented Nov 1, 2016

Hi @slava77 , 2023D4 (212XX) is already in short matrix
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/Configuration/PyReleaseValidation/python/relval_2023.py#L21
Unless I misunderstood your request ?

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2016

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2016

Comparison job queued.

@slava77
Copy link
Contributor

slava77 commented Nov 1, 2016

On 11/1/16 8:14 AM, boudoul wrote:

Hi @slava77 https://github.com/slava77 , 2023D4 (212XX) is already in
short matrix
https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/Configuration/PyReleaseValidation/python/relval_2023.py#L21
Unless I misunderstood your request ?

only 20024 and 22424 are in the short matrix

The list is controlled in
Configuration/PyReleaseValidation/scripts/runTheMatrix.py


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16407 (comment), or
mute the thread
https://github.com/notifications/unsubscribe-auth/AEdcbnrBSE0HmFzc8yIYGbRGIpjxgKo5ks5q51dBgaJpZM4KmHEy.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2016

Pull request #16407 was updated. @cvuosalo, @fabozzi, @cmsbuild, @srimanob, @slava77, @hengne, @davidlange6 can you please check and sign again.

@boudoul
Copy link
Contributor Author

boudoul commented Nov 1, 2016

thanks @slava77 I forgot to look at this script - done in my last commit

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2016

@slava77
Copy link
Contributor

slava77 commented Nov 1, 2016

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/16130/console

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 3, 2016

Pull request #16407 was updated. @cvuosalo, @fabozzi, @cmsbuild, @srimanob, @slava77, @hengne, @davidlange6 can you please check and sign again.

@boudoul
Copy link
Contributor Author

boudoul commented Nov 3, 2016

Rebase done

@slava77
Copy link
Contributor

slava77 commented Nov 3, 2016

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 3, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/16177/console

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 3, 2016

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 3, 2016

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 3, 2016

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-16407/16177/summary.html

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • 23224.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2023D5_GenSimHLBeamSpotFull+DigiFull_2023D5+RecoFullGlobal_2023D5+HARVESTFullGlobal_2023D5

@slava77
Copy link
Contributor

slava77 commented Nov 4, 2016

@boudoul @ebrondol
is 810pre15 GEN-SIM still usable for this D4 workflow?

I'm not sure if there were any conditions [as beamspot] or geometry changes that would make this inconsistent.
I'm trying to check this, but it looks like some tracking-related modules may be taking much more time that I used to see, suggestive of some issues.

@ebrondol
Copy link
Contributor

ebrondol commented Nov 4, 2016

@slava77 I would näively think so, but I do not know for sure.

@boudoul
Copy link
Contributor Author

boudoul commented Nov 4, 2016

Hi @slava77 , geometry has changed since pre15 (at least for the tracker, no idea from beamspot etc..) but I would encourage to make test with a fresh geometry

@slava77
Copy link
Contributor

slava77 commented Nov 4, 2016

On 11/4/16 7:37 AM, boudoul wrote:

Hi @slava77 https://github.com/slava77 , geometry has changed since
pre15 (at least for the tracker, no idea from beamspot etc..) but I
would encourage to make test with a fresh geometry

OK.
Makes sense then.

With pre15 GEN-SIM I see
in CMSSW_8_1_X_2016-11-03-1100 with this PR pixelTracks taking
680s/event in D4 ttbar with PU200

compared to pre15 D3Timing ttbar with PU200 pixelTracks taking 5.3 s/event

Let's hope this was the geometry etc mismatch rather than the actual
realistic timing.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16407 (comment), or
mute the thread
https://github.com/notifications/unsubscribe-auth/AEdcbv1C9s7N-gStBB5OzIYtD-z0PZcKks5q60NCgaJpZM4KmHEy.

@slava77
Copy link
Contributor

slava77 commented Nov 5, 2016

well, after redoing GEN-SIM from scratch, timing is still just about as bad.

@slava77
Copy link
Contributor

slava77 commented Nov 7, 2016

+1

for #16407 653d756

  • code changes are as described
  • jenkins tests pass: new 21224.0 2023D4 workflow shows up in the short matrix

Here are some notes from comparing 2023D3Timing to 2023D4 ttbar PU200 in CMSSW_8_1_X_2016-11-03-1100

  • time per event goes up from 5.7E3 to 8.0E3 seconds per event. Most of the increase is in the reco modules.
  • conversionStepTracks are missing in D4 because all upstream photonConvTrajSeedFromSingleLeg fail due to cluster multiplicity check

The top modules by CPU fractional change are (I'm ignoring timing-specific modules, which are not relevant for this comparison)

 delta/average    delta/jobAverage   D3T            D4           name
   +1.906519      +0.02%        34.16 ms/ev ->      1427.57 ms/ev dedxTruncated40
   +1.874353      +0.02%        33.66 ms/ev ->      1037.84 ms/ev dedxHarmonic2
   +1.451549      +0.01%        91.03 ms/ev ->       572.85 ms/ev phase2ITPixelClusters

   +1.214537      +1.69%     30922.40 ms/ev ->    126551.00 ms/ev initialStepTrackCandidates
   +1.173648      +5.99%    119261.00 ms/ev ->    458028.00 ms/ev lowPtTripletStepTrackCandidates
   +1.066569      +7.13%    176446.00 ms/ev ->    579672.00 ms/ev pixelTracks
   +1.057395      +0.39%      9958.29 ms/ev ->     32300.30 ms/ev unsortedOfflinePrimaryVertices
   +0.841172      +0.26%      9981.18 ms/ev ->     24471.50 ms/ev particleFlowDisplacedVertex
   +0.648147      +4.65%    274368.00 ms/ev ->    537460.00 ms/ev pfTrackElec
   +0.333595      +7.43%   1049960.00 ms/ev ->   1470340.00 ms/ev ecalDrivenElectronSeeds
   +0.305382      +1.48%    232413.00 ms/ev ->    316178.00 ms/ev lowPtQuadStepSeeds
   +0.297064      +6.91%   1119740.00 ms/ev ->   1510400.00 ms/ev allConversions
   +0.275092      +1.21%    214635.00 ms/ev ->    283096.00 ms/ev muons1stStep

#and some reductions
   -0.984058      -0.01%       562.27 ms/ev ->       191.43 ms/ev pixelPairStepTracks
   -0.619127      -2.40%    287552.00 ms/ev ->    151605.00 ms/ev highPtTripletStepTrackCandidates
   -0.482136      -0.29%     41950.20 ms/ev ->     25653.20 ms/ev detachedQuadStepTrackCandidates

The time spent in electron and conversion tracking is problematic. @lgray @fcouderc mostly on the side of the total, rather than a change from this PR.
Different tracker layout appears to be the main reason for the CPU increase. Looking at some performance plots, there are improvements in tracking with D4 (red) vs D3 (black)

wf2023pu200d3tvd4_general_eff_vs_eta

wf2023pu200d3tvd4_general_eff_vs_pt

wf2023pu200d3tvd4_general_layers_v_eta

wf2023pu200d3tvd4_general_losthit_vs_eta

wf2023pu200d3tvd4_general_pulldz_vs_eta

wf2023pu200d3tvd4_general_pulltheta_vs_eta

There is also increase in duplicate and fake rate and the uncertainties appear to be overestimated (chi2 is small)
wf2023pu200d3tvd4_general_chi2_v_eta
wf2023pu200d3tvd4_general_dup_vs_eta
wf2023pu200d3tvd4_general_dup_vs_pt
wf2023pu200d3tvd4_general_fr_vs_pt
wf2023pu200d3tvd4_general_fr_vs_eta

@davidlange6
Copy link
Contributor

Hi all -should we go ahead and merge this? Presumably the CPU issues will be addressed in later PRs now that the workflow is defined

@ebrondol
Copy link
Contributor

ebrondol commented Nov 7, 2016

Hi @slava77, just to point out that some of these results (such as fakerate) will be already a bit better with the introduction of the PR#16470. Thanks!

@slava77
Copy link
Contributor

slava77 commented Nov 7, 2016

On 11/7/16 3:23 PM, David Lange wrote:

Hi all -should we go ahead and merge this? Presumably the CPU issues
will be addressed in later PRs now that the workflow is defined

makes sense to me.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16407 (comment), or
mute the thread
https://github.com/notifications/unsubscribe-auth/AEdcbvcE0a3vDcFOVbOwVv2BrbNHk7Vtks5q7zRigaJpZM4KmHEy.

@slava77
Copy link
Contributor

slava77 commented Nov 7, 2016

On 11/7/16 3:25 PM, ebrondol wrote:

Hi @slava77 https://github.com/slava77, just to point out that some of
these results (such as fakerate) will be already a bit better with the
introduction of the PR#16470. Thanks!

Will the CPU use get even worse?

IIUC, 16470 increases the number of layer connections in the navigation


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16407 (comment), or
mute the thread
https://github.com/notifications/unsubscribe-auth/AEdcblFP8roSpBRaZ7L2Qw1fec1Qu6g2ks5q7zS8gaJpZM4KmHEy.

@ebrondol
Copy link
Contributor

ebrondol commented Nov 7, 2016

@slava77 The layer connections, i.e. the navigation school, should stay exactly the same before and after the PR#16470. I have just checked with the analyzer RecoTracker/TkNavigation/test/NavigationSchoolAnalyzer_cfg.py. This does not exclude that other factors can make the CPU worse..

@davidlange6 davidlange6 merged commit 355c12a into cms-sw:CMSSW_8_1_X Nov 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants