Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update mkFit for CMSSW_12_2_0_pre3 #36246

Merged
merged 1 commit into from Nov 30, 2021

Conversation

mmasciov
Copy link
Contributor

@mmasciov mmasciov commented Nov 25, 2021

PR description:

This PR follows #35974 and updates mkFit for CMSSW_12_2_0_pre3.

In detail, this PR:

It requires cms-sw/cmsdist#7465 and cms-data/RecoTracker-MkFit#8.

Notes on physics performance (as from PR validation below):

  • the effect of this (set) of PR(s) on track reconstruction efficiency is negligible;
  • fake+duplicate tracks are reduced in endcaps;
  • the average number of pixel layers and pixel hits per track and the average number of strip hits are higher;
  • as a result, track resolutions are improved.

Additional note: cms-sw/cmsdist#7465 was updated on November 28th, with a new mkFit release containing an additional mkFit-internal technical update (physics is unchanged) that allows for an additional speedup in track building.

PR validation:

Full MTV results, including cms-sw/cmsdist#7465 and cms-data/RecoTracker-MkFit#8:

FYI, @mmusich @vmariani

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36246/26886

  • This PR adds an extra 12KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @mmasciov (Mario Masciovecchio) for master.

It involves the following packages:

  • RecoTracker/MkFit (reconstruction)

@jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @ebrondol, @gpetruc, @mmusich, @mtosi, @dgulhan this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor

@cmsbuild, please test with cms-sw/cmsdist#7465, cms-data/RecoTracker-MkFit#8

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b97bc9/20753/summary.html
COMMIT: bd77b89
CMSSW: CMSSW_12_2_X_2021-11-24-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/36246/20753/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b97bc9/20753/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b97bc9/20753/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 17000 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 3247025
  • DQMHistoTests: Total failures: 26334
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3220669
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jpata
Copy link
Contributor

jpata commented Nov 25, 2021

should we run this with profiling, to check if we can observe the memory usage being reduced?

@jpata
Copy link
Contributor

jpata commented Nov 25, 2021

should we also consider this for a backport to 12_1?

  • it's expected to improve memory usage
  • it will also change the physics performance

@cms-sw/orp-l2 what do you think?

@perrotta
Copy link
Contributor

@jpata, about the possible 12_1_X backport: the original intention was to integrate mkfit in 12_2_X. Recently we got a request from @tvami to backport mkfit to 12_1_X in order to re-reco there and use it for the alignment. That request was endorsed by PPD.

Said that, I would guess that also these updates should be backported to be included in that re-reco (even though they were not full scale validated yet). But I'd let Tamas to confirm, as well as @rappoccio and Kaori.

@tvami
Copy link
Contributor

tvami commented Nov 25, 2021

It would be nice to know how much these improvements affect the alignment, if at all.

track resolutions are improved.

Based on this it seems like that it does affect alignment. However the previous PR that I wanted to be backported was a bugfix, while this is rather an improvement, so I'm not sure we really need this one for the first rereco (we'll do another one in 12_2_X too)

@mmusich
Copy link
Contributor

mmusich commented Nov 25, 2021

track resolutions are improved.

Based on this it seems like that it does affect alignment.

How can you conclude that? I'd rather maintain the opposite.
Also this

the average number of pixel layers and pixel hits per track and the average number of strip hits are higher;

has implications on the ALCARECO selection efficiency

@tvami
Copy link
Contributor

tvami commented Nov 25, 2021

track resolutions are improved.

Based on this it seems like that it does affect alignment.

How can you conclude that? I'd rather maintain the opposite.

I thought a better resolution leads to less biases leading to better alignment

so I'm not sure we really need this one for the first rereco

So @mmusich you agree with this, right?

@mmusich
Copy link
Contributor

mmusich commented Nov 25, 2021

So @mmusich you agree with this, right?

No, I rather strongly disagree.

@tvami
Copy link
Contributor

tvami commented Nov 25, 2021

OK, Marco has a better view of this for sure, so let's do the backport to 12_1_X as well then

@mmasciov
Copy link
Contributor Author

mmasciov commented Nov 29, 2021

Slava pointed out in a different thread that actually some CPU improvement can be seen here too when looking at edm::global::EDProducerBase.

old: 14.17 | 52.75 | 52.75 | 1 | 1 | MkFitProducer::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const
new: 9.28 | 33.35 | 33.35 | 1 | 1 | MkFitProducer::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const

https://cmssdt.cern.ch/SDT/cgi-bin/igprof-navigator/CMSSW_12_2_X_2021-11-26-2300/slc7_amd64_gcc900/profiling/11834.21/igprofCPU_step3/19 https://cmssdt.cern.ch/SDT/cgi-bin/igprof-navigator/CMSSW_12_2_X_2021-11-26-2300/slc7_amd64_gcc900/profiling/11834.21/PR-b97bc9/20794/igprofCPU_step3/22

@mmasciov how will these CPU improvements affect to the overall tracking time reduction? Will you also be able to provide the full throughput comparison numbers requested by Danilo / O&C?

@jpata: in terms of tracking time, this corresponds to an extra few %, which translates to a 1-2% improvement in full event throughput on top of the previous version of mkFit. Yes, I shall have a measurement of the throughput using on the latest mkFit at today Tracking POG.

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b97bc9/20808/summary.html
COMMIT: bd77b89
CMSSW: CMSSW_12_2_X_2021-11-28-0000/slc7_amd64_gcc900
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36246/20808/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 17004 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 3247873
  • DQMHistoTests: Total failures: 26339
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3221511
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 41 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jpata
Copy link
Contributor

jpata commented Nov 29, 2021

Looks like there's a pretty significant CPU decrease with the latest changes (on top of what was already seen in #36246 (comment))

https://cmssdt.cern.ch/SDT/cgi-bin/igprof-navigator/CMSSW_12_2_X_2021-11-28-0000/slc7_amd64_gcc900/profiling/11834.21/igprofCPU_step3/19
Screenshot 2021-11-29 at 14 43 15

https://cmssdt.cern.ch/SDT/cgi-bin/igprof-navigator/CMSSW_12_2_X_2021-11-28-0000/slc7_amd64_gcc900/profiling/11834.21/PR-b97bc9/20808/igprofCPU_step3/25
Screenshot 2021-11-29 at 14 43 42

Can you comment on if you expect additional low-hanging fruit like trackreco/mkFit#388 to be possible? I had been under the impression that things are pretty stable already.

@mmasciov
Copy link
Contributor Author

mmasciov commented Nov 29, 2021

Looks like there's a pretty significant CPU decrease with the latest changes (on top of what was already seen in #36246 (comment))
Can you comment on if you expect additional low-hanging fruit like trackreco/mkFit#388 to be possible? I had been under the impression that things are pretty stable already.

@jpata, the main speedup in the comparison to the IB baseline is not coming from trackreco/mkFit#388, but from a fix in the mkFit pixelLessStep which was already included in this PR before yesterday. The update from trackreco/mkFit#388 allowed a further speedup only by few % compared to the "rest" of this PR.
We don't see more "low-hanging fruits" at the moment. In case we'll find room for extra improvement, we'll propose technical updates in the future.

@jpata
Copy link
Contributor

jpata commented Nov 29, 2021

Right, I'm counting currently 14.2% (baseline) -> 9.3% (previous version of this PR) -> 7.6% (current version of this PR) as the fraction of time spent in MkFitProducer::produce. I was just not expecting additional performance updates on this PR from Friday to today, but of course, it's great that it was possible...

@jpata
Copy link
Contributor

jpata commented Nov 29, 2021

+reconstruction

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

@mmasciov the number of warning messages, in particular from TrackProducer:pixelLessStepTracks increase after this update, see #35798 (comment).
Could you please have a glance? Is everything under control?

@mmasciov
Copy link
Contributor Author

@mmasciov the number of warning messages, in particular from TrackProducer:pixelLessStepTracks increase after this update, see #35798 (comment). Could you please have a glance? Is everything under control?

@perrotta, as per our answer in #35798: the number of warnings is actually (significantly) reduced in certain workflows.
However, as also mentioned there, we'll try to suppress these warnings further/for good with future technical updates.

@tvami
Copy link
Contributor

tvami commented Nov 30, 2021

urgent

  • as the title says this should go into the pre-release to be cut today

@perrotta
Copy link
Contributor

+1

  • Among other things, improvements in memory allocation and cpu time consuming
  • Warning messages still around, probably diminishing in a few wfs, augmenting in others... It will be dealt with in a different PR

@mmasciov
Copy link
Contributor Author

mmasciov commented Dec 1, 2021

@mmusich, @tvami: FYI, I opened backport PRs to 121X (cms-sw/cmsdist#7482 + #36315).

cmsbuild added a commit that referenced this pull request Dec 3, 2021
[12.1.X] Update of mkFit as in 12_2_0_pre3 (backport of PR #36246)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants