Puppi speedup: use reserve for vectors #37717

Merged
merged 1 commit into from May 2, 2022

kpedro88
Contributor

PR description:

While profiling, I noticed that PuppiProducer spends a non-negligible fraction of its time reallocating vectors of candidates. The vectors were filled without reserving capacity first, so they were reallocated as they grew, and each reallocation constructs the already stored candidates again. This was noticeable because constructing candidate objects is not lightweight, so repeating it should be avoided. Reserving the offending vectors up front accomplishes this. This reduced the CPU usage by 10-15% in my tests.
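The pattern described above can be sketched as follows. This is a minimal illustration, not the actual PuppiProducer change: `Candidate` and `copyCandidates` are hypothetical names standing in for the real candidate type and fill loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a candidate type that is costly to copy.
// It is NOT pat::PackedCandidate.
struct Candidate {
  double pt = 0.0;
  double eta = 0.0;
  double phi = 0.0;
  std::vector<float> payload;  // makes each copy non-trivial
};

// Copies every input candidate into a new vector.
// reserve() allocates the storage once, so push_back never has to
// grow the buffer and re-construct the elements already stored.
std::vector<Candidate> copyCandidates(const std::vector<Candidate>& input) {
  std::vector<Candidate> output;
  output.reserve(input.size());
  for (const auto& cand : input) {
    output.push_back(cand);
  }
  // The standard guarantees capacity() >= the reserved size.
  assert(output.capacity() >= input.size());
  return output;
}
```

The only change relative to an unreserved loop is the single `reserve` call, which works here because the final size is known before the loop starts.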

PR validation:

Code compiles and runs. CPU impact quantified with igprof. Compared AK8 jet pT in a ttbar workflow to confirm that no changes occurred.

if this PR is a backport please specify the original PR and why you need to backport that PR:

Planned to be backported to 10_6_X (along with some other miscellaneous Puppi speedups that had not yet been backported).

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37717/29554

  • This PR adds an extra 16KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for master.

It involves the following packages:

  • CommonTools/PileupAlgos (reconstruction)

@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@rappoccio, @jdolen, @ahinzmann, @missirol, @gkasieczka, @hatakeyamak this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@kpedro88
Contributor Author

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a5662/24286/summary.html
COMMIT: b2c0405
CMSSW: CMSSW_12_4_X_2022-04-27-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37717/24286/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

There are some workflows for which there are errors in the baseline:
1001.2 step 2
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB has errors in the relvals. The error does NOT come from this pull request.

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3695434
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3695404
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 205 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@clacaputo
Contributor

test parameters

  • enable_test = profiling

@clacaputo
Contributor

@cmsbuild please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a5662/24318/summary.html
COMMIT: b2c0405
CMSSW: CMSSW_12_4_X_2022-04-27-2300/slc7_amd64_gcc10
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37717/24318/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3695434
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3695410
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 205 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@jpata
Contributor

jpata commented Apr 29, 2022

type jetmet

@kpedro88
Contributor Author

IB profile (11834.21 step 4, PAT):

            0.1  .........       0.04 / 11.16        edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) [34]
[2742]      0.1       0.04       0.00 / 0.04       PuppiProducer::produce(edm::Event&, edm::EventSetup const&)
            0.0  .........       0.02 / 0.02         void std::vector<pat::PackedCandidate, std::allocator<pat::PackedCandidate> >::_M_realloc_insert<pat::PackedCandidate const&>(__gnu_cxx::__normal_iterator<pat::PackedCandidate*, std::vector<pat::PackedCandidate, std::allocator<pat::PackedCandidate> > >, pat::PackedCandidate const&) [3563]
            0.0  .........       0.01 / 0.03         pat::PackedCandidate::PackedCandidate(pat::PackedCandidate const&) [3087]
            0.0  .........       0.00 / 0.16         bool edm::Event::getByToken<reco::Candidate>(edm::EDGetTokenT<edm::View<reco::Candidate> >, edm::Handle<edm::View<reco::Candidate> >&) const [1516]
            0.0  .........       0.00 / 0.00         pat::PackedCandidate::setP4(ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> > const&) [6716]
            0.0  .........       0.00 / 0.00         void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&) [6798]

PR profile:

            0.1  .........       0.04 / 11.31        edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) [34]
[2735]      0.1       0.04       0.00 / 0.04       PuppiProducer::produce(edm::Event&, edm::EventSetup const&)
            0.0  .........       0.01 / 0.04         pat::PackedCandidate::~PackedCandidate() [2739]
            0.0  .........       0.01 / 0.01         pat::PackedCandidate::setP4(ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> > const&) [5166]
            0.0  .........       0.00 / 1.03         _init [520]
            0.0  .........       0.00 / 0.96         operator delete(void*, unsigned long) [542]
            0.0  .........       0.00 / 0.03         pat::PackedCandidate::PackedCandidate(pat::PackedCandidate const&) [3047]
            0.0  .........       0.00 / 0.01         pat::PackedCandidate::unpackVtx() const [5188]
            0.0  .........       0.00 / 0.00         pat::PackedCandidate::py() const [7151]

The calls to std::vector<...>::_M_realloc_insert are no longer present. (Unfortunately the 10-15% speedup is too small to be directly visible with the limited resolution in this profile and workflow.)
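The same effect can be reproduced in isolation by counting copy constructions instead of profiling. The sketch below is an illustration under stated assumptions: `CountedCandidate` and `countCopies` are hypothetical names, and the type is deliberately copy-only so that vector growth has to copy the stored elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instrumented type that counts its copy constructions.
struct CountedCandidate {
  static std::size_t copies;
  double pt = 0.0;

  CountedCandidate() = default;
  // Declaring the copy constructor suppresses the implicit move
  // constructor, so a growing vector must copy existing elements.
  CountedCandidate(const CountedCandidate& other) : pt(other.pt) { ++copies; }
  CountedCandidate& operator=(const CountedCandidate&) = default;
};
std::size_t CountedCandidate::copies = 0;

// Fills a vector with n copies of a prototype and returns how many
// copy constructions were needed in total.
std::size_t countCopies(std::size_t n, bool useReserve) {
  CountedCandidate::copies = 0;
  const CountedCandidate proto;
  std::vector<CountedCandidate> v;
  if (useReserve) {
    v.reserve(n);
  }
  for (std::size_t i = 0; i < n; ++i) {
    v.push_back(proto);
  }
  assert(v.size() == n);
  return CountedCandidate::copies;
}
```

With `useReserve` set, the count equals `n`, one copy per `push_back`. Without it, every reallocation adds one extra copy per element already stored, which corresponds to the `_M_realloc_insert` entries in the IB profile.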

@clacaputo
Contributor

type performance-improvement

@clacaputo
Contributor

+reconstruction

@cmsbuild
Contributor

cmsbuild commented May 2, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

perrotta commented May 2, 2022

+1
