refactored first part of PFAlgo::processBlock #26894

Merged
merged 2 commits into from Jun 3, 2019

Conversation

jpata
Contributor

@jpata jpata commented May 22, 2019

PR description:

Proposal to start refactoring PFAlgo::processBlock, which is a single function of several thousand lines, into smaller independent subfunctions based on physics functionality. The original code is laid out quite clearly, just as one long stream of consciousness, so it is easy to factorize.

At this stage, we tackle a part of the PFAlgo::processBlock, introducing the following factorizations:

  • egammaFilters: code in PFAlgo::processBlock { if(useEGammaFilters_) { ... } }
  • conversionAlgo: code in PFAlgo::processBlock { if(usePFConversions_) { ... } }
  • elementLoop: big loop in PF code (line in code) that does multiple things:
    o Deals with tracks that are not associated to electrons or HCAL clusters
    o Sorts the ECAL and HCAL elements in separate vectors
    o For tracks which are connected to more than one HCAL cluster, cuts the links between the track and the cluster for all clusters but the closest one
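
The last two tasks in the list above (sorting elements by type, and keeping only the closest HCAL link per track) can be illustrated with a minimal, self-contained sketch. Everything here is a simplified assumption and not the CMSSW implementation: `ElementType`, `Link`, `sortElements` and `keepClosestHcalLink` are hypothetical names, and `ElementIndices` only mirrors the `trackIs`/`ecalIs`/`hcalIs`/`hoIs` members that appear in the debug printout of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the block element types (not the CMSSW enum).
enum class ElementType { Track, Ecal, Hcal, Ho };

// Per-type index lists, mirroring the inds.trackIs / inds.ecalIs / ... members.
struct ElementIndices {
  std::vector<std::size_t> trackIs;
  std::vector<std::size_t> ecalIs;
  std::vector<std::size_t> hcalIs;
  std::vector<std::size_t> hoIs;
};

// Hypothetical track-to-HCAL-cluster link with a distance and an on/off flag.
struct Link {
  std::size_t track;
  std::size_t hcal;
  double distance;
  bool active;
};

// Sort the indices of all active elements into separate per-type vectors.
ElementIndices sortElements(const std::vector<ElementType>& elements,
                            const std::vector<bool>& active) {
  assert(elements.size() == active.size());
  ElementIndices inds;
  for (std::size_t i = 0; i < elements.size(); ++i) {
    if (!active[i]) continue;
    switch (elements[i]) {
      case ElementType::Track: inds.trackIs.push_back(i); break;
      case ElementType::Ecal:  inds.ecalIs.push_back(i);  break;
      case ElementType::Hcal:  inds.hcalIs.push_back(i);  break;
      case ElementType::Ho:    inds.hoIs.push_back(i);    break;
    }
  }
  return inds;
}

// For one track, deactivate every link except the one with the smallest distance.
void keepClosestHcalLink(std::vector<Link>& links, std::size_t track) {
  Link* closest = nullptr;
  for (auto& link : links) {
    if (link.track != track || !link.active) continue;
    if (closest == nullptr || link.distance < closest->distance) closest = &link;
  }
  for (auto& link : links) {
    if (link.track == track && &link != closest) link.active = false;
  }
}
```

Both helpers take their inputs explicitly and touch no shared state, so each can be checked in isolation on a handful of hand-built elements.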

This is meant as an initial PR to start clearing up the PFAlgo code, for which I imagine the path could be something like the following:

  • Factorize PFAlgo::processBlock to smaller subfunctions (this PR, partly)
  • Sanitize the inputs and outputs of the functions and make sure they rely less on the PFAlgo state
  • Given the cleaned-up implementation from above, revisit the logic/physics of the algo to rewrite it with Run3/HL-LHC in mind
  • Compare to an eventual ML PF algo that arrives in CMS
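
As a rough illustration of the second step (functions that rely less on object state), the sketch below contrasts a method that reads and writes member variables with a free function whose inputs and outputs are all explicit. The names `StatefulAlgo`, `filterStep`, the energy list and the 1.0 threshold are hypothetical and chosen only for the example; they are not taken from PFAlgo.

```cpp
#include <vector>

// Hypothetical "before": the step reads a configuration flag from member
// state and writes its result into another member.
class StatefulAlgo {
public:
  explicit StatefulAlgo(bool useFilter) : useFilter_(useFilter) {}

  void filterStep(const std::vector<double>& energies) {
    selected_.clear();
    for (double e : energies) {
      if (!useFilter_ || e > 1.0) selected_.push_back(e);
    }
  }

  const std::vector<double>& selected() const { return selected_; }

private:
  bool useFilter_;
  std::vector<double> selected_;
};

// Hypothetical "after": the same logic as a free function. Every input is a
// parameter and the result is returned, so no hidden state is involved.
std::vector<double> filterStep(const std::vector<double>& energies,
                               bool useFilter,
                               double threshold) {
  std::vector<double> selected;
  for (double e : energies) {
    if (!useFilter || e > threshold) selected.push_back(e);
  }
  return selected;
}
```

With the second form, a regression check reduces to comparing the returned vector against the old member, which is the same kind of "output must not change" comparison used to validate this PR.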

PR validation:

We do not change the functionality of the PFAlgo, hence the output should not change at all, and timing performance should change only insofar as the C++ compiler is able to optimize the code better. We have checked this in a previous version of this PR, see here. Nevertheless, I'm running the following comparisons:

  • runTheMatrix -l 38.0 between 43fd1cf and 3b1ae07: exactly the same reconstruction output on 10 events, output files have the same number of bytes.

if this PR is a backport please specify the original PR:

cc PF group: @hatakeyamak @bendavid

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26894/9932

  • This PR adds an extra 44KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @jpata (Joosep Pata) for master.

It involves the following packages:

RecoParticleFlow/PFProducer

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@mmarionncern, @hatakeyamak, @lgray, @seemasharmafnal, @cbernet, @bachtis this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@perrotta
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented May 23, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/438/console Started: 2019/05/23 07:37

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cdc0c8/438/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3206856
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3206521
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files


void PFAlgo::elementLoop(const reco::PFBlock &block, reco::PFBlock::LinkData& linkData, const edm::OwnVector<reco::PFBlockElement> &elements, std::vector<bool>& active, const reco::PFBlockRef &blockref, ElementIndices& inds, std::vector<bool> &deadArea) {

Lines such as this one would be more readable if split in some reasonable way.
This will be done automatically anyhow in the ongoing clang cleaning campaign.
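
For illustration only, one possible layout of the signature commented on above is one parameter per line. The `reco`, `edm` and `ElementIndices` definitions below are hypothetical empty stand-ins so that the sketch compiles outside CMSSW, and the function body is deliberately left empty; only the line breaking is the point.

```cpp
#include <vector>

// Hypothetical empty stand-ins for the CMSSW types (assumptions, not the real classes).
namespace reco {
  struct PFBlock {
    struct LinkData {};
  };
  struct PFBlockElement {};
  struct PFBlockRef {};
}  // namespace reco

namespace edm {
  template <typename T>
  struct OwnVector {};
}  // namespace edm

struct ElementIndices {};

class PFAlgo {
public:
  // Same parameter list as in the PR, one parameter per line.
  void elementLoop(const reco::PFBlock& block,
                   reco::PFBlock::LinkData& linkData,
                   const edm::OwnVector<reco::PFBlockElement>& elements,
                   std::vector<bool>& active,
                   const reco::PFBlockRef& blockref,
                   ElementIndices& inds,
                   std::vector<bool>& deadArea);
};

void PFAlgo::elementLoop(const reco::PFBlock& block,
                         reco::PFBlock::LinkData& linkData,
                         const edm::OwnVector<reco::PFBlockElement>& elements,
                         std::vector<bool>& active,
                         const reco::PFBlockRef& blockref,
                         ElementIndices& inds,
                         std::vector<bool>& deadArea) {
  // Body intentionally empty in this sketch; the real logic lives in CMSSW.
}
```

An automatic formatter may choose different break points depending on its configured column limit.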

@@ -619,7 +495,7 @@ void PFAlgo::processBlock( const reco::PFBlockRef& blockref,
 assert( !trackRef.isNull() );

 if (debug_ ) {
-cout <<"PFAlgo:processBlock "<<" "<<trackIs.size()<<" "<<ecalIs.size()<<" "<<hcalIs.size()<<" "<<hoIs.size()<<endl;
+cout <<"PFAlgo:processBlock "<<" "<< inds.trackIs.size()<<" "<< inds.ecalIs.size()<<" "<< inds.hcalIs.size()<<" "<< inds.hoIs.size()<<endl;

Some line splitting may help here too.

@@ -1333,14 +1209,145 @@ void PFAlgo::processBlock( const reco::PFBlockRef& blockref,
}
} //loop hcal elements
} // end of loop 1 on elements iEle of any type
}

int PFAlgo::decideType(const edm::OwnVector<reco::PFBlockElement> &elements, const reco::PFBlockElement::Type type, std::vector<bool>& active, ElementIndices& inds, std::vector<bool> &deadArea, unsigned int iEle) {

Line splitting would improve readability here as well.

@perrotta
Contributor

@jpata, please let us know whether you plan to apply the style fixes pointed out in #26894 (review) here.

@cmsbuild
Contributor

cmsbuild commented May 30, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/610/console Started: 2019/05/30 17:38

@cmsbuild
Contributor

Pull request #26894 was updated. @perrotta, @cmsbuild, @slava77 can you please check and sign again.

@hatakeyamak
Contributor

hatakeyamak commented May 30, 2019

Thank you @jpata @perrotta .
Regarding elementLoop: the other loops also loop over elements, but this is the first loop, which sets things up for the subsequent ones. I don't feel strongly, but something like "initialElementLoop" could be a slight improvement. That's my suggestion.

Otherwise this change looks good to me. Thank you.

@jpata
Contributor Author

jpata commented May 30, 2019

Thanks for the suggestion @hatakeyamak. My take on this would be to spend a few quick iterations over some days factorizing all the code functionally, also breaking apart PFAlgo::elementLoop as appropriate, and then figure out the durable function names, data structures to pass around, etc. It is possible that some functions may merge, disappear or change significantly, so I'm hesitant to spend a lot of time choosing the perfect names just now.

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cdc0c8/610/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3210678
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3210341
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@hatakeyamak
Contributor

@jpata Thanks. I understand that you plan to introduce additional functions for the other loops and the HF portion in the next PR. You could probably consider my suggestion as part of that subsequent step.

@perrotta
Contributor

+1

  • Refactoring implemented as planned
  • No changes expected in output, jenkins tests pass and show no differences
  • Further updates are planned in forthcoming PRs

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Contributor

fabiocos commented Jun 3, 2019

+1
