Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HGCal] Run HGCalValidator only on the hard scatterer event and correcting the mess from merging #27478

Merged
merged 7 commits into from Jul 14, 2019

Conversation

apsallid
Copy link
Contributor

@apsallid apsallid commented Jul 10, 2019

PR description:

This PR addresses the issue #27417. We decided (@rovere , @felicepantaleo , @amartelli )
to follow PR #26915 and consider only CaloParticles coming from the hard scatterer,
excluding the PU contribution. Running the Timing Service on 10 events of workflow 20234.99
we see that the timing of the hgcalValidator module is on average 2.88 s/event.

We should also note that the hgcalValidator module will for sure be even faster in cooperation
with TICL multiclusters, since at the moment the number of multiclusters produced is huge
(~10^4 multiclusters per event for wf 20234.99).

Furthermore, we noted:

  • A duplication of a loop coming from the git rebasing/merging.
  • An incorrect initialization of the 3D score, shared energy and shared energy fraction inside the CaloParticles loop.
  • An incorrect iteration over the full collection of multiclusters. At that point we should iterate only to the multiclusters that are related to CaloParticles. Iterating over the full multicluster collection contained cases for the score where the numerator and denominator had their initialized values,which is zero and lead to nan's in the histograms.

PR validation:

We tested the PR with wf 20234.99 (TTbar with 200PU).

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27478/10808

  • This PR adds an extra 36KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27478/10810

  • This PR adds an extra 36KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @apsallid for master.

It involves the following packages:

Validation/HGCalValidation

@andrius-k, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks.
@vandreev11, @sethzenz, @rovere, @lgray, @cseez, @bsunanda, @pfs, @kpedro88 this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@jfernan2
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 10, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/1398/console Started: 2019/07/10 19:20

@@ -192,6 +192,18 @@ void HGCalValidator::dqmAnalyze(const edm::Event& event,
histoProducerAlgo_->fill_info_histos(histograms.histoProducerAlgo, totallayers_to_monitor_);
}

//Consider CaloParticles coming from the hard scatterer, excluding the PU contribution.
std::vector<CaloParticle> caloParticlesFromHardScat;
for (auto& it_caloPart : caloParticles) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be const auto?

<< " with BX: " << it_caloPart.g4Tracks()[0].eventId().bunchCrossing() << std::endl;
continue;
}
caloParticlesFromHardScat.push_back(it_caloPart);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making a copy of each hard-scatter caloparticle. Some alternatives:

  • store pointers to hard-scatter caloparticles
  • store vector<size_t> with a list of hard-scatter indices, and then select only those indices from the original vector
    Both of these alternatives have worse memory locality, but reduce memory usage and copying overhead. It would be interesting to see if these improve the performance any further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kpedro88 ,

Using pointers or the selected indices has far reaching consequences to the whole package since the input to the vast
majority of the methods in Validation/HGCalValidation/src/HGVHistoProducerAlgo.cc should be changed and subsequently
a lot of code. Could we please go on with the emplace_back?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for now, but if the memory usage becomes concerning, these alternatives should be revisited.

Copy link
Contributor

@rovere rovere Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apsallid move the filter on the CaloParticles where you actually use them, i.e. one level down the stack.
In this way, you do not have to copy anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rovere i tested that but for some reason it is really slow. Maybe because we need that check in 5 different places while the code in these places will loop in any case through all CaloParticles which is of the order of 10000. By reducing the collection to ~200 CaloParticles once in the beginning we gain speed. I do not know what else to think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apsallid doesn't sound like a good explanation. Are you sure, by any chance, you left some part of the code w/o the selection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apsallid otherwise use the vector of indices down the stack, so that you don't need to loop over and over and make the very same selection.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@apsallid
Copy link
Contributor Author

@jfernan2 We understood the cause of the nan's and it should be fine now. I will update the description so that it is clear what was causing the problem as well as with the new timing average of the module, which is a little faster.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27478/10872

  • This PR adds an extra 40KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

Pull request #27478 was updated. @andrius-k, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please check and sign again.

@jfernan2
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 13, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/1461/console Started: 2019/07/13 16:05

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-65ee12/1461/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 32
  • DQMHistoTests: Total histograms compared: 3081858
  • DQMHistoTests: Total failures: 56
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3081480
  • DQMHistoTests: Total skipped: 322
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 31 files compared)
  • Checked 133 log files, 14 edm output root files, 32 DQM output files

@jfernan2
Copy link
Contributor

+1
thanks @apalid you have indeed fixed the NaN
https://tinyurl.com/y3qukje9

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

In a private test on wf 20634.99 I see in step4:

18:42 cmsdev25 2949> grep hgcalV step4_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D41PU_GenSimHLBeamSpotFull14INPUT+SingleNuE10_cf_2023D41PU_PremixHLBeamSpotFull14PU_Premix+DigiFullTriggerPUPRMXCombined_2023D41PU+RecoFullGlobalPUPRMX_2023D41PU+HARVESTFullGlobalPU_2023D41PU.log 
TimeModule> 106 1 hgcalValidator HGCalValidator 2.73505
TimeModule> 103 1 hgcalValidator HGCalValidator 3.22121
TimeModule> 104 1 hgcalValidator HGCalValidator 3.35191
TimeModule> 105 1 hgcalValidator HGCalValidator 3.08943
TimeModule> 107 1 hgcalValidator HGCalValidator 3.06279
TimeModule> 101 1 hgcalValidator HGCalValidator 2.84508
TimeModule> 108 1 hgcalValidator HGCalValidator 2.68395
TimeModule> 110 1 hgcalValidator HGCalValidator 3.53003
TimeModule> 102 1 hgcalValidator HGCalValidator 2.82613

So I would say that the original problem is fixed.

@fabiocos
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants