[HGCal] TICL v3 major upgrade #31906

Merged (66 commits) on Nov 6, 2020

@felicepantaleo felicepantaleo commented Oct 22, 2020

PR description:

This PR updates the TICL reconstruction to make it more robust against pile-up.
In particular the following actions have been taken:

  • a track-seeded electromagnetic iteration has been introduced;
  • the order of the iterations has been changed to TrkEM, EM unseeded, track-seeded hadronic, hadronic unseeded;
  • layerClusters with a minimum size of three are used for pattern recognition;
  • seeding regions are sorted by pT and processed in this order;
  • EM iterations are limited to the first 30 layers;
  • a cut on the first layer id for the shower start has been introduced;
  • a cut on the number of missing layers has been introduced;
  • a cut on the longitudinal compactness of a trackster (sigmaPCA-z) has been introduced;
  • a simple PF interpretation has been introduced;
  • the MIP iteration has been dropped.
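
As an illustration of the missing-layers cut listed above, here is a minimal sketch. It assumes that "missing layers" means layers without any cluster between the first and the last layer touched by a trackster; the PR may define this differently. `countMissingLayers` and `passesMissingLayerCut` are hypothetical names, not functions from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: counts the layers with no cluster between the first
// and last layer touched by a trackster (assumed definition, see above).
int countMissingLayers(std::vector<unsigned int> layerIds) {
  if (layerIds.empty())
    return 0;
  // Sort and drop duplicates so that every layer is counted once.
  std::sort(layerIds.begin(), layerIds.end());
  layerIds.erase(std::unique(layerIds.begin(), layerIds.end()), layerIds.end());
  const unsigned int span = layerIds.back() - layerIds.front() + 1;
  return static_cast<int>(span - layerIds.size());
}

// Assumed convention: a trackster passes when it does not exceed the cut.
bool passesMissingLayerCut(const std::vector<unsigned int>& layerIds, int maxMissingLayers) {
  assert(maxMissingLayers >= 0);  // precondition of this sketch only
  return countMissingLayers(layerIds) <= maxMissingLayers;
}
```

For example, a trackster with clusters on layers 1, 2, 4 and 7 would have three missing layers (3, 5 and 6) under this definition.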

PR validation:

This is the electron reconstruction with PU200 before this PR:
[Plot: gsfEle-E_by_genEle-E_PU200]

This is after the PR:
[Plot: gsfEle-E_by_genEle-E_PU200 (3)]

Single Particle noPU

All samples use six energy points (10, 20, 50, 100, 200, 300 GeV) at eta = 1.8 (HGCAL center).

[Five plots, images not preserved]

More info

you can find the latest physics results with this pull request described in the talks by HGCAL and Jets/MET at the HLT Upgrade workshop:
https://indico.cern.ch/event/962025/#4-hgcal
https://indico.cern.ch/event/962025/#6-jetsmet

Timing Report

Timing with PR:
https://fpantale.web.cern.ch/fpantale/circles/web/piechart.html?local=false&dataset=TICLv3_PR31906&resource=time_thread&colours=default&groups=reco_PhaseII&threshold=0

@cmsbuild

The code-checks are being triggered in jenkins.

@felicepantaleo

felicepantaleo commented Oct 22, 2020

@felicepantaleo

enable profiling

@felicepantaleo

@cmsbuild please test

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31906/19329

  • This PR adds an extra 224KB to repository

@cmsbuild

cmsbuild commented Oct 22, 2020

The tests are being triggered in jenkins.

@cmsbuild

A new Pull Request was created by @felicepantaleo (Felice Pantaleo) for master.

It involves the following packages:

DataFormats/HGCalReco
RecoHGCal/Configuration
RecoHGCal/TICL
RecoParticleFlow/PFClusterProducer
Validation/HGCalValidation

@perrotta, @andrius-k, @kmaeshima, @ErnestaP, @kpedro88, @cmsbuild, @jfernan2, @fioriNTU, @slava77, @jpata can you please review it and eventually sign? Thanks.
@mmarionncern, @lecriste, @sethzenz, @bsunanda, @clelange, @riga, @cbernet, @vandreev11, @rovere, @lgray, @cseez, @apsallid, @sobhatta, @pfs, @deguio, @hatakeyamak, @seemasharmafnal this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@hatakeyamak

Thanks @felicepantaleo.
If you don't mind, for my info, can you expand a bit on "a simple PF interpretation has been introduced" (or just point to some reference on this point)?

@cmsbuild

+1
Tested at: ff51d7e
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-814241/10216/summary.html
CMSSW: CMSSW_11_2_X_2020-10-21-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild

Comparison job queued.

@kpedro88 kpedro88 left a comment

This review includes some suggestions to reduce CPU usage and code duplication, as well as a few other minor points. The performance profile from the PR test may provide more direction re: reducing CPU usage (if necessary).

@@ -44,6 +52,9 @@ PatternRecognitionbyCA<TILES>::PatternRecognitionbyCA(const edm::ParameterSet &c
<< "PatternRecognitionbyCA received an empty graph definition from the global cache";
}
eidSession_ = tensorflow::createSession(trackstersCache->eidGraphDef);
if (max_missing_layers_in_trackster_ < 100) {

100 seems like a magic number here - how was it obtained?

Also, this could be done in the constructor initializer list:

check_missing_layers(max_missing_layers_in_trackster_ < 100),
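
To make the suggestion concrete, here is a minimal sketch of computing the flag in the constructor initializer list. The class `MissingLayerSettings` and the constant `kNoCheckThreshold` are hypothetical stand-ins; only the member names and the literal 100 come from the diff.

```cpp
#include <cassert>

// Hypothetical stand-in for the pattern-recognition class in the PR;
// only the two members relevant to the comment are shown.
class MissingLayerSettings {
public:
  // Assumed name for the literal 100 used in the diff.
  static constexpr int kNoCheckThreshold = 100;

  explicit MissingLayerSettings(int maxMissingLayers)
      : max_missing_layers_in_trackster_(maxMissingLayers),
        check_missing_layers_(max_missing_layers_in_trackster_ < kNoCheckThreshold) {
    assert(maxMissingLayers >= 0);  // precondition of this sketch only
  }

  bool checkMissingLayers() const { return check_missing_layers_; }
  int maxMissingLayers() const { return max_missing_layers_in_trackster_; }

private:
  // Members are initialized in declaration order, so the threshold value
  // must be declared before the flag that depends on it.
  int max_missing_layers_in_trackster_;
  bool check_missing_layers_;
};
```

Naming the constant also addresses the magic-number remark, since the meaning of 100 ("do not check") is then visible at the point of use.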

@@ -137,29 +155,105 @@ void PatternRecognitionbyCA<TILES>::makeTracksters(
<< input.layerClusters[outerCluster].z() << " " << tracksterId << std::endl;
}
}
unsigned showerMinLayerId = 99999;
std::vector<unsigned int> layerIds;

it appears this variable is never used

lcIdAndLayer.emplace_back(i, layerId);
}
std::sort(uniqueLayerIds.begin(), uniqueLayerIds.end());
uniqueLayerIds.erase(std::unique(uniqueLayerIds.begin(), uniqueLayerIds.end()), uniqueLayerIds.end());

How fast is this procedure (push_back, sort, erase unique) compared to inserting into std::set for typical occupancies and number of duplicates?

the fastest is to build a heap while pushing and then do the erase(unique());
std::set is by definition slower than a vector as it allocates nodes on the fly (no reserve)

Contributor Author

for a very small number of elements, the vector is the fastest option. We are talking of 20-30 elements here
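
For reference, the two strategies compared in this thread can be sketched as follows. The function names are hypothetical, and the sketch makes no claim about which one is faster; that would have to be measured at the occupancies mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Strategy used in the diff: collect everything, then sort and erase duplicates.
std::vector<unsigned int> uniqueSortedViaVector(std::vector<unsigned int> ids) {
  std::sort(ids.begin(), ids.end());
  ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
  return ids;
}

// Alternative raised in the review: insert into std::set, which keeps the
// elements sorted and unique but allocates one node per element.
std::vector<unsigned int> uniqueSortedViaSet(const std::vector<unsigned int>& ids) {
  const std::set<unsigned int> unique(ids.begin(), ids.end());
  return std::vector<unsigned int>(unique.begin(), unique.end());
}

// Sanity helper: both strategies must yield the same sorted, duplicate-free list.
void checkStrategiesAgree(const std::vector<unsigned int>& ids) {
  assert(uniqueSortedViaVector(ids) == uniqueSortedViaSet(ids));
}
```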

int numberOfMissingLayers = 0;
unsigned int j = showerMinLayerId;
unsigned int indexInVec = 0;
for (auto &layer : uniqueLayerIds) {

const auto&

}
}

bool selected =

temporary is unnecessary

tmpCandidate.setRawEnergy(energy);
math::XYZTLorentzVector p4(track.momentum().x(), track.momentum().y(), track.momentum().z(), energy);
tmpCandidate.setP4(p4);
resultCandidates->push_back(tmpCandidate);

copy could be avoided by using emplace_back() and then back()
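
A minimal sketch of the suggested pattern, next to the original one. `Candidate` is a hypothetical stand-in for the candidate type in the PR, reduced to a single setter.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the candidate type used in the PR.
struct Candidate {
  float rawEnergy = 0.f;
  void setRawEnergy(float e) { rawEnergy = e; }
};

// Pattern in the diff: fill a temporary, then copy it into the vector.
void appendByCopy(std::vector<Candidate>& out, float energy) {
  Candidate tmp;
  tmp.setRawEnergy(energy);
  out.push_back(tmp);  // copies tmp into the vector
}

// Suggested pattern: construct in place, then fill through back().
void appendInPlace(std::vector<Candidate>& out, float energy) {
  assert(energy >= 0.f);  // assumed precondition for this sketch
  out.emplace_back();     // default-constructs the new element inside the vector
  Candidate& candidate = out.back();
  candidate.setRawEnergy(energy);
}
```

Both functions leave the vector in the same state; the second avoids copying the filled object.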

cPOnLayer[h.clusterId][lcLayerId].layerClusterIdToEnergyAndScore[mclId].second = FLT_MAX;
//cpsInMultiCluster[multicluster][CPids]
//Connects a multi cluster with all related caloparticles.
cpsInMultiCluster[mclId].emplace_back(std::make_pair<int, float>(h.clusterId, FLT_MAX));

make_pair is unnecessary with emplace_back
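
A minimal sketch of the difference, assuming a vector of `std::pair<int, float>`; the alias `CpLink` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <utility>
#include <vector>

// (CaloParticle id, score); the alias is hypothetical.
using CpLink = std::pair<int, float>;

// With make_pair: a temporary pair is built first and then moved into place.
void linkWithMakePair(std::vector<CpLink>& links, int cpId) {
  links.emplace_back(std::make_pair(cpId, FLT_MAX));
}

// Without make_pair: the arguments are forwarded to the pair constructor,
// which builds the element directly inside the vector.
void linkDirect(std::vector<CpLink>& links, int cpId) {
  assert(cpId >= 0);  // assumed precondition for this sketch
  links.emplace_back(cpId, FLT_MAX);
}
```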

occurrencesCPinMCL[c]++;
//Loop through all rechits to count how many of them are noise and how many are matched.
//In case of matched rechit-simhit, he counts and saves the number of rechits related to the maximum energy CaloParticle.
for (auto& c : hitsToCaloParticleId) {

auto (get primitive types by value)

maxCPNumberOfHitsInMCL = c.second;
//Below from all maximum energy CaloParticles, he saves the one with the largest amount
//of related rechits.
for (auto& c : occurrencesCPinMCL) {

could use structured binding e.g. auto [id, nhits] : occurrencesCPinMCL
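
A minimal sketch of such a loop, assuming the container maps an id to a hit count with unsigned integer types (the actual type of `occurrencesCPinMCL` is not shown in the diff). `mostHits` is a hypothetical name, and structured bindings require C++17.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Returns the (id, hit count) entry with the largest hit count.
// If several entries tie, which one is returned depends on iteration order.
std::pair<unsigned int, unsigned int> mostHits(
    const std::unordered_map<unsigned int, unsigned int>& occurrences) {
  assert(!occurrences.empty());  // assumed precondition for this sketch
  unsigned int bestId = 0;
  unsigned int bestHits = 0;
  // Structured binding names the two halves of each map entry; plain auto
  // copies the entry, which is cheap for primitive types.
  for (auto [id, nhits] : occurrences) {
    if (nhits > bestHits) {
      bestHits = nhits;
      bestId = id;
    }
  }
  return {bestId, bestHits};
}
```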

for (unsigned int j = 0; j < layers * 2; ++j) {
totalCPEnergyFromLayerCP = totalCPEnergyFromLayerCP + cPOnLayer[maxCPId_byEnergy][j].energy;
//Find the CaloParticle that has the maximum energy shared with the multicluster under study.
for (auto& c : CPEnergyInMCL) {

could use structured binding

@cmsbuild

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-814241/10216/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 1071 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2544110
  • DQMHistoTests: Total failures: 2389
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2541699
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 149 log files, 22 edm output root files, 35 DQM output files

@missirol

missirol commented Nov 4, 2020

you can find the latest physics results with this pull request described in the talks by HGCAL and Jets/MET at the HLT Upgrade workshop

The plots shown at the TSG workshop correspond to commit abc84da

One minor correction: the HLT-JME plots linked above by Felice include updates up to 4da60d8 (from the 11_1_X backport in #31907); I think that would correspond to 6c32c60 in this PR.

@perrotta

perrotta commented Nov 4, 2020

da60d8

Thank you @missirol.

There were several updates and bug fixes applied to this PR, and I think some final check and validation should be allowed based on the very final version before merging it. Also, the plots in the PR description seem to correspond to a supposed 9b855f6 commit, which I am not able to retrieve either here or in the backport PR.

I expect that the authors and the HGCal team can provide some kind of green light based on it. The same goes for the performance, which would be better recomputed with the very final version (by the way, has this PR settled down by now?)

@rovere

rovere commented Nov 4, 2020

you can find the latest physics results with this pull request described in the talks by HGCAL and Jets/MET at the HLT Upgrade workshop

The plots shown at the TSG workshop correspond to commit abc84da

One minor correction: the HLT-JME plots linked above by Felice include updates up to 4da60d8 (from the 11_1_X backport in #31907); I think that would correspond to 6c32c60 in this PR.

Thanks @missirol
The commits that were added after the one you pointed out in #31907 are related to the various technical changes requested while reviewing this PR, the addition of the EM and HAD information to the pfCandidates produced by TICL and, lastly, the correction on the direction taking the track information as input, when available.
I guess you are in the process of testing the EM and HAD assignment in order to derive JEC in 11_1.
In any case, I do not expect dramatic changes to the overall reconstruction even after the inclusion of the latest commits.
There are still a couple of commits missing in the backport, but again those are mainly technical, with no impact on the physics.
I'll wait until this PR gets merged before backporting them to 11_1 as well.

@perrotta

perrotta commented Nov 4, 2020

Testing with just 20 events from wf 23234.0 (TTbar with the 2026 D49 geometry), the overall event size reductions are:

  • -1.3% for the FEVTDEBUGHLT event content: 6481072 -> 6396632 (-84440, -1.3%, all branches)
  • -6.4% for the MINIAODSIM event content: 70405 -> 65869 (-4536, -6.4%, all branches)

@perrotta

perrotta commented Nov 4, 2020

@slava77 the new products created in output by this PR are the following:

      0.0 ->       809.1        809     NEWO   0.01     TICLCandidates_ticlTrackstersMerge__RECO.
      0.0 ->       616.5        616     NEWO   0.01     ticlTracksters_ticlTrackstersTrkEM__RECO.
      0.0 ->      1123.3       1123     NEWO   0.02     recoHGCalMultiClusters_ticlMultiClustersFromTrackstersTrkEM__RECO.
      0.0 ->       344.2        344     NEWO   0.01     floats_ticlTrackstersTrkEM__RECO.

@perrotta

perrotta commented Nov 5, 2020

Trying to summarize the discussion that happened yesterday in this thread: please let me know whether you intend to provide updated validations and comparisons, and if so when they can be ready, so that the review can be finalized here.

@felicepantaleo

@perrotta no further validation will be produced in this PR. I think you can proceed with the finalization of the review.

@kpedro88

kpedro88 commented Nov 5, 2020

+upgrade

min_clusters_per_ntuplet_(conf.getParameter<int>("min_clusters_per_ntuplet")),
skip_layers_(conf.getParameter<int>("skip_layers")),
max_missing_layers_in_trackster_(conf.getParameter<int>("max_missing_layers_in_trackster")),
check_missing_layers_(max_missing_layers_in_trackster_ < 100),

I would have found it more intuitive to use a negative number for the maximum to mean "do not check". But OK, the behaviour doesn't change anyhow

@perrotta

perrotta commented Nov 5, 2020

+1

  • TICL algo upgraded as described, with clear improvements in the electron energy resolution and updates in the PF description
  • Event size shrinks (approx 6% less in miniAOD)
  • Timing is also reduced, but this is said to be a temporary effect due to the removal of the MIP iteration and the selection on the input cluster size: both of them are planned to be reintroduced as soon as dedicated studies have been carried out (and the CPU performance will be affected accordingly)
  • Jenkins tests pass, and the differences observed there are a consequence of the updates implemented, including the removal of a few objects from the event content (among them the CandidateFromTracksters and TrackstersMIP)

@cmsbuild

cmsbuild commented Nov 5, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato

As expected we see changes in the Phase-2 workflows

23234.0_TTbar_14TeV+2026D49+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal
23434.999_TTbar_14TeV+2026D49PU_PMXS1S2PR+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU
28234.0_TTbar_14TeV+2026D60+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal

Do you know why we see many more differences in 23234.0 and 28234.0? Since this PR improves the performance against pile-up, I expected to see larger differences in 23434.999, which has pileup, than in 23234.0 and 28234.0, which are without pileup.

Another question: are these (small) differences in the tau ID expected?
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_11_2_X_2020-11-02-2300+814241/39695/validateJR/all_mini_OldVSNew_TTbar14TeV2026D49wf23234p0/
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_11_2_X_2020-11-02-2300+814241/39695/validateJR/all_mini_OldVSNew_TTbar14TeV2026D60wf28234p0/

@slava77

slava77 commented Nov 5, 2020

I expected to see larger differences in 23434.999

I'm beginning to suspect that this workflow is broken in some way
@kpedro88 did we have any physics validation for it?

@kpedro88

kpedro88 commented Nov 5, 2020

I'm also suspicious that something funny is happening with 23434.999. I don't see anything out of the ordinary in the matrix test logs. I probably won't have time to look into this in more detail in the next few days; maybe it's worth opening an issue to keep track.

@silviodonato

+1

auto thisPt = tracksterTotalRawPt + trackstersMergedHandle->at(otherTracksterIdx).raw_pt() - t.raw_pt();
closestTrackster = std::abs(thisPt - track.pt()) < minPtDiff ? otherTracksterIdx : closestTrackster;
}
tracksterTotalRawPt += trackstersMergedHandle->at(closestTrackster).raw_pt() - t.raw_pt();

@felicepantaleo this line (also pointed out by the static analyzer) escaped my review of the PR: this increment is completely useless here. Either the line can/should be removed, or it was originally intended to do something different and it has to be fixed then. Please check and provide the fix.

Contributor Author

thanks @perrotta that line can be erased safely. I will make a pr now

@slava77

slava77 commented Nov 20, 2020

@slava77 the new products created in output by this PR are the following:

      0.0 ->       809.1        809     NEWO   0.01     TICLCandidates_ticlTrackstersMerge__RECO.
      0.0 ->       616.5        616     NEWO   0.01     ticlTracksters_ticlTrackstersTrkEM__RECO.
      0.0 ->      1123.3       1123     NEWO   0.02     recoHGCalMultiClusters_ticlMultiClustersFromTrackstersTrkEM__RECO.
      0.0 ->       344.2        344     NEWO   0.01     floats_ticlTrackstersTrkEM__RECO.

reco monitoring now covers these (sorry, it took a while for me to get to updating the script).
