Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase2 - HGC - Realistic SimClusters #19571

Merged

Conversation

felicepantaleo
Copy link
Contributor

This PR creates PFClusters using a more realistic assignment of energy fractions to Clusters based on distance and energy.
It also allows to merge two simclusters together if they share an amount of energy above threshold.
This PR does not enable Realistic SimClusters by default, so downstream quantities should remain unchanged.

You can find a presentation with more details here:
https://indico.cern.ch/event/647401/contributions/2632485/attachments/1479578/2293729/20170620_Felice_Realism.pdf

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 5, 2017

A new Pull Request was created by @felicepantaleo (Felice Pantaleo) for master.

It involves the following packages:

RecoParticleFlow/PFClusterProducer

@perrotta, @cmsbuild, @slava77, @davidlange6 can you please review it and eventually sign? Thanks.
@mmarionncern, @rafaellopesdesa, @lgray, @seemasharmafnal, @cbernet, @bachtis this is something you requested to watch as well.
@davidlange6 you are the release manager for this.

cms-bot commands are listed here

@felicepantaleo
Copy link
Contributor Author

FYI: @malgeri @rovere @cseez @kpedro88

@slava77
Copy link
Contributor

slava77 commented Jul 5, 2017

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 5, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/21183/console Started: 2017/07/05 18:56

@kpedro88
Copy link
Contributor

kpedro88 commented Jul 5, 2017

assign upgrade

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 5, 2017

New categories assigned: upgrade

@kpedro88 you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 5, 2017

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 5, 2017

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 5, 2017

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19571/21183/summary.html

Comparison Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 22
  • DQMHistoTests: Total histograms compared: 1786918
  • DQMHistoTests: Total failures: 15114
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 1771638
  • DQMHistoTests: Total skipped: 166
  • DQMHistoTests: Total Missing objects: 0
  • Checked 90 log files, 14 edm output root files, 22 DQM output files

Copy link
Contributor

@kpedro88 kpedro88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code needs some work to get into shape to be merged in official-cmssw. A few other general questions/comments:

  1. Are there performance plots showing what effect this has on clusters?
  2. We need to understand the CPU and memory performance of this code - even if it's not enabled right now, it will be eventually.


RealisticCluster()
{
totalEnergy = 0.f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use member initializer list in constructor

{
hitIdsAndFractions_.emplace_back(hit,fraction);
}
//TODO: hits are sorted, replace with binary search.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the hits already sorted in this PR? If so, find_if could just be replaced with lower_bound


namespace
{
//TODO: get number of layers per subdet from geometry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do this ASAP, the methods are available now (since #19485)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot find a method which returns the number of layers.
Should I really use something like the following?
`auto geomBH = static_cast<const HGCalGeometry*>(geom_->getSubdetectorGeometry(DetId::Forward,ForwardSubdetector::HGCHEB));

bhOffset_+ (geomBH->topology().dddConstants()).layers(true);`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be what is available. @bsunanda @clelange please advise

void insertHitPosition(float x, float y, float z, unsigned int hitIndex)
{
hitPosition_[hitIndex] =
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unnecessary spaces/linebreaks here

float getDecayLength(unsigned int layer)
{
if (layer <= 28)
return 2.f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decay length magic numbers should be made into consts with explanatory names

//TODO: get number of layers+1 from geometry
realisticAssociator.init(hits.size(), simClusters.size(), numberOfLayers + 1);
// for quick indexing back to hit energy
std::unordered_map < uint32_t, size_t > detIdToIndex(hits.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need #include <unordered_map> for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (unsigned int ic = 0; ic < simClusters.size(); ++ic)
{
const auto & sc = simClusters[ic];
auto hitsAndFractions = std::move(sc.hits_and_fractions());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the std::move() is probably unnecessary here, the compiler will optimize this

}
realisticAssociator.computeAssociation(_exclusiveFraction, _useMCFractionsForExclEnergy);
realisticAssociator.findAndMergeInvisibleClusters(_invisibleFraction, _exclusiveFraction);
auto realisticClusters = std::move(realisticAssociator.realisticClusters());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just const auto&, no std::move()

output.emplace_back();
reco::PFCluster& back = output.back();
edm::Ref < std::vector<reco::PFRecHit> > seed;
auto hitsIdsAndFractions = std::move(realisticClusters[ic].hitsIdsAndFractions());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just const auto&, no std::move()

#include "SimDataFormats/CaloAnalysis/interface/SimClusterFwd.h"

class RealisticSimClusterMapper : public InitialClusteringStepBase {
typedef RealisticSimClusterMapper B2DGT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this indecipherable typedef?

@felicepantaleo
Copy link
Contributor Author

Dear @kpedro88 ,
Many thanks for the prompt feedbacks. I will try to address them today.
The effect this has on physics is:

  • for hits which are shared among different SimClusters, fractions are recalculated without considering the MC truth
  • Clusters which are sharing too much energy are merged and create a single PFCluster.

We have to tune configuration parameters and use PR #19572 to filter out rechits going into the cluster based on energy and distance.
I suggest we discuss the physics performance once I have all these parameters and will make the new PR after this one.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 7, 2017

Pull request #19571 was updated. @perrotta, @cmsbuild, @slava77, @kpedro88, @davidlange6 can you please check and sign again.

1 similar comment
@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 7, 2017

Pull request #19571 was updated. @perrotta, @cmsbuild, @slava77, @kpedro88, @davidlange6 can you please check and sign again.

@kpedro88
Copy link
Contributor

kpedro88 commented Aug 2, 2017

@felicepantaleo and then what did you do before making the latest "removing white space" commits? The merge-topic commit wasn't there after the rebase, it appeared when you pushed the new commits.

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 2, 2017

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 2, 2017

Comparison job queued.

@felicepantaleo
Copy link
Contributor Author

@kpedro88 I created another pre3 area, merged the branch and checked that everything was ok...

this has now already been rebased to CMSSW_9_3_0_pre3
15893 2017-08-02 14:07:03 cmsrel CMSSW_9_3_0_pre3
15894 2017-08-02 14:07:09 cd CMSSW_9_3_0_pre3/src/
15895 2017-08-02 14:07:10 cmsenv
15896 2017-08-02 14:07:15 git cms-merge-topic felicepantaleo:HGCal_Realistic_simclusters_PR_master
15897 2017-08-02 14:08:05 scram b -j 8
15898 2017-08-02 14:39:36 git status
15900 2017-08-02 14:39:41 git add .
15901 2017-08-02 14:39:47 git commit -m 'removing white space'
15902 2017-08-02 14:42:30 git add .
15903 2017-08-02 14:42:31 git commit -m 'removing white space'
15904 2017-08-02 14:42:36 scram b -j 8
15908 2017-08-02 14:43:49 git push my-cmssw from-CMSSW_9_3_0_pre3:HGCal_Realistic_simclusters_PR_master

@kpedro88
Copy link
Contributor

kpedro88 commented Aug 2, 2017

That is not a correct use of cms-merge-topic. If you want to recreate a working area, use cms-checkout-topic. In this case, you could probably just use the area where you did the rebase.

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 2, 2017

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19571/22012/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 899 differences found in the comparisons
  • DQMHistoTests: Total files compared: 25
  • DQMHistoTests: Total histograms compared: 2651091
  • DQMHistoTests: Total failures: 61016
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2589894
  • DQMHistoTests: Total skipped: 181
  • DQMHistoTests: Total Missing objects: 0
  • Checked 102 log files, 14 edm output root files, 25 DQM output files

@felicepantaleo
Copy link
Contributor Author

@perrotta
here you will find the two time and memory reports for 20 events Zee PU200:
https://fpantale.web.cern.ch/fpantale/PRs/19571/diff_eventSize.html
event size decreases
most notable

  • particleFlowClusterHGCal: (recoPFClusters) -13.8% uncomp, -19.7% compressed

https://fpantale.web.cern.ch/fpantale/PRs/19571/diff_timeMem.html
overall timing stays more or less the same with an increase in:
particleFlowClusterHGCal from 0.536151 to 0.717634
which is not the most expensive module...

@perrotta
Copy link
Contributor

perrotta commented Aug 4, 2017

Thank you @felicepantaleo !

Trying to dig into your diffs:

  • EventSize: it is great that you reduces the size occupied by the particleFlowClusterHGCal (and the same happens for the superclusters, on a smaller scale). On the other hand, the size of gsf tracks for electrons widens (which corresponds to the larger number of electrons also seen in the jenkins outputs, where they almost double). I have the impression that the overall eventSize diminishes with this PR, however: do you have the also the total sizes to compare, and confirm (or deny) my impression?

  • CPU Time: the total CPU increases from 132.016 -> 134.453 s/evt, i.e. 1.8%. Even assuming this counts for all events (without including the startup), and therefore removing the max event (assumed to be the first one) the total CPU time still increases from 128.85 -> 131.15 s/evt, i.e. 1.77% (the same increase). This is noticeable, and I tried to understand where it comes from. As you noticed, the increase in the HGCal specific modules is rather limited (good!), while the largest extra-CPU eaters are the electrons:
    TimeReport ecalDrivenElectronSeeds + 1.6 s/evt
    TimeReport electronCkfTrackCandidates + 0.75 s/evt
    TimeReport electronGsfTracks + 0.8 s/evt
    TimeReport pfTrackElec + 1.5 s/evt

Overall, I think that you should understand where the doubling of the number of electrons comes from, and either justify or try to tune it, so that no extra CPU is eaten neadlessly

@felicepantaleo
Copy link
Contributor Author

@perrotta
The overall event size decreases. I have the total sizes (but with FEVT): 1420244962 to 1412076439 for 20 events Zee PU200.
The increase in number of gsf electrons is expected as well.
SimClusters are now merging and you have two effects:

  1. the merged cluster becomes a PFCluster with zero energy (so the number of clusters with zero energy increases.
  2. the absorbing cluster becomes fatter and can now exceed the energy threshold and make it to become a gsf electron.

@perrotta
Copy link
Contributor

perrotta commented Aug 4, 2017

Could you provide some evidence that the newly produced electrons are really worth being produced and saved? I mean, some eff vs fakes, even with the low stat that you have in your sample.

Is any tuning, at least in the endcaps, able save the physics content and reduce the additional cpu time required by this PR?

@felicepantaleo
Copy link
Contributor Author

ciao Andrea,
https://fpantale.web.cern.ch/fpantale/PRs/19571/eff.pdf

Efficiency is 100% by definition, since all the PFclusters are associated to a SimCluster.
What is missing to get to 100% is due to a missing track in the tracker.

What could be added (in another PR) is an energy cut, for which if a PFCluster has energy below, say 1GeV, I set it to zero and remove all the hits contained in it....

@perrotta
Copy link
Contributor

perrotta commented Aug 5, 2017

+1

@slava77
Copy link
Contributor

slava77 commented Aug 7, 2017

@kpedro88
a signature from upgrades is missing.
Please check this PR and suggest changes or sign it.
Thank you.

@kpedro88
Copy link
Contributor

kpedro88 commented Aug 7, 2017

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 7, 2017

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @smuzaffar (and backports should be raised in the release meeting by the corresponding L2)

@davidlange6
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 92989a2 into cms-sw:master Aug 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants