Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some EcalClusterTools maintainance and energy matrix in Egamma MVA Ntuplizers #25633

Merged
merged 3 commits into from Mar 3, 2019
Merged

Some EcalClusterTools maintainance and energy matrix in Egamma MVA Ntuplizers #25633

merged 3 commits into from Mar 3, 2019

Conversation

guitargeek
Copy link
Contributor

Hi,

I always felt the potential of machine learning applied to the ECAL is not fully tapped, while it's 2 dimensional nature provides a very good playing ground to test modern neural network architectures designed for image processing.

To enable quick studies in this direction for myself and everybody, I always wanted to (optionally) add the raw rec hit energies in a N times N matrix around the seed crystal to the Electron/PhotonMVANtuplizers. This is now possible, with N being a configurable parameter.

For this purpose, I wrote a new function EcalClusterLazyTools<T>::getEnergies and generally went over the EcalClusterTools/EcalClusterLazyTools duo, tying to make it a bit more slick. This includes:

  • new helper class to easily iterate over rectangular ranges around seed crystals
  • don't separate member function implementation from declaration for template classes because it adds a lot of distracting template lines (see the lazy tools)
  • detabify, removal of trailing whitespaces, occasionally better line breaks

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25633/7950

  • This PR adds an extra 56KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @guitargeek (Jonas Rembser) for master.

It involves the following packages:

RecoCaloTools/Navigation
RecoEcal/EgammaClusterProducers
RecoEcal/EgammaCoreTools
RecoEgamma/ElectronIdentification
RecoEgamma/PhotonIdentification

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @jainshilpi, @rovere, @argiro, @lgray, @varuns23 this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

}
// slow elegant version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit surprised by this comment here: why was that solution so slow? Is looping over the DetIDs twice and keeping a vector of them for the second loop so expensive? That statement prompted me to do this rectangle range solution where you only loop over the detIDs once. I hope it combines the speed of the previous solution with the "elegance" of the commented-out solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this solution is slow because a cluster can own many many hits due to how PF clustering works. As you know, PF clustering shares the energy of a crystal between local energy maximums. It does this fairly globally and (initially at least) did not clean well. So you would end up with half the ECAL assigned as a "rec-hit" of the cluster but with fraction of 1E-8 or similar. This was later cleaned up to be >1E-4/E-5 (and may have been further changed) but at the time of writing you were running over a huge number of hits, hence the "slow" comment. Its not clear to me if this was actually measured to be slow or just thought to be from the above argument, I think it was actually slow from dimly remembered conversations.

One thing which I want to achieve in the shutdown is to review the minimum fraction (or fraction*energy) for a hit to be part of a PFCluster to further reduce this problem. Although due to the raised minimum fraction this may not be a problem anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is interesting, thanks for explaining! But I was talking about a different thing. You explained why recHitEnergy() can be slow, right? I was more talking about the fact that the comments labels this as slow (if you "unwrap" the functions and remove the frac related stuff):

CaloNavigator<DetId> cursor = CaloNavigator<DetId>( id, topology->getSubdetectorTopology( id ) );
std::vector<DetId> v;
 for ( int i = ixMin; i <= ixMax; ++i ) {
    for ( int j = iyMin; j <= iyMax; ++j ) {
         cursor.home();
         cursor.offsetBy( i, j );
         if ( *cursor != DetId(0) ) v.push_back( *cursor );
    }
}

float energy = 0;
for ( std::vector<DetId>::const_iterator it = v.begin(); it != v.end(); ++it ) {
        energy += recHitEnergy( *it, recHits );
}

and this as _fast:

float energy = 0;
CaloNavigator<DetId> cursor = CaloNavigator<DetId>( id, topology->getSubdetectorTopology( id ) );
std::vector<DetId> v;
 for ( int i = ixMin; i <= ixMax; ++i ) {
    for ( int j = iyMin; j <= iyMax; ++j ) {
         cursor.home();
         cursor.offsetBy( i, j );
         energy += recHitEnergy( *cursor, recHits )
    }
}

In the "slow" solution, the DetIds in the rectangle are precomputed and stored, and in the "fast" solution the energies are already added in the loop where the DetHits are obtained (and the DetHits are not stored). If one believes that this really makes a difference, it made sense to implement this CaloRectangleRange I thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're right, I miss read the code. Okay, I have no idea :)

@slava77
Copy link
Contributor

slava77 commented Jan 12, 2019

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 12, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/32547/console Started: 2019/01/12 07:04

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25633/32547/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 11968 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3153717
  • DQMHistoTests: Total failures: 2390
  • DQMHistoTests: Total nulls: 7
  • DQMHistoTests: Total successes: 3151116
  • DQMHistoTests: Total skipped: 204
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.027 KiB( 32 files compared)
  • DQMHistoSizes: changed ( 136.731 ): 0.027 KiB JetMET/SUSYDQM
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@guitargeek
Copy link
Contributor Author

Sorry I have no idea where these differences come from! I didn't see them neither in comparisons of the egamma MVA nutuplizer outputs (comparing to a more minimalist implementation of getting the energy matrix), nor in the outputs of the RecoEcal/EgammaCoreTools/test/testEcalCluster(Lazy)Tools.py. I will just update this PR with the revival of these tests, but otherwise I don't know how to proceed here. Anyway, it's maybe not so important.

class ElectronMVANtuplizer : public edm::one::EDAnalyzer<edm::one::SharedResources> {
public:
explicit ElectronMVANtuplizer(const edm::ParameterSet&);
~ElectronMVANtuplizer() override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it is a "one" module, why to remove the destructor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A counter argument, why declare it if the compiler can write the default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would naively say that otherwise the not overridden dtor of the base class is used instead, and this would fail in deleting specific members of the derived one: isn't it so? Or a virtual method is always overridden by a default one in the derived class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will aways write a destructor for a class if you do not specifically state you will write the destructor. This is also true for virtual inheritance. I.e. the compiler will write the correct destructor for ElectronMVANtuplizer which will call the destructors for all member data of ElectronMVANtuplizer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Dr15Jones ! Comment retracted, then

@perrotta
Copy link
Contributor

+1

  • Code changes are in line with the PR description and the following comments received during the review
  • The restructuring of the code does not affect the final outputs; I verified that also the computing timing is not modified in any significant way when using either real data or TTbar PU samples
  • Jenkins tests pass

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

fabiocos commented Mar 1, 2019

please test

the random unit test errors should be gone

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 1, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/33352/console Started: 2019/03/01 14:49

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 1, 2019

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 1, 2019

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 1, 2019

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25633/33352/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 32
  • DQMHistoTests: Total histograms compared: 3114826
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3114628
  • DQMHistoTests: Total skipped: 197
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 31 files compared)
  • Checked 133 log files, 14 edm output root files, 32 DQM output files

@fabiocos
Copy link
Contributor

fabiocos commented Mar 3, 2019

+1

the updated test configurations run smoothly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants