Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New HLT DEBUG object: EgTrigSumObj : CMSSW_11_3_X #32486

Merged
merged 1 commit into from Feb 9, 2021

Conversation

Sam-Harper
Copy link
Contributor

PR description:

This PR introduces a new trigger object, EgTrigSumObj.

The purpose of this is for easing debugging, validation and trigger development. In a nutshell, this object packs all E/gamma ID reco and ID quantities into a single object which can be easily analysed offline.

The reason this is necessary is that the HLT builds the e/gamma object bit by bit for CPU efficiency reasons. Therefore you have to associate the GsfTrack, the pixel seeds , the id variables to the supercluster when you want to analyse this. This class does all this for you.

At the same time, the producer of the class can also filter PFClusters, calo hits and tracks for further analysis offline. This greatly helps ID tuning as well as validation. The class may evolve in the future with filtering of the pixel hits, gsf tracks and superclusters in the plans, however it is usable now.

This is has been the core of the E/gamma Phase-II development (a separate class for which another PR will be made once this merges handles the phase-II specific things).

The goal of this class once merged is to add it to the HLTDEBUG format for RelVals and also in any future Phase-II re-HLT. Ultimately it may make it into other data tiers once it is stable but that depends on many other things.

Finally the observant reader will notice that this has a lot of overlap with the scouting egamma format, both store very similar information. However right now they serve different purposes (this is a validation / debugging / trigger dev format) and scouting has an analysis format with strong space constraints. This does not preclude in the future that they merge but right now that is not in the plan.

PR validation:

To get it to output, the following snippet can be added to an HLT workflow

process.load("RecoEgamma.EgammaHLTProducers.hltEgammaHLTExtra_cfi")
process.hltEgHLTOut = cms.EndPath(process.hltEgammaHLTExtra)
process.schedule.append(process.hltEgHLTOut)
process.FEVTDEBUGHLToutput.outputCommands.append("keep *_hltEgammaHLTExtra_*_*")

Happily used for phase-II HLT development.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32486/20392

  • This PR adds an extra 64KB to repository

  • Found files with invalid states:

    • RecoEgamma/EgammaHLTProducers/python/hltEgammaHLTExtraL1Seeded_cfi.py:
    • RecoEgamma/EgammaHLTProducers/python/hltEgammaHLTExtraUnseeded_cfi.py:

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32486/20409

  • This PR adds an extra 68KB to repository

  • Found files with invalid states:

    • RecoEgamma/EgammaHLTProducers/python/hltEgammaHLTExtraL1Seeded_cfi.py:
    • RecoEgamma/EgammaHLTProducers/python/hltEgammaHLTExtraUnseeded_cfi.py:

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Sam-Harper (Sam Harper) for master.

It involves the following packages:

DataFormats/EgammaReco
RecoEgamma/EgammaHLTProducers

@perrotta, @Martin-Grunewald, @slava77, @cmsbuild, @fwyzard, @jpata can you please review it and eventually sign? Thanks.
@sobhatta, @jainshilpi, @rovere, @lgray, @calderona, @HuguesBrun, @Fedespring, @afiqaize, @varuns23, @cericeci, @ram1123 this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@Martin-Grunewald
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c18dd3/11666/summary.html
CMSSW: CMSSW_11_3_X_2020-12-14-2300/slc7_amd64_gcc900

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 36
  • DQMHistoTests: Total histograms compared: 2747284
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2747261
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 35 files compared)
  • Checked 153 log files, 37 edm output root files, 36 DQM output files

@Martin-Grunewald
Copy link
Contributor

+1

}

namespace reco {
class EgTrigSumObj {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps EgammaTriggerSummary?
(just to be more readable and not go to even shorter EgTrSumOb)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer EgTrigSumObj as EgammaTriggerSummary suggests its a summary of the entire trigger rather than a specific e/gamma object. I could also go to EgHLTSumObj or EgHLTObj. However its not something I feel super strong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EgammaHLTSummaryObject ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do, I mean I'm trying to have a relatively short name. I think "EgHLT" should be an acceptable abbreviation which anybody using this class will figure out. I do concede on reflection "Sum" is a bit less clear. Obj I would imagine is also fairly clear.

With the summary, I'm trying to get across that its an object which is produced at the end of the HLT rather than an object which is part of the HLT decision process. Now this is important to get right as there will be very soon be a MuHLTSumObj, a TauHLTSumObj, a BTagHLTSumObj and a JetHLTSumObj (although jet & btag will likely merge).

So I do appreciate suggestions here, I know I'm terrible at names.

I could live with
EgHLTSummaryObject
MuHLTSummaryObject
TauHLTSummaryObject
JetHLTSummaryObject

but I would prefer something a bit shorter but not to the point thats confusing.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still use Muon and Egamma, these are common class naming patterns in CMSSW and it's better to be consistent.

With the summary, I'm trying to get across that its an object which is produced at the end of the HLT rather than an object which is part of the HLT decision process.

Looking at existing patterns,
isn't a TriggerObject from DataFormats/HLTReco/interface/TriggerObject.h supposed to cover the final objects as well?
Then, EgammaTriggerObject seems like a natural extension.

But if the endpoint needs to be included, berhaps EgammaTriggerResultObject ?

given the trigger/HLT specificity, does this have to be in the reco namespace?

Also, given the ideas for other variants, wouldn't it be better to inherit from smth like TriggerObject for some common interface?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently #32486 (comment)
and #32486 (comment) were ignored.
It would be nice to at least have a comment why they were not considered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry they were not ignored, just missed (my bad, I need to fix my email filters better.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently #32486 (comment)
and #32486 (comment) were ignored.
It would be nice to at least have a comment why they were not considered.

Sorry Slava, I also missed them during the review.
Removing the "+1" while waiting for Sam making the update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this indeed might be better.

What do you think @fwyzard, @Martin-Grunewald

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EgammaTriggerObject is fine with me especially if you extend TriggerObject.

float eta_;
float phi_;
bool hasPixelMatch_;
std::unordered_map<std::string, float> vars_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any other examples of unordered_map streamed in CMSSW?
@cms-sw/core-l2 do we have good support for this in root?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any other examples of unordered_map streamed in CMSSW?

It seems we do have a few

std::unordered_map<std::string, std::vector<float>> feature_map_;

std::unordered_map<uint32_t, edm::Ptr<C>> constituents_;
std::unordered_map<uint32_t, double> constituentsFraction_;

std::unordered_map<unsigned short, l1t::HGCalTower> towerMap_;

@cms-sw/core-l2 do we have good support for this in root?

Good question, @pcanal (@Dr15Jones) could you comment?

How many elements is the map expected to have?

Traditional alternative would be a sorted std::vector<std::pair<std::string, float>> (or separate std::vector<std::string> and std::vector<float>, those would allow to make varNames() trivial).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any other examples of unordered_map streamed in CMSSW?

It seems we do have a few

IIRC, DeepBoostedJetFeatures is effectively transient (although it is not declared as such): it is not saved.

I'm not familiar with the L1THGCal cases.

We've recently gone through a case with problems of streaming a usual std::map in cms-sw/cmsdist#6388 (comment)
If this data is important it may be worth to pick a different type, like a suggested vector<pair

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the spot, I had not considered this. I used std::unorderedmap simply for convenience but we get better data stability with a sorted std::vector then thats not a problem.

It will contain on average a maximum of ~about 20, possibly 30 entries I believe but I can check.

On the splitting it to std::string and std::vector, so I personally dislike having two vectors coupled to each other as in we need to ensure they are the same length, kept up to date etc. This is not overly difficult and I'm happy to go with yours and others suggestions on what would be the best practice to create a set of read only key,value pairs in a class.

<class name="edm::Wrapper<edm::ValueMap<reco::SuperClusterRef> >" />

<class name="reco::EgTrigSumObj" ClassVersion="10">
<version ClassVersion="10" checksum="2636406548"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use 3 as the initial version number of a class which has never been stored before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thank you I did not know that, much appreicated!

@perrotta
Copy link
Contributor

In general, even if there is some overlap I agree that the scouting class and this debug one can remain separated, as they serve different purposes and will probaly need to be updated and maintainent independently.

By the way @Sam-Harper , do you have some script or simple recipe to test that this new code is actually working?

@Sam-Harper
Copy link
Contributor Author

okay I've now done all the renaming (I had some other ideas how to do this but ultimate decided not to do them), I squished all the commits to avoid modifying packages in Reco and make the backport easier

@Sam-Harper
Copy link
Contributor Author

Oh and I named it trigger::EgammaObject (the trigger::EgammaTriggerObject seemed a bit redundant)

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32486/21028

  • This PR adds an extra 28KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2021

Pull request #32486 was updated. @Martin-Grunewald, @cmsbuild, @fwyzard can you please check and sign again.

@perrotta
Copy link
Contributor

perrotta commented Feb 5, 2021

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c18dd3/12737/summary.html
COMMIT: 9ae839c
CMSSW: CMSSW_11_3_X_2021-02-05-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/32486/12737/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2751765
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2751742
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 36 files compared)
  • Checked 156 log files, 37 edm output root files, 37 DQM output files

@Sam-Harper
Copy link
Contributor Author

@Martin-Grunewald , could you have a look at the new solution please and say if you're happy with it. Its now pure HLT :) Thanks!

@Martin-Grunewald
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 9, 2021

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants