Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated HLT paths for EXODisappTrk skim #38919

Merged
merged 2 commits into from Aug 22, 2022

Conversation

carriganm95
Copy link
Contributor

PR description:

This PR changes the HLT paths for the EXO disappearing tracks skim. Previously the HLT path MC_PFMET was used for testing MC data. We are now removing this path and adding the HLT paths we plan to use for run 3. They are:

  "HLT_PFMET*_PFMHT*_IDTight_v*",
  "HLT_PFMETTypeOne*_PFMHT*_IDTight_v*",
  "HLT_PFMETNoMu*_PFMHTNoMu*_IDTight_v*", 
  "HLT_MET*_IsoTrk*_v*", 
  "HLT_PFMET*_*Cleaned_v*", 
  "HLT_Ele*_WPTight_Gsf_v*", 
  "HLT_Ele*_WPLoose_Gsf_v*", 
  "HLT_IsoMu*_v*", 
  "HLT_IsoMu*_TightChargedIsoPFTauHPS*_Trk1_eta2p1_SingleL1_v*", 
  "HLT_IsoMu*_eta2p1_TightChargedIsoPFTauHPS*_eta2p1_CrossL1_v*",
  "HLT_IsoMu*_MediumChargedIsoPFTauHPS20_Trk1_eta2p1_SingleL1_v*", 
  "HLT_MediumChargedIsoPFTau*HighPtRelaxedIso_Trk50_eta2p1_v*", 
  "HLT_VBF_DoubleMediumChargedIsoPFTauHPS20_Trk1_eta2p1_v*",
  "HLT_DoubleMediumDeepTauIsoPFTauHPS*_L2NN_eta2p1_v*",
  "HLT_DoubleMediumChargedIsoPFTauHPS*_Trk1_eta2p1_v*"

PR validation:

Ran test script over Run2022C data to check output file size. Used EGamma, SingleMuon, Tau, and MET datasets for testing because these are the datasets used by the disappearing tracks analysis.

MET
Total Events: 136898
Total Size: 2.0 GB
Number of Files: 91
Average Size: 21.9 MB
Dataset Size: 568.65 GB
% Size Saved: 0.35%

Tau
Total Events: 91896
Total Size: 1.2 GB
Number of Files: 91
Average Size: 13.2 MB
Dataset Size: 210.15 GB
% Size Saved: 0.57%

SingleMuon
Total Events: 1828792
Total Size: 7.9 GB
Number of Files: 84
Average Size: 96.0 MB
Dataset Size: 2436.23 GB
% Size Saved: 0.32%

EGamma
Total Events: 1715084
Total Size: 12.1 GB
Number of Files: 82
Average Size: 151.7 MB
Dataset Size: 6280.43 GB
% Size Saved: 0.19%

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 1, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38919/31366

  • This PR adds an extra 12KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 1, 2022

A new Pull Request was created by @carriganm95 for master.

It involves the following packages:

  • Configuration/Skimming (pdmv)

@cmsbuild, @bbilin, @kskovpen, @jordan-martins can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @missirol, @fabiocos this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@kskovpen
Copy link
Contributor

kskovpen commented Aug 1, 2022

please test

@missirol
Copy link
Contributor

missirol commented Aug 1, 2022

@carriganm95

One question from HLT (just for my own education): how is this list of triggers decided?

For example, the latest HLT menu also includes new triggers like HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v* and HLT_PFMETNoMu*_PFMHTNoMu*_IDTight_FilterHF_v*, but those aren't included in the current list.

@carriganm95
Copy link
Contributor Author

@carriganm95

One question from HLT (just for my own education): how is this list of triggers decided?

For example, the latest HLT menu also includes new triggers like HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v* and HLT_PFMETNoMu*_PFMHTNoMu*_IDTight_FilterHF_v*, but those aren't included in the current list.

For the MET HLT we did not include it because it was not used in the run 2 analysis. I think you are right that it could be useful to us and so we can add it in.

For the tau HLT we used to use HLT_MediumChargedIsoPFTau50_Trk30_eta2p1_1pr_v* and this was removed from the list in run 3. Instead we have added the above triggers because we are unsure what will work for us in the end. Again I think adding the trigger you suggested is a good idea. We only use the taus as a control sample to estimate the probability that a non reconstructed tau could pass our offline selections. The triggers we have listed should give us enough taus to accomplish that but having more cannot hurt.

Comment on lines 21 to 24
"HLT_IsoMu*_v*",
"HLT_IsoMu*_TightChargedIsoPFTauHPS*_Trk1_eta2p1_SingleL1_v*",
"HLT_IsoMu*_eta2p1_TightChargedIsoPFTauHPS*_eta2p1_CrossL1_v*",
"HLT_IsoMu*_MediumChargedIsoPFTauHPS20_Trk1_eta2p1_SingleL1_v*",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that the first pattern is already a superset of the last 3 (and includes more than those, e.g. HLT_IsoMu50_AK8PFJet230_SoftDropMass40_v).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been fixed

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 1, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df5988/26568/summary.html
COMMIT: a9465cf
CMSSW: CMSSW_12_5_X_2022-08-01-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38919/26568/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3669004
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3668974
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 210 log files, 47 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

removed duplicate paths, added tau paths
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38919/31486

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

Pull request #38919 was updated. @cmsbuild, @bbilin, @kskovpen, @jordan-martins can you please check and sign again.

@carriganm95
Copy link
Contributor Author

@kskovpen can this be checked again? The updates were following the advice from @missirol

@kskovpen
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df5988/26835/summary.html
COMMIT: 4a908c3
CMSSW: CMSSW_12_5_X_2022-08-15-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38919/26835/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3692476
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3692446
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 212 log files, 49 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@kskovpen
Copy link
Contributor

+pdmv

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

Wildcards are usually discouraged, in order to avoid unwanted inclusion of already existing or lately included paths in the menu. You made a widespread usage of those wildcards here:

  • Could you please check confirm that they are all justified, and you don't list every HLT path separately because you want to be ready to accept possible new ones if lately added?
  • Could @cms-sw/hlt-l2 (re)confirm that they support tall the usages of wildcards here, instead of listing the paths one by one?

@missirol
Copy link
Contributor

Could hlt-l2 (re)confirm that they support tall the usages of wildcards here, instead of listing the paths one by one?

Regarding what the plugin HLTHighLevel supports, it does support the use of the * wildcard, and not much more (internally, it uses the utilities from RegexMatch.h).

if (edm::is_glob(pattern)) {

std::vector<std::vector<std::string>::const_iterator> regexMatch(std::vector<std::string> const& strings,

Regarding the use of wildcards itself ..

Wildcards are usually discouraged, in order to avoid unwanted inclusion of already existing or lately included paths in the menu.

I would have the same concern (for example, HLT_IsoMu* seems too inclusive to me, as I tried to point out in #38919 (comment)). On the other hand, it's not really my decision (and I'm not familiar enough with this use case).

@Martin-Grunewald
Copy link
Contributor

Either way a change may (or may not) be needed in case new paths are added or old paths are removed. If it is clear that all possible current & future matches are supposed to be skimmed, then I suppose those wildcards are ok.

@perrotta
Copy link
Contributor

Either way a change may (or may not) be needed in case new paths are added or old paths are removed. If it is clear that all possible current & future matches are supposed to be skimmed, then I suppose those wildcards are ok.

Indeed.
Once it is made clear that "all possible current & future matches are supposed to be skimmed" we will merge this PR

@perrotta
Copy link
Contributor

Either way a change may (or may not) be needed in case new paths are added or old paths are removed. If it is clear that all possible current & future matches are supposed to be skimmed, then I suppose those wildcards are ok.

Indeed. Once it is made clear that "all possible current & future matches are supposed to be skimmed" we will merge this PR

@carriganm95 could you please either confirm that all possible current & future matches corresponding to the wildcards as set here are supposed to be skimmed, or modify the PR accordingly?

@carriganm95
Copy link
Contributor Author

@perrotta yes I can confirm that all of the current and future matches are what we want to skim. We have intentionally made this general so that we do not have to go back and reskim later.

@perrotta
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants