Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Electron MVA V2 and general code changes #23473

Merged
merged 7 commits into from Jun 26, 2018
Merged

Electron MVA V2 and general code changes #23473

merged 7 commits into from Jun 26, 2018

Conversation

guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Jun 4, 2018

  • Implemented Fall17 Electron MVA ID V2.
  • Restructured Electron and Photon MVA ID such that input variables are automatically obtained given their name in the XML weight file.
  • MVA ID categorization and cuts now specified in python config file.
  • Added Electron ntuplizer to train/validate MVA implementation
  • Added Photon ntuplizer to validate Photon MVA implementation.

The changes go together with cms-data/RecoEgamma-ElectronIdentification#9.

It is interesting for @Sam-Harper, @lsoffi, @michelif, @UAEDF-tomc.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2018

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2018

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-23473/5020

Code check has found code style and quality issues which could be resolved by applying a patch in https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-23473/5020/git-diff.patch
e.g. curl https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-23473/5020/git-diff.patch | patch -p1

You can run scram build code-checks to apply code checks directly

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2018

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2018

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2018

A new Pull Request was created by @guitargeek (Jonas Rembser) for master.

It involves the following packages:

RecoEgamma/EgammaTools
RecoEgamma/ElectronIdentification
RecoEgamma/PhotonIdentification

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@jainshilpi, @Sam-Harper, @varuns23, @lgray this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Jun 5, 2018

@cmsbuild please test with cms-sw/cmsdist#4056

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2018

The tests are being triggered in jenkins.
Using externals from cms-sw/cmsdist#4056
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/28477/console

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2018

-1

Tested at: fd07032

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23473/28477/summary.html

I found follow errors while testing this PR

Failed tests: UnitTests RelVals

  • Unit Tests:

I found errors in the following unit tests:

---> test runtestRecoEgammaElectronIdentification had ERRORS
---> test runtestPhysicsToolsNanoAOD had ERRORS

  • RelVals:

When I ran the RelVals I found an error in the following worklfows:
4.53 step3

runTheMatrix-results/4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step3_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log

136.7611 step2
runTheMatrix-results/136.7611_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM/step2_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM.log

136.731 step3
runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2/step3_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2.log

136.8311 step2
runTheMatrix-results/136.8311_RunJetHT2017F_reminiaod+RunJetHT2017F_reminiaod+REMINIAOD_data2017+HARVEST2017_REMINIAOD_data2017/step2_RunJetHT2017F_reminiaod+RunJetHT2017F_reminiaod+REMINIAOD_data2017+HARVEST2017_REMINIAOD_data2017.log

135.4 step3
runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step3_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log

136.788 step3
runTheMatrix-results/136.788_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017/step3_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017.log

9.0 step3
runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log

1325.7 step2
runTheMatrix-results/1325.7_TTbar_13_94Xv2NanoAODINPUT+TTbar_13_94Xv2NanoAODINPUT+NANOEDMMC2017_94XMiniAODv2+HARVESTNANOAODMC2017_94XMiniAODv2/step2_TTbar_13_94Xv2NanoAODINPUT+TTbar_13_94Xv2NanoAODINPUT+NANOEDMMC2017_94XMiniAODv2+HARVESTNANOAODMC2017_94XMiniAODv2.log

25.0 step3
runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log

1306.0 step3
runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15/step3_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15.log

1330.0 step3
runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM/step3_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM.log

136.85 step3
runTheMatrix-results/136.85_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM/step3_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM.log

10042.0 step3
runTheMatrix-results/10042.0_ZMM_13+ZMM_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017+ALCAFull_2017/step3_ZMM_13+ZMM_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017+ALCAFull_2017.log

25202.0 step3
runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25.log

10024.0 step3
runTheMatrix-results/10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017+ALCAFull_2017/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017+ALCAFull_2017.log

10224.0 step3
runTheMatrix-results/10224.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFull+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFull+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU.log

10824.0 step3
runTheMatrix-results/10824.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2018_GenSimFull+DigiFull_2018+RecoFull_2018+HARVESTFull_2018+ALCAFull_2018/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2018_GenSimFull+DigiFull_2018+RecoFull_2018+HARVESTFull_2018+ALCAFull_2018.log

11624.0 step3
runTheMatrix-results/11624.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2019_GenSimFull+DigiFull_2019+RecoFull_2019+HARVESTFull_2019+ALCAFull_2019/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2019_GenSimFull+DigiFull_2019+RecoFull_2019+HARVESTFull_2019+ALCAFull_2019.log

20034.0 step3
runTheMatrix-results/20034.0_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D17_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D17+RecoFullGlobal_2023D17+HARVESTFullGlobal_2023D17/step3_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D17_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D17+RecoFullGlobal_2023D17+HARVESTFullGlobal_2023D17.log

21234.0 step3
runTheMatrix-results/21234.0_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D21_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D21+RecoFullGlobal_2023D21+HARVESTFullGlobal_2023D21/step3_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D21_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D21+RecoFullGlobal_2023D21+HARVESTFullGlobal_2023D21.log

20434.0 step3
runTheMatrix-results/20434.0_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D19_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D19+RecoFullGlobal_2023D19+HARVESTFullGlobal_2023D19/step3_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D19_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D19+RecoFullGlobal_2023D19+HARVESTFullGlobal_2023D19.log

250202.181 step4
runTheMatrix-results/250202.181_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25/step4_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25.log

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2018

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23473/28844/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 222 differences found in the comparisons
  • DQMHistoTests: Total files compared: 31
  • DQMHistoTests: Total histograms compared: 2899480
  • DQMHistoTests: Total failures: 26
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2899264
  • DQMHistoTests: Total skipped: 190
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 30 files compared)
  • Checked 128 log files, 14 edm output root files, 31 DQM output files

@perrotta
Copy link
Contributor

I would like to understand the new eleID output as it got updated by this PR, e.g. https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_10_2_X_2018-06-22-2300+23473/27222/validateJR/all_mini_OldVSNew_RunSinglePh2017B136p788/

Overall: could you please check the output, confirm that it does correspond to what you expect, and possibly also explain here what do you actually expect?

@guitargeek
Copy link
Contributor Author

guitargeek commented Jun 25, 2018

Hi Andrea, thanks for pointing out these plots. Yes, 7 new IDs is what I expected. The iso V2 has 4 working points because it's including the HZZ working point:
https://github.com/guitargeek/cmssw/blob/ElectronID_MVA2017_V2_102X/RecoEgamma/ElectronIdentification/python/Identification/mvaElectronID_Fall17_iso_V2_cff.py#L80
I didn't include it in NanoAOD though because the HZZ analysis doesn't really need it there.

What is unexpected for me is that some of the old IDs got changed, this should not be the case. At least the MVA values didn't change, otherwise we would have seen that too. I think it's just related to the way the IDs are ordered in MiniAOD:
https://github.com/cms-sw/cmssw/pull/23473/files#diff-76dede28e7751d7bb6c5c9542de4155cR282

I'm still trying to understand what's going on exactly, but how are the IDs ordered in MiniAOD? Alphabetically or in the order as they are appear in miniAOD_tools.py? I'm investigating but I anticipate that I will come to the conclusion that the IDs got just reshuffled.

@slava77
Copy link
Contributor

slava77 commented Jun 25, 2018 via email

@guitargeek
Copy link
Contributor Author

I didn't know that we do this kind of comparisons, otherwise I wouldn't have inserted them in the middle...

Slava the reason why I'm alarmed it because the number of IDs before the insertion is 1+4+4+3+3 = 15. While the in plots, the first 16 IDs agree, which is one too much. So either there is one additional agreeing by chance (unlikely), or there is something going on which I don't understand.

@slava77
Copy link
Contributor

slava77 commented Jun 25, 2018 via email

@guitargeek
Copy link
Contributor Author

guitargeek commented Jun 25, 2018

I don't know which module I should run edmConfigDump on, since miniAOD_tools.py is not a cfg file...Can you please help me out and tell me more specifically what you'd do? Otherwise I'm just trying to figure it out myself.

@slava77
Copy link
Contributor

slava77 commented Jun 25, 2018 via email

@guitargeek
Copy link
Contributor Author

guitargeek commented Jun 25, 2018

Ok thank you but now I already dumped the pairs directly from there:
https://github.com/cms-sw/cmssw/blob/master/DataFormats/PatCandidates/interface/Electron.h#L144

Let's check this file here form the tests:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23473/28844/runTheMatrix-results/136.788_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017/step3_inMINIAOD.root

So if I loop over the that vector<pair<string, float>> electronIDs and print it out I get:

cutBasedElectronID-Fall17-94X-V1-loose 0
cutBasedElectronID-Fall17-94X-V1-medium 0
cutBasedElectronID-Fall17-94X-V1-tight 0
cutBasedElectronID-Fall17-94X-V1-veto 1
cutBasedElectronID-Fall17-94X-V2-loose 0
cutBasedElectronID-Fall17-94X-V2-medium 0
cutBasedElectronID-Fall17-94X-V2-tight 0
cutBasedElectronID-Fall17-94X-V2-veto 0
cutBasedElectronID-Summer16-80X-V1-loose 0
cutBasedElectronID-Summer16-80X-V1-medium 0
cutBasedElectronID-Summer16-80X-V1-tight 0
cutBasedElectronID-Summer16-80X-V1-veto 1
heepElectronID-HEEPV70 0
mvaEleID-Fall17-iso-V1-wp80 0
mvaEleID-Fall17-iso-V1-wp90 0
mvaEleID-Fall17-iso-V1-wpLoose 1
mvaEleID-Fall17-iso-V2-wp80 0
mvaEleID-Fall17-iso-V2-wp90 0
mvaEleID-Fall17-iso-V2-wpHZZ 0
mvaEleID-Fall17-iso-V2-wpLoose 0
mvaEleID-Fall17-noIso-V1-wp80 0
mvaEleID-Fall17-noIso-V1-wp90 0
mvaEleID-Fall17-noIso-V1-wpLoose 1
mvaEleID-Fall17-noIso-V2-wp80 0
mvaEleID-Fall17-noIso-V2-wp90 0
mvaEleID-Fall17-noIso-V2-wpLoose 0
mvaEleID-Spring16-GeneralPurpose-V1-wp80 0
mvaEleID-Spring16-GeneralPurpose-V1-wp90 0
mvaEleID-Spring16-HZZ-V1-wpLoose 1

So it's indeed alphabetically sorted and the new IDs were inserted exactly in the place that we anticipated from the plots. Here my code for reproducing:

typedef std::pair<std::string,float> IdPair;

std::vector<IdPair> electronIDs = ele->electronIDs();
for (std::vector<IdPair>::const_iterator it = electronIDs.begin(), ed = electronIDs.end(); it != ed; ++it) {
    std::cout << it->first << " " << it->second << std::endl;
}

ele is a pat::Electron. It's maybe not optimal that the ordering is alphabetically because like this we will always get this confusion once new IDs are not added at the end of the alphabetical order.

For comparison, here the same thing for the equivalent file from a random PR where the electron ID was not changed:

https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23665/28846/runTheMatrix-results/136.788_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017/step3_inMINIAOD.root

cutBasedElectronID-Fall17-94X-V1-loose 0
cutBasedElectronID-Fall17-94X-V1-medium 0
cutBasedElectronID-Fall17-94X-V1-tight 0
cutBasedElectronID-Fall17-94X-V1-veto 0
cutBasedElectronID-Fall17-94X-V2-loose 0
cutBasedElectronID-Fall17-94X-V2-medium 0
cutBasedElectronID-Fall17-94X-V2-tight 0
cutBasedElectronID-Fall17-94X-V2-veto 0
cutBasedElectronID-Summer16-80X-V1-loose 0
cutBasedElectronID-Summer16-80X-V1-medium 0
cutBasedElectronID-Summer16-80X-V1-tight 0
cutBasedElectronID-Summer16-80X-V1-veto 0
heepElectronID-HEEPV70 0
mvaEleID-Fall17-iso-V1-wp80 0
mvaEleID-Fall17-iso-V1-wp90 0
mvaEleID-Fall17-iso-V1-wpLoose 0
mvaEleID-Fall17-noIso-V1-wp80 0
mvaEleID-Fall17-noIso-V1-wp90 0
mvaEleID-Fall17-noIso-V1-wpLoose 0
mvaEleID-Spring16-GeneralPurpose-V1-wp80 0
mvaEleID-Spring16-GeneralPurpose-V1-wp90 0
mvaEleID-Spring16-HZZ-V1-wpLoose 0

@perrotta
Copy link
Contributor

Thank you @guitargeek and @slava77 : everything seems to be as it should, then!

@perrotta
Copy link
Contributor

+1

  • Fall17 Electron MVA ID V2 are implemented and included in the miniAOD output
  • Related code is also cleaned and rationalised
  • Little time increase (which is fraction of ms's, anyhow) in the electron MVA ID producer is compatible with running it on a larger set of IDs
  • Reco output is unchanged; miniAOD output includes now the additional IDs
  • Merging this PR requires that the external [10.2.X] Ele MVA V2 weight files and gzipped all Run 2 weight files cmsdist#4056 is also merged

@fabiocos
Copy link
Contributor

+1

@arizzi @gpetruc please note that this affects also NanoAOD

@fabiocos
Copy link
Contributor

merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants