Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JMENanoAOD] Rework for Prompt #35614

Closed

Conversation

nurfikri89
Copy link
Contributor

PR description:

This PR is a follow up for PR #34714. In PhysicsTools/NanoAOD/python/custom_jme_cff.py, Sequences have been changed to Tasks and JMENano can now run in Prompt. The JMENano workflows are enabled back again, which was disabled by PR #35140 while waiting for this PR.

Some rearrangements are done in PhysicsTools/PatAlgos/python/tools/jetCollectionTools.py so that packedPrimaryVertexAssociationJME can be used for CHS and Puppi. It is a fix following up on PR #33885.

PR validation:

  • passes the usual runTheMatrix test: runTheMatrix.py -l limited -i all --ibeos
  • passes the JMENano workflows: runTheMatrix.py -i all --ibeos -l 10224.15,11024.15,25202.15
  • passes the following command using the output of step2 from 11634.0 workflow as input:
cmsDriver.py step3 \
-s RAW2DIGI,L1Reco,RECO,RECOSIM,EI,PAT,NANO,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM \
--conditions auto:phase1_2021_realistic \
--datatier GEN-SIM-RECO,MINIAODSIM,NANOAODSIM,DQMIO \
-n 10 \
--geometry DB:Extended \
--era Run3,run3_nanoAOD_devel \
--eventcontent RECOSIM,MINIAODSIM,NANOEDMAODSIM,DQM \
--customise_commands="process.add_(cms.Service('InitRootHandlers', EnableIMT = cms.untracked.bool(False))) \n from PhysicsTools.NanoAOD.custom_jme_cff import PrepJMECustomNanoAOD_MC; PrepJMECustomNanoAOD_MC(process)\n"

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35614/25887

  • This PR adds an extra 32KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @nurfikri89 (Nurfikri Norjoharuddeen) for master.

It involves the following packages:

  • Configuration/PyReleaseValidation (pdmv, upgrade)
  • PhysicsTools/NanoAOD (xpog)
  • PhysicsTools/PatAlgos (reconstruction)

@gouskos, @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @fgolf, @kskovpen, @slava77, @jpata, @mariadalfonso can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @schoef, @emilbols, @kpedro88, @Martin-Grunewald, @mbluj, @ahinzmann, @demuller, @seemasharmafnal, @mmarionncern, @jdolen, @makortel, @fabiocos, @missirol, @slomeo, @jdamgov, @nhanvtran, @gkasieczka, @hatakeyamak, @andrzejnovak, @clelange, @AlexDeMoor, @swertz, @swozniewski, @JyothsnaKomaragiri, @gpetruc, @mariadalfonso this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35614/25888

  • This PR adds an extra 32KB to repository

@cmsbuild
Copy link
Contributor

Pull request #35614 was updated. @gouskos, @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @fgolf, @kskovpen, @slava77, @jpata, @mariadalfonso can you please check and sign again.

@srimanob
Copy link
Contributor

test parameters:

  • workflows = 10224.15,11024.15,25202.15

@srimanob
Copy link
Contributor

@cmsbuild please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d651b8/19553/summary.html
COMMIT: 24feaf2
CMSSW: CMSSW_12_1_X_2021-10-11-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35614/19553/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-d651b8/10224.15_TTbar_13+2017PU_JMENano+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-d651b8/11024.15_TTbar_13+2018PU_JMENano+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoFakeHLTPU+HARVESTFakeHLTPU+Nano
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-d651b8/25202.15_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15MC_PU25_JME

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 40
  • DQMHistoTests: Total histograms compared: 2798082
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2798054
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 39 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 169 log files, 37 edm output root files, 40 DQM output files
  • TriggerResults: no differences found

@mariadalfonso
Copy link
Contributor

mariadalfonso commented Oct 19, 2021

@nurfikri89

Thanks for the input. These pie charts are very useful.
Should be evident that the JMEnano can be more optimised (multiple running of PUjetID ...)
Suggest to start profiling your setup with igprof - function by function - now that we are in the transition of the Run2/Run3. This will have double benefit: it improves the miniAOD and also your custom private setup.

The fact that JME are basically reminiAOD is not considered ok, especially when suddenly 12B requests eat priority respect to the other samples.
I think you can do the same with much less CPU.

@mariadalfonso
Copy link
Contributor

hold

(to be addressed first #35614 (comment))

@cmsbuild
Copy link
Contributor

Pull request has been put on hold by @mariadalfonso
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

@cmsbuild cmsbuild added the hold label Oct 19, 2021
@nurfikri89
Copy link
Contributor Author

@nurfikri89

Thanks for the input. These pie charts are very useful. Should be evident that the JMEnano can be more optimised (multiple running of PUjetID ...) Suggest to start profiling your setup with igprof - function by function - now that we are in the transition of the Run2/Run3. This will have double benefit: it improves the miniAOD and also your custom private setup.

The fact that JME are basically reminiAOD is not considered ok, especially when suddenly 12B requests eat priority respect to the other samples. I think you can do the same with much less CPU.

@mariadalfonso
Optimizing further the CPU time is something we will look into but does it need to be a part of this PR? This PR updates the JMENano such that it is in sync with the default Nano. As of this moment, JMENano does not run in the master branch. In my opinion, we should merge this PR and then start working on the optimization in another PR.

@mariadalfonso
Copy link
Contributor

@nurfikri89
Let's focus on having a sustainable product for Run3 and beyond.
Take shock of your 12B events PPD is giving you (3 months production of central resources, unique case in CMS).
You might rethink the whole process completely (probably) or just optimise the technical aspects.

@rappoccio
Copy link
Contributor

@mariadalfonso I am puzzled why you think 12B nano events is 3 months production of central resources?

@mariadalfonso
Copy link
Contributor

@rappoccio

JMEnano wf runs mini sequences to cluster jets and recompute properties, all very intensive cpu.
It stores nano format at the end.

JMEnano takes ~ same time of the mini. this is measured here #35614 (comment)

Of course central nano (bless by xPOG) is a different story: runs runs very little stuff, mostly package informations for analysis. Do not confuse the two products.

P.S.
Also look you progress bar of the 12B injected:
tool 1 months to finalise 4B so another 2 to complete the 12B
https://cms-pdmv.cern.ch/pmp/historical?r=*JMENanoAOD*

@mariadalfonso
Copy link
Contributor

mariadalfonso commented Oct 26, 2021

@nurfikri89
any progress on the CPU optimisation of your JMEntuples ?
can you add the pie charts for the memory usage: again comparing mini/nano/JMEnano.

The JMEnanoAOD requests marked as done, are making tiny progress (4.7/ 11.07 B 40% after 5 weeks after the injections)

Screen Shot 2021-10-26 at 15 33 03

@smuzaffar smuzaffar modified the milestones: CMSSW_12_1_X, CMSSW_12_2_X Oct 29, 2021
@mariadalfonso
Copy link
Contributor

@nurfikri89

based on your CPU piechart #35614 (comment)
you should

  1. switch off the HBB/HCC jet regression those target the VH analysis and are specialised to one particular jet and do apply to your extra custom collections
  2. pileup jetID: the DZ call take most of the cpu, this should be just the difference of two numbers.
    Please optimise how this is called
    https://github.com/cms-sw/cmssw/blob/master/RecoJets/JetProducers/src/PileupJetIdAlgo.cc#L261
    We are waiting this also for the central nano and miniAOD
  3. please check the the deepFlavour: the inference should be done per event and not per jet; only for the high PT jet you store at the end and not done down to pt>5.

once you take care of this I expect you reduce of 50% your CPU

@nurfikri89
Copy link
Contributor Author

Hi @mariadalfonso

My apologies for the very late reply to your comment. I was (and still is) too swamped with teaching and faculty matters.

@nurfikri89 any progress on the CPU optimisation of your JMEntuples ? can you add the pie charts for the memory usage: again comparing mini/nano/JMEnano.

Here are the memory usage for different productions:

Default Nano
igrprof_memory_nano_default

JME Nano
igrprof_memory_nano_jme

ULMiniAODv2
igrprof_memory_miniAOD

  1. switch off the HBB/HCC jet regression those target the VH analysis and are specialised to one particular jet and do apply to your extra custom collections
  2. pileup jetID: the DZ call take most of the cpu, this should be just the difference of two numbers.
    Please optimise how this is called
    https://github.com/cms-sw/cmssw/blob/master/RecoJets/JetProducers/src/PileupJetIdAlgo.cc#L261
    We are waiting this also for the central nano and miniAOD
  3. please check the the deepFlavour: the inference should be done per event and not per jet; only for the high PT jet you store at the end and not done down to pt>5.

once you take care of this I expect you reduce of 50% your CPU

Thank you for the suggestions. (1) should be easy. I am still looking at optimizing (2). (3) sounds like a very good step to take although I will have to consult the relevant experts on how to do this.

@mariadalfonso
Copy link
Contributor

@nurfikri89

The DQM of this is totally missing and since now are not your private production anymore this is will not be allowed.
please add something along this line PhysicsTools/NanoAOD/python/nanoDQM_cfi.py of course in a separate/dedicated file. I understdand that this was already asked years ago.

Please post here the result of the DQM.
This will need to be reported at each PR.

@srimanob
Copy link
Contributor

Any follow up on this @nurfikri89 @cms-sw/jetmet-pog-l2 @cms-sw/xpog-l2

@qliphy
Copy link
Contributor

qliphy commented Mar 14, 2022

close this now. @nurfikri89 please open a new updated one if needed.

@qliphy qliphy closed this Mar 14, 2022
@nurfikri89 nurfikri89 deleted the from1210pre4_jmenano_run3prompt branch September 1, 2022 16:00
@nurfikri89 nurfikri89 restored the from1210pre4_jmenano_run3prompt branch September 1, 2022 16:00
@nurfikri89 nurfikri89 deleted the from1210pre4_jmenano_run3prompt branch September 1, 2022 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants