Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to DNN-based strategy for outside-in seed generation in Muon HLT #37437

Merged
merged 1 commit into from Apr 6, 2022

Conversation

kondratyevd
Copy link
Contributor

@kondratyevd kondratyevd commented Apr 1, 2022

PR description:

This is a follow-up on #35237.
The DNN-based approach to outside-in seed generation for muon HLT has been revised.

  • The DNN model has been retrained to address the following issues:
    • Old models didn't perform well for low-pT muons.
    • Overall HLT timing for old models was higher than that for the baseline approach w/o DNN.
    • The approach of training two different models (barrel & endcap) was found to be inconvenient for optimization, and didn't provide a significant gain in terms of overall efficiency and timing.
  • Updates to DNN training:
    • Low-pT muons are included in training.
    • Hyperparameters optimized automatically using Keras Tuner (previously architecture was chosen manually). The optimized architecture features two hidden layers, with 1024 and 2048 nodes, respectively.
  • Change to TSGForOIDNN.cc plugin: remove split into barrel and endcap; use one model for all muons.

The updated model is added to cms-data: cms-data/RecoMuon-TrackerSeedGenerator#4. The older classifier models will not work with the updated plugin.

PR validation:

Slides describing the update: link.
The performance was checked on J/psi (low-pT) and Drell-Yan (high-pT) datasets. Overall HLT efficiency is similar for all models, while the timing for the newly trained models is improved.

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 1, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37437/29132

  • This PR adds an extra 20KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 1, 2022

A new Pull Request was created by @kondratyevd (Dmitry Kondratyev) for master.

It involves the following packages:

  • RecoMuon/TrackerSeedGenerator (reconstruction)

@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@HuguesBrun, @abbiendi, @Fedespring, @bellan, @sscruz, @jhgoh, @CeliaFernandez, @trocino, @cericeci, @rociovilar this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@JanFSchulte
Copy link
Contributor

@missirol This together with the cms-data PR is for the HLT and would ideally also be backported to 12_3_0 if time permits.

@missirol
Copy link
Contributor

missirol commented Apr 2, 2022

Thanks for the info, @JanFSchulte . It looks like this PR is only updating the producer, so it does not really need to be tested together with cms-data/RecoMuon-TrackerSeedGenerator#4 , correct?

Concerning the backport to 12_3_X, I'm not sure how the backporting works for the cms-data update. If needed, you could ask in that PR.

@missirol
Copy link
Contributor

missirol commented Apr 2, 2022

urgent

MUO-HLT developers intend to have this update backported in time for 12_3_0.

("urgent" here means "to be backported in time for 12_3_0").

@cmsbuild cmsbuild added the urgent label Apr 2, 2022
@JanFSchulte
Copy link
Contributor

Yes, the cms-data PR does not affect the tests. In fact, the producer is not run in any tests at all, so they are meaningless for this PR.

@jpata
Copy link
Contributor

jpata commented Apr 4, 2022

assign hlt

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2022

New categories assigned: hlt

@missirol,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Martin-Grunewald
Copy link
Contributor

please test

@jpata
Copy link
Contributor

jpata commented Apr 4, 2022

The model file is fairly large (the largest I'm aware of in CMSSW):

BIN +10.1 MB OIseeding/DNNclassifier_Run3_inclusive.pb

Have you rechecked memory and CPU performance of the model? I suppose it's for HLT to evaluate if it's appropriate (since AFAIK this doesn't run offline), but just to be aware.

@missirol
Copy link
Contributor

missirol commented Apr 4, 2022

Have you rechecked memory and CPU performance of the model? I suppose it's for HLT to evaluate if it's appropriate (since AFAIK this doesn't run offline), but just to be aware.

Thanks for pointing this out (for the record, this does not run at HLT either yet, not even the previous version of this DNN did).

@JanFSchulte @khaosmos93 @kondratyevd , for CPU timing I see the slides have the numbers for HLT, but do you also have numbers for the amount of memory used by the model?

( cc: @silviodonato )

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-553822/23636/summary.html
COMMIT: 52ca744
CMSSW: CMSSW_12_4_X_2022-04-03-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37437/23636/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 12 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3593039
  • DQMHistoTests: Total failures: 25
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3592991
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@missirol
Copy link
Contributor

missirol commented Apr 4, 2022

+hlt

  • not validated by PR tests, relies on validation done by the MUO POG in the context of HLT development

@jpata
Copy link
Contributor

jpata commented Apr 5, 2022

+reconstruction

  • does not / is not intended to run offline, nothing for us to validate here (code changes look reasonable)

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 5, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@missirol
Copy link
Contributor

missirol commented Apr 5, 2022

for CPU timing I see the slides have the numbers for HLT, but do you also have numbers for the amount of memory used by the model?

@JanFSchulte @khaosmos93 @kondratyevd , please address this point, and open a backport of this PR to 12_3_X.

@kondratyevd
Copy link
Contributor Author

@jpata @missirol

I have generated the igprof memory reports, but I didn't manage to turn them into interpretable web-navigable reports. The sql3 files are here (lxplus):
/afs/cern.ch/user/d/dkondrat/public/OIseeding_reports/ig_reports_apr4/
There are two files for comparison - with and w/o using DNN-based strategy.

@jpata
Copy link
Contributor

jpata commented Apr 5, 2022

I uploaded your memory profiles here:

Since this is probably a HLT workflow, I'm not really able to tell much about the expected/observed use from these.

@kondratyevd
Copy link
Contributor Author

@jpata ah, I used the default igprof instructions and forgot to take into account Slava's recommendations to the previous PR: #35237 (comment).
I am going to reproduce these reports shortly.

@kondratyevd
Copy link
Contributor Author

@jpata the new reports are available at /afs/cern.ch/user/d/dkondrat/public/OIseeding_reports/ig_reports_apr5/.
Could you please upload them again? Hopefully they are more informative now.

@kondratyevd
Copy link
Contributor Author

Looking at the logs, it seems that unfortunately in the most recent tests the input file failed to open.
I have fixed the issue and updated the files in /afs/cern.ch/user/d/dkondrat/public/OIseeding_reports/ig_reports_apr5/.
@jpata could you please copy them once again? I apologize for the inconvenience.

@jpata
Copy link
Contributor

jpata commented Apr 6, 2022

Sure, I reuploaded them. The links are the same as above. Looks like the DNN is among the top modules by memory in this workflow, but I don't have a reference point for HLT, so it doesn't affect the reco signature.

Screenshot 2022-04-06 at 10 11 49

Screenshot 2022-04-06 at 10 13 25

@perrotta
Copy link
Contributor

perrotta commented Apr 6, 2022

+1

  • It updates (optimizes) an existing model for outside-in seed generation in Muon HLT
  • This model is not used in the current developments for the actual HLT menu in Run3, only for tests: before using it the large memory consumption issue should be taken into account (partially addressed here bye use one model for all muons and remove split into barrel and endcap

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants