Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeepVertex and DeepJet+DeepVertex combination in release (onnx inference) #31988

Merged
merged 14 commits into from Nov 13, 2020

Conversation

leonardogiannini
Copy link
Contributor

@leonardogiannini leonardogiannini commented Oct 29, 2020

PR description:

The PR adds the new training of the DeepVertex taggers and the Combination with DeepJet. The BTV POG wants to have them into the release but not in the standard production for future deployment studies.

The BTV group will take care of enabling the taggers for a BTV only dedicated production.

In order to test, expand the configuration RecoBTag/ONNXRuntime/test/test_deep_vertexcomb_cfg.py
and turn to "True" the switches "compute_probabilities" and "run_deepVertex", which are set by default to "False" (module pfDeepFlavourTagInfos). Alternatively, modify RecoBTag/FeatureTools/plugins/DeepFlavourTagInfoProducer.cc before compiling.

PR validation:

the PR is presented here https://docs.google.com/presentation/d/1vWKUH2ANOMQWEdp-tT-9S8WW29RPKoe8m8zNzdk1O6M/edit#slide=id.ga4e5d7cc6d_0_79

After the meeting the agreement was to add taggers with ONNX inference

The performances validated in CMSSW are in slide 3 (purple line)

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31988/19471

  • This PR adds an extra 24KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31988/19472

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @leonardogiannini for master.

It involves the following packages:

PhysicsTools/PatAlgos
RecoBTag/ONNXRuntime

@perrotta, @jpata, @cmsbuild, @santocch, @slava77 can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @JyothsnaKomaragiri, @ahinzmann, @smoortga, @riga, @schoef, @mmarionncern, @jdamgov, @jdolen, @nhanvtran, @gkasieczka, @clelange, @emilbols, @hatakeyamak, @ferencek, @gpetruc, @andrzejnovak, @mariadalfonso, @seemasharmafnal this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@santocch
Copy link

santocch commented Nov 4, 2020

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 4, 2020

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 4, 2020

+1
Tested at: c01ee19
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-36ce35/10494/summary.html
CMSSW: CMSSW_11_2_X_2020-11-03-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

Comparison job queued.

@leonardogiannini
Copy link
Contributor Author

@jpata
Copy link
Contributor

jpata commented Nov 11, 2020

The profiles for cpu and memory of a ReMiniAOD (step2 of 1325.518)

Thanks for that, but I can't find DeepVertex in the latest profiles. Can you check if it was enabled in step2?

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-36ce35/10636/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2529296
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2529267
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 148 log files, 22 edm output root files, 35 DQM output files

@leonardogiannini
Copy link
Contributor Author

The profiles for cpu and memory of a ReMiniAOD (step2 of 1325.518)

Thanks for that, but I can't find DeepVertex in the latest profiles. Can you check if it was enabled in step2?

I suppose it's below 1000. It appears in the timing reports as usual together with the other taggers

TimeReport 0.012164 0.012164 0.012164 pfDeepCombinedJetTagsSlimmedDeepFlavour
TimeReport 0.007313 0.007313 0.007313 pfDeepFlavourJetTagsSlimmedDeepFlavour
TimeReport 0.008506 0.008506 0.008506 pfDeepFlavourTagInfosSlimmedDeepFlavour
TimeReport 0.008442 0.008442 0.008442 pfDeepVertexJetTagsSlimmedDeepFlavour

Now the inference is bit more optimized wrt to the PU200 profile, as it runs only on interesting jets and there are less jets overall in these samples. (I am running on /store/relval/CMSSW_10_6_4/RelValProdTTbar_13_pmx25ns/AODSIM/PUpmx25ns_106X_upgrade2018_realistic_v9-v1/10000/*)

@slava77
Copy link
Contributor

slava77 commented Nov 11, 2020

The profiles for cpu and memory of a ReMiniAOD (step2 of 1325.518)
are here
https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/Remini_mem
https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/Remini_cpu

https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/Remini_cpu/17

% from total c1 c tot
0.32 0.47 0.47 1 1 DeepCombinedONNXJetTagsProducer::produce(edm::Event&, edm::EventSetup const&)
0.19 0.27 0.27 1 1 DeepVertexONNXJetTagsProducer::produce(edm::Event&, edm::EventSetup const&)
0.17 0.25 0.25 1 1 DeepFlavourONNXJetTagsProducer::produce(edm::Event&, edm::EventSetup const&)

the "total" for the denominator is close to 35%.
So, if we run both DeepVertex and DeepCombined, they will be 3 times larger than DeepFlavour and by itself be around 1.5% of total miniAOD time, which seems still acceptable.

I don't see any changes in the TagInfos in this PR, but in the profiler
https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/Remini_cpu/637
btagbtvdeep::seedingTracksToFeatures is the largest contributor. Please remind me if it's a piece used by DeepFlavour already or if it's a part specific to the needs of DeepVertex/DeepCombined.

@leonardogiannini
Copy link
Contributor Author

leonardogiannini commented Nov 11, 2020

I made an additional check with 1000 events, here are the profiles
https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/cpu1K
https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/mem1K

in particular:
https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/cpu1K/17
https://legianni.web.cern.ch/legianni/cgi-bin/igprof-navigator/testMINI+bTag/mem1K/98

the TagInfos are not modified, since the modules are not in standard production. However, seedingTracksToFeatures must be switched on to run the test and compute the taggers. Anyway, DeepVertex and DeepCombined will not be run together other than in BTV workflows, so the extra time should be counted from DeepCombined and seedingTracksToFeatures only.

@slava77
Copy link
Contributor

slava77 commented Nov 11, 2020

However, seedingTracksToFeatures must be switched on to run the test and compute the taggers.

I see, indeed in DeepFlavourTagInfoProducer if (run_deepVertex_) { ... btagbtvdeep::seedingTracksToFeatures(.
this apparently triggers a larger cost increase than the inference part .
Some careful inspection and optimization of repeated calls better be done here soon.
This has some nested loop over tracks with some expensive computations repeated.

@slava77
Copy link
Contributor

slava77 commented Nov 11, 2020

Some careful inspection and optimization of repeated calls better be done here soon.

I created #32114 to keep track of the progress to improve the TagInfos code

@jpata
Copy link
Contributor

jpata commented Nov 12, 2020

+reconstruction

@riga
Copy link
Contributor

riga commented Nov 12, 2020

@jpata

Something that came up in a reco discussion: would we still need a TF evaluator in the future, for e.g. static compilation or GPU evaluation? In this case, it might be premature to remove it, however, the two evaluators should be kept in sync if both are kept in the release. I wonder what's the guidance from the ML prod group @riga @mialiu149?

We are working on the integration of TF GPU into cmsdist, but we cannot give a reliable timeline yet, so no objects from our side to remove the TF evaluator at this point.

@santocch
Copy link

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Copy link
Contributor

qliphy commented Nov 13, 2020

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants