Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce (Robust)ParTAK4 jet tagger, DeepJet model update for Run 3, remove DeepCSV from nano #41275

Merged
merged 19 commits into from May 4, 2023

Conversation

AnnikaStein
Copy link
Contributor

@AnnikaStein AnnikaStein commented Apr 4, 2023

PR description:

This is to add (Robust)ParticleTransformerAK4, in particular for Mini and Nano, update the DeepJet (aka DeepFlavour) model and remove outdated DeepCSV (aka bTagDeep…) entries for Run 3. New models have been trained on 122X samples (PUPPI). This refers to multiple issues on GitLab, https://gitlab.cern.ch/cms-nanoAOD/xpog-coordination/-/issues/61 https://gitlab.cern.ch/cms-nanoAOD/xpog-coordination/-/issues/56

It’s a follow-up of PR #40706 (where DataFormats were defined, which are filled with meaning now).

We are using model files of cms-data/RecoBTag-Combined#51 and as discussed in https://indico.cern.ch/event/1263008/#10-ak4-jets-tagging-algorithms we were able to introduce a well performing, robust and yet quicker ParT model (for inference).

One of the most comprehensive overviews on the new tagger along with the different developments that go in was given lately: https://indico.cern.ch/event/1218506/#6-particle-transformer-run-3-t
The new concept we want to introduce in RecoBTag is adversarial training, e.g. explained in the jet tagging context here https://doi.org/10.1007/s41781-022-00087-1, with first full CMS application here https://cds.cern.ch/record/2839919. Implications / advantages have been studied already for earlier versions of ParTAK4 (e.g. https://indico.cern.ch/event/1218501/#4-adversarial-training-for-par, https://indico.cern.ch/event/1218499/#5-adversarial-training-for-par). More (written) documentation to follow.

Collaboration with @AlexDeMoor

PR validation:

The comparisons between CMSSW and PyTorch have been run lately, results presented here: https://indico.cern.ch/event/1272115/#1-news

A set of basic checks was run:

scram b runtests
scram build code-checks
scram build code-format

Custom tests (which we think shall not go in the official repo, but would be available if needed: AnnikaStein@07aac71) have been run to perform the AOD -> MiniAOD step, („step1_PAT.py“) as well as „step1_NANO.py“ to go from Mini to Nano, both without errors, adding the new branches as expected.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

This is for master, but a backport to 130X is planned.

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2023

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41275/35041

ERROR: Build errors found during clang-tidy run.

RecoBTag/FeatureTools/plugins/ParticleTransformerAK4TagInfoProducer.cc:314:11: error: no matching function for call to 'packedCandidateToFeatures' [clang-diagnostic-error]
          btagbtvdeep::packedCandidateToFeatures(packed_cand,
          ^
RecoBTag/FeatureTools/interface/ChargedCandidateConverter.h:77:8: note: candidate function not viable: no known conversion from 'btagbtvdeep::ChargedCandidateFeatures' to 'const float' for 6th argument
--
RecoBTag/FeatureTools/plugins/ParticleTransformerAK4TagInfoProducer.cc:350:11: error: no matching function for call to 'recoCandidateToFeatures' [clang-diagnostic-error]
          btagbtvdeep::recoCandidateToFeatures(reco_cand,
          ^
RecoBTag/FeatureTools/interface/ChargedCandidateConverter.h:88:8: note: candidate function not viable: no known conversion from 'edm::Ref<std::vector<reco::Vertex>, reco::Vertex, edm::refhelper::FindUsingAdvance<std::vector<reco::Vertex>, reco::Vertex>>' to 'const int' for 8th argument
--
RecoBTag/FeatureTools/plugins/ParticleTransformerAK4TagInfoProducer.cc:369:11: error: no matching function for call to 'packedCandidateToFeatures' [clang-diagnostic-error]
          btagbtvdeep::packedCandidateToFeatures(
          ^
RecoBTag/FeatureTools/interface/NeutralCandidateConverter.h:14:8: note: candidate function not viable: requires 7 arguments, but 5 were provided
--
RecoBTag/FeatureTools/plugins/ParticleTransformerAK4TagInfoProducer.cc:372:11: error: no matching function for call to 'recoCandidateToFeatures' [clang-diagnostic-error]
          btagbtvdeep::recoCandidateToFeatures(
          ^
RecoBTag/FeatureTools/interface/NeutralCandidateConverter.h:22:8: note: candidate function not viable: requires 7 arguments, but 6 were provided
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@AlexDeMoor
Copy link
Contributor

It seems some changes on the inputs producers in the last PRs led to the errors : #40803

We are investigating this and adjusting our implementation w.r.t. those new elements

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 5, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41275/35049

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@tvami
Copy link
Contributor

tvami commented May 2, 2023

@cmsbuild , please test

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d29304/32316/summary.html
COMMIT: 4e2ae70
CMSSW: CMSSW_13_1_X_2023-05-02-1100/el8_amd64_gcc11
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41275/32316/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d29304/32316/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d29304/32316/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 424 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 26715 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3460672
  • DQMHistoTests: Total failures: 6137
  • DQMHistoTests: Total nulls: 6
  • DQMHistoTests: Total successes: 3454507
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -3.5710000000000006 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 12434.0,... ): -0.438 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 13234.0,... ): -0.264 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 4.53 ): 0.023 KiB JetMET/SUSYDQM
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: found differences in 2 / 46 workflows

NANO Comparison Summary

Summary:

  • You potentially added 6 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 15 differences found in the comparisons
  • DQMHistoTests: Total files compared: 12
  • DQMHistoTests: Total histograms compared: 12171
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 12171
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 11.106000000000003 KiB( 11 files compared)
  • DQMHistoSizes: changed ( 2500.311,... ): 2.696 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 2500.331,... ): 1.620 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 2500.401,... ): -0.438 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 2500.5111,... ): -0.264 KiB Physics/NanoAODDQM
  • Checked 25 log files, 11 edm output root files, 12 DQM output files

Nano size comparison Summary:

Sample kb/ev ref kb/ev diff kb/ev ev/s/thd ref ev/s/thd diff rate mem/thd ref mem/thd
2500.31 2.438 2.387 0.051 ( +2.1% ) 5.17 5.52 -6.2% 1.485 1.451
2500.311 2.545 2.487 0.058 ( +2.3% ) 4.67 4.98 -6.1% 1.875 1.842
2500.312 2.491 2.437 0.054 ( +2.2% ) 4.85 5.11 -5.1% 1.867 1.833
2500.33 1.240 1.206 0.034 ( +2.8% ) 9.77 10.49 -6.8% 1.678 1.658
2500.331 1.603 1.550 0.054 ( +3.5% ) 5.25 5.68 -7.6% 1.843 1.825
2500.332 1.485 1.444 0.041 ( +2.9% ) 7.60 8.14 -6.6% 1.878 1.870
2500.401 2.153 2.184 -0.030 ( -1.4% ) 5.28 5.33 -0.8% 1.344 1.310
2500.501 1.752 1.768 -0.017 ( -0.9% ) 9.04 9.05 -0.1% 1.250 1.218
2500.511 1.144 1.162 -0.018 ( -1.5% ) 4.37 4.37 +0.0% 1.550 1.529
2500.5111 1.502 1.520 -0.018 ( -1.2% ) 3.85 3.85 -0.0% 1.598 1.575
2500.601 2.014 2.049 -0.035 ( -1.7% ) 12.81 12.80 +0.1% 1.150 1.122

@simonepigazzini
Copy link
Contributor

+1

@simonepigazzini
Copy link
Contributor

@AnnikaStein @AlexDeMoor thank you for getting this one done. Let's try to finilize the backport today

@clacaputo
Copy link
Contributor

+reconstruction

@cmsbuild
Copy link
Contributor

cmsbuild commented May 4, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit ecc4a80 into cms-sw:master May 4, 2023
14 checks passed
@AlexDeMoor
Copy link
Contributor

By the way the model's PR has not yet been merged : cms-data/RecoBTag-Combined#51
@simonepigazzini and @perrotta , what is the procedure about it ?

@perrotta
Copy link
Contributor

perrotta commented May 4, 2023

By the way the model's PR has not yet been merged : cms-data/RecoBTag-Combined#51 @simonepigazzini and @perrotta , what is the procedure about it ?

Thank you for noticing it @AlexDeMoor
Now merged, as well as the related update in cmdist cms-sw/cmsdist#8492

cmsbuild added a commit that referenced this pull request May 9, 2023
Backport to 13_0_X of #41275 Introduce (Robust)ParTAK4 jet tagger, DeepJet model update for Run 3, remove DeepCSV from nano
cmsbuild added a commit that referenced this pull request May 9, 2023
 Backport to 13_1_X of #41275 Introduce (Robust)ParTAK4 jet tagger, DeepJet model update for Run 3, remove DeepCSV from nano
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants