[102X][Backport] DeepAK8 tagger integration #24505

Merged
cmsbuild merged 31 commits into cms-sw:CMSSW_10_2_X from the deep-boosted-jets-rebase-102X branch on Oct 3, 2018

hqucms
Contributor

@hqucms hqucms commented Sep 11, 2018

Backport of #23768.
Needs external from cms-sw/cmsdist#4326.
The tagger is disabled in the MiniAOD sequence by default to preserve the event content.

 - Both the nominal and the decorrelated versions are included, as well as a few meta taggers (aggregating the scores).
 - Currently it supports only updating a jet collection. The implementation to run on RECO exists but has not been tested.
Re-bind executor every time for thread-safety.
@cmsbuild
Contributor

Pull request #24505 was updated. @perrotta, @monttj, @cmsbuild, @slava77, @gpetruc, @arizzi can you please check and sign again.

@jmduarte
Member

@hqucms Are you saying this issue is only causing a difference when running the tagger on RECO inputs?

How should I test this? For example, should I compare the tagger output running on both RECO and MINIAOD inputs before and after this PR is applied?

Thanks,
Javier

@hqucms
Contributor Author

hqucms commented Sep 12, 2018

@slava77
I found a workaround to avoid the change in d7a1c26. This should allow us to preserve the event content in the backport.

@slava77
Contributor

slava77 commented Sep 12, 2018

@cmsbuild please test with cms-sw/cmsdist#4326

@cmsbuild
Contributor

cmsbuild commented Sep 12, 2018

The tests are being triggered in jenkins.
Using externals from cms-sw/cmsdist#4326
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/30386/console

@hqucms
Contributor Author

hqucms commented Sep 13, 2018

@jmduarte

If I understand correctly, DeepDoubleB was set up in the MiniAOD sequence (in https://github.com/cms-sw/cmssw/blob/CMSSW_10_2_X/PhysicsTools/PatAlgos/python/slimming/applyDeepBtagging_cff.py#L49-L68) by updating slimmedJetsAK8 because you wanted to access pat::PackedCandidate, which is fully consistent with the training input from MiniAOD files, right? However, due to changes introduced in #22914, the daughters of slimmedJetsAK8 are still reco::PFCandidate before slimmedJetsAK8 is written to the MiniAOD file (because the daughters are cached in memory), and therefore DeepDoubleB was actually accessing reco::PFCandidate when running in the MiniAOD step.

This PR, specifically d7a1c26, fixes the change in #22914 such that the daughters of slimmedJetsAK8 are always pat::PackedCandidate. Then, DeepDoubleB running in MiniAOD step will use pat::PackedCandidate which I think is more consistent with the training, right? However, this is then causing some difference for the DeepDoubleB scores produced in the MiniAOD step:

However, I think only the high pt AK8 jets are of interest for applying DeepDoubleB, right? And since the training was derived with high pt jets, it is most likely not optimal/applicable for these low pt jets anyhow.

What do you think, @jmduarte?
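
One way to check the low-pt versus high-pt point above is to compare the scores before and after the change, split at a pt threshold. The following is only a minimal sketch in plain Python, not code from this PR: the function name, the tuple layout and the 300 GeV threshold are assumptions made for illustration, and the numbers are made up rather than taken from any comparison.

```python
def summarize_score_shifts(jets, pt_threshold):
    """Return the largest absolute score shift below and above a pt threshold.

    jets: iterable of (pt, score_before, score_after) tuples (hypothetical layout).
    A region that contains no jets reports 0.0.
    """
    shifts = {"low_pt": 0.0, "high_pt": 0.0}
    for pt, before, after in jets:
        region = "high_pt" if pt >= pt_threshold else "low_pt"
        shifts[region] = max(shifts[region], abs(after - before))
    return shifts


# Made-up values purely for illustration, not results from this PR.
example_jets = [
    (180.0, 0.40, 0.55),
    (250.0, 0.10, 0.10),
    (450.0, 0.80, 0.81),
]

# The threshold is an assumed value; the thread does not specify one.
print(summarize_score_shifts(example_jets, pt_threshold=300.0))
```

If the shifts turn out to be concentrated in the low-pt region, that would support the argument that the change matters little for the jets the tagger is meant for.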

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-24505/30386/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 31
  • DQMHistoTests: Total histograms compared: 2987919
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2987726
  • DQMHistoTests: Total skipped: 190
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 30 files compared)
  • Checked 129 log files, 14 edm output root files, 31 DQM output files

@slava77
Contributor

slava77 commented Sep 14, 2018

+1

for #24505 ed815a3

  • the backport of DeepAK8 tagger integration #23768 1acac95 is correct: it is not verbatim, in order to satisfy the no-change policy. The deepAK8 taggers can be enabled in an analysis setup.
    • the deepAK8 taggers are disabled by default (they can be enabled by uncommenting some code in PhysicsTools/PatAlgos/python/slimming/applyDeepBtagging_cff.py)
    • the changes in DataFormats/Candidate/interface/CompositePtrCandidate.h and DataFormats/PatCandidates/interface/Jet.h are not in this backport PR, to preserve the default behavior of the deepDoubleB tagger currently running in production. From the discussion so far in #24505 (comment) related to these, the change in the master branch seems appropriate.
  • jenkins tests pass and comparisons with the baseline show no differences as expected

@fabiocos this PR needs to be merged with cms-sw/cmsdist#4326
IIUC, the dependence on the external becomes essential at runtime only in multi-threaded mode. So, I do not expect any build/run issues in the IBs if the updates are not in sync.

@hqucms
Contributor Author

hqucms commented Sep 14, 2018

@slava77

Thank you very much for your review!

> @fabiocos this PR needs to be merged with cms-sw/cmsdist#4326
> IIUC, the dependence on the external becomes essential at runtime only in multi-threaded mode. So, I do not expect any build/run issues in the IBs if the updates are not in sync.

Actually cms-sw/cmsdist#4326 needs to be merged before this PR, because cms-sw/cmsdist#4326 also adds some header files to the MXNet external which are necessary for this PR to compile.

@slava77
Contributor

slava77 commented Sep 14, 2018 via email

btagDiscriminators = [
'pfDeepDoubleBJetTags:probQ',
'pfDeepDoubleBJetTags:probH',
],
postfix = 'SlimmedAK8DeepDoubleB'+postfix,
# ] + pfDeepBoostedJetTagsAll, # uncomment it to test DeepBoostedJet

@slava77 I understand that this is the line that would trigger the production of the new collection; for the erst the old one is renamed (DeepDoubleB -> DeepTags), but it is not a persistent collection.

I'm not sure I understand "for the erst".

The change of the name in the transient collection names starts at L49 and the postfix above also updates other transient collection names.
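
The toggle discussed in this review thread (the commented-out `+ pfDeepBoostedJetTagsAll` line) can be sketched as a plain-Python list assembly. This is only an illustration under stated assumptions, not the code in applyDeepBtagging_cff.py: `build_discriminators`, `enable_deep_boosted` and the placeholder tag name are hypothetical, and only the two `pfDeepDoubleBJetTags` labels come from the snippet above.

```python
# The two DeepDoubleB discriminators shown in the reviewed snippet.
DEEP_DOUBLE_B_TAGS = [
    'pfDeepDoubleBJetTags:probQ',
    'pfDeepDoubleBJetTags:probH',
]


def build_discriminators(enable_deep_boosted=False, deep_boosted_tags=()):
    """Assemble the discriminator list, adding extra taggers only on request.

    enable_deep_boosted and deep_boosted_tags are hypothetical names that
    stand in for uncommenting the '+ pfDeepBoostedJetTagsAll' line.
    """
    discriminators = list(DEEP_DOUBLE_B_TAGS)
    if enable_deep_boosted:
        discriminators += list(deep_boosted_tags)
    return discriminators


# Default: only the DeepDoubleB outputs are requested.
default_tags = build_discriminators()

# Opt-in: the placeholder below is NOT a real discriminator label.
extended_tags = build_discriminators(
    enable_deep_boosted=True,
    deep_boosted_tags=['placeholderDeepBoostedTag:probX'],
)
print(default_tags)
print(extended_tags)
```

Keeping the default list unchanged is what leaves the production behavior untouched unless the extra taggers are explicitly requested.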

@fabiocos
Contributor

fabiocos commented Oct 3, 2018

+1

@fabiocos
Contributor

fabiocos commented Oct 3, 2018

merge

@cmsbuild cmsbuild merged commit fe0f9a9 into cms-sw:CMSSW_10_2_X Oct 3, 2018
@hqucms hqucms deleted the deep-boosted-jets-rebase-102X branch November 15, 2019 19:54

6 participants