Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new PhaseII trainings for DeepCSV and DeepJet #42

Merged
merged 1 commit into from Feb 25, 2021

Conversation

mneukum
Copy link
Contributor

@mneukum mneukum commented Feb 18, 2021

Adding updated versions of DeepCSV and DeepJet training for PhaseII.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @mneukum (Max Neukum) for branch master.

@perrotta, @smuzaffar, @mrodozov, @cmsbuild, @slava77, @jpata can you please review it and eventually sign? Thanks.
@ferencek, @JyothsnaKomaragiri, @emilbols, @andrzejnovak, @smoortga this is something you requested to watch as well.
cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

Pull request #42 was updated.

@mneukum mneukum marked this pull request as ready for review February 19, 2021 14:58
@slava77
Copy link

slava77 commented Feb 19, 2021

  • DeepCSV_PhaseII.json is modified in this PR, it already exists and is used in CMSSW, while and DeepCSV_PhaseII_jan2021.json is added: what is the purpose of the second file? was the update in DeepCSV_PhaseII.json intended?
  • files in DeepJetV01_PhaseII/ do not follow the existing directory pattern: is it a new model or an extension of an existing one? would it be more appropriate in DeepFlavourV01_PhaseII ?

@SWuchterl
Copy link

@slava77 The upload of two versions for DeepCSV and DeepJet was intended. The files with _jan2021 are the most recent trainings as presented here. The other versions have been studied in the context of the HLT-TDR and need to be included for the baseline Phase 2 menu, see 32903

@cmsbuild
Copy link
Contributor

Pull request #42 was updated.

@mneukum
Copy link
Contributor Author

mneukum commented Feb 19, 2021

@slava77
The DeepJet PhaseII files are a retrained versions of the existing model with PhaseII conditions.
As suggested, I moved (and renamed) the DeepJet / DeepFlavour PhaseII files to reflect the directory pattern.

Let me know, if I should clean up the commit history before merging.

@cmsbuild
Copy link
Contributor

Pull request #42 was updated.

@slava77
Copy link

slava77 commented Feb 19, 2021

The upload of two versions for DeepCSV and DeepJet was intended. The files with _jan2021 are the most recent trainings as presented here. The other versions have been studied in the context of the HLT-TDR and need to be included for the baseline Phase 2 menu, see 32903

If I understood this correctly

  • the updated DeepCSV_PhaseII.json will be used for the HLT-TDR (including a backport to 11_1_X?)
  • the new DeepCSV_PhaseII_jan2021.json is going to be used in the offline setup (and there is no cms-sw PR that would start using it yet; when?)

Please check/confirm.
Thank you.

@slava77
Copy link

slava77 commented Feb 19, 2021

it looks like moving of the files in multiple commits still keeps them in the git history and increases the repo size more than necessary.
Please remake the final version in one commit.

@cmsbuild
Copy link
Contributor

Pull request #42 was updated.

@mneukum
Copy link
Contributor Author

mneukum commented Feb 22, 2021

The upload of two versions for DeepCSV and DeepJet was intended. The files with _jan2021 are the most recent trainings as presented here. The other versions have been studied in the context of the HLT-TDR and need to be included for the baseline Phase 2 menu, see 32903

If I understood this correctly

* the updated DeepCSV_PhaseII.json will be used for the HLT-TDR (including a backport to 11_1_X?)

* the new DeepCSV_PhaseII_jan2021.json is going to be used in the offline setup (and there is no cms-sw PR that would start using it yet; when?)

Please check/confirm.
Thank you.

I can confirm the 2nd bullet point. To my knowledge, there is no definite date for it yet.
It is possible to remove "_jan2021" files from this PR, since the main reason is for HLT development. In that case, we would issue a new PR when the timeline is clearer.

@slava77
Copy link

slava77 commented Feb 22, 2021

I can confirm the 2nd bullet point. To my knowledge, there is no definite date for it yet.
It is possible to remove "_jan2021" files from this PR, since the main reason is for HLT development. In that case, we would issue a new PR when the timeline is clearer.

If the timeline for deploying in offline is unclear and there are no other (e.g. analysis) use cases for this file, it may be better not to push it in; partly to avoid a case of having a file that will never be used.

BTW, is DeepFlavourV01_PhaseII/model.onnx going to be used in the HLT TDR?

@SWuchterl
Copy link

SWuchterl commented Feb 22, 2021

If I understood this correctly

* the updated DeepCSV_PhaseII.json will be used for the HLT-TDR (including a backport to 11_1_X?)

* BTW, is DeepFlavourV01_PhaseII/model.onnx going to be used in the HLT TDR?

Please check/confirm.
Thank you.

Yes, both DeepCSV_PhaseII.json and DeepFlavourV01_PhaseII/model.onnx are going to be used in the HLT TDR.
And yes, we would need to have them in the next CMSSW_11_1_X release. What would be the next necessary steps for us to have that?

If the timeline for deploying in offline is unclear and there are no other (e.g. analysis) use cases for this file, it may be better not to push it in; partly to avoid a case of having a file that will never be used.

For HLT studies after the TDR we would make use of these files, but I think we don't need them in a central release. So I would also be fine with not pushing them.

@slava77
Copy link

slava77 commented Feb 22, 2021

And yes, we would need to have them in the next CMSSW_11_1_X release. What would be the next necessary steps for us to have that?

we will need to make a backport
in 11_1_X we have RecoBTag-Combined=V01-03-00
https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_11_1_X/master/data/cmsswdata.txt
we will likely need a special branch

V01-03-00...V01-09-00 has many more changes already, not restricted to phase-2.

@smuzaffar @mrodozov please make a CMSSW_11_1_X branch for this repository so that a backport to it (V01-03-00 + the diffs of this PR) can be made.

@cmsbuild
Copy link
Contributor

Pull request #42 was updated.

@mneukum
Copy link
Contributor Author

mneukum commented Feb 22, 2021

If the timeline for deploying in offline is unclear and there are no other (e.g. analysis) use cases for this file, it may be better not to push it in; partly to avoid a case of having a file that will never be used.

For HLT studies after the TDR we would make use of these files, but I think we don't need them in a central release. So I would also be fine with not pushing them.

I updated the PR to only include the model files used in HLT TDR.

@smuzaffar
Copy link
Contributor

https://github.com/cms-data/RecoBTag-Combined/tree/CMSSW_11_1_X is now available for 11.1.X. @mneukum , can you please submit a PR for CMSSW_11_1_X branch too

@mneukum
Copy link
Contributor Author

mneukum commented Feb 22, 2021

I just created a new PR to CMSSW_11_1_X branch:

#43

@slava77
Copy link

slava77 commented Feb 22, 2021

@cmsbuild please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4f0049/13017/summary.html
COMMIT: 1786255
CMSSW: CMSSW_11_3_X_2021-02-22-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-data/RecoBTag-Combined/42/13017/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 94 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2750983
  • DQMHistoTests: Total failures: 186
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2750775
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 36 files compared)
  • Checked 156 log files, 37 edm output root files, 37 DQM output files

@slava77
Copy link

slava77 commented Feb 22, 2021

Reco comparison results: 94 differences found in the comparisons

I see that the discriminant shape changed quite significantly for the phase-2 deepCSV (default offline).
from wf 23434.999

  • probb
    all_OldVSNew_TTbar14TeV2026D49PUwPRMXwf23434p999c_recoJetedmRefToBaseProdTofloatsAssociationVector_pfDeepCSVJetTags_probb_RECO_obj_data_119
  • probudsg
    all_OldVSNew_TTbar14TeV2026D49PUwPRMXwf23434p999c_recoJetedmRefToBaseProdTofloatsAssociationVector_pfDeepCSVJetTags_probudsg_RECO_obj_data_118

There are not enough events to conclude from the ROC curves in the DQM plots if this is OK, although it seems like it's fine (red is with this PR)

  • b.vs.l

wf23434 999_deepCSV_bvl

- b.vs.c

wf23434 999_deepCSV_bvc

Please clarify which slide in https://indico.cern.ch/event/1008226/#73-status-of-phase2-training-o corresponds to this update (unfortunately I was not able to match).
If it's not covered in these slides, please provide some details on validation of this update for the offline case.
Thank you.

@mneukum
Copy link
Contributor Author

mneukum commented Feb 23, 2021

@slava77
Here are some slides from last december. Slide 13 shows ROC curves for the old (black) and new (green) training.
The red curve is ongoing development and not part of this PR.

https://indico.cern.ch/event/951596/contributions/4140990/attachments/2159414/3643026/2020_12_09_draft1.pdf

@slava77
Copy link

slava77 commented Feb 23, 2021

+1

based on the preceding review; jenkins tests and #42 (comment)

@slava77
Copy link

slava77 commented Feb 25, 2021

ping
what else is needed to get this to be merged?

@slava77
Copy link

slava77 commented Feb 25, 2021

ping
what else is needed to get this to be merged?

@smuzaffar @silviodonato
(tagging explicitly just in case the notifications got lost)

@smuzaffar
Copy link
Contributor

+externals

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants