L2 tau identification with a CNN #35640

Merged
merged 14 commits into from Oct 21, 2021

Conversation

Contributor

@azotz azotz commented Oct 13, 2021

PR description:

This PR introduces a machine-learning-based approach for L2 tau identification. The chosen ML algorithm is a CNN trained on hadronic taus from different production mechanisms (reweighted in order to avoid overtraining on specific production mechanisms) versus fake taus coming from QCD.

You can find:

  • the L2TauNNTagProducer, which produces the CNN outputs for one or more given L1Tau collections. For each collection, a vector of CNN outputs is created;
  • the L2TauNNTagFilter, which takes as input a single L1 collection, the vector of CNN outputs, and the CNN threshold. For the threshold, three different working points have been defined in order to reach three target rates (3 kHz, 4 kHz and 5 kHz), evaluated on EphemeralHLTPhysicsX. For more information, see the presentation at the TSG meeting [1];
  • the L2TauML production file and a custom function (testL2TauTag) that includes the L2TauTagProducer+Filter;
  • instructions on how to run the test in a README.md file;
  • a unit test based on the test mentioned above.
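
As a rough illustration of the working-point and filter logic described in the list above, here is a minimal Python sketch. It is not the code of the plugins in this PR: the function names, the strict `>` comparison, the "at least `n_min` taus" rule and the one-score-per-event simplification used for the rate estimate are all assumptions made for the example.

```python
# Hypothetical sketch: names, the strict ">" comparison, the n_min rule and
# the one-score-per-event simplification are assumptions, not the PR's code.

def threshold_for_rate(background_scores, input_rate_hz, target_rate_hz):
    """Pick a cut so that the surviving background rate is <= the target."""
    ranked = sorted(background_scores, reverse=True)
    # Number of background events allowed to survive the cut.
    n_allowed = int(len(ranked) * target_rate_hz // input_rate_hz)
    if n_allowed >= len(ranked):
        return float("-inf")  # target is not below the input rate: no cut needed
    # At most n_allowed scores are strictly above ranked[n_allowed].
    return ranked[n_allowed]


def passing_taus(cnn_scores, threshold):
    """Return the indices of L1 taus whose CNN score exceeds the threshold."""
    return [i for i, score in enumerate(cnn_scores) if score > threshold]


def filter_accepts(cnn_scores, threshold, n_min=1):
    """Accept the event if at least n_min taus pass the threshold."""
    return len(passing_taus(cnn_scores, threshold)) >= n_min


# Toy usage: ten background events at a 10 kHz input rate, 3 kHz target.
toy_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.05, 0.6, 0.4, 0.8]
toy_wp = threshold_for_rate(toy_scores, 10000, 3000)
print(toy_wp, filter_accepts([0.9, 0.3, 0.65], toy_wp, n_min=2))
```

In this toy setup a lower target rate yields a tighter working point; the real working points were derived on EphemeralHLTPhysicsX as described in [1].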

The trained CNN and the normalization file (needed to normalize the matrix given as input to the CNN) are currently in the private fork [2] of the RecoTauTag-TrainingFiles repository, as shown in the instructions in the README.md file. The training files are planned to be provided via cms-sw/cms-data, see [3].
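
The PR text does not describe the contents of the normalization file, so the following is only a generic sketch of what normalizing an input matrix can look like. It assumes per-feature means and standard deviations plus a clamp on outliers; the function name, the parameters and the clamp value are hypothetical and may not match the scheme actually used.

```python
# Hypothetical sketch: per-column standardization with clamping. The actual
# normalization scheme and file layout used by the PR are not shown here.

def normalize_matrix(matrix, mean, std, clamp=5.0):
    """Standardize each column of a row-major matrix and clamp outliers."""
    normalized = []
    for row in matrix:
        new_row = []
        for value, mu, sigma in zip(row, mean, std):
            # Guard against a zero spread, which would divide by zero.
            z = (value - mu) / sigma if sigma > 0 else 0.0
            new_row.append(max(-clamp, min(clamp, z)))
        normalized.append(new_row)
    return normalized


# Toy usage: two rows, two features.
print(normalize_matrix([[1.0, 10.0], [3.0, 100.0]], mean=[2.0, 10.0], std=[1.0, 2.0]))
```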

[1] https://indico.cern.ch/event/1040952/contributions/4401915/attachments/2260841/3837378/L2TauIDCNN.pdf
[2] https://github.com/valeriadamante/RecoTauTag-TrainingFiles/tree/L2Taus
[3] cms-data/RecoTauTag-TrainingFiles#7

PR validation:

The PR has been validated on top of 12_1_0_pre4. The provided test has been validated and runs smoothly.

The following tests were run and passed as expected, since this PR does not include any change to the official workflows:

scram b distclean 
git cms-checkdeps -a -A
scram b -j 8
scram b runtests
runTheMatrix.py -l limited -i all --ibeos

If this PR is a backport, please specify the original PR and why you need to backport that PR:

This PR is not a backport.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35640/25913

  • This PR adds an extra 212KB to repository

  • Found files with invalid states:

    • RecoTauTag/HLTProducers/python/TauL2ML.py:
    • RecoTauTag/HLTProducers/python/ApplyCNNL2Test.py:
    • RecoTauTag/TrainingFiles:

@cmsbuild
Contributor

A new Pull Request was created by @azotz for master.

It involves the following packages:

  • RecoTauTag/HLTProducers (hlt)

@cmsbuild, @missirol, @Martin-Grunewald can you please review it and eventually sign? Thanks.
@silviodonato, @mbluj, @azotz this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@missirol
Contributor

Hi, a few comments/questions from a very first look:

  • Are there any wfs that would exercise this new code in the PR tests? If not, a unit test should probably be added (following the readme and testing manually is not the way to go, imho).

  • What is the plan to integrate the necessary inputs in cms-sw/cms-data (not relying on the fork mentioned in the readme)?

@missirol
Contributor

On a technical note, please squash the commits into one, to tidy up the history.

@@ -14,5 +14,7 @@
<use name="DataFormats/HLTReco"/>
<use name="HLTrigger/HLTcore"/>
<use name="RecoTracker/TkTrackingRegions"/>
<use name="cuda"/>
Contributor

Are the classes in this package really using cuda code (I could not see it)?
Otoh, I was expecting the addition of (judging from the includes in some of the classes):

<use name="CUDADataFormats/SiPixelCluster"/>
<use name="CUDADataFormats/Track"/>
<use name="CUDADataFormats/Vertex"/>

Contributor

@kandrosov @valeriadamante could you comment on this, please?

Contributor

Hi @missirol

If we do not include <use name="cuda"/>, the following error is raised:

>> Compiling edm plugin /afs/cern.ch/work/v/vdamante/public/CMSSW_12_1_0_pre3/src/RecoTauTag/HLTProducers/src/VertexFromTrackProducer.cc
In file included from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_0_pre3/src/RecoPixelVertexing/PixelTrackFitting/interface/FitUtils.h:6,
                 from /afs/cern.ch/work/v/vdamante/public/CMSSW_12_1_0_pre3/src/RecoTauTag/HLTProducers/src/L2TauTagNNProducer.cc:40:
/cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_0_pre3/src/RecoPixelVertexing/PixelTrackFitting/interface/FitResult.h:7:10: fatal error: cuda_runtime.h: No such file or directory
    7 | #include <cuda_runtime.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
gmake: *** [config/SCRAM/GMake/Makefile.rules:1700: tmp/slc7_amd64_gcc900/src/RecoTauTag/HLTProducers/src/RecoTauTagHLTProducers/L2TauTagNNProducer.cc.o] Error 1
gmake: *** Waiting for unfinished jobs....
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

I can try to remove <use name="cuda"/> and include the lines you suggested.

Contributor

@missirol
I removed <use name="cuda"/> and I included the lines you suggested and it works.

Contributor Author

I will implement the changes then as suggested

desc.add<edm::InputTag>("L1TauSrc", edm::InputTag(""))
->setComment("Which trigger should the L1 Taus collection pass");
desc.add<edm::InputTag>("L2Outcomes", edm::InputTag(""))->setComment("L2 CNN outcomes");
desc.add<double>("DiscrWP", 0.12267940863785043)->setComment("value of discriminator threshold");
Contributor

Please round this up, e.g. 0.123.

Contributor

Hi @missirol

For the moment, could we round it to 0.1227? Further studies on the rounding will be done to see whether three decimal places are enough or more are needed.

Contributor

Thanks, that's okay.
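
To make the rounding discussion above concrete, the following sketch compares the two candidate roundings of the default `DiscrWP` value from the diff. It assumes that a tau passes when its score is strictly above the working point, which is not confirmed by the snippet shown, and the helper names are hypothetical.

```python
# Hypothetical sketch: the ">" comparison is an assumption about the filter.

FULL_WP = 0.12267940863785043  # default DiscrWP quoted in the diff above


def passes(score, wp):
    """Assumed decision rule: strictly above the working point."""
    return score > wp


def affected_window(full_wp, digits):
    """Score interval in which rounding the cut can change the decision."""
    rounded = round(full_wp, digits)
    return (min(full_wp, rounded), max(full_wp, rounded))


for digits in (3, 4):
    low, high = affected_window(FULL_WP, digits)
    print(digits, round(FULL_WP, digits), high - low)

# A borderline score that passes the unrounded cut and the 4-digit cut,
# but fails the 3-digit cut.
borderline = 0.1228
print(passes(borderline, FULL_WP), passes(borderline, 0.1227), passes(borderline, 0.123))
```

Rounding to four decimal places shrinks the interval of affected scores by more than a factor of ten compared with three decimal places, which is consistent with the request to keep 0.1227 until further studies are done.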

@azotz
Contributor Author

azotz commented Oct 14, 2021

The tests as described in the PR validation all passed.

@azotz
Contributor Author

azotz commented Oct 14, 2021

* What is the plan to integrate the necessary inputs in `cms-sw/cms-data` (not relying on the fork mentioned in the readme)?

The plan is to provide them through cms-sw/cms-data, see cms-data/RecoTauTag-TrainingFiles#7

@azotz
Contributor Author

azotz commented Oct 14, 2021

* Are there any wfs that would exercise this new code in the PR tests? If not, a [unit test](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideDevelopersGuide#Add_tests_to_your_package) should probably be added (following the readme and testing manually is not the way to go, imho).

The instructions in the README are not meant to replace a unit test once these developments reach the integration phase. Is a unit test necessary at this stage? A shell script that follows the instructions in the README could be provided.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35640/25949

  • This PR adds an extra 76KB to repository

  • Found files with invalid states:

    • RecoTauTag/HLTProducers/python/TauL2ML.py:
    • RecoTauTag/HLTProducers/python/ApplyCNNL2Test.py:
    • RecoTauTag/TrainingFiles:

@azotz
Contributor Author

azotz commented Oct 14, 2021

I rebased the development branch to the recent IB. The squash is incoming. The comments are being addressed by the original authors.

@cmsbuild
Contributor

Pull request #35640 was updated. @cmsbuild, @missirol, @Martin-Grunewald can you please check and sign again.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35640/25952

  • This PR adds an extra 8KB to repository

@cmsbuild
Contributor

Pull request #35640 was updated. @cmsbuild, @missirol, @Martin-Grunewald can you please check and sign again.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35640/26113

  • This PR adds an extra 76KB to repository

@cmsbuild
Contributor

Pull request #35640 was updated. @cmsbuild, @missirol, @Martin-Grunewald can you please check and sign again.

@missirol
Contributor

please test

Thanks to Andrea for helping with the review.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d40177/19798/summary.html
COMMIT: e73fe94
CMSSW: CMSSW_12_1_X_2021-10-21-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35640/19798/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 40
  • DQMHistoTests: Total histograms compared: 2751113
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2751090
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 39 files compared)
  • Checked 170 log files, 37 edm output root files, 40 DQM output files
  • TriggerResults: no differences found

@missirol
Contributor

+hlt

note: cms-data/RecoTauTag-TrainingFiles#7 should be merged before this PR

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1

@cmsbuild cmsbuild merged commit 16173a3 into cms-sw:master Oct 21, 2021
@Martin-Grunewald
Contributor

For me, the unit test in question, https://github.com/cms-sw/cmssw/blob/master/RecoTauTag/HLTProducers/test/testL2TauTagNN.py, seems unacceptable as a unit test: it appears to run the menu itself in a special, customised environment, and it now fails because the input menu has changed. This goes way beyond a supposedly simple unit test. So the required course of action is to remove the unit test or scale it down to become a proper unit test.
See also discussion in #35863
