
DNN-based Tau-Id discriminants #25016

Merged
merged 49 commits into cms-sw:master from CMSSW_10_4_X_tau_pog_DNNTauIDs on Dec 6, 2018

Conversation

mbluj
Contributor

@mbluj mbluj commented Oct 26, 2018

This pull request provides two new DNN-based Tau-Ids, DeepTau and DPFTau, to be produced for pat::Taus with MiniAOD.

Details:

The new Tau-Ids are not part of any current workflow; primarily they are meant to be run within users' workflows. Their integration into NanoAOD and/or MiniAOD will be the subject of separate pull requests once this one is accepted.
Backports to the 94X and 102X release series (used for the 2016/17 and 2018 datasets) are foreseen once this one is accepted.

Recipe for tests (the new modules are not part of standard workflows):

  • Check out the needed training files:
mkdir -p $CMSSW_BASE/external/$SCRAM_ARCH/data
cd $CMSSW_BASE/external/$SCRAM_ARCH/data
git clone https://github.com/cms-tau-pog/RecoTauTag-TrainingFiles -b master RecoTauTag/TrainingFiles/data
cd -
  • Edit and run the test configuration:
cmsRun RecoTauTag/RecoTau/test/runDeepTauIDsOnMiniAOD.py

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @mbluj for master.

It involves the following packages:

RecoTauTag/RecoTau

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

MRD2F added a commit to MRD2F/cmssw that referenced this pull request Dec 3, 2018
…de in the commit 194a1d5 from the PR cms-sw#25016

- RecoTauTag/RecoTau/plugins/DeepTauId.cc: code cleaning
@fabiocos
Contributor

fabiocos commented Dec 5, 2018

please test with cms-sw/cmsdist#4554

@cmsbuild
Contributor

cmsbuild commented Dec 5, 2018

The tests are being triggered in jenkins.
Using externals from cms-sw/cmsdist#4554
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/32002/console

@cmsbuild
Contributor

cmsbuild commented Dec 5, 2018

Comparison job queued.

@cmsbuild
Contributor

cmsbuild commented Dec 5, 2018

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25016/32002/summary.html

The workflows 1001.0, 1000.0, 140.53, 136.85, 136.8311, 136.7611, 136.731, 4.22 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check this in the "files" directory in the comparison results.

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 33
  • DQMHistoTests: Total histograms compared: 3131939
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3131734
  • DQMHistoTests: Total skipped: 204
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
  • Checked 137 log files, 14 edm output root files, 33 DQM output files

@fabiocos
Contributor

fabiocos commented Dec 6, 2018

+1

@cmsbuild cmsbuild merged commit 7a65e90 into cms-sw:master Dec 6, 2018
forthommel pushed a commit to forthommel/cmssw that referenced this pull request Dec 10, 2018
fabiocos pushed a commit that referenced this pull request Dec 12, 2018
* First implementation of deep tau id.

* Building dpf isolation module

* Adding in v1

* Adding in runTauIDMVA for other users

* making things fully reproducible

* Reorganisation of configuration files: cff split to cfi and cff

* Some code cleaning

* adapt to cfi/cff reorganization

* Review of DPF and DeepTauId code.

- Defined base class for deep tau discriminators.
- Removed weight files from home cms repository. Now using weights from cms-data.
- Defined WP for both discriminators. Now all discriminators return the corresponding WP results.
- Removed cfi files. Using fillDescriptions instead.
- General code review and cleaning.

* Added example of a python configuration file to produce pat::Tau collection with the new Tau-Ids

* requested changes on runDeepTauIDsOnMiniAOD.py

* Clean runTauIdMVA.py tool and test config to run tauIDs

* Made DeepTauId and DPFIsolation thread-safe

* Finish implement thread-safe requirements on DPFIsolation

* Disable DPFTau_2016_v1 and issue some warnings

* Remove assigning value of variable to itself

* - Implemented in runTauIdMVA the option to work with the new quantized training files
- Added a new parameter 'version' to runTauIdMVA, used in DPFIsolation
- Changes in DeepTauId to reduce memory consumption

* - Implementation of a global cache to avoid reloading the graph for each thread and to reduce the memory consumption
- Creation of class DeepTauCache in DeepTauBase, in which the graph and session are now created
- Implementation of two new static methods inside the class DeepTauBase: initializeGlobalCache and globalEndJob. The graph and DeepTauCache object are now created inside initializeGlobalCache

* Applied changes on DeepTauBase to allow loading new training files using memory mapping

* Implemented TauWPThreshold class.

The TauWPThreshold class parses the WP cut string (or value) provided in the
python configuration. It is needed because using the standard
StringObjectFunction class to parse a complex expression results in
extensive memory usage (> 100 MB per expression).

* Remove the qm.pb input files, leaving just the quantized and the original files

* - Overall, changes to improve memory usage, among which are:
	- Implementation of a global cache to avoid reloading the graph for each thread
	- Creation of two new static methods inside the class DeepTauBase: initializeGlobalCache and globalEndJob. The graph and DeepTauCache object are now created inside initializeGlobalCache. The memory consumption of initializeGlobalCache for the original files, the quantized files, and the files loaded using the memory-mapping method is documented in the memory_usage.pdf file
	- Implemented configuration to use the new quantized training files, and set them as the default
	- Implementation of configuration to load files using memory mapping. In our case there was no improvement in memory consumption with this method with respect to the quantized files, so it is not used, but it is available for future training files
- General code review and cleaning.

* Applied style comments

* Applied style comments

* Applied comments

* Change the default for deepTau to the original training file instead of the quantized one

* Changes regarding forward-porting DNN-related developments from the PRs #105 and #106 from 94X to 104X

* Applied comments of previous PR

* cleaning code

* Modification in the config to work with new label in files

* Applied comment about the expected format of the training-file name

* Fix in last commit

* Applied last comments

* Changes regarding forward-porting DNN-related developments from the PRs #105 and #106 from 94X to 104X

* Applied @perrotta comments on 104X

* Fix error

* Applied comments

* Applied comments

* Fix merge problem

* Applied a few comments

* Applied more changes

* Applied a few small followups

*  Fixed error on DPFIsolation

* Update DPFIsolation.cc

* - RecoTauTag/RecoTau/plugins/DeepTauId.cc: Remove 'clusterVariables' as a class member
- RecoTauTag/RecoTau/test/runDeepTauIDsOnMiniAOD.py: Update globaltag and sample

* Added changes in RecoTauTag/RecoTau/python/tools/runTauIdMVA.py made in the commit 194a1d5 from the PR #25016

* Fix error on runDeepTauIDsOnMiniAOD

* Change the GT in RecoTauTag/RecoTau/test/runDeepTauIDsOnMiniAOD.py
fabiocos pushed a commit to fabiocos/cmssw that referenced this pull request Jan 10, 2019
Fix cherry-pick issue

Added clamp function, which is called only when std::clamp is not defined

- RecoTauTag/RecoTau/test/runDeepTauIDsOnMiniAOD.py: Added changes made in the commit 194a1d5 from the PR cms-sw#25016
- RecoTauTag/RecoTau/plugins/DeepTauId.cc: code cleaning
@mbluj mbluj deleted the CMSSW_10_4_X_tau_pog_DNNTauIDs branch October 10, 2023 10:06