DNN-based Tau-Id discriminants (94X) #25385

Closed

Conversation

mbluj (Contributor) commented Nov 30, 2018

This pull request provides two new DNN-based Tau-Ids, DeepTau and DPFTau, to be produced for pat::Taus with MiniAOD.
It is a backport of #25016 to 94X for analyses based on 2016+2017 data; a detailed description can be found therein.
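For context, the new discriminators are stored as regular pat::Tau IDs, so analyses read them back like any other tau ID. A minimal sketch, assuming an illustrative discriminator name (the exact strings are defined by the producers in this PR):

```cpp
// Reading a new ID back in an analysis; the discriminator name below
// is illustrative, the exact strings are defined by the producers.
#include <string>
#include "DataFormats/PatCandidates/interface/Tau.h"

float deepTauScore(const pat::Tau& tau) {
  const std::string name = "byDeepTau2017v1VSjetraw";  // illustrative name
  // isTauIDAvailable guards against IDs missing from older samples;
  // tauID returns the stored discriminator value.
  return tau.isTauIDAvailable(name) ? tau.tauID(name) : -1.f;
}
```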

kandrosov and others added 30 commits June 15, 2018 16:33
- Defined base class for deep tau discriminators.
- Removed weight files from home cms repository. Now using weights from cms-data.
- Defined WP for both discriminators. Now all discriminators return the corresponding WP results.
- Removed cfi files; using fillDescriptions instead (see the sketch after this list).
- General code review and cleaning.
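A minimal sketch of the fillDescriptions mechanism that replaces the removed cfi files; the parameter names shown are illustrative, not the actual configuration of the discriminators:

```cpp
// Sketch of a fillDescriptions static method replacing a cfi file;
// the parameters shown are illustrative.
#include <string>
#include "FWCore/ParameterSet/interface/ConfigurationDescriptions.h"
#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
#include "FWCore/Utilities/interface/InputTag.h"

// static member of the producer class
void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
  edm::ParameterSetDescription desc;
  desc.add<edm::InputTag>("taus", edm::InputTag("slimmedTaus"));
  desc.add<std::string>("graph_file", "");  // weights resolved from cms-data
  // Registering the description lets the framework generate the cfi
  // automatically and validate the python configuration at runtime.
  descriptions.add("deepTauIdSketch", desc);
}
```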
Integration of DPFIsolation and DeepTauId
Made DeepTauId and DPFIsolation thread-safe
…es quantized

- Added a new parameter 'version' to runTauIdMVA, used by DPFIsolation
- Changes in DeepTauId to reduce memory consumption
…read and reduce the memory consumption

- Creation of the class DeepTauCache in DeepTauBase, in which the graph and session are now created
- Implementation of two new static methods inside the class DeepTauBase: initializeGlobalCache and globalEndJob. The graph and the DeepTauCache object are now created inside initializeGlobalCache
The TauWPThreshold class parses the WP cut string (or value) provided in the
python configuration. It is needed because using the standard
StringObjectFunction class to parse complex expressions results in
extensive memory usage (> 100 MB per expression); the idea is sketched below.
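A minimal sketch of that idea (not the actual TauWPThreshold implementation): a cut that parses as a plain number is stored as a constant, and the heavy StringObjectFunction machinery is only instantiated for genuine expressions:

```cpp
// Sketch: avoid StringObjectFunction for constant working-point cuts.
#include <cstdlib>
#include <memory>
#include <string>
#include "CommonTools/Utils/interface/StringObjectFunction.h"
#include "DataFormats/PatCandidates/interface/Tau.h"

class WPThresholdSketch {
public:
  explicit WPThresholdSketch(const std::string& cut) {
    char* end = nullptr;
    value_ = std::strtod(cut.c_str(), &end);
    // Constant only if the whole string was consumed as a number.
    isConstant_ = (!cut.empty() && end != nullptr && *end == '\0');
    if (!isConstant_)
      fn_ = std::make_unique<StringObjectFunction<pat::Tau>>(cut);
  }
  double operator()(const pat::Tau& tau) const {
    return isConstant_ ? value_ : (*fn_)(tau);
  }

private:
  double value_ = 0.;
  bool isConstant_ = false;
  std::unique_ptr<StringObjectFunction<pat::Tau>> fn_;
};
```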
- Implementation of a global cache to avoid reloading the graph for each thread (see the sketch after this list)
- Creation of two new static methods inside the class DeepTauBase: initializeGlobalCache and globalEndJob. The graph and the DeepTauCache object are now created inside initializeGlobalCache. The memory consumption of initializeGlobalCache for the original files, the quantized files, and files loaded with memory mapping is reported in the memory_usage.pdf file
- Implemented configuration to use the new quantized training files, and set them as default
- Implementation of configuration to load files using memory mapping. In our case it brought no improvement in memory consumption with respect to the quantized files, so this is not used, but it is available for future training files
- General code review and cleaning.
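A minimal sketch of the global-cache pattern described in this list, assuming the CMSSW TensorFlow interface (PhysicsTools/TensorFlow); the actual DeepTauBase implementation differs in detail, and names with "Sketch" are illustrative:

```cpp
#include <memory>
#include <string>

#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/stream/EDProducer.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"

struct DeepTauCacheSketch {
  std::unique_ptr<tensorflow::GraphDef> graph;
  tensorflow::Session* session = nullptr;
};

class DeepTauIdSketch
    : public edm::stream::EDProducer<edm::GlobalCache<DeepTauCacheSketch>> {
public:
  DeepTauIdSketch(const edm::ParameterSet&, const DeepTauCacheSketch*) {}

  // Called once per job, before any stream module is constructed:
  // the graph is loaded and the session created here, then shared
  // by all streams instead of being reloaded for each thread.
  static std::unique_ptr<DeepTauCacheSketch> initializeGlobalCache(
      const edm::ParameterSet& cfg) {
    auto cache = std::make_unique<DeepTauCacheSketch>();
    cache->graph.reset(
        tensorflow::loadGraphDef(cfg.getParameter<std::string>("graph_file")));
    cache->session = tensorflow::createSession(cache->graph.get());
    return cache;
  }

  // Called once per job after all streams have finished.
  static void globalEndJob(DeepTauCacheSketch* cache) {
    tensorflow::closeSession(cache->session);  // graph freed by unique_ptr
  }

  void produce(edm::Event&, const edm::EventSetup&) override {
    // run the shared session via globalCache()->session and fill the event
  }
};
```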
fabiocos (Contributor) commented Jan 8, 2019

+code-checks

fabiocos (Contributor) commented Jan 8, 2019

code-checks

fabiocos (Contributor) commented Jan 8, 2019

@mbluj as far as I can see, the list of commits here contains the unwanted big files, so we need to squash it. If we do not want to lose the review history, I suggest just opening a new PR with the commit resulting from the squash.

mbluj (Contributor, Author) commented Jan 8, 2019

@fabiocos: please squash the history as was done for the 102X version of this PR, if you don't mind. The full history is anyway preserved in the original PR to the master. Thank you.

cmsbuild (Contributor) commented Jan 8, 2019

Comparison job queued.

cmsbuild (Contributor) commented Jan 8, 2019

The code-checks are being triggered in jenkins.

cmsbuild (Contributor) commented Jan 8, 2019

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25385/7868

  • This PR adds an extra 145200KB to repository

  • Found files with invalid states:

    • RecoTauTag/RecoTau/data/DPFIsolation_2017v1.pb:
    • RecoTauTag/RecoTau/test/runTauIdMVA.py:
    • RecoTauTag/RecoTau/data/deepTau_2017v1_20L1024N.pb:
    • RecoTauTag/RecoTau/python/DPFIsolation_cfi.py:
    • RecoTauTag/RecoTau/python/DeepTauId_cfi.py:
    • RecoTauTag/RecoTau/data/DPFIsolation_2017v0.pb:
    • RecoTauTag/RecoTau/python/runTauIdMVA.py:

cmsbuild (Contributor) commented Jan 8, 2019

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25385/32469/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 27
  • DQMHistoTests: Total histograms compared: 2721493
  • DQMHistoTests: Total failures: 108
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2721223
  • DQMHistoTests: Total skipped: 162
  • DQMHistoTests: Total Missing objects: 0

fabiocos (Contributor) commented Jan 9, 2019

@mbluj @perrotta I would prefer to avoid losing the review history. But as 1) we need to squash anyway and 2) @smuzaffar will implement the possibility to instruct the bot to do it, I will clone this PR into another one, and try the new possibility there.

fabiocos (Contributor)

code-checks

cmsbuild (Contributor)

The code-checks are being triggered in jenkins.

cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25385/7927

  • This PR adds an extra 145196KB to repository

  • Found files with invalid states:

    • RecoTauTag/RecoTau/data/DPFIsolation_2017v1.pb:
    • RecoTauTag/RecoTau/test/runTauIdMVA.py:
    • RecoTauTag/RecoTau/data/deepTau_2017v1_20L1024N.pb:
    • RecoTauTag/RecoTau/python/DPFIsolation_cfi.py:
    • RecoTauTag/RecoTau/python/DeepTauId_cfi.py:
    • RecoTauTag/RecoTau/data/DPFIsolation_2017v0.pb:
    • RecoTauTag/RecoTau/python/runTauIdMVA.py:

fabiocos (Contributor)

@smuzaffar I have cross-checked that the code in #25621 is identical to that in this PR, but code-checks is complaining in the new and not in the old version. If I just manually merge 25385 in a working area and run "scram b code-checks" I indeed find the complaints, and it produces the patch as in #25621. So what is going on? Is the bot missing the differences in some cases?

mbluj (Contributor, Author) commented Jan 11, 2019

It is indeed an interesting issue to understand, but maybe in parallel it makes sense to fix the code issues here and prepare small PRs with similar fixes also for master and 102X? Or start with a fix to master, with backports to 102X and here (via cherry-pick)?

fabiocos (Contributor)

In order to manage things in an ordered way, I would move forward with merging the present version, and let you add fixes where needed for all the possible releases, starting with master.

fabiocos (Contributor)

@mbluj as the squashed version has been merged, this one may be closed, and will stay as a reference for the review.

perrotta (Contributor)

-1
Superseded by #25621: please @mbluj close this PR
