
DNN-based Tau-Id discriminants (102X) #25386

Merged

Conversation

@mbluj (Contributor) commented Nov 30, 2018

This pull request provides two new DNN-based Tau-Ids, DeepTau and DPFTau, to be produced for pat::Taus with MiniAOD.
It is a backport of #25016 to 102X for analyses based on 2018 data; a detailed description can be found therein.

kandrosov and others added 30 commits October 27, 2018 21:23
- Defined base class for deep tau discriminators.
- Removed weight files from home cms repository. Now using weights from cms-data.
- Defined WP for both discriminators. Now all discriminators return the corresponding WP results.
- Removed cfi files. Using fillDescriptions instead.
- General code review and cleaning.
…es quantized

- Added a new parameter 'version' in runTauIdMVA, used in DPFIsolation
- Changes in DeepTauId to reduce memory consumption
…read and reduce the memory consumption

- Creation of the class DeepTauCache in DeepTauBase, in which the graph and session are now created
- Implementation of two new static methods inside the class DeepTauBase: initializeGlobalCache and globalEndJob. The graph and the DeepTauCache object are now created inside initializeGlobalCache
The TauWPThreshold class parses the WP cut string (or value) provided in the
Python configuration. It is needed because using the standard
StringObjectFunction class to parse a complex expression results in
extensive memory usage (> 100 MB per expression).
	- Implementation of a global cache to avoid reloading the graph for each thread
	- Creation of two new static methods inside the class DeepTauBase: initializeGlobalCache and globalEndJob. The graph and the DeepTauCache object are now created inside initializeGlobalCache. The memory consumption of initializeGlobalCache for the original files, the quantized files, and the files loaded with memory mapping is reported in the memory_usage.pdf file
	- Implemented configuration to use the new quantized training files, and set them as default
	- Implementation of configuration to load files using memory mapping. In our case this method brought no improvement in memory consumption with respect to the quantized files, so it is not used, but it is kept available for future training files
- General code review and cleaning.
@smuzaffar (Contributor)

hold
This PR's history contains large data files.

@cmsbuild (Contributor) commented Dec 9, 2018

Pull request has been put on hold by @smuzaffar
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

@perrotta (Contributor)

@smuzaffar, if I understand it correctly, your request in practice is to rebase these pull requests, correct?

@kandrosov (Contributor)

@smuzaffar The PR #25016 has already been merged. It should include the same big files in its history as this PR; therefore, merging this PR should not consume more space, since the big files are already part of the history in the central repository.
To solve the problem, one can merge this PR (and the similar backport PR in 94X) and then manually remove the big files from the history in the central repository.

@smuzaffar (Contributor)

@kandrosov, it was a mistake on our part that #25016 was merged with those big files in the history, and now we get warnings like these [a]. We are going to rebase the master branch to clean the history and remove the references to these big files.

For these backport PRs I would suggest rebasing (and rewriting the history; see the details at https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History) and editing the commits which add these files.

[a]

remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: warning: See http://git.io/iEPt8g for more information.
remote: warning: File RecoTauTag/RecoTau/data/DPFIsolation_2017v0.pb is 50.10 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File RecoTauTag/RecoTau/data/deepTau_2017v1_20L1024N.pb is 80.98 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
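
For illustration, a minimal sketch of the edit-a-past-commit recipe from the linked git-scm chapter, applied to one of the flagged files; the parent hash is a placeholder, and which commits need editing depends on the actual branch history:

# Sketch only: rewrite the commit that added a large file, assuming
# <sha-of-parent> is the parent of that commit on the backport branch.
git rebase -i <sha-of-parent>        # mark the offending commit as "edit"
# ...when the rebase stops at the marked commit...
git rm --cached RecoTauTag/RecoTau/data/deepTau_2017v1_20L1024N.pb
git commit --amend --no-edit         # re-record the commit without the file
git rebase --continue                # replay the remaining commits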

@smuzaffar (Contributor)

I would suggest cleaning the history and removing the references to these files [a]; they were added and later deleted in the PR.

ERROR: RecoTauTag/RecoTau/data/DPFIsolation_2017v1.pb ['Added', 'Deleted']
ERROR: RecoTauTag/RecoTau/test/runTauIdMVA.py ['Added', 'Deleted']
ERROR: RecoTauTag/RecoTau/data/deepTau_2017v1_20L1024N.pb ['Added', 'Deleted']
ERROR: RecoTauTag/RecoTau/python/DPFIsolation_cfi.py ['Added', 'Deleted']
ERROR: RecoTauTag/RecoTau/python/DeepTauId_cfi.py ['Added', 'Deleted']
ERROR: RecoTauTag/RecoTau/data/DPFIsolation_2017v0.pb ['Added', 'Deleted']
ERROR: RecoTauTag/RecoTau/python/runTauIdMVA.py ['Added', 'Modified', 'Deleted']
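
As an alternative to editing commits one by one, a hedged sketch of a single filter-branch pass that drops every revision of the flagged data files; branch-name is a placeholder, and --ignore-unmatch keeps the filter from failing on commits that do not contain the files:

# Sketch: purge the flagged files from all commits on the branch.
# --index-filter is faster than --tree-filter because it never
# checks out the working tree.
git filter-branch --index-filter '
  git rm -r --cached --ignore-unmatch \
    RecoTauTag/RecoTau/data/DPFIsolation_2017v0.pb \
    RecoTauTag/RecoTau/data/DPFIsolation_2017v1.pb \
    RecoTauTag/RecoTau/data/deepTau_2017v1_20L1024N.pb
' branch-name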

@mbluj (Contributor, Author) commented Dec 11, 2018

Hello,
as I have never tried to rewrite git history (which sounds dangerous), I would like to ask for confirmation that I understood correctly what should be done. My understanding is the following:

  1. I check out the PR'ed branch from the repository from which I made the PR to official CMSSW, e.g. like this:
git cms-init -X repo-name
git checkout branch-name
  2. Then, I clean the history of the branch as follows:
git filter-branch --tree-filter 'rm -f file1-to-remove file2-to-remove ...' branch-name
  3. Finally, I push the branch with the modified history to the remote (will it be possible? see the sketch below).

Is that what you mean?
This should also be repeated for the 94X backport.
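
On the question in step 3: pushing a branch with rewritten history to the personal fork the PR was opened from is possible, but only as a force push, which also updates the open PR in place. A minimal sketch, assuming the fork remote is called my-cmssw as in the usual git cms-init setup:

# Sketch: the remote name is an assumption based on the standard
# git cms-init configuration, not taken from this thread.
git push my-cmssw branch-name --force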

@fabiocos (Contributor)

@mbluj as this and the 9_4_X PRs are backports with no special review in the middle of the history, unless keeping the long list of commits has value for you, I think it might be simpler to follow the advice in https://cms-sw.github.io/faq.html#how-do-i-collapse-multiple-commits-into-one

@smuzaffar this is what we were discussing as a possibility to trim commit histories, right?
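
A hedged sketch of the collapse-into-one-commit approach, assuming the branch is based on the official CMSSW_10_2_X branch and the remotes follow the usual git cms-init naming (official-cmssw for the central repository, my-cmssw for the fork); this illustrates the general technique rather than quoting the FAQ:

# Sketch: replace the branch history with a single commit on top of the base.
git checkout branch-name
git fetch official-cmssw
git reset --soft official-cmssw/CMSSW_10_2_X  # keep all changes staged, drop the commits
git commit -m 'DNN-based Tau-Id discriminants (102X)'
git push my-cmssw branch-name --force         # update the PR in place

Because the squashed commit contains only the final tree, the large files that were added and later deleted disappear from the pushed history, which is exactly what resolves the warnings above.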

@mbluj (Contributor, Author) commented Dec 12, 2018

@fabiocos @smuzaffar I have no problem with rewriting the full history - it takes quite some time, but I can do it in the background. However, I would like to understand whether the procedure I described is the correct one (so as not to mess things up).

@smuzaffar (Contributor)

unhold
I am happy with the squash.

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next CMSSW_10_2_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_10_4_X is complete. This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos (Contributor)

@mbluj I can just squash the history myself at merge time, if that is OK for you

@mbluj (Contributor, Author) commented Dec 12, 2018

> @mbluj I can just squash the history myself at merge time, if that is OK for you

Yes, great! Thank you.

@fabiocos merged commit afd5e8e into cms-sw:CMSSW_10_2_X on Dec 12, 2018
@fabiocos (Contributor)

@smuzaffar if I add "+1", will this confuse the bot or trigger any further action on its side? Just to update the label status for the record.

@smuzaffar (Contributor)

It should be fine to do +1 now. The bot should only update the labels.

@fabiocos (Contributor)

+1

@smuzaffar OK, thanks
