Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legacy csvV2 calibrations for 2018 PbPb data #31660

Merged
merged 5 commits into from
Oct 16, 2020

Conversation

mandrenguyen
Copy link
Contributor

@mandrenguyen mandrenguyen commented Oct 3, 2020

PR description:

This PR adds csvV2 b-tagging calibrations for the 2018 PbPb data. The calibrations are delivered via an xml file and require a ESProducer to read them. This xml file is included in
cms-data/RecoBTag-Combined#36
For Run 3, we plan to resynchronize with the b-tagging calibrations used for pp. However, these legacy calibrations are needed for the 2018 PbPb miniAOD to match what is currently being used at analysis level.

PR validation:

Tested on 140.5611

if this PR is a backport please specify the original PR and why you need to backport that PR:

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2020

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2020

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31660/18767

  • This PR adds an extra 1272KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2020

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2020

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31660/18768

  • This PR adds an extra 1272KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2020

A new Pull Request was created by @mandrenguyen (Matthew Nguyen) for master.

It involves the following packages:

PhysicsTools/PatAlgos
RecoBTag/Combined
RecoBTag/SecondaryVertex

@perrotta, @jpata, @cmsbuild, @santocch, @slava77 can you please review it and eventually sign? Thanks.
@jdamgov, @rappoccio, @gouskos, @jdolen, @ahinzmann, @smoortga, @riga, @schoef, @emilbols, @mariadalfonso, @JyothsnaKomaragiri, @nhanvtran, @gkasieczka, @clelange, @hatakeyamak, @ferencek, @gpetruc, @andrzejnovak, @peruzzim, @seemasharmafnal, @mmarionncern this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Oct 3, 2020

The calibrations are delivered via an xml file and require a ESProducer to read them.

the file RecoBTag/Combined/data/TMVA_Btag_CsJets_PbPb2018_BDTG.weights.xml should be removed from this PR and proposed as a part of a PR in https://github.com/cms-data/RecoBTag-Combined
Please scrub the xml file from the commit history (given one commit, a rebase will be needed).

@mandrenguyen
Copy link
Contributor Author

@slava77 Sorry for putting the xml file in the wrong place.
How do I properly remove it from the commit history?
I don't want to mess that up.

@slava77
Copy link
Contributor

slava77 commented Oct 4, 2020

@slava77 Sorry for putting the xml file in the wrong place.
How do I properly remove it from the commit history?
I don't want to mess that up.

smth like

git reset --soft HEAD~2 #resets to before all your 2 commits (as in the PR now)
git reset HEAD RecoBTag/Combined/data/TMVA_Btag_CsJets_PbPb2018_BDTG.weights.xml #undo git add
rm RecoBTag/Combined/data/TMVA_Btag_CsJets_PbPb2018_BDTG.weights.xml
git commit 
scram build code-checks; scram build code-format
git push --force your-remote-name hiLegacyRun2CSVv2

you may want to also modify the file names given https://github.com/cms-sw/cmssw/pull/31660/files#r499205087

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 4, 2020

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 4, 2020

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31660/18771

  • This PR adds an extra 12KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 8, 2020

+1
Tested at: 3f3a2f1
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-949549/9799/summary.html
CMSSW: CMSSW_11_2_X_2020-10-07-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 8, 2020

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 8, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-949549/9799/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2542225
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2542196
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 149 log files, 22 edm output root files, 35 DQM output files

@jpata
Copy link
Contributor

jpata commented Oct 8, 2020

Thanks for restarting the test Slava. Looks to me like the reco output is changed only for HI workflows, only for the affected tagger, as expected.

@mandrenguyen to proceed, I would like to ask:

  • have you already made any larger-stats validation plots that the tagger output is as desired?
  • please include a timing output to be sure

@mandrenguyen
Copy link
Contributor Author

@jpata
For wf 140.5611 (single thread), here's the time report:
TimeReport> Time report complete in 350.315 seconds
Time Summary:

  • Min event: 0.241455
  • Max event: 17.0775
  • Avg event: 1.79025
  • Total loop: 232.863
  • Total init: 114.566
  • Total job: 350.315
  • EventSetup Lock: 0
  • EventSetup Get: 0
    Event Throughput: 0.429437 ev/s

Here's the module level summary, picking out the top ones by eye
TimeReport ---------- Module Summary ---[Real sec]----
TimeReport per event per exec per visit Name
TimeReport 0.369067 0.369067 0.369067 akCs4PFJets
TimeReport 0.098232 0.098232 0.098232 puppi
TimeReport 0.082314 0.082314 0.082314 isolatedTracks
TimeReport 0.078638 0.078638 0.078638 packedPFCandidates
TimeReport 0.076910 0.076910 0.076910 secondaryVertexTagInfos
TimeReport 0.065569 0.065569 0.065569 impactParameterTagInfos
TimeReport 0.042427 0.042427 0.042427 pfCandidateToVertexAssociation
TimeReport 0.033903 0.033903 0.033903 primaryVertexAssociation
TimeReport 0.028780 0.028780 0.028780 packedCandidateTrackValidator
TimeReport 0.026883 0.026883 0.026883 patMETs
TimeReport 0.026764 0.026764 0.026764 patPFMet
TimeReport 0.023160 0.023160 0.023160 pfPileUpIsoPFBRECO
TimeReport 0.021012 0.021012 0.021012 combinedSecondaryVertexV2BJetTags
TimeReport 0.020493 0.020493 0.020493 patTrigger
TimeReport 0.018123 0.018123 0.018123 lostTracks
TimeReport 0.014771 0.014771 0.014771 hiPuRho
TimeReport 0.015700 0.015700 0.015700 ca8PFJetsCHSprunedForBoostedTaus

If it interpret correctly this module takes about 1% of the total time.

The validation of the code and xml file where originally presented here:
https://indico.cern.ch/event/853820/contributions/3590366/attachments/1921458/3178704/HIN_BTag2018_07102019.pdf

I can of course try and check quickly that the output is still sensible after modifications made in the course of the code review, if that's what you're asking for.

@jpata
Copy link
Contributor

jpata commented Oct 8, 2020

Thanks!

Yes, combinedSecondaryVertexV2BJetTags that you modify with your computer takes about 1% according to this list, I think this is fine.

Given that some modifications were made to the code, I think making a small validation that the shapes are not changed in major ways may save time later on.

@mandrenguyen
Copy link
Contributor Author

@jpata I don't have a setup to draw the values immediately available, and I'm having trouble getting the values out of FWLite.
I can cout the values and they look reasonable.
For the validation of the HI miniAOD code, we plan to compare the values from our current AOD -> flat ntuple workflow to the new AOD -> miniAOD -> flat ntuple workflow, but I don't have the latter setup at the moment.

@jpata
Copy link
Contributor

jpata commented Oct 9, 2020

OK. Since I understand you (=HI) are both the main developer and consumer of this tagger, I don't see an issue to merge this now, so you can fully validate the shape on a large scale later.

@jpata
Copy link
Contributor

jpata commented Oct 14, 2020

+reconstruction

@jpata
Copy link
Contributor

jpata commented Oct 14, 2020

@mandrenguyen please add to the PR description clearly that it requires cms-data/RecoBTag-Combined#36.

@silviodonato
Copy link
Contributor

merge

@cmsbuild cmsbuild merged commit 1c7df09 into cms-sw:master Oct 16, 2020
@santocch
Copy link

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants