Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EGamma Scale & Smearing For Re-miniAOD #22531

Merged
merged 13 commits into from Mar 15, 2018

Conversation

Sam-Harper
Copy link
Contributor

Dear All,

This is mostly a complete re-write of the E/gamma Scale & Smearing code to bring it up to minimal acceptable standards. This code is not perfect, many things could be done better but perfect is the enemy of good. It is functionally equivalent to #22308

This is step one in an evolution of E/gamma energy correctors. You will notice significant ground work has been done in energy regression management. It is a future goal to remove the badness that is "EGExtraInfoModifierFromDB" and instead have the regression all calculated in simple classes which can be wrapped in producers or modifiers as we see fit.

Guide to changes:
ReducedEGProducer has been modified to allow the energy to be changed inside it. In hindsight, it would be have been better to have a general modification framework rather than such a specific hardcoded option. It is the easiest option here to modify the electrons rather an introduce another intermediate step. A reminder, VID re-run on the miniAOD output must give identical results hence ReducedEGProducer needs to already have any modifiers applied to it. Note, I am still "abusing" the PFClusterIsolation maps, this will be fixed by just renaming the parameters to something more neutral

All the classes in EgammaAnalysis/ElectronTools needed for scale & smearing have been put into RecoEgamma/EgammaTools. The classes have been substantially re-written, although the original design is clearly recognizable. They are as follows

  • EnergyScaleCorrection
    The class which reads the scale & smearing corrections from a file and then returns the ones appropriate for an electron/photon. The information is now encapsulated in sub classes ScaleCorrection and SmearCorrection which contain the information for a given e/gamma object. In general, I dislike the reading mechanism, its all kinds of awful. However it is what we have. In the future, this should be either in the database or at the very least some sort of nice json/xml format.

  • PhotonEnergyCalibrator
    Uses the energy scale correction class to calculate the new corrected energies and systmatic variations. Uses a deterministic random number based on run/event/lumi/supercluster seed id/#crystals in SC for the smearing. Note the smearing does effect the data, it enters in the error of the energy.

  • ElectronEnergyCalibrator
    The electron equivalent class of PhotonEnergyCalibrator. Diffs because electrons energy is ECAL-Trk combined and the corrections apply solely to the ECAL energy. This complicates things, and is why I had to start to overhaul the regression code as it needs to re-apply the ecal-track regression starting from the new corrected energy. Obviously having two parallel codes doing exactly the same thing is unacceptable, this is why the class EpCombinationTool which does this is designed to as a drop in replacement for the code in "EGExtraInfoModifierFromDB". This hasnt been swapped over yet but will be in a future PR. Note I'm about 90% of the way to being able to fully template this with PhotonEnergyCalibrator but it wont happen for this PR.

  • CalibratedElectronProducerT, CalibratedPhotonProducerT
    The produces which write the output of PhotonEnergyCalibrator , ElectronEnergyCalibrator to a file. Optionally writes out a new collection of calibrated eles/phos

Proto Regression Changes:
As mentioned the ElectronEnergyCalibrator needs to reapply the ecal-track regression. The following classes facilitate this

Future evolution:

  • all actual regression functionality from EGExtraInfoModifierFromDB is removed and it will be done via classes such as EpCombinationTool
  • the entire EgammaAnalysis package will be deleted from the release
  • CalibratedElectron and CalibratedPhoton producers will ultimately be a templated instance of each other
  • CalibratedElectron and CalibratedPhoton producers will likely be able to apply the ECAL energy regression as well

I am now re-running my longer validations these are

  • miniAOD -> miniAOD of the entire DoubleEG dataset, this allows us to check the preformance in every bin, also done for a DY sample
  • re-miniAOD on a smaller sample size of the DoubleEG dataset + DY sample to validate the results are the same as the re-miniAOD
  • the checks Slava suggested (profiles, size increases , etc)

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 8, 2018

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 8, 2018

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 8, 2018

A new Pull Request was created by @Sam-Harper for master.

It involves the following packages:

EgammaAnalysis/ElectronTools
PhysicsTools/PatAlgos
RecoEgamma/EgammaPhotonProducers
RecoEgamma/EgammaTools
RecoEgamma/ElectronIdentification
RecoEgamma/PhotonIdentification

@perrotta, @monttj, @cmsbuild, @slava77, @gpetruc, @arizzi can you please review it and eventually sign? Thanks.
@TaiSakuma, @gouskos, @jainshilpi, @rappoccio, @varuns23, @seemasharmafnal, @mmarionncern, @imarches, @ahinzmann, @acaudron, @lgray, @jdolen, @ferencek, @rovere, @jdamgov, @nhanvtran, @gkasieczka, @schoef, @JyothsnaKomaragiri, @mverzett, @gpetruc, @mariadalfonso, @pvmulder this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@davidlange6
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 8, 2018

The tests are being triggered in jenkins.

@slava77
Copy link
Contributor

slava77 commented Mar 8, 2018

@Sam-Harper
please clarify the status of smearing in this PR.
The description is not clear enough to me and I'm also puzzled by what looks like photons are smeared in data as well ("Note the smearing does effect the data").

My rough estimate is that this will take until about Wednesday (Mar 14) to review and sign off for the master.

@Sam-Harper
Copy link
Contributor Author

The smearing is currently enabled. It can be disabled with a bool switch.

The smearing will effect the data as it enters the energy error. @shervin86 can give a full explanation this more eloquently than me

@Sam-Harper
Copy link
Contributor Author

On the time estimate, the miniAOD production can start without this, this is an ongoing discussion with PC and PPD. What remains is that it is mandatory for this to be applied for all CMS analyses using this data. Therefore I want this in the release regardless whether it makes it to the re-miniAOD. And back ported to 9_4 (with it all disabled if needs be).

This is a simple consequence of having high standards for code in the release now, things take longer, we had a working ready to go PR submitted. It got rejected due to code standards last week and it takes time to re-write and understand things. This will ultimately benefit us in the future. If it means that either the reminiAOD takes a bit longer or analysers have to suffer a bit with recipes, well that was the choice made.

@slava77
Copy link
Contributor

slava77 commented Mar 8, 2018 via email

@Sam-Harper
Copy link
Contributor Author

Okay, will turn off the MC for now. The final decision on this is not taken though.

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 8, 2018

The code-checks are being triggered in jenkins.

@Sam-Harper
Copy link
Contributor Author

done. It just occurred to me that to save a little bit of space, I didnt store the corrected ECAL energy as its trivally a function of other stored numbers. But given its not applied by default now I think its better to store another 4 floats which this information. If you agree, I'll update the PR later to do this as I have other issues to attend to

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 8, 2018

@cmsbuild
Copy link
Contributor

Pull request #22531 was updated. @perrotta, @monttj, @cmsbuild, @slava77, @gpetruc, @arizzi can you please check and sign again.

@davidlange6
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 15, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/26867/console Started: 2018/03/15 10:43

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@slava77
Copy link
Contributor

slava77 commented Mar 15, 2018

unhold

@cmsbuild cmsbuild removed the hold label Mar 15, 2018
@slava77
Copy link
Contributor

slava77 commented Mar 15, 2018

+1

for #22531 e0525c8

  • bugfix in the input tags was added for cases of running on miniAOD inputs ; otherwise this is the same as the previously signed EGamma Scale & Smearing For Re-miniAOD #22531 2656d1e
  • jenkins tests pass (the jobs run confirming that everything else is still not affected, but the test of running calibrations on miniAOD is not a part of the tests)

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-22531/26867/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 53 differences found in the comparisons
  • DQMHistoTests: Total files compared: 29
  • DQMHistoTests: Total histograms compared: 2477528
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2477351
  • DQMHistoTests: Total skipped: 176
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.869999999886 KiB( 22 files compared)
  • Checked 118 log files, 9 edm output root files, 29 DQM output files

@slava77
Copy link
Contributor

slava77 commented Mar 15, 2018

@smuzaffar
[since I looked]
I was wondering about the health of the comparison stage in/out
https://cmssdt.cern.ch/jenkins/job/compare-root-files-short-matrix/25660/console
for this PR I see

  • 13:39:32 + rsync -az -e
  • 14:14:22 + echo JA_STATUS=OK

It seems a bit crazy these days to have a copy out take 35 mins.
Are the worker nodes on dialup or a similarly performant connection?

@smuzaffar
Copy link
Contributor

@slava77 , last time I check it was still on high bandwidth connection :-)
I think slowdown is due to large number (over 200K) of small files. We can try to archive them first and then upload.

@slava77
Copy link
Contributor

slava77 commented Mar 15, 2018

I think slowdown is due to large number (over 200K) of small files. We can try to archive them first and then upload.

Archiving should be a good thing to do.
Is this really the case here that we have 200K files?
Can you please copy the output somewhere easily accessible.
I wanted to inspect what's there.
I guess RelMon output is the issue where it has a file for every single folder of DQM even if that folder is empty.
Perhaps its output can be pruned so that only folders with plots are created

@fabiocos
Copy link
Contributor

+1

basis for 94X reminiAODv2, not affecting 2018

@smuzaffar
Copy link
Contributor

@slava77 , comparison files are avialable here /afs/cern.ch/work/m/muzaffar/public/PR22531.tar.gz

@fabiocos
Copy link
Contributor

merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants