New style GEDelectron and GEDphoton regressions to EGM reco sequence in 90X #17101

rafaellopesdesa · 2016-12-28T18:44:09Z

PR for the new style of EGM energy correction and per-particle resolution regression. It is similar to PR #16968, which was intended for standalone application by users (and is still under review). This one modifies the EGM reco sequence, so changes are expected in the validation plots, especially in plots related to electron and photon energy. This PR depends on PR #17048, which I believe has already been merged, with the new naming convention for the regressions in the GlobalTag. I tested it locally in detail with workflow 250200.0 and the results were good.

Att: @fcouderc

cmsbuild · 2016-12-28T18:44:29Z

A new Pull Request was created by @rafaellopesdesa (Rafael Lopes de Sa) for CMSSW_9_0_X.

It involves the following packages:

RecoEgamma/EgammaTools

@cmsbuild, @cvuosalo, @slava77, @davidlange6 can you please review it and eventually sign? Thanks.
@Sam-Harper, @lgray this is something you requested to watch as well.
@davidlange6, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

slava77 · 2016-12-28T19:01:14Z

@cmsbuild please test

cmsbuild · 2016-12-28T19:01:31Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/17165/console Started: 2016/12/28 20:02

slava77 · 2016-12-28T19:04:17Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

+
+  bool useLocalFile_;
+  std::string addressLocalFile_;  
+  TFile* pointerLocalFile_;


Additional open files during execution should be avoided as much as possible.
Open the file, read what's needed from it into memory, and then close the file.
Do this if the memory profile allows.
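
A minimal sketch of the open/read/close pattern described above. It uses `std::ifstream` as a plain C++ stand-in for the `TFile` in the diff, and `loadFileContents` is a hypothetical helper name, not something from the PR.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read the whole file into memory. The stream is closed when `in` goes out
// of scope, so no file handle stays open while events are being processed.
std::string loadFileContents(const std::string& path) {
  assert(!path.empty());
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  std::ostringstream buffer;
  buffer << in.rdbuf();  // copy everything that is needed into memory
  return buffer.str();
}
```

With ROOT the same idea would presumably mean extracting the needed objects from the file in the constructor and closing the file there, rather than keeping a `TFile*` member alive for the lifetime of the module.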

slava77 · 2016-12-28T19:05:06Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

+  useLocalFile_     = conf.getParameter<bool>("useLocalFile");
+  if (useLocalFile_) {
+    addressLocalFile_ = conf.getParameter<std::string>("addressLocalFile");
+    pointerLocalFile_ = TFile::Open(addressLocalFile_.c_str());


use FileInPath

slava77 · 2016-12-28T19:09:20Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc


-  std::array<float, 33> eval;  
+  Int_t N_SATURATEDXTALS  = 0;


Please use base C++ types.
Also, all caps for a variable name is bad style; it is usually reserved for compiler macros and some constants.

slava77 · 2016-12-28T19:11:44Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

+  eval[0]  = raw_energy;
+  eval[1]  = the_sc->etaWidth();
+  eval[2]  = the_sc->phiWidth(); 
+  eval[3]  = full5x5_ess.e5x5/raw_energy;


check for full5x5_ess.e5x5 to be non-zero and set all values with potential divisions by zero to something sensible below. (maybe define e5x5Inverse = full5x5_ess.e5x5 != 0 ? 1.f/full5x5_ess.e5x5 : 0 )

Please define rawInverse in a similar way. It's used multiple times here (in [3] and then in [25-27])
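
A small sketch of the guarded inverses suggested above. The helper names `safeInverse` and `ShapeRatios` are hypothetical; the zero fallback follows the `e5x5Inverse` expression in the comment.

```cpp
#include <cassert>

// Returns 1/x, or 0 when x is exactly zero, so that ratios built from it
// never contain a division by zero.
inline float safeInverse(float x) { return x != 0.f ? 1.f / x : 0.f; }

// Two of the regression inputs, expressed with precomputed inverses.
struct ShapeRatios {
  float e5x5OverRaw;
  float e2x5BottomOverE5x5;
};

ShapeRatios computeRatios(float rawEnergy, float e5x5, float e2x5Bottom) {
  const float rawInverse = safeInverse(rawEnergy);  // reused wherever raw_energy divides
  const float e5x5Inverse = safeInverse(e5x5);
  return ShapeRatios{e5x5 * rawInverse, e2x5Bottom * e5x5Inverse};
}

// Quick self-check of the guard.
inline void checkSafeInverse() {
  assert(safeInverse(0.f) == 0.f);
  assert(safeInverse(4.f) == 0.25f);
}
```

Computing each inverse once also replaces several divisions with multiplications, which is the reason for defining `rawInverse` rather than guarding each division separately.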

slava77 · 2016-12-28T19:20:48Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

+  float ptMode       = el_track->ptMode();
+  float ptModeErrror = el_track->ptModeError();
+  float etaModeError = el_track->etaModeError();
+  float pModeError   = sqrt(ptModeErrror*ptModeErrror*cosh(trkEta)*cosh(trkEta) + ptMode*ptMode*sinh(trkEta)*sinh(trkEta)*etaModeError*etaModeError);


Can you use qoverpModeError instead of this rather unnatural value?
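
For reference, the `pModeError` expression in the diff matches first-order error propagation for p = pT cosh(eta) with the pT/eta correlation neglected. That reading is inferred from the code, not stated in the thread, and the relation to the q/p uncertainty below assumes |q| = 1.

```latex
p = p_T \cosh\eta,
\qquad
\sigma_p^2 \approx \cosh^2\!\eta \,\sigma_{p_T}^2 + p_T^2 \sinh^2\!\eta \,\sigma_\eta^2,
\qquad
\sigma_p = p^2 \,\sigma_{q/p} \quad (|q| = 1).
```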

slava77 · 2016-12-28T19:23:18Z

RecoEgamma/EgammaTools/python/regressionWeights_cfi.py

-from CondCore.DBCommon.CondDBSetup_cfi import *
+import FWCore.ParameterSet.Config as cms
+
+def regressionWeights(process):


why is this file needed if you already have the GT updated in the release?

cmsbuild · 2016-12-28T19:47:49Z

-1

Tested at: ea7d9b8

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17101/17165/summary.html

I found the following errors while testing this PR

Failed tests: RelVals AddOn

RelVals:

When I ran the RelVals I found an error in the following workflows:
5.1 step1

runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log

9.0 step3

runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log

25.0 step3

runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log

AddOn:

I found errors in the following addon tests:

cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@Fake,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Wed Dec 28 20:36:01 2016-date Wed Dec 28 20:33:45 2016 s - exit: 16640
cmsDriver.py RelVal -s HLT:Fake,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run1_mc_Fake --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --processName=HLTRECO --filein file:RelVal_Raw_Fake_MC.root --fileout file:RelVal_Raw_Fake_MC_HLT_RECO.root : FAILED - time: date Wed Dec 28 20:40:28 2016-date Wed Dec 28 20:34:11 2016 s - exit: 16640

cmsbuild · 2016-12-28T19:47:51Z

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

rafaellopesdesa · 2016-12-28T20:43:58Z

@slava77 I will apply the corrections in the code review. I have some comments about the other questions here:

* I tried to run the workflows that failed locally. They failed because the regressions cannot be found in the mcRun1 GlobalTags (I ran adding the regressions by hand). I have contacted the AlCa conveners to check why the mcRun2 queues were updated but not the mcRun1 ones.
* Yes, we could have used a more natural definition of the uncertainty in the track momentum. In particular, the error on qOverp is an excellent suggestion. However, the training has already been done with this (rather convoluted, I agree) definition. We will keep your suggestion in mind for the next round.
* regressionWeights_cfi.py is not necessary. I just updated it for "historical reasons". It was there before and I updated it to point to the new names (which supposedly are in the GT). If preferred, I can remove this configuration from RecoEgamma.

Should I go ahead and update the code following your review, or should I wait for confirmation from AlCa that the mcRun1 queues have been committed as well before doing that?

slava77 · 2016-12-28T21:24:16Z

On 12/28/16 12:43 PM, Rafael Lopes de Sa wrote:

@slava77 I will apply the corrections in the code review. I have some comments about the other questions here:

* I tried to run the workflows that failed locally. They failed because the regressions cannot be found in the mcRun1 GlobalTags (I ran adding the regressions by hand). I have contacted the AlCa conveners to check why the mcRun2 queues were updated but not the mcRun1 ones.

OK. Assuming the payloads are available, it should be straightforward to get the updates in.

* Yes, we could have used a more natural definition of the uncertainty in the track momentum. In particular, the error on qOverp is an excellent suggestion. However, the training has already been done with this (rather convoluted, I agree) definition. We will keep your suggestion in mind for the next round.

Can you check the math with a few printouts for what goes into the training inputs using qOverP and this pError. It could be that the final value that's computed will be the same, to within numerical precision. The problem is that you are making a numerical roundtrip from qOverP to pt and eta and then back to p, which is slow and can be a source of bad values for no good reason.
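
The comparison asked for above could be set up along these lines. This is only a sketch: the function names are hypothetical, the first function reproduces the expression from the diff, and the second assumes a track with |q| = 1, so that the uncertainty on p is p squared times the uncertainty on q/p.

```cpp
#include <cassert>
#include <cmath>

// Uncertainty on p propagated from pt and eta, as in the diff:
// p = pt * cosh(eta), with the pt/eta correlation neglected.
double pErrorFromPtEta(double pt, double ptErr, double eta, double etaErr) {
  const double dPdPt = std::cosh(eta);
  const double dPdEta = pt * std::sinh(eta);
  return std::sqrt(dPdPt * dPdPt * ptErr * ptErr + dPdEta * dPdEta * etaErr * etaErr);
}

// Uncertainty on p obtained directly from the uncertainty on q/p,
// assuming |q| = 1: sigma_p = p^2 * sigma_(q/p).
double pErrorFromQoverP(double p, double qoverpErr) {
  assert(p > 0.);
  return p * p * qoverpErr;
}
```

Printing both values for a handful of tracks would show whether the two agree to within numerical precision, or whether the neglected correlation term makes a visible difference.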

* regressionWeights_cfi.py is not necessary. I just updated it for "historical reasons". It was there before and I updated it pointing to the new names (which supposedly are in the GT). If preferred, I can remove this configuration from RecoEgamma.

Please remove it.

Should I go ahead and update the code following your review, or should I wait for confirmation from AlCa that the mcRun1 queues have been committed as well before doing that?

Both will be necessary to proceed with this PR. It's up to you to wait or update the code now. Is it correct to assume that you will not need to support the old training with the old variables in 90X? Do we now have a copy of the new training processing code here and in EgammaAnalysis? If so, the copy in the analysis better be removed to reduce possible confusion when someone just keeps using 80X recipes on 91X.


rafaellopesdesa · 2016-12-29T01:28:38Z

About the suggestion on how to deal with the 80X PR, I think I prefer the file structure as it is right now. I understand that it is less than ideal for code review, but I think it will be easier for us in EGM to find things in the future. As I explained in the 80X PR, EgammaAnalysis has traditionally been used for user-level applications, and I was hoping to stay faithful to this idea.

Let's see how these PRs go. Tomorrow I will update the 80X PR with the improvements discussed today in this thread.

mmusich · 2016-12-30T09:10:51Z

@rafaellopesdesa @slava77
about:

I tested other run1 workflows, and all seem to fail because of the lack of the payloads in the GT... as I mentioned above, I wrote to @mmusich about this issue. When the GTs are updated we proceed with the tests here. :)

the updates are being tested now in #17104. I suggest that commit cms-AlCaDB@aee8c69 is pulled in together with the rest of the code changes requested during the review to be able to execute tests.
Cheers,

M.

slava77 · 2017-01-09T22:04:41Z

@cmsbuild please test

cmsbuild · 2017-01-09T22:04:58Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/17219/console Started: 2017/01/09 23:06

cmsbuild · 2017-01-09T23:33:07Z

+1
Tested at: 4b9089e
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17101/17219/summary.html

cmsbuild · 2017-01-09T23:33:10Z

Comparison job queued.

cmsbuild · 2017-01-10T01:56:50Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17101/17219/summary.html

slava77 · 2017-01-11T14:02:36Z

@rafaellopesdesa
I was looking at slides in the EGM/ECAL meeting today.
Should I take that to mean that you will soon update the payloads and/or the variables used in training?
Please clarify.
Thank you.

slava77 · 2017-01-11T19:58:07Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc


-  const std::vector<std::string> ph_condnames_mean  = (bunchspacing_ == 25) ? ph_conf.condnames_mean_25ns  : ph_conf.condnames_mean_50ns;
-  const std::vector<std::string> ph_condnames_sigma = (bunchspacing_ == 25) ? ph_conf.condnames_sigma_25ns : ph_conf.condnames_sigma_50ns;
+  const std::vector<std::string> ph_condnames_ecalonly_mean  = ph_conf.condnames_ecalonly_mean;


use const std::vector<std::string>& ph_condnames_ecalonly_mean = ph_conf.condnames_ecalonly_mean; to avoid unnecessary copies

same for all other instances in this file
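
To illustrate the difference, here is a minimal sketch with a hypothetical `RegressionConfig` struct standing in for the plugin's configuration object. Binding to a const reference aliases the existing vector, whereas the by-value declaration in the diff copies every string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-object configuration held by the plugin.
struct RegressionConfig {
  std::vector<std::string> condnames_ecalonly_mean;
};

std::size_t countNames(const RegressionConfig& ph_conf) {
  // Const reference: refers to the vector inside ph_conf, nothing is copied.
  const std::vector<std::string>& ph_condnames_ecalonly_mean = ph_conf.condnames_ecalonly_mean;
  assert(&ph_condnames_ecalonly_mean == &ph_conf.condnames_ecalonly_mean);
  return ph_condnames_ecalonly_mean.size();
}
```

The reference stays valid only as long as the object it refers to is alive, which holds here if the configuration is a member of the module.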

slava77 · 2017-01-11T20:01:55Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

-  std::array<float, 33> eval;  
+  int nSaturatedXtals  = 0;
+  std::vector< std::pair<DetId, float> > hitsAndFractions = theseed->hitsAndFractions();
+  for (auto hitFractionPair : hitsAndFractions) {    


const std::vector< std::pair<DetId, float> >& hitsAndFractions = theseed->hitsAndFractions();
for (auto const& hitFractionPair : hitsAndFractions) {

slava77 · 2017-01-11T20:07:50Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

-
-    if(theseed == pclus ) 
+  size_t i_cluster = 0;
+  for( auto clus = the_sc->clustersBegin(); clus != the_sc->clustersEnd(); ++clus ) {


why not just for (auto const& pclus: the_sc->clusters() ) ?
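
A sketch of the range-based form suggested above, written against a plain `std::vector` with a hypothetical `Cluster` struct in place of `reco::CaloCluster`. The seed is skipped by comparing addresses, mirroring the `theseed == pclus` check in the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for reco::CaloCluster; only the raw energy is kept.
struct Cluster {
  float energy;
};

// Collect the energies of all clusters except the seed, using a range-based
// loop instead of explicit begin/end iterators.
std::vector<float> nonSeedEnergies(const std::vector<Cluster>& clusters, const Cluster* seed) {
  assert(seed != nullptr);
  std::vector<float> energies;
  for (auto const& clus : clusters) {
    if (&clus == seed) continue;  // skip the seed cluster
    energies.push_back(clus.energy);
  }
  return energies;
}
```

Taking the loop variable as `auto const&` avoids copying each element, which is the same concern as in the `hitsAndFractions` comments.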

slava77 · 2017-01-11T20:10:31Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

  //magic numbers for MINUIT-like transformation of BDT output onto limited range
  //(These should be stored inside the conditions object in the future as well)
-  constexpr double meanlimlow  = 0.2;
-  constexpr double meanlimhigh = 2.0;
+  constexpr double meanlimlow  = -1.0;


It looks like the comment about saving these magic numbers in the conditions did not serve its purpose.
It had better happen next time.
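
For readers unfamiliar with the "MINUIT-like transformation" mentioned in the code comment: MINUIT maps an unbounded internal parameter onto a doubly-bounded range with a sine. The sketch below assumes the plugin uses that form; `toLimitedRange` is a hypothetical name, and the limits in the usage note are the old values shown in the diff.

```cpp
#include <cassert>
#include <cmath>

// Map an unbounded raw BDT response onto [low, high] using the sine
// transform that MINUIT applies to doubly-bounded parameters.
inline double toLimitedRange(double raw, double low, double high) {
  assert(low < high);
  const double offset = low + 0.5 * (high - low);  // midpoint of the range
  const double scale = 0.5 * (high - low);         // half-width of the range
  return offset + scale * std::sin(raw);
}
```

With the previous limits, `toLimitedRange(raw, 0.2, 2.0)` always lands between 0.2 and 2.0. Because the limits define how the raw output is interpreted, changing them without changing the stored payload changes the result, which is why keeping them in the conditions object alongside the training would be safer.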

slava77 · 2017-01-11T20:12:12Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

+
+  int nSaturatedXtals  = 0;
+  std::vector< std::pair<DetId, float> > hitsAndFractions = theseed->hitsAndFractions();
+  for (auto hitFractionPair : hitsAndFractions) {


as above, add & to avoid copies

slava77 · 2017-01-11T20:12:57Z

RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc

+  edm::Ptr<reco::CaloCluster> pclus;
+  // loop over all clusters that aren't the seed  
+  size_t i_cluster = 0;
+  for( auto clus = the_sc->clustersBegin(); clus != the_sc->clustersEnd(); ++clus ) {


as above, range loop can be simpler

rafaellopesdesa · 2017-01-11T21:38:54Z

Hi Slava,

Yes, Fabrice pointed out to me an inconsistency between the training code and the application code, so a small update will be introduced in the coming days. I will also reply to your more recent code review at the same time.

Thanks again,
-- Rafael.


slava77 · 2017-01-16T19:14:04Z

@rafaellopesdesa
I just wanted to check the progress here.

slava77 · 2017-01-27T13:15:49Z

ping
is this still needed?
(I know there were other things to be taken care of)

slava77 · 2017-02-01T12:15:14Z

(almost) weekly ping.
@rafaellopesdesa
Please update on the status of this PR

Thank you.

rafaellopesdesa · 2017-02-01T14:35:09Z

@slava77 I am iterating with AlCa to add the payloads, and I just made an associated PR so that I don't have to recalculate values when applying the regression.

rafaellopesdesa · 2017-02-14T00:47:40Z

Closing and merging with #17370 with the corrections required in both reviews.

add new regression to EGM reco sequence in 90X, requires PR 17048

ea7d9b8

cmsbuild added this to the Next CMSSW_9_0_X milestone Dec 28, 2016

cmsbuild added comparison-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Dec 28, 2016

cmsbuild added tests-started and removed tests-pending labels Dec 28, 2016

slava77 reviewed Dec 28, 2016

cmsbuild added comparison-notrun tests-rejected and removed comparison-pending tests-started labels Dec 28, 2016

code review modifications

ca008cd

cmsbuild added comparison-pending and removed comparison-notrun tests-rejected labels Dec 28, 2016

mmusich mentioned this pull request Dec 30, 2016

[90X] Add new style of e/gamma regressions labels in Run1 MC Global Tags #17104

Merged

cmsbuild added tests-started and removed tests-pending labels Jan 9, 2017

cmsbuild added tests-approved and removed tests-started labels Jan 9, 2017

cmsbuild added comparison-available and removed comparison-pending labels Jan 10, 2017

slava77 reviewed Jan 11, 2017

rafaellopesdesa mentioned this pull request Feb 1, 2017

Add 2x5 shower-shape and saturation info for electrons and photons #17370

Closed

rafaellopesdesa closed this Feb 14, 2017

rafaellopesdesa mentioned this pull request Feb 14, 2017

New methods in GsfElectron and Photon classes, and 90X EGM regression #17506

Merged
