Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New style GEDelectron and GEDphoton regressions to EGM reco sequence in 90X #17101

Conversation

rafaellopesdesa
Copy link
Contributor

PR for new style of EGM energy correction and per-particle resolution regression. It is similar to PR #16968 which was intended for standalone application by users (and that is still under review). This one modifies the EGM reco sequence, so changes are expected in the validation plots, specially in plots related to electron and photon energy. This PR depends on the PR #17048, which I believe has already been merged, with the new name convention for the regressions in the GlobalTag. I've tested locally in details with workflow 250200.0 and the results were good.

Att: @fcouderc

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @rafaellopesdesa (Rafael Lopes de Sa) for CMSSW_9_0_X.

It involves the following packages:

RecoEgamma/EgammaTools

@cmsbuild, @cvuosalo, @slava77, @davidlange6 can you please review it and eventually sign? Thanks.
@Sam-Harper, @lgray this is something you requested to watch as well.
@davidlange6, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@slava77
Copy link
Contributor

slava77 commented Dec 28, 2016

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 28, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/17165/console Started: 2016/12/28 20:02


bool useLocalFile_;
std::string addressLocalFile_;
TFile* pointerLocalFile_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additional open files during execution should be avoided as much as possible.
Open the file, read what's needed from it into memory and then close the file.
Do this if memory profile allows.

useLocalFile_ = conf.getParameter<bool>("useLocalFile");
if (useLocalFile_) {
addressLocalFile_ = conf.getParameter<std::string>("addressLocalFile");
pointerLocalFile_ = TFile::Open(addressLocalFile_.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use FileInPath


std::array<float, 33> eval;
Int_t N_SATURATEDXTALS = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use base C++ types.
Also, all caps for a variable is a bad style, it's usually used for compiler macros and some constants.

eval[0] = raw_energy;
eval[1] = the_sc->etaWidth();
eval[2] = the_sc->phiWidth();
eval[3] = full5x5_ess.e5x5/raw_energy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for full5x5_ess.e5x5 to be non-zero and set all values with potential divisions by zero to something sensible below. (maybe define e5x5Inverse = full5x5_ess.e5x5 != 0 ? 1.f/full5x5_ess.e5x5 : 0 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please define rawInverse in a similar way. It's used multiple times here (in [3] and then in [25-27])

float ptMode = el_track->ptMode();
float ptModeErrror = el_track->ptModeError();
float etaModeError = el_track->etaModeError();
float pModeError = sqrt(ptModeErrror*ptModeErrror*cosh(trkEta)*cosh(trkEta) + ptMode*ptMode*sinh(trkEta)*sinh(trkEta)*etaModeError*etaModeError);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use qoverpModeError instead of this rather unnatural value?

from CondCore.DBCommon.CondDBSetup_cfi import *
import FWCore.ParameterSet.Config as cms

def regressionWeights(process):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this file needed if you already have the GT updated in the release?

@cmsbuild
Copy link
Contributor

-1

Tested at: ea7d9b8

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17101/17165/summary.html

I found follow errors while testing this PR

Failed tests: RelVals AddOn

  • RelVals:

When I ran the RelVals I found an error in the following worklfows:
5.1 step1

runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log

9.0 step3
runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log

25.0 step3
runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log

  • AddOn:

I found errors in the following addon tests:

cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@Fake,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Wed Dec 28 20:36:01 2016-date Wed Dec 28 20:33:45 2016 s - exit: 16640
cmsDriver.py RelVal -s HLT:Fake,RAW2DIGI,L1Reco,RECO --mc --scenario=pp -n 10 --conditions auto:run1_mc_Fake --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --processName=HLTRECO --filein file:RelVal_Raw_Fake_MC.root --fileout file:RelVal_Raw_Fake_MC_HLT_RECO.root : FAILED - time: date Wed Dec 28 20:40:28 2016-date Wed Dec 28 20:34:11 2016 s - exit: 16640

@cmsbuild
Copy link
Contributor

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

@rafaellopesdesa
Copy link
Contributor Author

@slava77 I will apply the corrections in the code review. I have some comments about the other questions here:

  • I tried to run the workflows that failed locally. They failed because the regressions cannot be found in the mcRun1 GlobalTags (I ran adding the regressions by hand). I have contacted the AlCa conveners to check why the mcRun2 queues were updated but not the mcRun1 ones.
  • Yes, we could have used a more natural definition of the uncertainty in the track momentum. In particular error on qOverp is an excellent suggestion. However, the training has already be done with this (rather convoluted, I agree) definition. We will keep your suggestion in mind for the next round.
  • regressionWeights_cfi.py is not necessary. I just updated it for "historical reasons". It was there before and I updated it pointing to the new names (which supposedly are in the GT). If preferred, I can remove this configuration from RecoEgamma.

Should I go updating the code with your review or should I wait the confirmation from AlCa that the mcRun1 queues have been committed as well before doing that?

@slava77
Copy link
Contributor

slava77 commented Dec 28, 2016 via email

@rafaellopesdesa
Copy link
Contributor Author

About the suggestion of how to deal with the 80X PR, I think I prefer the file structure as it is right now. I understand that is less ideal for code review, but I think it is easier for us in EGM to find things in the future. As I've explained in the 80X PR, EgammaAnalysis has been traditionally used for user-level applications, and I was hoping to be faithful to this idea.

Let's see how these PR go. Tomorrow I will update the 80X PR with the improvements discussed today in this thread.

@mmusich
Copy link
Contributor

mmusich commented Dec 30, 2016

@rafaellopesdesa @slava77
about:

I tested other run1 workflows, and all seem to fail because of the lack of the payloads in the GT... as I mentioned above, I wrote to @mmusich about this issue. When the GTs are update we proceed with the tests here. :)

the updates are being tested now in #17104. I suggest that commit cms-AlCaDB@aee8c69 is pulled in together with the rest of the code changes requested during the review to be able to execute tests.
Cheers,

M.

@slava77
Copy link
Contributor

slava77 commented Jan 9, 2017

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 9, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/17219/console Started: 2017/01/09 23:06

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 9, 2017

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 9, 2017

Comparison job queued.

@cmsbuild
Copy link
Contributor

@slava77
Copy link
Contributor

slava77 commented Jan 11, 2017

@rafaellopesdesa
I was looking at slides in the EGM/ECAL meeting today.
Should I interpret that you will soon update the payloads or/and the variables used in training?
Please clarify.
Thank you.


const std::vector<std::string> ph_condnames_mean = (bunchspacing_ == 25) ? ph_conf.condnames_mean_25ns : ph_conf.condnames_mean_50ns;
const std::vector<std::string> ph_condnames_sigma = (bunchspacing_ == 25) ? ph_conf.condnames_sigma_25ns : ph_conf.condnames_sigma_50ns;
const std::vector<std::string> ph_condnames_ecalonly_mean = ph_conf.condnames_ecalonly_mean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use const std::vector<std::string>& ph_condnames_ecalonly_mean = ph_conf.condnames_ecalonly_mean; to avoid unnecessary copies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for all other instances in this file

std::array<float, 33> eval;
int nSaturatedXtals = 0;
std::vector< std::pair<DetId, float> > hitsAndFractions = theseed->hitsAndFractions();
for (auto hitFractionPair : hitsAndFractions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::vector< std::pair<DetId, float> >& hitsAndFractions
for (auto const& hitFractionPair : hitsAndFractions


if(theseed == pclus )
size_t i_cluster = 0;
for( auto clus = the_sc->clustersBegin(); clus != the_sc->clustersEnd(); ++clus ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just for (auto const& pclus: the_sc->clusters() ) ?

//magic numbers for MINUIT-like transformation of BDT output onto limited range
//(These should be stored inside the conditions object in the future as well)
constexpr double meanlimlow = 0.2;
constexpr double meanlimhigh = 2.0;
constexpr double meanlimlow = -1.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the comment about saving these magic numbers in the conditions did not serve the purpose.
Better happen next time.


int nSaturatedXtals = 0;
std::vector< std::pair<DetId, float> > hitsAndFractions = theseed->hitsAndFractions();
for (auto hitFractionPair : hitsAndFractions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, add & to avoid copies

edm::Ptr<reco::CaloCluster> pclus;
// loop over all clusters that aren't the seed
size_t i_cluster = 0;
for( auto clus = the_sc->clustersBegin(); clus != the_sc->clustersEnd(); ++clus ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, range loop can be simpler

@rafaellopesdesa
Copy link
Contributor Author

rafaellopesdesa commented Jan 11, 2017 via email

@slava77
Copy link
Contributor

slava77 commented Jan 16, 2017

@rafaellopesdesa
I just wanted to check the progress here.

@slava77
Copy link
Contributor

slava77 commented Jan 27, 2017

ping
is this still needed?
(I know there were other things to be taken care of)

@slava77
Copy link
Contributor

slava77 commented Feb 1, 2017

(almost) weekly ping.
@rafaellopesdesa
Please update on the status of this PR

Thank you.

@rafaellopesdesa
Copy link
Contributor Author

@slava77 Iterating with AlCA to add the payloads and just made an associated PR so that I don't have to re-calculate values when applying the regression.

@rafaellopesdesa
Copy link
Contributor Author

Closing and merging with #17370 with the corrections required in both reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants