New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New style GEDelectron and GEDphoton regressions to EGM reco sequence in 90X #17101
New style GEDelectron and GEDphoton regressions to EGM reco sequence in 90X #17101
Conversation
A new Pull Request was created by @rafaellopesdesa (Rafael Lopes de Sa) for CMSSW_9_0_X. It involves the following packages: RecoEgamma/EgammaTools @cmsbuild, @cvuosalo, @slava77, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here #13028 |
@cmsbuild please test |
The tests are being triggered in jenkins. |
|
||
bool useLocalFile_; | ||
std::string addressLocalFile_; | ||
TFile* pointerLocalFile_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additional open files during execution should be avoided as much as possible.
Open the file, read what's needed from it into memory and then close the file.
Do this if memory profile allows.
useLocalFile_ = conf.getParameter<bool>("useLocalFile"); | ||
if (useLocalFile_) { | ||
addressLocalFile_ = conf.getParameter<std::string>("addressLocalFile"); | ||
pointerLocalFile_ = TFile::Open(addressLocalFile_.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use FileInPath
|
||
std::array<float, 33> eval; | ||
Int_t N_SATURATEDXTALS = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use base C++ types.
Also, all caps for a variable is a bad style, it's usually used for compiler macros and some constants.
eval[0] = raw_energy; | ||
eval[1] = the_sc->etaWidth(); | ||
eval[2] = the_sc->phiWidth(); | ||
eval[3] = full5x5_ess.e5x5/raw_energy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for full5x5_ess.e5x5 to be non-zero and set all values with potential divisions by zero to something sensible below. (maybe define e5x5Inverse = full5x5_ess.e5x5 != 0 ? 1.f/full5x5_ess.e5x5 : 0
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please define rawInverse
in a similar way. It's used multiple times here (in [3] and then in [25-27])
float ptMode = el_track->ptMode(); | ||
float ptModeErrror = el_track->ptModeError(); | ||
float etaModeError = el_track->etaModeError(); | ||
float pModeError = sqrt(ptModeErrror*ptModeErrror*cosh(trkEta)*cosh(trkEta) + ptMode*ptMode*sinh(trkEta)*sinh(trkEta)*etaModeError*etaModeError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use qoverpModeError instead of this rather unnatural value?
from CondCore.DBCommon.CondDBSetup_cfi import * | ||
import FWCore.ParameterSet.Config as cms | ||
|
||
def regressionWeights(process): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this file needed if you already have the GT updated in the release?
-1 Tested at: ea7d9b8 You can see the results of the tests here: I found follow errors while testing this PR Failed tests: RelVals AddOn
When I ran the RelVals I found an error in the following worklfows: runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log9.0 step3 runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log25.0 step3 runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log
I found errors in the following addon tests: cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@Fake,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Wed Dec 28 20:36:01 2016-date Wed Dec 28 20:33:45 2016 s - exit: 16640 |
Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped) |
@slava77 I will apply the corrections in the code review. I have some comments about the other questions here:
Should I go updating the code with your review or should I wait the confirmation from AlCa that the mcRun1 queues have been committed as well before doing that? |
On 12/28/16 12:43 PM, Rafael Lopes de Sa wrote:
@slava77 <https://github.com/slava77> I will apply the corrections in
the code review. I have some comments about the other questions here:
* I tried to run the workflows that failed locally. They failed
because the regressions cannot be found in the mcRun1 GlobalTags (I
ran adding the regressions by hand). I have contacted the AlCa
conveners to check why the mcRun2 queues were updated but not the
mcRun1 ones.
OK.
Assuming the payloads are available, it should be straightforward to get
the updates in.
* Yes, we could have used a more natural definition of the uncertainty
in the track momentum. In particular error on qOverp is an excellent
suggestion. However, the training has already be done with this
(rather convoluted, I agree) definition. We will keep your
suggestion in mind for the next round.
Can you check the math with a few printouts for what goes into the
training inputs
using qOverP and this pError.
It could be that the final value that's computed will be the same, to
within numerical precision.
The problem is that you are making a numerical roundtrip from qOverP to
pt and eta and then back to p, which is slow and can be a source of bad
values for no good reason.
* regressionWeights_cfi.py is not necessary. I just updated it for
"historical reasons". It was there before and I updated it pointing
to the new names (which supposedly are in the GT). If preferred, I
can remove this configuration from RecoEgamma.
Please remove it.
Should I go updating the code with your review or should I wait the
confirmation from AlCa that the mcRun1 queues have been committed as
well before doing that?
Both will be necessary to proceed with this PR.
It's up to you to wait or update the code now.
Is it correct to assume that you will not need to support the old
training with the old variables in 90X?
Do we now have a copy of the new training processing code here and in
EgammaAnalysis?
If so, the copy in the analysis better be removed to reduce possible
confusion when someone just keeps using 80X recipes on 91X.
…
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17101 (comment)>, or
mute the thread
<https://github.com/notifications/unsubscribe-auth/AEdcbser0grC3KOXRQVcakW-MHh6UhF7ks5rMsoOgaJpZM4LXHH4>.
|
About the suggestion of how to deal with the 80X PR, I think I prefer the file structure as it is right now. I understand that is less ideal for code review, but I think it is easier for us in EGM to find things in the future. As I've explained in the 80X PR, EgammaAnalysis has been traditionally used for user-level applications, and I was hoping to be faithful to this idea. Let's see how these PR go. Tomorrow I will update the 80X PR with the improvements discussed today in this thread. |
@rafaellopesdesa @slava77
the updates are being tested now in #17104. I suggest that commit cms-AlCaDB@aee8c69 is pulled in together with the rest of the code changes requested during the review to be able to execute tests. M. |
@cmsbuild please test |
The tests are being triggered in jenkins. |
Comparison job queued. |
@rafaellopesdesa |
|
||
const std::vector<std::string> ph_condnames_mean = (bunchspacing_ == 25) ? ph_conf.condnames_mean_25ns : ph_conf.condnames_mean_50ns; | ||
const std::vector<std::string> ph_condnames_sigma = (bunchspacing_ == 25) ? ph_conf.condnames_sigma_25ns : ph_conf.condnames_sigma_50ns; | ||
const std::vector<std::string> ph_condnames_ecalonly_mean = ph_conf.condnames_ecalonly_mean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use const std::vector<std::string>& ph_condnames_ecalonly_mean = ph_conf.condnames_ecalonly_mean;
to avoid unnecessary copies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for all other instances in this file
std::array<float, 33> eval; | ||
int nSaturatedXtals = 0; | ||
std::vector< std::pair<DetId, float> > hitsAndFractions = theseed->hitsAndFractions(); | ||
for (auto hitFractionPair : hitsAndFractions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const std::vector< std::pair<DetId, float> >& hitsAndFractions
for (auto const& hitFractionPair : hitsAndFractions
|
||
if(theseed == pclus ) | ||
size_t i_cluster = 0; | ||
for( auto clus = the_sc->clustersBegin(); clus != the_sc->clustersEnd(); ++clus ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just for (auto const& pclus: the_sc->clusters() )
?
//magic numbers for MINUIT-like transformation of BDT output onto limited range | ||
//(These should be stored inside the conditions object in the future as well) | ||
constexpr double meanlimlow = 0.2; | ||
constexpr double meanlimhigh = 2.0; | ||
constexpr double meanlimlow = -1.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the comment about saving these magic numbers in the conditions did not serve the purpose.
Better happen next time.
|
||
int nSaturatedXtals = 0; | ||
std::vector< std::pair<DetId, float> > hitsAndFractions = theseed->hitsAndFractions(); | ||
for (auto hitFractionPair : hitsAndFractions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, add &
to avoid copies
edm::Ptr<reco::CaloCluster> pclus; | ||
// loop over all clusters that aren't the seed | ||
size_t i_cluster = 0; | ||
for( auto clus = the_sc->clustersBegin(); clus != the_sc->clustersEnd(); ++clus ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, range loop can be simpler
Hi Slava,
Yes, Fabrice pointed to me an inconsistency between the training code and
application code, so a small update will be introduced in the coming days.
I will also reply to your more recent code review concomitantly.
Thanks again,
-- Rafael.
…On Thu, Jan 12, 2017 at 5:12 AM, Slava Krutelyov ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In RecoEgamma/EgammaTools/plugins/EGExtraInfoModifierFromDB.cc
<#17101 (review)>:
> + eval[22] = full5x5_pss.e2x5Bottom*e5x5Inverse;
+ eval[23] = nSaturatedXtals;
+ eval[24] = std::max(0,numberOfClusters);
+
+ // calculate sub-cluster variables
+ std::vector<float> clusterRawEnergy;
+ clusterRawEnergy.resize(std::max(3, numberOfClusters), 0);
+ std::vector<float> clusterDEtaToSeed;
+ clusterDEtaToSeed.resize(std::max(3, numberOfClusters), 0);
+ std::vector<float> clusterDPhiToSeed;
+ clusterDPhiToSeed.resize(std::max(3, numberOfClusters), 0);
+
+ edm::Ptr<reco::CaloCluster> pclus;
+ // loop over all clusters that aren't the seed
+ size_t i_cluster = 0;
+ for( auto clus = the_sc->clustersBegin(); clus != the_sc->clustersEnd(); ++clus ) {
as above, range loop can be simpler
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17101 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AFtCpHcPEK4re83Ripiq29rG-sVZSM4oks5rRTfKgaJpZM4LXHH4>
.
|
@rafaellopesdesa |
ping |
(almost) weekly ping. Thank you. |
@slava77 Iterating with AlCA to add the payloads and just made an associated PR so that I don't have to re-calculate values when applying the regression. |
Closing and merging with #17370 with the corrections required in both reviews. |
PR for new style of EGM energy correction and per-particle resolution regression. It is similar to PR #16968 which was intended for standalone application by users (and that is still under review). This one modifies the EGM reco sequence, so changes are expected in the validation plots, specially in plots related to electron and photon energy. This PR depends on the PR #17048, which I believe has already been merged, with the new name convention for the regressions in the GlobalTag. I've tested locally in details with workflow 250200.0 and the results were good.
Att: @fcouderc