Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply regression corrections from database in PFECAL-SuperClusters, decrease PF-SC seeding thresholds #972

Merged
merged 11 commits into from Oct 29, 2013

Conversation

lgray
Copy link
Contributor

@lgray lgray commented Oct 2, 2013

This branch includes an update to apply a BDT regression to found PFECAL-SuperClusters in an event.
The penalty to timing is marginal, about 0.003 seconds are added for this module per event in the ttbar sample.

We should see improved energy resolution for superclusters associated to electrons and photons.
Not sure of the impact on soft clusters.

The associated .db file will be moved into databases soon, for now it reads from a local file. This will be changed once it's in the GlobalTags.

This PR is also pending some final validation from the developer of the regression weights.

Also:
barrel seeding threshold goes from 3 GeV to 1 GeV
endcap seeding threshold goes from 5 GeV to 1 Gev
There is a corresponding increase in superclustering timing.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 2, 2013

A new Pull Request was created by @lgray (Lindsey Gray) for CMSSW_7_0_X.

Apply regression corrections from database in PFECAL-SuperClusters

It involves the following packages:

RecoEcal/EgammaClusterProducers
RecoEcal/EgammaClusterAlgos
RecoEgamma/EgammaTools

@nclopezo, @smuzaffar, @thspeer, @slava77 can you please review it and eventually sign? Thanks.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
@ktf you are the release manager for this.

@@ -116,21 +116,21 @@ class GBRWrapperMaker : public edm::EDAnalyzer {
// iSetup.get<SetupRecord>().get(pSetup);
// #endif
//from Josh:
TFile *infile = new TFile("/afs/cern.ch/user/b/bendavid/cmspublic/gbrv3ph.root","READ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code ever called other than in user tests?
(I recall seeing it once and back then concluded it's not used).
If you need the file, please request a corresponding external and add this file in package/data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this code is only ever called from the test directory of this package.
Can move this to test/ as well if you want?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lindsey
Yes, please, move to test.
It's a bit disturbing to see private directories/files used in the general code area.
/test is much more obviously for testing

@slava77
Copy link
Contributor

slava77 commented Oct 2, 2013

HI Lindsey
pfecalsc_regression_weights_01092013.db and all the stuff in RecoEcal/EgammaClusterProducers/data/ is becoming big enough to trigger a request to make it into an external.
What's the prospect to get the .db file to the global tag (have you made all necessary requests with AlCa already)?

Slava

@lgray
Copy link
Contributor Author

lgray commented Oct 2, 2013

Hi Slava,

The .db file is only temporary, it's meant for testing. I'll send it in for inclusion to in the global tags tomorrow.

-Lindsey

@lgray
Copy link
Contributor Author

lgray commented Oct 2, 2013

@apfeiffer1 @ggovi Could you guys please grab the .db file in this PR and integrate it into the global tag?

@apfeiffer1
Copy link
Contributor

... oops, I almost overlooked this one, this kind of requests should
definitely NOT go here, especially since this PR is not related to AlCaDB
in the first place, I got this only accidentally.

The short answer is sorry, no, we can't. :(

Assuming that this file contains new payloads, please use the dropbox
service to upload them to their proper tag in the DBs. Once this is done,
you can request them to be included into a global tag of your choice via
the Global Tag Collector, following the standard procedure.

Please let us know if you find any problems with these procedures.

On Wed, Oct 2, 2013 at 7:33 PM, Lindsey Gray notifications@github.comwrote:

@apfeiffer1 https://github.com/apfeiffer1 @ggovihttps://github.com/ggoviCould you guys please grab the .db file in this PR and integrate it into
the global tag?


Reply to this email directly or view it on GitHubhttps://github.com//pull/972#issuecomment-25559240
.

Thanks,
cheers, andreas

@lgray
Copy link
Contributor Author

lgray commented Oct 4, 2013

@apfeiffer1 Could you please give me a link to the dropbox and standard procedures, I have never gone through them before.

Thanks!

@slava77
Copy link
Contributor

slava77 commented Oct 4, 2013

@lgray
Copy link
Contributor Author

lgray commented Oct 4, 2013

@slava77 Thanks. In the mean time while I get this stuff in the db, let's test to make sure we see the improvements we expect?

@apfeiffer1
Copy link
Contributor

Thanks, Slava!

And more info on handling the global tags is at:

https://twiki.cern.ch/twiki/bin/viewauth/CMS/AlCaDB#Global_Tags

(which includes the link to the DropBox and one to the Procedure to submit
user database tags in the
GThttps://twiki.cern.ch/twiki/bin/view/CMS/UserTagsInTheGTProcedure
)

HTH,
cheers, andreas

On Fri, Oct 4, 2013 at 9:13 AM, slava77 notifications@github.com wrote:

https://twiki.cern.ch/twiki/bin/view/CMS/DropBox


Reply to this email directly or view it on GitHubhttps://github.com//pull/972#issuecomment-25680206
.

Thanks,
cheers, andreas

@lgray
Copy link
Contributor Author

lgray commented Oct 4, 2013

With respect to "all the other stuff" becoming big enough to be an external, I think a fair amount of the other stuff in data/ is no longer used or can be phased out. I will check

@slava77
Copy link
Contributor

slava77 commented Oct 4, 2013

OK
@slava77 testing

@slava77
Copy link
Contributor

slava77 commented Oct 4, 2013

Lindsey,

I was hoping that at least
"runTheMatrix.py -s --useInput all"
is run, before a PR is submitted
Almost all of it is failing
(I'm using CMSSW_7_0_X_2013-10-03-1400)

e.g. in workflow 1000.0 step2:
[2] Calling event method for module PFECALSuperClusterProducer/'particleFlowSuperClusterECAL'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: std::vectorreco::Vertex
Looking for module label: offlinePrimaryVertices
Looking for productInstanceName:

process.offlinePrimaryVertices is a part of process.vertexreco
process.particleFlowSuperClusterECAL is a part of process.particleFlowSuperClusteringSequence
which is a part of process.ecalClusters.
These are not set in the right order in the globalreco sequence
process.ecalClusters+process.caloTowersRec+process.vertexreco
This is the reason for the failure.

Please, swap the two in your topic branch, test the short matrix (command above) and then update this pull request.

Thanks

Slava

@lgray
Copy link
Contributor Author

lgray commented Oct 4, 2013

@slava77 Ahh, it passed for me since I was doing a rereco. I'll swap them out. Thanks.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 4, 2013

Pull request #972 was updated. @smuzaffar, @nclopezo, @vlimant, @franzoni, @thspeer, @slava77, @fabiocos can you please check and sign again.

@lgray
Copy link
Contributor Author

lgray commented Oct 4, 2013

Sure, I'll fix it. Vestige of original implementation.

@@ -15,6 +57,10 @@
#PFClusters collection
PFClusters = cms.InputTag("particleFlowClusterECAL"),
PFClustersES = cms.InputTag("particleFlowClusterPS"),
vertexCollection = cms.InputTag("offlinePrimaryVertices"),
#rechit collections for lazytools
reducedEcalRecHitsEB = cms.InputTag('ecalRecHit','EcalRecHitsEB'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, these are new parameters that were not here before.
My last comment was based only on the last commit changes.
That made it look like input tags have changed wrt the current values already in the release.

@slava77
Copy link
Contributor

slava77 commented Oct 7, 2013

Lindsey,

Looking at the object diffs, I see that only energy() changes in
particleFlowSuperClusterECAL_particleFlowSuperClusterECALBarrel
particleFlowSuperClusterECAL_particleFlowSuperClusterECALEndcapWithPreshower

all_sign257vsorig_ttbarpuwf202p0c_log10recosuperclusters_particleflowsuperclusterecal_particleflowsuperclusterecalbarrel_reco_obj_energy

The plot is for the TTbar PU sample, workflow 202.0

What's surprising is that this doesn't affect the downstream objects:
recoPFCandidates_particleFlowEGamma__RECO.obj.pt() is the same.
Did I miss something in the relationship between the particleFlowSuperClusterECAL energy
and the particleFlowEGamma kinematics?

Timing looks about the same.

@lgray
Copy link
Contributor Author

lgray commented Oct 7, 2013

@slava77 The mustache SCs aren't yet used in PFEG-algo, so you wouldn't see any change until we flipped the switch for the inputs. It would be more informative to take a look at E_reco/E_true or di-SuperCluster mass widths using just the superclusters. Which sample is the above plot from

@slava77
Copy link
Contributor

slava77 commented Oct 7, 2013

That was a ttbar PU (I updated the comment).

OK for the no change in particleFlowEGamma, got it.

Is the size of the change in the energy expected? Note that this is a log10(energy) plot. That's a pretty large correction.

@lgray
Copy link
Contributor Author

lgray commented Oct 7, 2013

For the barrel the correction tends to be > 1, for endcap somewhat < 1. The pT threshold for the EB/EE is 3/5 GeV.
Jean-Baptiste (the who made the corrections), is checking some weird inconsistencies between 620 and 700 Mustache SCs at the moment as well.

@lgray
Copy link
Contributor Author

lgray commented Oct 9, 2013

@slava77

Matteo Sani reports that he sees improved resolution when running the SCs in the HLT, with a few bins worse possibly due to statistical fluctuations, checking further. Waiting for word from Emanuele di Marco for the offline electron seeding.

I want to wait a bit before putting the .db in the dropbox, we need to make sure we don't need separate regressions for offline/online or other things along those lines.

@lgray
Copy link
Contributor Author

lgray commented Oct 21, 2013

@slava77 Make sure timing doesn't go nuts from lowered threshold. We need to lower the threshold otherwise we see turn-on effects up to very high pT in the endcaps, O(15 GeV).

@cmsbuild
Copy link
Contributor

Pull request #972 was updated. @smuzaffar, @nclopezo, @vlimant, @franzoni, @thspeer, @slava77, @fabiocos can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Oct 25, 2013

@slava77 working on it

@slava77
Copy link
Contributor

slava77 commented Oct 25, 2013

-1

This needs a rebase on top of something more fresh.
Primarily, the interference is with #1078

@lgray Lindsey, please fix.

Thanks

@lgray
Copy link
Contributor Author

lgray commented Oct 26, 2013

@slava77 fixed, matrix tests passed

@cmsbuild
Copy link
Contributor

Pull request #972 was updated. @smuzaffar, @nclopezo, @vlimant, @franzoni, @thspeer, @slava77, @fabiocos can you please check and sign again.

@cmsbuild
Copy link
Contributor

Pull request #972 was updated. @smuzaffar, @nclopezo, @vlimant, @franzoni, @thspeer, @slava77, @fabiocos can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Oct 28, 2013

+1

Tested f808ec7
in CMSSW_7_0_X_2013-10-25-0200 as sign257

DQM
physics: no change
Timing:
(the total time per reco path looks the same)
(looking at diffs >5% and > 300ms per job)
particleFlowSuperClusterECAL 2.23202 ms/ev -> 7.9764 ms/ev
compared to pre- #1078 (~20 ms/ev running with higher thresholds),
this is a manageable increase in timing
validate.C
diffs only in particleFlowSuperClusterECAL_ collections
pt threshold increase (see above) shows up very significantly
there are practically no differences in superclusters above ~20 GeV

all_sign257vsorig_ttbarpuwf202p0c_log10recosuperclusters_particleflowsuperclusterecal_particleflowsuperclusterecalbarrel_reco_obj_energy

@lgray
Lindsey, please update the PR main comment, specifically, to add a note on the decreased thresholds

  • thresh_PFClusterSeedBarrel = cms.double(3.0),
  • thresh_PFClusterSeedBarrel = cms.double(1.0),
  • thresh_PFClusterSeedEndcap = cms.double(5.0),
  • thresh_PFClusterSeedEndcap = cms.double(1.0),

@ktf
Copy link
Contributor

ktf commented Oct 29, 2013

@lgray I'll wait for the above mentioned documentation update before merging this.

@lgray
Copy link
Contributor Author

lgray commented Oct 29, 2013

@ktf it is already updated.

@@ -116,21 +116,21 @@ class GBRWrapperMaker : public edm::EDAnalyzer {
// iSetup.get<SetupRecord>().get(pSetup);
// #endif
//from Josh:
TFile *infile = new TFile("/afs/cern.ch/user/b/bendavid/cmspublic/gbrv3ph.root","READ");
TFile *infile = new TFile("/afs/cern.ch/user/l/lgray/work/public/CMSSW_7_0_0_pre4/src/RecoEgamma/EgammaTools/test/GBR_Clustering_PFMustache_results.root","READ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to go somewhere central.

ktf added a commit that referenced this pull request Oct 29, 2013
Reco updates -- Apply regression corrections from database in PFECAL-SuperClusters, decrease PF-SC seeding thresholds
@ktf ktf merged commit 2f8f663 into cms-sw:CMSSW_7_0_X Oct 29, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants