Calo towers speed-up #2240

ktf · 2014-01-30T15:24:26Z

Forward port of #1885 which itself is a rebase of #1404.

…serve (sic)

VinInn · 2014-01-30T15:26:13Z

for the history please refer to 1885 and 1404

cmsbuild · 2014-01-30T15:27:51Z

A new Pull Request was created by @ktf (Giulio Eulisse) for CMSSW_7_1_X.

Calo towers speed-up

It involves the following packages:

DataFormats/CaloTowers
DataFormats/Candidate
DataFormats/Common
Geometry/CaloGeometry
Geometry/CaloTopology
RecoJets/JetProducers
RecoLocalCalo/CaloTowersCreator
RecoLocalCalo/EcalRecAlgos

@civanch, @Dr15Jones, @ianna, @mdhildreth, @cmsbuild, @anton-a, @thspeer, @slava77, @vadler, @Degano, @ktf, @nclopezo can you please review it and eventually sign? Thanks.
@ghellwig, @yslai, @TaiSakuma, @wmtan this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
@ktf you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

Dr15Jones · 2014-01-30T15:42:08Z

+1

vadler · 2014-01-30T16:09:42Z

+1
Not explixitly tested, but code changes for AT (LeafCandidate) are identical to what has been approved in #1885 .

cmsbuild · 2014-01-30T17:54:53Z

+1
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2240/2/summary.html

VinInn · 2014-02-26T08:08:34Z

sorry Slava, are you sure that the plot you refer to was run with my latest correction e76ebb0 ?
I cannot access the latest results from 1885
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1885/2120/summary.html
and the one from this PR are not available (yet, @ktf ???)
I have verified after e76ebb0 and the number are identical BUT for recovered xstals
I know that it is more than a month old (it is difficult also for me to go back and recollect what was down and why), please look back to my comments in 1885
just above the whole set of +1 and the notification of my last commit.

about logic: I think you missed something not so big but relevant:
Old code: numBadEcalCells (access is odd/slow) is the number of constituents that are in the hit-collections and bad + those that are NOT in the collection and have the hit severity flag or the channel (DB) flag to be excluded
please look at the logic in
cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/EcalRecAlgos/src/EcalSeverityLevelAlgo.cc#087
@argiro can give you details

new code tries to emulate this logic

could you please run your DQM test again?

ktf · 2014-02-26T08:21:56Z

sorry Slava, are you sure that the plot you refer to was run with my
latest correction e76ebb0 ?
I cannot access the latest results from 1885
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1885/2120/summary.html
and the one from this PR are not available (yet, @ktf ???)

@nclopezo can you check what is going on?

Ciao,
Giulio

nclopezo · 2014-02-26T10:22:16Z

Hi,
The results for #1885 were one month old and they had been cleaned up. I started them again so you can see them. I also started again all the tests for this pull request.

cmsbuild · 2014-02-26T12:05:20Z

+1
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2240/292/summary.html

slava77 · 2014-02-26T12:45:44Z

Hi Vincenzo

I will let you know what I see in the validation with the latest IB shortly.

From the code it seems to me that it is not implemented to give logically the same result, while it seems possible to do so without going back to the old implementation. (I'm ignoring long-distance relationship between frequencies of channels vs hits being bad).

On your comment about the logic for the old

hit-collections and bad + those that are NOT in the collection and have the hit severity flag or the channel (DB) flag to be excluded
please look at the logic in
cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/EcalRecAlgos/src/EcalSeverityLevelAlgo.cc#087

what I wrote is

numBadEcalCells (access is odd/slow) is the number of constituents that have the hit severity flag or the channel (DB) flag to be excluded

so, only the cells which have constituent hits that are "not.bad and pass the threshold" are checked for the "hit or channel" (same as badHit +((not badHit) && channelBad)) severity. In your case, as I see it, you are also adding "channel" severity for hits that are bad (and which severity is not listed in the severities to exclude list) in the mt.numBad. This subset is probably empty, but it's still there in the code.

VinInn · 2014-02-26T12:48:30Z

hope that
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2240/292/PARAM_BASELINE_COMPARISON
will show up soon

VinInn · 2014-02-26T13:17:22Z

slava, I do not follow you
the loop at
https://cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#923
in on all constituents. It is independent on the fact that the tower has or do not has ecal-hits
that are "not.bad and pass the threshold".

nclopezo · 2014-02-26T13:24:14Z

Hi. The comparison results are ready here:

https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_7_1_X_2014-02-26-0200+2240/1501/

slava77 · 2014-02-26T13:25:05Z

the loop at
https://cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#923
in on all constituents.

Right, that's what I said in my summary.

Constituents are counted here
https://cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#547
only if

if (chStatusForCT != CaloTowersCreationAlgo::BadChan && passEmThreshold) {

In your proposal you also increment numBad in https://github.com/cms-sw/cmssw/pull/2240/files#diff-161bc85b60d87e850649585423571364R546 for hits that wouldn't make it to the constituents list

VinInn · 2014-02-26T13:28:33Z

I see only
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_7_1_X_2014-02-26-0200+2240/1501/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/comparisonResults/Hcal.html
which Ido not expect coming from CaloTower

slava77 · 2014-02-26T13:34:13Z

Based on the latest IB, I can confirm that, as you expect, the regression in the bad ecal cells is much smaller, but it's still there.

Here is a result for 2K single-ele 1 TeV sample (16.0)

Here, almost 1% (578/2/40E3) calo towers have differences in the bad cells and this is MC.

The other set of plots that had regressions previously is still there

What's happening here?

VinInn · 2014-02-26T13:35:18Z

line 546
will never been reached for Xstals not in the constituent map because of
https://github.com/ktf/cmssw/blob/rebase-1404/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#L536

VinInn · 2014-02-26T13:38:57Z

at line 547 are not constituents of the tower. these are hits in the current tower-object.
I consider the regression in number of bad Ecal EE expected. Are due to recovered channels in EE.
The old code was not accounting for those. I consider that a defect that I corrected..

I do not remember the hLT regression (also because I tested mainly on HLT)
and to say the truth is pretty bizarre: the only distribution presenting a difference? ( a sort of reflection in eta)

VinInn · 2014-02-26T13:47:36Z

sorry slava, can you give me the html link
in
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_7_1_X_2014-02-26-0200+2240/1501/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/comparisonResults/
I do not find any regression, where the Tau HLT plot comes from?

VinInn · 2014-02-26T13:53:20Z

Slava: two electrons back to back?
Are you arguing on the fact that the "leading L1 tau" is flipping?

slava77 · 2014-02-26T14:02:34Z

Vincenzo

on badCells. From your follow up on my code comments, I understand that I missed a few points.
(good, I miss things; I've spent two hours last night getting through this spaghetti and clearly followed wrong leads). What is the reason for the remaining regression? 1% of towers sounds a significant effect to me.
on the taus. Yes, I suspect flip-flopping and don't consider it a major issue.

VinInn · 2014-02-26T14:11:12Z

Not tell to me about the spagettis. As you can imagine I had to follow way many more leads to be able to fix performance and keep behavior the same!

What is the reason for the remaining regression? 1% of towers sounds a significant effect to me.

different handling of recovered Xstals. In the old code they are not used and NOT counter bad.
In my opinion was wrong: essentially you may have a tower will all 25 xstal not used and this is not recorded.

In my code they are counted bad (also because is difficult to do it differently).
As explained to me by @argiro they recover Xstas dividing the energy in the trigger read-out among all Xstals belonging to a trigger-channel.
In barrel Ecal-trigger and Calo_tower coincides: you shall see a small red bump at 25: each time a calotower has all Xstals recovered.
In end cap the overlap is not perfect, so you see more single bad Xstals (sometime 4 sometime 7) popping up as bad.

In future we may wish to use the recovered energy: after all if it is ok for PF and Ecal, why should be bad for CaloTower?

slava77 · 2014-02-27T03:19:04Z

looking at other tests: 17.0 single ele35 has some large drop in PFJets at the edge of the tracker.
Can this still be blamed on numerical?

anton-a · 2014-02-27T07:01:29Z

From the point of view of rechit selection and calotower energy assignment the old/new code look consistent. There is indeed an omission in the counting of bad EE/EB cells in the old code due to not adding the "recovered" cells in the count of "bad" when the flag is set not to use them. This is literally a two-line fix. After implementing this fix and comparing the "fixed old code" to PR2240 the count of bad cells is identical as they both follow the proper and intended logic now. The test was done with wf16 - same as in Slava's example of the discrepancy.

At the same time, the implementation of the bad channel counting in PR2240 still needs to be modified. The current counting logic cannot handle properly the case when recovered EE/EB channels are set to be used (they will be counted both as recovered and bad). I would also suggest that the logic behind the handling of the count of bad channels for the currently active configuration (not to use recovered channels) is made more transparent and robust. It works if channels marked as bad in the DB can only enter the rechit collection as recovered. While this is reasonable, it may not cover some future changes of DB content. Furthermore, the logic is not transparent in the code. In worst case, please make clear comments on the assumed relation between flags in DB and the rechit status word in the code for future reference.
Please also remove the tuples returning the status information if some of the second variable is not going to be used.

There are some other minor suggestions that can wait for future iterations.

In summary, I think that we are very close to closing the discussion. Hope we can address the weird discrepancy in PFJets and the tau trigger stuff...

VinInn · 2014-02-27T07:43:15Z

On 27 Feb, 2014, at 4:19 AM, slava77 notifications@github.com wrote:

looking at other tests: 17.0 single ele35 has some large drop in PFJets at the edge of the tracker.
What is the physical origin of those spikes in first place?
Can this still be blamed on numerical?

the binning is pretty compressed and the signal looks to be at a fixed eta (what is the generated distribution?) : I would say that the black plot looks suspicious to me…
is there also a phi and pt distribution? if we are loosing PFs they should show up everywhere….
again single here is really single or double back to back: I do not trust PF for perfectly back to back single particles

v.

VinInn · 2014-02-27T07:45:31Z

On 27 Feb, 2014, at 8:01 AM, anton-a notifications@github.com wrote:

From the point of view of rechit selection and calotower energy assignment the old/new code look consistent. There is indeed an omission in the counting of bad EE/EB cells in the old code due to not adding the "recovered" cells in the count of "bad" when the flag is set not to use them. This is literally a two-line fix. After implementing this fix and comparing the "fixed old code" to PR2240 the count of bad cells is identical as they both follow the proper and intended logic now. The test was done with wf16 - same as in Slava's example of the discrepancy.

At the same time, the implementation of the bad channel counting in PR2240 still needs to be modified. The current counting logic cannot handle properly the case when recovered EE/EB channels are set to be used (they will be counted both as recovered and bad). I would also suggest that the logic behind the handling of the count of bad channels for the currently active configuration (not to use recovered channels) is made more transparent and robust. It works if channels marked as bad in the DB can only enter the rechit collection as recovered. While this is reasonable, it may not cover some future changes of DB content. Furthermore, the logic is not transparent in the code. In worst case, please make clear comments on the assumed relation between flags in DB and the rechit status word in the code for future reference.

I think this can be done later.
Please also remove the tuples returning the status information if some of the second variable is not going to be used.

This is purely estelical. At the same level there is much more to be done in there!
There are some other minor suggestions that can wait for future iterations.

In summary, I think that we are very close to closing the discussion. Hope we can address the weird discrepancy in PFJets and the tau trigger stuff...

—

I suggest to merge this PR now and do the other changes later on as it fully satifly the needs oc current HLT and Reco.

v,

Reply to this email directly or view it on GitHub.

Il est bon de suivre sa pente, pourvu que ce soit en montant.
A.G.
http://www.flickr.com/photos/vin60/1320965757/

davidlange6 · 2014-02-27T09:19:38Z

Merging this - lets move further discussion of the details of CaloTowers to the reco meeting or other appropriate forum and meanwhile rely on the validation team to look in more detail in the context of pre4.

Calo towers speed-up

slava77 · 2014-02-27T12:29:45Z

On 2/27/14, 1:43 AM, Vincenzo Innocente wrote:

On 27 Feb, 2014, at 4:19 AM, slava77 notifications@github.com wrote:

looking at other tests: 17.0 single ele35 has some large drop in PFJets at the edge of the tracker.
What is the physical origin of those spikes in first place?

I haven't investigated this.

The gun parameters are
MaxPt = cms.double(35.01),
MinPt = cms.double(34.99),
PartID = cms.vint32(11),
MaxEta = cms.double(2.5),
MinEta = cms.double(-2.5),

So, we shouldn't expect anything outside 2.5.

Can this still be blamed on numerical?

the binning is pretty compressed and the signal looks to be at a fixed
eta (what is the generated distribution?) : I would say that the black
plot looks suspicious to me…
is there also a phi and pt distribution? if we are loosing PFs they
should show up everywhere….

right.
Still, it was worth showing, as these samples are showing up in relvals.

again single here is really single or double back to back: I do not
trust PF for perfectly back to back single particles

back-to-back.

v.

—
Reply to this email directly or view it on GitHub
#2240 (comment).

Vyacheslav (Slava) Krutelyov
TAMU: Physics Dept Texas A&M MS4242, College Station, TX 77843-4242
CERN: 42-R-027
AIM/Skype: siava16 googleTalk: slava77@gmail.com
(630) 291-5128 Cell (US) +41 76 275 7116 Cell (CERN)

Advance data for L1Trigger/ L1TCalorimeter and L1TGlobal.

VinInn and others added 23 commits January 6, 2014 12:19

store number of bad ecal channel in a map has done for hcal, add a re…

c35bec7

…serve (sic)

remove old code

9a2cf00

add emplace_back and pop_back, use them

f80e29c

cache backpoint in geometry

cb471ad

small printout to verify regressions

b0001c9

does not compile yet, moved from map to vector

619e33d

compiles

19daaf7

major regression fixed

08188cf

added constructor from flats, optimized the other

4f5a832

added move constituents

598ab61

use cartesian vector in float instead of polar lorenz in double

ce457e2

remove printout

5a6e39d

fixed regression in LeafCandidate constructor

a69c8b9

fix test

c6f6880

compute mass from difference of directions...

13b2df1

fix constructor

c2ef883

remove printout

cce64bb

fix test

fd5ef03

Add back PFRootEvent.

68c694b

Fix difference in Ghost Towers

d78f4fc

Check for negative pt when calculating et to work around rounding errors

4c6fd5d

Merged refs/pull/1885/head from repository cms-sw

bea4fba

account for bad ecal hits not in db

e76ebb0

nclopezo modified the milestones: CMSSW_7_1_0_pre3, CMSSW_7_1_0_pre2 Feb 5, 2014

davidlange6 added a commit that referenced this pull request Feb 27, 2014

Merge pull request #2240 from ktf/rebase-1404

8062bf4

Calo towers speed-up

davidlange6 merged commit 8062bf4 into cms-sw:CMSSW_7_1_X Feb 27, 2014

slava77 mentioned this pull request Feb 27, 2014

CaloTowers badCells slava77/cmssw#28

Open

nclopezo modified the milestones: CMSSW_7_1_0_pre5, CMSSW_7_1_0_pre4 Mar 10, 2014

ktf deleted the rebase-1404 branch June 4, 2014 07:33

ggovi pushed a commit to ggovi/cmssw that referenced this pull request Jan 11, 2017

Merge pull request cms-sw#2240 from degano/data_l1trigger_81X530

668075e

Advance data for L1Trigger/ L1TCalorimeter and L1TGlobal.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calo towers speed-up #2240

Calo towers speed-up #2240

ktf commented Jan 30, 2014

VinInn commented Jan 30, 2014

cmsbuild commented Jan 30, 2014

Dr15Jones commented Jan 30, 2014

vadler commented Jan 30, 2014

cmsbuild commented Jan 30, 2014

VinInn commented Feb 26, 2014

ktf commented Feb 26, 2014

nclopezo commented Feb 26, 2014

cmsbuild commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

nclopezo commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

slava77 commented Feb 27, 2014

anton-a commented Feb 27, 2014

VinInn commented Feb 27, 2014

VinInn commented Feb 27, 2014

davidlange6 commented Feb 27, 2014

slava77 commented Feb 27, 2014

Calo towers speed-up #2240

Calo towers speed-up #2240

Conversation

ktf commented Jan 30, 2014

VinInn commented Jan 30, 2014

cmsbuild commented Jan 30, 2014

Dr15Jones commented Jan 30, 2014

vadler commented Jan 30, 2014

cmsbuild commented Jan 30, 2014

VinInn commented Feb 26, 2014

ktf commented Feb 26, 2014

nclopezo commented Feb 26, 2014

cmsbuild commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

nclopezo commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

VinInn commented Feb 26, 2014

slava77 commented Feb 26, 2014

VinInn commented Feb 26, 2014

slava77 commented Feb 27, 2014

anton-a commented Feb 27, 2014

VinInn commented Feb 27, 2014

VinInn commented Feb 27, 2014

davidlange6 commented Feb 27, 2014

slava77 commented Feb 27, 2014