Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calo towers speed-up #2240

Merged
merged 23 commits into from Feb 27, 2014
Merged

Calo towers speed-up #2240

merged 23 commits into from Feb 27, 2014

Conversation

ktf
Copy link
Contributor

@ktf ktf commented Jan 30, 2014

Forward port of #1885 which itself is a rebase of #1404.

@VinInn
Copy link
Contributor

VinInn commented Jan 30, 2014

for the history please refer to 1885 and 1404

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @ktf (Giulio Eulisse) for CMSSW_7_1_X.

Calo towers speed-up

It involves the following packages:

DataFormats/CaloTowers
DataFormats/Candidate
DataFormats/Common
Geometry/CaloGeometry
Geometry/CaloTopology
RecoJets/JetProducers
RecoLocalCalo/CaloTowersCreator
RecoLocalCalo/EcalRecAlgos

@civanch, @Dr15Jones, @ianna, @mdhildreth, @cmsbuild, @anton-a, @thspeer, @slava77, @vadler, @Degano, @ktf, @nclopezo can you please review it and eventually sign? Thanks.
@ghellwig, @yslai, @TaiSakuma, @wmtan this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
@ktf you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@Dr15Jones
Copy link
Contributor

+1

@vadler
Copy link

vadler commented Jan 30, 2014

+1
Not explixitly tested, but code changes for AT (LeafCandidate) are identical to what has been approved in #1885 .

@cmsbuild
Copy link
Contributor

@nclopezo nclopezo modified the milestones: CMSSW_7_1_0_pre3, CMSSW_7_1_0_pre2 Feb 5, 2014
@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

sorry Slava, are you sure that the plot you refer to was run with my latest correction e76ebb0 ?
I cannot access the latest results from 1885
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1885/2120/summary.html
and the one from this PR are not available (yet, @ktf ???)
I have verified after e76ebb0 and the number are identical BUT for recovered xstals
I know that it is more than a month old (it is difficult also for me to go back and recollect what was down and why), please look back to my comments in 1885
just above the whole set of +1 and the notification of my last commit.

about logic: I think you missed something not so big but relevant:
Old code: numBadEcalCells (access is odd/slow) is the number of constituents that are in the hit-collections and bad + those that are NOT in the collection and have the hit severity flag or the channel (DB) flag to be excluded
please look at the logic in
cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/EcalRecAlgos/src/EcalSeverityLevelAlgo.cc#087
@argiro can give you details

new code tries to emulate this logic

could you please run your DQM test again?

@ktf
Copy link
Contributor Author

ktf commented Feb 26, 2014

sorry Slava, are you sure that the plot you refer to was run with my
latest correction e76ebb0 ?
I cannot access the latest results from 1885
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1885/2120/summary.html
and the one from this PR are not available (yet, @ktf ???)

@nclopezo can you check what is going on?

Ciao,
Giulio

@nclopezo
Copy link
Contributor

Hi,
The results for #1885 were one month old and they had been cleaned up. I started them again so you can see them. I also started again all the tests for this pull request.

@cmsbuild
Copy link
Contributor

@slava77
Copy link
Contributor

slava77 commented Feb 26, 2014

Hi Vincenzo

I will let you know what I see in the validation with the latest IB shortly.

From the code it seems to me that it is not implemented to give logically the same result, while it seems possible to do so without going back to the old implementation. (I'm ignoring long-distance relationship between frequencies of channels vs hits being bad).

On your comment about the logic for the old

hit-collections and bad + those that are NOT in the collection and have the hit severity flag or the channel (DB) flag to be excluded
please look at the logic in
cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/EcalRecAlgos/src/EcalSeverityLevelAlgo.cc#087

what I wrote is

numBadEcalCells (access is odd/slow) is the number of constituents that have the hit severity flag or the channel (DB) flag to be excluded 

so, only the cells which have constituent hits that are "not.bad and pass the threshold" are checked for the "hit or channel" (same as badHit +((not badHit) && channelBad)) severity. In your case, as I see it, you are also adding "channel" severity for hits that are bad (and which severity is not listed in the severities to exclude list) in the mt.numBad. This subset is probably empty, but it's still there in the code.

@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

slava, I do not follow you
the loop at
https://cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#923
in on all constituents. It is independent on the fact that the tower has or do not has ecal-hits
that are "not.bad and pass the threshold".

@nclopezo
Copy link
Contributor

@slava77
Copy link
Contributor

slava77 commented Feb 26, 2014

the loop at
https://cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#923
in on all constituents. 

Right, that's what I said in my summary.

Constituents are counted here
https://cmssdt.cern.ch/SDT/lxr/source/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#547
only if

if (chStatusForCT != CaloTowersCreationAlgo::BadChan && passEmThreshold) {

In your proposal you also increment numBad in https://github.com/cms-sw/cmssw/pull/2240/files#diff-161bc85b60d87e850649585423571364R546 for hits that wouldn't make it to the constituents list

@slava77
Copy link
Contributor

slava77 commented Feb 26, 2014

Based on the latest IB, I can confirm that, as you expect, the regression in the bad ecal cells is much smaller, but it's still there.

Here is a result for 2K single-ele 1 TeV sample (16.0)
wf16_badcells_ee
Here, almost 1% (578/2/40E3) calo towers have differences in the bad cells and this is MC.

The other set of plots that had regressions previously is still there
wf16_l1leadingtau
What's happening here?

@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

line 546
will never been reached for Xstals not in the constituent map because of
https://github.com/ktf/cmssw/blob/rebase-1404/RecoLocalCalo/CaloTowersCreator/src/CaloTowersCreationAlgo.cc#L536

@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

at line 547 are not constituents of the tower. these are hits in the current tower-object.
I consider the regression in number of bad Ecal EE expected. Are due to recovered channels in EE.
The old code was not accounting for those. I consider that a defect that I corrected..

I do not remember the hLT regression (also because I tested mainly on HLT)
and to say the truth is pretty bizarre: the only distribution presenting a difference? ( a sort of reflection in eta)

@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

Slava: two electrons back to back?
Are you arguing on the fact that the "leading L1 tau" is flipping?

@slava77
Copy link
Contributor

slava77 commented Feb 26, 2014

Vincenzo

  1. on badCells. From your follow up on my code comments, I understand that I missed a few points.
    (good, I miss things; I've spent two hours last night getting through this spaghetti and clearly followed wrong leads). What is the reason for the remaining regression? 1% of towers sounds a significant effect to me.

  2. on the taus. Yes, I suspect flip-flopping and don't consider it a major issue.

@VinInn
Copy link
Contributor

VinInn commented Feb 26, 2014

Not tell to me about the spagettis. As you can imagine I had to follow way many more leads to be able to fix performance and keep behavior the same!

What is the reason for the remaining regression? 1% of towers sounds a significant effect to me.

different handling of recovered Xstals. In the old code they are not used and NOT counter bad.
In my opinion was wrong: essentially you may have a tower will all 25 xstal not used and this is not recorded.

In my code they are counted bad (also because is difficult to do it differently).
As explained to me by @argiro they recover Xstas dividing the energy in the trigger read-out among all Xstals belonging to a trigger-channel.
In barrel Ecal-trigger and Calo_tower coincides: you shall see a small red bump at 25: each time a calotower has all Xstals recovered.
In end cap the overlap is not perfect, so you see more single bad Xstals (sometime 4 sometime 7) popping up as bad.

In future we may wish to use the recovered energy: after all if it is ok for PF and Ecal, why should be bad for CaloTower?

@slava77
Copy link
Contributor

slava77 commented Feb 27, 2014

looking at other tests: 17.0 single ele35 has some large drop in PFJets at the edge of the tracker.
Can this still be blamed on numerical?

wf17_pfjet_eta

@anton-a
Copy link

anton-a commented Feb 27, 2014

From the point of view of rechit selection and calotower energy assignment the old/new code look consistent. There is indeed an omission in the counting of bad EE/EB cells in the old code due to not adding the "recovered" cells in the count of "bad" when the flag is set not to use them. This is literally a two-line fix. After implementing this fix and comparing the "fixed old code" to PR2240 the count of bad cells is identical as they both follow the proper and intended logic now. The test was done with wf16 - same as in Slava's example of the discrepancy.

At the same time, the implementation of the bad channel counting in PR2240 still needs to be modified. The current counting logic cannot handle properly the case when recovered EE/EB channels are set to be used (they will be counted both as recovered and bad). I would also suggest that the logic behind the handling of the count of bad channels for the currently active configuration (not to use recovered channels) is made more transparent and robust. It works if channels marked as bad in the DB can only enter the rechit collection as recovered. While this is reasonable, it may not cover some future changes of DB content. Furthermore, the logic is not transparent in the code. In worst case, please make clear comments on the assumed relation between flags in DB and the rechit status word in the code for future reference.
Please also remove the tuples returning the status information if some of the second variable is not going to be used.

There are some other minor suggestions that can wait for future iterations.

In summary, I think that we are very close to closing the discussion. Hope we can address the weird discrepancy in PFJets and the tau trigger stuff...

@VinInn
Copy link
Contributor

VinInn commented Feb 27, 2014

On 27 Feb, 2014, at 4:19 AM, slava77 notifications@github.com wrote:

looking at other tests: 17.0 single ele35 has some large drop in PFJets at the edge of the tracker.
What is the physical origin of those spikes in first place?
Can this still be blamed on numerical?

the binning is pretty compressed and the signal looks to be at a fixed eta (what is the generated distribution?) : I would say that the black plot looks suspicious to me…
is there also a phi and pt distribution? if we are loosing PFs they should show up everywhere….
again single here is really single or double back to back: I do not trust PF for perfectly back to back single particles

v.

@VinInn
Copy link
Contributor

VinInn commented Feb 27, 2014

On 27 Feb, 2014, at 8:01 AM, anton-a notifications@github.com wrote:

From the point of view of rechit selection and calotower energy assignment the old/new code look consistent. There is indeed an omission in the counting of bad EE/EB cells in the old code due to not adding the "recovered" cells in the count of "bad" when the flag is set not to use them. This is literally a two-line fix. After implementing this fix and comparing the "fixed old code" to PR2240 the count of bad cells is identical as they both follow the proper and intended logic now. The test was done with wf16 - same as in Slava's example of the discrepancy.

At the same time, the implementation of the bad channel counting in PR2240 still needs to be modified. The current counting logic cannot handle properly the case when recovered EE/EB channels are set to be used (they will be counted both as recovered and bad). I would also suggest that the logic behind the handling of the count of bad channels for the currently active configuration (not to use recovered channels) is made more transparent and robust. It works if channels marked as bad in the DB can only enter the rechit collection as recovered. While this is reasonable, it may not cover some future changes of DB content. Furthermore, the logic is not transparent in the code. In worst case, please make clear comments on the assumed relation between flags in DB and the rechit status word in the code for future reference.

I think this can be done later.
Please also remove the tuples returning the status information if some of the second variable is not going to be used.

This is purely estelical. At the same level there is much more to be done in there!
There are some other minor suggestions that can wait for future iterations.

In summary, I think that we are very close to closing the discussion. Hope we can address the weird discrepancy in PFJets and the tau trigger stuff...

I suggest to merge this PR now and do the other changes later on as it fully satifly the needs oc current HLT and Reco.

v,

Reply to this email directly or view it on GitHub.

Il est bon de suivre sa pente, pourvu que ce soit en montant.
A.G.
http://www.flickr.com/photos/vin60/1320965757/

@davidlange6
Copy link
Contributor

Merging this - lets move further discussion of the details of CaloTowers to the reco meeting or other appropriate forum and meanwhile rely on the validation team to look in more detail in the context of pre4.

davidlange6 added a commit that referenced this pull request Feb 27, 2014
@davidlange6 davidlange6 merged commit 8062bf4 into cms-sw:CMSSW_7_1_X Feb 27, 2014
@slava77
Copy link
Contributor

slava77 commented Feb 27, 2014

On 2/27/14, 1:43 AM, Vincenzo Innocente wrote:

On 27 Feb, 2014, at 4:19 AM, slava77 notifications@github.com wrote:

looking at other tests: 17.0 single ele35 has some large drop in PFJets at the edge of the tracker.
What is the physical origin of those spikes in first place?

I haven't investigated this.

The gun parameters are
MaxPt = cms.double(35.01),
MinPt = cms.double(34.99),
PartID = cms.vint32(11),
MaxEta = cms.double(2.5),
MinEta = cms.double(-2.5),

So, we shouldn't expect anything outside 2.5.

Can this still be blamed on numerical?

the binning is pretty compressed and the signal looks to be at a fixed
eta (what is the generated distribution?) : I would say that the black
plot looks suspicious to me…
is there also a phi and pt distribution? if we are loosing PFs they
should show up everywhere….

right.
Still, it was worth showing, as these samples are showing up in relvals.

again single here is really single or double back to back: I do not
trust PF for perfectly back to back single particles

back-to-back.

v.


Reply to this email directly or view it on GitHub
#2240 (comment).


Vyacheslav (Slava) Krutelyov
TAMU: Physics Dept Texas A&M MS4242, College Station, TX 77843-4242
CERN: 42-R-027
AIM/Skype: siava16 googleTalk: slava77@gmail.com
(630) 291-5128 Cell (US) +41 76 275 7116 Cell (CERN)


@nclopezo nclopezo modified the milestones: CMSSW_7_1_0_pre5, CMSSW_7_1_0_pre4 Mar 10, 2014
@ktf ktf deleted the rebase-1404 branch June 4, 2014 07:33
ggovi pushed a commit to ggovi/cmssw that referenced this pull request Jan 11, 2017
Advance data for L1Trigger/ L1TCalorimeter and  L1TGlobal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants