Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features for low pt electrons in B parking processing #25991

Closed
6 tasks done
fabiocos opened this issue Feb 21, 2019 · 40 comments
Closed
6 tasks done

Features for low pt electrons in B parking processing #25991

fabiocos opened this issue Feb 21, 2019 · 40 comments

Comments

@fabiocos
Copy link
Contributor

fabiocos commented Feb 21, 2019

This issue is meant to keep track of the features needed to finalzie the low pt electron reconstruction at the heart of the B parking data reprocessing. As of now the list is (2019/03/05 17:00):

@cmsbuild
Copy link
Contributor

A new Issue was created by @fabiocos Fabio Cossutti.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fabiocos
Copy link
Contributor Author

@bainbrid
Copy link
Contributor

bainbrid commented Feb 21, 2019

@fabiocos many thanks for this. Some corrections/additions below:

Reconstruction:

BDT models:

Conversions and Skim:

@perrotta
Copy link
Contributor

@bainbrid , please correct either me or the table above, but the backport for #25915 is included in #25936 (yet to be merged), not in #25887

@bainbrid
Copy link
Contributor

@perrotta yes, good spot - thanks! Corrected.

@bainbrid
Copy link
Contributor

FYI, we are considering to open two new PRs for the 10_2_X cycle:

  1. an update of the ID BDT model in cms-data/RecoEgamma-ElectronIdentification, which involves a retraining follow the bug fixes in Final BDT models based on 10.2 MC samples #25936. We could push this updated model to the current open PR (Final BDT models based on 10.2 MC samples  cms-data/RecoEgamma-ElectronIdentification#13) if people agree?

  2. A switch to a new "Very Loose" working for the Seeding module. This would involve a 1-2 line change in a single configuration file. We will provide the physics case in tomorrow's RECO/AT meeting, along with an estimate of the computing cost.

@gkaratha @mverzett

@bainbrid
Copy link
Contributor

With regards to point 1) above, we now plan to proceed with the existing ID model in Autumn18, as a recent retraining shows little improvement (and we will anyway further develop the ID on a longer timescale).

I've just made a new PR just made (#26012) which updates master to use the Autumn18 models, found in cms-data/RecoEgamma-ElectronIdentification#13. I've updated the status. This PR also uses a "Very Loose" working point in the low pT electron seed module for the bParking era.

I will propagate the VL WP change back to the 10_2_X cycle, adding numbers for timing/footprint increases. I can either add it to the existing open #25936 or create a new PR. By default I will create a new PR, unless I hear otherwise.

@bainbrid
Copy link
Contributor

The switch to a Very Loose working point for the bParking era is provided in #26012 (master) and the back port in #26013 (10_2_X). I've updated the status.

@bainbrid
Copy link
Contributor

102X: #26013 (open) supersedes #25936 (closed)

@bainbrid
Copy link
Contributor

bainbrid commented Feb 25, 2019

Added "Change WP" (skimming code): #26015 (master open) -> #25995 (10_2_X open)

@bainbrid
Copy link
Contributor

#26015 master merged

@bainbrid
Copy link
Contributor

#26012 master merged. All PRs to master are now merged.
There are two remaining open PRs to 10_2_X: #26013 and #25995.

@bainbrid
Copy link
Contributor

FYI, we are planning one final PR (to master first, then back port), which will "link" each low pT GsfElectron (via its GsfTrack) to the corresponding "packedCandidate" or "lostTrack" using edm::Association containers.

This is needed because the link between each GsfTrack and the corresponding "generalTrack" that seeds the low pT electron reconstruction is lost in the PAT framework. The above PR will provide this link for miniAOD and remove the need for (suboptimal) deltaR matching.

@mverzett

@fabiocos
Copy link
Contributor Author

@slava77 @perrotta following the discussion at the ORP, is the test of the B parking era performed by passing to cmsDriver the "--era Run2_2018,bParking" option? Could you please confirm or rectify in case?

@perrotta
Copy link
Contributor

Exactly that one:

--era Run2_2018,bParking

One has to decide about the samples to use. I was thinking about modifying as such e.g. the workflows 136.85 (RunEGamma2018A) and a TTbar PU. @bainbrid suggested one signal sample instead for the MC workflow.

@bainbrid
Copy link
Contributor

bainbrid commented Feb 27, 2019 via email

@prebello
Copy link
Contributor

FYI @kfjack @zhenhu @pgunnell

@bainbrid
Copy link
Contributor

#26013 merged to 10_2_X.

@bainbrid
Copy link
Contributor

bainbrid commented Feb 28, 2019

@fabiocos @slava77 @perrotta

Here is an update.

Master:

10_2_X:

Unless we have to iterate further with management on the skim working point, we believe the above PRs should complete our requirements for processing of the B Parked data set.

Please can I query the timeline to form a new 10_2 release? Is it reasonable to hope for early next week?

Our aim is to take this release and immediately process a "pilot run" comprising ~500M events.

@bainbrid
Copy link
Contributor

bainbrid commented Mar 3, 2019

#26031 merged to master

@prebello
Copy link
Contributor

prebello commented Mar 4, 2019

@fabiocos please let us know which 10-2-X release will be prepared for early rereco bparking based in these PRs.
PdmV is waiting for it to reprocess the request from https://its.cern.ch/jira/browse/PDMVRERECO-12
Thank you

@fabiocos
Copy link
Contributor Author

fabiocos commented Mar 4, 2019

@prebello and I am waiting for PdmV to approve #26035, that will trigger the merge of #26038 that will contribute to next 10_2_X build :-)

@prebello
Copy link
Contributor

prebello commented Mar 4, 2019

@fabiocos done
sorry I didn't reach it in time, before I asked you here, among my 1100 emails :-)

@perrotta
Copy link
Contributor

perrotta commented Mar 5, 2019

This is just to remind here that there is still to solve (or at least to understand) the issue of tracks with all zero parameters, which was reported in #26031 (comment)
@bainbrid @mverzett : is there any progress on that front?

@bainbrid
Copy link
Contributor

bainbrid commented Mar 5, 2019

#26035 merged to master.
#26036 merged to CMSSW_10_2_X.

Only #26038 remains unmerged (to 10_2_X).

@fabiocos
Copy link
Contributor Author

fabiocos commented Mar 5, 2019

All the PRs discussed so far have been merged both in master and in 10_2_X (tonight IB will have them).
We still have the issue #25991 (comment) to be clarified

@bainbrid
Copy link
Contributor

bainbrid commented Mar 5, 2019 via email

@perrotta
Copy link
Contributor

perrotta commented Mar 5, 2019

Thanks all! We’ll get back to you on the warnings mentioned above. We’ve seen this before and believe it poses no problems. More details tomorrow. @perrotta, please can you provide exact details on how to reproduce the warnings?

It was explained in #26031 (comment)

Explicitely:

runTheMatrix.py -l 11024 --command "-n 60 --era Run2_2018,bParking"

and the warnings which report about p=0 tracks show up in step3 at events 16, 24, 42, 59

@bainbrid
Copy link
Contributor

bainbrid commented Mar 6, 2019

#26038 merged into 10_2_X

@bainbrid
Copy link
Contributor

bainbrid commented Mar 6, 2019

All PRs are now merged and the full set are available in these two IBs:

  • CMSSW_10_6_X_2019-03-05-1100
  • CMSSW_10_2_X_2019-03-05-2300

Many thanks to everyone for helping us to get to this point!

@bainbrid
Copy link
Contributor

bainbrid commented Mar 6, 2019

@perrotta @fabiocos

With regards to the warnings mentioned here #26031 (comment), we have seen these errors since early in the low pT electrons development cycle. The warnings arise in the PFTrackTransformer class, used by the PFElecTkProducer. We make use of "utility methods" from this PF code that extrapolate (in this case) brem trajectories to the ECAL surface. These extrapolations are used to match bremsstrahlung radiation to ECAL deposits, which in turn are used to build the SuperClusters. The class has several logical clauses in which warnings are made when the extrapolation to the ECAL surface fails.

We note that empirical observations indicate that these warnings are more prevalent in the low pT regime in which we are operating, and we are at or near some boundary conditions in terms of acceptance or performance of these PF algorithms. Any detailed investigations or changes to the PF code is outside the scope of this work and would likely incur a significant delay (weeks?). It was - and is - our intention to rely on this PF code "as is", so this is not a route we would like to pursue. (However, we will attempt to recreate the warnings now and - hopefully - isolate the details for these specific warning instances.)

We also note the following. 1) The warnings arise from a module that occurs after the production of the low pT GsfTracks, which are unaffected. 2) At worst, each warning simply implies a potentially missing energy deposit from brem in the SuperCluster. While this is arguably not ideal, we note that the ID BDT is trained under these conditions. 3) Further, we rely heavily on the GsfTrack information for a P4 measurement; the SuperCluster is mainly used to strengthen the "ID" the electron, beyond that provided by the BDTs at the seeding step (that are quite discriminating already!).

Given this context, we would like to proceed and continue to a new 10_2_X release and the pilot run at the earliest opportunity, if you agree.

@perrotta
Copy link
Contributor

perrotta commented Mar 6, 2019

@bainbrid : yes, the warnings themselves are there since longtime, and not only for the low pt electrons. The point that I asked to investigate is that in some of those warnings, the ones in correspondence of the events I listed, all the track parameters are identically 0! And this (apparently) happens only for the low pt electrons.
Could you please investigate where those tracks with 0 momentum and all other parameters come out? Then, once a track has pt=0 I am not surprised that it cannot reach the calos. The real issue is: "why a track with p=0 enters in your algo"?

@perrotta @fabiocos

With regards to the warnings mentioned here #26031 (comment), we have seen these errors since early in the low pT electrons development cycle. The warnings arise in the PFTrackTransformer class, used by the PFElecTkProducer. We make use of "utility methods" from this PF code that extrapolate (in this case) brem trajectories to the ECAL surface. These extrapolations are used to match bremsstrahlung radiation to ECAL deposits, which in turn are used to build the SuperClusters. The class has several logical clauses in which warnings are made when the extrapolation to the ECAL surface fails.

We note that empirical observations indicate that these warnings are more prevalent in the low pT regime in which we are operating, and we are at or near some boundary conditions in terms of acceptance or performance of these PF algorithms. Any detailed investigations or changes to the PF code is outside the scope of this work and would likely incur a significant delay (weeks?). It was - and is - our intention to rely on this PF code "as is", so this is not a route we would like to pursue. (However, we will attempt to recreate the warnings now and - hopefully - isolate the details for these specific warning instances.)

We also note the following. 1) The warnings arise from a module that occurs after the production of the low pT GsfTracks, which are unaffected. 2) At worst, each warning simply implies a potentially missing energy deposit from brem in the SuperCluster. While this is arguably not ideal, we note that the ID BDT is trained under these conditions. 3) Further, we rely heavily on the GsfTrack information for a P4 measurement; the SuperCluster is mainly used to strengthen the "ID" the electron, beyond that provided by the BDTs at the seeding step (that are quite discriminating already!).

Given this context, we would like to proceed and continue to a new 10_2_X release and the pilot run at the earliest opportunity, if you agree.

@nancymarinelli
Copy link
Contributor

@perrotta @mverzett @bainbrid I just submitted PR #26090 for the master, to include conversions from lowPtGsfTracks in miniAOD

@fabiocos
Copy link
Contributor Author

fabiocos commented Mar 7, 2019

All the code backported so far in 10_2_X will be included in the next release build, see #26098

@fabiocos
Copy link
Contributor Author

@mverzett @bainbrid @nancymarinelli apart for possible fixes, is there anything else to be expected? Or we may finally consider this issue as finalized?

@bainbrid
Copy link
Contributor

If @nancymarinelli is happy from the conversions side, then we are finished and can close.

(The issue #26154 may potentially lead to some bug fixes ...)

@nancymarinelli
Copy link
Contributor

nancymarinelli commented Mar 12, 2019 via email

@fabiocos
Copy link
Contributor Author

fabiocos commented Apr 3, 2019

I consider this plan of work as finalized.

@fabiocos fabiocos closed this as completed Apr 3, 2019
@bainbrid
Copy link
Contributor

bainbrid commented Apr 3, 2019

For the record:

@prebello reports warnings/errors when running on data with EarlyReReco2018 conditions, found in this log /afs/cern.ch/user/p/prebello/public/forBparking/test.log, as reported here. The command used is below:

cmsDriver.py RECO -s RAW2DIGI,L1Reco,RECO,SKIM:SkimBPark,EI,PAT,DQM:@allForPrompt --runUnscheduled --nThreads 8 --data --era Run2_2018,bParking --scenario pp --conditions 102X_dataRun2_Sep2018Rereco_v1 --eventcontent AOD,MINIAOD,DQM --datatier AOD,MINIAOD,DQMIO --customise Configuration/DataProcessing/RecoTLR.customisePostEra_Run2_2018,Configuration/DataProcessing/Utils.addMonitoring --filein /store/data/Run2018A/ParkingBPH1/RAW/v1/000/315/974/00000/8C4FF0AD-4D53-E811-ABFE-FA163EFF434D.root -n 1000 --python_filename=recoskim_Run2018A_ParkingBPH1.py --no_exec

The warnings/errors that appear to relate to the low pT electron code are listed below.

Comments welcome. @mverzett @fabiocos @perrotta @Sam-Harper

%MSG-e PFTrackTransformer: PFElecTkProducer:lowPtGsfElePfGsfTracks 03-Apr-2019 01:24:36 CEST Run: 315974 Event: 44400197
BE CAREFUL: Gsf Tangents not stored in the event. You need to re-reco the particle-flow with RecoToDisplay_cfg.py and not RecoToDisplay_NoTracking_cfg.py

This error is produced here. The message appears to be a little misleading, as the full reconstruction chain is performed in full in the same process and so the transient TrackExtra objects should be in memory. So the error is presumably related to the fact that the number of tangents is zero. In the context of brem, this is presumably an issue as a brem should lead to a tangent. So perhaps this means that the electron did not brem?

%MSG-w OutOfBounds: LowPtGsfElectronSeedProducer:lowPtGsfElectronSeeds 03-Apr-2019 01:25:43 CEST Run: 315974 Event: 44589124
Prob Q outside the bounds of the quality word. Defaulting to Prob=0. Prob = 1 QualityWord = 960

This warning originates from the pixel RecHit code here. I think this call on the cloneSH on the SiPixcelRecHit class triggers the warning.

%MSG-w PFTrackTransformer: PFElecTkProducer:lowPtGsfElePfGsfTracks 03-Apr-2019 01:27:08 CEST Run: 315974 Event: 44558021
BREM Reco track charge = 0, type = 0, Pt = 0, P = 0
  R0 = 0 Z0 = 0
  number of tracker measurements = 0
  number of points total = 6
Traj point id = -1, layer = -1, Eta,Phi = 0,0, X,Y = 0,0, R,Z = 0,0, E,Pt = 0,0
Traj point id = -1, layer = -1, Eta,Phi = 0,0, X,Y = 0,0, R,Z = 0,0, E,Pt = 0,0
Traj point id = -1, layer = -1, Eta,Phi = 0,0, X,Y = 0,0, R,Z = 0,0, E,Pt = 0,0
Traj point id = -1, layer = -1, Eta,Phi = 0,0, X,Y = 0,0, R,Z = 0,0, E,Pt = 0,0
Traj point id = -1, layer = 4, Eta,Phi = 2.99906,0.941118, X,Y = 18.8814,25.9138, R,Z = 32.0629,320.9, E,Pt = 7.72584,0.75581
Traj point id = -1, layer = 5, Eta,Phi = 2.99918,0.941113, X,Y = 19.0197,26.1033, R,Z = 32.2976,323.287, E,Pt = 7.72584,0.75581
 PROPAGATION TO THE HCAL ENTRANCE HAS FAILED
%MSG

This warning is "understood" and has been discussed here and later in the same thread. (Also discussed here.)

@perrotta
Copy link
Contributor

perrotta commented Apr 3, 2019

Couldn't issue nr (2) be just due to some rounding effect?

Line https://github.com/cms-sw/cmssw/blob/master/DataFormats/TrackerRecHit2D/interface/SiPixelRecHitQuality.h#L134 says:

      if(prob<0 || prob>1) {

and the log reports that prob is equal to "1" (as a float), therefore it shouldn't have triggered the warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants