
Turning on ClusterRepair for edge hits only #26263

Merged
merged 10 commits into cms-sw:master on Apr 11, 2019

Conversation

@cmantill (Contributor) commented Mar 26, 2019

PR description:

This PR turns ON the ClusterRepair CPE for edge hits only.
It adds an option to the configuration file so that it is possible to choose which pixel sub-detectors/layers to run on. By default this option is "PXB 2","PXB 3","PXB 4", which refers to layers 2, 3, and 4 of the pixel barrel.
The PR further adds a flag to turn ClusterRepair on/off for truncated clusters; this flag is false by default.
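For illustration, a minimal sketch of the selection logic these options imply. This is not the actual CMSSW code: ClusterRepairConfig, recommend2D, runDamagedClusters, and eligibleForRepair are illustrative names, and only the defaults ("PXB 2","PXB 3","PXB 4", flag false) come from the description above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative defaults mirroring the PR description.
struct ClusterRepairConfig {
  std::vector<std::string> recommend2D{"PXB 2", "PXB 3", "PXB 4"};  // layers to run on
  bool runDamagedClusters = false;  // truncated-cluster repair, off by default
};

// True if ClusterRepair should be attempted for a hit on this subdetector/layer.
bool eligibleForRepair(const ClusterRepairConfig& cfg,
                       const std::string& subdetLayer,
                       bool isEdgeHit) {
  if (!isEdgeHit)
    return false;  // this PR restricts ClusterRepair to edge hits
  return std::find(cfg.recommend2D.begin(), cfg.recommend2D.end(), subdetLayer) !=
         cfg.recommend2D.end();
}
```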

The PR follows from the discussion at the DPG meeting: https://indico.cern.ch/event/787658/?showDate=all&showSession=6

The following are updated tags for autoCond:

Update pixel local reco conditions in the offline GTs

Data

New offline GT is
https://cms-conddb.cern.ch/cmsDbBrowser/list/Prod/gts/106X_dataRun2_v4
Diff
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/106X_dataRun2_v4/106X_dataRun2_v1
New offline relval GT is
https://cms-conddb.cern.ch/cmsDbBrowser/list/Prod/gts/106X_dataRun2_relval_v2
Diff
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/106X_dataRun2_relval_v2/106X_dataRun2_relval_1

Only the 2017 conditions are updated for GenErr, the Lorentz angle, and the 1D template.
The 2D template changes for 2016 and 2017, but it was not consumed before this PR.

PR validation:

It has been tested locally on data using cmsDriver step3 (on JetHT, and on a run for which a 2D template exists), and it was verified via printouts that ClusterRepair is called only for the layers defined in the configuration file.
It has also been tested with the standard runTheMatrix tests, using the 105X_dataRun2_Candidate_2019_04_03_13_37_01 GT and checking step3 independently for runs corresponding to each 2017 IOV.

@cmsbuild (Contributor)

The code-checks are being triggered in jenkins.

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26263/8938

  • This PR adds an extra 24KB to repository

@cmsbuild (Contributor)

A new Pull Request was created by @cmantill (Cristina Ana Mantilla Suarez) for master.

It involves the following packages:

RecoLocalTracker/SiPixelRecHits
RecoTracker/TransientTrackingRecHit

@cmsbuild, @perrotta, @slava77 can you please review it and eventually sign? Thanks.
@makortel, @felicepantaleo, @GiacomoSguazzoni, @rovere, @VinInn, @mschrode, @dkotlins, @gpetruc, @ebrondol, @threus, @dgulhan this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@cmantill (Contributor, Author)

Standard runTheMatrix tests have been performed, but they are failing due to a global tag error [1].
The current release I'm using is CMSSW_10_6_0_pre2 on sl7.
We expect jobs to fail because there are no updated conditions for the 2D templates, but this error does not seem related. If anyone has any suggestions, please let me know.

[1]
----- Begin Fatal Exception 26-Mar-2019 18:37:15 CDT-----------------------
An exception of category 'StdException' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESSource: class=PoolDBESSource label='GlobalTag'
Exception Message:
A std::exception was thrown.
Connection on "frontier://(preferipfamily=0)(proxyconfigurl=http://grid-wpad/wpad.dat)(backupproxyurl=http://cmst0frontier.cern.ch:3128)(backupproxyurl=http://cmst0frontier1.cern.ch:3128)(backupproxyurl=http://cmst0frontier2.cern.ch:3128)(backupproxyurl=http://cmsbpfrontier.cern.ch:3128)(backupproxyurl=http://cmsbpfrontier1.cern.ch:3128)(backupproxyurl=http://cmsbpfrontier2.cern.ch:3128)(backupproxyurl=http://cmsbproxy.fnal.gov:3128)(serverurl=http://cmsfrontier.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier1.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier2.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier3.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier4.cern.ch:8000/FrontierProd)/CMS_CONDITIONS" cannot be established ( CORAL : "ConnectionPool::getSessionFromNewConnection" from "CORAL/Services/ConnectionService" )
----- End Fatal Exception -------------------------------------------------

@cmantill (Contributor, Author)

Quick update: I repeated the standard runTheMatrix tests on lxplus and most of them completed (it seems the earlier error was related to the connection on the Fermilab cluster).

The following workflows fail:
136.85_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM
140.56_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18
136.788_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017

[2]
Exception Message:
SiPixelTemplate2D::interpolate can't find needed template ID = 6534, Are you using the correct global tag?
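For context, a minimal sketch of the failure mode behind this message (illustrative only, not the real SiPixelTemplate2D interface): the reconstruction asks the 2D-template payload carried by the global tag for a template ID it simply does not contain.

```cpp
#include <map>
#include <stdexcept>
#include <string>

struct Template2D {};  // stand-in for a stored 2D template entry

// Look up a template by ID; throw with the message seen above when it is
// missing, which is what happens when the GT carries the wrong payload.
const Template2D& getTemplate2D(const std::map<int, Template2D>& store, int id) {
  auto it = store.find(id);
  if (it == store.end())
    throw std::runtime_error("SiPixelTemplate2D::interpolate can't find needed template ID = " +
                             std::to_string(id) + ", Are you using the correct global tag?");
  return it->second;
}
```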

@tvami (Contributor) commented Mar 28, 2019

Hi @tocheng, the PR fails because it wants to use the prompt GTs. PR #26281 seems to fix this. Can we do some kind of workaround to pass these tests?

@cmantill (Contributor, Author)

@tvami I tried your GT, 105X_dataRun2_v8, and it works for all 3 workflows that were failing before.

But I tried PR #26281 and it still does not work, e.g. for workflow 136.788:

No HLT prescale table found
Using default empty table with all prescales 1
%MSG
----- Begin Fatal Exception 29-Mar-2019 03:20:27 CET-----------------------
An exception of category 'DataCorrupt' occurred while
   [0] Processing  Event run: 297557 lumi: 119 event: 189755411 stream: 0
   [1] Running path 'Flag_METFilters'
   [2] Prefetching for module BooleanFlagFilter/'HBHENoiseFilter'
   [3] Prefetching for module HBHENoiseFilterResultProducer/'HBHENoiseFilterResultProducer'
   [4] Prefetching for module HcalNoiseInfoProducer/'hcalnoise'
   [5] Prefetching for module FastjetJetProducer/'ak4PFJets'
   [6] Prefetching for module PFLinker/'particleFlow'
   [7] Prefetching for module PFProducer/'particleFlowTmp'
   [8] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [9] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [10] Prefetching for module GsfTrackProducer/'electronGsfTracks'
   [11] Prefetching for module MeasurementTrackerEventProducer/'MeasurementTrackerEvent'
   [12] Prefetching for module JetCoreClusterSplitter/'siPixelClusters'
   [13] Prefetching for module PrimaryVertexProducer/'firstStepPrimaryVerticesPreSplitting'
   [14] Calling method for module TrackProducer/'initialStepTracksPreSplitting'
Exception Message:
SiPixelTemplate2D::interpolate can't find needed template ID = 715, Are you using the correct global tag?
----- End Fatal Exception -------------------------------------------------
%MSG-w BeamFitter:  AlcaBeamMonitor:AlcaBeamMonitor@endLumi  29-Mar-2019 03:20:27 CET Run: 297557 Lumi: 119
No event read! No Fitting!
%MSG

@tocheng (Contributor) commented Mar 29, 2019

@tvami @cmantill
I suspect the 2D template works for 2017.
If you look at the tag,
https://cms-conddb.cern.ch/cmsDbBrowser/list/Prod/tags/SiPixel2DTemplateDBObject_38T_v1_offline
the first IOV covers from run 1 to run 312209, and the 2016 and 2017 runs all fall into this IOV.

@mmusich (Contributor) commented Mar 29, 2019

> @tvami @cmantill
> I suspect the 2D template works for 2017.
> If you look at the tag,
> https://cms-conddb.cern.ch/cmsDbBrowser/list/Prod/tags/SiPixel2DTemplateDBObject_38T_v1_offline
> the first IOV covers from run 1 to run 312209, and the 2016 and 2017 runs all fall into this IOV.

@tocheng I don't believe that's the problem for the issue reported in #26263 (comment), because all the other workflows run fine except these 3:

136.85_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM
140.56_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18
136.788_RunSinglePh2017B+RunSinglePh2017B+HLTDR2_2017+RECODR2_2017reHLT_skimSinglePh_Prompt+HARVEST2017

By the way, the phase-0 detector reconstruction is intentionally not touched by this PR, see:
https://github.com/cms-sw/cmssw/pull/26263/files#diff-f952152458b1577d1a7da17a1db86daaR16

As far as I can see, all three failing workflows use the same 2D template tag, SiPixel2DTemplateDBObject_38T_v1_prompt:

136.788 => GT: auto:run2_data_promptlike (105X_dataRun2_PromptLike_v8 in master branch)
136.85 => GT: auto:run2_data_promptlike (105X_dataRun2_PromptLike_v8 in master branch)
140.56 => GT: auto:run2_data_promptlike_hi (105X_dataRun2_PromptLike_HI_v3 in master branch)

and all of these GTs use the same tag [0].
As the failing workflows fail for runs 297557, 315489, and 326479, it looks like all 3 payloads in that tag are problematic.

Since the tag contains

Run          Insertion Time       Payload                                   Object Type
-----------  -------------------  ----------------------------------------  ------------------------- 
1            2018-06-18 22:16:47  a114682c4d0a1df6cafb53f9ce37af591cfd7bfb  SiPixel2DTemplateDBObject 
312209       2018-06-18 22:20:22  50758a12856b842ffe768ac29348d73bda9480b3  SiPixel2DTemplateDBObject 
324276       2018-10-10 19:02:50  916a68aa85d2bbbb4c61f0db8a1bdb6a30008829  SiPixel2DTemplateDBObject

an easy workaround would be if you could supply a new Global Tag for the prompt-like workflows that contains the tag SiPixel2DTemplateDBObject_38T_v1_offline instead of SiPixel2DTemplateDBObject_38T_v1_prompt.
Please let us know.

[0]

$ conddb list 105X_dataRun2_PromptLike_v8 | grep SiPixel2D
SiPixel2DTemplateDBObjectRcd                            numerator                                         SiPixel2DTemplateDBObject_38T_v1_prompt                                        
$ conddb list 105X_dataRun2_PromptLike_HI_v3 | grep SiPixel2D
SiPixel2DTemplateDBObjectRcd                            numerator                                         SiPixel2DTemplateDBObject_38T_v1_prompt        
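To make the IOV arithmetic above concrete, here is a minimal standalone sketch (not CMSSW code) of how a run number maps to a payload in the tag; the boundaries are the "since" runs from the conddb listing above.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Each payload is valid from its "since" run up to the next boundary.
int payloadIndexForRun(const std::vector<uint32_t>& since, uint32_t run) {
  // First boundary strictly greater than run; the active payload is the one before it.
  auto it = std::upper_bound(since.begin(), since.end(), run);
  return static_cast<int>(it - since.begin()) - 1;
}

int main() {
  const std::vector<uint32_t> since{1, 312209, 324276};
  // The three failing runs each fall into a different IOV, which is why all
  // three payloads in SiPixel2DTemplateDBObject_38T_v1_prompt look problematic.
  for (uint32_t run : {297557u, 315489u, 326479u})
    std::cout << "run " << run << " -> payload #" << payloadIndexForRun(since, run) << '\n';
}
```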

@tvami (Contributor) commented Mar 29, 2019

Hi,
to make things a bit faster, I've queued SiPixel2DTemplateDBObject_38T_v1_offline to
105X_dataRun2_PromptLike_Queue
Tamas

@tvami (Contributor) commented Mar 29, 2019

Hi @tocheng, I've created SiPixel2DTemplateDBObject_38T_v2_prompt as a copy of SiPixel2DTemplateDBObject_38T_v1_offline but with prompt synchronization and queued to the same queue as above.

@cmsbuild (Contributor) commented Apr 4, 2019

The code-checks are being triggered in jenkins.

@cmsbuild (Contributor) commented Apr 4, 2019

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26263/9101

  • This PR adds an extra 24KB to repository

Code check has found code style and quality issues which could be resolved by applying a patch in https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26263/9101/git-diff.patch
e.g. curl https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26263/9101/git-diff.patch | patch -p1

You can run scram build code-checks to apply code checks directly

@cmantill (Contributor, Author) commented Apr 4, 2019

@mmusich @tocheng I am confused by this patch:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26263/9101/git-diff.patch
The last lines seem to update the GTs according to this PR: f7c4a8a#diff-8cbfe6480e6018966eac475523946acd
which is a reasonable thing to fix.

However, the first lines are reverting the commit we just did to update pixel local reco conditions for 2017.

Please let me know how to proceed.

@tocheng (Contributor) commented Apr 4, 2019

> @mmusich @tocheng I am confused by this patch:
> https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-26263/9101/git-diff.patch
> The last lines seem to update the GTs according to this PR: f7c4a8a#diff-8cbfe6480e6018966eac475523946acd
> which is a reasonable thing to fix.
>
> However, the first lines are reverting the commit we just did to update pixel local reco conditions for 2017.
>
> Please let me know how to proceed.

@cmantill Can you rebase the PR to the most recent IB?

@cmsbuild (Contributor)

Pull request #26263 was updated. @perrotta, @christopheralanwest, @tocheng, @cmsbuild, @franzoni, @slava77, @pohsun can you please check and sign again.

@cmantill (Contributor, Author)

@perrotta We found a bug in the current implementation of the 2D fit call:

For cases where we are in the correct layers (BPIX 2, 3, 4) but the hit is not an edge hit, theClusterParam.recommended2D_ is set to a garbage value (!= 0) that later passes the 2D call. Thus, instead of calling the 2D reco for 0.5% of the hits, we were calling it for 10% of them. We have fixed this in the latest commit by setting theClusterParam.recommended2D_ to false by default.

I should clarify that this bug should not have affected the hit residuals by a significant amount. For the hits affected, the 2D reco would have been running with the option edgeY=0, i.e. with ClusterRepair not trying to repair anything, which is pretty much the same as the 1D reco. It was only making things much slower, as shown by the results below.
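A minimal sketch of the bug pattern and the fix (illustrative: only recommended2D_ comes from the comment above; the struct and helper names are assumptions, not the actual CMSSW code):

```cpp
// Before the fix the flag was left uninitialized, so for non-edge hits on the
// recommended layers it held a garbage value that a later non-zero check
// could read as "true", sending ~10% of hits through the 2D path.
struct ClusterParamSketch {
  bool recommended2D_ = false;  // the fix: default to false
};

// Only explicitly recommend the 2D fit for edge hits on configured layers;
// every other cluster keeps the safe default.
void flagCluster(ClusterParamSketch& p, bool onRecommendedLayer, bool isEdgeHit) {
  if (onRecommendedLayer && isEdgeHit)
    p.recommended2D_ = true;
}
```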

With the fix introduced in the latest commit we have updated the timing and CPU time tests as follows:

   +0.062444      +0.00%        10.43 ms/ev ->        11.10 ms/ev jetCoreRegionalStepTracks
   +0.057465      +0.04%       113.48 ms/ev ->       120.19 ms/ev initialStepTracksPreSplitting
   +0.065809      +0.05%       121.00 ms/ev ->       129.24 ms/ev detachedTripletStepTracks
   +0.070813      +0.06%       134.83 ms/ev ->       144.73 ms/ev highPtTripletStepTracks
   +0.071519      +0.06%       130.54 ms/ev ->       140.22 ms/ev initialStepTracks
   +0.069876      +0.06%       138.94 ms/ev ->       149.00 ms/ev lowPtTripletStepTracks
   +0.087252      +0.07%       128.88 ms/ev ->       140.64 ms/ev lowPtQuadStepTracks
Job total:  15.7222 s/ev ==> 17.0266 s/ev

Although the time increases for tracking, the increase is smaller than the one shown before the bugfix. We think the overall increase in time is due to random fluctuations in other modules (also, this comparison is against a test with ClusterRepair OFF from two days ago).

[1] https://cmantill.web.cern.ch/cmantill/tmp/newtiming_afterbugfix.log

@perrotta (Contributor)

please test

@cmsbuild (Contributor) commented Apr 10, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/34137/console Started: 2019/04/10 23:17

@cmsbuild (Contributor)

Comparison job queued.

@cmsbuild (Contributor)

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-26263/34137/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8476 differences found in the comparisons
  • DQMHistoTests: Total files compared: 32
  • DQMHistoTests: Total histograms compared: 3140495
  • DQMHistoTests: Total failures: 12869
  • DQMHistoTests: Total nulls: 6
  • DQMHistoTests: Total successes: 3127423
  • DQMHistoTests: Total skipped: 197
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.028 KiB (31 files compared)
  • DQMHistoSizes: changed ( 136.788 ): -0.016 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 136.85 ): -0.012 KiB JetMET/SUSYDQM
  • Checked 133 log files, 14 edm output root files, 32 DQM output files

@perrotta (Contributor)

After the last fix, the overall timing is quite comparable between using ClusterRepair on a limited number of hits, as in this PR, and having it turned off, as in the baseline IB. I think that the differences you report in #26263 (comment) are mostly due to an overall normalization factor, as I see no relevant increase while running on the usual 100 events of the TTbar PU sample 11024.

The number of hits affected being now rather limited, as you pointed out, the effects on general tracks also show up only as small deviations from the baseline; the same holds for the derived higher-order objects.
[comparison plots attached in the original comment]

A detailed validation cannot be done with just these small test samples, but one is certainly expected based on pre4, provided this PR can get merged for it.

@perrotta (Contributor)

+1

  • ClusterRepair has been turned on for the restricted class of hits given in the PR description.
  • With the restriction to that set of hits, the impact on code performance is negligible, and the propagation to the reco objects also has a limited overall effect.
  • Jenkins tests pass and show small deviations in tracking and derived quantities, as expected.

@pohsun commented Apr 11, 2019

+1

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos (Contributor)

+1

@cmsbuild cmsbuild merged commit c2e23af into cms-sw:master Apr 11, 2019