Fix automated pixel pair mitigation to include "Fed25" information #23052

Merged: 1 commit into cms-sw:master on Apr 27, 2018

Conversation

@makortel (Contributor)

The investigation of https://hypernews.cern.ch/HyperNews/CMS/get/recoTracking/1760.html led to the finding that the "Fed25" ("stuck TBM") information is not currently used in the automated pixel pair mitigation (an oversight in #21630?). This PR fixes the configuration to read that information.

Tested in 10_1_0, expecting changes (i.e. more pixelPair tracks) in workflows processing data containing "Fed25" errors.
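For orientation, here is a minimal sketch of the kind of configuration change involved, not the actual diff of this PR: it assumes the pixel-pair region producer exposes a VInputTag parameter for the bad-FED-channel collections. The names badPixelFEDChannelCollectionLabels, pixelPairStepTrackingRegions and siPixelDigis are assumptions for illustration.

```python
import FWCore.ParameterSet.Config as cms

# Collection of event-by-event bad FED channels ("Fed25" / stuck-TBM errors);
# "siPixelDigis" as the producing module label is an assumption for illustration.
fed25ChannelLabels = cms.VInputTag(cms.InputTag("siPixelDigis"))

def useFed25Information(regionProducer):
    # Assumed interface: the region producer holds a RegionPSet with a
    # badPixelFEDChannelCollectionLabels parameter (hypothetical name) listing
    # the bad-FED-channel collections to mask when building seeding regions.
    regionProducer.RegionPSet.badPixelFEDChannelCollectionLabels = fed25ChannelLabels

# Hypothetical usage on the pixel-pair step region producer:
#   from RecoTracker.IterativeTracking.PixelPairStep_cff import pixelPairStepTrackingRegions
#   useFed25Information(pixelPairStepTrackingRegions)
```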

@VinInn

@cmsbuild (Contributor)

The code-checks are being triggered in jenkins.


@cmsbuild (Contributor)

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

RecoTracker/TkTrackingRegions

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@felicepantaleo, @GiacomoSguazzoni, @rovere, @VinInn, @mschrode, @gpetruc, @ebrondol, @dgulhan this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@makortel (Contributor, Author)

@cmsbuild, please test

@makortel (Contributor, Author)

type bug-fix

@cmsbuild (Contributor)

cmsbuild commented Apr 25, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27642/console Started: 2018/04/25 13:10

@makortel (Contributor, Author)

makortel commented Apr 25, 2018

@makortel (Contributor, Author)

makortel commented Apr 25, 2018

@VinInn (Contributor)

VinInn commented Apr 25, 2018

The recovery at track level seems to be quite marginal (and I did not manage to identify any other iteration that was backing up PixelPair).
It is true that we seed PixelPairs only on the first 5 highest-pt vertices, and these are minbias, so on average we recover tracks for only ~10% of the pp collisions (there were 37 for the event in Matti's DQM).
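To illustrate, a minimal sketch of what such a vertex-limited region configuration can look like; the parameter names and values below follow the usual RegionPSet style but are assumptions for illustration, not the production pixelPairStep settings.

```python
import FWCore.ParameterSet.Config as cms

# Illustrative region PSet: seed only around the few highest-pt primary vertices.
# All names and numbers here are assumptions, not the actual pixelPairStep config.
pixelPairRegionSketch = cms.PSet(
    VertexCollection = cms.InputTag("firstStepPrimaryVertices"),
    maxNVertices = cms.int32(5),       # restrict seeding to the 5 highest-pt vertices
    originRadius = cms.double(0.015),  # transverse size of the region around each vertex (cm)
    ptMin = cms.double(0.6),           # minimum pt of seeds produced in the region (GeV)
)
```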

@slava77 (Contributor)

slava77 commented Apr 25, 2018

@makortel
what are the timing and output changes in the 305064 case?
Perhaps a 1D histogram of pixel pair seeds where it's possible to read the scale would be helpful as well. The heat-map plot looks like the overall number of seeds in the pixel pair goes up by a factor of 2.

In the context of possibly having to port to 10_1_X:
I'm trying to understand if this is really a fix for a previously unknown situation, or effectively an improvement.
Do I understand correctly (from the plot for 305064) that the "Fed25" list was filled similarly when #21630 feature was developed?
There was ample time to study the performance of #21630, including the probably much worse data-loss situation in 305064, and there was relative satisfaction with the performance of #21630.
So far I conclude that this is rather an improvement, and a backport to 10_1_X is not necessary.

@slava77 (Contributor)

slava77 commented Apr 25, 2018

what is the situation in the current MC for 2017 and 2018? Is "Fed25" filled at all?

@makortel (Contributor, Author)

@slava77

what are the timing and output changes in the 305064 case?

I haven't checked the timing yet.

Perhaps a 1D histogram of pixel pair seeds where it's possible to read the scale would be helpful as well. The heat-map plot looks like the overall number of seeds in the pixel pair goes up by a factor of 2.

Here is the distribution of pixelPair seeds per event:
[plot: pixelPair seeds per event]
so a factor of 3-4 increase.

I'm trying to understand if this is really a fix for a previously unknown situation, or effectively an improvement.

I'm calling it a bugfix since #21630 should have included the "Fed25" list as well.

Do I understand correctly (from the plot for 305064) that the "Fed25" list was filled similarly when #21630 feature was developed?

AFAIK there has been no change in the "Fed25" list filling since it was introduced in #20151.

what is the situation in the current MC for 2017 and 2018? Is "Fed25" filled at all?

Not to my knowledge but @veszpv & co should confirm.

@slava77 (Contributor)

slava77 commented Apr 25, 2018 via email

@makortel (Contributor, Author)

do we have a DQM plot for the leading vertex(vertices)?

What plot do you have in mind? (my test on run 305064 had full DQM)

@VinInn (Contributor)

VinInn commented Apr 25, 2018 via email


@cmsbuild (Contributor)

Comparison job queued.

@cmsbuild (Contributor)

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23052/27642/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 29
  • DQMHistoTests: Total histograms compared: 2494144
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2493967
  • DQMHistoTests: Total skipped: 176
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (28 files compared)
  • Checked 119 log files, 9 edm output root files, 29 DQM output files

@slava77 (Contributor)

slava77 commented Apr 25, 2018

Reco comparison results: 0 differences found in the comparisons

are we so lucky with the matrix workflows?
The only 2017 full reco data wf is 136.788, using run 297557 from 2017B.
Is it too early a run for this purpose (i.e. Fed25 was not even filled)?

@VinInn (Contributor)

VinInn commented Apr 25, 2018 via email

@makortel (Contributor, Author)

Is it too early a run for this purpose (i.e. Fed25 was not even filled)?

most probably

This is my recollection as well.

@VinInn (Contributor)

VinInn commented Apr 26, 2018

We need a backport and to request a new release for production.

@makortel (Contributor, Author)

Backport is in #23064.

@slava77 (Contributor)

slava77 commented Apr 26, 2018

@makortel @VinInn
what is the impact of this update on HLT?

@makortel (Contributor, Author)

@slava77

what is the impact of this update on HLT?

None, HLT already sets this parameter correctly (and thus makes use of the "Fed25" information).

@JanFSchulte

@fabiocos (Contributor)

@slava77 @VinInn @makortel I see that this fix is not yet fully signed. I would like to avoid further delaying CMSSW_10_2_0_pre2 if not strictly necessary, but if the signature is about to come it would be useful to get this in...

@slava77 (Contributor)

slava77 commented Apr 26, 2018

@fabiocos
I understood that the pre2 build is tomorrow.
It would be very useful to get this in, indeed.

I expect to sign this soon, today.

@slava77 (Contributor)

slava77 commented Apr 26, 2018

Here are some observations from 2017F matrix workflows:

136.831 JetHT Run 305064, LumiSection 81
(the behavior appears to be about the same also in wf 136.829: Run 305064, LumiSection 39, and
136.83: Run 305064, LumiSection 81)

  • CPU

    • in pixel pair seeding parts is up by x10; total in pixelPairStep is up by x2
    • later iterations decrease by a few %
    • total in iter tracking (*Step* in tracking module names) is up by 4.2%
    • the rest of reco is up by 1.5%
  • Disk: about 1% increase in RECO and miniAOD, driven by the increase in the number of tracks and PF candidates by about 2% each.

  • at the generalTracks level:

    • the additional tracks are clearly in the pixel pair, as expected. About double the count in this case.
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_originalalgo]
    • most additions are at low pt, high eta (and ~flat in phi!), short length. Combined, it seems like most of the added tracks are fakes.
      [plot: all_sign1016vsorig_runjetht2017f136p831c_log10recotracks_generaltracks__rereco_obj_pt]
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_eta]
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_phi]
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_found]

The above high-level plots are in line with the lower-level details:

  • the fraction of pixPair seeds that make a candidate is down

[plot: wf136.831_pixpair_candoverseed]

The number of seeds is up by about x4
[plot: wf136.831_pixpair_seed_etaphi]

Perhaps the most strikingly different plot is the pixelPair seeding regions eta-phi
[plot: wf136.831_pixpair_region_etaphi]

The added candidates are somewhat localized
[plot: wf136.831_pixpair_cand_etaphi]

As far as I understand, these are all somewhat expected changes.

The hit pattern efficiency plot shows some drop in efficiency.
[plot: wf136.831_tkpt1_eff_pattern]
I guess the addition of shorter/poorer tracks to the reference can lead to a visible degradation in this kind of plot.

@slava77 (Contributor)

slava77 commented Apr 26, 2018

+1

for #23052 2a73834

  • the code changes are clear and are meant to pick up the event-by-event bad ROCs (FED25 errors). As described in the PR description and/or the follow-up comments, this information was intended to be included all along.
  • jenkins tests pass and comparisons with the baseline show no differences (the only data workflow tested automatically is from 2017B, which did not yet have FED25 reported at the hardware level).
  • local tests with 2017F confirm that there is some impact from picking up this FED25 error data.

The plots for 2017F relval matrix workflows were done with a prompt-like GT ('101X_dataRun2_PromptLike_v9').
Based on comments in the 101X version #23064, it sounds like in 2018 and in the rereco GT the effect should be smaller, because the bad components should already be mostly included at the payload level.
So the plots posted earlier show an overestimate of the effect.
@VinInn @venturia @veszpv please confirm

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@slava77 (Contributor)

slava77 commented Apr 26, 2018

@makortel
please remind me how unique your solution to define the regions was, and wasn't it driven by expected changes in fakes vs efficiency?

IIUC, this "Fed25" was not part of the tests done during the pixel pair recovery feature development.
The changes from picking it up are apparently pretty large and suggest a significant increase in fakes.
Shouldn't the region definition be revisited now, before we are ready to have this fully in production?

@makortel (Contributor, Author)

@slava77 The automation code was developed with MC and tested also with the various pixel failure scenarios (up to "v6" which, IIRC, was worse than what happened with the detector in the end). As I've noted earlier, the current implementation was tuned to maximize the efficiency (at the cost of fakes), and there is room for improvement (to reduce fakes) in various places. E.g., relating to "Fed25", the code currently assumes the full module to be inactive even for individual ROCs or groups of ROCs; when this happens on BPix1, it likely leads to larger-than-necessary cones.
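A toy sketch (plain Python, not CMSSW code) of the last point: promoting a single bad ROC to a fully dead module inflates the area that the seeding regions have to cover. The geometry numbers below are made up purely for illustration.

```python
# Toy numbers only: rough angular width of an inner-barrel module and of a single ROC.
MODULE_PHI_WIDTH = 0.5   # assumed phi width of a BPix1 module (rad), illustrative
ROCS_PER_MODULE_PHI = 8  # assumed number of ROC columns across the module in phi

def region_phi_halfwidth(full_module_assumed_dead: bool, margin: float = 0.05) -> float:
    """Half-width in phi of the seeding region needed to cover one bad ROC."""
    if full_module_assumed_dead:
        dead_width = MODULE_PHI_WIDTH                       # whole module masked
    else:
        dead_width = MODULE_PHI_WIDTH / ROCS_PER_MODULE_PHI # only the bad ROC masked
    return 0.5 * dead_width + margin

# Treating the whole module as dead makes the region several times wider than
# what a single bad ROC would actually require.
print(region_phi_halfwidth(True), region_phi_halfwidth(False))
```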

@fabiocos (Contributor)

The error in the checks refers to an apparently unrelated crash in fastsim in the tests on the same commit in the 10_1_X branch.

@fabiocos (Contributor)

+1

@cmsbuild merged commit ea67390 into cms-sw:master on Apr 27, 2018
@slava77 (Contributor)

slava77 commented Apr 27, 2018

Perhaps the most strikingly different plot is the pixelPair seeding regions eta-phi

[plot: wf136.831_pixpair_region_etaphi]

@makortel
regarding this plot: is there a plot of bad components in the DQM which would clarify the properties in this plot of regions? E.g. I want to see a band of ~full eta width around phi ~ -1.5.
IIUC, this plot should cover (eta,phi)_badComponent +/- region width. How large is the region in eta?

@makortel (Contributor, Author)

@slava77

is there a plot of bad components in the DQM which would clarify the properties in this plot of regions?

There are various maps in PixelPhase1/Phase1_MechanicalView/{PXBarrel,PXForward}. The clusterposition_* plots are in global phi vs z / x vs y, so they are the easiest to correlate with phi and eta.

How large is the region in eta?

By region do you mean the TrackingRegion? It depends on the inactive areas on the two layers. Also note that the "TrackingRegion-covered" plot you quote includes all TrackingRegions, so if there are lots of holes in the pixel it can be challenging to correlate all features in these plots with the pixel maps.
