Fix automated pixel pair mitigation to include "Fed25" information #23052

Merged: 1 commit into cms-sw:master on Apr 27, 2018

Conversation

@makortel (Contributor)

The investigation of https://hypernews.cern.ch/HyperNews/CMS/get/recoTracking/1760.html led to the finding that the "Fed25" ("stuck TBM") information is not currently used in the automated pixel pair mitigation (an oversight in #21630?). This PR fixes the configuration to read that information.

Tested in 10_1_0, expecting changes (i.e. more pixelPair tracks) in workflows processing data containing "Fed25" errors.
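For orientation, here is a minimal sketch of the kind of configuration change involved, not the actual diff of this PR: it assumes the pixel-pair region producer exposes a VInputTag parameter for the bad-FED-channel collections. The names badPixelFEDChannelCollectionLabels, pixelPairStepTrackingRegions and siPixelDigis are assumptions for illustration.

```python
import FWCore.ParameterSet.Config as cms

# Collection of event-by-event bad FED channels ("Fed25" / stuck-TBM errors);
# "siPixelDigis" as the producing module label is an assumption for illustration.
fed25ChannelLabels = cms.VInputTag(cms.InputTag("siPixelDigis"))

def useFed25Information(regionProducer):
    # Assumed interface: the region producer holds a RegionPSet with a
    # badPixelFEDChannelCollectionLabels parameter (hypothetical name) listing
    # the bad-FED-channel collections to mask when building seeding regions.
    regionProducer.RegionPSet.badPixelFEDChannelCollectionLabels = fed25ChannelLabels

# Hypothetical usage on the pixel-pair step region producer:
#   from RecoTracker.IterativeTracking.PixelPairStep_cff import pixelPairStepTrackingRegions
#   useFed25Information(pixelPairStepTrackingRegions)
```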

@VinInn

@cmsbuild (Contributor)

The code-checks are being triggered in jenkins.


@cmsbuild (Contributor)

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

RecoTracker/TkTrackingRegions

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@felicepantaleo, @GiacomoSguazzoni, @rovere, @VinInn, @mschrode, @gpetruc, @ebrondol, @dgulhan this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@makortel (Contributor, Author)

@cmsbuild, please test

@makortel (Contributor, Author)

type bug-fix

@cmsbuild (Contributor)

cmsbuild commented Apr 25, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27642/console Started: 2018/04/25 13:10

@makortel (Contributor, Author)

makortel commented Apr 25, 2018

@makortel (Contributor, Author)

makortel commented Apr 25, 2018

@VinInn (Contributor)

VinInn commented Apr 25, 2018

The recovery at track level seems to be quite marginal (and I did not manage to identify any other iteration that was backing up PixelPair).
It is true that we seed PixelPairs only on the first 5 highest-pt vertices, and these are minbias, so on average we recover tracks for only ~10% of the pp collisions (there were 37 for the event in Matti's DQM).
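To illustrate, a minimal sketch of what such a vertex-limited region configuration can look like; the parameter names and values below follow the usual RegionPSet style but are assumptions for illustration, not the production pixelPairStep settings.

```python
import FWCore.ParameterSet.Config as cms

# Illustrative region PSet: seed only around the few highest-pt primary vertices.
# All names and numbers here are assumptions, not the actual pixelPairStep config.
pixelPairRegionSketch = cms.PSet(
    VertexCollection = cms.InputTag("firstStepPrimaryVertices"),
    maxNVertices = cms.int32(5),       # restrict seeding to the 5 highest-pt vertices
    originRadius = cms.double(0.015),  # transverse size of the region around each vertex (cm)
    ptMin = cms.double(0.6),           # minimum pt of seeds produced in the region (GeV)
)
```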

@slava77 (Contributor)

slava77 commented Apr 25, 2018

@makortel
what are the timing and output changes in the 305064 case?
Perhaps a 1D histogram of pixel pair seeds where it's possible to read the scale would be helpful as well. The heat-map plot looks like the overall number of seeds in the pixel pair goes up by a factor of 2.

In the context of possibly having to port to 10_1_X:
I'm trying to understand if this is really a fix for a previously unknown situation, or effectively an improvement.
Do I understand correctly (from the plot for 305064) that the "Fed25" list was filled similarly when #21630 feature was developed?
There was ample time to study the performance of #21630, including the probably much worse data-loss situation in 305064, and there was relative satisfaction with the performance of #21630.
So far I conclude that this is rather an improvement, and a backport to 10_1_X is not necessary.

@slava77 (Contributor)

slava77 commented Apr 25, 2018

what is the situation in the current MC for 2017 and 2018? Is "Fed25" filled at all?

@makortel (Contributor, Author)

@slava77

what are the timing and output changes in the 305064 case?

I haven't checked the timing yet.

Perhaps a 1D histogram of pixel pair seeds where it's possible to read the scale would be helpful as well. The heat-map plot looks like the overall number of seeds in the pixel pair goes up by a factor of 2.

Here is the distribution of pixelPair seeds per event:
[plot: pixelPair seeds per event]
so a factor of 3-4 increase.

I'm trying to understand if this is really a fix for a previously unknown situation, or effectively an improvement.

I'm calling it a bugfix since #21630 should have included the "Fed25" list as well.

Do I understand correctly (from the plot for 305064) that the "Fed25" list was filled similarly when #21630 feature was developed?

AFAIK there has been no change in the "Fed25" list filling since it was introduced in #20151.

what is the situation in the current MC for 2017 and 2018? Is "Fed25" filled at all?

Not to my knowledge but @veszpv & co should confirm.

@slava77 (Contributor)

slava77 commented Apr 25, 2018 via email

@makortel (Contributor, Author)

do we have a DQM plot for the leading vertex(vertices)?

What plot do you have in mind? (my test on run 305064 had full DQM)

@VinInn (Contributor)

VinInn commented Apr 25, 2018 via email


@cmsbuild (Contributor)

Comparison job queued.

@cmsbuild (Contributor)

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23052/27642/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 29
  • DQMHistoTests: Total histograms compared: 2494144
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2493967
  • DQMHistoTests: Total skipped: 176
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (28 files compared)
  • Checked 119 log files, 9 edm output root files, 29 DQM output files

@slava77 (Contributor)

slava77 commented Apr 25, 2018

Reco comparison results: 0 differences found in the comparisons

are we so lucky with the matrix workflows?
The only 2017 full reco data wf is 136.788, using run 297557 from 2017B.
Is it too early a run for this purpose (i.e. Fed25 was not even filled)?

@VinInn (Contributor)

VinInn commented Apr 25, 2018 via email

@makortel (Contributor, Author)

Is it too early a run for this purpose (i.e. Fed25 was not even filled)?

most probably

This is my recollection as well.

@VinInn (Contributor)

VinInn commented Apr 26, 2018

We need a backport and to request a new release for production.

@makortel (Contributor, Author)

Backport is in #23064.

@slava77 (Contributor)

slava77 commented Apr 26, 2018

@makortel @VinInn
what is the impact of this update on HLT?

@makortel (Contributor, Author)

@slava77

what is the impact of this update on HLT?

None, HLT already sets this parameter correctly (and thus makes use of the "Fed25" information).

@JanFSchulte

@fabiocos (Contributor)

@slava77 @VinInn @makortel I see that this fix is not yet fully signed. I would like to avoid further delaying CMSSW_10_2_0_pre2 if not strictly necessary, but if the signature is about to come it would be useful to get this in...

@slava77 (Contributor)

slava77 commented Apr 26, 2018

@fabiocos
I understood that the pre2 build is tomorrow.
It would be very useful to get this in, indeed.

I expect to sign this soon, today.

@slava77 (Contributor)

slava77 commented Apr 26, 2018

Here are some observations from 2017F matrix workflows:

136.831 JetHT Run 305064, LumiSection 81
(the behavior appears to be about the same also in wf 136.829: Run 305064, LumiSection 39, and
136.83: Run 305064, LumiSection 81)

  • CPU

    • in pixel pair seeding parts is up by x10; total in pixelPairStep is up by x2
    • later iterations decrease by a few %
    • total in iter tracking (*Step* in tracking module names) is up by 4.2%
    • the rest of reco is up by 1.5%
  • Disk: about 1% increase in RECO and miniAOD, driven by the increase in the number of tracks and PF candidates by about 2% each.

  • at the generalTracks level:

    • the additional tracks are clearly in the pixel pair, as expected. About double the count in this case.
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_originalalgo]
    • most additions are at low pt, high eta (and ~flat in phi!), short length. Combined, it seems like most of the added tracks are fakes.
      [plot: all_sign1016vsorig_runjetht2017f136p831c_log10recotracks_generaltracks__rereco_obj_pt]
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_eta]
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_phi]
      [plot: all_sign1016vsorig_runjetht2017f136p831c_recotracks_generaltracks__rereco_obj_found]

The above high-level plots are in line with the lower-level details:

  • the fraction of pixPair seeds that make a candidate is down

[plot: wf136.831_pixpair_candoverseed]

The number of seeds is up by about x4
[plot: wf136.831_pixpair_seed_etaphi]

Perhaps the most strikingly different plot is the pixelPair seeding regions eta-phi
[plot: wf136.831_pixpair_region_etaphi]

The added candidates are somewhat localized
[plot: wf136.831_pixpair_cand_etaphi]

As far as I understand, these are all somewhat expected changes.

The hit pattern efficiency plot shows some drop in efficiency.
[plot: wf136.831_tkpt1_eff_pattern]
I guess the addition of shorter/poorer tracks to the reference can lead to a visible degradation in this kind of plot.

@slava77 (Contributor)

slava77 commented Apr 26, 2018

+1

for #23052 2a73834

  • the code changes are clear and are meant to pick up the event-by-event bad ROCs (FED25 errors). As described in the PR description and/or the follow-up comments, this information was intended to be included all along.
  • jenkins tests pass and comparisons with the baseline show no differences (the only data workflow tested automatically is from 2017B, which did not yet have FED25 reported at the hardware level).
  • local tests with 2017F confirm that there is some impact from picking up this FED25 error data.

The plots for 2017F relval matrix workflows were done with a prompt-like GT ('101X_dataRun2_PromptLike_v9').
Based on comments in the 101X version #23064, it sounds like in 2018 and in the rereco GT the effect should be smaller, because the bad components should already be mostly included at the payload level.
So the plots posted earlier show an overestimate of the effect.
@VinInn @venturia @veszpv please confirm

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@slava77 (Contributor)

slava77 commented Apr 26, 2018

@makortel
please remind me how unique your solution to define the regions was, and wasn't it driven by expected changes in fakes vs efficiency?

IIUC, this "Fed25" was not part of the tests done during the pixel pair recovery feature development.
The changes from picking it up are apparently pretty large and suggest a significant increase in fakes.
Shouldn't the region definition be revisited now, before we are ready to have this fully in production?

@makortel (Contributor, Author)

@slava77 The automation code was developed with MC and tested also with the various pixel failure scenarios (up to "v6" which, IIRC, was worse than what happened with the detector in the end). As I've noted earlier, the current implementation was tuned to maximize the efficiency (at the cost of fakes), and there is room for improvement (to reduce fakes) in various places. E.g., relating to "Fed25", the code currently assumes the full module to be inactive even for individual ROCs or groups of ROCs; when this happens on BPix1, it likely leads to larger-than-necessary cones.
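A toy sketch (plain Python, not CMSSW code) of the last point: promoting a single bad ROC to a fully dead module inflates the area that the seeding regions have to cover. The geometry numbers below are made up purely for illustration.

```python
# Toy numbers only: rough angular width of an inner-barrel module and of a single ROC.
MODULE_PHI_WIDTH = 0.5   # assumed phi width of a BPix1 module (rad), illustrative
ROCS_PER_MODULE_PHI = 8  # assumed number of ROC columns across the module in phi

def region_phi_halfwidth(full_module_assumed_dead: bool, margin: float = 0.05) -> float:
    """Half-width in phi of the seeding region needed to cover one bad ROC."""
    if full_module_assumed_dead:
        dead_width = MODULE_PHI_WIDTH                       # whole module masked
    else:
        dead_width = MODULE_PHI_WIDTH / ROCS_PER_MODULE_PHI # only the bad ROC masked
    return 0.5 * dead_width + margin

# Treating the whole module as dead makes the region several times wider than
# what a single bad ROC would actually require.
print(region_phi_halfwidth(True), region_phi_halfwidth(False))
```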

@fabiocos (Contributor)

The error in the checks refers to an apparently unrelated crash in fastsim in the tests on the same commit in the 10_1_X branch.

@fabiocos (Contributor)

+1

@cmsbuild merged commit ea67390 into cms-sw:master on Apr 27, 2018
@slava77 (Contributor)

slava77 commented Apr 27, 2018

Perhaps the most strikingly different plot is the pixelPair seeding regions eta-phi

[plot: wf136.831_pixpair_region_etaphi]

@makortel
regarding this plot: is there a plot of bad components in the DQM which would clarify the properties in this plot of regions? E.g. I want to see a band of ~full eta width around phi ~ -1.5.
IIUC, this plot should cover (eta,phi)_badComponent +/- region width. How large is the region in eta?

@makortel (Contributor, Author)

@slava77

is there a plot of bad components in the DQM which would clarify the properties in this plot of regions?

There are various maps in PixelPhase1/Phase1_MechanicalView/{PXBarrel,PXForward}. The clusterposition_* plots are in global phi vs z / x vs y, so they are the easiest to correlate with phi and eta.

How large is the region in eta?

By region do you mean the TrackingRegion? It depends on the inactive areas on the two layers. Also note that the "TrackingRegion-covered" plot you quote includes all TrackingRegions, so if there are lots of holes in the pixel it can be challenging to correlate all features in these plots with the pixel maps.
