Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix automated pixel pair mitigation to include "Fed25" information (10_1_X) #23064

Merged
merged 1 commit into from May 11, 2018

Conversation

makortel
Copy link
Contributor

Backport of #23052. Original description

The investigation of https://hypernews.cern.ch/HyperNews/CMS/get/recoTracking/1760.html lead to finding that the "Fed25" ("stuck TBM") information is not currently used in the automated pixel pair mitigation (an oversight in #21630?). This PR fixes the configuration to read the information.

Tested in 10_1_0, expecting changes (=more pixelPair tracks) in workflows processing data containing "Fed25" errors.

@VinInn

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 26, 2018

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_1_X.

It involves the following packages:

RecoTracker/TkTrackingRegions

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@felicepantaleo, @GiacomoSguazzoni, @rovere, @VinInn, @mschrode, @gpetruc, @ebrondol, @dgulhan this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 26, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27660/console Started: 2018/04/26 18:09

@makortel
Copy link
Contributor Author

type bugfix

@makortel
Copy link
Contributor Author

backport of #23052

@makortel
Copy link
Contributor Author

(there are no code-checks for non-master PRs?)

@slava77
Copy link
Contributor

slava77 commented Apr 26, 2018

I already commented in #23052 that I find this update more of an improvement, not a fully proper bugfix after taking into account the validation of the auto-mitigation feature in 10X.

@VinInn
Copy link
Contributor

VinInn commented Apr 26, 2018

Sorry Slava, I have to strongly disagree. It was supposed to be there and is mostly valuable at HLT
(where is ON) and in Express (where is currently off). In prompt the PixelQuality PLC should create proper IOV and the very same regions will than be "mitigated".
So w/o this bug fix Express and Prompt will differ for no reason (and Express and HLT as well).

This Fix does not affect MC, so it is not a problem.

@fabiocos @davidlange6 @venturia @veszpv @boudoul @mtosi
As leader of TRK-POG I formally ask this bugfix to be put in production BEFORE physics start

@slava77
Copy link
Contributor

slava77 commented Apr 26, 2018 via email

@slava77
Copy link
Contributor

slava77 commented Apr 26, 2018

Is the pixel-less itaration used in PCL for marking IOVs and defining

sorry, I meant pixel pair

@makortel
Copy link
Contributor Author

Is the pixel-pair itaration used in PCL for marking IOVs and defining
payloads for the tracker components?

Maybe @tvami could comment the procedure.

@VinInn
Copy link
Contributor

VinInn commented Apr 26, 2018

Sorry, I did not express it well.
In PCL PixelQuality is supposed to be re-evaluated (using pixel-DPG info) and a new IOV created.
In most of the cases the region identified by FED25 errors will then become "dead ROC" and pixelpair mitigation will therefore trigger for those region in Prompt.

So w/o this fix for the same dead-region Express and Prompt will differ.
(or if PixelQuality is not updated in PCL we get inconsistent behavior between runs with updated PixelQuality and run where it was missed (as it happen in 2017). Triggering MItigation on FED25 error is safer.

Mitigation was supposed to trigger on FED25. It is part of the core strategy and must be enabled as planned asap

@slava77
Copy link
Contributor

slava77 commented Apr 26, 2018 via email

@venturia
Copy link
Contributor

There is a crucial feature which is forgotten in this discussion: the FED25 information is by event, the PCL, at most, it will be by lumisection. Not using FED25 is a downgrade of the system

@VinInn
Copy link
Contributor

VinInn commented Apr 26, 2018 via email

@cmsbuild
Copy link
Contributor

-1

Tested at: 2a73834

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23064/27660/summary.html

I found follow errors while testing this PR

Failed tests: RelVals

  • RelVals:

The relvals timed out after 2 hours.

@cmsbuild
Copy link
Contributor

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 27, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27683/console Started: 2018/04/27 09:36

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23064/27683/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 29
  • DQMHistoTests: Total histograms compared: 2506331
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2506154
  • DQMHistoTests: Total skipped: 176
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1.067 KiB( 28 files compared)
  • DQMHistoSizes: changed ( 10824.0,... ): 0.006 KiB Muons/EventInfo
  • DQMHistoSizes: changed ( 10824.0,... ): 0.005 KiB RecoTauV/EventInfo
  • DQMHistoSizes: changed ( 10824.0,... ): 0.005 KiB RPC/EventInfo
  • DQMHistoSizes: changed ( 10824.0,... ): 0.004 KiB L1T/EventInfo
  • DQMHistoSizes: changed ( 10824.0,... ): 0.004 KiB Tracking/EventInfo
  • DQMHistoSizes: changed ( 10824.0,... ): 0.004 KiB L1TEMU/EventInfo
  • DQMHistoSizes: changed ( 10824.0,... ): 0.004 KiB DT/EventInfo
  • DQMHistoSizes: changed ( 10824.0,... ): 0.004 KiB Ecal/EventInfo
  • DQMHistoSizes: changed ( 10824.0 ): 0.003 KiB SiStrip/EventInfo
  • DQMHistoSizes: changed ( 10824.0 ): 0.001 KiB HLT/EventInfo
  • DQMHistoSizes: changed ( 135.4 ): ...
  • Checked 119 log files, 9 edm output root files, 29 DQM output files

@slava77
Copy link
Contributor

slava77 commented Apr 27, 2018

@makortel @VinInn @venturia
please confirm that FED25 does not exist in simulation so that it's obvious that this PR affects only data.

@makortel
Copy link
Contributor Author

@slava77

please confirm that FED25 does not exist in simulation so that it's obvious that this PR affects only data.

To best of my knowledge FED25 does not exist in simulation (modules are marked inactive only based on SiPixelQuality payload in GT). I'm not sure if the recent #22465 changes the picture, but it is disabled by default.

Adding more people who could confirm for sure @tsusa @tvami @jkarancs @veszpv

@tvami
Copy link
Contributor

tvami commented Apr 28, 2018

I can confirm that FED25 does not exist in simulation. We are actually working on how to include all the failures in the detector simulation, Deborah will have a status report at the next pixel offline meeting about this.

Regarding the PCL workflow, there are 3 payloads produced: (1) containing the permanently bad ROCs, (2) ROCs with FED25 but not the permanently bad ones, (3) everything else. From this three we merge (1) + (3) and we get the payload for prompt. (2) is not included in order to avoid double counting, that is used for monitoring purposes.
The granularity is ROC and lumisection (it's actually a dynamic lumisection -- depends on statistics) and the process is based on occupancy. So tracking (pixel-pair itaration) does not effect the payload creation.
Let me also add @tocheng as he worked on the PCL implementation.

@veszpv
Copy link
Contributor

veszpv commented Apr 29, 2018

Simulation: Indeed, FED-signaled errors are not simulated at all. The effect of bad channels (all ROCs in a channel delivering no data) will probably be added by a mechanism similar to the dynamic inefficiency. PR #22465, instead, is something totally different: it makes randomly dropping single pixels configurable individually for every ROC (the mechanism existed in the past, but only configurable per layer.)

Collision: The PCL bad component feature is not yet fully ready. When ready, its first consumer will be prompt reco, as it derives the information by processing express data. In contrast, the FED25 error is available both in HLT and Express. Also, as @venturia pointed out, the FED25 error is delivered event by event based on knowledge of the state of the system, not from a statistical measurement per lumisection. In fact, this is why the PCL mechanism is (supposed to be) constructed such that FED25 bad components are not going to be marked bad by the payload in order to avoid double counting.

I agree with @VinInn , this should be used whenever possible.

@makortel
Copy link
Contributor Author

@veszpv

The effect of bad channels (all ROCs in a channel delivering no data) will probably be added by a mechanism similar to the dynamic inefficiency.

Just to make sure, is the information of such ROCs planned to be delivered to the reconstruction with the same mechanism as the FED25?

@slava77
Copy link
Contributor

slava77 commented Apr 30, 2018

+1

for #23064 2a73834

This is expected to be merged as soon as we get feedback from the 10_2_0_pre2 relvals.

@fabiocos
When this gets merged we should make sure to at least increment the dataset version. It maybe the most clear to include this PR with the data acquisition era change (2018B at this point).

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_10_1_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_10_2_X is complete. This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

fabiocos commented May 2, 2018

@slava77 I agree, in any case we said that we are waiting for the CMSSW_10_2_0_pre2 validation to back-port this update/fix into 10_1_X

@fabiocos fabiocos mentioned this pull request May 11, 2018
6 tasks
@fabiocos
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit c83176f into cms-sw:CMSSW_10_1_X May 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants