[MkFit] Consolidate nan checks and filtering, expose propagation fail flag to steering code #40681

osschar · 2023-02-03T06:21:48Z

PR description:

Changes introduced in this PR are mostly technical but they do include some bugfixes and improvements of corner cases that were potentially leading to failed propagations, Kalman updates and consequently to trashed/nan-ed output track parameters.

Further, there are some changes in standalone/ parts of the code, mostly introduction of interactive shell to be used for debugging.

1. Nan filtering

In MkBuilder::filter_comb_cands() add a flag allowing other TrackCands to be tested, not just the best one. If a later one passes, copy it over the first one as that will be the one used for backward fit/search or for export.

Properly handle not-best TrackCands in filtering after backward search.

After backward search representation of TrackCand hits is branching at the bottom and they need to be re-connected for scoring / filtering algorithms to process them correctly.
Call nan filters and other config specified filters at the same time through definition of a lambda when both of them need to be run.

2. Propagation fail flag & propagation changes

Added MPlexQI MkBase::m_FailFlag (parent class of MkFinder). One has to manually clear it when desired (before propagation). Now only 0 and 1 are used ... could have different error codes, eg., failed to reach, step too large, etc.
The fail-flag mplex is passed into all propagation and propaget+kalman_op functions (barrel and endcap - existing code had limited internal handling with forced restore for barrel propagation before, state was not visible outside).
Added WSR_Failed as member of enum WithinSensitiveRegion_e. Handled in selectHitIndices() and findTracksCloneEngine() -> stop the candidate.
In propagateToR reduce the dot-product correction for in-going tracks to prevent them from overshooting.

At this stage, changes that were leading to major physics changes have been disabled and are marked with PROP-FAIL-ENABLE comments. *

3. Changes in standalone/

Consolidate debug printouts, introduce mkfit::g_debug for additional, top-level debug output control.
Clarify --help and --all-seeds in usage, exit on --help.
Implement mkFit::Shell to allow targeted, interactive debugging through ROOT prompt. Includes:
- event navigation, iteration selection, processing flag setting;
- options for processing of a single seed (or a sequence of seeds for seed indices) based on label or seed position before/after cleaning;
- some track comparison functions.

PR validation:

MTV plots, compare blue (baseline) vs. black:
http://uaf-10.t2.ucsd.edu/~mmasciov/MIC/updatedPR111/MTV_ttbarPU_mkFit_5iter_updatedPR111_update_fixPropagation/

… flag to steering code. 1. Nan filtering In MkBuilder::filter_comb_cands() add a flag allowing other TrackCands to be tested, not just the best one. If a later one passes, copy it over the first one as that will be the one used for backward fit/search or for export. Properly handle not-best TrackCands in filtering after backward search. - After backward search representation of TrackCand hits is branching at the bottom and they need to be re-connected for scoring / filtering algorithms to process them correctly. - Call nan filters and other config specified filters at the same time through definition of a lambda when both of them need to be run. 2. Propagation fail flag *** At this stage physics changing changes are not active, they are marked with PROP-FAIL-ENABLE comments. *** Added MPlexQI MkBase::m_FailFlag (parent class of MkFinder). One has to manually clear it when desired (before propagation). Now only 0 and 1 are used ... could have differnt error codes, eg. failed to reach, step too large, etc. The fail-flag mplex is passed into all propagation and propaget+kalman_op functions (barrel and endcap - there was limited handling with forced restore for barrel propagation before, state was not visible outside). Are there some error states in KalmanOp? We could pass that in as well. Added WSR_Failed as member of enum WithinSensitiveRegion_e Handled in selectHitIndices() and findTracksCloneEngine() -> stop the candidate. With this we can now stop also endcap tracks when they are about to reach apex. Until now endcap tracks went on looping. Another check done is how large the helix angle along the step was attempted (pi/2 forces error). Other propToR/Z checks can be added / tried. fail codes, kamlnaOp fail In backward fit the endcap fails are somewhat accidental -- trying to fix them does not really help as hits are already chosen. So we only do rollback in barrel. We can probably remove "can-reach-radius" check in MkBuilder::fund_track_unroll_candidates() Note that Config::useTrigApprox = true -- reconsider. Retry backward-search with non-swapped first/second layers in the plan? b-tagging efficiency / resolution / number of hits 3. In propagateToR reduce the dot-product correction for ingoing tracks to prevent them from overshooting. 4. Consolidate debug printouts, introduce mkfit::g_debug for additional, top-level debug output control. 5. Changes in standalone/ Clarify --help and --all-seeds in usage, exit on --help. Implement mkFit::Shell to allow targeted, interactive debugging through ROOT prompt. Includes: - event navigation, iteration selection, processing flag setting; - options for processing of a single seed (or a sequence of seeds for seed indices) based on label or seed position before/after cleaning; - some track comparison functions.

cmsbuild · 2023-02-03T06:27:51Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40681/34031

This PR adds an extra 136KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File RecoTracker/MkFitCore/src/MkBuilder.cc modified in PR(s): [MkFit] Format change for windows in json files #40679
- File RecoTracker/MkFitCore/src/MkFinder.cc modified in PR(s): [MkFit] Format change for windows in json files #40679
- File RecoTracker/MkFitCore/src/MkFinder.h modified in PR(s): [MkFit] Format change for windows in json files #40679

cmsbuild · 2023-02-03T06:28:19Z

A new Pull Request was created by @osschar (Matevž Tadel) for master.

It involves the following packages:

RecoTracker/MkFitCMS (reconstruction)
RecoTracker/MkFitCore (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@VourMa, @makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @missirol, @gpetruc, @mmusich, @mtosi, @dgulhan this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

slava77 · 2023-02-03T06:28:28Z

@cmsbuild please test

cmsbuild · 2023-02-03T10:59:26Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-477216/30372/summary.html
COMMIT: eef793e
CMSSW: CMSSW_13_0_X_2023-02-02-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40681/30372/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 59 lines to the logs
Reco comparison results: 7360 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3555495
DQMHistoTests: Total failures: 28379
DQMHistoTests: Total nulls: 2
DQMHistoTests: Total successes: 3527092
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
Checked 213 log files, 164 edm output root files, 49 DQM output files
TriggerResults: no differences found

slava77 · 2023-02-03T16:08:04Z

Reco comparison results: 7360 differences found in the comparisons

somewhat random check (in a folder with more differences) for 136.793

most differences appear in the detachedTriplet (and are still pretty tiny/small) as expected

slava77 · 2023-02-06T17:04:41Z

@cms-sw/reconstruction-l2
please clarify on the status/plans of the review, in particular if it can converge in time for building pre4
Thank you.

clacaputo · 2023-02-06T17:19:10Z

Hi @slava77 , I'm not sure I'll be able to review all the new code by tomorrow

osschar · 2023-02-06T19:18:11Z

Yes, there's quite a few of changes in there, let me break them up a little:

A lot of changes (most by volume) are in standalone/ subdirectories, eg., the new Shell class.
Changes in the signature of propagation functions to get the propagation fail-flag out are spread out a bit but purely technical.
Changes to processing of candidate filtering are localized to class MkBuilder and files TrackStructures.h/.cc and runFunctions.cc (for running in cmsRun, there are corresponding changes in standalone/buildtestMPlex.cc).

perrotta · 2023-02-08T08:41:42Z

urgent

marking as "urgent" what was requested for 13_0_0_pre4 during the ORP meeting of Feb 7

clacaputo · 2023-02-08T15:56:28Z

Yes, there's quite a few of changes in there, let me break them up a little:

A lot of changes (most by volume) are in standalone/ subdirectories, eg., the new Shell class.

Changes in the signature of propagation functions to get the propagation fail-flag out are spread out a bit but purely technical.

Changes to processing of candidate filtering are localized to class MkBuilder and files TrackStructures.h/.cc and runFunctions.cc (for running in cmsRun, there are corresponding changes in standalone/buildtestMPlex.cc).

Hi @osschar , thanks a lot for the walkthrough. I've checked the code and I don't have any comments.

clacaputo · 2023-02-08T16:00:04Z

+reconstruction

PS is mostly technical with some bugfixes and improvements of corner cases that were potentially leading to failed propagations
reco changes in generalTracks are rather small and percolate to final products, but no pathological behaviour

cmsbuild · 2023-02-08T16:00:34Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

perrotta · 2023-02-09T09:24:36Z

+1

Most of the updates are in "standalone", and for the rest the modifications in reco outputs are widespread but tiny, as expected

cmsbuild added this to the CMSSW_13_0_X milestone Feb 3, 2023

cmsbuild added reconstruction-pending pending-signatures tests-pending orp-pending code-checks-pending tracking labels Feb 3, 2023

cmsbuild added code-checks-approved and removed code-checks-pending labels Feb 3, 2023

cmsbuild added tests-started and removed tests-pending labels Feb 3, 2023

cmsbuild added tests-approved and removed tests-started labels Feb 3, 2023

cmsbuild added the urgent label Feb 8, 2023

cmsbuild added reconstruction-approved fully-signed and removed reconstruction-pending pending-signatures labels Feb 8, 2023

cmsbuild added orp-approved and removed orp-pending labels Feb 9, 2023

cmsbuild merged commit c45c991 into cms-sw:master Feb 9, 2023

This was referenced Feb 9, 2023

CUDA, ROCm and Alpaka-related updates [13.0.x] #40725

Merged

Update gcc 12 builds to CUDA 12.1 cms-sw/cmsdist#8296

Merged

move sanitizer flags in to toolfile; drop sanitizer deps for rocm prod cms-sw/cmsdist#8311

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MkFit] Consolidate nan checks and filtering, expose propagation fail flag to steering code #40681

[MkFit] Consolidate nan checks and filtering, expose propagation fail flag to steering code #40681

osschar commented Feb 3, 2023 •

edited

cmsbuild commented Feb 3, 2023

cmsbuild commented Feb 3, 2023

slava77 commented Feb 3, 2023

cmsbuild commented Feb 3, 2023

slava77 commented Feb 3, 2023

slava77 commented Feb 6, 2023

clacaputo commented Feb 6, 2023

osschar commented Feb 6, 2023

perrotta commented Feb 8, 2023

clacaputo commented Feb 8, 2023

clacaputo commented Feb 8, 2023

cmsbuild commented Feb 8, 2023

perrotta commented Feb 9, 2023

[MkFit] Consolidate nan checks and filtering, expose propagation fail flag to steering code #40681

[MkFit] Consolidate nan checks and filtering, expose propagation fail flag to steering code #40681

Conversation

osschar commented Feb 3, 2023 • edited

PR description:

1. Nan filtering

2. Propagation fail flag & propagation changes

3. Changes in standalone/

PR validation:

cmsbuild commented Feb 3, 2023

cmsbuild commented Feb 3, 2023

slava77 commented Feb 3, 2023

cmsbuild commented Feb 3, 2023

Comparison Summary

slava77 commented Feb 3, 2023

slava77 commented Feb 6, 2023

clacaputo commented Feb 6, 2023

osschar commented Feb 6, 2023

perrotta commented Feb 8, 2023

clacaputo commented Feb 8, 2023

clacaputo commented Feb 8, 2023

cmsbuild commented Feb 8, 2023

perrotta commented Feb 9, 2023

osschar commented Feb 3, 2023 •

edited