Skip to content

updated TrkPanelMap with stations 14, 4, and 5, fixes to error handling in digi maker#1748

Merged
oksuzian merged 4 commits intoMu2e:mainfrom
pavel1murat:daquser
Mar 11, 2026
Merged

updated TrkPanelMap with stations 14, 4, and 5, fixes to error handling in digi maker#1748
oksuzian merged 4 commits intoMu2e:mainfrom
pavel1murat:daquser

Conversation

@pavel1murat
Copy link
Contributor

  • also update fcl files

@FNALbuild
Copy link
Collaborator

Hi @pavel1murat,
You have proposed changes to files in these packages:

  • DAQ
  • TrackerConditions

which require these tests: build.

@Mu2e/write, @Mu2e/fnalbuild-users have access to CI actions on main.

⌛ The following tests have been triggered for 53b5701: build (Build queue - API unavailable)

About FNALbuild. Code review on Mu2e/Offline.

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at 53b5701.

Test Result Details
test with Command did not list any other PRs to include
merge Merged 53b5701 at 191ad5d
build (prof) Log file. Build time: 04 min 09 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 1 files
clang-tidy ➡️ 15 errors 23 warnings
whitespace check ➡️ found whitespace errors

N.B. These results were obtained from a build of this Pull Request at 53b5701 after being merged into the base branch at 191ad5d.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Copy link
Collaborator

oksuzian commented Mar 1, 2026

I'll review this pull request by first loading the PR details and its changes.Here is my review of PR #1748. The PR touches 9 files (+236/-65), authored by @pavel1murat from the daquser branch.


Summary

This PR makes three categories of changes:

  1. TrkPanelMap.txt — Expands the tracker panel map with new run-range tables for stations 14, 4, and 5.
  2. StrawDigisFromArtdaqFragments_module.cc — Removes old commented-out code, improves error handling for undersized fragments, fixes a pointer-advance bug on corrupt data, and enhances diagnostic messages.
  3. FCL files (6 files) — Adds the MAKE_SD message category to all test FCL configs so diagnostic output is visible.

Issues Found

🔴 Critical / Bug

1. Slot 5 data is commented out in TABLE TrkPanelMap 108000-200000 but active in TABLE TrkPanelMap 1-10000

In TrackerConditions/data/TrkPanelMap.txt, the 108000-200000 table has all slot 5 entries (station 5, planes 10–11) commented out with #, while the identical entries are active in the 1-10000 table. If slot 5 should be mapped for runs ≥108000, these lines need to be uncommented. If this is intentional (slot 5 not yet installed for those runs), it should be documented more explicitly than just the comment header.

# root [1] print_trk_panel_map(5);        // slot 5
#  034,   20,   5,  10,    5,  1,   0
#  093,   20,   2,  10,    5,  3,   0
...

2. Missing zfc=2 row for slot 11 panel 29 in both new tables

The slot 11 entries for ppid=29 (plane 23) have rows for zfc values 1, 2, 3, 3, 3, 3 — but there is no entry with pnl=2, zfc=2. The rows show:

mnid pnl zfc
275 1 2
254 3 2
229 5 3
274 0 3
151 4 3
225 2 3

Panel pnl=2 has zfc=3, but there's no zfc=2 row for a panel that would typically fill that slot. Comparing with the 6-panel-per-plane pattern elsewhere (where zfc alternates 0,0,0,1,1,1 or 2,2,2,3,3,3), this block has zfc values 2,2,3,3,3,3 — only two panels at zfc=2 versus four at zfc=3. This looks like a data error. The comment "not reliable, needs a double check" reinforces this concern.

3. Gap in run-range coverage: runs 108000–110000 → 107999 cutoff

The original table covered 107000-110000. The new tables cover 107000-107999 and 108000-200000. This is technically fine (no gap), but the old range 107000-110000 has been narrowed to 107000-107999. Any configuration that relied on the old mapping for runs 108000–110000 will now get the new (different) mapping. Make sure this is intentional and that the new 108000-200000 table is correct for those transitional runs.

🟡 Medium / Code Quality

4. Trailing whitespace in error-handling fix

                print_(std::format("ERROR: either dtc_id:{} or link_id:{} is corrupted, skip ROC data",
                                   dtc_id,link_id));
                
                roc_data += (nhits*np_per_hit_+1)*packet_size;
                continue;

There is a blank line with trailing whitespace between the print_() call and the roc_data += line. Minor, but may trigger whitespace linting warnings.

5. Commented-out std::cout debug code left in print_()

  // std::cout << s << ss.back() << ":" << location.line()
  //   //            << location.function_name()
  //      << " : " << Message << std::endl;

This leftover debug code should be removed before merging. It adds no value as a comment and clutters the function.

6. Removed blank line changes code formatting inconsistently

  std::unique_ptr<mu2e::IntensityInfoTrackerHits> intInfo(new mu2e::IntensityInfoTrackerHits);
-
 //-----------------------------------------------------------------------------

The blank line before the comment block was removed, making the code slightly harder to read. This is minor but goes against the PR's own formatting style used elsewhere.

🟢 Low / Suggestions

7. Verbose debug logging enabled by default in all test FCL files

All 6 FCL test files now have MAKE_SD with limit: -1 and reportEvery: 1. While useful for debugging, having unlimited verbose logging permanently enabled in test configs may produce very large log files. Consider whether this should be controlled by a debug flag or limited to a reasonable count for production test runs.

8. Large duplicated data blocks in TrkPanelMap.txt

The entries for slots 0, 10, 11, and 12 are identically duplicated across both the 1-10000 and 108000-200000 tables (72 entries each). This creates a maintenance burden — any correction to a mapping must be applied in two places. Consider whether there's a way to factor this out, or at minimum add a comment noting the intentional duplication.

9. Typo in comment: "confclicts"

# 2026-01-17 PM: needs double checking, at least can be used with stations 4,5,14 w/o confclicts

Should be "conflicts".


Summary Table

# Severity File Issue
1 🔴 Critical TrkPanelMap.txt Slot 5 commented out in 108000-200000 but active in 1-10000 — intentional?
2 🔴 Critical TrkPanelMap.txt Slot 11/panel 29 has suspicious zfc distribution (2,2,3,3,3,3 vs expected 3×2+3×3)
3 🔴 Critical TrkPanelMap.txt Run-range change from 107000-110000 → 107000-107999 may break existing workflows
4 🟡 Medium StrawDigisFromArtdaqFragments_module.cc Trailing whitespace on blank line
5 🟡 Medium StrawDigisFromArtdaqFragments_module.cc Dead std::cout debug code left commented out
6 🟡 Medium StrawDigisFromArtdaqFragments_module.cc Inconsistent blank-line removal
7 🟢 Low FCL files (×6) Unlimited verbose logging enabled in all test configs
8 🟢 Low TrkPanelMap.txt Large duplicated data blocks across run-range tables
9 🟢 Low TrkPanelMap.txt Typo: "confclicts" → "conflicts"

The error handling improvements in the C++ module (fragment size check, roc_data pointer advance on corrupt DTC/link IDs) look correct and are good fixes. The main concerns center around the panel map data integrity — the slot 5 commenting discrepancy and the slot 11 zfc pattern anomaly should be verified before merging.

@oksuzian oksuzian requested a review from brownd1978 March 1, 2026 02:33
@oksuzian
Copy link
Collaborator

I'll review this pull request by examining its details and the code changes.Here is my review of PR #1748 with the critical issues I've identified:


PR #1748 — Critical Issues Review

PR: Mu2e/Offline#1748 by @pavel1murat
Files changed: 11 (+312 / −65) across C++ modules, FCL configs, and a data mapping file


🔴 Critical Issue 1: Missing ROC Data Pointer Advance on Corrupted dtc_id/link_id (Data Corruption / Infinite Loop Risk)

                print_(std::format("ERROR: either dtc_id:{} or link_id:{} is corrupted, skip ROC data",
                                   dtc_id,link_id));
                
                roc_data += (nhits*np_per_hit_+1)*packet_size;
                continue;

This is a new addition that advances the roc_data pointer when dtc_id or link_id is corrupted. While the intent (skip past the bad ROC data) is correct and was a missing bug fix, the formula (nhits*np_per_hit_+1)*packet_size relies on nhits being correctly parsed from the very same corrupted data. If the corruption also affects the nhits field (which is likely if the dtc_id/link_id are already corrupted), this could:

  • Overshoot the buffer if nhits is garbage → buffer over-read / undefined behavior
  • Undershoot if nhits is zero → effectively not advancing, potentially causing an infinite loop in the enclosing while-loop

Recommendation: Add a bounds check to ensure roc_data does not advance past the end of the fragment data buffer. Consider also capping nhits to a sane maximum or breaking out of the ROC loop entirely on corruption.


🔴 Critical Issue 2: Inconsistent zfc Value Between Duplicate Table Sections for Slot 11 Panel 229

In TrackerConditions/data/TrkPanelMap.txt, the panel map for slot 11, mnid 229 has different zfc values across the run-range tables:

# In TABLE TrkPanelMap 1-10000 and TABLE TrkPanelMap 108000-120470:
  229,   16,   3,  23,   29,  5,   3     ← zfc = 3

# In TABLE TrkPanelMap 120471-200000:
  229,   15,   3,  23,   29,  5,   2     ← zfc = 2

The zfc field changes from 3 to 2 for the same physical panel (mnid 229, ppid 29, pnl 5). If this is not an intentional hardware reconfiguration at run 120471, this is a data mapping error that would silently assign incorrect z-face coordinates, corrupting track reconstruction.

Additionally, the dtcid changes from 16 to 15 for this entry — verify this is intentional for the new run range and not a copy-paste error.


🟠 High-Severity Issue 3: DTC_SubEventHeader vs DTC_EventHeader Size Mismatch in Guard

          if (frag->dataSizeBytes() <= sizeof(DTCLib::DTC_SubEventHeader)) {
            // ... SKIP FRAGMENT
            continue;
          }
          fdata += sizeof(DTCLib::DTC_EventHeader);

The size check uses DTC_SubEventHeader but the pointer advance uses DTC_EventHeader. These are different structures with potentially different sizes. If DTC_SubEventHeader is smaller than DTC_EventHeader, a fragment could pass the guard check but still have insufficient data after the DTC_EventHeader skip, causing a buffer over-read in subsequent parsing. The guard should be checking against DTC_EventHeader size (the thing actually being skipped), or at minimum max(DTC_SubEventHeader, DTC_EventHeader).


🟠 High-Severity Issue 4: Slot 5 (Station 5) Entirely Commented Out in Run Ranges 108000+ but Active in 1-10000

# In TABLE TrkPanelMap 1-10000:
  034,   20,   5,  10,    5,  1,   0    ← ACTIVE (12 entries for slot 5)

# In TABLE TrkPanelMap 108000-120470 and 120471-200000:
#  034,   20,   5,  10,    5,  1,   0   ← ALL COMMENTED OUT

All 12 entries for slot 5 (station 5, ppid 5 and ppid 24) are commented out in the 108000-120470 and 120471-200000 tables but active in the 1-10000 table. The PR title says "updated TrkPanelMap with stations 14, 4, and 5" — if station 5 is supposed to be active in production runs (108000+), this is a critical omission. If this is intentional (station 5 not yet commissioned for those runs), it should be clearly documented in the PR description.


🟡 Medium-Severity Issue 5: #include <format> Added But Not Used in Two Files

+#include <format>
+#include <format>

<format> is added to StrawDigiFilter_module.cc and StrawHitFilter_module.cc but neither file's diff shows any usage of std::format. This is a C++20 header that may not be available on all compilers in the Mu2e build environment (the build is labeled "build pending"). Unused includes should be avoided, especially ones with strict compiler requirements.


🟡 Medium-Severity Issue 6: Verbose Debug Logging Enabled by Default in All FCL Files

Six FCL files have the MAKE_SD message category set to reportEvery: 1 and limit: -1 (unlimited). This means every single message from the MAKE_SD category will be printed. For production or shared test configurations, this can produce enormous log files and degrade I/O performance. These should either be gated behind a debug flag or set with a reasonable limit for shared configs.


🟡 Medium-Severity Issue 7: Run Range Gap — 107999 to 108000 Not Covered by Overlapping Tables

The original single table covered 107000-110000. The new tables are:

  • 1-10000
  • 107000-107999
  • 108000-120470
  • 120471-200000

Run 108000 is covered, but runs in the range 10001–106999 have no table entry. If runs in this range exist (even test runs), the panel map lookup will fail silently or throw. Verify this gap is intentional.


Summary

# Severity File Issue
1 🔴 Critical StrawDigisFromArtdaqFragments_module.cc ROC pointer advance uses potentially corrupted nhits — buffer over-read / infinite loop risk
2 🔴 Critical TrkPanelMap.txt Inconsistent zfc (3 vs 2) for mnid 229 across run-range tables
3 🟠 High StrawDigisFromArtdaqFragments_module.cc Fragment size guard checks SubEventHeader but skips EventHeader — size mismatch
4 🟠 High TrkPanelMap.txt Station 5 entries commented out in production run ranges but active in test range
5 🟡 Medium StrawDigiFilter_module.cc, StrawHitFilter_module.cc Unused #include <format> with C++20 dependency
6 🟡 Medium 6 FCL files Verbose debug logging enabled unconditionally
7 🟡 Medium TrkPanelMap.txt Run range gap 10001–106999 has no panel map coverage

@oksuzian
Copy link
Collaborator

@FNALbuild run build test

@FNALbuild
Copy link
Collaborator

⌛ The following tests have been triggered for e0f7365: build (Build queue - API unavailable)

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at e0f7365.

Test Result Details
test with Command did not list any other PRs to include
merge Merged e0f7365 at dc50b36
build (prof) Log file. Build time: 04 min 05 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 1.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 3 files
clang-tidy ➡️ 21 errors 142 warnings
whitespace check ➡️ found whitespace errors

N.B. These results were obtained from a build of this Pull Request at e0f7365 after being merged into the base branch at dc50b36.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian oksuzian merged commit 6a994e7 into Mu2e:main Mar 11, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants