Skip to content

Updates to DBSClusterer and FlgBkgHits#1762

Merged
oksuzian merged 18 commits intoMu2e:mainfrom
NamithaChitrazee:flgbkghits
Mar 19, 2026
Merged

Updates to DBSClusterer and FlgBkgHits#1762
oksuzian merged 18 commits intoMu2e:mainfrom
NamithaChitrazee:flgbkghits

Conversation

@NamithaChitrazee
Copy link
Contributor

  1. In DBSClusterer module, added the functions to classify clusters based on the keras quailty
  2. In flgBkgHits, added a function to flag proton hits as background as well.
  3. Added the newly trained .dat and .hxx files (using CE1BB and pbar datasets)

More small refinements will follow for this update.

@FNALbuild
Copy link
Collaborator

Hi @NamithaChitrazee,
You have proposed changes to files in these packages:

  • TrkHitReco

which require these tests: build.

@Mu2e/fnalbuild-users, @Mu2e/write have access to CI actions on main.

⌛ The following tests have been triggered for ea2d9aa: build (Build queue - API unavailable)

About FNALbuild. Code review on Mu2e/Offline.

@FNALbuild
Copy link
Collaborator

☔ The build is failing at ea2d9aa.

scons: *** [build/al9-prof-e29-p092/Offline/tmp/TrkHitReco/src/DBSClusterer.os] Error 1
Test Result Details
test with Command did not list any other PRs to include
merge Merged ea2d9aa at e4c9d83
build (prof) Log file.
ceSimReco 〰️ Log file.
g4test_03MT 〰️ Log file.
transportOnly 〰️ Log file.
POT 〰️ Log file.
g4study 〰️ Log file.
cosmicSimReco 〰️ Log file.
cosmicOffSpill 〰️ Log file.
ceSteps 〰️ Log file.
ceDigi 〰️ Log file.
muDauSteps 〰️ Log file.
ceMix 〰️ Log file.
rootOverlaps 〰️ Log file.
g4surfaceCheck 〰️ Log file.
trigger Log file.
check_cmake 〰️ Log file.
FIXME, TODO ➡️ TODO (0) FIXME (1) in 2 files
clang-tidy ➡️ 9 errors 34 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at ea2d9aa after being merged into the base branch at e4c9d83.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Copy link
Collaborator

Let me review this pull request in detail.Here's my review of PR #1762"Updates to DBSClusterer and FlgBkgHits". I've identified several categories of issues across the 7 changed files.


🔴 Bugs / Correctness Issues

1. Copy-paste comment error in BkgCluster.hh

       float                    _edep = 0.0;//cluster time   // ← Should say "cluster edep", not "cluster time"

The comment on the new _edep member says //cluster time — copy-pasted from the _time field above it. It should read //cluster energy deposit or similar.


2. Input array size mismatch in classifyCluster (DBSClusterer.cc)

    std::array<float,7> kerasvars;               // ← 7 elements
    kerasvars[0]  = cluster.pos().Rho();
    kerasvars[1]  = zmax - zmin;
    kerasvars[2]  = phimax - phimin;
    kerasvars[3]  = nhits;
    kerasvars[4]  = std::sqrt(...);
    kerasvars[5]  = std::sqrt(...);
    kerasvars[6]  = std::sqrt(...);
    std::vector<float> kerasout = sofiePtr_->infer(kerasvars.data());

The SOFIE infer() function expects an input of size 8 (op_0_k = 8, tensor_densekernel0 has 128 elements = 8×16). But kerasvars only has 7 elements. This means the model will read one uninitialized float from memory past the array, causing undefined behavior and incorrect inference results. Either add the 8th variable (perhaps cluster.edep()?) or verify the model truly expects only 7 inputs.


3. hitToClusterMap passed by value in classifyCluster

      void classifyCluster(BkgClusterCollection& bkgccol, StrawHitFlagCollection& chfcol,
                           const ComboHitCollection& chcol,
                           std::vector<int> hitToClusterMap) const;  // ← passed by VALUE

hitToClusterMap is a std::vector<int> passed by value, so the assignments hitToClusterMap[chit] = icl; inside the function are lost when it returns. The caller's hitToClusterMap remains all -1s. It should be passed by reference: std::vector<int>& hitToClusterMap.

This bug also causes the savebkg_ block to always see icl == -1, meaning every BkgClusterHit gets distance 999.0 — defeating the purpose of the map optimization.


4. BkgClusterHit collection uses chcol_out but iterates with nch (original size)

    if (savebkg_) {
      for (size_t ich=0;ich < nch; ++ich) {
        int icl = hitToClusterMap[ich];
        if (icl > -1) bkghitcol.emplace_back(BkgClusterHit(clusterer_->distance(bkgccol[icl],chcol_out->at(ich)),...));
        else          bkghitcol.emplace_back(BkgClusterHit(999.0,chcol_out->at(ich).flag()));
      }
    }

This code iterates ich from 0 to nch (the original collection size) but indexes into chcol_out, which may be smaller than nch when filter_ == true (hits were filtered out). This will cause an out-of-bounds access / crash via chcol_out->at(ich). Previously the code correctly used the original chcol[ich] and ch.flag().


5. findNeighbors misses backward neighbors in time

    float minTime = time0 - deltaTime_;
    auto it_start = std::lower_bound(idx.begin(), idx.end(), minTime, [&chcol](unsigned i, float val){
      return chcol[i].correctedTime() < val;
    });
    size_t istart = std::distance(idx.begin(), it_start);
    for (size_t j = istart; j < idx.size(); ++j){
      ...
      float dt = hitj.correctedTime() - time0;
      if (dt > deltaTime_) break;

The new lower_bound search uses time0 - deltaTime_ and then only checks dt > deltaTime_ (forward direction). This correctly finds hits in [time0 - deltaTime_, time0 + deltaTime_]. However, with DeltaTime changing from 15 → 30, verify that the much larger time window combined with DeltaZ shrinking from 800 → 200 produces the intended cluster behavior. The old code scanned backward from ihit; the new binary search approach is an improvement, but ensure this was validated.


6. Indentation / scoping issue in classifyCluster

    std::array<int,StrawId::_nplanes> hitplanes{0};
    for (const auto& chit : cluster.hits()) {
    const ComboHit& ch = chcol[chit];          // ← Not indented, looks like it's outside the loop
    hitplanes[ch.strawId().plane()] += ch.nStrawHits();
    }

And similarly:

    for (int ip = ipmin; ip <= ipmax; ++ip) {
    npexp++;                                    // ← Not indented
    if (hitplanes[ip]> 0){
      ++np;
      ...
    }
    nhits += hitplanes[ip];
    }

While this compiles correctly, the lack of indentation makes the code very hard to review and error-prone for future edits.


7. mergeClusters declared but never defined

      void     mergeClusters    (std::vector<BkgCluster>& clusters, const ComboHitCollection& chcol);

This new function is declared in the header but never defined in DBSClusterer.cc. This will cause a linker error if anything ever calls it. Either implement it or remove the declaration.


8. Unused variables in classifyCluster

    unsigned npexp(0), np(0), nhits(0);
    // ...
    int fp(ipmin), lp(ipmin-1), pgap(0);
  • npexp is incremented but never used.
  • fp is assigned but never read after the loop.
  • pgap is computed but never used.

These will generate compiler warnings and suggest incomplete logic.


9. nNeighbors return type mismatch

  int DBSClusterer::findNeighbors(...) {
    // ...
    unsigned nNeighbors = hit0.nStrawHits()-1;   // unsigned
    // ...
    return nNeighbors;                           // returned as int
  }

nNeighbors is declared as unsigned inside the function but the return type is int. If nStrawHits() returns 0 (unlikely but possible), nStrawHits()-1 would wrap around to a huge unsigned value. Consider initializing as int nNeighbors = ....


🟡 Design / Quality Concerns

10. #include "TMath.h" added but not needed

The new code in DBSClusterer.cc includes TMath.h but doesn't use any TMath functions. M_PI, std::sqrt, std::abs, etc. come from <cmath>. This adds an unnecessary ROOT dependency.

11. Leftover debug output in classifyCluster

    if (diag_>0) std::cout << "kerasout = " << kerasout[0] << std::endl;

Using raw std::cout instead of the framework's messaging service (mf::LogDebug/mf::LogInfo) is not best practice for art modules. Also, diag_ > 0 is a very low bar — this will print for every cluster in any diagnostic run.

12. Removed cluster sort

The std::sort(bkgccol...) by time was removed from FlagBkgHits_module.cc. The comment said "sort is needed for recovery algorithm." Verify no downstream code relies on time-sorted clusters.

13. ConditionsHandle.hh included but unused

#include "Offline/ConditionsService/inc/ConditionsHandle.hh"

This header is added but nothing from it is used.

14. countProton logic: flags individual hits, not entire clusters

The countProton function flags individual hits with high energy deposit as background, but only for clusters not already marked as background. This means a cluster can have a mix of flagged/unflagged hits — is this intentional, or should the entire cluster be flagged if a threshold fraction of hits are proton-like?

15. Commented-out code should be removed

There's a significant amount of commented-out code (//std::queue, /*old findNeighbors*/, //int findClusterIdx, etc.) that should be cleaned up before merge.


Summary

Severity Issue File
🔴 Bug kerasvars has 7 elements but model expects 8 inputs DBSClusterer.cc
🔴 Bug hitToClusterMap passed by value (writes are lost) FlagBkgHits_module.cc
🔴 Bug chcol_out->at(ich) out-of-bounds when filtering FlagBkgHits_module.cc
🟠 Potential mergeClusters declared but not defined (linker error) DBSClusterer.hh
🟠 Potential nNeighbors unsigned wrap-around risk DBSClusterer.cc
🟡 Quality Copy-paste comment //cluster time on _edep BkgCluster.hh
🟡 Quality Unused variables (npexp, fp, pgap) DBSClusterer.cc
🟡 Quality Unnecessary #include "TMath.h" and ConditionsHandle.hh Multiple
🟡 Quality Lots of commented-out code / debug std::cout Multiple
🟡 Quality Inconsistent indentation in classifyCluster DBSClusterer.cc

The most critical issues are #2 (array size mismatch → undefined behavior), #3 (pass-by-value → silently broken map), and #4 (out-of-bounds when filtering is active). These should be fixed before merging.

@oksuzian
Copy link
Collaborator

@FNALbuild run build test

Copy link
Collaborator

@brownd1978 brownd1978 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no comments. I suggest copilog comments be addressed before merging.

Namitha Chithirasreemadam and others added 2 commits March 17, 2026 11:09
@NamithaChitrazee
Copy link
Contributor Author

@oksuzian Thank you for the comments. Fixed the bugs and cleaned up the code.

🔴 Bugs / Correctness Issues

1. Copy-paste comment error in BkgCluster.hh

       float                    _edep = 0.0;//cluster time   // ← Should say "cluster edep", not "cluster time"

The comment on the new _edep member says //cluster time — copy-pasted from the _time field above it. It should read //cluster energy deposit or similar.

Fixed it

2. Input array size mismatch in classifyCluster (DBSClusterer.cc)

    std::array<float,7> kerasvars;               // ← 7 elements
    kerasvars[0]  = cluster.pos().Rho();
    kerasvars[1]  = zmax - zmin;
    kerasvars[2]  = phimax - phimin;
    kerasvars[3]  = nhits;
    kerasvars[4]  = std::sqrt(...);
    kerasvars[5]  = std::sqrt(...);
    kerasvars[6]  = std::sqrt(...);
    std::vector<float> kerasout = sofiePtr_->infer(kerasvars.data());

The SOFIE infer() function expects an input of size 8 (op_0_k = 8, tensor_densekernel0 has 128 elements = 8×16). But kerasvars only has 7 elements. This means the model will read one uninitialized float from memory past the array, causing undefined behavior and incorrect inference results. Either add the 8th variable (perhaps cluster.edep()?) or verify the model truly expects only 7 inputs.

Fixed the bug. The correct number of inputs is 7.

3. hitToClusterMap passed by value in classifyCluster

      void classifyCluster(BkgClusterCollection& bkgccol, StrawHitFlagCollection& chfcol,
                           const ComboHitCollection& chcol,
                           std::vector<int> hitToClusterMap) const;  // ← passed by VALUE

hitToClusterMap is a std::vector<int> passed by value, so the assignments hitToClusterMap[chit] = icl; inside the function are lost when it returns. The caller's hitToClusterMap remains all -1s. It should be passed by reference: std::vector<int>& hitToClusterMap.

This bug also causes the savebkg_ block to always see icl == -1, meaning every BkgClusterHit gets distance 999.0 — defeating the purpose of the map optimization.

Fixed it.

4. BkgClusterHit collection uses chcol_out but iterates with nch (original size)

    if (savebkg_) {
      for (size_t ich=0;ich < nch; ++ich) {
        int icl = hitToClusterMap[ich];
        if (icl > -1) bkghitcol.emplace_back(BkgClusterHit(clusterer_->distance(bkgccol[icl],chcol_out->at(ich)),...));
        else          bkghitcol.emplace_back(BkgClusterHit(999.0,chcol_out->at(ich).flag()));
      }
    }

This code iterates ich from 0 to nch (the original collection size) but indexes into chcol_out, which may be smaller than nch when filter_ == true (hits were filtered out). This will cause an out-of-bounds access / crash via chcol_out->at(ich). Previously the code correctly used the original chcol[ich] and ch.flag().

Fixed it.

5. findNeighbors misses backward neighbors in time

    float minTime = time0 - deltaTime_;
    auto it_start = std::lower_bound(idx.begin(), idx.end(), minTime, [&chcol](unsigned i, float val){
      return chcol[i].correctedTime() < val;
    });
    size_t istart = std::distance(idx.begin(), it_start);
    for (size_t j = istart; j < idx.size(); ++j){
      ...
      float dt = hitj.correctedTime() - time0;
      if (dt > deltaTime_) break;

The new lower_bound search uses time0 - deltaTime_ and then only checks dt > deltaTime_ (forward direction). This correctly finds hits in [time0 - deltaTime_, time0 + deltaTime_]. However, with DeltaTime changing from 15 → 30, verify that the much larger time window combined with DeltaZ shrinking from 800 → 200 produces the intended cluster behavior. The old code scanned backward from ihit; the new binary search approach is an improvement, but ensure this was validated.

6. Indentation / scoping issue in classifyCluster

    std::array<int,StrawId::_nplanes> hitplanes{0};
    for (const auto& chit : cluster.hits()) {
    const ComboHit& ch = chcol[chit];          // ← Not indented, looks like it's outside the loop
    hitplanes[ch.strawId().plane()] += ch.nStrawHits();
    }

And similarly:

    for (int ip = ipmin; ip <= ipmax; ++ip) {
    npexp++;                                    // ← Not indented
    if (hitplanes[ip]> 0){
      ++np;
      ...
    }
    nhits += hitplanes[ip];
    }

While this compiles correctly, the lack of indentation makes the code very hard to review and error-prone for future edits.

Cleaned up this function

7. mergeClusters declared but never defined

      void     mergeClusters    (std::vector<BkgCluster>& clusters, const ComboHitCollection& chcol);

This new function is declared in the header but never defined in DBSClusterer.cc. This will cause a linker error if anything ever calls it. Either implement it or remove the declaration.

Removed the declaration

8. Unused variables in classifyCluster

    unsigned npexp(0), np(0), nhits(0);
    // ...
    int fp(ipmin), lp(ipmin-1), pgap(0);
  • npexp is incremented but never used.
  • fp is assigned but never read after the loop.
  • pgap is computed but never used.

These will generate compiler warnings and suggest incomplete logic.

Removed them. This function has been cleaned up.

9. nNeighbors return type mismatch

  int DBSClusterer::findNeighbors(...) {
    // ...
    unsigned nNeighbors = hit0.nStrawHits()-1;   // unsigned
    // ...
    return nNeighbors;                           // returned as int
  }

nNeighbors is declared as unsigned inside the function but the return type is int. If nStrawHits() returns 0 (unlikely but possible), nStrawHits()-1 would wrap around to a huge unsigned value. Consider initializing as int nNeighbors = ....

Fixed it.

🟡 Design / Quality Concerns

10. #include "TMath.h" added but not needed

The new code in DBSClusterer.cc includes TMath.h but doesn't use any TMath functions. M_PI, std::sqrt, std::abs, etc. come from <cmath>. This adds an unnecessary ROOT dependency.

Removed TMath.

11. Leftover debug output in classifyCluster

    if (diag_>0) std::cout << "kerasout = " << kerasout[0] << std::endl;

Using raw std::cout instead of the framework's messaging service (mf::LogDebug/mf::LogInfo) is not best practice for art modules. Also, diag_ > 0 is a very low bar — this will print for every cluster in any diagnostic run.

Removed this print-out

12. Removed cluster sort

The std::sort(bkgccol...) by time was removed from FlagBkgHits_module.cc. The comment said "sort is needed for recovery algorithm." Verify no downstream code relies on time-sorted clusters.

13. ConditionsHandle.hh included but unused

#include "Offline/ConditionsService/inc/ConditionsHandle.hh"

This header is added but nothing from it is used.

Removed this header file

14. countProton logic: flags individual hits, not entire clusters

The countProton function flags individual hits with high energy deposit as background, but only for clusters not already marked as background. This means a cluster can have a mix of flagged/unflagged hits — is this intentional, or should the entire cluster be flagged if a threshold fraction of hits are proton-like?

This is intentional. I would leave it like this for the time being and probably improve upon it in the next iteration

15. Commented-out code should be removed

There's a significant amount of commented-out code (//std::queue, /*old findNeighbors*/, //int findClusterIdx, etc.) that should be cleaned up before merge.

Cleaned up the code

@oksuzian
Copy link
Collaborator

@FNALbuild run build test

@FNALbuild
Copy link
Collaborator

⌛ The following tests have been triggered for 3282b56: build (Build queue - API unavailable)

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at 3282b56.

Test Result Details
test with Command did not list any other PRs to include
merge Merged 3282b56 at b188c42
build (prof) Log file. Build time: 08 min 29 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 1.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 4 files
clang-tidy ➡️ 3 errors 26 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 3282b56 after being merged into the base branch at b188c42.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Copy link
Collaborator

I'll review this pull request by examining its details and the code changes.Here is my review of PR #1762 ("Updates to DBSClusterer and FlgBkgHits") in Mu2e/Offline. I've organized the findings by severity.


🔴 Bugs / Potential Correctness Issues

1. findNeighbors return type mismatch — unsigned to signed conversion

unsigned nNeighbors = hit0.nStrawHits()-1;

findNeighbors is now declared as returning int, and nNeighbors is locally declared unsigned. The accumulation with nNeighbors += hitj.nStrawHits() keeps it unsigned, and then it is implicitly narrowed to int on return. If nNeighbors exceeds INT_MAX (unlikely but contractually wrong), this is undefined behavior. More practically, hit0.nStrawHits()-1 when nStrawHits() is 0 wraps to a huge unsigned value. The return type and internal type should be consistent — either both int (with a guard for nStrawHits() == 0) or both unsigned.

2. findNeighbors misses earlier hits (asymmetric window)

float minTime = time0 - deltaTime_;
auto it_start = std::lower_bound(idx.begin(), idx.end(), minTime, [&chcol](unsigned i, float val){
  return chcol[i].correctedTime() < val;
});
size_t istart = std::distance(idx.begin(),it_start);
for (size_t j = istart; j < idx.size(); ++j){
  ...
  float dt = hitj.correctedTime() - time0;
  if (dt > deltaTime_) break;

The search starts from time0 - deltaTime_ but the loop only breaks when dt > deltaTime_ (i.e., hitj.time > time0 + deltaTime_). However, the old code scanned both directions from ihit. The new lower_bound finds the first hit with time ≥ minTime, which is correct for the backward direction. But the loop walks forward from istart through ihit and past it, so hits between minTime and time0 with index < ihit are now included even if time0 - hitj.time > deltaTime_ (which can't happen given the lower_bound). This is actually fine logically, but note that the lower_bound comparator compares chcol[i].correctedTime() < val where i is an element of idx (a combo hit index) and val is a float. This relies on idx being sorted by correctedTime — which it is. So this is correct, but fragile and could use a comment.

3. classifyCluster does not set kerasQ for small clusters

void DBSClusterer::classifyCluster(BkgCluster& cluster, const ComboHitCollection& chcol){
    if(cluster.hits().size() < 3) return;

Clusters with 1 or 2 hits return without setting kerasQ. The kerasQ member is initialized in the class to 0.0 (from the default constructor). In FlagBkgHits, the comparison is cluster.getKerasQ() > kerasQ_ where kerasQ_ defaults to 0.8. So small clusters would have kerasQ = 0.0 < 0.8 and would not be flagged as background. Previously the code set kerasQ = -1.0 for all clusters. The new behavior silently relies on default initialization being 0. This should at minimum be documented, and ideally kerasQ should be explicitly set (e.g., to -1.0 or 0.0) in the early return path so the intent is clear.

4. countProton flags non-background hits with energysel — semantic confusion

void FlagBkgHits::countProton(...) const
{ for (size_t icl=0; icl < bkgccol.size(); ++icl) {
      auto& cluster = bkgccol[icl];
      bool isAlreadyBkg = cluster._flag.hasAllProperties(BkgClusterFlag::bkg);
      if (!isAlreadyBkg) {
        StrawHitFlag bkgFlag(StrawHitFlag::energysel);
        for (const auto& hitIdx : cluster.hits()) {
           if (chcol[hitIdx].energyDep() > minedep_)
             chfcol[hitIdx].merge(bkgFlag);
        }
      }
    }
  }

The function name is countProton but it doesn't count anything — it flags hits with high energy deposition in non-background clusters using the energysel flag. The name is misleading. More importantly, the energysel flag is used in SignalMask in the prolog config. Setting this flag on high-edep hits within non-background clusters may inadvertently mark proton-like signal hits as satisfying energy selection, which may not be the intended behavior. The logic and naming should be verified against physics intent.

5. calculateCluster computes cphi without wrapping — potential discontinuity

float dp     = chcol[idx].phi();
// OLD code had:
// if (dp > M_PI)  dp -= 2*M_PI;
// if (dp < -M_PI) dp += 2*M_PI;
// NEW code removed these lines
cphi     += dp*weight;

The old code explicitly wrapped phi into [-π, π]. The new code removes this. If chcol[idx].phi() can return values outside [-π, π], the weighted average of cphi will be wrong. Even within [-π, π], averaging angles near ±π can produce incorrect results (the classic circular mean problem). The old code had the same issue but at least normalized inputs; now it doesn't.


🟡 Code Quality / Style Issues

6. Commented-out code left in

     //std::queue<unsigned>  inspect;

The old std::queue is commented out rather than deleted. This should be removed.

7. Opening brace placement in classifyCluster and countProton

  void FlagBkgHits::classifyCluster(...) const
  { for (size_t icl =0; icl < bkgccol.size(); ++icl) {

The { is on the same line as for, making it read like the function body is { for ... } rather than the conventional opening-brace-on-next-line style used elsewhere. Same issue in countProton.

8. dump() lost its cluster summary line

  void DBSClusterer::dump(const std::vector<BkgCluster>& clusters)
  {
    int iclu(0);
    for (auto& cluster: clusters) {
      // REMOVED: std::cout<<"Cluster "<<iclu<<" "<<cluster.pos()<<" "<<cluster.time()<<"  "<<cluster.hits().size()<<"  - ";
      for (auto& hit : cluster.hits()) std::cout<<hit<<" ";

The line printing cluster position, time, and size was removed. Now the dump just prints raw hit indices with no cluster identification header, making the diagnostic output much less useful.

9. Extra space in variable assignment

    float y0    =  hit0.pos().y();

Double space before hit0.

10. Huge auto-generated file added without comment

TrkHitReco/inc/TrainBkgDiagBDT.hxx adds 22,562 lines. It appears to be an auto-generated BDT inference header but is not referenced anywhere in the PR's code changes. If it's not used, it shouldn't be committed. If it's planned for future use, it should at least be documented.


🟢 Positive Changes Worth Noting

  • Performance improvement: Replacing std::queue with std::vector (stack-based DFS) and using std::lower_bound for neighbor search are good optimizations.
  • findClusterIdx elimination: Replacing the O(n×m) linear scan with a pre-built hitToClusterMap is a significant efficiency improvement.
  • sofiePtr_ activation: Enabling the previously-commented-out SOFIE inference is a meaningful feature addition.
  • edep tracking in clusters: Adding energy deposition to BkgCluster is a clean, well-integrated extension.
  • Reserving vectors: neighbors.reserve(256), clusters.reserve(...) etc. are good for reducing allocations.

Summary

Severity Count Key Concerns
🔴 Bug/Correctness 5 Unsigned/signed mismatch in findNeighbors return, uninitialized kerasQ for small clusters, phi wrapping removed, countProton semantics, energysel flag meaning
🟡 Quality 5 Commented-out code, brace style, broken dump(), unused 22K-line file, extra whitespace
🟢 Good 5 Performance improvements, cleaner architecture, feature enablement

The most critical items to address before merge are #1 (unsigned return narrowing), #3 (uninitialized kerasQ for small clusters), and #5 (phi wrapping removal). The unused TrainBkgDiagBDT.hxx file (#10) should also be clarified.

@NamithaChitrazee
Copy link
Contributor Author

@oksuzian Hi, I've addressed almost all the comments.

🔴 Bugs / Potential Correctness Issues

1. findNeighbors return type mismatch — unsigned to signed conversion

unsigned nNeighbors = hit0.nStrawHits()-1;

Fixed the bug.

2. findNeighbors misses earlier hits (asymmetric window)

float minTime = time0 - deltaTime_;
auto it_start = std::lower_bound(idx.begin(), idx.end(), minTime, [&chcol](unsigned i, float val){
  return chcol[i].correctedTime() < val;
});
size_t istart = std::distance(idx.begin(),it_start);
for (size_t j = istart; j < idx.size(); ++j){
  ...
  float dt = hitj.correctedTime() - time0;
  if (dt > deltaTime_) break;

The search starts from time0 - deltaTime_ but the loop only breaks when dt > deltaTime_ (i.e., hitj.time > time0 + deltaTime_). However, the old code scanned both directions from ihit. The new lower_bound finds the first hit with time ≥ minTime, which is correct for the backward direction. But the loop walks forward from istart through ihit and past it, so hits between minTime and time0 with index < ihit are now included even if time0 - hitj.time > deltaTime_ (which can't happen given the lower_bound). This is actually fine logically, but note that the lower_bound comparator compares chcol[i].correctedTime() < val where i is an element of idx (a combo hit index) and val is a float. This relies on idx being sorted by correctedTime — which it is. So this is correct, but fragile and could use a comment.

3. classifyCluster does not set kerasQ for small clusters

void DBSClusterer::classifyCluster(BkgCluster& cluster, const ComboHitCollection& chcol){
    if(cluster.hits().size() < 3) return;

Yes, very small clusters have the keras quality by default set to 0.0 and they are passed along to the next stage of reconstruction as it is difficult to identify at this stage if they are background or potential signal hits.

4. countProton flags non-background hits with energysel — semantic confusion

void FlagBkgHits::countProton(...) const
{ for (size_t icl=0; icl < bkgccol.size(); ++icl) {
      auto& cluster = bkgccol[icl];
      bool isAlreadyBkg = cluster._flag.hasAllProperties(BkgClusterFlag::bkg);
      if (!isAlreadyBkg) {
        StrawHitFlag bkgFlag(StrawHitFlag::energysel);
        for (const auto& hitIdx : cluster.hits()) {
           if (chcol[hitIdx].energyDep() > minedep_)
             chfcol[hitIdx].merge(bkgFlag);
        }
      }
    }
  }

this function now counts proton clusters and the counter is passed along as IntensityInfoTimeCluster data product. The logic of this counting will be improved in the next iteration. At present, it is a place holder.

5. calculateCluster computes cphi without wrapping — potential discontinuity

float dp     = chcol[idx].phi();
// OLD code had:
// if (dp > M_PI)  dp -= 2*M_PI;
// if (dp < -M_PI) dp += 2*M_PI;
// NEW code removed these lines
cphi     += dp*weight;

The old code explicitly wrapped phi into [-π, π]. The new code removes this. If chcol[idx].phi() can return values outside [-π, π], the weighted average of cphi will be wrong. Even within [-π, π], averaging angles near ±π can produce incorrect results (the classic circular mean problem). The old code had the same issue but at least normalized inputs; now it doesn't.

🟡 Code Quality / Style Issues

6. Commented-out code left in

     //std::queue<unsigned>  inspect;

Removed it.

7. Opening brace placement in classifyCluster and countProton

  void FlagBkgHits::classifyCluster(...) const
  { for (size_t icl =0; icl < bkgccol.size(); ++icl) {

Fixed it.

8. dump() lost its cluster summary line

  void DBSClusterer::dump(const std::vector<BkgCluster>& clusters)
  {
    int iclu(0);
    for (auto& cluster: clusters) {
      // REMOVED: std::cout<<"Cluster "<<iclu<<" "<<cluster.pos()<<" "<<cluster.time()<<"  "<<cluster.hits().size()<<"  - ";
      for (auto& hit : cluster.hits()) std::cout<<hit<<" ";

Just removed the whole dump function for the time being. There is BkgDiag module in the ArtAnalysis repo which can be used to do a detailed diagnostics of the background clusters.

9. Extra space in variable assignment

    float y0    =  hit0.pos().y();

Fixed

10. Huge auto-generated file added without comment

TrkHitReco/inc/TrainBkgDiagBDT.hxx adds 22,562 lines. It appears to be an auto-generated BDT inference header but is not referenced anywhere in the PR's code changes. If it's not used, it shouldn't be committed. If it's planned for future use, it should at least be documented.

Removed this file

@oksuzian
Copy link
Collaborator

@FNALbuild run build test

@FNALbuild
Copy link
Collaborator

⌛ The following tests have been triggered for 22c023a: build (Build queue - API unavailable)

@FNALbuild
Copy link
Collaborator

📝 The HEAD of main has changed to c570352. Tests are now out of date.

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at 22c023a.

Test Result Details
test with Command did not list any other PRs to include
merge Merged 22c023a at 1241ca2
build (prof) Log file. Build time: 04 min 11 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 1.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 4 files
clang-tidy ➡️ 3 errors 21 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 22c023a after being merged into the base branch at 1241ca2.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Copy link
Collaborator

Let me review this pull request to find any issues.Here's my review of PR #1762 — "Updates to DBSClusterer and FlgBkgHits". I've identified several issues ranging from potential bugs to code quality concerns.


🔴 Bugs / Correctness Issues

Missing shiv.clear() in sub-level output path (FlagBkgHits_module.cc): fillStrawHitIndices(size_t, SHIV&, Level) appends via push_back without clearing — verified in ComboHit.cc:135-159. The old code called shiv.clear() each iteration; the new code removed it. Hit indices accumulate across iterations, producing duplicate ComboHits when the sub-level path is active.

hit.time() vs hit.correctedTime() mismatch in MVA input (DBSClusterer.cc:classifyCluster): Time delta computed as hit.time() - cluster.time(), but cluster.time() derives from correctedTime(). time() returns raw TDC threshold crossing (_etime[_eend]), correctedTime() returns calibrated aggregate time (_time). The MVA feature is physically meaningless.

1. Incorrect use of std::lower_bound on indirectly-sorted vector

auto it_start = std::lower_bound(idx.begin(), idx.end(), minTime, [&chcol](unsigned i, float val){
  return chcol[i].correctedTime() < val;
});

std::lower_bound requires the range to be sorted by the same comparator. The idx vector is sorted by chcol[i].correctedTime(), but the values in idx themselves are not in ascending order — they are indices into chcol. lower_bound works on the element values in the range, comparing them directly. Here the comparator takes an unsigned (index) and a float (time), which is a valid heterogeneous comparison — but std::lower_bound with a projection-style comparator on the elements is actually correct here because the sequence is sorted by the projected key. However, there's a subtle issue: lower_bound only searches forward from time0 - deltaTime_, but the old linear scan also searched backward. This means hits with times in [time0 - deltaTime_, time0) that appear before ihit in the sorted order are correctly found, but the binary search starts from the global beginning of idx. This is actually fine and is a valid optimization — no bug here on second analysis, but it changes behavior from the previous code (the old code also only searched from istart forward).

2. countProton flags hits as energysel even for non-proton clusters

for (const auto& hitIdx : cluster.hits()) {
  if (chcol[hitIdx].energyDep() > minedep_){
     chfcol[hitIdx].merge(bkgFlag);   // <-- flags ALL high-edep hits
     nhighedep++;
  }
}
// only THEN checks if the cluster qualifies as proton
if(nhits > minhits_ and (float)nhighedep/nhits > minfrac_)
  npc++;

Every high-energy-deposit hit in every non-background cluster gets flagged with energysel, regardless of whether the cluster actually qualifies as a proton candidate. The flagging should be done after confirming the cluster meets the proton criteria, or the flags should be reverted. As written, this could incorrectly flag signal hits.

3. Use of and keyword instead of &&

if(nhits > minhits_ and (float)nhighedep/nhits > minfrac_)

While and is a valid C++ alternative token, the rest of the codebase consistently uses &&. More importantly, this is a style inconsistency that could confuse other developers.

4. Division by zero risk in classifyCluster

unsigned nchits = cluster.hits().size();
// ...
kerasvars[4]  = std::sqrt((sqrSumDeltaX+sqrSumDeltaY)/nchits);
kerasvars[5] = std::sqrt(sqrSumDeltaTime/nchits);
kerasvars[6] = std::sqrt(sqrSumDeltaPhi/nchits);

While the early return if(cluster.hits().size() < 3) return; guards against nchits == 0, if a cluster somehow has hits but all with zero straw hits, the nhits variable (used for kerasvars[3]) could be 0. Not a division-by-zero for nhits directly, but worth noting for defensive coding.

5. Keras quality not set for small clusters

void DBSClusterer::classifyCluster(BkgCluster& cluster, const ComboHitCollection& chcol){
    if(cluster.hits().size() < 3) return;
    // ...
    cluster.setKerasQ(kerasout[0]);
}

Clusters with fewer than 3 hits will not have their kerasQ set at all — it stays at the default. The old code set it to -1.0 for all clusters. Now with the default _kerasQ being 0.0 (from the header), small clusters will have kerasQ == 0.0, which is below the threshold of 0.8, so they won't be flagged as background. This may be intentional but should be explicit (e.g., set kerasQ = 0.0 or -1.0 explicitly on early return).

6. Removed error handling in FlagBkgHits_module.cc

// REMOVED:
// } else {
//   throw cet::exception("RECO")<<"FlagBkgHits: inconsistent ComboHits" << std::endl;
// }
// ...
// if((!filter_) && chcol_out->size() != chcol_p.size())
//   throw cet::exception("RECO")<<"FlagBkgHits: inconsistent ComboHit output" << std::endl;

Two important consistency checks were removed silently. The first checks that fillStrawHitIndices returns the expected parent collection. The second validates the output size when not filtering. Removing these makes it much harder to diagnose data corruption issues in production.

7. shiv vector is no longer cleared before fillStrawHitIndices

// Old code had: shiv.clear();
// New code removed it
for(size_t ich=0; ich < nch; ++ich) {
    StrawHitFlag const& flag = chfcol[ich];
    if (!filter_ || !flag.hasAnyProperty(bkgmsk_)) {
        if(&chcol_p == chcol.fillStrawHitIndices(ich, shiv, level_)){

If fillStrawHitIndices appends to shiv rather than replacing it, this will cause accumulated stale indices across iterations.


🟡 Code Quality / Design Issues

8. Redundant forward declaration with full include

#include "Offline/TrkHitReco/inc/TrainBkgDiag.hxx"   // full definition included

// Then:
namespace TMVA_SOFIE_TrainBkgDiag {
  class Session;  // redundant forward declaration
}

Since TrainBkgDiag.hxx is already #include'd and it defines Session, the forward declaration is redundant. Either use only the forward declaration (and move the #include to the .cc file) or remove the forward declaration.

9. Phi averaging without circular mean handling

float dp = chcol[idx].phi();
// Removed:
// if (dp > M_PI)  dp -= 2*M_PI;
// if (dp < -M_PI) dp += 2*M_PI;
cphi += dp*weight;

The phi normalization before averaging was removed. Simple arithmetic averaging of angles is incorrect near the ±π boundary (e.g., averaging -179° and +179° gives 0° instead of 180°). This could produce incorrect cluster positions for clusters near the phi wrap-around.

10. bkghitcol always allocated and filled even when savebkg_ is false

auto bkghitcol = std::make_unique<BkgClusterHitCollection>();
bkghitcol->assign(nch, BkgClusterHit(999.0, StrawHitFlag())); //Pre-fill with defaults

This allocation happens unconditionally now, but bkghitcol is only event.put() when savebkg_ is true. When savebkg_ is false, the classifyCluster method still writes into it (gated by if(savebkg_)), but the initial allocation of nch elements is wasted memory. The old code only reserved when savebkg_ was true.

11. Commented-out code left in

//bkghitcol.reserve(nch);
//hitToClusterMap[chit] = icl;

These commented-out lines should be removed for cleanliness.

12. Inconsistent alignment of kerasvars assignments

kerasvars[0]  = cluster.pos().Rho();
kerasvars[1]  = zmax - zmin;
// ...
kerasvars[5] = std::sqrt(sqrSumDeltaTime/nchits);   // missing extra space
kerasvars[6] = std::sqrt(sqrSumDeltaPhi/nchits);

Indices 5 and 6 have different spacing than 0–4. Minor, but affects readability.


Summary

Severity Issue File
🔴 Bug Hits flagged as energysel even when cluster isn't a proton candidate FlagBkgHits_module.cc
🔴 Bug Removed consistency checks / error handling silently FlagBkgHits_module.cc
🔴 Bug shiv.clear() removed — potential stale data accumulation FlagBkgHits_module.cc
🟠 Risk Phi averaging without wrap-around handling DBSClusterer.cc
🟠 Risk Small clusters get default kerasQ=0 instead of explicit sentinel DBSClusterer.cc
🟡 Quality Redundant forward declaration alongside full #include DBSClusterer.hh
🟡 Quality bkghitcol allocated unconditionally FlagBkgHits_module.cc
🟡 Quality and vs && style inconsistency FlagBkgHits_module.cc
🟡 Quality Commented-out code should be removed FlagBkgHits_module.cc

float edep = chcol[idx].energyDep();
XYZVectorF hitpos(chcol[idx].pos().x(), chcol[idx].pos().y(), chcol[idx].pos().z());
cluster.addHitPosition(hitpos);
float dp = chcol[idx].phi();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the AI that phi wrapping has to be handled correctly to deal with the weighted sum. Typically this is done by choosing 1 (arbitrary) wrapping (the 1st entry is typical), then enforcing that all subsequent entries have the same. Let me know if this is unclear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brownd1978, fixed this.

Copy link
Collaborator

@brownd1978 brownd1978 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes, the phi wrapping now all looks correct. Have you retrained since these fixes? I imagine it might have some impact.
Otherwise I have a few minor suggestions that don't affect the code quality, implement as you see fit.

void FlagBkgHits::classifyCluster(BkgClusterCollection& bkgccol, BkgClusterHitCollection& bkghitcol, StrawHitFlagCollection& chfcol, const ComboHitCollection& chcol) const
{
for (auto& cluster : bkgccol) {
for (size_t icl =0; icl < bkgccol.size(); ++icl) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a range loop here as well: for (auto& cluster : bkgcol) {

if (std::find(bkgccol[icl].hits().begin(),bkgccol[icl].hits().end(),ich) != bkgccol[icl].hits().end()) return icl;
return -1;
int npc(0);
for (size_t icl=0; icl < bkgccol.size(); ++icl) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using a range loop here too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brownd1978, Thank you. Fixed them.

@oksuzian
Copy link
Collaborator

@FNALbuild run build test

@FNALbuild
Copy link
Collaborator

⌛ The following tests have been triggered for 19b4105: build (Build queue - API unavailable)

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at 19b4105.

Test Result Details
test with Command did not list any other PRs to include
merge Merged 19b4105 at c570352
build (prof) Log file. Build time: 04 min 07 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 1.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 4 files
clang-tidy ➡️ 3 errors 23 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 19b4105 after being merged into the base branch at c570352.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian oksuzian merged commit 5b02c13 into Mu2e:main Mar 19, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants