Skip to content

Calo filter#1778

Merged
oksuzian merged 10 commits intoMu2e:mainfrom
bechenard:CaloFilter
Apr 10, 2026
Merged

Calo filter#1778
oksuzian merged 10 commits intoMu2e:mainfrom
bechenard:CaloFilter

Conversation

@bechenard
Copy link
Copy Markdown
Contributor

Updated calo filter without energy and time variables. The NN evaluation and the filtering are also separated. Need a mu2e-trig-config PR to work

@FNALbuild
Copy link
Copy Markdown
Collaborator

Hi @bechenard,
You have proposed changes to files in these packages:

  • CaloReco
  • CaloFilters
  • CaloDiag
  • CaloCluster

which require these tests: build.

@Mu2e/fnalbuild-users, @Mu2e/write have access to CI actions on main.

⌛ The following tests have been triggered for 77caf00: build (Build queue - API unavailable)

About FNALbuild. Code review on Mu2e/Offline.

@FNALbuild
Copy link
Copy Markdown
Collaborator

☔ The build tests failed for 77caf00.

Test Result Details
test with Command did not list any other PRs to include
merge Merged 77caf00 at 67d99ba
build (prof) Log file. Build time: 04 min 06 sec
ceSimReco Log file. Return Code 9.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file. Return Code 9.
ceSteps Log file.
ceDigi Log file. Return Code 9.
muDauSteps Log file.
ceMix Log file. Return Code 9.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 2.
check_cmake Log file. Return Code 1.
FIXME, TODO ➡️ TODO (0) FIXME (1) in 6 files
clang-tidy ➡️ 10 errors 148 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 77caf00 after being merged into the base branch at 67d99ba.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@michaelmackenzie
Copy link
Copy Markdown
Contributor

@FNALbuild run build test with Mu2e/mu2e-trig-config#152

@FNALbuild
Copy link
Copy Markdown
Collaborator

⌛ The following tests have been triggered for 77caf00: build (Build queue - API unavailable)

@FNALbuild
Copy link
Copy Markdown
Collaborator

☔ The build tests failed for 77caf00.

Test Result Details
test with Included Mu2e/mu2e-trig-config#152 @ 8a756aff88565cd0d3669cbae9fbe92640559266 by merge
merge Merged 77caf00 at 67d99ba
build (prof) Log file. Build time: 08 min 28 sec
ceSimReco Log file. Return Code 9.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 2.
check_cmake Log file. Return Code 1.
FIXME, TODO ➡️ TODO (0) FIXME (1) in 6 files
clang-tidy ➡️ 10 errors 148 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 77caf00 after being merged into the base branch at 67d99ba.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@bechenard
Copy link
Copy Markdown
Contributor Author

@FNALbuild run build test with Mu2e/mu2e-trig-config#152

@FNALbuild
Copy link
Copy Markdown
Collaborator

⌛ The following tests have been triggered for 77caf00: build (Build queue - API unavailable)

@FNALbuild
Copy link
Copy Markdown
Collaborator

☔ The build tests failed for 77caf00.

Test Result Details
test with Included Mu2e/mu2e-trig-config#152 @ 8a756aff88565cd0d3669cbae9fbe92640559266 by merge
merge Merged 77caf00 at a31b07e
build (prof) Log file. Build time: 04 min 06 sec
ceSimReco Log file. Return Code 9.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 2.
check_cmake Log file. Return Code 1.
FIXME, TODO ➡️ TODO (0) FIXME (1) in 6 files
clang-tidy ➡️ 10 errors 148 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 77caf00 after being merged into the base branch at a31b07e.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@FNALbuild
Copy link
Copy Markdown
Collaborator

☔ The build tests failed for ee787a4.

Test Result Details
test with Command did not list any other PRs to include
merge Merged ee787a4 at 50ac9dc
build (prof) Log file. Build time: 04 min 08 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file. Return Code 9.
ceSteps Log file.
ceDigi Log file. Return Code 9.
muDauSteps Log file.
ceMix Log file. Return Code 9.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 2.
check_cmake Log file.
FIXME, TODO ➡️ TODO (0) FIXME (1) in 7 files
clang-tidy ➡️ 10 errors 156 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at ee787a4 after being merged into the base branch at 50ac9dc.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@bechenard
Copy link
Copy Markdown
Contributor Author

@FNALbuild run build test with Mu2e/mu2e-trig-config#152

@FNALbuild
Copy link
Copy Markdown
Collaborator

⌛ The following tests have been triggered for ee787a4: build (Build queue - API unavailable)

@FNALbuild
Copy link
Copy Markdown
Collaborator

☀️ The build tests passed at ee787a4.

Test Result Details
test with Included Mu2e/mu2e-trig-config#152 @ 8a756aff88565cd0d3669cbae9fbe92640559266 by merge
merge Merged ee787a4 at c6b091b
build (prof) Log file. Build time: 04 min 08 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 2.
check_cmake Log file.
FIXME, TODO ➡️ TODO (0) FIXME (1) in 7 files
clang-tidy ➡️ 10 errors 156 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at ee787a4 after being merged into the base branch at c6b091b.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Copy link
Copy Markdown
Collaborator

oksuzian commented Apr 6, 2026

I'll review this pull request by examining its details and changes.Let me fetch the changed files from the comparison diff and look at the head branch files directly.Now let me fetch the new CaloNNEval_module.cc and the updated FilterEcalNNTrigger_module.cc more carefully to check the full files:Now let me look at the existing review comments to understand what's already been flagged:Here is my review of PR #1778 – "Calo filter". I've identified several issues ranging from potential bugs to code quality concerns.


🔴 Potential Bugs

1. secondMoment() formula change may be incorrect

    double ClusterUtils::secondMoment() const
    {
        double sx(0),sy(0),sz(0),sx2(0),sy2(0),sw(0);
        fill(sx,sy,sz,sx2,sy2,sw);
        return (sx2-sx*sx/sw + sy2-sy*sy/sw)/sw;
    }

The old formula was sx2-sx*sx/sw + sy2-sy*sy/sw (i.e. Var(x)*sw + Var(y)*sw), and the new one divides by sw once more. This is a physics-meaningful change to the second moment calculation. If fill() accumulates weighted sums (sx2 += w*x*x, sw += w), then the old formula was ∑w·x² − (∑w·x)²/∑w which is ∑w · Var(x), and dividing by sw gives the actual variance. This looks intentional and correct, but it means the NN weights trained against the old definition are now incompatible. Verify that the NN weight files (CE_NN_ReLU_noET.weights.xml, etc.) were retrained with this new normalization, or downstream consumers will get wrong MVA scores.

2. CaloNNEval — potential division by zero

        std::vector<float> mvavars(7,0.0);
        mvavars[0] = clusterIt->cog3Vector().perp();
        mvavars[1] = clusterIt->size();
        mvavars[2] = cluUtil.e1()/clusterIt->energyDep();
        mvavars[3] = cluUtil.e2()/clusterIt->energyDep();
        mvavars[4] = cluUtil.e9()/clusterIt->energyDep();
        mvavars[5] = cluUtil.e25()/clusterIt->energyDep();
        mvavars[6] = cluUtil.secondMoment();

clusterIt->energyDep() is used as a divisor. While the minEtoTest_ cut on line 104 should protect against zero-energy clusters, if minEtoTest_ is configured as 0, this will divide by zero. Consider adding a guard or ensuring the configuration enforces minEtoTest_ > 0.

3. FilterEcalNNTrigger — index mismatch between clusters and MVA results

      for (size_t index=0;index<caloClusters.size();++index){
         const auto& cluster = caloClusters[index];
         const auto& mvaout  = caloMVAs[index];

This assumes caloClusters.size() == caloMVAs.size(). If CaloNNEval and FilterEcalNNTrigger consume different cluster collections (which is possible via FCL configuration), the sizes may differ and this will cause an out-of-bounds access. There is no size validation before the loop. Consider adding:

if (caloClusters.size() != caloMVAs.size()) return false;

4. CaloClusterFast — sentinel value collision risk

           if (*iterSeed==hits.size() || hitSeed.energyDep()< EminSeed_) {++iterSeed; continue;}

*iterSeed==hits.size() is used as a sentinel to mark "consumed" entries (line 128: *iterSeed=hits.size()). This works but is fragile — if hits vector is ever resized or if the sentinel value coincidentally matches a valid index, it would silently break. A more robust approach would be to use std::optional, a separate visited vector, or std::numeric_limits<size_t>::max() as the sentinel.


🟡 Design & Robustness Concerns

5. CaloClusterFast::fillCluster — no empty cluster guard

      xCOG /= totalEnergy;
      yCOG /= totalEnergy;

If clusterList is somehow empty or all hits have zero energy, this divides by zero. While the clustering logic should prevent this, a defensive check would be prudent.

6. CaloNNEval has a redundant private method secondMoment()

       float    secondMoment(const Calorimeter& cal, const CaloHitPtrVector& hits) const;

This private method is declared but never defined or called in the module. The code uses cluUtil.secondMoment() instead (line 134). This dead declaration should be removed.

7. CaloClusterFast::cog3Vector sets z=0 unconditionally

      caloClusters.emplace_back(CaloCluster(iDisk,time,0.0,totalEnergy,0.0,CLHEP::Hep3Vector(xCOG,yCOG,0),

The COG z-component is hardcoded to 0. This may cause issues for downstream code that calls cluster.cog3Vector().perp() (as CaloNNEval does), since perp() for (x,y,0) would be sqrt(x²+y²), using local coordinates rather than the disk-frame FF coordinates used by the standard clustering. Make sure this is consistent with what the NN expects.

8. CaloNNDiag — missing validity check on caloClustersHandle

In the analyze() method, the code fetches caloClusterTruthHandle and dereferences it without checking validity, but the early-return logic only checks the select energy filter. If the truth association handle is invalid, this will crash. (The original code also had this issue, but it's worth noting.)


🟢 Minor / Style Issues

9. Commented-out code in CaloNNEval

        //This would be the version with energy and time included
        //std::vector<float> mvavars(9,0.0);
        //mvavars[0] = clusterIt->energyDep();
        ...

There's a large block of commented-out code showing the "version with energy and time." This should be removed or documented more clearly (e.g., in a commit message or separate doc), since the PR description mentions "without energy and time variables."

10. Missing CaloCluster dependency in FilterEcalNNTrigger CMakeLists

cet_build_plugin(FilterEcalNNTrigger art::module
    REG_SOURCE src/FilterEcalNNTrigger_module.cc
    LIBRARIES REG
      Offline::RecoDataProducts
)

The old FilterEcalNNTrigger depended on Offline::CalorimeterGeom, Offline::GeometryService, Offline::Mu2eUtilities, and Offline::CaloCluster. The new version removed those but still uses CaloCluster types transitively through RecoDataProducts. This seems fine, but make sure Offline::RecoDataProducts correctly pulls in headers for CaloCluster and MVAResult — otherwise this may fail at link time.


Summary

# Severity File Issue
1 🔴 High ClusterUtils.cc secondMoment() normalization changed — NN weights must be retrained
2 🔴 Medium CaloNNEval_module.cc Division by zero if minEtoTest_ is 0
3 🔴 Medium FilterEcalNNTrigger_module.cc No bounds check between cluster/MVA collection sizes
4 🟡 Low CaloClusterFast_module.cc Fragile sentinel value pattern
5 🟡 Low CaloClusterFast_module.cc No zero-energy guard in fillCluster
6 🟡 Low CaloNNEval_module.cc Dead declaration of secondMoment()
7 🟡 Medium CaloClusterFast_module.cc COG z=0 may give different perp() than standard clustering
8 🟡 Low CaloNNDiag_module.cc Missing handle validity check
9 🟢 Style CaloNNEval_module.cc Remove commented-out code block
10 🟢 Info CMakeLists.txt Verify transitive dependencies are sufficient

@bechenard
Copy link
Copy Markdown
Contributor Author

@FNALbuild run build test with Mu2e/mu2e-trig-config#152

@FNALbuild
Copy link
Copy Markdown
Collaborator

⌛ The following tests have been triggered for 49d2d9d: build (Build queue - API unavailable)

@bechenard
Copy link
Copy Markdown
Contributor Author

Ok, I have fixed some of the AI stuff, most were actually moot.

@FNALbuild
Copy link
Copy Markdown
Collaborator

☀️ The build tests passed at 49d2d9d.

Test Result Details
test with Included Mu2e/mu2e-trig-config#152 @ 8a756aff88565cd0d3669cbae9fbe92640559266 by merge
merge Merged 49d2d9d at 68688bb
build (prof) Log file. Build time: 08 min 20 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 2.
check_cmake Log file.
FIXME, TODO ➡️ TODO (0) FIXME (1) in 7 files
clang-tidy ➡️ 10 errors 157 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 49d2d9d after being merged into the base branch at 68688bb.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Copy link
Copy Markdown
Collaborator

I'll review this pull request by examining its details and changes.Let me get the actual diff to review the changes:Now let me look at the new CaloNNEval_module.cc and the modified FilterEcalNNTrigger_module.cc more closely, and also check for the MVAResult data product:Now let me look at the new CaloNNEval_module.cc and FilterEcalNNTrigger_module.cc more carefully, and also get the existing files for context:Now I have a thorough view of the entire PR. Let me review the new CaloNNEval_module.cc more closely and check the existing FilterEcalNNTrigger_module.cc for correctness.

Here is my detailed review of PR #1778 – Calo filter in Mu2e/Offline:


Summary

This PR refactors the calorimeter NN trigger filter by:

  1. Separating NN evaluation (CaloNNEval) from filtering (FilterEcalNNTrigger)
  2. Adding a new CaloNNTrain diagnostic analyzer
  3. Adding new "fast" hit/cluster makers to the reco chain
  4. Updating CaloNNDiag to use vector<vector<>> for MC truth (better ROOT branch layout) and adding validity checks
  5. Fixing the secondMoment() calculation (normalizing by sw)
  6. Updating NN weights (adding cluTime, cluSec, and ratio-based energy features)
  7. Refactoring CaloHitMakerFast (sorted digis, changed map type, new pulse-merging logic)

Issues Found

1. 🐛 Typo in exception messageFilterEcalNNTrigger_module.cc

throw cet::exception("FILTER")<< "FilterEcalNNTrigger: Clusters and MVA collection sizes incpmpatible\n";

"incpmpatible" → should be "incompatible".


2. 🐛 Missing space in diagnostic outputCaloNNEval_module.cc (line ~128)

if (diagLevel_>0) std::cout<<"CaloNNEval cluster with e="<<clusterIt->energyDep()
                           <<" and r="<< clusterIt->cog3Vector().perp()
                           <<"has mvaout="<<mvaout<<"\n";

"has mvaout=" is missing a leading space — the output will read "...r=450.2has mvaout=0.8". Should be " has mvaout=".


3. ⚠️ Hard-coded MVA vector size of 7 doesn't match all weight filesCaloNNEval_module.cc

The CaloNNEval module always builds 7 features:

std::vector<float> mvavars(7,0.0);
mvavars[0] = clusterIt->cog3Vector().perp();
mvavars[1] = clusterIt->size();
mvavars[2] = cluUtil.e1()/clusterIt->energyDep();
mvavars[3] = cluUtil.e2()/clusterIt->energyDep();
mvavars[4] = cluUtil.e9()/clusterIt->energyDep();
mvavars[5] = cluUtil.e25()/clusterIt->energyDep();
mvavars[6] = cluUtil.secondMoment();

But the updated CE_NN_ReLU.weights.xml now has 9 variables (adds cluEnergy, cluTime). If CaloNNEval is ever configured to use CE_NN_ReLU.weights.xml instead of CE_NN_ReLU_noET.weights.xml, the variable count mismatch will cause incorrect results or a crash. Consider either:

  • Making the feature vector construction configurable / driven by the MVA metadata, or
  • Adding a runtime check that the number of variables matches what the weights file expects.

4. ⚠️ Potential division by zero in secondMoment()ClusterUtils.cc (line 62)

double ClusterUtils::secondMoment() const
{
    double sx(0),sy(0),sz(0),sx2(0),sy2(0),sw(0);
    fill(sx,sy,sz,sx2,sy2,sw);
    return (sx2-sx*sx/sw + sy2-sy*sy/sw)/sw;
}

If all hits are on a different disk from hits_[0], the fill() method skips them all and sw remains 0, causing division by zero. The same issue exists for cog3Vector(). While this may be unlikely in practice, it would be prudent to guard against sw == 0.


5. ⚠️ addPulse logic change may silently discard SiPM matchesCaloHitMakerFast_module.cc

The new logic:

addNewHit    = false;
float ratio  = (eDep-pulse.eDep_)/(eDep+pulse.eDep_);
float sigmaR = 0.707*sqrt(1.0/eMean/nPEperMeV_ + noise2_/eMean/eMean);

if (fabs(ratio) < nSigmaNoise_*sigmaR) {pulse.add(time,eDep);}
else if (eDep>pulse.eDep_)             {pulse.time_=time; pulse.eDep_=eDep;}
break;

When a digi is within deltaTPulses_ but fails the energy consistency check and has less energy (eDep <= pulse.eDep_), the digi is silently dropped — addNewHit is false and the break exits the loop. This is a deliberate design choice, but it differs from the previous behavior (which would continue looking for another matching pulse). Worth confirming this is intentional.


6. ⚠️ unordered_mapmap changes ordering behaviorCaloHitMakerFast_module.cc

using pulseMapType = std::map<unsigned, std::vector<HitInfo>>;

Changing from unordered_map to std::map ensures deterministic output ordering (good for reproducibility) but introduces O(log n) lookup overhead. For the digi volumes in this experiment, this is likely fine, but it's a physics-relevant change since output order may affect downstream consumers. Ensure this doesn't cause validation issues.


7. 💡 CaloNNDiag early-returns before Ntup_->Fill() on missing handles

The new validity checks are good defensive coding:

if (!CaloHitsHandle.isValid()) return;
if (!caloClustersHandle.isValid()) return;

However, be aware that the tree will simply not get filled for events where handles are missing — no counter or warning is emitted. If this module is used in analysis and handles are unexpectedly missing, it may be confusing to debug. Consider adding a warning message.


8. 💡 secondMoment() formula change is a physics-relevant correction

The original formula:

sx2 - sx*sx/sw + sy2 - sy*sy/sw

was changed to:

(sx2 - sx*sx/sw + sy2 - sy*sy/sw) / sw

This is a fix to normalize the second moment properly (variance = (Σw·x² − (Σw·x)²/Σw) / Σw). This changes the numeric value of cluSecMom and any downstream cuts or MVA features that depend on it. The new NN weights are presumably trained with this corrected definition, so consistency is maintained — but any existing downstream analysis code or cuts using the old secondMoment() will break.


9. 💡 New modules added to default Reco sequences

CaloHitFastMaker and CaloClusterFastMaker are added to the default Reco paths in the prolog files. This means all reconstruction jobs will now run these fast reco modules even if the NN trigger isn't used. This adds processing time to every reco job. Consider whether these should be in a separate optional sequence.


Summary Table

# Severity File Issue
1 🐛 Bug FilterEcalNNTrigger_module.cc Typo: "incpmpatible"
2 🐛 Bug CaloNNEval_module.cc Missing space in diagnostic output
3 ⚠️ Risk CaloNNEval_module.cc Hard-coded 7-feature vector incompatible with 9-var weight file
4 ⚠️ Risk ClusterUtils.cc Division by zero if sw == 0
5 ⚠️ Risk CaloHitMakerFast_module.cc Digis silently dropped when energy mismatch and lower energy
6 ⚠️ Minor CaloHitMakerFast_module.cc map vs unordered_map changes output ordering
7 💡 Note CaloNNDiag_module.cc Silent early returns with no logging
8 💡 Note ClusterUtils.cc secondMoment() normalization is a physics-breaking change
9 💡 Note Prolog fcl files Fast reco modules added to default Reco sequence

@bechenard
Copy link
Copy Markdown
Contributor Author

I had a loing conversation with my inner self, and we're fine with two typos. Just merge this PR

@oksuzian oksuzian merged commit f701272 into Mu2e:main Apr 10, 2026
14 checks passed
@michaelmackenzie
Copy link
Copy Markdown
Contributor

@bechenard please see the requested trigger config changes here:
Mu2e/mu2e-trig-config#152

The head of the repos and the CI will be broken until we can merge this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants