
separated Diffusion Model resampler for STM PR from PR #1794 #1798

Merged
oksuzian merged 4 commits into Mu2e:main from YongyiBWu:diffusion-model
Apr 25, 2026
Conversation

@YongyiBWu
Contributor

@YongyiBWu YongyiBWu commented Apr 11, 2026

Sub-PR with only the diffusion model resampler update, separated from PR #1794.
Suggested changes from PR #1794 have been applied; see PR #1794 for response details.

=================================
This pull request introduces a new score-based diffusion model machine learning tool, integrates it into the build system, and adds new modules for virtual detector resampling using this model. Additionally, it updates the generator ID enumeration to support a new generator and defines transformation defaults for the VDResampler. The changes are grouped below by theme.

Machine Learning Model Integration:

  • Added a new ScoreBasedDiffusionModel class for training and sampling using score-based diffusion models, including its header and implementation. This model supports configurable neural network architectures, optimizer and noise schedule selection, and model persistence. (MachineLearningTools/inc/ScoreBasedDiffusionModel.hh, MachineLearningTools/CMakeLists.txt, MachineLearningTools/src/SConscript) [1] [2] [3]
  • Integrated the new machine learning library into the build system and made it available to other modules. (MachineLearningTools/CMakeLists.txt, MachineLearningTools/src/SConscript, STMMC/src/SConscript) [1] [2] [3]

Virtual Detector Resampling Modules:

  • Added four new art modules for VDResampler: configuration, training, generation from model, and generation with mixing. These modules utilize the new machine learning tools and are registered in the build system. (STMMC/CMakeLists.txt)

Generator ID Updates:

  • Extended the GenId enumeration and its string mapping to include a new generator, STMDownStreamGenTool. (MCDataProducts/inc/GenId.hh) [1] [2]

VDResampler Transformation Defaults:

  • Added a header defining default transformation constants for the VDResampler. (STMMC/inc/VDResamplerTransformDefaults.hh)

Copilot AI review requested due to automatic review settings April 11, 2026 22:46
@FNALbuild
Collaborator

Hi @YongyiBWu,
You have proposed changes to files in these packages:

  • MCDataProducts
  • STMMC
  • MachineLearningTools

which require these tests: build.

@Mu2e/fnalbuild-users, @Mu2e/write have access to CI actions on main.

⌛ The following tests have been triggered for bbe08a4: build (Build queue - API unavailable)

About FNALbuild. Code review on Mu2e/Offline.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds a score-based diffusion model implementation to MachineLearningTools and introduces new STMMC modules that configure, train, and sample virtual-detector resampling models (including a “mix across sources” generator). It also extends GenId to tag the new downstream STM generator.

Changes:

  • Added ScoreBasedDiffusionModel (training, sampling, save/load) and wired it into the build.
  • Added VD resampler modules: configure (summary + auto-generated training FHiCL), train, generate-from-model, and generate-mix.
  • Extended GenId with STMDownStreamGenTool for generated particles from these modules.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
MachineLearningTools/inc/ScoreBasedDiffusionModel.hh Public API for score-based diffusion model training/sampling and CSV persistence.
MachineLearningTools/src/ScoreBasedDiffusionModel.cc Core diffusion model implementation (network, optimizer, training loop, save/load, sampling).
MachineLearningTools/CMakeLists.txt Builds/installs the new MachineLearningTools library.
MachineLearningTools/src/SConscript SCons build integration for the new package.
STMMC/inc/VDResamplerTransformDefaults.hh Centralizes default transform constants shared by training/generation.
STMMC/src/VDResamplerConfigure_module.cc Scans VD hits, writes hit summary CSV, generates per-PDG training FHiCL.
STMMC/src/VDResamplerTrain_module.cc Collects transformed training samples and trains 1-stage or 2-stage diffusion models.
STMMC/src/VDResamplerGenerateFromModel_module.cc Generates GenParticles from a single model (1-stage or 2-stage).
STMMC/src/VDResamplerGenerateMix_module.cc Mixes across multiple summary sources and particle ratios; loads appropriate models and generates GenParticles.
STMMC/CMakeLists.txt Registers the new STMMC art plugins and links required libraries.
STMMC/src/SConscript Adds MachineLearningTools + SeedService dependencies for SCons plugin builds.
MCDataProducts/inc/GenId.hh Adds STMDownStreamGenTool enum value and corresponding string name.


Comment thread MachineLearningTools/src/ScoreBasedDiffusionModel.cc Outdated
Comment thread STMMC/src/VDResamplerConfigure_module.cc Outdated
Comment thread STMMC/src/VDResamplerTrain_module.cc Outdated
Comment thread STMMC/src/VDResamplerTrain_module.cc Outdated
@FNALbuild
Collaborator

☀️ The build tests passed at bbe08a4.

Test Result Details
test with Command did not list any other PRs to include
merge Merged bbe08a4 at 2cc428c
build (prof) Log file. Build time: 04 min 09 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 8 files
clang-tidy ➡️ 8 errors 27 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at bbe08a4 after being merged into the base branch at 2cc428c.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Collaborator

I'll review this pull request by examining the details and the code changes. Let me fetch the changed files directly from the PR's head branch to review them. Now let me also check the existing review comments on this PR. Here is my review of PR #1798. This is a substantial PR (~2,967 additions) introducing a custom score-based diffusion model ML framework and four new art modules for virtual detector resampling. I've categorized the issues by severity.


🔴 Bugs / Correctness Issues

1. HepLorentzVector constructor argument order is wrong

In both VDResamplerGenerateFromModel_module.cc and VDResamplerGenerateMix_module.cc, the HepLorentzVector is constructed as:

const CLHEP::HepLorentzVector fourMomParticle(eTotal, momParticle);

CLHEP's HepLorentzVector(double, Hep3Vector) interprets the first argument as the x-component, not energy. The correct constructor for (energy, 3-momentum) is HepLorentzVector(Hep3Vector, double):

const CLHEP::HepLorentzVector fourMomParticle(momParticle, eTotal);

This will produce completely wrong 4-vectors as written.

2. loadModel() double-initializes weights wastefully and risks inconsistency

In ScoreBasedDiffusionModel::loadModel(), the constructor is called (which randomly initializes all weights), and then immediately overwritten:

ScoreBasedDiffusionModel model(randFlat, randGaussQ, dim, ...);
// Overwrite initialized weights with loaded values
for (size_t l = 0; l < model.network_.size(); ++l) {
    model.network_[l].W = loadedNetwork[l].W;
    model.network_[l].b = loadedNetwork[l].b;
}

This wastes random engine state (consuming random numbers during construction), which subtly changes reproducibility compared to a model that was never loaded. Consider adding a private constructor that skips random initialization.
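A minimal sketch of the suggested fix, with invented names (this is not the PR's actual API): a constructor flag that skips the random weight draws entirely, so a load path never touches the shared engine's state.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: a model whose constructor draws initial weights from a
// shared engine, plus a flag that skips the draws so a load path does not
// perturb the engine's state.
class TinyModel {
public:
  TinyModel(std::mt19937& engine, std::size_t nWeights,
            bool initializeRandomWeights = true)
      : weights_(nWeights, 0.0) {
    if (initializeRandomWeights) {
      std::normal_distribution<double> gauss(0.0, 1.0);
      for (auto& w : weights_) w = gauss(engine);  // consumes engine state
    }
    // else: weights stay zero; the loader overwrites them with saved values
  }
  const std::vector<double>& weights() const { return weights_; }
private:
  std::vector<double> weights_;
};
```

Constructing with the flag set to false leaves the engine exactly where it was, so runs that load a model reproduce runs that never did.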

3. trainingSampleSize_ assigned from size_t to int — truncation

In train(), trainingSampleSize_ = N; where N is size_t but trainingSampleSize_ is int. For very large datasets (>2^31 samples), this silently overflows.

4. VDResamplerConfigure's ttree is uninitialized when doROOTDump is false

The member TTree* ttree; is never initialized to nullptr. If doROOTDump is false, ttree is a dangling pointer. While it's not dereferenced in that path currently, this is a latent bug.


🟡 Robustness / Defensive Programming Issues

5. No validation of VDr > 0 in training or generation modules

As also noted in an existing review comment, if VDr is configured as 0 or negative, the division rho = r / VDr produces inf/NaN. VDResamplerTrain, VDResamplerGenerateFromModel, and VDResamplerGenerateMix should all validate VDr > 0 in their constructors.

6. Silent fallback for unknown optimizer/noise schedule strings

In VDResamplerTrain_module.cc, unrecognized SBDMoptimizer or SBDMnoiseSchedule values silently default to ADAM/COSINE. A typo like "Aadam" would silently use the wrong optimizer. Add validation or throw on unknown values.
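One way to make the configuration strict, sketched with an invented enum and parser (not the module's actual code): map strings to an enum and throw on anything unrecognized, so a typo like "Aadam" cannot silently select ADAM.

```cpp
#include <stdexcept>
#include <string>

// Hedged sketch: strict string-to-enum parsing for an optimizer setting.
enum class Optimizer { ADAM, SGD };

inline Optimizer parseOptimizer(const std::string& s) {
  if (s == "ADAM") return Optimizer::ADAM;
  if (s == "SGD")  return Optimizer::SGD;
  throw std::invalid_argument("unknown optimizer: " + s);
}
```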

7. Fisher-Yates shuffle has off-by-one potential

size_t j = static_cast<size_t>(randFlat_.fire() * (i + 1));

RandFlat::fire() returns [0, 1), so j ranges over [0, i] which is correct. However, if the engine ever returns exactly 1.0 (which some implementations may do), j == i+1 would be out of bounds. A safer approach: j = std::min(j, i).
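The clamped step can be sketched as follows; `uniform01` stands in for RandFlat::fire(), and the clamp guards against an engine that ever returns exactly 1.0.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a Fisher-Yates shuffle with the defensive clamp discussed above.
template <typename Rng01>
void fisherYatesShuffle(std::vector<double>& v, Rng01 uniform01) {
  for (std::size_t i = v.size(); i-- > 1; ) {
    std::size_t j = static_cast<std::size_t>(uniform01() * (i + 1));
    j = std::min(j, i);  // keeps j in [0, i] even if uniform01() == 1.0
    std::swap(v[i], v[j]);
  }
}
```

With a degenerate generator that always returns 1.0, every j clamps to i and the vector is left untouched instead of reading out of bounds.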

8. computeLoss() is declared but never called

The member function computeLoss() is implemented but the loss is actually computed inline in the training loop. This is dead code that could become stale.


🟢 Style / Maintenance Issues

9. Inconsistent coding style

The codebase mixes different formatting conventions throughout:

  • Sometimes for(size_t i=0;i<z.size();++i) (no spaces), other times for (size_t i = 0; i < out; ++i) (with spaces)
  • Inconsistent brace placement ({ on same line vs. next line)
  • The Mu2e Offline codebase generally uses consistent formatting — this should be unified.

10. typedef instead of using

typedef cet::map_vector_key key_type;
typedef unsigned long VolumeId_type;

Modern C++ prefers using declarations. Also, key_type is defined but never used in some modules.

11. Member variables used as scratch space in analyze()

In VDResamplerConfigure and VDResamplerTrain, variables like x, y, z, px, py, pz, pdgId are stored as class members but are only used as local variables within analyze(). They should be local variables instead — storing them as members wastes memory and makes the class harder to reason about thread-safety.

12. Duplicated inverse-transform code in Generate modules

The coordinate inverse transform (arctanh, polar→Cartesian, momentum recovery) is duplicated verbatim between VDResamplerGenerateFromModel_module.cc and VDResamplerGenerateMix_module.cc. This should be extracted into a shared utility function (e.g., in a header under STMMC/inc/).
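A sketch of what the shared header could look like; the names and exact formulas here are illustrative, not lifted from the PR, but they show a forward/inverse pair (atanh stretch of the normalized radius, polar to Cartesian) living in one place so the inverse is never duplicated.

```cpp
#include <cmath>

// Hedged sketch of a shared transform header for the resampler modules.
struct TransformedSample { double rho; double phi; };

// forward: Cartesian -> (atanh-stretched radius normalized by VDr, angle)
inline TransformedSample forwardTransform(double x, double y, double VDr) {
  const double r = std::hypot(x, y);
  return { std::atanh(r / VDr), std::atan2(y, x) };
}

// inverse: undo the stretch, then polar -> Cartesian
inline void inverseTransform(const TransformedSample& s, double VDr,
                             double& x, double& y) {
  const double r = std::tanh(s.rho) * VDr;
  x = r * std::cos(s.phi);
  y = r * std::sin(s.phi);
}
```

Keeping both directions next to each other also makes the round-trip property trivially testable.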

13. Magic numbers

Several magic numbers appear without constants:

  • 100000 threshold for two-stage training in VDResamplerConfigure
  • 1e-6 for radius safety check in multiple files
  • 0.1 for minimum time in t_safe = std::max(time, 0.1)

These should be named constants or configurable parameters.

14. No #include <memory> in modules using std::unique_ptr

Several module source files use std::unique_ptr without explicitly including <memory>. This compiles because other headers transitively include it, but it's fragile.


Summary

Severity Count Key Items
🔴 Bug 4 Wrong HepLorentzVector arg order, uninitialized ttree, int truncation, RNG state pollution
🟡 Robustness 4 No VDr validation, silent optimizer fallback, shuffle edge case, dead code
🟢 Style 6 Duplicated transforms, member-as-local, magic numbers, inconsistent formatting

The most critical fix needed is the HepLorentzVector constructor argument order — this will produce wrong physics results silently.

@oksuzian oksuzian requested review from brownd1978 and corrodis April 12, 2026 02:11
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@YongyiBWu
Contributor Author

  1. Geant4 10 and 11 both have HepLorentzVector(double, Hep3Vector) constructors according to their documentation, and some existing modules use that too. But to make things more consistent throughout the repo, everything has been changed to the HepLorentzVector(Hep3Vector, double) constructor now.
  2. Since the loaded model is never retrained and is only used for generating particles, and the model size remains the same, reproducibility is not a problem. Random states may be wasted, but that is not a large problem. Still, I added an optional flag to the constructor that turns off the RNG initialization when loading from a saved model.
  3. trainingSampleSize_ has been changed to size_t. Comment added to remind use of std::stoull if parsing from a saved file.
  4. Added a nullptr initialization.
  5. VDr, VDz0 values are now validated in all 3 modules.
  6. Now unrecognized settings prompt a warning message before the values fall back to default.
  7. This does not really matter. RandFlat::fire() returns values in [0, 1), so j = i + 1 should not occur. But I guess it doesn't hurt to add a line j = std::min(j, i);
  8. Now the per-dimension loss calculation calls the computeLoss function instead of using its own implementation. Both loss and gradient calculations are now unified on a per-dimension basis, and additional dimension checks were added in computeLoss.
  9. ... Fine... spacing conventions unified.
  10. key_type was removed. "typedef unsigned long VolumeId_type;" is also used in multiple other files and was kept for consistency.
  11. These members store variables that are potentially written into ROOT dumps, so it does not make sense to move them to local variables.
  12. Transform methods along with relevant constants are now collected in a single header file VDResamplerTransforms.hh
  13. Added to the corresponding header file or declared as constants.
  14. Fixed as suggested.

Introduce VDResamplerTransforms (new header) and remove legacy VDResamplerTransformDefaults; centralize forward/inverse sample transforms and numeric safety constants. Harden ScoreBasedDiffusionModel by: adding initializeRandomWeights ctor flag, using size_t for trainingSampleSize_, initializing layer buffers safely, guarding weight initialization, adding dimension checks in computeLoss, improving gradient/Adam math/formatting, fixing gradient clipping early-return, validating and reconstructing loaded network shapes (loadModel uses initializeRandomWeights=false), and various style/robustness fixes. Update STMMC modules (Configure/GenerateFromModel/GenerateMix/Train) to use the new transforms, add runtime validation for VDr/VDz0, improve config parsing with warnings, initialize pointers, replace inlined transform code with calls into VDResampler, and minor API/constructor fixes (HepLorentzVector usage). Misc: remove unused typedefs, add small constants (kTwoStageTrainingHitThreshold), includes, and other small cleanups to improve stability and correctness.
@YongyiBWu
Contributor Author

@FNALbuild run build test

@FNALbuild
Collaborator

⌛ The following tests have been triggered for a774f0a: build (Build queue - API unavailable)

@FNALbuild
Collaborator

☀️ The build tests passed at a774f0a.

Test Result Details
test with Command did not list any other PRs to include
merge Merged a774f0a at 1d1b9f0
build (prof) Log file. Build time: 08 min 37 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 1.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 8 files
clang-tidy ➡️ 8 errors 24 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at a774f0a after being merged into the base branch at 1d1b9f0.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

Collaborator

@brownd1978 brownd1978 left a comment


I can't evaluate this code technically. Since it's a pure addition I think it's OK.

@oksuzian oksuzian merged commit bb64680 into Mu2e:main Apr 25, 2026
13 of 14 checks passed
