separated Diffusion Model resampler for STM PR from PR #1794 #1798
Hi @YongyiBWu,
which require these tests: build.
@Mu2e/fnalbuild-users, @Mu2e/write have access to CI actions on main.
⌛ The following tests have been triggered for bbe08a4: build (Build queue - API unavailable)
Pull request overview
This PR adds a score-based diffusion model implementation to MachineLearningTools and introduces new STMMC modules that configure, train, and sample virtual-detector resampling models (including a “mix across sources” generator). It also extends GenId to tag the new downstream STM generator.
Changes:
- Added `ScoreBasedDiffusionModel` (training, sampling, save/load) and wired it into the build.
- Added VD resampler modules: configure (summary + auto-generated training FHiCL), train, generate-from-model, and generate-mix.
- Extended `GenId` with `STMDownStreamGenTool` for generated particles from these modules.
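To make "sampling" from a score-based diffusion model concrete, here is a minimal sketch of an Euler-Maruyama reverse-time sampler for a variance-exploding (VE) SDE. It is an illustration under stated assumptions, not the PR's `ScoreBasedDiffusionModel` code: the trained score network is replaced by the exact analytic score of a 1D Gaussian, and the geometric noise schedule and every function name (`veSigma`, `gaussianScore`, `sampleReverseVE`, `meanAndStd`) are hypothetical. The PR's model may use a different SDE and noise schedule.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical VE noise scale on t in [0, 1]: sigma(t) = sigmaMin * (sigmaMax / sigmaMin)^t.
double veSigma(double t, double sigmaMin, double sigmaMax) {
  return sigmaMin * std::pow(sigmaMax / sigmaMin, t);
}

// Stand-in for a trained score network: the exact score d/dx log p_t(x) when the
// data are N(mu, s0^2) and the forward SDE has added N(0, sigma^2) noise.
double gaussianScore(double x, double sigma, double mu, double s0) {
  return -(x - mu) / (s0 * s0 + sigma * sigma);
}

// Integrate the reverse-time SDE backwards from t = 1 to t = 0. Each step of size dt
// moves x by g^2 * score * dt plus Gaussian noise of variance g^2 * dt.
std::vector<double> sampleReverseVE(std::size_t nSamples, std::size_t nSteps, double mu,
                                    double s0, double sigmaMin, double sigmaMax,
                                    std::mt19937_64& rng) {
  assert(nSteps > 0 && sigmaMin > 0.0 && sigmaMax > sigmaMin);
  std::normal_distribution<double> stdNormal(0.0, 1.0);
  const double logRatio = std::log(sigmaMax / sigmaMin);
  const double dt = 1.0 / static_cast<double>(nSteps);
  std::vector<double> xs(nSamples);
  for (double& x : xs) {
    x = sigmaMax * stdNormal(rng);  // draw from the prior N(0, sigmaMax^2) at t = 1
    for (std::size_t i = nSteps; i > 0; --i) {
      const double t = static_cast<double>(i) * dt;
      const double sigma = veSigma(t, sigmaMin, sigmaMax);
      const double g2 = 2.0 * sigma * sigma * logRatio;  // g(t)^2 = d(sigma^2)/dt
      x += g2 * gaussianScore(x, sigma, mu, s0) * dt + std::sqrt(g2 * dt) * stdNormal(rng);
    }
  }
  return xs;
}

// Sample mean and population standard deviation, used to check the sampler's output.
std::pair<double, double> meanAndStd(const std::vector<double>& xs) {
  assert(!xs.empty());
  double mean = 0.0;
  for (double x : xs) mean += x;
  mean /= static_cast<double>(xs.size());
  double var = 0.0;
  for (double x : xs) var += (x - mean) * (x - mean);
  var /= static_cast<double>(xs.size());
  return {mean, std::sqrt(var)};
}
```

With `mu = 2`, `s0 = 1`, `sigmaMin = 0.01`, `sigmaMax = 10` and 1000 steps, the generated samples should come out close to N(2, 1). In a real model, `gaussianScore` is the learned network evaluated at `(x, t)`. The engine is passed by reference so the caller owns the random stream.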
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `MachineLearningTools/inc/ScoreBasedDiffusionModel.hh` | Public API for score-based diffusion model training/sampling and CSV persistence. |
| `MachineLearningTools/src/ScoreBasedDiffusionModel.cc` | Core diffusion model implementation (network, optimizer, training loop, save/load, sampling). |
| `MachineLearningTools/CMakeLists.txt` | Builds and installs the new MachineLearningTools library. |
| `MachineLearningTools/src/SConscript` | SCons build integration for the new package. |
| `STMMC/inc/VDResamplerTransformDefaults.hh` | Centralizes default transform constants shared by training/generation. |
| `STMMC/src/VDResamplerConfigure_module.cc` | Scans VD hits, writes hit summary CSV, generates per-PDG training FHiCL. |
| `STMMC/src/VDResamplerTrain_module.cc` | Collects transformed training samples and trains 1-stage or 2-stage diffusion models. |
| `STMMC/src/VDResamplerGenerateFromModel_module.cc` | Generates GenParticles from a single model (1-stage or 2-stage). |
| `STMMC/src/VDResamplerGenerateMix_module.cc` | Mixes across multiple summary sources and particle ratios; loads appropriate models and generates GenParticles. |
| `STMMC/CMakeLists.txt` | Registers the new STMMC art plugins and links required libraries. |
| `STMMC/src/SConscript` | Adds MachineLearningTools + SeedService dependencies for SCons plugin builds. |
| `MCDataProducts/inc/GenId.hh` | Adds STMDownStreamGenTool enum value and corresponding string name. |
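The generate-mix module "mixes across multiple summary sources and particle ratios". One common way to realize such a mix is to split the requested number of particles among sources in proportion to a weight, as in the sketch below. `MixSource` and `drawSourceCounts` are hypothetical names, and how the module actually derives each weight (for example from the hit summary CSV and the configured ratios) is an assumption not shown here. Only `std::discrete_distribution` is a real library facility.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical description of one generation source: a trained model and its relative weight.
struct MixSource {
  std::string modelTag;  // illustrative label, e.g. a summary source / PDG combination
  double weight;         // relative rate of this source (must be >= 0)
};

// Decide how many of nParticles each source generates, proportional to its weight.
std::vector<std::size_t> drawSourceCounts(const std::vector<MixSource>& sources,
                                          std::size_t nParticles, std::mt19937_64& rng) {
  assert(!sources.empty());
  std::vector<double> weights;
  weights.reserve(sources.size());
  double total = 0.0;
  for (const auto& s : sources) {
    assert(s.weight >= 0.0);
    weights.push_back(s.weight);
    total += s.weight;
  }
  assert(total > 0.0);  // std::discrete_distribution needs a positive total weight
  std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
  std::vector<std::size_t> counts(sources.size(), 0);
  for (std::size_t i = 0; i < nParticles; ++i) ++counts[pick(rng)];
  return counts;
}
```

Drawing per-particle (rather than rounding fixed fractions) keeps the source composition fluctuating realistically from event to event. A deterministic split is the alternative when exact ratios matter more.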
☀️ The build tests passed at bbe08a4.
N.B. These results were obtained from a build of this Pull Request at bbe08a4 after being merged into the base branch at 2cc428c. For more information, please check the job page here.
Here is my review of PR #1798. This is a substantial PR (~2,967 additions) introducing a custom score-based diffusion model ML framework and four new art modules for virtual detector resampling. I've categorized the issues by severity.
🔴 Bugs / Correctness Issues
1.
| Severity | Count | Key Items |
|---|---|---|
| 🔴 Bug | 4 | Wrong HepLorentzVector arg order, uninitialized ttree, int truncation, RNG state pollution |
| 🟡 Robustness | 4 | No VDr validation, silent optimizer fallback, shuffle edge case, dead code |
| 🟢 Style | 6 | Duplicated transforms, member-as-local, magic numbers, inconsistent formatting |
The most critical fix needed is the HepLorentzVector constructor argument order — this will produce wrong physics results silently.
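The bug class is easy to see with a toy. CLHEP's four-component constructor is `HepLorentzVector(double x, double y, double z, double t)`, so the energy (time component) goes last; passing it first still compiles and silently yields a different four-vector. The sketch below uses a stand-in struct with the same argument order so it compiles without CLHEP. `FourMomentum` and `makeOnShell` are hypothetical, not code from the PR.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for CLHEP::HepLorentzVector: like its (x, y, z, t) constructor,
// the energy/time component is the LAST argument.
struct FourMomentum {
  double px, py, pz, e;
  FourMomentum(double x, double y, double z, double t) : px(x), py(y), pz(z), e(t) {}
  // Invariant mass; spacelike vectors are reported as 0 in this toy.
  double mass() const {
    const double m2 = e * e - (px * px + py * py + pz * pz);
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;
  }
};

// Build an on-shell four-momentum from a sampled 3-momentum and a particle mass.
FourMomentum makeOnShell(double px, double py, double pz, double mass) {
  assert(mass >= 0.0);
  const double energy = std::sqrt(px * px + py * py + pz * pz + mass * mass);
  return FourMomentum(px, py, pz, energy);  // energy last, never first
}
```

A quick invariant-mass check on generated particles (comparing `mass()` against the PDG mass) is a cheap guard that would catch a swapped argument order.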
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introduce VDResamplerTransforms (new header) and remove the legacy VDResamplerTransformDefaults; centralize forward/inverse sample transforms and numeric safety constants.

Harden ScoreBasedDiffusionModel by:
- adding an initializeRandomWeights constructor flag;
- using size_t for trainingSampleSize_;
- initializing layer buffers safely and guarding weight initialization;
- adding dimension checks in computeLoss;
- improving the gradient/Adam math and formatting;
- fixing the gradient-clipping early return;
- validating and reconstructing loaded network shapes (loadModel uses initializeRandomWeights=false);
- various style and robustness fixes.

Update the STMMC modules (Configure/GenerateFromModel/GenerateMix/Train) to use the new transforms, add runtime validation for VDr/VDz0, improve config parsing with warnings, initialize pointers, replace inlined transform code with calls into VDResampler, and make minor API/constructor fixes (HepLorentzVector usage).

Misc: remove unused typedefs, add small constants (kTwoStageTrainingHitThreshold) and includes, and make other small cleanups to improve stability and correctness.
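Two of the fixes above, gradient clipping and the Adam update, are standard enough to sketch. The block below shows global-L2-norm clipping and a textbook Adam step with bias correction. It is an illustration only: the names (`clipByGlobalNorm`, `AdamState`, `adamUpdate`) and default hyperparameters are assumptions, and the PR's implementation may clip per element or organize optimizer state differently. Note that the early return fires only when no scaling is needed; the caller still applies the optimizer step, which is the behavior an "early-return" fix should preserve.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scale all gradients by one common factor so their global L2 norm is at most maxNorm.
void clipByGlobalNorm(std::vector<double>& grads, double maxNorm) {
  assert(maxNorm > 0.0);
  double sq = 0.0;
  for (double g : grads) sq += g * g;
  const double norm = std::sqrt(sq);
  if (norm <= maxNorm) return;  // nothing to clip; the caller still applies the update
  const double scale = maxNorm / norm;
  for (double& g : grads) g *= scale;
}

// Minimal Adam state for a flat parameter vector (hypothetical, not the PR's class).
struct AdamState {
  double lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
  std::size_t step = 0;
  std::vector<double> m, v;  // first and second moment estimates
};

// One Adam step: update moment estimates, bias-correct them, then move the parameters.
void adamUpdate(std::vector<double>& params, const std::vector<double>& grads, AdamState& s) {
  assert(params.size() == grads.size());
  if (s.m.size() != params.size()) {
    s.m.assign(params.size(), 0.0);
    s.v.assign(params.size(), 0.0);
  }
  ++s.step;
  const double bc1 = 1.0 - std::pow(s.beta1, static_cast<double>(s.step));
  const double bc2 = 1.0 - std::pow(s.beta2, static_cast<double>(s.step));
  for (std::size_t i = 0; i < params.size(); ++i) {
    s.m[i] = s.beta1 * s.m[i] + (1.0 - s.beta1) * grads[i];
    s.v[i] = s.beta2 * s.v[i] + (1.0 - s.beta2) * grads[i] * grads[i];
    const double mHat = s.m[i] / bc1;
    const double vHat = s.v[i] / bc2;
    params[i] -= s.lr * mHat / (std::sqrt(vHat) + s.eps);
  }
}
```

Global-norm clipping preserves the gradient direction, whereas per-element clipping can change it; which one the PR uses is not stated here.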
@FNALbuild run build test
⌛ The following tests have been triggered for a774f0a: build (Build queue - API unavailable)
☀️ The build tests passed at a774f0a.
N.B. These results were obtained from a build of this Pull Request at a774f0a after being merged into the base branch at 1d1b9f0. For more information, please check the job page here.
brownd1978 left a comment
I can't evaluate this code technically. Since it's a pure addition, I think it's OK.
Sub-PR with only diffusion model resampler update, separated from PR #1794
Suggested changes from PR #1794 were applied. For response details, see PR #1794.
=================================
This pull request introduces a new score-based diffusion model machine learning tool, integrates it into the build system, and adds new modules for virtual detector resampling using this model. Additionally, it updates the generator ID enumeration to support a new generator and defines transformation defaults for the VDResampler. The changes are grouped below by theme.
Machine Learning Model Integration:
- Added the `ScoreBasedDiffusionModel` class for training and sampling using score-based diffusion models, including its header and implementation. This model supports configurable neural network architectures, optimizer and noise schedule selection, and model persistence. (`MachineLearningTools/inc/ScoreBasedDiffusionModel.hh`, `MachineLearningTools/CMakeLists.txt`, `MachineLearningTools/src/SConscript`) [1] [2] [3]
- (`MachineLearningTools/CMakeLists.txt`, `MachineLearningTools/src/SConscript`, `STMMC/src/SConscript`) [1] [2] [3]

Virtual Detector Resampling Modules:
- (`STMMC/CMakeLists.txt`)

Generator ID Updates:
- Updated the `GenId` enumeration and its string mapping to include a new generator, `STMDownStreamGenTool`. (`MCDataProducts/inc/GenId.hh`) [1] [2]

VDResampler Transformation Defaults:
- (`STMMC/inc/VDResamplerTransformDefaults.hh`)
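The transformation defaults pair a forward transform (applied to training samples) with an inverse (applied to generated samples), guarded by numeric safety constants. The sketch below shows one plausible shape for such a pair for a positive quantity like kinetic energy: a log with an epsilon floor, standardization, and a clamp on the way back. Which variables the VDResampler transforms, and with what constants, is not shown in this PR page, so `EnergyTransform` and its fields are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical forward/inverse transform pair for a non-negative quantity (e.g. kinetic energy).
struct EnergyTransform {
  double eps = 1e-6;   // numeric safety floor so log() never sees zero
  double mean = 0.0;   // mean of log(E + eps) over the training sample
  double scale = 1.0;  // spread of log(E + eps) over the training sample; must be > 0

  // Training direction: physical value -> standardized network input.
  double forward(double kineticEnergy) const {
    assert(scale > 0.0);
    const double e = std::max(kineticEnergy, 0.0);  // clamp unphysical negatives
    return (std::log(e + eps) - mean) / scale;
  }

  // Generation direction: network output -> physical value, clamped at zero because
  // sampled values can land slightly below the physical minimum.
  double inverse(double z) const {
    return std::max(std::exp(z * scale + mean) - eps, 0.0);
  }
};
```

Keeping both directions in one type, as the later commit does by centralizing the transforms, makes it hard for training and generation to drift apart; a round-trip check `inverse(forward(E)) == E` is a cheap unit test for that.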