BUG: Fix intermittent itkSPSAOptimizerTest failure #6015
Codebase-wide audit for the same bug pattern
I audited the entire ITK codebase for other tests that use the MersenneTwister singleton.
Tests with non-deterministic RNG but NOT vulnerable
These tests use unseeded or randomly-seeded RNG, but their pass/fail logic is invariant to the specific random data; correctness holds for any sequence:
Tests that already properly seed the RNG
Production code using the singleton
Why the SPSA test was uniquely vulnerable
The key difference is that SPSA's pass/fail depends on the optimizer converging to within 0.01 of the true solution, which in turn depends on the quality of stochastic gradient estimates. An unlucky perturbation sequence causes the StateOfConvergence metric to drop below the tolerance prematurely.

| Filename | Overview |
|---|---|
| Modules/Numerics/Optimizers/test/itkSPSAOptimizerTest.cxx | Adds a fixed RNG seed before GuessParameters/StartOptimization to eliminate non-deterministic ~1-in-625 false-convergence failures; one minor style nit on using the magic number 121212 instead of the named constant DefaultSeed. |
Sequence Diagram
sequenceDiagram
participant T as itkSPSAOptimizerTest
participant MT as MersenneTwister (singleton)
participant OPT as SPSAOptimizer
T->>MT: SetSeed(121212)
Note over MT: RNG state is now deterministic
T->>OPT: GuessParameters(50, 70.0)
OPT->>MT: GetUniformVariate() [perturbation draws]
T->>OPT: StartOptimization()
loop Up to 100 iterations
OPT->>MT: GetUniformVariate() [Bernoulli ±1 perturbations]
OPT->>OPT: AdvanceOneStep()
OPT->>OPT: Update StateOfConvergence
end
OPT-->>T: GetCurrentPosition()
T->>T: Check |pos − trueParams| < 0.01
Seed the global MersenneTwister singleton to a fixed value before running the SPSA optimization, making the test fully deterministic. The SPSAOptimizer uses stochastic gradient estimation via random Bernoulli perturbations drawn from the global MersenneTwister singleton, which is seeded from wall-clock time. On rare occasions (~1 in 625 runs), the random perturbation sequence produces several consecutive near-zero gradient estimates while the optimizer is still far from the solution. This causes the exponentially-decaying StateOfConvergence metric to drop below the tolerance threshold prematurely, and the optimizer declares BelowTolerance convergence at iterations 36-54 (instead of running to ~90+), leaving the solution outside the test's 0.01 acceptance window.
Summary
Fix a flaky test (itkSPSAOptimizerTest) that fails approximately 1 in 625 CI runs by seeding the global MersenneTwister RNG to a fixed value, making the test fully deterministic.
Root Cause Analysis
The SPSAOptimizer uses the Simultaneous Perturbation Stochastic Approximation algorithm, which estimates gradients via random Bernoulli (±1) perturbations drawn from the global MersenneTwisterRandomVariateGenerator singleton (GetInstance()). This singleton is seeded from wall-clock time (time() + clock()), so every test run produces a different random perturbation sequence.
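To make the source of the randomness concrete, here is a minimal sketch of a textbook SPSA gradient estimate. It is not the ITK implementation: `EstimateGradient` is a hypothetical helper, `std::mt19937` stands in for the ITK singleton, and the perturbation size `c` is an illustrative parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical helper, not ITK code: one SPSA gradient estimate of f at x.
// Every parameter is perturbed at once by +c or -c (a Bernoulli +/-1 draw),
// and a single function difference is shared by all gradient components.
std::vector<double>
EstimateGradient(const std::function<double(const std::vector<double> &)> & f,
                 const std::vector<double> & x,
                 double c,
                 std::mt19937 & rng)
{
  assert(c > 0.0);
  std::bernoulli_distribution coin(0.5);
  std::vector<double> delta(x.size());
  std::vector<double> plus(x);
  std::vector<double> minus(x);
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    delta[i] = coin(rng) ? 1.0 : -1.0;
    plus[i] += c * delta[i];
    minus[i] -= c * delta[i];
  }
  const double diff = f(plus) - f(minus);
  std::vector<double> gradient(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    gradient[i] = diff / (2.0 * c * delta[i]);
  }
  return gradient;
}
```

For f(x, y) = x² + y² at (1, 2), the true first gradient component is 2, but this estimator returns either 6 or −2 for it depending on the drawn signs. Each individual estimate is noisy, so the sequence of estimates, and with it the convergence metric, depends on the RNG state.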
The optimizer's convergence check uses a StateOfConvergence heuristic: an exponentially-decaying running average of a_k × |gradient|.
This metric measures whether the gradient is small, NOT whether the position is close to the optimum. With certain unlucky random perturbation sequences, several consecutive gradient estimates can be anomalously small at a point that is still far from the solution. Because StateOfConvergenceDecayRate is 0.5 (aggressive exponential decay), a few such iterations drive the metric below the tolerance, and the optimizer falsely declares BelowTolerance convergence, even though the solution is still outside the test's 0.01 acceptance window.
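The effect of the decay rate can be illustrated with a small stand-alone sketch. It assumes the metric is updated as state = decayRate × state + a_k × |gradient| and checked after each update. `FirstIterationBelowTolerance` is a hypothetical helper, not ITK source, and the tolerance and input values in the note below are illustrative rather than taken from the test.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the stopping rule described above, not ITK source.
// weightedGradients[k] plays the role of a_k * |gradient| at iteration k + 1.
// Returns the 1-based iteration at which the decayed running metric first
// falls below `tolerance`, or 0 if that never happens.
int
FirstIterationBelowTolerance(const std::vector<double> & weightedGradients,
                             double decayRate,
                             double tolerance,
                             int minIterations)
{
  assert(decayRate >= 0.0 && decayRate < 1.0);
  double state = 0.0;
  int iteration = 0;
  for (const double w : weightedGradients)
  {
    state = decayRate * state + w; // older contributions shrink geometrically
    ++iteration;
    if (iteration >= minIterations && state < tolerance)
    {
      return iteration; // would be reported as BelowTolerance
    }
  }
  return 0;
}
```

With ten iterations of weight 1.0 followed by five of weight 0.001 and a tolerance of 0.2, a decay rate of 0.5 stops at iteration 14, after only four small estimates. A decay rate of 0.9 does not stop within the same 15 iterations, because the earlier large gradients still dominate the metric.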
Diagnostic Evidence
Sanitizer results (all clean; no memory/threading/UB issues):
Valgrind (--leak-check=full --track-origins=yes): zero errors
Stress test (before fix), 5,000 runs:
All 8 failures share the same pattern: a premature BelowTolerance stop.
Successful runs typically converge at iteration 67–100 with solution error < 0.001.
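The failure rate follows from the stress test numbers: 8 failures in 5,000 runs is 0.0016, or 1 in 625. The sketch below estimates how often such a rate would surface across a batch of runs, under the assumption (not verified here) that runs fail independently. `ProbabilityOfAnyFailure` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Probability of at least one failure in `runs` runs, assuming each run
// fails independently with probability `perRunFailureRate`.
double
ProbabilityOfAnyFailure(double perRunFailureRate, int runs)
{
  assert(perRunFailureRate >= 0.0 && perRunFailureRate <= 1.0);
  assert(runs >= 0);
  return 1.0 - std::pow(1.0 - perRunFailureRate, runs);
}
```

At a per-run rate of 8.0 / 5000, about 15% of 100-run batches would contain at least one spurious failure, which is consistent with the test being perceived as intermittently flaky.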
Fix
Seed the singleton RNG to 121212 (the MersenneTwisterRandomVariateGenerator::DefaultSeed) at the start of the test. This makes the perturbation sequence deterministic while preserving the stochastic nature of the algorithm being tested.
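In the test itself, the change amounts to a SetSeed(121212) call on the singleton returned by GetInstance(), placed before GuessParameters and StartOptimization. The stand-alone sketch below uses `std::mt19937` as a stand-in for the ITK generator to show why a fixed seed removes run-to-run variation. `DrawPerturbations` is a hypothetical helper, not part of ITK.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Stand-in for the optimizer's random draws: n Bernoulli (+1/-1) perturbation
// signs from a Mersenne Twister engine seeded with `seed`.
std::vector<int>
DrawPerturbations(std::uint32_t seed, int n)
{
  assert(n >= 0);
  std::mt19937 rng(seed);
  std::vector<int> signs;
  signs.reserve(static_cast<std::size_t>(n));
  for (int i = 0; i < n; ++i)
  {
    // Use the lowest bit of each raw draw to pick the sign.
    signs.push_back((rng() & 1u) ? 1 : -1);
  }
  return signs;
}
```

Two calls with the same seed return identical sign sequences, so any quantity derived from them, such as the iteration count and the final position, is reproducible. The draws remain pseudo-random, which is why the stochastic character of the algorithm is still exercised.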
Stress test (after fix), 5,000 runs:
Every run produces identical output: solution (1.99999, −1.99998), 100 iterations, well within the 0.01 tolerance.
AI Assistance
Claude Code (Opus) was used to:
SPSAOptimizer source, MersenneTwisterRandomVariateGenerator implementation, and test code to identify the root cause
All analysis was reviewed and validated by the commit author.
Testing