BUG: Fix intermittent itkSPSAOptimizerTest failure by hjmjohnson · Pull Request #6015 · InsightSoftwareConsortium/ITK

hjmjohnson · 2026-04-06T22:42:28Z

Summary

Fix a flaky test (itkSPSAOptimizerTest) that fails approximately 1 in 625 CI
runs by seeding the global MersenneTwister RNG to a fixed value, making the test
fully deterministic.

Root Cause Analysis

The SPSAOptimizer uses the Simultaneous Perturbation Stochastic
Approximation algorithm, which estimates gradients via random Bernoulli (±1)
perturbations drawn from the global MersenneTwisterRandomVariateGenerator
singleton (GetInstance()). This singleton is seeded from wall-clock time
(time() + clock()), so every test run produces a different random
perturbation sequence.

The optimizer's convergence check uses a StateOfConvergence heuristic — an
exponentially-decaying running average of a_k × |gradient|:

StateOfConvergence += a_k * GradientMagnitude    // AdvanceOneStep()
StateOfConvergence *= StateOfConvergenceDecayRate // ResumeOptimization() (0.5 in test)

This metric measures whether the gradient is small, NOT whether the
position is close to the optimum. With certain unlucky random perturbation
sequences, several consecutive gradient estimates can be anomalously small at a
point that is still far from the solution. Because StateOfConvergenceDecayRate
is 0.5 (aggressive exponential decay), a few such iterations drive the metric
below the tolerance, and the optimizer falsely declares BelowTolerance
convergence — even though the solution is still outside the test's 0.01
acceptance window.

Diagnostic Evidence

Sanitizer results (all clean — no memory/threading/UB issues):

AddressSanitizer + UndefinedBehaviorSanitizer: zero errors
Valgrind memcheck (--leak-check=full --track-origins=yes): zero errors
ThreadSanitizer: no data races (single-threaded test)

Stress test (before fix) — 5,000 runs:

Run	Failures	Rate
Before fix	8 / 5,000	1 in 625 (~0.16%)

All 8 failures share the same pattern — premature BelowTolerance stop:

#	Solution	Error	Iterations	Stop Reason
1	(1.971, −1.977)	0.038	36	BelowTolerance
2	(2.025, −2.020)	0.032	52	BelowTolerance
3	(1.981, −1.985)	0.023	46	BelowTolerance
4	(2.012, −2.010)	0.016	54	BelowTolerance
5	(2.018, −2.014)	0.023	43	BelowTolerance
6	(2.029, −2.023)	0.037	47	BelowTolerance
7	(1.892, −1.914)	0.134	43	BelowTolerance
8	(1.941, −1.952)	0.076	46	BelowTolerance

Successful runs typically converge at iteration 67–100 with solution error < 0.001.

Fix

Seed the singleton RNG to 121212 (the MersenneTwisterRandomVariateGenerator::DefaultSeed)
at the start of the test. This makes the perturbation sequence deterministic
while preserving the stochastic nature of the algorithm being tested.

Stress test (after fix) — 5,000 runs:

Run	Failures	Rate
After fix	0 / 5,000	0%

Every run produces identical output: solution (1.99999, −1.99998), 100
iterations, well within the 0.01 tolerance.

AI Assistance

Claude Code (Opus) was used to:

Analyze the SPSAOptimizer source, MersenneTwisterRandomVariateGenerator
implementation, and test code to identify the root cause
Build and run the test under ASan+UBSan, Valgrind, and TSan
Execute 5,000-iteration stress tests before and after the fix
Draft the one-line fix and this PR description

All analysis was reviewed and validated by the commit author.

Testing

# Verified compilation (Release, Debug, ASan+UBSan) — all clean
cmake --build cmake-build-Release --target ITKOptimizersTestDriver
cmake --build cmake-build-Debug   --target ITKOptimizersTestDriver
cmake --build cmake-build-asan    --target ITKOptimizersTestDriver

# ASan+UBSan run — zero errors
UBSAN_OPTIONS="print_stacktrace=1" ASAN_OPTIONS="detect_leaks=1" \
  cmake-build-asan/bin/ITKOptimizersTestDriver itkSPSAOptimizerTest

# Valgrind — zero errors
valgrind --tool=memcheck --leak-check=full --track-origins=yes \
  cmake-build-Release/bin/ITKOptimizersTestDriver itkSPSAOptimizerTest

# Determinism — 10 consecutive runs produce identical output
# Stress test — 0 failures in 5,000 runs (vs 8/5,000 before fix)

hjmjohnson · 2026-04-06T22:49:28Z

Codebase-wide audit for the same bug pattern

I audited the entire ITK codebase for other tests that use the MersenneTwister singleton (GetInstance()) or call SetSeed() without a fixed value. The SPSA test was the only instance where this pattern could cause intermittent failures.

Tests with non-deterministic RNG but NOT vulnerable

These tests use unseeded or randomly-seeded RNG, but their pass/fail logic is invariant to the specific random data — correctness holds for any sequence:

Test	Why it's safe
`itkKdTreeTest1.cxx`	Verifies nearest-neighbor search matches brute force — correct for any data
`itkKdTreeTest3.cxx`	Same as above
`itkWeightedCentroidKdTreeGeneratorTest1.cxx`	Same pattern
`itkWeightedCentroidKdTreeGeneratorTest8.cxx`	Same pattern
`itkWeightedCentroidKdTreeGeneratorTest9.cxx`	Same pattern (uses `New()` not `GetInstance()`)
`itkMeanSquaresImageMetricTest.cxx`	Compares cached vs uncached derivatives on same parameters
`itkMinimumMaximumImageFilterGTest.cxx`	Places min/max pixels at random positions, checks filter finds them
`itkImageRandomIteratorTest.cxx`	Reads random positions, verifies stored values — correct for any position
`itkImageRandomConstIteratorWithOnlyIndexTest.cxx`	Same pattern

Tests that already properly seed the RNG

Test	Seed
`itkStatisticsImageFilterTest.cxx`	987
`itkParticleSwarmOptimizerTest.cxx`	8775070
`itkInitializationBiasedParticleSwarmOptimizerTest.cxx`	8775070
`itkSimpleImageRegistrationTest.cxx`	Fixed seed
`itkImageRandomConstIteratorWithIndexGTest.cxx`	Fixed seed
`itkScalarConnectedComponentImageFilterTest.cxx`	1031571
`itkMaskConnectedComponentImageFilterTest.cxx`	1031571

Production code using the singleton

The SPSAOptimizer was the only production class using GetInstance() that lacked a way for users to control the seed. ParticleSwarmOptimizerBase has a SetSeed()/UseSeed API, and ImageToImageMetric has m_ReseedIterator.

Why the SPSA test was uniquely vulnerable

The key difference is that SPSA's pass/fail depends on the optimizer converging to within 0.01 of the true solution, which in turn depends on the quality of stochastic gradient estimates. An unlucky perturbation sequence causes the StateOfConvergence heuristic to falsely declare convergence before the position is close enough. The other tests either verify structural properties (nearest-neighbor correctness, pixel value retrieval) that hold regardless of the random data, or already use fixed seeds.

greptile-apps · 2026-04-06T22:55:45Z

Greptile Summary

This PR fixes an intermittent CI failure in itkSPSAOptimizerTest by seeding the global MersenneTwisterRandomVariateGenerator singleton to a fixed value before the test runs, eliminating the ~1-in-625 false-convergence failures caused by wall-clock-seeded random perturbation sequences. The one-line fix is minimal, correctly scoped to the test file, and confirmed by a 5,000-run stress test (0 failures after vs. 8 before).

Confidence Score: 5/5

Safe to merge; the fix is minimal, correct, and scoped entirely to the test file.

No P0 or P1 issues found. The only finding is a P2 style suggestion to use the named DefaultSeed constant instead of the magic number 121212. The fix correctly addresses a well-documented flaky test by seeding the singleton RNG before any stochastic operations, consistent with patterns already used elsewhere in the ITK test suite.

No files require special attention.

Important Files Changed

Filename	Overview
Modules/Numerics/Optimizers/test/itkSPSAOptimizerTest.cxx	Adds a fixed RNG seed before GuessParameters/StartOptimization to eliminate non-deterministic ~1-in-625 false-convergence failures; one minor style nit on using the magic number 121212 instead of the named constant DefaultSeed.

Sequence Diagram

sequenceDiagram
    participant T as itkSPSAOptimizerTest
    participant MT as MersenneTwister (singleton)
    participant OPT as SPSAOptimizer

    T->>MT: SetSeed(121212)
    Note over MT: RNG state is now deterministic
    T->>OPT: GuessParameters(50, 70.0)
    OPT->>MT: GetUniformVariate() [perturbation draws]
    T->>OPT: StartOptimization()
    loop Up to 100 iterations
        OPT->>MT: GetUniformVariate() [Bernoulli ±1 perturbations]
        OPT->>OPT: AdvanceOneStep()
        OPT->>OPT: Update StateOfConvergence
    end
    OPT-->>T: GetCurrentPosition()
    T->>T: Check |pos − trueParams| < 0.01

_{Reviews (1): Last reviewed commit: "BUG: Fix intermittent itkSPSAOptimizerTe..." | Re-trigger Greptile}

Modules/Numerics/Optimizers/test/itkSPSAOptimizerTest.cxx

Seed the global MersenneTwister singleton to a fixed value before running the SPSA optimization, making the test fully deterministic. The SPSAOptimizer uses stochastic gradient estimation via random Bernoulli perturbations drawn from the global MersenneTwister singleton, which is seeded from wall-clock time. On rare occasions (~1 in 625 runs), the random perturbation sequence produces several consecutive near-zero gradient estimates while the optimizer is still far from the solution. This causes the exponentially-decaying StateOfConvergence metric to drop below the tolerance threshold prematurely, and the optimizer declares BelowTolerance convergence at iterations 36-54 (instead of running to ~90+), leaving the solution outside the test's 0.01 acceptance window.

hjmjohnson marked this pull request as ready for review April 6, 2026 22:51

hjmjohnson mentioned this pull request Apr 6, 2026

ENH: Auto-cleanup BigIO test outputs and enforce serial execution #6013

Merged

4 tasks

greptile-apps bot reviewed Apr 6, 2026

View reviewed changes

Modules/Numerics/Optimizers/test/itkSPSAOptimizerTest.cxx Outdated Show resolved Hide resolved

hjmjohnson force-pushed the fix-spsa-optimizer-test-flakiness branch from 68b1b29 to 5913c54 Compare April 6, 2026 23:08

blowekamp self-requested a review April 7, 2026 00:41

blowekamp approved these changes Apr 7, 2026

View reviewed changes

dzenanz approved these changes Apr 7, 2026

View reviewed changes

dzenanz merged commit 214a020 into InsightSoftwareConsortium:main Apr 7, 2026
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

BUG: Fix intermittent itkSPSAOptimizerTest failure#6015

BUG: Fix intermittent itkSPSAOptimizerTest failure#6015
dzenanz merged 1 commit intoInsightSoftwareConsortium:mainfrom
hjmjohnson:fix-spsa-optimizer-test-flakiness

hjmjohnson commented Apr 6, 2026

Uh oh!

hjmjohnson commented Apr 6, 2026

Uh oh!

greptile-apps bot commented Apr 6, 2026

Greptile Summary

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

hjmjohnson commented Apr 6, 2026

Summary

Root Cause Analysis

Diagnostic Evidence

Fix

AI Assistance

Testing

Uh oh!

hjmjohnson commented Apr 6, 2026

Codebase-wide audit for the same bug pattern

Tests with non-deterministic RNG but NOT vulnerable

Tests that already properly seed the RNG

Production code using the singleton

Why the SPSA test was uniquely vulnerable

Uh oh!

greptile-apps bot commented Apr 6, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants