Skip to content

BUG: Fix intermittent itkSPSAOptimizerTest failure#6015

Merged
dzenanz merged 1 commit intoInsightSoftwareConsortium:mainfrom
hjmjohnson:fix-spsa-optimizer-test-flakiness
Apr 7, 2026
Merged

BUG: Fix intermittent itkSPSAOptimizerTest failure#6015
dzenanz merged 1 commit intoInsightSoftwareConsortium:mainfrom
hjmjohnson:fix-spsa-optimizer-test-flakiness

Conversation

@hjmjohnson
Copy link
Copy Markdown
Member

Summary

Fix a flaky test (itkSPSAOptimizerTest) that fails approximately 1 in 625 CI
runs by seeding the global MersenneTwister RNG to a fixed value, making the test
fully deterministic.

Root Cause Analysis

The SPSAOptimizer uses the Simultaneous Perturbation Stochastic
Approximation
algorithm, which estimates gradients via random Bernoulli (±1)
perturbations drawn from the global MersenneTwisterRandomVariateGenerator
singleton (GetInstance()). This singleton is seeded from wall-clock time
(time() + clock()), so every test run produces a different random
perturbation sequence.

The optimizer's convergence check uses a StateOfConvergence heuristic — an
exponentially-decaying running average of a_k × |gradient|:

StateOfConvergence += a_k * GradientMagnitude    // AdvanceOneStep()
StateOfConvergence *= StateOfConvergenceDecayRate // ResumeOptimization() (0.5 in test)

This metric measures whether the gradient is small, NOT whether the
position is close to the optimum
. With certain unlucky random perturbation
sequences, several consecutive gradient estimates can be anomalously small at a
point that is still far from the solution. Because StateOfConvergenceDecayRate
is 0.5 (aggressive exponential decay), a few such iterations drive the metric
below the tolerance, and the optimizer falsely declares BelowTolerance
convergence — even though the solution is still outside the test's 0.01
acceptance window.

Diagnostic Evidence

Sanitizer results (all clean — no memory/threading/UB issues):

  • AddressSanitizer + UndefinedBehaviorSanitizer: zero errors
  • Valgrind memcheck (--leak-check=full --track-origins=yes): zero errors
  • ThreadSanitizer: no data races (single-threaded test)

Stress test (before fix) — 5,000 runs:

Run Failures Rate
Before fix 8 / 5,000 1 in 625 (~0.16%)

All 8 failures share the same pattern — premature BelowTolerance stop:

# Solution Error Iterations Stop Reason
1 (1.971, −1.977) 0.038 36 BelowTolerance
2 (2.025, −2.020) 0.032 52 BelowTolerance
3 (1.981, −1.985) 0.023 46 BelowTolerance
4 (2.012, −2.010) 0.016 54 BelowTolerance
5 (2.018, −2.014) 0.023 43 BelowTolerance
6 (2.029, −2.023) 0.037 47 BelowTolerance
7 (1.892, −1.914) 0.134 43 BelowTolerance
8 (1.941, −1.952) 0.076 46 BelowTolerance

Successful runs typically converge at iteration 67–100 with solution error < 0.001.

Fix

Seed the singleton RNG to 121212 (the MersenneTwisterRandomVariateGenerator::DefaultSeed)
at the start of the test. This makes the perturbation sequence deterministic
while preserving the stochastic nature of the algorithm being tested.

Stress test (after fix) — 5,000 runs:

Run Failures Rate
After fix 0 / 5,000 0%

Every run produces identical output: solution (1.99999, −1.99998), 100
iterations, well within the 0.01 tolerance.

AI Assistance

Claude Code (Opus) was used to:

  • Analyze the SPSAOptimizer source, MersenneTwisterRandomVariateGenerator
    implementation, and test code to identify the root cause
  • Build and run the test under ASan+UBSan, Valgrind, and TSan
  • Execute 5,000-iteration stress tests before and after the fix
  • Draft the one-line fix and this PR description

All analysis was reviewed and validated by the commit author.

Testing

# Verified compilation (Release, Debug, ASan+UBSan) — all clean
cmake --build cmake-build-Release --target ITKOptimizersTestDriver
cmake --build cmake-build-Debug   --target ITKOptimizersTestDriver
cmake --build cmake-build-asan    --target ITKOptimizersTestDriver

# ASan+UBSan run — zero errors
UBSAN_OPTIONS="print_stacktrace=1" ASAN_OPTIONS="detect_leaks=1" \
  cmake-build-asan/bin/ITKOptimizersTestDriver itkSPSAOptimizerTest

# Valgrind — zero errors
valgrind --tool=memcheck --leak-check=full --track-origins=yes \
  cmake-build-Release/bin/ITKOptimizersTestDriver itkSPSAOptimizerTest

# Determinism — 10 consecutive runs produce identical output
# Stress test — 0 failures in 5,000 runs (vs 8/5,000 before fix)

@github-actions github-actions bot added type:Bug Inconsistencies or issues which will cause an incorrect result under some or all circumstances type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:Numerics Issues affecting the Numerics module labels Apr 6, 2026
@hjmjohnson
Copy link
Copy Markdown
Member Author

Codebase-wide audit for the same bug pattern

I audited the entire ITK codebase for other tests that use the MersenneTwister singleton (GetInstance()) or call SetSeed() without a fixed value. The SPSA test was the only instance where this pattern could cause intermittent failures.

Tests with non-deterministic RNG but NOT vulnerable

These tests use unseeded or randomly-seeded RNG, but their pass/fail logic is invariant to the specific random data — correctness holds for any sequence:

Test Why it's safe
itkKdTreeTest1.cxx Verifies nearest-neighbor search matches brute force — correct for any data
itkKdTreeTest3.cxx Same as above
itkWeightedCentroidKdTreeGeneratorTest1.cxx Same pattern
itkWeightedCentroidKdTreeGeneratorTest8.cxx Same pattern
itkWeightedCentroidKdTreeGeneratorTest9.cxx Same pattern (uses New() not GetInstance())
itkMeanSquaresImageMetricTest.cxx Compares cached vs uncached derivatives on same parameters
itkMinimumMaximumImageFilterGTest.cxx Places min/max pixels at random positions, checks filter finds them
itkImageRandomIteratorTest.cxx Reads random positions, verifies stored values — correct for any position
itkImageRandomConstIteratorWithOnlyIndexTest.cxx Same pattern

Tests that already properly seed the RNG

Test Seed
itkStatisticsImageFilterTest.cxx 987
itkParticleSwarmOptimizerTest.cxx 8775070
itkInitializationBiasedParticleSwarmOptimizerTest.cxx 8775070
itkSimpleImageRegistrationTest.cxx Fixed seed
itkImageRandomConstIteratorWithIndexGTest.cxx Fixed seed
itkScalarConnectedComponentImageFilterTest.cxx 1031571
itkMaskConnectedComponentImageFilterTest.cxx 1031571

Production code using the singleton

The SPSAOptimizer was the only production class using GetInstance() that lacked a way for users to control the seed. ParticleSwarmOptimizerBase has a SetSeed()/UseSeed API, and ImageToImageMetric has m_ReseedIterator.

Why the SPSA test was uniquely vulnerable

The key difference is that SPSA's pass/fail depends on the optimizer converging to within 0.01 of the true solution, which in turn depends on the quality of stochastic gradient estimates. An unlucky perturbation sequence causes the StateOfConvergence heuristic to falsely declare convergence before the position is close enough. The other tests either verify structural properties (nearest-neighbor correctness, pixel value retrieval) that hold regardless of the random data, or already use fixed seeds.

@hjmjohnson hjmjohnson marked this pull request as ready for review April 6, 2026 22:51
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 6, 2026

Greptile Summary

This PR fixes an intermittent CI failure in itkSPSAOptimizerTest by seeding the global MersenneTwisterRandomVariateGenerator singleton to a fixed value before the test runs, eliminating the ~1-in-625 false-convergence failures caused by wall-clock-seeded random perturbation sequences. The one-line fix is minimal, correctly scoped to the test file, and confirmed by a 5,000-run stress test (0 failures after vs. 8 before).

Confidence Score: 5/5

Safe to merge; the fix is minimal, correct, and scoped entirely to the test file.

No P0 or P1 issues found. The only finding is a P2 style suggestion to use the named DefaultSeed constant instead of the magic number 121212. The fix correctly addresses a well-documented flaky test by seeding the singleton RNG before any stochastic operations, consistent with patterns already used elsewhere in the ITK test suite.

No files require special attention.

Important Files Changed

Filename Overview
Modules/Numerics/Optimizers/test/itkSPSAOptimizerTest.cxx Adds a fixed RNG seed before GuessParameters/StartOptimization to eliminate non-deterministic ~1-in-625 false-convergence failures; one minor style nit on using the magic number 121212 instead of the named constant DefaultSeed.

Sequence Diagram

sequenceDiagram
    participant T as itkSPSAOptimizerTest
    participant MT as MersenneTwister (singleton)
    participant OPT as SPSAOptimizer

    T->>MT: SetSeed(121212)
    Note over MT: RNG state is now deterministic
    T->>OPT: GuessParameters(50, 70.0)
    OPT->>MT: GetUniformVariate() [perturbation draws]
    T->>OPT: StartOptimization()
    loop Up to 100 iterations
        OPT->>MT: GetUniformVariate() [Bernoulli ±1 perturbations]
        OPT->>OPT: AdvanceOneStep()
        OPT->>OPT: Update StateOfConvergence
    end
    OPT-->>T: GetCurrentPosition()
    T->>T: Check |pos − trueParams| < 0.01
Loading

Reviews (1): Last reviewed commit: "BUG: Fix intermittent itkSPSAOptimizerTe..." | Re-trigger Greptile

Seed the global MersenneTwister singleton to a fixed value before
running the SPSA optimization, making the test fully deterministic.

The SPSAOptimizer uses stochastic gradient estimation via random
Bernoulli perturbations drawn from the global MersenneTwister
singleton, which is seeded from wall-clock time.  On rare occasions
(~1 in 625 runs), the random perturbation sequence produces several
consecutive near-zero gradient estimates while the optimizer is still
far from the solution.  This causes the exponentially-decaying
StateOfConvergence metric to drop below the tolerance threshold
prematurely, and the optimizer declares BelowTolerance convergence
at iterations 36-54 (instead of running to ~90+), leaving the
solution outside the test's 0.01 acceptance window.
@hjmjohnson hjmjohnson force-pushed the fix-spsa-optimizer-test-flakiness branch from 68b1b29 to 5913c54 Compare April 6, 2026 23:08
@blowekamp blowekamp self-requested a review April 7, 2026 00:41
@dzenanz dzenanz merged commit 214a020 into InsightSoftwareConsortium:main Apr 7, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Numerics Issues affecting the Numerics module type:Bug Inconsistencies or issues which will cause an incorrect result under some or all circumstances type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants