ENH: Add consistency (should fail) to force test failure in ParallelSparseFieldLevelSet deadlock #6320
| Filename | Overview |
|---|---|
| Modules/Segmentation/LevelSets/test/itkParallelSparseFieldLevelSetImageFilterRobustnessTest.cxx | New 337-line regression test for the deadlock. It has a stale "four pipelines" count in the file-level comment (the code uses 8) and a 23-line comment block that violates the prose-budget cap, and it uses the legacy test pattern rather than GoogleTest. |
| Modules/Segmentation/LevelSets/test/CMakeLists.txt | Registers the new test source and adds an itk_add_test entry with a 30-second CTest TIMEOUT; straightforward and correct. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Test as RobustnessTest (main)
    participant S1 as Scenario 1 (sweep x repeat)
    participant S2 as Scenario 2 (determinism)
    participant S3 as Scenario 3 (concurrent)
    participant F as MorphFilter::Update()
    participant PA as ParallelizeArray
    participant WU as WorkUnit (thread)
    Test->>S1: 20 reps x 7 work-unit counts
    S1->>F: run_one(wu, 30)
    F->>PA: dispatch N work units
    PA->>WU: SignalNeighborsAndWait() blocks on CV
    Note over PA,WU: Bug: pool size M < N, all workers blocked, deadlock
    WU-->>F: phase complete (if fix applied)
    F-->>S1: output image
    S1->>Test: isfinite + range check
    Test->>S2: "3 independent runs at workUnits=11"
    S2->>F: run_one(11, 100) x3
    F-->>S2: output image
    S2->>Test: bit-identical comparison
    Test->>S3: 6 reps x 8 async pipelines
    par 8 concurrent pipelines
        S3->>F: run_one(11, 60) via std::async
        F->>PA: "8x11=88 work units contend same pool"
        PA-->>F: all units complete (if fix applied)
    end
    F-->>S3: 8 output images
    S3->>Test: all outputs IsNotNull
```
Reviews (1): Last reviewed commit: "ENH: Add robustness test exposing Parall..."
5738416 to b89cbc4
ParallelSparseFieldLevelSetImageFilter::Iterate() dispatches its
per-iteration ThreadedApplyUpdate body via mt->ParallelizeArray().
The body invokes SignalNeighborsAndWait() several times, blocking on
per-thread std::condition_variables until both neighbor work units
have arrived at the same phase. This neighbor-only protocol assumes
every work unit holds its own concurrent OS thread for the full
duration of the dispatch. Pool/TBB-backed ParallelizeArray does not
honor this: N work units are submitted to a shared worker pool whose
size is bounded by core count, and CV-blocked tasks pin the pool so
queued work units never start and the dispatch deadlocks.
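The failure mode can be reproduced outside ITK with a small stand-alone model. This is a sketch under simplifying assumptions, not ITK code: a single shared condition variable instead of ITK's per-thread ones, work units arranged in a ring, one barrier phase, and a hypothetical fixed-size pool; RunNeighborBarrier is an illustrative name. A timed wait replaces the indefinite wait so that starvation shows up as a false return instead of a hang.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model, not ITK code: numUnits work units run one barrier
// phase on a pool of poolSize worker threads. Each unit marks itself as
// arrived, then waits until both ring neighbors have arrived. Returns false
// if any unit gave up after `timeout`; with an untimed wait that unit
// would have blocked forever.
bool RunNeighborBarrier(unsigned numUnits, unsigned poolSize,
                        std::chrono::milliseconds timeout)
{
  std::mutex mutex;
  std::condition_variable cv; // one shared CV for brevity; ITK uses per-thread CVs
  std::vector<bool> arrived(numUnits, false);
  bool timedOut = false; // guarded by `mutex`

  auto workUnit = [&](unsigned i) {
    const unsigned left = (i + numUnits - 1) % numUnits;
    const unsigned right = (i + 1) % numUnits;
    std::unique_lock<std::mutex> lock(mutex);
    arrived[i] = true; // signal: this unit reached the phase
    cv.notify_all();
    // wait: block until both neighbors reached the same phase, or give up
    if (!cv.wait_for(lock, timeout, [&] { return arrived[left] && arrived[right]; }))
    {
      timedOut = true;
    }
  };

  // Bounded pool: a worker runs each claimed work unit to completion before
  // claiming the next one, so a blocked unit pins its worker.
  std::atomic<unsigned> next{0};
  std::vector<std::thread> workers;
  const unsigned numWorkers = std::max(1u, poolSize);
  for (unsigned w = 0; w < numWorkers; ++w)
  {
    workers.emplace_back([&] {
      while (true)
      {
        const unsigned i = next.fetch_add(1);
        if (i >= numUnits)
        {
          break;
        }
        workUnit(i);
      }
    });
  }
  for (auto & worker : workers)
  {
    worker.join(); // join makes timedOut safe to read below
  }
  return !timedOut;
}
```

Under these assumptions, RunNeighborBarrier(4, 4, std::chrono::milliseconds(1000)) completes because every unit gets its own worker, while RunNeighborBarrier(4, 2, std::chrono::milliseconds(100)) returns false: both workers are pinned by units waiting on a neighbor that no free worker can start.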
Reported as the intermittent timeout of
itkParallelSparseFieldLevelSetImageFilterTest in CDash test 2570265561
("Trying mf->Update()" followed by a 1000s timeout) on Windows
Release with TBB.
The existing test fails in ~3% of runs under TBB, which is too rare
to catch the bug reliably in CI. This commit adds a new test
itkParallelSparseFieldLevelSetImageFilterRobustnessTest that drives
the deadlock to a high per-test-invocation probability:
Scenario 1 (sweep x repeat): cycle 1/2/4/8/11/16/32 work-unit
counts for 20 outer iterations, totalling 140 short filter runs.
The per-run probability of a pool-starvation deadlock is low, but
repetition amplifies it to a ~95% hit rate per test invocation.
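The amplification can be made concrete under a simplifying assumption: the 140 runs deadlock independently with the same probability p (in practice the odds likely vary with the work-unit count). The chance that at least one run deadlocks is then 1 - (1 - p)^140. The two hypothetical helpers below compute that relation and its inverse; for a 95% hit rate over 140 runs, the inverse gives p ≈ 2.1% per run.

```cpp
#include <cmath>

// Probability that at least one of `runs` independent trials fails,
// given a per-trial failure probability `perRun`.
double DetectionProbability(double perRun, int runs)
{
  return 1.0 - std::pow(1.0 - perRun, runs);
}

// Inverse: the per-trial failure probability needed so that `runs`
// independent trials catch the failure with probability `target`.
double RequiredPerRunProbability(double target, int runs)
{
  return 1.0 - std::pow(1.0 - target, 1.0 / runs);
}
```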
Scenario 2 (determinism): three repeated runs at workUnits=11 must
produce bit-identical output. Guards against future barrier
rewrites that introduce numerical non-determinism.
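Bit-identical is a stricter check than element-wise == on floating-point pixels: == treats -0.0f and 0.0f as equal and never matches a NaN. A minimal sketch of a byte-level comparison, assuming the two outputs have already been copied into std::vector&lt;float&gt; buffers (BitIdentical is a hypothetical helper, not the test's actual code):

```cpp
#include <cstring>
#include <vector>

// True only if both buffers have the same length and the same bit pattern,
// so -0.0f vs 0.0f counts as a difference and identical NaN bits match.
bool BitIdentical(const std::vector<float> & a, const std::vector<float> & b)
{
  // memcmp must not be handed a possibly-null data() pointer, even with a
  // zero length, so empty buffers are handled separately.
  return a.size() == b.size() &&
         (a.empty() || std::memcmp(a.data(), b.data(), a.size() * sizeof(float)) == 0);
}
```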
Scenario 3 (concurrent multi-pipeline): eight pipelines run
simultaneously via std::async (88 work units competing for a
core-count-bounded worker pool). Cross-pipeline contention
directly probes the dispatch-starvation deadlock path.
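The concurrent scenario can be sketched with std::async plus a shared deadline on every future, so that a starved pipeline is reported instead of silently hanging. RunPipelinesWithDeadline and the pipeline callable are hypothetical stand-ins for the test's run_one(11, 60) calls. One design constraint shapes the sketch: a future returned by std::async blocks in its destructor until its task finishes, so on a missed deadline the only way out of a truly hung pipeline is to abort the process rather than return.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <future>
#include <vector>

// Hypothetical sketch: launch `count` pipelines concurrently and require each
// one to finish before `deadline`. `pipeline` stands in for one filter run
// and returns true if its output is valid (e.g. non-null).
bool RunPipelinesWithDeadline(unsigned count, const std::function<bool()> & pipeline,
                              std::chrono::seconds deadline)
{
  std::vector<std::future<bool>> futures;
  for (unsigned i = 0; i < count; ++i)
  {
    // std::launch::async runs each pipeline on its own thread, so the
    // pipelines genuinely execute at the same time.
    futures.push_back(std::async(std::launch::async, pipeline));
  }

  const auto stopAt = std::chrono::steady_clock::now() + deadline;
  for (auto & f : futures)
  {
    if (f.wait_until(stopAt) != std::future_status::ready)
    {
      // A hung task would also block this future's destructor, so
      // returning is not an option; fail loudly instead.
      std::fprintf(stderr, "pipeline missed the deadline; aborting\n");
      std::abort();
    }
  }

  bool allValid = true;
  for (auto & f : futures)
  {
    allValid = f.get() && allValid;
  }
  return allValid;
}
```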
Local measurements on a 16-core x86 box with TBB:
- Upstream code, 30 invocations: 30 timeouts (100% deadlock).
- Upstream code, POOL/PLATFORM: pass (no worker shortage possible).
- Test runtime with working synchronization: ~5 s wall-clock.
- CTest TIMEOUT set to 30 s (6x leeway over the expected runtime).
This commit intentionally introduces a failing test on master. The
proposed fix lands in a follow-up PR.
b89cbc4 to bf38f7e
This commit reproduces the CI evidence (HEAD bf38f7e)
Every failing job reports
"Subprocess aborted" is the test's internal watchdog calling
Why it reproduces on some platforms and not others
The deadlock requires the TBB worker-shortage path: when
bf38f7e into InsightSoftwareConsortium:main
Adds a deterministic-failure regression test for the intermittent timeout of itkParallelSparseFieldLevelSetImageFilterTest (e.g. CDash test 2570265561). This test intentionally fails on main: it is the first half of a two-PR sequence that demonstrates the bug here, then fixes it in a follow-up PR.
ParallelSparseFieldLevelSetImageFilter::Iterate() dispatches its inner ThreadedApplyUpdate body via mt->ParallelizeArray(). The body invokes SignalNeighborsAndWait() several times, blocking on per-thread std::condition_variables until both neighbor work units have arrived at the same phase. This neighbor-only synchronization requires every work unit to hold its own concurrent OS thread for the full duration of the dispatch. Pool/TBB-backed ParallelizeArray does not honor this: N work units are submitted to a shared worker pool whose size is bounded by core count, and CV-blocked tasks pin the pool, so queued work units never start and the dispatch deadlocks. The existing baseline test fails in ~3% of runs under TBB on a 16-core box, far too rarely to catch the bug reliably.
Test scenarios
itkParallelSparseFieldLevelSetImageFilterRobustnessTest:
- Sweep × repeat. Cycle the work-unit counts {1, 2, 4, 8, 11, 16, 32} for 20 outer iterations (140 short filter runs). Each filter run has a low deadlock probability; cycling amplifies it to ~95% per test invocation while keeping total runtime under ~10 s with working synchronization. Output must be finite and within a plausible range at every count.
- Determinism. Three repeated runs at workUnits = 11 must produce bit-identical output. Guards future barrier rewrites against introducing numerical non-determinism.
- Concurrent multi-pipeline. Eight pipelines run simultaneously via std::async (8 × 11 = 88 work units competing for a single core-count-bounded worker pool). Cross-pipeline worker contention directly probes the dispatch-starvation deadlock.
Measured failure rate
Local soak on a 16-core x86 box with TBB:
CTest TIMEOUT set to 30 s: 6× leeway over the expected runtime on the local box, which accommodates slower CI hardware while still flagging the deadlock within a reasonable bound.
How to verify locally