fix(mid): seed BitMatrixSampler explicitly to restore test reproducibility by ivanbasov · Pull Request #43 · NVIDIA/Ising-Decoding

ivanbasov · 2026-04-06T15:52:23Z

Summary

torch.manual_seed() does not control cuQuantum's BitMatrixSampler internal RNG, so two mid-GPU tests that relied on it for cross-call reproducibility were failing non-deterministically.
Add an optional seed: int | None = None parameter to dem_sampling() and MemoryCircuitTorch.generate_batch(). When a seed is provided, a fresh BitMatrixSampler is always created with that seed passed directly to its constructor (BitMatrixSampler(..., seed=seed), not via Options), which resets its internal RNG and guarantees identical outputs for repeated calls with the same seed. Production paths (seed=None) are unaffected: the module-level cache is reused exactly as before.
Fix test_he_reduces_error_weight and test_full_pipeline_w2_reproducible to pass seed= explicitly instead of calling torch.manual_seed().

Root cause: Commit 5aeebdf removed the pure-torch fallback (which was controlled by torch.manual_seed()), making BitMatrixSampler the sole backend. The two mid tests were written when the torch path still existed and were never updated to account for cuST's independent RNG state.
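
The failure mode has a close analogue in the Python standard library: seeding the global generator does not touch a generator object that keeps its own state. The sketch below uses `random.seed()` and a private `random.Random` instance purely as stand-ins for `torch.manual_seed()` and the sampler's internal RNG. It does not call torch or cuQuantum, and the names are illustrative.

```python
from __future__ import annotations

import random


def draw_bits(rng: random.Random, n: int = 16) -> list[int]:
    """Draw n fair bits from the given generator."""
    return [rng.getrandbits(1) for _ in range(n)]


# A generator with private state, standing in for the sampler's internal RNG.
internal_rng = random.Random(2024)
state_before = internal_rng.getstate()

# Seed the global generator (the analogue of torch.manual_seed()).
random.seed(0)

# The private generator is untouched, so a global seed cannot reset it.
global_seed_had_effect = internal_rng.getstate() != state_before

first = draw_bits(internal_rng)
random.seed(0)                     # "re-seed" globally before the second call
second = draw_bits(internal_rng)   # continues the private stream, not a replay

# Only re-creating the generator with the same seed replays the stream.
replay = draw_bits(random.Random(2024))
```

Here `replay` matches `first`, while `second` is simply the next 16 bits of the private stream. This is why the fix passes a seed to the sampler itself rather than relying on a global seed.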

Test plan

Re-run the NVIDIA/Ising-Decoding CI run that failed: mid-gpu-tests / "HE compile tests". Both test_he_reduces_error_weight and test_full_pipeline_w2_reproducible should now pass.
Confirm all other mid-GPU tests (71 total) still pass.
Confirm no regression in other GPU/CPU test suites (sampler cache path unchanged when seed=None).

🤖 Generated with Claude Code

…fault
torch.compile=on combined with DataLoader spawn workers during LER validation causes a segfault (20 leaked semaphores, core dumped). Set PREDECODER_TORCH_COMPILE=0 for the Train all orientations step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…vent segfault"
This reverts commit 7f0f6c8.

…ility
torch.manual_seed() does not control cuQuantum's BitMatrixSampler internal RNG, so the two mid-GPU tests that relied on it for reproducibility were non-deterministic and intermittently failing.
Add an optional `seed` parameter to `dem_sampling()` and `MemoryCircuitTorch.generate_batch()`. When a seed is provided a fresh BitMatrixSampler is always created with `Options(seed=N)`, resetting its internal RNG and guaranteeing identical outputs on every call with the same seed. Production paths (seed=None) are unaffected: the cached sampler is reused as before.
Update the two failing tests to use the explicit seed kwarg instead of torch.manual_seed():
- test_he_reduces_error_weight: seed=123
- test_full_pipeline_w2_reproducible: seed=100
Fixes: NVIDIA/Ising-Decoding CI run 23963347042
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add TestDEMSamplingReproducibility to test_dem_sampling.py with four cases:
- same seed on CPU produces bit-exact identical frames
- different seeds produce different frames
- unseeded calls still reuse the cached sampler (perf regression guard)
- same seed on GPU produces bit-exact identical frames (GPU-only)
These tests use stochastic p values (0.1–0.9) so they would have caught the original regression: before the seed= fix, BitMatrixSampler's internal RNG was not reset between calls, making "same seed" reproducibility impossible regardless of torch.manual_seed().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… seedable
Options.__init__() does not accept a 'seed' keyword: the cuST BitMatrixSampler's internal RNG is not exposed via the public API.
Replace the attempted Options(seed=N) approach with a small pure-torch fallback (_torch_dem_sampling) that uses a local torch.Generator seeded to the requested value. This path is only taken when seed= is explicitly passed (tests); the production BitMatrixSampler cache path is unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

BitMatrixSampler accepts seed as a constructor kwarg (not via Options). Replace the torch fallback workaround with the correct cuST API: pass seed= directly to BitMatrixSampler(..., seed=seed).
A fresh sampler is created on every seeded call so its internal RNG is reset to the requested seed, guaranteeing identical outputs on repeated calls with the same value.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

danlkv · 2026-04-06T18:05:03Z

The FrameSimulator docs provide an example of the different usages of the seed arg; BitMatrixSampler works the same way.

ivanbasov · 2026-04-06T18:16:59Z

FrameSimulator docs provide an example on different usages of seed arg, the BitMatrixSampler works the same way.

Thank you, @danlkv! Fixed it the way you suggested.

…ility (#43)
* fix(ci): disable torch.compile in orientation training to prevent segfault
  torch.compile=on combined with DataLoader spawn workers during LER validation causes a segfault (20 leaked semaphores, core dumped). Set PREDECODER_TORCH_COMPILE=0 for the Train all orientations step.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Revert "fix(ci): disable torch.compile in orientation training to prevent segfault"
  This reverts commit 7f0f6c8.
* fix(mid): seed BitMatrixSampler explicitly to restore test reproducibility
  torch.manual_seed() does not control cuQuantum's BitMatrixSampler internal RNG, so the two mid-GPU tests that relied on it for reproducibility were non-deterministic and intermittently failing.
  Add an optional `seed` parameter to `dem_sampling()` and `MemoryCircuitTorch.generate_batch()`. When a seed is provided a fresh BitMatrixSampler is always created with `Options(seed=N)`, resetting its internal RNG and guaranteeing identical outputs on every call with the same seed. Production paths (seed=None) are unaffected: the cached sampler is reused as before.
  Update the two failing tests to use the explicit seed kwarg instead of torch.manual_seed():
  - test_he_reduces_error_weight: seed=123
  - test_full_pipeline_w2_reproducible: seed=100
  Fixes: NVIDIA/Ising-Decoding CI run 23963347042
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: fix yapf line-break position in need_new condition
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: add dem_sampling reproducibility tests for seed= parameter
  Add TestDEMSamplingReproducibility to test_dem_sampling.py with four cases:
  - same seed on CPU produces bit-exact identical frames
  - different seeds produce different frames
  - unseeded calls still reuse the cached sampler (perf regression guard)
  - same seed on GPU produces bit-exact identical frames (GPU-only)
  These tests use stochastic p values (0.1–0.9) so they would have caught the original regression: before the seed= fix, BitMatrixSampler's internal RNG was not reset between calls, making "same seed" reproducibility impossible regardless of torch.manual_seed().
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: use torch.Generator for seeded path; BitMatrixSampler RNG is not seedable
  Options.__init__() does not accept a 'seed' keyword: the cuST BitMatrixSampler's internal RNG is not exposed via the public API.
  Replace the attempted Options(seed=N) approach with a small pure-torch fallback (_torch_dem_sampling) that uses a local torch.Generator seeded to the requested value. This path is only taken when seed= is explicitly passed (tests); the production BitMatrixSampler cache path is unchanged.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: pass seed directly to BitMatrixSampler constructor
  BitMatrixSampler accepts seed as a constructor kwarg (not via Options). Replace the torch fallback workaround with the correct cuST API: pass seed= directly to BitMatrixSampler(..., seed=seed).
  A fresh sampler is created on every seeded call so its internal RNG is reset to the requested seed, guaranteeing identical outputs on repeated calls with the same value.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

ivanbasov and others added 6 commits March 30, 2026 11:54

Revert "fix(ci): disable torch.compile in orientation training to pre…

9d3fa08

…vent segfault" This reverts commit 7f0f6c8.

Merge remote-tracking branch 'upstream/main'

838d14f

style: fix yapf line-break position in need_new condition

637d062

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ivanbasov requested review from bmhowe23 and kvmto April 6, 2026 15:58

ivanbasov and others added 2 commits April 6, 2026 09:49

bmhowe23 approved these changes Apr 6, 2026

ivanbasov merged commit d09beb7 into NVIDIA:main Apr 7, 2026
17 checks passed

ivanbasov deleted the worktree-mid-running branch April 7, 2026 17:41

ivanbasov merged 8 commits into NVIDIA:main from ivanbasov:worktree-mid-running

3 participants
