Ci/multiprocessing dataloader tests by kvmto · Pull Request #29 · NVIDIA/Ising-Decoding

kvmto · 2026-03-25T15:01:20Z

Summary

- Add test_dataloader_multiprocessing.py: verifies the Stim inference datapipe works with num_workers=2 and spawn multiprocessing (X, Z, mixed bases). CPU-only.
- Add multiprocessing-dataloader job to ci.yml to run the above on every push/PR.
- Add inference-only step to ci-gpu.yml gpu-tests that re-runs with PREDECODER_INFERENCE_NUM_WORKERS=2, exercising the full logical_error_rate.py pipeline with multi-worker loading.
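
With spawn multiprocessing, each DataLoader worker receives its copy of the dataset through pickling, so a pickle round-trip is a cheap first check before starting real workers. The sketch below illustrates that idea only. `ToySyndromeDataset` and `survives_pickle_round_trip` are hypothetical names, not the repository's datapipe or test code, and the samples are random bits rather than Stim output.

```python
import pickle
import random


class ToySyndromeDataset:
    """Hypothetical stand-in for an inference dataset (not the repository's class)."""

    def __init__(self, basis: str, num_samples: int, seed: int = 0):
        self.basis = basis
        self.num_samples = num_samples
        self.seed = seed

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, index: int) -> list:
        # Derive each sample from (basis, seed, index) only, so any process
        # holding a copy of the dataset returns the same bits for an index.
        rng = random.Random(f"{self.basis}:{self.seed}:{index}")
        return [rng.randint(0, 1) for _ in range(8)]


def survives_pickle_round_trip(dataset) -> bool:
    """Return True if a pickled copy yields the same samples as the original."""
    clone = pickle.loads(pickle.dumps(dataset))
    if len(clone) != len(dataset):
        return False
    return all(clone[i] == dataset[i] for i in range(len(dataset)))
```

A dataset that stores a lambda, an open file handle, or a similar unpicklable attribute would raise inside `pickle.dumps` here, which is the same failure a spawn worker would hit at startup.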

Production defaults use num_workers=4, but all CI and tests previously forced num_workers=0.
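
The PR does not show how PREDECODER_INFERENCE_NUM_WORKERS is consumed, so the following is only a minimal sketch of one way such an override could sit on top of the default of 4. `resolve_num_workers` and `DEFAULT_NUM_WORKERS` are hypothetical names introduced for illustration.

```python
import os

DEFAULT_NUM_WORKERS = 4  # production default quoted in the PR description


def resolve_num_workers(env=None, default: int = DEFAULT_NUM_WORKERS) -> int:
    """Return the worker count, letting an environment variable override the default.

    Assumption: the variable holds a non-negative integer. How the repository
    actually parses it is not shown in this PR.
    """
    env = os.environ if env is None else env
    raw = env.get("PREDECODER_INFERENCE_NUM_WORKERS")
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value < 0:
        raise ValueError("num_workers must be non-negative")
    return value
```

Passing the mapping explicitly, for example `resolve_num_workers({"PREDECODER_INFERENCE_NUM_WORKERS": "2"})`, keeps the function easy to test without mutating the process environment.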

Test plan

- New test passes locally (3 tests, ~9s)
- CPU CI multiprocessing-dataloader job passes
- GPU CI gpu-tests multi-worker inference step passes

copy-pr-bot · 2026-03-25T16:13:10Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Production defaults use num_workers=4 with spawn multiprocessing, but all CI jobs and tests forced num_workers=0. Add coverage for both layers:

- New test (test_dataloader_multiprocessing.py): verifies the Stim inference datapipe is pickle-safe and produces correct results with num_workers=2 across X, Z, and mixed bases. Runs on CPU in a dedicated ci.yml job.
- New ci-gpu.yml step: re-runs inference with PREDECODER_INFERENCE_NUM_WORKERS=2 after the existing smoke run, exercising the full logical_error_rate.py pipeline (multi-worker DataLoader → model forward → PyMatching → LER check).

Signed-off-by: kvmto <kmato@nvidia.com>

kvmto requested a review from ivanbasov March 25, 2026 15:03

ivanbasov approved these changes Mar 25, 2026

kvmto force-pushed the ci/multiprocessing-dataloader-tests branch from c015f05 to 2db609e Compare March 25, 2026 16:13

kvmto force-pushed the ci/multiprocessing-dataloader-tests branch from 2db609e to 2b014d3 Compare March 25, 2026 16:20

kvmto merged commit 1b49b69 into NVIDIA:main Mar 25, 2026
13 checks passed

kvmto mentioned this pull request Mar 31, 2026

fix(ler): force num_workers=0 when torch.compile is active to prevent segfault #31

Closed

2 tasks

bmhowe23 deleted the ci/multiprocessing-dataloader-tests branch March 31, 2026 17:32

kvmto merged 1 commit into NVIDIA:main from kvmto:ci/multiprocessing-dataloader-tests

2 participants
