
Ci/multiprocessing dataloader tests #29

Merged
kvmto merged 1 commit into NVIDIA:main from kvmto:ci/multiprocessing-dataloader-tests on Mar 25, 2026


@kvmto (Collaborator) commented Mar 25, 2026

Summary

  • Add test_dataloader_multiprocessing.py: verifies that the Stim inference datapipe works with num_workers=2 and spawn multiprocessing across X, Z, and mixed bases. CPU-only.
  • Add a multiprocessing-dataloader job to ci.yml that runs this test on every push and PR.
  • Add an inference-only step to the gpu-tests job in ci-gpu.yml that re-runs inference with PREDECODER_INFERENCE_NUM_WORKERS=2, exercising the full logical_error_rate.py pipeline with multi-worker loading.
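The core property the new test checks, that multi-worker loading returns the same samples as num_workers=0, can be sketched without PyTorch or Stim. This is a minimal illustration under stated assumptions, not the code in test_dataloader_multiprocessing.py: `ToySyndromeDataset`, `load_serial`, and `load_sharded` are hypothetical names, random bits stand in for Stim detector samples, and a simple strided index split stands in for the DataLoader's real batch scheduling. The pickle round-trip mimics what the spawn start method does when it ships the dataset to each worker process.

```python
import pickle
import random
from dataclasses import dataclass


@dataclass
class ToySyndromeDataset:
    """Hypothetical stand-in for the Stim inference datapipe (not the real class).

    Each sample depends only on (base_seed, basis, index), so the result is the
    same no matter which worker process produces it.
    """

    base_seed: int
    basis: str  # "X", "Z", or "mixed", mirroring the bases named in the PR
    num_samples: int
    num_detectors: int = 8

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, index: int) -> tuple:
        # Per-index RNG: deterministic regardless of worker assignment.
        rng = random.Random(f"{self.base_seed}:{self.basis}:{index}")
        return tuple(rng.randint(0, 1) for _ in range(self.num_detectors))


def load_serial(ds) -> list:
    """Analogue of num_workers=0: the main process reads every index in order."""
    return [ds[i] for i in range(len(ds))]


def load_sharded(ds, num_workers: int) -> list:
    """Simplified analogue of num_workers>0 under spawn.

    Each simulated worker gets its own pickled copy of the dataset and a
    strided subset of indices; results are reassembled in index order.
    """
    results = {}
    for worker_id in range(num_workers):
        worker_ds = pickle.loads(pickle.dumps(ds))  # spawn pickles the dataset
        for i in range(worker_id, len(worker_ds), num_workers):
            results[i] = worker_ds[i]
    return [results[i] for i in range(len(ds))]


for basis in ("X", "Z", "mixed"):
    ds = ToySyndromeDataset(base_seed=1234, basis=basis, num_samples=10)
    assert load_sharded(ds, num_workers=2) == load_serial(ds), basis
```

Deriving each sample's RNG from its index is what makes the comparison pass. A dataset that instead stored one RNG object on itself would hand every spawned worker an identical copy of that generator state (the DataLoader reseeds global RNGs per worker, not generators owned by the dataset), so workers would emit duplicate samples. That is the kind of bug a num_workers=2 test can catch and a num_workers=0 run cannot.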

Production defaults use num_workers=4 with spawn multiprocessing, but all CI jobs and tests previously forced num_workers=0, so the multi-worker loading path was never exercised.
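One way the production default and the CI override could fit together is sketched below. It assumes, without confirmation from this PR, that PREDECODER_INFERENCE_NUM_WORKERS is read as an integer override of the default of 4; `resolve_num_workers` and `dataloader_kwargs` are hypothetical helpers. The keyword names they produce (`num_workers`, `multiprocessing_context`) are real `torch.utils.data.DataLoader` parameters.

```python
import os

DEFAULT_NUM_WORKERS = 4  # production default stated in the PR description
ENV_VAR = "PREDECODER_INFERENCE_NUM_WORKERS"


def resolve_num_workers(environ=None) -> int:
    """Return the DataLoader worker count, letting the environment override it.

    Assumption: the real pipeline treats ENV_VAR as an integer override; how
    logical_error_rate.py actually reads it is not shown in this PR.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_NUM_WORKERS
    value = int(raw)
    if value < 0:
        raise ValueError(f"{ENV_VAR} must be >= 0, got {value}")
    return value


def dataloader_kwargs(num_workers: int) -> dict:
    """Keyword arguments one might pass to torch.utils.data.DataLoader.

    multiprocessing_context is only valid when num_workers > 0, so it is
    omitted for the in-process (num_workers=0) case.
    """
    kwargs = {"num_workers": num_workers}
    if num_workers > 0:
        kwargs["multiprocessing_context"] = "spawn"
    return kwargs
```

PyTorch raises a ValueError if `multiprocessing_context` is set while `num_workers=0`, which is why the helper only adds it in the multi-worker case. The same fact explains the coverage gap: a suite pinned to num_workers=0 never reaches the spawn path at all.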

Test plan

  • New test passes locally (3 tests, ~9s)
  • CPU CI multiprocessing-dataloader job passes
  • GPU CI gpu-tests multi-worker inference step passes

@kvmto requested a review from ivanbasov March 25, 2026 15:03
@kvmto force-pushed the ci/multiprocessing-dataloader-tests branch from c015f05 to 2db609e
@copy-pr-bot (bot) commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Production defaults use num_workers=4 with spawn multiprocessing, but all
CI jobs and tests forced num_workers=0. Add coverage for both layers:

- New test (test_dataloader_multiprocessing.py): verifies the Stim inference
  datapipe is pickle-safe and produces correct results with num_workers=2
  across X, Z, and mixed bases. Runs on CPU in a dedicated ci.yml job.

- New ci-gpu.yml step: re-runs inference with PREDECODER_INFERENCE_NUM_WORKERS=2
  after the existing smoke run, exercising the full logical_error_rate.py
  pipeline (multi-worker DataLoader → model forward → PyMatching → LER check).

Signed-off-by: kvmto <kmato@nvidia.com>
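The last stage of that pipeline, the LER check, rests on the logical error rate: the fraction of shots in which the decoder's predicted logical-observable flip differs from the flip recorded by the simulator. The sketch below computes it together with its binomial standard error, which indicates how much two runs may differ from sampling noise alone. The function name is hypothetical, and the threshold or reference that logical_error_rate.py actually checks against is not described in this PR.

```python
import math
from typing import Sequence


def logical_error_rate(predicted: Sequence, actual: Sequence) -> tuple[float, float]:
    """Return (LER, binomial standard error) over a batch of shots.

    Each element is one shot's logical-observable outcome: an int for a single
    observable, or a tuple of bits for several. A shot counts as a logical
    failure if any observable in it was predicted wrongly.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same shots")
    if not actual:
        raise ValueError("need at least one shot")
    shots = len(actual)
    failures = sum(p != a for p, a in zip(predicted, actual))
    ler = failures / shots
    stderr = math.sqrt(ler * (1.0 - ler) / shots)
    return ler, stderr


ler, err = logical_error_rate([0, 1, 0, 0], [0, 1, 1, 0])
print(f"LER = {ler:.3f} +/- {err:.3f}")  # LER = 0.250 +/- 0.217
```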
@kvmto force-pushed the ci/multiprocessing-dataloader-tests branch from 2db609e to 2b014d3
@kvmto merged commit 1b49b69 into NVIDIA:main Mar 25, 2026
13 checks passed
@bmhowe23 deleted the ci/multiprocessing-dataloader-tests branch March 31, 2026 17:32
ivanbasov pushed two commits that referenced this pull request Apr 10, 2026, both carrying the commit message above.