
V3 optimizations from development branch #13

Merged
kvmto merged 13 commits into NVIDIA:main from kvmto:v3_optimizations
Mar 23, 2026

Conversation

@kvmto (Collaborator) commented Mar 11, 2026

Summary

Squashed import of the kmato/v3_opts development branch, bringing v3 optimization work into the public production repo.

Key changes:

  • Major expansion of homological_equivalence_torch.py with new optimized torch-based equivalence transformations
  • Reworked logical_error_rate.py evaluation logic
  • Training loop updates in train.py (scheduling, logging, workflow changes)
  • Noise model and DEM sampling refinements
  • Updated test suite to cover new optimization paths
  • New timing test config (conf/config_v3_timing_test.yaml)

Notes

  • All code has been YAPF-formatted to match production style rules (Google base, 100-char limit)
  • SPDX headers verified on all files
  • No production-only files were removed; additions and modifications only
  • Single squash commit, no dev repo history carried over
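The style note above maps to a yapf configuration along these lines (a sketch; the actual config file name and location in the repo are assumptions):

```ini
# .style.yapf (hypothetical file name; the repo may keep this in setup.cfg instead)
[style]
based_on_style = google
column_limit = 100
```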

Test plan

  • CI passes: yapf-check, spdx-header-check, unit-tests
  • GPU tests pass (training + inference with LER check)

Comment thread code/qec/noise_model.py
Comment thread code/workflows/run.py
Comment thread code/data/generator_torch.py Outdated
Comment thread code/scripts/local_run.sh
@kvmto kvmto force-pushed the v3_optimizations branch 2 times, most recently from 61b140e to 77ebbf7 on March 11, 2026 23:46
@kvmto kvmto requested review from bmhowe23 and ivanbasov March 12, 2026 00:34
Comment thread code/tests/test_homological_equivalence.py Outdated
Comment thread conf/config_v3_timing_test.yaml Outdated
Core Torch homological equivalence implementation with spacelike and
timelike weight-1/weight-2 support. Eval/train integration with
composable features. Remove all residual JAX references from codebase.

Signed-off-by: kvmto <kmato@nvidia.com>
@kvmto kvmto force-pushed the v3_optimizations branch from 77ebbf7 to 94acee2 on March 12, 2026 21:58
@kvmto kvmto requested a review from ivanbasov March 12, 2026 21:59
@bmhowe23 (Collaborator) commented

What is the status of this one? It looks like the gpu-coverage test timed out, but I could be wrong?

Comment thread code/training/train.py Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py Outdated
kvmto added 2 commits March 17, 2026 19:34
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
@kvmto kvmto requested a review from bmhowe23 March 17, 2026 20:11
@kvmto (Collaborator, Author) commented Mar 17, 2026

I will squash the commits when ready. For now I will address the comments and review rounds.

@kvmto (Collaborator, Author) commented Mar 17, 2026

I removed torch compilation to speed up testing

Signed-off-by: kvmto <kmato@nvidia.com>
@bmhowe23 (Collaborator) commented

> I removed torch compilation to speed up testing

Wasn't this a key part of the optimization?

Comment thread README.md Outdated
@kvmto (Collaborator, Author) commented Mar 18, 2026

> > I removed torch compilation to speed up testing
>
> Wasn't this a key part of the optimization?

It would require increasing the CI execution time to maybe half an hour; it already runs for 20 minutes.
Is this something we want?

@kvmto (Collaborator, Author) commented Mar 18, 2026

> > I removed torch compilation to speed up testing
>
> Wasn't this a key part of the optimization?

I just understood what you meant. It is still there; it just won't be tested in CI, to save 13-14 minutes.

Remove the V3 inline sparsity guard and restore
get_training_upscaled_noise_model as the sole noise scaling path.

The V3 sparsity guard is preserved on the
v3_optimizations_with_noise_scaling branch for a separate PR.

Signed-off-by: kvmto <kmato@nvidia.com>
Comment thread .github/workflows/ci-gpu.yml Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py
Comment thread code/qec/surface_code/memory_circuit_torch.py Outdated
@kvmto (Collaborator, Author) commented Mar 18, 2026

@ivanbasov @bmhowe23 For the longer tests, can I put them in a subfolder? Test discovery seems to be pretty coarse-grained. We would either have to use naming conventions, write separate scripts for picking tests in the CI, or use subfolders. Any preferences? I think the subfolder approach is easy and robust across possible changes.

Reverts the PREDECODER_TORCH_COMPILE=0 CI workaround from 7cdf556 that
leaked a CI concern into production code (inference + HE kernels).
torch.compile now always runs in inference and HE paths.

The gpu-coverage timeout is solved by moving the three slow HE compile
tests (torch.compile + autotune) into code/tests/mid/, which unittest
discover naturally skips.  mid-gpu-tests picks them up via a dedicated
discovery step with a 40-minute budget.

Signed-off-by: kvmto <kmato@nvidia.com>
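The two-tier discovery described above works because `unittest` discovery does not descend into subdirectories that are not packages (no `__init__.py`), so a `mid/` subfolder is invisible to the default run and needs its own discovery step. A throwaway sketch with hypothetical file names:

```python
import os
import tempfile
import textwrap
import unittest

# Build a throwaway tree mimicking the layout from the commit message:
#   tests/test_fast.py      -> picked up by default discovery
#   tests/mid/test_slow.py  -> skipped (mid/ has no __init__.py)
root = tempfile.mkdtemp()
tests_dir = os.path.join(root, "tests")
mid_dir = os.path.join(tests_dir, "mid")
os.makedirs(mid_dir)

fast_src = textwrap.dedent("""
    import unittest
    class Fast(unittest.TestCase):
        def test_ok(self):
            self.assertTrue(True)
""")
slow_src = fast_src.replace("Fast", "Slow")
with open(os.path.join(tests_dir, "test_fast.py"), "w") as f:
    f.write(fast_src)
with open(os.path.join(mid_dir, "test_slow.py"), "w") as f:
    f.write(slow_src)

# Default discovery from tests/ sees only the fast test...
default_suite = unittest.TestLoader().discover(tests_dir, pattern="test_*.py")
# ...while a dedicated step (the 40-minute mid-gpu-tests budget) targets mid/.
mid_suite = unittest.TestLoader().discover(mid_dir, pattern="test_*.py")

print(default_suite.countTestCases(), mid_suite.countTestCases())
```

This mirrors the CI split: the fast job discovers `code/tests/`, and a separate job points discovery at `code/tests/mid/` explicitly.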
@bmhowe23 (Collaborator) commented

> @ivanbasov @bmhowe23 For the longer tests, can I put them in a subfolder? Test discovery seems to be pretty coarse-grained. We would either have to use naming conventions, write separate scripts for picking tests in the CI, or use subfolders. Any preferences? I think the subfolder approach is easy and robust across possible changes.

I will defer this question to @ivanbasov.

@ivanbasov (Collaborator) commented

Subfolders will work fine. Thank you!

Signed-off-by: kvmto <kmato@nvidia.com>
@ivanbasov ivanbasov self-requested a review March 19, 2026 23:35
Comment thread code/evaluation/logical_error_rate.py
Comment thread code/evaluation/logical_error_rate.py Outdated
kvmto and others added 6 commits March 19, 2026 19:59
Co-authored-by: Ivan Basov <5455484+ivanbasov@users.noreply.github.com>
The compile block in run_inference_and_decode_pre_decoder_memory was
unconditionally calling torch.compile() without checking
PREDECODER_TORCH_COMPILE, so setting it to 0 had no effect and the
config banner always showed torch.compile=on.

The env var was already respected in train.py and config_validator.py
but was missing from the inference path in logical_error_rate.py.

Also add off(env) banner label to distinguish env-disabled from
compile-threw-exception.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
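The gating described in this commit can be sketched as a small helper shared by the training and inference paths. `PREDECODER_TORCH_COMPILE` is the env var from the commit message; the function names and banner strings here are illustrative, not the repo's actual API:

```python
import os

def torch_compile_enabled() -> bool:
    # "0" disables compilation; anything else (or unset) leaves it on.
    return os.environ.get("PREDECODER_TORCH_COMPILE", "1") != "0"

def compile_banner(compile_failed: bool = False) -> str:
    # Distinguish the three states the commit message describes:
    # on, off via env var, and off because torch.compile raised.
    if not torch_compile_enabled():
        return "torch.compile=off(env)"
    if compile_failed:
        return "torch.compile=off(compile error)"
    return "torch.compile=on"

os.environ["PREDECODER_TORCH_COMPILE"] = "0"
print(compile_banner())  # -> torch.compile=off(env)
```

The bug was that the inference path called `torch.compile()` without consulting such a gate, so the banner always reported compilation as on.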
…valModule as sole inference path

The V3 optimizations PR introduced a dual-path inference architecture
(PREDECODER_INLINE_INFERENCE) that bypassed PreDecoderMemoryEvalModule
with inline tensor ops. This risks the realtime ONNX/TRT pipeline by
creating a divergent code path that cannot be ONNX-exported and demotes
the validated PreDecoderMemoryEvalModule to "legacy".

Revert the inference path split while keeping all orthogonal V3
improvements: torch.compile, channels_last_3d, CUDAPrefetcher,
non-blocking GPU->CPU transfer, timing instrumentation, and the
standalone compute_syndrome_density_reduction function.

HE training optimizations are completely unaffected.

Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>

# Conflicts:
#	code/evaluation/logical_error_rate.py
…segfault

reduce-overhead mode records CUDA graphs whose device-level state is
corrupted when DataLoader workers fork the process, causing segfaults
on containers with large /dev/shm (num_workers>0 stays enabled).
default mode compiles to optimized kernels without CUDA graphs.

Signed-off-by: kvmto <kmato@nvidia.com>
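The trade-off in this commit can be sketched as a mode-selection policy (an illustrative helper, not the repo's code): prefer `torch.compile`'s default mode whenever forked DataLoader workers are in play, since the CUDA graphs recorded by `reduce-overhead` do not survive the fork:

```python
def pick_compile_mode(num_workers: int) -> str:
    """Choose a torch.compile mode that is safe with DataLoader workers.

    reduce-overhead records CUDA graphs, whose device-level state is
    corrupted when DataLoader workers fork the process; default mode
    still compiles optimized kernels but records no CUDA graphs.
    """
    return "default" if num_workers > 0 else "reduce-overhead"

# With workers enabled (the segfaulting configuration), stay on default:
print(pick_compile_mode(num_workers=4))  # -> default
print(pick_compile_mode(num_workers=0))  # -> reduce-overhead
```

The commit itself takes the simpler route of using default mode unconditionally, which keeps `num_workers>0` enabled.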
@bmhowe23 bmhowe23 mentioned this pull request Mar 23, 2026
@bmhowe23 (Collaborator) left a comment

Thanks, Kevin!

@kvmto kvmto merged commit 2e99b0e into NVIDIA:main Mar 23, 2026
12 checks passed
ivanbasov added a commit that referenced this pull request Mar 23, 2026
* Replace proprietary license headers with Apache-2.0

Update all SPDX headers from LicenseRef-NvidiaProprietary to Apache-2.0
across all 70 tracked source files. Also updates spdx_headers.py to
generate Apache-2.0 headers and replace old proprietary headers in-place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(headers): apply Apache-2.0 headers to files added after branch cut

Files added by PRs #13, #14, and #17 still carried the proprietary
LicenseRef-NvidiaProprietary header. Replace with Apache-2.0 to match
the rest of the codebase after the header migration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: apply YAPF formatting after header replacement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(headers): restore full file content truncated during rebase

The first rebase used --theirs to resolve header conflicts, which took
the old PR branch content instead of main's newer content for 5 files.
Restore from upstream/main and apply Apache-2.0 header correctly.

Affected files:
- code/qec/noise_model.py
- code/qec/surface_code/homological_equivalence_torch.py
- code/tests/mid/test_homological_equivalence.py
- code/tests/test_noise_model.py
- code/training/train.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): remove PreDecoderModelMemory_v2 test removed by PR #18

PR #18 removed the unused v2 model architecture. Drop the corresponding
test class and import to fix the ImportError in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@bmhowe23 bmhowe23 deleted the v3_optimizations branch March 31, 2026 17:39
ivanbasov added a commit that referenced this pull request Apr 10, 2026
V3 Torch HE optimizations, eval/train integration, and cleanup

- Implement Torch homological equivalence (HE) with spacelike/timelike weight support
- Integrate evaluation and training with composable features
- Remove all residual JAX references
- Miscellaneous cleanup
- Revert dual-path inline inference; retain PreDecoderMemoryEvalModule for ONNX/TRT compatibility
- Restore legacy-only noise scaling and remove V3 sparsity guard from main branch
- Fix PREDECODER_TORCH_COMPILE handling in inference
- Adjust CI: remove compilation skips, move slow HE tests to mid-tier

Signed-off-by: kvmto <kmato@nvidia.com>
Co-authored-by: Ivan Basov <5455484+ivanbasov@users.noreply.github.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>