
V3 optimizations from development branch #13

Merged
kvmto merged 13 commits into NVIDIA:main from kvmto:v3_optimizations
Mar 23, 2026

Conversation

@kvmto (Collaborator) commented Mar 11, 2026

Summary

Squashed import of the kmato/v3_opts development branch, bringing v3 optimization work into the public production repo.

Key changes:

  • Major expansion of homological_equivalence_torch.py with new optimized torch-based equivalence transformations
  • Reworked logical_error_rate.py evaluation logic
  • Training loop updates in train.py (scheduling, logging, workflow changes)
  • Noise model and DEM sampling refinements
  • Updated test suite to cover new optimization paths
  • New timing test config (conf/config_v3_timing_test.yaml)

Notes

  • All code has been YAPF-formatted to match production style rules (Google base, 100-char limit)
  • SPDX headers verified on all files
  • No production-only files were removed; additions and modifications only
  • Single squash commit, no dev repo history carried over
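The style note above maps to a yapf configuration along these lines (a sketch; the actual config file name and location in the repo are assumptions):

```ini
# .style.yapf (hypothetical file name; the repo may keep this in setup.cfg instead)
[style]
based_on_style = google
column_limit = 100
```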

Test plan

  • CI passes: yapf-check, spdx-header-check, unit-tests
  • GPU tests pass (training + inference with LER check)

Comment thread code/qec/noise_model.py
Comment thread code/workflows/run.py
Comment thread code/data/generator_torch.py Outdated
Comment thread code/scripts/local_run.sh
@kvmto kvmto force-pushed the v3_optimizations branch 2 times, most recently from 61b140e to 77ebbf7 on March 11, 2026 23:46
@kvmto kvmto requested review from bmhowe23 and ivanbasov March 12, 2026 00:34
Comment thread code/tests/test_homological_equivalence.py Outdated
Comment thread conf/config_v3_timing_test.yaml Outdated
Core Torch homological equivalence implementation with spacelike and
timelike weight-1/weight-2 support. Eval/train integration with
composable features. Remove all residual JAX references from codebase.

Signed-off-by: kvmto <kmato@nvidia.com>
@kvmto kvmto force-pushed the v3_optimizations branch from 77ebbf7 to 94acee2 on March 12, 2026 21:58
@kvmto kvmto requested a review from ivanbasov March 12, 2026 21:59
@bmhowe23 (Collaborator) commented

What is the status of this one? It looks like the gpu-coverage test timed out, but I could be wrong?

Comment thread code/training/train.py Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py Outdated
kvmto added 2 commits March 17, 2026 19:34
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
@kvmto kvmto requested a review from bmhowe23 March 17, 2026 20:11
@kvmto (Collaborator, Author) commented Mar 17, 2026

I will squash the commits when ready. For now I will address the comments and review rounds.

@kvmto (Collaborator, Author) commented Mar 17, 2026

I removed torch compilation to speed up testing

Signed-off-by: kvmto <kmato@nvidia.com>
@bmhowe23 (Collaborator) commented

> I removed torch compilation to speed up testing

Wasn't this a key part of the optimization?

Comment thread README.md Outdated
@kvmto (Collaborator, Author) commented Mar 18, 2026

> > I removed torch compilation to speed up testing
>
> Wasn't this a key part of the optimization?

It would require increasing the CI execution time to maybe half an hour; it already runs for 20 minutes.
Is this something we want?

@kvmto (Collaborator, Author) commented Mar 18, 2026

> > I removed torch compilation to speed up testing
>
> Wasn't this a key part of the optimization?

I just understood what you meant. It is still there; it just won't be tested in CI, to save 13-14 minutes.

Remove the V3 inline sparsity guard and restore
get_training_upscaled_noise_model as the sole noise scaling path.

The V3 sparsity guard is preserved on the
v3_optimizations_with_noise_scaling branch for a separate PR.

Signed-off-by: kvmto <kmato@nvidia.com>
Comment thread .github/workflows/ci-gpu.yml Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py Outdated
Comment thread code/qec/surface_code/homological_equivalence_torch.py
Comment thread code/qec/surface_code/memory_circuit_torch.py Outdated
@kvmto (Collaborator, Author) commented Mar 18, 2026

@ivanbasov @bmhowe23 For the longer tests, can I put them in a subfolder? Test discovery seems to be pretty coarse-grained. We would either have to use naming conventions, write separate scripts for picking tests in the CI, or use subfolders. Any preferences? I think the subfolder approach is easy and robust across possible changes.

Reverts the PREDECODER_TORCH_COMPILE=0 CI workaround from 7cdf556 that
leaked a CI concern into production code (inference + HE kernels).
torch.compile now always runs in inference and HE paths.

The gpu-coverage timeout is solved by moving the three slow HE compile
tests (torch.compile + autotune) into code/tests/mid/, which unittest
discover naturally skips.  mid-gpu-tests picks them up via a dedicated
discovery step with a 40-minute budget.

Signed-off-by: kvmto <kmato@nvidia.com>
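The two-tier discovery described above works because `unittest` discovery does not descend into subdirectories that are not packages (no `__init__.py`), so a `mid/` subfolder is invisible to the default run and needs its own discovery step. A throwaway sketch with hypothetical file names:

```python
import os
import tempfile
import textwrap
import unittest

# Build a throwaway tree mimicking the layout from the commit message:
#   tests/test_fast.py      -> picked up by default discovery
#   tests/mid/test_slow.py  -> skipped (mid/ has no __init__.py)
root = tempfile.mkdtemp()
tests_dir = os.path.join(root, "tests")
mid_dir = os.path.join(tests_dir, "mid")
os.makedirs(mid_dir)

fast_src = textwrap.dedent("""
    import unittest
    class Fast(unittest.TestCase):
        def test_ok(self):
            self.assertTrue(True)
""")
slow_src = fast_src.replace("Fast", "Slow")
with open(os.path.join(tests_dir, "test_fast.py"), "w") as f:
    f.write(fast_src)
with open(os.path.join(mid_dir, "test_slow.py"), "w") as f:
    f.write(slow_src)

# Default discovery from tests/ sees only the fast test...
default_suite = unittest.TestLoader().discover(tests_dir, pattern="test_*.py")
# ...while a dedicated step (the 40-minute mid-gpu-tests budget) targets mid/.
mid_suite = unittest.TestLoader().discover(mid_dir, pattern="test_*.py")

print(default_suite.countTestCases(), mid_suite.countTestCases())
```

This mirrors the CI split: the fast job discovers `code/tests/`, and a separate job points discovery at `code/tests/mid/` explicitly.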
@bmhowe23 (Collaborator) commented

> @ivanbasov @bmhowe23 For the longer tests, can I put them in a subfolder? Test discovery seems to be pretty coarse-grained. We would either have to use naming conventions, write separate scripts for picking tests in the CI, or use subfolders. Any preferences? I think the subfolder approach is easy and robust across possible changes.

I will defer this question to @ivanbasov.

@ivanbasov (Collaborator) commented

Subfolders will work fine. Thank you!

Signed-off-by: kvmto <kmato@nvidia.com>
@ivanbasov ivanbasov self-requested a review March 19, 2026 23:35
Comment thread code/evaluation/logical_error_rate.py
Comment thread code/evaluation/logical_error_rate.py Outdated
kvmto and others added 6 commits March 19, 2026 19:59
Co-authored-by: Ivan Basov <5455484+ivanbasov@users.noreply.github.com>
The compile block in run_inference_and_decode_pre_decoder_memory was
unconditionally calling torch.compile() without checking
PREDECODER_TORCH_COMPILE, so setting it to 0 had no effect and the
config banner always showed torch.compile=on.

The env var was already respected in train.py and config_validator.py
but was missing from the inference path in logical_error_rate.py.

Also add off(env) banner label to distinguish env-disabled from
compile-threw-exception.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
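The gating described in this commit can be sketched as a small helper shared by the training and inference paths. `PREDECODER_TORCH_COMPILE` is the env var from the commit message; the function names and banner strings here are illustrative, not the repo's actual API:

```python
import os

def torch_compile_enabled() -> bool:
    # "0" disables compilation; anything else (or unset) leaves it on.
    return os.environ.get("PREDECODER_TORCH_COMPILE", "1") != "0"

def compile_banner(compile_failed: bool = False) -> str:
    # Distinguish the three states the commit message describes:
    # on, off via env var, and off because torch.compile raised.
    if not torch_compile_enabled():
        return "torch.compile=off(env)"
    if compile_failed:
        return "torch.compile=off(compile error)"
    return "torch.compile=on"

os.environ["PREDECODER_TORCH_COMPILE"] = "0"
print(compile_banner())  # -> torch.compile=off(env)
```

The bug was that the inference path called `torch.compile()` without consulting such a gate, so the banner always reported compilation as on.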
…valModule as sole inference path

The V3 optimizations PR introduced a dual-path inference architecture
(PREDECODER_INLINE_INFERENCE) that bypassed PreDecoderMemoryEvalModule
with inline tensor ops. This risks the realtime ONNX/TRT pipeline by
creating a divergent code path that cannot be ONNX-exported and demotes
the validated PreDecoderMemoryEvalModule to "legacy".

Revert the inference path split while keeping all orthogonal V3
improvements: torch.compile, channels_last_3d, CUDAPrefetcher,
non-blocking GPU->CPU transfer, timing instrumentation, and the
standalone compute_syndrome_density_reduction function.

HE training optimizations are completely unaffected.

Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>

# Conflicts:
#	code/evaluation/logical_error_rate.py
…segfault

reduce-overhead mode records CUDA graphs whose device-level state is
corrupted when DataLoader workers fork the process, causing segfaults
on containers with large /dev/shm (num_workers>0 stays enabled).
default mode compiles to optimized kernels without CUDA graphs.

Signed-off-by: kvmto <kmato@nvidia.com>
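The trade-off in this commit can be sketched as a mode-selection policy (an illustrative helper, not the repo's code): prefer `torch.compile`'s default mode whenever forked DataLoader workers are in play, since the CUDA graphs recorded by `reduce-overhead` do not survive the fork:

```python
def pick_compile_mode(num_workers: int) -> str:
    """Choose a torch.compile mode that is safe with DataLoader workers.

    reduce-overhead records CUDA graphs, whose device-level state is
    corrupted when DataLoader workers fork the process; default mode
    still compiles optimized kernels but records no CUDA graphs.
    """
    return "default" if num_workers > 0 else "reduce-overhead"

# With workers enabled (the segfaulting configuration), stay on default:
print(pick_compile_mode(num_workers=4))  # -> default
print(pick_compile_mode(num_workers=0))  # -> reduce-overhead
```

The commit itself takes the simpler route of using default mode unconditionally, which keeps `num_workers>0` enabled.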
@bmhowe23 bmhowe23 mentioned this pull request Mar 23, 2026
@bmhowe23 (Collaborator) left a comment

Thanks, Kevin!

@kvmto kvmto merged commit 2e99b0e into NVIDIA:main Mar 23, 2026
12 checks passed
ivanbasov added a commit that referenced this pull request Mar 23, 2026
* Replace proprietary license headers with Apache-2.0

Update all SPDX headers from LicenseRef-NvidiaProprietary to Apache-2.0
across all 70 tracked source files. Also updates spdx_headers.py to
generate Apache-2.0 headers and replace old proprietary headers in-place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(headers): apply Apache-2.0 headers to files added after branch cut

Files added by PRs #13, #14, and #17 still carried the proprietary
LicenseRef-NvidiaProprietary header. Replace with Apache-2.0 to match
the rest of the codebase after the header migration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: apply YAPF formatting after header replacement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(headers): restore full file content truncated during rebase

The first rebase used --theirs to resolve header conflicts, which took
the old PR branch content instead of main's newer content for 5 files.
Restore from upstream/main and apply Apache-2.0 header correctly.

Affected files:
- code/qec/noise_model.py
- code/qec/surface_code/homological_equivalence_torch.py
- code/tests/mid/test_homological_equivalence.py
- code/tests/test_noise_model.py
- code/training/train.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): remove PreDecoderModelMemory_v2 test removed by PR #18

PR #18 removed the unused v2 model architecture. Drop the corresponding
test class and import to fix the ImportError in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@bmhowe23 bmhowe23 deleted the v3_optimizations branch March 31, 2026 17:39
ivanbasov added a commit that referenced this pull request Apr 10, 2026
V3 Torch HE optimizations, eval/train integration, and cleanup

- Implement Torch homological equivalence (HE) with spacelike/timelike weight support
- Integrate evaluation and training with composable features
- Remove all residual JAX references
- Miscellaneous cleanup
- Revert dual-path inline inference; retain PreDecoderMemoryEvalModule for ONNX/TRT compatibility
- Restore legacy-only noise scaling and remove V3 sparsity guard from main branch
- Fix PREDECODER_TORCH_COMPILE handling in inference
- Adjust CI: remove compilation skips, move slow HE tests to mid-tier

Signed-off-by: kvmto <kmato@nvidia.com>
Co-authored-by: Ivan Basov <5455484+ivanbasov@users.noreply.github.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>