V3 optimizations from development branch #13
Force-pushed from 61b140e to 77ebbf7
Core Torch homological equivalence implementation with spacelike and timelike weight-1/weight-2 support. Eval/train integration with composable features. Remove all residual JAX references from codebase. Signed-off-by: kvmto <kmato@nvidia.com>
What is the status of this one? It looks like the gpu-coverage test timed out, but I could be wrong.
I will squash the commits when ready. For now I will address the comments and review rounds.
I removed torch compilation to speed up testing.
Wasn't this a key part of the optimization?
It would require raising the CI execution time to maybe half an hour; it already runs for 20 minutes.
I just understood what you meant. It is there; it just won't be tested in CI, to save 13-14 minutes.
Remove the V3 inline sparsity guard and restore get_training_upscaled_noise_model as the sole noise scaling path. The V3 sparsity guard is preserved on the v3_optimizations_with_noise_scaling branch for a separate PR. Signed-off-by: kvmto <kmato@nvidia.com>
@ivanbasov @bmhowe23 for the longer tests, can I put them in a subfolder? Test discovery seems to be pretty coarse-grained.
Reverts the PREDECODER_TORCH_COMPILE=0 CI workaround from 7cdf556 that leaked a CI concern into production code (inference + HE kernels). torch.compile now always runs in inference and HE paths. The gpu-coverage timeout is solved by moving the three slow HE compile tests (torch.compile + autotune) into code/tests/mid/, which unittest discover naturally skips. mid-gpu-tests picks them up via a dedicated discovery step with a 40-minute budget. Signed-off-by: kvmto <kmato@nvidia.com>
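The subfolder trick works because unittest's recursive discovery skips directories that are not packages (no `__init__.py`). A minimal sketch of the two CI discovery steps, using toy file names rather than the repo's actual tests:

```python
import os
import tempfile
import textwrap
import unittest

# Build a toy tree mirroring the layout above: tests/ is a package,
# tests/mid/ deliberately has no __init__.py, so default discovery
# from tests/ will not recurse into it.
root = tempfile.mkdtemp()
tests_dir = os.path.join(root, "tests")
mid_dir = os.path.join(tests_dir, "mid")
os.makedirs(mid_dir)
open(os.path.join(tests_dir, "__init__.py"), "w").close()

case = textwrap.dedent("""
    import unittest
    class {name}(unittest.TestCase):
        def test_ok(self):
            self.assertTrue(True)
""")
with open(os.path.join(tests_dir, "test_fast.py"), "w") as f:
    f.write(case.format(name="Fast"))
with open(os.path.join(mid_dir, "test_slow.py"), "w") as f:
    f.write(case.format(name="Slow"))

# Step 1: the normal gpu-coverage discovery finds only the fast test,
# because mid/ is not a package and is silently skipped.
fast_suite = unittest.TestLoader().discover(tests_dir, top_level_dir=root)

# Step 2: the dedicated mid-gpu-tests step targets the subfolder directly.
slow_suite = unittest.TestLoader().discover(mid_dir)

print(fast_suite.countTestCases(), slow_suite.countTestCases())
```

This matches the CLI form `python -m unittest discover -s code/tests` for the fast tier and `... -s code/tests/mid` for the slow tier.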
I will defer this question to @ivanbasov.
Subfolders will work fine. Thank you!
Co-authored-by: Ivan Basov <5455484+ivanbasov@users.noreply.github.com>
The compile block in run_inference_and_decode_pre_decoder_memory was unconditionally calling torch.compile() without checking PREDECODER_TORCH_COMPILE, so setting it to 0 had no effect and the config banner always showed torch.compile=on. The env var was already respected in train.py and config_validator.py but was missing from the inference path in logical_error_rate.py. Also add off(env) banner label to distinguish env-disabled from compile-threw-exception. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
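The gating the commit describes can be sketched as one shared helper; the function name, default value, and label strings below are illustrative, not the PR's actual code:

```python
import os

def maybe_compile(model, compile_fn=None):
    """Apply torch.compile only when PREDECODER_TORCH_COMPILE allows it.

    Returns (model, banner_label). compile_fn is injectable so the gating
    logic can be exercised without torch; it defaults to torch.compile.
    This is a sketch of the fix, not the repo's exact implementation.
    """
    if os.environ.get("PREDECODER_TORCH_COMPILE", "1") == "0":
        # Env-disabled: banner shows off(env), distinct from a compile failure.
        return model, "off(env)"
    if compile_fn is None:
        import torch  # deferred so the gate itself needs no torch
        compile_fn = torch.compile
    try:
        return compile_fn(model), "on"
    except Exception:
        # Compile threw: fall back to eager and label it differently.
        return model, "off"
```

Routing every call site (train.py, config_validator.py, and the inference path in logical_error_rate.py) through one such helper is what prevents the env var from being silently ignored in one of them.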
…valModule as sole inference path

The V3 optimizations PR introduced a dual-path inference architecture (PREDECODER_INLINE_INFERENCE) that bypassed PreDecoderMemoryEvalModule with inline tensor ops. This risks the realtime ONNX/TRT pipeline by creating a divergent code path that cannot be ONNX-exported and demotes the validated PreDecoderMemoryEvalModule to "legacy". Revert the inference path split while keeping all orthogonal V3 improvements: torch.compile, channels_last_3d, CUDAPrefetcher, non-blocking GPU->CPU transfer, timing instrumentation, and the standalone compute_syndrome_density_reduction function. HE training optimizations are completely unaffected.

Signed-off-by: kvmto <kmato@nvidia.com>
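Of the retained improvements, channels_last_3d and the non-blocking transfer are easy to illustrate. A rough sketch with arbitrary shapes (the CUDA part is guarded so it only runs where a GPU exists):

```python
import torch

# channels_last_3d keeps the logical NCDHW shape but strides memory as
# NDHWC, which recent GPU conv3d kernels exploit; it works on CPU tensors
# too, so the layout itself can be inspected anywhere.
x = torch.randn(2, 4, 8, 8, 8)
y = x.contiguous(memory_format=torch.channels_last_3d)
print(y.shape)    # logical shape unchanged
print(y.stride()) # channel dimension now has stride 1

# The non-blocking GPU->CPU transfer pattern the commit refers to:
# pin the host destination so the copy can overlap with GPU compute.
if torch.cuda.is_available():
    dev = y.to("cuda", non_blocking=True)
    host = torch.empty_like(dev, device="cpu").pin_memory()
    host.copy_(dev, non_blocking=True)
    torch.cuda.synchronize()  # must wait before reading host
```

The synchronize call matters: a non-blocking copy returns immediately, so reading `host` before the stream drains would race with the transfer.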
Signed-off-by: kvmto <kmato@nvidia.com>

# Conflicts:
#	code/evaluation/logical_error_rate.py
…segfault

reduce-overhead mode records CUDA graphs whose device-level state is corrupted when DataLoader workers fork the process, causing segfaults on containers with large /dev/shm (num_workers>0 stays enabled). default mode compiles to optimized kernels without CUDA graphs.

Signed-off-by: kvmto <kmato@nvidia.com>
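The safe pairing of compile mode and worker count can be sketched as follows; the helper name is our own, not the PR's, and `torch.compile` is lazy, so nothing is actually compiled until the model is called:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def compile_for_forked_loaders(model: nn.Module) -> nn.Module:
    # mode="reduce-overhead" records CUDA graphs; when DataLoader workers
    # fork the process, that recorded device state can be corrupted and
    # segfault. mode="default" compiles optimized kernels without CUDA
    # graphs, so it coexists with num_workers > 0.
    return torch.compile(model, mode="default")

model = compile_for_forked_loaders(nn.Linear(8, 2))
loader = DataLoader(TensorDataset(torch.randn(16, 8)),
                    batch_size=4,
                    num_workers=2)  # workers stay enabled with default mode
```

An alternative would have been to force `num_workers=0` and keep reduce-overhead, but that trades away input-pipeline parallelism, which the commit chose not to do.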
* Replace proprietary license headers with Apache-2.0

  Update all SPDX headers from LicenseRef-NvidiaProprietary to Apache-2.0 across all 70 tracked source files. Also updates spdx_headers.py to generate Apache-2.0 headers and replace old proprietary headers in-place.

* fix(headers): apply Apache-2.0 headers to files added after branch cut

  Files added by PRs #13, #14, and #17 still carried the proprietary LicenseRef-NvidiaProprietary header. Replace with Apache-2.0 to match the rest of the codebase after the header migration.

* style: apply YAPF formatting after header replacement

* fix(headers): restore full file content truncated during rebase

  The first rebase used --theirs to resolve header conflicts, which took the old PR branch content instead of main's newer content for 5 files. Restore from upstream/main and apply the Apache-2.0 header correctly. Affected files:

  - code/qec/noise_model.py
  - code/qec/surface_code/homological_equivalence_torch.py
  - code/tests/mid/test_homological_equivalence.py
  - code/tests/test_noise_model.py
  - code/training/train.py

* fix(test): remove PreDecoderModelMemory_v2 test removed by PR #18

  PR #18 removed the unused v2 model architecture. Drop the corresponding test class and import to fix the ImportError in CI.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
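The in-place tag swap that spdx_headers.py performs can be approximated with a few lines; this sketch only replaces the SPDX identifier line, whereas the real script also regenerates the surrounding license boilerplate:

```python
OLD_TAG = "SPDX-License-Identifier: LicenseRef-NvidiaProprietary"
NEW_TAG = "SPDX-License-Identifier: Apache-2.0"

def migrate_text(text: str) -> str:
    """Replace the proprietary SPDX tag with Apache-2.0, leaving the rest
    of the file untouched. A sketch, not the actual spdx_headers.py."""
    return text.replace(OLD_TAG, NEW_TAG)

def migrate_file(path: str) -> bool:
    """Rewrite one source file in place; returns True if anything changed."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    new = migrate_text(text)
    if new != text:
        with open(path, "w", encoding="utf-8") as f:
            f.write(new)
    return new != text
```

Because the replacement is idempotent, re-running it over files migrated in an earlier pass (as happened for the post-branch-cut files) is harmless.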
V3 Torch HE optimizations, eval/train integration, and cleanup

- Implement Torch homological equivalence (HE) with spacelike/timelike weight support
- Integrate evaluation and training with composable features
- Remove all residual JAX references
- Miscellaneous cleanup
- Revert dual-path inline inference; retain PreDecoderMemoryEvalModule for ONNX/TRT compatibility
- Restore legacy-only noise scaling and remove V3 sparsity guard from main branch
- Fix PREDECODER_TORCH_COMPILE handling in inference
- Adjust CI: remove compilation skips, move slow HE tests to mid-tier

Signed-off-by: kvmto <kmato@nvidia.com>
Co-authored-by: Ivan Basov <5455484+ivanbasov@users.noreply.github.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>
Summary
Squashed import of the kmato/v3_opts development branch, bringing v3 optimization work into the public production repo.

Key changes:

- homological_equivalence_torch.py with new optimized torch-based equivalence transformations
- logical_error_rate.py evaluation logic
- train.py (scheduling, logging, workflow changes)
- new config file (conf/config_v3_timing_test.yaml)

Notes
Test plan
yapf-check, spdx-header-check, unit-tests