
Improvement cuStabilizer DEM sampling backend and consolidate tests #32

Merged: kvmto merged 24 commits into NVIDIA:main from kvmto:custab_int_clean on Apr 1, 2026

@kvmto (Collaborator) commented Mar 31, 2026

Summary

  • Remove the torch fallback path from dem_sampling.py, making cuQuantum's cuStabilizer BitMatrixSampler the sole DEM sampling backend. This eliminates the USE_CUSTAB env-var toggle, the _use_custab() helper, and the custab_matrix_sampling() indirection — dem_sampling() now calls cuST directly.
  • Simplify the sampler cache: replace the id(H) address-based cache key with a proper is-identity check (_cached_H is not H) and pre-cache H.T to avoid repeated transposes.
  • Consolidate test files: merge test_dem_sampling_custab.py and test_dem_sampling_integration.py into test_dem_sampling.py, removing the now-unnecessary fallback-specific tests.
  • Add cuquantum>=26.3.0 as an explicit dependency in requirements_public_train.txt.
  • Fix CI to install from requirements_public_train.txt (which includes inference deps) instead of requirements_public_inference.txt alone.
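
The cache change in the second bullet can be illustrated with a minimal sketch. This is not the code from dem_sampling.py: `SamplerCache`, `build_sampler`, and the list-of-lists matrix are hypothetical stand-ins for the real tensors and cuStabilizer's BitMatrixSampler. What it aims to show is that holding a reference to H and comparing with `is not` cannot be fooled by address reuse, whereas an `id(H)` key can match a different object once the original has been garbage-collected.

```python
class SamplerCache:
    """Cache a single sampler, keyed on the identity of the matrix object H."""

    def __init__(self, build_sampler):
        # build_sampler is a hypothetical factory: (H transposed) -> sampler.
        self._build_sampler = build_sampler
        self._cached_H = None
        self._cached_HT = None
        self._sampler = None
        self.builds = 0  # counts rebuilds, for illustration only

    def get(self, H):
        # The cache keeps H alive through _cached_H, so a new object can never
        # reuse its address while it is the cached key. A bare id(H) key gives
        # no such guarantee.
        if self._cached_H is not H:
            self._cached_H = H
            # Transpose once and keep it, instead of transposing on every call.
            self._cached_HT = [list(col) for col in zip(*H)]
            self._sampler = self._build_sampler(self._cached_HT)
            self.builds += 1
        return self._sampler


cache = SamplerCache(lambda HT: ("sampler", HT))
H = [[1, 0, 1], [0, 1, 1]]
first = cache.get(H)
second = cache.get(H)              # same object: cache hit
third = cache.get([row[:] for row in H])  # equal content, new object: rebuild
```

Note the trade-off: an equal-valued copy of H triggers a rebuild, which is cheap to reason about but assumes callers reuse the same matrix object across calls.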

Test plan

  • CI GPU tests pass (test_dem_sampling.py covers CPU, GPU, statistical, integration, and cuST-vs-torch deterministic cross-checks)
  • Verify cuquantum>=26.3.0 resolves correctly in the CI environment
  • Confirm no regressions in training pipeline end-to-end
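
For the second item of the test plan, one way to check what actually got installed is to query package metadata rather than importing the library. The sketch below is an assumption-laden helper, not part of this PR: the distribution names in `CANDIDATES` are guesses based on the package names mentioned in this thread, and the version parser is deliberately simplistic (it ignores pre-release markers; `packaging.version` would be the more robust choice where available).

```python
import re
from importlib import metadata

MINIMUM = (26, 3, 0)

# Assumed distribution names; adjust to whatever the requirements file installs.
CANDIDATES = ("cuquantum-python", "cuquantum-python-cu12", "cuquantum-python-cu13")


def release_tuple(version, width=3):
    """Leading numeric release components, e.g. '26.3.0.post0' -> (26, 3, 0)."""
    parts = []
    for piece in version.split(".")[:width]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (width - len(parts))  # pad short versions such as '26.3'
    return tuple(parts)


def meets_minimum(version, minimum=MINIMUM):
    """True if the release part of `version` is at least `minimum`."""
    return release_tuple(version, len(minimum)) >= minimum


def find_installed(candidates=CANDIDATES):
    """Return (name, version) for the first installed candidate, else None."""
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = find_installed()
    if found is None:
        print("no cuquantum distribution found")
    else:
        name, version = found
        print(name, version, "ok" if meets_minimum(version) else "too old")
```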

Remove the torch fallback path from dem_sampling.py — cuquantum's
BitMatrixSampler is now the only sampling backend, simplifying the
module and eliminating the USE_CUSTAB toggle.  The sampler cache uses
identity-based comparison with a pre-cached transpose to avoid
redundant reallocation.

Merge test_dem_sampling_custab.py and test_dem_sampling_integration.py
into test_dem_sampling.py for a single, comprehensive test suite.

Also:
- Add cuquantum>=26.3.0 to requirements_public_train.txt
- Fix CI to install train (not inference) requirements for GPU tests
- Apply yapf formatting (Google style, 100-col limit)

Signed-off-by: kvmto <kmato@nvidia.com>
@kvmto kvmto requested review from bmhowe23 and ivanbasov March 31, 2026 19:03
Comment thread code/qec/dem_sampling.py Outdated
kvmto and others added 8 commits March 31, 2026 19:39
Signed-off-by: kvmto <kmato@nvidia.com>
…lure

The cuquantum meta-package fails to build in environments where
pkg_resources is unavailable. Pin the CUDA-12 specific wheel directly
to bypass the broken auto-detection setup.py.

Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
Comment thread .github/workflows/ci-gpu.yml Outdated
Comment thread code/scripts/check_python_compat.sh
Comment thread .github/workflows/ci-gpu.yml Outdated
@ivanbasov (Member) left a comment

CI failures root cause

Two related issues, both from the same design decision — making cuquantum.BitMatrixSampler the sole backend without adding skip guards in the tests.


Issue 1 — Unit-test CI (CPU runners): ModuleNotFoundError: No module named 'cuquantum'

dem_sampling() now does a hard import of cuquantum.stabilizer unconditionally at call time (line 81 of the new dem_sampling.py):

def dem_sampling(...):
    from cuquantum.stabilizer.dem_sampling import BitMatrixSampler  # <-- always runs
    from cuquantum.stabilizer.simulator import Options

The CPU CI runner doesn't have cuquantum installed, so every test that calls dem_sampling() crashes immediately. Affected test classes in test_dem_sampling.py:

  • TestBitMatrixSamplerIsUsed — imports cuquantum directly, no skip guard
  • TestDemSamplingCPU — calls dem_sampling(), no skip guard
  • TestDEMSamplingIntegration — calls generate_batch() → dem_sampling(), no skip guard
  • TestCustVsTorchDeterministic — calls dem_sampling(), no skip guard
  • TestCustVsTorchStatistical.test_small_matrix_cpu — calls dem_sampling(), no skip guard

The old test_dem_sampling_custab.py handled this correctly with @unittest.skipUnless(_custab_available(), ...) at class level. That decorator was dropped when the files were consolidated.


Issue 2 — GPU CI runner: ModuleNotFoundError: No module named 'cuquantum.stabilizer'

The GPU runner has cuquantum installed, but an older version that predates the stabilizer subpackage (which requires >=26.3.0). Same test classes fail for the same reason — no skip guard catches the import error.
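
The two failure modes above (package missing entirely versus package present but too old to ship the subpackage) can be told apart without triggering the crash. The following is a small diagnostic sketch using only the standard library; `first_missing` is a hypothetical helper name, not something in this repository.

```python
import importlib.util


def first_missing(module_name):
    """Return the first dotted prefix of module_name that cannot be found,
    or None if every component resolves."""
    parts = module_name.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        try:
            # For dotted names, find_spec imports the parent package first,
            # which can itself fail, hence the broad ImportError catch.
            spec = importlib.util.find_spec(prefix)
        except ImportError:
            spec = None
        if spec is None:
            return prefix
    return None


if __name__ == "__main__":
    missing = first_missing("cuquantum.stabilizer")
    if missing is None:
        print("cuquantum.stabilizer is importable")
    elif missing == "cuquantum":
        print("Issue 1: cuquantum is not installed")
    else:
        print("Issue 2: cuquantum is installed but lacks", missing)
```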


Suggested fix

The cleanest approach is to move the cuquantum import to module level with a try/except (restoring _CUSTAB_AVAILABLE), and add @unittest.skipUnless(_dem_mod._CUSTAB_AVAILABLE, "cuquantum>=26.3.0 not available") to all five affected test classes. The old code on main already does this correctly in _custab_available() / _CUSTAB_AVAILABLE — the pattern just needs to be carried forward.

# dem_sampling.py — module level
try:
    from cuquantum.stabilizer.dem_sampling import BitMatrixSampler
    from cuquantum.stabilizer.simulator import Options
    _CUSTAB_AVAILABLE = True
except ImportError:
    BitMatrixSampler = None
    Options = None
    _CUSTAB_AVAILABLE = False

# test_dem_sampling.py
@unittest.skipUnless(_dem_mod._CUSTAB_AVAILABLE, "cuquantum>=26.3.0 (stabilizer) not available")
class TestDemSamplingCPU(unittest.TestCase): ...

@unittest.skipUnless(_dem_mod._CUSTAB_AVAILABLE, "cuquantum>=26.3.0 (stabilizer) not available")
class TestBitMatrixSamplerIsUsed(unittest.TestCase): ...

# ... same for TestDEMSamplingIntegration, TestCustVsTorchDeterministic, TestCustVsTorchStatistical

This makes all five classes skip gracefully on CPU CI (no cuquantum) and on GPU CI until the runner is updated to cuquantum>=26.3.0, while still running correctly when the right version is present.

Otherwise, looks good!

Since custabilizer-cuXY is not a viable standalone package, there is no
need to try to make CUDA major version specific files. Rather, we just
rely on the auto detection logic in cuquantum-python.
@bmhowe23 (Collaborator) commented Apr 1, 2026

Note: CI passed with 3b8015d, so we could go with that. However, I prefer one commit later (ce055a2). It is currently failing the CI, but I think the cuquantum team is publishing a change that would fix that.

This reverts commit ce055a2ddcb735f43dc7bd7a21e15e3bcc64dd09.
@bmhowe23 (Collaborator) commented Apr 1, 2026

> Note: CI passed with 3b8015d, so we could go with that. However, I prefer one commit later (ce055a2). It is currently failing the CI, but I think the cuquantum team is publishing a change that would fix that.

The cuquantum issue won't have a published fix until April 10th, so I reverted ce055a2.

Signed-off-by: kvmto <kmato@nvidia.com>
kvmto added 3 commits April 1, 2026 22:10
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: kvmto <kmato@nvidia.com>
@bmhowe23 (Collaborator) commented Apr 1, 2026

> Note: CI passed with 3b8015d, so we could go with that. However, I prefer one commit later (ce055a2). It is currently failing the CI, but I think the cuquantum team is publishing a change that would fix that.
>
> The cuquantum issue won't have a published fix until April 10th, so I reverted ce055a2.

OK, they overachieved, and it is ready now: https://pypi.org/project/cuquantum-python/26.3.0.post0/

I would like to re-revert ce055a2 (i.e. revert 05e92f8). I will let the CI for Kevin's changes finish up before doing that, though. It will be much cleaner not to have the cu12/cu13 split.

This reverts commit 05e92f842a6b01cf1b327b4162e5881d0b951f58.
@bmhowe23 (Collaborator) commented Apr 1, 2026

> Note: CI passed with 3b8015d, so we could go with that. However, I prefer one commit later (ce055a2). It is currently failing the CI, but I think the cuquantum team is publishing a change that would fix that.
>
> The cuquantum issue won't have a published fix until April 10th, so I reverted ce055a2.
>
> OK, they overachieved, and it is ready now: https://pypi.org/project/cuquantum-python/26.3.0.post0/
>
> I would like to re-revert ce055a2 (i.e. revert 05e92f8). I will let the CI for Kevin's changes finish up before doing that though. This will be much cleaner to not have the cu12/cu13 split.

The CI is failing on cu12/cu13 auto-detection again, this time with a new error. It looks like this will have to wait for another day. I will revert the most recent commit (84c814b), and once it passes the CI, I propose we merge it.

…ts files"""

This reverts commit 84c814baa4b5d9911506d74b5cf03c7f12128785.
@kvmto (Collaborator, Author) commented Apr 1, 2026

> Note: CI passed with 3b8015d, so we could go with that. However, I prefer one commit later (ce055a2). It is currently failing the CI, but I think the cuquantum team is publishing a change that would fix that.
>
> The cuquantum issue won't have a published fix until April 10th, so I reverted ce055a2.
>
> OK, they overachieved, and it is ready now: https://pypi.org/project/cuquantum-python/26.3.0.post0/
>
> I would like to re-revert ce055a2 (i.e. revert 05e92f8). I will let the CI for Kevin's changes finish up before doing that though. This will be much cleaner to not have the cu12/cu13 split.
>
> The CI is failing on cu12/cu13 auto-detection again, this time with a new error. It looks like this will have to wait for another day. I will revert the most recent commit (84c814b) and once it passes the CI, I propose we merge it.

agree

@kvmto kvmto merged commit 5aeebdf into NVIDIA:main Apr 1, 2026
13 checks passed
@bmhowe23 bmhowe23 deleted the custab_int_clean branch April 1, 2026 23:02
ivanbasov pushed a commit that referenced this pull request Apr 10, 2026
)

* Make cuStabilizer the sole DEM sampling backend and consolidate tests

Remove the torch fallback path from dem_sampling.py — cuquantum's
BitMatrixSampler is now the only sampling backend, simplifying the
module and eliminating the USE_CUSTAB toggle.  The sampler cache uses
identity-based comparison with a pre-cached transpose to avoid
redundant reallocation.

Merge test_dem_sampling_custab.py and test_dem_sampling_integration.py
into test_dem_sampling.py for a single, comprehensive test suite.

Also:
- Add cuquantum>=26.3.0 to requirements_public_train.txt
- Fix CI to install train (not inference) requirements for GPU tests
- Apply yapf formatting (Google style, 100-col limit)

Signed-off-by: kvmto <kmato@nvidia.com>

* fixed license

Signed-off-by: kvmto <kmato@nvidia.com>

* fix: use cuquantum-python-cu12 wheel to avoid pkg_resources build failure

The cuquantum meta-package fails to build in environments where
pkg_resources is unavailable. Pin the CUDA-12 specific wheel directly
to bypass the broken auto-detection setup.py.

Signed-off-by: kvmto <kmato@nvidia.com>

* lazy imports for safe separation between training and inference

Signed-off-by: kvmto <kmato@nvidia.com>

* quick fix to CI

Signed-off-by: kvmto <kmato@nvidia.com>

* route cuQuantum dem_sampling tests to GPU CI

Signed-off-by: kvmto <kmato@nvidia.com>

* left behind change

Signed-off-by: kvmto <kmato@nvidia.com>

* missing bash session

Signed-off-by: kvmto <kmato@nvidia.com>

* Make CUDA major version specific requirements files and use custabilizer-cuXY

* Revert some changes to test files that are hopefully no longer needed

* Revert REQUIRE_CUQUANTUM changes

* Change custabilizer version to 0.3.0

* Change custabilizer back to cuquantum-python

* Skip test_dem_sampling.py if required deps are not present

* Try again

* Skip a few more tests if cuquantum-python not installed

* Revert CUDA major version specific requirements files

Since custabilizer-cuXY is not a viable standalone package, there is no
need to try to make CUDA major version specific files. Rather, we just
rely on the auto detection logic in cuquantum-python.

* Revert "Revert CUDA major version specific requirements files"

This reverts commit ce055a2.

* small torch device object bug fix for nccl

Signed-off-by: kvmto <kmato@nvidia.com>

* overcome custab device id limitation

Signed-off-by: kvmto <kmato@nvidia.com>

* added tiny logging

Signed-off-by: kvmto <kmato@nvidia.com>

* linted

Signed-off-by: kvmto <kmato@nvidia.com>

* Revert "Revert "Revert CUDA major version specific requirements files""

This reverts commit 05e92f8.

* Revert "Revert "Revert "Revert CUDA major version specific requirements files"""

This reverts commit 84c814b.

---------

Signed-off-by: kvmto <kmato@nvidia.com>
Co-authored-by: Ben Howe <bhowe@nvidia.com>