Improve cuStabilizer DEM sampling backend and consolidate tests #32
Remove the torch fallback path from dem_sampling.py; cuquantum's BitMatrixSampler is now the only sampling backend, simplifying the module and eliminating the USE_CUSTAB toggle. The sampler cache uses identity-based comparison with a pre-cached transpose to avoid redundant reallocation.
Merge test_dem_sampling_custab.py and test_dem_sampling_integration.py into test_dem_sampling.py for a single, comprehensive test suite.
Also:
- Add cuquantum>=26.3.0 to requirements_public_train.txt
- Fix CI to install train (not inference) requirements for GPU tests
- Apply yapf formatting (Google style, 100-col limit)
Signed-off-by: kvmto <kmato@nvidia.com>
…lure The cuquantum meta-package fails to build in environments where pkg_resources is unavailable. Pin the CUDA-12 specific wheel directly to bypass the broken auto-detection setup.py. Signed-off-by: kvmto <kmato@nvidia.com>
CI failures root cause
Two related issues, both from the same design decision — making cuquantum.BitMatrixSampler the sole backend without adding skip guards in the tests.
Issue 1 — Unit-test CI (CPU runners): ModuleNotFoundError: No module named 'cuquantum'
dem_sampling() now does a hard import of cuquantum.stabilizer unconditionally at call time (line 81 of the new dem_sampling.py):

    def dem_sampling(...):
        from cuquantum.stabilizer.dem_sampling import BitMatrixSampler  # <-- always runs
        from cuquantum.stabilizer.simulator import Options

The CPU CI runner doesn't have cuquantum installed, so every test that calls dem_sampling() crashes immediately. Affected test classes in test_dem_sampling.py:
- TestBitMatrixSamplerIsUsed: imports cuquantum directly, no skip guard
- TestDemSamplingCPU: calls dem_sampling(), no skip guard
- TestDEMSamplingIntegration: calls generate_batch() → dem_sampling(), no skip guard
- TestCustVsTorchDeterministic: calls dem_sampling(), no skip guard
- TestCustVsTorchStatistical.test_small_matrix_cpu: calls dem_sampling(), no skip guard
The old test_dem_sampling_custab.py handled this correctly with @unittest.skipUnless(_custab_available(), ...) at class level. That decorator was dropped when the files were consolidated.
Issue 2 — GPU CI runner: ModuleNotFoundError: No module named 'cuquantum.stabilizer'
The GPU runner has cuquantum installed, but an older version that predates the stabilizer subpackage (which requires >=26.3.0). Same test classes fail for the same reason — no skip guard catches the import error.
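Both failure modes surface as ImportError subclasses, so a single probe can cover them. The following is a minimal sketch of such a probe; the helper name module_available and the constant CUSTAB_AVAILABLE are hypothetical and are not taken from the repository (the actual _custab_available() on main may be written differently).

```python
import importlib


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be imported, False otherwise.

    Hypothetical helper for illustration only.
    """
    try:
        importlib.import_module(dotted_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so this covers
        # both a missing top-level package (Issue 1) and a missing
        # subpackage inside an installed but older package (Issue 2).
        return False
    return True


# Evaluated once at import time; test modules can key skip guards off it.
CUSTAB_AVAILABLE = module_available("cuquantum.stabilizer.dem_sampling")
```

Because the probe imports the full dotted path, it returns False on a runner without cuquantum and on a runner whose cuquantum has no stabilizer subpackage.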
Suggested fix
The cleanest approach is to move the cuquantum import to module level with a try/except (restoring _CUSTAB_AVAILABLE), and add @unittest.skipUnless(_dem_mod._CUSTAB_AVAILABLE, "cuquantum>=26.3.0 not available") to all five affected test classes. The old code on main already does this correctly in _custab_available() / _CUSTAB_AVAILABLE — the pattern just needs to be carried forward.

    # dem_sampling.py (module level)
    try:
        from cuquantum.stabilizer.dem_sampling import BitMatrixSampler
        from cuquantum.stabilizer.simulator import Options
        _CUSTAB_AVAILABLE = True
    except ImportError:
        BitMatrixSampler = None
        Options = None
        _CUSTAB_AVAILABLE = False

    # test_dem_sampling.py
    @unittest.skipUnless(_dem_mod._CUSTAB_AVAILABLE, "cuquantum>=26.3.0 (stabilizer) not available")
    class TestDemSamplingCPU(unittest.TestCase): ...

    @unittest.skipUnless(_dem_mod._CUSTAB_AVAILABLE, "cuquantum>=26.3.0 (stabilizer) not available")
    class TestBitMatrixSamplerIsUsed(unittest.TestCase): ...

    # ... same for TestDEMSamplingIntegration, TestCustVsTorchDeterministic, TestCustVsTorchStatistical

This makes all five classes skip gracefully on CPU CI (no cuquantum) and on GPU CI until the runner is updated to cuquantum>=26.3.0, while still running correctly when the right version is present.
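To illustrate how a class-level skipUnless guard behaves when the backend is missing, here is a small self-contained sketch. The flag _BACKEND_AVAILABLE and both test classes are stand-ins invented for this example; they are not the real _CUSTAB_AVAILABLE flag or the real test classes.

```python
import unittest

# Stand-in for _dem_mod._CUSTAB_AVAILABLE on a runner without cuquantum.
_BACKEND_AVAILABLE = False


@unittest.skipUnless(_BACKEND_AVAILABLE, "backend not available")
class TestNeedsBackend(unittest.TestCase):

    def test_sample(self):
        # Would crash without the backend; never executed when skipped.
        raise RuntimeError("backend missing")


class TestAlwaysRuns(unittest.TestCase):

    def test_trivial(self):
        self.assertEqual(1 + 1, 2)


def run_suite() -> unittest.TestResult:
    """Run both classes and return the collected result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestNeedsBackend))
    suite.addTests(loader.loadTestsFromTestCase(TestAlwaysRuns))
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suite()
```

The guarded class is reported as skipped rather than as an error, so the overall run still counts as successful.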
Otherwise, looks good!
Since custabilizer-cuXY is not a viable standalone package, there is no need to try to make CUDA major version specific files. Rather, we just rely on the auto detection logic in cuquantum-python.
This reverts commit ce055a2ddcb735f43dc7bd7a21e15e3bcc64dd09.
OK, they overachieved, and it is ready now: https://pypi.org/project/cuquantum-python/26.3.0.post0/ I would like to re-revert ce055a2 (i.e. revert 05e92f8). I will let the CI for Kevin's changes finish up before doing that though. It will be much cleaner not to have the cu12/cu13 split.
This reverts commit 05e92f842a6b01cf1b327b4162e5881d0b951f58.
The CI is failing on cu12/cu13 auto-detection again, this time with a new error. It looks like this will have to wait for another day. I will revert the most recent commit (84c814b) and once it passes the CI, I propose we merge it.
…ts files""" This reverts commit 84c814baa4b5d9911506d74b5cf03c7f12128785.
agree
)
* Make cuStabilizer the sole DEM sampling backend and consolidate tests
  Remove the torch fallback path from dem_sampling.py; cuquantum's BitMatrixSampler is now the only sampling backend, simplifying the module and eliminating the USE_CUSTAB toggle. The sampler cache uses identity-based comparison with a pre-cached transpose to avoid redundant reallocation.
  Merge test_dem_sampling_custab.py and test_dem_sampling_integration.py into test_dem_sampling.py for a single, comprehensive test suite.
  Also:
  - Add cuquantum>=26.3.0 to requirements_public_train.txt
  - Fix CI to install train (not inference) requirements for GPU tests
  - Apply yapf formatting (Google style, 100-col limit)
  Signed-off-by: kvmto <kmato@nvidia.com>
* fixed license
  Signed-off-by: kvmto <kmato@nvidia.com>
* fix: use cuquantum-python-cu12 wheel to avoid pkg_resources build failure
  The cuquantum meta-package fails to build in environments where pkg_resources is unavailable. Pin the CUDA-12 specific wheel directly to bypass the broken auto-detection setup.py.
  Signed-off-by: kvmto <kmato@nvidia.com>
* lazy imports for safe separation between training and inference
  Signed-off-by: kvmto <kmato@nvidia.com>
* quick fix to CI
  Signed-off-by: kvmto <kmato@nvidia.com>
* route cuQuantum dem_sampling tests to GPU CI
  Signed-off-by: kvmto <kmato@nvidia.com>
* left behind change
  Signed-off-by: kvmto <kmato@nvidia.com>
* missing bash session
  Signed-off-by: kvmto <kmato@nvidia.com>
* Make CUDA major version specific requirements files and use custabilizer-cuXY
* Revert some changes to test files that are hopefully no longer needed
* Revert REQUIRE_CUQUANTUM changes
* Change custabilizer version to 0.3.0
* Change custabilizer back to cuquantum-python
* Skip test_dem_sampling.py if required deps are not present
* Try again
* Skip a few more tests if cuquantum-python not installed
* Revert CUDA major version specific requirements files
  Since custabilizer-cuXY is not a viable standalone package, there is no need to try to make CUDA major version specific files. Rather, we just rely on the auto detection logic in cuquantum-python.
* Revert "Revert CUDA major version specific requirements files"
  This reverts commit ce055a2.
* small torch device object bug fix for nccl
  Signed-off-by: kvmto <kmato@nvidia.com>
* overcome custab device id limitation
  Signed-off-by: kvmto <kmato@nvidia.com>
* added tiny logging
  Signed-off-by: kvmto <kmato@nvidia.com>
* linted
  Signed-off-by: kvmto <kmato@nvidia.com>
* Revert "Revert "Revert CUDA major version specific requirements files""
  This reverts commit 05e92f8.
* Revert "Revert "Revert "Revert CUDA major version specific requirements files"""
  This reverts commit 84c814b.
---------
Signed-off-by: kvmto <kmato@nvidia.com>
Co-authored-by: Ben Howe <bhowe@nvidia.com>
Summary
- Remove the torch fallback path from dem_sampling.py, making cuQuantum's cuStabilizer BitMatrixSampler the sole DEM sampling backend. This eliminates the USE_CUSTAB env-var toggle, the _use_custab() helper, and the custab_matrix_sampling() indirection; dem_sampling() now calls cuST directly.
- Replace the id(H) address-based cache key with a proper is-identity check (_cached_H is not H) and pre-cache H.T to avoid repeated transposes.
- Merge test_dem_sampling_custab.py and test_dem_sampling_integration.py into test_dem_sampling.py, removing the now-unnecessary fallback-specific tests.
- Add cuquantum>=26.3.0 as an explicit dependency in requirements_public_train.txt.
- Fix CI to install requirements_public_train.txt (which includes inference deps) instead of requirements_public_inference.txt alone.
Test plan
- test_dem_sampling.py covers CPU, GPU, statistical, integration, and cuST-vs-torch deterministic cross-checks
- cuquantum>=26.3.0 resolves correctly in the CI environment
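The identity-based cache mentioned in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in dem_sampling.py: the class IdentityCache, the pure-Python transpose helper, and the list-of-lists matrix are all hypothetical stand-ins (the real module presumably works on tensors and caches a sampler alongside H.T).

```python
from typing import Any, Callable

_MISSING = object()  # sentinel so that None can also be used as a key


class IdentityCache:
    """Cache one derived value, keyed on object identity (hypothetical)."""

    def __init__(self, build: Callable[[Any], Any]) -> None:
        self._build = build
        self._cached_key: Any = _MISSING
        self._cached_value: Any = None
        self.builds = 0  # number of rebuilds, kept for inspection

    def get(self, key: Any) -> Any:
        # `is not` compares identity. Holding a strong reference to the key
        # keeps it alive, so a stale hit cannot occur; a bare id(key) can
        # be reused by a new object once the old one is garbage collected.
        if self._cached_key is not key:
            self._cached_key = key
            self._cached_value = self._build(key)
            self.builds += 1
        return self._cached_value


def transpose(rows):
    """Pure-Python stand-in for H.T on a list-of-lists matrix."""
    return [list(col) for col in zip(*rows)]


H = [[1, 0, 1], [0, 1, 1]]
cache = IdentityCache(transpose)
H_T_first = cache.get(H)   # builds the transpose
H_T_second = cache.get(H)  # same object passed again: cached value reused
```

Note that an equal but distinct matrix object triggers a rebuild, which is the intended trade-off here: the identity check is constant-time, whereas an element-wise comparison would cost as much as the work being cached.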