Fix test_cuda_device_order on some multi-GPU systems #1590
mdboom merged 3 commits into NVIDIA:main from
Conversation
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.

/ok to test
Pull request overview
Adjusts the NVML/CUDA device-order test to avoid failures on multi-GPU systems where device visibility differs depending on CUDA_VISIBLE_DEVICES.
Changes:
- Updates test_cuda_device_order to accept the monkeypatch fixture.
- Deletes CUDA_VISIBLE_DEVICES during the test run before querying CUDA/NVML devices.
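The two changes above can be sketched as follows. This is a hedged reconstruction, not the actual test from tests/nvml/test_cuda.py: the helpers get_cuda_devices and get_nvml_devices are hypothetical stand-ins for the real CUDA/NVML enumeration, with hard-coded data so the sketch is self-contained.

```python
def get_cuda_devices():
    # Hypothetical stand-in; the real test enumerates via the CUDA driver API.
    return [{"id": 75, "name": "NVIDIA A10G"}]

def get_nvml_devices():
    # Hypothetical stand-in; the real test enumerates via NVML.
    return [{"id": 75, "name": "NVIDIA A10G"}]

def test_cuda_device_order(monkeypatch):
    # Delete CUDA_VISIBLE_DEVICES so enumeration is not filtered by the
    # caller's environment; raising=False tolerates an already-unset variable.
    monkeypatch.delenv("CUDA_VISIBLE_DEVICES", raising=False)
    cuda_devices = get_cuda_devices()
    nvml_devices = get_nvml_devices()
    # NVML always sees all devices, so CUDA can report at most as many,
    # and each CUDA device should still appear in the NVML list.
    assert len(cuda_devices) <= len(nvml_devices)
    for cuda_device in cuda_devices:
        assert cuda_device in nvml_devices, f"CUDA device {cuda_device} not found in NVML device list"
```

Accepting the monkeypatch fixture (rather than mutating os.environ directly) means pytest restores the variable automatically after the test, so other tests still see the caller's original environment.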
/ok to test
# and each of them should still be found in NVML devices.
assert len(cuda_devices) <= len(nvml_devices)
for cuda_device in cuda_devices:
    assert cuda_device in nvml_devices, f"CUDA device {cuda_device} not found in NVML device list"
Give me a sec to experiment: does the f-string here suppress the helpful pytest default behavior or not?
This is great as-is.
I hacked the test so that it fails on my workstation. This is the diff with/OUT the f-string:
(TestVenv) smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python/cuda_bindings $ diff -u $Z/withOUT_fstring $Z/with_fstring
--- /wrk/z/withOUT_fstring 2026-02-09 08:59:27.153944722 -0800
+++ /wrk/z/with_fstring 2026-02-09 08:59:11.085012520 -0800
@@ -2,7 +2,7 @@
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /wrk/forked/cuda-python/TestVenv/bin/python
cachedir: .pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
-Using --randomly-seed=1514160743
+Using --randomly-seed=3206929400
rootdir: /wrk/forked/cuda-python/cuda_bindings
configfile: pyproject.toml
plugins: repeat-0.9.4, benchmark-5.2.3, mock-3.15.1, randomly-4.0.1
@@ -25,8 +25,9 @@
# and each of them should still be found in NVML devices.
assert len(cuda_devices) <= len(nvml_devices)
for cuda_device in cuda_devices:
-> assert cuda_device not in nvml_devices
-E AssertionError: assert {'id': 75, 'name': 'NVIDIA A10G'} not in [{'id': 75, 'name': 'NVIDIA A10G'}]
+> assert cuda_device not in nvml_devices, f"CUDA device {cuda_device} not found in NVML device list"
+E AssertionError: CUDA device {'name': 'NVIDIA A10G', 'id': 75} not found in NVML device list
+E assert {'id': 75, 'name': 'NVIDIA A10G'} not in [{'id': 75, 'name': 'NVIDIA A10G'}]
cuda_device = {'id': 75, 'name': 'NVIDIA A10G'}
cuda_devices = [{'id': 75, 'name': 'NVIDIA A10G'}]
@@ -34,5 +35,5 @@
tests/nvml/test_cuda.py:67: AssertionError
=========================== short test summary info ============================
-FAILED tests/nvml/test_cuda.py::test_cuda_device_order - AssertionError: asse...
-============================== 1 failed in 0.38s ===============================
+FAILED tests/nvml/test_cuda.py::test_cuda_device_order - AssertionError: CUDA...
+============================== 1 failed in 0.41s ===============================
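The diff above answers the question: the f-string message is additive. Pytest prints the custom message as the first AssertionError line and still emits its introspected comparison underneath. Outside pytest, plain Python simply attaches the message to the exception, as this minimal check shows (the device dict mirrors the example above; the empty NVML list is deliberate so the assertion fails):

```python
cuda_device = {"id": 75, "name": "NVIDIA A10G"}
nvml_devices = []  # empty on purpose so the assertion fails

try:
    assert cuda_device in nvml_devices, f"CUDA device {cuda_device} not found in NVML device list"
except AssertionError as exc:
    # The f-string becomes the exception's message; pytest's assertion
    # rewriting then appends its own introspection line below it.
    message = str(exc)

assert "not found in NVML device list" in message
assert "NVIDIA A10G" in message
```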
rwgk left a comment
LGTM except for the suspected possible/possibly typo.
Co-authored-by: Ralf W. Grosse-Kunstleve <rwgkio@gmail.com>
/ok to test
The CUDA_VISIBLE_DEVICES environment variable controls whether all devices are included in cuDeviceGetCount, or just those that are available for CUDA compute. We should make the test adaptable to either case.
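The filtering described above can be illustrated with a small model. This is a hypothetical sketch of the variable's effect, not the CUDA driver's actual parsing logic (which has additional rules, e.g. for invalid ordinals and UUIDs): when the variable is unset every device is enumerable, and when set only the listed comma-separated ordinals are visible.

```python
def visible_device_count(total_devices, cuda_visible_devices=None):
    # Hypothetical model of CUDA_VISIBLE_DEVICES filtering: unset means all
    # devices are visible; set means only the listed ordinals that actually
    # exist are counted by enumeration.
    if cuda_visible_devices is None:
        return total_devices
    ordinals = [tok.strip() for tok in cuda_visible_devices.split(",") if tok.strip()]
    return sum(1 for tok in ordinals if tok.isdigit() and int(tok) < total_devices)

assert visible_device_count(4) == 4         # unset: all four devices visible
assert visible_device_count(4, "0,2") == 2  # set: only the two listed ordinals
```

Deleting the variable before the test queries devices (as this PR does) puts every system in the "unset" case, so the CUDA and NVML device counts can be compared without depending on the caller's environment.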