@elezar elezar commented Oct 16, 2025

This change fixes a regression introduced in #1267 that broke device selection via the NVIDIA_VISIBLE_DEVICES environment variable. In the example below, requesting device 0 still exposed all eight GPUs to the container.

Before this change:

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-4cf8db2d-06c0-7d70-1a51-e59b25b2c16c)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-4404041a-04cf-1ccf-9e70-f139a9b1e23c)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-79a2ba02-a537-ccbf-2965-8e9d90c0bd54)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-662077db-fa3f-0d8f-9502-21ab0ef058a2)
GPU 4: NVIDIA A100-SXM4-40GB (UUID: GPU-ec9d53cc-125d-d4a3-9687-304df8eb4749)
GPU 5: NVIDIA A100-SXM4-40GB (UUID: GPU-3eb87630-93d5-b2b6-b8ff-9b359caf4ee2)
GPU 6: NVIDIA A100-SXM4-40GB (UUID: GPU-8216274a-c05d-def0-af18-c74647300267)
GPU 7: NVIDIA A100-SXM4-40GB (UUID: GPU-b1028956-cfa2-0990-bf4a-5da9abb51763)

With this change:

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-4cf8db2d-06c0-7d70-1a51-e59b25b2c16c)

As a follow-up, we should consider switching to a multi-GPU instance for our testing.
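The before/after comparison above could be checked automatically on a multi-GPU host. The following is only a minimal sketch, not the test that was actually added: the helper names `count_visible_gpus` and `expect_gpu_count` are hypothetical, and it assumes each GPU appears in the `nvidia-smi -L` output as one line starting with `GPU <index>:`, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: count the GPUs in an `nvidia-smi -L` listing read from stdin.
# Assumes one line per GPU of the form "GPU <index>: ...".
count_visible_gpus() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Hypothetical helper: succeed only if the listing on stdin shows exactly $1 GPUs.
expect_gpu_count() {
    expected="$1"
    actual="$(count_visible_gpus)"
    if [ "$actual" -ne "$expected" ]; then
        echo "expected $expected visible GPU(s), got $actual" >&2
        return 1
    fi
}

# Example invocation (requires a multi-GPU host with the nvidia runtime configured):
#   docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 ubuntu nvidia-smi -L \
#       | expect_gpu_count 1
```

With the regression present, the example invocation would presumably fail on the eight-GPU host shown above, since all eight devices were listed. On a single-GPU host the check passes either way, which is why a multi-GPU instance is needed to catch this class of bug.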

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar merged commit fae680c into NVIDIA:main Oct 16, 2025
13 checks passed
@elezar elezar deleted the fix-device-filtering branch October 16, 2025 07:38
elezar commented Oct 16, 2025

I have created #1356 to add multi-GPU tests to the CI.

@elezar elezar added this to the v1.18.0 milestone Oct 16, 2025