Running docker and podman directly

Works:
sudo docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
sudo podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
Does not work:
sudo docker run --rm --gpus all ubuntu nvidia-smi -L
sudo podman run --rm --gpus all ubuntu nvidia-smi -L
The --gpus all commands fail with the following output:
Error: crun: executable file `nvidia-smi` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
The NVIDIA device files are also not present in the container.
I have installed nvidia-container-toolkit and ran sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml.
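
In case it helps narrow things down: as far as I understand, --gpus all goes through the NVIDIA runtime hook rather than through CDI, so registering the runtime with Docker is a separate step from generating the CDI spec. The commands below are what I believe the toolkit documentation describes for that, plus a check of the generated CDI devices (treat this as a sketch, not something I have re-verified on this exact setup):

# register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# list the devices defined in the generated CDI spec
nvidia-ctk cdi list

If the spec was generated correctly, nvidia-ctk cdi list should include nvidia.com/gpu=all alongside the individual GPUs.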
Running emulated docker-compose
version: '3.8'

services:
  resource_test: # this is not working
    image: ubuntu:20.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, utility, compute]
    tty: true
    stdin_open: true
    command:
      - bash
      - -c
      - |
        nvidia-smi -L

  runtime_test: # this is working
    image: ubuntu:20.04
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    tty: true
    stdin_open: true
    command:
      - bash
      - -c
      - |
        nvidia-smi -L
sudo docker-compose up
[+] Running 2/2
✔ Container pytorch_test-resource_test-1 Recreated 0.3s
✔ Container pytorch_test-runtime_test-1 Recreated 0.3s
Attaching to pytorch_test-resource_test-1, pytorch_test-runtime_test-1
pytorch_test-resource_test-1 | bash: nvidia-smi: command not found
pytorch_test-runtime_test-1 | GPU 0: NVIDIA T600 (UUID: GPU-XXXXX)
pytorch_test-resource_test-1 exited with code 127
pytorch_test-runtime_test-1 exited with code 0
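
If it is relevant: my understanding is that the deploy.resources.reservations.devices request with driver: nvidia is the Compose counterpart of --gpus, so it presumably fails for the same reason as the direct --gpus all runs above, while runtime: nvidia with NVIDIA_VISIBLE_DEVICES=all takes the runtime path that does work here. Newer Docker/Compose releases apparently also accept CDI device requests directly; the snippet below is an untested sketch based on the Compose documentation and assumes CDI support is enabled in the daemon:

services:
  cdi_test:
    image: ubuntu:20.04
    deploy:
      resources:
        reservations:
          devices:
            # request the device defined in /etc/cdi/nvidia.yaml
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=all
    command: ["nvidia-smi", "-L"]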
I have no idea what is wrong and would appreciate any advice.