Using `docker-compose` and CDI to pass a GPU through to a container via `podman`

# Running `docker` and `podman` directly

Works:
* `sudo docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L`
* `sudo podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L`

Does not work:
* `sudo docker run --rm --gpus all ubuntu nvidia-smi -L`
* `sudo podman run --rm --gpus all ubuntu nvidia-smi -L`

The `--gpus all` commands fail with the following output:
```
Error: crun: executable file `nvidia-smi` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
```
The NVIDIA device files are also missing inside the container.

I have installed `nvidia-container-toolkit` and run `sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`.
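For reference, here is a minimal sketch of how the generated spec could be sanity-checked, assuming it lives at the `/etc/cdi/nvidia.yaml` path from the generate command above. The helper name `check_cdi` is hypothetical and only for illustration. `nvidia-ctk cdi list` is the toolkit subcommand that prints the CDI device names it finds, and it is only invoked if `nvidia-ctk` is on `$PATH`.

```shell
#!/bin/sh
# Hypothetical helper (not part of any toolkit): report whether a CDI spec
# file exists and is non-empty, then list the CDI device names if
# nvidia-ctk is installed.
check_cdi() {
    # Default to the path used in the `nvidia-ctk cdi generate` command above.
    spec="${1:-/etc/cdi/nvidia.yaml}"

    if [ ! -s "$spec" ]; then
        echo "missing or empty CDI spec: $spec"
        return 1
    fi
    echo "found CDI spec: $spec"

    # Listing devices is optional; skip it quietly when the toolkit is absent.
    if command -v nvidia-ctk >/dev/null 2>&1; then
        nvidia-ctk cdi list
    fi
    return 0
}
```

Calling `check_cdi` with no argument checks the default path. The names it lists should be the ones passed to `--device`, such as `nvidia.com/gpu=all` in the working commands above.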

# Running emulated `docker-compose`

```yml
version: '3.8'
services:
  resource_test: # this is not working
    image: ubuntu:20.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, utility, compute]
    tty: true
    stdin_open: true
    command:
      - bash
      - -c
      - |
        nvidia-smi -L
  runtime_test: # this is working
    image: ubuntu:20.04
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    tty: true
    stdin_open: true
    command:
      - bash
      - -c
      - |
        nvidia-smi -L
```

Output of `sudo docker-compose up`:
```
[+] Running 2/2
 ✔ Container pytorch_test-resource_test-1  Recreated                                                                                                   0.3s 
 ✔ Container pytorch_test-runtime_test-1   Recreated                                                                                                   0.3s 
Attaching to pytorch_test-resource_test-1, pytorch_test-runtime_test-1
pytorch_test-resource_test-1  | bash: nvidia-smi: command not found
pytorch_test-runtime_test-1   | GPU 0: NVIDIA T600 (UUID: GPU-XXXXX)
pytorch_test-resource_test-1 exited with code 127
pytorch_test-runtime_test-1 exited with code 0
```

I have no idea what is wrong and would appreciate any advice.
