Device requests are not respected via API #22645

Open
@jennydaman

Issue Description

"Device requests" are how GPUs are invoked from the Docker API. However, device requests are not being respected by Podman when creating a container over the Podman socket.

Steps to reproduce the issue

Here is a Python script that tests the Docker and Podman socket APIs.

Setup: install Python version 3.10+ and run pip install docker==7.0.0

Run these tests:

import subprocess as sp
import docker
import docker.types

# Setup: create unix socket clients
# --------------------------------------------------------------------------------

podman_socket = sp.check_output(['podman', 'info', '--format', '{{ .Host.RemoteSocket.Path }}'], text=True).strip()
podman_client = docker.DockerClient(base_url=f'unix://{podman_socket}')
docker_client = docker.DockerClient(base_url='unix:///var/run/docker.sock')

# Sanity checks: assert podman is working
# --------------------------------------------------------------------------------

assert b'!... Hello Podman World ...!' in podman_client.containers.run('quay.io/podman/hello', auto_remove=True)

# Sanity checks: assert podman and docker both work with nvidia-container-toolkit
# --------------------------------------------------------------------------------

def test_nvidia_smi_works_using_command(command: str):
    assert sp.check_output([command, 'run', '--rm', '--gpus=all', 'registry.access.redhat.com/ubi9:9.4-947.1714667021', 'nvidia-smi', '-L']).startswith(b'GPU 0')


test_nvidia_smi_works_using_command('docker')
test_nvidia_smi_works_using_command('podman')

# Bug reproduction cases
# --------------------------------------------------------------------------------

GPU_REQUEST = {
    'device_requests': [ docker.types.DeviceRequest(count=1, capabilities=[['gpu']]) ]
}

def test_nvidia_smi_works_using_client(client: docker.DockerClient):
    assert client.containers.run('registry.access.redhat.com/ubi9:9.4-947.1714667021', ['nvidia-smi', '-L'], **GPU_REQUEST).startswith(b'GPU 0')


test_nvidia_smi_works_using_client(docker_client)  # pass
test_nvidia_smi_works_using_client(podman_client)  # fail


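def test_device_request_goes_through(client: docker.DockerClient):
    container = client.containers.run('registry.access.redhat.com/ubi9:9.4-947.1714667021', ['nvidia-smi', '-L'], detach=True, **GPU_REQUEST)
    try:
        # The API may report null instead of a list, so normalize before asserting
        device_requests = container.attrs['HostConfig'].get('DeviceRequests') or []
        assert len(device_requests) > 0
        # Capabilities is a list of lists (OR of AND-sets), e.g. [['gpu']]
        assert any(['gpu'] in (request.get('Capabilities') or []) for request in device_requests)
    finally:
        # detach=True leaves the container behind, so clean it up
        container.remove(force=True)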


test_device_request_goes_through(docker_client)  # pass
test_device_request_goes_through(podman_client)  # fail
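To rule out docker-py as the culprit, the same device request can be sent straight to each socket. The following sketch uses only the Python standard library (http.client over an AF_UNIX socket) and reads .HostConfig.DeviceRequests back from the inspect response. It only creates the container and never starts it. It therefore checks whether the request survives the API round-trip, not whether the GPU works.

This rests on several assumptions:
- It uses the unversioned Docker-compatible endpoints POST /containers/create, GET /containers/{id}/json and DELETE /containers/{id}.
- The image is already pulled locally, because create does not pull.
- All helper names are hypothetical.

```python
import http.client
import json
import socket
import sys
from typing import Any, Optional, Tuple

# Assumption: the image is already present locally (create does not pull it).
IMAGE = "registry.access.redhat.com/ubi9:9.4-947.1714667021"

# Same request as GPU_REQUEST in the script above, written as raw Engine API JSON.
GPU_DEVICE_REQUEST = {
    "Driver": "",
    "Count": 1,
    "DeviceIDs": [],
    "Capabilities": [["gpu"]],
    "Options": {},
}


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 30.0) -> None:
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def call(socket_path: str, method: str, path: str,
         body: Optional[dict] = None) -> Tuple[int, Any]:
    """Send one request over the socket; return (status, decoded JSON or None)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        payload = json.dumps(body).encode() if body is not None else None
        headers = {"Content-Type": "application/json"} if payload is not None else {}
        conn.request(method, path, body=payload, headers=headers)
        resp = conn.getresponse()
        data = resp.read()
        return resp.status, (json.loads(data) if data else None)
    finally:
        conn.close()


def device_requests(info: dict) -> list:
    """Return .HostConfig.DeviceRequests from an inspect response, treating null as []."""
    return (info.get("HostConfig") or {}).get("DeviceRequests") or []


def has_gpu_capability(info: dict) -> bool:
    """True if any device request carries the capability set ['gpu']."""
    return any(["gpu"] in (req.get("Capabilities") or []) for req in device_requests(info))


def check(socket_path: str) -> None:
    status, created = call(socket_path, "POST", "/containers/create", {
        "Image": IMAGE,
        "Cmd": ["nvidia-smi", "-L"],
        "HostConfig": {"DeviceRequests": [GPU_DEVICE_REQUEST]},
    })
    if status != 201:
        raise RuntimeError(f"create failed: {status} {created}")
    cid = created["Id"]
    try:
        _, info = call(socket_path, "GET", f"/containers/{cid}/json")
        print("DeviceRequests:", device_requests(info))
        print("gpu capability present:", has_gpu_capability(info))
    finally:
        # force=true also removes the container if it is somehow running
        call(socket_path, "DELETE", f"/containers/{cid}?force=true")


# Only contact a daemon when a socket path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    check(sys.argv[1])
```

Usage, assuming the file is saved under a hypothetical name such as check_device_requests.py: run `python check_device_requests.py "$(podman info --format '{{ .Host.RemoteSocket.Path }}')"` and then `python check_device_requests.py /var/run/docker.sock`. Suppose Docker prints a non-empty list and Podman prints [] for the identical body. That would point at the Podman compat API rather than at docker-py.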

Describe the results you received

  • Device requests are not honored when a container is created via the Podman socket: running nvidia-smi through the Podman client fails, and the created container does not carry the requested GPU in .HostConfig.DeviceRequests

Describe the results you expected

  • It should be possible to create containers with GPUs over the Podman socket API
  • The created container should have a non-empty value for .HostConfig.DeviceRequests

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.3
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.11-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: e21e7c85b7637e622f21c57675bf1154fc8b1866'
  cpuUtilization:
    idlePercent: 94.1
    systemPercent: 1.54
    userPercent: 4.36
  cpus: 20
  databaseBackend: boltdb
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  freeLocks: 2012
  hostname: geo
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.8.9-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 97578004480
  memTotal: 134802944000
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: /usr/lib/podman/aardvark-dns is owned by aardvark-dns 1.10.0-2
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: /usr/lib/podman/netavark is owned by netavark 1.10.3-1
    path: /usr/lib/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.15-1
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: /usr/bin/pasta is owned by passt 2024_04_26.d03c4e2-1
    version: |
      pasta 2024_04_26.d03c4e2
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.3.0-1
    version: |-
      slirp4netns version 1.3.0
      commit: 8a4d4391842f00b9c940bb8f067964427eb0c964
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 0
  swapTotal: 0
  uptime: 1h 31m 25.00s (Approximately 0.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/jenni/.config/containers/storage.conf
  containerStore:
    number: 20
    paused: 0
    running: 14
    stopped: 6
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/jenni/.local/share/containers/storage
  graphRootAllocated: 1578640605184
  graphRootUsed: 1019202039808
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 56
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/jenni/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.2
  Built: 1713438799
  BuiltTime: Thu Apr 18 07:13:19 2024
  GitCommit: 3304dd95b8978a8346b96b7d43134990609b3b29-dirty
  GoVersion: go1.22.2
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

$ nvidia-container-cli info
NVRM version:   550.78
CUDA version:   12.4

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 3080 Ti
Brand:          GeForce
GPU UUID:       GPU-c61acb21-8716-6540-271c-39beab917d03
Bus Location:   00000000:01:00.0
Architecture:   8.6

Additional information

No response

Labels

kind/bug
