Device requests are not respected via API #22645

Open
jennydaman opened this issue May 8, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jennydaman

Issue Description

"Device requests" are how GPUs are invoked from the Docker API. However, device requests are not being respected by Podman when creating a container over the Podman socket.

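For readers who know the CLI better than the API: as far as we understand the Docker CLI, docker run --gpus=all becomes a single DeviceRequests entry with Count -1 (meaning "all devices") and Capabilities [["gpu"]], while --gpus=N requests N devices. The sketch below builds that JSON fragment by hand. The helper gpus_flag_to_device_request is hypothetical, and only the "all" and integer forms of the flag are handled; device=... selectors are left out.

```python
import json


def gpus_flag_to_device_request(value: str) -> dict:
    """Hypothetical helper: map a --gpus value ("all" or a positive integer)
    to a DeviceRequest object as it appears in HostConfig.DeviceRequests.

    Only the two simplest forms of the flag are handled in this sketch.
    """
    if value == "all":
        count = -1  # the Docker API uses -1 to mean "all devices"
    else:
        count = int(value)
        if count < 1:
            raise ValueError(f"expected 'all' or a positive integer, got {value!r}")
    return {
        "Driver": "",
        "Count": count,
        "DeviceIDs": [],
        # An OR-list of AND-lists of capabilities; [["gpu"]] asks for GPU devices.
        "Capabilities": [["gpu"]],
        "Options": {},
    }


# Fragment of a POST /containers/create body roughly equivalent to --gpus=all
host_config = {"HostConfig": {"DeviceRequests": [gpus_flag_to_device_request("all")]}}
print(json.dumps(host_config, indent=2))
```

Compared with the reproduction script below, note that the CLI tests use --gpus=all (Count -1) while the API tests request count=1; both should yield a non-empty DeviceRequests list.
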
Steps to reproduce the issue

Here is a Python script which tests the Docker and Podman socket APIs.

Setup: install Python version 3.10+ and run pip install docker==7.0.0

Run these tests:

import subprocess as sp
import docker
import docker.types

# Setup: create unix socket clients
# --------------------------------------------------------------------------------

podman_socket = sp.check_output(['podman', 'info', '--format', '{{ .Host.RemoteSocket.Path }}'], text=True).strip()
podman_client = docker.DockerClient(base_url=f'unix://{podman_socket}')
docker_client = docker.DockerClient(base_url='unix:///var/run/docker.sock')

# Sanity checks: assert podman is working
# --------------------------------------------------------------------------------

assert b'!... Hello Podman World ...!' in podman_client.containers.run('quay.io/podman/hello', auto_remove=True)

# Sanity checks: assert podman and docker both work with nvidia-container-toolkit
# --------------------------------------------------------------------------------

def test_nvidia_smi_works_using_command(command: str):
    assert sp.check_output([command, 'run', '--rm', '--gpus=all', 'registry.access.redhat.com/ubi9:9.4-947.1714667021', 'nvidia-smi', '-L']).startswith(b'GPU 0')


test_nvidia_smi_works_using_command('docker')
test_nvidia_smi_works_using_command('podman')

# Bug reproduction cases
# --------------------------------------------------------------------------------

GPU_REQUEST = {
    'device_requests': [ docker.types.DeviceRequest(count=1, capabilities=[['gpu']]) ]
}

def test_nvidia_smi_works_using_client(client: docker.DockerClient):
    assert client.containers.run('registry.access.redhat.com/ubi9:9.4-947.1714667021', ['nvidia-smi', '-L'], **GPU_REQUEST).startswith(b'GPU 0')


test_nvidia_smi_works_using_client(docker_client)  # pass
test_nvidia_smi_works_using_client(podman_client)  # fail


def test_device_request_goes_through(client: docker.DockerClient):
    container = client.containers.run('registry.access.redhat.com/ubi9:9.4-947.1714667021', ['nvidia-smi', '-L'], detach=True, **GPU_REQUEST)
    device_requests = container.attrs['HostConfig'].get('DeviceRequests') or []
    assert len(device_requests) > 0
    # Capabilities is a list of lists (e.g. [['gpu']]), not a flat list
    assert any(['gpu'] in (request.get('Capabilities') or []) for request in device_requests)


test_device_request_goes_through(docker_client)  # pass
test_device_request_goes_through(podman_client)  # fail
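
When checking inspect output, keep in mind that the Docker API types Capabilities as a list of lists: the outer list is an OR of alternatives and each inner list is an AND of required capabilities. So docker-py's capabilities=[['gpu']] round-trips as [["gpu"]], not ["gpu"]. A minimal matcher under that reading might look like this; requests_capability is a hypothetical name and the sample dictionaries are illustrative, not captured output.

```python
def requests_capability(host_config: dict, capability: str) -> bool:
    """Return True if any DeviceRequest in an inspected HostConfig has an
    alternative (inner AND-list) that includes `capability`.

    Hypothetical helper; tolerates DeviceRequests being missing or null,
    which is how an ignored request would likely show up.
    """
    for request in host_config.get("DeviceRequests") or []:
        for and_list in request.get("Capabilities") or []:
            if capability in and_list:
                return True
    return False


# Illustrative inspect fragments (not real output)
with_gpu = {"DeviceRequests": [{"Count": 1, "Capabilities": [["gpu"]]}]}
without = {"DeviceRequests": None}
print(requests_capability(with_gpu, "gpu"))  # True
print(requests_capability(without, "gpu"))   # False
```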

Describe the results you received

  • Device requests are not honored when creating a container via the Podman socket

Describe the results you expected

  • It should be possible to create containers with GPUs over the Podman socket API
  • The created container should have a non-empty value for .HostConfig.DeviceRequests

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.3
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.11-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: e21e7c85b7637e622f21c57675bf1154fc8b1866'
  cpuUtilization:
    idlePercent: 94.1
    systemPercent: 1.54
    userPercent: 4.36
  cpus: 20
  databaseBackend: boltdb
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  freeLocks: 2012
  hostname: geo
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.8.9-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 97578004480
  memTotal: 134802944000
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: /usr/lib/podman/aardvark-dns is owned by aardvark-dns 1.10.0-2
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: /usr/lib/podman/netavark is owned by netavark 1.10.3-1
    path: /usr/lib/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.15-1
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: /usr/bin/pasta is owned by passt 2024_04_26.d03c4e2-1
    version: |
      pasta 2024_04_26.d03c4e2
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.3.0-1
    version: |-
      slirp4netns version 1.3.0
      commit: 8a4d4391842f00b9c940bb8f067964427eb0c964
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 0
  swapTotal: 0
  uptime: 1h 31m 25.00s (Approximately 0.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/jenni/.config/containers/storage.conf
  containerStore:
    number: 20
    paused: 0
    running: 14
    stopped: 6
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/jenni/.local/share/containers/storage
  graphRootAllocated: 1578640605184
  graphRootUsed: 1019202039808
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 56
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/jenni/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.2
  Built: 1713438799
  BuiltTime: Thu Apr 18 07:13:19 2024
  GitCommit: 3304dd95b8978a8346b96b7d43134990609b3b29-dirty
  GoVersion: go1.22.2
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

$ nvidia-container-cli info
NVRM version:   550.78
CUDA version:   12.4

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 3080 Ti
Brand:          GeForce
GPU UUID:       GPU-c61acb21-8716-6540-271c-39beab917d03
Bus Location:   00000000:01:00.0
Architecture:   8.6

Additional information

No response

@jennydaman jennydaman added the kind/bug Categorizes issue or PR as related to a bug. label May 8, 2024
@mheon
Member

mheon commented May 9, 2024

Any chance you can get us the JSON being sent by the container create request (the first podman_client.containers.run)? I don't have an nvidia card and as such can't use CDI to try and reproduce.
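
One way to capture that JSON without modifying docker-py is to put a small logging proxy between the client and the Podman socket and point docker.DockerClient at the proxy instead. The following is a sketch using only the Python standard library; the function names, the file name log_proxy.py and the proxy socket path are hypothetical. It echoes client-to-server bytes (request lines, headers and JSON bodies) to a stream and forwards responses untouched.

```python
import os
import socket
import sys
import threading
from typing import List, Optional

_log_lock = threading.Lock()


def _pump(src: socket.socket, dst: socket.socket, log=None) -> None:
    """Copy bytes from src to dst until EOF, echoing them to `log` if given."""
    try:
        while chunk := src.recv(65536):
            if log is not None:
                with _log_lock:
                    log.write(chunk.decode("utf-8", errors="replace"))
                    log.flush()
            dst.sendall(chunk)
    except OSError:
        pass  # one side went away; fall through and propagate EOF
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def _handle(client: socket.socket, upstream_path: str, log) -> None:
    """Forward one client connection to the upstream socket."""
    with client, socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as upstream:
        upstream.connect(upstream_path)
        # Responses (upstream -> client) are forwarded unlogged in a second thread.
        back = threading.Thread(target=_pump, args=(upstream, client))
        back.start()
        _pump(client, upstream, log)  # requests (client -> upstream) are logged
        back.join()


def serve(listen_path: str, upstream_path: str, log=sys.stdout,
          ready: Optional[threading.Event] = None,
          max_connections: Optional[int] = None) -> None:
    """Listen on listen_path and relay every connection to upstream_path.

    Runs forever unless max_connections is given (handy for testing).
    """
    if os.path.exists(listen_path):
        os.unlink(listen_path)
    workers: List[threading.Thread] = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(listen_path)
        server.listen()
        if ready is not None:
            ready.set()
        while max_connections is None or len(workers) < max_connections:
            client, _ = server.accept()
            worker = threading.Thread(target=_handle, args=(client, upstream_path, log))
            worker.start()
            workers.append(worker)
    for worker in workers:
        worker.join()


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python log_proxy.py /tmp/podman-proxy.sock /run/user/1000/podman/podman.sock
    serve(sys.argv[1], sys.argv[2])
```

With the proxy running, create the client with docker.DockerClient(base_url='unix:///tmp/podman-proxy.sock') and run the reproduction; the create request and its JSON body are printed by the proxy. A thread per connection is used because docker-py may open more than one connection during containers.run.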

@jennydaman
Author

Sure.

The Python code

client.containers.run('registry.access.redhat.com/ubi9:9.4-947.1714667021', ['nvidia-smi', '-L'], device_requests=[ docker.types.DeviceRequest(count=1, capabilities=[['gpu']]) ])

sends this JSON to the socket:

{
  "Hostname": null,
  "Domainname": null,
  "ExposedPorts": null,
  "User": null,
  "Tty": false,
  "OpenStdin": false,
  "StdinOnce": false,
  "AttachStdin": false,
  "AttachStdout": true,
  "AttachStderr": true,
  "Env": null,
  "Cmd": [
    "nvidia-smi",
    "-L"
  ],
  "Image": "registry.access.redhat.com/ubi9:9.4-947.1714667021",
  "Volumes": null,
  "NetworkDisabled": false,
  "Entrypoint": null,
  "WorkingDir": null,
  "HostConfig": {
    "NetworkMode": "default",
    "DeviceRequests": [
      {
        "Driver": "",
        "Count": 1,
        "DeviceIDs": [],
        "Capabilities": [
          [
            "gpu"
          ]
        ],
        "Options": {}
      }
    ]
  },
  "NetworkingConfig": null,
  "MacAddress": null,
  "Labels": null,
  "StopSignal": null,
  "Healthcheck": null,
  "StopTimeout": null,
  "Runtime": null
}
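
To rule out the client library, the same body can be sent straight to Podman's Docker-compatible endpoint and the resulting container inspected. This is a standard-library sketch; UnixHTTPConnection, call, create_and_inspect and the trimmed PAYLOAD are assumptions for illustration, and the unversioned paths /containers/create, /containers/{id}/json and DELETE /containers/{id} are assumed to be accepted by the compat API (docker-py itself prefixes a version such as /v1.xx).

```python
import http.client
import json
import socket
from typing import Any, Optional, Tuple


class UnixHTTPConnection(http.client.HTTPConnection):
    """An http.client connection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 30.0):
        # "localhost" only fills the Host header; nothing is resolved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def call(socket_path: str, method: str, path: str,
         body: Optional[dict] = None) -> Tuple[int, Any]:
    """Send one request and return (status, decoded JSON or None)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        headers = {}
        data = None
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        conn.request(method, path, body=data, headers=headers)
        response = conn.getresponse()
        raw = response.read()
        return response.status, (json.loads(raw) if raw else None)
    finally:
        conn.close()


def create_and_inspect(socket_path: str, payload: dict) -> Any:
    """Create (without starting) a container, return its HostConfig.DeviceRequests,
    then delete the container again."""
    status, created = call(socket_path, "POST", "/containers/create", payload)
    if status != 201:
        raise RuntimeError(f"create failed with HTTP {status}: {created}")
    container_id = created["Id"]
    try:
        _, inspected = call(socket_path, "GET", f"/containers/{container_id}/json")
        return inspected["HostConfig"].get("DeviceRequests")
    finally:
        call(socket_path, "DELETE", f"/containers/{container_id}")


# Trimmed copy of the request body above (image and command from the report).
PAYLOAD = {
    "Image": "registry.access.redhat.com/ubi9:9.4-947.1714667021",
    "Cmd": ["nvidia-smi", "-L"],
    "HostConfig": {
        "DeviceRequests": [
            {"Driver": "", "Count": 1, "DeviceIDs": [], "Capabilities": [["gpu"]], "Options": {}}
        ]
    },
}
```

On the reporter's machine this would be create_and_inspect('/run/user/1000/podman/podman.sock', PAYLOAD), using the socket path from the podman info output. The image should already be pulled, since the create endpoint is not expected to pull it. An empty or null result would point at the server side rather than at docker-py.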

@rrbanda

rrbanda commented May 16, 2024

@jennydaman does the following help? https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-podman (basically, using nvidia-container-toolkit)

@jennydaman
Author

@rrbanda that is unrelated to this issue.

Podman implements the Docker API in an attempt to be compatible with Docker. This issue is about the DeviceRequests field of Docker's API, which is different from CDI (container device interface).
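
To make the distinction concrete: with CDI, NVIDIA GPUs are selected by fully qualified device names such as nvidia.com/gpu=all or nvidia.com/gpu=0 (as generated by the NVIDIA Container Toolkit and passed to podman run --device), whereas the Docker API expresses the same intent as a DeviceRequest. The sketch below shows one hypothetical way a compatibility layer could translate between the two. The function name, the assumption that Count=N maps to indices 0..N-1, and the restriction to requests carrying the gpu capability are illustrative choices, not Podman's actual behavior.

```python
from typing import List


def device_request_to_cdi(request: dict, vendor_class: str = "nvidia.com/gpu") -> List[str]:
    """Hypothetical translation of one Docker API DeviceRequest into CDI
    fully qualified device names of the form vendor/class=name."""
    capabilities = request.get("Capabilities") or []
    if not any("gpu" in and_list for and_list in capabilities):
        raise ValueError("only requests with the 'gpu' capability are handled here")
    device_ids = request.get("DeviceIDs") or []
    count = request.get("Count", 0)
    if device_ids:
        names = list(device_ids)  # explicit indices or UUIDs
    elif count == -1:
        names = ["all"]
    elif count > 0:
        names = [str(i) for i in range(count)]  # assumption: the first N GPUs
    else:
        names = []
    return [f"{vendor_class}={name}" for name in names]


print(device_request_to_cdi({"Count": 1, "Capabilities": [["gpu"]]}))
# ['nvidia.com/gpu=0']
print(device_request_to_cdi({"Count": -1, "Capabilities": [["gpu"]]}))
# ['nvidia.com/gpu=all']
```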
