Description
Feature Request:
Add support to the --device flag to process fully-qualified CDI device names in addition to standard device node paths. Support for this was added in podman v4.1.0 in May 2022, and it would be good to have feature parity with this in docker.
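To illustrate the shape of the change, a value passed to --device would then be either a device node path or a fully-qualified CDI name, and the CLI would need to tell the two apart. Below is a minimal Go sketch of such a check; the function name and the heuristic are illustrative assumptions on my part, not Docker's or podman's actual parsing logic:

```go
package main

import (
	"fmt"
	"strings"
)

// isQualifiedCDIName reports whether a --device value looks like a
// fully-qualified CDI device name (vendor/class=name) rather than a device
// node path such as /dev/nvidia0. Illustrative heuristic only; the CDI
// specification has stricter rules for vendor, class, and device name syntax.
func isQualifiedCDIName(device string) bool {
	if strings.HasPrefix(device, "/") {
		// Device node paths (and path:path:permissions triples) are absolute.
		return false
	}
	vendorClass, name, ok := strings.Cut(device, "=")
	if !ok || name == "" {
		return false
	}
	vendor, class, ok := strings.Cut(vendorClass, "/")
	return ok && strings.Contains(vendor, ".") && class != ""
}

func main() {
	for _, d := range []string{"/dev/nvidia0", "nvidia.com/gpu=gpu0"} {
		fmt.Printf("%-22s CDI device name? %v\n", d, isQualifiedCDIName(d))
	}
}
```

With something along these lines in place, a value like nvidia.com/gpu=gpu0 could be routed to CDI injection while existing --device /dev/... values keep their current behavior.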
Background:
The Container Device Interface (CDI) is a CNCF-sponsored initiative to standardize the way in which complex devices are exposed to containers. It is based on the Container Networking Interface (CNI) model and specification.
With CDI, a "device" is more than just a single device node under /dev. Instead, CDI defines a device as a high-level concept mapped to a specific set of OCI runtime spec modifications. These modifications include not just the inclusion of device nodes, but also filesystem mounts, environment variables, and container lifecycle hooks.
Vendors are responsible for writing CDI specs that define their devices in terms of these modifications, and CDI-enabled container runtimes are responsible for reading these specs and making the proper modifications when requested to do so.
Container runtimes that already include support for CDI are:
- cri-o
- containerd
- podman
Moreover, a new feature in Kubernetes called Dynamic Resource Allocation (DRA) uses CDI under the hood to do its device injection (with the assumption that a CDI-enabled runtime such as cri-o or containerd is there to support it).
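For a sense of what "making the proper modifications" looks like on the runtime side, the CDI project publishes a reference Go library that the runtimes above build on. The following is only a rough sketch of how a runtime might use it; the import path and registry API reflect my understanding of that library and should be treated as assumptions rather than a definitive implementation:

```go
package main

import (
	"log"

	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// injectCDIDevices applies the containerEdits from whichever CDI specs
// define the requested devices (e.g. "nvidia.com/gpu=gpu0") to an OCI
// runtime spec. Sketch only: options and error handling are simplified.
func injectCDIDevices(ociSpec *specs.Spec, devices ...string) error {
	// The registry scans the standard CDI spec directories
	// (/etc/cdi, /var/run/cdi) for vendor-provided specs.
	registry := cdi.GetRegistry()
	if err := registry.Refresh(); err != nil {
		return err
	}
	// InjectDevices resolves each fully-qualified device name and edits the
	// OCI spec in place: device nodes, mounts, env vars, and hooks.
	if _, err := registry.InjectDevices(ociSpec, devices...); err != nil {
		return err
	}
	return nil
}

func main() {
	ociSpec := &specs.Spec{}
	if err := injectCDIDevices(ociSpec, "nvidia.com/gpu=gpu0"); err != nil {
		log.Fatalf("CDI injection failed: %v", err)
	}
	log.Println("OCI spec updated with CDI device edits")
}
```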
As a concrete example, consider the following abbreviated CDI spec for injecting an NVIDIA GPU into a container:
```yaml
---
cdiVersion: 0.4.0
kind: nvidia.com/gpu
devices:
- name: gpu0
  containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/nvidiactl
    mounts:
    - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03
      hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03
      options: [ro, nosuid, nodev, bind]
    - containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03
      hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03
      options: [ro, nosuid, nodev, bind]
    - containerPath: /usr/bin/nvidia-smi
      hostPath: /usr/bin/nvidia-smi
      options: [ro, nosuid, nodev, bind]
    - containerPath: /usr/bin/nvidia-persistenced
      hostPath: /usr/bin/nvidia-persistenced
      options: [ro, nosuid, nodev, bind]
    - containerPath: /var/run/nvidia-persistenced/socket
      hostPath: /var/run/nvidia-persistenced/socket
      options: [ro, nosuid, nodev, bind]
    hooks:
    - hookName: createContainer
      path: /usr/bin/nvidia-ctk
      args:
      - /usr/bin/nvidia-ctk
      - hook
      - update-ldcache
      - --folder
      - /usr/lib/x86_64-linux-gnu
```
This spec defines a device whose name is gpu0, associated with vendor nvidia.com and device class gpu -- resulting in a fully-qualified CDI device name of nvidia.com/gpu=gpu0.
Referencing this fully-qualified device name and passing it to a CDI-enabled runtime would ensure that not only does the /dev/nvidia0 device node get injected into a container, but also the required /dev/nvidiactl control device, as well as a set of host-level libraries and hooks to make working with NVIDIA devices in containers easier.
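To make that resolution step concrete, here is a similarly hedged sketch (same library and API assumptions as the earlier example) of asking the CDI registry which devices it has discovered and whether this fully-qualified name resolves:

```go
package main

import (
	"fmt"
	"log"

	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
)

func main() {
	registry := cdi.GetRegistry()
	if err := registry.Refresh(); err != nil {
		log.Fatalf("failed to scan CDI spec directories: %v", err)
	}

	// List every fully-qualified device name found in /etc/cdi and /var/run/cdi.
	for _, name := range registry.DeviceDB().ListDevices() {
		fmt.Println("discovered:", name)
	}

	// Look up the specific device referenced in this issue.
	if dev := registry.DeviceDB().GetDevice("nvidia.com/gpu=gpu0"); dev != nil {
		fmt.Println("nvidia.com/gpu=gpu0 resolves to a CDI device definition")
	}
}
```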
Note: For those of you familiar with nvidia-docker and/or the nvidia-container-toolkit, this effectively obsoletes the need for these tools, because everything they have traditionally been responsible for can now be encoded in a CDI spec.
As such, saving the above spec under /etc/cdi/nvidia.yaml and running podman as below results in the container starting with the desired modifications:
```console
$ podman --version
podman version 4.1.0

$ podman run --device nvidia.com/gpu=gpu0 ubuntu:20.04 nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-2e5daa54-7530-ede9-5b96-d4e31fbbe7f8)
```