
feat: Use CDI for GPU injection instead of nvidia-container-cli #398

@klueska

Description

Problem Statement

GPU access currently relies on the legacy nvidia-container-runtime +
nvidia-container-cli stack at two layers: once when Docker injects GPUs
into the k3s cluster container, and again when the nvidia-device-plugin +
nvidia-container-runtime inject them into individual sandbox pods.

Proposed Design

Both layers should be migrated to CDI instead. The general idea:

  1. Generate a CDI spec on the host before starting the cluster:
    nvidia-ctk cdi generate
  2. Use Docker's native CDI support (available since Docker 25) to pass GPUs
    into the k3s container: --device nvidia.com/gpu=all
  3. Mount /etc/cdi into the k3s container, set enable_cdi = true in the
    containerd CRI config, and configure the nvidia-device-plugin to emit CDI
    device IDs so containerd handles injection natively
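As a rough sketch of the three steps (flags follow the public nvidia-ctk and Docker documentation; the container name, image, and paths here are illustrative, not OpenShell's actual ones):

```shell
# 1. Generate the CDI spec on the host.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Docker 25's CDI support is off by default; enable it in
# /etc/docker/daemon.json first:  { "features": { "cdi": true } }

# 2. Request GPUs via CDI and mount the spec dir (read-only, since nothing
#    in the cluster should rewrite host specs) for the nested containerd.
docker run -d --name k3s \
  --device nvidia.com/gpu=all \
  -v /etc/cdi:/etc/cdi:ro \
  rancher/k3s:latest server

# 3. Inside the k3s container, enable CDI in containerd's CRI plugin
#    (k3s reads a config template at
#    /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl):
#
#      [plugins."io.containerd.grpc.v1.cri"]
#        enable_cdi = true
#        cdi_spec_dirs = ["/etc/cdi"]
#
#    and run the nvidia-device-plugin with --device-list-strategy=cdi-cri
#    (supported since v0.14) so kubelet passes CDI device IDs over CRI.
```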

CDI is the canonical way NVIDIA supports GPU access in containerized
environments going forward. Some platforms require CDI and are incompatible
with the legacy runtime stack, so this would also broaden the set of platforms
OpenShell can run on. It also makes what gets injected explicit and
auditable via the CDI spec rather than delegating to a CLI with broad host
access.
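For illustration, a generated spec looks roughly like the following (heavily abridged; real specs enumerate per-GPU device nodes, driver library mounts, and hooks, and paths vary by driver install):

```yaml
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
  - name: "0"
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
containerEdits:
  deviceNodes:
    - path: /dev/nvidiactl
  mounts:
    - hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
      containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
      options: ["ro", "nosuid", "nodev", "bind"]
```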

/cc @elezar @jgehrcke

Alternatives Considered

None

Agent Investigation

No response

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
