Description
Update Docker and Podman GPU sandbox defaults so --gpu prefers one CDI GPU device instead of defaulting to nvidia.com/gpu=all.
This is part of the GPU roadmap in #1444. --gpu means the active driver's default GPU behavior, and for GPU-enabled drivers that default should inject or allocate one suitable GPU when the runtime supports individual device selection.
Context
Parent roadmap: #1444
Current local-container behavior maps a GPU request with no explicit gpu_device to nvidia.com/gpu=all through the shared CDI helper. That makes Docker and Podman inconsistent with Kubernetes and VM behavior, where a default GPU request maps to one GPU.
Docker has priority for implementation because OpenShell's Docker GPU path and CDI discovery are more mature today. Podman should be handled in the same task, but may require additional runtime support or an out-of-band CDI device discovery path. Upstream Podman behavior such as containers/podman#28712 may be relevant.
Proposed Scope
- Define local-container default GPU selection semantics for Docker and Podman.
- Change Docker default --gpu behavior to prefer one CDI GPU device instead of nvidia.com/gpu=all.
- Change Podman default --gpu behavior to prefer one CDI GPU device instead of nvidia.com/gpu=all.
- Prefer runtime-reported CDI inventory when available.
- Preserve explicit --gpu-device behavior as a driver-native advanced option.
- Do not add multi-GPU count support in this task.
- Do not require OpenShell-managed GPU assignment/exclusivity tracking in this task.
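For Podman, one possible out-of-band inventory source is reading CDI spec files directly from the standard CDI spec directories (/etc/cdi and /var/run/cdi). The sketch below shows how fully qualified device names ("<kind>=<name>") fall out of a spec, and how individually selectable GPUs are separated from the aggregate nvidia.com/gpu=all entry. The helper names (qualified_names, gpu_inventory, load_json_specs) are hypothetical, and only JSON specs are handled; YAML specs would need a YAML parser such as PyYAML.

```python
import json
from pathlib import Path

# Default spec directories defined by the Container Device Interface (CDI) spec.
CDI_SPEC_DIRS = (Path("/etc/cdi"), Path("/var/run/cdi"))


def qualified_names(spec: dict) -> list[str]:
    """Return fully qualified CDI device names ("<kind>=<name>") from one spec."""
    kind = spec.get("kind", "")
    return [f"{kind}={dev['name']}" for dev in spec.get("devices", []) if "name" in dev]


def gpu_inventory(specs: list[dict], kind: str = "nvidia.com/gpu") -> tuple[list[str], bool]:
    """Split GPU devices into individually selectable names and an 'all is present' flag."""
    names = [n for s in specs if s.get("kind") == kind for n in qualified_names(s)]
    aggregate = f"{kind}=all"
    individual = [n for n in names if n != aggregate]
    return individual, aggregate in names


def load_json_specs(dirs=CDI_SPEC_DIRS) -> list[dict]:
    """Load *.json CDI specs from the given directories (hypothetical helper)."""
    specs = []
    for d in dirs:
        if d.is_dir():
            for path in sorted(d.glob("*.json")):
                specs.append(json.loads(path.read_text()))
    return specs
```

For example, a spec with "kind": "nvidia.com/gpu" and devices named "0", "1", and "all" yields nvidia.com/gpu=0 and nvidia.com/gpu=1 as individual devices, plus a flag noting that the aggregate device also exists.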
Target Behavior
Default GPU selection should use this order:
- If the runtime reports individual CDI GPU devices, select one individual device.
- If reliable CDI inventory is unavailable but individual device IDs are expected to work, fall back to nvidia.com/gpu=0.
- If the runtime/platform only reports or supports nvidia.com/gpu=all, such as some WSL2-based setups, use nvidia.com/gpu=all as a compatibility fallback.
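The selection order above can be sketched as a single function. This is a minimal illustration, not OpenShell's implementation: default_gpu_device and its individual_ids_expected parameter are hypothetical names, and picking the lowest-sorted individual name is just one deterministic choice.

```python
from typing import Optional, Sequence

ALL_GPUS = "nvidia.com/gpu=all"
FIRST_GPU = "nvidia.com/gpu=0"


def default_gpu_device(
    inventory: Optional[Sequence[str]],
    individual_ids_expected: bool,
) -> str:
    """Pick the CDI device ID injected for a bare --gpu request (hypothetical helper).

    inventory: fully qualified CDI GPU names reported by the runtime, or None
        (or empty) when no reliable inventory is available.
    individual_ids_expected: whether per-device IDs such as nvidia.com/gpu=0 are
        expected to work on this platform (False on some WSL2-based setups).
    """
    if inventory:
        individual = sorted(n for n in inventory if n != ALL_GPUS)
        if individual:
            # 1. Runtime reports individual devices: select exactly one, deterministically.
            return individual[0]
        # Inventory lists only the aggregate device: compatibility fallback.
        return ALL_GPUS
    if individual_ids_expected:
        # 2. No reliable inventory, but per-device IDs should work.
        return FIRST_GPU
    # 3. Platform only supports the aggregate device.
    return ALL_GPUS
```

Because the choice is deterministic, two sandboxes created at the same time will pick the same device, which matches the out-of-scope note on exclusivity tracking.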
Additional behavior:
- openshell sandbox create --gpu ... on Docker injects one CDI GPU device when individual device selection is available.
- openshell sandbox create --gpu ... on Podman injects one CDI GPU device when individual device selection is available.
- openshell sandbox create --gpu --gpu-device nvidia.com/gpu=0 ... continues to pass the explicit CDI device ID through.
- The fallback to nvidia.com/gpu=all should be intentional and documented, not the default for platforms with individual device selection.
- Non-zero gpu_count remains unsupported unless a driver explicitly implements count-based allocation.
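A sketch of the request precedence described above, assuming the default device has already been chosen by the selection order: an explicit gpu_device wins, a bare --gpu uses the default, and a non-zero gpu_count is rejected. The function and exception names are hypothetical. The CLI-equivalent flag assumes CDI support is enabled in the Docker daemon; both docker run and podman run accept fully qualified CDI names through --device, though OpenShell's drivers may go through runtime APIs instead.

```python
from typing import Optional


class UnsupportedGpuRequest(ValueError):
    """Raised for GPU request shapes this driver does not implement (hypothetical)."""


def resolve_gpu_device(
    gpu: bool,
    gpu_device: Optional[str],
    gpu_count: int,
    default_device: str,
) -> Optional[str]:
    """Return the CDI device ID to inject, or None when no GPU was requested."""
    if gpu_count != 0:
        # Count-based allocation is not implemented by the local-container drivers.
        raise UnsupportedGpuRequest("non-zero gpu_count is not supported by this driver")
    if not gpu:
        # No GPU requested.
        return None
    if gpu_device:
        # Explicit --gpu-device is passed through unchanged.
        return gpu_device
    # Bare --gpu: use the driver's default single-device selection.
    return default_device


def device_args(device_id: Optional[str]) -> list[str]:
    """CLI-equivalent flags for docker/podman run; empty when no GPU is injected."""
    return ["--device", device_id] if device_id else []
```

Keeping the explicit path separate from default selection means --gpu-device stays a pure pass-through, so any CDI name the runtime understands keeps working regardless of how the default is chosen.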
Out of Scope
This task fixes default GPU device selection cardinality. It does not require OpenShell to track active GPU assignments or prevent two OpenShell sandboxes from selecting the same default GPU.
If multiple sandboxes are created concurrently, selecting the same default fallback device is acceptable until a separate allocation/exclusivity task is implemented.
Open Questions
- Where should CDI inventory discovery live: shared OpenShell core helper, driver-specific code, or both?
- What should Podman use as the authoritative CDI device inventory source before runtime-level enumeration is reliable?
- Should assignment/exclusivity tracking be added later at the driver level or as part of a broader resource allocation model?
Definition of Done
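- Docker default --gpu prefers one individual CDI GPU device when available.
- Podman default --gpu prefers one individual CDI GPU device when available.
- When reliable CDI inventory is unavailable but individual device IDs are expected to work, the default falls back to nvidia.com/gpu=0.
- nvidia.com/gpu=all remains available as a documented compatibility fallback.
- Explicit --gpu-device pass-through behavior is preserved for Docker and Podman.
- --gpu-device is kept as an advanced driver-native option.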