Description
Problem Statement
`openshell sandbox create` currently exposes device access as a special-case GPU feature via `--gpu`, but there is no first-class CLI support for requesting other device types.
That creates several gaps:
- users can request NVIDIA GPU access, but not other device-plugin resources
- there is no CLI path for SR-IOV VFs, DPUs, SmartNICs, or vendor accelerator resources
- there is no CLI path for CDI-backed device requests
- advanced users are pushed toward unsupported escape hatches such as low-level API usage or raw CR authoring
This makes the CLI narrower than the kinds of device access the platform may eventually support, and keeps sandbox device requests tied to a GPU-only mental model.
Proposed Design
Add a generic sandbox device-request CLI surface that goes beyond `--gpu`.
Goals:
- preserve `--gpu` as the existing compatibility shorthand
- add explicit flags for generic device requests
- avoid overloading `--gpu` with multiple unrelated meanings
Suggested CLI shape:
- `--gpu` - compatibility shorthand for one NVIDIA GPU
- `--resource <name>=<count>` - request Kubernetes allocatable device resources such as `nvidia.com/gpu=2`, `vendor.com/vf=1`, or `example.com/dpu=1`
- `--cdi <device-id>` - request explicit CDI-backed device IDs, when supported by the active gateway/runtime
- `--host-device <path-or-id>` - expert-only raw host-device request, gated and clearly marked as unsupported by default
This should map to a structured device request model in the sandbox spec rather than trying to overload `--gpu` to mean boolean, mode selector, count, and device ID all at once.
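As a rough illustration only, the structured model and the `--resource <name>=<count>` parsing could look like the following Go sketch. The type and field names (`DeviceRequests`, `Resources`, `CDIDevices`, `HostDevices`) are hypothetical and not part of any existing spec:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DeviceRequests is a hypothetical structured model for sandbox device
// requests; the field names are illustrative, not the actual sandbox spec.
type DeviceRequests struct {
	Resources   map[string]int // e.g. "nvidia.com/gpu" -> 2
	CDIDevices  []string       // explicit CDI device IDs from --cdi
	HostDevices []string       // expert-only raw host devices from --host-device
}

// parseResource parses one --resource flag value of the form <name>=<count>.
func parseResource(arg string) (string, int, error) {
	name, countStr, ok := strings.Cut(arg, "=")
	if !ok || name == "" {
		return "", 0, fmt.Errorf("--resource expects <name>=<count>, got %q", arg)
	}
	count, err := strconv.Atoi(countStr)
	if err != nil || count < 1 {
		return "", 0, fmt.Errorf("--resource count must be a positive integer, got %q", countStr)
	}
	return name, count, nil
}

func main() {
	reqs := DeviceRequests{Resources: map[string]int{}}
	// Repeatable --resource flags accumulate into one structured request.
	for _, arg := range []string{"nvidia.com/gpu=2", "vendor.com/vf=1"} {
		name, count, err := parseResource(arg)
		if err != nil {
			panic(err)
		}
		reqs.Resources[name] = count
	}
	fmt.Println(reqs.Resources["nvidia.com/gpu"])
}
```

Keeping `--gpu` as sugar that simply sets `Resources["nvidia.com/gpu"] = 1` in this model is one way to preserve compatibility without a second code path.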
Behavioral expectations:
- `--gpu` remains supported and maps to one NVIDIA GPU
- generic device flags are additive and repeatable where appropriate
- validation should reject conflicting combinations
- unsupported request types should fail early with clear guidance
- CDI-backed requests should depend on runtime/gateway capability discovery
- host-device passthrough should be expert-only, not a normal default path
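The expectations above could be sketched as a single early-validation pass. This is an illustrative assumption of what the rules might look like, not a proposed implementation; the type, rule set, and messages are all hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// deviceRequests mirrors the hypothetical structured model; names are
// illustrative assumptions, not the actual sandbox spec.
type deviceRequests struct {
	resources   map[string]int
	cdiDevices  []string
	hostDevices []string
	allowExpert bool // hypothetical gate for --host-device
}

// validate rejects conflicting or unsupported combinations before
// anything reaches the server, with actionable guidance in each error.
func (r deviceRequests) validate(gpuShorthand, cdiSupported bool) error {
	if gpuShorthand && r.resources["nvidia.com/gpu"] > 0 {
		return errors.New("--gpu conflicts with an explicit nvidia.com/gpu --resource; use one or the other")
	}
	if len(r.cdiDevices) > 0 && !cdiSupported {
		return errors.New("--cdi is not supported by the active gateway/runtime; use --resource instead")
	}
	if len(r.hostDevices) > 0 && !r.allowExpert {
		return errors.New("--host-device is expert-only and unsupported by default")
	}
	return nil
}

func main() {
	// --gpu combined with an explicit GPU resource request should fail early.
	r := deviceRequests{resources: map[string]int{"nvidia.com/gpu": 1}}
	fmt.Println(r.validate(true, false))
}
```

The `cdiSupported` input stands in for whatever runtime/gateway capability discovery the platform exposes; the point is that the CLI checks it before submitting the request.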
This issue is specifically about the CLI surface. The underlying server/schema work may be tracked separately, but the CLI should be designed around a generic device model rather than a GPU-only extension path.
Alternatives Considered
- Extend `--gpu` the same way gateway PR feat(bootstrap,cli): switch GPU injection to CDI where supported #495 extended gateway GPU flags
  - This works better for gateway/runtime injection mode than for sandbox workload requests.
  - For sandboxes, overloading `--gpu` would mix too many concepts:
    - compatibility shorthand
    - device count
    - device kind
    - CDI selection
    - expert passthrough
  - That would make the sandbox CLI harder to understand than a dedicated device-request flag family.
- Keep only `--gpu` and rely on lower-level APIs for everything else
  - This leaves the CLI behind the platform’s capabilities.
  - It forces users into unsupported or inconsistent workflows.
- Add only multi-GPU support and stop there
  - This solves one immediate need but preserves the GPU-only abstraction.
  - It does not address non-GPU device classes or CDI-backed workflows.
Agent Investigation
- Investigated the current sandbox CLI and found `openshell sandbox create` only exposes a boolean `--gpu` flag.
- Investigated the current sandbox API and found the product surface is still GPU-oriented rather than generic-device-oriented.
- Confirmed that gateway/device work such as PR feat(bootstrap,cli): switch GPU injection to CDI where supported #495 is related but separate: it improves gateway/runtime device injection, not the sandbox CLI for workload device requests.
- Confirmed that the underlying system may support richer device models in the future, but the CLI does not currently provide a clean, first-class way to express them.