feat(cli): add generic sandbox device request flags #628

@cheese-head

Problem Statement

openshell sandbox create currently exposes device access as a special-case GPU feature via --gpu, but there is no first-class CLI support for requesting other device types.

That creates several gaps:

  • users can request NVIDIA GPU access, but not other device-plugin resources
  • there is no CLI path for SR-IOV VFs, DPUs, SmartNICs, or vendor accelerator resources
  • there is no CLI path for CDI-backed device requests
  • advanced users are pushed toward unsupported escape hatches such as low-level API usage or raw CR authoring

This makes the CLI narrower than the kinds of device access the platform may eventually support, and keeps sandbox device requests tied to a GPU-only mental model.

Proposed Design

Add a generic sandbox device-request CLI surface that goes beyond --gpu.

Goals:

  • preserve --gpu as the existing compatibility shorthand
  • add explicit flags for generic device requests
  • avoid overloading --gpu with multiple unrelated meanings

Suggested CLI shape:

  • --gpu
    • compatibility shorthand for one NVIDIA GPU
  • --resource <name>=<count>
    • request Kubernetes allocatable device resources such as:
      • nvidia.com/gpu=2
      • vendor.com/vf=1
      • example.com/dpu=1
  • --cdi <device-id>
    • request explicit CDI-backed device IDs, when supported by the active gateway/runtime
  • --host-device <path-or-id>
    • expert-only raw host-device request, gated and clearly marked as unsupported by default
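The flag family above could be prototyped roughly as follows. This is a minimal Python argparse sketch for illustration only: the issue does not show the real openshell CLI code, and `parse_resource` and `build_parser` are hypothetical names. The flags themselves are the ones proposed in this issue, not flags that exist today.

```python
import argparse


def parse_resource(value: str) -> tuple[str, int]:
    """Parse a '<name>=<count>' string such as 'nvidia.com/gpu=2'."""
    name, sep, count = value.partition("=")
    if not sep or not name or not count:
        raise argparse.ArgumentTypeError(f"expected <name>=<count>, got {value!r}")
    try:
        parsed = int(count)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"count must be an integer, got {count!r}"
        ) from None
    if parsed < 1:
        raise argparse.ArgumentTypeError("count must be at least 1")
    return name, parsed


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the proposed flag shape (hypothetical)."""
    parser = argparse.ArgumentParser(prog="openshell sandbox create")
    # Existing compatibility shorthand: a plain boolean.
    parser.add_argument("--gpu", action="store_true",
                        help="shorthand for one NVIDIA GPU")
    # Proposed generic flags: each one is repeatable.
    parser.add_argument("--resource", action="append", default=[],
                        type=parse_resource, metavar="<name>=<count>",
                        help="request a device-plugin resource")
    parser.add_argument("--cdi", action="append", default=[],
                        metavar="<device-id>",
                        help="request a CDI-backed device ID")
    parser.add_argument("--host-device", action="append", default=[],
                        metavar="<path-or-id>",
                        help="expert-only raw host-device request")
    return parser
```

Keeping `--gpu` as a boolean and putting counts and names on `--resource` is what lets each flag keep a single meaning.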

This should map to a structured device request model in the sandbox spec rather than trying to overload --gpu to mean boolean, mode selector, count, and device ID all at once.
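One possible shape for such a structured model is sketched below in Python. Everything here is an assumption made for illustration: `DeviceKind`, `DeviceRequest`, and `build_requests` are hypothetical names, and mapping `--gpu` to the resource name `nvidia.com/gpu` with a count of 1 is inferred from the examples in this issue rather than taken from the actual sandbox spec.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceKind(Enum):
    """Hypothetical request kinds, one per proposed flag family."""
    RESOURCE = "resource"
    CDI = "cdi"
    HOST_DEVICE = "host-device"


@dataclass(frozen=True)
class DeviceRequest:
    """One entry in a hypothetical sandbox device-request list."""
    kind: DeviceKind
    name: str
    count: int = 1


def build_requests(gpu=False, resources=(), cdi=(), host_devices=()):
    """Translate parsed flag values into a flat list of DeviceRequest entries."""
    requests = []
    if gpu:
        # Assumption: --gpu is sugar for one nvidia.com/gpu resource.
        requests.append(DeviceRequest(DeviceKind.RESOURCE, "nvidia.com/gpu", 1))
    for name, count in resources:
        requests.append(DeviceRequest(DeviceKind.RESOURCE, name, count))
    for device_id in cdi:
        requests.append(DeviceRequest(DeviceKind.CDI, device_id))
    for device in host_devices:
        requests.append(DeviceRequest(DeviceKind.HOST_DEVICE, device))
    return requests
```

With a model like this, `--gpu` stays a boolean on the command line while the spec carries only explicit kind, name, and count fields.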

Behavioral expectations:

  • --gpu remains supported and maps to one NVIDIA GPU
  • generic device flags are additive and repeatable where appropriate
  • validation should reject conflicting combinations
  • unsupported request types should fail early with clear guidance
  • CDI-backed requests should depend on runtime/gateway capability discovery
  • host-device passthrough should be expert-only, not a normal default path
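The expectations above could translate into an early validation pass along these lines. This Python sketch is hypothetical throughout: the issue does not define which combinations conflict, so the rules below (for example, treating `--gpu` together with an explicit `nvidia.com/gpu` count as a conflict) are assumptions, as are the `cdi_supported` and `allow_host_devices` inputs standing in for capability discovery and an expert opt-in.

```python
NVIDIA_GPU = "nvidia.com/gpu"  # assumed resource name behind --gpu


def validate_device_flags(gpu, resources, cdi, host_devices, *,
                          cdi_supported=False, allow_host_devices=False):
    """Return a list of error messages; an empty list means the request passes."""
    errors = []
    names = [name for name, _ in resources]

    # Assumed conflict: --gpu already means one NVIDIA GPU, so an explicit
    # count for the same resource would be ambiguous.
    if gpu and NVIDIA_GPU in names:
        errors.append(
            f"--gpu conflicts with --resource {NVIDIA_GPU}=<count>; use one or the other"
        )

    # Assumed conflict: the same resource name requested more than once.
    for name in sorted({n for n in names if names.count(n) > 1}):
        errors.append(f"--resource {name} was given more than once")

    for name, count in resources:
        if count < 1:
            errors.append(f"--resource {name}: count must be at least 1")

    # CDI requests depend on what the active gateway/runtime reports.
    if cdi and not cdi_supported:
        errors.append("--cdi is not supported by the active gateway/runtime")

    # Host-device passthrough is expert-only and off by default.
    if host_devices and not allow_host_devices:
        errors.append("--host-device is expert-only and is disabled by default")

    return errors
```

Collecting every problem before failing lets the CLI report all conflicts in one pass instead of one per invocation.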

This issue is specifically about the CLI surface. The underlying server/schema work may be tracked separately, but the CLI should be designed around a generic device model rather than a GPU-only extension path.

Alternatives Considered

  1. Extend --gpu the same way gateway PR #495 (feat(bootstrap,cli): switch GPU injection to CDI where supported) extended the gateway GPU flags
  • This works better for gateway/runtime injection mode than for sandbox workload requests.
  • For sandboxes, overloading --gpu would mix too many concepts:
    • compatibility shorthand
    • device count
    • device kind
    • CDI selection
    • expert passthrough
  • That would make the sandbox CLI harder to understand than a dedicated device-request flag family.
  2. Keep only --gpu and rely on lower-level APIs for everything else
  • This leaves the CLI behind the platform’s capabilities.
  • It forces users into unsupported or inconsistent workflows.
  3. Add only multi-GPU support and stop there
  • This solves one immediate need but preserves the GPU-only abstraction.
  • It does not address non-GPU device classes or CDI-backed workflows.

Agent Investigation

  • Investigated the current sandbox CLI and found openshell sandbox create only exposes a boolean --gpu flag.
  • Investigated the current sandbox API and found the product surface is still GPU-oriented rather than generic-device-oriented.
  • Confirmed that gateway/device work such as PR #495 (feat(bootstrap,cli): switch GPU injection to CDI where supported) is related but separate: it improves gateway/runtime device injection, not the sandbox CLI for workload device requests.
  • Confirmed that the underlying system may support richer device models in the future, but the CLI does not currently provide a clean, first-class way to express them.
