Define GPU validation tests for GPU-enabled drivers #1472

@linear

Problem Statement

Instead of only running nvidia-smi in a sandbox, OpenShell should have GPU validation tests that exercise basic GPU functionality from inside a GPU-enabled sandbox.

The existing GPU e2e tests cover device discovery, selection, and visibility. This issue adds an execution-focused GPU validation test that verifies a sandbox can run a basic CUDA workload. The test structure should leave room for future validation classes such as OpenCL, Vulkan, and additional driver integrations.

Proposed Design

Use “GPU validation tests” as the umbrella term. Device-selection tests are one class of GPU validation; CUDA execution tests are another.

Initial Scope

The first implementation should target the Docker compute driver only. Docker is currently the most mature GPU-enabled e2e path.

The test code should be organized so additional GPU-enabled drivers can reuse the same validation categories later, but Podman, Kubernetes, and VM integration are out of scope for this issue. Podman should be the first follow-up driver.

Test Layout

Move the existing GPU device-selection tests into a broader GPU test target and add the CUDA execution test alongside them:

e2e/rust/tests/gpu.rs
e2e/rust/tests/gpu/
  device_selection.rs
  execution.rs
  cuda.rs

Update the e2e Cargo test target to use the umbrella GPU test:

[[test]]
name = "gpu"
path = "tests/gpu.rs"
required-features = ["e2e-gpu"]

The existing device-selection test behavior should remain unchanged after the move.

CUDA Execution Test

The initial CUDA execution test should cover the default driver behavior when --gpu is requested.

The test should create a Docker-backed OpenShell sandbox with GPU enabled and use the configured CUDA workload image directly as the sandbox image:

openshell sandbox create \
  --gpu \
  --from "$OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE"

The workload image should run its default command. Do not add a workload command override in the first implementation.
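A minimal sketch of how the test might assemble this invocation with the Rust standard library. The helper name `sandbox_create_command` is hypothetical, and the existing e2e harness may already provide its own way of launching `openshell`; only the `--gpu` and `--from` flags come from this issue.

```rust
use std::process::Command;

/// Builds (but does not run) the `openshell sandbox create` invocation
/// described above. `workload_image` is expected to hold the value of
/// OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE. Hypothetical helper name.
fn sandbox_create_command(workload_image: &str) -> Command {
    let mut cmd = Command::new("openshell");
    // No workload command override is appended: the image runs its default command.
    cmd.args(["sandbox", "create", "--gpu", "--from", workload_image]);
    cmd
}
```

Keeping construction separate from execution lets the argument list be checked without a GPU host.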

The first PR should not add --gpu-device or per-device workload permutations. Existing GPU e2e tests continue to cover device selection and nvidia-smi visibility behavior.

Workload Image Contract

Add OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE for the CUDA execution image.

The image must be usable directly as an OpenShell sandbox image. It should run without requiring nested container tooling or network access inside the sandbox.

The image base is not part of the contract. Workload images are separate test artifacts with their own dependency sets and command contracts.

The workload image must emit this stable success marker to stdout or stderr when the GPU workload completes successfully:

OPENSHELL_GPU_WORKLOAD_SUCCESS

The e2e test should require both:

  • the sandbox workload command exits successfully,
  • combined stdout/stderr contains OPENSHELL_GPU_WORKLOAD_SUCCESS.
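The two requirements above could be expressed as a single predicate. This is a sketch only; the function name `workload_succeeded` and the constant name are assumptions, and it checks each stream separately, which is equivalent to checking the combined output as long as the marker is not split across streams.

```rust
/// Success marker defined by the workload image contract.
const SUCCESS_MARKER: &str = "OPENSHELL_GPU_WORKLOAD_SUCCESS";

/// Returns true only when the workload exited successfully AND the marker
/// appears in stdout or stderr. Hypothetical helper name.
fn workload_succeeded(exit_success: bool, stdout: &str, stderr: &str) -> bool {
    let marker_seen = stdout.contains(SUCCESS_MARKER) || stderr.contains(SUCCESS_MARKER);
    exit_success && marker_seen
}
```

Requiring both conditions guards against an image that prints the marker and then crashes, and against one that exits cleanly without ever touching the GPU.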

On failure, test output should include the workload image name, sandbox context, exit status when available, stdout, and stderr.
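One possible shape for that failure output, assuming the test has already captured the relevant values as strings. The function name, parameter names, and exact layout are illustrative and not part of the contract.

```rust
/// Formats the diagnostic text for a failed CUDA execution test.
/// `exit_code` is `None` when no exit status is available.
/// All names here are hypothetical.
fn failure_report(
    image: &str,
    sandbox_context: &str,
    exit_code: Option<i32>,
    stdout: &str,
    stderr: &str,
) -> String {
    let status = match exit_code {
        Some(code) => code.to_string(),
        None => "unavailable".to_string(),
    };
    format!(
        "CUDA GPU execution test failed\n\
         workload image: {image}\n\
         sandbox: {sandbox_context}\n\
         exit status: {status}\n\
         --- stdout ---\n{stdout}\n\
         --- stderr ---\n{stderr}\n"
    )
}
```

The resulting string can be passed as the message of the failing assertion so that all five items show up together in the test log.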

Missing Image Behavior

If OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE is unset or empty, the CUDA execution test may skip itself.

Rust’s default test harness cannot mark a test as skipped at run time, so a test that skips itself is reported as ok in the Cargo summary. The test must still emit a clear log message, for example:

skipping CUDA GPU execution test: OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE is not set

The Docker GPU e2e runner should also print a high-level note when the variable is unset so CI/local logs clearly show that CUDA execution coverage did not run.
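A sketch of the skip decision on the Rust side, assuming the test reads the variable itself. The helper name `resolve_cuda_workload_image` is hypothetical; taking the raw value as a parameter keeps the logic checkable without mutating the process environment.

```rust
/// Environment variable named in this issue.
const IMAGE_ENV: &str = "OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE";

/// Returns the configured image, or logs the skip message and returns `None`
/// when the value is unset or empty. Hypothetical helper name.
fn resolve_cuda_workload_image(raw: Option<&str>) -> Option<String> {
    match raw {
        Some(value) if !value.is_empty() => Some(value.to_string()),
        _ => {
            eprintln!("skipping CUDA GPU execution test: {IMAGE_ENV} is not set");
            None
        }
    }
}
```

In the test body this might be called as `resolve_cuda_workload_image(std::env::var(IMAGE_ENV).ok().as_deref())`, returning early when the result is `None`.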

Out of Scope

This issue should not add host/runtime preflight checks for the CUDA workload image. Existing GPU setup and visibility checks remain separate concerns. The new execution test should focus on whether an OpenShell sandbox can run a GPU workload successfully once the GPU-enabled driver and workload image are available.

This issue should not define or publish the reference workload images. That work is tracked separately in #1476.

Follow-ups

  • Add Podman integration for the GPU validation tests.
  • Define and publish reference GPU validation images (tracked in #1476, test(e2e): define GPU validation image artifacts).
  • Once a reference CUDA image is published, update GPU CI to set OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE and require CUDA execution coverage.
  • Add additional validation classes such as OpenCL or Vulkan when suitable workload images exist.
  • Consider refactoring shared GPU validation helpers when a second workload or driver integration is added.

Acceptance Criteria

  • Existing GPU device-selection tests are moved under the new GPU validation test layout without behavior changes.
  • A new CUDA execution validation test is added for Docker-backed GPU sandboxes.
  • The CUDA execution test uses OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE through openshell sandbox create --from.
  • The CUDA execution test runs the image default command.
  • The CUDA execution test asserts OPENSHELL_GPU_WORKLOAD_SUCCESS in stdout or stderr.
  • Missing OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE is reported clearly in test/e2e output and does not fail the lane initially.
  • The Docker GPU e2e path runs the umbrella gpu test target.
  • Documentation or e2e README content explains how to configure and run the CUDA execution validation locally.
