Define GPU validation tests for GPU-enabled drivers

## Problem Statement

Instead of only running `nvidia-smi` in a sandbox, OpenShell should have GPU validation tests that exercise basic GPU functionality from inside a GPU-enabled sandbox.

The existing GPU e2e tests cover device discovery, selection, and visibility. This issue adds an execution-focused GPU validation test that verifies a sandbox can run a basic CUDA workload. The test structure should leave room for future validation classes such as OpenCL, Vulkan, and additional driver integrations.

## Proposed Design

Use “GPU validation tests” as the umbrella term. Device-selection tests are one class of GPU validation; CUDA execution tests are another.

### Initial Scope

The first implementation should target the Docker compute driver only. Docker is currently the most mature GPU-enabled e2e path.

The test code should be organized so additional GPU-enabled drivers can reuse the same validation categories later, but Podman, Kubernetes, and VM integration are out of scope for this issue. Podman should be the first follow-up driver.

### Test Layout

Move the existing GPU device-selection tests into a broader GPU test target and add the CUDA execution test alongside them:

```text
e2e/rust/tests/gpu.rs
e2e/rust/tests/gpu/
  device_selection.rs
  execution.rs
  cuda.rs
```

Update the e2e Cargo test target to use the umbrella GPU test:

```toml
[[test]]
name = "gpu"
path = "tests/gpu.rs"
required-features = ["e2e-gpu"]
```

The existing device-selection test behavior should remain unchanged after the move.

### CUDA Execution Test

The initial CUDA execution test should cover the default driver behavior when `--gpu` is requested.

The test should create a Docker-backed OpenShell sandbox with GPU enabled and use the configured CUDA workload image directly as the sandbox image:

```shell
openshell sandbox create \
  --gpu \
  --from "$OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE"
```
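From the Rust e2e side, the invocation above could be assembled roughly as follows. This is a minimal sketch, not the existing harness: `build_gpu_sandbox_command` and `command_args` are hypothetical helper names, and the sketch assumes the `openshell` binary is resolved from `PATH`. It only builds the command and does not run it.

```rust
use std::process::Command;

/// Builds (but does not run) the sandbox creation command.
/// Hypothetical helper; the real e2e harness may wrap the CLI differently.
fn build_gpu_sandbox_command(image: &str) -> Command {
    // Assumption: `openshell` is found on PATH.
    let mut cmd = Command::new("openshell");
    // No workload command override and no `--gpu-device`, per the initial scope.
    cmd.args(["sandbox", "create", "--gpu", "--from", image]);
    cmd
}

/// Returns the argument list as strings, e.g. for failure logs.
fn command_args(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect()
}
```

Keeping command construction separate from execution lets the argument list be logged on failure without re-deriving it.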

The workload image should run its default command. Do not add a workload command override in the first implementation.

The first PR should not add `--gpu-device` or per-device workload permutations. Existing GPU e2e tests continue to cover device selection and `nvidia-smi` visibility behavior.

### Workload Image Contract

Add `OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE` for the CUDA execution image.

The image must be usable directly as an OpenShell sandbox image. It should run without requiring nested container tooling or network access inside the sandbox.

The image base is not part of the contract. Workload images are separate test artifacts with their own dependency sets and command contracts.

The workload image must emit this stable success marker to stdout or stderr when the GPU workload completes successfully:

```text
OPENSHELL_GPU_WORKLOAD_SUCCESS
```

The e2e test should require both:

- the sandbox workload command exits successfully,
- combined stdout/stderr contains `OPENSHELL_GPU_WORKLOAD_SUCCESS`.

On failure, test output should include the workload image name, sandbox context, exit status when available, stdout, and stderr.
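The two-part success condition and the failure report could be sketched as below. The `WorkloadResult` struct, its field names, and `check_workload` are assumptions made for illustration; only the marker string comes from the contract above.

```rust
/// Success marker defined by the workload image contract.
const SUCCESS_MARKER: &str = "OPENSHELL_GPU_WORKLOAD_SUCCESS";

/// Hypothetical container for what the test captured from the sandbox run.
struct WorkloadResult {
    image: String,
    sandbox: String,
    /// `None` when no exit status is available.
    exit_code: Option<i32>,
    stdout: String,
    stderr: String,
}

/// Passes only if the command exited with status 0 AND the marker
/// appears in stdout or stderr; otherwise returns a diagnostic message.
fn check_workload(result: &WorkloadResult) -> Result<(), String> {
    let exited_ok = result.exit_code == Some(0);
    let marker_seen = result.stdout.contains(SUCCESS_MARKER)
        || result.stderr.contains(SUCCESS_MARKER);
    if exited_ok && marker_seen {
        return Ok(());
    }
    let status = match result.exit_code {
        Some(code) => code.to_string(),
        None => "unavailable".to_string(),
    };
    Err(format!(
        "CUDA workload failed\nimage: {}\nsandbox: {}\nexit status: {}\nmarker seen: {}\n--- stdout ---\n{}\n--- stderr ---\n{}",
        result.image, result.sandbox, status, marker_seen, result.stdout, result.stderr
    ))
}
```

Requiring both conditions guards against an image that exits 0 without running the workload, and against a marker printed before a later crash.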

### Missing Image Behavior

If `OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE` is unset or empty, the CUDA execution test may skip itself.

Rust’s default test harness has no dynamic skipped status, so a test that skips itself at runtime is reported as `ok` in the Cargo summary. The test must still emit a clear log message, for example:

```text
skipping CUDA GPU execution test: OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE is not set
```

The Docker GPU e2e runner should also print a high-level note when the variable is unset so CI/local logs clearly show that CUDA execution coverage did not run.
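On the test side, the skip decision could look like the sketch below. `configured_image` and `image_or_skip` are hypothetical helper names. Only unset and empty values are treated as missing, matching the rule above; whitespace-only values are passed through unchanged.

```rust
/// Environment variable naming the CUDA workload image.
const IMAGE_VAR: &str = "OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE";

/// Treats an unset or empty value as "no image configured".
fn configured_image(raw: Option<String>) -> Option<String> {
    raw.filter(|value| !value.is_empty())
}

/// Reads the variable and logs the skip message when no image is configured.
/// A test would return early on `None`, which Cargo reports as `ok`.
fn image_or_skip() -> Option<String> {
    let image = configured_image(std::env::var(IMAGE_VAR).ok());
    if image.is_none() {
        eprintln!(
            "skipping CUDA GPU execution test: {} is not set",
            IMAGE_VAR
        );
    }
    image
}
```

Splitting the pure check (`configured_image`) from the environment read keeps the skip rule testable without mutating process-wide environment state.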

### Out of Scope

This issue should not add host/runtime preflight checks for the CUDA workload image. Existing GPU setup and visibility checks remain separate concerns. The new execution test should focus on whether an OpenShell sandbox can run a GPU workload successfully once the GPU-enabled driver and workload image are available.

This issue should not define or publish the reference workload images. That work is tracked separately in #1476.

### Follow-ups

- Add Podman integration for the GPU validation tests.
- Define and publish reference GPU validation images in #1476.
- Once a reference CUDA image is published, update GPU CI to set `OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE` and require CUDA execution coverage.
- Add additional validation classes such as OpenCL or Vulkan when suitable workload images exist.
- Consider refactoring shared GPU validation helpers when a second workload or driver integration is added.

## Acceptance Criteria

- [ ] Existing GPU device-selection tests are moved under the new GPU validation test layout without behavior changes.
- [ ] A new CUDA execution validation test is added for Docker-backed GPU sandboxes.
- [ ] The CUDA execution test uses `OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE` through `openshell sandbox create --from`.
- [ ] The CUDA execution test runs the image default command.
- [ ] The CUDA execution test asserts `OPENSHELL_GPU_WORKLOAD_SUCCESS` in stdout or stderr.
- [ ] Missing `OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE` is reported clearly in test/e2e output and does not fail the lane initially.
- [ ] The Docker GPU e2e path runs the umbrella `gpu` test target.
- [ ] Documentation or e2e README content explains how to configure and run the CUDA execution validation locally.


