Problem Statement
Instead of only running nvidia-smi in a sandbox, OpenShell should have GPU validation tests that exercise basic GPU functionality from inside a GPU-enabled sandbox.
The existing GPU e2e tests cover device discovery, selection, and visibility. This issue adds an execution-focused GPU validation test that verifies a sandbox can run a basic CUDA workload. The test structure should leave room for future validation classes such as OpenCL, Vulkan, and additional driver integrations.
Proposed Design
Use “GPU validation tests” as the umbrella term. Device-selection tests are one class of GPU validation; CUDA execution tests are another.
Initial Scope
The first implementation should target the Docker compute driver only. Docker is currently the most mature GPU-enabled e2e path.
The test code should be organized so additional GPU-enabled drivers can reuse the same validation categories later, but Podman, Kubernetes, and VM integration are out of scope for this issue. Podman should be the first follow-up driver.
Test Layout
Move the existing GPU device-selection tests into a broader GPU test target and add the CUDA execution test alongside them.
Update the e2e Cargo test target to use the umbrella GPU test.
The existing device-selection test behavior should remain unchanged after the move.
CUDA Execution Test
The initial CUDA execution test should cover the default driver behavior when --gpu is requested.
The test should create a Docker-backed OpenShell sandbox with GPU enabled and use the configured CUDA workload image directly as the sandbox image:

    openshell sandbox create \
      --gpu \
      --from "$OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE"

The workload image should run its default command. Do not add a workload command override in the first implementation.
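As a rough illustration of the invocation above, the test could assemble the command with std::process::Command. This is only a sketch: the helper name cuda_sandbox_command is hypothetical, and it assumes the openshell binary is resolvable on PATH, which the real e2e harness may handle differently.

```rust
use std::process::Command;

/// Build the `openshell sandbox create --gpu --from <image>` invocation.
/// Hypothetical helper; assumes `openshell` is on PATH.
/// No command override is passed, so the image runs its default command.
fn cuda_sandbox_command(image: &str) -> Command {
    let mut cmd = Command::new("openshell");
    cmd.args(["sandbox", "create", "--gpu", "--from", image]);
    cmd
}
```

A caller would typically run cuda_sandbox_command(&image).output() and inspect the exit status together with the captured stdout and stderr.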
The first PR should not add --gpu-device or per-device workload permutations. Existing GPU e2e tests continue to cover device selection and nvidia-smi visibility behavior.
Workload Image Contract
Add OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE for the CUDA execution image.
The image must be usable directly as an OpenShell sandbox image. It should run without requiring nested container tooling or network access inside the sandbox.
The image base is not part of the contract. Workload images are separate test artifacts with their own dependency sets and command contracts.
The workload image must emit this stable success marker to stdout or stderr when the GPU workload completes successfully:

    OPENSHELL_GPU_WORKLOAD_SUCCESS

The e2e test should require both a successful workload run and output containing OPENSHELL_GPU_WORKLOAD_SUCCESS.
On failure, test output should include the workload image name, sandbox context, exit status when available, stdout, and stderr.
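The pass/fail rule and the failure report could be sketched as below. The WorkloadRun struct and check_workload function are hypothetical names for illustration. The sketch also assumes that the two required conditions are a zero exit status and the presence of the marker in stdout or stderr; the issue text only confirms the marker requirement.

```rust
/// Stable success marker from the workload image contract.
const SUCCESS_MARKER: &str = "OPENSHELL_GPU_WORKLOAD_SUCCESS";

/// Captured result of one workload run (hypothetical struct for this sketch).
struct WorkloadRun {
    image: String,
    sandbox: String,
    /// None when the exit status is not available.
    exit_code: Option<i32>,
    stdout: String,
    stderr: String,
}

/// Returns Ok(()) when the run exited with 0 (assumed condition) and the
/// marker appears in stdout or stderr; otherwise returns a failure report.
fn check_workload(run: &WorkloadRun) -> Result<(), String> {
    let exited_ok = run.exit_code == Some(0);
    let marker_seen =
        run.stdout.contains(SUCCESS_MARKER) || run.stderr.contains(SUCCESS_MARKER);
    if exited_ok && marker_seen {
        return Ok(());
    }
    let exit = match run.exit_code {
        Some(code) => code.to_string(),
        None => "unavailable".to_string(),
    };
    // Include everything the issue asks for: image, sandbox context,
    // exit status when available, stdout, and stderr.
    Err(format!(
        "CUDA workload failed\nimage: {}\nsandbox: {}\nexit status: {}\n--- stdout ---\n{}\n--- stderr ---\n{}",
        run.image, run.sandbox, exit, run.stdout, run.stderr
    ))
}
```

Returning the report as a String keeps the check independent of the test harness; the test itself can panic with that message so the details land in the Cargo output.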
Missing Image Behavior
If OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE is unset or empty, the CUDA execution test may skip itself.
Rust’s default test harness has no dynamic skipped status, so this may appear as ok in the Cargo summary. The test must still emit a clear log message, for example:

    skipping CUDA GPU execution test: OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE is not set

The Docker GPU e2e runner should also print a high-level note when the variable is unset so CI/local logs clearly show that CUDA execution coverage did not run.
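One possible shape for the skip logic is sketched below. The function names are hypothetical, and writing the message to stderr with eprintln! is an assumption; the issue only requires that a clear log message be emitted.

```rust
const IMAGE_ENV: &str = "OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE";

/// Treat an unset or empty value as "no image configured".
fn workload_image_from(value: Option<String>) -> Option<String> {
    value.filter(|v| !v.is_empty())
}

/// Read the workload image from the environment. When it is missing,
/// log the skip message and return None so the caller can return early.
fn cuda_workload_image() -> Option<String> {
    let image = workload_image_from(std::env::var(IMAGE_ENV).ok());
    if image.is_none() {
        eprintln!("skipping CUDA GPU execution test: {} is not set", IMAGE_ENV);
    }
    image
}
```

A test body could start with `let Some(image) = cuda_workload_image() else { return; };`. Because the function simply returns, Cargo reports the test as ok, which is why the log line matters.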
Out of Scope
This issue should not add host/runtime preflight checks for the CUDA workload image. Existing GPU setup and visibility checks remain separate concerns. The new execution test should focus on whether an OpenShell sandbox can run a GPU workload successfully once the GPU-enabled driver and workload image are available.
This issue should not define or publish the reference workload images. That work is tracked separately in #1476.
Follow-ups
Add Podman integration for the GPU validation tests.
OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE and require CUDA execution coverage.
Acceptance Criteria
OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE through openshell sandbox create --from.
OPENSHELL_GPU_WORKLOAD_SUCCESS in stdout or stderr.
OPENSHELL_E2E_GPU_CUDA_WORKLOAD_IMAGE is reported clearly in test/e2e output and does not fail the lane initially.
gpu test target.