fix(onboard): prefer CDI GPU mode over --gpus on CDI hosts #4956

Open
jason-ma-nv wants to merge 2 commits into main from fix/onboard-prefer-cdi-gpu-mode-4948

Conversation

@jason-ma-nv

@jason-ma-nv jason-ma-nv commented Jun 8, 2026

Contributor

Summary

On Docker-driver GPU hosts that advertise an NVIDIA CDI spec (e.g. /etc/cdi/nvidia.yaml on Ubuntu 24.04/26.04), nemoclaw onboard selected --gpus all for the GPU patch recreate. As a result, the OpenShell supervisor never reconnected, the sandbox entered Error phase before the GPU proof, and onboard aborted with exit 1. This PR reorders GPU mode selection to prefer the CDI mode (--device nvidia.com/gpu=all) ahead of --gpus whenever a CDI spec is detected, matching how OpenShell's gateway start --gpu injects GPU devices.

Related Issue

Fixes #4948

Changes

  • src/lib/onboard/docker-gpu-patch.ts: buildDockerGpuModeCandidates now puts the cdi candidate first when cdiAvailable is true; --gpus and the NVIDIA runtime remain as fallbacks if the CDI probe fails. Non-CDI hosts are unaffected (order unchanged: gpus, nvidia-runtime). Jetson path unchanged.
  • src/lib/onboard/docker-gpu-patch.test.ts: adds a repro test asserting that on a CDI host where the --gpus probe would pass, the cdi mode is selected; updates the existing candidate-order assertion and stale comments to the corrected ordering.
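
The ordering change described above can be pictured with a minimal sketch. This is not the code from docker-gpu-patch.ts: the type names, the `buildGpuModeCandidatesSketch` function, and the flags shown for the NVIDIA runtime mode are assumptions made for illustration. Only the mode order (cdi, gpus, nvidia-runtime on CDI hosts; gpus, nvidia-runtime otherwise) and the `--device nvidia.com/gpu=all` and `--gpus all` arguments come from the PR text. The Jetson path is left out.

```typescript
// Hypothetical types; only the mode names and their order come from the PR.
type DockerGpuMode = "cdi" | "gpus" | "nvidia-runtime";

interface GpuModeCandidate {
  mode: DockerGpuMode;
  // Extra `docker create` arguments that enable the GPU in this mode.
  createArgs: readonly string[];
}

const CDI: GpuModeCandidate = {
  mode: "cdi",
  createArgs: ["--device", "nvidia.com/gpu=all"],
};
const GPUS: GpuModeCandidate = {
  mode: "gpus",
  createArgs: ["--gpus", "all"],
};
// Assumption: the PR does not show which flags the NVIDIA runtime mode uses.
const NVIDIA_RUNTIME: GpuModeCandidate = {
  mode: "nvidia-runtime",
  createArgs: ["--runtime", "nvidia"],
};

// CDI goes first only when a CDI spec was detected; the legacy
// candidates keep their relative order in both cases.
function buildGpuModeCandidatesSketch(cdiAvailable: boolean): GpuModeCandidate[] {
  const legacy = [GPUS, NVIDIA_RUNTIME];
  return cdiAvailable ? [CDI, ...legacy] : legacy;
}

console.log(buildGpuModeCandidatesSketch(true).map((c) => c.mode).join(", "));
// prints "cdi, gpus, nvidia-runtime"
console.log(buildGpuModeCandidatesSketch(false).map((c) => c.mode).join(", "));
// prints "gpus, nvidia-runtime"
```

Keeping the legacy candidates in a single shared list is one way to make the "non-CDI hosts are unaffected" property visible in the code itself.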

Why this is the right layer

The create-only probe (docker create --gpus all) is accepted on these hosts, so --gpus all looked viable but diverges at runtime from OpenShell's CDI-based injection — the supervisor then never reconnects to the recreated container. Preferring CDI when a spec is present removes that divergence.

Validation caveat

The supervisor-reconnect failure is a runtime symptom that only manifests on real GPU + Docker-CDI hardware. Unit tests pin the deterministic mode-selection decision; final confirmation requires the GPU E2E path (e2e-branch-validation:gpu) on an affected host. The existing NEMOCLAW_DOCKER_GPU_PATCH=0 escape hatch is unchanged.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Verification notes: the full onboard suite passes (npx vitest run src/lib/onboard/ — 1189 tests, including the new #4948 repro), npm run typecheck:cli passes, and Biome is clean on the changed files. The full npm test run has 16 pre-existing, environment-dependent failures (network/port-binding e2e-framework fixtures, a MODULE_NOT_FOUND in ssrf-parity, and missing Docker/OpenShell fixtures in fetch-guard-patch-regression) that are unrelated to this change and do not reference the modified module — hence npm test is left unchecked.


Signed-off-by: Jason Ma <jama@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Improved Docker GPU detection: when the Container Device Interface (CDI) is available it is now preferred, otherwise the system falls back to legacy GPU options — improving GPU compatibility and reliability.
  • Tests

    • Added regression tests covering CDI-first selection, fallbacks to other GPU modes, and confirming CDI-based launches omit legacy GPU flags.

On Docker-driver GPU hosts that advertise an NVIDIA CDI spec (e.g.
/etc/cdi/nvidia.yaml on Ubuntu 24.04/26.04), `nemoclaw onboard` selected
the `--gpus all` mode for the GPU patch recreate. `docker create --gpus
all` is accepted on these hosts so the create-only probe passed, but
OpenShell's `gateway start --gpu` injects devices from the CDI spec, so a
container recreated via the legacy --gpus path diverges from how the
supervisor expects the GPU container to be wired up. The supervisor never
reconnected, the sandbox entered Error phase before the GPU proof, and
onboard aborted with exit 1.

Reorder GPU mode candidates so the CDI mode (`--device nvidia.com/gpu=all`)
is preferred ahead of --gpus whenever a CDI spec is detected; --gpus and the
NVIDIA runtime remain as fallbacks if the CDI probe fails. Non-CDI hosts are
unaffected (candidate order unchanged).

Note: the supervisor-reconnect failure is a runtime symptom on real GPU
hardware; final validation requires the GPU E2E path (see "verify" below).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jason-ma-nv jason-ma-nv self-assigned this Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

E2E Advisor Recommendation

Required E2E: gpu-repo-local-ollama-openclaw, gpu-e2e
Optional E2E: gpu-double-onboard-e2e, issue-3600-gpu-proof-optional-e2e

Dispatch hint: gpu-repo-local-ollama-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • gpu-repo-local-ollama-openclaw (high): Best targeted existing scenario for this PR: it runs the real OpenClaw local Ollama GPU onboarding flow on a self-hosted GPU Docker-CDI runner, covering the host shape this patch changes and verifying the sandbox reaches a ready state with local GPU inference/proxy assertions.
  • gpu-e2e (high): Runs the existing real user flow for install → onboard with NEMOCLAW_PROVIDER=ollama → recreate GPU-enabled sandbox → verify inference. This should be merge-blocking because the changed Docker GPU mode selection can break supervisor reconnect and local GPU inference even if unit tests pass.

Optional E2E

  • gpu-double-onboard-e2e (high): Useful adjacent confidence for repeated GPU onboarding/recreate behavior with Ollama. The PR does not directly change token persistence, so this is optional unless the required GPU E2E shows reconnect or re-onboard instability.
  • issue-3600-gpu-proof-optional-e2e (low): Adjacent GPU-onboarding guard for optional GPU proof behavior. It is lower value for this specific CDI-first recreate change but can catch accidental breakage in GPU preflight/onboard gating.

New E2E recommendations

  • docker-cdi-gpu-patch-mode (high): Existing GPU E2E validates end-to-end readiness/inference but does not appear to explicitly assert that a Docker-CDI host selects and applies --device nvidia.com/gpu=all rather than legacy --gpus all. A focused E2E should inspect the patched create option/container config or emitted diagnostics after onboard on a CDI host.
    • Suggested test: Add a focused Docker-CDI GPU patch E2E that onboards with local Ollama on a CDI GPU runner, asserts the patched container was recreated with --device nvidia.com/gpu=all, asserts --gpus is absent, and verifies supervisor reconnect plus inference.
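
The argument-level part of that suggested assertion could look roughly like the sketch below. It is only an illustration: `describeGpuArgsSketch` and the sample argument arrays are hypothetical, and a real E2E would read the arguments from the recreated container's configuration or emitted diagnostics rather than from a hand-written array.

```typescript
interface GpuArgSummary {
  usesCdiDevice: boolean;
  usesLegacyGpus: boolean;
}

// Scans a Docker argument list for the CDI device flag and the legacy
// --gpus flag, accepting both the "--flag value" and "--flag=value" forms.
function describeGpuArgsSketch(args: readonly string[]): GpuArgSummary {
  const cdiDevice = "nvidia.com/gpu=all";
  let usesCdiDevice = false;
  let usesLegacyGpus = false;
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === "--gpus" || arg.startsWith("--gpus=")) {
      usesLegacyGpus = true;
    }
    if (arg === `--device=${cdiDevice}`) {
      usesCdiDevice = true;
    }
    if (arg === "--device" && args[i + 1] === cdiDevice) {
      usesCdiDevice = true;
    }
  }
  return { usesCdiDevice, usesLegacyGpus };
}

// Hypothetical recreate arguments for a CDI host after this PR.
const sampleArgs = ["run", "-d", "--device", "nvidia.com/gpu=all", "sandbox-image"];
const summary = describeGpuArgsSketch(sampleArgs);
console.log(summary.usesCdiDevice && !summary.usesLegacyGpus); // prints true
```

A check like this only covers the "flag present, legacy flag absent" half of the recommendation; supervisor reconnect and inference still need the real GPU runner.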

Dispatch hint

  • Workflow: e2e-scenarios.yaml
  • jobs input: gpu-repo-local-ollama-openclaw

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: gpu-repo-local-ollama-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • gpu-repo-local-ollama-openclaw: Changes Docker GPU patch mode selection and CDI-vs---gpus recreation behavior; the GPU Docker CDI scenario is the scenario-suite path that exercises a local Ollama OpenClaw sandbox on an NVIDIA GPU/CDI Docker runtime.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

  • None.

Relevant changed files

  • src/lib/onboard/docker-gpu-patch.ts

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f306d9cd-4edf-45a9-83a2-5b71b5dd0117

📥 Commits

Reviewing files that changed from the base of the PR and between db2f9f6 and 591651e.

📒 Files selected for processing (2)
  • src/lib/onboard/docker-gpu-patch-mode-selection.test.ts
  • src/lib/onboard/docker-gpu-patch.test.ts
💤 Files with no reviewable changes (1)
  • src/lib/onboard/docker-gpu-patch.test.ts

📝 Walkthrough

Reorders GPU patch mode probing to prefer CDI when Docker reports a readable NVIDIA CDI spec, reorganizes related re-exports, and adds tests validating CDI-first selection, fallback to --gpus and nvidia-runtime, and propagation of the CDI device flag into recreate flows.

Changes

CDI-first GPU patch mode candidate selection

| Layer / File(s) | Summary |
|---|---|
| CDI-first candidate ordering and re-exports (src/lib/onboard/docker-gpu-patch.ts) | Re-export section from ./docker-gpu-supervisor-reconnect is restructured; buildDockerGpuModeCandidates now places the CDI candidate before --gpus and nvidia-runtime when cdiAvailable is true. |
| New CDI-first mode selection tests (src/lib/onboard/docker-gpu-patch-mode-selection.test.ts) | Adds Vitest suite that stubs CDI host probes and inspects Docker recreate flows to assert CDI preference, fallback to gpus, fallback to nvidia-runtime, and that recreateOpenShellDockerSandboxWithGpu uses the CDI --device nvidia.com/gpu=all flag and omits --gpus. |
| Update existing unit tests and comments (src/lib/onboard/docker-gpu-patch.test.ts) | Renamed/updated unit test expectations to reflect CDI-first ordering and adjusted a comment to state CDI candidates are preferred ahead of --gpus on CDI hosts. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/NemoClaw#4407: Shares related diagnostics and docker-gpu-patch plumbing changes (getSandboxFailurePhase, captureDockerGpuPatchSandboxSnapshot, classifyDockerGpuPatchFailure).

Suggested labels

Docker, platform: container, fix, Sandbox, bug-fix

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 I found a CDI flag bright and small,
I nudged it forward, now it leads the call.
No legacy flags trailing behind the cart,
The sandbox wakes with GPU in its heart.
Hooray — CDI first, a hoppity new start!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the primary change: preferring CDI GPU mode over --gpus on CDI hosts, directly addressing the main objective of the PR. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #4948 by reordering GPU mode candidates to prefer CDI first when available, ensuring supervisor reconnection and successful onboarding on CDI-capable Docker hosts. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly scoped to GPU mode candidate ordering and selection logic; no unrelated modifications are present. |


@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

PR Review Advisor

Findings: 1 needs attention, 3 worth checking, 0 nice ideas
Since last review: 3 prior items resolved, 2 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • Linked issue runtime acceptance is still not proven (src/lib/onboard/docker-gpu-patch.ts:425): The expected result of issue #4948 ([Ubuntu 24.04][Onboard] onboard cannot create a GPU-enabled sandbox on Docker-driver GPU host) is that a normal first onboard on an affected Docker-CDI GPU host creates a GPU-enabled sandbox, the OpenShell supervisor reconnects, the GPU proof runs, the sandbox reaches Ready, and onboard completes. This patch now proves deterministic CDI selection, fallback ordering, and that the recreate command receives `--device nvidia.com/gpu=all`, but the changed tests still stop at mocked selection/recreate boundaries and do not observe supervisor reconnect, GPU proof, sandbox phase Ready, or first-onboard completion.
    • Recommendation: Add or identify targeted runtime/integration validation for normal first onboard on an affected Docker-CDI GPU host, asserting that the recreated container records/uses `--device nvidia.com/gpu=all` rather than `--gpus all` and reaches Ready after supervisor reconnect and GPU proof.
    • Evidence: The new `recreateOpenShellDockerSandboxWithGpu()` test in `docker-gpu-patch-mode-selection.test.ts` mocks `runOpenshell` as success and asserts Docker args only. The issue's Expected Result says: "The sandbox is created with GPU access, the OpenShell supervisor reconnects to the GPU-enabled container, the GPU proof runs, the sandbox reaches Ready, and the first onboard completes."

🔎 Worth checking

  • Source-of-truth review needed: Docker GPU mode selection on hosts that advertise NVIDIA CDI specs: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `docker-gpu-patch.ts` comments at the candidate builder explain the Docker-CDI probe/runtime divergence and fallback intent; new `docker-gpu-patch-mode-selection.test.ts` cases cover ordering and fallback but not source ownership or removal criteria.
  • Sandbox lifecycle security confidence still depends on real Docker-CDI validation (src/lib/onboard/docker-gpu-patch-mode-selection.test.ts:111): No direct sandbox escape, credential leakage, policy bypass, SSRF bypass, workflow, or shell-injection issue was found. However, this is a security-sensitive Docker sandbox lifecycle path, and the changed tests use mocked Docker/OpenShell boundaries. They verify that CDI args are selected and passed, but not that the real Docker-CDI recreate keeps the supervisor and sandbox policy in the intended state.
    • Recommendation: Cover the real Docker-CDI host behavior in runtime validation: after CDI recreate, the supervisor reconnects before timeout, the sandbox does not enter Error during GPU enablement, and the GPU proof runs under the expected container wiring.
    • Evidence: The new test asserts `dockerRunDetached` receives `--device nvidia.com/gpu=all` and omits `--gpus`, while `runOpenshell` is mocked to return `{ status: 0 }`; no changed test exercises real Docker/OpenShell supervisor reconnect behavior.
  • CDI-first workaround still lacks a source-of-truth exit condition (src/lib/onboard/docker-gpu-patch.ts:419): The localized compatibility behavior now documents the invalid state and has good regression tests for ordering and fallback. The remaining gap is source-of-truth: the code does not explain why the gateway/supervisor GPU injection divergence cannot be fixed or unified at its source in this PR, and it does not state when this CDI-preference workaround can be removed or revisited.
    • Recommendation: Add a brief code comment or design note identifying the owning source boundary for OpenShell CDI injection versus NemoClaw Docker GPU probing, why the source cannot be changed here, and the removal condition, such as replacing CDI-spec probing with an authoritative OpenShell/Docker GPU mode contract.
    • Evidence: Production comments explain that `docker create --gpus all` can pass on Docker-CDI hosts while runtime supervisor wiring diverges, and tests pin `[cdi, gpus, nvidia-runtime]`; no changed code/comment states the source-fix constraint or removal condition.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation**: Unit coverage is now strong for deterministic mode selection, fallback ordering, and recreate argument propagation, but the linked bug is a runtime/sandbox infrastructure path involving Docker-CDI hardware, container recreation, OpenShell supervisor reconnect, GPU proof, and sandbox Ready state. Scenarios to cover:
    • Normal first onboard on an affected Docker-CDI GPU host creates/recreates with `patched_create_option=--device nvidia.com/gpu=all`, not `--gpus all`, and reaches Ready.
    • After CDI recreate on a Docker-CDI GPU host, the OpenShell supervisor reconnects before timeout and the GPU proof runs.
    • First onboard with no existing sandbox on Ubuntu 24.04 or 26.04 Docker-driver plus NVIDIA CDI completes successfully with GPU auto-detected.
    • During the GPU-enable step on the CDI host profile, the sandbox phase does not enter Error before the GPU proof can run.
  • **Acceptance clause:** On a Docker-driver GPU host (NVIDIA GPU auto-detected), `nemoclaw onboard` cannot bring up a GPU-enabled sandbox. — add test evidence or identify existing coverage. The diff changes lower-level Docker GPU patch mode ordering and adds mocked CDI-host tests, but no changed test runs `nemoclaw onboard`.
  • **Acceptance clause:** While creating the sandbox, onboard enables GPU passthrough — this is the standard create-then-GPU-enable path that runs on a normal FIRST onboard whenever a GPU is present on a Docker-driver gateway (gated by NEMOCLAW_DOCKER_GPU_PATCH); — add test evidence or identify existing coverage. The new recreate test calls `recreateOpenShellDockerSandboxWithGpu()` directly and asserts CDI args reach `dockerRunDetached`; it does not exercise the first-onboard orchestration or no-existing-sandbox state.
  • **Acceptance clause:** The OpenShell supervisor never reconnects to the GPU-enabled container, so the sandbox enters Error phase before the GPU proof can run, the step aborts with exit 1, and onboard fails. — add test evidence or identify existing coverage. The changed tests mock `runOpenshell` success and do not observe supervisor reconnect, sandbox Error/Ready phase, GPU proof, or onboard exit.
  • **Acceptance clause:** Device: GPU CI runners — Ubuntu 24.04 (NVIDIA RTX PRO 6000 Blackwell Server Edition, 97887 MB) and Ubuntu 26.04 (NVIDIA RTX A6000, 46068 MB) — add test evidence or identify existing coverage. The tests simulate `/etc/cdi/nvidia.yaml` but do not run on or model the affected GPU hardware/Ubuntu runtime profiles.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

…4948)

Address PR review-advisor findings on the CDI-first GPU mode change:

- Move the CDI mode-selection tests out of the docker-gpu-patch.test.ts
  monolith into a focused docker-gpu-patch-mode-selection.test.ts spec,
  offsetting the flagged monolith growth (back to ~baseline line count).
- Pin the fallback chain on a CDI host: CDI probe fails -> --gpus selected;
  CDI and --gpus probes fail -> NVIDIA runtime selected (attempt order
  starts with cdi in both cases).
- Add a recreate-boundary assertion: recreateOpenShellDockerSandboxWithGpu
  passes --device nvidia.com/gpu=all to dockerRunDetached on a CDI host and
  never emits --gpus, proving the selected CDI mode reaches the real recreate
  command (the patched_create_option the issue logs).

All four new tests fail under the previous --gpus-first ordering, confirming
they pin the fix rather than restating it. No production code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
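
The fallback chain pinned by this commit can be illustrated with a small, self-contained sketch. The names `selectGpuModeSketch`, `Mode`, and `SelectionResult` are hypothetical, and the synchronous `probe` callback is an assumption standing in for the create-only Docker probe; the real selection code in docker-gpu-patch.ts may be structured differently.

```typescript
type Mode = "cdi" | "gpus" | "nvidia-runtime";

interface SelectionResult {
  selected: Mode | null; // null when every probe fails
  attempted: Mode[];     // modes probed, in order
}

// Walks the candidates in order and picks the first one whose probe passes.
function selectGpuModeSketch(
  candidates: readonly Mode[],
  probe: (mode: Mode) => boolean,
): SelectionResult {
  const attempted: Mode[] = [];
  for (const mode of candidates) {
    attempted.push(mode);
    if (probe(mode)) {
      return { selected: mode, attempted };
    }
  }
  return { selected: null, attempted };
}

// Candidate order on a CDI host after this PR.
const cdiHostOrder: Mode[] = ["cdi", "gpus", "nvidia-runtime"];

// CDI probe fails -> --gpus selected.
const cdiFails = selectGpuModeSketch(cdiHostOrder, (m) => m !== "cdi");
// CDI and --gpus probes fail -> NVIDIA runtime selected.
const onlyRuntime = selectGpuModeSketch(cdiHostOrder, (m) => m === "nvidia-runtime");
// Every probe passes (the host shape from #4948) -> cdi wins because it is first.
const allPass = selectGpuModeSketch(cdiHostOrder, () => true);

console.log(cdiFails.selected, onlyRuntime.selected, allPass.selected);
// prints "gpus nvidia-runtime cdi"
```

Because the first passing probe wins, the outcome on a host where both the CDI and `--gpus` probes pass depends only on candidate order, which is why the fix is an ordering change.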
@jason-ma-nv jason-ma-nv added the v0.0.61 (Release target) label Jun 8, 2026
@wscurran wscurran added the area: providers (Inference provider integrations and provider behavior) and bug-fix (PR fixes a bug or regression) labels Jun 8, 2026
@wscurran

wscurran commented Jun 8, 2026

Contributor

@wscurran wscurran requested a review from cv June 8, 2026 16:08
@cv cv added the v0.0.62 (Release target) label and removed the v0.0.61 (Release target) label Jun 8, 2026

Labels

  • area: providers (Inference provider integrations and provider behavior)
  • bug-fix (PR fixes a bug or regression)
  • v0.0.62 (Release target)

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Onboard] onboard cannot create a GPU-enabled sandbox on Docker-driver GPU host

3 participants