
fix(vllm): switch DGX Spark from upstream nightly build to NGC vllm 26.05.post1 #4810

Merged
cv merged 1 commit into main from fix/vllm-ngc-26-05 on Jun 5, 2026

Conversation

@zyang-dev
Contributor

@zyang-dev zyang-dev commented Jun 5, 2026

Summary

Switches the DGX Spark vLLM profile from the upstream vllm/vllm-openai nightly to the stable NGC release `nvcr.io/nvidia/vllm:26.05.post1-py3`.

Changes

  • Added a VLLM_IMAGES map for profile-specific vLLM image selection.
  • Updated the DGX Spark profile to use nvcr.io/nvidia/vllm:26.05.post1-py3.
  • Updated vLLM profile detection tests for the Spark image change.
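
For readers who have not opened the diff, the change can be pictured roughly as follows. This is a minimal sketch, not the contents of `src/lib/inference/vllm.ts`: the `ngc2605Post1` key and both image tags are taken from the review comments on this PR, while the `ngc2603Post1` key name, the `VllmProfile` shape, and the profile `name` values are assumptions made for illustration.

```typescript
// Hypothetical sketch of a profile-specific image map.
const VLLM_IMAGES = {
  // Key name assumed; the tag is the one the review reports for Station
  // and generic Linux.
  ngc2603Post1: "nvcr.io/nvidia/vllm:26.03.post1-py3",
  // Key name and tag as cited in the PR Review Advisor comment.
  ngc2605Post1: "nvcr.io/nvidia/vllm:26.05.post1-py3",
} as const;

// Union of the image strings in the map, so a profile cannot point at an
// image that is not listed there.
type VllmImage = (typeof VLLM_IMAGES)[keyof typeof VLLM_IMAGES];

// Assumed minimal profile shape: only the fields needed for this example.
interface VllmProfile {
  name: string;
  image: VllmImage;
}

// DGX Spark moves to the 26.05.post1 image; the other profiles keep 26.03.post1.
const SPARK_PROFILE: VllmProfile = {
  name: "dgx-spark",
  image: VLLM_IMAGES.ngc2605Post1,
};
const STATION_PROFILE: VllmProfile = {
  name: "dgx-station",
  image: VLLM_IMAGES.ngc2603Post1,
};
const GENERIC_LINUX_PROFILE: VllmProfile = {
  name: "generic-linux",
  image: VLLM_IMAGES.ngc2603Post1,
};
```

Keeping the tags in one map means a future image bump touches a single entry, and the `VllmImage` type rejects an image string that is not in the map.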

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

  • Chores

    • Centralized and updated vLLM container image configuration for supported platforms.
  • Tests

    • Updated vLLM profile detection tests with latest supported container images.

…6.05.post1

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai Bot commented Jun 5, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1502864b-1c0a-43e7-bae1-2458d9f09841

📥 Commits

Reviewing files that changed from the base of the PR and between a92394f and 8472ddc.

📒 Files selected for processing (2)
  • src/lib/inference/vllm.ts
  • test/detect-vllm-profile.test.ts

📝 Walkthrough


The PR centralizes vLLM container image configuration by introducing a new VLLM_IMAGES constant with explicit per-version image tags and updates three platform profiles (Spark, Station, generic Linux) to reference the centralized constant instead of prior single-image constants. Tests are updated to validate the new image references.

Changes

vLLM Image Configuration Centralization

| Layer / File(s) | Summary |
|---|---|
| Image constant consolidation: src/lib/inference/vllm.ts | A new VLLM_IMAGES object containing explicit nvcr.io vLLM image tags for post1 variants replaces prior UPSTREAM_VLLM_IMAGE and NGC_VLLM_IMAGE constants. |
| Platform profile updates: src/lib/inference/vllm.ts | SPARK_PROFILE, STATION_PROFILE, and GENERIC_LINUX_PROFILE are updated to reference the appropriate image from the new VLLM_IMAGES constant. |
| Test alignment and cleanup: test/detect-vllm-profile.test.ts | Spark profile test expectation is updated to the new nvcr.io/nvidia/vllm:26.05.post1-py3 tag, and an unused Station test comment is removed. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4619: Prior PR modifying vLLM image configuration for the same platform profiles with different image tag strategy.

Suggested labels

Platform: DGX Spark, Platform: Station, Provider: vLLM

Poem

🐰 Images once scattered, now gathered with care,
Constants consolidated throughout the air,
Post1 tags aligned from Spark to Station's care,
A cleaner foundation, refactored fair!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: switching DGX Spark from upstream nightly vLLM build to NGC stable release 26.05.post1, which is directly supported by the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

@zyang-dev zyang-dev added the v0.0.60 Release target label Jun 5, 2026
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Pin or verify the managed Spark vLLM image

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Managed Spark vLLM image is tag-pinned but not digest-pinned or verified (src/lib/inference/vllm.ts:54): The PR switches DGX Spark to a new third-party executable container image, `nvcr.io/nvidia/vllm:26.05.post1-py3`, which NemoClaw pulls and runs with GPU access, a mounted Hugging Face cache, and potentially forwarded Hugging Face token environment variables. The tag is versioned, but tags can still be republished or resolve differently over time, so this weakens reproducibility and supply-chain integrity for a high-trust runtime path.
    • Recommendation: Pin the Spark image by immutable digest, for example `nvcr.io/nvidia/vllm:26.05.post1-py3@sha256:...`, or add an explicit image signature/digest verification step and document the trusted source for the digest.
    • Evidence: The new `VLLM_IMAGES.ngc2605Post1` value is a tag-only image reference at `src/lib/inference/vllm.ts:54`; `SPARK_PROFILE.image` uses that value at `src/lib/inference/vllm.ts:117`. Existing nearby code pulls the image and later runs it with the HF cache mount and optional HF token forwarding.
  • Runtime validation is still needed for the new Spark vLLM image (test/detect-vllm-profile.test.ts:18): The updated unit test confirms that DGX Spark selects the new image string, but this change affects an infrastructure/runtime path. The new image still needs to support the existing Spark default model and serve command flags, including NVFP4/modelopt/FlashInfer settings and readiness on `/v1/models`.
    • Recommendation: Add or identify targeted runtime/integration validation that starts the DGX Spark managed vLLM profile with `nvcr.io/nvidia/vllm:26.05.post1-py3`, serves `nvidia/Qwen3.6-35B-A3B-NVFP4`, and reaches the `/v1/models` readiness endpoint using the existing generated serve command.
    • Evidence: The test only updates `expect(profile!.image).toBe("nvcr.io/nvidia/vllm:26.05.post1-py3")`; the runtime start path in `src/lib/inference/vllm.ts` still composes and runs `docker run ... vllm serve ...`, but no changed test exercises pull/start/readiness behavior with the new image.
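
One way to act on the digest-pinning recommendation is a small guard that rejects tag-only references before any pull happens. The sketch below is illustrative only: `isDigestPinned` and `assertDigestPinned` are hypothetical helpers, not functions from NemoClaw, and the check covers only the `@sha256:<64 hex>` suffix form. It does not verify that the digest matches what the registry serves, and it is not a signature check.

```typescript
// Matches an immutable digest suffix of the form "@sha256:<64 lowercase hex chars>".
const SHA256_DIGEST = /@sha256:[a-f0-9]{64}$/;

/** Returns true when the image reference ends in a sha256 digest. */
function isDigestPinned(imageRef: string): boolean {
  return SHA256_DIGEST.test(imageRef);
}

/**
 * Hypothetical guard: throws if any image in the map is referenced by tag
 * only, listing the offending keys in the error message.
 */
function assertDigestPinned(images: Record<string, string>): void {
  const unpinned = Object.keys(images).filter(
    (name) => !isDigestPinned(images[name]),
  );
  if (unpinned.length > 0) {
    throw new Error("Images not pinned by digest: " + unpinned.join(", "));
  }
}
```

Run against the tag-only reference `nvcr.io/nvidia/vllm:26.05.post1-py3`, `isDigestPinned` returns `false`, so a unit test built on `assertDigestPinned` would fail until a digest obtained from a trusted source is appended to the reference.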

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation**: Validate on DGX Spark that managed vLLM with `nvcr.io/nvidia/vllm:26.05.post1-py3` starts the default `nvidia/Qwen3.6-35B-A3B-NVFP4` model and reaches `/v1/models`. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
  • **Runtime validation**: Validate that the new Spark image accepts the existing generated NVFP4 serve command, including `--quantization modelopt`, FlashInfer/MoE flags, `--load-format fastsafetensors`, and Spark-specific serve environment exports. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
  • **Runtime validation**: Validate that HF token forwarding for the Spark managed vLLM start path still uses key-only Docker env arguments and does not include token values in the composed Docker command. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
  • **Runtime validation**: Keep the existing unit coverage that Station and generic Linux profiles continue selecting `nvcr.io/nvidia/vllm:26.03.post1-py3` after the shared image map refactor. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
  • **Runtime validation is still needed for the new Spark vLLM image**: Add or identify targeted runtime/integration validation that starts the DGX Spark managed vLLM profile with `nvcr.io/nvidia/vllm:26.05.post1-py3`, serves `nvidia/Qwen3.6-35B-A3B-NVFP4`, and reaches the `/v1/models` readiness endpoint using the existing generated serve command.
  • **Acceptance clause:** `Tests added or updated for new or changed behavior`: add test evidence or identify existing coverage. The profile-selection unit expectation was updated, but no changed or identified runtime validation exercises pulling/starting the new Spark container image with the default model and serve flags.
  • **Acceptance clause:** `Docs updated for user-facing behavior changes`: add test evidence or identify existing coverage. No docs changed. Nearby user docs describe managed vLLM availability and the Spark default model, but I did not find a user-facing exact container-tag reference that would clearly require updating.
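
Two of the checks listed above can be exercised without a GPU if the logic lives in pure helpers. The following is a sketch under stated assumptions: `keyOnlyEnvArgs` and `servesModel` are hypothetical names, `HF_TOKEN` is assumed to be one of the forwarded variables, and the response shape is the OpenAI-compatible model list (`data[].id`) that `/v1/models` returns. A real runtime test would still need to start the container and poll the endpoint.

```typescript
/**
 * Builds `docker run` env arguments in key-only form ("-e NAME"). Docker then
 * copies the value from the host environment, so the value itself never
 * appears in the composed command line.
 */
function keyOnlyEnvArgs(
  names: string[],
  hostEnv: Record<string, string | undefined>,
): string[] {
  const args: string[] = [];
  for (const name of names) {
    const value = hostEnv[name];
    // Forward only variables that are actually set on the host.
    if (value !== undefined && value !== "") {
      args.push("-e", name);
    }
  }
  return args;
}

/** OpenAI-compatible model list, reduced to the fields used here. */
interface ModelList {
  data?: Array<{ id?: string }>;
}

/** Returns true when a parsed `/v1/models` body lists the expected model id. */
function servesModel(body: unknown, modelId: string): boolean {
  const list = body as ModelList | null;
  if (!list || !Array.isArray(list.data)) {
    return false;
  }
  return list.data.some((m) => m != null && m.id === modelId);
}
```

A readiness poller could call `servesModel` on each parsed response until it returns `true`. A unit test could assert that the output of `keyOnlyEnvArgs` contains the variable name but not the token value.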

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@cv cv merged commit 9096d1f into main Jun 5, 2026
35 checks passed
@cv cv deleted the fix/vllm-ngc-26-05 branch June 5, 2026 07:10
miyoungc added a commit that referenced this pull request Jun 6, 2026
## Summary
- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the
dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across
inference, policy tiers, command behavior, security boundaries, Hermes
dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the
current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary
- #4037 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
try-first sandbox network/filesystem guidance and clearer failure
classification.
- #4788 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents shared OpenClaw
device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`,
`docs/network-policy/integration-policy-examples.mdx`,
`docs/get-started/quickstart.mdx`,
`docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`:
Documents `weather`, `public-reference`, and Hermes managed-tool gateway
preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`,
`docs/reference/commands.mdx`: Documents non-interactive policy-tier
fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware
default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel
status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy
preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents
package-managed OpenShell gateway service and Docker-driver
gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent
gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU
patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`,
`docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission
guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling
for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`,
`docs/manage-sandboxes/runtime-controls.mdx`: Documents
`NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`,
`docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM
platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents
stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash
coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`,
`docs/reference/troubleshooting.mdx`: Documents Ollama model fit
filtering, 16K context floor, cold-load retry, and failed-model
exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents
API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents
Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents
interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass
normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents
prebuilt Hermes dashboard assets and TUI recovery without runtime
rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`,
`docs/reference/commands.mdx`: Documents Hermes proxy API-key
placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`.agents/skills/`: Keeps messaging enrollment behavior aligned with
manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`,
`docs/security/credential-storage.mdx`: Documents Hermes
placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents expanded memory scanner
examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill
install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep
structure and generated `.agents/skills/` refresh as the template for
this release.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/
--prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks,
env-var docs gate, docs-to-skills verification, and skills YAML tests

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash now available as default inference model for DGX
    Station.
  * Hermes dashboard improved with dedicated port and OAuth-authenticated
    tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent
    capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and
    automatic retry for timeouts.

* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Better sandbox cleanup scoping by port to prevent conflicts across
    deployments.
  * Added GPU patch failure recovery with automatic rollback.

* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox
    lifecycle.
  * Added .dockerignore best practices for custom deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

Labels

v0.0.60 Release target


3 participants