fix(vllm): switch DGX Spark from upstream nightly build to NGC vllm 26.05.post1 by zyang-dev · Pull Request #4810 · NVIDIA/NemoClaw

zyang-dev · 2026-06-05T06:12:12Z

Summary

Switches the DGX Spark vLLM profile from the upstream vllm/vllm-openai nightly to the stable NGC release `nvcr.io/nvidia/vllm:26.05.post1-py3`.

Changes

Added a VLLM_IMAGES map for profile-specific vLLM image selection.
Updated the DGX Spark profile to use nvcr.io/nvidia/vllm:26.05.post1-py3.
Updated vLLM profile detection tests for the Spark image change.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

Chores
- Centralized and updated vLLM container image configuration for supported platforms.
Tests
- Updated vLLM profile detection tests with latest supported container images.

coderabbitai · 2026-06-05T06:12:25Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1502864b-1c0a-43e7-bae1-2458d9f09841

📥 Commits

Reviewing files that changed from the base of the PR and between a92394f and 8472ddc.

📒 Files selected for processing (2)

src/lib/inference/vllm.ts
test/detect-vllm-profile.test.ts

📝 Walkthrough

Walkthrough

The PR centralizes vLLM container image configuration by introducing a new VLLM_IMAGES constant with explicit per-version image tags and updates three platform profiles (Spark, Station, generic Linux) to reference the centralized constant instead of prior single-image constants. Tests are updated to validate the new image references.

Changes

vLLM Image Configuration Centralization

| Layer / File(s) | Summary |
| --- | --- |
| Image constant consolidation `src/lib/inference/vllm.ts` | A new `VLLM_IMAGES` object containing explicit nvcr.io vLLM image tags for post1 variants replaces prior `UPSTREAM_VLLM_IMAGE` and `NGC_VLLM_IMAGE` constants. |
| Platform profile updates `src/lib/inference/vllm.ts` | `SPARK_PROFILE`, `STATION_PROFILE`, and `GENERIC_LINUX_PROFILE` are updated to reference the appropriate image from the new `VLLM_IMAGES` constant. |
| Test alignment and cleanup `test/detect-vllm-profile.test.ts` | Spark profile test expectation is updated to the new `nvcr.io/nvidia/vllm:26.05.post1-py3` tag, and an unused Station test comment is removed. |
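
As a rough illustration of the consolidation described in the walkthrough, a profile-to-image map might look like the sketch below. Only the `VLLM_IMAGES`, `SPARK_PROFILE`, `STATION_PROFILE`, and `GENERIC_LINUX_PROFILE` names, the `ngc2605Post1` key, and the two image tags come from this PR's review output. The `ngc2603Post1` key, the `VllmProfile` shape, and the `label` values are assumptions, and the real `src/lib/inference/vllm.ts` may differ.

```typescript
// Hypothetical sketch; not the actual contents of src/lib/inference/vllm.ts.
const VLLM_IMAGES = {
  // Key name and tag both appear in the review findings for this PR.
  ngc2605Post1: "nvcr.io/nvidia/vllm:26.05.post1-py3",
  // The tag appears in the review findings; the key name is an assumption.
  ngc2603Post1: "nvcr.io/nvidia/vllm:26.03.post1-py3",
} as const;

// Union of every image string in the map, so a profile cannot use an unlisted image.
type VllmImage = (typeof VLLM_IMAGES)[keyof typeof VLLM_IMAGES];

// Assumed minimal profile shape; the real profiles carry more fields.
interface VllmProfile {
  label: string;
  image: VllmImage;
}

const SPARK_PROFILE: VllmProfile = {
  label: "dgx-spark",
  image: VLLM_IMAGES.ngc2605Post1,
};

const STATION_PROFILE: VllmProfile = {
  label: "dgx-station",
  image: VLLM_IMAGES.ngc2603Post1,
};

const GENERIC_LINUX_PROFILE: VllmProfile = {
  label: "generic-linux",
  image: VLLM_IMAGES.ngc2603Post1,
};
```

Keeping the tags in one `as const` object means a future image bump touches a single line, and the derived `VllmImage` type rejects any profile that references an image outside the map.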

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

NVIDIA/NemoClaw#4619: Prior PR modifying vLLM image configuration for the same platform profiles with different image tag strategy.

Suggested labels

Platform: DGX Spark, Platform: Station, Provider: vLLM

Poem

🐰 Images once scattered, now gathered with care,
Constants consolidated throughout the air,
Post1 tags aligned from Spark to Station's care,
A cleaner foundation, refactored fair! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: switching DGX Spark from upstream nightly vLLM build to NGC stable release 26.05.post1, which is directly supported by the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions · 2026-06-05T06:12:46Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

github-actions · 2026-06-05T06:12:47Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

github-actions · 2026-06-05T06:15:33Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Pin or verify the managed Spark vLLM image

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Managed Spark vLLM image is tag-pinned but not digest-pinned or verified (src/lib/inference/vllm.ts:54): The PR switches DGX Spark to a new third-party executable container image, `nvcr.io/nvidia/vllm:26.05.post1-py3`, which NemoClaw pulls and runs with GPU access, a mounted Hugging Face cache, and potentially forwarded Hugging Face token environment variables. The tag is versioned, but tags can still be republished or resolve differently over time, so this weakens reproducibility and supply-chain integrity for a high-trust runtime path.
- Recommendation: Pin the Spark image by immutable digest, for example `nvcr.io/nvidia/vllm:26.05.post1-py3@sha256:...`, or add an explicit image signature/digest verification step and document the trusted source for the digest.
- Evidence: The new `VLLM_IMAGES.ngc2605Post1` value is a tag-only image reference at `src/lib/inference/vllm.ts:54`; `SPARK_PROFILE.image` uses that value at `src/lib/inference/vllm.ts:117`. Existing nearby code pulls the image and later runs it with the HF cache mount and optional HF token forwarding.
Runtime validation is still needed for the new Spark vLLM image (test/detect-vllm-profile.test.ts:18): The updated unit test confirms that DGX Spark selects the new image string, but this change affects an infrastructure/runtime path. The new image still needs to support the existing Spark default model and serve command flags, including NVFP4/modelopt/FlashInfer settings and readiness on `/v1/models`.
- Recommendation: Add or identify targeted runtime/integration validation that starts the DGX Spark managed vLLM profile with `nvcr.io/nvidia/vllm:26.05.post1-py3`, serves `nvidia/Qwen3.6-35B-A3B-NVFP4`, and reaches the `/v1/models` readiness endpoint using the existing generated serve command.
- Evidence: The test only updates `expect(profile!.image).toBe("nvcr.io/nvidia/vllm:26.05.post1-py3")`; the runtime start path in `src/lib/inference/vllm.ts` still composes and runs `docker run ... vllm serve ...`, but no changed test exercises pull/start/readiness behavior with the new image.
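
The digest-pinning recommendation above could be enforced with a small guard like the following sketch. `isDigestPinned` is a hypothetical helper, not something in NemoClaw, and the digest shown is an all-zero placeholder rather than the real digest of the NGC image.

```typescript
// Hypothetical helper; not part of NemoClaw.
// Returns true when an image reference ends in an immutable sha256 digest.
function isDigestPinned(imageRef: string): boolean {
  // A sha256 digest is "sha256:" followed by exactly 64 lowercase hex characters.
  return /@sha256:[0-9a-f]{64}$/.test(imageRef);
}

// Tag-only reference, as introduced by this PR.
const tagOnly = "nvcr.io/nvidia/vllm:26.05.post1-py3";

// Placeholder digest for illustration only (64 zeros); NOT the image's real digest.
const placeholderDigest = new Array(65).join("0");

// Tag plus digest, in the form the review recommends.
const pinned = tagOnly + "@sha256:" + placeholderDigest;
```

Here `isDigestPinned(tagOnly)` is false and `isDigestPinned(pinned)` is true. A check of this kind only confirms the reference format; it does not verify that the digest comes from a trusted source.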

🌱 Nice ideas

None.

Consider writing more tests for

- **Runtime validation:** Validate on DGX Spark that managed vLLM with `nvcr.io/nvidia/vllm:26.05.post1-py3` starts the default `nvidia/Qwen3.6-35B-A3B-NVFP4` model and reaches `/v1/models`. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
- **Runtime validation:** Validate that the new Spark image accepts the existing generated NVFP4 serve command, including `--quantization modelopt`, FlashInfer/MoE flags, `--load-format fastsafetensors`, and Spark-specific serve environment exports. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
- **Runtime validation:** Validate that HF token forwarding for the Spark managed vLLM start path still uses key-only Docker env arguments and does not include token values in the composed Docker command. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
- **Runtime validation:** Keep the existing unit coverage that Station and generic Linux profiles continue selecting `nvcr.io/nvidia/vllm:26.03.post1-py3` after the shared image map refactor. The unit test is sufficient for string/profile mapping, but managed vLLM is an infrastructure path where a container image tag change can fail only at pull/start/model-load/readiness time.
- **Runtime validation is still needed for the new Spark vLLM image:** Add or identify targeted runtime/integration validation that starts the DGX Spark managed vLLM profile with `nvcr.io/nvidia/vllm:26.05.post1-py3`, serves `nvidia/Qwen3.6-35B-A3B-NVFP4`, and reaches the `/v1/models` readiness endpoint using the existing generated serve command.
- **Acceptance clause:** `Tests added or updated for new or changed behavior`: add test evidence or identify existing coverage. The profile-selection unit expectation was updated, but no changed or identified runtime validation exercises pulling/starting the new Spark container image with the default model and serve flags.
- **Acceptance clause:** `Docs updated for user-facing behavior changes`: add test evidence or identify existing coverage. No docs changed. Nearby user docs describe managed vLLM availability and the Spark default model, but I did not find a user-facing exact container-tag reference that would clearly require updating.
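
The key-only token forwarding mentioned in the list above can be sketched as follows. This relies on standard Docker behavior: `docker run -e NAME` with no `=value` copies the variable from the caller's environment. The helper name `keyOnlyEnvArgs` and the choice of `HF_TOKEN` and `HUGGING_FACE_HUB_TOKEN` as forwarded keys are assumptions for illustration; the PR does not show which variables NemoClaw forwards or how it composes the command.

```typescript
// Hypothetical helper; names are illustrative and not taken from NemoClaw.
// Builds `docker run` env arguments that forward variables by key only, so the
// secret value never appears in the composed command line.
function keyOnlyEnvArgs(
  keys: string[],
  env: Record<string, string | undefined>,
): string[] {
  const args: string[] = [];
  for (const key of keys) {
    const value = env[key];
    // Forward only variables that are actually set to a non-empty value.
    if (value !== undefined && value !== "") {
      args.push("-e", key);
    }
  }
  return args;
}

// Example environment with a fake token value (assumption for illustration).
const exampleEnv: Record<string, string | undefined> = {
  HF_TOKEN: "hf_example_not_a_real_token",
};

// Only HF_TOKEN is set, so only that key is forwarded.
const envArgs = keyOnlyEnvArgs(["HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"], exampleEnv);
```

A test along these lines could assert that the composed arguments contain the key but never the token value, which is the property the advisor asks to keep validated.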

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

## Summary

- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary

- #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification.
- #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash now available as default inference model for DGX Station.
  * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts.
* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Better sandbox cleanup scoping by port to prevent conflicts across deployments.
  * Added GPU patch failure recovery with automatic rollback.
* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox lifecycle.
  * Added .dockerignore best practices for custom deployments.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

fix(vllm): switch DGX Spark from upstream nightly build to NGC vllm 2…

8472ddc

…6.05.post1 Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev added the v0.0.60 Release target label Jun 5, 2026

cv approved these changes Jun 5, 2026

cv merged commit 9096d1f into main Jun 5, 2026
35 checks passed

cv deleted the fix/vllm-ngc-26-05 branch June 5, 2026 07:10

coderabbitai Bot mentioned this pull request Jun 5, 2026

feat(inference): update DGX Station vLLM to DeepSeek V4 Flash #4867

Merged

12 tasks

miyoungc mentioned this pull request Jun 6, 2026

docs: refresh v0.0.60 release notes #4879

Merged
