
fix(onboard): restore Qwen3.6 27B FP8 as DGX Station vLLM default #4888

Merged: cv merged 1 commit into main from fix/revert-dgx-station-vllm-to-qwen27b on Jun 6, 2026

Conversation

@zyang-dev zyang-dev commented Jun 6, 2026

Summary

Restores Qwen 3.6 27B FP8 as the default managed-vLLM model for DGX Station because the DeepSeek V4 Flash recipe needs more accuracy validation.

Changes

  • Switched the DGX Station vLLM profile default back to Qwen/Qwen3.6-27B-FP8.
  • Kept deepseek-v4-flash registered as a supported managed-vLLM override.
  • Updated profile tests and docs to describe Qwen 27B FP8 as the DGX Station default.
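
The shape of this change can be pictured with a minimal sketch. Only `STATION_PROFILE.defaultModel`, `qwen27bFP8Model`, the two slugs, and the `Qwen/Qwen3.6-27B-FP8` model id come from this PR; the `VllmModel` and `VllmProfile` types, the `overrideSlugs` field, and the helper body are assumptions made for illustration, not the definitions in `src/lib/inference/vllm.ts`.

```typescript
// Illustrative only: the types and the overrideSlugs field are hypothetical.
interface VllmModel {
  slug: string; // value accepted by NEMOCLAW_VLLM_MODEL
  hfId: string; // Hugging Face model id that vLLM serves
}

interface VllmProfile {
  platform: string;
  defaultModel: VllmModel;
  overrideSlugs: readonly string[]; // supported, but never chosen implicitly
}

// Helper name taken from the PR walkthrough; this body is a guess.
function qwen27bFP8Model(): VllmModel {
  return { slug: "qwen3.6-27b", hfId: "Qwen/Qwen3.6-27B-FP8" };
}

const STATION_PROFILE: VllmProfile = {
  platform: "DGX Station",
  defaultModel: qwen27bFP8Model(),
  overrideSlugs: ["deepseek-v4-flash"],
};

console.log(STATION_PROFILE.defaultModel.hfId); // prints "Qwen/Qwen3.6-27B-FP8"
```

Keeping the override as a slug list rather than a second default reflects the intent stated above: DeepSeek stays selectable, but only when someone asks for it.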

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Your Name your-email@example.com

Summary by CodeRabbit

  • Documentation

    • Updated the default model for DGX Station's managed vLLM profile from DeepSeek V4 Flash to Qwen3.6-27B-FP8.
    • Reordered documentation tables listing available model overrides.
    • Updated environment variable documentation for model configuration.
  • Tests

    • Updated test cases to verify the new default model behavior for DGX Station vLLM profiles.

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

coderabbitai Bot commented Jun 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cbe899ec-e049-4a30-a220-a879eb7e45ad

📥 Commits

Reviewing files that changed from the base of the PR and between 1f0b4c5 and a11e7f4.

📒 Files selected for processing (8)
  • docs/inference/inference-options.mdx
  • docs/reference/commands-nemohermes.mdx
  • docs/reference/commands.mdx
  • src/lib/inference/vllm-models.test.ts
  • src/lib/inference/vllm-models.ts
  • src/lib/inference/vllm.test.ts
  • src/lib/inference/vllm.ts
  • test/detect-vllm-profile.test.ts

Walkthrough

This PR updates the default vLLM model for DGX Station from DeepSeek V4 Flash to Qwen3.6-27B-FP8. The change spans platform profile configuration, test assertions, documentation tables, and environment variable references to ensure consistent behavior across the codebase.

Changes

DGX Station vLLM Default Model Switch

| Layer / File(s) | Summary |
| --- | --- |
| Core model selection logic (`src/lib/inference/vllm.ts`) | Introduces the `qwen27bFP8Model()` helper and updates `STATION_PROFILE.defaultModel` to use the new model. The `installVllm` comment documenting per-platform defaults is adjusted to reflect Station: Qwen3.6-27B and Spark: Qwen3.6-35B-A3B NVFP4. |
| Model selector documentation (`src/lib/inference/vllm-models.ts`) | `selectVllmModelFromEnv` documentation is updated to specify per-platform defaults for unset `NEMOCLAW_VLLM_MODEL` (Station: Qwen3.6-27B; Spark: Qwen3.6-35B-A3B NVFP4). |
| Test expectations for new model default (`src/lib/inference/vllm.test.ts`, `src/lib/inference/vllm-models.test.ts`, `test/detect-vllm-profile.test.ts`) | vLLM profile detection tests assert `Qwen/Qwen3.6-27B-FP8` with env value `qwen3.6-27b` for DGX Station. Test descriptions are clarified to distinguish DeepSeek as a managed-vLLM override rather than a Station-specific default. |
| User-facing documentation (`docs/inference/inference-options.mdx`, `docs/reference/commands-nemohermes.mdx`, `docs/reference/commands.mdx`) | The inference options table promotes `qwen3.6-27b` to "default on the DGX Station profile" and demotes `deepseek-v4-flash` to "supported override." Environment variable reference docs reorder recognized slugs to reflect the new ordering. |
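
The selection rule described in the table (an unset `NEMOCLAW_VLLM_MODEL` falls back to the platform default, a recognized slug overrides it) might look roughly like the sketch below. This is not the real `selectVllmModelFromEnv`: the name `resolveSlugSketch`, its signature, the two-entry slug list, and the choice to throw on an unrecognized value are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Only the two slugs named in this PR; the real registry may hold more.
const RECOGNIZED_SLUGS = ["qwen3.6-27b", "deepseek-v4-flash"] as const;
type Slug = (typeof RECOGNIZED_SLUGS)[number];

// Hypothetical resolver: unset or blank env value -> platform default,
// recognized slug -> that slug, anything else -> error (assumed behavior).
function resolveSlugSketch(env: Env, platformDefault: Slug): Slug {
  const raw = env["NEMOCLAW_VLLM_MODEL"]?.trim();
  if (!raw) return platformDefault;
  const match = RECOGNIZED_SLUGS.find((slug) => slug === raw);
  if (match === undefined) {
    throw new Error(`Unrecognized NEMOCLAW_VLLM_MODEL: ${raw}`);
  }
  return match;
}

// DGX Station after this PR: the default is qwen3.6-27b.
console.log(resolveSlugSketch({}, "qwen3.6-27b"));
console.log(resolveSlugSketch({ NEMOCLAW_VLLM_MODEL: "deepseek-v4-flash" }, "qwen3.6-27b"));
```

Passing the platform default in as a parameter keeps the resolver independent of any one profile, so the same function would serve Station and Spark with different defaults.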

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4867: Prior vLLM model default changes for DGX Station and related platform profile wiring updates.

Suggested labels

Platform: Station, Provider: vLLM, area: inference, bug-fix

Suggested reviewers

  • cv

Poem

🐰 A Station that once held a DeepSeek so bright,
Now shines with a Qwen's efficient light,
Tests and docs aligned in perfect accord,
Models swapped swift by reviewer's word! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: reverting DGX Station's default vLLM model from DeepSeek V4 Flash back to Qwen3.6 27B FP8, which is reflected consistently across documentation and test updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


github-actions Bot commented Jun 6, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt


github-actions Bot commented Jun 6, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

github-actions Bot commented Jun 6, 2026

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Top item: No actionable findings

Consider writing more tests for
  • **Runtime validation**: a DGX Station managed-vLLM install with `NEMOCLAW_VLLM_MODEL` unset pre-downloads and serves `Qwen/Qwen3.6-27B-FP8` and reports that model in the install summary. Unit tests cover the static profile/default selection and DeepSeek override registry behavior. Because the touched path starts containers and serves a model in managed-vLLM onboarding, runtime validation would improve confidence without being required to understand the code-level change.
  • **Runtime validation**: a DGX Station managed-vLLM install with `NEMOCLAW_VLLM_MODEL=deepseek-v4-flash` still selects DeepSeek as an override and keeps the DeepSeek-specific serve flags. Unit tests cover the static profile/default selection and DeepSeek override registry behavior. Because the touched path starts containers and serves a model in managed-vLLM onboarding, runtime validation would improve confidence without being required to understand the code-level change.
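
One possible way to run such a runtime check is to ask the running server which model it serves. vLLM's OpenAI-compatible server answers `GET /v1/models` with a list of served model ids, so a sketch could look like the following. The function names and the base URL are assumptions, and the reported id can differ from the Hugging Face id if the install sets a custom served model name; nothing here reflects how NemoClaw itself validates an install.

```typescript
// Extract model ids from an OpenAI-style /v1/models response body.
// Tolerates malformed input by returning only well-formed string ids.
function servedModelIds(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];
  return data
    .map((entry) => (entry as { id?: unknown } | null)?.id)
    .filter((id): id is string => typeof id === "string");
}

// Hypothetical check: fails unless the server at baseUrl reports expectedId.
// Uses the global fetch available in Node.js 18 and later.
async function assertServes(baseUrl: string, expectedId: string): Promise<void> {
  const res = await fetch(`${baseUrl}/v1/models`);
  if (!res.ok) throw new Error(`GET /v1/models failed with HTTP ${res.status}`);
  const ids = servedModelIds(await res.json());
  if (ids.indexOf(expectedId) === -1) {
    throw new Error(`expected ${expectedId}, server reports: ${ids.join(", ") || "none"}`);
  }
}

// Example call; the URL is an assumption about where managed vLLM listens:
// await assertServes("http://localhost:8000", "Qwen/Qwen3.6-27B-FP8");
```

Splitting the parsing from the network call keeps the parsing unit-testable without starting a container.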

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@zyang-dev zyang-dev added the v0.0.61 Release target label Jun 6, 2026
@cv cv merged commit af39e2a into main Jun 6, 2026
36 checks passed
@cv cv deleted the fix/revert-dgx-station-vllm-to-qwen27b branch June 6, 2026 06:39

Labels

v0.0.61 Release target


2 participants