fix(preflight): reject WDDM placeholder GPU names on non-NVIDIA firmware #4062

Merged
ericksoa merged 2 commits into main from fix/preflight-nvidia-gpu-validate-3988
May 22, 2026

Conversation

@laitingsheng
Contributor

@laitingsheng laitingsheng commented May 22, 2026

Summary

On Snapdragon X WSL2 hosts (Windows on ARM, no NVIDIA hardware), a d3d12/WDDM nvidia-smi.exe shim is published into the WSL distro and returns JMJWOA-Generic-GPU as the GPU name. detectGpu()'s primary nvidia-smi --query-gpu=name,memory.total path accepted any non-empty name, so preflight reported a fake "NVIDIA GPU detected (JMJWOA-Generic-GPU, 65471 MB)" and onboard proceeded down GPU-enabled code paths that later broke in rebuild's stricter CDI check.

Real DGX Spark legitimately reports the same placeholder string (see #3510), so the rejection must be conditional: trust the name when it contains NVIDIA or a known NVIDIA product family, otherwise require the firmware platform to vouch for it (spark / station / jetson).

Related Issue

Fixes #3988

Changes

  • src/lib/inference/nim.ts: add isPlausibleNvidiaGpuName() and NVIDIA_GPU_NAME_PATTERN. Primary detectGpu() path filters parsed rows through this check unless detectNvidiaPlatform() already classifies the host as spark/station/jetson; if all rows fail validation, returns null instead of fabricating an NVIDIA GPU.
  • src/lib/inference/nim.test.ts: two regression tests — Snapdragon X-style WSL2 placeholder rejected on generic Linux firmware; same placeholder accepted under DGX Spark firmware.
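The name check and firmware gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in `nim.ts`: the regex contents, the `NvidiaPlatform` type, and the `shouldTrustGpuName` helper are assumptions made for the example, and it shows only the positive name match from the first commit.

```typescript
// Hypothetical platform labels; the PR names spark, station, and jetson.
type NvidiaPlatform = "spark" | "station" | "jetson" | null;

// Assumed pattern: an NVIDIA token or a well-known product family.
// The real NVIDIA_GPU_NAME_PATTERN in nim.ts may list different families.
const NAME_PATTERN = /\bNVIDIA\b|\b(?:GeForce|Quadro|Tesla|RTX|GTX)\b/i;

function isPlausibleNvidiaGpuName(name: string): boolean {
  return NAME_PATTERN.test(name.trim());
}

// Trust the name outright when firmware vouches for the platform;
// otherwise require the name itself to look like an NVIDIA product.
function shouldTrustGpuName(name: string, platform: NvidiaPlatform): boolean {
  if (platform !== null) return true;
  return isPlausibleNvidiaGpuName(name);
}

console.log(shouldTrustGpuName("JMJWOA-Generic-GPU", null)); // false
console.log(shouldTrustGpuName("JMJWOA-Generic-GPU", "spark")); // true
console.log(shouldTrustGpuName("NVIDIA GeForce RTX 4090", null)); // true
```

The firmware check comes first so that a DGX Spark reporting the placeholder string is never filtered by name.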

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Improved GPU detection on Windows-on-ARM by filtering implausible placeholder GPU names when platform firmware does not confirm an NVIDIA device; trusted detection is preserved when firmware identifies known NVIDIA platforms (e.g., DGX Spark).
  • Tests

    • Added regression tests covering rejection of placeholder names, vendor-prefixed placeholders, and acceptance when firmware validates the NVIDIA platform.

…are (#3988)

`detectGpu()`'s primary nvidia-smi path used to accept any name the CLI
returned, so a Snapdragon X WSL2 d3d12 shim that publishes
`nvidia-smi.exe` (and prints a `JMJWOA-Generic-GPU` placeholder for the
Snapdragon iGPU) made preflight report "NVIDIA GPU detected" on hosts
with no NVIDIA hardware. Onboard then proceeded with GPU paths that
later failed in rebuild's CDI check.

Gate the primary path with `isPlausibleNvidiaGpuName()` so a name
without an `NVIDIA` token and without a recognised NVIDIA product
family is only trusted when the firmware platform vouches for it
(`spark` / `station` / `jetson`). Real DGX Spark, which also returns
the same placeholder string (#3510), keeps detection via the firmware
classification path.

Fixes #3988

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the Getting Started, NemoClaw CLI, and fix labels May 22, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2d9ff502-7f97-48f2-ab0b-defbddf65916

📥 Commits

Reviewing files that changed from the base of the PR and between 9f3edfa and 655efbe.

📒 Files selected for processing (2)
  • src/lib/inference/nim.test.ts
  • src/lib/inference/nim.ts

📝 Walkthrough

Walkthrough

Adds a GPU-name plausibility check and firmware-gated filtering to NVIDIA detection, rejecting WSL2/WDDM shim placeholders like "JMJWOA-Generic-GPU" on non-NVIDIA platforms and retaining them only when firmware confirms an NVIDIA platform.

Changes

NVIDIA GPU detection robustness

| Layer / File(s) | Summary |
| --- | --- |
| GPU name plausibility pattern and helper (src/lib/inference/nim.ts) | Defines positive-match NVIDIA name regex, denylist for WSL shim placeholder, and implements isPlausibleNvidiaGpuName() to combine them. |
| NVIDIA detection with firmware confirmation (src/lib/inference/nim.ts) | detectGpu() now calls detectNvidiaPlatform() early, uses firmware confirmation to trust all parsed nvidia-smi rows or otherwise filters them via isPlausibleNvidiaGpuName(), returning null if no trusted GPUs remain and computing totals from the trusted subset. |
| WSL2 placeholder name rejection and acceptance tests (src/lib/inference/nim.test.ts) | Adds tests mocking nvidia-smi for JMJWOA-Generic-GPU and NVIDIA JMJWOA-Generic-GPU, asserting null on non-NVIDIA firmware and acceptance when firmware reports NVIDIA DGX Spark. |

Sequence Diagram

sequenceDiagram
  participant detectGpu
  participant detectNvidiaPlatform
  participant nvidia-smi
  participant isPlausibleNvidiaGpuName
  detectGpu->>detectNvidiaPlatform: detectNvidiaPlatform()
  detectGpu->>nvidia-smi: run --query-gpu=name,memory.total
  nvidia-smi-->>detectGpu: parsed GPU rows
  alt firmware confirms NVIDIA
    detectGpu->>detectGpu: trust all parsed GPUs and compute totals
  else firmware does not confirm
    detectGpu->>isPlausibleNvidiaGpuName: filter parsed rows by name
    isPlausibleNvidiaGpuName-->>detectGpu: trustedRows
    alt trustedRows non-empty
      detectGpu->>detectGpu: compute totals from trustedRows
    else trustedRows empty
      detectGpu-->>detectGpu: return null
    end
  end
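The branch structure in the sequence diagram can be expressed as a small pure function. The sketch below is an illustration only: `GpuRow`, `GpuSummary`, and `summarizeTrustedGpus` are hypothetical names, and the plausibility predicate is passed in rather than taken from `nim.ts`.

```typescript
// Hypothetical shapes for parsed nvidia-smi rows and the detection result.
interface GpuRow {
  name: string;
  memoryMB: number;
}

interface GpuSummary {
  names: string[];
  count: number;
  totalMemoryMB: number;
}

function summarizeTrustedGpus(
  rows: GpuRow[],
  firmwareConfirmed: boolean,
  isPlausible: (name: string) => boolean,
): GpuSummary | null {
  // Firmware confirmation trusts every row; otherwise filter by name.
  const trusted = firmwareConfirmed
    ? rows
    : rows.filter((row) => isPlausible(row.name));

  // No trusted rows means no NVIDIA GPU is reported at all.
  if (trusted.length === 0) return null;

  // Totals are computed from the trusted subset only.
  return {
    names: trusted.map((row) => row.name),
    count: trusted.length,
    totalMemoryMB: trusted.reduce((sum, row) => sum + row.memoryMB, 0),
  };
}

const rows: GpuRow[] = [{ name: "JMJWOA-Generic-GPU", memoryMB: 65471 }];
const looksNvidia = (name: string): boolean => /\bNVIDIA\b/i.test(name);

console.log(summarizeTrustedGpus(rows, false, looksNvidia)); // null
console.log(summarizeTrustedGpus(rows, true, looksNvidia)?.totalMemoryMB); // 65471
```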

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • ericksoa

Poem

A rabbit hopped through WSL2 fields,
where phantom GPUs wear false shields.
I sniffed their names and checked the board,
firmware vouched, or else ignored.
Now real NVIDIA gleams; the phantoms yield. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main fix: rejecting placeholder GPU names from WDDM on non-NVIDIA firmware, which directly addresses the core issue of false-positive GPU detection. |
| Linked Issues check | ✅ Passed | The PR fully implements the objectives from issue #3988: adds firmware-aware GPU name validation, filters implausible names unless firmware confirms NVIDIA platform, returns null when no valid GPUs remain, and includes regression tests covering both rejection and acceptance cases. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the false-positive GPU detection issue: test refinements validate firmware-aware filtering and the implementation adds plausibility gates with platform-aware logic, with no unrelated modifications present. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


@github-actions
Contributor

github-actions Bot commented May 22, 2026

E2E Advisor Recommendation

Required E2E: gpu-e2e, wsl-e2e
Optional E2E: gpu-double-onboard-e2e

Auto-dispatched E2E: gpu-e2e via nightly-e2e.yaml at 655efbe3decdf235506f48cf5ac6e750f29a3570 (nightly run)

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

  • gpu-e2e (high): Runs the real GPU onboarding/user flow on an NVIDIA GPU runner. This is the highest-signal existing E2E to ensure the new plausibility filtering still accepts real NVIDIA GPUs and does not regress GPU preflight/local inference onboarding.
  • wsl-e2e (high): The change specifically guards a WSL2 false-positive GPU detection path. The WSL E2E validates the real Windows→WSL install/onboard environment and helps catch platform-specific preflight regressions even though the exact Snapdragon shim hardware is not present on the standard runner.

Optional E2E

  • gpu-double-onboard-e2e (high): Useful adjacent confidence for repeated GPU/onboard flows after detectGpu() changes, but the PR does not touch Ollama proxy token persistence, so it should not block merge unless GPU preflight churn is suspected.

New E2E recommendations

  • wsl-arm-gpu-detection (high): Existing WSL E2E does not reproduce Windows-on-ARM Snapdragon X hosts where a d3d12/WDDM nvidia-smi.exe reports JMJWOA-Generic-GPU despite no NVIDIA hardware. Add an E2E or scenario with a controlled nvidia-smi shim plus WSL firmware fixture asserting onboard preflight reports no local NIM/NVIDIA GPU.
    • Suggested test: WSL2 Snapdragon/WDDM placeholder GPU preflight regression E2E
  • dgx-spark-gpu-detection (high): The fix intentionally preserves DGX Spark acceptance of JMJWOA-Generic-GPU when firmware confirms NVIDIA DGX Spark. Current coverage is unit-level; add hardware-backed or fixture-backed Spark preflight coverage so future name filtering does not break Spark installs.
    • Suggested test: DGX Spark placeholder GPU preflight/install E2E
  • local-nim-selection (medium): Existing GPU E2E uses Ollama, not Local NVIDIA NIM. Since detectGpu() controls nimCapable and the Local NIM menu/selection path, add an E2E that validates local NIM model selection/startup on a capable GPU or a safe mocked container path.
    • Suggested test: Local NIM capable GPU onboarding E2E

@github-actions
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26280077140
Target ref: 9f3edfaf4df0da23ef3d82ac84f22f80e912c7b2
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/inference/nim.test.ts (1)

321-381: ⚡ Quick win

Add a generic-firmware rejection test for NVIDIA JMJWOA-Generic-GPU.

These new tests are good, but they only reject the unprefixed placeholder. Add one case asserting detectGpu() returns null for vendor-prefixed placeholder on generic firmware to lock the intended guardrail.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/nim.test.ts` around lines 321 - 381, Add a new test case in
src/lib/inference/nim.test.ts that mirrors the existing "rejects WDDM
placeholder names..." test but returns the vendor-prefixed placeholder string
"NVIDIA JMJWOA-Generic-GPU, 65471" from the mocked runCapture; use
loadNimWithMockedRunner to get nimModule and wrap the assertion in
withFirmwareModel("Microsoft Corporation Virtual Machine", ...) and assert
nimModule.detectGpu() === null to ensure vendor-prefixed placeholders are also
rejected on generic firmware.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/lib/inference/nim.test.ts`:
- Around line 321-381: Add a new test case in src/lib/inference/nim.test.ts that
mirrors the existing "rejects WDDM placeholder names..." test but returns the
vendor-prefixed placeholder string "NVIDIA JMJWOA-Generic-GPU, 65471" from the
mocked runCapture; use loadNimWithMockedRunner to get nimModule and wrap the
assertion in withFirmwareModel("Microsoft Corporation Virtual Machine", ...) and
assert nimModule.detectGpu() === null to ensure vendor-prefixed placeholders are
also rejected on generic firmware.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c459a9c1-e934-46eb-9b6e-b6397430341f

📥 Commits

Reviewing files that changed from the base of the PR and between ef84117 and 9f3edfa.

📒 Files selected for processing (2)
  • src/lib/inference/nim.test.ts
  • src/lib/inference/nim.ts

Comment thread src/lib/inference/nim.ts
@github-actions
Contributor

github-actions Bot commented May 22, 2026

PR Review Advisor

Recommendation: blocked
Confidence: high
Analyzed HEAD: 655efbe3decdf235506f48cf5ac6e750f29a3570
Findings: 4 blocker(s), 1 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Review used trusted deterministic PR metadata and the supplied diff only; no scripts, tests, package-manager commands, or E2E jobs were executed by this advisor. Issue #3988 had no comments in trusted context, so acceptance mapping uses the issue body clauses plus PR/E2E/review-thread metadata. Runtime behavior on real WSL2 Snapdragon X and DGX Spark hardware is not proven by the available evidence. The PR review advisor is advisory and does not replace maintainer review.

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: 655efbe3decdf235506f48cf5ac6e750f29a3570
Recommendation: blocked
Confidence: high

The targeted GPU-name validation fix now covers the vendor-prefixed WDDM placeholder, but merge is blocked by GitHub mergeStateStatus=BLOCKED, missing required E2E for the current head SHA, and monolith-growth policy blockers.

Gate status

  • CI: pass — 5 required status context(s) completed with no failures for 655efbe. Non-required contexts still pending: 3; failed: 0.
  • Mergeability: fail — GitHub GraphQL reports mergeStateStatus=BLOCKED for PR fix(preflight): reject WDDM placeholder GPU names on non-NVIDIA firmware #4062 at head 655efbe; reviewDecision=REVIEW_REQUIRED.
  • Review threads: pass — 1 review thread(s), all resolved. CodeRabbit's vendor-prefixed placeholder concern is marked addressed in commit 655efbe and isResolved=true.
  • Risky code tested: warning — Risky areas detected: credentials/inference/network. Unit tests were added for unprefixed placeholder rejection, vendor-prefixed placeholder rejection, and Spark firmware-vouched acceptance, but runtime/GPU/WSL behavior still requires E2E; required gpu-e2e was skipped and wsl-e2e has no passing result for this SHA.

🔴 Blockers

  • PR is blocked by GitHub mergeability state: GitHub reports mergeStateStatus=BLOCKED for the current head SHA even though required status contexts are passing. The PR also has reviewDecision=REVIEW_REQUIRED.
    • Recommendation: Resolve branch protection, review, and required validation blockers so mergeability becomes unblocked for the same head SHA.
    • Evidence: GraphQL pullRequest.mergeStateStatus=BLOCKED, reviewDecision=REVIEW_REQUIRED, headRefOid=655efbe3decdf235506f48cf5ac6e750f29a3570.
  • Required E2E jobs are missing for the current head SHA: The E2E Advisor required gpu-e2e and wsl-e2e for this GPU/WSL detection change. For head 655efbe, the auto-dispatched gpu-e2e selective run reported the job skipped, and no passing wsl-e2e result was provided.
    • Recommendation: Ensure required gpu-e2e and wsl-e2e complete successfully for 655efbe before merge. Consider the advisor's suggested WSL ARM placeholder and DGX Spark scenarios as follow-up coverage if not feasible in existing E2E.
    • Evidence: E2E Advisor comment: Required E2E: gpu-e2e, wsl-e2e. Selective E2E Results run 26280458088 target ref 655efbe: gpu-e2e skipped; 0 passed, 0 failed, 1 skipped.
  • Test monolith grew beyond policy threshold (src/lib/inference/nim.test.ts:1): The already-large nim.test.ts file grew by 88 lines, exceeding the repository monolith-growth threshold of 20 or more lines.
    • Recommendation: Extract targeted fixtures/helpers or otherwise offset the growth before merge, unless maintainers consciously waive the monolith-growth policy for this regression coverage.
    • Evidence: Monolith delta: src/lib/inference/nim.test.ts baseLines=1153, headLines=1241, delta=88, severity=blocker.
  • Runtime inference monolith grew beyond policy threshold (src/lib/inference/nim.ts:1): The already-large runtime inference module grew by 37 lines, exceeding the repository monolith-growth threshold of 20 or more lines.
    • Recommendation: Extract GPU-name plausibility/denylist logic into a focused helper module or otherwise offset the growth before merge, unless maintainers consciously waive the monolith-growth policy.
    • Evidence: Monolith delta: src/lib/inference/nim.ts baseLines=673, headLines=710, delta=37, severity=blocker.

🟡 Warnings

  • Unit coverage is useful but does not prove real WSL/DGX/GPU behavior (src/lib/inference/nim.test.ts:300): The added tests directly cover the important mocked detection cases, including the resolved vendor-prefix bypass. However, detectGpu() drives onboarding/local inference/sandbox GPU decisions, and mocked nvidia-smi plus firmware fixtures cannot fully prove real WSL2 shim, DGX Spark firmware, GPU runner, or downstream CDI behavior.
    • Recommendation: Keep the unit tests, and add or require successful E2E evidence for gpu-e2e and wsl-e2e on this head SHA. If possible, add follow-up scenario coverage for WSL ARM placeholder detection and DGX Spark placeholder acceptance.
    • Evidence: New tests reject JMJWOA-Generic-GPU, reject NVIDIA JMJWOA-Generic-GPU on generic firmware, and accept placeholder under NVIDIA DGX Spark firmware. E2E Advisor still marks gpu-e2e and wsl-e2e required.

🔵 Suggestions

  • None.

Acceptance coverage

  • partial — On Windows ARM reference host (Snapdragon X laptop, ARM64 WSL2 Ubuntu-24.04, no NVIDIA hardware), nemoclaw onboard's preflight reports:: The PR adds detectGpu() unit coverage for WSL-style firmware model "Microsoft Corporation Virtual Machine" with placeholder nvidia-smi output returning null. No real Windows ARM reference host/WSL2 onboard E2E output is available for the current head SHA.
  • partial — ✓ NVIDIA GPU detected (JMJWOA-Generic-GPU, 65471 MB): The new unit test rejects WDDM placeholder names on hosts without NVIDIA firmware (#3988) returns JMJWOA-Generic-GPU, 65471 from mocked nvidia-smi and expects detectGpu() to be null. The exact preflight output line is not asserted in an onboard-level test.
  • partial — and proceeds as if an NVIDIA GPU is present.: Returning null from detectGpu() for the mocked placeholder should prevent GPU-capable downstream selection in callers, but no caller-level onboard/CDI E2E was provided for head 655efbe.
  • met — JMJWOA-Generic-GPU is the Snapdragon X iGPU identifier — the host has no NVIDIA hardware at all.: The implementation adds NVIDIA_GPU_NAME_DENYLIST_PATTERN for JMJWOA-Generic-GPU and filters it unless firmware confirms an NVIDIA platform. Tests cover both unprefixed and NVIDIA-prefixed placeholder rejection on generic firmware.
  • partial — The preflight's GPU detection logic appears to accept any GPU-like device as "NVIDIA GPU detected" without verifying vendor/driver.: The diff adds isPlausibleNvidiaGpuName() and firmware-vouched filtering in detectGpu(), improving vendor/platform validation. End-to-end preflight rendering remains unverified for the current SHA.
  • partial — Impact: Without --no-gpu, onboard's downstream CDI / GPU paths assume NVIDIA-CDI is available and break later.: The direct detectGpu() false positive is addressed in unit tests, but there is no completed onboard/CDI/gateway E2E showing downstream GPU paths no longer activate on affected WSL hardware.
  • unknown — With --no-gpu, the user works around it but the misleading preflight message is still printed.: The patch changes detection and unit tests only. No onboard --no-gpu output assertion or E2E result is present.
  • unknown — Sandbox config + downstream code paths (e.g. rebuild's CDI preflight, openshell gateway start --gpu) then trip on the actual absence.: No rebuild CDI preflight, sandbox GPU configuration, or openshell gateway start --gpu test/E2E evidence is provided for this PR.
  • unknown — Device: Windows ARM reference host (Snapdragon X laptop, no NVIDIA hardware): No hardware-backed Windows ARM reference host validation is reported; tests use mocked runCapture and firmware model fixtures.
  • unknown — OS: Ubuntu 24.04.4 LTS Noble Numbat inside WSL2: The E2E Advisor required wsl-e2e, but no passing wsl-e2e result is present for the head SHA.
  • unknown — Architecture: aarch64 (Snapdragon X): No ARM64 Snapdragon runtime validation is reported.
  • unknown — Node.js: v22.22.2: CI status passed, but the trusted context does not map tests to this exact runtime environment.
  • unknown — npm: 10.9.7: No environment-specific validation for npm 10.9.7 is reported.
  • unknown — Docker: 29.1.3, build 29.1.3-0ubuntu3~24.04.2: No Docker runtime path or WSL onboard E2E result is available for this environment.
  • unknown — OpenShell CLI: 0.0.39: No OpenShell CLI 0.0.39 onboard/gateway E2E is available for the current head SHA.
  • partial — NemoClaw: v0.1.0 (main HEAD cfa817b): The PR targets the reported main regression by changing current source files still present at src/lib/inference/nim.ts and nim.test.ts. It does not validate the original v0.1.0 build.
  • unknown — OpenClaw: 2026.4.24 (cbcfdf6, bundled): No OpenClaw runtime interaction is tested by the diff or E2E results.
  • unknown — 1. On a Snapdragon X laptop with no NVIDIA GPU, install NemoClaw v0.1.0 and OpenShell 0.0.39 in WSL2 Ubuntu-24.04.: No real Snapdragon X/WSL2 install E2E was completed; the E2E Advisor specifically called for wsl-e2e and suggested a WSL ARM GPU detection scenario.
  • unknown — 2. Run onboard with non-interactive flags + --no-gpu:: No onboard command test or E2E assertion covers the non-interactive --no-gpu invocation.
  • unknown — export NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1: The diff does not exercise this environment variable.

Security review

  • pass — Category 1: Secrets and Credentials: No hardcoded secrets, API keys, passwords, tokens, credential files, or credential logging are added in src/lib/inference/nim.ts or src/lib/inference/nim.test.ts.
  • pass — Category 2: Input Validation and Data Sanitization: The PR improves validation of user/environment-controlled nvidia-smi output by parsing rows, filtering through known NVIDIA product/vendor patterns, and denylisting the ambiguous JMJWOA-Generic-GPU placeholder unless firmware confirms spark/station/jetson. The previous vendor-prefix bypass is covered by a new negative unit test.
  • pass — Category 3: Authentication and Authorization: No authentication, authorization, endpoint access control, token validation, or user-resource permission logic is changed.
  • pass — Category 4: Dependencies and Third-Party Libraries: No new dependencies, package manifests, registries, installer sources, or version pins are changed.
  • pass — Category 5: Error Handling and Logging: The change does not add sensitive logging or error responses. Existing ignored-probe behavior remains unchanged.
  • pass — Category 6: Cryptography and Data Protection: Not applicable — no cryptographic operations or data-protection mechanisms are changed.
  • warning — Category 7: Configuration and Security Headers: No HTTP security headers are involved, but GPU detection feeds onboarding/sandbox GPU configuration. The code change improves safe defaults for non-NVIDIA placeholder hosts, yet required runtime E2E is missing, so the secure configuration effect is not fully proven in WSL/GPU environments.
  • warning — Category 8: Security Testing: Security-relevant regression tests were added for placeholder rejection, vendor-prefixed placeholder rejection, and firmware-vouched Spark acceptance. However, required gpu-e2e was skipped and wsl-e2e has no passing result for the current head SHA, leaving hardware/runtime coverage incomplete.
  • warning — Category 9: Holistic Security Posture: The intent improves posture by preventing false NVIDIA GPU enablement on WSL Snapdragon hosts, reducing risk of inappropriate sandbox/GPU paths. Residual risk remains due to missing required E2E, unverified downstream onboard/rebuild/CDI behavior, and monolith-growth policy blockers.

Test / E2E status

  • Test depth: e2e_required — Runtime/sandbox/infrastructure paths need real execution coverage: src/lib/inference/nim.ts changes detectGpu(), which influences onboarding, local inference capability, sandbox GPU decisions, and downstream CDI behavior. Unit tests cover the core classifier, including the vendor-prefixed placeholder, but cannot prove real WSL, DGX Spark, nvidia-smi, GPU runner, or caller behavior.
  • E2E Advisor: missing
  • Required E2E jobs: gpu-e2e, wsl-e2e
  • Missing for analyzed SHA: gpu-e2e, wsl-e2e

✅ What looks good

  • The PR is narrowly scoped to the reported GPU detection bug in active files that still exist on the branch; trusted openPrOverlaps is empty.
  • The CodeRabbit-reported vendor-prefix bypass was addressed in commit 655efbe with an explicit JMJWOA-Generic-GPU denylist and a regression test for NVIDIA JMJWOA-Generic-GPU on generic firmware.
  • The primary parser still uses argv arrays for nvidia-smi and preserves the last-comma split behavior for GPU names containing commas.
  • The Spark/DGX compatibility path is preserved by allowing firmware-confirmed NVIDIA platforms to vouch for placeholder names.
  • No dependency, workflow, Dockerfile, installer, credential, network-policy, or blueprint surface was changed.
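The last-comma split mentioned above matters because a GPU name can itself contain commas, while the memory figure is always the final field. A minimal sketch of that parsing step follows; `parseGpuRow` is a hypothetical helper, and the exact row format handled by `nim.ts` is assumed from the mocked output quoted in this thread (`name, memory`).

```typescript
// Hypothetical parser for one "name, memory.total" row of nvidia-smi output.
function parseGpuRow(line: string): { name: string; memoryMB: number } | null {
  // Split on the LAST comma so commas inside the name are preserved.
  const idx = line.lastIndexOf(",");
  if (idx === -1) return null;

  const name = line.slice(0, idx).trim();
  const memoryMB = Number.parseInt(line.slice(idx + 1).trim(), 10);

  // Reject rows with an empty name or a non-numeric memory field.
  if (name === "" || Number.isNaN(memoryMB)) return null;
  return { name, memoryMB };
}

console.log(parseGpuRow("NVIDIA JMJWOA-Generic-GPU, 65471"));
console.log(parseGpuRow("Example GPU, Special Edition, 24576"));
```

Splitting on the first comma instead would truncate the second name to `Example GPU` and fail to parse the memory field.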

Review completeness

  • Review used trusted deterministic PR metadata and the supplied diff only; no scripts, tests, package-manager commands, or E2E jobs were executed by this advisor.
  • Issue [WSL2][Onboard] preflight false-positive: Snapdragon iGPU reported as "NVIDIA GPU detected" on Windows ARM #3988 had no comments in trusted context, so acceptance mapping uses the issue body clauses plus PR/E2E/review-thread metadata.
  • Runtime behavior on real WSL2 Snapdragon X and DGX Spark hardware is not proven by the available evidence.
  • The PR review advisor is advisory and does not replace maintainer review.
  • Human maintainer review required: yes

PR #4062 review fix.

The original name guard accepted any name matching `\bNVIDIA\b`, which
made `NVIDIA JMJWOA-Generic-GPU` plausible on generic Linux firmware.
Some WDDM/d3d12 shims (the same family that originally surfaced as
bare `JMJWOA-Generic-GPU` on Snapdragon X WSL2) may add the vendor
prefix, which would re-open #3988.

Add `NVIDIA_GPU_NAME_DENYLIST_PATTERN` so the placeholder is treated as
suspect regardless of any `NVIDIA ` prefix. Real DGX Spark (#3510)
still passes via the firmware-vouch path in `detectGpu()`.

Adds a regression test mirroring the CodeRabbit review comment on
PR #4062.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
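To illustrate the bypass this commit closes, the sketch below compares a positive-only check with one that consults a denylist first. The regex bodies and the function names `positiveOnly` and `isPlausibleWithDenylist` are assumptions for the example; only the `\bNVIDIA\b` token and the `JMJWOA-Generic-GPU` placeholder come from the commit message.

```typescript
// Assumed stand-ins for the positive pattern and NVIDIA_GPU_NAME_DENYLIST_PATTERN.
const POSITIVE_PATTERN = /\bNVIDIA\b/i;
const DENYLIST_PATTERN = /JMJWOA-Generic-GPU/i;

// Original behaviour: any name with an NVIDIA token is plausible.
function positiveOnly(name: string): boolean {
  return POSITIVE_PATTERN.test(name);
}

// Fixed behaviour: the denylist wins regardless of any vendor prefix.
function isPlausibleWithDenylist(name: string): boolean {
  const trimmed = name.trim();
  if (DENYLIST_PATTERN.test(trimmed)) return false;
  return POSITIVE_PATTERN.test(trimmed);
}

const prefixed = "NVIDIA JMJWOA-Generic-GPU";
console.log(positiveOnly(prefixed)); // true
console.log(isPlausibleWithDenylist(prefixed)); // false
console.log(isPlausibleWithDenylist("NVIDIA A100")); // true
```

On firmware-confirmed platforms such as DGX Spark this check is not consulted, so the placeholder still passes there.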
@github-actions
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26280458088
Target ref: 655efbe3decdf235506f48cf5ac6e750f29a3570
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@laitingsheng laitingsheng added the v0.0.50 Release target label May 22, 2026
@github-actions
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26295296470
Target ref: 655efbe3decdf235506f48cf5ac6e750f29a3570
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@github-actions
Contributor

Brev E2E (gpu): PASSED on branch fix/preflight-nvidia-gpu-validate-3988. See logs

Contributor

@ericksoa ericksoa left a comment

Approved. The WDDM placeholder guard is narrowly scoped to non-NVIDIA firmware and includes the vendor-prefixed placeholder regression coverage CodeRabbit asked for. Live checks are green at 655efbe, including WSL and dedicated Brev GPU branch validation.

@ericksoa ericksoa merged commit 6bebb22 into main May 22, 2026
30 checks passed
cv pushed a commit that referenced this pull request May 22, 2026
## Summary
Refreshes the NemoClaw docs for the v0.0.49 hardening release, including
release notes, command reference updates, troubleshooting guidance,
version metadata, and regenerated user skills.

## Changes
- #3796, #3854, #3863, #3866, #3984, #4001, #4011, #4013, #4020, #4022,
#4023, #4060, #4062 -> `docs/about/release-notes.mdx`: Adds the v0.0.49
hardening release summary covering gateway reliability,
status/doctor/shields and debug UX, OpenClaw compatibility, messaging
channel teardown, Hermes policy scoping, snapshots, source installs and
Docker group security note, GPU preflight, CLI usage, E2E, and CI
improvements.
- #3796 -> `docs/manage-sandboxes/backup-restore.mdx` and
`docs/reference/commands.mdx`: Documents `snapshot restore --to`
overwrite protection and the `--force` opt-in.
- #3863, #4013, #4020, #4023 -> `docs/reference/commands.mdx`: Documents
missing channel argument usage, sandbox-scoped custom preset matching,
session policy preset sync, and gateway failure classification (uses the
real probe states from `src/lib/status-command-deps.ts`).
- #4022, #4060, #4062 -> `docs/reference/troubleshooting.mdx`: Adds
guidance for gateway-down `connect`, source checkout OpenShell
bootstrapping, WDDM placeholder GPU names, and Jetson sandbox GPU
passthrough.
- Release prep -> `docs/project.json`, `docs/versions1.json`,
`.agents/skills/nemoclaw-user-*`: Bumps docs metadata to 0.0.49 and
refreshes generated user skills from the Fern docs.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

`make docs` was attempted locally but did not complete because `npm`
returned `403 Forbidden` while fetching `fern-api` from
`registry.npmjs.org` in the sandboxed environment.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

* **Documentation**
* Released v0.0.49 with reliability and compatibility improvements
including faster gateway failure diagnostics and safer snapshot restore
behavior
* Enhanced snapshot restore documentation with `--to` cloning and
`--force` overwrite requirements
* Expanded troubleshooting guides for source installs, GPU setup, and
gateway recovery
* Clarified Docker group access requirements and improved CLI command
reference

* **Chores**
  * Version bumped to 0.0.49

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

fix, Getting Started, NemoClaw CLI, v0.0.50 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[WSL2][Onboard] preflight false-positive: Snapdragon iGPU reported as "NVIDIA GPU detected" on Windows ARM

2 participants