fix(inference): pull NIM by platform digest, use served model id by hunglp6d · Pull Request #4641 · NVIDIA/NemoClaw

hunglp6d · 2026-06-02T03:04:21Z

Summary

Fixes local NVIDIA NIM onboarding (NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard →
Local NVIDIA NIM) on Docker 29.x with the containerd image store. Three defects on
that path, all reproduced and verified end-to-end on DGX Spark (GB10, Docker 29.5.2):

- Pull fails with `error from registry: Incorrect Repository Format`. NIM `:latest`
  tags are multi-arch OCI indexes that also carry buildkit attestation manifests
  (`platform: unknown/unknown`). The containerd image store pulls the per-arch layers,
  then fetches the attestation manifest, which nvcr.io rejects, aborting after every
  layer with no usable image. (Older Docker without the containerd store never fetched
  it, so the break tracks the Docker/image-store version, not the host or repo path.)
- Health check times out. A 30B NIM loads in ~5 min on GB10, right at the old 300s
  wait; larger models always time out.
- Endpoint validation 404s. NIM serves the id from its image config
  (`nvidia/nemotron-3-nano`), which differs from the catalog name
  (`nvidia/nemotron-3-nano-30b-a3b`); validating/routing with the catalog id returns 404.

Related Issue

Fixes #3885

Changes

src/lib/inference/nim.ts
- pullNimImage resolves the index to the host-arch image-manifest digest
  (docker manifest inspect → match platform.architecture/os, skipping
  unknown/unknown attestation entries), pulls that single manifest by digest
  (no index walk → no attestation fetch), and re-tags it to the original ref.
  Falls back to a plain tag pull when the ref is not a resolvable multi-arch index,
  and logs which path was taken.
- adoptServedModelId reads the served id from /v1/models and uses it for
  validation/route/config when it differs from the catalog name. The served id is
  local-service-controlled, so it is validated with isSafeModelId before adoption
  (mirrors the adjacent local-vLLM detected-model boundary); an unsafe id is ignored
  with a diagnostic and never echoed into logs.
- waitForNimHealth default raised 300s → 1200s for slow first-loads (no new env var).
src/lib/onboard.ts: calls the helpers above; kept net-neutral per
codebase-growth-guardrails.
src/lib/adapters/docker/inspect.ts, image.ts: add dockerManifestInspect
and dockerTag.
Tests: digest resolution, served-id adoption (incl. unsafe-id rejection), thrown
dockerManifestInspect fallback, and manifest-inspect/tag argv.
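
The digest-resolution step described for pullNimImage above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `toOciArch` and `pickHostDigest` are hypothetical names (the PR's own helpers are listed as nodeArchToOci and selectPlatformManifestDigest, whose signatures are not shown in this thread), and the index shape is assumed to match the manifest-list JSON printed by `docker manifest inspect`.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface IndexEntry {
  digest?: string;
  platform?: { architecture?: string; os?: string };
}
interface OciIndex {
  manifests?: IndexEntry[];
}

// Map a Node.js process.arch value to the OCI architecture name.
function toOciArch(nodeArch: string): string | undefined {
  switch (nodeArch) {
    case "x64":
      return "amd64";
    case "arm64":
      return "arm64";
    default:
      return undefined; // unsupported arch: caller should fall back to a tag pull
  }
}

// Return the digest of the linux/<host-arch> image manifest, or undefined.
// Attestation entries (platform unknown/unknown) never match and are skipped.
function pickHostDigest(indexJson: string, nodeArch: string): string | undefined {
  const arch = toOciArch(nodeArch);
  if (arch === undefined) return undefined;

  let parsed: unknown;
  try {
    parsed = JSON.parse(indexJson);
  } catch {
    return undefined; // not JSON, so not a resolvable index
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;

  const manifests = (parsed as OciIndex).manifests;
  if (!Array.isArray(manifests)) return undefined; // single-manifest image

  for (const entry of manifests) {
    const platform = entry?.platform;
    if (
      platform?.os === "linux" &&
      platform.architecture === arch &&
      typeof entry.digest === "string" &&
      entry.digest.length > 0
    ) {
      return entry.digest;
    }
  }
  return undefined;
}
```

Returning `undefined` for every non-matching case keeps the caller's decision simple: a digest means "pull by digest and re-tag", anything else means "fall back to the plain tag pull".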

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Hung Le <hple@nvidia.com>

Summary by CodeRabbit

New Features
- Docker image tagging and manifest inspection for improved image handling.
- Automatic discovery and adoption of the NIM served-model ID from the local model endpoint.
Improvements
- Prefer host-architecture manifests when pulling multi-arch images and retag to original references.
- Onboarding now adopts served model IDs earlier and tightens inference API validation.
- Increased default health-check timeout for model startup.
Tests
- Added unit tests for manifest selection, image pulling, and served-model parsing.

copy-pr-bot · 2026-06-02T03:04:25Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

coderabbitai · 2026-06-02T03:04:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds Docker manifest inspect/tag helpers; platform-aware OCI manifest digest selection with digest-based pull-and-retag; served-model-id parsing and optional adoption; integrates adoption into onboarding; increases DEFAULT_NIM_HEALTH_TIMEOUT_SECONDS to 1200.
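
The effect of raising the health-check default from 300 to 1200 seconds can be illustrated with a deadline-bounded polling loop. This is a sketch only: `waitForHealthy`, `WaitDeps`, and `fakeDeps` are hypothetical names, the real waitForNimHealth may be asynchronous and structured differently, and the 310-second load time is an invented figure standing in for "slightly over five minutes".

```typescript
// Injected clock, sleep, and probe keep the loop deterministic and testable.
interface WaitDeps {
  now(): number; // current time in seconds
  sleep(seconds: number): void; // blocks, or advances a fake clock
  probe(): boolean; // true once the service reports healthy
}

// Poll until healthy or until the deadline passes.
// intervalSeconds must be positive, otherwise the loop cannot make progress.
function waitForHealthy(
  timeoutSeconds: number,
  intervalSeconds: number,
  deps: WaitDeps,
): boolean {
  const deadline = deps.now() + timeoutSeconds;
  for (;;) {
    if (deps.probe()) return true;
    const remaining = deadline - deps.now();
    if (remaining <= 0) return false;
    deps.sleep(Math.min(intervalSeconds, remaining));
  }
}

// Fake dependencies: the service becomes healthy at readyAtSeconds.
function fakeDeps(readyAtSeconds: number): WaitDeps {
  let t = 0;
  return {
    now: () => t,
    sleep: (seconds) => {
      t += seconds;
    },
    probe: () => t >= readyAtSeconds,
  };
}

// A model that needs 310s misses a 300s deadline but fits a 1200s one.
const oldDefaultResult = waitForHealthy(300, 5, fakeDeps(310));
const newDefaultResult = waitForHealthy(1200, 5, fakeDeps(310));
```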

Changes

NIM Platform-Aware Image Pulling and Served Model Discovery

Layer / File(s)	Summary
Docker manifest inspection and tagging helpers `src/lib/adapters/docker/image.ts`, `src/lib/adapters/docker/inspect.ts`, `src/lib/adapters/docker/index.test.ts`	Adds `dockerTag` and `dockerManifestInspect` wrappers around `dockerRun`/`dockerCapture` and tests asserting correct CLI argv and options.
Platform-aware pull utilities and resolver `src/lib/inference/nim.ts`, `src/lib/inference/nim.test.ts`	Adds `nodeArchToOci`, `selectPlatformManifestDigest`, `imageRepository`, and a pull resolver that inspects OCI index JSON, selects the host-arch Linux manifest digest, pulls by digest, then retags to the original ref; tests include mock OCI indexes and pull/retag workflows with fallbacks.
NIM served model ID resolution `src/lib/inference/nim.ts`, `src/lib/inference/nim.test.ts`	Exports `parseServedModelId`, `getServedModelId`, and `adoptServedModelId` to parse `/v1/models` responses and optionally override catalog model ids; tests cover parsing, endpoint responses, adoption rules, and unsafe-id handling.
Onboarding integration and API guard `src/lib/inference/nim.ts`, `src/lib/onboard.ts`	Adopts the served model id during local NIM onboarding and refines the post-validation API forcing guard; updates related health-check comment and timeout constant to 1200 seconds.
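
The served-model-id flow summarized in the table can be sketched as below. It assumes the OpenAI-compatible `/v1/models` list shape (`{ "data": [{ "id": ... }] }`). `firstServedId`, `chooseModelId`, and `looksSafe` are hypothetical names; in particular, `looksSafe` is only a stand-in for the repo's isSafeModelId, whose actual rule is not shown in this thread.

```typescript
// Stand-in safety rule (assumption): a short id made of conservative characters.
const SAFE_ID = /^[A-Za-z0-9][A-Za-z0-9._\/:-]{0,127}$/;

function looksSafe(id: string): boolean {
  return SAFE_ID.test(id);
}

// Extract the first non-empty string id from a /v1/models response body.
function firstServedId(body: string): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return undefined; // endpoint returned something other than JSON
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const data = (parsed as { data?: unknown }).data;
  if (!Array.isArray(data)) return undefined;
  for (const item of data) {
    if (typeof item === "object" && item !== null) {
      const id = (item as { id?: unknown }).id;
      if (typeof id === "string" && id.length > 0) return id;
    }
  }
  return undefined;
}

interface ModelChoice {
  modelId: string;
  adopted: boolean;
  note?: string; // diagnostic text; never contains a rejected id
}

// Prefer the served id when it differs from the catalog id and passes the check.
function chooseModelId(catalogId: string, body: string): ModelChoice {
  const served = firstServedId(body);
  if (served === undefined || served === catalogId) {
    return { modelId: catalogId, adopted: false };
  }
  if (!looksSafe(served)) {
    // The rejected value is deliberately left out of the diagnostic.
    return {
      modelId: catalogId,
      adopted: false,
      note: "served model id failed the safety check; keeping catalog id",
    };
  }
  return { modelId: served, adopted: true };
}
```

Keeping the catalog id whenever the endpoint is unreachable, unparsable, or returns an unsafe value means the worst case matches the pre-PR behavior rather than introducing a new failure mode.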

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested labels

bug-fix, platform: dgx-spark, Docker

Suggested reviewers

cv

Poem

🐰 I sniffed the manifests at dawn,

Found arch-bound digests, neatly drawn.
I pulled by hash, then gave a tag —
The NIM awoke; no more the snag.
Hooray for carrots, code, and wag!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 45.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the two main changes: pulling NIM by platform digest and using served model IDs.
Linked Issues check	✅ Passed	The PR fully addresses all coding requirements from issue `#3885`: digest-based pulling for multi-arch indexes, fallback to plain tag pull with logging, served model ID adoption with safety validation, and increased health-check timeout.
Out of Scope Changes check	✅ Passed	All changes directly support the linked issue objectives: docker helper additions, NIM pull/model-id logic, onboarding integration, and comprehensive test coverage for digest resolution and served-id adoption.

github-actions · 2026-06-02T03:06:34Z

E2E Advisor Recommendation

Required E2E: onboard-inference-smoke-e2e, inference-routing-e2e, cloud-onboard-e2e
Optional E2E: gpu-e2e, cloud-inference-e2e

Dispatch hint: cloud-onboard-e2e,inference-routing-e2e

Auto-dispatched E2E: inference-routing-e2e, cloud-onboard-e2e via nightly-e2e.yaml at 3b503571a321952bee0ba5d80fc0a8f405e43cf3 — nightly run

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

onboard-inference-smoke-e2e (low): Lightweight regression E2E for onboard inference validation. It is the closest existing focused test for the onboard validation behavior changed in the local NIM path, including failing closed when a configured route cannot serve a real chat completion.
inference-routing-e2e (medium): Covers real OpenShell gateway inference routing, credential isolation, and provider error classification. The NIM changes alter local OpenAI-compatible model selection and chat-completions routing, so this should be merge-blocking adjacent confidence.
cloud-onboard-e2e (medium): Exercises the install/onboard flow and sandbox health/security checks through a real user path. Although it does not select local NIM, src/lib/onboard.ts changed in the inference-provider setup area and this guards against broad onboarding regressions.

Optional E2E

gpu-e2e (high): Optional high-cost local GPU confidence. It does not cover NIM/NGC directly, but it exercises local provider onboarding, Docker/GPU availability, sandbox inference wiring, and real assistant user flow on a GPU runner.
cloud-inference-e2e (medium): Optional end-to-end live inference smoke through inference.local and OpenClaw. Useful to confirm the broader inference path still works, but it does not directly exercise local NIM image pull or served model-id adoption.

New E2E recommendations

local NIM provider onboarding (high): No existing E2E appears to run NEMOCLAW_PROVIDER=nim-local with a real NGC-backed NIM container. This PR's main behavior—manifest-index digest resolution, digest pull, tag-back, container start, /v1/models served-id adoption, and chat-completions validation—is therefore not directly covered.
- Suggested test: local-nim-onboard-e2e
NIM OCI manifest attestation regression (high): Add a focused E2E or hermetic integration test with an NGC-like multi-arch OCI index containing linux manifests plus unknown/unknown attestation manifests, asserting the CLI pulls the host-arch digest and never bare-pulls the tag before tagging it back.
- Suggested test: nim-manifest-digest-pull-e2e
NIM served model-id safety (medium): Add E2E coverage where a local OpenAI-compatible NIM mock returns a served model id different from the catalog name and then an unsafe id, verifying validation uses the safe served id and refuses unsafe/log-injection-shaped values.
- Suggested test: nim-served-model-id-validation-e2e

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: cloud-onboard-e2e,inference-routing-e2e

github-actions · 2026-06-02T03:06:35Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. The production changes are confined to Docker helper additions and the Local NVIDIA NIM onboarding/runtime path. The dispatchable scenario catalog in e2e-scenarios.yaml/scenarios.yaml has cloud NVIDIA and local Ollama coverage, but no Local NIM/NVCR image-pull scenario that would exercise docker manifest resolution, docker tag, served-model adoption, or the NIM health-timeout change. Unit test changes are outside test/e2e-scenario/. No scenario E2E job would directly validate the changed surface.

Optional scenario E2E

None.

Relevant changed files

src/lib/adapters/docker/image.ts
src/lib/adapters/docker/inspect.ts
src/lib/inference/nim.ts
src/lib/onboard.ts

github-actions · 2026-06-02T03:10:14Z

PR Review Advisor

Findings: 1 needs attention, 5 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 1 still applies, 3 new items found

Review findings

🛠️ Needs attention

Offset NIM monolith growth before merge (src/lib/inference/nim.ts:1): This PR adds 132 lines to `src/lib/inference/nim.ts` and 265 lines to `src/lib/inference/nim.test.ts`, both already large hotspots. The deterministic growth guard marks both as blocker-level monolith growth.
- Recommendation: Extract the new manifest-resolution and served-model-id helpers/tests into smaller focused modules or otherwise offset the hotspot growth before merge.
- Evidence: `src/lib/inference/nim.ts` grows from 869 to 1001 lines (+132); `src/lib/inference/nim.test.ts` grows from 1814 to 2079 lines (+265).

🔎 Worth checking

Source-of-truth review needed: NIM platform digest pull workaround: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `pullImageResolvingPlatform()` explains the current workaround and fallback, but not the removal condition.
Source-of-truth review needed: Served model id adoption: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `adoptServedModelId()` documents the mismatch but does not state a removal condition.
Validate registry digest strings before constructing Docker refs (src/lib/inference/nim.ts:747): `selectPlatformManifestDigest()` accepts any non-empty `entry.digest` from `docker manifest inspect` JSON and `pullImageResolvingPlatform()` uses it to build `repo@digest`. Docker is invoked via argv arrays, so this is not shell injection, but stricter digest validation would reduce malformed-ref and registry-response hardening risk.
- Recommendation: Accept only canonical OCI digest strings, for example `sha256:<64 hex>`, before returning a digest. Add a negative test where a matching platform entry contains a malformed digest and verify it is rejected or falls back safely.
- Evidence: `typeof entry.digest === "string" && entry.digest.length > 0` is the only digest check before `${imageRepository(image)}@${digest}` is passed to `dockerPull()` and `dockerTag()`.
Add targeted runtime validation for the Docker/NIM pull path (src/lib/inference/nim.ts:776): The unit tests cover the intended argv behavior, but the linked issue is a Docker 29/containerd plus nvcr.io registry interaction and the acceptance clauses include successful pull, container start, and sandbox readiness. Those runtime boundaries are not proven by unit mocks.
- Recommendation: Add or identify targeted runtime/integration validation that exercises Docker 29/containerd against a NIM-style index: resolve Linux platform digest, avoid pulling the tag index, tag back to the friendly ref, start the local NIM container, and verify onboarding uses the served model id.
- Evidence: Tests mock `docker manifest inspect`, `docker pull`, `docker tag`, and `/v1/models`; deterministic test-depth context flags runtime/sandbox/infrastructure paths in `image.ts`, `inspect.ts`, `nim.ts`, and `onboard.ts` for behavioral runtime validation.
Document removal conditions for localized NIM compatibility workarounds (src/lib/inference/nim.ts:776): The digest-pull workaround and served-model-id adoption clearly identify the invalid external states and have regression tests, but the code does not state when these compatibility paths can be removed. Without that source-of-truth marker, workaround behavior can become permanent even after NGC/Docker/catalog behavior changes.
- Recommendation: Add concise comments or tracking references describing the removal conditions, such as Docker/NGC no longer requiring digest pulls for NIM attestation indexes and catalog ids being guaranteed to match served `/v1/models` ids.
- Evidence: `pullImageResolvingPlatform()` and `adoptServedModelId()` are localized fallback/tolerant behaviors with tests, but their comments explain the current workaround and not the condition under which it should be removed.
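
The digest-validation recommendation above (accept only `sha256:<64 hex>`) could look something like this. It is a sketch, not the change that landed: `isCanonicalSha256Digest` and `buildDigestRef` are hypothetical names, and `registry.example/team/model` in the usage below is a placeholder repository.

```typescript
// Canonical sha256 digest: the algorithm prefix plus exactly 64 lowercase hex chars.
const SHA256_DIGEST = /^sha256:[a-f0-9]{64}$/;

function isCanonicalSha256Digest(value: unknown): value is string {
  return typeof value === "string" && SHA256_DIGEST.test(value);
}

// Build repo@digest only when the digest is well-formed; otherwise return
// undefined so the caller can fall back instead of constructing a bad ref.
function buildDigestRef(repository: string, digest: unknown): string | undefined {
  return isCanonicalSha256Digest(digest) ? `${repository}@${digest}` : undefined;
}

const validRef = buildDigestRef("registry.example/team/model", "sha256:" + "0".repeat(64));
const malformedRef = buildDigestRef("registry.example/team/model", "sha256:abc --platform x");
```

Anchoring the pattern with `^` and `$` is what rejects trailing content such as extra whitespace or flag-shaped suffixes.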

🌱 Nice ideas

None.

Consider writing more tests for

The new unit tests are targeted and useful, but the highest-risk behavior crosses Docker CLI, Docker 29/containerd image store, nvcr.io registry manifests, local NIM startup, and onboarding provider configuration. That gap motivates each runtime-validation item below.
**Runtime validation**: Docker 29/containerd NGC NIM index pull resolves a Linux platform digest, avoids pulling the tag index, tags back, and the tagged image can be used by `docker run`.
**Runtime validation**: Local NIM onboarding passes the adopted `/v1/models` served id into validation and final provider config instead of the catalog id.
**Runtime validation**: When manifest inspect returns a matching entry with a malformed digest, the pull path rejects it or falls back without constructing `repo@<bad>`.
**Runtime validation**: When no platform digest matches the host arch, the diagnostic and fallback path are visible and the original failure is still surfaced clearly.
**Runtime validation**: A running local NIM whose `/v1/models` endpoint is temporarily unreachable keeps the catalog model and still follows the expected validation recovery path.
**Add targeted runtime validation for the Docker/NIM pull path**: Add or identify targeted runtime/integration validation that exercises Docker 29/containerd against a NIM-style index: resolve Linux platform digest, avoid pulling the tag index, tag back to the friendly ref, start the local NIM container, and verify onboarding uses the served model id.
**Acceptance clause:** Image pull completes successfully. Add test evidence or identify existing coverage. `pullNimImage()` now resolves a platform digest with `docker manifest inspect`, pulls `repo@digest`, and tags back to the original ref. Unit tests prove the happy path avoids `docker pull ...:latest`, but no runtime Docker 29/nvcr.io validation evidence is present.
**Acceptance clause:** NIM container starts. Add test evidence or identify existing coverage. The digest pull is tagged back to the original image ref used by `startNimContainerByName()`, and the default health timeout is now 1200s. Tests cover timeout and tag argv, but not a real container start from the re-tagged digest image.

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/inference/nim.test.ts`:
- Around line 207-245: The test "resolves the host-arch manifest digest..." uses
the real process.arch which makes expectedDigest deterministic only on certain
machines; update the test to pin or stub process.arch (or parameterize per
supported arch) before calling nimModule.nodeArchToOci and pullNimImage so
DIGEST_BY_ARCH[ociArch] is deterministic—e.g., set process.arch temporarily to
"x64" or "arm64" around the calls to nodeArchToOci and pullNimImage (and restore
it in the finally), or loop the test for each supported arch; target symbols:
pullNimImage, nodeArchToOci, DIGEST_BY_ARCH, process.arch,
loadNimWithMockedRunner, and ensure restore() still runs.
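
One way to pin a property for the duration of a test, as the comment above suggests for process.arch, is a small helper built on `Object.defineProperty`. This is a sketch under assumptions: `withPinnedProperty` is a hypothetical helper, not something in the repo, and it only covers synchronous callbacks. `process.arch` is typically a read-only property in Node.js, so plain assignment may not take effect, which is why the helper redefines the property instead.

```typescript
// Temporarily redefine target[key] to value while fn runs, then restore the
// original property descriptor (or delete the key if it did not exist).
// Synchronous callbacks only: an async fn would outlive the finally block.
function withPinnedProperty<T extends object, R>(
  target: T,
  key: string,
  value: unknown,
  fn: () => R,
): R {
  const original = Object.getOwnPropertyDescriptor(target, key);
  Object.defineProperty(target, key, {
    value,
    configurable: true, // must stay configurable so it can be restored
    enumerable: true,
    writable: false,
  });
  try {
    return fn();
  } finally {
    if (original) {
      Object.defineProperty(target, key, original);
    } else {
      Reflect.deleteProperty(target, key);
    }
  }
}

// Stand-in object for the demo; a real test would pass `process` instead, e.g.
//   withPinnedProperty(process, "arch", "arm64", () => { /* run assertions */ });
const fakeProcess = { arch: "x64" };
const archSeenInside = withPinnedProperty(fakeProcess, "arch", "arm64", () => fakeProcess.arch);
```

Running the same test body once per supported architecture (for example in a loop over `"x64"` and `"arm64"`) would make the expected digest deterministic on any CI host.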

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8d830609-3f2c-49d6-a054-2bc43ce52ba0

📥 Commits

Reviewing files that changed from the base of the PR and between f17a19a and 3d71312.

📒 Files selected for processing (6)

src/lib/adapters/docker/image.ts
src/lib/adapters/docker/index.test.ts
src/lib/adapters/docker/inspect.ts
src/lib/inference/nim.test.ts
src/lib/inference/nim.ts
src/lib/onboard.ts

github-actions · 2026-06-03T00:15:52Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26855567186
Target ref: 3d713125307111b85f78d3afc2235c3cd34e03c1
Workflow ref: main
Requested jobs: inference-routing-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
inference-routing-e2e	✅ success

hunglp6d · 2026-06-03T00:40:00Z

Addressed advisor findings:

- Validate served model id: adoptServedModelId now checks the /v1/models id with isSafeModelId before adopting it (mirrors the local-vLLM boundary); an unsafe id is ignored with a diagnostic and never echoed to logs. Test added.
- Observable digest-pull fallback: pullImageResolvingPlatform now logs when it falls back to a plain tag pull, plus a regression test for a thrown dockerManifestInspect.
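
The observable fallback described above can be sketched with injected Docker operations, which also makes the thrown-inspect case easy to test. Everything here is illustrative: `DockerOps`, `pullPreferDigest`, and the log wording are hypothetical, and the real pullImageResolvingPlatform may be shaped differently.

```typescript
// Hypothetical seam over the Docker CLI wrappers so tests can substitute fakes.
interface DockerOps {
  manifestInspect(ref: string): string; // returns index JSON; may throw
  pull(ref: string): void;
  tag(source: string, target: string): void;
}

type PullPath = "digest" | "tag-fallback";

// Try to resolve a repo@digest ref; on any failure, log and pull the tag as-is.
function pullPreferDigest(
  image: string,
  docker: DockerOps,
  resolveDigestRef: (image: string, indexJson: string) => string | undefined,
  log: (message: string) => void,
): PullPath {
  let digestRef: string | undefined;
  try {
    digestRef = resolveDigestRef(image, docker.manifestInspect(image));
  } catch {
    digestRef = undefined; // a thrown inspect is treated like "not resolvable"
  }

  if (digestRef === undefined) {
    log(`no platform digest resolved for ${image}; falling back to plain tag pull`);
    docker.pull(image);
    return "tag-fallback";
  }

  log(`pulling ${digestRef} by digest and re-tagging as ${image}`);
  docker.pull(digestRef);
  docker.tag(digestRef, image); // restore the friendly ref used by later steps
  return "digest";
}
```

Returning the path taken, in addition to logging it, gives tests something to assert on without parsing log text.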

github-actions · 2026-06-03T01:06:27Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26857466199
Target ref: 271f8e3b6441a5cd7343e8daed6621e761594f8a
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

github-actions · 2026-06-03T16:37:09Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26898539348
Target ref: c86f11a03fb480ed9dc1b2b65a632a10654fb599
Workflow ref: main
Requested jobs: inference-routing-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
inference-routing-e2e	✅ success

github-actions · 2026-06-03T16:44:18Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26899287698
Target ref: 550bba2719e12f27ec555daafa4ce03dd7d4b75b
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

github-actions · 2026-06-04T21:48:10Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26981266018
Target ref: e30a51f5d738cf230329b0ebe768a05af9cddc5d
Workflow ref: main
Requested jobs: cloud-onboard-e2e,inference-routing-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success
inference-routing-e2e	✅ success

github-actions · 2026-06-05T16:06:15Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27025479068
Target ref: 6ef60d97cff2512a984f8507531669f007553535
Workflow ref: main
Requested jobs: inference-routing-e2e,cloud-onboard-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success
inference-routing-e2e	✅ success

github-actions · 2026-06-05T16:21:08Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27026265139
Target ref: ff0fa2ccc7daedaa6105c62d1d94c6b31018c8ae
Workflow ref: main
Requested jobs: inference-routing-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
inference-routing-e2e	✅ success

github-actions · 2026-06-05T19:36:39Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27036040845
Target ref: daef088ecca8dce092f97b9076fb40d75d088863
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

github-actions · 2026-06-05T19:47:16Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27036186305
Target ref: cc5720710d7c22b7bc48a0f0116c6a88a1ab5098
Workflow ref: main
Requested jobs: gpu-e2e,inference-routing-e2e
Summary: 1 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped
inference-routing-e2e	✅ success

github-actions · 2026-06-05T20:35:52Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27038414371
Target ref: 3b503571a321952bee0ba5d80fc0a8f405e43cf3
Workflow ref: main
Requested jobs: inference-routing-e2e,cloud-onboard-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success
inference-routing-e2e	✅ success

## Summary

- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary

- #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification.
- #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash now available as default inference model for DGX Station.
  * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts.
* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Better sandbox cleanup scoping by port to prevent conflicts across deployments.
  * Added GPU patch failure recovery with automatic rollback.
* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox lifecycle.
  * Added .dockerignore best practices for custom deployments.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

fix(onboard): nim pull

754403d

hunglp6d and others added 5 commits June 2, 2026 09:26

fix(onboard): raise local NIM health timeout to 1200s

6d2eec4

Merge branch 'main' into fix/nim-pull-attestation-index-digest

e083ec2

Merge branch 'main' into fix/nim-pull-attestation-index-digest

af234bb

fix(inference): pass codebase-growth-guardrails and env-var-docs gates

a879db8

Merge branch 'main' into fix/nim-pull-attestation-index-digest

3d71312

hunglp6d self-assigned this Jun 3, 2026

hunglp6d added enhancement: inference VRDC Issues and PRs submitted by NVIDIA VRDC test team. labels Jun 3, 2026

hunglp6d marked this pull request as ready for review June 3, 2026 00:06

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread src/lib/inference/nim.test.ts

hunglp6d and others added 4 commits June 3, 2026 05:46

fix(inference): validate NIM served model id with isSafeModelId

d648283

fix(inference): log NIM digest-pull fallback, test thrown manifest inspect

52c960d

fix(inference): resolve pinning process.arch in NIM digest-pull test

1ece6b9

Merge branch 'main' into fix/nim-pull-attestation-index-digest

14ef6e1

hunglp6d added the v0.0.57 Release target label Jun 3, 2026

cv added v0.0.58 Release target and removed v0.0.57 Release target labels Jun 3, 2026

Merge branch 'main' into fix/nim-pull-attestation-index-digest

271f8e3

wscurran added area: inference Inference routing, serving, model selection, or outputs area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow labels Jun 3, 2026

Merge branch 'main' into fix/nim-pull-attestation-index-digest

550bba2

cv added v0.0.59 Release target and removed v0.0.58 Release target labels Jun 4, 2026

hunglp6d added 2 commits June 4, 2026 15:18

Merge branch 'main' into fix/nim-pull-attestation-index-digest

932c699

Merge branch 'main' into fix/nim-pull-attestation-index-digest

2336209

cv added v0.0.60 Release target and removed v0.0.59 Release target labels Jun 4, 2026

hunglp6d and others added 2 commits June 5, 2026 03:06

fix(inference): use DEFAULT_NIM_HEALTH_TIMEOUT_SECONDS const (1200s)

1bd4c40

Merge branch 'main' into fix/nim-pull-attestation-index-digest

e30a51f

hunglp6d and others added 3 commits June 5, 2026 03:33

fix(inference): align NIM health-timeout test with 1200s default

f19fb97

Merge branch 'main' into fix/nim-pull-attestation-index-digest

11d4a24

Merge branch 'main' into fix/nim-pull-attestation-index-digest

6ef60d9

Merge branch 'main' into fix/nim-pull-attestation-index-digest

ff0fa2c

wscurran added the bug-fix PR fixes a bug or regression label Jun 5, 2026

Merge branch 'main' into fix/nim-pull-attestation-index-digest

daef088

Merge branch 'main' into fix/nim-pull-attestation-index-digest

cc57207

Merge branch 'main' into fix/nim-pull-attestation-index-digest

3b50357

cv approved these changes Jun 5, 2026

View reviewed changes

cv merged commit 0b17a14 into main Jun 5, 2026
28 checks passed

cv deleted the fix/nim-pull-attestation-index-digest branch June 5, 2026 22:11

miyoungc mentioned this pull request Jun 6, 2026

docs: refresh v0.0.60 release notes #4879

Merged

Conversation

hunglp6d commented Jun 2, 2026 • edited by coderabbitai Bot

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

copy-pr-bot Bot commented Jun 2, 2026

coderabbitai Bot commented Jun 2, 2026 • edited

Reviews paused

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

github-actions Bot commented Jun 2, 2026 • edited

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

github-actions Bot commented Jun 2, 2026 • edited

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

github-actions Bot commented Jun 2, 2026 • edited

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

coderabbitai Bot left a comment

github-actions Bot commented Jun 3, 2026

Selective E2E Results — ✅ All requested jobs passed

hunglp6d commented Jun 3, 2026

github-actions Bot commented Jun 3, 2026

Selective E2E Results — ⚠️ No requested jobs ran

github-actions Bot commented Jun 3, 2026

Selective E2E Results — ✅ All requested jobs passed

github-actions Bot commented Jun 3, 2026

Selective E2E Results — ⚠️ No requested jobs ran

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ⚠️ No requested jobs ran

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed
