fix(inference): pull NIM by platform digest, use served model id#4641
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Docker manifest inspect/tag helpers; platform-aware OCI manifest digest selection with digest-based pull-and-retag; served-model-id parsing and optional adoption; integrates adoption into onboarding; increases DEFAULT_NIM_HEALTH_TIMEOUT_SECONDS to 1200. ChangesNIM Platform-Aware Image Pulling and Served Model Discovery
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 1 needs attention, 5 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/inference/nim.test.ts`:
- Around line 207-245: The test "resolves the host-arch manifest digest..." uses
the real process.arch which makes expectedDigest deterministic only on certain
machines; update the test to pin or stub process.arch (or parameterize per
supported arch) before calling nimModule.nodeArchToOci and pullNimImage so
DIGEST_BY_ARCH[ociArch] is deterministic—e.g., set process.arch temporarily to
"x64" or "arm64" around the calls to nodeArchToOci and pullNimImage (and restore
it in the finally), or loop the test for each supported arch; target symbols:
pullNimImage, nodeArchToOci, DIGEST_BY_ARCH, process.arch,
loadNimWithMockedRunner, and ensure restore() still runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8d830609-3f2c-49d6-a054-2bc43ce52ba0
📒 Files selected for processing (6)
src/lib/adapters/docker/image.tssrc/lib/adapters/docker/index.test.tssrc/lib/adapters/docker/inspect.tssrc/lib/inference/nim.test.tssrc/lib/inference/nim.tssrc/lib/onboard.ts
Selective E2E Results — ✅ All requested jobs passedRun: 26855567186
|
|
Addressed advisor findings:
|
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Selective E2E Results — ✅ All requested jobs passedRun: 26898539348
|
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Selective E2E Results — ✅ All requested jobs passedRun: 26981266018
|
Selective E2E Results — ✅ All requested jobs passedRun: 27025479068
|
Selective E2E Results — ✅ All requested jobs passedRun: 27026265139
|
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Selective E2E Results — ✅ All requested jobs passedRun: 27036186305
|
Selective E2E Results — ✅ All requested jobs passedRun: 27038414371
|
## Summary - Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877. - Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting. - Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`. ## Source summary - #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat. - #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification. - #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect. - #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior. - #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback. - #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands. - #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior. - #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`. - #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior. - #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port. - #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior. - #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording. - #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills. - #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage. - #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption. - #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage. - #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion. - #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception. - #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback. - #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting. - #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage. - #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds. - #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches. - #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation. - #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files. - #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens. - #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory. - #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run` - `npm run docs` - `git diff --check` - skip-term scan across `docs/`, `.agents/skills/`, and `skills/` - `npm run build:cli` - `npm run typecheck:cli` - Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * DeepSeek-V4-Flash now available as default inference model for DGX Station. * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection. * Added weather and public-reference policy presets for expanded agent capabilities. * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts. * **Bug Fixes** * Improved policy tier validation to prevent invalid configurations. * Better sandbox cleanup scoping by port to prevent conflicts across deployments. * Added GPU patch failure recovery with automatic rollback. * **Documentation** * Expanded troubleshooting guides for inference, security, and sandbox lifecycle. * Added .dockerignore best practices for custom deployments. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Fixes local NVIDIA NIM onboarding (
NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard→Local NVIDIA NIM) on Docker 29.x with the containerd image store. Three defects on
that path, all reproduced and verified end-to-end on DGX Spark (GB10, Docker 29.5.2):
error from registry: Incorrect Repository Format. NIM:latesttags are multi-arch OCI indexes that also carry buildkit attestation manifests
(
platform: unknown/unknown). The containerd image store pulls the per-arch layers,then fetches the attestation manifest, which nvcr.io rejects — aborting after every
layer with no usable image. (Older Docker without the containerd store never fetched
it, so the break tracks the Docker/image-store version, not the host or repo path.)
wait; larger models always time out.
(
nvidia/nemotron-3-nano), which differs from the catalog name(
nvidia/nemotron-3-nano-30b-a3b); validating/routing with the catalog id returns 404.Related Issue
Fixes #3885
Changes
src/lib/inference/nim.tspullNimImageresolves the index to the host-arch image-manifest digest(
docker manifest inspect→ matchplatform.architecture/os, skippingunknown/unknownattestation entries), pulls that single manifest by digest(no index walk → no attestation fetch), and re-tags it to the original ref.
Falls back to a plain tag pull when the ref is not a resolvable multi-arch index,
and logs which path was taken.
adoptServedModelIdreads the served id from/v1/modelsand uses it forvalidation/route/config when it differs from the catalog name. The served id is
local-service-controlled, so it is validated with
isSafeModelIdbefore adoption(mirrors the adjacent local-vLLM detected-model boundary); an unsafe id is ignored
with a diagnostic and never echoed into logs.
waitForNimHealthdefault raised 300s → 1200s for slow first-loads (no new env var).src/lib/onboard.ts: calls the helpers above; kept net-neutral percodebase-growth-guardrails.
src/lib/adapters/docker/inspect.ts,image.ts: adddockerManifestInspectand
dockerTag.dockerManifestInspectfallback, and manifest-inspect/tag argv.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Hung Le hple@nvidia.com
Summary by CodeRabbit
New Features
Improvements
Tests