fix(inference): tighten Ollama bootstrap fit and raise runtime context floor by laitingsheng · Pull Request #4852 · NVIDIA/NemoClaw

laitingsheng · 2026-06-05T13:42:30Z

Summary

Three compounding faults steer Ollama onboarding into a dead-loop on tight-VRAM dGPU hosts (L4 23 GB) and leave the agent with a 4096-token runtime context window that cannot fit the base prompt + tool catalogue.

Tightens the bootstrap-model registry, raises the auto-adopted runtime context window to a workable floor, extends the cold-load probe retry to non-Spark hosts, and breaks the model-selection re-prompt out of dead-loops after repeated probe failures.

Related Issue

Fixes #4812
Fixes #4813
Refs #3707

Changes

Raised registry requiredMemoryMB so 30B-class entries no longer pass the fit check on L4-class 23 GB dGPUs: nemotron-3-nano:30b 22000 → 26000 and qwen3.6:35b 26000 → 30000. The original 22000 budget left ~1 GB headroom over the 19 GB on-disk weight, which is not enough for KV cache + activations + agent prompt at default context; the runner ended up spilling GPU→CPU during warm-up and the probe timed out.
Added OllamaModelEntry.computeIntensive and GpuInfo.computeConstrained so fittableOllamaModelTags / modelFitsAvailableMemory / anyRegistryModelFits skip 30B-class entries on integrated-GPU hosts (platform === "jetson"), where memory ostensibly fits but token-generation throughput cannot clear agent-loop timeouts.
Dropped the sparkHost guard on the 300 s probe retry in validateOllamaModel. Cold-loading a large model from disk can exceed the default 120 s probe window on any tight-VRAM dGPU, not just Spark. Fast failures (connection refused) keep timedOut === false and surface immediately without a retry.
Added MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW = 16_384 and modified applyOllamaRuntimeContextWindow to raise NEMOCLAW_CONTEXT_WINDOW to the floor when the daemon-reported context_length is below it. Ollama's stock num_ctx=4096 cannot fit the OpenClaw agent base prompt + tool catalogue (~7.4k tokens), so every turn previously failed with "Context overflow: prompt too large for the model".
Made mergeOllamaLoopbackSystemdOverride write Environment="OLLAMA_CONTEXT_LENGTH=16384" alongside OLLAMA_HOST=127.0.0.1, so a daemon restarted through the override serves the workable context length. Preserves user-supplied values above the NemoClaw floor; strips stale below-floor lines.
Added loop-escape to selectAndValidateOllamaModel: tracks per-model probe-failure counts, threads an excludeModels set through promptOllamaModel, and falls back to provider selection after 2 failures on the same model or 3 failures total. Replaces the previous dead-loop that re-offered the same broken installed model every round.
Added tests covering: L4 23 GB excludes 30B-class entries; computeConstrained: true iGPU excludes compute-intensive entries regardless of memory; runtime context floor raises 4096 → 16384 and preserves 32768; systemd override writes/preserves/strips OLLAMA_CONTEXT_LENGTH; validateOllamaModel retries on non-Spark when timed out; promptOllamaModel excludes failed tags from both menus.
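A minimal TypeScript sketch of how the two fit gates described above could combine: a memory-budget comparison using the raised `requiredMemoryMB` values, plus the `computeIntensive` / `computeConstrained` exclusion. The functions are simplified stand-ins named after the PR's functions. Their signatures, the `totalMemoryMB` field, and the `example-small:8b` entry with its budget are hypothetical placeholders. Only the two 30B-class budgets come from this PR.

```typescript
// Hypothetical shapes; only the field names requiredMemoryMB, computeIntensive,
// and computeConstrained echo the PR description.
interface OllamaModelEntry {
  tag: string;
  requiredMemoryMB: number;
  computeIntensive?: boolean; // 30B-class entries
}

interface GpuInfo {
  totalMemoryMB: number; // assumed field name
  computeConstrained?: boolean; // e.g. a Jetson-class integrated GPU
}

// The 30B-class budgets are the raised values from this PR; the small entry is a placeholder.
const REGISTRY: OllamaModelEntry[] = [
  { tag: "example-small:8b", requiredMemoryMB: 8000 },
  { tag: "nemotron-3-nano:30b", requiredMemoryMB: 26000, computeIntensive: true },
  { tag: "qwen3.6:35b", requiredMemoryMB: 30000, computeIntensive: true },
];

function modelFitsAvailableMemory(entry: OllamaModelEntry, gpu: GpuInfo): boolean {
  // Memory may fit on an iGPU, but throughput cannot clear agent-loop timeouts.
  if (gpu.computeConstrained && entry.computeIntensive) return false;
  return entry.requiredMemoryMB <= gpu.totalMemoryMB;
}

function fittableOllamaModelTags(gpu: GpuInfo, registry: OllamaModelEntry[] = REGISTRY): string[] {
  return registry.filter((entry) => modelFitsAvailableMemory(entry, gpu)).map((entry) => entry.tag);
}
```

With these placeholder numbers, an L4-class host (about 23000 MB) gets only the small entry. The pre-PR 22000 MB budget for `nemotron-3-nano:30b` would have passed the same comparison.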

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

New Features
- Added GPU compute constraint detection for improved model recommendations on specialized hardware like Jetson platforms.
- Introduced compute-intensive model flagging to avoid recommending unsuitable models on resource-constrained devices.
Improvements
- Enhanced Ollama connection reliability with extended retry logic for timeout scenarios.
- Improved onboarding flow with smarter model exclusion to prevent repeated failures during setup.
- Enforced context window floor for more consistent Ollama runtime behavior.
Tests
- Expanded test coverage for compute-constrained GPU detection and model selection filtering.


coderabbitai · 2026-06-05T13:42:44Z

Warning

Review limit reached

@cv, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a0c3ea23-90c1-4def-9f6c-ed1f27eb4453

📥 Commits

Reviewing files that changed from the base of the PR and between 5972e47 and 61f476a.

📒 Files selected for processing (7)

src/lib/inference/local.test.ts
src/lib/inference/ollama-probe-timeout.test.ts
src/lib/onboard.ts
src/lib/onboard/ollama-probe-failure-tracker.test.ts
src/lib/onboard/ollama-probe-failure-tracker.ts
src/lib/onboard/ollama-systemd.test.ts
src/lib/onboard/ollama-systemd.ts

📝 Walkthrough

Walkthrough

This PR fixes two linked dead-loop and context-window issues in Ollama onboarding. It detects compute-constrained platforms, extends probe-timeout retries, filters unsuitable models, prevents re-selection of models that failed to load, and enforces a minimum 16,384-token context window throughout the runtime and systemd configuration stack.
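The floor is a plain `Math.max` over the daemon-reported `context_length`, which is measured in tokens. Below is a hedged sketch of that adoption step. `adoptContextWindow` and its return shape are a hypothetical simplification of `applyOllamaRuntimeContextWindow`. The constant's name and value come from the PR.

```typescript
// Value from the PR; measured in tokens, not bytes.
const MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW = 16_384;

interface AdoptedContext {
  adopted: number;
  raisedToFloor: boolean;
}

// Hypothetical simplification of the adoption step in applyOllamaRuntimeContextWindow.
function adoptContextWindow(
  detected: number, // context_length reported by the Ollama daemon
  env: Record<string, string | undefined>,
): AdoptedContext {
  const adopted = Math.max(detected, MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW);
  env.NEMOCLAW_CONTEXT_WINDOW = String(adopted);
  return { adopted, raisedToFloor: adopted > detected };
}
```

This step changes only the value NemoClaw records. The daemon still serves whatever context length it was started with.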

Changes

Ollama Model Selection and Context Window Hardening

Layer / File(s)	Summary
GPU Capability Detection for Compute Constraints `src/lib/inference/nim.ts`, `src/lib/inference/local.ts`	Added `computeConstrained?: boolean` field to `GpuDetection` and `GpuInfo` interfaces; populated for Jetson platforms to mark integrated-GPU (iGPU) hosts where compute-intensive models should be filtered.
Probe Timeout Retry Expansion `src/lib/inference/local.ts`, `src/lib/inference/local.test.ts`	Broadened Ollama local probe timeout retry from Spark-only to all hosts; any initial probe timeout now triggers a 300s retry; added test coverage for both fast-fail (no retry) and timeout-then-retry paths.
Model Registry with Compute-Intensive Marking `src/lib/inference/ollama-model-registry.ts`, `src/lib/inference/ollama-model-registry.test.ts`	Extended `OllamaModelEntry` with optional `computeIntensive` flag; marked large models (Qwen 35B, Nemotron 30B) as compute-intensive; updated `modelFitsAvailableMemory`, `fittableOllamaModelTags`, and `anyRegistryModelFits` to exclude compute-intensive entries on compute-constrained hosts; added test coverage for both dGPU and compute-constrained iGPU scenarios.
Model Selection with Exclusion Support `src/lib/inference/ollama/proxy.ts`, `src/lib/inference/ollama/proxy.test.ts`	Extended `promptOllamaModel` to accept optional `promptOptions` with `excludeModels` set; filters out excluded tags from both installed and bootstrap options; prevents re-offering failed models during retries; added test coverage for exclusion filtering and bootstrap fallback.
Onboarding Failure Tracking and Loop Prevention `src/lib/onboard.ts`	Updated `selectAndValidateOllamaModel` to track per-model and total probe failures; accumulates failed models in an exclusion set; passes exclusion set to model prompts; returns to provider selection if per-model or global failure thresholds are exceeded, preventing a dead-loop on unreachable models.
Context Window Floor Enforcement `src/lib/inference/ollama-runtime-context.ts`, `src/lib/inference/ollama-runtime-context.test.ts`	Introduced and exported `MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW = 16384` constant; updated `applyOllamaRuntimeContextWindow` to enforce this floor by returning `Math.max(detected, MIN)` instead of using the daemon-reported value directly; branched logging to distinguish floor-raising from using the detected context; added test coverage for both raising and preserving scenarios.
Systemd Context Window Override Management `src/lib/onboard/ollama-systemd.ts`, `src/lib/onboard/ollama-systemd.test.ts`	Exported `mergeOllamaLoopbackSystemdOverride` function and enhanced to manage both `OLLAMA_HOST` and `OLLAMA_CONTEXT_LENGTH` in systemd drop-in; preserves existing user-supplied context values above the floor; strips stale/legacy entries; respects `libraryOverride` option; added comprehensive test coverage for override merging, value preservation, and configuration propagation.
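Here is a sketch of the drop-in merge rules described for `mergeOllamaLoopbackSystemdOverride`. It pins `OLLAMA_HOST` to loopback, keeps a user `OLLAMA_CONTEXT_LENGTH` above the floor, and replaces any below-floor value, using systemd's `Environment="KEY=value"` syntax. This is not the PR's implementation. `mergeOverride` is a hypothetical name, and the sketch assumes the drop-in holds only a `[Service]` section.

```typescript
// Floor from the PR; the merge rules below are a hypothetical sketch.
const CONTEXT_FLOOR = 16_384;
const CONTEXT_RE = /^Environment="OLLAMA_CONTEXT_LENGTH=(\d+)"$/;
const HOST_RE = /^Environment="OLLAMA_HOST=/;

// Assumes the drop-in contains a single [Service] section.
function mergeOverride(existing: string): string {
  const kept: string[] = [];
  let userContext: number | undefined;
  for (const raw of existing.split("\n")) {
    const line = raw.trim();
    if (line === "" || line === "[Service]") continue;
    const match = CONTEXT_RE.exec(line);
    if (match) {
      const value = Number(match[1]);
      // Remember the last user value above the floor; below-floor values are dropped.
      if (value > CONTEXT_FLOOR) userContext = value;
      continue;
    }
    if (HOST_RE.test(line)) continue; // re-emitted as loopback below
    kept.push(line); // unrelated settings survive the merge
  }
  return [
    "[Service]",
    'Environment="OLLAMA_HOST=127.0.0.1"',
    `Environment="OLLAMA_CONTEXT_LENGTH=${userContext ?? CONTEXT_FLOOR}"`,
    ...kept,
  ].join("\n") + "\n";
}
```

Writing the drop-in is not enough on its own. systemd needs a `daemon-reload` and an Ollama restart before the daemon serves the new context length.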

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#4776: Overlaps in starter-model registry changes; the retrieved PR modifies SMALLEST_OLLAMA_MODEL_TAG and registry inputs, while this PR extends the same registry with computeIntensive filtering logic.
NVIDIA/NemoClaw#4132: Related because both PRs extend src/lib/inference/ollama-model-registry.ts selection logic; retrieved PR introduced modelFitsAvailableMemory capacity gating that this PR builds upon with compute-constraint filtering.

Suggested labels

fix, area: inference, area: onboarding, v0.0.59

Suggested reviewers

zyang-dev
cv

Poem

🐰 A Jetson so small, a timeout so long,
We loop, we dead-spin, but now we are strong!
Sixteen K tokens, a floor we now hold,
No more too-large models offered in cold.
Context windows rise, and failures won't stay—
Onboarding hops forward! Hip-hip-hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(inference): tighten Ollama bootstrap fit and raise runtime context floor' directly summarizes the main changes: tightening model selection/fitting logic and raising the context window floor.
Linked Issues check	✅ Passed	All primary objectives from `#4812` and `#4813` are met: extended probe retries to non-Spark hosts [`#4812`], loop-escape via model exclusion [`#4812`], context-floor enforcement [`#4813`], and systemd override management [`#4813`].
Out of Scope Changes check	✅ Passed	All changes are directly scoped to resolve `#4812` (dead-loop, probe retries, model filtering) and `#4813` (context floor). No unrelated modifications detected.


github-actions · 2026-06-05T13:45:35Z

E2E Advisor Recommendation

Required E2E: gpu-repo-local-ollama-openclaw, gpu-e2e, ollama-proxy-e2e
Optional E2E: gpu-double-onboard-e2e, onboard-inference-smoke-e2e, strict-tool-call-probe-e2e

Dispatch hint: gpu-repo-local-ollama-openclaw

Auto-dispatched E2E: gpu-e2e via nightly-e2e.yaml at 61f476aa111905978e82ba296b7adae20a5c2320 — nightly run


E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

gpu-repo-local-ollama-openclaw (high): Highest-signal scenario for this PR: exercises local Ollama onboarding on a GPU runner, sandbox creation, the Ollama auth proxy assertion suite, and local Ollama inference for OpenClaw.
gpu-e2e (high): Runs the real non-interactive Ollama provider user flow: install Ollama, run NemoClaw install/onboard with NEMOCLAW_PROVIDER=ollama, start/pull/validate the model, create a sandbox, and verify inference through the sandbox. This is required because the PR changes model selection, probe retry behavior, context sizing, GPU capacity gating, and Linux Ollama setup.
ollama-proxy-e2e (medium): Validates the real Ollama auth proxy chain, token auth, inference through the proxy, persistence, recovery, and container reachability. Required because src/lib/inference/ollama/proxy.ts is touched and local Ollama proxy behavior is part of the sandbox security boundary.

Optional E2E

gpu-double-onboard-e2e (high): Useful adjacent confidence for re-onboarding with Ollama and proxy-token consistency after the onboard flow changes, but the diff does not directly change token generation/persistence.
onboard-inference-smoke-e2e (low): Optional regression guard for onboard inference validation failure behavior. It is adjacent to the onboard inference path touched here, but it targets a hermetic broken compatible endpoint rather than local Ollama model probing.
strict-tool-call-probe-e2e (low): Optional local-Ollama-adjacent validation of the structured chat-completions tool-call probe path. The PR changes Ollama validation/probing, but not the strict tool-call probe contract directly.

New E2E recommendations

ollama-probe-timeout-and-model-retry (high): Existing real GPU E2E may pass on fast model loads and does not deterministically exercise the new 'first probe timed out, retry with 300s on non-Spark' path or the failure-tracker loop escape behavior.
- Suggested test: Add a hermetic onboard Ollama probe-failure E2E that mocks Ollama /api/generate to time out once and then succeed. Separately, verify that repeated probe failures exclude the failed tag and return to provider selection after the limit.
jetson-compute-constrained-model-selection (medium): The new computeConstrained flag and computeIntensive registry filtering target Jetson/Tegra hosts, but no existing E2E appears to run on Jetson-class hardware or simulate that full non-interactive onboarding decision.
- Suggested test: Add a hermetic platform E2E that stubs detectGpu() as Jetson with high unified memory and asserts non-interactive Ollama onboarding selects the small bootstrap model rather than a 30B/35B compute-intensive tag.
ollama-systemd-context-floor (medium): Unit tests cover mergeOllamaLoopbackSystemdOverride, but there is no Linux service-level E2E proving the installed systemd drop-in sets loopback binding and OLLAMA_CONTEXT_LENGTH, then restarts Ollama with a runtime context usable by NemoClaw.
- Suggested test: Add a Linux Ollama systemd E2E that installs/starts Ollama through onboard, inspects the drop-in, verifies OLLAMA_CONTEXT_LENGTH is at least the NemoClaw floor, and confirms /api/ps reports an adopted context after loading a small model.
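The retry path the advisor wants exercised deterministically reduces to a single rule: retry once with a longer budget, and only when the first probe timed out. Below is a hedged sketch with an injectable, synchronous probe, so a test can script a timeout followed by success. `probeWithColdLoadRetry`, `ProbeResult`, and the probe signature are hypothetical. The 120 s and 300 s budgets come from the PR description.

```typescript
// Hypothetical probe result; timedOut is false for fast failures such as connection refused.
interface ProbeResult {
  ok: boolean;
  timedOut: boolean;
}

// Stand-in for a synchronous probe bounded by a max-time budget in seconds.
type Probe = (maxTimeSeconds: number) => ProbeResult;

const DEFAULT_PROBE_SECONDS = 120;
const COLD_LOAD_RETRY_SECONDS = 300;

function probeWithColdLoadRetry(probe: Probe): { result: ProbeResult; budgets: number[] } {
  const budgets: number[] = [];
  const run = (seconds: number): ProbeResult => {
    budgets.push(seconds); // record each budget so tests can assert the retry policy
    return probe(seconds);
  };
  let result = run(DEFAULT_PROBE_SECONDS);
  // Retry exactly once, on any host, and only when the first attempt timed out.
  if (!result.ok && result.timedOut) result = run(COLD_LOAD_RETRY_SECONDS);
  return { result, budgets };
}
```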

Dispatch hint

Workflow: .github/workflows/e2e-scenarios.yaml
jobs input: gpu-repo-local-ollama-openclaw

github-actions · 2026-06-05T13:45:36Z

E2E Scenario Advisor Recommendation

Required scenario E2E: gpu-repo-local-ollama-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw


E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

gpu-repo-local-ollama-openclaw: The PR changes local Ollama onboarding and validation behavior: Ollama model fit selection, probe retry/failure tracking, runtime context auto-configuration, auth proxy model prompting, Linux systemd Ollama setup, and NVIDIA GPU detection metadata. The dispatchable scenario that directly exercises the local Ollama provider path and Ollama auth proxy suites is gpu-repo-local-ollama-openclaw.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

None.

Relevant changed files

src/lib/inference/local.ts
src/lib/inference/nim.ts
src/lib/inference/ollama-model-registry.ts
src/lib/inference/ollama-runtime-context.ts
src/lib/inference/ollama/proxy.ts
src/lib/onboard.ts
src/lib/onboard/ollama-probe-failure-tracker.ts
src/lib/onboard/ollama-systemd.ts

github-actions · 2026-06-05T13:45:41Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27018560230
Target ref: 5972e4704a8f4fcad1a64c33027acc46fc68cf5a
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

github-actions · 2026-06-05T13:47:47Z

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Since last review: 3 prior items resolved, 1 still applies, 1 new item found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Ollama probe-failure loop escape: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `selectAndValidateOllamaModel()` records failures and passes `excludeModels` into `promptOllamaModel()`, but no caller-level test exercises repeated `prepareOllamaModel()` failures through that loop.
Source-of-truth review needed: Auto-raised Ollama runtime context floor: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `applyOllamaRuntimeContextWindow()` floors the metadata with `Math.max`; daemon-side configuration is implemented in the systemd override path only.
Raised contextWindow can diverge from the actual Ollama daemon context (src/lib/inference/ollama-runtime-context.ts:217): The PR floors any loaded Ollama runtime context below 16384 by setting NEMOCLAW_CONTEXT_WINDOW to 16384, but that only changes NemoClaw's downstream model metadata unless the host daemon is also configured and restarted with OLLAMA_CONTEXT_LENGTH. The Linux systemd path now writes that override, but non-systemd/manual Ollama, macOS, and Windows-host Ollama paths can still report context_length=4096 while the sandbox is told to budget for 16384. That partially satisfies the [Brev][Inference] OpenClaw agent immediately hits context overflow after Ollama onboard — runtime context window 4096 #4813 requirement to raise num_ctx, block the model, or avoid reporting ready: it raises the metadata, but may not prove the runtime was raised.
- Recommendation: Before baking or exporting a floored NEMOCLAW_CONTEXT_WINDOW, either confirm /api/ps reports the raised context length after daemon-side configuration, or keep the detected value and block/warn for paths where NemoClaw cannot raise OLLAMA_CONTEXT_LENGTH. Add a regression for a non-systemd/manual Ollama reporting 4096.
- Evidence: applyOllamaRuntimeContextWindow() computes adopted = Math.max(detected, MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW) and writes env.NEMOCLAW_CONTEXT_WINDOW. mergeOllamaLoopbackSystemdOverride() writes OLLAMA_CONTEXT_LENGTH=16384 only for the systemd override path.
Probe-failure loop escape still needs caller-boundary coverage (src/lib/onboard.ts:3901): The tracker and prompt filtering are tested independently, but the real onboarding loop in selectAndValidateOllamaModel() is still not covered. That leaves the [Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out #4812 no-dead-loop acceptance relying on mocked seams rather than a test proving the actual caller records failures, excludes the failed model on the next prompt, and returns to provider selection at the intended threshold.
- Recommendation: Add a caller-level test around selectAndValidateOllamaModel() or an extracted testable wrapper that simulates repeated prepareOllamaModel() failures for an installed model and verifies the failed tag is not re-offered/reselected and the function returns back-to-selection at the configured limit. Document the invalid state and removal condition for the failure tracker.
- Evidence: selectAndValidateOllamaModel() calls probeFailures.recordFailure(selectedModel), handleOllamaProbeFailure(), and promptOllamaModel(gpu, { excludeModels: probeFailures.excludedModels() }); current tests cover OllamaProbeFailureTracker and promptOllamaModel, not this caller loop.
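Below is one way to express the context-floor recommendation, sketched under assumptions. A value at or above the floor is adopted only after `/api/ps` has confirmed it following a daemon-side change. Otherwise the sketch keeps the detected value and surfaces a warning. `decideContextWindow` and its inputs are hypothetical. The sketch encodes the reviewer's suggested policy, not what the PR currently does.

```typescript
const FLOOR_TOKENS = 16_384;

interface ContextDecision {
  contextWindow: number;
  warning?: string;
}

// Hypothetical policy: never advertise more context than the daemon has confirmed.
function decideContextWindow(
  detected: number, // /api/ps context_length before any change
  confirmedAfterRestart?: number, // /api/ps context_length after a daemon-side change, if re-checked
): ContextDecision {
  if (detected >= FLOOR_TOKENS) return { contextWindow: detected };
  if (confirmedAfterRestart !== undefined && confirmedAfterRestart >= FLOOR_TOKENS) {
    return { contextWindow: confirmedAfterRestart };
  }
  return {
    contextWindow: detected,
    warning:
      `Ollama reports context_length=${detected}, below ${FLOOR_TOKENS}; ` +
      "raise OLLAMA_CONTEXT_LENGTH and restart the daemon before onboarding.",
  };
}
```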

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation**: The PR has useful unit coverage for registry fit, timeout retry, context parsing/flooring, systemd merge, and prompt filtering. It also changes host glue, daemon restart behavior, model runtime metadata, and onboarding loop recovery, so behavior-level validation is still important. Scenarios to cover:
- Brev/L4 fresh Ollama onboarding renders `qwen3.5:9b` as the default starter model and does not offer `nemotron-3-nano:30b`.
- Non-Spark Ollama cold-load timeout retries exactly once with `--max-time 300`, while connection-refused exits after one probe.
- Linux systemd Ollama override restart results in `/api/ps` reporting `context_length >= 16384` before `NEMOCLAW_CONTEXT_WINDOW=16384` is baked into sandbox modelsConfig.
- Non-systemd/manual Ollama with `/api/ps context_length=4096` does not silently bake `contextWindow=16384` unless the daemon-side context length was actually raised or explicitly accepted.
- Repeated failed Ollama model probes through `selectAndValidateOllamaModel()` return to provider selection and do not re-offer or reselect the failed installed model.
**Probe-failure loop escape still needs caller-boundary coverage** — Add a caller-level test around selectAndValidateOllamaModel() or an extracted testable wrapper that simulates repeated prepareOllamaModel() failures for an installed model and verifies the failed tag is not re-offered/reselected and the function returns back-to-selection at the configured limit. Document the invalid state and removal condition for the failure tracker.
**Acceptance clause:** [Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out #4812 body: "If a model cannot be loaded on this host, the wizard should not offer it as the default and/or should fall back to a host-fitting model, and must never loop indefinitely with no way forward." — add test evidence or identify existing coverage. Host-fit fallback is covered in the registry tests and prompt filtering excludes failed tags. The remaining gap is caller-boundary evidence that `selectAndValidateOllamaModel()` exits the actual loop after repeated failures.
**Acceptance clause:** [Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out #4812 comment: "Related open issues: - [[DGX Spark][Onboard] Express setup with pre-installed old Ollama (0.6.2) loops on model probe — no version check, no upgrade, no actionable error #4178 [DGX Spark][Onboard] Express setup with pre-installed old Ollama (0.6.2) loops on model probe — no version check, no upgrade, no actionable error]" — add test evidence or identify existing coverage. The existing daemon-failure path remains and this PR adds generic probe-failure limits, but it does not directly implement old-Ollama version checking or upgrade behavior. This reads as related context rather than the direct [Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out #4812 acceptance target.
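To make the caller-boundary gap concrete, here is a hypothetical, self-contained model of the selection loop. Injected stand-ins replace `promptOllamaModel` and `prepareOllamaModel`. It is not the code in `onboard.ts`. It only shows the properties a caller-level test would assert: a failed tag is never re-offered, and the loop returns to provider selection at 2 failures on one tag or 3 in total. Those thresholds are taken from the PR description.

```typescript
type Outcome = { kind: "selected"; tag: string } | { kind: "back-to-selection" };

// Hypothetical stand-ins: prompt returns the chosen tag (or undefined to cancel);
// prepare returns true when the model loads and the probe succeeds.
type Prompt = (offered: string[]) => string | undefined;
type Prepare = (tag: string) => boolean;

function selectModelWithEscape(
  menu: string[],
  prompt: Prompt,
  prepare: Prepare,
  maxPerModel = 2,
  maxTotal = 3,
): { outcome: Outcome; offeredRounds: string[][] } {
  const failures = new Map<string, number>();
  const offeredRounds: string[][] = [];
  let total = 0;
  while (true) {
    // Failed tags are filtered out before every re-prompt.
    const offered = menu.filter((tag) => !failures.has(tag));
    offeredRounds.push(offered);
    const choice = prompt(offered);
    if (choice === undefined) return { outcome: { kind: "back-to-selection" }, offeredRounds };
    if (prepare(choice)) return { outcome: { kind: "selected", tag: choice }, offeredRounds };
    const count = (failures.get(choice) ?? 0) + 1;
    failures.set(choice, count);
    total += 1;
    // Bounded: at most maxTotal failed probes before giving up on Ollama.
    if (count >= maxPerModel || total >= maxTotal) {
      return { outcome: { kind: "back-to-selection" }, offeredRounds };
    }
  }
}
```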


This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3878-3888: Extract the new probe-failure state and thresholds into a small helper (e.g., ProbeFailureTracker) instead of keeping probeFailureCounts, excludedAfterRepeatFail, MAX_PROBE_FAILS_SAME_MODEL, MAX_PROBE_FAILS_TOTAL, and totalProbeFailures inline in onboard.ts. Create a module that encapsulates the Map/Set and counters and exposes methods like recordFailure(tag):boolean (returns whether tag is now excluded), shouldExclude(tag):boolean, getTotalFailures():number, and reset(). Then replace the inline variables/logic in the onboarding orchestration with a lightweight instance call to those methods (update the spots that currently reference probeFailureCounts/excludedAfterRepeatFail/totalProbeFailures or apply the thresholds), so the function remains orchestration-only and file growth is moved to the new helper module.
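Here is a minimal sketch of the helper CodeRabbit describes, using the thresholds from the PR description (2 failures on one tag, 3 in total). The method names loosely follow the suggestion plus the `excludedModels()` accessor cited in the advisor evidence. Two choices are assumptions, not confirmed by the PR. First, a tag is excluded after its first failure. Second, `recordFailure` returns whether a give-up threshold was reached, rather than whether the tag is excluded.

```typescript
// Hypothetical helper; thresholds default to the values in the PR description.
class ProbeFailureTracker {
  private readonly counts = new Map<string, number>();
  private total = 0;
  private readonly maxPerModel: number;
  private readonly maxTotal: number;

  constructor(maxPerModel = 2, maxTotal = 3) {
    this.maxPerModel = maxPerModel;
    this.maxTotal = maxTotal;
  }

  /** Records one failed probe; returns true once a give-up threshold is reached. */
  recordFailure(tag: string): boolean {
    this.counts.set(tag, (this.counts.get(tag) ?? 0) + 1);
    this.total += 1;
    return this.shouldReturnToProviderSelection();
  }

  /** Assumed policy: a tag is hidden from later menus after its first failure. */
  shouldExclude(tag: string): boolean {
    return this.counts.has(tag);
  }

  /** Set to pass as `excludeModels` when re-prompting. */
  excludedModels(): Set<string> {
    return new Set(this.counts.keys());
  }

  getTotalFailures(): number {
    return this.total;
  }

  shouldReturnToProviderSelection(): boolean {
    if (this.total >= this.maxTotal) return true;
    return Array.from(this.counts.values()).some((n) => n >= this.maxPerModel);
  }

  reset(): void {
    this.counts.clear();
    this.total = 0;
  }
}
```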


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b0faa8bd-9884-4af2-a920-a9f312bcb232

📥 Commits

Reviewing files that changed from the base of the PR and between 5dac380 and 5972e47.

📒 Files selected for processing (12)

src/lib/inference/local.test.ts
src/lib/inference/local.ts
src/lib/inference/nim.ts
src/lib/inference/ollama-model-registry.test.ts
src/lib/inference/ollama-model-registry.ts
src/lib/inference/ollama-runtime-context.test.ts
src/lib/inference/ollama-runtime-context.ts
src/lib/inference/ollama/proxy.test.ts
src/lib/inference/ollama/proxy.ts
src/lib/onboard.ts
src/lib/onboard/ollama-systemd.test.ts
src/lib/onboard/ollama-systemd.ts

github-actions · 2026-06-05T19:39:27Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27036160234
Target ref: cac1d6654d8a4a509cc0c6a70fe5eb618a2666b7
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e
Summary: 0 passed, 0 failed, 2 skipped

Job	Result
gpu-double-onboard-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions · 2026-06-05T21:00:20Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27039854857
Target ref: 61f476aa111905978e82ba296b7adae20a5c2320
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

## Summary

- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary

- #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification.
- #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash now available as default inference model for DGX Station.
  * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts.
* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Better sandbox cleanup scoping by port to prevent conflicts across deployments.
  * Added GPU patch failure recovery with automatic rollback.
* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox lifecycle.
  * Added .dockerignore best practices for custom deployments.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
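The "Ollama model fit filtering" item for #4852 can be made concrete with a minimal TypeScript sketch of the fit check this PR describes. The raised `requiredMemoryMB` budgets (26000 and 30000), the `computeIntensive` / `computeConstrained` flags, and the Jetson rule come from the PR description. Everything else is an assumption: the interface shapes are simplified, `hypothetical-small:8b` is a made-up stand-in entry, `23_000` MB is an illustrative figure for an L4-class "23 GB" card, and `isComputeConstrained` / `modelFits` / `fittableTags` are hypothetical helpers, not NemoClaw's `fittableOllamaModelTags` or `modelFitsAvailableMemory`.

```typescript
// Hypothetical, simplified shapes: the real OllamaModelEntry and GpuInfo
// carry more fields than shown here.
interface OllamaModelEntry {
  tag: string;
  requiredMemoryMB: number;
  computeIntensive?: boolean; // 30B-class models
}

interface GpuInfo {
  availableMemoryMB: number;
  platform?: string;
  computeConstrained?: boolean;
}

// Budgets are the raised values from this PR; the small entry is a hypothetical stand-in.
const REGISTRY: OllamaModelEntry[] = [
  { tag: "nemotron-3-nano:30b", requiredMemoryMB: 26_000, computeIntensive: true },
  { tag: "qwen3.6:35b", requiredMemoryMB: 30_000, computeIntensive: true },
  { tag: "hypothetical-small:8b", requiredMemoryMB: 8_000 },
];

// Assumption: an explicit flag wins; otherwise a Jetson (integrated GPU) host counts as constrained.
function isComputeConstrained(gpu: GpuInfo): boolean {
  return gpu.computeConstrained ?? gpu.platform === "jetson";
}

function modelFits(entry: OllamaModelEntry, gpu: GpuInfo): boolean {
  // Memory may fit on an integrated GPU, but 30B-class token throughput cannot
  // clear agent-loop timeouts, so those entries are skipped outright.
  if (entry.computeIntensive && isComputeConstrained(gpu)) return false;
  return entry.requiredMemoryMB <= gpu.availableMemoryMB;
}

function fittableTags(registry: OllamaModelEntry[], gpu: GpuInfo): string[] {
  return registry.filter((entry) => modelFits(entry, gpu)).map((entry) => entry.tag);
}

// L4-class dGPU: both 30B-class entries now exceed the available memory.
console.log(fittableTags(REGISTRY, { availableMemoryMB: 23_000 }));
// Jetson-class host with ample unified memory (figure assumed) but limited compute.
console.log(fittableTags(REGISTRY, { availableMemoryMB: 64_000, platform: "jetson" }));
```

Under the old 22000 MB budget, `nemotron-3-nano:30b` would have passed the same 23,000 MB check, which is the false positive that sent L4 hosts into the cold-load timeout loop.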

fix(inference): tighten Ollama bootstrap fit and raise runtime context floor

5972e47

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
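The runtime context floor raised by this commit can be illustrated with a small TypeScript sketch. The `16_384` floor, the `NEMOCLAW_CONTEXT_WINDOW` variable, and Ollama's stock `num_ctx=4096` come from the PR description. `resolveRuntimeContextWindow` and `applyContextFloor` are hypothetical helpers, not the actual `applyOllamaRuntimeContextWindow`, and falling back to the floor when `context_length` is missing is an assumption.

```typescript
// Floor value from this PR; the helpers below are hypothetical stand-ins
// for applyOllamaRuntimeContextWindow.
const MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW = 16_384;

function resolveRuntimeContextWindow(reportedContextLength?: number): number {
  // Assumption: a missing or nonsensical daemon report falls back to the floor.
  if (
    reportedContextLength === undefined ||
    !Number.isFinite(reportedContextLength) ||
    reportedContextLength <= 0
  ) {
    return MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW;
  }
  // Never go below the floor; keep larger daemon-reported windows as-is.
  return Math.max(reportedContextLength, MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW);
}

// Hypothetical env update mirroring the variable named in the PR.
function applyContextFloor(env: Record<string, string | undefined>, reported?: number): void {
  env.NEMOCLAW_CONTEXT_WINDOW = String(resolveRuntimeContextWindow(reported));
}

const demoEnv: Record<string, string | undefined> = {};
applyContextFloor(demoEnv, 4096); // Ollama's stock num_ctx
console.log(demoEnv.NEMOCLAW_CONTEXT_WINDOW); // 16384
```

With the roughly 7.4 k-token base prompt and tool catalogue cited in the PR, a 4096-token window overflows on every turn, while the 16,384-token floor leaves about 9 k tokens for conversation and tool output.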

laitingsheng added the bug-fix label (PR fixes a bug or regression) Jun 5, 2026

coderabbitai Bot reviewed Jun 5, 2026


Comment thread src/lib/onboard.ts Outdated

laitingsheng added the provider: ollama label (Ollama local model provider behavior) Jun 5, 2026

Merge branch 'main' into fix/4812-4813-ollama-bootstrap-fit-hardening

cac1d66

cv added 2 commits June 5, 2026 12:42

refactor(onboard): extract Ollama probe failure tracking

c8e73ba

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
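The probe failure tracking extracted in this commit, together with the cold-load retry from the PR description, might look roughly like the TypeScript sketch below. The 120 s default window, the 300 s retry, and the rule that only `timedOut === true` failures are retried come from the PR. `ProbeResult`, `probeWithColdLoadRetry`, `ProbeFailureTracker`, `selectModel`, the failure cap of 3, and the candidate tags are all hypothetical, and the candidate loop stands in for the interactive model-selection re-prompt.

```typescript
// Hypothetical shapes: the PR names validateOllamaModel and a timedOut flag,
// but not this exact interface.
interface ProbeResult {
  ok: boolean;
  timedOut: boolean;
}
type Probe = (tag: string, timeoutMs: number) => Promise<ProbeResult>;

const DEFAULT_PROBE_TIMEOUT_MS = 120_000; // default window per the PR
const COLD_LOAD_RETRY_TIMEOUT_MS = 300_000; // cold-load retry window per the PR

// Retry once with the longer window, but only when the first attempt timed out.
// Fast failures (e.g. connection refused) return timedOut === false and surface at once.
async function probeWithColdLoadRetry(tag: string, probe: Probe): Promise<ProbeResult> {
  const first = await probe(tag, DEFAULT_PROBE_TIMEOUT_MS);
  if (first.ok || !first.timedOut) return first;
  return probe(tag, COLD_LOAD_RETRY_TIMEOUT_MS);
}

// Hypothetical tracker: excludes tags that already failed and caps how many
// failed probes the selection loop tolerates (cap of 3 is an assumed value).
class ProbeFailureTracker {
  private readonly failedTags = new Set<string>();
  private failures = 0;
  private readonly maxFailures: number;

  constructor(maxFailures = 3) {
    this.maxFailures = maxFailures;
  }

  recordFailure(tag: string): void {
    this.failedTags.add(tag);
    this.failures += 1;
  }

  isExcluded(tag: string): boolean {
    return this.failedTags.has(tag);
  }

  shouldStopReprompting(): boolean {
    return this.failures >= this.maxFailures;
  }
}

// Try candidates in order, skipping excluded tags and stopping at the cap.
async function selectModel(
  candidates: string[],
  probe: Probe,
  tracker: ProbeFailureTracker,
): Promise<string | null> {
  for (const tag of candidates) {
    if (tracker.isExcluded(tag)) continue;
    const result = await probeWithColdLoadRetry(tag, probe);
    if (result.ok) return tag;
    tracker.recordFailure(tag);
    if (tracker.shouldStopReprompting()) break;
  }
  return null; // caller reports the failure instead of re-prompting forever
}

// Fake probe: the big model always times out; the small one only loads within the longer window.
const fakeProbe: Probe = async (tag, timeoutMs) =>
  tag === "hypothetical-small:8b" && timeoutMs === COLD_LOAD_RETRY_TIMEOUT_MS
    ? { ok: true, timedOut: false }
    : { ok: false, timedOut: true };

selectModel(["hypothetical-big:30b", "hypothetical-small:8b"], fakeProbe, new ProbeFailureTracker())
  .then((tag) => console.log(tag)); // hypothetical-small:8b
```

Keeping exclusion and the failure cap in one object means a reused tracker never re-offers a tag that already failed, and the loop terminates even when every candidate times out.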

test(inference): cover Ollama edge cases outside hotspots

61f476a

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv approved these changes Jun 5, 2026


cv merged commit ed75d14 into main Jun 5, 2026
28 checks passed

cv deleted the fix/4812-4813-ollama-bootstrap-fit-hardening branch June 5, 2026 21:53

miyoungc mentioned this pull request Jun 6, 2026

docs: refresh v0.0.60 release notes #4879

Merged

laitingsheng mentioned this pull request Jun 6, 2026

feat(onboard): register operator-supplied extra placeholder keys #4889

Merged

12 tasks

Conversation

laitingsheng commented Jun 5, 2026 • edited by coderabbitai Bot

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

coderabbitai Bot commented Jun 5, 2026 • edited

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

github-actions Bot commented Jun 5, 2026 • edited

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

github-actions Bot commented Jun 5, 2026 • edited

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files


github-actions Bot commented Jun 5, 2026

Selective E2E Results — ⚠️ No requested jobs ran

github-actions Bot commented Jun 5, 2026 • edited

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

coderabbitai Bot left a comment

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ⚠️ No requested jobs ran


github-actions Bot commented Jun 5, 2026

Selective E2E Results — ⚠️ No requested jobs ran



2 participants
