
fix(inference): tighten Ollama bootstrap fit and raise runtime context floor#4852

Merged
cv merged 4 commits into main from fix/4812-4813-ollama-bootstrap-fit-hardening on Jun 5, 2026

@laitingsheng
Contributor

@laitingsheng laitingsheng commented Jun 5, 2026

Summary

Three compounding faults steer Ollama onboarding into a dead-loop on tight-VRAM dGPU hosts (e.g. a 23 GB L4) and leave the agent with a 4096-token runtime context window that cannot fit the base prompt plus the tool catalogue.

Tightens the bootstrap-model registry, raises the auto-adopted runtime context window to a workable floor, extends the cold-load probe retry to non-Spark hosts, and breaks the model-selection re-prompt out of dead-loops after repeated probe failures.

Related Issue

Fixes #4812
Fixes #4813
Refs #3707

Changes

  • Raised registry requiredMemoryMB so 30B-class entries no longer pass the fit check on L4-class 23 GB dGPUs: nemotron-3-nano:30b 22000 → 26000 and qwen3.6:35b 26000 → 30000. The original 22000 budget left ~1 GB headroom over the 19 GB on-disk weight, which is not enough for KV cache + activations + agent prompt at default context; the runner ended up spilling GPU→CPU during warm-up and the probe timed out.
  • Added OllamaModelEntry.computeIntensive and GpuInfo.computeConstrained so fittableOllamaModelTags / modelFitsAvailableMemory / anyRegistryModelFits skip 30B-class entries on integrated-GPU hosts (platform === "jetson"), where memory ostensibly fits but token-generation throughput cannot clear agent-loop timeouts.
  • Dropped the sparkHost guard on the 300 s probe retry in validateOllamaModel. Cold-loading a large model from disk can routinely exceed the default 120 s window on any tight-VRAM dGPU, not just Spark. Fast failures (connection refused) keep timedOut === false and surface immediately.
  • Added MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW = 16_384 and modified applyOllamaRuntimeContextWindow to raise NEMOCLAW_CONTEXT_WINDOW to the floor when the daemon-reported context_length is below it. Ollama's stock num_ctx=4096 cannot fit the OpenClaw agent base prompt + tool catalogue (~7.4 k tokens), so every turn previously hit "Context overflow: prompt too large for the model".
  • Made mergeOllamaLoopbackSystemdOverride write Environment="OLLAMA_CONTEXT_LENGTH=16384" alongside OLLAMA_HOST=127.0.0.1, so a daemon restarted through the override serves the workable context length. Preserves user-supplied values above the NemoClaw floor; strips stale below-floor lines.
  • Added loop-escape to selectAndValidateOllamaModel: tracks per-model probe-failure counts, threads an excludeModels set through promptOllamaModel, and falls back to provider selection after 2 failures on the same model or 3 failures total. Replaces the previous dead-loop that re-offered the same broken installed model every round.
  • Added tests covering: L4 23 GB excludes 30B-class entries; computeConstrained: true iGPU excludes compute-intensive entries regardless of memory; runtime context floor raises 4096 → 16384 and preserves 32768; systemd override writes/preserves/strips OLLAMA_CONTEXT_LENGTH; validateOllamaModel retries on non-Spark when timed out; promptOllamaModel excludes failed tags from both menus.
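The fit rules in the first two bullets can be sketched as a pure filter over the registry. This is a minimal illustration under assumptions, not the code in ollama-model-registry.ts: the entry and GPU shapes are simplified, `availableMemoryMB` is a hypothetical field name, the function signatures are illustrative, and the `qwen3.5:9b` budget is a placeholder. Only the 30B/35B budgets and the `computeIntensive` / `computeConstrained` semantics come from the PR.

```typescript
// Hypothetical sketch; not the code in ollama-model-registry.ts.
// Field names other than requiredMemoryMB, computeIntensive and
// computeConstrained are assumptions.
interface OllamaModelEntry {
  tag: string;
  requiredMemoryMB: number;
  computeIntensive?: boolean; // 30B-class entries
}

interface GpuInfo {
  availableMemoryMB: number; // hypothetical field name
  computeConstrained?: boolean; // set on integrated-GPU hosts (platform === "jetson")
}

const REGISTRY: OllamaModelEntry[] = [
  { tag: "qwen3.5:9b", requiredMemoryMB: 12_000 }, // placeholder budget
  { tag: "nemotron-3-nano:30b", requiredMemoryMB: 26_000, computeIntensive: true },
  { tag: "qwen3.6:35b", requiredMemoryMB: 30_000, computeIntensive: true },
];

function modelFitsAvailableMemory(entry: OllamaModelEntry, gpu: GpuInfo): boolean {
  // Memory may fit on an iGPU, but token throughput cannot clear agent-loop timeouts.
  if (gpu.computeConstrained && entry.computeIntensive) return false;
  return entry.requiredMemoryMB <= gpu.availableMemoryMB;
}

function fittableOllamaModelTags(gpu: GpuInfo, registry: OllamaModelEntry[] = REGISTRY): string[] {
  return registry.filter((entry) => modelFitsAvailableMemory(entry, gpu)).map((entry) => entry.tag);
}

function anyRegistryModelFits(gpu: GpuInfo, registry: OllamaModelEntry[] = REGISTRY): boolean {
  return registry.some((entry) => modelFitsAvailableMemory(entry, gpu));
}
```

With about 23,000 MB available (an L4-class card), the old 22000 budget for `nemotron-3-nano:30b` would have passed, while the new 26000 budget does not. On a compute-constrained host, the 30B-class entries drop out no matter how much unified memory is reported.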

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Added GPU compute constraint detection for improved model recommendations on specialized hardware like Jetson platforms.
    • Introduced compute-intensive model flagging to avoid recommending unsuitable models on resource-constrained devices.
  • Improvements

    • Enhanced Ollama connection reliability with extended retry logic for timeout scenarios.
    • Improved onboarding flow with smarter model exclusion to prevent repeated failures during setup.
    • Enforced context window floor for more consistent Ollama runtime behavior.
  • Tests

    • Expanded test coverage for compute-constrained GPU detection and model selection filtering.

…t floor

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented Jun 5, 2026

Warning

Review limit reached

@cv, we couldn't start this review because you've reached your PR review rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a0c3ea23-90c1-4def-9f6c-ed1f27eb4453

📥 Commits

Reviewing files that changed from the base of the PR and between 5972e47 and 61f476a.

📒 Files selected for processing (7)
  • src/lib/inference/local.test.ts
  • src/lib/inference/ollama-probe-timeout.test.ts
  • src/lib/onboard.ts
  • src/lib/onboard/ollama-probe-failure-tracker.test.ts
  • src/lib/onboard/ollama-probe-failure-tracker.ts
  • src/lib/onboard/ollama-systemd.test.ts
  • src/lib/onboard/ollama-systemd.ts
📝 Walkthrough

This PR fixes two linked dead-loop and context-window issues in Ollama onboarding by detecting compute-constrained platforms, extending probe-timeout retries, filtering unsuitable models, preventing repeated selection of failing models, and enforcing a minimum 16,384-token context window throughout the runtime and systemd configuration stack.

Changes

Ollama Model Selection and Context Window Hardening

Layer / File(s) / Summary
GPU Capability Detection for Compute Constraints
src/lib/inference/nim.ts, src/lib/inference/local.ts
Added computeConstrained?: boolean field to GpuDetection and GpuInfo interfaces; populated for Jetson platforms to mark integrated-GPU (iGPU) hosts where compute-intensive models should be filtered.
Probe Timeout Retry Expansion
src/lib/inference/local.ts, src/lib/inference/local.test.ts
Broadened Ollama local probe timeout retry from Spark-only to all hosts; any initial probe timeout now triggers a 300s retry; added test coverage for both fast-fail (no retry) and timeout-then-retry paths.
Model Registry with Compute-Intensive Marking
src/lib/inference/ollama-model-registry.ts, src/lib/inference/ollama-model-registry.test.ts
Extended OllamaModelEntry with optional computeIntensive flag; marked large models (Qwen 35B, Nemotron 30B) as compute-intensive; updated modelFitsAvailableMemory, fittableOllamaModelTags, and anyRegistryModelFits to exclude compute-intensive entries on compute-constrained hosts; added test coverage for both dGPU and compute-constrained iGPU scenarios.
Model Selection with Exclusion Support
src/lib/inference/ollama/proxy.ts, src/lib/inference/ollama/proxy.test.ts
Extended promptOllamaModel to accept optional promptOptions with excludeModels set; filters out excluded tags from both installed and bootstrap options; prevents re-offering failed models during retries; added test coverage for exclusion filtering and bootstrap fallback.
Onboarding Failure Tracking and Loop Prevention
src/lib/onboard.ts
Updated selectAndValidateOllamaModel to track per-model and total probe failures; accumulates failed models in an exclusion set; passes exclusion set to model prompts; returns to provider selection if per-model or global failure thresholds exceeded, preventing dead-loop on unreachable models.
Context Window Floor Enforcement
src/lib/inference/ollama-runtime-context.ts, src/lib/inference/ollama-runtime-context.test.ts
Introduced and exported MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW = 16384 constant; updated applyOllamaRuntimeContextWindow to enforce this floor by returning Math.max(detected, MIN) instead of using the daemon-reported value directly; branched logging to distinguish floor-raising from using the detected context; added test coverage for both raising and preserving scenarios.
Systemd Context Window Override Management
src/lib/onboard/ollama-systemd.ts, src/lib/onboard/ollama-systemd.test.ts
Exported mergeOllamaLoopbackSystemdOverride function and enhanced to manage both OLLAMA_HOST and OLLAMA_CONTEXT_LENGTH in systemd drop-in; preserves existing user-supplied context values above the floor; strips stale/legacy entries; respects libraryOverride option; added comprehensive test coverage for override merging, value preservation, and configuration propagation.
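The preserve-or-strip rule for the systemd drop-in can be illustrated with a small string-level merge. This is a sketch under assumptions: `mergeLoopbackOverride` is a hypothetical stand-in for `mergeOllamaLoopbackSystemdOverride`, whose real signature and file I/O are not shown here; the drop-in is assumed to contain only a `[Service]` section with quoted `Environment=` lines; and the `libraryOverride` option is omitted.

```typescript
// Hypothetical sketch of the merge rule; not the repository's implementation.
// Assumes the drop-in holds only a [Service] section with quoted Environment= lines.
const CONTEXT_FLOOR = 16_384;
const CONTEXT_RE = /^Environment="OLLAMA_CONTEXT_LENGTH=(\d+)"$/;
const HOST_RE = /^Environment="OLLAMA_HOST=/;

function mergeLoopbackOverride(existing: string): string {
  let contextLength = CONTEXT_FLOOR;
  const otherLines: string[] = [];
  for (const raw of existing.split("\n")) {
    const line = raw.trim();
    if (line === "" || line === "[Service]") continue; // re-emitted below
    const ctx = CONTEXT_RE.exec(line);
    if (ctx) {
      // Keep a user value at or above the floor; a stale below-floor value is dropped.
      contextLength = Math.max(contextLength, Number(ctx[1]));
      continue;
    }
    if (HOST_RE.test(line)) continue; // always rewritten to loopback
    otherLines.push(line); // unrelated settings pass through untouched
  }
  return [
    "[Service]",
    'Environment="OLLAMA_HOST=127.0.0.1"',
    `Environment="OLLAMA_CONTEXT_LENGTH=${contextLength}"`,
    ...otherLines,
    "",
  ].join("\n");
}
```

Taking `Math.max` against the floor is what lets a user-supplied 32768 survive while a stale 4096 is replaced by 16384.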

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4776: Overlaps in starter-model registry changes; the retrieved PR modifies SMALLEST_OLLAMA_MODEL_TAG and registry inputs, while this PR extends the same registry with computeIntensive filtering logic.
  • NVIDIA/NemoClaw#4132: Related because both PRs extend the selection logic in src/lib/inference/ollama-model-registry.ts; the retrieved PR introduced the modelFitsAvailableMemory capacity gating that this PR builds on with compute-constraint filtering.

Suggested labels

fix, area: inference, area: onboarding, v0.0.59

Suggested reviewers

  • zyang-dev
  • cv

Poem

🐰 A Jetson so small, a timeout so long,
We loop, we dead-spin, but now we are strong!
Sixteen K tokens, a floor we now hold,
No more too-large models offered in cold.
Context windows rise, and failures won't stay—
Onboarding hops forward! Hip-hip-hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'fix(inference): tighten Ollama bootstrap fit and raise runtime context floor' directly summarizes the main changes: tightening model selection/fitting logic and raising the context window floor.
  • Linked Issues check: ✅ Passed. All primary objectives from #4812 and #4813 are met: extended probe retries to non-Spark hosts [#4812], loop-escape via model exclusion [#4812], context-floor enforcement [#4813], and systemd override management [#4813].
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to resolve #4812 (dead-loop, probe retries, model filtering) and #4813 (context floor). No unrelated modifications detected.


@laitingsheng laitingsheng added the bug-fix PR fixes a bug or regression label Jun 5, 2026
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

E2E Advisor Recommendation

Required E2E: gpu-repo-local-ollama-openclaw, gpu-e2e, ollama-proxy-e2e
Optional E2E: gpu-double-onboard-e2e, onboard-inference-smoke-e2e, strict-tool-call-probe-e2e

Dispatch hint: gpu-repo-local-ollama-openclaw

Auto-dispatched E2E: gpu-e2e via nightly-e2e.yaml at 61f476aa111905978e82ba296b7adae20a5c2320 (nightly run)

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • gpu-repo-local-ollama-openclaw (high): Highest-signal scenario for this PR: exercises local Ollama onboarding on a GPU runner, sandbox creation, the Ollama auth proxy assertion suite, and local Ollama inference for OpenClaw.
  • gpu-e2e (high): Runs the real non-interactive Ollama provider user flow: install Ollama, run NemoClaw install/onboard with NEMOCLAW_PROVIDER=ollama, start/pull/validate the model, create a sandbox, and verify inference through the sandbox. This is required because the PR changes model selection, probe retry behavior, context sizing, GPU capacity gating, and Linux Ollama setup.
  • ollama-proxy-e2e (medium): Validates the real Ollama auth proxy chain, token auth, inference through the proxy, persistence, recovery, and container reachability. Required because src/lib/inference/ollama/proxy.ts is touched and local Ollama proxy behavior is part of the sandbox security boundary.

Optional E2E

  • gpu-double-onboard-e2e (high): Useful adjacent confidence for re-onboarding with Ollama and proxy-token consistency after the onboard flow changes, but the diff does not directly change token generation/persistence.
  • onboard-inference-smoke-e2e (low): Optional regression guard for onboard inference validation failure behavior. It is adjacent to the onboard inference path touched here, but it targets a hermetic broken compatible endpoint rather than local Ollama model probing.
  • strict-tool-call-probe-e2e (low): Optional local-Ollama-adjacent validation of the structured chat-completions tool-call probe path. The PR changes Ollama validation/probing, but not the strict tool-call probe contract directly.

New E2E recommendations

  • ollama-probe-timeout-and-model-retry (high): Existing real GPU E2E may pass on fast model loads and does not deterministically exercise the new 'first probe timed out, retry with 300s on non-Spark' path or the failure-tracker loop escape behavior.
    • Suggested test: Add a hermetic onboard Ollama probe-failure E2E that mocks Ollama /api/generate to time out once, then succeeds, and separately verifies repeated probe failures exclude the failed tag and return to provider selection after the limit.
  • jetson-compute-constrained-model-selection (medium): The new computeConstrained flag and computeIntensive registry filtering target Jetson/Tegra hosts, but no existing E2E appears to run on Jetson-class hardware or simulate that full non-interactive onboarding decision.
    • Suggested test: Add a hermetic platform E2E that stubs detectGpu() as Jetson with high unified memory and asserts non-interactive Ollama onboarding selects the small bootstrap model rather than a 30B/35B compute-intensive tag.
  • ollama-systemd-context-floor (medium): Unit tests cover mergeOllamaLoopbackSystemdOverride, but there is no Linux service-level E2E proving the installed systemd drop-in sets loopback binding and OLLAMA_CONTEXT_LENGTH, then restarts Ollama with a runtime context usable by NemoClaw.
    • Suggested test: Add a Linux Ollama systemd E2E that installs/starts Ollama through onboard, inspects the drop-in, verifies OLLAMA_CONTEXT_LENGTH is at least the NemoClaw floor, and confirms /api/ps reports an adopted context after loading a small model.
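The retry rule these recommendations want pinned down (one longer retry only when the first probe timed out) can be isolated behind an injectable probe, which also makes the suggested hermetic test deterministic. A minimal sketch: `Probe`, `probeWithColdLoadRetry` and `timeoutOnceProbe` are hypothetical names, and the 120 s and 300 s limits come from the PR. The advisor notes mention a `--max-time 300` retry, so the real probe in validateOllamaModel likely wraps a command-line HTTP request rather than taking a callback.

```typescript
// Hypothetical sketch of the cold-load retry rule; not validateOllamaModel itself.
interface ProbeResult {
  ok: boolean;
  timedOut: boolean; // true only when the request hit its time limit
}

// Hypothetical seam: issues one probe request with the given time limit.
type Probe = (maxTimeSeconds: number) => Promise<ProbeResult>;

const DEFAULT_PROBE_TIMEOUT_S = 120;
const COLD_LOAD_RETRY_TIMEOUT_S = 300;

async function probeWithColdLoadRetry(probe: Probe): Promise<ProbeResult> {
  const first = await probe(DEFAULT_PROBE_TIMEOUT_S);
  // Success, or a fast failure such as connection refused, is returned as-is.
  if (first.ok || !first.timedOut) return first;
  // Timed out: likely a cold load from disk, so retry once with a longer window.
  return probe(COLD_LOAD_RETRY_TIMEOUT_S);
}

// Fake probe for a hermetic test: times out on the first call, then succeeds.
function timeoutOnceProbe(calls: number[]): Probe {
  return async (maxTimeSeconds) => {
    calls.push(maxTimeSeconds);
    return calls.length === 1 ? { ok: false, timedOut: true } : { ok: true, timedOut: false };
  };
}
```

Recording the time limit of each call makes both acceptance points checkable: a timeout produces exactly the sequence 120 then 300, and a fast failure produces a single 120 s probe.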

Dispatch hint

  • Workflow: .github/workflows/e2e-scenarios.yaml
  • jobs input: gpu-repo-local-ollama-openclaw

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: gpu-repo-local-ollama-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • gpu-repo-local-ollama-openclaw: The PR changes local Ollama onboarding and validation behavior: Ollama model fit selection, probe retry/failure tracking, runtime context auto-configuration, auth proxy model prompting, Linux systemd Ollama setup, and NVIDIA GPU detection metadata. The dispatchable scenario that directly exercises the local Ollama provider path and Ollama auth proxy suites is gpu-repo-local-ollama-openclaw.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

  • None.

Relevant changed files

  • src/lib/inference/local.ts
  • src/lib/inference/nim.ts
  • src/lib/inference/ollama-model-registry.ts
  • src/lib/inference/ollama-runtime-context.ts
  • src/lib/inference/ollama/proxy.ts
  • src/lib/onboard.ts
  • src/lib/onboard/ollama-probe-failure-tracker.ts
  • src/lib/onboard/ollama-systemd.ts

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27018560230
Target ref: 5972e4704a8f4fcad1a64c33027acc46fc68cf5a
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Since last review: 3 prior items resolved, 1 still applies, 1 new item found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Ollama probe-failure loop escape: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `selectAndValidateOllamaModel()` records failures and passes `excludeModels` into `promptOllamaModel()`, but no caller-level test exercises repeated `prepareOllamaModel()` failures through that loop.
  • Source-of-truth review needed: Auto-raised Ollama runtime context floor: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `applyOllamaRuntimeContextWindow()` floors the metadata with `Math.max`; daemon-side configuration is implemented in the systemd override path only.
  • Raised contextWindow can diverge from the actual Ollama daemon context (src/lib/inference/ollama-runtime-context.ts:217): The PR floors any loaded Ollama runtime context below 16384 by setting NEMOCLAW_CONTEXT_WINDOW to 16384, but that only changes NemoClaw's downstream model metadata unless the host daemon is also configured and restarted with OLLAMA_CONTEXT_LENGTH. The Linux systemd path now writes that override, but non-systemd/manual Ollama, macOS, and Windows-host Ollama paths can still report context_length=4096 while the sandbox is told to budget for 16384. That only partially satisfies the #4813 requirement ([Brev][Inference] OpenClaw agent immediately hits context overflow after Ollama onboard — runtime context window 4096) to raise num_ctx, block the model, or avoid reporting ready: it raises the metadata, but may not prove the runtime was raised.
    • Recommendation: Before baking or exporting a floored NEMOCLAW_CONTEXT_WINDOW, either confirm /api/ps reports the raised context length after daemon-side configuration, or keep the detected value and block/warn for paths where NemoClaw cannot raise OLLAMA_CONTEXT_LENGTH. Add a regression for a non-systemd/manual Ollama reporting 4096.
    • Evidence: applyOllamaRuntimeContextWindow() computes adopted = Math.max(detected, MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW) and writes env.NEMOCLAW_CONTEXT_WINDOW. mergeOllamaLoopbackSystemdOverride() writes OLLAMA_CONTEXT_LENGTH=16384 only for the systemd override path.
  • Probe-failure loop escape still needs caller-boundary coverage (src/lib/onboard.ts:3901): The tracker and prompt filtering are tested independently, but the real onboarding loop in selectAndValidateOllamaModel() is still not covered. That leaves the #4812 no-dead-loop acceptance criterion ([Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out) relying on mocked seams rather than on a test proving that the actual caller records failures, excludes the failed model on the next prompt, and returns to provider selection at the intended threshold.
    • Recommendation: Add a caller-level test around selectAndValidateOllamaModel() or an extracted testable wrapper that simulates repeated prepareOllamaModel() failures for an installed model and verifies the failed tag is not re-offered/reselected and the function returns back-to-selection at the configured limit. Document the invalid state and removal condition for the failure tracker.
    • Evidence: selectAndValidateOllamaModel() calls probeFailures.recordFailure(selectedModel), handleOllamaProbeFailure(), and promptOllamaModel(gpu, { excludeModels: probeFailures.excludedModels() }); current tests cover OllamaProbeFailureTracker and promptOllamaModel, not this caller loop.
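The context-window finding above comes down to two small functions. The first mirrors the merged `Math.max` rule; the second is one hypothetical shape for the reviewer's alternative (report what the daemon actually serves and warn), not code from the PR. For scale: with a base prompt plus tool catalogue of roughly 7.4 k tokens, a 4096-token window overflows on the first turn, while 16,384 leaves about 9 k tokens for the conversation.

```typescript
// MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW comes from the PR; function names are hypothetical.
const MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW = 16_384;

// Merged behavior, per the PR: floor whatever the daemon reports.
function flooredContextWindow(detected: number): number {
  return Math.max(detected, MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW);
}

interface ContextDecision {
  contextWindow: number;
  warning?: string;
}

// Reviewer's alternative (hypothetical): report only what the daemon serves
// and warn when it is below the floor, instead of inflating the metadata.
function verifiedContextWindow(detected: number): ContextDecision {
  if (detected >= MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW) return { contextWindow: detected };
  return {
    contextWindow: detected,
    warning:
      `Ollama reports context_length=${detected}; set OLLAMA_CONTEXT_LENGTH>=` +
      `${MIN_AUTODETECTED_OLLAMA_CONTEXT_WINDOW} and restart the daemon.`,
  };
}
```

The difference only shows on hosts where NemoClaw cannot rewrite the daemon configuration: the floored variant tells the sandbox it has 16,384 tokens when the daemon still serves 4096, whereas the verified variant keeps the two in agreement and surfaces the gap.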

🌱 Nice ideas

  • None.
Consider writing more tests for:
  • **Runtime validation** — Brev/L4 fresh Ollama onboarding renders `qwen3.5:9b` as the default starter model and does not offer `nemotron-3-nano:30b`. The PR has useful unit coverage for registry fit, timeout retry, context parsing/flooring, systemd merge, and prompt filtering. It also changes host glue, daemon restart behavior, model runtime metadata, and onboarding loop recovery, so behavior-level validation is still important.
  • **Runtime validation** — Non-Spark Ollama cold-load timeout retries exactly once with `--max-time 300`, while connection-refused exits after one probe.
  • **Runtime validation** — Linux systemd Ollama override restart results in `/api/ps` reporting `context_length >= 16384` before `NEMOCLAW_CONTEXT_WINDOW=16384` is baked into sandbox modelsConfig.
  • **Runtime validation** — Non-systemd/manual Ollama with `/api/ps context_length=4096` does not silently bake `contextWindow=16384` unless the daemon-side context length was actually raised or explicitly accepted.
  • **Runtime validation** — Repeated failed Ollama model probes through `selectAndValidateOllamaModel()` return to provider selection and do not re-offer or reselect the failed installed model.
  • **Probe-failure loop escape still needs caller-boundary coverage** — Add a caller-level test around selectAndValidateOllamaModel() or an extracted testable wrapper that simulates repeated prepareOllamaModel() failures for an installed model and verifies the failed tag is not re-offered/reselected and the function returns back-to-selection at the configured limit. Document the invalid state and removal condition for the failure tracker.
  • **Acceptance clause:** [Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out #4812 body: "If a model cannot be loaded on this host, the wizard should not offer it as the default and/or should fall back to a host-fitting model, and must never loop indefinitely with no way forward." — add test evidence or identify existing coverage. Host-fit fallback is covered in the registry tests and prompt filtering excludes failed tags. The remaining gap is caller-boundary evidence that `selectAndValidateOllamaModel()` exits the actual loop after repeated failures.
  • **Acceptance clause:** [Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out #4812 comment: "Related open issues: - #4178 [DGX Spark][Onboard] Express setup with pre-installed old Ollama (0.6.2) loops on model probe — no version check, no upgrade, no actionable error" — add test evidence or identify existing coverage. The existing daemon-failure path remains and this PR adds generic probe-failure limits, but it does not directly implement old-Ollama version checking or upgrade behavior. This reads as related context rather than the direct #4812 acceptance target.
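A caller-level test of the kind requested above needs the selection loop to accept its prompt and probe steps as seams. The sketch below is a hypothetical, synchronous simplification of `selectAndValidateOllamaModel()`: the `SelectionDeps` interface and all names are invented for illustration, the 2-per-model and 3-total limits come from the PR description, and excluding a tag right after its first failure is an assumption about the real policy.

```typescript
// Hypothetical, synchronous simplification of the onboarding loop.
type SelectionOutcome = { kind: "selected"; model: string } | { kind: "back-to-selection" };

interface SelectionDeps {
  // Stand-in for promptOllamaModel: returns a tag from a menu that omits `exclude`.
  promptModel(exclude: ReadonlySet<string>): string;
  // Stand-in for prepareOllamaModel: true when the load and probe succeed.
  prepareModel(tag: string): boolean;
}

const MAX_PROBE_FAILS_SAME_MODEL = 2;
const MAX_PROBE_FAILS_TOTAL = 3;

function selectAndValidateModel(deps: SelectionDeps): SelectionOutcome {
  const failuresByTag = new Map<string, number>();
  const excluded = new Set<string>();
  let totalFailures = 0;
  for (;;) {
    const tag = deps.promptModel(excluded);
    if (deps.prepareModel(tag)) return { kind: "selected", model: tag };
    const tagFailures = (failuresByTag.get(tag) ?? 0) + 1;
    failuresByTag.set(tag, tagFailures);
    totalFailures += 1;
    excluded.add(tag); // assumption: a failed tag leaves the menu right away
    if (tagFailures >= MAX_PROBE_FAILS_SAME_MODEL || totalFailures >= MAX_PROBE_FAILS_TOTAL) {
      return { kind: "back-to-selection" }; // escape instead of re-prompting forever
    }
  }
}
```

A test then drives the loop with a stub menu and a probe that always fails, and asserts both that no tag is offered twice and that the function returns instead of looping.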

This is an automated advisory review. A human maintainer must make the final merge decision.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3878-3888: Extract the new probe-failure state and thresholds into
a small helper (e.g., ProbeFailureTracker) instead of keeping
probeFailureCounts, excludedAfterRepeatFail, MAX_PROBE_FAILS_SAME_MODEL,
MAX_PROBE_FAILS_TOTAL, and totalProbeFailures inline in onboard.ts: create a
module that encapsulates the Map/Set and counters and exposes methods like
recordFailure(tag):boolean (returns whether tag is now excluded),
shouldExclude(tag):boolean, getTotalFailures():number, and reset(); then replace
the inline variables/logic in the onboarding orchestration with a lightweight
instance call to those methods (update the spots that currently reference
probeFailureCounts/excludedAfterRepeatFail/totalProbeFailures or apply the
thresholds) so the function remains orchestration-only and file growth is moved
to the new helper module.
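One possible shape for the extracted helper is below. It is a sketch, not the repository's `OllamaProbeFailureTracker`: the method names borrow from the review comment and from the `excludedModels()` call quoted by the PR review advisor, while the return value of `recordFailure` (true once either limit is reached) and the exclude-on-first-failure policy are assumptions.

```typescript
// Hypothetical sketch of the helper proposed in the review comment.
class ProbeFailureTracker {
  private readonly failuresByTag = new Map<string, number>();
  private readonly excluded = new Set<string>();
  private totalFailures = 0;
  private readonly maxFailsSameModel: number;
  private readonly maxFailsTotal: number;

  // Defaults follow the PR description: 2 failures per model, 3 in total.
  constructor(maxFailsSameModel = 2, maxFailsTotal = 3) {
    this.maxFailsSameModel = maxFailsSameModel;
    this.maxFailsTotal = maxFailsTotal;
  }

  /** Records one failed probe; returns true once either limit is reached. */
  recordFailure(tag: string): boolean {
    const count = (this.failuresByTag.get(tag) ?? 0) + 1;
    this.failuresByTag.set(tag, count);
    this.totalFailures += 1;
    this.excluded.add(tag); // assumption: a failed tag is not re-offered
    return count >= this.maxFailsSameModel || this.totalFailures >= this.maxFailsTotal;
  }

  shouldExclude(tag: string): boolean {
    return this.excluded.has(tag);
  }

  /** Copy of the exclusion set, so callers cannot mutate tracker state. */
  excludedModels(): ReadonlySet<string> {
    return new Set(this.excluded);
  }

  getTotalFailures(): number {
    return this.totalFailures;
  }

  reset(): void {
    this.failuresByTag.clear();
    this.excluded.clear();
    this.totalFailures = 0;
  }
}
```

Passing the thresholds through the constructor lets a unit test exercise the limits directly, and leaves the onboarding function with orchestration only, which is what the review comment asks for.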

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b0faa8bd-9884-4af2-a920-a9f312bcb232

📥 Commits

Reviewing files that changed from the base of the PR and between 5dac380 and 5972e47.

📒 Files selected for processing (12)
  • src/lib/inference/local.test.ts
  • src/lib/inference/local.ts
  • src/lib/inference/nim.ts
  • src/lib/inference/ollama-model-registry.test.ts
  • src/lib/inference/ollama-model-registry.ts
  • src/lib/inference/ollama-runtime-context.test.ts
  • src/lib/inference/ollama-runtime-context.ts
  • src/lib/inference/ollama/proxy.test.ts
  • src/lib/inference/ollama/proxy.ts
  • src/lib/onboard.ts
  • src/lib/onboard/ollama-systemd.test.ts
  • src/lib/onboard/ollama-systemd.ts

Comment thread on src/lib/onboard.ts (outdated)
@laitingsheng added the provider: ollama (Ollama local model provider behavior) label on Jun 5, 2026
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27036160234
Target ref: cac1d6654d8a4a509cc0c6a70fe5eb618a2666b7
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e
Summary: 0 passed, 0 failed, 2 skipped

| Job | Result |
|---|---|
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |

cv added 2 commits June 5, 2026 12:42
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27039854857
Target ref: 61f476aa111905978e82ba296b7adae20a5c2320
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |

@cv merged commit ed75d14 into main on Jun 5, 2026
28 checks passed
@cv deleted the fix/4812-4813-ollama-bootstrap-fit-hardening branch on Jun 5, 2026 at 21:53
miyoungc added a commit that referenced this pull request Jun 6, 2026
## Summary
- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the
dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across
inference, policy tiers, command behavior, security boundaries, Hermes
dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the
current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary
- #4037 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
try-first sandbox network/filesystem guidance and clearer failure
classification.
- #4788 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents shared OpenClaw
device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`,
`docs/network-policy/integration-policy-examples.mdx`,
`docs/get-started/quickstart.mdx`,
`docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`:
Documents `weather`, `public-reference`, and Hermes managed-tool gateway
preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`,
`docs/reference/commands.mdx`: Documents non-interactive policy-tier
fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware
default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `nemoclaw tunnel status`
behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy
preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents
package-managed OpenShell gateway service and Docker-driver
gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent
gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU
patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`,
`docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission
guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling
for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`,
`docs/manage-sandboxes/runtime-controls.mdx`: Documents
`NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`,
`docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM
platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents
stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash
coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`,
`docs/reference/troubleshooting.mdx`: Documents Ollama model fit
filtering, 16K context floor, cold-load retry, and failed-model
exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents
API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents
Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents
interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass
normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents
prebuilt Hermes dashboard assets and TUI recovery without runtime
rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`,
`docs/reference/commands.mdx`: Documents Hermes proxy API-key
placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`.agents/skills/`: Keeps messaging enrollment behavior aligned with
manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`,
`docs/security/credential-storage.mdx`: Documents Hermes
placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents expanded memory scanner
examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill
install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep
structure and generated `.agents/skills/` refresh as the template for
this release.
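The #4852 entry above lists Ollama model fit filtering and a 16K context floor. The following sketch illustrates both under stated assumptions: `ModelEntry` and `GpuHost` are hypothetical simplifications of the registry fields named in the PR (`requiredMemoryMB`, `computeIntensive`, `computeConstrained`), the 26000 and 30000 MB budgets are the values the PR sets for `nemotron-3-nano:30b` and `qwen3.6:35b`, `example-small:8b` is a made-up entry, and "16K" is assumed to mean 16,384 tokens.

```typescript
// Hypothetical, simplified shapes; the real registry and GPU types may differ.
interface ModelEntry {
  tag: string;
  requiredMemoryMB: number;
  computeIntensive?: boolean; // e.g. 30B-class entries
}

interface GpuHost {
  availableMemoryMB: number;
  computeConstrained?: boolean; // e.g. integrated-GPU hosts such as Jetson
}

// Keep entries whose memory budget fits, and skip compute-intensive entries on
// compute-constrained hosts even when the memory would fit.
function fittableTags(registry: ModelEntry[], gpu: GpuHost): string[] {
  return registry
    .filter((m) => m.requiredMemoryMB <= gpu.availableMemoryMB)
    .filter((m) => !(gpu.computeConstrained === true && m.computeIntensive === true))
    .map((m) => m.tag);
}

// Assumption: "16K" means 16,384 tokens; the real floor constant may differ.
const MIN_RUNTIME_CONTEXT_TOKENS = 16 * 1024;

// Never adopt a runtime context window below the floor (e.g. a detected 4096-token window).
function adoptedContextWindow(detectedTokens: number): number {
  return Math.max(detectedTokens, MIN_RUNTIME_CONTEXT_TOKENS);
}

const registry: ModelEntry[] = [
  { tag: "nemotron-3-nano:30b", requiredMemoryMB: 26000, computeIntensive: true },
  { tag: "qwen3.6:35b", requiredMemoryMB: 30000, computeIntensive: true },
  { tag: "example-small:8b", requiredMemoryMB: 8000 }, // hypothetical entry
];

// An L4-class host with about 23 GB free, then a Jetson-like host with ample shared memory.
console.log(fittableTags(registry, { availableMemoryMB: 23 * 1024 })); // [ 'example-small:8b' ]
console.log(fittableTags(registry, { availableMemoryMB: 64 * 1024, computeConstrained: true })); // [ 'example-small:8b' ]
console.log(adoptedContextWindow(4096)); // 16384
```

The two filters are independent on purpose: memory fit alone lets 30B-class models through on integrated-GPU hosts, where the PR says throughput, not memory, is what breaks the agent loop.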

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/
--prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks,
env-var docs gate, docs-to-skills verification, and skills YAML tests

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash is now available as the default inference model for DGX
    Station.
  * Improved the Hermes dashboard with a dedicated port and OAuth-authenticated
    tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent
    capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and
    automatic retry for timeouts.

* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Scoped sandbox cleanup by port to prevent conflicts across
    deployments.
  * Added GPU patch failure recovery with automatic rollback.

* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox
    lifecycle.
  * Added .dockerignore best practices for custom deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
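The "automatic retry for timeouts" item corresponds to the cold-load probe retry described in #4852: a probe that times out within the default 120 s window is retried with a 300 s window on any host, while fast failures such as connection refused (`timedOut === false`) surface immediately. A minimal sketch, assuming a hypothetical `Probe` callback that reports `ok` and `timedOut`; it is not the real `validateOllamaModel` signature.

```typescript
// Hypothetical probe callback: resolves with whether the model answered and
// whether the failure was a timeout (as opposed to e.g. connection refused).
type ProbeOutcome = { ok: boolean; timedOut: boolean };
type Probe = (timeoutMs: number) => Promise<ProbeOutcome>;

const DEFAULT_PROBE_TIMEOUT_MS = 120 * 1000;
const COLD_LOAD_RETRY_TIMEOUT_MS = 300 * 1000;

// No host-type check here: per the PR, the longer retry is no longer limited to Spark hosts.
async function probeWithColdLoadRetry(probe: Probe): Promise<ProbeOutcome> {
  const first = await probe(DEFAULT_PROBE_TIMEOUT_MS);
  // Success and fast failures return immediately; only a timeout earns a longer retry.
  if (first.ok || !first.timedOut) return first;
  return probe(COLD_LOAD_RETRY_TIMEOUT_MS);
}

// Usage: a fake probe that times out on the first call (cold load) and succeeds on the retry.
const seen: number[] = [];
probeWithColdLoadRetry(async (timeoutMs) => {
  seen.push(timeoutMs);
  return seen.length === 1 ? { ok: false, timedOut: true } : { ok: true, timedOut: false };
}).then((result) => console.log(result.ok, seen)); // true [ 120000, 300000 ]
```

Distinguishing `timedOut` from other failures is what keeps the retry cheap: a refused connection costs one attempt, and only a slow cold load pays for the longer window.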

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

Labels

bug-fix (PR fixes a bug or regression)
provider: ollama (Ollama local model provider behavior)

Projects

None yet

2 participants