fix(inference): retry ollama validation probes and docker runtime detection by zyang-dev · Pull Request #4540 · NVIDIA/NemoClaw

zyang-dev · 2026-05-29T18:06:06Z

Summary

Improves local Ollama onboarding reliability by retrying validation after host-side curl process timeouts and making Docker Desktop runtime detection less sensitive to transient docker info delays.

Related Issue

Fixes #4501

Changes

Normalize spawnSync ETIMEDOUT failures so Ollama validation treats them as retryable probe timeouts.
Retry Ollama tool-call validation with doubled timeouts after curl or parent-process timeout failures.
Retry Docker runtime detection with a longer docker info timeout before falling back to proxy routing.
Add tests for timeout normalization, validation retry behavior, and Docker runtime detection retries.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev 267119621+zyang-dev@users.noreply.github.com

Summary by CodeRabbit

New Features
- Improved container runtime detection with configurable retries/timeouts to make local inference port selection more reliable.
- Unified error normalization for spawn/HTTP probes so timeout/connection failures report consistent status codes and trigger correct retry behavior.
- Retry logic now preserves query-param auth for retried OpenAI-like requests.
Tests
- Added coverage for runtime detection retries/fallbacks and for probe timeout/error normalization and retry scenarios.

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

coderabbitai · 2026-05-29T18:06:19Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 54854c33-d986-4ecd-a4ef-605734e8e4df

📥 Commits

Reviewing files that changed from the base of the PR and between 3e84fcc and 348907c.

📒 Files selected for processing (2)

src/lib/inference/onboard-probes.test.ts
src/lib/inference/onboard-probes.ts

📝 Walkthrough

Walkthrough

Adds a retryable Docker runtime detector and re-exports it; normalizes spawnSync ETIMEDOUT → curl_status -110 across HTTP probes; introduces probe timeout/connection-failure predicates; wires changes into onboarding/local inference/topology; and adds tests for retry and timeout behaviors.

Changes

Docker runtime detection and probe timeout handling

Layer / File(s)	Summary
Docker container runtime detection module `src/lib/adapters/docker/runtime.ts`, `src/lib/adapters/docker/runtime.test.ts`, `src/lib/adapters/docker/index.ts`	New `detectContainerRuntimeFromDockerInfo` with configurable attempts/timeout and injected probe support; exported via docker adapter barrel and covered by tests verifying retry/fallback.
HTTP probe spawn error code normalization `src/lib/adapters/http/probe.ts`, `src/lib/adapters/http/probe.test.ts`	Adds `normalizeSpawnErrorCode` mapping spawnSync errors to numeric `curlStatus` (ETIMEDOUT → -110) and updates probe implementations and trace metadata; tests assert mapping and messages.
Probe timeout and connection failure detection `src/lib/inference/onboard-probes.ts`	Introduces `CURL_TIMEOUT_STATUS` and `NODE_SPAWN_TIMEOUT_STATUS`, implements `isProbeTimeout` and `isTimeoutOrConnFailureStatus`, and refactors retry decision and logging to use these predicates.
Onboarding probe retry tests `src/lib/inference/onboard-probes.test.ts`	Adds tests simulating doubled-timeout chat-completions retries and a subprocess-based test that stubs `runCurlProbe` to fail once with ETIMEDOUT then succeed; verifies retry counts, auth handling, and increasing `timeoutMs`.
Local inference runtime detection integration `src/lib/inference/local.ts`	Replaces inline `dockerInfo`+`infer` flow with `detectContainerRuntimeFromDockerInfo()` for runtime selection; removes unused type import and adjusts requires.
Local topology runtime detection refactoring `src/lib/onboard/local-inference-topology.ts`	Rewires `getContainerRuntime()` to call `detectContainerRuntimeFromDockerInfo()` directly, eliminating the previous two-step probe/inference.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4449: Overlaps on HTTP probe timeout/error handling and how runCurlProbe derives timeoutMs.

Suggested labels

Docker, Provider: Ollama, Platform: Windows/WSL, Local Models, fix

Suggested reviewers

ericksoa
cv

Poem

🐰 A little rabbit peeks and peers,
Probes that timeout, now named and clear,
Retry hops twice and finds the runtime,
Traces hum, and tests pass in time.
Hop on — the ports and logs align!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly summarizes the main changes: retrying ollama validation probes and improving docker runtime detection, which aligns with the pull request objectives.
Linked Issues check	✅ Passed	All coding-related requirements from `#4501` are addressed: ETIMEDOUT errors are normalized as retryable timeouts [probe.ts], validation probes retry with doubled timeouts [onboard-probes.ts], and docker runtime detection is improved with retry logic [runtime.ts].
Out of Scope Changes check	✅ Passed	All changes are scoped to fixing Ollama validation timeout handling and Docker runtime detection retries as specified in `#4501`; no unrelated modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/windows-ollama-validation-timeout

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

github-actions · 2026-05-29T18:10:00Z

E2E Advisor Recommendation

Required E2E: messaging-compatible-endpoint-e2e, onboard-inference-smoke-e2e, gpu-e2e
Optional E2E: ollama-proxy-e2e, inference-routing-e2e, cloud-onboard-e2e, gpu-double-onboard-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

messaging-compatible-endpoint-e2e (high): Exercises OpenAI-compatible onboarding against a hermetic local mock, validates /responses and /chat/completions traffic through inference.local, and proves a real OpenClaw agent turn reaches the compatible endpoint with auth. This is the strongest existing E2E for the changed onboard probe and fallback paths.
onboard-inference-smoke-e2e (medium): Direct regression coverage for onboard inference validation failing closed when a configured route is runtime-broken. The changed curl timeout/status classification and retry behavior can affect this onboarding decision and diagnostics.
gpu-e2e (high): Runs the full local Ollama user flow: install/onboard with NEMOCLAW_PROVIDER=ollama, start auth proxy, create sandbox, and verify sandbox inference. This is the existing E2E that best exercises the changed Docker runtime detection and local inference topology in a real sandbox.

Optional E2E

ollama-proxy-e2e (medium): Adjacent confidence for local Ollama auth proxy behavior and container-to-host proxy reachability. It does not directly exercise the new runtime detector but validates the proxy path selected by local inference topology on native Docker.
inference-routing-e2e (medium): Useful broader coverage for inference.local routing, credential isolation, and transport/credential error classification after changes to probe error status handling.
cloud-onboard-e2e (high): Optional live cloud onboarding confidence because generic onboard probe changes can affect NVIDIA endpoint validation, but the PR is more focused on timeout classification, compatible endpoints, and local Ollama topology than cloud provider-specific behavior.
gpu-double-onboard-e2e (high): Adjacent local Ollama re-onboard coverage for proxy/token consistency. Useful if maintainers want extra assurance that the topology/runtime detection refactor does not regress repeated local-provider onboarding.

New E2E recommendations

wsl-docker-desktop-local-ollama (high): The changed runtime detector specifically affects Docker Desktop vs native Docker decisions that determine whether containers can reach host loopback directly. Existing WSL scenario coverage is cloud-focused and does not appear to validate Windows-host Ollama local inference topology.
- Suggested test: Add a WSL local-Ollama topology E2E that runs onboard with Windows-host Ollama plus Docker Desktop WSL integration and verifies the sandbox reaches the expected Ollama/auth-proxy port.
docker-info-runtime-retry (medium): The new detector retries indeterminate docker info output before choosing a runtime, but existing E2E coverage does not appear to simulate transient empty/unknown docker info in a real onboard flow.
- Suggested test: Add a lightweight E2E or regression E2E that shims docker info to return empty output before a recognizable runtime and asserts local inference topology still selects the correct container reachability path.
spawnSync-etimedout-onboard-retry (medium): Unit tests cover normalized -110 statuses, but there is limited end-to-end coverage that a real onboard-compatible endpoint validation retries and preserves auth after Node-level spawnSync ETIMEDOUT rather than only curl exit 28.
- Suggested test: Add a hermetic onboard probe E2E that shims curl/process timeout to produce spawnSync ETIMEDOUT on the first call, then verifies doubled-timeout retry, auth preservation, and successful provider selection.

github-actions · 2026-05-29T18:14:44Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw, ubuntu-repo-openai-compatible-openclaw, gpu-repo-local-ollama-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw, macos-repo-cloud-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-openai-compatible-openclaw
gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: Primary Ubuntu repo scenario for cloud onboarding and inference. The PR changes HTTP curl probe timeout handling and OpenAI-like onboarding probe retry behavior used by cloud provider validation and inference checks.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
ubuntu-repo-openai-compatible-openclaw: Targets the OpenAI-compatible onboarding path most directly affected by changes in src/lib/inference/onboard-probes.ts, including timeout retry handling and query-param auth preservation.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-openai-compatible-openclaw
gpu-repo-local-ollama-openclaw: Primary local Ollama scenario for changes to Docker runtime detection, local inference topology, Ollama container port selection, and Ollama proxy behavior.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

wsl-repo-cloud-openclaw: Adjacent special-runner coverage for WSL-specific onboarding/probe timing behavior and Docker Desktop runtime/host-loopback detection. Optional because the primary affected paths are covered by Ubuntu and GPU/local-Ollama scenarios.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw
macos-repo-cloud-openclaw: Adjacent special-runner platform coverage for Docker runtime classification on a Docker-optional host. Optional because macOS scenario skips Docker-dependent inference suites on hosted macOS.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=macos-repo-cloud-openclaw

Relevant changed files

src/lib/adapters/docker/index.ts
src/lib/adapters/docker/runtime.ts
src/lib/adapters/http/probe.ts
src/lib/inference/local.ts
src/lib/inference/onboard-probes.ts
src/lib/onboard/local-inference-topology.ts

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib/inference/onboard-probes.ts (1)

757-766: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve query-param auth mode in doubled-timeout retry requests.

The retry path always switches to Bearer header auth and drops ?key=.... If authMode is "query-param", retry calls can become unauthenticated and fail despite transient first-attempt timeouts.

Proposed fix

diff --git a/src/lib/inference/onboard-probes.ts b/src/lib/inference/onboard-probes.ts
@@
-    const buildRetryArgs = () => [
+    const retryAuthHeader =
+      !useQueryParam && normalizedKey ? ["-H", `Authorization: Bearer ${normalizedKey}`] : [];
+    const retryUrl =
+      useQueryParam && normalizedKey
+        ? `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions?key=${encodeURIComponent(normalizedKey)}`
+        : `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`;
+    const buildRetryArgs = () => [
       "-sS",
       ...doubledArgs,
       "-H",
       "Content-Type: application/json",
-      ...(apiKey ? ["-H", `Authorization: Bearer ${normalizeCredentialValue(apiKey)}`] : []),
+      ...retryAuthHeader,
       "-d",
       JSON.stringify(getChatCompletionsProbePayload(model)),
-      `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`,
+      retryUrl,
     ];

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/onboard-probes.ts` around lines 757 - 766, The
doubled-timeout retry builder buildRetryArgs currently always adds an
Authorization: Bearer header and drops query-param auth; update it to respect
authMode so that if authMode === "query-param" you do NOT add the Authorization
header and instead append the API key as a URL query parameter (properly
URL-encoded and preserving existing query string) to the chat/completions
endpoint; otherwise keep the existing behavior of adding ["-H", `Authorization:
Bearer ${normalizeCredentialValue(apiKey)}`]. Use existing helpers like
normalizeCredentialValue and getChatCompletionsProbePayload and modify how
endpointUrl is combined so the query-param form is preserved for retries.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/adapters/http/probe.ts`:
- Around line 106-112: The streaming probe path currently returns raw errno
values (e.g., -60 on macOS) and doesn't use normalizeSpawnErrorCode, so
timeout-aware logic misses retries; update the streaming probe error handling to
call normalizeSpawnErrorCode on the error (same helper used by runCurlProbeImpl)
before returning or propagating the code and ensure any places that inspect
errno/exit codes for timeouts use the normalized value (-110) instead of raw
errno; locate the streaming handler function in this file and replace/augment
its raw errno extraction with a call to normalizeSpawnErrorCode(error).

---

Outside diff comments:
In `@src/lib/inference/onboard-probes.ts`:
- Around line 757-766: The doubled-timeout retry builder buildRetryArgs
currently always adds an Authorization: Bearer header and drops query-param
auth; update it to respect authMode so that if authMode === "query-param" you do
NOT add the Authorization header and instead append the API key as a URL query
parameter (properly URL-encoded and preserving existing query string) to the
chat/completions endpoint; otherwise keep the existing behavior of adding ["-H",
`Authorization: Bearer ${normalizeCredentialValue(apiKey)}`]. Use existing
helpers like normalizeCredentialValue and getChatCompletionsProbePayload and
modify how endpointUrl is combined so the query-param form is preserved for
retries.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7d89122c-342c-4bdd-aa78-6d2840e5b7a5

📥 Commits

Reviewing files that changed from the base of the PR and between 6fb96bd and 58604b5.

📒 Files selected for processing (9)

src/lib/adapters/docker/index.ts
src/lib/adapters/docker/runtime.test.ts
src/lib/adapters/docker/runtime.ts
src/lib/adapters/http/probe.test.ts
src/lib/adapters/http/probe.ts
src/lib/inference/local.ts
src/lib/inference/onboard-probes.test.ts
src/lib/inference/onboard-probes.ts
src/lib/onboard/local-inference-topology.ts

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

github-actions · 2026-05-29T18:25:32Z

PR Review Advisor

Findings: 3 needs attention, 3 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 1 new item found

Review findings

🛠️ Needs attention

Linked issue's auth-proxy leak is still not addressed: PR fix(inference): retry ollama validation probes and docker runtime detection #4540 says it fixes [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501, but [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501 includes a separate failure mode: after onboarding exits, `ollama-auth-proxy.js` remains running. The current diff still only changes timeout normalization, probe retry/auth behavior, and Docker runtime detection; it does not touch proxy lifecycle or onboarding abort cleanup.
- Recommendation: Either implement and test cleanup for the leaked `ollama-auth-proxy.js` path, or narrow the PR/closing language so this PR does not claim to fix that issue clause.
- Evidence: Issue clause: "The auth proxy process also leaks and stays running after onboarding exits." The changed files do not include `src/lib/inference/ollama/proxy.ts`, local adapter lifecycle cleanup, uninstall cleanup, or onboarding abort cleanup.
Onboarding probe test monolith grew substantially (src/lib/inference/onboard-probes.test.ts): The PR adds another large block of fake-curl and subprocess monkeypatch coverage to an already large hotspot. The new tests are relevant, but continuing to grow this file makes the retry/probe suite harder to navigate and maintain.
- Recommendation: Move the new retry scenarios into a focused companion test file or extract reusable fake-curl/probe-mocking helpers so this hotspot does not keep growing.
- Evidence: Deterministic monolith delta: `src/lib/inference/onboard-probes.test.ts` base 794 lines, head 928 lines, delta +134; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."
HTTP probe test hotspot grew substantially (src/lib/adapters/http/probe.test.ts): The timeout-normalization tests add significant size to an existing large probe test file. The cases are useful, but the growth continues a large-file hotspot.
- Recommendation: Extract the timeout-normalization cases into a focused companion test file or offset the growth by extracting repeated spawnSync/file-output fixtures.
- Evidence: Deterministic monolith delta: `src/lib/adapters/http/probe.test.ts` base 441 lines, head 525 lines, delta +84; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."

🔎 Worth checking

Source-of-truth review needed: Docker runtime detection retry on unknown `docker info`: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `detectContainerRuntimeFromDockerInfo()` retries all `unknown` outputs up to 3 attempts with a 5s timeout, and callers in `local.ts` and `local-inference-topology.ts` now depend on it.
Docker runtime retry changes Ollama proxy routing without caller-level coverage (src/lib/adapters/docker/runtime.ts:8): The new detector retries every `unknown` `docker info` output up to 3 times with a 5s timeout, and both `getOllamaContainerPort()` and local inference topology now depend on it. That affects whether WSL Docker Desktop uses direct Ollama port 11434 or the auth proxy port, a security-sensitive routing decision. The helper test covers transient empty output, but no caller-level test proves the WSL Ollama route/proxy decision after a transient `docker info` result, and the source of the transient invalid state is not documented.
- Recommendation: Add caller-level coverage for WSL + transient `docker info` unknown output proving `getOllamaContainerPort()` and topology choose the intended route after retry. Also document or tune the 3 × 5s retry budget and explain why the invalid `unknown` state must be handled here rather than at the Docker probe source.
- Evidence: `DOCKER_INFO_RUNTIME_PROBE_ATTEMPTS = 3` and `DOCKER_INFO_RUNTIME_PROBE_TIMEOUT_MS = 5000` are introduced in `runtime.ts`; `src/lib/inference/local.ts` and `src/lib/onboard/local-inference-topology.ts` now call `detectContainerRuntimeFromDockerInfo()` instead of a single 1500ms `dockerInfo()` probe. The new test only exercises `detectContainerRuntimeFromDockerInfo({ dockerInfoImpl })` directly.
Streaming transport failures still look like fallback-eligible missing events (src/lib/adapters/http/probe.ts): `runStreamingEventProbe()` returns `missingEvents: REQUIRED_STREAMING_EVENTS` for transport/spawn failures, including the newly normalized `ETIMEDOUT` path. The caller in `probeOpenAiLikeEndpoint()` checks `!streamResult.ok && streamResult.missingEvents.length > 0` first and silently falls back to chat completions, even though the next branch says transport or execution failure should surface as a hard error.
- Recommendation: Separate transport failures from semantic missing-event failures in `StreamingProbeResult` and add a caller-level test showing spawn/timeout errors take the hard-error path rather than the silent fallback path.
- Evidence: In `runStreamingEventProbeImpl`, `result.error` now normalizes the curl status but still returns `missingEvents: REQUIRED_STREAMING_EVENTS`; in `probeOpenAiLikeEndpoint`, any non-ok result with non-empty `missingEvents` is treated as a fallback case before the hard-error branch.

🌱 Nice ideas

None.

Since last review details

Current findings:

Source-of-truth review needed: Docker runtime detection retry on unknown `docker info`: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `detectContainerRuntimeFromDockerInfo()` retries all `unknown` outputs up to 3 attempts with a 5s timeout, and callers in `local.ts` and `local-inference-topology.ts` now depend on it.
Linked issue's auth-proxy leak is still not addressed: PR fix(inference): retry ollama validation probes and docker runtime detection #4540 says it fixes [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501, but [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501 includes a separate failure mode: after onboarding exits, `ollama-auth-proxy.js` remains running. The current diff still only changes timeout normalization, probe retry/auth behavior, and Docker runtime detection; it does not touch proxy lifecycle or onboarding abort cleanup.
- Recommendation: Either implement and test cleanup for the leaked `ollama-auth-proxy.js` path, or narrow the PR/closing language so this PR does not claim to fix that issue clause.
- Evidence: Issue clause: "The auth proxy process also leaks and stays running after onboarding exits." The changed files do not include `src/lib/inference/ollama/proxy.ts`, local adapter lifecycle cleanup, uninstall cleanup, or onboarding abort cleanup.
Onboarding probe test monolith grew substantially (src/lib/inference/onboard-probes.test.ts): The PR adds another large block of fake-curl and subprocess monkeypatch coverage to an already large hotspot. The new tests are relevant, but continuing to grow this file makes the retry/probe suite harder to navigate and maintain.
- Recommendation: Move the new retry scenarios into a focused companion test file or extract reusable fake-curl/probe-mocking helpers so this hotspot does not keep growing.
- Evidence: Deterministic monolith delta: `src/lib/inference/onboard-probes.test.ts` base 794 lines, head 928 lines, delta +134; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."
HTTP probe test hotspot grew substantially (src/lib/adapters/http/probe.test.ts): The timeout-normalization tests add significant size to an existing large probe test file. The cases are useful, but the growth continues a large-file hotspot.
- Recommendation: Extract the timeout-normalization cases into a focused companion test file or offset the growth by extracting repeated spawnSync/file-output fixtures.
- Evidence: Deterministic monolith delta: `src/lib/adapters/http/probe.test.ts` base 441 lines, head 525 lines, delta +84; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."
Docker runtime retry changes Ollama proxy routing without caller-level coverage (src/lib/adapters/docker/runtime.ts:8): The new detector retries every `unknown` `docker info` output up to 3 times with a 5s timeout, and both `getOllamaContainerPort()` and local inference topology now depend on it. That affects whether WSL Docker Desktop uses direct Ollama port 11434 or the auth proxy port, a security-sensitive routing decision. The helper test covers transient empty output, but no caller-level test proves the WSL Ollama route/proxy decision after a transient `docker info` result, and the source of the transient invalid state is not documented.
- Recommendation: Add caller-level coverage for WSL + transient `docker info` unknown output proving `getOllamaContainerPort()` and topology choose the intended route after retry. Also document or tune the 3 × 5s retry budget and explain why the invalid `unknown` state must be handled here rather than at the Docker probe source.
- Evidence: `DOCKER_INFO_RUNTIME_PROBE_ATTEMPTS = 3` and `DOCKER_INFO_RUNTIME_PROBE_TIMEOUT_MS = 5000` are introduced in `runtime.ts`; `src/lib/inference/local.ts` and `src/lib/onboard/local-inference-topology.ts` now call `detectContainerRuntimeFromDockerInfo()` instead of a single 1500ms `dockerInfo()` probe. The new test only exercises `detectContainerRuntimeFromDockerInfo({ dockerInfoImpl })` directly.
Streaming transport failures still look like fallback-eligible missing events (src/lib/adapters/http/probe.ts): `runStreamingEventProbe()` returns `missingEvents: REQUIRED_STREAMING_EVENTS` for transport/spawn failures, including the newly normalized `ETIMEDOUT` path. The caller in `probeOpenAiLikeEndpoint()` checks `!streamResult.ok && streamResult.missingEvents.length > 0` first and silently falls back to chat completions, even though the next branch says transport or execution failure should surface as a hard error.
- Recommendation: Separate transport failures from semantic missing-event failures in `StreamingProbeResult` and add a caller-level test showing spawn/timeout errors take the hard-error path rather than the silent fallback path.
- Evidence: In `runStreamingEventProbeImpl`, `result.error` now normalizes the curl status but still returns `missingEvents: REQUIRED_STREAMING_EVENTS`; in `probeOpenAiLikeEndpoint`, any non-ok result with non-empty `missingEvents` is treated as a fallback case before the hard-error branch.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

ericksoa

Thanks for the quick follow-up here. I no longer see the streaming-probe timeout-normalization issue on the current head, but one retry-path blocker remains before I can approve.

In src/lib/inference/onboard-probes.ts, the doubled-timeout retry path for ordinary Chat Completions probes does not preserve authMode: "query-param". The first request uses the expected ...?key=... URL, but after a timeout buildRetryArgs() always adds Authorization: Bearer ... and retries /chat/completions without the query key. I reproduced this against the current head by stubbing runCurlProbe: the first call had hasBearer=false and url=https://api.example.com/v1/chat/completions?key=...; the retry had hasBearer=true and url=https://api.example.com/v1/chat/completions. That means timeout recovery can still turn into an auth failure for query-param providers.

Please keep the retry auth/url construction aligned with the original probe auth mode: no Bearer header for query-param, and append the normalized/encoded key to the retry URL instead.

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

ericksoa

Rechecked the current head after the follow-up fixes. The streaming timeout normalization is now applied to the streaming paths, and the doubled-timeout retry now preserves query-param auth by reusing the original auth/url construction.

Local validation passed for npm run build:cli and focused vitest coverage: src/lib/adapters/http/probe.test.ts, src/lib/adapters/docker/runtime.test.ts, and src/lib/inference/onboard-probes.test.ts. Live PR checks are green as well. Approving.

## Summary Promotes NemoClaw user skills in the most visible docs entry points and refreshes the docs for v0.0.55. This keeps the public docs and generated user skills aligned with the latest release tag. ## Related Issue None. ## Changes - Adds homepage, overview, prerequisites, and quickstart links that make NemoClaw user skills easier to discover. - #4540 -> `docs/inference/use-local-inference.mdx` and `docs/about/release-notes.mdx`: Documents local Ollama validation retries and Docker runtime detection retries. - #4519 -> `docs/about/release-notes.mdx`: Notes the plugin secret-scanner fallback behavior for embedded fallback mode. - #4526 -> `docs/get-started/quickstart.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`, and `docs/about/release-notes.mdx`: Documents that pressing Enter with no messaging channels selected skips setup. - Refreshes generated `nemoclaw-user-*` skills from the updated Fern docs. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Verification details: - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` passed. - `npm run docs` passed with 0 errors and the existing Fern light-mode contrast warning. - Commit hooks passed for the docs commits. - Full all-files/pre-push checks were attempted but hit unrelated CLI test timeouts, so the all-files checkbox is left unchecked. --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>  ## Summary by CodeRabbit * **Documentation** * Added Agent Skills guidance and links across overview, getting started, quickstart, and homepage to load NemoClaw user skills into AI coding assistants. * Clarified local inference onboarding reliability (retry/timeout behavior and Docker detection) for Ollama/vLLM. * Clarified messaging-channel onboarding to skip setup when no channels are selected and Enter is pressed. * Improved secret-scanner behavior descriptions and added v0.0.55 release notes.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4547?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

fix(inference): Ollama validation probes and retry Docker runtime

58604b5

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev added the v0.0.55 Release target label May 29, 2026

zyang-dev self-assigned this May 29, 2026

zyang-dev changed the title ~~fix(inference): Ollama validation probes and retry Docker runtime~~ fix(inference): retry ollama validation probes and docker runtime detection May 29, 2026

test(inference): mock runner in docker runtime retry test

68cabce

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Comment thread src/lib/adapters/http/probe.ts

fix(inference): normalize streaming probe timeout errors

3e84fcc

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

cv approved these changes May 29, 2026

View reviewed changes

ericksoa requested changes May 29, 2026

View reviewed changes

fix(inference): preserve query-param auth on validation retry

348907c

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

ericksoa approved these changes May 29, 2026

View reviewed changes

jyaunches added the R1 label May 29, 2026

ericksoa merged commit 95d483f into main May 29, 2026
30 checks passed

ericksoa deleted the fix/windows-ollama-validation-timeout branch May 29, 2026 19:03

miyoungc mentioned this pull request May 29, 2026

docs: promote skills and refresh release docs #4547

Merged

12 tasks

Conversation

zyang-dev commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zyang-dev commented May 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading