Skip to content

fix(inference): retry ollama validation probes and docker runtime detection#4540

Merged
ericksoa merged 4 commits into
mainfrom
fix/windows-ollama-validation-timeout
May 29, 2026
Merged

fix(inference): retry ollama validation probes and docker runtime detection#4540
ericksoa merged 4 commits into
mainfrom
fix/windows-ollama-validation-timeout

Conversation

@zyang-dev
Copy link
Copy Markdown
Contributor

@zyang-dev zyang-dev commented May 29, 2026

Summary

Improves local Ollama onboarding reliability by retrying validation after host-side curl process timeouts and making Docker Desktop runtime detection less sensitive to transient docker info delays.

Related Issue

Fixes #4501

Changes

  • Normalize spawnSync ETIMEDOUT failures so Ollama validation treats them as retryable probe timeouts.
  • Retry Ollama tool-call validation with doubled timeouts after curl or parent-process timeout failures.
  • Retry Docker runtime detection with a longer docker info timeout before falling back to proxy routing.
  • Add tests for timeout normalization, validation retry behavior, and Docker runtime detection retries.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev 267119621+zyang-dev@users.noreply.github.com

Summary by CodeRabbit

  • New Features

    • Improved container runtime detection with configurable retries/timeouts to make local inference port selection more reliable.
    • Unified error normalization for spawn/HTTP probes so timeout/connection failures report consistent status codes and trigger correct retry behavior.
    • Retry logic now preserves query-param auth for retried OpenAI-like requests.
  • Tests

    • Added coverage for runtime detection retries/fallbacks and for probe timeout/error normalization and retry scenarios.

Review Change Stack

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
@zyang-dev zyang-dev added the v0.0.55 Release target label May 29, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 54854c33-d986-4ecd-a4ef-605734e8e4df

📥 Commits

Reviewing files that changed from the base of the PR and between 3e84fcc and 348907c.

📒 Files selected for processing (2)
  • src/lib/inference/onboard-probes.test.ts
  • src/lib/inference/onboard-probes.ts

📝 Walkthrough

Walkthrough

Adds a retryable Docker runtime detector and re-exports it; normalizes spawnSync ETIMEDOUT → curl_status -110 across HTTP probes; introduces probe timeout/connection-failure predicates; wires changes into onboarding/local inference/topology; and adds tests for retry and timeout behaviors.

Changes

Docker runtime detection and probe timeout handling

Layer / File(s) Summary
Docker container runtime detection module
src/lib/adapters/docker/runtime.ts, src/lib/adapters/docker/runtime.test.ts, src/lib/adapters/docker/index.ts
New detectContainerRuntimeFromDockerInfo with configurable attempts/timeout and injected probe support; exported via docker adapter barrel and covered by tests verifying retry/fallback.
HTTP probe spawn error code normalization
src/lib/adapters/http/probe.ts, src/lib/adapters/http/probe.test.ts
Adds normalizeSpawnErrorCode mapping spawnSync errors to numeric curlStatus (ETIMEDOUT → -110) and updates probe implementations and trace metadata; tests assert mapping and messages.
Probe timeout and connection failure detection
src/lib/inference/onboard-probes.ts
Introduces CURL_TIMEOUT_STATUS and NODE_SPAWN_TIMEOUT_STATUS, implements isProbeTimeout and isTimeoutOrConnFailureStatus, and refactors retry decision and logging to use these predicates.
Onboarding probe retry tests
src/lib/inference/onboard-probes.test.ts
Adds tests simulating doubled-timeout chat-completions retries and a subprocess-based test that stubs runCurlProbe to fail once with ETIMEDOUT then succeed; verifies retry counts, auth handling, and increasing timeoutMs.
Local inference runtime detection integration
src/lib/inference/local.ts
Replaces inline dockerInfo+infer flow with detectContainerRuntimeFromDockerInfo() for runtime selection; removes unused type import and adjusts requires.
Local topology runtime detection refactoring
src/lib/onboard/local-inference-topology.ts
Rewires getContainerRuntime() to call detectContainerRuntimeFromDockerInfo() directly, eliminating the previous two-step probe/inference.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4449: Overlaps on HTTP probe timeout/error handling and how runCurlProbe derives timeoutMs.

Suggested labels

Docker, Provider: Ollama, Platform: Windows/WSL, Local Models, fix

Suggested reviewers

  • ericksoa
  • cv

Poem

🐰 A little rabbit peeks and peers,
Probes that timeout, now named and clear,
Retry hops twice and finds the runtime,
Traces hum, and tests pass in time.
Hop on — the ports and logs align!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the main changes: retrying ollama validation probes and improving docker runtime detection, which aligns with the pull request objectives.
Linked Issues check ✅ Passed All coding-related requirements from #4501 are addressed: ETIMEDOUT errors are normalized as retryable timeouts [probe.ts], validation probes retry with doubled timeouts [onboard-probes.ts], and docker runtime detection is improved with retry logic [runtime.ts].
Out of Scope Changes check ✅ Passed All changes are scoped to fixing Ollama validation timeout handling and Docker runtime detection retries as specified in #4501; no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/windows-ollama-validation-timeout

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@zyang-dev zyang-dev self-assigned this May 29, 2026
@zyang-dev zyang-dev changed the title fix(inference): Ollama validation probes and retry Docker runtime fix(inference): retry ollama validation probes and docker runtime detection May 29, 2026
Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

E2E Advisor Recommendation

Required E2E: messaging-compatible-endpoint-e2e, onboard-inference-smoke-e2e, gpu-e2e
Optional E2E: ollama-proxy-e2e, inference-routing-e2e, cloud-onboard-e2e, gpu-double-onboard-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • messaging-compatible-endpoint-e2e (high): Exercises OpenAI-compatible onboarding against a hermetic local mock, validates /responses and /chat/completions traffic through inference.local, and proves a real OpenClaw agent turn reaches the compatible endpoint with auth. This is the strongest existing E2E for the changed onboard probe and fallback paths.
  • onboard-inference-smoke-e2e (medium): Direct regression coverage for onboard inference validation failing closed when a configured route is runtime-broken. The changed curl timeout/status classification and retry behavior can affect this onboarding decision and diagnostics.
  • gpu-e2e (high): Runs the full local Ollama user flow: install/onboard with NEMOCLAW_PROVIDER=ollama, start auth proxy, create sandbox, and verify sandbox inference. This is the existing E2E that best exercises the changed Docker runtime detection and local inference topology in a real sandbox.

Optional E2E

  • ollama-proxy-e2e (medium): Adjacent confidence for local Ollama auth proxy behavior and container-to-host proxy reachability. It does not directly exercise the new runtime detector but validates the proxy path selected by local inference topology on native Docker.
  • inference-routing-e2e (medium): Useful broader coverage for inference.local routing, credential isolation, and transport/credential error classification after changes to probe error status handling.
  • cloud-onboard-e2e (high): Optional live cloud onboarding confidence because generic onboard probe changes can affect NVIDIA endpoint validation, but the PR is more focused on timeout classification, compatible endpoints, and local Ollama topology than cloud provider-specific behavior.
  • gpu-double-onboard-e2e (high): Adjacent local Ollama re-onboard coverage for proxy/token consistency. Useful if maintainers want extra assurance that the topology/runtime detection refactor does not regress repeated local-provider onboarding.

New E2E recommendations

  • wsl-docker-desktop-local-ollama (high): The changed runtime detector specifically affects Docker Desktop vs native Docker decisions that determine whether containers can reach host loopback directly. Existing WSL scenario coverage is cloud-focused and does not appear to validate Windows-host Ollama local inference topology.
    • Suggested test: Add a WSL local-Ollama topology E2E that runs onboard with Windows-host Ollama plus Docker Desktop WSL integration and verifies the sandbox reaches the expected Ollama/auth-proxy port.
  • docker-info-runtime-retry (medium): The new detector retries indeterminate docker info output before choosing a runtime, but existing E2E coverage does not appear to simulate transient empty/unknown docker info in a real onboard flow.
    • Suggested test: Add a lightweight E2E or regression E2E that shims docker info to return empty output before a recognizable runtime and asserts local inference topology still selects the correct container reachability path.
  • spawnSync-etimedout-onboard-retry (medium): Unit tests cover normalized -110 statuses, but there is limited end-to-end coverage that a real onboard-compatible endpoint validation retries and preserves auth after Node-level spawnSync ETIMEDOUT rather than only curl exit 28.
    • Suggested test: Add a hermetic onboard probe E2E that shims curl/process timeout to produce spawnSync ETIMEDOUT on the first call, then verifies doubled-timeout retry, auth preservation, and successful provider selection.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw, ubuntu-repo-openai-compatible-openclaw, gpu-repo-local-ollama-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw, macos-repo-cloud-openclaw

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-openai-compatible-openclaw
  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: Primary Ubuntu repo scenario for cloud onboarding and inference. The PR changes HTTP curl probe timeout handling and OpenAI-like onboarding probe retry behavior used by cloud provider validation and inference checks.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
  • ubuntu-repo-openai-compatible-openclaw: Targets the OpenAI-compatible onboarding path most directly affected by changes in src/lib/inference/onboard-probes.ts, including timeout retry handling and query-param auth preservation.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-openai-compatible-openclaw
  • gpu-repo-local-ollama-openclaw: Primary local Ollama scenario for changes to Docker runtime detection, local inference topology, Ollama container port selection, and Ollama proxy behavior.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

  • wsl-repo-cloud-openclaw: Adjacent special-runner coverage for WSL-specific onboarding/probe timing behavior and Docker Desktop runtime/host-loopback detection. Optional because the primary affected paths are covered by Ubuntu and GPU/local-Ollama scenarios.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw
  • macos-repo-cloud-openclaw: Adjacent special-runner platform coverage for Docker runtime classification on a Docker-optional host. Optional because macOS scenario skips Docker-dependent inference suites on hosted macOS.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=macos-repo-cloud-openclaw

Relevant changed files

  • src/lib/adapters/docker/index.ts
  • src/lib/adapters/docker/runtime.ts
  • src/lib/adapters/http/probe.ts
  • src/lib/inference/local.ts
  • src/lib/inference/onboard-probes.ts
  • src/lib/onboard/local-inference-topology.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/inference/onboard-probes.ts (1)

757-766: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve query-param auth mode in doubled-timeout retry requests.

The retry path always switches to Bearer header auth and drops ?key=.... If authMode is "query-param", retry calls can become unauthenticated and fail despite transient first-attempt timeouts.

Proposed fix
diff --git a/src/lib/inference/onboard-probes.ts b/src/lib/inference/onboard-probes.ts
@@
-    const buildRetryArgs = () => [
+    const retryAuthHeader =
+      !useQueryParam && normalizedKey ? ["-H", `Authorization: Bearer ${normalizedKey}`] : [];
+    const retryUrl =
+      useQueryParam && normalizedKey
+        ? `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions?key=${encodeURIComponent(normalizedKey)}`
+        : `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`;
+    const buildRetryArgs = () => [
       "-sS",
       ...doubledArgs,
       "-H",
       "Content-Type: application/json",
-      ...(apiKey ? ["-H", `Authorization: Bearer ${normalizeCredentialValue(apiKey)}`] : []),
+      ...retryAuthHeader,
       "-d",
       JSON.stringify(getChatCompletionsProbePayload(model)),
-      `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`,
+      retryUrl,
     ];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/onboard-probes.ts` around lines 757 - 766, The
doubled-timeout retry builder buildRetryArgs currently always adds an
Authorization: Bearer header and drops query-param auth; update it to respect
authMode so that if authMode === "query-param" you do NOT add the Authorization
header and instead append the API key as a URL query parameter (properly
URL-encoded and preserving existing query string) to the chat/completions
endpoint; otherwise keep the existing behavior of adding ["-H", `Authorization:
Bearer ${normalizeCredentialValue(apiKey)}`]. Use existing helpers like
normalizeCredentialValue and getChatCompletionsProbePayload and modify how
endpointUrl is combined so the query-param form is preserved for retries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/adapters/http/probe.ts`:
- Around line 106-112: The streaming probe path currently returns raw errno
values (e.g., -60 on macOS) and doesn't use normalizeSpawnErrorCode, so
timeout-aware logic misses retries; update the streaming probe error handling to
call normalizeSpawnErrorCode on the error (same helper used by runCurlProbeImpl)
before returning or propagating the code and ensure any places that inspect
errno/exit codes for timeouts use the normalized value (-110) instead of raw
errno; locate the streaming handler function in this file and replace/augment
its raw errno extraction with a call to normalizeSpawnErrorCode(error).

---

Outside diff comments:
In `@src/lib/inference/onboard-probes.ts`:
- Around line 757-766: The doubled-timeout retry builder buildRetryArgs
currently always adds an Authorization: Bearer header and drops query-param
auth; update it to respect authMode so that if authMode === "query-param" you do
NOT add the Authorization header and instead append the API key as a URL query
parameter (properly URL-encoded and preserving existing query string) to the
chat/completions endpoint; otherwise keep the existing behavior of adding ["-H",
`Authorization: Bearer ${normalizeCredentialValue(apiKey)}`]. Use existing
helpers like normalizeCredentialValue and getChatCompletionsProbePayload and
modify how endpointUrl is combined so the query-param form is preserved for
retries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7d89122c-342c-4bdd-aa78-6d2840e5b7a5

📥 Commits

Reviewing files that changed from the base of the PR and between 6fb96bd and 58604b5.

📒 Files selected for processing (9)
  • src/lib/adapters/docker/index.ts
  • src/lib/adapters/docker/runtime.test.ts
  • src/lib/adapters/docker/runtime.ts
  • src/lib/adapters/http/probe.test.ts
  • src/lib/adapters/http/probe.ts
  • src/lib/inference/local.ts
  • src/lib/inference/onboard-probes.test.ts
  • src/lib/inference/onboard-probes.ts
  • src/lib/onboard/local-inference-topology.ts

Comment thread src/lib/adapters/http/probe.ts
Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

PR Review Advisor

Findings: 3 needs attention, 3 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • Linked issue's auth-proxy leak is still not addressed: PR fix(inference): retry ollama validation probes and docker runtime detection #4540 says it fixes [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501, but [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501 includes a separate failure mode: after onboarding exits, `ollama-auth-proxy.js` remains running. The current diff still only changes timeout normalization, probe retry/auth behavior, and Docker runtime detection; it does not touch proxy lifecycle or onboarding abort cleanup.
    • Recommendation: Either implement and test cleanup for the leaked `ollama-auth-proxy.js` path, or narrow the PR/closing language so this PR does not claim to fix that issue clause.
    • Evidence: Issue clause: "The auth proxy process also leaks and stays running after onboarding exits." The changed files do not include `src/lib/inference/ollama/proxy.ts`, local adapter lifecycle cleanup, uninstall cleanup, or onboarding abort cleanup.
  • Onboarding probe test monolith grew substantially (src/lib/inference/onboard-probes.test.ts): The PR adds another large block of fake-curl and subprocess monkeypatch coverage to an already large hotspot. The new tests are relevant, but continuing to grow this file makes the retry/probe suite harder to navigate and maintain.
    • Recommendation: Move the new retry scenarios into a focused companion test file or extract reusable fake-curl/probe-mocking helpers so this hotspot does not keep growing.
    • Evidence: Deterministic monolith delta: `src/lib/inference/onboard-probes.test.ts` base 794 lines, head 928 lines, delta +134; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."
  • HTTP probe test hotspot grew substantially (src/lib/adapters/http/probe.test.ts): The timeout-normalization tests add significant size to an existing large probe test file. The cases are useful, but the growth continues a large-file hotspot.
    • Recommendation: Extract the timeout-normalization cases into a focused companion test file or offset the growth by extracting repeated spawnSync/file-output fixtures.
    • Evidence: Deterministic monolith delta: `src/lib/adapters/http/probe.test.ts` base 441 lines, head 525 lines, delta +84; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."

🔎 Worth checking

  • Source-of-truth review needed: Docker runtime detection retry on unknown `docker info`: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `detectContainerRuntimeFromDockerInfo()` retries all `unknown` outputs up to 3 attempts with a 5s timeout, and callers in `local.ts` and `local-inference-topology.ts` now depend on it.
  • Docker runtime retry changes Ollama proxy routing without caller-level coverage (src/lib/adapters/docker/runtime.ts:8): The new detector retries every `unknown` `docker info` output up to 3 times with a 5s timeout, and both `getOllamaContainerPort()` and local inference topology now depend on it. That affects whether WSL Docker Desktop uses direct Ollama port 11434 or the auth proxy port, a security-sensitive routing decision. The helper test covers transient empty output, but no caller-level test proves the WSL Ollama route/proxy decision after a transient `docker info` result, and the source of the transient invalid state is not documented.
    • Recommendation: Add caller-level coverage for WSL + transient `docker info` unknown output proving `getOllamaContainerPort()` and topology choose the intended route after retry. Also document or tune the 3 × 5s retry budget and explain why the invalid `unknown` state must be handled here rather than at the Docker probe source.
    • Evidence: `DOCKER_INFO_RUNTIME_PROBE_ATTEMPTS = 3` and `DOCKER_INFO_RUNTIME_PROBE_TIMEOUT_MS = 5000` are introduced in `runtime.ts`; `src/lib/inference/local.ts` and `src/lib/onboard/local-inference-topology.ts` now call `detectContainerRuntimeFromDockerInfo()` instead of a single 1500ms `dockerInfo()` probe. The new test only exercises `detectContainerRuntimeFromDockerInfo({ dockerInfoImpl })` directly.
  • Streaming transport failures still look like fallback-eligible missing events (src/lib/adapters/http/probe.ts): `runStreamingEventProbe()` returns `missingEvents: REQUIRED_STREAMING_EVENTS` for transport/spawn failures, including the newly normalized `ETIMEDOUT` path. The caller in `probeOpenAiLikeEndpoint()` checks `!streamResult.ok && streamResult.missingEvents.length > 0` first and silently falls back to chat completions, even though the next branch says transport or execution failure should surface as a hard error.
    • Recommendation: Separate transport failures from semantic missing-event failures in `StreamingProbeResult` and add a caller-level test showing spawn/timeout errors take the hard-error path rather than the silent fallback path.
    • Evidence: In `runStreamingEventProbeImpl`, `result.error` now normalizes the curl status but still returns `missingEvents: REQUIRED_STREAMING_EVENTS`; in `probeOpenAiLikeEndpoint`, any non-ok result with non-empty `missingEvents` is treated as a fallback case before the hard-error branch.

🌱 Nice ideas

  • None.
Since last review details

Current findings:

  • Source-of-truth review needed: Docker runtime detection retry on unknown `docker info`: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `detectContainerRuntimeFromDockerInfo()` retries all `unknown` outputs up to 3 attempts with a 5s timeout, and callers in `local.ts` and `local-inference-topology.ts` now depend on it.
  • Linked issue's auth-proxy leak is still not addressed: PR fix(inference): retry ollama validation probes and docker runtime detection #4540 says it fixes [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501, but [Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable #4501 includes a separate failure mode: after onboarding exits, `ollama-auth-proxy.js` remains running. The current diff still only changes timeout normalization, probe retry/auth behavior, and Docker runtime detection; it does not touch proxy lifecycle or onboarding abort cleanup.
    • Recommendation: Either implement and test cleanup for the leaked `ollama-auth-proxy.js` path, or narrow the PR/closing language so this PR does not claim to fix that issue clause.
    • Evidence: Issue clause: "The auth proxy process also leaks and stays running after onboarding exits." The changed files do not include `src/lib/inference/ollama/proxy.ts`, local adapter lifecycle cleanup, uninstall cleanup, or onboarding abort cleanup.
  • Onboarding probe test monolith grew substantially (src/lib/inference/onboard-probes.test.ts): The PR adds another large block of fake-curl and subprocess monkeypatch coverage to an already large hotspot. The new tests are relevant, but continuing to grow this file makes the retry/probe suite harder to navigate and maintain.
    • Recommendation: Move the new retry scenarios into a focused companion test file or extract reusable fake-curl/probe-mocking helpers so this hotspot does not keep growing.
    • Evidence: Deterministic monolith delta: `src/lib/inference/onboard-probes.test.ts` base 794 lines, head 928 lines, delta +134; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."
  • HTTP probe test hotspot grew substantially (src/lib/adapters/http/probe.test.ts): The timeout-normalization tests add significant size to an existing large probe test file. The cases are useful, but the growth continues a large-file hotspot.
    • Recommendation: Extract the timeout-normalization cases into a focused companion test file or offset the growth by extracting repeated spawnSync/file-output fixtures.
    • Evidence: Deterministic monolith delta: `src/lib/adapters/http/probe.test.ts` base 441 lines, head 525 lines, delta +84; rationale: "Current monolith grew by 20 or more lines; extract or offset the growth before merge."
  • Docker runtime retry changes Ollama proxy routing without caller-level coverage (src/lib/adapters/docker/runtime.ts:8): The new detector retries every `unknown` `docker info` output up to 3 times with a 5s timeout, and both `getOllamaContainerPort()` and local inference topology now depend on it. That affects whether WSL Docker Desktop uses direct Ollama port 11434 or the auth proxy port, a security-sensitive routing decision. The helper test covers transient empty output, but no caller-level test proves the WSL Ollama route/proxy decision after a transient `docker info` result, and the source of the transient invalid state is not documented.
    • Recommendation: Add caller-level coverage for WSL + transient `docker info` unknown output proving `getOllamaContainerPort()` and topology choose the intended route after retry. Also document or tune the 3 × 5s retry budget and explain why the invalid `unknown` state must be handled here rather than at the Docker probe source.
    • Evidence: `DOCKER_INFO_RUNTIME_PROBE_ATTEMPTS = 3` and `DOCKER_INFO_RUNTIME_PROBE_TIMEOUT_MS = 5000` are introduced in `runtime.ts`; `src/lib/inference/local.ts` and `src/lib/onboard/local-inference-topology.ts` now call `detectContainerRuntimeFromDockerInfo()` instead of a single 1500ms `dockerInfo()` probe. The new test only exercises `detectContainerRuntimeFromDockerInfo({ dockerInfoImpl })` directly.
  • Streaming transport failures still look like fallback-eligible missing events (src/lib/adapters/http/probe.ts): `runStreamingEventProbe()` returns `missingEvents: REQUIRED_STREAMING_EVENTS` for transport/spawn failures, including the newly normalized `ETIMEDOUT` path. The caller in `probeOpenAiLikeEndpoint()` checks `!streamResult.ok && streamResult.missingEvents.length > 0` first and silently falls back to chat completions, even though the next branch says transport or execution failure should surface as a hard error.
    • Recommendation: Separate transport failures from semantic missing-event failures in `StreamingProbeResult` and add a caller-level test showing spawn/timeout errors take the hard-error path rather than the silent fallback path.
    • Evidence: In `runStreamingEventProbeImpl`, `result.error` now normalizes the curl status but still returns `missingEvents: REQUIRED_STREAMING_EVENTS`; in `probeOpenAiLikeEndpoint`, any non-ok result with non-empty `missingEvents` is treated as a fallback case before the hard-error branch.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Copy link
Copy Markdown
Contributor

@ericksoa ericksoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick follow-up here. I no longer see the streaming-probe timeout-normalization issue on the current head, but one retry-path blocker remains before I can approve.

In src/lib/inference/onboard-probes.ts, the doubled-timeout retry path for ordinary Chat Completions probes does not preserve authMode: "query-param". The first request uses the expected ...?key=... URL, but after a timeout buildRetryArgs() always adds Authorization: Bearer ... and retries /chat/completions without the query key. I reproduced this against the current head by stubbing runCurlProbe: the first call had hasBearer=false and url=https://api.example.com/v1/chat/completions?key=...; the retry had hasBearer=true and url=https://api.example.com/v1/chat/completions. That means timeout recovery can still turn into an auth failure for query-param providers.

Please keep the retry auth/url construction aligned with the original probe auth mode: no Bearer header for query-param, and append the normalized/encoded key to the retry URL instead.

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rechecked the current head after the follow-up fixes. The streaming timeout normalization is now applied to the streaming paths, and the doubled-timeout retry now preserves query-param auth by reusing the original auth/url construction.

Local validation passed for npm run build:cli and focused vitest coverage: src/lib/adapters/http/probe.test.ts, src/lib/adapters/docker/runtime.test.ts, and src/lib/inference/onboard-probes.test.ts. Live PR checks are green as well. Approving.

@jyaunches jyaunches added the R1 label May 29, 2026
@ericksoa ericksoa merged commit 95d483f into main May 29, 2026
30 checks passed
@ericksoa ericksoa deleted the fix/windows-ollama-validation-timeout branch May 29, 2026 19:03
miyoungc added a commit that referenced this pull request May 29, 2026
## Summary
Promotes NemoClaw user skills in the most visible docs entry points and
refreshes the docs for v0.0.55.
This keeps the public docs and generated user skills aligned with the
latest release tag.

## Related Issue
None.

## Changes
- Adds homepage, overview, prerequisites, and quickstart links that make
NemoClaw user skills easier to discover.
- #4540 -> `docs/inference/use-local-inference.mdx` and
`docs/about/release-notes.mdx`: Documents local Ollama validation
retries and Docker runtime detection retries.
- #4519 -> `docs/about/release-notes.mdx`: Notes the plugin
secret-scanner fallback behavior for embedded fallback mode.
- #4526 -> `docs/get-started/quickstart.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`, and
`docs/about/release-notes.mdx`: Documents that pressing Enter with no
messaging channels selected skips setup.
- Refreshes generated `nemoclaw-user-*` skills from the updated Fern
docs.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification details:
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx` passed.
- `npm run docs` passed with 0 errors and the existing Fern light-mode
contrast warning.
- Commit hooks passed for the docs commits.
- Full all-files/pre-push checks were attempted but hit unrelated CLI
test timeouts, so the all-files checkbox is left unchecked.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Added Agent Skills guidance and links across overview, getting
started, quickstart, and homepage to load NemoClaw user skills into AI
coding assistants.
* Clarified local inference onboarding reliability (retry/timeout
behavior and Docker detection) for Ollama/vLLM.
* Clarified messaging-channel onboarding to skip setup when no channels
are selected and Enter is pressed.
* Improved secret-scanner behavior descriptions and added v0.0.55
release notes.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4547?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.55 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Onboard] Local Ollama validation fails with spawnSync curl ETIMEDOUT on host.docker.internal:11434 reachable

4 participants