fix(inference): retry ollama validation probes and docker runtime detection #4540
Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
No actionable comments were generated in the recent review.
📝 Walkthrough
Adds a retryable Docker runtime detector and re-exports it; normalizes spawnSync ETIMEDOUT → curl_status -110 across HTTP probes; introduces probe timeout/connection-failure predicates; wires the changes into onboarding, local inference, and topology; and adds tests for retry and timeout behaviors.
Changes: Docker runtime detection and probe timeout handling
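The timeout normalization described in the walkthrough can be illustrated with a minimal sketch. The helper names ending in `Sketch` and the error shape are assumptions made for illustration, not the code in `src/lib/adapters/http/probe.ts`; only the `-110` sentinel and the macOS `-60` errno come from this thread.

```typescript
// Hypothetical sketch, not the PR's actual helper: map a spawnSync timeout to
// one platform-independent status so callers can detect "retryable timeout".
interface SpawnLikeError {
  code?: string;  // e.g. "ETIMEDOUT" when the spawnSync `timeout` option fires
  errno?: number; // raw, platform-specific (-60 on macOS, -110 on Linux)
}

// Sentinel named in the walkthrough: ETIMEDOUT is reported as curl_status -110.
const PROBE_TIMEOUT_STATUS = -110;

function normalizeSpawnErrorCodeSketch(error: SpawnLikeError | undefined): number {
  if (!error) return 0;
  // Compare the symbolic code, not the errno, because the errno differs by OS.
  if (error.code === "ETIMEDOUT") return PROBE_TIMEOUT_STATUS;
  return typeof error.errno === "number" ? error.errno : 1;
}

function isProbeTimeoutSketch(curlStatus: number): boolean {
  // curl's own exit code 28 also means "operation timed out".
  return curlStatus === PROBE_TIMEOUT_STATUS || curlStatus === 28;
}
```

Keying on the symbolic `code` rather than the numeric errno is what lets a single predicate cover both macOS and Linux hosts.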
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/inference/onboard-probes.ts (1)
757-766: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Preserve `query-param` auth mode in doubled-timeout retry requests.
The retry path always switches to Bearer header auth and drops `?key=...`. If `authMode` is `"query-param"`, retry calls can become unauthenticated and fail despite transient first-attempt timeouts.
Proposed fix
diff --git a/src/lib/inference/onboard-probes.ts b/src/lib/inference/onboard-probes.ts
@@
-    const buildRetryArgs = () => [
+    const retryAuthHeader =
+      !useQueryParam && normalizedKey ? ["-H", `Authorization: Bearer ${normalizedKey}`] : [];
+    const retryUrl =
+      useQueryParam && normalizedKey
+        ? `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions?key=${encodeURIComponent(normalizedKey)}`
+        : `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`;
+    const buildRetryArgs = () => [
       "-sS",
       ...doubledArgs,
       "-H",
       "Content-Type: application/json",
-      ...(apiKey ? ["-H", `Authorization: Bearer ${normalizeCredentialValue(apiKey)}`] : []),
+      ...retryAuthHeader,
       "-d",
       JSON.stringify(getChatCompletionsProbePayload(model)),
-      `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`,
+      retryUrl,
     ];
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/adapters/http/probe.ts`:
- Around line 106-112: The streaming probe path currently returns raw errno
values (e.g., -60 on macOS) and doesn't use normalizeSpawnErrorCode, so
timeout-aware logic misses retries; update the streaming probe error handling to
call normalizeSpawnErrorCode on the error (same helper used by runCurlProbeImpl)
before returning or propagating the code and ensure any places that inspect
errno/exit codes for timeouts use the normalized value (-110) instead of raw
errno; locate the streaming handler function in this file and replace/augment
its raw errno extraction with a call to normalizeSpawnErrorCode(error).
---
Outside diff comments:
In `@src/lib/inference/onboard-probes.ts`:
- Around line 757-766: The doubled-timeout retry builder buildRetryArgs
currently always adds an Authorization: Bearer header and drops query-param
auth; update it to respect authMode so that if authMode === "query-param" you do
NOT add the Authorization header and instead append the API key as a URL query
parameter (properly URL-encoded and preserving existing query string) to the
chat/completions endpoint; otherwise keep the existing behavior of adding ["-H",
`Authorization: Bearer ${normalizeCredentialValue(apiKey)}`]. Use existing
helpers like normalizeCredentialValue and getChatCompletionsProbePayload and
modify how endpointUrl is combined so the query-param form is preserved for
retries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7d89122c-342c-4bdd-aa78-6d2840e5b7a5
📒 Files selected for processing (9)
- src/lib/adapters/docker/index.ts
- src/lib/adapters/docker/runtime.test.ts
- src/lib/adapters/docker/runtime.ts
- src/lib/adapters/http/probe.test.ts
- src/lib/adapters/http/probe.ts
- src/lib/inference/local.ts
- src/lib/inference/onboard-probes.test.ts
- src/lib/inference/onboard-probes.ts
- src/lib/onboard/local-inference-topology.ts
PR Review Advisor
Findings: 3 needs attention, 3 worth checking, 0 nice ideas
This is an automated advisory review. A human maintainer must make the final merge decision.
ericksoa left a comment:
Thanks for the quick follow-up here. I no longer see the streaming-probe timeout-normalization issue on the current head, but one retry-path blocker remains before I can approve.
In src/lib/inference/onboard-probes.ts, the doubled-timeout retry path for ordinary Chat Completions probes does not preserve authMode: "query-param". The first request uses the expected ...?key=... URL, but after a timeout buildRetryArgs() always adds Authorization: Bearer ... and retries /chat/completions without the query key. I reproduced this against the current head by stubbing runCurlProbe: the first call had hasBearer=false and url=https://api.example.com/v1/chat/completions?key=...; the retry had hasBearer=true and url=https://api.example.com/v1/chat/completions. That means timeout recovery can still turn into an auth failure for query-param providers.
Please keep the retry auth/url construction aligned with the original probe auth mode: no Bearer header for query-param, and append the normalized/encoded key to the retry URL instead.
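One way to keep the first attempt and the retry aligned is to derive both from a single auth/URL builder, so only the timeout arguments differ. The sketch below is illustrative only: the function names, the `"bearer"` mode label, and the `ProbeTarget` shape are assumptions, not the code in `src/lib/inference/onboard-probes.ts`.

```typescript
// Hypothetical sketch: one builder decides URL and auth header for every attempt.
type AuthMode = "bearer" | "query-param"; // "bearer" is an assumed label

interface ProbeTarget {
  endpointUrl: string;
  apiKey?: string;
  authMode: AuthMode;
}

function buildProbeAuthSketch(target: ProbeTarget): { url: string; headerArgs: string[] } {
  const base = `${target.endpointUrl.replace(/\/+$/, "")}/chat/completions`;
  const key = target.apiKey?.trim();
  if (!key) return { url: base, headerArgs: [] };
  if (target.authMode === "query-param") {
    // Query-param providers get the encoded key in the URL and no Bearer header.
    return { url: `${base}?key=${encodeURIComponent(key)}`, headerArgs: [] };
  }
  return { url: base, headerArgs: ["-H", `Authorization: Bearer ${key}`] };
}

function buildCurlArgsSketch(target: ProbeTarget, timeoutArgs: string[], payload: unknown): string[] {
  const { url, headerArgs } = buildProbeAuthSketch(target);
  return [
    "-sS",
    ...timeoutArgs, // the only part that changes between attempt and retry
    "-H",
    "Content-Type: application/json",
    ...headerArgs,
    "-d",
    JSON.stringify(payload),
    url,
  ];
}

// Usage: the retry doubles the timeout but reuses the same auth construction.
const exampleTarget: ProbeTarget = {
  endpointUrl: "https://api.example.com/v1/",
  apiKey: "example-key",
  authMode: "query-param",
};
const firstAttemptArgs = buildCurlArgsSketch(exampleTarget, ["--max-time", "10"], { model: "m" });
const retryAttemptArgs = buildCurlArgsSketch(exampleTarget, ["--max-time", "20"], { model: "m" });
```

With this shape, the `hasBearer`/`url` mismatch described above cannot occur, because the retry has no separate auth logic to drift out of sync.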
ericksoa left a comment:
Rechecked the current head after the follow-up fixes. The streaming timeout normalization is now applied to the streaming paths, and the doubled-timeout retry now preserves query-param auth by reusing the original auth/url construction.
Local validation passed for npm run build:cli and focused vitest coverage: src/lib/adapters/http/probe.test.ts, src/lib/adapters/docker/runtime.test.ts, and src/lib/inference/onboard-probes.test.ts. Live PR checks are green as well. Approving.
## Summary
Promotes NemoClaw user skills in the most visible docs entry points and refreshes the docs for v0.0.55. This keeps the public docs and generated user skills aligned with the latest release tag.

## Related Issue
None.

## Changes
- Adds homepage, overview, prerequisites, and quickstart links that make NemoClaw user skills easier to discover.
- #4540 -> `docs/inference/use-local-inference.mdx` and `docs/about/release-notes.mdx`: Documents local Ollama validation retries and Docker runtime detection retries.
- #4519 -> `docs/about/release-notes.mdx`: Notes the plugin secret-scanner fallback behavior for embedded fallback mode.
- #4526 -> `docs/get-started/quickstart.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`, and `docs/about/release-notes.mdx`: Documents that pressing Enter with no messaging channels selected skips setup.
- Refreshes generated `nemoclaw-user-*` skills from the updated Fern docs.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification details:
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` passed.
- `npm run docs` passed with 0 errors and the existing Fern light-mode contrast warning.
- Commit hooks passed for the docs commits.
- Full all-files/pre-push checks were attempted but hit unrelated CLI test timeouts, so the all-files checkbox is left unchecked.

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit
* **Documentation**
  * Added Agent Skills guidance and links across overview, getting started, quickstart, and homepage to load NemoClaw user skills into AI coding assistants.
  * Clarified local inference onboarding reliability (retry/timeout behavior and Docker detection) for Ollama/vLLM.
  * Clarified messaging-channel onboarding to skip setup when no channels are selected and Enter is pressed.
  * Improved secret-scanner behavior descriptions and added v0.0.55 release notes.
Summary
Improves local Ollama onboarding reliability by retrying validation after host-side curl process timeouts and making Docker Desktop runtime detection less sensitive to transient `docker info` delays.
Related Issue
Fixes #4501
Changes
- Normalizes `spawnSync` `ETIMEDOUT` failures so Ollama validation treats them as retryable probe timeouts.
- Retries after a `docker info` timeout before falling back to proxy routing.
Type of Change
Verification
- `npx prek run --all-files` passes
- `npm test` passes
- `npm run docs` builds without warnings (doc changes only)
Signed-off-by: zyang-dev 267119621+zyang-dev@users.noreply.github.com
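As a rough illustration of the Docker runtime detection change, the sketch below retries only when an attempt times out and otherwise gives up immediately. Everything in it is hypothetical: the `InfoRunner` abstraction, the timeout values, and the function name are assumptions. In practice the runner would presumably wrap a `spawnSync("docker", ["info", ...], { timeout })` call, and a `null` result would send the caller to its fallback (proxy routing in this PR).

```typescript
// Hypothetical sketch of retry-on-timeout detection; not the code in
// src/lib/adapters/docker/runtime.ts.
type InfoAttempt =
  | { status: "ok"; output: string }
  | { status: "timeout" }
  | { status: "error" };

// Injected so the retry policy can be exercised without a Docker daemon.
type InfoRunner = (timeoutMs: number) => InfoAttempt;

function detectRuntimeWithRetrySketch(
  run: InfoRunner,
  timeoutsMs: number[] = [5000, 10000], // assumed values: one retry, longer timeout
): string | null {
  for (const timeoutMs of timeoutsMs) {
    const attempt = run(timeoutMs);
    if (attempt.status === "ok") return attempt.output;
    // A non-timeout failure is unlikely to be transient, so stop retrying.
    if (attempt.status === "error") return null;
  }
  // Every attempt timed out: the caller falls back.
  return null;
}
```

Injecting the runner keeps the retry policy testable in isolation, in the same spirit as the stubbed `runCurlProbe` reproduction described in the review above.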