
fix(onboard): probe-and-pick WSL2 host IP candidates for local inference #1864

Open

gburachas wants to merge 3 commits into main from
fix/ollama-inference-timeout-1472

Conversation

@gburachas

@gburachas gburachas commented Apr 14, 2026

Summary

On WSL2 with Docker Desktop, host.openshell.internal (via Docker's host-gateway) resolves to an unreachable IPv6 ULA or an unroutable gateway IP. That breaks:

  • The onboard-time container reachability probe.
  • Runtime inference routing from inside the sandbox to a host-side Ollama or vLLM.

This PR resolves a reachable IPv4 address on WSL2 + Docker Desktop and uses it for both the OPENAI_BASE_URL written into the gateway and the container reachability probe. It now handles both Ollama/vLLM placements correctly:

  • Server inside WSL → reached via the distro's eth0 IPv4 (the src field of ip -4 -o route get 1.1.1.1).
  • Server on the Windows host (NAT mode) → reached via the WSL2 default gateway (ip -4 -o route show default).

detectWsl2HostIpCandidates() returns an ordered, filtered list (drops loopback, link-local, and common Docker/k8s bridge ranges). validateLocalProvider() probes each candidate with the container reachability check and returns the first that answers as resolvedHostIp. onboard.js uses that winner, prints it, and tries to persist it to the sandbox registry. Non-WSL2 platforms are unchanged — host.openshell.internal remains the default.
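The ordering and filtering in detectWsl2HostIpCandidates() can be pictured with a small sketch. This is a hedged illustration only — the helper names and the exact excluded ranges below are assumptions chosen for demonstration, not the repository's actual code:

```typescript
// Illustrative sketch of the candidate filtering described above.
// EXCLUDED_PREFIXES and both helper names are assumptions, not the PR's code.
const EXCLUDED_PREFIXES = [
  "127.",     // loopback
  "169.254.", // link-local
  "172.17.",  // default docker0 bridge
  "10.42.",   // a common k8s (flannel) pod range
];

function isUsableCandidate(ip: string): boolean {
  // IPv4 only: an IPv6 ULA answer is exactly the failure mode being avoided.
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) return false;
  return !EXCLUDED_PREFIXES.some((prefix) => ip.startsWith(prefix));
}

function orderCandidates(
  eth0Src: string | null,
  defaultGateway: string | null,
): string[] {
  // Ordered list: WSL-local server address first, Windows-host gateway second.
  const ordered: string[] = [];
  for (const ip of [eth0Src, defaultGateway]) {
    if (ip && isUsableCandidate(ip) && !ordered.includes(ip)) ordered.push(ip);
  }
  return ordered;
}

console.log(orderCandidates("172.29.112.45", "172.29.112.1"));
console.log(orderCandidates("127.0.0.1", "fd00::1")); // both dropped
```

The point of the ordering is that a WSL-hosted server is checked before falling back to the Windows-host gateway.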

Host-side prerequisites (for Windows-hosted Ollama)

When Ollama runs on the Windows host (not inside WSL), the fix only helps once the host itself is reachable from WSL. Run these before onboarding (full step-by-step with shell labels lives in docs/reference/troubleshooting.md):

  1. Bind Ollama to all interfaces — PowerShell (Administrator):
    [System.Environment]::SetEnvironmentVariable('OLLAMA_HOST','0.0.0.0:11434','Machine')
    Then restart Ollama from a fresh PowerShell.
  2. Allow inbound TCP 11434 in Windows Defender Firewall — PowerShell (Administrator):
    New-NetFirewallRule -DisplayName "Ollama 11434 (WSL)" -Direction Inbound -Protocol TCP -LocalPort 11434 -Action Allow -Profile Any
  3. Switch WSL2 to mirrored networking mode. NAT mode on recent Windows 11 routes WSL traffic through a separate Hyper-V firewall layer that ignores standard inbound rules (NATInboundRuleNotApplicable). Add to %USERPROFILE%\.wslconfig:
    [wsl2]
    networkingMode=mirrored
    Then wsl --shutdown from PowerShell and reopen WSL.
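Once those prerequisites are in place, the two route lookups named above can be sanity-checked by hand inside WSL. A minimal sketch — the pipelines feed the parsers sample `ip` output so the extraction is visible; the addresses are made up for illustration:

```shell
# Candidate for "server inside WSL": the distro's eth0 IPv4,
# i.e. the src field of `ip -4 -o route get 1.1.1.1`.
echo "1.1.1.1 via 172.29.112.1 dev eth0 src 172.29.112.45 uid 1000" |
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") print $(i + 1) }'

# Candidate for "server on the Windows host": the WSL2 default gateway,
# i.e. the third field of `ip -4 -o route show default`.
echo "default via 172.29.112.1 dev eth0 proto kernel" |
  awk '/^default/ { print $3 }'
```

Replace each `echo ... |` with the real `ip` command to get your own candidates; a quick `curl -4 -sf --max-time 5 http://<candidate>:11434/api/tags` then confirms whether Ollama answers on one of them.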

This PR itself does not change any of the above — it handles the NemoClaw/Docker side once the host is reachable.

Linked issues

Addressed in this PR

  • #1472 — host.openshell.internal unreachable on WSL2 + Docker Desktop; runtime inference routing.
  • #336 — sandbox cannot reach Windows-hosted Ollama; covers points 1 and 3 of the reporter's reproduction.

Related but not fixed here

  • #305 — WSL2 tracking issue; this PR helps the inference path only, not the gateway bootstrap / image-pull / TLS issues listed there.
  • #315 — vLLM-on-WSL2 walkthrough; the sandbox egress iptables/veth workarounds documented there are orthogonal.
  • #246 — Ollama reasoning-model blank-content bug; separate issue, pick a non-reasoning model until it's fixed.

Verification

Verified end-to-end on WSL2 (mirrored networking) + Docker Desktop + Windows-hosted Ollama, with qwen2.5-coder:3b: onboard reaches step 8/8, prints the resolved host IP, and a chat completion from inside the sandbox returns in ~3s with no timeout.

Unit tests: 43/43 pass in src/lib/local-inference.test.ts, covering candidate ordering, filtering (docker/loopback/link-local/IPv6), probe-and-pick fallback to the default gateway, resolvedHostIp return value, and non-WSL2 regression path.
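The probe-and-pick fallback exercised by those tests amounts to a first-success loop over the candidate list. A sketch under stated assumptions — pickFirstReachable and the stub probe are illustrative names, not the PR's identifiers, and the real probe is the docker-run curl reachability check:

```typescript
// Illustrative probe-and-pick loop: try each candidate in order and
// return the first whose probe resolves. Names here are assumptions.
async function pickFirstReachable(
  candidates: string[],
  probe: (ip: string) => Promise<boolean>,
): Promise<string | null> {
  for (const ip of candidates) {
    if (await probe(ip)) return ip; // first candidate that answers wins
  }
  return null; // caller falls back to default host.openshell.internal behavior
}

// Demo with a stub probe: only the gateway-style ".1" address "answers",
// mirroring the Windows-hosted (NAT) placement.
pickFirstReachable(
  ["172.29.112.45", "172.29.112.1"],
  async (ip) => ip.endsWith(".1"),
).then((winner) => console.log("resolvedHostIp:", winner));
```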

Known follow-ups (tracked separately)

  • registry.updateSandbox(sandboxName, {resolvedHostIp, model, provider}) is a no-op at step 4/8 because the sandbox entry is created at step 6/8 (same pre-existing bug affects model and provider).
  • Onboard does not always return cleanly to the shell after step 8/8.

Type of Change

  • Code change for a bug fix.

Testing

  • Unit tests added for new helpers and branches (detectWsl2HostIpCandidates, probe-and-pick fallback, candidate filtering).
  • npm run typecheck:cli passes.
  • Targeted vitest --project cli src/lib/local-inference.test.ts — 43/43 pass.
  • Manual E2E on WSL2 + Docker Desktop + Windows Ollama: onboard completes, sandbox → Ollama chat roundtrip ~3s.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Resolved local inference connectivity on WSL2 + Docker Desktop by probing multiple host IP candidates and routing via the first reachable address.
    • Onboarding validation now prompts on warnings instead of exiting immediately.
  • New Features

    • Automatically persists the detected host IP for subsequent local-inference sessions.
  • Documentation

    • Added detailed troubleshooting for WSL2 + Docker Desktop networking and manual workarounds.
  • Tests

    • Expanded tests covering host-IP detection and validation flows.

gburachas and others added 2 commits April 10, 2026 17:46
#1472)

On WSL2 with Docker Desktop, `host.openshell.internal` resolves to an
unreachable IPv6 ULA or gateway IP.  This breaks both the onboard
container reachability check (step 4/8 hard-exits) and runtime inference
routing (proxy cannot reach upstream Ollama).

Changes:
- detect WSL2 + Docker Desktop at onboard time and resolve the distro's
  eth0 IPv4 via `hostname -I`
- pass the reachable IP to `OPENAI_BASE_URL` and the container
  reachability probe instead of `host.openshell.internal`
- add `-4` flag to curl in the reachability check to force IPv4
- replace hard `process.exit(1)` with a "Continue anyway?" prompt when
  the container reachability check fails
- improve the error message with actionable diagnostic causes

Non-WSL2 platforms (macOS, native Linux, Docker Engine) are unaffected;
`host.openshell.internal` remains the default when no override is needed.

Signed-off-by: Giedrius Burachas <gburachas@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the earlier host.openshell.internal override to cover both
Ollama/vLLM server placements on WSL2 + Docker Desktop:

  A. Server inside WSL  — reached via the distro's eth0 IPv4
     (the `src` field of `ip -4 -o route get 1.1.1.1`).
  B. Server on Windows host (NAT mode) — reached only via the WSL2
     default gateway (`ip -4 -o route show default`).

`detectWsl2HostIpCandidates()` now returns an ordered, filtered list
(drops loopback, link-local, and common Docker/k8s bridge ranges).
`validateLocalProvider()` probes each candidate with the container
reachability check and returns the first that answers as
`resolvedHostIp`. `onboard.js` uses that winner for OPENAI_BASE_URL and
the in-container probe, prints the resolved IP, and tries to persist
it to the sandbox registry. Non-WSL2 platforms are unchanged.

Verified end-to-end: onboard with ollama-local + Windows-hosted Ollama
over WSL2 mirrored networking reaches step 8/8, prints a resolved host
IP, and a 5-turn chat completion from inside the sandbox returns in
~3s with no timeout.

Adds a note to docs/CONTRIBUTING.md telling AI contributors not to
stage `.agents/skills/nemoclaw-user-*/` — those files are regenerated
from `docs/` by the pre-commit hook.

Addresses: #1472 (host.openshell.internal unreachable on WSL2 +
Docker Desktop, runtime inference routing), #336 (sandbox cannot reach
Windows-hosted Ollama; covers points 1 and 3 of the reporter's
reproduction).

Related (not fixed here):
  - #305 — WSL2 tracking issue; this PR helps the inference path only,
    not the gateway bootstrap / image-pull / TLS issues listed there.
  - #315 — vLLM-on-WSL2 walkthrough; sandbox egress iptables/veth
    workarounds documented there are orthogonal.
  - #246 — Ollama reasoning-model blank-content bug; separate issue,
    pick a non-reasoning model until it's fixed.

Known follow-ups, not in this PR:
  - `registry.updateSandbox(sandboxName, {resolvedHostIp, ...})` is a
    no-op at step 4/8 because the sandbox entry is created at step 6/8;
    persistence order needs fixing (same pre-existing bug affects
    `model` and `provider` fields).
  - Onboard does not always return cleanly to the shell after step 8/8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed. The title clearly and specifically describes the main change: probing and selecting WSL2 host IP candidates to restore local inference.
  • Linked Issues check — ✅ Passed. The PR implements all coding requirements from issue #1472: detects reachable host IPs via probing on WSL2, restores container reachability to local Ollama/vLLM endpoints, and persists the resolved IP for consistent routing.
  • Out of Scope Changes check — ✅ Passed. All changes are directly related to resolving the WSL2 host IP reachability issue: local inference validation logic, onboarding flow, documentation, tests, registry persistence, and contributor guidance.
  • Description Check — ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

When Ollama runs on the Windows host (not inside WSL), NemoClaw's
auto-detection only helps once the host itself is actually reachable
from WSL. Document the three prerequisite steps, each labeled with the
shell to run it in (PowerShell vs WSL):

  1. Bind Ollama to 0.0.0.0 via OLLAMA_HOST at Machine scope.
  2. Allow inbound TCP 11434 in Windows Defender Firewall.
  3. Switch WSL2 to mirrored networking mode (NAT mode hits
     NATInboundRuleNotApplicable on the Hyper-V firewall layer).

Also improves the fallback troubleshooting list with explicit shell
labels and the gateway-IP manual override for users who cannot switch
to mirrored mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
docs/CONTRIBUTING.md (1)

36-36: Please align this note with docs style rules (bold/colon/line-splitting).

This line has three style issues: unnecessary bold on routine instruction (LLM pattern detected), a colon not introducing a list, and multiple sentences on one source line.

Suggested edit
-**For AI coding assistants:** Do not `git add` any file under `.agents/skills/nemoclaw-user-*/` — not even when `git status` shows it as modified. The pre-commit hook regenerates and stages those files automatically from `docs/`. Staging them manually makes the commit diff harder to review and can mask out-of-date hand edits. If you changed user-facing behavior, update the matching page under `docs/` and stage only `docs/**/*.md`; the hook does the rest.
+For AI coding assistants, do not `git add` any file under `.agents/skills/nemoclaw-user-*/`, even when `git status` shows it as modified.
+The pre-commit hook regenerates and stages those files automatically from `docs/`.
+Staging them manually makes the commit diff harder to review and can mask out-of-date hand edits.
+If you changed user-facing behavior, update the matching page under `docs/` and stage only `docs/**/*.md`.
+The hook does the rest.

As per coding guidelines: “One sentence per line in source”, “Colons should only introduce a list”, and “Unnecessary bold on routine instructions … flag as suggestions with the note ‘LLM pattern detected.’”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CONTRIBUTING.md` at line 36, Update the sentence about AI coding
assistants to follow docs style: remove the bold formatting around the routine
instruction, change the colon so it either introduces a list or is replaced with
a period, and split the content so there is one sentence per source line;
reference the exact text mentioning `.agents/skills/nemoclaw-user-*/` and
`docs/` when making the change and add a short suggestion note "LLM pattern
detected" (not bold) after the instruction to indicate it’s a stylistic
suggestion rather than a rule.
docs/reference/troubleshooting.md (1)

158-200: Reflow this section to one sentence per line and remove routine bold emphasis.

The new prose is wrapped mid-sentence across multiple source lines, and bolding phrases like “inside WSL”, “Windows host”, and “mirrored” reads like routine emphasis rather than a warning. LLM pattern detected.

As per coding guidelines, "One sentence per line in source (makes diffs readable)" and "Bold is reserved for UI labels, parameter names, and genuine warnings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/reference/troubleshooting.md` around lines 158 - 200, The "Local
inference on WSL2 + Docker Desktop" section has wrapped sentences across lines
and uses routine bolding (e.g., "**inside WSL**", "**Windows host**",
"**mirrored**"); reflow every sentence so each ends on its own source line and
remove bold from routine emphasis (leave bold only for UI labels/params/warnings
like OPENAI_BASE_URL, host.openshell.internal, OLLAMA_HOST), preserving content
and examples (ip commands, env vars, resolvedHostIp, NO_PROXY) and ensuring
lists and bullet points remain one sentence per line for clearer diffs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 3405-3409: The call to registry.updateSandbox(...) is writing
using the temporary GATEWAY_NAME before the real sandbox exists (setupInference
runs before createSandbox), so its return is false and the data is lost; fix by
deferring the persistence until after the sandbox is registered (i.e., after
createSandbox completes inside onboard) or by caching the {model, provider,
resolvedHostIp} in session state and applying them when registry.createSandbox /
registry.updateSandbox is called for the real sandbox name; specifically modify
the flow around setupInference(), createSandbox(), and the
registry.updateSandbox call so you either move the registry.updateSandbox
invocation to post-createSandbox or add a session/cache write-read that
registry.updateSandbox consumes once the real sandbox entry exists.
- Around line 3324-3331: On validation failure in the local-provider probe
branches (the blocks that check validation.ok, call prompt(), and call
process.exit(1)), respect the global non-interactive flag instead of always
invoking prompt(): if the non-interactive mode flag (the same boolean used
elsewhere in the wizard, e.g., nonInteractive or flags.nonInteractive) is set
then log the validation.message and immediately call process.exit(1) (no
prompt), otherwise keep the existing interactive prompt flow; apply this same
guard to both places that call prompt() (the shown block and the other branch
around lines 3362-3369).

In `@src/lib/local-inference.ts`:
- Around line 85-96: The curl-based container reachability commands returned by
getLocalProviderContainerReachabilityCheck (cases "vllm-local" and
"ollama-local") lack timeout flags and can block on blackholed IPs; modify
getLocalProviderContainerReachabilityCheck to add the same timeout options used
by getOllamaProbeCommand (e.g., --max-time and optionally --connect-timeout) to
the returned command strings so each probe times out quickly and the caller loop
(the WSL2 IP candidate loop) can proceed promptly.

---

Nitpick comments:
In `@docs/CONTRIBUTING.md`:
- Line 36: Update the sentence about AI coding assistants to follow docs style:
remove the bold formatting around the routine instruction, change the colon so
it either introduces a list or is replaced with a period, and split the content
so there is one sentence per source line; reference the exact text mentioning
`.agents/skills/nemoclaw-user-*/` and `docs/` when making the change and add a
short suggestion note "LLM pattern detected" (not bold) after the instruction to
indicate it’s a stylistic suggestion rather than a rule.

In `@docs/reference/troubleshooting.md`:
- Around line 158-200: The "Local inference on WSL2 + Docker Desktop" section
has wrapped sentences across lines and uses routine bolding (e.g., "**inside
WSL**", "**Windows host**", "**mirrored**"); reflow every sentence so each ends
on its own source line and remove bold from routine emphasis (leave bold only
for UI labels/params/warnings like OPENAI_BASE_URL, host.openshell.internal,
OLLAMA_HOST), preserving content and examples (ip commands, env vars,
resolvedHostIp, NO_PROXY) and ensuring lists and bullet points remain one
sentence per line for clearer diffs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a21e145a-cafe-493c-b807-56b5bd3afcb8

📥 Commits

Reviewing files that changed from the base of the PR and between cd36c58 and 5800da8.

📒 Files selected for processing (6)
  • bin/lib/onboard.js
  • docs/CONTRIBUTING.md
  • docs/reference/troubleshooting.md
  • src/lib/local-inference.test.ts
  • src/lib/local-inference.ts
  • src/lib/registry.ts

Comment thread bin/lib/onboard.js
Comment on lines 3324 to +3331
 if (!validation.ok) {
   console.error(`  ${validation.message}`);
-  process.exit(1);
+  const answer = (await prompt("  Continue anyway? Inference may fail at runtime. [y/N]: "))
+    .trim()
+    .toLowerCase();
+  if (answer !== "y") {
+    process.exit(1);
+  }

⚠️ Potential issue | 🟠 Major

Keep local-provider probe failures non-interactive in --non-interactive mode.

These branches now call prompt() unconditionally on validation failure. If nemoclaw onboard --non-interactive selects ollama or vllm, a bad probe turns into a hang instead of the hard failure the rest of the wizard uses.

Suggested fix
 if (!validation.ok) {
   console.error(`  ${validation.message}`);
+  if (isNonInteractive()) {
+    process.exit(1);
+  }
   const answer = (await prompt("  Continue anyway? Inference may fail at runtime. [y/N]: "))
     .trim()
     .toLowerCase();
   if (answer !== "y") {
     process.exit(1);
   }
 }

Apply the same guard in both local-provider branches.

Also applies to: 3362-3369

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 3324 - 3331, On validation failure in the
local-provider probe branches (the blocks that check validation.ok, call
prompt(), and call process.exit(1)), respect the global non-interactive flag
instead of always invoking prompt(): if the non-interactive mode flag (the same
boolean used elsewhere in the wizard, e.g., nonInteractive or
flags.nonInteractive) is set then log the validation.message and immediately
call process.exit(1) (no prompt), otherwise keep the existing interactive prompt
flow; apply this same guard to both places that call prompt() (the shown block
and the other branch around lines 3362-3369).

Comment thread bin/lib/onboard.js
Comment on lines +3405 to +3409
registry.updateSandbox(sandboxName, {
  model,
  provider,
  resolvedHostIp: resolvedHostIp || null,
});

⚠️ Potential issue | 🟠 Major

This registry write never reaches the real sandbox entry.

setupInference() still runs before createSandbox(), and onboard() passes GATEWAY_NAME here (Line 4415), not the eventual sandbox name. That makes registry.updateSandbox(...) return false and silently drop model, provider, and resolvedHostIp for normal onboard runs.

Please move this persistence until after the sandbox has been registered, or store it in session state and apply it once the real sandbox entry exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 3405 - 3409, The call to
registry.updateSandbox(...) is writing using the temporary GATEWAY_NAME before
the real sandbox exists (setupInference runs before createSandbox), so its
return is false and the data is lost; fix by deferring the persistence until
after the sandbox is registered (i.e., after createSandbox completes inside
onboard) or by caching the {model, provider, resolvedHostIp} in session state
and applying them when registry.createSandbox / registry.updateSandbox is called
for the real sandbox name; specifically modify the flow around setupInference(),
createSandbox(), and the registry.updateSandbox call so you either move the
registry.updateSandbox invocation to post-createSandbox or add a session/cache
write-read that registry.updateSandbox consumes once the real sandbox entry
exists.

Comment on lines +85 to +96
 export function getLocalProviderContainerReachabilityCheck(
   provider: string,
   hostIp?: string,
 ): string | null {
   const addHost = hostIp
     ? `--add-host host.openshell.internal:${hostIp}`
     : "--add-host host.openshell.internal:host-gateway";
   switch (provider) {
     case "vllm-local":
-      return `docker run --rm --add-host host.openshell.internal:host-gateway ${CONTAINER_REACHABILITY_IMAGE} -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
     case "ollama-local":
-      return `docker run --rm --add-host host.openshell.internal:host-gateway ${CONTAINER_REACHABILITY_IMAGE} -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and get its total line count
if [ -f "src/lib/local-inference.ts" ]; then
  echo "File exists. Line count:"
  wc -l src/lib/local-inference.ts
else
  echo "File not found"
fi

Repository: NVIDIA/NemoClaw

Length of output: 1870


🏁 Script executed:

# Read the specific lines mentioned in the review (85-96 and 201-210)
echo "=== Lines 85-96 ==="
sed -n '85,96p' src/lib/local-inference.ts

echo ""
echo "=== Lines 201-210 ==="
sed -n '201,210p' src/lib/local-inference.ts

Repository: NVIDIA/NemoClaw

Length of output: 1185


🏁 Script executed:

# Check for SPDX license header at the top of the file
echo "=== First 10 lines (checking for SPDX header) ==="
head -10 src/lib/local-inference.ts

Repository: NVIDIA/NemoClaw

Length of output: 508


🏁 Script executed:

# Search for all curl commands in the file to understand the context
echo "=== All curl/container reachability commands in the file ==="
rg "curl|getLocalProviderContainerReachabilityCheck" src/lib/local-inference.ts -A 3

Repository: NVIDIA/NemoClaw

Length of output: 1505


Add timeout flags to container reachability probe commands.

The getLocalProviderContainerReachabilityCheck() function returns curl commands without timeout flags. When this function is called in a loop for multiple WSL2 host IP candidates (lines 201-210), a blackholed IP can block progress for minutes before attempting the next candidate. This is inconsistent with getOllamaProbeCommand() in the same file, which already includes --max-time.

Suggested fix
 export function getLocalProviderContainerReachabilityCheck(
   provider: string,
   hostIp?: string,
 ): string | null {
   const addHost = hostIp
     ? `--add-host host.openshell.internal:${hostIp}`
     : "--add-host host.openshell.internal:host-gateway";
   switch (provider) {
     case "vllm-local":
-      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 --connect-timeout 2 --max-time 5 -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
     case "ollama-local":
-      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 --connect-timeout 2 --max-time 5 -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;
     default:
       return null;
   }
 }

Also applies to: 201-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/local-inference.ts` around lines 85 - 96, The curl-based container
reachability commands returned by getLocalProviderContainerReachabilityCheck
(cases "vllm-local" and "ollama-local") lack timeout flags and can block on
blackholed IPs; modify getLocalProviderContainerReachabilityCheck to add the
same timeout options used by getOllamaProbeCommand (e.g., --max-time and
optionally --connect-timeout) to the returned command strings so each probe
times out quickly and the caller loop (the WSL2 IP candidate loop) can proceed
promptly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/reference/troubleshooting.md`:
- Around line 255-257: The fenced code block containing "```bash" and "ip route
show default | awk '/default/ {print $3}'" needs a blank line immediately before
the opening ``` and a blank line immediately after the closing ``` to satisfy
MD031 (blanks-around-fences); update the nested fenced block in the markdown
list so there is an empty line above and below the triple-backtick fence.
- Around line 176-179: The documentation currently states that the first
successful container-side probe is definitively "injected into both
`OPENAI_BASE_URL` and the reachability check, and persisted to the sandbox
registry entry as `resolvedHostIp`"; change this absolute language to reflect
best-effort behavior by replacing claims of guaranteed persistence with phrases
like "attempts to persist" or "is attempted to be persisted" for
`resolvedHostIp`, and clarify that `OPENAI_BASE_URL` and the reachability check
receive the candidate when the probe succeeds, noting this occurs during the
onboarding flow and that persistence ordering/guarantees are not strict.
- Around line 192-200: The CLI examples currently use `powershell`/`bash` fenced
blocks without the required `$` prompt; update the fenced code blocks to use the
`console` language tag and prefix each command line with a `$` prompt (e.g., for
the PowerShell snippet replace the ```powershell block containing
[System.Environment]::SetEnvironmentVariable('OLLAMA_HOST'...) and the
Get-Process | Where-Object ... | Stop-Process -Force and ollama serve lines with
a ```console block and add `$ ` before each command), and apply the same
transformation to the other referenced blocks (lines 204-207, 211-215, 232-234,
238-241, 255-257) so all CLI examples follow the docs guideline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 59fd9dd1-922e-41f4-974d-b4c09699f1ac

📥 Commits

Reviewing files that changed from the base of the PR and between 5800da8 and 768b4f4.

📒 Files selected for processing (1)
  • docs/reference/troubleshooting.md

Comment on lines +176 to +179
The first candidate whose container-side probe succeeds is injected
into both `OPENAI_BASE_URL` and the reachability check, and persisted
to the sandbox registry entry as `resolvedHostIp`. No manual override
is needed for either Ollama placement.

⚠️ Potential issue | 🟠 Major

Avoid absolute wording about resolvedHostIp persistence.

This states persistence as guaranteed, but current behavior is best-effort in onboarding flow and has a known persistence-ordering follow-up. Please soften this to "attempts to persist" to avoid misleading troubleshooting expectations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/reference/troubleshooting.md` around lines 176 - 179, The documentation
currently states that the first successful container-side probe is definitively
"injected into both `OPENAI_BASE_URL` and the reachability check, and persisted
to the sandbox registry entry as `resolvedHostIp`"; change this absolute
language to reflect best-effort behavior by replacing claims of guaranteed
persistence with phrases like "attempts to persist" or "is attempted to be
persisted" for `resolvedHostIp`, and clarify that `OPENAI_BASE_URL` and the
reachability check receive the candidate when the probe succeeds, noting this
occurs during the onboarding flow and that persistence ordering/guarantees are
not strict.

Comment on lines +192 to +200
```powershell
# Persist across reboots; Machine scope so services also inherit it.
[System.Environment]::SetEnvironmentVariable('OLLAMA_HOST','0.0.0.0:11434','Machine')

# Stop Ollama (tray + server) and start it in a new shell so it picks
# up the new env var. Open a NEW PowerShell window first, then:
Get-Process | Where-Object { $_.ProcessName -like 'ollama*' } | Stop-Process -Force
ollama serve
```

⚠️ Potential issue | 🟡 Minor

Use console fenced blocks with $ prompts for CLI commands.

These CLI examples are tagged as powershell/bash, but the docs rule requires console blocks with $ prompt prefixes for command examples.

Suggested formatting adjustment
-```powershell
+```console
+$ [System.Environment]::SetEnvironmentVariable('OLLAMA_HOST','0.0.0.0:11434','Machine')
 ...
-Get-Process | Where-Object { $_.ProcessName -like 'ollama*' } | Stop-Process -Force
-ollama serve
+$ Get-Process | Where-Object { $_.ProcessName -like 'ollama*' } | Stop-Process -Force
+$ ollama serve

As per coding guidelines, "CLI code blocks must use the `console` language tag with `$` prompt prefix. Flag ```bash or ```shell for CLI examples."


Also applies to: 204-207, 211-215, 232-234, 238-241, 255-257

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @docs/reference/troubleshooting.md around lines 192 - 200, The CLI examples
currently use powershell/bash fenced blocks without the required $ prompt;
update the fenced code blocks to use the console language tag and prefix each
command line with a $ prompt (e.g., for the PowerShell snippet replace the

[System.Environment]::SetEnvironmentVariable('OLLAMA_HOST'...) and the
Get-Process | Where-Object ... | Stop-Process -Force and ollama serve lines with
a ```console block and add `$ ` before each command), and apply the same
transformation to the other referenced blocks (lines 204-207, 211-215, 232-234,
238-241, 255-257) so all CLI examples follow the docs guideline.

</details>

Comment on lines +255 to +257
```bash
ip route show default | awk '/default/ {print $3}'
```
Contributor

⚠️ Potential issue | 🟡 Minor

Add blank lines around the nested fenced code block.

This block trips MD031 (blanks-around-fences) in the list item. Insert blank lines before and after the fence to satisfy markdownlint and keep rendering stable.

As per coding guidelines, "Follow style guide in docs/CONTRIBUTING.md for documentation."
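For instance, the nested fence from the flagged list item passes MD031 once padded with blank lines on both sides (the list-item text here is illustrative, not the exact docs wording):

````markdown
3. Find the WSL2 default gateway:

   ```bash
   ip route show default | awk '/default/ {print $3}'
   ```

   Use that IP in place of `host.openshell.internal`.
````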

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 255-255: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 257-257: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In `@docs/reference/troubleshooting.md` around lines 255 - 257, the fenced code
block containing "```bash" and "ip route show default | awk '/default/ {print
$3}'" needs a blank line immediately before the opening ``` and a blank line
immediately after the closing ``` to satisfy MD031 (blanks-around-fences);
update the nested fenced block in the markdown list so there is an empty line
above and below the triple-backtick fence.

</details>
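Both route lookups this PR relies on (the `src` address from `ip -4 -o route get 1.1.1.1` for a WSL-hosted server, and the gateway from the default route for a Windows-hosted one) can be exercised offline. A minimal Python sketch; `candidate_ips` is a hypothetical stand-in for `detectWsl2HostIpCandidates`, the sample route lines are illustrative, and the drop-list is an assumption rather than the PR's exact filter set:

```python
import re

def candidate_ips(route_get_output: str, route_default_output: str) -> list[str]:
    """Collect WSL2 host-IP candidates in probe order from `ip route` output."""
    candidates = []
    # Distro eth0 IPv4 (server running inside WSL).
    m = re.search(r"\bsrc (\d+\.\d+\.\d+\.\d+)", route_get_output)
    if m:
        candidates.append(m.group(1))
    # Default-gateway IPv4 (server on the Windows host, NAT mode).
    m = re.search(r"\bvia (\d+\.\d+\.\d+\.\d+)", route_default_output)
    if m:
        candidates.append(m.group(1))
    # Drop loopback, link-local, and common Docker/k8s bridge prefixes, keep order.
    drop = ("127.", "169.254.", "172.17.", "10.42.")
    return [ip for ip in candidates if not ip.startswith(drop)]

print(candidate_ips(
    "1.1.1.1 via 172.24.64.1 dev eth0 src 172.24.70.5 uid 1000",
    "default via 172.24.64.1 dev eth0 proto kernel",
))  # → ['172.24.70.5', '172.24.64.1']
```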

@wscurran added the `bug`, `Platform: Windows/WSL`, `Local Models`, and `fix` labels on Apr 14, 2026
@jieunl24
Contributor

Does this work for both WSL-hosted Ollama and Windows-hosted Ollama?
Windows-hosted Ollama is not a supported path: NemoClaw only shows Ollama as an inference option if Ollama is installed in WSL and the user-selected model is loaded into that WSL Ollama.
That said, Windows-hosted Ollama is already reachable from the sandbox; it just returns a 404 model-not-found when the user-selected model was not loaded on the Windows side.

I wonder whether the reporter was running Ollama on Windows. If not, the expected binding for Ollama on WSL is 127.0.0.1 (the default, so no binding specification is needed), not 0.0.0.0.

```js
"Local Ollama is responding on localhost, but the container reachability check failed for http://host.openshell.internal:11434.\n" +
" Common causes:\n" +
" • Ollama is bound to 127.0.0.1 — set OLLAMA_HOST=0.0.0.0:11434\n" +
" • Docker Desktop on WSL2 resolves host-gateway to IPv6 — try installing Docker Engine natively in WSL2\n" +
```
Contributor

A prerequisite for running NemoClaw on Windows is Docker Desktop.

@jieunl24
Contributor

The premise about host.openshell.internal resolving to IPv6 / un-routable IPs seems incorrect / outdated. It's set as a hostAlias and always resolves to the Docker host-gateway IPv4 address (e.g. 172.29.0.254):

```console
/ # kubectl get po my-assistant -o yaml -n openshell | grep hostAlias -A 6
  hostAliases:
  - hostnames:
    - host.docker.internal
    - host.openshell.internal
    ip: 172.29.0.254
  initContainers:
  - command:
```
("my-assistant" is my sandbox's name.) I then verified the full chain: gateway container → host-gateway IP → Windows host → wslrelay → Ollama. It works end-to-end; the IP is routable and Ollama responds.

```console
/ # kubectl exec -it openshell-0 -n openshell -- sh
$ bash -c 'echo -e "GET /api/tags HTTP/1.0\r\n\r\n" > /dev/tcp/172.29.0.254/11434 && echo "REACHABLE" || echo "UNREACHABLE"'
REACHABLE
$
```
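The probe-and-pick selection this PR describes can be sketched in a few lines; `first_reachable` is a hypothetical stand-in for the candidate loop in `validateLocalProvider` (203.0.113.1 is a TEST-NET address that never answers):

```python
import socket

def first_reachable(candidates, port=11434, timeout=1.0):
    """Probe candidates in order; return the first whose TCP port accepts a
    connection, or None if none answer."""
    for ip in candidates:
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return ip
        except OSError:
            continue
    return None

# Demo against a throwaway local listener; the dead candidate is skipped.
srv = socket.create_server(("127.0.0.1", 0))
port = srv.getsockname()[1]
print(first_reachable(["203.0.113.1", "127.0.0.1"], port=port, timeout=0.3))
srv.close()
```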

The actual root cause is conditional: it only manifests when Ollama binds a dual-stack socket. WSL2's wslrelay.exe doesn't forward dual-stack (AF_INET6) sockets (microsoft/WSL#4851). Go's net.Listen("tcp", "0.0.0.0") creates a dual-stack socket, so Ollama with OLLAMA_HOST=0.0.0.0 binds to *:11434 (AF_INET6) rather than 0.0.0.0:11434 (AF_INET). The relay ignores it, and the connection blackholes.
The fix is simpler: on WSL2, don't set OLLAMA_HOST=0.0.0.0; let Ollama bind to its default 127.0.0.1, which the relay does forward. That fix was merged about two weeks ago (#1104).
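The address-family distinction can be observed directly. A minimal sketch using Python's socket module; the wslrelay forwarding behavior itself can't be reproduced here, so the comments restate the claim above rather than demonstrate it:

```python
import socket

# An explicit IPv4 bind (Ollama's default 127.0.0.1) yields an AF_INET socket,
# which, per the analysis above, wslrelay does forward.
v4 = socket.create_server(("127.0.0.1", 0))
print(v4.family == socket.AF_INET)  # → True
v4.close()

# A wildcard dual-stack bind (what a Go wildcard listen produces) yields a
# single AF_INET6 socket; per the analysis above, wslrelay skips these.
if socket.has_dualstack_ipv6():
    dual = socket.create_server(("", 0), family=socket.AF_INET6,
                                dualstack_ipv6=True)
    print(dual.family == socket.AF_INET6)
    dual.close()
```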

Re: the reporter (#1472): they're on OpenShell 0.0.14; I've verified on OpenShell 0.0.26. Also, as mentioned in my previous comment, the reporter seems to be binding the Ollama host to 0.0.0.0, which causes the dual-stack socket issue above.

There's Windows setup documentation that was added recently (https://github.com/NVIDIA/NemoClaw/blob/main/docs/get-started/windows-setup.md), which also covers how to set up local inference with Ollama.

Kindly verify whether the issue is reproducible with the latest NemoClaw and OpenShell versions, following the installation guide.

@cv added the `v0.0.18` (Release target) label on Apr 16, 2026
@cv
Contributor

cv commented Apr 16, 2026

Friendly AI-generated maintainer note:

Thanks for the WSL2 follow-up. I took a pass through the current branch and I need these blockers addressed before I can move it forward:

  1. Rebase onto current main and port the onboarding changes to the active TypeScript path (src/lib/onboard.ts / current CLI flow), not the older bin/lib/onboard.js path by itself.
  2. Keep --non-interactive failures non-interactive — local-provider validation failures should not prompt.
  3. Move the resolvedHostIp registry write so it targets the real sandbox entry after sandbox creation, not the pre-create gateway placeholder.
  4. Resolve the remaining major CodeRabbit threads, including the docs wording that currently overstates the persistence guarantee.

After that update lands, I can re-run the gate check.


Labels

- `bug`: Something isn't working
- `fix`
- `Local Models`: Running NemoClaw with local models
- `Platform: Windows/WSL`: Support for Windows Subsystem for Linux
- `v0.0.18`: Release target


Development

Successfully merging this pull request may close these issues.

[WSL2] Local Ollama inference timeout — inference.local TLS handshake fails, host.openshell.internal:11434 unreachable from openshell-0 pod
