fix(onboard): probe-and-pick WSL2 host IP candidates for local inference #1864
Conversation
(#1472) On WSL2 with Docker Desktop, `host.openshell.internal` resolves to an unreachable IPv6 ULA or gateway IP. This breaks both the onboard container reachability check (step 4/8 hard-exits) and runtime inference routing (the proxy cannot reach upstream Ollama).

Changes:
- detect WSL2 + Docker Desktop at onboard time and resolve the distro's eth0 IPv4 via `hostname -I`
- pass the reachable IP to `OPENAI_BASE_URL` and the container reachability probe instead of `host.openshell.internal`
- add the `-4` flag to curl in the reachability check to force IPv4
- replace the hard `process.exit(1)` with a "Continue anyway?" prompt when the container reachability check fails
- improve the error message with actionable diagnostic causes

Non-WSL2 platforms (macOS, native Linux, Docker Engine) are unaffected; `host.openshell.internal` remains the default when no override is needed.

Signed-off-by: Giedrius Burachas <gburachas@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the earlier host.openshell.internal override to cover both
Ollama/vLLM server placements on WSL2 + Docker Desktop:
A. Server inside WSL — reached via the distro's eth0 IPv4
(`ip -4 -o route get 1.1.1.1` src).
B. Server on Windows host (NAT mode) — reached only via the WSL2
default gateway (`ip -4 -o route show default`).
`detectWsl2HostIpCandidates()` now returns an ordered, filtered list
(drops loopback, link-local, and common Docker/k8s bridge ranges).
`validateLocalProvider()` probes each candidate with the container
reachability check and returns the first that answers as
`resolvedHostIp`. `onboard.js` uses that winner for OPENAI_BASE_URL and
the in-container probe, prints the resolved IP, and tries to persist
it to the sandbox registry. Non-WSL2 platforms are unchanged.
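As a rough illustration of the ordering-and-filtering step described above, here is a minimal sketch. The function name `filterWsl2HostIpCandidates` and the exact drop ranges are assumptions for illustration, not the PR's actual code.

```typescript
// Hedged sketch of WSL2 host-IP candidate filtering; the real
// detectWsl2HostIpCandidates() may differ in detail.
function filterWsl2HostIpCandidates(raw: string[]): string[] {
  const dropPatterns = [
    /^127\./,      // loopback
    /^169\.254\./, // link-local
    /^172\.17\./,  // default docker0 bridge (one common Docker range)
    /^10\.96\./,   // a common Kubernetes service CIDR
  ];
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const ip of raw) {
    if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) continue; // IPv4 only
    if (dropPatterns.some((re) => re.test(ip))) continue;
    if (seen.has(ip)) continue; // preserve first-seen order, drop duplicates
    seen.add(ip);
    kept.push(ip);
  }
  return kept;
}
```

The surviving candidates would then be probed in order, taking the first one that answers from inside a container.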
Verified end-to-end: onboard with ollama-local + Windows-hosted Ollama
over WSL2 mirrored networking reaches step 8/8, prints a resolved host
IP, and a 5-turn chat completion from inside the sandbox returns in
~3s with no timeout.
Adds a note to docs/CONTRIBUTING.md telling AI contributors not to
stage `.agents/skills/nemoclaw-user-*/` — those files are regenerated
from `docs/` by the pre-commit hook.
Addresses: #1472 (host.openshell.internal unreachable on WSL2 +
Docker Desktop, runtime inference routing), #336 (sandbox cannot reach
Windows-hosted Ollama; covers points 1 and 3 of the reporter's
reproduction).
Related (not fixed here):
- #305 — WSL2 tracking issue; this PR helps the inference path only,
not the gateway bootstrap / image-pull / TLS issues listed there.
- #315 — vLLM-on-WSL2 walkthrough; sandbox egress iptables/veth
workarounds documented there are orthogonal.
- #246 — Ollama reasoning-model blank-content bug; separate issue,
pick a non-reasoning model until it's fixed.
Known follow-ups, not in this PR:
- `registry.updateSandbox(sandboxName, {resolvedHostIp, ...})` is a
no-op at step 4/8 because the sandbox entry is created at step 6/8;
persistence order needs fixing (same pre-existing bug affects
`model` and `provider` fields).
- Onboard does not always return cleanly to the shell after step 8/8.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
When Ollama runs on the Windows host (not inside WSL), NemoClaw's
auto-detection only helps once the host itself is actually reachable
from WSL. Document the three prerequisite steps, each labeled with the
shell to run it in (PowerShell vs WSL):
1. Bind Ollama to 0.0.0.0 via OLLAMA_HOST at Machine scope.
2. Allow inbound TCP 11434 in Windows Defender Firewall.
3. Switch WSL2 to mirrored networking mode (NAT mode hits
NATInboundRuleNotApplicable on the Hyper-V firewall layer).
Also improves the fallback troubleshooting list with explicit shell
labels and the gateway-IP manual override for users who cannot switch
to mirrored mode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
🧹 Nitpick comments (2)
docs/CONTRIBUTING.md (1)
36-36: Please align this note with docs style rules (bold/colon/line-splitting).

This line has three style issues: unnecessary bold on a routine instruction (LLM pattern detected), a colon not introducing a list, and multiple sentences on one source line.

Suggested edit

```diff
-**For AI coding assistants:** Do not `git add` any file under `.agents/skills/nemoclaw-user-*/` — not even when `git status` shows it as modified. The pre-commit hook regenerates and stages those files automatically from `docs/`. Staging them manually makes the commit diff harder to review and can mask out-of-date hand edits. If you changed user-facing behavior, update the matching page under `docs/` and stage only `docs/**/*.md`; the hook does the rest.
+For AI coding assistants, do not `git add` any file under `.agents/skills/nemoclaw-user-*/`, even when `git status` shows it as modified.
+The pre-commit hook regenerates and stages those files automatically from `docs/`.
+Staging them manually makes the commit diff harder to review and can mask out-of-date hand edits.
+If you changed user-facing behavior, update the matching page under `docs/` and stage only `docs/**/*.md`.
+The hook does the rest.
```

As per coding guidelines: "One sentence per line in source", "Colons should only introduce a list", and "Unnecessary bold on routine instructions … flag as suggestions with the note 'LLM pattern detected.'"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CONTRIBUTING.md` at line 36, update the sentence about AI coding assistants to follow docs style: remove the bold formatting around the routine instruction, change the colon so it either introduces a list or is replaced with a period, and split the content so there is one sentence per source line; reference the exact text mentioning `.agents/skills/nemoclaw-user-*/` and `docs/` when making the change and add a short suggestion note "LLM pattern detected" (not bold) after the instruction to indicate it's a stylistic suggestion rather than a rule.

docs/reference/troubleshooting.md (1)
158-200: Reflow this section to one sentence per line and remove routine bold emphasis.The new prose is wrapped mid-sentence across multiple source lines, and bolding phrases like “inside WSL”, “Windows host”, and “mirrored” reads like routine emphasis rather than a warning. LLM pattern detected.
As per coding guidelines, "One sentence per line in source (makes diffs readable)" and "Bold is reserved for UI labels, parameter names, and genuine warnings."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/reference/troubleshooting.md` around lines 158 - 200, The "Local inference on WSL2 + Docker Desktop" section has wrapped sentences across lines and uses routine bolding (e.g., "**inside WSL**", "**Windows host**", "**mirrored**"); reflow every sentence so each ends on its own source line and remove bold from routine emphasis (leave bold only for UI labels/params/warnings like OPENAI_BASE_URL, host.openshell.internal, OLLAMA_HOST), preserving content and examples (ip commands, env vars, resolvedHostIp, NO_PROXY) and ensuring lists and bullet points remain one sentence per line for clearer diffs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 3405-3409: The call to registry.updateSandbox(...) is writing
using the temporary GATEWAY_NAME before the real sandbox exists (setupInference
runs before createSandbox), so its return is false and the data is lost; fix by
deferring the persistence until after the sandbox is registered (i.e., after
createSandbox completes inside onboard) or by caching the {model, provider,
resolvedHostIp} in session state and applying them when registry.createSandbox /
registry.updateSandbox is called for the real sandbox name; specifically modify
the flow around setupInference(), createSandbox(), and the
registry.updateSandbox call so you either move the registry.updateSandbox
invocation to post-createSandbox or add a session/cache write-read that
registry.updateSandbox consumes once the real sandbox entry exists.
- Around line 3324-3331: On validation failure in the local-provider probe
branches (the blocks that check validation.ok, call prompt(), and call
process.exit(1)), respect the global non-interactive flag instead of always
invoking prompt(): if the non-interactive mode flag (the same boolean used
elsewhere in the wizard, e.g., nonInteractive or flags.nonInteractive) is set
then log the validation.message and immediately call process.exit(1) (no
prompt), otherwise keep the existing interactive prompt flow; apply this same
guard to both places that call prompt() (the shown block and the other branch
around lines 3362-3369).
In `@src/lib/local-inference.ts`:
- Around line 85-96: The curl-based container reachability commands returned by
getLocalProviderContainerReachabilityCheck (cases "vllm-local" and
"ollama-local") lack timeout flags and can block on blackholed IPs; modify
getLocalProviderContainerReachabilityCheck to add the same timeout options used
by getOllamaProbeCommand (e.g., --max-time and optionally --connect-timeout) to
the returned command strings so each probe times out quickly and the caller loop
(the WSL2 IP candidate loop) can proceed promptly.
---
Nitpick comments:
In `@docs/CONTRIBUTING.md`:
- Line 36: Update the sentence about AI coding assistants to follow docs style:
remove the bold formatting around the routine instruction, change the colon so
it either introduces a list or is replaced with a period, and split the content
so there is one sentence per source line; reference the exact text mentioning
`.agents/skills/nemoclaw-user-*/` and `docs/` when making the change and add a
short suggestion note "LLM pattern detected" (not bold) after the instruction to
indicate it’s a stylistic suggestion rather than a rule.
In `@docs/reference/troubleshooting.md`:
- Around line 158-200: The "Local inference on WSL2 + Docker Desktop" section
has wrapped sentences across lines and uses routine bolding (e.g., "**inside
WSL**", "**Windows host**", "**mirrored**"); reflow every sentence so each ends
on its own source line and remove bold from routine emphasis (leave bold only
for UI labels/params/warnings like OPENAI_BASE_URL, host.openshell.internal,
OLLAMA_HOST), preserving content and examples (ip commands, env vars,
resolvedHostIp, NO_PROXY) and ensuring lists and bullet points remain one
sentence per line for clearer diffs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a21e145a-cafe-493c-b807-56b5bd3afcb8
📒 Files selected for processing (6)
- bin/lib/onboard.js
- docs/CONTRIBUTING.md
- docs/reference/troubleshooting.md
- src/lib/local-inference.test.ts
- src/lib/local-inference.ts
- src/lib/registry.ts
```diff
 if (!validation.ok) {
   console.error(` ${validation.message}`);
-  process.exit(1);
+  const answer = (await prompt(" Continue anyway? Inference may fail at runtime. [y/N]: "))
+    .trim()
+    .toLowerCase();
+  if (answer !== "y") {
+    process.exit(1);
+  }
 }
```
Keep local-provider probe failures non-interactive in --non-interactive mode.
These branches now call prompt() unconditionally on validation failure. If nemoclaw onboard --non-interactive selects ollama or vllm, a bad probe turns into a hang instead of the hard failure the rest of the wizard uses.
Suggested fix

```diff
 if (!validation.ok) {
   console.error(` ${validation.message}`);
+  if (isNonInteractive()) {
+    process.exit(1);
+  }
   const answer = (await prompt(" Continue anyway? Inference may fail at runtime. [y/N]: "))
     .trim()
     .toLowerCase();
   if (answer !== "y") {
     process.exit(1);
   }
 }
```

Apply the same guard in both local-provider branches.
Also applies to: 3362-3369
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bin/lib/onboard.js` around lines 3324 - 3331, On validation failure in the
local-provider probe branches (the blocks that check validation.ok, call
prompt(), and call process.exit(1)), respect the global non-interactive flag
instead of always invoking prompt(): if the non-interactive mode flag (the same
boolean used elsewhere in the wizard, e.g., nonInteractive or
flags.nonInteractive) is set then log the validation.message and immediately
call process.exit(1) (no prompt), otherwise keep the existing interactive prompt
flow; apply this same guard to both places that call prompt() (the shown block
and the other branch around lines 3362-3369).
```js
registry.updateSandbox(sandboxName, {
  model,
  provider,
  resolvedHostIp: resolvedHostIp || null,
});
```
This registry write never reaches the real sandbox entry.
setupInference() still runs before createSandbox(), and onboard() passes GATEWAY_NAME here (Line 4415), not the eventual sandbox name. That makes registry.updateSandbox(...) return false and silently drop model, provider, and resolvedHostIp for normal onboard runs.
Please move this persistence until after the sandbox has been registered, or store it in session state and apply it once the real sandbox entry exists.
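One way to implement the suggested deferral, as a hedged sketch. The identifiers below (`stageSandboxFields`, `flushSandboxFields`, `PendingSandboxFields`) are invented for illustration and are not the repository's actual API.

```typescript
// Hedged sketch: cache inference settings chosen before the sandbox entry
// exists, then apply them once createSandbox() has registered the real name.
type PendingSandboxFields = {
  model?: string;
  provider?: string;
  resolvedHostIp?: string | null;
};

const pendingFields: PendingSandboxFields = {};

// Called during setupInference(), before the sandbox entry exists.
function stageSandboxFields(fields: PendingSandboxFields): void {
  Object.assign(pendingFields, fields);
}

// Called after createSandbox() has registered the real entry.
function flushSandboxFields(
  updateSandbox: (name: string, fields: PendingSandboxFields) => boolean,
  sandboxName: string,
): boolean {
  if (Object.keys(pendingFields).length === 0) return true; // nothing staged
  return updateSandbox(sandboxName, { ...pendingFields });
}
```

With this shape, the `registry.updateSandbox` call at step 4/8 becomes a `stageSandboxFields` call, and the flush runs once with the real sandbox name after step 6/8.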
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bin/lib/onboard.js` around lines 3405 - 3409, The call to
registry.updateSandbox(...) is writing using the temporary GATEWAY_NAME before
the real sandbox exists (setupInference runs before createSandbox), so its
return is false and the data is lost; fix by deferring the persistence until
after the sandbox is registered (i.e., after createSandbox completes inside
onboard) or by caching the {model, provider, resolvedHostIp} in session state
and applying them when registry.createSandbox / registry.updateSandbox is called
for the real sandbox name; specifically modify the flow around setupInference(),
createSandbox(), and the registry.updateSandbox call so you either move the
registry.updateSandbox invocation to post-createSandbox or add a session/cache
write-read that registry.updateSandbox consumes once the real sandbox entry
exists.
```diff
 export function getLocalProviderContainerReachabilityCheck(
   provider: string,
   hostIp?: string,
 ): string | null {
+  const addHost = hostIp
+    ? `--add-host host.openshell.internal:${hostIp}`
+    : "--add-host host.openshell.internal:host-gateway";
   switch (provider) {
     case "vllm-local":
-      return `docker run --rm --add-host host.openshell.internal:host-gateway ${CONTAINER_REACHABILITY_IMAGE} -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
     case "ollama-local":
-      return `docker run --rm --add-host host.openshell.internal:host-gateway ${CONTAINER_REACHABILITY_IMAGE} -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, check if the file exists and get its total line count
if [ -f "src/lib/local-inference.ts" ]; then
  echo "File exists. Line count:"
  wc -l src/lib/local-inference.ts
else
  echo "File not found"
fi
```

Repository: NVIDIA/NemoClaw
Length of output: 1870

🏁 Script executed:

```shell
# Read the specific lines mentioned in the review (85-96 and 201-210)
echo "=== Lines 85-96 ==="
sed -n '85,96p' src/lib/local-inference.ts
echo ""
echo "=== Lines 201-210 ==="
sed -n '201,210p' src/lib/local-inference.ts
```

Repository: NVIDIA/NemoClaw
Length of output: 1185

🏁 Script executed:

```shell
# Check for SPDX license header at the top of the file
echo "=== First 10 lines (checking for SPDX header) ==="
head -10 src/lib/local-inference.ts
```

Repository: NVIDIA/NemoClaw
Length of output: 508

🏁 Script executed:

```shell
# Search for all curl commands in the file to understand the context
echo "=== All curl/container reachability commands in the file ==="
rg "curl|getLocalProviderContainerReachabilityCheck" src/lib/local-inference.ts -A 3
```

Repository: NVIDIA/NemoClaw
Length of output: 1505
Add timeout flags to container reachability probe commands.
The getLocalProviderContainerReachabilityCheck() function returns curl commands without timeout flags. When this function is called in a loop for multiple WSL2 host IP candidates (lines 201-210), a blackholed IP can block progress for minutes before attempting the next candidate. This is inconsistent with getOllamaProbeCommand() in the same file, which already includes --max-time.
Suggested fix

```diff
 export function getLocalProviderContainerReachabilityCheck(
   provider: string,
   hostIp?: string,
 ): string | null {
   const addHost = hostIp
     ? `--add-host host.openshell.internal:${hostIp}`
     : "--add-host host.openshell.internal:host-gateway";
   switch (provider) {
     case "vllm-local":
-      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 --connect-timeout 2 --max-time 5 -sf http://host.openshell.internal:8000/v1/models 2>/dev/null`;
     case "ollama-local":
-      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;
+      return `docker run --rm ${addHost} ${CONTAINER_REACHABILITY_IMAGE} -4 --connect-timeout 2 --max-time 5 -sf http://host.openshell.internal:11434/api/tags 2>/dev/null`;
     default:
       return null;
   }
 }
```

Also applies to: 201-210
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/local-inference.ts` around lines 85 - 96, The curl-based container
reachability commands returned by getLocalProviderContainerReachabilityCheck
(cases "vllm-local" and "ollama-local") lack timeout flags and can block on
blackholed IPs; modify getLocalProviderContainerReachabilityCheck to add the
same timeout options used by getOllamaProbeCommand (e.g., --max-time and
optionally --connect-timeout) to the returned command strings so each probe
times out quickly and the caller loop (the WSL2 IP candidate loop) can proceed
promptly.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/reference/troubleshooting.md`:
- Around line 255-257: The fenced code block containing "```bash" and "ip route
show default | awk '/default/ {print $3}'" needs a blank line immediately before
the opening ``` and a blank line immediately after the closing ``` to satisfy
MD031 (blanks-around-fences); update the nested fenced block in the markdown
list so there is an empty line above and below the triple-backtick fence.
- Around line 176-179: The documentation currently states that the first
successful container-side probe is definitively "injected into both
`OPENAI_BASE_URL` and the reachability check, and persisted to the sandbox
registry entry as `resolvedHostIp`"; change this absolute language to reflect
best-effort behavior by replacing claims of guaranteed persistence with phrases
like "attempts to persist" or "is attempted to be persisted" for
`resolvedHostIp`, and clarify that `OPENAI_BASE_URL` and the reachability check
receive the candidate when the probe succeeds, noting this occurs during the
onboarding flow and that persistence ordering/guarantees are not strict.
- Around line 192-200: The CLI examples currently use `powershell`/`bash` fenced
blocks without the required `$` prompt; update the fenced code blocks to use the
`console` language tag and prefix each command line with a `$` prompt (e.g., for
the PowerShell snippet replace the ```powershell block containing
[System.Environment]::SetEnvironmentVariable('OLLAMA_HOST'...) and the
Get-Process | Where-Object ... | Stop-Process -Force and ollama serve lines with
a ```console block and add `$ ` before each command), and apply the same
transformation to the other referenced blocks (lines 204-207, 211-215, 232-234,
238-241, 255-257) so all CLI examples follow the docs guideline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 59fd9dd1-922e-41f4-974d-b4c09699f1ac
📒 Files selected for processing (1)
docs/reference/troubleshooting.md
> The first candidate whose container-side probe succeeds is injected into both `OPENAI_BASE_URL` and the reachability check, and persisted to the sandbox registry entry as `resolvedHostIp`. No manual override is needed for either Ollama placement.
Avoid absolute wording about resolvedHostIp persistence.
This states persistence as guaranteed, but current behavior is best-effort in onboarding flow and has a known persistence-ordering follow-up. Please soften this to "attempts to persist" to avoid misleading troubleshooting expectations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/reference/troubleshooting.md` around lines 176 - 179, The documentation
currently states that the first successful container-side probe is definitively
"injected into both `OPENAI_BASE_URL` and the reachability check, and persisted
to the sandbox registry entry as `resolvedHostIp`"; change this absolute
language to reflect best-effort behavior by replacing claims of guaranteed
persistence with phrases like "attempts to persist" or "is attempted to be
persisted" for `resolvedHostIp`, and clarify that `OPENAI_BASE_URL` and the
reachability check receive the candidate when the probe succeeds, noting this
occurs during the onboarding flow and that persistence ordering/guarantees are
not strict.
```powershell
# Persist across reboots; Machine scope so services also inherit it.
[System.Environment]::SetEnvironmentVariable('OLLAMA_HOST','0.0.0.0:11434','Machine')

# Stop Ollama (tray + server) and start it in a new shell so it picks
# up the new env var. Open a NEW PowerShell window first, then:
Get-Process | Where-Object { $_.ProcessName -like 'ollama*' } | Stop-Process -Force
ollama serve
```
Use console fenced blocks with $ prompts for CLI commands.
These CLI examples are tagged as powershell/bash, but the docs rule requires console blocks with $ prompt prefixes for command examples.
Suggested formatting adjustment

````diff
-```powershell
+```console
+$ [System.Environment]::SetEnvironmentVariable('OLLAMA_HOST','0.0.0.0:11434','Machine')
 ...
-Get-Process | Where-Object { $_.ProcessName -like 'ollama*' } | Stop-Process -Force
-ollama serve
+$ Get-Process | Where-Object { $_.ProcessName -like 'ollama*' } | Stop-Process -Force
+$ ollama serve
````

As per coding guidelines, "CLI code blocks must use the `console` language tag with `$` prompt prefix. Flag ```bash or ```shell for CLI examples."

Also applies to: 204-207, 211-215, 232-234, 238-241, 255-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @docs/reference/troubleshooting.md around lines 192 - 200, The CLI examples
currently use powershell/bash fenced blocks without the required $ prompt;
update the fenced code blocks to use the console language tag and prefix each
command line with a $ prompt (e.g., for the PowerShell snippet replace the
[System.Environment]::SetEnvironmentVariable('OLLAMA_HOST'...) and the
Get-Process | Where-Object ... | Stop-Process -Force and ollama serve lines with
a ```console block and add `$ ` before each command), and apply the same
transformation to the other referenced blocks (lines 204-207, 211-215, 232-234,
238-241, 255-257) so all CLI examples follow the docs guideline.

```bash
ip route show default | awk '/default/ {print $3}'
```
Add blank lines around the nested fenced code block.
This block trips MD031 (blanks-around-fences) in the list item. Insert blank lines before and after the fence to satisfy markdownlint and keep rendering stable.
As per coding guidelines, "Follow style guide in docs/CONTRIBUTING.md for documentation."
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 255-255: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 257-257: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/reference/troubleshooting.md` around lines 255 - 257, The fenced code
block containing "```bash" and "ip route show default | awk '/default/ {print
$3}'" needs a blank line immediately before the opening ``` and a blank line
immediately after the closing ``` to satisfy MD031 (blanks-around-fences);
update the nested fenced block in the markdown list so there is an empty line
above and below the triple-backtick fence.
Does this work for both WSL-hosted Ollama and Windows-hosted Ollama? I wonder if the reporter was running Ollama on Windows; if not, the expected binding for Ollama on WSL is 127.0.0.1 (the default, with no binding specification needed), not 0.0.0.0.
| "Local Ollama is responding on localhost, but the container reachability check failed for http://host.openshell.internal:11434.\n" + | ||
| " Common causes:\n" + | ||
| " • Ollama is bound to 127.0.0.1 — set OLLAMA_HOST=0.0.0.0:11434\n" + | ||
| " • Docker Desktop on WSL2 resolves host-gateway to IPv6 — try installing Docker Engine natively in WSL2\n" + |
A prerequisite for running NemoClaw on Windows is Docker Desktop.
The premise about `host.openshell.internal` resolving to IPv6 / un-routable IPs seems incorrect or outdated. It's set as a hostAlias and always resolves to the Docker host-gateway IPv4 address (e.g. 172.29.0.254; "my-assistant" is my sandbox's name). I then verified the full chain: gateway container → host-gateway IP → Windows host → wslrelay → Ollama. It works end-to-end: the IP is routable and Ollama responds.

The actual root cause is conditional; it only manifests when Ollama binds to a dual-stack socket. WSL2's wslrelay.exe doesn't forward dual-stack (AF_INET6) sockets (microsoft/WSL#4851). Go's `net.Listen("tcp", "0.0.0.0")` creates a dual-stack socket, so Ollama with `OLLAMA_HOST=0.0.0.0` binds to `*:11434` (AF_INET6) rather than `0.0.0.0:11434` (AF_INET). The relay ignores it, and the connection blackholes.

Re: the reporter (#1472), they're on OpenShell 0.0.14; I've verified on OpenShell 0.0.26. Also, as mentioned in my previous comment, the reporter seems to be binding the Ollama host to 0.0.0.0, which causes the dual-stack socket issue above.

There's a Windows setup document that was added recently (https://github.com/NVIDIA/NemoClaw/blob/main/docs/get-started/windows-setup.md) which also covers how to set up local inference with Ollama. Kindly verify whether the issue is reproducible with the latest nemoclaw and openshell versions, following the installation guide.
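To separate this dual-stack blackhole from a plain firewall problem, a small diagnostic sketch that forces an IPv4 connection the way `curl -4` does may help. This is not part of the PR; the host and port are whatever your Ollama endpoint is.

```typescript
// Hedged diagnostic sketch: attempt a TCP connect over IPv4 only.
// If this times out while an IPv6 connection to the same endpoint succeeds,
// the listener is likely dual-stack-only behind wslrelay (the failure mode
// described above).
import * as net from "node:net";

function probeIpv4(host: string, port: number, timeoutMs = 2000): Promise<boolean> {
  return new Promise((resolve) => {
    const sock = net.connect({ host, port, family: 4 }); // IPv4 only
    const finish = (ok: boolean) => {
      sock.destroy();
      resolve(ok);
    };
    sock.setTimeout(timeoutMs, () => finish(false)); // blackholed: no RST, just silence
    sock.once("connect", () => finish(true));
    sock.once("error", () => finish(false)); // refused / unreachable
  });
}
```

For example, `probeIpv4("172.29.0.254", 11434)` against the host-gateway IP observed from the gateway container (that address is only the example value from the comment above).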
Friendly AI-generated maintainer note: Thanks for the WSL2 follow-up. I took a pass through the current branch and I need these blockers addressed before I can move it forward:
After that update lands, I can re-run the gate check.
Summary
On WSL2 with Docker Desktop, `host.openshell.internal` (via Docker's `host-gateway`) resolves to an unreachable IPv6 ULA or un-routable gateway IP. That breaks:

- the onboard container reachability check (step 4/8 hard-exits)
- runtime inference routing (the proxy cannot reach upstream Ollama)

This PR resolves a reachable IPv4 on WSL2 + Docker Desktop and uses it for both the `OPENAI_BASE_URL` written into the gateway and the container reachability probe. It now handles both Ollama/vLLM placements correctly:

- Server inside WSL — reached via the distro's eth0 IPv4 (`ip -4 -o route get 1.1.1.1` src).
- Server on the Windows host (NAT mode) — reached via the WSL2 default gateway (`ip -4 -o route show default`).

`detectWsl2HostIpCandidates()` returns an ordered, filtered list (drops loopback, link-local, and common Docker/k8s bridge ranges). `validateLocalProvider()` probes each candidate with the container reachability check and returns the first that answers as `resolvedHostIp`. `onboard.js` uses that winner, prints it, and tries to persist it to the sandbox registry. Non-WSL2 platforms are unchanged — `host.openshell.internal` remains the default.

Host-side prerequisites (for Windows-hosted Ollama)
When Ollama runs on the Windows host (not inside WSL), the fix only helps once the host itself is reachable from WSL. Run these before onboarding (the full step-by-step with shell labels lives in `docs/reference/troubleshooting.md`):

1. Bind Ollama to 0.0.0.0 via `OLLAMA_HOST` at Machine scope (PowerShell).
2. Allow inbound TCP 11434 in Windows Defender Firewall (PowerShell).
3. Switch WSL2 to mirrored networking mode (NAT mode hits `NATInboundRuleNotApplicable` on the Hyper-V firewall layer). Add `networkingMode=mirrored` under `[wsl2]` in `%USERPROFILE%\.wslconfig`, run `wsl --shutdown` from PowerShell, and reopen WSL.

This PR itself does not change any of the above — it handles the NemoClaw/Docker side once the host is reachable.
Linked issues
Addressed in this PR

- #1472 — `host.openshell.internal` unreachable on WSL2 + Docker Desktop; local Ollama/vLLM inference routing failed.
- #336 — sandbox cannot reach Windows-hosted Ollama (covers points 1 and 3 of the reporter's reproduction).

Related but not fixed here
Verification
End-to-end on WSL2 (mirrored networking) + Docker Desktop + Windows-hosted Ollama, with `qwen2.5-coder:3b`:

- Onboard prints `Resolved WSL2 host IP for container access: 10.125.212.156`.
- `openshell inference get` shows provider=ollama-local, model=qwen2.5-coder:3b, timeout=180s.
- `http://host.openshell.internal:11434/api/tags` returns the model list (#336 point 1 confirmed).

Unit tests: 43/43 pass in `src/lib/local-inference.test.ts`, covering candidate ordering, filtering (docker/loopback/link-local/IPv6), probe-and-pick fallback to the default gateway, the `resolvedHostIp` return value, and the non-WSL2 regression path.

Known follow-ups (tracked separately)
- `registry.updateSandbox(sandboxName, {resolvedHostIp, model, provider})` is a no-op at step 4/8 because the sandbox entry is created at step 6/8 (the same pre-existing bug affects `model` and `provider`).
- Onboard does not always return cleanly to the shell after step 8/8.

Type of Change
Testing
- New unit tests cover `detectWsl2HostIpCandidates`, the probe-and-pick fallback, and candidate filtering.
- `npm run typecheck:cli` passes.
- `vitest --project cli src/lib/local-inference.test.ts` — 43/43 pass.

🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features
Documentation
Tests