fix(defaults): resolve host gateway via Docker for Colima/Rancher Desktop #432

Merged: OisinKyne merged 1 commit into main from fix/colima-host-resolution on May 7, 2026
@apham0001
Contributor

Summary

OllamaHostIPForBackend hardcoded the Docker Desktop magic gateway IP (192.168.65.254) on darwin+k3d. On Colima and Rancher Desktop the host gateway lives on a different bridge (e.g. 192.168.5.2 for Colima's default profile), so the rendered Ollama Endpoints pointed at an IP no pod could reach. The Service silently timed out on every LiteLLM call, and OpenClaw surfaced the failure upstream as the cryptic error:

provider rejected the request schema or tool payload

This PR resolves the host gateway by running a transient alpine:3 container that looks up host.docker.internal from inside Docker and prints the result. The lookup works the same on Docker Desktop, Colima, and Rancher Desktop because all three expose host.docker.internal inside containers, mapping it to whatever bridge IP their VM uses.
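
A minimal sketch of the in-container lookup, not the code from this PR. The names parseGetentHosts and lookupGatewayInContainer, the 30-second timeout, and the choice to accept only IPv4 addresses are assumptions made for illustration; the docker command itself is the one quoted in this description.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os/exec"
	"strings"
	"time"
)

// parseGetentHosts extracts the first IPv4 address from `getent hosts`
// output, whose lines look like "192.168.5.2  host.docker.internal".
// Restricting to IPv4 is an assumption of this sketch.
func parseGetentHosts(out string) (string, error) {
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if ip := net.ParseIP(fields[0]); ip != nil && ip.To4() != nil {
			return ip.String(), nil
		}
	}
	return "", fmt.Errorf("no IPv4 address in getent output %q", out)
}

// lookupGatewayInContainer (hypothetical helper) runs
// `docker run --rm [extraArgs] alpine:3 getent hosts host.docker.internal`
// and parses the printed address. The timeout value is an assumption that
// leaves room for a first-time image pull.
func lookupGatewayInContainer(ctx context.Context, extraArgs ...string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	args := append([]string{"run", "--rm"}, extraArgs...)
	args = append(args, "alpine:3", "getent", "hosts", "host.docker.internal")

	out, err := exec.CommandContext(ctx, "docker", args...).Output()
	if err != nil {
		return "", fmt.Errorf("docker run failed: %w", err)
	}
	return parseGetentHosts(string(out))
}
```

A caller could try lookupGatewayInContainer(ctx) first and, on error, retry with the extra argument "--add-host=host.docker.internal:host-gateway". Keeping the parser separate from the docker invocation lets it be unit-tested without a running Docker daemon.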

Resolution order on darwin+k3d:

  1. Docker-based lookup (new): docker run --rm alpine:3 getent hosts host.docker.internal. If that fails, fall back to --add-host=host.docker.internal:host-gateway for stricter setups.
  2. Docker Desktop magic IP (192.168.65.254) as a last resort, so installs without a working docker CLI still behave as before.

Linux is unchanged (docker0/br-* bridge fallback).
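
The darwin+k3d ordering above can be sketched as a chain of lookups tried in sequence. This is an illustration under stated assumptions, not the merged implementation: gatewayLookup, resolveHostGateway, stubLookup, and errNoDocker are hypothetical names, and the real lookups would shell out to docker rather than use stubs.

```go
package main

import "errors"

// dockerDesktopGatewayIP is the Docker Desktop address quoted in this PR.
const dockerDesktopGatewayIP = "192.168.65.254"

// gatewayLookup is one attempt at finding the host gateway address.
type gatewayLookup func() (string, error)

// errNoDocker is a placeholder error used by stubs in this sketch.
var errNoDocker = errors.New("docker CLI unavailable")

// resolveHostGateway tries each lookup in order and returns the first
// non-empty address. If every lookup fails (or none is given), it returns
// the Docker Desktop address as the last resort.
func resolveHostGateway(lookups ...gatewayLookup) string {
	for _, lookup := range lookups {
		if ip, err := lookup(); err == nil && ip != "" {
			return ip
		}
	}
	return dockerDesktopGatewayIP
}

// stubLookup builds a lookup with a fixed result, standing in for the
// plain `docker run` attempt or the --add-host retry.
func stubLookup(ip string, err error) gatewayLookup {
	return func() (string, error) { return ip, err }
}
```

For example, resolveHostGateway(stubLookup("", errNoDocker), stubLookup("192.168.5.2", nil)) models a Colima host where the first attempt fails and the --add-host retry succeeds, while a call in which every lookup fails yields 192.168.65.254, preserving the old behaviour.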

Also adds a "Minimum local model size" note to the README. The stack relies heavily on tool-calling, and 1B–4B / *-coder Ollama variants either return raw JSON in content or hallucinate tool failures. Recommends llama3.1:8b / qwen3:8b / qwen2.5:7b (instruct) for the agent role.

Why this happened

net.LookupHost("host.docker.internal") runs on the host machine, where the name is not resolvable (it only exists inside Docker containers). The function therefore always fell through to DockerDesktopGatewayIP(). On Docker Desktop that IP happens to be correct; on Colima/Rancher it's wrong, and the cluster gets a black-hole Endpoint.
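
To make the failure mode concrete, here is a hedged reconstruction of the pre-fix control flow as described above. It is not the original function body: hostSideGateway and noSuchHost are hypothetical names, and the resolver is injected so the behaviour can be shown without depending on the machine's DNS.

```go
package main

import "net"

// dockerDesktopGatewayIP is the Docker Desktop address quoted in this PR.
const dockerDesktopGatewayIP = "192.168.65.254"

// hostSideGateway mirrors the described pre-fix flow: resolve
// host.docker.internal with a host-side resolver and fall through to the
// Docker Desktop address when the name cannot be resolved.
func hostSideGateway(lookupHost func(string) ([]string, error)) string {
	addrs, err := lookupHost("host.docker.internal")
	if err != nil || len(addrs) == 0 {
		return dockerDesktopGatewayIP
	}
	return addrs[0]
}

// defaultHostSideGateway wires in the real resolver, net.LookupHost.
func defaultHostSideGateway() string {
	return hostSideGateway(net.LookupHost)
}

// noSuchHost simulates what the PR describes on the host machine:
// the name exists only inside containers, so resolution fails.
func noSuchHost(host string) ([]string, error) {
	return nil, &net.DNSError{Err: "no such host", Name: host, IsNotFound: true}
}
```

With noSuchHost as the resolver, hostSideGateway always returns 192.168.65.254, whichever runtime is installed. That is correct only by coincidence on Docker Desktop and wrong on Colima and Rancher Desktop.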

Test plan

  • go test ./... (unit suite, full repo) — all green
  • go vet ./internal/defaults/... — clean
  • gofmt -l — clean
  • Smoke test on Colima (default profile, 4 CPU / 8 GiB, k3d backend, macOS Apple Silicon):
    • ResolveHostGatewayViaDocker() returns 192.168.5.2 in ~400ms
    • OllamaHostIPForBackend("k3d") returns 192.168.5.2
    • Rendered Endpoints becomes reachable from in-cluster pods
    • OpenClaw chat completes a full LiteLLM → Ollama → tool-call round-trip with llama3.1:8b
  • Verification on Docker Desktop pending — needs a reviewer with Docker Desktop to confirm the new code path returns 192.168.65.254 (the magic IP fallback is still in place, so behaviour should be identical even if Docker resolution fails)
  • Verification on Linux/Rancher Desktop pending

Notes

  • New container spawn at obol stack init adds ~400 ms of latency. Acceptable; only happens at init/refresh.
  • alpine:3 is auto-pulled if missing (~7 MB).
  • DockerDesktopGatewayIP() is preserved as a public symbol with an updated doc comment, since internal/stack/stack.go still re-exports it.
  • README addition is concise and kept in the "Model Providers" section to be visible during initial setup, where the choice matters most.

OisinKyne force-pushed the fix/colima-host-resolution branch from 1e32841 to 80665ea on May 6, 2026 15:01
@OisinKyne
Contributor

lgtm, won't merge pre 0.9.0 as idk how many non docker desktop people there are on mac. can go in after. we can also tip docker desktop users to increase their resources if they have the defaults which are too tight

apham0001 marked this pull request as ready for review on May 6, 2026 15:10
…ktop

OllamaHostIPForBackend hardcoded the Docker Desktop magic gateway IP
(192.168.65.254) on darwin+k3d. On Colima and Rancher Desktop the host
gateway lives on a different bridge (e.g. 192.168.5.2 for Colima's
default profile), so the rendered Endpoints object pointed at an IP no
pod could reach. The Ollama Service silently timed out for every
LiteLLM call, surfaced upstream as "provider rejected the request
schema or tool payload" in OpenClaw.

Replace the hardcoded fallback with a Docker-based lookup that resolves
host.docker.internal from inside a transient alpine container. All
three macOS Docker runtimes (Docker Desktop, Colima, Rancher Desktop)
expose host.docker.internal in containers and map it to whatever
gateway their VM is using, so the same resolution works everywhere.
Falls back to --add-host=host.docker.internal:host-gateway and finally
to the Docker Desktop magic IP for offline / docker-CLI-missing setups.

Also add a "Minimum local model size" note to the README explaining
that 1B-4B and *-coder Ollama variants do not handle the structured
tool-calling channel reliably and recommending llama3.1:8b /
qwen3:8b / qwen2.5:7b for the agent role.

Tested on Colima (default profile, 4 CPU 8GiB, k3d backend): the new
resolver returns 192.168.5.2 in ~400ms, the rendered Ollama Endpoints
becomes reachable from in-cluster pods, and OpenClaw chat goes through
the full LiteLLM -> Ollama -> tool-call round trip.

Docker Desktop verification still pending (someone with Docker Desktop
needed to confirm host.docker.internal resolution returns the expected
192.168.65.254 via the new code path).

OisinKyne force-pushed the fix/colima-host-resolution branch from 80665ea to 95e63f0 on May 7, 2026 11:04
OisinKyne enabled auto-merge (rebase) on May 7, 2026 11:05
OisinKyne merged commit aa9744e into main on May 7, 2026
5 checks passed
OisinKyne deleted the fix/colima-host-resolution branch on May 9, 2026 17:51