fix(sandbox): add managed loopback proxy#1501

Open
ericksoa wants to merge 2 commits into main from fix/managed-loopback-proxy

@ericksoa
Collaborator

@ericksoa ericksoa commented May 21, 2026

Summary

  • Add an OpenShell-managed sandbox-local loopback HTTP proxy URL exposed as OPENSHELL_LOOPBACK_PROXY_URL.
  • Start a lifecycle-bound listener inside the sandbox network namespace on the proxy port, defaulting to 127.0.0.1:3128, for clients that require a loopback proxy URL.
  • Accept loopback client sockets in the sandbox netns, then hand accepted sockets back to the supervisor runtime so policy, DNS, TLS handling, WebSocket text rewrite, credential resolution, and upstream TCP dialing remain in the normal OpenShell proxy path.
  • Keep HTTP_PROXY/HTTPS_PROXY behavior unchanged for ordinary proxy-aware clients.
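
As a rough illustration of the last two points, the sketch below adds the managed URL to a child environment map and leaves existing proxy variables alone. The helper name mirrors the unit test listed in the Test Plan, but its signature, the map type, and the `gateway.example` address are assumptions made for this example, not the code in this PR.

```rust
use std::collections::BTreeMap;

/// Variable name taken from the PR summary.
const LOOPBACK_PROXY_VAR: &str = "OPENSHELL_LOOPBACK_PROXY_URL";

/// Hypothetical helper: expose the managed loopback URL in a child
/// environment without modifying HTTP_PROXY or HTTPS_PROXY.
fn apply_loopback_proxy_env(env: &mut BTreeMap<String, String>, host: &str, port: u16) {
    env.insert(LOOPBACK_PROXY_VAR.to_string(), format!("http://{host}:{port}"));
}

fn main() {
    let mut env = BTreeMap::new();
    // Placeholder gateway address, used only for this illustration.
    env.insert("HTTP_PROXY".to_string(), "http://gateway.example:3128".to_string());
    env.insert("HTTPS_PROXY".to_string(), "http://gateway.example:3128".to_string());

    apply_loopback_proxy_env(&mut env, "127.0.0.1", 3128);

    // The existing proxy variables are printed unchanged next to the new one.
    for (key, value) in &env {
        println!("{key}={value}");
    }
}
```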

Methodology

  • Re-read the OpenShell proxy, network namespace, nftables bypass, process-identity, WebSocket rewrite, and child-env paths from the live PR branch.
  • Compared the design against similar industry patterns: gVisor's sandbox networking model, Envoy network-namespace-aware listeners, Istio sidecar explicit listener capture modes, Open Service Mesh iptables redirection, and Docker rootless namespace behavior.
  • Ran an adversarial review focused on lifecycle, netns boundaries, process identity, SSRF or policy bypass, shutdown races, port collisions, env semantics, WebSocket rewrite, TLS handling, and secret-safe logging.
  • Ran a combined OpenShell plus NemoClaw validation pass against the dependent NemoClaw PR using Colima, not Docker Desktop.

Findings Incorporated

  • The important design correction from adversarial review: the listener must accept inside the sandbox network namespace for loopback reachability and socket attribution, but upstream dialing must stay in the normal supervisor-side proxy path. The PR now accepts sockets in the sandbox netns and dispatches accepted sockets back to the host-side proxy runtime.
  • This keeps OpenShell as the security boundary: OPA policy, SSRF checks, TLS handling, L7/WebSocket handling, credential rewrite, and upstream TCP connect remain centralized in the existing proxy implementation.
  • The feature remains generic. There is no Discord-specific routing or credential logic in OpenShell.
  • Startup is fail-explicit for bind/startup failures, and the exposed URL is a separate OPENSHELL_LOOPBACK_PROXY_URL so existing proxy env behavior stays compatible.
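
A minimal sketch of what fail-explicit startup could look like, assuming a hypothetical `start_loopback_listener` helper that is not taken from this PR: a bind failure is returned to the caller with the address attached instead of being swallowed. The sketch binds in the current namespace and does not attempt any netns handling.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddrV4, TcpListener};

/// Hypothetical startup helper: bind the loopback listener, or return an
/// error that names the address so the caller can abort startup.
fn start_loopback_listener(port: u16) -> io::Result<TcpListener> {
    let addr = SocketAddrV4::new(Ipv4Addr::LOCALHOST, port);
    TcpListener::bind(addr).map_err(|e| {
        io::Error::new(e.kind(), format!("loopback proxy bind failed on {addr}: {e}"))
    })
}

fn main() {
    // Port 0 lets the OS pick a free port for the demo; the PR's default is 3128.
    let first = start_loopback_listener(0).expect("first bind should succeed");
    let port = first.local_addr().expect("bound socket has an address").port();

    // A second bind on the same port collides and surfaces as an explicit error.
    match start_loopback_listener(port) {
        Ok(_) => println!("unexpected: second bind succeeded"),
        Err(e) => println!("startup aborted: {e}"),
    }
}
```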

Combined Validation

  • Environment: Colima Docker runtime/socket, isolated temp HOME/XDG/npm state, gateway port 18080, dashboard port 18889, unique Docker network, and unique OpenShell Docker sandbox namespace.
  • OpenShell under test: PR head 083c0663187a3e93e60cd4d32b30053475cb0890, openshell 0.0.47-dev.3+gb75abad.
  • NemoClaw under test: dependent PR head dd022b578806aa3880b33ac2c3fe6a86f230ff33.
  • Docker supervisor under test: Linux arm64 openshell-sandbox built from this OpenShell PR and mounted through [openshell.drivers.docker].supervisor_bin.
  • Result: the combined messaging E2E proved the target behavior. M9b baked http://127.0.0.1:3128 into OpenClaw Discord config, M13d-config connected to the fake Discord Gateway using the proxy URL from openclaw.json, M13d completed the native WebSocket upgrade, M13f proved the fake Gateway received the host-side Discord token while the sandbox-visible IDENTIFY carried only the placeholder, and M13g proved an unregistered placeholder fails closed.
  • Scope note: the broad messaging script ended with 96 passed, 1 failed, 5 skipped. The single failure was S1, an existing Slack guard probe that hardcodes internal port 18789; this isolated run intentionally used NEMOCLAW_DASHBOARD_PORT=18889 to avoid a local port collision. That failure is outside the Discord loopback/WebSocket credential-rewrite path and S2 still proved the Slack guard caught invalid_auth without crashing the gateway.

Test Plan

  • cargo fmt
  • cargo fmt --all -- --check
  • git diff --check
  • cargo check -p openshell-sandbox
  • cargo clippy -p openshell-core -p openshell-sandbox --all-targets -- -D warnings
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo clippy --manifest-path e2e/rust/Cargo.toml --all-targets -- -D warnings
  • cargo test -p openshell-sandbox apply_loopback_proxy_env_exposes_managed_url_without_changing_proxy_vars --lib
  • cargo test -p openshell-sandbox websocket --lib
  • cargo test -p openshell-sandbox --test websocket_upgrade
  • Combined NemoClaw/OpenShell Colima E2E target proof: M9b, M13d-config, M13d, M13f, M13g passed using the OpenClaw Discord account proxy from generated config.

Note: mise run rust:format:check / mise run rust:lint are blocked locally because this checkout's mise.toml is not trusted on the machine, so the underlying task commands above were run directly.

@johntmyers
Collaborator

@ericksoa I see this maps back to something for Discord in NemoClaw but can you expand on the requirement a bit more and the use case?

@ericksoa
Collaborator Author

Good question. The OpenShell requirement here is not Discord-specific; Discord/OpenClaw is just the first concrete client that exposed the gap.

The generic use case is: a sandboxed app or SDK needs to use an HTTP CONNECT proxy, but the app will only accept a loopback proxy URL such as http://127.0.0.1:<port>. OpenShell already injects and owns the real gateway proxy path, but that address is not always accepted by application-level proxy config validators. In those cases we need a sandbox-local loopback adapter that still forwards into the normal OpenShell L7 proxy path, rather than making every integration ship its own little localhost proxy.
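
To make the validator constraint concrete, here is a rough sketch of the kind of check such an application might apply. The function is hypothetical and is not OpenClaw's actual validator; it accepts only `http://<loopback-ip>:<port>` and, as a simplifying assumption, rejects hostnames such as `localhost`.

```rust
use std::net::SocketAddr;

/// Hypothetical validator: accept only plain-HTTP proxy URLs whose
/// authority is a literal loopback IP address with a port.
fn is_loopback_proxy_url(url: &str) -> bool {
    let Some(rest) = url.strip_prefix("http://") else {
        return false;
    };
    // Tolerate a trailing slash, e.g. "http://127.0.0.1:3128/".
    let authority = rest.trim_end_matches('/');
    match authority.parse::<SocketAddr>() {
        Ok(addr) => addr.ip().is_loopback(),
        Err(_) => false,
    }
}

fn main() {
    for url in [
        "http://127.0.0.1:3128",
        "http://[::1]:3128",
        "http://10.0.0.1:3128",
        "http://localhost:3128",
    ] {
        println!("{url} accepted: {}", is_loopback_proxy_url(url));
    }
}
```

A validator like this would reject a non-loopback gateway address, which is the gap the managed loopback listener is meant to close.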

The Discord case is a good example because OpenClaw's Discord Gateway client ignores HTTP_PROXY/HTTPS_PROXY and uses the per-account Discord proxy config instead. That config only accepts loopback proxy URLs. At the same time, the Gateway path is a WebSocket path where OpenShell must remain in the middle so policy, the allowed 101 upgrade, WEBSOCKET_TEXT, and credential rewrite for the IDENTIFY payload all still happen centrally in OpenShell. Without this managed loopback listener, NemoClaw has to provide a temporary transport shim just to adapt 127.0.0.1:<port> back to the OpenShell proxy.

So the intended OpenShell contract is:

  • expose a lifecycle-bound sandbox-local proxy URL, currently via OPENSHELL_LOOPBACK_PROXY_URL;
  • bind it only on loopback inside the sandbox network namespace;
  • forward accepted sockets into the existing OpenShell proxy implementation;
  • keep policy, TLS handling, WebSocket handling, credential rewrite, and upstream dialing in the normal OpenShell path;
  • avoid any Discord-specific logic in OpenShell.

That gives clients with strict localhost proxy requirements a standard OpenShell affordance, while keeping the security boundary and protocol handling in one place.
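
The accept-then-hand-off shape of that contract can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's implementation: the acceptor is an ordinary thread rather than one running in the sandbox network namespace, the channel stands in for whatever dispatch mechanism the supervisor uses, and `handle_client` is a stub that answers `CONNECT` without applying policy or dialing upstream.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;

/// Accept loopback clients and forward each accepted socket over a channel.
/// Only accepting happens here; no DNS lookup or upstream dial.
fn spawn_acceptor(listener: TcpListener, handoff: mpsc::Sender<TcpStream>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for stream in listener.incoming() {
            match stream {
                // Stop accepting once the receiving side has gone away.
                Ok(socket) => {
                    if handoff.send(socket).is_err() {
                        break;
                    }
                }
                Err(_) => break,
            }
        }
    })
}

/// Stub for the proxy runtime: reads the request head and replies.
/// A real proxy would apply policy and dial upstream at this point.
fn handle_client(mut client: TcpStream) -> io::Result<()> {
    let mut buf = [0u8; 1024];
    let n = client.read(&mut buf)?;
    let head = String::from_utf8_lossy(&buf[..n]);
    let reply = if head.starts_with("CONNECT ") {
        "HTTP/1.1 200 Connection Established\r\n\r\n"
    } else {
        "HTTP/1.1 405 Method Not Allowed\r\n\r\n"
    };
    client.write_all(reply.as_bytes())
}

/// Wire the two halves together and send one CONNECT request through them.
fn run_demo() -> io::Result<String> {
    // Port 0 picks a free port for the demo; the PR's default is 3128.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let (tx, rx) = mpsc::channel();
    let _acceptor = spawn_acceptor(listener, tx);
    let worker = thread::spawn(move || {
        if let Ok(client) = rx.recv() {
            let _ = handle_client(client);
        }
    });

    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"CONNECT example.com:443 HTTP/1.1\r\nHost: example.com:443\r\n\r\n")?;
    let mut reply = String::new();
    client.read_to_string(&mut reply)?; // the stub closes after replying
    let _ = worker.join();
    Ok(reply)
}

fn main() -> io::Result<()> {
    print!("{}", run_demo()?);
    Ok(())
}
```

Keeping the acceptor free of any outbound logic is the point of the split: everything after the hand-off runs wherever the receiving end of the channel lives.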

@ericksoa
Collaborator Author

For the concrete downstream context, the dependent NemoClaw PR is NVIDIA/NemoClaw#4005.

That PR consumes this OpenShell affordance by preferring OPENSHELL_LOOPBACK_PROXY_URL when generating the OpenClaw Discord account config, while keeping NemoClaw's existing helper as a compatibility fallback until NemoClaw can raise its minimum OpenShell pin past the release that includes this feature.
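
The preference order described here could be sketched roughly as follows. The function name and the fallback URL are placeholders invented for this example and are not taken from NemoClaw; only the environment variable name comes from this PR.

```rust
/// Hypothetical selection logic: prefer the OpenShell-managed URL when it is
/// set and non-empty, otherwise fall back to the compatibility helper's URL.
fn select_proxy_url(managed: Option<&str>, fallback: &str) -> String {
    match managed.map(str::trim) {
        Some(url) if !url.is_empty() => url.to_string(),
        _ => fallback.to_string(),
    }
}

fn main() {
    let managed = std::env::var("OPENSHELL_LOOPBACK_PROXY_URL").ok();
    // Placeholder address for the compatibility helper.
    let chosen = select_proxy_url(managed.as_deref(), "http://127.0.0.1:3129");
    println!("proxy for generated config: {chosen}");
}
```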

@ericksoa ericksoa self-assigned this May 21, 2026
@johntmyers
Collaborator

Ok, understood on the use case. So the listener is placed on the sandbox loopback, but after setns() is called the thread stays there, so the DNS lookup and upstream dial are attempted from inside the restricted sandbox-side namespace. I would expect the nftables rules to see that as direct outbound traffic and block it.

The better implementation is to keep only the loopback listener in the sandbox network namespace and then hand the accepted client socket back to code that is running in the supervisor network namespace before DNS, policy handling, credential rewrite, and upstream dialing.
