refactor(inference): simplify routing — introduce inference.local, remove implicit catch-all by pimlock · Pull Request #146 · NVIDIA/OpenShell

pimlock · 2026-03-06T07:11:29Z

Summary

Closes #133

Simplifies the inference routing system from a multi-route CRUD model to a single managed cluster route (inference.local). This is a large architectural cleanup that removes ~1,750 lines of dead/unnecessary code while making the system easier to reason about.

Key changes

Remove inference route CRUD: Delete Create/Update/Delete/ListInferenceRoutes RPCs, CLI commands, and all supporting code. Cluster inference is now configured solely via nemoclaw cluster inference set/get/update.
Remove InspectForInference OPA action: Policy evaluation is now binary allow/deny. Inference routing is triggered explicitly by CONNECT to inference.local, not by an implicit catch-all on unmatched connections.
Provider-agnostic router: Introduce AuthHeader enum and InferenceProviderProfile in navigator-core. The router no longer checks provider types — auth style (Bearer vs custom header) and default headers are carried on ResolvedRoute.
Proto cleanup: Rename routing_hint → name, SandboxResolvedRoute → ResolvedRoute, GetSandboxInferenceBundle → GetInferenceBundle. Replace InferenceRouteSpec (8 fields, 6 always empty) with ClusterInferenceConfig (2 fields: provider_name, model_id). Drop unused sandbox_id parameter from bundle request.
Dead code removal: Delete stale checked-in navigator.inference.v1.rs (~800 lines), remove InferencePolicy/InferenceApiPattern from proto and policy crate, clean up OPA Rego rules.
Add nemoclaw cluster inference update: New CLI command for partial updates (get-then-set) — change provider or model without re-specifying the full config.
Architecture docs: Updated sandbox.md, README.md, gateway.md, inference-routing.md, security-policy.md to reflect new design.
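A minimal sketch of what "provider-agnostic" could mean for the router: credential injection is driven only by data on the route, never by a provider-type check. `AuthHeader` and `ResolvedRoute` are names from this PR, but the variant shapes, the `api_key` and `default_headers` fields, the `outgoing_headers` helper, and the header-precedence rules are assumptions for illustration, not the navigator-core code.

```rust
use std::collections::BTreeMap;

/// Auth style carried on the route (variant shapes are assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum AuthHeader {
    /// Sends `authorization: Bearer <key>`.
    Bearer,
    /// Sends the raw key under a provider-specific header name (hypothetical, e.g. "x-api-key").
    Custom(String),
}

/// Hypothetical subset of ResolvedRoute; only `name` is confirmed by the PR.
#[derive(Debug, Clone)]
pub struct ResolvedRoute {
    pub name: String,
    pub api_key: String,
    pub auth: AuthHeader,
    pub default_headers: Vec<(String, String)>,
}

impl ResolvedRoute {
    /// Builds upstream headers without looking at the provider type.
    /// Header names are lowercased so lookups are case-insensitive.
    pub fn outgoing_headers(&self, client: &[(String, String)]) -> BTreeMap<String, String> {
        let mut headers = BTreeMap::new();
        for (k, v) in client {
            let k = k.to_ascii_lowercase();
            // Never forward credentials supplied from inside the sandbox.
            if k == "authorization" {
                continue;
            }
            headers.insert(k, v.clone());
        }
        // Provider defaults only fill headers the client did not send (assumed precedence).
        for (k, v) in &self.default_headers {
            headers.entry(k.to_ascii_lowercase()).or_insert_with(|| v.clone());
        }
        // Credential injection depends only on the route's auth style.
        match &self.auth {
            AuthHeader::Bearer => {
                headers.insert("authorization".to_string(), format!("Bearer {}", self.api_key));
            }
            AuthHeader::Custom(name) => {
                headers.insert(name.to_ascii_lowercase(), self.api_key.clone());
            }
        }
        headers
    }
}
```

Adding a new provider in this shape means adding a profile with the right `AuthHeader` value rather than a new branch in the router.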

Configuration flow

```shell
# 1. Create a provider with credentials
nemoclaw provider create --name nvidia-build --type nvidia --from-existing

# 2. Set cluster-level inference (provider + model)
nemoclaw cluster inference set --provider nvidia-build --model meta/llama-3.1-8b-instruct

# 3. Verify configuration
nemoclaw cluster inference get
# provider: nvidia-build
# model:    meta/llama-3.1-8b-instruct
# version:  1

# 4. Switch models without re-specifying the provider
nemoclaw cluster inference update --model meta/llama-3.3-70b-instruct

# 5. Switch to a different provider entirely
nemoclaw provider create --name openai-prod --type openai --from-existing
nemoclaw cluster inference set --provider openai-prod --model gpt-4o
```
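Step 4 relies on `update` being get-then-set. A sketch of the merge step, assuming a struct with the two `ClusterInferenceConfig` fields named in this PR (`provider_name`, `model_id`). `InferenceUpdate`, `merge_update`, and both error cases are hypothetical; the real CLI may behave differently when no config exists yet.

```rust
/// Mirrors the two fields named in the PR; the struct itself is a sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ClusterInferenceConfig {
    pub provider_name: String,
    pub model_id: String,
}

/// Hypothetical flags for `cluster inference update`; either may be omitted.
#[derive(Debug, Default)]
pub struct InferenceUpdate {
    pub provider: Option<String>,
    pub model: Option<String>,
}

/// Get-then-set: start from the current config and overlay only the flags
/// that were passed. The result would then be written back with a full `set`.
pub fn merge_update(
    current: Option<&ClusterInferenceConfig>,
    update: &InferenceUpdate,
) -> Result<ClusterInferenceConfig, String> {
    // Assumed: there is nothing to patch until `set` has run once.
    let current = current.ok_or("no cluster inference configured; run `set` first")?;
    if update.provider.is_none() && update.model.is_none() {
        return Err("nothing to update: pass --provider and/or --model".into());
    }
    Ok(ClusterInferenceConfig {
        provider_name: update
            .provider
            .clone()
            .unwrap_or_else(|| current.provider_name.clone()),
        model_id: update
            .model
            .clone()
            .unwrap_or_else(|| current.model_id.clone()),
    })
}
```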

Using inference from a sandbox

Important: inference.local only works over HTTPS. The sandbox proxy intercepts
HTTPS CONNECT requests to inference.local and routes them to the configured backend.
Plain HTTP requests are not intercepted.
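Because only HTTPS CONNECT requests to inference.local are intercepted, the decision can be made from the proxy request line alone. A sketch under assumptions: `Dispatch` and `classify` are hypothetical names, the port in the CONNECT target is ignored, and anything not intercepted is assumed to continue through the normal allow/deny policy path.

```rust
/// What the proxy does with an incoming request line (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    /// CONNECT to inference.local: terminate TLS locally and route to the cluster backend.
    InferenceLocal,
    /// Anything else continues through ordinary policy evaluation.
    PolicyCheck,
}

/// Classifies the first line of a proxy request, e.g. "CONNECT inference.local:443 HTTP/1.1".
pub fn classify(request_line: &str) -> Dispatch {
    let mut parts = request_line.split_whitespace();
    let (Some(method), Some(target)) = (parts.next(), parts.next()) else {
        return Dispatch::PolicyCheck;
    };
    // HTTP methods are case-sensitive. Plain HTTP requests such as
    // "GET http://inference.local/..." are not intercepted.
    if method != "CONNECT" {
        return Dispatch::PolicyCheck;
    }
    // CONNECT targets are authority-form (host:port); hostnames are case-insensitive.
    let host = target.rsplit_once(':').map_or(target, |(h, _)| h);
    if host.eq_ignore_ascii_case("inference.local") {
        Dispatch::InferenceLocal
    } else {
        Dispatch::PolicyCheck
    }
}
```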

```shell
# Inside any sandbox — no credentials needed, model is overridden by cluster config
curl https://inference.local/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# → routed to the configured provider with injected API key
# → "model" field is rewritten to the cluster-configured model
```

The proxy intercepts the CONNECT tunnel to inference.local, performs a TLS handshake internally, parses the HTTP request inside the tunnel, and forwards it to the configured upstream provider with the correct credentials and model override.
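The last step of that flow (forwarding with a model override) can be sketched as a pure rewrite of an already-parsed request. This skips the TLS handshake and JSON parsing entirely; `TunneledRequest`, `Upstream`, `Forwarded`, `rewrite_for_upstream`, and the `base_url` field are hypothetical, and credential headers would be added separately from the resolved route.

```rust
/// Hypothetical parsed form of the request found inside the CONNECT tunnel.
#[derive(Debug, Clone)]
pub struct TunneledRequest {
    pub method: String,
    pub path: String,  // origin-form, assumed to start with '/', e.g. "/v1/chat/completions"
    pub model: String, // the "model" field from the JSON body
}

/// Hypothetical subset of the route resolved from the cluster config.
#[derive(Debug, Clone)]
pub struct Upstream {
    pub base_url: String,
    pub model_id: String,
}

/// Outgoing request after the cluster route is applied.
#[derive(Debug, PartialEq)]
pub struct Forwarded {
    pub method: String,
    pub url: String,
    pub model: String,
    pub model_overridden: bool,
}

pub fn rewrite_for_upstream(req: &TunneledRequest, up: &Upstream) -> Forwarded {
    Forwarded {
        method: req.method.clone(),
        // Join base URL and path without producing a double slash.
        url: format!("{}{}", up.base_url.trim_end_matches('/'), req.path),
        // Whatever model the sandbox asked for, the cluster-configured model wins.
        model: up.model_id.clone(),
        model_overridden: req.model != up.model_id,
    }
}
```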

Plan

See the plan for more details.

Follow-up

Deferred items tracked in #148:

Default NVIDIA inference fallback when no cluster config is set
Inject inference.local instructions into sandboxes for agent discovery

Related issues

Closes #21
Closes #20

pimlock · 2026-03-06T07:50:01Z

One design question to think through: should provider lifecycle be coupled to cluster inference configuration?

Right now SetClusterInference stores only a reference to the provider. That means if the referenced provider is later deleted, inference requests will fail at bundle resolution / request time.

The alternative would be to snapshot provider details when inference is configured, but that has a downside too: provider fixes would stop propagating automatically. For example, if I create a provider with an invalid API key, then correct the key on the provider, I would also need to re-run nemoclaw cluster inference set for inference to start working again.

Curious whether we want to:

prevent deletion of providers that are currently referenced by cluster inference,
allow deletion but make the failure mode / UX explicit, or
snapshot provider config into inference and accept the loss of automatic propagation.

Feels like option 1 or 2 is probably cleaner than snapshotting, but wanted to call out the tradeoff explicitly.
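For concreteness, option 1 amounts to a referential check at delete time. This is only a sketch of that option (the thread below ends up deferring it). The in-memory `Store`, `DeleteError`, and `delete_provider` are hypothetical stand-ins, not the gateway's persistence layer.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum DeleteError {
    NotFound,
    InUseByClusterInference,
}

/// Hypothetical in-memory store: provider name -> provider type, plus the
/// provider name that cluster inference currently references (if any).
pub struct Store {
    pub providers: HashMap<String, String>,
    pub cluster_inference_provider: Option<String>,
}

impl Store {
    /// Option 1: refuse to delete a provider that cluster inference references.
    pub fn delete_provider(&mut self, name: &str) -> Result<(), DeleteError> {
        if !self.providers.contains_key(name) {
            return Err(DeleteError::NotFound);
        }
        if self.cluster_inference_provider.as_deref() == Some(name) {
            return Err(DeleteError::InUseByClusterInference);
        }
        self.providers.remove(name);
        Ok(())
    }
}
```

Because the config stores only a reference, fixing a provider's key still propagates automatically; the check only closes the dangling-reference failure mode.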

There is a similar consideration for providers that are used with sandboxes: once we are injecting things on the fly, is an update to a provider applied to any running sandboxes?

pimlock · 2026-03-06T08:12:43Z

Migration skill (agent guide for upgrading existing clusters) is on a separate branch: https://github.com/NVIDIA/NemoClaw/blob/feat/migrate-inference-routing-skill/.agents/skills/migrate-inference-routing/SKILL.md

johntmyers · 2026-03-06T16:12:34Z

This is great!

Re: your question:

prevent deletion of providers that are currently referenced by cluster inference,

I kind of lean that way.

Does this same problem exist for providers that are in use by a sandbox too?

drew · 2026-03-06T16:42:21Z

.agents/skills/nemoclaw-cli/cli-reference.md

```diff
+│   ├── inference
+│   │   ├── set --provider --model
+│   │   ├── update [--provider] [--model]
+│   │   └── get
```

somewhat arbitrarily starting a thread to discuss

prevent deletion of providers that are currently referenced by cluster inference,

I'm inclined to punt on this for now and solve referential integrity across the platform as a separate effort. It could probably be a post-GTC thing.

From @johntmyers above

I kind of lean that way.
Does this same problem exist for providers that are in use by a sandbox too?

Will capture this as a ticket.

…move implicit catch-all [skip ci]

Remove multi-route CRUD system and replace with single managed cluster route (inference.local).

Key changes:

- Remove inference route CRUD RPCs and CLI commands
- Remove InspectForInference OPA action; policy is binary allow/deny
- Introduce AuthHeader enum and InferenceProviderProfile in navigator-core
- Router is now provider-agnostic: auth style carried on ResolvedRoute
- Replace InferenceRouteSpec with ClusterInferenceConfig (2 fields vs 8)
- Rename proto: routing_hint->name, SandboxResolvedRoute->ResolvedRoute, GetSandboxInferenceBundle->GetInferenceBundle, drop sandbox_id param
- Rename RouteConfig.route -> RouteConfig.name; use inference.local
- Add 'nemoclaw cluster inference update' for partial config changes
- Delete stale navigator.inference.v1.rs checked-in proto file
- Update architecture docs, agent skills, and CLI reference

Closes #133

The inference routing simplification in #146 reduced NetworkAction to Allow/Deny, removing InspectForInference. Drop the dead match arm from handle_forward_proxy.
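Why the arm had to go: Rust matches must be exhaustive, and an arm naming a variant that no longer exists is a compile error. A sketch assuming the two-variant shape described here; the `gate` function and the 403 mapping for `Deny` are hypothetical illustrations.

```rust
/// After #146, policy evaluation yields only these two outcomes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkAction {
    Allow,
    Deny,
}

/// Hypothetical gate: Err carries the HTTP status returned to the client.
pub fn gate(action: NetworkAction) -> Result<(), u16> {
    match action {
        NetworkAction::Allow => Ok(()),
        NetworkAction::Deny => Err(403),
        // An arm such as `NetworkAction::InspectForInference => ...` would now
        // fail to compile, because the variant was removed from the enum.
    }
}
```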

#158)

* feat(proxy): support plain HTTP forward proxy for private IP endpoints

  Add forward proxy mode to the sandbox proxy so that standard HTTP libraries (httpx, requests, etc.) work with HTTP_PROXY for plain HTTP calls to private IP endpoints. Previously, non-CONNECT methods were unconditionally rejected with 403.

  The forward proxy path requires all three conditions to be met:
  - OPA policy explicitly allows the destination
  - The matched endpoint has allowed_ips configured
  - All resolved IPs are RFC 1918 private

  This ensures plain HTTP never reaches the public internet while enabling seamless access to internal services without custom CONNECT tunnel code.

  Implementation:
  - parse_proxy_uri(): parses absolute-form URIs into components
  - rewrite_forward_request(): rewrites to origin-form, strips hop-by-hop headers, adds Via and Connection: close
  - handle_forward_proxy(): full handler with OPA eval, SSRF checks, private-IP gate, upstream connect, and bidirectional relay
  - Updated dispatch in handle_tcp_connection to route non-CONNECT methods

  Includes 14 unit tests and 6 E2E tests (FWD-1 through FWD-6). CONNECT path remains completely untouched.

  Closes #155

* fix(proxy): remove InspectForInference match arm removed by #146

  The inference routing simplification in #146 reduced NetworkAction to Allow/Deny, removing InspectForInference. Drop the dead match arm from handle_forward_proxy.

* fix(sandbox): restore BestEffort as default Landlock compatibility

  The Landlock V2 upgrade in #151 changed the default from BestEffort to HardRequirement. This causes all proxy-mode sandboxes to crash with "Permission denied" when the policy omits the landlock field, because the child process gets locked to only /etc/navigator-tls and /sandbox. Restore BestEffort as the default so policies without an explicit landlock field degrade gracefully.

  Fixes #161

* fix(sandbox): inject baseline filesystem paths for proxy-mode sandboxes

  Proxy-mode sandboxes need baseline filesystem paths (/usr, /lib, /etc, /app, /var/log read-only; /sandbox, /tmp read-write) for the child process to function under Landlock. Without these, the child can't exec binaries, resolve DNS, or load shared libraries.

  The supervisor now enriches the policy with these baseline paths at startup, covering both standalone (file) and gateway (gRPC) modes. For gateway mode, the enriched policy is synced back so users see the effective policy via 'nemoclaw sandbox get'.

  The gateway validation is relaxed to allow additive filesystem changes (new paths can be added, existing paths cannot be removed) to support the supervisor's enrichment sync-back.

  Includes 2 E2E tests: BFS-1 (missing filesystem_policy) and BFS-2 (incomplete filesystem_policy).

  Fixes #161

* fix(e2e): update assertion for relaxed filesystem validation message

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
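The three-condition gate in the forward-proxy commit can be sketched as a single predicate. `MatchedEndpoint` and `may_forward_plain_http` are hypothetical names, the OPA decision is reduced to a boolean, and `allowed_ips` is modeled as opaque strings because only its presence matters here. RFC 1918 is IPv4-only, so this sketch rejects any IPv6 address.

```rust
use std::net::IpAddr;

/// Hypothetical view of the policy endpoint matched for the destination.
pub struct MatchedEndpoint {
    pub opa_allowed: bool,
    pub allowed_ips: Vec<String>, // empty means "not configured"
}

/// Plain-HTTP forwarding is permitted only when all three conditions hold.
pub fn may_forward_plain_http(ep: &MatchedEndpoint, resolved: &[IpAddr]) -> bool {
    // 1. OPA policy explicitly allows the destination.
    if !ep.opa_allowed {
        return false;
    }
    // 2. The matched endpoint has allowed_ips configured.
    if ep.allowed_ips.is_empty() {
        return false;
    }
    // 3. Every resolved address is RFC 1918 private (10/8, 172.16/12, 192.168/16).
    //    An empty resolution result is treated as a failure, not a vacuous pass.
    !resolved.is_empty()
        && resolved.iter().all(|ip| match ip {
            IpAddr::V4(v4) => v4.is_private(),
            IpAddr::V6(_) => false,
        })
}
```

Requiring *all* resolved addresses to be private, rather than any, keeps a hostname that resolves to a mix of private and public IPs from reaching the public internet.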


pimlock self-assigned this Mar 6, 2026

pimlock changed the title ~~refactor(inference): simplify routing — introduce inference.local, remove implicit catch-all [skip ci]~~ refactor(inference): simplify routing — introduce inference.local, remove implicit catch-all Mar 6, 2026

pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch 3 times, most recently from f7153b2 to 5cdf9c4, March 6, 2026 07:47

pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from 5cdf9c4 to ad8ac0f, March 6, 2026 07:57

pimlock requested review from drew and johntmyers March 6, 2026 07:58

pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from ad8ac0f to 021b11e, March 6, 2026 08:10

This was referenced Mar 6, 2026

Inference routing follow-ups: default NVIDIA fallback and sandbox agent instructions #148 (closed)

Inference route API keys exposed via ListInferenceRoutes #20 (closed)

Inference route API keys stored in plain object store #21 (closed)

drew reviewed Mar 6, 2026


drew approved these changes Mar 6, 2026


pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from 021b11e to 23456ec, March 6, 2026 17:03

johntmyers approved these changes Mar 6, 2026


pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from 23456ec to e76a11a, March 6, 2026 21:34

rename (24b55ea)

pimlock merged commit e3ea796 into main Mar 6, 2026
10 checks passed

pimlock deleted the feat/133-inference-local-simplification/pimlock branch March 6, 2026 21:54

pimlock mentioned this pull request Mar 6, 2026

Documentation for NemoClaw #129 (closed)

johntmyers mentioned this pull request Mar 6, 2026

feat(proxy): support plain HTTP forward proxy for private IP endpoints #158 (merged)

pimlock merged 2 commits into main from feat/133-inference-local-simplification/pimlock


3 participants
