
refactor(inference): simplify routing — introduce inference.local, remove implicit catch-all #146

Merged
pimlock merged 2 commits into main from feat/133-inference-local-simplification/pimlock on Mar 6, 2026

pimlock (Collaborator) commented Mar 6, 2026

Summary

Closes #133

Simplifies the inference routing system from a multi-route CRUD model to a single managed cluster route (inference.local). This is a large architectural cleanup that removes ~1,750 lines of dead/unnecessary code while making the system easier to reason about.

Key changes

  • Remove inference route CRUD: Delete Create/Update/Delete/ListInferenceRoutes RPCs, CLI commands, and all supporting code. Cluster inference is now configured solely via nemoclaw cluster inference set/get/update.
  • Remove InspectForInference OPA action: Policy evaluation is now binary allow/deny. Inference routing is triggered explicitly by CONNECT to inference.local, not by an implicit catch-all on unmatched connections.
  • Provider-agnostic router: Introduce AuthHeader enum and InferenceProviderProfile in navigator-core. The router no longer checks provider types — auth style (Bearer vs custom header) and default headers are carried on ResolvedRoute.
  • Proto cleanup: Rename routing_hint → name, SandboxResolvedRoute → ResolvedRoute, GetSandboxInferenceBundle → GetInferenceBundle. Replace InferenceRouteSpec (8 fields, 6 always empty) with ClusterInferenceConfig (2 fields: provider_name, model_id). Drop the unused sandbox_id parameter from the bundle request.
  • Dead code removal: Delete stale checked-in navigator.inference.v1.rs (~800 lines), remove InferencePolicy/InferenceApiPattern from proto and policy crate, clean up OPA Rego rules.
  • Add nemoclaw cluster inference update: New CLI command for partial updates (get-then-set) — change provider or model without re-specifying the full config.
  • Architecture docs: Updated sandbox.md, README.md, gateway.md, inference-routing.md, security-policy.md to reflect new design.
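To make the provider-agnostic router concrete, here is a sketch of how a two-field ClusterInferenceConfig could be resolved into a ResolvedRoute that carries the auth style, so the router only matches on AuthHeader and never on a provider type. The type names AuthHeader, InferenceProviderProfile, ResolvedRoute, and ClusterInferenceConfig come from this PR; their fields, the Bearer/Custom variants, the Provider record, and the resolve_route/auth_header helpers are assumptions for illustration, not the navigator-core API.

```rust
// Hypothetical sketch: field names and helpers are illustrative, not navigator-core.
use std::collections::HashMap;

/// How a provider expects its API key to be presented (assumed variants).
#[derive(Debug, Clone, PartialEq)]
pub enum AuthHeader {
    /// `authorization: Bearer <key>`
    Bearer,
    /// A provider-specific header name, e.g. `x-api-key: <key>`.
    Custom(String),
}

/// Per-provider-type defaults; the router itself never inspects the type.
#[derive(Debug, Clone)]
pub struct InferenceProviderProfile {
    pub base_url: String,
    pub auth: AuthHeader,
    pub default_headers: Vec<(String, String)>,
}

/// Stored provider record (assumed shape).
#[derive(Debug, Clone)]
pub struct Provider {
    pub provider_type: String,
    pub api_key: String,
}

/// The two-field cluster config.
#[derive(Debug, Clone)]
pub struct ClusterInferenceConfig {
    pub provider_name: String,
    pub model_id: String,
}

/// Everything the router needs, with no provider-type branching left.
#[derive(Debug, Clone)]
pub struct ResolvedRoute {
    pub name: String,
    pub base_url: String,
    pub model: String,
    pub api_key: String,
    pub auth: AuthHeader,
    pub default_headers: Vec<(String, String)>,
}

/// Look up the referenced provider and its profile; fails if either is missing.
pub fn resolve_route(
    config: &ClusterInferenceConfig,
    providers: &HashMap<String, Provider>,
    profiles: &HashMap<String, InferenceProviderProfile>,
) -> Result<ResolvedRoute, String> {
    let provider = providers
        .get(&config.provider_name)
        .ok_or_else(|| format!("provider '{}' not found", config.provider_name))?;
    let profile = profiles
        .get(&provider.provider_type)
        .ok_or_else(|| format!("no inference profile for type '{}'", provider.provider_type))?;
    Ok(ResolvedRoute {
        name: "inference.local".to_string(),
        base_url: profile.base_url.clone(),
        model: config.model_id.clone(),
        api_key: provider.api_key.clone(),
        auth: profile.auth.clone(),
        default_headers: profile.default_headers.clone(),
    })
}

/// Turn the route's auth style into a concrete (name, value) header pair.
pub fn auth_header(route: &ResolvedRoute) -> (String, String) {
    match &route.auth {
        AuthHeader::Bearer => ("authorization".to_string(), format!("Bearer {}", route.api_key)),
        AuthHeader::Custom(name) => (name.clone(), route.api_key.clone()),
    }
}
```

In this sketch a deleted provider surfaces as a resolution error, which is the failure mode discussed later in this conversation.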

Configuration flow

# 1. Create a provider with credentials
nemoclaw provider create --name nvidia-build --type nvidia --from-existing

# 2. Set cluster-level inference (provider + model)
nemoclaw cluster inference set --provider nvidia-build --model meta/llama-3.1-8b-instruct

# 3. Verify configuration
nemoclaw cluster inference get
# provider: nvidia-build
# model:    meta/llama-3.1-8b-instruct
# version:  1

# 4. Switch models without re-specifying the provider
nemoclaw cluster inference update --model meta/llama-3.3-70b-instruct

# 5. Switch to a different provider entirely
nemoclaw provider create --name openai-prod --type openai --from-existing
nemoclaw cluster inference set --provider openai-prod --model gpt-4o
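Step 4 relies on `update` being get-then-set. A minimal sketch of the merge step, assuming omitted flags keep their current values, an update with no flags is rejected, and an existing config is required (all three behaviors are assumptions, and merge_update is a hypothetical helper, not the CLI's code):

```rust
// Hypothetical sketch of `nemoclaw cluster inference update` merge semantics.

#[derive(Debug, Clone, PartialEq)]
pub struct ClusterInferenceConfig {
    pub provider_name: String,
    pub model_id: String,
}

/// Get-then-set: start from the stored config and override only the flags given.
pub fn merge_update(
    current: Option<&ClusterInferenceConfig>,
    provider: Option<&str>,
    model: Option<&str>,
) -> Result<ClusterInferenceConfig, String> {
    if provider.is_none() && model.is_none() {
        // Assumed behavior: an update with no flags is an error.
        return Err("nothing to update: pass --provider and/or --model".to_string());
    }
    // Assumed behavior: `update` needs an existing config; `set` creates one.
    let current = current.ok_or("no cluster inference configured; run `set` first")?;
    Ok(ClusterInferenceConfig {
        provider_name: provider
            .map(str::to_string)
            .unwrap_or_else(|| current.provider_name.clone()),
        model_id: model
            .map(str::to_string)
            .unwrap_or_else(|| current.model_id.clone()),
    })
}
```

The merged result would then be written back with the same path `set` uses.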

Using inference from a sandbox

Important: inference.local only works over HTTPS. The sandbox proxy intercepts
HTTPS CONNECT requests to inference.local and routes them to the configured backend.
Plain HTTP requests are not intercepted.
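A rough sketch of that dispatch rule, assuming the proxy decides from the request line alone: a CONNECT whose authority is inference.local is intercepted, any other CONNECT goes through the binary allow/deny policy, and a plain HTTP request is never treated as inference even if it names inference.local. Dispatch and classify are hypothetical names, and this sketch does not check the port.

```rust
// Hypothetical sketch of request-line dispatch in the sandbox proxy.

#[derive(Debug, PartialEq)]
pub enum Dispatch {
    /// CONNECT to inference.local: terminate TLS and route to the configured backend.
    InferenceIntercept,
    /// Any other CONNECT: evaluated by policy as plain allow/deny.
    PolicyTunnel { host: String, port: u16 },
    /// Non-CONNECT (plain HTTP): never handled as inference.
    NotIntercepted,
    /// Request line could not be parsed.
    Malformed,
}

pub fn classify(request_line: &str) -> Dispatch {
    let mut parts = request_line.split_whitespace();
    let (Some(method), Some(target)) = (parts.next(), parts.next()) else {
        return Dispatch::Malformed;
    };
    // HTTP method names are case-sensitive.
    if method != "CONNECT" {
        return Dispatch::NotIntercepted;
    }
    // CONNECT targets are authority-form: host:port.
    let Some((host, port)) = target.rsplit_once(':') else {
        return Dispatch::Malformed;
    };
    let Ok(port) = port.parse::<u16>() else {
        return Dispatch::Malformed;
    };
    if host.eq_ignore_ascii_case("inference.local") {
        Dispatch::InferenceIntercept
    } else {
        Dispatch::PolicyTunnel { host: host.to_string(), port }
    }
}
```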

# Inside any sandbox — no credentials needed, model is overridden by cluster config
curl https://inference.local/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# → routed to the configured provider with injected API key
# → "model" field is rewritten to the cluster-configured model

The proxy intercepts the CONNECT tunnel to inference.local, performs a TLS handshake internally, parses the HTTP request inside the tunnel, and forwards it to the configured upstream provider with the correct credentials and model override.
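The forwarding step can be sketched as a pure rewrite over an already-parsed inner request: drop any credentials the sandbox sent, inject the route's auth header, and replace the model with the cluster-configured one. InnerRequest, Route, Upstream, and rewrite_for_upstream are hypothetical; the JSON body is assumed to be decoded already, and base_url is assumed to be an origin without a path prefix.

```rust
// Hypothetical sketch: these types are illustrative, not the sandbox proxy's API.

/// Inner request parsed from inside the TLS tunnel (JSON body assumed decoded).
#[derive(Debug, Clone)]
pub struct InnerRequest {
    pub path: String,
    pub headers: Vec<(String, String)>,
    pub model: String,
}

/// Route fields needed for forwarding (assumed shape).
pub struct Route {
    pub base_url: String,              // assumed: origin only, no path prefix
    pub model: String,                 // cluster-configured model_id
    pub auth_header: (String, String), // e.g. ("authorization", "Bearer <key>")
}

/// What gets sent upstream.
pub struct Upstream {
    pub url: String,
    pub headers: Vec<(String, String)>,
    pub model: String,
}

pub fn rewrite_for_upstream(req: InnerRequest, route: &Route) -> Upstream {
    let (auth_name, auth_value) = &route.auth_header;
    // Drop sandbox-supplied credentials and the inference.local Host header.
    let mut headers: Vec<(String, String)> = req
        .headers
        .into_iter()
        .filter(|(name, _)| {
            !name.eq_ignore_ascii_case("authorization")
                && !name.eq_ignore_ascii_case(auth_name)
                && !name.eq_ignore_ascii_case("host")
        })
        .collect();
    // Inject the provider credential held by the proxy.
    headers.push((auth_name.clone(), auth_value.clone()));
    Upstream {
        url: format!("{}{}", route.base_url.trim_end_matches('/'), req.path),
        headers,
        // Whatever "model" the sandbox sent is ignored; the cluster config wins.
        model: route.model.clone(),
    }
}
```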

Plan

See the plan for more details.

Follow-up

Deferred items tracked in #148:

  • Default NVIDIA inference fallback when no cluster config is set
  • Inject inference.local instructions into sandboxes for agent discovery

Related issues

Closes #21
Closes #20

pimlock self-assigned this Mar 6, 2026
@pimlock pimlock changed the title refactor(inference): simplify routing — introduce inference.local, remove implicit catch-all [skip ci] refactor(inference): simplify routing — introduce inference.local, remove implicit catch-all Mar 6, 2026
pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch 3 times, most recently from f7153b2 to 5cdf9c4, March 6, 2026 07:47
pimlock (Collaborator, Author) commented Mar 6, 2026

One design question to think through: should provider lifecycle be coupled to cluster inference configuration?

Right now SetClusterInference stores only a reference to the provider. That means if the referenced provider is later deleted, inference requests will fail at bundle resolution / request time.

The alternative would be to snapshot provider details when inference is configured, but that has a downside too: provider fixes would stop propagating automatically. For example, if I create a provider with an invalid API key, then correct the key on the provider, I would also need to re-run nemoclaw cluster inference set for inference to start working again.

Curious whether we want to:

  1. prevent deletion of providers that are currently referenced by cluster inference,
  2. allow deletion but make the failure mode / UX explicit, or
  3. snapshot provider config into inference and accept the loss of automatic propagation.

Feels like option 1 or 2 is probably cleaner than snapshotting, but wanted to call out the tradeoff explicitly.


There is a similar consideration for providers used by sandboxes: once we are injecting things on the fly, does an update to a provider apply to sandboxes that are already running?
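Option 1 could look roughly like the following hypothetical check, which refuses deletion while cluster inference still references the provider. Store, DeleteError, and delete_provider are illustrative names only; the review discussion on this PR deferred referential integrity to a separate effort, so this is not what the PR implements.

```rust
// Hypothetical sketch of option 1: block deletion of a referenced provider.
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct ClusterInferenceConfig {
    pub provider_name: String,
    pub model_id: String,
}

#[derive(Debug, PartialEq)]
pub enum DeleteError {
    NotFound,
    /// The provider is referenced by cluster inference; explains why deletion failed.
    InUseByClusterInference { model_id: String },
}

/// Simplified store: provider name -> provider type, plus the cluster config.
pub struct Store {
    pub providers: HashMap<String, String>,
    pub cluster_inference: Option<ClusterInferenceConfig>,
}

impl Store {
    pub fn delete_provider(&mut self, name: &str) -> Result<(), DeleteError> {
        if !self.providers.contains_key(name) {
            return Err(DeleteError::NotFound);
        }
        if let Some(cfg) = &self.cluster_inference {
            if cfg.provider_name == name {
                return Err(DeleteError::InUseByClusterInference {
                    model_id: cfg.model_id.clone(),
                });
            }
        }
        self.providers.remove(name);
        Ok(())
    }
}
```

Because the check reads the live reference rather than a snapshot, provider fixes still propagate automatically, which is the property option 3 would give up.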

pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from 5cdf9c4 to ad8ac0f, March 6, 2026 07:57
pimlock requested review from drew and johntmyers March 6, 2026 07:58
pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from ad8ac0f to 021b11e, March 6, 2026 08:10
pimlock (Collaborator, Author) commented Mar 6, 2026

Migration skill (agent guide for upgrading existing clusters) is on a separate branch: https://github.com/NVIDIA/NemoClaw/blob/feat/migrate-inference-routing-skill/.agents/skills/migrate-inference-routing/SKILL.md

johntmyers (Collaborator)

This is great!

Re: your question:

prevent deletion of providers that are currently referenced by cluster inference,

I kind of lean that way.

Does this same problem exist for providers that are in use by a sandbox too?

Comment on lines +31 to +34
│ ├── inference
│ │ ├── set --provider --model
│ │ ├── update [--provider] [--model]
│ │ └── get
Collaborator

Somewhat arbitrarily starting a thread to discuss:

prevent deletion of providers that are currently referenced by cluster inference,

I'm inclined to punt on this for now and solve referential integrity across the platform as a separate effort. Could probably be a post-GTC thing.

Collaborator Author

From @johntmyers above

I kind of lean that way.
Does this same problem exist for providers that are in use by a sandbox too?

Will capture this as a ticket.

pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from 021b11e to 23456ec, March 6, 2026 17:03
…move implicit catch-all [skip ci]

Remove multi-route CRUD system and replace with single managed cluster
route (inference.local). Key changes:

- Remove inference route CRUD RPCs and CLI commands
- Remove InspectForInference OPA action; policy is binary allow/deny
- Introduce AuthHeader enum and InferenceProviderProfile in navigator-core
- Router is now provider-agnostic: auth style carried on ResolvedRoute
- Replace InferenceRouteSpec with ClusterInferenceConfig (2 fields vs 8)
- Rename proto: routing_hint->name, SandboxResolvedRoute->ResolvedRoute,
  GetSandboxInferenceBundle->GetInferenceBundle, drop sandbox_id param
- Rename RouteConfig.route -> RouteConfig.name; use inference.local
- Add 'nemoclaw cluster inference update' for partial config changes
- Delete stale navigator.inference.v1.rs checked-in proto file
- Update architecture docs, agent skills, and CLI reference

Closes #133
pimlock force-pushed the feat/133-inference-local-simplification/pimlock branch from 23456ec to e76a11a, March 6, 2026 21:34
pimlock merged commit e3ea796 into main Mar 6, 2026
10 checks passed
pimlock deleted the feat/133-inference-local-simplification/pimlock branch March 6, 2026 21:54
johntmyers added a commit that referenced this pull request Mar 6, 2026
The inference routing simplification in #146 reduced NetworkAction to
Allow/Deny, removing InspectForInference. Drop the dead match arm from
handle_forward_proxy.
johntmyers added a commit that referenced this pull request Mar 7, 2026
#158)

* feat(proxy): support plain HTTP forward proxy for private IP endpoints

Add forward proxy mode to the sandbox proxy so that standard HTTP
libraries (httpx, requests, etc.) work with HTTP_PROXY for plain HTTP
calls to private IP endpoints. Previously, non-CONNECT methods were
unconditionally rejected with 403.

The forward proxy path requires all three conditions to be met:
- OPA policy explicitly allows the destination
- The matched endpoint has allowed_ips configured
- All resolved IPs are RFC 1918 private

This ensures plain HTTP never reaches the public internet while enabling
seamless access to internal services without custom CONNECT tunnel code.
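The three-part gate above reduces to a small predicate. In this sketch (ForwardCheck and forward_permitted are hypothetical names), the OPA decision and the endpoint's allowed_ips are taken as inputs, and IPv6 addresses are rejected outright because RFC 1918 only defines IPv4 ranges; how the real proxy treats IPv6 is an assumption here.

```rust
// Hypothetical sketch of the forward-proxy private-IP gate.
use std::net::IpAddr;

/// Inputs to the forward-proxy decision (assumed shape).
pub struct ForwardCheck<'a> {
    pub opa_allowed: bool,
    /// The matched endpoint's allowed_ips; empty means "not configured".
    pub allowed_ips: &'a [String],
    /// Every address the destination host resolved to.
    pub resolved: &'a [IpAddr],
}

/// All three conditions must hold before plain HTTP is forwarded.
pub fn forward_permitted(check: &ForwardCheck) -> bool {
    // `all()` is vacuously true on an empty slice, so reject "no addresses" explicitly.
    if !check.opa_allowed || check.allowed_ips.is_empty() || check.resolved.is_empty() {
        return false;
    }
    // RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (IPv4 only).
    check.resolved.iter().all(|ip| match ip {
        IpAddr::V4(v4) => v4.is_private(),
        IpAddr::V6(_) => false,
    })
}
```

Requiring every resolved address to be private, rather than any one, keeps a DNS answer that mixes internal and public addresses from reaching the internet.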

Implementation:
- parse_proxy_uri(): parses absolute-form URIs into components
- rewrite_forward_request(): rewrites to origin-form, strips hop-by-hop
  headers, adds Via and Connection: close
- handle_forward_proxy(): full handler with OPA eval, SSRF checks,
  private-IP gate, upstream connect, and bidirectional relay
- Updated dispatch in handle_tcp_connection to route non-CONNECT methods
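The parse_proxy_uri and rewrite_forward_request steps listed above might look roughly like this. The function names come from the commit message, but the signatures, the http-only parsing, the exact hop-by-hop list, and the Via value are assumptions, so treat it as a sketch of the technique (absolute-form to origin-form plus hop-by-hop header stripping), not the actual handler.

```rust
// Hypothetical sketch: signatures and the Via value are illustrative.

/// Components of an absolute-form request target (http only in this sketch).
#[derive(Debug, PartialEq)]
pub struct ProxyUri {
    pub host: String,
    pub port: u16,
    pub path: String,
}

/// Parse `http://host[:port][/path]` into its parts.
pub fn parse_proxy_uri(uri: &str) -> Option<ProxyUri> {
    let rest = uri.strip_prefix("http://")?;
    let (authority, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, 80u16),
    };
    if host.is_empty() {
        return None;
    }
    Some(ProxyUri { host: host.to_string(), port, path: path.to_string() })
}

/// Headers that apply to a single connection and must not be forwarded.
/// Transfer-Encoding is left alone because this sketch relays the body bytes unchanged.
const HOP_BY_HOP: &[&str] = &[
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "proxy-connection", "te", "trailer", "upgrade",
];

/// Build an origin-form request head with hop-by-hop headers removed.
pub fn rewrite_forward_request(method: &str, uri: &ProxyUri, headers: &[(String, String)]) -> String {
    // Headers named in the Connection header are hop-by-hop too.
    let listed: Vec<String> = headers
        .iter()
        .filter(|h| h.0.eq_ignore_ascii_case("connection"))
        .flat_map(|h| h.1.split(',').map(|t| t.trim().to_ascii_lowercase()))
        .collect();
    let mut out = format!("{} {} HTTP/1.1\r\n", method, uri.path);
    for (name, value) in headers {
        let hop = HOP_BY_HOP.iter().any(|h| name.eq_ignore_ascii_case(h))
            || listed.contains(&name.to_ascii_lowercase());
        if !hop {
            out.push_str(&format!("{}: {}\r\n", name, value));
        }
    }
    out.push_str("Via: 1.1 sandbox-proxy\r\n"); // assumed pseudonym
    out.push_str("Connection: close\r\n\r\n");
    out
}
```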

Includes 14 unit tests and 6 E2E tests (FWD-1 through FWD-6).
CONNECT path remains completely untouched.

Closes #155

* fix(proxy): remove InspectForInference match arm removed by #146

The inference routing simplification in #146 reduced NetworkAction to
Allow/Deny, removing InspectForInference. Drop the dead match arm from
handle_forward_proxy.

* fix(sandbox): restore BestEffort as default Landlock compatibility

The Landlock V2 upgrade in #151 changed the default from BestEffort to
HardRequirement. This causes all proxy-mode sandboxes to crash with
Permission denied when the policy omits the landlock field, because the
child process gets locked to only /etc/navigator-tls and /sandbox.

Restore BestEffort as the default so policies without an explicit
landlock field degrade gracefully.
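A minimal sketch of the restored default, assuming the policy's landlock section and its compatibility field are both optional. LandlockCompatibility, LandlockPolicy, and effective_compatibility are hypothetical names loosely modeled on the compatibility levels of the Rust landlock crate.

```rust
// Hypothetical sketch: an omitted landlock field resolves to BestEffort.

#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum LandlockCompatibility {
    /// Tolerate missing kernel features instead of failing startup.
    #[default]
    BestEffort,
    /// Treat any unsupported feature as a hard error.
    HardRequirement,
}

/// The policy's landlock section (assumed shape).
pub struct LandlockPolicy {
    pub compatibility: Option<LandlockCompatibility>,
}

/// Missing section or missing field both fall back to the enum's default.
pub fn effective_compatibility(landlock: Option<&LandlockPolicy>) -> LandlockCompatibility {
    landlock.and_then(|l| l.compatibility).unwrap_or_default()
}
```

Putting the default on the enum itself keeps every code path that omits the field consistent, rather than repeating the fallback at each call site.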

Fixes #161

* fix(sandbox): inject baseline filesystem paths for proxy-mode sandboxes

Proxy-mode sandboxes need baseline filesystem paths (/usr, /lib, /etc,
/app, /var/log read-only; /sandbox, /tmp read-write) for the child
process to function under Landlock. Without these, the child can't exec
binaries, resolve DNS, or load shared libraries.

The supervisor now enriches the policy with these baseline paths at
startup, covering both standalone (file) and gateway (gRPC) modes. For
gateway mode, the enriched policy is synced back so users see the
effective policy via 'nemoclaw sandbox get'.

The gateway validation is relaxed to allow additive filesystem changes
(new paths can be added, existing paths cannot be removed) to support
the supervisor's enrichment sync-back.
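The enrichment and the relaxed additive-only validation can be sketched with ordered sets. FilesystemPolicy, enrich, and is_additive are hypothetical names; the baseline paths are the ones listed in the commit message. One edge case this sketch does not resolve: a user policy that lists /tmp as read-only would end up with /tmp in both sets.

```rust
// Hypothetical sketch of baseline path enrichment and additive-only validation.
use std::collections::BTreeSet;

#[derive(Debug, Clone, Default, PartialEq)]
pub struct FilesystemPolicy {
    pub read_only: BTreeSet<String>,
    pub read_write: BTreeSet<String>,
}

const BASELINE_RO: &[&str] = &["/usr", "/lib", "/etc", "/app", "/var/log"];
const BASELINE_RW: &[&str] = &["/sandbox", "/tmp"];

/// Add the baseline paths, keeping user entries; sets make this idempotent.
pub fn enrich(policy: Option<FilesystemPolicy>) -> FilesystemPolicy {
    let mut p = policy.unwrap_or_default();
    p.read_only.extend(BASELINE_RO.iter().map(|s| s.to_string()));
    p.read_write.extend(BASELINE_RW.iter().map(|s| s.to_string()));
    p
}

/// Additive-only: every path in `old` must still be in `new` with the same access.
pub fn is_additive(old: &FilesystemPolicy, new: &FilesystemPolicy) -> bool {
    old.read_only.is_subset(&new.read_only) && old.read_write.is_subset(&new.read_write)
}
```

Under this rule the supervisor's sync-back passes validation (it only adds paths), while an update that drops a path is still rejected.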

Includes 2 E2E tests: BFS-1 (missing filesystem_policy) and BFS-2
(incomplete filesystem_policy).

Fixes #161

* fix(e2e): update assertion for relaxed filesystem validation message

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
drew pushed a commit that referenced this pull request Mar 16, 2026
…move implicit catch-all (#146)
drew pushed a commit that referenced this pull request Mar 16, 2026
#158)