Conversation
f7153b2 to
5cdf9c4
Compare
|
One design question to think through: should provider lifecycle be coupled to cluster inference configuration? Right now The alternative would be to snapshot provider details when inference is configured, but that has a downside too: provider fixes would stop propagating automatically. For example, if I create a provider with an invalid API key, then correct the key on the provider, I would also need to re-run Curious whether we want to:
Feels like option 1 or 2 is probably cleaner than snapshotting, but wanted to call out the tradeoff explicitly. There is a similar consideration for providers that are used with sandboxes -> once we are injecting things on the fly, is an update to a provider applied to any running sandboxes? |
5cdf9c4 to
ad8ac0f
Compare
ad8ac0f to
021b11e
Compare
|
Migration skill (agent guide for upgrading existing clusters) is on a separate branch: https://github.com/NVIDIA/NemoClaw/blob/feat/migrate-inference-routing-skill/.agents/skills/migrate-inference-routing/SKILL.md |
|
This is great! Re: your question:
I kind of lean that way. Does this same problem exist for providers that are in use by a sandbox too? |
| │ ├── inference | ||
| │ │ ├── set --provider --model | ||
| │ │ ├── update [--provider] [--model] | ||
| │ │ └── get |
There was a problem hiding this comment.
somewhat arbitrarily starting a thread to discuss
prevent deletion of providers that are currently referenced by cluster inference,
I'm inclined to punt on this for now, and then solve referential integrity across the platform as a separate effort. could probably be a post gtc thing.
There was a problem hiding this comment.
From @johntmyers above
I kind of lean that way.
Does this same problem exist for providers that are in use by a sandbox too?
Will capture this as a ticket.
021b11e to
23456ec
Compare
…move implicit catch-all [skip ci] Remove multi-route CRUD system and replace with single managed cluster route (inference.local). Key changes: - Remove inference route CRUD RPCs and CLI commands - Remove InspectForInference OPA action; policy is binary allow/deny - Introduce AuthHeader enum and InferenceProviderProfile in navigator-core - Router is now provider-agnostic: auth style carried on ResolvedRoute - Replace InferenceRouteSpec with ClusterInferenceConfig (2 fields vs 8) - Rename proto: routing_hint->name, SandboxResolvedRoute->ResolvedRoute, GetSandboxInferenceBundle->GetInferenceBundle, drop sandbox_id param - Rename RouteConfig.route -> RouteConfig.name; use inference.local - Add 'nemoclaw cluster inference update' for partial config changes - Delete stale navigator.inference.v1.rs checked-in proto file - Update architecture docs, agent skills, and CLI reference Closes #133
23456ec to
e76a11a
Compare
The inference routing simplification in #146 reduced NetworkAction to Allow/Deny, removing InspectForInference. Drop the dead match arm from handle_forward_proxy.
#158) * feat(proxy): support plain HTTP forward proxy for private IP endpoints Add forward proxy mode to the sandbox proxy so that standard HTTP libraries (httpx, requests, etc.) work with HTTP_PROXY for plain HTTP calls to private IP endpoints. Previously, non-CONNECT methods were unconditionally rejected with 403. The forward proxy path requires all three conditions to be met: - OPA policy explicitly allows the destination - The matched endpoint has allowed_ips configured - All resolved IPs are RFC 1918 private This ensures plain HTTP never reaches the public internet while enabling seamless access to internal services without custom CONNECT tunnel code. Implementation: - parse_proxy_uri(): parses absolute-form URIs into components - rewrite_forward_request(): rewrites to origin-form, strips hop-by-hop headers, adds Via and Connection: close - handle_forward_proxy(): full handler with OPA eval, SSRF checks, private-IP gate, upstream connect, and bidirectional relay - Updated dispatch in handle_tcp_connection to route non-CONNECT methods Includes 14 unit tests and 6 E2E tests (FWD-1 through FWD-6). CONNECT path remains completely untouched. Closes #155 * fix(proxy): remove InspectForInference match arm removed by #146 The inference routing simplification in #146 reduced NetworkAction to Allow/Deny, removing InspectForInference. Drop the dead match arm from handle_forward_proxy. * fix(sandbox): restore BestEffort as default Landlock compatibility The Landlock V2 upgrade in #151 changed the default from BestEffort to HardRequirement. This causes all proxy-mode sandboxes to crash with Permission denied when the policy omits the landlock field, because the child process gets locked to only /etc/navigator-tls and /sandbox. Restore BestEffort as the default so policies without an explicit landlock field degrade gracefully. Fixes #161 * fix(sandbox): inject baseline filesystem paths for proxy-mode sandboxes Proxy-mode sandboxes need baseline filesystem paths (/usr, /lib, /etc, /app, /var/log read-only; /sandbox, /tmp read-write) for the child process to function under Landlock. Without these, the child can't exec binaries, resolve DNS, or load shared libraries. The supervisor now enriches the policy with these baseline paths at startup, covering both standalone (file) and gateway (gRPC) modes. For gateway mode, the enriched policy is synced back so users see the effective policy via 'nemoclaw sandbox get'. The gateway validation is relaxed to allow additive filesystem changes (new paths can be added, existing paths cannot be removed) to support the supervisor's enrichment sync-back. Includes 2 E2E tests: BFS-1 (missing filesystem_policy) and BFS-2 (incomplete filesystem_policy). Fixes #161 * fix(e2e): update assertion for relaxed filesystem validation message --------- Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
…move implicit catch-all (#146) Remove multi-route CRUD system and replace with single managed cluster route (inference.local). Key changes: - Remove inference route CRUD RPCs and CLI commands - Remove InspectForInference OPA action; policy is binary allow/deny - Introduce AuthHeader enum and InferenceProviderProfile in navigator-core - Router is now provider-agnostic: auth style carried on ResolvedRoute - Replace InferenceRouteSpec with ClusterInferenceConfig (2 fields vs 8) - Rename proto: routing_hint->name, SandboxResolvedRoute->ResolvedRoute, GetSandboxInferenceBundle->GetInferenceBundle, drop sandbox_id param - Rename RouteConfig.route -> RouteConfig.name; use inference.local - Add 'nemoclaw cluster inference update' for partial config changes - Delete stale navigator.inference.v1.rs checked-in proto file - Update architecture docs, agent skills, and CLI reference Closes #133
#158) * feat(proxy): support plain HTTP forward proxy for private IP endpoints Add forward proxy mode to the sandbox proxy so that standard HTTP libraries (httpx, requests, etc.) work with HTTP_PROXY for plain HTTP calls to private IP endpoints. Previously, non-CONNECT methods were unconditionally rejected with 403. The forward proxy path requires all three conditions to be met: - OPA policy explicitly allows the destination - The matched endpoint has allowed_ips configured - All resolved IPs are RFC 1918 private This ensures plain HTTP never reaches the public internet while enabling seamless access to internal services without custom CONNECT tunnel code. Implementation: - parse_proxy_uri(): parses absolute-form URIs into components - rewrite_forward_request(): rewrites to origin-form, strips hop-by-hop headers, adds Via and Connection: close - handle_forward_proxy(): full handler with OPA eval, SSRF checks, private-IP gate, upstream connect, and bidirectional relay - Updated dispatch in handle_tcp_connection to route non-CONNECT methods Includes 14 unit tests and 6 E2E tests (FWD-1 through FWD-6). CONNECT path remains completely untouched. Closes #155 * fix(proxy): remove InspectForInference match arm removed by #146 The inference routing simplification in #146 reduced NetworkAction to Allow/Deny, removing InspectForInference. Drop the dead match arm from handle_forward_proxy. * fix(sandbox): restore BestEffort as default Landlock compatibility The Landlock V2 upgrade in #151 changed the default from BestEffort to HardRequirement. This causes all proxy-mode sandboxes to crash with Permission denied when the policy omits the landlock field, because the child process gets locked to only /etc/navigator-tls and /sandbox. Restore BestEffort as the default so policies without an explicit landlock field degrade gracefully. Fixes #161 * fix(sandbox): inject baseline filesystem paths for proxy-mode sandboxes Proxy-mode sandboxes need baseline filesystem paths (/usr, /lib, /etc, /app, /var/log read-only; /sandbox, /tmp read-write) for the child process to function under Landlock. Without these, the child can't exec binaries, resolve DNS, or load shared libraries. The supervisor now enriches the policy with these baseline paths at startup, covering both standalone (file) and gateway (gRPC) modes. For gateway mode, the enriched policy is synced back so users see the effective policy via 'nemoclaw sandbox get'. The gateway validation is relaxed to allow additive filesystem changes (new paths can be added, existing paths cannot be removed) to support the supervisor's enrichment sync-back. Includes 2 E2E tests: BFS-1 (missing filesystem_policy) and BFS-2 (incomplete filesystem_policy). Fixes #161 * fix(e2e): update assertion for relaxed filesystem validation message --------- Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
Summary
Closes #133
Simplifies the inference routing system from a multi-route CRUD model to a single managed cluster route (
inference.local). This is a large architectural cleanup that removes ~1,750 lines of dead/unnecessary code while making the system easier to reason about.Key changes
Create/Update/Delete/ListInferenceRoutesRPCs, CLI commands, and all supporting code. Cluster inference is now configured solely vianemoclaw cluster inference set/get/update.InspectForInferenceOPA action: Policy evaluation is now binary allow/deny. Inference routing is triggered explicitly by CONNECT toinference.local, not by an implicit catch-all on unmatched connections.AuthHeaderenum andInferenceProviderProfileinnavigator-core. The router no longer checks provider types — auth style (Bearer vs custom header) and default headers are carried onResolvedRoute.routing_hint→name,SandboxResolvedRoute→ResolvedRoute,GetSandboxInferenceBundle→GetInferenceBundle. ReplaceInferenceRouteSpec(8 fields, 6 always empty) withClusterInferenceConfig(2 fields:provider_name,model_id). Drop unusedsandbox_idparameter from bundle request.navigator.inference.v1.rs(~800 lines), removeInferencePolicy/InferenceApiPatternfrom proto and policy crate, clean up OPA Rego rules.nemoclaw cluster inference update: New CLI command for partial updates (get-then-set) — change provider or model without re-specifying the full config.sandbox.md,README.md,gateway.md,inference-routing.md,security-policy.mdto reflect new design.Configuration flow
Using inference from a sandbox
The proxy intercepts the CONNECT tunnel to
inference.local, performs a TLS handshake internally, parses the HTTP request inside the tunnel, and forwards it to the configured upstream provider with the correct credentials and model override.Plan
See the plan for more details.
Follow-up
Deferred items tracked in #148:
inference.localinstructions into sandboxes for agent discoveryRelated issues
Closes #21
Closes #20