Problem Statement
Today, `nemoclaw inference set` supports exactly one model via a single `--provider`/`--model` flag pair. The `model` field in API payloads from sandboxes is unconditionally overwritten by the router, making it impossible for agents to select between different models. This limits use cases where sandboxes need access to multiple models — e.g., a cheap fast model for simple tasks and a more capable model for complex reasoning, or a specialized model for policy analysis.
The goal is to support multiple models per cluster, with an implicit default (first in the list) and optional alias-based routing so agents can specify which model to use via the standard `model` field in their API payloads.
Desired UX
```shell
# Multiple models, first is implicit default
nemoclaw inference set --model openai/gpt-4o --model anthropic/claude-sonnet-4

# With explicit aliases
nemoclaw inference set --model default:openai/gpt-4o --model policy-analyzer:anthropic/claude-sonnet-4

# Agent sends {"model": "policy-analyzer", ...} → routes to claude-sonnet-4
# Agent sends {"model": "gpt-4o", ...} or omits model → routes to default (gpt-4o)
```
Technical Context
The inference routing system has a clean layered architecture: CLI → gRPC → server storage → bundle delivery → sandbox polling → proxy interception → router backend. The system already supports a `repeated ResolvedRoute routes` field on the wire (`GetInferenceBundleResponse`), but only ever populates 0 or 1 route. The core change is widening the single-model assumption that runs through every layer.
Critically, the `model` field from sandbox API payloads is currently overwritten unconditionally in the router (`proxy_to_backend()`), so the agent's model preference is discarded. Multi-model routing requires intercepting the `model` field before route selection, using it to pick the correct route, then letting the existing overwrite mechanism set the correct backend model ID.
Affected Components
| Component | Key Files | Role |
|---|---|---|
| CLI | `crates/navigator-cli/src/main.rs`, `crates/navigator-cli/src/run.rs` | Argument parsing and gRPC dispatch for `inference set`/`get`/`update` |
| Proto | `proto/inference.proto` | Wire format for inference config, routes, and bundles |
| Server | `crates/navigator-server/src/inference.rs` | Storage, validation, and bundle resolution |
| Store | `crates/navigator-server/src/persistence/mod.rs` | Generic key-value persistence for inference routes |
| Sandbox proxy | `crates/navigator-sandbox/src/proxy.rs` | TLS interception, route selection, request forwarding |
| Sandbox lifecycle | `crates/navigator-sandbox/src/lib.rs` | Bundle fetching, route refresh, proto→router conversion |
| Router | `crates/navigator-router/src/lib.rs`, `config.rs`, `backend.rs` | Route matching, backend proxying, model field overwrite |
| gRPC client | `crates/navigator-sandbox/src/grpc_client.rs` | Bundle fetching from gateway |
| Architecture docs | `architecture/inference-routing.md` | Documentation |
Technical Investigation
Architecture Overview
The inference routing pipeline flows as:
- CLI (`nemoclaw inference set`) sends `SetClusterInferenceRequest` with a single `provider_name` + `model_id` to the gateway via gRPC.
- Server validates the provider, builds a `ClusterInferenceConfig`, and upserts a single `InferenceRoute` record with name `"inference.local"` into the persistent store.
- Bundle delivery: When a sandbox calls `GetInferenceBundle`, the server loads the `"inference.local"` route, resolves provider credentials/endpoint/protocols dynamically, and returns a `GetInferenceBundleResponse` with `repeated ResolvedRoute routes` (currently 0 or 1 entry).
- Sandbox polling: A background task polls every 5s, converts proto `ResolvedRoute` → router `ResolvedRoute`, and atomically replaces the route cache (`Arc<RwLock<Vec<ResolvedRoute>>>`).
- Proxy interception: On each inference request, the proxy reads routes from the cache and passes the full `Vec<ResolvedRoute>` to `proxy_with_candidates()`.
- Router: `proxy_with_candidates()` picks the first route matching the detected protocol. `proxy_to_backend()` then overwrites the `model` field in the request body with `route.model`.
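The polling/refresh step above already generalizes to N routes. A minimal std-only sketch of the cache-swap pattern (the struct here is a stand-in with most fields elided, not the real router type):

```rust
// Sketch: the route cache is an Arc<RwLock<Vec<...>>> and each refresh
// atomically replaces the whole Vec, so going from 1 route to N routes
// needs no change to this mechanism.
use std::sync::{Arc, RwLock};

#[derive(Clone, Debug, PartialEq)]
struct ResolvedRoute {
    model: String, // backend model ID (other fields elided for brevity)
}

fn refresh_routes(cache: &Arc<RwLock<Vec<ResolvedRoute>>>, fresh: Vec<ResolvedRoute>) {
    // Readers see either the old list or the new one, never a mix.
    *cache.write().unwrap() = fresh;
}

fn main() {
    let cache = Arc::new(RwLock::new(vec![ResolvedRoute { model: "gpt-4o".into() }]));
    refresh_routes(
        &cache,
        vec![
            ResolvedRoute { model: "gpt-4o".into() },
            ResolvedRoute { model: "claude-sonnet-4".into() },
        ],
    );
    assert_eq!(cache.read().unwrap().len(), 2);
}
```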
Code References
| Location | Description |
|---|---|
| `crates/navigator-cli/src/main.rs:811-837` | `ClusterInferenceCommands::Set { provider, model }` — singular args |
| `crates/navigator-cli/src/run.rs:2787-2809` | `cluster_inference_set()` — builds single-model gRPC request |
| `crates/navigator-cli/src/run.rs:2852-2866` | `cluster_inference_get()` — displays single provider/model/version |
| `crates/navigator-cli/src/run.rs:2811-2849` | `cluster_inference_update()` — merges onto single config |
| `proto/inference.proto:33-38` | `ClusterInferenceConfig` — single `provider_name` + `model_id` |
| `proto/inference.proto:41-48` | `InferenceRoute` — wraps single config with id/name/version |
| `proto/inference.proto:50-55` | `SetClusterInferenceRequest` — single model fields |
| `proto/inference.proto:74-81` | `ResolvedRoute` — has `name`, `base_url`, `model_id`, `api_key`, etc. |
| `proto/inference.proto:83-89` | `GetInferenceBundleResponse` — already `repeated ResolvedRoute routes` |
| `crates/navigator-server/src/inference.rs:118-171` | `upsert_cluster_inference_route()` — single model upsert |
| `crates/navigator-server/src/inference.rs:260-291` | `resolve_inference_bundle()` — wraps single route in vec |
| `crates/navigator-server/src/inference.rs:293-340` | `resolve_managed_cluster_route()` — resolves one route by name |
| `crates/navigator-sandbox/src/proxy.rs:48-75` | `InferenceContext` — holds `routes: Arc<RwLock<Vec<ResolvedRoute>>>` |
| `crates/navigator-sandbox/src/proxy.rs:752-841` | `route_inference_request()` — passes all routes to router, no model filtering |
| `crates/navigator-sandbox/src/lib.rs:677-696` | `bundle_to_resolved_routes()` — proto→router conversion, drops `name` field |
| `crates/navigator-router/src/config.rs:37-46` | Router `ResolvedRoute` — no `name` field |
| `crates/navigator-router/src/lib.rs:59-97` | `proxy_with_candidates()` — selects first protocol-matching route |
| `crates/navigator-router/src/backend.rs:82-93` | `proxy_to_backend()` — unconditionally overwrites `model` field in body |
Current Behavior
- Agent sends `POST /v1/chat/completions` with `{"model": "anything", ...}`
- Proxy intercepts, strips auth headers, detects inference pattern
- `route_inference_request()` reads all routes (0 or 1), passes to `proxy_with_candidates()`
- Router picks first protocol-matching route (always the single configured route)
- `proxy_to_backend()` parses body JSON, replaces `model` with `route.model`, forwards to provider
- Agent's `model` value is silently discarded
What Would Need to Change
Proto layer (`proto/inference.proto`):
- New message for a model entry: `alias` (string) + `provider_name` (string) + `model_id` (string)
- `ClusterInferenceConfig` needs a `repeated` field of model entries instead of (or alongside) the single fields
- `SetClusterInferenceRequest` needs to accept multiple model entries
- `SetClusterInferenceResponse` and `GetClusterInferenceResponse` need to return the full model list
- `ResolvedRoute.name` field is already present and can carry the alias
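One possible shape for the proto change, sketched under the assumption that the new entry message lives alongside the existing singular fields (all names and field numbers here are illustrative, not taken from `proto/inference.proto`):

```protobuf
// Illustrative sketch only — names and field numbers are assumptions.
message InferenceModelEntry {
  string alias = 1;         // routing key, e.g. "policy-analyzer"
  string provider_name = 2; // e.g. "anthropic"
  string model_id = 3;      // backend model ID, e.g. "claude-sonnet-4"
}

message ClusterInferenceConfig {
  // Existing singular fields, kept so old configs decode as a single
  // default model.
  string provider_name = 1;
  string model_id = 2;
  // New: first entry is the implicit default.
  repeated InferenceModelEntry models = 3;
}
```

Keeping the singular fields in place (rather than renumbering) is what lets old stored records be treated as a one-model list during migration.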
CLI layer (`crates/navigator-cli/`):
- `--model` becomes a repeatable arg accepting `[alias:]provider/model` or `[alias:]model` (with separate `--provider` for provider-scoped syntax)
- Need to decide on CLI syntax for specifying provider per model (see Open Questions)
- `inference get` output needs a table format for multiple models
- `inference update` semantics need clarification (additive? replace all?)
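The `[alias:]provider/model` value parsing could look like the following std-only sketch. The names (`ModelArg`, `parse_model_arg`) are hypothetical; the repeatable flag itself would come from clap's `Vec<String>` pattern already used elsewhere in the CLI:

```rust
// Hypothetical parser for one --model value. An "alias:" prefix is only
// recognized when the text before the first ':' contains no '/', so model
// IDs like "openai/ft:gpt-4o" are not misread as having an alias.
#[derive(Debug, PartialEq)]
struct ModelArg {
    alias: Option<String>,    // explicit alias, if given
    provider: Option<String>, // provider segment, if given inline
    model: String,            // backend model ID
}

fn parse_model_arg(raw: &str) -> Result<ModelArg, String> {
    let (alias, rest) = match raw.split_once(':') {
        Some((head, tail)) if !head.is_empty() && !head.contains('/') => {
            (Some(head.to_string()), tail)
        }
        _ => (None, raw),
    };
    // A '/' separates provider from model ID; without one, the caller
    // falls back to a separate --provider flag.
    let (provider, model) = match rest.split_once('/') {
        Some((p, m)) => (Some(p.to_string()), m.to_string()),
        None => (None, rest.to_string()),
    };
    if model.is_empty() {
        return Err(format!("invalid --model value: {raw}"));
    }
    Ok(ModelArg { alias, provider, model })
}

fn main() {
    let a = parse_model_arg("default:openai/gpt-4o").unwrap();
    assert_eq!(a.alias.as_deref(), Some("default"));
    assert_eq!(a.provider.as_deref(), Some("openai"));
    assert_eq!(a.model, "gpt-4o");

    let b = parse_model_arg("anthropic/claude-sonnet-4").unwrap();
    assert_eq!(b.alias, None);
    assert_eq!(b.provider.as_deref(), Some("anthropic"));
}
```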
Server layer (`crates/navigator-server/src/inference.rs`):
- `upsert_cluster_inference_route()` must handle multiple model entries
- Two storage strategies: (a) multiple `InferenceRoute` records (one per alias), or (b) single record with a list inside. Option (a) fits the existing store API better since names are unique per object type.
- `resolve_inference_bundle()` must iterate all routes and return multiple `ResolvedRoute` objects
- Each `ResolvedRoute` needs its `name` set to the alias
Router layer (`crates/navigator-router/`):
- `ResolvedRoute` in `config.rs` needs a `name: String` field for alias matching
- New route selection logic: match by `name` (alias) OR by `model` (original model ID), with fallback to default
- `proxy_with_candidates()` or a new method needs to accept a model name hint and filter routes accordingly
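A sketch of that selection precedence, using a stand-in struct (the real router `ResolvedRoute` has more fields, and whether an unknown name should fall back to the default or be rejected is still an open design choice, assumed here as fallback):

```rust
// Hypothetical route selection: alias match first, then backend model-ID
// match, then the first route as the implicit default.
#[derive(Debug)]
struct ResolvedRoute {
    name: String,  // alias, e.g. "policy-analyzer" (proposed new field)
    model: String, // backend model ID, e.g. "claude-sonnet-4"
}

fn select_route<'a>(
    routes: &'a [ResolvedRoute],
    requested: Option<&str>,
) -> Option<&'a ResolvedRoute> {
    match requested {
        Some(want) => routes
            .iter()
            // Alias match wins over model-ID match (one possible answer
            // to the alias-collision open question).
            .find(|r| r.name == want)
            .or_else(|| routes.iter().find(|r| r.model == want))
            // Unknown names fall through to the implicit default.
            .or_else(|| routes.first()),
        // Missing/empty model field → implicit default (first route).
        None => routes.first(),
    }
}

fn main() {
    let routes = vec![
        ResolvedRoute { name: "default".into(), model: "gpt-4o".into() },
        ResolvedRoute { name: "policy-analyzer".into(), model: "claude-sonnet-4".into() },
    ];
    assert_eq!(select_route(&routes, Some("policy-analyzer")).unwrap().model, "claude-sonnet-4");
    assert_eq!(select_route(&routes, Some("gpt-4o")).unwrap().name, "default");
    assert_eq!(select_route(&routes, None).unwrap().name, "default");
}
```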
Sandbox proxy layer (`crates/navigator-sandbox/src/proxy.rs`):
- `route_inference_request()` must parse the request body to extract the `model` field before calling the router
- Pass the extracted model name to the router for route selection
- Handle missing/empty model field → use default route
Sandbox lifecycle (`crates/navigator-sandbox/src/lib.rs`):
- `bundle_to_resolved_routes()` must propagate the proto `ResolvedRoute.name` into the router `ResolvedRoute.name`
Alternative Approaches Considered
A. Multiple `InferenceRoute` store records (one per alias)
- Fits existing store API (`put_message` with unique names)
- Each alias is a separate record: `"default"`, `"policy-analyzer"`, etc.
- `resolve_inference_bundle()` lists all records of type `inference_route` and resolves each
- Pro: Simple store operations, easy to add/remove individual models
- Con: Atomicity — setting multiple models isn't transactional
B. Single `InferenceRoute` record with repeated model entries
- Keep one `"inference.local"` record but expand the proto to hold a list
- Pro: Atomic updates, single version number for the whole config
- Con: Requires proto restructuring, update semantics are more complex
C. Separate `InferenceModelConfig` object type
- New store object type for model configs, referenced by the route
- Pro: Clean separation, extensible
- Con: More moving parts, over-engineered for current scope
Recommendation: Option B (single record, repeated entries) is likely cleanest — it preserves atomic versioning which matters for the revision-based change detection in sandbox polling. The version bump signals "something changed" and the sandbox replaces its entire route cache, which already works for N routes.
Patterns to Follow
- The `Provider` CRUD in the server follows the same store pattern — look at how providers are stored/listed for reference
- The `repeated ResolvedRoute routes` in `GetInferenceBundleResponse` already anticipates multiple routes
- The `ResolvedRoute.name` field in the proto already exists and is set to `"inference.local"` — repurpose as alias
- Clap's `Vec<String>` pattern for repeatable args is idiomatic (used elsewhere in the CLI)
Proposed Approach
Extend the proto `ClusterInferenceConfig` with a repeated model entry message (alias + provider + model), keeping the single-record storage model for atomic versioning. Add a `name` field to the router's `ResolvedRoute` and propagate aliases through the bundle delivery pipeline. In the sandbox proxy, extract the `model` field from the request body before route selection, match it against route aliases/model IDs, and fall back to the first route (default). The CLI accepts `--model` as a repeatable flag with `[alias:]model` syntax.
Scope Assessment
- Complexity: Medium — changes touch many files but each change is well-scoped and the architecture is clean
- Confidence: High — the wire format already supports multiple routes, and the route cache is already a `Vec`
- Estimated files to change: ~12-15 (proto, CLI, server, router, sandbox proxy, sandbox lib, tests, docs)
- Issue type: `feat`
Risks & Open Questions
- CLI syntax for per-model providers: If models can come from different providers, the current `--provider` flag doesn't work. Options: (a) `--model alias:provider/model`, (b) separate `--route alias --provider p --model m` triplets, (c) require provider in model string when ambiguous. Needs a design decision.
- Update semantics: Should `inference set` always replace the full model list, or should there be `inference add-model`/`inference remove-model` subcommands? The current `update` command merges fields — that pattern doesn't extend cleanly to lists.
- Backward compatibility: Existing clusters have a single `ClusterInferenceConfig` with `provider_name`/`model_id`. Proto migration needs to handle the old format gracefully (treat as single default model).
- Alias collision: What if an alias matches a real model ID from a different route? Need clear precedence rules (alias match wins over model ID match?).
- Body parsing overhead: Extracting the `model` field requires parsing the request body JSON before forwarding. This is already done in `proxy_to_backend()` — but moving it earlier means double-parsing. Consider parsing once and passing the parsed value through.
- File-mode routes: The `--inference-routes` YAML file path also needs to support multiple named routes. Lower priority but should be considered in the design.
- Empty model field: Some API clients may omit the `model` field entirely. The default route must handle this gracefully.
Test Considerations
- Server unit tests: Existing tests in `crates/navigator-server/src/inference.rs:342-581` cover single-model set/get/resolve. Need parallel tests for multi-model: set multiple, resolve bundle returns all, alias round-trip, version increments atomically.
- Router unit tests: Test route selection by model name/alias, fallback to default, protocol + alias combined filtering.
- Sandbox integration tests: Test bundle conversion with multiple routes, route refresh replacing multi-route cache.
- CLI integration: Test repeatable `--model` flag parsing, alias syntax parsing, `inference get` multi-model display.
- Backward compatibility test: Ensure a cluster with old single-model config still works after upgrade (proto migration).
- E2e test: Agent sends requests with different `model` values, verify correct backend routing.
Created by spike investigation. Use `build-from-issue` to plan and implement.