feat: support multiple models with alias-based routing in inference #203

@johntmyers

Description

Problem Statement

Today, nemoclaw inference set supports exactly one model, configured via a single --provider and a single --model flag. The model field in API payloads from sandboxes is unconditionally overwritten by the router, so agents cannot select between different models. This limits use cases where sandboxes need access to multiple models — e.g., a cheap, fast model for simple tasks and a more capable model for complex reasoning, or a specialized model for policy analysis.

The goal is to support multiple models per cluster, with an implicit default (first in the list) and optional alias-based routing so agents can specify which model to use via the standard model field in their API payloads.

Desired UX

# Multiple models, first is implicit default
nemoclaw inference set --model openai/gpt-4o --model anthropic/claude-sonnet-4

# With explicit aliases
nemoclaw inference set --model default:openai/gpt-4o --model policy-analyzer:anthropic/claude-sonnet-4

# Agent sends {"model": "policy-analyzer", ...} → routes to claude-sonnet-4
# Agent sends {"model": "gpt-4o", ...} or omits model → routes to default (gpt-4o)

Technical Context

The inference routing system has a clean layered architecture: CLI → gRPC → server storage → bundle delivery → sandbox polling → proxy interception → router backend. The wire format already supports a repeated ResolvedRoute routes field (GetInferenceBundleResponse), but only 0 or 1 routes are ever populated. The core change is removing the single-model assumption that runs through every layer.

Critically, the model field from sandbox API payloads is currently overwritten unconditionally in the router (proxy_to_backend()), so the agent's model preference is discarded. Multi-model routing requires intercepting the model field before route selection, using it to pick the correct route, then letting the existing overwrite mechanism set the correct backend model ID.

Affected Components

  • CLI (crates/navigator-cli/src/main.rs, crates/navigator-cli/src/run.rs) — argument parsing and gRPC dispatch for inference set/get/update
  • Proto (proto/inference.proto) — wire format for inference config, routes, and bundles
  • Server (crates/navigator-server/src/inference.rs) — storage, validation, and bundle resolution
  • Store (crates/navigator-server/src/persistence/mod.rs) — generic key-value persistence for inference routes
  • Sandbox proxy (crates/navigator-sandbox/src/proxy.rs) — TLS interception, route selection, request forwarding
  • Sandbox lifecycle (crates/navigator-sandbox/src/lib.rs) — bundle fetching, route refresh, proto→router conversion
  • Router (crates/navigator-router/src/lib.rs, config.rs, backend.rs) — route matching, backend proxying, model field overwrite
  • gRPC client (crates/navigator-sandbox/src/grpc_client.rs) — bundle fetching from the gateway
  • Architecture docs (architecture/inference-routing.md) — documentation

Technical Investigation

Architecture Overview

The inference routing pipeline flows as:

  1. CLI (nemoclaw inference set) sends SetClusterInferenceRequest with single provider_name + model_id to the gateway via gRPC.
  2. Server validates the provider, builds a ClusterInferenceConfig, and upserts a single InferenceRoute record with name "inference.local" into the persistent store.
  3. Bundle delivery: When a sandbox calls GetInferenceBundle, the server loads the "inference.local" route, resolves provider credentials/endpoint/protocols dynamically, and returns a GetInferenceBundleResponse with repeated ResolvedRoute routes (currently 0 or 1 entry).
  4. Sandbox polling: A background task polls every 5s, converts proto ResolvedRoute → router ResolvedRoute, and atomically replaces the route cache (Arc<RwLock<Vec<ResolvedRoute>>>).
  5. Proxy interception: On each inference request, the proxy reads routes from the cache and passes the full Vec<ResolvedRoute> to proxy_with_candidates().
  6. Router: proxy_with_candidates() picks the first route matching the detected protocol. proxy_to_backend() then overwrites the model field in the request body with route.model.
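Steps 4-5 above hinge on the route cache being swapped atomically while the proxy reads it concurrently. A minimal sketch of that pattern (the struct fields and the refresh_routes helper are illustrative stand-ins, not the actual nemoclaw types):

```rust
use std::sync::{Arc, RwLock};

// Illustrative stand-in for the router's ResolvedRoute; the real type in
// crates/navigator-router/src/config.rs carries more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct ResolvedRoute {
    pub base_url: String,
    pub model: String,
}

// The sandbox keeps routes behind Arc<RwLock<Vec<...>>> so the poller can
// replace the whole set in one step while the proxy reads concurrently.
pub type RouteCache = Arc<RwLock<Vec<ResolvedRoute>>>;

// Hypothetical refresh step: swap in the freshly resolved bundle wholesale.
pub fn refresh_routes(cache: &RouteCache, new_routes: Vec<ResolvedRoute>) {
    *cache.write().unwrap() = new_routes;
}

// Read a consistent snapshot of the current routes.
pub fn snapshot(cache: &RouteCache) -> Vec<ResolvedRoute> {
    cache.read().unwrap().clone()
}
```

Because the whole Vec is replaced at once, this mechanism already works unchanged for N routes — which is why the spike rates the multi-route cache as low-risk.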

Code References

  • crates/navigator-cli/src/main.rs:811-837 — ClusterInferenceCommands::Set { provider, model } — singular args
  • crates/navigator-cli/src/run.rs:2787-2809 — cluster_inference_set() — builds single-model gRPC request
  • crates/navigator-cli/src/run.rs:2852-2866 — cluster_inference_get() — displays single provider/model/version
  • crates/navigator-cli/src/run.rs:2811-2849 — cluster_inference_update() — merges onto single config
  • proto/inference.proto:33-38 — ClusterInferenceConfig — single provider_name + model_id
  • proto/inference.proto:41-48 — InferenceRoute — wraps single config with id/name/version
  • proto/inference.proto:50-55 — SetClusterInferenceRequest — single model fields
  • proto/inference.proto:74-81 — ResolvedRoute — has name, base_url, model_id, api_key, etc.
  • proto/inference.proto:83-89 — GetInferenceBundleResponse — already repeated ResolvedRoute routes
  • crates/navigator-server/src/inference.rs:118-171 — upsert_cluster_inference_route() — single-model upsert
  • crates/navigator-server/src/inference.rs:260-291 — resolve_inference_bundle() — wraps single route in a vec
  • crates/navigator-server/src/inference.rs:293-340 — resolve_managed_cluster_route() — resolves one route by name
  • crates/navigator-sandbox/src/proxy.rs:48-75 — InferenceContext — holds routes: Arc<RwLock<Vec<ResolvedRoute>>>
  • crates/navigator-sandbox/src/proxy.rs:752-841 — route_inference_request() — passes all routes to router, no model filtering
  • crates/navigator-sandbox/src/lib.rs:677-696 — bundle_to_resolved_routes() — proto→router conversion, drops the name field
  • crates/navigator-router/src/config.rs:37-46 — router ResolvedRoute — no name field
  • crates/navigator-router/src/lib.rs:59-97 — proxy_with_candidates() — selects first protocol-matching route
  • crates/navigator-router/src/backend.rs:82-93 — proxy_to_backend() — unconditionally overwrites the model field in the body

Current Behavior

  1. Agent sends POST /v1/chat/completions with {"model": "anything", ...}
  2. Proxy intercepts, strips auth headers, detects inference pattern
  3. route_inference_request() reads all routes (0 or 1), passes to proxy_with_candidates()
  4. Router picks first protocol-matching route (always the single configured route)
  5. proxy_to_backend() parses body JSON, replaces model with route.model, forwards to provider
  6. Agent's model value is silently discarded

What Would Need to Change

Proto layer (proto/inference.proto):

  • New message for a model entry: alias (string) + provider_name (string) + model_id (string)
  • ClusterInferenceConfig needs a repeated field of model entries instead of (or alongside) the single fields
  • SetClusterInferenceRequest needs to accept multiple model entries
  • SetClusterInferenceResponse and GetClusterInferenceResponse need to return the full model list
  • ResolvedRoute.name field is already present and can carry the alias
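One possible shape for the new entries (a sketch only — message names, field names, and field numbers are illustrative, not the final wire format):

```protobuf
// Sketch: one entry per model; the alias drives routing.
message ModelEntry {
  string alias = 1;         // e.g. "default", "policy-analyzer"
  string provider_name = 2; // e.g. "openai"
  string model_id = 3;      // e.g. "gpt-4o"
}

message ClusterInferenceConfig {
  // Legacy single-model fields, retained for backward compatibility.
  string provider_name = 1 [deprecated = true];
  string model_id = 2 [deprecated = true];
  // New: first entry is the implicit default.
  repeated ModelEntry models = 3;
}
```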

CLI layer (crates/navigator-cli/):

  • --model becomes a repeatable arg accepting [alias:]provider/model or [alias:]model (with separate --provider for provider-scoped syntax)
  • Need to decide on CLI syntax for specifying provider per model (see Open Questions)
  • inference get output needs a table format for multiple models
  • inference update semantics need clarification (additive? replace all?)
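The [alias:]provider/model syntax could be parsed along these lines (a sketch with hypothetical names; the real implementation would plug into the existing clap argument definitions):

```rust
// Hypothetical parsed form of one repeatable --model argument.
#[derive(Debug, PartialEq)]
pub struct ModelArg {
    pub alias: Option<String>,    // None => alias defaults to the model id
    pub provider: Option<String>, // None => fall back to --provider
    pub model: String,
}

// Parse "[alias:]provider/model" or "[alias:]model".
pub fn parse_model_arg(raw: &str) -> ModelArg {
    // Split off an optional "alias:" prefix.
    let (alias, rest) = match raw.split_once(':') {
        Some((a, r)) if !a.is_empty() => (Some(a.to_string()), r),
        _ => (None, raw),
    };
    // Split an optional "provider/" prefix from the model id.
    let (provider, model) = match rest.split_once('/') {
        Some((p, m)) => (Some(p.to_string()), m.to_string()),
        None => (None, rest.to_string()),
    };
    ModelArg { alias, provider, model }
}
```

Note this simple split would mis-handle model ids that themselves contain ':' or '/'; the open question on per-model provider syntax should settle those edge cases.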

Server layer (crates/navigator-server/src/inference.rs):

  • upsert_cluster_inference_route() must handle multiple model entries
  • Two storage strategies: (a) multiple InferenceRoute records (one per alias), or (b) single record with a list inside. Option (a) fits the existing store API better since names are unique per object type.
  • resolve_inference_bundle() must iterate all routes and return multiple ResolvedRoute objects
  • Each ResolvedRoute needs its name set to the alias
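The bundle resolution step above could map each stored entry to a resolved route, carrying the alias in name. A sketch with simplified stand-in types (credential and protocol resolution omitted; provider_base_url is hypothetical):

```rust
// Simplified stand-ins for the proto types; real resolution also attaches
// credentials, endpoints, and protocol info per provider.
#[derive(Clone, Debug)]
pub struct ModelEntry {
    pub alias: String,
    pub provider_name: String,
    pub model_id: String,
}

#[derive(Clone, Debug)]
pub struct ResolvedRoute {
    pub name: String, // carries the alias
    pub base_url: String,
    pub model_id: String,
}

// Hypothetical per-provider endpoint lookup.
fn provider_base_url(provider: &str) -> String {
    format!("https://{provider}.example/v1")
}

// Resolve every entry instead of wrapping a single route in a vec.
pub fn resolve_bundle(entries: &[ModelEntry]) -> Vec<ResolvedRoute> {
    entries
        .iter()
        .map(|e| ResolvedRoute {
            name: e.alias.clone(),
            base_url: provider_base_url(&e.provider_name),
            model_id: e.model_id.clone(),
        })
        .collect()
}
```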

Router layer (crates/navigator-router/):

  • ResolvedRoute in config.rs needs a name: String field for alias matching
  • New route selection logic: match by name (alias) OR by model (original model ID), with fallback to default
  • proxy_with_candidates() or a new method needs to accept a model name hint and filter routes accordingly
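The selection logic could look like the following sketch. The precedence shown (alias match, then model-id match, then first route as default) is an assumption — it is exactly the open question on alias collision, not a settled rule — and protocol filtering is omitted:

```rust
// Simplified route with the proposed name (alias) field added.
#[derive(Clone, Debug, PartialEq)]
pub struct ResolvedRoute {
    pub name: String,     // alias, e.g. "policy-analyzer"
    pub model_id: String, // backend model, e.g. "claude-sonnet-4"
}

// Pick a route for the model value the agent sent, if any.
// Assumed precedence: alias match > model-id match > first route (default).
// An unknown model name also falls back to the default here; rejecting it
// with an error is an equally valid design choice.
pub fn select_route<'a>(
    routes: &'a [ResolvedRoute],
    requested: Option<&str>,
) -> Option<&'a ResolvedRoute> {
    if let Some(wanted) = requested {
        if let Some(r) = routes.iter().find(|r| r.name == wanted) {
            return Some(r);
        }
        if let Some(r) = routes.iter().find(|r| r.model_id == wanted) {
            return Some(r);
        }
    }
    routes.first()
}
```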

Sandbox proxy layer (crates/navigator-sandbox/src/proxy.rs):

  • route_inference_request() must parse the request body to extract the model field before calling the router
  • Pass the extracted model name to the router for route selection
  • Handle missing/empty model field → use default route

Sandbox lifecycle (crates/navigator-sandbox/src/lib.rs):

  • bundle_to_resolved_routes() must propagate the proto ResolvedRoute.name into the router ResolvedRoute.name

Alternative Approaches Considered

A. Multiple InferenceRoute store records (one per alias)

  • Fits existing store API (put_message with unique names)
  • Each alias is a separate record: "default", "policy-analyzer", etc.
  • resolve_inference_bundle() lists all records of type inference_route and resolves each
  • Pro: Simple store operations, easy to add/remove individual models
  • Con: Atomicity — setting multiple models isn't transactional

B. Single InferenceRoute record with repeated model entries

  • Keep one "inference.local" record but expand the proto to hold a list
  • Pro: Atomic updates, single version number for the whole config
  • Con: Requires proto restructuring, update semantics are more complex

C. Separate InferenceModelConfig object type

  • New store object type for model configs, referenced by the route
  • Pro: Clean separation, extensible
  • Con: More moving parts, over-engineered for current scope

Recommendation: Option B (single record, repeated entries) is likely cleanest — it preserves atomic versioning which matters for the revision-based change detection in sandbox polling. The version bump signals "something changed" and the sandbox replaces its entire route cache, which already works for N routes.

Patterns to Follow

  • The Provider CRUD in the server follows the same store pattern — look at how providers are stored/listed for reference
  • The repeated ResolvedRoute routes field in GetInferenceBundleResponse already anticipates multiple routes
  • The ResolvedRoute.name field in the proto already exists and is set to "inference.local" — repurpose as alias
  • Clap's Vec<String> pattern for repeatable args is idiomatic (used elsewhere in the CLI)

Proposed Approach

Extend the proto ClusterInferenceConfig with a repeated model entry message (alias + provider + model), keeping the single-record storage model for atomic versioning. Add a name field to the router's ResolvedRoute and propagate aliases through the bundle delivery pipeline. In the sandbox proxy, extract the model field from the request body before route selection, match it against route aliases/model IDs, and fall back to the first route (default). The CLI accepts --model as a repeatable flag with [alias:]model syntax.

Scope Assessment

  • Complexity: Medium — changes touch many files but each change is well-scoped and the architecture is clean
  • Confidence: High — the wire format already supports multiple routes, and the route cache is already a Vec
  • Estimated files to change: ~12-15 (proto, CLI, server, router, sandbox proxy, sandbox lib, tests, docs)
  • Issue type: feat

Risks & Open Questions

  • CLI syntax for per-model providers: If models can come from different providers, the current --provider flag doesn't work. Options: (a) --model alias:provider/model, (b) separate --route alias --provider p --model m triplets, (c) require provider in model string when ambiguous. Needs a design decision.
  • Update semantics: Should inference set always replace the full model list, or should there be inference add-model / inference remove-model subcommands? The current update command merges fields — that pattern doesn't extend cleanly to lists.
  • Backward compatibility: Existing clusters have a single ClusterInferenceConfig with provider_name/model_id. Proto migration needs to handle the old format gracefully (treat as single default model).
  • Alias collision: What if an alias matches a real model ID from a different route? Need clear precedence rules (alias match wins over model ID match?).
  • Body parsing overhead: Extracting the model field requires parsing the request body JSON before forwarding. This is already done in proxy_to_backend() — but moving it earlier means double-parsing. Consider parsing once and passing the parsed value through.
  • File-mode routes: The --inference-routes YAML file path also needs to support multiple named routes. Lower priority but should be considered in the design.
  • Empty model field: Some API clients may omit the model field entirely. The default route must handle this gracefully.
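For the backward-compatibility question specifically, the legacy single-model config could be normalized into the new list shape at load time. A sketch with simplified stand-in types (not the generated proto structs):

```rust
// Simplified stand-ins for the old and new config shapes.
#[derive(Clone, Debug, PartialEq)]
pub struct ModelEntry {
    pub alias: String,
    pub provider_name: String,
    pub model_id: String,
}

#[derive(Clone, Debug, Default)]
pub struct ClusterInferenceConfig {
    pub provider_name: String,   // legacy single-model field
    pub model_id: String,        // legacy single-model field
    pub models: Vec<ModelEntry>, // new repeated field
}

// Treat a legacy config as a one-entry list with alias "default",
// so every downstream layer only ever sees the list form.
pub fn effective_models(cfg: &ClusterInferenceConfig) -> Vec<ModelEntry> {
    if !cfg.models.is_empty() {
        return cfg.models.clone();
    }
    if cfg.model_id.is_empty() {
        return Vec::new();
    }
    vec![ModelEntry {
        alias: "default".to_string(),
        provider_name: cfg.provider_name.clone(),
        model_id: cfg.model_id.clone(),
    }]
}
```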

Test Considerations

  • Server unit tests: Existing tests in crates/navigator-server/src/inference.rs:342-581 cover single-model set/get/resolve. Need parallel tests for multi-model: set multiple, resolve bundle returns all, alias round-trip, version increments atomically.
  • Router unit tests: Test route selection by model name/alias, fallback to default, protocol + alias combined filtering.
  • Sandbox integration tests: Test bundle conversion with multiple routes, route refresh replacing multi-route cache.
  • CLI integration: Test repeatable --model flag parsing, alias syntax parsing, inference get multi-model display.
  • Backward compatibility test: Ensure a cluster with old single-model config still works after upgrade (proto migration).
  • E2e test: Agent sends requests with different model values, verify correct backend routing.

Created by spike investigation. Use build-from-issue to plan and implement.

Metadata

Assignees

No one assigned

    Labels

    area:cli (CLI-related work) · area:gateway (Gateway server and control-plane work) · area:inference (Inference routing and configuration work) · area:sandbox (Sandbox runtime and isolation work) · area:supervisor (Proxy and routing-path work) · spike · state:review-ready (Ready for human review)
