From 8341bd6bf657d90a41f3430d86ddcf877b6e363d Mon Sep 17 00:00:00 2001 From: Piotr Mlocek Date: Mon, 9 Mar 2026 19:55:54 -0700 Subject: [PATCH 1/3] docs(inference): clarify local inference routing --- docs/get-started/run-opencode.md | 53 ++++++------ docs/index.md | 2 +- docs/inference/configure-routes.md | 92 --------------------- docs/inference/configure.md | 100 +++++++++++++++++++++++ docs/inference/index.md | 127 +++++++++++++++-------------- docs/reference/cli.md | 38 +++++---- docs/reference/policy-schema.md | 23 +----- 7 files changed, 213 insertions(+), 222 deletions(-) delete mode 100644 docs/inference/configure-routes.md create mode 100644 docs/inference/configure.md diff --git a/docs/get-started/run-opencode.md b/docs/get-started/run-opencode.md index e088ee4a..731166e3 100644 --- a/docs/get-started/run-opencode.md +++ b/docs/get-started/run-opencode.md @@ -5,15 +5,15 @@ # Run OpenCode with NVIDIA Inference -This tutorial walks you through a realistic setup where you run [OpenCode](https://opencode.ai) inside a OpenShell sandbox with inference routed to NVIDIA API endpoints. Along the way, you will hit a policy denial, diagnose it from logs, write a custom policy, and configure inference routing. This is the full policy iteration loop that you will use whenever you onboard a new tool. +This tutorial walks you through a realistic setup where you run [OpenCode](https://opencode.ai) inside an OpenShell sandbox and configure private inference through `inference.local`. Along the way, you will hit a policy denial, diagnose it from logs, write a custom policy, and configure inference routing. This is the full policy iteration loop that you will use whenever you onboard a new tool. ## What You Will Learn - Create a provider manually using the `--from-existing` flag. - Write a custom policy to replace the default policy. - Read sandbox logs to diagnose denied actions. -- Distinguish between agent traffic and userland inference. 
-- Set up inference routes for code running inside the sandbox. +- Distinguish between agent traffic and sandbox inference traffic. +- Configure inference routing for code running inside the sandbox. ## Prerequisites @@ -22,7 +22,7 @@ This tutorial walks you through a realistic setup where you run [OpenCode](https ## Create the Provider -In the Claude Code tutorial, the CLI auto-discovered credentials. Here you create a provider explicitly, which gives you control over the provider name and type. +In the Claude Code tutorial, the CLI auto-discovers credentials. Here you create a provider explicitly, which gives you control over the provider name and type. ```console $ nemoclaw provider create --name nvidia --type nvidia --from-existing @@ -64,17 +64,16 @@ $ nemoclaw term Look for lines like these: -``` +```text action=deny host=integrate.api.nvidia.com binary=/usr/local/bin/opencode reason="no matching network policy" action=deny host=opencode.ai binary=/usr/bin/node reason="no matching network policy" -action=inspect_for_inference host=integrate.api.nvidia.com binary=/bin/bash ``` Each log entry tells you the exact host, binary, and reason for the denial. ## Understand the Denial -The default policy contains a `nvidia_inference` network policy entry, but it is configured for a narrow set of binaries — typically `/usr/local/bin/claude` and `/usr/bin/node`. When OpenCode makes HTTP calls through its own binary, `curl`, or a shell subprocess, those connections do not match any policy rule and get denied. +The default policy contains a `nvidia_inference` network policy entry, but it is configured for a narrow set of binaries - typically `/usr/local/bin/claude` and `/usr/bin/node`. When OpenCode makes HTTP calls through its own binary, `curl`, or a shell subprocess, those connections do not match any policy rule and get denied. 
Two separate problems are at play: @@ -89,9 +88,6 @@ Create a file called `opencode-policy.yaml` with the following content: ```yaml version: 1 -inference: - allowed_routes: - - nvidia filesystem_policy: include_workdir: true read_only: @@ -180,15 +176,14 @@ network_policies: - path: /usr/bin/git ``` -This policy differs from the default in four key ways: +This policy differs from the default in three key ways: - `opencode_api`: Allows OpenCode and Node.js to reach `opencode.ai:443`. - Broader `nvidia_inference` binaries: Adds `/usr/local/bin/opencode`, `/usr/bin/curl`, and `/bin/bash` so OpenCode's subprocesses can reach the NVIDIA endpoint. -- `inference.allowed_routes`: Includes `nvidia` so inference routing works for userland code. - GitHub access: Scoped to support OpenCode's git operations. :::{warning} -The `filesystem_policy`, `landlock`, and `process` sections are static. They are set at sandbox creation time and cannot be changed on a running sandbox. To modify these, delete and recreate the sandbox. The `network_policies` and `inference` sections are dynamic and can be hot-reloaded. +The `filesystem_policy`, `landlock`, and `process` sections are static. They are set at sandbox creation time and cannot be changed on a running sandbox. To modify these, delete and recreate the sandbox. The `network_policies` section is dynamic and can be hot-reloaded. ::: ## Apply the Policy @@ -209,24 +204,26 @@ $ nemoclaw policy list opencode-sandbox The latest revision should show status `loaded`. -## Set Up Inference Routing +## Configure Inference Routing -So far, you have allowed the OpenCode *agent* to reach `integrate.api.nvidia.com` directly through network policy. But code that OpenCode writes and runs inside the sandbox — scripts, notebooks, applications — uses a separate mechanism called the privacy router. +So far, you have allowed the OpenCode *agent* to reach `integrate.api.nvidia.com` directly through network policy. 
But code that OpenCode writes and runs inside the sandbox should use `https://inference.local`. -Create an inference route so userland code can access NVIDIA models: +Configure inference so `inference.local` routes to your NVIDIA provider: ```console -$ nemoclaw inference create \ - --routing-hint nvidia \ - --base-url https://integrate.api.nvidia.com \ - --model-id z-ai/glm5 \ - --api-key $NVIDIA_API_KEY +$ nemoclaw inference set \ + --provider nvidia \ + --model z-ai/glm5 ``` -The policy you wrote earlier already includes `nvidia` in `inference.allowed_routes`, so no policy update is needed. If you had omitted it, you would add the route to the policy and push again. +Verify the active configuration: + +```console +$ nemoclaw inference get +``` :::{note} -*Network policies* and *inference routes* are two separate enforcement points. Network policies control which hosts the agent binary can reach directly. Inference routes control where LLM API calls from userland code get routed through the privacy proxy. +*Network policies* and managed inference are two separate enforcement points. Network policies control which external hosts the agent binary can reach directly. Inference configuration controls where userland calls to `https://inference.local` are routed. ::: ## Verify the Policy @@ -237,7 +234,9 @@ Tail the logs again: $ nemoclaw logs opencode-sandbox --tail ``` -You should no longer see `action=deny` lines for the endpoints you added. Connections to `opencode.ai`, `integrate.api.nvidia.com`, and GitHub should show `action=allow`. +You should see `action=allow` lines for the endpoints you added. Connections to `opencode.ai`, `integrate.api.nvidia.com`, and GitHub should show `action=allow`. + +To verify userland inference, run code inside the sandbox that targets `https://inference.local/v1`. If you still see denials, read the log line carefully. It tells you the exact host, port, and binary that was blocked. 
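The userland inference check mentioned above can be sketched with only the Python standard library. This is a hypothetical probe, not part of the tutorial's files; run it inside the sandbox. The endpoint path follows the OpenAI-style chat completions pattern, and the `model` value is a deliberate placeholder, since OpenShell rewrites it to the configured model before forwarding:

```python
import json
import urllib.request

# Hypothetical probe for userland inference. The model value is a placeholder;
# OpenShell rewrites it to the configured model before forwarding upstream.
payload = json.dumps({
    "model": "anything",
    "messages": [{"role": "user", "content": "Say hello"}],
}).encode()

request = urllib.request.Request(
    "https://inference.local/v1/chat/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=30) as response:
        print(response.status, response.read()[:200])
except OSError as exc:
    # Expected when run outside a sandbox, or before inference is configured.
    print(f"inference.local unreachable: {exc}")
```

A connection error here usually means the code is not running inside a sandbox, or that inference has not yet been configured with `nemoclaw inference set`.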
Add the missing entry to your policy and push again with `nemoclaw policy set`. This observe-modify-push cycle is the normal workflow for onboarding any new tool in OpenShell. @@ -252,7 +251,7 @@ $ nemoclaw sandbox delete opencode-sandbox ## Next Steps - {doc}`../safety-and-privacy/policies`: Full reference on policy YAML structure, static and dynamic fields, and enforcement modes. -- {doc}`../safety-and-privacy/policies`: How the proxy evaluates network rules and policy enforcement. -- {doc}`../inference/index`: Inference route configuration, protocol detection, and transparent rerouting. +- {doc}`../safety-and-privacy/policies`: How network policy fits into the sandbox iteration workflow. +- {doc}`../inference/index`: `inference.local`, supported API patterns, and request routing. - {doc}`../sandboxes/providers`: Provider types, credential discovery, and manual and automatic creation. -- {doc}`../safety-and-privacy/security-model`: The four protection layers and how they interact. +- {doc}`../safety-and-privacy/security-model`: The protection layers and how they interact. diff --git a/docs/index.md b/docs/index.md index bb094a90..1c7d1956 100644 --- a/docs/index.md +++ b/docs/index.md @@ -221,7 +221,7 @@ safety-and-privacy/policies :hidden: inference/index -inference/configure-routes +inference/configure ``` ```{toctree} diff --git a/docs/inference/configure-routes.md b/docs/inference/configure-routes.md deleted file mode 100644 index 1b487743..00000000 --- a/docs/inference/configure-routes.md +++ /dev/null @@ -1,92 +0,0 @@ - - -# Configure Inference Routes - -This guide covers how to create and manage inference routes so that sandboxes can route AI API calls from userland code to policy-controlled backends. You will learn to create routes, connect them to sandboxes through policy, and manage routes across a cluster. - -:::{note} -Inference routes are for *user code*, which are scripts and programs that the agent writes and executes inside the sandbox. 
The agent's own API traffic flows directly through network policies, not through inference routing. Refer to {doc}`../safety-and-privacy/policies` for the distinction between agent traffic and user traffic. -::: - -## Create a Route - -Use `nemoclaw inference create` to register a new inference backend: - -```console -$ nemoclaw inference create \ - --routing-hint local \ - --base-url https://my-llm.example.com \ - --model-id my-model-v1 \ - --api-key sk-abc123 -``` - -This creates a route named after the routing hint. Any sandbox whose policy includes `local` in its `inference.allowed_routes` list can use this route. If you omit `--protocol`, the CLI probes the endpoint and auto-detects the supported protocol (refer to [Supported API Patterns](index.md#supported-api-patterns)). Refer to the [CLI Reference](../reference/cli.md#inference-create-flags) for all flags. - -## Manage Routes - -### List all routes - -```console -$ nemoclaw inference list -``` - -### Update a route - -Change any field on an existing route: - -```console -$ nemoclaw inference update --base-url https://new-backend.example.com -``` - -```console -$ nemoclaw inference update --model-id updated-model-v2 --api-key sk-new-key -``` - -### Delete a route - -```console -$ nemoclaw inference delete -``` - -Deleting a route that is referenced by running sandboxes does not interrupt those sandboxes immediately. Future inference requests that would have matched the deleted route will be denied. - -## Connect a Sandbox to Routes - -Inference routes take effect only when a sandbox policy references the route's `routing_hint` in its `inference.allowed_routes` list. 
- -### Step 1: Add the routing hint to your policy - -```yaml -inference: - allowed_routes: - - local -``` - -### Step 2: Create or update the sandbox with that policy - -```console -$ nemoclaw sandbox create --policy ./my-policy.yaml --keep -- claude -``` - -Or, if the sandbox is already running, push an updated policy: - -```console -$ nemoclaw policy set --policy ./my-policy.yaml --wait -``` - -The `inference` section is a dynamic field, so you can add or remove routing hints on a running sandbox without recreating it. - -## Good to Know - -- Cluster-level: routes are shared across all sandboxes in the cluster, not scoped to one sandbox. -- Per-model: each route maps to one model. Create multiple routes with the same `--routing-hint` but different `--model-id` values to expose multiple models. -- Hot-reloadable: routes can be created, updated, or deleted at any time without restarting sandboxes. - -## Next Steps - -- {doc}`index`: understand the inference routing architecture, interception sequence, and routing hints. -- {doc}`../safety-and-privacy/policies`: configure the network policies that control agent traffic (as opposed to userland inference traffic). -- {doc}`../safety-and-privacy/policies`: the full policy iteration workflow. diff --git a/docs/inference/configure.md b/docs/inference/configure.md new file mode 100644 index 00000000..b0a6b7e0 --- /dev/null +++ b/docs/inference/configure.md @@ -0,0 +1,100 @@ + + +# Configure Inference Routing + +OpenShell exposes one managed inference backend behind `https://inference.local` +for the active gateway. + +External inference endpoints still go through sandbox `network_policies`. This +page covers the special local inference endpoint only. + +That configuration consists of two values: + +- a provider record name +- a model ID + +## Step 1: Create a Provider + +Create a provider that holds the backend credentials you want OpenShell to use. 
+ +```console +$ nemoclaw provider create --name nvidia-prod --type nvidia --from-existing +``` + +You can also use `openai` or `claude` providers. + +## Step 2: Set Inference Routing + +Point `inference.local` at that provider and choose the model to use: + +```console +$ nemoclaw inference set \ + --provider nvidia-prod \ + --model meta/llama-3.1-8b-instruct +``` + +This sets the managed inference configuration. + +## Step 3: Verify the Active Config + +```console +$ nemoclaw inference get +provider: nvidia-prod +model: meta/llama-3.1-8b-instruct +version: 1 +``` + +## Step 4: Update Part of the Config + +Use `update` when you want to change only one field: + +```console +$ nemoclaw inference update --model meta/llama-3.3-70b-instruct +``` + +Or switch providers without repeating the current model manually: + +```console +$ nemoclaw inference update --provider openai-prod +``` + +## Use It from a Sandbox + +Once inference is configured, userland code inside any sandbox can call +`https://inference.local` directly: + +```python +from openai import OpenAI + +client = OpenAI(base_url="https://inference.local/v1", api_key="dummy") + +response = client.chat.completions.create( + model="anything", + messages=[{"role": "user", "content": "Hello"}], +) +``` + +The client-supplied model is ignored for generation requests. OpenShell +rewrites it to the configured model before forwarding upstream. + +Use this endpoint when inference should stay local to the host for privacy and +security reasons. External providers that should be reached directly belong in +`network_policies` instead. + +## Good to Know + +- Gateway-scoped: every sandbox on the active gateway sees the same + `inference.local` backend. +- HTTPS only: `inference.local` is intercepted only for HTTPS traffic. + +## Next Steps + +- {doc}`index`: understand the interception flow and supported API patterns. 
+- [Network access rules](/safety-and-privacy/policies.md#network-access-rules): + configure direct access to external inference endpoints. +- {doc}`../sandboxes/providers`: create and manage provider records. +- {doc}`../reference/cli`: see the CLI reference for `nemoclaw inference` + commands. diff --git a/docs/inference/index.md b/docs/inference/index.md index 9e4cbd61..017b141a 100644 --- a/docs/inference/index.md +++ b/docs/inference/index.md @@ -5,79 +5,80 @@ # About Inference Routing -The inference routing system keeps your AI inference traffic private by -transparently intercepting API calls from sandboxed agents and rerouting them -to backends you control. - -:::{note} -Inference routing applies to userland traffic: code that the agent writes -or runs, not the agent itself. The agent's own API calls (for example, Claude calling -`api.anthropic.com`) go directly through network policy. Refer to -{doc}`/safety-and-privacy/policies` for the distinction. -::: - -## How It Works - -When userland code inside a sandbox makes an API call (for example, using the OpenAI -or Anthropic SDK), the request flows through the sandbox proxy. If the -destination does not match any explicit network policy but the sandbox has -inference routes configured, the proxy: - -1. TLS-terminates the connection using the sandbox's ephemeral CA. -2. Detects the inference API pattern (for example, `POST /v1/chat/completions`). -3. Strips authorization headers and forwards to a matching backend. -4. Rewrites the authorization with the route's API key and model ID. -5. Returns the response to the agent's code. The agent sees a normal HTTP - response as if it came from the original API. - -The agent's code needs zero changes. Standard OpenAI/Anthropic SDK calls work -transparently. 
- -```{mermaid} -sequenceDiagram - participant Code as Userland Code - participant Proxy as Sandbox Proxy - participant OPA as Policy Engine - participant Router as Privacy Router - participant Backend as Your Backend - - Code->>Proxy: CONNECT api.openai.com:443 - Proxy->>OPA: evaluate policy - OPA-->>Proxy: InspectForInference - Proxy-->>Code: 200 Connection Established - Proxy->>Proxy: TLS terminate - Code->>Proxy: POST /v1/chat/completions - Proxy->>Router: route to matching backend - Router->>Backend: forwarded request - Backend-->>Router: response - Router-->>Proxy: response - Proxy-->>Code: HTTP 200 OK -``` +OpenShell handles inference in two ways: + +- External inference endpoints are controlled by sandbox `network_policies`. +- Each sandbox also exposes `https://inference.local`, a special endpoint for + inference that should stay local to the host for privacy and security. + +## External Inference + +If sandbox code calls an external inference API like `api.openai.com` or +`api.anthropic.com`, that traffic is treated like any other outbound network +request. It is allowed or denied by `network_policies`. + +Refer to {doc}`/safety-and-privacy/policies` and the +[Network access rules](/safety-and-privacy/policies.md#network-access-rules) +section for details. + +## `inference.local` + +Every sandbox also exposes a special endpoint: `https://inference.local`. + +This endpoint exists so inference can be routed to a model running locally on +the same host. In the future, it can also route to a model managed by the +cluster. It is the special case for inference that should stay local for +privacy and security reasons. + +## Using `inference.local` + +When code inside a sandbox calls `https://inference.local`, OpenShell routes the +request to the configured backend for that gateway. + +The configured model is applied to generation requests, and provider +credentials are supplied by OpenShell rather than by code inside the sandbox. 
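As an illustration, a model-discovery request from sandbox code needs only a placeholder credential. This is a hedged sketch using the Python standard library; it assumes an OpenAI-compatible provider is configured for `inference.local`:

```python
import json
import urllib.request

# Sketch only: the bearer token below is a placeholder. Provider credentials
# are supplied by OpenShell, not by code inside the sandbox.
request = urllib.request.Request(
    "https://inference.local/v1/models",
    headers={"Authorization": "Bearer placeholder"},
)

try:
    with urllib.request.urlopen(request, timeout=10) as response:
        models = json.loads(response.read())
        print([entry["id"] for entry in models.get("data", [])])
except OSError as exc:
    # Expected when this runs outside a sandbox.
    print(f"inference.local unreachable: {exc}")
```

Because OpenShell supplies the real credentials, sandbox code works the same way regardless of which provider backs the endpoint.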
+ +If code calls an external inference host directly, that traffic is evaluated +only by `network_policies`. ## Supported API Patterns -The proxy detects these inference patterns: +Supported request patterns depend on the provider configured for +`inference.local`. + +For OpenAI-compatible providers, these patterns are supported: + +| Pattern | Method | Path | +|---|---|---| +| OpenAI Chat Completions | `POST` | `/v1/chat/completions` | +| OpenAI Completions | `POST` | `/v1/completions` | +| OpenAI Responses | `POST` | `/v1/responses` | +| Model Discovery | `GET` | `/v1/models` | +| Model Discovery | `GET` | `/v1/models/*` | + +For Anthropic-compatible providers, this pattern is supported: | Pattern | Method | Path | |---|---|---| -| OpenAI Chat Completions | POST | `/v1/chat/completions` | -| OpenAI Completions | POST | `/v1/completions` | -| Anthropic Messages | POST | `/v1/messages` | +| Anthropic Messages | `POST` | `/v1/messages` | -If an intercepted request does not match any known pattern, it is denied. +Requests to `inference.local` that do not match the configured provider's +supported patterns are denied. ## Key Properties -- Zero code changes: standard SDK calls work transparently. -- Inference privacy: prompts and responses stay on your infrastructure. -- Credential isolation: the agent's code never sees your backend API key. -- Policy-controlled: `inference.allowed_routes` determines which routes a - sandbox can use. -- Hot-reloadable: update `allowed_routes` on a running sandbox without - restarting. +- External endpoints use `network_policies`. +- Explicit local endpoint: special local routing happens through + `inference.local`. +- No sandbox API keys: credentials come from the configured provider record. +- Single managed config: one provider and one model define sandbox inference. +- Provider-agnostic: OpenAI, Anthropic, and NVIDIA providers all work through + the same endpoint. 
+- Hot-refresh: provider credential changes and inference updates are picked up + without recreating sandboxes. ## Next Steps -- {doc}`configure-routes`: Create and manage inference routes. -- {doc}`/safety-and-privacy/policies`: Understand agent traffic versus - userland traffic and how network rules interact with inference routing. +- {doc}`configure`: configure the backend behind `inference.local`. +- [Network access rules](/safety-and-privacy/policies.md#network-access-rules): + understand how external endpoints are controlled. diff --git a/docs/reference/cli.md b/docs/reference/cli.md index d5e9a0af..35e79ed8 100644 --- a/docs/reference/cli.md +++ b/docs/reference/cli.md @@ -44,10 +44,9 @@ nemoclaw │ ├── update │ └── delete ├── inference -│ ├── create -│ ├── update -│ ├── delete -│ └── list +│ ├── set +│ ├── update +│ └── get ├── term └── completions ``` @@ -133,25 +132,30 @@ Manage credential providers that inject secrets into sandboxes. ## Inference Commands -Manage inference routes that intercept and reroute LLM API calls from userland code. +Configure the backend used by `https://inference.local`. -| Command | Description | +### `nemoclaw inference set` + +Set the provider and model for managed inference. Both flags are required. + +| Flag | Description | |---|---| -| `nemoclaw inference create` | Create a new inference route. See flag reference below. | -| `nemoclaw inference update ` | Update an existing route's configuration. | -| `nemoclaw inference delete ` | Delete an inference route. | -| `nemoclaw inference list` | List all inference routes in the active cluster. | +| `--provider` | Provider record name to use for injected credentials. | +| `--model` | Model identifier to force on generation requests. | + +### `nemoclaw inference update` -### Inference Create Flags +Update only the fields you specify. | Flag | Description | |---|---| -| `--routing-hint` | Short label that identifies this route (for example, `local`, `nvidia`, `staging`). 
Referenced by `allowed_routes` in sandbox policies. | -| `--base-url` | Base URL of the inference backend (for example, `https://vllm.internal:8000`). | -| `--model-id` | Model identifier to send to the backend (for example, `meta/llama-3.1-8b`). | -| `--api-key` | API key for authenticating with the backend. | -| `--protocol` | API protocol: `openai` or `anthropic`. Defaults to `openai`. | -| `--disabled` | Create the route in a disabled state. | +| `--provider` | Replace the current provider record. | +| `--model` | Replace the current model ID. | + +### `nemoclaw inference get` + +Show the current inference configuration, including provider, model, and +version. ## OpenShell Terminal diff --git a/docs/reference/policy-schema.md b/docs/reference/policy-schema.md index 033d1b09..5f626a87 100644 --- a/docs/reference/policy-schema.md +++ b/docs/reference/policy-schema.md @@ -15,7 +15,6 @@ filesystem_policy: { ... } landlock: { ... } process: { ... } network_policies: { ... } -inference: { ... } ``` | Field | Type | Required | Category | Description | @@ -25,7 +24,6 @@ inference: { ... } | `landlock` | object | No | Static | Configures Landlock LSM enforcement behavior. | | `process` | object | No | Static | Sets the user and group the agent process runs as. | | `network_policies` | map | No | Dynamic | Declares which binaries can reach which network endpoints. | -| `inference` | object | No | Dynamic | Controls which inference routing backends are available. | Static fields are set at sandbox creation time. Changing them requires destroying and recreating the sandbox. Dynamic fields can be updated on a running sandbox with `nemoclaw policy set` and take effect without restarting. @@ -137,7 +135,7 @@ Each endpoint defines a reachable destination and optional inspection rules. | `protocol` | string | No | Set to `rest` to enable L7 (HTTP) inspection. Omit for L4-only (TCP passthrough). | | `tls` | string | No | TLS handling mode. 
`terminate` decrypts TLS at the proxy for inspection. `passthrough` forwards encrypted traffic without inspection. Only relevant when `protocol` is `rest`. | | `enforcement` | string | No | `enforce` actively blocks disallowed requests. `audit` logs violations but allows traffic through. | -| `access` | string | No | HTTP access level. One of `read-only`, `read-write`, or `full`. Refer to table below. Mutually exclusive with `rules`. | +| `access` | string | No | HTTP access level. One of `read-only`, `read-write`, or `full`. Mutually exclusive with `rules`. | | `rules` | list of rule objects | No | Fine-grained per-method, per-path allow rules. Mutually exclusive with `access`. | #### Access Levels @@ -203,22 +201,3 @@ network_policies: - path: /usr/bin/npm - path: /usr/bin/node ``` - -## Inference - -**Category:** Dynamic - -Controls which inference routing backends userland code can access. The `allowed_routes` list names route types that the privacy router will accept. Traffic matching an inference API pattern that targets a route type not in this list is denied. - -| Field | Type | Required | Description | -|---|---|---|---| -| `allowed_routes` | list of strings | No | Routing hint labels (e.g., `local`, `nvidia`, `staging`) that this sandbox can use. Must match the `routing_hint` of inference routes created with `nemoclaw inference create`. 
| - -Example: - -```yaml -inference: - allowed_routes: - - local - - nvidia -``` From e03cbbadb36edbe1404f7b23910d9798883679e0 Mon Sep 17 00:00:00 2001 From: Piotr Mlocek Date: Mon, 9 Mar 2026 20:03:59 -0700 Subject: [PATCH 2/3] docs(inference): update provider and model examples --- docs/inference/configure.md | 8 ++++---- examples/local-inference/README.md | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/inference/configure.md b/docs/inference/configure.md index b0a6b7e0..ad437be0 100644 --- a/docs/inference/configure.md +++ b/docs/inference/configure.md @@ -24,7 +24,7 @@ Create a provider that holds the backend credentials you want OpenShell to use. $ nemoclaw provider create --name nvidia-prod --type nvidia --from-existing ``` -You can also use `openai` or `claude` providers. +You can also use `openai` or `anthropic` providers. ## Step 2: Set Inference Routing @@ -33,7 +33,7 @@ Point `inference.local` at that provider and choose the model to use: ```console $ nemoclaw inference set \ --provider nvidia-prod \ - --model meta/llama-3.1-8b-instruct + --model nvidia/nemotron-3-nano-30b-a3b ``` This sets the managed inference configuration. @@ -43,7 +43,7 @@ This sets the managed inference configuration. 
```console
$ nemoclaw inference get
provider: nvidia-prod
-model: meta/llama-3.1-8b-instruct
+model: nvidia/nemotron-3-nano-30b-a3b
version: 1
```

@@ -52,7 +52,7 @@ version: 1

Use `update` when you want to change only one field:

```console
-$ nemoclaw inference update --model meta/llama-3.3-70b-instruct
+$ nemoclaw inference update --model nvidia/llama-3.1-nemotron-70b-instruct
```

Or switch providers without repeating the current model manually:
diff --git a/examples/local-inference/README.md b/examples/local-inference/README.md
index 814b4ae6..0a05ed60 100644
--- a/examples/local-inference/README.md
+++ b/examples/local-inference/README.md
@@ -68,7 +68,7 @@ Then configure the cluster-managed `inference.local` route:

 # Example: use an existing provider record
 nemoclaw cluster inference set \
   --provider openai-prod \
-  --model gpt-4o-mini
+  --model gpt-4o
```

Verify the active config:

From 39554bb669c1289c138341c365d2b7f808e56e0d Mon Sep 17 00:00:00 2001
From: Miyoung Choi
Date: Mon, 9 Mar 2026 21:30:42 -0700
Subject: [PATCH 3/3] fix doc build

---
 docs/get-started/run-opencode.md | 4 ----
 docs/inference/configure.md      | 2 +-
 docs/inference/index.md          | 4 ++--
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/docs/get-started/run-opencode.md b/docs/get-started/run-opencode.md
index 85ac0f00..de5ebf2c 100644
--- a/docs/get-started/run-opencode.md
+++ b/docs/get-started/run-opencode.md
@@ -254,8 +254,4 @@ $ nemoclaw sandbox delete opencode-sandbox

- {doc}`../safety-and-privacy/policies`: How network policy fits into the sandbox iteration workflow.
- {doc}`../inference/index`: `inference.local`, supported API patterns, and request routing.
- {doc}`../sandboxes/providers`: Provider types, credential discovery, and manual and automatic creation.
-<<<<<<< pmlocek/inference-docs-pr
-- {doc}`../safety-and-privacy/security-model`: The protection layers and how they interact.
-=======
- {doc}`../safety-and-privacy/index`: The four protection layers and how they interact.
->>>>>>> main
diff --git a/docs/inference/configure.md b/docs/inference/configure.md
index ad437be0..32dfd180 100644
--- a/docs/inference/configure.md
+++ b/docs/inference/configure.md
@@ -93,7 +93,7 @@ security reasons. External providers that should be reached directly belong in
 ## Next Steps

 - {doc}`index`: understand the interception flow and supported API patterns.
-- [Network access rules](/safety-and-privacy/policies.md#network-access-rules):
+- [Network policy evaluation](/safety-and-privacy/policies.md#network-policy-evaluation):
   configure direct access to external inference endpoints.
- {doc}`../sandboxes/providers`: create and manage provider records.
- {doc}`../reference/cli`: see the CLI reference for `nemoclaw inference`
  commands.
diff --git a/docs/inference/index.md b/docs/inference/index.md
index 017b141a..da8abe08 100644
--- a/docs/inference/index.md
+++ b/docs/inference/index.md
@@ -18,7 +18,7 @@ If sandbox code calls an external inference API like `api.openai.com` or
 request. It is allowed or denied by `network_policies`.

 Refer to {doc}`/safety-and-privacy/policies` and the
-[Network access rules](/safety-and-privacy/policies.md#network-access-rules)
+[Network policy evaluation](/safety-and-privacy/policies.md#network-policy-evaluation)
 section for details.

 ## `inference.local`
@@ -80,5 +80,5 @@ supported patterns are denied.
 ## Next Steps

 - {doc}`configure`: configure the backend behind `inference.local`.
-- [Network access rules](/safety-and-privacy/policies.md#network-access-rules):
+- [Network policy evaluation](/safety-and-privacy/policies.md#network-policy-evaluation):
 understand how external endpoints are controlled.