51 changes: 25 additions & 26 deletions docs/get-started/run-opencode.md
@@ -5,15 +5,15 @@

# Run OpenCode with NVIDIA Inference

This tutorial walks you through a realistic setup where you run [OpenCode](https://opencode.ai) inside an OpenShell sandbox and configure private inference through `inference.local`. Along the way, you will hit a policy denial, diagnose it from logs, write a custom policy, and configure inference routing. This is the full policy iteration loop that you will use whenever you onboard a new tool.

## What You Will Learn

- Create a provider manually using the `--from-existing` flag.
- Write a custom policy to replace the default policy.
- Read sandbox logs to diagnose denied actions.
- Distinguish between agent traffic and sandbox inference traffic.
- Configure inference routing for code running inside the sandbox.

## Prerequisites

@@ -22,7 +22,7 @@

## Create the Provider

In the Claude Code tutorial, the CLI auto-discovers credentials. Here you create a provider explicitly, which gives you control over the provider name and type.

```console
$ nemoclaw provider create --name nvidia --type nvidia --from-existing
@@ -64,17 +64,16 @@ $ nemoclaw term

Look for lines like these:

```text
action=deny host=integrate.api.nvidia.com binary=/usr/local/bin/opencode reason="no matching network policy"
action=deny host=opencode.ai binary=/usr/bin/node reason="no matching network policy"
action=inspect_for_inference host=integrate.api.nvidia.com binary=/bin/bash
```

Each log entry tells you the exact host, binary, and reason for the denial.
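If you script around these logs, the `key=value` format shown above parses with a few lines of Python. This is a sketch based on the sample lines in this tutorial; the exact field set may vary by OpenShell version:

```python
import shlex

def parse_log_line(line: str) -> dict:
    """Split a key=value policy log line into a dict.

    shlex.split handles the quoted reason="..." field correctly.
    """
    return dict(token.split("=", 1) for token in shlex.split(line))

entry = parse_log_line(
    'action=deny host=opencode.ai binary=/usr/bin/node '
    'reason="no matching network policy"'
)
print(entry["host"], entry["binary"])  # opencode.ai /usr/bin/node
```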

## Understand the Denial

The default policy contains a `nvidia_inference` network policy entry, but it is configured for a narrow set of binaries - typically `/usr/local/bin/claude` and `/usr/bin/node`. When OpenCode makes HTTP calls through its own binary, `curl`, or a shell subprocess, those connections do not match any policy rule and get denied.
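Conceptually, the proxy's decision reduces to checking the (host, binary) pair against each network policy entry. This illustrative sketch (not OpenShell's actual implementation) shows why a request from `/usr/local/bin/opencode` falls through:

```python
# Illustrative model of network policy matching, not OpenShell's real code.
POLICIES = {
    "nvidia_inference": {
        "hosts": {"integrate.api.nvidia.com"},
        "binaries": {"/usr/local/bin/claude", "/usr/bin/node"},
    },
}

def decide(host: str, binary: str) -> str:
    for rule in POLICIES.values():
        if host in rule["hosts"] and binary in rule["binaries"]:
            return "allow"
    return "deny"  # no matching network policy

# The host matches, but the binary does not, so the request is denied.
print(decide("integrate.api.nvidia.com", "/usr/local/bin/opencode"))  # deny
```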

Two separate problems are at play:

@@ -89,9 +88,6 @@ Create a file called `opencode-policy.yaml` with the following content:

```yaml
version: 1
filesystem_policy:
include_workdir: true
read_only:
@@ -180,15 +176,14 @@ network_policies:
- path: /usr/bin/git
```

This policy differs from the default in three key ways:

- `opencode_api`: Allows OpenCode and Node.js to reach `opencode.ai:443`.
- Broader `nvidia_inference` binaries: Adds `/usr/local/bin/opencode`, `/usr/bin/curl`, and `/bin/bash` so OpenCode's subprocesses can reach the NVIDIA endpoint.
- GitHub access: Scoped to support OpenCode's git operations.

:::{warning}
The `filesystem_policy`, `landlock`, and `process` sections are static. They are set at sandbox creation time and cannot be changed on a running sandbox. To modify these, delete and recreate the sandbox. The `network_policies` section is dynamic and can be hot-reloaded.
:::

## Apply the Policy
@@ -209,24 +204,26 @@ $ nemoclaw policy list opencode-sandbox

The latest revision should show status `loaded`.

## Configure Inference Routing

So far, you have allowed the OpenCode *agent* to reach `integrate.api.nvidia.com` directly through network policy. But code that OpenCode writes and runs inside the sandbox should use `https://inference.local`.

Configure inference so `inference.local` routes to your NVIDIA provider:

```console
$ nemoclaw inference set \
--provider nvidia \
--model z-ai/glm5
```

Verify the active configuration:

```console
$ nemoclaw inference get
```

:::{note}
*Network policies* and managed inference are two separate enforcement points. Network policies control which external hosts the agent binary can reach directly. Inference configuration controls where userland calls to `https://inference.local` are routed.
:::

## Verify the Policy
@@ -237,7 +234,9 @@ Tail the logs again:
$ nemoclaw logs opencode-sandbox --tail
```

You should see `action=allow` lines for the endpoints you added: `opencode.ai`, `integrate.api.nvidia.com`, and GitHub.

To verify userland inference, run code inside the sandbox that targets `https://inference.local/v1`.

If you still see denials, read the log line carefully. It tells you the exact host, port, and binary that was blocked. Add the missing entry to your policy and push again with `nemoclaw policy set`. This observe-modify-push cycle is the normal workflow for onboarding any new tool in OpenShell.

@@ -252,7 +251,7 @@ $ nemoclaw sandbox delete opencode-sandbox
## Next Steps

- {doc}`../safety-and-privacy/policies`: Full reference on policy YAML structure, static and dynamic fields, and enforcement modes.
- {doc}`../safety-and-privacy/policies`: How network policy fits into the sandbox iteration workflow.
- {doc}`../inference/index`: `inference.local`, supported API patterns, and request routing.
- {doc}`../sandboxes/providers`: Provider types, credential discovery, and manual and automatic creation.
- {doc}`../safety-and-privacy/index`: The four protection layers and how they interact.
2 changes: 1 addition & 1 deletion docs/index.md
@@ -221,7 +221,7 @@ safety-and-privacy/policies
:hidden:

inference/index
inference/configure
```

```{toctree}
92 changes: 0 additions & 92 deletions docs/inference/configure-routes.md

This file was deleted.

100 changes: 100 additions & 0 deletions docs/inference/configure.md
@@ -0,0 +1,100 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Configure Inference Routing

OpenShell exposes one managed inference backend behind `https://inference.local`
for the active gateway.

External inference endpoints still go through sandbox `network_policies`. This
page covers the special local inference endpoint only.

The managed inference configuration consists of two values:

- a provider record name
- a model ID

## Step 1: Create a Provider

Create a provider that holds the backend credentials you want OpenShell to use.

```console
$ nemoclaw provider create --name nvidia-prod --type nvidia --from-existing
```

You can also use `openai` or `anthropic` provider types.

## Step 2: Set Inference Routing

Point `inference.local` at that provider and choose the model to use:

```console
$ nemoclaw inference set \
--provider nvidia-prod \
--model nvidia/nemotron-3-nano-30b-a3b
```

This sets the managed inference configuration.

## Step 3: Verify the Active Config

```console
$ nemoclaw inference get
provider: nvidia-prod
model: nvidia/nemotron-3-nano-30b-a3b
version: 1
```
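If a script needs the active configuration, the `key: value` output shown above is straightforward to parse. This assumes the output format stays as shown; treat it as a convenience sketch rather than a stable API:

```python
def parse_inference_config(output: str) -> dict:
    """Parse `nemoclaw inference get` style key: value output into a dict."""
    config = {}
    for line in output.splitlines():
        key, _, value = line.partition(": ")
        if key and value:
            config[key] = value
    return config

sample = "provider: nvidia-prod\nmodel: nvidia/nemotron-3-nano-30b-a3b\nversion: 1"
print(parse_inference_config(sample)["model"])  # nvidia/nemotron-3-nano-30b-a3b
```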

## Step 4: Update Part of the Config

Use `update` when you want to change only one field:

```console
$ nemoclaw inference update --model nvidia/nemotron-3-nano-30b-a3b
```

Or switch providers without repeating the current model manually:

```console
$ nemoclaw inference update --provider openai-prod
```

## Use It from a Sandbox

Once inference is configured, userland code inside any sandbox can call
`https://inference.local` directly:

```python
from openai import OpenAI

client = OpenAI(base_url="https://inference.local/v1", api_key="dummy")

response = client.chat.completions.create(
    model="anything",
    messages=[{"role": "user", "content": "Hello"}],
)
```

The client-supplied model is ignored for generation requests. OpenShell
rewrites it to the configured model before forwarding upstream.
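The rewrite behaves roughly like this. This is an illustrative sketch of the gateway's behavior, not its actual code, and the configured model name is taken from the earlier example:

```python
# Illustrative: how the gateway conceptually rewrites the model field
# of a chat completion request before forwarding it upstream.
CONFIGURED_MODEL = "nvidia/nemotron-3-nano-30b-a3b"  # from `nemoclaw inference set`

def rewrite_request(payload: dict) -> dict:
    forwarded = dict(payload)
    forwarded["model"] = CONFIGURED_MODEL  # client-supplied value is discarded
    return forwarded

request = {"model": "anything", "messages": [{"role": "user", "content": "Hello"}]}
print(rewrite_request(request)["model"])  # nvidia/nemotron-3-nano-30b-a3b
```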

Use this endpoint when inference should stay local to the host for privacy and
security reasons. External providers that should be reached directly belong in
`network_policies` instead.

## Good to Know

- Gateway-scoped: every sandbox on the active gateway sees the same
`inference.local` backend.
- HTTPS only: `inference.local` is intercepted only for HTTPS traffic.

## Next Steps

- {doc}`index`: understand the interception flow and supported API patterns.
- [Network policy evaluation](/safety-and-privacy/policies.md#network-policy-evaluation):
configure direct access to external inference endpoints.
- {doc}`../sandboxes/providers`: create and manage provider records.
- {doc}`../reference/cli`: see the CLI reference for `nemoclaw inference`
commands.