docker · dgageot · May 6, 2026 · May 6, 2026
@@ -94,6 +94,8 @@
       url: /features/acp/
     - title: API Server
       url: /features/api-server/
+    - title: Chat Server
+      url: /features/chat-server/
     - title: Evaluation
       url: /features/evaluation/
     - title: RAG

@@ -40,7 +40,7 @@ models:
 | Property              | Type       | Required | Description                                                                           |
 | --------------------- | ---------- | -------- | ------------------------------------------------------------------------------------- |
 | `provider`            | string     | ✓        | Provider: `openai`, `anthropic`, `google`, `amazon-bedrock`, `dmr`, `mistral`, `xai`, `nebius`, `minimax`, `requesty`, `azure`, `ollama`, `github-copilot`, or any [named provider]({{ '/providers/custom/' | relative_url }}). |
-| `model`               | string     | ✓        | Model name (e.g., `gpt-4o`, `claude-sonnet-4-0`, `gemini-2.5-flash`)                  |
+| `model`               | string     | ✓        | Model name (e.g., `gpt-4o`, `claude-sonnet-4-5`, `gemini-2.5-flash`)                  |
 | `temperature`         | float      | ✗        | Sampling randomness. Range is provider-dependent — typically `0.0–2.0` (Anthropic caps at `1.0`). `0.0` is deterministic. |
 | `max_tokens`          | int        | ✗        | Maximum response length in tokens                                                     |
 | `top_p`               | float      | ✗        | Nucleus sampling threshold (`0.0–1.0`)                                                |
@@ -232,7 +232,7 @@ models:
   # Anthropic
   claude:
     provider: anthropic
-    model: claude-sonnet-4-0
+    model: claude-sonnet-4-5
     max_tokens: 64000
 
   # Google Gemini

@@ -192,7 +192,7 @@ agents:
 ```yaml
 agents:
   classifier:
-    model: anthropic/claude-sonnet-4-0
+    model: anthropic/claude-sonnet-4-5
     description: Classify support tickets
     instruction: |
       Classify the support ticket into the appropriate category

@@ -395,7 +395,7 @@ toolsets:
 ```yaml
 agents:
   root:
-    model: anthropic/claude-sonnet-4-0
+    model: anthropic/claude-sonnet-4-5
     description: Full-featured developer assistant
     instruction: You are an expert developer.
     toolsets:

@@ -190,6 +190,6 @@ Toggle auto-approve with `POST /api/sessions/:id/tools/toggle` for automated wor
 <div class="callout callout-info" markdown="1">
 <div class="callout-title">ℹ️ See also
 </div>
-  <p>For interactive use, see the <a href="{{ '/features/tui/' | relative_url }}">Terminal UI</a>. For agent-to-agent communication, see <a href="{{ '/features/a2a/' | relative_url }}">A2A Protocol</a> and <a href="{{ '/features/acp/' | relative_url }}">ACP</a>. For MCP integration, see <a href="{{ '/features/mcp-mode/' | relative_url }}">MCP Mode</a>.</p>
+  <p>For interactive use, see the <a href="{{ '/features/tui/' | relative_url }}">Terminal UI</a>. For agent-to-agent communication, see <a href="{{ '/features/a2a/' | relative_url }}">A2A Protocol</a> and <a href="{{ '/features/acp/' | relative_url }}">ACP</a>. For MCP integration, see <a href="{{ '/features/mcp-mode/' | relative_url }}">MCP Mode</a>. For an OpenAI-compatible chat-completions API, see the <a href="{{ '/features/chat-server/' | relative_url }}">Chat Server</a>.</p>
 
 </div>
@@ -0,0 +1,230 @@
+---
+title: "Chat Server"
+description: "Expose your agents through an OpenAI-compatible Chat Completions API so any tool that already speaks OpenAI can drive a docker-agent agent."
+permalink: /features/chat-server/
+---
+
+# Chat Server
+
+_Expose your agents through an OpenAI-compatible Chat Completions API so any tool that already speaks OpenAI can drive a docker-agent agent._
+
+## Overview
+
+The `docker agent serve chat` command starts an HTTP server that exposes one or
+more agents through an **OpenAI-compatible Chat Completions API** at
+`/v1/chat/completions` and `/v1/models`. Any client that already speaks the
+OpenAI protocol — for example
+[Open WebUI](https://github.com/open-webui/open-webui), `curl`, the OpenAI
+Python SDK, or LangChain — can drive a docker-agent agent without any custom
+integration.
+
+```bash
+# Single agent — exposed as the model `root`
+$ docker agent serve chat agent.yaml
+
+# Multi-agent config — every agent in the team becomes a model
+$ docker agent serve chat ./team.yaml
+
+# Pick a specific agent from a multi-agent config
+$ docker agent serve chat ./team.yaml --agent reviewer
+
+# Run an agent straight from the registry
+$ docker agent serve chat agentcatalog/pirate --listen 127.0.0.1:9090
+
+# Require a Bearer token, sourced from an env var
+$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
+```
+
+<div class="callout callout-tip" markdown="1">
+<div class="callout-title">💡 When to use chat server vs. API server
+</div>
+  <p>Use the <strong>chat server</strong> when you want to plug docker-agent into existing OpenAI-compatible tooling (chat UIs, IDE integrations, OpenAI SDK clients). Use the <a href="{{ '/features/api-server/' | relative_url }}">API server</a> when you want full control over sessions, agent execution, tool-call confirmations, and streamed runtime events.</p>
+
+</div>
+
+## Endpoints
+
+The OpenAI-compatible endpoints live under the `/v1` prefix to match the
+OpenAI API surface. The OpenAPI specification is served at the top level so it
+can be discovered without authentication.
+
+| Method | Path                   | Description                                                            |
+| ------ | ---------------------- | ---------------------------------------------------------------------- |
+| `GET`  | `/v1/models`           | List the agents that this server exposes as models                     |
+| `POST` | `/v1/chat/completions` | Send messages and receive a completion (regular or streaming)          |
+| `GET`  | `/openapi.json`        | OpenAPI specification for the chat server                              |
+
+The `model` field of a `POST /v1/chat/completions` request is the **agent name**.
+For a single-agent config that's typically `root`; for a multi-agent config,
+each named agent becomes its own selectable model.
+
+## Quick Start
+
+```bash
+# 1. Start the server
+$ docker agent serve chat agent.yaml
+Listening on 127.0.0.1:8083
+OpenAI-compatible chat completions endpoint: http://127.0.0.1:8083/v1/chat/completions
+
+# 2. List exposed agents (models)
+$ curl http://127.0.0.1:8083/v1/models
+{"object":"list","data":[{"id":"root","object":"model","owned_by":"docker-agent"}]}
+
+# 3. Send a chat request
+$ curl http://127.0.0.1:8083/v1/chat/completions \
+    -H 'Content-Type: application/json' \
+    -d '{
+      "model": "root",
+      "messages": [{"role": "user", "content": "Hello!"}]
+    }'
+```
+
+### Streaming
+
+Set `"stream": true` in the request body to receive a Server-Sent Events
+(SSE) stream of OpenAI-format `chat.completion.chunk` deltas:
+
+```bash
+$ curl -N http://127.0.0.1:8083/v1/chat/completions \
+    -H 'Content-Type: application/json' \
+    -d '{
+      "model": "root",
+      "stream": true,
+      "messages": [{"role": "user", "content": "Stream a poem"}]
+    }'
+```
+
+### Drive it from the OpenAI Python SDK
+
+Because the wire format is OpenAI-compatible, point any OpenAI client at the
+chat server's `base_url` and use the agent name as the model:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://127.0.0.1:8083/v1",
+    api_key="not-needed-when-no-api-key-flag",  # the SDK requires a non-empty value; the server ignores it unless --api-key or --api-key-env is set
+)
+
+resp = client.chat.completions.create(
+    model="root",
+    messages=[{"role": "user", "content": "Hello!"}],
+)
+print(resp.choices[0].message.content)
+```
+
+## Server-side Conversation Caching
+
+By default the server is **stateless**: every request must contain the full
+message history, exactly like OpenAI's API. To let the server keep the history,
+set `--conversations-max` to a positive value and send the same
+`X-Conversation-Id` header on each request, which then only needs to carry the new messages:
+
+```bash
+$ docker agent serve chat agent.yaml --conversations-max 100 --conversation-ttl 30m
+```
+
+```bash
+$ curl http://127.0.0.1:8083/v1/chat/completions \
+    -H 'Content-Type: application/json' \
+    -H 'X-Conversation-Id: my-thread-1' \
+    -d '{
+      "model": "root",
+      "messages": [{"role": "user", "content": "Remember my name is Alice"}]
+    }'
+
+$ curl http://127.0.0.1:8083/v1/chat/completions \
+    -H 'Content-Type: application/json' \
+    -H 'X-Conversation-Id: my-thread-1' \
+    -d '{
+      "model": "root",
+      "messages": [{"role": "user", "content": "What is my name?"}]
+    }'
+```
+
+Cached conversations are evicted after `--conversation-ttl` of inactivity, or
+when the cache reaches `--conversations-max` entries (oldest entries are evicted
+first).
+
+## Authentication
+
+The chat server has **no authentication by default**. To require a Bearer
+token, pass `--api-key` (literal value) or `--api-key-env` (name of an
+environment variable that holds the value):
+
+```bash
+$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
+```
+
+Clients must then send an `Authorization: Bearer <token>` header on every
+request to `/v1/*`: both `/v1/models` and `/v1/chat/completions` are
+protected once a key is set, while `/openapi.json` stays reachable without a token.
+
+<div class="callout callout-warning" markdown="1">
+<div class="callout-title">⚠️ Public exposure
+</div>
+  <p>The default listen address is <code>127.0.0.1:8083</code>. If you bind to a non-loopback address, always set <code>--api-key</code> or <code>--api-key-env</code> — there is no other authentication layer.</p>
+
+</div>
+
+## CORS
+
+CORS is **disabled by default**. To allow a browser-based client to call the
+server, set `--cors-origin` to the exact origin (scheme + host + port) that
+should be allowed:
+
+```bash
+$ docker agent serve chat agent.yaml --cors-origin https://my-ui.example.com
+```
+
+## CLI Flags
+
+```bash
+docker agent serve chat <agent-file>|<registry-ref> [flags]
+```
+
+| Flag                          | Default            | Description                                                                                                       |
+| ----------------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------- |
+| `-a, --agent <name>`          | (all agents)       | Name of the agent to expose. If omitted, every agent in the config is exposed as a separate model.                |
+| `-l, --listen <addr>`         | `127.0.0.1:8083`   | Address to listen on.                                                                                             |
+| `--cors-origin <origin>`      | (none)             | Allowed CORS origin (e.g. `https://example.com`). Empty disables CORS.                                            |
+| `--api-key <token>`           | (none)             | Required Bearer token clients must present (`Authorization: Bearer <token>`). Empty disables auth.                |
+| `--api-key-env <name>`        | (none)             | Read the API key from this environment variable instead of the command line.                                      |
+| `--max-request-size <bytes>`  | `1048576` (1 MiB)  | Maximum request body size.                                                                                        |
+| `--request-timeout <dur>`     | `5m`               | Per-request timeout (covers model + tool calls + streaming).                                                      |
+| `--conversations-max <n>`     | `0`                | Cache up to N conversations server-side, keyed by `X-Conversation-Id`. `0` disables — clients must resend history. |
+| `--conversation-ttl <dur>`    | `30m`              | Idle TTL after which a cached conversation is evicted.                                                            |
+| `--max-idle-runtimes <n>`     | `4`                | Maximum number of idle runtimes pooled per agent. `0` disables pooling.                                           |
+
+All [runtime configuration flags]({{ '/features/cli/#runtime-configuration-flags' | relative_url }})
+(`--working-dir`, `--env-from-file`, `--models-gateway`, `--hook-*`, …) are
+also accepted.
+
+## Open WebUI Integration
+
+Open WebUI can talk to any OpenAI-compatible endpoint. To plug docker-agent
+in:
+
+1. Start the chat server, optionally with auth:
+
+    ```bash
+    $ docker agent serve chat agent.yaml \
+        --listen 127.0.0.1:8083 \
+        --cors-origin http://localhost:3000 \
+        --api-key-env OPEN_WEBUI_TOKEN
+    ```
+
+2. In Open WebUI, add an OpenAI-compatible connection:
+
+    - **API Base URL:** `http://127.0.0.1:8083/v1`
+    - **API Key:** the value of `OPEN_WEBUI_TOKEN`
+
+3. Each agent in your config appears as a selectable model.
+
+<div class="callout callout-info" markdown="1">
+<div class="callout-title">ℹ️ See also
+</div>
+  <p>For the docker-agent–native HTTP API (sessions, tool-call confirmation, runtime events), see the <a href="{{ '/features/api-server/' | relative_url }}">API Server</a>. For full CLI flag documentation, see the <a href="{{ '/features/cli/#docker-agent-serve-chat' | relative_url }}">CLI Reference</a>.</p>
+
+</div>
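
The sketch below is a reader's illustration for the new Chat Server page above and is not part of the patch. It mirrors the page's `curl` examples using only the Python standard library: `build_chat_request` assembles a `POST /v1/chat/completions` request with the optional `Authorization: Bearer` and `X-Conversation-Id` headers, and `iter_deltas` pulls the text out of `chat.completion.chunk` SSE lines. Both function names are hypothetical, and the `data: [DONE]` end-of-stream sentinel is an assumption borrowed from OpenAI's streaming format; the page does not say whether the chat server emits it.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, token=None,
                       conversation_id=None, stream=False):
    """Build (but do not send) a POST to <base_url>/chat/completions."""
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when the server runs with --api-key or --api-key-env.
        headers["Authorization"] = f"Bearer {token}"
    if conversation_id:
        # Only meaningful when the server runs with --conversations-max > 0.
        headers["X-Conversation-Id"] = conversation_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def iter_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            # Assumption: OpenAI-style end-of-stream sentinel.
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Example: the request from the page's streaming section, with auth and a
# conversation id added. The token value is a placeholder.
req = build_chat_request(
    "http://127.0.0.1:8083/v1",
    model="root",
    messages=[{"role": "user", "content": "Stream a poem"}],
    token="s3cret",
    conversation_id="my-thread-1",
    stream=True,
)
```

Under those assumptions, passing `req` to `urllib.request.urlopen` and feeding the decoded response lines to `iter_deltas` should yield the reply piece by piece. Nothing here has been run against a real server, so treat it as a starting point rather than a reference client.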
@@ -65,8 +65,8 @@ $ docker agent run [config] [message...] [flags]
 $ docker agent run agent.yaml
 $ docker agent run agent.yaml "Fix the bug in auth.go"
 $ docker agent run agent.yaml -a developer --yolo
-$ docker agent run agent.yaml --model anthropic/claude-sonnet-4-0
-$ docker agent run agent.yaml --model "dev=openai/gpt-4o,reviewer=anthropic/claude-sonnet-4-0"
+$ docker agent run agent.yaml --model anthropic/claude-sonnet-4-5
+$ docker agent run agent.yaml --model "dev=openai/gpt-4o,reviewer=anthropic/claude-sonnet-4-5"
 $ docker agent run agent.yaml --session -1  # resume last session
 $ docker agent run agent.yaml --prompt-file ./context.md  # include file as context
 
@@ -265,6 +265,8 @@ $ curl http://127.0.0.1:8083/v1/chat/completions \
     -d '{"model": "root", "messages": [{"role": "user", "content": "hello"}]}'
 ```
 
+See [Chat Server]({{ '/features/chat-server/' | relative_url }}) for the full feature reference.
+
 ### `docker agent share push` / `docker agent share pull`
 
 Share agents via OCI registries.
@@ -344,7 +346,7 @@ $ docker agent alias add other ociReference
 # Add an alias with runtime options
 $ docker agent alias add yolo-coder agentcatalog/coder --yolo
 $ docker agent alias add fast-coder agentcatalog/coder --model openai/gpt-4o-mini
-$ docker agent alias add turbo agentcatalog/coder --yolo --model anthropic/claude-sonnet-4-0
+$ docker agent alias add turbo agentcatalog/coder --yolo --model anthropic/claude-sonnet-4-5
 
 # Use an alias
 $ docker agent run pirate
@@ -364,7 +366,7 @@ $ docker agent alias ls
 Registered aliases (3):
 
   fast-coder  → agentcatalog/coder [model=openai/gpt-4o-mini]
-  turbo       → agentcatalog/coder [yolo, model=anthropic/claude-sonnet-4-0]
+  turbo       → agentcatalog/coder [yolo, model=anthropic/claude-sonnet-4-5]
   yolo-coder  → agentcatalog/coder [yolo]
 
 Run an alias with: docker agent run <alias>

@@ -108,14 +108,14 @@ When you expose a multi-agent configuration via MCP, each agent becomes a separa
 ```yaml
 agents:
   root:
-    model: anthropic/claude-sonnet-4-0
+    model: anthropic/claude-sonnet-4-5
     description: Main coordinator
     sub_agents: [designer, engineer]
   designer:
     model: openai/gpt-5-mini
     description: UI/UX design specialist
   engineer:
-    model: anthropic/claude-sonnet-4-0
+    model: anthropic/claude-sonnet-4-5
     description: Software engineer
 ```
 

@@ -188,7 +188,7 @@ Combine multiple remote MCP servers in a single agent:
 ```yaml
 agents:
   root:
-    model: anthropic/claude-sonnet-4-0
+    model: anthropic/claude-sonnet-4-5
     instruction: |
       You help manage projects and deployments.
     toolsets:

@@ -294,7 +294,7 @@ openaiClient, _ := openai.NewClient(ctx, &latest.ModelConfig{
 // Anthropic
 anthropicClient, _ := anthropic.NewClient(ctx, &latest.ModelConfig{
     Provider: "anthropic",
-    Model:    "claude-sonnet-4-0",
+    Model:    "claude-sonnet-4-5",
 }, env)
 
 // Google Gemini