134 changes: 134 additions & 0 deletions docs/features/extensibility/pipelines/pipes.md
@@ -20,3 +20,137 @@ Pipes that are defined in your WebUI show up as a new model with an "External" d
![Pipe Models in WebUI](/images/pipelines/pipe-model-example.png)
</a>
</div>

## Streaming response format

Pipes can return either a single `str` or an iterator/generator. When streaming, each yielded item can be:

- **A plain string** — treated as assistant-visible text content and appended to the message as it arrives. This is the simplest form and the one most agent pipelines should use for regular output.
- **An OpenAI-compatible SSE chunk dict** — same shape as the `/v1/chat/completions` streaming response, i.e.

```python
{"choices": [{"delta": {"content": "..."}, "finish_reason": None}]}
```

Use this when you need to set fields other than `content` (for example `finish_reason` on the final chunk).

For a self-contained stream, close it with a single terminating chunk:

```python
yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}
```

`finish_reason` should appear **exactly once**, at the end, and for a pipeline that handles its own tool execution it should always be `"stop"` — not `"tool_calls"` (see the next section).
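Putting both forms together, a minimal streaming pipe looks like this (a sketch; the signature matches the fuller example later on this page):

```python
def pipe(self, user_message, model_id, messages, body):
    # Plain strings are appended to the assistant message as they arrive.
    yield "Hello, "
    yield "world."
    # One terminating chunk, exactly once, at the very end.
    yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}
```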

## Self-contained agents and `delta.tool_calls`

This is the single biggest gotcha when building an agent pipeline (LangChain, LlamaIndex, a custom planner, anything that executes its own tools and streams the result back).

`delta.tool_calls` in a chunk means **"please execute this tool call for me, client"**. When Open WebUI's middleware sees it, the tool executor picks up the call, runs it, appends a `role: "tool"` message, and fires a continuation request back at the same pipeline. It does this in a loop capped by `CHAT_RESPONSE_MAX_TOOL_CALL_RETRIES` (≈30).

If your pipeline already executed the tool internally, emitting `delta.tool_calls` makes Open WebUI try to execute it *again* — and since the pipeline keeps emitting the same call on every retry, you get 30 copies of the response stacked on top of each other before the retry cap trips. Same thing happens if you set `finish_reason: "tool_calls"` mid-stream.

**Rule of thumb:**

- The model is calling a tool Open WebUI should run → emit `delta.tool_calls`, terminate with `finish_reason: "tool_calls"`, let the middleware call the tool and re-enter your pipeline (sketched below).
- The pipeline is running an agent that owns its own tools → **do not** emit `delta.tool_calls` at all. Render the tool execution as content using the `<details type="tool_calls">` block described below.
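For the first path (delegating execution to Open WebUI), the chunks use the standard OpenAI streaming shape. A sketch, where the `id`, tool name, and arguments are placeholders:

```python
# Delegated path: the middleware executes this call, appends a role:"tool"
# message with the result, and re-enters the pipeline for a continuation.
yield {
    "choices": [{
        "delta": {
            "tool_calls": [{
                "index": 0,
                "id": "call_abc123",        # placeholder id
                "type": "function",
                "function": {
                    "name": "get_weather",  # placeholder tool name
                    "arguments": '{"location": "SF"}',
                },
            }]
        },
        "finish_reason": None,
    }]
}
yield {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]}
```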

### Rendering tool execution as content

Open WebUI's own server-side tool path renders finished tool executions as `<details type="tool_calls">` blocks in the message content. You can emit the same block from an agent pipeline to get the identical "Called &lt;tool&gt;" chip with an expandable arguments + result view:

```python
import html
import json

call_id = "call_123"
name = "get_weather_test"
arguments = {"location": "SF"}
result = {"temp_c": 22}

details_block = (
f'<details type="tool_calls" done="true" '
f'id="{call_id}" name="{name}" '
f'arguments="{html.escape(json.dumps(arguments))}">\n'
f'<summary>Tool Executed</summary>\n'
f'{html.escape(json.dumps(result, ensure_ascii=False))}\n'
f'</details>\n'
)
```

Yield `details_block` as content — either directly as a string (simplest on a Pipelines server) or inside a `delta.content` chunk:

```python
# Simplest — works on Pipelines servers:
yield details_block

# Or as an explicit OpenAI chunk:
yield {"choices": [{"delta": {"content": details_block}, "finish_reason": None}]}
```

The final stream for a self-contained agent that ran one tool looks like this end-to-end:

```python
import html
import json

def pipe(self, user_message, model_id, messages, body):
# 1. Pre-tool narrative
yield {"choices": [{"delta": {"role": "assistant", "content": "Looking up the weather… "}, "finish_reason": None}]}

# 2. Agent runs the tool internally (not shown)
call_id = "call_123"
name = "get_weather_test"
arguments = {"location": "SF"}
result = {"temp_c": 22}

# 3. Render the execution as a <details> block — NOT delta.tool_calls
details_block = (
f'<details type="tool_calls" done="true" '
f'id="{call_id}" name="{name}" '
f'arguments="{html.escape(json.dumps(arguments))}">\n'
f'<summary>Tool Executed</summary>\n'
f'{html.escape(json.dumps(result, ensure_ascii=False))}\n'
f'</details>\n'
)
yield details_block

# 4. Post-tool narrative
yield "The weather is 22°C. Done."

# 5. Single terminating chunk
yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}
```

### LangChain agent example

Wiring a LangChain agent into this pattern — drop `tool_calls` on `AIMessageChunk`, render `ToolMessage` as a `<details>` block:

```python
import html
import json

from langchain_core.messages import AIMessageChunk, ToolMessage

# With a list of stream modes, LangGraph yields (mode, data) tuples.
for mode, data in agent.stream({"messages": messages}, stream_mode=["updates", "messages"]):
    if mode != "messages":
        continue
    message = data[0]  # "messages" mode pairs the chunk with its metadata

if isinstance(message, AIMessageChunk):
# Stream content only — drop message.tool_calls entirely.
if message.content:
yield message.content

    elif isinstance(message, ToolMessage):
        # ToolMessage carries the result, not the call arguments; those live on
        # the preceding AIMessage's tool_calls, so this getattr is a best-effort fallback.
        args = getattr(message, "args", {}) or {}
details = (
f'<details type="tool_calls" done="true" '
f'id="{message.tool_call_id}" name="{message.name}" '
f'arguments="{html.escape(json.dumps(args))}">\n'
f'<summary>Tool Executed</summary>\n'
f'{html.escape(json.dumps(message.content, ensure_ascii=False, default=str))}\n'
f'</details>\n'
)
yield details

# Single terminating chunk
yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}
```

Reference discussion: [open-webui #23957](https://github.com/open-webui/open-webui/issues/23957) walks through the duplication symptom and the fix in detail.
6 changes: 6 additions & 0 deletions docs/features/extensibility/plugin/functions/pipe.mdx
@@ -278,6 +278,12 @@ If you must use a synchronous third-party library in an async handler, wrap the

You can modify this proxy Pipe to support additional service providers like Anthropic, Perplexity, and more by adjusting the API endpoints, headers, and logic within the `pipes` and `pipe` functions.

:::caution Building a self-contained agent? Don't emit `delta.tool_calls`.
If your Pipe wraps an agent (LangChain, LlamaIndex, a custom planner, …) that executes tools **internally** and then streams the final answer back to the chat, emitting `delta.tool_calls` in the stream will trigger Open WebUI's tool-execution retry loop — the middleware treats `delta.tool_calls` as "please execute this for me, client" and loops back through your pipe, duplicating the response up to `CHAT_RESPONSE_MAX_TOOL_CALL_RETRIES` (~30) times.

For self-contained agents, render tool executions as `<details type="tool_calls">` content blocks instead — the same shape Open WebUI itself emits after internal tool execution. See the [Pipes → Self-contained agents and `delta.tool_calls`](/features/extensibility/pipelines/pipes#self-contained-agents-and-deltatool_calls) section for the full pattern, a LangChain example, and the rule of thumb for which path to take.
:::

---

## Using Internal Open WebUI Functions
10 changes: 5 additions & 5 deletions docs/features/extensibility/plugin/tools/index.mdx
@@ -142,19 +142,19 @@ Native Mode can be enabled at two levels:

1. **Universal Default for Every Model (Fastest — Recommended)**:
* Navigate to **Admin Panel → Settings → Models**.
-    * Click the **gear icon** (⚙️) at the **top right** of the models list — this opens **global model parameters**, which apply to *every* model in your instance (current and future) unless a specific model overrides them.
-    * Under **Advanced Parameters**, set **Function Calling** to `Native`.
+    * Click the **Settings** button at the **top right** of the models list — this opens **global model parameters**, which apply to *every* model in your instance (current and future) unless a specific model overrides them.
+    * Under **Model Parameters**, set **Function Calling** to `Native`.
* Save. All existing models that haven't explicitly set their own value, and all models you add later, inherit `Native`. You do **not** need to edit them one by one.
2. **Per-Model Override**:
* In **Admin Panel → Settings → Models**, pick a specific model and click its edit button.
-    * Under **Advanced Parameters**, set **Function Calling** to `Native`. This value overrides the global default for that model only.
+    * Under **Model Parameters**, set **Function Calling** to `Native`. This value overrides the global default for that model only.
* Use this when a specific model needs different parameters — otherwise prefer the global setting.
3. **Per-Chat Override**:
-    * Inside a chat, click the ⚙️ **Chat Controls** icon.
+    * Inside a chat, open **Chat Controls** (right sidebar).
* Under **Advanced Params**, set **Function Calling** to `Native`. Applies to that chat only.

:::tip Set Function Calling Globally — Once, For All Models
- Tired of switching every model to Native one at a time? The **global model parameters** menu (the gear icon at the top right of **Admin Panel → Settings → Models**) lets you configure any advanced parameter — `function_calling`, temperature, top_p, max_tokens, etc. — **once, for every model in your Open WebUI instance**. Values set there become the default for every existing model that hasn't overridden them *and* every model you add later. Set `Function Calling = Native` there, save, done.
+ Tired of switching every model to Native one at a time? The **global model parameters** panel (the **Settings** button at the top right of **Admin Panel → Settings → Models**) lets you configure any model parameter — `function_calling`, temperature, top_p, max_tokens, etc. — **once, for every model in your Open WebUI instance**. Values set there become the default for every existing model that hasn't overridden them *and* every model you add later. Set `Function Calling = Native` there, save, done.
:::

![Chat Controls](/images/features/plugin/tools/chat-controls.png)
4 changes: 2 additions & 2 deletions docs/getting-started/essentials.mdx
@@ -110,8 +110,8 @@ Open WebUI has two tool-calling modes in the UI: **Native** and **Default**. **D

Every mainstream model supports it — OpenAI, Anthropic, Gemini, Llama 3.1+, Qwen 2.5+, DeepSeek, GLM, and essentially any other current model. Turn it on:

- - **Best — once, for every model:** in **Admin Panel → Settings → Models**, click the ⚙️ gear at the **top right** of the models list. That opens **global model parameters** — set **Function Calling = Native** there, save, and every current *and future* model in your instance inherits it. No per-model click-through required.
- - Per-model override: **Admin Panel → Settings → Models → [your model] → Advanced Params → Function Calling = Native**
+ - **Best — once, for every model:** in **Admin Panel → Settings → Models**, click the **Settings** button at the **top right** of the models list. That opens **global model parameters** — set **Function Calling = Native** there, save, and every current *and future* model in your instance inherits it. No per-model click-through required.
+ - Per-model override: **Admin Panel → Settings → Models → [your model] → Model Parameters → Function Calling = Native**
- Per-chat override: in a chat's **Chat Controls** (right sidebar)

If a tool "isn't being called" on a capable model, 90% of the time Native Mode just needs flipping on. If a specific small model struggles with Native Mode, the fix is to use a stronger model for tool-using conversations — not to fall back to Default Mode.