From f81575a573ccb76ef12732d7e7ce228fc3194730 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 22 Apr 2026 15:44:55 +0000 Subject: [PATCH 1/2] docs: document streaming format + self-contained-agent tool-call pattern MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A recurring support question from pipeline authors: their agent-backed pipe (LangChain, LlamaIndex, custom planner) duplicates its response ~30 times in the chat because it emits delta.tool_calls while also executing the tool internally. The docs didn't cover any of: - What a Pipe is actually allowed to yield when streaming (plain strings vs OpenAI-format chunks, where finish_reason goes). - The semantics of delta.tool_calls ("please execute this for me, client") and why emitting it from a self-contained agent triggers the CHAT_RESPONSE_MAX_TOOL_CALL_RETRIES retry loop. - The
<details type="tool_calls"> content-embedded block — the exact shape Open WebUI's own server-side tool path emits after internal tool execution, which agent pipes can emit themselves to render a proper "Called <tool>" chip without triggering the execute-and-retry loop. Expanded docs/features/extensibility/pipelines/pipes.md (previously a 22-line stub) with: - "Streaming response format" section covering plain string yields, chunk-dict yields, and the "single terminating finish_reason: stop chunk at the end" rule. - "Self-contained agents and delta.tool_calls" section with the full rule of thumb (model wants OWUI to run the tool vs. agent already ran it), a minimal end-to-end pipe example, and a LangChain-specific loop mapping. - Link to the source issue (#23957) for anyone debugging the exact duplication symptom. Also added a :::caution cross-reference from the Function Pipes page (plugin/functions/pipe.mdx) pointing at the same section, since the pattern applies identically to in-process Pipes that own their tool execution — it's not a Pipelines-only concern. https://claude.ai/code/session_01K54Vwr1NMczgh9vmQkWe17 --- .../features/extensibility/pipelines/pipes.md | 134 ++++++++++++++++++ .../extensibility/plugin/functions/pipe.mdx | 6 + 2 files changed, 140 insertions(+) diff --git a/docs/features/extensibility/pipelines/pipes.md b/docs/features/extensibility/pipelines/pipes.md index 849b297b4c..a02365b672 100644 --- a/docs/features/extensibility/pipelines/pipes.md +++ b/docs/features/extensibility/pipelines/pipes.md @@ -20,3 +20,137 @@ Pipes that are defined in your WebUI show up as a new model with an "External" d ![Pipe Models in WebUI](/images/pipelines/pipe-model-example.png) + +## Streaming response format + +Pipes can return either a single `str` or an iterator/generator. When streaming, each yielded item can be: + +- **A plain string** — treated as assistant-visible text content and appended to the message as it arrives.
This is the simplest form and the one most agent pipelines should use for regular output. +- **An OpenAI-compatible SSE chunk dict** — same shape as the `/v1/chat/completions` streaming response, i.e. + + ```python + {"choices": [{"delta": {"content": "..."}, "finish_reason": None}]} + ``` + + Use this when you need to set fields other than `content` (for example `finish_reason` on the final chunk). + +For a self-contained stream, close it with a single terminating chunk: + +```python +yield {"choices": [{"delta": {}, "finish_reason": "stop"}]} +``` + +`finish_reason` should appear **exactly once**, at the end, and for a pipeline that handles its own tool execution it should always be `"stop"` — not `"tool_calls"` (see the next section). + +## Self-contained agents and `delta.tool_calls` + +This is the single biggest gotcha when building an agent pipeline (LangChain, LlamaIndex, a custom planner, anything that executes its own tools and streams the result back). + +`delta.tool_calls` in a chunk means **"please execute this tool call for me, client"**. When Open WebUI's middleware sees it, the tool executor picks up the call, runs it, appends a `role: "tool"` message, and fires a continuation request back at the same pipeline. It does this in a loop capped by `CHAT_RESPONSE_MAX_TOOL_CALL_RETRIES` (≈30). + +If your pipeline already executed the tool internally, emitting `delta.tool_calls` makes Open WebUI try to execute it *again* — and since the pipeline keeps emitting the same call on every retry, you get 30 copies of the response stacked on top of each other before the retry cap trips. Same thing happens if you set `finish_reason: "tool_calls"` mid-stream. + +**Rule of thumb:** + +- The model is calling a tool Open WebUI should run → emit `delta.tool_calls`, terminate with `finish_reason: "tool_calls"`, let the middleware call the tool and re-enter your pipeline. 
+- The pipeline is running an agent that owns its own tools → **do not** emit `delta.tool_calls` at all. Render the tool execution as content using the `
<details type="tool_calls">` block described below. + +### Rendering tool execution as content + +Open WebUI's own server-side tool path renders finished tool executions as `
<details type="tool_calls">` blocks in the message content. You can emit the same block from an agent pipeline to get the identical "Called <tool>" chip with an expandable arguments + result view: + +```python +import html +import json + +call_id = "call_123" +name = "get_weather_test" +arguments = {"location": "SF"} +result = {"temp_c": 22} + +details_block = ( + f'
<details type="tool_calls" done="true" id="{call_id}" name="{name}" arguments="{html.escape(json.dumps(arguments, ensure_ascii=False))}">\n' + f'<summary>Tool Executed</summary>\n' + f'{html.escape(json.dumps(result, ensure_ascii=False))}\n' + f'
</details>\n' +) +``` + +Yield `details_block` as content — either directly as a string (simplest on a Pipelines server) or inside a `delta.content` chunk: + +```python +# Simplest — works on Pipelines servers: +yield details_block + +# Or as an explicit OpenAI chunk: +yield {"choices": [{"delta": {"content": details_block}, "finish_reason": None}]} +``` + +The final stream for a self-contained agent that ran one tool looks like this end-to-end: + +```python +import html +import json + +def pipe(self, user_message, model_id, messages, body): + # 1. Pre-tool narrative + yield {"choices": [{"delta": {"role": "assistant", "content": "Looking up the weather… "}, "finish_reason": None}]} + + # 2. Agent runs the tool internally (not shown) + call_id = "call_123" + name = "get_weather_test" + arguments = {"location": "SF"} + result = {"temp_c": 22} + + # 3. Render the execution as a
<details type="tool_calls"> block — NOT delta.tool_calls + details_block = ( + f'
<details type="tool_calls" done="true" id="{call_id}" name="{name}" arguments="{html.escape(json.dumps(arguments, ensure_ascii=False))}">\n' + f'<summary>Tool Executed</summary>\n' + f'{html.escape(json.dumps(result, ensure_ascii=False))}\n' + f'
</details>\n' + ) + yield details_block + + # 4. Post-tool narrative + yield "The weather is 22°C. Done." + + # 5. Single terminating chunk + yield {"choices": [{"delta": {}, "finish_reason": "stop"}]} +``` + +### LangChain agent example + +Wiring a LangChain agent into this pattern — drop `tool_calls` on `AIMessageChunk`, render `ToolMessage` as a `
<details type="tool_calls">` block: + +```python +import html +import json + +from langchain_core.messages import AIMessageChunk, ToolMessage + +for chunk in agent.stream({"messages": messages}, stream_mode=["updates", "messages"]): + if chunk["type"] != "messages": + continue + message = chunk["data"][0] + + if isinstance(message, AIMessageChunk): + # Stream content only — drop message.tool_calls entirely. + if message.content: + yield message.content + + elif isinstance(message, ToolMessage): + args = getattr(message, "args", {}) or {} + details = ( + f'
<details type="tool_calls" done="true" id="{message.tool_call_id}" name="{message.name}" arguments="{html.escape(json.dumps(args, ensure_ascii=False))}">\n' + f'<summary>Tool Executed</summary>\n' + f'{html.escape(json.dumps(message.content, ensure_ascii=False, default=str))}\n' + f'
</details>\n' + ) + yield details + +# Single terminating chunk +yield {"choices": [{"delta": {}, "finish_reason": "stop"}]} +``` + +Reference discussion: [open-webui #23957](https://github.com/open-webui/open-webui/issues/23957) walks through the duplication symptom and the fix in detail. diff --git a/docs/features/extensibility/plugin/functions/pipe.mdx b/docs/features/extensibility/plugin/functions/pipe.mdx index 7c7b5beee9..8eb46f9235 100644 --- a/docs/features/extensibility/plugin/functions/pipe.mdx +++ b/docs/features/extensibility/plugin/functions/pipe.mdx @@ -278,6 +278,12 @@ If you must use a synchronous third-party library in an async handler, wrap the You can modify this proxy Pipe to support additional service providers like Anthropic, Perplexity, and more by adjusting the API endpoints, headers, and logic within the `pipes` and `pipe` functions. +:::caution Building a self-contained agent? Don't emit `delta.tool_calls`. +If your Pipe wraps an agent (LangChain, LlamaIndex, a custom planner, …) that executes tools **internally** and then streams the final answer back to the chat, emitting `delta.tool_calls` in the stream will trigger Open WebUI's tool-execution retry loop — the middleware treats `delta.tool_calls` as "please execute this for me, client" and loops back through your pipe, duplicating the response up to `CHAT_RESPONSE_MAX_TOOL_CALL_RETRIES` (~30) times. + +For self-contained agents, render tool executions as `
<details type="tool_calls">` content blocks instead — the same shape Open WebUI itself emits after internal tool execution. See the [Pipes → Self-contained agents and `delta.tool_calls`](/features/extensibility/pipelines/pipes#self-contained-agents-and-deltatool_calls) section for the full pattern, a LangChain example, and the rule of thumb for which path to take. +::: + --- ## Using Internal Open WebUI Functions From 87c7ec965e76b895dda72a7ede50edb4cee39924 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 22 Apr 2026 16:11:55 +0000 Subject: [PATCH 2/2] =?UTF-8?q?docs:=20fix=20Native=20Mode=20UI=20names=20?= =?UTF-8?q?=E2=80=94=20Settings=20button=20+=20Model=20Parameters?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A user on r/OpenWebUI flagged that the Native Mode enable steps don't match the current UI. Two mismatches in the sections I added: - The top-right control on Admin Panel → Settings → Models is labeled "Settings", not "gear icon (⚙️)". Renamed in both tools/index.mdx (Enable Native Mode list + the tip callout) and essentials.mdx. - The section inside the global-settings panel is called "Model Parameters", not "Advanced Parameters". Fixed in the same three places plus the per-model override path (also "Model Parameters" there). Per-chat Chat Controls still say "Advanced Params" in the UI so that path is unchanged. Tightened the phrasing there slightly to say "open Chat Controls (right sidebar)" rather than "click the ⚙️ Chat Controls icon" since the exact per-chat control shape can also shift.
https://claude.ai/code/session_01K54Vwr1NMczgh9vmQkWe17 --- docs/features/extensibility/plugin/tools/index.mdx | 10 +++++----- docs/getting-started/essentials.mdx | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/features/extensibility/plugin/tools/index.mdx b/docs/features/extensibility/plugin/tools/index.mdx index 44bbbdaa8d..1b4b354cf6 100644 --- a/docs/features/extensibility/plugin/tools/index.mdx +++ b/docs/features/extensibility/plugin/tools/index.mdx @@ -142,19 +142,19 @@ Native Mode can be enabled at two levels: 1. **Universal Default for Every Model (Fastest — Recommended)**: * Navigate to **Admin Panel → Settings → Models**. - * Click the **gear icon** (⚙️) at the **top right** of the models list — this opens **global model parameters**, which apply to *every* model in your instance (current and future) unless a specific model overrides them. - * Under **Advanced Parameters**, set **Function Calling** to `Native`. + * Click the **Settings** button at the **top right** of the models list — this opens **global model parameters**, which apply to *every* model in your instance (current and future) unless a specific model overrides them. + * Under **Model Parameters**, set **Function Calling** to `Native`. * Save. All existing models that haven't explicitly set their own value, and all models you add later, inherit `Native`. You do **not** need to edit them one by one. 2. **Per-Model Override**: * In **Admin Panel → Settings → Models**, pick a specific model and click its edit button. - * Under **Advanced Parameters**, set **Function Calling** to `Native`. This value overrides the global default for that model only. + * Under **Model Parameters**, set **Function Calling** to `Native`. This value overrides the global default for that model only. * Use this when a specific model needs different parameters — otherwise prefer the global setting. 3. **Per-Chat Override**: - * Inside a chat, click the ⚙️ **Chat Controls** icon. 
+ * Inside a chat, open **Chat Controls** (right sidebar). * Under **Advanced Params**, set **Function Calling** to `Native`. Applies to that chat only. :::tip Set Function Calling Globally — Once, For All Models -Tired of switching every model to Native one at a time? The **global model parameters** menu (the gear icon at the top right of **Admin Panel → Settings → Models**) lets you configure any advanced parameter — `function_calling`, temperature, top_p, max_tokens, etc. — **once, for every model in your Open WebUI instance**. Values set there become the default for every existing model that hasn't overridden them *and* every model you add later. Set `Function Calling = Native` there, save, done. +Tired of switching every model to Native one at a time? The **global model parameters** panel (the **Settings** button at the top right of **Admin Panel → Settings → Models**) lets you configure any model parameter — `function_calling`, temperature, top_p, max_tokens, etc. — **once, for every model in your Open WebUI instance**. Values set there become the default for every existing model that hasn't overridden them *and* every model you add later. Set `Function Calling = Native` there, save, done. ::: ![Chat Controls](/images/features/plugin/tools/chat-controls.png) diff --git a/docs/getting-started/essentials.mdx b/docs/getting-started/essentials.mdx index 0d6a755199..d48951b3b4 100644 --- a/docs/getting-started/essentials.mdx +++ b/docs/getting-started/essentials.mdx @@ -110,8 +110,8 @@ Open WebUI has two tool-calling modes in the UI: **Native** and **Default**. **D Every mainstream model supports it — OpenAI, Anthropic, Gemini, Llama 3.1+, Qwen 2.5+, DeepSeek, GLM, and essentially any other current model. Turn it on: -- **Best — once, for every model:** in **Admin Panel → Settings → Models**, click the ⚙️ gear at the **top right** of the models list. 
That opens **global model parameters** — set **Function Calling = Native** there, save, and every current *and future* model in your instance inherits it. No per-model click-through required. -- Per-model override: **Admin Panel → Settings → Models → [your model] → Advanced Params → Function Calling = Native** +- **Best — once, for every model:** in **Admin Panel → Settings → Models**, click the **Settings** button at the **top right** of the models list. That opens **global model parameters** — set **Function Calling = Native** there, save, and every current *and future* model in your instance inherits it. No per-model click-through required. +- Per-model override: **Admin Panel → Settings → Models → [your model] → Model Parameters → Function Calling = Native** - Per-chat override: in a chat's **Chat Controls** (right sidebar) If a tool "isn't being called" on a capable model, 90% of the time Native Mode just needs flipping on. If a specific small model struggles with Native Mode, the fix is to use a stronger model for tool-using conversations — not to fall back to Default Mode.
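Postscript to patch 1's rule of thumb: the pipes.md additions describe the opposite branch — "the model is calling a tool Open WebUI should run → emit `delta.tool_calls`, terminate with `finish_reason: "tool_calls"`" — but never show it as code. A minimal sketch of that chunk sequence, assuming the standard OpenAI streaming tool-call shape; `make_tool_call_chunks` is a hypothetical helper for illustration, not part of either patch:

```python
import json

def make_tool_call_chunks(call_id, name, arguments):
    # Sketch: the stream a pipe emits when it wants Open WebUI (the client)
    # to execute the tool and re-enter the pipe with a role:"tool" message.
    yield {
        "choices": [{
            "delta": {
                "role": "assistant",
                "tool_calls": [{
                    "index": 0,
                    "id": call_id,
                    "type": "function",
                    # OpenAI chunks carry arguments as a JSON-encoded string.
                    "function": {"name": name, "arguments": json.dumps(arguments)},
                }],
            },
            "finish_reason": None,
        }]
    }
    # Terminate with "tool_calls" — not "stop" — so the middleware runs the
    # tool; this is exactly the signal a self-contained agent must NOT send.
    yield {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]}

chunks = list(make_tool_call_chunks("call_123", "get_weather_test", {"location": "SF"}))
```

This is the path the `CHAT_RESPONSE_MAX_TOOL_CALL_RETRIES` loop exists to serve; the pipes.md section's point is that only pipes which do *not* run their own tools should emit it.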