Split thinking output into reasoning_content by Defilan · Pull Request #2 · defilantech/mlx-server

Defilan · 2026-05-15T18:02:50Z

What

Thinking models (Qwen3.x) emit their chain-of-thought inline. Until now that text landed in the response content, so opencode and other clients showed the reasoning as part of the answer. This separates it into an OpenAI-style reasoning_content field.

ReasoningSplitter — a streaming-safe classifier that splits model output into reasoning vs answer on <think> / </think> markers, holding back partial markers that straddle a chunk boundary.
--reasoning flag — auto (default; splits on a literal <think>), prefilled (output begins mid-thought because the chat template prefilled <think> — for Qwen3.5 / Qwen3.6), or off.
reasoning_content added to the non-streaming message and to streaming deltas.
8 ReasoningSplitter tests: both modes, token-by-token streaming, markers split across chunks, incomplete trailing markers.

Why

content should be the model's actual answer. Mixing reasoning into it is noisy for chat UIs and confuses agentic clients.

Verified

On Qwen3.6-35B-A3B-8bit with --reasoning prefilled: content is the clean answer ("42"), reasoning_content holds the thinking; streaming emits incremental reasoning_content then content deltas; tool calls unaffected. 28 tests pass.

Note

When reasoning precedes the answer, content keeps the leading whitespace that followed </think> (e.g. "\n\n42"). Trimming that is a small follow-up.

Thinking models (Qwen3.x) emit their reasoning inline, which previously landed in the response `content`. Separate it into an OpenAI-style `reasoning_content` field on both the message and streaming deltas. - ReasoningSplitter: streaming-safe <think>/</think> classifier, with partial-marker handling across chunk boundaries - --reasoning mode: auto (split on literal <think>), prefilled (output starts mid-thought, for Qwen3.5/3.6), or off - reasoning_content on the non-streaming message and on stream deltas - 8 splitter tests covering both modes, token-by-token streaming, and split markers Verified on Qwen3.6-35B-A3B with --reasoning prefilled: clean content, separated reasoning, streaming and tool calls intact.

Defilan merged commit beed802 into main May 15, 2026
1 check passed

Defilan deleted the feat/reasoning-content branch May 15, 2026 18:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split thinking output into reasoning_content#2

Split thinking output into reasoning_content#2
Defilan merged 1 commit into
mainfrom
feat/reasoning-content

Defilan commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Defilan commented May 15, 2026

What

Why

Verified

Note

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant