Skip to content

Split thinking output into reasoning_content#2

Merged
Defilan merged 1 commit into
mainfrom
feat/reasoning-content
May 15, 2026
Merged

Split thinking output into reasoning_content#2
Defilan merged 1 commit into
mainfrom
feat/reasoning-content

Conversation

@Defilan
Copy link
Copy Markdown
Member

@Defilan Defilan commented May 15, 2026

What

Thinking models (Qwen3.x) emit their chain-of-thought inline. Until now that text landed in the response content, so opencode and other clients showed the reasoning as part of the answer. This separates it into an OpenAI-style reasoning_content field.

  • ReasoningSplitter — a streaming-safe classifier that splits model output into reasoning vs answer on <think> / </think> markers, holding back partial markers that straddle a chunk boundary.
  • --reasoning flagauto (default; splits on a literal <think>), prefilled (output begins mid-thought because the chat template prefilled <think> — for Qwen3.5 / Qwen3.6), or off.
  • reasoning_content added to the non-streaming message and to streaming deltas.
  • 8 ReasoningSplitter tests: both modes, token-by-token streaming, markers split across chunks, incomplete trailing markers.

Why

content should be the model's actual answer. Mixing reasoning into it is noisy for chat UIs and confuses agentic clients.

Verified

On Qwen3.6-35B-A3B-8bit with --reasoning prefilled: content is the clean answer ("42"), reasoning_content holds the thinking; streaming emits incremental reasoning_content then content deltas; tool calls unaffected. 28 tests pass.

Note

When reasoning precedes the answer, content keeps the leading whitespace that followed </think> (e.g. "\n\n42"). Trimming that is a small follow-up.

Thinking models (Qwen3.x) emit their reasoning inline, which previously
landed in the response `content`. Separate it into an OpenAI-style
`reasoning_content` field on both the message and streaming deltas.

- ReasoningSplitter: streaming-safe <think>/</think> classifier, with
  partial-marker handling across chunk boundaries
- --reasoning mode: auto (split on literal <think>), prefilled (output
  starts mid-thought, for Qwen3.5/3.6), or off
- reasoning_content on the non-streaming message and on stream deltas
- 8 splitter tests covering both modes, token-by-token streaming, and
  split markers

Verified on Qwen3.6-35B-A3B with --reasoning prefilled: clean content,
separated reasoning, streaming and tool calls intact.
@Defilan Defilan merged commit beed802 into main May 15, 2026
1 check passed
@Defilan Defilan deleted the feat/reasoning-content branch May 15, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant