Conversation
|
@skrimix thank you. Renaming the "thinking" block to "think" is fine. However, we should keep the signature attribute or pass it back another way. Does removing it cause issues with tools that handle think blocks? From what I remember, I included the signature so we can reconstruct the thinking block when needed for tool use. I don't know how other tools handle this, but Anthropic requires it. https://platform.claude.com/docs/en/api/messages#thinking_block https://platform.claude.com/docs/en/build-with-claude/extended-thinking "Preserving thinking blocks: During tool use, you must pass https://platform.claude.com/docs/en/build-with-claude/extended-thinking#preserving-thinking-blocks "During tool use, you must pass |
|
I'll be honest, I didn't look too much into the tool calling side of things, since that isn't in my use case and so I missed that and likely broke whatever handling is there. I apologize. Responses API -> Messages APIIn this case I guess we could try to glue it together by treating "thinking" field as "summary", and "signature" as "encrypted_content". Not sure. Chat Completions API -> Messages APIThis is even messier. Many providers don't support passing thinking in requests, so I'm guessing clients too. Those who do support that (e.g. Z.AI, Moonshot) use a simple "reasoning_context" text field, which isn't enough for handling Anthropic.
LiteLLM's exampleLiteLLM seems to be using both "reasoning_content" and a different "thinking_blocks" field specific to Anthropic, on their Chat Completions endpoint. When using Anthropic models with thinking enabled and tool calling, you must include |
There was a problem hiding this comment.
Pull request overview
This PR fixes handling of “thinking”/reasoning blocks across Anthropic and OpenAI compatibility layers, ensuring request validation accepts thinking blocks and streaming output wraps reasoning in a single tag pair.
Changes:
- Accept
thinkingandredacted_thinkingblocks in Anthropic request content validation. - Switch OpenAI-facing reasoning markup from
<thinking ...>to<think>. - Fix streaming so
<think>/</think>wrap the entire thinking block rather than each delta chunk.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ccproxy/llms/models/anthropic.py | Expands request content block union to accept thinking-related blocks. |
| ccproxy/llms/formatters/anthropic_to_openai/streams.py | Moves <think> wrapping to block start/stop events for correct streaming output. |
| ccproxy/llms/formatters/anthropic_to_openai/responses.py | Normalizes non-streaming conversions to use <think> and skips redacted thinking. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| openai_models.StreamingChoice( | ||
| index=0, | ||
| delta=openai_models.DeltaMessage( | ||
| role="assistant", content="<think>" | ||
| ), | ||
| finish_reason=None, | ||
| ) |
There was a problem hiding this comment.
For OpenAI-compatible streaming, delta.role is typically only sent once at the start of the assistant message (many clients assume subsequent chunks omit it). For these synthetic <think> wrapper chunks, consider omitting role (i.e., only set content) or only including role if it hasn’t been emitted yet for the message.
| openai_models.StreamingChoice( | ||
| index=0, | ||
| delta=openai_models.DeltaMessage( | ||
| role="assistant", content="</think>" |
There was a problem hiding this comment.
Same as the opening-tag emission: emitting delta.role=\"assistant\" on this closing-tag chunk may be inconsistent with common streaming expectations. Prefer omitting role (or gate it behind a ‘role already emitted’ flag) for these wrapper-only chunks.
| role="assistant", content="</think>" | |
| content="</think>" |
| yield openai_models.ChatCompletionChunk( | ||
| id="chatcmpl-stream", | ||
| object="chat.completion.chunk", | ||
| created=0, | ||
| model=model_id, | ||
| choices=[ | ||
| openai_models.StreamingChoice( | ||
| index=0, | ||
| delta=openai_models.DeltaMessage( | ||
| role="assistant", content="<think>" | ||
| ), | ||
| finish_reason=None, | ||
| ) | ||
| ], | ||
| ) |
There was a problem hiding this comment.
The ChatCompletionChunk construction for emitting wrapper tags is duplicated (opening and closing) with many identical fields. Consider extracting a small helper/factory (e.g., emit_text_chunk(content: str, *, role: str | None = None)) to reduce repetition and the risk of future inconsistencies across these synthetic chunks.
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@CaddyGlow |
This PR fixes several issues around handling thinking blocks:
Anthropic API: Accept thinking blocks in requests
Clients that try to pass thinking blocks back in consecutive requests were getting an error:
Fix: Added
ThinkingBlockandRedactedThinkingBlockto the list of accepted request content blocks.OpenAI API: Use more common thinking tag format
Thinking blocks were formatted as
<thinking signature="Eok...">, which I'm not sure there's any client that can handle.Fix: Changed to the common
<think>tag format.OpenAI API: Fix streaming reasoning chunks
When streaming reasoning content, each chunk was being enclosed in its own thinking tag, resulting in broken output:
<thinking>User</thinking><thinking> wants to simpl</thinking><thinking>ify -</thinking>...Fix: Moved the enclosing tag logic into the start and stop event handlers so the tags wrap the entire thinking content rather than each chunk.