fix(anthropic): allow cache_control on tool_result blocks #4121
Conversation
…turn breakpoint advances

The Anthropic adapter places three `cache_control` breakpoints (system, last tool, last user message), but `addCacheControlToMessages` only attached the third when the last block of the last user message was a non-empty `text` block. After turn 1 of any agentic conversation, the last user message is a `tool_result`, so the breakpoint was silently dropped and the cacheable region collapsed back to system+tools; per-turn history was never cached. Anthropic's docs explicitly list `tool_result` as a cacheable block type (https://docs.claude.com/en/docs/build-with-claude/prompt-caching). Accepting both `text` and `tool_result` keeps the breakpoint moving forward as the conversation grows.
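A minimal sketch of the corrected guard (types and structure assumed from the description above; the real helper lives in the project's Anthropic adapter):

```typescript
// Hypothetical shapes mirroring Anthropic content blocks; the real adapter
// types live in the qwen-code Anthropic converter.
type CacheControl = { type: 'ephemeral' };
type ContentBlock =
  | { type: 'text'; text: string; cache_control?: CacheControl }
  | { type: 'tool_result'; tool_use_id: string; content: unknown; cache_control?: CacheControl };

interface Message {
  role: 'user' | 'assistant';
  content: ContentBlock[];
}

// Attach the per-turn breakpoint to the last block of the last user message.
// Before the fix only 'text' qualified; after turn 1 the last user message
// ends in a 'tool_result', so the breakpoint was silently dropped.
function addCacheControlToMessages(messages: Message[]): void {
  const lastUser = [...messages].reverse().find((m) => m.role === 'user');
  const lastBlock = lastUser?.content[lastUser.content.length - 1];
  if (lastBlock && (lastBlock.type === 'text' || lastBlock.type === 'tool_result')) {
    lastBlock.cache_control = { type: 'ephemeral' };
  }
}
```

Keeping the guard restricted to known-cacheable block types (rather than attaching unconditionally) matches Anthropic's documented rule that only certain block types accept `cache_control`.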
### E2E test report — cache amortization across Anthropic-protocol providers

Captured the raw token traces for both runs. Prompt (forces sequential …): …
### Reproduction

```shell
PROMPT="…the prompt above…"
node dist/cli.js "$PROMPT" \
  --approval-mode yolo --auth-type anthropic -m claude-opus-4-7 \
  --max-session-turns 40 --output-format json
# repeat with -m deepseek-v4-pro
```

### Headline numbers
DeepSeek completes the same work using 35% fewer prompt tokens and amortizes 2.6× more of them.

### Why DeepSeek shows ~100% hit rate

DeepSeek's … Per-call growth on DeepSeek (monotonic, never misses): …

### Why IDEALAB shows ~37% hit rate

IDEALAB's … Per-call routing and cache hit pattern: …

Pattern in plain language:
So the low hit rate on IDEALAB is operational, not protocol-level: every time the proxy chooses a different downstream than the previous request used, the cache lookup fails by design. The client has no control over routing decisions, and there's nothing this PR (or any client-side change) can do to fix it. Realistic mitigation paths are all infrastructure-side: sticky routing in the IDEALAB proxy, single-backend deployment, or direct …

### What this PR does for both providers

Before this fix, qwen-code's per-turn cache breakpoint was silently dropped after turn 1 (the last user message was a `tool_result`, which the old guard rejected).
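For reference, a sketch of what the request body looks like after turn 1 with all three breakpoints attached (shape per Anthropic's prompt-caching docs; model name from the runs above, IDs and contents abbreviated):

```typescript
// Sketch of an Anthropic Messages request after turn 1. The three
// cache_control markers are the breakpoints described above; before this
// fix the third one was dropped because the block is a tool_result.
const request = {
  model: 'claude-opus-4-7',
  system: [
    // breakpoint 1: system prompt
    { type: 'text', text: '…system prompt…', cache_control: { type: 'ephemeral' } },
  ],
  tools: [
    // breakpoint 2: last tool definition
    { name: 'read_file', description: '…', input_schema: { type: 'object' }, cache_control: { type: 'ephemeral' } },
  ],
  messages: [
    { role: 'user', content: [{ type: 'text', text: '…task…' }] },
    { role: 'assistant', content: [{ type: 'tool_use', id: 'toolu_01', name: 'read_file', input: {} }] },
    {
      role: 'user',
      content: [
        {
          type: 'tool_result',
          tool_use_id: 'toolu_01',
          content: '…file contents…',
          cache_control: { type: 'ephemeral' }, // breakpoint 3 (restored by this PR)
        },
      ],
    },
  ],
};
```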
Files referenced
Code Coverage Summary

CLI Package - Full Text Report · Core Package - Full Text Report

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
**wenshao** left a comment:

No issues found. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
… breakpoint advances (QwenLM#4121)

Cherry-picked from upstream QwenLM/qwen-code commit f72a156.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### Summary

- `addCacheControlToMessages` now attaches the per-turn cache breakpoint to the last user message's last block whether that block is `text` or `tool_result`. Previously only `text` blocks qualified.
- After turn 1 of an agentic conversation the last user message ends in a `tool_result`, so the old guard silently dropped the per-turn breakpoint, collapsing the cacheable region back to system+tools (~32k) and never caching conversation history.
- The Anthropic docs (prompt caching) explicitly list `tool_result` as a cacheable block type in `messages.content`.

### Validation
- gh-driven prompt run against `claude-opus-4-7` (via IDEALAB → Bedrock/Vertex) and `deepseek-v4-pro` (api.deepseek.com/anthropic), both under `--auth-type anthropic`. Raw token trace captured pre- and post-fix.
- `cache_creation_input_tokens` grows turn-over-turn once the conversation history starts flowing into the cacheable region.
- `cache_read` grew from ~32k (system+tools only, pre-fix) to 43k → 47k → 51k as conversation history accumulated. Detailed E2E numbers in the comment below.
- A `tool_result`-last user message gets `cache_control` attached.

### Scope / Risk
- Adds a `cache_control: { type: 'ephemeral' }` attribute to `tool_result` content blocks emitted by the converter. This is explicitly documented by Anthropic as supported.
- Providers that ignore `cache_control` (DeepSeek's auto-caching) are unaffected; providers that honor it (real Anthropic / Bedrock / Vertex) now amortize history.
- `api.anthropic.com` was not exercised in E2E (no key in this environment). The behavior is symmetric to the existing system/tool markers, both of which already ship to api.anthropic.com today.
- Cache writes on `tool_result` blocks are billable at 1.25× per Anthropic's pricing, but the same blocks were already in the request body as fresh input tokens, so net cost on the first turn is roughly even; reads on subsequent turns are 0.1× (≈10× cheaper), so any multi-turn workload comes out ahead.

### Testing Matrix
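The pricing claim in the last risk bullet can be sanity-checked with a toy model (the 1.25×/0.1× multipliers are Anthropic's published caching rates; the break-even functions are a sketch, not adapter code):

```typescript
// Relative cost of processing `tokens` of stable history over `turns` turns,
// in units of the base input-token price.
const WRITE_MULT = 1.25; // cache write premium
const READ_MULT = 0.1;   // cache read discount

const uncached = (tokens: number, turns: number) => tokens * turns;
const cached = (tokens: number, turns: number) =>
  tokens * (WRITE_MULT + READ_MULT * (turns - 1));

// Turn 1: 1.25x vs 1.0x (the write premium).
// By turn 2: 1.35x cumulative vs 2.0x, so caching is already ahead.
console.log(cached(1_000, 2) < uncached(1_000, 2)); // true
```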
Testing matrix notes:
Linked Issues / Bugs
None.