Skip to content

fix(compiler): strip cache_control for non-Anthropic providers#154

Merged
KylinMountain merged 1 commit into
VectifyAI:mainfrom
Aldominguez12:fix/cache-control-non-anthropic-400
Jun 30, 2026
Merged

fix(compiler): strip cache_control for non-Anthropic providers#154
KylinMountain merged 1 commit into
VectifyAI:mainfrom
Aldominguez12:fix/cache-control-non-anthropic-400

Conversation

@Aldominguez12

Copy link
Copy Markdown
Contributor

Problem

_cached_text tags reusable prompt context (system prompt, document, summary, known-targets) with an Anthropic ephemeral cache_control marker — added in #37 for Anthropic prompt caching. Its docstring assumes non-supporting providers simply ignore the marker:

For providers that ignore cache_control, the list-of-blocks payload remains a valid OpenAI-compatible content shape.

That assumption does not hold for Gemini. LiteLLM translates the cache_control block into a Gemini CachedContent object, which then conflicts with system_instruction/tools, so every request fails with:

400 CachedContent can not be used with system_instruction, tools or tool_config ...

Net effect: every compile with a gemini/* model fails out of the box — anything that routes through the compiler (openkb add, lint, query, skill, deck). Reproduced on gemini/gemini-2.5-pro and gemini/gemini-3-flash-preview with v0.4.2.

Fix

Strip the marker at the single request egress (_llm_call / _llm_call_async) for any provider that won't honour it, keeping it for Anthropic (direct) and Claude served via OpenRouter / Bedrock / Vertex. Producers keep tagging optimistically; only the egress decides whether the marker survives.

  • Anthropic prompt caching is unchanged.
  • Gemini's implicit caching still applies to the remaining plain-text blocks (verified the cached= token count is still reported).
  • supports_prompt_caching() is deliberately not used as the gate: it returns True for Gemini and GPT-4o (they have some caching), which is not the same as accepting Anthropic's cache_control block. Provider identity via get_llm_provider is the correct signal.

Tests

TestCacheControlStripping (5 tests): provider gating (Anthropic / OpenRouter-Anthropic kept; Gemini / GPT stripped), non-mutating marker removal, and the keep/strip paths through _llm_call. The existing TestCacheControl (Anthropic still gets breakpoints) and TestLLMCallExtraHeaders continue to pass.

get_llm_provider is imported locally inside the gate so provider detection stays correct even under tests that patch the module-level litellm.

The compiler tags reusable prompt context with an Anthropic ephemeral
`cache_control` marker (`_cached_text`). The docstring assumed providers
that don't support it would simply ignore it — but LiteLLM translates the
marker into a provider-native cached-content object for Gemini, which then
conflicts with `system_instruction`/`tools` and fails every request with
`400 CachedContent can not be used with ...`. As a result, *all* Gemini
compiles fail out of the box.

Strip the marker at the single request egress (`_llm_call` /
`_llm_call_async`) for any non-Anthropic provider, keeping it for Anthropic
direct and Claude via OpenRouter/Bedrock/Vertex. Anthropic prompt caching is
unchanged; Gemini's implicit caching still applies to the plain text blocks.

Provider detection uses litellm.get_llm_provider, imported locally so it
stays correct even when tests patch the module-level `litellm` reference.

Adds TestCacheControlStripping covering provider gating, marker removal
(non-mutating), and the sync stripping/keeping paths.
@Aldominguez12 Aldominguez12 force-pushed the fix/cache-control-non-anthropic-400 branch from 51b0791 to f454572 Compare June 30, 2026 06:07

@KylinMountain KylinMountain left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the thorough fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants