fix(compiler): strip cache_control for non-Anthropic providers#154
Merged
KylinMountain merged 1 commit intoJun 30, 2026
Merged
Conversation
The compiler tags reusable prompt context with an Anthropic ephemeral `cache_control` marker (`_cached_text`). The docstring assumed providers that don't support it would simply ignore it — but LiteLLM translates the marker into a provider-native cached-content object for Gemini, which then conflicts with `system_instruction`/`tools` and fails every request with `400 CachedContent can not be used with ...`. As a result, *all* Gemini compiles fail out of the box. Strip the marker at the single request egress (`_llm_call` / `_llm_call_async`) for any non-Anthropic provider, keeping it for Anthropic direct and Claude via OpenRouter/Bedrock/Vertex. Anthropic prompt caching is unchanged; Gemini's implicit caching still applies to the plain text blocks. Provider detection uses litellm.get_llm_provider, imported locally so it stays correct even when tests patch the module-level `litellm` reference. Adds TestCacheControlStripping covering provider gating, marker removal (non-mutating), and the sync stripping/keeping paths.
51b0791 to
f454572
Compare
KylinMountain
approved these changes
Jun 30, 2026
KylinMountain
left a comment
Collaborator
There was a problem hiding this comment.
LGTM, thanks for the thorough fix.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
_cached_texttags reusable prompt context (system prompt, document, summary, known-targets) with an Anthropic ephemeralcache_controlmarker — added in #37 for Anthropic prompt caching. Its docstring assumes non-supporting providers simply ignore the marker:That assumption does not hold for Gemini. LiteLLM translates the
cache_controlblock into a GeminiCachedContentobject, which then conflicts withsystem_instruction/tools, so every request fails with:Net effect: every compile with a
gemini/*model fails out of the box — anything that routes through the compiler (openkb add,lint,query,skill,deck). Reproduced ongemini/gemini-2.5-proandgemini/gemini-3-flash-previewwith v0.4.2.Fix
Strip the marker at the single request egress (
_llm_call/_llm_call_async) for any provider that won't honour it, keeping it for Anthropic (direct) and Claude served via OpenRouter / Bedrock / Vertex. Producers keep tagging optimistically; only the egress decides whether the marker survives.cached=token count is still reported).supports_prompt_caching()is deliberately not used as the gate: it returnsTruefor Gemini and GPT-4o (they have some caching), which is not the same as accepting Anthropic'scache_controlblock. Provider identity viaget_llm_provideris the correct signal.Tests
TestCacheControlStripping(5 tests): provider gating (Anthropic / OpenRouter-Anthropic kept; Gemini / GPT stripped), non-mutating marker removal, and the keep/strip paths through_llm_call. The existingTestCacheControl(Anthropic still gets breakpoints) andTestLLMCallExtraHeaderscontinue to pass.