fix: inject cache_control on content blocks for openai-compatible proxies to Anthropic backends (Bifrost, LiteLLM, Databricks) #25985
Thanks for updating your PR! It now meets our contributing guidelines. 👍
…rock proxies
When setCacheKey: true is set on an @ai-sdk/openai-compatible provider and the
model ID contains 'bedrock/', or when cacheStrategy: 'bedrock' is explicitly
set, OpenCode now injects cache_control: {type:'ephemeral'} onto message content
blocks instead of sending a promptCacheKey request option.
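The trigger condition described above can be written as a small predicate. This is a minimal sketch, not the code from the PR: the function name usesBlockCaching and the CompatOptions type are hypothetical, and only the option names setCacheKey and cacheStrategy come from the description.

```typescript
// Hypothetical helper; names and shape are illustrative, not taken from the PR.
type CompatOptions = {
  setCacheKey?: boolean;
  cacheStrategy?: string;
};

// Returns true when cache_control should go on content blocks
// instead of sending a promptCacheKey request option.
function usesBlockCaching(
  npm: string,
  modelID: string,
  opts: CompatOptions = {},
): boolean {
  // Only openai-compatible providers are affected.
  if (npm !== "@ai-sdk/openai-compatible") return false;
  // An explicit strategy always wins.
  if (opts.cacheStrategy === "bedrock") return true;
  // Otherwise fall back to the model-ID heuristic, gated on setCacheKey.
  return opts.setCacheKey === true && modelID.includes("bedrock/");
}
```

The explicit cacheStrategy check comes first so that proxies whose model IDs do not contain bedrock/ can still opt in.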
promptCacheKey is an OpenAI-native mechanism that Bifrost, LiteLLM, and other
proxies routing to AWS Bedrock/Anthropic ignore entirely. These proxies require
cache_control on individual content blocks (Anthropic-style), which they then
translate to the native backend caching format.
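To make the difference concrete, here are two illustrative request bodies. The model ID, the key value, and the message text are made up, and the wire name prompt_cache_key for the top-level option is an assumption about how promptCacheKey is serialized.

```typescript
// Illustrative payloads only; values are placeholders, not captured traffic.

// OpenAI-style: one top-level cache key (wire name assumed), messages untouched.
const withCacheKey = {
  model: "bedrock/example-model",
  prompt_cache_key: "session-123",
  messages: [{ role: "system", content: "You are a helpful assistant." }],
};

// Anthropic-style: cache_control sits on an individual content block.
const withBlockCacheControl = {
  model: "bedrock/example-model",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are a helpful assistant.",
          cache_control: { type: "ephemeral" },
        },
      ],
    },
  ],
};
```

Note that the second form requires the system message content to be an array of blocks, since a plain string has nowhere to carry the annotation.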
Key changes:
- applyCompatCaching(): new function that converts string system messages to
content block arrays and annotates the last block of system/user messages with
cache_control via providerOptions.openaiCompatible — matching what Bifrost and
LiteLLM expect on the wire
- Guards applyCaching() from running on @ai-sdk/openai-compatible models to
prevent the 'claude' model-id heuristic from triggering the wrong caching path
- Passes provider options (item.options) into ProviderTransform.message() so
setCacheKey / cacheStrategy are available at message-transform time
- Adds cacheStrategy: 'bedrock' option to provider config schema
- Docs: new section explaining caching for openai-compatible Bedrock proxies
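The first change in the list can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the Message and TextBlock types are hypothetical stand-ins for the AI SDK types, the function is named applyCompatCachingSketch to mark it as such, and converting string user content (not only system content) to blocks is an assumption.

```typescript
// Simplified, hypothetical message types; the real AI SDK types are richer.
type CacheControl = { type: "ephemeral" };
type TextBlock = {
  type: "text";
  text: string;
  providerOptions?: { openaiCompatible?: { cache_control?: CacheControl } };
};
type Message = {
  role: "system" | "user" | "assistant";
  content: string | TextBlock[];
};

// Returns new messages; the input array and its blocks are not mutated.
function applyCompatCachingSketch(messages: Message[]): Message[] {
  return messages.map((msg) => {
    // Only system and user messages are annotated.
    if (msg.role !== "system" && msg.role !== "user") return msg;
    // String content becomes a single text block so it can carry block metadata.
    const blocks: TextBlock[] =
      typeof msg.content === "string"
        ? [{ type: "text", text: msg.content }]
        : msg.content.map((b) => ({ ...b }));
    if (blocks.length === 0) return msg;
    // Annotate only the last block, preserving any existing provider options.
    const last = blocks[blocks.length - 1];
    blocks[blocks.length - 1] = {
      ...last,
      providerOptions: {
        ...last.providerOptions,
        openaiCompatible: {
          ...last.providerOptions?.openaiCompatible,
          cache_control: { type: "ephemeral" },
        },
      },
    };
    return { ...msg, content: blocks };
  });
}
```

Placing the annotation under providerOptions.openaiCompatible on the block, rather than on the message, is what lets it end up as a block property once the request is serialized.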
ea6982e to cc3b3b9
Since opening this PR, the underlying issue has been confirmed by two more users on different providers:
This makes it clear the issue affects a broad class of OpenAI-compatible proxies that route to Anthropic-capable backends, not just Bifrost/LiteLLM. The fix is minimal and isolated to …
Hey @rekram1-node and @thdxr, would love to get a review on this when you have a moment. This fixes a caching issue for users routing Claude models through OpenAI-compatible proxies (Bifrost, LiteLLM, Databricks, Xiaomi Mimo) to Bedrock/Anthropic backends. The root cause: …
@rekram1-node, you just touched this area in #26276, so you likely have the most context right now. The fix lives entirely in …
The issue has been independently confirmed by users on Databricks and Xiaomi Mimo direct API (see #25984), so this affects a broad class of OpenAI-compatible proxies, not just Bifrost/LiteLLM.
Issue for this PR
Closes #25984
Type of change
What does this PR do?
setCacheKey: true on @ai-sdk/openai-compatible providers was causing promptCacheKey to be sent as a top-level request option. Bifrost and LiteLLM (which proxy to Bedrock/Anthropic) don't use this field. They require cache_control: { type: "ephemeral" } on individual message content blocks, which they then translate to the backend's native caching format.
The fix adds a new applyCompatCaching() function in transform.ts that:
- … cache_control on each block (message-level injection doesn't work because the SDK spreads it as a top-level field, not a block property)
- … cache_control
- … message() when the provider is @ai-sdk/openai-compatible and either cacheStrategy: "bedrock" is set explicitly, or setCacheKey: true with a model ID containing bedrock/
I also added a guard to stop applyCaching() from running on @ai-sdk/openai-compatible providers, since the model.id.includes("claude") heuristic there would have triggered the wrong path for Bifrost models.
I understand why this works:
getOpenAIMetadata() in the AI SDK reads message.providerOptions?.openaiCompatible and spreads it onto the serialized message/block objects. So putting { cache_control: { type: "ephemeral" } } under providerOptions.openaiCompatible on a content block means it lands on the wire as { type: "text", text: "...", cache_control: { type: "ephemeral" } }, which is exactly what Bifrost/LiteLLM expect.
How did you verify your code works?
- … packages/opencode/test/provider/transform.test.ts covering: string system → content block conversion, user block annotation, auto-trigger via bedrock/ model ID, negative cases (no opts, non-bedrock model), and multi-part user messages. All 155 tests pass.
- … bun typecheck from packages/opencode: no errors.
- … localhost:24242 routing to bedrock/global.anthropic.claude-sonnet-4-6. Inspected outgoing requests and confirmed cache_control: { type: "ephemeral" } appears on content blocks.
Screenshots / recordings
No UI changes.
Checklist