
fix: inject cache_control on content blocks for openai-compatible proxies to Anthropic backends (Bifrost, LiteLLM, Databricks) #25985

Open
KTS-o7 wants to merge 9 commits into anomalyco:dev from KTS-o7:feat/openai-compatible-bedrock-cache-control

Conversation


KTS-o7 commented May 6, 2026

Issue for this PR

Closes #25984

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

setCacheKey: true on @ai-sdk/openai-compatible providers was causing promptCacheKey to be sent as a top-level request option. Bifrost and LiteLLM (which proxy to Bedrock/Anthropic) don't use this field — they require cache_control: { type: "ephemeral" } on individual message content blocks, which they then translate to the backend's native caching format.
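To make the mismatch concrete, here is a rough sketch of the two request shapes. The type names, the `session-123` value, and the exact top-level field name on the wire are assumptions for illustration; only `cache_control: { type: "ephemeral" }` on content blocks comes from the description above.

```typescript
// Hypothetical request shapes, for illustration only.
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string | TextBlock[];
};
type ChatRequest = {
  model: string;
  messages: ChatMessage[];
  [extra: string]: unknown;
};

// Before: the cache hint travels as a top-level option, which the proxy ignores.
// (Assumed wire name; the PR only says promptCacheKey is sent at the top level.)
const topLevelKeyRequest: ChatRequest = {
  model: "bedrock/global.anthropic.claude-sonnet-4-6",
  promptCacheKey: "session-123",
  messages: [{ role: "system", content: "You are a helpful assistant." }],
};

// After: the cache hint sits on the content block itself (Anthropic-style).
const blockLevelRequest: ChatRequest = {
  model: "bedrock/global.anthropic.claude-sonnet-4-6",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are a helpful assistant.",
          cache_control: { type: "ephemeral" },
        },
      ],
    },
  ],
};
```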

The fix adds a new applyCompatCaching() function in transform.ts that:

  • Converts string system messages into content block arrays with cache_control on each block (message-level injection doesn't work because the SDK spreads it as a top-level field, not a block property)
  • Annotates the last content block of targeted user messages with cache_control
  • Gets called from message() when the provider is @ai-sdk/openai-compatible and either cacheStrategy: "bedrock" is set explicitly, or setCacheKey: true with a model ID containing bedrock/
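The first two bullets can be sketched roughly as follows. This is not the code in transform.ts: the types, the `applyCompatCachingSketch` name, and the choice to target only the last user message are assumptions made to keep the example small.

```typescript
// Minimal sketch with simplified, hypothetical message types.
type CacheControl = { type: "ephemeral" };
type PartOptions = { openaiCompatible?: { cache_control?: CacheControl } };
type TextPart = { type: "text"; text: string; providerOptions?: PartOptions };
type Message =
  | { role: "system"; content: string | TextPart[] }
  | { role: "user"; content: TextPart[] }
  | { role: "assistant"; content: TextPart[] };

// Attach cache_control under providerOptions.openaiCompatible on one block,
// preserving any options already present on it.
function markEphemeral(part: TextPart): TextPart {
  return {
    ...part,
    providerOptions: {
      ...part.providerOptions,
      openaiCompatible: {
        ...part.providerOptions?.openaiCompatible,
        cache_control: { type: "ephemeral" },
      },
    },
  };
}

function applyCompatCachingSketch(messages: Message[]): Message[] {
  // Assumption: "targeted user messages" is reduced to the last user message.
  const lastUser = messages.map((m) => m.role).lastIndexOf("user");
  return messages.map((msg, i): Message => {
    if (msg.role === "system") {
      // String system prompts become a one-element block array first.
      const parts: TextPart[] =
        typeof msg.content === "string"
          ? [{ type: "text", text: msg.content }]
          : msg.content;
      return { role: "system", content: parts.map(markEphemeral) };
    }
    if (msg.role === "user" && i === lastUser) {
      // Only the final block of the targeted user message is annotated.
      const parts = msg.content;
      const last = parts.length - 1;
      return {
        role: "user",
        content: parts.map((p, j) => (j === last ? markEphemeral(p) : p)),
      };
    }
    return msg;
  });
}
```

The input array is not mutated; each annotated message and block is a fresh object, so callers that keep a reference to the original messages see no change.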

I also added a guard to stop applyCaching() from running on @ai-sdk/openai-compatible providers, since the model.id.includes("claude") heuristic there would have triggered the wrong path for Bifrost models.
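The trigger condition and the guard amount to two predicates, sketched below. The `ModelInfo` and `CompatOptions` shapes and both function names are hypothetical, and the native check is reduced to the `claude` heuristic mentioned above; the real applyCaching() condition may include more cases.

```typescript
// Hypothetical shapes; the real provider/model objects carry more fields.
type ModelInfo = { id: string; npm: string };
type CompatOptions = { setCacheKey?: boolean; cacheStrategy?: string };

const OPENAI_COMPATIBLE = "@ai-sdk/openai-compatible";

// Block-level cache_control injection: only for openai-compatible providers,
// either opted in explicitly or inferred from a bedrock/ model ID.
function shouldApplyCompatCaching(model: ModelInfo, opts: CompatOptions = {}): boolean {
  if (model.npm !== OPENAI_COMPATIBLE) return false;
  if (opts.cacheStrategy === "bedrock") return true;
  return opts.setCacheKey === true && model.id.includes("bedrock/");
}

// Native caching path: skipped for openai-compatible providers so that a
// "claude" substring in a proxied model ID no longer selects it.
function shouldApplyNativeCaching(model: ModelInfo): boolean {
  if (model.npm === OPENAI_COMPATIBLE) return false;
  return model.id.includes("claude");
}
```

With these two checks a proxied model such as `bedrock/global.anthropic.claude-sonnet-4-6` can only ever take the compat path, and only when one of the two options is set.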

I understand why this works: getOpenAIMetadata() in the AI SDK reads message.providerOptions?.openaiCompatible and spreads it onto the serialized message/block objects. So putting { cache_control: { type: "ephemeral" } } under providerOptions.openaiCompatible on a content block means it lands on the wire as { type: "text", text: "...", cache_control: { type: "ephemeral" } }, which is exactly what Bifrost/LiteLLM expect.
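As a simplified illustration of that spreading behaviour (this is not the SDK's actual getOpenAIMetadata() code; `serializeTextPartSketch` and the `SdkTextPart` type are hypothetical stand-ins):

```typescript
// Hypothetical, simplified stand-in for the SDK's block serialization.
type SdkTextPart = {
  type: "text";
  text: string;
  providerOptions?: { openaiCompatible?: Record<string, unknown> };
};

function serializeTextPartSketch(part: SdkTextPart): Record<string, unknown> {
  // Whatever sits under providerOptions.openaiCompatible is spread onto the
  // serialized block, next to type and text.
  return {
    type: "text",
    text: part.text,
    ...(part.providerOptions?.openaiCompatible ?? {}),
  };
}

const wireBlock = serializeTextPartSketch({
  type: "text",
  text: "...",
  providerOptions: { openaiCompatible: { cache_control: { type: "ephemeral" } } },
});
console.log(JSON.stringify(wireBlock));
```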

How did you verify your code works?

  • Added 7 tests in packages/opencode/test/provider/transform.test.ts covering: string system → content block conversion, user block annotation, auto-trigger via bedrock/ model ID, negative cases (no opts, non-bedrock model), and multi-part user messages. All 155 tests pass.
  • Ran bun typecheck from packages/opencode — no errors.
  • Tested locally with Bifrost running at localhost:24242 routing to bedrock/global.anthropic.claude-sonnet-4-6. Inspected outgoing requests and confirmed cache_control: { type: "ephemeral" } appears on content blocks.

Screenshots / recordings

No UI changes.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

github-actions bot added the needs:compliance label (this means the issue will auto-close after 2 hours) May 6, 2026
github-actions bot removed the needs:compliance label May 6, 2026
Contributor

github-actions Bot commented May 6, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

…rock proxies

When setCacheKey: true is set on an @ai-sdk/openai-compatible provider and the
model ID contains 'bedrock/', or when cacheStrategy: 'bedrock' is explicitly
set, OpenCode now injects cache_control: {type:'ephemeral'} onto message content
blocks instead of sending a promptCacheKey request option.

promptCacheKey is an OpenAI-native mechanism that Bifrost, LiteLLM, and other
proxies routing to AWS Bedrock/Anthropic ignore entirely. These proxies require
cache_control on individual content blocks (Anthropic-style), which they then
translate to the native backend caching format.

Key changes:
- applyCompatCaching(): new function that converts string system messages to
  content block arrays and annotates the last block of system/user messages with
  cache_control via providerOptions.openaiCompatible — matching what Bifrost and
  LiteLLM expect on the wire
- Guards applyCaching() from running on @ai-sdk/openai-compatible models to
  prevent the 'claude' model-id heuristic from triggering the wrong caching path
- Passes provider options (item.options) into ProviderTransform.message() so
  setCacheKey / cacheStrategy are available at message-transform time
- Adds cacheStrategy: 'bedrock' option to provider config schema
- Docs: new section explaining caching for openai-compatible Bedrock proxies
@KTS-o7 KTS-o7 force-pushed the feat/openai-compatible-bedrock-cache-control branch from ea6982e to cc3b3b9 Compare May 6, 2026 07:49
Author

KTS-o7 commented May 10, 2026

Since opening this PR, the underlying issue has been confirmed by two more users on different providers: Databricks and the Xiaomi Mimo direct API (see #25984).

This makes it clear the issue affects a broad class of OpenAI-compatible proxies that route to Anthropic-capable backends — not just Bifrost/LiteLLM. The cacheStrategy: "bedrock" approach this PR introduces generalises cleanly to all of them.

The fix is minimal and isolated to transform.ts with a guard that keeps the existing applyCaching() path completely unchanged for native providers. Happy to address any review feedback.

KTS-o7 changed the title from "fix: inject cache_control on content blocks for openai-compatible Bedrock proxies (Bifrost, LiteLLM)" to "fix: inject cache_control on content blocks for openai-compatible proxies to Anthropic backends (Bifrost, LiteLLM, Databricks)" May 10, 2026
Author

KTS-o7 commented May 10, 2026

Hey @rekram1-node and @thdxr — would love to get a review on this when you have a moment.

This fixes a caching issue for users routing Claude models through OpenAI-compatible proxies (Bifrost, LiteLLM, Databricks, Xiaomi Mimo) to Bedrock/Anthropic backends. The root cause: setCacheKey: true sends promptCacheKey as a top-level option, which these proxies don't recognise — they require cache_control: { type: "ephemeral" } injected directly onto message content blocks.

@rekram1-node — you just touched this area in #26276, so you likely have the most context right now. The fix lives entirely in transform.ts with a new applyCompatCaching() function, guarded so it only runs for @ai-sdk/openai-compatible providers and never interferes with the existing applyCaching() path.

The issue has been independently confirmed by users on Databricks and Xiaomi Mimo direct API (see #25984) — so this affects a broad class of OpenAI-compatible proxies, not just Bifrost/LiteLLM.


Development

Successfully merging this pull request may close these issues.

setCacheKey sends promptCacheKey (wrong) instead of cache_control on content blocks for openai-compatible Bedrock proxies (Bifrost, LiteLLM)

1 participant