
Commit da9acc6

🤖 fix: enable prompt caching for mux-gateway Anthropic models (#816)
## Problem

Prompt caching wasn't working for Anthropic models accessed via mux-gateway. Only the system message (~5.4k tokens) was being cached, but conversation history was not.

## Root Cause

The gateway provider uses a different request format and translation mechanism:

| Provider | Request Format | Cache Control Translation |
|----------|----------------|---------------------------|
| Direct Anthropic | `json.messages` | SDK translates `providerOptions` at all levels |
| Gateway | `json.prompt` | Gateway server only translates `providerOptions` at **message level** |

Our `applyCacheControl()` was setting `providerOptions.anthropic.cacheControl` at the **content part level**, which the gateway server ignores. Only `createCachedSystemMessage()` was setting it at message level (hence system prompt caching worked).

## Fix

Update `wrapFetchWithAnthropicCacheControl` to:

1. Detect gateway format by checking for a `json.prompt` array
2. Add `providerOptions.anthropic.cacheControl` at **message level** for gateway requests
3. Keep `cache_control` injection at content part level for direct Anthropic

## Testing

Verified caching works by checking that cache read tokens increase on subsequent messages.

---

_Generated with `mux`_
1 parent 22a816f commit da9acc6
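
The dual-format injection described in the fix can be condensed into a standalone sketch. The helper name `applyAnthropicCacheControl` below is hypothetical (the diff modifies `wrapFetchWithAnthropicCacheControl` in place), but the branching logic mirrors the commit:

```typescript
// Condensed sketch of the fix; `applyAnthropicCacheControl` is a hypothetical
// helper name, not the actual aiService.ts code.
type Json = Record<string, unknown>;

function applyAnthropicCacheControl(json: Json): Json {
  // Handle both formats: direct Anthropic (json.messages) and gateway (json.prompt).
  const messages = Array.isArray(json.messages)
    ? json.messages
    : Array.isArray(json.prompt)
      ? json.prompt
      : null;
  if (!messages || messages.length < 1) return json;

  const lastMsg = messages[messages.length - 1] as Json;

  if (Array.isArray(json.prompt)) {
    // Gateway: it only translates providerOptions at the message level,
    // so set cacheControl on the message itself.
    const providerOpts = (lastMsg.providerOptions ?? {}) as Json;
    const anthropicOpts = (providerOpts.anthropic ?? {}) as Json;
    anthropicOpts.cacheControl ??= { type: "ephemeral" };
    providerOpts.anthropic = anthropicOpts;
    lastMsg.providerOptions = providerOpts;
  }

  // Direct Anthropic: raw cache_control on the last content part.
  const content = lastMsg.content;
  if (Array.isArray(content) && content.length > 0) {
    const lastPart = content[content.length - 1] as Json;
    lastPart.cache_control ??= { type: "ephemeral" };
  }
  return json;
}

// Gateway-format request: cacheControl lands on the last message itself.
const gatewayReq = applyAnthropicCacheControl({
  prompt: [{ role: "user", content: [{ type: "text", text: "hi" }] }],
});

// Direct-format request: cache_control lands on the last content part.
const directReq = applyAnthropicCacheControl({
  messages: [{ role: "user", content: [{ type: "text", text: "hi" }] }],
});
```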

File tree: 1 file changed (+23, −5 lines)

src/node/services/aiService.ts

Lines changed: 23 additions & 5 deletions
```diff
@@ -125,16 +125,34 @@ function wrapFetchWithAnthropicCacheControl(baseFetch: typeof fetch): typeof fet
 
     // Inject cache_control on last message's last content part
     // This caches the entire conversation
-    if (Array.isArray(json.messages) && json.messages.length >= 1) {
-      const lastMsg = json.messages[json.messages.length - 1] as Record<string, unknown>;
-      const content = lastMsg.content;
+    // Handle both formats:
+    // - Direct Anthropic provider: json.messages (Anthropic API format)
+    // - Gateway provider: json.prompt (AI SDK internal format)
+    const messages = Array.isArray(json.messages)
+      ? json.messages
+      : Array.isArray(json.prompt)
+        ? json.prompt
+        : null;
+
+    if (messages && messages.length >= 1) {
+      const lastMsg = messages[messages.length - 1] as Record<string, unknown>;
+
+      // For gateway: add providerOptions.anthropic.cacheControl at message level
+      // (gateway validates schema strictly, doesn't allow raw cache_control on messages)
+      if (Array.isArray(json.prompt)) {
+        const providerOpts = (lastMsg.providerOptions ?? {}) as Record<string, unknown>;
+        const anthropicOpts = (providerOpts.anthropic ?? {}) as Record<string, unknown>;
+        anthropicOpts.cacheControl ??= { type: "ephemeral" };
+        providerOpts.anthropic = anthropicOpts;
+        lastMsg.providerOptions = providerOpts;
+      }
 
+      // For direct Anthropic: add cache_control to last content part
+      const content = lastMsg.content;
       if (Array.isArray(content) && content.length > 0) {
-        // Array content: add cache_control to last part
         const lastPart = content[content.length - 1] as Record<string, unknown>;
         lastPart.cache_control ??= { type: "ephemeral" };
       }
-      // Note: String content messages are rare after SDK conversion; skip for now
     }
 
     // Update body with modified JSON
```
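
The testing step from the commit message ("cache read tokens increase on subsequent messages") can be made concrete against the token-usage fields the Anthropic Messages API reports per response (`cache_creation_input_tokens`, `cache_read_input_tokens`). The checker below is a hypothetical illustration, not code from this commit:

```typescript
// Hypothetical verification helper (`Usage`/`confirmCacheHit` are illustrative
// names, not from the commit): on a working cache, the second turn should
// report cache *reads* covering at least what the first turn *wrote*.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

function confirmCacheHit(first: Usage, second: Usage): boolean {
  const written = first.cache_creation_input_tokens ?? 0;
  const read = second.cache_read_input_tokens ?? 0;
  return written > 0 && read >= written;
}

// Illustrative figures only (~5.4k matches the system prompt size noted above).
const turn1: Usage = { input_tokens: 12, cache_creation_input_tokens: 5400, cache_read_input_tokens: 0 };
const turn2: Usage = { input_tokens: 15, cache_creation_input_tokens: 200, cache_read_input_tokens: 5400 };
console.log(confirmCacheHit(turn1, turn2)); // true
```

Before this fix, a gateway session would show `cache_read_input_tokens` covering only the system message on follow-up turns; after it, the full conversation history counts toward reads.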
