feat: implement prompt caching for Anthropic and OpenAI models #140
Merged
julianbenegas merged 2 commits into main on Feb 7, 2026
Conversation
Add prompt caching to reduce token costs and improve latency for LLM conversations, following patterns from the vercel/ai SDK cookbook and anomalyco/opencode.

## Anthropic (explicit cache control)

- Adds cacheControl breakpoints on system messages and the last 2 conversation messages (max 4 breakpoints per Anthropic's limit)
- Cache reads are billed at 10% of the normal input-token price (cache writes carry a 25% premium)
- Applied via providerOptions on messages, automatically translated by the AI SDK to block-level cache_control

## OpenAI (routing optimization)

- Sets promptCacheKey per post to improve cache-hit routing
- OpenAI automatically caches prompts >= 1024 tokens; the key helps route requests to the same server for better cache hits
- No cost premium for cache writes

## Implementation

- New utility: agent/prompt-cache.ts with provider detection and caching strategies
- Updated response-agent.ts streamTextStep to apply caching
- Comprehensive test suite (18 tests) covering all edge cases
- Non-Anthropic/OpenAI providers pass through unchanged (safe for provider-agnostic code)
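For orientation, here is a minimal sketch of what the two strategies look like at a `streamText` call site, assuming AI SDK v5 message-level `providerOptions`. The model IDs and the `postId` value are illustrative; the actual logic lives in `agent/prompt-cache.ts` and `agent/response-agent.ts`.

```ts
import { streamText, type ModelMessage } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';

const postId = '123'; // illustrative

// Anthropic: mark a message as a cache breakpoint; the AI SDK translates the
// message-level providerOptions into block-level cache_control on the API call.
const history: ModelMessage[] = [
  {
    role: 'user',
    content: 'Long conversation history...',
    providerOptions: {
      anthropic: { cacheControl: { type: 'ephemeral' } },
    },
  },
];

const anthropicStream = streamText({
  model: anthropic('claude-sonnet-4-5'), // any claude-* model
  messages: history,
});

// OpenAI: no explicit breakpoints; prompts >= 1024 tokens are cached
// automatically. A stable promptCacheKey helps route repeat requests for the
// same post to the same cache.
const openaiStream = streamText({
  model: openai('gpt-4o'),
  messages: [{ role: 'user', content: 'Long conversation history...' }],
  providerOptions: {
    openai: { promptCacheKey: `forums-${postId}` },
  },
});
```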
…cache purge

Wrap the system prompt as a SystemModelMessage with cacheControl providerOptions for Anthropic models. Without this, the system prompt (passed as a plain string to streamText) has no cache_control marker, meaning Anthropic won't establish the cache prefix and the cache gets purged between requests.

- New wrapSystemPrompt() function: returns SystemModelMessage with cacheControl for Anthropic, plain string for other providers
- Updated response-agent.ts to use wrapSystemPrompt()
- Added 4 new tests for wrapSystemPrompt (22 total, all passing)
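A sketch of the shape this commit describes; `isAnthropicModel` is an assumed stand-in for the provider detection in `agent/prompt-cache.ts`, and the real function may differ in details:

```ts
import type { SystemModelMessage } from 'ai';

// Assumed helper: the PR describes detecting Anthropic models by claude-* IDs.
function isAnthropicModel(modelId: string): boolean {
  return modelId.startsWith('claude-');
}

// Returns a SystemModelMessage carrying a cacheControl marker for Anthropic,
// or the plain system prompt string for every other provider.
export function wrapSystemPrompt(
  systemPrompt: string,
  modelId: string,
): SystemModelMessage | string {
  if (!isAnthropicModel(modelId)) {
    return systemPrompt;
  }
  return {
    role: 'system',
    content: systemPrompt,
    providerOptions: {
      anthropic: { cacheControl: { type: 'ephemeral' } },
    },
  };
}
```

When the wrapped form is returned, the caller would pass it as the first entry of `messages` rather than through the `system` string parameter, so the cache prefix starts at the system prompt.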
## Summary

Implements prompt caching to reduce token costs and improve latency for LLM conversations. This follows patterns established by the vercel/ai SDK (dynamic prompt caching cookbook) and anomalyco/opencode (`ProviderTransform.applyCaching`).

## What's Changed

New: `agent/prompt-cache.ts`

A provider-aware utility that applies the optimal caching strategy per model:

- Anthropic models (`claude-*`): `cacheControl: { type: 'ephemeral' }` breakpoints on system messages and the last 2 conversation messages (sketched below); the AI SDK translates the message-level `providerOptions` to block-level `cache_control` automatically
- OpenAI models (`gpt-*`, `o1-*`, `o3-*`, `o4-*`): `promptCacheKey: 'forums-{postId}'` via `providerOptions`
- Other providers: messages pass through unchanged (safe for provider-agnostic code)
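A rough sketch of the Anthropic breakpoint placement from the first bullet, with provider detection omitted for brevity; the real utility may place breakpoints differently:

```ts
import type { ModelMessage } from 'ai';

const ANTHROPIC_CACHE_CONTROL = {
  anthropic: { cacheControl: { type: 'ephemeral' } },
};

// Mark the last two conversation messages as cache breakpoints. Combined with
// the cached system prompt, this stays within Anthropic's 4-breakpoint limit.
export function addCacheControlToMessages(
  messages: ModelMessage[],
): ModelMessage[] {
  return messages.map((message, index) =>
    index >= messages.length - 2
      ? { ...message, providerOptions: ANTHROPIC_CACHE_CONTROL }
      : message,
  );
}
```

Marking the last two messages (rather than only the last) follows the opencode pattern noted under Research: the older breakpoint still matches the prefix written on the previous turn, so the cache keeps being reused as the conversation grows.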
Modified: `agent/response-agent.ts`

- Applies `addCacheControlToMessages()` to model messages before streaming
- Uses `getCacheProviderOptions()` for OpenAI routing hints

New: `agent/__tests__/prompt-cache.test.ts`

Comprehensive test suite (18 tests, all passing), covering `providerOptions` handling and `promptCacheKey` generation.
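One illustrative test in the style of that suite, assuming vitest and the simplified `addCacheControlToMessages` sketched above; the real suite covers many more cases:

```ts
import { describe, expect, it } from 'vitest';
import type { ModelMessage } from 'ai';
import { addCacheControlToMessages } from '../prompt-cache'; // path per the layout described in this PR

describe('addCacheControlToMessages', () => {
  it('adds ephemeral cacheControl to the last two messages only', () => {
    const messages: ModelMessage[] = [
      { role: 'user', content: 'first question' },
      { role: 'assistant', content: 'first answer' },
      { role: 'user', content: 'follow-up question' },
    ];

    const result = addCacheControlToMessages(messages);

    expect(result[0].providerOptions).toBeUndefined();
    expect(result[2].providerOptions).toEqual({
      anthropic: { cacheControl: { type: 'ephemeral' } },
    });
  });
});
```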
## Research

- vercel/ai SDK cookbook: `addCacheControlToMessages` utility, message-level `providerOptions`
- `@ai-sdk/anthropic`: `CacheControlValidator` with 4-breakpoint limit
- opencode `ProviderTransform.applyCaching`: cache first 2 system + last 2 non-system messages
- OpenAI: `promptCacheKey` for server routing, automatic prefix caching
- AI SDK: `promptCacheKey` providerOption