feat: implement prompt caching for Anthropic and OpenAI models #140

Merged

julianbenegas merged 2 commits into main from forums/prompt-caching-84e29 on Feb 7, 2026

Conversation

@julianbenegas
Member

Summary

Implements prompt caching to reduce token costs and improve latency for LLM conversations. This follows patterns established by vercel/ai SDK (dynamic prompt caching cookbook) and anomalyco/opencode (ProviderTransform.applyCaching).

What's Changed

New: agent/prompt-cache.ts

A provider-aware utility that applies the optimal caching strategy per model (illustrative sketches of each strategy follow the lists below):

Anthropic models (claude-*):

  • Adds cacheControl: { type: 'ephemeral' } breakpoints on:
    • First 2 system messages (static instructions)
    • Last 2 non-system messages (conversation frontier)
  • Respects Anthropic's max of 4 cache breakpoints per request
  • Cache reads cost 10% of the base input-token price (cache writes carry a 25% premium)
  • The AI SDK translates message-level providerOptions to block-level cache_control automatically
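
A minimal sketch of the Anthropic path, assuming the AI SDK's ModelMessage type. The helper name matches the PR, but the breakpoint-selection and merge details below are illustrative, not the exact implementation in agent/prompt-cache.ts:

```ts
import type { ModelMessage } from "ai"

// Illustrative sketch: put ephemeral cache breakpoints on the first 2 system
// messages and the last 2 non-system messages, which keeps the request within
// Anthropic's limit of 4 breakpoints.
export function addCacheControlToMessages(
  messages: ModelMessage[],
): ModelMessage[] {
  const systemIdx = messages
    .flatMap((m, i) => (m.role === "system" ? [i] : []))
    .slice(0, 2)
  const frontierIdx = messages
    .flatMap((m, i) => (m.role !== "system" ? [i] : []))
    .slice(-2)
  const breakpoints = new Set([...systemIdx, ...frontierIdx]) // at most 4

  return messages.map((message, i) => {
    if (!breakpoints.has(i)) return message
    return {
      ...message,
      providerOptions: {
        // Keep any providerOptions already present on the message.
        ...message.providerOptions,
        anthropic: {
          ...message.providerOptions?.anthropic,
          // The AI SDK turns this message-level option into block-level
          // cache_control on the Anthropic request.
          cacheControl: { type: "ephemeral" },
        },
      },
    }
  })
}
```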

OpenAI models (gpt-*, o1-*, o3-*, o4-*):

  • Sets promptCacheKey: 'forums-{postId}' via providerOptions
  • OpenAI caching is automatic for prompts ≥ 1024 tokens
  • The key improves cache routing (requests with same key+prefix hash go to same server)
  • No cost premium for cache writes

Other providers: Messages pass through unchanged (safe for provider-agnostic code).
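
For OpenAI and the fallback, a hedged sketch (the function name matches the PR; the model-id check and return shape are assumptions):

```ts
// Illustrative sketch: OpenAI gets a promptCacheKey routing hint via
// providerOptions; Anthropic caching is handled on the messages instead,
// and unknown providers get no extra options (pass-through).
export function getCacheProviderOptions(
  modelId: string,
  postId: string,
): { openai: { promptCacheKey: string } } | undefined {
  const isOpenAI = /^(gpt-|o1-|o3-|o4-)/.test(modelId)
  if (!isOpenAI) return undefined

  // OpenAI caches prompts of >= 1024 tokens automatically; the key only
  // helps route requests with the same key + prefix hash to the same server.
  return { openai: { promptCacheKey: `forums-${postId}` } }
}
```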

Modified: agent/response-agent.ts

  • Applies addCacheControlToMessages() to model messages before streaming
  • Applies getCacheProviderOptions() for OpenAI routing hints
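
Roughly how this wires into the streaming call, as a sketch with assumed imports and model selection; the real streamTextStep resolves its model and messages elsewhere in response-agent.ts:

```ts
import { streamText, type ModelMessage } from "ai"
import { anthropic } from "@ai-sdk/anthropic"
import { addCacheControlToMessages, getCacheProviderOptions } from "./prompt-cache"

// Hypothetical wiring; model choice and message assembly are simplified here.
function streamTextStep(messages: ModelMessage[], postId: string) {
  const modelId = "claude-sonnet-4-5" // example model id
  return streamText({
    model: anthropic(modelId),
    // Anthropic path: ephemeral cacheControl breakpoints on the messages.
    messages: addCacheControlToMessages(messages),
    // OpenAI path: promptCacheKey routing hint (undefined for Anthropic).
    providerOptions: getCacheProviderOptions(modelId, postId),
  })
}
```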

New: agent/__tests__/prompt-cache.test.ts

Comprehensive test suite — 18 tests, all passing (a representative case is sketched after this list):

  • Anthropic cache breakpoint placement (system + last 2 messages)
  • Max 4 breakpoints enforcement with long conversations
  • Preservation of existing providerOptions
  • Tool messages in conversation
  • LanguageModel object detection
  • OpenAI promptCacheKey generation
  • Unknown provider pass-through
  • Edge cases (empty arrays, single messages)
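
One of the breakpoint-placement cases might look roughly like this, assuming a vitest-style runner and the sketched helper above; the real suite in agent/__tests__/prompt-cache.test.ts is more thorough:

```ts
import { describe, expect, it } from "vitest" // assumed test runner
import type { ModelMessage } from "ai"
import { addCacheControlToMessages } from "../prompt-cache"

describe("addCacheControlToMessages", () => {
  it("marks the system message and the last 2 non-system messages", () => {
    const messages: ModelMessage[] = [
      { role: "system", content: "You are the forums response agent." },
      { role: "user", content: "First question" },
      { role: "assistant", content: "First answer" },
      { role: "user", content: "Follow-up question" },
    ]

    const result = addCacheControlToMessages(messages)
    const marked = result.filter(
      (m) => m.providerOptions?.anthropic?.cacheControl !== undefined,
    )

    // 1 system + last 2 non-system messages = 3 breakpoints (limit is 4).
    expect(marked).toHaveLength(3)
    // The first user message is not on the conversation frontier, so it stays unmarked.
    expect(result[1].providerOptions?.anthropic).toBeUndefined()
  })
})
```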

Research

| Source | Pattern Applied |
| --- | --- |
| vercel/ai cookbook | addCacheControlToMessages utility, message-level providerOptions |
| vercel/ai @ai-sdk/anthropic | CacheControlValidator with 4-breakpoint limit |
| anomalyco/opencode | ProviderTransform applyCaching: cache first 2 system + last 2 non-system messages |
| OpenAI caching docs | promptCacheKey for server routing, automatic prefix caching |
| AI SDK OpenAI provider | promptCacheKey providerOption |

Add prompt caching to reduce token costs and improve latency for
LLM conversations, following patterns from vercel/ai SDK cookbook
and anomalyco/opencode.

## Anthropic (explicit cache control)
- Adds cacheControl breakpoints on system messages and the last 2
  conversation messages (max 4 breakpoints per Anthropic's limit)
- Cache reads cost 10% of the base input-token price (writes carry a 25% premium)
- Applied via providerOptions on messages, automatically translated
  by the AI SDK to block-level cache_control

## OpenAI (routing optimization)
- Sets promptCacheKey per post to improve cache hit routing
- OpenAI automatically caches prompts >= 1024 tokens; the key
  helps route requests to the same server for better cache hits
- No cost premium for cache writes

## Implementation
- New utility: agent/prompt-cache.ts with provider detection and
  caching strategies
- Updated response-agent.ts streamTextStep to apply caching
- Comprehensive test suite (18 tests) covering all edge cases
- Non-Anthropic/OpenAI providers pass through unchanged (safe for
  provider-agnostic code)
@vercel
Contributor

vercel bot commented Feb 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| forums | Ready | Preview, Comment | Feb 7, 2026 10:19pm |

…cache purge

Wrap the system prompt as a SystemModelMessage with cacheControl
providerOptions for Anthropic models. Without this, the system
prompt (passed as a plain string to streamText) has no cache_control
marker, meaning Anthropic won't establish the cache prefix and the
cache gets purged between requests.

- New wrapSystemPrompt() function: returns SystemModelMessage with
  cacheControl for Anthropic, plain string for other providers
- Updated response-agent.ts to use wrapSystemPrompt()
- Added 4 new tests for wrapSystemPrompt (22 total, all passing)
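
Given that description, wrapSystemPrompt presumably looks something like the sketch below; the provider check and exact signature are assumptions:

```ts
import type { SystemModelMessage } from "ai"

// Illustrative sketch: for Anthropic models, return a SystemModelMessage that
// carries a cacheControl breakpoint so the system prompt establishes the cache
// prefix; for other providers, keep the plain string.
export function wrapSystemPrompt(
  systemPrompt: string,
  modelId: string,
): string | SystemModelMessage {
  if (!modelId.startsWith("claude-")) return systemPrompt
  return {
    role: "system",
    content: systemPrompt,
    providerOptions: {
      anthropic: { cacheControl: { type: "ephemeral" } },
    },
  }
}
```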
import type {
  ModelMessage,
  SystemModelMessage,
} from "ai"

Contributor


Module-level documentation incorrectly describes the caching strategy as marking "the last conversation message" (singular) when the implementation marks "the last 2 non-system messages" (plural).


julianbenegas merged commit 70e7615 into main Feb 7, 2026
6 checks passed
julianbenegas deleted the forums/prompt-caching-84e29 branch February 7, 2026 22:29
