feat: implement prompt caching for Anthropic and OpenAI models #140

Merged

julianbenegas merged 2 commits into main from forums/prompt-caching-84e29 on Feb 7, 2026

Conversation

@julianbenegas
Member

Summary

Implements prompt caching to reduce token costs and improve latency for LLM conversations. This follows patterns established by vercel/ai SDK (dynamic prompt caching cookbook) and anomalyco/opencode (ProviderTransform.applyCaching).

What's Changed

New: agent/prompt-cache.ts

A provider-aware utility that applies the optimal caching strategy per model (illustrative sketches of each strategy follow the lists below):

Anthropic models (claude-*):

  • Adds cacheControl: { type: 'ephemeral' } breakpoints on:
    • First 2 system messages (static instructions)
    • Last 2 non-system messages (conversation frontier)
  • Respects Anthropic's max of 4 cache breakpoints per request
  • Cache reads cost 10% of the base input-token price (cache writes carry a 25% premium)
  • The AI SDK translates message-level providerOptions to block-level cache_control automatically
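
A minimal sketch of the Anthropic path, assuming the AI SDK's ModelMessage type. The helper name matches the PR, but the breakpoint-selection and merge details below are illustrative, not the exact implementation in agent/prompt-cache.ts:

```ts
import type { ModelMessage } from "ai"

// Illustrative sketch: put ephemeral cache breakpoints on the first 2 system
// messages and the last 2 non-system messages, which keeps the request within
// Anthropic's limit of 4 breakpoints.
export function addCacheControlToMessages(
  messages: ModelMessage[],
): ModelMessage[] {
  const systemIdx = messages
    .flatMap((m, i) => (m.role === "system" ? [i] : []))
    .slice(0, 2)
  const frontierIdx = messages
    .flatMap((m, i) => (m.role !== "system" ? [i] : []))
    .slice(-2)
  const breakpoints = new Set([...systemIdx, ...frontierIdx]) // at most 4

  return messages.map((message, i) => {
    if (!breakpoints.has(i)) return message
    return {
      ...message,
      providerOptions: {
        // Keep any providerOptions already present on the message.
        ...message.providerOptions,
        anthropic: {
          ...message.providerOptions?.anthropic,
          // The AI SDK turns this message-level option into block-level
          // cache_control on the Anthropic request.
          cacheControl: { type: "ephemeral" },
        },
      },
    }
  })
}
```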

OpenAI models (gpt-*, o1-*, o3-*, o4-*):

  • Sets promptCacheKey: 'forums-{postId}' via providerOptions
  • OpenAI caching is automatic for prompts ≥ 1024 tokens
  • The key improves cache routing (requests with same key+prefix hash go to same server)
  • No cost premium for cache writes

Other providers: Messages pass through unchanged (safe for provider-agnostic code).
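
For OpenAI and the fallback, a hedged sketch (the function name matches the PR; the model-id check and return shape are assumptions):

```ts
// Illustrative sketch: OpenAI gets a promptCacheKey routing hint via
// providerOptions; Anthropic caching is handled on the messages instead,
// and unknown providers get no extra options (pass-through).
export function getCacheProviderOptions(
  modelId: string,
  postId: string,
): { openai: { promptCacheKey: string } } | undefined {
  const isOpenAI = /^(gpt-|o1-|o3-|o4-)/.test(modelId)
  if (!isOpenAI) return undefined

  // OpenAI caches prompts of >= 1024 tokens automatically; the key only
  // helps route requests with the same key + prefix hash to the same server.
  return { openai: { promptCacheKey: `forums-${postId}` } }
}
```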

Modified: agent/response-agent.ts

  • Applies addCacheControlToMessages() to model messages before streaming
  • Applies getCacheProviderOptions() for OpenAI routing hints
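
Roughly how this wires into the streaming call, as a sketch with assumed imports and model selection; the real streamTextStep resolves its model and messages elsewhere in response-agent.ts:

```ts
import { streamText, type ModelMessage } from "ai"
import { anthropic } from "@ai-sdk/anthropic"
import { addCacheControlToMessages, getCacheProviderOptions } from "./prompt-cache"

// Hypothetical wiring; model choice and message assembly are simplified here.
function streamTextStep(messages: ModelMessage[], postId: string) {
  const modelId = "claude-sonnet-4-5" // example model id
  return streamText({
    model: anthropic(modelId),
    // Anthropic path: ephemeral cacheControl breakpoints on the messages.
    messages: addCacheControlToMessages(messages),
    // OpenAI path: promptCacheKey routing hint (undefined for Anthropic).
    providerOptions: getCacheProviderOptions(modelId, postId),
  })
}
```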

New: agent/__tests__/prompt-cache.test.ts

Comprehensive test suite — 18 tests, all passing (a representative case is sketched after this list):

  • Anthropic cache breakpoint placement (system + last 2 messages)
  • Max 4 breakpoints enforcement with long conversations
  • Preservation of existing providerOptions
  • Tool messages in conversation
  • LanguageModel object detection
  • OpenAI promptCacheKey generation
  • Unknown provider pass-through
  • Edge cases (empty arrays, single messages)
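
One of the breakpoint-placement cases might look roughly like this, assuming a vitest-style runner and the sketched helper above; the real suite in agent/__tests__/prompt-cache.test.ts is more thorough:

```ts
import { describe, expect, it } from "vitest" // assumed test runner
import type { ModelMessage } from "ai"
import { addCacheControlToMessages } from "../prompt-cache"

describe("addCacheControlToMessages", () => {
  it("marks the system message and the last 2 non-system messages", () => {
    const messages: ModelMessage[] = [
      { role: "system", content: "You are the forums response agent." },
      { role: "user", content: "First question" },
      { role: "assistant", content: "First answer" },
      { role: "user", content: "Follow-up question" },
    ]

    const result = addCacheControlToMessages(messages)
    const marked = result.filter(
      (m) => m.providerOptions?.anthropic?.cacheControl !== undefined,
    )

    // 1 system + last 2 non-system messages = 3 breakpoints (limit is 4).
    expect(marked).toHaveLength(3)
    // The first user message is not on the conversation frontier, so it stays unmarked.
    expect(result[1].providerOptions?.anthropic).toBeUndefined()
  })
})
```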

Research

| Source | Pattern Applied |
| --- | --- |
| vercel/ai cookbook | addCacheControlToMessages utility, message-level providerOptions |
| vercel/ai @ai-sdk/anthropic | CacheControlValidator with 4-breakpoint limit |
| anomalyco/opencode | ProviderTransform applyCaching: cache first 2 system + last 2 non-system messages |
| OpenAI caching docs | promptCacheKey for server routing, automatic prefix caching |
| AI SDK OpenAI provider | promptCacheKey providerOption |

Add prompt caching to reduce token costs and improve latency for
LLM conversations, following patterns from vercel/ai SDK cookbook
and anomalyco/opencode.

## Anthropic (explicit cache control)
- Adds cacheControl breakpoints on system messages and the last 2
  conversation messages (max 4 breakpoints per Anthropic's limit)
- Cache reads cost 10% of the base input-token price (writes carry a 25% premium)
- Applied via providerOptions on messages, automatically translated
  by the AI SDK to block-level cache_control

## OpenAI (routing optimization)
- Sets promptCacheKey per post to improve cache hit routing
- OpenAI automatically caches prompts >= 1024 tokens; the key
  helps route requests to the same server for better cache hits
- No cost premium for cache writes

## Implementation
- New utility: agent/prompt-cache.ts with provider detection and
  caching strategies
- Updated response-agent.ts streamTextStep to apply caching
- Comprehensive test suite (18 tests) covering all edge cases
- Non-Anthropic/OpenAI providers pass through unchanged (safe for
  provider-agnostic code)
@vercel
Contributor

vercel bot commented Feb 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| forums | Ready | Preview, Comment | Feb 7, 2026 10:19pm |

…cache purge

Wrap the system prompt as a SystemModelMessage with cacheControl
providerOptions for Anthropic models. Without this, the system
prompt (passed as a plain string to streamText) has no cache_control
marker, meaning Anthropic won't establish the cache prefix and the
cache gets purged between requests.

- New wrapSystemPrompt() function: returns SystemModelMessage with
  cacheControl for Anthropic, plain string for other providers
- Updated response-agent.ts to use wrapSystemPrompt()
- Added 4 new tests for wrapSystemPrompt (22 total, all passing)
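
Given that description, wrapSystemPrompt presumably looks something like the sketch below; the provider check and exact signature are assumptions:

```ts
import type { SystemModelMessage } from "ai"

// Illustrative sketch: for Anthropic models, return a SystemModelMessage that
// carries a cacheControl breakpoint so the system prompt establishes the cache
// prefix; for other providers, keep the plain string.
export function wrapSystemPrompt(
  systemPrompt: string,
  modelId: string,
): string | SystemModelMessage {
  if (!modelId.startsWith("claude-")) return systemPrompt
  return {
    role: "system",
    content: systemPrompt,
    providerOptions: {
      anthropic: { cacheControl: { type: "ephemeral" } },
    },
  }
}
```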
import type {
  ModelMessage,
  SystemModelMessage,
} from "ai"

Contributor


Module-level documentation incorrectly describes the caching strategy as marking "the last conversation message" (singular) when the implementation marks "the last 2 non-system messages" (plural).


julianbenegas merged commit 70e7615 into main Feb 7, 2026
6 checks passed
julianbenegas deleted the forums/prompt-caching-84e29 branch February 7, 2026 22:29
