perf(ai): anthropic prompt cache + cache token accounting #53
Merged
Conversation
System prompt + tools were rebuilt and re-sent uncached on every
request. Brain content index alone can grow past 10K tokens, so a
typical 10-turn session was paying for the same prefix ten times.
Anthropic's prompt cache cuts that prefix to ~10% of base input price
when reused within 5 minutes; this PR wires the markers up.
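Back-of-envelope with the numbers above: a 10K-token prefix over 10 turns is ~100K billed input tokens uncached; cached, it is one write at 1.25x plus nine reads at ~0.1x, i.e. 10K × 1.25 + 90K × 0.1 ≈ 21.5K input-token-equivalents, roughly a 78% cut on that prefix.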
The naive shape — wrap the existing `buildSystemPrompt(...)` string
in a cached block — would have miscached because `buildSchemaSection`
embeds the active-model marker and `buildRulesSection` injects the
out-of-scope rule based on intent. Cache markers placed over content
that varies per request bust the prefix and pay the 1.25x creation
penalty on every turn. The prompt builder is now actually split:
- `buildStaticBody`: role, architecture, config, schema (NO active-model marker), relations, vocab, permissions, base rules, custom instructions
- `buildDynamicBody`: UI context (the active-model annotation lives here), inferred intent, project state, intent-specific rules (off-topic, etc.)
- `buildContentIndex`: already separate (brain cache); rendered as its own cached block
`buildSystemPromptBlocks(...)` returns the three pieces; `toSystemBlocks(...)`
materializes them as an `AISystemBlock[]` with `cache_control` markers
on the static body and the content index (2 of Anthropic's 4 explicit
breakpoints). The Studio chat handler and the Conversation API handler both compose the prompt this way and additionally tag the last `AITool` with `cache_control`, so the tools array gets the third breakpoint — tools rarely change within a session, so that breakpoint sees a very high hit rate (~95%).
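For orientation, a minimal sketch of the materialization step. The `AISystemBlock` fields and the builder output shape are illustrative assumptions, not the exact types in this repo; only the breakpoint placement is what this PR describes:

```ts
// Hypothetical shapes for illustration — the real AISystemBlock and
// builder output live in the AI provider package.
interface AISystemBlock {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' }; // Anthropic's explicit breakpoint marker
}

interface SystemPromptBlocks {
  staticBody: string;    // request-invariant prefix
  dynamicBody: string;   // varies per request, must stay uncached
  contentIndex?: string; // brain index: large, but stable between content edits
}

function toSystemBlocks(parts: SystemPromptBlocks): AISystemBlock[] {
  const blocks: AISystemBlock[] = [
    // Breakpoint 1: the static body. Any byte of drift here busts the cache.
    { type: 'text', text: parts.staticBody, cache_control: { type: 'ephemeral' } },
  ];
  if (parts.contentIndex) {
    // Breakpoint 2: the content index gets its own marker, so a brain-content
    // change busts only this block's cache, not the static body's.
    blocks.push({ type: 'text', text: parts.contentIndex, cache_control: { type: 'ephemeral' } });
  }
  // Dynamic body last and unmarked: everything after the final
  // cache_control marker is simply not cached.
  blocks.push({ type: 'text', text: parts.dynamicBody });
  return blocks;
}

// Breakpoint 3: tag only the last tool, which caches the whole tools array.
function tagToolsForCache<T extends object>(tools: T[]): (T & { cache_control?: { type: 'ephemeral' } })[] {
  return tools.map((tool, i) =>
    i === tools.length - 1 ? { ...tool, cache_control: { type: 'ephemeral' as const } } : tool,
  );
}
```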
`AIProvider.system` accepts `string | AISystemBlock[]`; string callers
(legacy paths, tests) get a single uncached block automatically. The
Anthropic provider also captures `cache_creation_input_tokens` and
`cache_read_input_tokens` from `message_start`/`message_delta` and
surfaces them through a 4-field `AIUsage` shape. The engine
accumulates all four buckets across the tool loop and forwards them
on the `done` event.
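Roughly how the capture works. The event field names follow Anthropic's Messages streaming API; `AIUsage` and the surrounding types are illustrative, not the exact shapes in this PR:

```ts
// 4-field usage shape forwarded on the `done` event (names illustrative).
interface AIUsage {
  inputTokens: number;              // non-cached input only; semantic unchanged
  outputTokens: number;
  cacheCreationInputTokens: number; // billed at 1.25x base input price
  cacheReadInputTokens: number;     // billed at ~0.1x base input price
}

// Minimal shapes for the two stream events that carry usage.
type UsageEvent =
  | { type: 'message_start'; message: { usage: { input_tokens: number; cache_creation_input_tokens?: number; cache_read_input_tokens?: number } } }
  | { type: 'message_delta'; usage: { output_tokens: number } };

async function captureUsage(stream: AsyncIterable<UsageEvent>): Promise<AIUsage> {
  const usage: AIUsage = { inputTokens: 0, outputTokens: 0, cacheCreationInputTokens: 0, cacheReadInputTokens: 0 };
  for await (const event of stream) {
    if (event.type === 'message_start') {
      // Input-side buckets arrive once, up front.
      const u = event.message.usage;
      usage.inputTokens = u.input_tokens;
      usage.cacheCreationInputTokens = u.cache_creation_input_tokens ?? 0;
      usage.cacheReadInputTokens = u.cache_read_input_tokens ?? 0;
    } else {
      // Output count is cumulative; the last delta wins.
      usage.outputTokens = event.usage.output_tokens;
    }
  }
  return usage;
}
```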
Persistence:
Migration 008 ships as additive `_v2` RPCs and new columns:
- `agent_usage` / `api_message_usage` / `messages` each gain `cache_creation_input_tokens` and `cache_read_input_tokens`
- new RPCs: `increment_agent_usage_tokens_v2`, `increment_api_usage_tokens_v2`
`_v1` RPCs stay registered so a rolling deploy doesn't have to
coordinate schema-and-app cutover. App code calls `_v2` exclusively;
`_v1` becomes legacy and can be dropped in a future cleanup migration.
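Call-site shape with supabase-js; the parameter names below are illustrative guesses, not the actual RPC signature:

```ts
// Assuming a configured supabase-js client in scope.
// App code talks to _v2 only; _v1 stays registered so app instances
// still running old code keep working mid-rolling-deploy.
const { error } = await supabase.rpc('increment_agent_usage_tokens_v2', {
  // Hypothetical parameter names:
  p_project_id: projectId,
  p_input_tokens: usage.inputTokens,
  p_output_tokens: usage.outputTokens,
  p_cache_creation_input_tokens: usage.cacheCreationInputTokens,
  p_cache_read_input_tokens: usage.cacheReadInputTokens,
});
if (error) throw error;
```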
`saveChatResult` / `saveApiChatResult` switched to object-form
arguments — the positional list had grown unwieldy with four
token fields plus ten other params.
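A sketch of the object-form call, with illustrative field names:

```ts
// Before (positional; unwieldy at ~14 params):
// await saveChatResult(conversationId, userId, content, model, inputTokens, ...);

// After (object-form; the four token buckets travel together):
await saveChatResult({
  conversationId,
  userId,
  content,
  usage: {
    inputTokens,
    outputTokens,
    cacheCreationInputTokens,
    cacheReadInputTokens,
  },
  // ...remaining fields (model, finish reason, etc.); names illustrative
});
```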
Business semantic preserved: cache is a Contentrain-side cost win.
Plan quotas stay message-based; cache_read tokens DO NOT earn extra
messages. `input_tokens` semantic is unchanged (= non-cached input),
so existing dashboard queries summing it stay correct.
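So only the internal cost view (never the quota view) folds the cache buckets together. A sketch using the multipliers cited above, with `baseInputRate` standing in for the model's base $/MTok input price:

```ts
// Internal spend per request. Quota accounting ignores all of this:
// it stays message-based, and cache_read tokens never mint messages.
function effectiveInputCostUSD(u: AIUsage, baseInputRate: number): number {
  const tokenEquivalents =
    u.inputTokens +                     // 1.0x: non-cached input
    1.25 * u.cacheCreationInputTokens + // cache write premium
    0.1 * u.cacheReadInputTokens;       // ~10% of base price on hits
  return (tokenEquivalents / 1_000_000) * baseInputRate;
}
```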
Tests:
- anthropic-ai: three-bucket stream-event capture; system-block +
tools `cache_control` mapping to SDK shape.
- agent-system-prompt-cache: static body is byte-identical across UI-context / intent / state changes (the actual cache-hit invariant); `contentIndex` stays a separate block; `toSystemBlocks` emits at most 2 cached blocks so the tools breakpoint stays available.
- db: cache tokens propagate through `saveChatResult` to both
`agent_usage` and the `messages` row.
- chat-route / overage-soft-cap integration mocks updated to the
new helper names and object-form save signature.
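For flavor, the shape of the cache-hit invariant test. The import path and fixtures are invented; only the invariant itself comes from this PR:

```ts
import { expect, it } from 'vitest';

// Hypothetical import path and context fixture.
import { buildSystemPromptBlocks } from '../src/agent/system-prompt';
import { baseCtx } from './fixtures';

it('static body stays byte-identical across per-request variation', () => {
  const a = buildSystemPromptBlocks({ ...baseCtx, uiContext: { activeModelId: 'posts' } });
  const b = buildSystemPromptBlocks({ ...baseCtx, uiContext: { activeModelId: 'pages' }, intent: 'out_of_scope' });

  // One byte of drift in the static body busts the Anthropic prefix
  // cache and turns every turn into a 1.25x cache write.
  expect(a.staticBody).toBe(b.staticBody);

  // The dynamic body is where the variation is allowed to land.
  expect(a.dynamicBody).not.toBe(b.dynamicBody);
});
```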
Out of scope (separate follow-ups):
- Message-level cache breakpoints (history mutates per turn —
needs prefix-stability analysis).
- 1-hour cache TTL beta.
- Cache hit-rate dashboard UI.
- History budget increase — defer until we have observed hit rates
from this PR's accounting.
Summary
System prompt + tools were rebuilt and re-sent uncached on every request. Brain content index alone can grow past 10K tokens, so a typical 10-turn session was paying for the same prefix ten times. Anthropic's prompt cache cuts that prefix to ~10% of base input price when reused within 5 minutes; this PR wires the markers up.
Why this PR needed real surgery
The naive shape — wrap the existing `buildSystemPrompt(...)` string in a cached block — would have miscached because:
- `buildSchemaSection` embedded the active-model marker (`### ▶ Posts` vs `### Posts`) keyed on `uiContext.activeModelId`.
- `buildRulesSection` injected the `out_of_scope` rule based on `intent`.
Cache markers placed over content that varies per request bust the prefix and pay the 1.25x creation penalty on every turn — net negative. So the prompt builder is actually split:
- `buildStaticBody` — role, architecture, config, schema (no active-model marker), relations, vocab, permissions, base rules, custom instructions
- `buildDynamicBody` — UI context (active-model annotation here), inferred intent, project state, intent-specific rules
- `buildContentIndex` — already separate (brain cache); its own cached block

`buildSystemPromptBlocks(...)` returns the three pieces; `toSystemBlocks(...)` materializes them as `AISystemBlock[]` with `cache_control` on the static body + content index (2 of 4 breakpoints). The last `AITool` gets `cache_control` too (3rd breakpoint, ~95% hit rate within a session).

Provider surface
The Anthropic provider captures all three input buckets from `message_start`/`message_delta` and yields a normalized `message_end` once. The engine accumulates across the tool loop and forwards on `done`.

Migration 008 — additive `_v2` RPCs
Per review feedback, RPCs ship as `_v2` rather than mutating signatures in place. Mid-deploy schema-and-app skew on Supabase is a real failure mode; `_v1` stays registered, unused, and can be dropped in a future cleanup. `saveChatResult` / `saveApiChatResult` switched to object-form args (the positional list grew unwieldy).

Business semantic — UNCHANGED
Cache is a Contentrain-side cost win, not a customer-facing quota expansion:
- Plan quotas stay message-based (`ai.messages_per_month`, `api.messages_per_month`).
- `cache_read_input_tokens` do NOT earn extra messages.
- `input_tokens` semantic unchanged (= non-cached input only).
- Existing dashboard queries summing `input_tokens` stay correct.

Test plan
- `pnpm typecheck` clean
- `pnpm lint` — 0 errors
- `pnpm test` — 618 passed (608 + 10 new)
- anthropic-ai: three-bucket stream capture, system-block + tools `cache_control` mapping
- agent-system-prompt-cache: static body byte-identical across UI/intent/state changes (the actual cache-hit invariant), contentIndex separation, `toSystemBlocks` ≤ 2 cached blocks
- db: cache tokens propagate through `saveChatResult` to `agent_usage` + `messages` row

Out of scope (separate PRs)
Sources