feat: 3-block system prompt for cache-efficient LTM injection #311
Merged
Split LTM into stable (preferences, 1h cache) and context-bound (gotchas/patterns/architecture, rides conversation cache) blocks to improve Anthropic prompt cache hit rates.

System prompt structure:

- Block 0: Host prompt, no cache_control (covered by prefix)
- Block 1: Stable LTM (prefs), cache_control: 1h TTL
- Block 2: Context-bound LTM, no cache_control (rides 5m conv cache)

Blocks 0+1 form a stable prefix written once at 2x cost, read at 0.1x for the entire session. When context-bound LTM changes (turn 1→2, curation), only block 2 + messages are re-processed.

Changes:

- Add stableLtmSystem to AnthropicCacheOptions
- buildAnthropicRequest emits 3 system blocks when stableLtmSystem is set
- pipeline.ts computes preferences separately via stableLtmCache (pinned for >=1h, not invalidated by curation)
- Context-bound entries use excludeCategories to avoid duplicating prefs
- Layer 4 emergency refreshes only context-bound LTM, keeps stable pinned
- Add excludeCategories option to forSession() in ltm.ts
- OpenAI fallback concatenates both LTM texts into a single string
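The block layout described above can be sketched roughly as follows. This is a minimal illustration, not the code in `buildAnthropicRequest`: the function name `buildSystemBlocks` and its parameters are hypothetical, and the fallback branch assumes a default-TTL breakpoint on the host prompt when no stable LTM exists. The block shape (`type`, `text`, `cache_control`) follows Anthropic's Messages API system-block format.

```typescript
// Shape of an Anthropic system block; cache_control marks a cache breakpoint.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };
type SystemBlock = { type: "text"; text: string; cache_control?: CacheControl };

// Hypothetical helper: assembles up to three system blocks.
function buildSystemBlocks(
  hostPrompt: string,
  stableLtm?: string,
  contextLtm?: string,
): SystemBlock[] {
  const blocks: SystemBlock[] = [];

  if (stableLtm) {
    // Block 0: host prompt, no breakpoint of its own (covered by block 1's prefix).
    blocks.push({ type: "text", text: hostPrompt });
    // Block 1: stable LTM (preferences) with a 1h breakpoint.
    blocks.push({
      type: "text",
      text: stableLtm,
      cache_control: { type: "ephemeral", ttl: "1h" },
    });
  } else {
    // Assumed fallback: breakpoint on the host prompt itself (default TTL).
    blocks.push({
      type: "text",
      text: hostPrompt,
      cache_control: { type: "ephemeral" },
    });
  }

  // Block 2: context-bound LTM, no cache_control; it rides the conversation cache.
  if (contextLtm) {
    blocks.push({ type: "text", text: contextLtm });
  }
  return blocks;
}
```

Because the breakpoint sits at the end of block 1, a change to the context-bound text leaves the cached prefix (blocks 0+1) untouched.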
a2121bb to 28e4d15
BYK added a commit that referenced this pull request on May 14, 2026
## Summary

Fixes all pre-existing and newly-identified issues from the code review of PR #311 (3-block system prompt).

### Critical fixes

- **OpenAI/Responses API upstreams now receive LTM**: Previously, LTM text was only passed via `AnthropicCacheOptions`, which only `buildAnthropicRequest` consumed. OpenAI Chat Completions and Responses API paths received `req.system` unmodified (no LTM). Fix: `forwardToUpstream` now concatenates `stableLtmSystem` + `ltmSystem` into `req.system` before calling OpenAI builders.
- **`vectorSearch()` now filters by category**: Previously it queried all knowledge entries with embeddings regardless of category. When `excludeCategories: ["preference"]` was passed to `forSession()`, the SQL filtered preferences out of the entry lists, but `vectorSearch()` still returned preference entries in its top-50 hits, crowding out relevant context-bound entries. Fix: add an optional `excludeCategories` parameter to `vectorSearch()`, propagated from `forSession()`.

### Medium fixes

- **`textDiffRatio` rewritten**: The old implementation only compared prefix + suffix characters, missing interior content changes entirely (e.g., entries added/removed in the middle). The new implementation samples up to 1000 evenly-spaced positions across the full string length for O(1) cost, reliably detecting interior edits.
- **Context-bound LTM budget floor**: Added a 50% minimum floor for context-bound entries. Previously, a project with many preferences could consume the entire LTM budget, starving gotchas/patterns/architecture entries that are more critical for correctness during active work.

### Low fixes

- **`KnowledgeCategory` type alias**: Exported from `ltm.ts` for autocomplete on well-known categories (`decision`, `pattern`, `preference`, `architecture`, `gotcha`). `ForSessionOptions.categories` and `excludeCategories` use `(KnowledgeCategory | (string & {}))[]` for type safety with an escape hatch.
- **Cross-project `excludeCategories` test**: Verifies the filter applies to cross-project entries too, not just project-local ones.
- **Empty `excludeCategories` array test**: Confirms `excludeCategories: []` has no filtering effect (falsy `.length` check).
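The sampling idea behind the `textDiffRatio` rewrite can be sketched as below. This is a hedged illustration only: the name `sampledDiffRatio`, the `maxSamples` parameter, and the position-by-position comparison are assumptions, and the actual implementation in the PR may differ in how it picks and compares positions.

```typescript
// Hypothetical sketch: estimate how different two strings are by comparing
// characters at up to `maxSamples` evenly spaced positions.
function sampledDiffRatio(a: string, b: string, maxSamples = 1000): number {
  const len = Math.max(a.length, b.length);
  if (len === 0) return 0; // two empty strings are identical

  // Never sample more positions than there are characters.
  const samples = Math.min(len, maxSamples);
  let diffs = 0;

  for (let i = 0; i < samples; i++) {
    // Spread sample positions across the full length, interior included.
    const pos = Math.floor((i * len) / samples);
    // charCodeAt returns NaN past the end of the shorter string, and
    // NaN !== <number> is true, so a length mismatch counts as a difference.
    if (a.charCodeAt(pos) !== b.charCodeAt(pos)) diffs++;
  }
  return diffs / samples;
}
```

The loop is bounded by `maxSamples` regardless of input size, which is where the constant cost comes from. Since the comparison is positional, an insertion in the middle shifts everything after it and yields a high ratio; for a change detector that is the desired outcome, but the value should not be read as an edit distance.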
Summary
Split LTM injection into two separate system prompt blocks — stable LTM (preferences) and context-bound LTM (gotchas, patterns, architecture) — to improve Anthropic prompt cache hit rates.
System prompt structure
Cache behavior
Changes
- `packages/gateway/src/translate/anthropic.ts`: Add `stableLtmSystem` to `AnthropicCacheOptions`; `buildAnthropicRequest` emits 3 system blocks when `stableLtmSystem` is set. Host prompt no longer gets its own cache breakpoint; the stable LTM block's 1h breakpoint covers the prefix. Falls back to host prompt breakpoint when no stable LTM exists.
- `packages/gateway/src/pipeline.ts`: Compute preferences separately via `stableLtmCache` (computed once per session, not invalidated by curation or idle resume). Context-bound LTM uses `excludeCategories: ["preference"]` to avoid duplicating preferences already in block 1. Layer 4 emergency only refreshes context-bound LTM, keeping stable LTM pinned.
- `packages/core/src/ltm.ts`: Add `excludeCategories` to `ForSessionOptions`, which generates an `AND category NOT IN (...)` SQL clause. Mutually exclusive with `categories` (include wins).
- Tests: `anthropic-caching.test.ts`; 2 new tests for `excludeCategories` in `ltm.test.ts`.

Cost analysis
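As a rough sketch of the arithmetic behind the multipliers quoted in the commit message (prefix written once at 2x, read at 0.1x), the relative cost of the stable prefix can be modeled as below. The function name `prefixCost` and its parameters are hypothetical, costs are expressed in base-input-token equivalents rather than currency, and the model assumes every request after the first is a cache hit within the TTL.

```typescript
// Multipliers taken from the commit message; everything else is illustrative.
const WRITE_MULT = 2.0; // one-time cost of writing the 1h cache entry
const READ_MULT = 0.1; // cost of each subsequent cache read

// Cost of sending the stable prefix `requests` times, measured in
// base-input-token equivalents.
function prefixCost(prefixTokens: number, requests: number, cached: boolean): number {
  if (requests <= 0) return 0;
  if (!cached) return prefixTokens * requests; // full price on every request
  // One write, then (requests - 1) reads, assuming every later request hits.
  return prefixTokens * (WRITE_MULT + READ_MULT * (requests - 1));
}

// Smallest request count at which caching is strictly cheaper than not caching.
function breakEvenRequests(): number {
  let n = 1;
  while (prefixCost(1, n, true) >= prefixCost(1, n, false)) n++;
  return n;
}
```

Under these assumptions the cached prefix costs more for one or two requests (2.0 vs 1, then 2.1 vs 2) and becomes cheaper from the third request onward (2.2 vs 3). For a 4,000-token prefix over 20 requests, the model gives roughly 15,600 token-equivalents cached versus 80,000 uncached.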