feat: 3-block system prompt for cache-efficient LTM injection #311

Merged
BYK merged 1 commit into main from 3-block-system-prompt on May 14, 2026
BYK (Owner) commented on May 14, 2026

Summary

Split LTM injection into two separate system prompt blocks — stable LTM (preferences) and context-bound LTM (gotchas, patterns, architecture) — to improve Anthropic prompt cache hit rates.

System prompt structure

Block 0: Host prompt        — no cache_control (covered by prefix)
Block 1: Stable LTM (prefs) — cache_control: 1h TTL
Block 2: Context-bound LTM  — no cache_control (rides 5m conversation cache)
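
The layout above can be sketched as a small builder. This is a minimal illustration, not the code from `anthropic.ts`: the `SystemParts` shape and the `buildSystemBlocks` name are hypothetical, and the TTL used for the host-prompt fallback breakpoint is an assumption, since the description does not state it. The `cache_control` shape (`type: "ephemeral"` with an optional `ttl`) follows Anthropic's prompt caching API as I understand it.

```typescript
// Hypothetical types for illustration; the real AnthropicCacheOptions may differ.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };
type SystemBlock = { type: "text"; text: string; cache_control?: CacheControl };

interface SystemParts {
  hostPrompt: string;
  stableLtmSystem?: string; // preferences (block 1)
  ltmSystem?: string; // context-bound LTM (block 2)
}

function buildSystemBlocks(parts: SystemParts): SystemBlock[] {
  const oneHour: CacheControl = { type: "ephemeral", ttl: "1h" };
  const blocks: SystemBlock[] = [];

  if (parts.stableLtmSystem) {
    // Block 0: no breakpoint of its own; it is covered by block 1's breakpoint.
    blocks.push({ type: "text", text: parts.hostPrompt });
    // Block 1: the 1h breakpoint caches everything up to and including this block.
    blocks.push({ type: "text", text: parts.stableLtmSystem, cache_control: oneHour });
  } else {
    // Fallback: the host prompt carries the breakpoint (1h TTL is an assumption).
    blocks.push({ type: "text", text: parts.hostPrompt, cache_control: oneHour });
  }

  if (parts.ltmSystem) {
    // Block 2: no cache_control; it rides the conversation cache.
    blocks.push({ type: "text", text: parts.ltmSystem });
  }
  return blocks;
}
```

Placing the only system-level breakpoint on block 1 means a change to block 2 leaves the cached prefix (blocks 0+1) byte-identical.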

Cache behavior

  • Blocks 0+1 form a stable prefix written once at 2× cost, read at 0.1× for the entire session (and across sessions within 1h)
  • Block 2 + messages ride the conversation cache (5m TTL, 1.25× write)
  • When context-bound LTM changes (turn 1→2, curation), only block 2 + messages are re-processed; blocks 0+1 remain cache reads
  • Stable LTM is pinned for ≥1h even through curation changes — stale preferences are tolerated to preserve the cache prefix investment
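
To make the multipliers above concrete, here is a rough cost sketch in units of "base input tokens". The multipliers (2× for a 1h write, 1.25× for a 5m write, 0.1× for a read) come from the bullets above; the function names and the split into "prefix" (blocks 0+1) and "tail" (block 2 + messages) are illustrative assumptions, and the model ignores output tokens and any partially cached tail.

```typescript
// Multipliers relative to the base input-token price, as quoted in the PR.
const WRITE_1H = 2.0;
const WRITE_5M = 1.25;
const CACHE_READ = 0.1;

interface TurnTokens {
  prefixTokens: number; // blocks 0+1 (host prompt + stable LTM)
  tailTokens: number; // block 2 + messages
}

// First turn: the prefix is written to the 1h cache, the tail to the 5m cache.
function coldTurnCost(t: TurnTokens): number {
  return t.prefixTokens * WRITE_1H + t.tailTokens * WRITE_5M;
}

// Later turn where block 2 changed: the prefix is a cache read,
// while the tail is re-processed and re-written to the 5m cache.
function warmPrefixTurnCost(t: TurnTokens): number {
  return t.prefixTokens * CACHE_READ + t.tailTokens * WRITE_5M;
}
```

For a hypothetical 10,000-token prefix and 2,000-token tail, the cold turn costs the equivalent of about 22,500 base input tokens, and a later turn with a changed block 2 costs about 3,500.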

Changes

  • packages/gateway/src/translate/anthropic.ts: Add stableLtmSystem to AnthropicCacheOptions; buildAnthropicRequest emits 3 system blocks when stableLtmSystem is set. Host prompt no longer gets its own cache breakpoint — the stable LTM block's 1h breakpoint covers the prefix. Falls back to host prompt breakpoint when no stable LTM exists.
  • packages/gateway/src/pipeline.ts: Compute preferences separately via stableLtmCache (computed once per session, not invalidated by curation or idle resume). Context-bound LTM uses excludeCategories: ["preference"] to avoid duplicating preferences already in block 1. Layer 4 emergency only refreshes context-bound LTM, keeping stable LTM pinned.
  • packages/core/src/ltm.ts: Add excludeCategories to ForSessionOptions — generates AND category NOT IN (...) SQL clause. Mutually exclusive with categories (include wins).
  • Tests: 8 new tests for 3-block system prompt behavior in anthropic-caching.test.ts; 2 new tests for excludeCategories in ltm.test.ts.
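
The `excludeCategories` behavior described for `ltm.ts` can be sketched as a clause builder. This is an assumption-laden illustration rather than the actual implementation: `categoryClause`, `CategoryFilter`, and `SqlFragment` are hypothetical names, the `?` placeholder style is assumed, and the `IN (...)` form for `categories` is inferred by symmetry with the `NOT IN (...)` clause the PR mentions.

```typescript
// Hypothetical option and result shapes for illustration only.
interface CategoryFilter {
  categories?: string[];
  excludeCategories?: string[];
}

interface SqlFragment {
  clause: string; // text appended to the WHERE clause
  params: string[]; // values bound to the placeholders, in order
}

function categoryClause(opts: CategoryFilter): SqlFragment {
  // Include wins: when `categories` is non-empty, `excludeCategories` is ignored.
  if (opts.categories?.length) {
    const placeholders = opts.categories.map(() => "?").join(", ");
    return { clause: ` AND category IN (${placeholders})`, params: [...opts.categories] };
  }
  // An empty exclude array has a falsy length, so no filter is emitted.
  if (opts.excludeCategories?.length) {
    const placeholders = opts.excludeCategories.map(() => "?").join(", ");
    return { clause: ` AND category NOT IN (${placeholders})`, params: [...opts.excludeCategories] };
  }
  return { clause: "", params: [] };
}
```

Binding the category names as parameters instead of interpolating them keeps the clause safe even when callers pass arbitrary strings.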

Cost analysis

| Scenario | Before | After |
| --- | --- | --- |
| Turn 1→2 (LTM changes) | Re-process all LTM + messages | Only block 2 + messages; blocks 0+1 are cache reads |
| Curation changes prefs | Full LTM cache bust | Block 1 stays pinned; only block 2 may change |
| New session within 1h | Host prompt is cache read, LTM is cold | Blocks 0+1 are cache reads from previous session |

Split LTM into stable (preferences, 1h cache) and context-bound
(gotchas/patterns/architecture, rides conversation cache) blocks to
improve Anthropic prompt cache hit rates.

System prompt structure:
  Block 0: Host prompt        — no cache_control (covered by prefix)
  Block 1: Stable LTM (prefs) — cache_control: 1h TTL
  Block 2: Context-bound LTM  — no cache_control (rides 5m conv cache)

Blocks 0+1 form a stable prefix written once at 2x cost, read at 0.1x
for the entire session. When context-bound LTM changes (turn 1→2,
curation), only block 2 + messages are re-processed.

Changes:
- Add stableLtmSystem to AnthropicCacheOptions
- buildAnthropicRequest emits 3 system blocks when stableLtmSystem set
- pipeline.ts computes preferences separately via stableLtmCache (pinned
  for >=1h, not invalidated by curation)
- Context-bound entries use excludeCategories to avoid duplicating prefs
- Layer 4 emergency refreshes only context-bound LTM, keeps stable pinned
- Add excludeCategories option to forSession() in ltm.ts
- OpenAI fallback concatenates both LTM texts into single string
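
The "computed once per session" pinning of stable LTM can be sketched as a per-session memo. This is only an illustration under assumptions: the class shape, the `get` method, and the `compute` callback are hypothetical, and the real `stableLtmCache` in `pipeline.ts` may track expiry or other state that is not shown here.

```typescript
// Hypothetical per-session memo; not the actual stableLtmCache implementation.
class StableLtmCache {
  private readonly bySession = new Map<string, string>();

  // Returns the pinned text for a session, computing it only on first use.
  // Curation or idle resume never clears the entry, so the text stays
  // byte-identical and the 1h cache prefix keeps matching.
  get(sessionId: string, compute: () => string): string {
    const cached = this.bySession.get(sessionId);
    if (cached !== undefined) return cached;
    const text = compute();
    this.bySession.set(sessionId, text);
    return text;
  }
}
```

The trade-off is the one the PR states: a preference edited mid-session is not reflected until the pinned entry is replaced, in exchange for a stable cache prefix.
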

BYK force-pushed the 3-block-system-prompt branch from a2121bb to 28e4d15 on May 14, 2026 16:36
BYK merged commit 72c7556 into main on May 14, 2026 (7 checks passed)
BYK deleted the 3-block-system-prompt branch on May 14, 2026 16:41
BYK added a commit that referenced this pull request on May 14, 2026:

## Summary

Fixes all pre-existing and newly-identified issues from the code review
of PR #311 (3-block system prompt).

### Critical fixes

- **OpenAI/Responses API upstreams now receive LTM**: Previously, LTM
text was only passed via `AnthropicCacheOptions`, which only
`buildAnthropicRequest` consumed. OpenAI Chat Completions and Responses
API paths received `req.system` unmodified (no LTM). Fix:
`forwardToUpstream` now concatenates `stableLtmSystem` + `ltmSystem`
into `req.system` before calling OpenAI builders.

- **`vectorSearch()` now filters by category**: Previously queried all
knowledge entries with embeddings regardless of category. When
`excludeCategories: ["preference"]` was passed to `forSession()`, the
SQL filtered preferences out of the entry lists, but `vectorSearch()`
still returned preference entries in its top-50 hits — crowding out
relevant context-bound entries. Fix: add optional `excludeCategories`
parameter to `vectorSearch()`, propagated from `forSession()`.
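
The OpenAI-path fix can be sketched as a plain string merge. This is a minimal illustration, not the code in `forwardToUpstream`: the `mergeLtmIntoSystem` name, the blank-line separator, and the ordering (host prompt, then stable LTM, then context-bound LTM, mirroring the block order on the Anthropic path) are all assumptions.

```typescript
// Hypothetical helper: folds both LTM texts into a single system string
// for upstreams that take a plain `system` string rather than blocks.
function mergeLtmIntoSystem(
  system: string | undefined,
  stableLtmSystem?: string,
  ltmSystem?: string,
): string | undefined {
  // Drop missing or empty parts so no stray separators are emitted.
  const parts = [system, stableLtmSystem, ltmSystem].filter(
    (part): part is string => typeof part === "string" && part.length > 0,
  );
  return parts.length > 0 ? parts.join("\n\n") : undefined;
}
```

Returning `undefined` when nothing is present leaves a request without a system prompt unchanged.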

### Medium fixes

- **`textDiffRatio` rewritten**: Old implementation only compared prefix
+ suffix characters, missing interior content changes entirely (e.g.,
entries added/removed in the middle). New implementation samples up to
1000 evenly-spaced positions across the full string length for O(1)
cost, reliably detecting interior edits.

- **Context-bound LTM budget floor**: Added a 50% minimum floor for
context-bound entries. Previously, a project with many preferences could
consume the entire LTM budget, starving gotchas/patterns/architecture
entries that are more critical for correctness during active work.
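
The sampling approach described for `textDiffRatio` can be sketched as follows. This is one plausible reading of the commit message, not the shipped implementation: the signature, the `maxSamples` parameter, and the treatment of out-of-range positions as differences are assumptions.

```typescript
// Illustrative sketch: estimates how different two strings are by comparing
// up to `maxSamples` evenly spaced character positions.
function textDiffRatio(a: string, b: string, maxSamples = 1000): number {
  const longer = Math.max(a.length, b.length);
  if (longer === 0) return 0; // two empty strings are identical

  const samples = Math.min(maxSamples, longer);
  let differing = 0;
  for (let i = 0; i < samples; i++) {
    // Evenly spaced positions across the longer string.
    const pos = Math.floor((i * longer) / samples);
    // charCodeAt returns NaN past the end of the shorter string, and
    // NaN !== number, so a length mismatch counts as a difference.
    if (a.charCodeAt(pos) !== b.charCodeAt(pos)) differing++;
  }
  return differing / samples;
}
```

The loop is bounded by `maxSamples`, so the cost does not grow with string length. An inserted or removed entry shifts every later character and is therefore very likely to hit sampled positions, whereas a same-length substitution of a few characters in a string longer than `maxSamples` can fall between samples.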

### Low fixes

- **`KnowledgeCategory` type alias**: Exported from `ltm.ts` for
autocomplete on well-known categories (`decision`, `pattern`,
`preference`, `architecture`, `gotcha`). `ForSessionOptions.categories`
and `excludeCategories` use `(KnowledgeCategory | (string & {}))[]` for
type safety with escape hatch.

- **Cross-project `excludeCategories` test**: Verifies the filter
applies to cross-project entries too, not just project-local ones.

- **Empty `excludeCategories` array test**: Confirms `excludeCategories:
[]` has no filtering effect (falsy `.length` check).
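
The `(KnowledgeCategory | (string & {}))[]` pattern can be illustrated in isolation. The category names below are taken from the commit message; the `CategoryList` alias, the trimmed-down `ForSessionOptions`, and the `isExcluded` helper are hypothetical and exist only to show how the type behaves.

```typescript
// Well-known categories listed in the commit message.
type KnowledgeCategory = "decision" | "pattern" | "preference" | "architecture" | "gotcha";

// `string & {}` keeps the literal union visible to autocomplete while still
// accepting arbitrary strings; a plain `string` would absorb the literals.
type CategoryList = (KnowledgeCategory | (string & {}))[];

// Trimmed-down, hypothetical version of the options type.
interface ForSessionOptions {
  categories?: CategoryList;
  excludeCategories?: CategoryList;
}

// Hypothetical helper mirroring the falsy `.length` check:
// an empty or missing exclude list filters nothing.
function isExcluded(category: string, opts: ForSessionOptions): boolean {
  if (!opts.excludeCategories?.length) return false;
  return opts.excludeCategories.includes(category);
}
```

Editors suggest the five known names when filling in `excludeCategories`, yet a custom value such as `"custom-tag"` still type-checks.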