feat: 3-block system prompt for cache-efficient LTM injection by BYK · Pull Request #311 · BYK/loreai

BYK · 2026-05-14T16:23:23Z

Summary

Split LTM injection into two separate system prompt blocks — stable LTM (preferences) and context-bound LTM (gotchas, patterns, architecture) — to improve Anthropic prompt cache hit rates.

System prompt structure

Block 0: Host prompt        — no cache_control (covered by prefix)
Block 1: Stable LTM (prefs) — cache_control: 1h TTL
Block 2: Context-bound LTM  — no cache_control (rides 5m conversation cache)
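
The layout above can be sketched as a small builder. This is a minimal illustration, not the code from `buildAnthropicRequest`: the names `buildSystemBlocks` and `SystemPromptParts` are hypothetical, and the 1h TTL on the fallback host breakpoint is an assumption. The `cache_control: { type: "ephemeral", ttl: "1h" }` shape follows Anthropic's prompt caching format for system text blocks.

```typescript
// Shape of an Anthropic system text block with an optional cache breakpoint.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// Hypothetical input type: the three pieces of the system prompt.
interface SystemPromptParts {
  hostPrompt: string;
  stableLtm?: string; // preferences (block 1)
  contextLtm?: string; // gotchas, patterns, architecture (block 2)
}

function buildSystemBlocks(parts: SystemPromptParts): SystemBlock[] {
  // Block 0: host prompt, no breakpoint of its own when block 1 exists.
  const host: SystemBlock = { type: "text", text: parts.hostPrompt };
  const blocks: SystemBlock[] = [host];

  if (parts.stableLtm) {
    // Block 1: the 1h breakpoint here caches the whole prefix (blocks 0+1).
    blocks.push({
      type: "text",
      text: parts.stableLtm,
      cache_control: { type: "ephemeral", ttl: "1h" },
    });
  } else {
    // Fallback: breakpoint on the host prompt (the TTL is an assumption).
    host.cache_control = { type: "ephemeral", ttl: "1h" };
  }

  if (parts.contextLtm) {
    // Block 2: no breakpoint; it rides the conversation cache.
    blocks.push({ type: "text", text: parts.contextLtm });
  }
  return blocks;
}
```

Because a cache breakpoint covers everything before it, placing the only system breakpoint on block 1 keeps block 2 out of the long-lived prefix, so changes to context-bound LTM do not invalidate it.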

Cache behavior

Blocks 0+1 form a stable prefix written once at 2× cost, read at 0.1× for the entire session (and across sessions within 1h)
Block 2 + messages ride the conversation cache (5m TTL, 1.25× write)
When context-bound LTM changes (turn 1→2, curation), only block 2 + messages are re-processed; blocks 0+1 remain cache reads
Stable LTM is pinned for ≥1h even through curation changes — stale preferences are tolerated to preserve the cache prefix investment
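
The multipliers above imply a simple break-even calculation for the stable prefix. The sketch below uses only the figures quoted here (2× write for the 1h cache, 0.1× read) and assumes every turn lands while the cache entry is still alive; the function names are illustrative.

```typescript
// Cost multipliers relative to the base input-token price, as quoted above.
const WRITE_1H = 2.0; // one-time write of the 1h-cached prefix
const CACHE_READ = 0.1; // every later read of that prefix

// Relative cost of sending `prefixTokens` on each of `turns` requests when
// the first request writes the cache and all later ones hit it.
function cachedPrefixCost(prefixTokens: number, turns: number): number {
  if (turns <= 0) return 0;
  return prefixTokens * (WRITE_1H + CACHE_READ * (turns - 1));
}

// Relative cost of the same prefix with no caching at all.
function uncachedPrefixCost(prefixTokens: number, turns: number): number {
  return turns <= 0 ? 0 : prefixTokens * turns;
}
```

Under these assumptions the 1h write costs more than no caching for a two-turn exchange (2.1 vs 2.0 units per prefix token) and pays for itself from the third turn on (2.2 vs 3.0).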

Changes

packages/gateway/src/translate/anthropic.ts: Add stableLtmSystem to AnthropicCacheOptions; buildAnthropicRequest emits 3 system blocks when stableLtmSystem is set. Host prompt no longer gets its own cache breakpoint — the stable LTM block's 1h breakpoint covers the prefix. Falls back to host prompt breakpoint when no stable LTM exists.
packages/gateway/src/pipeline.ts: Compute preferences separately via stableLtmCache (computed once per session, not invalidated by curation or idle resume). Context-bound LTM uses excludeCategories: ["preference"] to avoid duplicating preferences already in block 1. Layer 4 emergency only refreshes context-bound LTM, keeping stable LTM pinned.
packages/core/src/ltm.ts: Add excludeCategories to ForSessionOptions — generates AND category NOT IN (...) SQL clause. Mutually exclusive with categories (include wins).
Tests: 8 new tests for 3-block system prompt behavior in anthropic-caching.test.ts; 2 new tests for excludeCategories in ltm.test.ts.
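
The `excludeCategories` behavior described for `ltm.ts` can be sketched as a clause builder. This is an illustration under stated assumptions, not the actual implementation: `categoryClause` and `CategoryFilter` are hypothetical names, and the `IN (...)` form for the include path and the use of `?` placeholders are assumptions. The `NOT IN` clause, the "include wins" rule, and the empty-array no-op come from the description.

```typescript
// Hypothetical subset of ForSessionOptions relevant to category filtering.
interface CategoryFilter {
  categories?: string[];
  excludeCategories?: string[];
}

// Returns a SQL fragment plus bound parameters for the category filter.
function categoryClause(opts: CategoryFilter): { sql: string; params: string[] } {
  // Include list takes precedence when both options are set.
  if (opts.categories?.length) {
    const placeholders = opts.categories.map(() => "?").join(", ");
    return { sql: ` AND category IN (${placeholders})`, params: [...opts.categories] };
  }
  // An empty array has a falsy length, so it adds no filter.
  if (opts.excludeCategories?.length) {
    const placeholders = opts.excludeCategories.map(() => "?").join(", ");
    return { sql: ` AND category NOT IN (${placeholders})`, params: [...opts.excludeCategories] };
  }
  return { sql: "", params: [] };
}
```

Calling `categoryClause({ excludeCategories: ["preference"] })` yields the fragment used for block 2, so preferences already present in block 1 are not repeated.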

Cost analysis

Scenario	Before	After
Turn 1→2 (LTM changes)	Re-process all LTM + messages	Only block 2 + messages; blocks 0+1 are cache reads
Curation changes prefs	Full LTM cache bust	Block 1 stays pinned; only block 2 may change
New session within 1h	Host prompt is cache read, LTM is cold	Blocks 0+1 are cache reads from previous session

Split LTM into stable (preferences, 1h cache) and context-bound (gotchas/patterns/architecture, rides conversation cache) blocks to improve Anthropic prompt cache hit rates.

System prompt structure:

Block 0: Host prompt        - no cache_control (covered by prefix)
Block 1: Stable LTM (prefs) - cache_control: 1h TTL
Block 2: Context-bound LTM  - no cache_control (rides 5m conv cache)

Blocks 0+1 form a stable prefix written once at 2x cost, read at 0.1x for the entire session. When context-bound LTM changes (turn 1→2, curation), only block 2 + messages are re-processed.

Changes:

- Add stableLtmSystem to AnthropicCacheOptions
- buildAnthropicRequest emits 3 system blocks when stableLtmSystem set
- pipeline.ts computes preferences separately via stableLtmCache (pinned for >=1h, not invalidated by curation)
- Context-bound entries use excludeCategories to avoid duplicating prefs
- Layer 4 emergency refreshes only context-bound LTM, keeps stable pinned
- Add excludeCategories option to forSession() in ltm.ts
- OpenAI fallback concatenates both LTM texts into single string
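
For upstreams without per-block cache control, the fallback collapses everything into one string. A minimal sketch, assuming the host prompt comes first and the parts are joined with a blank line; the function name `mergeLtmIntoSystem`, the ordering relative to the host prompt, and the separator are all assumptions.

```typescript
// Joins the host system prompt and both LTM texts into a single string,
// skipping any part that is missing or blank.
function mergeLtmIntoSystem(
  system: string | undefined,
  stableLtm?: string,
  contextLtm?: string,
): string {
  return [system, stableLtm, contextLtm]
    .filter((part): part is string => typeof part === "string" && part.trim().length > 0)
    .join("\n\n");
}
```

Keeping the stable text ahead of the context-bound text mirrors the block order used for Anthropic, which may also help providers that cache by prefix automatically.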

## Summary

Fixes all pre-existing and newly-identified issues from the code review of PR #311 (3-block system prompt).

### Critical fixes

- **OpenAI/Responses API upstreams now receive LTM**: Previously, LTM text was only passed via `AnthropicCacheOptions`, which only `buildAnthropicRequest` consumed. OpenAI Chat Completions and Responses API paths received `req.system` unmodified (no LTM). Fix: `forwardToUpstream` now concatenates `stableLtmSystem` + `ltmSystem` into `req.system` before calling OpenAI builders.
- **`vectorSearch()` now filters by category**: Previously queried all knowledge entries with embeddings regardless of category. When `excludeCategories: ["preference"]` was passed to `forSession()`, the SQL filtered preferences out of the entry lists, but `vectorSearch()` still returned preference entries in its top-50 hits, crowding out relevant context-bound entries. Fix: add optional `excludeCategories` parameter to `vectorSearch()`, propagated from `forSession()`.

### Medium fixes

- **`textDiffRatio` rewritten**: Old implementation only compared prefix + suffix characters, missing interior content changes entirely (e.g., entries added/removed in the middle). New implementation samples up to 1000 evenly-spaced positions across the full string length for O(1) cost, reliably detecting interior edits.
- **Context-bound LTM budget floor**: Added a 50% minimum floor for context-bound entries. Previously, a project with many preferences could consume the entire LTM budget, starving gotchas/patterns/architecture entries that are more critical for correctness during active work.

### Low fixes

- **`KnowledgeCategory` type alias**: Exported from `ltm.ts` for autocomplete on well-known categories (`decision`, `pattern`, `preference`, `architecture`, `gotcha`). `ForSessionOptions.categories` and `excludeCategories` use `(KnowledgeCategory | (string & {}))[]` for type safety with escape hatch.
- **Cross-project `excludeCategories` test**: Verifies the filter applies to cross-project entries too, not just project-local ones.
- **Empty `excludeCategories` array test**: Confirms `excludeCategories: []` has no filtering effect (falsy `.length` check).
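
The sampling idea behind the rewritten `textDiffRatio` can be sketched as follows. This is an illustration of evenly-spaced positional sampling, not the function from the PR: the name `sampledDiffRatio`, the handling of unequal lengths, and the return convention (fraction of sampled positions that differ) are assumptions.

```typescript
// Estimates how different two strings are by comparing characters at up to
// `maxSamples` evenly-spaced positions across the longer string's length.
// Work is bounded by `maxSamples` regardless of input size.
function sampledDiffRatio(a: string, b: string, maxSamples = 1000): number {
  const maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return 0; // two empty strings are identical

  const samples = Math.min(maxSamples, maxLen);
  let differing = 0;
  for (let i = 0; i < samples; i++) {
    // Spread sample positions across the full range [0, maxLen).
    const pos = Math.floor((i * maxLen) / samples);
    // charCodeAt returns NaN past the end of the shorter string, and
    // NaN !== <number> is true, so a length mismatch counts as a difference.
    if (a.charCodeAt(pos) !== b.charCodeAt(pos)) differing++;
  }
  return differing / samples;
}
```

Unlike a prefix-plus-suffix comparison, this picks up a change confined to the middle of a long string. It is still positional, so an insertion shifts every later character and tends to report a large ratio, and an edit shorter than the sampling stride can fall between sample points.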

BYK force-pushed the 3-block-system-prompt branch from a2121bb to 28e4d15 May 14, 2026 16:36

BYK merged commit 72c7556 into main May 14, 2026
7 checks passed

BYK deleted the 3-block-system-prompt branch May 14, 2026 16:41

BYK mentioned this pull request May 14, 2026

fix: address review findings from 3-block system prompt PR #312

Merged

craft-deployer Bot mentioned this pull request May 14, 2026

publish: BYK/loreai@0.19.0 #328

Closed

6 tasks

BYK merged 1 commit into main from 3-block-system-prompt
