refactor: replace EMA-driven context cap with tier-based cost-aware decisions by BYK · Pull Request #348 · BYK/loreai

BYK · 2026-05-15T17:09:14Z

Summary

Remove the adaptive context cap system (EMA bust rate tracking, dynamic cap tightening/relaxation, MIN_CONTEXT_FLOOR, maxContextTokensCeiling) that caused death spirals by ratcheting the cap below the session's actual token count
Replace with tier-based quality watermarks (200K/500K/model limit) and per-turn economic bust-vs-continue decisions using actual cache pricing data
Add rolling bust detection: after 5 consecutive cache busts, stop compressing and inject a warning message advising the user to compact or start a new conversation

Key changes

New API

setCachePricing(write, read) — sets per-token cache costs from models.dev
shouldCompress(currentTokens, compressedTokens, consecutiveBusts) — bust-vs-continue economic decision with 0.85 threshold. Heavily favors NOT compressing since cache writes are 12.5x more expensive than reads on Opus. Returns false (don't compress) when no pricing data is available.
getTier(tokens) — maps token count to quality tier (0: ≤200K best, 1: ≤500K acceptable, 2: >500K degraded)
recordCacheUsage(write, read, inputTokens, sessionID) — tracks consecutive busts using total input tokens as denominator (not just cache tokens)
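The tier mapping and the consecutive-bust counter can be sketched as follows. This is a minimal illustration, not the code from the PR: the 200K/500K boundaries and the five-bust limit come from the description above, while `nextConsecutiveBusts` and `BUST_RATIO_THRESHOLD` are hypothetical names, and the 0.5 ratio is an assumed value that the PR text does not state.

```typescript
// Quality tiers from the PR description: 0 = best, 1 = acceptable, 2 = degraded.
type Tier = 0 | 1 | 2;

const TIER_0_MAX_TOKENS = 200_000;
const TIER_1_MAX_TOKENS = 500_000;

function getTier(tokens: number): Tier {
  if (tokens <= TIER_0_MAX_TOKENS) return 0;
  if (tokens <= TIER_1_MAX_TOKENS) return 1;
  return 2;
}

// ASSUMPTION: a turn counts as a "bust" when cache writes make up at least
// this fraction of the total input tokens. The real threshold is not given.
const BUST_RATIO_THRESHOLD = 0.5;

// From the PR description: 5 consecutive busts mark the session unsustainable.
const UNSUSTAINABLE_AFTER = 5;

// Hypothetical helper: returns the updated consecutive-bust count for a turn.
// The denominator is the total input token count, not just cache tokens.
function nextConsecutiveBusts(
  cacheWriteTokens: number,
  inputTokens: number,
  previousBusts: number,
): number {
  if (inputTokens <= 0) return previousBusts; // nothing to measure this turn
  const bustRatio = cacheWriteTokens / inputTokens;
  return bustRatio >= BUST_RATIO_THRESHOLD ? previousBusts + 1 : 0;
}

function isUnsustainable(consecutiveBusts: number): boolean {
  return consecutiveBusts >= UNSUSTAINABLE_AFTER;
}
```

Dividing by total input tokens keeps the ratio from being inflated when a large share of the prompt is uncached, which is the reason the PR gives for changing the denominator.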

Integration points

Tier gate in transformInner: Between the layer-0 passthrough and compression stages, shouldCompress() is called. When bust cost isn't justified, the session stays at layer 0 with full context up to the model limit.
Unsustainable warning in pipeline.ts: When TransformResult.unsustainable is set (5+ consecutive busts), a <system-reminder> warning is injected into the last user message advising to compact or start fresh.
DB persistence: consecutiveBusts is persisted via the repurposed dynamicContextCap column (no migration needed) and restored on session load.
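The warning injection described above might look roughly like this. It is a sketch under stated assumptions: the `Message` and `TransformResultSketch` shapes, the function name `injectUnsustainableWarning`, and the warning wording are all hypothetical; only the `unsustainable` flag, the `<system-reminder>` tag, and the "last user message" target come from the PR text.

```typescript
// Hypothetical message shape; the real pipeline types are not shown in the PR.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Hypothetical stand-in for TransformResult, reduced to the fields used here.
interface TransformResultSketch {
  messages: Message[];
  unsustainable?: boolean;
}

// ASSUMPTION: the exact warning wording is illustrative only.
const UNSUSTAINABLE_WARNING =
  "<system-reminder>The prompt cache has been invalidated on several " +
  "consecutive turns. Consider compacting this conversation or starting " +
  "a new one.</system-reminder>";

// Appends the warning to the last user message without mutating the input.
function injectUnsustainableWarning(result: TransformResultSketch): Message[] {
  if (!result.unsustainable) return result.messages;

  // Walk backwards to find the most recent user message.
  let lastUserIndex = -1;
  for (let i = result.messages.length - 1; i >= 0; i--) {
    if (result.messages[i].role === "user") {
      lastUserIndex = i;
      break;
    }
  }
  if (lastUserIndex === -1) return result.messages; // no user message to annotate

  return result.messages.map((message, index) =>
    index === lastUserIndex
      ? { ...message, content: `${message.content}\n\n${UNSUSTAINABLE_WARNING}` }
      : message,
  );
}
```

Returning a new array rather than editing in place is a design choice of this sketch, made so that the stored conversation is not altered by a per-request warning.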

Removed

adaptContextCap(), computeContextCap(), setMaxContextTokens(), getMaxContextTokens(), updateBustRate()
bustRateEMA, interBustIntervalEMA, lastBustAt, dynamicContextCap session state fields (DB columns retained, repurposed)
effectiveCap constraint in transformInner — context now uses full model window
targetBustCost, maxContextTokens config fields (marked deprecated, still parsed to avoid breaking existing .lore.json)

Why

The old EMA system optimized for minimizing input_tokens per turn via a static dollar-based cap. This created a death spiral: high bust rate → tighten cap → session can't fit in lower layers → forced to Layer 4 → bust rate stays high → cap ratchets down further. The new tiers are quality-based (reflecting empirical model effectiveness), not pricing-based, since token costs scale linearly regardless of tier.

Math verification (Opus 4.6)

Cache write: $6.25/MTok, Cache read: $0.50/MTok (write is 12.5x more expensive)
At 250K → 150K compressed: bustCost=$0.94, continueCost=$0.125 → don't compress ✓
At 500K → 100K compressed: bustCost=$0.625, continueCost=$0.25 → don't compress ✓
At 2M → 100K compressed: bustCost=$0.625, continueCost=$1.00 → compress ✓
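The three cases above can be reproduced with a small sketch of the bust-vs-continue comparison. The formula here is inferred from the numbers, not copied from the PR: it assumes bustCost = compressedTokens × write price, continueCost = currentTokens × read price, and that compression is chosen only when bustCost is below 0.85 × continueCost. The name `shouldCompressSketch` and the explicit `pricing` parameter are hypothetical (the PR's shouldCompress reads pricing set via setCachePricing).

```typescript
// Per-token cache prices in dollars.
interface CachePricing {
  write: number;
  read: number;
}

const COMPRESS_THRESHOLD = 0.85; // from the PR description
const MAX_CONSECUTIVE_BUSTS = 5; // from the PR description

// ASSUMPTION: cost formulas are inferred from the worked numbers above.
function shouldCompressSketch(
  pricing: CachePricing | null,
  currentTokens: number,
  compressedTokens: number,
  consecutiveBusts: number,
): boolean {
  // No pricing data: conservative default, keep the cache.
  if (pricing === null) return false;
  // After repeated busts, stop compressing entirely.
  if (consecutiveBusts >= MAX_CONSECUTIVE_BUSTS) return false;

  // Compressing invalidates the cache, so the smaller prompt is re-written.
  const bustCost = compressedTokens * pricing.write;
  // Continuing re-reads the full, uncompressed prompt from cache.
  const continueCost = currentTokens * pricing.read;

  return bustCost < COMPRESS_THRESHOLD * continueCost;
}

// Prices quoted above: $6.25/MTok write, $0.50/MTok read.
const opusPricing: CachePricing = { write: 6.25 / 1_000_000, read: 0.5 / 1_000_000 };

const decisions = [
  shouldCompressSketch(opusPricing, 250_000, 150_000, 0), // $0.9375 vs $0.125
  shouldCompressSketch(opusPricing, 500_000, 100_000, 0), // $0.625 vs $0.25
  shouldCompressSketch(opusPricing, 2_000_000, 100_000, 0), // $0.625 vs $1.00
];
console.log(decisions);
```

Under these assumptions, compression only pays off once the current context is more than about 14.7 times the compressed size (12.5 / 0.85), which matches the claim that the decision heavily favors not compressing.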

…ecisions

Remove the adaptive context cap system (EMA bust rate tracking, dynamic cap tightening/relaxation, MIN_CONTEXT_FLOOR) that caused death spirals by ratcheting the cap below the session's token count.

Replace with a tier-based model using quality watermarks (200K/500K/model limit) and per-turn economic bust-vs-continue decisions:

- setCachePricing(write, read): configures per-token cache costs from models.dev
- shouldCompress(current, compressed, busts): compares bust cost vs continue cost with 0.85 threshold; heavily favors NOT compressing since cache writes are 12.5x more expensive than reads on Opus
- getTier(tokens): maps token count to quality tier (0/1/2)
- recordCacheUsage(): tracks consecutive busts for rolling detection
- TransformResult.unsustainable: signals 5+ consecutive busts for user warning

The write-to-read cost ratio (12.5x on Opus) naturally makes shouldCompress reject compression in most cases; only extreme ratios (e.g. 2M->100K) trigger it. After 5 consecutive busts, compression stops entirely and the unsustainable flag is set for warning injection.

…s, inject unsustainable warning

Address critical review findings:

C1: shouldCompress() is now called in the tier gate between layer-0 passthrough and compression stages. When bust cost is not justified, the session stays at layer 0 with full context up to the model limit.

C2: unsustainable flag is consumed in pipeline.ts step 7c, which injects a system-reminder warning into the last user message advising to compact or start a new conversation.

C3: consecutiveBusts is persisted to DB via the repurposed dynamicContextCap column in session_state (no migration needed). Restored on session load.

M2: recordCacheUsage now takes total inputTokens as denominator for bust ratio, not just cacheWrite+cacheRead. This prevents inflated bust ratios when a large fraction of tokens is uncached.

M5: shouldCompress fallback (no pricing data) now returns false (don't compress) instead of true. The conservative default matches the design principle of favoring cache preservation.

L4: resetCalibration now resets cache pricing globals for test isolation.

L5: Fixed misleading 'no artificial cap' comment.

BYK added 2 commits May 15, 2026 17:08

BYK merged commit ed5b203 into main May 15, 2026
7 checks passed

BYK deleted the refactor/cost-aware-tier-system branch May 15, 2026 17:47

BYK mentioned this pull request May 15, 2026

release: 0.20.1 #349

Closed

This was referenced May 15, 2026

publish: BYK/loreai@0.20.1 #350

Closed

publish: BYK/loreai@0.20.2 #353

Closed
