fix: prevent layer-4 stickiness trap and refactor compression stages#300
Merged
Conversation
Fix a critical bug where the sticky layer guard trapped sessions at emergency mode (layer 4) indefinitely after a single emergency trigger, causing cache busts every turn for the rest of the session — the root cause of long-session cost blowups. Changes: - Exclude layer 4 from stickiness guard (layers 1-3 only) since emergency already blows the cache and stickiness there is pure downside - Replace three hardcoded layer blocks (1-3) with a data-driven compression stage table and loop — same behavior, tunable, extensible - Add importance-aware distillation trimming (selectDistillations) that keeps decisions/gotchas/architecture entries longer under pressure, replacing blind slice(-N)
- Fix rawWindowCache reset skipped when effectiveMinLayer > 2 (regression) - Fix selectDistillations recency formula: use (length-1) divisor so oldest entry gets 0.0 and newest gets full 0.7 recency weight - Tighten importanceBonus regexes: remove overly broad 'fix'/'error'/'pattern' matches that fire on most distillations, keep distinctive signals only - Hoist CompressionStage type and COMPRESSION_STAGES to module level - Simplify urgent distillation logic (collapse redundant if/else if) - Clean up .lore.md: remove garbled curator entries, fix contradictory title
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix critical stickiness bug (
gradient.ts:1533): Sessions hitting emergency mode (layer 4) were trapped there indefinitely — stickiness guard pinnedeffectiveMinLayer = 4forever since message count only grows after urgent distillation. This caused cache busts every turn for the rest of the session, the root cause of long-session cost blowups. Fix: exclude layer 4 from stickiness (layers 1-3 only, where dropping back would bust a warm cache).Replace hardcoded layer blocks with data-driven stage table: Three separate
if (effectiveMinLayer <= N)blocks for layers 1-3 collapsed into aCOMPRESSION_STAGEStable + loop. Same behavior, but escalation path is now visible, tunable, and extensible (new stage = one table row).Add importance-aware distillation trimming: When stages limit distillation count,
selectDistillations()scores by recency (70%) + content signals (30%: decisions, gotchas, architecture, meta-distilled) instead of blindslice(-N). Results re-sorted chronologically for Approach C prefix cache safety.Cost analysis
Layer-4 stickiness trap was causing every turn after emergency to pay full cache-write cost (~$1.00/turn at Opus pricing with 160K context). With the fix, emergency is transient (1-2 turns), then falls back to layer 1 where cache is warm. Over a 100-turn Opus session, this saves $50-80+ for sessions that hit emergency.
Testing