fix: consolidation retry storm, idle curation frequency, and session memory leak #473
Merged
…memory leak

- Add 1-hour cooldown for failed consolidation attempts (stops wasting Sonnet calls when the LLM correctly concludes all entries are unique)
- Apply cost-aware curation multiplier in idle path (was using raw afterTurns=3 instead of afterTurns*2=6 for Sonnet, which is 2x too frequent)
- Strengthen consolidation prompt with forced-eviction fallback (LLM must now reduce to target count, not just try)
- Gate curation entry creation at maxEntries limit (prevents ratchet effect where entries grow monotonically)
- Add session eviction after 1 hour idle (frees gradient state, recall store, LTM caches, cost tracking, auth; all persisted to SQLite)
- Export evictSession() from core for clean single-session cleanup
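The cooldown in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from `idle.ts`: the class name, method names, and the injected `now` timestamp are all hypothetical. Only the 1-hour window, the per-project `{attemptedAt, entryCount}` record, and the "clear when the entry count changes" rule come from the PR text.

```typescript
// Hypothetical sketch of a per-project consolidation cooldown.
interface ConsolidationAttempt {
  attemptedAt: number; // epoch ms of the last consolidation that changed nothing
  entryCount: number;  // entry count observed at that attempt
}

const COOLDOWN_MS = 60 * 60 * 1000; // 1 hour, per the PR description

class ConsolidationCooldown {
  private attempts = new Map<string, ConsolidationAttempt>();

  // Returns true when consolidation is allowed to run for this project.
  shouldRun(projectId: string, entryCount: number, now: number): boolean {
    const last = this.attempts.get(projectId);
    if (!last) return true;
    if (last.entryCount !== entryCount) {
      // Curation added or removed entries, so the earlier no-op result is stale.
      this.attempts.delete(projectId);
      return true;
    }
    return now - last.attemptedAt >= COOLDOWN_MS;
  }

  // Call after a consolidation run that produced no changes.
  recordNoChange(projectId: string, entryCount: number, now: number): void {
    this.attempts.set(projectId, { attemptedAt: now, entryCount });
  }
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside the class, keeps the sketch deterministic and easy to test.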
BYK added a commit that referenced this pull request on May 27, 2026
## Summary

Follow-up to #473. Onur's logs revealed **1143 knowledge entries**, far beyond what the single-pass consolidation can handle. The previous consolidation sent all entries in one prompt, but with 1143 entries (~343K tokens of input) this overflows the context window, and the 4096 output token budget can only express ~80-100 delete ops (vs the ~1118 needed).

## Context from Onur's logs

```
entry count 1143 exceeds maxEntries 25 — running consolidation
entry count 1143 exceeds maxEntries 25 — running consolidation
entry count 1143 exceeds maxEntries 25 — running consolidation
...repeating every ~60s...
cost-tracker: worker overhead=$1140.8017 (distillation-only=$2.5246)
```

The retry storm (#473) burned $1,138 in consolidation calls that could never succeed due to the token budget constraint.

## Changes

Adds batched consolidation mode in `curator.ts`:

- When entries ≤ 50: unchanged; sends all entries in a single prompt.
- When entries > 50 (batched mode): takes the **lowest-confidence** entries (tail of the confidence-sorted list from `forProject()`) as candidates for deletion. Each pass targets removing ~25 entries (half the batch).
- The idle scheduler's cooldown (from #473) clears when entry count changes, automatically triggering the next batch on the following idle tick.
- Converges to `maxEntries` over multiple passes: 1143 → 1118 → 1093 → ... → 25

For Onur's case: ~45 passes × 1 Sonnet call each ≈ $2-3 total to clean up 1143 entries, spread across idle periods. The previous behavior was infinite retries that never made progress.
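The candidate selection and the pass-count arithmetic above can be sketched as below. This is an illustration under assumptions, not the code in `curator.ts`: the `Entry` shape, the function names, and the constants are hypothetical. The batch size of 50 is inferred from "~25 entries (half the batch)", and the list is assumed to be sorted by confidence in descending order, so the tail holds the lowest-confidence entries.

```typescript
// Hypothetical entry shape; the real type returned by forProject() is not shown in the PR.
interface Entry {
  id: string;
  confidence: number;
}

const SINGLE_PASS_LIMIT = 50; // at or below this, all entries go into one prompt
const BATCH_SIZE = 50;        // assumption: inferred from "half the batch" being ~25
const REMOVALS_PER_PASS = 25; // target deletions per batched pass

// Picks the entries sent to the LLM for one consolidation pass.
// Assumes `sorted` is ordered by confidence, highest first.
function selectCandidates(sorted: Entry[]): Entry[] {
  if (sorted.length <= SINGLE_PASS_LIMIT) return sorted;
  return sorted.slice(-BATCH_SIZE); // tail = lowest-confidence entries
}

// Estimates how many passes are needed to converge to maxEntries.
function estimatePasses(entryCount: number, maxEntries: number): number {
  if (entryCount <= maxEntries) return 0;
  return Math.ceil((entryCount - maxEntries) / REMOVALS_PER_PASS);
}
```

With the numbers from the logs, `estimatePasses(1143, 25)` evaluates to `Math.ceil(1118 / 25)`, which is 45 and matches the "~45 passes" estimate.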
Summary
Fixes three user-reported issues: excessive Sonnet API usage, 5GB RAM consumption, and knowledge consolidation stuck in an infinite retry loop. All three are interconnected — the consolidation loop was a major driver of the Sonnet overhead.
Changes
Consolidation retry storm (Issue 3 + Issue 1)
- `idle.ts`: tracks per-project `{attemptedAt, entryCount}`. When consolidation runs but produces no changes (the LLM correctly concludes all entries are unique), the project enters a 1-hour cooldown. The cooldown clears when the entry count changes (curation adds or removes entries). Previously, consolidation retried every 30-60s indefinitely: 15-30 wasted Sonnet calls per 30-minute idle period.
- `prompt.ts`: added a "FORCED EVICTION" step. When merging/trimming isn't enough, the LLM MUST delete least-valuable entries to reach the target. The user prompt now states "must remove at least N entries."
- `curator.ts`: when entry count is at or above `maxEntries`, curation runs with `skipCreate: true`, preventing the ratchet effect where entries grow monotonically.

Excessive Sonnet API usage (Issue 1)

- `idle.ts`: the idle path was using raw `afterTurns=3` while the inline path uses `afterTurns * curationMultiplier` (=6 for Sonnet, =9 for Opus). Idle curation was firing 2x more often than intended for Sonnet-class models.

Session memory leak (Issue 2)

- `idle.ts`, `pipeline.ts`, `gradient.ts`, `index.ts`: sessions idle > 1 hour are evicted from all in-memory Maps. Final cost/gradient state is persisted to SQLite before cleanup. Cleans up: gradient state, curation tracker, cost tracking, auth, billing prefix, warmup auth, and pipeline satellite Maps (`headerSessionIndex`, `ltmSessionCache`, `ltmPinnedText`, `stableLtmCache`, `cwdWarned`). New `evictSession()` exported from core for clean single-session gradient cleanup.
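The persist-then-evict pattern described for idle sessions could look roughly like the sketch below. Everything here is a simplified stand-in. `SessionRegistry`, `SessionState`, `touch`, `evictIdle`, and the `persist` callback are hypothetical names, a single `satelliteCache` Map stands in for the several pipeline Maps listed above, and the SQLite write is replaced by a caller-supplied function. Only the 1-hour idle threshold and the `evictSession` name come from the PR.

```typescript
const SESSION_IDLE_MS = 60 * 60 * 1000; // sessions idle longer than this are evicted

// Hypothetical per-session state; the real state covers gradient, cost, auth, and more.
interface SessionState {
  lastActiveAt: number;
  cost: number;
}

type PersistFn = (sessionId: string, state: SessionState) => void;

class SessionRegistry {
  private sessions = new Map<string, SessionState>();
  private satelliteCache = new Map<string, string>(); // stands in for ltmSessionCache etc.
  private persist: PersistFn;

  constructor(persist: PersistFn) {
    this.persist = persist; // e.g. a function that writes to SQLite (not shown)
  }

  // Records activity for a session, creating it if needed.
  touch(sessionId: string, now: number, cost = 0): void {
    const prev = this.sessions.get(sessionId);
    this.sessions.set(sessionId, { lastActiveAt: now, cost: (prev?.cost ?? 0) + cost });
    this.satelliteCache.set(sessionId, "cached");
  }

  // Persists final state, then drops the session from every in-memory Map.
  evictSession(sessionId: string): boolean {
    const state = this.sessions.get(sessionId);
    if (!state) return false;
    this.persist(sessionId, state);
    this.sessions.delete(sessionId);
    this.satelliteCache.delete(sessionId);
    return true;
  }

  // Evicts every session idle for more than SESSION_IDLE_MS; returns the evicted ids.
  evictIdle(now: number): string[] {
    const idle = [...this.sessions.entries()]
      .filter(([, s]) => now - s.lastActiveAt > SESSION_IDLE_MS)
      .map(([id]) => id);
    idle.forEach((id) => this.evictSession(id));
    return idle;
  }

  size(): number {
    return this.sessions.size;
  }
}
```

Persisting before deleting means a session that resumes after eviction can be rebuilt from storage rather than starting from empty state, which is presumably why the PR writes final cost/gradient state to SQLite first.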