fix: reduce cache-bust cost with idle threshold, sticky layers, bust tracking, and meta-distill gating #132

Merged
BYK merged 1 commit into main from cache-cost-optimization
May 6, 2026

@BYK BYK commented May 6, 2026

Summary

Four targeted fixes to reduce prompt cache-bust frequency and cost, building on top of the distillation consumption freeze (PR #123):

  • idleResumeMinutes 60→5: Match Anthropic's default-tier 5m cache TTL. After 5m idle, the cache is cold — preserving byte-identity just pays write cost for no benefit. Extended-tier users can set 60 in .lore.json.
  • Generalized sticky-layer guard: Pin at sessState.lastLayer instead of hardcoded 1. Prevents 2→1→2 and 3→2→3 oscillation that causes repeated structural cache busts.
  • Bust-rate tracking: bustCount/transformCount in SessionState with log.warn() when rate >50% after 20+ transforms. Runtime visibility into cache stability.
  • Meta-distillation gating on cache warmth: skipMeta option on distillation.run(). When the prompt cache is likely warm (timeSinceLastTurn < cacheTTL), defer meta-distillation to avoid row ID rewrites that bust the prefix cache. Forced calls (compaction, overflow) always allow it.
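
The sticky-layer guard and bust-rate tracking described above can be sketched roughly as follows. This is a minimal illustration, not the code in gradient.ts: only `lastLayer`, `bustCount`, `transformCount`, and the 50% / 20-transform thresholds come from this PR. The helper names `pinLayer` and `recordTransform`, the injected `warn` callback (standing in for `log.warn()`), and the reading of "pin" as "never drop below the previous layer" are assumptions. The sketch also does not model when the pin is released.

```typescript
// Hypothetical shape: only these three field names appear in the PR.
interface SessionState {
  lastLayer: number;
  bustCount: number;
  transformCount: number;
}

const BUST_RATE_WARN = 0.5; // warn when more than 50% of transforms bust the cache
const MIN_TRANSFORMS = 20;  // but only once 20 or more transforms have been seen

// Sticky guard (assumed semantics): never drop below the layer used last turn,
// so 2→1→2 becomes 2→2→2. Escalating to a higher layer is still allowed.
function pinLayer(state: SessionState, computedLayer: number): number {
  return Math.max(computedLayer, state.lastLayer);
}

// Record one transform and warn when the bust rate looks unhealthy.
function recordTransform(
  state: SessionState,
  layer: number,
  busted: boolean,
  warn: (msg: string) => void = console.warn,
): void {
  state.transformCount += 1;
  if (busted) state.bustCount += 1;
  state.lastLayer = layer;

  const rate = state.bustCount / state.transformCount;
  if (state.transformCount >= MIN_TRANSFORMS && rate > BUST_RATE_WARN) {
    warn(
      `cache bust rate ${(rate * 100).toFixed(0)}% ` +
        `(${state.bustCount}/${state.transformCount} transforms)`,
    );
  }
}
```

With `lastLayer` at 2, a turn that would compute layer 1 stays at layer 2, so the message structure (and therefore the cached prefix) does not change between turns.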

Cost impact

| Scenario | Before | After |
| --- | --- | --- |
| 10-min pause on default tier | Cache write at 1.25× (stale preservation) | Cache refresh with better content (same write cost) |
| Layer 2→1→2 oscillation | 2 busts per cycle (~$3.76 Sonnet) | 0 busts (pinned at layer 2) |
| Meta-distill during idle, user returns <5m | Prefix re-render → bust (~$1.88) | Deferred, bust-free |
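
To make the first row concrete, here is a rough sketch of the idle-threshold decision. Only `idleResumeMinutes` and its default of 5 come from this PR. The `CacheConfig` shape, the `onResume` helper, and the two action labels are hypothetical and are not how config.ts or gradient.ts necessarily express it.

```typescript
// Hypothetical config shape: only idleResumeMinutes is named in the PR.
interface CacheConfig {
  idleResumeMinutes: number;
}

// New default from this PR (previously 60).
const DEFAULT_CONFIG: CacheConfig = { idleResumeMinutes: 5 };

// Illustrative labels for the two behaviours described in the table.
type ResumeAction = "preserve-prefix" | "refresh-prefix";

// Decide what to do when the user sends a message after a pause.
// Below the threshold the cache is assumed warm, so keep the prefix
// byte-identical. At or above it the cache is assumed cold, so the write
// cost is paid either way and the prefix may as well be refreshed.
function onResume(
  lastTurnAt: number,
  now: number,
  config: CacheConfig = DEFAULT_CONFIG,
): ResumeAction {
  const idleMs = now - lastTurnAt;
  const thresholdMs = config.idleResumeMinutes * 60_000;
  return idleMs >= thresholdMs ? "refresh-prefix" : "preserve-prefix";
}
```

With the default of 5, a 10-minute pause takes the refresh path. An extended-tier user who sets `idleResumeMinutes` to 60 in .lore.json would keep the prefix preserved for the same pause.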

Files changed

  • packages/core/src/config.ts — idle threshold default
  • packages/core/src/gradient.ts — sticky guard, bust tracking, getLastTurnAt()
  • packages/core/src/distillation.ts — skipMeta option
  • packages/core/src/index.ts — barrel export
  • packages/gateway/src/idle.ts — cache-warm gating
  • packages/opencode/src/index.ts — cache-warm gating
  • packages/opencode/test/index.test.ts — adjust test for new 5m default

744 tests pass, 0 failures.

…tracking, and meta-distill gating

- Change idleResumeMinutes default from 60 to 5 to match Anthropic's
  default-tier cache TTL (5m). Prevents paying cache-write cost to
  preserve byte-identity of an already-cold cache after short pauses.

- Generalize sticky-layer guard to pin at sessState.lastLayer instead
  of hardcoded 1. Prevents 2→1→2 and 3→2→3 layer oscillation that
  causes repeated cache busts from structural message changes.

- Add bustCount/transformCount to SessionState with log.warn() when
  bust rate exceeds 50% after 20+ transforms, providing runtime
  visibility into cache-stability effectiveness.

- Add skipMeta option to distillation.run() and gate meta-distillation
  on cache warmth (timeSinceLastTurn < cacheTTL). Prevents prefix cache
  invalidation from row ID rewrites when the user returns within the
  cache TTL window. Forced calls (compaction, overflow) always allow
  meta-distillation.
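
The cache-warmth gate from the last item might look something like the sketch below. `skipMeta` and the `timeSinceLastTurn < cacheTTL` condition come from the commit message. The `RunOptions` shape, the `force` flag, and the helpers `isCacheLikelyWarm` and `buildRunOptions` are hypothetical, and `lastTurnAt` is assumed to be a millisecond timestamp of the kind `getLastTurnAt()` would return.

```typescript
// Hypothetical options shape: only skipMeta is named in the PR.
interface RunOptions {
  force?: boolean;
  skipMeta?: boolean;
}

// The cache is "likely warm" if the last turn happened within the TTL window.
// With no recorded turn there is nothing cached to protect.
function isCacheLikelyWarm(
  lastTurnAt: number | undefined,
  now: number,
  cacheTtlMs: number,
): boolean {
  if (lastTurnAt === undefined) return false;
  return now - lastTurnAt < cacheTtlMs;
}

// Build the options an idle handler could pass to distillation.run().
// Forced calls (compaction, overflow) always allow meta-distillation.
function buildRunOptions(
  lastTurnAt: number | undefined,
  now: number,
  cacheTtlMs: number,
  force: boolean,
): RunOptions {
  if (force) return { force: true, skipMeta: false };
  return { skipMeta: isCacheLikelyWarm(lastTurnAt, now, cacheTtlMs) };
}
```

Deferring rather than dropping meta-distillation means the work still happens once the cache has gone cold, at which point rewriting row IDs no longer costs an extra bust.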
@BYK BYK enabled auto-merge (squash) May 6, 2026 17:18
@BYK BYK merged commit db4f830 into main May 6, 2026
1 check passed
@BYK BYK deleted the cache-cost-optimization branch May 6, 2026 17:18