fix(cache-warmer): tighten warming heuristics to reduce net negative spend #429

Merged: BYK merged 1 commit into main from fix/cache-warming-cost-optimization on May 20, 2026

@BYK BYK commented May 20, 2026

Problem

Cache warming was net negative: $11.18 spent on warmup pings but only $5.52 saved from cache hits (net loss: $5.66). The system was too eager — warming sessions that don't return, warming too many cycles during tool calls, and accepting trivially low return probabilities.

Root Causes (from live dashboard data)

| Issue | Example | Impact |
| --- | --- | --- |
| Initial threshold too low (4-9%) | Sessions with 2.1% P(returns) got warmed | Nearly every session gets first warmup |
| Tool-call warming uncapped | 31 warmups on one session (52% hit rate) | Excessive spend during tool chains |
| Small sessions warmed too eagerly | 5-turn session got 6 warmups, 0 hits | Poor ROI on short sessions |
| ROI check kicks in too late | 2/11 hit rate, stopped at warmup #11 | Wasted $0.50+ before cutoff |

Fixes

1. Initial commitment threshold floor (HIGHEST IMPACT)

P(returns) must now exceed 30% (was 4-9% break-even ratio). Prevents cascading continuation chains on marginal sessions.
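
A minimal sketch of how such a floor could sit on top of the break-even check. The names `INITIAL_COMMITMENT_FLOOR` and `passesInitialCommitment` are hypothetical and do not come from cache-warmer.ts; the sketch also assumes the old check was a plain comparison against the break-even ratio.

```typescript
// Hypothetical name: the PR does not state what the 30% floor constant is called.
const INITIAL_COMMITMENT_FLOOR = 0.30;

/**
 * Decide whether a session qualifies for its first warmup.
 * pReturns:       estimated probability that the session returns (0..1)
 * breakEvenRatio: cost-based break-even threshold (4-9% in the dashboard data)
 */
function passesInitialCommitment(pReturns: number, breakEvenRatio: number): boolean {
  // The floor only ever raises the bar; a break-even ratio above 30% still applies.
  const threshold = Math.max(breakEvenRatio, INITIAL_COMMITMENT_FLOOR);
  return pReturns > threshold;
}
```

With a break-even ratio of 0.04, a session at P(returns) = 0.12 passed the old check but is rejected by the floor, while a session at 0.35 still qualifies.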

2. Tool-call warming cap

  • New TOOL_CALL_MAX_CYCLES = 2 (was unlimited up to break-even)
  • MAX_TOOL_CALL_WARMING_MS: 30min -> 10min
  • Most tool calls complete in <5 min; 2 cycles cover operations of up to 10 min
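
The two limits can be read as a pair of independent caps. The sketch below uses the constant names from this PR, but the function `canWarmDuringToolCall` and its parameters are hypothetical, and whether the real code checks cycles, elapsed time, or both in this order is an assumption.

```typescript
// Constant names and values are from the PR description.
const TOOL_CALL_MAX_CYCLES = 2;
const MAX_TOOL_CALL_WARMING_MS = 10 * 60 * 1000; // 10 minutes

/**
 * Hypothetical helper: may another warmup be sent while a tool call is pending?
 * cyclesSoFar:       warmups already sent during this tool call
 * toolCallElapsedMs: time since the tool call started
 */
function canWarmDuringToolCall(cyclesSoFar: number, toolCallElapsedMs: number): boolean {
  if (cyclesSoFar >= TOOL_CALL_MAX_CYCLES) return false; // cycle cap reached
  return toolCallElapsedMs < MAX_TOOL_CALL_WARMING_MS;   // time cap
}
```

Under this reading, the session from the dashboard that received 31 warmups during tool chains would have been cut off after the second warmup of each tool call.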

3. Higher minimum turn count

MIN_TURNS_FOR_WARMING: 3 -> 5 (requires 10 messages). Ensures sufficient survival model data.
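
A small sketch of the turn gate, assuming one turn is a user message plus an assistant reply (inferred from "5 turns requires 10 messages"). The helper `hasEnoughTurns` is hypothetical.

```typescript
// Constant name and value are from the PR description.
const MIN_TURNS_FOR_WARMING = 5;

/**
 * Hypothetical helper. Assumes one turn = one user message + one assistant reply,
 * so an unanswered trailing message does not count as a turn.
 */
function hasEnoughTurns(messageCount: number): boolean {
  const turns = Math.floor(messageCount / 2);
  return turns >= MIN_TURNS_FOR_WARMING;
}
```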

4. Minimum context size gate

New MIN_INPUT_TOKENS_FOR_WARMING = 50,000. Below this, absolute savings per hit are too small to justify risk. Force-keep path is exempt.
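
The gate and its exemption might look roughly like this. The constant name is from the PR; the function `passesContextSizeGate` and the `forceKeep` flag are hypothetical stand-ins for however the force-keep path is represented in idle.ts.

```typescript
// Constant name and value are from the PR description.
const MIN_INPUT_TOKENS_FOR_WARMING = 50_000;

/**
 * Hypothetical helper for the context-size early exit.
 * inputTokens: size of the cached prompt for the session
 * forceKeep:   assumed flag marking sessions on the force-keep path
 */
function passesContextSizeGate(inputTokens: number, forceKeep: boolean): boolean {
  if (forceKeep) return true; // force-keep sessions bypass the gate
  return inputTokens >= MIN_INPUT_TOKENS_FOR_WARMING;
}
```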

5. Tighter session-level ROI check

  • MIN_WARMUPS_FOR_ROI_CHECK: 10 -> 5 (kicks in sooner)
  • MIN_SESSION_HIT_RATE: 20% -> 25% (stricter bar)
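
A sketch of the session-level check with the new values. The constant names are from the PR; `passesSessionRoi` is a hypothetical helper, and treating a hit rate exactly equal to the minimum as passing is an assumption.

```typescript
// Constant names and values are from the PR description.
const MIN_WARMUPS_FOR_ROI_CHECK = 5;
const MIN_SESSION_HIT_RATE = 0.25;

/**
 * Hypothetical helper: should warming continue for this session?
 * warmups: warmup pings sent so far in the session
 * hits:    how many of those were followed by a cache hit
 */
function passesSessionRoi(warmups: number, hits: number): boolean {
  // Too few samples to judge; let warming continue.
  if (warmups < MIN_WARMUPS_FOR_ROI_CHECK) return true;
  return hits / warmups >= MIN_SESSION_HIT_RATE;
}
```

A session with 1 hit in 5 warmups (20%) is now stopped at warmup 5, whereas under the old values the 2/11 session in the table above was only stopped at warmup 11.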

Files Changed

  • packages/gateway/src/cache-warmer.ts — 5 constant changes, 3 new gates in shouldWarm(), dashboard reason strings
  • packages/gateway/src/idle.ts — context size early-exit + import
  • packages/gateway/test/cache-warmer.test.ts — 3 existing tests updated, 10 new tests
  • packages/gateway/test/helpers/idle-worker.ts — mock updated for new exports

Expected Impact

An estimated 37-49 fewer wasted warmups per cycle would save $3.70-4.90 against the $5.66 net loss. Should flip warming to net positive as only profitable sessions get warmed.
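
The arithmetic behind these figures can be checked with a short sketch. The cost of about $0.10 per warmup is not stated in the PR; it is inferred from 37-49 warmups mapping to $3.70-4.90. The sketch also assumes that every avoided warmup would have produced no cache hit. All names are hypothetical.

```typescript
// Amounts are integer cents to avoid floating-point drift.
const SPENT_CENTS = 1118; // $11.18 spent on warmup pings
const SAVED_CENTS = 552;  // $5.52 saved from cache hits
const netCents = SAVED_CENTS - SPENT_CENTS; // -566, i.e. a $5.66 net loss

// Assumption: roughly 10 cents per warmup, implied by the PR's estimate.
const ASSUMED_COST_PER_WARMUP_CENTS = 10;

/** Net result in cents if `avoidedWarmups` hit-less warmups are never sent. */
function projectedNetCents(avoidedWarmups: number): number {
  return netCents + avoidedWarmups * ASSUMED_COST_PER_WARMUP_CENTS;
}
```

Under these assumptions, avoiding 37 to 49 warmups brings the net from -$5.66 to between -$1.96 and -$0.76, so the avoided spend recovers most of the loss and the rest would have to come from effects this simple model leaves out.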

What we're NOT changing

  • pSessionFinished() base rate — the 30% threshold floor addresses the same problem more directly
  • Warming overall — historical data shows $369 in warming savings from 775 hits across 964 sessions; the system is profitable at scale with better heuristics

…spend

Cache warming was net negative ($11.18 spent, $5.52 saved). Five
targeted fixes to the decision heuristics:

1. Raise initial commitment threshold floor to 30% P(returns), up from
   the break-even ratio of 4-9% which was trivially exceeded by nearly
   every session.

2. Cap tool-call warming at 2 cycles (was unlimited up to 30 min). Most
   tool calls complete in <5 minutes; also reduce MAX_TOOL_CALL_WARMING_MS
   from 30 to 10 minutes.

3. Raise MIN_TURNS_FOR_WARMING from 3 to 5. Sessions with <5 turns have
   insufficient survival model data and are more likely one-shots.

4. Add MIN_INPUT_TOKENS_FOR_WARMING (50K). Below this threshold, absolute
   savings per hit are too small to justify the risk of wasted warmups.

5. Tighten session-level ROI check: kick in at 5 warmups (was 10) and
   require 25% hit rate (was 20%).
@BYK BYK force-pushed the fix/cache-warming-cost-optimization branch from 902406b to f061d38 Compare May 20, 2026 17:52
@BYK BYK merged commit 1f74311 into main May 20, 2026
10 checks passed
@BYK BYK deleted the fix/cache-warming-cost-optimization branch May 20, 2026 18:07
This was referenced May 21, 2026