
fix: correct cost intelligence accuracy and add overhead breakdown #272

Merged
BYK merged 1 commit into main from fix/cost-intelligence-accuracy on May 12, 2026

BYK (Owner) commented on May 12, 2026

Summary

  • Fix double-counting of warmup + TTL savings: the same cacheReadTokens were counted in both warmupSavings and ttlSavings when a user returned more than 5 minutes after a warmup ping. Warmup pings exist to cover exactly these longer gaps, so this was the common case, and it inflated reported savings by roughly 2x. The two are now mutually exclusive: warmup hits take priority, and TTL savings only apply to non-warmup turns.

  • Fix shadow context undercount for avoided compactions: updateShadowContext() was snapshotting the compressed API input token count. That count underestimates the "without Lore" counterfactual because Lore's gradient manager has already trimmed the context. The snapshot is replaced with additive growth tracking, which uses output tokens (never compressed) as the reliable per-turn growth signal.

  • Add Lore overhead breakdown to the main costs page: the aggregated /ui/costs view now shows a per-worker-bucket breakdown (distill, curate, compact, warmup, recall) on the Lore overhead line. This matches the format of the per-session Cost Intelligence card.
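The warmup/TTL mutual exclusion from the first bullet can be sketched as follows. This is a minimal illustration, not the code in pipeline.ts. Only `cacheReadTokens`, `warmupSavings` and `ttlSavings` come from the PR.

The following are all hypothetical:
- `TurnUsage`, `gapMs`, `afterWarmup` and `attributeCacheSavings`;
- measuring savings in tokens rather than cost;
- the rule that TTL savings apply only to cache reads arriving after the default 5-minute window.

```typescript
// Hypothetical per-turn input; only cacheReadTokens is a name taken from the PR.
interface TurnUsage {
  cacheReadTokens: number;
  gapMs: number; // hypothetical: time since the previous user turn
  afterWarmup: boolean; // hypothetical: a warmup ping kept the cache alive for this turn
}

// warmupSavings and ttlSavings are named in the PR; counting them in tokens is an assumption.
interface SavingsTotals {
  warmupSavings: number;
  ttlSavings: number;
}

// The 5-minute threshold mentioned in the PR description.
const DEFAULT_TTL_MS = 5 * 60 * 1000;

// Credits each turn's cache reads to at most one savings bucket.
function attributeCacheSavings(turn: TurnUsage, totals: SavingsTotals): void {
  if (turn.cacheReadTokens <= 0) return;
  if (turn.afterWarmup) {
    // Warmup hits take priority: these reads are credited to warmup only.
    totals.warmupSavings += turn.cacheReadTokens;
  } else if (turn.gapMs > DEFAULT_TTL_MS) {
    // Assumed rule: only non-warmup turns past the default TTL count as TTL savings.
    totals.ttlSavings += turn.cacheReadTokens;
  }
}
```

With this shape, a user returning 10 minutes after a warmup ping adds their cache reads to `warmupSavings` once. The same reads are no longer also added to `ttlSavings`, which was the source of the roughly 2x inflation.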

Files Changed

| File | Change |
| --- | --- |
| `packages/gateway/src/pipeline.ts` | Warmup/TTL mutual exclusion; pass `outputTokens` to `updateShadowContext()` |
| `packages/gateway/src/cost-tracker.ts` | Additive shadow context tracking with `_lastActualInput`/`_lastOutputTokens` fields |
| `packages/gateway/src/ui.ts` | Accumulate and render the per-worker-bucket overhead breakdown |
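The additive shadow-context tracking in cost-tracker.ts might look roughly like the sketch below. `updateShadowContext`, `outputTokens`, `_lastActualInput` and `_lastOutputTokens` are names from the PR. The `ShadowContextTracker` class and the growth rule are hypothetical.

The assumed growth rule is that each turn adds the larger of two values:
- the observed change in input tokens;
- the previous turn's output tokens, which were appended to history uncompressed.

Under this rule, the shadow count keeps growing even when Lore's trimming makes the actual input shrink.

```typescript
// Hypothetical tracker; field names mirror the PR, the growth rule is an assumption.
class ShadowContextTracker {
  private shadowTokens = 0;
  private initialized = false;
  private _lastActualInput = 0;
  private _lastOutputTokens = 0;

  // Call once per turn with the (possibly compressed) input count and that turn's output tokens.
  updateShadowContext(actualInputTokens: number, outputTokens: number): void {
    if (!this.initialized) {
      // First turn: nothing has been trimmed yet, so actual input is the baseline.
      this.shadowTokens = actualInputTokens;
      this.initialized = true;
    } else {
      const observedGrowth = actualInputTokens - this._lastActualInput;
      // The previous output was appended uncompressed, so it is a floor on real growth.
      this.shadowTokens += Math.max(observedGrowth, this._lastOutputTokens);
    }
    this._lastActualInput = actualInputTokens;
    this._lastOutputTokens = outputTokens;
  }

  // Estimated context size had Lore not trimmed anything.
  get tokens(): number {
    return this.shadowTokens;
  }
}
```

For example, suppose three turns arrive with inputs of 1000, 1500 and 900 (the last one trimmed by Lore) and outputs of 200, 300 and 100. The shadow count then grows to 1000, 1500 and 1800. A snapshot of the compressed input would have reported 900 at the end.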

- Fix double-counting of warmup + TTL savings: the same cacheReadTokens
  were counted in both buckets when a user returned after >5min following
  a warmup. Now mutually exclusive — warmup hits take priority.

- Fix shadow context undercount for avoided compactions: replaced the
  snapshot approach (which tracked compressed token counts) with additive
  growth tracking using output tokens (always uncompressed) as the
  reliable per-turn growth signal.

- Add per-worker-bucket breakdown (distill, curate, compact, warmup,
  recall) to the Lore overhead line on the main /ui/costs page,
  matching the per-session Cost Intelligence card format.
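A rough sketch of accumulating and rendering the per-worker-bucket overhead for the aggregated view is shown below. The five bucket names come from the PR.

The following are hypothetical illustrations, not the actual ui.ts code:
- `SessionOverhead` and its `byBucket` map;
- dollar amounts as the unit;
- the helpers `aggregateOverhead` and `renderOverheadLine`;
- the exact output string.

```typescript
// Bucket names come from the PR; everything else here is a hypothetical illustration.
type WorkerBucket = "distill" | "curate" | "compact" | "warmup" | "recall";
const BUCKETS: readonly WorkerBucket[] = ["distill", "curate", "compact", "warmup", "recall"];

// Hypothetical per-session overhead record; amounts assumed to be USD.
interface SessionOverhead {
  byBucket: Partial<Record<WorkerBucket, number>>;
}

// Sums each bucket across all sessions; missing buckets count as zero.
function aggregateOverhead(sessions: SessionOverhead[]): Record<WorkerBucket, number> {
  const totals: Record<WorkerBucket, number> = { distill: 0, curate: 0, compact: 0, warmup: 0, recall: 0 };
  for (const session of sessions) {
    for (const bucket of BUCKETS) {
      totals[bucket] += session.byBucket[bucket] ?? 0;
    }
  }
  return totals;
}

// Renders the total plus a breakdown of the non-zero buckets, in a fixed bucket order.
function renderOverheadLine(totals: Record<WorkerBucket, number>): string {
  const total = BUCKETS.reduce((sum, bucket) => sum + totals[bucket], 0);
  const parts = BUCKETS.filter((bucket) => totals[bucket] > 0).map(
    (bucket) => `${bucket} $${totals[bucket].toFixed(2)}`,
  );
  const breakdown = parts.length > 0 ? ` (${parts.join(", ")})` : "";
  return `Lore overhead: $${total.toFixed(2)}${breakdown}`;
}
```

Keeping a fixed bucket order and hiding zero buckets keeps the aggregated line short. It also lets the line be read the same way as a per-session card.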
BYK force-pushed the fix/cost-intelligence-accuracy branch from 1ded983 to 0268437 on May 12, 2026 21:37
BYK merged commit 8218035 into main on May 12, 2026
7 checks passed
BYK deleted the fix/cost-intelligence-accuracy branch on May 12, 2026 21:40
This was referenced May 13, 2026