fix: use compaction as primary baseline, fix threshold, update marketing#437

Merged
BYK merged 1 commit into main from fix-compaction-baseline on May 21, 2026

@BYK BYK commented May 21, 2026

Summary

Switches the eval comparison from tail-window to compaction (what real tools actually do), fixes the compaction threshold, and updates marketing copy with honest numbers from a fresh head-to-head run.

Eval changes

Remove tail-window from default baselines (run.ts)

Nobody uses raw tail-window in production — Claude Code, Codex, OpenCode all use compaction. Compaction is the honest baseline.
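As a rough sketch of what "removing tail-window from the defaults" could look like, the snippet below keeps tail-window available on request but leaves only compaction in the default list. The names `Baseline`, `DEFAULT_BASELINES`, and `resolveBaselines` are hypothetical and are not taken from run.ts; the opt-in behaviour is likewise an assumption.

```typescript
// Hypothetical sketch; identifiers are illustrative, not the actual run.ts API.
type Baseline = "compaction" | "tail-window";

// Assumption: tail-window stays selectable but is no longer run by default.
const DEFAULT_BASELINES: readonly Baseline[] = ["compaction"];

// Use the explicitly requested baselines if any were given, else the defaults.
function resolveBaselines(requested?: readonly Baseline[]): readonly Baseline[] {
  return requested && requested.length > 0 ? requested : DEFAULT_BASELINES;
}
```

Under this assumption, a caller that still wants the old comparison would pass `["compaction", "tail-window"]` explicitly.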

Fix compaction threshold (baselines.ts)

The first compaction was triggering at 80K total tokens. Real tools don't compact until ~140K (83.5% of the 200K context window minus output reserve). Fixed: compaction now only triggers when total > compactionThreshold (~140K), not at the tail budget (80K).
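The threshold arithmetic can be sketched as follows. This is a minimal illustration, not the code in baselines.ts: the 200K window, the 83.5% ratio, and the 80K tail budget come from the description above, while the 32K output reserve, the order of operations (subtract the reserve, then apply the ratio), and every identifier other than `compactionThreshold` are assumptions chosen so the result lands near the stated ~140K.

```typescript
// Hypothetical sketch of the trigger condition; not the actual baselines.ts code.
const CONTEXT_WINDOW = 200_000;  // stated in the PR
const COMPACTION_RATIO = 0.835;  // stated in the PR
const TAIL_BUDGET = 80_000;      // stated in the PR (the old, incorrect trigger point)
const OUTPUT_RESERVE = 32_000;   // ASSUMPTION: not given in the PR

// ASSUMPTION: the ratio applies to the window after the reserve is subtracted.
// Math.round guards against floating-point error in the multiplication.
const compactionThreshold = Math.round(
  (CONTEXT_WINDOW - OUTPUT_RESERVE) * COMPACTION_RATIO,
);

// Compaction fires only once the total exceeds the threshold,
// not when it merely exceeds the tail budget.
function shouldCompact(totalTokens: number): boolean {
  return totalTokens > compactionThreshold;
}
```

With these assumed values, a 100K-token session sits above the 80K tail budget but below the threshold, so it is left uncompacted, which is the behaviour the fix describes.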

Fresh eval results (CM-1, 400K inflation, head-to-head)

| Difficulty | Lore | Compaction | Delta |
| --- | --- | --- | --- |
| Easy | 4.7/5 | 4.8/5 | -2% |
| Medium | 4.8/5 | 4.0/5 | +19% |
| Hard | 4.9/5 | 4.7/5 | +5% |
| Average | 4.8/5 | 4.5/5 | +7% |
| Perfect scores | 12/15 | 9/15 | |

Lore's advantage is on mid-session details (decision alternatives, exact error messages, rejected approaches) that compaction summarizes away through repeated compression cycles.

Marketing updates

README.md

  • Context retention table: compaction as primary comparison (not tail-window)
  • Preference recall table: noted baselines pending re-run with compaction
  • Summary: 4.8/5 with 12/15 perfect scores

docs/index.html

  • Hero stat: "12/15 Perfect Scores at 400K Tokens"
  • Detail retention: 4.8/5

Version history

  • v6: updated to reference compaction comparison

Tests

  • 1752 pass, 0 fail
  • Typecheck clean

- Remove tail-window from default eval baselines (compaction is what real tools do)
- Fix compaction threshold: don't trigger before ~140K (was 80K)
- Update README: Lore 4.8/5 vs compaction 4.5/5, 12/15 perfect scores
- Update landing page: 12/15 perfect scores, 4.8/5 detail retention
@BYK BYK self-assigned this May 21, 2026
@BYK BYK merged commit d056b65 into main May 21, 2026
9 of 10 checks passed
@BYK BYK deleted the fix-compaction-baseline branch May 21, 2026 07:27