eval: add 2.3M-token mega-session scenario — Lore 4.0 vs Compaction 2.4 (+70%) by BYK · Pull Request #440 · BYK/loreai

BYK · 2026-05-21T08:34:57Z

Summary

Adds a real 2.3M-token eval scenario extracted from a 5-day getsentry/cli refactoring session. At this scale, Lore scores 4.0/5 overall against 2.4/5 for classical compaction, a +70% advantage.

Results

Metric	Lore	Compaction	Delta
Overall	4.0/5	2.4/5	+70%
Easy (late-session)	4.0	2.4	+67%
Medium (mid-session)	3.9	3.0	+29%
Hard (early-session)	4.1	1.8	+136%
Perfect scores (5.0)	13/20	5/20	2.6x
Passing (≥4.0)	14/20	5/20	2.8x
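As a rough check on how the Delta column can be read, the sketch below recomputes three rows from the figures shown in the table. The helper names are illustrative and not part of the eval harness. Only the rows that reproduce exactly from the one-decimal scores are used; the remaining percentages were presumably computed from unrounded means, which the table does not show.

```typescript
// Illustrative helpers (not from the repo) for reading the Delta column.

/** Relative improvement of `a` over `b`, in percent. */
function percentDelta(a: number, b: number): number {
  if (b === 0) throw new RangeError("baseline must be non-zero");
  return ((a - b) / b) * 100;
}

/** Ratio of two counts, rounded to one decimal place. */
function ratio(a: number, b: number): number {
  if (b === 0) throw new RangeError("baseline must be non-zero");
  return Math.round((a / b) * 10) / 10;
}

// Easy (late-session): 4.0 vs 2.4
const easyDelta = Math.round(percentDelta(4.0, 2.4)); // 67

// Perfect scores: 13/20 vs 5/20
const perfectRatio = ratio(13, 5); // 2.6

// Passing (>= 4.0): 14/20 vs 5/20
const passingRatio = ratio(14, 5); // 2.8

console.log(`easy: +${easyDelta}%, perfect: ${perfectRatio}x, passing: ${passingRatio}x`);
```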

At 2.3M tokens, compaction reduces the entire conversation to an ~11K-token summary (roughly 200x compression), which destroys early-session details. Lore preserves them by combining three layers: distillation (17-21 distillations, ~10K tokens), a 64K-token raw tail, and a temporal archive that is searchable via recall.
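The arithmetic behind these figures can be sketched as follows. The constants are the approximate numbers quoted above; treating the distillations plus the raw tail as Lore's in-context budget (with the archive living outside the context window) is an assumption made for illustration, not something the PR states.

```typescript
// Approximate figures quoted in the PR description.
const SESSION_TOKENS = 2_300_000;
const COMPACTION_SUMMARY_TOKENS = 11_000;
const LORE_DISTILLED_TOKENS = 10_000;
const LORE_RAW_TAIL_TOKENS = 64_000;

/** Ratio of original tokens to tokens kept in context. */
function compressionRatio(original: number, kept: number): number {
  if (kept <= 0) throw new RangeError("kept must be positive");
  return original / kept;
}

const compactionRatio = compressionRatio(SESSION_TOKENS, COMPACTION_SUMMARY_TOKENS);

// Assumption: only distillations + raw tail occupy the context window;
// the temporal archive is queried on demand via recall.
const loreInContext = LORE_DISTILLED_TOKENS + LORE_RAW_TAIL_TOKENS;
const loreRatio = compressionRatio(SESSION_TOKENS, loreInContext);

console.log(`compaction: ~${Math.round(compactionRatio)}x`);
// prints "compaction: ~209x"
console.log(`lore in-context: ${loreInContext} tokens, ~${Math.round(loreRatio)}x`);
// prints "lore in-context: 74000 tokens, ~31x"
```

Under these assumptions, compaction keeps about 1/209 of the session in a single summary, while Lore keeps about 1/31 in context and leaves the rest retrievable.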

Scenario

Source: Real getsentry/cli session (ses_33198e726ffeDyEZ4ZoowIUDJO)
Duration: 5 days (Mar 8-12, 2026), 95 user turns, 3959 assistant turns
Content: Issue triage, 7+ PRs (#370-394), architectural decisions (buildCommand migration), multi-phase plan, code reviews, design debates
20 questions across easy (5), medium (7), hard (8) — targeting issue selection, PR details, architectural decisions, test counts, code cleanup reasoning
Fixture: 1.7MB gzipped JSON
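A minimal sketch of how a gzipped JSON fixture and a 5/7/8 question split might be handled in Node. The `SessionFixture` and `EvalQuestion` shapes, the helper names, and the sample data are all hypothetical; the real fixture schema and scenario module are not shown in this PR description. The example round-trips a tiny in-memory stand-in instead of reading the actual file.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Hypothetical shapes; the real schema is not shown in the PR.
type Difficulty = "easy" | "medium" | "hard";

interface EvalQuestion {
  id: string;
  difficulty: Difficulty;
  prompt: string;
}

interface SessionFixture {
  sessionId: string;
  turns: { role: "user" | "assistant"; text: string }[];
}

/** Decompresses a gzipped JSON payload and parses it. */
function decodeFixture(gz: Uint8Array): SessionFixture {
  return JSON.parse(gunzipSync(gz).toString("utf8")) as SessionFixture;
}

/** Counts questions per difficulty tier. */
function countByDifficulty(questions: EvalQuestion[]): Record<Difficulty, number> {
  const counts: Record<Difficulty, number> = { easy: 0, medium: 0, hard: 0 };
  for (const q of questions) counts[q.difficulty] += 1;
  return counts;
}

// Tiny stand-in for the real fixture, compressed and decoded in memory.
const sample: SessionFixture = {
  sessionId: "ses_example",
  turns: [
    { role: "user", text: "pick an issue" },
    { role: "assistant", text: "looking at open issues" },
  ],
};
const roundTripped = decodeFixture(gzipSync(JSON.stringify(sample)));

// Placeholder questions matching the easy (5), medium (7), hard (8) split.
const split: [Difficulty, number][] = [["easy", 5], ["medium", 7], ["hard", 8]];
const questions: EvalQuestion[] = split.flatMap(([difficulty, n]) =>
  Array.from({ length: n }, (_, i) => ({
    id: `${difficulty}-${i + 1}`,
    difficulty,
    prompt: "(placeholder)",
  })),
);

console.log(roundTripped.turns.length, countByDifficulty(questions));
```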

Code Changes

packages/core/eval/scenarios/mega-session.ts — scenario module with 20 questions
packages/core/eval/scenarios/cli-refactor-session.json.gz — compressed session fixture
packages/core/eval/harness.ts — register mega scenario in context dimension
packages/core/eval/baselines.ts — fix compaction chunking for prefixes over 1M tokens; raise the compaction threshold to ~140K (it was triggering at 80K)
packages/core/eval/run.ts — remove tail-window from default baselines
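The baseline changes listed above can be illustrated with a small sketch. This is not the code in `baselines.ts`: the function names, the `Msg` shape, the `>=` comparison, and the 100K chunk budget are all assumptions chosen for the example. It shows a threshold check at ~140K tokens and a greedy split of a very long prefix into chunks that each fit a token budget.

```typescript
// Hypothetical names and values; only the ~140K threshold comes from the PR.
const COMPACTION_THRESHOLD_TOKENS = 140_000;

interface Msg {
  tokens: number;
  text: string;
}

/** True once the context has grown to the compaction threshold. */
function shouldCompact(
  contextTokens: number,
  threshold: number = COMPACTION_THRESHOLD_TOKENS,
): boolean {
  return contextTokens >= threshold;
}

/**
 * Greedily packs consecutive messages into chunks of at most `maxTokens`.
 * A single message larger than the budget ends up alone in its own chunk.
 */
function chunkByTokens(messages: Msg[], maxTokens: number): Msg[][] {
  if (maxTokens <= 0) throw new RangeError("maxTokens must be positive");
  const chunks: Msg[][] = [];
  let current: Msg[] = [];
  let used = 0;
  for (const m of messages) {
    // Close the current chunk if adding this message would exceed the budget.
    if (current.length > 0 && used + m.tokens > maxTokens) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(m);
    used += m.tokens;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// A 1.2M-token prefix made of 10K-token messages, split with a 100K budget.
const prefix: Msg[] = Array.from({ length: 120 }, (_, i) => ({
  tokens: 10_000,
  text: `turn ${i}`,
}));
const chunks = chunkByTokens(prefix, 100_000);

console.log(chunks.length); // 12
console.log(shouldCompact(80_000), shouldCompact(140_000)); // false true
```

With a threshold of 140K, a context of 80K tokens no longer triggers compaction, which matches the direction of the fix described in the commit message.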

Tests

Typecheck clean across all 4 packages

Extracts a real 5-day coding session (95 user turns, 3959 assistant turns, 2.37M tokens) from the Lore DB and uses it as an eval scenario with 20 questions targeting various depths: early (issue selection, first PR), mid (architectural decisions, design debates), and late (phase execution). No inflation needed — the session is already at mega-scale. Fixture stored as gzipped JSON (1.7MB). Also removes tail-window from default baselines and fixes compaction threshold to ~140K (was triggering at 80K).

…ry (#442)

## Summary

Updates the landing page copy to reflect the mega-session benchmark results and stronger value proposition.

## Changes

### Hero section
- Description: emphasizes "crystal-clear memory across sessions lasting days, hundreds of turns, any LLM provider"
- Chip: "400K+ Token Sessions" → "Sessions Lasting Days"

### Stats strip
Already updated in #440: +70% vs compaction, 13/20 perfect scores, 2.3M+ tokens tested.

### "The Problem" section
- Compaction step: now cites the real 2.3M-token benchmark — "compaction reduces 2.3 million tokens to an 11K summary, scoring 2.4/5. Lore scores 4.0/5."

### "The Solution" section
- Recall step: "13 out of 20 perfect recall scores where compaction managed 5"

### Feature cards
- Cost card: "Infinite sessions, lower cost" → "Sessions as long as you want" — "Work for days, hundreds of turns, millions of tokens — memory stays sharp."
- Compatibility card: "Works with any provider" → "Portable memory, any provider" — "Switch providers, switch tools, switch machines — your memory travels with you."

### Ticker
- Lead with "2.3M tokens, 5 days, crystal-clear recall" + scores
- Added "Any provider, any tool — Portable memory that travels with you"
- Replaced "Compaction destroys details" (negative) with real benchmark numbers (positive proof)

BYK self-assigned this May 21, 2026

BYK force-pushed the eval-mega-session branch from 133dc0a to 4f2b864 Compare May 21, 2026 08:42

BYK merged commit 6d650e5 into main May 21, 2026
9 of 10 checks passed

BYK deleted the eval-mega-session branch May 21, 2026 09:06

BYK mentioned this pull request May 21, 2026

docs: update website copy — sessions lasting days, crystal-clear memory #442

Merged

craft-deployer Bot mentioned this pull request May 21, 2026

publish: BYK/loreai@0.23.0 #448

Closed

6 tasks

BYK merged 1 commit into main from eval-mega-session
