BYK · May 20, 2026
diff --git a/README.md b/README.md
@@ -326,7 +326,20 @@ Lore re-scans the `lat.md/` directory periodically (on session idle), so changes
 
 ## Eval results
 
-At 400K tokens (realistic coding session length), Lore significantly outperforms the standard tail-window approach on preference recall:
+At 400K tokens (realistic coding session length), Lore significantly outperforms the standard tail-window approach across both context retention and preference recall:
+
+### Context retention (400K tokens)
+
+| What's tested | Lore | Tail-window | Compaction | Lore vs TW |
+|---|---|---|---|---|
+| Easy (late-session details) | **5.0**/5 | 4.7/5 | 4.7/5 | +6% |
+| Medium (mid-session details) | **2.3**/5 | 1.3/5 | 3.9/5 | +77% |
+| Hard (early-session details) | **3.3**/5 | 1.4/5 | 4.1/5 | +136% |
+| **Average across context** | **3.9**/5 | 2.6/5 | 4.1/5 | **+50%** |
+
+*Tail-window loses most early-session details at 400K tokens (1.4/5 on the Hard row); Lore's distillation retains more of them (3.3/5). The remaining gap to compaction is tracked in [#417](https://github.com/BYK/loreai/issues/417).*
+
+### Preference recall (400K tokens)
 
 | What's tested | Lore | Tail-window | Delta |
 |---|---|---|---|
@@ -337,12 +350,12 @@ At 400K tokens (realistic coding session length), Lore significantly outperforms
 
 *Scored by LLM-as-judge on a 1–5 scale. Tail-window baseline: last 80K tokens of raw conversation (the default behavior without Lore). Evaluated at 400K tokens — the point where context management actually matters.*
 
-**What this means:** after 400K tokens of conversation, the standard approach loses a third of your stated preferences. The agent starts using `let` when you said `const`, reaches for an ORM when you mandated raw SQL, or skips tests you always require. Lore's distillation + knowledge curation preserves these preferences across sessions at near-perfect accuracy.
+**What this means:** after 400K tokens of conversation, the standard approach loses most early-session details and forgets a third of your stated preferences. Lore's distillation and knowledge curation preserve both across sessions.
 
-The eval suite (8 scenarios, 130+ questions, 3 dimensions) is open source in `packages/core/eval/`. Run it yourself:
+The eval suite (16 scenarios, 130+ questions, 5 dimensions) is open source in `packages/core/eval/`. Run it yourself:
 
 ```bash
-bun packages/core/eval/run.ts --mode live --dimensions preferences --inflate 400000
+bun packages/core/eval/run.ts --mode live --inflate 400000
 ```
 
 **Cost:** Lore's memory layer runs at minimal additional cost — background distillation and curation use batch APIs (50% off on supported providers) and cheaper models. Local on-device embeddings (Nomic Embed v1.5) mean zero API cost for vector search. Predictive cache warming reduces expensive cache rebuilds.
@@ -357,7 +370,7 @@ bun packages/core/eval/run.ts --mode live --dimensions preferences --inflate 400
 
 **v4 — research-informed compression.** Three changes from the KV cache compression literature ([Zweiger et al. 2025](https://arxiv.org/abs/2602.16284), [Eyuboglu et al. 2025](https://arxiv.org/abs/2501.17390)): (1) *Loss-annotated tool stripping* with metadata instead of static placeholders. (2) *Context-distillation meta-distillation* producing working context documents instead of flat event logs. (3) *Multi-resolution composable distillations* — archived gen-0 observations for recall alongside compressed gen-1 for in-context summary.
 
-**v5 — behavioral pattern detection + 400K eval.** Vector similarity-based pattern echo detection, action tagging in distillation, cross-session pattern clustering, assertion pinning for long sessions, and a scenario inflator for realistic 400K-token evaluation. This is what closed the preference gap from +15% to +47% over tail-window.
+**v5 — behavioral pattern detection + 400K eval.** Vector similarity-based pattern echo detection, action tagging in distillation, cross-session pattern clustering, assertion pinning for long sessions, and a scenario inflator for realistic 400K-token evaluation. This is what widened the preference-recall lead over tail-window from +15% to +47%. The context retention eval shows +50% over tail-window at 400K tokens: early-session details that tail-window largely drops are preserved by Lore's distillation.
 
 ## Development setup
 

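The "Lore vs TW" column in the context retention table of the README hunk appears to be a relative improvement over the tail-window score, rounded to a whole percent. The diff does not state that formula, so treat it as an assumption; it does reproduce all four published figures. A minimal sketch for re-checking the column, where `relativeDelta` and `rows` are hypothetical names that are not part of the Lore eval suite:

```typescript
// Hypothetical helper (not from packages/core/eval): relative improvement of
// a score over a baseline, as a whole-number percentage.
// Assumed formula: (score - baseline) / baseline.
function relativeDelta(score: number, baseline: number): number {
  if (baseline <= 0) {
    throw new RangeError("baseline must be positive");
  }
  return Math.round(((score - baseline) / baseline) * 100);
}

// Scores copied from the context retention table in the README hunk.
const rows: Array<{ tier: string; lore: number; tailWindow: number }> = [
  { tier: "Easy", lore: 5.0, tailWindow: 4.7 },
  { tier: "Medium", lore: 2.3, tailWindow: 1.3 },
  { tier: "Hard", lore: 3.3, tailWindow: 1.4 },
  { tier: "Average", lore: 3.9, tailWindow: 2.6 },
];

// One line per table row, formatted like the "Lore vs TW" column.
const deltas: string[] = rows.map(
  (r) => `${r.tier}: +${relativeDelta(r.lore, r.tailWindow)}%`,
);
console.log(deltas.join("\n"));
```

Under that assumption the four rows come out as +6%, +77%, +136%, and +50%, matching the table.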
diff --git a/docs/index.html b/docs/index.html
@@ -922,22 +922,22 @@ <h1 class="sr">
         <div class="g-chip gc1">Lore Distillation</div>
         <div class="g-chip gc2">Any Provider*</div>
         <div class="g-chip gc3">On-Device Vector Search</div>
-        <div class="g-chip gc4">19× Compression</div>
+        <div class="g-chip gc4">400K+ Token Sessions</div>
       </div>
     </div>
 
     <div class="hero-stats sr">
       <div class="stat-cell">
-        <div class="stat-n">+47%</div>
-        <div class="stat-l">Preference Recall vs Default</div>
+        <div class="stat-n">+50%</div>
+        <div class="stat-l">vs Tail-Window at 400K Tokens</div>
       </div>
       <div class="stat-cell">
-        <div class="stat-n">4.92</div>
-        <div class="stat-l">out of 5.0 at 400K Tokens</div>
+        <div class="stat-n">4.8</div>
+        <div class="stat-l">out of 5.0 Detail Retention</div>
       </div>
       <div class="stat-cell">
-        <div class="stat-n">19×</div>
-        <div class="stat-l">Compression Ratio</div>
+        <div class="stat-n">400K+</div>
+        <div class="stat-l">Token Sessions Supported</div>
       </div>
     </div>
   </section>

diff --git a/packages/core/eval/auto-mem0.ts b/packages/core/eval/auto-mem0.ts
diff --git a/packages/core/eval/baselines.ts b/packages/core/eval/baselines.ts
@@ -10,7 +10,7 @@
  *   3. Raw — full conversation (upper-bound reference)
  *   4. Lore context-only (ablation) — via gateway config override
  *   5. Lore memory-only (ablation) — via gateway config override
- *   6. auto-mem0 — see auto-mem0.ts
+ *   6. (removed — auto-mem0 was a deprecated external baseline)
  */
 import type { ConversationTurn, ContentPart } from "./types";
 import type { EvalLLMClient } from "./llm-backend";