Problem
At 400K tokens, Lore scores 3.5 vs compaction's 4.2 on CM-1. The gap is concentrated on medium-difficulty questions (2.3 vs 3.9), where distillation dropped specific details (exact error messages, alternative approaches considered, config file paths).
Root Cause
Lore's two-stage compression pipeline loses more information than compaction's single-pass summary:
- Gen-0 compression: Each 16K segment gets 8*sqrt(16384) = 1024 tokens. This is 16:1 compression, so specific identifiers (error messages, file paths, version numbers, rejected alternatives) are dropped.
- Meta-distillation: 20 gen-0 segments (~20K tokens) are compressed to ~5-6K tokens. This is another 3-4:1 compression, compounding the loss.
- Total: 320K → 20K → 5-6K = ~53-64:1 compression vs compaction's 320K → 4K single-pass (80:1 but higher quality due to seeing raw content).
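The arithmetic above can be checked with a short sketch. It assumes the gen-0 budget is `multiplier * sqrt(segment_tokens)` clamped to a cap, which matches the numbers quoted in this issue but is not copied from Lore's source; `gen0_budget` and `ratio` are hypothetical names used only for illustration.

```python
import math


def gen0_budget(segment_tokens: int, multiplier: float = 8, cap: int = 4096) -> int:
    """Assumed gen-0 budget: multiplier * sqrt(segment size), clamped to cap."""
    return min(cap, int(multiplier * math.sqrt(segment_tokens)))


def ratio(before: float, after: float) -> float:
    """Compression ratio expressed as before:after."""
    return before / after


# Stage 1: one 16K segment distilled to its gen-0 budget.
segment_tokens = 16_384
budget = gen0_budget(segment_tokens)
gen0_ratio = ratio(segment_tokens, budget)

# Stage 2: 20 gen-0 outputs meta-distilled to roughly 5-6K tokens.
gen0_total = 20 * budget
meta_low, meta_high = 5_000, 6_000
meta_ratios = (ratio(gen0_total, meta_high), ratio(gen0_total, meta_low))

# End to end, using the round figures quoted in this issue.
raw_total = 320_000
lore_ratios = (ratio(raw_total, meta_high), ratio(raw_total, meta_low))
compaction_ratio = ratio(raw_total, 4_000)

print(f"gen-0:      {budget} tokens, {gen0_ratio:.0f}:1")
print(f"meta:       {meta_ratios[0]:.1f}:1 to {meta_ratios[1]:.1f}:1")
print(f"Lore total: {lore_ratios[0]:.0f}:1 to {lore_ratios[1]:.0f}:1")
print(f"compaction: {compaction_ratio:.0f}:1")
```

Note that Lore's end-to-end ratio is lower than compaction's under these figures, so the quality gap is more plausibly explained by what each stage gets to see than by the raw ratio alone.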
Evidence
CM-1 inflated to 400K tokens:
| Baseline | Easy | Medium | Hard | Overall |
|---|---|---|---|---|
| Lore | 5.0 | 2.3 | 3.3 | 3.5 |
| Tail-window | 4.7 | 1.3 | 1.4 | 2.5 |
| Compaction | 4.7 | 3.9 | 4.1 | 4.2 |
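As a quick check on where the deficit sits, the per-difficulty gap between Compaction and Lore can be computed directly from the scores reported above. The dictionary layout is just an illustrative choice.

```python
# Scores as reported in this issue (CM-1 inflated to 400K tokens).
scores = {
    "Lore":        {"Easy": 5.0, "Medium": 2.3, "Hard": 3.3, "Overall": 3.5},
    "Tail-window": {"Easy": 4.7, "Medium": 1.3, "Hard": 1.4, "Overall": 2.5},
    "Compaction":  {"Easy": 4.7, "Medium": 3.9, "Hard": 4.1, "Overall": 4.2},
}

# Positive gap means Compaction beats Lore on that difficulty tier.
gaps = {
    tier: round(scores["Compaction"][tier] - scores["Lore"][tier], 1)
    for tier in ("Easy", "Medium", "Hard", "Overall")
}
largest = max(("Easy", "Medium", "Hard"), key=gaps.get)

for tier, gap in gaps.items():
    print(f"{tier:8s} {gap:+.1f}")
print(f"largest gap: {largest}")
```

Medium has the largest gap, but Hard also trails, so fixes aimed at detail retention may help both tiers.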
Failed medium questions: model says 'no alternative approach was explicitly discussed' (it was), 'I don't have information about an integration test failure' (there was one).
Possible Fixes
- Increase gen-0 budget multiplier from 8 to 10-12 (more tokens per segment)
- Raise the CAP from 4096 to 6144 or 8192 (allows larger segments more budget)
- Skip meta-distillation for recent segments (keep gen-0 detail for the most recent N segments)
- Improve observer prompt to prioritize specific identifiers over chronological event logging
- Increase tool output truncation limit from 2K chars (distillation input) — specific error messages in tool results get truncated before the LLM sees them
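To get a feel for the first two fixes, here is a minimal sketch of how the gen-0 budget would respond to a larger multiplier or cap. It assumes the budget formula is `min(cap, multiplier * sqrt(segment_tokens))`, inferred from the numbers in this issue rather than taken from Lore's code; `gen0_budget` and `cap_threshold` are hypothetical helpers.

```python
import math


def gen0_budget(segment_tokens: int, multiplier: float = 8, cap: int = 4096) -> int:
    """Assumed gen-0 budget: multiplier * sqrt(segment size), clamped to cap."""
    return min(cap, int(multiplier * math.sqrt(segment_tokens)))


def cap_threshold(multiplier: float, cap: int) -> int:
    """Smallest segment size (in tokens) at which the cap starts to bind."""
    return math.ceil((cap / multiplier) ** 2)


segment_tokens = 16_384

# Effect of the multiplier on a 16K segment (cap left at 4096).
budgets = {m: gen0_budget(segment_tokens, multiplier=m) for m in (8, 10, 12)}
for m, b in budgets.items():
    print(f"multiplier {m:2d}: {b} tokens, {segment_tokens / b:.1f}:1")

# Segment size where each cap would start limiting the budget.
for m in (8, 12):
    for cap in (4096, 6144, 8192):
        print(f"multiplier {m:2d}, cap {cap}: binds from {cap_threshold(m, cap)} tokens")
```

Under this assumed formula, a 16K segment never reaches the 4096 cap at multipliers of 8 to 12, so raising the cap alone would only change budgets for much larger segments; the multiplier is what moves the 16K case.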
Context
Discovered during #410 investigation. The eval fix (#414) correctly represents Lore's behavior — this gap is a real Lore limitation, not an eval bug.