
feat: scenario inflator for 400K token eval scenarios #384

Merged: BYK merged 1 commit into main from feat-inflate on May 19, 2026


BYK (Owner) commented May 19, 2026

Summary

Adds a scenario inflator that pads existing eval scenarios to ~400K tokens with realistic filler content. This forces all baselines (tail-window, compaction, Lore) to actually compress, making the comparison fair.

Problem

Current eval scenarios are ~5-20K tokens, well within the 80K tail-window budget. Tail-window therefore sees the full conversation verbatim while Lore works from a distilled version, which makes the comparison unfair. Real coding sessions are 200-400K tokens.
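To see why small scenarios let tail-window off the hook, here is a minimal sketch of a tail-window baseline under the simplifying assumption that it keeps the newest turns that fit in the 80K budget and drops everything older. The `Turn` shape and the `tailWindow` and `synthetic` helpers are hypothetical and are not taken from the eval code.

```typescript
// Hypothetical, simplified turn: only the pre-counted token length matters here.
interface Turn {
  role: "user" | "assistant";
  tokens: number;
}

// Keep the longest suffix of `turns` whose total token count fits in `budget`.
function tailWindow(turns: Turn[], budget: number): Turn[] {
  let used = 0;
  let start = turns.length;
  // Walk backwards from the newest turn until the next one would exceed the budget.
  while (start > 0 && used + turns[start - 1].tokens <= budget) {
    start--;
    used += turns[start].tokens;
  }
  return turns.slice(start);
}

// Build a synthetic conversation of `count` turns with `tokensPerTurn` tokens each.
function synthetic(count: number, tokensPerTurn: number): Turn[] {
  return Array.from({ length: count }, (_, i): Turn => ({
    role: i % 2 === 0 ? "user" : "assistant",
    tokens: tokensPerTurn,
  }));
}

const BUDGET = 80_000;
const small = synthetic(100, 200); // 20K tokens: fits entirely
const large = synthetic(2_000, 200); // 400K tokens: must be truncated

console.log(tailWindow(small, BUDGET).length); // 100 (nothing dropped)
console.log(tailWindow(large, BUDGET).length); // 400 (only the newest 80K tokens survive)
```

At 20K tokens the baseline never has to discard anything, so it is effectively an uncompressed oracle; at 400K it keeps only the last fifth of the session.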

Solution

inflateScenario(scenario, targetTokens, seed) injects realistic coding-conversation filler between the existing key turns:

  • 15 filler templates across 4 categories: feature implementation, test output, refactor, debug
  • Each template produces 2-4K tokens of realistic code content (full TypeScript files, test output, stack traces, etc.)
  • Keyword exclusion keeps filler free of the keywords the test questions depend on, so filler cannot contaminate them
  • Filler is distributed across sessions in proportion to each session's current size
  • Deterministic seeded PRNG for reproducible inflation
  • Token-accurate targeting (400K ± 1%)
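The seeded-PRNG and proportional-distribution bullets above can be sketched as follows. The PR does not name its generator, so this assumes mulberry32, a small well-known 32-bit seeded PRNG; `allocateProportionally`, `pickTemplates`, and the idea that the filler budget is `targetTokens` minus the scenario's current size are hypothetical illustrations, not the code in inflate.ts.

```typescript
// mulberry32: a small, widely used seeded 32-bit PRNG (assumed for illustration;
// inflate.ts may use a different generator).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform float in [0, 1)
  };
}

// Split an integer `fillerTokens` across sessions in proportion to each session's
// current token count, using largest-remainder rounding so shares sum exactly.
function allocateProportionally(sessionTokens: number[], fillerTokens: number): number[] {
  const total = sessionTokens.reduce((sum, x) => sum + x, 0);
  if (total <= 0) throw new Error("sessions must contain at least one token");
  const exact = sessionTokens.map((size) => (size / total) * fillerTokens);
  const shares = exact.map((x) => Math.floor(x));
  let leftover = fillerTokens - shares.reduce((sum, x) => sum + x, 0);
  // Hand out the remaining tokens, one each, to the largest fractional parts first.
  const byFraction = exact
    .map((x, i) => ({ i, frac: x - Math.floor(x) }))
    .sort((p, q) => q.frac - p.frac);
  for (let k = 0; leftover > 0 && k < byFraction.length; k++, leftover--) {
    shares[byFraction[k].i] += 1;
  }
  return shares;
}

// Hypothetical helper: the same seed always yields the same template indices.
const TEMPLATE_COUNT = 15; // the PR ships 15 filler templates
function pickTemplates(seed: number, n: number): number[] {
  const rand = mulberry32(seed);
  return Array.from({ length: n }, () => Math.floor(rand() * TEMPLATE_COUNT));
}

console.log(pickTemplates(42, 5)); // identical on every run
console.log(allocateProportionally([1_000, 3_000, 2_000], 394_000)); // [ 65667, 197000, 131333 ]
```

Largest-remainder rounding is one way to keep the per-session shares summing exactly to the budget; rounding each share independently can drift by a token per session.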

Example

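```typescript
import { inflateScenario } from './inflate';

// msrScenario is an existing eval scenario; 42 seeds the PRNG so the output is reproducible.
const inflated = inflateScenario(msrScenario, 400_000, 42);
// Original: 5,928 tokens, 63 turns
// Inflated: 400,494 tokens, 2,083 turns
// Questions: all 12 preserved
```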
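The ±1% targeting can be illustrated with a greedy fill loop. This is a sketch under stated assumptions, not the PR's algorithm: `estimateTokens` uses the rough ~4-characters-per-token heuristic (the real inflator may use an actual tokenizer), and `withinTolerance` and `fillToTarget` are hypothetical names.

```typescript
// Hypothetical token estimator: ~4 characters per token, a rough heuristic.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// True when `actual` is within ±tolerance (a fraction) of `target`.
function withinTolerance(actual: number, target: number, tolerance = 0.01): boolean {
  return Math.abs(actual - target) / target <= tolerance;
}

// Append filler chunks until the running total reaches `target`. The final overshoot
// is smaller than the last chunk, so chunks of at most target * tolerance tokens
// keep the result inside the tolerance band (assuming startTokens < target).
function fillToTarget(
  startTokens: number,
  target: number,
  templates: string[],
  pick: () => number, // returns a non-negative index, e.g. from a seeded PRNG
): { chunks: string[]; total: number } {
  if (templates.length === 0) throw new Error("need at least one filler template");
  const chunks: string[] = [];
  let total = startTokens;
  while (total < target) {
    const chunk = templates[pick() % templates.length];
    const cost = estimateTokens(chunk);
    if (cost === 0) throw new Error("empty filler template would loop forever");
    chunks.push(chunk);
    total += cost;
  }
  return { chunks, total };
}

// The example figures above: 400,494 tokens against a 400,000 target (~0.12% over).
console.log(withinTolerance(400_494, 400_000)); // true

// Two synthetic templates of 2K and 3K tokens, picked in rotation.
let next = 0;
const result = fillToTarget(5_928, 400_000, ["x".repeat(8_000), "y".repeat(12_000)], () => next++);
console.log(result.total, withinTolerance(result.total, 400_000)); // 400928 true
```

With templates of 2-4K tokens and a 400K target, the worst-case overshoot is just under 4K tokens, i.e. just under 1%, which is consistent with the stated 400K ± 1% band.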

Files

  • packages/core/eval/inflate.ts (NEW, ~2000 lines, mostly template content)

Adds inflate.ts with 15 filler templates (feature, test, refactor, debug)
that inflate existing eval scenarios to ~400K tokens. This forces all
baselines to actually compress, making the comparison fair.

- inflateScenario() distributes filler proportionally across sessions
- Protected keyword extraction prevents filler from contaminating test questions
- Deterministic seeded PRNG for reproducible inflation
- Token-accurate targeting (400K ± 1%)
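The PR does not show how protected keywords are extracted. A minimal sketch, assuming keywords are identifier-like words (4+ characters, lowercased, minus a small stopword list) taken from each test question and its expected answer, and that any filler sharing one of them is rejected. `Question`, `extractKeywords`, `protectedKeywords`, and `isSafeFiller` are hypothetical names.

```typescript
// Hypothetical shape of a test question; the real scenario type may differ.
interface Question {
  question: string;
  answer: string;
}

// Illustrative stopword list so common question words do not block filler.
const STOPWORDS = new Set(["what", "which", "that", "this", "with", "from", "when", "where", "does", "should"]);

// Identifier-like words of 4+ characters, lowercased, minus stopwords.
function extractKeywords(text: string): string[] {
  const words = text.toLowerCase().match(/[a-z_][a-z0-9_]{3,}/g) ?? [];
  return words.filter((w) => !STOPWORDS.has(w));
}

// Union of keywords across every question and its expected answer.
function protectedKeywords(questions: Question[]): Set<string> {
  return new Set(questions.flatMap((q) => extractKeywords(`${q.question} ${q.answer}`)));
}

// Filler is usable only if it shares no keyword with any test question.
function isSafeFiller(filler: string, protectedWords: Set<string>): boolean {
  return !extractKeywords(filler).some((w) => protectedWords.has(w));
}

const questions: Question[] = [
  { question: "Which retry limit did we settle on for uploadQueue?", answer: "maxRetries = 5" },
];
const keywords = protectedKeywords(questions);

console.log(isSafeFiller("export function parseConfig(raw: string) { return JSON.parse(raw); }", keywords)); // true
console.log(isSafeFiller("Retry limit exceeded in uploadQueue.flush()", keywords)); // false
```

A word-level filter like this is deliberately conservative: it may reject harmless filler that happens to share a common word, which is usually preferable to letting filler leak a plausible-looking wrong answer.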
BYK self-assigned this May 19, 2026
BYK merged commit e0badc9 into main May 19, 2026
10 checks passed
BYK deleted the feat-inflate branch May 19, 2026 09:13
This was referenced May 21, 2026


1 participant