feat: wire --inflate flag into eval CLI for 400K token scenario testing by BYK · Pull Request #386 · BYK/loreai

BYK · 2026-05-19T10:40:41Z

Summary

Adds --inflate <tokens> flag to the eval CLI that inflates scenarios to a target token count before running them. This enables fair baseline comparison at realistic conversation lengths.

First Results at 400K Tokens

PR-2 (implicit preferences):

Baseline	Score	Delta vs. Lore
Lore	4.20	—
Tail-window	2.90	-1.30

On this scenario, Lore scores 4.20 against 2.90 for tail-window, a gap of 1.30 points. At 400K tokens, tail-window can fit only the last 80K tokens, so it drops preferences stated early in the conversation, while Lore's distillation preserves them.

Lore wins 6 of the 8 questions. Tail-window wins only the 2 questions whose relevant facts happen to fall within its surviving 80K window.
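To illustrate why the tail-window baseline loses early preferences, here is a minimal sketch of tail-window truncation. It is not the code from this repository: the `Message` shape, the `tailWindow` name, and the whole-message granularity are assumptions made for illustration. With an 80K budget on a 400K conversation, only the final 20% of tokens survive.

```typescript
// Hypothetical message shape; the real eval harness types may differ.
interface Message {
  id: string;
  tokens: number; // pre-computed token count for this message
}

// Keep the most recent messages whose combined size fits within `budget` tokens.
// Assumption: truncation happens at whole-message boundaries.
function tailWindow(messages: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from the newest message backwards and stop at the first one that no longer fits.
  for (const next of [...messages].reverse()) {
    if (used + next.tokens > budget) break;
    kept.unshift(next); // prepend so the result stays in chronological order
    used += next.tokens;
  }
  return kept;
}

// A 400K-token conversation modeled as 10 messages of 40K tokens each.
// Imagine m0 contains a preference stated at the very start.
const conversation: Message[] = Array.from({ length: 10 }, (_, i) => ({
  id: `m${i}`,
  tokens: 40000,
}));

// With an 80K budget only the last two messages (m8, m9) survive; m0 is dropped.
const surviving = tailWindow(conversation, 80000);
console.log(surviving.map((m) => m.id).join(","));
```

Any question whose answer lives in `m0` through `m7` is unanswerable from the surviving window alone, which matches the pattern reported above.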

Usage

bun packages/core/eval/run.ts --mode live --inflate 400000 --baselines lore,tail-window

Files Changed

packages/core/eval/run.ts — --inflate arg parsing + logging
packages/core/eval/types.ts — inflateTokens field on EvalConfig
packages/core/eval/harness.ts — inflation before scenario execution
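The three changes listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the merged code: `EvalConfigSketch`, `parseArgs`, and `prepareScenario` are hypothetical names, only the `inflateTokens` field is named in this PR, and the inflation function is passed in as a parameter because the real signature of `inflateScenario()` is not shown here.

```typescript
// Hypothetical subset of EvalConfig; only `inflateTokens` is named in this PR.
interface EvalConfigSketch {
  mode?: string;
  baselines: string[];
  inflateTokens?: number;
}

// Minimal hand-rolled argument parsing for the flags used in the Usage command.
function parseArgs(argv: string[]): EvalConfigSketch {
  const config: EvalConfigSketch = { baselines: [] };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === "--mode" && value !== undefined) {
      config.mode = value;
      i++;
    } else if (flag === "--baselines" && value !== undefined) {
      config.baselines = value.split(",");
      i++;
    } else if (flag === "--inflate") {
      const tokens = Number(value);
      // Reject missing, non-numeric, fractional, or non-positive targets.
      if (!Number.isInteger(tokens) || tokens <= 0) {
        throw new Error(`--inflate expects a positive integer, got: ${value}`);
      }
      config.inflateTokens = tokens;
      i++;
    }
  }
  return config;
}

// Stand-in type for inflateScenario() from inflate.ts (signature assumed).
type Inflate<S> = (scenario: S, targetTokens: number) => S;

// Inflate only when a target was requested; otherwise run the scenario as-is.
function prepareScenario<S>(scenario: S, config: EvalConfigSketch, inflate: Inflate<S>): S {
  return config.inflateTokens === undefined
    ? scenario
    : inflate(scenario, config.inflateTokens);
}

const config = parseArgs([
  "--mode", "live", "--inflate", "400000", "--baselines", "lore,tail-window",
]);
console.log(config.inflateTokens); // 400000
```

Keeping `inflateTokens` optional means runs without `--inflate` behave exactly as before, which is presumably why the field lives on the config rather than being a required harness argument.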

Adds a --inflate <tokens> flag to eval run.ts that inflates scenarios before running them. Uses inflateScenario() from inflate.ts.

First results at 400K tokens (PR-2 implicit preferences):

Lore: 4.20
Tail-window: 2.90

At realistic conversation lengths, Lore decisively outperforms tail-window on 6/8 questions.

BYK self-assigned this May 19, 2026

BYK merged commit 1e3ab93 into main May 19, 2026
10 checks passed

BYK deleted the feat-inflate-cli branch May 19, 2026 10:42

This was referenced May 21, 2026

publish: BYK/loreai@0.23.0 #439

Closed

publish: BYK/loreai@0.23.0 #448

Closed
