docs: update README with 400K token eval results#403

Merged: BYK merged 1 commit into main from docs-readme-eval on May 19, 2026

BYK (Owner) commented May 19, 2026

Summary

Update the README's eval section with results from the new eval suite (built over 14 PRs of eval infrastructure and improvements in this session).

Changes

  • Replace old eval numbers (93.3%, +35pp, 7x cost efficiency) with new 400K-token results
  • Add eval results table: Lore 4.92/5 vs tail-window 3.34/5 (+47%) on preferences
  • Add plain-language explanation of what the numbers mean
  • Add eval run instructions (the suite is open source)
  • Add cost story (batch APIs, local embeddings, cache warming)
  • Condense the v1-v4 history into short summaries
  • Add v5 summary (behavioral patterns, 400K eval, assertion pinning)

Eval Numbers

| Scenario | Lore | Tail-window | Delta (relative) |
|---|---|---|---|
| Explicit preferences | 4.96/5 | 3.40/5 | +46% |
| Implicit behavioral patterns | 4.83/5 | 2.97/5 | +63% |
| Preference evolution | 5.00/5 | 3.67/5 | +36% |
| Average | 4.92/5 | 3.34/5 | +47% |
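The Delta column appears to be the relative improvement over the tail-window baseline (Lore / tail-window minus 1), not a difference in points or percentage points; every row checks out under that reading. A minimal sketch that reproduces the column from the reported scores (the names `SCORES` and `relative_delta` are illustrative, not part of the eval suite):

```python
# Scores (out of 5) as reported in the eval table: (Lore, tail-window).
SCORES = {
    "Explicit preferences": (4.96, 3.40),
    "Implicit behavioral patterns": (4.83, 2.97),
    "Preference evolution": (5.00, 3.67),
    "Average": (4.92, 3.34),
}


def relative_delta(lore: float, baseline: float) -> float:
    """Relative improvement of Lore over the baseline, in percent."""
    return (lore / baseline - 1.0) * 100.0


if __name__ == "__main__":
    for scenario, (lore, tail) in SCORES.items():
        # Round to whole percent, matching the table's presentation.
        print(f"{scenario}: +{relative_delta(lore, tail):.0f}%")
```

Note that the reported averages (4.92 and 3.34) are used directly rather than recomputed from the three rows, since the suite may weight scenarios unequally.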

Replace old eval numbers (93.3%, +35pp) with new eval suite results:
- Preferences at 400K: Lore 4.92/5 vs tail-window 3.34/5 (+47%)
- Explicit prefs: 4.96, implicit patterns: 4.83, preference evolution: 5.00
- Add eval run instructions and cost story
- Condense v1-v4 history, add v5 summary
BYK self-assigned this May 19, 2026
BYK merged commit 9b28e83 into main May 19, 2026 (7 checks passed)
BYK deleted the docs-readme-eval branch May 19, 2026 21:07
This was referenced May 21, 2026