v0.5.0 — BEAM benchmark + wikis on by default
Highlights
- Public benchmark harness for BEAM: anyone can clone the repo, point it at an LLM provider, and reproduce our numbers. Same dataset, same upstream eval, same judge prompt as published comparators.
- Wiki pipeline now on by default: the maintainer + writer were opt-in before. Idle ticks cost zero LLM calls; opt out with WIKI_ENABLED=false.
- Better long-document ingestion: bigger chunks (1200 words), density-aware fact extraction, byte-range source pointers, per-chunk retries.
- Friendlier defaults: bench points at deepinfra (google/gemma-4-31B-it) by default. A fresh clone just needs an API key.
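To make the ingestion changes concrete, here is a minimal sketch of word-based chunking with byte-range source pointers and per-chunk retries. Only the 1200-word chunk size comes from these notes; the names `Chunk`, `chunk_by_words`, and `extract_with_retries`, the retry count, and the `extract` callback are hypothetical and are not the project's actual API.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, List

CHUNK_WORDS = 1200  # chunk size stated in these notes; everything else is illustrative


@dataclass
class Chunk:
    text: str
    byte_start: int  # inclusive offset into the UTF-8 encoded source
    byte_end: int    # exclusive offset into the UTF-8 encoded source


def chunk_by_words(source: str, chunk_words: int = CHUNK_WORDS) -> List[Chunk]:
    """Split source into runs of at most chunk_words whitespace-delimited words."""
    words = list(re.finditer(r"\S+", source))
    chunks: List[Chunk] = []
    for i in range(0, len(words), chunk_words):
        group = words[i:i + chunk_words]
        char_start, char_end = group[0].start(), group[-1].end()
        # Convert character offsets to byte offsets so pointers stay valid
        # for non-ASCII text, where one character can span several bytes.
        byte_start = len(source[:char_start].encode("utf-8"))
        byte_end = byte_start + len(source[char_start:char_end].encode("utf-8"))
        chunks.append(Chunk(source[char_start:char_end], byte_start, byte_end))
    return chunks


def extract_with_retries(
    chunk: Chunk,
    extract: Callable[[Chunk], Any],  # hypothetical fact-extraction callback
    max_attempts: int = 3,            # assumed retry budget
) -> Any:
    """Retry a single chunk on failure without re-running the other chunks."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return extract(chunk)
        except Exception as err:
            last_error = err
    raise RuntimeError(
        f"chunk at bytes {chunk.byte_start}-{chunk.byte_end} failed"
    ) from last_error
```

Slicing the UTF-8 encoded source with `byte_start:byte_end` returns exactly the chunk text, which is what lets an extracted fact point back to its origin in the source file.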
Upgrade notes (v0.4.0 → v0.5.0)
- Wiki pipeline starts automatically: set WIKI_ENABLED=false if you don't want it.
- Bench default provider is deepinfra: set LLM_PROFILE_BENCH=vllm_workstation_qwen plus judge overrides in .env.bench for self-hosted Qwen.
- Watcher chunk size is 1200 words (was 600); this affects extraction wall-clock on long sources.
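As an illustration of the on-by-default behavior, the following sketch shows one way a flag like WIKI_ENABLED could be read so that an unset variable means "enabled". The function name `wiki_enabled` and the extra false spellings (`0`, `no`, `off`) are assumptions; these notes only document `WIKI_ENABLED=false`.

```python
import os
from typing import Mapping, Optional

# Assumed set of values treated as "off"; only "false" is documented in these notes.
_FALSE_VALUES = {"false", "0", "no", "off"}


def wiki_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless WIKI_ENABLED is explicitly set to a false-like value."""
    env = os.environ if env is None else env
    raw = env.get("WIKI_ENABLED", "true")  # unset means on, matching the v0.5.0 default
    return raw.strip().lower() not in _FALSE_VALUES
```

With this shape, a deployment that never set the variable under v0.4.0 would start the wiki pipeline after upgrading, which is why the opt-out has to be explicit.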
What's next
A full BEAM 100K run on this release is in flight; score numbers + cross-judge calibration land in a follow-up.