v0.5.0 — BEAM benchmark + wikis on by default

dimknaf released this 09 Jun 10:48


60f02d8

Highlights

Public benchmark harness for BEAM — anyone can clone, point at an LLM provider, and reproduce our numbers. Same dataset, same upstream eval, same judge prompt as published comparators.
Wiki pipeline now on by default — the maintainer + writer were opt-in before. Idle ticks cost zero LLM calls; opt out with WIKI_ENABLED=false.
Better long-document ingestion — bigger chunks (1200 words), density-aware fact extraction, byte-range source pointers, per-chunk retries.
Friendlier defaults — bench points at deepinfra (google/gemma-4-31B-it) by default. A fresh clone just needs an API key.
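
To make the chunking and source-pointer idea concrete, here is a minimal sketch of word-based chunking that records a byte range for each chunk. It is an illustration under stated assumptions, not the project's actual ingestion code: the names `Chunk` and `chunk_by_words` are hypothetical, words are assumed to be whitespace-delimited, and offsets are assumed to index the UTF-8 encoding of the source.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    byte_start: int  # inclusive offset into the UTF-8 encoded source
    byte_end: int    # exclusive offset into the UTF-8 encoded source


def chunk_by_words(source: str, max_words: int = 1200) -> list[Chunk]:
    """Split `source` into chunks of at most `max_words` whitespace-delimited words.

    Hypothetical helper: the 1200-word default mirrors the release note,
    everything else is an assumption made for illustration.
    """
    # Character spans of every run of non-whitespace characters.
    spans = [m.span() for m in re.finditer(r"\S+", source)]
    chunks = []
    for i in range(0, len(spans), max_words):
        group = spans[i:i + max_words]
        char_start, char_end = group[0][0], group[-1][1]
        text = source[char_start:char_end]
        # Character offsets differ from byte offsets once non-ASCII text
        # appears, so measure the encoded prefix (simple, not optimized).
        byte_start = len(source[:char_start].encode("utf-8"))
        byte_end = byte_start + len(text.encode("utf-8"))
        chunks.append(Chunk(text, byte_start, byte_end))
    return chunks
```

Slicing the encoded source with a chunk's `byte_start:byte_end` and decoding it yields the chunk text again, which is what lets an extracted fact point back at its exact origin in the file.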

Upgrade notes (v0.4.0 → v0.5.0)

Wiki pipeline starts automatically — set WIKI_ENABLED=false if you don't want it.
Bench default provider is deepinfra — set LLM_PROFILE_BENCH=vllm_workstation_qwen plus judge overrides in .env.bench for self-hosted Qwen.
Watcher chunk size is 1200 words (was 600) — affects extraction wall-clock on long sources.
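
As a rough illustration of what "on by default, opt out with WIKI_ENABLED=false" means in practice, the sketch below reads a boolean flag from the environment and falls back to a default when the variable is unset. The helper `env_flag` and the set of values treated as false are assumptions made for this example; the project may parse the variable differently.

```python
import os

# Assumption: these spellings count as "off"; the project may accept others.
_FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag(name, default, environ=None):
    """Return a boolean flag from the environment (hypothetical helper).

    An unset or empty variable yields `default`, so a default of True
    gives "on unless explicitly disabled" behaviour.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() not in _FALSE_VALUES


# On by default: only an explicit WIKI_ENABLED=false (or similar) disables it.
wiki_enabled = env_flag("WIKI_ENABLED", default=True)
```

With this shape, a fresh clone with no WIKI_ENABLED entry runs the wiki pipeline, while adding WIKI_ENABLED=false to the environment turns it off.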

What's next

A full BEAM 100K run on this release is in progress; score numbers and cross-judge calibration will be published in a follow-up.
