v0.5.0 — BEAM benchmark + wikis on by default

@dimknaf dimknaf released this 09 Jun 10:48
· 28 commits to main since this release
60f02d8

Highlights

  • Public benchmark harness for BEAM — anyone can clone, point at an LLM provider, and reproduce our numbers. Same dataset, same upstream eval, same judge prompt as published comparators.
  • Wiki pipeline now on by default — the maintainer + writer were opt-in before. Idle ticks cost zero LLM calls; opt out with WIKI_ENABLED=false.
  • Better long-document ingestion — bigger chunks (1200 words), density-aware fact extraction, byte-range source pointers, per-chunk retries.
  • Friendlier defaults — bench points at deepinfra (google/gemma-4-31B-it) by default. A fresh clone just needs an API key.
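To make the ingestion change concrete, here is a minimal sketch of word-based chunking that records a byte-range pointer back into the source. It is an illustration only, not the project's implementation: the `Chunk` type and `chunk_words` function are hypothetical names, and splitting on whitespace is an assumption about what "1200 words" means.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    byte_start: int  # inclusive offset into the UTF-8 encoded source
    byte_end: int    # exclusive offset into the UTF-8 encoded source


def chunk_words(source: str, max_words: int = 1200) -> list[Chunk]:
    """Split `source` into chunks of at most `max_words` whitespace-separated words."""
    words = list(re.finditer(r"\S+", source))
    chunks: list[Chunk] = []
    char_pos = 0  # character offset already accounted for
    byte_pos = 0  # byte offset matching char_pos
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        char_start, char_end = group[0].start(), group[-1].end()
        text = source[char_start:char_end]
        # Character and byte offsets differ for non-ASCII text, so measure
        # the skipped gap and the chunk itself in encoded bytes.
        byte_start = byte_pos + len(source[char_pos:char_start].encode("utf-8"))
        byte_end = byte_start + len(text.encode("utf-8"))
        chunks.append(Chunk(text, byte_start, byte_end))
        char_pos, byte_pos = char_end, byte_end
    return chunks
```

Slicing the encoded source with `source.encode("utf-8")[c.byte_start:c.byte_end]` recovers each chunk's text, which is what lets an extracted fact point back at the exact span it came from.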

Upgrade notes (v0.4.0 → v0.5.0)

  • Wiki pipeline starts automatically — set WIKI_ENABLED=false if you don't want it.
  • Bench default provider is deepinfra — set LLM_PROFILE_BENCH=vllm_workstation_qwen plus judge overrides in .env.bench for self-hosted Qwen.
  • Watcher chunk size is 1200 words (was 600) — affects extraction wall-clock on long sources.
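As an illustration of the new on-by-default behavior, the sketch below shows one way a boolean flag such as WIKI_ENABLED could be read so that an unset variable means "enabled". The `env_flag` helper is hypothetical, and the set of values treated as false is an assumption; the release notes only state that WIKI_ENABLED=false opts out.

```python
import os

# Assumption: these spellings count as "off"; only "false" is documented.
_FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag(name: str, default: bool = True, environ=None) -> bool:
    """Return the boolean value of an environment flag, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default  # unset or empty: keep the default (on)
    return raw.strip().lower() not in _FALSE_VALUES


if __name__ == "__main__":
    print("wiki pipeline enabled:", env_flag("WIKI_ENABLED"))
```

With a default of `True`, a deployment that never set the variable under v0.4.0 would see the pipeline start after upgrading, which matches the first upgrade note.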

What's next

A full BEAM 100K run on this release is in progress; score numbers and cross-judge calibration will land in a follow-up.