Skip to content

v1.13.0 — harness & governance hardening

Choose a tag to compare

@cgbarlow cgbarlow released this 14 Jun 21:44
· 2100 commits to main since this release
a758650

Harness & governance hardening — the swarm gets more self-sufficient. This is also the first release cut with the ADR-040 changelog-fragment tool (tools.changelog --release), folding 13 accumulated fragments since v1.12.0.

Added

  • Isolated per-agent worktree (ADR-042): each swarm/agent.sh run operates in its own git worktree.
  • Self-updating harness (ADR-039): the agent re-execs itself when the harness updates mid-run.
  • OpenAI --prove text-extraction fallback (ADR-041, #560): proof-specialised models that emit Lean as text rather than OpenAI tool calls can now prove, not just translate. Gate A re-checks every extracted proof, so soundness is unchanged.
  • Shared web-surface design language (ADR-038): one documented design language restyling the leaderboard and proof-graph visualiser.
  • Leaderboard model-distribution panel (#455) and a README preview SVG for the visualiser (#447).

Changed

  • Changelog fragments (ADR-040): per-PR changelog.d/ fragments replace direct [Unreleased] edits, so concurrent PRs never conflict on the changelog at any merge rate.
  • Gate A runner routing by replay workload (#441): light per-proof replay and heavy full replay run on separate Namespace profiles.
  • Leaderboard UI fixes (#455) and visualiser restyle (#447).

Fixed

  • The #387 retro-audit decide-noise misclassification.
  • The check_triviality --all retro-audit is no longer ~11 builds/goal (#411).
  • The triviality probe now classifies a failed to synthesize elaboration error as a probe-error, not non-trivial (#410).

See CHANGELOG.md for the full entries.