Skip to content

Releases: agenticsnz/unsorry

v1.20.0 — back open for business (resilient recovery + sharded Gate A)

17 Jun 10:13
86a357e

Choose a tag to compare

Headline: back open for business — resilient recovery after the "ruv sledgehammer". A contributor's ~337 direct prove(...) PRs ("the ruv sledgehammer", #1904) flooded Gate A and stranded ~200 valid proofs behind a saturated verifier queue. The backlog is now cleared and the swarm is hardened so it can't re-form: the stranded/failed direct PRs were closed, ~318 valid proofs re-routed onto the metered queue, the orphaned CI runs cancelled, and attribution fixed (ruvruvnet). Around that we shipped a full resilience-and-recovery stack across v1.17.1–v1.20.0 — admission control now offers the re-route on close so a flood self-heals, a reroute_stranded.py tool recovers a stranded proof without re-proving, a scheduled queue-dispatcher drains the queue governor-metered, a close-superseded sweep retires originals as their re-routes land, and a recovery playbook documents it all. This release lands the capacity piece: sharded Gate A kernel replay (ADR-063, #1909) cuts the ~1-hour verification long pole ~N× across parallel runners so verification keeps pace with submissions — every olean still kernel-replayed exactly once. The doors are open again, and the queue drains itself.

Added

  • Added sharded Gate A kernel replay (ADR-063 / SPEC-063-A) to attack the verification-capacity bottleneck (#1909). The full kernel replay is a ~1-hour serial pass over the whole library because leanchecker holds mathlib resident per process (two won't fit one runner) — but the work is already chunked and embarrassingly parallel across runners. New plan and replay-shard subcommands in tools/gate_a/parallel_modules.py partition the replay target set into N disjoint, covering shards (reusing the existing scope logic verbatim, so a shard never checks a different set than a full replay would) and replay one shard per parallel matrix runner, cutting the long pole to ~1/N wall-clock. Soundness is preserved — every olean is still kernel-replayed exactly once (the partition is unit-tested disjoint+covering, the matrix cover job fails closed on any non-green shard, and the daily full-replay backstop is unchanged), and shards share only the git SHA so leanchecker's inputs stay locally-derived (ADR-049). Ships with the tooling + tests and a non-required manual gate-a-shard-pilot workflow that validates the matrix on real runners; promoting it into the required gate-a.yml is a pilot-gated follow-up (ADR-058).
  • Added swarm/run.sh: a one-command launcher for the governed swarm (ADR-058). It starts the two halves of the coordinated --prove flow together — one dispatcher loop (agent.sh --dispatch-queue) plus one resilient prover loop (supervise.sh --prove "$@") — sharing the shell's UNSORRY_* env and torn down together on exit, so an operator runs ./swarm/run.sh instead of managing two commands. Args pass through to the prover. Run exactly one dispatcher; for more provers, start additional supervise.sh --prove. Added to the agent-lint shellcheck/bash-n gate and documented in CONTRIBUTING.

Changed

  • Made the aisp-advisory Gate B check incremental — it now validates only the .aisp coordination records changed in the PR/push (via git diff against the base, with fetch-depth: 0), instead of re-running npx aisp-validator@0.3.0 over every record in goals/translations/decompositions/library/index/proof-runs on each run. The per-file npx loop across thousands of immutable proof-runs/*.aisp was making this advisory check the slowest in the pipeline (~8 min); a typical PR touches a handful of records, so it now finishes in seconds. Behaviour is otherwise unchanged: same job name (aisp-advisory), still continue-on-error (never reddens, ADR-003 / issue #318), and it falls back to validating all records when there's no usable base diff (first push / shallow history). The swarm-contract validation is unchanged.
  • Gate A now routes prepare/archive, audit, and replay work to dedicated Namespace profiles (namespace-profile-unsorry-prepare, namespace-profile-unsorry-audit, and namespace-profile-unsorry-replay) so operators can scale each verifier lane independently.
  • The Gate A .lake/build GitHub Actions cache (ADR-045) now only runs on non-Namespace profiles (if: needs.detect.outputs.volume != 'true'). On a Namespace runner the .lake cache volume (ADR-046) already persists .lake/build across jobs and runs, so the GitHub cache was redundant churn — a ~320 MB per-commit entry (keyed on github.sha) that thrashed the 10 GB repo cache and created one near-duplicate entry per PR. Gating it keeps the cache list small and entries reused efficiently. Trade-off: on Namespace the #801 exact-key build-skip no longer fires, but the warm volume makes audit/replay's library build a cheap incremental, so throughput is unchanged.

v1.19.0 — the proof-recovery pipeline goes automatic

17 Jun 08:18
d7b8634

Choose a tag to compare

Headline: the proof-recovery pipeline goes automatic — the queued-proof backlog now drains itself. A scheduled queue-dispatcher runs the governor-metered dispatcher every 15 minutes (it was previously only an operator-invoked mode), so re-routed and queued proofs flow into PRs that Gate A re-verifies and auto-merges; a close-superseded sweep retires each stranded direct PR once its goal lands, so the backlog shrinks instead of doubling; and docs/recovery.md (linked from a new README Recovery section) documents the whole submission/recovery machinery and the Gate A capacity backstop (ADR-058). Together with the earlier re-route tool and admission self-heal (v1.18.0 / v1.17.1), this closes the loop on "the ruv sledgehammer fix" (#1904) — stranded proofs recover into the metered queue with no manual PR surgery and no lost work.

Added

  • Added tools/repo/close_superseded.py and a scheduled close-superseded workflow that closes a stranded direct prove(...) PR once its goal is proved on main (its queued re-route, or a peer, landed) — and never a PR whose goal is still open. This is the safe, incremental implementation of "close the stranded originals once their re-routes land" for the ruv-sledgehammer cleanup (#1904) and the capacity drain (#1909): as the queue-dispatcher lands re-routed proofs, this sweep retires the superseded originals, so the backlog shrinks instead of doubling. Closing needs only the default GITHUB_TOKEN. Pure helpers are pytest-covered; --author and --dry-run are supported.
  • Added docs/recovery.md and a Recovery section to the README documenting how the swarm submits proofs through the governor-metered queue, why a proof can get stranded (the direct-submission cutover + Gate A flood, #1904/#1909), and the self-healing recovery pipeline: the admission re-route offer, reroute_stranded.py (re-package without re-proving), the scheduled queue-dispatcher, and the close-superseded sweep — plus the operator quick-reference and the Gate A capacity backstop (ADR-058).
  • Added a scheduled queue-dispatcher workflow that drains the queued-proof backlog automatically. The swarm/agent.sh --dispatch-queue dispatcher (SPEC-007-A) previously ran only when an operator invoked it by hand, so queued/prove/* branches accumulated unbounded (the backlog in #1909). The workflow now runs it every 15 minutes (governor-metered via UNSORRY_MAX_OPEN_PROVE_PRS / UNSORRY_MAX_GATE_A_IN_FLIGHT, ADR-058), re-packaging already-locally-verified queued branches into PRs that Gate A re-verifies from scratch. It authenticates with the admin REFRESH_TOKEN (a GITHUB_TOKEN-opened PR would not trigger Gate A) and degrades to a report-only notice when that secret is unset. Self-throttling: it dispatches only while in-flight Gate A work is below the cap and no-ops when full.

v1.18.0 — stranded-proof recovery tool + fuller browser surfaces

17 Jun 07:24
43d3297

Choose a tag to compare

Headline: recovery tooling and fuller browser surfaces — a tools/repo/reroute_stranded.py maintainer tool re-routes a stranded direct proof PR onto the accepted queued path without re-proving (the dispatcher re-packages it; Gate A re-verifies from scratch) — the recovery engine behind "the ruv sledgehammer fix" (#1904). On the browser side, the leaderboard gains a Sourcing tab that finally surfaces the already-generated sourcing board, and the proof graph now renders every goal — standalone goals fold into one collapsed summary cluster per status that expands on click, instead of being omitted so the diagram looked nearly empty.

Added

  • Added tools/repo/reroute_stranded.py, a maintainer tool that re-routes a stranded direct proof PR through the accepted queued path (ADR-058) without re-proving. Direct prove(...) PRs are no longer accepted (the queued-proof cutover), and pre-cutover ones strand with gate-a-audit/gate-a-replay perpetually queued. Since the dispatcher only re-packages an existing branch (Gate A re-verifies from scratch), the tool copies a stranded PR's proof files onto a fresh queued/prove/<goal>/… branch off current origin/main, Gate-B-validates it, and (with --push) pushes it so the dispatcher meters it into CI. Self-contained proofs re-route cleanly; one importing a now-archived Unsorry.* module fails Gate A on dispatch (the safe outcome). Built to recover @ruvnet's stranded proofs (issue #1904, "the ruv sledgehammer fix").
  • Added a Sourcing view to the leaderboard page (ADR-060): a fourth toggle that surfaces the sourcing leaderboard (docs/metrics/sourcing-leaderboard.json) in the browser — who sourced each goal, ranked by goals sourced with a proved/open/difficulty split. The data was already generated by python3 -m tools.leaderboard --sourcing but had no browser surface; it is now one click from the contributor leaderboard, lazily fetched the first time the tab is opened.

Changed

  • The proof-graph page now shows every goal, not just the ~7% that sit in decomposition forests. Standalone goals (no parent→sub lineage) are folded into one collapsed summary cluster per status alongside the forest (ADR-032 amendment / SPEC-032-A); on the interactive page a cluster expands on click into its individual, clickable goals and collapses again (the client rebuilds the Mermaid source from the embedded model, byte-identical to the server render when collapsed), while the static markdown shows the collapsed clusters and the full per-goal list stays in the table. Previously standalone goals were omitted from the diagram entirely, so it read as if most of the 762 goals were missing.

v1.17.1 — the ruv sledgehammer fix + board reconciliation

17 Jun 07:22
760e648

Choose a tag to compare

Headline: maintenance — "the ruv sledgehammer fix" and a board reconciliation — a patch release of fixes only. After a contributor's ~337 direct prove(...) PRs stranded against the queued-proof cutover (#1904), admission control now offers the re-route path when it closes a direct submission, so a flood self-heals into the metered queue instead of stranding, and @ruvnet's two git identities are mapped to the canonical handle so credit attaches. Alongside, the #1888 reconciliation flips two genuinely-proved goals that were still showing as open/translated (proved-status now matches the proof-index count), and the proof-graph page's heading scale and insets are aligned with the shared home/leaderboard card.

Fixed

  • When admission control closes a post-cutover direct prove(...) PR, the auto-comment now offers the re-route path as well as the re-prove path, so a contributor's direct-submission campaign self-heals into the metered queue instead of stranding (the failure mode behind #1904, "the ruv sledgehammer fix"). The comment now tells anyone whose branch already holds a verified proof to run python3 -m tools.repo.reroute_stranded --pr <n> --push — the queued dispatcher re-packages it and Gate A re-verifies from scratch — in addition to the existing ./swarm/agent.sh --prove re-prove instructions.
  • Fixed two goals that carried a kernel-verified proof index record but whose goal record was never flipped to proved, so they showed as open/translated on the queue and proof graph despite being solved (surfaced by the #1888 reconciliation; the proved-status count now matches the active proof-index count, 286). no-nat-sq-eq-two-mul-sq-s4 — a decomposition sub-lemma whose parent no-nat-sq-eq-two-mul-sq is already proved — was left status≜open/sha≜∅; it now records status≜proved and the proof sha. nat-zero-lt-succ — a grandfathered bootstrap lemma proved manually in Unsorry/Basic.lean with no goals/*.lean stub (the statement-binding generator deliberately skips such entries, ADR-011) — now records status≜proved; its phase≜translate/lean≜∅ are kept intentionally as the grandfathered marker. Statements are unchanged (ADR-018) — only the status/sha metadata is corrected.
  • Fixed the proof-graph page's layout drifting from the home and leaderboard pages: its heading scale and section margins now match the shared card (text-5xl md:text-7xl wordmark, px-6 md:px-10 inset — previously text-4xl md:text-6xl / md:px-8), so the three browser surfaces align. Also removed the duplicate Contributor leaderboard header link — the shared top-nav already links the leaderboard.
  • Fixed leaderboard attribution for @ruvnet, whose proof contributions carried solver≜ruvnet but were authored under the git identities ruv <ruv@ruv.net> (the proof campaign) and rUv <…ruvnet@users.noreply.github.com> (merged PR #1278), so the ADR-037 provenance audit flagged them as uncorroborated. docs/metrics/contributor-aliases.json now maps both identities to the canonical handle ruvnet, so the credit attaches correctly (issue #1904).

v1.17.0 — a runnable sourcing loop for the swarm

17 Jun 04:37
d485d14

Choose a tag to compare

Headline: a runnable sourcing loop for the swarm — goal-sourcing gets the same hands-off harness the prove loop already had. A new swarm/sourcing.sh (ADR-062) is the sourcing counterpart to swarm/agent.sh: it fires up Claude to run one cycle of the unsorry-goal-sourcing skill (ADR-060) — pick a hard theme, run the four sourcing gates, promote survivors to triples with gen_triples.py, and open one chore(sourcing): PR — answering the #1837 maintainer ask for "a bash script run like ./swarm/agent.sh that fires up Claude and runs a cycle of goal sourcing." It is bounded by default (one cycle; --cycles N to repeat) since sourcing has no empty-pool terminator, and reuses agent.sh's preflight, ADR-016 failure classification, ADR-059 fetch resilience, and 0/1/2/3 exit codes so supervise.sh wraps it unchanged. Alongside, a refresh-workflow fix lands the long-stale docs/proof-graph.svg (#1888) — it was regenerated every run but never staged, frozen at 153 goals · 104 proved while the real total had grown to 762 goals · 284 proved.

Added

  • Added swarm/sourcing.sh, the goal-sourcing runner (ADR-062 / SPEC-062-A) — the sourcing counterpart to swarm/agent.sh. It fires up Claude to run one cycle of the unsorry-goal-sourcing skill (ADR-060): pick a hard theme, run the four sourcing gates over candidates, promote survivors to triples with gen_triples.py, and open one chore(sourcing): PR of ≤ --max-goals goals. Unlike the prove loop it is bounded by default (one cycle; --cycles N / UNSORRY_SOURCING_CYCLES to repeat) because sourcing has no empty-pool terminator, supports --theme, --max-goals (clamped to 1..50), and --dry-run, and reuses agent.sh's preflight, ADR-016 infrastructure-failure classification, ADR-059 fetch resilience, and 0/1/2/3 exit codes so swarm/supervise.sh wraps it unchanged. Driven by a new swarm/prompts/source.md playbook; shellcheck-clean with a hermetic --self-test, both wired into agent-lint.yml.

Fixed

  • Fixed the proof-graph preview (docs/proof-graph.svg) being permanently stale, so it under-counted solved problems (#1888). The proofs-visualisation refresh workflow regenerates all three visualiser outputs with tools.visualiser --write, but its commit $paths listed only the .md table and .html page — not docs/proof-graph.svg. So the bot regenerated the SVG every run but never staged it, and its per-$paths git diff short-circuited to "already in sync", leaving the SVG frozen at an old snapshot (153 goals · 104 proved) while the real total had grown to 762 goals · 284 proved (+138 archived). Same path-gap class as the earlier sourcing-leaderboard refresh fix. Added docs/proof-graph.svg to the workflow's tracked $paths and regenerated it so main reflects reality now. The leaderboard itself was unaffected — its --check was already green and it correctly counts all 424 verified proofs including archived ones.

v1.16.0 — contributor goal-sourcing (harder problems, more of them)

17 Jun 04:08
11778d9

Choose a tag to compare

Headline: opening problem-sourcing to contributors — harder problems, and many more of them — the #400 mandate becomes self-serve. A contributor-facing unsorry-goal-sourcing skill (ADR-060) drives the four-gate sourcing pipeline (absence → type-check → non-triviality → provable+skeptic) under a skill-enforced difficulty bar, backed by a tested gen_triples.py triple assembler and a new sourcing leaderboard that credits whoever sourced each goal — a no-pre-claim, merge-time-dedup model that works from a fork today. Around it the swarm gets steadier at scale: coordinated provers queue locally-verified branches and agent.sh --dispatch-queue meters them into PRs under the ADR-058 submission governor, with proof-PR admission control for the queued-dispatcher cutover and a scheduled pr-label-sweep backstop for unlabelled PRs. Process hygiene tightens — the protocol gate now enforces ADR/SPEC number uniqueness (ADR-061, closing the double-ADR-059 collision), the PR classifier learns the unblock(…) shape, and a bracket-prefixed title gets a targeted hint. The prove loop gets much faster by seeding each worktree's .lake/build from a warm library (a ~735 s cold verify drops to ~2 s; soundness unchanged — CI Gate A still re-verifies from scratch), and two leaderboard refresh/attribution fixes land the sourcing board and fold the Your Name placeholder into @perttu. A fork-submission research note (#1206) maps what is left (ADR-053/054 coordination, not the kernel boundary) to open sourcing to the world.

Added

  • Decided (ADR-060, SPEC-060-A, #400/#1533) to add a contributor-facing unsorry-goal-sourcing skill that walks contributors and agents through the four-gate sourcing pipeline to generate harder problems and more of them, with a skill-enforced difficulty bar, a tools/sourcing/gen_triples.py triple assembler, a no-pre-claim + merge-time-dedup conflict model that works from a fork today, parallel Phase-2 (Euler substrate) and Phase-3 (library growth) sourcing, and a new sourcing-leaderboard mode that credits contributors who source goals.
  • The protocol-compliance gate (tools/repo/pr_protocol.py, pr-protocol check) now enforces ADR/SPEC number uniqueness in addition to spec linkage (ADR-061 / SPEC-061-A). An added ADR-<n> that reuses a number already taken by a different ADR file — or an added SPEC-<n>-<letter> reusing an existing number+letter — now fails the gate, naming the colliding files (multiple spec letters per ADR, e.g. SPEC-003-A/B/C, stay legal). This closes the door on the kind of collision that put two ADR-059 files on main when #1757 and #1837 merged seconds apart.
  • Added tools/sourcing/gen_triples.py (ADR-060 / SPEC-060-A): a goal-triple assembler that turns one validated sourcing candidate into the three on-disk files — the goals/<slug>.lean sorry-stub, the SPEC-003-A goals/<slug>.aisp record (status≜open, sha≜∅), and the backlog/<slug>.md entry — and (with --validate) re-runs Gate B over them. Previously this triple was hand-authored; the tool makes it deterministic and Gate-B-clean, refuses to clobber an existing goal (ADR-018 immutability), and is the assembler the forthcoming unsorry-goal-sourcing skill invokes.
  • Added a research investigation (docs/proposals/non-contributor-fork-submission.md, #1206) into whether non-contributors can submit proofs from a fork. Finding: the autonomous --prove loop assumes canonical-repo write access at four points (claim/feature-push/PR/merge) plus a credentialed verification gate, so it does not work from a fork as wired — but accepting an untrusted proof is already soundness-safe under ADR-049's central re-check, so the remaining work is coordination and governance (ADR-053/054), not the kernel boundary. A maintainer-mediated manual path (--prove-local + hand-opened PR) already works today.
  • Added repository-side proof PR admission control for the queued dispatcher cutover: post-cutover direct feature/goal-*, prove/*, or prove(...) submissions are labelled, commented, closed, and short-circuited from expensive CI while existing pre-cutover PRs continue draining.
  • Added a scheduled pr-label-sweep workflow that re-runs the single-source PR title classifier (tools/repo/pr_labels.py) every 15 minutes over any open PR left with no label and applies the correct labels. This is a backstop for the per-PR pr-labels job, whose run can be queued-out, superseded, or cancelled when the swarm or contributors open many PRs at once — leaving the PR permanently unlabelled. Both paths reuse the same classifier, so labelling stays deterministic and the rules live in one place.
  • Coordinated proof producers now queue locally verified queued/prove/... branches by default instead of opening PRs immediately; ./swarm/agent.sh --dispatch-queue meters those branches into ordinary auto-merge PRs when the ADR-058 submission governor allows more Gate A work.
  • Added a sourcing leaderboard (ADR-060 / SPEC-060-A §6): python3 -m tools.leaderboard --sourcing credits whoever sourced each goal (added goals/<id>.aisp), independent of who proves it — ranked by goals sourced then difficulty points, with a proved/open split. Attribution is git add-author (the mechanism the proof leaderboard already uses), mapped through docs/metrics/contributor-aliases.json, so no goal-record schema change. docs/metrics/sourcing-leaderboard.json joins the standard --write/--check artifact set and is refreshed post-merge like the other leaderboards.
  • Added the unsorry-goal-sourcing skill (ADR-060 / SPEC-060-A, authored with skill-creator) under Skills/. It walks a contributor or agent through SOURCING new open goals — the four-gate pipeline (absence → type-check → non-triviality → provable+skeptic), a skill-enforced max-difficulty bar, the gen_triples.py triple format, the no-pre-claim + merge-time-dedup conflict model that works from a fork, and the chore(sourcing): ≤50-goal PR + #81/#400 announce discipline — with references/, assets/ templates, and an adversarial-skeptic sub-agent. It complements unsorry-proof-authoring (which proves existing goals).

Changed

  • Coordinated --prove now pauses before claiming new work when open proof PRs or Gate A queue pressure exceed the ADR-058 submission-governor thresholds, while --prove-local and --dry-run remain available.

Fixed

  • Fixed the statement-binding generator (tools/gate_a/check_statement_binding.py) picking the wrong module when a theorem name collides with a private declaration in an alphabetically-earlier module. _module_declaring searched sorted(*.lean) for any theorem/lemma <name> and returned the first hit, even if it was private — which the binding cannot reference (the binding imports the module and uses the name by value), producing Unknown identifier. It now skips private declarations and binds to the public proof module. This unblocked goal nat-sq-lt-two-pow-s2, which looped through three failed attempts (#679, #963, #1173): a recompose proof in the earlier NatSqLtTwoPow re-used the sub-lemma name sq_lt_two_pow_step_from_five as a private helper, so every attempt's binding bound to that instead of the public proof in NatSqLtTwoPowS2.
  • The PR title classifier (tools/repo/pr_labels.py) now recognises the unblock(<goal>): swarm shape — the ADR-009 step that re-opens a blocked parent goal once its sub-lemmas are proved. Previously this sanctioned-and-frequently-used shape (41 PRs in history) matched no known shape, so the pr-conventions gate falsely failed every unblock PR and none received a label. It now classifies to a new swarm:unblock label and passes the convention gate.
  • Fixed a transient git fetch failure aborting a whole solving run (#983). All ADR-042 per-agent worktrees on a host share one object database, and a fetch racing a sibling fetch or a gc.auto repack could leave a thin-pack base object momentarily unreadable (failed to read delta-pack base object …fatal: unpack-objects error). With no retry, the first such blip made sync_repo exit and the run stopped. Every swarm fetch now goes through a git_fetch_retry helper (ADR-059) that runs with gc.auto=0 and retries with exponential backoff (UNSORRY_FETCH_RETRIES, UNSORRY_FETCH_BACKOFF); a fetch that still cannot complete surfaces as an infrastructure failure (exit 3) so supervise.sh backs off and reschedules instead of dying on the first hiccup.
  • Fixed the leaderboard auto-refresh so it actually commits docs/metrics/sourcing-leaderboard.json. The sourcing artifact was added to the generator's --write/--check set (the sourcing leaderboard, ADR-060) but was missing from the refresh workflow's $paths list, so the bot regenerated it but never staged it — and its per-$paths drift check falsely reported "in sync", so the sourcing board (including the Your Name@perttu attribution fold) never landed on main. Added the path so the refresh tracks, stages, and pushes it like the other generated artifacts.
  • The pr-conventions gate now gives a targeted diagnostic when a PR title fails only because of a leading bracket prefix (e.g. [codex] ci: …): it names the prefix and sh...
Read more

v1.15.0 — a leaner, cheaper Gate A at scale + decentralised-runner groundwork

15 Jun 22:28
83d0381

Choose a tag to compare

Headline: a leaner, cheaper Gate A at scale — plus groundwork for a decentralised runner — as proofs compound, Gate A is kept fast and memory-bounded instead of slowly OOM-ing: two more archive blocks (0003 and 0004, ADR-041) retire 58 proved goals and shrink the active library from 108 → 50 modules; archive validation switches to verify-on-ingest (git-blob provenance instead of a kernel re-replay, ADR-048) so archive PRs fit an 8 GB runner; push-to-main now kernel-replays incrementally with a daily full-replay backstop (ADR-048 Phase 2); the full-replay trigger narrows to olean-invalidating changes only; and a persistent Namespace .lake cache volume (ADR-046) removes the last constant per-job mathlib-restore cost. Soundness is unchanged throughout — every proof is still kernel-replayed once under its pinned toolchain. Alongside, ADR-049 lands the research + a Phase-0 conformance suite for a future decentralised CI runner, and the browser surfaces grow a proofs-over-time graph, a Top 5 view, and a shared home page.

Added

  • ADR-049 Phase-0 conformance suite (tools/gate_a/tests/test_decentralised_runner_conformance.py, #635) — locks the decentralised-runner soundness invariant: the central re-check feeds leanchecker only locally-derived modules (never a contributor artifact), an untrusted diff fails closed to a full re-check, and a changed module always drags its *Binding + reverse-import closure into scope.
  • ADR-049 research (#635) — a ranked options analysis (docs/proposals/decentralised-ci-runner-architecture.md) and ADR-049 (Proposed) recommending a tiered split: push the expensive lake build elaboration onto the contributor, keep a mandatory cheap central kernel re-check from canonical source as the sole load-bearing gate.
  • Leaderboard proofs-over-time view (#738) — a cumulative line graph of kernel-verified proofs by date, plus a shared Home · Leaderboard · Proof graph top nav and a new docs/index.html landing page.
  • Leaderboard Top 5 view — a three-tab switcher (Leaderboard · Top 5 · Proofs over time), README links from the home page, and small-screen tuning.
  • Persistent Namespace .lake cache volume (nscloud-cache-action, ADR-046) — mathlib oleans + the local build survive across jobs/runs, eliminating the ~2–3.5-min GitHub-cache mathlib restore paid on every job. Advisory: leanchecker still re-checks every olean.

Changed

  • Amended ADR-041: archive closed proof families only — a decomposition family (parent + subs + proofs + metadata) is archived as one atomic unit, and only when every member is proved.
  • Cut archive block 0003 (ADR-041) — retired 30 proved goals (whole trees + standalone) into packages/unsorry-archive-0003/; active library 108 → 78 modules.
  • Cut archive block 0004 (ADR-041) — retired 28 proved goals into packages/unsorry-archive-0004/; active library 78 → 50 modules.
  • Documented the archive-block cut as a repeatable runbook (SPEC-041-A §8/§9), including the "cut early and often, 40 is a ceiling not a goal" at-scale guidance.
  • Corrected the archive-cut runbook after 0003/0004: archive whole decomposition trees only (Gate B GB008/GB016 atomicity) and never regenerate generated docs in the cut (they race main's refresh bot).
  • Gate A push-to-main now replays incrementally (ADR-048 Phase 2) with a daily gate-a-full-replay.yml backstop; routine Gate A (proof PRs and push to main) routes to the 8 GB unsorry-1 runner. Soundness unchanged — every proof still kernel-replayed once.
  • Narrowed the Gate A full-replay trigger to olean-invalidating changes only (lean-toolchain, lakefile.*, lake-manifest.json), so gate/CI PRs no longer ride the memory-bound serial full replay that OOM-killed the 16 GB runner.
  • Gate A no longer rebuilds UnsorryLibrary in the audit/replay jobs on an exact-sha cache hit (ADR-045 amendment) — the audit job drops from ~535 s to ~320 s.
  • Adopted verify-on-ingest for archive validation (ADR-048) — archived proofs are checked byte-identical to the already-verified version (git-blob provenance) instead of re-running leanchecker, removing the archive-OOM class.

Fixed

  • gate-a-archive now runs on the 16 GB unsorry-2 runner (was routed to the 8 GB pool and OOM-killed, #764).
  • Archive-package replay now chunks at REPLAY_CHUNK_SIZE, bounding peak memory — fixes the exit-137 OOM on memory-bound runners (#764).
  • Archive-package validation runs in its own parallel gate-a-archive job (own headroom + swap + 120-min budget) instead of inside gate-a-prepare (#764).
  • Gate B's advisory AISP validator now runs on Node.js 24.

Full changelog: https://github.com/agenticsnz/unsorry/blob/main/CHANGELOG.md
Compare: v1.14.0...v1.15.0

v1.14.0 — the Identity Engine + a faster, leaner swarm

15 Jun 02:03
43a599d

Choose a tag to compare

Headline: the Identity Engine, and a faster, more self-sufficient swarm — the #400 big-hairy-goal lands as the Identity Engine (ADR-043), a themed program to mass-source mathlib-absent, non-trivial, swarm-provable identities & inequalities at scale, each vetted (absence + triviality + a compiling proof + an adversarial skeptic) before sourcing, with the next 200–300 staged as a candidate backlog. Around it, Gate A gets much faster — an incremental Lake build cache (ADR-045) replaces the ~16–21-min-per-job cold build, with timeout headroom for the rare cold path — and the swarm gets less idle and less brittle: parked-goal recovery (ADR-044), an archive-aware goal-immutability check, and self-verify/lint parity fixes that unblock main (#612).

Added

  • The Identity Engine (docs/plans/identity-engine.md, ADR-043): a themed program to mass-source mathlib-absent, non-trivial, swarm-provable elementary identities & inequalities at scale — the #400 big-hairy-goal. Each candidate is verified before sourcing (absence + ADR-035 triviality + a compiling intended proof + an adversarial skeptic pass), and the next 200–300 are staged as a vetted candidate backlog under backlog/candidates/.

Changed

  • Gate A's gate-a-prepare and gate-a-audit jobs now allow 45 minutes (was 30); gate-a-replay stays at 60 — headroom for the rare cold library build (#567/#573, SPEC-045-A).
  • Gate A now caches the local Lake build directory (.lake/build) across runs, so lake build UnsorryLibrary is incremental instead of cold-building the whole library in every job (ADR-045/SPEC-045-A) — the library is built once per run instead of three times. Soundness unchanged: the axiom audit + leanchecker kernel replay still validate every olean that reaches main.

Fixed

  • Fixed agent-lint failing on main (and therefore every open PR): rewrote the #612 binding-suppression check in swarm/agent.sh to clear shellcheck SC2015. No behaviour change; --self-test green.
  • Gate A's goal-immutability check (tools/gate_a/check_goal_immutability.py, ADR-018) is now archive-aware, unblocking ADR-041 archive-block retirement — a delete is exempt iff the goal id is in an archive manifest and the archived statement is byte-identical at the base ref.
  • The swarm no longer reports "nothing claimable" while dozens of open goals sit recoverable (ADR-044/SPEC-044-A): a --prove cycle with no claimable viable goal now re-surfaces one goal parked below TAU_V and retries it through the same pipeline with its accumulated lesson history.
  • The swarm's local proof self-verify now suppresses linter.unusedVariables in the regenerated ADR-011 statement-binding obligation, matching Gate A — so a proof that leaves a hypothesis unused is no longer needlessly decomposed (#612).

Full changelog: https://github.com/agenticsnz/unsorry/blob/main/CHANGELOG.md

v1.13.0 — harness & governance hardening

14 Jun 21:44
a758650

Choose a tag to compare

Harness & governance hardening — the swarm gets more self-sufficient. This is also the first release cut with the ADR-040 changelog-fragment tool (tools.changelog --release), folding 13 accumulated fragments since v1.12.0.

Added

  • Isolated per-agent worktree (ADR-042): each swarm/agent.sh run operates in its own git worktree.
  • Self-updating harness (ADR-039): the agent re-execs itself when the harness updates mid-run.
  • OpenAI --prove text-extraction fallback (ADR-041, #560): proof-specialised models that emit Lean as text rather than OpenAI tool calls can now prove, not just translate. Gate A re-checks every extracted proof, so soundness is unchanged.
  • Shared web-surface design language (ADR-038): one documented design language restyling the leaderboard and proof-graph visualiser.
  • Leaderboard model-distribution panel (#455) and a README preview SVG for the visualiser (#447).

Changed

  • Changelog fragments (ADR-040): per-PR changelog.d/ fragments replace direct [Unreleased] edits, so concurrent PRs never conflict on the changelog at any merge rate.
  • Gate A runner routing by replay workload (#441): light per-proof replay and heavy full replay run on separate Namespace profiles.
  • Leaderboard UI fixes (#455) and visualiser restyle (#447).

Fixed

  • The #387 retro-audit decide-noise misclassification.
  • The check_triviality --all retro-audit is no longer ~11 builds/goal (#411).
  • The triviality probe now classifies a failed to synthesize elaboration error as a probe-error, not non-trivial (#410).

See CHANGELOG.md for the full entries.

v1.12.0 - Leaderboard, fixed auto-update for proof graph

14 Jun 08:51
53cde3c

Choose a tag to compare

The contributor leaderboard (issue #270, ADR-023) — a standalone page ranking solvers by verified proofs, cross-linked with the #371 proof-graph visualiser.

Around it, the post-merge generated-artifact pipeline (board, leaderboard, visualisation) became self-healing: it now actually pushes to main via an admin REFRESH_TOKEN (the GH006 fix, #417) with a race-tolerant retry loop (#426), so the three surfaces refresh on every merge without red runs.

Contributor attribution is also corrected and guarded — a phantom-solver check (ADR-037) flags any solver≜ not backed by a proof-run, git author, or alias (#431/#433). Plus a Skills framework for agent-driven work, and the REFRESH_TOKEN secret unification.

See CHANGELOG.md for the full list.