Releases: agenticsnz/unsorry
v1.20.0 — back open for business (resilient recovery + sharded Gate A)
Headline: back open for business — resilient recovery after the "ruv sledgehammer". A contributor's ~337 direct prove(...) PRs ("the ruv sledgehammer", #1904) flooded Gate A and stranded ~200 valid proofs behind a saturated verifier queue. The backlog is now cleared and the swarm is hardened so it can't re-form: the stranded/failed direct PRs were closed, ~318 valid proofs re-routed onto the metered queue, the orphaned CI runs cancelled, and attribution fixed (ruv → ruvnet). Around that we shipped a full resilience-and-recovery stack across v1.17.1–v1.20.0 — admission control now offers the re-route on close so a flood self-heals, a reroute_stranded.py tool recovers a stranded proof without re-proving, a scheduled queue-dispatcher drains the queue governor-metered, a close-superseded sweep retires originals as their re-routes land, and a recovery playbook documents it all. This release lands the capacity piece: sharded Gate A kernel replay (ADR-063, #1909) cuts the ~1-hour verification long pole ~N× across parallel runners so verification keeps pace with submissions — every olean still kernel-replayed exactly once. The doors are open again, and the queue drains itself.
Added
- Added sharded Gate A kernel replay (ADR-063 / SPEC-063-A) to attack the verification-capacity bottleneck (#1909). The full kernel replay is a ~1-hour serial pass over the whole library because
leancheckerholds mathlib resident per process (two won't fit one runner) — but the work is already chunked and embarrassingly parallel across runners. Newplanandreplay-shardsubcommands intools/gate_a/parallel_modules.pypartition the replay target set into N disjoint, covering shards (reusing the existing scope logic verbatim, so a shard never checks a different set than a full replay would) and replay one shard per parallel matrix runner, cutting the long pole to ~1/N wall-clock. Soundness is preserved — every olean is still kernel-replayed exactly once (the partition is unit-tested disjoint+covering, the matrix cover job fails closed on any non-green shard, and the daily full-replay backstop is unchanged), and shards share only the git SHA soleanchecker's inputs stay locally-derived (ADR-049). Ships with the tooling + tests and a non-required manualgate-a-shard-pilotworkflow that validates the matrix on real runners; promoting it into the requiredgate-a.ymlis a pilot-gated follow-up (ADR-058). - Added
swarm/run.sh: a one-command launcher for the governed swarm (ADR-058). It starts the two halves of the coordinated--proveflow together — one dispatcher loop (agent.sh --dispatch-queue) plus one resilient prover loop (supervise.sh --prove "$@") — sharing the shell'sUNSORRY_*env and torn down together on exit, so an operator runs./swarm/run.shinstead of managing two commands. Args pass through to the prover. Run exactly one dispatcher; for more provers, start additionalsupervise.sh --prove. Added to the agent-lint shellcheck/bash-n gate and documented in CONTRIBUTING.
Changed
- Made the
aisp-advisoryGate B check incremental — it now validates only the.aispcoordination records changed in the PR/push (viagit diffagainst the base, withfetch-depth: 0), instead of re-runningnpx aisp-validator@0.3.0over every record ingoals/translations/decompositions/library/index/proof-runson each run. The per-filenpxloop across thousands of immutableproof-runs/*.aispwas making this advisory check the slowest in the pipeline (~8 min); a typical PR touches a handful of records, so it now finishes in seconds. Behaviour is otherwise unchanged: same job name (aisp-advisory), stillcontinue-on-error(never reddens, ADR-003 / issue #318), and it falls back to validating all records when there's no usable base diff (first push / shallow history). The swarm-contract validation is unchanged. - Gate A now routes prepare/archive, audit, and replay work to dedicated Namespace profiles (
namespace-profile-unsorry-prepare,namespace-profile-unsorry-audit, andnamespace-profile-unsorry-replay) so operators can scale each verifier lane independently. - The Gate A
.lake/buildGitHub Actions cache (ADR-045) now only runs on non-Namespace profiles (if: needs.detect.outputs.volume != 'true'). On a Namespace runner the.lakecache volume (ADR-046) already persists.lake/buildacross jobs and runs, so the GitHub cache was redundant churn — a ~320 MB per-commit entry (keyed ongithub.sha) that thrashed the 10 GB repo cache and created one near-duplicate entry per PR. Gating it keeps the cache list small and entries reused efficiently. Trade-off: on Namespace the #801 exact-key build-skip no longer fires, but the warm volume makes audit/replay's library build a cheap incremental, so throughput is unchanged.
v1.19.0 — the proof-recovery pipeline goes automatic
Headline: the proof-recovery pipeline goes automatic — the queued-proof backlog now drains itself. A scheduled queue-dispatcher runs the governor-metered dispatcher every 15 minutes (it was previously only an operator-invoked mode), so re-routed and queued proofs flow into PRs that Gate A re-verifies and auto-merges; a close-superseded sweep retires each stranded direct PR once its goal lands, so the backlog shrinks instead of doubling; and docs/recovery.md (linked from a new README Recovery section) documents the whole submission/recovery machinery and the Gate A capacity backstop (ADR-058). Together with the earlier re-route tool and admission self-heal (v1.18.0 / v1.17.1), this closes the loop on "the ruv sledgehammer fix" (#1904) — stranded proofs recover into the metered queue with no manual PR surgery and no lost work.
Added
- Added
tools/repo/close_superseded.pyand a scheduledclose-supersededworkflow that closes a stranded directprove(...)PR once its goal is proved onmain(its queued re-route, or a peer, landed) — and never a PR whose goal is still open. This is the safe, incremental implementation of "close the stranded originals once their re-routes land" for the ruv-sledgehammer cleanup (#1904) and the capacity drain (#1909): as thequeue-dispatcherlands re-routed proofs, this sweep retires the superseded originals, so the backlog shrinks instead of doubling. Closing needs only the defaultGITHUB_TOKEN. Pure helpers are pytest-covered;--authorand--dry-runare supported. - Added
docs/recovery.mdand a Recovery section to the README documenting how the swarm submits proofs through the governor-metered queue, why a proof can get stranded (the direct-submission cutover + Gate A flood, #1904/#1909), and the self-healing recovery pipeline: the admission re-route offer,reroute_stranded.py(re-package without re-proving), the scheduledqueue-dispatcher, and theclose-supersededsweep — plus the operator quick-reference and the Gate A capacity backstop (ADR-058). - Added a scheduled
queue-dispatcherworkflow that drains the queued-proof backlog automatically. Theswarm/agent.sh --dispatch-queuedispatcher (SPEC-007-A) previously ran only when an operator invoked it by hand, soqueued/prove/*branches accumulated unbounded (the backlog in #1909). The workflow now runs it every 15 minutes (governor-metered viaUNSORRY_MAX_OPEN_PROVE_PRS/UNSORRY_MAX_GATE_A_IN_FLIGHT, ADR-058), re-packaging already-locally-verified queued branches into PRs that Gate A re-verifies from scratch. It authenticates with the adminREFRESH_TOKEN(a GITHUB_TOKEN-opened PR would not trigger Gate A) and degrades to a report-only notice when that secret is unset. Self-throttling: it dispatches only while in-flight Gate A work is below the cap and no-ops when full.
v1.18.0 — stranded-proof recovery tool + fuller browser surfaces
Headline: recovery tooling and fuller browser surfaces — a tools/repo/reroute_stranded.py maintainer tool re-routes a stranded direct proof PR onto the accepted queued path without re-proving (the dispatcher re-packages it; Gate A re-verifies from scratch) — the recovery engine behind "the ruv sledgehammer fix" (#1904). On the browser side, the leaderboard gains a Sourcing tab that finally surfaces the already-generated sourcing board, and the proof graph now renders every goal — standalone goals fold into one collapsed summary cluster per status that expands on click, instead of being omitted so the diagram looked nearly empty.
Added
- Added
tools/repo/reroute_stranded.py, a maintainer tool that re-routes a stranded direct proof PR through the accepted queued path (ADR-058) without re-proving. Directprove(...)PRs are no longer accepted (the queued-proof cutover), and pre-cutover ones strand withgate-a-audit/gate-a-replayperpetually queued. Since the dispatcher only re-packages an existing branch (Gate A re-verifies from scratch), the tool copies a stranded PR's proof files onto a freshqueued/prove/<goal>/…branch off currentorigin/main, Gate-B-validates it, and (with--push) pushes it so the dispatcher meters it into CI. Self-contained proofs re-route cleanly; one importing a now-archivedUnsorry.*module fails Gate A on dispatch (the safe outcome). Built to recover @ruvnet's stranded proofs (issue #1904, "the ruv sledgehammer fix"). - Added a Sourcing view to the leaderboard page (ADR-060): a fourth toggle that surfaces the sourcing leaderboard (
docs/metrics/sourcing-leaderboard.json) in the browser — who sourced each goal, ranked by goals sourced with a proved/open/difficulty split. The data was already generated bypython3 -m tools.leaderboard --sourcingbut had no browser surface; it is now one click from the contributor leaderboard, lazily fetched the first time the tab is opened.
Changed
- The proof-graph page now shows every goal, not just the ~7% that sit in decomposition forests. Standalone goals (no parent→sub lineage) are folded into one collapsed summary cluster per status alongside the forest (ADR-032 amendment / SPEC-032-A); on the interactive page a cluster expands on click into its individual, clickable goals and collapses again (the client rebuilds the Mermaid source from the embedded model, byte-identical to the server render when collapsed), while the static markdown shows the collapsed clusters and the full per-goal list stays in the table. Previously standalone goals were omitted from the diagram entirely, so it read as if most of the 762 goals were missing.
v1.17.1 — the ruv sledgehammer fix + board reconciliation
Headline: maintenance — "the ruv sledgehammer fix" and a board reconciliation — a patch release of fixes only. After a contributor's ~337 direct prove(...) PRs stranded against the queued-proof cutover (#1904), admission control now offers the re-route path when it closes a direct submission, so a flood self-heals into the metered queue instead of stranding, and @ruvnet's two git identities are mapped to the canonical handle so credit attaches. Alongside, the #1888 reconciliation flips two genuinely-proved goals that were still showing as open/translated (proved-status now matches the proof-index count), and the proof-graph page's heading scale and insets are aligned with the shared home/leaderboard card.
Fixed
- When admission control closes a post-cutover direct
prove(...)PR, the auto-comment now offers the re-route path as well as the re-prove path, so a contributor's direct-submission campaign self-heals into the metered queue instead of stranding (the failure mode behind #1904, "the ruv sledgehammer fix"). The comment now tells anyone whose branch already holds a verified proof to runpython3 -m tools.repo.reroute_stranded --pr <n> --push— the queued dispatcher re-packages it and Gate A re-verifies from scratch — in addition to the existing./swarm/agent.sh --provere-prove instructions. - Fixed two goals that carried a kernel-verified proof index record but whose goal record was never flipped to
proved, so they showed as open/translated on the queue and proof graph despite being solved (surfaced by the #1888 reconciliation; the proved-status count now matches the active proof-index count, 286).no-nat-sq-eq-two-mul-sq-s4— a decomposition sub-lemma whose parentno-nat-sq-eq-two-mul-sqis already proved — was leftstatus≜open/sha≜∅; it now recordsstatus≜provedand the proof sha.nat-zero-lt-succ— a grandfathered bootstrap lemma proved manually inUnsorry/Basic.leanwith nogoals/*.leanstub (the statement-binding generator deliberately skips such entries, ADR-011) — now recordsstatus≜proved; itsphase≜translate/lean≜∅are kept intentionally as the grandfathered marker. Statements are unchanged (ADR-018) — only the status/sha metadata is corrected. - Fixed the proof-graph page's layout drifting from the home and leaderboard pages: its heading scale and section margins now match the shared card (
text-5xl md:text-7xlwordmark,px-6 md:px-10inset — previouslytext-4xl md:text-6xl/md:px-8), so the three browser surfaces align. Also removed the duplicate Contributor leaderboard header link — the shared top-nav already links the leaderboard. - Fixed leaderboard attribution for @ruvnet, whose proof contributions carried
solver≜ruvnetbut were authored under the git identitiesruv <ruv@ruv.net>(the proof campaign) andrUv <…ruvnet@users.noreply.github.com>(merged PR #1278), so the ADR-037 provenance audit flagged them as uncorroborated.docs/metrics/contributor-aliases.jsonnow maps both identities to the canonical handleruvnet, so the credit attaches correctly (issue #1904).
v1.17.0 — a runnable sourcing loop for the swarm
Headline: a runnable sourcing loop for the swarm — goal-sourcing gets the same hands-off harness the prove loop already had. A new swarm/sourcing.sh (ADR-062) is the sourcing counterpart to swarm/agent.sh: it fires up Claude to run one cycle of the unsorry-goal-sourcing skill (ADR-060) — pick a hard theme, run the four sourcing gates, promote survivors to triples with gen_triples.py, and open one chore(sourcing): PR — answering the #1837 maintainer ask for "a bash script run like ./swarm/agent.sh that fires up Claude and runs a cycle of goal sourcing." It is bounded by default (one cycle; --cycles N to repeat) since sourcing has no empty-pool terminator, and reuses agent.sh's preflight, ADR-016 failure classification, ADR-059 fetch resilience, and 0/1/2/3 exit codes so supervise.sh wraps it unchanged. Alongside, a refresh-workflow fix lands the long-stale docs/proof-graph.svg (#1888) — it was regenerated every run but never staged, frozen at 153 goals · 104 proved while the real total had grown to 762 goals · 284 proved.
Added
- Added
swarm/sourcing.sh, the goal-sourcing runner (ADR-062 / SPEC-062-A) — the sourcing counterpart toswarm/agent.sh. It fires up Claude to run one cycle of theunsorry-goal-sourcingskill (ADR-060): pick a hard theme, run the four sourcing gates over candidates, promote survivors to triples withgen_triples.py, and open onechore(sourcing):PR of≤ --max-goalsgoals. Unlike the prove loop it is bounded by default (one cycle;--cycles N/UNSORRY_SOURCING_CYCLESto repeat) because sourcing has no empty-pool terminator, supports--theme,--max-goals(clamped to 1..50), and--dry-run, and reusesagent.sh's preflight, ADR-016 infrastructure-failure classification, ADR-059 fetch resilience, and 0/1/2/3 exit codes soswarm/supervise.shwraps it unchanged. Driven by a newswarm/prompts/source.mdplaybook; shellcheck-clean with a hermetic--self-test, both wired intoagent-lint.yml.
Fixed
- Fixed the proof-graph preview (
docs/proof-graph.svg) being permanently stale, so it under-counted solved problems (#1888). Theproofs-visualisationrefresh workflow regenerates all three visualiser outputs withtools.visualiser --write, but its commit$pathslisted only the.mdtable and.htmlpage — notdocs/proof-graph.svg. So the bot regenerated the SVG every run but never staged it, and its per-$pathsgit diffshort-circuited to "already in sync", leaving the SVG frozen at an old snapshot (153 goals · 104 proved) while the real total had grown to762 goals · 284 proved(+138 archived). Same path-gap class as the earlier sourcing-leaderboard refresh fix. Addeddocs/proof-graph.svgto the workflow's tracked$pathsand regenerated it somainreflects reality now. The leaderboard itself was unaffected — its--checkwas already green and it correctly counts all 424 verified proofs including archived ones.
v1.16.0 — contributor goal-sourcing (harder problems, more of them)
Headline: opening problem-sourcing to contributors — harder problems, and many more of them — the #400 mandate becomes self-serve. A contributor-facing unsorry-goal-sourcing skill (ADR-060) drives the four-gate sourcing pipeline (absence → type-check → non-triviality → provable+skeptic) under a skill-enforced difficulty bar, backed by a tested gen_triples.py triple assembler and a new sourcing leaderboard that credits whoever sourced each goal — a no-pre-claim, merge-time-dedup model that works from a fork today. Around it the swarm gets steadier at scale: coordinated provers queue locally-verified branches and agent.sh --dispatch-queue meters them into PRs under the ADR-058 submission governor, with proof-PR admission control for the queued-dispatcher cutover and a scheduled pr-label-sweep backstop for unlabelled PRs. Process hygiene tightens — the protocol gate now enforces ADR/SPEC number uniqueness (ADR-061, closing the double-ADR-059 collision), the PR classifier learns the unblock(…) shape, and a bracket-prefixed title gets a targeted hint. The prove loop gets much faster by seeding each worktree's .lake/build from a warm library (a ~735 s cold verify drops to ~2 s; soundness unchanged — CI Gate A still re-verifies from scratch), and two leaderboard refresh/attribution fixes land the sourcing board and fold the Your Name placeholder into @perttu. A fork-submission research note (#1206) maps what is left (ADR-053/054 coordination, not the kernel boundary) to open sourcing to the world.
Added
- Decided (ADR-060, SPEC-060-A, #400/#1533) to add a contributor-facing
unsorry-goal-sourcingskill that walks contributors and agents through the four-gate sourcing pipeline to generate harder problems and more of them, with a skill-enforced difficulty bar, atools/sourcing/gen_triples.pytriple assembler, a no-pre-claim + merge-time-dedup conflict model that works from a fork today, parallel Phase-2 (Euler substrate) and Phase-3 (library growth) sourcing, and a new sourcing-leaderboard mode that credits contributors who source goals. - The protocol-compliance gate (
tools/repo/pr_protocol.py,pr-protocolcheck) now enforces ADR/SPEC number uniqueness in addition to spec linkage (ADR-061 / SPEC-061-A). An addedADR-<n>that reuses a number already taken by a different ADR file — or an addedSPEC-<n>-<letter>reusing an existing number+letter — now fails the gate, naming the colliding files (multiple spec letters per ADR, e.g. SPEC-003-A/B/C, stay legal). This closes the door on the kind of collision that put twoADR-059files onmainwhen #1757 and #1837 merged seconds apart. - Added
tools/sourcing/gen_triples.py(ADR-060 / SPEC-060-A): a goal-triple assembler that turns one validated sourcing candidate into the three on-disk files — thegoals/<slug>.leansorry-stub, the SPEC-003-Agoals/<slug>.aisprecord (status≜open,sha≜∅), and thebacklog/<slug>.mdentry — and (with--validate) re-runs Gate B over them. Previously this triple was hand-authored; the tool makes it deterministic and Gate-B-clean, refuses to clobber an existing goal (ADR-018 immutability), and is the assembler the forthcomingunsorry-goal-sourcingskill invokes. - Added a research investigation (docs/proposals/non-contributor-fork-submission.md, #1206) into whether non-contributors can submit proofs from a fork. Finding: the autonomous
--proveloop assumes canonical-repo write access at four points (claim/feature-push/PR/merge) plus a credentialed verification gate, so it does not work from a fork as wired — but accepting an untrusted proof is already soundness-safe under ADR-049's central re-check, so the remaining work is coordination and governance (ADR-053/054), not the kernel boundary. A maintainer-mediated manual path (--prove-local+ hand-opened PR) already works today. - Added repository-side proof PR admission control for the queued dispatcher cutover: post-cutover direct
feature/goal-*,prove/*, orprove(...)submissions are labelled, commented, closed, and short-circuited from expensive CI while existing pre-cutover PRs continue draining. - Added a scheduled
pr-label-sweepworkflow that re-runs the single-source PR title classifier (tools/repo/pr_labels.py) every 15 minutes over any open PR left with no label and applies the correct labels. This is a backstop for the per-PRpr-labelsjob, whose run can be queued-out, superseded, or cancelled when the swarm or contributors open many PRs at once — leaving the PR permanently unlabelled. Both paths reuse the same classifier, so labelling stays deterministic and the rules live in one place. - Coordinated proof producers now queue locally verified
queued/prove/...branches by default instead of opening PRs immediately;./swarm/agent.sh --dispatch-queuemeters those branches into ordinary auto-merge PRs when the ADR-058 submission governor allows more Gate A work. - Added a sourcing leaderboard (ADR-060 / SPEC-060-A §6):
python3 -m tools.leaderboard --sourcingcredits whoever sourced each goal (addedgoals/<id>.aisp), independent of who proves it — ranked by goals sourced then difficulty points, with a proved/open split. Attribution is git add-author (the mechanism the proof leaderboard already uses), mapped throughdocs/metrics/contributor-aliases.json, so no goal-record schema change.docs/metrics/sourcing-leaderboard.jsonjoins the standard--write/--checkartifact set and is refreshed post-merge like the other leaderboards. - Added the
unsorry-goal-sourcingskill (ADR-060 / SPEC-060-A, authored with skill-creator) underSkills/. It walks a contributor or agent through SOURCING new open goals — the four-gate pipeline (absence → type-check → non-triviality → provable+skeptic), a skill-enforced max-difficulty bar, thegen_triples.pytriple format, the no-pre-claim + merge-time-dedup conflict model that works from a fork, and thechore(sourcing):≤50-goal PR + #81/#400 announce discipline — with references/, assets/ templates, and an adversarial-skeptic sub-agent. It complementsunsorry-proof-authoring(which proves existing goals).
Changed
- Coordinated
--provenow pauses before claiming new work when open proof PRs or Gate A queue pressure exceed the ADR-058 submission-governor thresholds, while--prove-localand--dry-runremain available.
Fixed
- Fixed the statement-binding generator (
tools/gate_a/check_statement_binding.py) picking the wrong module when a theorem name collides with aprivatedeclaration in an alphabetically-earlier module._module_declaringsearchedsorted(*.lean)for anytheorem/lemma <name>and returned the first hit, even if it wasprivate— which the binding cannot reference (the binding imports the module and uses the name by value), producingUnknown identifier. It now skipsprivatedeclarations and binds to the public proof module. This unblocked goalnat-sq-lt-two-pow-s2, which looped through three failed attempts (#679, #963, #1173): a recompose proof in the earlierNatSqLtTwoPowre-used the sub-lemma namesq_lt_two_pow_step_from_fiveas aprivatehelper, so every attempt's binding bound to that instead of the public proof inNatSqLtTwoPowS2. - The PR title classifier (
tools/repo/pr_labels.py) now recognises theunblock(<goal>):swarm shape — the ADR-009 step that re-opens ablockedparent goal once its sub-lemmas are proved. Previously this sanctioned-and-frequently-used shape (41 PRs in history) matched no known shape, so thepr-conventionsgate falsely failed everyunblockPR and none received a label. It now classifies to a newswarm:unblocklabel and passes the convention gate. - Fixed a transient
git fetchfailure aborting a whole solving run (#983). All ADR-042 per-agent worktrees on a host share one object database, and a fetch racing a sibling fetch or agc.autorepack could leave a thin-pack base object momentarily unreadable (failed to read delta-pack base object …→fatal: unpack-objects error). With no retry, the first such blip madesync_repoexit and the run stopped. Every swarm fetch now goes through agit_fetch_retryhelper (ADR-059) that runs withgc.auto=0and retries with exponential backoff (UNSORRY_FETCH_RETRIES,UNSORRY_FETCH_BACKOFF); a fetch that still cannot complete surfaces as an infrastructure failure (exit 3) sosupervise.shbacks off and reschedules instead of dying on the first hiccup. - Fixed the leaderboard auto-refresh so it actually commits
docs/metrics/sourcing-leaderboard.json. The sourcing artifact was added to the generator's--write/--checkset (the sourcing leaderboard, ADR-060) but was missing from the refresh workflow's$pathslist, so the bot regenerated it but never staged it — and its per-$pathsdrift check falsely reported "in sync", so the sourcing board (including theYour Name→ @perttu attribution fold) never landed onmain. Added the path so the refresh tracks, stages, and pushes it like the other generated artifacts. - The
pr-conventionsgate now gives a targeted diagnostic when a PR title fails only because of a leading bracket prefix (e.g.[codex] ci: …): it names the prefix and sh...
v1.15.0 — a leaner, cheaper Gate A at scale + decentralised-runner groundwork
Headline: a leaner, cheaper Gate A at scale — plus groundwork for a decentralised runner — as proofs compound, Gate A is kept fast and memory-bounded instead of slowly OOM-ing: two more archive blocks (0003 and 0004, ADR-041) retire 58 proved goals and shrink the active library from 108 → 50 modules; archive validation switches to verify-on-ingest (git-blob provenance instead of a kernel re-replay, ADR-048) so archive PRs fit an 8 GB runner; push-to-main now kernel-replays incrementally with a daily full-replay backstop (ADR-048 Phase 2); the full-replay trigger narrows to olean-invalidating changes only; and a persistent Namespace .lake cache volume (ADR-046) removes the last constant per-job mathlib-restore cost. Soundness is unchanged throughout — every proof is still kernel-replayed once under its pinned toolchain. Alongside, ADR-049 lands the research + a Phase-0 conformance suite for a future decentralised CI runner, and the browser surfaces grow a proofs-over-time graph, a Top 5 view, and a shared home page.
Added
- ADR-049 Phase-0 conformance suite (
tools/gate_a/tests/test_decentralised_runner_conformance.py, #635) — locks the decentralised-runner soundness invariant: the central re-check feedsleancheckeronly locally-derived modules (never a contributor artifact), an untrusted diff fails closed to a full re-check, and a changed module always drags its*Binding+ reverse-import closure into scope. - ADR-049 research (#635) — a ranked options analysis (
docs/proposals/decentralised-ci-runner-architecture.md) and ADR-049 (Proposed) recommending a tiered split: push the expensivelake buildelaboration onto the contributor, keep a mandatory cheap central kernel re-check from canonical source as the sole load-bearing gate. - Leaderboard proofs-over-time view (#738) — a cumulative line graph of kernel-verified proofs by date, plus a shared Home · Leaderboard · Proof graph top nav and a new
docs/index.htmllanding page. - Leaderboard Top 5 view — a three-tab switcher (Leaderboard · Top 5 · Proofs over time), README links from the home page, and small-screen tuning.
- Persistent Namespace
.lakecache volume (nscloud-cache-action, ADR-046) — mathlib oleans + the local build survive across jobs/runs, eliminating the ~2–3.5-min GitHub-cache mathlib restore paid on every job. Advisory:leancheckerstill re-checks every olean.
Changed
- Amended ADR-041: archive closed proof families only — a decomposition family (parent + subs + proofs + metadata) is archived as one atomic unit, and only when every member is proved.
- Cut archive block 0003 (ADR-041) — retired 30 proved goals (whole trees + standalone) into
packages/unsorry-archive-0003/; active library 108 → 78 modules. - Cut archive block 0004 (ADR-041) — retired 28 proved goals into
packages/unsorry-archive-0004/; active library 78 → 50 modules. - Documented the archive-block cut as a repeatable runbook (SPEC-041-A §8/§9), including the "cut early and often, 40 is a ceiling not a goal" at-scale guidance.
- Corrected the archive-cut runbook after 0003/0004: archive whole decomposition trees only (Gate B GB008/GB016 atomicity) and never regenerate generated docs in the cut (they race main's refresh bot).
- Gate A push-to-
mainnow replays incrementally (ADR-048 Phase 2) with a dailygate-a-full-replay.ymlbackstop; routine Gate A (proof PRs and push tomain) routes to the 8 GBunsorry-1runner. Soundness unchanged — every proof still kernel-replayed once. - Narrowed the Gate A full-replay trigger to olean-invalidating changes only (
lean-toolchain,lakefile.*,lake-manifest.json), so gate/CI PRs no longer ride the memory-bound serial full replay that OOM-killed the 16 GB runner. - Gate A no longer rebuilds
UnsorryLibraryin the audit/replay jobs on an exact-sha cache hit (ADR-045 amendment) — the audit job drops from ~535 s to ~320 s. - Adopted verify-on-ingest for archive validation (ADR-048) — archived proofs are checked byte-identical to the already-verified version (git-blob provenance) instead of re-running
leanchecker, removing the archive-OOM class.
Fixed
gate-a-archivenow runs on the 16 GBunsorry-2runner (was routed to the 8 GB pool and OOM-killed, #764).- Archive-package replay now chunks at
REPLAY_CHUNK_SIZE, bounding peak memory — fixes the exit-137 OOM on memory-bound runners (#764). - Archive-package validation runs in its own parallel
gate-a-archivejob (own headroom + swap + 120-min budget) instead of insidegate-a-prepare(#764). - Gate B's advisory AISP validator now runs on Node.js 24.
Full changelog: https://github.com/agenticsnz/unsorry/blob/main/CHANGELOG.md
Compare: v1.14.0...v1.15.0
v1.14.0 — the Identity Engine + a faster, leaner swarm
Headline: the Identity Engine, and a faster, more self-sufficient swarm — the #400 big-hairy-goal lands as the Identity Engine (ADR-043), a themed program to mass-source mathlib-absent, non-trivial, swarm-provable identities & inequalities at scale, each vetted (absence + triviality + a compiling proof + an adversarial skeptic) before sourcing, with the next 200–300 staged as a candidate backlog. Around it, Gate A gets much faster — an incremental Lake build cache (ADR-045) replaces the ~16–21-min-per-job cold build, with timeout headroom for the rare cold path — and the swarm gets less idle and less brittle: parked-goal recovery (ADR-044), an archive-aware goal-immutability check, and self-verify/lint parity fixes that unblock main (#612).
Added
- The Identity Engine (
docs/plans/identity-engine.md, ADR-043): a themed program to mass-source mathlib-absent, non-trivial, swarm-provable elementary identities & inequalities at scale — the #400 big-hairy-goal. Each candidate is verified before sourcing (absence + ADR-035 triviality + a compiling intended proof + an adversarial skeptic pass), and the next 200–300 are staged as a vetted candidate backlog underbacklog/candidates/.
Changed
- Gate A's
gate-a-prepareandgate-a-auditjobs now allow 45 minutes (was 30);gate-a-replaystays at 60 — headroom for the rare cold library build (#567/#573, SPEC-045-A). - Gate A now caches the local Lake build directory (
.lake/build) across runs, solake build UnsorryLibraryis incremental instead of cold-building the whole library in every job (ADR-045/SPEC-045-A) — the library is built once per run instead of three times. Soundness unchanged: the axiom audit +leancheckerkernel replay still validate every olean that reachesmain.
Fixed
- Fixed
agent-lintfailing onmain(and therefore every open PR): rewrote the #612 binding-suppression check inswarm/agent.shto clear shellcheck SC2015. No behaviour change;--self-testgreen. - Gate A's goal-immutability check (
tools/gate_a/check_goal_immutability.py, ADR-018) is now archive-aware, unblocking ADR-041 archive-block retirement — a delete is exempt iff the goal id is in an archive manifest and the archived statement is byte-identical at the base ref. - The swarm no longer reports "nothing claimable" while dozens of open goals sit recoverable (ADR-044/SPEC-044-A): a
--provecycle with no claimable viable goal now re-surfaces one goal parked belowTAU_Vand retries it through the same pipeline with its accumulated lesson history. - The swarm's local proof self-verify now suppresses
linter.unusedVariablesin the regenerated ADR-011 statement-binding obligation, matching Gate A — so a proof that leaves a hypothesis unused is no longer needlessly decomposed (#612).
Full changelog: https://github.com/agenticsnz/unsorry/blob/main/CHANGELOG.md
v1.13.0 — harness & governance hardening
Harness & governance hardening — the swarm gets more self-sufficient. This is also the first release cut with the ADR-040 changelog-fragment tool (tools.changelog --release), folding 13 accumulated fragments since v1.12.0.
Added
- Isolated per-agent worktree (ADR-042): each
swarm/agent.shrun operates in its own git worktree. - Self-updating harness (ADR-039): the agent re-execs itself when the harness updates mid-run.
- OpenAI
--provetext-extraction fallback (ADR-041, #560): proof-specialised models that emit Lean as text rather than OpenAI tool calls can now prove, not just translate. Gate A re-checks every extracted proof, so soundness is unchanged. - Shared web-surface design language (ADR-038): one documented design language restyling the leaderboard and proof-graph visualiser.
- Leaderboard model-distribution panel (#455) and a README preview SVG for the visualiser (#447).
Changed
- Changelog fragments (ADR-040): per-PR
changelog.d/fragments replace direct[Unreleased]edits, so concurrent PRs never conflict on the changelog at any merge rate. - Gate A runner routing by replay workload (#441): light per-proof replay and heavy full replay run on separate Namespace profiles.
- Leaderboard UI fixes (#455) and visualiser restyle (#447).
Fixed
- The #387 retro-audit
decide-noise misclassification. - The
check_triviality --allretro-audit is no longer ~11 builds/goal (#411). - The triviality probe now classifies a
failed to synthesizeelaboration error as a probe-error, not non-trivial (#410).
See CHANGELOG.md for the full entries.
v1.12.0 - Leaderboard, fixed auto-update for proof graph
The contributor leaderboard (issue #270, ADR-023) — a standalone page ranking solvers by verified proofs, cross-linked with the #371 proof-graph visualiser.
Around it, the post-merge generated-artifact pipeline (board, leaderboard, visualisation) became self-healing: it now actually pushes to main via an admin REFRESH_TOKEN (the GH006 fix, #417) with a race-tolerant retry loop (#426), so the three surfaces refresh on every merge without red runs.
Contributor attribution is also corrected and guarded — a phantom-solver check (ADR-037) flags any solver≜ not backed by a proof-run, git author, or alias (#431/#433). Plus a Skills framework for agent-driven work, and the REFRESH_TOKEN secret unification.
See CHANGELOG.md for the full list.