Skip to content

v1.16.0 — contributor goal-sourcing (harder problems, more of them)

Choose a tag to compare

@cgbarlow cgbarlow released this 17 Jun 04:08
· 259 commits to main since this release
11778d9

Headline: opening problem-sourcing to contributors — harder problems, and many more of them — the #400 mandate becomes self-serve. A contributor-facing unsorry-goal-sourcing skill (ADR-060) drives the four-gate sourcing pipeline (absence → type-check → non-triviality → provable+skeptic) under a skill-enforced difficulty bar, backed by a tested gen_triples.py triple assembler and a new sourcing leaderboard that credits whoever sourced each goal — a no-pre-claim, merge-time-dedup model that works from a fork today. Around it the swarm gets steadier at scale: coordinated provers queue locally-verified branches and agent.sh --dispatch-queue meters them into PRs under the ADR-058 submission governor, with proof-PR admission control for the queued-dispatcher cutover and a scheduled pr-label-sweep backstop for unlabelled PRs. Process hygiene tightens — the protocol gate now enforces ADR/SPEC number uniqueness (ADR-061, closing the double-ADR-059 collision), the PR classifier learns the unblock(…) shape, and a bracket-prefixed title gets a targeted hint. The prove loop gets much faster by seeding each worktree's .lake/build from a warm library (a ~735 s cold verify drops to ~2 s; soundness unchanged — CI Gate A still re-verifies from scratch), and two leaderboard refresh/attribution fixes land the sourcing board and fold the Your Name placeholder into @perttu. A fork-submission research note (#1206) maps what is left (ADR-053/054 coordination, not the kernel boundary) to open sourcing to the world.

Added

  • Decided (ADR-060, SPEC-060-A, #400/#1533) to add a contributor-facing unsorry-goal-sourcing skill that walks contributors and agents through the four-gate sourcing pipeline to generate harder problems and more of them, with a skill-enforced difficulty bar, a tools/sourcing/gen_triples.py triple assembler, a no-pre-claim + merge-time-dedup conflict model that works from a fork today, parallel Phase-2 (Euler substrate) and Phase-3 (library growth) sourcing, and a new sourcing-leaderboard mode that credits contributors who source goals.
  • The protocol-compliance gate (tools/repo/pr_protocol.py, pr-protocol check) now enforces ADR/SPEC number uniqueness in addition to spec linkage (ADR-061 / SPEC-061-A). An added ADR-<n> that reuses a number already taken by a different ADR file — or an added SPEC-<n>-<letter> reusing an existing number+letter — now fails the gate, naming the colliding files (multiple spec letters per ADR, e.g. SPEC-003-A/B/C, stay legal). This closes the door on the kind of collision that put two ADR-059 files on main when #1757 and #1837 merged seconds apart.
  • Added tools/sourcing/gen_triples.py (ADR-060 / SPEC-060-A): a goal-triple assembler that turns one validated sourcing candidate into the three on-disk files — the goals/<slug>.lean sorry-stub, the SPEC-003-A goals/<slug>.aisp record (status≜open, sha≜∅), and the backlog/<slug>.md entry — and (with --validate) re-runs Gate B over them. Previously this triple was hand-authored; the tool makes it deterministic and Gate-B-clean, refuses to clobber an existing goal (ADR-018 immutability), and is the assembler the forthcoming unsorry-goal-sourcing skill invokes.
  • Added a research investigation (docs/proposals/non-contributor-fork-submission.md, #1206) into whether non-contributors can submit proofs from a fork. Finding: the autonomous --prove loop assumes canonical-repo write access at four points (claim/feature-push/PR/merge) plus a credentialed verification gate, so it does not work from a fork as wired — but accepting an untrusted proof is already soundness-safe under ADR-049's central re-check, so the remaining work is coordination and governance (ADR-053/054), not the kernel boundary. A maintainer-mediated manual path (--prove-local + hand-opened PR) already works today.
  • Added repository-side proof PR admission control for the queued dispatcher cutover: post-cutover direct feature/goal-*, prove/*, or prove(...) submissions are labelled, commented, closed, and short-circuited from expensive CI while existing pre-cutover PRs continue draining.
  • Added a scheduled pr-label-sweep workflow that re-runs the single-source PR title classifier (tools/repo/pr_labels.py) every 15 minutes over any open PR left with no label and applies the correct labels. This is a backstop for the per-PR pr-labels job, whose run can be queued-out, superseded, or cancelled when the swarm or contributors open many PRs at once — leaving the PR permanently unlabelled. Both paths reuse the same classifier, so labelling stays deterministic and the rules live in one place.
  • Coordinated proof producers now queue locally verified queued/prove/... branches by default instead of opening PRs immediately; ./swarm/agent.sh --dispatch-queue meters those branches into ordinary auto-merge PRs when the ADR-058 submission governor allows more Gate A work.
  • Added a sourcing leaderboard (ADR-060 / SPEC-060-A §6): python3 -m tools.leaderboard --sourcing credits whoever sourced each goal (added goals/<id>.aisp), independent of who proves it — ranked by goals sourced then difficulty points, with a proved/open split. Attribution is git add-author (the mechanism the proof leaderboard already uses), mapped through docs/metrics/contributor-aliases.json, so no goal-record schema change. docs/metrics/sourcing-leaderboard.json joins the standard --write/--check artifact set and is refreshed post-merge like the other leaderboards.
  • Added the unsorry-goal-sourcing skill (ADR-060 / SPEC-060-A, authored with skill-creator) under Skills/. It walks a contributor or agent through SOURCING new open goals — the four-gate pipeline (absence → type-check → non-triviality → provable+skeptic), a skill-enforced max-difficulty bar, the gen_triples.py triple format, the no-pre-claim + merge-time-dedup conflict model that works from a fork, and the chore(sourcing): ≤50-goal PR + #81/#400 announce discipline — with references/, assets/ templates, and an adversarial-skeptic sub-agent. It complements unsorry-proof-authoring (which proves existing goals).

Changed

  • Coordinated --prove now pauses before claiming new work when open proof PRs or Gate A queue pressure exceed the ADR-058 submission-governor thresholds, while --prove-local and --dry-run remain available.

Fixed

  • Fixed the statement-binding generator (tools/gate_a/check_statement_binding.py) picking the wrong module when a theorem name collides with a private declaration in an alphabetically-earlier module. _module_declaring searched sorted(*.lean) for any theorem/lemma <name> and returned the first hit, even if it was private — which the binding cannot reference (the binding imports the module and uses the name by value), producing Unknown identifier. It now skips private declarations and binds to the public proof module. This unblocked goal nat-sq-lt-two-pow-s2, which looped through three failed attempts (#679, #963, #1173): a recompose proof in the earlier NatSqLtTwoPow re-used the sub-lemma name sq_lt_two_pow_step_from_five as a private helper, so every attempt's binding bound to that instead of the public proof in NatSqLtTwoPowS2.
  • The PR title classifier (tools/repo/pr_labels.py) now recognises the unblock(<goal>): swarm shape — the ADR-009 step that re-opens a blocked parent goal once its sub-lemmas are proved. Previously this sanctioned-and-frequently-used shape (41 PRs in history) matched no known shape, so the pr-conventions gate falsely failed every unblock PR and none received a label. It now classifies to a new swarm:unblock label and passes the convention gate.
  • Fixed a transient git fetch failure aborting a whole solving run (#983). All ADR-042 per-agent worktrees on a host share one object database, and a fetch racing a sibling fetch or a gc.auto repack could leave a thin-pack base object momentarily unreadable (failed to read delta-pack base object …fatal: unpack-objects error). With no retry, the first such blip made sync_repo exit and the run stopped. Every swarm fetch now goes through a git_fetch_retry helper (ADR-059) that runs with gc.auto=0 and retries with exponential backoff (UNSORRY_FETCH_RETRIES, UNSORRY_FETCH_BACKOFF); a fetch that still cannot complete surfaces as an infrastructure failure (exit 3) so supervise.sh backs off and reschedules instead of dying on the first hiccup.
  • Fixed the leaderboard auto-refresh so it actually commits docs/metrics/sourcing-leaderboard.json. The sourcing artifact was added to the generator's --write/--check set (the sourcing leaderboard, ADR-060) but was missing from the refresh workflow's $paths list, so the bot regenerated it but never staged it — and its per-$paths drift check falsely reported "in sync", so the sourcing board (including the Your Name@perttu attribution fold) never landed on main. Added the path so the refresh tracks, stages, and pushes it like the other generated artifacts.
  • The pr-conventions gate now gives a targeted diagnostic when a PR title fails only because of a leading bracket prefix (e.g. [codex] ci: …): it names the prefix and shows the corrected title (ci: …). Previously the gate just listed all valid shapes, so agents that auto-prefix their titles (e.g. Codex) had no clear signal to strip the prefix. The first-token rule is now also documented in CONTRIBUTING.md and docs/pr-labels.md.
  • Fixed prove-mode verification recompiling the entire UnsorryLibrary (hundreds of modules, growing with every merged proof) from scratch in each fresh prove worktree — the dominant per-cycle cost, and crippling under concurrency where many agents contend for cores (measured: a ~735 s cold verify, ~9–12 min uncontended and 20–30 min with ten agents). The agent now keeps its own checkout's library oleans warm (ensure_warm_library, incremental after the first cycle) and seeds each prove worktree's .lake/build from it (seed_library_cache) so the verify compiles only the agent's new module: the same build drops to ~2 s (0 modules recompiled). This also speeds the provider's own iterative build/fix loop, which previously stalled for many minutes per attempt waiting on full rebuilds. Soundness is unchanged — CI Gate A still re-verifies the whole library from scratch (ADR-049); seeding only accelerates the local pre-check. Opt out with UNSORRY_SEED_LIBRARY=0.
  • Fixed the sourcing-leaderboard attribution gap where 4 goals appeared under the git default placeholder identity Your Name <you@example.com>. They were sourced by @perttu on an early commit (PR #258) before git was configured; docs/metrics/contributor-aliases.json now maps both that placeholder and Perttu's real git identity to the perttu handle, so the goals are credited to their real sourcer and Perttu's entries consolidate into one row.