Repo-agnostic engine for running Claude Code (and, in future, other CLI agents) autonomy loops against any target repo, from one operator's account.
One guided entry point chains the whole onboarding sequence:
bin/quickstart.sh /path/to/target-repoIt scaffolds the .autonomy/ pack, walks you through the minimum config
(board owner/title, model, merge-gate strategy — Enter keeps the shown value;
writes are comment-preserving), runs the doctor report, offers to create the
dedicated worktree + launchd plist, offers to register the repo for the
dashboard, and prints the exact go-live commands. It never runs launchctl
itself — loading the supervisor stays a deliberate operator step.
Idempotent: re-run it any time; pressing Enter everywhere changes nothing.
Every prompt has a flag twin (--board-owner, --board-title, --model,
--merge-gate, --worktree yes|no, --register yes|no) for non-interactive
use.
Under the hood it runs the same tools you can drive by hand:
# 1. Scaffold a new target repo's pack:
bin/onboard.sh /path/to/target-repo
# edit /path/to/target-repo/.autonomy/config.yaml
# 2. Check it's ready:
bin/doctor.sh /path/to/target-repo
# 3. Create its dedicated worktree + launchd plist:
bin/setup_worktree.sh /path/to/target-repo
# 4. Load it (see setup_worktree.sh's own printed next-steps for the exact commands)Every target repo needs a .autonomy/ directory with exactly these three files:
loop_prompt.md— the standing task, passed as the primary prompt (claude -p).hard_rules.md— non-negotiable safety rules, appended to the session's system prompt.config.yaml— project policy (see schema below).bin/onboard.shscaffolds all three fromtemplates/autonomy-pack/.
.autonomy/config.yaml existing and parsing is the engine's hard requirement for treating a
directory as a valid target repo — doctor.sh/supervisor.sh both refuse to proceed without it.
board:
owner: <github-user-or-org>
project_title: "<Projects v2 board title>"
engine:
label: <slug> # optional; disambiguates two repos sharing a basename
requires_claude_md: <bool> # hard-fail (not just warn) if .claude/CLAUDE.md is missing
account_key: <key> # optional; which account's shared usage-limit marker this
# repo participates in (default: "default" -- all repos on
# this machine share one account's rate-limit state)
agent:
type: claude # claude | codex (only claude has an adapter implemented)
model:
primary: <model-id>
fallback: <model-id>
config: {} # opaque, adapter-owned pass-through (unused by the claude adapter)
merge_gate:
strategy: manual | ci_only | bot_comment | gh_review
author_login: <string> # bot_comment only
marker: <string> # bot_comment only
doc_only_extensions: [<ext>] # bot_comment only, e.g. [".md"]
reviewer_login: <string> # gh_review only
worktree:
default_path: "../.{repo-slug}-autonomy" # worktree path: positional arg overrides, {repo-slug} substitutedEvery value is either optional-with-an-engine-default or required-only-for-the-strategy-that-uses- it. Nothing in the engine hardcodes any one repo's actual values.
{repo-slug} = engine.label if set, else the target repo's directory basename, lowercased,
non-alphanumeric runs collapsed to -.
CI-green is checked first (any failing/pending check refuses; a gh API failure itself refuses,
never treated as green; ci_only additionally refuses on zero configured checks). Then:
| Strategy | What it checks |
|---|---|
manual (default) |
Nothing further — never auto-merges. PRs stay open for a human. |
ci_only |
Nothing further — CI green is the whole gate. |
bot_comment |
Latest matching issue comment (by author_login + marker) postdates the head commit and reads APPROVE, no BLOCKING/REQUEST CHANGES. Includes a doc-only fast path for PRs where every changed file matches doc_only_extensions. |
gh_review |
Latest GitHub Review object from reviewer_login postdates the head commit and its state == APPROVED. |
bin/safe_merge.sh <pr-number> is the only sanctioned merge path — the loop must never call
gh pr merge directly.
| Script | Purpose |
|---|---|
supervisor.sh --repo <path> [--agent-type] [--model] [--fallback-model] [--label] |
The main loop launchd runs |
quickstart.sh <target-repo> [flags] |
Guided single-entry onboarding: onboard → minimum config → doctor → optional worktree → optional dashboard registration → printed go-live commands (never runs launchctl) |
control.sh list | register | unregister | start | stop | pause | resume |
Multi-repo control unit over ~/.config/autonomy/repos: loop states from the supervisor's own lock/sentinel, start/stop via launchctl against the installed plists, graceful pause/resume via the sentinel. --all fans out. Never provisions |
onboard.sh <target-repo> |
Scaffold .autonomy/ (idempotent) |
doctor.sh <target-repo> |
Full readiness report (network calls; diagnostic-only, never provisions) |
setup_worktree.sh <target-repo> [worktree-path] |
Create/reuse the dedicated worktree + install the launchd plist |
worktree_gc.sh --repo <path> |
Prune stale worktrees + merged branches |
safe_merge.sh <pr-number> |
The only sanctioned merge path |
board.sh status <issue#> "<status>" | add <issue#> |
Best-effort GitHub Projects v2 board updates |
unblock_dependents.sh <merged-pr-number> |
Post-merge "blocked by #X" notifier |
dashboard.py --repo <path> [--repo …] [--port 8787] |
Control-room page. Stdlib HTTP+SSE, binds 127.0.0.1 only. Renders the engine's emitted artifacts (session logs, supervisor.log, git/gh, config, quota); lifecycle controls (start / graceful-stop / hard-stop / resume) via a token-guarded POST |
agents/claude.sh |
The Claude Code agent adapter (only one implemented) |
Three levers, distinct on purpose:
- Start (hard) / stop (hard):
launchctl bootstrap/launchctl bootoutthe plistsetup_worktree.shinstalled. Hard-stop kills the supervisor process; a mid-session hard-stop interrupts the running agent. - Graceful-stop (pause): create the sentinel file
<target-repo>/var/autonomy-logs/autonomy-PAUSE. The supervisor checks it at the top of the loop, so the current session always finishes — never a mid-session kill — then idles (polling everyPAUSE_POLL=30s). It logs the pause once tosupervisor.log. - Resume / start-if-stopped: remove the sentinel; the supervisor logs "resuming" and continues.
(Under launchd
KeepAlive=true, exiting on pause would just be relaunched — idling is the only stop that holds, which is why graceful-stop pauses rather than exits.)
The dashboard's graceful-stop / resume controls (issue #10) drive this sentinel.
All three levers are also available across every registered repo at once through
bin/control.sh (start/stop wrap launchctl against the installed plist;
pause/resume wrap the sentinel; --all fans out over ~/.config/autonomy/repos).
When a session hits the account rate limit, the agent adapter extracts the
API-reported reset epoch and the supervisor persists it (that split is an
invariant) — to the repo-local var/autonomy-logs/.last_usage_reset and to an
account-shared marker ~/.config/autonomy/usage-reset.<account_key>, so parallel
supervisors on the same account back off together instead of each rediscovering the
wall. Waits use the latest valid epoch across both files; garbage/stale/torn
markers are ignored per-file (fail-safe), and a clean session clears both. Repos on
different accounts set engine.account_key to keep their markers separate.
bin/dashboard.py --repo /path/to/worktree [--repo /another] [--port 8787]
# open http://127.0.0.1:8787/--repo can be omitted: discovery falls back to the AUTONOMY_DASHBOARD_REPOS
environment variable (newline-separated paths), then to ~/.config/autonomy/repos
(one path per line — quickstart.sh's register step appends here). CLI --repo
always wins.
A single self-contained local page — stdlib HTTP + SSE, localhost-bind only, no build step. It exposes what the engine already emits, nothing invented:
- Now — each repo's current worker: working / idle / paused / stopped (working-vs-idle is the freshness of the session log, not the lock pid), current step, in-flight ticket, elapsed.
- Repos & roles — the multi-role roster (Coder live; PM/QA/Researcher shown per docs/agent-org-design.md, rendered even before they're built).
- Activity — tree / timeline / tally over the session's stream-json tool calls (subagents
nest by
parent_tool_use_id). - Account quota — real 5-hour and weekly
utilizationfromrate_limit_events. - Throughput — server-sampled output tok/min over wall-clock (flatlines when idle).
- Supervisor voice — the loop's own decisions, tailing
supervisor.log. - Git in flight — open PRs (CI / review / mergeable) + recently-merged tickets.
Lifecycle controls (per running/stopped repo, in the Now cards): start / graceful-stop / hard-stop
/ resume. Behind POST /api/control, which requires a per-process token embedded in the served page
(defeats cross-origin / DNS-rebinding drive-by) and only ever acts on a managed repo — lifecycle
only, never a target repo's trade/order path. Hard-stop and start prompt for confirmation; start
never auto-fires. graceful-stop/resume drive the autonomy-PAUSE sentinel; hard-stop/start drive
launchctl bootout/bootstrap on the repo's installed plist.
Design + the research behind it: docs/dashboard-design.md, docs/control-room-research.md. Dark/light toggle.
bin/agents/<type>.sh, dispatched by agent.type. Each implements two functions:
agent_invoke(prompt_file, safety_file, model, fallback_model, log_file) -> exit codeagent_classify_outcome(log_file, exit_code) -> "success" | "usage_limit [epoch]" | "error"
Only claude.sh exists today. A codex.sh is a real future possibility (Codex's CLI differs
structurally — no system-prompt-append flag, its own JSONL schema, no native fallback-model
support) but is not built or tested here.
bash tests/run_all.sh