Coding agents declare "done" on code that is still wrong. AgentOps catches that. Before a change counts as done, something that didn't write it has to check it: a different model, or a test that actually runs. No verdict = not done. It sits on top of the agent you already use (Claude Code, Codex, Cursor, OpenCode).
Pick your runtime and install:
# Claude Code
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace
# Codex CLI (macOS/Linux/WSL) — OpenCode: install-opencode.sh
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash
# Codex CLI (Windows):
irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.ps1 | iex
# Gemini / Antigravity
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-agy.sh | bash
# Other skills-compatible agents (Cursor, etc.)
npx skills@latest add boshu2/agentops --cursor -gThe ao CLI is optional but recommended (bookkeeping, retrieval, the release gate):
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops # macOS
# Windows: irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-ao.ps1 | iex
# Or release binaries / build from source (cli/README.md).Installs hookless. The only hard requirement is an agent runtime and git; everything else degrades gracefully. Dependencies: docs/dependencies.md · Day-2 ops (update, backup, recovery): docs/install-day2-ops.md.
- A validation membrane. Tests, gates,
/pre-mortem,/validate, and/councilprove or reject the work before you trust it. No verdict, not done. - A bookkeeper that outlives the session. Work is tracked as beads, and every verdict is bound into a hash-chained provenance ledger: tamper-evident, grep-able, and portable across sessions and models. The record is the proof a change was actually checked — not a memory of one.
- An evidence trail that's yours. Every run, decision, and verdict lands in
.agents/in your repo: grep-able, diff-able, portable to whatever model wins next quarter. AgentOps adds no hosted control plane and no telemetry; the corpus lives in your repo, not on our servers. Apache-2.0. - It runs on the agent you already pay for. Claude Code, Codex, Cursor, OpenCode. Same skills, same corpus.
> /validate --mixed # the agent reported this PR done
[membrane] evidence sealed → fresh-context judges, Claude Code + Codex CLI
[claude/judge-1] REFUTE /login has no rate limit — claimed "covered", isn't
[codex/judge-1] REFUTE token-bucket refill lacks jitter under burst
[claude/judge-2] PASS redis integration follows the repo pattern
Verdict: HOLD — not done. Fix /login limit + refill jitter, then re-verify.
Recorded as a proof artifact — no verdict, not done.
Already installed? Try it in three steps: make a small change and commit it, run ao verify my-first-change, then read the verdict. A model that had no part in writing the change reviews your commit, prints CONFIRMED or REFUTED, and records the result as a line in docs/provenance/ledger.jsonl inside your repo.
The rest is below the fold for anyone who wants the detail.
Every skill works alone; flows compose them. Full catalog: docs/SKILLS.md · Skill Router.
| Skill | Use it when |
|---|---|
/research |
you need codebase context and prior learnings before changing code |
/pre-mortem |
you want to pressure-test a plan before building |
/rpi |
you want discovery, build, validation, and bookkeeping in one flow |
/council |
you want independent judges (optionally Claude and Codex) to return one verdict |
/validate |
you want a code-quality and risk review before shipping |
/evolve |
a goal-driven improvement loop that runs without mutating source |
Repo-native control plane behind the skills. Full reference: CLI commands.
ao verify # independent verdict on your latest change
ao gate check --fast # the release gate before you push
ao provenance show <sha> # the recorded verdict trail for any commit
ao done <bead-id> # close tracked work with its verdict attached
ao quick-start # set up AgentOps in a repo
ao doctor # check reviewers, binary, and ledger health
# Experimental (still measuring whether these pay off; see the honest version below):
ao search "query" # search history and local knowledge
ao lookup --query "topic" # retrieve curated learnings
ao compile # rebuild the corpusThe whole loop runs in a plain session. No daemon, no scheduler, no cloud. For always-on work, it can hand each task to a background runner instead. Details: docs/3.0.md · operating loop.
Proven: independent verification that records a verdict, and a durable, tamper-evident record of it. A change isn't done until something that didn't write it checks it, and that verdict is bound into the provenance ledger. No verdict, not done.
The receipts are public: membrane receipts — every number derived straight from the verdict ledger, none hand-written.
Still measuring: whether the accumulated corpus makes the next session measurably better. We won't claim it until the numbers say so (ADR-0004, ADR-0011).
AgentOps proves the work. It doesn't write the code; your agent still does that, and the cross-checks cost tokens. The .agents/ folder is plain markdown your agents keep up as they go.
When the labs ship their own version of this, your .agents/ folder comes with you. It's in your repo, in plain markdown, Apache-2.0.
What 3.0 is · vs hosted code review · docs index · newcomer guide · architecture · FAQ · built on the 12-factor doctrine.
Contributing: docs/CONTRIBUTING.md (agents: read AGENTS.md, track work with br). License: Apache-2.0.