AgentOps

Autonomous code validation for coding agents

Coding agents declare "done" on code that is still wrong. AgentOps catches that. Before a change counts as done, something that didn't write it has to check it: a different model, or a test that actually runs. No verdict = not done. It sits on top of the agent you already use (Claude Code, Codex, Cursor, OpenCode).

Install

Pick your runtime and install:

# Claude Code
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace

# Codex CLI (macOS/Linux/WSL); for OpenCode, use install-opencode.sh instead
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash
# Codex CLI (Windows):
irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.ps1 | iex

# Gemini / Antigravity
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-agy.sh | bash

# Other skills-compatible agents (Cursor, etc.)
npx skills@latest add boshu2/agentops --cursor -g

The ao CLI is optional but recommended (bookkeeping, retrieval, the release gate):

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops   # macOS
# Windows: irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-ao.ps1 | iex
# Or release binaries / build from source (cli/README.md).

Installation adds no hooks. The only hard requirements are an agent runtime and git; everything else degrades gracefully. Dependencies: docs/dependencies.md · Day-2 ops (update, backup, recovery): docs/install-day2-ops.md.

What you get

A validation membrane. Tests, gates, /pre-mortem, /validate, and /council prove or reject the work before you trust it. No verdict, not done.
A bookkeeper that outlives the session. Work is tracked as beads, and every verdict is bound into a hash-chained provenance ledger: tamper-evident, grep-able, and portable across sessions and models. The record is the proof a change was actually checked — not a memory of one.
An evidence trail that's yours. Every run, decision, and verdict lands in .agents/ in your repo: grep-able, diff-able, portable to whatever model wins next quarter. AgentOps adds no hosted control plane and no telemetry; the corpus lives in your repo, not on our servers. Apache-2.0.
It runs on the agent you already pay for. Claude Code, Codex, Cursor, OpenCode. Same skills, same corpus.

> /validate --mixed   # the agent reported this PR done

[membrane] evidence sealed → fresh-context judges, Claude Code + Codex CLI
[claude/judge-1] REFUTE  /login has no rate limit — claimed "covered", isn't
[codex/judge-1]  REFUTE  token-bucket refill lacks jitter under burst
[claude/judge-2] PASS    redis integration follows the repo pattern
Verdict: HOLD — not done. Fix /login limit + refill jitter, then re-verify.
Recorded as a proof artifact — no verdict, not done.
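
The transcript above suggests a simple aggregation rule: any judge that refutes the change holds it, and a change with no findings has no verdict at all. Here is a minimal Python sketch of that rule. The `Finding` type, the `overall_verdict` function, and the `PASS` and `NO VERDICT` labels are hypothetical names chosen for illustration; they are not AgentOps's actual implementation or output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    judge: str   # which independent reviewer produced this, e.g. "claude/judge-1"
    result: str  # "PASS" or "REFUTE" (labels taken from the transcript)
    note: str    # the reviewer's one-line reason


def overall_verdict(findings: List[Finding]) -> str:
    """Assumed rule: one REFUTE holds the change; no findings means no verdict."""
    if not findings:
        return "NO VERDICT"  # nothing independent checked it, so it is not done
    if any(f.result == "REFUTE" for f in findings):
        return "HOLD"
    return "PASS"


findings = [
    Finding("claude/judge-1", "REFUTE", "/login has no rate limit"),
    Finding("codex/judge-1", "REFUTE", "token-bucket refill lacks jitter under burst"),
    Finding("claude/judge-2", "PASS", "redis integration follows the repo pattern"),
]
print(overall_verdict(findings))  # HOLD
```

Treating an empty list as its own outcome, rather than as a pass, is what makes "no verdict, not done" hold even when every reviewer fails to run.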

Already installed? Try it in three steps: make a small change and commit it, run ao verify my-first-change, then read the verdict. A model that had no part in writing the change reviews your commit, prints CONFIRMED or REFUTED, and records the result as a line in docs/provenance/ledger.jsonl inside your repo.
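
To see why a hash-chained ledger is tamper-evident, it may help to look at the general technique in miniature. In the Python sketch below, each record stores the hash of the record before it, so editing an old verdict breaks every link from that point on. The field names (`prev_hash`, `payload`, `hash`) and the sample commit IDs are assumptions made for illustration; the real schema of docs/provenance/ledger.jsonl may differ.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # stand-in "previous hash" for the first record


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous record's hash together with this record's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: List[dict], payload: dict) -> None:
    """Append a record that commits to everything recorded before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def first_broken_link(ledger: List[dict]) -> Optional[int]:
    """Return the index of the first record that fails verification, or None."""
    prev_hash = GENESIS
    for i, rec in enumerate(ledger):
        expected = entry_hash(prev_hash, rec["payload"])
        if rec["prev_hash"] != prev_hash or rec["hash"] != expected:
            return i
        prev_hash = rec["hash"]
    return None


ledger: List[dict] = []
append(ledger, {"change": "a1b2c3", "verdict": "REFUTED"})    # hypothetical IDs
append(ledger, {"change": "d4e5f6", "verdict": "CONFIRMED"})
print(first_broken_link(ledger))  # None: the chain is intact

ledger[0]["payload"]["verdict"] = "CONFIRMED"  # quietly rewrite history
print(first_broken_link(ledger))  # 0: the edited record no longer matches its hash
```

Writing each record with `json.dumps(rec)` on its own line would give a JSONL file that stays grep-able while remaining verifiable from the first line to the last.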

The rest is below the fold for anyone who wants the detail.

Skills

Every skill works alone; flows compose them. Full catalog: docs/SKILLS.md · Skill Router.

Skill	Use it when
`/research`	you need codebase context and prior learnings before changing code
`/pre-mortem`	you want to pressure-test a plan before building
`/rpi`	you want discovery, build, validation, and bookkeeping in one flow
`/council`	you want independent judges (optionally Claude and Codex) to return one verdict
`/validate`	you want a code-quality and risk review before shipping
`/evolve`	you want a goal-driven improvement loop that runs without mutating source

The `ao` CLI

Repo-native control plane behind the skills. Full reference: CLI commands.

ao verify                 # independent verdict on your latest change
ao gate check --fast      # the release gate before you push
ao provenance show <sha>  # the recorded verdict trail for any commit
ao done <bead-id>         # close tracked work with its verdict attached
ao quick-start            # set up AgentOps in a repo
ao doctor                 # check reviewers, binary, and ledger health

# Experimental (still measuring whether these pay off; see the honest version below):
ao search "query"         # search history and local knowledge
ao lookup --query "topic" # retrieve curated learnings
ao compile                # rebuild the corpus

The whole loop runs in a plain session. No daemon, no scheduler, no cloud. For always-on work, the loop can hand each task to a background runner instead. Details: docs/3.0.md · operating loop.

The honest version

Proven: independent verification that records a verdict, and a durable, tamper-evident record of it. A change isn't done until something that didn't write it checks it, and that verdict is bound into the provenance ledger. No verdict, not done.

The receipts are public: membrane receipts — every number derived straight from the verdict ledger, none hand-written.

Still measuring: whether the accumulated corpus makes the next session measurably better. We won't claim it until the numbers say so (ADR-0004, ADR-0011).

AgentOps proves the work. It doesn't write the code; your agent still does that, and the cross-checks cost tokens. The .agents/ folder is plain markdown that your agents keep up to date as they go.

When the labs ship their own version of this, your .agents/ folder comes with you. It's in your repo, in plain markdown, Apache-2.0.

What 3.0 is · vs hosted code review · docs index · newcomer guide · architecture · FAQ · built on the 12-factor doctrine.

Contributing: docs/CONTRIBUTING.md (agents: read AGENTS.md, track work with br). License: Apache-2.0.
