Multi-agent adversarial code review for any coding agent. Ships as both a standalone CLI and a Claude Code Skill — pick whichever fits your workflow, the underlying logic is the same Node.js code.
$ adverse review ./src
⏳ collecting source...
42 files in ./src
⏳ round 1: 3 reviewers in parallel...
✓ auditor: 5 findings, conditional (38.2s)
✓ adversary: 3 findings, reject (44.7s)
✓ pragmatist: 4 findings, approve (29.1s)
⏳ round 2: 3 reviewers cross-examining...
✓ auditor: 2 validated, 1 challenged, 0 added
✓ adversary: 4 validated, 0 challenged, 1 added
✓ pragmatist: 1 validated, 2 challenged, 0 added
⏳ synthesizing...
# Adversarial Code Review
**Verdict:** SHIP-WITH-CAVEATS (2/3 ship, 1/3 block)
**Findings:** 3 critical · 5 warning · 1 info
## Cross-validated findings (multiple reviewers reported independently)
### 🔴 [CRITICAL] SQL injection in query builder — db.py:22
…
Wraps any coding-agent CLI that reads stdin and writes stdout — Claude Code, Codex CLI, Gemini, Aider, Ollama. Good when you want to gate CI, run from a Makefile, or use a non-Anthropic model.
npm install -g adverse
# or run directly without installing:
npx adverse review ./srcThe skill is in skills/adverse-review/. One-liner install via skills.sh:
npx skills add addyosmani/adverseThat clones the repo, locates skills/adverse-review/, and installs it to ~/.claude/skills/adverse-review/ (or .claude/skills/ with --project). Re-run to update. If you'd rather pin the exact subpath, npx skills add https://github.com/addyosmani/adverse/tree/main/skills/adverse-review works too. Or do it by hand:
git clone https://github.com/addyosmani/adverse.git ~/.adverse-source
mkdir -p ~/.claude/skills
ln -s ~/.adverse-source/skills/adverse-review ~/.claude/skills/adverse-reviewInside Claude Code, ask for "adversarial review", "adverse review of these changes", or "review my changes from multiple angles" — or hit /adverse-review. The skill handles scope detection (uncommitted changes vs branch diff vs full tree), spawns three reviewer subagents in parallel via Claude Code's native Agent tool (no nested-auth issues, no subprocess overhead), runs the cross-review round, and calls a small Node helper for the deterministic synthesis step. Needs node ≥ 20 on PATH; no npm install.
Both modes share the same src/ core, so a finding the CLI flags is the same finding the Skill flags — no drift between the two.
The naive way to do "AI code review" is one model, one shot. You get one perspective with all the blind spots that perspective has.
The next step up is what some prior projects did: two different models (Claude + GPT Codex), so each catches what the other misses. This works but it's expensive, slow, requires two API keys, and ties you to whichever two providers the script knows about.
adverse does the third thing: one model, three personas, with explicit cross-examination between them. The personas are designed to be orthogonal — Auditor catches logic bugs the Adversary won't go looking for; the Adversary names attack chains the Auditor won't think about; the Pragmatist sees the design problem both of the others ignore. Then in round 2 each persona has to go on record about the others' findings — validate or challenge — so the synthesizer can tell you which findings have multi-perspective support and which are one reviewer's hunch.
Trade-off, named honestly: a single model running three personas has anchoring bias that two separate models don't. The cross-review round mitigates this (each persona must defend a position visible to the others), and the personas themselves are written with explicit "stay in your lane / do not duplicate the others" instructions. But if you genuinely need decorrelated outputs across the model boundary, run adverse twice with different agents and diff the reports.
# global install (gives you the `adverse` binary on PATH)
npm install -g adverse
# or one-shot
npx adverse review ./srcRequires Node.js 20+. Zero runtime dependencies.
# Default: review the current directory using `claude -p`.
adverse review
# Specific path with a specific agent.
adverse review ./src --agent "claude -p"
adverse review ./src --agent "codex exec --quiet"
adverse review ./src --agent "gemini"
adverse review ./src --agent "ollama run llama3.1"
# Review only the changes on this branch.
adverse review --diff main
# Review uncommitted changes.
adverse review --diff
# Save the report to a file (markdown, JSON, and self-contained HTML).
adverse review --out review.md --json-out review.json --html-out review.html
# Skip cross-review (faster, less rigorous).
adverse review --single-round
# Run a different subset of personas.
adverse review --personas auditor,adversary
# See what came back from each agent (for debugging).
adverse review --save-artifacts ./.adverse-debug --verboseThe --agent flag accepts any command that reads a prompt from stdin and writes a response to stdout. Adverse handles the common output shapes — plain text, fenced JSON, Claude's {"result": "..."} wrapper, Anthropic content-block format — and retries once with feedback if the agent's first response doesn't parse.
| Variable | Effect |
|---|---|
ADVERSE_AGENT |
Default for --agent (e.g. export ADVERSE_AGENT="codex exec --quiet"). |
ADVERSE_LIVE |
Set to 1 to run the live-Claude integration tests. |
| Code | Meaning |
|---|---|
| 0 | Review completed; verdict was approve, conditional, or hold |
| 1 | Review completed; verdict was reject |
| 2 | Bad arguments (target missing, unknown persona, etc.) |
| 3 | Fewer than 2 reviewers produced valid output — synthesis aborted |
Code 1 is what you wire into a CI gate.
Used internally by the Skill, also useful standalone if you have round-1/round-2 JSON from somewhere else and just want the report:
adverse synthesize \
--round1 round1-combined.json \
--round2 round2-combined.json \
--out report.md \
--html-out report.html| Persona | Lens |
|---|---|
| Auditor | Correctness, logic, and algorithmic soundness — does this code compute the right answer? |
| Adversary | Security, abuse, trust boundaries — what can a hostile caller do? |
| Pragmatist | Maintainability, complexity, design fit — will this survive contact with reality? |
adverse personas prints them at runtime. Each persona's full system prompt is in src/personas.mjs; they're written with explicit "what's in scope / what's out of scope" rules so they don't duplicate each other's work. Keep new personas orthogonal to the existing three — overlap costs money for no signal.
┌──────────────────────────────────────────────────────────────────────┐
│ Round 1 — Independent Reviews (parallel, 3 calls) │
│ Each persona reviews the code with only its own lens. │
│ Output: { verdict, summary, findings[] } │
├──────────────────────────────────────────────────────────────────────┤
│ Round 2 — Cross-Review (parallel, 3 calls) │
│ Each persona sees all three round-1 reviews and: │
│ • validates findings it agrees with │
│ • challenges findings it thinks are wrong / overstated │
│ • adds new findings the other angles surfaced │
├──────────────────────────────────────────────────────────────────────┤
│ Synthesis — Deterministic (no LLM) │
│ Merge findings, score consensus, render report: │
│ cross-validated → reported by ≥2 personas │
│ consensus → reported by 1, validated by another │
│ disputed → reported by 1, challenged by another │
│ solo → reported by 1, no cross-talk │
└──────────────────────────────────────────────────────────────────────┘
Synthesis is deterministic Node code, not another LLM call. A fourth model invocation would cost more, add another failure mode, and inherit the same single-model bias the personas have. Counting validate / challenge edges is enough.
Per review: 6 model invocations (3 round-1 + 3 round-2). --single-round halves it. Wall time is roughly twice the slowest single invocation since personas run in parallel within each round.
npm test # 111 unit + contract tests, no API calls. Fast (~1s).
npm run test:live # Live tests against `claude -p`. Requires `claude` on
# PATH and auth — won't pass from inside a nested Claude
# Code session because subprocesses don't inherit auth.Coverage includes JSON-extraction across every wrapper shape Claude Code can produce ({"result": ...}, content-block, plain-string, fenced markdown, banner-then-JSON, raw), validator and synthesis logic, source collection, retry behavior, full CLI subprocess flow. Contract tests in tests/fixtures/claude-cli/ pin the wrapper shapes — drop a new file in there when Claude ships a CLI change and the suite picks it up automatically.
src/ # Shared core, used by both CLI and Skill
personas.mjs # Three persona system prompts
prompts.mjs # Phase-1/Phase-2 prompt construction + validators
parse.mjs # JSON extraction across every wrapper shape
collect.mjs # Directory walk + git-diff source collection
runner.mjs # Subprocess agent invocation + parallel orchestration
synthesis.mjs # Deterministic merge + markdown rendering
html.mjs # Self-contained HTML dashboard renderer
cli.mjs # Argv parsing + command dispatch
bin/
adverse.mjs # CLI entrypoint (#!/usr/bin/env node)
skills/adverse-review/
SKILL.md # Claude Code playbook (the "code" of the Skill)
scripts/
collect.mjs # Skill bridge: source collection
combine.mjs # Skill bridge: combine per-persona JSON
synthesize.mjs # Skill bridge: deterministic synthesis
dump-prompts.mjs # Regenerate prompt files from src/personas.mjs
prompts/ # Persona system prompts as plain .txt for the Skill
auditor.txt, adversary.txt, pragmatist.txt
round1.txt, round2.txt
tests/
*.test.mjs # node --test, no Jest/Mocha
fixtures/claude-cli/ # Pinned `claude -p` output shapes (contract tests)
fixtures/fake-agent.mjs # Stub agent for end-to-end CLI tests
- Single-model anchoring bias. Honest answer: a single model running three personas correlates more than three independent models would. Cross-review round mitigates; running adverse twice with different agents decorrelates. Don't pretend this is the same thing as two-provider review.
- Source size cap. Default 250 KB total / 30 KB per file. Trips on very large repos in non-diff mode. Use
--difffor review-on-PR workflows where the change set is what matters. - Subprocess agent contract. The CLI assumes the agent reads prompt from stdin and writes the response to stdout, exiting cleanly. Most coding agents support this; some need a flag (
-pfor Claude Code,execfor Codex CLI). When in doubt, run the agent manually with a stdin prompt first to confirm the shape. - Not a fix-applier. This tool produces a report. Hand the report to your coding agent if you want fixes applied.
MIT