-
Notifications
You must be signed in to change notification settings - Fork 0
crickets code review
title: code-review — design status: launched kind: design scope: feature area: crickets/code-review governs: [src/code-review/] parent: crickets-hld.md seeded: 2026-06-20 approved: 2026-06-23
Note
LAUNCHED (lifted 2026-06-24, AG Phase 3; originally approved 2026-06-23). child-design — the code-review capability (adversarial review — assume the code is broken and prove otherwise). status: launched (lifted into tracked wiki/designs/ 2026-06-24, AG Phase 3). Points up at the crickets HLD.
code-review assumes the code is broken and makes you prove it isn't — adversarial review that reports a failing test, a file:line defect, or NO ISSUES FOUND, and never fixes. It is an implementation of the good opinion applied to code: the standard that a change has to survive a hostile read. The load-bearing discipline is the contract every reviewer returns against.
The primitives, all delivered.
Commands (operator-invoked):
| Command | What it does |
|---|---|
/code-review |
Standalone adversarial review of a diff or PR. |
/doubt |
In-flight review of a decision before it stands. |
/simplify |
Cut accidental complexity, no behavior change. |
Sub-agents (auto-dispatched during review):
-
adversarial-reviewer— the in-process critic. -
adversarial-reviewer-cross— a cross-model second opinion (Gemini). -
security-auditor— boundary threat modeling. -
test-engineer— coverage audit.
Plus the evidence-tracker hook (fires in /work), the cross-review.sh script, and the security-review + testing-strategy skills.
graph TD
CR["<b>code-review</b><br/><i>good, applied to code</i>"]
CR --> CMD["commands<br/><i>/code-review · /doubt · /simplify</i>"]
CR --> AG["sub-agents<br/><i>adversarial-reviewer (+cross)<br/>security-auditor · test-engineer</i>"]
CR --> ET["evidence-tracker hook"]
AG -. "auto-dispatched by" .-> RVP["/review"]
ET -. "gates" .-> WK["/work"]
AG --> CT["falsifiable finding<br/><i>failing test · file:line · NO ISSUES</i>"]
Three operator-invoked commands + the auto-dispatched adversarial sub-agents + the evidence-tracker /work gate; every reviewer returns a falsifiable finding (prose rejected) — what good means concretely.
-
/code-review— standalone adversarial review of a diff or PR (a commit range / branch / PR / URL, or the working-tree diff). It dispatches the cross-model reviewer first, then the in-process one for corroboration, and reports a failing test /DEFECT file:line/NO ISSUES FOUND; never fixes. Invoked by the operator, on demand — a standalone command, not part of a workflow. -
/doubt— in-flight review of a decision before it stands: CLAIM → EXTRACT → DOUBT → RECONCILE → STOP, with a hard 3-cycle cap. Catches a contract-misread or a wrong assumption before the code is written. Invoked by the operator, on demand. -
/simplify— cut accidental complexity (Chesterton's Fence — understand why code exists before removing it — + the Rule of 500); no behavior change; returns safe removals + Rule-of-500 flags for operator approval. Invoked by the operator, on demand.
-
adversarial-reviewer— the in-process critic, framed "the code contains bugs, find them"; returns the three-form contract. Invoked automatically as part of the/reviewworkflow (in development-lifecycle), and by/code-review. -
adversarial-reviewer-cross— a cross-model second opinion (Gemini, viacross-review.sh), graceful-fallback to the in-process reviewer when unavailable. Invoked automatically by/code-reviewfirst, and by/reviewwhen a cross-model pass is available. -
security-auditor— three-tier boundary threat modeling (LLM API / persistence / system execution) →VULNERABILITY file:line. Invoked by thesecurity-reviewskill, or automatically by/reviewwhen security review is requested. -
test-engineer— Beyoncé-rule + prove-it coverage audit →MISSING-TEST. Invoked by thetesting-strategyskill, or automatically by/reviewwhen a coverage review is requested.
Every reviewer returns one of three forms: a failing test · a DEFECT / VULNERABILITY / MISSING-TEST at file:line · NO ISSUES FOUND. Prose-only critiques are rejected. This is the discipline that makes the review trustworthy — it forces a finding you can act on or reproduce, not an opinion. It is also what good means, concretely.
A PreToolUse hook (shipped here, enforcing development-lifecycle's /work): it blocks a [ ] → [x] checkbox flip unless the task's verification evidence is present — default-fail, with a per-task **Evidence:** none — <rationale> opt-out. It keeps "done" honest. Invoked automatically — the hook fires on every /work checkbox flip.
code-review implements good — the adversarial-review contract is what good looks like applied to code: a change has to survive a hostile read, and a finding has to be a falsifiable file:line, not an opinion. (The standard is hardwired in the reviewers today; requesting good by name — opinion_request("good") — is the Phase-3/4 registry work, the Opinions design.)
-
enhances development-lifecycle's
review—/reviewdispatches the adversarial pass; theevidence-trackerhook gates/work. -
implements the
goodopinion (agentm Opinions) — the adversarial-review contract is whatgoodlooks like. - composed by diagnostics (hypothesis verification — real bug, or an assumption?) and maintenance (a no-living-callers check on a removal).
- Points up at the crickets HLD; the requires/enhances mechanics are in crickets-composition.
-
The
goodopinion is hardwired today — the three-form contract is embedded in the reviewers; requestinggoodby name (opinion_request) is Phase-3/4 (the Opinions design).[PENDING-IMPL]— wire the request-by-name when the Opinion registry ships; until then the hardwired contract isgood. -
The cross-model reviewer depends on an external CLI (Gemini) — opt-in per invocation, editable to another model in
cross-review.sh, and graceful-fallback to the in-process reviewer; not a hard dependency. -
The
{domain}-reviewfamily is future scope — peer-review beyond code (design-review,doc-review) arrives as sibling plugins, not a bloated single one. -
Re-audit triggers: wire
opinion_request("good")when the registry ships; add{domain}-reviewsiblings when peer-review beyond code is needed.
- crickets
src/code-review/— commands (/code-review,/doubt,/simplify) · agents (adversarial-reviewer,adversarial-reviewer-cross,security-auditor,test-engineer) ·evidence-trackerhook ·cross-review.sh· skills (security-review,testing-strategy); declares[adversarial-review] -
Up / composes: crickets HLD · composition · agentm Opinions (
good)
2026-06-24 (re-homed 2026-06-25) — folded ADR 0009 (evidence-tracker hook) into this design. Originally folded into developer-safety in AG Phase 4; re-homed here (the correct owner — the hook ships in src/code-review/) in the AG settle-sweep Group E.
0009 — Evidence-tracker hook (2026-05-22). evidence-tracker ships as a PreToolUse hook with a default-FAIL contract: the agent must demonstrably Read a spec/test/evidence file matching the task requirement before any Write/Edit flipping a PLAN.md checkbox [ ]→[x]. Hybrid evidence resolution: heuristic (test-dir/test-file patterns, code-ext only) + per-task **Evidence:** <glob> override + **Evidence:** none — <rationale> opt-out. Why not pure heuristic: rejects tasks with non-test-shaped evidence. Why not per-task-only: forces annotating every task. features.json gating rejected as redundant with /release-time adversarial review. Re-audit triggers: Antigravity gains a file-based hook surface; PLAN.md task-header shape changes; Bash-read evidence becomes common enough to warrant extending; verification-quality grading becomes needed.
2026-06-23 — added the structure diagram (diagram backfill). Per the every-design-carries-a-diagram rule.
2026-06-23 — authored from the seeded stub (grounded against the live src/code-review plugin); revised on operator review. All primitives delivered — three operator-invoked commands (/code-review, /doubt, /simplify), four auto-dispatched adversarial sub-agents (incl. the cross-model variant), the evidence-tracker /work gate, two skills, cross-review.sh. Named the load-bearing invariant — falsifiable findings only (a failing test / file:line defect / NO ISSUES; prose rejected) — which is what good means concretely; added an Opinions-it-consumes clause (implements good). Operator review: framed it as an implementation of good applied to code; listed the commands as a table + the sub-agents individually; and made invocation plain (operator-invoked commands vs auto-dispatched agents/hook, each naming its workflow). Kept the name (the review-workflows rename was rejected). Designed-not-built: the good request-by-name (Phase-3/4, [PENDING-IMPL]) and the {domain}-review family. Re-audit: wire opinion_request("good"); add review siblings when needed.
🔧 How-to
🏛️ Architecture
🧩 Designs
Architecture (Agent M) — in the agentm wiki ↗
Crickets