feat: add adversarial review agents for code and documents (#403)
Merged
Two new conditional reviewers that take a falsification stance -- actively constructing failure scenarios rather than checking against known patterns.

- Code `adversarial-reviewer`: assumption violation, composition failures, cascade construction, and abuse cases. Auto-scales quick/standard/deep based on diff size and risk signals.
- Document `adversarial-document-reviewer`: premise challenging, assumption surfacing, decision stress-testing, simplification pressure, and alternative blindness. Auto-scales based on document size and domain risk.

Both integrate into the existing review ensembles (ce-review and document-review) as cross-cutting conditional reviewers using the standard findings contract.
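The diff itself isn't shown here, so as a rough illustration only: a depth selector driven by changed-line count and risk signals might look like the sketch below. The function name, thresholds, and signal strings are all hypothetical, not the skill's actual values.

```typescript
type Depth = "quick" | "standard" | "deep";

// Hypothetical sketch of quick/standard/deep auto-scaling.
// Any risk signal forces a deep pass; otherwise diff size decides.
function pickDepth(changedLines: number, riskSignals: string[]): Depth {
  if (riskSignals.length > 0 || changedLines > 500) return "deep";
  if (changedLines > 100) return "standard";
  return "quick";
}
```

A real implementation would derive `riskSignals` from the diff (touched paths, security-sensitive APIs), but the shape of the decision is the same.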
Summary
Adds two adversarial review agents that take a fundamentally different stance from existing reviewers: instead of evaluating quality against known criteria, they actively try to falsify the artifact by constructing scenarios that break it.
Both use the standard JSON findings contract, integrate as cross-cutting conditionals in their respective ensembles, and have explicit suppress conditions to avoid overlap with existing reviewers. Skill-level routing excludes test-only, generated, and lockfile diffs from the size threshold.
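Neither the findings contract nor the routing filter is reproduced in this description. As a sketch under assumptions, the finding shape and the size-threshold exclusion could look like the following; field names and path patterns are illustrative, not the actual contract.

```typescript
// Hypothetical shape of one entry in the JSON findings contract.
interface Finding {
  severity: "info" | "warn" | "error";
  title: string;
  detail: string;
  location?: string; // e.g. "src/webhook.ts:42"
}

// Illustrative routing filter: test-only, generated, and lockfile paths
// do not count toward the size threshold that triggers the reviewer.
function countsTowardThreshold(path: string): boolean {
  const excluded = [
    /\.test\.[jt]sx?$/, // test-only files
    /\.generated\./, // generated artifacts
    /(^|\/)(bun\.lockb|package-lock\.json|yarn\.lock)$/, // lockfiles
  ];
  return !excluded.some((re) => re.test(path));
}
```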
Measured impact
Tested the adversarial code reviewer against 3 synthetic high-risk diffs (payment webhook handler, Redis rate limiter + cache, CSV user import with migration) and 1 negative control (small config rename).
Assertion pass rate: 100% adversarial vs 61% baseline on findings that require multi-step failure reasoning -- cascades, cross-component composition failures, and abuse scenarios. Example of a finding unique to the adversarial reviewer:

- `KEYS` invalidation through Redis blocking to fleet-wide rate-limit bypass

The existing persona reviewers (correctness, security, reliability) consistently caught individual bugs in the same code but did not trace multi-step failure chains across component boundaries. The adversarial reviewer also used fewer tokens per run (34K avg vs 49K) since it focuses on its specific domain rather than full-spectrum review.
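The bypass chain above hinges on a common fail-open pattern: a blocking `KEYS` scan during cache invalidation stalls Redis, the rate-limit check times out, and the timeout path allows the request. None of this code is from the PR; it is a hypothetical minimal reproduction of the pattern such a finding targets.

```typescript
// Hypothetical fail-open rate limiter. If the Redis-backed check is slow
// (e.g. Redis is blocked by a long KEYS scan), the race resolves to
// "timeout" and the request is allowed -- a fleet-wide bypass under load.
async function allowRequest(
  check: () => Promise<boolean>, // Redis-backed limit check
  timeoutMs: number,
): Promise<boolean> {
  const timeout = new Promise<"timeout">((resolve) =>
    setTimeout(() => resolve("timeout"), timeoutMs),
  );
  const result = await Promise.race([check(), timeout]);
  return result === "timeout" ? true : result; // fail open on timeout
}
```

Each individual piece (a `KEYS` call, a timeout, a fail-open default) can look reasonable in isolation, which is why per-component persona reviewers miss the composed failure.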
Negative control (small config rename): the adversarial reviewer was correctly not selected under either skill version. No noise regression -- both versions produced clean reviews.
Design decisions
Test plan

- `bun test` passes (497/497), including the path-sanitization collision test that caught the original shared-name issue
- `bun run release:validate` confirms 46 agents, 42 skills, and 1 MCP server in sync

🤖 Generated with Claude Opus 4.6 (1M context) via Claude Code