Evidence-first coding quality for Claude Code.
Forge classifies risk, captures a verification baseline, runs a multi-tier verification cascade, and spawns adversarial reviewer sub-agents before presenting code to you.
/install forge
`/forge` runs the full evidence-first Forge Loop for a given task.
Phases:
- Size — classifies the task as Small, Medium, or Large
- Classify — rates each file 🟢/🟡/🔴; any 🔴 file escalates to Large
- Baseline — runs the verification cascade before touching code; records results
- Implement — edits files; skips to Verify if `git diff` already shows changes
- Verify cascade — Tier 1 (diagnostics/syntax) → Tier 2 (build/types/lint/tests) → Tier 3 (smoke script)
- Adversarial review — 1 reviewer (Medium) or 3 parallel reviewers (Large / 🔴); fixes findings; max 2 rounds
- Evidence Bundle — structured markdown output with before/after results and confidence level
- Commit — auto-commit with structured message; captures pre-commit SHA for rollback
- Session file — saves `implementations/<task-id>.md` to repo root
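The escalation and reviewer-count rules in the phases above can be sketched in a few lines. This is a minimal illustration, not the plugin's code: the `Risk` and `Size` types and both function names are hypothetical, and the reviewer count for Small tasks is an assumption, since the list only states the Medium and Large cases.

```typescript
// Hypothetical types; the plugin's real data model is not shown in this README.
type Risk = "green" | "yellow" | "red";
type Size = "Small" | "Medium" | "Large";

// Any red file escalates the task to Large, whatever its initial size.
function effectiveSize(initial: Size, fileRisks: Risk[]): Size {
  return fileRisks.some((risk) => risk === "red") ? "Large" : initial;
}

// Medium gets 1 reviewer, Large gets 3 parallel reviewers.
// ASSUMPTION: Small tasks skip adversarial review (the README does not say).
function reviewerCount(size: Size): number {
  switch (size) {
    case "Large":
      return 3;
    case "Medium":
      return 1;
    case "Small":
      return 0;
  }
}

const escalated = effectiveSize("Medium", ["red", "yellow", "green"]);
console.log(escalated, reviewerCount(escalated)); // prints "Large 3"
```

Keeping escalation separate from the reviewer lookup mirrors the phase order: Size runs first, and Classify can only raise it.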
`/forge-verify` runs a standalone verification cascade on currently changed files. It has no implementation phase, which makes it useful for verifying changes made manually or by another tool.
Steps: detect changed files → classify risk → run Tiers 1–3 → output Evidence Bundle.
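To make the tier ordering concrete, here is a rough sketch of a cascade runner. The `CheckResult` and `Check` shapes and the `runCascade` function are hypothetical, the checks are stubs rather than real tool calls, and stopping at the first failing tier is an assumption; the README only gives the order Tier 1 → Tier 2 → Tier 3.

```typescript
// Hypothetical shapes; real checks would shell out to tools such as `npm test`.
interface CheckResult {
  check: string;
  passed: boolean;
  detail: string;
}
type Check = () => CheckResult;

// Runs tiers in order. ASSUMPTION: a failing tier stops the cascade, so
// slower tiers never run against code that already fails a cheaper check.
function runCascade(tiers: Check[][]): CheckResult[] {
  const results: CheckResult[] = [];
  for (const tier of tiers) {
    const tierResults = tier.map((check) => check());
    results.push(...tierResults);
    if (tierResults.some((r) => !r.passed)) break;
  }
  return results;
}

// Stubbed checks standing in for real tool calls.
const cascadeTiers: Check[][] = [
  [() => ({ check: "diagnostics", passed: true, detail: "0 errors" })],
  [
    () => ({ check: "build", passed: true, detail: "exit 0" }),
    () => ({ check: "tests", passed: false, detail: "1 failed" }),
  ],
  [() => ({ check: "smoke", passed: true, detail: "exit 0" })],
];

for (const r of runCascade(cascadeTiers)) {
  console.log(`${r.passed ? "PASS" : "FAIL"} ${r.check}: ${r.detail}`);
}
```

With these stubs the smoke check never runs, because Tier 2 reports a failing test first.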
These skills activate automatically without you invoking them:
| Skill | When it fires |
|---|---|
| `forge:risk-classify` | Before Claude edits, creates, or deletes files; rates each file 🟢/🟡/🔴 |
| `forge:evidence-gate` | Before Claude writes a "build/tests passed" claim; requires real tool-call evidence |
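The idea behind `forge:evidence-gate` can be illustrated with a small check over a log of tool calls. This is only a sketch under assumptions: the `ToolCall` record and `canClaimPassed` function are hypothetical, and using the most recent run of a command as the deciding one is an assumption rather than documented behavior.

```typescript
// Hypothetical record of a tool call made during the session.
interface ToolCall {
  command: string;
  exitCode: number;
}

// A "passed" claim is allowed only if the command actually ran.
// ASSUMPTION: the most recent run of that command is the one that counts.
function canClaimPassed(command: string, calls: ToolCall[]): boolean {
  const runs = calls.filter((c) => c.command === command);
  const last = runs[runs.length - 1];
  return last !== undefined && last.exitCode === 0;
}

const sessionCalls: ToolCall[] = [
  { command: "npm run build", exitCode: 0 },
  { command: "npm test", exitCode: 1 },
  { command: "npm test", exitCode: 0 },
];

console.log(canClaimPassed("npm test", sessionCalls)); // true: the last run exited 0
console.log(canClaimPassed("npm run type-check", sessionCalls)); // false: it never ran
```

The second call shows the point of the gate: a check that was never executed cannot be reported as passing.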
At the end of every /forge or /forge-verify run, you receive a structured report:
## 🔨 Forge Evidence Bundle
**Task**: add-payment-webhook | **Size**: L | **Risk**: 🔴
### Baseline (before changes)
| Check | Result | Command |
|--------------|---------------|----------------------|
| diagnostics | ✅ 0 errors | ide-get_diagnostics |
| build | ✅ exit 0 | npm run build |
| types | ✅ exit 0 | npm run type-check |
| tests | ✅ 47 passed | npm test |
### After Changes
| Check | Result | Command |
|--------------|---------------|----------------------|
| diagnostics | ✅ 0 errors | ide-get_diagnostics |
| build | ✅ exit 0 | npm run build |
| types | ✅ exit 0 | npm run type-check |
| tests | ✅ 48 passed | npm test |
### Adversarial Review
| Reviewer | Findings |
|----------|--------------------------------------------------------|
| sonnet | Missing HMAC validation line 34 (95) — **fixed** |
| haiku | Idempotency gap line 67 (78) — **fixed** |
| opus | No issues |
**Confidence**: High — all tiers passed, all reviewer findings fixed
**Rollback**: `git revert HEAD` or `git checkout <pre-sha> -- <file>`
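The reason the bundle reports both a Baseline and an After Changes table is that the two can be compared. A minimal sketch of that comparison follows, assuming a regression means "passed at baseline, fails now"; the `CheckRow` shape and `findRegressions` function are hypothetical and not part of Forge.

```typescript
// Hypothetical shape for one row of the Baseline / After Changes tables.
interface CheckRow {
  check: string;
  passed: boolean;
}

// A regression is a check that passed at baseline but fails after the change.
// Checks that already failed at baseline are pre-existing, not regressions.
function findRegressions(baseline: CheckRow[], after: CheckRow[]): string[] {
  return after
    .filter((row) => !row.passed)
    .filter((row) => baseline.some((b) => b.check === row.check && b.passed))
    .map((row) => row.check);
}

const baselineRows: CheckRow[] = [
  { check: "build", passed: true },
  { check: "types", passed: false }, // already broken before the change
  { check: "tests", passed: true },
];
const afterRows: CheckRow[] = [
  { check: "build", passed: true },
  { check: "types", passed: false },
  { check: "tests", passed: false },
];

// Only "tests" is reported; "types" was failing before the change.
console.log(findRegressions(baselineRows, afterRows));
```

Without the baseline, the pre-existing `types` failure would be indistinguishable from breakage introduced by the change.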
| Level | Meaning |
|---|---|
| High | All tiers passed, no regressions, and reviewers found either no issues or only issues that were then fixed |
| Medium | Most checks passed, but a reviewer concern was addressed without certainty, or the changed path lacks test coverage |
| Low | A check failed and could not be fixed, or a reviewer raised a concern that could not be disproved; the bundle states what would raise confidence |
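One way to read the table above is as a decision function evaluated from worst case to best. The sketch below is illustrative only: the `RunSummary` fields and the `confidenceLevel` function are hypothetical names, and the real level is presumably a judgment made by Claude rather than a fixed formula.

```typescript
type Confidence = "High" | "Medium" | "Low";

// Hypothetical summary of a run; field names are illustrative only.
interface RunSummary {
  unfixedFailures: number; // checks that failed and could not be fixed
  undisprovenFindings: number; // reviewer concerns that could not be disproved
  uncertainFixes: number; // reviewer concerns addressed without certainty
  changedPathHasTests: boolean; // whether tests cover the changed path
}

// Checks the Low conditions first, then Medium; High is what remains.
function confidenceLevel(run: RunSummary): Confidence {
  if (run.unfixedFailures > 0 || run.undisprovenFindings > 0) return "Low";
  if (run.uncertainFixes > 0 || !run.changedPathHasTests) return "Medium";
  return "High";
}

// All tiers passed and every reviewer finding was fixed.
const cleanRun: RunSummary = {
  unfixedFailures: 0,
  undisprovenFindings: 0,
  uncertainFixes: 0,
  changedPathHasTests: true,
};
console.log(confidenceLevel(cleanRun)); // prints "High"
```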
/forge add HMAC signature validation to webhook handler
→ Size: Large (webhook + signature keywords)
→ Classify: payments.ts 🔴, order.service.ts 🟡, payments.test.ts 🟢
→ Baseline: build ✅ types ✅ tests 47 ✅
→ Implement changes
→ Verify: build ✅ types ✅ tests 48 ✅
→ 3× forge-reviewer (parallel)
sonnet: Missing timingSafeEqual (95) — fixed
haiku: Idempotency gap (78) — fixed
opus: No issues
→ Re-verify after fixes: all ✅
→ Second review round: no new findings
→ Evidence Bundle: Confidence High
→ Commit created
→ implementations/add-hmac-validation.md saved
| Color | Meaning | Examples |
|---|---|---|
| 🔴 | Critical — escalates task to Large, triggers 3 reviewers | Auth, payments, crypto, schema migrations, data deletion, public API surface, webhooks |
| 🟡 | Significant — existing logic being modified | Business logic, service files, DB queries, UI state, controllers |
| 🟢 | Additive — low blast radius | New test files, documentation, config, new files from scratch, CSS |
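As a rough approximation of the table above, a path-based heuristic might look like the following. Everything here is an assumption for illustration: the keyword and pattern lists are derived from the table's examples, `classifyPath` is a hypothetical name, and a path alone cannot capture criteria such as "new files from scratch" or "existing logic being modified".

```typescript
type RiskColor = "🟢" | "🟡" | "🔴";

// ASSUMPTION: illustrative hints taken from the table's examples, not Forge's rules.
const RED_HINTS = ["auth", "payment", "crypto", "migration", "webhook"];
const GREEN_PATTERNS = [/\.test\.[jt]sx?$/, /\.md$/, /\.css$/];

function classifyPath(path: string): RiskColor {
  const lower = path.toLowerCase();
  // Tests, docs, and CSS are checked first so that a test file sitting next
  // to critical code is not rated red because of its name.
  if (GREEN_PATTERNS.some((re) => re.test(lower))) return "🟢";
  if (RED_HINTS.some((hint) => lower.indexOf(hint) !== -1)) return "🔴";
  // Everything else is treated as existing logic being modified.
  return "🟡";
}

for (const p of ["payments.ts", "order.service.ts", "payments.test.ts"]) {
  console.log(p, classifyPath(p));
}
```

The three sample paths come out as 🔴, 🟡, and 🟢, matching the Classify step in the `/forge` trace. Substring matching is deliberately crude (a file named `author.ts` would be rated 🔴), which is one reason a real classifier needs more context than the file name.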