SpecForge

Turn Claude Code into a disciplined engineer. /spec to think. /yolo to ship. /forge to run overnight. Zero freestyle.

The Problem

AI coding agents are powerful but undisciplined. They skip planning, hallucinate scope, ignore edge cases, and ship untested code. The longer the task, the worse it gets.

The Fix

Drop these files into your repo. Now Claude Code operates in three modes:

/spec — Forces Claude to think before coding. It scans your codebase, asks you 3-5 hard clarifying questions, presents scored implementation options, decomposes into story-sized tasks with acceptance criteria and rollback plans, and writes the approved plan to disk. No code written.

/yolo — Supervised execution. Reads the approved plan. Executes one task at a time. Runs format/lint/typecheck/tests after every edit. Spawns an independent code reviewer. Optionally calls Gemini for a cross-model second opinion. Commits only when everything passes. If Claude crashes mid-run, it reads the checkpoint file and resumes from where it left off.

/forge — Autonomous execution via AgentForge. Translates the spec into AgentForge's format and launches the Ralph Loop. Codex builds each task, Sonnet scores it 0-10, and the harness retries up to 3 times with structured feedback. Walk away, come back to committed code.

/deslop — Mandatory cleanup pass. Scans changed files for AI junk like TODOs, debug logs, placeholder text, commented-out dead code, and unjustified suppressions before review.

Context snapshots — Each phase writes docs/specs/<feature>-context.md so /research, /spec, /yolo, and /autopilot can hand off without losing the plot after compaction, crashes, or mode switches.

The agent never invents scope. If the spec doesn't say to do it, it doesn't do it.

What's Inside

your-project/
├── CLAUDE.md                           # Always-on rules (~60 lines)
├── ARCHITECTURE.md                     # System design reference
├── .claude/
│   ├── settings.json                   # Hooks + permissions
│   ├── skills/
│   │   ├── spec/SKILL.md              # /spec — planning engine
│   │   ├── yolo/SKILL.md             # /yolo — supervised executor
│   │   ├── forge/SKILL.md            # /forge — AgentForge bridge
│   │   ├── research/SKILL.md         # /research — deep research
│   │   ├── deslop/SKILL.md           # /deslop — AI slop cleanup
│   │   └── autopilot/SKILL.md        # /autopilot — full pipeline
│   ├── agents/
│   │   ├── code-reviewer.md           # Quality gate (read-only)
│   │   ├── critic.md                  # Spec red-team (read-only)
│   │   ├── diagnostician.md           # Gate failure diagnosis (read-only)
│   │   ├── researcher.md              # Architecture investigator (read-only + web)
│   │   └── security-reviewer.md       # Security hunter (read-only)
│   ├── scripts/
│   │   ├── review.sh                  # External Gemini review via curl
│   │   └── write-context-snapshot.sh  # Snapshot helper for cross-phase handoffs
│   └── provider-handoff.md            # Cross-model review contract
└── docs/
    ├── specs/                          # Approved spec artifacts
    ├── runs/                           # Execution logs
    ├── reviews/                        # Review output
    └── templates/                      # Blank templates (spec, run, review)

Quick Start

# 1. Clone into your project
git clone https://github.com/benikigai/SpecForge.git /tmp/sdh
cp -r /tmp/sdh/.claude /tmp/sdh/CLAUDE.md /tmp/sdh/ARCHITECTURE.md /tmp/sdh/docs your-project/
rm -rf /tmp/sdh

# 2. Optional: enable external Gemini review
export GEMINI_API_KEY="your-key-here"

# 3. Start Claude Code in your project
cd your-project
claude

# 4. Plan a feature
> /spec I want to add user authentication with OAuth2

# 5a. Execute supervised (you watch)
> /yolo

# 5b. Execute autonomous (walk away)
> /forge

How It Works

/spec Flow

You describe a feature
    → Claude scans codebase
    → Loads any prior context snapshot / research doc
    → Asks 3-5 clarifying questions
    → Presents 2-3 options scored on Elegant / Efficient / Effective
    → Critic subagent attacks the recommended option
    → Decomposes chosen option into tasks
    → Writes approved spec + context snapshot to docs/specs/

/yolo Flow

For each task in the spec:
    → Load context snapshot
    → Restate objective (no ambiguity)
    → Implement (stay in spec's file list)
    → Run format / lint / typecheck / tests
    → Run deslop cleanup pass
    → Spawn code-reviewer subagent
    → Optional: call Gemini via review.sh
    → Fix any issues raised
    → Commit with structured message
    → Mark checkbox [x] in execution file
    → Refresh context snapshot
    → Next task

/forge Flow (Autonomous via AgentForge)

/forge reads the approved spec
    → Generates forge JSON (AgentForge feature list format)
    → Generates build prompt from spec + project context
    → Launches ralph-loop.sh
        → Codex builds each task
        → Sonnet evaluator scores 0-10
        → If score < threshold: structured feedback → retry (up to 3x)
        → If pass: commit + mark complete
        → If stagnation: accept if close, skip if not
    → Writes run report to docs/runs/
    → Marks yolo checkboxes for any completed tasks

Requirements: AgentForge installed at ~/code/AgentForge, codex CLI, OPENAI_API_KEY + ANTHROPIC_API_KEY set.

/research Flow

You describe a topic
    → Claude searches web + GitHub + dev communities
    → Synthesizes consensus, contrarian takes, and repos to steal from
    → Writes docs/specs/<topic>-research.md
    → Seeds docs/specs/<topic>-context.md for /spec

When to Use Which

Situation	Use
First time building in a new area of the codebase	`/yolo` — watch and learn
Security-sensitive changes (auth, payments, data)	`/yolo` — human review per task
Well-defined feature, clear tests, low risk	`/forge` — let it run
Overnight batch of features	`/forge` — built for this
Debugging a failed `/forge` task	`/yolo` — step through manually
Complex architecture decision	`/yolo` + `/research` — need human judgment

The Review Chain

Every task gets reviewed before commit. Up to three independent perspectives:

Reviewer	Model	How
code-reviewer	Claude Sonnet (subagent)	Spawned automatically — checks correctness, edge cases, tests, security
Gemini	Flash/Pro (API)	`review.sh` sends `git diff` to Gemini, prints verdict to terminal
You	Human	Final review after all tasks complete

Complexity Tiers

Tasks are graded. The system scales its effort accordingly:

Tier	When	Research	Review
Simple	Known pattern, <50 LOC, <3 files	Skip	Hooks only
Moderate	Some ambiguity, 50-200 LOC, 3-6 files	Light	Subagent review
Complex	Architecture change, >200 LOC, >6 files	Full /research loop	Full review chain

Safety

Hooks enforce rules deterministically — not suggestions, not prose
PreToolUse hook blocks rm -rf, --force, --no-verify, .env access
Stop hook verifies acceptance criteria met before Claude finishes any turn
Protected paths require explicit approval (migrations, auth, secrets, CI/CD, lock files)
Crash resilience — physical [ ]/[x] checkboxes on disk track progress; resume from any interruption
Cross-phase memory — docs/specs/<feature>-context.md preserves decisions, risks, and progress across /research → /spec → /yolo → /autopilot
Every passing task is committed — your safety net

Notifications

SpecForge now assumes Clawhip is the notification backbone for unattended flows such as /autopilot.

Preferred repo command: bash .claude/scripts/notify-clawhip.sh finished --session "<feature>" --summary "PR ready: <url>"
The wrapper emits agent.* events with project=SpecForge
Default routing in this setup:
- agent.* for project=SpecForge → #codex
- github.ci-* → #system-ops
- git.commit from monitored repos → #codex
clawhip must be on PATH or available at ~/.cargo/bin/clawhip

Core Principles

Spec owns ambiguity. Yolo owns zero.
One writer, parallel reviewers. Only Opus edits code.
Hooks enforce. CLAUDE.md advises.
Story-sized tasks. One at a time.
Files are memory. Chat is ephemeral.

Customization

Edit CLAUDE.md for your project's rules and conventions
Edit .claude/settings.json to adjust hooks for your formatter/linter
Edit agent files to tune review criteria
Set REVIEWER_MODEL env var to change the Gemini model (default: gemini-2.5-flash)

What This Draws From

Built on patterns from Ralph (file-based state, one story per loop), Oh My Codex (deep interviews, staged pipelines), Karpathy's AutoResearch (hypothesis-driven iteration), and Anthropic's official Claude Code best practices (hooks, subagents, skills, concise CLAUDE.md).

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.claude		.claude
docs		docs
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
CLAUDE.md		CLAUDE.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpecForge

The Problem

The Fix

What's Inside

Quick Start

How It Works

/spec Flow

/yolo Flow

/forge Flow (Autonomous via AgentForge)

/research Flow

When to Use Which

The Review Chain

Complexity Tiers

Safety

Notifications

Core Principles

Customization

What This Draws From

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SpecForge

The Problem

The Fix

What's Inside

Quick Start

How It Works

/spec Flow

/yolo Flow

/forge Flow (Autonomous via AgentForge)

/research Flow

When to Use Which

The Review Chain

Complexity Tiers

Safety

Notifications

Core Principles

Customization

What This Draws From

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages