Stop AI agents from interpreting your specs differently every time.
spec-gate is a validation system for Claude Code that ensures AI specs produce consistent, predictable output. It scores specs before implementation, validates diffs after, and learns from every cycle to get better over time.
npx spec-gate init
Zero runtime dependencies. Zero config required. Works with any Claude Code workflow.
You write a spec. You give it to an AI agent. It builds something. You give the same spec to another agent — or even the same agent in a new session — and it builds something different.
The root cause: most specs are under-specified. They leave room for interpretation, and every agent interprets differently. "Add auth" could mean JWT or sessions or OAuth. "Update the frontend" could touch 2 files or 20.
spec-gate adds two validation gates around your implementation:
  /check-spec              implement            /check-diff
┌──────────────┐          ┌─────────┐          ┌────────────┐
│ Detect type  │          │         │          │ Score      │
│ (frontend,   │          │         │          │ diff for   │
│ backend...)  │ contract │  Code   │─────────►│ compliance │
│              │────┬────►│         │          │ (type-     │
│ Score spec   │    │     │         │          │ aware)     │
│ (domain-     │    │     └─────────┘          └────────────┘
│ specific)    │    │                                 │
└──────────────┘    │                                 │
        ▲           ▼                                 │
        │   /check-determinism                        │
        │   ┌──────────────┐                          │
        │   │ 2 agents     │                          │
        │   │ same spec    │                          │
        │   │ compare      │                          │
        │   └──────────────┘                          │
        │           │                                 │
        │    learnings.json                           │
        └─────────────────────────────────────────────┘
                      self-improving loop
Gate 1 (/check-spec) — Detects the spec type (frontend, backend, infra, data, ux, fullstack) and scores 5 determinism signals using domain-specific checklists. If the score is low, asks targeted refinement questions tuned to the spec type. Outputs a contract.
Optional: /check-determinism — Proves the spec is deterministic by having two agents interpret it independently. Light mode compares outlines; full mode compares real code.
Gate 2 (/check-diff) — Compares the actual diff against the contract using type-aware decision verification. Catches scope creep, missing files, boundary violations, and decision divergence. Writes learnings that make Gate 1 smarter next time.
# Scaffold into your project (creates Claude Code skills, agent, and config)
npx spec-gate init
# Or install globally
npm i -g spec-gate
spec-gate init
This creates:
.claude/skills/check-spec/SKILL.md # /check-spec command
.claude/skills/check-determinism/SKILL.md # /check-determinism command
.claude/skills/check-diff/SKILL.md # /check-diff command
.claude/agents/spec-gate-validator.md # Stop hook agent
.claude/settings.json # Hook registration
.spec-gate.json # Config
In Claude Code:
# Before implementing — score and refine your spec
/check-spec add JWT auth to the login endpoint
# Optional — prove the spec is deterministic before you build
/check-determinism # light mode (fast, outlines only)
/check-determinism --full # full mode (two real implementations)
# After implementing — validate the diff
/check-diff
That's it. No config needed.
Scores a spec on 5 weighted determinism signals:
| Signal | Weight | What it measures |
|---|---|---|
| Scope | ×3 | How precisely the change is described |
| File boundaries | ×2 | Whether exact file paths are listed |
| Acceptance criteria | ×2 | Whether success is testable |
| Negative space | ×2 | Whether out-of-scope items are explicit |
| Decisions resolved | ×2 | Whether technical choices are locked in |
Score ≥ 8/10: Generates a contract directly — the spec is deterministic enough.
Score < 8/10: Asks targeted refinement questions based on codebase context. Not generic suggestions — real questions with real options derived from your actual code:
"The nav bar is in src/app/layout.tsx. Should the logo replace the site title text on line 21, or go to the left of it?"
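To make the weighting concrete, here is a minimal sketch of how five weighted signals could collapse into one score out of 10. It assumes each signal is rated 0 to 10 and that the total is a weighted average; the actual formula used by /check-spec is not documented here, and the names `Signal`, `weightedScore`, and `nextStep` are hypothetical.

```typescript
// Hypothetical sketch: assumes each signal is rated 0-10 and the overall
// score is a weighted average. Not spec-gate's actual implementation.
type Signal =
  | "scope"
  | "fileBoundaries"
  | "acceptanceCriteria"
  | "negativeSpace"
  | "decisionsResolved";

// Weights copied from the table above.
const WEIGHTS: Record<Signal, number> = {
  scope: 3,
  fileBoundaries: 2,
  acceptanceCriteria: 2,
  negativeSpace: 2,
  decisionsResolved: 2,
};

function weightedScore(ratings: Record<Signal, number>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const signal of Object.keys(WEIGHTS) as Signal[]) {
    weighted += ratings[signal] * WEIGHTS[signal];
    totalWeight += WEIGHTS[signal];
  }
  return weighted / totalWeight; // stays on the 0-10 scale
}

// Threshold from the text: 8 or above goes straight to a contract.
function nextStep(score: number): "contract" | "refine" {
  return score >= 8 ? "contract" : "refine";
}

// A vague spec: decent scope, almost no file paths, nothing ruled out.
const vagueSpec = weightedScore({
  scope: 6,
  fileBoundaries: 2,
  acceptanceCriteria: 5,
  negativeSpace: 0,
  decisionsResolved: 4,
});
console.log(vagueSpec.toFixed(1), nextStep(vagueSpec));
```

Under these assumptions a spec cannot reach the threshold on scope alone, because the other four signals together carry 8 of the 11 weight points.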
After refinement, outputs:
- Refined spec → .spec-gate/refined-spec.md
- Contract → .spec-gate/contract.json
- Test suggestions → derived from acceptance criteria
Then offers: Implement now | Plan first | Done for now
Tests whether two independent agents would interpret the spec the same way. Run this before implementing — it validates the spec itself, not your implementation.
Light mode (default) — two agents produce detailed implementation outlines (file lists, imports, key decisions, function signatures, critical code lines) without writing real code. Compares their outlines to find ambiguity. Fast and cheap.
Full mode (--full) — two agents implement the spec for real in isolated git worktrees. Diffs their actual code line-by-line. Definitive but expensive (two full implementations).
Both modes:
- Give each agent the identical spec with zero additional context
- Report actual determinism: file/import agreement, decision consistency, structural match
- Compare predicted score (from /check-spec) against actual results
- Write divergence patterns to learnings so /check-spec gets smarter
Use light mode to quickly spot ambiguity during spec refinement. Use full mode when you need proof that a critical spec truly produces identical code.
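The file/import agreement that both modes report can be pictured as set overlap between the two agents' outlines. The sketch below uses Jaccard similarity, which is an assumption chosen for illustration; the `Outline` shape and the function names are hypothetical and do not reflect spec-gate's actual report format.

```typescript
// Hypothetical outline shape; the real outlines also carry decisions,
// function signatures, and critical code lines.
interface Outline {
  files: string[];
  imports: string[];
}

function unique(items: string[]): string[] {
  return items.filter((item, i) => items.indexOf(item) === i);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets count as full agreement.
function jaccard(a: string[], b: string[]): number {
  const ua = unique(a);
  const ub = unique(b);
  const shared = ua.filter((item) => ub.indexOf(item) !== -1);
  const unionSize = ua.length + ub.length - shared.length;
  return unionSize === 0 ? 1 : shared.length / unionSize;
}

function compareOutlines(a: Outline, b: Outline) {
  const onlyA = unique(a.files).filter((f) => b.files.indexOf(f) === -1);
  const onlyB = unique(b.files).filter((f) => a.files.indexOf(f) === -1);
  return {
    fileAgreement: jaccard(a.files, b.files),
    importAgreement: jaccard(a.imports, b.imports),
    divergentFiles: onlyA.concat(onlyB),
  };
}

// Two agents read "add JWT auth to the login endpoint" differently.
const agentA: Outline = {
  files: ["src/auth/jwt.ts", "src/routes/login.ts"],
  imports: ["jose"],
};
const agentB: Outline = {
  files: ["src/auth/jwt.ts", "src/middleware/auth.ts"],
  imports: ["jsonwebtoken"],
};
const report = compareOutlines(agentA, agentB);
console.log(report);
```

Here the agents agree on one of three distinct files and on none of the imports, which points at two gaps in the spec: where the check lives and which library to use.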
Compares the actual diff against the contract across 5 compliance signals:
| Signal | Weight | What it measures |
|---|---|---|
| File accuracy | ×3 | Expected files present, no unexpected extras |
| Boundary respect | ×1 | Within file count and line limits |
| Acceptance criteria | ×3 | Each criterion verified against diff evidence |
| Scope discipline | ×1 | No scope creep beyond the contract |
| Decision adherence | ×2 | Technical decisions actually followed in code |
Decision verification is the key differentiator — it doesn't just check which files changed, but what the code actually does. If the spec says "use jose lib, RS256" but the code imports jsonwebtoken with HS256, check-diff catches it.
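A purely mechanical approximation of that check helps show what "decision adherence" means. The sketch below scans only the added lines of a unified diff for patterns a decision requires or rules out. It is an illustration, not how the /check-diff skill works internally, and the `Decision` shape and function names are hypothetical.

```typescript
// Hypothetical contract entry for one technical decision.
interface Decision {
  description: string;
  mustAppear: RegExp[]; // patterns expected somewhere in the added lines
  mustNotAppear: RegExp[]; // patterns that contradict the decision
}

// Keep added lines only, skipping the "+++ b/file" header lines.
function addedLines(diff: string): string[] {
  return diff
    .split("\n")
    .filter((line) => line.charAt(0) === "+" && line.slice(0, 3) !== "+++")
    .map((line) => line.slice(1));
}

function checkDecision(decision: Decision, diff: string) {
  const added = addedLines(diff).join("\n");
  const violations: string[] = [];
  for (const pattern of decision.mustAppear) {
    if (!pattern.test(added)) violations.push("missing: " + pattern.source);
  }
  for (const pattern of decision.mustNotAppear) {
    if (pattern.test(added)) violations.push("forbidden: " + pattern.source);
  }
  return { followed: violations.length === 0, violations };
}

const useJose: Decision = {
  description: "use jose lib, RS256",
  mustAppear: [/from ["']jose["']/, /RS256/],
  mustNotAppear: [/jsonwebtoken/, /HS256/],
};

const divergentDiff = [
  "+++ b/src/auth/token.ts",
  '+import jwt from "jsonwebtoken";',
  '+const token = jwt.sign(payload, secret, { algorithm: "HS256" });',
].join("\n");

const compliantDiff = [
  "+++ b/src/auth/token.ts",
  '+import { SignJWT } from "jose";',
  '+const jwt = new SignJWT(payload).setProtectedHeader({ alg: "RS256" });',
].join("\n");

const divergent = checkDecision(useJose, divergentDiff);
const compliant = checkDecision(useJose, compliantDiff);
console.log(divergent.violations, compliant.followed);
```

Pattern matching like this only catches literal divergence; judging what the code actually does still needs a reader that understands the code.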
Uses the contract timestamp to scope the diff, filtering out pre-existing uncommitted changes.
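The file accuracy signal can be sketched as a set comparison between the files a contract expects and the files a diff touched, after dropping anything matched by `ignoredPaths` in .spec-gate.json. Everything below is an assumption for illustration: the function names are hypothetical, and the glob handling is deliberately simplified (`*` matches any run of characters, tested against the file's base name only).

```typescript
// Simplified glob: escape regex metacharacters, then let "*" match anything.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp("^" + escaped + "$");
}

// Assumption: patterns are matched against the base name, not the full path.
function isIgnored(path: string, ignoredPaths: string[]): boolean {
  const base = path.split("/").pop() || path;
  return ignoredPaths.some((glob) => globToRegExp(glob).test(base));
}

function fileAccuracy(expected: string[], changed: string[], ignoredPaths: string[]) {
  const relevant = changed.filter((p) => !isIgnored(p, ignoredPaths));
  return {
    missing: expected.filter((p) => relevant.indexOf(p) === -1),
    unexpected: relevant.filter((p) => expected.indexOf(p) === -1),
  };
}

const ignoredPaths = [
  "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
  "*.snap", ".env*", "*.generated.*",
];

const accuracy = fileAccuracy(
  ["src/app/layout.tsx", "src/app/globals.css"], // files the contract expects
  [
    "src/app/layout.tsx",
    "src/app/page.tsx",
    "package-lock.json",
    "src/__snapshots__/nav.test.tsx.snap",
  ],
  ignoredPaths,
);
console.log(accuracy);
```

In this example the lock file and the snapshot are filtered out, leaving one missing file (src/app/globals.css) and one unexpected file (src/app/page.tsx).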
The core innovation. Every /check-diff run writes learnings to .spec-gate/learnings.json:
File coupling rules — Project-specific patterns like "changing prisma/schema.prisma also requires prisma/migrations/". Next time /check-spec sees a spec that touches the schema but doesn't mention migrations, it lowers the file boundaries score and asks the user about it.
Scoring notes — Tracks which signals get over-scored. If file boundaries has been too generous 3 times, /check-spec applies stricter scoring automatically.
Decision specificity rules — From /check-determinism results, learns which kinds of decisions need more detail. "Use a good auth library" isn't specific enough; "jose, RS256, 1hr expiry" is.
Project checklists — After enough learnings accumulate, /check-spec auto-generates a project-specific checklist and uses it to front-load questions before you even write a vague spec.
Session 1: /check-spec → 10/10 → implement → /check-diff → 6/10
Learns: "layout.tsx changes also need globals.css"
Session 5: /check-spec → spec mentions layout.tsx without globals.css
→ Flags it: "Past changes to layout.tsx also required globals.css"
→ Score drops to 7/10, triggers refinement
→ After refinement: 10/10 with globals.css included
Session 5: /check-diff → 9/10 → learnings reinforced
The system gets better the more you use it on a project.
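A file coupling rule is the easiest of these learnings to picture in code. The sketch below shows how a stored rule could flag a spec that touches one file without mentioning its companion. The layout of learnings.json is not shown in this README, so the `CouplingRule` shape and the function name are hypothetical.

```typescript
// Hypothetical rule shape; the real learnings.json layout may differ.
interface CouplingRule {
  trigger: string; // file whose change implies another change
  requires: string; // file, or directory prefix, that should also be listed
}

function missingCoupledFiles(specFiles: string[], rules: CouplingRule[]): string[] {
  const warnings: string[] = [];
  for (const rule of rules) {
    const triggered = specFiles.indexOf(rule.trigger) !== -1;
    // A directory rule such as "prisma/migrations/" is satisfied by any file under it.
    const satisfied = specFiles.some(
      (f) => f === rule.requires || f.indexOf(rule.requires) === 0,
    );
    if (triggered && !satisfied) {
      warnings.push(`Past changes to ${rule.trigger} also required ${rule.requires}`);
    }
  }
  return warnings;
}

const rules: CouplingRule[] = [
  { trigger: "prisma/schema.prisma", requires: "prisma/migrations/" },
];

// Spec lists the schema but no migration: one warning.
const flagged = missingCoupledFiles(["prisma/schema.prisma", "src/api/users.ts"], rules);

// Spec lists both: no warnings.
const clean = missingCoupledFiles(
  ["prisma/schema.prisma", "prisma/migrations/add_user/migration.sql"],
  rules,
);
console.log(flagged, clean);
```

A non-empty warning list is the kind of evidence that would justify lowering the file boundaries score and asking a refinement question.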
spec-gate auto-detects what produced the spec:
| Workflow | How to use |
|---|---|
| Raw prompt | /check-spec add a dark mode toggle |
| GSD phases | /check-spec 24 → reads .planning/phase-24/PLAN.md |
| Plan mode | /check-spec with no args → picks up plan from context |
| Plan files | /check-spec ./my-plan.md |
| spec-kit | Auto-detected from .spec-kit/ directory |
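The argument-based rows of this table can be read as a small dispatch on what follows /check-spec. The sketch below is one plausible reading, not the skill's actual logic: the `SpecSource` type and `resolveSpecSource` are hypothetical, the ".md" test for plan files is an assumption, and spec-kit detection is left out because it depends on checking the filesystem for a .spec-kit/ directory.

```typescript
// Hypothetical result type for the four argument-driven workflows.
type SpecSource =
  | { workflow: "plan-mode" }
  | { workflow: "gsd"; path: string }
  | { workflow: "plan-file"; path: string }
  | { workflow: "raw-prompt"; text: string };

function resolveSpecSource(args: string): SpecSource {
  const input = args.trim();
  // No arguments: pick up the plan from the conversation context.
  if (input === "") return { workflow: "plan-mode" };
  // A bare number is treated as a GSD phase.
  if (/^\d+$/.test(input)) {
    return { workflow: "gsd", path: ".planning/phase-" + input + "/PLAN.md" };
  }
  // Assumption: a single token ending in .md is a plan file path.
  if (/^\S+\.md$/.test(input)) return { workflow: "plan-file", path: input };
  // Anything else is the spec itself.
  return { workflow: "raw-prompt", text: input };
}

console.log(resolveSpecSource("24"));
console.log(resolveSpecSource("./my-plan.md"));
console.log(resolveSpecSource("add a dark mode toggle"));
console.log(resolveSpecSource(""));
```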
A common pattern is to use Claude Code's built-in plan mode to design an approach, then validate it with spec-gate before implementing:
# 1. Use plan mode to design the feature
> Plan how to add rate limiting to the API endpoints
# Claude enters plan mode, explores the codebase, proposes a plan
# You review and approve the plan
# 2. Before implementing, validate the plan for determinism
> /check-spec
# spec-gate picks up the plan from context, scores it, and asks
# refinement questions for any gaps:
#
# Spec type: backend
# Score: 6/10
#
# "The plan mentions rate limiting but doesn't specify the strategy.
# Should it use fixed window (100 req/min, 429 after) or
# token bucket (100 tokens, refilled at 100 tokens/min)?"
#
# "The plan doesn't specify the error response format.
# Should 429 responses return {error, retryAfter} or
# use a Retry-After header?"
# 3. After refinement, choose "Implement now" or "Plan first"
# The contract locks in the decisions so implementation is predictable
# 4. After implementation, validate the diff
> /check-diff
# spec-gate verifies the code matches the contract:
# right files changed, decisions followed, criteria met
The key insight: plan mode gives you a strategy, spec-gate makes it deterministic. A plan might say "add rate limiting middleware"; spec-gate pushes it to specify which library, what limits, which status codes, and what error format.
.spec-gate.json (created by init, all fields optional):
{
"specSources": [
{ "pattern": ".planning/phase-*/PLAN.md", "workflow": "gsd" },
{ "pattern": ".spec-kit/**/*.md", "workflow": "spec-kit" }
],
"ignoredPaths": [
"package-lock.json", "yarn.lock", "pnpm-lock.yaml",
"*.snap", ".env*", "*.generated.*"
],
"boundaries": {
"defaultMaxFiles": 20,
"defaultMaxLinesAdded": 1000
}
}
spec-gate init # Scaffold skills, agent, hook, config
spec-gate init --skills-only # Only install skills and config (no hook)
spec-gate init --hooks-only # Only install agent and hook
spec-gate init --force # Overwrite existing files (creates backups)
spec-gate update # Update to latest templates (keeps config)
spec-gate remove # Remove all spec-gate files
spec-gate remove --data # Also remove .spec-gate/ directory
After init, these files exist in your project:
| File | Purpose | Git? |
|---|---|---|
| .claude/skills/check-spec/SKILL.md | Gate 1 skill | Yes |
| .claude/skills/check-determinism/SKILL.md | Determinism test skill | Yes |
| .claude/skills/check-diff/SKILL.md | Gate 2 skill | Yes |
| .claude/agents/spec-gate-validator.md | Stop hook agent | Yes |
| .claude/settings.json | Hook registration | Yes |
| .spec-gate.json | Configuration | Yes |
| .spec-gate/contract.json | Current contract | No* |
| .spec-gate/refined-spec.md | Current refined spec | No* |
| .spec-gate/learnings.json | Cross-session learnings | No* |
*Whether to track the .spec-gate/ directory is your choice: add it to .gitignore or commit it. The contract and refined spec are session-specific, but learnings are more valuable if committed (shared across team members).
- Zero runtime dependencies: CLI uses only Node.js built-ins
- Always advisory: Never blocks execution, never exits with error codes
- Non-destructive settings merge: Only touches the hooks key in settings.json
- Scaffolding, not a framework: After init, the files are yours. Edit them freely.
- Works offline: Everything runs locally, no API calls beyond Claude Code itself
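The non-destructive settings merge is worth a closer look, since .claude/settings.json often already holds user configuration. The sketch below shows one way such a merge could behave: every top-level key other than `hooks` is passed through untouched, existing hook entries are kept, and re-running the merge adds nothing twice. The function name is hypothetical, and the hook entries in the example are placeholders, not Claude Code's real hook schema.

```typescript
type Settings = { [key: string]: unknown };
type Hooks = { [event: string]: unknown[] };

// Hypothetical merge: only the "hooks" key of the result differs from the input.
function mergeHooks(existing: Settings, newHooks: Hooks): Settings {
  const currentHooks = (existing["hooks"] || {}) as Hooks;
  const mergedHooks: Hooks = {};
  for (const event of Object.keys(currentHooks)) {
    mergedHooks[event] = currentHooks[event].slice();
  }
  for (const event of Object.keys(newHooks)) {
    const current = mergedHooks[event] || [];
    const present = current.map((entry) => JSON.stringify(entry));
    // Skip entries that are already registered so repeated runs are idempotent.
    const additions = newHooks[event].filter(
      (entry) => present.indexOf(JSON.stringify(entry)) === -1,
    );
    mergedHooks[event] = current.concat(additions);
  }
  return { ...existing, hooks: mergedHooks };
}

// Placeholder entries for illustration only.
const userSettings: Settings = {
  permissions: { allow: ["Bash(npm test)"] },
  hooks: { Stop: [{ note: "user's own stop hook" }] },
};
const specGateHooks: Hooks = { Stop: [{ note: "spec-gate-validator" }] };

const mergedOnce = mergeHooks(userSettings, specGateHooks);
const mergedTwice = mergeHooks(mergedOnce, specGateHooks);
console.log(JSON.stringify(mergedTwice, null, 2));
```

Comparing entries by their JSON serialization is a simplification; a real merge would more likely identify spec-gate's own entries by a stable marker.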
MIT