spec-gate

Stop AI agents from interpreting your specs differently every time.

spec-gate is a validation system for Claude Code that ensures AI specs produce consistent, predictable output. It scores specs before implementation, validates diffs after, and learns from every cycle to get better over time.

npx spec-gate init

Zero runtime dependencies. Zero config required. Works with any Claude Code workflow.

The problem

You write a spec. You give it to an AI agent. It builds something. You give the same spec to another agent — or even the same agent in a new session — and it builds something different.

The root cause: most specs are under-specified. They leave room for interpretation, and every agent interprets differently. "Add auth" could mean JWT or sessions or OAuth. "Update the frontend" could touch 2 files or 20.

How spec-gate fixes it

spec-gate adds two validation gates around your implementation:

  /check-spec                implement              /check-diff
  ┌──────────────┐          ┌─────────┐           ┌──────────┐
  │ Detect type  │          │         │           │ Score     │
  │ (frontend,   │          │         │           │ diff for  │
  │  backend...) │ contract │  Code   │──────────►│ compliance│
  │      │       │────┬────►│         │           │ (type-    │
  │ Score spec   │    │     │         │           │  aware)   │
  │ (domain-     │    │     └─────────┘           └──────────┘
  │  specific)   │    │                                 │
  └──────────────┘    │                                 │
       ▲              ▼                                 │
       │    /check-determinism                          │
       │    ┌──────────────┐                            │
       │    │ 2 agents     │                            │
       │    │ same spec    │                            │
       │    │ compare      │                            │
       │    └──────────────┘                            │
       │              │                                 │
       │        learnings.json                          │
       └────────────────────────────────────────────────┘
                    self-improving loop

Gate 1 (/check-spec) — Detects the spec type (frontend, backend, infra, data, ux, fullstack) and scores 5 determinism signals using domain-specific checklists. If the score is low, asks targeted refinement questions tuned to the spec type. Outputs a contract.

Optional: /check-determinism — Proves the spec is deterministic by having two agents interpret it independently. Light mode compares outlines; full mode compares real code.

Gate 2 (/check-diff) — Compares the actual diff against the contract using type-aware decision verification. Catches scope creep, missing files, boundary violations, and decision divergence. Writes learnings that make Gate 1 smarter next time.

Quick start

Install

# Scaffold into your project (creates Claude Code skills, agent, and config)
npx spec-gate init

# Or install globally
npm i -g spec-gate
spec-gate init

This creates:

.claude/skills/check-spec/SKILL.md         # /check-spec command
.claude/skills/check-determinism/SKILL.md  # /check-determinism command
.claude/skills/check-diff/SKILL.md         # /check-diff command
.claude/agents/spec-gate-validator.md      # Stop hook agent
.claude/settings.json                      # Hook registration
.spec-gate.json                            # Config

Use

In Claude Code:

# Before implementing — score and refine your spec
/check-spec add JWT auth to the login endpoint

# Optional — prove the spec is deterministic before you build
/check-determinism          # light mode (fast, outlines only)
/check-determinism --full   # full mode (two real implementations)

# After implementing — validate the diff
/check-diff

That's it. No config needed.

The three commands

`/check-spec [path|phase-number|prompt]`

Scores a spec on 5 weighted determinism signals:

| Signal | Weight | What it measures |
| --- | --- | --- |
| Scope | ×3 | How precisely the change is described |
| File boundaries | ×2 | Whether exact file paths are listed |
| Acceptance criteria | ×2 | Whether success is testable |
| Negative space | ×2 | Whether out-of-scope items are explicit |
| Decisions resolved | ×2 | Whether technical choices are locked in |

Score ≥ 8/10: Generates a contract directly — the spec is deterministic enough.

Score < 8/10: Asks targeted refinement questions based on codebase context. Not generic suggestions — real questions with real options derived from your actual code:

"The nav bar is in src/app/layout.tsx. Should the logo replace the site title text on line 21, or go to the left of it?"

After refinement, outputs:

Refined spec → .spec-gate/refined-spec.md
Contract → .spec-gate/contract.json
Test suggestions → derived from acceptance criteria

Then offers: Implement now | Plan first | Done for now
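
As a rough illustration of the weighted scoring described above, the sketch below computes a weighted average of the five signals and applies the 8/10 threshold. The signal names and weights come from the table; the 0 to 10 rating per signal, the normalization by total weight, and the names `SignalRatings` and `scoreSpec` are assumptions made for this example, not spec-gate's actual implementation.

```typescript
// Weights taken from the /check-spec signal table.
const WEIGHTS = {
  scope: 3,
  fileBoundaries: 2,
  acceptanceCriteria: 2,
  negativeSpace: 2,
  decisionsResolved: 2,
} as const;

type Signal = keyof typeof WEIGHTS;

// Assumption: each signal is rated from 0 to 10 before weighting.
type SignalRatings = Record<Signal, number>;

// Hypothetical helper: weighted average, normalized back to a 0-10 scale.
function scoreSpec(ratings: SignalRatings): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const signal of Object.keys(WEIGHTS) as Signal[]) {
    weighted += ratings[signal] * WEIGHTS[signal];
    totalWeight += WEIGHTS[signal];
  }
  return weighted / totalWeight;
}

const ratings: SignalRatings = {
  scope: 9,
  fileBoundaries: 4, // no exact file paths listed
  acceptanceCriteria: 8,
  negativeSpace: 6,
  decisionsResolved: 7,
};

const score = scoreSpec(ratings);
// Threshold from the text: 8/10 or higher goes straight to a contract.
const next = score >= 8 ? "generate contract" : "ask refinement questions";
console.log(score.toFixed(1), next); // prints "7.0 ask refinement questions"
```

In this sketch a weak file boundaries rating is enough to pull an otherwise solid spec below the threshold, which is the situation where refinement questions would be asked.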

`/check-determinism [--full]`

Tests whether two independent agents would interpret the spec the same way. Run this before implementing — it validates the spec itself, not your implementation.

Light mode (default) — two agents produce detailed implementation outlines (file lists, imports, key decisions, function signatures, critical code lines) without writing real code. Compares their outlines to find ambiguity. Fast and cheap.

Full mode (--full) — two agents implement the spec for real in isolated git worktrees. Diffs their actual code line-by-line. Definitive but expensive (two full implementations).

Both modes:

Give each agent the identical spec with zero additional context
Report actual determinism: file/import agreement, decision consistency, structural match
Compare predicted score (from /check-spec) against actual results
Write divergence patterns to learnings so /check-spec gets smarter

Use light mode to quickly spot ambiguity during spec refinement. Use full mode when you need proof that a critical spec truly produces identical code.
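
One way to picture the file and import agreement reported by both modes is as set overlap between the two agents' outlines. The sketch below uses the Jaccard index for that. The `Outline` shape, the `agreement` function, and the choice of Jaccard are assumptions for illustration; the source does not say how spec-gate computes agreement.

```typescript
// Hypothetical shape of one agent's light-mode outline.
interface Outline {
  files: string[];
  imports: string[];
}

// Jaccard index: size of the intersection divided by size of the union.
function agreement(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const union = new Set([...a, ...b]);
  if (union.size === 0) return 1; // two empty lists agree trivially
  const shared = Array.from(setA).filter((item) => setB.has(item)).length;
  return shared / union.size;
}

const agentA: Outline = {
  files: ["src/auth/jwt.ts", "src/routes/login.ts"],
  imports: ["jose"],
};
const agentB: Outline = {
  files: ["src/auth/jwt.ts", "src/routes/login.ts", "src/middleware/auth.ts"],
  imports: ["jsonwebtoken"],
};

// Two of three distinct files are shared; the imports do not overlap at all.
console.log("file agreement:", agreement(agentA.files, agentB.files).toFixed(2)); // 0.67
console.log("import agreement:", agreement(agentA.imports, agentB.imports).toFixed(2)); // 0.00
```

A low import agreement like this one points at an unresolved decision in the spec (which JWT library to use) rather than at a problem with either agent.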

`/check-diff [base-branch]`

Compares the actual diff against the contract across 5 compliance signals:

| Signal | Weight | What it measures |
| --- | --- | --- |
| File accuracy | ×3 | Expected files present, no unexpected extras |
| Boundary respect | ×1 | Within file count and line limits |
| Acceptance criteria | ×3 | Each criterion verified against diff evidence |
| Scope discipline | ×1 | No scope creep beyond the contract |
| Decision adherence | ×2 | Technical decisions actually followed in code |

Decision verification is the key differentiator — it doesn't just check which files changed, but what the code actually does. If the spec says "use jose lib, RS256" but the code imports jsonwebtoken with HS256, check-diff catches it.

Uses the contract timestamp to scope the diff, filtering out pre-existing uncommitted changes.
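
To make the jose versus jsonwebtoken example concrete, here is a minimal sketch of a library-level decision check over the added lines of a unified diff. The `LibraryDecision` shape and both function names are hypothetical, and since /check-diff is a Claude Code skill its real verification may look nothing like a regex scan; this only shows the kind of evidence involved.

```typescript
// Hypothetical contract entry: the library the spec chose and the alternatives it rules out.
interface LibraryDecision {
  use: string;
  insteadOf: string[];
}

// Collect module names imported on added lines ("+" prefix) of a unified diff.
// Only `import ... from "x"` and `require("x")` forms are recognized in this sketch.
function addedImports(diff: string): string[] {
  const modules: string[] = [];
  for (const line of diff.split("\n")) {
    if (!line.startsWith("+") || line.startsWith("+++")) continue;
    const match = line.match(
      /\bfrom\s+["']([^"']+)["']|\brequire\(\s*["']([^"']+)["']\s*\)/,
    );
    const name = match ? match[1] || match[2] : undefined;
    if (name) modules.push(name);
  }
  return modules;
}

function checkDecision(decision: LibraryDecision, diff: string): string[] {
  const imported = addedImports(diff);
  const problems: string[] = [];
  if (!imported.includes(decision.use)) {
    problems.push(`expected import of "${decision.use}" not found`);
  }
  for (const banned of decision.insteadOf) {
    if (imported.includes(banned)) {
      problems.push(`diverged: imports "${banned}" instead of "${decision.use}"`);
    }
  }
  return problems;
}

const diff = [
  "+++ b/src/auth/token.ts",
  '+import jwt from "jsonwebtoken";',
  '+const token = jwt.sign(payload, secret, { algorithm: "HS256" });',
].join("\n");

// Reports both the missing "jose" import and the "jsonwebtoken" divergence.
console.log(checkDecision({ use: "jose", insteadOf: ["jsonwebtoken"] }, diff));
```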

Self-improving loop

The core innovation. Every /check-diff run writes learnings to .spec-gate/learnings.json:

File coupling rules — Project-specific patterns like "changing prisma/schema.prisma also requires prisma/migrations/". Next time /check-spec sees a spec that touches the schema but doesn't mention migrations, it lowers the file boundaries score and asks the user about it.

Scoring notes — Tracks which signals get over-scored. If file boundaries has been too generous 3 times, /check-spec applies stricter scoring automatically.

Decision specificity rules — From /check-determinism results, learns which kinds of decisions need more detail. "Use a good auth library" isn't specific enough; "jose, RS256, 1hr expiry" is.

Project checklists — After enough learnings accumulate, /check-spec auto-generates a project-specific checklist and uses it to front-load questions before you even write a vague spec.

Session 1: /check-spec → 10/10 → implement → /check-diff → 6/10
           Learns: "layout.tsx changes also need globals.css"

Session 5: /check-spec → spec mentions layout.tsx without globals.css
           → Flags it: "Past changes to layout.tsx also required globals.css"
           → Score drops to 7/10, triggers refinement
           → After refinement: 10/10 with globals.css included

Session 5: /check-diff → 9/10 → learnings reinforced

The system gets better the more you use it on a project.
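
The file coupling rules lend themselves to a small sketch: given the files a spec mentions and the rules learned so far, list the couplings the spec has missed. The `CouplingRule` shape and the `missingCoupledFiles` name are hypothetical; the source does not document the layout of learnings.json.

```typescript
// Hypothetical shape of a coupling rule stored in learnings.json.
interface CouplingRule {
  whenChanged: string;
  alsoRequires: string; // a file, or a directory prefix ending in "/"
}

// Return a warning for every rule whose trigger file is in the spec
// but whose coupled path is not mentioned anywhere in it.
function missingCoupledFiles(specFiles: string[], rules: CouplingRule[]): string[] {
  return rules
    .filter((rule) => specFiles.includes(rule.whenChanged))
    .filter((rule) => !specFiles.some((file) => file.startsWith(rule.alsoRequires)))
    .map((rule) => `Past changes to ${rule.whenChanged} also required ${rule.alsoRequires}`);
}

const rules: CouplingRule[] = [
  { whenChanged: "prisma/schema.prisma", alsoRequires: "prisma/migrations/" },
  { whenChanged: "layout.tsx", alsoRequires: "globals.css" },
];

// The spec mentions layout.tsx but not globals.css, so one warning is produced.
console.log(missingCoupledFiles(["layout.tsx"], rules));
```

Each warning would then feed into a lower file boundaries score and a refinement question, as in the Session 5 example.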

Works with any workflow

spec-gate auto-detects what produced the spec:

| Workflow | How to use |
| --- | --- |
| Raw prompt | `/check-spec add a dark mode toggle` |
| GSD phases | `/check-spec 24` → reads `.planning/phase-24/PLAN.md` |
| Plan mode | `/check-spec` with no args → picks up plan from context |
| Plan files | `/check-spec ./my-plan.md` |
| spec-kit | Auto-detected from `.spec-kit/` directory |
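
The argument handling in the table can be sketched as a small classifier. The `SpecSource` type, the `classifyArgument` name, and the rule that a space-free argument ending in `.md` is a plan file are assumptions for this example; spec-kit detection from the `.spec-kit/` directory is left out because it depends on the file system rather than on the argument.

```typescript
type SpecSource =
  | { kind: "context" }
  | { kind: "gsd-phase"; path: string }
  | { kind: "file"; path: string }
  | { kind: "prompt"; text: string };

// Hypothetical classifier for the text that follows /check-spec.
function classifyArgument(arg: string): SpecSource {
  const trimmed = arg.trim();
  // No args: pick up the plan from the conversation context.
  if (trimmed === "") return { kind: "context" };
  // A bare number is a GSD phase and maps to that phase's PLAN.md.
  if (/^\d+$/.test(trimmed)) {
    return { kind: "gsd-phase", path: `.planning/phase-${trimmed}/PLAN.md` };
  }
  // Assumption: a single token ending in .md is a plan file path.
  if (/^\S+\.md$/.test(trimmed)) return { kind: "file", path: trimmed };
  // Anything else is treated as a raw prompt.
  return { kind: "prompt", text: trimmed };
}

for (const arg of ["add a dark mode toggle", "24", "", "./my-plan.md"]) {
  console.log(JSON.stringify(arg), "->", JSON.stringify(classifyArgument(arg)));
}
```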

Example: plan mode workflow

A common pattern is to use Claude Code's built-in plan mode to design an approach, then validate it with spec-gate before implementing:

# 1. Use plan mode to design the feature
> Plan how to add rate limiting to the API endpoints

# Claude enters plan mode, explores the codebase, proposes a plan
# You review and approve the plan

# 2. Before implementing, validate the plan for determinism
> /check-spec

# spec-gate picks up the plan from context, scores it, and asks
# refinement questions for any gaps:
#
#   Spec type: backend
#   Score: 6/10
#
#   "The plan mentions rate limiting but doesn't specify the strategy.
#    Should it use a fixed window (100 req/min, 429 after) or
#    a token bucket (refilled at 100 tokens/min)?"
#
#   "The plan doesn't specify the error response format.
#    Should 429 responses return {error, retryAfter} or
#    use a Retry-After header?"

# 3. After refinement, choose "Implement now" or "Plan first"
# The contract locks in the decisions so implementation is predictable

# 4. After implementation, validate the diff
> /check-diff

# spec-gate verifies the code matches the contract:
# right files changed, decisions followed, criteria met

The key insight: plan mode gives you a strategy; spec-gate makes it deterministic. A plan might say "add rate limiting middleware", and spec-gate makes sure it specifies which library, what limits, which status codes, and what error format.

Configuration

.spec-gate.json (created by init, all fields optional):

{
  "specSources": [
    { "pattern": ".planning/phase-*/PLAN.md", "workflow": "gsd" },
    { "pattern": ".spec-kit/**/*.md", "workflow": "spec-kit" }
  ],
  "ignoredPaths": [
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "*.snap", ".env*", "*.generated.*"
  ],
  "boundaries": {
    "defaultMaxFiles": 20,
    "defaultMaxLinesAdded": 1000
  }
}
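
To show how the `ignoredPaths` and `boundaries` fields might work together, the sketch below filters a diff's file list and checks it against the default limits. That ignored files are excluded from the counts is an assumption, as are the `FileChange` shape, the function names, and the deliberately small glob matcher (only `*`, tested against the file name).

```typescript
interface Boundaries {
  defaultMaxFiles: number;
  defaultMaxLinesAdded: number;
}

// Hypothetical per-file summary of a diff.
interface FileChange {
  path: string;
  linesAdded: number;
}

// Minimal glob support for this sketch: "*" matches any run of characters,
// and patterns are tested against the file name only, not the full path.
function matchesPattern(path: string, pattern: string): boolean {
  const name = path.split("/").pop() ?? path;
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$").test(name);
}

function checkBoundaries(
  changes: FileChange[],
  ignoredPaths: string[],
  limits: Boundaries,
): string[] {
  // Assumption: ignored files count toward neither limit.
  const counted = changes.filter(
    (change) => !ignoredPaths.some((pattern) => matchesPattern(change.path, pattern)),
  );
  const linesAdded = counted.reduce((sum, change) => sum + change.linesAdded, 0);
  const violations: string[] = [];
  if (counted.length > limits.defaultMaxFiles) {
    violations.push(`${counted.length} files changed (limit ${limits.defaultMaxFiles})`);
  }
  if (linesAdded > limits.defaultMaxLinesAdded) {
    violations.push(`${linesAdded} lines added (limit ${limits.defaultMaxLinesAdded})`);
  }
  return violations;
}

const violations = checkBoundaries(
  [
    { path: "src/api/rate-limit.ts", linesAdded: 140 },
    { path: "package-lock.json", linesAdded: 2300 },
    { path: "src/__snapshots__/api.test.ts.snap", linesAdded: 90 },
  ],
  ["package-lock.json", "yarn.lock", "pnpm-lock.yaml", "*.snap", ".env*", "*.generated.*"],
  { defaultMaxFiles: 20, defaultMaxLinesAdded: 1000 },
);

// Only the 140-line source file is counted, so no limits are exceeded.
console.log(violations); // prints []
```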

CLI commands

spec-gate init                # Scaffold skills, agent, hook, config
spec-gate init --skills-only  # Only install skills and config (no hook)
spec-gate init --hooks-only   # Only install agent and hook
spec-gate init --force        # Overwrite existing files (creates backups)
spec-gate update              # Update to latest templates (keeps config)
spec-gate remove              # Remove all spec-gate files
spec-gate remove --data       # Also remove .spec-gate/ directory

File overview

After init, these files exist in your project:

| File | Purpose | Git? |
| --- | --- | --- |
| `.claude/skills/check-spec/SKILL.md` | Gate 1 skill | Yes |
| `.claude/skills/check-determinism/SKILL.md` | Determinism test skill | Yes |
| `.claude/skills/check-diff/SKILL.md` | Gate 2 skill | Yes |
| `.claude/agents/spec-gate-validator.md` | Stop hook agent | Yes |
| `.claude/settings.json` | Hook registration | Yes |
| `.spec-gate.json` | Configuration | Yes |
| `.spec-gate/contract.json` | Current contract | No* |
| `.spec-gate/refined-spec.md` | Current refined spec | No* |
| `.spec-gate/learnings.json` | Cross-session learnings | No* |

*The contract and refined spec in .spec-gate/ are session-specific, while learnings.json persists across sessions. Add the directory to .gitignore or commit it, your choice. Learnings are more valuable if committed, because they are then shared across team members.

Design principles

Zero runtime dependencies — CLI uses only Node.js built-ins
Always advisory — Never blocks execution, never exits with error codes
Non-destructive settings merge — Only touches the hooks key in settings.json
Scaffolding, not a framework — After init, the files are yours. Edit them freely.
Works offline — Everything runs locally, no API calls beyond Claude Code itself
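
The non-destructive settings merge can be sketched as follows: copy the existing settings, touch only the `hooks` key, and skip entries that are already registered so repeated runs do not duplicate them. The `Settings` and `Hooks` types, the `mergeHooks` name, the de-duplication step, and the placeholder hook entry are assumptions for illustration, not spec-gate's actual code or the exact settings.json schema.

```typescript
type Settings = Record<string, unknown>;
type Hooks = Record<string, unknown[]>;

// Hypothetical merge: every key other than "hooks" is passed through unchanged.
function mergeHooks(existing: Settings, additions: Hooks): Settings {
  const currentHooks = (existing["hooks"] ?? {}) as Hooks;
  const mergedHooks: Hooks = { ...currentHooks };
  for (const event of Object.keys(additions)) {
    const already = mergedHooks[event] ?? [];
    const incoming = additions[event] ?? [];
    // Compare entries by their JSON form so re-running init adds nothing new.
    const seen = already.map((entry) => JSON.stringify(entry));
    const fresh = incoming.filter((entry) => !seen.includes(JSON.stringify(entry)));
    mergedHooks[event] = [...already, ...fresh];
  }
  return { ...existing, hooks: mergedHooks };
}

// Illustrative existing settings with an unrelated key that must survive the merge.
const existing: Settings = {
  permissions: { allow: ["Bash(npm test)"] },
};
// Placeholder entry standing in for the real Stop hook registration.
const additions: Hooks = {
  Stop: [{ note: "placeholder for the spec-gate-validator entry" }],
};

const once = mergeHooks(existing, additions);
const twice = mergeHooks(once, additions);
console.log(JSON.stringify(twice, null, 2));
```

Running the merge twice yields the same result as running it once, which is the property that makes re-running init safe in this sketch.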

License

MIT
