Upfront

Force thinking before code.

AI makes writing code effortless. That's the problem.

When code is cheap to produce, it's easy to stop thinking. Skip the spec. Skip the design. Let the AI figure it out. Ship it, fix it later. Except "later" is now a codebase full of confident mistakes that passed every automated check and got rubber-stamped in review.

This is how teams lose their skills. Not all at once — gradually. The senior engineer who used to catch architectural issues in review now approves AI-generated PRs because the code "looks right." The mid-level engineer who used to design systems now prompts for them. The junior engineer who should be learning to think never has to, because the AI thinks for them.

Nobody notices until something breaks and no one on the team understands the system well enough to fix it.

Upfront solves this by making the thinking process explicit, challenging, and auditable. Every command in this toolkit is designed to force humans to engage — not fill out templates, not click "yes" to AI suggestions, but actually think through what they're building and why.

If the AI can't talk you out of your approach, your approach is probably sound. If it can, you should know that before writing a single line of code.

How it works

Upfront is a set of skills for Claude Code that cover the full development lifecycle. Install as a plugin — two commands in your terminal, no dependencies, no build step, no SaaS.

The commands follow a natural flow:

Think → Define → Plan → Build → Ship → Learn

Every command that involves decisions challenges you. The AI doesn't suggest and wait for approval — it asks open questions, waits for your answer, then fills the gaps you missed. You think first. The AI decorates second.

Commands

Think

`/ideate` — Find a problem worth solving

For when you don't know what to build yet. Divergent brainstorming that starts wide ("what's bugging you?"), clusters related pain points, challenges whether each is worth solving, converges to 1-3 candidates, and helps you pick one. No output file — the output is clarity. When you have a problem, it hands off to /feature.

`/explore` — Document the codebase and its ecosystem

Five-phase investigation that produces specs/ARCHITECTURE.md — the shared reference document every other command reads. Goes beyond code: maps external connections (latency, SLAs, failure modes, data volumes), ecosystem context (upstream, downstream, blast radius), and operational reality (deployment, monitoring, who gets paged). Asks about ecosystem diagrams.

Greenfield projects get a fast exit: detects empty repos, creates minimal scaffolding, sends you to /feature.

`/teach` — Rebuild your understanding

For when you haven't touched the codebase in a while. Gauges your current level (refresher, domain expert but new to code, or completely new), walks through the system in layers (context → happy path → failure modes → invariants → connections), and optionally quizzes you with questions that test real understanding — not trivia.

Generates a study guide with your strong areas, focus areas, key files to read, invariants to memorize, and a safe first task to build familiarity.

Define

`/feature` — Define a feature through forced thinking

The core command. Four phases, each with a different AI role:

Intent — Five forcing-function questions. What problem? How will you know it worked? What's out of scope? What must NOT happen? Pre-mortem. The AI is adversarial — it pushes back on vague answers and refuses to move on until the thinking is substantive.
Behavioral Spec — Five-level funnel: user stories → mechanism (why will this approach work?) → states and transitions → concurrency and shared state → error cases. The AI challenges your causal logic before any code is discussed.
Design Conversation — The AI researches the codebase, presents options and tradeoffs. You make the decisions.
Implementation Design — The AI proposes architecture AND challenges the codebase itself. Flags inconsistent patterns, ambiguous placement, structural rot. Cleanup becomes a prerequisite.

Every phase transition produces a thinking record — what was decided, why, what was rejected, what was skipped. The spec is the audit trail of the thinking, not just the conclusions.

If your problem statement is vague, /feature suggests /ideate. If there's no architecture doc, it suggests /explore.

Appends design decisions to specs/DECISIONS.md for future reference.

`/refine` — Iterate on a spec without starting over

Add // comments directly in the spec file to mark corrections, then run /refine. The AI treats each // comment as an agenda item — but challenges every change before applying it. Weakening a success metric? Why? Removing a constraint? What changed?

Checks for ripple effects across sections and updates thinking records with refinement notes.
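As a rough illustration of the first step, here is a minimal sketch of collecting `//` comments from a spec into an agenda. It assumes a correction is any line that starts with `//` (after optional indentation); `AgendaItem` and `collect_agenda` are hypothetical names, not part of Upfront.

```python
import re
from typing import NamedTuple


class AgendaItem(NamedTuple):
    line_number: int  # 1-based line in the spec file
    comment: str      # comment text without the leading "//"


# Assumption: only lines that begin with "//" count as corrections,
# so "//" inside a URL in the middle of a line is ignored.
COMMENT_RE = re.compile(r"^\s*//\s?(.*)$")


def collect_agenda(spec_text: str) -> list[AgendaItem]:
    """Return one agenda item per // comment line, in file order."""
    items = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        match = COMMENT_RE.match(line)
        if match:
            items.append(AgendaItem(number, match.group(1).strip()))
    return items


spec = """## Success metric
Checkout completes in under 2 seconds.
// relax this to 3 seconds
## Constraints
// drop the offline requirement
"""
agenda = collect_agenda(spec)
```

Each item keeps its line number so a challenge ("Weakening a success metric? Why?") can point back at the exact place in the spec.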

Plan

`/plan` — Break a spec into buildable phases

Starts with a three-level architectural deep-dive:

System architecture — components, communication, system-level invariants
Subsystem architecture — modules, boundaries, data models, subsystem invariants
Design patterns and connections — concrete interfaces, edge behaviors, error propagation

Each level is confirmed before proceeding. The AI challenges assumptions and pushes for well-understood behaviors: "You're building a queue with flush semantics — this is a solved problem. Are you inventing from scratch?"

If specs/ARCHITECTURE.md is stale (>30 days old with commits since), the AI refuses to use it at face value — actively compares it to the actual codebase and presents specific drifts.
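The staleness rule can be sketched as a small predicate. This is an illustration under assumptions: how Upfront actually obtains the document's last-updated time and the commit times is not specified here, so both are passed in as plain values, and `is_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)  # the ">30 days" threshold from the text


def is_stale(doc_updated: datetime, commit_times: list[datetime], now: datetime) -> bool:
    """Stale means: older than 30 days AND at least one commit landed since."""
    older_than_threshold = now - doc_updated > STALE_AFTER
    commits_since = any(t > doc_updated for t in commit_times)
    return older_than_threshold and commits_since
```

Both conditions must hold: an old document in a repository with no commits since is still accurate, and a recent document is trusted even if commits followed it.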

After architecture, audits for missing guardrails (linters, type checkers, security scanners, dead code detection, secret detection, slopsquatting protection). Proposes a Phase 0 to install anything missing.

Then breaks the spec into ~400 LOC phases, each independently verifiable and committable.

Architecture persists in specs/ARCHITECTURE.md across features and is reviewed and updated on every /plan run.

Build

`/build` — Execute phases with TDD, review, and red team

The orchestrator. Spawns a fresh sub-agent for each phase (preventing context degradation), enforces strict TDD, and runs a post-phase code review.

Pre-flight: Detects all ecosystems in the project, verifies existing tools work, audits for missing tools (per-language checklists for Go, TypeScript, Python, Rust, JVM — linting, security, vulnerability scanning, dead code, formatting, secret detection, slopsquatting). Pushes hard for installation.

Per phase:

Fresh sub-agent with clean context (RALPH pattern)
Strict TDD — red/green/refactor. Work rejected if tests aren't written first.
Automated verification — every check from the plan runs independently
Post-phase code review — separate agent checks spec compliance, correctness, architecture. Optional stronger model for review.

After all phases:

Integration sweep — verify pieces connect correctly
Red team — adversarial agent that tries to break correctness, concurrency, boundaries, tests, and security. Fixes obvious issues, asks about judgment calls, flags design concerns.
Learning capture — appends to specs/LEARNINGS.md

Crash recovery: On resume, detects uncommitted changes (keep/stash/discard), reconciles progress file with git history, presents structured handoff summary.

`/quick` — Small changes without ceremony

For well-understood changes that don't need the full workflow. Takes a one-line description. Still enforces TDD and runs a code review, but no spec, no plan, no phases. If the change grows past ~50 lines, stops and redirects to /feature.

`/patch` — More structure than /quick, less than /feature

For bug fixes and small features from a clear problem statement or GitHub issue.

`/debug` — Scientific method debugging

Hypothesis → test → narrow → fix, with persistent state in specs/DEBUG.md. If a session dies mid-debug, the next session reads the file and picks up — never re-tries eliminated hypotheses.

Integrates with browser devtools (via Gasoline) for web/UI debugging: console errors, network requests, screenshots, DOM state.

Circuit breaker: after 3 failed hypothesis cycles, stops and asks for more context.
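A minimal sketch of the state this implies: a list of eliminated hypotheses plus a failure counter that trips the circuit breaker. The actual layout of specs/DEBUG.md is not described here, so `DebugSession` and its fields are hypothetical.

```python
from dataclasses import dataclass, field

MAX_FAILED_CYCLES = 3  # the circuit-breaker limit from the text


@dataclass
class DebugSession:
    eliminated: list[str] = field(default_factory=list)
    failed_cycles: int = 0

    def should_test(self, hypothesis: str) -> bool:
        """Never re-try a hypothesis that was already eliminated."""
        return hypothesis not in self.eliminated

    def record_failure(self, hypothesis: str) -> None:
        """A failed cycle eliminates the hypothesis and counts toward the limit."""
        self.eliminated.append(hypothesis)
        self.failed_cycles += 1

    @property
    def circuit_open(self) -> bool:
        """True once it is time to stop and ask for more context."""
        return self.failed_cycles >= MAX_FAILED_CYCLES
```

Persisting a structure like this between sessions is what lets a new session pick up where the previous one stopped.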

Ship

`/ship` — Create a PR with spec context

Auto-populates the PR description from the spec: why (intent), what (behavioral summary), key decisions, constraints, and a verification checklist. Reviewers get the "why" without reading the full spec. Links to the spec file for deep dives.

Learn

`/retro` — Check predictions against reality

After a feature ships and has production data, go back to the spec's "how will we know it worked?" and check. Scores each prediction (hit/partial/miss/unknown), analyzes why misses happened (mechanism, measurement, or environment), and extracts generalizable lessons.

Pushes for numbers, not feelings. "I think it improved" gets challenged: "Do you have the actual number?"

Feeds forward: suggests changes to /feature, /plan, or /build if the retro reveals a pattern.
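The scoring step can be pictured as a tally over the four outcomes. This is only a sketch; `summarize` and the shape of the input are assumptions, not the command's real data model.

```python
from collections import Counter

OUTCOMES = ("hit", "partial", "miss", "unknown")  # the four scores named above


def summarize(predictions: dict[str, str]) -> dict[str, int]:
    """Count how many predictions landed in each outcome bucket."""
    for name, outcome in predictions.items():
        if outcome not in OUTCOMES:
            raise ValueError(f"{name}: unknown outcome {outcome!r}")
    counts = Counter(predictions.values())
    return {outcome: counts.get(outcome, 0) for outcome in OUTCOMES}
```

A retro with many `unknown` scores is itself a finding: the success criteria were not measurable.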

Support

`/note` — Zero-friction todo capture

`/note this module needs refactoring` appends a timestamped item to `specs/TODO.md`. `/note` shows the list. `/note done 3` marks item 3 complete. `/note clear` removes completed items.
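The three operations are simple list edits. The sketch below assumes items are stored as Markdown checkboxes with a timestamp prefix; the real format of specs/TODO.md is not documented here, and the function names are hypothetical.

```python
from datetime import datetime


def add_note(lines: list[str], text: str, now: datetime) -> list[str]:
    """Append a timestamped, unchecked item (assumed checkbox format)."""
    return lines + [f"- [ ] {now:%Y-%m-%d %H:%M} {text}"]


def mark_done(lines: list[str], index: int) -> list[str]:
    """Check off item `index`, counting from 1 as in `/note done 3`."""
    if not 1 <= index <= len(lines):
        raise IndexError(f"no item {index}")
    updated = list(lines)
    updated[index - 1] = updated[index - 1].replace("- [ ]", "- [x]", 1)
    return updated


def clear_done(lines: list[str]) -> list[str]:
    """Drop every completed item, keeping the rest in order."""
    return [line for line in lines if not line.startswith("- [x]")]
```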

`/pause` — Structured session handoff

Captures everything the next session needs: what was running, what's done, what's next, key decisions, gotchas, git state, active files. Writes specs/HANDOFF.md. No questions asked — reads the conversation context and gets out.

`/resume` — Restore from handoff

Reads specs/HANDOFF.md, checks what changed since the pause, presents a structured briefing, waits for confirmation. Integrates with /build's crash recovery.

Persistent documents

These files accumulate project knowledge across features and sessions:

| File | Purpose | Updated by |
| --- | --- | --- |
| `specs/ARCHITECTURE.md` | System + subsystem + patterns + external connections | `/explore`, `/plan` |
| `specs/DECISIONS.md` | Append-only design decision register | `/feature`, `/refine` |
| `specs/LEARNINGS.md` | What surprised us, what went wrong, patterns | `/build`, `/debug`, `/retro` |
| `specs/TODO.md` | Scratchpad for ideas and tasks | `/note` |
| `specs/HANDOFF.md` | Session continuity | `/pause` |
| `specs/DEBUG.md` | Active debug session state | `/debug` |

Install

Run these in your terminal:

claude plugin marketplace add ThinkUpfront/Upfront
claude plugin install upfront

Restart Claude Code. All 20 /upfront:* skills will be available in every project.

Telemetry

Upfront sends anonymous usage events to help prioritize development: plugin version, skill name, and a hashed project identifier (derived from your git remote URL). No personally identifiable information is collected: no IP addresses, repo names, file paths, or code. The telemetry implementation lives in plugin/hooks/hooks.json; it is open source, so you can verify exactly what is sent. Set DO_NOT_TRACK=1 to disable.
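To make "hashed project identifier" concrete, here is a sketch of one way such an identifier could be derived. The hash algorithm is an assumption (SHA-256 is used purely for illustration), and `project_id` and `telemetry_enabled` are hypothetical names; plugin/hooks/hooks.json remains the source of truth.

```python
import hashlib
import os
from typing import Mapping, Optional


def project_id(remote_url: str) -> str:
    """One-way hash of the git remote URL (SHA-256 is an assumption)."""
    return hashlib.sha256(remote_url.strip().encode("utf-8")).hexdigest()


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Telemetry is off when DO_NOT_TRACK is set to 1."""
    env = os.environ if env is None else env
    return env.get("DO_NOT_TRACK") != "1"
```

The same remote always maps to the same identifier, which allows counting distinct projects without transmitting the URL or repo name.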

Audit binary and team pipelines

The plugin gives you the skills and hooks. The upfront binary adds the audit trail — a durable JSONL log of every thinking record produced during /feature runs. This is what turns individual discipline into team-wide visibility.

How it works

Every time /feature produces a thinking record, the PostToolUse hook captures it and writes a structured event to .upfront/audit.jsonl. Events are stored locally first (durable — survives network failures), then optionally flushed to a remote endpoint.
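The local-first part of this design amounts to appending one JSON object per line. The sketch below is an illustration of that pattern, not the binary's implementation; `append_event` and `read_events` are hypothetical names, and the event fields are whatever the caller passes in.

```python
import json
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one event as a single JSON line, creating the directory if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(log_path: Path) -> list[dict]:
    """Read the log back, one dict per non-empty line."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each event is a complete line, a failed remote flush never touches what is already on disk; the queue can simply be retried later.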

CLI commands

upfront status                          # Queue depth, last event, config
upfront log                             # View audit events (last 50)
upfront log --feature checkout --phase 1  # Filter by feature/phase
upfront flush                           # Push queued events to remote
upfront purge                           # Delete events older than TTL
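The filtering and purge behavior described by these commands can be sketched over a list of already-parsed events. Assumptions: each event carries `feature`, `phase`, and an ISO 8601 `timestamp` with a UTC offset; the function names are hypothetical and the real binary may behave differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def filter_events(events: list[dict], feature: Optional[str] = None,
                  phase: Optional[int] = None, limit: int = 50) -> list[dict]:
    """Keep events matching the given feature/phase, newest `limit` entries."""
    matches = [
        e for e in events
        if (feature is None or e.get("feature") == feature)
        and (phase is None or e.get("phase") == phase)
    ]
    return matches[-limit:] if limit > 0 else []


def purge_expired(events: list[dict], ttl_days: int, now: datetime) -> list[dict]:
    """Drop events whose timestamp is older than the TTL."""
    cutoff = now - timedelta(days=ttl_days)
    return [e for e in events if datetime.fromisoformat(e["timestamp"]) >= cutoff]
```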

Remote integration

Configure .upfront/config.json (project-level) or ~/.upfront/config.json (user-level) to flush events to your observability stack:

{
  "endpoint": "https://your-langfuse.example.com/api/public/ingestion",
  "auth_header": "Bearer pk-lf-...",
  "ttl_days": 90,
  "project_name": "my-project"
}

Important: Config files may contain API tokens. Add .upfront/config.json to your .gitignore.
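One plausible way to combine the two config locations is to read the user-level file first and let the project-level file override it. That precedence is an assumption made for this sketch, not documented behavior, and `load_config` is a hypothetical name.

```python
import json
from pathlib import Path


def load_config(project_dir: Path, home_dir: Path) -> dict:
    """Merge user-level then project-level config (assumed precedence)."""
    merged: dict = {}
    candidates = (
        home_dir / ".upfront" / "config.json",     # user-level
        project_dir / ".upfront" / "config.json",  # project-level, applied last
    )
    for path in candidates:
        if path.is_file():
            merged.update(json.loads(path.read_text(encoding="utf-8")))
    return merged
```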

Compatible tools

Upfront events are structured JSON, compatible with any tool that accepts HTTP POST of JSON payloads:

Langfuse — open-source LLM observability with trace visualization and team views
Arize Phoenix — LLM tracing and evaluation
Helicone — LLM monitoring and analytics
Portkey — AI gateway with observability
Custom webhooks — any endpoint that accepts POST with Content-Type: application/json

The event format extends the agent-monitoring trace schema from the Delivery-Gap-Toolkit, which aligns with OpenTelemetry span conventions. Each event contains session, timestamp, phase, feature name, and the full thinking record summary.

What managers see

Three metrics without reading a single spec:

Adoption — percentage of features that went through /feature vs built ad-hoc
Depth — how many phases were completed (all 4 = thorough thinking, 1-2 = bailed early)
Effectiveness — rework rate difference between spec'd and unspec'd features
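As a worked example, the three metrics can be computed from a per-feature summary. This is a sketch under assumptions: `FeatureRecord` is a hypothetical shape (the audit log does not necessarily store rework data), and a feature with zero completed phases is treated as ad hoc.

```python
from dataclasses import dataclass


@dataclass
class FeatureRecord:
    name: str
    phases_completed: int  # 0 means built ad hoc, 4 means all /feature phases
    reworked: bool         # whether the feature needed rework after shipping


def _rate(flags: list[bool]) -> float:
    return sum(flags) / len(flags) if flags else 0.0


def adoption(records: list[FeatureRecord]) -> float:
    """Share of features that went through /feature at all."""
    return _rate([r.phases_completed > 0 for r in records])


def mean_depth(records: list[FeatureRecord]) -> float:
    """Average number of phases completed among spec'd features."""
    depths = [r.phases_completed for r in records if r.phases_completed > 0]
    return sum(depths) / len(depths) if depths else 0.0


def effectiveness(records: list[FeatureRecord]) -> float:
    """Rework rate of ad hoc features minus rework rate of spec'd features."""
    specd = [r.reworked for r in records if r.phases_completed > 0]
    adhoc = [r.reworked for r in records if r.phases_completed == 0]
    return _rate(adhoc) - _rate(specd)
```

A positive `effectiveness` value means spec'd features were reworked less often than ad hoc ones.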

The audit trail is the triage layer. 30 seconds to decide whether to read the spec or send it back.

Backed by research

Upfront's design is grounded in empirical research on AI-assisted software development:

The skill erosion problem

Anthropic (2026): Anthropic's own randomized controlled trial — 52 junior engineers learning an async library. The AI group scored 17% lower on comprehension. The largest gap was in debugging — AI specifically undermines the ability to identify when code is wrong. Critically, developers who used AI for conceptual questions scored 65%+, while those who delegated code generation scored below 40%. The interaction pattern matters more than whether AI was used. This is exactly the dynamic Upfront is designed to enforce — thinking, not delegating. (Anthropic, "The Impact of AI Assistance on Coding Skill Formation")
METR (2025): A randomized controlled trial found that AI tooling actually increased completion time by 19% for experienced open-source developers working on their own repositories. The productivity narrative assumes AI helps everyone; this study shows it can actively hurt experienced developers on familiar codebases. (METR, "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity")
Harvard/BCG (2023): 758 consultants using GPT-4 — quality jumped 40% inside the AI's capability frontier, but dropped 19 percentage points below the no-AI group on tasks outside it. The variable was judgment, and judgment atrophies without exercise. (Dell'Acqua et al., "Navigating the Jagged Technological Frontier")
DORA (2025): AI adoption increases throughput but also increases delivery instability. Teams shipping faster are simultaneously shipping less reliably. Speed without verification is not a gain. (DORA State of DevOps Report 2025)
Stanford (2023): Developers using AI assistants experienced a "false sense of confidence": they believed their code was more secure than it actually was. Participants with access to an AI assistant wrote less secure code than those without, yet were more likely to rate it as secure. (Perry et al., "Do Users Write More Insecure Code with AI Assistants?")

The review collapse

SmartBear/Cisco: Review effectiveness collapses above 400 lines of code. Reviewers stop finding defects when diffs are too large. This is why /plan targets ~400 LOC phases — it's the empirical limit of human attention. (SmartBear Code Review Study)
Faros AI (2024): 10,000 developers across 1,255 teams — PR volume increased 98% after AI adoption, but net throughput (accepted, non-reverted changes) showed zero improvement. The volume increase was absorbed by review overhead and rework.
GitClear (2024): Across 211 million changed lines, code churn doubled while refactoring collapsed. AI generates more code but less of it survives contact with production.

Why forcing functions work

Nagappan & Ball (2005): Code churn is a defect predictor with 89% accuracy. Rework rate — the metric /feature ties to success criteria — is one of the strongest signals of software quality. (ICSE 2005)
Capers Jones: Defect removal efficiency (DRE) above 95% is the threshold for adequate quality, across 12,000+ projects. The Verification Triangle in the Delivery Gap framework uses this as the theoretical anchor.
Mäntylä & Lassenius (2009): 75% of defects found in code review are evolvability issues (structure, clarity, maintainability), not functional bugs. This is why /build's review checks architecture and spec compliance, not just correctness.

The specification gap

Montgomery et al.: Systematic mapping of requirements quality research shows that ambiguous requirements are the single largest source of downstream defects. Clear intent — what Upfront's /feature forces — reduces rework more than any other intervention.
Jellyfish (2025): 60% of engineering leaders cite "lack of clear metrics" as their biggest challenge with AI adoption. Only 20% measure AI's actual impact on delivery outcomes. The audit trail and /retro feedback loop exist to close this gap.

Philosophy

Upfront is built on one belief: the thinking is the product, not the code.

AI can generate code. It cannot generate judgment. Every command in this toolkit exists to protect and exercise human judgment — the one thing that doesn't come back once it's gone.

The spec is not the point. The thinking the spec forces is the point.

Read HUMAN-FIRST.md for the full picture: how human-writes mode works, why challenge-first questioning matters, what thinking records capture, and what this means for ICs, leads, and managers.

License

AGPL-3.0. See LICENSE.

For commercial licensing (proprietary use, SaaS embedding, or redistribution without AGPL obligations), contact brenn@thedeliverygap.com.

Acknowledgments

Inspired by Superhuman, Get-Shit-Done, and Agentic Brownfield Coding, as well as countless hours of research on the impact of AI on learning, coding, and people in general.

Always Think Upfront.
