devstack

A Claude Code plugin that wraps a decompose → plan → implement → review → checkpoint loop around long, multi-phase coding work.

Mantra: mechanical verification > AI judgment > human checkpoint. Push as much as possible into hooks; reserve human attention for what hooks and AI review cannot catch.

Status: v1.1 (post-m8-qc-html postmortem). Single-round reviews with HITL split, per-leaf commit budget, dual-track codex review, full state machine.

The four problems devstack is built for

Big tasks fall apart silently. Complexity compounds: with 20 decision points at 80 % per-step accuracy, the chance of getting every step right is 0.8^20 ≈ 1 %. Without a contract that every downstream step is checked against, "the agent finished" usually means "the agent got tired".
Premature task completion. Training is still running, tests have not been run, acceptance bullets are not satisfied, and Claude marks the task done anyway. The commit has to be refused by a mechanical gate, not by the human.
Doc–code drift goes unnoticed. An implementation step changes the code but does not update the top-level decomposition. The next session sees the inconsistency and reverts the code (real CWF-author experience). Treating the decomposition as the contract — and forcing every plan/review to read it — closes that loop.
Human-in-the-loop is either too noisy or too sparse. Asking for input at every step is exhausting; full-auto runs drift unwatched. devstack splits the cognitive load: heavy effort at decompose time (one human-led pass), light dispatch at review time (5 single-letter options), and a bounded checkpoint surface.
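The compounding figure in the first problem is plain exponentiation. If each of n steps is independently right with probability p, the whole chain is right with probability p^n. The awk check below assumes the steps are independent. Real errors are often correlated, and correlation would change the number.

```shell
#!/usr/bin/env bash
# End-to-end success probability for n steps, each independently correct
# with probability p. Independence is an assumption of this sketch.
end_to_end() {
  local p=$1 n=$2
  awk -v p="$p" -v n="$n" 'BEGIN { printf "%.4f\n", p ^ n }'
}

end_to_end 0.8 20   # prints 0.0115, i.e. roughly 1 %
```

At 95 % per-step accuracy, the same 20 steps still come out at only about 36 % (0.95^20 ≈ 0.358).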

5-minute quickstart

# In a Claude Code session, in your project directory:
/init                                  # one-shot opt-in + status print
/decompose <high-level goal>           # human-led contract authoring
/auto-implement                        # run plan → impl → review per leaf

That's the whole happy path. Each leaf phase will:

Plan — Claude drafts plan.md with typed task IDs and mechanical acceptance.
PlanReview — codex (read-only) verifies the plan against the decomposition.
Implement — Claude (or codex-goal) executes one task at a time; task-completion-gate.sh runs your acceptance predicates on every commit.
PhaseReview — two codex passes run concurrently: Track A (code quality) and Track B (contract alignment).
Checkpoint — when HITL is on, you dispatch with a single letter: y / r / n / f / aN.

Every leaf produces exactly 2 NEW commits: <slug>/phase X.Y: plan approved and <slug>/phase X.Y: implementation complete. Commits are how state is recovered across sessions.
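Both reserved subjects are fixed strings, so you can check the two-commits-per-leaf invariant straight from the log. The helper below is a hypothetical sketch and does not ship with devstack. The name leaf_commit_count is invented, and the helper assumes subjects follow exactly the format above, with a concrete phase number such as 1.2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with devstack): count the workflow commits
# for one leaf. Reads commit subjects on stdin, one per line, and counts the
# two reserved subjects this README names for every leaf phase.
leaf_commit_count() {
  local slug=$1 phase=$2 count=0 subject
  local plan="$slug/phase $phase: plan approved"
  local impl="$slug/phase $phase: implementation complete"
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r subject || [ -n "$subject" ]; do
    if [ "$subject" = "$plan" ] || [ "$subject" = "$impl" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Example against a real repository (slug "auth", leaf 1.2):
#   git log --format=%s | leaf_commit_count auth 1.2
```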

Command flow

              human                                   codex (read-only)            codex / claude (read-write)
                │                                            │                                   │
   /init  ────► opt-in marker + workflow.config.json         │                                   │
                │                                            │                                   │
   /decompose ► author decomposition-<slug>.md ────────────► plan/contract review ──┐            │
                │                                            │                       │            │
                ▼                                            │                       │            │
        approved? ──no──► /decompose amend                   │                       │            │
                │ yes                                        │                       │            │
   /auto-implement ─────────────────────────────────────────►│                       │            │
                │                                            │                       │            │
                ├── for each leaf ──► Plan ─────────────────►│                       │            │
                │                                            │ codex plan review     │            │
                │                                            ▼                       │            │
                │                              PlanReviewWait (HITL on)              │            │
                │                                  y/r/n/f/aN                        │            │
                │                                            │                       │            │
                │                       Implement (engine: claude | codex-goal) ─────│───────────►│
                │                                            │     task-gate after every commit   │
                │                                            ▼                                    │
                │                            PhaseReview (Track A + Track B run concurrently)     │
                │                                            │                                    │
                │                              ReviewWait (HITL on)                               │
                │                                  y/r/n/f/aN                                     │
                │                                            │                                    │
                │                              CheckpointWait (HITL on)  ─► next leaf             │
                ▼
   /status            read-only 5-section report (any time)
   /auto-resume       resume from Paused (handles ReviewWait / PlanReviewWait)
   /auto-abort        terminal abort, preserves commits
   /rollback          delete commits with auto backup branch + 2-stage confirm

a3 at any wait dispatcher means accept this one and auto-skip the next 3 waits — the bridge between fully-supervised and fully-unattended runs.

Why devstack vs other Claude-Code workflow plugins

devstack is opinionated. The opinions came from running real multi-phase work through earlier versions and watching exactly where the wheels came off. Eight things it does that adjacent plugins do not (or do less rigorously):

Formalised state machine. 18 transitions (E1–E18), 11 abort conditions, a JSON-schema-validated state file, and a single writer (hooks/lib/workflow-lib.sh::transition). Other plugins express state through marker files (coarse and prone to drift) or carry no formal state at all. Illegal transitions fail loudly instead of silently continuing, so bugs surface near their cause.
Dual-track concurrent codex review. Track A audits code quality (bug / security / perf / maintainability). Track B audits contract alignment against the decomposition (silent_drop / scope_creep / contract_break / acceptance_miss). Splitting the two avoids the attention dilution that single-track loops suffer when one reviewer has to check both. P1/P2/P3 grading and Track B's explicit "decomposition: amended" exception are both scars from earlier rounds.
Cross-model review. codex reviews Claude's code. Most loops are Claude-reviews-Claude (same model family, shared blind spots) or self-evaluation (the /goal family). A heterogeneous reviewer raises the upper bound on caught defects.
Per-leaf commit budget enforced as a Bash hook. commit-budget-gate.sh (PreToolUse on Bash) lets a slug-prefixed commit through only if its subject is one of the four reserved workflow subjects. Commits without a slug prefix pass untouched. The result: each leaf produces exactly 2 NEW workflow commits, the git log doubles as a readable work record, and there is nothing to clean up afterward.
Multi-decomposition management. Slug-isolated state with a current-decomposition switch lets a project run several decompositions in flight (auth + payments + migration on the same repo). Adjacent plugins assume one decomposition per project.
Real test coverage. 16 shell test scripts cover the state machine, abort conditions, every hook, every command, all 6 acceptance predicate forms, the commit budget, resume, status, and rollback. Plus dogfood-demo.sh, fixtures, and verify-equivalence.sh. Industrial-grade test surface for a workflow plugin.
First-class long-running processes. ML training, large builds, headless agents — registered via the longproc library, gated by a Stop hook so Claude cannot end the session while a tracked subprocess is ACTIVE or STUCK. Other plugins ignore this regime entirely.
Refused the auto-fix loop. v1.1 deliberately set review.rounds = 1 and added the HITL dispatcher (y/r/n/f/aN). Ralph-style infinite-fix loops have one well-known failure mode: they burn budget pushing in the wrong direction. devstack chose human dispatch over more retries, and that choice came only after trying it the other way first.
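A few lines of shell can illustrate the "single writer, loud failure" idea from the state-machine point. This is not devstack's transition function. The state names come from the command-flow diagram, but the edge list is a small hypothetical subset rather than the real E1–E18 table. The sketch also keeps state in a shell variable instead of a schema-validated JSON file.

```shell
#!/usr/bin/env bash
# Illustrative only: a whitelist of legal edges, a hypothetical subset read
# off the command-flow diagram (not devstack's real E1-E18 table).
is_legal() {
  case "$1>$2" in
    "Plan>PlanReviewWait" | "PlanReviewWait>Implement" | "PlanReviewWait>Paused" | \
    "Implement>PhaseReview" | "PhaseReview>ReviewWait" | "ReviewWait>CheckpointWait" | \
    "ReviewWait>Paused" | "CheckpointWait>Plan")
      return 0 ;;
    *)
      return 1 ;;
  esac
}

STATE=Plan

# The only code path allowed to change STATE. An unknown edge fails loudly
# and leaves STATE untouched.
transition() {
  if ! is_legal "$STATE" "$1"; then
    echo "illegal transition: $STATE -> $1" >&2
    return 1
  fi
  STATE=$1
}
```

Every write goes through transition. A caller that asks for Plan -> Implement therefore gets an error at the call site, instead of a state file that quietly skipped plan review.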

Installation

devstack is a Claude Code plugin. The published repository is the plugin itself (this directory becomes the plugin root).

# 1. Clone
git clone git@github.com:Pelion-AI/devstack.git ~/code/devstack

# 2. Probe deps (need git + jq; codex >= 0.128.0 enables codex-goal engine)
bash ~/code/devstack/scripts/check-deps.sh

# 3. Register with Claude Code (path-based marketplace shown; see MIGRATION.md for alternatives)
jq --arg path "$HOME/code/devstack" '
  .extraKnownMarketplaces["devstack-local"] = { "source": { "source": "path", "path": $path } }
  | .enabledPlugins["devstack@devstack-local"] = true
' ~/.claude/settings.json > /tmp/s.json && mv /tmp/s.json ~/.claude/settings.json

# 4. Restart Claude Code so it loads hooks/hooks.json + the v1.1 commands.

Other registration paths (direct enabledPlugins, ~/.claude/plugins/ symlink, manual user-scope hooks) are documented in MIGRATION.md.

Per-project opt-in

The plugin is inert in a project until you opt in. From a Claude Code session:

/init

That writes .claude/workflow-enabled and .claude/workflow.config.json and prints a one-screen status summary. Idempotent — re-runnable.

Kill-switch one project without affecting others:

touch .claude/workflow-emergency-disable

The disable marker takes precedence over workflow-enabled.
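Every hook needs the same two-marker check before it does anything. Below is a minimal sketch of the precedence rule. It assumes both markers are plain files under .claude/ in the project root. The name workflow_active is hypothetical, not a devstack function.

```shell
#!/usr/bin/env bash
# Hypothetical opt-in check: the emergency-disable marker always wins;
# otherwise the project is active only if workflow-enabled exists.
workflow_active() {
  local root=${1:-.}
  if [ -e "$root/.claude/workflow-emergency-disable" ]; then
    return 1
  fi
  [ -e "$root/.claude/workflow-enabled" ]
}

# A hook could then bail out early when the project has not opted in:
#   workflow_active . || exit 0
```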

What ships

Commands (10)

command	purpose
`/init`	Opt-in this project + print status
`/decompose`	Author or amend a decomposition
`/auto-implement`	Run the workflow loop (resumable)
`/auto-resume`	Resume from `Paused` (handles `ReviewWait` / `PlanReviewWait`)
`/auto-abort`	Terminal abort; preserves commits
`/status`	Read-only 5-section analytical report
`/rollback`	Delete commits with auto backup branch + 2-stage confirm
`/auto-long-process <on|off>`	Toggle longproc Stop-hook gate
`/human-in-the-loop <on|off>`	Toggle HITL checkpoint pauses
`/spawn-teamagents <on|off>`	Toggle teamagents intent (engine still deferred in v1.1)

Skills

workflow-decompose, workflow-plan, workflow-implement, workflow-review, workflow-codex-goal, long-running-processes. Each opens with a "Critical Must-Do" TL;DR and embeds the relevant contracts (LeafPhaseSpec, PlanTask, the 6 mechanical acceptance predicate forms).

Hooks

event	hook	purpose
`PreToolUse` (Bash)	`commit-budget-gate.sh`	Enforce 2-NEW-commits-per-leaf via reserved subjects
`PostToolUse` (ExitPlanMode)	`codex-review-plan.sh`	Single-track codex plan review (~10 min budget); disk fallback for empty payload
`PostToolUse` (Bash)	`task-completion-gate.sh`	lint / test / 6-form acceptance / longproc / todo / commit_prefix / loop checks
`Stop`	`stop-phase-gate.sh`	Block session end while ACTIVE/STUCK longproc subprocesses are tracked
`SessionStart`	`session-start-resume.sh`	When status=Paused, surface `/status` / `/auto-resume` / `/auto-abort`
`SubagentStop`	`subagent-review-save.sh`	Persist subagent review-shaped output to `<slug>/phase-<X.Y>/review.md`
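The Stop-hook rule reduces to "block while anything tracked is ACTIVE or STUCK". The sketch below implements only that decision. It assumes a hypothetical registry with one "name STATE" pair per line. The real longproc-lib.sh format is not described here, and DONE in the test is an invented state. The comment shows how a Stop hook could report the result: Claude Code treats exit code 2 from a Stop hook as "do not stop" and passes stderr back to Claude.

```shell
#!/usr/bin/env bash
# Hypothetical registry format: one "name STATE" pair per line on stdin.
# Prints every tracked process that should keep the session open and
# succeeds (status 0) if there is at least one.
longproc_blocking() {
  local name state found=1
  while read -r name state; do
    case $state in
      ACTIVE | STUCK)
        echo "$name ($state)"
        found=0 ;;
    esac
  done
  return "$found"
}

# Inside a Stop hook, something like:
#   if blocking=$(longproc_blocking < "$registry"); then
#     echo "tracked subprocesses still running: $blocking" >&2
#     exit 2
#   fi
```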

Engines

engine: claude (default) — Claude executes one task at a time per the workflow-implement SKILL's 8-step flow; each per-task git commit triggers the gate.
engine: codex-goal — spawns a headless codex /goal subprocess and classifies its outcome into one of 5 termination states. After an achieved outcome it runs a post-codex gate, because commits made by codex itself do not trigger PostToolUse(Bash).
engine: teamagents — config switch only in v1.1; real parallel-Agent-Teams runtime is deferred.

Configuration

workflow.config.json ships with v1.1 defaults. Toggle the three on/off switches at the top with the /auto-long-process, /human-in-the-loop, and /spawn-teamagents commands; edit everything else directly.

{
  "human_in_the_loop": "on",
  "auto_long_process": "on",
  "spawn_teamagents": "off",
  "plan":      { "codex_review": true,  "rounds": 1, "review_wait_on_hitl": true,  "fix_iterations": 2 },
  "implement": { "default_engine": "claude", "task_failure_threshold": 3, "large_diff_warn_threshold": 800,
                 "codex_goal":   { "default_budget_tokens": 80000, "max_budget_tokens": 200000,
                                   "monitoring_interval_seconds": 30 } },
  "review":    { "method": "codex", "codex_timeout_seconds": 600, "rounds": 1,
                 "review_wait_on_hitl": true, "phase_review_dual_track": true, "fix_iterations": 2 },
  "auto_mode": { "max_duration_minutes": 240, "max_idle_minutes": 30,
                 "budget_limited_action": "pause", "budget_extend_max_count": 1 },
  "rollback":  { "require_clean_tree": true, "auto_backup_branch": true },
  "hooks":     { "session_start_resume_prompt": true, "subagent_review_save": true }
}

HITL dispatcher (single-letter)

When human_in_the_loop=on, plan review and phase review halt at PlanReviewWait / ReviewWait instead of auto-fix-looping:

key	meaning
`y`	Accept the review and apply P1 fixes
`r`	Re-run the same review (use when you suspect codex hallucinated)
`n`	Accept review as final, skip fixes, advance
`f`	Write `feedback.md`, transition to `Paused`; resume after editing
`aN`	Accept + auto-skip the next N review-wait pauses

aN is the bridge: read the first leaf's review carefully, then answer a4 to let the next four review waits pass unattended.
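Here is one hedged way to model the dispatcher and its aN counter. The action names, the AUTO_SKIP variable, and the 1-99 limit on N are assumptions for illustration, not devstack's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: maps a key to an action word and, for aN, arms a
# counter that lets the next N review waits pass without a prompt.
AUTO_SKIP=0

dispatch() {
  case $1 in
    y) echo accept-and-fix ;;
    r) echo rerun-review ;;
    n) echo accept-as-final ;;
    f) echo pause-with-feedback ;;
    a[1-9] | a[1-9][0-9])
      AUTO_SKIP=${1#a}
      echo accept-and-skip ;;
    *)
      echo "unknown key: $1" >&2
      return 1 ;;
  esac
}

# Called at each later review wait. Succeeds (skip the prompt) while the
# counter lasts, decrementing it each time.
should_skip_wait() {
  if [ "$AUTO_SKIP" -gt 0 ]; then
    AUTO_SKIP=$((AUTO_SKIP - 1))
    return 0
  fi
  return 1
}
```

Call dispatch directly rather than inside $(...). A command substitution runs in a subshell, so the AUTO_SKIP update would be lost.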

Layout

.
├── .claude-plugin/plugin.json            ← manifest
├── README.md                             ← this file
├── MIGRATION.md                          ← upgrade path from user-scope hooks
├── commands/                             ← 10 slash commands
├── skills/                               ← 6 SKILLs (decompose / plan / implement / review / codex-goal / longproc)
├── hooks/
│   ├── hooks.json                        ← 6 event registrations
│   ├── commit-budget-gate.sh             ← PreToolUse on Bash
│   ├── codex-review-plan.sh              ← PostToolUse on ExitPlanMode (with disk fallback)
│   ├── codex-track-{a,b}-prompt.md       ← dual-track prompt definitions
│   ├── task-completion-gate.sh           ← PostToolUse on Bash (6 acceptance forms)
│   ├── stop-phase-gate.sh                ← Stop hook
│   ├── session-start-resume.sh           ← SessionStart hook
│   ├── subagent-review-save.sh           ← SubagentStop hook
│   └── lib/                              ← workflow-lib.sh (transition), codex-review.sh, codex-goal.sh, longproc-lib.sh
├── scripts/
│   ├── init-project.sh
│   ├── check-deps.sh
│   ├── codex-goal-runner.sh
│   ├── codex-track-{a,b}.sh
│   └── rollback.sh
├── templates/
│   ├── workflow.config.json              ← v1.1 schema
│   ├── auto-state.schema.json            ← legal state combinations
│   └── decomposition.md.template
└── tests/                                ← 16 shell tests + dogfood + fixtures + verify-equivalence

Verification

# Syntax check
bash -n hooks/*.sh hooks/lib/*.sh scripts/*.sh tests/*.sh

# Run every test
for t in tests/test-*.sh tests/dogfood-demo.sh; do bash "$t"; done

# End-to-end equivalence (real codex; takes a few minutes; skips codex-goal smoke if codex < 0.128.0)
bash tests/verify-equivalence.sh

Limitations (v1.1, known)

codex-goal requires codex ≥ 0.128.0. Below that, codex-goal-runner.sh refuses with a clear error. Plan/PhaseReview hooks still work on older codex.
engine: teamagents is config-only. /auto-implement step 8 still ESCALATEs on teamagents tasks; real parallel Agent Teams runtime is deferred.
Single-session, single-turn execution. A long /auto-implement shares one Claude context window — no auto-compact between leaves. For very long runs, plan smaller decompositions or use HITL f to reset between leaves.
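/rollback uses git reset --hard. Dropped commits stay recoverable from the auto-created backup branch and, for a limited time, from the git reflog. By default git keeps reflog entries for 90 days, or 30 days for entries no longer reachable from the branch tip. Nothing is purged automatically.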
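The rollback limitation is easier to reason about with the sequence spelled out. Below is a hypothetical sketch of the three steps: clean-tree check, backup branch, hard reset. The name rollback_to and the backup/rollback-<timestamp> naming are assumptions. devstack's rollback.sh also adds a two-stage confirmation, which is omitted here.

```shell
#!/usr/bin/env bash
# Hypothetical rollback: refuse on a dirty tree, save the current tip on a
# backup branch, then hard-reset. Prints the backup branch name on success.
rollback_to() {
  local target=$1 backup
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree not clean; refusing to roll back" >&2
    return 1
  fi
  # Two rollbacks within the same second would collide; fine for a sketch.
  backup="backup/rollback-$(date +%Y%m%d-%H%M%S)"
  git branch "$backup" HEAD || return 1
  git reset --hard -q "$target" || return 1
  echo "$backup"
}
```

After this runs, the dropped commits stay reachable from the backup branch until you delete it, regardless of reflog expiry.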
