devstack

A Claude Code plugin that wraps a decompose → plan → implement → review → checkpoint loop around long, multi-phase coding work.

Mantra: mechanical verification > AI judgment > human checkpoint. Push as much as possible into hooks; reserve human attention for what hooks and AI review cannot catch.

Status: v1.1 (post-m8-qc-html postmortem). Single-round reviews with HITL split, per-leaf commit budget, dual-track codex review, full state machine.


The four problems devstack is built for

  1. Big tasks fall apart silently. Complexity compounds: with 20 decision points at 80 % per-step accuracy, end-to-end correctness is 0.8^20 ≈ 1.2 %. Without a contract that every downstream step is checked against, "the agent finished" usually means "the agent got tired".
  2. Premature task completion. Training is still running, tests have not been run, acceptance bullets are not satisfied, and Claude marks the task done anyway. The refusal should come from a mechanical gate on the commit, not from a human noticing afterward.
  3. Doc–code drift goes unnoticed. An implementation step changes the code but does not update the top-level decomposition. The next session sees the inconsistency and reverts the code (real CWF-author experience). Treating the decomposition as the contract — and forcing every plan/review to read it — closes that loop.
  4. Human-in-the-loop is either too noisy or too sparse. Asking for input at every step is exhausting; full-auto runs drift unwatched. Devstack splits cognitive load: heavy effort at decompose time (one human-led pass), light dispatch at review time (5 single-letter options), bounded checkpoint surface.
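
The compounding figure in problem 1 can be checked directly. The following is a minimal sketch, assuming every decision point succeeds independently with the same probability (an idealisation, not something devstack measures); end_to_end is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# end_to_end STEPS ACCURACY: probability that all STEPS independent
# decisions are correct when each is right with probability ACCURACY.
# (Hypothetical helper for illustration; not part of devstack.)
end_to_end() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.4f\n", p ^ n }'
}

end_to_end 20 0.80   # 20 decision points at 80 % each
end_to_end 20 0.95   # same depth at 95 % per step
```

Even at a much higher per-step accuracy the product keeps shrinking with depth, which is the argument for checking each step against a contract rather than trusting the chain.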

5-minute quickstart

# In a Claude Code session, in your project directory:
/init                                  # one-shot opt-in + status print
/decompose <high-level goal>           # human-led contract authoring
/auto-implement                        # run plan → impl → review per leaf

That's the whole happy path. Each leaf phase will:

  1. Plan — Claude drafts plan.md with typed task IDs and mechanical acceptance.
  2. PlanReview — codex (read-only) verifies the plan against the decomposition.
  3. Implement — Claude (or codex-goal) executes one task at a time; task-completion-gate.sh runs your acceptance predicates on every commit.
  4. PhaseReview — two codex passes run concurrently: Track A (code quality) and Track B (contract alignment).
  5. Checkpoint — when HITL is on, you dispatch with a single letter: y / r / n / f / aN.

Every leaf produces exactly 2 NEW commits: <slug>/phase X.Y: plan approved and <slug>/phase X.Y: implementation complete. Commits are how state is recovered across sessions.
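
Because state is recovered from commits, the two subject forms above can be matched mechanically. This is a sketch under stated assumptions: the slug contains no "/" or spaces, X.Y is numeric, and classify_subject is a hypothetical name; the plugin's own parsing may differ.

```shell
#!/bin/sh
# classify_subject SUBJECT: print "plan", "impl", or "other".
# Assumes the two subject forms quoted in the text, a slug without "/"
# and numeric X.Y phase ids. Hypothetical helper, not devstack code.
classify_subject() {
  if printf '%s\n' "$1" | grep -Eq '^[^/ ]+/phase [0-9]+\.[0-9]+: plan approved$'; then
    echo plan
  elif printf '%s\n' "$1" | grep -Eq '^[^/ ]+/phase [0-9]+\.[0-9]+: implementation complete$'; then
    echo impl
  else
    echo other
  fi
}

# Example: label every commit in the log, oldest first.
# git log --reverse --format=%s | while IFS= read -r s; do
#   printf '%s\t%s\n' "$(classify_subject "$s")" "$s"
# done
```

A leaf whose latest workflow commit classifies as "plan" has an approved plan but no finished implementation, which is enough to decide where a resumed session should pick up.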


Command flow

              human                                   codex (read-only)            codex / claude (read-write)
                │                                            │                                   │
   /init  ────► opt-in marker + workflow.config.json         │                                   │
                │                                            │                                   │
   /decompose ► author decomposition-<slug>.md ────────────► plan/contract review ──┐            │
                │                                            │                       │            │
                ▼                                            │                       │            │
        approved? ──no──► /decompose amend                   │                       │            │
                │ yes                                        │                       │            │
   /auto-implement ─────────────────────────────────────────►│                       │            │
                │                                            │                       │            │
                ├── for each leaf ──► Plan ─────────────────►│                       │            │
                │                                            │ codex plan review     │            │
                │                                            ▼                       │            │
                │                              PlanReviewWait (HITL on)              │            │
                │                                  y/r/n/f/aN                        │            │
                │                                            │                       │            │
                │                       Implement (engine: claude | codex-goal) ─────│───────────►│
                │                                            │     task-gate after every commit   │
                │                                            ▼                                    │
                │                            PhaseReview (Track A + Track B run concurrently)     │
                │                                            │                                    │
                │                              ReviewWait (HITL on)                               │
                │                                  y/r/n/f/aN                                     │
                │                                            │                                    │
                │                              CheckpointWait (HITL on)  ─► next leaf             │
                ▼
   /status            read-only 5-section report (any time)
   /auto-resume       resume from Paused (handles ReviewWait / PlanReviewWait)
   /auto-abort        terminal abort, preserves commits
   /rollback          delete commits with auto backup branch + 2-stage confirm

a3 at any wait dispatcher means accept this one and auto-skip the next 3 waits — the bridge between fully-supervised and fully-unattended runs.


Why devstack vs other Claude-Code workflow plugins

devstack is opinionated. The opinions came from running real multi-phase work through earlier versions and watching exactly where the wheels came off. Eight things it does that adjacent plugins do not (or do less rigorously):

  1. Formalised state machine. 18 transitions (E1–E18), 11 abort conditions, JSON-schema-validated state file, single-writer (hooks/lib/workflow-lib.sh::transition). Other plugins express state through marker files (coarse, drifts) or carry no formal state at all. Illegal transitions error loudly instead of silently continuing — bugs surface near their cause.
  2. Dual-track concurrent codex review. Track A audits code quality (bug / security / perf / maintainability). Track B audits contract alignment against the decomposition (silent_drop / scope_creep / contract_break / acceptance_miss). Splitting the two avoids the attention dilution that a single reviewer suffers when asked to do both. The P1/P2/P3 grading and Track B's explicit "decomposition: amended" exception are scars from earlier rounds.
  3. Cross-model review. codex reviews Claude's code. Most loops are Claude-reviews-Claude (same model family, shared blind spots) or self-evaluation (/goal family). A heterogeneous reviewer raises the upper bound on caught defects.
  4. Per-leaf commit budget enforced as a Bash hook. commit-budget-gate.sh (PreToolUse on Bash) checks workflow commits, meaning those whose subject carries the slug prefix, against the four reserved commit subjects and rejects anything else; commits without the slug prefix pass through unchecked. The result: each leaf produces exactly 2 NEW workflow commits, the git log is itself a readable work record, and there is nothing to clean up afterward.
  5. Multi-decomposition management. Slug-isolated state with a current-decomposition switch lets a project run several decompositions in flight (auth + payments + migration on the same repo). Adjacent plugins assume one decomposition per project.
  6. Real test coverage. 16 shell test scripts cover the state machine, abort conditions, every hook, every command, all 6 acceptance predicate forms, the commit budget, resume, status, and rollback. Plus dogfood-demo.sh, fixtures, and verify-equivalence.sh. Industrial-grade test surface for a workflow plugin.
  7. First-class long-running processes. ML training, large builds, headless agents — registered via the longproc library, gated by a Stop hook so Claude cannot end the session while a tracked subprocess is ACTIVE or STUCK. Other plugins ignore this regime entirely.
  8. Refused the auto-fix loop. v1.1 deliberately set review.rounds = 1 and added the HITL dispatcher (y/r/n/f/aN). Ralph-style infinite-fix loops have one well-known failure mode: burning budget in the wrong direction. Devstack chose human dispatch over more retries, a choice that came from having tried it the other way first.
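
The "illegal transitions error loudly" behaviour from item 1 can be sketched as a guard around a lookup table. The pairs below are an illustrative subset inferred from the command-flow diagram, not the plugin's actual E1–E18 table, and both function bodies are hypothetical rather than copies of workflow-lib.sh.

```shell
#!/bin/sh
# is_legal FROM TO: succeed only for transitions listed here.
# Illustrative subset only; NOT devstack's real transition table.
is_legal() {
  case "$1>$2" in
    "Plan>PlanReviewWait")       return 0 ;;
    "PlanReviewWait>Implement")  return 0 ;;
    "Implement>PhaseReview")     return 0 ;;
    "PhaseReview>ReviewWait")    return 0 ;;
    "ReviewWait>CheckpointWait") return 0 ;;
    "CheckpointWait>Plan")       return 0 ;;
    *)                           return 1 ;;
  esac
}

# transition FROM TO: print the new state, or fail with a message on stderr.
transition() {
  if is_legal "$1" "$2"; then
    echo "$2"
  else
    echo "illegal transition: $1 -> $2" >&2
    return 1
  fi
}
```

Routing every state change through one function is what makes the single-writer claim checkable: a caller cannot skip PlanReviewWait without the guard failing at the point of the mistake.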

Installation

devstack is a Claude Code plugin. The published repository is the plugin itself (this directory becomes the plugin root).

# 1. Clone
git clone git@github.com:Pelion-AI/devstack.git ~/code/devstack

# 2. Probe deps (need git + jq; codex >= 0.128.0 enables codex-goal engine)
bash ~/code/devstack/scripts/check-deps.sh

# 3. Register with Claude Code (path-based marketplace shown; see MIGRATION.md for alternatives)
jq --arg path "$HOME/code/devstack" '
  .extraKnownMarketplaces["devstack-local"] = { "source": { "source": "path", "path": $path } }
  | .enabledPlugins["devstack@devstack-local"] = true
' ~/.claude/settings.json > /tmp/s.json && mv /tmp/s.json ~/.claude/settings.json

# 4. Restart Claude Code so it loads hooks/hooks.json + the v1.1 commands.

Other registration paths (direct enabledPlugins, ~/.claude/plugins/ symlink, manual user-scope hooks) are documented in MIGRATION.md.
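
Step 2 mentions a minimum codex version of 0.128.0. A numeric, field-by-field comparison avoids the trap of comparing "0.99.0" and "0.128.0" as strings. This is a hedged sketch; version_ge is a hypothetical helper, and check-deps.sh may implement the probe differently.

```shell
#!/bin/sh
# version_ge HAVE WANT: succeed when dotted version HAVE >= WANT.
# Compares up to three numeric fields; missing fields count as 0.
# (Hypothetical helper; not taken from check-deps.sh.)
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

version_ge "0.130.1" "0.128.0" && echo "codex-goal engine available"
```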

Per-project opt-in

The plugin is inert in a project until you opt in. From a Claude Code session:

/init

That writes .claude/workflow-enabled and .claude/workflow.config.json and prints a one-screen status summary. Idempotent — re-runnable.

Kill-switch one project without affecting others:

touch .claude/workflow-emergency-disable

The disable marker takes precedence over workflow-enabled.
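
The precedence rule can be expressed in a few lines. The marker file names come from the text above; workflow_active itself is a hypothetical function shown only to illustrate the order of the checks.

```shell
#!/bin/sh
# workflow_active DIR: succeed when the project at DIR has opted in
# and no emergency-disable marker is present.
# (Hypothetical illustration of the precedence rule, not devstack code.)
workflow_active() {
  # The kill switch is checked first, so it wins over the opt-in marker.
  if [ -e "$1/.claude/workflow-emergency-disable" ]; then
    return 1
  fi
  [ -e "$1/.claude/workflow-enabled" ]
}

# Example: workflow_active . && echo "hooks would run here"
```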


What ships

Commands (10)

command                       purpose
/init                         Opt-in this project + print status
/decompose                    Author or amend a decomposition
/auto-implement               Run the workflow loop (resumable)
/auto-resume                  Resume from Paused (handles ReviewWait / PlanReviewWait)
/auto-abort                   Terminal abort; preserves commits
/status                       Read-only 5-section analytical report
/rollback                     Delete commits with auto backup branch + 2-stage confirm
/auto-long-process <on|off>   Toggle longproc Stop-hook gate
/human-in-the-loop <on|off>   Toggle HITL checkpoint pauses
/spawn-teamagents <on|off>    Toggle teamagents intent (engine still deferred in v1.1)

Skills

workflow-decompose, workflow-plan, workflow-implement, workflow-review, workflow-codex-goal, long-running-processes. Each opens with a "Critical Must-Do" TL;DR and embeds the relevant contracts (LeafPhaseSpec, PlanTask, the 6 mechanical acceptance predicate forms).

Hooks

event                        hook                      purpose
PreToolUse (Bash)            commit-budget-gate.sh     Enforce 2-NEW-commits-per-leaf via reserved subjects
PostToolUse (ExitPlanMode)   codex-review-plan.sh      Single-track codex plan review (~10 min budget); disk fallback for empty payload
PostToolUse (Bash)           task-completion-gate.sh   lint / test / 6-form acceptance / longproc / todo / commit_prefix / loop checks
Stop                         stop-phase-gate.sh        Block session end while ACTIVE/STUCK longproc subprocesses are tracked
SessionStart                 session-start-resume.sh   When status=Paused, surface /status / /auto-resume / /auto-abort
SubagentStop                 subagent-review-save.sh   Persist subagent review-shaped output to <slug>/phase-<X.Y>/review.md

Engines

  • engine: claude (default) — Claude executes one task at a time per the workflow-implement SKILL's 8-step flow; per-task git commit triggers the gate.
  • engine: codex-goal — spawns a headless codex /goal subprocess, classified into 5 termination states; runs a post-codex gate after achieved because codex's own commits do not trigger PostToolUse(Bash).
  • engine: teamagents — config switch only in v1.1; real parallel-Agent-Teams runtime is deferred.

Configuration

workflow.config.json ships with v1.1 defaults. Toggle the three on/off switches at the top (stored as the strings "on" and "off", not JSON booleans) via the /auto-long-process, /human-in-the-loop, /spawn-teamagents commands; everything else is edited directly.

{
  "human_in_the_loop": "on",
  "auto_long_process": "on",
  "spawn_teamagents": "off",
  "plan":      { "codex_review": true,  "rounds": 1, "review_wait_on_hitl": true,  "fix_iterations": 2 },
  "implement": { "default_engine": "claude", "task_failure_threshold": 3, "large_diff_warn_threshold": 800,
                 "codex_goal":   { "default_budget_tokens": 80000, "max_budget_tokens": 200000,
                                   "monitoring_interval_seconds": 30 } },
  "review":    { "method": "codex", "codex_timeout_seconds": 600, "rounds": 1,
                 "review_wait_on_hitl": true, "phase_review_dual_track": true, "fix_iterations": 2 },
  "auto_mode": { "max_duration_minutes": 240, "max_idle_minutes": 30,
                 "budget_limited_action": "pause", "budget_extend_max_count": 1 },
  "rollback":  { "require_clean_tree": true, "auto_backup_branch": true },
  "hooks":     { "session_start_resume_prompt": true, "subagent_review_save": true }
}
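
Since jq is already a dependency, individual settings can be read from a script with a small wrapper. This is a sketch only: cfg is a hypothetical helper, and the path .claude/workflow.config.json is the location that /init is described as writing.

```shell
#!/bin/sh
# cfg FILE JQ_PATH [DEFAULT]: print one value from a workflow config file,
# falling back to DEFAULT when the key is absent or null.
# Caveat: jq's // operator also replaces false, so do not use this
# wrapper for true/false keys such as .plan.codex_review.
# (Hypothetical helper; assumes jq is installed.)
cfg() {
  jq -r --arg d "${3:-}" "($2) // \$d" "$1"
}

# Example, from a project that has run /init:
# cfg .claude/workflow.config.json '.review.codex_timeout_seconds' 600
# cfg .claude/workflow.config.json '.implement.default_engine' claude
```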

HITL dispatcher (single-letter)

When human_in_the_loop=on, plan review and phase review halt at PlanReviewWait / ReviewWait instead of auto-fix-looping:

key   meaning
y     Accept the review and apply P1 fixes
r     Re-run the same review (use when you suspect codex hallucinated)
n     Accept review as final, skip fixes, advance
f     Write feedback.md, transition to Paused; resume after editing
aN    Accept + auto-skip the next N review-wait pauses

aN is the bridge: read the first leaf's review carefully, then a4 to let the rest of the run go unattended.
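
The reply parsing can be sketched as a single case statement. The action names printed below are hypothetical labels for the behaviours in the table, and the 1 to 99 range accepted for N is an assumption of this sketch, not a documented limit.

```shell
#!/bin/sh
# dispatch KEY: map a HITL reply to an action label, printing
# "ACTION" or "ACTION N". Labels are hypothetical, not plugin identifiers.
dispatch() {
  case "$1" in
    y) echo "apply_p1_fixes" ;;
    r) echo "rerun_review" ;;
    n) echo "advance_without_fixes" ;;
    f) echo "pause_for_feedback" ;;
    # aN: strip the leading "a" to get the number of waits to auto-skip.
    a[1-9]|a[1-9][0-9]) echo "accept_and_skip ${1#a}" ;;
    *) echo "unrecognised reply: $1" >&2; return 1 ;;
  esac
}

dispatch a4   # accept this review, then skip the next 4 waits
```

Rejecting anything outside the five forms keeps a mistyped reply from being read as an approval.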


Layout

.
├── .claude-plugin/plugin.json            ← manifest
├── README.md                             ← this file
├── MIGRATION.md                          ← upgrade path from user-scope hooks
├── commands/                             ← 10 slash commands
├── skills/                               ← 6 SKILLs (decompose / plan / implement / review / codex-goal / longproc)
├── hooks/
│   ├── hooks.json                        ← 6 event registrations
│   ├── commit-budget-gate.sh             ← PreToolUse on Bash
│   ├── codex-review-plan.sh              ← PostToolUse on ExitPlanMode (with disk fallback)
│   ├── codex-track-{a,b}-prompt.md       ← dual-track prompt definitions
│   ├── task-completion-gate.sh           ← PostToolUse on Bash (6 acceptance forms)
│   ├── stop-phase-gate.sh                ← Stop hook
│   ├── session-start-resume.sh           ← SessionStart hook
│   ├── subagent-review-save.sh           ← SubagentStop hook
│   └── lib/                              ← workflow-lib.sh (transition), codex-review.sh, codex-goal.sh, longproc-lib.sh
├── scripts/
│   ├── init-project.sh
│   ├── check-deps.sh
│   ├── codex-goal-runner.sh
│   ├── codex-track-{a,b}.sh
│   └── rollback.sh
├── templates/
│   ├── workflow.config.json              ← v1.1 schema
│   ├── auto-state.schema.json            ← legal state combinations
│   └── decomposition.md.template
└── tests/                                ← 16 shell tests + dogfood + fixtures + verify-equivalence

Verification

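# Syntax check (bash -n only parses its first file argument, so loop over the files)
for f in hooks/*.sh hooks/lib/*.sh scripts/*.sh tests/*.sh; do bash -n "$f" || echo "syntax error: $f"; done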

# Run every test
for t in tests/test-*.sh tests/dogfood-demo.sh; do bash "$t"; done

# End-to-end equivalence (real codex; takes a few minutes; skips codex-goal smoke if codex < 0.128.0)
bash tests/verify-equivalence.sh

Limitations (v1.1, known)

  • codex-goal requires codex ≥ 0.128.0. Below that, codex-goal-runner.sh refuses with a clear error. Plan/PhaseReview hooks still work on older codex.
  • engine: teamagents is config-only. /auto-implement step 8 still ESCALATEs on teamagents tasks; real parallel Agent Teams runtime is deferred.
  • Single-session, single-turn execution. A long /auto-implement shares one Claude context window — no auto-compact between leaves. For very long runs, plan smaller decompositions or use HITL f to reset between leaves.
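  • /rollback uses git reset --hard. The discarded commits remain recoverable through the auto-created backup branch and, until its entries expire, through git reflog (Git's default reflog expiry is 90 days, or 30 days for entries no longer reachable from the branch tip). Full purge is not done automatically.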
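
The backup-then-reset pattern behind the last limitation can be sketched as follows. This is not the contents of scripts/rollback.sh: the branch naming is hypothetical, and the 2-stage confirm described for /rollback is deliberately left out.

```shell
#!/bin/sh
# rollback_with_backup TARGET: create a backup branch at the current HEAD,
# then hard-reset the current branch to TARGET and print the backup name.
# Sketch only: hypothetical branch naming, no confirmation prompts.
rollback_with_backup() {
  # Mirror rollback.require_clean_tree: refuse if anything is uncommitted.
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree not clean; refusing to reset" >&2
    return 1
  fi
  backup="backup/rollback-$(date +%Y%m%d-%H%M%S)"
  git branch "$backup" HEAD || return 1
  git reset --hard "$1" >/dev/null || return 1
  echo "$backup"
}

# Undo the rollback later with: git reset --hard <backup-branch>
```

Deleting the backup branch does not by itself purge the commits; they stay in the object store until the reflog entries expire and git gc prunes them.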

License & contact

Author: see .claude-plugin/plugin.json. License: TBD until v1.0 release (treat as all rights reserved).
