A Claude Code / GitHub Copilot CLI plugin for a complete development lifecycle with built-in ticket management. Conserves context by delegating to sub-agents, and guarantees quality through a Generator-Evaluator pipeline.
- Why simple-workflow?
- Prerequisites
- Quick Start
- Building Blocks
- Core Workflow
- All Skills
- Configuration
- Limitations
- Acknowledgements
- Contributing
Claude Code is powerful, but its context window is finite — and fragile. Long-running agent sessions face four structural threats:
| Threat | What happens | Structural countermeasure |
|---|---|---|
| Loss | Session boundaries — compaction, exit — discard accumulated understanding | Pre-compact hooks, /catchup recovery, /tune cross-session learning |
| Exhaustion | The window fills up, degrading instruction-following and response quality | Bounded sub-agent returns (< 500 tokens), phase-aware context release |
| Contamination | Biasing information leaks into contexts where it distorts judgment | Information firewall (Generator → Evaluator blocked), ticket directory confinement |
| Bloat | Unbounded intermediate output crowds out critical instructions | Artifacts written to files, structured summaries returned to orchestrator |
simple-workflow addresses each threat with architectural constraints that hold regardless of model behavior — not prompt-level instructions that the model might rationalize away.
Treats the context window as a consumable resource and systematically conserves it.
- Bounded sub-agent returns: Each sub-agent launches with a fresh context, writes detailed artifacts to files, and returns only a structured summary (< 500 tokens). Without this bound, multi-round orchestration would accumulate unbounded output and degrade the orchestrator's decision quality
- Phase-aware context release: /catchup auto-detects your current phase (investigate → plan → implement → test → review → commit) and recommends the next action. Completed phases live on disk, so you can clear the context and move on
- Structured state preservation: Before context compaction, a hook saves per-ticket state as YAML frontmatter. /catchup parses this to resume interrupted work, including mid-/impl loops
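The bounded-return idea above can be sketched in Python. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, and the token count is approximated by whitespace-separated words, whereas real counts depend on the model's tokenizer.

```python
from pathlib import Path

MAX_SUMMARY_TOKENS = 500  # bound quoted in the text above


def estimate_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer (assumption): count whitespace-separated words."""
    return len(text.split())


def finish_subagent(ticket_dir: Path, artifact_name: str, details: str, summary: str) -> str:
    """Write the full output to a file and return only a bounded summary."""
    ticket_dir.mkdir(parents=True, exist_ok=True)
    artifact = ticket_dir / artifact_name
    artifact.write_text(details, encoding="utf-8")

    # Reserve room for a pointer to the artifact, then truncate the summary
    # so the returned text stays strictly under the bound.
    pointer = f"(full details: {artifact})"
    budget = max(MAX_SUMMARY_TOKENS - estimate_tokens(pointer) - 1, 0)
    words = summary.split()[:budget]
    return " ".join(words + [pointer])
```

The orchestrator only ever sees the return value; anything it needs later is read back from the artifact file on demand.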
Structurally separates "writing code" from "judging code" to guarantee quality by design.
- A Generator writes code, independent Evaluators verify it, and failures trigger automatic retry with specific feedback, up to 3 rounds. See /impl for the full pipeline
- Information firewall (asymmetric): Evaluators never see the Generator's self-assessment; they judge solely from git diff and test results. The reverse is intentionally open: on retry, the Generator receives Evaluator feedback. Even when both sides run the same model, weights × context = output: by excluding the Generator's trial-and-error history and implicit knowledge of shortcuts from the Evaluator's context, sunk-cost bias is structurally eliminated rather than merely discouraged by prompt
- Mandatory Skill Invocations: Orchestrator skills enforce that sub-agent dispatch uses the Skill tool rather than manual bash fallbacks. MUST/NEVER/Fail language in skill definitions makes this a structural contract, not a suggestion, ensuring sub-agents always launch with proper context isolation
- FAIL-CRITICAL violations halt execution immediately: no rationalization, no retry
- After ticket completion, evaluation logs feed into the Knowledge Base, creating a cross-session feedback loop
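The retry loop and the asymmetric firewall can be sketched as follows. This is a minimal model under stated assumptions: `Verdict`, `run_pipeline`, and the callable signatures are hypothetical names for illustration, and the real pipeline dispatches sub-agents instead of calling Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_ROUNDS = 3  # "up to 3 rounds" in the text above


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""
    fail_critical: bool = False


def run_pipeline(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Verdict],
) -> Tuple[str, int]:
    """Return (status, rounds_used).

    generate(feedback) returns a diff. evaluate(diff) receives only that
    diff, never the Generator's notes or history (the firewall). Feedback
    flows the other way on retry.
    """
    feedback: Optional[str] = None
    for round_no in range(1, MAX_ROUNDS + 1):
        diff = generate(feedback)
        verdict = evaluate(diff)
        if verdict.fail_critical:
            return "halted", round_no  # no retry on critical violations
        if verdict.passed:
            return "passed", round_no
        feedback = verdict.feedback
    return "failed", MAX_ROUNDS
```

The firewall is expressed in the types: `evaluate` has no parameter through which the Generator's self-assessment could reach it.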
Your project's .backlog/ directory is a state machine. Tickets transition between states via physical directory moves (mv .backlog/product_backlog/{ticket-dir} .backlog/active/{ticket-dir}), making state visible, traceable, and greppable — no database or external tools required.
.backlog/
├── product_backlog/ # New tickets
├── active/ # In-progress tickets
├── blocked/ # Blocked tickets
└── done/ # Completed tickets
Note: Moving tickets to blocked/ is done manually: mv .backlog/active/{ticket-dir} .backlog/blocked/{ticket-dir}
To unblock and resume work: mv .backlog/blocked/{ticket-dir} .backlog/active/{ticket-dir}
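For scripting the same moves, a small Python equivalent of the mv commands might look like the sketch below. The helper names `move_ticket` and `ticket_state` are hypothetical; only the four state directory names come from the layout above.

```python
import shutil
from pathlib import Path
from typing import Optional

STATES = ("product_backlog", "active", "blocked", "done")


def move_ticket(backlog: Path, ticket_dir: str, src: str, dst: str) -> Path:
    """Move a ticket directory between state directories and return its new path."""
    if src not in STATES or dst not in STATES:
        raise ValueError(f"unknown state: {src!r} -> {dst!r}")
    source = backlog / src / ticket_dir
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a ticket directory")
    target_parent = backlog / dst
    target_parent.mkdir(parents=True, exist_ok=True)
    target = target_parent / ticket_dir
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(source), str(target))
    return target


def ticket_state(backlog: Path, ticket_dir: str) -> Optional[str]:
    """A ticket's state is whichever state directory currently contains it."""
    for state in STATES:
        if (backlog / state / ticket_dir).is_dir():
            return state
    return None
```

For example, `move_ticket(Path(".backlog"), "001-add-search-feature", "active", "blocked")` mirrors the manual mv shown in the note.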
Each ticket is a directory where all work artifacts accumulate:
.backlog/active/001-add-search-feature/
├── ticket.md # The ticket (size, acceptance criteria, scope)
├── investigation.md # Research findings
├── plan.md # Implementation plan
├── eval-round-1.md # Acceptance evaluation (round 1)
└── quality-round-1.md # Quality review (round 1)
From creation to completion, every intermediate artifact is confined within its ticket directory. This directory-level confinement serves dual purposes: audit trail (the complete history of investigation, planning, evaluation, and review is preserved as files) and contamination prevention (artifacts from one ticket never leak into another's context). The directory structure itself enforces governance — information accumulates in the filesystem, not the context window.
Every ticket carries a phase-state.yaml file that declares its full lifecycle state (create_ticket → scout → impl → ship → done). /create-ticket creates it, each subsequent phase-owner skill updates only its own section, and /ship moves it alongside the ticket directory to .backlog/done/. phase-state.yaml is never deleted — it is the permanent lifecycle record for the ticket. The SessionStart hook and /catchup both read this file to recover active-ticket context in one step, without walking every artifact. The canonical schema, field enums, and per-skill write-ownership rules live in skills/create-ticket/references/phase-state-schema.md.
Scope note: phase-state.yaml tracks the manual workflow (/create-ticket → /scout → /impl → /ship). Cost accounting and orchestration state for /autopilot are tracked in autopilot-state.yaml, a separate schema. See skills/create-ticket/references/phase-state-schema.md §5 "Dual-state precedence" for how /catchup reconciles the two.
Phase-terminating skills (/create-ticket, /scout, /plan2doc, /impl, /ship) close their output with a standardized [SW-CHECKPOINT] block that lists the phase, ticket, artifacts, and recommended next command — a signal that running /clear is safe and that /catchup can recover from phase-state.yaml with minimal token spend.
.simple-wf-knowledge/ is an automatically maintained knowledge base that captures recurring patterns from evaluation logs. /tune analyzes completed ticket evaluations (eval-round, audit-round files) via the tune-analyzer agent, extracts actionable patterns (common failures, recurring feedback themes), and persists them as structured entries. At implementation time, /impl injects relevant knowledge base patterns into the Generator's dispatch prompt, so lessons learned from past tickets inform future implementation — closing the loop between evaluation feedback and code generation across sessions.
This feedback loop means simple-workflow is not a static tool — it is a learning system that improves with use. The more tickets you complete in a project, the more project-specific patterns accumulate, and the higher the probability that future implementations pass evaluation on the first round. In effect, the system develops project-specific expertise over time — analogous to a human developer becoming more effective the longer they work on a codebase — without fine-tuning the underlying model.
simple-workflow is composed of three types of components: Skills, Sub-agents, and Hooks.
Operations invoked as slash commands like /scout or /impl. There are two kinds:
- Orchestrators: Don't do work themselves; they coordinate sub-agents to drive a workflow (/impl, /ship, /create-ticket, /brief, /autopilot, etc.)
- Delegators: Hand off work to a specific sub-agent (/investigate, /plan2doc, /test, etc.)
Specialists launched by skills. Each runs in an isolated context with a tool permission scope appropriate to its role:
- Generator agents (implementer, test-writer) need broad Bash(*) access to run arbitrary build/test tools defined by the target project
- Evaluator agents (ac-evaluator, code-reviewer, security-scanner, ticket-evaluator) are restricted to read-only file utilities, with ac-evaluator additionally having read-only git access and specific test/lint runners
- Research/planning agents (researcher, planner) are restricted to read-only git and filesystem tools
This asymmetry is deliberate: the Generator-Evaluator separation relies on evaluators being unable to execute destructive commands even if prompted to do so.
| Role | Agent | Model |
|---|---|---|
| Research | researcher | Sonnet |
| Planning | planner | Opus / Sonnet |
| Implementation | implementer | Opus / Sonnet |
| Acceptance evaluation | ac-evaluator | Sonnet |
| Quality review | code-reviewer | Sonnet |
| Testing | test-writer | Sonnet |
| Ticket evaluation | ticket-evaluator | Sonnet |
| Security audit | security-scanner | Sonnet |
| Pattern analysis | tune-analyzer | Sonnet |
Models are auto-selected based on ticket size (S/M/L/XL). planner uses Sonnet for S and Opus for M/L/XL. implementer uses Sonnet for S/M and Opus for L/XL. Both agents accept a dynamic model parameter; orchestrator skills pass the appropriate model at invocation time.
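The selection rule reads as a small lookup. The sketch below encodes only the thresholds stated above; the function name and the lowercase model labels are illustrative assumptions.

```python
SIZES = ("S", "M", "L", "XL")

# Sizes at which each size-aware agent switches to Opus (from the text above).
OPUS_SIZES = {
    "planner": {"M", "L", "XL"},
    "implementer": {"L", "XL"},
}


def select_model(agent: str, size: str) -> str:
    """Return the model label for an agent given the ticket size."""
    if size not in SIZES:
        raise ValueError(f"unknown ticket size: {size!r}")
    # Agents without an entry always run on Sonnet, per the table above.
    return "opus" if size in OPUS_SIZES.get(agent, set()) else "sonnet"
```

Note that an M-size ticket is planned with Opus but implemented with Sonnet, because the two agents use different thresholds.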
Guardrails that fire automatically on tool execution to protect your project.
- pre-bash-safety: Best-effort blocking of common destructive commands (rm -rf, git push --force, git reset --hard, git clean -f, DROP TABLE/DATABASE, and bulk-staging of sensitive files). Does NOT catch arbitrary destructive commands from cloud / orchestration CLIs (gh repo delete, aws s3 rm, kubectl delete, terraform destroy, etc.) or shell-string indirection (sh -c '...', python -c '...'). Treat this hook as a guardrail for common slip-ups, not as a security boundary.
- pre-write-safety / pre-edit-safety: Blocks writes to sensitive files (.env, private keys, credentials)
- session-start: Initializes the session environment. It loads branch info and changed file count, auto-injects an initial commit on empty repositories, auto-appends .gitignore entries for plugin-managed directories, and cleans old session logs
- pre-compact-save: Auto-saves work state before context compaction as a YAML-frontmatter snapshot (active tickets, plans, latest evaluation rounds, in-progress phase). /catchup parses this to resume mid-loop work after compaction.
- session-stop-log: Records a work log (branch, last commit, status, recent commits) on session end as a YAML-frontmatter file. Used by /catchup as a fallback state source when no compact-state file exists.
- autopilot-continue: Stop hook that prevents premature end_turn during /autopilot pipeline execution. Returns decision: "block" when autopilot-state.yaml has unfinished steps, keeping the pipeline running until all steps complete.
- pre-level1-guard: PreToolUse hook that blocks integration test scripts from running without RUN_LEVEL1_TESTS=true, preventing accidental API charges from expensive test suites.
- Claude Code CLI or GitHub Copilot CLI installed and authenticated
- GitHub CLI (gh), required for /ship
- git and jq
# Install the plugin
claude plugin install aimsise/simple-workflow # Claude Code
copilot plugin install aimsise/simple-workflow # GitHub Copilot CLI
Once installed, all slash commands work the same on both platforms:
/investigate <topic>
/create-ticket <description>
/scout .backlog/product_backlog/001-migrate-to-session-auth/ticket.md
/impl
/audit
/ship
Note: Session lifecycle hooks (pre-compact-save, session-stop-log) may not fire on Copilot CLI. Context recovery via /catchup after compaction works best on Claude Code.
A typical development flow follows these five steps:
investigate ──> create-ticket ──> scout ──> impl ──> ship
/investigate how is user authentication currently implemented
The researcher agent explores the codebase and writes its findings to .docs/research/ or the ticket directory. Only a summary is returned to the caller.
/create-ticket migrate from JWT auth to session-based auth
Creates a structured ticket through four phases:
- Investigation: The researcher examines scope and impact
- Socratic dialogue: Asks clarifying questions to align understanding with the user
- Ticket drafting: The planner creates a ticket with size, acceptance criteria, and scope
- Quality evaluation: The ticket-evaluator checks five quality gates (testability, unambiguity, completeness, implementability, size fit) — if any gate fails, the ticket is revised and re-evaluated
The resulting ticket is saved to .backlog/product_backlog/{ticket-dir}/ticket.md (where {ticket-dir} is {NNN}-{slug}, e.g., 001-migrate-to-session-auth).
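The {NNN}-{slug} naming can be illustrated with a short Python sketch. The slug rule (lowercase, runs of non-alphanumerics collapsed to hyphens) and the numbering rule (one more than the highest existing number across all state directories) are assumptions for illustration; the README does not specify how the plugin derives either.

```python
import re
from pathlib import Path


def ticket_dir_name(number: int, title: str) -> str:
    """Build a {NNN}-{slug} directory name from a number and a short title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title produces an empty slug")
    return f"{number:03d}-{slug}"


def next_ticket_number(backlog: Path) -> int:
    """Assumed rule: one more than the highest number found in any state directory."""
    highest = 0
    for path in backlog.glob("*/*"):
        match = re.match(r"(\d+)-", path.name)
        if path.is_dir() and match:
            highest = max(highest, int(match.group(1)))
    return highest + 1
```

With these helpers, `ticket_dir_name(1, "Migrate to session auth")` yields the example name used above.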
/scout .backlog/product_backlog/001-migrate-to-session-auth/ticket.md
Chains /investigate and /plan2doc in sequence. Moves the ticket to .backlog/active/, then runs research and creates an implementation plan in one go. /plan2doc selects model based on ticket size (sonnet for S, opus for M/L/XL).
At this point, .backlog/active/{ticket-dir}/ contains ticket.md, investigation.md, and plan.md — everything needed for implementation.
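A quick way to confirm a ticket is ready for implementation is to check that these three files exist. The helper below is a hypothetical sketch, not part of the plugin.

```python
from pathlib import Path
from typing import List, Sequence

# Files the text above says should exist after /scout.
REQUIRED_BEFORE_IMPL = ("ticket.md", "investigation.md", "plan.md")


def missing_artifacts(
    ticket_dir: Path, required: Sequence[str] = REQUIRED_BEFORE_IMPL
) -> List[str]:
    """Return the required file names that are absent from the ticket directory."""
    return [name for name in required if not (ticket_dir / name).is_file()]
```

An empty list means every expected artifact is present; anything else names what still has to be produced.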
/impl
Implements code through a three-phase pipeline:
Phase 1: Preparation
- Loads the active plan and detects ticket size
- For M-size and above: AC sanity check (the Generator flags ambiguous AC up front)
- For L/XL only: blocking Evaluator dry run (agreement on the verification plan before any code is written; on failure, the user is asked whether to proceed)
- Saves current state with git stash for safety
Phase 2: Implementation loop (up to 3 rounds)
- Generator (implementer) writes code using a test-first approach. Model is auto-selected: sonnet for S/M, opus for L/XL.
- AC Evaluator independently verifies acceptance criteria compliance — on failure, sends specific feedback back to the Generator
- /audit runs after AC passes: a multi-agent review that always invokes security-scanner and runs code-reviewer in parallel, returning an aggregated Status / Critical / Warnings / Suggestions block
Each round's evaluation results are saved as eval-round-{n}.md / quality-round-{n}.md / security-scan-{n}.md in the ticket directory.
Phase 3: Completion report
- Outputs a summary of all evaluation rounds
Note: /impl requires interactive mode for specific failure recovery paths (Evaluator Dry Run failure for L/XL tickets, /audit infrastructure failure). In claude -p or CI automation, these paths will stop the skill with an explanatory message rather than hang. For fully autonomous pipelines, avoid relying on L/XL ticket sizes or pre-validate your audit infrastructure.
Similarly, /create-ticket requires interactive mode for two paths: (1) Phase 2 Socratic Refinement is skipped in non-interactive mode (the ticket is generated from researcher findings alone without Q&A refinement, and the summary notes "Phase 2 skipped"); (2) Phase 4 quality FAIL escalation stops the skill with the ticket saved on disk for manual editing. Non-interactive mode will not silently bypass unresolved quality gates.
/ship # Commit + PR (default)
/ship merge=true # Commit + PR + squash-merge
Ships the current changes through up to three phases:
- Commit: Stages changes and creates a Conventional Commits-formatted commit
- Create PR: Pushes to GitHub and creates a pull request
- Merge (optional, merge=true): Squash-merges, deletes the branch, and syncs local
If no prior review via /audit is detected, a review gate recommends running one first. Pre-computed context (branch name, diff stats, commit log) is gathered with a resilience contract that ensures /ship never fails due to unexpected git state — missing remotes, empty diffs, or detached HEAD are all handled gracefully. After a successful commit, the ticket is automatically moved to .backlog/done/, and /tune is invoked to extract reusable patterns from the ticket's evaluation logs into the project knowledge base.
For a fully automated pipeline from idea to PR:
- /brief <what-to-build>: Investigates the codebase and conducts a structured interview to gather all requirements. Generates a brief document and an autopilot-policy.yaml defining autonomous decision rules.
- /autopilot <slug>: Reads the brief and executes the full pipeline (create-ticket → scout → impl → ship) with zero human intervention. Decision points are resolved by the autopilot-policy. Large scopes are automatically split into multiple tickets and executed in dependency order. Quality safeguards run at each pipeline step: an Artifact Presence Gate validates that all expected artifacts (investigation, plan, evaluation logs, etc.) exist before marking a step complete, and a Skill Invocation Audit tracks whether each step used proper Skill tool dispatch. Steps that fell back to manual bash invocation are flagged as completed-with-warnings.
Note: Workflow isolation is bidirectional. /autopilot requires a brief as its starting point: it creates tickets internally and processes only those tickets. It does not pick up existing tickets from product_backlog/. Conversely, manual /impl excludes autopilot-managed tickets (those containing autopilot-policy.yaml) and selects the lowest-numbered non-autopilot ticket first (FIFO). To process tickets created manually via /create-ticket, use the individual skill flow: /scout → /impl → /ship.
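The manual-side selection rule (skip autopilot-managed tickets, take the lowest number first) can be sketched as below. The function name is hypothetical, and which state directory /impl actually scans is an assumption here; the caller passes it in.

```python
import re
from pathlib import Path
from typing import Optional


def pick_manual_ticket(state_dir: Path) -> Optional[Path]:
    """Return the lowest-numbered ticket directory that is not autopilot-managed."""
    if not state_dir.is_dir():
        return None
    candidates = []
    for path in state_dir.iterdir():
        match = re.match(r"(\d+)-", path.name)
        if not path.is_dir() or match is None:
            continue
        # Autopilot-managed tickets are identified by this marker file.
        if (path / "autopilot-policy.yaml").exists():
            continue
        candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return min(candidates, key=lambda item: item[0])[1]
```

Sorting on the parsed integer, not the directory name, keeps the order correct even if ticket numbers ever exceed three digits.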
The autopilot-policy evolves over time: /tune extracts decision patterns from execution logs, and future /brief runs use these patterns to suggest more accurate defaults.
| Phase | Skill | Description |
|---|---|---|
| Discovery | /investigate | Deep-dive codebase exploration |
| Discovery | /catchup | Recover context from compact-state / session-log, detect current phase, and recommend next action (including resuming an in-progress /impl loop) |
| Discovery | /brief | Structured interview to generate brief + autopilot-policy. Investigates codebase and conducts Q&A to gather requirements |
| Planning | /scout | Chain investigation + planning in one step |
| Planning | /plan2doc | Create a detailed implementation plan (auto-selects model by ticket size) |
| Tickets | /create-ticket | Create a structured ticket with quality evaluation |
| Implementation | /impl | Implement via Generator-Evaluator pipeline |
| Implementation | /refactor | Safe refactoring with backup branch |
| Testing | /test | Design and run tests |
| Quality | /audit | Multi-agent code quality + security audit (use only_security_scan=true for security-only) |
| Quality | /tune | Analyze evaluation logs and maintain project knowledge base |
| Delivery | /ship | Commit + PR in one step (optionally merge) |
| Full Pipeline | /autopilot | Execute the full pipeline (create-ticket → scout → impl → ship) from a brief document with zero human intervention. Auto-splits large scopes |
Model selection is automatic based on ticket size: smaller tickets use Sonnet for speed and larger ones use Opus for depth. The threshold differs by agent: planner switches to Opus at M-size, implementer at L-size. This selection is driven by the orchestrator skills (/impl, /plan2doc), which pass the appropriate model to agents at invocation time.
Hook scripts are registered in hooks/hooks.json. To customize, edit the JSON file or override individual scripts while keeping the same interface (read stdin, exit 0 to allow / exit 2 to block).
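A replacement script that honors this interface could look like the Python sketch below. Several things here are assumptions and should be checked against your CLI version: the stdin payload is taken to be a JSON object with the shell command under tool_input.command, the deny-list is a hypothetical subset and not the plugin's real rule set, and failing open on unreadable input is an arbitrary choice.

```python
import json
import re
import sys

ALLOW, BLOCK = 0, 2  # exit codes from the interface described above

# Hypothetical deny-list covering a few of the commands named earlier.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\s+.*--force\b",
    r"\bgit\s+reset\s+--hard\b",
]


def decide(payload: dict) -> int:
    """Return BLOCK if the command matches a deny-list pattern, else ALLOW."""
    tool_input = payload.get("tool_input") or {}
    command = tool_input.get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return BLOCK
    return ALLOW


def main(stream=sys.stdin) -> int:
    """Read one JSON payload from the stream and return the exit code to use."""
    try:
        payload = json.load(stream)
    except json.JSONDecodeError:
        return ALLOW  # assumption: fail open when the input is unreadable
    code = decide(payload)
    if code == BLOCK:
        print("blocked by safety hook", file=sys.stderr)
    return code

# When saved as a hook script, finish with: sys.exit(main())
```

Keeping the decision in a pure function (`decide`) makes the rule set testable without spawning the CLI. Like the bundled hook, a pattern list of this kind is a guardrail for common slip-ups, not a security boundary.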
- Designed for use with Claude Code CLI and GitHub Copilot CLI. IDE extensions (VS Code, JetBrains) may have limited support for hooks and plugin features.
- The /ship skill requires GitHub CLI (gh) with authentication. Other Git hosting services are not supported.
- Ticket management uses the local filesystem (.backlog/). There is no sync with external issue trackers (Jira, Linear, etc.).
- Sub-agents consume API tokens independently. Large tickets (L/XL) using Opus may result in higher API costs.
- AC Evaluator ships with built-in test/lint runners for JS, Python, Rust, Go, JVM (Gradle/Maven/sbt), .NET, Ruby, Elixir, Swift, Flutter/Dart, PHP, and Make. For other ecosystems, wrap your test/lint commands in a Makefile (make test / make lint) or the evaluator will rely on static code analysis only (reported as PASS-WITH-CAVEATS).
simple-workflow is heavily inspired by:
- Harness design for long-running agents — Anthropic's guide on designing harnesses for reliable, long-running AI agents
- obra/superpowers — Patterns for maximizing Claude Code's capabilities through skills, agents, and hooks
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.