Agent Flow

Durable, deterministic LLM agent workflows — described in plain English, run inside Claude Code, with zero runtime dependencies.

/agentflow:create-workflow · /agentflow:run-workflow · Docs · Cookbook

The problem

You kick off a long, multi-step agent job in Claude Code — review 200 files, draft every chapter of an outline, drain a work queue from three terminals. Then context compaction hits, or you close the session, and the in-flight sub-agents take their state with them. There's no checkpoint to resume from. Hours of work, gone — and stitching sub-agents together by hand is fragile long before that happens.

The fix

Agent Flow turns a plain-English goal into a durable, resumable workflow. Each step is an LLM doing the work, but the control flow is deterministic and every run is a state.json on disk — so a Stop hook auto-continues it across turns, and a long job survives parallelism, interruptions, and even compaction. Close the laptop at step 47 of 100; reopen and it resumes at 48.

You don't type the flags. You describe the goal; the skill writes the exact commands. The /agentflow:… lines in this README show what runs under the hood, for transparency.

Familiar by design. If you've written a Claude Code skill, you already know the shape: a WORKFLOW.md is the same idea as a SKILL.md — frontmatter plus one heading per step — run right in your session. No server, no Python, no extra services — it runs on the same Node that powers Claude Code; Agent Flow just adds the durable state and deterministic control flow that turn those steps into a resumable pipeline.

You ▸ /agentflow:create-workflow "review every .cs file in src → group findings by area → one digest"

Agent Flow ▸ designed `code-review` (discover → review·foreach → group → digest·reduce). Run it?

You ▸ /agentflow:run-workflow workflows/code-review/WORKFLOW.md

Agent Flow ▸ stage 2/4 · review  ███████░░░  142/200 items · 3 agents · $0.38
             — context compaction —
             ▸ resumed from disk at item 143/200 …
             ✓ done · digest written to ./code-review-digest.md

Quickstart

Install (inside Claude Code):

/plugin marketplace add AleSaiani/agentflow
/plugin install agentflow@agentflow

Describe → build → run. You say what you want; Agent Flow writes a self-contained, movable folder — a human-readable WORKFLOW.md (frontmatter + one heading per stage, like a SKILL.md) plus any helper script, referenced relatively so the folder runs anywhere:

/agentflow:create-workflow "review every .cs file in src → group findings by area → one digest"
/agentflow:run-workflow   workflows/code-review/WORKFLOW.md     # --dry-run to preview · --param k=v for inputs

workflows/code-review/
  WORKFLOW.md       ← the steps, in readable markdown (read it, diff it, hand-edit it)
  discover.mjs      ← a helper script (called via {{workflow.dir}}, so the folder is portable)

See the engine work in ~10 seconds — no LLM at all. From a clone, drive a bundled all-deterministic demo straight to done:

node dist/state/pipe.js init demo --workflow examples/workflows/demo.json
node dist/state/pipe.js drive demo          # → {"action":"done","steps_taken":5}
node dist/inspect.js board                  # the dashboard of all runs

That's the whole machine: a pipeline of stages, driven to completion, fully inspectable. The only thing the LLM adds is the work inside the stages that need judgment. → Full walkthrough in Getting started.

How it works

plain-English goal  →  WORKFLOW.md (deterministic spec)  →  state.json on disk  →  Stop hook resumes across turns

Runs are state on disk. Every invocation is a state.json under .agentflow/<cmd>/<run-id>/. Claude is the only writer (atomic temp-file + rename), so an interrupted write never corrupts a run — and resuming needs only the file, not the chat history.
The Stop hook is the continuity engine. When a turn ends, a hook scans for in-flight runs and, if there's residual work under the cap, resumes them — purely from disk. That's what carries a loop across turns and across compaction, with no daemon. A second hook snapshots the chat transcript to .agentflow/chat/ before compaction.
The determinism boundary. The LLM produces structured data; branching is always code over that data, never a judgment on free text. A fuzzy decision ("did the review find a security issue?") becomes a step whose structured output (a bool/enum) a deterministic predicate reads.
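
To make the temp-file + rename idea above concrete, here is a minimal TypeScript sketch using only Node builtins. The `RunState` shape, the `writeStateAtomically` and `readState` helpers, and the temp-file name are assumptions made for illustration; the real `state.json` schema and Agent Flow's own writer are not shown in this README.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical state shape, for illustration only.
interface RunState {
  runId: string;
  nextItem: number;
  totalItems: number;
}

// Write the full new state to a temp file in the same directory,
// then rename it over the target so readers never see a partial file.
function writeStateAtomically(statePath: string, state: RunState): void {
  const tmpPath = join(dirname(statePath), `.state.${process.pid}.tmp`);
  writeFileSync(tmpPath, JSON.stringify(state, null, 2));
  renameSync(tmpPath, statePath);
}

// Resuming needs only the file on disk.
function readState(statePath: string): RunState {
  return JSON.parse(readFileSync(statePath, "utf8")) as RunState;
}

// Demo in a throwaway directory.
const runDir = mkdtempSync(join(tmpdir(), "agentflow-sketch-"));
const statePath = join(runDir, "state.json");
writeStateAtomically(statePath, { runId: "demo", nextItem: 48, totalItems: 100 });
const resumed = readState(statePath);
console.log(`resume at item ${resumed.nextItem} of ${resumed.totalItems}`);
```

If the process dies before the rename, the previous `state.json` is still intact; if it dies after, the new one is complete. Either way a resume reads a whole file.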

→ The full mental model is in Concepts.

What you get

Durable & resumable — runs survive parallelism, interruption, and context compaction; nothing has to be redone.
A programmatic vocabulary — enumerate (unfold) · foreach (map) · reduce (fold) · group (partition) · repeat/until/while (loop) · step (one unit) — composed by pipe.
Workflows as readable markdown — author WORKFLOW.md, parameterize with params + --param, branch with conditional fork routing, validate a step's structured output against a schema, and nest sub-workflows.
Any runtime for a step — run a prompt inline, in a subagent, or sessionlessly via claude -p / codex exec — the basis for cross-model (adversarial/cooperative) conversations.
Scale safely — split a list across terminals with --shard, or have many workers drain one lock-free queue (atomic-rename claim, no double-processing); coordinate instances via a mailbox.
Stay in control — cost caps (--max-usd) pause a run; a --stop-file pauses on demand; pipe progress shows where you are; board/history/inspect show what's in flight and what it cost.
Reuse results without re-running — inspect results <run> dumps a finished run's outputs (--json / --checklist) so you can transform an expensive run deterministically — e.g. turn 2,000 review results into a CHECKLIST.md without one re-review.
Zero runtime dependencies — TypeScript compiled to a committed dist/; Node builtins only; Apache-2.0.
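
The atomic-rename claim mentioned under "Scale safely" can be sketched roughly as follows. This is an illustration of the general technique, not Agent Flow's actual queue code: the `pending/` and `claimed/<worker>/` directory layout and the `claimNext` helper are hypothetical names chosen for the example.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Try to claim one pending item by renaming it into this worker's directory.
// A rename either succeeds for exactly one caller or fails with ENOENT,
// so two workers cannot both end up owning the same item.
function claimNext(queueDir: string, workerId: string): string | null {
  const pendingDir = join(queueDir, "pending");
  const mineDir = join(queueDir, "claimed", workerId);
  mkdirSync(mineDir, { recursive: true });
  for (const name of readdirSync(pendingDir).sort()) {
    try {
      renameSync(join(pendingDir, name), join(mineDir, name));
      return name;
    } catch (err) {
      // Another worker claimed it first: move on to the next item.
      if ((err as NodeJS.ErrnoException).code === "ENOENT") continue;
      throw err;
    }
  }
  return null; // queue drained
}

// Demo: two workers take turns draining three items.
const queueDir = mkdtempSync(join(tmpdir(), "queue-sketch-"));
mkdirSync(join(queueDir, "pending"));
for (const name of ["a.json", "b.json", "c.json"]) {
  writeFileSync(join(queueDir, "pending", name), "{}");
}
const claims: string[] = [];
for (let turn = 0; ; turn++) {
  const item = claimNext(queueDir, `worker-${turn % 2}`);
  if (item === null) break;
  claims.push(item);
}
console.log(claims);
```

No lock file is involved: the filesystem's rename is the only coordination point, which is why the workers can live in separate terminals.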

Why Agent Flow

It earns its keep when work fans out (many items), loops (until a condition holds), or runs long enough to risk interruption. If your task is a single prompt, you don't need it — just ask Claude.

| Instead of… | Agent Flow gives you |
| --- | --- |
| Plain Claude Code sub-agents | The same agents, but with durable state: they resume across turns and survive compaction, composed with a real vocabulary instead of ad-hoc prompting |
| LangGraph / LangChain | No Python service, no runtime deps, no separate process; workflows live in your Claude Code session as readable, diffable markdown |
| Airflow / Prefect / Temporal | No server, no scheduler, no infra; durability is a `state.json` on disk, not a database |

Examples

Described in plain language (left) → what Agent Flow runs (right). Full versions in the Cookbook.

| You say | It runs |
| --- | --- |
| "Audit `src` for bugs and give me a report." | `/agentflow:run-workflow workflows/audit/WORKFLOW.md --param target=src --param glob="**/*.cs"` |
| "Draft every chapter of this outline." | `enumerate` outline → chapters · `foreach` draft each · `reduce` stitch |
| "Work through every task file in `tasks/`." | `/agentflow:foreach --folder tasks --prompt "Do the task in this file"` (file kanban) |
| "Keep fixing the build until it passes." | `/agentflow:until --stage "npm run build" --stop "npm run build"` |
| "Chew through this list from 3 terminals." | `/agentflow:queue --items work.json --prompt "…"` in each terminal |
| "Have a second model critique the draft, loop until solid." | two `/agentflow:step` (different `--model`) inside `/agentflow:until` |
| "Review the diff; only deploy if it's safe." | a `step` emits `{blocking}` → `fork` routes to `ship` or `fix` |
| "Audit `src`, but stop if it passes $5." | `/agentflow:run-workflow workflows/audit/WORKFLOW.md --param target=src --max-usd 5` (pauses at the cap) |
| "Turn that finished review into a to-do list; don't re-run it." | `inspect results <run> --checklist > CHECKLIST.md` → `/agentflow:checklist` |
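
The "only deploy if it's safe" row is the determinism boundary in miniature: the model emits structured data, and plain code decides the route. Below is a hedged TypeScript sketch of that split. The `ReviewResult` shape (including the `findings` field), `parseReview`, and `route` are hypothetical names for illustration; only the `blocking` field and the `ship` / `fix` targets come from the table above.

```typescript
// Hypothetical structured output of a review step.
interface ReviewResult {
  blocking: boolean;
  findings: string[];
}

type Route = "ship" | "fix";

// Validate the step's raw output before any branching happens.
// Free text that does not match the expected shape is rejected, not interpreted.
function parseReview(raw: string): ReviewResult {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("review output must be a JSON object");
  }
  const { blocking, findings } = data as Record<string, unknown>;
  if (typeof blocking !== "boolean") {
    throw new Error("`blocking` must be a boolean");
  }
  if (!Array.isArray(findings) || !findings.every((f: unknown) => typeof f === "string")) {
    throw new Error("`findings` must be an array of strings");
  }
  return { blocking, findings: findings as string[] };
}

// The branch is a pure function of the validated data.
function route(review: ReviewResult): Route {
  return review.blocking ? "fix" : "ship";
}

const decision = route(
  parseReview('{"blocking": true, "findings": ["SQL injection in login handler"]}'),
);
console.log(decision); // "fix"
```

The same input always produces the same route, so a resumed run takes the branch it would have taken before the interruption.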

Commands

You only type a handful — describe what you want and Agent Flow wires the rest. The low-level primitives are building blocks the engine composes for you, kept out of the / menu so it stays focused.

Front door — what you actually type (the /agentflow: menu):

| Command | What it does |
| --- | --- |
| `do` | Describe a one-off deterministic operation → designs + runs an inline pipeline, then offers to promote it to a saved workflow |
| `how` | Help desk: "how do I … with Agent Flow?" → the right command + a copy-paste recipe (read-only) |
| `create-workflow` | Author a reusable `WORKFLOW.md` (validates + previews) |
| `run-workflow` | Run a workflow end to end (`--dry-run`, `--param k=v`); echoes a progress block each turn |
| `checklist` | Run a repeatable `- [ ]` to-do list: tick items, write back, resume only the open ones |
| `workflows` · `board` · `inspect` | Observe: workflow catalog · live dashboard · a run's detail/tree/budget, plus `inspect results <run>` (reuse a finished run) and `pipe progress <run>` (where you are) |
| `runs` | Control: list jobs in scheduling order, pause/resume one job or the whole engine, set priority, delete or clean up finished runs |

Building blocks — composed by the engine, hidden from the / menu (user-invocable: false) but still scriptable via their CLIs: the FP vocabulary enumerate · foreach · reduce · group · repeat/until/while · step · pipe, plus queue (lock-free shared queue), mailbox (directed cross-instance messages), notify (webhook/desktop), history. Sources are pluggable everywhere — a folder, a markdown checklist (- [ ] task {model:opus}), a JSON list, or another run's output.

Shipped workflows — ready-to-run recipes under workflows/: pr-review (adaptive, diff-scoped code review), remediate (fix every finding → loop until green), knowledge-build (repo → structured md docs, bootstrap/update), audit (whole-tree review → digest), gdpr-domain/gdpr-repo, security-domain/security-repo/security-pack, email-auth (SPF/DMARC/DKIM posture), release-gate. List them with /agentflow:workflows (local and shipped, with stage count + params); run one with /agentflow:run-workflow <name>.

Controlling & reusing a run

Once a run is going (or finished) you stay in command — all on durable state, so nothing is lost or redone:

Choose how it executes — --concurrency N / --chunk-size M (parallelism and agent count are separate knobs), --serial, or --execution main-thread for small lists.
Steer it live — create the --stop-file to pause at the next chunk (delete to resume); --max-usd pauses at a cost cap; Esc interrupts and the state is on disk. pipe progress <run> shows stage %, the current phase + item countdown, cumulative agents/$, and what's next.
Run many, in order — when several jobs are in flight, runs shows the queue (priority, then oldest-first) and lets you steer it: runs pause/resume is the global stop button, runs stop <id> pauses just one job, runs priority <id> N bumps it up. runs clean GCs finished runs so the workspace stays tidy. (Pausing is non-destructive — state is kept and the run picks up where it left off.)
Reuse a finished run without re-running — every result is persisted; inspect results <run> --json (or --checklist) operates on the saved items deterministically, so nothing is dropped.
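
The "priority, then oldest-first" ordering described above amounts to a two-key sort. The sketch below shows one way to express it; the `QueuedRun` fields and the convention that a larger `priority` number runs first are assumptions for illustration, since this README does not specify how `runs priority <id> N` interprets `N`.

```typescript
// Hypothetical view of an in-flight run; field names are illustrative.
interface QueuedRun {
  id: string;
  priority: number; // assumption: larger number = scheduled earlier
  startedAt: number; // creation time as epoch milliseconds
}

// Sort by priority (descending), then by age (oldest first).
// The input array is copied so the caller's list is left untouched.
function schedulingOrder(runs: readonly QueuedRun[]): QueuedRun[] {
  return [...runs].sort(
    (a, b) => b.priority - a.priority || a.startedAt - b.startedAt,
  );
}

const ordered = schedulingOrder([
  { id: "audit", priority: 0, startedAt: 200 },
  { id: "pr-review", priority: 5, startedAt: 300 },
  { id: "remediate", priority: 0, startedAt: 100 },
]).map((run) => run.id);
console.log(ordered); // ["pr-review", "remediate", "audit"]
```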

→ Full recipes in the Cookbook.

Documentation

| Page | What it covers |
| --- | --- |
| Getting started | Install · see the engine work in ~10s (no LLM) · your first real run |
| Concepts | The mental model: runs as state on disk, the Stop hook, the determinism boundary, the workflow layer |
| Cookbook | Real scenarios: one command → full workflow → operating at scale → composing operators |
| Reference | Every command, flag, and CLI subcommand · the WORKFLOW.md schema · sources & views |
| Production | Running it for real: resilience, cost caps, approval gates, secrets/env, concurrency, audit |
| Beta testing | A graded protocol to verify the promises live (cross-turn resume, compaction) |

New here? Read Getting started, skim Concepts, then keep the Cookbook open and copy from it.

Install & requirements

# inside Claude Code
/plugin marketplace add AleSaiani/agentflow
/plugin install agentflow@agentflow

Local development (point Claude Code at a clone):

git clone https://github.com/AleSaiani/agentflow && cd agentflow
npm install && npm run build
claude --plugin-dir .

Requires Node ≥ 22 and bash (shell stages run under bash for POSIX semantics on every OS). On Windows that means Git Bash: Agent Flow auto-prefers it and skips the WSL/Store bash.exe (System32); set $AGENTFLOW_BASH to point at a specific bash if needed.

Contributing

Issues and PRs are welcome — see CONTRIBUTING.md for dev setup (npm install && npm run build && npm test) and the PR workflow. Security reports go through the security policy, not public issues. Commits follow Conventional Commits.

Status

1.0.0-beta.2. Primitives, the workflow layer (WORKFLOW.md, params, fork, schema validation, stage retry/timeout, approval gate, on-failure), queue/mailbox/step, inspect/board, and the hooks (cross-turn resume + preserve-chat) are implemented and covered by an automated suite (94 tests, run in CI) including a simulated cross-turn loop. The recommended final check before production is a live multi-turn Claude Code session smoke test. Roadmap and deferred items live in CHANGELOG.md.
