Design by day, execute by night.
A methodology-agnostic agent orchestration platform that automates multi-stage software development workflows, enforces deterministic validation gates around non-deterministic AI agent output, and provides full observability and traceability via LangGraph.
Uses the BMAD Method as its reference implementation — but any team can encode their own development methodology as executable workflows.
```mermaid
flowchart LR
    Plan["🌞 Plan (Day)"] --> Dispatch["🚀 Dispatch"]
    Dispatch --> Execute["🌙 Execute (Night)"]
    Execute --> Validate["✅ Validate"]
    Validate -->|Pass| Merge["🔀 PR Ready"]
    Validate -->|Fail| Retry["🔄 Retry"]
    Retry -->|Budget left| Execute
    Retry -->|Exhausted| Halt["🛑 Halt & Report"]
```
- Why Arcwright AI
- How It Works
- Key Features
- Architecture Overview
- Getting Started
- CLI Reference
- Configuration
- Python API
- Project Status
- Contributing
- License
AI coding agents are capable. The BMAD Method solves context management — a structured methodology that produces comprehensive planning artifacts. What's missing is autonomous execution at velocity.
Today, developers manually shepherd AI agents through workflows one conversation at a time — sequential, unvalidated, and unobservable. The ceiling isn't agent intelligence — it's human throughput as the orchestration layer.
Arcwright AI wraps a deterministic shell around non-deterministic agents, enabling you to:
- Plan collaboratively during the day (brainstorming, PRDs, architecture, stories)
- Dispatch automated execution overnight across multiple epics and stories
- Wake up to completed, validated, traceable work
The three-piece puzzle:
| Piece | Role |
|---|---|
| AI Agents (Claude Code) | Capability — execute individual tasks |
| BMAD Method | Context — structured planning artifacts that give agents everything they need |
| Arcwright AI | Velocity — autonomous orchestration that converts plans into working code |
Arcwright AI provides a LangGraph-based orchestration engine with four internal subsystems behind one CLI entry point:
- Orchestration Engine — LangGraph StateGraph for workflow DAG execution with deterministic state transitions
- Validation Framework — artifact-specific validation patterns with retry budgets (V3 reflexion + V6 invariant checks)
- Process Runtime — Claude Code SDK for stateless agent invocation (one fresh session per command)
- SCM Integration — git worktree isolation for safe, parallel agent execution
```mermaid
flowchart TD
    CLI["arcwright-ai dispatch --epic EPIC-3"] --> Scope["Scope Selector"]
    Scope --> DAG["Dependency Resolution"]
    DAG --> Story1["Story 3.1"]
    DAG --> Story2["Story 3.2"]
    DAG --> Story3["Story 3.3"]
    Story1 --> Invoke1["Claude Code SDK"]
    Invoke1 --> V3_1["V3 Reflexion"]
    V3_1 --> V6_1["V6 Invariants"]
    V6_1 -->|Pass| PR1["PR + Provenance"]
    V6_1 -->|Fail| Retry1["Retry (up to 5x)"]
    Story2 --> Invoke2["Claude Code SDK"]
    Story3 --> Invoke3["Claude Code SDK"]
```
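For a feel of how such a retry-gated pipeline can be wired, here is a minimal LangGraph `StateGraph` sketch. The node names echo the flow above and the trace graph in the observability section; the state fields, node bodies, and routing logic are simplified assumptions, not Arcwright AI's actual implementation.

```python
# Minimal sketch of a retry-gated story pipeline wired with LangGraph.
# Node names mirror the dispatch flow; bodies are placeholders.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class StoryState(TypedDict):
    story_id: str
    attempts: int
    max_retries: int          # e.g. 5, matching the retry cap above
    validation_passed: bool


def agent_dispatch(state: StoryState) -> dict:
    # Placeholder for a stateless Claude Code SDK invocation.
    return {"attempts": state["attempts"] + 1}


def validate(state: StoryState) -> dict:
    # Placeholder for the V3 reflexion + V6 invariant pipeline.
    return {"validation_passed": True}


def route_after_validate(state: StoryState) -> str:
    if state["validation_passed"]:
        return "commit"
    if state["attempts"] < state["max_retries"]:
        return "agent_dispatch"   # retry: budget left
    return "halt"                 # budget exhausted: halt & report


builder = StateGraph(StoryState)
builder.add_node("agent_dispatch", agent_dispatch)
builder.add_node("validate", validate)
builder.add_node("commit", lambda s: {})   # PR + provenance
builder.add_node("halt", lambda s: {})     # halt summary
builder.add_edge(START, "agent_dispatch")
builder.add_edge("agent_dispatch", "validate")
builder.add_conditional_edges("validate", route_after_validate)
builder.add_edge("commit", END)
builder.add_edge("halt", END)

graph = builder.compile()
result = graph.invoke(
    {"story_id": "3.1", "attempts": 0, "max_retries": 5, "validation_passed": False}
)
```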
Every execution produces a complete reasoning trail — what was decided, what was rejected, and why. Code review of AI-generated PRs becomes decision-centric ("Do I agree with the choices?") instead of line-by-line reading.
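The exact provenance format is internal to the run output (the `provenance/` directory of a run). Purely as an illustration, a decision record could carry fields like these; all names here are hypothetical:

```python
# Hypothetical shape of a single decision-provenance record.
# The real on-disk format in provenance/ may differ.
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    story_id: str
    decision: str                                        # what was decided
    alternatives_rejected: list[str] = field(default_factory=list)
    rationale: str = ""                                  # why


record = DecisionRecord(
    story_id="3.1",
    decision="Reuse the existing repository pattern for persistence",
    alternatives_rejected=["Raw SQL in handlers", "New ORM layer"],
    rationale="Matches the architecture doc and keeps the diff minimal",
)
```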
The system halts an epic on unrecoverable failure — no silent breakage, no partial work masquerading as complete. The halt summary reports what succeeded, what failed, why, and exactly where to resume.
Unlike black-box autonomous agents, every decision is logged, every output is validated, every workflow step is observable. You choose exactly what work to dispatch — down to individual stories.
Granular, user-controlled scope selection:
Epic selectors accept all of the following equivalent forms: `3`, `epic-3`, and `EPIC-3`.
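A sketch of what that normalization implies (illustrative only; the shipped CLI parser may differ):

```python
# Illustrative normalizer for the three equivalent epic selector forms.
import re


def normalize_epic_selector(raw: str) -> str:
    match = re.fullmatch(r"(?:epic-)?(\d+)", raw.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a valid epic selector: {raw!r}")
    return f"EPIC-{match.group(1)}"


assert normalize_epic_selector("3") == "EPIC-3"
assert normalize_epic_selector("epic-3") == "EPIC-3"
assert normalize_epic_selector("EPIC-3") == "EPIC-3"
```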
```bash
# Dispatch an entire epic
python -m arcwright_ai dispatch --epic 3

# Equivalent epic selector formats
python -m arcwright_ai dispatch --epic epic-3
python -m arcwright_ai dispatch --epic EPIC-3

# Dispatch a single story
python -m arcwright_ai dispatch --story 3.1

# Resume a halted epic from the failure point
python -m arcwright_ai dispatch --epic 3 --resume
```

Six validation patterns ordered by cost, with artifact-specific pipelines. V6 (deterministic) and V3 (reflexion) are implemented; the rest are planned.
| Pattern | Status | Description | Use Case |
|---|---|---|---|
| V1 | Planned | BMAD native validators | Cross-doc validation workflows |
| V2 | Planned | LLM-as-Judge | Independent model scoring |
| V3 | Implemented | Reflexion | Agent self-critique + revise loop |
| V4 | Planned | Cross-document consistency | Artifact agreement checks |
| V5 | Planned | Multi-perspective ensemble | Parallel persona review |
| V6 | Implemented | Invariant checks | Static rule-based assertions |
See `docs/validation-pipeline.md` for the full technical reference — V6 check details, V3 reflexion flow, retry mechanics, halt classification, artifact formats, and configuration.
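To make the V6 idea concrete, here is a minimal sketch of a deterministic, rule-based check. The specific rule and the story format it assumes are illustrative, not the shipped invariants:

```python
# Illustrative V6-style invariant check: deterministic and rule-based,
# no LLM involved. The rule and story format are assumptions.
def check_acceptance_criteria_present(story_markdown: str) -> list[str]:
    """Return a list of violations; an empty list means the check passes."""
    violations = []
    if "## Acceptance Criteria" not in story_markdown:
        violations.append("story is missing an Acceptance Criteria section")
    if "- [ ]" not in story_markdown and "- [x]" not in story_markdown:
        violations.append("no checklist items found for acceptance criteria")
    return violations


story = "# Story 3.1\n\n## Acceptance Criteria\n\n- [ ] API returns 200\n"
assert check_acceptance_criteria_present(story) == []
```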
Per-story and per-run costs are tracked and reported. You always know what an overnight run costs.
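As a sketch of what per-story accounting against a run-level cap can look like: the `CostTracker` class below is illustrative, and only the `cost_per_run` limit and the graceful-halt behavior come from this README.

```python
# Sketch of per-story cost accounting against a run-level cap, in the
# spirit of the cost_per_run limit shown in the configuration section.
class BudgetExceeded(Exception):
    """Raised when a run would exceed its configured cost cap."""


class CostTracker:
    def __init__(self, cost_per_run: float) -> None:
        self.cap = cost_per_run
        self.per_story: dict[str, float] = {}

    @property
    def total(self) -> float:
        return sum(self.per_story.values())

    def record(self, story_id: str, usd: float) -> None:
        self.per_story[story_id] = self.per_story.get(story_id, 0.0) + usd
        if self.total > self.cap:
            # Mirrors exit code 3: cost cap reached, graceful halt.
            raise BudgetExceeded(f"run cost ${self.total:.2f} > cap ${self.cap:.2f}")


tracker = CostTracker(cost_per_run=50.00)
tracker.record("3.1", 4.37)
print(f"Story 3.1: ${tracker.per_story['3.1']:.2f}, run total: ${tracker.total:.2f}")
```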
```mermaid
flowchart TD
    subgraph Orchestrator["Arcwright AI Orchestrator"]
        Engine["LangGraph StateGraph"]
        Answerer["Static Rule Answerer"]
        Validator["Validation Pipeline"]
        SCM["Git Worktree Manager"]
    end
    subgraph Agents["Agent Layer"]
        SDK["Claude Code SDK"]
        Agent1["Agent (Story N)"]
        Agent2["Agent (Story N+1)"]
    end
    subgraph Artifacts["BMAD Artifacts"]
        PRD["PRD"]
        Arch["Architecture"]
        Stories["Stories + ACs"]
    end
    subgraph Output["Run Output"]
        Summary["summary.md"]
        Provenance["provenance/"]
        PRs["Pull Requests"]
    end
    Artifacts --> Engine
    Engine --> Answerer
    Engine --> SDK
    SDK --> Agent1
    SDK --> Agent2
    Agent1 --> Validator
    Agent2 --> Validator
    Validator --> SCM
    SCM --> Output
```
Technology stack:
- Python 3.11+ — core runtime
- LangGraph — workflow DAG execution, state management, observability
- Claude Code SDK — stateless AI agent invocation
- Git (2.25+) — worktree isolation, branch management, PR generation
- Pydantic — config validation, state models
- Click/Typer — CLI framework
- Python 3.11 or later
- Git 2.25 or later
- A Claude API key
- BMAD 6.1 or later — planning artifacts and dev-story workflow features require BMAD 6.1+
- A project with BMAD planning artifacts (PRD, architecture, stories with acceptance criteria)
Create a virtual environment inside your target project, install, and pin the dependency:
```bash
cd /path/to/your/project
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

pip install arcwright-ai
```

To version-control the dependency, add a `requirements.txt` to your project:

```text
arcwright-ai>=0.2.20
```
Then anyone cloning the project can reproduce the environment:
```bash
python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
```

Tip — guaranteed local execution: Use `python -m arcwright_ai` instead of the bare `arcwright-ai` command. This always runs the copy installed in the active virtual environment, never a stale global install.
- Initialize your project:

  ```bash
  python -m arcwright_ai init
  ```

  This scaffolds the `.arcwright-ai/` directory, generates a default config, writes a `.env.example` template, adds temp/run directories and `.env` to `.gitignore`, and detects existing BMAD artifacts.

- Configure your API key in `.env`:

  Copy the generated `.env.example` and fill in your values:

  ```bash
  cp .env.example .env
  # Edit .env — at minimum set ARCWRIGHT_API_CLAUDE_API_KEY
  ```

  Arcwright AI reads `.env` automatically on startup.

- Validate your setup:

  ```bash
  python -m arcwright_ai validate-setup

  # Check installed version
  python -m arcwright_ai version
  ```

  Expect output like:

  ```text
  ✅ Claude API key: valid
  ✅ BMAD project structure: detected at ./_spec/
  ✅ Planning artifacts: PRD, architecture, epics found
  ✅ Story artifacts: 12 stories with acceptance criteria
  ✅ Arcwright AI config: valid

  Ready for dispatch.
  ```

- Dispatch your first run:

  ```bash
  python -m arcwright_ai dispatch --story 1.1
  ```

- Review results in `.arcwright-ai/runs/<run-id>/summary.md`.
| Command | Description |
|---|---|
| `arcwright-ai init` | Scaffold `.arcwright-ai/`, generate default config, detect BMAD artifacts |
| `arcwright-ai dispatch --epic EPIC-N` | Dispatch full epic for sequential autonomous execution (also accepts `N` and `epic-N`) |
| `arcwright-ai dispatch --epic EPIC-N --resume` | Resume a halted epic from the failure point (also accepts `N` and `epic-N`) |
| `arcwright-ai dispatch --story STORY-N.N` | Dispatch a single story |
| `arcwright-ai validate-setup` | Validate config, API key, project structure |
| `arcwright-ai status [--run RUN-ID]` | Show current/last run status with cost summary |
| `arcwright-ai cleanup` | Clean up git worktrees |
| `arcwright-ai version` | Print the installed package version |
| Code | Meaning |
|---|---|
| `0` | Success |
| `1` | General error |
| `2` | Validation failure (max retries exhausted) |
| `3` | Cost cap reached (graceful halt) |
| `4` | Configuration error |
| `5` | Timeout |
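Because the exit codes are documented and stable, wrapper scripts can branch on them. Here is a minimal sketch using only the standard library; the `notify` helper is a placeholder, not part of Arcwright AI:

```python
# Sketch of branching on the documented exit codes from a wrapper script.
import subprocess
import sys


def notify(message: str) -> None:
    print(message)  # swap in Slack, email, etc.


result = subprocess.run(
    [sys.executable, "-m", "arcwright_ai", "dispatch", "--epic", "3"]
)
match result.returncode:
    case 0:
        notify("Epic 3 complete: PRs are ready for review")
    case 2:
        notify("Validation failure: max retries exhausted, check the halt summary")
    case 3:
        notify("Cost cap reached: run halted gracefully, resume with --resume")
    case 5:
        notify("Timeout: inspect the run artifacts before resuming")
    case _:
        notify(f"Run ended with exit code {result.returncode}")
```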
All commands are composable in shell scripts:
```bash
python -m arcwright_ai dispatch --epic 3 && notify-slack "done"
```

Arcwright AI uses a two-tier configuration model with environment variable overrides.
Precedence: env var > project config > global config > defaults
Global config:

```yaml
model:
  version: "claude-sonnet-4-20250514"

limits:
  tokens_per_story: 100000
  cost_per_run: 50.00
  timeout_per_story: 1800
```

Project config:

```yaml
methodology:
  artifacts_path: "./_spec"
  type: "bmad"

scm:
  branch_template: "arcwright-ai/{epic}/{story}"

limits:
  tokens_per_story: 80000
  cost_per_run: 25.00
  retry_budget: 10.00
  timeout_per_story: 3600

reproducibility:
  enabled: true
  retention: "last-10-runs"
```

| Variable | Purpose |
|---|---|
| `ARCWRIGHT_API_CLAUDE_API_KEY` | Claude API key (set this in `.env`) |
| `ARCWRIGHT_AI_MODEL_GENERATE_VERSION` | Override the generate (code-writing) model version |
| `ARCWRIGHT_AI_MODEL_REVIEW_VERSION` | Override the review model version |
| `LANGCHAIN_TRACING_V2` | Set to `true` to enable LangSmith tracing (see below) |
| `LANGCHAIN_API_KEY` | Your LangSmith API key (set this in `.env`) |
| `LANGCHAIN_PROJECT` | LangSmith project name (default: `default`) |
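The precedence chain can be pictured as a layered lookup. A sketch under stated assumptions: the layer contents and the environment variable name used here are illustrative, not the shipped configuration keys.

```python
# Illustrative layered merge matching the documented precedence:
# env var > project config > global config > defaults.
import os

defaults = {"limits": {"cost_per_run": 10.00}}
global_cfg = {"limits": {"cost_per_run": 50.00}}
project_cfg = {"limits": {"cost_per_run": 25.00}}


def resolve_cost_cap() -> float:
    # Hypothetical env var name, shown only to illustrate the override tier.
    env_override = os.environ.get("ARCWRIGHT_AI_LIMITS_COST_PER_RUN")
    if env_override is not None:
        return float(env_override)
    for layer in (project_cfg, global_cfg, defaults):  # highest priority first
        if "cost_per_run" in layer.get("limits", {}):
            return layer["limits"]["cost_per_run"]
    raise KeyError("cost_per_run not configured")


print(resolve_cost_cap())  # 25.0: project config wins over global and defaults
```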
Arcwright AI runs on LangGraph, which has built-in support for LangSmith — LangChain's observability platform. When tracing is enabled, every graph invocation (preflight → budget_check → agent_dispatch → validate → commit → finalize) is recorded as a trace you can inspect in the LangSmith web UI.
- See the full execution graph for each story dispatch in real time
- Inspect node inputs/outputs, state transitions, and timing
- Debug validation failures and agent responses visually
- Track token usage and latency across runs
- Create a free account at smith.langchain.com
- Go to Settings → API Keys and create an API key
- Add the following entries to your `.env` file:

```bash
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=lsv2_pt_...
LANGCHAIN_PROJECT=arcwright-ai   # optional — names your project in the UI
```

That's it. The next `python -m arcwright_ai dispatch` will send traces to your LangSmith project automatically — no code changes required.
Unset or remove `LANGCHAIN_TRACING_V2`, or set it to `false`. Tracing is off by default; runs produce no external network calls to LangSmith unless you opt in.
Note: LangSmith tracing is independent of the local `.arcwright-ai/runs/` artifacts. Run artifacts are always written locally regardless of whether LangSmith is enabled.
The CLI is a thin wrapper around a programmatic Python API:
```python
from arcwright_ai import Orchestrator

o = Orchestrator()
o.dispatch(epic="EPIC-3")
o.dispatch(story="STORY-3.1")
o.status(run_id="RUN-2026-02-26")
o.cost(run_id="RUN-2026-02-26")
o.cleanup()
```

Arcwright AI is in active development and available on PyPI. The MVP is complete — the sequential pipeline, V3+V6 validation, decision provenance, halt-and-notify, cost tracking, resume, SCM integration with auto-merge, role-based model registry, and dynamic versioning are all implemented. Automated publishing via GitHub Actions triggers on version tags.
| Phase | Focus |
|---|---|
| MVP | Sequential pipeline, V3+V6 validation, decision provenance, halt-and-notify, cost tracking, --resume |
| Growth | Observe mode, deterministic replay, cost enforcement, parallel execution, public Python API, generated docs |
| Vision | Methodology-agnostic orchestration, multi-user/team coordination, web UI, community workflow marketplace |
This project maintains one customization to the default BMAD dev-story workflow. The change lives in `_bmad/bmm/workflows/4-implementation/dev-story/workflow.md` and must be re-applied manually after each BMAD framework upgrade.
Note (BMAD 6.1): Several features that were previously custom in this project — Step 3 review-continuation detection, Step 8 `[AI-Review]` follow-up handling, expanded Step 10 completion/communication, and the enhanced `checklist.md` — were adopted natively in BMAD 6.1. Only the Step 9 git diff audit below remains custom.
The BMAD framework is installed into a project, not built alongside it. It ships as a set of files dropped into `_bmad/` by the BMAD installer/updater. Because these files are owned by the framework distribution rather than the application project, the standard BMAD `.gitignore` excludes all of `_bmad/` — just as you would not commit `node_modules/` or a Python `.venv`. Committing them would create merge conflicts every time BMAD releases an update.
| File | Change | Reason |
|---|---|---|
| `_bmad/bmm/workflows/4-implementation/dev-story/workflow.md` | Replaced the stock Step 9 one-liner (*Confirm File List includes every changed file*) with a full git diff reconciliation audit | 8 of 12 stories across Epics 2–4 (67%) had Dev Agent Record File Lists that did not match the files actually changed. The audit runs `git diff --name-only HEAD` and `git status --short`, compares against the story's File List, outputs a reconciliation table, and blocks code-review submission until all discrepancies are resolved. |
A BMAD framework update (via `npx bmad-method@<version> install` or equivalent) will overwrite `workflow.md` with the stock original. After each upgrade, open `_bmad/bmm/workflows/4-implementation/dev-story/workflow.md`, find Step 9, and replace the stock file-list confirmation line with the git diff audit block below.
Stock Step 9 line to replace:
```xml
<action>Confirm File List includes every changed file</action>
```

Replace with this git diff reconciliation audit (paste immediately after the `<action>Run the full regression suite …</action>` line inside Step 9):
<details>
<summary>Click to expand the full replacement block</summary>

```xml
<!-- GIT DIFF AUDIT: Reconcile actual changed files against Dev Agent Record File List -->
<action>Run: git diff --name-only HEAD to get all files changed since the last commit</action>
<action>Also run: git status --short to surface any untracked or unstaged files relevant to this story</action>
<action>Extract the current File List from Dev Agent Record → File List section of the story file</action>
<action>Compare the two lists:
- Files in git diff output but NOT in File List → Missing entries (must be added before review)
- Files in File List but NOT in git diff output → Phantom entries (verify intent or remove)
- Files appearing in both → Confirmed ✅
</action>
<action>Output a reconciliation table: filename | in-git-diff | in-file-list | status</action>
<check if="any files appear in git diff but are absent from the File List">
<output>⚠️ FILE LIST DISCREPANCY — Missing Entries
The following changed files are NOT recorded in Dev Agent Record → File List:
{{missing_files}}
You MUST add these entries before the story can move to review.
</output>
<action>Update Dev Agent Record → File List to include all missing files (repo-root-relative paths)</action>
<action>Re-save the story file after updating the File List</action>
</check>
<check if="any files appear in the File List but are absent from git diff output">
<output>⚠️ FILE LIST DISCREPANCY — Phantom Entries
The following files are listed in Dev Agent Record → File List but show no git changes:
{{phantom_files}}
Confirm these files were intentionally included (e.g. deletions tracked separately) or remove them.
</output>
</check>
<check if="git diff output and File List match exactly">
<output>✅ Git diff audit passed — all changed files are accounted for in the File List</output>
</check>
<action if="File List was updated during audit">Re-save the story file before proceeding</action>Symptom of missing customization: Dev agent File Lists stop matching git diff output after a BMAD update. See the troubleshooting entry in arcwright-ai/README.md.
Arcwright AI is open-source and welcomes contributions. Whether you're fixing bugs, adding features, improving documentation, or contributing workflow definitions for your own methodology — all contributions are valued.
```bash
git clone https://github.com/ProductEngineerIO/arcwright-ai.git
cd arcwright-ai
pip install -e .
```

- Core orchestration — LangGraph state machine, pipeline execution
- Validation patterns — new validators, artifact-specific pipelines
- Workflow definitions — encode your team's development methodology as an executable workflow
- Documentation — guides, tutorials, API reference improvements
Arcwright AI uses hatch-vcs for automatic versioning from git tags. No files need editing to cut a release.
How versions are resolved:
| Repo state | Resolved version | Example |
|---|---|---|
| Exactly on a tag | Tag version | `v0.2.0` → `0.2.0` |
| N commits after a tag | Next-patch dev build | 3 commits after `v0.2.0` → `0.2.1.dev3` |
| No tags at all | `0.0.0.dev<N>` | Fallback for fresh clones without history |
Merging a PR to `main` does NOT create a new version tag. Every commit on `main` after the last tag automatically gets a PEP 440 dev version (e.g., `0.2.1.dev5`). This is the expected state between releases.
To cut a release:
```bash
# 1. Ensure main is clean and CI is green
git checkout main && git pull

# 2. Create an annotated tag (the ONLY step that matters)
git tag -a v0.2.0 -m "v0.2.0 — brief description of what's in this release"

# 3. Push the tag
git push origin v0.2.0
```

That's it. The next `pip install` or wheel build will report `0.2.0`.
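To confirm which version a given environment actually resolved, you can query the installed package metadata with the standard library:

```python
# Query the installed distribution's version via package metadata.
from importlib.metadata import version

print(version("arcwright-ai"))  # e.g. 0.2.0 on a tag, 0.2.1.dev3 between tags
```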
Version scheme: `guess-next-dev` with `no-local-version` — produces clean PyPI-compatible versions with no `+gABCDEF` local identifiers.
Rollback: If hatch-vcs causes issues, revert to a static `version = "X.Y.Z"` in `pyproject.toml` and a hardcoded `__version__` in `__init__.py`. No application code depends on the versioning mechanism.
The long-term vision is a community where every methodology trapped in someone's head or a wiki becomes an executable workflow. If you have a structured development process, consider encoding it as an Arcwright AI workflow definition.
This project is licensed under the MIT License — see the LICENSE file for details.