Status: v0.1 — first build stage, actively evolving. The end-to-end pipeline (plan → spawn → worker → report → critic → merge-queue with post-merge verify) has been validated with a single worker on a real repository. Running more workers in parallel is supported by the design but not yet stress-tested — see ROADMAP.md.
Expect frequent, visible changes. braid is being built in the open, iteratively, against real use. Design decisions get revised as empirical feedback comes in. Every change is tracked in git —
`git log` is the honest record of what moved when and why. If you pin a specific behavior in your own workflow, pin to a commit/tag rather than to `main`.
A combination of existing SOTA patterns from open-source coding agent tools, packaged as a small auditable bash harness.
A capable planner model (in v0.1: Claude Opus via Claude Code) acts as orchestrator — planning, decomposing, reviewing. A separate CLI worker (in v0.1: OpenAI Codex) executes under strict contracts. Each worker runs in an isolated git worktree, produces a structured report, and the orchestrator independently verifies the output before a serial merge-queue integrates it with a post-merge-verify safety net.
Think of it as a pair-programming pattern where one agent plans the work, a second agent executes under a binding contract, and a merge-queue enforces integration safety. Multiple strands, one fabric.
# 1. Clone and link the CLI
git clone https://github.com/GeOhDoubleT/braid.git
cd braid
export PATH="$PWD/bin:$PATH"
# 2. Set up a target repo (clones if URL, validates if path)
braid init https://github.com/you/your-project.git
# 3. Activate the environment + check your tooling
source .env.current
braid doctor
# 4. Try the built-in demo (self-contained Python sandbox)
braid init-demo
braid validate fix-increment
# 5. From a claude-code session in this directory, run /braid-task

Parallel AI coding agents share two persistent failure modes:
- Silent scope creep — agents touch files they shouldn't, quietly refactor things they noticed
- Uncritical self-reporting — agents claim COMPLETED when tests actually failed; integration breaks later
braid addresses both by separating planning from execution and enforcing independent verification:
- A Sprint Contract specifies exactly what files a worker may touch, what tests must pass, what it must NOT do
- The worker runs in a git worktree, sees only its own contract
- The worker's report is not trusted — the orchestrator re-runs tests itself
- Merges go through a serial queue with post-merge verification and automatic rollback on conflict
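To make the contract idea concrete, here is a hypothetical sketch of what a `.sprint-contract.yaml` could contain. The exact schema is defined by braid itself; apart from `allowed_paths`, `forbidden_paths`, and test commands (which the README describes), the field names and values below are invented for illustration:

```yaml
# Hypothetical sketch of a Sprint Contract; field names beyond
# allowed_paths / forbidden_paths / test commands are illustrative.
task_id: leads-entity
goal: "Add a Leads entity, mirroring the Opportunities implementation"
allowed_paths:
  - src/entities/leads/
  - migrations/
forbidden_paths:
  - src/entities/opportunities/   # reference only, must not be modified
  - package.json
test_commands:
  - npm test -- --filter leads
```

The worker sees only this file plus `AGENTS.md`; everything outside `allowed_paths` is off-limits and re-checked at verify time.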
braid does not invent new ideas. It combines patterns you can find elsewhere (see Limitations and prior art) into one opinionated harness, and keeps the implementation small (~500 lines of bash) so you can read, trust, and modify it.
You don't need to write structured prompts. Inside a Claude Code session with braid, two slash commands cover the full workflow:
/braid-setup — interactive setup for a new target repo. Claude asks which repo, runs braid init, verifies the environment, installs the worker skill. You just answer the questions.
/braid-task — end-to-end dispatch of a feature or change. You say what you want in one sentence. The command instructs Claude to:
- Clarify (max 3 short questions) only if the intent is ambiguous
- Explore the target repo first — read the README, scan directory structure, find similar existing features, detect test conventions
- Propose 2–4 atomic tasks as a first slice, with an explicit note of what's deferred to later slices
- Write contracts once you pick which tasks to dispatch first
- Brief you per contract — goal, allowed paths, test commands, budget — and wait for your dispatch approval
- Spawn, verify, and merge — the worker runs, Claude re-runs the tests itself (independent critic), you approve the merge
Example:
/braid-task
Feature: add a "Leads" entity to our CRM, similar to how Opportunities work.
That's enough. Claude will explore the repo, check whether Leads already exists, propose a decomposition (e.g. "Task 1: entity + migration, Task 2: GraphQL resolver, Task 3: seed data"), and ask which to dispatch first. You never write a YAML contract by hand unless you want to.
- Strict scope isolation. `allowed_paths`/`forbidden_paths` deny-lists are enforced at planning time and re-checked at verify time.
- Independent post-hoc verification. The orchestrator runs tests itself against the worker's actual diff — no "trust me bro" from the worker.
- Serial merges with rollback. The merge-queue rebases, squash-merges, and re-runs tests against the merged state. If post-merge verify fails (intent-conflict with a parallel task), it rolls back and generates a rework contract.
- Worker-infra cleanliness. The contract and discipline files (`AGENTS.md`, `.sprint-contract.yaml`) never leak into the target repo.
- One-file CLI. `bin/braid` is a single bash script plus `bin/braid-merge-queue`. Read it in an afternoon.
- Two-agent asymmetry. Use the expensive planner model only for planning/review. Workers can be cheap models on fast profiles.
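The verify-time scope check amounts to diffing the worker's commit against the contract's path rules. A minimal standalone sketch in bash (this is not braid's actual code; the repo layout and allow-list pattern are invented for the example):

```shell
#!/usr/bin/env bash
# Minimal sketch: flag any file in the worker's last commit that falls
# outside the allowed paths. Not braid's real implementation.
set -euo pipefail

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m init

mkdir -p src docs
echo "ok"   > src/feature.py       # inside allowed scope
echo "oops" > docs/README.md       # outside allowed scope
git add -A
git -c user.email=demo@example.com -c user.name=demo \
  commit -q -m "worker changes"

allowed='^src/'                    # allow-list pattern from the contract
violations=$(git diff --name-only HEAD~1 HEAD | grep -vE "$allowed" || true)

if [ -n "$violations" ]; then
  echo "scope violation: $violations"   # → scope violation: docs/README.md
fi
```

Because the check runs against the actual git history, a worker that quietly edits a forbidden file cannot hide it behind an optimistic report.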
braid is in its first build stage. The full end-to-end pipeline — plan, spawn, worker execution, independent critic, merge queue with post-merge verify and rollback — has been validated with a single worker on a real git repository. Running multiple workers in parallel is supported by the design (bash spawns are cheap and worktrees are isolated), but has not been stress-tested for orthogonal-scope availability, API rate-limit behavior, merge-queue throughput, or observability with many workers. Scaling work is an explicit next step in the roadmap — see ROADMAP.md.
Issues and PRs welcome for anything that breaks or feels wrong.
- Not a multi-vendor adapter. Workers are hard-wired to the OpenAI Codex CLI. Extending to Gemini/Claude workers is feasible but not shipped.
- Not a hosted service. Everything runs on your machine (or WSL). No server, no daemon.
- Not a replacement for Claude Code Agent Teams. Agent Teams coordinate Claude-to-Claude with built-in mailbox. braid is Claude-plans / Codex-executes, which Agent Teams does not support.
- No cross-worker context sharing. Workers are deliberately isolated. If task B depends on task A's output, serialize them — don't parallelize. Cross-worker gossip is an anti-pattern.
- Single-lane merge queue. No dependency graph between tasks (yet). Parallel tasks must have orthogonal scopes.
- No observability layer. Status is read from the filesystem (`braid status`) and tmux panes. Usable for ~3 workers, unclear above that — the ROADMAP treats this as an open question, not a commitment.
┌──────────────────────────────┐
│ Orchestrator (planner LLM) │
│ — reads CLAUDE.md │
│ — writes Sprint Contracts │
│ — runs Critic step │
└──────────────┬───────────────┘
│
braid spawn <task-id>
│
┌──────────────▼──────────────┐
│ git worktree (isolated) │
│ + .sprint-contract.yaml │
│ + AGENTS.md │
│ + .git/info/exclude │
└──────────────┬──────────────┘
│
codex exec
│
┌──────────────▼──────────────┐
│ Worker (Codex CLI) │
│ — reads contract │
│ — runs tests, edits files │
│ — writes atomic report │
└──────────────┬──────────────┘
│
reports/<task-id>.md + .exit
│
┌──────────────▼──────────────┐
│ Orchestrator: Critic │
│ — re-runs test_commands │
│ — checks scope compliance │
│ — PASS or Rework contract │
└──────────────┬──────────────┘
│
braid merge <task-id>
│
┌──────────────▼──────────────┐
│ Merge Queue (serial lock) │
│ 1. rebase on target │
│ 2. squash merge │
│ 3. re-run tests on merged │
│ state (post-merge) │
│ 4. rollback on failure │
│ 5. archive + cleanup │
└─────────────────────────────┘
More detail: docs/ARCHITECTURE.md.
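The merge queue's serial discipline can be sketched in a few lines of bash. This is a hypothetical reduction (the real logic lives in `bin/braid-merge-queue`; the function name, lock-file path, and `$REPO` variable here are invented):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the merge-queue steps; not braid's real code.
# Assumes $REPO points at the target repo clone.
set -euo pipefail

merge_task() {
  local branch="$1" target="$2" test_cmd="$3"
  (
    flock 9                                      # serial lock: one merge at a time
    git -C "$REPO" checkout -q "$branch"
    git -C "$REPO" rebase -q "$target"           # 1. rebase on target
    git -C "$REPO" checkout -q "$target"
    git -C "$REPO" merge --squash -q "$branch"   # 2. squash merge
    git -C "$REPO" commit -q -m "merge: $branch"
    if ! (cd "$REPO" && eval "$test_cmd"); then  # 3. re-run tests on merged state
      git -C "$REPO" reset --hard -q HEAD~1      # 4. rollback on failure
      return 1
    fi
  ) 9>"/tmp/braid-merge.lock"
}
```

The `flock` on a shared lock file is what makes the lane single: a second `merge_task` invocation blocks until the first one's subshell exits, so post-merge tests always run against a quiescent target branch.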
| Component | Purpose | Install |
|---|---|---|
| bash 4+ | CLI is pure bash | preinstalled on Linux/macOS/WSL |
| git 2.30+ | worktree operations | apt install git / brew install git |
| tmux (recommended) | Multi-pane parallel workers | apt install tmux / brew install tmux |
| OpenAI Codex CLI | Worker executor | npm install -g @openai/codex |
| A planner LLM | Orchestrator (e.g. Claude Code) | per vendor install |
braid is developed and tested on Linux (Ubuntu), macOS, and Windows-via-WSL2. Native Windows without WSL is not supported — tmux and POSIX worktree semantics are required.
- New to this? Start with docs/GETTING_STARTED.md — from fresh WSL install to a first passing smoke test.
- Know your tools? docs/ARCHITECTURE.md for the design, docs/CONTRIBUTING.md for how to extend.
- Why these choices? docs/FOUNDATIONS.md maps every architectural pattern in braid to its academic / industry source.
- What do real runs look like? docs/FIELD_NOTES.md is a running log of observations from actual sessions — token distributions, self-decomposition events, apparent-hang phenomena, etc. n=1 per entry, not statistics, but honest.
braid is a combination of existing patterns from open-source coding agent tools and engineering blogs. Nothing here is novel. The contribution — if any — is that these patterns usually show up separately, and packaging them into one small auditable bash harness was missing. braid is an empirical probe, not a proven solution.
- Planner/Executor split — a capable model plans, a cheaper CLI worker executes. Popularized by Anthropic's multi-agent research writeup and Factory AI's Droids. Pragmatic cost report in BSWEN's planner-executor guide.
- Contract-based handoff (YAML) — structured task specs rather than dialogue between agents. Used by agent-mux, EveryInc's compound-engineering-plugin, and codex-bmad-skills.
- Independent post-hoc verification — the planner re-runs tests rather than trusting worker claims. Convergent choice across Sourcegraph Amp's subagent design and cross-model review patterns.
- Strict scope deny-lists — explicit `forbidden_paths` per task. Related philosophy: the "Commands > Prose" rule from Stack Overflow's coding guidelines for AI agents.
- Post-merge verify + rollback — run tests on merged state, revert on failure. Articulated by ctx.rs, "Why Coding Agents Need a Merge Queue".
- Worktree isolation, no cross-worker communication — each worker runs in its own `git worktree`. Used by oh-my-claudecode (tmux multi-agent), ccswarm, parallel-code. Empirical defense of bounded parallelism in Simon Willison's "parallel coding agents lifestyle".
- Counter-evidence worth knowing: Cognition's "Don't Build Multi-Agents" warns that parallel subagents produce contradictory implicit decisions. braid's response is contracts as explicit coordination — no agent-to-agent improvisation.
- AGENTS.md convention — shared planner/worker instruction file across tools. See agents.md (Linux Foundation).
Each listed project overlaps with braid in at least one column; none combines all of them.
| Project | Planner/Executor | Contract (YAML) | File-level deny-list | Independent Critic | Post-merge verify + rollback | Worktree isolation |
|---|---|---|---|---|---|---|
| braid | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| agent-mux | ✓ | ✓ (JSON) | partial | — | — | — |
| ComposioHQ/agent-orchestrator | ✓ | ~ | — | — | — | ✓ |
| nwiizo/ccswarm | — (Claude-only) | — | — | — | — | ✓ |
| EveryInc/compound-engineering-plugin | ✓ | ✓ | — | — | — | — |
| xmm/codex-bmad-skills | — | ✓ | — | — | — | — |
| oh-my-claudecode | — | — | — | — | — | — (tmux only) |
| Claude Code Agent Teams | ✓ (Claude↔Claude) | — | — | — | — | ✓ |
Pick a different tool if you need any of:
- Cross-vendor workers (Gemini, Claude-as-worker, local LLMs) — see CrewAI, LangGraph
- UI-driven review and approval — see Cognition's Devin, Sourcegraph Amp
- Dependency graphs between tasks — see LangGraph or Prefect
- Claude-to-Claude coordination — use Claude Code Agent Teams
- Mass IaC generation — dedicated frameworks exist for that niche
These are measurement opportunities anyone can run with the harness as-is:
- Scope-adherence rate — on a fixed task set, measure how often deny-list files are touched with vs. without the deny-list mechanism
- Cost-per-accepted-merge — cross-vendor (Opus-plan + Codex-execute) vs. homogeneous (Claude-only, Codex-only) on identical tasks
- Post-merge-verify catch rate — how often does post-merge verify reject a merge that looked clean at pre-merge verify?
- Parallelism sweet spot — empirical curve of accepted merges per hour vs. parallelism, on real repos
- Reasoning-effort elasticity — for a task class, what's the quality/cost frontier across `minimal|low|medium|high|xhigh`?
None of these are promised. None have been run yet. The harness is small enough that anyone can produce the numbers.
MIT — see LICENSE. Open source first. Pull requests welcome, feature-request issues encouraged.
Created by @GeOhDoubleT, 2026. Inspired by public patterns in the multi-agent coding tools space; see Limitations and prior art for attribution.