braid

Status: v0.1 — first build stage, actively evolving. The end-to-end pipeline (plan → spawn → worker → report → critic → merge-queue with post-merge verify) has been validated with a single worker on a real repository. Running more workers in parallel is supported by the design but not yet stress-tested — see ROADMAP.md.

Expect frequent, visible changes. braid is being built in the open, iteratively, against real use. Design decisions get revised as empirical feedback comes in. Every change is tracked in git — git log is the honest record of what moved when and why. If you pin a specific behavior in your own workflow, pin to a commit/tag rather than to main.

A combination of existing SOTA patterns from open-source coding agent tools, packaged as a small auditable bash harness.

A capable planner model (in v0.1: Claude Opus via Claude Code) acts as orchestrator — planning, decomposing, reviewing. A separate CLI worker (in v0.1: OpenAI Codex) executes under strict contracts. Each worker runs in an isolated git worktree, produces a structured report, and the orchestrator independently verifies the output before a serial merge-queue integrates it with a post-merge-verify safety net.

Think of it as a pair-programming pattern where one agent plans the work, a second agent executes under a binding contract, and a merge-queue enforces integration safety. Multiple strands, one fabric.


Quickstart (for developers in a hurry)

# 1. Clone and link the CLI
git clone https://github.com/GeOhDoubleT/braid.git
cd braid
export PATH="$PWD/bin:$PATH"

# 2. Set up a target repo (clones if URL, validates if path)
braid init https://github.com/you/your-project.git

# 3. Activate the environment + check your tooling
source .env.current
braid doctor

# 4. Try the built-in demo (self-contained Python sandbox)
braid init-demo
braid validate fix-increment

# 5. From a claude-code session in this directory, run /braid-task

What problem does this solve?

Parallel AI coding agents share two persistent failure modes:

  1. Silent scope creep — agents touch files they shouldn't and quietly refactor things they happened to notice
  2. Uncritical self-reporting — agents claim COMPLETED when tests actually failed, and the breakage only surfaces at integration time

braid addresses both by separating planning from execution and enforcing independent verification:

  • A Sprint Contract specifies exactly what files a worker may touch, what tests must pass, what it must NOT do
  • The worker runs in a git worktree, sees only its own contract
  • The worker's report is not trusted — the orchestrator re-runs tests itself
  • Merges go through a serial queue with post-merge verification and automatic rollback on conflict
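
The contract is a plain YAML file dropped into the worker's worktree. As an illustration only (the field names below are plausible guesses, not braid's actual schema), a contract for the demo's fix-increment task could look like this:

```shell
# Illustrative sketch: writes a hypothetical Sprint Contract.
# All field names are assumptions, not braid's actual schema.
cat > .sprint-contract.yaml <<'EOF'
task_id: fix-increment
goal: "Fix the off-by-one bug in increment()"
allowed_paths:
  - src/counter.py
  - tests/test_counter.py
forbidden_paths:
  - docs/**
test_commands:
  - pytest tests/test_counter.py
budget_minutes: 30
EOF
```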

braid does not invent new ideas. It combines patterns you can find elsewhere (see Limitations and prior art) into one opinionated harness, and keeps the implementation small (~500 lines of bash) so you can read, trust, and modify it.


You describe intent — the planner does the rest

You don't need to write structured prompts. Inside a Claude Code session with braid, two slash commands cover the full workflow:

/braid-setup — interactive setup for a new target repo. Claude asks which repo, runs braid init, verifies the environment, installs the worker skill. You just answer the questions.

/braid-task — end-to-end dispatch of a feature or change. You say what you want in one sentence. The command instructs Claude to:

  1. Clarify (max 3 short questions) only if the intent is ambiguous
  2. Explore the target repo first — read the README, scan directory structure, find similar existing features, detect test conventions
  3. Propose 2–4 atomic tasks as a first slice, with an explicit note of what's deferred to later slices
  4. Write contracts once you pick which tasks to dispatch first
  5. Brief you per contract — goal, allowed paths, test commands, budget — and wait for your dispatch approval
  6. Spawn, verify, and merge — the worker runs, Claude re-runs the tests itself (independent critic), you approve the merge

Example:

/braid-task

Feature: add a "Leads" entity to our CRM, similar to how Opportunities work.

That's enough. Claude will explore the repo, check whether Leads already exists, propose a decomposition (e.g. "Task 1: entity + migration, Task 2: GraphQL resolver, Task 3: seed data"), and ask which to dispatch first. You never write a YAML contract by hand unless you want to.


What braid does well

  • Strict scope isolation. The allowed_paths allow-list and forbidden_paths deny-list are enforced at planning time and re-checked at verify time.
  • Independent post-hoc verification. The orchestrator runs tests itself against the worker's actual diff — no "trust me bro" from the worker.
  • Serial merges with rollback. The merge-queue rebases, squash-merges, and re-runs tests against the merged state. If post-merge verify fails (intent-conflict with a parallel task), it rolls back and generates a rework contract.
  • Worker-infra cleanliness. The contract and discipline files (AGENTS.md, .sprint-contract.yaml) never leak into the target repo.
  • One-file-CLI. bin/braid is a single bash script plus bin/braid-merge-queue. Read it in an afternoon.
  • Two-agent asymmetry. Use the expensive planner model only for planning/review. Workers can be cheap models on fast profiles.

Status — v0.1 (first build stage)

braid is in its first build stage. The full end-to-end pipeline — plan, spawn, worker execution, independent critic, merge queue with post-merge verify and rollback — has been validated with a single worker on a real git repository. Running multiple workers in parallel is supported by the design (bash spawns are cheap and worktrees are isolated), but has not been stress-tested for orthogonal-scope availability, API rate-limit behavior, merge-queue throughput, or observability with many workers. Scaling work is an explicit next step in the roadmap — see ROADMAP.md.

Issues and PRs welcome for anything that breaks or feels wrong.

What braid is NOT (today)

  • Not a multi-vendor adapter. Workers are hard-wired to the OpenAI Codex CLI. Extending to Gemini/Claude workers is feasible but not shipped.
  • Not a hosted service. Everything runs on your machine (or WSL). No server, no daemon.
  • Not a replacement for Claude Code Agent Teams. Agent Teams coordinate Claude-to-Claude with built-in mailbox. braid is Claude-plans / Codex-executes, which Agent Teams does not support.
  • No cross-worker context sharing. Workers are deliberately isolated. If task B depends on task A's output, serialize them — don't parallelize. Cross-worker gossip is an anti-pattern.
  • Single-lane merge queue. No dependency graph between tasks (yet). Parallel tasks must have orthogonal scopes.
  • No observability layer. Status is read from filesystem (braid status) and tmux panes. Usable for ~3 workers, unclear above that — the ROADMAP treats this as an open question, not a commitment.

Architecture at a glance

                     ┌──────────────────────────────┐
                     │  Orchestrator (planner LLM)  │
                     │  — reads CLAUDE.md           │
                     │  — writes Sprint Contracts   │
                     │  — runs Critic step          │
                     └──────────────┬───────────────┘
                                    │
                       braid spawn <task-id>
                                    │
                     ┌──────────────▼──────────────┐
                     │ git worktree (isolated)     │
                     │  + .sprint-contract.yaml    │
                     │  + AGENTS.md                │
                     │  + .git/info/exclude        │
                     └──────────────┬──────────────┘
                                    │
                              codex exec
                                    │
                     ┌──────────────▼──────────────┐
                     │ Worker (Codex CLI)          │
                     │  — reads contract           │
                     │  — runs tests, edits files  │
                     │  — writes atomic report     │
                     └──────────────┬──────────────┘
                                    │
                    reports/<task-id>.md + .exit
                                    │
                     ┌──────────────▼──────────────┐
                     │ Orchestrator: Critic        │
                     │  — re-runs test_commands    │
                     │  — checks scope compliance  │
                     │  — PASS or Rework contract  │
                     └──────────────┬──────────────┘
                                    │
                         braid merge <task-id>
                                    │
                     ┌──────────────▼──────────────┐
                     │ Merge Queue (serial lock)   │
                     │  1. rebase on target        │
                     │  2. squash merge            │
                     │  3. re-run tests on merged  │
                     │     state (post-merge)      │
                     │  4. rollback on failure     │
                     │  5. archive + cleanup       │
                     └─────────────────────────────┘

More detail: docs/ARCHITECTURE.md.
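
The merge-queue's steps can be sketched end to end in plain git, here against a throwaway repository. Everything below is a hedged illustration, not braid's actual code; the grep stands in for re-running the contract's test_commands:

```shell
# Illustrative sketch of the merge-queue steps: rebase -> squash merge ->
# post-merge test -> rollback on failure. Runs in a throwaway repo.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo base > app.txt
git add app.txt && git commit -qm "base"

# Simulate a worker branch produced in an isolated worktree
git checkout -qb task-1
echo feature >> app.txt
git add app.txt && git commit -qm "task-1 work"

# Merge-queue steps
git rebase -q main                       # 1. rebase on target
git checkout -q main
git merge -q --squash task-1             # 2. squash merge (staged, not committed)
git commit -qm "task-1: squashed"
if grep -q feature app.txt; then         # 3. stand-in for re-running tests
  echo "post-merge verify: PASS"
else
  git reset -q --hard HEAD~1             # 4. rollback on failure
  echo "post-merge verify: FAIL (rolled back)"
fi
```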


Requirements

| Component | Purpose | Install |
|---|---|---|
| bash 4+ | CLI is pure bash | preinstalled on Linux/macOS/WSL |
| git 2.30+ | worktree operations | `apt install git` / `brew install git` |
| tmux (recommended) | multi-pane parallel workers | `apt install tmux` / `brew install tmux` |
| OpenAI Codex CLI | worker executor | `npm install -g @openai/codex` |
| A planner LLM | orchestrator (e.g. Claude Code) | per vendor install |

braid is developed and tested on Linux (Ubuntu), macOS, and Windows-via-WSL2. Native Windows without WSL is not supported — tmux and POSIX worktree semantics are required.
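
A rough manual equivalent of braid doctor (the real command's checks may differ):

```shell
# Rough manual environment check; `braid doctor` automates this, and its
# actual checks may differ from this sketch.
need() { command -v "$1" >/dev/null 2>&1 || echo "missing: $1"; }

need bash
need git
need tmux    # recommended for parallel workers, not strictly required
need codex   # OpenAI Codex CLI worker

# braid requires bash 4+
[ "${BASH_VERSINFO[0]:-0}" -ge 4 ] || echo "bash 4+ required (found ${BASH_VERSION:-unknown})"
```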


Docs

  • New to this? Start with docs/GETTING_STARTED.md — from fresh WSL install to a first passing smoke test.
  • Know your tools? docs/ARCHITECTURE.md for the design, docs/CONTRIBUTING.md for how to extend.
  • Why these choices? docs/FOUNDATIONS.md maps every architectural pattern in braid to its academic / industry source.
  • What do real runs look like? docs/FIELD_NOTES.md is a running log of observations from actual sessions — token distributions, self-decomposition events, apparent-hang phenomena, etc. n=1 per entry, not statistics, but honest.

What braid is (and is not)

braid is a combination of existing patterns from open-source coding agent tools and engineering blogs. Nothing here is novel. The contribution, if any, is the packaging: these patterns usually show up separately, and a single small, auditable bash harness that combines them was missing. braid is an empirical probe, not a proven solution.

Patterns braid uses (and where they come from)

Where braid sits vs. adjacent OSS projects

Each listed project overlaps with braid in at least one column; none combines all of them.

| Project | Planner/Executor | Contract (YAML) | File-level deny-list | Independent Critic | Post-merge verify + rollback | Worktree isolation |
|---|---|---|---|---|---|---|
| braid | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| agent-mux | ✓ | ✓ (JSON) | partial | | | |
| ComposioHQ/agent-orchestrator | ~ | | | | | |
| nwiizo/ccswarm | — (Claude-only) | | | | | |
| EveryInc/compound-engineering-plugin | | | | | | |
| xmm/codex-bmad-skills | | | | | | |
| oh-my-claudecode | | | | | | — (tmux only) |
| Claude Code Agent Teams | ✓ (Claude↔Claude) | | | | | |

Pick a different tool if you need any of:

  • Cross-vendor workers (Gemini, Claude-as-worker, local LLMs) — see CrewAI, LangGraph
  • UI-driven review and approval — see Cognition's Devin, Sourcegraph Amp
  • Dependency graphs between tasks — see LangGraph or Prefect
  • Claude-to-Claude coordination — use Claude Code Agent Teams
  • Mass IaC generation — dedicated frameworks exist for that niche

What we think braid could show empirically

These are measurement opportunities anyone can run with the harness as-is:

  1. Scope-adherence rate — on a fixed task set, measure how often deny-list files are touched with vs. without the deny-list mechanism
  2. Cost-per-accepted-merge — cross-vendor (Opus-plan + Codex-execute) vs. homogeneous (Claude-only, Codex-only) on identical tasks
  3. Post-merge-verify catch rate — how often does post-merge verify reject a merge that looked clean at pre-merge verify?
  4. Parallelism sweet spot — empirical curve of accepted merges per hour vs. parallelism, on real repos
  5. Reasoning-effort elasticity — for a task class, what's the quality/cost frontier across minimal|low|medium|high|xhigh?

None of these are promised. None have been run yet. The harness is small enough that anyone can produce the numbers.


License

MIT — see LICENSE. Open source first. Pull requests welcome, feature-request issues encouraged.

Author

Created by @GeOhDoubleT, 2026. Inspired by public patterns in the multi-agent coding tools space; see Limitations and prior art for attribution.
