A plumb line is the builder's reference of truth — the vertical everything is checked against. Here, the spec is that line.
A small set of skills that take a feature from idea to verified, working code through five single-responsibility stages. Each stage is run by an AI agent. The stages never call each other — they coordinate through files and folders, so any stage can run on its own, and the spec stays the single source of truth from the first interview to the final sign-off.
The design choice that matters: stages coordinate only through files, and each one is forbidden from doing the next one's job. inspector can't edit the spec to make a test pass; builder can't invent requirements the spec doesn't have. The boundaries are what keep it honest.
This is built for a solo developer's own projects, and it's been run end-to-end on real features — spec through signed-off code — not just designed on paper. It's deliberately strongest on reliability and repeatability; the lighter-weight path for very small features is still being tuned, and the machinery for restructuring a spec once a project gets large is specified but intentionally not yet built. It grows when a real project makes a gap hurt — not before.
scaffold ──▶ architect ──▶ foreman ──▶ builder ──▶ inspector
(once) (spec) (blueprint) (code) (proof)
▲ │
└──── fix ◀───┘
| Stage | Owns | Produces |
|---|---|---|
| scaffold | Bootstrapping a project (run once) | Folder conventions, a CLAUDE.md contract, a spec template |
| architect | Defining what to build | Planning/specs/[feature]_spec.md with inline acceptance criteria, a decision log |
| foreman | Planning how to build it | Planning/blueprints/[feature]_BP.md — slices of ordered steps |
| builder | Writing the code | Code + tests, executed one slice at a time, with a deviation log |
| inspector | Proving it works | An evidence report; a dated PASS/FAIL stamp on the blueprint |
Each stage hands off through the filesystem. The user approves between stages — the agent does the work, the human stays the judge.
- The spec is the spine.
architectwrites it,builderchecks its work against it,inspectorverifies the running software against it. The blueprint in the middle is a disposable plan; the spec is truth at both ends and in the middle. - Evidence over assertion. "Done" means shown to work.
inspectorruns the software and captures output — a bare "PASS" is not evidence. - Independence is structural, not promised.
inspectorruns with fresh eyes — ideally as a separate agent that never saw the build — so "no stake in the outcome" is enforced by isolation, not by good intentions. - Convention-coupled, not call-coupled. Skills share documented file contracts (the
CLAUDE.mddeclarations + folder layout), never direct calls. They're sequenced by data dependency, not by a hard-wired chain. - Proportional ceremony. A one-line fix doesn't get a decision doc; a schema change does.
architectsizes its interview to the work; the Change rules in each project'sCLAUDE.mdgive trivia a Quick Path and reserve the Full Path for features and schema. - Gates must earn their place. Every stop, check, and document exists to catch a specific failure. The ones that don't get cut — this framework has been pruned as hard as it's been built.
Design priorities, in order: repeatable > reliable > easy > fast. When they conflict, the higher one wins — a reliability gain is worth a small speed cost, and consistency across runs beats either.
Acceptance criteria live inline in the spec, attached to each feature — not in a separate checklist that drifts. Each is observable and tagged:
**Done when:**
- `POST /login` with valid creds → 200 + a session cookie; bad creds → 401 `[automated]`
- the dashboard's 4 KPI cards sit above the fold at 1280px `[human-required]`
[automated]—foremanplans a committed test that encodes it;builderwrites that test;inspectorruns it. "Automated" means there is a test, not "the agent improvised a check."[human-required]—inspectorcaptures evidence (e.g. a screenshot) but never grades it; the human signs it off.
builderreads the spec, not just the blueprint. If a step contradicts the spec, it stops rather than faithfully building the wrong thing — closing the "telephone game" between plan and intent.builderstops cleanly when stuck. Clear rules: don't improvise a different approach, don't retry forever, never run a destructive action on the blueprint's say-so alone, and leave the codebase in a known state when you stop.- Deviations are an audit trail, not a blocker. When the build diverges from the plan in a behavior-preserving way, it's logged inline and rolled up to
output/deviations/— visible at the end whether or not inspection runs. inspectormay stamp a result but never edit criteria. It can record✅ PASS — <date>on the blueprint; it cannot touch a step, a Done-when, or the spec to make something pass.
| Path | Holds |
|---|---|
Planning/specs/[feature]_spec.md |
Specs — the source of truth |
Planning/specs/archive/ |
Superseded spec versions |
Planning/blueprints/[feature]_BP.md |
Per-feature build plans |
Planning/decisions/ |
Decision logs |
Planning/reference/ |
Glossary, data models — created only when a project grows large enough to need them |
docs/changelog/ |
Changelog |
output/inspect/ |
Inspector reports + evidence |
output/deviations/ |
Builder deviation rollups |
output/surveys/ |
Surveyor drift reports (Survey_YYYY-MM-DD.md) |
output/walkthrough/ |
Walkthrough log + recommendations (…_YYYY-MM-DD.md) |
Nothing empty is pre-created. Folders appear when a stage first writes to them.
scaffold writes these declarations into each project; every skill reads them:
- Test command — how tests run.
- Run/demo command — how to launch the thing so behaviour is visible.
inspectordepends on this being real. - UI evidence tool (web stacks only) — how
inspectordrives the UI and captures screenshots. - Spec/blueprint naming — so every stage finds the same files.
- Change rules — the Quick Path / Full Path for any code change, and who owns the changelog and decision docs. This is doctrine every skill reads, not a skill you invoke.
surveyor— a static spec-vs-code drift report. Reads and compares, never runs the software (that'sinspector's job). Cheap; catches divergence after a refactor, and flags[automated]Done-when items with no backing test.walkthrough— an autonomous maintenance pass over a whole project, fenced to safe (Quick-Path) changes and routing anything bigger to a reviewed recommendations list. Delegates drift detection tosurveyor.
The order-of-operations for any single change isn't a skill — it lives as the Change rules in each project's CLAUDE.md (Quick Path / Full Path), so every skill follows the same doctrine without one having to call another.
MIT © 2026 BytesFromToby