Source-anchored, Test-driven, Research-grounded, User-gated, Traceable — a spec-first, TDD-enforced development pipeline built on Claude Code's skill/agent architecture.
AI coding tools make it easy to generate code and hard to verify it. The bottleneck has moved from writing to specifying, reviewing, and testing, but most workflows still optimize for speed of generation. STRUT structures the process around the parts that actually break.
Each letter represents a design principle:
- Source-anchored. Every pipeline run starts by scanning your actual codebase. Agents plan from real file paths and existing patterns, not assumptions.
- Test-driven. Tests are written before implementation, and they're the permanent asset. Code is disposable; the test suite is what survives.
- Research-grounded. Every structural decision cites its source: peer-reviewed papers, industry reports, or explicit design judgment. Nothing is vibes-based. See `architectural-decisions.md` for the full rationale.
- User-gated. The pipeline pauses at two high-leverage points (spec approval and PR review) where human judgment buys the most protection. Automated review runs first, so you review pre-filtered output, not raw diffs.
- Traceable. Agents communicate through file contracts in `.pipeline/`, not shared context. Every decision has a paper trail from scan evidence → classification → spec → tests → implementation → review.
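As a purely hypothetical illustration of the file-contract idea (the real schemas live in the pipeline's own docs), a scan agent might hand off to the classifier through a file like:

```json
{
  "change_request": "add rate limiting to the login endpoint",
  "evidence": [
    {"path": "app/api/login/route.ts", "pattern": "unauthenticated entry point"},
    {"path": "app/lib/auth/session.ts", "pattern": "session token issuance"}
  ]
}
```

Every field name and path here is invented for illustration; the point is that the downstream agent reads this file, not the upstream agent's conversation.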
STRUT is a change-processing pipeline. You describe a change; the pipeline scans your codebase, classifies the change by risk, writes a spec, gets your approval, implements with tests first, runs an automated review chain, and opens a PR. You're prompted at two gates: spec approval (before implementation) and PR review (before merge).
The pipeline has three phases: Read Truth (scan the codebase, classify the change), Process Change (spec → approve → test → implement → review → build), and Update Truth (capture knowledge for the next run). They're connected by file contracts in .pipeline/ that keep agents isolated from each other's context.
A classification system scales ceremony to risk. Two independent modifiers, trust (fires when the change touches auth, security, schema, or data boundaries) and decompose (fires when the change crosses 2+ architectural boundaries), add or remove pipeline steps based on what the scan finds, not on human guesswork.
View the full pipeline diagram (interactive HTML)
STRUT is not a code generator, a framework, or a language-specific toolkit. It's an orchestration layer that runs your build, lint, typecheck, and test commands through the Read Truth → Process Change → Update Truth cycle described above, with gates at the points where judgment matters.
What STRUT provides:
- A pipeline of orchestrator skills and worker agents under `.claude/`
- A classification system (trust / decompose modifiers) that scales ceremony to risk
- Rules files that govern both session-level Claude behavior and pipeline execution
- A file contract system in `.pipeline/` that lets agents communicate without contaminating each other's context
What STRUT doesn't provide:
- Opinions about your language, framework, or test runner
- Pre-configured commands for a specific stack
- A database schema, auth system, or any application code
- Claude Code installed (auto-loading of `.claude/rules/` requires v2.0.64 or later)
- Project under version control with a clean branching workflow
- A working build/lint/test pipeline in your project. STRUT orchestrates these, it doesn't provide them
- Familiarity with your project's data-layer paths (for scoping `database.md`, if you use it)
Integration splits into three groups: mechanical setup Claude can do for you, domain setup that requires your knowledge, and ongoing practices that happen through normal use.
These are mechanical steps. Claude can do them all in one pass given a stack description. A reasonable first prompt: "My stack is [X]. Do the Section A integration steps from the STRUT README. Flag anything you're uncertain about."
A1. Fill in CLAUDE.md build commands. Replace the commented-out placeholders with your project's actual commands. These are what scripts/build-check.sh will invoke.
A2. Adapt scripts/build-check.sh. The script runs your build/lint/typecheck/test commands and writes a result JSON to .pipeline/build-check/build-check.json. The template ships with placeholder commands; update them to match your project. This is the one file in the architecture that is deliberately stack-specific.
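A minimal sketch of what an adapted build-check.sh could look like, with placeholder echo commands standing in for your real build/lint/test commands; the JSON field names here are illustrative, not the template's actual schema:

```shell
#!/usr/bin/env bash
# Sketch only: swap the echo placeholders for your real commands,
# e.g. "npm run build" "npm run lint" "npm test".
set -u
mkdir -p .pipeline/build-check

status="pass"; failed=""
checks=("echo build-ok" "echo lint-ok" "echo tests-ok")   # placeholders
for step in "${checks[@]}"; do
  if ! $step > /dev/null; then
    status="fail"; failed="$step"; break
  fi
done

# Write the result file the pipeline reads (field names are illustrative).
printf '{"status":"%s","failed_step":"%s"}\n' "$status" "$failed" \
  > .pipeline/build-check/build-check.json
```

The shape matters more than the commands: every check runs in sequence, and the first failure is recorded in the JSON the pipeline consumes.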
A3. Update architecture.md directory tree. The template assumes a generic layout (app/lib/, app/components/). Update the tree and the shared-logic rule (rule 2) to match your project's actual structure. If your project doesn't have an app/ directory or doesn't separate shared logic/UI that way, Claude can detect the real layout from your project and rewrite accordingly.
A4. Scope or flag database.md. If your stack matches the file's SQL + multi-tenant + RLS baseline (Postgres with Supabase, Prisma, Drizzle, etc.), Claude can uncomment the globs: frontmatter and set the paths to your actual data-layer directories. If your stack doesn't match (NoSQL, single-tenant, no RLS), Claude should flag the file as needing replacement rather than silently converting it, since replacing these rules is a judgment call you should review.
A5. Add your stack's commands to .claude/settings.json. The template ships with git, file-manipulation, and general shell commands pre-authorized in the allow list, but not language-specific commands. Claude needs to add entries for your build, lint, typecheck, test, and package-manager commands so those don't interrupt the pipeline with permission prompts. Examples:
| Stack | Entries to add to allow |
|---|---|
| npm | Bash(npm run build:*), Bash(npm run lint:*), Bash(npm run typecheck:*), Bash(npm test:*), Bash(npm install:*), Bash(npx:*), Bash(node:*) |
| Python | Bash(python:*), Bash(python3:*), Bash(pip install:*), Bash(pytest:*), Bash(mypy:*), Bash(ruff:*) |
| Rust | Bash(cargo build:*), Bash(cargo test:*), Bash(cargo clippy:*), Bash(cargo check:*) |
| Go | Bash(go build:*), Bash(go test:*), Bash(go vet:*), Bash(gofmt:*) |
A6. Add dependency-manifest protection to .claude/settings.json. Architecture rule 9 says "no new dependencies without approval." Enforce this mechanically by adding the relevant manifest file(s) to the ask list so Claude has to confirm before editing them. Examples:
| Stack | Entries to add to ask |
|---|---|
| npm | Edit(package.json), Edit(package-lock.json), Bash(npm publish:*) |
| Python | Edit(pyproject.toml), Edit(requirements.txt), Bash(twine upload:*), Bash(poetry publish:*) |
| Rust | Edit(Cargo.toml), Bash(cargo publish:*) |
| Go | Edit(go.mod) |
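Taken together, the entries from the A5 and A6 tables land in .claude/settings.json under `permissions`. A sketch for an npm project, assuming the standard `permissions.allow` / `permissions.ask` layout of Claude Code settings files:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run build:*)",
      "Bash(npm run lint:*)",
      "Bash(npm test:*)",
      "Bash(npx:*)"
    ],
    "ask": [
      "Edit(package.json)",
      "Edit(package-lock.json)",
      "Bash(npm publish:*)"
    ]
  }
}
```

Merge these into the template's existing lists rather than replacing them; the pre-authorized git and file-manipulation entries still need to be there.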
A7. Add language-specific code conventions to operating-rules.md. The file has a TODO block under "Code Generation" for language-specific rules. Claude can generate a reasonable starting set based on your language (destructuring depth for JS/TS, docstring style for Python, error-type conventions for Rust, etc.). Review these; coding conventions are opinionated and Claude is guessing at your preferences.
A8. Uncomment your stack's block in .gitignore. The universal section (environment variables, OS files, editor files, logs, coverage, pipeline state) is active by default. Stack-specific entries (Node, Python, Rust, Go) ship commented out; Claude should uncomment the block matching your stack and delete the others. If skipped, build artifacts and dependency directories will start getting committed.
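For a Node project, the uncommenting step amounts to something like the following (entries abbreviated; defer to whatever the template's Node block actually contains):

```
# --- Node (uncommented) ---
node_modules/
dist/
coverage/
```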
What happens if you skip Section A: The pipeline won't run. These steps are load-bearing for basic operation. Claude doing them takes minutes; leaving them undone breaks the pipeline at the first change.
These steps encode knowledge Claude doesn't have. Claude can scaffold but not fill in.
B1. Populate security.md MUST NEVER section. Each entry here becomes a negative test under trust ON. Claude can suggest generic invariants ("users from one org must not access another org's data") but the load-bearing entries come from your understanding of what breaks if violated: "payment records must never be modified after the settlement timestamp," "audit logs must never lose entries on partial failure." Wrong MUST NEVERs are worse than missing ones: they become tests that either never fire (useless) or block legitimate behavior (harmful).
Format: MUST NEVER: [constraint] — added [date] from [source]. When trust ON, spec-derive-intent reads this section to populate the spec's must_never[] array; the scan reads it for trust-sensitive definitions.
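Using the invariants mentioned above as content, populated entries would look like this (the dates and sources are placeholders for your own):

```
MUST NEVER: payment records modified after the settlement timestamp — added 2026-01-10 from billing incident postmortem
MUST NEVER: audit logs lose entries on partial failure — added 2026-01-10 from compliance review
```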
Entries also arrive automatically over time via the self-improving rules cycle: the scan outputs a rules_gaps entry when it detects a risk signal without a matching rule; post-merge, update-capture proposes specific rule text for you to review and add.
What happens if you skip this: Trust ON changes run without negative tests for your specific invariants. The scan and reviewers still catch general trust violations, but project-specific invariants go unprotected. Not a pipeline breaker, but meaningfully weakens the safety net.
B2. Populate docs/user-context/ (optional but recommended). This folder is read by spec-derive-intent before spec writing. It enriches spec quality by giving the agent access to product decisions, user expectations, and domain vocabulary that aren't obvious from the code alone.
The folder ships empty (contains only docs/user-context/README.md). You populate it yourself, in two phases:
- Initial seeding during integration. Create a few files covering your major product areas, drawing from whatever notes, docs, or wiki pages already exist in your organization. A typical starting point is 3–5 files: one per product area with its key decisions, a glossary for domain vocabulary, a trust-invariants file for expectations that go beyond code-level rules. You can also start with zero files and add them reactively if you prefer.
- Ongoing additions during normal use. Every time you clarify something at the spec approval gate that `spec-derive-intent` should have known, like "published updates are immutable after 24 hours" or "our 'customer' means account holder, not end user," that clarification is a candidate for a new context file.
What belongs: product decisions, trust invariants beyond code-level rules, user expectations, domain vocabulary the AI might misinterpret.
What doesn't belong: code documentation (the scan reads your code directly), API references, deployment config, changelogs.
Any readable structure works. Organize by feature area, user segment, team ownership, or whatever makes sense; the scan reads the folder's contents, not a prescribed schema.
What happens if you skip this: spec-derive-intent still runs but derives intent from scan evidence alone. Specs are thinner and less accurate to your product context. Mismatches get caught at the spec approval gate, but approval takes longer. If specs keep failing review for "missing criteria" or "unclear intent," populating this folder is usually the fix.
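As a concrete (and entirely hypothetical) example of a context file, a glossary built from the clarifications quoted earlier might read:

```markdown
# Glossary: domain vocabulary

- **customer**: an account holder, not an end user.
- **published update**: becomes immutable 24 hours after publication.
```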
These happen naturally as you use the pipeline.
C1. Seed project-specific rules as they emerge. Don't try to front-load every rule. The self-improving cycle will surface gaps through normal pipeline runs. Add rules when you observe a specific failure, not in anticipation. Areas that commonly accumulate rules: commit message format, branch naming, PR title conventions, issue/ticket reference patterns.
C2. Populate docs/project/decision-log.md as you make non-obvious decisions. Technology choices, architectural patterns, scope boundaries, deferred work. Entries also arrive from update-capture proposals after pipeline runs. The log is append-only; if a decision is superseded, add a new entry referencing the old one.
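The template may prescribe its own entry format; absent that, a minimal append-only entry could look like this (all content invented for illustration):

```markdown
## 2026-02-01: Defer rate limiting to the gateway layer
Context: per-endpoint limits were considered and rejected for now.
Supersedes: none.
```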
C3. Update docs/project/system-map.md as architecture evolves. Data flow, service boundaries, integration points, trust boundaries. Unlike the decision log, this file is edited in place to reflect current state. The scan reads it for grounding when planning changes. Start with whatever exists; fill in sections as they become relevant.
After integration, invoke the pipeline with:
/run-strut <describe the change you want>
The pipeline scans your codebase, classifies the change, and walks through: spec refinement → spec approval gate → implementation (tests first, then code) → review chain → build check → PR. You'll be prompted at two gates: spec approval (before implementation starts) and PR review (before merge).
Add --step to pause after every agent/skill dispatch:
/run-strut --step <describe the change you want>
The pipeline will stop after each step, showing the completed agent, its output file, and the next step. At each pause you can type continue to proceed or abort to stop the pipeline.
Step mode is useful for your first few pipeline runs, when you want to inspect each agent's output before the next one starts. The flag is per-invocation; omitting --step on a resume disables it. On abort, re-invoke with --step to resume from the last checkpoint.
For the full pipeline architecture, see docs/strut-architecture/core-path-architecture.md.
Every pipelined change is classified by truth-classify based on scan evidence. Classification produces two independent modifiers:
Trust ON triggers if the scan detects any of: auth, RLS, schema changes, security boundaries, data immutability, encryption, multi-tenant isolation. Adds MUST NEVER collection, negative criteria, security-review (Opus), describe-flow, mandatory knowledge capture.
Decompose ON triggers if the change crosses 2+ architectural boundaries (UI / server / database). Adds task breakdown (up to 5 tasks), per-task TDD loop, and a gate after task 1 to verify the approach before remaining tasks proceed.
Both modifiers are independent. Combinations: standard (both OFF), trust-only, decompose-only, guarded-decompose (both ON, adds adversarial spec review).
Verifying classification works on your codebase: on the first few changes, check that classification.json in .pipeline/ matches what you'd expect. If trust-sensitive files (auth, migrations, etc.) aren't triggering trust ON, the scan isn't recognizing your project's patterns, and you may need to add rules that help it identify trust-sensitive code.
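For orientation, a classification result might look roughly like the sketch below; the field names are guesses at the schema, so check a real classification.json from your first run:

```json
{
  "trust": true,
  "decompose": false,
  "evidence": ["supabase/migrations/0042_add_orders.sql modifies schema"]
}
```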
For worked examples of each classification path, see docs/strut-architecture/modifiers/.
View the sub-orchestrator reference cards (interactive HTML)
You can build your own skills and agents to extend the pipeline:
- Templates: `docs/contributing/templates/` has blank templates for agents, skills, and specs. Copy one to start a new component.
- Testing: `docs/contributing/testing/` has a dual-model test harness that generates and runs tests under both Sonnet and Opus. Run `bash docs/contributing/testing/run-tests.sh <component-name>` to validate a new component.
- Architecture reference: `docs/strut-architecture/` has the full design rationale, constraint model, and research citations behind the pipeline's structure.
Modifier-activated plugins. Not every component needs to run on every change. You can build plugins that activate only when a specific modifier fires; the agent reads classification.json for the modifier state and the orchestrator's step sequence includes a conditional guard (if trust_on is false, skip). The existing trust ON plugins, review-security (Opus security audit in the review chain) and impl-describe-flow (data flow description before PR creation), are worked examples. See their agent files and the #### Trust ON Plugin: subsections in the architecture doc for the pattern. Unbuilt plugin points are listed in Section 7 of the architecture doc.
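The conditional-guard pattern can be sketched in shell. The file path comes from the text above, but the "trust" field name is an assumption about classification.json's shape:

```shell
#!/usr/bin/env bash
# Fabricate a classification result so the sketch is self-contained.
mkdir -p .pipeline
printf '{"trust": false, "decompose": true}\n' > .pipeline/classification.json

# Guard: only dispatch the plugin step when the trust modifier fired.
if grep -q '"trust": true' .pipeline/classification.json; then
  echo "dispatching review-security"
else
  echo "trust OFF: skipping review-security"
fi
```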
Claude Code auto-loads all markdown files in .claude/rules/ at session start. By default, every rule in every file is in context for every session.
Two mechanisms reduce this load:
Frontmatter scoping. Rules files can include a globs: frontmatter block listing path patterns. The file then loads only when Claude is working with matching files. database.md ships configured for this (commented out until you set your paths). pipeline.md is already scoped to .claude/skills/**, .claude/agents/**, and scripts/**.
Important: use globs: (not the documented paths: syntax). The paths: field has known YAML parsing issues in Claude Code; it silently fails to match. Verify your scoping actually works by running /context in a session and checking which rules files appear in "Memory files."
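A scoped rules file then starts with something like the following; the path patterns are placeholders, and the exact frontmatter shape should be checked against the commented-out version that ships in database.md:

```markdown
---
globs:
  - "app/db/**"
  - "supabase/migrations/**"
---

# Data-layer conventions
Rules below load only when Claude works with files matching the globs above.
```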
File roles in the template:
| File | Load scope | Purpose |
|---|---|---|
| `CLAUDE.md` | Every session | Project identity, build commands, modifier table, pointer to architecture doc |
| `architecture.md` | Global | Directory structure, naming, design principles |
| `methodology.md` | Global | Classification, TDD, review chain, spec cycle, anti-rationalization |
| `operating-rules.md` | Global | Build/test/CI requirements, scope discipline, error recovery |
| `security.md` | Global | Trust invariants, RLS rules, MUST NEVER constraints |
| `database.md` | Scope after setup | Data-layer conventions (SQL + multi-tenant + RLS baseline) |
| `pipeline.md` | Scoped (pre-set) | Skill/agent authoring constraints, file contracts, pipeline execution |
Two files assume a common backend shape (SQL, multi-tenant, RLS): database.md and security.md (its RLS section).
If your stack matches the baseline: use the files as-is, just scope database.md (Section A) and populate security.md's MUST NEVER section (Section B).
If your stack differs:
- Replace `database.md` with rules appropriate to your data layer (NoSQL access patterns, single-tenant query conventions, etc.)
- Replace `security.md`'s RLS section with your equivalent authorization boundary (document-level access rules, row-level filters in the application layer, whatever your stack uses). Keep the Authentication, Data Immutability, Encryption and Secrets, and MUST NEVER sections, as those are stack-agnostic.
Stale .pipeline/ files. The pipeline cleans its own directories between runs, but if you manually interrupt a run and start a different change without invoking /run-strut, old files can persist. If you see agents reading unexpected content, check .pipeline/ and clean it manually (rm -rf .pipeline/spec-refinement .pipeline/implementation .pipeline/build-check .pipeline/update-truth).
Frontmatter scoping silently not working. Two known issues apply here: the paths: syntax fails to parse, and scoped rules occasionally load globally anyway. Always verify with /context after setting up scoping. If a scoped file doesn't appear when expected, try the globs: syntax and restart the session.
Assuming the template's directory conventions match yours. architecture.md rule 2 references app/lib/ and app/components/. If your project uses src/ or packages/ or a different structure, update the rule, otherwise Claude will try to put files in directories that don't exist.
Skipping docs/user-context/ setup and wondering why specs are thin. spec-derive-intent works without it, but specs derived from scan evidence alone tend to miss product context. If specs keep failing review for "missing criteria" or "unclear intent," populating docs/user-context/ is usually the fix.
Treating classification as negotiable. Session Claude doesn't classify; truth-classify does, based on scan evidence. If a classification feels wrong, override manually at the gate or add a rule that helps the scan make a better call next time.
| Doc | What it covers |
|---|---|
| `docs/strut-architecture/core-path-architecture.md` | Full pipeline: phases, agents, skills, file contracts, dispatch sequences |
| `docs/strut-architecture/architectural-decisions.md` | Why each load-bearing choice was made, with research citations |
| `docs/strut-architecture/modifiers/` | Worked examples of all four classification paths |
| `docs/contributing/` | Templates and testing tools for building new pipeline components |