builder-ai v1.0.0

Your AI assistant will skip the eval, change the prompt without versioning it, add no fallback, and ship without a safety review.

This pack makes that impossible.

Drop one folder into your project. Your AI coding assistant now enforces production standards for every LLM feature — not as suggestions it can ignore, but as gates it cannot pass without evidence.

Mac / Linux / WSL:

bash <(curl -fsSL https://raw.githubusercontent.com/RBraga01/builder-ai/master/install.sh)

Windows PowerShell:

irm https://raw.githubusercontent.com/RBraga01/builder-ai/master/install.ps1 | iex

Works on Claude Code, Codex CLI, Cursor, and OpenCode. Works alongside A Team, builder-design, builder-product, and builder-growth.

The Four Ways LLM Features Fail in Production

Every one of these has happened to a team that was confident before launch:

1. Shipped without an eval "I tested it and it looked good." The feature worked on the 8 examples you chose. It failed on 30% of real traffic. Nobody knew which prompt change caused it because there was no baseline.

2. Prompt changed, nobody noticed "Small tweak." A single instruction shifted pass rate from 89% to 72%. There was no previous version to compare against. The regression took three weeks to diagnose.

3. No fallback when the model misbehaved Timeout at peak load → blank response → support ticket at 3am. "We'll add error handling after launch." You added it at 3am.

4. Shipped without a safety review "Nobody will try that." Someone did, on day two. The injection vector was in the document upload — not the user message — and it had been there since the first commit.

builder-ai makes each of these a gate your AI assistant must pass before marking any LLM task complete.

What's in the Pack

Hard Gates — Cannot Be Skipped

Skill	What It Blocks
`eval-before-ship`	No LLM feature merges without a named eval suite, documented pass rate, failure analysis, and baseline
`prompt-versioning`	No prompt goes to production without a version file in `prompts/` and a CHANGELOG entry
`fallback-required`	No LLM call ships without tested fallback paths for timeout, malformed output, low confidence, and refusal

Workflow Skills — How to Do the Work Right

Skill	What It Enforces
`rag-pipeline-design`	Data audit + query audit before any pipeline decision — no "standard chunking" shortcuts
`model-benchmarking`	Task-specific benchmarking across three tiers before committing to a model
`context-optimization`	Measure → reduce by hierarchy → measure again — not guessing at token savings
`ai-cost-audit`	Token count + call volume + cost at 10× scale before launch, not after the billing alert
`ai-safety-review`	Four-category review with tested attack surfaces before any feature reaches users

Agents — Specialist Roles

Agent	Role	Model
`prompt-engineer`	Writes, versions, and iterates prompts with eval criteria	Sonnet
`eval-designer`	Designs evaluation suites and writes eval harnesses	Sonnet
`rag-architect`	Designs and debugs retrieval pipelines	Opus
`model-selector`	Benchmarks models and recommends the cost-optimal choice	Sonnet
`ai-safety-reviewer`	Reviews for injection, hallucination, abuse, and agentic scope	Opus

How Enforcement Works

Each hard gate defines exactly what an agent must produce — not a checklist to tick, a formatted evidence block it must fill in with real numbers.

An agent reading eval-before-ship cannot say "task complete" without producing:

Eval complete.
Suite: evals/email-classifier/test-set.jsonl — 200 examples
Model: claude-sonnet-4-6, temperature: 0.0, seed: 42
Pass rate: 178/200 = 89% (threshold: ≥ 85% ✓)
Top failure mode: format violation (12 cases — emails > 2000 tokens)
Baseline: v1 = 82% → v2 = 89%, delta: +7pp ✓
Results stored: evals/email-classifier/results-2026-06-07.md

"It looks good" does not fill that template. That is the entire point.

Each skill also lists the Rationalization Red Flags — the exact things teams say when they want to skip the gate — and explains why each one is wrong. The agent has already read the rebuttals.

Integration with A Team

If A Team is installed, builder-ai slots in with no conflicts:

cp -r builder-ai/skills/*         your-project/skills/
cp -r builder-ai/.claude/agents/* your-project/.claude/agents/

The builder-ai agents can be dispatched by the A Team orchestrator like any other specialist. All names are unique across both packs.

Directory Layout After Installation

your-project/
├── skills/                ← drop-in enforcement rules
├── .claude/agents/        ← specialist roles
├── prompts/               ← versioned prompts  (prompt-engineer creates these)
├── evals/                 ← eval suites + results  (eval-designer creates these)
├── benchmarks/            ← model comparison results  (model-selector creates these)
├── cost-audit/            ← cost projections
└── safety-reviews/        ← safety review documents

License

MIT — Ricardo Romão Marques Braga

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.claude/agents		.claude/agents
.github		.github
docs		docs
scripts		scripts
skills		skills
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
install.ps1		install.ps1
install.sh		install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

builder-ai v1.0.0

The Four Ways LLM Features Fail in Production

What's in the Pack

Hard Gates — Cannot Be Skipped

Workflow Skills — How to Do the Work Right

Agents — Specialist Roles

How Enforcement Works

Integration with A Team

Directory Layout After Installation

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

builder-ai v1.0.0

The Four Ways LLM Features Fail in Production

What's in the Pack

Hard Gates — Cannot Be Skipped

Workflow Skills — How to Do the Work Right

Agents — Specialist Roles

How Enforcement Works

Integration with A Team

Directory Layout After Installation

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages