A battle-tested process for building software with AI coding agents.
Most teams using AI coding agents (Claude Code, Cursor, Copilot, etc.) have the same experience: the agent writes code fast, but the code doesn't follow your conventions, breaks existing patterns, introduces lint errors, and creates inconsistent commit histories. You spend as much time correcting the agent as you would have spent writing the code yourself.
This playbook solves that. It's a complete engineering process — extracted from a production application — that makes AI agents disciplined contributors rather than chaotic ones. It covers everything from how to configure the agent's instructions, to how quality gates prevent regressions, to how debugging knowledge compounds over time.
## The Playbook (`playbook.md`)
A comprehensive guide (1,400+ lines) covering 13 areas of engineering process, written for a junior engineer setting up a new project:
| Section | What It Covers |
|---|---|
| Philosophy & Principles | Quality ratchet, local CI = remote CI, compound learning, honest opposition |
| Agent.md as Project Constitution | The single file that makes every agent session productive |
| Agent Tooling Setup | Serena (MCP), agent settings, memory systems |
| Skills System | Reusable workflows agents activate on demand (commit, review, brainstorm) |
| Branching & Worktree Workflow | Parallel-safe isolation for humans and agents working simultaneously |
| Code Quality Gates | Two tiers: local CI (before push) and remote CI (safety net) — same checks, same source of truth |
| Code Health Metrics | Maintainability index, complexity, file size, duplication, quality ratchet |
| CI/CD Pipeline | Single source of truth for checks, dev-first promotion, zero-rebuild deploys |
| Architecture Patterns | Exception hierarchy, Unit of Work, Repository, task queue protocol |
| Testing Strategy | Pytest config, fixtures, async patterns, what NOT to test |
| Documentation Infrastructure | Architecture docs, lessons-learned, compound learning loop |
| Infrastructure & DevOps | Dev setup, expand/contract migrations, secrets, deployment |
| Adoption Roadmap | Phased rollout from Day 1 to Month 2+ |
## Copy-Pasteable Examples (`examples/`)

Every script, config file, and template you need — adapted for portability, with `ADAPT` comments marking project-specific values:
```
examples/
├── ci-checks.json                      # CI check definitions (single source of truth)
├── scripts/
│   ├── ci_check_local.py               # Local CI runner (reads ci-checks.json)
│   ├── check_code_health.py            # Code health metrics (MI, CC, SLOC, duplication)
│   ├── quality_delta.py                # PR quality regression gate (the ratchet)
│   └── check_migration_heads.py        # Alembic migration head validator
├── skills/
│   ├── commit/SKILL.md                 # Commit -> push -> PR -> cleanup workflow
│   ├── code-review/
│   │   ├── SKILL.md                    # Multi-persona review orchestration
│   │   ├── findings-schema.md          # Structured findings format
│   │   └── personas/                   # 5 specialized reviewer personas
│   │       ├── correctness.md          # Logic & edge case reviewer
│   │       ├── testing.md              # Test coverage reviewer
│   │       ├── project-standards.md    # Convention compliance (adapt this)
│   │       ├── security.md             # Security reviewer (conditional)
│   │       └── adversarial.md          # Failure scenario reviewer (conditional)
│   └── brainstorming/SKILL.md          # Scope-adaptive brainstorming (Light/Standard/Deep)
├── config/
│   ├── mcp.json                        # Serena MCP server config
│   └── claude-settings.json            # Agent permissions
└── docs/
    ├── Agent.md                        # Starter project constitution
    └── lessons-learned/
        ├── README.md                   # Lesson index template
        └── TEMPLATE.md                 # Individual lesson template
```
## The Problem

Without a structured process, AI-assisted development creates these problems:
- **Inconsistent quality.** One agent session follows your conventions, the next doesn't. You get random commit messages, missing type annotations, lint errors that slip through, and architectural violations that take days to unwind.
- **No institutional memory.** Every agent session starts from zero. It doesn't know your branching strategy, your testing conventions, your exception hierarchy, or the bug you spent 4 hours tracking down last week. You repeat the same corrections session after session.
- **CI whack-a-mole.** Push, wait 5 minutes for CI, discover a lint error, fix it, push again, wait another 5 minutes, discover a type error... This loop destroys flow and creates noisy commit histories.
- **Quality debt accumulates silently.** Without metrics that ratchet, every "just this once" exception becomes permanent. Type suppressions grow, lint suppressions grow, complexity grows — and nobody notices until the codebase is painful to work in.
- **Debugging knowledge evaporates.** You spend 2 hours discovering that SAQ's default timeout is 10 seconds. Next month, someone (or some agent) hits the same problem. The knowledge existed briefly in a conversation, then vanished.
| Problem | Solution | Playbook Section |
|---|---|---|
| Inconsistent quality | Agent.md — a project constitution every agent session reads | Section 2 |
| No institutional memory | Memory systems + compound learning loop | Section 3, Section 11 |
| CI whack-a-mole | Local CI = Remote CI — same checks, catch everything before push | Section 6 |
| Silent quality debt | Quality ratchet — metrics can only improve, never regress | Section 7 |
| Debugging knowledge lost | Lessons-learned docs that feed back into Agent.md rules | Section 11 |
## Quick Start

These first steps provide immediate value with minimal effort:
- **Copy `Agent.md` to your project root.** Edit it to match your conventions.

  ```bash
  cp examples/docs/Agent.md ./Agent.md
  ```

- **Set up local CI.** Copy the check definitions and runner script.

  ```bash
  cp examples/ci-checks.json ./ci-checks.json
  mkdir -p scripts
  cp examples/scripts/ci_check_local.py ./scripts/
  # Edit ci-checks.json to match your project's commands and paths
  ```

- **Run it before every commit.** Your commit skill should invoke this automatically:

  ```bash
  python scripts/ci_check_local.py --fix
  ```
The script auto-fixes formatting/linting and validates everything else (types, tests, code health, migrations). The same script runs in your CI workflow — so "local passes" means "CI will pass."
After that, phase in the rest:

- Copy `scripts/check_code_health.py` and add it to your `ci-checks.json`
- Create `.type-ignore-threshold` with your current count
- Mirror your `ci-checks.json` in your CI workflow (see the sketch after this list)
- Copy `skills/commit/` and `skills/code-review/` to your project
- Start `docs/lessons-learned/` after your first non-trivial debugging session
- Adapt `skills/code-review/personas/project-standards.md` to your conventions
- Add `quality_delta.py` for PR regression checks
- Set up Serena MCP for semantic code navigation
- Add the brainstorming skill for design workflows
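What "mirroring" looks like depends on your CI provider. As a hedged sketch (assuming GitHub Actions and a `requirements-dev.txt`; names and versions are placeholders, not the playbook's shipped pipeline), the workflow can simply re-run the same local script:

```yaml
# .github/workflows/ci.yml (illustrative sketch; adapt names and versions)
name: CI
on: [push, pull_request]

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements-dev.txt
      # Same entry point as local runs; validate only, no --fix in CI
      - run: python scripts/ci_check_local.py
```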
See the full Adoption Roadmap for details.
## Agent.md as Project Constitution

A markdown file at your project root that AI coding agents read at the start of every session. It defines rules, conventions, and architectural decisions. Without it, every session starts from zero. With it, the agent knows your branching strategy, testing conventions, exception hierarchy, and file size limits from the first message.

For Claude Code, this file is named `CLAUDE.md`. For other agents, adapt the name to whatever your tool reads (e.g., `.cursorrules`, `.github/copilot-instructions.md`). The content is the same — the playbook uses the generic name `Agent.md`.
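A starter file ships at `examples/docs/Agent.md`. The fragment below is only an illustrative sketch of the kind of rules such a file holds; the headings and rules are invented for this example, not copied from the starter:

```markdown
# Agent.md

## Branching
- Never commit directly to main. Use feature branches: feat/<slug>, fix/<slug>.

## Quality gates
- Run `python scripts/ci_check_local.py --fix` before every commit.
- Never add a `# type: ignore` without removing one elsewhere (the ratchet).

## Before starting work
- Search docs/lessons-learned/ for lessons related to the task at hand.
```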
## The Quality Ratchet

Metrics can only improve, never regress. Every PR is compared against its merge base:
- Added a `# type: ignore`? Remove one somewhere else.
- Introduced a complexity violation? Simplify it.
- New public function without type annotations? Add them.

This is enforced by `quality_delta.py`, which runs on PRs and fails if any metric regressed.
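The shipped script is `examples/scripts/quality_delta.py`; the sketch below only illustrates the core mechanic for a single metric. The function names, the `origin/main` base, and the choice of metric are assumptions for this example:

```python
# Illustrative ratchet check, not the shipped quality_delta.py.
import subprocess
import sys


def count_type_ignores(ref: str) -> int:
    """Count `# type: ignore` comments in Python files at a given git ref."""
    result = subprocess.run(
        ["git", "grep", "-c", "# type: ignore", ref, "--", "*.py"],
        capture_output=True,
        text=True,
    )
    # With -c and a rev, git grep prints "ref:path:count" per file
    # (exit code 1 with empty output simply means zero matches).
    return sum(int(line.rsplit(":", 1)[1]) for line in result.stdout.splitlines())


def main() -> None:
    merge_base = subprocess.run(
        ["git", "merge-base", "HEAD", "origin/main"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    before = count_type_ignores(merge_base)
    after = count_type_ignores("HEAD")
    if after > before:
        sys.exit(f"Ratchet violation: type-ignore count rose {before} -> {after}")
    print(f"OK: type-ignore count {before} -> {after}")


if __name__ == "__main__":
    main()
```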
## Local CI = Remote CI

A single `ci-checks.json` file defines every check. The local runner (`ci_check_local.py`) and the remote CI workflow both read from this file. Committing takes 30-60 seconds longer. But the alternative — push-wait-fail-fix, push-wait-fail-fix — wastes far more time and creates noisy histories.
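The real schema is whatever the example file defines; purely as an illustration of the shape such a file might take (the field names here are invented), consider:

```json
{
  "checks": [
    { "name": "format", "command": "ruff format --check .", "fix_command": "ruff format ." },
    { "name": "lint",   "command": "ruff check .",          "fix_command": "ruff check . --fix" },
    { "name": "types",  "command": "mypy ." },
    { "name": "tests",  "command": "pytest -q" },
    { "name": "health", "command": "python scripts/check_code_health.py" }
  ]
}
```

Under a schema like this, the local runner iterates the entries (preferring `fix_command` when `--fix` is passed) and the CI workflow executes the same definitions, which is why "local passes" implies "CI will pass."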
## Multi-Persona Code Review

Five specialized reviewers dispatched in parallel, each focused on what it's best at:

- **Correctness** — logic errors, edge cases, state bugs
- **Testing** — coverage gaps, weak assertions
- **Project Standards** — convention compliance
- **Security** (conditional) — injection, auth, data exposure
- **Adversarial** (conditional) — failure scenarios, cascade failures
Each reviewer has a persona document defining what to hunt for, what to ignore, and how to calibrate confidence.
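For a sense of what a persona document contains, here is an invented sketch (not one of the shipped personas under `skills/code-review/personas/`):

```markdown
# Persona: Correctness Reviewer

## Hunt for
- Off-by-one errors, unhandled None/empty inputs, state shared across async boundaries

## Ignore
- Style and formatting (the linter owns those), naming preferences

## Confidence calibration
- Report high confidence only when you can name a concrete failing input;
  otherwise flag the finding as "needs verification"
```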
## The Compound Learning Loop

A three-step loop that makes your codebase smarter over time:

1. **Consult** — before starting work, search `docs/lessons-learned/`
2. **Capture** — after resolving a non-trivial bug, write a lesson
3. **Promote** — when a lesson recurs, elevate it to a rule in `Agent.md`
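The repo ships `docs/lessons-learned/TEMPLATE.md`; a captured lesson can be as small as this invented example (the fields and incident details are illustrative, built from the SAQ timeout story above):

```markdown
# Lesson: SAQ jobs silently die after 10 seconds

**Date:** 2025-01-15 · **Cost:** ~2 hours of debugging
**Symptom:** Long-running jobs vanished with no error in application logs.
**Root cause:** SAQ's default per-job timeout is 10 seconds.
**Fix:** Set an explicit timeout when enqueueing long-running jobs.
**Promote?** Yes: add "always set explicit timeouts on queued jobs" to Agent.md.
```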
## Portability

This playbook was developed on a Python/FastAPI + React/TypeScript stack, but the principles and most of the tooling are adaptable:
| Component | Used Here | Alternatives |
|---|---|---|
| Python linter/formatter | ruff | black + flake8, pylint |
| Type checker | mypy | pyright, pytype |
| Code metrics | radon | wily, complexipy |
| Duplication | jscpd | CPD, Simian |
| Dead code | vulture | pylint unused-import |
| Dependency check | deptry | pip-extra-reqs |
| Task queue | SAQ | Celery, Dramatiq, Arq |
| Migrations | Alembic | Django migrations, Flyway |
| Code navigation | Serena (MCP) | — |
| Agent | Claude Code | Cursor, Copilot, Aider |
## Contributing

This playbook is open source. If you've adapted it for a different tech stack, found improvements, or have new skill definitions that would benefit others, PRs are welcome.

## License

MIT
Extracted from a production application by Chuck Conway. Built with Claude Code.