Agent Engineering Playbook

A battle-tested process for building software with AI coding agents.

Most teams using AI coding agents (Claude Code, Cursor, Copilot, etc.) have the same experience: the agent writes code fast, but the code doesn't follow your conventions, breaks existing patterns, introduces lint errors, and creates inconsistent commit histories. You spend as much time correcting the agent as you would have spent writing the code yourself.

This playbook solves that. It's a complete engineering process — extracted from a production application — that makes AI agents disciplined contributors rather than chaotic ones. It covers everything from how to configure the agent's instructions, to how quality gates prevent regressions, to how debugging knowledge compounds over time.

What's Inside

The Playbook (playbook.md)

A comprehensive guide (1,400+ lines) covering 13 areas of engineering process, written for a junior engineer setting up a new project:

| Section | What It Covers |
| --- | --- |
| Philosophy & Principles | Quality ratchet, local CI = remote CI, compound learning, honest opposition |
| Agent.md as Project Constitution | The single file that makes every agent session productive |
| Agent Tooling Setup | Serena (MCP), agent settings, memory systems |
| Skills System | Reusable workflows agents activate on demand (commit, review, brainstorm) |
| Branching & Worktree Workflow | Parallel-safe isolation for humans and agents working simultaneously |
| Code Quality Gates | Two tiers: local CI (before push) and remote CI (safety net) — same checks, same source of truth |
| Code Health Metrics | Maintainability index, complexity, file size, duplication, quality ratchet |
| CI/CD Pipeline | Single source of truth for checks, dev-first promotion, zero-rebuild deploys |
| Architecture Patterns | Exception hierarchy, Unit of Work, Repository, task queue protocol |
| Testing Strategy | Pytest config, fixtures, async patterns, what NOT to test |
| Documentation Infrastructure | Architecture docs, lessons-learned, compound learning loop |
| Infrastructure & DevOps | Dev setup, expand/contract migrations, secrets, deployment |
| Adoption Roadmap | Phased rollout from Day 1 to Month 2+ |

Copy-Pasteable Examples (examples/)

Every script, config file, and template you need — adapted for portability with ADAPT comments marking project-specific values:

```
examples/
├── ci-checks.json                         # CI check definitions (single source of truth)
├── scripts/
│   ├── ci_check_local.py                  # Local CI runner (reads ci-checks.json)
│   ├── check_code_health.py               # Code health metrics (MI, CC, SLOC, duplication)
│   ├── quality_delta.py                   # PR quality regression gate (the ratchet)
│   └── check_migration_heads.py           # Alembic migration head validator
├── skills/
│   ├── commit/SKILL.md                    # Commit -> push -> PR -> cleanup workflow
│   ├── code-review/
│   │   ├── SKILL.md                       # Multi-persona review orchestration
│   │   ├── findings-schema.md             # Structured findings format
│   │   └── personas/                      # 5 specialized reviewer personas
│   │       ├── correctness.md             # Logic & edge case reviewer
│   │       ├── testing.md                 # Test coverage reviewer
│   │       ├── project-standards.md       # Convention compliance (adapt this)
│   │       ├── security.md                # Security reviewer (conditional)
│   │       └── adversarial.md             # Failure scenario reviewer (conditional)
│   └── brainstorming/SKILL.md             # Scope-adaptive brainstorming (Light/Standard/Deep)
├── config/
│   ├── mcp.json                           # Serena MCP server config
│   └── claude-settings.json               # Agent permissions
└── docs/
    ├── Agent.md                           # Starter project constitution
    └── lessons-learned/
        ├── README.md                      # Lesson index template
        └── TEMPLATE.md                    # Individual lesson template
```

The Problem This Solves

Without a structured process, AI-assisted development creates these problems:

  1. Inconsistent quality. One agent session follows your conventions, the next doesn't. You get random commit messages, missing type annotations, lint errors that slip through, and architectural violations that take days to unwind.

  2. No institutional memory. Every agent session starts from zero. It doesn't know your branching strategy, your testing conventions, your exception hierarchy, or the bug you spent 4 hours tracking down last week. You repeat the same corrections session after session.

  3. CI whack-a-mole. Push, wait 5 minutes for CI, discover a lint error, fix it, push again, wait another 5 minutes, discover a type error... This loop destroys flow and creates noisy commit histories.

  4. Quality debt accumulates silently. Without metrics that ratchet, every "just this once" exception becomes permanent. Type suppressions grow, lint suppressions grow, complexity grows — and nobody notices until the codebase is painful to work in.

  5. Debugging knowledge evaporates. You spend 2 hours discovering that SAQ's default timeout is 10 seconds. Next month, someone (or some agent) hits the same problem. The knowledge existed briefly in a conversation, then vanished.

How This Playbook Fixes It

| Problem | Solution | Playbook Section |
| --- | --- | --- |
| Inconsistent quality | Agent.md — a project constitution every agent session reads | Section 2 |
| No institutional memory | Memory systems + compound learning loop | Sections 3, 11 |
| CI whack-a-mole | Local CI = Remote CI — same checks, catch everything before push | Section 6 |
| Silent quality debt | Quality ratchet — metrics can only improve, never regress | Section 7 |
| Debugging knowledge lost | Lessons-learned docs that feed back into Agent.md rules | Section 11 |

Quick Start

Phase 1: Foundation (Day 1-3)

These provide immediate value with minimal effort:

  1. Copy Agent.md to your project root. Edit it to match your conventions.

    cp examples/docs/Agent.md ./Agent.md
  2. Set up local CI. Copy the check definitions and runner script.

    cp examples/ci-checks.json ./ci-checks.json
    mkdir -p scripts
    cp examples/scripts/ci_check_local.py ./scripts/
    # Edit ci-checks.json to match your project's commands and paths
  3. Run it before every commit. Your commit skill should invoke this automatically:

    python scripts/ci_check_local.py --fix

    The script auto-fixes formatting/linting and validates everything else (types, tests, code health, migrations). The same script runs in your CI workflow — so "local passes" means "CI will pass."
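
To make the single-source-of-truth pattern concrete, here is a minimal sketch of a runner that reads check definitions from a JSON file and executes them in order. The schema (`"checks"`, `"name"`, `"command"`) is illustrative only — the repo's `ci-checks.json` and `ci_check_local.py` are the real implementations:

```python
"""Minimal sketch of the local-CI pattern: one JSON file defines the
checks, one runner executes them. Schema and names are assumptions."""
import json
import os
import subprocess
import tempfile

def run_checks(config_path: str) -> list[str]:
    """Run each check's shell command; return the names of any that failed."""
    with open(config_path) as f:
        checks = json.load(f)["checks"]
    return [
        check["name"]
        for check in checks
        if subprocess.run(check["command"], shell=True).returncode != 0
    ]

# Demo with a throwaway config; a real run points at ./ci-checks.json:
cfg = os.path.join(tempfile.mkdtemp(), "ci-checks.json")
with open(cfg, "w") as f:
    json.dump({"checks": [
        {"name": "sanity", "command": "true"},   # always passes (POSIX)
        {"name": "broken", "command": "false"},  # always fails
    ]}, f)
print("Failed checks:", run_checks(cfg))
```

Because the remote workflow reads the same file, adding a check locally automatically adds it to CI — there is no second list to keep in sync.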

Phase 2: Quality Gates (Week 1-2)

  1. Copy scripts/check_code_health.py and add to your ci-checks.json
  2. Create .type-ignore-threshold with your current count
  3. Mirror your ci-checks.json in your CI workflow
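
Seeding `.type-ignore-threshold` can be as simple as counting the suppressions you already have. The sketch below assumes your sources live under `src/`; the demo runs on a throwaway directory:

```python
"""Sketch: seed .type-ignore-threshold with the current suppression count
so the ratchet has a baseline. The src/ layout is an assumption."""
import tempfile
from pathlib import Path

def count_type_ignores(root) -> int:
    """Count '# type: ignore' comments across Python files under root."""
    return sum(
        line.count("# type: ignore")
        for path in Path(root).rglob("*.py")
        for line in path.read_text(encoding="utf-8").splitlines()
    )

# Demo on a throwaway tree; in a real repo, point this at src/:
demo = Path(tempfile.mkdtemp())
(demo / "app.py").write_text("x = fn()  # type: ignore[no-untyped-call]\n")
baseline = count_type_ignores(demo)
(demo / ".type-ignore-threshold").write_text(f"{baseline}\n")
print(f"Baseline: {baseline} suppressions")
```

From then on, the gate fails any change that pushes the count above the recorded baseline, and you lower the file's value as you pay the debt down.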

Phase 3: Skills & Documentation (Week 2-4)

  1. Copy skills/commit/ and skills/code-review/ to your project
  2. Start docs/lessons-learned/ after your first non-trivial debugging session
  3. Adapt skills/code-review/personas/project-standards.md to your conventions

Phase 4: Advanced (Month 2+)

  1. Add quality_delta.py for PR regression checks
  2. Set up Serena MCP for semantic code navigation
  3. Add brainstorming skill for design workflows

See the full Adoption Roadmap for details.

Key Concepts

Agent.md — The Project Constitution

A markdown file at your project root that AI coding agents read at the start of every session. It defines rules, conventions, and architectural decisions. Without it, every session starts from zero. With it, the agent knows your branching strategy, testing conventions, exception hierarchy, and file size limits from the first message.

For Claude Code, this file is named CLAUDE.md. For other agents, adapt the name to whatever your tool reads (e.g., .cursorrules, .github/copilot-instructions.md). The content is the same — the playbook uses the generic name Agent.md.
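
As a rough idea of the shape — this skeleton is hypothetical; `examples/docs/Agent.md` is the actual starter — a minimal constitution might look like:

```markdown
# Agent.md

## Branching
- Branch from `main`; never commit directly to it.

## Quality gates
- Run `python scripts/ci_check_local.py --fix` before every commit.
- Never push if local CI fails.

## Conventions
- Raise only exceptions from the project hierarchy, never bare `Exception`.
- Keep files under the size limit; split modules that outgrow it.

## Before debugging
- Search `docs/lessons-learned/` for prior occurrences.
```

Short, imperative rules work best: the agent reads this file at session start, so every line is a correction you no longer have to make by hand.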

The Quality Ratchet

Metrics can only improve, never regress. Every PR is compared against its merge base:

  • Added a # type: ignore? Remove one somewhere else.
  • Introduced a complexity violation? Simplify it.
  • New public function without type annotations? Add them.

This is enforced by quality_delta.py, which runs on PRs and fails if any metric regressed.
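
The core of the comparison is simple: collect per-metric counts on the merge base and on the PR head, then fail if any count went up. The metric names below are illustrative; the real `quality_delta.py` derives them from the code-health tooling:

```python
"""Sketch of the ratchet comparison. Metric names are assumptions."""

def find_regressions(base: dict[str, int], head: dict[str, int]) -> list[str]:
    """Return a message for every metric that got worse (higher = worse)."""
    return [
        f"{name}: {base.get(name, 0)} -> {count}"
        for name, count in head.items()
        if count > base.get(name, 0)
    ]

base = {"type_ignores": 12, "lint_suppressions": 3, "complexity_violations": 0}
head = {"type_ignores": 13, "lint_suppressions": 3, "complexity_violations": 0}

regressions = find_regressions(base, head)
print("PASS" if not regressions else f"FAIL - regressions: {regressions}")
```

Note that equal counts pass: the ratchet blocks regressions, it does not demand improvement on every PR.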

Local CI = Remote CI

A single ci-checks.json file defines every check. The local runner (ci_check_local.py) and the remote CI workflow both read from this file. Running every check locally adds 30-60 seconds to each commit, but the alternative (push, wait, fail, fix, repeat) wastes far more time and produces noisy histories.
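
As a sketch of the idea, the check definitions might look like this — the field names and commands here are illustrative (drawn from the stack this playbook uses), and the `examples/ci-checks.json` file is the source of truth:

```json
{
  "checks": [
    { "name": "format", "command": "ruff format --check .", "fix": "ruff format ." },
    { "name": "lint",   "command": "ruff check .",          "fix": "ruff check --fix ." },
    { "name": "types",  "command": "mypy src" },
    { "name": "tests",  "command": "pytest -q" }
  ]
}
```

Checks with a `fix` command can be auto-repaired by the `--fix` flag; everything else only validates.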

Multi-Persona Code Review

Five specialized reviewers dispatched in parallel, each focused on what it's best at:

  • Correctness — logic errors, edge cases, state bugs
  • Testing — coverage gaps, weak assertions
  • Project Standards — convention compliance
  • Security (conditional) — injection, auth, data exposure
  • Adversarial (conditional) — failure scenarios, cascade failures

Each reviewer has a persona document defining what to hunt for, what to ignore, and how to calibrate confidence.

Compound Learning

A three-step loop that makes your codebase smarter over time:

  1. Consult — Before starting work, search docs/lessons-learned/
  2. Capture — After resolving a non-trivial bug, write a lesson
  3. Promote — When a lesson recurs, elevate it to a rule in Agent.md
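
To make the loop concrete, here is the rough shape a captured lesson might take, using the SAQ timeout scenario described earlier. The actual format is defined by `examples/docs/lessons-learned/TEMPLATE.md`; this is only a sketch:

```markdown
# Lesson: SAQ's default task timeout is 10 seconds

**Symptom:** long-running background jobs die silently partway through.
**Root cause:** SAQ applies a 10-second default timeout to every task.
**Fix:** configure a longer timeout for long-running tasks.
**Promote to Agent.md?** Yes, if it recurs: add a rule requiring explicit timeouts.
```

The promote step is what makes the learning compound: a lesson is advisory, but an Agent.md rule is read at the start of every session.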

Tech Stack

This playbook was developed on a Python/FastAPI + React/TypeScript stack, but the principles and most tooling are adaptable:

| Component | Used Here | Alternatives |
| --- | --- | --- |
| Python linter/formatter | ruff | black + flake8, pylint |
| Type checker | mypy | pyright, pytype |
| Code metrics | radon | wily, complexipy |
| Duplication | jscpd | CPD, Simian |
| Dead code | vulture | pylint unused-import |
| Dependency check | deptry | pip-extra-reqs |
| Task queue | SAQ | Celery, Dramatiq, Arq |
| Migrations | Alembic | Django migrations, Flyway |
| Code navigation | Serena (MCP) | |
| Agent | Claude Code | Cursor, Copilot, Aider |

Contributing

This playbook is open source. If you've adapted it for a different tech stack, found improvements, or have new skill definitions that would benefit others, PRs are welcome.

License

MIT


Extracted from a production application by Chuck Conway. Built with Claude Code.
