DevKit v3: redo around alignment + AFK pipeline (grill-me → to-prd → to-issues → Ralph)

## Problem Statement

The current devkit (v2.2) is unused. It ships ceremony — `.devkit/tasks/{pending,active,archive}/`, structured log trailers (`Task:`, `Decision:`, `Tried:`), three custom agents, an install script — but the things that actually solve the user's pain are missing. Concretely:

1. **Misalignment**. When given a vague problem, Claude commits to an implementation that doesn't match what the user wanted. The current devkit has no skill that walks the design tree before code is written.
2. **No AFK throughput**. The user is the bottleneck on every task. There's no outer iterator that picks the next task, runs developer → code-reviewer to completion, commits, and picks the next one without manual re-launch.
3. **Exploration → issues conversion is broken**. The Explore phase produces a horizontal map (files, dependencies, integration points) that never converts cleanly into vertical-slice issues a Ralph-style loop can grab. The pipeline stalls before reaching the agent loop.

The user's actual daily setup is `~/.claude/rules/*` plus `developer` and `code-reviewer` agents in a near-AFK manual loop. Current devkit's `.devkit/tasks/`, `quick`, `wrap-up`, `brainstorm`, `plan`, `execute`, `bugfix`, `design`, `grepai` skills, `install.sh`, and trailer system are unused.

## Solution

Replace the contents of devkit with the alignment + slicing + AFK pipeline derived from Matt Pocock's published workflow ([mattpocock/skills](https://github.com/mattpocock/skills), [mattpocock/sandcastle](https://github.com/mattpocock/sandcastle), [mattpocock/ai-engineer-workshop-2026-project](https://github.com/mattpocock/ai-engineer-workshop-2026-project)). The new pipeline:

```
idea (gh issue, label `idea`)
  → /grill-me <issue#>     (interview to shared design concept, one Q at a time with recommended answer)
  → /to-prd <issue#>        (synthesize grilling into PRD as issue body, label → `ready`,
                             includes ## QA Checklist for manual verification)
  → /to-issues <issue#>     (slice PRD into ## Slices checklist appended to issue body, ≤10 slices)
  → ./ralph/afk.sh N        (separate terminal: orchestrator + dev + reviewer per slice,
                             ticks checkboxes, commits Refs #parent, posts "Ready for QA" when done)
  → manual QA               (user ticks ## QA Checklist; files new issues via /qa for findings)
  → user closes parent       (after all QA ticks)
```

PRD lives as the GitHub issue body (no separate doc — avoids doc-rot). Slices live as a `## Slices` checklist *inside* the parent issue body (no child issues), capped at 10 per PRD; if more would be needed, the PRD is split. Commits link via `Refs #parent`. The dev → reviewer flow uses subagents (Model 3) inside a `claude --print` orchestrator, fresh process per iteration to preserve smart-zone (~100k tokens). A bootstrap validation harness with a hermetic `gh` shim makes the system testable from a clean machine, with persisted scoring for model-drift detection.

**One pipeline, two trigger modes**: same agents, same skills, same rules, same output — only the trigger differs. **Manual** = the user, in a chat, advances slice-by-slice. **AFK** = `./ralph/afk.sh N` automates the same loop in a separate terminal. There is no system-level distinction between modes. `~/.claude/rules/development-workflow.md` mentions Ralph in two sentences (how to start the loop, and what to do on `Ready for QA`); the orchestrator does not need internal Ralph knowledge.

**QA gate**: when all AFK slices in a PRD are ticked complete, **Ralph does NOT auto-close the parent issue**. It posts a `Ready for QA` comment for *that* parent and continues picking AFK slices from other open `ready` issues. The user opens the actual app, runs through the `## QA Checklist` (3-5 manual-verification steps authored at PRD time by `/to-prd`), and ticks each box manually. QA findings (bugs, taste failures, UX issues) become new GitHub issues via `/qa` skill — they enter Ralph's queue next round. When all `## QA Checklist` items are ticked, the user manually closes the parent issue. This separates auto-verifiable acceptance (slice level: tests pass, types check) from human-taste verification (QA level: looks right, feels right, ships).

**Decision-recording model**: per-feature decisions live inside the PRD-issue itself (the `## Implementation Decisions` and `## Out of Scope` sections capture chosen branches and rejected alternatives from the grilling session). System-wide invariants live in `docs/ai-coding-principles.md` (a documentation file, **not** an orchestrator rule — loaded on demand, not auto-pushed into every session). The existing `~/.claude/rules/development-workflow.md` rule's "append to `docs/decisions.md`" already covers the system-wide architectural-decisions log; no separate `docs/DECISIONS.md` or ADR directory.

**Domain-language model**: `UBIQUITOUS_LANGUAGE.md` only. Matt's stack also has `CONTEXT.md` (per bounded context) and `docs/adr/` (numbered ADR files); for solo single-context use those duplicate `UBIQUITOUS_LANGUAGE.md` and `docs/decisions.md` respectively. Dropped from v1 (Occam).

## User Stories

1. As a solo developer, I want to drop a vague brief into a GitHub issue with `--label idea`, so that I can park it for later without grilling immediately.
2. As a solo developer, when I'm ready to align, I want to run `/grill-me <issue#>` in a chat, so that the agent walks the design tree and asks one question at a time with a recommended answer for each.
3. As a solo developer, I want `/to-prd <issue#>` to synthesize the grilling conversation into a PRD as the issue body, so that I have a durable destination document without separate files that can rot.
4. As a solo developer, in a fresh chat, I want `/to-issues <issue#>` to break the PRD into vertical-slice tracer-bullet tickets appended to the issue body as a `## Slices` checklist, so that there's no fan-out into N child issues.
5. As a solo developer, I want each slice tagged AFK or HITL with Acceptance, Blocked-by, and Covers user stories fields, so that the Ralph loop can deterministically pick the next workable slice.
6. As a solo developer, I want `/to-issues` to enforce a soft cap of ≤10 slices per PRD (if more would be needed, the PRD is too big — `/to-issues` recommends splitting), so that issue bodies stay scannable and PRDs stay focused.
7. As a solo developer, in a separate terminal, I want to run `./ralph/afk.sh 20` (or `/ralph` slash command), so that the loop iterates: pick next AFK slice → developer agent (preloaded `/tdd`) implements → code-reviewer agent reviews → tick checkbox → commit `Refs #parent` → repeat → exit on `<promise>NO MORE TASKS</promise>`.
8. As a solo developer, I want to advance slices manually in a chat (without Ralph), so that I can do the same dev→reviewer cycle interactively when I want oversight. The system has only one pipeline; AFK and manual differ only in who triggers each iteration.
9. As a solo developer, I want each Ralph iteration to start with a fresh `claude --print` process, so that the smart-zone stays clean across slices.
10. As a solo developer, I want the inner `claude --print` to spawn `developer` and `code-reviewer` as subagents (Model 3), so that they reuse my existing agent registry and get isolated contexts.
11. As a solo developer, I want Ralph to use `--dangerously-skip-permissions`, so that no permission prompt halts the AFK loop overnight.
12. As a solo developer, I want the developer agent to be preloaded with the full Matt-bundle `tdd` skill (deep-modules, interface-design, mocking, refactoring, tests guidance), so that red-green-refactor and vertical-slice TDD happen by default without manual `/tdd` invocation.
13. As a solo developer, I want the code-reviewer to receive coding standards + deep-module + interface-design rules + `UBIQUITOUS_LANGUAGE.md` (if present) pushed in via prompt (per Matt's "push for reviewer, pull for implementer"), so that review judgment is consistent.
14. As a solo developer, I want the developer agent to fetch `UBIQUITOUS_LANGUAGE.md` if it exists in the project, so that domain language is respected.
15. As a solo developer, I want `/improve-codebase-architecture` available, so that I can periodically deepen shallow modules using the depth/seam/deletion-test vocabulary. (The skill silently proceeds without `CONTEXT.md`/`docs/adr/`; v1 doesn't ship those.)
16. As a solo developer, I want `/qa` available for interactive durable bug filing into GitHub issues, so that bug reports become Ralph-grabbable slices.
17. As a solo developer, I want `/ubiquitous-language` available, so that the project's glossary stays current.
18. As a solo developer, I want every PRD to include a `## QA Checklist` section (3-5 concrete manual-verification steps), authored at PRD time by `/to-prd`, so that what to verify post-implementation is decided up front, not improvised at QA time.
19. As a solo developer, I want Ralph to NOT auto-close the parent issue when all slices are ticked, so that manual QA is an explicit gate before close. Ralph instead posts a `Ready for QA` comment on that parent and stops working on it; it keeps picking AFK slices from other open `ready` issues.
20. As a solo developer, when I run manual QA and find bugs or taste failures, I want to file them as new GitHub issues via `/qa` skill, so that they become next-round Ralph slices instead of getting lost in chat history.
21. As a solo developer, I want to manually tick each `## QA Checklist` item after running through it in the actual app, so that the issue cannot be closed until human-taste verification is complete. After all QA ticks, I close the issue manually.
22. As a solo developer working IN devkit-the-project, I want `docs/ai-coding-principles.md` to capture the constitutional principles (smart zone, push/pull, vertical slices, deep modules, TDD discipline, anti-specs-to-code, grill-me posture, plain-text Q&A, AFK fresh-context, failure-mode awareness), so that the why-it-is-shaped-this-way is a referenceable document. Loaded on demand (not auto-pushed into every Claude Code session). Project-level `CLAUDE.md` directs agents working in this repo to load it.
23. As a solo developer using devkit on other projects, I want `docs/ai-coding-principles.md` to NOT pollute every Claude Code session globally, so that projects with different conventions (e.g., non-TDD ecosystems) aren't constrained by devkit's opinions.
24. As a solo developer, I want `docs/SAFETY.md` to explicitly name the v1 trade — `--dangerously-skip-permissions` without docker — so that the risk profile is documented (rather than being invisible debt) and future-me knows what was accepted.
25. As a maintainer of devkit-the-project, I want canonical sources in `devkit/{rules,agents,skills}/` symlinked into `~/.claude/` via `install.sh`, so that I edit once and everything stays in sync.
26. As a maintainer of devkit-the-project, I want `install.sh init-project <path>` to copy `templates/ralph/` into a project directory, so that each project has its own Ralph script with project-specific feedback loops.
27. As a maintainer of devkit-the-project, I want `make validate` on a fresh machine to bootstrap a tmpdir `CLAUDE_CONFIG_DIR`, install devkit into it, create a fixture project, run all scenarios with a hermetic `gh` shim, and persist scored results, so that I can validate the system from zero without GitHub auth.
28. As a maintainer of devkit-the-project, I want `make validate-live` to run the same scenarios against a real ephemeral GitHub repo (created and deleted within the run), so that the `gh` shim's faithfulness is verified pre-release.
29. As a maintainer of devkit-the-project, I want validation results persisted as `tests/validation/results/<timestamp>-<model-id>.json`, so that `python tests/validation/drift-report.py` can plot pass-rate and LLM-judge scores over time and flag regressions across model versions.
30. As a maintainer of devkit-the-project, I want bats tests for Ralph's slice picker, checkbox flipping, Blocked-by resolution, QA gate (does NOT close issue with unticked QA items), install idempotency, and frontmatter validity, so that deterministic regressions are caught on every commit.
31. As a maintainer of devkit-the-project, I want pytest evals (marked `@pytest.mark.eval`) for every LLM-driven step (grill-me, to-prd, to-issues, developer, code-reviewer), so that drift in model behavior is observable.
32. As a new user landing on the repo, I want `README.md` to be a navigation hub with a pipeline diagram (mermaid) and a scenario index ("new feature from vague idea", "improvement to existing feature", "bug report", "quick known fix", "refactor / shallow modules", "domain language drift", "background throughput") that routes me to the right skill or flow, so that I know which entry point to use without reading every doc.
33. As a new user, I want `README.md` to include a single ATTRIBUTION section pointing at `mattpocock/skills` (MIT license) for the skills derived/copied verbatim from Matt Pocock's repo, so that provenance is credited without per-file footer ceremony.
34. As a new user, I want a `TUTORIAL.md` that walks one canonical example end-to-end with copy-pastable commands and expected behavior at each step (including the QA workflow — Ralph posts Ready for QA, user runs the checklist, ticks boxes, closes), so that I know what working looks like.
35. As a new user, I want `ARCHITECTURE.md` that diagrams the pipeline mechanism, explains smart-zone vs dumb-zone and push vs pull, and shows how skills/agents/rules/Ralph wire together, so that I understand how the system works structurally (separate from why, which is in `docs/ai-coding-principles.md`).
36. As a long-term user, I want `DRIFT.md` that explains how to read validation results and what to do when scores drop, so that I have a playbook for model-update fallout.
37. As a long-term user, I want `TROUBLESHOOTING.md` with common failure modes (Ralph picks no slice, developer agent skips TDD, to-issues produces horizontal slices, install.sh symlink conflicts, Ralph closed issue without QA), so that diagnosis doesn't require re-deriving the system.
38. As a contributor or future-self working IN devkit-the-project, I want a project-level `CLAUDE.md` that points at `docs/ai-coding-principles.md` and the agent registry, so that any agent operating on this repo loads the same constraints without having to discover them.
39. As a maintainer, I want bootstrap work to land on a `v3` branch (not `main`), so that `main` keeps the v2.2 state until the redo passes its own QA Checklist — making mid-bootstrap rollback a one-line `git checkout main` operation.
40. As a maintainer, I want each shipped version tagged with annotated semver tags (`v3.0.0`, `v3.1.0`, `v3.0.1`), so that `git checkout v3.0.0` reproduces a known-good state and `install.sh link-skills` reports what's installed via `git describe`.
41. As an operator running AFK Ralph, I want the developer agent to surface non-obvious judgment calls (ambiguous spec, defaulted choices) as a `Judgment:` line in the commit message, and the code-reviewer to mirror those as a non-blocking `slice-NNN judgment call` comment on the parent issue, so that decisions made without me are visible at QA time without halting the loop.
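
Stories 7 and 41 both rely on plain-text conventions in the commit message (`Refs #parent` and `Judgment:` lines). A minimal sketch of how those could be read back out, assuming one `Judgment:` line per judgment call and `Refs #N` on a line of its own; the function names and the exact comment wording are hypothetical, not part of the PRD:

```python
import re

# Assumed conventions: "Refs #<n>" alone on a line, "Judgment: <text>" alone on a line.
REFS_RE = re.compile(r"^Refs #(\d+)\s*$", re.MULTILINE)
JUDGMENT_RE = re.compile(r"^Judgment:\s*(.+)$", re.MULTILINE)


def parse_commit(message: str) -> dict:
    """Extract the parent issue number and any judgment calls from a commit message."""
    refs = REFS_RE.findall(message)
    return {
        "parent": int(refs[-1]) if refs else None,  # the last Refs line wins
        "judgments": [j.strip() for j in JUDGMENT_RE.findall(message)],
    }


def judgment_comment(slice_id: str, judgment: str) -> str:
    """Format the non-blocking comment the reviewer mirrors onto the parent issue."""
    return f"{slice_id} judgment call: {judgment}"
```

A reviewer step could call `parse_commit` on the commit under review and post one `judgment_comment` per entry, which keeps the loop running while leaving a trail for QA.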

## Implementation Decisions

**Distribution model**: hybrid. Devkit-the-project is the canonical source. Skills, rules, and workflow agents are user-level (mirrored to `~/.claude/skills`, `~/.claude/rules`, `~/.claude/agents` via symlinks). Only `ralph/` is per-project (because feedback loops are language-specific, e.g., `npm test` vs `pytest`).
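
The symlink mirroring described above has to be safe to re-run. The sketch below illustrates one way to get that idempotency, written in Python for illustration even though the PRD specifies `install.sh`. The function name, the returned action strings, and the "never overwrite a real file" policy are assumptions:

```python
from pathlib import Path

KINDS = ("rules", "agents", "skills")  # directory names taken from the PRD


def link_skills(devkit_root: Path, claude_dir: Path) -> list[str]:
    """Symlink devkit/{rules,agents,skills}/* into claude_dir; safe to run twice."""
    actions = []
    for kind in KINDS:
        src_dir = devkit_root / kind
        if not src_dir.is_dir():
            continue
        dest_dir = claude_dir / kind
        dest_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(src_dir.iterdir()):
            dest = dest_dir / src.name
            if dest.is_symlink():
                if dest.resolve() == src.resolve():
                    actions.append(f"ok {kind}/{src.name}")  # already linked
                    continue
                dest.unlink()  # stale link pointing elsewhere: replace it
            elif dest.exists():
                # Assumed policy: report a conflict rather than clobber user files.
                actions.append(f"conflict {kind}/{src.name}")
                continue
            dest.symlink_to(src.resolve(), target_is_directory=src.is_dir())
            actions.append(f"linked {kind}/{src.name}")
    return actions
```

Running it twice against the same target should report `linked ...` on the first pass and `ok ...` on the second, which is the behavior the idempotency test is meant to pin down.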

**Modules to build / update in devkit-the-project**:

- **`rules/`** — canonical source for `~/.claude/rules/`.
  - `development-workflow.md` — rewritten: pipeline is `/grill-me → /to-prd → /to-issues → Ralph → manual QA → close`. Bypass section retained. Adds **two sentences**: (1) "After slicing, advance manually one slice at a time in chat, or run `./ralph/afk.sh N` from a separate terminal to automate the same loop." (2) "When Ralph posts `Ready for QA`, run the parent issue's `## QA Checklist` manually, tick each box, file follow-ups via `/qa`, and close the parent." Existing "Architectural decisions log" section (append to `docs/decisions.md`) kept.
  - `github-issues.md` — updated: `idea`/`ready` labels keep meaning; slice checklist convention with HITL/AFK as inline tags; `Refs #N` commit linkage; explicit "no child issues per PRD"; ≤10 slices per PRD soft cap; QA gate convention (issue stays open with unticked QA items).
  - `coding-preferences.md`, `safety.md`, `user-interaction.md`, `principles.md`, `identity.md` — kept; identity.md gets a minor edit to reference the new pipeline.

- **`agents/`** — canonical workflow agents.
  - `developer.md` — frontmatter: `skills: [tdd, emil-design-engineering]`. Body: per-slice contract — input is slice block + parent PRD body + parent issue number; output is commit ending `Refs #<parent>` + 2-3 line summary. Self-fetches `UBIQUITOUS_LANGUAGE.md` if present.
  - `code-reviewer.md` — body: receives commit hash + slice acceptance; pushes coding preferences, deep-module rules, interface-design rules, `UBIQUITOUS_LANGUAGE.md` (if present) into prompt; returns `findings: [...]` (file:line) or `clean`.
  - `design-doc-writer.md` — DELETED (replaced by `/to-prd`).

- **`skills/`** — canonical for `~/.claude/skills/`. v1 ships **7**:
  - `grill-me/SKILL.md` (Matt's prompt verbatim)
  - `to-prd/SKILL.md` (Matt's template + body verbatim; Process branched for fresh PRD vs maturing existing `idea` issue; **template adds `## QA Checklist` section** with 3-5 manual-verification checkboxes authored at PRD time)
  - `to-issues/SKILL.md` (concept from Matt; output is `## Slices` checklist via `gh issue edit`, no child issues; ≤10 slices cap; **verifies `## QA Checklist` exists in body before slicing — if missing, recommends user adds one**)
  - `tdd/` (full bundle: SKILL.md + deep-modules.md + interface-design.md + mocking.md + refactoring.md + tests.md — verbatim)
  - `improve-codebase-architecture/` (full bundle: SKILL.md + DEEPENING.md + INTERFACE-DESIGN.md + LANGUAGE.md — verbatim; works without `CONTEXT.md`/`docs/adr/`)
  - `qa/SKILL.md` (verbatim)
  - `ubiquitous-language/SKILL.md` (verbatim)
  - **`domain-model/` — DROPPED from v1**. Deferred to v1.1.

- **`templates/ralph/`** — copied per-project by `install.sh init-project`:
  - `afk.sh` — bash outer loop; per iteration runs `claude --print --dangerously-skip-permissions "<orchestrator-prompt>"`; exits on `<promise>NO MORE TASKS</promise>`.
  - `once.sh` — single-iteration variant for testing.
  - `prompt.md` — orchestrator prompt: parse open `ready` issues, find unchecked AFK slices in `## Slices`, filter by Blocked-by, sort by priority (bug > infra > tracer-bullet > polish > refactor), spawn `developer` subagent with slice + PRD + parent #, on commit spawn `code-reviewer` with commit hash + acceptance, retry developer on findings (max 2), tick checkbox via `gh issue edit --body-file -`. **When all AFK slices in a parent are ticked AND the parent has a `## QA Checklist` section: post a `Ready for QA — run the QA Checklist manually and close when done` comment on the parent issue, then continue picking other parents' slices. Do NOT auto-close. Do NOT tick QA Checklist items.** Emit `<promise>NO MORE TASKS</promise>` when no AFK slices remain across all open `ready` issues.

- **`install.sh`** — subcommands `link-skills` (symlink `devkit/{rules,agents,skills}/*` into `~/.claude/`, idempotent, `--copy` for detached install), `init-project [path]` (copy `templates/ralph/`, drop a `/ralph` slash-command file), `unlink` (reverse).

- **`commands/`** — minimal: a `/ralph` slash command shipped by `init-project` that bash-execs `ralph/afk.sh "$@"` from the project root (passes `--once` through).

- **`fixtures/sample-project/`** — tiny real git repo committed in-tree; synthetic codebase + sample issues; used by all validation scenarios.

- **`tests/`** (bats):
  - `ralph_pick_next.bats`, `ralph_tick_checkbox.bats`, `ralph_no_more_tasks.bats`
  - `ralph_qa_gate.bats` — given fixture issue body with all AFK slices ticked AND a `## QA Checklist` with unticked items: Ralph posts `Ready for QA` comment AND does NOT close the issue. Given a body with all slices ticked AND no QA Checklist: Ralph posts the comment but still does not close (operator closes manually).
  - `install_symlink_idempotent.bats`, `install_copy_mode.bats`
  - `frontmatter_valid.bats`

- **`evals/`** (pytest, marked `@pytest.mark.eval`):
  - `test_grill_me_opens_with_question_and_recommendation.py`
  - `test_to_prd_produces_six_sections_plus_qa_checklist.py` (verifies all 6 sections + `## QA Checklist`)
  - `test_to_issues_produces_vertical_slices.py` (LLM-as-judge)
  - `test_to_issues_caps_at_ten_slices.py`
  - `test_to_issues_warns_if_no_qa_checklist.py`
  - `test_developer_uses_red_green_not_horizontal.py`
  - `test_code_reviewer_catches_internal_mock.py`
  - `conftest.py` (fixtures: tmp project, claude invocation helper)

- **`tests/validation/`**:
  - `gh-shim/gh` — Python executable on PATH during validation; implements `gh issue create/list/view/edit/close/comment --label X --json Y`; stores state in `$TMPDIR/devkit-validation/issues/*.json`.
  - `gh-shim/state.py` — read/write JSON state.
  - `bootstrap.sh` — sets `CLAUDE_CONFIG_DIR` to a fresh tmpdir, prepends `gh-shim/` to PATH, runs `install.sh link-skills --copy`, creates fixture project, asserts the installed copies are present.
  - `run.py` — orchestrates scenarios; `--live` flag toggles between shim and real `gh` (creates ephemeral `devkit-validation-<ts>-<rand>` repo under `$DEVKIT_VALIDATION_OWNER`, deletes on completion or via orphan-GC at next `--live` start).
  - `scenarios/01-add-feature/`, `scenarios/02-fix-bug/`, `scenarios/03-shallow-modules/`, `scenarios/04-domain-language-conflict/`, `scenarios/05-blocked-by-resolution/`, `scenarios/06-qa-gate/` — each has `brief.md`, fixture overlay, `expected.yaml`. (Scenario 06 specifically asserts QA gate behavior.)
  - `results/<timestamp>-<model-id>.json` — persisted scores.
  - `drift-report.py` — `python drift-report.py --since YYYY-MM-DD` plots pass-rate and LLM-judge mean scores.
  - `smoke-e2e.sh` — manual full-pipeline smoke including QA gate.

- **`README.md`** — navigation hub. Pipeline diagram (mermaid). Scenario index. 3-line install. Pointer to `TUTORIAL.md`. Links to `ARCHITECTURE.md`, `DRIFT.md`, `TROUBLESHOOTING.md`, `docs/ai-coding-principles.md`, `docs/SAFETY.md`. **One ATTRIBUTION section** at the bottom: `mattpocock/skills` (MIT license) credited for verbatim/derived skills. No per-file footers.

- **`CLAUDE.md`** (project-level, in `devkit/`) — short. Says: "Before working in this repo, load `docs/ai-coding-principles.md`. Follow the agent registry. Defer to `docs/ai-coding-principles.md` on conflicts."

- **`docs/`**:
  - **`ai-coding-principles.md`** — constitutional principles (Appendix A content). **Doc, not rule.** Loaded on demand by user/agents working in devkit-the-project (via project-level `CLAUDE.md`). NOT auto-pushed into every Claude Code session.
  - **`SAFETY.md`** — names the v1 risk profile explicitly: `--dangerously-skip-permissions` without docker, single-developer trust, git as safety net, what changes in v2 (Sandcastle + docker).
  - `TUTORIAL.md` — copy-pastable canonical example end-to-end **including QA workflow section**.
  - `ARCHITECTURE.md` — pipeline diagram, smart/dumb zone, push vs pull, wiring.
  - `DRIFT.md` — running validation, reading results, response playbook.
  - `TROUBLESHOOTING.md` — common failure modes including QA-gate confusion.
  - `skills/<name>.md` — one-screen page per skill.

- **`Makefile`** — `validate`, `validate-live`, `test` (bats only), `eval` (pytest evals), `lint`.

**Killed from current devkit**: `commands/devkit/`, current `skills/` contents, `agents/design-doc-writer.md`, `.devkit/tasks/`, `.devkit/knowledge/`, trailer convention, `hooks/`, `logs/`, legacy `docs/`, old `templates/`. (`hooks/` and `grepai` disposition pending explicit confirmation; default kill.)

**Smart-zone protection**: Ralph runs `claude --print` once per iteration → fresh process → fresh context. The inner Claude is the orchestrator and spawns `developer` and `code-reviewer` as subagents.
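
The fresh-process-per-iteration loop can be sketched as follows. The PRD specifies a bash `afk.sh`; this Python version only illustrates the control flow. The `claude --print --dangerously-skip-permissions` invocation and the sentinel string come from this document, while the function names and the injectable `run` parameter are assumptions added so the loop can be exercised without calling the CLI:

```python
import subprocess
from typing import Callable

SENTINEL = "<promise>NO MORE TASKS</promise>"


def run_claude(prompt: str) -> str:
    """Start one fresh orchestrator process and return its stdout."""
    result = subprocess.run(
        ["claude", "--print", "--dangerously-skip-permissions", prompt],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def afk_loop(prompt: str, max_iterations: int,
             run: Callable[[str], str] = run_claude) -> int:
    """Run up to max_iterations fresh processes; return how many did work."""
    for done in range(max_iterations):
        output = run(prompt)  # new process, so no context carries over
        if SENTINEL in output:
            return done  # the orchestrator found no AFK slice to pick
    return max_iterations
```

Because each call to `run` is a separate process, nothing from slice N leaks into slice N+1; the only shared state is the repository and the issue bodies.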

**Permission posture v1**: `--dangerously-skip-permissions` on the inner `claude --print`. No docker. Documented in `docs/SAFETY.md`. v2 with Sandcastle + docker is the upgrade path.

**Quick-drop**: `gh issue create --label idea --title ... --body "<brief>"`.

**Slice format inside parent issue body** (≤10 slices):

```
## Slices

- [ ] **slice-001** — <title> — AFK
  - Acceptance: <verifiable behavior>
  - Blocked by: none
  - Covers user stories: 1, 2, 3
```
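
A minimal sketch of how the slice format above could be parsed into structured records, for example by a deterministic test helper. The `Slice` dataclass and `parse_slices` name are hypothetical; the regex assumes the header line uses em dashes (written as `\u2014` in the pattern) exactly as in the format shown:

```python
import re
from dataclasses import dataclass, field

# "- [ ] **slice-001** <dash> <title> <dash> AFK", where <dash> is U+2014.
HEAD_RE = re.compile(
    r"^- \[(?P<tick>[ xX])\] \*\*(?P<id>slice-\d+)\*\* \u2014 "
    r"(?P<title>.+) \u2014 (?P<mode>AFK|HITL)\s*$"
)
FIELD_RE = re.compile(r"^\s+- (?P<key>[^:]+):\s*(?P<value>.*)$")


@dataclass
class Slice:
    id: str
    title: str
    mode: str
    ticked: bool
    acceptance: str = ""
    blocked_by: list[str] = field(default_factory=list)
    covers: list[int] = field(default_factory=list)


def parse_slices(body: str) -> list[Slice]:
    """Parse the '## Slices' section of an issue body into Slice records."""
    slices: list[Slice] = []
    in_section = False
    for line in body.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Slices"
            continue
        if not in_section:
            continue
        head = HEAD_RE.match(line)
        if head:
            slices.append(Slice(head["id"], head["title"], head["mode"],
                                head["tick"].lower() == "x"))
            continue
        fld = FIELD_RE.match(line)
        if fld and slices:
            key, value = fld["key"].strip().lower(), fld["value"].strip()
            if key == "acceptance":
                slices[-1].acceptance = value
            elif key == "blocked by":
                slices[-1].blocked_by = re.findall(r"slice-\d+", value)
            elif key == "covers user stories":
                slices[-1].covers = [int(n) for n in re.findall(r"\d+", value)]
    return slices
```

`Blocked by: none` yields an empty list because no `slice-NNN` token appears in the value.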

**QA Checklist format inside parent issue body** (3-5 items, authored by `/to-prd`):

```
## QA Checklist

- [ ] <concrete user-flow verification>
- [ ] <edge case to manually exercise>
- [ ] <visual/UX taste check>
```
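
The check that `/to-issues` performs before slicing (does a `## QA Checklist` exist, and does it have 3-5 items) is an LLM-driven step in this design, but the same rule can be stated deterministically. A rough sketch, with hypothetical function names and message strings:

```python
import re

ITEM_RE = re.compile(r"^- \[(?P<tick>[ xX])\] ")


def qa_checklist_status(body: str) -> tuple[int, int]:
    """Return (ticked, total) for top-level items under '## QA Checklist'."""
    ticked = total = 0
    in_section = False
    for line in body.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## QA Checklist"
            continue
        if in_section:
            match = ITEM_RE.match(line)
            if match:
                total += 1
                ticked += int(match["tick"].lower() == "x")
    return ticked, total


def qa_checklist_problems(body: str) -> list[str]:
    """List reasons the checklist does not meet the 3-5 item convention."""
    _, total = qa_checklist_status(body)
    if total == 0:
        return ["missing ## QA Checklist"]
    if not 3 <= total <= 5:
        return [f"expected 3-5 items, found {total}"]
    return []
```

The `(ticked, total)` pair is also what a human-facing status line would need, since the parent issue may only be closed once `ticked == total`.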

**Ralph priority order**: critical bug > infrastructure > tracer-bullet > polish/quick wins > refactor.
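
One way the picker could combine this ordering with Blocked-by filtering is sketched below. Note the slice format in this PRD has no field for the slice's category, so the `kind` attribute here is purely hypothetical (in the real system the orchestrator prompt would have to infer it); the `Candidate` type and `pick_next` name are also assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

# Short names for the priority order stated above, highest first.
PRIORITY = ["bug", "infra", "tracer-bullet", "polish", "refactor"]


@dataclass
class Candidate:
    id: str
    mode: str    # "AFK" or "HITL"
    ticked: bool
    kind: str    # hypothetical category; not part of the slice format
    blocked_by: list[str] = field(default_factory=list)


def rank(kind: str) -> int:
    """Unknown kinds sort after every known one."""
    return PRIORITY.index(kind) if kind in PRIORITY else len(PRIORITY)


def pick_next(slices: list[Candidate]) -> Optional[Candidate]:
    """Pick the highest-priority unticked AFK slice whose blockers are all ticked."""
    done = {s.id for s in slices if s.ticked}
    ready = [
        s for s in slices
        if not s.ticked and s.mode == "AFK"
        and all(blocker in done for blocker in s.blocked_by)
    ]
    ready.sort(key=lambda s: rank(s.kind))  # stable: ties keep document order
    return ready[0] if ready else None
```

Returning `None` corresponds to the case where the orchestrator emits `<promise>NO MORE TASKS</promise>`.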

**Retry policy**: max 2 retries of developer on code-reviewer findings before flagging the slice as HITL-needed.
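
Read literally, this policy allows one initial developer attempt plus up to two retries. A small sketch of that loop, with the developer and reviewer passed in as plain callables (hypothetical stand-ins for the two subagents) and assumed outcome strings:

```python
from typing import Callable

MAX_RETRIES = 2  # retries after the first attempt, per the policy above


def run_slice(develop: Callable[[list[str]], str],
              review: Callable[[str], list[str]]) -> tuple[str, int]:
    """Return (outcome, developer_attempts); outcome is 'clean' or 'hitl-needed'."""
    findings: list[str] = []
    for attempt in range(1, MAX_RETRIES + 2):
        commit = develop(findings)  # earlier findings are fed back to the developer
        findings = review(commit)
        if not findings:
            return "clean", attempt
    return "hitl-needed", MAX_RETRIES + 1
```

Only a `clean` outcome would lead to the checkbox being ticked; `hitl-needed` leaves the slice unticked for a human to pick up.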

**Validation modes**: hermetic (default) and `--live` (pre-release).
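
In hermetic mode the `gh` shim needs somewhere to keep issue state. The sketch below shows one possible shape for that store, assuming one JSON file per issue inside the issues directory. The class name, method names, and JSON fields are all hypothetical; the real shim would map `gh issue create/view/edit/comment/list` onto something like this:

```python
import json
from pathlib import Path
from typing import Optional


class IssueStore:
    """File-backed issue state: one <number>.json per issue (assumed layout)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, number: int) -> Path:
        return self.root / f"{number}.json"

    def _save(self, issue: dict) -> None:
        self._path(issue["number"]).write_text(json.dumps(issue, indent=2))

    def create(self, title: str, body: str, labels: list) -> int:
        numbers = [int(p.stem) for p in self.root.glob("*.json")]
        issue = {"number": max(numbers, default=0) + 1, "title": title,
                 "body": body, "labels": sorted(labels),
                 "state": "open", "comments": []}
        self._save(issue)
        return issue["number"]

    def view(self, number: int) -> dict:
        return json.loads(self._path(number).read_text())

    def update(self, number: int, **changes) -> None:
        issue = self.view(number)
        issue.update(changes)  # e.g. body=..., labels=[...], state="closed"
        self._save(issue)

    def comment(self, number: int, text: str) -> None:
        issue = self.view(number)
        issue["comments"].append(text)
        self._save(issue)

    def list_issues(self, label: Optional[str] = None, state: str = "open") -> list:
        paths = sorted(self.root.glob("*.json"), key=lambda p: int(p.stem))
        issues = [json.loads(p.read_text()) for p in paths]
        return [i for i in issues
                if i["state"] == state and (label is None or label in i["labels"])]
```

Keeping the store this small is deliberate: the `--live` run exists precisely to catch places where a simple shim diverges from real `gh` behavior.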

**Decision-recording model**: per-feature in PRD-issue body; system-wide invariants in `docs/ai-coding-principles.md` (doc, on-demand); architectural-decisions log via existing `docs/decisions.md` rule.

**Domain-language model**: `UBIQUITOUS_LANGUAGE.md` only.

**Attribution model**: single `## Attribution` section in `README.md`. `LICENSE-UPSTREAM` file at root contains Matt's MIT text.

**One pipeline, two trigger modes**: same agents/skills/rules. Manual = chat; AFK = `./ralph/afk.sh N`.

**QA gate**: for each parent, Ralph stops at `Ready for QA` and keeps picking AFK slices from other open `ready` issues. User ticks `## QA Checklist` manually. User closes the parent issue manually. Findings → `/qa` → new issues.
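
The gate reduces to a per-parent decision with no "close" outcome. A minimal sketch, following the bats description (the comment is posted whether or not a `## QA Checklist` section exists). The action names are hypothetical, and skipping a parent that already carries a `Ready for QA` comment is an added assumption to avoid duplicate comments on later iterations:

```python
import re

# An unticked AFK slice line: "- [ ] **slice-NNN** ... AFK"
AFK_OPEN_RE = re.compile(r"^- \[ \] \*\*slice-\d+\*\* .* AFK\s*$", re.MULTILINE)
READY_MARK = "Ready for QA"


def gate_action(body: str, comments: list[str]) -> str:
    """Decide what Ralph does with one parent issue; closing is never an option."""
    if AFK_OPEN_RE.search(body):
        return "work"  # at least one AFK slice is still unticked
    if any(READY_MARK in comment for comment in comments):
        return "skip"  # assumed: already announced, so stay quiet
    return "post-ready-for-qa"
```

Unticked HITL slices do not hold the gate open in this sketch, matching the wording "when all AFK slices in a parent are ticked".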

**Branch and version strategy**:
- **Bootstrap**: 10 slices land on a `v3` feature branch (slice-001 creates it from `main`). Single commit per slice. `main` stays at v2.2 until slice-010 + v1 QA pass — restorable in one line.
- **Ship**: after slice-010 + QA, `git merge --no-ff v3` into `main`, tag annotated `v3.0.0`, push origin (branch + tag).
- **Future versions**: semver with annotated git tags. Major = breaking skill-prompt or agent-input-contract change. Minor = new skill or new scenario (e.g., `/prototype` ships v3.1.0). Patch = fix.
- **Ralph runtime**: commits to current branch. No per-slice branches in v1. v2 (Sandcastle) introduces worktree + temp-branch + merge.
- **Upgrade path**: `git pull && ./install.sh link-skills`. Symlinks auto-reflect new content (they point at paths inside the repo).
- **Version observability**: `install.sh link-skills` prints current devkit `git describe` (tag or SHA) on exit so user knows what's installed.
- **No CHANGELOG file.** Git log + annotated tag messages ARE the history. Solo scale doesn't earn CHANGELOG ceremony.

## Testing Decisions

**What makes a good test**: behavior through public interface. For LLM evals: regex on structural output OR LLM-as-judge with structured JSON for non-regex-able properties.
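
As an illustration of the "regex on structural output" style, the check for `/to-prd` output could look roughly like this. The section names are the ones listed in this PRD's QA Checklist; the helper name is hypothetical, and a real eval would wrap it in a pytest test marked `@pytest.mark.eval`:

```python
import re

REQUIRED_SECTIONS = [
    "Problem Statement", "Solution", "User Stories",
    "Implementation Decisions", "Testing Decisions", "Out of Scope",
    "QA Checklist",
]


def missing_sections(body: str) -> list[str]:
    """Return the required '## ' headings that the PRD body lacks, in order."""
    headings = set(re.findall(r"^## (.+?)\s*$", body, flags=re.MULTILINE))
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

A structural check like this says nothing about whether the sections are any good; that is the part left to the LLM-as-judge evals.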

**Modules tested**:

*Deterministic (bats, every commit)*:
- Ralph slice picker, checkbox flip, NO MORE TASKS emission, **QA gate (no auto-close, posts Ready for QA comment)**.
- `install.sh link-skills` idempotency, `--copy` mode, `init-project`.
- Frontmatter validity for every skill and agent.

*LLM-driven (pytest evals, opt-in/nightly)*:
- `grill-me` opens with question + recommendation.
- `to-prd` produces all six sections + `## QA Checklist`.
- `to-issues` produces vertical slices (LLM-judge), caps at ≤10, warns when no QA Checklist exists.
- `developer` uses red-green not horizontal.
- `code-reviewer` catches seeded antipatterns.

*End-to-end smoke*: `tests/validation/smoke-e2e.sh` — bootstrap → grill → prd → issues → ralph → assert Ready for QA comment posted, parent issue still open. User ticks QA, closes manually.

*Validation harness*: `make validate` + `make validate-live` + `drift-report.py`.
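
A rough sketch of the persistence and drift side of the harness, under stated assumptions: the JSON schema (`timestamp`, `model`, `scenarios` as a name-to-pass map), the timestamp format, and all function names are hypothetical. Only the `<timestamp>-<model-id>.json` filename pattern comes from this PRD:

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_result(results_dir: Path, model_id: str, scenarios: dict,
                 now: Optional[datetime] = None) -> Path:
    """Persist one validation run as <timestamp>-<model-id>.json."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # sortable, so filename order is time order
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_dir / f"{stamp}-{model_id}.json"
    payload = {"timestamp": stamp, "model": model_id, "scenarios": scenarios}
    path.write_text(json.dumps(payload, indent=2))
    return path


def pass_rates(results_dir: Path) -> list:
    """Return (timestamp, model, pass_rate) rows, oldest first."""
    rows = []
    for path in sorted(results_dir.glob("*.json")):
        data = json.loads(path.read_text())
        outcomes = list(data["scenarios"].values())
        if outcomes:
            rows.append((data["timestamp"], data["model"],
                         sum(outcomes) / len(outcomes)))
    return rows


def regressions(rows: list, tolerance: float = 0.0) -> list:
    """Rows whose pass rate dropped by more than tolerance versus the previous run."""
    return [cur for prev, cur in zip(rows, rows[1:])
            if cur[2] < prev[2] - tolerance]
```

Including the model id in both the filename and the payload makes it possible to tell a model-version regression apart from a devkit change when two adjacent rows differ.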

**Prior art**: Matt Pocock's `mattpocock/course-video-manager` uses [evalite](https://github.com/mattpocock/evalite) (TypeScript). We use pytest because Python is preferred.

**Cadence**: bats every commit; pytest evals manual + nightly; `make validate` pre-release.

**Cost guardrails**: evals use `claude-haiku-4-5` for LLM-judge; full target model for grilling/PRD/issues/dev. Per-scenario ≤ $0.50; full suite ≤ $5/run.

## Out of Scope

- **Sandcastle parallel orchestration** — v2.
- **Docker sandboxing** — v2.
- **`domain-model` skill** — v1.1.
- **`/prototype` skill** — Matt-inspired throwaway-prototype-route generator for frontend taste decisions. Useful for "what should this UI look like" branches; out of v1 because frontend work isn't core to the alignment+AFK gap. Defer to v1.1.
- **`CONTEXT.md` / `CONTEXT-MAP.md` / `docs/adr/`** — solo duplicates of `UBIQUITOUS_LANGUAGE.md` + `docs/decisions.md`.
- **GitLab / Gitea / non-GitHub trackers** — v1 is GitHub-only.
- **Local-files-only mode** — divergent code path, rejected.
- **Migration tooling** for current devkit's `.devkit/tasks/`.
- **IDE plugin / web UI**.
- **Multi-repo Ralph orchestration**.
- **Auto-promotion of `idea` → `ready`** — manual.
- **`design-an-interface`, `request-refactor-plan`, `zoom-out`** skills — v1.1.
- **Auto-tick of QA Checklist items** — explicitly rejected. QA is human taste; the system MUST require manual ticks.
- **Auto-close on all-slices-ticked** — explicitly rejected. QA gate is the whole point.
- **Separate `docs/PRINCIPLES.md` or `docs/DECISIONS.md`** — superseded by `docs/ai-coding-principles.md` (doc) and PRD-issue bodies / existing `docs/decisions.md` rule.
- **`ai-coding-principles.md` as auto-loaded rule** — must be a doc, not pollute every session globally.
- **Per-file MIT footers** — single ATTRIBUTION block in README.

## QA Checklist

Manual verification after v1 ships (run on a fresh machine where possible):

- [ ] **Fresh-machine bootstrap**: clone devkit on a clean machine without `gh` auth; run `make validate`; all scenarios pass green; results JSON written to `tests/validation/results/`.
- [ ] **`install.sh link-skills` idempotency**: run twice against `~/.claude/`; second run produces no errors; symlinks for `skills/`, `agents/`, `rules/` all present and pointing at `devkit/`.
- [ ] **`install.sh init-project` in scratch project**: `ralph/{afk.sh, once.sh, prompt.md}` copied; `commands/ralph` slash command exists; running `/ralph --once` from that project starts a Ralph iteration.
- [ ] **`/grill-me` first response shape**: file a vague brief as `idea` GitHub issue; in fresh chat, run `/grill-me <issue#>`; verify the first response is exactly ONE question with a recommended answer (not a plan, not multiple questions).
- [ ] **`/to-prd` synthesis**: after a few grilling exchanges, run `/to-prd <issue#>`; verify body has all six sections (Problem Statement / Solution / User Stories / Implementation Decisions / Testing Decisions / Out of Scope) AND a populated `## QA Checklist` (3-5 items); label flipped from `idea` to `ready`.
- [ ] **`/to-issues` slicing**: in a fresh chat, run `/to-issues <issue#>`; verify body now has `## Slices` with ≤10 items, each with Acceptance / Blocked-by / Covers user stories. If the test PRD is artificially long, verify slicer recommends splitting.
- [ ] **Ralph one-shot**: run `./ralph/once.sh`; verify it picks the first AFK slice, developer subagent runs with `/tdd`, code-reviewer runs after, slice ticks `[ ]→[x]`, commit message ends with `Refs #<parent>`.
- [ ] **QA gate enforcement**: complete all AFK slices in a test PRD via Ralph; run Ralph again; verify Ralph posts a `Ready for QA` comment AND does NOT close the issue. The `## QA Checklist` items remain unticked.
- [ ] **Manual QA closes**: tick `## QA Checklist` items by hand; close issue manually; verify GitHub records the close event correctly.
- [ ] **`/qa` skill files follow-ups**: report a manufactured bug via `/qa` in a chat; verify a new GitHub issue is created with proper format (What happened / Expected / Steps to reproduce); verify Ralph picks it up next round.
- [ ] **Judgment-call surfacing**: review the parent issue for any `slice-NNN judgment call` comments; for each, decide whether the defaulted choice was correct. If wrong, file a corrective issue via `/qa`. Confirm code-reviewer surfaced ambiguities to the issue rather than silently letting them pass.
- [ ] **Drift report**: run `python tests/validation/drift-report.py --since 2026-01-01`; verify it produces output without errors (chart or empty-state message).
- [ ] **TUTORIAL.md cold-read**: read end-to-end as if first time; follow every step; verify everything described actually works on the live system.
- [ ] **`docs/SAFETY.md`** exists and accurately names the v1 trade (no docker, `--dangerously-skip-permissions`, git as safety net).
- [ ] **`docs/ai-coding-principles.md` is NOT a rule**: confirm it's NOT in `~/.claude/rules/`; confirm a fresh Claude Code session in a non-devkit project does NOT load it; confirm a session in devkit-the-project DOES (via project-level `CLAUDE.md`).
- [ ] **`v3.0.0` shipped**: `git tag -l 'v3.*'` shows `v3.0.0`; `git log --oneline main` includes the merge commit at `v3.0.0`; `git push origin main v3.0.0` succeeded; `./install.sh link-skills` prints `Linked devkit @ v3.0.0` (or equivalent `git describe`).
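
The `slice-NNN judgment call` comments reviewed in the checklist above originate from `Judgment:` lines in commit messages, in the format given in slice-004's acceptance criteria. The sketch below shows one way the detection and the comment text could fit together. The function names `find_judgment` and `judgment_comment` are hypothetical; in the pipeline itself this work is done by the code-reviewer agent, which also supplies the ambiguity and choice text.

```python
from __future__ import annotations

import re

# A commit body line of the form "Judgment: <one-line rationale>".
JUDGMENT_RE = re.compile(r"^Judgment:\s*(.+)$", re.MULTILINE)


def find_judgment(commit_message: str) -> str | None:
    """Return the rationale from a `Judgment:` line, or None if there is none."""
    match = JUDGMENT_RE.search(commit_message)
    return match.group(1).strip() if match else None


def judgment_comment(slice_id: str, ambiguity: str, choice: str, rationale: str) -> str:
    """Build the parent-issue comment in the format the PRD specifies."""
    # "\u2014" is the dash used as a separator in the specified comment format.
    return (
        f"**{slice_id} judgment call** \u2014 Ambiguity: {ambiguity}. "
        f"Defaulted to: {choice}. Why: {rationale}. "
        "Review at QA \u2014 file new issue if wrong."
    )
```

A commit without a `Judgment:` line yields `None`, in which case no comment would be posted.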

## Slices

(Hand-populated for this bootstrap PRD because the `/to-issues` skill doesn't exist yet; it is among the very first things being built. Future PRDs use the slicer skill once slice-003 delivers it.)

- [x] **slice-001** — Cleanup + skeleton — AFK
  - Acceptance: **First action: `git checkout -b v3` from `main`**. All subsequent bootstrap work (slices 001-010) lands on `v3`. Then: old content deleted (`commands/devkit/`, `skills/{brainstorm,plan,execute,quick,design,bugfix,wrap-up,grepai}`, `agents/design-doc-writer.md`, `.devkit/tasks/`, `.devkit/knowledge/`, `hooks/`, `logs/`, legacy `docs/`, old `templates/`). New empty directories created: `skills/`, `agents/`, `rules/`, `templates/ralph/`, `docs/`, `tests/`, `evals/`, `fixtures/`, `tests/validation/`, `commands/`. Kept: `LICENSE`, `README.md` (will be rewritten in slice-010), this PRD's commit history. Single git commit on `v3` titled `slice-001: cleanup + skeleton`.
  - Blocked by: none
  - Covers user stories: (no stories — pure tree-shaping; symlinking is slice-006's job)

- [x] **slice-002** — Verbatim skill imports + LICENSE-UPSTREAM — AFK
  - Acceptance: `skills/grill-me/SKILL.md` copied verbatim from `mattpocock/skills`. `skills/tdd/{SKILL,deep-modules,interface-design,mocking,refactoring,tests}.md` copied verbatim. `skills/qa/SKILL.md` copied verbatim. `skills/ubiquitous-language/SKILL.md` copied verbatim. `skills/improve-codebase-architecture/{SKILL,DEEPENING,INTERFACE-DESIGN,LANGUAGE}.md` copied verbatim. Total 13 files. `LICENSE-UPSTREAM` at repo root with Matt's MIT text. All frontmatter parses (run `frontmatter_valid.bats` if it already exists, else manual).
  - Blocked by: slice-001
  - Covers user stories: 2, 12, 13, 14, 15, 16, 17, 20 (the `/qa` skill itself, which enables QA findings → new issues), 33 (LICENSE-UPSTREAM file half; README ATTRIBUTION half is slice-010)

- [x] **slice-003** — to-prd, to-issues, ralph templates — AFK
  - Acceptance: `skills/to-prd/SKILL.md` authored: Matt's body + Process branched for fresh-PRD vs maturing-`idea`-issue + template includes `## QA Checklist` section. `skills/to-issues/SKILL.md` authored: ≤10 slice cap, `## Slices` checklist via `gh issue edit`, no child issues, verifies `## QA Checklist` exists in body before slicing. `templates/ralph/{afk.sh,once.sh,prompt.md}` authored: `claude --print --dangerously-skip-permissions`, `<promise>NO MORE TASKS</promise>` sentinel, QA gate (post `Ready for QA` comment, do NOT close). Frontmatter valid on both skills.
  - Blocked by: slice-001
  - Covers user stories: 3, 4, 5 (slices tagged AFK/HITL with required fields — `/to-issues` produces this format), 6, 7, 9, 10, 11, 18 (`## QA Checklist` template in `/to-prd`), 19 (Ralph no-auto-close behavior), 21 (Ralph does NOT tick QA boxes — leaves them for human ticking)

- [x] **slice-004** — Workflow agents rewrite — AFK
  - Acceptance: `agents/developer.md` frontmatter has `skills: [tdd, emil-design-engineering]`; body specifies slice + parent PRD body + parent issue # input; self-fetches `UBIQUITOUS_LANGUAGE.md` if present; commits with `Refs #<parent>`; **when developer makes a non-obvious judgment call (ambiguous spec, defaulted choice, assumption), commit message body includes a `Judgment: <one-line rationale>` line so downstream review can detect it**. `agents/code-reviewer.md` body pushes coding preferences + deep-module rules + interface-design rules + UBIQUITOUS_LANGUAGE.md (if present) inline; accepts commit hash + slice acceptance; returns `findings: [...]` or `clean`; **when commit message contains a `Judgment:` line, code-reviewer additionally posts a comment on the parent issue formatted as: `**slice-NNN judgment call** — Ambiguity: <what>. Defaulted to: <choice>. Why: <rationale>. Review at QA — file new issue if wrong.` This is non-blocking — loop continues; comment is for human-QA visibility.** Both have valid frontmatter (manual YAML check or via existing tooling — `frontmatter_valid.bats` doesn't exist yet; will re-verify once slice-007 lands).
  - Blocked by: slice-002 (needs `tdd` skill referenced in frontmatter)
  - Covers user stories: 12, 13, 14, 41 (judgment-call surfacing)

- [x] **slice-005** — Rules updates — AFK
  - Acceptance: `rules/development-workflow.md` rewritten: pipeline section is `/grill-me → /to-prd → /to-issues → Ralph → manual QA → close`; bypass section retained; QA gate sentence added; existing `docs/decisions.md` log rule kept. `rules/github-issues.md` updated: slice convention + ≤10 cap + no-child-issues + QA gate convention. `rules/identity.md` updated to reference new pipeline. Other rule files (`coding-preferences.md`, `safety.md`, `user-interaction.md`, `principles.md`) unchanged.
  - Blocked by: slice-003 (rules reference skills that must exist)
  - Covers user stories: 8

- [x] **slice-006** — install.sh + Makefile + project CLAUDE.md — AFK
  - Acceptance: `install.sh` has subcommands `link-skills` (idempotent symlink, `--copy` mode), `init-project [path]` (copies `templates/ralph/`, drops `commands/ralph` slash command), `unlink`. `link-skills` prints current devkit `git describe` on exit (e.g., `Linked devkit @ v3.0.0` or `@ <sha>`). `Makefile` has targets `validate`, `validate-live`, `test`, `eval`, `lint`. `devkit/CLAUDE.md` exists, short, points at `docs/ai-coding-principles.md` and the agent registry. After running `./install.sh link-skills`, `~/.claude/skills/grill-me/SKILL.md` resolves to `devkit/skills/grill-me/SKILL.md`.
  - Blocked by: slice-005
  - Covers user stories: 25, 26, 38

- [x] **slice-007** — bats tests + frontmatter check + fixture project — AFK
  - Acceptance: `tests/{ralph_pick_next,ralph_tick_checkbox,ralph_no_more_tasks,ralph_qa_gate,install_symlink_idempotent,install_copy_mode,frontmatter_valid}.bats` authored and all pass. `fixtures/sample-project/` exists as a tiny real git repo committed in-tree (synthetic codebase + sample issues for use by validation scenarios). `make test` passes.
  - Blocked by: slice-006
  - Covers user stories: 30

- [x] **slice-008** — gh shim + bootstrap.sh + scenarios — AFK
  - Acceptance: `tests/validation/gh-shim/{gh,state.py}` faithfully implements `gh issue create/list/view/edit/close/comment --label X --json Y`, state in `$TMPDIR/devkit-validation/issues/*.json`. `tests/validation/bootstrap.sh` sets `CLAUDE_CONFIG_DIR=tmp`, prepends shim to PATH, runs `install.sh link-skills --copy`, creates a fresh fixture project. `tests/validation/scenarios/{01-add-feature,02-fix-bug,03-shallow-modules,04-domain-language-conflict,05-blocked-by-resolution,06-qa-gate}/` each have `brief.md` + fixture overlay + `expected.yaml` (structural assertions + LLM-judge keys).
  - Blocked by: slice-007
  - Covers user stories: 27, 28

- [x] **slice-009** — pytest evals + run.py + drift-report + smoke E2E — AFK
  - Acceptance: `evals/{test_grill_me_opens_with_question_and_recommendation,test_to_prd_produces_six_sections_plus_qa_checklist,test_to_issues_produces_vertical_slices,test_to_issues_caps_at_ten_slices,test_to_issues_warns_if_no_qa_checklist,test_developer_uses_red_green_not_horizontal,test_code_reviewer_catches_internal_mock,conftest}.py` authored, all marked `@pytest.mark.eval`. `tests/validation/run.py` orchestrates scenarios, `--live` flag toggles real-`gh` mode, ephemeral repo created/deleted (with orphan-GC). `tests/validation/drift-report.py` plots pass-rate and LLM-judge mean scores from `results/`. `tests/smoke-e2e.sh` exercises full pipeline including QA gate. `make validate` succeeds end-to-end against `fixtures/sample-project/`.
  - Blocked by: slice-008
  - Covers user stories: 27, 28, 29, 31

- [ ] **slice-010** — All docs + README + ATTRIBUTION — HITL
  - Acceptance: `docs/ai-coding-principles.md` authored from this PRD's Appendix A (verbatim transfer). `docs/SAFETY.md` names v1 risk profile explicitly. `docs/ARCHITECTURE.md` has pipeline diagram (mermaid), smart/dumb zone, push/pull, wiring, distinct from the why. `docs/TUTORIAL.md` walks one canonical example end-to-end including QA workflow section (Ralph posts Ready for QA → user ticks → user closes). `docs/DRIFT.md` explains validation reading + model-update playbook. `docs/TROUBLESHOOTING.md` covers common failure modes including QA-gate confusion. `docs/skills/<name>.md` has one-screen page per v1 skill — 7 files: `grill-me`, `to-prd`, `to-issues`, `tdd`, `improve-codebase-architecture`, `qa`, `ubiquitous-language`. (No `domain-model` page; it's v1.1 — when added, ship its doc page alongside.) `README.md` is navigation hub: 3-sentence what/why, mermaid pipeline diagram, scenario index, 3-line install, links to all docs, **single ATTRIBUTION section** crediting `mattpocock/skills` (MIT). HITL: user reviews TUTORIAL by following it end-to-end on a real project. **After v1 QA Checklist passes (manual ticks on this issue): `git checkout main && git merge --no-ff v3` (commit message: `v3.0.0 — alignment + AFK pipeline`), then `git tag -a v3.0.0 -m "DevKit v3.0.0 — see issue #1"`, then `git push origin main v3.0.0`. Verify `git describe --tags` on `main` returns `v3.0.0`. Close issue #1 manually after confirming all QA boxes ticked AND tag pushed.**
  - Blocked by: slice-009
  - Covers user stories: 22, 23, 24, 32, 33, 34, 35, 36, 37, 38
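
The checklist format above is what the Ralph loop reads to choose its next task. As a rough illustration only, the following sketch parses that format and picks the first unticked AFK slice whose blockers are all ticked. The names `Slice`, `parse_slices`, and `next_afk_slice` are hypothetical; the real selection is driven by `ralph/prompt.md`, not by this code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches a slice header such as "- [ ] **slice-007** ... AFK".
SLICE_RE = re.compile(r"^- \[( |x)\] \*\*(slice-\d+)\*\*.*\b(AFK|HITL)\s*$")
# Matches the indented "- Blocked by: ..." sub-bullet under a slice.
BLOCKED_RE = re.compile(r"^\s+- Blocked by:\s*(.*)$")


@dataclass
class Slice:
    slice_id: str
    done: bool
    mode: str  # "AFK" or "HITL"
    blocked_by: list[str] = field(default_factory=list)


def parse_slices(body: str) -> list[Slice]:
    """Collect slices and their blockers from an issue body."""
    slices: list[Slice] = []
    for line in body.splitlines():
        header = SLICE_RE.match(line)
        if header:
            slices.append(Slice(header.group(2), header.group(1) == "x", header.group(3)))
            continue
        blocked = BLOCKED_RE.match(line)
        if blocked and slices:
            # "none" yields an empty list; parenthetical notes are ignored.
            slices[-1].blocked_by = re.findall(r"slice-\d+", blocked.group(1))
    return slices


def next_afk_slice(body: str) -> Slice | None:
    """Return the first unticked AFK slice whose blockers are all ticked."""
    slices = parse_slices(body)
    done = {s.slice_id for s in slices if s.done}
    for s in slices:
        if not s.done and s.mode == "AFK" and all(b in done for b in s.blocked_by):
            return s
    return None
```

A `None` result corresponds to the point where the loop has nothing left to pick up; HITL slices are never returned because they need a human.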

## Further Notes

- **Bootstrap caveat**: this PRD's `## Slices` section was hand-populated, not produced by `/to-issues`, because `/to-issues` itself is what slice-003 creates. Future PRDs run through the real slicer.
- **Bootstrap operational order**: slices 001-006 are hand-driven (use existing `~/.claude/agents/{developer,code-reviewer}` agents in their current state — they get rewritten in slice-004 mid-stream). **After slice-006, run `./install.sh init-project .` against devkit-the-project itself** so the `/ralph` slash command and `templates/ralph/` are wired locally. From there, AFK Ralph can drive slices 007-009 (`./ralph/afk.sh` against this same issue #1). Slice-010 is HITL by design (TUTORIAL needs human eyes).
- **Source repos** for skill content: [mattpocock/skills](https://github.com/mattpocock/skills) and [mattpocock/ai-engineer-workshop-2026-project](https://github.com/mattpocock/ai-engineer-workshop-2026-project), cloned to `/tmp/mattpocock-skills/{repo,workshop}/` for reference. If those tmp paths are gone in a future session, `git clone` fresh.
- **Sandcastle** ([mattpocock/sandcastle](https://github.com/mattpocock/sandcastle)) is the v2 parallelization upgrade target.
- **Grilling discipline** (one question at a time, recommended answer per question, walk the design tree) is preserved verbatim in the `grill-me` skill. This PRD is the synthesis of a 12-question grill-me session plus a final QA-workflow alignment round.
- **Hooks/grepai disposition**: pending explicit confirmation. Default action: kill in slice-001. Speak up before slice-001 ships if either should survive.
- **`/ralph` slash command** supports `--once` flag passthrough to `ralph/once.sh`. Trivial.
- **Inspiration sources**: ["Software Fundamentals Matter More Than Ever" — Matt Pocock](https://www.youtube.com/watch?v=v4F1gFy-hqg), ["Essential Skills for AI Coding from Planning to Production" — Matt Pocock workshop](https://www.youtube.com/watch?v=-QFHIoCo-Ko).
- **Reuse posture summary**:
  - **Verbatim** (drop-in copy): `grill-me`, all 6 `tdd/` files, all 4 `improve-codebase-architecture/` files, `qa`, `ubiquitous-language` = **13 files**
  - **Mix** (Matt's body, our process): `to-prd/SKILL.md` = 1 file
  - **New** (concept from Matt, our output): `to-issues/SKILL.md`, `ralph/prompt.md` = 2 files
  - **Adapted scripts**: `ralph/afk.sh`, `ralph/once.sh` = 2 files
  - **Rewrite existing**: `developer.md`, `code-reviewer.md`, `development-workflow.md` = 3 files
  - **Update existing**: `github-issues.md`, `identity.md` = 2 files
  - **Kept**: `coding-preferences.md`, `safety.md`, `user-interaction.md`, `principles.md` = 4 files
  - **New (devkit-original)**: `docs/ai-coding-principles.md`, `docs/SAFETY.md`, all tests/evals/validation/install/Makefile/README/CLAUDE.md/scenario fixtures, `templates/ralph/` slash command, `LICENSE-UPSTREAM`

## Appendix A — Content for `docs/ai-coding-principles.md`

This appendix captures the constitutional principles that came out of the grill-me session for this PRD. Slice-010 authors `docs/ai-coding-principles.md` using this content.

**Posture**: project-scoped doc. Loaded on demand via project-level `CLAUDE.md` when an agent works in devkit-the-project. **Not** an auto-loaded rule.

### Context discipline

- **Smart zone is ~100k tokens.** Beyond that, attention degrades regardless of advertised context window. Size tasks to fit.
- **Clear over compact.** Memento-style: every new task starts with a fresh context. Compaction preserves sediment that hurts later judgment.
- **Push for reviewer, pull for implementer.** Implementer pulls skills on demand. Reviewer gets coding standards pushed inline.

### Planning discipline

- **Specs-to-code is rejected.** Don't ignore the code; don't just regenerate from a spec. Code is the battleground.
- **Grill-me before you plan.** Reach a shared design concept first (Brooks). Walk the design tree, one question at a time, with a recommended answer per question.
- **Don't read the PRD after generation.** It's a destination doc only.
- **Don't keep PRDs around long-term.** Doc-rot risks future grilling sessions anchoring on stale text.

### Slicing discipline

- **Vertical slices, never horizontal.** One slice cuts schema → service → UI → test.
- **Issue body is the source of truth.** PRD lives as issue body. Slices live as `## Slices` checklist appended to body. No child issues.
- **≤10 slices per PRD.** Encoded in `to-issues` skill.
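
The slicing rules above are enforced by the `to-issues` skill itself, but they are also easy to check mechanically. The sketch below is one possible structural check, assuming the `## Slices` format used in this PRD; `check_slices`, `MAX_SLICES`, and `REQUIRED_FIELDS` are hypothetical names, not part of devkit.

```python
import re

MAX_SLICES = 10
REQUIRED_FIELDS = ("Acceptance", "Blocked by", "Covers user stories")
SLICE_HEADER = re.compile(r"^- \[[ x]\] \*\*(slice-\d+)\*\*")


def check_slices(body: str) -> list[str]:
    """Return human-readable problems; an empty list means the section passes."""
    fields_seen: dict[str, set[str]] = {}
    current = None
    for line in body.splitlines():
        header = SLICE_HEADER.match(line)
        if header:
            current = header.group(1)
            fields_seen[current] = set()
        elif line.startswith("## "):
            # A new top-level section ends the slice list.
            current = None
        elif current is not None:
            for name in REQUIRED_FIELDS:
                if line.strip().startswith(f"- {name}:"):
                    fields_seen[current].add(name)

    problems = []
    if len(fields_seen) > MAX_SLICES:
        problems.append(
            f"{len(fields_seen)} slices exceeds the cap of {MAX_SLICES}; split the PRD"
        )
    for slice_id, seen in fields_seen.items():
        for name in REQUIRED_FIELDS:
            if name not in seen:
                problems.append(f"{slice_id} is missing '{name}'")
    return problems
```

The check is purely structural: it cannot tell whether a slice is genuinely vertical, which remains a judgment for the slicer and the reviewer.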

### Module discipline

- **Deep modules over shallow.** Simple interface, complex implementation (Ousterhout).
- **Deletion test.** Would removing this module push its complexity back onto callers (good: the module earns its keep) or merely relocate it unchanged (bad: the module is a shallow pass-through)?
- **Design the interface, delegate the implementation.** Treat modules as gray boxes once the interface is locked.

### TDD discipline

- **AI cheats at tests.** Default: writes all impl, then writes tests against it. Counter: red-green-refactor with vertical slicing.
- **Test behavior, not implementation.** Tests must survive internal refactor.
- **Feedback loops are the speed limit.** Tests + types + browser MCP. Don't outrun your headlights.

### Front-end discipline

- **Front-end is multimodal — AI can't see.** Use throwaway prototype routes for taste decisions.

### Interaction discipline

- **Plain text Q&A only.** `AskUserQuestion` UI is rejected.
- **Wait for responses.** When asking, stop and wait.

### Domain discipline

- **Maintain `UBIQUITOUS_LANGUAGE.md`.** Shared terminology between user, AI, and code.
- **Architectural decisions log only for system-wide choices.** Per-feature decisions in PRD-issue bodies; system-wide invariants here. Append to `docs/decisions.md` only when the decision binds future architecture project-wide.

### Operational discipline

- **One pipeline, two trigger modes.** Manual = chat. AFK = `./ralph/afk.sh N`. Same agents, same skills, same output.
- **AFK loop has fresh context per iteration.** Ralph script restarts `claude --print` per slice.
- **`--dangerously-skip-permissions` is acceptable for solo trusted operator.** Git is the safety net. Risk profile in `docs/SAFETY.md`.
- **AFK observability**: terminal stream + git log + GitHub issue body checkboxes + close events. Orchestration chat and Ralph terminal are separate observables; no chat-feedback channel needed.
- **Editing external state through local buffers**: when an external system stores authoritative state (GitHub issue body, remote config), the discipline is **pull → surgical `Edit` → push, never full-file rewrite**. For GitHub issues: `gh issue view <N> --json body --jq .body > /tmp/<name>.md` (pull fresh), targeted `Edit` calls (each shows explicit `old_string` → `new_string` — diff visible, typo blast radius bounded), `gh issue edit <N> --body-file /tmp/<name>.md` (push), discard the local buffer afterward (don't reuse across rounds — pull fresh next time to avoid silently overwriting parallel edits). Heredoc-based full-body rewrites are tempting but lose diff visibility, burn tokens regenerating unchanged content, and can clobber concurrent changes.
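
The pull → surgical edit → push discipline can be sketched in a few lines. This is an illustration under stated assumptions, not devkit code: `surgical_edit` and `edit_issue_body` are hypothetical helpers, and the agents perform the same steps with the `Edit` tool rather than Python. The `gh` invocations are the ones quoted above.

```python
import subprocess
import tempfile
from pathlib import Path


def surgical_edit(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`; refuse ambiguous or missing matches."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for {old!r}, found {count}")
    return text.replace(old, new, 1)


def edit_issue_body(issue: int, old: str, new: str) -> None:
    """Pull the body fresh, apply one targeted edit, push, then drop the buffer."""
    body = subprocess.run(
        ["gh", "issue", "view", str(issue), "--json", "body", "--jq", ".body"],
        check=True, capture_output=True, text=True,
    ).stdout
    with tempfile.TemporaryDirectory() as tmp:
        buffer = Path(tmp) / "body.md"
        buffer.write_text(surgical_edit(body, old, new), encoding="utf-8")
        subprocess.run(
            ["gh", "issue", "edit", str(issue), "--body-file", str(buffer)],
            check=True,
        )
    # The temporary directory is deleted here, so the buffer cannot be reused.
```

Requiring exactly one match is what bounds the blast radius: a typo in `old` fails loudly instead of silently rewriting the wrong checkbox.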

### QA discipline

- **QA is human taste.** What makes a feature actually ship-worthy is not auto-verifiable — it's clicking through the user flow and noticing what's wrong.
- **`## QA Checklist` lives in the PRD body.** 3-5 concrete manual-verification steps authored at PRD time by `/to-prd`.
- **QA gate before close**: once all AFK slices are ticked, Ralph posts `Ready for QA` and stops. User runs `## QA Checklist` manually, ticks each box, closes the parent issue manually.
- **QA findings become new issues** via `/qa` skill — never ad-hoc fixes mid-Ralph-run.
- **Don't auto-tick QA boxes**, ever. The whole point is human-in-the-loop verification.
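
One way to make the "never auto-tick" rule checkable is to compare the `## QA Checklist` section before and after any automated body edit. The sketch below assumes the PRD's `## ` heading layout; `section` and `qa_checklist_untouched` are hypothetical helpers, not something devkit ships.

```python
import re


def section(body: str, heading: str) -> str:
    """Return the text under a `## heading`, up to the next `## ` heading."""
    pattern = rf"^## {re.escape(heading)}[ \t]*\n(.*?)(?=^## |\Z)"
    match = re.search(pattern, body, re.MULTILINE | re.DOTALL)
    return match.group(1) if match else ""


def qa_checklist_untouched(before: str, after: str) -> bool:
    """True when an automated edit left the `## QA Checklist` section as it was."""
    return section(before, "QA Checklist") == section(after, "QA Checklist")
```

A loop could run this guard just before pushing an edited body and abort the push if it returns `False`, leaving the boxes for a human to tick.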

### Failure-mode awareness

- **Misalignment** (AI builds wrong thing) → grill harder.
- **Verbose output** → ubiquitous language gap.
- **Doesn't work** → feedback loop weakness.
- **Brain can't keep up** → modules are shallow; deepen them.
- **Plan mode is too eager** → reach the design concept first via grill-me.
- **Feature feels off after Ralph completes it** → QA gate caught it; file new issues via `/qa`, don't bypass to a quick patch.













