Merged
6 changes: 3 additions & 3 deletions README.md
@@ -2,7 +2,7 @@

# Checks

A check is a markdown file describing a single code quality standard. Each check runs as a full AI agent on pull requests reading files, running commands, and exercising judgment to catch issues that linters and tests can't express.
A check is a markdown file describing a single code quality standard. Each check runs as a full AI agent on pull requests: reading files, running commands, and exercising judgment to catch issues that linters and tests can't express.

```markdown
---
@@ -25,7 +25,7 @@ When migrations are present, look for these issues:

**Tests** verify behavior. If you can assert an input/output, write a test.

**Checks** handle judgment calls that require context reading multiple files, understanding intent, applying conventions that aren't codified in a linter rule.
**Checks** handle judgment calls that require context: reading multiple files, understanding intent, applying conventions that aren't codified in a linter rule.

## Quick start

@@ -75,7 +75,7 @@ See the [workflow YAML](./.github/workflows/checks.yml) for the full implementat

## Learn more

- [spec.md](./spec.md) — Full spec: check file format, best practices, CI workflow, design principles, and problems you'll run into.
[spec.md](./spec.md) — Full spec: check file format, best practices, CI workflow, design principles, and problems you'll run into.

## License

48 changes: 22 additions & 26 deletions spec.md
@@ -2,7 +2,7 @@

## The Problem

AI coding agents produce more code, faster. That's the point. But velocity without quality control creates a compounding problem: bad patterns slip in, the next agent session reads them as the standard, and the codebase degrades from the inside.
AI coding agents produce more code, faster. But velocity without quality control creates a compounding problem: bad patterns slip in, the next agent session reads them as the standard, and the codebase degrades from the inside.

Teams feel this. [Carnegie Mellon researchers found](https://arxiv.org/abs/2511.04427) that AI coding tools increase velocity but also increase static analysis warnings and code complexity. Open-source maintainers are [creating AI policies](https://redmonk.com/kholterhoff/2026/02/26/generative-ai-policy-landscape-in-open-source/) in response to quality issues with AI-generated submissions. A Brown CS professor published a [teardown of AI-generated code](https://gist.github.com/shriram/064756c61ca98774c2a509aa3893d941) for a bookstore app: floating-point arithmetic for money, custom CSV parsers instead of standard libraries, functions named `filterByTitle` that actually implement search. Each one minor. Together, a codebase that teaches the next agent session to write worse code.

@@ -21,7 +21,7 @@ Checks are semantic tests for things linters can't express:
- "Every new endpoint needs auth middleware."
- "Don't add dependencies without justification."

Because checks are plain English, anyone with a standard can write them not just engineers.
Because checks are plain English, anyone with a standard can write them, not just engineers.

## Checks vs. Tests vs. Linting

@@ -57,7 +57,7 @@ Your prompt here. This is the instruction the agent uses
when evaluating the pull request.
```

Check files live in a directory at the repository root, such as `.checks/` or `.agents/checks/`. The examples in this repo use `.checks/` — see the [`.checks/`](./.checks/) directory for ready-to-use check files you can copy into your project.
Check files live in a directory at the repository root, such as `.checks/` or `.agents/checks/`. The examples in this repo use `.checks/`. See the [`.checks/`](./.checks/) directory for ready-to-use check files you can copy into your project.
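Assembled from the fragments above, a complete minimal check file might look like this. The `description` field appears in the examples later in this spec; everything else here (the filename, the prompt wording) is illustrative, not prescribed:

```markdown
---
description: Don't add dependencies without justification
---

If the PR adds a new dependency (package.json, requirements.txt, go.mod,
or similar), check that the PR description or a commit message explains
why it is needed and why an existing dependency or the standard library
doesn't cover it. Pass if no dependencies were added.
```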

### Frontmatter Fields

@@ -221,9 +221,9 @@ description: Keep docs in sync when public APIs or configuration changes

Look for changes to public-facing APIs or configuration where the documentation wasn't updated:

- A public function, class, or type signature was added or changed — update the corresponding docs (README, docs/, or inline JSDoc/docstrings)
- Configuration options were added or renamed (config files, environment variables, CLI flags) — update the relevant documentation to reflect the new options
- API route paths or request/response shapes changed — update the API documentation
- A public function, class, or type signature was added or changed. Update the corresponding docs (README, docs/, or inline JSDoc/docstrings)
- Configuration options were added or renamed (config files, environment variables, CLI flags). Update the relevant documentation to reflect the new options
- API route paths or request/response shapes changed. Update the API documentation

No changes needed if the PR doesn't touch public APIs or configuration, or if docs were already updated.
```
@@ -268,7 +268,7 @@ When migrations are present, look for these issues:

## Running Checks Locally

Checks start local. You write a check, run it against your current branch with your coding agent, read the output, refine. The feedback loop is: edit the markdown, run the check, see the result, iterate. No infrastructure required — if your agent can read your repo, it can run your checks.
Checks start local. You write a check, run it against your current branch with your coding agent, read the output, refine. The feedback loop is: edit the markdown, run the check, see the result, iterate. No infrastructure required. If your agent can read your repo, it can run your checks.

Add check-running to your agent instructions (e.g. `AGENTS.md`) so checks run as part of the standard workflow:

@@ -282,7 +282,7 @@ The local feedback loop is the fastest way to improve checks:

1. Write or edit the check markdown file
2. Run the check against your current branch
3. Read the output — did it catch what it should? Did it flag things it shouldn't?
3. Read the output. Did it catch what it should? Did it flag things it shouldn't?
4. Refine the prompt and repeat
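The mechanical half of this loop can be scripted. A minimal sketch: `run_agent` is a stub standing in for whatever coding-agent CLI or SDK you actually use, and the prompt shape is illustrative, not part of any spec:

```python
from pathlib import Path

def run_agent(prompt: str) -> str:
    """Stub: replace with a call to your coding agent's CLI or SDK."""
    return f"agent saw {len(prompt)} characters"

def run_checks(checks_dir: Path, diff: str) -> dict[str, str]:
    """Run every check file in checks_dir against one diff, collecting raw output."""
    results = {}
    for check in sorted(checks_dir.glob("*.md")):
        prompt = f"{check.read_text()}\n\nEvaluate this pull request diff:\n{diff}"
        results[check.stem] = run_agent(prompt)
    return results
```

Feed it the output of `git diff main...HEAD` and read each result before editing the check file again.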

Because checks run inside your coding agent, you can iterate in conversation:
@@ -295,32 +295,28 @@ The migration safety check is flagging additive-only migrations as failures. Upd

Tests started local too. Developers ran them on their machines and trusted each other to do the same. That stopped working for every reason that applies to checks now: "works on my machine" divergence, the honor system (people forget, skip, rush), and silent drift.

Martin Fowler named this in his original CI article: the result has to be visible to the whole team, immediately. A red build means everyone knows. Checks need the same visibility on the PR, before merge, before the problem compounds.

The goal: run every check file as a full AI agent on every PR. Report results as native GitHub status checks. Suggest fixes when checks fail.

### The Architecture

You need five things:

1. **Trigger** PR opened, updated, or reopened
2. **Discovery** find all check files in the checks directory
3. **Agent runtime** run each check as an AI agent with the PR diff as context
4. **Verdict parser** extract pass/fail and reasoning from agent output
5. **Status reporter** post results as GitHub status checks
1. **Trigger**: PR opened, updated, or reopened
2. **Discovery**: find all check files in the checks directory
3. **Agent runtime**: run each check as an AI agent with the PR diff as context
4. **Verdict parser**: extract pass/fail and reasoning from agent output
5. **Status reporter**: post results as GitHub status checks
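Component 4 is the one with a trick to it: agents produce prose, and the workflow needs a machine-readable verdict. One workable convention (illustrative, not mandated by anything above) is to instruct each check to end its output with a `VERDICT:` line, then parse it defensively:

```python
import re

def parse_verdict(agent_output: str) -> tuple[str, str]:
    """Extract pass/fail and reasoning from raw agent output.

    Assumes the check prompt told the agent to end with a line like
    'VERDICT: pass' or 'VERDICT: fail' -- an illustrative convention.
    A missing verdict is treated as a failure rather than a silent pass.
    """
    match = re.search(r"^VERDICT:\s*(pass|fail)\s*$",
                      agent_output, re.IGNORECASE | re.MULTILINE)
    if not match:
        return "fail", "No verdict found in agent output (treated as fail)."
    verdict = match.group(1).lower()
    reasoning = agent_output[: match.start()].strip()
    return verdict, reasoning
```

Treating a missing verdict as failure keeps a confused or truncated agent run from reading as a green check.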

### Workflow

Below is a complete GitHub Actions workflow that runs checks on every PR. This is the cleanest version we know how to write — a single job, one runner, individual status checks per check file, and committable GitHub suggestion comments when checks fail.

The workflow is also available as a ready-to-use file at [`.github/workflows/checks.yml`](./.github/workflows/checks.yml).
A complete GitHub Actions workflow that runs checks on every PR is available at [`.github/workflows/checks.yml`](./.github/workflows/checks.yml). This is the cleanest version we know how to write: a single job, one runner, individual status checks per check file, and committable GitHub suggestion comments when checks fail.

When a check passes, it completes silently — no comment noise. When a check fails with actionable suggestions, it posts a pull request review with GitHub suggestion comments that can be committed directly from the PR UI. Each suggestion comment is labeled with the check name that produced it. If a check fails without suggestions (or the review can't be posted), it falls back to a plain PR comment.

> **Note:** Because check runs are created by the GitHub Actions app, the "Details" link on each check always points to the Actions workflow log — not the review comment. GitHub locks `details_url` for its own app's check runs. To control where the "Details" link points, you'd need to register a custom GitHub App that creates the check runs under its own identity, which allows setting `details_url` to any URL.

The workflow is available at [`.github/workflows/checks.yml`](./.github/workflows/checks.yml). See that file for the full implementation.

### What This Gets You

- Individual GitHub status checks per check file (green/red on the PR)
@@ -332,12 +332,12 @@ The workflow is available at [`.github/workflows/checks.yml`](./.github/workflow

### What This Doesn't Solve

- **Sequential execution** checks run one at a time. Parallelize with a matrix strategy at the cost of more runners and npm installs.
- **Non-determinism** the same check can produce different results on the same diff. No fix, only mitigation.
- **No session continuity** you can't follow up with the agent that made a judgment call. Save artifacts and use local runs for iteration.
- **No feedback loop** no built-in mechanism to track which checks produce accepted vs. rejected suggestions. Track manually.
- **Cost** each check is a full agent session. Start with few checks and monitor spend.
- **Check detail links** clicking a check name links to the Actions workflow log, not the review comment. This is a GitHub platform limitation for check runs created by the GitHub Actions app. A custom GitHub App would allow controlling the `details_url`.
- **Sequential execution**: checks run one at a time. Parallelize with a matrix strategy at the cost of more runners and npm installs.
- **Non-determinism**: the same check can produce different results on the same diff. No fix, only mitigation.
- **No session continuity**: you can't follow up with the agent that made a judgment call. Save artifacts and use local runs for iteration.
- **No feedback loop**: no built-in mechanism to track which checks produce accepted vs. rejected suggestions. Track manually.
- **Cost**: each check is a full agent session. Start with few checks and monitor spend.
- **Check detail links**: clicking a check name links to the Actions workflow log, not the review comment. This is a GitHub platform limitation for check runs created by the GitHub Actions app. A custom GitHub App would allow controlling the `details_url`.

## Problems You'll Run Into

@@ -369,7 +365,7 @@ Each check is a full agent session: reads files, processes a diff, reasons about

### Surfacing Suggested Fixes

When a check fails with suggestions, the workflow posts a pull request review with GitHub suggestion comments. These are committable directly from the PR UI one click to accept a fix. The agent returns structured suggestions with file, line range, and replacement code; the workflow constructs a review payload with ````suggestion` blocks and posts it via the GitHub API.
When a check fails with suggestions, the workflow posts a pull request review with GitHub suggestion comments. These are committable directly from the PR UI: one click to accept a fix. The agent returns structured suggestions with file, line range, and replacement code; the workflow constructs a review payload with ````suggestion` blocks and posts it via the GitHub API.
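The review payload itself is ordinary JSON against GitHub's create-review endpoint (`POST /repos/{owner}/{repo}/pulls/{number}/reviews`). A sketch of the construction step, assuming the agent returns suggestion dicts with `path`, `start_line`, `end_line`, and `replacement` keys (an illustrative schema, not something the API dictates):

```python
def build_review_payload(check_name: str, suggestions: list[dict]) -> dict:
    """Build the JSON body for GitHub's create-review endpoint.

    Each suggestion is assumed to be {'path', 'start_line', 'end_line',
    'replacement'}; line numbers refer to the new (RIGHT) side of the diff.
    """
    comments = []
    for s in suggestions:
        body = (
            f"**{check_name}** suggests:\n"
            "```suggestion\n"
            f"{s['replacement']}\n"
            "```"
        )
        comment = {"path": s["path"], "line": s["end_line"],
                   "side": "RIGHT", "body": body}
        if s["end_line"] > s["start_line"]:
            # Multi-line suggestions need an explicit start position.
            comment["start_line"] = s["start_line"]
            comment["start_side"] = "RIGHT"
        comments.append(comment)
    return {"event": "COMMENT",
            "body": f"Suggestions from the {check_name} check.",
            "comments": comments}
```

Posting this payload (e.g. with `gh api` or any HTTP client) produces the committable suggestion comments described above.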

Edge cases to watch: suggestions on lines not in the diff will be rejected by GitHub (the workflow handles this gracefully). Accepting a suggestion creates a new commit, which triggers `synchronize`, which reruns all checks. If rerun loops become a problem, add commit-author detection to skip runs triggered by `github-actions[bot]`.
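The commit-author guard can live at the top of the job. One way to sketch it, following the detection approach described above (the step and output names here are made up; adjust to your workflow):

```yaml
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Detect bot commits
        id: author
        run: echo "name=$(git log -1 --pretty=%an)" >> "$GITHUB_OUTPUT"
      - name: Run checks
        if: steps.author.outputs.name != 'github-actions[bot]'
        run: echo "run the checks here"
```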
