Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .claude/skills/maintainer-review
540 changes: 540 additions & 0 deletions .github/skills/maintainer-review/SKILL.md

Large diffs are not rendered by default.

208 changes: 208 additions & 0 deletions .github/skills/maintainer-review/adversarial.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,208 @@
<!-- SPDX-License-Identifier: Apache-2.0
https://www.apache.org/licenses/LICENSE-2.0 -->

# Adversarial reviewers — second-read integration

Some maintainers run a **second LLM reviewer** alongside their
own reading and the in-skill review to catch blind spots one
model would miss. The skill supports integrating any such
reviewer that exposes itself as a slash command in the
maintainer's harness — the maintainer names the command at
invocation time and the skill works it into the per-PR loop.

The skill does not ship a dependency on any particular plugin.
If the maintainer has none configured, Step 5 of
[`review-flow.md`](review-flow.md) is a no-op.

---

## Why bother

Two LLM reviewers with different training data flag different
classes of mistakes. The cost is one extra slash-command turn;
the benefit is meaningful for upstream PRs that land in front
of thousands of contributors. Adversarial framing — *"prove this
PR is wrong"* rather than *"check this PR for issues"* — pushes
harder on auth, data-loss, and race-condition assumptions, which
is the right gate for code that ships.

---

## How the maintainer configures one

Pass the slash command to invoke as the `with-reviewer:`
selector:

```text
/maintainer-review with-reviewer:/some-plugin:adversarial-review
```

The skill stores that command for the session and proposes it
at Step 5 of [`review-flow.md`](review-flow.md) on every PR.

To skip the proposal entirely for a session, pass
`no-adversarial`. To re-enable on the next session, drop the
flag — `with-reviewer:` does not persist across sessions
(deliberately; reviewers and their command names change, and
implicit state across sessions is hostile to maintainers who
share a checkout).

If the maintainer wants the same reviewer to be the default
across sessions, they put the slash command under a "Review
preferences" heading in their agent-instructions file —
project-scope `AGENTS.md` is the agent-agnostic canonical
location; harness-specific files (`.claude/CLAUDE.md`,
`~/.claude/CLAUDE.md`) work too. The skill picks the command up
from there at session start. The skill does **not** auto-discover
plugins or scan installed extensions.

---

## The "assistant proposes, user fires" constraint

Slash commands cannot be invoked from the assistant side. They
are user-side commands provided by the harness; only the human
user — or a configured hook — can fire them.

This means the per-PR adversarial step is a two-turn dance:

1. **Assistant proposes** — at Step 5 of
[`review-flow.md`](review-flow.md), the assistant writes the
proposal text and pauses:

> *I've drafted my findings for PR #N (1 major, 3 minor;
> see above). Want a second read? Type
> `<ADVERSARIAL_COMMAND>` and I'll wait for the result. Or
> `[N]o` / `[Q]uit` to skip.*

`<ADVERSARIAL_COMMAND>` is the slash command the maintainer
passed via `with-reviewer:` (or pulled from their
agent-instructions file). The assistant types it back
**literally** so the maintainer can copy-paste it; it does
not paraphrase.

2. **User fires** — the maintainer types the slash command. It
runs in the conversation and emits its findings as a normal
message.

3. **Assistant integrates** — on its next turn, the assistant
reads the second reviewer's findings, deduplicates against
its own list, marks each finding with its `source:
primary | adversarial | both`, and continues from Step 6 of
[`review-flow.md`](review-flow.md).

The assistant never *promises* to invoke the slash command
itself. *"Running the adversarial review now…"* is the wrong
phrasing — it implies the assistant is about to fire the
command. The right phrasing is *"Type `<ADVERSARIAL_COMMAND>`
and I'll wait."*

---

## Background mode for large diffs

If the second reviewer supports a background-run flag (most do
for large diffs), the maintainer can pass it as part of the
slash command. The skill's proposal becomes:

> *Diff is large (47 files, +1.2k −800). Suggest:
> `<ADVERSARIAL_COMMAND> --background` so it runs async. When
> it finishes, surface the output (whatever your reviewer's
> result-fetch command is — e.g. `/<plugin>:result`) and I'll
> wait.*

Once the user pastes the result back, the assistant integrates
as in step 3 above.

The skill does not assume any particular result-fetch command
exists. If the maintainer's reviewer doesn't support
background mode, drop the suggestion.

---

## What the integration looks like

After the second reviewer returns, the assistant produces a
**combined findings list**. Each finding is annotated with its
source:

```yaml
- file: providers/foo/src/airflow/providers/foo/hook.py
line: 142
rule_source: .github/instructions/code-review.instructions.md
rule_id: "Imports inside function bodies"
source: both # ← primary AND adversarial flagged this
severity: minor

- file: providers/foo/src/airflow/providers/foo/hook.py
line: 89
rule_source: adversarial (security)
rule_id: "Unbounded recursion on user-supplied JSON"
source: adversarial # ← adversarial-only
severity: blocking # ← adversarial flagged a real bug the primary missed

- file: providers/foo/tests/unit/foo/test_hook.py
line: 33
rule_source: AGENTS.md ("Use spec/autospec when mocking")
rule_id: "Unspec'd Mock"
source: primary # ← primary-only
severity: minor
```

The disposition pick (Step 6) then weighs the **combined**
findings: a `blocking` from the second reviewer flips the
disposition the same way a `blocking` from the primary review
does.

---

## Conflict between reviewers

If the primary review says *"this is fine"* and the second
reviewer says *"this is broken"* (or vice versa), surface the
disagreement to the maintainer **explicitly**:

> *Reviewers disagree on file.py:N. Primary: no concern.
> Adversarial: potential race condition (quoted: "the lock is
> released before the assertion"). Want me to drill into it,
> or defer to your judgment?*

Do not silently pick one. Disagreements between two LLM
reviewers are exactly the moments where the human reviewer's
judgment is most valuable; suppressing them defeats the
purpose of running two reviewers.

---

## When no adversarial reviewer is configured

If the maintainer didn't pass `with-reviewer:` and there's no
"Review preferences" entry in their agent-instructions file,
the skill announces once at session start:

> *No adversarial reviewer configured. Reviews this session use
> only my own pass. Pass `with-reviewer:<command>` next time if
> you want a second read.*

…and skips Step 5 entirely. The session summary notes which
PRs went through with single-reviewer coverage so the
maintainer can decide whether to back-fill manually later.

---

## Hook-based automation

Some plugins ship a `Stop` hook that auto-runs a generic review
at the end of each model turn. If the maintainer has one of
those installed, **the per-PR Step 5 proposal still runs
explicitly**. The two are independent:

- The hook runs at end-of-turn — catches anything obvious in
the working state of files (and may run a non-adversarial
variant of the reviewer).
- The skill's Step 5 proposes the **adversarial** variant
pointed at the specific PR diff, with adversarial framing
(auth / data-loss / race-condition default questioning).

Don't conflate the two. The hook is a safety net at end-of-
turn; the per-PR adversarial step is the actual review tool.
172 changes: 172 additions & 0 deletions .github/skills/maintainer-review/criteria.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
<!-- SPDX-License-Identifier: Apache-2.0
https://www.apache.org/licenses/LICENSE-2.0 -->

# Review criteria — pointers to source

This file is a **navigation map** for the project's review
criteria. It does not restate the rules — those live in the
source files below and are the single source of truth. The
skill's review pass reads them at session start (and re-reads
the per-area `AGENTS.md` files as PRs route into different
trees) and quotes the **source rule verbatim** in any finding
it raises.

If you find yourself wanting to "summarise the rule" in this
file or in a finding body, **stop and link to the source line
or section instead**. Summaries drift; links don't.

---

## Source files

| File | What it covers |
|---|---|
| [`.github/instructions/code-review.instructions.md`](../../../.github/instructions/code-review.instructions.md) | The rule set every Apache Airflow PR is reviewed against (architecture / DB / quality / testing / API / UI / generated files / AI-generated-code signals / quality signals). |
| [`AGENTS.md`](../../../AGENTS.md) | Repo-wide AI/agent instructions (architecture boundaries, security model, coding standards, testing standards, commits & PR conventions). |
| [`registry/AGENTS.md`](../../../registry/AGENTS.md) | Registry-tree-specific rules. |
| [`dev/AGENTS.md`](../../../dev/AGENTS.md) | `dev/` scripts conventions. |
| [`dev/ide_setup/AGENTS.md`](../../../dev/ide_setup/AGENTS.md) | IDE bootstrap conventions. |
| [`providers/AGENTS.md`](../../../providers/AGENTS.md) | Provider-tree boundary, compat-layer, and provider-yaml expectations. |
| [`providers/elasticsearch/AGENTS.md`](../../../providers/elasticsearch/AGENTS.md) | Elasticsearch-specific rules. |
| [`providers/opensearch/AGENTS.md`](../../../providers/opensearch/AGENTS.md) | OpenSearch-specific rules. |
| [`airflow-core/docs/security/security_model.rst`](../../../airflow-core/docs/security/security_model.rst) | The documented security model — what *is* and *isn't* a vulnerability. |

The per-PR review flow re-runs `git ls-files` against the
touched paths to discover any other `AGENTS.md` not in this
table; see [`review-flow.md#area-specific-overlay`](review-flow.md).

---

## Categories — link out to the source section

The headings below mirror the section structure of the source
files; click through for the actual rule text.

### Architecture boundaries

[`code-review.instructions.md` § Architecture Boundaries](../../../.github/instructions/code-review.instructions.md#architecture-boundaries) ·
[`AGENTS.md` § Architecture Boundaries](../../../AGENTS.md#architecture-boundaries)

### Database / query correctness

[`code-review.instructions.md` § Database and Query Correctness](../../../.github/instructions/code-review.instructions.md#database-and-query-correctness)

### Code quality

[`code-review.instructions.md` § Code Quality Rules](../../../.github/instructions/code-review.instructions.md#code-quality-rules) ·
[`AGENTS.md` § Coding Standards](../../../AGENTS.md#coding-standards)

### Testing

[`code-review.instructions.md` § Testing Requirements](../../../.github/instructions/code-review.instructions.md#testing-requirements) ·
[`AGENTS.md` § Testing Standards](../../../AGENTS.md#testing-standards)

### API correctness

[`code-review.instructions.md` § API Correctness](../../../.github/instructions/code-review.instructions.md#api-correctness)

### UI (React/TypeScript)

[`code-review.instructions.md` § UI Code (React/TypeScript)](../../../.github/instructions/code-review.instructions.md#ui-code-reacttypescript)

### Generated files

[`code-review.instructions.md` § Generated Files](../../../.github/instructions/code-review.instructions.md#generated-files)

### AI-generated code signals

[`code-review.instructions.md` § AI-Generated Code Signals](../../../.github/instructions/code-review.instructions.md#ai-generated-code-signals)

### Quality signals to check

[`code-review.instructions.md` § Quality Signals to Check](../../../.github/instructions/code-review.instructions.md#quality-signals-to-check)

### Commits and PRs (newsfragments, commit messages, tracking issues)

[`AGENTS.md` § Commits and PRs](../../../AGENTS.md#commits-and-prs)

---

## Provider-specific signals

When a PR touches `providers/<name>/`, the skill reads (and
quotes from) the provider-tree files in addition to the
repo-wide ones:

- [`providers/AGENTS.md`](../../../providers/AGENTS.md) — the
provider-boundary, compat-layer, and `provider.yaml`
expectations apply.
- `providers/<name>/AGENTS.md` if present — provider-specific
rules (e.g.
[`providers/elasticsearch/AGENTS.md`](../../../providers/elasticsearch/AGENTS.md),
[`providers/opensearch/AGENTS.md`](../../../providers/opensearch/AGENTS.md)).

If the provider's tree has no `AGENTS.md`, the repo-wide rules
are still in effect.

---

## Security model — calibration

Before flagging anything that looks security-flavoured, read
the documented security model at
[`airflow-core/docs/security/security_model.rst`](../../../airflow-core/docs/security/security_model.rst)
and the
[`AGENTS.md` § Security Model](../../../AGENTS.md#security-model)
calibration guide. The latter is short and tells you how to
distinguish:

1. an **actual vulnerability** that violates the documented
model — flag as blocking,
2. a **known limitation** that's already documented as
intentional — do not flag,
3. a **deployment-hardening opportunity** — belongs in
deployment guidance, not as a code finding.

When the skill downgrades what looked like a finding because
the documented model permits it, the review body **quotes the
relevant model paragraph** so the contributor sees the
calibration explicitly. Don't paraphrase.

---

## Backports and version-specific PRs

Branch `vX-Y-test` PRs are backports of already-merged `main`
work. They aren't called out in the repo-wide files, so the
calibration is local to this skill:

- **Diff parity**: does this match what was merged on `main`?
- **Cherry-pick conflicts**: did the resolution introduce new
changes that need scrutiny?
- **API/migration version markers**: backports should not
introduce new Cadwyn version bumps; if they do, that's a
finding (cite
[`code-review.instructions.md` § API Correctness](../../../.github/instructions/code-review.instructions.md#api-correctness)).

For these PRs, prefer `COMMENT` over `REQUEST_CHANGES` unless
the cherry-pick has clearly drifted from the `main` change.

---

## Conflict between source rules

If the per-area `AGENTS.md` rules **conflict** with the
repo-wide ones (rare; usually a more specific override), the
more specific one wins — but the conflict is surfaced to the
maintainer for explicit acceptance during disposition pick
(see [`review-flow.md`](review-flow.md)).

---

## When in doubt — defer

If after reading the diff you're not sure whether something is
a finding or just a style preference, **do not flag it**.
Surface the uncertainty to the maintainer (one line:
*"Hmm — line N does X, which I'm not sure violates the rules;
flagging for your eye."*) and let them decide. The cost of an
over-zealous auto-finding is a contributor who feels
nitpicked; the cost of a missed nit is one round of
back-and-forth a maintainer can catch easily on their own
pass.
Loading
Loading