Skip to content

skill: requesting-code-review — adversarial framing rewrite#3

Merged
intel352 merged 17 commits into
mainfrom
feat/skills-improvements-from-failure-modes
Apr 25, 2026
Merged

skill: requesting-code-review — adversarial framing rewrite#3
intel352 merged 17 commits into
mainfrom
feat/skills-improvements-from-failure-modes

Conversation

@intel352
Copy link
Copy Markdown
Contributor

@intel352 intel352 commented Apr 25, 2026

Summary

  • skills/requesting-code-review/SKILL.md — full rewrite: adversarial framing, scope-vs-dispatch compliance gate (runs first), 9-class bug checklist, per-finding output format mandate (file:line required), iterative loop protocol (max 5 rounds), verdict vocabulary, updated review-request template
  • skills/requesting-code-review/test-fixtures/sibling-asymmetry/ — 6-file fixture demonstrating the before/after: old prompt approves a sibling-method asymmetry, new prompt flags it as Important (REQUEST-CHANGES)
  • docs/plans/2026-04-25-superpowers-skills-improvements.md — implementation plan for the full 5-PR sequence
  • docs/plans/2026-04-25-superpowers-skills-improvements-design.md — design doc capturing failure modes and approach rationale

Scope notes

  • Tasks 5 (output format mandate) and 6 (iterative loop protocol) landed in the same commit (0e0016d) — both are complete and correct; the commit message covers both.
  • The clarifying paragraph added in d085267 (two-phase model: gate first, adversarial scan second) is a necessary consequence of finding fix(tdd): post-merge Copilot round-4 fixes — class-invariant rewrite + fixture clarity #8 — adding "NOT validating" to the template without the clarification creates a logical contradiction between the template and the compliance gate mandate.

This PR is its own strongest validation

The team-internal review chain (spec-reviewer + code-reviewer) issued SHIP-IT three times across 5 rounds while Copilot found 22 substantive findings still open. The new brief — adversarial framing, full bug-class checklist, per-finding format mandate — is precisely the fix for that gap. It was needed on the PR introducing it.

This is not a process failure unique to this team or this PR. It is the reproducible baseline the brief is designed to eliminate. The gap closes when the external reviewer (Copilot, or a reviewer using the new brief) runs the full checklist adversarially rather than confirming the implementer's narrative.

Round history

Round Findings Result
1 1 Important + 1 Minor + 1 Nit (Example format, template params, synthesized marker) REQUEST-CHANGES
2 1 Important (Example skipped re-review step) REQUEST-CHANGES
3 SHIP-IT (team chain) — held for Copilot
4 16 Copilot findings (sanitization, skill text, fixtures, PR description) REQUEST-CHANGES
4 cont. 1 Important (new-prompt.md format drift) + 1 Minor (undocumented scope) REQUEST-CHANGES
5 SHIP-IT (team chain) — held for Copilot
6 6 Copilot findings (template clarity, stale ref, loop protocol, plan fence, expected-result checklist) REQUEST-CHANGES

Test plan

  • SKILL.md adversarial framing section: "NOT validating" applies to bug-class scan; compliance gate is separate structural check
  • Review-request template: two-phase note at top; phase 1 labeled "structural check", phase 2 labeled "adversarial"
  • Iterative loop: each round runs (a) compliance gate then (b) full bug-class checklist
  • No stale reference to code-reviewer.md at end of file
  • Scope-vs-dispatch gate appears before bug-class checklist
  • Bug-class "Scope-vs-dispatch drift" entry says "gate above" (not below)
  • Output format section: ### Finding N — heading with 6 required fields
  • Minor findings: do not auto-trigger re-review; orchestrator inspects before merge
  • new-prompt.md per-finding format matches SKILL.md ### Finding N — block exactly
  • expected-new-result.md: 1 Important finding + table of all 9 bug classes checked
  • Plan Task 5 outer fence uses 4 backticks (no nesting breakage)
  • Plan Task 1 section order: compliance gate at position 2 (before bug-class checklist)
  • Both plan docs: generic implementer reference (no agent/team names)

🤖 Generated with Claude Code

intel352 added 11 commits April 24, 2026 21:14
Address ten failure patterns observed in multi-agent autonomous
development sessions where the current skill set let real bugs reach
production. Headline finding: a third-party adversarial code reviewer
running on a fresh diff catches bugs the team's own reviewer agent
already approved — same base LLM, very different framing and brief.

Recommended approach: edit existing skills in place (most patterns) +
introduce ONE new mini-skill (runtime-launch-validation) for the
cross-cutting concern referenced by both finishing-a-development-branch
and writing-plans. Avoids skill proliferation; keeps cross-cutting
content in one referenceable place.

Patterns addressed:
- P1, P7: runtime-launch validation gate in finishing-a-development-branch
- P2: sibling-method coverage when fix is class invariant violation
- P3: revert-and-verify-fail invariant proof in TDD
- P4: version-skew audit step in finishing-a-development-branch
- P6 (HEADLINE): requesting-code-review rewrite — adversarial framing,
  bug-class checklist, iterative loop, scope-vs-dispatch gate,
  line-anchored output, verdict vocabulary
- P8: per-change-class verification table in writing-plans
- P9: --design-only flag in brainstorming + writing-plans
- P10: DRY team-conventions extraction in subagent-driven-development

P5 (sequential dispatch) already covered by existing
dispatching-parallel-agents skill — no change needed.

Status: design approved, full execution pipeline will run.
29 task-by-task TDD-structured tasks across 5 PRs:

- PR 1: requesting-code-review rewrite (adversarial framing,
  bug-class checklist, output format, iterative loop, scope
  compliance gate, verdict vocabulary) + sibling-asymmetry fixture
- PR 2: test-driven-development additions (Verify Regression
  Invariant step, class-invariant violation section) + narrow-
  regression fixture
- PR 3: NEW skills/runtime-launch-validation/ + finishing-a-
  development-branch Step 1b (runtime gate) + Step 1c (version-
  skew audit) + missing-pin-skew fixture
- PR 4: writing-plans verification-per-change-class table +
  --design-only flag + brainstorming --design-only mirror +
  design-only flag fixture
- PR 5: subagent-driven-development DRY refactor — extract
  agents/team-conventions.md, implementer/spec-reviewer/code-
  reviewer prompts reference conventions instead of inlining +
  DRY before/after fixture

Each PR ends with finishing-a-development-branch + pr-monitoring.
PRs sequenced 1→5 for dependency: PR 3 references the new skill
introduced in itself; PR 4's flag propagates between two skills;
PR 5 dogfoods the DRY pattern from PRs 1-4.

Implementer dogfoods: every test under PR 1+ uses revert-and-
restore proof (PR 2's discipline). Every PR review uses
adversarial brief (PR 1's discipline). PR bodies reference
team-conventions instead of inlining (PR 5's discipline).

Sanitization: generic placeholders throughout. No project,
company, technology, or incident references.

Status: design approved. Full execution pipeline runs.
Copilot AI review requested due to automatic review settings April 25, 2026 01:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the requesting-code-review skill to enforce an adversarial, checklist-driven review style and adds a “sibling-asymmetry” fixture demonstrating the behavioral difference between the old vs. new reviewer briefs. It also adds large plan/design docs describing a broader multi-PR skills-improvement initiative.

Changes:

  • Rewrites skills/requesting-code-review/SKILL.md with adversarial framing, a scope-vs-dispatch gate, bug-class checklist, mandated per-finding output format, iterative loop protocol, and verdict vocabulary.
  • Adds a sibling-asymmetry fixture (diff + dispatch + old/new prompts + expected outputs) and fixture README.
  • Adds implementation plan + design documents under docs/plans/.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
skills/requesting-code-review/SKILL.md Adds new adversarial review framing, checklist, output mandate, loop protocol, and updated dispatch template text
skills/requesting-code-review/test-fixtures/sibling-asymmetry/old-prompt.md Captures the pre-rewrite (validation-framing) reviewer brief for the fixture
skills/requesting-code-review/test-fixtures/sibling-asymmetry/new-prompt.md Captures the post-rewrite (adversarial) reviewer brief applied to the fixture diff
skills/requesting-code-review/test-fixtures/sibling-asymmetry/expected-old-result.md Expected reviewer output under the old brief (approves)
skills/requesting-code-review/test-fixtures/sibling-asymmetry/expected-new-result.md Expected reviewer output under the new brief (flags symmetry violation)
skills/requesting-code-review/test-fixtures/sibling-asymmetry/dispatch.md The “implementer dispatch” text used for scope-vs-dispatch checking in the fixture
skills/requesting-code-review/test-fixtures/sibling-asymmetry/diff.patch Minimal diff used to demonstrate sibling-method asymmetry
skills/requesting-code-review/test-fixtures/README.md Documents fixture purpose and how to add new fixtures
docs/plans/2026-04-25-superpowers-skills-improvements.md Large implementation plan describing a 5-PR sequence of skill upgrades
docs/plans/2026-04-25-superpowers-skills-improvements-design.md Design doc describing motivation/approach and mapping patterns to skills

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skills/requesting-code-review/SKILL.md Outdated
@@ -0,0 +1,55 @@
You are a code reviewer with adversarial framing. Find at least three things
wrong with this code, even if they seem minor. Bias toward finding issues.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture prompt mirrors the “Review-request template” in SKILL.md, but it currently omits the “You are NOT validating that the code matches the dispatch …” sentence that SKILL.md marks as required prompt phrasing. If the template is updated to include that sentence, update this fixture too so it stays in sync.

Suggested change
wrong with this code, even if they seem minor. Bias toward finding issues.
wrong with this code, even if they seem minor. Bias toward finding issues.
You are NOT validating that the code matches the dispatch; you are reviewing
the code change itself for bugs and issues.

Copilot uses AI. Check for mistakes.
Comment thread docs/plans/2026-04-25-superpowers-skills-improvements-design.md Outdated
Comment thread docs/plans/2026-04-25-superpowers-skills-improvements.md Outdated
Comment thread docs/plans/2026-04-25-superpowers-skills-improvements.md Outdated
Comment on lines 184 to 185
## Example

Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example that follows this header appears to still use the old placeholder-based dispatch format and old Strengths/Issues output style, which conflicts with the new required brief and per-finding output mandate above. Consider updating the example to match the new template so readers don’t copy the outdated format.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
# Superpowers Skills Improvements — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Address ten observed failure patterns in the autonomous-development pipeline by upgrading six existing skills and adding one new mini-skill for cross-cutting runtime-launch validation.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description lists only requesting-code-review skill + fixtures, but this PR also adds large implementation-plan and design documents under docs/plans/. Either update the PR description to include these docs (and why they belong in this PR) or split them into a separate PR to keep scope clear.

Copilot uses AI. Check for mistakes.

```
You are a code reviewer with adversarial framing. Find at least three things
wrong with this code, even if they seem minor. Bias toward finding issues.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Required prompt phrasing” section includes an extra sentence (“You are NOT validating that the code matches the dispatch...”), but the verbatim review-request template omits it. Either include the full required phrasing in the template (and fixtures), or relax the requirement so the template is compliant.

Suggested change
wrong with this code, even if they seem minor. Bias toward finding issues.
wrong with this code, even if they seem minor. Bias toward finding issues.
You are NOT validating that the code matches the dispatch — you are looking
for bugs the original author missed.

Copilot uses AI. Check for mistakes.
Comment thread skills/requesting-code-review/SKILL.md Outdated
intel352 and others added 2 commits April 24, 2026 21:37
Finding 1 (Important): Example section showed old Strengths/Issues/Assessment
prose format. Replace with per-finding blocks matching the output format
mandate — adversarial framing, Severity/Bug-class/Location/What/Why/Fix.
Also align Example's SHA commands with the How-to-Request template.

Finding 2 (Minor): Example dispatch now shows filled-in <diff>/<dispatch text>
placeholders matching the review-request template directly.

Finding 3 (Nit): README step 3 documents the synthesized-marker convention;
both expected-*.md files get <!-- synthesized --> header since they were
author-written, not captured from live subagent runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…terative loop)

The Example jumped from "fix" directly to "continue" — teaching agents the
exact failure mode the iterative-loop protocol was written to eliminate.

Add a round-2 review dispatch after the fix step, followed by a SHIP-IT
verdict, so the Example demonstrates the full loop: review → fix → re-review
→ SHIP-IT → proceed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 25, 2026 01:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +12 to +24
**Required prompt phrasing in every dispatch:**

> Find at least three things wrong with this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed.

Why: the same base model produces dramatically different output under "find what's wrong" vs. "review the work." Validation framing biases toward approval; adversarial framing biases toward rigor. The shift is mechanical and reproducible.

**Forbidden phrasing**: "review the work", "verify the implementation matches the plan", "confirm correctness", any prompt that implies the reviewer's job is to validate. These produce sign-offs that miss real bugs.

**Reflexive approval is forbidden.** If the reviewer finds nothing wrong, they must state which checks were run (the bug-class checklist below) and what specifically they verified. A bare "looks good" is rejected.

## Scope-vs-dispatch compliance gate (RUN FIRST)

Before any other check, the reviewer compares the PR diff against the dispatch text — the message that asked the implementer to do the work.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The required prompt phrasing says the reviewer is “NOT validating that the code matches the dispatch”, but the very next section mandates a scope-vs-dispatch compliance gate. As written, these two directives conflict and could cause reviewers to skip or underweight the compliance gate.

Suggested fix: reword the required phrasing to clarify that reviewers should not stop at validation / should go beyond dispatch compliance, while still running the scope-vs-dispatch gate first.

Copilot uses AI. Check for mistakes.
Comment thread skills/requesting-code-review/test-fixtures/README.md Outdated
Comment thread docs/plans/2026-04-25-superpowers-skills-improvements.md Outdated
intel352 and others added 2 commits April 24, 2026 21:51
Sanitization violations (docs):
- design doc line 332: generalize implementer reference
- plan line 29: generalize implementer reference
- plan line 11: clarify sanitization rule — generic tech terms OK, specific
  project/company/version/incident references not

Skill text inconsistencies:
- Error swallowing: replace invalid `_, err := ...; result` with valid
  `_, err := f(); _ = err` pattern
- Scope-vs-dispatch drift entry: "gate below" → "gate above" (gate is above)
- Iterative loop paragraph: rewrite to resolve internal contradiction — Minor
  findings do not auto-trigger re-review; orchestrator inspects before merge
- Review-request template: add "You are NOT validating..." sentence to match
  the Required prompt phrasing section
- Review-request template: replace abbreviated per-finding format with the
  full ### Finding N block format matching the output-format section
- Example dispatch: replace shorthand key-value form with verbatim brief
  (showing the actual template filled in), so agents model correct usage

Fixture inconsistencies:
- new-prompt.md: add missing "You are NOT validating..." sentence
- expected-new-result.md: fix Delete's "kind" line ref (17 → 27) and
  Inspect's location (lib/dispatcher.go:13 → :17) to match @@ -10,6 +10,12
  hunk positioning in the actual fixture diff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ound 4 cont.)

Finding 13 — internal contradiction: adversarial framing vs. compliance gate
Add clarifying note in the adversarial framing section: "NOT validating"
applies to the bug-class scan mindset, not to the structural compliance gate
which runs as a separate first-pass check. Both mandates are valid; they
apply to different phases (gate first, adversarial scan second).

Finding 14 — lib/dispatcher.go:13 wrong location
Already fixed in f748c86 (:13 → :17). No change needed.

Finding 15 — README "do not synthesize" contradicts synthesized-marker docs
Rewrite step 3: prefer real transcripts; synthesized results acceptable for
initial demonstrations only if labeled <!-- synthesized -->; encourage
replacing with live transcripts over time. No contradiction remains.

Finding 16 — plan section order doesn't match SKILL.md
Update plan Task 1 section list: compliance gate moves to position 2 (before
bug-class checklist), matching the actual SKILL.md structure and the design's
"RUN FIRST" mandate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 25, 2026 01:53
… format

Finding 1 (Important): new-prompt.md still used abbreviated bullet list for
the per-finding format after f748c86 updated SKILL.md and the review-request
template. Replace with the verbatim ### Finding N — block (6 required fields)
so the canonical fixture matches the authoritative format in SKILL.md:164-172.

Finding 2 (Minor): PR description updated (no code change) to document the
clarification paragraph added in d085267 as a necessary consequence of
finding #8 — adding "NOT validating" to the template without the two-phase
clarification creates a logical contradiction. Scope note added to PR body.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skills/requesting-code-review/SKILL.md Outdated
Comment on lines +188 to +189
Use the review-request template above. Replace `<diff>` with the output of
`git diff $BASE_SHA..$HEAD_SHA` and `<dispatch text>` with the original
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section instructs users to use the new review-request template above, but the file still ends with "See template at: requesting-code-review/code-reviewer.md". That referenced template currently uses validation framing (placeholders like WHAT_WAS_IMPLEMENTED/PLAN_OR_REQUIREMENTS) and conflicts with the new adversarial brief; either update code-reviewer.md to match the new template or remove/adjust the reference to avoid users dispatching the outdated prompt.

Suggested change
Use the review-request template above. Replace `<diff>` with the output of
`git diff $BASE_SHA..$HEAD_SHA` and `<dispatch text>` with the original
Use the adversarial review-request template in this file above. Do **not**
use any legacy template reference such as
`requesting-code-review/code-reviewer.md`. Replace `<diff>` with the output
of `git diff $BASE_SHA..$HEAD_SHA` and `<dispatch text>` with the original

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +99
One review pass is insufficient. After implementer addresses findings, the reviewer **re-reads the new diff from scratch** and runs the full bug-class checklist again. New issues commonly surface — fixes introduce new bugs, fixes only address the symptom, or the implementer interprets the finding too narrowly.

Loop:

1. Implementer dispatches review.
2. Reviewer scans (bug-class checklist) → produces findings.
3. Implementer addresses each finding, pushes new commits.
4. **Reviewer re-reads the new diff (not just the diff-of-diffs)** and runs the full checklist again.
5. Repeat until reviewer's verdict is SHIP-IT with no Critical or Important findings.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterative loop protocol says the reviewer re-reads the new diff and "runs the full checklist again", but the overall process also requires the scope-vs-dispatch compliance gate to run first. To keep the loop unambiguous (especially across multiple rounds), consider explicitly stating that each re-review re-runs the full required sequence (gate first, then bug-class scan) rather than only the bug-class checklist.

Suggested change
One review pass is insufficient. After implementer addresses findings, the reviewer **re-reads the new diff from scratch** and runs the full bug-class checklist again. New issues commonly surface — fixes introduce new bugs, fixes only address the symptom, or the implementer interprets the finding too narrowly.
Loop:
1. Implementer dispatches review.
2. Reviewer scans (bug-class checklist) → produces findings.
3. Implementer addresses each finding, pushes new commits.
4. **Reviewer re-reads the new diff (not just the diff-of-diffs)** and runs the full checklist again.
5. Repeat until reviewer's verdict is SHIP-IT with no Critical or Important findings.
One review pass is insufficient. After implementer addresses findings, the reviewer **re-reads the new diff from scratch** and re-runs the full required review sequence: **first the scope-vs-dispatch compliance gate, then the full bug-class checklist**. New issues commonly surface — fixes introduce new bugs, fixes only address the symptom, or the implementer interprets the finding too narrowly.
Loop:
1. Implementer dispatches review.
2. Reviewer runs the full required review sequence — **scope-vs-dispatch compliance gate first, then bug-class scan** → produces findings.
3. Implementer addresses each finding, pushes new commits.
4. **Reviewer re-reads the new diff (not just the diff-of-diffs)** and re-runs that same full required sequence — **scope-vs-dispatch compliance gate first, then bug-class scan**.
5. Repeat that full sequence each round until reviewer's verdict is SHIP-IT with no Critical or Important findings.

Copilot uses AI. Check for mistakes.
Comment thread docs/plans/2026-04-25-superpowers-skills-improvements.md Outdated

For each finding, use this exact format:

### Finding N — <one-line summary>
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixture's "Required output" section describes a different per-finding format (a single bullet listing fields) than the SKILL's mandated "### Finding N" block with named fields. Since this fixture is meant to demonstrate the post-rewrite prompt, it should mirror the exact template/format from skills/requesting-code-review/SKILL.md so the expected output is testing the same constraint.

Suggested change
### Finding N — <one-line summary>
### Finding N
- **Summary**: <one-line summary>

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +40
### Finding 1 — Inspect omits `kind` arg present in all sibling methods
- **Severity**: Important
- **Bug class**: Symmetry violations
- **Location**: `lib/dispatcher.go:17`
- **What's wrong**: The new `Inspect` method's args map omits `"kind": d.kind`. Sibling methods `Create` (line 11) and `Delete` (line 27) both set it as the first key.
- **Why it matters**: Receivers downstream rely on `kind` to dispatch correctly. Calling `Inspect` will silently target the wrong dispatch path or return `not found` because the receiver cannot disambiguate the request type.
- **Suggested fix**: Add `"kind": d.kind,` as the first key in the args map, matching siblings.

## Verdict: REQUEST-CHANGES

One Important finding (sibling-method symmetry violation); fix is one line. Once added and re-reviewed, expected to be SHIP-IT.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new prompt mandates "Find at least three things wrong with this code", but this expected result only produces a single finding. For the fixture to be a reliable regression test of the prompt behavior, either add two additional Minor/Nit findings that the checklist would plausibly flag (and include them in the Findings section), or revise the prompt requirement so the expected output can legitimately contain fewer than three findings when only one real issue exists.

Copilot uses AI. Check for mistakes.
…ound 6)

Finding 1 — template "NOT validating" lacks two-phase clarification
Restructure the review-request template's Required output section: rename
phase 1 as "Scope-vs-dispatch compliance gate (structural check)" and phase 2
as "Bug-class scan (adversarial)" where "NOT validating" applies. Add a Note
block at the top of the template explaining the two-phase distinction so
the brief is unambiguous when read in isolation.

Finding 2 — stale reference to code-reviewer.md
Remove final line "See template at: requesting-code-review/code-reviewer.md"
— the file ships a competing validation-framing template and contradicts the
new brief. No second template.

Finding 3 — iterative loop doesn't mandate compliance gate each round
Update loop steps 2 and 4: each round explicitly runs (a) compliance gate
then (b) full bug-class checklist, in that order.

Finding 4 — triple-backtick fence nesting breaks plan rendering
Change outer fence from ``` to ```` (4 backticks) in Task 5's markdown block
so the inner ``` fences don't prematurely close the outer block.

Finding 5 — new-prompt.md per-finding format
Already fixed in 381515e. No change needed.

Finding 6 — expected-new-result has 1 finding vs "at least three" mandate
Use the "state which checks ran" path: expand bug-class scan section into a
table showing all 9 classes with explicit results, demonstrating the reviewer
ran the full checklist. The one Important finding (symmetry violation) stands;
remaining classes documented as checked with no findings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

**Files:**
- Modify: `skills/requesting-code-review/SKILL.md`

**Step 1: Insert at the bottom of the skill (after the existing existing content)**
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate word: "existing existing".

Suggested change
**Step 1: Insert at the bottom of the skill (after the existing existing content)**
**Step 1: Insert at the bottom of the skill (after the existing content)**

Copilot uses AI. Check for mistakes.

**Required prompt phrasing in every dispatch:**

> Find at least three things wrong with this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The required prompt phrasing says to "Find at least three things wrong", but later the skill allows the reviewer to find nothing (and instead report which checks were run). As written this is internally inconsistent and may cause reviewers to manufacture findings; consider rephrasing to something like "try to find at least three" or "identify at least three potential issues/risks" while keeping the "if none, enumerate checks run" escape hatch.

Suggested change
> Find at least three things wrong with this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed.
> Try to find at least three potential issues or risks in this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed.

Copilot uses AI. Check for mistakes.
Comment on lines +245 to +246
Verdict: REQUEST-CHANGES — one Important finding (missing progress indicators); fix before merge.

Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example, round 2 says it's using the "same diff range $BASE_SHA..$HEAD_SHA — now includes the fix commits", but after pushing new commits the original HEAD_SHA no longer points at the updated tip. Suggest updating the example to recompute HEAD_SHA (or use git diff $BASE_SHA..HEAD) so the re-review actually includes the fix commits.

Copilot uses AI. Check for mistakes.
Comment on lines 186 to 190
BASE_SHA=$(git rev-parse HEAD~1) # or origin/main
HEAD_SHA=$(git rev-parse HEAD)
```

**2. Dispatch code-reviewer subagent:**
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section directs users to use the in-file review-request template, but the file later still points readers at requesting-code-review/code-reviewer.md, which currently contains the old validation-framing template. This conflicting guidance is likely to regress usage back to the old prompt; consider removing/updating the reference, or updating code-reviewer.md to match the new adversarial template and required output format.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +35
- **Suggested fix**: Add `"kind": d.kind,` as the first key in the args map, matching siblings.

## Verdict: REQUEST-CHANGES
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This finding block cites Delete as setting kind on line 27, but in the fixture diff Delete's map has "kind": d.kind on line 26 and "name": req.Name on line 27 (based on the hunk starting at +10). Updating these line refs will keep the required file:line anchoring accurate.

Copilot uses AI. Check for mistakes.
Comment on lines +472 to +473
3. Capture both old and new prompt outputs (run the actual reviewer subagent
under each prompt; do not synthesize).
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plan text says fixture expected results should be captured from live subagent runs and "do not synthesize", but the fixture README added in this PR explicitly allows synthesized results when labeled <!-- synthesized -->. Please align the plan with the repo's documented convention (prefer live transcripts, allow synthesized with explicit labeling for initial demos).

Suggested change
3. Capture both old and new prompt outputs (run the actual reviewer subagent
under each prompt; do not synthesize).
3. Capture both old and new prompt outputs. Prefer running the actual reviewer
subagent under each prompt; for an initial demo, a synthesized expected
result is allowed only if it is explicitly labeled `<!-- synthesized -->`.

Copilot uses AI. Check for mistakes.
@intel352 intel352 merged commit cf9e720 into main Apr 25, 2026
3 checks passed
intel352 added a commit that referenced this pull request Apr 25, 2026
…it fixes

Addresses 6 post-merge Copilot findings on PR #3 (Skills PR 1) plus 2 round-2 follow-ups: duplicate word in plan, 'find at least three' OR clause for ≤3 findings consistently propagated to template + example, BASE_SHA..HEAD_SHA semantic clarification, fixture line refs corrected, plan/README synthesized-OK alignment, indentation cleanup.
intel352 added a commit that referenced this pull request Apr 25, 2026
…follow-up)

Fixes line-ref errors in expected-new-result.md citing wrong line numbers relative to diff.patch hunk. Create's kind on line 12, Inspect inserted at lines 18-23, Delete's kind on line 27. Surfaced by Copilot post-merge of PR #3.
intel352 added a commit that referenced this pull request Apr 25, 2026
… plan+design

Finding #1 (plan:395): verification command pipes to grep, which exits 1 when no
matches — looks like failure. Fix: add || true, update expected output description.

Finding #2 (design:142): described markers as "HTML-comment-shaped" but the
syntax is angle-bracket tags (not HTML comments). Fix: "angle-bracket tag syntax
that most markdown renderers treat as unknown HTML tags".

Finding #3 (design:178): prose said OpenCode/Cursor "get blank entries" but
table uses sentinel value host-pass-through. Fix: prose now matches sentinel.

Finding #4 (plan:1232): "Two distinct clean Copilot review timestamps" hard gate
has no anchor policy doc. Fix: downgrade to "recommended practice" phrasing.

Finding #5 (plan:65): skill-content-grep.expected.txt listed as a created file
and in ALLOWED_FILES but the script never reads it — dead artifact. Fix: drop
the file from the task entirely.

Finding #6 (plan:141): guard script ran grep twice per token, reported line
numbers from the stripped (AWK-processed) string rather than the original file.
Fix: single-pass AWK that emits "LINENO:content" preserving original line
numbers; single grep pass with || true.

Finding #7 (plan:132): skip-regex <host: *claude-code *> only matched the exact
single-host tag form, not comma-separated lists like <host: codex, claude-code>.
Fix: AWK rule checks if /claude-code/ appears anywhere in the <host:> line,
handling any comma-separated host list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request Apr 25, 2026
… plan+design

Finding #1 (plan:395): verification command pipes to grep, which exits 1 when no
matches — looks like failure. Fix: add || true, update expected output description.

Finding #2 (design:142): described markers as "HTML-comment-shaped" but the
syntax is angle-bracket tags (not HTML comments). Fix: "angle-bracket tag syntax
that most markdown renderers treat as unknown HTML tags".

Finding #3 (design:178): prose said OpenCode/Cursor "get blank entries" but
table uses sentinel value host-pass-through. Fix: prose now matches sentinel.

Finding #4 (plan:1232): "Two distinct clean Copilot review timestamps" hard gate
has no anchor policy doc. Fix: downgrade to "recommended practice" phrasing.

Finding #5 (plan:65): skill-content-grep.expected.txt listed as a created file
and in ALLOWED_FILES but the script never reads it — dead artifact. Fix: drop
the file from the task entirely.

Finding #6 (plan:141): guard script ran grep twice per token, reported line
numbers from the stripped (AWK-processed) string rather than the original file.
Fix: single-pass AWK that emits "LINENO:content" preserving original line
numbers; single grep pass with || true.

Finding #7 (plan:132): skip-regex <host: *claude-code *> only matched the exact
single-host tag form, not comma-separated lists like <host: codex, claude-code>.
Fix: AWK rule checks if /claude-code/ appears anywhere in the <host:> line,
handling any comma-separated host list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request Apr 25, 2026
…ign docs (#13)

* fix(pr12-round2): address 7 Copilot findings on cross-LLM portability plan+design

Finding #1 (plan:395): verification command pipes to grep, which exits 1 when no
matches — looks like failure. Fix: add || true, update expected output description.

Finding #2 (design:142): described markers as "HTML-comment-shaped" but the
syntax is angle-bracket tags (not HTML comments). Fix: "angle-bracket tag syntax
that most markdown renderers treat as unknown HTML tags".

Finding #3 (design:178): prose said OpenCode/Cursor "get blank entries" but
table uses sentinel value host-pass-through. Fix: prose now matches sentinel.

Finding #4 (plan:1232): "Two distinct clean Copilot review timestamps" hard gate
has no anchor policy doc. Fix: downgrade to "recommended practice" phrasing.

Finding #5 (plan:65): skill-content-grep.expected.txt listed as a created file
and in ALLOWED_FILES but the script never reads it — dead artifact. Fix: drop
the file from the task entirely.

Finding #6 (plan:141): guard script ran grep twice per token, reported line
numbers from the stripped (AWK-processed) string rather than the original file.
Fix: single-pass AWK that emits "LINENO:content" preserving original line
numbers; single grep pass with || true.

Finding #7 (plan:132): skip-regex <host: *claude-code *> only matched the exact
single-host tag form, not comma-separated lists like <host: codex, claude-code>.
Fix: AWK rule checks if /claude-code/ appears anywhere in the <host:> line,
handling any comma-separated host list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cross-llm-plan): address 14 Copilot round-1 findings on plan+design

Plan doc (cross-llm-portability.md):
- Add lowercase sonnet/opus/haiku to forbidden-tokens list and explain why
- Add rationale for not yet including AskUserQuestion/Agent (PRs B-D)
- Remove tests/skill-content-grep.sh from ALLOWED_FILES (tests/ not scanned)
- Fix AWK skip to match only exclusive <host: claude-code> blocks, not
  multi-host tags; update inline comment accordingly
- Remove dead fail=1 inside while-pipe subshell; rely solely on $tmp check
- Add || true to all 7 skill-scoped grep verification commands (lines
  443, 517, 626, 694, 764, 844, 888) — grep exits 1 on no-match (PASS case)

Design doc (cross-llm-portability-design.md):
- Fix broken path: subagent-driven-development.md →
  skills/subagent-driven-development/SKILL.md
- Remove false gpt-5.x claim from grep guard description; clarify that
  only Claude-Code brand names are guarded, Codex names use <host:> markers
- Replace "Open questions deferred to implementation" with resolved decisions:
  AWK guard approach, angle-bracket marker syntax, OpenCode tool mapping,
  Cursor support depth scope

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cross-llm-plan): address 7 Copilot round-2 findings on plan + design docs

- set -euo pipefail (was set -u only)
- Fix stray || true in step-title prose and GHA step name
- Verification commands capture guard output before grepping to avoid masking execution errors
- Fix prose claiming gpt-5.x tokens are guarded (guard only covers Claude brand names)
- Design doc: clarify angle-bracket marker advantage is readability in rendered Markdown

* fix(cross-llm-plan): add end-anchor to AWK skip regex so multi-host blocks are never skipped

Agent-Logs-Url: https://github.com/GoCodeAlone/claude-superpowers/sessions/24d5a30f-f7b3-43e7-83dd-60c470d0f4a4

Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>

* Update docs/plans/2026-04-25-cross-llm-portability.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants