skill: requesting-code-review — adversarial framing rewrite#3
Conversation
Address ten failure patterns observed in multi-agent autonomous development sessions where the current skill set let real bugs reach production. Headline finding: a third-party adversarial code reviewer running on a fresh diff catches bugs the team's own reviewer agent already approved — same base LLM, very different framing and brief. Recommended approach: edit existing skills in place (most patterns) + introduce ONE new mini-skill (runtime-launch-validation) for the cross-cutting concern referenced by both finishing-a-development-branch and writing-plans. Avoids skill proliferation; keeps cross-cutting content in one referenceable place. Patterns addressed: - P1, P7: runtime-launch validation gate in finishing-a-development-branch - P2: sibling-method coverage when fix is class invariant violation - P3: revert-and-verify-fail invariant proof in TDD - P4: version-skew audit step in finishing-a-development-branch - P6 (HEADLINE): requesting-code-review rewrite — adversarial framing, bug-class checklist, iterative loop, scope-vs-dispatch gate, line-anchored output, verdict vocabulary - P8: per-change-class verification table in writing-plans - P9: --design-only flag in brainstorming + writing-plans - P10: DRY team-conventions extraction in subagent-driven-development P5 (sequential dispatch) already covered by existing dispatching-parallel-agents skill — no change needed. Status: design approved, full execution pipeline will run.
29 task-by-task TDD-structured tasks across 5 PRs: - PR 1: requesting-code-review rewrite (adversarial framing, bug-class checklist, output format, iterative loop, scope compliance gate, verdict vocabulary) + sibling-asymmetry fixture - PR 2: test-driven-development additions (Verify Regression Invariant step, class-invariant violation section) + narrow- regression fixture - PR 3: NEW skills/runtime-launch-validation/ + finishing-a- development-branch Step 1b (runtime gate) + Step 1c (version- skew audit) + missing-pin-skew fixture - PR 4: writing-plans verification-per-change-class table + --design-only flag + brainstorming --design-only mirror + design-only flag fixture - PR 5: subagent-driven-development DRY refactor — extract agents/team-conventions.md, implementer/spec-reviewer/code- reviewer prompts reference conventions instead of inlining + DRY before/after fixture Each PR ends with finishing-a-development-branch + pr-monitoring. PRs sequenced 1→5 for dependency: PR 3 references the new skill introduced in itself; PR 4's flag propagates between two skills; PR 5 dogfoods the DRY pattern from PRs 1-4. Implementer dogfoods: every test under PR 1+ uses revert-and- restore proof (PR 2's discipline). Every PR review uses adversarial brief (PR 1's discipline). PR bodies reference team-conventions instead of inlining (PR 5's discipline). Sanitization: generic placeholders throughout. No project, company, technology, or incident references. Status: design approved. Full execution pipeline runs.
There was a problem hiding this comment.
Pull request overview
This PR updates the requesting-code-review skill to enforce an adversarial, checklist-driven review style and adds a “sibling-asymmetry” fixture demonstrating the behavioral difference between the old vs. new reviewer briefs. It also adds large plan/design docs describing a broader multi-PR skills-improvement initiative.
Changes:
- Rewrites
skills/requesting-code-review/SKILL.mdwith adversarial framing, a scope-vs-dispatch gate, bug-class checklist, mandated per-finding output format, iterative loop protocol, and verdict vocabulary. - Adds a
sibling-asymmetryfixture (diff + dispatch + old/new prompts + expected outputs) and fixture README. - Adds implementation plan + design documents under
docs/plans/.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/requesting-code-review/SKILL.md | Adds new adversarial review framing, checklist, output mandate, loop protocol, and updated dispatch template text |
| skills/requesting-code-review/test-fixtures/sibling-asymmetry/old-prompt.md | Captures the pre-rewrite (validation-framing) reviewer brief for the fixture |
| skills/requesting-code-review/test-fixtures/sibling-asymmetry/new-prompt.md | Captures the post-rewrite (adversarial) reviewer brief applied to the fixture diff |
| skills/requesting-code-review/test-fixtures/sibling-asymmetry/expected-old-result.md | Expected reviewer output under the old brief (approves) |
| skills/requesting-code-review/test-fixtures/sibling-asymmetry/expected-new-result.md | Expected reviewer output under the new brief (flags symmetry violation) |
| skills/requesting-code-review/test-fixtures/sibling-asymmetry/dispatch.md | The “implementer dispatch” text used for scope-vs-dispatch checking in the fixture |
| skills/requesting-code-review/test-fixtures/sibling-asymmetry/diff.patch | Minimal diff used to demonstrate sibling-method asymmetry |
| skills/requesting-code-review/test-fixtures/README.md | Documents fixture purpose and how to add new fixtures |
| docs/plans/2026-04-25-superpowers-skills-improvements.md | Large implementation plan describing a 5-PR sequence of skill upgrades |
| docs/plans/2026-04-25-superpowers-skills-improvements-design.md | Design doc describing motivation/approach and mapping patterns to skills |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,55 @@ | |||
| You are a code reviewer with adversarial framing. Find at least three things | |||
| wrong with this code, even if they seem minor. Bias toward finding issues. | |||
There was a problem hiding this comment.
This fixture prompt mirrors the “Review-request template” in SKILL.md, but it currently omits the “You are NOT validating that the code matches the dispatch …” sentence that SKILL.md marks as required prompt phrasing. If the template is updated to include that sentence, update this fixture too so it stays in sync.
| wrong with this code, even if they seem minor. Bias toward finding issues. | |
| wrong with this code, even if they seem minor. Bias toward finding issues. | |
| You are NOT validating that the code matches the dispatch; you are reviewing | |
| the code change itself for bugs and issues. |
| ## Example | ||
|
|
There was a problem hiding this comment.
The example that follows this header appears to still use the old placeholder-based dispatch format and old Strengths/Issues output style, which conflicts with the new required brief and per-finding output mandate above. Consider updating the example to match the new template so readers don’t copy the outdated format.
| # Superpowers Skills Improvements — Implementation Plan | ||
|
|
||
| > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. | ||
|
|
||
| **Goal:** Address ten observed failure patterns in the autonomous-development pipeline by upgrading six existing skills and adding one new mini-skill for cross-cutting runtime-launch validation. |
There was a problem hiding this comment.
PR description lists only requesting-code-review skill + fixtures, but this PR also adds large implementation-plan and design documents under docs/plans/. Either update the PR description to include these docs (and why they belong in this PR) or split them into a separate PR to keep scope clear.
|
|
||
| ``` | ||
| You are a code reviewer with adversarial framing. Find at least three things | ||
| wrong with this code, even if they seem minor. Bias toward finding issues. |
There was a problem hiding this comment.
The “Required prompt phrasing” section includes an extra sentence (“You are NOT validating that the code matches the dispatch...”), but the verbatim review-request template omits it. Either include the full required phrasing in the template (and fixtures), or relax the requirement so the template is compliant.
| wrong with this code, even if they seem minor. Bias toward finding issues. | |
| wrong with this code, even if they seem minor. Bias toward finding issues. | |
| You are NOT validating that the code matches the dispatch — you are looking | |
| for bugs the original author missed. |
Finding 1 (Important): Example section showed old Strengths/Issues/Assessment prose format. Replace with per-finding blocks matching the output format mandate — adversarial framing, Severity/Bug-class/Location/What/Why/Fix. Also align Example's SHA commands with the How-to-Request template. Finding 2 (Minor): Example dispatch now shows filled-in <diff>/<dispatch text> placeholders matching the review-request template directly. Finding 3 (Nit): README step 3 documents the synthesized-marker convention; both expected-*.md files get <!-- synthesized --> header since they were author-written, not captured from live subagent runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…terative loop) The Example jumped from "fix" directly to "continue" — teaching agents the exact failure mode the iterative-loop protocol was written to eliminate. Add a round-2 review dispatch after the fix step, followed by a SHIP-IT verdict, so the Example demonstrates the full loop: review → fix → re-review → SHIP-IT → proceed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| **Required prompt phrasing in every dispatch:** | ||
|
|
||
| > Find at least three things wrong with this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed. | ||
|
|
||
| Why: the same base model produces dramatically different output under "find what's wrong" vs. "review the work." Validation framing biases toward approval; adversarial framing biases toward rigor. The shift is mechanical and reproducible. | ||
|
|
||
| **Forbidden phrasing**: "review the work", "verify the implementation matches the plan", "confirm correctness", any prompt that implies the reviewer's job is to validate. These produce sign-offs that miss real bugs. | ||
|
|
||
| **Reflexive approval is forbidden.** If the reviewer finds nothing wrong, they must state which checks were run (the bug-class checklist below) and what specifically they verified. A bare "looks good" is rejected. | ||
|
|
||
| ## Scope-vs-dispatch compliance gate (RUN FIRST) | ||
|
|
||
| Before any other check, the reviewer compares the PR diff against the dispatch text — the message that asked the implementer to do the work. |
There was a problem hiding this comment.
The required prompt phrasing says the reviewer is “NOT validating that the code matches the dispatch”, but the very next section mandates a scope-vs-dispatch compliance gate. As written, these two directives conflict and could cause reviewers to skip or underweight the compliance gate.
Suggested fix: reword the required phrasing to clarify that reviewers should not stop at validation / should go beyond dispatch compliance, while still running the scope-vs-dispatch gate first.
Sanitization violations (docs): - design doc line 332: generalize implementer reference - plan line 29: generalize implementer reference - plan line 11: clarify sanitization rule — generic tech terms OK, specific project/company/version/incident references not Skill text inconsistencies: - Error swallowing: replace invalid `_, err := ...; result` with valid `_, err := f(); _ = err` pattern - Scope-vs-dispatch drift entry: "gate below" → "gate above" (gate is above) - Iterative loop paragraph: rewrite to resolve internal contradiction — Minor findings do not auto-trigger re-review; orchestrator inspects before merge - Review-request template: add "You are NOT validating..." sentence to match the Required prompt phrasing section - Review-request template: replace abbreviated per-finding format with the full ### Finding N block format matching the output-format section - Example dispatch: replace shorthand key-value form with verbatim brief (showing the actual template filled in), so agents model correct usage Fixture inconsistencies: - new-prompt.md: add missing "You are NOT validating..." sentence - expected-new-result.md: fix Delete's "kind" line ref (17 → 27) and Inspect's location (lib/dispatcher.go:13 → :17) to match @@ -10,6 +10,12 hunk positioning in the actual fixture diff Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ound 4 cont.) Finding 13 — internal contradiction: adversarial framing vs. compliance gate Add clarifying note in the adversarial framing section: "NOT validating" applies to the bug-class scan mindset, not to the structural compliance gate which runs as a separate first-pass check. Both mandates are valid; they apply to different phases (gate first, adversarial scan second). Finding 14 — lib/dispatcher.go:13 wrong location Already fixed in f748c86 (:13 → :17). No change needed. Finding 15 — README "do not synthesize" contradicts synthesized-marker docs Rewrite step 3: prefer real transcripts; synthesized results acceptable for initial demonstrations only if labeled <!-- synthesized -->; encourage replacing with live transcripts over time. No contradiction remains. Finding 16 — plan section order doesn't match SKILL.md Update plan Task 1 section list: compliance gate moves to position 2 (before bug-class checklist), matching the actual SKILL.md structure and the design's "RUN FIRST" mandate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… format Finding 1 (Important): new-prompt.md still used abbreviated bullet list for the per-finding format after f748c86 updated SKILL.md and the review-request template. Replace with the verbatim ### Finding N — block (6 required fields) so the canonical fixture matches the authoritative format in SKILL.md:164-172. Finding 2 (Minor): PR description updated (no code change) to document the clarification paragraph added in d085267 as a necessary consequence of finding #8 — adding "NOT validating" to the template without the two-phase clarification creates a logical contradiction. Scope note added to PR body. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Use the review-request template above. Replace `<diff>` with the output of | ||
| `git diff $BASE_SHA..$HEAD_SHA` and `<dispatch text>` with the original |
There was a problem hiding this comment.
This section instructs users to use the new review-request template above, but the file still ends with "See template at: requesting-code-review/code-reviewer.md". That referenced template currently uses validation framing (placeholders like WHAT_WAS_IMPLEMENTED/PLAN_OR_REQUIREMENTS) and conflicts with the new adversarial brief; either update code-reviewer.md to match the new template or remove/adjust the reference to avoid users dispatching the outdated prompt.
| Use the review-request template above. Replace `<diff>` with the output of | |
| `git diff $BASE_SHA..$HEAD_SHA` and `<dispatch text>` with the original | |
| Use the adversarial review-request template in this file above. Do **not** | |
| use any legacy template reference such as | |
| `requesting-code-review/code-reviewer.md`. Replace `<diff>` with the output | |
| of `git diff $BASE_SHA..$HEAD_SHA` and `<dispatch text>` with the original |
| One review pass is insufficient. After implementer addresses findings, the reviewer **re-reads the new diff from scratch** and runs the full bug-class checklist again. New issues commonly surface — fixes introduce new bugs, fixes only address the symptom, or the implementer interprets the finding too narrowly. | ||
|
|
||
| Loop: | ||
|
|
||
| 1. Implementer dispatches review. | ||
| 2. Reviewer scans (bug-class checklist) → produces findings. | ||
| 3. Implementer addresses each finding, pushes new commits. | ||
| 4. **Reviewer re-reads the new diff (not just the diff-of-diffs)** and runs the full checklist again. | ||
| 5. Repeat until reviewer's verdict is SHIP-IT with no Critical or Important findings. |
There was a problem hiding this comment.
The iterative loop protocol says the reviewer re-reads the new diff and "runs the full checklist again", but the overall process also requires the scope-vs-dispatch compliance gate to run first. To keep the loop unambiguous (especially across multiple rounds), consider explicitly stating that each re-review re-runs the full required sequence (gate first, then bug-class scan) rather than only the bug-class checklist.
| One review pass is insufficient. After implementer addresses findings, the reviewer **re-reads the new diff from scratch** and runs the full bug-class checklist again. New issues commonly surface — fixes introduce new bugs, fixes only address the symptom, or the implementer interprets the finding too narrowly. | |
| Loop: | |
| 1. Implementer dispatches review. | |
| 2. Reviewer scans (bug-class checklist) → produces findings. | |
| 3. Implementer addresses each finding, pushes new commits. | |
| 4. **Reviewer re-reads the new diff (not just the diff-of-diffs)** and runs the full checklist again. | |
| 5. Repeat until reviewer's verdict is SHIP-IT with no Critical or Important findings. | |
| One review pass is insufficient. After implementer addresses findings, the reviewer **re-reads the new diff from scratch** and re-runs the full required review sequence: **first the scope-vs-dispatch compliance gate, then the full bug-class checklist**. New issues commonly surface — fixes introduce new bugs, fixes only address the symptom, or the implementer interprets the finding too narrowly. | |
| Loop: | |
| 1. Implementer dispatches review. | |
| 2. Reviewer runs the full required review sequence — **scope-vs-dispatch compliance gate first, then bug-class scan** → produces findings. | |
| 3. Implementer addresses each finding, pushes new commits. | |
| 4. **Reviewer re-reads the new diff (not just the diff-of-diffs)** and re-runs that same full required sequence — **scope-vs-dispatch compliance gate first, then bug-class scan**. | |
| 5. Repeat that full sequence each round until reviewer's verdict is SHIP-IT with no Critical or Important findings. |
|
|
||
| For each finding, use this exact format: | ||
|
|
||
| ### Finding N — <one-line summary> |
There was a problem hiding this comment.
The fixture's "Required output" section describes a different per-finding format (a single bullet listing fields) than the SKILL's mandated "### Finding N" block with named fields. Since this fixture is meant to demonstrate the post-rewrite prompt, it should mirror the exact template/format from skills/requesting-code-review/SKILL.md so the expected output is testing the same constraint.
| ### Finding N — <one-line summary> | |
| ### Finding N | |
| - **Summary**: <one-line summary> |
| ### Finding 1 — Inspect omits `kind` arg present in all sibling methods | ||
| - **Severity**: Important | ||
| - **Bug class**: Symmetry violations | ||
| - **Location**: `lib/dispatcher.go:17` | ||
| - **What's wrong**: The new `Inspect` method's args map omits `"kind": d.kind`. Sibling methods `Create` (line 11) and `Delete` (line 27) both set it as the first key. | ||
| - **Why it matters**: Receivers downstream rely on `kind` to dispatch correctly. Calling `Inspect` will silently target the wrong dispatch path or return `not found` because the receiver cannot disambiguate the request type. | ||
| - **Suggested fix**: Add `"kind": d.kind,` as the first key in the args map, matching siblings. | ||
|
|
||
| ## Verdict: REQUEST-CHANGES | ||
|
|
||
| One Important finding (sibling-method symmetry violation); fix is one line. Once added and re-reviewed, expected to be SHIP-IT. |
There was a problem hiding this comment.
The new prompt mandates "Find at least three things wrong with this code", but this expected result only produces a single finding. For the fixture to be a reliable regression test of the prompt behavior, either add two additional Minor/Nit findings that the checklist would plausibly flag (and include them in the Findings section), or revise the prompt requirement so the expected output can legitimately contain fewer than three findings when only one real issue exists.
…ound 6) Finding 1 — template "NOT validating" lacks two-phase clarification Restructure the review-request template's Required output section: rename phase 1 as "Scope-vs-dispatch compliance gate (structural check)" and phase 2 as "Bug-class scan (adversarial)" where "NOT validating" applies. Add a Note block at the top of the template explaining the two-phase distinction so the brief is unambiguous when read in isolation. Finding 2 — stale reference to code-reviewer.md Remove final line "See template at: requesting-code-review/code-reviewer.md" — the file ships a competing validation-framing template and contradicts the new brief. No second template. Finding 3 — iterative loop doesn't mandate compliance gate each round Update loop steps 2 and 4: each round explicitly runs (a) compliance gate then (b) full bug-class checklist, in that order. Finding 4 — triple-backtick fence nesting breaks plan rendering Change outer fence from ``` to ```` (4 backticks) in Task 5's markdown block so the inner ``` fences don't prematurely close the outer block. Finding 5 — new-prompt.md per-finding format Already fixed in 381515e. No change needed. Finding 6 — expected-new-result has 1 finding vs "at least three" mandate Use the "state which checks ran" path: expand bug-class scan section into a table showing all 9 classes with explicit results, demonstrating the reviewer ran the full checklist. The one Important finding (symmetry violation) stands; remaining classes documented as checked with no findings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| **Files:** | ||
| - Modify: `skills/requesting-code-review/SKILL.md` | ||
|
|
||
| **Step 1: Insert at the bottom of the skill (after the existing existing content)** |
There was a problem hiding this comment.
Duplicate word: "existing existing".
| **Step 1: Insert at the bottom of the skill (after the existing existing content)** | |
| **Step 1: Insert at the bottom of the skill (after the existing content)** |
|
|
||
| **Required prompt phrasing in every dispatch:** | ||
|
|
||
| > Find at least three things wrong with this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed. |
There was a problem hiding this comment.
The required prompt phrasing says to "Find at least three things wrong", but later the skill allows the reviewer to find nothing (and instead report which checks were run). As written this is internally inconsistent and may cause reviewers to manufacture findings; consider rephrasing to something like "try to find at least three" or "identify at least three potential issues/risks" while keeping the "if none, enumerate checks run" escape hatch.
| > Find at least three things wrong with this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed. | |
| > Try to find at least three potential issues or risks in this code, even if they seem minor. Bias toward finding issues. You are NOT validating that the code matches the dispatch — you are looking for bugs the original author missed. |
| Verdict: REQUEST-CHANGES — one Important finding (missing progress indicators); fix before merge. | ||
|
|
There was a problem hiding this comment.
In the example, round 2 says it's using the "same diff range $BASE_SHA..$HEAD_SHA — now includes the fix commits", but after pushing new commits the original HEAD_SHA no longer points at the updated tip. Suggest updating the example to recompute HEAD_SHA (or use git diff $BASE_SHA..HEAD) so the re-review actually includes the fix commits.
| BASE_SHA=$(git rev-parse HEAD~1) # or origin/main | ||
| HEAD_SHA=$(git rev-parse HEAD) | ||
| ``` | ||
|
|
||
| **2. Dispatch code-reviewer subagent:** |
There was a problem hiding this comment.
This section directs users to use the in-file review-request template, but the file later still points readers at requesting-code-review/code-reviewer.md, which currently contains the old validation-framing template. This conflicting guidance is likely to regress usage back to the old prompt; consider removing/updating the reference, or updating code-reviewer.md to match the new adversarial template and required output format.
| - **Suggested fix**: Add `"kind": d.kind,` as the first key in the args map, matching siblings. | ||
|
|
||
| ## Verdict: REQUEST-CHANGES |
There was a problem hiding this comment.
This finding block cites Delete as setting kind on line 27, but in the fixture diff Delete's map has "kind": d.kind on line 26 and "name": req.Name on line 27 (based on the hunk starting at +10). Updating these line refs will keep the required file:line anchoring accurate.
| 3. Capture both old and new prompt outputs (run the actual reviewer subagent | ||
| under each prompt; do not synthesize). |
There was a problem hiding this comment.
This plan text says fixture expected results should be captured from live subagent runs and "do not synthesize", but the fixture README added in this PR explicitly allows synthesized results when labeled <!-- synthesized -->. Please align the plan with the repo's documented convention (prefer live transcripts, allow synthesized with explicit labeling for initial demos).
| 3. Capture both old and new prompt outputs (run the actual reviewer subagent | |
| under each prompt; do not synthesize). | |
| 3. Capture both old and new prompt outputs. Prefer running the actual reviewer | |
| subagent under each prompt; for an initial demo, a synthesized expected | |
| result is allowed only if it is explicitly labeled `<!-- synthesized -->`. |
…it fixes Addresses 6 post-merge Copilot findings on PR #3 (Skills PR 1) plus 2 round-2 follow-ups: duplicate word in plan, 'find at least three' OR clause for ≤3 findings consistently propagated to template + example, BASE_SHA..HEAD_SHA semantic clarification, fixture line refs corrected, plan/README synthesized-OK alignment, indentation cleanup.
…follow-up) Fixes line-ref errors in expected-new-result.md citing wrong line numbers relative to diff.patch hunk. Create's kind on line 12, Inspect inserted at lines 18-23, Delete's kind on line 27. Surfaced by Copilot post-merge of PR #3.
… plan+design Finding #1 (plan:395): verification command pipes to grep, which exits 1 when no matches — looks like failure. Fix: add || true, update expected output description. Finding #2 (design:142): described markers as "HTML-comment-shaped" but the syntax is angle-bracket tags (not HTML comments). Fix: "angle-bracket tag syntax that most markdown renderers treat as unknown HTML tags". Finding #3 (design:178): prose said OpenCode/Cursor "get blank entries" but table uses sentinel value host-pass-through. Fix: prose now matches sentinel. Finding #4 (plan:1232): "Two distinct clean Copilot review timestamps" hard gate has no anchor policy doc. Fix: downgrade to "recommended practice" phrasing. Finding #5 (plan:65): skill-content-grep.expected.txt listed as a created file and in ALLOWED_FILES but the script never reads it — dead artifact. Fix: drop the file from the task entirely. Finding #6 (plan:141): guard script ran grep twice per token, reported line numbers from the stripped (AWK-processed) string rather than the original file. Fix: single-pass AWK that emits "LINENO:content" preserving original line numbers; single grep pass with || true. Finding #7 (plan:132): skip-regex <host: *claude-code *> only matched the exact single-host tag form, not comma-separated lists like <host: codex, claude-code>. Fix: AWK rule checks if /claude-code/ appears anywhere in the <host:> line, handling any comma-separated host list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… plan+design Finding #1 (plan:395): verification command pipes to grep, which exits 1 when no matches — looks like failure. Fix: add || true, update expected output description. Finding #2 (design:142): described markers as "HTML-comment-shaped" but the syntax is angle-bracket tags (not HTML comments). Fix: "angle-bracket tag syntax that most markdown renderers treat as unknown HTML tags". Finding #3 (design:178): prose said OpenCode/Cursor "get blank entries" but table uses sentinel value host-pass-through. Fix: prose now matches sentinel. Finding #4 (plan:1232): "Two distinct clean Copilot review timestamps" hard gate has no anchor policy doc. Fix: downgrade to "recommended practice" phrasing. Finding #5 (plan:65): skill-content-grep.expected.txt listed as a created file and in ALLOWED_FILES but the script never reads it — dead artifact. Fix: drop the file from the task entirely. Finding #6 (plan:141): guard script ran grep twice per token, reported line numbers from the stripped (AWK-processed) string rather than the original file. Fix: single-pass AWK that emits "LINENO:content" preserving original line numbers; single grep pass with || true. Finding #7 (plan:132): skip-regex <host: *claude-code *> only matched the exact single-host tag form, not comma-separated lists like <host: codex, claude-code>. Fix: AWK rule checks if /claude-code/ appears anywhere in the <host:> line, handling any comma-separated host list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ign docs (#13) * fix(pr12-round2): address 7 Copilot findings on cross-LLM portability plan+design Finding #1 (plan:395): verification command pipes to grep, which exits 1 when no matches — looks like failure. Fix: add || true, update expected output description. Finding #2 (design:142): described markers as "HTML-comment-shaped" but the syntax is angle-bracket tags (not HTML comments). Fix: "angle-bracket tag syntax that most markdown renderers treat as unknown HTML tags". Finding #3 (design:178): prose said OpenCode/Cursor "get blank entries" but table uses sentinel value host-pass-through. Fix: prose now matches sentinel. Finding #4 (plan:1232): "Two distinct clean Copilot review timestamps" hard gate has no anchor policy doc. Fix: downgrade to "recommended practice" phrasing. Finding #5 (plan:65): skill-content-grep.expected.txt listed as a created file and in ALLOWED_FILES but the script never reads it — dead artifact. Fix: drop the file from the task entirely. Finding #6 (plan:141): guard script ran grep twice per token, reported line numbers from the stripped (AWK-processed) string rather than the original file. Fix: single-pass AWK that emits "LINENO:content" preserving original line numbers; single grep pass with || true. Finding #7 (plan:132): skip-regex <host: *claude-code *> only matched the exact single-host tag form, not comma-separated lists like <host: codex, claude-code>. Fix: AWK rule checks if /claude-code/ appears anywhere in the <host:> line, handling any comma-separated host list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(cross-llm-plan): address 14 Copilot round-1 findings on plan+design Plan doc (cross-llm-portability.md): - Add lowercase sonnet/opus/haiku to forbidden-tokens list and explain why - Add rationale for not yet including AskUserQuestion/Agent (PRs B-D) - Remove tests/skill-content-grep.sh from ALLOWED_FILES (tests/ not scanned) - Fix AWK skip to match only exclusive <host: claude-code> blocks, not multi-host tags; update inline comment accordingly - Remove dead fail=1 inside while-pipe subshell; rely solely on $tmp check - Add || true to all 7 skill-scoped grep verification commands (lines 443, 517, 626, 694, 764, 844, 888) — grep exits 1 on no-match (PASS case) Design doc (cross-llm-portability-design.md): - Fix broken path: subagent-driven-development.md → skills/subagent-driven-development/SKILL.md - Remove false gpt-5.x claim from grep guard description; clarify that only Claude-Code brand names are guarded, Codex names use <host:> markers - Replace "Open questions deferred to implementation" with resolved decisions: AWK guard approach, angle-bracket marker syntax, OpenCode tool mapping, Cursor support depth scope Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(cross-llm-plan): address 7 Copilot round-2 findings on plan + design docs - set -euo pipefail (was set -u only) - Fix stray || true in step-title prose and GHA step name - Verification commands capture guard output before grepping to avoid masking execution errors - Fix prose claiming gpt-5.x tokens are guarded (guard only covers Claude brand names) - Design doc: clarify angle-bracket marker advantage is readability in rendered Markdown * fix(cross-llm-plan): add end-anchor to AWK skip regex so multi-host blocks are never skipped Agent-Logs-Url: https://github.com/GoCodeAlone/claude-superpowers/sessions/24d5a30f-f7b3-43e7-83dd-60c470d0f4a4 Co-authored-by: intel352 <77607+intel352@users.noreply.github.com> * Update docs/plans/2026-04-25-cross-llm-portability.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: intel352 <77607+intel352@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
skills/requesting-code-review/SKILL.md— full rewrite: adversarial framing, scope-vs-dispatch compliance gate (runs first), 9-class bug checklist, per-finding output format mandate (file:line required), iterative loop protocol (max 5 rounds), verdict vocabulary, updated review-request templateskills/requesting-code-review/test-fixtures/sibling-asymmetry/— 6-file fixture demonstrating the before/after: old prompt approves a sibling-method asymmetry, new prompt flags it as Important (REQUEST-CHANGES)docs/plans/2026-04-25-superpowers-skills-improvements.md— implementation plan for the full 5-PR sequencedocs/plans/2026-04-25-superpowers-skills-improvements-design.md— design doc capturing failure modes and approach rationaleScope notes
0e0016d) — both are complete and correct; the commit message covers both.d085267(two-phase model: gate first, adversarial scan second) is a necessary consequence of finding fix(tdd): post-merge Copilot round-4 fixes — class-invariant rewrite + fixture clarity #8 — adding "NOT validating" to the template without the clarification creates a logical contradiction between the template and the compliance gate mandate.This PR is its own strongest validation
The team-internal review chain (spec-reviewer + code-reviewer) issued SHIP-IT three times across 5 rounds while Copilot found 22 substantive findings still open. The new brief — adversarial framing, full bug-class checklist, per-finding format mandate — is precisely the fix for that gap. It was needed on the PR introducing it.
This is not a process failure unique to this team or this PR. It is the reproducible baseline the brief is designed to eliminate. The gap closes when the external reviewer (Copilot, or a reviewer using the new brief) runs the full checklist adversarially rather than confirming the implementer's narrative.
Round history
Test plan
SKILL.mdadversarial framing section: "NOT validating" applies to bug-class scan; compliance gate is separate structural checkcode-reviewer.mdat end of file### Finding N —heading with 6 required fieldsnew-prompt.mdper-finding format matches SKILL.md### Finding N —block exactlyexpected-new-result.md: 1 Important finding + table of all 9 bug classes checked🤖 Generated with Claude Code