feat: add review-research-pr skill for research recipe PRs #588
Merged
Trecek merged 7 commits into integration on Apr 3, 2026
…earch-pr (Part A)

- New tests/skills/test_review_research_pr_guards.py: 8 behavioral guards enforcing research lens coverage, inconclusive-results contract, verdict mechanics, and GitHub Reviews API posting requirements.
- Append TestResearchRecipeStructure to test_bundled_recipes.py: asserts research.yaml has review_pr ingredient (default=false), review_research_pr step, skip_when_false gate, and routes to research_complete on any outcome.
- Update test_skills.py: insert review-research-pr into BUNDLED_SKILLS, bump skills_extended count 82→83, bump list_all count 84→85, add review-research-pr to RESEARCH_SKILL_NAMES.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
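The recipe-structure assertions described in this commit can be pictured with a small sketch. It is not the actual TestResearchRecipeStructure code: the recipe is modelled as a plain dictionary rather than loaded from research.yaml, and the key layout (`ingredients`, `steps`, `on_failure`) and the helper name `check_research_recipe` are assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for a parsed research.yaml.
# The real schema and key names may differ from this sketch.
RECIPE = {
    "ingredients": {"review_pr": {"default": "false"}},
    "steps": {
        "open_research_pr": {"on_success": "review_research_pr"},
        "review_research_pr": {
            "skip_when_false": "inputs.review_pr",
            "on_success": "research_complete",
            "on_failure": "research_complete",
        },
    },
}


def check_research_recipe(recipe: dict) -> list[str]:
    """Return human-readable contract violations (empty list if none)."""
    problems = []

    ingredient = recipe.get("ingredients", {}).get("review_pr")
    if ingredient is None:
        problems.append("missing review_pr ingredient")
    elif str(ingredient.get("default")).lower() != "false":
        problems.append("review_pr must default to false")

    step = recipe.get("steps", {}).get("review_research_pr")
    if step is None:
        problems.append("missing review_research_pr step")
        return problems

    if "skip_when_false" not in step:
        problems.append("review_research_pr must be gated by skip_when_false")

    # "Any outcome" is modelled here as the success and failure routes.
    for route in ("on_success", "on_failure"):
        if step.get(route) != "research_complete":
            problems.append(f"{route} must route to research_complete")
    return problems
```

Returning a list of violations instead of raising on the first one keeps every broken contract visible in a single test failure.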
…(Part A)

- Create skills_extended/review-research-pr/SKILL.md with 7 research lenses (methodology, reproducibility, report-quality, statistical-rigor, isolation, data-integrity, slop), no deletion_regression, inconclusive-valid constraint, [LNNN] markers, and GitHub Reviews API posting mechanics
- Add review_pr ingredient (default false) and review_research_pr step to research.yaml, gated by skip_when_false, routing to research_complete on any outcome
- Add review-research-pr to tier3 in defaults.yaml
- Update write-recipe/SKILL.md bundled skills list
- Update doc skill counts (85 → 86)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
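The [LNNN] markers mentioned above tie each reviewable diff line to a line number so findings can be posted inline. The sketch below shows one way such an annotation could work on a unified diff. The exact marker format (`[L12]` placed before the diff line), the choice to number only added and context lines, and the function name `annotate_diff` are assumptions; the skill's own annotation rules are not shown in this PR page.

```python
import re

# Matches a unified-diff hunk header and captures the new-file start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def annotate_diff(diff_text: str) -> str:
    """Prefix added and context lines with an [L<new-file line>] marker."""
    out = []
    new_line = None  # current line number in the new file; None outside hunks
    for line in diff_text.splitlines():
        hunk = HUNK_RE.match(line)
        if line.startswith("diff --git"):
            # A new file starts: its header lines must not be numbered.
            new_line = None
            out.append(line)
        elif hunk:
            new_line = int(hunk.group(1))
            out.append(line)
        elif new_line is None or line.startswith("\\"):
            # File headers (index, ---, +++) or "\ No newline at end of file".
            out.append(line)
        elif line.startswith("-"):
            # Removed lines do not exist in the new file, so no marker.
            out.append(line)
        else:
            # Added ("+") and context (" ") lines advance the new-file counter.
            out.append(f"[L{new_line}] {line}")
            new_line += 1
    return "\n".join(out)
```

Numbering against the new file matches how inline review comments are usually addressed to the right-hand side of a diff, which is why removed lines are left unmarked in this sketch.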
…older allowlist

- Use inputs.review_pr format in research.yaml skip_when_false (semantic rule requires inputs. prefix)
- Add on_context_limit: research_complete to review_research_pr step (advisory step guard)
- Fix test_research_review_step_skip_when_false to expect inputs.review_pr
- Add review-research-pr entries to placeholder allowlist in test_skill_placeholder_contracts.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
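The two recipe fixes in this commit (the inputs. prefix and the on_context_limit route) lend themselves to a small validation sketch. This is an illustration only: the function `check_gated_step` and the shape of the step dictionary are hypothetical, and the project's real semantic rules may check more or different things.

```python
def check_gated_step(name: str, step: dict, ingredients: set[str]) -> list[str]:
    """Validate a gated, advisory step; return a list of problems found."""
    problems = []
    prefix = "inputs."

    ref = step.get("skip_when_false")
    if ref is not None:
        if not ref.startswith(prefix):
            # A bare "review_pr" would be rejected; "inputs.review_pr" passes.
            problems.append(
                f"{name}: skip_when_false must use the inputs. prefix, got {ref!r}"
            )
        elif ref[len(prefix):] not in ingredients:
            problems.append(f"{name}: unknown ingredient {ref[len(prefix):]!r}")

    # Assumption: an advisory step should declare where to go when the
    # context limit is hit, so the pipeline can still finish.
    if "on_context_limit" not in step:
        problems.append(f"{name}: missing on_context_limit route")
    return problems
```

For example, a step with `skip_when_false: review_pr` and no `on_context_limit` would produce two problems under these assumed rules, while the corrected step from this commit would produce none.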
Trecek (Collaborator, Author) left a comment on Apr 3, 2026:
AutoSkillit PR Review — Verdict: changes_requested
Trecek (Collaborator, Author) left a comment on Apr 3, 2026:
AutoSkillit review found 5 blocking issues. See inline comments. (Note: REQUEST_CHANGES bypassed — cannot request changes on own PR; verdict: changes_requested)
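The bypass noted in this comment follows from a GitHub restriction: a pull request author cannot approve or request changes on their own PR, so the review has to be submitted as a plain comment while the verdict token is still reported. A minimal sketch of that decision follows. The event names `APPROVE`, `REQUEST_CHANGES`, and `COMMENT` are the GitHub Reviews API values; the verdict-to-event mapping (in particular `needs_human` mapping to `COMMENT`) and the function name are assumptions.

```python
# Assumed mapping from skill verdict tokens to GitHub review events.
VERDICT_TO_EVENT = {
    "approved": "APPROVE",
    "changes_requested": "REQUEST_CHANGES",
    "needs_human": "COMMENT",
}


def choose_review_event(verdict: str, reviewer: str, pr_author: str) -> str:
    """Pick the review event, downgrading to COMMENT on the reviewer's own PR."""
    event = VERDICT_TO_EVENT[verdict]
    if reviewer == pr_author and event in {"APPROVE", "REQUEST_CHANGES"}:
        # GitHub rejects these events when the reviewer is the PR author.
        return "COMMENT"
    return event
```

Keeping the verdict and the posted event as separate values is what lets the pipeline report `changes_requested` even though the review itself was posted as a comment.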
…ategories: line, not anywhere in file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
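The commit title is truncated, but it appears to tighten a guard so that it inspects the categories: line itself instead of searching the whole file for a token. The sketch below illustrates that difference under stated assumptions: SKILL.md is assumed to carry a single-line `categories: [a, b]` entry, and both helper names are hypothetical.

```python
from typing import Optional


def categories_line(skill_text: str) -> Optional[str]:
    """Return the first line that starts with 'categories:', or None."""
    for line in skill_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("categories:"):
            return stripped
    return None


def has_category(skill_text: str, category: str) -> bool:
    """True only if the category is listed on the categories: line."""
    line = categories_line(skill_text)
    if line is None:
        return False
    raw = line.split(":", 1)[1].strip().strip("[]")
    return category in [item.strip() for item in raw.split(",")]
```

A file-wide substring check would pass for any skill whose prose happens to contain the word "research"; scoping the check to one line avoids that false positive.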
…ews co-location, not independent tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…context_limit routing

- TestResearchRecipeStructure: scope='class' -> scope='function' for xdist safety
- test_research_review_step_routes_to_complete_on_any_outcome: add on_context_limit assertion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ls_in_skills_extended

Function name encoded a stale count (58); implementation asserts 83.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Introduces a new review-research-pr skill tailored for automated review of research-oriented PRs, replacing production-code lenses with seven research-appropriate lenses covering methodology, reproducibility, report quality, statistical rigor, isolation, data integrity, and slop detection. The skill is wired into research.yaml as an optional gated step and follows the same structural conventions as review-pr. Supporting infrastructure updates include skill discovery config, bundled skill count assertions, and write-recipe documentation.

Individual Group Plans
Group 1: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part A
Create src/autoskillit/skills_extended/review-research-pr/SKILL.md, a diff-scoped PR review skill with 7 research-appropriate lenses replacing review-pr's 7 production-code lenses. The skill is structurally identical to review-pr (same GitHub Reviews API mechanics, same tiered fallback, same verdict tokens), with three differences: 7 research lenses (no deletion_regression), no Step 2.5 (deletion_regression pre-computation omitted), and an inconclusive-valid constraint on the report-quality lens.

Group 2: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part B
Complete implementation steps and verification: creates review-research-pr/SKILL.md, updates research.yaml (new review_pr ingredient + review_research_pr step), updates config/defaults.yaml (tier3 registration), updates tests/workspace/test_skills.py (counts 84→85, RESEARCH_SKILL_NAMES), adds tests/skills/test_review_research_pr_guards.py, and appends TestResearchRecipeStructure to tests/recipe/test_bundled_recipes.py.

Requirements
From TalonT-Org/spectral-init#141:
- <feature-branch> and <base-branch> matching review-pr interface.
- [LNNN] line markers.
- approved, changes_requested, needs_human verdict tokens.
- open_research_pr.

Architecture Impact
Process Flow Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([Phase 2 converges])
    COMPLETE([research_complete])

    subgraph RecipeTail ["● research.yaml — Modified Phase 2 Tail"]
        push["push_branch<br/>━━━━━━━━━━<br/>git push -u origin HEAD"]
        open_pr["open_research_pr<br/>━━━━━━━━━━<br/>gh pr create<br/>on_success → review_research_pr"]
        gate{"● skip_when_false gate<br/>━━━━━━━━━━<br/>inputs.review_pr == true?"}
    end

    subgraph SkillFlow ["★ review-research-pr/SKILL.md — Execution Flow"]
        direction TB
        subgraph Init ["Step 0–2: Validate + Find PR + Get Diff"]
            s0["★ Step 0: Derive feature_branch<br/>━━━━━━━━━━<br/>worktree path → git rev-parse HEAD<br/>derive escalation_user mention"]
            s1["★ Step 1: Find open PR<br/>━━━━━━━━━━<br/>gh pr list --head feature_branch<br/>exit 0 verdict=approved if gh unavailable"]
            s2["★ Step 2: Get PR diff<br/>━━━━━━━━━━<br/>gh pr diff → annotated diff<br/>line number [LNNN] markers"]
        end
        subgraph Lenses ["Step 3: 7 Parallel Research Lenses (Task, sonnet)"]
            direction LR
            L1["★ methodology<br/>━━━━━━━━━━<br/>hypothesis, variables,<br/>controls"]
            L2["★ reproducibility<br/>━━━━━━━━━━<br/>self-contained scripts,<br/>deps, seeds"]
            L3["★ report-quality<br/>━━━━━━━━━━<br/>completeness — inconclusive<br/>is valid outcome"]
            L4["★ statistical-rigor<br/>━━━━━━━━━━<br/>metrics, fair comparisons,<br/>effect sizes"]
            L5["★ isolation<br/>━━━━━━━━━━<br/>research/ scope,<br/>prod changes minimal"]
            L6["★ data-integrity<br/>━━━━━━━━━━<br/>raw results preserved,<br/>figures match data"]
            L7["slop<br/>━━━━━━━━━━<br/>AI filler, dead code,<br/>verbose boilerplate"]
        end
        subgraph Aggregate ["Step 4–5: Aggregate + Verdict"]
            agg["★ Step 4: Deduplicate findings<br/>━━━━━━━━━━<br/>by (file, line) pairs<br/>requires_decision axis"]
            verdict{"★ Step 5: Verdict decision<br/>━━━━━━━━━━<br/>any requires_decision?<br/>any blocking findings?"}
        end
        subgraph Post ["Step 6: Tiered Review Posting"]
            tier1["★ Tier 1: Batch POST /reviews<br/>━━━━━━━━━━<br/>GitHub Reviews API<br/>inline comments + event"]
            tier2["★ Tier 2 Fallback: /pulls/comments<br/>━━━━━━━━━━<br/>per-finding individual POST"]
            tier3["★ Tier 3 DEGRADED<br/>━━━━━━━━━━<br/>bullet-list body dump"]
        end
        emit["★ Step 7–8: Submit Review + emit verdict=<br/>━━━━━━━━━━<br/>approved · changes_requested · needs_human"]
    end

    START --> push --> open_pr --> gate
    gate -->|"review_pr=false (default)"| COMPLETE
    gate -->|"review_pr=true"| s0
    s0 --> s1 --> s2
    s2 --> L1 & L2 & L3 & L4 & L5 & L6 & L7
    L1 & L2 & L3 & L4 & L5 & L6 & L7 --> agg
    agg --> verdict
    verdict -->|"no blockers"| tier1
    verdict -->|"has blockers / needs_human"| tier1
    tier1 -->|"batch ok"| emit
    tier1 -->|"batch fails"| tier2
    tier2 -->|"some ok"| emit
    tier2 -->|"all fail"| tier3
    tier3 --> emit
    emit --> COMPLETE

    class START,COMPLETE terminal;
    class push,open_pr phase;
    class gate stateNode;
    class s0,s1,s2 handler;
    class L1,L2,L3,L4,L5,L6 newComponent;
    class L7,agg handler;
    class verdict stateNode;
    class tier1,tier2,emit newComponent;
    class tier3 detector;

Color Legend:
Module Dependency Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph L3 ["LAYER 3 — TESTS (verify contracts)"]
        direction LR
        t_skills["★ test_review_research_pr_guards.py<br/>━━━━━━━━━━<br/>behavioral guards: lenses,<br/>verdicts, API mechanics"]
        t_recipe["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>TestResearchRecipeStructure:<br/>review_pr ingredient + step"]
        t_workspace["● test_skills.py<br/>━━━━━━━━━━<br/>BUNDLED_SKILLS count 84→85<br/>RESEARCH_SKILL_NAMES extended"]
    end

    subgraph L2 ["LAYER 2 — RECIPE ORCHESTRATION (wires skill into pipeline)"]
        research["● research.yaml<br/>━━━━━━━━━━<br/>review_pr ingredient (default false)<br/>review_research_pr step<br/>skip_when_false · on_context_limit"]
    end

    subgraph L1 ["LAYER 1 — SKILLS EXTENDED (implementations)"]
        skill["★ review-research-pr/SKILL.md<br/>━━━━━━━━━━<br/>7 research lenses: methodology,<br/>reproducibility, report-quality,<br/>statistical-rigor, isolation,<br/>data-integrity, slop<br/>categories: [research]"]
        write_recipe["● write-recipe/SKILL.md<br/>━━━━━━━━━━<br/>bundled skills list:<br/>review-research-pr inserted<br/>after review-pr"]
    end

    subgraph L0 ["LAYER 0 — CONFIG + DISCOVERY REGISTRY (tier assignment)"]
        defaults["● config/defaults.yaml<br/>━━━━━━━━━━<br/>skills.tier3 list:<br/>review-research-pr added<br/>(same tier as review-pr)"]
        resolver["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>reads tier config<br/>list_all() count: 85"]
    end

    t_skills -->|"reads SKILL.md<br/>asserts lens coverage"| skill
    t_recipe -->|"loads research.yaml<br/>asserts step + ingredient"| research
    t_workspace -->|"calls SkillResolver<br/>asserts count=85"| resolver
    research -->|"run_skill: /autoskillit:review-research-pr<br/>discovers skill at invocation time"| skill
    write_recipe -->|"documents<br/>review-research-pr in<br/>bundled skills list"| skill
    defaults -->|"tier3 list<br/>consumed by SkillResolver"| resolver
    skill -->|"discovered via<br/>tier3 → skills_extended dir"| defaults

    class t_skills newComponent;
    class t_recipe,t_workspace handler;
    class research phase;
    class skill newComponent;
    class write_recipe handler;
    class defaults stateNode;
    class resolver output;

Color Legend:
Closes #526
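Steps 4 to 6 of the process flow (deduplicate by (file, line), decide a verdict, then post through a tiered fallback) can be sketched as below. This is a simplified model, not the skill's implementation: the `Finding` fields, the precedence of `needs_human` over `changes_requested`, and the `post_batch` / `post_single` callables standing in for the GitHub API calls are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    message: str
    blocking: bool = False
    requires_decision: bool = False


def dedupe(findings: Sequence[Finding]) -> list[Finding]:
    """Keep one finding per (file, line); merge flags so severity is not lost."""
    seen: dict[tuple[str, int], Finding] = {}
    for f in findings:
        key = (f.file, f.line)
        prev = seen.get(key)
        if prev is None:
            seen[key] = f
        else:
            seen[key] = replace(
                prev,
                blocking=prev.blocking or f.blocking,
                requires_decision=prev.requires_decision or f.requires_decision,
            )
    return list(seen.values())


def decide_verdict(findings: Sequence[Finding]) -> str:
    """Assumed precedence: needs_human, then changes_requested, then approved."""
    if any(f.requires_decision for f in findings):
        return "needs_human"
    if any(f.blocking for f in findings):
        return "changes_requested"
    return "approved"


def post_review(
    findings: Sequence[Finding],
    post_batch: Callable[[Sequence[Finding]], bool],
    post_single: Callable[[Finding], bool],
) -> str:
    """Try the batch review first, then per-finding posts, then degrade."""
    if post_batch(findings):
        return "tier1"
    # Evaluate every finding (no short-circuit) so all are attempted.
    results = [post_single(f) for f in findings]
    if any(results):
        return "tier2"
    # Nothing could be posted inline; the caller dumps findings into the body.
    return "tier3"
```

Passing the posting functions in as parameters keeps the fallback order testable without network access; a test can force each tier by supplying callables that return `False`.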
Implementation Plan
Plan files:
/home/talon/projects/autoskillit-runs/impl-20260403-092441-340811/.autoskillit/temp/make-plan/review_research_pr_plan_2026-04-03_120000_part_a.md
/home/talon/projects/autoskillit-runs/impl-20260403-092441-340811/.autoskillit/temp/make-plan/review_research_pr_plan_2026-04-03_120000_part_b.md

🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary