feat: add review-research-pr skill for research recipe PRs by Trecek · Pull Request #588 · TalonT-Org/AutoSkillit

Trecek · 2026-04-03T17:53:59Z

Summary

Introduces a new review-research-pr skill tailored for automated review of research-oriented PRs, replacing production-code lenses with seven research-appropriate lenses covering methodology, reproducibility, report quality, statistical rigor, isolation, data integrity, and slop detection. The skill is wired into research.yaml as an optional gated step and follows the same structural conventions as review-pr. Supporting infrastructure updates include skill discovery config, bundled skill count assertions, and write-recipe documentation.

Individual Group Plans

Group 1: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part A

Create src/autoskillit/skills_extended/review-research-pr/SKILL.md — a diff-scoped PR review skill with 7 research-appropriate lenses replacing review-pr's 7 production-code lenses. The skill is structurally identical to review-pr (same GitHub Reviews API mechanics, same tiered fallback, same verdict tokens), with three differences: 7 research lenses (no deletion_regression), no Step 2.5 (deletion_regression pre-computation omitted), and inconclusive-valid constraint on the report-quality lens.

Group 2: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part B

Complete implementation steps and verification: creates review-research-pr/SKILL.md, updates research.yaml (new review_pr ingredient + review_research_pr step), updates config/defaults.yaml (tier3 registration), updates tests/workspace/test_skills.py (counts 84→85, RESEARCH_SKILL_NAMES), adds tests/skills/test_review_research_pr_guards.py, and appends TestResearchRecipeStructure to tests/recipe/test_bundled_recipes.py.

Requirements

From TalonT-Org/spectral-init#141:

REQ-SKILL-001: Accepts <feature-branch> and <base-branch> matching review-pr interface.
REQ-SKILL-002: Spawns parallel subagent lenses for all 7 dimensions.
REQ-SKILL-003: Each lens receives annotated PR diff with [LNNN] line markers.
REQ-VERDICT-001: Produces approved, changes_requested, needs_human verdict tokens.
REQ-VERDICT-002: Posts inline GitHub review comments via Reviews API.
REQ-VERDICT-003: Inconclusive reports are valid — not flagged as deficiency.
REQ-INTEG-001: Invocable as optional step in research recipe after open_research_pr.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([Phase 2 converges])
    COMPLETE([research_complete])

    subgraph RecipeTail ["● research.yaml — Modified Phase 2 Tail"]
        push["push_branch<br/>━━━━━━━━━━<br/>git push -u origin HEAD"]
        open_pr["open_research_pr<br/>━━━━━━━━━━<br/>gh pr create<br/>on_success → review_research_pr"]
        gate{"● skip_when_false gate<br/>━━━━━━━━━━<br/>inputs.review_pr == true?"}
    end

    subgraph SkillFlow ["★ review-research-pr/SKILL.md — Execution Flow"]
        direction TB

        subgraph Init ["Step 0–2: Validate + Find PR + Get Diff"]
            s0["★ Step 0: Derive feature_branch<br/>━━━━━━━━━━<br/>worktree path → git rev-parse HEAD<br/>derive escalation_user mention"]
            s1["★ Step 1: Find open PR<br/>━━━━━━━━━━<br/>gh pr list --head feature_branch<br/>exit 0 verdict=approved if gh unavailable"]
            s2["★ Step 2: Get PR diff<br/>━━━━━━━━━━<br/>gh pr diff → annotated diff<br/>line number [LNNN] markers"]
        end

        subgraph Lenses ["Step 3: 7 Parallel Research Lenses (Task, sonnet)"]
            direction LR
            L1["★ methodology<br/>━━━━━━━━━━<br/>hypothesis, variables,<br/>controls"]
            L2["★ reproducibility<br/>━━━━━━━━━━<br/>self-contained scripts,<br/>deps, seeds"]
            L3["★ report-quality<br/>━━━━━━━━━━<br/>completeness — inconclusive<br/>is valid outcome"]
            L4["★ statistical-rigor<br/>━━━━━━━━━━<br/>metrics, fair comparisons,<br/>effect sizes"]
            L5["★ isolation<br/>━━━━━━━━━━<br/>research/ scope,<br/>prod changes minimal"]
            L6["★ data-integrity<br/>━━━━━━━━━━<br/>raw results preserved,<br/>figures match data"]
            L7["slop<br/>━━━━━━━━━━<br/>AI filler, dead code,<br/>verbose boilerplate"]
        end

        subgraph Aggregate ["Step 4–5: Aggregate + Verdict"]
            agg["★ Step 4: Deduplicate findings<br/>━━━━━━━━━━<br/>by (file, line) pairs<br/>requires_decision axis"]
            verdict{"★ Step 5: Verdict decision<br/>━━━━━━━━━━<br/>any requires_decision?<br/>any blocking findings?"}
        end

        subgraph Post ["Step 6: Tiered Review Posting"]
            tier1["★ Tier 1: Batch POST /reviews<br/>━━━━━━━━━━<br/>GitHub Reviews API<br/>inline comments + event"]
            tier2["★ Tier 2 Fallback: /pulls/comments<br/>━━━━━━━━━━<br/>per-finding individual POST"]
            tier3["★ Tier 3 DEGRADED<br/>━━━━━━━━━━<br/>bullet-list body dump"]
        end

        emit["★ Step 7–8: Submit Review + emit verdict=<br/>━━━━━━━━━━<br/>approved · changes_requested · needs_human"]
    end

    START --> push --> open_pr --> gate
    gate -->|"review_pr=false (default)"| COMPLETE
    gate -->|"review_pr=true"| s0
    s0 --> s1 --> s2
    s2 --> L1 & L2 & L3 & L4 & L5 & L6 & L7
    L1 & L2 & L3 & L4 & L5 & L6 & L7 --> agg
    agg --> verdict
    verdict -->|"no blockers"| tier1
    verdict -->|"has blockers / needs_human"| tier1
    tier1 -->|"batch ok"| emit
    tier1 -->|"batch fails"| tier2
    tier2 -->|"some ok"| emit
    tier2 -->|"all fail"| tier3
    tier3 --> emit
    emit --> COMPLETE

    class START,COMPLETE terminal;
    class push,open_pr phase;
    class gate stateNode;
    class s0,s1,s2 handler;
    class L1,L2,L3,L4,L5,L6 newComponent;
    class L7,agg handler;
    class verdict stateNode;
    class tier1,tier2,emit newComponent;
    class tier3 detector;

Color Legend:

Color	Category	Description
Dark Blue	Terminal	Start and complete states
Purple	Phase	Recipe orchestration steps
Teal	State	Decision and routing gates
Orange	Handler	Diff acquisition, deduplication, slop lens
Green	New Component	New skill, 6 research lenses, posting tiers, verdict emit
Red	Detector	Degraded fallback (Tier 3)

Module Dependency Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph L3 ["LAYER 3 — TESTS (verify contracts)"]
        direction LR
        t_skills["★ test_review_research_pr_guards.py<br/>━━━━━━━━━━<br/>behavioral guards: lenses,<br/>verdicts, API mechanics"]
        t_recipe["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>TestResearchRecipeStructure:<br/>review_pr ingredient + step"]
        t_workspace["● test_skills.py<br/>━━━━━━━━━━<br/>BUNDLED_SKILLS count 84→85<br/>RESEARCH_SKILL_NAMES extended"]
    end

    subgraph L2 ["LAYER 2 — RECIPE ORCHESTRATION (wires skill into pipeline)"]
        research["● research.yaml<br/>━━━━━━━━━━<br/>review_pr ingredient (default false)<br/>review_research_pr step<br/>skip_when_false · on_context_limit"]
    end

    subgraph L1 ["LAYER 1 — SKILLS EXTENDED (implementations)"]
        skill["★ review-research-pr/SKILL.md<br/>━━━━━━━━━━<br/>7 research lenses: methodology,<br/>reproducibility, report-quality,<br/>statistical-rigor, isolation,<br/>data-integrity, slop<br/>categories: [research]"]
        write_recipe["● write-recipe/SKILL.md<br/>━━━━━━━━━━<br/>bundled skills list:<br/>review-research-pr inserted<br/>after review-pr"]
    end

    subgraph L0 ["LAYER 0 — CONFIG + DISCOVERY REGISTRY (tier assignment)"]
        defaults["● config/defaults.yaml<br/>━━━━━━━━━━<br/>skills.tier3 list:<br/>review-research-pr added<br/>(same tier as review-pr)"]
        resolver["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>reads tier config<br/>list_all() count: 85"]
    end

    t_skills -->|"reads SKILL.md<br/>asserts lens coverage"| skill
    t_recipe -->|"loads research.yaml<br/>asserts step + ingredient"| research
    t_workspace -->|"calls SkillResolver<br/>asserts count=85"| resolver

    research -->|"run_skill: /autoskillit:review-research-pr<br/>discovers skill at invocation time"| skill

    write_recipe -->|"documents<br/>review-research-pr in<br/>bundled skills list"| skill

    defaults -->|"tier3 list<br/>consumed by SkillResolver"| resolver

    skill -->|"discovered via<br/>tier3 → skills_extended dir"| defaults

    class t_skills newComponent;
    class t_recipe,t_workspace handler;
    class research phase;
    class skill newComponent;
    class write_recipe handler;
    class defaults stateNode;
    class resolver output;

Color Legend:

Color	Category	Description
Green	New Component	New skill file and new test guard
Orange	Handler	Modified test files and write-recipe doc
Purple	Phase	Modified recipe orchestration
Teal	State	Config registry (defaults.yaml)
Dark Teal	Output	SkillResolver (reads tier config, exposes list_all)

Closes #526

Implementation Plan

Plan files:

/home/talon/projects/autoskillit-runs/impl-20260403-092441-340811/.autoskillit/temp/make-plan/review_research_pr_plan_2026-04-03_120000_part_a.md
/home/talon/projects/autoskillit-runs/impl-20260403-092441-340811/.autoskillit/temp/make-plan/review_research_pr_plan_2026-04-03_120000_part_b.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step	input	output	cached	count	time
plan	48	48.8k	2.3M	1	14m 59s
verify	36	28.2k	1.6M	2	9m 11s
implement	76	30.9k	3.0M	2	28m 11s
fix	71	30.8k	4.8M	1	14m 12s
audit_impl	27	18.7k	908.8k	1	8m 3s
open_pr	31	20.6k	1.3M	1	6m 39s
Total	289	178.1k	13.9M		1h 21m

…earch-pr (Part A) - New tests/skills/test_review_research_pr_guards.py: 8 behavioral guards enforcing research lens coverage, inconclusive-results contract, verdict mechanics, and GitHub Reviews API posting requirements. - Append TestResearchRecipeStructure to test_bundled_recipes.py: asserts research.yaml has review_pr ingredient (default=false), review_research_pr step, skip_when_false gate, and routes to research_complete on any outcome. - Update test_skills.py: insert review-research-pr into BUNDLED_SKILLS, bump skills_extended count 82→83, bump list_all count 84→85, add review-research-pr to RESEARCH_SKILL_NAMES. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…(Part A) - Create skills_extended/review-research-pr/SKILL.md with 7 research lenses (methodology, reproducibility, report-quality, statistical-rigor, isolation, data-integrity, slop), no deletion_regression, inconclusive-valid constraint, [LNNN] markers, and GitHub Reviews API posting mechanics - Add review_pr ingredient (default false) and review_research_pr step to research.yaml, gated by skip_when_false, routing to research_complete on any outcome - Add review-research-pr to tier3 in defaults.yaml - Update write-recipe/SKILL.md bundled skills list - Update doc skill counts (85 → 86) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…older allowlist - Use inputs.review_pr format in research.yaml skip_when_false (semantic rule requires inputs. prefix) - Add on_context_limit: research_complete to review_research_pr step (advisory step guard) - Fix test_research_review_step_skip_when_false to expect inputs.review_pr - Add review-research-pr entries to placeholder allowlist in test_skill_placeholder_contracts.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek

AutoSkillit PR Review — Verdict: changes_requested

Trecek

AutoSkillit review found 5 blocking issues. See inline comments. (Note: REQUEST_CHANGES bypassed — cannot request changes on own PR; verdict: changes_requested)

…ategories: line, not anywhere in file Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ews co-location, not independent tokens Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…context_limit routing - TestResearchRecipeStructure: scope='class' -> scope='function' for xdist safety - test_research_review_step_routes_to_complete_on_any_outcome: add on_context_limit assertion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ls_in_skills_extended Function name encoded a stale count (58); implementation asserts 83. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek and others added 3 commits April 3, 2026 09:59

Trecek commented Apr 3, 2026

View reviewed changes

Trecek and others added 4 commits April 3, 2026 11:04

fix(review): tighten categories assertion — check 'research' on the c…

f025c46

…ategories: line, not anywhere in file Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(review): tighten reviews API assertion — check for pulls/{n}/revi…

de593ec

…ews co-location, not independent tokens Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(review): rename test_58_skills_in_skills_extended to test_83_skil…

4210c8f

…ls_in_skills_extended Function name encoded a stale count (58); implementation asserts 83. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek added this pull request to the merge queue Apr 3, 2026

Merged via the queue into integration with commit bb33e99 Apr 3, 2026
2 checks passed

Trecek deleted the feat-create-review-research-pr-skill-for-research-recipe-prs/526 branch April 3, 2026 18:14

Trecek mentioned this pull request Apr 14, 2026

Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs) #925

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add review-research-pr skill for research recipe PRs#588

feat: add review-research-pr skill for research recipe PRs#588
Trecek merged 7 commits intointegrationfrom
feat-create-review-research-pr-skill-for-research-recipe-prs/526

Trecek commented Apr 3, 2026 •

edited

Loading

Uh oh!

Trecek left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Trecek left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Trecek commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Group 1: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part A

Group 2: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part B

Requirements

Architecture Impact

Process Flow Diagram

Module Dependency Diagram

Implementation Plan

Token Usage Summary

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Trecek commented Apr 3, 2026 •

edited

Loading