Skip to content

feat: add review-research-pr skill for research recipe PRs#588

Merged
Trecek merged 7 commits intointegrationfrom
feat-create-review-research-pr-skill-for-research-recipe-prs/526
Apr 3, 2026
Merged

feat: add review-research-pr skill for research recipe PRs#588
Trecek merged 7 commits intointegrationfrom
feat-create-review-research-pr-skill-for-research-recipe-prs/526

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented Apr 3, 2026

Summary

Introduces a new review-research-pr skill tailored for automated review of research-oriented PRs, replacing production-code lenses with seven research-appropriate lenses covering methodology, reproducibility, report quality, statistical rigor, isolation, data integrity, and slop detection. The skill is wired into research.yaml as an optional gated step and follows the same structural conventions as review-pr. Supporting infrastructure updates include skill discovery config, bundled skill count assertions, and write-recipe documentation.

Individual Group Plans

Group 1: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part A

Create src/autoskillit/skills_extended/review-research-pr/SKILL.md — a diff-scoped PR review skill with 7 research-appropriate lenses replacing review-pr's 7 production-code lenses. The skill is structurally identical to review-pr (same GitHub Reviews API mechanics, same tiered fallback, same verdict tokens), with three differences: 7 research lenses (no deletion_regression), no Step 2.5 (deletion_regression pre-computation omitted), and inconclusive-valid constraint on the report-quality lens.

Group 2: Implementation Plan: Create review-research-pr Skill (Issue #526) — Part B

Complete implementation steps and verification: creates review-research-pr/SKILL.md, updates research.yaml (new review_pr ingredient + review_research_pr step), updates config/defaults.yaml (tier3 registration), updates tests/workspace/test_skills.py (counts 84→85, RESEARCH_SKILL_NAMES), adds tests/skills/test_review_research_pr_guards.py, and appends TestResearchRecipeStructure to tests/recipe/test_bundled_recipes.py.

Requirements

From TalonT-Org/spectral-init#141:

  • REQ-SKILL-001: Accepts <feature-branch> and <base-branch> matching review-pr interface.
  • REQ-SKILL-002: Spawns parallel subagent lenses for all 7 dimensions.
  • REQ-SKILL-003: Each lens receives annotated PR diff with [LNNN] line markers.
  • REQ-VERDICT-001: Produces approved, changes_requested, needs_human verdict tokens.
  • REQ-VERDICT-002: Posts inline GitHub review comments via Reviews API.
  • REQ-VERDICT-003: Inconclusive reports are valid — not flagged as deficiency.
  • REQ-INTEG-001: Invocable as optional step in research recipe after open_research_pr.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([Phase 2 converges])
    COMPLETE([research_complete])

    subgraph RecipeTail ["● research.yaml — Modified Phase 2 Tail"]
        push["push_branch<br/>━━━━━━━━━━<br/>git push -u origin HEAD"]
        open_pr["open_research_pr<br/>━━━━━━━━━━<br/>gh pr create<br/>on_success → review_research_pr"]
        gate{"● skip_when_false gate<br/>━━━━━━━━━━<br/>inputs.review_pr == true?"}
    end

    subgraph SkillFlow ["★ review-research-pr/SKILL.md — Execution Flow"]
        direction TB

        subgraph Init ["Step 0–2: Validate + Find PR + Get Diff"]
            s0["★ Step 0: Derive feature_branch<br/>━━━━━━━━━━<br/>worktree path → git rev-parse HEAD<br/>derive escalation_user mention"]
            s1["★ Step 1: Find open PR<br/>━━━━━━━━━━<br/>gh pr list --head feature_branch<br/>exit 0 verdict=approved if gh unavailable"]
            s2["★ Step 2: Get PR diff<br/>━━━━━━━━━━<br/>gh pr diff → annotated diff<br/>line number [LNNN] markers"]
        end

        subgraph Lenses ["Step 3: 7 Parallel Research Lenses (Task, sonnet)"]
            direction LR
            L1["★ methodology<br/>━━━━━━━━━━<br/>hypothesis, variables,<br/>controls"]
            L2["★ reproducibility<br/>━━━━━━━━━━<br/>self-contained scripts,<br/>deps, seeds"]
            L3["★ report-quality<br/>━━━━━━━━━━<br/>completeness — inconclusive<br/>is valid outcome"]
            L4["★ statistical-rigor<br/>━━━━━━━━━━<br/>metrics, fair comparisons,<br/>effect sizes"]
            L5["★ isolation<br/>━━━━━━━━━━<br/>research/ scope,<br/>prod changes minimal"]
            L6["★ data-integrity<br/>━━━━━━━━━━<br/>raw results preserved,<br/>figures match data"]
            L7["slop<br/>━━━━━━━━━━<br/>AI filler, dead code,<br/>verbose boilerplate"]
        end

        subgraph Aggregate ["Step 4–5: Aggregate + Verdict"]
            agg["★ Step 4: Deduplicate findings<br/>━━━━━━━━━━<br/>by (file, line) pairs<br/>requires_decision axis"]
            verdict{"★ Step 5: Verdict decision<br/>━━━━━━━━━━<br/>any requires_decision?<br/>any blocking findings?"}
        end

        subgraph Post ["Step 6: Tiered Review Posting"]
            tier1["★ Tier 1: Batch POST /reviews<br/>━━━━━━━━━━<br/>GitHub Reviews API<br/>inline comments + event"]
            tier2["★ Tier 2 Fallback: /pulls/comments<br/>━━━━━━━━━━<br/>per-finding individual POST"]
            tier3["★ Tier 3 DEGRADED<br/>━━━━━━━━━━<br/>bullet-list body dump"]
        end

        emit["★ Step 7–8: Submit Review + emit verdict=<br/>━━━━━━━━━━<br/>approved · changes_requested · needs_human"]
    end

    START --> push --> open_pr --> gate
    gate -->|"review_pr=false (default)"| COMPLETE
    gate -->|"review_pr=true"| s0
    s0 --> s1 --> s2
    s2 --> L1 & L2 & L3 & L4 & L5 & L6 & L7
    L1 & L2 & L3 & L4 & L5 & L6 & L7 --> agg
    agg --> verdict
    verdict -->|"no blockers"| tier1
    verdict -->|"has blockers / needs_human"| tier1
    tier1 -->|"batch ok"| emit
    tier1 -->|"batch fails"| tier2
    tier2 -->|"some ok"| emit
    tier2 -->|"all fail"| tier3
    tier3 --> emit
    emit --> COMPLETE

    class START,COMPLETE terminal;
    class push,open_pr phase;
    class gate stateNode;
    class s0,s1,s2 handler;
    class L1,L2,L3,L4,L5,L6 newComponent;
    class L7,agg handler;
    class verdict stateNode;
    class tier1,tier2,emit newComponent;
    class tier3 detector;
Loading

Color Legend:

Color Category Description
Dark Blue Terminal Start and complete states
Purple Phase Recipe orchestration steps
Teal State Decision and routing gates
Orange Handler Diff acquisition, deduplication, slop lens
Green New Component New skill, 6 research lenses, posting tiers, verdict emit
Red Detector Degraded fallback (Tier 3)

Module Dependency Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph L3 ["LAYER 3 — TESTS (verify contracts)"]
        direction LR
        t_skills["★ test_review_research_pr_guards.py<br/>━━━━━━━━━━<br/>behavioral guards: lenses,<br/>verdicts, API mechanics"]
        t_recipe["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>TestResearchRecipeStructure:<br/>review_pr ingredient + step"]
        t_workspace["● test_skills.py<br/>━━━━━━━━━━<br/>BUNDLED_SKILLS count 84→85<br/>RESEARCH_SKILL_NAMES extended"]
    end

    subgraph L2 ["LAYER 2 — RECIPE ORCHESTRATION (wires skill into pipeline)"]
        research["● research.yaml<br/>━━━━━━━━━━<br/>review_pr ingredient (default false)<br/>review_research_pr step<br/>skip_when_false · on_context_limit"]
    end

    subgraph L1 ["LAYER 1 — SKILLS EXTENDED (implementations)"]
        skill["★ review-research-pr/SKILL.md<br/>━━━━━━━━━━<br/>7 research lenses: methodology,<br/>reproducibility, report-quality,<br/>statistical-rigor, isolation,<br/>data-integrity, slop<br/>categories: [research]"]
        write_recipe["● write-recipe/SKILL.md<br/>━━━━━━━━━━<br/>bundled skills list:<br/>review-research-pr inserted<br/>after review-pr"]
    end

    subgraph L0 ["LAYER 0 — CONFIG + DISCOVERY REGISTRY (tier assignment)"]
        defaults["● config/defaults.yaml<br/>━━━━━━━━━━<br/>skills.tier3 list:<br/>review-research-pr added<br/>(same tier as review-pr)"]
        resolver["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>reads tier config<br/>list_all() count: 85"]
    end

    t_skills -->|"reads SKILL.md<br/>asserts lens coverage"| skill
    t_recipe -->|"loads research.yaml<br/>asserts step + ingredient"| research
    t_workspace -->|"calls SkillResolver<br/>asserts count=85"| resolver

    research -->|"run_skill: /autoskillit:review-research-pr<br/>discovers skill at invocation time"| skill

    write_recipe -->|"documents<br/>review-research-pr in<br/>bundled skills list"| skill

    defaults -->|"tier3 list<br/>consumed by SkillResolver"| resolver

    skill -->|"discovered via<br/>tier3 → skills_extended dir"| defaults

    class t_skills newComponent;
    class t_recipe,t_workspace handler;
    class research phase;
    class skill newComponent;
    class write_recipe handler;
    class defaults stateNode;
    class resolver output;
Loading

Color Legend:

Color Category Description
Green New Component New skill file and new test guard
Orange Handler Modified test files and write-recipe doc
Purple Phase Modified recipe orchestration
Teal State Config registry (defaults.yaml)
Dark Teal Output SkillResolver (reads tier config, exposes list_all)

Closes #526

Implementation Plan

Plan files:

  • /home/talon/projects/autoskillit-runs/impl-20260403-092441-340811/.autoskillit/temp/make-plan/review_research_pr_plan_2026-04-03_120000_part_a.md
  • /home/talon/projects/autoskillit-runs/impl-20260403-092441-340811/.autoskillit/temp/make-plan/review_research_pr_plan_2026-04-03_120000_part_b.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step input output cached count time
plan 48 48.8k 2.3M 1 14m 59s
verify 36 28.2k 1.6M 2 9m 11s
implement 76 30.9k 3.0M 2 28m 11s
fix 71 30.8k 4.8M 1 14m 12s
audit_impl 27 18.7k 908.8k 1 8m 3s
open_pr 31 20.6k 1.3M 1 6m 39s
Total 289 178.1k 13.9M 1h 21m

Trecek and others added 3 commits April 3, 2026 09:59
…earch-pr (Part A)

- New tests/skills/test_review_research_pr_guards.py: 8 behavioral guards
  enforcing research lens coverage, inconclusive-results contract,
  verdict mechanics, and GitHub Reviews API posting requirements.
- Append TestResearchRecipeStructure to test_bundled_recipes.py: asserts
  research.yaml has review_pr ingredient (default=false), review_research_pr
  step, skip_when_false gate, and routes to research_complete on any outcome.
- Update test_skills.py: insert review-research-pr into BUNDLED_SKILLS,
  bump skills_extended count 82→83, bump list_all count 84→85,
  add review-research-pr to RESEARCH_SKILL_NAMES.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…(Part A)

- Create skills_extended/review-research-pr/SKILL.md with 7 research lenses
  (methodology, reproducibility, report-quality, statistical-rigor, isolation,
  data-integrity, slop), no deletion_regression, inconclusive-valid constraint,
  [LNNN] markers, and GitHub Reviews API posting mechanics
- Add review_pr ingredient (default false) and review_research_pr step to
  research.yaml, gated by skip_when_false, routing to research_complete on any outcome
- Add review-research-pr to tier3 in defaults.yaml
- Update write-recipe/SKILL.md bundled skills list
- Update doc skill counts (85 → 86)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…older allowlist

- Use inputs.review_pr format in research.yaml skip_when_false (semantic rule requires inputs. prefix)
- Add on_context_limit: research_complete to review_research_pr step (advisory step guard)
- Fix test_research_review_step_skip_when_false to expect inputs.review_pr
- Add review-research-pr entries to placeholder allowlist in test_skill_placeholder_contracts.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit PR Review — Verdict: changes_requested

Comment thread tests/skills/test_review_research_pr_guards.py Outdated
Comment thread tests/skills/test_review_research_pr_guards.py Outdated
Comment thread tests/recipe/test_bundled_recipes.py Outdated
Comment thread tests/recipe/test_bundled_recipes.py
Comment thread tests/workspace/test_skills.py Outdated
Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit review found 5 blocking issues. See inline comments. (Note: REQUEST_CHANGES bypassed — cannot request changes on own PR; verdict: changes_requested)

Trecek and others added 4 commits April 3, 2026 11:04
…ategories: line, not anywhere in file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ews co-location, not independent tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…context_limit routing

- TestResearchRecipeStructure: scope='class' -> scope='function' for xdist safety
- test_research_review_step_routes_to_complete_on_any_outcome: add on_context_limit assertion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ls_in_skills_extended

Function name encoded a stale count (58); implementation asserts 83.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Trecek Trecek added this pull request to the merge queue Apr 3, 2026
Merged via the queue into integration with commit bb33e99 Apr 3, 2026
2 checks passed
@Trecek Trecek deleted the feat-create-review-research-pr-skill-for-research-recipe-prs/526 branch April 3, 2026 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant