Implementation Plan: Citation Integrity Gates for Research Pipeline #661

Merged
Trecek merged 6 commits into integration from add-audit-claims-skill-and-resolve-claims-review-for-citatio/657
Apr 7, 2026

Trecek commented Apr 7, 2026

Summary

Add an optional citation-integrity gate to the research pipeline as two new Tier 3 skills:

  • audit-claims — parallel subagent-driven claim extraction and evidence matching that emits verdict = {approved|changes_requested|needs_human}. Mirrors the review-research-pr pattern but focused on citation integrity across four claim types: experimental, external, methodological, comparative.
  • resolve-claims-review — ACCEPT/REJECT/DISCUSS intent validation with five fix strategies (add_citation, qualify_claim, remove_claim, rerun_required [escalated], design_flaw [escalated]). Mirrors the resolve-research-review pattern.
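
The verdict rule for audit-claims can be sketched roughly as below. This is an illustration only: it assumes each finding is a dict with `file`, `line`, `severity`, and `requires_decision` keys, and the helper names and the severity ranking are hypothetical rather than taken from SKILL.md.

```python
from typing import Any

# Hypothetical severity ranking; the higher rank wins during deduplication.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

Finding = dict[str, Any]


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Keep the highest-severity finding for each (file, line) pair."""
    best: dict[tuple[str, int], Finding] = {}
    for finding in findings:
        key = (finding["file"], finding["line"])
        current = best.get(key)
        if current is None or (
            SEVERITY_RANK[finding["severity"]] > SEVERITY_RANK[current["severity"]]
        ):
            best[key] = finding
    return list(best.values())


def derive_verdict(findings: list[Finding]) -> str:
    """actionable -> changes_requested; decision only -> needs_human; else approved."""
    unique = dedupe(findings)
    decision = [f for f in unique if f["requires_decision"]]
    actionable = [
        f
        for f in unique
        if not f["requires_decision"] and f["severity"] in ("critical", "warning")
    ]
    if actionable:
        return "changes_requested"
    if decision:
        return "needs_human"
    return "approved"
```

Info-only findings fall through to `approved`, which matches the "none → approved" branch in the process flow diagram.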

Recipe changes restructure the PR+Review phase so both read-only analysis gates (review_research_pr and audit_claims) complete before any resolution step begins. This prevents double re-runs when both gates request changes. A new merge_escalations route step (replacing check_escalations) triggers at most one re_run_experiment if either resolution skill escalates needs_rerun = true.
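
A minimal sketch of the deferred routing described above, assuming both verdicts have already been captured. The `resolve` callback is a hypothetical stand-in for running a resolution skill and returning its `needs_rerun` flag; the recipe itself expresses this in research.yaml, not in Python.

```python
from typing import Callable

# Hypothetical callback: runs a resolution skill by step name and
# returns that skill's needs_rerun flag.
Resolver = Callable[[str], bool]


def route_after_gates(review_verdict: str, audit_verdict: str, resolve: Resolver) -> str:
    """Return the next step once both read-only gates have reported."""
    review_needs_rerun = False
    claims_needs_rerun = False

    # route_review_resolve: only changes_requested triggers resolution.
    if review_verdict == "changes_requested":
        review_needs_rerun = resolve("resolve_research_review")

    # route_claims_resolve: the same rule applies to the citation audit.
    if audit_verdict == "changes_requested":
        claims_needs_rerun = resolve("resolve_claims_review")

    # merge_escalations: one rerun covers an escalation from either skill.
    if review_needs_rerun or claims_needs_rerun:
        return "re_run_experiment"
    return "re_push_research"
```

Because the rerun decision is taken once, after both resolution steps, two escalations still lead to a single `re_run_experiment`.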

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    %% TERMINALS %%
    START([START])
    RERUN([re_run_experiment])
    PUSH([re_push_research])

    %% PHASE 1: REVIEW GATE %%
    subgraph ReviewGate ["● Review Gate (research.yaml)"]
        ReviewPR["● review_research_pr<br/>━━━━━━━━━━<br/>review-research-pr skill<br/>skip_when_false: inputs.review_pr<br/>captures: review_verdict"]
    end

    %% PHASE 2: AUDIT CLAIMS (NEW) %%
    subgraph AuditClaimsGate ["★ Audit Claims Gate (audit-claims/SKILL.md — NEW)"]
        AuditEntry["★ audit_claims<br/>━━━━━━━━━━<br/>gated: inputs.audit_claims<br/>retries: 1<br/>captures: audit_verdict"]
        GetDiff["Get PR Diff<br/>━━━━━━━━━━<br/>gh pr diff<br/>save: diff_{pr}.txt"]
        ClaimExtract["★ Phase 1: Claim Extraction<br/>━━━━━━━━━━<br/>Parallel subagents per diff section<br/>(Executive Summary, Results,<br/>Methodology, Discussion, …)<br/>→ claims_{pr}.json"]
        EvidenceMatch["★ Phase 2: Evidence Matching<br/>━━━━━━━━━━<br/>Parallel subagents per claim type<br/>external / methodological / comparative<br/>experimental → skipped (self-evidencing)<br/>→ findings_{pr}.json"]
        AuditAggregate["Aggregate & Deduplicate<br/>━━━━━━━━━━<br/>Deduplicate by (file, line)<br/>Bucket: actionable / decision / info"]
        AuditVerdictDet{"Verdict<br/>Determination<br/>actionable_findings?<br/>decision_findings?"}
        PostInlineReview["★ Post Inline Review<br/>━━━━━━━━━━<br/>GitHub Reviews API<br/>APPROVE / COMMENT / REQUEST_CHANGES<br/>Fallback: individual → summary dump"]

        AuditEntry --> GetDiff
        GetDiff --> ClaimExtract
        ClaimExtract --> EvidenceMatch
        EvidenceMatch --> AuditAggregate
        AuditAggregate --> AuditVerdictDet
        AuditVerdictDet -->|"actionable → changes_requested"| PostInlineReview
        AuditVerdictDet -->|"decision only → needs_human"| PostInlineReview
        AuditVerdictDet -->|"none → approved"| PostInlineReview
    end

    %% PHASE 3: RESOLUTION ROUTING %%
    subgraph ResolutionRouting ["● Resolution Routing (research.yaml — MODIFIED)"]
        RouteReviewResolve{"● route_review_resolve<br/>review_verdict?"}
        ResolveResearch["resolve_research_review<br/>━━━━━━━━━━<br/>retries: 2<br/>captures: review_needs_rerun"]
        RouteClaimsResolve{"● route_claims_resolve<br/>audit_verdict?"}

        RouteReviewResolve -->|"review == changes_requested"| ResolveResearch
        RouteReviewResolve -->|"else (approved / needs_human)"| RouteClaimsResolve
        ResolveResearch -->|"any exit → review_needs_rerun captured"| RouteClaimsResolve
    end

    %% PHASE 4: RESOLVE CLAIMS REVIEW (NEW) %%
    subgraph ResolveClaimsGate ["★ Resolve Claims Review (resolve-claims-review/SKILL.md — NEW)"]
        FetchComments["Fetch Review Comments<br/>━━━━━━━━━━<br/>REST: inline comments + reviews<br/>GraphQL: thread node IDs<br/>(cursor-paginated, skip resolved)"]
        ParseDimGroup["Parse & Dimension-Group<br/>━━━━━━━━━━<br/>Extract [severity] dimension:<br/>citations / methodology /<br/>comparisons / unknown"]
        IntentVal["★ Parallel Intent Validation<br/>━━━━━━━━━━<br/>Subagents per dimension group<br/>→ ACCEPT / REJECT / DISCUSS<br/>+ fix_strategy + escalate flag"]
        FixRoute{"fix_strategy?<br/>ACCEPT / REJECT / DISCUSS"}
        ApplyEdits["Apply Edits<br/>━━━━━━━━━━<br/>add_citation / qualify_claim /<br/>remove_claim<br/>→ pre-commit + git commit"]
        EscalateFind["★ Escalate Finding<br/>━━━━━━━━━━<br/>rerun_required / design_flaw<br/>→ escalation_records_{pr}.json"]
        SkipFix["Skip (REJECT / DISCUSS)<br/>━━━━━━━━━━<br/>No code change<br/>Record skip"]
        RunValCmd["Validation Command<br/>━━━━━━━━━━<br/>claims_review.validation_command<br/>max 3 iterations | null → SKIPPED"]
        ThreadResolve["Resolve Threads + Post Replies<br/>━━━━━━━━━━<br/>GraphQL resolveReviewThread<br/>+ inline reply per comment<br/>Best-effort (never affects exit code)"]
        RerunCheck{"Any rerun_required<br/>in escalation_records?"}

        FetchComments --> ParseDimGroup
        ParseDimGroup --> IntentVal
        IntentVal --> FixRoute
        FixRoute -->|"add_citation / qualify_claim / remove_claim"| ApplyEdits
        FixRoute -->|"rerun_required / design_flaw"| EscalateFind
        FixRoute -->|"REJECT / DISCUSS"| SkipFix
        ApplyEdits --> RunValCmd
        EscalateFind --> ThreadResolve
        SkipFix --> ThreadResolve
        RunValCmd --> ThreadResolve
        ThreadResolve --> RerunCheck
    end

    %% PHASE 5: MERGE ESCALATIONS %%
    MergeEsc{"● merge_escalations<br/>review_needs_rerun<br/>OR claims_needs_rerun?"}

    %% TOP-LEVEL FLOW %%
    START --> ReviewPR
    ReviewPR -->|"any exit (success / failure / context_limit)"| AuditEntry
    PostInlineReview -->|"verdict = approved / changes_requested / needs_human"| RouteReviewResolve

    RouteClaimsResolve -->|"audit == changes_requested"| FetchComments
    RouteClaimsResolve -->|"else (approved / needs_human)"| MergeEsc

    RerunCheck -->|"needs_rerun = true / false"| MergeEsc
    MergeEsc -->|"true"| RERUN
    MergeEsc -->|"false"| PUSH

    %% CLASS ASSIGNMENTS %%
    class START,RERUN,PUSH terminal;
    class ReviewPR,ResolveResearch handler;
    class AuditEntry,ClaimExtract,EvidenceMatch,PostInlineReview,IntentVal,EscalateFind newComponent;
    class GetDiff,AuditAggregate,FetchComments,ParseDimGroup,ApplyEdits,ThreadResolve output;
    class AuditVerdictDet,RouteReviewResolve,RouteClaimsResolve,FixRoute,RerunCheck,MergeEsc stateNode;
    class RunValCmd detector;
    class SkipFix gap;
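
The `fix_strategy?` branch in the diagram above could be expressed roughly as follows. This is a sketch under assumptions: the function name and the returned action labels (`apply_edit`, `escalate`, `skip`) are hypothetical, and only the classification and strategy names come from the PR description.

```python
from typing import Optional

EDIT_STRATEGIES = {"add_citation", "qualify_claim", "remove_claim"}
ESCALATED_STRATEGIES = {"rerun_required", "design_flaw"}


def route_finding(classification: str, fix_strategy: Optional[str] = None) -> str:
    """Map an intent-validation result to an action label."""
    # REJECT and DISCUSS never change code; the skip is only recorded.
    if classification in ("REJECT", "DISCUSS"):
        return "skip"
    if classification != "ACCEPT":
        raise ValueError(f"unknown classification: {classification!r}")
    # ACCEPT splits by strategy: direct edits versus escalations.
    if fix_strategy in EDIT_STRATEGIES:
        return "apply_edit"
    if fix_strategy in ESCALATED_STRATEGIES:
        return "escalate"
    raise ValueError(f"unknown fix_strategy: {fix_strategy!r}")
```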

Scenarios Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ─────────────────────────────────────── %%
    subgraph S1 ["SCENARIO 1: Citation Audit Passes (Happy Path)"]
        direction LR
        S1_PR["review_research_pr<br/>━━━━━━━━━━<br/>reads: worktree diff<br/>writes: review_verdict"]
        S1_AUDIT["★ audit_claims<br/>━━━━━━━━━━<br/>reads: PR diff via gh<br/>writes: claims JSON, findings JSON<br/>emits: verdict=approved"]
        S1_ROUTE_REV["route_review_resolve<br/>━━━━━━━━━━<br/>reads: review_verdict<br/>routes: approved → route_claims_resolve"]
        S1_ROUTE_CL["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict=approved<br/>routes: → merge_escalations"]
        S1_MERGE["merge_escalations<br/>━━━━━━━━━━<br/>reads: needs_rerun flags<br/>routes: → re_push_research"]
        S1_PUSH["re_push_research<br/>━━━━━━━━━━<br/>writes: origin HEAD push<br/>routes: → begin_archival"]
    end

    S1_PR -->|"verdict captured"| S1_AUDIT
    S1_AUDIT -->|"all paths → route"| S1_ROUTE_REV
    S1_ROUTE_REV -->|"approved/skipped"| S1_ROUTE_CL
    S1_ROUTE_CL -->|"approved → skip resolve"| S1_MERGE
    S1_MERGE -->|"no rerun needed"| S1_PUSH

    %% ─────────────────────────────────────── %%
    subgraph S2 ["SCENARIO 2: Citation Fixes Required (changes_requested)"]
        direction LR
        S2_AUDIT["★ audit_claims<br/>━━━━━━━━━━<br/>reads: PR diff sections<br/>Phase 1: extract claims by section<br/>Phase 2: match against evidence<br/>emits: verdict=changes_requested"]
        S2_ROUTE["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict=changes_requested<br/>routes: → resolve_claims_review"]
        S2_RESOLVE["★ resolve_claims_review<br/>━━━━━━━━━━<br/>reads: inline PR comments<br/>groups by dimension: citations/methodology/comparisons<br/>launches: parallel intent-validation subagents<br/>writes: classification_map, fixes"]
        S2_INTENT["Intent Validation Subagents<br/>━━━━━━━━━━<br/>reads: actual file content ±30 lines<br/>classifies: ACCEPT / REJECT / DISCUSS<br/>assigns: fix_strategy per finding"]
        S2_FIX["Apply Fixes<br/>━━━━━━━━━━<br/>add_citation / qualify_claim / remove_claim<br/>writes: git commits in worktree<br/>resolves: review threads via GraphQL"]
        S2_SUMMARY["★ resolve-claims-review summary<br/>━━━━━━━━━━<br/>writes: report_{pr}_{ts}.md<br/>emits: needs_rerun=false"]
    end

    S2_AUDIT -->|"changes_requested"| S2_ROUTE
    S2_ROUTE -->|"routes to resolve"| S2_RESOLVE
    S2_RESOLVE -->|"per-dimension subagents"| S2_INTENT
    S2_INTENT -->|"classification_map"| S2_FIX
    S2_FIX -->|"commits applied"| S2_SUMMARY

    %% ─────────────────────────────────────── %%
    subgraph S3 ["SCENARIO 3: Rerun Escalation (rerun_required fix_strategy)"]
        direction LR
        S3_INTENT["Intent Validation Subagents<br/>━━━━━━━━━━<br/>reads: claim at flagged line<br/>classifies: ACCEPT<br/>fix_strategy=rerun_required"]
        S3_ESCALATE["★ resolve_claims_review<br/>━━━━━━━━━━<br/>escalate=true for rerun_required<br/>writes: escalation_records_{pr}.json<br/>emits: needs_rerun=true"]
        S3_MERGE["merge_escalations<br/>━━━━━━━━━━<br/>reads: claims_needs_rerun=true<br/>routes: → re_run_experiment"]
        S3_RERUN["re_run_experiment<br/>━━━━━━━━━━<br/>runs: run-experiment --adjust<br/>writes: updated results_path"]
        S3_REPORT["re_write_report<br/>━━━━━━━━━━<br/>reads: updated experiment_results<br/>writes: new research report"]
        S3_PUSH["re_push_research<br/>━━━━━━━━━━<br/>pushes: updated branch<br/>routes: → begin_archival"]
    end

    S3_INTENT -->|"rerun_required found"| S3_ESCALATE
    S3_ESCALATE -->|"needs_rerun=true"| S3_MERGE
    S3_MERGE -->|"rerun triggered"| S3_RERUN
    S3_RERUN -->|"new results"| S3_REPORT
    S3_REPORT -->|"updated report"| S3_PUSH

    %% ─────────────────────────────────────── %%
    subgraph S4 ["SCENARIO 4: Dual Gate Sequencing (both gates before resolution)"]
        direction LR
        S4_REV["● review_research_pr<br/>━━━━━━━━━━<br/>skip_when_false: inputs.review_pr<br/>on_context_limit/failure: audit_claims<br/>on_result: → audit_claims"]
        S4_AUDIT["★ audit_claims<br/>━━━━━━━━━━<br/>skip_when_false: inputs.audit_claims<br/>on_exhausted/failure: route_review_resolve<br/>on_result: → route_review_resolve"]
        S4_GATE1["route_review_resolve<br/>━━━━━━━━━━<br/>reads: review_verdict<br/>deferred gate 1: review changes?"]
        S4_GATE2["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict<br/>deferred gate 2: claims changes?"]
        S4_RESOLVE_REV["resolve_research_review<br/>━━━━━━━━━━<br/>conditional: review_verdict=changes_requested<br/>on_success: → route_claims_resolve"]
    end

    S4_REV -->|"all paths"| S4_AUDIT
    S4_AUDIT -->|"all paths"| S4_GATE1
    S4_GATE1 -->|"changes_requested"| S4_RESOLVE_REV
    S4_GATE1 -->|"other verdicts"| S4_GATE2
    S4_RESOLVE_REV -->|"after resolve"| S4_GATE2

    %% ─────────────────────────────────────── %%
    subgraph S5 ["SCENARIO 5: Graceful Degradation (audit disabled or gh unavailable)"]
        direction LR
        S5_INPUT["● research.yaml ingredient<br/>━━━━━━━━━━<br/>audit_claims: false (default)<br/>reads: inputs.audit_claims"]
        S5_SKIP["★ audit_claims<br/>━━━━━━━━━━<br/>skip_when_false: inputs.audit_claims<br/>OR gh unavailable:<br/>verdict=approved, exit 0"]
        S5_ROUTE["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict=approved/empty<br/>routes: → merge_escalations (bypass)"]
        S5_END["merge_escalations → re_push<br/>━━━━━━━━━━<br/>no claims_needs_rerun flag set<br/>proceeds directly to archival"]
    end

    S5_INPUT -->|"false → skip"| S5_SKIP
    S5_SKIP -->|"approved emitted"| S5_ROUTE
    S5_ROUTE -->|"no resolve step"| S5_END

    %% CLASS ASSIGNMENTS %%
    class S1_PR,S4_REV handler;
    class S1_AUDIT,S2_AUDIT,S5_SKIP newComponent;
    class S1_ROUTE_REV,S1_ROUTE_CL,S1_MERGE,S4_GATE1,S4_GATE2,S5_ROUTE phase;
    class S1_PUSH,S3_PUSH output;
    class S2_RESOLVE,S3_ESCALATE newComponent;
    class S2_ROUTE,S3_MERGE phase;
    class S2_INTENT,S3_INTENT handler;
    class S2_FIX,S3_RERUN,S3_REPORT handler;
    class S2_SUMMARY output;
    class S3_INTENT detector;
    class S4_AUDIT newComponent;
    class S4_RESOLVE_REV handler;
    class S5_INPUT stateNode;
    class S5_END output;

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START([RESEARCH RECIPE STATE])

    %% ─── INIT_PRESERVE: Recipe Context Fields ─── %%
    subgraph InitPreserve ["INIT_PRESERVE — Recipe Context (● modified)"]
        direction LR
        WT["● worktree_path<br/>━━━━━━━━━━<br/>Set at create_worktree<br/>Never re-derived"]
        PRURL["● pr_url<br/>━━━━━━━━━━<br/>Set at compose_research_pr<br/>Passed to audit-claims"]
        BB["● base_branch<br/>━━━━━━━━━━<br/>From ingredient<br/>Never changes"]
    end

    %% ─── DEFERRED GATE PATTERN ─── %%
    subgraph DeferredGates ["DEFERRED GATE ROUTING (● research.yaml) — both gates complete before routing"]
        direction TB
        RVW_GATE{"skip_when_false<br/>inputs.review_pr<br/>━━━━━━━━━━<br/>INIT_ONLY per recipe run"}
        RVW_RESULT["● review_verdict<br/>━━━━━━━━━━<br/>MUTABLE<br/>approved | changes_requested | needs_human"]
        AUD_GATE{"★ skip_when_false<br/>inputs.audit_claims<br/>━━━━━━━━━━<br/>INIT_ONLY per recipe run"}
        AUD_RESULT["★ audit_verdict<br/>━━━━━━━━━━<br/>MUTABLE<br/>approved | changes_requested | needs_human"]
        ROUTE_RVW["● route_review_resolve<br/>━━━━━━━━━━<br/>DERIVED routing<br/>reads: review_verdict"]
        ROUTE_CLM["★ route_claims_resolve<br/>━━━━━━━━━━<br/>DERIVED routing<br/>reads: audit_verdict"]
    end

    %% ─── AUDIT-CLAIMS STATE MACHINE ─── %%
    subgraph AuditClaimsBlock ["★ audit-claims Skill — Claim Extraction State"]
        direction TB
        AC_PH1["★ Phase 1: Extract Claims<br/>━━━━━━━━━━<br/>Parallel subagents by section<br/>INIT_ONLY: pr_url, diff_{pr}.txt"]
        AC_CLAIMS["★ claims_{pr}.json<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>Built from Phase 1 subagents<br/>{file, line, claim_text, claim_type}"]
        AC_PH2["★ Phase 2: Evidence Match<br/>━━━━━━━━━━<br/>Parallel subagents by claim_type<br/>experimental → always skip"]
        AC_FINDINGS["★ findings_{pr}.json<br/>━━━━━━━━━━<br/>APPEND_ONLY (then deduped)<br/>Dedup: keep highest severity<br/>per (file, line) pair"]
        AC_VERDICT{"★ Verdict Derivation<br/>━━━━━━━━━━<br/>DERIVED (never stored mutable)<br/>actionable? → changes_requested<br/>decision_only? → needs_human<br/>none? → approved"}
        AC_OUT["★ verdict = {value}<br/>━━━━━━━━━━<br/>Structured output token<br/>Captured as audit_verdict"]
    end

    %% ─── RESOLVE-CLAIMS-REVIEW STATE MACHINE ─── %%
    subgraph ResolveBlock ["★ resolve-claims-review Skill — Fix Application State"]
        direction TB
        RC_CFG["★ claims_review config<br/>━━━━━━━━━━<br/>INIT_ONLY for session<br/>validation_command (null=skip)<br/>validation_timeout (default: 120)"]
        RC_THREADS["★ Thread + Comment Fetch<br/>━━━━━━━━━━<br/>inline_comments_{pr}.json<br/>threads_{pr}.json<br/>dimension_groups_{pr}.json"]
        RC_INTENT["★ Intent Validation Gate<br/>━━━━━━━━━━<br/>BEFORE any code changes<br/>Parallel by dimension group<br/>Classifies: ACCEPT | REJECT | DISCUSS"]
        RC_CLASSMAP["★ classification_map_{pr}.json<br/>━━━━━━━━━━<br/>MUTABLE (built from validation)<br/>Keys: comment_id → verdict_entry<br/>fix_strategy per ACCEPT"]
        RC_ADDR["★ addressed_thread_ids<br/>━━━━━━━━━━<br/>APPEND_ONLY (in-memory)<br/>Grows per ACCEPT fix applied<br/>Consumed at Step 6: thread resolve"]
        RC_ESC["★ escalation_records<br/>━━━━━━━━━━<br/>APPEND_ONLY (in-memory)<br/>strategy: rerun_required | design_flaw<br/>Consumed at output determination"]
        RC_VGATE{"★ Validation Command Gate<br/>━━━━━━━━━━<br/>null → SKIP entirely<br/>else: max 3 retry iterations<br/>FAIL after 3 → exit non-zero"}
        RC_DERIVE{"★ needs_rerun Derivation<br/>━━━━━━━━━━<br/>DERIVED from escalation_records<br/>any strategy==rerun_required → true<br/>all design_flaw or none → false"}
        RC_OUT["★ needs_rerun = {true|false}<br/>━━━━━━━━━━<br/>Structured output token<br/>Captured as claims_needs_rerun"]
    end

    %% ─── MERGE ESCALATIONS ─── %%
    subgraph MergeBlock ["● merge_escalations — Combined Rerun Gate (● modified)"]
        direction TB
        MERGE_GATE{"● review_needs_rerun == true<br/>OR claims_needs_rerun == true<br/>━━━━━━━━━━<br/>DERIVED routing (reads 2 MUTABLE fields)"}
        RERUN["re_run_experiment"]
        PUSH["re_push_research"]
    end

    %% ─── RESUME DETECTION ─── %%
    subgraph ResumeBlock ["RESUME DETECTION STRATEGY"]
        direction LR
        R1["Tier 1: worktree_path present<br/>━━━━━━━━━━<br/>Resume from execution phase"]
        R2["Tier 2: pr_url present<br/>━━━━━━━━━━<br/>Resume from PR review phase"]
        R3["Tier 3: audit_verdict present<br/>━━━━━━━━━━<br/>Skip audit → route_claims_resolve"]
        R4["Tier 4: claims_needs_rerun present<br/>━━━━━━━━━━<br/>Skip resolution → merge_escalations"]
    end

    %% ─── CONNECTIONS ─── %%
    START --> InitPreserve

    WT --> RVW_GATE
    BB --> RVW_GATE
    PRURL --> AUD_GATE

    RVW_GATE -->|"skip/run"| RVW_RESULT
    RVW_RESULT --> ROUTE_RVW

    AUD_GATE -->|"skip (audit_claims=false)"| AUD_RESULT
    AUD_GATE -->|"run (audit_claims=true)"| AC_PH1
    AC_PH1 --> AC_CLAIMS --> AC_PH2 --> AC_FINDINGS --> AC_VERDICT --> AC_OUT
    AC_OUT --> AUD_RESULT
    AUD_RESULT --> ROUTE_CLM

    ROUTE_RVW -->|"changes_requested"| RC_CFG
    ROUTE_RVW -->|"other → skip resolve_research_review"| ROUTE_CLM

    ROUTE_CLM -->|"changes_requested"| RC_CFG
    ROUTE_CLM -->|"approved | needs_human | skipped"| MERGE_GATE

    RC_CFG --> RC_THREADS --> RC_INTENT --> RC_CLASSMAP
    RC_CLASSMAP -->|"ACCEPT: add_citation, qualify_claim, remove_claim"| RC_ADDR
    RC_CLASSMAP -->|"ACCEPT: rerun_required, design_flaw → ESCALATE"| RC_ESC
    RC_ADDR --> RC_VGATE
    RC_ESC --> RC_VGATE
    RC_VGATE -->|"SKIP or PASS"| RC_DERIVE
    RC_VGATE -->|"FAIL (3 retries exhausted)"| FAIL_EXIT([exit non-zero])
    RC_DERIVE --> RC_OUT --> MERGE_GATE

    MERGE_GATE -->|"either true"| RERUN
    MERGE_GATE -->|"both false"| PUSH

    R1 --> R2 --> R3 --> R4

    %% ─── CLASS ASSIGNMENTS ─── %%
    class START terminal;
    class WT,PRURL,BB stateNode;
    class RVW_GATE,AUD_GATE detector;
    class RVW_RESULT,AUD_RESULT stateNode;
    class ROUTE_RVW phase;
    class ROUTE_CLM newComponent;
    class AC_PH1,AC_PH2 newComponent;
    class AC_CLAIMS,AC_FINDINGS output;
    class AC_VERDICT,AC_OUT newComponent;
    class RC_CFG stateNode;
    class RC_THREADS handler;
    class RC_INTENT newComponent;
    class RC_CLASSMAP stateNode;
    class RC_ADDR,RC_ESC handler;
    class RC_VGATE newComponent;
    class RC_DERIVE,RC_OUT newComponent;
    class MERGE_GATE phase;
    class RERUN,PUSH output;
    class R1,R2,R3,R4 cli;
    class FAIL_EXIT terminal;
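
The validation command gate and the `needs_rerun` derivation from the diagram above can be sketched as below. Treat it as an assumption-laden illustration: the function names are hypothetical, the timeout is assumed to be in seconds and applied per iteration, and a timed-out run is assumed to count as a failed iteration.

```python
import subprocess
from typing import Optional


def run_validation(command: Optional[str], timeout: int = 120, max_iterations: int = 3) -> str:
    """Return "SKIPPED", "PASS", or "FAIL" for the validation command gate."""
    if command is None:
        # A null validation_command skips the gate entirely.
        return "SKIPPED"
    for _ in range(max_iterations):
        try:
            result = subprocess.run(command, shell=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # assumption: a timeout counts as a failed iteration
        if result.returncode == 0:
            return "PASS"
        # The skill would analyze the failure and adjust the commit before retrying.
    return "FAIL"


def derive_needs_rerun(escalation_records: list[dict]) -> bool:
    """True only if some escalation uses the rerun_required strategy."""
    return any(record["strategy"] == "rerun_required" for record in escalation_records)
```

A `design_flaw` escalation is still recorded but does not set `needs_rerun` on its own.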

Closes #657

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260407-144555-361346/.autoskillit/temp/make-plan/audit_claims_plan_2026-04-07_120100.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|---|---|---|---|---|---|---|
| plan | 39 | 53.4k | 1.2M | 103.9k | 1 | 22m 30s |
| verify | 24 | 19.1k | 1.1M | 78.3k | 1 | 5m 53s |
| implement | 83 | 32.5k | 7.3M | 114.9k | 1 | 21m 12s |
| fix | 36 | 14.6k | 1.4M | 61.5k | 1 | 6m 44s |
| prepare_pr | 10 | 5.8k | 179.7k | 35.2k | 1 | 1m 31s |
| run_arch_lenses | 2.5k | 45.6k | 1.1M | 183.4k | 3 | 12m 41s |
| compose_pr | 12 | 19.6k | 308.3k | 49.5k | 1 | 4m 31s |
| Total | 2.7k | 190.7k | 12.6M | 626.7k | | 1h 15m |

Trecek and others added 2 commits April 7, 2026 15:33
… gate routing

Adds two new Tier 3 research skills and restructures the research recipe PR+Review
phase so both read-only audit gates (review_research_pr and audit_claims) complete
before any resolution step begins. Replaces check_escalations with merge_escalations
which consumes rerun flags from both resolution skills.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… skills

- Remove stale test_review_research_pr_routes_to_begin_archival and
  test_resolve_research_review_routes_to_begin_archival (superseded by
  deferred routing to audit_claims/route_claims_resolve)
- Add skill_contracts.yaml entries for review-research-pr, audit-claims,
  and resolve-claims-review to satisfy undeclared-capture-key rule
- Add pseudocode allowlist entries for audit-claims and resolve-claims-review
  bash placeholder tokens (mirrors review-research-pr/resolve-research-review
  patterns already in the allowlist)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek left a comment

AutoSkillit PR Review — Verdict: changes_requested

25 actionable findings (critical/warning) requiring fixes before merge. See inline comments.

assert step.on_failure == "begin_archival"
assert step.on_context_limit == "begin_archival"

def test_research_no_issue_number_ingredient(self, recipe) -> None:

[warning] tests: Deleted test test_research_review_step_routes_to_begin_archival_on_any_outcome asserted review_research_pr.on_failure and on_context_limit. The new routing targets ('audit_claims') are not directly tested on the review_research_pr step — only the downstream audit_claims step's own routes are tested. Add assertions for review_research_pr.on_failure == 'audit_claims' and review_research_pr.on_context_limit == 'audit_claims'.
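
One way the requested assertions might look, sketched against a stand-in object. The `recipe.steps[...]` access and the test name are assumptions; the real suite would use its own `recipe` fixture and step lookup.

```python
from types import SimpleNamespace

# Stand-in for the real `recipe` fixture; only the two routes under test are modeled.
recipe = SimpleNamespace(
    steps={
        "review_research_pr": SimpleNamespace(
            on_failure="audit_claims",
            on_context_limit="audit_claims",
        )
    }
)


def test_review_research_pr_routes_to_audit_claims(recipe=recipe) -> None:
    """Failure and context-limit exits should both fall through to audit_claims."""
    step = recipe.steps["review_research_pr"]
    assert step.on_failure == "audit_claims"
    assert step.on_context_limit == "audit_claims"
```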

assert matching, "No condition with changes_requested"
assert matching[0] == "resolve_research_review"

def test_research_validates_cleanly(self, recipe) -> None:

[warning] tests: Deleted test test_review_research_pr_has_on_result_routing asserted changes_requested routes to resolve_research_review; deleted test test_review_research_pr_routes_to_begin_archival asserted the default on_result route. No new test verifies the on_result default route of review_research_pr now points to audit_claims.

Comment thread tests/recipe/test_bundled_recipes.py
Comment thread src/autoskillit/skills_extended/audit-claims/SKILL.md
Comment thread src/autoskillit/skills_extended/audit-claims/SKILL.md
expected_output_patterns:
- "verdict\\s*=\\s*(approved|changes_requested|needs_human)"
pattern_examples:
- "verdict = approved\n%%ORDER_UP%%"

[warning] cohesion: audit-claims contract's pattern_examples omits the 'needs_human' example. The skill emits three verdict values (approved, changes_requested, needs_human) per allowed_values, but pattern_examples only covers two. The review-pr contract serves as the established template and includes all three examples.
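
A quick coverage check for the contract could look like this sketch. The regex is the one from `expected_output_patterns` after YAML unescaping; the second and third example strings and the helper name are assumptions made for illustration.

```python
import re

# expected_output_patterns entry, as it reads once YAML has unescaped it.
VERDICT_PATTERN = re.compile(r"verdict\s*=\s*(approved|changes_requested|needs_human)")

# Hypothetical pattern_examples, one per allowed verdict value.
PATTERN_EXAMPLES = [
    "verdict = approved\n%%ORDER_UP%%",
    "verdict = changes_requested\n%%ORDER_UP%%",
    "verdict = needs_human\n%%ORDER_UP%%",
]


def covered_verdicts(examples: list[str]) -> set[str]:
    """Return the verdict values that at least one example demonstrates."""
    covered = set()
    for example in examples:
        match = VERDICT_PATTERN.search(example)
        if match:
            covered.add(match.group(1))
    return covered
```

Comparing `covered_verdicts(...)` against the contract's `allowed_values` would flag a missing `needs_human` example automatically.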

If `pr_url` is missing or positional args are insufficient, abort with:
`"Usage: /autoskillit:audit-claims <worktree_path> <base_branch> <pr_url>"`

### Step 0.5 — Code-Index Initialization (required before any code-index tool call)

[warning] slop: Step 0.5 (L77-L94) is generic code-index boilerplate repeated verbatim from CLAUDE.md and other skills. The path-format examples add no skill-specific value. Note: CLAUDE.md mandates this step — the fix is to trim the examples to skill-relevant paths only, not to remove the step.

- `decision_findings` — requires_decision=true (any severity)
- `info_findings` — severity == "info" AND requires_decision=false

### Step 4.5: Echo Primary Obligation

[critical] slop: Step 4.5 'Echo Primary Obligation' is an AI prompt-engineering directive instructing the executor to narrate its own obligation aloud. This is scaffolding metadata embedded in the skill spec. If this is intentional (mirrors review-pr SKILL.md), it should be acknowledged; otherwise remove.

Investigated: this is intentional (false_positive_intentional_pattern). Step 4.5 mirrors review-pr/SKILL.md:302-308, which has identical Step 4.5 and Step 6.5 sections. The pattern is deliberate prompt-engineering scaffolding that keeps the AI executor from skipping inline comment posting, and it was carried over from the source skill.

gh pr review {pr_number} --comment --body "{summary_markdown}"
```

### Step 6.5: Post-Completion Confirmation

[warning] slop: Step 6.5 'Post-Completion Confirmation' instructs the executor to recite a confirmation string aloud after posting comments. This is an AI meta-prompt artifact. If mirroring review-pr SKILL.md intentionally, document why; otherwise remove.

exit(1)
# Analyze failures, revert/adjust problematic commit, retry
```

[warning] slop: Prose paragraph after the validation code block (lines 283-284) fully restates what the code already shows; AI-generated filler that adds no information.

Trecek and others added 4 commits April 7, 2026 16:21
…ee, scope gh repo view, mkdir before diff write, filter invalid line values in COMMENTS_JSON, track Tier 1 fallback success/failure counts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t rev-parse, null-check PR_NUMBER, discard partial subagent JSON, clarify timeout semantics, document claims_review config, add Step 0.5 code-index init

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_verdict, audit_verdict, review_needs_rerun, claims_needs_rerun in routing step notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…te_claims_resolve/merge_escalations; add needs_human pattern_examples to review-research-pr and audit-claims contracts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek added this pull request to the merge queue Apr 7, 2026
Merged via the queue into integration with commit a9150d6 Apr 7, 2026
2 checks passed
Trecek deleted the add-audit-claims-skill-and-resolve-claims-review-for-citatio/657 branch April 7, 2026 23:35