Implementation Plan: Citation Integrity Gates for Research Pipeline by Trecek · Pull Request #661 · TalonT-Org/AutoSkillit

Trecek · 2026-04-07T23:00:22Z

Summary

Add an optional citation-integrity gate to the research pipeline as two new Tier 3 skills:

audit-claims — parallel subagent-driven claim extraction and evidence matching that emits verdict = {approved|changes_requested|needs_human}. Mirrors the review-research-pr pattern but focused on citation integrity across four claim types: experimental, external, methodological, comparative.
resolve-claims-review — ACCEPT/REJECT/DISCUSS intent validation with five fix strategies (add_citation, qualify_claim, remove_claim, rerun_required [escalated], design_flaw [escalated]). Mirrors the resolve-research-review pattern.

Recipe changes restructure the PR+Review phase so both read-only analysis gates (review_research_pr and audit_claims) complete before any resolution step begins. This prevents double re-runs when both gates request changes. A new merge_escalations route step (replacing check_escalations) triggers at most one re_run_experiment if either resolution skill escalates needs_rerun = true.
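
The combined rerun gate can be pictured as a small pure function. This is a minimal sketch, not the recipe engine's actual routing code: the function name `merge_escalations_route`, the dict-shaped context, and the string encoding of the flags are all assumptions made for illustration, as is treating an absent key as false.

```python
def merge_escalations_route(context: dict) -> str:
    """Pick the next step from the two captured rerun flags."""

    def is_true(key: str) -> bool:
        # Assumption: captured values are strings such as "true"/"false";
        # a missing key (the resolution skill never ran) counts as false.
        return str(context.get(key, "false")).strip().lower() == "true"

    # Either resolution skill escalating is enough, and the result is
    # a single re_run_experiment rather than one rerun per gate.
    if is_true("review_needs_rerun") or is_true("claims_needs_rerun"):
        return "re_run_experiment"
    return "re_push_research"
```

Because both flags are read in one place, two escalations still yield one rerun, which is the property the restructured phase is meant to guarantee.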

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    %% TERMINALS %%
    START([START])
    RERUN([re_run_experiment])
    PUSH([re_push_research])

    %% PHASE 1: REVIEW GATE %%
    subgraph ReviewGate ["● Review Gate (research.yaml)"]
        ReviewPR["● review_research_pr<br/>━━━━━━━━━━<br/>review-research-pr skill<br/>skip_when_false: inputs.review_pr<br/>captures: review_verdict"]
    end

    %% PHASE 2: AUDIT CLAIMS (NEW) %%
    subgraph AuditClaimsGate ["★ Audit Claims Gate (audit-claims/SKILL.md — NEW)"]
        AuditEntry["★ audit_claims<br/>━━━━━━━━━━<br/>gated: inputs.audit_claims<br/>retries: 1<br/>captures: audit_verdict"]
        GetDiff["Get PR Diff<br/>━━━━━━━━━━<br/>gh pr diff<br/>save: diff_{pr}.txt"]
        ClaimExtract["★ Phase 1: Claim Extraction<br/>━━━━━━━━━━<br/>Parallel subagents per diff section<br/>(Executive Summary, Results,<br/>Methodology, Discussion, …)<br/>→ claims_{pr}.json"]
        EvidenceMatch["★ Phase 2: Evidence Matching<br/>━━━━━━━━━━<br/>Parallel subagents per claim type<br/>external / methodological / comparative<br/>experimental → skipped (self-evidencing)<br/>→ findings_{pr}.json"]
        AuditAggregate["Aggregate & Deduplicate<br/>━━━━━━━━━━<br/>Deduplicate by (file, line)<br/>Bucket: actionable / decision / info"]
        AuditVerdictDet{"Verdict<br/>Determination<br/>actionable_findings?<br/>decision_findings?"}
        PostInlineReview["★ Post Inline Review<br/>━━━━━━━━━━<br/>GitHub Reviews API<br/>APPROVE / COMMENT / REQUEST_CHANGES<br/>Fallback: individual → summary dump"]

        AuditEntry --> GetDiff
        GetDiff --> ClaimExtract
        ClaimExtract --> EvidenceMatch
        EvidenceMatch --> AuditAggregate
        AuditAggregate --> AuditVerdictDet
        AuditVerdictDet -->|"actionable → changes_requested"| PostInlineReview
        AuditVerdictDet -->|"decision only → needs_human"| PostInlineReview
        AuditVerdictDet -->|"none → approved"| PostInlineReview
    end

    %% PHASE 3: RESOLUTION ROUTING %%
    subgraph ResolutionRouting ["● Resolution Routing (research.yaml — MODIFIED)"]
        RouteReviewResolve{"● route_review_resolve<br/>review_verdict?"}
        ResolveResearch["resolve_research_review<br/>━━━━━━━━━━<br/>retries: 2<br/>captures: review_needs_rerun"]
        RouteClaimsResolve{"● route_claims_resolve<br/>audit_verdict?"}

        RouteReviewResolve -->|"review == changes_requested"| ResolveResearch
        RouteReviewResolve -->|"else (approved / needs_human)"| RouteClaimsResolve
        ResolveResearch -->|"any exit → review_needs_rerun captured"| RouteClaimsResolve
    end

    %% PHASE 4: RESOLVE CLAIMS REVIEW (NEW) %%
    subgraph ResolveClaimsGate ["★ Resolve Claims Review (resolve-claims-review/SKILL.md — NEW)"]
        FetchComments["Fetch Review Comments<br/>━━━━━━━━━━<br/>REST: inline comments + reviews<br/>GraphQL: thread node IDs<br/>(cursor-paginated, skip resolved)"]
        ParseDimGroup["Parse & Dimension-Group<br/>━━━━━━━━━━<br/>Extract [severity] dimension:<br/>citations / methodology /<br/>comparisons / unknown"]
        IntentVal["★ Parallel Intent Validation<br/>━━━━━━━━━━<br/>Subagents per dimension group<br/>→ ACCEPT / REJECT / DISCUSS<br/>+ fix_strategy + escalate flag"]
        FixRoute{"fix_strategy?<br/>ACCEPT / REJECT / DISCUSS"}
        ApplyEdits["Apply Edits<br/>━━━━━━━━━━<br/>add_citation / qualify_claim /<br/>remove_claim<br/>→ pre-commit + git commit"]
        EscalateFind["★ Escalate Finding<br/>━━━━━━━━━━<br/>rerun_required / design_flaw<br/>→ escalation_records_{pr}.json"]
        SkipFix["Skip (REJECT / DISCUSS)<br/>━━━━━━━━━━<br/>No code change<br/>Record skip"]
        RunValCmd["Validation Command<br/>━━━━━━━━━━<br/>claims_review.validation_command<br/>max 3 iterations | null → SKIPPED"]
        ThreadResolve["Resolve Threads + Post Replies<br/>━━━━━━━━━━<br/>GraphQL resolveReviewThread<br/>+ inline reply per comment<br/>Best-effort (never affects exit code)"]
        RerunCheck{"Any rerun_required<br/>in escalation_records?"}

        FetchComments --> ParseDimGroup
        ParseDimGroup --> IntentVal
        IntentVal --> FixRoute
        FixRoute -->|"add_citation / qualify_claim / remove_claim"| ApplyEdits
        FixRoute -->|"rerun_required / design_flaw"| EscalateFind
        FixRoute -->|"REJECT / DISCUSS"| SkipFix
        ApplyEdits --> RunValCmd
        EscalateFind --> ThreadResolve
        SkipFix --> ThreadResolve
        RunValCmd --> ThreadResolve
        ThreadResolve --> RerunCheck
    end

    %% PHASE 5: MERGE ESCALATIONS %%
    MergeEsc{"● merge_escalations<br/>review_needs_rerun<br/>OR claims_needs_rerun?"}

    %% TOP-LEVEL FLOW %%
    START --> ReviewPR
    ReviewPR -->|"any exit (success / failure / context_limit)"| AuditEntry
    PostInlineReview -->|"verdict = approved / changes_requested / needs_human"| RouteReviewResolve

    RouteClaimsResolve -->|"audit == changes_requested"| FetchComments
    RouteClaimsResolve -->|"else (approved / needs_human)"| MergeEsc

    RerunCheck -->|"needs_rerun = true / false"| MergeEsc
    MergeEsc -->|"true"| RERUN
    MergeEsc -->|"false"| PUSH

    %% CLASS ASSIGNMENTS %%
    class START,RERUN,PUSH terminal;
    class ReviewPR,ResolveResearch handler;
    class AuditEntry,ClaimExtract,EvidenceMatch,PostInlineReview,IntentVal,EscalateFind newComponent;
    class GetDiff,AuditAggregate,FetchComments,ParseDimGroup,ApplyEdits,ThreadResolve output;
    class AuditVerdictDet,RouteReviewResolve,RouteClaimsResolve,FixRoute,RerunCheck,MergeEsc stateNode;
    class RunValCmd detector;
    class SkipFix gap;
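
The aggregate, bucket, and verdict nodes in the diagram above can be sketched roughly as follows. The finding fields (`file`, `line`, `severity`, `requires_decision`) follow the names used in this PR, but the severity ordering and the rule that any remaining non-info finding is actionable are assumptions, and the function names are hypothetical.

```python
# Assumed ordering: critical outranks warning, which outranks info.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def dedupe(findings):
    """Keep the highest-severity finding for each (file, line) pair."""
    best = {}
    for finding in findings:
        key = (finding["file"], finding["line"])
        current = best.get(key)
        if current is None or (
            SEVERITY_RANK[finding["severity"]] > SEVERITY_RANK[current["severity"]]
        ):
            best[key] = finding
    return list(best.values())


def bucket(findings):
    """Split findings into actionable / decision / info buckets."""
    buckets = {"actionable": [], "decision": [], "info": []}
    for finding in findings:
        if finding.get("requires_decision"):
            buckets["decision"].append(finding)  # any severity
        elif finding["severity"] == "info":
            buckets["info"].append(finding)
        else:
            # Assumption: critical/warning without requires_decision.
            buckets["actionable"].append(finding)
    return buckets


def derive_verdict(buckets):
    """actionable -> changes_requested, decision only -> needs_human, else approved."""
    if buckets["actionable"]:
        return "changes_requested"
    if buckets["decision"]:
        return "needs_human"
    return "approved"
```

Deriving the verdict from the buckets each time, rather than storing it, keeps it consistent with the findings file it summarizes.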

Scenarios Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ─────────────────────────────────────── %%
    subgraph S1 ["SCENARIO 1: Citation Audit Passes (Happy Path)"]
        direction LR
        S1_PR["review_research_pr<br/>━━━━━━━━━━<br/>reads: worktree diff<br/>writes: review_verdict"]
        S1_AUDIT["★ audit_claims<br/>━━━━━━━━━━<br/>reads: PR diff via gh<br/>writes: claims JSON, findings JSON<br/>emits: verdict=approved"]
        S1_ROUTE_REV["route_review_resolve<br/>━━━━━━━━━━<br/>reads: review_verdict<br/>routes: approved → route_claims_resolve"]
        S1_ROUTE_CL["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict=approved<br/>routes: → merge_escalations"]
        S1_MERGE["merge_escalations<br/>━━━━━━━━━━<br/>reads: needs_rerun flags<br/>routes: → re_push_research"]
        S1_PUSH["re_push_research<br/>━━━━━━━━━━<br/>writes: origin HEAD push<br/>routes: → begin_archival"]
    end

    S1_PR -->|"verdict captured"| S1_AUDIT
    S1_AUDIT -->|"all paths → route"| S1_ROUTE_REV
    S1_ROUTE_REV -->|"approved/skipped"| S1_ROUTE_CL
    S1_ROUTE_CL -->|"approved → skip resolve"| S1_MERGE
    S1_MERGE -->|"no rerun needed"| S1_PUSH

    %% ─────────────────────────────────────── %%
    subgraph S2 ["SCENARIO 2: Citation Fixes Required (changes_requested)"]
        direction LR
        S2_AUDIT["★ audit_claims<br/>━━━━━━━━━━<br/>reads: PR diff sections<br/>Phase 1: extract claims by section<br/>Phase 2: match against evidence<br/>emits: verdict=changes_requested"]
        S2_ROUTE["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict=changes_requested<br/>routes: → resolve_claims_review"]
        S2_RESOLVE["★ resolve_claims_review<br/>━━━━━━━━━━<br/>reads: inline PR comments<br/>groups by dimension: citations/methodology/comparisons<br/>launches: parallel intent-validation subagents<br/>writes: classification_map, fixes"]
        S2_INTENT["Intent Validation Subagents<br/>━━━━━━━━━━<br/>reads: actual file content ±30 lines<br/>classifies: ACCEPT / REJECT / DISCUSS<br/>assigns: fix_strategy per finding"]
        S2_FIX["Apply Fixes<br/>━━━━━━━━━━<br/>add_citation / qualify_claim / remove_claim<br/>writes: git commits in worktree<br/>resolves: review threads via GraphQL"]
        S2_SUMMARY["★ resolve-claims-review summary<br/>━━━━━━━━━━<br/>writes: report_{pr}_{ts}.md<br/>emits: needs_rerun=false"]
    end

    S2_AUDIT -->|"changes_requested"| S2_ROUTE
    S2_ROUTE -->|"routes to resolve"| S2_RESOLVE
    S2_RESOLVE -->|"per-dimension subagents"| S2_INTENT
    S2_INTENT -->|"classification_map"| S2_FIX
    S2_FIX -->|"commits applied"| S2_SUMMARY

    %% ─────────────────────────────────────── %%
    subgraph S3 ["SCENARIO 3: Rerun Escalation (rerun_required fix_strategy)"]
        direction LR
        S3_INTENT["Intent Validation Subagents<br/>━━━━━━━━━━<br/>reads: claim at flagged line<br/>classifies: ACCEPT<br/>fix_strategy=rerun_required"]
        S3_ESCALATE["★ resolve_claims_review<br/>━━━━━━━━━━<br/>escalate=true for rerun_required<br/>writes: escalation_records_{pr}.json<br/>emits: needs_rerun=true"]
        S3_MERGE["merge_escalations<br/>━━━━━━━━━━<br/>reads: claims_needs_rerun=true<br/>routes: → re_run_experiment"]
        S3_RERUN["re_run_experiment<br/>━━━━━━━━━━<br/>runs: run-experiment --adjust<br/>writes: updated results_path"]
        S3_REPORT["re_write_report<br/>━━━━━━━━━━<br/>reads: updated experiment_results<br/>writes: new research report"]
        S3_PUSH["re_push_research<br/>━━━━━━━━━━<br/>pushes: updated branch<br/>routes: → begin_archival"]
    end

    S3_INTENT -->|"rerun_required found"| S3_ESCALATE
    S3_ESCALATE -->|"needs_rerun=true"| S3_MERGE
    S3_MERGE -->|"rerun triggered"| S3_RERUN
    S3_RERUN -->|"new results"| S3_REPORT
    S3_REPORT -->|"updated report"| S3_PUSH

    %% ─────────────────────────────────────── %%
    subgraph S4 ["SCENARIO 4: Dual Gate Sequencing (both gates before resolution)"]
        direction LR
        S4_REV["● review_research_pr<br/>━━━━━━━━━━<br/>skip_when_false: inputs.review_pr<br/>on_context_limit/failure: audit_claims<br/>on_result: → audit_claims"]
        S4_AUDIT["★ audit_claims<br/>━━━━━━━━━━<br/>skip_when_false: inputs.audit_claims<br/>on_exhausted/failure: route_review_resolve<br/>on_result: → route_review_resolve"]
        S4_GATE1["route_review_resolve<br/>━━━━━━━━━━<br/>reads: review_verdict<br/>deferred gate 1: review changes?"]
        S4_GATE2["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict<br/>deferred gate 2: claims changes?"]
        S4_RESOLVE_REV["resolve_research_review<br/>━━━━━━━━━━<br/>conditional: review_verdict=changes_requested<br/>on_success: → route_claims_resolve"]
    end

    S4_REV -->|"all paths"| S4_AUDIT
    S4_AUDIT -->|"all paths"| S4_GATE1
    S4_GATE1 -->|"changes_requested"| S4_RESOLVE_REV
    S4_GATE1 -->|"other verdicts"| S4_GATE2
    S4_RESOLVE_REV -->|"after resolve"| S4_GATE2

    %% ─────────────────────────────────────── %%
    subgraph S5 ["SCENARIO 5: Graceful Degradation (audit disabled or gh unavailable)"]
        direction LR
        S5_INPUT["● research.yaml ingredient<br/>━━━━━━━━━━<br/>audit_claims: false (default)<br/>reads: inputs.audit_claims"]
        S5_SKIP["★ audit_claims<br/>━━━━━━━━━━<br/>skip_when_false: inputs.audit_claims<br/>OR gh unavailable:<br/>verdict=approved, exit 0"]
        S5_ROUTE["route_claims_resolve<br/>━━━━━━━━━━<br/>reads: audit_verdict=approved/empty<br/>routes: → merge_escalations (bypass)"]
        S5_END["merge_escalations → re_push<br/>━━━━━━━━━━<br/>no claims_needs_rerun flag set<br/>proceeds directly to archival"]
    end

    S5_INPUT -->|"false → skip"| S5_SKIP
    S5_SKIP -->|"approved emitted"| S5_ROUTE
    S5_ROUTE -->|"no resolve step"| S5_END

    %% CLASS ASSIGNMENTS %%
    class S1_PR,S4_REV handler;
    class S1_AUDIT,S2_AUDIT,S5_SKIP newComponent;
    class S1_ROUTE_REV,S1_ROUTE_CL,S1_MERGE,S4_GATE1,S4_GATE2,S5_ROUTE phase;
    class S1_PUSH,S3_PUSH output;
    class S2_RESOLVE,S3_ESCALATE newComponent;
    class S2_ROUTE,S3_MERGE phase;
    class S2_INTENT handler;
    class S2_FIX,S3_RERUN,S3_REPORT handler;
    class S2_SUMMARY output;
    class S3_INTENT detector;
    class S4_AUDIT newComponent;
    class S4_RESOLVE_REV handler;
    class S5_INPUT stateNode;
    class S5_END output;
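
Scenario 5's fallback for a missing `gh` binary could look something like the sketch below. It assumes the skill checks for the CLI up front and emits an approving verdict token when it is absent; the function name `audit_claims_preflight` and the injectable `which` parameter are hypothetical and exist only to make the behavior easy to exercise.

```python
import shutil
from typing import Callable, Optional


def audit_claims_preflight(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a fallback verdict token when gh is unavailable, else None."""
    if which("gh") is None:
        # Degrade gracefully: approve so the recipe keeps routing
        # instead of failing the whole research pipeline.
        return "verdict = approved"
    return None  # gh is present; continue with the normal audit
```

A caller would print the returned token and exit 0 when it is not `None`, which matches the "verdict=approved, exit 0" note in the diagram.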

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START([RESEARCH RECIPE STATE])

    %% ─── INIT_PRESERVE: Recipe Context Fields ─── %%
    subgraph InitPreserve ["INIT_PRESERVE — Recipe Context (● modified)"]
        direction LR
        WT["● worktree_path<br/>━━━━━━━━━━<br/>Set at create_worktree<br/>Never re-derived"]
        PRURL["● pr_url<br/>━━━━━━━━━━<br/>Set at compose_research_pr<br/>Passed to audit-claims"]
        BB["● base_branch<br/>━━━━━━━━━━<br/>From ingredient<br/>Never changes"]
    end

    %% ─── DEFERRED GATE PATTERN ─── %%
    subgraph DeferredGates ["DEFERRED GATE ROUTING (● research.yaml) — both gates complete before routing"]
        direction TB
        RVW_GATE{"skip_when_false<br/>inputs.review_pr<br/>━━━━━━━━━━<br/>INIT_ONLY per recipe run"}
        RVW_RESULT["● review_verdict<br/>━━━━━━━━━━<br/>MUTABLE<br/>approved | changes_requested | needs_human"]
        AUD_GATE{"★ skip_when_false<br/>inputs.audit_claims<br/>━━━━━━━━━━<br/>INIT_ONLY per recipe run"}
        AUD_RESULT["★ audit_verdict<br/>━━━━━━━━━━<br/>MUTABLE<br/>approved | changes_requested | needs_human"]
        ROUTE_RVW["● route_review_resolve<br/>━━━━━━━━━━<br/>DERIVED routing<br/>reads: review_verdict"]
        ROUTE_CLM["★ route_claims_resolve<br/>━━━━━━━━━━<br/>DERIVED routing<br/>reads: audit_verdict"]
    end

    %% ─── AUDIT-CLAIMS STATE MACHINE ─── %%
    subgraph AuditClaimsBlock ["★ audit-claims Skill — Claim Extraction State"]
        direction TB
        AC_PH1["★ Phase 1: Extract Claims<br/>━━━━━━━━━━<br/>Parallel subagents by section<br/>INIT_ONLY: pr_url, diff_{pr}.txt"]
        AC_CLAIMS["★ claims_{pr}.json<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>Built from Phase 1 subagents<br/>{file, line, claim_text, claim_type}"]
        AC_PH2["★ Phase 2: Evidence Match<br/>━━━━━━━━━━<br/>Parallel subagents by claim_type<br/>experimental → always skip"]
        AC_FINDINGS["★ findings_{pr}.json<br/>━━━━━━━━━━<br/>APPEND_ONLY (then deduped)<br/>Dedup: keep highest severity<br/>per (file, line) pair"]
        AC_VERDICT{"★ Verdict Derivation<br/>━━━━━━━━━━<br/>DERIVED (never stored mutable)<br/>actionable? → changes_requested<br/>decision_only? → needs_human<br/>none? → approved"}
        AC_OUT["★ verdict = {value}<br/>━━━━━━━━━━<br/>Structured output token<br/>Captured as audit_verdict"]
    end

    %% ─── RESOLVE-CLAIMS-REVIEW STATE MACHINE ─── %%
    subgraph ResolveBlock ["★ resolve-claims-review Skill — Fix Application State"]
        direction TB
        RC_CFG["★ claims_review config<br/>━━━━━━━━━━<br/>INIT_ONLY for session<br/>validation_command (null=skip)<br/>validation_timeout (default: 120)"]
        RC_THREADS["★ Thread + Comment Fetch<br/>━━━━━━━━━━<br/>inline_comments_{pr}.json<br/>threads_{pr}.json<br/>dimension_groups_{pr}.json"]
        RC_INTENT["★ Intent Validation Gate<br/>━━━━━━━━━━<br/>BEFORE any code changes<br/>Parallel by dimension group<br/>Classifies: ACCEPT | REJECT | DISCUSS"]
        RC_CLASSMAP["★ classification_map_{pr}.json<br/>━━━━━━━━━━<br/>MUTABLE (built from validation)<br/>Keys: comment_id → verdict_entry<br/>fix_strategy per ACCEPT"]
        RC_ADDR["★ addressed_thread_ids<br/>━━━━━━━━━━<br/>APPEND_ONLY (in-memory)<br/>Grows per ACCEPT fix applied<br/>Consumed at Step 6: thread resolve"]
        RC_ESC["★ escalation_records<br/>━━━━━━━━━━<br/>APPEND_ONLY (in-memory)<br/>strategy: rerun_required | design_flaw<br/>Consumed at output determination"]
        RC_VGATE{"★ Validation Command Gate<br/>━━━━━━━━━━<br/>null → SKIP entirely<br/>else: max 3 retry iterations<br/>FAIL after 3 → exit non-zero"}
        RC_DERIVE{"★ needs_rerun Derivation<br/>━━━━━━━━━━<br/>DERIVED from escalation_records<br/>any strategy==rerun_required → true<br/>all design_flaw or none → false"}
        RC_OUT["★ needs_rerun = {true|false}<br/>━━━━━━━━━━<br/>Structured output token<br/>Captured as claims_needs_rerun"]
    end

    %% ─── MERGE ESCALATIONS ─── %%
    subgraph MergeBlock ["● merge_escalations — Combined Rerun Gate (● modified)"]
        direction TB
        MERGE_GATE{"● review_needs_rerun == true<br/>OR claims_needs_rerun == true<br/>━━━━━━━━━━<br/>DERIVED routing (reads 2 MUTABLE fields)"}
        RERUN["re_run_experiment"]
        PUSH["re_push_research"]
    end

    %% ─── RESUME DETECTION ─── %%
    subgraph ResumeBlock ["RESUME DETECTION STRATEGY"]
        direction LR
        R1["Tier 1: worktree_path present<br/>━━━━━━━━━━<br/>Resume from execution phase"]
        R2["Tier 2: pr_url present<br/>━━━━━━━━━━<br/>Resume from PR review phase"]
        R3["Tier 3: audit_verdict present<br/>━━━━━━━━━━<br/>Skip audit → route_claims_resolve"]
        R4["Tier 4: claims_needs_rerun present<br/>━━━━━━━━━━<br/>Skip resolution → merge_escalations"]
    end

    %% ─── CONNECTIONS ─── %%
    START --> InitPreserve

    WT --> RVW_GATE
    BB --> RVW_GATE
    PRURL --> AUD_GATE

    RVW_GATE -->|"skip/run"| RVW_RESULT
    RVW_RESULT --> ROUTE_RVW

    AUD_GATE -->|"skip (audit_claims=false)"| AUD_RESULT
    AUD_GATE -->|"run (audit_claims=true)"| AC_PH1
    AC_PH1 --> AC_CLAIMS --> AC_PH2 --> AC_FINDINGS --> AC_VERDICT --> AC_OUT
    AC_OUT --> AUD_RESULT
    AUD_RESULT --> ROUTE_CLM

    ROUTE_RVW -->|"changes_requested → resolve_research_review"| ROUTE_CLM
    ROUTE_RVW -->|"other → skip resolve_research_review"| ROUTE_CLM

    ROUTE_CLM -->|"changes_requested"| RC_CFG
    ROUTE_CLM -->|"approved | needs_human | skipped"| MERGE_GATE

    RC_CFG --> RC_THREADS --> RC_INTENT --> RC_CLASSMAP
    RC_CLASSMAP -->|"ACCEPT: add_citation, qualify_claim, remove_claim"| RC_ADDR
    RC_CLASSMAP -->|"ACCEPT: rerun_required, design_flaw → ESCALATE"| RC_ESC
    RC_ADDR --> RC_VGATE
    RC_ESC --> RC_VGATE
    RC_VGATE -->|"SKIP or PASS"| RC_DERIVE
    RC_VGATE -->|"FAIL (3 retries exhausted)"| FAIL_EXIT([exit non-zero])
    RC_DERIVE --> RC_OUT --> MERGE_GATE

    MERGE_GATE -->|"either true"| RERUN
    MERGE_GATE -->|"both false"| PUSH

    R1 --> R2 --> R3 --> R4

    %% ─── CLASS ASSIGNMENTS ─── %%
    class START terminal;
    class WT,PRURL,BB stateNode;
    class RVW_GATE,AUD_GATE detector;
    class RVW_RESULT,AUD_RESULT stateNode;
    class ROUTE_RVW phase;
    class ROUTE_CLM newComponent;
    class AC_PH1,AC_PH2 newComponent;
    class AC_CLAIMS,AC_FINDINGS output;
    class AC_VERDICT,AC_OUT newComponent;
    class RC_CFG stateNode;
    class RC_THREADS handler;
    class RC_INTENT newComponent;
    class RC_CLASSMAP stateNode;
    class RC_ADDR,RC_ESC handler;
    class RC_VGATE newComponent;
    class RC_DERIVE,RC_OUT newComponent;
    class MERGE_GATE phase;
    class RERUN,PUSH output;
    class R1,R2,R3,R4 cli;
    class FAIL_EXIT terminal;
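
Two of the gates in the lifecycle diagram above, the validation command gate and the `needs_rerun` derivation, can be sketched as below. This is an illustrative outline rather than the skill's actual procedure: the function names and the `on_failure` hook are hypothetical, the timeout is assumed to be in seconds, and a timed-out run is assumed to count as a failed iteration.

```python
import subprocess
from typing import Callable, Iterable, Mapping, Optional


def run_validation_gate(
    command: Optional[str],
    timeout: int = 120,
    max_iterations: int = 3,
    on_failure: Callable[[int], None] = lambda attempt: None,
) -> str:
    """Return "SKIPPED", "PASS", or "FAIL" for the validation command."""
    if command is None:
        return "SKIPPED"  # null validation_command: skip the gate entirely
    for attempt in range(1, max_iterations + 1):
        try:
            result = subprocess.run(command, shell=True, timeout=timeout)
            if result.returncode == 0:
                return "PASS"
        except subprocess.TimeoutExpired:
            pass  # assumption: a timeout is treated like a failing run
        if attempt < max_iterations:
            # Hook for "analyze failures, revert/adjust, retry".
            on_failure(attempt)
    return "FAIL"  # caller exits non-zero


def derive_needs_rerun(escalation_records: Iterable[Mapping[str, str]]) -> bool:
    """True if any escalation uses the rerun_required strategy."""
    return any(r.get("strategy") == "rerun_required" for r in escalation_records)
```

Records that carry only `design_flaw` leave `needs_rerun` false, so they surface for human attention without triggering another experiment run.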

Closes #657

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260407-144555-361346/.autoskillit/temp/make-plan/audit_claims_plan_2026-04-07_120100.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|---|---|---|---|---|---|---|
| plan | 39 | 53.4k | 1.2M | 103.9k | 1 | 22m 30s |
| verify | 24 | 19.1k | 1.1M | 78.3k | 1 | 5m 53s |
| implement | 83 | 32.5k | 7.3M | 114.9k | 1 | 21m 12s |
| fix | 36 | 14.6k | 1.4M | 61.5k | 1 | 6m 44s |
| prepare_pr | 10 | 5.8k | 179.7k | 35.2k | 1 | 1m 31s |
| run_arch_lenses | 2.5k | 45.6k | 1.1M | 183.4k | 3 | 12m 41s |
| compose_pr | 12 | 19.6k | 308.3k | 49.5k | 1 | 4m 31s |
| Total | 2.7k | 190.7k | 12.6M | 626.7k | | 1h 15m |

… gate routing

Adds two new Tier 3 research skills and restructures the research recipe PR+Review phase so both read-only audit gates (review_research_pr and audit_claims) complete before any resolution step begins. Replaces check_escalations with merge_escalations which consumes rerun flags from both resolution skills.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… skills

- Remove stale test_review_research_pr_routes_to_begin_archival and test_resolve_research_review_routes_to_begin_archival (superseded by deferred routing to audit_claims/route_claims_resolve)
- Add skill_contracts.yaml entries for review-research-pr, audit-claims, and resolve-claims-review to satisfy undeclared-capture-key rule
- Add pseudocode allowlist entries for audit-claims and resolve-claims-review bash placeholder tokens (mirrors review-research-pr/resolve-research-review patterns already in the allowlist)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek

AutoSkillit PR Review — Verdict: changes_requested

25 actionable findings (critical/warning) requiring fixes before merge. See inline comments.

Trecek · 2026-04-07T23:11:57Z

-        assert step.on_failure == "begin_archival"
-        assert step.on_context_limit == "begin_archival"
-
    def test_research_no_issue_number_ingredient(self, recipe) -> None:


[warning] tests: Deleted test test_research_review_step_routes_to_begin_archival_on_any_outcome asserted review_research_pr.on_failure and on_context_limit. The new routing targets ('audit_claims') are not directly tested on the review_research_pr step — only the downstream audit_claims step's own routes are tested. Add assertions for review_research_pr.on_failure == 'audit_claims' and review_research_pr.on_context_limit == 'audit_claims'.
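
A test along the lines this comment asks for might look like the sketch below. The real suite reads steps from a `recipe` fixture; here a hypothetical `Step` dataclass and a hand-built `recipe_steps` dict stand in for it so the example runs on its own, and the test name is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical stand-in for a parsed recipe step."""
    on_failure: str
    on_context_limit: str


# Hypothetical stand-in for the steps the real `recipe` fixture would expose.
recipe_steps = {
    "review_research_pr": Step(
        on_failure="audit_claims",
        on_context_limit="audit_claims",
    ),
}


def test_review_research_pr_routes_to_audit_claims(steps=recipe_steps) -> None:
    step = steps["review_research_pr"]
    # Both non-success exits must fall through to the second read-only gate.
    assert step.on_failure == "audit_claims"
    assert step.on_context_limit == "audit_claims"
```

In the actual test module the lookup would go through the fixture rather than a module-level dict.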

Trecek · 2026-04-07T23:11:57Z

-        assert matching, "No condition with changes_requested"
-        assert matching[0] == "resolve_research_review"
-
    def test_research_validates_cleanly(self, recipe) -> None:


[warning] tests: Deleted test test_review_research_pr_has_on_result_routing asserted changes_requested routes to resolve_research_review; deleted test test_review_research_pr_routes_to_begin_archival asserted the default on_result route. No new test verifies the on_result default route of review_research_pr now points to audit_claims.

Trecek · 2026-04-07T23:11:58Z

+    expected_output_patterns:
+    - "verdict\\s*=\\s*(approved|changes_requested|needs_human)"
+    pattern_examples:
+    - "verdict = approved\n%%ORDER_UP%%"


[warning] cohesion: audit-claims contract's pattern_examples omits the 'needs_human' example. The skill emits three verdict values (approved, changes_requested, needs_human) per allowed_values, but pattern_examples only covers two. The review-pr contract serves as the established template and includes all three examples.
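
One way to catch this kind of gap mechanically is to check which allowed values have no matching example. The sketch below reuses the regex from the contract; the helper name `uncovered_values` is hypothetical, and this is not a check the repository is known to run.

```python
import re

# Same pattern as expected_output_patterns in the contract diff above.
VERDICT_PATTERN = re.compile(
    r"verdict\s*=\s*(approved|changes_requested|needs_human)"
)
ALLOWED_VALUES = {"approved", "changes_requested", "needs_human"}


def uncovered_values(pattern_examples):
    """Return the allowed verdict values that no example demonstrates."""
    covered = set()
    for example in pattern_examples:
        match = VERDICT_PATTERN.search(example)
        if match:
            covered.add(match.group(1))
    return ALLOWED_VALUES - covered
```

An empty result means every allowed verdict has at least one example in `pattern_examples`.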

Trecek · 2026-04-07T23:11:58Z

+If `pr_url` is missing or positional args are insufficient, abort with:
+`"Usage: /autoskillit:audit-claims <worktree_path> <base_branch> <pr_url>"`
+
+### Step 0.5 — Code-Index Initialization (required before any code-index tool call)


[warning] slop: Step 0.5 (L77-L94) is generic code-index boilerplate repeated verbatim from CLAUDE.md and other skills. The path-format examples add no skill-specific value. Note: CLAUDE.md mandates this step — the fix is to trim the examples to skill-relevant paths only, not to remove the step.

Trecek · 2026-04-07T23:11:58Z

+   - `decision_findings` — requires_decision=true (any severity)
+   - `info_findings` — severity == "info" AND requires_decision=false
+
+### Step 4.5: Echo Primary Obligation


[critical] slop: Step 4.5 'Echo Primary Obligation' is an AI prompt-engineering directive instructing the executor to narrate its own obligation aloud. This is scaffolding metadata embedded in the skill spec. If this is intentional (mirrors review-pr SKILL.md), it should be acknowledged; otherwise remove.

Investigated: this is intentional (false_positive_intentional_pattern). Step 4.5 mirrors review-pr/SKILL.md:302-308, which has identical Step 4.5 and Step 6.5 sections. The pattern is deliberate prompt-engineering scaffolding, carried over from the source skill, that keeps the AI executor from skipping inline comment posting.

Trecek · 2026-04-07T23:11:58Z

+gh pr review {pr_number} --comment --body "{summary_markdown}"
+```
+
+### Step 6.5: Post-Completion Confirmation


[warning] slop: Step 6.5 'Post-Completion Confirmation' instructs the executor to recite a confirmation string aloud after posting comments. This is an AI meta-prompt artifact. If mirroring review-pr SKILL.md intentionally, document why; otherwise remove.

Trecek · 2026-04-07T23:11:58Z

+            exit(1)
+        # Analyze failures, revert/adjust problematic commit, retry
+```
+


[warning] slop: Prose paragraph after the validation code block (lines 283-284) fully restates what the code already shows; AI-generated filler that adds no information.

…t rev-parse, null-check PR_NUMBER, discard partial subagent JSON, clarify timeout semantics, document claims_review config, add Step 0.5 code-index init Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek and others added 2 commits April 7, 2026 15:33

Trecek and others added 4 commits April 7, 2026 16:21

fix(review): audit-claims defense hardening — abort on missing worktr…

fd769ee

…ee, scope gh repo view, mkdir before diff write, filter invalid line values in COMMENTS_JSON, track Tier 1 fallback success/failure counts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(review): research.yaml — document absent-key semantics for review…

d0f661c

…_verdict, audit_verdict, review_needs_rerun, claims_needs_rerun in routing step notes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(review): add routing condition tests for route_review_resolve/rou…

a92c695

…te_claims_resolve/merge_escalations; add needs_human pattern_examples to review-research-pr and audit-claims contracts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek added this pull request to the merge queue Apr 7, 2026

Merged via the queue into integration with commit a9150d6 Apr 7, 2026
2 checks passed

Trecek deleted the add-audit-claims-skill-and-resolve-claims-review-for-citatio/657 branch April 7, 2026 23:35

Trecek mentioned this pull request Apr 14, 2026

Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs) #925

Merged

Trecek merged 6 commits into integration from add-audit-claims-skill-and-resolve-claims-review-for-citatio/657
