
Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs)#925

Merged
Trecek merged 270 commits into main from integration
Apr 15, 2026

Conversation

Collaborator

Trecek commented Apr 14, 2026

Promotion: integration to main

This promotion merges 382 commits across 198 PRs from the integration branch into main, advancing AutoSkillit from v0.5.2 to v0.8.38 across three minor release cycles. The release delivers a production-grade research pipeline with containerized Micromamba execution, a 31-lens visualization and experiment family, and full output-mode routing. The quota guard was fundamentally redesigned with a dual-window model, while headless session orchestration gained single-skill mode, anomaly detection, idle timeouts, and verdict-gated CI recovery. Merge workflow reliability was hardened with three-way routing and a cheap rebase gate, and the recording/replay infrastructure was rewritten in Rust via PyO3.

Stats: 671 files changed, 99570 insertions(+), 12642 deletions(-) | 102 fixes, 107 features, 7 tests, 1 infra, 4 docs

Highlights

  • Research pipeline overhaul: Containerized Micromamba execution, YAML-driven experiment type registry, 12 vis-lens + 19 exp-lens skill families, plan-visualization, report bundling, and output_mode ingredient (breaking: default changed from implicit pr to local)
  • Quota guard redesign: Dual-window model (short 85% / long 98%) with per-window enable/disable toggles, three-layer resilience, and a new disable_quota_guard MCP tool
  • Verdict-gated CI recovery: resolve-failures emits typed verdicts (real_fix / already_green / flake_suspected / ci_only_failure); recipes route via on_result gates enforced by a new semantic rule
  • Headless session hardening: Single-skill mode, MAX_MCP_OUTPUT_TOKENS injection, D-state/high-CPU anomaly detection, idle_output_timeout per step, structured crash-path error returns
  • Skill system expansion: Skill pack registry, sub-skill dependency activation, prepare-pr/compose-pr decomposition, validate-audit skill, review-design/resolve-design-review skills, first-run guided onboarding

Release Notes

New Features

Research Pipeline

  • Containerized experiment execution via Micromamba; experiments run in isolated conda environments
  • YAML-driven experiment type registry (ExperimentTypeSpec) with 5 bundled types
  • 12 vis-lens visualization skills and 19 exp-lens experiment lens skills
  • plan-visualization step wired post-design-review; output_mode ingredient (local | pr)
  • Post-completion archival phase with artifact merge, re-validation, and escalation routing
  • bundle-local-report skill for offline report packaging
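
The YAML-driven registry can be pictured as one spec file per bundled experiment type. The field names below are illustrative assumptions, not the actual `ExperimentTypeSpec` schema; only the Micromamba-based containerized execution is stated in these notes:

```yaml
# Hypothetical experiment-type spec -- all key names are assumptions,
# not the real ExperimentTypeSpec schema.
name: ablation-study
description: Sweep a single parameter and compare against a baseline
environment:
  manager: micromamba          # containerized execution, per the release notes
  spec_file: environment.yml
outputs:
  - metrics.json
  - figures/
report:
  lenses: [exp-lens-variance, exp-lens-effect-size]   # hypothetical lens names
```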

Quota Guard

  • Dual-window quota model: short window (default 85%) + long window (default 98%) thresholds
  • Per-window enable/disable toggles; three-layer resilience (PreToolUse → PostToolUse → MCP)
  • New disable_quota_guard MCP tool for session-scoped opt-out
  • Background refresh loop (240s) + post-run_skill refresh keeps cache warm
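
As a rough sketch, the dual-window configuration could look like the fragment below. The key names are assumptions; the defaults (85% short window, 98% long window, 240 s refresh) come from the release notes:

```yaml
# Hypothetical quota_guard config -- key names are assumptions;
# the percentage defaults and refresh cadence come from the release notes.
quota_guard:
  short_window:
    enabled: true
    threshold_pct: 85
  long_window:
    enabled: true
    threshold_pct: 98
  refresh_interval_s: 240   # background cache refresh loop
```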

Headless Session Orchestration

  • Single-skill mode for scoped headless sessions
  • MAX_MCP_OUTPUT_TOKENS injected at builder level for all session types
  • D-state process and high-CPU anomaly detection with configurable thresholds
  • idle_output_timeout per-step recipe override; bounded staleness suppression (1800s max)
  • Structured error returns on crash path; contract recovery nudge for missing structured output

Merge Workflow

  • Three-way merge routing for autoMergeAllowed repos (queue / direct / immediate)
  • Cheap rebase gate before conflict-resolution skill invocation
  • Merge queue classifier immunity; queue ejection loop fix
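
A minimal sketch of the three-way routing decision, under the assumption that queue eligibility is validated via the `merge_group` workflow trigger (as PR #500 suggests); the real decision logic may weigh more signals:

```python
def route_merge(auto_merge_allowed: bool, has_merge_group_trigger: bool) -> str:
    """Hypothetical three-way merge routing (queue / direct / immediate).

    Assumed policy: repos with auto-merge and a validated merge-queue
    trigger go through the queue; auto-merge without a queue merges
    directly; everything else falls back to an immediate merge.
    """
    if auto_merge_allowed and has_merge_group_trigger:
        return "queue"
    if auto_merge_allowed:
        return "direct"
    return "immediate"
```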

Skill System

  • Skill pack registry with YAML-defined packs and configurable visibility
  • Sub-skill dependency activation; prepare-pr/compose-pr decomposition replacing open-pr
  • validate-audit, review-design, resolve-design-review, resolve-research-review skills
  • --resume flag for cook and order CLI commands
  • chart-course and check-bearing project-local strategic skills
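
A skill pack definition might look like the fragment below. Everything here is an assumption except the pack name `vis-lens` and the configurable-visibility feature, which appear in these notes:

```yaml
# Hypothetical skill pack definition -- key names and skill names
# are illustrative assumptions.
pack: vis-lens
visibility: opt-in            # configurable visibility, per the notes
skills:
  - vis-lens-overview
  - vis-lens-timeline
```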

Recording/Replay Infrastructure

  • RecordingSubprocessRunner and SequencingSubprocessRunner for api-simulator
  • McpRecordingMiddleware for MCP-level capture; api-simulator rewritten in Rust via PyO3

CLI & Onboarding

  • First-run detection with guided onboarding experience
  • Stale-install detection with auto-detect prompt
  • terminal_guard() alternate-screen buffer envelope; terminal freeze immunity
  • Strict schema validation for config.yaml; stable recipe listing order

Token Telemetry

  • Cross-contamination fix via order_id scoping
  • Token summary table split into 4 distinct API token fields
  • Token summary uses GitHub REST API instead of gh pr edit/view

MCP Server

  • Wire-format sanitization middleware (_wire_compat.py)
  • Startup race fix; editable install guard; lifespan readiness sentinel
  • Signal-guarded server bootstrap

Clone System

  • Clone cleanup registry with session-scoped ownership tagging
  • Clone contamination guard; clone_repo clones from remote URL
  • Deferred cleanup to end of pipeline in process-issues

Pretty Output Hook

  • Typed payload dispatch for PostToolUse hooks → Markdown-KV reformatter
  • Dedicated formatters for run_skill, run_cmd, test_check, merge_worktree, kitchen_status, clone_repo, load_recipe, open_kitchen, list_recipes
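
Typed payload dispatch of this kind is commonly built as a per-tool formatter registry. The sketch below is an assumption about the shape of the mechanism, not the hook's actual code; the tool names are the real ones listed above:

```python
from typing import Callable

# Hypothetical registry: one dedicated Markdown-KV formatter per tool name.
_FORMATTERS: dict[str, Callable[[dict], str]] = {}


def formatter(tool_name: str):
    """Decorator registering a formatter for one tool's PostToolUse payload."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _FORMATTERS[tool_name] = fn
        return fn
    return register


@formatter("run_skill")
def _format_run_skill(payload: dict) -> str:
    # Render selected fields as Markdown key/value lines.
    return f"**skill:** {payload.get('skill')}\n**status:** {payload.get('status')}"


def format_post_tool_use(tool_name: str, payload: dict) -> str:
    """Dispatch to the tool's dedicated formatter, falling back to raw repr."""
    fn = _FORMATTERS.get(tool_name)
    return fn(payload) if fn else repr(payload)
```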

Bug Fixes

102 rectification PRs addressing:

  • Doctor-install disconnect; plugin cache destruction + startup race regression
  • run_cmd env stripping regression; parallel pipeline deadlock (signal handling)
  • Pre-queue routing block; AskUserQuestion guard session-scope immunity
  • Context-limit dirty-tree immunity; stale MCP direct entry lifecycle
  • Headless core crash path (structured error instead of raise)
  • Merge queue watcher inconclusive budget; MCP init race + wire format rejection
  • Queue ejection loop; false stale kills during background Bash tasks
  • Token cross-contamination via order_id scoping
  • Channel B drain-race recovery; session adjudication false-positive
  • Terminal guard ownership contract; stale hook infinite loop
  • And 90+ additional stability and correctness fixes

Test Suite

  • 7 dedicated test improvement PRs (groupA–H): removed tautological tests, consolidated duplicates, fixed xdist isolation, corrected misleading names, eliminated over-mocking
  • 130+ new test files added; comprehensive contract and skill compliance testing

Infrastructure

  • patch-bump-integration.yml workflow for auto-incrementing patch version on PR merge
  • version-bump.yml updated: minor bump on promote (X.Y.Z → X.Y+1.0)
  • api-simulator dev dependency (Rust/PyO3) requires Rust toolchain + GH_PAT
  • Documentation overhauled: topic-based layout with 30+ new docs

Breaking Changes

Migration 0.7.77-to-0.8.0 — Research Recipe Overhaul

  1. write-report renamed to generate-report
  2. output_mode default changed from implicit pr to local
  3. commit_research_artifacts replaced by stage_bundle + finalize_bundle
  4. vis-lens must appear in requires_packs

Migration 0.8.9-to-0.9.0 — Verdict-Gated CI Recovery

  1. Auto-fix skills now declare typed verdict and fixes_applied outputs
  2. Recipes must use on_result: verdict dispatch instead of unconditional on_success: re_push
  3. New conditional-skill-ungated-push semantic rule (ERROR severity) enforces this
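
A migrated recipe step might route verdicts as in the fragment below. The step name and routing targets are assumptions for illustration; the four verdict values come directly from these notes:

```yaml
# Hypothetical recipe fragment -- step and target names are assumptions;
# the verdict values (real_fix, already_green, flake_suspected,
# ci_only_failure) come from the release notes.
- step: resolve_failures
  skill: resolve-failures
  on_result:
    real_fix: re_push
    already_green: pre_resolve_rebase
    flake_suspected: release_issue_failure
    ci_only_failure: release_issue_failure
```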

Other

  • run_skill exit_after_stop_delay_ms reduced from 120s to 2s
  • Quota guard config: single threshold replaced by dual-window model
  • Recipe schema additions: optional_context_refs, stale_threshold, idle_output_timeout, block, requires_packs
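
The new optional fields might appear in a recipe as sketched below; the field names are the real additions listed above, but the step, values, and placement (step-level vs recipe-level) are illustrative assumptions:

```yaml
# Hypothetical step showing the new optional schema fields;
# skill name and values are illustrative only.
- step: run_experiment
  skill: execute-experiment
  idle_output_timeout: 900       # seconds; per-step override
  stale_threshold: 600
  block: research                # new 'block' grouping field
  optional_context_refs:
    - design/plan.md

requires_packs:
  - vis-lens                     # required for research recipes post-0.8.0
```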

Merged PRs

PR Title Author Labels
#442 Rectify: Init Gitignore Completeness Immunity Trecek -
#446 Implementation Plan: Auto-Merge Direct Merge Fallback (Issue #401) Trecek -
#450 Rectify: Token Summary Note-Protocol Immunity Trecek -
#451 Implementation Plan: Orchestrator Must Claim All Issues Upfront Trecek -
#452 Init — Require Explicit Opt-in to Bypass Missing Secret Scanning Hook Trecek -
#453 Rectify: Retry Reason Routing Blindness — PART A ONLY Trecek -
#454 Implementation Plan: Strict Schema Validation for config.yaml Trecek -
#458 Implementation Plan: open-pr-main Token Usage Summary Trecek -
#459 Implementation Plan: Stable Recipe Listing Order Trecek -
#460 Implementation Plan: Orchestrator Must Detect Merge Queue Trecek -
#463 Add Parallel Step Scheduling Rule to Sous-Chef Prompt Trecek -
#464 Implementation Plan: First-Run Detection and Guided Onboarding Trecek -
#465 Rectify: Structured Output Markdown Fragility — PART A ONLY Trecek -
#467 Rectify: Token Telemetry Contamination — PART A ONLY Trecek -
#472 Implementation Plan: Three-Way Merge Routing for autoMergeAllowed Trecek -
#473 Rectify: Root .gitignore Write Path — PART A ONLY Trecek -
#474 Rectify: Secret Scanning Gate Ordering — PART A ONLY Trecek -
#476 Rectify: Config Schema Contamination — PART A ONLY Trecek -
#478 Rectify: Order Parameter Table Breaks — PART A ONLY Trecek -
#482 Relocate temp artifact paths to .autoskillit/temp/ Trecek -
#483 Rectify: Advisory Step Context-Limit Routing — PART B ONLY Trecek -
#484 Rectify: Structured Output Instruction Hardening — PART B ONLY Trecek -
#485 Rectify: Non-Blocking Dispatch Immunity — PART B ONLY Trecek -
#490 Rectify: format_ingredients_table GFM Width Cap Trecek -
#491 Rectify: Hardcoded origin in Skill Bash Blocks — PART A ONLY Trecek -
#492 Implementation Plan: open-pr Strips PART X ONLY Suffix Trecek -
#493 Defer Clone Cleanup Until All Parallel Pipelines Complete Trecek -
#495 Rectify: pretty_output Hook — Typed Payload Dispatch Trecek -
#500 Merge Queue Detection Should Validate merge_group Trigger Trecek -
#502 Rectify: Formatter Raw/Derived Field Duplication Trecek -
#505 Implementation Plan: Configurable Label Whitelist Trecek -
#508 Rectify: stdlib-only Contract for SKILL.md Python Blocks Trecek -
#510 Add --resume flag to cook and order CLI commands Trecek -
#511 Rectify: Terminal Ownership Contract Trecek -
#517 Replace Reverse-Sync Version Bumping with Minor-Bump-on-Promote Trecek -
#518 Detect Stale Installs — Doctor Check + Dev-Mode Install Trecek -
#519 Rectify: MergeQueueWatcher Terminal State via Negative Inference Trecek -
#520 Rectify: TestRunner Protocol Returns Lossy Bare Tuple Trecek -
#521 Implementation Plan: terminal_guard() Alternate Screen Buffer Trecek -
#528 Add validate-audit skill: parallel post-audit finding validation Trecek -
#530 Implementation Plan: Patch-Bump-Integration.yml Race Fix Trecek -
#531 Rectify: Session ID Resolution — Fragmented Sources Trecek -
#534 Rectify: Stale Hook Prompt Infinite Loop Trecek -
#535 Rectify: SkillResult.session_id Channel B Backfill Trecek -
#536 Rectify: terminal_guard() Exit-Only Ownership Contract Trecek -
#542 Rectify: MCP Tool Name Prefix Non-Determinism Trecek -
#543 Rectify: Headless Editable Install Poison Trecek -
#544 Rectify: push_to_remote Non-Fast-Forward Rejection Trecek -
#545 Rectify: Stale-Check Dismiss/Snooze State Split Trecek -
#546 Rectify: Pipeline Identity Layer Trecek -
#548 Fix cook --resume UnusedCliTokensError Trecek -
#549 Implementation Plan: Skill Pack Registry Trecek -
#551 Implementation Plan: token_summary_appender REST API Trecek -
#556 Fix Test Placement, Organization, Documentation (groupG) Trecek -
#557 Remove Tautological/Import-Only Tests — groupA Trecek -
#558 Consolidate and Remove Redundant Tests (groupE) Trecek -
#559 Strengthen Exception-Grade HIGH Findings (groupD) Trecek -
#560 Fix Misleading Test Names and Stale Logic (groupF) Trecek -
#561 Low-Severity Test Suite Fixes (groupH) Trecek -
#562 Remove Over-Mocked Tests and Fix State Mutation (groupC) Trecek -
#563 Fix xdist Isolation — Hardcoded /dev/shm and /tmp (groupB) Trecek -
#564 Resolve architectural audit findings (2026-03-28) Trecek -
#568 Rectify: Session Adjudication False-Positive Trecek -
#569 Rectify: Workspace Clean Worktree Discovery Trecek -
#570 Cohesion audit: server fixes, hook registry, docs Trecek -
#571 Reduce audit-arch false positives with pre-flight gates Trecek -
#573 Bug: prepare-issue Uses Summary Instead of Full Report Trecek -
#574 Token Telemetry Cross-Contamination — order_id Scoping Trecek -
#577 Split promote-to-main into Changelog + Review-Promotion Trecek -
#582 Bug: promote-to-main Incorrectly in Bundled Skills Trecek -
#583 Bundle Research Recipe from spectral-init Trecek -
#585 Recipe requires_packs Schema Extension Trecek -
#587 Bundle 19 Experimental Lens Skills Trecek -
#588 Add review-research-pr skill Trecek -
#594 Simplify Research Recipe — Single-Phase, Always-Decompose Trecek -
#595 Create resolve-research-review Skill Trecek -
#596 Create open-research-pr Skill Trecek -
#597 Evolve Experiment Plan Schema — YAML Frontmatter Trecek -
#598 Create review-design Skill Trecek -
#602 Rectify: review-design L1 Severity Calibration Trecek -
#606 Add resolve-design-review + Eliminate Terminal STOP Dead-End Trecek -
#611 process-issues — Defer Clone Cleanup to End of Pipeline Trecek -
#612 Move smoke-test recipe to project-local Trecek -
#613 Integrate api-simulator for Quota Guard E2E Testing Trecek -
#614 Add Red-Team Severity Calibration by Experiment Type Trecek -
#615 Split Token Summary Table Into 4 API Token Fields Trecek -
#616 Rectify: Zero-Write False Positive — Completion Token Contract Trecek -
#620 Research Recipe — Post-Review Re-Validation + Escalation Trecek -
#622 Channel B Drain-Race Recovery for Deferred type=result Trecek -
#623 Clone Contamination Guard Trecek -
#624 Test Session Failure Classification with api-simulator Trecek -
#625 Research Recipe — Post-Completion Archival Trecek -
#626 Fix git auth for private deps in version-bump workflows Trecek -
#628 Queue Ejection Loop Fix Trecek -
#630 Fix review-design Threshold + Scope Drift Trecek -
#633 Fix False Stale Kills During Background Bash Tasks Trecek -
#634 Default audit-impl to OFF in All Recipes Trecek -
#636 Research Recipe — Troubleshoot/Diagnose Skill Trecek -
#639 Rectify: Quota Guard Cache Refresh Lifecycle Trecek -
#640 Rectify: Sub-Skill Refusal Handling Trecek -
#642 Tier 2 Sub-Skill Dependency Activation Trecek -
#648 Archive Research Artifacts into .tar.gz Trecek -
#649 Rectify: Per-Invocation Completion Marker Isolation Trecek -
#650 Rectify: review-pr Posts Zero Inline Comments Trecek -
#651 Migrate to Rust-based api-simulator (PyO3 Rewrite) Trecek -
#656 Decompose open-research-pr into prepare + lens + compose Trecek -
#658 Rectify: research recipe archives wrong directory Trecek -
#659 Decompose open-pr into prepare-pr + lenses + compose-pr Trecek -
#660 Rewrite Smoke-Test as Lightweight E2E Sanity Check Trecek -
#661 Citation Integrity Gates for Research Pipeline Trecek -
#665 Rectify: CI Event Discrimination Trecek -
#668 Quota Guard Three-Layer Resilience Trecek -
#670 Rectify: Ghost Hook Registrations Survive Git Revert Trecek -
#671 Fix resolve-review misclassifies protocol deviations Trecek -
#675 Dynamic archive collection in research recipe Trecek -
#678 Fix VT100 reset terminal corruption Trecek -
#679 Move TOOL_CATEGORIES from L0 Core to L3 Server Trecek -
#682 Rectify: Stale-Hooks Check Infinite Prompt Loop Trecek -
#683 Rectify: Terminal Guard Reset Specification Trecek -
#684 Add RecordingSubprocessRunner Trecek -
#685 Add SequencingSubprocessRunner for Scenario Replay Trecek -
#687 Add Project-Scoped Full-Audit Recipe Trecek -
#691 Pin api-simulator Dependency Trecek -
#694 STOP Verdict Fail-Fast Gate — ADDRESSABLE Classification Trecek -
#695 Add Computational Complexity to Scope Skill Trecek -
#696 Add agent_implementability to review-design Trecek -
#699 Bump fastmcp and dynaconf Pins Trecek -
#701 Rectify: Smoke-Test Workspace Isolation Trecek -
#705 Rectify: merge_worktree Merges Into Wrong Branch Trecek -
#706 Rectify: Quota Guard Multi-Window Selection Trecek -
#708 Rectify: Quota Dataclass Type Boundary Enforcement Trecek -
#713 Documentation Overhaul — Topic-Based Layout Trecek -
#714 Configurable Temp Directory via Placeholder Substitution Trecek -
#715 Fix zero_writes False Positive for Research Skills Trecek -
#727 Wire McpRecordingMiddleware into MCP Server Trecek -
#728 Source-drift gate + open_kitchen envelope + quota cache versioning Trecek -
#731 Rectify: IDE Env Leak Across Subprocess Launch Sites Trecek -
#732 Single-Skill Mode for Headless Sessions Trecek -
#733 Per-window quota guard thresholds Trecek -
#734 Fix config-resolved ingredient defaults override Trecek -
#735 Rectify: Stdout Idle Watchdog + Bounded Suppression Trecek -
#736 Add Exception Whitelists to Project-Local Audit Skills Trecek -
#737 Validated Audit Report — arch Remediation Trecek -
#740 Rectify: prepare-issue Validated-Report Pipeline Trecek -
#746 Validated Audit Report — cohesion Trecek -
#748 Validated Audit Report — tests Trecek -
#749 Rectify: SIGTERM Bypasses atexit Trecek -
#750 Rectify: Channel B Blind Spot + AskUserQuestion Deviation Trecek -
#751 Rectify: Session-Bridge File Outside temp/ Trecek -
#752 Contract Recovery Nudge Trecek -
#753 Consolidate quota guard threshold configuration Trecek -
#754 Investigate Historical Recurrence Check Trecek -
#761 Clone Cleanup Registry — Session-Scoped Ownership Trecek -
#762 Fix route_queue_mode autoMergeAllowed=false Trecek -
#764 Rectify: Pre-Kitchen AskUserQuestion Gate Trecek -
#766 Rectify: Clone Registry Batch Delete Write-Back Trecek -
#767 Detect D-state and High-CPU Anomalies Trecek -
#768 CLI Update Prompts — Single Source of Truth Trecek -
#769 Per-window enable/disable for quota_guard Trecek -
#776 Rectify: Trace Identity Contract Trecek -
#781 vis-lens family, plan-visualization, output_mode Trecek -
#783 Rectify: CLI Startup Update Prompt Freeze Immunity Trecek -
#793 idle_output_timeout as Per-Step Recipe Override Trecek -
#794 Generalize Scope Skill for Non-Code Research Trecek -
#795 Experiment Type Registry — YAML-Driven, Extensible Trecek -
#796 Generalize generate-report for Non-Software Research Trecek -
#797 Ensure Research Experiments Include Test Infrastructure Trecek -
#798 Data Staging and Resource Planning Skill Trecek -
#799 Research Environment Isolation — Containerized Execution Trecek -
#803 Rectify: MCP Server Startup Race Trecek -
#808 Rectify: channel_won Unconditional SIGKILL Trecek -
#809 Rectify: PTY Wrapper Tracer PID Resolution Trecek -
#812 Rectify: clone_repo Silent Local-Transport Fallback Trecek -
#813 Rectify: Merge Queue Classifier Immunity Trecek -
#818 Rectify: Pre-Queue Routing Block Immunity Trecek -
#821 Rectify: Parallel Pipeline Deadlock — Signal Handling Trecek -
#860 Cheap Rebase Gate Before Conflict-Resolution Skill Trecek -
#861 resolve-failures: CI source of truth; kill bypass Trecek -
#862 resolve-failures — Test Polling Cascade Trecek -
#893 Generalize headless narration suppression Trecek -
#896 Rectify: Stale MCP Direct Entry Lifecycle Trecek -
#897 Rectify: Headless Core Crash Path Trecek -
#899 GitHub API resilience — token_factory fallback Trecek -
#900 Rectify: AskUserQuestion Guard Session-Scope Immunity Trecek -
#901 Rectify: Context-Limit Dirty-Tree Immunity Trecek -
#903 clone_repo: clone from remote URL Trecek -
#905 Auto-derive step_name when recording is active Trecek -
#906 run_cmd bypasses recorder during recording Trecek -
#908 Strip %%ORDER_UP%% — Prompt-Level Injection Trecek -
#909 Rectify: Doctor Plugin Cache + Startup Race Trecek -
#910 Inject MAX_MCP_OUTPUT_TOKENS into Order Sessions Trecek -
#917 skill_cmd_guard Suggests Relocating Extra Context Trecek -
#918 Rectify: Merge Queue Watcher Inconclusive Budget Trecek -
#920 Rectify: MCP Init Race + Wire Format Rejection Trecek -
#921 Fix run_cmd env stripping regression Trecek -
#922 Inject MAX_MCP_OUTPUT_TOKENS at builder level Trecek -
#923 Add MCP tool to disable quota guard Trecek -
#924 Rectify: Doctor-Install Disconnect Trecek -

Attention Required

  • api-simulator users: Rust toolchain + GH_PAT environment variable required to build
  • Custom research recipes: Apply migration 0.7.77-to-0.8.0 — without it, output defaults to local mode (no PR creation)
  • Pipeline recipes with CI recovery: Apply migration 0.8.9-to-0.9.0 — unrouted pushes now fail validation
  • Quota guard config: Old single-threshold format no longer valid; update to dual-window schema
  • run_skill callers: exit_after_stop_delay_ms reduced from 120s to 2s

Architecture Impact

Module Dependency (Structural — "How are modules coupled?")

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — APPLICATION"]
        direction LR
        SERVER["● server/<br/>━━━━━━━━━━<br/>FastMCP tools (18 files)<br/>Fan-in: 17"]
        CLI["● cli/<br/>━━━━━━━━━━<br/>CLI entry points (22 files)<br/>★ _onboarding, _update_checks<br/>★ _serve_guard, _terminal"]
    end

    subgraph L2 ["L2 — DOMAIN"]
        direction LR
        RECIPE["● recipe/<br/>━━━━━━━━━━<br/>Schema + validation (35 files)<br/>Fan-in: 40<br/>★ rules_blocks, rules_packs<br/>★ experiment_type_registry"]
        MIGRATION["● migration/<br/>━━━━━━━━━━<br/>Version migrations (5 files)<br/>★ 0.7.77→0.8.0<br/>★ 0.8.9→0.9.0"]
    end

    subgraph L1 ["L1 — SERVICES"]
        direction LR
        CONFIG["● config/<br/>━━━━━━━━━━<br/>Dynaconf settings (3 files)<br/>Fan-in: 20"]
        PIPELINE["● pipeline/<br/>━━━━━━━━━━<br/>DI + telemetry (9 files)<br/>Fan-in: 14<br/>★ background.py"]
        EXECUTION["● execution/<br/>━━━━━━━━━━<br/>Headless + process (21 files)<br/>Fan-in: 19<br/>★ recording, clone_guard<br/>★ _headless_scan"]
        WORKSPACE["● workspace/<br/>━━━━━━━━━━<br/>Clone + skills (7 files)<br/>Fan-in: 14<br/>★ clone_registry<br/>★ worktree"]
    end

    subgraph L0 ["L0 — FOUNDATION"]
        direction LR
        CORE["● core/<br/>━━━━━━━━━━<br/>Types + IO (15 files)<br/>Fan-in: 109<br/>★ _claude_env, readiness<br/>★ kitchen_state"]
    end

    subgraph STANDALONE ["STANDALONE — HOOKS"]
        direction LR
        HOOKS["● hooks/<br/>━━━━━━━━━━<br/>Pre/PostToolUse (19 files)<br/>★ pretty_output_hook<br/>★ quota_post_hook<br/>★ token_summary_hook"]
        HOOKREG["● hook_registry.py"]
    end

    subgraph EXT ["EXTERNAL"]
        direction LR
        FASTMCP["fastmcp"]
        HTTPX["httpx"]
        ANYIO["anyio"]
    end

    SERVER -->|"recipe, migration"| RECIPE
    SERVER -->|"migration"| MIGRATION
    CLI -->|"recipe, migration"| RECIPE
    SERVER -->|"pipeline, execution,<br/>workspace, config"| PIPELINE
    CLI -->|"config, execution,<br/>workspace"| CONFIG
    SERVER -->|"core (12 files)"| CORE
    CLI -->|"core (10 files)"| CORE
    RECIPE -->|"core (20 files)"| CORE
    MIGRATION -->|"core (3 files)"| CORE
    RECIPE -.->|"workspace (deferred)"| WORKSPACE
    CONFIG -->|"core"| CORE
    PIPELINE -->|"core"| CORE
    EXECUTION -->|"core"| CORE
    WORKSPACE -->|"core"| CORE
    EXECUTION -.->|"config (deferred)"| CONFIG
    HOOKREG -->|"core"| CORE
    HOOKS -->|"hook_registry"| HOOKREG
    SERVER -.->|"⚠ cli (hard L3→L3)"| CLI
    SERVER --> FASTMCP
    EXECUTION --> HTTPX
    CLI --> ANYIO

    class SERVER,CLI cli;
    class RECIPE,MIGRATION phase;
    class CONFIG,PIPELINE,EXECUTION,WORKSPACE handler;
    class CORE stateNode;
    class HOOKS,HOOKREG newComponent;
    class FASTMCP,HTTPX,ANYIO integration;
```

Legend: Dark Blue = L3 App | Purple = L2 Domain | Orange = L1 Services | Teal = L0 Foundation | Green = Hooks | Red = External | Dashed = Deferred/violation

Process Flow (Physiological — "How does it behave?")

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([START])
    COMPLETE([RECIPE COMPLETE])
    ESCALATE([ESCALATE TO USER])

    subgraph KITCHEN ["★ Kitchen Lifecycle"]
        OPEN["● open_kitchen<br/>━━━━━━━━━━<br/>Prime quota cache<br/>Start refresh loop"]
        LOAD["● load_recipe<br/>━━━━━━━━━━<br/>YAML → Recipe"]
    end

    subgraph SOUSCHEF ["● Sous-Chef Loop"]
        STEP{"● Step Eval<br/>━━━━━━━━━━<br/>skip? retries?"}
        QUOTA{"★ Quota Gate<br/>━━━━━━━━━━<br/>Dual-window"}
        DISPATCH["● run_skill"]
    end

    subgraph HEADLESS ["● Headless Session"]
        SPAWN["● run_managed_async<br/>━━━━━━━━━━<br/>anyio task group"]
        RACE{"● Channel Race<br/>━━━━━━━━━━<br/>A: stdout | B: JSONL<br/>★ idle_timeout"}
        CLASSIFY["● Result Classification<br/>━━━━━━━━━━<br/>★ Recovery pipeline<br/>★ Zero-write gate"]
    end

    subgraph VERDICT ["★ Verdict Routing"]
        ROUTE{"● on_result<br/>━━━━━━━━━━<br/>Typed dispatch"}
        REAL_FIX["re_push"]
        GREEN["★ pre_resolve_rebase"]
        HUMAN["release_issue_failure"]
        CI["● wait_for_ci"]
    end

    subgraph MERGE ["● Merge Workflow"]
        MERGE_EVAL{"● route_queue_mode<br/>━━━━━━━━━━<br/>★ 3-way routing"}
        QUEUE["queue path"]
        DIRECT["direct merge"]
        IMMEDIATE["★ immediate"]
    end

    START --> OPEN --> LOAD --> STEP
    STEP -->|"skip=false"| QUOTA
    QUOTA -->|"allowed"| DISPATCH
    QUOTA -->|"★ blocked"| QUOTA
    DISPATCH --> SPAWN --> RACE --> CLASSIFY
    CLASSIFY -->|"success"| ROUTE
    CLASSIFY -->|"needs_retry"| STEP
    CLASSIFY -->|"budget exhausted"| ESCALATE
    ROUTE -->|"on_success"| MERGE_EVAL
    ROUTE -->|"★ real_fix"| REAL_FIX
    ROUTE -->|"★ already_green"| GREEN
    ROUTE -->|"★ flake/ci_only"| HUMAN
    REAL_FIX --> CI
    GREEN --> CI
    CI -->|"green"| MERGE_EVAL
    CI -->|"failure"| ROUTE
    HUMAN --> ESCALATE
    MERGE_EVAL -->|"queue+trigger"| QUEUE
    MERGE_EVAL -->|"auto OK"| DIRECT
    MERGE_EVAL -->|"★ neither"| IMMEDIATE
    QUEUE --> COMPLETE
    DIRECT --> COMPLETE
    IMMEDIATE --> COMPLETE

    class START,COMPLETE,ESCALATE terminal;
    class OPEN,LOAD,DISPATCH,SPAWN,CLASSIFY handler;
    class STEP,QUOTA,RACE,MERGE_EVAL stateNode;
    class ROUTE,CI phase;
    class REAL_FIX,GREEN,HUMAN,QUEUE,DIRECT,IMMEDIATE newComponent;
```

Legend: Dark Blue = Terminal | Teal = Decisions | Orange = Processing | Purple = Verdict routing | Green = New components

C4 Container (Anatomical — "How is it built?")

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    USER(["Developer<br/>━━━━━━━━━━<br/>Claude Code user"])

    subgraph APP ["APPLICATION"]
        direction LR
        CLI_APP["● CLI<br/>━━━━━━━━━━<br/>cyclopts + anyio<br/>★ onboarding, update"]
        MCP["● MCP Server<br/>━━━━━━━━━━<br/>FastMCP v3 (stdio)<br/>★ wire_compat, lifespan"]
        CHEF["● Sous-Chef<br/>━━━━━━━━━━<br/>Tier 1 Claude<br/>★ wavefront, verdicts"]
    end

    subgraph HOOKS_L ["★ HOOKS"]
        direction LR
        PRE["★ PreToolUse<br/>━━━━━━━━━━<br/>quota_guard<br/>ask_user_guard"]
        POST["★ PostToolUse<br/>━━━━━━━━━━<br/>pretty_output<br/>quota_post, token_summary"]
    end

    subgraph DOMAIN ["DOMAIN"]
        direction LR
        RECIPE["● Recipe Engine<br/>━━━━━━━━━━<br/>igraph + YAML<br/>★ 7 new rule modules<br/>★ experiment types"]
        MIGR["● Migration<br/>━━━━━━━━━━<br/>★ 0.7.77→0.8.0<br/>★ 0.8.9→0.9.0"]
    end

    subgraph SERVICE ["SERVICES"]
        direction LR
        EXEC["● Execution<br/>━━━━━━━━━━<br/>anyio + psutil<br/>★ recording, anomaly<br/>★ idle timeout"]
        WS["● Workspace<br/>━━━━━━━━━━<br/>★ clone_registry<br/>★ worktree"]
        PIPE["● Pipeline DI<br/>━━━━━━━━━━<br/>★ background"]
        CONF["● Config<br/>━━━━━━━━━━<br/>dynaconf<br/>★ dual-window quota"]
    end

    subgraph FOUND ["FOUNDATION"]
        CORE["● Core<br/>━━━━━━━━━━<br/>structlog + PyYAML<br/>★ _claude_env, readiness"]
    end

    subgraph STORE ["STORAGE"]
        direction LR
        RECIPES[("● Recipes<br/>━━━━━━━━━━<br/>★ research.yaml<br/>★ experiment-types/")]
        SKILLS[("● Skills<br/>━━━━━━━━━━<br/>120+ SKILL.md<br/>★ 60+ new")]
        LOGS[("Session Logs")]
        CACHE[("★ Quota Cache")]
    end

    subgraph EXT ["EXTERNAL"]
        direction LR
        CLAUDE["Claude CLI<br/>━━━━━━━━━━<br/>subprocess + PTY"]
        GH["GitHub API<br/>━━━━━━━━━━<br/>REST + GraphQL"]
        ANTH["Anthropic API<br/>━━━━━━━━━━<br/>Token quota"]
    end

    USER -->|"CLI / MCP stdio"| CLI_APP
    CLI_APP -->|"starts"| MCP
    MCP -->|"injects"| CHEF
    CHEF -->|"MCP tools"| MCP
    CHEF -.->|"intercept"| PRE
    MCP -.->|"intercept"| POST
    MCP -->|"loads"| RECIPE
    MCP -->|"migrates"| MIGR
    MCP -->|"spawns"| EXEC
    MCP -->|"isolates"| WS
    MCP -->|"injects"| PIPE
    MCP -->|"reads"| CONF
    RECIPE --> CORE
    EXEC --> CORE
    WS --> CORE
    CONF --> CORE
    PIPE --> CORE
    EXEC -->|"reads"| RECIPES
    WS -->|"reads"| SKILLS
    EXEC -->|"writes"| LOGS
    EXEC -->|"reads/writes"| CACHE
    PRE -->|"reads"| CACHE
    EXEC -->|"subprocess"| CLAUDE
    EXEC -->|"CI/merge queue"| GH
    EXEC -->|"quota"| ANTH
    WS -->|"git"| GH

    class USER,CLI_APP,MCP,CHEF cli;
    class RECIPE,MIGR phase;
    class EXEC,WS,PIPE,CONF handler;
    class CORE stateNode;
    class PRE,POST newComponent;
    class RECIPES,SKILLS,LOGS,CACHE output;
    class CLAUDE,GH,ANTH integration;
```

Legend: Dark Blue = Application | Purple = Domain | Orange = Services | Teal = Foundation | Green = New hooks | Dark Teal = Storage | Red = External

Closes #401
Closes #427
Closes #429
Closes #439
Closes #440
Closes #441
Closes #444
Closes #445
Closes #447
Closes #448
Closes #449
Closes #456
Closes #457
Closes #461
Closes #462
Closes #466
Closes #468
Closes #469
Closes #470
Closes #471
Closes #475
Closes #477
Closes #480
Closes #481
Closes #486
Closes #487
Closes #488
Closes #494
Closes #498
Closes #499
Closes #503
Closes #504
Closes #506
Closes #507
Closes #509
Closes #512
Closes #513
Closes #514
Closes #516
Closes #522
Closes #524
Closes #525
Closes #526
Closes #527
Closes #529
Closes #532
Closes #533
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #547
Closes #550
Closes #553
Closes #554
Closes #555
Closes #565
Closes #566
Closes #567
Closes #572
Closes #576
Closes #579
Closes #589
Closes #590
Closes #591
Closes #592
Closes #593
Closes #599
Closes #600
Closes #601
Closes #603
Closes #604
Closes #605
Closes #607
Closes #608
Closes #609
Closes #610
Closes #617
Closes #618
Closes #619
Closes #621
Closes #627
Closes #629
Closes #631
Closes #632
Closes #635
Closes #637
Closes #638
Closes #641
Closes #643
Closes #644
Closes #646
Closes #647
Closes #652
Closes #653
Closes #655
Closes #657
Closes #662
Closes #663
Closes #664
Closes #666
Closes #669
Closes #672
Closes #673
Closes #676
Closes #680
Closes #681
Closes #686
Closes #688
Closes #690
Closes #692
Closes #693
Closes #697
Closes #698
Closes #700
Closes #703
Closes #704
Closes #707
Closes #710
Closes #711
Closes #712
Closes #716
Closes #717
Closes #718
Closes #719
Closes #721
Closes #723
Closes #724
Closes #725
Closes #729
Closes #739
Closes #741
Closes #742
Closes #744
Closes #745
Closes #747
Closes #755
Closes #756
Closes #757
Closes #758
Closes #759
Closes #760
Closes #763
Closes #771
Closes #774
Closes #775
Closes #777
Closes #778
Closes #784
Closes #785
Closes #786
Closes #787
Closes #788
Closes #789
Closes #790
Closes #801
Closes #802
Closes #804
Closes #805
Closes #806
Closes #807
Closes #811
Closes #814
Closes #815
Closes #816
Closes #817
Closes #819
Closes #859
Closes #892
Closes #894
Closes #895
Closes #902
Closes #904
Closes #907
Closes #911
Closes #912
Closes #913
Closes #914
Closes #915
Closes #916
Closes #919

Generated with Claude Code via AutoSkillit

Trecek and others added 30 commits April 4, 2026 21:25
…rminal STOP Dead-End (#606)

## Summary

The research recipe's `review_design` step currently hard-routes
`verdict=STOP` directly to `design_rejected` (pipeline halt), bypassing
any analysis of whether the stop triggers are actually fixable. This
causes unnecessary pipeline deaths when stop triggers are mechanical
methodological flaws with concrete fixes (as shown in
TalonT-Org/spectral-init#222).

This plan adds:
1. A new `resolve-design-review` skill that triages each stop-trigger
finding as `ADDRESSABLE`, `STRUCTURAL`, or `DISCUSS` using parallel
feasibility-validation subagents, then emits either `resolution=revised`
(loop back for revision) or `resolution=failed` (genuinely terminal)
2. A new `resolve_design_review` recipe step in `research.yaml` that
routes `STOP → resolve_design_review` instead of directly to
`design_rejected`
3. A skill contract entry for `resolve-design-review` in
`skill_contracts.yaml`
4. Updated tests: fix the existing STOP-routing assertion and add new
tests for the step and skill
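
The routing decision the new skill emits can be sketched as follows. This is an illustrative model only, with assumed names (`Finding`, `resolve_design_review`); the actual skill classifies findings via parallel feasibility-validation subagents rather than a pre-labeled list.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One stop-trigger finding, already triaged (hypothetical shape)."""
    summary: str
    category: str  # "ADDRESSABLE" | "STRUCTURAL" | "DISCUSS"

def resolve_design_review(findings: list[Finding]) -> str:
    """Emit 'revised' if any finding is fixable or debatable; emit
    'failed' only when every stop trigger is structural."""
    if any(f.category in ("ADDRESSABLE", "DISCUSS") for f in findings):
        return "revised"  # loop back through revise_design
    return "failed"       # genuinely terminal -> design_rejected

findings = [
    Finding("baseline lacks variance estimate", "ADDRESSABLE"),
    Finding("research question unfalsifiable as posed", "STRUCTURAL"),
]
print(resolve_design_review(findings))  # revised
```

With this gate, a pipeline only reaches `design_rejected` when all stop triggers are classified STRUCTURAL.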

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    REJECTED([design_rejected<br/>action: stop])
    EXEC([create_worktree<br/>→ Execution Phase])

    subgraph DesignPhase ["Research Recipe — Design Phase"]
        direction TB
        scope["scope<br/>━━━━━━━━━━<br/>Scope research question"]
        plan["plan_experiment<br/>━━━━━━━━━━<br/>Plan experiment<br/>(receives revision_guidance)"]
        review["● review_design<br/>━━━━━━━━━━<br/>Validate plan<br/>retries: 2"]
        revise["revise_design<br/>━━━━━━━━━━<br/>Route → plan_experiment"]
        rdr["★ resolve_design_review<br/>━━━━━━━━━━<br/>Triage STOP findings<br/>retries: 1"]
        triage{"★ Triage<br/>━━━━━━━━━━<br/>Any ADDRESSABLE<br/>or DISCUSS?"}
    end

    %% FLOW %%
    START --> scope
    scope --> plan
    plan --> review
    review -->|"verdict=GO"| EXEC
    review -->|"verdict=REVISE"| revise
    revise --> plan
    review -->|"● verdict=STOP<br/>(was: design_rejected)"| rdr
    rdr --> triage
    triage -->|"resolution=revised<br/>any ADDRESSABLE/DISCUSS"| revise
    triage -->|"resolution=failed<br/>all STRUCTURAL"| REJECTED

    %% CLASS ASSIGNMENTS %%
    class START,REJECTED,EXEC terminal;
    class scope,plan handler;
    class review,revise stateNode;
    class rdr,triage newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | START, design_rejected halt, create_worktree handoff |
| Orange | Handler | Existing processing steps (scope, plan_experiment) |
| Teal | State | Existing routing/decision nodes (review_design, revise_design) |
| Green | New Component | ★ New resolve_design_review step + triage logic |

Closes #605

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-132147-193877/.autoskillit/temp/make-plan/resolve_design_review_plan_2026-04-04_132804.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 39 | 23.7k | 1.5M | 1 | 8m 6s |
| verify | 23 | 12.0k | 937.2k | 1 | 4m 21s |
| implement | 56 | 16.1k | 2.7M | 1 | 7m 30s |
| fix | 25 | 9.1k | 879.4k | 1 | 5m 58s |
| audit_impl | 17 | 14.8k | 356.6k | 1 | 5m 57s |
| open_pr | 24 | 12.9k | 799.4k | 1 | 4m 41s |
| **Total** | 184 | 88.6k | 7.2M | | 36m 35s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ipeline (#611)

## Summary

Every recipe (`implementation`, `remediation`, `implementation-groups`,
`merge-prs`) previously had an interactive `confirm_cleanup` prompt at
its terminal step. When `process-issues` drives batch processing, this
halted the pipeline waiting for user input. A `defer_cleanup` flag was
designed to bypass it, but made "interrupt the pipeline" the default and
"don't interrupt" the opt-in.

The fix: remove the interactive cleanup path entirely from all recipes.
Every terminal step unconditionally calls `register_clone_status`
(success or failure), writing to a shared registry file. After all
issues in `process-issues` complete, a single `batch_cleanup_clones`
call deletes all success-status clones and preserves all error-status
clones. No prompts. No flags. No per-issue decisions.
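
The register-then-sweep flow above can be approximated with a short sketch. This is schematic, not the shipped implementation: the registry path matches the diagram below, but function signatures and the JSON shape are assumptions.

```python
import json
import shutil
from pathlib import Path

# Registry location from this PR; entry shape is an assumption.
REGISTRY = Path(".autoskillit/temp/clone-cleanup-registry.json")

def register_clone_status(clone_path: str, status: str) -> None:
    """Append-only registration at each recipe terminal step."""
    data = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {"entries": []}
    data["entries"].append({"clone_path": clone_path, "status": status})
    REGISTRY.parent.mkdir(parents=True, exist_ok=True)
    REGISTRY.write_text(json.dumps(data))

def batch_cleanup_clones() -> tuple[list[str], list[str]]:
    """After all issues complete: delete success clones, preserve
    error clones for investigation. One call, no prompts."""
    entries = json.loads(REGISTRY.read_text())["entries"]
    deleted = [e["clone_path"] for e in entries if e["status"] == "success"]
    preserved = [e["clone_path"] for e in entries if e["status"] == "error"]
    for path in deleted:
        shutil.rmtree(path, ignore_errors=True)
    return deleted, preserved
```

The key property is that the write side is unconditional at every terminal step, so the read side needs no per-issue flags.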

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([● process-issues starts batch])

    subgraph PerIssue ["Per-Issue Recipe (× N issues)"]
        direction TB
        RECIPE["● Recipe Pipeline<br/>━━━━━━━━━━<br/>implementation / remediation<br/>implementation-groups / merge-prs<br/>plan → implement → test → push → PR → wait"]
        OUTCOME{"terminal<br/>outcome?"}
        REL_S["● release_issue_success<br/>━━━━━━━━━━<br/>release GitHub issue claim<br/>on_success/on_failure → register"]
        REL_F["● release_issue_failure<br/>━━━━━━━━━━<br/>release on error<br/>on_success/on_failure → register_failure"]
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='success'<br/>on_success/on_failure → done"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='error'<br/>on_success/on_failure → escalate_stop"]
        DONE["● done<br/>━━━━━━━━━━<br/>action: stop (success)"]
        FAIL["● escalate_stop<br/>━━━━━━━━━━<br/>action: stop (failure)"]
    end

    REGISTRY[("● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>.autoskillit/temp/<br/>accumulated entries")]

    subgraph PostBatch ["● After ALL Batches Complete (process-issues Step 3d)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads registry<br/>deletes status=success clones<br/>preserves status=error clones<br/>no prompt, one call"]
        PRESERVED["preserved clones<br/>━━━━━━━━━━<br/>status=error kept<br/>for investigation"]
        DELETED["deleted clones<br/>━━━━━━━━━━<br/>status=success removed<br/>disk reclaimed"]
    end

    END_OK([COMPLETE])

    START --> RECIPE
    RECIPE --> OUTCOME
    OUTCOME -->|"success path"| REL_S
    OUTCOME -->|"failure path"| REL_F
    REL_S --> REG_S
    REL_F --> REG_F
    REG_S -->|"writes status=success"| REGISTRY
    REG_F -->|"writes status=error"| REGISTRY
    REG_S --> DONE
    REG_F --> FAIL
    DONE -->|"after all issues done"| BATCH
    FAIL -->|"after all issues done"| BATCH
    BATCH -->|"reads registry"| REGISTRY
    BATCH --> PRESERVED
    BATCH --> DELETED
    DELETED --> END_OK
    PRESERVED --> END_OK

    class START,END_OK terminal;
    class RECIPE handler;
    class OUTCOME stateNode;
    class REL_S,REL_F phase;
    class REG_S,REG_F,BATCH newComponent;
    class DONE phase;
    class FAIL detector;
    class REGISTRY stateNode;
    class PRESERVED,DELETED output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and end states |
| Orange | Handler | Recipe pipeline execution |
| Teal | State | Decision routing and registry storage |
| Purple | Phase | Control flow nodes (release, done) |
| Green | New/Modified | ● Modified steps (register, batch cleanup) |
| Red | Detector | Failure terminal (escalate_stop) |
| Dark Teal | Output | Clone disposition artifacts |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START([Pipeline Terminal Step])

    subgraph WritePath ["● WRITE: Recipe Terminal Registration (once per clone)"]
        direction LR
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='success'<br/>clone_path (immutable)"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='error'<br/>clone_path (immutable)"]
    end

    subgraph Registry ["● Registry File — APPEND_ONLY during run"]
        direction TB
        ENTRY["● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>entries: [{clone_path, status,<br/>step_name, timestamp}]<br/>written N times (once per clone)<br/>never mutated after write"]
    end

    subgraph ReadPath ["● READ: Batch Cleanup (once, post-run)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads all entries<br/>partitions by status"]
        GATE{"status?"}
        DEL["delete clone dir<br/>━━━━━━━━━━<br/>status=success<br/>disk reclaimed"]
        KEEP["preserve clone dir<br/>━━━━━━━━━━<br/>status=error<br/>for investigation"]
    end

    subgraph Contracts ["Contract Cards (recipe input contracts)"]
        direction LR
        C1["★ contracts/implementation-groups.yaml<br/>━━━━━━━━━━<br/>NEW — no defer_cleanup<br/>no registry_path"]
        C2["● contracts/implementation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C3["● contracts/remediation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C4["● contracts/merge-prs.yaml<br/>━━━━━━━━━━<br/>updated — removed defer_cleanup<br/>registry_path, keep_clone_on_failure"]
    end

    ELIMINATED["ELIMINATED state<br/>━━━━━━━━━━<br/>defer_cleanup ingredient<br/>registry_path ingredient<br/>keep_clone_on_failure ingredient<br/>check_defer_cleanup step<br/>confirm_cleanup step"]

    END_OK([COMPLETE])

    START -->|"success terminal"| REG_S
    START -->|"failure terminal"| REG_F
    REG_S -->|"appends entry"| ENTRY
    REG_F -->|"appends entry"| ENTRY
    ENTRY -->|"read once post-run"| BATCH
    BATCH --> GATE
    GATE -->|"status=success"| DEL
    GATE -->|"status=error"| KEEP
    DEL --> END_OK
    KEEP --> END_OK

    C1 -.->|"contract enforces"| REG_S
    C2 -.->|"contract enforces"| REG_S
    C3 -.->|"contract enforces"| REG_S
    C4 -.->|"contract enforces"| REG_S

    ELIMINATED -.->|"no longer written"| ENTRY

    class START,END_OK terminal;
    class REG_S,REG_F,BATCH newComponent;
    class ENTRY stateNode;
    class GATE stateNode;
    class DEL,KEEP output;
    class C1 phase;
    class C2,C3,C4 phase;
    class ELIMINATED detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start and end |
| Green | ● Modified / New | register steps and batch cleanup (this PR) |
| Teal | State | Registry file and status decision |
| Purple | Phase | Contract card files |
| Dark Teal | Output | Clone disposition outcomes |
| Red | Eliminated | State that no longer exists |

Closes #610

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185031-682892/.autoskillit/temp/make-plan/process_issues_defer_clone_cleanup_plan_2026-04-04_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 36 | 16.9k | 1.4M | 1 | 6m 15s |
| **Total** | 10.1k | 383.2k | 42.0M | | 2h 51m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ocal (#612)

## Summary

Move `smoke-test.yaml` and its companion artifacts (contract card, flow
diagram) from the bundled `src/autoskillit/recipes/` directory to the
project-local `.autoskillit/recipes/` directory. This makes smoke-test
invisible to end-user projects while remaining fully functional when
running from the AutoSkillit repository root. The existing project-local
recipe discovery mechanism already supports this — no production code
changes are needed. All changes are file relocations and test updates.

## Requirements

### MOVE — Recipe File Relocation

- **REQ-MOVE-001:** The file `src/autoskillit/recipes/smoke-test.yaml`
must be relocated to `.autoskillit/recipes/smoke-test.yaml` at the
project root.
- **REQ-MOVE-002:** Associated contract card(s) in
`src/autoskillit/recipes/contracts/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/contracts/`.
- **REQ-MOVE-003:** Associated diagram(s) in
`src/autoskillit/recipes/diagrams/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/diagrams/`.

### LIST — Listing Behavior

- **REQ-LIST-001:** The smoke-test recipe must not appear in
`list_recipes` output when the current working directory is outside the
AutoSkillit repository.
- **REQ-LIST-002:** The smoke-test recipe must appear in `list_recipes`
output with source `PROJECT` when the current working directory is the
AutoSkillit repository root.
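
Both LIST requirements fall out of the existing discovery order, which this minimal sketch imitates (assumed function name and return shape; the real `recipe.io.list_recipes()` returns richer objects):

```python
from pathlib import Path

def discover_recipes(project_dir: Path, bundled_dir: Path) -> dict[str, str]:
    """Project-local recipes (priority 1) shadow bundled ones
    (priority 2) via first-write-wins on the recipe name."""
    recipes: dict[str, str] = {}
    for source, directory in (("PROJECT", project_dir), ("BUILTIN", bundled_dir)):
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.yaml")):
            recipes.setdefault(path.stem, source)  # earlier source wins
    return recipes
```

From the AutoSkillit repo root, `.autoskillit/recipes/smoke-test.yaml` is found with source `PROJECT`; from any other project, no project-local copy exists and smoke-test simply never enters the map.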

### LOAD — Pipeline Compatibility

- **REQ-LOAD-001:** `load_recipe("smoke-test")` must succeed when
invoked from the AutoSkillit repository root.
- **REQ-LOAD-002:** Existing smoke-test pipeline execution must remain
functionally identical after the move.

### TEST — Test Updates

- **REQ-TEST-001:** Tests that assert smoke-test has
`RecipeSource.BUILTIN` must be updated to assert `RecipeSource.PROJECT`.
- **REQ-TEST-002:** Tests that count the number of bundled recipes must
be updated to reflect the removal of smoke-test from the bundled set.

## Architecture Impact

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START(["list_recipes / find_recipe_by_name called"])

    subgraph ProjectLocal ["★ PROJECT-LOCAL SCAN (priority 1)"]
        direction TB
        PROJ_DIR["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = PROJECT<br/>★ smoke-test.yaml (moved here)"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>★ smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>★ smoke-test.md"]
    end

    subgraph Bundled ["BUNDLED SCAN (priority 2)"]
        direction TB
        BUILTIN_DIR["src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = BUILTIN<br/>implementation, remediation,<br/>merge-prs, impl-groups<br/>(smoke-test removed)"]
    end

    DEDUP["Dedup via seen set<br/>━━━━━━━━━━<br/>Project names shadow bundled"]

    subgraph AutoskillitRepo ["AUTOSKILLIT REPO CONTEXT"]
        direction TB
        CLI_LIST["● autoskillit recipes list<br/>━━━━━━━━━━<br/>Shows smoke-test (source: project)"]
        CLI_ORDER["autoskillit order<br/>━━━━━━━━━━<br/>Pipeline execution menu"]
        CLI_RENDER["autoskillit recipes render<br/>━━━━━━━━━━<br/>_recipes_dir_for(PROJECT)<br/>→ .autoskillit/recipes/diagrams/"]
    end

    subgraph ExternalProject ["EXTERNAL PROJECT CONTEXT"]
        direction TB
        EXT_LIST["autoskillit recipes list<br/>━━━━━━━━━━<br/>smoke-test NOT visible<br/>(no project-local copy)"]
    end

    START --> PROJ_DIR
    PROJ_DIR --> DEDUP
    DEDUP --> BUILTIN_DIR
    PROJ_DIR --> PROJ_CONTRACT
    PROJ_DIR --> PROJ_DIAGRAM
    DEDUP --> CLI_LIST
    DEDUP --> CLI_ORDER
    CLI_RENDER --> PROJ_DIAGRAM
    DEDUP --> EXT_LIST

    class START terminal;
    class PROJ_DIR,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
    class BUILTIN_DIR stateNode;
    class DEDUP handler;
    class CLI_LIST,CLI_ORDER,CLI_RENDER cli;
    class EXT_LIST detector;
```

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph Tests ["TESTS (modified ●)"]
        direction TB
        T_SMOKE["● test_smoke_pipeline.py<br/>━━━━━━━━━━<br/>uses SMOKE_SCRIPT<br/>→ project-local path"]
        T_BUNDLED["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>smoke_yaml fixture<br/>→ project-local path"]
        T_POLICY["● test_bundled_recipe_hidden_policy.py<br/>━━━━━━━━━━<br/>BUNDLED_RECIPE_NAMES<br/>smoke-test removed"]
        T_TOOLS["● test_tools_recipe.py<br/>━━━━━━━━━━<br/>list_recipes assertion<br/>smoke-test NOT in bundled"]
        T_ENGINE["● test_engine.py<br/>━━━━━━━━━━<br/>contract adapter test<br/>→ project-local path"]
    end

    subgraph L3 ["L3 — SERVER"]
        direction TB
        TOOLS_RECIPE["server.tools_recipe<br/>━━━━━━━━━━<br/>list_recipes, load_recipe<br/>validate_recipe"]
    end

    subgraph L2R ["L2 — RECIPE"]
        direction TB
        RECIPE_IO["recipe.io<br/>━━━━━━━━━━<br/>builtin_recipes_dir()<br/>list_recipes()"]
        RECIPE_VALIDATOR["recipe.validator<br/>━━━━━━━━━━<br/>run_semantic_rules<br/>analyze_dataflow"]
        RECIPE_CONTRACTS["recipe.contracts<br/>━━━━━━━━━━<br/>load_bundled_manifest"]
    end

    subgraph L2M ["L2 — MIGRATION"]
        direction TB
        MIG_ENGINE["migration.engine<br/>━━━━━━━━━━<br/>default_migration_engine<br/>contract adapters"]
    end

    subgraph L0 ["L0 — CORE"]
        direction TB
        CORE_PATHS["core.paths<br/>━━━━━━━━━━<br/>pkg_root() → bundled dir<br/>fan-in: all layers"]
    end

    subgraph Artifacts ["★ PROJECT-LOCAL ARTIFACTS (new)"]
        direction TB
        PROJ_RECIPE["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>smoke-test.md"]
    end

    T_SMOKE -->|"imports"| TOOLS_RECIPE
    T_SMOKE -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_CONTRACTS
    T_POLICY -->|"imports"| CORE_PATHS
    T_TOOLS -->|"imports"| TOOLS_RECIPE
    T_ENGINE -->|"imports"| CORE_PATHS
    T_ENGINE -->|"imports"| MIG_ENGINE

    TOOLS_RECIPE -->|"imports"| RECIPE_IO
    RECIPE_IO -->|"builtin_recipes_dir()"| CORE_PATHS
    RECIPE_VALIDATOR -->|"imports"| RECIPE_IO
    RECIPE_CONTRACTS -->|"imports"| RECIPE_IO
    MIG_ENGINE -->|"imports"| CORE_PATHS

    T_SMOKE -.->|"now reads"| PROJ_RECIPE
    T_BUNDLED -.->|"now reads"| PROJ_RECIPE
    T_ENGINE -.->|"now reads"| PROJ_CONTRACT

    class T_SMOKE,T_BUNDLED,T_POLICY,T_TOOLS,T_ENGINE phase;
    class TOOLS_RECIPE cli;
    class RECIPE_IO,RECIPE_VALIDATOR,RECIPE_CONTRACTS handler;
    class MIG_ENGINE handler;
    class CORE_PATHS stateNode;
    class PROJ_RECIPE,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
```

Closes #600

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-394673/.autoskillit/temp/make-plan/move_smoke_test_recipe_plan_2026-04-04_190817.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 74 | 37.1k | 3.0M | 2 | 12m 44s |
| **Total** | 10.1k | 403.4k | 43.6M | | 2h 58m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…Type in review-design (#614)

## Summary

The `review-design` skill has L1 severity calibration that correctly
caps `estimand_clarity` and `hypothesis_falsifiability` by
`experiment_type` — benchmarks can never produce L1 critical findings.
But the red-team dimension has **no analogous calibration**, meaning any
critical red-team finding triggers STOP regardless of experiment type.
This creates an unresolvable loop for benchmarks: the red-team always
finds new critical issues at progressively higher abstraction (the Hydra
pattern), exhausting retries without ever producing GO.

The fix adds a red-team severity calibration rubric to
`review-design/SKILL.md` (mirroring the L1 rubric), updates the verdict
logic to apply the cap before building `stop_triggers`, and adds
diminishing-return awareness to `resolve-design-review/SKILL.md` so it
can detect goalposts-moving across rounds.
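
The cap itself is a small ceiling lookup applied before `stop_triggers` are built. A sketch under stated assumptions: the ceiling values and severity scale here are illustrative, not the rubric shipped in `review-design/SKILL.md`.

```python
# Illustrative severity scale and per-type ceilings (assumed values).
SEVERITY_ORDER = ["info", "warning", "critical"]
RT_MAX_SEVERITY = {"benchmark": "warning", "causal": "critical"}

def cap_red_team(findings: list[dict], experiment_type: str) -> list[dict]:
    """Downgrade red-team findings above the per-type ceiling BEFORE
    stop_triggers are built, so a benchmark cannot STOP on red-team
    findings alone."""
    ceiling = RT_MAX_SEVERITY.get(experiment_type, "critical")
    cap_idx = SEVERITY_ORDER.index(ceiling)
    return [
        {**f, "severity": SEVERITY_ORDER[min(SEVERITY_ORDER.index(f["severity"]), cap_idx)]}
        for f in findings
    ]

capped = cap_red_team([{"id": "rt-1", "severity": "critical"}], "benchmark")
print(capped[0]["severity"])  # warning
```

Because a benchmark's red-team findings top out at warning, the Hydra loop (new critical findings each round) can no longer burn retries on its own.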

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([Plan submitted])
    GO([GO → execute])
    REVISE_OUT([REVISE → revise_design])
    REVISED_OUT([revised → revise_design])
    FAILED_OUT([failed → design_rejected])

    subgraph ReviewDesign ["● review-design/SKILL.md"]
        direction TB
        L1["L1 Analysis<br/>━━━━━━━━━━<br/>estimand_clarity +<br/>hypothesis_falsifiability"]
        L1GATE{"L1 Fail-Fast<br/>━━━━━━━━━━<br/>Any L1 critical?"}
        PARALLEL["L2 + L3 + L4 + RT<br/>━━━━━━━━━━<br/>Parallel analysis"]
        RTCAP["● RT Severity Cap<br/>━━━━━━━━━━<br/>RT_MAX_SEVERITY[experiment_type]<br/>Downgrade if above ceiling"]
        MERGE["Merge + Dedup<br/>━━━━━━━━━━<br/>All findings pooled"]
        VERDICT{"● Verdict Logic<br/>━━━━━━━━━━<br/>stop_triggers built<br/>AFTER rt_cap applied"}
    end

    subgraph ResolveDesign ["● resolve-design-review/SKILL.md"]
        direction TB
        PARSE["Step 1: Parse Dashboard<br/>━━━━━━━━━━<br/>Extract stop-trigger findings<br/>Classify ADDRESSABLE/STRUCTURAL/DISCUSS"]
        DIMCHECK{"prior_revision_guidance<br/>━━━━━━━━━━<br/>provided?"}
        DIMRET["● Step 1.5: Diminishing-Return<br/>━━━━━━━━━━<br/>Compare ADDRESSABLE themes<br/>vs prior guidance entries"]
        GOALPOST{"goalposts_moving<br/>━━━━━━━━━━<br/>true for any finding?"}
        RECLASSIFY["● Reclassify<br/>━━━━━━━━━━<br/>ADDRESSABLE → STRUCTURAL<br/>annotate prior_theme_match"]
        RESGATE{"Any ADDRESSABLE<br/>or DISCUSS?"}
    end

    subgraph RecipeRouting ["● research.yaml — resolve_design_review step"]
        direction LR
        RECIPE["skill_command passes<br/>━━━━━━━━━━<br/>$context.revision_guidance<br/>as optional 3rd arg"]
    end

    START --> L1
    L1 --> L1GATE
    L1GATE -->|"yes (L1 critical)"| MERGE
    L1GATE -->|"no"| PARALLEL
    PARALLEL --> RTCAP
    RTCAP --> MERGE
    MERGE --> VERDICT
    VERDICT -->|"stop_triggers present"| RECIPE
    VERDICT -->|"critical or ≥3 warnings"| REVISE_OUT
    VERDICT -->|"otherwise"| GO

    RECIPE --> PARSE
    PARSE --> DIMCHECK
    DIMCHECK -->|"yes"| DIMRET
    DIMCHECK -->|"no (round 1)"| RESGATE
    DIMRET --> GOALPOST
    GOALPOST -->|"true"| RECLASSIFY
    GOALPOST -->|"false"| RESGATE
    RECLASSIFY --> RESGATE
    RESGATE -->|"yes"| REVISED_OUT
    RESGATE -->|"all STRUCTURAL"| FAILED_OUT

    class START,GO,REVISE_OUT,REVISED_OUT,FAILED_OUT terminal;
    class L1,PARALLEL handler;
    class L1GATE,VERDICT,DIMCHECK,GOALPOST,RESGATE stateNode;
    class MERGE,PARSE phase;
    class RTCAP,DIMRET,RECLASSIFY newComponent;
    class RECIPE detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and outcome states |
| Orange | Handler | Analysis agents (L1, parallel L2-L4+RT) |
| Teal | State | Decision points and verdict routing |
| Purple | Phase | Merge and parse aggregation steps |
| Green | Modified Component | ● Nodes changed by this PR (RT cap, diminishing-return detection, reclassify, recipe routing) |
| Red | Detector | Recipe routing gate (passes revision_guidance) |

Closes #609

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185816-184240/.autoskillit/temp/make-plan/add-red-team-severity-calibration-by-experiment-type_plan_2026-04-04_185816.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 135 | 68.4k | 5.4M | 4 | 23m 1s |
| review_pr | 31 | 22.8k | 1.2M | 1 | 5m 50s |
| **Total** | 10.2k | 457.5k | 47.2M | | 3h 14m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
#615)

## Summary

The token summary table (displayed in PRs, terminal, and compact KV
output) collapses 4 distinct Claude API token fields into 3 misleading
columns. The column labeled "input" actually shows only the tiny
uncached delta (`input_tokens`), and "cached" silently sums two
cost-distinct categories (`cache_read_input_tokens` at 0.1x billing +
`cache_creation_input_tokens` at 1.25x billing). This change splits the
display into 4 token columns — `uncached`, `output`, `cache_read`,
`cache_write` — across all 3 independent formatter implementations and
their tests.

No data model, extraction, or storage changes are needed — `TokenEntry`
already preserves all 4 fields. This is purely a formatting-layer fix.
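
To illustrate the compact-KV variant of the split, here is a hedged sketch: the function name matches the diagram below, but the signature is simplified (the real `telemetry_fmt.py` formatter handles accumulation and human-readable suffixes). Field names follow the Claude API usage block.

```python
def format_compact_kv(entry: dict) -> str:
    """Render the four cost-distinct token fields separately instead
    of collapsing cache_read (0.1x billing) and cache_creation
    (1.25x billing) into one misleading 'cached' column."""
    return (
        f"uc:{entry['input_tokens']} "
        f"out:{entry['output_tokens']} "
        f"cr:{entry['cache_read_input_tokens']} "
        f"cw:{entry['cache_creation_input_tokens']}"
    )

print(format_compact_kv({
    "input_tokens": 39, "output_tokens": 23700,
    "cache_read_input_tokens": 1500000, "cache_creation_input_tokens": 0,
}))  # uc:39 out:23700 cr:1500000 cw:0
```

The same four-way split repeats in the markdown and terminal formatters; only the column labels differ per target.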

## Architecture Impact

### Data Lineage Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart LR
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph API ["Claude API Response"]
        direction TB
        F1["input_tokens<br/>━━━━━━━━━━<br/>Uncached delta"]
        F2["output_tokens<br/>━━━━━━━━━━<br/>Generated tokens"]
        F3["cache_read_input_tokens<br/>━━━━━━━━━━<br/>0.1x billing"]
        F4["cache_creation_input_tokens<br/>━━━━━━━━━━<br/>1.25x billing"]
    end

    subgraph Storage ["TokenEntry Storage"]
        TE[("TokenEntry<br/>━━━━━━━━━━<br/>4 fields intact<br/>Accumulated per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Persisted session data<br/>All 4 fields")]
    end

    subgraph Canonical ["● telemetry_fmt.py (Canonical Formatter)"]
        direction TB
        FMD["● format_token_table()<br/>━━━━━━━━━━<br/>Markdown table<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        FTM["● format_token_table_terminal()<br/>━━━━━━━━━━<br/>Terminal table<br/>UNCACHED|OUTPUT|CACHE_RD|CACHE_WR"]
        FKV["● format_compact_kv()<br/>━━━━━━━━━━<br/>Compact KV<br/>uc:|out:|cr:|cw:"]
    end

    subgraph Hooks ["Stdlib Hooks (no autoskillit imports)"]
        direction TB
        TSA["● token_summary_appender._format_table()<br/>━━━━━━━━━━<br/>Reads token_usage.json<br/>Markdown table → GitHub PR body"]
        POS["● pretty_output._fmt_get_token_summary()<br/>━━━━━━━━━━<br/>Reads get_token_summary JSON<br/>Compact KV → PostToolUse"]
        POR["● pretty_output._fmt_run_skill()<br/>━━━━━━━━━━<br/>Reads run_skill result dict<br/>Inline KV → PostToolUse"]
    end

    subgraph Outputs ["Display Targets"]
        direction TB
        MD["PR Body<br/>━━━━━━━━━━<br/>GitHub markdown table"]
        TERM["Terminal<br/>━━━━━━━━━━<br/>Padded column output"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>One-liner summaries"]
        HOOK["PostToolUse Output<br/>━━━━━━━━━━<br/>Hook-formatted display"]
    end

    F1 --> TE
    F2 --> TE
    F3 --> TE
    F4 --> TE
    TE --> TJ

    TE --> FMD
    TE --> FTM
    TE --> FKV
    TJ --> TSA
    TJ -.-> POS

    FMD -->|"markdown rows"| MD
    FTM -->|"padded columns"| TERM
    FKV -->|"kv lines"| KV
    TSA -->|"gh api PATCH"| MD
    POS -->|"formatted text"| HOOK
    POR -->|"formatted text"| HOOK

    class F1,F2,F3,F4 cli;
    class TE,TJ stateNode;
    class FMD,FTM,FKV handler;
    class TSA,POS,POR integration;
    class MD,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | API Fields | 4 Claude API token categories from usage response |
| Teal | Storage | TokenEntry dataclass + persisted JSON session files |
| Orange | Canonical Formatter | 3 functions in telemetry_fmt.py (all ● modified) |
| Red | Stdlib Hooks | Independent hook implementations (all ● modified) |
| Dark Teal | Outputs | Display targets: PR body, terminal, compact KV, PostToolUse |

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Triggers ["OPERATOR TRIGGERS"]
        direction TB
        GTS["get_token_summary<br/>━━━━━━━━━━<br/>MCP tool call<br/>format=json|markdown"]
        RS["run_skill<br/>━━━━━━━━━━<br/>MCP tool call<br/>Headless session"]
        PRPATCH["PR body update<br/>━━━━━━━━━━<br/>After open-pr skill<br/>PostToolUse event"]
    end

    subgraph State ["TOKEN STATE (read/write)"]
        direction TB
        TL[("DefaultTokenLog<br/>━━━━━━━━━━<br/>In-memory accumulator<br/>4 fields per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Per-session disk files<br/>Read by stdlib hooks")]
    end

    subgraph Formatters ["● FORMATTERS (modified)"]
        direction TB
        TF["● telemetry_fmt.py<br/>━━━━━━━━━━<br/>format_token_table()<br/>format_token_table_terminal()<br/>format_compact_kv()"]
        TSA["● token_summary_appender.py<br/>━━━━━━━━━━<br/>_format_table()<br/>Stdlib-only hook"]
        PO["● pretty_output.py<br/>━━━━━━━━━━<br/>_fmt_get_token_summary()<br/>_fmt_run_skill()"]
    end

    subgraph Outputs ["OBSERVABILITY OUTPUTS (write-only)"]
        direction TB
        MDTBL["PR Body Table<br/>━━━━━━━━━━<br/>## Token Usage Summary<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        TERM["Terminal Table<br/>━━━━━━━━━━<br/>STEP UNCACHED OUTPUT CACHE_RD CACHE_WR COUNT TIME<br/>Padded for readability"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>name xN [uc:X out:X cr:X cw:X t:Xs]<br/>total_uncached / total_cache_read / total_cache_write"]
        HOOK["PostToolUse Display<br/>━━━━━━━━━━<br/>tokens_uncached:<br/>tokens_cache_read:<br/>tokens_cache_write:"]
    end

    GTS -->|"reads"| TL
    TL -.->|"flush"| TJ
    TJ -->|"load_sessions"| TSA
    TJ -.->|"via MCP JSON payload"| PO

    GTS --> TF
    TF -->|"markdown"| MDTBL
    TF -->|"terminal"| TERM
    TF -->|"compact"| KV

    RS -->|"PostToolUse event"| PO
    PO -->|"_fmt_run_skill"| HOOK
    PO -->|"_fmt_get_token_summary"| KV

    PRPATCH -->|"PostToolUse event"| TSA
    TSA -->|"gh api PATCH"| MDTBL

    class GTS,RS,PRPATCH cli;
    class TL,TJ stateNode;
    class TF,TSA,PO handler;
    class MDTBL,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Triggers | Operator-initiated MCP tool calls and PostToolUse events |
| Teal | State | Token accumulator (read/write) and persisted JSON files |
| Orange | Formatters | 3 modified formatter implementations (all ● changed) |
| Dark Teal | Outputs | Write-only observability artifacts: PR table, terminal, compact KV |

Closes #604

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-266225/.autoskillit/temp/make-plan/token_summary_4_columns_plan_2026-04-04_191000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Add `api-simulator` as a dev dependency and use its `mock_http_server`
pytest fixture to test the quota guard's real HTTP path end-to-end.
Currently all quota tests monkeypatch `_fetch_quota` at the function
level — the actual httpx client construction, header injection
(`Authorization: Bearer`, `anthropic-beta`), response parsing, and error
handling are never exercised. This plan introduces a `base_url`
parameter to `_fetch_quota` and `check_and_sleep_if_needed`, then writes
7 tests that point the real httpx client at `mock_http_server` to
exercise the full HTTP path.

**Files changed:** 3 (`pyproject.toml`,
`src/autoskillit/execution/quota.py`, new
`tests/execution/test_quota_http.py`)
**Existing tests:** Unchanged — all monkeypatch-based tests in
`test_quota.py` remain as-is.

## Requirements

### DEP — Dependency Integration

- **REQ-DEP-001:** The system must include `api-simulator` as a dev-only
dependency with a pinned git tag source.
- **REQ-DEP-002:** The api-simulator dependency must not appear in
production runtime dependencies.

### CFG — URL Configurability

- **REQ-CFG-001:** `_fetch_quota` must accept a `base_url` parameter
defaulting to `https://api.anthropic.com`.
- **REQ-CFG-002:** `check_and_sleep_if_needed` must thread the
`base_url` parameter through to `_fetch_quota` at both call sites.
- **REQ-CFG-003:** The production behavior must be unchanged when
`base_url` is not explicitly provided.

### HTTP — HTTP Path Verification

- **REQ-HTTP-001:** Tests must exercise the real httpx client
construction path, not monkeypatch `_fetch_quota`.
- **REQ-HTTP-002:** Tests must verify that the `Authorization: Bearer`
header is sent on the request.
- **REQ-HTTP-003:** Tests must verify that the `anthropic-beta:
oauth-2025-04-20` header is sent on the request.
- **REQ-HTTP-004:** Tests must verify correct JSON response parsing for
the `five_hour` utilization shape.

### ERR — Error Handling Verification

- **REQ-ERR-001:** Tests must verify fail-open behavior on HTTP 4xx/5xx
responses.
- **REQ-ERR-002:** Tests must verify fail-open behavior on network
timeout.
- **REQ-ERR-003:** Tests must verify that the above-threshold path
triggers a double-fetch (two HTTP requests).

### COMPAT — Backward Compatibility

- **REQ-COMPAT-001:** Existing `test_quota.py` tests must continue to
pass unchanged.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([START: check_and_sleep_if_needed])

    subgraph GatePhase ["Gate Phase"]
        direction TB
        ENABLED{"config.enabled?"}
        DISABLED(["RETURN<br/>should_sleep: false"])
    end

    subgraph CachePhase ["Cache Phase"]
        direction TB
        CACHE["_read_cache<br/>━━━━━━━━━━<br/>Read local JSON cache"]
        CACHE_HIT{"Cache fresh?<br/>━━━━━━━━━━<br/>age ≤ max_age?"}
    end

    subgraph FetchPhase ["HTTP Fetch Phase"]
        direction TB
        FETCH["● _fetch_quota<br/>━━━━━━━━━━<br/>★ base_url parameter<br/>httpx.AsyncClient GET"]
        BASEURL["★ base_url<br/>━━━━━━━━━━<br/>default: api.anthropic.com<br/>test: mock_http_server.url"]
        PARSE["Parse Response<br/>━━━━━━━━━━<br/>five_hour.utilization<br/>Z→+00:00 normalization"]
    end

    subgraph DecisionPhase ["Threshold Decision"]
        direction TB
        THRESHOLD{"utilization<br/>≥ threshold?"}
        RESETS_AT1{"resets_at<br/>is None?<br/>(Gate 1)"}
        REFETCH["● _fetch_quota re-fetch<br/>━━━━━━━━━━<br/>★ base_url threaded<br/>Double-fetch for accuracy"]
        RESETS_AT2{"resets_at<br/>still None?<br/>(Gate 2)"}
    end

    subgraph Results ["Results"]
        BELOW(["RETURN<br/>should_sleep: false"])
        FALLBACK1(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        FALLBACK2(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        SLEEP(["RETURN<br/>should_sleep: true<br/>sleep_seconds computed"])
        FAILOPEN(["RETURN<br/>should_sleep: false<br/>error key present"])
    end

    subgraph TestInfra ["★ Test Infrastructure (test_quota_http.py)"]
        direction TB
        MOCK["★ mock_http_server<br/>━━━━━━━━━━<br/>api-simulator fixture<br/>HTTP server"]
        REGISTER["★ register / register_sequence<br/>━━━━━━━━━━<br/>Custom endpoint responses<br/>Status codes, delays"]
        INSPECT["★ get_requests / request_count<br/>━━━━━━━━━━<br/>Header verification<br/>Double-fetch assertion"]
    end

    START --> ENABLED
    ENABLED -->|"false"| DISABLED
    ENABLED -->|"true"| CACHE
    CACHE --> CACHE_HIT
    CACHE_HIT -->|"fresh + below threshold"| BELOW
    CACHE_HIT -->|"miss or expired"| FETCH
    FETCH --> BASEURL
    BASEURL --> PARSE
    PARSE --> THRESHOLD
    THRESHOLD -->|"below"| BELOW
    THRESHOLD -->|"above"| RESETS_AT1
    RESETS_AT1 -->|"None"| FALLBACK1
    RESETS_AT1 -->|"present"| REFETCH
    REFETCH --> RESETS_AT2
    RESETS_AT2 -->|"None"| FALLBACK2
    RESETS_AT2 -->|"present"| SLEEP
    FETCH -.->|"HTTP error / timeout"| FAILOPEN

    MOCK -.->|"serves responses to"| BASEURL
    REGISTER -.->|"configures"| MOCK
    INSPECT -.->|"verifies headers / count"| FETCH

    class START terminal;
    class DISABLED,BELOW,FALLBACK1,FALLBACK2,SLEEP,FAILOPEN phase;
    class ENABLED,CACHE_HIT,THRESHOLD,RESETS_AT1,RESETS_AT2 stateNode;
    class CACHE,PARSE handler;
    class FETCH,REFETCH handler;
    class BASEURL,MOCK,REGISTER,INSPECT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry point |
| Teal | State | Decision points and routing |
| Orange | Handler | Processing nodes (cache read, HTTP fetch, parse) |
| Green | New Component | ★ New `base_url` parameter and test infrastructure |
| Purple | Phase | Result return paths |

### Development Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph Deps ["● DEPENDENCY MANIFEST (pyproject.toml)"]
        direction TB
        PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>requires-python ≥ 3.11"]
        DEVDEPS["● dev optional-dependencies<br/>━━━━━━━━━━<br/>pytest, pytest-asyncio,<br/>pytest-httpx, pytest-xdist,<br/>pytest-timeout, ruff,<br/>import-linter, packaging"]
        APISIM["★ api-simulator<br/>━━━━━━━━━━<br/>New dev dependency<br/>HTTP mock fixture provider"]
        UVSRC["★ [tool.uv.sources]<br/>━━━━━━━━━━<br/>api-simulator pinned<br/>git: TalonT-Org/api-simulator<br/>branch: main"]
        UVLOCK["● uv.lock<br/>━━━━━━━━━━<br/>Regenerated with<br/>api-simulator entry"]
    end

    subgraph Quality ["CODE QUALITY GATES (pre-commit)"]
        direction TB
        FORMAT["ruff format<br/>━━━━━━━━━━<br/>Auto-fix code style<br/>reads + modifies src"]
        LINT["ruff check<br/>━━━━━━━━━━<br/>Auto-fix lint violations<br/>reads + modifies src"]
        TYPES["mypy<br/>━━━━━━━━━━<br/>Type checking<br/>reads src, reports only"]
        UVCHECK["uv lock check<br/>━━━━━━━━━━<br/>Verifies lockfile sync<br/>reads uv.lock"]
        SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning<br/>reads staged files"]
        IMPORTLINT["import-linter<br/>━━━━━━━━━━<br/>Layer contract enforcement<br/>IL-001 through IL-007"]
    end

    subgraph Testing ["TEST FRAMEWORK"]
        direction TB
        PYTEST["pytest + pytest-asyncio<br/>━━━━━━━━━━<br/>asyncio_mode=auto<br/>timeout=60s signal"]
        XDIST["pytest-xdist -n 4<br/>━━━━━━━━━━<br/>Parallel test workers<br/>worksteal distribution"]
        UNITQUOTA["● test_quota.py<br/>━━━━━━━━━━<br/>23 unit tests<br/>monkeypatch _fetch_quota<br/>mock signature updated"]
        HTTPQUOTA["★ test_quota_http.py<br/>━━━━━━━━━━<br/>7 end-to-end HTTP tests<br/>real httpx client path<br/>no monkeypatching"]
        MOCKSERVER["★ mock_http_server fixture<br/>━━━━━━━━━━<br/>api-simulator provides<br/>register / register_sequence<br/>get_requests / request_count"]
    end

    subgraph EntryPoints ["ENTRY POINTS"]
        CLI["autoskillit CLI<br/>━━━━━━━━━━<br/>autoskillit.cli:main"]
    end

    PYPROJECT --> DEVDEPS
    DEVDEPS --> APISIM
    APISIM --> UVSRC
    UVSRC --> UVLOCK

    PYPROJECT --> FORMAT
    FORMAT --> LINT
    LINT --> TYPES
    TYPES --> UVCHECK
    UVCHECK --> SECRETS
    SECRETS --> IMPORTLINT

    IMPORTLINT --> PYTEST
    PYTEST --> XDIST
    XDIST --> UNITQUOTA
    XDIST --> HTTPQUOTA
    APISIM -.->|"provides fixture"| MOCKSERVER
    MOCKSERVER -.->|"injected into"| HTTPQUOTA

    PYPROJECT --> CLI

    class PYPROJECT,DEVDEPS,UVLOCK phase;
    class APISIM,UVSRC,HTTPQUOTA,MOCKSERVER newComponent;
    class UNITQUOTA handler;
    class FORMAT,LINT,TYPES,UVCHECK,SECRETS,IMPORTLINT detector;
    class PYTEST,XDIST handler;
    class CLI output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Purple | Build Config | pyproject.toml, dev deps, lockfile |
| Green | New Component | ★ api-simulator dep, uv.sources, HTTP test file, mock fixture |
| Orange | Test Framework | pytest, xdist, existing test_quota.py |
| Red | Quality Gates | ruff, mypy, uv lock check, gitleaks, import-linter |
| Dark Teal | Entry Points | CLI entry point |

Closes #607

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190816-816130/.autoskillit/temp/make-plan/integrate_api_simulator_quota_guard_plan_2026-04-04_191500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 100 | 51.3k | 3.9M | 3 | 16m 38s |
| **Total** | 10.2k | 417.5k | 44.5M | | 3h 2m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

The `zero_writes` gate in `execution/headless.py` fires unconditionally
when `write_behavior.mode == "always"` and `write_call_count == 0`. The
`resolve-failures` contract declares `write_behavior: always`, but the
skill legitimately exits with zero `Edit`/`Write` calls when the
worktree is already green (0 fix iterations). The gate has no escape
path for this case — `success=True` is demoted to `zero_writes`, killing
an otherwise correct pipeline run.

This PR changes the contract to `conditional` mode with a pattern gated
on the `fixes_applied` structured token, extends the same fix to
`retry-worktree` and `resolve-review`, and adds a semantic rule to
prevent regression.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([run_skill called])
    SUCCESS(["✓ success=True<br/>subtype=success"])
    DEMOTED(["✗ success=False<br/>subtype=zero_writes"])

    subgraph Contract ["● Contract Resolution"]
        direction TB
        YAML["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>resolve-failures:<br/>  write_behavior: conditional<br/>  write_expected_when:<br/>  - fixes_applied ≥ 1 regex"]
        FACTORY["● _factory.py<br/>━━━━━━━━━━<br/>_resolve_write_behavior()<br/>reads contract via lru_cache"]
        SPEC["WriteBehaviorSpec<br/>━━━━━━━━━━<br/>mode=conditional<br/>expected_when=(pattern,)"]
    end

    subgraph Execution ["● Skill Execution"]
        direction TB
        SESSION["headless subprocess<br/>━━━━━━━━━━<br/>run tests, apply fixes<br/>via Bash / Edit / Write"]
        TOKEN["● Structured Token<br/>━━━━━━━━━━<br/>fixes_applied = N<br/>emitted at Step 4"]
        COUNT["write_call_count<br/>━━━━━━━━━━<br/>count Edit + Write<br/>in tool_uses"]
    end

    subgraph Gate ["● Zero-Write Gate"]
        direction TB
        GUARD{"success=True AND<br/>write_count=0 AND<br/>write_behavior≠None?"}
        MODE{"● mode?<br/>━━━━━━━━━━<br/>always vs conditional"}
        PATTERN{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND-match all patterns<br/>against session output"}
        EXPECT{"write_expected<br/>AND write_count=0?"}
    end

    %% FLOW %%
    START --> YAML
    YAML -->|"reads"| FACTORY
    FACTORY -->|"builds"| SPEC
    SPEC -->|"passed to executor"| SESSION
    SESSION --> TOKEN
    SESSION --> COUNT
    TOKEN --> GUARD
    COUNT --> GUARD

    GUARD -->|"No — gate inactive"| SUCCESS
    GUARD -->|"Yes"| MODE

    MODE -->|"always"| EXPECT
    MODE -->|"conditional"| PATTERN

    PATTERN -->|"fixes_applied=0<br/>no match → False"| SUCCESS
    PATTERN -->|"fixes_applied≥1<br/>match → True"| EXPECT

    EXPECT -->|"write_count > 0<br/>artifact written"| SUCCESS
    EXPECT -->|"write_count = 0<br/>no artifact"| DEMOTED

    %% CLASS ASSIGNMENTS %%
    class START,SUCCESS,DEMOTED terminal;
    class YAML,SPEC stateNode;
    class FACTORY,SESSION,COUNT handler;
    class TOKEN output;
    class GUARD,MODE,PATTERN,EXPECT detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph ContractFields ["● INIT_ONLY: Contract Fields (YAML → frozen)"]
        direction TB
        WB["● write_behavior<br/>━━━━━━━━━━<br/>always ∣ conditional ∣ null<br/>Set in skill_contracts.yaml<br/>Cached via @lru_cache"]
        WEW["● write_expected_when<br/>━━━━━━━━━━<br/>list of regex patterns<br/>AND-semantics at gate<br/>Empty = no pattern gate"]
    end

    subgraph SpecFields ["INIT_ONLY: WriteBehaviorSpec (frozen dataclass)"]
        direction TB
        MODE["● mode: str ∣ None<br/>━━━━━━━━━━<br/>Mirrors write_behavior<br/>Frozen after construction"]
        EXPECTED["● expected_when: tuple<br/>━━━━━━━━━━<br/>Immutable tuple of patterns<br/>Frozen after construction"]
    end

    subgraph SessionState ["MUTABLE + APPEND: Session State"]
        direction TB
        TOOLS["tool_uses: list<br/>━━━━━━━━━━<br/>APPEND_ONLY during session<br/>Each Edit/Write appended"]
        RESULT["● session output: str<br/>━━━━━━━━━━<br/>Contains structured tokens<br/>fixes_applied = N"]
        WCC["write_call_count: int<br/>━━━━━━━━━━<br/>DERIVED from tool_uses<br/>count(Edit + Write)"]
    end

    subgraph GateState ["● MUTABLE: SkillResult Fields (gate mutations)"]
        direction TB
        SUCCESS["● success: bool<br/>━━━━━━━━━━<br/>Init: True (if session ok)<br/>Gate may demote → False"]
        SUBTYPE["● subtype: str<br/>━━━━━━━━━━<br/>Init: success<br/>Gate may set → zero_writes"]
        RETRY["● needs_retry: bool<br/>━━━━━━━━━━<br/>Init: False<br/>Gate may set → True"]
    end

    subgraph Validation ["● VALIDATION GATES"]
        direction TB
        G1{"● mode check<br/>━━━━━━━━━━<br/>always → write_expected=True<br/>conditional → check patterns"}
        G2{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND over all patterns<br/>re.search each on output"}
        G3{"write_expected AND<br/>write_count == 0?<br/>━━━━━━━━━━<br/>Demote if both True"}
    end

    %% FLOW: Contract → Spec %%
    WB -->|"reads"| MODE
    WEW -->|"reads"| EXPECTED

    %% FLOW: Spec → Gate %%
    MODE -->|"determines gate path"| G1
    EXPECTED -->|"provides patterns"| G2

    %% FLOW: Session → Gate %%
    TOOLS -->|"derives"| WCC
    RESULT -->|"scanned by"| G2
    WCC -->|"checked by"| G3

    %% FLOW: Gate decisions %%
    G1 -->|"conditional"| G2
    G1 -->|"always"| G3
    G2 -->|"match → True"| G3
    G2 -->|"no match → False"| SUCCESS

    %% FLOW: Gate → Mutation %%
    G3 -->|"demote"| SUBTYPE
    G3 -->|"demote"| RETRY
    G3 -->|"preserve"| SUCCESS

    %% CLASS ASSIGNMENTS %%
    class WB,WEW detector;
    class MODE,EXPECTED detector;
    class TOOLS handler;
    class RESULT output;
    class WCC phase;
    class SUCCESS,SUBTYPE,RETRY gap;
    class G1,G2,G3 stateNode;
```

Closes #603

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260404-212507-745574/.autoskillit/temp/rectify/rectify_zero-writes-false-positive_2026-04-04_215019_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| investigate | 31 | 12.6k | 747.1k | 1 | 6m 34s |
| rectify | 11.4k | 57.9k | 2.0M | 1 | 27m 28s |
| review | 3.6k | 7.2k | 216.3k | 1 | 8m 0s |
| dry_walkthrough | 51 | 30.8k | 2.3M | 2 | 11m 22s |
| implement | 2.2k | 28.2k | 3.0M | 2 | 10m 56s |
| assess | 44 | 7.8k | 1.1M | 2 | 8m 43s |
| audit_impl | 30 | 18.6k | 654.7k | 2 | 9m 10s |
| open_pr | 28 | 15.8k | 1.0M | 1 | 7m 3s |
| **Total** | 17.3k | 178.9k | 11.1M | | 1h 29m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alation Routing, and Pack Fix (#620)

## Summary

This part adds the post-review re-validation loop and escalation
consumption infrastructure to `research.yaml`, adds the `needs_rerun`
structured output token to `resolve-research-review/SKILL.md`, and fixes
the missing `exp-lens` pack registration. Additionally adds the data
provenance lifecycle across 5 research pipeline skills (plan-experiment,
run-experiment, write-report, review-design, review-research-pr) with
contract and guard tests.

## Requirements

### DATA — Data Provenance Lifecycle

- **REQ-DATA-001:** The `plan-experiment` skill must generate a Data
Manifest section in every experiment plan that maps each hypothesis to
its required data source(s), specifying source type (synthetic, fixture,
external, gitignored), acquisition method (generate, download, copy),
and verification criteria.
- **REQ-DATA-002:** When the research task directive or issue specifies
using particular data, the `plan-experiment` skill must include explicit
acquisition steps for that data in the plan — the plan must not assume
data will already be present.
- **REQ-DATA-003:** The `run-experiment` skill pre-flight must perform a
hypothesis-to-data mapping check against the Data Manifest: for each
hypothesis, verify its required data source is present and non-empty
before execution begins.
- **REQ-DATA-004:** When `run-experiment` pre-flight finds that data the
plan said would be acquired is missing, it must emit a structured
`blocked_hypotheses` list and treat this as a FAIL — not silently
degrade to N/A.
- **REQ-DATA-005:** The `review-design` skill must include data
acquisition completeness as a reviewable dimension at sufficient weight
to influence the verdict (not L-weight), checking that every hypothesis
has a data source, every external source has an acquisition step, and
every gitignored path has a generation/download step.
- **REQ-DATA-006:** The `review-research-pr` skill must include a
`data-scope` review dimension that checks whether the experiment's data
coverage matches the research task directive and flags when all
benchmarks used only synthetic data for a domain-specific project.
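A single manifest entry covering these requirements might look like the sketch below. Field names follow the state lifecycle diagram (`hypothesis`, `source_type`, `acquisition`, `location`, `verification`, `depends_on`); the concrete plan format and the example values are assumptions:

```yaml
data_manifest:
  - hypothesis: [H1, H3]
    source_type: external        # synthetic | fixture | external | gitignored
    acquisition: download        # generate | download | copy
    location: data/benchmarks/domain_corpus/
    verification: "directory exists and is non-empty before run-experiment"
    depends_on: []
```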

### REPORT — Write-Report Data Scope Guardrails

- **REQ-REPORT-001:** The `write-report` skill must include a mandatory
Data Scope Statement in the Executive Summary that explicitly states
what data types were used for all benchmarks and whether domain target
data was present, absent, or partial.
- **REQ-REPORT-002:** The `write-report` skill must perform a Metrics
Provenance Check before including any `*_metrics.json` files: verify
they were generated during the current experiment. If stale or
unrelated, disclose and omit with explanation rather than silently
dropping.
- **REQ-REPORT-003:** The `write-report` skill must enforce
pre-specified hypothesis gate thresholds: when a gate is not met, the
report must state this as a failure, and GO recommendations must
reference the specific gate that was met rather than silently
substituting a different threshold.

### REVAL — Post-Review Re-Validation Loop

- **REQ-REVAL-001:** The `resolve-research-review` skill must emit a
structured output token (`needs_rerun = true/false`) indicating whether
any `rerun_required` escalations exist, so the recipe can capture and
route on it.
- **REQ-REVAL-002:** The `research.yaml` recipe must include a routing
step after `resolve_research_review` that checks for `rerun_required`
escalations and routes to a `re_run_experiment` step when present.
- **REQ-REVAL-003:** The `re_run_experiment` step must perform a
targeted re-run of affected benchmarks/analyses (not a full experiment
replay) using the same data and scripts, then flow to `re_write_report`
→ `re_push_research`.
- **REQ-REVAL-004:** When only `design_flaw` escalations exist (no
`rerun_required`), the recipe must annotate the PR body with the
escalation details and continue to push.
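Taken together, REQ-REVAL-001/002 describe a capture-then-route pattern. A minimal sketch of the routing step in `research.yaml` (the step names, `action: route`, and `context.needs_rerun` come from the requirements and diagrams; the `routes`/`when`/`goto`/`default` field names are assumptions about the recipe schema):

```yaml
- name: check_escalations
  action: route
  routes:
    - when: "context.needs_rerun == true"  # any rerun_required escalation
      goto: re_run_experiment              # then re_write_report, re_push_research
  default: re_push_research                # design_flaw only: annotate and push
```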

### ESC — Escalation Consumption

- **REQ-ESC-001:** The `research.yaml` recipe must include a
`check_escalations` step between `resolve_research_review` and
`re_push_research` that reads `escalation_records_{pr}.json` and routes
based on escalation strategy types.
- **REQ-ESC-002:** The `check_escalations` step must distinguish between
`rerun_required` escalations (route to re-validation) and
`design_flaw`-only escalations (annotate and continue).

### PACK — Exp-Lens Pack Registration

- **REQ-PACK-001:** The `research.yaml` recipe must declare
`requires_packs: [research, exp-lens]` so that all 18 exp-lens skills
are available in headless sessions during the research recipe pipeline.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    PUSH_BR([push_branch<br/>━━━━━━━━━━<br/>git push worktree])

    subgraph PRReview ["PR Review Phase"]
        direction TB
        OPEN["open_research_pr<br/>━━━━━━━━━━<br/>run_skill: open-pr"]
        GUARD{"guard_pr_url<br/>━━━━━━━━━━<br/>context.pr_url?"}
        REVIEW["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-research-pr<br/>captures: verdict"]
    end

    subgraph Resolution ["Review Resolution"]
        direction TB
        RESOLVE["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-research-review<br/>captures: needs_rerun<br/>retries: 2"]
    end

    subgraph EscalationRouting ["★ Escalation Routing (New)"]
        direction TB
        CHECK{"★ check_escalations<br/>━━━━━━━━━━<br/>action: route<br/>context.needs_rerun?"}
    end

    subgraph RevalidationLoop ["★ Re-Validation Loop (New)"]
        direction TB
        RERUN["★ re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust<br/>targeted benchmark re-run"]
        REWRITE["★ re_write_report<br/>━━━━━━━━━━<br/>write-report<br/>updated results"]
        RETEST["★ re_test<br/>━━━━━━━━━━<br/>test_check<br/>post-revalidation gate"]
    end

    REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>run_cmd: git push"]
    COMPLETE([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    PUSH_BR --> OPEN
    OPEN --> GUARD
    GUARD -->|"pr_url truthy"| REVIEW
    GUARD -->|"no pr_url"| COMPLETE
    REVIEW -->|"changes_requested"| RESOLVE
    REVIEW -->|"approved / needs_human"| COMPLETE
    RESOLVE -->|"on_success"| CHECK
    RESOLVE -->|"on_failure / exhausted"| COMPLETE
    CHECK -->|"needs_rerun == true"| RERUN
    CHECK -->|"default (false/absent)"| REPUSH
    RERUN -->|"on_success"| REWRITE
    RERUN -->|"on_failure / context_limit"| REPUSH
    REWRITE -->|"on_success"| RETEST
    REWRITE -->|"on_failure / context_limit"| REPUSH
    RETEST -->|"pass or fail"| REPUSH
    REPUSH --> COMPLETE

    class PUSH_BR,COMPLETE terminal;
    class GUARD,CHECK stateNode;
    class OPEN,REVIEW,RESOLVE handler;
    class RERUN,REWRITE,RETEST newComponent;
    class REPUSH phase;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph Manifest ["★ Data Manifest Contract (INIT_ONLY)"]
        direction TB
        DM["★ data_manifest<br/>━━━━━━━━━━<br/>hypothesis[], source_type,<br/>acquisition, location,<br/>verification, depends_on"]
        V9{"★ V9 Gate<br/>━━━━━━━━━━<br/>Every hypothesis has source?<br/>External has acquisition?<br/>Gitignored has generation?"}
    end

    subgraph DesignGate ["★ Design Review Gate"]
        direction TB
        DAQ{"★ data_acquisition L4<br/>━━━━━━━━━━<br/>Hypothesis coverage?<br/>External readiness?<br/>Directive compliance?"}
    end

    subgraph PreFlight ["★ Run-Experiment Pre-Flight"]
        direction TB
        PF{"★ Data Manifest<br/>Verification<br/>━━━━━━━━━━<br/>location exists?<br/>acquisition succeeds?"}
        BH["★ blocked_hypotheses<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>H5: missing at path"]
    end

    subgraph ReportGates ["★ Write-Report Validation Gates"]
        direction TB
        DSS["★ Data Scope Statement<br/>━━━━━━━━━━<br/>Mandatory in Executive Summary<br/>data types + domain coverage"]
        MPC["★ Metrics Provenance<br/>━━━━━━━━━━<br/>timestamp + relevance check<br/>disclose, never silently drop"]
        GE["★ Gate Enforcement<br/>━━━━━━━━━━<br/>pre-specified thresholds only<br/>no silent substitution"]
    end

    subgraph ReviewGate ["★ PR Review Gate"]
        direction TB
        DSCOPE["★ data-scope dimension<br/>━━━━━━━━━━<br/>Scope coverage?<br/>Claims qualified?<br/>Statement present?"]
    end

    subgraph EscalationState ["● Resolve Output Contract"]
        direction TB
        ESC["escalation_records<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>strategy: rerun_required<br/>strategy: design_flaw"]
        NR["● needs_rerun<br/>━━━━━━━━━━<br/>DERIVED from escalations<br/>any rerun_required → true<br/>else → false"]
    end

    DM -->|"writes"| V9
    V9 -->|"PASS: plan saved"| DAQ
    V9 -->|"FAIL: plan rejected"| FAIL_PLAN([Plan Rejected])

    DAQ -->|"GO: proceed"| PF
    DAQ -->|"STOP: hypothesis has no source"| REVISE([Revise Plan])
    DAQ -->|"REVISE: missing verification"| REVISE

    PF -->|"ALL READY"| DSS
    PF -->|"BLOCKED: data missing"| BH
    BH --> FAIL_RUN([Status: FAILED])

    DM -.->|"reads manifest"| PF
    DM -.->|"reads manifest"| DSS
    DM -.->|"reads manifest"| DSCOPE

    DSS --> MPC
    MPC --> GE
    GE -->|"report committed"| DSCOPE

    DSCOPE -->|"findings"| ESC
    ESC -->|"derive"| NR
    NR -->|"true → re-validate"| RERUN([Re-Validation Loop])
    NR -->|"false → push"| PUSH([Direct Push])

    class DM detector;
    class V9,DAQ,PF stateNode;
    class BH,ESC handler;
    class DSS,MPC,GE newComponent;
    class DSCOPE newComponent;
    class NR phase;
    class FAIL_PLAN,FAIL_RUN,REVISE gap;
    class RERUN,PUSH cli;
```

Closes #618

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-074034-301298/.autoskillit/temp/make-plan/research_recipe_data_provenance_plan_2026-04-05_074500_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 587 | 30.7k | 1.2M | 112.6k | 1 | 13m 29s |
| verify | 73 | 35.9k | 3.7M | 137.0k | 2 | 11m 23s |
| implement | 2.1k | 36.2k | 5.9M | 155.2k | 2 | 17m 4s |
| fix | 50 | 13.2k | 2.1M | 64.5k | 1 | 10m 53s |
| audit_impl | 28 | 17.3k | 786.1k | 51.7k | 1 | 5m 55s |
| open_pr | 23 | 17.1k | 736.1k | 58.6k | 1 | 8m 12s |
| **Total** | 2.9k | 150.3k | 14.5M | 579.5k | | 1h 6m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

When a headless session spawns background agents via Claude Code's
`Agent` tool with `run_in_background: true`, Claude Code defers the
`type=result` NDJSON record until all background agents finish. If
autoskillit kills the process tree after Channel B confirms completion,
the deferred `type=result` is never flushed to stdout.
`parse_session_result` classifies the output as `UNPARSEABLE`, which
gates out all recovery paths and Channel B bypass — producing a false
failure for sessions that completed successfully.

The fix adds a **pre-gate Channel B drain-race recovery** in
`_build_skill_result` that runs *before* the `session.session_complete`
gate. When Channel B confirmed completion but the session is
UNPARSEABLE/EMPTY_OUTPUT, it reconstructs the result from
`assistant_messages` (which are written to stdout BEFORE the deferred
`type=result`) and promotes the session to SUCCESS, unlocking all
downstream recovery paths and Channel B bypass naturally.
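
The pre-gate recovery described above can be sketched roughly as follows. This is a hedged illustration, not the real implementation: `RECOVERABLE_SUBTYPES` and the flat dict shape are simplified assumptions standing in for the actual session objects that `_build_skill_result` operates on.

```python
# Sketch of the pre-gate Channel B drain-race recovery. Field names and
# the dict shape are simplified assumptions for illustration only.
RECOVERABLE_SUBTYPES = {"unparseable", "empty_output"}

def recover_drain_race(session):
    """Promote a Channel-B-confirmed session whose deferred type=result
    record was never flushed, reconstructing output from the assistant
    messages that did reach stdout before the kill."""
    if session.get("channel") != "CHANNEL_B":
        return session                  # wrong channel: pass through unchanged
    if session.get("subtype") not in RECOVERABLE_SUBTYPES:
        return session                  # genuine failure subtypes untouched
    marker = session.get("completion_marker")
    messages = session.get("assistant_messages", [])
    # The marker must appear standalone among the assistant messages,
    # and there must be substantive content beyond the marker itself.
    if not marker or not any(m.strip() == marker for m in messages):
        return session
    body = "\n".join(m for m in messages if m.strip() != marker)
    if not body.strip():
        return session
    promoted = dict(session)
    promoted.update(subtype="success", is_error=False, result=body)
    return promoted
```

Because this runs before the `session.session_complete` gate, a promoted session then flows through the ordinary recovery chain and Channel B bypass with no further special-casing.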

## Architecture Impact

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>Entry with SubprocessResult"])

    subgraph PreGate ["● PRE-GATE: Channel B Drain-Race Recovery"]
        direction TB
        CB_CHECK{"● Channel B?<br/>+ subtype in<br/>RECOVERABLE_SUBTYPES?<br/>+ completion_marker?"}
        CB_RECOVER["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Reconstruct result from<br/>assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_SKIP["No recovery needed<br/>━━━━━━━━━━<br/>Pass through unchanged"]
    end

    subgraph CompletionGate ["session.session_complete Gate"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype not in<br/>FAILURE_SUBTYPES"}
        MARKER_RECOVER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Marker-based recovery"]
        PATTERN_RECOVER["_recover_block_from_assistant_messages<br/>━━━━━━━━━━<br/>Pattern-based recovery"]
        SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only"]
        SKIP_RECOVERY["Skip all recovery<br/>━━━━━━━━━━<br/>TIMEOUT / genuine failure"]
    end

    subgraph Outcome ["● _compute_outcome"]
        direction TB
        CB_BYPASS{"● Channel B<br/>bypass in<br/>_compute_success?"}
        CONTENT_CHECK["_check_session_content<br/>━━━━━━━━━━<br/>6-gate validation"]
        DEAD_END{"Dead-end guard<br/>━━━━━━━━━━<br/>ABSENT → DRAIN_RACE<br/>CONTRACT_VIOLATION → FAIL"}
    end

    subgraph PostOutcome ["Post-Outcome Gates"]
        direction TB
        BUDGET["_apply_budget_guard<br/>━━━━━━━━━━<br/>Max consecutive retries"]
        CONTRACT["CONTRACT_RECOVERY gate<br/>━━━━━━━━━━<br/>adjudicated_failure +<br/>write evidence"]
        ZERO_WRITE["Zero-write gate<br/>━━━━━━━━━━<br/>Expected writes missing"]
    end

    subgraph Terminals ["TERMINAL STATES"]
        T_SUCCESS([SUCCEEDED])
        T_RETRY([RETRIABLE<br/>DRAIN_RACE / RESUME /<br/>CONTRACT_RECOVERY])
        T_FAIL([FAILED])
        T_BUDGET([BUDGET_EXHAUSTED])
    end

    START --> CB_CHECK
    CB_CHECK -->|"Yes: CHANNEL_B +<br/>UNPARSEABLE or EMPTY_OUTPUT"| CB_RECOVER
    CB_CHECK -->|"No: other channel<br/>or non-recoverable subtype"| CB_SKIP
    CB_RECOVER -->|"Recovery succeeds:<br/>marker standalone +<br/>substantive content"| CB_PROMOTE
    CB_RECOVER -->|"Recovery fails:<br/>no marker in messages"| CB_SKIP
    CB_PROMOTE --> GATE
    CB_SKIP --> GATE

    GATE -->|"True: session promoted<br/>or originally complete"| MARKER_RECOVER
    GATE -->|"False: TIMEOUT /<br/>unrecoverable subtype"| SKIP_RECOVERY
    MARKER_RECOVER --> PATTERN_RECOVER
    PATTERN_RECOVER --> SYNTH
    SYNTH --> CB_BYPASS
    SKIP_RECOVERY --> CB_BYPASS

    CB_BYPASS -->|"CHANNEL_B + session_complete<br/>+ patterns pass"| T_SUCCESS
    CB_BYPASS -->|"No bypass: falls to<br/>termination dispatch"| CONTENT_CHECK
    CONTENT_CHECK -->|"All 6 gates pass"| T_SUCCESS
    CONTENT_CHECK -->|"Any gate fails"| DEAD_END
    DEAD_END -->|"ABSENT + channel confirmed"| T_RETRY
    DEAD_END -->|"CONTRACT_VIOLATION /<br/>SESSION_ERROR"| T_FAIL

    T_RETRY --> BUDGET
    BUDGET -->|"Under limit"| CONTRACT
    BUDGET -->|"Exceeded"| T_BUDGET
    CONTRACT -->|"adjudicated_failure +<br/>writes ≥ 1"| T_RETRY
    CONTRACT -->|"No match"| ZERO_WRITE
    ZERO_WRITE -->|"Expected writes missing"| T_RETRY
    ZERO_WRITE -->|"No issue"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class CB_CHECK,GATE,CB_BYPASS,DEAD_END stateNode;
    class CB_RECOVER,CB_PROMOTE newComponent;
    class CB_SKIP,SKIP_RECOVERY gap;
    class MARKER_RECOVER,PATTERN_RECOVER,SYNTH handler;
    class CONTENT_CHECK phase;
    class BUDGET,CONTRACT,ZERO_WRITE detector;
    class T_SUCCESS,T_RETRY,T_FAIL,T_BUDGET terminal;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>SubprocessResult input"])

    subgraph EarlyExit ["Phase 1: Early Exit Interception"]
        direction TB
        TERM_CHECK{"termination<br/>reason?"}
        STALE_PATH["STALE handler<br/>━━━━━━━━━━<br/>Attempt stdout recovery<br/>then retry or fail"]
        TIMEOUT_PATH["TIMEOUT handler<br/>━━━━━━━━━━<br/>Override subtype=TIMEOUT<br/>is_error=True"]
    end

    PARSE["parse_session_result<br/>━━━━━━━━━━<br/>NDJSON → ClaudeSessionResult<br/>extracts assistant_messages"]

    subgraph DrainRace ["● Phase 2: Channel B Drain-Race Recovery"]
        direction TB
        CB_MATCH{"● match channel<br/>━━━━━━━━━━<br/>CHANNEL_B +<br/>UNPARSEABLE/EMPTY_OUTPUT<br/>+ completion_marker?"}
        CB_RECON["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Check marker standalone<br/>in assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_NONE["No drain-race<br/>━━━━━━━━━━<br/>Session unchanged"]
    end

    subgraph GatedRecovery ["Phase 3: Completion-Gated Recovery"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype ∉ FAILURE_SUBTYPES"}
        REC_MARKER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Join assistant_messages<br/>when marker is standalone"]
        REC_PATTERN["_recover_block_from_assistant<br/>━━━━━━━━━━<br/>Patterns in messages<br/>not in result"]
        REC_SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only:<br/>inject write paths"]
        GATE_SKIP["Skip recovery<br/>━━━━━━━━━━<br/>Incomplete session"]
    end

    subgraph ComputeOutcome ["● Phase 4: Outcome Adjudication"]
        direction TB
        COMPUTE["● _compute_outcome<br/>━━━━━━━━━━<br/>_compute_success +<br/>_compute_retry"]
        SUCCESS_CHECK{"● success?"}
        RETRY_CHECK{"needs_retry?"}
    end

    subgraph PostGates ["Phase 5: Post-Outcome Gates"]
        direction TB
        BUDGET_G["_apply_budget_guard<br/>━━━━━━━━━━<br/>consecutive_failures ><br/>max_retries?"]
        CONTRACT_G{"CONTRACT_RECOVERY?<br/>━━━━━━━━━━<br/>adjudicated_failure<br/>+ write_count ≥ 1"}
        ZERO_G{"zero_write_gate?<br/>━━━━━━━━━━<br/>success but no<br/>Write/Edit calls"}
    end

    T_SUCCESS([SUCCEEDED])
    T_RETRY([RETRIABLE])
    T_FAIL([FAILED])

    %% FLOW %%
    START --> TERM_CHECK
    TERM_CHECK -->|"STALE"| STALE_PATH
    TERM_CHECK -->|"TIMED_OUT"| TIMEOUT_PATH
    TERM_CHECK -->|"COMPLETED /<br/>NATURAL_EXIT"| PARSE
    STALE_PATH --> T_RETRY
    TIMEOUT_PATH --> PARSE

    PARSE --> CB_MATCH
    CB_MATCH -->|"Yes: all 3 guards pass"| CB_RECON
    CB_MATCH -->|"No: wrong channel /<br/>wrong subtype / no marker"| CB_NONE
    CB_RECON -->|"Marker found standalone<br/>+ substantive content"| CB_PROMOTE
    CB_RECON -->|"No marker or<br/>empty content"| CB_NONE
    CB_PROMOTE --> GATE
    CB_NONE --> GATE

    GATE -->|"True: complete session"| REC_MARKER
    GATE -->|"False: incomplete"| GATE_SKIP
    REC_MARKER --> REC_PATTERN
    REC_PATTERN --> REC_SYNTH
    REC_SYNTH --> COMPUTE
    GATE_SKIP --> COMPUTE

    COMPUTE --> SUCCESS_CHECK
    SUCCESS_CHECK -->|"True"| ZERO_G
    SUCCESS_CHECK -->|"False"| RETRY_CHECK
    RETRY_CHECK -->|"True"| BUDGET_G
    RETRY_CHECK -->|"False"| CONTRACT_G

    BUDGET_G -->|"Under limit"| T_RETRY
    BUDGET_G -->|"Exhausted"| T_FAIL
    CONTRACT_G -->|"Yes: promote to retry"| BUDGET_G
    CONTRACT_G -->|"No"| T_FAIL
    ZERO_G -->|"Writes expected<br/>but count = 0"| T_RETRY
    ZERO_G -->|"OK"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START,T_SUCCESS,T_RETRY,T_FAIL terminal;
    class TERM_CHECK,CB_MATCH,GATE,SUCCESS_CHECK,RETRY_CHECK stateNode;
    class STALE_PATH,TIMEOUT_PATH,PARSE handler;
    class CB_RECON,CB_PROMOTE newComponent;
    class CB_NONE,GATE_SKIP gap;
    class REC_MARKER,REC_PATTERN,REC_SYNTH handler;
    class COMPUTE phase;
    class BUDGET_G,CONTRACT_G,ZERO_G detector;
```

Closes #619

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-619-20260405-085642-620214/.autoskillit/temp/make-plan/channel_b_drain_race_recovery_plan_2026-04-05_090230.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 42 | 18.8k | 1.6M | 80.7k | 1 | 9m 8s |
| verify | 17 | 17.4k | 687.5k | 79.7k | 1 | 6m 55s |
| implement | 77 | 28.2k | 4.4M | 89.7k | 1 | 15m 40s |
| audit_impl | 14 | 8.9k | 348.9k | 43.4k | 1 | 3m 4s |
| open_pr | 3.0k | 17.7k | 865.3k | 63.1k | 1 | 7m 30s |
| **Total** | 3.1k | 91.0k | 8.0M | 356.6k | | 42m 19s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ulator FakeClaudeCLI (#624)

## Summary

Add 10 end-to-end tests in a new file
`tests/execution/test_session_classification_e2e.py` that exercise the
full session failure classification pipeline — from raw NDJSON
subprocess output produced by api-simulator's `fake_claude` fixture
through `parse_session_result()` and `_build_skill_result()` to final
`SkillResult` classification. Today all headless tests use
`MockSubprocessRunner` with pre-constructed `SubprocessResult` objects;
the NDJSON parsing and classification logic is never exercised against
realistic subprocess output. These tests close that gap across four groups:
NDJSON stream robustness (4 tests), context exhaustion edge cases (2
tests), kill boundary scenarios (2 tests), and process behavior
simulation (2 tests).

No production code changes are required. The `api-simulator` dev
dependency was added by #607.

## Requirements

### BRIDGE — Integration Bridge

- **REQ-BRIDGE-001:** Tests must use `fake_claude.run()` to produce real
subprocess output, not hand-constructed strings.
- **REQ-BRIDGE-002:** Tests must feed `proc.stdout` through
`parse_session_result()` from `autoskillit.execution.session`.
- **REQ-BRIDGE-003:** Tests must wrap the parsed result in a
`SubprocessResult` and pass it to `_build_skill_result()` for full
classification.

### PARSE — NDJSON Parse Robustness

- **REQ-PARSE-001:** The parser must correctly skip `type=system` /
`api_retry` records and still extract the final `type=result` record.
- **REQ-PARSE-002:** The parser must handle non-JSON lines (stream
corruption) gracefully without losing valid records.
- **REQ-PARSE-003:** When multiple `type=result` records appear, the
last one must determine classification.
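
A minimal standalone sketch of the parse behavior these requirements assert (not the actual `parse_session_result` implementation, which extracts far more state):

```python
import json

def last_result_record(stdout):
    """Scan NDJSON output line by line, skipping corrupt lines and
    non-result record types (system, api_retry, assistant, ...);
    the last type=result record wins."""
    result = None
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue             # stream corruption: skip, keep scanning
        if record.get("type") == "result":
            result = record      # a later result record overrides earlier ones
    return result

stream = "\n".join([
    '{"type": "system", "subtype": "api_retry"}',
    "%%% corrupt line %%%",
    '{"type": "result", "subtype": "error", "is_error": true}',
    '{"type": "result", "subtype": "success", "is_error": false}',
])
```

Feeding `stream` through `last_result_record` yields the final success record, exercising REQ-PARSE-001 through REQ-PARSE-003 in one pass.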

### CTX — Context Exhaustion

- **REQ-CTX-001:** A flat assistant record containing the context
exhaustion marker with no `type=result` record must classify as
`context_exhaustion` with `needs_retry=True`.
- **REQ-CTX-002:** A `type=result` record with `is_error=True` and
`errors` containing the marker must classify as retriable with
`retry_reason=RESUME`.
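
The two detection paths can be sketched standalone as below; the marker string and record shapes are illustrative assumptions, not the CLI's actual wire format:

```python
# Illustrative marker; the real CLI marker string may differ.
CTX_MARKER = "context window exhausted"

def classify_context_exhaustion(records):
    """REQ-CTX-001/002 sketch: detect exhaustion either from a flat
    assistant record carrying the marker with no result record, or from
    an is_error result whose errors[] carry the marker."""
    result = next((r for r in records if r.get("type") == "result"), None)
    if result is None:
        # Flat assistant path: no type=result record ever arrived.
        flat = any(
            r.get("type") == "assistant" and CTX_MARKER in r.get("text", "")
            for r in records
        )
        if flat:
            return {"subtype": "context_exhaustion", "needs_retry": True}
        return None
    # Error-result path: marker surfaced inside the errors list.
    if result.get("is_error") and any(
        CTX_MARKER in e for e in result.get("errors", [])
    ):
        return {"needs_retry": True, "retry_reason": "RESUME"}
    return None
```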

### KILL — Kill Boundary

- **REQ-KILL-001:** A truncated stream (via `truncate_after`) must
produce `subtype=unparseable` or partial classification with nonzero
exit code.
- **REQ-KILL-002:** An `interrupted` subtype with nonzero exit code must
result in `needs_retry=False` (gated by returncode).

### PROC — Process Behavior

- **REQ-PROC-001:** The hang-after-result scenario must verify that the
result record was emitted to stdout before the process hung.
- **REQ-PROC-002:** Mid-stream exit via `inject_exit` must produce the
correct exit code and truncated stdout.

### COMPAT — Compatibility

- **REQ-COMPAT-001:** Existing `test_headless.py` and `test_session.py`
tests must remain unchanged and passing.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([FakeClaudeCLI<br/>━━━━━━━━━━<br/>api-simulator fixture])

    subgraph Bridge ["★ E2E Test Bridge (new)"]
        direction TB
        RUN["★ fake_claude.run()<br/>━━━━━━━━━━<br/>CompletedProcess<br/>with real NDJSON stdout"]
        WRAP["★ _classify() / inline<br/>━━━━━━━━━━<br/>Wrap in SubprocessResult<br/>pid=0, caller termination"]
    end

    subgraph Parse ["parse_session_result()"]
        direction TB
        SCAN{"stdout empty?"}
        LOOP["Scan NDJSON lines<br/>━━━━━━━━━━<br/>JSON decode; skip errors<br/>last type=result wins"]
        CTX_FLAG{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?"}
        RESULT_FOUND{"result record<br/>found?"}
    end

    subgraph Classify ["_compute_outcome()"]
        direction TB
        SUCCESS_GATE{"_compute_success<br/>━━━━━━━━━━<br/>returncode=0?<br/>is_error? result?"}
        RETRY_GATE{"_compute_retry<br/>━━━━━━━━━━<br/>session.needs_retry?<br/>kill anomaly?"}
        CONTRA{"contradiction<br/>success+retry?"}
        DEADEND{"dead-end<br/>failed+confirmed<br/>+ABSENT?"}
    end

    subgraph Normalize ["_normalize_subtype()"]
        NORM["Map raw CLI subtype<br/>━━━━━━━━━━<br/>to final string label"]
    end

    subgraph Gates ["Post-Classification Gates"]
        BUDGET{"budget<br/>exhausted?"}
        ZERO{"zero writes<br/>when expected?"}
    end

    subgraph Outcomes ["SkillResult"]
        direction LR
        OK([success])
        CTX([context_exhaustion<br/>needs_retry=True])
        EMPTY([empty_output /<br/>unparseable])
        INTR([interrupted<br/>needs_retry=False])
        FAIL([failure<br/>terminal])
    end

    START --> RUN
    RUN --> WRAP
    WRAP --> SCAN
    SCAN -->|"empty"| EMPTY
    SCAN -->|"non-empty"| LOOP
    LOOP --> CTX_FLAG
    CTX_FLAG -->|"yes → jsonl_context_exhausted=True"| RESULT_FOUND
    CTX_FLAG -->|"no"| RESULT_FOUND
    RESULT_FOUND -->|"yes"| SUCCESS_GATE
    RESULT_FOUND -->|"no → UNPARSEABLE / CTX_EXHAUSTION"| RETRY_GATE
    SUCCESS_GATE --> RETRY_GATE
    RETRY_GATE --> CONTRA
    CONTRA -->|"demote success"| DEADEND
    CONTRA -->|"consistent"| DEADEND
    DEADEND -->|"DRAIN_RACE"| NORM
    DEADEND -->|"terminal"| NORM
    NORM --> BUDGET
    BUDGET -->|"BUDGET_EXHAUSTED"| FAIL
    BUDGET -->|"ok"| ZERO
    ZERO -->|"zero_writes"| CTX
    ZERO -->|"ok"| OK
    SUCCESS_GATE -->|"returncode!=0"| INTR

    class START terminal;
    class RUN,WRAP newComponent;
    class LOOP handler;
    class SCAN,CTX_FLAG,RESULT_FOUND stateNode;
    class SUCCESS_GATE,RETRY_GATE,CONTRA,DEADEND phase;
    class NORM handler;
    class BUDGET,ZERO detector;
    class OK,CTX,EMPTY,INTR,FAIL terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start (FakeClaudeCLI), final SkillResult outcomes |
| Green | New Component | ★ `_classify()` bridge helper and `fake_claude.run()` — new test code |
| Orange | Handler | NDJSON scan/accumulation and subtype normalization |
| Teal | State | Decision points: empty check, context flag, result found |
| Purple | Phase | Outcome computation gates (success, retry, contradiction, dead-end) |
| Red | Detector | Post-classification guards (budget, zero-write) |

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([★ E2E Test Suite<br/>━━━━━━━━━━<br/>10 failure scenarios<br/>via FakeClaudeCLI])

    subgraph ParseGates ["NDJSON Parse Resilience Gates"]
        direction TB
        EMPTY_CHECK{"stdout<br/>empty?"}
        JSON_ERR["Corrupt / non-JSON lines<br/>━━━━━━━━━━<br/>silently skipped<br/>(test 2: corrupt_stream)"]
        API_RETRY["api_retry records<br/>━━━━━━━━━━<br/>skipped — not type=result<br/>(test 1: inject_api_retry)"]
        LAST_WINS["Multiple result records<br/>━━━━━━━━━━<br/>last record wins<br/>(test 3: two results)"]
        EXHAUST["Exhausted retries<br/>━━━━━━━━━━<br/>no result record emitted<br/>(test 4: exhaust=True)"]
    end

    subgraph CtxDetect ["Context Exhaustion Detection"]
        direction TB
        FLAT_DETECT{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?<br/>(test 5)"}
        ERR_DETECT{"is_error=True AND<br/>marker in errors[]?<br/>(test 6)"}
        CTX_FLAG["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>race-resilient flag"]
    end

    subgraph KillGates ["Kill Boundary Gates"]
        direction TB
        RC_CHECK{"returncode != 0?"}
        KILL_ANOM{"_is_kill_anomaly?<br/>━━━━━━━━━━<br/>UNPARSEABLE /<br/>EMPTY_OUTPUT /<br/>INTERRUPTED"}
        INTR_GATE{"subtype=interrupted<br/>+ rc != 0?<br/>(test 8)"}
    end

    subgraph PostGates ["Post-Classification Guards"]
        BUDGET{"consecutive failures<br/>> budget max?"}
        ZERO_WRITE{"success AND<br/>write_count=0<br/>AND write expected?"}
    end

    T_SUCCESS([success<br/>━━━━━━━━━━<br/>needs_retry=False])
    T_CTX([context_exhaustion<br/>━━━━━━━━━━<br/>needs_retry=True, RESUME])
    T_EMPTY([empty_output / unparseable<br/>━━━━━━━━━━<br/>needs_retry=True via RESUME])
    T_INTR([interrupted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_BUDGET([budget_exhausted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_ZERO([zero_writes<br/>━━━━━━━━━━<br/>needs_retry=True])

    START --> EMPTY_CHECK
    EMPTY_CHECK -->|"empty stdout"| T_EMPTY
    EMPTY_CHECK -->|"has content"| JSON_ERR
    JSON_ERR -->|"skip bad lines, continue"| API_RETRY
    API_RETRY -->|"skip, continue to result"| LAST_WINS
    LAST_WINS -->|"no result"| EXHAUST
    EXHAUST -->|"empty_output / unparseable"| T_EMPTY
    LAST_WINS -->|"result found"| FLAT_DETECT
    FLAT_DETECT -->|"yes"| CTX_FLAG
    FLAT_DETECT -->|"no"| ERR_DETECT
    ERR_DETECT -->|"yes"| CTX_FLAG
    CTX_FLAG -->|"needs_retry=True"| T_CTX
    ERR_DETECT -->|"no"| RC_CHECK
    RC_CHECK -->|"nonzero (test 7,8,10)"| INTR_GATE
    INTR_GATE -->|"yes → no retry"| T_INTR
    INTR_GATE -->|"no"| T_EMPTY
    RC_CHECK -->|"zero"| KILL_ANOM
    KILL_ANOM -->|"anomaly → RESUME retry"| T_EMPTY
    KILL_ANOM -->|"no anomaly"| BUDGET
    BUDGET -->|"exceeded"| T_BUDGET
    BUDGET -->|"ok"| ZERO_WRITE
    ZERO_WRITE -->|"violation"| T_ZERO
    ZERO_WRITE -->|"ok"| T_SUCCESS

    class START newComponent;
    class EMPTY_CHECK,FLAT_DETECT,ERR_DETECT,RC_CHECK,KILL_ANOM,INTR_GATE stateNode;
    class JSON_ERR,API_RETRY,LAST_WINS,EXHAUST,CTX_FLAG handler;
    class BUDGET,ZERO_WRITE detector;
    class T_SUCCESS,T_CTX,T_EMPTY,T_INTR,T_BUDGET,T_ZERO terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite (new) — exercises all failure paths |
| Teal | Decision Gates | Key detection and routing decisions |
| Orange | Handler | Parse resilience processing and flag setting |
| Red | Guard | Post-classification safety guards (budget, zero-write) |
| Dark Blue | Terminal | Final SkillResult outcome states |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    TEST["★ test_session_classification_e2e.py<br/>━━━━━━━━━━<br/>10 scenarios assert field contracts<br/>across all classification paths"]

    subgraph ParseState ["INIT_ONLY — Set by Parser, Never Overwritten"]
        direction LR
        CTX_EX["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>flat assistant → True<br/>read by _is_context_exhausted()"]
        RC["returncode / termination<br/>━━━━━━━━━━<br/>from SubprocessResult<br/>used in all compute_* gates"]
        SID["session_id<br/>━━━━━━━━━━<br/>from result record<br/>passed through unchanged"]
    end

    subgraph DerivedState ["DERIVED — Computed, Not Stored During Parse"]
        direction TB
        SUCCESS_D["success<br/>━━━━━━━━━━<br/>returncode=0 AND content gates<br/>must be False if needs_retry=True"]
        RETRY_D["needs_retry + retry_reason<br/>━━━━━━━━━━<br/>RESUME / ZERO_WRITES / etc.<br/>only valid pair if needs_retry=True"]
        SUBTYPE_D["subtype (normalized)<br/>━━━━━━━━━━<br/>'success' / 'context_exhaustion'<br/>/ 'interrupted' / etc."]
    end

    subgraph Contracts ["CONTRACT ENFORCEMENT GATES"]
        direction TB
        CONTRA_GATE{"Contradiction Guard<br/>━━━━━━━━━━<br/>success=True AND<br/>needs_retry=True?"}
        INTR_GATE{"Interrupted Gate<br/>━━━━━━━━━━<br/>subtype=interrupted AND<br/>rc != 0?"}
        CTX_GATE{"Context Exhaustion<br/>━━━━━━━━━━<br/>jsonl_context_exhausted OR<br/>marker in errors[]?"}
        BUDGET_GATE{"Budget Guard<br/>━━━━━━━━━━<br/>consecutive failures<br/>> budget max?"}
    end

    subgraph ResumeStates ["RESUME SAFETY — needs_retry contract"]
        direction LR
        RESUME_OK(["needs_retry=True<br/>retry_reason=RESUME<br/>━━━━━━━━━━<br/>context_exhaustion path"])
        NO_RETRY(["needs_retry=False<br/>retry_reason=NONE<br/>━━━━━━━━━━<br/>interrupted + rc!=0 path"])
        BUDGET_STOP(["needs_retry=False<br/>retry_reason=BUDGET_EXHAUSTED<br/>━━━━━━━━━━<br/>terminal, no more retries"])
    end

    TEST -->|"asserts all contracts"| CTX_EX
    TEST --> RC
    TEST --> SID

    CTX_EX -->|"read by"| CTX_GATE
    RC -->|"read by"| INTR_GATE
    RC -->|"read by"| CONTRA_GATE

    CTX_GATE -->|"exhausted → needs_retry=True"| RETRY_D
    CTX_GATE -->|"not exhausted"| INTR_GATE
    INTR_GATE -->|"interrupted+rc!=0 → terminal"| NO_RETRY
    INTR_GATE -->|"other"| CONTRA_GATE
    CONTRA_GATE -->|"contradiction → demote success"| SUCCESS_D
    CONTRA_GATE -->|"consistent"| SUCCESS_D
    RETRY_D --> BUDGET_GATE
    SUCCESS_D --> BUDGET_GATE
    SUBTYPE_D --> BUDGET_GATE
    BUDGET_GATE -->|"exceeded → clamp"| BUDGET_STOP
    BUDGET_GATE -->|"within budget"| RESUME_OK

    class TEST newComponent;
    class CTX_EX,RC,SID detector;
    class SUCCESS_D,RETRY_D,SUBTYPE_D phase;
    class CTX_GATE,INTR_GATE,CONTRA_GATE,BUDGET_GATE stateNode;
    class RESUME_OK,NO_RETRY,BUDGET_STOP cli;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite — asserts all field contracts |
| Red | INIT_ONLY | Fields set by parser, never overwritten |
| Purple | Derived | Fields computed from classification, not stored during parse |
| Teal | Gates | Contract enforcement decision points |
| Dark Blue | Resume States | Terminal resume-safety outcomes |

Closes #608

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-608-20260405-085643-660865/.autoskillit/temp/make-plan/test_session_failure_classification_with_api_simulator_plan_2026-04-05_090300.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 31 | 22.4k | 812.6k | 59.1k | 1 | 12m 6s |
| verify | 21 | 17.2k | 863.3k | 66.7k | 1 | 9m 28s |
| implement | 2.5k | 9.4k | 1.1M | 48.2k | 1 | 5m 43s |
| fix | 21 | 7.3k | 703.0k | 42.4k | 1 | 7m 38s |
| audit_impl | 10 | 7.4k | 139.9k | 39.6k | 1 | 3m 29s |
| open_pr | 47 | 27.2k | 2.2M | 74.8k | 1 | 10m 44s |
| **Total** | 2.7k | 90.9k | 5.8M | 330.8k | | 49m 11s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rect Changes (#623)

## Summary

When `implement-worktree-no-merge` runs and the model ignores
instructions to create a worktree (via `git worktree add`), it edits
files directly in the clone directory. This leaves dirty uncommitted
changes (or direct commits) on the clone's branch. On retry, the next
session inherits a contaminated working tree.

This plan adds a **clone contamination guard** to the headless execution
pipeline. The guard:
1. Snapshots the clone's HEAD SHA before each worktree-based skill
session
2. After a failed session where no worktree was created, detects
contamination (uncommitted changes or direct commits)
3. Reverts the clone to its pre-session state
4. Logs the cleanup for pipeline observability

Key architectural insight: `EnterWorktree` does not exist in this
codebase. Worktree creation uses standard `git worktree add` via Bash,
and success is signaled by emitting `worktree_path = <path>` tokens in
assistant messages. Detection of "no worktree created" is therefore: no
`worktree_path` token in `session.assistant_messages`.

## Requirements

### Snapshot (SNAP)

- **REQ-SNAP-001:** The system must capture the clone HEAD SHA before
each `run_skill` invocation for worktree-based skills
(implement-worktree-no-merge, retry-worktree).
- **REQ-SNAP-002:** The system must capture the clone working tree
cleanliness state (clean/dirty) before each `run_skill` invocation for
worktree-based skills.

### Detection (DET)

- **REQ-DET-001:** The system must detect uncommitted changes in the
clone CWD after a worktree-based skill session that was adjudicated as
failure.
- **REQ-DET-002:** The system must detect direct commits in the clone
(HEAD differs from pre-session SHA) after a worktree-based skill session
that was adjudicated as failure.
- **REQ-DET-003:** The system must verify whether a worktree was
created during the session by checking for a `worktree_path` token in
the session result.

### Revert (REV)

- **REQ-REV-001:** The system must revert uncommitted changes in the
clone when contamination is detected (git checkout + git clean).
- **REQ-REV-002:** The system must revert direct commits in the clone
when contamination is detected (git reset to pre-session SHA).
- **REQ-REV-003:** The revert must only execute when all three
conditions are met: worktree-based skill, adjudicated failure, and no
`worktree_path` set in the session result.

### Observability (OBS)

- **REQ-OBS-001:** The system must log all contamination detection and
revert actions in the audit log with sufficient detail for pipeline
visibility.
- **REQ-OBS-002:** The audit log entry must include the pre-session SHA,
post-session SHA, list of contaminated files, and revert action taken.
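
A minimal sketch of the REQ-OBS-002 payload, assuming a dataclass-shaped audit record; the field and subtype names are hypothetical, not the actual audit schema:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CloneContaminationRecord:
    pre_session_sha: str
    post_session_sha: str
    contaminated_files: list[str]
    revert_action: str  # e.g. "reset+clean" when HEAD moved, "clean-only" otherwise

def audit_payload(record: CloneContaminationRecord) -> dict:
    # Shape destined for the audit log entry (REQ-OBS-001/002).
    return {"subtype": "clone_contamination", **asdict(record)}
```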

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START(["● run_headless_core()"])

    subgraph PreSession ["★ Pre-Session Snapshot"]
        direction TB
        IS_WT{"★ is_worktree_skill?<br/>━━━━━━━━━━<br/>implement-worktree-no-merge<br/>or retry-worktree in cmd"}
        IS_CLONE{"★ not is_git_worktree?<br/>━━━━━━━━━━<br/>cwd is clone root,<br/>not a worktree"}
        SNAP["★ snapshot_clone_state()<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>→ CloneSnapshot(head_sha)"]
    end

    subgraph Session ["Existing Session Lifecycle"]
        direction TB
        RUN["● runner() subprocess<br/>━━━━━━━━━━<br/>Headless Claude CLI"]
        BUILD["● _build_skill_result()<br/>━━━━━━━━━━<br/>Adjudication + gates<br/>worktree_path always extracted"]
    end

    subgraph PostGuard ["★ Post-Session Clone Guard"]
        direction TB
        CHK_SNAP{"★ snapshot captured?<br/>━━━━━━━━━━<br/>_clone_snapshot is not None"}
        CHK_SUCC{"★ skill_result.success?"}
        CHK_WT{"★ worktree_path set?<br/>━━━━━━━━━━<br/>skill_result.worktree_path<br/>is not None"}
        DETECT["★ detect_contamination()<br/>━━━━━━━━━━<br/>git rev-parse HEAD → post_sha<br/>git status --porcelain → files"]
        CHK_DIRTY{"★ contamination found?<br/>━━━━━━━━━━<br/>post_sha ≠ pre_sha<br/>OR dirty files"}
        REVERT["★ revert_contamination()<br/>━━━━━━━━━━<br/>git reset --hard pre_sha<br/>git clean -fd"]
        AUDIT["★ audit.record_failure()<br/>━━━━━━━━━━<br/>subtype=clone_contamination<br/>RetryReason.CLONE_CONTAMINATION"]
    end

    FLUSH["● flush_session_log()<br/>━━━━━━━━━━<br/>★ clone_contamination_reverted<br/>→ summary.json"]
    RETURN(["● return skill_result"])
    SKIP_SNAP(["skip → _clone_snapshot=None"])

    START --> IS_WT
    IS_WT -->|"no: not a worktree skill"| SKIP_SNAP
    IS_WT -->|"yes"| IS_CLONE
    IS_CLONE -->|"already a worktree CWD"| SKIP_SNAP
    IS_CLONE -->|"clone root CWD"| SNAP
    SNAP --> RUN
    SKIP_SNAP --> RUN
    RUN --> BUILD
    BUILD --> CHK_SNAP
    CHK_SNAP -->|"no snapshot"| FLUSH
    CHK_SNAP -->|"snapshot exists"| CHK_SUCC
    CHK_SUCC -->|"success=True"| FLUSH
    CHK_SUCC -->|"success=False"| CHK_WT
    CHK_WT -->|"worktree created"| FLUSH
    CHK_WT -->|"no worktree"| DETECT
    DETECT --> CHK_DIRTY
    CHK_DIRTY -->|"clean"| FLUSH
    CHK_DIRTY -->|"contaminated"| REVERT
    REVERT --> AUDIT
    AUDIT --> FLUSH
    FLUSH --> RETURN

    class START,RETURN,SKIP_SNAP terminal;
    class IS_WT,IS_CLONE,CHK_SNAP,CHK_SUCC,CHK_WT,CHK_DIRTY stateNode;
    class RUN,BUILD,FLUSH handler;
    class SNAP,DETECT,REVERT,AUDIT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry/exit points of `run_headless_core` |
| Teal | State/Decision | Routing decisions that control guard activation |
| Orange | Handler | Existing subprocess, adjudication, and telemetry nodes |
| Green | New Component | New clone contamination guard components (★) |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — SERVER (existing, unchanged)"]
        direction LR
        SERVER["server/tools_execution.py<br/>━━━━━━━━━━<br/>run_skill, run_cmd handlers"]
    end

    subgraph L1 ["L1 — EXECUTION"]
        direction TB
        HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>run_headless_core()<br/>_build_skill_result()"]
        CLONE_GUARD["★ execution/clone_guard.py<br/>━━━━━━━━━━<br/>is_worktree_skill()<br/>snapshot_clone_state()<br/>check_and_revert_clone_contamination()"]
        SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>flush_session_log()<br/>★ clone_contamination_reverted"]
        COMMANDS["execution/commands.py<br/>━━━━━━━━━━<br/>build_full_headless_cmd()"]
        SESSION["execution/session.py<br/>━━━━━━━━━━<br/>ClaudeSessionResult"]
    end

    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction TB
        ENUMS["● core/_type_enums.py<br/>━━━━━━━━━━<br/>RetryReason enum<br/>★ CLONE_CONTAMINATION added"]
        TYPES["core/types.py<br/>━━━━━━━━━━<br/>SkillResult, FailureRecord<br/>AuditStore, SubprocessRunner"]
        PATHS["core/paths.py<br/>━━━━━━━━━━<br/>is_git_worktree()"]
        LOGGING["core/logging.py<br/>━━━━━━━━━━<br/>get_logger()"]
        CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-exports all L0 surface"]
    end

    subgraph Ext ["EXTERNAL (stdlib)"]
        STDLIB["dataclasses, pathlib<br/>datetime, typing"]
    end

    SERVER -->|"imports run_headless"| HEADLESS
    HEADLESS -->|"★ imports 3 functions"| CLONE_GUARD
    HEADLESS -->|"imports"| COMMANDS
    HEADLESS -->|"imports"| SESSION
    HEADLESS -->|"imports"| SESSION_LOG
    HEADLESS -->|"imports core surface"| CORE_INIT
    CLONE_GUARD -->|"★ imports FailureRecord<br/>RetryReason, SkillResult<br/>get_logger, is_git_worktree"| CORE_INIT
    SESSION_LOG -->|"imports"| LOGGING
    CORE_INIT -->|"re-exports"| ENUMS
    CORE_INIT -->|"re-exports"| TYPES
    CORE_INIT -->|"re-exports"| PATHS
    CORE_INIT -->|"re-exports"| LOGGING
    TYPES -->|"imports RetryReason"| ENUMS
    CLONE_GUARD -->|"stdlib only"| STDLIB
    ENUMS -->|"stdlib only"| STDLIB

    class SERVER cli;
    class HEADLESS,SESSION_LOG,COMMANDS,SESSION handler;
    class CLONE_GUARD newComponent;
    class ENUMS,TYPES,PATHS,LOGGING,CORE_INIT stateNode;
    class STDLIB integration;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Server (L3) | MCP tool handlers — top application layer |
| Orange | Execution (L1) | Service/orchestration layer modules |
| Green | New Module | `clone_guard.py` — new L1 execution module (★) |
| Teal | Core (L0) | Stable vocabulary/type layer — high fan-in |
| Red | External | Standard library dependencies |

Closes #617

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-617-20260405-085643-202786/.autoskillit/temp/make-plan/clone_contamination_guard_plan_2026-04-05_090600.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 6.9k | 23.4k | 1.7M | 82.7k | 1 | 10m 39s |
| verify | 33 | 20.7k | 1.4M | 55.6k | 1 | 8m 39s |
| implement | 81 | 24.3k | 4.4M | 89.7k | 1 | 10m 6s |
| fix | 40 | 14.4k | 1.7M | 62.9k | 1 | 9m 17s |
| audit_impl | 13 | 11.0k | 288.2k | 45.3k | 1 | 4m 14s |
| open_pr | 28 | 20.1k | 1.0M | 55.4k | 1 | 7m 18s |
| **Total** | 7.1k | 113.9k | 10.5M | 391.6k | | 50m 15s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rtifact Merge Phase (#625)

## Summary

Add a six-step archival phase to the end of the research recipe
(`research.yaml`) that separates research artifacts from experimental
code before completion. After all review cycles, re-runs, and CI checks
finish, the new phase: (1) captures the experiment branch name, (2)
creates a clean artifact-only branch containing only `research/` from a
temporary worktree, (3) opens an artifact PR targeting the base branch,
(4) tags the full experiment branch under `archive/research/` for
permanent reference, (5) closes the original experiment PR with
cross-reference links, then (6) proceeds to `research_complete`. Every
archival step degrades gracefully — `on_failure` routes to
`research_complete` so the pipeline never blocks on archival failures.

## Requirements

### SPLIT — Artifact Extraction

- **REQ-SPLIT-001:** The recipe must create a new branch from the base
branch (e.g., main) containing only the `research/` directory contents
from the experiment branch, with no production source file changes.
- **REQ-SPLIT-002:** The artifact extraction must use `git checkout
<experiment-branch> -- research/` (or equivalent) to copy only the
research directory's file state, not replay commit history.
- **REQ-SPLIT-003:** The artifact-only branch must produce a single
clean commit with a descriptive message referencing the experiment name.

### PR — Artifact PR

- **REQ-PR-001:** The recipe must open a PR targeting the base branch
with the artifact-only branch, referencing the original experiment PR
number and summarizing key findings in the body.
- **REQ-PR-002:** The artifact PR must contain zero changes to
production source files — only files under `research/`.

### TAG — Branch Archival

- **REQ-TAG-001:** The recipe must create an annotated git tag with the
prefix `archive/research/` capturing the final state of the experiment
branch (after all reviews, re-runs, and CI pass).
- **REQ-TAG-002:** The annotated tag message must include the experiment
name and a note that the report was merged via the artifact PR.
- **REQ-TAG-003:** The tag must be pushed to the remote before the
experiment branch is cleaned up.
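
A sketch of the tag step under REQ-TAG-001 through REQ-TAG-003; the tag message wording and helper name are assumptions, not the recipe's actual text:

```python
def archive_tag_commands(experiment_name: str, experiment_branch: str,
                         artifact_pr_url: str) -> tuple[str, list[list[str]]]:
    """Annotated archive tag on the experiment branch, pushed before cleanup."""
    tag = f"archive/research/{experiment_name}"
    message = (f"Archive of experiment '{experiment_name}'; "
               f"report merged via artifact PR {artifact_pr_url}")
    return tag, [
        ["git", "tag", "-a", tag, "-m", message, experiment_branch],
        # REQ-TAG-003: push the tag before the experiment branch is deleted
        ["git", "push", "origin", tag],
    ]
```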

### CLOSE — Experiment PR Closure

- **REQ-CLOSE-001:** The recipe must close the original experiment PR
with a comment linking to the artifact PR, the archive tag, and any
follow-up implementation issues.
- **REQ-CLOSE-002:** The closure comment must explain why the PR was not
merged (experimental code in production source files) and where the
research record is preserved.

### ORDER — Execution Ordering

- **REQ-ORDER-001:** The archival phase must execute only after all
review cycles, review resolutions, experiment re-runs (per #618), and CI
checks have completed successfully.
- **REQ-ORDER-002:** The archival phase must be the final phase before
`research_complete`, not interleaved with review or re-validation steps.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph PostReview ["● Post-Review Phase (modified routing)"]
        direction TB
        GPR{"guard_pr_url<br/>━━━━━━━━━━<br/>pr_url set?"}
        RRP["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-pr<br/>skip_when_false: review_pr"]
        RRR["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-review<br/>retries: 2"]
        CE{"check_escalations<br/>━━━━━━━━━━<br/>needs_rerun?"}
        RERUN["re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust"]
        REWRITE["re_write_report<br/>━━━━━━━━━━<br/>write-report"]
        RETEST["re_test<br/>━━━━━━━━━━<br/>test_check"]
        REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>git push"]
    end

    subgraph Archival ["★ Archival Phase (new)"]
        direction TB
        BA{"★ begin_archival<br/>━━━━━━━━━━<br/>pr_url truthy?"}
        CEB["★ capture_experiment_branch<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>captures: experiment_branch"]
        CAB["★ create_artifact_branch<br/>━━━━━━━━━━<br/>worktree + checkout research/<br/>captures: artifact_branch"]
        OAP["★ open_artifact_pr<br/>━━━━━━━━━━<br/>gh pr create (research/ only)<br/>captures: artifact_pr_url"]
        TEB["★ tag_experiment_branch<br/>━━━━━━━━━━<br/>git tag -a archive/research/*<br/>captures: archive_tag"]
        CEP["★ close_experiment_pr<br/>━━━━━━━━━━<br/>gh pr close + comment"]
    end

    RC([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    GPR -->|"pr_url empty"| RC
    GPR -->|"pr_url truthy"| RRP
    RRP -->|"changes_requested"| RRR
    RRP -->|"needs_human / default / fail"| BA
    RRR -->|"success"| CE
    RRR -->|"exhausted / fail"| BA
    CE -->|"needs_rerun=true"| RERUN
    CE -->|"default"| REPUSH
    RERUN --> REWRITE --> RETEST --> REPUSH
    REPUSH -->|"success / fail"| BA

    BA -->|"pr_url truthy"| CEB
    BA -->|"default"| RC
    CEB -->|"success"| CAB
    CEB -->|"fail"| RC
    CAB -->|"success"| OAP
    CAB -->|"fail"| RC
    OAP -->|"success"| TEB
    OAP -->|"fail"| RC
    TEB -->|"success"| CEP
    TEB -->|"fail"| RC
    CEP -->|"success / fail"| RC

    class GPR,CE,BA stateNode;
    class RRP,RRR,RERUN,REWRITE,RETEST,REPUSH handler;
    class CEB,CAB,OAP,TEB,CEP newComponent;
    class RC terminal;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | `research_complete` stop state |
| Teal | State/Route | Decision and routing steps (guard_pr_url, check_escalations, begin_archival) |
| Orange | Handler | Existing processing steps — `●` marks modified routing targets |
| Green | New Component | Six new archival steps (`★`) — linear chain with graceful degradation |

Closes #621

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-101015-593986/.autoskillit/temp/make-plan/research_recipe_post_completion_archival_plan_2026-04-05_101500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.2k | 36.6k | 1.4M | 90.3k | 1 | 16m 17s |
| verify | 32 | 25.8k | 1.2M | 55.5k | 1 | 14m 5s |
| implement | 48 | 14.0k | 1.9M | 50.5k | 1 | 5m 52s |
| audit_impl | 16 | 9.7k | 178.9k | 55.3k | 2 | 4m 31s |
| open_pr | 22 | 11.7k | 690.1k | 46.2k | 1 | 4m 26s |
| **Total** | 2.3k | 97.7k | 5.4M | 297.8k | | 45m 13s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

- Add `Configure git auth for private deps` step to
`patch-bump-integration.yml` and `version-bump.yml` before `uv lock`
runs
- Fixes authentication failure when resolving the private
`api-simulator` git dependency added in PR #613
- Mirrors the existing auth pattern already present in `tests.yml` (line
76)

## Root Cause

PR #613 added `api-simulator` as a private git dependency in
`pyproject.toml`. The `tests.yml` workflow was updated with git auth,
but both version-bump workflows were missed. Every PR merged to
`integration` since then fails at the `uv lock` step with:

```
fatal: could not read Username for 'https://github.com': terminal prompts disabled
```

## Test plan

- [ ] This PR's own CI passes (tests.yml)
- [ ] After merge, the patch-bump workflow should succeed — verify by
checking the `bump-patch` check on this PR's merge commit
- [ ] Re-run a recent failed bump-patch workflow to confirm the fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

Fixes a 3-iteration ejection loop in the merge queue pipeline by
introducing ejection-cause enrichment (`ejected_ci_failure` state and
`ejection_cause` field in `wait_for_merge_queue`), a CI gate after every
force-push (`ci_watch_post_queue_fix` step), and two post-rebase
manifest validation gates (language-aware validity check and duplicate
key scan) in `resolve-merge-conflicts`. Closes all six gaps identified
in #627: blind CI ejection routing, missing CI gate after re-push,
absent manifest/semantic validation, and missing `head_sha` in CI
results.

<details>
<summary>Individual Group Plans</summary>

### Group 1: Implementation Plan: Queue Ejection Loop Fix — PART A ONLY

This part addresses the Python code layer for the queue ejection loop
fix (Gaps 2 and 5 from issue #627).

**Gap 2** — `execution/merge_queue.py` currently returns
`pr_state="ejected"` for every ejection regardless of cause. When
GitHub's CI fails on a merge-group commit, the recipe cannot distinguish
a CI failure ejection from a conflict ejection, so it retries conflict
resolution indefinitely (no-op rebase loop). The fix: when the ejection
is confirmed and `checks_state == "FAILURE"`, return
`pr_state="ejected_ci_failure"` plus an `ejection_cause="ci_failure"`
field, allowing recipe `on_result` routing to send CI failures directly
to `diagnose_ci` instead of `queue_ejected_fix`.

**Gap 5** — `server/tools_ci.py` infers `head_sha` from `git rev-parse
HEAD` but never includes it in the JSON response. Recipe orchestrators
cannot verify that CI results correspond to the current HEAD after a
force-push. The fix: include `head_sha` in the `wait_for_ci` return dict
when it was resolved.

### Group 2: Implementation Plan: Queue Ejection Loop Fix — PART B ONLY

This part addresses the recipe and skill layer of the queue ejection
loop fix (Gaps 1, 3, 4, 6 from issue #627). Part A (code layer) must be
implemented first — this part routes on `pr_state="ejected_ci_failure"`
which Part A introduces.

**Gap 1** — `re_push_queue_fix` routes directly to `reenter_merge_queue`
after force-push, bypassing CI. Fix: insert a new
`ci_watch_post_queue_fix` step between `re_push_queue_fix` and
`reenter_merge_queue`, mirroring the existing `ci_watch` step.

**Gap 6** — `wait_for_queue` routes all `ejected` states to
`queue_ejected_fix` (conflict resolution), even when the ejection was
caused by a CI failure that conflict resolution cannot fix. Fix: add an
`ejected_ci_failure` route before `ejected` in
`wait_for_queue.on_result`, routing to `diagnose_ci` instead.

**Gap 3** — `resolve-merge-conflicts` SKILL.md runs only `pre-commit run
--all-files` post-rebase. Fix: add Step 5a — language-detected manifest
validation using fast non-compiling checks.

**Gap 4** — Even a clean rebase can produce duplicate keys when both
branches independently added the same dependency. Fix: add Step 5b —
targeted duplicate key scan in TOML/JSON manifest files.

Applied to: `recipes/implementation.yaml`, `recipes/remediation.yaml`,
`recipes/implementation-groups.yaml`,
`skills_extended/resolve-merge-conflicts/SKILL.md`.

</details>
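
For JSON manifests, the Step 5b duplicate key scan can use the stdlib `object_pairs_hook`, which sees every key before `json.loads` silently collapses duplicates; this is a sketch of the technique, not the skill's exact script (for TOML, `tomllib` already raises on duplicate keys, so parsing alone covers that case):

```python
import json

def find_duplicate_json_keys(text: str) -> list[str]:
    """Collect keys that appear more than once within any JSON object."""
    duplicates: list[str] = []

    def hook(pairs):
        # Called for each object with its raw key/value pairs, duplicates included.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates
```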

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([wait_for_queue\nrecipe step])
    END_OK([release_issue_success])
    END_FAIL([release_issue_failure])
    END_TIMEOUT([release_issue_timeout])
    END_DIAG([diagnose_ci])

    subgraph MQPoll ["● Merge Queue Watcher (merge_queue.py)"]
        direction TB
        POLL["poll GitHub GraphQL\n━━━━━━━━━━\nPR state + queue state\n+ checks_state"]
        MERGED{"merged?"}
        CI_FAIL{"● checks_state\n== 'FAILURE'?"}
        CONFIRM["confirmation window\n━━━━━━━━━━\nnot_in_queue_cycles++"]
        CONFIRMED{"cycles ≥ threshold?"}
        STALL{"stall retries\nexhausted?"}
        TIMEOUT{"deadline\nexceeded?"}
    end

    subgraph EjectRoute ["● Recipe Ejection Routing (implementation.yaml)"]
        direction TB
        ROUTE{"● pr_state?"}
        REENROLL["reenroll_stalled_pr\n━━━━━━━━━━\ntoggle_auto_merge tool"]
    end

    subgraph ConflictFix ["● Conflict Fix Sub-Flow (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC{"escalation_required?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\npush_to_remote force=true"]
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci tool\ntimeout=300s"]
        CI_PASS{"CI pass?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ndiagnose-ci skill"]
        REENTER["reenter_merge_queue\n━━━━━━━━━━\ngh pr merge --squash --auto"]
    end

    subgraph WFCITool ["● wait_for_ci tool handler (tools_ci.py)"]
        direction LR
        INFER["infer head_sha\n━━━━━━━━━━\ngit rev-parse HEAD"]
        CIWAIT["ci_watcher.wait(scope)"]
        ENRICH["● result includes head_sha\n━━━━━━━━━━\nverifies SHA matches HEAD\nafter force-push"]
    end

    %% MAIN FLOW %%
    START --> POLL
    POLL --> MERGED
    MERGED -->|"yes"| END_OK
    MERGED -->|"no"| CONFIRM
    CONFIRM --> CONFIRMED
    CONFIRMED -->|"no"| STALL
    CONFIRMED -->|"yes (not in queue)"| CI_FAIL
    STALL -->|"yes"| END_TIMEOUT
    STALL -->|"no"| TIMEOUT
    TIMEOUT -->|"yes"| END_TIMEOUT
    TIMEOUT -->|"no"| POLL

    CI_FAIL -->|"yes"| ROUTE
    CI_FAIL -->|"no"| ROUTE

    ROUTE -->|"ejected_ci_failure\n(● new route)"| END_DIAG
    ROUTE -->|"ejected"| QFIX
    ROUTE -->|"stalled"| REENROLL
    ROUTE -->|"timeout"| END_TIMEOUT
    REENROLL -->|"success"| START
    REENROLL -->|"failure"| END_FAIL

    QFIX --> ESC
    ESC -->|"true"| END_FAIL
    ESC -->|"false"| REPUSH
    REPUSH -->|"failure"| END_FAIL
    REPUSH -->|"success"| CI_WATCH

    CI_WATCH --> INFER --> CIWAIT --> ENRICH
    ENRICH --> CI_PASS
    CI_PASS -->|"failure"| DETECT
    CI_PASS -->|"success"| REENTER
    DETECT --> END_FAIL
    REENTER -->|"success"| START
    REENTER -->|"failure"| END_FAIL

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class END_OK,END_FAIL,END_TIMEOUT,END_DIAG terminal;
    class POLL,CONFIRM handler;
    class MERGED,CONFIRMED,STALL,TIMEOUT stateNode;
    class CI_FAIL,ROUTE,ESC,CI_PASS detector;
    class QFIX,REPUSH,REENTER handler;
    class REENROLL,DETECT handler;
    class CI_WATCH,INFER,CIWAIT,ENRICH newComponent;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph MQResult ["wait_for_merge_queue Return Dict (merge_queue.py)"]
        direction TB
        PS["● pr_state : str\n━━━━━━━━━━\nmerged | ejected\nejected_ci_failure | stalled\ntimeout | error\n(bare literals, no StrEnum)"]
        SUC["success : bool\n━━━━━━━━━━\ntrue only for 'merged'"]
        REASON["reason : str\n━━━━━━━━━━\nhuman-readable\nalways present"]
        STALL["stall_retries_attempted : int\n━━━━━━━━━━\nalways present\nexcept 'error' path"]
        EC["● ejection_cause : str\n━━━━━━━━━━\n'ci_failure' only\nwhen pr_state==ejected_ci_failure\nCONDITIONAL FIELD"]
    end

    subgraph InternalPoll ["PRFetchState — Internal Polling State (not returned)"]
        direction LR
        CHECKS["checks_state : str|None\n━━━━━━━━━━\nGitHub StatusCheckRollup\nNone = no checks configured"]
        INQUEUE["in_queue : bool\n━━━━━━━━━━\nPR in mergeQueue.entries"]
        QSTATE["queue_state : str|None\n━━━━━━━━━━\nUNMERGEABLE | AWAITING_CHECKS\n| LOCKED | null"]
    end

    subgraph Gate1 ["● Ejection Decision Gate (merge_queue.py)"]
        direction TB
        CFAIL{"checks_state\n== 'FAILURE'?"}
        SET_ECI["● set pr_state='ejected_ci_failure'\n━━━━━━━━━━\nejection_cause='ci_failure'\nINJECTED into result"]
        SET_EJ["set pr_state='ejected'\n━━━━━━━━━━\nno ejection_cause field\n(absent, not null)"]
    end

    subgraph CIScope ["CIRunScope — Frozen Input Scope (core/types)"]
        direction LR
        WF["workflow : str|None\n━━━━━━━━━━\ne.g. 'tests.yml'"]
        HS["● head_sha : str|None\n━━━━━━━━━━\ngit rev-parse HEAD\nor caller-supplied"]
    end

    subgraph CIResult ["● wait_for_ci Return Dict (tools_ci.py)"]
        direction TB
        RUNID["run_id : int|None\n━━━━━━━━━━\nGitHub Actions run ID"]
        CONC["conclusion : str\n━━━━━━━━━━\nsuccess|failure|cancelled\naction_required|timed_out\nno_runs|error|unknown"]
        FJOBS["failed_jobs : list\n━━━━━━━━━━\nalways present\nempty on billing errors"]
        HSHA["● head_sha : str\n━━━━━━━━━━\nCONDITIONAL: present only\nwhen scope.head_sha truthy\ninjected by tool layer"]
    end

    subgraph ConsumerGate ["Recipe Routing Gate (on_result)"]
        direction TB
        ROUTE{"pr_state value?"}
        R1["ejected_ci_failure\n→ diagnose_ci"]
        R2["ejected\n→ queue_ejected_fix"]
        R3["merged|stalled|timeout\n→ other routes"]
    end

    %% FLOW %%
    CHECKS --> CFAIL
    INQUEUE --> CFAIL
    QSTATE --> CFAIL
    CFAIL -->|"FAILURE"| SET_ECI
    CFAIL -->|"other"| SET_EJ
    SET_ECI --> PS
    SET_ECI --> EC
    SET_EJ --> PS
    PS --> SUC
    PS --> REASON
    PS --> STALL

    HS --> CIResult
    WF --> CIResult
    RUNID --> CONC
    CONC --> FJOBS
    FJOBS --> HSHA

    PS --> ROUTE
    EC --> ROUTE
    ROUTE --> R1
    ROUTE --> R2
    ROUTE --> R3

    HSHA -.->|"verifies HEAD\nafter force-push"| R2

    %% CLASS ASSIGNMENTS %%
    class PS,EC,HSHA,SET_ECI,HS,CFAIL gap;
    class SUC,REASON,STALL,RUNID,CONC,FJOBS output;
    class CHECKS,INQUEUE,QSTATE,WF stateNode;
    class SET_EJ handler;
    class ROUTE,R1,R2,R3 detector;
    class InternalPoll phase;
```

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    END_OK([release_issue_success])
    END_FAIL([release_issue_failure\n━━━━━━━━━━\nhuman escalation\nclone preserved])
    END_DIAG([diagnose_ci])

    subgraph MQLoop ["● Merge Queue Poll Loop (merge_queue.py)"]
        direction TB
        POLL["GraphQL fetch\n━━━━━━━━━━\nPR + queue state"]
        POLL_ERR{"Exception\ncaught?"}
        TIMEOUT_CHK{"deadline\nexceeded?"}
        STALL_CHK{"stall retries\n≥ max (3)?"}
    end

    subgraph EjectGate ["● Ejection Classification Gate (merge_queue.py)"]
        direction TB
        EJECT_DECISION{"● checks_state\n== 'FAILURE'?"}
        CI_EJ["● ejected_ci_failure\n━━━━━━━━━━\nejection_cause=ci_failure\nskips conflict resolution"]
        CONF_EJ["ejected\n━━━━━━━━━━\nno cause field\nconflict resolution"]
    end

    subgraph StallBreaker ["Stall Circuit Breaker (merge_queue.py)"]
        direction LR
        TOGGLE["_toggle_auto_merge\n━━━━━━━━━━\ndisable → 2s → re-enable\nbackoff: 30/60/120s"]
        TOGGLE_ERR{"Exception\ncaught?"}
    end

    subgraph ConflictPath ["Conflict Resolution Path (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC_CHK{"escalation\nrequired?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\nforce-push"]
        REPUSH_FAIL{"push\nfailed?"}
    end

    subgraph CIGate ["● CI Gate After Re-Push (implementation.yaml + tools_ci.py)"]
        direction TB
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci, timeout=300s\nincludes head_sha"]
        CI_CONC{"conclusion\n== success?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ngit merge-base check\n(stale base?)"]
        DETECT_CHK{"stale\nbase?"}
        CI_CF["ci_conflict_fix\n━━━━━━━━━━\nresolve-merge-conflicts"]
    end

    subgraph ManifestGates ["● Post-Rebase Manifest Validation (SKILL.md)"]
        direction TB
        STEP5A["● Step 5a: manifest validity\n━━━━━━━━━━\ncargo metadata / node JSON.parse\nuv lock --check / tomllib"]
        STEP5A_CHK{"manifest\nvalid?"}
        STEP5B["● Step 5b: duplicate key scan\n━━━━━━━━━━\nTOML dep sections\nJSON object_pairs_hook"]
        STEP5B_CHK{"duplicates\nfound?"}
        REBASE_ABORT["git rebase --abort\n━━━━━━━━━━\nescalation_required=true"]
    end

    %% POLL LOOP FLOW %%
    POLL --> POLL_ERR
    POLL_ERR -->|"yes: log + retry"| POLL
    POLL_ERR -->|"no"| TIMEOUT_CHK
    TIMEOUT_CHK -->|"yes"| END_FAIL
    TIMEOUT_CHK -->|"no"| STALL_CHK
    STALL_CHK -->|"yes: stalled"| END_FAIL
    STALL_CHK -->|"no: stall attempt"| TOGGLE
    TOGGLE --> TOGGLE_ERR
    TOGGLE_ERR -->|"yes: log + increment"| STALL_CHK
    TOGGLE_ERR -->|"no: success"| POLL

    %% EJECTION GATE %%
    STALL_CHK -->|"ejection confirmed"| EJECT_DECISION
    EJECT_DECISION -->|"FAILURE"| CI_EJ
    EJECT_DECISION -->|"other"| CONF_EJ
    CI_EJ --> END_DIAG
    CONF_EJ --> QFIX

    %% CONFLICT PATH %%
    QFIX --> STEP5A
    STEP5A --> STEP5A_CHK
    STEP5A_CHK -->|"invalid"| REBASE_ABORT
    STEP5A_CHK -->|"valid"| STEP5B
    STEP5B --> STEP5B_CHK
    STEP5B_CHK -->|"duplicates"| REBASE_ABORT
    STEP5B_CHK -->|"clean"| ESC_CHK
    REBASE_ABORT --> ESC_CHK
    ESC_CHK -->|"true"| END_FAIL
    ESC_CHK -->|"false"| REPUSH
    REPUSH --> REPUSH_FAIL
    REPUSH_FAIL -->|"yes"| END_FAIL
    REPUSH_FAIL -->|"no"| CI_WATCH

    %% CI GATE %%
    CI_WATCH --> CI_CONC
    CI_CONC -->|"yes"| END_OK
    CI_CONC -->|"no"| DETECT
    DETECT --> DETECT_CHK
    DETECT_CHK -->|"yes: stale base"| CI_CF
    DETECT_CHK -->|"no: code failure"| END_DIAG
    CI_CF --> ESC_CHK

    %% CLASS ASSIGNMENTS %%
    class END_OK,END_FAIL,END_DIAG terminal;
    class POLL,TOGGLE handler;
    class POLL_ERR,TOGGLE_ERR,TIMEOUT_CHK,STALL_CHK gap;
    class EJECT_DECISION,CI_CONC,DETECT_CHK,STEP5A_CHK,STEP5B_CHK,ESC_CHK,REPUSH_FAIL detector;
    class CI_EJ,CONF_EJ,REBASE_ABORT output;
    class QFIX,REPUSH,CI_WATCH,DETECT,CI_CF handler;
    class STEP5A,STEP5B phase;
```
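
The Step 5b duplicate-key scan for JSON manifests can be sketched with the `object_pairs_hook` mechanism named in the diagram. This is an illustrative sketch, not the shipped scanner; `find_duplicate_keys` is a hypothetical name, and the real check presumably also reports file and location:

```python
import json

def find_duplicate_keys(text: str) -> list[str]:
    """Collect keys that appear more than once within any single JSON object."""
    dupes: list[str] = []

    def check_pairs(pairs):
        # json.loads calls this hook once per object, innermost first,
        # with the raw (key, value) pairs before dict() collapses duplicates.
        seen = set()
        for key, _value in pairs:
            if key in seen:
                dupes.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_pairs)
    return dupes
```

A duplicated dependency entry such as `{"lodash": "^4.0.0", "lodash": "^3.0.0"}` would silently keep only the last value under plain `json.loads`, which is exactly why the scan must hook the pairs before they are collapsed.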

Closes #627

## Implementation Plan

Plan files:
- `/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_a.md`
- `/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_b.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 37 | 31.7k | 1.9M | 113.2k | 1 | 11m 19s |
| review | 3.4k | 5.6k | 147.3k | 41.5k | 1 | 5m 45s |
| verify | 44 | 35.4k | 1.9M | 144.8k | 2 | 11m 15s |
| implement | 100 | 33.5k | 4.6M | 123.5k | 2 | 12m 17s |
| audit_impl | 15 | 14.0k | 279.5k | 44.2k | 1 | 3m 46s |
| open_pr | 33 | 30.5k | 1.2M | 68.1k | 1 | 10m 58s |
| **Total** | 3.6k | 150.8k | 9.9M | 535.3k | | 55m 23s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…Artifact Preservation (#630)

## Summary

The review-design skill has four compounding defects that make GO
verdicts structurally unreachable. This plan fixes all four:

1. **Threshold unreachable** — Replace the static `>= 3` warning
threshold with a proportional formula based on active dimensions
(`active_dimensions * WARNING_BUDGET_PER_DIM` where budget = 5),
calibrated so that the spectral-init v6 baseline (32 warnings across ~7
dimensions, deemed "substantively sound") would receive a GO verdict.

2. **Prescriptive findings** — Add evaluative-only constraints to
Critical Constraints and a shared subagent evaluation scope block before
Step 2, requiring findings to describe WHAT is lacking, never HOW to fix
it.

3. **Scope drift** — Add a design scope boundary to the shared subagent
block, prohibiting evaluation of implementation code snippets and
constraining review to experimental design elements.

4. **Artifact preservation** — Enhance the `create_worktree` step in
research.yaml to copy all review-cycle artifacts (dashboards, revision
guidance, plan versions, resolve-design-review output) into
`research/.../artifacts/`, and add a `commit_research_artifacts` step
before `push_branch` to capture phase-groups and phase-plans from the
worktree.
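
The proportional threshold from fix 1 can be sketched as follows. Function and argument names here are illustrative, not the actual SKILL.md identifiers; only the formula (`active_dimensions * WARNING_BUDGET_PER_DIM`, budget 5) and the calibration target come from the plan:

```python
WARNING_BUDGET_PER_DIM = 5

def synthesize_verdict(active_dimensions: int,
                       critical_findings: list,
                       warning_findings: list,
                       stop_triggers: bool = False) -> str:
    """Sketch of Step 7 verdict synthesis with the proportional warning threshold."""
    threshold = active_dimensions * WARNING_BUDGET_PER_DIM
    if stop_triggers:
        return "STOP"
    if critical_findings or len(warning_findings) >= threshold:
        return "REVISE"
    return "GO"
```

Against the calibration baseline, 32 warnings across 7 active dimensions yields a threshold of 35, so the verdict is GO; under the old static `>= 3` rule the same review would have been blocked.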

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([plan_experiment])
    COMPLETE([research_complete])
    STOP_OUT([design_rejected])

    subgraph DesignReview ["● review_design Step (research.yaml)"]
        direction TB
        RD["● review_design<br/>━━━━━━━━━━<br/>run_skill<br/>retries: 2"]
        REVISE_ROUTE["revise_design<br/>━━━━━━━━━━<br/>route → plan_experiment"]
        RESOLVE["resolve_design_review<br/>━━━━━━━━━━<br/>run_skill, retries: 1"]
    end

    subgraph VerdictSynthesis ["● Step 7: Verdict Synthesis (review-design SKILL.md)"]
        direction TB
        SCOPE["● Evaluative Scope Gate<br/>━━━━━━━━━━<br/>Findings: WHAT is lacking<br/>Design boundary only"]
        RTCAP["rt_cap = RT_MAX_SEVERITY<br/>━━━━━━━━━━<br/>Downgrade red_team<br/>severity by type"]
        CLASSIFY["Classify findings<br/>━━━━━━━━━━<br/>critical_findings<br/>warning_findings"]
        ACTIVE["● active_dimensions<br/>━━━━━━━━━━<br/>count spawned non-SILENT<br/>dims (L1+L2+L3+L4+RT)"]
        THRESH["★ warning_threshold<br/>━━━━━━━━━━<br/>active_dims × 5<br/>WARNING_BUDGET_PER_DIM=5"]
        VERDICT{"● Verdict Decision<br/>━━━━━━━━━━<br/>stop_triggers?<br/>critical? warnings≥threshold?"}
    end

    subgraph ArtifactPath ["★ Artifact Commit Path (research.yaml)"]
        direction TB
        TEST["● test<br/>━━━━━━━━━━<br/>test_check"]
        FIX["fix_tests<br/>━━━━━━━━━━<br/>run_skill"]
        RETEST["● retest<br/>━━━━━━━━━━<br/>test_check"]
        COMMIT["★ commit_research_artifacts<br/>━━━━━━━━━━<br/>run_cmd: copy phase-groups<br/>phase-plans → artifacts/<br/>on_failure: push_branch"]
    end

    PUSH["push_branch<br/>━━━━━━━━━━<br/>run_cmd"]

    START -->|"run review_design"| RD
    RD -->|"STOP verdict"| RESOLVE
    RD -->|"REVISE verdict"| REVISE_ROUTE
    RD -->|"GO verdict"| create_worktree
    REVISE_ROUTE -->|"loop back"| START
    RESOLVE -->|"revised"| REVISE_ROUTE
    RESOLVE -->|"failed"| STOP_OUT
    RD -->|"on_failure / on_exhausted"| create_worktree

    create_worktree["create_worktree<br/>━━━━━━━━━━<br/>★ copies review-cycles<br/>plan-versions artifacts"]

    create_worktree --> decompose["decompose_phases<br/>plan_phase<br/>implement_phase"]
    decompose --> experiment["run_experiment<br/>write_report"]
    experiment --> TEST

    TEST -->|"pass"| COMMIT
    TEST -->|"fail"| FIX
    FIX --> RETEST
    RETEST -->|"pass"| COMMIT
    RETEST -->|"fail"| PUSH

    COMMIT -->|"success or failure"| PUSH
    PUSH --> COMPLETE

    SCOPE -.->|"constraint applied to<br/>all dimension subagents"| CLASSIFY
    RTCAP --> CLASSIFY
    CLASSIFY --> ACTIVE
    ACTIVE --> THRESH
    THRESH --> VERDICT
    VERDICT -->|"stop_triggers"| STOP_OUT
    VERDICT -->|"critical_findings or<br/>warnings ≥ threshold"| REVISE_ROUTE
    VERDICT -->|"else"| create_worktree

    class START,COMPLETE,STOP_OUT terminal;
    class RD,RESOLVE,decompose,experiment,FIX handler;
    class REVISE_ROUTE,RTCAP,CLASSIFY phase;
    class VERDICT,ACTIVE stateNode;
    class SCOPE detector;
    class THRESH,COMMIT,create_worktree newComponent;
    class TEST,RETEST,PUSH output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start, complete, and terminal states |
| Orange | Handler | Processing steps (run_skill, run_cmd) |
| Purple | Phase | Control flow, routing, severity capping |
| Teal | State | Decision and counting nodes |
| Red | Detector | Constraint gates (evaluative scope) |
| Green | New | ★ new components, ● modified components |
| Dark Teal | Output | test_check steps and push_branch |

Closes #629

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-160303-009353/.autoskillit/temp/make-plan/fix-review-design-threshold-unreachable-prescriptive-finding_plan_2026-04-05_161500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 22.6k | 1.2M | 85.0k | 1 | 10m 36s |
| verify | 30 | 14.6k | 1.5M | 74.8k | 1 | 8m 28s |
| implement | 62 | 19.9k | 4.1M | 92.5k | 1 | 7m 41s |
| audit_impl | 87 | 10.6k | 473.5k | 47.1k | 1 | 6m 41s |
| open_pr | 25 | 11.7k | 806.3k | 48.9k | 1 | 4m 22s |
| **Total** | 3.0k | 79.4k | 8.1M | 348.3k | | 37m 50s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ound Bash Tasks (#633)

## Summary

Headless sessions running long-lived background Bash tasks (e.g. `cargo
bench` launched via
`run_in_background: true`) are killed as stale because the staleness
signal is JSONL file growth,
not actual session liveness. When the LLM goes idle waiting for a
background child, the JSONL
stops growing and the 20-minute staleness threshold is breached — even
though child processes are
actively running.

Three changes eliminate this class of false kill:

1. **`_has_active_child_processes`** — a second suppression gate in
`_session_log_monitor` that
checks child process CPU activity before issuing a kill. Added alongside
the existing
   `_has_active_api_connection` port-443 gate.

2. **`RecipeStep.stale_threshold`** — an optional per-step threshold
field that recipe authors
can raise for steps known to run long-lived experiments, passed through
`run_skill` →
   `run_headless_core` → `_session_log_monitor`.

3. **Recipe YAML overrides** — `stale_threshold: 2400` (40 min) on
specific long-running steps
in `research.yaml`, `implementation.yaml`, `remediation.yaml`,
`implementation-groups.yaml`,
   and `merge-prs.yaml`.

## Requirements

### STALE — Staleness Suppression via Child Process Detection

- **REQ-STALE-001:** The system must detect active child processes in
the headless session's process tree when the stale threshold is
breached.
- **REQ-STALE-002:** The system must suppress the stale kill when any
child process in the tree reports CPU usage exceeding ~10% via
`cpu_percent(interval=0)`.
- **REQ-STALE-003:** The system must reset the staleness clock
(`last_change`) when child process activity suppresses the stale kill,
identical to the existing `_has_active_api_connection` suppression
behavior.
- **REQ-STALE-004:** The child process detection must follow the
established exception-handling pattern, silently skipping
`NoSuchProcess`, `ZombieProcess`, and `AccessDenied` errors per process.
- **REQ-STALE-005:** The child process detection must only execute when
the stale threshold has already been breached (zero performance impact
during normal operation).
- **REQ-STALE-006:** The child process detection must emit a structured
log warning when suppressing a stale kill, following the pattern
established by `_has_active_api_connection`.

### SCHEMA — Per-Step Stale Threshold in RecipeStep

- **REQ-SCHEMA-001:** The `RecipeStep` dataclass must accept an optional
`stale_threshold` field of type `int | None` with no default value
(defaults to `None`).
- **REQ-SCHEMA-002:** When `stale_threshold` is `None` on a recipe step,
the global `RunSkillConfig.stale_threshold` (1200s) must apply.
- **REQ-SCHEMA-003:** The `run_skill` MCP tool handler must accept an
optional `stale_threshold` parameter and forward it to
`run_headless_core`.
- **REQ-SCHEMA-004:** The recipe validator must reject `stale_threshold`
values that are not positive integers when set.

### RECIPE — Research Recipe Step Overrides

- **REQ-RECIPE-001:** Research-oriented recipes must set
`stale_threshold: 2400` (40 minutes) on specific long-running steps
(e.g., `implement_phase`, `run_experiment`).
- **REQ-RECIPE-002:** Fast-completing steps (e.g., `plan_phase`) must
not have a `stale_threshold` override, relying on the global default.

### TEST — Test Coverage

- **REQ-TEST-001:** Unit tests must verify `_has_active_child_processes`
returns `True` when a child process exceeds the CPU threshold.
- **REQ-TEST-002:** Unit tests must verify `_has_active_child_processes`
returns `False` when all children are idle, when no children exist, and
when exceptions are raised.
- **REQ-TEST-003:** An integration test must verify stale suppression
when a child process is CPU-active but has no port-443 connection.
- **REQ-TEST-004:** The existing
`TestSessionLogMonitorStaleSuppressionGate` test class must be extended
with the child-process-active scenario.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    T_COMPLETE([COMPLETION])
    T_STALE([STALE — KILL])

    %% CONFIG CHAIN %%
    subgraph Config ["● RECIPE STEP CONFIG (stale_threshold flow)"]
        direction TB
        RecipeStep["● RecipeStep YAML<br/>━━━━━━━━━━<br/>stale_threshold: 2400<br/>(or unset → None)"]
        RunSkill["● run_skill handler<br/>━━━━━━━━━━<br/>tools_execution.py<br/>stale_threshold: int | None"]
        Runner["DefaultSubprocessRunner<br/>━━━━━━━━━━<br/>process.py<br/>default: 1200s"]
    end

    %% PHASE 1 %%
    subgraph Phase1 ["PHASE 1 — JSONL File Discovery (poll 1s, timeout 30s)"]
        direction TB
        P1_Poll["Poll session_log_dir<br/>━━━━━━━━━━<br/>ctime > spawn_time?<br/>Match session_id?"]
        P1_Found{"File found<br/>within 30s?"}
    end

    %% PHASE 2 %%
    subgraph Phase2 ["● PHASE 2 — Staleness Monitor Loop (poll every 2s)"]
        direction TB
        P2_Stat["stat(session_file)<br/>━━━━━━━━━━<br/>current_size vs last_size"]
        P2_Grew{"JSONL<br/>grew?"}
        P2_Marker["Read new content<br/>━━━━━━━━━━<br/>scan for completion<br/>marker in JSONL"]
        P2_MarkerFound{"Completion<br/>marker found?"}
        P2_ResetGrow["last_size = current_size<br/>last_change = now()"]
        P2_Elapsed{"elapsed >=<br/>stale_threshold?"}
    end

    %% SUPPRESSION GATES %%
    subgraph Gates ["● SUPPRESSION GATES (only fire when stale threshold breached)"]
        direction TB
        Gate1["_has_active_api_connection<br/>━━━━━━━━━━<br/>Walk proc tree<br/>ESTABLISHED port-443?"]
        Gate1_Active{"API conn<br/>active?"}
        Gate2["● _has_active_child_processes<br/>━━━━━━━━━━<br/>Walk child procs<br/>cpu_percent > 10%?"]
        Gate2_Active{"Child CPU<br/>> 10%?"}
        ResetClock["last_change = now()<br/>━━━━━━━━━━<br/>Suppress stale kill<br/>reset staleness clock"]
    end

    %% CONNECTIONS %%
    START --> RecipeStep
    RecipeStep -->|"stale_threshold (int or None)"| RunSkill
    RunSkill -->|"float(x) or None → default 1200s"| Runner
    Runner -->|"stale_threshold, pid"| P1_Poll

    P1_Poll --> P1_Found
    P1_Found -->|"yes"| P2_Stat
    P1_Found -->|"no (30s timeout)"| T_STALE

    P2_Stat --> P2_Grew
    P2_Grew -->|"yes"| P2_ResetGrow
    P2_ResetGrow --> P2_Marker
    P2_Marker --> P2_MarkerFound
    P2_MarkerFound -->|"yes"| T_COMPLETE
    P2_MarkerFound -->|"no"| P2_Elapsed

    P2_Grew -->|"no"| P2_Elapsed
    P2_Elapsed -->|"no (wait)"| P2_Stat
    P2_Elapsed -->|"yes"| Gate1

    Gate1 --> Gate1_Active
    Gate1_Active -->|"yes"| ResetClock
    Gate1_Active -->|"no"| Gate2
    Gate2 --> Gate2_Active
    Gate2_Active -->|"yes"| ResetClock
    Gate2_Active -->|"no"| T_STALE
    ResetClock -->|"continue loop"| P2_Stat

    %% CLASS ASSIGNMENTS %%
    class START,T_COMPLETE,T_STALE terminal;
    class RecipeStep,RunSkill handler;
    class Runner stateNode;
    class P1_Poll,P2_Stat,P2_Marker,P2_ResetGrow,ResetClock phase;
    class P1_Found,P2_Grew,P2_MarkerFound,P2_Elapsed,Gate1_Active,Gate2_Active stateNode;
    class Gate1 handler;
    class Gate2 newComponent;
```

### Concurrency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    COMPLETE([TASK GROUP CANCELLED])

    %% MAIN THREAD: Sequential setup %%
    subgraph MainSeq ["MAIN COROUTINE — Sequential Setup"]
        direction TB
        SpawnProc["Spawn Claude Code process<br/>━━━━━━━━━━<br/>asyncio subprocess<br/>get proc.pid"]
        CreateAcc["Create RaceAccumulator + trigger<br/>━━━━━━━━━━<br/>anyio.Event (idempotent set)<br/>channel_b_ready Event"]
        OpenTG["anyio.create_task_group()<br/>━━━━━━━━━━<br/>Fork: start 4–5 coroutines<br/>as tg.start_soon(...)"]
        TrigWait["await trigger.wait()<br/>━━━━━━━━━━<br/>Block until first watcher wins<br/>(or wall-clock timeout)"]
        DrainWait["Optional drain window<br/>━━━━━━━━━━<br/>await channel_b_ready if<br/>process exited but B pending"]
        CancelTG["tg.cancel_scope.cancel()<br/>━━━━━━━━━━<br/>Tear down all remaining tasks"]
        Resolve["resolve_termination(RaceSignals)<br/>━━━━━━━━━━<br/>Priority: exit > stale > completion"]
    end

    %% TASK GROUP: Concurrent watchers %%
    subgraph TaskGroup ["anyio TASK GROUP — Concurrent Watchers (cooperative, single event loop)"]
        direction LR

        subgraph ChA ["Channel A"]
            WatchProc["_watch_process<br/>━━━━━━━━━━<br/>await proc.wait()<br/>acc.process_exited=True"]
            WatchHB["_watch_heartbeat<br/>━━━━━━━━━━<br/>poll stdout NDJSON 0.5s<br/>acc.channel_a_confirmed=True"]
        end

        subgraph ChB ["● Channel B — Session Log"]
            ExtractID["_extract_stdout_session_id<br/>━━━━━━━━━━<br/>poll stdout for type=system<br/>sets stdout_session_id_ready"]
            WatchSL["● _watch_session_log<br/>━━━━━━━━━━<br/>calls _session_log_monitor<br/>acc.channel_b_status=COMPLETION|STALE"]
        end
    end

    %% STALENESS SUPPRESSION %%
    subgraph StaleGates ["● STALENESS SUPPRESSION — Sync psutil walks (inside _session_log_monitor)"]
        direction TB
        Gate1["_has_active_api_connection(pid)<br/>━━━━━━━━━━<br/>[parent + children(recursive=True)]<br/>net_connections port-443 ESTABLISHED?"]
        Gate2["● _has_active_child_processes(pid)<br/>━━━━━━━━━━<br/>[children(recursive=True) only]<br/>cpu_percent(interval=0) > 10%?"]
        ResetClock["last_change = monotonic()<br/>━━━━━━━━━━<br/>suppress stale kill<br/>continue Phase 2 loop"]
        ReturnStale["return STALE<br/>━━━━━━━━━━<br/>acc.channel_b_status = STALE<br/>trigger.set()"]
    end

    %% FLOW %%
    START --> SpawnProc
    SpawnProc --> CreateAcc
    CreateAcc --> OpenTG

    OpenTG -->|"tg.start_soon"| WatchProc
    OpenTG -->|"tg.start_soon"| WatchHB
    OpenTG -->|"tg.start_soon"| ExtractID
    OpenTG -->|"tg.start_soon"| WatchSL

    WatchProc -->|"trigger.set()"| TrigWait
    WatchHB -->|"trigger.set()"| TrigWait
    WatchSL -->|"trigger.set() after drain"| TrigWait

    WatchSL -->|"stale threshold breached"| Gate1
    Gate1 -->|"no API conn"| Gate2
    Gate2 -->|"child CPU active"| ResetClock
    Gate2 -->|"no activity"| ReturnStale
    Gate1 -->|"API conn active"| ResetClock
    ResetClock -->|"continue loop"| WatchSL

    TrigWait --> DrainWait
    DrainWait --> CancelTG
    CancelTG --> Resolve
    Resolve --> COMPLETE

    %% CLASS ASSIGNMENTS %%
    class START,COMPLETE terminal;
    class SpawnProc,CreateAcc,TrigWait,DrainWait,CancelTG,Resolve phase;
    class OpenTG detector;
    class WatchProc,WatchHB handler;
    class ExtractID handler;
    class WatchSL handler;
    class Gate1 handler;
    class Gate2 newComponent;
    class ResetClock output;
    class ReturnStale detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([RECIPE YAML LOADED])
    T_PASS([VALID — forwarded to run_skill])
    T_FAIL([INVALID — validation error])

    %% PARSE LAYER %%
    subgraph Parse ["● YAML → RecipeStep (io.py _parse_step)"]
        direction TB
        YAMLRead["● YAML key read<br/>━━━━━━━━━━<br/>data.get('stale_threshold')<br/>absent → None (no coercion)"]
        Construct["● RecipeStep(...)<br/>━━━━━━━━━━<br/>stale_threshold: int | None = None<br/>No __post_init__ mutations"]
        IntegrityGuard["_PARSE_STEP_HANDLED_FIELDS guard<br/>━━━━━━━━━━<br/>compile-time assert: fields == dataclass<br/>RuntimeError if diverged"]
    end

    %% VALIDATION LAYER %%
    subgraph Validation ["● STRUCTURAL VALIDATION (validator.py validate_recipe)"]
        direction TB
        IsNone{"stale_threshold<br/>is None?"}
        TypeCheck{"isinstance(int)<br/>AND > 0?"}
        AppendError["append error<br/>━━━━━━━━━━<br/>'must be positive integer<br/>when set'"]
        PassThrough["field passes<br/>━━━━━━━━━━<br/>no validation error<br/>for None or valid int"]
    end

    %% SEMANTIC LAYER %%
    subgraph Semantic ["● SEMANTIC RULE — _TOOL_PARAMS registry (rules_tools.py)"]
        direction TB
        ToolParamsCheck["_TOOL_PARAMS['run_skill']<br/>━━━━━━━━━━<br/>frozenset includes 'stale_threshold'<br/>dead-with-param rule: NO warning"]
        OtherToolWarn["Other tools<br/>━━━━━━━━━━<br/>stale_threshold not in their params<br/>dead-with-param: WARNING emitted"]
    end

    %% EXECUTION FORWARDING %%
    subgraph Execution ["EXECUTION FORWARDING (tools_execution.py run_skill)"]
        direction TB
        NullPath["stale_threshold = None<br/>━━━━━━━━━━<br/>→ DefaultSubprocessRunner default<br/>= 1200s (global config)"]
        OverridePath["stale_threshold = int<br/>━━━━━━━━━━<br/>float(stale_threshold)<br/>→ overrides global default"]
        Monitor["_session_log_monitor<br/>━━━━━━━━━━<br/>stale_threshold used as<br/>breach-detection window"]
    end

    %% FLOW %%
    START --> YAMLRead
    YAMLRead --> Construct
    Construct --> IntegrityGuard
    IntegrityGuard -->|"fields match — import OK"| IsNone

    IsNone -->|"yes (absent or None)"| PassThrough
    IsNone -->|"no (value present)"| TypeCheck
    TypeCheck -->|"valid"| PassThrough
    TypeCheck -->|"invalid (non-int or ≤ 0)"| AppendError
    AppendError --> T_FAIL
    PassThrough --> ToolParamsCheck

    ToolParamsCheck -->|"tool: run_skill"| T_PASS
    ToolParamsCheck -->|"other tool"| OtherToolWarn

    T_PASS --> NullPath
    T_PASS --> OverridePath
    NullPath --> Monitor
    OverridePath --> Monitor
    Monitor --> T_PASS

    %% CLASS ASSIGNMENTS %%
    class START,T_PASS,T_FAIL terminal;
    class YAMLRead,Construct handler;
    class IntegrityGuard detector;
    class IsNone,TypeCheck stateNode;
    class AppendError detector;
    class PassThrough output;
    class ToolParamsCheck newComponent;
    class OtherToolWarn gap;
    class NullPath,OverridePath,Monitor phase;
```

Closes #631

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-170436-566038/.autoskillit/temp/make-plan/fix_false_stale_kills_plan_2026-04-05_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 45.6k | 2.0M | 151.7k | 2 | 19m 31s |
| verify | 62 | 36.0k | 3.3M | 155.3k | 2 | 15m 1s |
| implement | 149 | 47.2k | 9.6M | 183.8k | 2 | 16m 24s |
| audit_impl | 102 | 20.0k | 762.1k | 90.1k | 2 | 10m 31s |
| open_pr | 69 | 39.4k | 2.6M | 116.8k | 2 | 15m 32s |
| review_pr | 38 | 57.4k | 1.8M | 103.1k | 1 | 18m 47s |
| resolve_review | 55 | 32.5k | 3.1M | 84.3k | 1 | 14m 9s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.3k | 292.6k | 24.3M | 943.5k | | 1h 59m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

All four bundled recipes (`implementation`, `remediation`, `merge-prs`,
`implementation-groups`)
currently ship with `audit: default: "true"`, meaning `audit-impl` runs
unless explicitly
disabled. This plan changes all four recipes to `default: "false"` so
`audit-impl` is skipped
by default and becomes opt-in. No structural changes to the step graph,
routing, or test
infrastructure are needed — only the ingredient default changes.

**Scope:** 4 YAML ingredient default changes + 1 test assertion added.

## Requirements

### RCFG — Recipe Configuration

- **REQ-RCFG-001:** The `audit` input in `implementation.yaml` must
default to `"false"`.
- **REQ-RCFG-002:** The `audit` input in `implementation-groups.yaml`
must default to `"false"`.
- **REQ-RCFG-003:** The `audit` input in `remediation.yaml` must default
to `"false"`.
- **REQ-RCFG-004:** The `audit` input in `merge-prs.yaml` must default
to `"false"`.
- **REQ-RCFG-005:** The `audit_impl` step definition and its
`skip_when_false: "inputs.audit"` guard must remain unchanged in all
recipes.
- **REQ-RCFG-006:** Callers must still be able to opt in to audit-impl
by passing `audit: "true"` at pipeline invocation time.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Pipeline Invoked])
    CONTINUE([Continue to push / merge])
    ERROR([escalate_stop / register_clone_failure])

    subgraph Ingredient ["● Ingredient Resolution"]
        direction TB
        AuditIng["● audit ingredient<br/>━━━━━━━━━━<br/>BEFORE: default='true'<br/>AFTER: default='false'"]
    end

    subgraph Gate ["skip_when_false Gate"]
        direction TB
        SkipCheck{"inputs.audit == 'true'?"}
        SkipBypass["BYPASS<br/>━━━━━━━━━━<br/>Skip audit_impl<br/>(now default path)"]
        RunAudit["● run audit-impl skill<br/>━━━━━━━━━━<br/>runs /autoskillit:audit-impl<br/>(now opt-in path)"]
        Verdict{"GO / NO GO?"}
        Remediate["remediate<br/>━━━━━━━━━━<br/>Route to remediation<br/>or re-plan"]
    end

    %% FLOW %%
    START --> AuditIng
    AuditIng -->|"resolves to 'false'<br/>(new default)"| SkipCheck
    SkipCheck -->|"false (default — bypass)"| SkipBypass
    SkipCheck -->|"true (opt-in — explicit)"| RunAudit
    RunAudit --> Verdict
    Verdict -->|"GO"| CONTINUE
    Verdict -->|"NO GO"| Remediate
    Verdict -->|"error"| ERROR
    Remediate -->|"re-plan loop"| START
    SkipBypass --> CONTINUE

    %% CLASS ASSIGNMENTS %%
    class START,CONTINUE,ERROR terminal;
    class AuditIng handler;
    class SkipCheck,Verdict stateNode;
    class SkipBypass phase;
    class RunAudit detector;
    class Remediate phase;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start, continuation, and error states |
| Teal | State | Decision gates (skip_when_false, GO/NO GO) |
| Orange | Handler | ● Audit ingredient (modified: default flipped to "false") |
| Red | Detector | ● audit-impl skill execution (now opt-in path) |
| Purple | Phase | Bypass path (now default) and remediation routing |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Recipe Invoked])
    GATE([skip_when_false Evaluated])

    subgraph Contracts ["● INGREDIENT CONTRACT DEFINITIONS"]
        direction TB
        ImplYaml["● implementation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        ImplGroupsYaml["● implementation-groups.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        RemediationYaml["● remediation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        MergePrsYaml["● merge-prs.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
    end

    subgraph Resolution ["INIT_ONLY: Ingredient Resolution"]
        direction TB
        CallerSupplied["Caller-supplied value<br/>━━━━━━━━━━<br/>audit='true' (opt-in)<br/>INIT_ONLY — frozen for run"]
        DefaultApplied["● Contract default applied<br/>━━━━━━━━━━<br/>audit='false'<br/>INIT_ONLY — frozen for run"]
    end

    subgraph TestGate ["● CONTRACT VALIDATION (test_bundled_recipes.py)"]
        direction TB
        TestAssert["● test_audit_ingredient_defaults_to_false<br/>━━━━━━━━━━<br/>@pytest.mark.parametrize<br/>asserts audit.default == 'false'<br/>for all 4 recipes"]
    end

    %% FLOW %%
    START -->|"caller passes audit='true'"| CallerSupplied
    START -->|"no audit arg (default)"| DefaultApplied
    ImplYaml --> DefaultApplied
    ImplGroupsYaml --> DefaultApplied
    RemediationYaml --> DefaultApplied
    MergePrsYaml --> DefaultApplied
    CallerSupplied --> GATE
    DefaultApplied --> GATE

    Contracts -.->|"validated by"| TestAssert

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class GATE stateNode;
    class ImplYaml,ImplGroupsYaml,RemediationYaml,MergePrsYaml handler;
    class CallerSupplied detector;
    class DefaultApplied phase;
    class TestAssert gap;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline invocation point |
| Teal | Gate | skip_when_false evaluation (INIT_ONLY field read) |
| Orange | Contract | ● Recipe YAML ingredient contract definitions (modified) |
| Red | Opt-in | Caller-supplied value override (explicit audit='true') |
| Purple | Default | ● Contract default applied (now 'false') |
| Yellow | Test | ● Contract validation test assertion (new) |

Closes #632

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-180825-135856/.autoskillit/temp/make-plan/feat_default_audit_impl_off_plan_2026-04-05_181000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 60.3k | 4.0M | 213.2k | 3 | 24m 25s |
| verify | 82 | 43.0k | 3.9M | 193.2k | 3 | 22m 22s |
| implement | 176 | 53.6k | 10.3M | 221.3k | 3 | 18m 51s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 101 | 60.0k | 3.7M | 168.5k | 3 | 22m 39s |
| review_pr | 71 | 112.5k | 3.4M | 189.2k | 2 | 33m 19s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.5k | 409.5k | 31.4M | 1.3M | | 2h 41m |

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Increase sensitivity to catch quota exhaustion earlier, giving more
buffer before hard API limits are hit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l for Experiment Failures (#636)

## Summary

This plan adds automated failure diagnosis to the research pipeline
(issue #635). There are two distinct requirements:

**DIAG**: Create a `troubleshoot-experiment` skill that reads session
logs and process traces to classify why a research step failed, then
emit a structured diagnostic artifact and `is_fixable` signal. Wire this
skill into `research.yaml` so that `implement_phase` failures route to
it instead of dying at `escalate_stop`.

**SEP**: Fix the structural misuse of `retry-worktree` in
`implement_phase`. The skill `retry-worktree` is designed to *resume*
context-exhausted `implement-worktree` sessions — it is not a primary
implementation driver. The research recipe already has the correct
purpose-built skill: `implement-experiment`, which explicitly forbids
experiment execution during implementation and routes context exhaustion
directly to `run-experiment`. Switching `implement_phase` to use
`implement-experiment` addresses REQ-SEP-001 and REQ-SEP-002 at the
skill level, where the constraint is enforceable.

## Requirements

### DIAG — Experiment Failure Diagnosis

- **REQ-DIAG-001:** The system must provide a skill that investigates
why a research recipe step failed by reading session logs and process
traces.
- **REQ-DIAG-002:** The skill must classify the failure type (stale
timeout, context exhaustion, build failure, data missing, parameter
issue, unknown).
- **REQ-DIAG-003:** The skill must emit a structured diagnostic artifact
that downstream steps or the human can act on.
- **REQ-DIAG-004:** The research recipe must route experiment failures
to the diagnostic skill instead of `escalate_stop`.

### SEP — Structural Separation of Implementation and Execution

- **REQ-SEP-001:** Implementation worktree steps must not perform
experiment execution (benchmarks, profiling, data collection).
- **REQ-SEP-002:** Experiment execution must route through the
`run_experiment` step (or equivalent) which has appropriate timeout and
retry semantics.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([RESEARCH PIPELINE])
    ESCALATE([escalate_stop])
    COMPLETE([research_complete])

    subgraph PhaseMgmt ["Phase Management"]
        plan_phase["● plan_phase<br/>━━━━━━━━━━<br/>make-plan skill<br/>plans current group"]
        implement_phase["● implement_phase<br/>━━━━━━━━━━<br/>implement-experiment<br/>(was: retry-worktree)<br/>stale_threshold: 2400"]
        next_phase{"next_phase_or_experiment<br/>━━━━━━━━━━<br/>more phases?"}
    end

    subgraph DiagPhase ["★ Failure Diagnosis (NEW)"]
        troubleshoot["★ troubleshoot_implement_failure<br/>━━━━━━━━━━<br/>troubleshoot-experiment skill<br/>worktree_path + implement_phase"]
        route_fix{"★ route_implement_failure<br/>━━━━━━━━━━<br/>is_fixable?"}
    end

    subgraph SkillInternals ["★ troubleshoot-experiment Internals"]
        direction TB
        init_idx["★ initialize code-index<br/>━━━━━━━━━━<br/>set_project_path(worktree_path)"]
        session_lookup["★ locate failed session<br/>━━━━━━━━━━<br/>sessions.jsonl<br/>select success=false + cwd match"]
        read_diags["★ read session diagnostics<br/>━━━━━━━━━━<br/>summary.json: termination_reason<br/>write_call_count, exit_code<br/>anomalies.jsonl: kind, severity"]
        classify{"★ classify failure type<br/>━━━━━━━━━━<br/>priority-ordered<br/>decision table"}
        write_diag["★ diagnosis_{ts}.md<br/>━━━━━━━━━━<br/>failure_type, is_fixable<br/>evidence + recommended action"]
        emit_tokens["★ emit output tokens<br/>━━━━━━━━━━<br/>diagnosis_path=<br/>failure_type=<br/>is_fixable="]
    end

    subgraph ExperimentPhase ["Experiment Phase"]
        run_experiment["run_experiment<br/>━━━━━━━━━━<br/>run-experiment skill<br/>stale_threshold: 2400, retries: 2"]
    end

    START --> plan_phase
    plan_phase --> implement_phase

    implement_phase -->|"on_success"| next_phase
    implement_phase -->|"on_failure"| troubleshoot
    implement_phase -->|"on_exhausted / on_context_limit"| run_experiment

    next_phase -->|"more_phases"| plan_phase
    next_phase -->|"done"| run_experiment

    troubleshoot --> init_idx
    init_idx --> session_lookup
    session_lookup -->|"session found"| read_diags
    session_lookup -->|"no session / missing log"| write_diag
    read_diags --> classify

    classify -->|"context_limit → context_exhaustion, fixable=true"| write_diag
    classify -->|"stale + write=0 → stale_timeout, fixable=true"| write_diag
    classify -->|"exit!=0 + build error → build_failure, fixable=true"| write_diag
    classify -->|"infra error / OOM → environment_error, fixable=false"| write_diag
    classify -->|"unknown"| write_diag

    write_diag --> emit_tokens
    emit_tokens --> route_fix

    route_fix -->|"is_fixable=true"| plan_phase
    route_fix -->|"is_fixable=false"| ESCALATE

    troubleshoot -->|"on_failure (skill crash)"| ESCALATE

    run_experiment --> COMPLETE

    class START,ESCALATE,COMPLETE terminal;
    class plan_phase,implement_phase handler;
    class next_phase,route_fix,classify stateNode;
    class troubleshoot,init_idx,session_lookup,read_diags,write_diag,emit_tokens newComponent;
```

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L2_Recipe ["L2 — Recipe System"]
        recipe_io["recipe/io.py<br/>━━━━━━━━━━<br/>load_recipe, builtin_recipes_dir"]
        recipe_validator["recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe"]
        recipe_contracts["recipe/contracts.py<br/>━━━━━━━━━━<br/>contract card generation"]
    end

    subgraph L1_Workspace ["L1 — Workspace"]
        workspace_skills["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>discovers skills_extended/"]
    end

    subgraph L0_Core ["L0 — Core"]
        core_paths["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()<br/>canonical package root"]
    end

    subgraph DataRecipes ["Data — Recipes (YAML)"]
        research_yaml["● recipes/research.yaml<br/>━━━━━━━━━━<br/>implement-experiment (was: retry-worktree)<br/>on_failure → troubleshoot_implement_failure<br/>on_exhausted → run_experiment"]
    end

    subgraph DataContracts ["Data — Contracts (YAML)"]
        skill_contracts["● recipe/skill_contracts.yaml<br/>━━━━━━━━━━<br/>★ troubleshoot-experiment entry<br/>is_fixable output pattern"]
    end

    subgraph DataSkills ["Data — Skills (SKILL.md)"]
        troubleshoot_skill["★ skills_extended/troubleshoot-experiment/<br/>━━━━━━━━━━<br/>session log reader<br/>failure classifier, is_fixable emitter"]
        implement_exp["skills_extended/implement-experiment/<br/>━━━━━━━━━━<br/>no experiment execution<br/>routes exhaustion → run-experiment"]
    end

    subgraph Tests ["Tests"]
        test_diag["★ tests/recipe/test_research_recipe_diag.py<br/>━━━━━━━━━━<br/>validates research.yaml routing<br/>asserts skill_command swap"]
        test_contracts["★ tests/skills/test_troubleshoot_experiment_contracts.py<br/>━━━━━━━━━━<br/>SkillResolver discovery<br/>SKILL.md existence"]
        test_skills_ws["● tests/workspace/test_skills.py<br/>━━━━━━━━━━<br/>skill count +1"]
    end

    recipe_io -->|"loads at runtime"| research_yaml
    recipe_validator -->|"validates"| research_yaml
    recipe_contracts -->|"loads at runtime"| skill_contracts
    research_yaml -->|"skill_command references"| troubleshoot_skill
    research_yaml -->|"skill_command references"| implement_exp
    skill_contracts -->|"contract entry for"| troubleshoot_skill
    workspace_skills -->|"discovers via pkg_root()"| troubleshoot_skill
    workspace_skills -->|"uses"| core_paths
    test_diag -->|"imports"| recipe_io
    test_diag -->|"imports"| recipe_validator
    test_contracts -->|"imports"| workspace_skills
    test_contracts -->|"imports"| core_paths
    test_skills_ws -->|"imports"| workspace_skills

    class recipe_io,recipe_validator,recipe_contracts phase;
    class workspace_skills handler;
    class core_paths stateNode;
    class research_yaml,skill_contracts output;
    class troubleshoot_skill newComponent;
    class implement_exp handler;
    class test_diag,test_contracts newComponent;
    class test_skills_ws handler;
```

Closes #635

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-193031-162971/.autoskillit/temp/make-plan/research_recipe_troubleshoot_plan_2026-04-05_193500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.9k | 93.0k | 4.6M | 271.4k | 4 | 37m 40s |
| verify | 109 | 64.2k | 5.4M | 277.1k | 4 | 28m 55s |
| implement | 224 | 71.2k | 12.5M | 282.5k | 4 | 32m 50s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 131 | 76.9k | 4.8M | 232.2k | 4 | 27m 43s |
| review_pr | 100 | 134.7k | 4.3M | 237.6k | 3 | 38m 8s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 91 | 32.1k | 3.8M | 120.9k | 2 | 21m 36s |
| diagnose_ci | 13 | 1.4k | 161.4k | 15.6k | 1 | 37s |
| resolve_ci | 18 | 3.7k | 293.8k | 29.1k | 1 | 3m 2s |
| **Total** | 3.8k | 542.7k | 40.5M | 1.7M | | 3h 40m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
from pathlib import Path
from typing import Any, Literal

import httpx

[warning] arch: httpx imported at module level — adds a hard runtime dep on every CLI invocation even when update checks are suppressed. Should be deferred inside the function that uses it.


Investigated — this is intentional. Duplicate of 3084082999. (category: false_positive_intentional_pattern)

Comment thread src/autoskillit/cli/_update_checks.py Outdated
)


def _check_plugin_cache_exists(

[warning] bugs: _check_plugin_cache_exists has no exception handler around detect_install(). Any classification error will surface as an unhandled exception rather than a graceful DoctorResult.


Investigated — this is intentional. Duplicate of 3084083006. (category: false_positive_intentional_pattern)

return

async with anyio.create_task_group() as tg:
await tg.start(_watch, tg.cancel_scope)

[warning] bugs: mcp_server.run_async() exception before cancel_scope is cancelled propagates silently in the task group rather than being logged.


Investigated — this is intentional. Duplicate of 3084083010. (category: design_intent_misread)

import sys

from autoskillit.cli._ansi import supports_color
from autoskillit.cli._init_helpers import _require_interactive_stdin

[warning] arch: _timed_input imports from _init_helpers creating a circular-risk peer dependency; _require_interactive_stdin should be inlined or moved to a lower layer.


Investigated — this is intentional. Duplicate of 3084083014. (category: false_positive_intentional_pattern)

Comment thread src/autoskillit/recipe/_analysis.py Outdated
Comment thread src/autoskillit/recipe/_analysis.py
# Deferred import breaks the circular dependency with _analysis.py.
from autoskillit.recipe._analysis import _build_step_graph, extract_blocks # noqa: PLC0415

recipe.blocks = extract_blocks(recipe, _build_step_graph(recipe))

[warning] arch: load_recipe mutates Recipe.blocks after construction via extract_blocks, breaking the dataclass immutability contract. Code that calls _parse_recipe directly (e.g. contracts.py) gets a Recipe with empty blocks, silently.


Investigated — this is intentional. Duplicate of 3084083031. (category: false_positive_intentional_pattern)

Comment thread src/autoskillit/recipe/rules_fixing.py Outdated
Comment thread src/autoskillit/recipe/rules_reachability.py
Comment thread src/autoskillit/recipe/rules_reachability.py Outdated
budget_entry = _budget_for(bctx.block.name)
if "run_cmd" not in budget_entry:
return [] # No run_cmd budget declared for this block — skip check
budget = int(budget_entry["run_cmd"])

[warning] bugs: _check_block_run_cmd_budget: int(budget_entry['run_cmd']) will raise ValueError or TypeError if the YAML value is not coercible to int (e.g. a non-numeric string, None, or a mapping). Should guard with try/except or an explicit type check.

@lru_cache(maxsize=1)
def _block_budgets() -> Mapping[str, Mapping[str, Any]]:
"""Load block_budgets.yaml, cached for the lifetime of the process."""
path = pkg_root() / "recipe" / "block_budgets.yaml"

[warning] defense: _block_budgets() catches FileNotFoundError but not YAMLError or ValueError from load_yaml. A malformed block_budgets.yaml returns empty dict and all block rules silently skip.
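One way to make the loader loud on malformed YAML while keeping a missing file silent. The function and logger names are assumptions; only `yaml.safe_load` and `yaml.YAMLError` are real PyYAML API:

```python
import logging
from functools import lru_cache
from pathlib import Path

import yaml  # PyYAML

log = logging.getLogger(__name__)


@lru_cache(maxsize=1)
def load_block_budgets(path: Path) -> dict:
    """Load block budgets, distinguishing 'file absent' from 'file malformed'.

    A missing file is a normal condition (no budgets configured); a
    malformed file is logged loudly so block rules do not silently stop
    running with an empty budget table.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return {}  # no budgets configured: rules simply skip
    try:
        data = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        log.warning("block_budgets.yaml is malformed, ignoring it: %s", exc)
        return {}
    return data if isinstance(data, dict) else {}
```

The `isinstance` check on the final line also covers the case where the file parses but holds a scalar or list instead of a mapping.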

should never contain these characters, but the guard makes the failure
loud and free.
"""
if "\n" in temp_dir_relpath or ": " in temp_dir_relpath:

[warning] defense: substitute_temp_placeholder only guards against newline and ': '. A path containing # or [ could still produce malformed YAML. The guard comment says 'filesystem paths should never contain these' but this is an assertion, not enforcement.
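A whitelist closes the gap this comment points at more cleanly than extending the blacklist. The pattern and function name below are illustrative assumptions, not the project's actual rule:

```python
import re

# Conservative whitelist: word characters, dots, dashes, and slashes.
# Anything else (newlines, colons, '#', braces, brackets) is rejected
# before it can be spliced into YAML, instead of enumerating each
# dangerous character by hand.
_SAFE_RELPATH = re.compile(r"[\w./-]+")


def assert_yaml_safe_relpath(relpath: str) -> str:
    """Return relpath unchanged, or raise loudly if it could break YAML."""
    if not _SAFE_RELPATH.fullmatch(relpath):
        raise ValueError(f"relpath contains YAML-unsafe characters: {relpath!r}")
    return relpath
```

This keeps the "loud and free" failure the original comment wanted while making the coverage complete by construction.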

@@ -97,6 +123,9 @@ class Recipe:
kitchen_rules: list[str] = field(default_factory=list)
version: str | None = None
experimental: bool = False
requires_packs: list[str] = field(default_factory=list)
# Populated by extract_blocks() during load; empty tuple for recipes with no block: anchors.
blocks: tuple[RecipeBlock, ...] = field(default_factory=tuple)

[warning] arch: Recipe.blocks has a two-phase initialization pattern: _parse_recipe produces empty blocks, while load_recipe populates them. Code calling _parse_recipe directly gets a Recipe with empty blocks silently, with no sentinel or incomplete-state marker.

if isinstance(data, dict):
spec = _parse_experiment_type(data, path)
result[spec.name] = spec
except Exception:

[warning] defense: _load_types_from_dir catches broad Exception and logs a warning, but this also hides TypeError/AttributeError raised by _parse_experiment_type bugs during development. Narrow to (ValueError, TypeError, OSError, KeyError).
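The narrowing suggested above can be sketched like this; `load_specs` and `parse_spec` are stand-ins for the project's loader, not its real API:

```python
import logging

log = logging.getLogger(__name__)

# Catching a narrow tuple keeps genuine data problems quiet while letting
# programming errors (e.g. AttributeError from a refactor) crash loudly
# during development.
PARSE_ERRORS = (ValueError, TypeError, OSError, KeyError)


def parse_spec(data: dict) -> dict:
    """Toy parser: reject specs whose 'steps' field is not a list."""
    if not isinstance(data.get("steps"), list):
        raise ValueError("steps must be a list")
    return data


def load_specs(entries: list[dict]) -> dict:
    """Collect valid specs by name, skipping malformed ones with a warning."""
    result: dict = {}
    for data in entries:
        try:
            name = data["name"]  # KeyError -> skipped with a warning
            result[name] = parse_spec(data)
        except PARSE_ERRORS as exc:
            log.warning("skipping malformed experiment type: %s", exc)
    return result
```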

Comment thread src/autoskillit/recipe/rules_contracts.py
_AUTOSKILLIT_LOG_DIR_ENV = "AUTOSKILLIT_LOG_DIR"


def _read_quota_cache(cache_path_str: str, max_age: int) -> dict | None:

[warning] cohesion: _read_quota_cache is duplicated verbatim from quota_guard.py. These two sibling stdlib hooks share identical logic that should live in _hook_settings.py to avoid drift.

return None


def _resolve_quota_log_dir() -> Path | None:

[warning] cohesion: _resolve_quota_log_dir is duplicated verbatim from quota_guard.py. Same consolidation opportunity.

return None


def _write_quota_log_event(event: dict, log_dir: Path | None) -> None:

[warning] cohesion: _write_quota_log_event is duplicated verbatim from quota_guard.py. Three identical helper functions across two quota hooks with no shared home.

_age = datetime.now(UTC) - _opened_at
if _age.total_seconds() >= _ttl_hours * 3600:
_p.unlink()
except Exception:

[warning] defense: On a corrupt marker file, the sweep calls _p.unlink() — silently deleting a file that may be temporarily unreadable (e.g. EINTR or disk flush). At minimum log before deleting.
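A minimal sketch of the "log before deleting" shape, with assumed names (`sweep_expired_marker` is not the project's actual function):

```python
import logging
from datetime import datetime, timezone
from pathlib import Path

log = logging.getLogger(__name__)


def sweep_expired_marker(path: Path, opened_at: datetime, ttl_hours: float) -> bool:
    """Delete an expired marker file, logging what is removed and why.

    Returns True when the marker was deleted. Logging before unlink
    leaves an audit trail if a temporarily-unreadable file is swept by
    mistake, and the unlink failure itself is logged instead of swallowed.
    """
    age_s = (datetime.now(timezone.utc) - opened_at).total_seconds()
    if age_s < ttl_hours * 3600:
        return False  # still within TTL: leave it alone
    log.info("removing expired marker %s (age %.0fs > ttl %.1fh)", path, age_s, ttl_hours)
    try:
        path.unlink()
    except OSError as exc:
        log.warning("could not remove expired marker %s: %s", path, exc)
        return False
    return True
```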

Comment thread src/autoskillit/hooks/_hook_settings.py Outdated
with os.fdopen(fd, "w", encoding="utf-8") as f:
f.write(payload)
os.replace(tmp, marker_path)
except Exception:

[warning] defense: On exception in _write_kitchen_marker, bare raise re-raises after cleanup, but the caller catches and emits a warning message — the original exception traceback is lost to the user (only str(e) survives). Consider logging traceback to stderr.
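Preserving the traceback at the catch site is a small change; a sketch with an assumed reporter function (the warning text and function name are illustrative):

```python
import sys
import traceback


def report_marker_write_failure(exc: BaseException) -> None:
    """Emit the full traceback to stderr before the friendly one-line warning.

    The caller's current behaviour keeps only str(exc); printing the
    traceback first means the user can still see where the failure started.
    """
    traceback.print_exception(type(exc), exc, exc.__traceback__, file=sys.stderr)
    print(f"warning: could not write kitchen marker: {exc}", file=sys.stderr)
```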

for remote_name in ("upstream", "origin"):
result = _probe_single_remote(source, remote_name)
last_result = result
if result.reason == "ok" and _is_not_file_url(result.url):

[warning] bugs: _probe_clone_source_url: when both upstream and origin time out, the caller gets a single 'timeout' reason with no indication that both probes timed out.
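One way to surface which remotes failed: aggregate per-remote results into a reason string that names every failing remote. `ProbeResult` and `summarize_probe_failures` below are assumptions sketching the idea, not the module's real types:

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    remote: str
    reason: str  # e.g. "ok", "timeout", "missing"
    url: str = ""


def summarize_probe_failures(results: list[ProbeResult]) -> str:
    """Collapse per-remote failures into one reason that still names each
    remote that failed, e.g. 'timeout (upstream, origin)'."""
    if any(r.reason == "ok" for r in results):
        return "ok"
    reasons: dict[str, list[str]] = {}
    for r in results:
        reasons.setdefault(r.reason, []).append(r.remote)
    return "; ".join(
        f"{reason} ({', '.join(names)})" for reason, names in reasons.items()
    )
```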

Comment thread src/autoskillit/workspace/clone_registry.py Outdated
def __init__(self) -> None:
self._resolver = SkillResolver()
def __init__(self, temp_dir_relpath: str = ".autoskillit/temp") -> None:
if "\n" in temp_dir_relpath or ": " in temp_dir_relpath:

[warning] defense: SkillsDirectoryProvider.__init__ validates temp_dir_relpath for newline and ': ' but not for other YAML-unsafe characters such as a bare colon, {}, or []. The guard catches the most dangerous cases but leaves its coverage incomplete.

Trecek and others added 2 commits April 14, 2026 22:21
- cli/_update_checks: _api_sha now tries refs/tags for tag revisions
- config/settings: annotate _EXIT_GRACE_BUFFER_MS as ClassVar[int]
- execution/_process_monitor: cache psutil.Process objects across calls
  so cpu_percent(interval=0) returns meaningful deltas
- hooks/_hook_settings: add ENV_DISABLED env-var override for disabled
- workspace/clone_registry: wrap open+flock in try/except in __enter__
  to prevent fd leak if flock() raises
- recipe/_analysis: extract_blocks accepts precomputed predecessors map
  to avoid duplicate computation; add warning logs for fallback
  entry/exit selection
- recipe/rules_fixing: use deque.popleft() instead of list.pop(0)
- recipe/rules_reachability: use ctx.predecessors in _ancestors();
  _find_capture_producers returns all producers
- recipe/rules_contracts: log warning on unreadable SKILL.md
- server/tools_kitchen: add gate.disable() on start_quota_refresh
  failure for consistency
- server/_factory: make recording ImportError degrade gracefully like
  replay path
- server/_wire_compat: use model_copy() instead of in-place mutation
  to avoid modifying shared FastMCP tool registry objects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update JSON write site allowlist line numbers for clone_registry and
  tools_kitchen after code changes shifted lines
- Wire compat middleware tests: use model_copy mock returns instead of
  in-place mutation expectations
- Process monitor tests: account for two-call priming pattern with
  cached psutil.Process objects; clear module cache between tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Trecek Trecek merged commit bb81d16 into main Apr 15, 2026
2 checks passed