
Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs)#925

Merged
Trecek merged 270 commits into main from integration
Apr 15, 2026

Conversation

Collaborator

Trecek commented Apr 14, 2026

Promotion: integration to main

This promotion merges 382 commits across 198 PRs from the integration branch into main, advancing AutoSkillit from v0.5.2 to v0.8.38 across three minor release cycles. The release delivers a production-grade research pipeline with containerized Micromamba execution, a 31-lens visualization and experiment family, and full output-mode routing. The quota guard was fundamentally redesigned with a dual-window model, while headless session orchestration gained single-skill mode, anomaly detection, idle timeouts, and verdict-gated CI recovery. Merge workflow reliability was hardened with three-way routing and a cheap rebase gate, and the recording/replay infrastructure was rewritten in Rust via PyO3.

Stats: 671 files changed, 99570 insertions(+), 12642 deletions(-) | 102 fixes, 107 features, 7 tests, 1 infra, 4 docs

Highlights

  • Research pipeline overhaul: Containerized Micromamba execution, YAML-driven experiment type registry, 12 vis-lens + 19 exp-lens skill families, plan-visualization, report bundling, and output_mode ingredient (breaking: default changed from implicit pr to local)
  • Quota guard redesign: Dual-window model (short 85% / long 98%) with per-window enable/disable toggles, three-layer resilience, and a new disable_quota_guard MCP tool
  • Verdict-gated CI recovery: resolve-failures emits typed verdicts (real_fix / already_green / flake_suspected / ci_only_failure); recipes route via on_result gates enforced by a new semantic rule
  • Headless session hardening: Single-skill mode, MAX_MCP_OUTPUT_TOKENS injection, D-state/high-CPU anomaly detection, idle_output_timeout per step, structured crash-path error returns
  • Skill system expansion: Skill pack registry, sub-skill dependency activation, prepare-pr/compose-pr decomposition, validate-audit skill, review-design/resolve-design-review skills, first-run guided onboarding

Release Notes

New Features

Research Pipeline

  • Containerized experiment execution via Micromamba; experiments run in isolated conda environments
  • YAML-driven experiment type registry (ExperimentTypeSpec) with 5 bundled types
  • 12 vis-lens visualization skills and 19 exp-lens experiment lens skills
  • plan-visualization step wired post-design-review; output_mode ingredient (local | pr)
  • Post-completion archival phase with artifact merge, re-validation, and escalation routing
  • bundle-local-report skill for offline report packaging
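
The YAML-driven registry can be pictured as one spec file per bundled experiment type. The field names below are illustrative assumptions, not the actual `ExperimentTypeSpec` schema; only the Micromamba-based containerized execution is stated in these notes:

```yaml
# Hypothetical experiment-type spec -- all key names are assumptions,
# not the real ExperimentTypeSpec schema.
name: ablation-study
description: Sweep a single parameter and compare against a baseline
environment:
  manager: micromamba          # containerized execution, per the release notes
  spec_file: environment.yml
outputs:
  - metrics.json
  - figures/
report:
  lenses: [exp-lens-variance, exp-lens-effect-size]   # hypothetical lens names
```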

Quota Guard

  • Dual-window quota model: short window (default 85%) + long window (default 98%) thresholds
  • Per-window enable/disable toggles; three-layer resilience (PreToolUse → PostToolUse → MCP)
  • New disable_quota_guard MCP tool for session-scoped opt-out
  • Background refresh loop (240s) + post-run_skill refresh keeps cache warm
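
As a rough sketch, the dual-window configuration could look like the fragment below. The key names are assumptions; the defaults (85% short window, 98% long window, 240 s refresh) come from the release notes:

```yaml
# Hypothetical quota_guard config -- key names are assumptions;
# the percentage defaults and refresh cadence come from the release notes.
quota_guard:
  short_window:
    enabled: true
    threshold_pct: 85
  long_window:
    enabled: true
    threshold_pct: 98
  refresh_interval_s: 240   # background cache refresh loop
```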

Headless Session Orchestration

  • Single-skill mode for scoped headless sessions
  • MAX_MCP_OUTPUT_TOKENS injected at builder level for all session types
  • D-state process and high-CPU anomaly detection with configurable thresholds
  • idle_output_timeout per-step recipe override; bounded staleness suppression (1800s max)
  • Structured error returns on crash path; contract recovery nudge for missing structured output

Merge Workflow

  • Three-way merge routing for autoMergeAllowed repos (queue / direct / immediate)
  • Cheap rebase gate before conflict-resolution skill invocation
  • Merge queue classifier immunity; queue ejection loop fix
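
A minimal sketch of the three-way routing decision, under the assumption that queue eligibility is validated via the `merge_group` workflow trigger (as PR #500 suggests); the real decision logic may weigh more signals:

```python
def route_merge(auto_merge_allowed: bool, has_merge_group_trigger: bool) -> str:
    """Hypothetical three-way merge routing (queue / direct / immediate).

    Assumed policy: repos with auto-merge and a validated merge-queue
    trigger go through the queue; auto-merge without a queue merges
    directly; everything else falls back to an immediate merge.
    """
    if auto_merge_allowed and has_merge_group_trigger:
        return "queue"
    if auto_merge_allowed:
        return "direct"
    return "immediate"
```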

Skill System

  • Skill pack registry with YAML-defined packs and configurable visibility
  • Sub-skill dependency activation; prepare-pr/compose-pr decomposition replacing open-pr
  • validate-audit, review-design, resolve-design-review, resolve-research-review skills
  • --resume flag for cook and order CLI commands
  • chart-course and check-bearing project-local strategic skills
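
A skill pack definition might look like the fragment below. Everything here is an assumption except the pack name `vis-lens` and the configurable-visibility feature, which appear in these notes:

```yaml
# Hypothetical skill pack definition -- key names and skill names
# are illustrative assumptions.
pack: vis-lens
visibility: opt-in            # configurable visibility, per the notes
skills:
  - vis-lens-overview
  - vis-lens-timeline
```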

Recording/Replay Infrastructure

  • RecordingSubprocessRunner and SequencingSubprocessRunner for api-simulator
  • McpRecordingMiddleware for MCP-level capture; api-simulator rewritten in Rust via PyO3

CLI & Onboarding

  • First-run detection with guided onboarding experience
  • Stale-install detection with auto-detect prompt
  • terminal_guard() alternate-screen buffer envelope; terminal freeze immunity
  • Strict schema validation for config.yaml; stable recipe listing order

Token Telemetry

  • Cross-contamination fix via order_id scoping
  • Token summary table split into 4 distinct API token fields
  • Token summary uses GitHub REST API instead of gh pr edit/view

MCP Server

  • Wire-format sanitization middleware (_wire_compat.py)
  • Startup race fix; editable install guard; lifespan readiness sentinel
  • Signal-guarded server bootstrap

Clone System

  • Clone cleanup registry with session-scoped ownership tagging
  • Clone contamination guard; clone_repo clones from remote URL
  • Deferred cleanup to end of pipeline in process-issues

Pretty Output Hook

  • Typed payload dispatch for PostToolUse hooks → Markdown-KV reformatter
  • Dedicated formatters for run_skill, run_cmd, test_check, merge_worktree, kitchen_status, clone_repo, load_recipe, open_kitchen, list_recipes
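
Typed payload dispatch of this kind is commonly built as a per-tool formatter registry. The sketch below is an assumption about the shape of the mechanism, not the hook's actual code; the tool names are the real ones listed above:

```python
from typing import Callable

# Hypothetical registry: one dedicated Markdown-KV formatter per tool name.
_FORMATTERS: dict[str, Callable[[dict], str]] = {}


def formatter(tool_name: str):
    """Decorator registering a formatter for one tool's PostToolUse payload."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _FORMATTERS[tool_name] = fn
        return fn
    return register


@formatter("run_skill")
def _format_run_skill(payload: dict) -> str:
    # Render selected fields as Markdown key/value lines.
    return f"**skill:** {payload.get('skill')}\n**status:** {payload.get('status')}"


def format_post_tool_use(tool_name: str, payload: dict) -> str:
    """Dispatch to the tool's dedicated formatter, falling back to raw repr."""
    fn = _FORMATTERS.get(tool_name)
    return fn(payload) if fn else repr(payload)
```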

Bug Fixes

102 rectification PRs addressing:

  • Doctor-install disconnect; plugin cache destruction + startup race regression
  • run_cmd env stripping regression; parallel pipeline deadlock (signal handling)
  • Pre-queue routing block; AskUserQuestion guard session-scope immunity
  • Context-limit dirty-tree immunity; stale MCP direct entry lifecycle
  • Headless core crash path (structured error instead of raise)
  • Merge queue watcher inconclusive budget; MCP init race + wire format rejection
  • Queue ejection loop; false stale kills during background Bash tasks
  • Token cross-contamination via order_id scoping
  • Channel B drain-race recovery; session adjudication false-positive
  • Terminal guard ownership contract; stale hook infinite loop
  • And 90+ additional stability and correctness fixes

Test Suite

  • 7 dedicated test improvement PRs (groupA–H): removed tautological tests, consolidated duplicates, fixed xdist isolation, corrected misleading names, eliminated over-mocking
  • 130+ new test files added; comprehensive contract and skill compliance testing

Infrastructure

  • patch-bump-integration.yml workflow for auto-incrementing patch version on PR merge
  • version-bump.yml updated: minor bump on promote (X.Y.Z → X.Y+1.0)
  • api-simulator dev dependency (Rust/PyO3) requires Rust toolchain + GH_PAT
  • Documentation overhauled: topic-based layout with 30+ new docs

Breaking Changes

Migration 0.7.77-to-0.8.0 — Research Recipe Overhaul

  1. write-report renamed to generate-report
  2. output_mode default changed from implicit pr to local
  3. commit_research_artifacts replaced by stage_bundle + finalize_bundle
  4. vis-lens must appear in requires_packs

Migration 0.8.9-to-0.9.0 — Verdict-Gated CI Recovery

  1. Auto-fix skills now declare typed verdict and fixes_applied outputs
  2. Recipes must use on_result: verdict dispatch instead of unconditional on_success: re_push
  3. New conditional-skill-ungated-push semantic rule (ERROR severity) enforces this
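
A migrated recipe step might route verdicts as in the fragment below. The step name and routing targets are assumptions for illustration; the four verdict values come directly from these notes:

```yaml
# Hypothetical recipe fragment -- step and target names are assumptions;
# the verdict values (real_fix, already_green, flake_suspected,
# ci_only_failure) come from the release notes.
- step: resolve_failures
  skill: resolve-failures
  on_result:
    real_fix: re_push
    already_green: pre_resolve_rebase
    flake_suspected: release_issue_failure
    ci_only_failure: release_issue_failure
```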

Other

  • run_skill exit_after_stop_delay_ms reduced from 120s to 2s
  • Quota guard config: single threshold replaced by dual-window model
  • Recipe schema additions: optional_context_refs, stale_threshold, idle_output_timeout, block, requires_packs
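
The new optional fields might appear in a recipe as sketched below; the field names are the real additions listed above, but the step, values, and placement (step-level vs recipe-level) are illustrative assumptions:

```yaml
# Hypothetical step showing the new optional schema fields;
# skill name and values are illustrative only.
- step: run_experiment
  skill: execute-experiment
  idle_output_timeout: 900       # seconds; per-step override
  stale_threshold: 600
  block: research                # new 'block' grouping field
  optional_context_refs:
    - design/plan.md

requires_packs:
  - vis-lens                     # required for research recipes post-0.8.0
```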

Merged PRs

PR Title Author Labels
#442 Rectify: Init Gitignore Completeness Immunity Trecek -
#446 Implementation Plan: Auto-Merge Direct Merge Fallback (Issue #401) Trecek -
#450 Rectify: Token Summary Note-Protocol Immunity Trecek -
#451 Implementation Plan: Orchestrator Must Claim All Issues Upfront Trecek -
#452 Init — Require Explicit Opt-in to Bypass Missing Secret Scanning Hook Trecek -
#453 Rectify: Retry Reason Routing Blindness — PART A ONLY Trecek -
#454 Implementation Plan: Strict Schema Validation for config.yaml Trecek -
#458 Implementation Plan: open-pr-main Token Usage Summary Trecek -
#459 Implementation Plan: Stable Recipe Listing Order Trecek -
#460 Implementation Plan: Orchestrator Must Detect Merge Queue Trecek -
#463 Add Parallel Step Scheduling Rule to Sous-Chef Prompt Trecek -
#464 Implementation Plan: First-Run Detection and Guided Onboarding Trecek -
#465 Rectify: Structured Output Markdown Fragility — PART A ONLY Trecek -
#467 Rectify: Token Telemetry Contamination — PART A ONLY Trecek -
#472 Implementation Plan: Three-Way Merge Routing for autoMergeAllowed Trecek -
#473 Rectify: Root .gitignore Write Path — PART A ONLY Trecek -
#474 Rectify: Secret Scanning Gate Ordering — PART A ONLY Trecek -
#476 Rectify: Config Schema Contamination — PART A ONLY Trecek -
#478 Rectify: Order Parameter Table Breaks — PART A ONLY Trecek -
#482 Relocate temp artifact paths to .autoskillit/temp/ Trecek -
#483 Rectify: Advisory Step Context-Limit Routing — PART B ONLY Trecek -
#484 Rectify: Structured Output Instruction Hardening — PART B ONLY Trecek -
#485 Rectify: Non-Blocking Dispatch Immunity — PART B ONLY Trecek -
#490 Rectify: format_ingredients_table GFM Width Cap Trecek -
#491 Rectify: Hardcoded origin in Skill Bash Blocks — PART A ONLY Trecek -
#492 Implementation Plan: open-pr Strips PART X ONLY Suffix Trecek -
#493 Defer Clone Cleanup Until All Parallel Pipelines Complete Trecek -
#495 Rectify: pretty_output Hook — Typed Payload Dispatch Trecek -
#500 Merge Queue Detection Should Validate merge_group Trigger Trecek -
#502 Rectify: Formatter Raw/Derived Field Duplication Trecek -
#505 Implementation Plan: Configurable Label Whitelist Trecek -
#508 Rectify: stdlib-only Contract for SKILL.md Python Blocks Trecek -
#510 Add --resume flag to cook and order CLI commands Trecek -
#511 Rectify: Terminal Ownership Contract Trecek -
#517 Replace Reverse-Sync Version Bumping with Minor-Bump-on-Promote Trecek -
#518 Detect Stale Installs — Doctor Check + Dev-Mode Install Trecek -
#519 Rectify: MergeQueueWatcher Terminal State via Negative Inference Trecek -
#520 Rectify: TestRunner Protocol Returns Lossy Bare Tuple Trecek -
#521 Implementation Plan: terminal_guard() Alternate Screen Buffer Trecek -
#528 Add validate-audit skill: parallel post-audit finding validation Trecek -
#530 Implementation Plan: Patch-Bump-Integration.yml Race Fix Trecek -
#531 Rectify: Session ID Resolution — Fragmented Sources Trecek -
#534 Rectify: Stale Hook Prompt Infinite Loop Trecek -
#535 Rectify: SkillResult.session_id Channel B Backfill Trecek -
#536 Rectify: terminal_guard() Exit-Only Ownership Contract Trecek -
#542 Rectify: MCP Tool Name Prefix Non-Determinism Trecek -
#543 Rectify: Headless Editable Install Poison Trecek -
#544 Rectify: push_to_remote Non-Fast-Forward Rejection Trecek -
#545 Rectify: Stale-Check Dismiss/Snooze State Split Trecek -
#546 Rectify: Pipeline Identity Layer Trecek -
#548 Fix cook --resume UnusedCliTokensError Trecek -
#549 Implementation Plan: Skill Pack Registry Trecek -
#551 Implementation Plan: token_summary_appender REST API Trecek -
#556 Fix Test Placement, Organization, Documentation (groupG) Trecek -
#557 Remove Tautological/Import-Only Tests — groupA Trecek -
#558 Consolidate and Remove Redundant Tests (groupE) Trecek -
#559 Strengthen Exception-Grade HIGH Findings (groupD) Trecek -
#560 Fix Misleading Test Names and Stale Logic (groupF) Trecek -
#561 Low-Severity Test Suite Fixes (groupH) Trecek -
#562 Remove Over-Mocked Tests and Fix State Mutation (groupC) Trecek -
#563 Fix xdist Isolation — Hardcoded /dev/shm and /tmp (groupB) Trecek -
#564 Resolve architectural audit findings (2026-03-28) Trecek -
#568 Rectify: Session Adjudication False-Positive Trecek -
#569 Rectify: Workspace Clean Worktree Discovery Trecek -
#570 Cohesion audit: server fixes, hook registry, docs Trecek -
#571 Reduce audit-arch false positives with pre-flight gates Trecek -
#573 Bug: prepare-issue Uses Summary Instead of Full Report Trecek -
#574 Token Telemetry Cross-Contamination — order_id Scoping Trecek -
#577 Split promote-to-main into Changelog + Review-Promotion Trecek -
#582 Bug: promote-to-main Incorrectly in Bundled Skills Trecek -
#583 Bundle Research Recipe from spectral-init Trecek -
#585 Recipe requires_packs Schema Extension Trecek -
#587 Bundle 19 Experimental Lens Skills Trecek -
#588 Add review-research-pr skill Trecek -
#594 Simplify Research Recipe — Single-Phase, Always-Decompose Trecek -
#595 Create resolve-research-review Skill Trecek -
#596 Create open-research-pr Skill Trecek -
#597 Evolve Experiment Plan Schema — YAML Frontmatter Trecek -
#598 Create review-design Skill Trecek -
#602 Rectify: review-design L1 Severity Calibration Trecek -
#606 Add resolve-design-review + Eliminate Terminal STOP Dead-End Trecek -
#611 process-issues — Defer Clone Cleanup to End of Pipeline Trecek -
#612 Move smoke-test recipe to project-local Trecek -
#613 Integrate api-simulator for Quota Guard E2E Testing Trecek -
#614 Add Red-Team Severity Calibration by Experiment Type Trecek -
#615 Split Token Summary Table Into 4 API Token Fields Trecek -
#616 Rectify: Zero-Write False Positive — Completion Token Contract Trecek -
#620 Research Recipe — Post-Review Re-Validation + Escalation Trecek -
#622 Channel B Drain-Race Recovery for Deferred type=result Trecek -
#623 Clone Contamination Guard Trecek -
#624 Test Session Failure Classification with api-simulator Trecek -
#625 Research Recipe — Post-Completion Archival Trecek -
#626 Fix git auth for private deps in version-bump workflows Trecek -
#628 Queue Ejection Loop Fix Trecek -
#630 Fix review-design Threshold + Scope Drift Trecek -
#633 Fix False Stale Kills During Background Bash Tasks Trecek -
#634 Default audit-impl to OFF in All Recipes Trecek -
#636 Research Recipe — Troubleshoot/Diagnose Skill Trecek -
#639 Rectify: Quota Guard Cache Refresh Lifecycle Trecek -
#640 Rectify: Sub-Skill Refusal Handling Trecek -
#642 Tier 2 Sub-Skill Dependency Activation Trecek -
#648 Archive Research Artifacts into .tar.gz Trecek -
#649 Rectify: Per-Invocation Completion Marker Isolation Trecek -
#650 Rectify: review-pr Posts Zero Inline Comments Trecek -
#651 Migrate to Rust-based api-simulator (PyO3 Rewrite) Trecek -
#656 Decompose open-research-pr into prepare + lens + compose Trecek -
#658 Rectify: research recipe archives wrong directory Trecek -
#659 Decompose open-pr into prepare-pr + lenses + compose-pr Trecek -
#660 Rewrite Smoke-Test as Lightweight E2E Sanity Check Trecek -
#661 Citation Integrity Gates for Research Pipeline Trecek -
#665 Rectify: CI Event Discrimination Trecek -
#668 Quota Guard Three-Layer Resilience Trecek -
#670 Rectify: Ghost Hook Registrations Survive Git Revert Trecek -
#671 Fix resolve-review misclassifies protocol deviations Trecek -
#675 Dynamic archive collection in research recipe Trecek -
#678 Fix VT100 reset terminal corruption Trecek -
#679 Move TOOL_CATEGORIES from L0 Core to L3 Server Trecek -
#682 Rectify: Stale-Hooks Check Infinite Prompt Loop Trecek -
#683 Rectify: Terminal Guard Reset Specification Trecek -
#684 Add RecordingSubprocessRunner Trecek -
#685 Add SequencingSubprocessRunner for Scenario Replay Trecek -
#687 Add Project-Scoped Full-Audit Recipe Trecek -
#691 Pin api-simulator Dependency Trecek -
#694 STOP Verdict Fail-Fast Gate — ADDRESSABLE Classification Trecek -
#695 Add Computational Complexity to Scope Skill Trecek -
#696 Add agent_implementability to review-design Trecek -
#699 Bump fastmcp and dynaconf Pins Trecek -
#701 Rectify: Smoke-Test Workspace Isolation Trecek -
#705 Rectify: merge_worktree Merges Into Wrong Branch Trecek -
#706 Rectify: Quota Guard Multi-Window Selection Trecek -
#708 Rectify: Quota Dataclass Type Boundary Enforcement Trecek -
#713 Documentation Overhaul — Topic-Based Layout Trecek -
#714 Configurable Temp Directory via Placeholder Substitution Trecek -
#715 Fix zero_writes False Positive for Research Skills Trecek -
#727 Wire McpRecordingMiddleware into MCP Server Trecek -
#728 Source-drift gate + open_kitchen envelope + quota cache versioning Trecek -
#731 Rectify: IDE Env Leak Across Subprocess Launch Sites Trecek -
#732 Single-Skill Mode for Headless Sessions Trecek -
#733 Per-window quota guard thresholds Trecek -
#734 Fix config-resolved ingredient defaults override Trecek -
#735 Rectify: Stdout Idle Watchdog + Bounded Suppression Trecek -
#736 Add Exception Whitelists to Project-Local Audit Skills Trecek -
#737 Validated Audit Report — arch Remediation Trecek -
#740 Rectify: prepare-issue Validated-Report Pipeline Trecek -
#746 Validated Audit Report — cohesion Trecek -
#748 Validated Audit Report — tests Trecek -
#749 Rectify: SIGTERM Bypasses atexit Trecek -
#750 Rectify: Channel B Blind Spot + AskUserQuestion Deviation Trecek -
#751 Rectify: Session-Bridge File Outside temp/ Trecek -
#752 Contract Recovery Nudge Trecek -
#753 Consolidate quota guard threshold configuration Trecek -
#754 Investigate Historical Recurrence Check Trecek -
#761 Clone Cleanup Registry — Session-Scoped Ownership Trecek -
#762 Fix route_queue_mode autoMergeAllowed=false Trecek -
#764 Rectify: Pre-Kitchen AskUserQuestion Gate Trecek -
#766 Rectify: Clone Registry Batch Delete Write-Back Trecek -
#767 Detect D-state and High-CPU Anomalies Trecek -
#768 CLI Update Prompts — Single Source of Truth Trecek -
#769 Per-window enable/disable for quota_guard Trecek -
#776 Rectify: Trace Identity Contract Trecek -
#781 vis-lens family, plan-visualization, output_mode Trecek -
#783 Rectify: CLI Startup Update Prompt Freeze Immunity Trecek -
#793 idle_output_timeout as Per-Step Recipe Override Trecek -
#794 Generalize Scope Skill for Non-Code Research Trecek -
#795 Experiment Type Registry — YAML-Driven, Extensible Trecek -
#796 Generalize generate-report for Non-Software Research Trecek -
#797 Ensure Research Experiments Include Test Infrastructure Trecek -
#798 Data Staging and Resource Planning Skill Trecek -
#799 Research Environment Isolation — Containerized Execution Trecek -
#803 Rectify: MCP Server Startup Race Trecek -
#808 Rectify: channel_won Unconditional SIGKILL Trecek -
#809 Rectify: PTY Wrapper Tracer PID Resolution Trecek -
#812 Rectify: clone_repo Silent Local-Transport Fallback Trecek -
#813 Rectify: Merge Queue Classifier Immunity Trecek -
#818 Rectify: Pre-Queue Routing Block Immunity Trecek -
#821 Rectify: Parallel Pipeline Deadlock — Signal Handling Trecek -
#860 Cheap Rebase Gate Before Conflict-Resolution Skill Trecek -
#861 resolve-failures: CI source of truth; kill bypass Trecek -
#862 resolve-failures — Test Polling Cascade Trecek -
#893 Generalize headless narration suppression Trecek -
#896 Rectify: Stale MCP Direct Entry Lifecycle Trecek -
#897 Rectify: Headless Core Crash Path Trecek -
#899 GitHub API resilience — token_factory fallback Trecek -
#900 Rectify: AskUserQuestion Guard Session-Scope Immunity Trecek -
#901 Rectify: Context-Limit Dirty-Tree Immunity Trecek -
#903 clone_repo: clone from remote URL Trecek -
#905 Auto-derive step_name when recording is active Trecek -
#906 run_cmd bypasses recorder during recording Trecek -
#908 Strip %%ORDER_UP%% — Prompt-Level Injection Trecek -
#909 Rectify: Doctor Plugin Cache + Startup Race Trecek -
#910 Inject MAX_MCP_OUTPUT_TOKENS into Order Sessions Trecek -
#917 skill_cmd_guard Suggests Relocating Extra Context Trecek -
#918 Rectify: Merge Queue Watcher Inconclusive Budget Trecek -
#920 Rectify: MCP Init Race + Wire Format Rejection Trecek -
#921 Fix run_cmd env stripping regression Trecek -
#922 Inject MAX_MCP_OUTPUT_TOKENS at builder level Trecek -
#923 Add MCP tool to disable quota guard Trecek -
#924 Rectify: Doctor-Install Disconnect Trecek -

Attention Required

  • api-simulator users: Rust toolchain + GH_PAT environment variable required to build
  • Custom research recipes: Apply migration 0.7.77-to-0.8.0 — without it, output defaults to local mode (no PR creation)
  • Pipeline recipes with CI recovery: Apply migration 0.8.9-to-0.9.0 — unrouted pushes now fail validation
  • Quota guard config: Old single-threshold format no longer valid; update to dual-window schema
  • run_skill callers: exit_after_stop_delay_ms reduced from 120s to 2s

Architecture Impact

Module Dependency (Structural — "How are modules coupled?")

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — APPLICATION"]
        direction LR
        SERVER["● server/<br/>━━━━━━━━━━<br/>FastMCP tools (18 files)<br/>Fan-in: 17"]
        CLI["● cli/<br/>━━━━━━━━━━<br/>CLI entry points (22 files)<br/>★ _onboarding, _update_checks<br/>★ _serve_guard, _terminal"]
    end

    subgraph L2 ["L2 — DOMAIN"]
        direction LR
        RECIPE["● recipe/<br/>━━━━━━━━━━<br/>Schema + validation (35 files)<br/>Fan-in: 40<br/>★ rules_blocks, rules_packs<br/>★ experiment_type_registry"]
        MIGRATION["● migration/<br/>━━━━━━━━━━<br/>Version migrations (5 files)<br/>★ 0.7.77→0.8.0<br/>★ 0.8.9→0.9.0"]
    end

    subgraph L1 ["L1 — SERVICES"]
        direction LR
        CONFIG["● config/<br/>━━━━━━━━━━<br/>Dynaconf settings (3 files)<br/>Fan-in: 20"]
        PIPELINE["● pipeline/<br/>━━━━━━━━━━<br/>DI + telemetry (9 files)<br/>Fan-in: 14<br/>★ background.py"]
        EXECUTION["● execution/<br/>━━━━━━━━━━<br/>Headless + process (21 files)<br/>Fan-in: 19<br/>★ recording, clone_guard<br/>★ _headless_scan"]
        WORKSPACE["● workspace/<br/>━━━━━━━━━━<br/>Clone + skills (7 files)<br/>Fan-in: 14<br/>★ clone_registry<br/>★ worktree"]
    end

    subgraph L0 ["L0 — FOUNDATION"]
        direction LR
        CORE["● core/<br/>━━━━━━━━━━<br/>Types + IO (15 files)<br/>Fan-in: 109<br/>★ _claude_env, readiness<br/>★ kitchen_state"]
    end

    subgraph STANDALONE ["STANDALONE — HOOKS"]
        direction LR
        HOOKS["● hooks/<br/>━━━━━━━━━━<br/>Pre/PostToolUse (19 files)<br/>★ pretty_output_hook<br/>★ quota_post_hook<br/>★ token_summary_hook"]
        HOOKREG["● hook_registry.py"]
    end

    subgraph EXT ["EXTERNAL"]
        direction LR
        FASTMCP["fastmcp"]
        HTTPX["httpx"]
        ANYIO["anyio"]
    end

    SERVER -->|"recipe, migration"| RECIPE
    SERVER -->|"migration"| MIGRATION
    CLI -->|"recipe, migration"| RECIPE
    SERVER -->|"pipeline, execution,<br/>workspace, config"| PIPELINE
    CLI -->|"config, execution,<br/>workspace"| CONFIG
    SERVER -->|"core (12 files)"| CORE
    CLI -->|"core (10 files)"| CORE
    RECIPE -->|"core (20 files)"| CORE
    MIGRATION -->|"core (3 files)"| CORE
    RECIPE -.->|"workspace (deferred)"| WORKSPACE
    CONFIG -->|"core"| CORE
    PIPELINE -->|"core"| CORE
    EXECUTION -->|"core"| CORE
    WORKSPACE -->|"core"| CORE
    EXECUTION -.->|"config (deferred)"| CONFIG
    HOOKREG -->|"core"| CORE
    HOOKS -->|"hook_registry"| HOOKREG
    SERVER -.->|"⚠ cli (hard L3→L3)"| CLI
    SERVER --> FASTMCP
    EXECUTION --> HTTPX
    CLI --> ANYIO

    class SERVER,CLI cli;
    class RECIPE,MIGRATION phase;
    class CONFIG,PIPELINE,EXECUTION,WORKSPACE handler;
    class CORE stateNode;
    class HOOKS,HOOKREG newComponent;
    class FASTMCP,HTTPX,ANYIO integration;
```

Legend: Dark Blue = L3 App | Purple = L2 Domain | Orange = L1 Services | Teal = L0 Foundation | Green = Hooks | Red = External | Dashed = Deferred/violation

Process Flow (Physiological — "How does it behave?")

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([START])
    COMPLETE([RECIPE COMPLETE])
    ESCALATE([ESCALATE TO USER])

    subgraph KITCHEN ["★ Kitchen Lifecycle"]
        OPEN["● open_kitchen<br/>━━━━━━━━━━<br/>Prime quota cache<br/>Start refresh loop"]
        LOAD["● load_recipe<br/>━━━━━━━━━━<br/>YAML → Recipe"]
    end

    subgraph SOUSCHEF ["● Sous-Chef Loop"]
        STEP{"● Step Eval<br/>━━━━━━━━━━<br/>skip? retries?"}
        QUOTA{"★ Quota Gate<br/>━━━━━━━━━━<br/>Dual-window"}
        DISPATCH["● run_skill"]
    end

    subgraph HEADLESS ["● Headless Session"]
        SPAWN["● run_managed_async<br/>━━━━━━━━━━<br/>anyio task group"]
        RACE{"● Channel Race<br/>━━━━━━━━━━<br/>A: stdout | B: JSONL<br/>★ idle_timeout"}
        CLASSIFY["● Result Classification<br/>━━━━━━━━━━<br/>★ Recovery pipeline<br/>★ Zero-write gate"]
    end

    subgraph VERDICT ["★ Verdict Routing"]
        ROUTE{"● on_result<br/>━━━━━━━━━━<br/>Typed dispatch"}
        REAL_FIX["re_push"]
        GREEN["★ pre_resolve_rebase"]
        HUMAN["release_issue_failure"]
        CI["● wait_for_ci"]
    end

    subgraph MERGE ["● Merge Workflow"]
        MERGE_EVAL{"● route_queue_mode<br/>━━━━━━━━━━<br/>★ 3-way routing"}
        QUEUE["queue path"]
        DIRECT["direct merge"]
        IMMEDIATE["★ immediate"]
    end

    START --> OPEN --> LOAD --> STEP
    STEP -->|"skip=false"| QUOTA
    QUOTA -->|"allowed"| DISPATCH
    QUOTA -->|"★ blocked"| QUOTA
    DISPATCH --> SPAWN --> RACE --> CLASSIFY
    CLASSIFY -->|"success"| ROUTE
    CLASSIFY -->|"needs_retry"| STEP
    CLASSIFY -->|"budget exhausted"| ESCALATE
    ROUTE -->|"on_success"| MERGE_EVAL
    ROUTE -->|"★ real_fix"| REAL_FIX
    ROUTE -->|"★ already_green"| GREEN
    ROUTE -->|"★ flake/ci_only"| HUMAN
    REAL_FIX --> CI
    GREEN --> CI
    CI -->|"green"| MERGE_EVAL
    CI -->|"failure"| ROUTE
    HUMAN --> ESCALATE
    MERGE_EVAL -->|"queue+trigger"| QUEUE
    MERGE_EVAL -->|"auto OK"| DIRECT
    MERGE_EVAL -->|"★ neither"| IMMEDIATE
    QUEUE --> COMPLETE
    DIRECT --> COMPLETE
    IMMEDIATE --> COMPLETE

    class START,COMPLETE,ESCALATE terminal;
    class OPEN,LOAD,DISPATCH,SPAWN,CLASSIFY handler;
    class STEP,QUOTA,RACE,MERGE_EVAL stateNode;
    class ROUTE,CI phase;
    class REAL_FIX,GREEN,HUMAN,QUEUE,DIRECT,IMMEDIATE newComponent;
```

Legend: Dark Blue = Terminal | Teal = Decisions | Orange = Processing | Purple = Verdict routing | Green = New components

C4 Container (Anatomical — "How is it built?")

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    USER(["Developer<br/>━━━━━━━━━━<br/>Claude Code user"])

    subgraph APP ["APPLICATION"]
        direction LR
        CLI_APP["● CLI<br/>━━━━━━━━━━<br/>cyclopts + anyio<br/>★ onboarding, update"]
        MCP["● MCP Server<br/>━━━━━━━━━━<br/>FastMCP v3 (stdio)<br/>★ wire_compat, lifespan"]
        CHEF["● Sous-Chef<br/>━━━━━━━━━━<br/>Tier 1 Claude<br/>★ wavefront, verdicts"]
    end

    subgraph HOOKS_L ["★ HOOKS"]
        direction LR
        PRE["★ PreToolUse<br/>━━━━━━━━━━<br/>quota_guard<br/>ask_user_guard"]
        POST["★ PostToolUse<br/>━━━━━━━━━━<br/>pretty_output<br/>quota_post, token_summary"]
    end

    subgraph DOMAIN ["DOMAIN"]
        direction LR
        RECIPE["● Recipe Engine<br/>━━━━━━━━━━<br/>igraph + YAML<br/>★ 7 new rule modules<br/>★ experiment types"]
        MIGR["● Migration<br/>━━━━━━━━━━<br/>★ 0.7.77→0.8.0<br/>★ 0.8.9→0.9.0"]
    end

    subgraph SERVICE ["SERVICES"]
        direction LR
        EXEC["● Execution<br/>━━━━━━━━━━<br/>anyio + psutil<br/>★ recording, anomaly<br/>★ idle timeout"]
        WS["● Workspace<br/>━━━━━━━━━━<br/>★ clone_registry<br/>★ worktree"]
        PIPE["● Pipeline DI<br/>━━━━━━━━━━<br/>★ background"]
        CONF["● Config<br/>━━━━━━━━━━<br/>dynaconf<br/>★ dual-window quota"]
    end

    subgraph FOUND ["FOUNDATION"]
        CORE["● Core<br/>━━━━━━━━━━<br/>structlog + PyYAML<br/>★ _claude_env, readiness"]
    end

    subgraph STORE ["STORAGE"]
        direction LR
        RECIPES[("● Recipes<br/>━━━━━━━━━━<br/>★ research.yaml<br/>★ experiment-types/")]
        SKILLS[("● Skills<br/>━━━━━━━━━━<br/>120+ SKILL.md<br/>★ 60+ new")]
        LOGS[("Session Logs")]
        CACHE[("★ Quota Cache")]
    end

    subgraph EXT ["EXTERNAL"]
        direction LR
        CLAUDE["Claude CLI<br/>━━━━━━━━━━<br/>subprocess + PTY"]
        GH["GitHub API<br/>━━━━━━━━━━<br/>REST + GraphQL"]
        ANTH["Anthropic API<br/>━━━━━━━━━━<br/>Token quota"]
    end

    USER -->|"CLI / MCP stdio"| CLI_APP
    CLI_APP -->|"starts"| MCP
    MCP -->|"injects"| CHEF
    CHEF -->|"MCP tools"| MCP
    CHEF -.->|"intercept"| PRE
    MCP -.->|"intercept"| POST
    MCP -->|"loads"| RECIPE
    MCP -->|"migrates"| MIGR
    MCP -->|"spawns"| EXEC
    MCP -->|"isolates"| WS
    MCP -->|"injects"| PIPE
    MCP -->|"reads"| CONF
    RECIPE --> CORE
    EXEC --> CORE
    WS --> CORE
    CONF --> CORE
    PIPE --> CORE
    EXEC -->|"reads"| RECIPES
    WS -->|"reads"| SKILLS
    EXEC -->|"writes"| LOGS
    EXEC -->|"reads/writes"| CACHE
    PRE -->|"reads"| CACHE
    EXEC -->|"subprocess"| CLAUDE
    EXEC -->|"CI/merge queue"| GH
    EXEC -->|"quota"| ANTH
    WS -->|"git"| GH

    class USER,CLI_APP,MCP,CHEF cli;
    class RECIPE,MIGR phase;
    class EXEC,WS,PIPE,CONF handler;
    class CORE stateNode;
    class PRE,POST newComponent;
    class RECIPES,SKILLS,LOGS,CACHE output;
    class CLAUDE,GH,ANTH integration;
```

Legend: Dark Blue = Application | Purple = Domain | Orange = Services | Teal = Foundation | Green = New hooks | Dark Teal = Storage | Red = External

Closes #401
Closes #427
Closes #429
Closes #439
Closes #440
Closes #441
Closes #444
Closes #445
Closes #447
Closes #448
Closes #449
Closes #456
Closes #457
Closes #461
Closes #462
Closes #466
Closes #468
Closes #469
Closes #470
Closes #471
Closes #475
Closes #477
Closes #480
Closes #481
Closes #486
Closes #487
Closes #488
Closes #494
Closes #498
Closes #499
Closes #503
Closes #504
Closes #506
Closes #507
Closes #509
Closes #512
Closes #513
Closes #514
Closes #516
Closes #522
Closes #524
Closes #525
Closes #526
Closes #527
Closes #529
Closes #532
Closes #533
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #547
Closes #550
Closes #553
Closes #554
Closes #555
Closes #565
Closes #566
Closes #567
Closes #572
Closes #576
Closes #579
Closes #589
Closes #590
Closes #591
Closes #592
Closes #593
Closes #599
Closes #600
Closes #601
Closes #603
Closes #604
Closes #605
Closes #607
Closes #608
Closes #609
Closes #610
Closes #617
Closes #618
Closes #619
Closes #621
Closes #627
Closes #629
Closes #631
Closes #632
Closes #635
Closes #637
Closes #638
Closes #641
Closes #643
Closes #644
Closes #646
Closes #647
Closes #652
Closes #653
Closes #655
Closes #657
Closes #662
Closes #663
Closes #664
Closes #666
Closes #669
Closes #672
Closes #673
Closes #676
Closes #680
Closes #681
Closes #686
Closes #688
Closes #690
Closes #692
Closes #693
Closes #697
Closes #698
Closes #700
Closes #703
Closes #704
Closes #707
Closes #710
Closes #711
Closes #712
Closes #716
Closes #717
Closes #718
Closes #719
Closes #721
Closes #723
Closes #724
Closes #725
Closes #729
Closes #739
Closes #741
Closes #742
Closes #744
Closes #745
Closes #747
Closes #755
Closes #756
Closes #757
Closes #758
Closes #759
Closes #760
Closes #763
Closes #771
Closes #774
Closes #775
Closes #777
Closes #778
Closes #784
Closes #785
Closes #786
Closes #787
Closes #788
Closes #789
Closes #790
Closes #801
Closes #802
Closes #804
Closes #805
Closes #806
Closes #807
Closes #811
Closes #814
Closes #815
Closes #816
Closes #817
Closes #819
Closes #859
Closes #892
Closes #894
Closes #895
Closes #902
Closes #904
Closes #907
Closes #911
Closes #912
Closes #913
Closes #914
Closes #915
Closes #916
Closes #919

Generated with Claude Code via AutoSkillit

Trecek and others added 30 commits April 4, 2026 21:25
…rminal STOP Dead-End (#606)

## Summary

The research recipe's `review_design` step currently hard-routes
`verdict=STOP` directly to `design_rejected` (pipeline halt), bypassing
any analysis of whether the stop triggers are actually fixable. This
causes unnecessary pipeline deaths when stop triggers are mechanical
methodological flaws with concrete fixes (as shown in
TalonT-Org/spectral-init#222).

This plan adds:
1. A new `resolve-design-review` skill that triages each stop-trigger
finding as `ADDRESSABLE`, `STRUCTURAL`, or `DISCUSS` using parallel
feasibility-validation subagents, then emits either `resolution=revised`
(loop back for revision) or `resolution=failed` (genuinely terminal)
2. A new `resolve_design_review` recipe step in `research.yaml` that
routes `STOP → resolve_design_review` instead of directly to
`design_rejected`
3. A skill contract entry for `resolve-design-review` in
`skill_contracts.yaml`
4. Updated tests: fix the existing STOP-routing assertion and add new
tests for the step and skill
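
The routing decision the new skill emits can be sketched as follows. This is an illustrative model only, with assumed names (`Finding`, `resolve_design_review`); the actual skill classifies findings via parallel feasibility-validation subagents rather than a pre-labeled list.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One stop-trigger finding, already triaged (hypothetical shape)."""
    summary: str
    category: str  # "ADDRESSABLE" | "STRUCTURAL" | "DISCUSS"

def resolve_design_review(findings: list[Finding]) -> str:
    """Emit 'revised' if any finding is fixable or debatable; emit
    'failed' only when every stop trigger is structural."""
    if any(f.category in ("ADDRESSABLE", "DISCUSS") for f in findings):
        return "revised"  # loop back through revise_design
    return "failed"       # genuinely terminal -> design_rejected

findings = [
    Finding("baseline lacks variance estimate", "ADDRESSABLE"),
    Finding("research question unfalsifiable as posed", "STRUCTURAL"),
]
print(resolve_design_review(findings))  # revised
```

With this gate, a pipeline only reaches `design_rejected` when all stop triggers are classified STRUCTURAL.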

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    REJECTED([design_rejected<br/>action: stop])
    EXEC([create_worktree<br/>→ Execution Phase])

    subgraph DesignPhase ["Research Recipe — Design Phase"]
        direction TB
        scope["scope<br/>━━━━━━━━━━<br/>Scope research question"]
        plan["plan_experiment<br/>━━━━━━━━━━<br/>Plan experiment<br/>(receives revision_guidance)"]
        review["● review_design<br/>━━━━━━━━━━<br/>Validate plan<br/>retries: 2"]
        revise["revise_design<br/>━━━━━━━━━━<br/>Route → plan_experiment"]
        rdr["★ resolve_design_review<br/>━━━━━━━━━━<br/>Triage STOP findings<br/>retries: 1"]
        triage{"★ Triage<br/>━━━━━━━━━━<br/>Any ADDRESSABLE<br/>or DISCUSS?"}
    end

    %% FLOW %%
    START --> scope
    scope --> plan
    plan --> review
    review -->|"verdict=GO"| EXEC
    review -->|"verdict=REVISE"| revise
    revise --> plan
    review -->|"● verdict=STOP<br/>(was: design_rejected)"| rdr
    rdr --> triage
    triage -->|"resolution=revised<br/>any ADDRESSABLE/DISCUSS"| revise
    triage -->|"resolution=failed<br/>all STRUCTURAL"| REJECTED

    %% CLASS ASSIGNMENTS %%
    class START,REJECTED,EXEC terminal;
    class scope,plan handler;
    class review,revise stateNode;
    class rdr,triage newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | START, design_rejected halt, create_worktree handoff |
| Orange | Handler | Existing processing steps (scope, plan_experiment) |
| Teal | State | Existing routing/decision nodes (review_design, revise_design) |
| Green | New Component | ★ New resolve_design_review step + triage logic |

Closes #605

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-132147-193877/.autoskillit/temp/make-plan/resolve_design_review_plan_2026-04-04_132804.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 39 | 23.7k | 1.5M | 1 | 8m 6s |
| verify | 23 | 12.0k | 937.2k | 1 | 4m 21s |
| implement | 56 | 16.1k | 2.7M | 1 | 7m 30s |
| fix | 25 | 9.1k | 879.4k | 1 | 5m 58s |
| audit_impl | 17 | 14.8k | 356.6k | 1 | 5m 57s |
| open_pr | 24 | 12.9k | 799.4k | 1 | 4m 41s |
| **Total** | 184 | 88.6k | 7.2M | | 36m 35s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ipeline (#611)

## Summary

Every recipe (`implementation`, `remediation`, `implementation-groups`,
`merge-prs`) previously had an interactive `confirm_cleanup` prompt at
its terminal step. When `process-issues` drives batch processing, this
halted the pipeline waiting for user input. A `defer_cleanup` flag was
designed to bypass it, but made "interrupt the pipeline" the default and
"don't interrupt" the opt-in.

The fix: remove the interactive cleanup path entirely from all recipes.
Every terminal step unconditionally calls `register_clone_status`
(success or failure), writing to a shared registry file. After all
issues in `process-issues` complete, a single `batch_cleanup_clones`
call deletes all success-status clones and preserves all error-status
clones. No prompts. No flags. No per-issue decisions.
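
The register-then-sweep flow above can be approximated with a short sketch. This is schematic, not the shipped implementation: the registry path matches the diagram below, but function signatures and the JSON shape are assumptions.

```python
import json
import shutil
from pathlib import Path

# Registry location from this PR; entry shape is an assumption.
REGISTRY = Path(".autoskillit/temp/clone-cleanup-registry.json")

def register_clone_status(clone_path: str, status: str) -> None:
    """Append-only registration at each recipe terminal step."""
    data = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {"entries": []}
    data["entries"].append({"clone_path": clone_path, "status": status})
    REGISTRY.parent.mkdir(parents=True, exist_ok=True)
    REGISTRY.write_text(json.dumps(data))

def batch_cleanup_clones() -> tuple[list[str], list[str]]:
    """After all issues complete: delete success clones, preserve
    error clones for investigation. One call, no prompts."""
    entries = json.loads(REGISTRY.read_text())["entries"]
    deleted = [e["clone_path"] for e in entries if e["status"] == "success"]
    preserved = [e["clone_path"] for e in entries if e["status"] == "error"]
    for path in deleted:
        shutil.rmtree(path, ignore_errors=True)
    return deleted, preserved
```

The key property is that the write side is unconditional at every terminal step, so the read side needs no per-issue flags.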

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([● process-issues starts batch])

    subgraph PerIssue ["Per-Issue Recipe (× N issues)"]
        direction TB
        RECIPE["● Recipe Pipeline<br/>━━━━━━━━━━<br/>implementation / remediation<br/>implementation-groups / merge-prs<br/>plan → implement → test → push → PR → wait"]
        OUTCOME{"terminal<br/>outcome?"}
        REL_S["● release_issue_success<br/>━━━━━━━━━━<br/>release GitHub issue claim<br/>on_success/on_failure → register"]
        REL_F["● release_issue_failure<br/>━━━━━━━━━━<br/>release on error<br/>on_success/on_failure → register_failure"]
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='success'<br/>on_success/on_failure → done"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='error'<br/>on_success/on_failure → escalate_stop"]
        DONE["● done<br/>━━━━━━━━━━<br/>action: stop (success)"]
        FAIL["● escalate_stop<br/>━━━━━━━━━━<br/>action: stop (failure)"]
    end

    REGISTRY[("● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>.autoskillit/temp/<br/>accumulated entries")]

    subgraph PostBatch ["● After ALL Batches Complete (process-issues Step 3d)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads registry<br/>deletes status=success clones<br/>preserves status=error clones<br/>no prompt, one call"]
        PRESERVED["preserved clones<br/>━━━━━━━━━━<br/>status=error kept<br/>for investigation"]
        DELETED["deleted clones<br/>━━━━━━━━━━<br/>status=success removed<br/>disk reclaimed"]
    end

    END_OK([COMPLETE])

    START --> RECIPE
    RECIPE --> OUTCOME
    OUTCOME -->|"success path"| REL_S
    OUTCOME -->|"failure path"| REL_F
    REL_S --> REG_S
    REL_F --> REG_F
    REG_S -->|"writes status=success"| REGISTRY
    REG_F -->|"writes status=error"| REGISTRY
    REG_S --> DONE
    REG_F --> FAIL
    DONE -->|"after all issues done"| BATCH
    FAIL -->|"after all issues done"| BATCH
    BATCH -->|"reads registry"| REGISTRY
    BATCH --> PRESERVED
    BATCH --> DELETED
    DELETED --> END_OK
    PRESERVED --> END_OK

    class START,END_OK terminal;
    class RECIPE handler;
    class OUTCOME stateNode;
    class REL_S,REL_F phase;
    class REG_S,REG_F,BATCH newComponent;
    class DONE phase;
    class FAIL detector;
    class REGISTRY stateNode;
    class PRESERVED,DELETED output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and end states |
| Orange | Handler | Recipe pipeline execution |
| Teal | State | Decision routing and registry storage |
| Purple | Phase | Control flow nodes (release, done) |
| Green | New/Modified | ● Modified steps (register, batch cleanup) |
| Red | Detector | Failure terminal (escalate_stop) |
| Dark Teal | Output | Clone disposition artifacts |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START([Pipeline Terminal Step])

    subgraph WritePath ["● WRITE: Recipe Terminal Registration (once per clone)"]
        direction LR
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='success'<br/>clone_path (immutable)"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='error'<br/>clone_path (immutable)"]
    end

    subgraph Registry ["● Registry File — APPEND_ONLY during run"]
        direction TB
        ENTRY["● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>entries: [{clone_path, status,<br/>step_name, timestamp}]<br/>written N times (once per clone)<br/>never mutated after write"]
    end

    subgraph ReadPath ["● READ: Batch Cleanup (once, post-run)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads all entries<br/>partitions by status"]
        GATE{"status?"}
        DEL["delete clone dir<br/>━━━━━━━━━━<br/>status=success<br/>disk reclaimed"]
        KEEP["preserve clone dir<br/>━━━━━━━━━━<br/>status=error<br/>for investigation"]
    end

    subgraph Contracts ["Contract Cards (recipe input contracts)"]
        direction LR
        C1["★ contracts/implementation-groups.yaml<br/>━━━━━━━━━━<br/>NEW — no defer_cleanup<br/>no registry_path"]
        C2["● contracts/implementation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C3["● contracts/remediation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C4["● contracts/merge-prs.yaml<br/>━━━━━━━━━━<br/>updated — removed defer_cleanup<br/>registry_path, keep_clone_on_failure"]
    end

    ELIMINATED["ELIMINATED state<br/>━━━━━━━━━━<br/>defer_cleanup ingredient<br/>registry_path ingredient<br/>keep_clone_on_failure ingredient<br/>check_defer_cleanup step<br/>confirm_cleanup step"]

    END_OK([COMPLETE])

    START -->|"success terminal"| REG_S
    START -->|"failure terminal"| REG_F
    REG_S -->|"appends entry"| ENTRY
    REG_F -->|"appends entry"| ENTRY
    ENTRY -->|"read once post-run"| BATCH
    BATCH --> GATE
    GATE -->|"status=success"| DEL
    GATE -->|"status=error"| KEEP
    DEL --> END_OK
    KEEP --> END_OK

    C1 -.->|"contract enforces"| REG_S
    C2 -.->|"contract enforces"| REG_S
    C3 -.->|"contract enforces"| REG_S
    C4 -.->|"contract enforces"| REG_S

    ELIMINATED -.->|"no longer written"| ENTRY

    class START,END_OK terminal;
    class REG_S,REG_F,BATCH newComponent;
    class ENTRY stateNode;
    class GATE stateNode;
    class DEL,KEEP output;
    class C1 phase;
    class C2,C3,C4 phase;
    class ELIMINATED detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start and end |
| Green | ● Modified / New | register steps and batch cleanup (this PR) |
| Teal | State | Registry file and status decision |
| Purple | Phase | Contract card files |
| Dark Teal | Output | Clone disposition outcomes |
| Red | Eliminated | State that no longer exists |

Closes #610

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185031-682892/.autoskillit/temp/make-plan/process_issues_defer_clone_cleanup_plan_2026-04-04_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 36 | 16.9k | 1.4M | 1 | 6m 15s |
| **Total** | 10.1k | 383.2k | 42.0M | | 2h 51m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ocal (#612)

## Summary

Move `smoke-test.yaml` and its companion artifacts (contract card, flow
diagram) from the bundled `src/autoskillit/recipes/` directory to the
project-local `.autoskillit/recipes/` directory. This makes smoke-test
invisible to end-user projects while remaining fully functional when
running from the AutoSkillit repository root. The existing project-local
recipe discovery mechanism already supports this — no production code
changes are needed. All changes are file relocations and test updates.

## Requirements

### MOVE — Recipe File Relocation

- **REQ-MOVE-001:** The file `src/autoskillit/recipes/smoke-test.yaml`
must be relocated to `.autoskillit/recipes/smoke-test.yaml` at the
project root.
- **REQ-MOVE-002:** Associated contract card(s) in
`src/autoskillit/recipes/contracts/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/contracts/`.
- **REQ-MOVE-003:** Associated diagram(s) in
`src/autoskillit/recipes/diagrams/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/diagrams/`.

### LIST — Listing Behavior

- **REQ-LIST-001:** The smoke-test recipe must not appear in
`list_recipes` output when the current working directory is outside the
AutoSkillit repository.
- **REQ-LIST-002:** The smoke-test recipe must appear in `list_recipes`
output with source `PROJECT` when the current working directory is the
AutoSkillit repository root.
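
Both LIST requirements fall out of the existing discovery order, which this minimal sketch imitates (assumed function name and return shape; the real `recipe.io.list_recipes()` returns richer objects):

```python
from pathlib import Path

def discover_recipes(project_dir: Path, bundled_dir: Path) -> dict[str, str]:
    """Project-local recipes (priority 1) shadow bundled ones
    (priority 2) via first-write-wins on the recipe name."""
    recipes: dict[str, str] = {}
    for source, directory in (("PROJECT", project_dir), ("BUILTIN", bundled_dir)):
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.yaml")):
            recipes.setdefault(path.stem, source)  # earlier source wins
    return recipes
```

From the AutoSkillit repo root, `.autoskillit/recipes/smoke-test.yaml` is found with source `PROJECT`; from any other project, no project-local copy exists and smoke-test simply never enters the map.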

### LOAD — Pipeline Compatibility

- **REQ-LOAD-001:** `load_recipe("smoke-test")` must succeed when
invoked from the AutoSkillit repository root.
- **REQ-LOAD-002:** Existing smoke-test pipeline execution must remain
functionally identical after the move.

### TEST — Test Updates

- **REQ-TEST-001:** Tests that assert smoke-test has
`RecipeSource.BUILTIN` must be updated to assert `RecipeSource.PROJECT`.
- **REQ-TEST-002:** Tests that count the number of bundled recipes must
be updated to reflect the removal of smoke-test from the bundled set.

## Architecture Impact

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START(["list_recipes / find_recipe_by_name called"])

    subgraph ProjectLocal ["★ PROJECT-LOCAL SCAN (priority 1)"]
        direction TB
        PROJ_DIR["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = PROJECT<br/>★ smoke-test.yaml (moved here)"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>★ smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>★ smoke-test.md"]
    end

    subgraph Bundled ["BUNDLED SCAN (priority 2)"]
        direction TB
        BUILTIN_DIR["src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = BUILTIN<br/>implementation, remediation,<br/>merge-prs, impl-groups<br/>(smoke-test removed)"]
    end

    DEDUP["Dedup via seen set<br/>━━━━━━━━━━<br/>Project names shadow bundled"]

    subgraph AutoskillitRepo ["AUTOSKILLIT REPO CONTEXT"]
        direction TB
        CLI_LIST["● autoskillit recipes list<br/>━━━━━━━━━━<br/>Shows smoke-test (source: project)"]
        CLI_ORDER["autoskillit order<br/>━━━━━━━━━━<br/>Pipeline execution menu"]
        CLI_RENDER["autoskillit recipes render<br/>━━━━━━━━━━<br/>_recipes_dir_for(PROJECT)<br/>→ .autoskillit/recipes/diagrams/"]
    end

    subgraph ExternalProject ["EXTERNAL PROJECT CONTEXT"]
        direction TB
        EXT_LIST["autoskillit recipes list<br/>━━━━━━━━━━<br/>smoke-test NOT visible<br/>(no project-local copy)"]
    end

    START --> PROJ_DIR
    PROJ_DIR --> DEDUP
    DEDUP --> BUILTIN_DIR
    PROJ_DIR --> PROJ_CONTRACT
    PROJ_DIR --> PROJ_DIAGRAM
    DEDUP --> CLI_LIST
    DEDUP --> CLI_ORDER
    CLI_RENDER --> PROJ_DIAGRAM
    DEDUP --> EXT_LIST

    class START terminal;
    class PROJ_DIR,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
    class BUILTIN_DIR stateNode;
    class DEDUP handler;
    class CLI_LIST,CLI_ORDER,CLI_RENDER cli;
    class EXT_LIST detector;
```

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph Tests ["TESTS (modified ●)"]
        direction TB
        T_SMOKE["● test_smoke_pipeline.py<br/>━━━━━━━━━━<br/>uses SMOKE_SCRIPT<br/>→ project-local path"]
        T_BUNDLED["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>smoke_yaml fixture<br/>→ project-local path"]
        T_POLICY["● test_bundled_recipe_hidden_policy.py<br/>━━━━━━━━━━<br/>BUNDLED_RECIPE_NAMES<br/>smoke-test removed"]
        T_TOOLS["● test_tools_recipe.py<br/>━━━━━━━━━━<br/>list_recipes assertion<br/>smoke-test NOT in bundled"]
        T_ENGINE["● test_engine.py<br/>━━━━━━━━━━<br/>contract adapter test<br/>→ project-local path"]
    end

    subgraph L3 ["L3 — SERVER"]
        direction TB
        TOOLS_RECIPE["server.tools_recipe<br/>━━━━━━━━━━<br/>list_recipes, load_recipe<br/>validate_recipe"]
    end

    subgraph L2R ["L2 — RECIPE"]
        direction TB
        RECIPE_IO["recipe.io<br/>━━━━━━━━━━<br/>builtin_recipes_dir()<br/>list_recipes()"]
        RECIPE_VALIDATOR["recipe.validator<br/>━━━━━━━━━━<br/>run_semantic_rules<br/>analyze_dataflow"]
        RECIPE_CONTRACTS["recipe.contracts<br/>━━━━━━━━━━<br/>load_bundled_manifest"]
    end

    subgraph L2M ["L2 — MIGRATION"]
        direction TB
        MIG_ENGINE["migration.engine<br/>━━━━━━━━━━<br/>default_migration_engine<br/>contract adapters"]
    end

    subgraph L0 ["L0 — CORE"]
        direction TB
        CORE_PATHS["core.paths<br/>━━━━━━━━━━<br/>pkg_root() → bundled dir<br/>fan-in: all layers"]
    end

    subgraph Artifacts ["★ PROJECT-LOCAL ARTIFACTS (new)"]
        direction TB
        PROJ_RECIPE["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>smoke-test.md"]
    end

    T_SMOKE -->|"imports"| TOOLS_RECIPE
    T_SMOKE -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_CONTRACTS
    T_POLICY -->|"imports"| CORE_PATHS
    T_TOOLS -->|"imports"| TOOLS_RECIPE
    T_ENGINE -->|"imports"| CORE_PATHS
    T_ENGINE -->|"imports"| MIG_ENGINE

    TOOLS_RECIPE -->|"imports"| RECIPE_IO
    RECIPE_IO -->|"builtin_recipes_dir()"| CORE_PATHS
    RECIPE_VALIDATOR -->|"imports"| RECIPE_IO
    RECIPE_CONTRACTS -->|"imports"| RECIPE_IO
    MIG_ENGINE -->|"imports"| CORE_PATHS

    T_SMOKE -.->|"now reads"| PROJ_RECIPE
    T_BUNDLED -.->|"now reads"| PROJ_RECIPE
    T_ENGINE -.->|"now reads"| PROJ_CONTRACT

    class T_SMOKE,T_BUNDLED,T_POLICY,T_TOOLS,T_ENGINE phase;
    class TOOLS_RECIPE cli;
    class RECIPE_IO,RECIPE_VALIDATOR,RECIPE_CONTRACTS handler;
    class MIG_ENGINE handler;
    class CORE_PATHS stateNode;
    class PROJ_RECIPE,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
```

Closes #600

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-394673/.autoskillit/temp/make-plan/move_smoke_test_recipe_plan_2026-04-04_190817.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 74 | 37.1k | 3.0M | 2 | 12m 44s |
| **Total** | 10.1k | 403.4k | 43.6M | | 2h 58m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…Type in review-design (#614)

## Summary

The `review-design` skill has L1 severity calibration that correctly
caps `estimand_clarity` and `hypothesis_falsifiability` by
`experiment_type` — benchmarks can never produce L1 critical findings.
But the red-team dimension has **no analogous calibration**, meaning any
critical red-team finding triggers STOP regardless of experiment type.
This creates an unresolvable loop for benchmarks: the red-team always
finds new critical issues at progressively higher abstraction (the Hydra
pattern), exhausting retries without ever producing GO.

The fix adds a red-team severity calibration rubric to
`review-design/SKILL.md` (mirroring the L1 rubric), updates the verdict
logic to apply the cap before building `stop_triggers`, and adds
diminishing-return awareness to `resolve-design-review/SKILL.md` so it
can detect goalposts-moving across rounds.
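
The cap itself is a small ceiling lookup applied before `stop_triggers` are built. A sketch under stated assumptions: the ceiling values and severity scale here are illustrative, not the rubric shipped in `review-design/SKILL.md`.

```python
# Illustrative severity scale and per-type ceilings (assumed values).
SEVERITY_ORDER = ["info", "warning", "critical"]
RT_MAX_SEVERITY = {"benchmark": "warning", "causal": "critical"}

def cap_red_team(findings: list[dict], experiment_type: str) -> list[dict]:
    """Downgrade red-team findings above the per-type ceiling BEFORE
    stop_triggers are built, so a benchmark cannot STOP on red-team
    findings alone."""
    ceiling = RT_MAX_SEVERITY.get(experiment_type, "critical")
    cap_idx = SEVERITY_ORDER.index(ceiling)
    return [
        {**f, "severity": SEVERITY_ORDER[min(SEVERITY_ORDER.index(f["severity"]), cap_idx)]}
        for f in findings
    ]

capped = cap_red_team([{"id": "rt-1", "severity": "critical"}], "benchmark")
print(capped[0]["severity"])  # warning
```

Because a benchmark's red-team findings top out at warning, the Hydra loop (new critical findings each round) can no longer burn retries on its own.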

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([Plan submitted])
    GO([GO → execute])
    REVISE_OUT([REVISE → revise_design])
    REVISED_OUT([revised → revise_design])
    FAILED_OUT([failed → design_rejected])

    subgraph ReviewDesign ["● review-design/SKILL.md"]
        direction TB
        L1["L1 Analysis<br/>━━━━━━━━━━<br/>estimand_clarity +<br/>hypothesis_falsifiability"]
        L1GATE{"L1 Fail-Fast<br/>━━━━━━━━━━<br/>Any L1 critical?"}
        PARALLEL["L2 + L3 + L4 + RT<br/>━━━━━━━━━━<br/>Parallel analysis"]
        RTCAP["● RT Severity Cap<br/>━━━━━━━━━━<br/>RT_MAX_SEVERITY[experiment_type]<br/>Downgrade if above ceiling"]
        MERGE["Merge + Dedup<br/>━━━━━━━━━━<br/>All findings pooled"]
        VERDICT{"● Verdict Logic<br/>━━━━━━━━━━<br/>stop_triggers built<br/>AFTER rt_cap applied"}
    end

    subgraph ResolveDesign ["● resolve-design-review/SKILL.md"]
        direction TB
        PARSE["Step 1: Parse Dashboard<br/>━━━━━━━━━━<br/>Extract stop-trigger findings<br/>Classify ADDRESSABLE/STRUCTURAL/DISCUSS"]
        DIMCHECK{"prior_revision_guidance<br/>━━━━━━━━━━<br/>provided?"}
        DIMRET["● Step 1.5: Diminishing-Return<br/>━━━━━━━━━━<br/>Compare ADDRESSABLE themes<br/>vs prior guidance entries"]
        GOALPOST{"goalposts_moving<br/>━━━━━━━━━━<br/>true for any finding?"}
        RECLASSIFY["● Reclassify<br/>━━━━━━━━━━<br/>ADDRESSABLE → STRUCTURAL<br/>annotate prior_theme_match"]
        RESGATE{"Any ADDRESSABLE<br/>or DISCUSS?"}
    end

    subgraph RecipeRouting ["● research.yaml — resolve_design_review step"]
        direction LR
        RECIPE["skill_command passes<br/>━━━━━━━━━━<br/>$context.revision_guidance<br/>as optional 3rd arg"]
    end

    START --> L1
    L1 --> L1GATE
    L1GATE -->|"yes (L1 critical)"| MERGE
    L1GATE -->|"no"| PARALLEL
    PARALLEL --> RTCAP
    RTCAP --> MERGE
    MERGE --> VERDICT
    VERDICT -->|"stop_triggers present"| RECIPE
    VERDICT -->|"critical or ≥3 warnings"| REVISE_OUT
    VERDICT -->|"otherwise"| GO

    RECIPE --> PARSE
    PARSE --> DIMCHECK
    DIMCHECK -->|"yes"| DIMRET
    DIMCHECK -->|"no (round 1)"| RESGATE
    DIMRET --> GOALPOST
    GOALPOST -->|"true"| RECLASSIFY
    GOALPOST -->|"false"| RESGATE
    RECLASSIFY --> RESGATE
    RESGATE -->|"yes"| REVISED_OUT
    RESGATE -->|"all STRUCTURAL"| FAILED_OUT

    class START,GO,REVISE_OUT,REVISED_OUT,FAILED_OUT terminal;
    class L1,PARALLEL handler;
    class L1GATE,VERDICT,DIMCHECK,GOALPOST,RESGATE stateNode;
    class MERGE,PARSE phase;
    class RTCAP,DIMRET,RECLASSIFY newComponent;
    class RECIPE detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and outcome states |
| Orange | Handler | Analysis agents (L1, parallel L2-L4+RT) |
| Teal | State | Decision points and verdict routing |
| Purple | Phase | Merge and parse aggregation steps |
| Green | Modified Component | ● Nodes changed by this PR (RT cap, diminishing-return detection, reclassify, recipe routing) |
| Red | Detector | Recipe routing gate (passes revision_guidance) |

Closes #609

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185816-184240/.autoskillit/temp/make-plan/add-red-team-severity-calibration-by-experiment-type_plan_2026-04-04_185816.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 135 | 68.4k | 5.4M | 4 | 23m 1s |
| review_pr | 31 | 22.8k | 1.2M | 1 | 5m 50s |
| **Total** | 10.2k | 457.5k | 47.2M | | 3h 14m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
#615)

## Summary

The token summary table (displayed in PRs, terminal, and compact KV
output) collapses 4 distinct Claude API token fields into 3 misleading
columns. The column labeled "input" actually shows only the tiny
uncached delta (`input_tokens`), and "cached" silently sums two
cost-distinct categories (`cache_read_input_tokens` at 0.1x billing +
`cache_creation_input_tokens` at 1.25x billing). This change splits the
display into 4 token columns — `uncached`, `output`, `cache_read`,
`cache_write` — across all 3 independent formatter implementations and
their tests.

No data model, extraction, or storage changes are needed — `TokenEntry`
already preserves all 4 fields. This is purely a formatting-layer fix.
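
To illustrate the compact-KV variant of the split, here is a hedged sketch: the function name matches the diagram below, but the signature is simplified (the real `telemetry_fmt.py` formatter handles accumulation and human-readable suffixes). Field names follow the Claude API usage block.

```python
def format_compact_kv(entry: dict) -> str:
    """Render the four cost-distinct token fields separately instead
    of collapsing cache_read (0.1x billing) and cache_creation
    (1.25x billing) into one misleading 'cached' column."""
    return (
        f"uc:{entry['input_tokens']} "
        f"out:{entry['output_tokens']} "
        f"cr:{entry['cache_read_input_tokens']} "
        f"cw:{entry['cache_creation_input_tokens']}"
    )

print(format_compact_kv({
    "input_tokens": 39, "output_tokens": 23700,
    "cache_read_input_tokens": 1500000, "cache_creation_input_tokens": 0,
}))  # uc:39 out:23700 cr:1500000 cw:0
```

The same four-way split repeats in the markdown and terminal formatters; only the column labels differ per target.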

## Architecture Impact

### Data Lineage Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart LR
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph API ["Claude API Response"]
        direction TB
        F1["input_tokens<br/>━━━━━━━━━━<br/>Uncached delta"]
        F2["output_tokens<br/>━━━━━━━━━━<br/>Generated tokens"]
        F3["cache_read_input_tokens<br/>━━━━━━━━━━<br/>0.1x billing"]
        F4["cache_creation_input_tokens<br/>━━━━━━━━━━<br/>1.25x billing"]
    end

    subgraph Storage ["TokenEntry Storage"]
        TE[("TokenEntry<br/>━━━━━━━━━━<br/>4 fields intact<br/>Accumulated per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Persisted session data<br/>All 4 fields")]
    end

    subgraph Canonical ["● telemetry_fmt.py (Canonical Formatter)"]
        direction TB
        FMD["● format_token_table()<br/>━━━━━━━━━━<br/>Markdown table<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        FTM["● format_token_table_terminal()<br/>━━━━━━━━━━<br/>Terminal table<br/>UNCACHED|OUTPUT|CACHE_RD|CACHE_WR"]
        FKV["● format_compact_kv()<br/>━━━━━━━━━━<br/>Compact KV<br/>uc:|out:|cr:|cw:"]
    end

    subgraph Hooks ["Stdlib Hooks (no autoskillit imports)"]
        direction TB
        TSA["● token_summary_appender._format_table()<br/>━━━━━━━━━━<br/>Reads token_usage.json<br/>Markdown table → GitHub PR body"]
        POS["● pretty_output._fmt_get_token_summary()<br/>━━━━━━━━━━<br/>Reads get_token_summary JSON<br/>Compact KV → PostToolUse"]
        POR["● pretty_output._fmt_run_skill()<br/>━━━━━━━━━━<br/>Reads run_skill result dict<br/>Inline KV → PostToolUse"]
    end

    subgraph Outputs ["Display Targets"]
        direction TB
        MD["PR Body<br/>━━━━━━━━━━<br/>GitHub markdown table"]
        TERM["Terminal<br/>━━━━━━━━━━<br/>Padded column output"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>One-liner summaries"]
        HOOK["PostToolUse Output<br/>━━━━━━━━━━<br/>Hook-formatted display"]
    end

    F1 --> TE
    F2 --> TE
    F3 --> TE
    F4 --> TE
    TE --> TJ

    TE --> FMD
    TE --> FTM
    TE --> FKV
    TJ --> TSA
    TJ -.-> POS

    FMD -->|"markdown rows"| MD
    FTM -->|"padded columns"| TERM
    FKV -->|"kv lines"| KV
    TSA -->|"gh api PATCH"| MD
    POS -->|"formatted text"| HOOK
    POR -->|"formatted text"| HOOK

    class F1,F2,F3,F4 cli;
    class TE,TJ stateNode;
    class FMD,FTM,FKV handler;
    class TSA,POS,POR integration;
    class MD,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | API Fields | 4 Claude API token categories from usage response |
| Teal | Storage | TokenEntry dataclass + persisted JSON session files |
| Orange | Canonical Formatter | 3 functions in telemetry_fmt.py (all ● modified) |
| Red | Stdlib Hooks | Independent hook implementations (all ● modified) |
| Dark Teal | Outputs | Display targets: PR body, terminal, compact KV, PostToolUse |

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Triggers ["OPERATOR TRIGGERS"]
        direction TB
        GTS["get_token_summary<br/>━━━━━━━━━━<br/>MCP tool call<br/>format=json|markdown"]
        RS["run_skill<br/>━━━━━━━━━━<br/>MCP tool call<br/>Headless session"]
        PRPATCH["PR body update<br/>━━━━━━━━━━<br/>After open-pr skill<br/>PostToolUse event"]
    end

    subgraph State ["TOKEN STATE (read/write)"]
        direction TB
        TL[("DefaultTokenLog<br/>━━━━━━━━━━<br/>In-memory accumulator<br/>4 fields per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Per-session disk files<br/>Read by stdlib hooks")]
    end

    subgraph Formatters ["● FORMATTERS (modified)"]
        direction TB
        TF["● telemetry_fmt.py<br/>━━━━━━━━━━<br/>format_token_table()<br/>format_token_table_terminal()<br/>format_compact_kv()"]
        TSA["● token_summary_appender.py<br/>━━━━━━━━━━<br/>_format_table()<br/>Stdlib-only hook"]
        PO["● pretty_output.py<br/>━━━━━━━━━━<br/>_fmt_get_token_summary()<br/>_fmt_run_skill()"]
    end

    subgraph Outputs ["OBSERVABILITY OUTPUTS (write-only)"]
        direction TB
        MDTBL["PR Body Table<br/>━━━━━━━━━━<br/>## Token Usage Summary<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        TERM["Terminal Table<br/>━━━━━━━━━━<br/>STEP UNCACHED OUTPUT CACHE_RD CACHE_WR COUNT TIME<br/>Padded for readability"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>name xN [uc:X out:X cr:X cw:X t:Xs]<br/>total_uncached / total_cache_read / total_cache_write"]
        HOOK["PostToolUse Display<br/>━━━━━━━━━━<br/>tokens_uncached:<br/>tokens_cache_read:<br/>tokens_cache_write:"]
    end

    GTS -->|"reads"| TL
    TL -.->|"flush"| TJ
    TJ -->|"load_sessions"| TSA
    TJ -.->|"via MCP JSON payload"| PO

    GTS --> TF
    TF -->|"markdown"| MDTBL
    TF -->|"terminal"| TERM
    TF -->|"compact"| KV

    RS -->|"PostToolUse event"| PO
    PO -->|"_fmt_run_skill"| HOOK
    PO -->|"_fmt_get_token_summary"| KV

    PRPATCH -->|"PostToolUse event"| TSA
    TSA -->|"gh api PATCH"| MDTBL

    class GTS,RS,PRPATCH cli;
    class TL,TJ stateNode;
    class TF,TSA,PO handler;
    class MDTBL,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Triggers | Operator-initiated MCP tool calls and PostToolUse events |
| Teal | State | Token accumulator (read/write) and persisted JSON files |
| Orange | Formatters | 3 modified formatter implementations (all ● changed) |
| Dark Teal | Outputs | Write-only observability artifacts: PR table, terminal, compact KV |

Closes #604

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-266225/.autoskillit/temp/make-plan/token_summary_4_columns_plan_2026-04-04_191000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Add `api-simulator` as a dev dependency and use its `mock_http_server`
pytest fixture to test the quota guard's real HTTP path end-to-end.
Currently all quota tests monkeypatch `_fetch_quota` at the function
level — the actual httpx client construction, header injection
(`Authorization: Bearer`, `anthropic-beta`), response parsing, and error
handling are never exercised. This plan introduces a `base_url`
parameter to `_fetch_quota` and `check_and_sleep_if_needed`, then writes
7 tests that point the real httpx client at `mock_http_server` to
exercise the full HTTP path.

**Files changed:** 3 (`pyproject.toml`,
`src/autoskillit/execution/quota.py`, new
`tests/execution/test_quota_http.py`)
**Existing tests:** Unchanged — all monkeypatch-based tests in
`test_quota.py` remain as-is.

## Requirements

### DEP — Dependency Integration

- **REQ-DEP-001:** The system must include `api-simulator` as a dev-only
dependency with a pinned git tag source.
- **REQ-DEP-002:** The api-simulator dependency must not appear in
production runtime dependencies.

### CFG — URL Configurability

- **REQ-CFG-001:** `_fetch_quota` must accept a `base_url` parameter
defaulting to `https://api.anthropic.com`.
- **REQ-CFG-002:** `check_and_sleep_if_needed` must thread the
`base_url` parameter through to `_fetch_quota` at both call sites.
- **REQ-CFG-003:** The production behavior must be unchanged when
`base_url` is not explicitly provided.

### HTTP — HTTP Path Verification

- **REQ-HTTP-001:** Tests must exercise the real httpx client
construction path, not monkeypatch `_fetch_quota`.
- **REQ-HTTP-002:** Tests must verify that the `Authorization: Bearer`
header is sent on the request.
- **REQ-HTTP-003:** Tests must verify that the `anthropic-beta:
oauth-2025-04-20` header is sent on the request.
- **REQ-HTTP-004:** Tests must verify correct JSON response parsing for
the `five_hour` utilization shape.

### ERR — Error Handling Verification

- **REQ-ERR-001:** Tests must verify fail-open behavior on HTTP 4xx/5xx
responses.
- **REQ-ERR-002:** Tests must verify fail-open behavior on network
timeout.
- **REQ-ERR-003:** Tests must verify that the above-threshold path
triggers a double-fetch (two HTTP requests).

### COMPAT — Backward Compatibility

- **REQ-COMPAT-001:** Existing `test_quota.py` tests must continue to
pass unchanged.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([START: check_and_sleep_if_needed])

    subgraph GatePhase ["Gate Phase"]
        direction TB
        ENABLED{"config.enabled?"}
        DISABLED(["RETURN<br/>should_sleep: false"])
    end

    subgraph CachePhase ["Cache Phase"]
        direction TB
        CACHE["_read_cache<br/>━━━━━━━━━━<br/>Read local JSON cache"]
        CACHE_HIT{"Cache fresh?<br/>━━━━━━━━━━<br/>age ≤ max_age?"}
    end

    subgraph FetchPhase ["HTTP Fetch Phase"]
        direction TB
        FETCH["● _fetch_quota<br/>━━━━━━━━━━<br/>★ base_url parameter<br/>httpx.AsyncClient GET"]
        BASEURL["★ base_url<br/>━━━━━━━━━━<br/>default: api.anthropic.com<br/>test: mock_http_server.url"]
        PARSE["Parse Response<br/>━━━━━━━━━━<br/>five_hour.utilization<br/>Z→+00:00 normalization"]
    end

    subgraph DecisionPhase ["Threshold Decision"]
        direction TB
        THRESHOLD{"utilization<br/>≥ threshold?"}
        RESETS_AT1{"resets_at<br/>is None?<br/>(Gate 1)"}
        REFETCH["● _fetch_quota re-fetch<br/>━━━━━━━━━━<br/>★ base_url threaded<br/>Double-fetch for accuracy"]
        RESETS_AT2{"resets_at<br/>still None?<br/>(Gate 2)"}
    end

    subgraph Results ["Results"]
        BELOW(["RETURN<br/>should_sleep: false"])
        FALLBACK1(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        FALLBACK2(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        SLEEP(["RETURN<br/>should_sleep: true<br/>sleep_seconds computed"])
        FAILOPEN(["RETURN<br/>should_sleep: false<br/>error key present"])
    end

    subgraph TestInfra ["★ Test Infrastructure (test_quota_http.py)"]
        direction TB
        MOCK["★ mock_http_server<br/>━━━━━━━━━━<br/>api-simulator fixture<br/>HTTP server"]
        REGISTER["★ register / register_sequence<br/>━━━━━━━━━━<br/>Custom endpoint responses<br/>Status codes, delays"]
        INSPECT["★ get_requests / request_count<br/>━━━━━━━━━━<br/>Header verification<br/>Double-fetch assertion"]
    end

    START --> ENABLED
    ENABLED -->|"false"| DISABLED
    ENABLED -->|"true"| CACHE
    CACHE --> CACHE_HIT
    CACHE_HIT -->|"fresh + below threshold"| BELOW
    CACHE_HIT -->|"miss or expired"| FETCH
    FETCH --> BASEURL
    BASEURL --> PARSE
    PARSE --> THRESHOLD
    THRESHOLD -->|"below"| BELOW
    THRESHOLD -->|"above"| RESETS_AT1
    RESETS_AT1 -->|"None"| FALLBACK1
    RESETS_AT1 -->|"present"| REFETCH
    REFETCH --> RESETS_AT2
    RESETS_AT2 -->|"None"| FALLBACK2
    RESETS_AT2 -->|"present"| SLEEP
    FETCH -.->|"HTTP error / timeout"| FAILOPEN

    MOCK -.->|"serves responses to"| BASEURL
    REGISTER -.->|"configures"| MOCK
    INSPECT -.->|"verifies headers / count"| FETCH

    class START terminal;
    class DISABLED,BELOW,FALLBACK1,FALLBACK2,SLEEP,FAILOPEN phase;
    class ENABLED,CACHE_HIT,THRESHOLD,RESETS_AT1,RESETS_AT2 stateNode;
    class CACHE,PARSE handler;
    class FETCH,REFETCH handler;
    class BASEURL,MOCK,REGISTER,INSPECT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry point |
| Teal | State | Decision points and routing |
| Orange | Handler | Processing nodes (cache read, HTTP fetch, parse) |
| Green | New Component | ★ New `base_url` parameter and test infrastructure |
| Purple | Phase | Result return paths |

### Development Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph Deps ["● DEPENDENCY MANIFEST (pyproject.toml)"]
        direction TB
        PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>requires-python ≥ 3.11"]
        DEVDEPS["● dev optional-dependencies<br/>━━━━━━━━━━<br/>pytest, pytest-asyncio,<br/>pytest-httpx, pytest-xdist,<br/>pytest-timeout, ruff,<br/>import-linter, packaging"]
        APISIM["★ api-simulator<br/>━━━━━━━━━━<br/>New dev dependency<br/>HTTP mock fixture provider"]
        UVSRC["★ [tool.uv.sources]<br/>━━━━━━━━━━<br/>api-simulator pinned<br/>git: TalonT-Org/api-simulator<br/>branch: main"]
        UVLOCK["● uv.lock<br/>━━━━━━━━━━<br/>Regenerated with<br/>api-simulator entry"]
    end

    subgraph Quality ["CODE QUALITY GATES (pre-commit)"]
        direction TB
        FORMAT["ruff format<br/>━━━━━━━━━━<br/>Auto-fix code style<br/>reads + modifies src"]
        LINT["ruff check<br/>━━━━━━━━━━<br/>Auto-fix lint violations<br/>reads + modifies src"]
        TYPES["mypy<br/>━━━━━━━━━━<br/>Type checking<br/>reads src, reports only"]
        UVCHECK["uv lock check<br/>━━━━━━━━━━<br/>Verifies lockfile sync<br/>reads uv.lock"]
        SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning<br/>reads staged files"]
        IMPORTLINT["import-linter<br/>━━━━━━━━━━<br/>Layer contract enforcement<br/>IL-001 through IL-007"]
    end

    subgraph Testing ["TEST FRAMEWORK"]
        direction TB
        PYTEST["pytest + pytest-asyncio<br/>━━━━━━━━━━<br/>asyncio_mode=auto<br/>timeout=60s signal"]
        XDIST["pytest-xdist -n 4<br/>━━━━━━━━━━<br/>Parallel test workers<br/>worksteal distribution"]
        UNITQUOTA["● test_quota.py<br/>━━━━━━━━━━<br/>23 unit tests<br/>monkeypatch _fetch_quota<br/>mock signature updated"]
        HTTPQUOTA["★ test_quota_http.py<br/>━━━━━━━━━━<br/>7 end-to-end HTTP tests<br/>real httpx client path<br/>no monkeypatching"]
        MOCKSERVER["★ mock_http_server fixture<br/>━━━━━━━━━━<br/>api-simulator provides<br/>register / register_sequence<br/>get_requests / request_count"]
    end

    subgraph EntryPoints ["ENTRY POINTS"]
        CLI["autoskillit CLI<br/>━━━━━━━━━━<br/>autoskillit.cli:main"]
    end

    PYPROJECT --> DEVDEPS
    DEVDEPS --> APISIM
    APISIM --> UVSRC
    UVSRC --> UVLOCK

    PYPROJECT --> FORMAT
    FORMAT --> LINT
    LINT --> TYPES
    TYPES --> UVCHECK
    UVCHECK --> SECRETS
    SECRETS --> IMPORTLINT

    IMPORTLINT --> PYTEST
    PYTEST --> XDIST
    XDIST --> UNITQUOTA
    XDIST --> HTTPQUOTA
    APISIM -.->|"provides fixture"| MOCKSERVER
    MOCKSERVER -.->|"injected into"| HTTPQUOTA

    PYPROJECT --> CLI

    class PYPROJECT,DEVDEPS,UVLOCK phase;
    class APISIM,UVSRC,HTTPQUOTA,MOCKSERVER newComponent;
    class UNITQUOTA handler;
    class FORMAT,LINT,TYPES,UVCHECK,SECRETS,IMPORTLINT detector;
    class PYTEST,XDIST handler;
    class CLI output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Purple | Build Config | pyproject.toml, dev deps, lockfile |
| Green | New Component | ★ api-simulator dep, uv.sources, HTTP test file, mock fixture |
| Orange | Test Framework | pytest, xdist, existing test_quota.py |
| Red | Quality Gates | ruff, mypy, uv lock check, gitleaks, import-linter |
| Dark Teal | Entry Points | CLI entry point |

Closes #607

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190816-816130/.autoskillit/temp/make-plan/integrate_api_simulator_quota_guard_plan_2026-04-04_191500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 100 | 51.3k | 3.9M | 3 | 16m 38s |
| **Total** | 10.2k | 417.5k | 44.5M | | 3h 2m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

The `zero_writes` gate in `execution/headless.py` fires unconditionally
when `write_behavior.mode == "always"` and `write_call_count == 0`. The
`resolve-failures` contract declares `write_behavior: always`, but the
skill legitimately exits with zero `Edit`/`Write` calls when the
worktree is already green (0 fix iterations). The gate has no escape
path for this case — `success=True` is demoted to `zero_writes`, killing
an otherwise correct pipeline run.

This PR changes the contract to `conditional` mode with a pattern gated
on the `fixes_applied` structured token, extends the same fix to
`retry-worktree` and `resolve-review`, and adds a semantic rule to
prevent regression.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([run_skill called])
    SUCCESS(["✓ success=True<br/>subtype=success"])
    DEMOTED(["✗ success=False<br/>subtype=zero_writes"])

    subgraph Contract ["● Contract Resolution"]
        direction TB
        YAML["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>resolve-failures:<br/>  write_behavior: conditional<br/>  write_expected_when:<br/>  - fixes_applied ≥ 1 regex"]
        FACTORY["● _factory.py<br/>━━━━━━━━━━<br/>_resolve_write_behavior()<br/>reads contract via lru_cache"]
        SPEC["WriteBehaviorSpec<br/>━━━━━━━━━━<br/>mode=conditional<br/>expected_when=(pattern,)"]
    end

    subgraph Execution ["● Skill Execution"]
        direction TB
        SESSION["headless subprocess<br/>━━━━━━━━━━<br/>run tests, apply fixes<br/>via Bash / Edit / Write"]
        TOKEN["● Structured Token<br/>━━━━━━━━━━<br/>fixes_applied = N<br/>emitted at Step 4"]
        COUNT["write_call_count<br/>━━━━━━━━━━<br/>count Edit + Write<br/>in tool_uses"]
    end

    subgraph Gate ["● Zero-Write Gate"]
        direction TB
        GUARD{"success=True AND<br/>write_count=0 AND<br/>write_behavior≠None?"}
        MODE{"● mode?<br/>━━━━━━━━━━<br/>always vs conditional"}
        PATTERN{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND-match all patterns<br/>against session output"}
        EXPECT{"write_expected<br/>AND write_count=0?"}
    end

    %% FLOW %%
    START --> YAML
    YAML -->|"reads"| FACTORY
    FACTORY -->|"builds"| SPEC
    SPEC -->|"passed to executor"| SESSION
    SESSION --> TOKEN
    SESSION --> COUNT
    TOKEN --> GUARD
    COUNT --> GUARD

    GUARD -->|"No — gate inactive"| SUCCESS
    GUARD -->|"Yes"| MODE

    MODE -->|"always"| EXPECT
    MODE -->|"conditional"| PATTERN

    PATTERN -->|"fixes_applied=0<br/>no match → False"| SUCCESS
    PATTERN -->|"fixes_applied≥1<br/>match → True"| EXPECT

    EXPECT -->|"write_count > 0<br/>artifact written"| SUCCESS
    EXPECT -->|"write_count = 0<br/>no artifact"| DEMOTED

    %% CLASS ASSIGNMENTS %%
    class START,SUCCESS,DEMOTED terminal;
    class YAML,SPEC stateNode;
    class FACTORY,SESSION,COUNT handler;
    class TOKEN output;
    class GUARD,MODE,PATTERN,EXPECT detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph ContractFields ["● INIT_ONLY: Contract Fields (YAML → frozen)"]
        direction TB
        WB["● write_behavior<br/>━━━━━━━━━━<br/>always ∣ conditional ∣ null<br/>Set in skill_contracts.yaml<br/>Cached via @lru_cache"]
        WEW["● write_expected_when<br/>━━━━━━━━━━<br/>list of regex patterns<br/>AND-semantics at gate<br/>Empty = no pattern gate"]
    end

    subgraph SpecFields ["INIT_ONLY: WriteBehaviorSpec (frozen dataclass)"]
        direction TB
        MODE["● mode: str ∣ None<br/>━━━━━━━━━━<br/>Mirrors write_behavior<br/>Frozen after construction"]
        EXPECTED["● expected_when: tuple<br/>━━━━━━━━━━<br/>Immutable tuple of patterns<br/>Frozen after construction"]
    end

    subgraph SessionState ["MUTABLE + APPEND: Session State"]
        direction TB
        TOOLS["tool_uses: list<br/>━━━━━━━━━━<br/>APPEND_ONLY during session<br/>Each Edit/Write appended"]
        RESULT["● session output: str<br/>━━━━━━━━━━<br/>Contains structured tokens<br/>fixes_applied = N"]
        WCC["write_call_count: int<br/>━━━━━━━━━━<br/>DERIVED from tool_uses<br/>count(Edit + Write)"]
    end

    subgraph GateState ["● MUTABLE: SkillResult Fields (gate mutations)"]
        direction TB
        SUCCESS["● success: bool<br/>━━━━━━━━━━<br/>Init: True (if session ok)<br/>Gate may demote → False"]
        SUBTYPE["● subtype: str<br/>━━━━━━━━━━<br/>Init: success<br/>Gate may set → zero_writes"]
        RETRY["● needs_retry: bool<br/>━━━━━━━━━━<br/>Init: False<br/>Gate may set → True"]
    end

    subgraph Validation ["● VALIDATION GATES"]
        direction TB
        G1{"● mode check<br/>━━━━━━━━━━<br/>always → write_expected=True<br/>conditional → check patterns"}
        G2{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND over all patterns<br/>re.search each on output"}
        G3{"write_expected AND<br/>write_count == 0?<br/>━━━━━━━━━━<br/>Demote if both True"}
    end

    %% FLOW: Contract → Spec %%
    WB -->|"reads"| MODE
    WEW -->|"reads"| EXPECTED

    %% FLOW: Spec → Gate %%
    MODE -->|"determines gate path"| G1
    EXPECTED -->|"provides patterns"| G2

    %% FLOW: Session → Gate %%
    TOOLS -->|"derives"| WCC
    RESULT -->|"scanned by"| G2
    WCC -->|"checked by"| G3

    %% FLOW: Gate decisions %%
    G1 -->|"conditional"| G2
    G1 -->|"always"| G3
    G2 -->|"match → True"| G3
    G2 -->|"no match → False"| SUCCESS

    %% FLOW: Gate → Mutation %%
    G3 -->|"demote"| SUBTYPE
    G3 -->|"demote"| RETRY
    G3 -->|"preserve"| SUCCESS

    %% CLASS ASSIGNMENTS %%
    class WB,WEW detector;
    class MODE,EXPECTED detector;
    class TOOLS handler;
    class RESULT output;
    class WCC phase;
    class SUCCESS,SUBTYPE,RETRY gap;
    class G1,G2,G3 stateNode;
```

Closes #603

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260404-212507-745574/.autoskillit/temp/rectify/rectify_zero-writes-false-positive_2026-04-04_215019_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| investigate | 31 | 12.6k | 747.1k | 1 | 6m 34s |
| rectify | 11.4k | 57.9k | 2.0M | 1 | 27m 28s |
| review | 3.6k | 7.2k | 216.3k | 1 | 8m 0s |
| dry_walkthrough | 51 | 30.8k | 2.3M | 2 | 11m 22s |
| implement | 2.2k | 28.2k | 3.0M | 2 | 10m 56s |
| assess | 44 | 7.8k | 1.1M | 2 | 8m 43s |
| audit_impl | 30 | 18.6k | 654.7k | 2 | 9m 10s |
| open_pr | 28 | 15.8k | 1.0M | 1 | 7m 3s |
| **Total** | 17.3k | 178.9k | 11.1M | | 1h 29m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alation Routing, and Pack Fix (#620)

## Summary

This part adds the post-review re-validation loop and escalation
consumption infrastructure to `research.yaml`, adds the `needs_rerun`
structured output token to `resolve-research-review/SKILL.md`, and fixes
the missing `exp-lens` pack registration. Additionally adds the data
provenance lifecycle across 5 research pipeline skills (plan-experiment,
run-experiment, write-report, review-design, review-research-pr) with
contract and guard tests.

## Requirements

### DATA — Data Provenance Lifecycle

- **REQ-DATA-001:** The `plan-experiment` skill must generate a Data
Manifest section in every experiment plan that maps each hypothesis to
its required data source(s), specifying source type (synthetic, fixture,
external, gitignored), acquisition method (generate, download, copy),
and verification criteria.
- **REQ-DATA-002:** When the research task directive or issue specifies
using particular data, the `plan-experiment` skill must include explicit
acquisition steps for that data in the plan — the plan must not assume
data will already be present.
- **REQ-DATA-003:** The `run-experiment` skill pre-flight must perform a
hypothesis-to-data mapping check against the Data Manifest: for each
hypothesis, verify its required data source is present and non-empty
before execution begins.
- **REQ-DATA-004:** When `run-experiment` pre-flight finds that data the
plan said would be acquired is missing, it must emit a structured
`blocked_hypotheses` list and treat this as a FAIL — not silently
degrade to N/A.
- **REQ-DATA-005:** The `review-design` skill must include data
acquisition completeness as a reviewable dimension at sufficient weight
to influence the verdict (not L-weight), checking that every hypothesis
has a data source, every external source has an acquisition step, and
every gitignored path has a generation/download step.
- **REQ-DATA-006:** The `review-research-pr` skill must include a
`data-scope` review dimension that checks whether the experiment's data
coverage matches the research task directive and flags when all
benchmarks used only synthetic data for a domain-specific project.
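A single manifest entry covering these requirements might look like the sketch below. Field names follow the state lifecycle diagram (`hypothesis`, `source_type`, `acquisition`, `location`, `verification`, `depends_on`); the concrete plan format and the example values are assumptions:

```yaml
data_manifest:
  - hypothesis: [H1, H3]
    source_type: external        # synthetic | fixture | external | gitignored
    acquisition: download        # generate | download | copy
    location: data/benchmarks/domain_corpus/
    verification: "directory exists and is non-empty before run-experiment"
    depends_on: []
```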

### REPORT — Write-Report Data Scope Guardrails

- **REQ-REPORT-001:** The `write-report` skill must include a mandatory
Data Scope Statement in the Executive Summary that explicitly states
what data types were used for all benchmarks and whether domain target
data was present, absent, or partial.
- **REQ-REPORT-002:** The `write-report` skill must perform a Metrics
Provenance Check before including any `*_metrics.json` files: verify
they were generated during the current experiment. If stale or
unrelated, disclose and omit with explanation rather than silently
dropping.
- **REQ-REPORT-003:** The `write-report` skill must enforce
pre-specified hypothesis gate thresholds: when a gate is not met, the
report must state this as a failure, and GO recommendations must
reference the specific gate that was met rather than silently
substituting a different threshold.

### REVAL — Post-Review Re-Validation Loop

- **REQ-REVAL-001:** The `resolve-research-review` skill must emit a
structured output token (`needs_rerun = true/false`) indicating whether
any `rerun_required` escalations exist, so the recipe can capture and
route on it.
- **REQ-REVAL-002:** The `research.yaml` recipe must include a routing
step after `resolve_research_review` that checks for `rerun_required`
escalations and routes to a `re_run_experiment` step when present.
- **REQ-REVAL-003:** The `re_run_experiment` step must perform a
targeted re-run of affected benchmarks/analyses (not a full experiment
replay) using the same data and scripts, then flow to `re_write_report`
→ `re_push_research`.
- **REQ-REVAL-004:** When only `design_flaw` escalations exist (no
`rerun_required`), the recipe must annotate the PR body with the
escalation details and continue to push.
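Taken together, REQ-REVAL-001/002 describe a capture-then-route pattern. A minimal sketch of the routing step in `research.yaml` (the step names, `action: route`, and `context.needs_rerun` come from the requirements and diagrams; the `routes`/`when`/`goto`/`default` field names are assumptions about the recipe schema):

```yaml
- name: check_escalations
  action: route
  routes:
    - when: "context.needs_rerun == true"  # any rerun_required escalation
      goto: re_run_experiment              # then re_write_report, re_push_research
  default: re_push_research                # design_flaw only: annotate and push
```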

### ESC — Escalation Consumption

- **REQ-ESC-001:** The `research.yaml` recipe must include a
`check_escalations` step between `resolve_research_review` and
`re_push_research` that reads `escalation_records_{pr}.json` and routes
based on escalation strategy types.
- **REQ-ESC-002:** The `check_escalations` step must distinguish between
`rerun_required` escalations (route to re-validation) and
`design_flaw`-only escalations (annotate and continue).

### PACK — Exp-Lens Pack Registration

- **REQ-PACK-001:** The `research.yaml` recipe must declare
`requires_packs: [research, exp-lens]` so that all 18 exp-lens skills
are available in headless sessions during the research recipe pipeline.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    PUSH_BR([push_branch<br/>━━━━━━━━━━<br/>git push worktree])

    subgraph PRReview ["PR Review Phase"]
        direction TB
        OPEN["open_research_pr<br/>━━━━━━━━━━<br/>run_skill: open-pr"]
        GUARD{"guard_pr_url<br/>━━━━━━━━━━<br/>context.pr_url?"}
        REVIEW["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-research-pr<br/>captures: verdict"]
    end

    subgraph Resolution ["Review Resolution"]
        direction TB
        RESOLVE["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-research-review<br/>captures: needs_rerun<br/>retries: 2"]
    end

    subgraph EscalationRouting ["★ Escalation Routing (New)"]
        direction TB
        CHECK{"★ check_escalations<br/>━━━━━━━━━━<br/>action: route<br/>context.needs_rerun?"}
    end

    subgraph RevalidationLoop ["★ Re-Validation Loop (New)"]
        direction TB
        RERUN["★ re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust<br/>targeted benchmark re-run"]
        REWRITE["★ re_write_report<br/>━━━━━━━━━━<br/>write-report<br/>updated results"]
        RETEST["★ re_test<br/>━━━━━━━━━━<br/>test_check<br/>post-revalidation gate"]
    end

    REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>run_cmd: git push"]
    COMPLETE([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    PUSH_BR --> OPEN
    OPEN --> GUARD
    GUARD -->|"pr_url truthy"| REVIEW
    GUARD -->|"no pr_url"| COMPLETE
    REVIEW -->|"changes_requested"| RESOLVE
    REVIEW -->|"approved / needs_human"| COMPLETE
    RESOLVE -->|"on_success"| CHECK
    RESOLVE -->|"on_failure / exhausted"| COMPLETE
    CHECK -->|"needs_rerun == true"| RERUN
    CHECK -->|"default (false/absent)"| REPUSH
    RERUN -->|"on_success"| REWRITE
    RERUN -->|"on_failure / context_limit"| REPUSH
    REWRITE -->|"on_success"| RETEST
    REWRITE -->|"on_failure / context_limit"| REPUSH
    RETEST -->|"pass or fail"| REPUSH
    REPUSH --> COMPLETE

    class PUSH_BR,COMPLETE terminal;
    class GUARD,CHECK stateNode;
    class OPEN,REVIEW,RESOLVE handler;
    class RERUN,REWRITE,RETEST newComponent;
    class REPUSH phase;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph Manifest ["★ Data Manifest Contract (INIT_ONLY)"]
        direction TB
        DM["★ data_manifest<br/>━━━━━━━━━━<br/>hypothesis[], source_type,<br/>acquisition, location,<br/>verification, depends_on"]
        V9{"★ V9 Gate<br/>━━━━━━━━━━<br/>Every hypothesis has source?<br/>External has acquisition?<br/>Gitignored has generation?"}
    end

    subgraph DesignGate ["★ Design Review Gate"]
        direction TB
        DAQ{"★ data_acquisition L4<br/>━━━━━━━━━━<br/>Hypothesis coverage?<br/>External readiness?<br/>Directive compliance?"}
    end

    subgraph PreFlight ["★ Run-Experiment Pre-Flight"]
        direction TB
        PF{"★ Data Manifest<br/>Verification<br/>━━━━━━━━━━<br/>location exists?<br/>acquisition succeeds?"}
        BH["★ blocked_hypotheses<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>H5: missing at path"]
    end

    subgraph ReportGates ["★ Write-Report Validation Gates"]
        direction TB
        DSS["★ Data Scope Statement<br/>━━━━━━━━━━<br/>Mandatory in Executive Summary<br/>data types + domain coverage"]
        MPC["★ Metrics Provenance<br/>━━━━━━━━━━<br/>timestamp + relevance check<br/>disclose, never silently drop"]
        GE["★ Gate Enforcement<br/>━━━━━━━━━━<br/>pre-specified thresholds only<br/>no silent substitution"]
    end

    subgraph ReviewGate ["★ PR Review Gate"]
        direction TB
        DSCOPE["★ data-scope dimension<br/>━━━━━━━━━━<br/>Scope coverage?<br/>Claims qualified?<br/>Statement present?"]
    end

    subgraph EscalationState ["● Resolve Output Contract"]
        direction TB
        ESC["escalation_records<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>strategy: rerun_required<br/>strategy: design_flaw"]
        NR["● needs_rerun<br/>━━━━━━━━━━<br/>DERIVED from escalations<br/>any rerun_required → true<br/>else → false"]
    end

    DM -->|"writes"| V9
    V9 -->|"PASS: plan saved"| DAQ
    V9 -->|"FAIL: plan rejected"| FAIL_PLAN([Plan Rejected])

    DAQ -->|"GO: proceed"| PF
    DAQ -->|"STOP: hypothesis has no source"| REVISE([Revise Plan])
    DAQ -->|"REVISE: missing verification"| REVISE

    PF -->|"ALL READY"| DSS
    PF -->|"BLOCKED: data missing"| BH
    BH --> FAIL_RUN([Status: FAILED])

    DM -.->|"reads manifest"| PF
    DM -.->|"reads manifest"| DSS
    DM -.->|"reads manifest"| DSCOPE

    DSS --> MPC
    MPC --> GE
    GE -->|"report committed"| DSCOPE

    DSCOPE -->|"findings"| ESC
    ESC -->|"derive"| NR
    NR -->|"true → re-validate"| RERUN([Re-Validation Loop])
    NR -->|"false → push"| PUSH([Direct Push])

    class DM detector;
    class V9,DAQ,PF stateNode;
    class BH,ESC handler;
    class DSS,MPC,GE newComponent;
    class DSCOPE newComponent;
    class NR phase;
    class FAIL_PLAN,FAIL_RUN,REVISE gap;
    class RERUN,PUSH cli;
```

Closes #618

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-074034-301298/.autoskillit/temp/make-plan/research_recipe_data_provenance_plan_2026-04-05_074500_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 587 | 30.7k | 1.2M | 112.6k | 1 | 13m 29s |
| verify | 73 | 35.9k | 3.7M | 137.0k | 2 | 11m 23s |
| implement | 2.1k | 36.2k | 5.9M | 155.2k | 2 | 17m 4s |
| fix | 50 | 13.2k | 2.1M | 64.5k | 1 | 10m 53s |
| audit_impl | 28 | 17.3k | 786.1k | 51.7k | 1 | 5m 55s |
| open_pr | 23 | 17.1k | 736.1k | 58.6k | 1 | 8m 12s |
| **Total** | 2.9k | 150.3k | 14.5M | 579.5k | | 1h 6m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

When a headless session spawns background agents via Claude Code's
`Agent` tool with `run_in_background: true`, Claude Code defers the
`type=result` NDJSON record until all background agents finish. If
autoskillit kills the process tree after Channel B confirms completion,
the deferred `type=result` is never flushed to stdout.
`parse_session_result` classifies the output as `UNPARSEABLE`, which
gates out all recovery paths and Channel B bypass — producing a false
failure for sessions that completed successfully.

The fix adds a **pre-gate Channel B drain-race recovery** in
`_build_skill_result` that runs *before* the `session.session_complete`
gate. When Channel B confirmed completion but the session is
UNPARSEABLE/EMPTY_OUTPUT, it reconstructs the result from
`assistant_messages` (which are written to stdout BEFORE the deferred
`type=result`) and promotes the session to SUCCESS, unlocking all
downstream recovery paths and Channel B bypass naturally.
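
The pre-gate recovery described above can be sketched roughly as follows. This is a hedged illustration, not the real implementation: `RECOVERABLE_SUBTYPES` and the flat dict shape are simplified assumptions standing in for the actual session objects that `_build_skill_result` operates on.

```python
# Sketch of the pre-gate Channel B drain-race recovery. Field names and
# the dict shape are simplified assumptions for illustration only.
RECOVERABLE_SUBTYPES = {"unparseable", "empty_output"}

def recover_drain_race(session):
    """Promote a Channel-B-confirmed session whose deferred type=result
    record was never flushed, reconstructing output from the assistant
    messages that did reach stdout before the kill."""
    if session.get("channel") != "CHANNEL_B":
        return session                  # wrong channel: pass through unchanged
    if session.get("subtype") not in RECOVERABLE_SUBTYPES:
        return session                  # genuine failure subtypes untouched
    marker = session.get("completion_marker")
    messages = session.get("assistant_messages", [])
    # The marker must appear standalone among the assistant messages,
    # and there must be substantive content beyond the marker itself.
    if not marker or not any(m.strip() == marker for m in messages):
        return session
    body = "\n".join(m for m in messages if m.strip() != marker)
    if not body.strip():
        return session
    promoted = dict(session)
    promoted.update(subtype="success", is_error=False, result=body)
    return promoted
```

Because this runs before the `session.session_complete` gate, a promoted session then flows through the ordinary recovery chain and Channel B bypass with no further special-casing.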

## Architecture Impact

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>Entry with SubprocessResult"])

    subgraph PreGate ["● PRE-GATE: Channel B Drain-Race Recovery"]
        direction TB
        CB_CHECK{"● Channel B?<br/>+ subtype in<br/>RECOVERABLE_SUBTYPES?<br/>+ completion_marker?"}
        CB_RECOVER["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Reconstruct result from<br/>assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_SKIP["No recovery needed<br/>━━━━━━━━━━<br/>Pass through unchanged"]
    end

    subgraph CompletionGate ["session.session_complete Gate"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype not in<br/>FAILURE_SUBTYPES"}
        MARKER_RECOVER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Marker-based recovery"]
        PATTERN_RECOVER["_recover_block_from_assistant_messages<br/>━━━━━━━━━━<br/>Pattern-based recovery"]
        SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only"]
        SKIP_RECOVERY["Skip all recovery<br/>━━━━━━━━━━<br/>TIMEOUT / genuine failure"]
    end

    subgraph Outcome ["● _compute_outcome"]
        direction TB
        CB_BYPASS{"● Channel B<br/>bypass in<br/>_compute_success?"}
        CONTENT_CHECK["_check_session_content<br/>━━━━━━━━━━<br/>6-gate validation"]
        DEAD_END{"Dead-end guard<br/>━━━━━━━━━━<br/>ABSENT → DRAIN_RACE<br/>CONTRACT_VIOLATION → FAIL"}
    end

    subgraph PostOutcome ["Post-Outcome Gates"]
        direction TB
        BUDGET["_apply_budget_guard<br/>━━━━━━━━━━<br/>Max consecutive retries"]
        CONTRACT["CONTRACT_RECOVERY gate<br/>━━━━━━━━━━<br/>adjudicated_failure +<br/>write evidence"]
        ZERO_WRITE["Zero-write gate<br/>━━━━━━━━━━<br/>Expected writes missing"]
    end

    subgraph Terminals ["TERMINAL STATES"]
        T_SUCCESS([SUCCEEDED])
        T_RETRY([RETRIABLE<br/>DRAIN_RACE / RESUME /<br/>CONTRACT_RECOVERY])
        T_FAIL([FAILED])
        T_BUDGET([BUDGET_EXHAUSTED])
    end

    START --> CB_CHECK
    CB_CHECK -->|"Yes: CHANNEL_B +<br/>UNPARSEABLE or EMPTY_OUTPUT"| CB_RECOVER
    CB_CHECK -->|"No: other channel<br/>or non-recoverable subtype"| CB_SKIP
    CB_RECOVER -->|"Recovery succeeds:<br/>marker standalone +<br/>substantive content"| CB_PROMOTE
    CB_RECOVER -->|"Recovery fails:<br/>no marker in messages"| CB_SKIP
    CB_PROMOTE --> GATE
    CB_SKIP --> GATE

    GATE -->|"True: session promoted<br/>or originally complete"| MARKER_RECOVER
    GATE -->|"False: TIMEOUT /<br/>unrecoverable subtype"| SKIP_RECOVERY
    MARKER_RECOVER --> PATTERN_RECOVER
    PATTERN_RECOVER --> SYNTH
    SYNTH --> CB_BYPASS
    SKIP_RECOVERY --> CB_BYPASS

    CB_BYPASS -->|"CHANNEL_B + session_complete<br/>+ patterns pass"| T_SUCCESS
    CB_BYPASS -->|"No bypass: falls to<br/>termination dispatch"| CONTENT_CHECK
    CONTENT_CHECK -->|"All 6 gates pass"| T_SUCCESS
    CONTENT_CHECK -->|"Any gate fails"| DEAD_END
    DEAD_END -->|"ABSENT + channel confirmed"| T_RETRY
    DEAD_END -->|"CONTRACT_VIOLATION /<br/>SESSION_ERROR"| T_FAIL

    T_RETRY --> BUDGET
    BUDGET -->|"Under limit"| CONTRACT
    BUDGET -->|"Exceeded"| T_BUDGET
    CONTRACT -->|"adjudicated_failure +<br/>writes ≥ 1"| T_RETRY
    CONTRACT -->|"No match"| ZERO_WRITE
    ZERO_WRITE -->|"Expected writes missing"| T_RETRY
    ZERO_WRITE -->|"No issue"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class CB_CHECK,GATE,CB_BYPASS,DEAD_END stateNode;
    class CB_RECOVER,CB_PROMOTE newComponent;
    class CB_SKIP,SKIP_RECOVERY gap;
    class MARKER_RECOVER,PATTERN_RECOVER,SYNTH handler;
    class CONTENT_CHECK phase;
    class BUDGET,CONTRACT,ZERO_WRITE detector;
    class T_SUCCESS,T_RETRY,T_FAIL,T_BUDGET terminal;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>SubprocessResult input"])

    subgraph EarlyExit ["Phase 1: Early Exit Interception"]
        direction TB
        TERM_CHECK{"termination<br/>reason?"}
        STALE_PATH["STALE handler<br/>━━━━━━━━━━<br/>Attempt stdout recovery<br/>then retry or fail"]
        TIMEOUT_PATH["TIMEOUT handler<br/>━━━━━━━━━━<br/>Override subtype=TIMEOUT<br/>is_error=True"]
    end

    PARSE["parse_session_result<br/>━━━━━━━━━━<br/>NDJSON → ClaudeSessionResult<br/>extracts assistant_messages"]

    subgraph DrainRace ["● Phase 2: Channel B Drain-Race Recovery"]
        direction TB
        CB_MATCH{"● match channel<br/>━━━━━━━━━━<br/>CHANNEL_B +<br/>UNPARSEABLE/EMPTY_OUTPUT<br/>+ completion_marker?"}
        CB_RECON["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Check marker standalone<br/>in assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_NONE["No drain-race<br/>━━━━━━━━━━<br/>Session unchanged"]
    end

    subgraph GatedRecovery ["Phase 3: Completion-Gated Recovery"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype ∉ FAILURE_SUBTYPES"}
        REC_MARKER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Join assistant_messages<br/>when marker is standalone"]
        REC_PATTERN["_recover_block_from_assistant<br/>━━━━━━━━━━<br/>Patterns in messages<br/>not in result"]
        REC_SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only:<br/>inject write paths"]
        GATE_SKIP["Skip recovery<br/>━━━━━━━━━━<br/>Incomplete session"]
    end

    subgraph ComputeOutcome ["● Phase 4: Outcome Adjudication"]
        direction TB
        COMPUTE["● _compute_outcome<br/>━━━━━━━━━━<br/>_compute_success +<br/>_compute_retry"]
        SUCCESS_CHECK{"● success?"}
        RETRY_CHECK{"needs_retry?"}
    end

    subgraph PostGates ["Phase 5: Post-Outcome Gates"]
        direction TB
        BUDGET_G["_apply_budget_guard<br/>━━━━━━━━━━<br/>consecutive_failures ><br/>max_retries?"]
        CONTRACT_G{"CONTRACT_RECOVERY?<br/>━━━━━━━━━━<br/>adjudicated_failure<br/>+ write_count ≥ 1"}
        ZERO_G{"zero_write_gate?<br/>━━━━━━━━━━<br/>success but no<br/>Write/Edit calls"}
    end

    T_SUCCESS([SUCCEEDED])
    T_RETRY([RETRIABLE])
    T_FAIL([FAILED])

    %% FLOW %%
    START --> TERM_CHECK
    TERM_CHECK -->|"STALE"| STALE_PATH
    TERM_CHECK -->|"TIMED_OUT"| TIMEOUT_PATH
    TERM_CHECK -->|"COMPLETED /<br/>NATURAL_EXIT"| PARSE
    STALE_PATH --> T_RETRY
    TIMEOUT_PATH --> PARSE

    PARSE --> CB_MATCH
    CB_MATCH -->|"Yes: all 3 guards pass"| CB_RECON
    CB_MATCH -->|"No: wrong channel /<br/>wrong subtype / no marker"| CB_NONE
    CB_RECON -->|"Marker found standalone<br/>+ substantive content"| CB_PROMOTE
    CB_RECON -->|"No marker or<br/>empty content"| CB_NONE
    CB_PROMOTE --> GATE
    CB_NONE --> GATE

    GATE -->|"True: complete session"| REC_MARKER
    GATE -->|"False: incomplete"| GATE_SKIP
    REC_MARKER --> REC_PATTERN
    REC_PATTERN --> REC_SYNTH
    REC_SYNTH --> COMPUTE
    GATE_SKIP --> COMPUTE

    COMPUTE --> SUCCESS_CHECK
    SUCCESS_CHECK -->|"True"| ZERO_G
    SUCCESS_CHECK -->|"False"| RETRY_CHECK
    RETRY_CHECK -->|"True"| BUDGET_G
    RETRY_CHECK -->|"False"| CONTRACT_G

    BUDGET_G -->|"Under limit"| T_RETRY
    BUDGET_G -->|"Exhausted"| T_FAIL
    CONTRACT_G -->|"Yes: promote to retry"| BUDGET_G
    CONTRACT_G -->|"No"| T_FAIL
    ZERO_G -->|"Writes expected<br/>but count = 0"| T_RETRY
    ZERO_G -->|"OK"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START,T_SUCCESS,T_RETRY,T_FAIL terminal;
    class TERM_CHECK,CB_MATCH,GATE,SUCCESS_CHECK,RETRY_CHECK stateNode;
    class STALE_PATH,TIMEOUT_PATH,PARSE handler;
    class CB_RECON,CB_PROMOTE newComponent;
    class CB_NONE,GATE_SKIP gap;
    class REC_MARKER,REC_PATTERN,REC_SYNTH handler;
    class COMPUTE phase;
    class BUDGET_G,CONTRACT_G,ZERO_G detector;
```

Closes #619

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-619-20260405-085642-620214/.autoskillit/temp/make-plan/channel_b_drain_race_recovery_plan_2026-04-05_090230.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 42 | 18.8k | 1.6M | 80.7k | 1 | 9m 8s |
| verify | 17 | 17.4k | 687.5k | 79.7k | 1 | 6m 55s |
| implement | 77 | 28.2k | 4.4M | 89.7k | 1 | 15m 40s |
| audit_impl | 14 | 8.9k | 348.9k | 43.4k | 1 | 3m 4s |
| open_pr | 3.0k | 17.7k | 865.3k | 63.1k | 1 | 7m 30s |
| **Total** | 3.1k | 91.0k | 8.0M | 356.6k | | 42m 19s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ulator FakeClaudeCLI (#624)

## Summary

Add 10 end-to-end tests in a new file
`tests/execution/test_session_classification_e2e.py` that exercise the
full session failure classification pipeline — from raw NDJSON
subprocess output produced by api-simulator's `fake_claude` fixture
through `parse_session_result()` and `_build_skill_result()` to final
`SkillResult` classification. Today all headless tests use
`MockSubprocessRunner` with pre-constructed `SubprocessResult` objects;
the NDJSON parsing and classification logic is never exercised against
realistic subprocess output. These tests close that gap across four groups:
NDJSON stream robustness (4 tests), context exhaustion edge cases (2
tests), kill boundary scenarios (2 tests), and process behavior
simulation (2 tests).

No production code changes are required. The `api-simulator` dev
dependency was added by #607.

## Requirements

### BRIDGE — Integration Bridge

- **REQ-BRIDGE-001:** Tests must use `fake_claude.run()` to produce real
subprocess output, not hand-constructed strings.
- **REQ-BRIDGE-002:** Tests must feed `proc.stdout` through
`parse_session_result()` from `autoskillit.execution.session`.
- **REQ-BRIDGE-003:** Tests must wrap the parsed result in a
`SubprocessResult` and pass it to `_build_skill_result()` for full
classification.

### PARSE — NDJSON Parse Robustness

- **REQ-PARSE-001:** The parser must correctly skip `type=system` /
`api_retry` records and still extract the final `type=result` record.
- **REQ-PARSE-002:** The parser must handle non-JSON lines (stream
corruption) gracefully without losing valid records.
- **REQ-PARSE-003:** When multiple `type=result` records appear, the
last one must determine classification.
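
A minimal standalone sketch of the parse behavior these requirements assert (not the actual `parse_session_result` implementation, which extracts far more state):

```python
import json

def last_result_record(stdout):
    """Scan NDJSON output line by line, skipping corrupt lines and
    non-result record types (system, api_retry, assistant, ...);
    the last type=result record wins."""
    result = None
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue             # stream corruption: skip, keep scanning
        if record.get("type") == "result":
            result = record      # a later result record overrides earlier ones
    return result

stream = "\n".join([
    '{"type": "system", "subtype": "api_retry"}',
    "%%% corrupt line %%%",
    '{"type": "result", "subtype": "error", "is_error": true}',
    '{"type": "result", "subtype": "success", "is_error": false}',
])
```

Feeding `stream` through `last_result_record` yields the final success record, exercising REQ-PARSE-001 through REQ-PARSE-003 in one pass.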

### CTX — Context Exhaustion

- **REQ-CTX-001:** A flat assistant record containing the context
exhaustion marker with no `type=result` record must classify as
`context_exhaustion` with `needs_retry=True`.
- **REQ-CTX-002:** A `type=result` record with `is_error=True` and
`errors` containing the marker must classify as retriable with
`retry_reason=RESUME`.
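
The two detection paths can be sketched standalone as below; the marker string and record shapes are illustrative assumptions, not the CLI's actual wire format:

```python
# Illustrative marker; the real CLI marker string may differ.
CTX_MARKER = "context window exhausted"

def classify_context_exhaustion(records):
    """REQ-CTX-001/002 sketch: detect exhaustion either from a flat
    assistant record carrying the marker with no result record, or from
    an is_error result whose errors[] carry the marker."""
    result = next((r for r in records if r.get("type") == "result"), None)
    if result is None:
        # Flat assistant path: no type=result record ever arrived.
        flat = any(
            r.get("type") == "assistant" and CTX_MARKER in r.get("text", "")
            for r in records
        )
        if flat:
            return {"subtype": "context_exhaustion", "needs_retry": True}
        return None
    # Error-result path: marker surfaced inside the errors list.
    if result.get("is_error") and any(
        CTX_MARKER in e for e in result.get("errors", [])
    ):
        return {"needs_retry": True, "retry_reason": "RESUME"}
    return None
```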

### KILL — Kill Boundary

- **REQ-KILL-001:** A truncated stream (via `truncate_after`) must
produce `subtype=unparseable` or partial classification with nonzero
exit code.
- **REQ-KILL-002:** An `interrupted` subtype with nonzero exit code must
result in `needs_retry=False` (gated by returncode).

### PROC — Process Behavior

- **REQ-PROC-001:** The hang-after-result scenario must verify that the
result record was emitted to stdout before the process hung.
- **REQ-PROC-002:** Mid-stream exit via `inject_exit` must produce the
correct exit code and truncated stdout.

### COMPAT — Compatibility

- **REQ-COMPAT-001:** Existing `test_headless.py` and `test_session.py`
tests must remain unchanged and passing.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([FakeClaudeCLI<br/>━━━━━━━━━━<br/>api-simulator fixture])

    subgraph Bridge ["★ E2E Test Bridge (new)"]
        direction TB
        RUN["★ fake_claude.run()<br/>━━━━━━━━━━<br/>CompletedProcess<br/>with real NDJSON stdout"]
        WRAP["★ _classify() / inline<br/>━━━━━━━━━━<br/>Wrap in SubprocessResult<br/>pid=0, caller termination"]
    end

    subgraph Parse ["parse_session_result()"]
        direction TB
        SCAN{"stdout empty?"}
        LOOP["Scan NDJSON lines<br/>━━━━━━━━━━<br/>JSON decode; skip errors<br/>last type=result wins"]
        CTX_FLAG{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?"}
        RESULT_FOUND{"result record<br/>found?"}
    end

    subgraph Classify ["_compute_outcome()"]
        direction TB
        SUCCESS_GATE{"_compute_success<br/>━━━━━━━━━━<br/>returncode=0?<br/>is_error? result?"}
        RETRY_GATE{"_compute_retry<br/>━━━━━━━━━━<br/>session.needs_retry?<br/>kill anomaly?"}
        CONTRA{"contradiction<br/>success+retry?"}
        DEADEND{"dead-end<br/>failed+confirmed<br/>+ABSENT?"}
    end

    subgraph Normalize ["_normalize_subtype()"]
        NORM["Map raw CLI subtype<br/>━━━━━━━━━━<br/>to final string label"]
    end

    subgraph Gates ["Post-Classification Gates"]
        BUDGET{"budget<br/>exhausted?"}
        ZERO{"zero writes<br/>when expected?"}
    end

    subgraph Outcomes ["SkillResult"]
        direction LR
        OK([success])
        CTX([context_exhaustion<br/>needs_retry=True])
        EMPTY([empty_output /<br/>unparseable])
        INTR([interrupted<br/>needs_retry=False])
        FAIL([failure<br/>terminal])
    end

    START --> RUN
    RUN --> WRAP
    WRAP --> SCAN
    SCAN -->|"empty"| EMPTY
    SCAN -->|"non-empty"| LOOP
    LOOP --> CTX_FLAG
    CTX_FLAG -->|"yes → jsonl_context_exhausted=True"| RESULT_FOUND
    CTX_FLAG -->|"no"| RESULT_FOUND
    RESULT_FOUND -->|"yes"| SUCCESS_GATE
    RESULT_FOUND -->|"no → UNPARSEABLE / CTX_EXHAUSTION"| RETRY_GATE
    SUCCESS_GATE --> RETRY_GATE
    RETRY_GATE --> CONTRA
    CONTRA -->|"demote success"| DEADEND
    CONTRA -->|"consistent"| DEADEND
    DEADEND -->|"DRAIN_RACE"| NORM
    DEADEND -->|"terminal"| NORM
    NORM --> BUDGET
    BUDGET -->|"BUDGET_EXHAUSTED"| FAIL
    BUDGET -->|"ok"| ZERO
    ZERO -->|"zero_writes"| CTX
    ZERO -->|"ok"| OK
    SUCCESS_GATE -->|"returncode!=0"| INTR

    class START terminal;
    class RUN,WRAP newComponent;
    class LOOP handler;
    class SCAN,CTX_FLAG,RESULT_FOUND stateNode;
    class SUCCESS_GATE,RETRY_GATE,CONTRA,DEADEND phase;
    class NORM handler;
    class BUDGET,ZERO detector;
    class OK,CTX,EMPTY,INTR,FAIL terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start (FakeClaudeCLI), final SkillResult outcomes |
| Green | New Component | ★ `_classify()` bridge helper and `fake_claude.run()` — new test code |
| Orange | Handler | NDJSON scan/accumulation and subtype normalization |
| Teal | State | Decision points: empty check, context flag, result found |
| Purple | Phase | Outcome computation gates (success, retry, contradiction, dead-end) |
| Red | Detector | Post-classification guards (budget, zero-write) |

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([★ E2E Test Suite<br/>━━━━━━━━━━<br/>10 failure scenarios<br/>via FakeClaudeCLI])

    subgraph ParseGates ["NDJSON Parse Resilience Gates"]
        direction TB
        EMPTY_CHECK{"stdout<br/>empty?"}
        JSON_ERR["Corrupt / non-JSON lines<br/>━━━━━━━━━━<br/>silently skipped<br/>(test 2: corrupt_stream)"]
        API_RETRY["api_retry records<br/>━━━━━━━━━━<br/>skipped — not type=result<br/>(test 1: inject_api_retry)"]
        LAST_WINS["Multiple result records<br/>━━━━━━━━━━<br/>last record wins<br/>(test 3: two results)"]
        EXHAUST["Exhausted retries<br/>━━━━━━━━━━<br/>no result record emitted<br/>(test 4: exhaust=True)"]
    end

    subgraph CtxDetect ["Context Exhaustion Detection"]
        direction TB
        FLAT_DETECT{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?<br/>(test 5)"}
        ERR_DETECT{"is_error=True AND<br/>marker in errors[]?<br/>(test 6)"}
        CTX_FLAG["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>race-resilient flag"]
    end

    subgraph KillGates ["Kill Boundary Gates"]
        direction TB
        RC_CHECK{"returncode != 0?"}
        KILL_ANOM{"_is_kill_anomaly?<br/>━━━━━━━━━━<br/>UNPARSEABLE /<br/>EMPTY_OUTPUT /<br/>INTERRUPTED"}
        INTR_GATE{"subtype=interrupted<br/>+ rc != 0?<br/>(test 8)"}
    end

    subgraph PostGates ["Post-Classification Guards"]
        BUDGET{"consecutive failures<br/>> budget max?"}
        ZERO_WRITE{"success AND<br/>write_count=0<br/>AND write expected?"}
    end

    T_SUCCESS([success<br/>━━━━━━━━━━<br/>needs_retry=False])
    T_CTX([context_exhaustion<br/>━━━━━━━━━━<br/>needs_retry=True, RESUME])
    T_EMPTY([empty_output / unparseable<br/>━━━━━━━━━━<br/>needs_retry=True via RESUME])
    T_INTR([interrupted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_BUDGET([budget_exhausted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_ZERO([zero_writes<br/>━━━━━━━━━━<br/>needs_retry=True])

    START --> EMPTY_CHECK
    EMPTY_CHECK -->|"empty stdout"| T_EMPTY
    EMPTY_CHECK -->|"has content"| JSON_ERR
    JSON_ERR -->|"skip bad lines, continue"| API_RETRY
    API_RETRY -->|"skip, continue to result"| LAST_WINS
    LAST_WINS -->|"no result"| EXHAUST
    EXHAUST -->|"empty_output / unparseable"| T_EMPTY
    LAST_WINS -->|"result found"| FLAT_DETECT
    FLAT_DETECT -->|"yes"| CTX_FLAG
    FLAT_DETECT -->|"no"| ERR_DETECT
    ERR_DETECT -->|"yes"| CTX_FLAG
    CTX_FLAG -->|"needs_retry=True"| T_CTX
    ERR_DETECT -->|"no"| RC_CHECK
    RC_CHECK -->|"nonzero (test 7,8,10)"| INTR_GATE
    INTR_GATE -->|"yes → no retry"| T_INTR
    INTR_GATE -->|"no"| T_EMPTY
    RC_CHECK -->|"zero"| KILL_ANOM
    KILL_ANOM -->|"anomaly → RESUME retry"| T_EMPTY
    KILL_ANOM -->|"no anomaly"| BUDGET
    BUDGET -->|"exceeded"| T_BUDGET
    BUDGET -->|"ok"| ZERO_WRITE
    ZERO_WRITE -->|"violation"| T_ZERO
    ZERO_WRITE -->|"ok"| T_SUCCESS

    class START newComponent;
    class EMPTY_CHECK,FLAT_DETECT,ERR_DETECT,RC_CHECK,KILL_ANOM,INTR_GATE stateNode;
    class JSON_ERR,API_RETRY,LAST_WINS,EXHAUST,CTX_FLAG handler;
    class BUDGET,ZERO_WRITE detector;
    class T_SUCCESS,T_CTX,T_EMPTY,T_INTR,T_BUDGET,T_ZERO terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite (new) — exercises all failure paths |
| Teal | Decision Gates | Key detection and routing decisions |
| Orange | Handler | Parse resilience processing and flag setting |
| Red | Guard | Post-classification safety guards (budget, zero-write) |
| Dark Blue | Terminal | Final SkillResult outcome states |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    TEST["★ test_session_classification_e2e.py<br/>━━━━━━━━━━<br/>10 scenarios assert field contracts<br/>across all classification paths"]

    subgraph ParseState ["INIT_ONLY — Set by Parser, Never Overwritten"]
        direction LR
        CTX_EX["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>flat assistant → True<br/>read by _is_context_exhausted()"]
        RC["returncode / termination<br/>━━━━━━━━━━<br/>from SubprocessResult<br/>used in all compute_* gates"]
        SID["session_id<br/>━━━━━━━━━━<br/>from result record<br/>passed through unchanged"]
    end

    subgraph DerivedState ["DERIVED — Computed, Not Stored During Parse"]
        direction TB
        SUCCESS_D["success<br/>━━━━━━━━━━<br/>returncode=0 AND content gates<br/>must be False if needs_retry=True"]
        RETRY_D["needs_retry + retry_reason<br/>━━━━━━━━━━<br/>RESUME / ZERO_WRITES / etc.<br/>only valid pair if needs_retry=True"]
        SUBTYPE_D["subtype (normalized)<br/>━━━━━━━━━━<br/>'success' / 'context_exhaustion'<br/>/ 'interrupted' / etc."]
    end

    subgraph Contracts ["CONTRACT ENFORCEMENT GATES"]
        direction TB
        CONTRA_GATE{"Contradiction Guard<br/>━━━━━━━━━━<br/>success=True AND<br/>needs_retry=True?"}
        INTR_GATE{"Interrupted Gate<br/>━━━━━━━━━━<br/>subtype=interrupted AND<br/>rc != 0?"}
        CTX_GATE{"Context Exhaustion<br/>━━━━━━━━━━<br/>jsonl_context_exhausted OR<br/>marker in errors[]?"}
        BUDGET_GATE{"Budget Guard<br/>━━━━━━━━━━<br/>consecutive failures<br/>> budget max?"}
    end

    subgraph ResumeStates ["RESUME SAFETY — needs_retry contract"]
        direction LR
        RESUME_OK(["needs_retry=True<br/>retry_reason=RESUME<br/>━━━━━━━━━━<br/>context_exhaustion path"])
        NO_RETRY(["needs_retry=False<br/>retry_reason=NONE<br/>━━━━━━━━━━<br/>interrupted + rc!=0 path"])
        BUDGET_STOP(["needs_retry=False<br/>retry_reason=BUDGET_EXHAUSTED<br/>━━━━━━━━━━<br/>terminal, no more retries"])
    end

    TEST -->|"asserts all contracts"| CTX_EX
    TEST --> RC
    TEST --> SID

    CTX_EX -->|"read by"| CTX_GATE
    RC -->|"read by"| INTR_GATE
    RC -->|"read by"| CONTRA_GATE

    CTX_GATE -->|"exhausted → needs_retry=True"| RETRY_D
    CTX_GATE -->|"not exhausted"| INTR_GATE
    INTR_GATE -->|"interrupted+rc!=0 → terminal"| NO_RETRY
    INTR_GATE -->|"other"| CONTRA_GATE
    CONTRA_GATE -->|"contradiction → demote success"| SUCCESS_D
    CONTRA_GATE -->|"consistent"| SUCCESS_D
    RETRY_D --> BUDGET_GATE
    SUCCESS_D --> BUDGET_GATE
    SUBTYPE_D --> BUDGET_GATE
    BUDGET_GATE -->|"exceeded → clamp"| BUDGET_STOP
    BUDGET_GATE -->|"within budget"| RESUME_OK

    class TEST newComponent;
    class CTX_EX,RC,SID detector;
    class SUCCESS_D,RETRY_D,SUBTYPE_D phase;
    class CTX_GATE,INTR_GATE,CONTRA_GATE,BUDGET_GATE stateNode;
    class RESUME_OK,NO_RETRY,BUDGET_STOP cli;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite — asserts all field contracts |
| Red | INIT_ONLY | Fields set by parser, never overwritten |
| Purple | Derived | Fields computed from classification, not stored during parse |
| Teal | Gates | Contract enforcement decision points |
| Dark Blue | Resume States | Terminal resume-safety outcomes |

Closes #608

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-608-20260405-085643-660865/.autoskillit/temp/make-plan/test_session_failure_classification_with_api_simulator_plan_2026-04-05_090300.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 31 | 22.4k | 812.6k | 59.1k | 1 | 12m 6s |
| verify | 21 | 17.2k | 863.3k | 66.7k | 1 | 9m 28s |
| implement | 2.5k | 9.4k | 1.1M | 48.2k | 1 | 5m 43s |
| fix | 21 | 7.3k | 703.0k | 42.4k | 1 | 7m 38s |
| audit_impl | 10 | 7.4k | 139.9k | 39.6k | 1 | 3m 29s |
| open_pr | 47 | 27.2k | 2.2M | 74.8k | 1 | 10m 44s |
| **Total** | 2.7k | 90.9k | 5.8M | 330.8k | | 49m 11s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rect Changes (#623)

## Summary

When `implement-worktree-no-merge` runs and the model ignores
instructions to create a worktree (via `git worktree add`), it edits
files directly in the clone directory. This leaves dirty uncommitted
changes (or direct commits) on the clone's branch. On retry, the next
session inherits a contaminated working tree.

This plan adds a **clone contamination guard** to the headless execution
pipeline. The guard:
1. Snapshots the clone's HEAD SHA before each worktree-based skill
session
2. After a failed session where no worktree was created, detects
contamination (uncommitted changes or direct commits)
3. Reverts the clone to its pre-session state
4. Logs the cleanup for pipeline observability

Key architectural insight: `EnterWorktree` does not exist in this
codebase. Worktree creation uses standard `git worktree add` via Bash,
and success is signaled by emitting `worktree_path = <path>` tokens in
assistant messages. Detection of "no worktree created" is therefore: no
`worktree_path` token in `session.assistant_messages`.

## Requirements

### Snapshot (SNAP)

- **REQ-SNAP-001:** The system must capture the clone HEAD SHA before
each `run_skill` invocation for worktree-based skills
(implement-worktree-no-merge, retry-worktree).
- **REQ-SNAP-002:** The system must capture the clone working tree
cleanliness state (clean/dirty) before each `run_skill` invocation for
worktree-based skills.

### Detection (DET)

- **REQ-DET-001:** The system must detect uncommitted changes in the
clone CWD after a worktree-based skill session that was adjudicated as
failure.
- **REQ-DET-002:** The system must detect direct commits in the clone
(HEAD differs from pre-session SHA) after a worktree-based skill session
that was adjudicated as failure.
- **REQ-DET-003:** The system must verify whether a worktree was
created during the session by checking for a `worktree_path` token in
the session result.

### Revert (REV)

- **REQ-REV-001:** The system must revert uncommitted changes in the
clone when contamination is detected (git checkout + git clean).
- **REQ-REV-002:** The system must revert direct commits in the clone
when contamination is detected (git reset to pre-session SHA).
- **REQ-REV-003:** The revert must only execute when all three
conditions are met: worktree-based skill, adjudicated failure, and no
`worktree_path` set in the session result.

### Observability (OBS)

- **REQ-OBS-001:** The system must log all contamination detection and
revert actions in the audit log with sufficient detail for pipeline
visibility.
- **REQ-OBS-002:** The audit log entry must include the pre-session SHA,
post-session SHA, list of contaminated files, and revert action taken.
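
A minimal sketch of the REQ-OBS-002 payload, assuming a dataclass-shaped audit record; the field and subtype names are hypothetical, not the actual audit schema:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CloneContaminationRecord:
    pre_session_sha: str
    post_session_sha: str
    contaminated_files: list[str]
    revert_action: str  # e.g. "reset+clean" when HEAD moved, "clean-only" otherwise

def audit_payload(record: CloneContaminationRecord) -> dict:
    # Shape destined for the audit log entry (REQ-OBS-001/002).
    return {"subtype": "clone_contamination", **asdict(record)}
```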

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START(["● run_headless_core()"])

    subgraph PreSession ["★ Pre-Session Snapshot"]
        direction TB
        IS_WT{"★ is_worktree_skill?<br/>━━━━━━━━━━<br/>implement-worktree-no-merge<br/>or retry-worktree in cmd"}
        IS_CLONE{"★ not is_git_worktree?<br/>━━━━━━━━━━<br/>cwd is clone root,<br/>not a worktree"}
        SNAP["★ snapshot_clone_state()<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>→ CloneSnapshot(head_sha)"]
    end

    subgraph Session ["Existing Session Lifecycle"]
        direction TB
        RUN["● runner() subprocess<br/>━━━━━━━━━━<br/>Headless Claude CLI"]
        BUILD["● _build_skill_result()<br/>━━━━━━━━━━<br/>Adjudication + gates<br/>worktree_path always extracted"]
    end

    subgraph PostGuard ["★ Post-Session Clone Guard"]
        direction TB
        CHK_SNAP{"★ snapshot captured?<br/>━━━━━━━━━━<br/>_clone_snapshot is not None"}
        CHK_SUCC{"★ skill_result.success?"}
        CHK_WT{"★ worktree_path set?<br/>━━━━━━━━━━<br/>skill_result.worktree_path<br/>is not None"}
        DETECT["★ detect_contamination()<br/>━━━━━━━━━━<br/>git rev-parse HEAD → post_sha<br/>git status --porcelain → files"]
        CHK_DIRTY{"★ contamination found?<br/>━━━━━━━━━━<br/>post_sha ≠ pre_sha<br/>OR dirty files"}
        REVERT["★ revert_contamination()<br/>━━━━━━━━━━<br/>git reset --hard pre_sha<br/>git clean -fd"]
        AUDIT["★ audit.record_failure()<br/>━━━━━━━━━━<br/>subtype=clone_contamination<br/>RetryReason.CLONE_CONTAMINATION"]
    end

    FLUSH["● flush_session_log()<br/>━━━━━━━━━━<br/>★ clone_contamination_reverted<br/>→ summary.json"]
    RETURN(["● return skill_result"])
    SKIP_SNAP(["skip → _clone_snapshot=None"])

    START --> IS_WT
    IS_WT -->|"no: not a worktree skill"| SKIP_SNAP
    IS_WT -->|"yes"| IS_CLONE
    IS_CLONE -->|"already a worktree CWD"| SKIP_SNAP
    IS_CLONE -->|"clone root CWD"| SNAP
    SNAP --> RUN
    SKIP_SNAP --> RUN
    RUN --> BUILD
    BUILD --> CHK_SNAP
    CHK_SNAP -->|"no snapshot"| FLUSH
    CHK_SNAP -->|"snapshot exists"| CHK_SUCC
    CHK_SUCC -->|"success=True"| FLUSH
    CHK_SUCC -->|"success=False"| CHK_WT
    CHK_WT -->|"worktree created"| FLUSH
    CHK_WT -->|"no worktree"| DETECT
    DETECT --> CHK_DIRTY
    CHK_DIRTY -->|"clean"| FLUSH
    CHK_DIRTY -->|"contaminated"| REVERT
    REVERT --> AUDIT
    AUDIT --> FLUSH
    FLUSH --> RETURN

    class START,RETURN,SKIP_SNAP terminal;
    class IS_WT,IS_CLONE,CHK_SNAP,CHK_SUCC,CHK_WT,CHK_DIRTY stateNode;
    class RUN,BUILD,FLUSH handler;
    class SNAP,DETECT,REVERT,AUDIT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry/exit points of `run_headless_core` |
| Teal | State/Decision | Routing decisions that control guard activation |
| Orange | Handler | Existing subprocess, adjudication, and telemetry nodes |
| Green | New Component | New clone contamination guard components (★) |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — SERVER (existing, unchanged)"]
        direction LR
        SERVER["server/tools_execution.py<br/>━━━━━━━━━━<br/>run_skill, run_cmd handlers"]
    end

    subgraph L1 ["L1 — EXECUTION"]
        direction TB
        HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>run_headless_core()<br/>_build_skill_result()"]
        CLONE_GUARD["★ execution/clone_guard.py<br/>━━━━━━━━━━<br/>is_worktree_skill()<br/>snapshot_clone_state()<br/>check_and_revert_clone_contamination()"]
        SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>flush_session_log()<br/>★ clone_contamination_reverted"]
        COMMANDS["execution/commands.py<br/>━━━━━━━━━━<br/>build_full_headless_cmd()"]
        SESSION["execution/session.py<br/>━━━━━━━━━━<br/>ClaudeSessionResult"]
    end

    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction TB
        ENUMS["● core/_type_enums.py<br/>━━━━━━━━━━<br/>RetryReason enum<br/>★ CLONE_CONTAMINATION added"]
        TYPES["core/types.py<br/>━━━━━━━━━━<br/>SkillResult, FailureRecord<br/>AuditStore, SubprocessRunner"]
        PATHS["core/paths.py<br/>━━━━━━━━━━<br/>is_git_worktree()"]
        LOGGING["core/logging.py<br/>━━━━━━━━━━<br/>get_logger()"]
        CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-exports all L0 surface"]
    end

    subgraph Ext ["EXTERNAL (stdlib)"]
        STDLIB["dataclasses, pathlib<br/>datetime, typing"]
    end

    SERVER -->|"imports run_headless"| HEADLESS
    HEADLESS -->|"★ imports 3 functions"| CLONE_GUARD
    HEADLESS -->|"imports"| COMMANDS
    HEADLESS -->|"imports"| SESSION
    HEADLESS -->|"imports"| SESSION_LOG
    HEADLESS -->|"imports core surface"| CORE_INIT
    CLONE_GUARD -->|"★ imports FailureRecord<br/>RetryReason, SkillResult<br/>get_logger, is_git_worktree"| CORE_INIT
    SESSION_LOG -->|"imports"| LOGGING
    CORE_INIT -->|"re-exports"| ENUMS
    CORE_INIT -->|"re-exports"| TYPES
    CORE_INIT -->|"re-exports"| PATHS
    CORE_INIT -->|"re-exports"| LOGGING
    TYPES -->|"imports RetryReason"| ENUMS
    CLONE_GUARD -->|"stdlib only"| STDLIB
    ENUMS -->|"stdlib only"| STDLIB

    class SERVER cli;
    class HEADLESS,SESSION_LOG,COMMANDS,SESSION handler;
    class CLONE_GUARD newComponent;
    class ENUMS,TYPES,PATHS,LOGGING,CORE_INIT stateNode;
    class STDLIB integration;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Server (L3) | MCP tool handlers — top application layer |
| Orange | Execution (L1) | Service/orchestration layer modules |
| Green | New Module | `clone_guard.py` — new L1 execution module (★) |
| Teal | Core (L0) | Stable vocabulary/type layer — high fan-in |
| Red | External | Standard library dependencies |

Closes #617

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-617-20260405-085643-202786/.autoskillit/temp/make-plan/clone_contamination_guard_plan_2026-04-05_090600.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 6.9k | 23.4k | 1.7M | 82.7k | 1 | 10m 39s |
| verify | 33 | 20.7k | 1.4M | 55.6k | 1 | 8m 39s |
| implement | 81 | 24.3k | 4.4M | 89.7k | 1 | 10m 6s |
| fix | 40 | 14.4k | 1.7M | 62.9k | 1 | 9m 17s |
| audit_impl | 13 | 11.0k | 288.2k | 45.3k | 1 | 4m 14s |
| open_pr | 28 | 20.1k | 1.0M | 55.4k | 1 | 7m 18s |
| **Total** | 7.1k | 113.9k | 10.5M | 391.6k | | 50m 15s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rtifact Merge Phase (#625)

## Summary

Add a six-step archival phase to the end of the research recipe
(`research.yaml`) that separates research artifacts from experimental
code before completion. After all review cycles, re-runs, and CI checks
finish, the new phase: (1) captures the experiment branch name, (2)
creates a clean artifact-only branch containing only `research/` from a
temporary worktree, (3) opens an artifact PR targeting the base branch,
(4) tags the full experiment branch under `archive/research/` for
permanent reference, (5) closes the original experiment PR with
cross-reference links, then (6) proceeds to `research_complete`. Every
archival step degrades gracefully — `on_failure` routes to
`research_complete` so the pipeline never blocks on archival failures.

## Requirements

### SPLIT — Artifact Extraction

- **REQ-SPLIT-001:** The recipe must create a new branch from the base
branch (e.g., main) containing only the `research/` directory contents
from the experiment branch, with no production source file changes.
- **REQ-SPLIT-002:** The artifact extraction must use `git checkout
<experiment-branch> -- research/` (or equivalent) to copy only the
research directory's file state, not replay commit history.
- **REQ-SPLIT-003:** The artifact-only branch must produce a single
clean commit with a descriptive message referencing the experiment name.

### PR — Artifact PR

- **REQ-PR-001:** The recipe must open a PR targeting the base branch
with the artifact-only branch, referencing the original experiment PR
number and summarizing key findings in the body.
- **REQ-PR-002:** The artifact PR must contain zero changes to
production source files — only files under `research/`.

### TAG — Branch Archival

- **REQ-TAG-001:** The recipe must create an annotated git tag with the
prefix `archive/research/` capturing the final state of the experiment
branch (after all reviews, re-runs, and CI pass).
- **REQ-TAG-002:** The annotated tag message must include the experiment
name and a note that the report was merged via the artifact PR.
- **REQ-TAG-003:** The tag must be pushed to the remote before the
experiment branch is cleaned up.
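
A sketch of the tag step under REQ-TAG-001 through REQ-TAG-003; the tag message wording and helper name are assumptions, not the recipe's actual text:

```python
def archive_tag_commands(experiment_name: str, experiment_branch: str,
                         artifact_pr_url: str) -> tuple[str, list[list[str]]]:
    """Annotated archive tag on the experiment branch, pushed before cleanup."""
    tag = f"archive/research/{experiment_name}"
    message = (f"Archive of experiment '{experiment_name}'; "
               f"report merged via artifact PR {artifact_pr_url}")
    return tag, [
        ["git", "tag", "-a", tag, "-m", message, experiment_branch],
        # REQ-TAG-003: push the tag before the experiment branch is deleted
        ["git", "push", "origin", tag],
    ]
```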

### CLOSE — Experiment PR Closure

- **REQ-CLOSE-001:** The recipe must close the original experiment PR
with a comment linking to the artifact PR, the archive tag, and any
follow-up implementation issues.
- **REQ-CLOSE-002:** The closure comment must explain why the PR was not
merged (experimental code in production source files) and where the
research record is preserved.

### ORDER — Execution Ordering

- **REQ-ORDER-001:** The archival phase must execute only after all
review cycles, review resolutions, experiment re-runs (per #618), and CI
checks have completed successfully.
- **REQ-ORDER-002:** The archival phase must be the final phase before
`research_complete`, not interleaved with review or re-validation steps.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph PostReview ["● Post-Review Phase (modified routing)"]
        direction TB
        GPR{"guard_pr_url<br/>━━━━━━━━━━<br/>pr_url set?"}
        RRP["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-pr<br/>skip_when_false: review_pr"]
        RRR["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-review<br/>retries: 2"]
        CE{"check_escalations<br/>━━━━━━━━━━<br/>needs_rerun?"}
        RERUN["re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust"]
        REWRITE["re_write_report<br/>━━━━━━━━━━<br/>write-report"]
        RETEST["re_test<br/>━━━━━━━━━━<br/>test_check"]
        REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>git push"]
    end

    subgraph Archival ["★ Archival Phase (new)"]
        direction TB
        BA{"★ begin_archival<br/>━━━━━━━━━━<br/>pr_url truthy?"}
        CEB["★ capture_experiment_branch<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>captures: experiment_branch"]
        CAB["★ create_artifact_branch<br/>━━━━━━━━━━<br/>worktree + checkout research/<br/>captures: artifact_branch"]
        OAP["★ open_artifact_pr<br/>━━━━━━━━━━<br/>gh pr create (research/ only)<br/>captures: artifact_pr_url"]
        TEB["★ tag_experiment_branch<br/>━━━━━━━━━━<br/>git tag -a archive/research/*<br/>captures: archive_tag"]
        CEP["★ close_experiment_pr<br/>━━━━━━━━━━<br/>gh pr close + comment"]
    end

    RC([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    GPR -->|"pr_url empty"| RC
    GPR -->|"pr_url truthy"| RRP
    RRP -->|"changes_requested"| RRR
    RRP -->|"needs_human / default / fail"| BA
    RRR -->|"success"| CE
    RRR -->|"exhausted / fail"| BA
    CE -->|"needs_rerun=true"| RERUN
    CE -->|"default"| REPUSH
    RERUN --> REWRITE --> RETEST --> REPUSH
    REPUSH -->|"success / fail"| BA

    BA -->|"pr_url truthy"| CEB
    BA -->|"default"| RC
    CEB -->|"success"| CAB
    CEB -->|"fail"| RC
    CAB -->|"success"| OAP
    CAB -->|"fail"| RC
    OAP -->|"success"| TEB
    OAP -->|"fail"| RC
    TEB -->|"success"| CEP
    TEB -->|"fail"| RC
    CEP -->|"success / fail"| RC

    class GPR,CE,BA stateNode;
    class RRP,RRR,RERUN,REWRITE,RETEST,REPUSH handler;
    class CEB,CAB,OAP,TEB,CEP newComponent;
    class RC terminal;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | `research_complete` stop state |
| Teal | State/Route | Decision and routing steps (guard_pr_url, check_escalations, begin_archival) |
| Orange | Handler | Existing processing steps — `●` marks modified routing targets |
| Green | New Component | Six new archival steps (`★`) — linear chain with graceful degradation |

Closes #621

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-101015-593986/.autoskillit/temp/make-plan/research_recipe_post_completion_archival_plan_2026-04-05_101500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.2k | 36.6k | 1.4M | 90.3k | 1 | 16m 17s |
| verify | 32 | 25.8k | 1.2M | 55.5k | 1 | 14m 5s |
| implement | 48 | 14.0k | 1.9M | 50.5k | 1 | 5m 52s |
| audit_impl | 16 | 9.7k | 178.9k | 55.3k | 2 | 4m 31s |
| open_pr | 22 | 11.7k | 690.1k | 46.2k | 1 | 4m 26s |
| **Total** | 2.3k | 97.7k | 5.4M | 297.8k | | 45m 13s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

- Add `Configure git auth for private deps` step to
`patch-bump-integration.yml` and `version-bump.yml` before `uv lock`
runs
- Fixes authentication failure when resolving the private
`api-simulator` git dependency added in PR #613
- Mirrors the existing auth pattern already present in `tests.yml` (line
76)

## Root Cause

PR #613 added `api-simulator` as a private git dependency in
`pyproject.toml`. The `tests.yml` workflow was updated with git auth,
but both version-bump workflows were missed. Every PR merged to
`integration` since then fails at the `uv lock` step with:

```
fatal: could not read Username for 'https://github.com': terminal prompts disabled
```

## Test plan

- [ ] This PR's own CI passes (tests.yml)
- [ ] After merge, the patch-bump workflow should succeed — verify by
checking the `bump-patch` check on this PR's merge commit
- [ ] Re-run a recent failed bump-patch workflow to confirm the fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

Fixes a 3-iteration ejection loop in the merge queue pipeline by
introducing ejection-cause enrichment (`ejected_ci_failure` state and
`ejection_cause` field in `wait_for_merge_queue`), a CI gate after every
force-push (`ci_watch_post_queue_fix` step), and two post-rebase
manifest validation gates (language-aware validity check and duplicate
key scan) in `resolve-merge-conflicts`. Closes all six gaps identified
in #627: blind CI ejection routing, missing CI gate after re-push,
absent manifest/semantic validation, and missing `head_sha` in CI
results.

<details>
<summary>Individual Group Plans</summary>

### Group 1: Implementation Plan: Queue Ejection Loop Fix — PART A ONLY

This part addresses the Python code layer for the queue ejection loop
fix (Gaps 2 and 5 from issue #627).

**Gap 2** — `execution/merge_queue.py` currently returns
`pr_state="ejected"` for every ejection regardless of cause. When
GitHub's CI fails on a merge-group commit, the recipe cannot distinguish
a CI failure ejection from a conflict ejection, so it retries conflict
resolution indefinitely (no-op rebase loop). The fix: when the ejection
is confirmed and `checks_state == "FAILURE"`, return
`pr_state="ejected_ci_failure"` plus an `ejection_cause="ci_failure"`
field, allowing recipe `on_result` routing to send CI failures directly
to `diagnose_ci` instead of `queue_ejected_fix`.

**Gap 5** — `server/tools_ci.py` infers `head_sha` from `git rev-parse
HEAD` but never includes it in the JSON response. Recipe orchestrators
cannot verify that CI results correspond to the current HEAD after a
force-push. The fix: include `head_sha` in the `wait_for_ci` return dict
when it was resolved.

### Group 2: Implementation Plan: Queue Ejection Loop Fix — PART B ONLY

This part addresses the recipe and skill layer of the queue ejection
loop fix (Gaps 1, 3, 4, 6 from issue #627). Part A (code layer) must be
implemented first — this part routes on `pr_state="ejected_ci_failure"`
which Part A introduces.

**Gap 1** — `re_push_queue_fix` routes directly to `reenter_merge_queue`
after force-push, bypassing CI. Fix: insert a new
`ci_watch_post_queue_fix` step between `re_push_queue_fix` and
`reenter_merge_queue`, mirroring the existing `ci_watch` step.

**Gap 6** — `wait_for_queue` routes all `ejected` states to
`queue_ejected_fix` (conflict resolution), even when the ejection was
caused by a CI failure that conflict resolution cannot fix. Fix: add an
`ejected_ci_failure` route before `ejected` in
`wait_for_queue.on_result`, routing to `diagnose_ci` instead.

**Gap 3** — `resolve-merge-conflicts` SKILL.md runs only `pre-commit run
--all-files` post-rebase. Fix: add Step 5a — language-detected manifest
validation using fast non-compiling checks.

**Gap 4** — Even a clean rebase can produce duplicate keys when both
branches independently added the same dependency. Fix: add Step 5b —
targeted duplicate key scan in TOML/JSON manifest files.

Applied to: `recipes/implementation.yaml`, `recipes/remediation.yaml`,
`recipes/implementation-groups.yaml`,
`skills_extended/resolve-merge-conflicts/SKILL.md`.

</details>
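
For JSON manifests, the Step 5b duplicate key scan can use the stdlib `object_pairs_hook`, which sees every key before `json.loads` silently collapses duplicates; this is a sketch of the technique, not the skill's exact script (for TOML, `tomllib` already raises on duplicate keys, so parsing alone covers that case):

```python
import json

def find_duplicate_json_keys(text: str) -> list[str]:
    """Collect keys that appear more than once within any JSON object."""
    duplicates: list[str] = []

    def hook(pairs):
        # Called for each object with its raw key/value pairs, duplicates included.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates
```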

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([wait_for_queue\nrecipe step])
    END_OK([release_issue_success])
    END_FAIL([release_issue_failure])
    END_TIMEOUT([release_issue_timeout])
    END_DIAG([diagnose_ci])

    subgraph MQPoll ["● Merge Queue Watcher (merge_queue.py)"]
        direction TB
        POLL["poll GitHub GraphQL\n━━━━━━━━━━\nPR state + queue state\n+ checks_state"]
        MERGED{"merged?"}
        CI_FAIL{"● checks_state\n== 'FAILURE'?"}
        CONFIRM["confirmation window\n━━━━━━━━━━\nnot_in_queue_cycles++"]
        CONFIRMED{"cycles ≥ threshold?"}
        STALL{"stall retries\nexhausted?"}
        TIMEOUT{"deadline\nexceeded?"}
    end

    subgraph EjectRoute ["● Recipe Ejection Routing (implementation.yaml)"]
        direction TB
        ROUTE{"● pr_state?"}
        REENROLL["reenroll_stalled_pr\n━━━━━━━━━━\ntoggle_auto_merge tool"]
    end

    subgraph ConflictFix ["● Conflict Fix Sub-Flow (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC{"escalation_required?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\npush_to_remote force=true"]
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci tool\ntimeout=300s"]
        CI_PASS{"CI pass?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ndiagnose-ci skill"]
        REENTER["reenter_merge_queue\n━━━━━━━━━━\ngh pr merge --squash --auto"]
    end

    subgraph WFCITool ["● wait_for_ci tool handler (tools_ci.py)"]
        direction LR
        INFER["infer head_sha\n━━━━━━━━━━\ngit rev-parse HEAD"]
        CIWAIT["ci_watcher.wait(scope)"]
        ENRICH["● result includes head_sha\n━━━━━━━━━━\nverifies SHA matches HEAD\nafter force-push"]
    end

    %% MAIN FLOW %%
    START --> POLL
    POLL --> MERGED
    MERGED -->|"yes"| END_OK
    MERGED -->|"no"| CONFIRM
    CONFIRM --> CONFIRMED
    CONFIRMED -->|"no"| STALL
    CONFIRMED -->|"yes (not in queue)"| CI_FAIL
    STALL -->|"yes"| END_TIMEOUT
    STALL -->|"no"| TIMEOUT
    TIMEOUT -->|"yes"| END_TIMEOUT
    TIMEOUT -->|"no"| POLL

    CI_FAIL -->|"yes"| ROUTE
    CI_FAIL -->|"no"| ROUTE

    ROUTE -->|"ejected_ci_failure\n(● new route)"| END_DIAG
    ROUTE -->|"ejected"| QFIX
    ROUTE -->|"stalled"| REENROLL
    ROUTE -->|"timeout"| END_TIMEOUT
    REENROLL -->|"success"| START
    REENROLL -->|"failure"| END_FAIL

    QFIX --> ESC
    ESC -->|"true"| END_FAIL
    ESC -->|"false"| REPUSH
    REPUSH -->|"failure"| END_FAIL
    REPUSH -->|"success"| CI_WATCH

    CI_WATCH --> INFER --> CIWAIT --> ENRICH
    ENRICH --> CI_PASS
    CI_PASS -->|"failure"| DETECT
    CI_PASS -->|"success"| REENTER
    DETECT --> END_FAIL
    REENTER -->|"success"| START
    REENTER -->|"failure"| END_FAIL

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class END_OK,END_FAIL,END_TIMEOUT,END_DIAG terminal;
    class POLL,CONFIRM handler;
    class MERGED,CONFIRMED,STALL,TIMEOUT stateNode;
    class CI_FAIL,ROUTE,ESC,CI_PASS detector;
    class QFIX,REPUSH,REENTER handler;
    class REENROLL,DETECT handler;
    class CI_WATCH,INFER,CIWAIT,ENRICH newComponent;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph MQResult ["wait_for_merge_queue Return Dict (merge_queue.py)"]
        direction TB
        PS["● pr_state : str\n━━━━━━━━━━\nmerged | ejected\nejected_ci_failure | stalled\ntimeout | error\n(bare literals, no StrEnum)"]
        SUC["success : bool\n━━━━━━━━━━\ntrue only for 'merged'"]
        REASON["reason : str\n━━━━━━━━━━\nhuman-readable\nalways present"]
        STALL["stall_retries_attempted : int\n━━━━━━━━━━\nalways present\nexcept 'error' path"]
        EC["● ejection_cause : str\n━━━━━━━━━━\n'ci_failure' only\nwhen pr_state==ejected_ci_failure\nCONDITIONAL FIELD"]
    end

    subgraph InternalPoll ["PRFetchState — Internal Polling State (not returned)"]
        direction LR
        CHECKS["checks_state : str|None\n━━━━━━━━━━\nGitHub StatusCheckRollup\nNone = no checks configured"]
        INQUEUE["in_queue : bool\n━━━━━━━━━━\nPR in mergeQueue.entries"]
        QSTATE["queue_state : str|None\n━━━━━━━━━━\nUNMERGEABLE | AWAITING_CHECKS\n| LOCKED | null"]
    end

    subgraph Gate1 ["● Ejection Decision Gate (merge_queue.py)"]
        direction TB
        CFAIL{"checks_state\n== 'FAILURE'?"}
        SET_ECI["● set pr_state='ejected_ci_failure'\n━━━━━━━━━━\nejection_cause='ci_failure'\nINJECTED into result"]
        SET_EJ["set pr_state='ejected'\n━━━━━━━━━━\nno ejection_cause field\n(absent, not null)"]
    end

    subgraph CIScope ["CIRunScope — Frozen Input Scope (core/types)"]
        direction LR
        WF["workflow : str|None\n━━━━━━━━━━\ne.g. 'tests.yml'"]
        HS["● head_sha : str|None\n━━━━━━━━━━\ngit rev-parse HEAD\nor caller-supplied"]
    end

    subgraph CIResult ["● wait_for_ci Return Dict (tools_ci.py)"]
        direction TB
        RUNID["run_id : int|None\n━━━━━━━━━━\nGitHub Actions run ID"]
        CONC["conclusion : str\n━━━━━━━━━━\nsuccess|failure|cancelled\naction_required|timed_out\nno_runs|error|unknown"]
        FJOBS["failed_jobs : list\n━━━━━━━━━━\nalways present\nempty on billing errors"]
        HSHA["● head_sha : str\n━━━━━━━━━━\nCONDITIONAL: present only\nwhen scope.head_sha truthy\ninjected by tool layer"]
    end

    subgraph ConsumerGate ["Recipe Routing Gate (on_result)"]
        direction TB
        ROUTE{"pr_state value?"}
        R1["ejected_ci_failure\n→ diagnose_ci"]
        R2["ejected\n→ queue_ejected_fix"]
        R3["merged|stalled|timeout\n→ other routes"]
    end

    %% FLOW %%
    CHECKS --> CFAIL
    INQUEUE --> CFAIL
    QSTATE --> CFAIL
    CFAIL -->|"FAILURE"| SET_ECI
    CFAIL -->|"other"| SET_EJ
    SET_ECI --> PS
    SET_ECI --> EC
    SET_EJ --> PS
    PS --> SUC
    PS --> REASON
    PS --> STALL

    HS --> CIResult
    WF --> CIResult
    RUNID --> CONC
    CONC --> FJOBS
    FJOBS --> HSHA

    PS --> ROUTE
    EC --> ROUTE
    ROUTE --> R1
    ROUTE --> R2
    ROUTE --> R3

    HSHA -.->|"verifies HEAD\nafter force-push"| R2

    %% CLASS ASSIGNMENTS %%
    class PS,EC,HSHA,SET_ECI,HS,CFAIL gap;
    class SUC,REASON,STALL,RUNID,CONC,FJOBS output;
    class CHECKS,INQUEUE,QSTATE,WF stateNode;
    class SET_EJ handler;
    class ROUTE,R1,R2,R3 detector;
    class InternalPoll phase;
```

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    END_OK([release_issue_success])
    END_FAIL([release_issue_failure\n━━━━━━━━━━\nhuman escalation\nclone preserved])
    END_DIAG([diagnose_ci])

    subgraph MQLoop ["● Merge Queue Poll Loop (merge_queue.py)"]
        direction TB
        POLL["GraphQL fetch\n━━━━━━━━━━\nPR + queue state"]
        POLL_ERR{"Exception\ncaught?"}
        TIMEOUT_CHK{"deadline\nexceeded?"}
        STALL_CHK{"stall retries\n≥ max (3)?"}
    end

    subgraph EjectGate ["● Ejection Classification Gate (merge_queue.py)"]
        direction TB
        EJECT_DECISION{"● checks_state\n== 'FAILURE'?"}
        CI_EJ["● ejected_ci_failure\n━━━━━━━━━━\nejection_cause=ci_failure\nskips conflict resolution"]
        CONF_EJ["ejected\n━━━━━━━━━━\nno cause field\nconflict resolution"]
    end

    subgraph StallBreaker ["Stall Circuit Breaker (merge_queue.py)"]
        direction LR
        TOGGLE["_toggle_auto_merge\n━━━━━━━━━━\ndisable → 2s → re-enable\nbackoff: 30/60/120s"]
        TOGGLE_ERR{"Exception\ncaught?"}
    end

    subgraph ConflictPath ["Conflict Resolution Path (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC_CHK{"escalation\nrequired?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\nforce-push"]
        REPUSH_FAIL{"push\nfailed?"}
    end

    subgraph CIGate ["● CI Gate After Re-Push (implementation.yaml + tools_ci.py)"]
        direction TB
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci, timeout=300s\nincludes head_sha"]
        CI_CONC{"conclusion\n== success?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ngit merge-base check\n(stale base?)"]
        DETECT_CHK{"stale\nbase?"}
        CI_CF["ci_conflict_fix\n━━━━━━━━━━\nresolve-merge-conflicts"]
    end

    subgraph ManifestGates ["● Post-Rebase Manifest Validation (SKILL.md)"]
        direction TB
        STEP5A["● Step 5a: manifest validity\n━━━━━━━━━━\ncargo metadata / node JSON.parse\nuv lock --check / tomllib"]
        STEP5A_CHK{"manifest\nvalid?"}
        STEP5B["● Step 5b: duplicate key scan\n━━━━━━━━━━\nTOML dep sections\nJSON object_pairs_hook"]
        STEP5B_CHK{"duplicates\nfound?"}
        REBASE_ABORT["git rebase --abort\n━━━━━━━━━━\nescalation_required=true"]
    end

    %% POLL LOOP FLOW %%
    POLL --> POLL_ERR
    POLL_ERR -->|"yes: log + retry"| POLL
    POLL_ERR -->|"no"| TIMEOUT_CHK
    TIMEOUT_CHK -->|"yes"| END_FAIL
    TIMEOUT_CHK -->|"no"| STALL_CHK
    STALL_CHK -->|"yes: stalled"| END_FAIL
    STALL_CHK -->|"no: stall attempt"| TOGGLE
    TOGGLE --> TOGGLE_ERR
    TOGGLE_ERR -->|"yes: log + increment"| STALL_CHK
    TOGGLE_ERR -->|"no: success"| POLL

    %% EJECTION GATE %%
    STALL_CHK -->|"ejection confirmed"| EJECT_DECISION
    EJECT_DECISION -->|"FAILURE"| CI_EJ
    EJECT_DECISION -->|"other"| CONF_EJ
    CI_EJ --> END_DIAG
    CONF_EJ --> QFIX

    %% CONFLICT PATH %%
    QFIX --> STEP5A
    STEP5A --> STEP5A_CHK
    STEP5A_CHK -->|"invalid"| REBASE_ABORT
    STEP5A_CHK -->|"valid"| STEP5B
    STEP5B --> STEP5B_CHK
    STEP5B_CHK -->|"duplicates"| REBASE_ABORT
    STEP5B_CHK -->|"clean"| ESC_CHK
    REBASE_ABORT --> ESC_CHK
    ESC_CHK -->|"true"| END_FAIL
    ESC_CHK -->|"false"| REPUSH
    REPUSH --> REPUSH_FAIL
    REPUSH_FAIL -->|"yes"| END_FAIL
    REPUSH_FAIL -->|"no"| CI_WATCH

    %% CI GATE %%
    CI_WATCH --> CI_CONC
    CI_CONC -->|"yes"| END_OK
    CI_CONC -->|"no"| DETECT
    DETECT --> DETECT_CHK
    DETECT_CHK -->|"yes: stale base"| CI_CF
    DETECT_CHK -->|"no: code failure"| END_DIAG
    CI_CF --> ESC_CHK

    %% CLASS ASSIGNMENTS %%
    class END_OK,END_FAIL,END_DIAG terminal;
    class POLL,TOGGLE handler;
    class POLL_ERR,TOGGLE_ERR,TIMEOUT_CHK,STALL_CHK gap;
    class EJECT_DECISION,CI_CONC,DETECT_CHK,STEP5A_CHK,STEP5B_CHK,ESC_CHK,REPUSH_FAIL detector;
    class CI_EJ,CONF_EJ,REBASE_ABORT output;
    class QFIX,REPUSH,CI_WATCH,DETECT,CI_CF handler;
    class STEP5A,STEP5B phase;
```
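
The Step 5b duplicate-key scan for JSON manifests can be sketched with the `object_pairs_hook` mechanism named in the diagram. This is an illustrative sketch, not the shipped scanner; `find_duplicate_keys` is a hypothetical name, and the real check presumably also reports file and location:

```python
import json

def find_duplicate_keys(text: str) -> list[str]:
    """Collect keys that appear more than once within any single JSON object."""
    dupes: list[str] = []

    def check_pairs(pairs):
        # json.loads calls this hook once per object, innermost first,
        # with the raw (key, value) pairs before dict() collapses duplicates.
        seen = set()
        for key, _value in pairs:
            if key in seen:
                dupes.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_pairs)
    return dupes
```

A duplicated dependency entry such as `{"lodash": "^4.0.0", "lodash": "^3.0.0"}` would silently keep only the last value under plain `json.loads`, which is exactly why the scan must hook the pairs before they are collapsed.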

Closes #627

## Implementation Plan

Plan files:
- `/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_a.md`
- `/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_b.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 37 | 31.7k | 1.9M | 113.2k | 1 | 11m 19s |
| review | 3.4k | 5.6k | 147.3k | 41.5k | 1 | 5m 45s |
| verify | 44 | 35.4k | 1.9M | 144.8k | 2 | 11m 15s |
| implement | 100 | 33.5k | 4.6M | 123.5k | 2 | 12m 17s |
| audit_impl | 15 | 14.0k | 279.5k | 44.2k | 1 | 3m 46s |
| open_pr | 33 | 30.5k | 1.2M | 68.1k | 1 | 10m 58s |
| **Total** | 3.6k | 150.8k | 9.9M | 535.3k | | 55m 23s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…Artifact Preservation (#630)

## Summary

The review-design skill has four compounding defects that make GO
verdicts structurally unreachable. This plan fixes all four:

1. **Threshold unreachable** — Replace the static `>= 3` warning
threshold with a proportional formula based on active dimensions
(`active_dimensions * WARNING_BUDGET_PER_DIM` where budget = 5),
calibrated so that the spectral-init v6 baseline (32 warnings across ~7
dimensions, deemed "substantively sound") would receive a GO verdict.

2. **Prescriptive findings** — Add evaluative-only constraints to
Critical Constraints and a shared subagent evaluation scope block before
Step 2, requiring findings to describe WHAT is lacking, never HOW to fix
it.

3. **Scope drift** — Add a design scope boundary to the shared subagent
block, prohibiting evaluation of implementation code snippets and
constraining review to experimental design elements.

4. **Artifact preservation** — Enhance the `create_worktree` step in
research.yaml to copy all review-cycle artifacts (dashboards, revision
guidance, plan versions, resolve-design-review output) into
`research/.../artifacts/`, and add a `commit_research_artifacts` step
before `push_branch` to capture phase-groups and phase-plans from the
worktree.
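
The proportional threshold from fix 1 can be sketched as follows. Function and argument names here are illustrative, not the actual SKILL.md identifiers; only the formula (`active_dimensions * WARNING_BUDGET_PER_DIM`, budget 5) and the calibration target come from the plan:

```python
WARNING_BUDGET_PER_DIM = 5

def synthesize_verdict(active_dimensions: int,
                       critical_findings: list,
                       warning_findings: list,
                       stop_triggers: bool = False) -> str:
    """Sketch of Step 7 verdict synthesis with the proportional warning threshold."""
    threshold = active_dimensions * WARNING_BUDGET_PER_DIM
    if stop_triggers:
        return "STOP"
    if critical_findings or len(warning_findings) >= threshold:
        return "REVISE"
    return "GO"
```

Against the calibration baseline, 32 warnings across 7 active dimensions yields a threshold of 35, so the verdict is GO; under the old static `>= 3` rule the same review would have been blocked.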

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([plan_experiment])
    COMPLETE([research_complete])
    STOP_OUT([design_rejected])

    subgraph DesignReview ["● review_design Step (research.yaml)"]
        direction TB
        RD["● review_design<br/>━━━━━━━━━━<br/>run_skill<br/>retries: 2"]
        REVISE_ROUTE["revise_design<br/>━━━━━━━━━━<br/>route → plan_experiment"]
        RESOLVE["resolve_design_review<br/>━━━━━━━━━━<br/>run_skill, retries: 1"]
    end

    subgraph VerdictSynthesis ["● Step 7: Verdict Synthesis (review-design SKILL.md)"]
        direction TB
        SCOPE["● Evaluative Scope Gate<br/>━━━━━━━━━━<br/>Findings: WHAT is lacking<br/>Design boundary only"]
        RTCAP["rt_cap = RT_MAX_SEVERITY<br/>━━━━━━━━━━<br/>Downgrade red_team<br/>severity by type"]
        CLASSIFY["Classify findings<br/>━━━━━━━━━━<br/>critical_findings<br/>warning_findings"]
        ACTIVE["● active_dimensions<br/>━━━━━━━━━━<br/>count spawned non-SILENT<br/>dims (L1+L2+L3+L4+RT)"]
        THRESH["★ warning_threshold<br/>━━━━━━━━━━<br/>active_dims × 5<br/>WARNING_BUDGET_PER_DIM=5"]
        VERDICT{"● Verdict Decision<br/>━━━━━━━━━━<br/>stop_triggers?<br/>critical? warnings≥threshold?"}
    end

    subgraph ArtifactPath ["★ Artifact Commit Path (research.yaml)"]
        direction TB
        TEST["● test<br/>━━━━━━━━━━<br/>test_check"]
        FIX["fix_tests<br/>━━━━━━━━━━<br/>run_skill"]
        RETEST["● retest<br/>━━━━━━━━━━<br/>test_check"]
        COMMIT["★ commit_research_artifacts<br/>━━━━━━━━━━<br/>run_cmd: copy phase-groups<br/>phase-plans → artifacts/<br/>on_failure: push_branch"]
    end

    PUSH["push_branch<br/>━━━━━━━━━━<br/>run_cmd"]

    START -->|"run review_design"| RD
    RD -->|"STOP verdict"| RESOLVE
    RD -->|"REVISE verdict"| REVISE_ROUTE
    RD -->|"GO verdict"| create_worktree
    REVISE_ROUTE -->|"loop back"| START
    RESOLVE -->|"revised"| REVISE_ROUTE
    RESOLVE -->|"failed"| STOP_OUT
    RD -->|"on_failure / on_exhausted"| create_worktree

    create_worktree["create_worktree<br/>━━━━━━━━━━<br/>★ copies review-cycles<br/>plan-versions artifacts"]

    create_worktree --> decompose["decompose_phases<br/>plan_phase<br/>implement_phase"]
    decompose --> experiment["run_experiment<br/>write_report"]
    experiment --> TEST

    TEST -->|"pass"| COMMIT
    TEST -->|"fail"| FIX
    FIX --> RETEST
    RETEST -->|"pass"| COMMIT
    RETEST -->|"fail"| PUSH

    COMMIT -->|"success or failure"| PUSH
    PUSH --> COMPLETE

    SCOPE -.->|"constraint applied to<br/>all dimension subagents"| CLASSIFY
    RTCAP --> CLASSIFY
    CLASSIFY --> ACTIVE
    ACTIVE --> THRESH
    THRESH --> VERDICT
    VERDICT -->|"stop_triggers"| STOP_OUT
    VERDICT -->|"critical_findings or<br/>warnings ≥ threshold"| REVISE_ROUTE
    VERDICT -->|"else"| create_worktree

    class START,COMPLETE,STOP_OUT terminal;
    class RD,RESOLVE,decompose,experiment,FIX handler;
    class REVISE_ROUTE,RTCAP,CLASSIFY phase;
    class VERDICT,ACTIVE stateNode;
    class SCOPE detector;
    class THRESH,COMMIT,create_worktree newComponent;
    class TEST,RETEST,PUSH output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start, complete, and terminal states |
| Orange | Handler | Processing steps (run_skill, run_cmd) |
| Purple | Phase | Control flow, routing, severity capping |
| Teal | State | Decision and counting nodes |
| Red | Detector | Constraint gates (evaluative scope) |
| Green | New | ★ new components, ● modified components |
| Dark Teal | Output | test_check steps and push_branch |

Closes #629

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-160303-009353/.autoskillit/temp/make-plan/fix-review-design-threshold-unreachable-prescriptive-finding_plan_2026-04-05_161500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 22.6k | 1.2M | 85.0k | 1 | 10m 36s |
| verify | 30 | 14.6k | 1.5M | 74.8k | 1 | 8m 28s |
| implement | 62 | 19.9k | 4.1M | 92.5k | 1 | 7m 41s |
| audit_impl | 87 | 10.6k | 473.5k | 47.1k | 1 | 6m 41s |
| open_pr | 25 | 11.7k | 806.3k | 48.9k | 1 | 4m 22s |
| **Total** | 3.0k | 79.4k | 8.1M | 348.3k | | 37m 50s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ound Bash Tasks (#633)

## Summary

Headless sessions running long-lived background Bash tasks (e.g. `cargo
bench` launched via
`run_in_background: true`) are killed as stale because the staleness
signal is JSONL file growth,
not actual session liveness. When the LLM goes idle waiting for a
background child, the JSONL
stops growing and the 20-minute staleness threshold is breached — even
though child processes are
actively running.

Three changes eliminate this class of false kill:

1. **`_has_active_child_processes`** — a second suppression gate in
`_session_log_monitor` that
checks child process CPU activity before issuing a kill. Added alongside
the existing
   `_has_active_api_connection` port-443 gate.

2. **`RecipeStep.stale_threshold`** — an optional per-step threshold
field that recipe authors
can raise for steps known to run long-lived experiments, passed through
`run_skill` →
   `run_headless_core` → `_session_log_monitor`.

3. **Recipe YAML overrides** — `stale_threshold: 2400` (40 min) on
specific long-running steps
in `research.yaml`, `implementation.yaml`, `remediation.yaml`,
`implementation-groups.yaml`,
   and `merge-prs.yaml`.

## Requirements

### STALE — Staleness Suppression via Child Process Detection

- **REQ-STALE-001:** The system must detect active child processes in
the headless session's process tree when the stale threshold is
breached.
- **REQ-STALE-002:** The system must suppress the stale kill when any
child process in the tree reports CPU usage exceeding ~10% via
`cpu_percent(interval=0)`.
- **REQ-STALE-003:** The system must reset the staleness clock
(`last_change`) when child process activity suppresses the stale kill,
identical to the existing `_has_active_api_connection` suppression
behavior.
- **REQ-STALE-004:** The child process detection must follow the
established exception-handling pattern, silently skipping
`NoSuchProcess`, `ZombieProcess`, and `AccessDenied` errors per process.
- **REQ-STALE-005:** The child process detection must only execute when
the stale threshold has already been breached (zero performance impact
during normal operation).
- **REQ-STALE-006:** The child process detection must emit a structured
log warning when suppressing a stale kill, following the pattern
established by `_has_active_api_connection`.

### SCHEMA — Per-Step Stale Threshold in RecipeStep

- **REQ-SCHEMA-001:** The `RecipeStep` dataclass must accept an optional
`stale_threshold` field of type `int | None` with no default value
(defaults to `None`).
- **REQ-SCHEMA-002:** When `stale_threshold` is `None` on a recipe step,
the global `RunSkillConfig.stale_threshold` (1200s) must apply.
- **REQ-SCHEMA-003:** The `run_skill` MCP tool handler must accept an
optional `stale_threshold` parameter and forward it to
`run_headless_core`.
- **REQ-SCHEMA-004:** The recipe validator must reject `stale_threshold`
values that are not positive integers when set.

### RECIPE — Research Recipe Step Overrides

- **REQ-RECIPE-001:** Research-oriented recipes must set
`stale_threshold: 2400` (40 minutes) on specific long-running steps
(e.g., `implement_phase`, `run_experiment`).
- **REQ-RECIPE-002:** Fast-completing steps (e.g., `plan_phase`) must
not have a `stale_threshold` override, relying on the global default.

### TEST — Test Coverage

- **REQ-TEST-001:** Unit tests must verify `_has_active_child_processes`
returns `True` when a child process exceeds the CPU threshold.
- **REQ-TEST-002:** Unit tests must verify `_has_active_child_processes`
returns `False` when all children are idle, when no children exist, and
when exceptions are raised.
- **REQ-TEST-003:** An integration test must verify stale suppression
when a child process is CPU-active but has no port-443 connection.
- **REQ-TEST-004:** The existing
`TestSessionLogMonitorStaleSuppressionGate` test class must be extended
with the child-process-active scenario.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    T_COMPLETE([COMPLETION])
    T_STALE([STALE — KILL])

    %% CONFIG CHAIN %%
    subgraph Config ["● RECIPE STEP CONFIG (stale_threshold flow)"]
        direction TB
        RecipeStep["● RecipeStep YAML<br/>━━━━━━━━━━<br/>stale_threshold: 2400<br/>(or unset → None)"]
        RunSkill["● run_skill handler<br/>━━━━━━━━━━<br/>tools_execution.py<br/>stale_threshold: int | None"]
        Runner["DefaultSubprocessRunner<br/>━━━━━━━━━━<br/>process.py<br/>default: 1200s"]
    end

    %% PHASE 1 %%
    subgraph Phase1 ["PHASE 1 — JSONL File Discovery (poll 1s, timeout 30s)"]
        direction TB
        P1_Poll["Poll session_log_dir<br/>━━━━━━━━━━<br/>ctime > spawn_time?<br/>Match session_id?"]
        P1_Found{"File found<br/>within 30s?"}
    end

    %% PHASE 2 %%
    subgraph Phase2 ["● PHASE 2 — Staleness Monitor Loop (poll every 2s)"]
        direction TB
        P2_Stat["stat(session_file)<br/>━━━━━━━━━━<br/>current_size vs last_size"]
        P2_Grew{"JSONL<br/>grew?"}
        P2_Marker["Read new content<br/>━━━━━━━━━━<br/>scan for completion<br/>marker in JSONL"]
        P2_MarkerFound{"Completion<br/>marker found?"}
        P2_ResetGrow["last_size = current_size<br/>last_change = now()"]
        P2_Elapsed{"elapsed >=<br/>stale_threshold?"}
    end

    %% SUPPRESSION GATES %%
    subgraph Gates ["● SUPPRESSION GATES (only fire when stale threshold breached)"]
        direction TB
        Gate1["_has_active_api_connection<br/>━━━━━━━━━━<br/>Walk proc tree<br/>ESTABLISHED port-443?"]
        Gate1_Active{"API conn<br/>active?"}
        Gate2["● _has_active_child_processes<br/>━━━━━━━━━━<br/>Walk child procs<br/>cpu_percent > 10%?"]
        Gate2_Active{"Child CPU<br/>> 10%?"}
        ResetClock["last_change = now()<br/>━━━━━━━━━━<br/>Suppress stale kill<br/>reset staleness clock"]
    end

    %% CONNECTIONS %%
    START --> RecipeStep
    RecipeStep -->|"stale_threshold (int or None)"| RunSkill
    RunSkill -->|"float(x) or None → default 1200s"| Runner
    Runner -->|"stale_threshold, pid"| P1_Poll

    P1_Poll --> P1_Found
    P1_Found -->|"yes"| P2_Stat
    P1_Found -->|"no (30s timeout)"| T_STALE

    P2_Stat --> P2_Grew
    P2_Grew -->|"yes"| P2_ResetGrow
    P2_ResetGrow --> P2_Marker
    P2_Marker --> P2_MarkerFound
    P2_MarkerFound -->|"yes"| T_COMPLETE
    P2_MarkerFound -->|"no"| P2_Elapsed

    P2_Grew -->|"no"| P2_Elapsed
    P2_Elapsed -->|"no (wait)"| P2_Stat
    P2_Elapsed -->|"yes"| Gate1

    Gate1 --> Gate1_Active
    Gate1_Active -->|"yes"| ResetClock
    Gate1_Active -->|"no"| Gate2
    Gate2 --> Gate2_Active
    Gate2_Active -->|"yes"| ResetClock
    Gate2_Active -->|"no"| T_STALE
    ResetClock -->|"continue loop"| P2_Stat

    %% CLASS ASSIGNMENTS %%
    class START,T_COMPLETE,T_STALE terminal;
    class RecipeStep,RunSkill handler;
    class Runner stateNode;
    class P1_Poll,P2_Stat,P2_Marker,P2_ResetGrow,ResetClock phase;
    class P1_Found,P2_Grew,P2_MarkerFound,P2_Elapsed,Gate1_Active,Gate2_Active stateNode;
    class Gate1 handler;
    class Gate2 newComponent;
```

### Concurrency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    COMPLETE([TASK GROUP CANCELLED])

    %% MAIN THREAD: Sequential setup %%
    subgraph MainSeq ["MAIN COROUTINE — Sequential Setup"]
        direction TB
        SpawnProc["Spawn Claude Code process<br/>━━━━━━━━━━<br/>asyncio subprocess<br/>get proc.pid"]
        CreateAcc["Create RaceAccumulator + trigger<br/>━━━━━━━━━━<br/>anyio.Event (idempotent set)<br/>channel_b_ready Event"]
        OpenTG["anyio.create_task_group()<br/>━━━━━━━━━━<br/>Fork: start 4–5 coroutines<br/>as tg.start_soon(...)"]
        TrigWait["await trigger.wait()<br/>━━━━━━━━━━<br/>Block until first watcher wins<br/>(or wall-clock timeout)"]
        DrainWait["Optional drain window<br/>━━━━━━━━━━<br/>await channel_b_ready if<br/>process exited but B pending"]
        CancelTG["tg.cancel_scope.cancel()<br/>━━━━━━━━━━<br/>Tear down all remaining tasks"]
        Resolve["resolve_termination(RaceSignals)<br/>━━━━━━━━━━<br/>Priority: exit > stale > completion"]
    end

    %% TASK GROUP: Concurrent watchers %%
    subgraph TaskGroup ["anyio TASK GROUP — Concurrent Watchers (cooperative, single event loop)"]
        direction LR

        subgraph ChA ["Channel A"]
            WatchProc["_watch_process<br/>━━━━━━━━━━<br/>await proc.wait()<br/>acc.process_exited=True"]
            WatchHB["_watch_heartbeat<br/>━━━━━━━━━━<br/>poll stdout NDJSON 0.5s<br/>acc.channel_a_confirmed=True"]
        end

        subgraph ChB ["● Channel B — Session Log"]
            ExtractID["_extract_stdout_session_id<br/>━━━━━━━━━━<br/>poll stdout for type=system<br/>sets stdout_session_id_ready"]
            WatchSL["● _watch_session_log<br/>━━━━━━━━━━<br/>calls _session_log_monitor<br/>acc.channel_b_status=COMPLETION|STALE"]
        end
    end

    %% STALENESS SUPPRESSION %%
    subgraph StaleGates ["● STALENESS SUPPRESSION — Sync psutil walks (inside _session_log_monitor)"]
        direction TB
        Gate1["_has_active_api_connection(pid)<br/>━━━━━━━━━━<br/>[parent + children(recursive=True)]<br/>net_connections port-443 ESTABLISHED?"]
        Gate2["● _has_active_child_processes(pid)<br/>━━━━━━━━━━<br/>[children(recursive=True) only]<br/>cpu_percent(interval=0) > 10%?"]
        ResetClock["last_change = monotonic()<br/>━━━━━━━━━━<br/>suppress stale kill<br/>continue Phase 2 loop"]
        ReturnStale["return STALE<br/>━━━━━━━━━━<br/>acc.channel_b_status = STALE<br/>trigger.set()"]
    end

    %% FLOW %%
    START --> SpawnProc
    SpawnProc --> CreateAcc
    CreateAcc --> OpenTG

    OpenTG -->|"tg.start_soon"| WatchProc
    OpenTG -->|"tg.start_soon"| WatchHB
    OpenTG -->|"tg.start_soon"| ExtractID
    OpenTG -->|"tg.start_soon"| WatchSL

    WatchProc -->|"trigger.set()"| TrigWait
    WatchHB -->|"trigger.set()"| TrigWait
    WatchSL -->|"trigger.set() after drain"| TrigWait

    WatchSL -->|"stale threshold breached"| Gate1
    Gate1 -->|"no API conn"| Gate2
    Gate2 -->|"child CPU active"| ResetClock
    Gate2 -->|"no activity"| ReturnStale
    Gate1 -->|"API conn active"| ResetClock
    ResetClock -->|"continue loop"| WatchSL

    TrigWait --> DrainWait
    DrainWait --> CancelTG
    CancelTG --> Resolve
    Resolve --> COMPLETE

    %% CLASS ASSIGNMENTS %%
    class START,COMPLETE terminal;
    class SpawnProc,CreateAcc,TrigWait,DrainWait,CancelTG,Resolve phase;
    class OpenTG detector;
    class WatchProc,WatchHB handler;
    class ExtractID handler;
    class WatchSL handler;
    class Gate1 handler;
    class Gate2 newComponent;
    class ResetClock output;
    class ReturnStale detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([RECIPE YAML LOADED])
    T_PASS([VALID — forwarded to run_skill])
    T_FAIL([INVALID — validation error])

    %% PARSE LAYER %%
    subgraph Parse ["● YAML → RecipeStep (io.py _parse_step)"]
        direction TB
        YAMLRead["● YAML key read<br/>━━━━━━━━━━<br/>data.get('stale_threshold')<br/>absent → None (no coercion)"]
        Construct["● RecipeStep(...)<br/>━━━━━━━━━━<br/>stale_threshold: int | None = None<br/>No __post_init__ mutations"]
        IntegrityGuard["_PARSE_STEP_HANDLED_FIELDS guard<br/>━━━━━━━━━━<br/>compile-time assert: fields == dataclass<br/>RuntimeError if diverged"]
    end

    %% VALIDATION LAYER %%
    subgraph Validation ["● STRUCTURAL VALIDATION (validator.py validate_recipe)"]
        direction TB
        IsNone{"stale_threshold<br/>is None?"}
        TypeCheck{"isinstance(int)<br/>AND > 0?"}
        AppendError["append error<br/>━━━━━━━━━━<br/>'must be positive integer<br/>when set'"]
        PassThrough["field passes<br/>━━━━━━━━━━<br/>no validation error<br/>for None or valid int"]
    end

    %% SEMANTIC LAYER %%
    subgraph Semantic ["● SEMANTIC RULE — _TOOL_PARAMS registry (rules_tools.py)"]
        direction TB
        ToolParamsCheck["_TOOL_PARAMS['run_skill']<br/>━━━━━━━━━━<br/>frozenset includes 'stale_threshold'<br/>dead-with-param rule: NO warning"]
        OtherToolWarn["Other tools<br/>━━━━━━━━━━<br/>stale_threshold not in their params<br/>dead-with-param: WARNING emitted"]
    end

    %% EXECUTION FORWARDING %%
    subgraph Execution ["EXECUTION FORWARDING (tools_execution.py run_skill)"]
        direction TB
        NullPath["stale_threshold = None<br/>━━━━━━━━━━<br/>→ DefaultSubprocessRunner default<br/>= 1200s (global config)"]
        OverridePath["stale_threshold = int<br/>━━━━━━━━━━<br/>float(stale_threshold)<br/>→ overrides global default"]
        Monitor["_session_log_monitor<br/>━━━━━━━━━━<br/>stale_threshold used as<br/>breach-detection window"]
    end

    %% FLOW %%
    START --> YAMLRead
    YAMLRead --> Construct
    Construct --> IntegrityGuard
    IntegrityGuard -->|"fields match — import OK"| IsNone

    IsNone -->|"yes (absent or None)"| PassThrough
    IsNone -->|"no (value present)"| TypeCheck
    TypeCheck -->|"valid"| PassThrough
    TypeCheck -->|"invalid (non-int or ≤ 0)"| AppendError
    AppendError --> T_FAIL
    PassThrough --> ToolParamsCheck

    ToolParamsCheck -->|"tool: run_skill"| T_PASS
    ToolParamsCheck -->|"other tool"| OtherToolWarn

    T_PASS --> NullPath
    T_PASS --> OverridePath
    NullPath --> Monitor
    OverridePath --> Monitor
    Monitor --> T_PASS

    %% CLASS ASSIGNMENTS %%
    class START,T_PASS,T_FAIL terminal;
    class YAMLRead,Construct handler;
    class IntegrityGuard detector;
    class IsNone,TypeCheck stateNode;
    class AppendError detector;
    class PassThrough output;
    class ToolParamsCheck newComponent;
    class OtherToolWarn gap;
    class NullPath,OverridePath,Monitor phase;
```

Closes #631

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-170436-566038/.autoskillit/temp/make-plan/fix_false_stale_kills_plan_2026-04-05_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 45.6k | 2.0M | 151.7k | 2 | 19m 31s |
| verify | 62 | 36.0k | 3.3M | 155.3k | 2 | 15m 1s |
| implement | 149 | 47.2k | 9.6M | 183.8k | 2 | 16m 24s |
| audit_impl | 102 | 20.0k | 762.1k | 90.1k | 2 | 10m 31s |
| open_pr | 69 | 39.4k | 2.6M | 116.8k | 2 | 15m 32s |
| review_pr | 38 | 57.4k | 1.8M | 103.1k | 1 | 18m 47s |
| resolve_review | 55 | 32.5k | 3.1M | 84.3k | 1 | 14m 9s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.3k | 292.6k | 24.3M | 943.5k | | 1h 59m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

All four bundled recipes (`implementation`, `remediation`, `merge-prs`,
`implementation-groups`)
currently ship with `audit: default: "true"`, meaning `audit-impl` runs
unless explicitly
disabled. This plan changes all four recipes to `default: "false"` so
`audit-impl` is skipped
by default and becomes opt-in. No structural changes to the step graph,
routing, or test
infrastructure are needed — only the ingredient default changes.

**Scope:** 4 YAML ingredient default changes + 1 test assertion added.

## Requirements

### RCFG — Recipe Configuration

- **REQ-RCFG-001:** The `audit` input in `implementation.yaml` must
default to `"false"`.
- **REQ-RCFG-002:** The `audit` input in `implementation-groups.yaml`
must default to `"false"`.
- **REQ-RCFG-003:** The `audit` input in `remediation.yaml` must default
to `"false"`.
- **REQ-RCFG-004:** The `audit` input in `merge-prs.yaml` must default
to `"false"`.
- **REQ-RCFG-005:** The `audit_impl` step definition and its
`skip_when_false: "inputs.audit"` guard must remain unchanged in all
recipes.
- **REQ-RCFG-006:** Callers must still be able to opt in to audit-impl
by passing `audit: "true"` at pipeline invocation time.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Pipeline Invoked])
    CONTINUE([Continue to push / merge])
    ERROR([escalate_stop / register_clone_failure])

    subgraph Ingredient ["● Ingredient Resolution"]
        direction TB
        AuditIng["● audit ingredient<br/>━━━━━━━━━━<br/>BEFORE: default='true'<br/>AFTER: default='false'"]
    end

    subgraph Gate ["skip_when_false Gate"]
        direction TB
        SkipCheck{"inputs.audit == 'true'?"}
        SkipBypass["BYPASS<br/>━━━━━━━━━━<br/>Skip audit_impl<br/>(now default path)"]
        RunAudit["● run audit-impl skill<br/>━━━━━━━━━━<br/>runs /autoskillit:audit-impl<br/>(now opt-in path)"]
        Verdict{"GO / NO GO?"}
        Remediate["remediate<br/>━━━━━━━━━━<br/>Route to remediation<br/>or re-plan"]
    end

    %% FLOW %%
    START --> AuditIng
    AuditIng -->|"resolves to 'false'<br/>(new default)"| SkipCheck
    SkipCheck -->|"false (default — bypass)"| SkipBypass
    SkipCheck -->|"true (opt-in — explicit)"| RunAudit
    RunAudit --> Verdict
    Verdict -->|"GO"| CONTINUE
    Verdict -->|"NO GO"| Remediate
    Verdict -->|"error"| ERROR
    Remediate -->|"re-plan loop"| START
    SkipBypass --> CONTINUE

    %% CLASS ASSIGNMENTS %%
    class START,CONTINUE,ERROR terminal;
    class AuditIng handler;
    class SkipCheck,Verdict stateNode;
    class SkipBypass phase;
    class RunAudit detector;
    class Remediate phase;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start, continuation, and error states |
| Teal | State | Decision gates (skip_when_false, GO/NO GO) |
| Orange | Handler | ● Audit ingredient (modified: default flipped to "false") |
| Red | Detector | ● audit-impl skill execution (now opt-in path) |
| Purple | Phase | Bypass path (now default) and remediation routing |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Recipe Invoked])
    GATE([skip_when_false Evaluated])

    subgraph Contracts ["● INGREDIENT CONTRACT DEFINITIONS"]
        direction TB
        ImplYaml["● implementation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        ImplGroupsYaml["● implementation-groups.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        RemediationYaml["● remediation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        MergePrsYaml["● merge-prs.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
    end

    subgraph Resolution ["INIT_ONLY: Ingredient Resolution"]
        direction TB
        CallerSupplied["Caller-supplied value<br/>━━━━━━━━━━<br/>audit='true' (opt-in)<br/>INIT_ONLY — frozen for run"]
        DefaultApplied["● Contract default applied<br/>━━━━━━━━━━<br/>audit='false'<br/>INIT_ONLY — frozen for run"]
    end

    subgraph TestGate ["● CONTRACT VALIDATION (test_bundled_recipes.py)"]
        direction TB
        TestAssert["● test_audit_ingredient_defaults_to_false<br/>━━━━━━━━━━<br/>@pytest.mark.parametrize<br/>asserts audit.default == 'false'<br/>for all 4 recipes"]
    end

    %% FLOW %%
    START -->|"caller passes audit='true'"| CallerSupplied
    START -->|"no audit arg (default)"| DefaultApplied
    ImplYaml --> DefaultApplied
    ImplGroupsYaml --> DefaultApplied
    RemediationYaml --> DefaultApplied
    MergePrsYaml --> DefaultApplied
    CallerSupplied --> GATE
    DefaultApplied --> GATE

    Contracts -.->|"validated by"| TestAssert

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class GATE stateNode;
    class ImplYaml,ImplGroupsYaml,RemediationYaml,MergePrsYaml handler;
    class CallerSupplied detector;
    class DefaultApplied phase;
    class TestAssert gap;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline invocation point |
| Teal | Gate | skip_when_false evaluation (INIT_ONLY field read) |
| Orange | Contract | ● Recipe YAML ingredient contract definitions (modified) |
| Red | Opt-in | Caller-supplied value override (explicit audit='true') |
| Purple | Default | ● Contract default applied (now 'false') |
| Yellow | Test | ● Contract validation test assertion (new) |

Closes #632

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-180825-135856/.autoskillit/temp/make-plan/feat_default_audit_impl_off_plan_2026-04-05_181000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 60.3k | 4.0M | 213.2k | 3 | 24m 25s |
| verify | 82 | 43.0k | 3.9M | 193.2k | 3 | 22m 22s |
| implement | 176 | 53.6k | 10.3M | 221.3k | 3 | 18m 51s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 101 | 60.0k | 3.7M | 168.5k | 3 | 22m 39s |
| review_pr | 71 | 112.5k | 3.4M | 189.2k | 2 | 33m 19s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.5k | 409.5k | 31.4M | 1.3M | | 2h 41m |

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Increase sensitivity to catch quota exhaustion earlier, giving more
buffer before hard API limits are hit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l for Experiment Failures (#636)

## Summary

This plan adds automated failure diagnosis to the research pipeline
(issue #635). There are two distinct requirements:

**DIAG**: Create a `troubleshoot-experiment` skill that reads session
logs and process traces to classify why a research step failed, then
emit a structured diagnostic artifact and `is_fixable` signal. Wire this
skill into `research.yaml` so that `implement_phase` failures route to
it instead of dying at `escalate_stop`.

**SEP**: Fix the structural misuse of `retry-worktree` in
`implement_phase`. The skill `retry-worktree` is designed to *resume*
context-exhausted `implement-worktree` sessions — it is not a primary
implementation driver. The research recipe already has the correct
purpose-built skill: `implement-experiment`, which explicitly forbids
experiment execution during implementation and routes context exhaustion
directly to `run-experiment`. Switching `implement_phase` to use
`implement-experiment` addresses REQ-SEP-001 and REQ-SEP-002 at the
skill level, where the constraint is enforceable.

## Requirements

### DIAG — Experiment Failure Diagnosis

- **REQ-DIAG-001:** The system must provide a skill that investigates
why a research recipe step failed by reading session logs and process
traces.
- **REQ-DIAG-002:** The skill must classify the failure type (stale
timeout, context exhaustion, build failure, data missing, parameter
issue, unknown).
- **REQ-DIAG-003:** The skill must emit a structured diagnostic artifact
that downstream steps or the human can act on.
- **REQ-DIAG-004:** The research recipe must route experiment failures
to the diagnostic skill instead of `escalate_stop`.

### SEP — Structural Separation of Implementation and Execution

- **REQ-SEP-001:** Implementation worktree steps must not perform
experiment execution (benchmarks, profiling, data collection).
- **REQ-SEP-002:** Experiment execution must route through the
`run_experiment` step (or equivalent) which has appropriate timeout and
retry semantics.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([RESEARCH PIPELINE])
    ESCALATE([escalate_stop])
    COMPLETE([research_complete])

    subgraph PhaseMgmt ["Phase Management"]
        plan_phase["● plan_phase<br/>━━━━━━━━━━<br/>make-plan skill<br/>plans current group"]
        implement_phase["● implement_phase<br/>━━━━━━━━━━<br/>implement-experiment<br/>(was: retry-worktree)<br/>stale_threshold: 2400"]
        next_phase{"next_phase_or_experiment<br/>━━━━━━━━━━<br/>more phases?"}
    end

    subgraph DiagPhase ["★ Failure Diagnosis (NEW)"]
        troubleshoot["★ troubleshoot_implement_failure<br/>━━━━━━━━━━<br/>troubleshoot-experiment skill<br/>worktree_path + implement_phase"]
        route_fix{"★ route_implement_failure<br/>━━━━━━━━━━<br/>is_fixable?"}
    end

    subgraph SkillInternals ["★ troubleshoot-experiment Internals"]
        direction TB
        init_idx["★ initialize code-index<br/>━━━━━━━━━━<br/>set_project_path(worktree_path)"]
        session_lookup["★ locate failed session<br/>━━━━━━━━━━<br/>sessions.jsonl<br/>select success=false + cwd match"]
        read_diags["★ read session diagnostics<br/>━━━━━━━━━━<br/>summary.json: termination_reason<br/>write_call_count, exit_code<br/>anomalies.jsonl: kind, severity"]
        classify{"★ classify failure type<br/>━━━━━━━━━━<br/>priority-ordered<br/>decision table"}
        write_diag["★ diagnosis_{ts}.md<br/>━━━━━━━━━━<br/>failure_type, is_fixable<br/>evidence + recommended action"]
        emit_tokens["★ emit output tokens<br/>━━━━━━━━━━<br/>diagnosis_path=<br/>failure_type=<br/>is_fixable="]
    end

    subgraph ExperimentPhase ["Experiment Phase"]
        run_experiment["run_experiment<br/>━━━━━━━━━━<br/>run-experiment skill<br/>stale_threshold: 2400, retries: 2"]
    end

    START --> plan_phase
    plan_phase --> implement_phase

    implement_phase -->|"on_success"| next_phase
    implement_phase -->|"on_failure"| troubleshoot
    implement_phase -->|"on_exhausted / on_context_limit"| run_experiment

    next_phase -->|"more_phases"| plan_phase
    next_phase -->|"done"| run_experiment

    troubleshoot --> init_idx
    init_idx --> session_lookup
    session_lookup -->|"session found"| read_diags
    session_lookup -->|"no session / missing log"| write_diag
    read_diags --> classify

    classify -->|"context_limit → context_exhaustion, fixable=true"| write_diag
    classify -->|"stale + write=0 → stale_timeout, fixable=true"| write_diag
    classify -->|"exit!=0 + build error → build_failure, fixable=true"| write_diag
    classify -->|"infra error / OOM → environment_error, fixable=false"| write_diag
    classify -->|"unknown"| write_diag

    write_diag --> emit_tokens
    emit_tokens --> route_fix

    route_fix -->|"is_fixable=true"| plan_phase
    route_fix -->|"is_fixable=false"| ESCALATE

    troubleshoot -->|"on_failure (skill crash)"| ESCALATE

    run_experiment --> COMPLETE

    class START,ESCALATE,COMPLETE terminal;
    class plan_phase,implement_phase handler;
    class next_phase,route_fix,classify stateNode;
    class troubleshoot,init_idx,session_lookup,read_diags,write_diag,emit_tokens newComponent;
```

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L2_Recipe ["L2 — Recipe System"]
        recipe_io["recipe/io.py<br/>━━━━━━━━━━<br/>load_recipe, builtin_recipes_dir"]
        recipe_validator["recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe"]
        recipe_contracts["recipe/contracts.py<br/>━━━━━━━━━━<br/>contract card generation"]
    end

    subgraph L1_Workspace ["L1 — Workspace"]
        workspace_skills["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>discovers skills_extended/"]
    end

    subgraph L0_Core ["L0 — Core"]
        core_paths["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()<br/>canonical package root"]
    end

    subgraph DataRecipes ["Data — Recipes (YAML)"]
        research_yaml["● recipes/research.yaml<br/>━━━━━━━━━━<br/>implement-experiment (was: retry-worktree)<br/>on_failure → troubleshoot_implement_failure<br/>on_exhausted → run_experiment"]
    end

    subgraph DataContracts ["Data — Contracts (YAML)"]
        skill_contracts["● recipe/skill_contracts.yaml<br/>━━━━━━━━━━<br/>★ troubleshoot-experiment entry<br/>is_fixable output pattern"]
    end

    subgraph DataSkills ["Data — Skills (SKILL.md)"]
        troubleshoot_skill["★ skills_extended/troubleshoot-experiment/<br/>━━━━━━━━━━<br/>session log reader<br/>failure classifier, is_fixable emitter"]
        implement_exp["skills_extended/implement-experiment/<br/>━━━━━━━━━━<br/>no experiment execution<br/>routes exhaustion → run-experiment"]
    end

    subgraph Tests ["Tests"]
        test_diag["★ tests/recipe/test_research_recipe_diag.py<br/>━━━━━━━━━━<br/>validates research.yaml routing<br/>asserts skill_command swap"]
        test_contracts["★ tests/skills/test_troubleshoot_experiment_contracts.py<br/>━━━━━━━━━━<br/>SkillResolver discovery<br/>SKILL.md existence"]
        test_skills_ws["● tests/workspace/test_skills.py<br/>━━━━━━━━━━<br/>skill count +1"]
    end

    recipe_io -->|"loads at runtime"| research_yaml
    recipe_validator -->|"validates"| research_yaml
    recipe_contracts -->|"loads at runtime"| skill_contracts
    research_yaml -->|"skill_command references"| troubleshoot_skill
    research_yaml -->|"skill_command references"| implement_exp
    skill_contracts -->|"contract entry for"| troubleshoot_skill
    workspace_skills -->|"discovers via pkg_root()"| troubleshoot_skill
    workspace_skills -->|"uses"| core_paths
    test_diag -->|"imports"| recipe_io
    test_diag -->|"imports"| recipe_validator
    test_contracts -->|"imports"| workspace_skills
    test_contracts -->|"imports"| core_paths
    test_skills_ws -->|"imports"| workspace_skills

    class recipe_io,recipe_validator,recipe_contracts phase;
    class workspace_skills handler;
    class core_paths stateNode;
    class research_yaml,skill_contracts output;
    class troubleshoot_skill newComponent;
    class implement_exp handler;
    class test_diag,test_contracts newComponent;
    class test_skills_ws handler;
```

Closes #635

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-193031-162971/.autoskillit/temp/make-plan/research_recipe_troubleshoot_plan_2026-04-05_193500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.9k | 93.0k | 4.6M | 271.4k | 4 | 37m 40s |
| verify | 109 | 64.2k | 5.4M | 277.1k | 4 | 28m 55s |
| implement | 224 | 71.2k | 12.5M | 282.5k | 4 | 32m 50s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 131 | 76.9k | 4.8M | 232.2k | 4 | 27m 43s |
| review_pr | 100 | 134.7k | 4.3M | 237.6k | 3 | 38m 8s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 91 | 32.1k | 3.8M | 120.9k | 2 | 21m 36s |
| diagnose_ci | 13 | 1.4k | 161.4k | 15.6k | 1 | 37s |
| resolve_ci | 18 | 3.7k | 293.8k | 29.1k | 1 | 3m 2s |
| **Total** | 3.8k | 542.7k | 40.5M | 1.7M | | 3h 40m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
from pathlib import Path
from typing import Any, Literal

import httpx

[warning] arch: httpx imported at module level — adds a hard runtime dep on every CLI invocation even when update checks are suppressed. Should be deferred inside the function that uses it.


Investigated — this is intentional. Duplicate of 3084082999. (category: false_positive_intentional_pattern)

Comment thread src/autoskillit/cli/_update_checks.py Outdated
)


def _check_plugin_cache_exists(

[warning] bugs: _check_plugin_cache_exists has no exception handler around detect_install(). Any classification error will surface as an unhandled exception rather than a graceful DoctorResult.


Investigated — this is intentional. Duplicate of 3084083006. (category: false_positive_intentional_pattern)

return

async with anyio.create_task_group() as tg:
await tg.start(_watch, tg.cancel_scope)

[warning] bugs: mcp_server.run_async() exception before cancel_scope is cancelled propagates silently in the task group rather than being logged.


Investigated — this is intentional. Duplicate of 3084083010. (category: design_intent_misread)

import sys

from autoskillit.cli._ansi import supports_color
from autoskillit.cli._init_helpers import _require_interactive_stdin

[warning] arch: _timed_input imports from _init_helpers creating a circular-risk peer dependency; _require_interactive_stdin should be inlined or moved to a lower layer.


Investigated — this is intentional. Duplicate of 3084083014. (category: false_positive_intentional_pattern)

Comment thread src/autoskillit/recipe/_analysis.py Outdated
Comment thread src/autoskillit/recipe/_analysis.py
# Deferred import breaks the circular dependency with _analysis.py.
from autoskillit.recipe._analysis import _build_step_graph, extract_blocks # noqa: PLC0415

recipe.blocks = extract_blocks(recipe, _build_step_graph(recipe))

[warning] arch: load_recipe mutates Recipe.blocks after construction via extract_blocks, breaking the dataclass immutability contract. Code that calls _parse_recipe directly (e.g. contracts.py) gets a Recipe with empty blocks, silently.


Investigated — this is intentional. Duplicate of 3084083031. (category: false_positive_intentional_pattern)

Comment thread src/autoskillit/recipe/rules_fixing.py Outdated
Comment thread src/autoskillit/recipe/rules_reachability.py
Comment thread src/autoskillit/recipe/rules_reachability.py Outdated
budget_entry = _budget_for(bctx.block.name)
if "run_cmd" not in budget_entry:
return [] # No run_cmd budget declared for this block — skip check
budget = int(budget_entry["run_cmd"])

[warning] bugs: _check_block_run_cmd_budget: int(budget_entry['run_cmd']) will raise ValueError or TypeError if the YAML value is not coercible to int (e.g. a non-numeric string, None, or a mapping). Should guard with try/except or an explicit type check.

@lru_cache(maxsize=1)
def _block_budgets() -> Mapping[str, Mapping[str, Any]]:
"""Load block_budgets.yaml, cached for the lifetime of the process."""
path = pkg_root() / "recipe" / "block_budgets.yaml"

[warning] defense: _block_budgets() catches FileNotFoundError but not YAMLError or ValueError from load_yaml. A malformed block_budgets.yaml returns empty dict and all block rules silently skip.
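One way to make the loader loud on malformed YAML while keeping a missing file silent. The function and logger names are assumptions; only `yaml.safe_load` and `yaml.YAMLError` are real PyYAML API:

```python
import logging
from functools import lru_cache
from pathlib import Path

import yaml  # PyYAML

log = logging.getLogger(__name__)


@lru_cache(maxsize=1)
def load_block_budgets(path: Path) -> dict:
    """Load block budgets, distinguishing 'file absent' from 'file malformed'.

    A missing file is a normal condition (no budgets configured); a
    malformed file is logged loudly so block rules do not silently stop
    running with an empty budget table.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return {}  # no budgets configured: rules simply skip
    try:
        data = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        log.warning("block_budgets.yaml is malformed, ignoring it: %s", exc)
        return {}
    return data if isinstance(data, dict) else {}
```

The `isinstance` check on the final line also covers the case where the file parses but holds a scalar or list instead of a mapping.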

should never contain these characters, but the guard makes the failure
loud and free.
"""
if "\n" in temp_dir_relpath or ": " in temp_dir_relpath:

[warning] defense: substitute_temp_placeholder only guards against newline and ': '. A path containing # or [ could still produce malformed YAML. The guard comment says 'filesystem paths should never contain these' but this is an assertion, not enforcement.
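A whitelist closes the gap this comment points at more cleanly than extending the blacklist. The pattern and function name below are illustrative assumptions, not the project's actual rule:

```python
import re

# Conservative whitelist: word characters, dots, dashes, and slashes.
# Anything else (newlines, colons, '#', braces, brackets) is rejected
# before it can be spliced into YAML, instead of enumerating each
# dangerous character by hand.
_SAFE_RELPATH = re.compile(r"[\w./-]+")


def assert_yaml_safe_relpath(relpath: str) -> str:
    """Return relpath unchanged, or raise loudly if it could break YAML."""
    if not _SAFE_RELPATH.fullmatch(relpath):
        raise ValueError(f"relpath contains YAML-unsafe characters: {relpath!r}")
    return relpath
```

This keeps the "loud and free" failure the original comment wanted while making the coverage complete by construction.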

@@ -97,6 +123,9 @@ class Recipe:
kitchen_rules: list[str] = field(default_factory=list)
version: str | None = None
experimental: bool = False
requires_packs: list[str] = field(default_factory=list)
# Populated by extract_blocks() during load; empty tuple for recipes with no block: anchors.
blocks: tuple[RecipeBlock, ...] = field(default_factory=tuple)

[warning] arch: Recipe.blocks has a two-phase initialization pattern: _parse_recipe produces empty blocks, while load_recipe populates them. Code calling _parse_recipe directly gets a Recipe with empty blocks silently, with no sentinel or incomplete-state marker.

if isinstance(data, dict):
spec = _parse_experiment_type(data, path)
result[spec.name] = spec
except Exception:

[warning] defense: _load_types_from_dir catches broad Exception and logs a warning, but this also hides TypeError/AttributeError raised by _parse_experiment_type bugs during development. Narrow to (ValueError, TypeError, OSError, KeyError).
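The narrowing suggested above can be sketched like this; `load_specs` and `parse_spec` are stand-ins for the project's loader, not its real API:

```python
import logging

log = logging.getLogger(__name__)

# Catching a narrow tuple keeps genuine data problems quiet while letting
# programming errors (e.g. AttributeError from a refactor) crash loudly
# during development.
PARSE_ERRORS = (ValueError, TypeError, OSError, KeyError)


def parse_spec(data: dict) -> dict:
    """Toy parser: reject specs whose 'steps' field is not a list."""
    if not isinstance(data.get("steps"), list):
        raise ValueError("steps must be a list")
    return data


def load_specs(entries: list[dict]) -> dict:
    """Collect valid specs by name, skipping malformed ones with a warning."""
    result: dict = {}
    for data in entries:
        try:
            name = data["name"]  # KeyError -> skipped with a warning
            result[name] = parse_spec(data)
        except PARSE_ERRORS as exc:
            log.warning("skipping malformed experiment type: %s", exc)
    return result
```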

Comment thread src/autoskillit/recipe/rules_contracts.py
_AUTOSKILLIT_LOG_DIR_ENV = "AUTOSKILLIT_LOG_DIR"


def _read_quota_cache(cache_path_str: str, max_age: int) -> dict | None:

[warning] cohesion: _read_quota_cache is duplicated verbatim from quota_guard.py. These two sibling stdlib hooks share identical logic that should live in _hook_settings.py to avoid drift.

return None


def _resolve_quota_log_dir() -> Path | None:

[warning] cohesion: _resolve_quota_log_dir is duplicated verbatim from quota_guard.py. Same consolidation opportunity.

return None


def _write_quota_log_event(event: dict, log_dir: Path | None) -> None:

[warning] cohesion: _write_quota_log_event is duplicated verbatim from quota_guard.py. Three identical helper functions across two quota hooks with no shared home.

_age = datetime.now(UTC) - _opened_at
if _age.total_seconds() >= _ttl_hours * 3600:
_p.unlink()
except Exception:

[warning] defense: On a corrupt marker file, the sweep calls _p.unlink() — silently deleting a file that may be temporarily unreadable (e.g. EINTR or disk flush). At minimum log before deleting.
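A minimal sketch of the "log before deleting" shape, with assumed names (`sweep_expired_marker` is not the project's actual function):

```python
import logging
from datetime import datetime, timezone
from pathlib import Path

log = logging.getLogger(__name__)


def sweep_expired_marker(path: Path, opened_at: datetime, ttl_hours: float) -> bool:
    """Delete an expired marker file, logging what is removed and why.

    Returns True when the marker was deleted. Logging before unlink
    leaves an audit trail if a temporarily-unreadable file is swept by
    mistake, and the unlink failure itself is logged instead of swallowed.
    """
    age_s = (datetime.now(timezone.utc) - opened_at).total_seconds()
    if age_s < ttl_hours * 3600:
        return False  # still within TTL: leave it alone
    log.info("removing expired marker %s (age %.0fs > ttl %.1fh)", path, age_s, ttl_hours)
    try:
        path.unlink()
    except OSError as exc:
        log.warning("could not remove expired marker %s: %s", path, exc)
        return False
    return True
```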

Comment thread src/autoskillit/hooks/_hook_settings.py Outdated
with os.fdopen(fd, "w", encoding="utf-8") as f:
f.write(payload)
os.replace(tmp, marker_path)
except Exception:

[warning] defense: On exception in _write_kitchen_marker, bare raise re-raises after cleanup, but the caller catches and emits a warning message — the original exception traceback is lost to the user (only str(e) survives). Consider logging traceback to stderr.
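Preserving the traceback at the catch site is a small change; a sketch with an assumed reporter function (the warning text and function name are illustrative):

```python
import sys
import traceback


def report_marker_write_failure(exc: BaseException) -> None:
    """Emit the full traceback to stderr before the friendly one-line warning.

    The caller's current behaviour keeps only str(exc); printing the
    traceback first means the user can still see where the failure started.
    """
    traceback.print_exception(type(exc), exc, exc.__traceback__, file=sys.stderr)
    print(f"warning: could not write kitchen marker: {exc}", file=sys.stderr)
```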

for remote_name in ("upstream", "origin"):
result = _probe_single_remote(source, remote_name)
last_result = result
if result.reason == "ok" and _is_not_file_url(result.url):

[warning] bugs: _probe_clone_source_url: when both upstream and origin time out, the caller gets a single 'timeout' reason with no indication that both probes timed out.
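One way to surface which remotes failed: aggregate per-remote results into a reason string that names every failing remote. `ProbeResult` and `summarize_probe_failures` below are assumptions sketching the idea, not the module's real types:

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    remote: str
    reason: str  # e.g. "ok", "timeout", "missing"
    url: str = ""


def summarize_probe_failures(results: list[ProbeResult]) -> str:
    """Collapse per-remote failures into one reason that still names each
    remote that failed, e.g. 'timeout (upstream, origin)'."""
    if any(r.reason == "ok" for r in results):
        return "ok"
    reasons: dict[str, list[str]] = {}
    for r in results:
        reasons.setdefault(r.reason, []).append(r.remote)
    return "; ".join(
        f"{reason} ({', '.join(names)})" for reason, names in reasons.items()
    )
```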

Comment thread src/autoskillit/workspace/clone_registry.py Outdated
def __init__(self) -> None:
self._resolver = SkillResolver()
def __init__(self, temp_dir_relpath: str = ".autoskillit/temp") -> None:
if "\n" in temp_dir_relpath or ": " in temp_dir_relpath:

[warning] defense: SkillsDirectoryProvider.__init__ validates temp_dir_relpath for newline and ': ' but not for other YAML-unsafe characters such as a bare colon, {}, or []. The guard catches the most dangerous cases but leaves its coverage incomplete.

Trecek and others added 2 commits April 14, 2026 22:21
- cli/_update_checks: _api_sha now tries refs/tags for tag revisions
- config/settings: annotate _EXIT_GRACE_BUFFER_MS as ClassVar[int]
- execution/_process_monitor: cache psutil.Process objects across calls
  so cpu_percent(interval=0) returns meaningful deltas
- hooks/_hook_settings: add ENV_DISABLED env-var override for disabled
- workspace/clone_registry: wrap open+flock in try/except in __enter__
  to prevent fd leak if flock() raises
- recipe/_analysis: extract_blocks accepts precomputed predecessors map
  to avoid duplicate computation; add warning logs for fallback
  entry/exit selection
- recipe/rules_fixing: use deque.popleft() instead of list.pop(0)
- recipe/rules_reachability: use ctx.predecessors in _ancestors();
  _find_capture_producers returns all producers
- recipe/rules_contracts: log warning on unreadable SKILL.md
- server/tools_kitchen: add gate.disable() on start_quota_refresh
  failure for consistency
- server/_factory: make recording ImportError degrade gracefully like
  replay path
- server/_wire_compat: use model_copy() instead of in-place mutation
  to avoid modifying shared FastMCP tool registry objects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update JSON write site allowlist line numbers for clone_registry and
  tools_kitchen after code changes shifted lines
- Wire compat middleware tests: use model_copy mock returns instead of
  in-place mutation expectations
- Process monitor tests: account for two-call priming pattern with
  cached psutil.Process objects; clear module cache between tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Trecek Trecek merged commit bb81d16 into main Apr 15, 2026
2 checks passed