Promote develop to main (200 PRs, 160+ issues, 179 fixes, 480 features, 27 refactors, 22 infra) by Trecek · Pull Request #2213 · TalonT-Org/AutoSkillit

Trecek · 2026-05-08T04:38:06Z

Promotion: develop to main

This release represents a major expansion of AutoSkillit with 1,459 commits across 200 PRs, introducing three new core packages: fleet/ for campaign dispatch, workspace/ for clone management, and planner/ for progressive resolution planning. Performance has been significantly improved through adoption of orjson, regex, uvloop, CDumper for YAML writes, CSafeLoader (11.1x speedup), and pre-compiled recipe YAML, collectively reducing hot-path latency across the board. The L2/L3 orchestration layers gained substantial new capabilities including activity-aware fleet dispatch timeouts, resumable quota sleep exits, L2 session resume for context exhaustion, and a full fleet campaign retry and preview system. Reliability was hardened through a dozen targeted Rectify fixes addressing resume gate boot failures, review loop counter bypasses, merge conflict semantic guards, and token summary schema drift. The recipe schema, hook structure, and CLI surface have all been meaningfully extended with new kinds, fields, validation commands, and the sous-chef → admiral rename for L3 terminology clarity.

Stats: 1,550 files changed, +287,137 / -46,998 lines | 1,459 commits | 200 PRs | 179 fixes, 480 features, 27 refactors, 22 infra, 50 tests, 9 docs

Highlights

Three new packages introduced: fleet/ (campaign dispatch + semaphore + sidecar + liveness), workspace/ (clone management + worktrees + skill resolution), and planner/ (progressive resolution planner with phases, assignments, and work packages)
Performance suite: orjson on JSON hot paths, regex drop-in package, uvloop event loop, CDumper for YAML writes, CSafeLoader (11.1x speedup), pre-compiled bundled recipe YAML to JSON, and mtime/lru_cache for registry loaders — applied across the critical execution path
Fleet orchestration maturity: activity-aware dispatch timeout, quota sleep resumable exit, campaign retry unblocking from terminal FAILURE state, pre-launch dispatch preview, and L2 session resume for context exhaustion and API disconnects
Recipe schema extended with RecipeKind, RecipeBlock, and CampaignDispatch; migration engine gains AdvisoryMigrationAdapter; hooks restructured into guards/ (17 scripts) and formatters/ (5 files); 9 new runtime dependencies added
L3 orchestrator terminology clarified: sous-chef renamed to admiral throughout; import layer (IL-N) vs orchestration level (L-N) notation formally disambiguated in docs and CLAUDE.md

Release Notes

New Features

Fleet campaign dispatch — new fleet/ package with campaign dispatch, semaphore, sidecar, liveness probes, and state persistence; includes pre-launch preview and retry unblocking from terminal FAILURE state
Workspace and planner packages — workspace/ (clone management, worktrees, skill resolution) and planner/ (progressive resolution with phases, assignments, WPs, validation) added as first-class packages
Activity-aware fleet dispatch timeout + resumable quota sleep exit — fleet sessions now respect activity signals before timing out; quota sleep is resumable across process restarts
L2 session resume — headless sessions can resume after context exhaustion or API disconnects without losing orchestration state
BEM pre-step gate for multi-issue dispatch — blocks dispatch until batch-eligibility conditions are met, preventing premature multi-issue fan-out
Local review rounds before PR creation — recipe-driven local review loop executes before any PR is opened, reducing rework on remote branches
Trigger-evaluation ordering mechanism — deterministic ordering of trigger evaluation across recipe steps
Content-aware cascade downgrade for additive-only changes — automatically downgrades review intensity when diffs are purely additive
Batch issue creation via GraphQL aliases — multi-issue creation in a single API round-trip using aliased mutations
User-config validation CLI — autoskillit config validate command surfaces misconfiguration before runtime
--profile CLI flag for provider selection at invocation time
Token summary per-step model column — token usage breakdown now includes the model used per step
Ingredient table display in fleet campaign sessions — campaigns surface the resolved ingredient table for operator inspection
Skip push_branch when output_mode == local — avoids unnecessary remote pushes in local-only workflows
Review-design handling for all-silent types — silent-type constructs now have explicit review design rules
Research-recipe smoke test — new smoke test covering the research-family recipe end-to-end
Bound unbounded routing loops in research-family recipes — loop guards prevent infinite routing under edge conditions
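
The batch issue creation item above relies on GraphQL aliases, which let several mutations share one request. A minimal sketch of building such a document, assuming GitHub's `createIssue` mutation; the helper name `build_batch_create_issues`, the alias scheme (`issue0`, `issue1`, ...) and the selected fields are illustrative and not taken from AutoSkillit:

```python
import json


def build_batch_create_issues(repository_id: str, titles: list) -> str:
    """Build one GraphQL document that creates several issues via aliases.

    Hypothetical helper: alias names and selected fields are assumptions.
    """
    if not titles:
        raise ValueError("at least one title is required")
    parts = []
    for index, title in enumerate(titles):
        # json.dumps yields a double-quoted, escaped literal, which is also
        # a valid GraphQL string for ordinary text.
        parts.append(
            f"  issue{index}: createIssue(input: "
            f"{{repositoryId: {json.dumps(repository_id)}, title: {json.dumps(title)}}}) "
            "{ issue { number url } }"
        )
    return "mutation {\n" + "\n".join(parts) + "\n}"
```

Each alias becomes a key in the response's `data` object, so the caller can map results back to the input order. For untrusted input, GraphQL variables are usually a safer choice than string interpolation.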

Bug Fixes

Food Truck resume gate boot — resume gate failed to initialize on cold boot in certain fleet configurations
Review loop counter bypass — review iteration counter could be bypassed, allowing unbounded review rounds
Merge failure domain routing — failures during merge were routed to the wrong error domain, masking root cause
Merge-PR conflict detection semantic validation guards — conflict detection now enforces semantic correctness, not just syntactic presence
Artifact-dependent routing immunity + batch failure gate — artifact-dependent routes were incorrectly skipped; batch failure gate now enforced
Early-stop worktree routing blindness — early-stop signals were not propagated to worktree routing, causing stale sessions
Token summary schema drift — token summary output schema drifted from consumer expectations; realigned
Fleet tool visibility breach — fleet tools were incorrectly visible outside an open kitchen session
Completion marker architectural immunity — completion markers were not immune to architecture-level route overrides
Idle stall watchdog immunity — idle stall watchdog was bypassed by certain session states
Provider fields silent omission — provider configuration fields were silently dropped when unrecognized
run_python type-erasure boundary — type information was erased at the run_python IL boundary, causing downstream type errors
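
The "provider fields silent omission" fix above concerns unrecognized keys being dropped without notice. One way to make such keys fail loudly is sketched below; the `ProviderConfig` field set and the `parse_provider` helper are hypothetical, and the actual fix in AutoSkillit may work differently:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProviderConfig:
    # Hypothetical field set, for illustration only.
    name: str
    model: str = ""
    base_url: str = ""


def parse_provider(raw: dict) -> ProviderConfig:
    """Build a ProviderConfig, rejecting keys that are not declared fields."""
    known = {f.name for f in fields(ProviderConfig)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"unrecognized provider fields: {', '.join(unknown)}")
    return ProviderConfig(**raw)
```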

Performance

orjson adopted for all JSON hot paths — faster serialization/deserialization across pipeline and fleet layers
regex drop-in package replaces stdlib re on hot paths — third-party C-extension engine with throughput gains on complex patterns
uvloop event loop enabled — replaces asyncio default loop for lower per-call overhead in headless sessions
CDumper for YAML writes — C-extension YAML dumper replaces Python dumper on all write paths
CSafeLoader — 11.1x speedup on YAML load benchmarks; applied to all bundled recipe loading
Pre-compiled bundled recipe YAML to JSON — recipes are compiled to JSON at install time; runtime load bypasses YAML parsing entirely
mtime/lru_cache for registry loaders — registry files are re-parsed only when mtime changes; session-scope cache added
Lazy-import igraph — igraph import deferred to first use, shaving startup time for sessions that never touch the graph layer

Refactoring

Decompose tools_execution.py (924 lines) — split into focused modules; no behavioral change
Decompose cli/_prompts.py (819 lines) — prompt logic extracted into cohesive sub-modules
Split _type_results.py — result types separated by domain for cleaner import boundaries
Sous-chef renamed to admiral — L3 orchestrator role renamed throughout code, docs, and skills for terminology clarity
Restructure headless prompt — prompt assembly refactored for maintainability and testability
Clarify session type labels — session type identifiers normalized across CLI, logs, and telemetry
Disambiguate IL-N vs L-N notation — import layer levels (IL-0…IL-3) and orchestration levels (L0…L3) formally separated in all docs and CLAUDE.md
docs/CLAUDE.md hub-and-spoke reorganization — top-level CLAUDE.md now acts as a hub linking to per-package CLAUDE.md files

Infrastructure

Hooks restructured — hook scripts reorganized into guards/ (17 scripts) and formatters/ (5 files) subdirectories; HOOK_REGISTRY and RETIRED_SCRIPT_BASENAMES updated accordingly
9 new dependencies: eight runtime (orjson, regex, uvloop, lazy-loader, markdown-it-py, pathspec, pygments, pyjwt) plus api-simulator as a dev dependency
Recipe schema additions: RecipeKind enum, RecipeBlock, CampaignDispatch type; new fields on Recipe and RecipeStep
AdvisoryMigrationAdapter — new migration engine adapter; diagrams are now advisory-only and no longer block migration
Quota guard dual-window — quota guard now enforces both short and long observation windows independently
safety.protected_branches updated: integration → develop
New review config section — recipe-level review configuration surface added to AutomationConfig
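
The dual-window quota guard above can be pictured as two sliding windows that must both have room before a call proceeds. The sketch below is only an illustration of that idea; the class name, window lengths and limits are assumptions, not the actual `quota_guard` fields:

```python
import time
from collections import deque
from typing import Optional


class DualWindowQuotaGuard:
    """Allow a call only if both the short and the long window have room.

    Hypothetical sketch: window lengths and limits are illustrative.
    """

    def __init__(self, short_s: float, short_limit: int, long_s: float, long_limit: int):
        self._windows = [(short_s, short_limit), (long_s, long_limit)]
        self._events = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        longest = max(span for span, _ in self._windows)
        # Drop events that fall outside even the longest window.
        while self._events and now - self._events[0] >= longest:
            self._events.popleft()
        # Each window is checked independently; either one can block the call.
        for span, limit in self._windows:
            used = sum(1 for t in self._events if now - t < span)
            if used >= limit:
                return False
        self._events.append(now)
        return True
```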

Breaking Changes

Sous-chef skill renamed to admiral — any external references to /autoskillit:sous-chef must be updated to /autoskillit:admiral
Hook script paths changed — hooks previously at hooks/*.py now live under hooks/guards/ or hooks/formatters/; any external hook registrations must be updated
safety.protected_branches — the branch name integration has been replaced by develop; update any config files that reference the old value
Recipe schema — RecipeKind, RecipeBlock, and CampaignDispatch are new required/optional fields; existing recipes without these fields will be processed via AdvisoryMigrationAdapter but authors should update schemas

Attention Required

Worktree setup procedure changed — never run autoskillit init from within a git worktree; use task install-worktree instead
Session log paths use hyphens — log directory and session folder names are hyphen-separated; any scripts constructing log paths with underscores will silently fail to find sessions
Subagent invocations require CLAUDE_CODE_EXIT_AFTER_STOP_DELAY=120000 — without this env var, subagents may not exit cleanly when finished
Pre-commit hooks must be re-run after this upgrade — hook script relocations mean existing local hook symlinks may point to deleted paths
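
For the subagent exit-delay note above, a launcher can set the variable on the child's environment rather than globally. A minimal sketch, assuming the subagent is started as a subprocess; the helper name `subagent_env` is hypothetical:

```python
import os
from typing import Mapping, Optional


def subagent_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the exit-delay variable set."""
    env = dict(os.environ if base is None else base)
    # Value taken from the note above.
    env["CLAUDE_CODE_EXIT_AFTER_STOP_DELAY"] = "120000"
    return env
```

Passing the result as `env=` to `subprocess.run` or `asyncio.create_subprocess_exec` applies it to the child process only.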

Merged PRs

PR	Title	Author
#2212	Rectify: Food Truck Resume Gate Boot	Trecek
#2211	Activity-Aware Fleet Dispatch Timeout + Quota Sleep Resumable Exit	Trecek
#2210	Adopt orjson for JSON Hot Paths	Trecek
#2209	Bound Unbounded Routing Loops in Research-Family Recipes	Trecek
#2207	Add BEM pre-step gate to fleet dispatcher for multi-issue parallel dispatch	Trecek
#2206	Perf — Drop-in Regex Package + Enable uvloop for MCP Server	Trecek
#2204	Rectify: Review Loop Counter Bypass via Approved-Verdict Catch-All Routing	Trecek
#2203	Display ingredient table in fleet campaign sessions	Trecek
#2202	Clarify session type labels, docstrings, and variable naming	Trecek
#2200	Perf — Add CDumper for YAML Writes + Hoist re.compile() to Module Level	Trecek
#2199	Mock asyncio.sleep in TestBulkCloseIssues to eliminate rate-limit delays	Trecek
#2195	Research-Recipe Smoke Test (Two Fixtures)	Trecek
#2194	Sweep stale documentation: counts, retired names, constraint scopes	Trecek
#2187	Rename sous-chef to admiral for L3 orchestrator terminology	Trecek
#2184	Tests for methodology-tradition expansion	Trecek
#2183	Tests for Experiment-Type Registry Expansion	Trecek
#2181	Disambiguate IL-N Import Layers from L-N Orchestration Levels	Trecek
#2180	Document Guard Fail-Mode Matrix (Fail-Open vs Fail-Closed)	Trecek
#2178	Fix batch_create_issues Validation Summary Append	Trecek
#2177	No-Mandatory-Figures Path in vis-lens-methodology-norms	Trecek
#2176	Review-Design Handling of All-Silent Types	Trecek
#2175	Rectify: Merge Failure Domain Routing — Rebase Misrouted to resolve-failures	Trecek
#2174	Audit-trail artifact in worktree	Trecek
#2172	Rectify: Merge-PR Conflict Detection — Semantic Validation Guards	Trecek
#2171	Add type field to RecipeIngredient and enforce integer-default consistency	Trecek
#2170	Pre-compile Bundled Recipe YAML to JSON for Faster Loading	Trecek
#2169	perf: lazy-import igraph inside build_recipe_graph	Trecek
#2168	fix: exempt subagents via agent_id and fix flag path via ancestor walk	Trecek
#2167	Fix API-Simulator Wheel Caching in CI	Trecek
#2165	Clarify recipe validation API naming and contract suffix conventions	Trecek
#2164	Fix misleading names in session/gating layer	Trecek
#2163	Rename mcp_health_guard.py to mcp_health_advisor.py	Trecek
#2162	Document the tag-visibility vs application-gate split for MCP tools	Trecek
#2160	Thread experiment_type and methodology_tradition through research recipe	Trecek
#2159	Skill-Rename Migration Note	Trecek
#2157	Cache DefaultSkillResolver Results and Share Across Rule Functions	Trecek
#2156	Low-risk housekeeping — sort utility, package gateway, and all gaps	Trecek
#2155	Recipe completeness guard in _parse_recipe	Trecek
#2154	Perf — Eliminate Redundant I/O in load_and_validate()	Trecek
#2153	perf: add mtime/lru_cache to uncached registry loaders	Trecek
#2152	Rectify: Artifact-Dependent Routing Immunity + Batch Failure Gate	Trecek
#2151	P5-A3-WP2 — Test dispatch_id_filter in audit/tokens/timings consumers	Trecek
#2150	perf: switch YAML loading to CSafeLoader (11.1x speedup)	Trecek
#2149	Perf — Session-scope _resolve_test_config cache (stop cache thrash)	Trecek
#2148	P5-A5-WP3 — Wire normalization + campaign_id into fleet _api.py dispatch paths	Trecek
#2147	Decompose cli/_prompts.py (819 lines)	Trecek
#2146	fleet/state.py — Size reduction and DispatchRecord Factory	Trecek
#2143	Decompose tools_execution.py (924 lines)	Trecek
#2130	Add Content-Aware Cascade Downgrade for Additive-Only Changes	Trecek
#2129	Split _type_results.py: Extract Execution-Scoped Types	Trecek

Show all 200 PRs

The full list of 200 merged PRs includes additional features, fixes, tests, and infrastructure changes spanning v0.7.0 through v0.9.562. See the commit history for the complete changelog.

Linked Issues

Issue	Title	Status	Labels
#720	Add model: field to all run_skill recipe steps	OPEN	bug, recipe:implementation, staged
#830	Auto-init git repo in create_worktree	OPEN	recipe:implementation, staged
#831	Skip push_branch when output_mode == local	OPEN	recipe:implementation, staged
#832	Revise causal_inference trigger to require manipulation	OPEN	recipe:implementation, staged
#833	Land 7 new experiment-type YAML files	OPEN	recipe:implementation, staged
#834	Trigger-evaluation ordering mechanism (priority field)	OPEN	recipe:implementation, staged
#835	Review-design handling of all-silent types	OPEN	recipe:implementation, staged
#836	Tests for experiment-type registry expansion	OPEN	recipe:implementation, staged
#837	Audit deep-research citations (human-in-the-loop)	OPEN	recipe:implementation, staged
#838	Design spec for dedicated environment-setup skill	OPEN	recipe:implementation, staged
#839	Implement environment-setup skill	OPEN	recipe:implementation, staged
#840	Decouple Docker concern from implement-experiment	OPEN	recipe:implementation, staged
#841	Rename vis-lens-domain-norms → vis-lens-methodology-norms	OPEN	recipe:implementation, staged
#842	Land 12 methodology-tradition entries	OPEN	recipe:implementation, staged
#845	Migrate ML sub-areas as conditional-branching venue appendices	OPEN	recipe:implementation, staged
#847	Tests for methodology-tradition expansion	OPEN	recipe:implementation, staged
#848	Generate research.yaml contract card	OPEN	recipe:implementation, staged
#849	Generate research.yaml pre-rendered diagram	OPEN	recipe:implementation, staged
#850	Research-recipe smoke test (two fixtures)	OPEN	recipe:implementation, staged
#851	Extend stage-data network probes for biology databases	OPEN	recipe:implementation, staged

160+ additional linked issues

This promotion carries forward closing references for 160+ additional issues across all domains. See the closing references section below.

Attention Required

Recipe schema breaking changes — RecipeKind, RecipeBlock, and CampaignDispatch are new types; Recipe and RecipeStep have many new validated fields; validate_recipe renamed to validate_recipe_structure
Migration engine refactor — DiagramMigrationAdapter is now advisory-only; diagrams will no longer auto-regenerate on migration
Hooks reorganization — three scripts renamed, 17+ new guards, HOOK_REGISTRY and RETIRED_SCRIPT_BASENAMES must be verified
9 new runtime dependencies — including private api-simulator via uv.sources; lockfile and install verification required
Config changes — safety.protected_branches changed from integration to develop; quota_guard dual-window fields added
Version jump — 0.7.0 → 0.9.562 (562 patch versions accumulated on develop)

Architecture Impact

Module Dependency (Structural — "How are modules coupled?")

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph IL3 ["IL-3 — APPLICATION"]
        direction LR
        SERVER["● server/<br/>━━━━━━━━━━<br/>FastMCP server<br/>★ tools/ subpackage<br/>● _factory.py, _state.py"]
        CLI["● cli/<br/>━━━━━━━━━━<br/>CLI entry points<br/>★ doctor/, fleet/, session/, ui/<br/>★ 11 new modules"]
    end

    subgraph IL2 ["IL-2 — DOMAIN"]
        direction LR
        RECIPE["● recipe/<br/>━━━━━━━━━━<br/>Schema + validation<br/>★ rules/ subpackage<br/>● schema.py — RecipeKind, RecipeBlock<br/>Fan-in: 24"]
        MIGRATION["● migration/<br/>━━━━━━━━━━<br/>Versioned migration engine<br/>● engine.py — AdvisoryAdapter"]
        FLEET["★ fleet/<br/>━━━━━━━━━━<br/>Campaign dispatch<br/>semaphore, sidecar, liveness<br/>11 modules"]
    end

    subgraph IL1 ["IL-1 — INFRASTRUCTURE"]
        direction LR
        CONFIG["● config/<br/>━━━━━━━━━━<br/>AutomationConfig + Dynaconf<br/>● defaults.yaml<br/>Fan-in: 29"]
        PIPELINE["● pipeline/<br/>━━━━━━━━━━<br/>ToolContext DI, gate<br/>● tokens.py, telemetry_fmt.py"]
        EXECUTION["● execution/<br/>━━━━━━━━━━<br/>Headless sessions, CI<br/>★ process/, headless/<br/>★ session/, merge_queue/<br/>35 modules → core"]
        WORKSPACE["★ workspace/<br/>━━━━━━━━━━<br/>Clone mgmt, worktrees<br/>skill resolution<br/>9 modules"]
        PLANNER["★ planner/<br/>━━━━━━━━━━<br/>Progressive resolution<br/>phases, assignments, WPs<br/>5 modules"]
    end

    subgraph IL0 ["IL-0 — FOUNDATION"]
        direction LR
        CORE["● core/<br/>━━━━━━━━━━<br/>★ types/ subpackage<br/>★ runtime/ subpackage<br/>✕ _type_*.py replaced<br/>Fan-in: 200"]
    end

    subgraph HOOKS_LAYER ["HOOKS — Cross-cutting"]
        direction LR
        HOOKS["● hooks/<br/>━━━━━━━━━━<br/>★ guards/ — 17 scripts<br/>★ formatters/ — 5 files<br/>★ _dispatch, _hook_settings"]
        HOOKREG["● hook_registry.py<br/>━━━━━━━━━━<br/>Registry + retired names"]
    end

    SERVER -->|"imports"| RECIPE
    SERVER -->|"imports"| MIGRATION
    CLI -->|"imports"| RECIPE
    SERVER -->|"imports"| CONFIG
    SERVER -->|"imports"| PIPELINE
    SERVER -->|"imports"| EXECUTION
    SERVER -->|"imports"| WORKSPACE
    CLI -->|"imports"| CONFIG
    CLI -->|"imports"| EXECUTION
    CLI -->|"imports"| WORKSPACE
    SERVER -->|"imports"| HOOKS
    CLI -->|"imports"| HOOKS
    CLI -->|"imports"| HOOKREG
    RECIPE -->|"imports"| CORE
    MIGRATION -->|"imports"| CORE
    FLEET -->|"imports"| CORE
    FLEET -.->|"lateral: _prompts"| HOOKS
    CONFIG -->|"imports"| CORE
    PIPELINE -->|"imports"| CORE
    EXECUTION -->|"imports"| CORE
    WORKSPACE -->|"imports"| CORE
    PLANNER -->|"imports"| CORE
    PIPELINE -.->|"runtime: AutomationConfig"| CONFIG
    HOOKS -->|"imports"| CORE
    HOOKREG -->|"imports"| CORE
    SERVER -->|"imports"| CORE
    CLI -->|"imports"| CORE

    class SERVER,CLI cli;
    class RECIPE,MIGRATION phase;
    class FLEET newComponent;
    class CONFIG,PIPELINE handler;
    class EXECUTION handler;
    class WORKSPACE,PLANNER newComponent;
    class CORE stateNode;
    class HOOKS,HOOKREG detector;

Color	Category	Description
Dark Blue	IL-3 Apps	Application layer — server and CLI
Purple	IL-2 Domain	Recipe schema, migration engine
Green	New Packages	fleet/, workspace/, planner/ (new in this promotion)
Orange	IL-1 Infra	config, pipeline, execution
Teal	IL-0 Foundation	core/ (fan-in: 200 files)
Red	Hooks	Cross-cutting hook scripts and registry
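
The diagram above suggests that imports flow from higher IL-N layers toward lower ones, with same-layer edges marked as lateral or runtime. A small sketch of checking that property over a list of import edges follows; the layer numbers are read off the diagram, while the rule itself and the helper `upward_imports` are assumptions for illustration (hooks are cross-cutting and left out):

```python
# Layer numbers read off the diagram; hooks are cross-cutting and omitted.
LAYER = {
    "core": 0,
    "config": 1, "pipeline": 1, "execution": 1, "workspace": 1, "planner": 1,
    "recipe": 2, "migration": 2, "fleet": 2,
    "server": 3, "cli": 3,
}


def upward_imports(edges):
    """Return the (importer, imported) pairs that point to a higher layer."""
    return [(src, dst) for src, dst in edges if LAYER[dst] > LAYER[src]]
```

For example, `upward_imports([("server", "recipe"), ("core", "recipe")])` flags only the second pair, since `core/` sits below `recipe/`.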

Process Flow (Physiological — "How does it behave?")

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([MCP Tool Call])
    COMPLETE([SkillResult JSON])
    ERROR([Gate Error / Crash])

    subgraph Visibility ["● Session-Type Tag Visibility"]
        direction TB
        STYPE{"Session Type?<br/>━━━━━━━━━━<br/>env: SESSION_TYPE"}
        FLEET_VIS["★ Fleet Tags<br/>━━━━━━━━━━<br/>fleet-dispatch"]
        ORCH_VIS["Orchestrator Tags<br/>━━━━━━━━━━<br/>kitchen-core + packs"]
        SKILL_VIS["Skill Tags<br/>━━━━━━━━━━<br/>headless only"]
    end

    subgraph GatePhase ["● Gate Lifecycle"]
        direction TB
        OPEN_K["● open_kitchen<br/>━━━━━━━━━━<br/>gate.enable + hook_config<br/>quota cache prime"]
        GATE{"Gate<br/>Enabled?"}
        LOAD_R["● load_and_validate<br/>━━━━━━━━━━<br/>Cache → YAML → sub-recipe<br/>→ semantic rules → contract"]
        VALID{"Recipe<br/>Valid?"}
    end

    subgraph Dispatch ["● run_skill Dispatch"]
        direction TB
        GUARDS["● Guard Chain<br/>━━━━━━━━━━<br/>1. orchestrator_check<br/>2. gate_enabled<br/>3. skill_command_valid<br/>4. cwd_absolute<br/>5. dry_walkthrough"]
        PROVIDER["● Provider Resolution<br/>━━━━━━━━━━<br/>step override → recipe<br/>→ YAML field → config"]
        SKILL_SETUP["● Skill Session Setup<br/>━━━━━━━━━━<br/>resolve namespace<br/>compute closure<br/>init ephemeral dir"]
    end

    subgraph Headless ["● Headless Session"]
        direction TB
        LAUNCH["● _execute_claude_headless<br/>━━━━━━━━━━<br/>subprocess + PTY<br/>completion marker"]
        STALE_CHK{"Termination<br/>Reason?"}
        RECOVER["● Recovery Paths<br/>━━━━━━━━━━<br/>Channel B drain-race<br/>marker search<br/>pattern recovery"]
    end

    subgraph Adjudication ["● Result Adjudication"]
        direction TB
        OUTCOME["● _compute_outcome<br/>━━━━━━━━━━<br/>success gate chain<br/>+ retry FSM"]
        RETRY{"needs_retry?"}
        BUDGET{"Budget<br/>Exhausted?"}
        FALLBACK["● Provider Fallback<br/>━━━━━━━━━━<br/>inject fallback env<br/>re-launch session"]
        POST["● Post-Session<br/>━━━━━━━━━━<br/>flush log, record tokens<br/>refresh quota cache"]
    end

    START --> STYPE
    STYPE -->|"FLEET"| FLEET_VIS
    STYPE -->|"ORCHESTRATOR"| ORCH_VIS
    STYPE -->|"SKILL"| SKILL_VIS
    FLEET_VIS --> OPEN_K
    ORCH_VIS --> OPEN_K
    SKILL_VIS --> OPEN_K
    OPEN_K --> GATE
    GATE -->|"closed"| ERROR
    GATE -->|"open"| LOAD_R
    LOAD_R --> VALID
    VALID -->|"errors"| ERROR
    VALID -->|"valid"| GUARDS
    GUARDS -->|"any guard fails"| ERROR
    GUARDS -->|"pass"| PROVIDER
    PROVIDER --> SKILL_SETUP
    SKILL_SETUP --> LAUNCH
    LAUNCH --> STALE_CHK
    STALE_CHK -->|"STALE / IDLE_STALL"| RECOVER
    STALE_CHK -->|"TIMED_OUT"| OUTCOME
    STALE_CHK -->|"COMPLETED / NATURAL_EXIT"| OUTCOME
    RECOVER --> OUTCOME
    OUTCOME --> RETRY
    RETRY -->|"no"| POST
    RETRY -->|"yes"| BUDGET
    BUDGET -->|"exhausted"| POST
    BUDGET -->|"remaining"| FALLBACK
    FALLBACK --> LAUNCH
    POST --> COMPLETE

    class START,COMPLETE,ERROR terminal;
    class STYPE,GATE,VALID,STALE_CHK,RETRY,BUDGET stateNode;
    class OPEN_K,LOAD_R,GUARDS,PROVIDER,SKILL_SETUP phase;
    class LAUNCH,RECOVER,OUTCOME handler;
    class FLEET_VIS,ORCH_VIS,SKILL_VIS newComponent;
    class FALLBACK,POST output;

Color	Category	Description
Dark Blue	Terminal	Entry and exit points
Teal	Decision	Routing and branching decisions
Purple	Phase	Configuration, validation, and guard chains
Orange	Handler	Execution, recovery, and adjudication
Green	New	New session-type visibility components
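
The adjudication loop in the flow above (outcome, `needs_retry?`, budget check, provider fallback, re-launch) can be sketched as a bounded retry loop. This is a simplified model under stated assumptions: `launch` stands in for the headless session plus outcome computation, and the function and parameter names are hypothetical:

```python
from typing import Callable, Tuple


def run_with_fallback(launch: Callable[[int], bool], retry_budget: int) -> Tuple[int, bool]:
    """Launch a session, re-launching while a retry is needed and budget remains.

    `launch(retries)` returns True when the outcome needs a retry.
    Returns (retries_used, budget_exhausted).
    """
    retries = 0
    while True:
        needs_retry = launch(retries)
        if not needs_retry:
            return retries, False
        if retries >= retry_budget:
            # Budget exhausted: fall through to post-session handling.
            return retries, True
        retries += 1
```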

Closes #1622
Closes #1695
Closes #1699
Closes #1700
Closes #1701
Closes #1702
Closes #1703
Closes #1706
Closes #1707
Closes #1708
Closes #1709
Closes #1710
Closes #1712
Closes #1716
Closes #1717
Closes #1718
Closes #1719
Closes #1722
Closes #1723
Closes #1724
Closes #1725
Closes #1726
Closes #1727
Closes #1728
Closes #1729
Closes #1735
Closes #1745
Closes #1747
Closes #1748
Closes #1749
Closes #1751
Closes #1752
Closes #1753
Closes #1754
Closes #1755
Closes #1756
Closes #1772
Closes #1773
Closes #1774
Closes #1775
Closes #1776
Closes #1777
Closes #1778
Closes #1779
Closes #1780
Closes #1798
Closes #1802
Closes #1803
Closes #1804
Closes #1805
Closes #1806
Closes #1825
Closes #1831
Closes #1834
Closes #1835
Closes #1837
Closes #1838
Closes #1849
Closes #1851
Closes #1852
Closes #1853
Closes #1860
Closes #1861
Closes #1862
Closes #1863
Closes #1875
Closes #1877
Closes #1879
Closes #1880
Closes #1881
Closes #1882
Closes #1883
Closes #1884
Closes #1885
Closes #1886
Closes #1887
Closes #1888
Closes #1897
Closes #1898
Closes #1899
Closes #1900
Closes #1901
Closes #1902
Closes #1903
Closes #1905
Closes #1906
Closes #1910
Closes #1918
Closes #1924
Closes #1928
Closes #1932
Closes #1936
Closes #1943
Closes #1944
Closes #1945
Closes #1954
Closes #1955
Closes #1963
Closes #1964
Closes #1965
Closes #1966
Closes #1975
Closes #1976
Closes #1980
Closes #1986
Closes #1987
Closes #2005
Closes #2007
Closes #2008
Closes #2009
Closes #2020
Closes #2029
Closes #2035
Closes #2036
Closes #2039
Closes #2043
Closes #2044
Closes #2045
Closes #2047
Closes #2048
Closes #2049
Closes #2051
Closes #2061
Closes #2063
Closes #2097
Closes #2133
Closes #2134
Closes #2136
Closes #2137
Closes #2138
Closes #2139
Closes #2140
Closes #2141
Closes #2158
Closes #2173
Closes #2182
Closes #2188
Closes #2190
Closes #2196
Closes #2197
Closes #2205
Closes #2208
Closes #720
Closes #830
Closes #831
Closes #832
Closes #833
Closes #834
Closes #835
Closes #836
Closes #837
Closes #838
Closes #839
Closes #840
Closes #841
Closes #842
Closes #845
Closes #847
Closes #848
Closes #849
Closes #850
Closes #851
Closes #852
Closes #857
Closes #858

Generated with Claude Code via AutoSkillit

…wn (#1961)

## Summary

Add a `Model` column to the per-step token summary table and a new per-model aggregate breakdown table. The model identity is sourced from the `model_breakdown` dict already parsed by `extract_token_usage()` in `_session_model.py`. The change threads model identity through three paths: (1) in-memory accumulation via `TokenEntry.model`, (2) on-disk persistence via a new `model_identifier` field in `token_usage.json`, and (3) formatting in `TelemetryFormatter`, the stdlib-only hook, and the compact PostToolUse formatter. No cost estimation is included — the acceptance criteria require token counts only, and no pricing infrastructure exists.

Closes #1906

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-135150-264911/.autoskillit/temp/make-plan/token_summary_model_column_plan_2026-05-05_135500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
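
The per-model aggregate table described in the summary above amounts to grouping token entries by model and summing the counts. A minimal sketch follows; only `TokenEntry.model` is named in the summary, so the remaining fields and the helper `aggregate_by_model` are assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEntry:
    # Only `model` is named in the summary above; other fields are assumed.
    step: str
    model: str
    input_tokens: int
    output_tokens: int


def aggregate_by_model(entries):
    """Sum input and output tokens per model."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for entry in entries:
        totals[entry.model]["input_tokens"] += entry.input_tokens
        totals[entry.model]["output_tokens"] += entry.output_tokens
    return dict(totals)
```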

…ll-Count Tests (#1967)

## Summary

Remove redundant and brittle test code across three test modules: parametrize the three identical kitchen_rules rejection tests, delete the duplicate session-type warning test, and convert exact skill-count assertions to lower-bound checks. No production code changes.

Closes #1886

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-180602-586205/.autoskillit/temp/make-plan/deduplicate_session_type_kitchen_rules_skill_count_tests_plan_2026-05-05_180602.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…1969)

## Summary

The `resolve-failures` SKILL.md has an ambiguous verdict decision flow that allows an LLM executor to emit `ci_only_failure` even after successfully applying a fix. The fix restructures the Step 2d decision tree to make the override rule explicit: **any time a code change is committed and tests pass, the verdict is `real_fix`, regardless of `failure_subtype`**. The Step 2d table is clarified to apply ONLY to the "no fix applied" path, and a post-fix-loop verdict override is added to prevent re-evaluation through the wrong decision path.

## Requirements

- REQ-RF-001: When `resolve-failures` applies a code change AND the subsequent CI run passes, the verdict MUST be `real_fix`, not `ci_only_failure`
- REQ-RF-002: `ci_only_failure` should only be emitted when no fix was applied or when the applied fix did not resolve the CI failure
- REQ-RF-003: The fix must not break the existing `ci_only_failure` path for genuinely unfixable CI failures

Closes #1954

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-180603-520539/.autoskillit/temp/make-plan/resolve_failures_ci_only_failure_verdict_fix_plan_2026-05-05_181000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
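
The `resolve-failures` override rule described above is prose in a SKILL.md, but its logic can be written down as a short function. The sketch below is an illustration only: `final_verdict` is a hypothetical name, and `no_fix_verdict` stands in for whatever the Step 2d table yields on the no-fix path:

```python
def final_verdict(fix_committed: bool, tests_pass: bool, no_fix_verdict: str) -> str:
    """Apply the post-fix-loop override before falling back to the no-fix path."""
    if fix_committed and tests_pass:
        # Override rule from the summary: failure_subtype is not consulted here.
        return "real_fix"
    # No fix applied, or the fix did not resolve the failure.
    return no_fix_verdict
```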

…itHub API Usage) (#1970)

## Summary

Add a `local_review_rounds` configuration and plumbing so that the existing review loop steps (`annotate_pr_diff`, `review_pr`, `resolve_review`, `check_review_loop`) receive a `review_mode` context value (`"local"` or `"github"`) computed per-iteration. Part A covers the config dataclass, defaults, ingredient bridge, callable modification, recipe YAML wiring for all three looping recipes, and tests for all of the above. Part B will cover the skill SKILL.md behavioral changes (how `review-pr` and `resolve-review` branch on `mode=local` vs `mode=github`) — implement as a separate task.

Closes #1945

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-165914-350121/.autoskillit/temp/make-plan/local_review_rounds_plan_2026-05-05_170500_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
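
The per-iteration `review_mode` value described above could be derived from the iteration index and `local_review_rounds`. The summary does not state the exact rule, so the sketch below assumes the simplest one: the first `local_review_rounds` iterations run locally and later ones use GitHub. The function name and the zero-based index are assumptions:

```python
def review_mode(iteration: int, local_review_rounds: int) -> str:
    """Return "local" for the first `local_review_rounds` iterations, else "github".

    Assumes `iteration` is zero-based; the real rule may differ.
    """
    if iteration < 0 or local_review_rounds < 0:
        raise ValueError("iteration and local_review_rounds must be non-negative")
    return "local" if iteration < local_review_rounds else "github"
```

With `local_review_rounds=0` every iteration resolves to `"github"` under this assumption.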

…ng (#1971)

## Summary

Update three SKILL.md files (`compose-pr`, `prepare-pr`, `diagnose-ci`) to use directive description language and fix a step numbering gap in `diagnose-ci`. Then uncomment four step overrides in `.autoskillit/config.yaml` to route `retry_worktree`, `compose_pr`, `diagnose_ci`, and `prepare_pr` to the MiniMax M2.7-highspeed profile.

## Requirements

### REQ-1: Update 3 SKILL.md descriptions to directive language

PR #1937 updated 6 SKILL.md files to use directive language (per the Seleznov 650-trial study: 94-100% activation vs 37-77% for passive descriptions). Three of the four new MiniMax target skills still use passive descriptions.

**Files to update:**

| File | Current description style | Required change |
|------|--------------------------|-----------------|
| `src/autoskillit/skills_extended/compose-pr/SKILL.md` | Passive prose: "Reads the PR prep file and validated arch-lens diagrams..." | Directive: "PR composition executor. ALWAYS invoke this skill when instructed to compose a PR. Do not read prep files or create PRs directly — use this skill first to load the composition workflow." |
| `src/autoskillit/skills_extended/prepare-pr/SKILL.md` | Passive prose: "Reads plan(s), runs git diff, classifies changed files..." | Directive: "PR preparation executor. ALWAYS invoke this skill when instructed to prepare PR metadata. Do not read plans or classify files directly — use this skill first to load the preparation workflow." |
| `src/autoskillit/skills_extended/diagnose-ci/SKILL.md` | No `description:` field at all | Add directive: "CI diagnosis executor. ALWAYS invoke this skill when instructed to diagnose CI failures. Do not fetch CI logs directly — use this skill first to load the diagnosis workflow." |

`retry-worktree` already has directive language (updated in PR #1937).

**Why this matters:** The hook-based skill load guard (PR #1937) enforces Skill tool loading regardless of description language. But directive descriptions improve voluntary model compliance — belt-and-suspenders. MiniMax's "thoughtful disobedience" pattern means every compliance signal helps.

### REQ-2: Fix diagnose-ci step numbering gap

The `diagnose-ci` SKILL.md workflow skips from Step 1 to Step 3 — there is no Step 2. This is likely accidental (no logical reason for the gap). MiniMax may attempt to invent a Step 2 from training priors, causing unintended tool calls.

**File:** `src/autoskillit/skills_extended/diagnose-ci/SKILL.md`

**Current numbering:** Step 1, Step 3, Step 4, Step 5, Step 5a, Step 6, Step 7 (no Step 2).

**Fix:** Renumber to sequential: Step 1, Step 2, Step 3, Step 4, Step 4a, Step 5, Step 6.

| Old | New |
|-----|-----|
| Step 3: Fetch Failure Summary | Step 2: Fetch Failure Summary |
| Step 4: Fetch Per-Job Logs | Step 3: Fetch Per-Job Logs |
| Step 5: Classify Failure | Step 4: Classify Failure |
| Step 5a: Subtype Classification | Step 4a: Subtype Classification |
| Step 6: Write Diagnosis Report | Step 5: Write Diagnosis Report |
| Step 7: Emit Output Tokens | Step 6: Emit Output Tokens |

**Bonus fix:** Line 80 says "proceed to Step 5 (write minimal diagnosis)" — but old Step 5 is Classify Failure, not Write Diagnosis. After renumbering, "proceed to Step 5" correctly resolves to Write Diagnosis Report for the first time. This fixes a latent cross-reference bug.

### REQ-3: Verify step override activation

After REQ-1 and REQ-2 are complete, uncomment the 4 step overrides in `.autoskillit/.secrets.yaml` and run one implementation pipeline to verify:

1. Each step receives the FIRST ACTION directive (check session JSONL for "FIRST ACTION" in the prompt)
2. Each step calls the Skill tool as its first action (check JSONL for `"name":"Skill"` as the first tool call)
3. Each step completes successfully with structured output tokens emitted correctly

Closes #1966

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-185537-800888/.autoskillit/temp/make-plan/prepare_4_run_skill_steps_for_minimax_m27_routing_plan_2026-05-05_190500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…#1972) ## Summary The `run_python` MCP tool dispatches callables through `_import_and_call` using a `dict[str, object]` parameter — a type-erasure boundary. Direct MCP tools get Pydantic lax-mode validation via FastMCP (which coerces `str→int` automatically), but `run_python` callables receive raw unvalidated values. The dispatcher already calls `inspect.signature(func)` but only uses it for `None→default` coercion (PR #1602), never reading `param.annotation`. This creates an asymmetry where `str`-typed callable parameters receiving JSON integers crash at subprocess boundaries, f-string operations, or Path construction. The architectural fix: extend `_import_and_call`'s existing coercion loop to read `param.annotation` and coerce primitive scalar types, closing the type-safety gap between the two dispatch paths. Defense-in-depth: add `str()` guards at each subprocess call site. Structural tests: a parametrized test matrix that exercises every callable × every param type combination through `_import_and_call`. Closes #1965 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/remediation-20260505-183247-696032/.autoskillit/temp/rectify/rectify_run_python_type_coercion_2026-05-05_183800.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
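
The annotation-driven coercion described above might look roughly like the following. This is a minimal sketch, not the project's `_import_and_call`: the helper name `coerce_kwargs`, the `describe` callable, and the choice to coerce only `str`, `int`, and `float` are assumptions made for illustration, and it assumes annotations are real types rather than strings.

```python
import inspect
from typing import Any, Callable

# Primitive scalar types this sketch is willing to coerce between (assumption).
_COERCIBLE = (str, int, float)


def coerce_kwargs(func: Callable[..., Any], raw: dict[str, object]) -> dict[str, object]:
    """Hypothetical helper: nudge raw JSON values toward each parameter's annotation."""
    sig = inspect.signature(func)
    coerced: dict[str, object] = {}
    for name, value in raw.items():
        param = sig.parameters.get(name)
        if param is None:
            coerced[name] = value  # unknown key: pass through untouched
            continue
        if value is None and param.default is not inspect.Parameter.empty:
            coerced[name] = param.default  # None -> default coercion
            continue
        target = param.annotation
        if (
            target in _COERCIBLE
            and isinstance(value, _COERCIBLE)
            and not isinstance(value, bool)  # never turn True into "True" or 1
            and not isinstance(value, target)
        ):
            value = target(value)  # e.g. 1965 -> "1965", "2" -> 2
        coerced[name] = value
    return coerced


def describe(issue: str, retries: int = 3) -> str:
    """Stand-in callable whose f-string and arithmetic need the right types."""
    return f"issue-{issue} retries={retries + 1}"


kwargs = coerce_kwargs(describe, {"issue": 1965, "retries": "2"})
result = describe(**kwargs)
```

A stricter policy (for example rejecting lossy `float` to `int` conversions) may be preferable in practice; the sketch only shows where `param.annotation` enters the loop.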

… State (#1968) ## Summary When a fleet campaign dispatch fails, the `FAILURE` status is terminal — `_ALLOWED_TRANSITIONS[FAILURE]` is `frozenset()` with no outgoing transitions. This blocks explicit user retry (`--resume`) because both the `has_failed_dispatch()` halt guard in `dispatch_food_truck` and Phase 2 of `resume_campaign_from_state` unconditionally reject campaigns with any FAILURE record. The fix adds a `FAILURE → PENDING` transition, a `reset_failed_dispatch()` function, and modifies the two halt check sites to distinguish between **automatic continuation** (should still halt) and **explicit user retry** (should reset the failed dispatch and re-execute). ## Requirements - REQ-RETRY-001: A failed dispatch MUST be retryable without manual state file edits - REQ-RETRY-002: The halt-on-failure guard MUST still prevent automatic continuation to subsequent dispatches after an unacknowledged failure - REQ-RETRY-003: Retry of a failed dispatch MUST reset its state and re-execute it from scratch (not resume) - REQ-RETRY-004: The retry mechanism MUST be safe under concurrent access (respect existing `_resume_lock` + `fcntl.LOCK_EX` pattern) Closes #1695 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-180604-269785/.autoskillit/temp/make-plan/fleet_campaign_retry_blocked_by_terminal_failure_state_plan_2026-05-05_181500.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
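
A rough sketch of the transition change and the two halt behaviours follows. Only `FAILURE`, `PENDING`, and `_ALLOWED_TRANSITIONS` are named in the summary; the other statuses, the `transition`, `should_halt`, and `reset_failed` helpers, and their signatures are hypothetical, and locking is omitted.

```python
from enum import Enum


class DispatchStatus(Enum):
    # RUNNING and SUCCESS are hypothetical members added for illustration.
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


_ALLOWED_TRANSITIONS: dict[DispatchStatus, frozenset[DispatchStatus]] = {
    DispatchStatus.PENDING: frozenset({DispatchStatus.RUNNING}),
    DispatchStatus.RUNNING: frozenset({DispatchStatus.SUCCESS, DispatchStatus.FAILURE}),
    DispatchStatus.SUCCESS: frozenset(),
    # Previously frozenset(): FAILURE was terminal. The new edge allows a reset.
    DispatchStatus.FAILURE: frozenset({DispatchStatus.PENDING}),
}


def transition(current: DispatchStatus, target: DispatchStatus) -> DispatchStatus:
    """Return the target status, or raise if the edge is not allowed."""
    if target not in _ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def should_halt(statuses: list[DispatchStatus], explicit_retry: bool) -> bool:
    """Automatic continuation halts on any FAILURE; an explicit retry does not."""
    return not explicit_retry and DispatchStatus.FAILURE in statuses


def reset_failed(statuses: list[DispatchStatus]) -> list[DispatchStatus]:
    """Move every FAILURE back to PENDING so it re-executes from scratch."""
    return [
        transition(s, DispatchStatus.PENDING) if s is DispatchStatus.FAILURE else s
        for s in statuses
    ]


campaign = [DispatchStatus.SUCCESS, DispatchStatus.FAILURE, DispatchStatus.PENDING]
halts_automatically = should_halt(campaign, explicit_retry=False)
halts_on_retry = should_halt(campaign, explicit_retry=True)
after_reset = reset_failed(campaign)
```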

…1973) ## Summary Create `src/autoskillit/recipes/research-review.yaml` as a standalone sub-recipe containing the 22-step PR/review phase extracted from `research.yaml`. The recipe receives campaign-injected hidden ingredients (`worktree_path`, `research_dir`, `report_path`, `experiment_plan`, `experiment_results`, `experiment_type`, `scope_report`, `visualization_plan_path`), lifts all review steps verbatim-and-adapted, replaces the archival phase with dual terminal stops (`review_pr_complete` for PR mode, `review_local_complete` for local mode), and corrects routing targets to terminate within this sub-recipe rather than routing to archival or non-existent steps. Closes #1702 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-184855-274993/.autoskillit/temp/make-plan/p2_wp3_create_research_review_yaml_plan_2026-05-05_185500.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…1974) ## Summary _INFRA_UNCONDITIONAL_FILES in tests/_test_filter.py contains 9 filenames, but 3 of them (test_hook_executability.py, test_hook_registration_coverage.py, test_hook_registry.py) live in tests/hooks/, not tests/infra/. The path construction loop at line 1271 resolves all 9 under tests/infra/, silently dropping the 3 hook tests from every tiered conservative filter run. The guard test in test_test_filter_tiered_always_run.py checks only basenames (p.name), masking the bug. This was introduced by commit 26c8059 (#1734) which moved the files without updating the constant. The fix splits the constant into two frozen sets with correct directory mappings, adds a second path construction loop, and strengthens all guard tests to assert parent directories. ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-203325-376042/.autoskillit/temp/make-plan/fix_test_filter_hook_test_path_mismatch_plan_2026-05-05_203500.md` ## Changed Files - tests/_test_filter.py - tests/test_test_filter.py - tests/test_test_filter_coverage_map.py - tests/test_test_filter_tiered_always_run.py Closes #1875 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
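
The split-constant fix could be sketched as below. The three hook filenames come from the summary; the infra entry is a placeholder because the six real infra filenames are not listed here, and `_HOOKS_UNCONDITIONAL_FILES` and `unconditional_paths` are hypothetical names.

```python
from pathlib import Path

# Placeholder entry: the real constant holds six infra filenames not shown above.
_INFRA_UNCONDITIONAL_FILES = frozenset({"test_example_infra.py"})

# Hypothetical name for the second constant; the filenames are from the summary.
_HOOKS_UNCONDITIONAL_FILES = frozenset(
    {
        "test_hook_executability.py",
        "test_hook_registration_coverage.py",
        "test_hook_registry.py",
    }
)


def unconditional_paths(tests_root: Path) -> set[Path]:
    """Build always-run paths, resolving each constant under its own directory."""
    paths = {tests_root / "infra" / name for name in _INFRA_UNCONDITIONAL_FILES}
    paths |= {tests_root / "hooks" / name for name in _HOOKS_UNCONDITIONAL_FILES}
    return paths


paths = unconditional_paths(Path("tests"))

# A guard that checks parent directories, not just basenames (p.name),
# so a file resolved under the wrong directory can no longer pass unnoticed.
hook_parents = {p.parent.name for p in paths if p.name in _HOOKS_UNCONDITIONAL_FILES}
```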

…1977) ## Summary Fix two validated audit findings (C4-1 and C4-3) by replacing weak test assertions with precise positive checks. No production code changes are required. **Finding C4-1** — `tests/server/test_tools_dispatch_halt.py`: Five tests that verify dispatch proceeds past the halt gate use `assert result.get("error") != "fleet_campaign_halted"`. Each test calls `_setup_standard_dispatch()` which wires a valid recipe and executor, so the expected outcome is that the dispatch proceeds past the halt gate — `assert "dispatch_id" in result` is the correct assertion. **Finding C4-3** — `tests/cli/test_doctor.py` lines 448 and 464: Two tests assert `checks[0]["severity"] in ("warning", "error")`. Source inspection of `_check_plugin_cache_exists` and `_check_installed_plugins_entry` confirms both return `Severity.WARNING` unconditionally under the test conditions. Closes #1887 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/strengthen-assertions-20260505-211450-512860/.autoskillit/temp/make-plan/strengthen_assertions_plan_2026-05-05_211450.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

## Summary Add a validate-only pre-commit hook (`scripts/check_sub_claude_md.py`) that checks every sub-CLAUDE.md file table mentions all `.py` files in its directory. This catches the gap at commit time (during `pre-commit run --all-files`) instead of at CI time, preventing the systematic 5+ CI round-trip failures observed since PR #1820. The script replicates the coverage logic from `test_sub_claude_md_covers_all_py_files` and `test_tests_sub_claude_md_covers_all_py_files`, using the same `EXPECTED_SUB_CLAUDE_MDS` lists. A new `.pre-commit-config.yaml` stanza triggers it on `.py` file changes under `src/autoskillit/` and `tests/`. ## Requirements - New script: `scripts/check_sub_claude_md.py` (validate-only, exits 1 with structured message on mismatch) - New stanza in `.pre-commit-config.yaml` triggered on `files: ^(tests/|src/autoskillit/).*\.py$` - Must check both `tests/<subdir>/CLAUDE.md` and `src/autoskillit/<subdir>/CLAUDE.md` file tables - Must use the same `EXPECTED_SUB_CLAUDE_MDS` lists as the test files (or derive from disk) - No auto-fix — the agent must manually add the row with a meaningful Purpose description - Pattern: `pass_filenames: false` (like existing `doc-counts` hook) ## Changed Files ### New (★): ★ scripts/check_sub_claude_md.py ★ tests/docs/test_check_sub_claude_md_script.py ### Modified (●): ● .pre-commit-config.yaml ● tests/docs/CLAUDE.md ● tests/infra/test_ci_dev_config.py Closes #1975 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-211945-173795/.autoskillit/temp/make-plan/add_pre_commit_hook_sub_claude_md_plan_2026-05-05_213000.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
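
As a rough illustration of the validate-only check, the sketch below flags `.py` files that a directory's `CLAUDE.md` never mentions. It is not `scripts/check_sub_claude_md.py`: the function names are hypothetical, it scans whatever directories it is given instead of using `EXPECTED_SUB_CLAUDE_MDS`, and it uses a plain substring match where the real script may parse the file table.

```python
import tempfile
from pathlib import Path


def missing_rows(directory: Path) -> list[str]:
    """Return .py filenames in `directory` that its CLAUDE.md never mentions."""
    claude_md = directory / "CLAUDE.md"
    text = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    return sorted(p.name for p in directory.glob("*.py") if p.name not in text)


def check(directories: list[Path]) -> int:
    """Validate-only: report mismatches and return 1 if any exist, never auto-fix."""
    failed = False
    for directory in directories:
        missing = missing_rows(directory)
        if missing:
            failed = True
            print(f"{directory / 'CLAUDE.md'}: missing rows for {', '.join(missing)}")
    return 1 if failed else 0


# Demo on a throwaway directory: alpha.py is documented, beta.py is not.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "alpha.py").write_text("", encoding="utf-8")
    (demo / "beta.py").write_text("", encoding="utf-8")
    (demo / "CLAUDE.md").write_text("| `alpha.py` | Example purpose |\n", encoding="utf-8")
    missing = missing_rows(demo)
    exit_code = check([demo])
```

A script shaped like this would end with `raise SystemExit(check(...))` so that pre-commit sees the non-zero exit.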

## Summary Replace `test_config_resolution_fleet_enabled_via_experimental` in `tests/config/test_fleet_config.py` (lines 140–155) with an isolated version that uses `tmp_path` to create a synthetic `config.yaml` containing `experimental_enabled: true`, then calls `load_config(tmp_path)` to exercise the full config loading pipeline. The live-disk read (`Path(__file__).parents[2] / ".autoskillit" / "config.yaml"`) and its CI skip guard are removed entirely. The test is **rewritten** (not deleted) because `test_is_feature_enabled_fleet_defaults_false` (in `tests/core/test_type_constants.py`) only exercises `is_feature_enabled()` in isolation — it does not cover the `load_config()` → Dynaconf layer merge → `AutomationConfig.from_dynaconf()` → `is_feature_enabled()` pipeline. The rewrite provides genuine CI coverage of that end-to-end path, which the delete option would leave uncovered. ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-221515-520014/.autoskillit/temp/make-plan/remove_live_config_read_plan_2026-05-05_221745.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  Co-authored-by: Trecek <trecek@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…1981) ## Summary C4-6 (load_config(tmp_path / "settings.toml") wrong arg type) was already resolved by a prior split commit. C4-9 (three separate load_config() calls for one-liner assertions in TestWorkspaceConfig) is the sole remaining change: consolidate three test methods into a single test_workspace_config_defaults method calling load_config(tmp_path) once and asserting all fields together. Drop the hasattr check as redundant. ## Requirements ### C4-6 — load_config() argument bug (ALREADY RESOLVED) Fix `load_config(tmp_path / "settings.toml")` → `load_config(tmp_path)`. Passing a `.toml` file path causes silent fallback to defaults. (Resolved in commit a22cb18.) ### C4-9 — Consolidate TestWorkspaceConfig assertions (REQUIRES ACTION) Consolidate `TestWorkspaceConfig`'s three separate `load_config(tmp_path)` calls into a single `test_workspace_config_defaults` test checking all three fields at once. Drop the `hasattr` check. ## Changed Files ### Modified (●): tests/config/test_config.py Closes #1888 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-221514-570647/.autoskillit/temp/make-plan/fix_load_config_argument_bug_and_consolidate_workspace_assertions_plan_2026-05-05_222100.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

## Summary Two test files contain stale docstrings and assertion messages that reference old names or misstate what the assertions actually check: - **C5-2** (`tests/workspace/test_skills.py`): `test_bundled_skills_list_matches_filesystem` docstring and failure message still say `make-script-skill` — the skill was renamed to `write-recipe`. - **C5-5** (`tests/execution/test_process_submodules.py`): per-symbol test docstrings say `"exports X"` but the assertions verify `__module__` (definition origin), not `__all__` membership. Each docstring should say `"is defined in X submodule"`. Both fixes are pure string edits in test files. No logic changes, no new fixtures, no isolation concerns. ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-221516-303979/.autoskillit/temp/make-plan/fix_stale_docstrings_workspace_execution_tests_plan_2026-05-05_000000.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…1979) ## Summary Create `src/autoskillit/recipes/research-archive.yaml` as a standalone sub-recipe that extracts the 9-step archival phase from `research.yaml` (lines 855–963). The critical change from the parent recipe: all ingredient-sourced values (`pr_url`, `worktree_path`, `research_dir`, `base_branch`) use `inputs.X` references instead of `context.X`, since these are declared ingredients in the standalone recipe rather than step-captured context variables. Step-captured values (`experiment_branch`, `artifact_branch`, `artifact_pr_url`, `archive_tag`) correctly remain as `context.X`. Pack declarations are `[github, ci]` (the archival phase only needs GitHub CLI and CI tools, not the full `research` pack). No `autoskillit_version` field — consistent with bundled recipe policy (removed in #1950). Only 4 ingredients: 3 campaign-sourced hidden (`worktree_path`, `research_dir`, `pr_url`) and 1 user-input (`base_branch`). The `report_path_after_finalize` and `source_dir` ingredients from the parent recipe are omitted because no archival step references them. Closes #1703 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260505-213323-163152/.autoskillit/temp/make-plan/p2_wp4_create_research_archive_yaml_sub_recipe_plan_2026-05-05_213600.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

## Summary The hook guard system lacks structural enforcement that command-inspecting guards must cover all tool variants that execute shell commands. Guards are written independently with ad-hoc extraction of command text, and the test suite validates each guard only against the tool format it was designed for — not against all tools it should logically intercept. The fix adds a structural meta-test that makes it impossible to register a command-inspecting guard without covering both the `Bash` native tool and `run_cmd` MCP tool, plus a parametrized test helper that forces every such guard to prove it blocks dangerous commands through either tool pathway. This closes a gap where `unsafe_install_guard.py` and `pr_create_guard.py` only read `tool_input.cmd` (from `run_cmd`) but ignore `tool_input.command` (from `Bash`), allowing headless agents to bypass these guards entirely when using the native Bash tool instead of the MCP wrapper. Closes #1980 ## Implementation Plan Plan file: `.autoskillit/temp/rectify/rectify_unsafe-install-guard-bash-tool-coverage-gap_2026-05-05_223500.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit  --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
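
The coverage gap can be illustrated with a small extraction helper that reads both payload shapes. The field names `tool_input.cmd` and `tool_input.command` are from the summary; `extract_command`, `blocks_pr_create`, and the blocked substring are hypothetical and far simpler than a real guard.

```python
def extract_command(tool_input: dict[str, object]) -> str:
    """Return shell text from either the Bash (`command`) or run_cmd (`cmd`) shape."""
    for key in ("command", "cmd"):
        value = tool_input.get(key)
        if isinstance(value, str) and value:
            return value
    return ""


def blocks_pr_create(tool_input: dict[str, object]) -> bool:
    """Hypothetical guard: deny any command that tries to create a PR directly."""
    return "gh pr create" in extract_command(tool_input)


# The same dangerous command, delivered through each tool pathway.
PAYLOADS = {
    "Bash": {"command": "gh pr create --fill"},
    "run_cmd": {"cmd": "gh pr create --fill"},
}

# The structural idea: every command-inspecting guard must block via both tools.
coverage = {tool: blocks_pr_create(payload) for tool, payload in PAYLOADS.items()}
```

A guard that read only `cmd` would yield `coverage["Bash"] == False`, which is the bypass the meta-test is meant to make impossible.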

Trecek

AutoSkillit PR Review (individual)

Trecek · 2026-05-08T06:00:57Z

@@ -436,3 +450,129 @@ async def batch_cleanup_clones(
    except Exception as exc:
        logger.warning("batch_cleanup_clones failed", exc_info=True)
        return json.dumps({"deleted": [], "preserved": [], "error": str(exc)})


[warning] defense: batch_cleanup_clones error handler returns result missing delete_failures key. Callers get KeyError.
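
One possible shape for a consistent error payload, assuming `delete_failures` is a list like the other two keys (the helper name is hypothetical):

```python
import json


def cleanup_error_payload(exc: Exception) -> str:
    """Return an error result carrying every key the success path would carry."""
    return json.dumps(
        {
            "deleted": [],
            "preserved": [],
            "delete_failures": [],  # assumed to be a list, mirroring the other keys
            "error": str(exc),
        }
    )


payload = json.loads(cleanup_error_payload(RuntimeError("boom")))
```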

Trecek

AutoSkillit PR Review (individual)

Trecek · 2026-05-08T06:01:00Z

+                        parts.append(b.get("text", ""))
+                    # Non-text blocks (thinking, tool_use, etc.) contribute no text
+                self.result = "\n".join(parts)
+            elif not isinstance(self.result, str):


[warning] slop: Unreachable elif branch in __post_init__: elif not isinstance(self.result, str) is dead code.

Trecek

AutoSkillit PR Review (batch 4)

Trecek · 2026-05-08T06:01:03Z

+        description="L3 Fleet Orchestrator — multi-session campaign dispatch",
+        tool_tags=frozenset({"fleet"}),
+        skill_categories=frozenset({"fleet"}),
+        import_package="autoskillit.fleet",


[critical] arch: IL-0 module stores import_package='autoskillit.fleet' and 'autoskillit.planner' as string data in FEATURE_REGISTRY. Nothing is imported at module load, but resolving these strings with importlib.import_module would violate the layer boundary.

This finding requires a human decision — the correct path is ambiguous.

Trecek · 2026-05-08T06:01:03Z

+    ended_at = time.time()
+
+    # --- Timeout pre-check: short-circuit before result-block parsing ---
+    if skill_result.subtype == "timeout":


[warning] bugs: skill_result.subtype compared as string literal == 'timeout'. Should use CliSubtype.TIMEOUT enum for type safety.
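
A small sketch of why the enum comparison is safer. `CliSubtype.TIMEOUT` is named in the finding; modelling it as a `str`-mixin enum and the `SUCCESS` member are assumptions.

```python
from enum import Enum


class CliSubtype(str, Enum):
    # Stand-in definition: only TIMEOUT is confirmed by the review comment.
    SUCCESS = "success"
    TIMEOUT = "timeout"


def is_timeout(subtype: object) -> bool:
    """Compare against the enum member rather than a free-form string literal."""
    return subtype == CliSubtype.TIMEOUT


matches_raw_string = is_timeout("timeout")          # str mixin keeps wire values comparable
matches_member = is_timeout(CliSubtype.TIMEOUT)
misspelled_literal = is_timeout("timout")           # a typo in a literal fails silently
```

Misspelling the member itself (`CliSubtype.TIMOUT`) raises `AttributeError` immediately, whereas a misspelled string literal just never matches.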

Trecek · 2026-05-08T06:01:03Z

+
+    try:
+        new_version = importlib.metadata.version("autoskillit")
+    except Exception:


[warning] defense: _verify_update_result uses a broad except Exception and falls back to new_version=current. Cannot distinguish an unchanged version from an infrastructure error.

Trecek · 2026-05-08T06:01:03Z

+
+        if Version(latest) > Version(current):
+            return Signal("binary", f"New release: {latest} (you have {current})")
+    except Exception:


[warning] defense: _binary_signal uses a broad except Exception and returns None. A missing packaging import silently returns no-signal.

Trecek · 2026-05-08T06:01:03Z

+        if not isinstance(conditions, list):
+            return False
+        return condition in conditions
+    except Exception:


[warning] defense: _is_dismissed uses a broad except Exception and returns False. A malformed dismissed_version causes repeated prompts.

Trecek · 2026-05-08T06:01:04Z

+        args=["autoskillit", "install"], returncode=0
+    )
+    with terminal_guard():
+        subprocess.run(cmd, check=False, env=skip_env)


[warning] defense: _run_update_sequence: upgrade subprocess return code not inspected. Failed upgrade silently ignored.
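
A minimal sketch of inspecting the return code while keeping `check=False`. The `run_step` helper is hypothetical, and the demo commands use the current interpreter rather than the real upgrade command.

```python
import subprocess
import sys


def run_step(cmd: list[str]) -> bool:
    """Run a command and report failure instead of silently ignoring it."""
    completed = subprocess.run(cmd, check=False)
    if completed.returncode != 0:
        print(f"step failed with exit code {completed.returncode}: {' '.join(cmd)}")
        return False
    return True


# Stand-ins for a succeeding and a failing upgrade subprocess.
upgrade_ok = run_step([sys.executable, "-c", "raise SystemExit(0)"])
upgrade_failed = run_step([sys.executable, "-c", "raise SystemExit(3)"])
```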

Trecek · 2026-05-08T06:01:04Z

    extras: dict[str, str] = {
        "AUTOSKILLIT_HEADLESS": "1",
+        "AUTOSKILLIT_SESSION_TYPE": SESSION_TYPE_SKILL,
        "MAX_MCP_OUTPUT_TOKENS": _MAX_MCP_OUTPUT_TOKENS_VALUE,


[warning] defense: MAX_MCP_OUTPUT_TOKENS and MCP_CONNECTION_NONBLOCKING duplicated between extras dict and _SESSION_BASELINE_ENV. Silent maintenance hazard if canonical values change.

Trecek · 2026-05-08T06:01:04Z

+        extras[KITCHEN_SESSION_ID_ENV_VAR] = kitchen_session_id
+    if allowed_write_prefix:
+        extras["AUTOSKILLIT_ALLOWED_WRITE_PREFIX"] = allowed_write_prefix
+    # Layer caller env_extras (campaign vars) UNDER the mandatory keys.


[warning] defense: build_food_truck_cmd: caller env_extras keys silently dropped if they match mandatory keys. No warning logged.
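
One way to keep the layering but surface collisions, sketched with a hypothetical `layer_env` helper:

```python
import logging

logger = logging.getLogger("env_layering_sketch")


def layer_env(mandatory: dict[str, str], caller_extras: dict[str, str]) -> dict[str, str]:
    """Layer caller extras under mandatory keys, logging any key that gets shadowed."""
    shadowed = sorted(caller_extras.keys() & mandatory.keys())
    if shadowed:
        logger.warning("env_extras keys overridden by mandatory keys: %s", ", ".join(shadowed))
    # Later entries win, so mandatory values always take precedence.
    return {**caller_extras, **mandatory}


merged = layer_env(
    mandatory={"AUTOSKILLIT_HEADLESS": "1"},
    caller_extras={"AUTOSKILLIT_HEADLESS": "0", "CAMPAIGN_ID": "demo"},  # CAMPAIGN_ID is illustrative
)
```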

Trecek · 2026-05-08T06:01:04Z

+            f"Recipe '{recipe}' could not be loaded: {exc}",
+        )
+
+    _DISPATCHABLE_KINDS = frozenset({"standard", "food-truck"})


[warning] defense: _DISPATCHABLE_KINDS defined inside _run_dispatch (recreated every call). Should be module-level constant.

Trecek · 2026-05-08T06:01:04Z

    except Exception as exc:
        logger.error("run_skill unhandled exception", exc_info=True)
        return SkillResult.crashed(
            exception=exc,
            skill_command=skill_command,
            order_id=order_id,
        ).to_json()
+    except BaseException:


[warning] defense: CancelledError (BaseException) raised inside inner try falls through to except Exception and returns crashed SkillResult rather than being re-raised.

This finding requires a human decision — the correct path is ambiguous.
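
One fact that may help with the decision: since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so an `except Exception` clause does not catch it. The sketch below, using hypothetical `worker` and `main` coroutines, shows a cancellation passing the `except Exception` handler and being re-raised from `except BaseException`.

```python
import asyncio
from typing import Optional


async def worker() -> Optional[str]:
    try:
        await asyncio.sleep(10)
        return "finished"
    except Exception:
        # Not reached on cancellation: CancelledError is not an Exception subclass.
        return "crashed"
    except BaseException:
        # Cleanup could go here; re-raising keeps cancellation semantics intact.
        raise


async def main() -> str:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # give the worker a chance to start sleeping
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "completed"


outcome = asyncio.run(main())
```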

Trecek

AutoSkillit PR Review (individual)

Trecek · 2026-05-08T06:01:08Z

+    if new_version != current:
+        return True
+
+    from autoskillit.cli._install_info import upgrade_command


[warning] slop: Redundant import of upgrade_command inside _verify_update_result — already imported at module level.

Trecek

AutoSkillit PR Review (individual)

Trecek · 2026-05-08T06:01:13Z

+    except Exception as exc:
+        logger.warning("load_recipe failed for '%s'", recipe, exc_info=True)
+        return fleet_error(
+            FleetErrorCode.FLEET_RECIPE_NOT_FOUND,


[warning] slop: _DISPATCHABLE_KINDS defined as frozenset inside _run_dispatch on every call. Should be module-level constant.

Trecek

AutoSkillit PR Review (individual)

Trecek · 2026-05-08T06:01:15Z

@@ -170,7 +183,19 @@ async def run_python(
        return json.dumps({"success": False, "error": f"{type(exc).__name__}: {exc}"})


-@mcp.tool(tags={"autoskillit", "kitchen"}, annotations={"readOnlyHint": True})
+def _persist_run_skill_state(skill_result: SkillResult, project_dir: Path) -> None:


[warning] slop: _persist_run_skill_state and _clear_run_skill_state are unnecessary single-line wrapper functions. Deferred imports could be inlined.

Trecek · 2026-05-08T06:01:29Z

(L403 — outside diff hunk) [warning] bugs: _check_merge_base_unpublished accesses step.on_result.routes.values() without guard. If routes is None, raises AttributeError.
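
A guarded access could look like the sketch below. `OnResult` and `route_targets` are hypothetical stand-ins for the recipe step types, assuming `routes` is an optional mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnResult:
    # Stand-in for the recipe schema type; routes may legitimately be absent.
    routes: Optional[dict[str, str]] = None


def route_targets(on_result: Optional[OnResult]) -> list[str]:
    """Return route targets, treating a missing on_result or routes as no routes."""
    if on_result is None or not on_result.routes:
        return []
    return list(on_result.routes.values())


no_routes = route_targets(OnResult())
some_routes = route_targets(OnResult(routes={"pass": "merge", "fail": "fix"}))
```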

Trecek · 2026-05-08T06:01:31Z

(L367 — outside diff hunk) [warning] defense: output_mode validation for research recipe is hardcoded name-based special case in generic open_kitchen handler. Should be driven by recipe schema.

This finding requires a human decision — the correct path is ambiguous.

Trecek · 2026-05-08T06:01:34Z

(L275 — outside diff hunk) [warning] slop: _check_always_has_no_write_exit has multi-line docstring whose first sentence repeats the @semantic_rule description exactly.

Trecek · 2026-05-08T06:01:36Z

(L270 — outside diff hunk) [warning] slop: install_result initialized to dummy CompletedProcess only to satisfy type checker before with block overwrites it. Use Optional typing instead.

Trecek

AutoSkillit PR Review — Verdict: approved_with_comments

Scope: Top 20 most-changed Python source files out of 1,551 total changed files (287K LoC added).

37 warning-level findings across 15 files. No blocking changes required. See inline comments.

Finding Summary by Dimension

| Dimension | Findings | Key Pattern |
|-----------|----------|-------------|
| defense | 17 | Broad `except Exception` handlers swallowing errors silently (headless flush, update checks, rule loaders) |
| bugs | 9 | Assert-as-validation (disabled under -O), string-vs-enum comparisons, token log ordering |
| slop | 8 | Dead code branches, redundant imports, unnecessary wrapper functions, duplicate logic |
| cohesion | 2 | Asymmetric SkillResult construction across recovery paths |
| arch | 1 | IL-0 FEATURE_REGISTRY storing IL-2 import paths as string data |

Top Patterns

Silent exception swallowing (10 findings): `except Exception: pass` or `except Exception: return []` patterns suppress infrastructure failures. Most common in `headless/__init__.py` (flush paths), `_update_checks.py`, and recipe rule loaders. Recommended: narrow exception types or raise log level to WARNING.
Assert-as-validation (2 findings): `assert x is not None` used for runtime invariants that should be `if x is None: raise RuntimeError(...)`; asserts are stripped under `-O`.
Dead code / unnecessary wrappers (5 findings): Unreachable branches (`_session_model.py:82`), redundant one-line delegation functions (`_cmd_rpc.py:127,170`), duplicate logic (`tools_git.py:24`).
Type safety (2 findings): String literal comparison `== "timeout"` instead of enum `CliSubtype.TIMEOUT` in `fleet/_api.py`.
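
The assert-as-validation pattern and its replacement can be sketched as follows; the function and argument names are hypothetical.

```python
from typing import Optional


def require_session(session_id: Optional[str]) -> str:
    """Enforce a runtime invariant with an explicit check that survives `python -O`."""
    # `assert session_id is not None` would vanish under -O; this check does not.
    if session_id is None:
        raise RuntimeError("session_id must be set before dispatch")
    return session_id


valid = require_session("abc123")

try:
    require_session(None)
    rejected = False
except RuntimeError:
    rejected = True
```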

Files NOT Reviewed

This review covers only the top 20 source files by addition count. Notable files NOT reviewed include:

All test files (1,400+ lines of test changes)
Recipe YAML/JSON contracts
Skills and SKILL.md files
Configuration, documentation, and CI files
The remaining ~1,500 changed files

## Summary

Add `slots=True` to all 63 `@dataclass(frozen=True)` definitions (across 33 files in 11 packages) that currently lack it. On a frozen dataclass, `slots=True` removes the per-instance `__dict__` and stores fields in fixed slots, which reduces memory usage and typically speeds up attribute access. The project's `requires-python = ">=3.11"` means `slots=True` on `@dataclass` (introduced in 3.10) is fully supported. Two frozen dataclasses already have `slots=True` (`result_parser.py`, `_hook_settings.py`) and serve as the established pattern for this change. Every affected dataclass was verified to be safe: no inheritance chains between dataclasses, no manually defined `__slots__`, and no non-dataclass parent that would conflict. A new architecture compliance test is introduced first to establish the invariant and prevent future regressions.

Closes #2192

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260507-215724-550140/.autoskillit/temp/make-plan/perf_add_slots_true_frozen_dataclasses_plan_2026-05-07_000001.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | claude-sonnet-4-6 | 1 | 115 | 10.6k | 396.2k | 45.2k | 82 | 51.2k | 6m 58s |
| verify | claude-sonnet-4-6 | 1 | 204 | 21.8k | 1.2M | 59.7k | 66 | 48.2k | 5m 3s |
| implement* | MiniMax-M2.7-highspeed | 1 | 1.1M | 15.0k | 1.0M | 35.1k | 166 | 52.5k | 4m 45s |
| fix | claude-sonnet-4-6 | 1 | 360 | 17.2k | 2.7M | 90.0k | 108 | 79.7k | 15m 58s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 58.8k | 4.2k | 149.0k | 29.8k | 16 | 42.3k | 1m 26s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 49.7k | 1.5k | 175.9k | 29.8k | 15 | 15.0k | 50s |
| **Total** | | | 1.3M | 70.3k | 5.7M | 90.0k | | 289.0k | 35m 1s |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 183 | 5727.6 | 287.1 | 81.7 |
| fix | 16 | 170299.2 | 4982.6 | 1077.5 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **199** | 28542.7 | 1452.1 | 353.0 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-sonnet-4-6 | 3 | 679 | 49.6k | 4.3M | 179.1k | 27m 59s |
| MiniMax-M2.7-highspeed | 3 | 1.3M | 20.6k | 1.4M | 109.9k | 7m 1s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…#2217)

## Summary

Replace `return asdict(self)` with explicit field dicts in the `to_dict()` method of four hot-path dataclasses: `DispatchRecord`, `TokenEntry`, `TimingEntry`, and `FailureRecord`. For `DispatchRecord.token_usage` (the single non-primitive field across all four), use `dict(self.token_usage)` for a shallow copy that avoids the deep-copy overhead of `asdict` while preserving safety for current callers. Remove `asdict` from the `dataclasses` import in all four files once it is no longer used. The cold-path `AutomationConfig` serialization at `cli/app.py:302` is explicitly out of scope and must remain unchanged.

## Requirements

## Acceptance Criteria

- [ ] All 4 warm-path `to_dict()` methods use explicit field dicts
- [ ] `DispatchRecord.token_usage` uses shallow copy (`dict(self.token_usage)`)
- [ ] `AutomationConfig` at `cli/app.py:302` keeps `asdict()`
- [ ] All existing tests pass
- [ ] JSON output is identical (verified by round-trip tests)

## Changed Files

### New (★):

- tests/fleet/test_state_schema.py
- tests/pipeline/test_audit.py

### Modified (●):

- src/autoskillit/core/types/_type_results.py
- src/autoskillit/fleet/state_types.py
- src/autoskillit/pipeline/timings.py
- src/autoskillit/pipeline/tokens.py

Closes #2193

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260507-215724-984456/.autoskillit/temp/make-plan/perf_replace_asdict_plan_2026-05-07_220015.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | claude-sonnet-4-6 | 1 | 107 | 10.0k | 340.2k | 61.7k | 40 | 32.2k | 4m 19s |
| verify | claude-sonnet-4-6 | 1 | 68 | 8.6k | 256.7k | 46.2k | 42 | 33.3k | 4m 32s |
| implement* | MiniMax-M2.7-highspeed | 1 | 1.2M | 6.5k | 831.6k | 29.8k | 77 | 16.3k | 2m 48s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 73.3k | 3.7k | 205.7k | 29.8k | 20 | 15.3k | 1m 34s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 28.6k | 1.5k | 146.1k | 29.8k | 14 | 15.1k | 45s |
| **Total** | | | 1.3M | 30.5k | 1.8M | 61.7k | | 112.2k | 14m 1s |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 136 | 6114.4 | 119.6 | 48.1 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **136** | 13090.7 | 825.1 | 223.9 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-sonnet-4-6 | 2 | 175 | 18.7k | 596.9k | 65.5k | 8m 52s |
| MiniMax-M2.7-highspeed | 3 | 1.3M | 11.8k | 1.2M | 46.7k | 5m 8s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
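
The explicit-dict pattern, sketched on a hypothetical `Record` class that is much smaller than `DispatchRecord`. The field names other than `token_usage` are illustrative.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in; only the token_usage field mirrors the PR description.
    name: str
    status: str
    token_usage: dict[str, int] = field(default_factory=dict)

    def to_dict(self) -> dict[str, object]:
        """Build the dict field by field instead of calling asdict(self)."""
        return {
            "name": self.name,
            "status": self.status,
            # Shallow copy: callers cannot mutate the record's own mapping,
            # and the recursive deep copy that asdict performs is avoided.
            "token_usage": dict(self.token_usage),
        }


record = Record("dispatch-1", "success", {"output": 10})
explicit = record.to_dict()
same_as_asdict = explicit == asdict(record)
is_copy = explicit["token_usage"] is not record.token_usage
```

The round-trip equality against `asdict` is the kind of check the acceptance criteria refer to when they require identical JSON output.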

## Summary

Change validate-audit (project-local), validate-test-audit, and the extended validate-audit SKILL.md files from writing flat timestamped files into a shared `validate-audit/` directory to creating a per-run timestamped subdirectory (`validate-audit-{YYYY-MM-DD_HHMMSS}/`) with timestamp-free filenames inside. This matches the established pattern used by `validate-team`. Update downstream path references in `skill_contracts.yaml` and all affected tests.

Closes #1960

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260507-225440-809232/.autoskillit/temp/make-plan/validate-audit_adopt_per-run_subdirectory_output_pattern_plan_2026-05-07_225500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| build_execution_map | claude-sonnet-4-6 | 1 | 76 | 12.2k | 347.2k | 52.1k | 31 | 39.9k | 4m 22s |
| plan | claude-opus-4-6 | 1 | 79 | 19.2k | 755.8k | 79.2k | 57 | 85.7k | 7m 20s |
| verify | claude-sonnet-4-6 | 1 | 29 | 15.7k | 447.8k | 54.6k | 77 | 41.5k | 7m 44s |
| implement* | MiniMax-M2.7-highspeed | 1 | 3.8M | 20.0k | 2.3M | 29.8k | 178 | 72.2k | 7m 30s |
| fix | claude-sonnet-4-6 | 1 | 190 | 10.7k | 1.1M | 60.6k | 61 | 50.3k | 8m 29s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 89.7k | 4.2k | 208.5k | 29.8k | 20 | 42.2k | 1m 32s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 68.6k | 1.8k | 172.3k | 28.7k | 16 | 41.0k | 55s |
| **Total** | | | 3.9M | 83.7k | 5.3M | 79.2k | | 372.7k | 37m 54s |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| build_execution_map | 0 | — | — | — |
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 162 | 13957.0 | 446.0 | 123.2 |
| fix | 2 | 532814.5 | 25159.0 | 5347.5 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **164** | 32062.6 | 2272.9 | 510.5 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-sonnet-4-6 | 3 | 295 | 38.6k | 1.9M | 131.6k | 20m 35s |
| claude-opus-4-6 | 1 | 79 | 19.2k | 755.8k | 85.7k | 7m 20s |
| MiniMax-M2.7-highspeed | 3 | 3.9M | 25.9k | 2.6M | 155.4k | 9m 57s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
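
For illustration only, the per-run output pattern described in the validate-audit change above might look like the following minimal sketch. The helper names `make_run_dir` and `write_report`, and the filename `report.md`, are assumptions made for this example and are not taken from the repository; only the `validate-audit-{YYYY-MM-DD_HHMMSS}/` directory shape comes from the commit message.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(base: Path, prefix: str = "validate-audit",
                 now: Optional[datetime] = None) -> Path:
    """Create a per-run directory such as validate-audit-2026-05-07_225500/."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    run_dir = base / f"{prefix}-{stamp}"
    # exist_ok=False: two runs in the same second fail loudly
    # instead of silently mixing their output files.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def write_report(run_dir: Path, text: str) -> Path:
    """Write a timestamp-free file; the directory name already carries the run time."""
    path = run_dir / "report.md"  # hypothetical filename
    path.write_text(text, encoding="utf-8")
    return path
```

Keeping the timestamp in the directory name rather than in each filename means downstream consumers can reference fixed filenames and only need to resolve one run directory.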

## Summary

The recipe validation pipeline has an asymmetric structural gap: the feature-gate axis has a complete `undeclared-feature-requirement` rule (ERROR severity) that statically cross-references every `run_skill` step's categories against `FEATURE_REGISTRY`, but the pack-gate axis has no equivalent rule. This gap caused three incidents over five weeks. The fix adds an `undeclared-pack-requirement` semantic rule mirroring the feature-gate pattern, makes `unknown-required-pack` ERROR-severity, fixes `research-design.yaml` to declare `vis-lens`, and updates the test that locked in the incorrect `requires_packs` value.

## Requirements

## Conflict Resolution Decisions

The following files had merge conflicts that were automatically resolved.

## Changed Files

### Modified (●):

- `src/autoskillit/recipe/rules/rules_packs.py`
- `src/autoskillit/recipe/rules/rules_skills.py`
- `src/autoskillit/recipes/research-design.json`
- `src/autoskillit/recipes/research-design.yaml`
- `tests/recipe/test_bundled_recipes_general.py`
- `tests/recipe/test_bundled_recipes_research_design.py`
- `tests/recipe/test_rules_packs.py`

Closes #2220

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/remediation-20260507-235311-480573/.autoskillit/temp/rectify/rectify_undeclared_pack_requirement_2026-05-08_000100.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| rectify | claude-sonnet-4-6 | 1 | 6.6k | 13.1k | 907.1k | 118.7k | 197 | 67.7k | 8m 41s |
| dry_walkthrough | claude-sonnet-4-6 | 1 | 40 | 9.6k | 547.7k | 53.1k | 99 | 40.0k | 4m 56s |
| implement* | MiniMax-M2.7-highspeed | 1 | 1.9M | 16.7k | 1.6M | 70.4k | 160 | 129.5k | 7m 54s |
| assess | claude-opus-4-6 | 1 | 84 | 9.4k | 2.1M | 71.5k | 88 | 60.9k | 6m 41s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 83.5k | 2.9k | 206.4k | 34.0k | 22 | 52.2k | 1m 25s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 44.4k | 1.2k | 198.2k | 28.7k | 14 | 15.0k | 40s |
| **Total** | | | 2.1M | 52.9k | 5.5M | 118.7k | | 365.3k | 30m 20s |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| rectify | 0 | — | — | — |
| dry_walkthrough | 0 | — | — | — |
| implement | 234 | 6951.9 | 553.6 | 71.2 |
| assess | 21 | 98049.5 | 2898.2 | 446.5 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **255** | 21745.4 | 1432.6 | 207.3 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-sonnet-4-6 | 2 | 6.7k | 22.7k | 1.5M | 107.7k | 13m 38s |
| MiniMax-M2.7-highspeed | 3 | 2.1M | 20.8k | 2.0M | 196.7k | 9m 59s |
| claude-opus-4-6 | 1 | 84 | 9.4k | 2.1M | 60.9k | 6m 41s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
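
As a rough sketch of what a static `undeclared-pack-requirement` check could look like, the example below cross-references each `run_skill` step against the packs a recipe declares. The `Recipe`, `Step`, and `Finding` classes, the `SKILL_TO_PACK` lookup, and the skill names in it are all hypothetical stand-ins invented for this example; the actual rule in `rules_packs.py` may be structured quite differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str
    message: str


@dataclass
class Step:
    name: str
    tool: str
    skill: Optional[str] = None


@dataclass
class Recipe:
    name: str
    requires_packs: set = field(default_factory=set)
    steps: list = field(default_factory=list)


# Hypothetical skill -> pack lookup; a real project would derive this from its registry.
SKILL_TO_PACK = {"vis-lens-overview": "vis-lens", "audit-arch": "audit"}


def check_undeclared_pack_requirement(recipe: Recipe) -> list:
    """Flag run_skill steps whose skill belongs to a pack the recipe never declares."""
    findings = []
    for step in recipe.steps:
        if step.tool != "run_skill" or step.skill is None:
            continue  # only skill invocations can pull in a pack
        pack = SKILL_TO_PACK.get(step.skill)
        if pack is not None and pack not in recipe.requires_packs:
            findings.append(Finding(
                rule="undeclared-pack-requirement",
                severity="ERROR",
                message=f"step '{step.name}' uses skill '{step.skill}' "
                        f"from pack '{pack}', which is missing from requires_packs",
            ))
    return findings
```

Because the check only reads the parsed recipe, it can run at validation time, before any session is launched.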

## Summary

The `fleet campaign` session launches without an `initial_message`, so Claude has no first user turn to trigger the `FIRST ACTION` block that displays the ingredient table. The `fleet dispatch` path and `order` path both correctly pass a greeting as `initial_message`. The fix adds a `_FLEET_CAMPAIGN_GREETINGS` list and wires it through the campaign launch path, mirroring the existing dispatch pattern.

Closes #2214

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260508-010012-804577/.autoskillit/temp/make-plan/fleet_campaign_session_missing_initial_message_greeting_trigger_plan_2026-05-08_010012.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | claude-opus-4-6 | 1 | 75 | 9.1k | 933.1k | 62.0k | 71 | 52.3k | 5m 8s |
| verify | claude-opus-4-6 | 1 | 37 | 7.3k | 734.2k | 46.7k | 59 | 33.5k | 4m 3s |
| implement* | MiniMax-M2.7-highspeed | 1 | 1.1M | 12.7k | 1.0M | 34.0k | 100 | 56.4k | 5m 43s |
| fix | claude-sonnet-4-6 | 1 | 374 | 30.8k | 3.4M | 105.9k | 118 | 92.9k | 11m 35s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 169.7k | 4.1k | 402.1k | 28.7k | 37 | 41.2k | 1m 56s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 52.2k | 1.2k | 227.0k | 28.7k | 16 | 15.1k | 42s |
| **Total** | | | 1.3M | 65.4k | 6.7M | 105.9k | | 291.5k | 29m 9s |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 153 | 6606.7 | 368.5 | 83.3 |
| fix | 20 | 168857.8 | 4646.4 | 1541.3 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **173** | 38638.6 | 1684.9 | 378.0 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-opus-4-6 | 2 | 112 | 16.5k | 1.7M | 85.8k | 9m 11s |
| MiniMax-M2.7-highspeed | 3 | 1.3M | 18.1k | 1.6M | 112.8k | 8m 22s |
| claude-sonnet-4-6 | 1 | 374 | 30.8k | 3.4M | 92.9k | 11m 35s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…2222)

## Summary

The `research-campaign.yaml` declares 8 campaign-level ingredients but only forwards a subset to each dispatch's `ingredients:` block. Several ingredients (`task`, `review_design`, `output_mode`, `review_pr`, `audit_claims`) that sub-recipes declare are never forwarded; they silently fall back to sub-recipe defaults (or to the `_run_dispatch` auto-injection in the case of `task`), ignoring the user's campaign-level values. This plan fixes the YAML forwarding gaps and adds a new static validation rule (`campaign-dangling-ingredient`) that catches this class of bug at authoring time.

## Requirements

(Embedded in the issue body: issue #2215 describes the problem and solution in detail)

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260508-010013-712390/.autoskillit/temp/make-plan/research_campaign_ingredient_forwarding_plan_2026-05-08_010600.md`

## Changed Files

### Modified (●):

- `src/autoskillit/recipe/rules/rules_campaign.py`
- `src/autoskillit/recipes/campaigns/research-campaign.json`
- `src/autoskillit/recipes/campaigns/research-campaign.yaml`
- `tests/recipe/test_campaign_loader.py`
- `tests/recipe/test_research_campaign_rules.py`
- `tests/recipe/test_rules_campaign.py`

Closes #2215

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | claude-opus-4-6 | 1 | 78 | 13.1k | 649.7k | 68.6k | 78 | 59.9k | 6m 11s |
| verify | claude-opus-4-6 | 1 | 897 | 6.2k | 625.9k | 70.5k | 81 | 57.3k | 4m 27s |
| implement* | MiniMax-M2.7-highspeed | 1 | 1.1M | 10.7k | 974.1k | 28.7k | 83 | 51.5k | 7m 50s |
| fix | claude-sonnet-4-6 | 1 | 174 | 8.1k | 929.2k | 61.4k | 53 | 48.3k | 3m 47s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 133.1k | 3.0k | 341.7k | 28.7k | 24 | 15.2k | 1m 21s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 37.0k | 1.4k | 169.5k | 28.7k | 14 | 15.1k | 44s |
| **Total** | | | 1.2M | 42.4k | 3.7M | 70.5k | | 247.4k | 24m 21s |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 297 | 3279.6 | 173.5 | 35.9 |
| fix | 33 | 28156.7 | 1463.2 | 246.1 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **330** | 11181.9 | 749.7 | 128.6 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-opus-4-6 | 2 | 975 | 19.3k | 1.3M | 117.2k | 10m 38s |
| MiniMax-M2.7-highspeed | 3 | 1.2M | 15.0k | 1.5M | 81.9k | 9m 55s |
| claude-sonnet-4-6 | 1 | 174 | 8.1k | 929.2k | 48.3k | 3m 47s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
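
The forwarding gap described in the research-campaign change above can be detected with a simple set comparison. The sketch below is one possible shape for a `campaign-dangling-ingredient` check, under the assumption that a campaign ingredient is "dangling" for a dispatch when the sub-recipe declares it but the dispatch's `ingredients:` block omits it. The `Dispatch` class, the function name, and the sample recipe name `research-implement` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatch:
    name: str
    recipe: str
    ingredients: dict = field(default_factory=dict)  # forwarded name -> value


def find_dangling_ingredients(campaign_ingredients, dispatches, recipe_ingredients):
    """Report campaign ingredients a sub-recipe accepts but a dispatch never forwards.

    campaign_ingredients: iterable of names declared at campaign level
    recipe_ingredients:   mapping of recipe name -> set of names that recipe declares
    """
    problems = []
    for dispatch in dispatches:
        accepted = recipe_ingredients.get(dispatch.recipe, set())
        # Declared by both campaign and sub-recipe, yet absent from the dispatch block.
        missing = (set(campaign_ingredients) & accepted) - set(dispatch.ingredients)
        for name in sorted(missing):
            problems.append(
                f"{dispatch.name}: campaign ingredient '{name}' is declared by "
                f"recipe '{dispatch.recipe}' but not forwarded"
            )
    return problems
```

A dangling ingredient does not fail at run time, because the sub-recipe default silently applies, which is why a static check at authoring time is the natural place to catch it.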

## Summary

Create a new `validate-review-decisions` skill at `skills_extended/validate-review-decisions/SKILL.md` that adds mandatory intent analysis and seven evidence-gathering rules to the validation workflow for review-decisions audit reports. Update `full-audit.yaml` to route review-decisions validation through this new skill instead of the generic `validate-audit`. Add contract tests and update all test manifests.

The skill follows the architecture established by `validate-test-audit` (domain-specific semantic rules + intent analysis) while preserving full output compatibility with `validate-audit` (same directory, naming convention, `validated: true` sentinel, `AUTOSKILLIT_AUDIT_RUN_DIR` support).

## Requirements

### REQ-VRD-1: Skill structure

The skill MUST be placed at `skills_extended/validate-review-decisions/SKILL.md` with `categories: [audit]` frontmatter.

### REQ-VRD-2: Intent analysis as mandatory step

Code validation subagents MUST perform intent analysis (docstring check, git provenance, test coverage, contract analysis, architectural constraint check, behavioral simulation) before assigning a verdict to ANY finding. This is not optional: every finding must have an intent analysis section in the subagent's reasoning.

### REQ-VRD-3: Evidence-gathering rules in subagent instructions

The skill MUST include the seven evidence-gathering rules in the code validation subagent prompt. Rules MUST be generalizable (no references to specific finding IDs).

### REQ-VRD-4: Output compatibility

Output files MUST use the same directory, naming convention, and format as `validate-audit`.

### REQ-VRD-5: Standalone invocability

The skill MUST accept an `{audit_report_path}` argument and auto-discover the most recent review-decisions audit report when omitted.

### REQ-VRD-6: Full-audit recipe routing

Update `full-audit.yaml` to dispatch review-decisions audit validation to `validate-review-decisions` and all other audit types to `validate-audit` (or `validate-test-audit` for tests).

### REQ-VRD-7: No pack changes

The skill MUST use the existing `audit` pack.

### REQ-VRD-8: Consider generalizing intent analysis to validate-audit (follow-up)

After `validate-review-decisions` and `validate-test-audit` both establish intent analysis patterns, evaluate merging the common intent analysis rules back into the generic `validate-audit` skill. This is a follow-up, not a blocker.

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260508-010014-405487/.autoskillit/temp/make-plan/validate_review_decisions_skill_plan_2026-05-08_010600.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | claude-opus-4-6 | 1 | 2.6k | 14.9k | 1.8M | 93.1k | 105 | 79.9k | 9m 1s |
| verify | claude-opus-4-6 | 1 | 1.7k | 11.6k | 1.4M | 63.3k | 116 | 50.4k | 6m 22s |
| implement* | MiniMax-M2.7-highspeed | 1 | 3.6M | 22.5k | 1.9M | 28.7k | 180 | 16.2k | 12m 59s |
| fix | claude-sonnet-4-6 | 1 | 174 | 7.5k | 874.3k | 53.2k | 47 | 42.9k | 3m 33s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 102.6k | 4.2k | 261.0k | 34.0k | 24 | 27.4k | 1m 37s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 57.5k | 1.8k | 227.0k | 28.7k | 16 | 15.1k | 49s |
| **Total** | | | 3.7M | 62.5k | 6.4M | 93.1k | | 232.0k | 34m 23s |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 766 | 2508.8 | 21.2 | 29.4 |
| fix | 2 | 437171.0 | 21471.5 | 3770.5 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **768** | 8375.0 | 302.1 | 81.3 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-opus-4-6 | 2 | 4.3k | 26.5k | 3.1M | 130.2k | 15m 24s |
| MiniMax-M2.7-highspeed | 3 | 3.7M | 28.5k | 2.4M | 58.8k | 15m 25s |
| claude-sonnet-4-6 | 1 | 174 | 7.5k | 874.3k | 42.9k | 3m 33s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
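
The routing rule in REQ-VRD-6 above amounts to a lookup with a default. The sketch below expresses that in Python purely for illustration: the three skill names come from the requirement, while the audit-type keys (`review-decisions`, `tests`) and the function name are assumptions, and the real routing is expressed in `full-audit.yaml` rather than in code like this.

```python
# Hypothetical audit-type keys; the three skill names come from REQ-VRD-6.
_VALIDATOR_BY_AUDIT_TYPE = {
    "review-decisions": "validate-review-decisions",
    "tests": "validate-test-audit",
}


def validator_skill_for(audit_type: str) -> str:
    """Return the validation skill for an audit type, defaulting to the generic one."""
    return _VALIDATOR_BY_AUDIT_TYPE.get(audit_type, "validate-audit")
```

Falling back to `validate-audit` keeps every audit type that has no specialised validator on the existing path.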

…queued` State (#2225)

## Summary

Formalize the issue label lifecycle as a first-class discrete system by introducing an `IssueLabelState` enum, a `LabelDef` registry with per-state metadata (color, description, swap semantics), and a transition table with validation, modeled after the fleet `DispatchStatus` pattern. Add `queued` as the fifth lifecycle state. Refactor `claim_issue`, `release_issue`, and `claim_and_resolve_issue` to derive colors, descriptions, and swap-remove sets from the registry instead of hardcoding them. Update `process-issues` Phase 0.5 to apply `queued` (not `in-progress`) during upfront claiming, and add a `queued → in-progress` swap at recipe pickup.

## Requirements

### R1: Formalize Label Lifecycle as a Discrete System

Create an architectural component (enum + registry or Protocol-based approach) where each **lifecycle label** (as distinct from classification labels like `recipe:implementation`) is a first-class entity that declares:

1. **Label name**: the GitHub label string (e.g., `"queued"`, `"in-progress"`, `"staged"`, `"fail"`)
2. **Color**: hex color for `ensure_label` (currently hardcoded: `fbca04` for in-progress, `0075ca` for staged, `d73a4a` for fail)
3. **Description**: human-readable description for `ensure_label`
4. **Swap semantics**: which labels to remove when entering this state (e.g., entering `in-progress` removes `queued` and `fail`)
5. **Valid transitions**: which states this label can transition to (modeled after `_ALLOWED_TRANSITIONS` in fleet)

When a new lifecycle label is added, the system must enforce that all of these are defined. No hanging labels: every label in the lifecycle participates in the tracking history.

### R2: Add `queued` Lifecycle State

Add a `queued` label with the following lifecycle position:

```
[unlabeled / triaged]
    │
    │  upfront claim (Phase 0.5)
    ▼
[queued]        ← claimed by orchestrator, not yet processing
    │
    │  recipe session begins
    ▼
[in-progress]   ← recipe actively executing
    │
    ├─ success → [staged]
    ├─ failure → [fail]
    └─ bare    → [unlabeled]
```

Transitions for `queued`:

- **Entry**: from unlabeled/fail (upfront claim swaps `fail` → `queued`)
- **Exit**: to `in-progress` (recipe pickup), or to unlabeled (fatal cleanup release)

### R3: Integrate with `process-issues` Workflow

Update `process-issues/SKILL.md` Phase 0.5 and Step 3:

- **Phase 0.5 (upfront claiming)**: `claim_issue` should apply `queued` instead of `in-progress`
- **Step 3b.1 (recipe pickup)**: Before loading the recipe, swap `queued` → `in-progress` for the specific issue being processed
- **Fatal failure cleanup**: Release all `queued` issues (not just `in-progress`) back to unlabeled
- **Recipe `claim_and_resolve` step**: When `upfront_claimed=true`, the issue arrives with `queued` (not `in-progress`), so the reentry path needs to handle the `queued` → `in-progress` swap

### R4: Integration Points

These components need updates:

| Component | File | Change |
|-----------|------|--------|
| `GitHubConfig` | `config/_config_dataclasses.py` | Add `queued_label` field (or migrate to registry) |
| `defaults.yaml` | `config/defaults.yaml` | Add `queued` to `allowed_labels` and label config |
| `claim_issue` | `server/tools/tools_issue_lifecycle.py` | Support claiming with `queued` label |
| `claim_and_resolve_issue` | `server/tools/tools_issue_composite.py` | Handle `queued` → `in-progress` transition |
| `release_issue` | `server/tools/tools_issue_lifecycle.py` | Remove `queued` in cleanup paths |
| `swap_labels` | `execution/github.py` | No change needed (generic) |
| `process-issues` | `skills_extended/process-issues/SKILL.md` | Phase 0.5 uses `queued`, Step 3 swaps to `in-progress` |
| `build-execution-map` | `skills_extended/build-execution-map/SKILL.md` | Query `queued` label alongside `in-progress` for conflict detection |

### R5: Classification vs. Lifecycle Label Distinction

The system should distinguish between:

- **Lifecycle labels** (`queued`, `in-progress`, `staged`, `fail`): managed by the state machine, subject to atomic swaps and transition validation
- **Classification labels** (`recipe:implementation`, `recipe:remediation`, `bug`, `enhancement`, `autoreported`): applied by triage/reporting, never removed by the pipeline, not part of the state machine

This distinction should be encoded in the architecture, not just documented.

## Implementation Plan

Plan files:

- `/home/talon/projects/autoskillit-runs/impl-20260508-023057-647423/.autoskillit/temp/make-plan/formalize_issue_label_lifecycle_plan_2026-05-08_023500_part_a.md`
- `/home/talon/projects/autoskillit-runs/impl-20260508-023057-647423/.autoskillit/temp/make-plan/formalize_issue_label_lifecycle_plan_2026-05-08_023500_part_b.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | claude-sonnet-4-6 | 1 | 80 | 32.2k | 840.0k | 81.9k | 100 | 82.2k | 18m 26s |
| verify | claude-sonnet-4-6 | 2 | 1.9k | 27.0k | 1.9M | 77.8k | 210 | 108.7k | 15m 53s |
| implement* | MiniMax-M2.7-highspeed | 2 | 5.8M | 44.2k | 4.9M | 87.5k | 374 | 272.0k | 25m 15s |
| fix | claude-sonnet-4-6 | 1 | 394 | 21.5k | 3.3M | 98.0k | 126 | 98.3k | 17m 13s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 91.2k | 8.3k | 206.6k | 34.2k | 25 | 60.5k | 2m 7s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 44.8k | 2.3k | 169.6k | 28.7k | 14 | 15.1k | 55s |
| **Total** | | | 6.0M | 135.5k | 11.2M | 98.0k | | 636.9k | 1h 19m |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 604 | 8054.5 | 450.4 | 73.2 |
| fix | 52 | 62593.5 | 1891.2 | 413.6 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **656** | 17128.3 | 970.9 | 206.5 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-sonnet-4-6 | 3 | 2.3k | 80.7k | 6.0M | 289.2k | 51m 34s |
| MiniMax-M2.7-highspeed | 3 | 6.0M | 54.8k | 5.2M | 347.7k | 28m 17s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
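
One way to picture the label lifecycle described in the change above is an enum, a registry keyed by that enum, and a transition table, with a completeness check that rejects "hanging" labels. The sketch below is a guess at that shape rather than the project's code. The names `IssueLabelState`, `LabelDef`, and `_ALLOWED_TRANSITIONS` and the three hex colors for `in-progress`, `staged`, and `fail` come from the commit message. Everything else is assumed: the `queued` color, all description strings, the `removes_on_entry` field name, the swap sets for `staged` and `fail`, the use of `None` for "unlabeled", the unlabeled → `in-progress` and `fail` → `in-progress` edges, and treating `staged` as terminal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IssueLabelState(str, Enum):
    QUEUED = "queued"
    IN_PROGRESS = "in-progress"
    STAGED = "staged"
    FAIL = "fail"


@dataclass(frozen=True)
class LabelDef:
    color: str                    # hex color passed to label creation
    description: str
    removes_on_entry: frozenset   # labels swapped out when entering this state


S = IssueLabelState
LABEL_REGISTRY = {
    # "c5def5" and all descriptions are placeholders, not values from the source.
    S.QUEUED: LabelDef("c5def5", "Claimed, not yet processing", frozenset({S.FAIL})),
    S.IN_PROGRESS: LabelDef("fbca04", "Recipe actively executing",
                            frozenset({S.QUEUED, S.FAIL})),
    S.STAGED: LabelDef("0075ca", "Completed and staged", frozenset({S.IN_PROGRESS})),
    S.FAIL: LabelDef("d73a4a", "Processing failed", frozenset({S.IN_PROGRESS})),
}

# None stands for "unlabeled". Edges not named in the requirements are assumptions.
_ALLOWED_TRANSITIONS = {
    None: frozenset({S.QUEUED, S.IN_PROGRESS}),
    S.QUEUED: frozenset({S.IN_PROGRESS, None}),
    S.IN_PROGRESS: frozenset({S.STAGED, S.FAIL, None}),
    S.FAIL: frozenset({S.QUEUED, S.IN_PROGRESS}),
    S.STAGED: frozenset(),
}

# "No hanging labels": every enum member must have metadata and a transition row.
assert set(LABEL_REGISTRY) == set(IssueLabelState)
assert set(_ALLOWED_TRANSITIONS) == set(IssueLabelState) | {None}


def validate_transition(current: Optional[IssueLabelState],
                        target: Optional[IssueLabelState]) -> None:
    """Raise ValueError when moving from current to target is not allowed."""
    if target not in _ALLOWED_TRANSITIONS[current]:
        src = getattr(current, "value", "unlabeled")
        dst = getattr(target, "value", "unlabeled")
        raise ValueError(f"illegal label transition: {src} -> {dst}")
```

The two module-level assertions are what turns "every label must declare its metadata" from a convention into something that fails at import time when a new enum member is added without a registry entry.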

Trecek and others added 30 commits May 7, 2026 22:12

chore: bump version to 0.9.448

af25897

chore: bump version to 0.9.449

60a6f04

chore: bump version to 0.9.450

9082985

chore: bump version to 0.9.451

9f7f2d2

chore: bump version to 0.9.452

cf05424

chore: bump version to 0.9.453

45324b7

chore: bump version to 0.9.454

4756a69

chore: bump version to 0.9.455

14c691d

chore: bump version to 0.9.456

91ab648

chore: bump version to 0.9.457

a23a091

chore: bump version to 0.9.458

aea275f

chore: bump version to 0.9.459

dd8e6f8

chore: bump version to 0.9.460

47b000c

chore: bump version to 0.9.461

154338b

Trecek commented May 8, 2026

Trecek and others added 15 commits May 8, 2026 07:02

chore: bump version to 0.9.563

adbdaab

chore: bump version to 0.9.564

7c86aae

chore: bump version to 0.9.565

2039b94

chore: bump version to 0.9.566

5d20add

chore: bump version to 0.9.567

9c5b914

chore: bump version to 0.9.568

6edd617

chore: bump version to 0.9.569

f21ac2f

Trecek merged commit 45c1ba6 into main May 8, 2026
2 checks passed

Trecek deleted the develop branch May 8, 2026 15:32

Trecek restored the develop branch May 8, 2026 15:35

Trecek mentioned this pull request May 8, 2026

Enforce Closes #N injection via two-hook guard (PostToolUse state bridge + PreToolUse deny) #2248

Open


Trecek left a comment

AutoSkillit PR Review — Verdict: approved_with_comments

Finding Summary by Dimension

Top Patterns