Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs) #925
…rminal STOP Dead-End (#606) ## Summary The research recipe's `review_design` step currently hard-routes `verdict=STOP` directly to `design_rejected` (pipeline halt), bypassing any analysis of whether the stop triggers are actually fixable. This causes unnecessary pipeline deaths when stop triggers are mechanical methodological flaws with concrete fixes (as shown in TalonT-Org/spectral-init#222). This plan adds: 1. A new `resolve-design-review` skill that triages each stop-trigger finding as `ADDRESSABLE`, `STRUCTURAL`, or `DISCUSS` using parallel feasibility-validation subagents, then emits either `resolution=revised` (loop back for revision) or `resolution=failed` (genuinely terminal) 2. A new `resolve_design_review` recipe step in `research.yaml` that routes `STOP → resolve_design_review` instead of directly to `design_rejected` 3. A skill contract entry for `resolve-design-review` in `skill_contracts.yaml` 4. Updated tests: fix the existing STOP-routing assertion and add new tests for the step and skill ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; %% TERMINALS %% START([START]) REJECTED([design_rejected<br/>action: stop]) EXEC([create_worktree<br/>→ Execution Phase]) subgraph DesignPhase ["Research Recipe — Design Phase"] direction TB scope["scope<br/>━━━━━━━━━━<br/>Scope research question"] plan["plan_experiment<br/>━━━━━━━━━━<br/>Plan experiment<br/>(receives revision_guidance)"] review["● review_design<br/>━━━━━━━━━━<br/>Validate plan<br/>retries: 2"] 
revise["revise_design<br/>━━━━━━━━━━<br/>Route → plan_experiment"] rdr["★ resolve_design_review<br/>━━━━━━━━━━<br/>Triage STOP findings<br/>retries: 1"] triage{"★ Triage<br/>━━━━━━━━━━<br/>Any ADDRESSABLE<br/>or DISCUSS?"} end %% FLOW %% START --> scope scope --> plan plan --> review review -->|"verdict=GO"| EXEC review -->|"verdict=REVISE"| revise revise --> plan review -->|"● verdict=STOP<br/>(was: design_rejected)"| rdr rdr --> triage triage -->|"resolution=revised<br/>any ADDRESSABLE/DISCUSS"| revise triage -->|"resolution=failed<br/>all STRUCTURAL"| REJECTED %% CLASS ASSIGNMENTS %% class START,REJECTED,EXEC terminal; class scope,plan handler; class review,revise stateNode; class rdr,triage newComponent; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Terminal | START, design_rejected halt, create_worktree handoff | | Orange | Handler | Existing processing steps (scope, plan_experiment) | | Teal | State | Existing routing/decision nodes (review_design, revise_design) | | Green | New Component | ★ New resolve_design_review step + triage logic | Closes #605 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260404-132147-193877/.autoskillit/temp/make-plan/resolve_design_review_plan_2026-04-04_132804.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | input | output | cached | count | time | |------|-------|--------|--------|-------|------| | plan | 39 | 23.7k | 1.5M | 1 | 8m 6s | | verify | 23 | 12.0k | 937.2k | 1 | 4m 21s | | implement | 56 | 16.1k | 2.7M | 1 | 7m 30s | | fix | 25 | 9.1k | 879.4k | 1 | 5m 58s | | audit_impl | 17 | 14.8k | 356.6k | 1 | 5m 57s | | open_pr | 24 | 12.9k | 799.4k | 1 | 4m 41s | | **Total** | 184 | 88.6k | 7.2M | | 36m 35s | --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
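The triage rule above (`resolution=revised` when any stop-trigger finding is ADDRESSABLE or DISCUSS; `resolution=failed` only when every finding is STRUCTURAL) can be sketched in a few lines. The function and field names are illustrative, not the skill's actual interface:

```python
# Illustrative sketch of the resolve-design-review triage decision.
# Assumes findings arrive already classified by the feasibility-validation
# subagents; "class" is a hypothetical field name.

ADDRESSABLE, STRUCTURAL, DISCUSS = "ADDRESSABLE", "STRUCTURAL", "DISCUSS"

def resolve(findings: list[dict]) -> str:
    """Return 'revised' if any stop-trigger finding is fixable (ADDRESSABLE)
    or negotiable (DISCUSS); 'failed' only when every finding is STRUCTURAL,
    i.e. the design is genuinely terminal."""
    classes = {f["class"] for f in findings}
    if classes & {ADDRESSABLE, DISCUSS}:
        return "revised"   # loop back through revise_design → plan_experiment
    return "failed"        # route to design_rejected
```

The point of the sketch is the asymmetry: a single fixable finding is enough to keep the pipeline alive, so only an all-STRUCTURAL verdict halts it.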
…ipeline (#611) ## Summary Every recipe (`implementation`, `remediation`, `implementation-groups`, `merge-prs`) previously had an interactive `confirm_cleanup` prompt at its terminal step. When `process-issues` drives batch processing, this halted the pipeline waiting for user input. A `defer_cleanup` flag was designed to bypass it, but made "interrupt the pipeline" the default and "don't interrupt" the opt-in. The fix: remove the interactive cleanup path entirely from all recipes. Every terminal step unconditionally calls `register_clone_status` (success or failure), writing to a shared registry file. After all issues in `process-issues` complete, a single `batch_cleanup_clones` call deletes all success-status clones and preserves all error-status clones. No prompts. No flags. No per-issue decisions. ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; START([● process-issues starts batch]) subgraph PerIssue ["Per-Issue Recipe (× N issues)"] direction TB RECIPE["● Recipe Pipeline<br/>━━━━━━━━━━<br/>implementation / remediation<br/>implementation-groups / merge-prs<br/>plan → implement → test → push → PR → wait"] OUTCOME{"terminal<br/>outcome?"} REL_S["● release_issue_success<br/>━━━━━━━━━━<br/>release GitHub issue claim<br/>on_success/on_failure → register"] REL_F["● release_issue_failure<br/>━━━━━━━━━━<br/>release on error<br/>on_success/on_failure 
→ register_failure"] REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='success'<br/>on_success/on_failure → done"] REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='error'<br/>on_success/on_failure → escalate_stop"] DONE["● done<br/>━━━━━━━━━━<br/>action: stop (success)"] FAIL["● escalate_stop<br/>━━━━━━━━━━<br/>action: stop (failure)"] end REGISTRY[("● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>.autoskillit/temp/<br/>accumulated entries")] subgraph PostBatch ["● After ALL Batches Complete (process-issues Step 3d)"] direction LR BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads registry<br/>deletes status=success clones<br/>preserves status=error clones<br/>no prompt, one call"] PRESERVED["preserved clones<br/>━━━━━━━━━━<br/>status=error kept<br/>for investigation"] DELETED["deleted clones<br/>━━━━━━━━━━<br/>status=success removed<br/>disk reclaimed"] end END_OK([COMPLETE]) START --> RECIPE RECIPE --> OUTCOME OUTCOME -->|"success path"| REL_S OUTCOME -->|"failure path"| REL_F REL_S --> REG_S REL_F --> REG_F REG_S -->|"writes status=success"| REGISTRY REG_F -->|"writes status=error"| REGISTRY REG_S --> DONE REG_F --> FAIL DONE -->|"after all issues done"| BATCH FAIL -->|"after all issues done"| BATCH BATCH -->|"reads registry"| REGISTRY BATCH --> PRESERVED BATCH --> DELETED DELETED --> END_OK PRESERVED --> END_OK class START,END_OK terminal; class RECIPE handler; class OUTCOME stateNode; class REL_S,REL_F phase; class REG_S,REG_F,BATCH newComponent; class DONE phase; class FAIL detector; class REGISTRY stateNode; class PRESERVED,DELETED output; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Terminal | Start and end states | | Orange | Handler | Recipe pipeline execution | | Teal | State | Decision routing and registry storage | | Purple | Phase | Control flow nodes (release, done) | | Green | New/Modified | ● Modified steps 
(register, batch cleanup) | | Red | Detector | Failure terminal (escalate_stop) | | Dark Teal | Output | Clone disposition artifacts | ### State Lifecycle Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; START([Pipeline Terminal Step]) subgraph WritePath ["● WRITE: Recipe Terminal Registration (once per clone)"] direction LR REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='success'<br/>clone_path (immutable)"] REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='error'<br/>clone_path (immutable)"] end subgraph Registry ["● Registry File — APPEND_ONLY during run"] direction TB ENTRY["● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>entries: [{clone_path, status,<br/>step_name, timestamp}]<br/>written N times (once per clone)<br/>never mutated after write"] end subgraph ReadPath ["● READ: Batch Cleanup (once, post-run)"] direction LR BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads all entries<br/>partitions by status"] GATE{"status?"} DEL["delete clone dir<br/>━━━━━━━━━━<br/>status=success<br/>disk reclaimed"] KEEP["preserve clone dir<br/>━━━━━━━━━━<br/>status=error<br/>for investigation"] end subgraph Contracts ["Contract Cards (recipe input contracts)"] direction LR C1["★ 
contracts/implementation-groups.yaml<br/>━━━━━━━━━━<br/>NEW — no defer_cleanup<br/>no registry_path"] C2["● contracts/implementation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"] C3["● contracts/remediation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"] C4["● contracts/merge-prs.yaml<br/>━━━━━━━━━━<br/>updated — removed defer_cleanup<br/>registry_path, keep_clone_on_failure"] end ELIMINATED["ELIMINATED state<br/>━━━━━━━━━━<br/>defer_cleanup ingredient<br/>registry_path ingredient<br/>keep_clone_on_failure ingredient<br/>check_defer_cleanup step<br/>confirm_cleanup step"] END_OK([COMPLETE]) START -->|"success terminal"| REG_S START -->|"failure terminal"| REG_F REG_S -->|"appends entry"| ENTRY REG_F -->|"appends entry"| ENTRY ENTRY -->|"read once post-run"| BATCH BATCH --> GATE GATE -->|"status=success"| DEL GATE -->|"status=error"| KEEP DEL --> END_OK KEEP --> END_OK C1 -.->|"contract enforces"| REG_S C2 -.->|"contract enforces"| REG_S C3 -.->|"contract enforces"| REG_S C4 -.->|"contract enforces"| REG_S ELIMINATED -.->|"no longer written"| ENTRY class START,END_OK terminal; class REG_S,REG_F,BATCH newComponent; class ENTRY stateNode; class GATE stateNode; class DEL,KEEP output; class C1 phase; class C2,C3,C4 phase; class ELIMINATED detector; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Terminal | Pipeline start and end | | Green | ● Modified / New | register steps and batch cleanup (this PR) | | Teal | State | Registry file and status decision | | Purple | Phase | Contract card files | | Dark Teal | Output | Clone disposition outcomes | | Red | Eliminated | State that no longer exists | Closes #610 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260404-185031-682892/.autoskillit/temp/make-plan/process_issues_defer_clone_cleanup_plan_2026-04-04_000000.md` 🤖 Generated with [Claude 
Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 36 | 16.9k | 1.4M | 1 | 6m 15s |
| **Total** | 10.1k | 383.2k | 42.0M | | 2h 51m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ocal (#612)

## Summary

Move `smoke-test.yaml` and its companion artifacts (contract card, flow diagram) from the bundled `src/autoskillit/recipes/` directory to the project-local `.autoskillit/recipes/` directory. This makes smoke-test invisible to end-user projects while remaining fully functional when running from the AutoSkillit repository root. The existing project-local recipe discovery mechanism already supports this — no production code changes are needed. All changes are file relocations and test updates.

## Requirements

### MOVE — Recipe File Relocation

- **REQ-MOVE-001:** The file `src/autoskillit/recipes/smoke-test.yaml` must be relocated to `.autoskillit/recipes/smoke-test.yaml` at the project root.
- **REQ-MOVE-002:** Associated contract card(s) in `src/autoskillit/recipes/contracts/` matching `smoke-test*` must be relocated to `.autoskillit/recipes/contracts/`.
- **REQ-MOVE-003:** Associated diagram(s) in `src/autoskillit/recipes/diagrams/` matching `smoke-test*` must be relocated to `.autoskillit/recipes/diagrams/`.

### LIST — Listing Behavior

- **REQ-LIST-001:** The smoke-test recipe must not appear in `list_recipes` output when the current working directory is outside the AutoSkillit repository.
- **REQ-LIST-002:** The smoke-test recipe must appear in `list_recipes` output with source `PROJECT` when the current working directory is the AutoSkillit repository root.

### LOAD — Pipeline Compatibility

- **REQ-LOAD-001:** `load_recipe("smoke-test")` must succeed when invoked from the AutoSkillit repository root.
- **REQ-LOAD-002:** Existing smoke-test pipeline execution must remain functionally identical after the move.

### TEST — Test Updates

- **REQ-TEST-001:** Tests that assert smoke-test has `RecipeSource.BUILTIN` must be updated to assert `RecipeSource.PROJECT`.
- **REQ-TEST-002:** Tests that count the number of bundled recipes must be updated to reflect the removal of smoke-test from the bundled set.
## Architecture Impact ### Operational Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; START(["list_recipes / find_recipe_by_name called"]) subgraph ProjectLocal ["★ PROJECT-LOCAL SCAN (priority 1)"] direction TB PROJ_DIR["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = PROJECT<br/>★ smoke-test.yaml (moved here)"] PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>★ smoke-test.yaml"] PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>★ smoke-test.md"] end subgraph Bundled ["BUNDLED SCAN (priority 2)"] direction TB BUILTIN_DIR["src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = BUILTIN<br/>implementation, remediation,<br/>merge-prs, impl-groups<br/>(smoke-test removed)"] end DEDUP["Dedup via seen set<br/>━━━━━━━━━━<br/>Project names shadow bundled"] subgraph AutoskillitRepo ["AUTOSKILLIT REPO CONTEXT"] direction TB CLI_LIST["● autoskillit recipes list<br/>━━━━━━━━━━<br/>Shows smoke-test (source: project)"] CLI_ORDER["autoskillit order<br/>━━━━━━━━━━<br/>Pipeline execution menu"] CLI_RENDER["autoskillit recipes render<br/>━━━━━━━━━━<br/>_recipes_dir_for(PROJECT)<br/>→ .autoskillit/recipes/diagrams/"] end subgraph ExternalProject ["EXTERNAL PROJECT CONTEXT"] direction TB EXT_LIST["autoskillit recipes list<br/>━━━━━━━━━━<br/>smoke-test NOT visible<br/>(no project-local copy)"] end 
START --> PROJ_DIR PROJ_DIR --> DEDUP DEDUP --> BUILTIN_DIR PROJ_DIR --> PROJ_CONTRACT PROJ_DIR --> PROJ_DIAGRAM DEDUP --> CLI_LIST DEDUP --> CLI_ORDER CLI_RENDER --> PROJ_DIAGRAM DEDUP --> EXT_LIST class START terminal; class PROJ_DIR,PROJ_CONTRACT,PROJ_DIAGRAM newComponent; class BUILTIN_DIR stateNode; class DEDUP handler; class CLI_LIST,CLI_ORDER,CLI_RENDER cli; class EXT_LIST detector; ``` ### Module Dependency Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%% graph TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; subgraph Tests ["TESTS (modified ●)"] direction TB T_SMOKE["● test_smoke_pipeline.py<br/>━━━━━━━━━━<br/>uses SMOKE_SCRIPT<br/>→ project-local path"] T_BUNDLED["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>smoke_yaml fixture<br/>→ project-local path"] T_POLICY["● test_bundled_recipe_hidden_policy.py<br/>━━━━━━━━━━<br/>BUNDLED_RECIPE_NAMES<br/>smoke-test removed"] T_TOOLS["● test_tools_recipe.py<br/>━━━━━━━━━━<br/>list_recipes assertion<br/>smoke-test NOT in bundled"] T_ENGINE["● test_engine.py<br/>━━━━━━━━━━<br/>contract adapter test<br/>→ project-local path"] end subgraph L3 ["L3 — SERVER"] direction TB TOOLS_RECIPE["server.tools_recipe<br/>━━━━━━━━━━<br/>list_recipes, load_recipe<br/>validate_recipe"] end subgraph L2R ["L2 — RECIPE"] direction TB 
RECIPE_IO["recipe.io<br/>━━━━━━━━━━<br/>builtin_recipes_dir()<br/>list_recipes()"] RECIPE_VALIDATOR["recipe.validator<br/>━━━━━━━━━━<br/>run_semantic_rules<br/>analyze_dataflow"] RECIPE_CONTRACTS["recipe.contracts<br/>━━━━━━━━━━<br/>load_bundled_manifest"] end subgraph L2M ["L2 — MIGRATION"] direction TB MIG_ENGINE["migration.engine<br/>━━━━━━━━━━<br/>default_migration_engine<br/>contract adapters"] end subgraph L0 ["L0 — CORE"] direction TB CORE_PATHS["core.paths<br/>━━━━━━━━━━<br/>pkg_root() → bundled dir<br/>fan-in: all layers"] end subgraph Artifacts ["★ PROJECT-LOCAL ARTIFACTS (new)"] direction TB PROJ_RECIPE["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>smoke-test.yaml"] PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>smoke-test.yaml"] PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>smoke-test.md"] end T_SMOKE -->|"imports"| TOOLS_RECIPE T_SMOKE -->|"imports"| RECIPE_IO T_BUNDLED -->|"imports"| RECIPE_IO T_BUNDLED -->|"imports"| RECIPE_CONTRACTS T_POLICY -->|"imports"| CORE_PATHS T_TOOLS -->|"imports"| TOOLS_RECIPE T_ENGINE -->|"imports"| CORE_PATHS T_ENGINE -->|"imports"| MIG_ENGINE TOOLS_RECIPE -->|"imports"| RECIPE_IO RECIPE_IO -->|"builtin_recipes_dir()"| CORE_PATHS RECIPE_VALIDATOR -->|"imports"| RECIPE_IO RECIPE_CONTRACTS -->|"imports"| RECIPE_IO MIG_ENGINE -->|"imports"| CORE_PATHS T_SMOKE -.->|"now reads"| PROJ_RECIPE T_BUNDLED -.->|"now reads"| PROJ_RECIPE T_ENGINE -.->|"now reads"| PROJ_CONTRACT class T_SMOKE,T_BUNDLED,T_POLICY,T_TOOLS,T_ENGINE phase; class TOOLS_RECIPE cli; class RECIPE_IO,RECIPE_VALIDATOR,RECIPE_CONTRACTS handler; class MIG_ENGINE handler; class CORE_PATHS stateNode; class PROJ_RECIPE,PROJ_CONTRACT,PROJ_DIAGRAM newComponent; ``` Closes #600 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260404-190817-394673/.autoskillit/temp/make-plan/move_smoke_test_recipe_plan_2026-04-04_190817.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit 
## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 74 | 37.1k | 3.0M | 2 | 12m 44s |
| **Total** | 10.1k | 403.4k | 43.6M | | 2h 58m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
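The discovery behavior this move relies on (project-local scan at priority 1, bundled scan at priority 2, deduplicated by name so project recipes shadow bundled ones) can be sketched as follows. The signature is simplified and hypothetical; the real `list_recipes()` scans directories rather than taking name lists:

```python
# Sketch of the seen-set shadowing described in the operational diagram.
# Names are illustrative; in the real code the inputs come from scanning
# .autoskillit/recipes/ and src/autoskillit/recipes/ respectively.

def list_recipes(
    project_names: list[str], bundled_names: list[str]
) -> list[tuple[str, str]]:
    seen: set[str] = set()
    out: list[tuple[str, str]] = []
    for name in project_names:      # priority 1: .autoskillit/recipes/
        if name not in seen:
            seen.add(name)
            out.append((name, "PROJECT"))
    for name in bundled_names:      # priority 2: src/autoskillit/recipes/
        if name not in seen:        # shadowed names are skipped here
            seen.add(name)
            out.append((name, "BUILTIN"))
    return out
```

Under this scheme, moving `smoke-test.yaml` into `.autoskillit/recipes/` is sufficient on its own: inside the AutoSkillit repo it surfaces with source `PROJECT`, and in an external project (where no project-local copy exists) it simply never enters either scan.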
…Type in review-design (#614) ## Summary The `review-design` skill has L1 severity calibration that correctly caps `estimand_clarity` and `hypothesis_falsifiability` by `experiment_type` — benchmarks can never produce L1 critical findings. But the red-team dimension has **no analogous calibration**, meaning any critical red-team finding triggers STOP regardless of experiment type. This creates an unresolvable loop for benchmarks: the red-team always finds new critical issues at progressively higher abstraction (the Hydra pattern), exhausting retries without ever producing GO. The fix adds a red-team severity calibration rubric to `review-design/SKILL.md` (mirroring the L1 rubric), updates the verdict logic to apply the cap before building `stop_triggers`, and adds diminishing-return awareness to `resolve-design-review/SKILL.md` so it can detect goalposts-moving across rounds. ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; START([Plan submitted]) GO([GO → execute]) REVISE_OUT([REVISE → revise_design]) REVISED_OUT([revised → revise_design]) FAILED_OUT([failed → design_rejected]) subgraph ReviewDesign ["● review-design/SKILL.md"] direction TB L1["L1 Analysis<br/>━━━━━━━━━━<br/>estimand_clarity +<br/>hypothesis_falsifiability"] L1GATE{"L1 Fail-Fast<br/>━━━━━━━━━━<br/>Any L1 critical?"} PARALLEL["L2 + L3 + L4 + RT<br/>━━━━━━━━━━<br/>Parallel analysis"] RTCAP["● RT Severity 
Cap<br/>━━━━━━━━━━<br/>RT_MAX_SEVERITY[experiment_type]<br/>Downgrade if above ceiling"] MERGE["Merge + Dedup<br/>━━━━━━━━━━<br/>All findings pooled"] VERDICT{"● Verdict Logic<br/>━━━━━━━━━━<br/>stop_triggers built<br/>AFTER rt_cap applied"} end subgraph ResolveDesign ["● resolve-design-review/SKILL.md"] direction TB PARSE["Step 1: Parse Dashboard<br/>━━━━━━━━━━<br/>Extract stop-trigger findings<br/>Classify ADDRESSABLE/STRUCTURAL/DISCUSS"] DIMCHECK{"prior_revision_guidance<br/>━━━━━━━━━━<br/>provided?"} DIMRET["● Step 1.5: Diminishing-Return<br/>━━━━━━━━━━<br/>Compare ADDRESSABLE themes<br/>vs prior guidance entries"] GOALPOST{"goalposts_moving<br/>━━━━━━━━━━<br/>true for any finding?"} RECLASSIFY["● Reclassify<br/>━━━━━━━━━━<br/>ADDRESSABLE → STRUCTURAL<br/>annotate prior_theme_match"] RESGATE{"Any ADDRESSABLE<br/>or DISCUSS?"} end subgraph RecipeRouting ["● research.yaml — resolve_design_review step"] direction LR RECIPE["skill_command passes<br/>━━━━━━━━━━<br/>$context.revision_guidance<br/>as optional 3rd arg"] end START --> L1 L1 --> L1GATE L1GATE -->|"yes (L1 critical)"| MERGE L1GATE -->|"no"| PARALLEL PARALLEL --> RTCAP RTCAP --> MERGE MERGE --> VERDICT VERDICT -->|"stop_triggers present"| RECIPE VERDICT -->|"critical or ≥3 warnings"| REVISE_OUT VERDICT -->|"otherwise"| GO RECIPE --> PARSE PARSE --> DIMCHECK DIMCHECK -->|"yes"| DIMRET DIMCHECK -->|"no (round 1)"| RESGATE DIMRET --> GOALPOST GOALPOST -->|"true"| RECLASSIFY GOALPOST -->|"false"| RESGATE RECLASSIFY --> RESGATE RESGATE -->|"yes"| REVISED_OUT RESGATE -->|"all STRUCTURAL"| FAILED_OUT class START,GO,REVISE_OUT,REVISED_OUT,FAILED_OUT terminal; class L1,PARALLEL handler; class L1GATE,VERDICT,DIMCHECK,GOALPOST,RESGATE stateNode; class MERGE,PARSE phase; class RTCAP,DIMRET,RECLASSIFY newComponent; class RECIPE detector; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Terminal | Start and outcome states | | Orange | Handler | Analysis agents 
(L1, parallel L2-L4+RT) |
| Teal | State | Decision points and verdict routing |
| Purple | Phase | Merge and parse aggregation steps |
| Green | Modified Component | ● Nodes changed by this PR (RT cap, diminishing-return detection, reclassify, recipe routing) |
| Red | Detector | Recipe routing gate (passes revision_guidance) |

Closes #609

## Implementation Plan

Plan file: `/home/talon/projects/autoskillit-runs/impl-20260404-185816-184240/.autoskillit/temp/make-plan/add-red-team-severity-calibration-by-experiment-type_plan_2026-04-04_185816.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 135 | 68.4k | 5.4M | 4 | 23m 1s |
| review_pr | 31 | 22.8k | 1.2M | 1 | 5m 50s |
| **Total** | 10.2k | 457.5k | 47.2M | | 3h 14m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
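The cap-before-stop_triggers ordering described above can be sketched as follows. The severity scale and the `RT_MAX_SEVERITY` table are illustrative assumptions (the actual rubric lives in `review-design/SKILL.md`); the key property is that the cap is applied before `stop_triggers` is built, so a capped benchmark finding can no longer force STOP:

```python
# Hypothetical red-team severity ceiling keyed by experiment_type,
# mirroring the L1 calibration the PR describes.

SEVERITY_ORDER = ["info", "warning", "critical"]
RT_MAX_SEVERITY = {"benchmark": "warning", "causal": "critical"}  # assumed table

def apply_rt_cap(findings: list[dict], experiment_type: str) -> list[dict]:
    """Downgrade any red-team finding above the per-type ceiling."""
    ceiling = RT_MAX_SEVERITY.get(experiment_type, "critical")
    cap = SEVERITY_ORDER.index(ceiling)
    return [
        {**f, "severity": SEVERITY_ORDER[min(SEVERITY_ORDER.index(f["severity"]), cap)]}
        for f in findings
    ]

def stop_triggers(findings: list[dict]) -> list[dict]:
    # Built AFTER apply_rt_cap, per the verdict-logic change in this PR.
    return [f for f in findings if f["severity"] == "critical"]
```

For a benchmark, a "critical" red-team finding is downgraded to "warning" and contributes to the REVISE tally instead of STOP, which is what breaks the Hydra loop.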
#615) ## Summary The token summary table (displayed in PRs, terminal, and compact KV output) collapses 4 distinct Claude API token fields into 3 misleading columns. The column labeled "input" actually shows only the tiny uncached delta (`input_tokens`), and "cached" silently sums two cost-distinct categories (`cache_read_input_tokens` at 0.1x billing + `cache_creation_input_tokens` at 1.25x billing). This change splits the display into 4 token columns — `uncached`, `output`, `cache_read`, `cache_write` — across all 3 independent formatter implementations and their tests. No data model, extraction, or storage changes are needed — `TokenEntry` already preserves all 4 fields. This is purely a formatting-layer fix. ## Architecture Impact ### Data Lineage Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart LR classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff; subgraph API ["Claude API Response"] direction TB F1["input_tokens<br/>━━━━━━━━━━<br/>Uncached delta"] F2["output_tokens<br/>━━━━━━━━━━<br/>Generated tokens"] F3["cache_read_input_tokens<br/>━━━━━━━━━━<br/>0.1x billing"] F4["cache_creation_input_tokens<br/>━━━━━━━━━━<br/>1.25x billing"] end subgraph Storage ["TokenEntry Storage"] TE[("TokenEntry<br/>━━━━━━━━━━<br/>4 fields intact<br/>Accumulated per step")] TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Persisted session data<br/>All 4 fields")] end subgraph Canonical ["● telemetry_fmt.py (Canonical Formatter)"] direction TB FMD["● format_token_table()<br/>━━━━━━━━━━<br/>Markdown table<br/>Step|uncached|output|cache_read|cache_write|count|time"] FTM["● 
format_token_table_terminal()<br/>━━━━━━━━━━<br/>Terminal table<br/>UNCACHED|OUTPUT|CACHE_RD|CACHE_WR"] FKV["● format_compact_kv()<br/>━━━━━━━━━━<br/>Compact KV<br/>uc:|out:|cr:|cw:"] end subgraph Hooks ["Stdlib Hooks (no autoskillit imports)"] direction TB TSA["● token_summary_appender._format_table()<br/>━━━━━━━━━━<br/>Reads token_usage.json<br/>Markdown table → GitHub PR body"] POS["● pretty_output._fmt_get_token_summary()<br/>━━━━━━━━━━<br/>Reads get_token_summary JSON<br/>Compact KV → PostToolUse"] POR["● pretty_output._fmt_run_skill()<br/>━━━━━━━━━━<br/>Reads run_skill result dict<br/>Inline KV → PostToolUse"] end subgraph Outputs ["Display Targets"] direction TB MD["PR Body<br/>━━━━━━━━━━<br/>GitHub markdown table"] TERM["Terminal<br/>━━━━━━━━━━<br/>Padded column output"] KV["Compact KV<br/>━━━━━━━━━━<br/>One-liner summaries"] HOOK["PostToolUse Output<br/>━━━━━━━━━━<br/>Hook-formatted display"] end F1 --> TE F2 --> TE F3 --> TE F4 --> TE TE --> TJ TE --> FMD TE --> FTM TE --> FKV TJ --> TSA TJ -.-> POS FMD -->|"markdown rows"| MD FTM -->|"padded columns"| TERM FKV -->|"kv lines"| KV TSA -->|"gh api PATCH"| MD POS -->|"formatted text"| HOOK POR -->|"formatted text"| HOOK class F1,F2,F3,F4 cli; class TE,TJ stateNode; class FMD,FTM,FKV handler; class TSA,POS,POR integration; class MD,TERM,KV,HOOK output; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | API Fields | 4 Claude API token categories from usage response | | Teal | Storage | TokenEntry dataclass + persisted JSON session files | | Orange | Canonical Formatter | 3 functions in telemetry_fmt.py (all ● modified) | | Red | Stdlib Hooks | Independent hook implementations (all ● modified) | | Dark Teal | Outputs | Display targets: PR body, terminal, compact KV, PostToolUse | ### Operational Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef cli 
fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff; subgraph Triggers ["OPERATOR TRIGGERS"] direction TB GTS["get_token_summary<br/>━━━━━━━━━━<br/>MCP tool call<br/>format=json|markdown"] RS["run_skill<br/>━━━━━━━━━━<br/>MCP tool call<br/>Headless session"] PRPATCH["PR body update<br/>━━━━━━━━━━<br/>After open-pr skill<br/>PostToolUse event"] end subgraph State ["TOKEN STATE (read/write)"] direction TB TL[("DefaultTokenLog<br/>━━━━━━━━━━<br/>In-memory accumulator<br/>4 fields per step")] TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Per-session disk files<br/>Read by stdlib hooks")] end subgraph Formatters ["● FORMATTERS (modified)"] direction TB TF["● telemetry_fmt.py<br/>━━━━━━━━━━<br/>format_token_table()<br/>format_token_table_terminal()<br/>format_compact_kv()"] TSA["● token_summary_appender.py<br/>━━━━━━━━━━<br/>_format_table()<br/>Stdlib-only hook"] PO["● pretty_output.py<br/>━━━━━━━━━━<br/>_fmt_get_token_summary()<br/>_fmt_run_skill()"] end subgraph Outputs ["OBSERVABILITY OUTPUTS (write-only)"] direction TB MDTBL["PR Body Table<br/>━━━━━━━━━━<br/>## Token Usage Summary<br/>Step|uncached|output|cache_read|cache_write|count|time"] TERM["Terminal Table<br/>━━━━━━━━━━<br/>STEP UNCACHED OUTPUT CACHE_RD CACHE_WR COUNT TIME<br/>Padded for readability"] KV["Compact KV<br/>━━━━━━━━━━<br/>name xN [uc:X out:X cr:X cw:X t:Xs]<br/>total_uncached / total_cache_read / total_cache_write"] HOOK["PostToolUse Display<br/>━━━━━━━━━━<br/>tokens_uncached:<br/>tokens_cache_read:<br/>tokens_cache_write:"] end GTS -->|"reads"| TL TL -.->|"flush"| TJ TJ -->|"load_sessions"| TSA TJ -.->|"via MCP JSON payload"| PO GTS 
--> TF TF -->|"markdown"| MDTBL TF -->|"terminal"| TERM TF -->|"compact"| KV RS -->|"PostToolUse event"| PO PO -->|"_fmt_run_skill"| HOOK PO -->|"_fmt_get_token_summary"| KV PRPATCH -->|"PostToolUse event"| TSA TSA -->|"gh api PATCH"| MDTBL class GTS,RS,PRPATCH cli; class TL,TJ stateNode; class TF,TSA,PO handler; class MDTBL,TERM,KV,HOOK output; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Triggers | Operator-initiated MCP tool calls and PostToolUse events | | Teal | State | Token accumulator (read/write) and persisted JSON files | | Orange | Formatters | 3 modified formatter implementations (all ● changed) | | Dark Teal | Outputs | Write-only observability artifacts: PR table, terminal, compact KV | Closes #604 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260404-190817-266225/.autoskillit/temp/make-plan/token_summary_4_columns_plan_2026-04-04_191000.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary Add `api-simulator` as a dev dependency and use its `mock_http_server` pytest fixture to test the quota guard's real HTTP path end-to-end. Currently all quota tests monkeypatch `_fetch_quota` at the function level — the actual httpx client construction, header injection (`Authorization: Bearer`, `anthropic-beta`), response parsing, and error handling are never exercised. This plan introduces a `base_url` parameter to `_fetch_quota` and `check_and_sleep_if_needed`, then writes 7 tests that point the real httpx client at `mock_http_server` to exercise the full HTTP path. **Files changed:** 3 (`pyproject.toml`, `src/autoskillit/execution/quota.py`, new `tests/execution/test_quota_http.py`) **Existing tests:** Unchanged — all monkeypatch-based tests in `test_quota.py` remain as-is. ## Requirements ### DEP — Dependency Integration - **REQ-DEP-001:** The system must include `api-simulator` as a dev-only dependency with a pinned git tag source. - **REQ-DEP-002:** The api-simulator dependency must not appear in production runtime dependencies. ### CFG — URL Configurability - **REQ-CFG-001:** `_fetch_quota` must accept a `base_url` parameter defaulting to `https://api.anthropic.com`. - **REQ-CFG-002:** `check_and_sleep_if_needed` must thread the `base_url` parameter through to `_fetch_quota` at both call sites. - **REQ-CFG-003:** The production behavior must be unchanged when `base_url` is not explicitly provided. ### HTTP — HTTP Path Verification - **REQ-HTTP-001:** Tests must exercise the real httpx client construction path, not monkeypatch `_fetch_quota`. - **REQ-HTTP-002:** Tests must verify that the `Authorization: Bearer` header is sent on the request. - **REQ-HTTP-003:** Tests must verify that the `anthropic-beta: oauth-2025-04-20` header is sent on the request. - **REQ-HTTP-004:** Tests must verify correct JSON response parsing for the `five_hour` utilization shape. 
### ERR — Error Handling Verification - **REQ-ERR-001:** Tests must verify fail-open behavior on HTTP 4xx/5xx responses. - **REQ-ERR-002:** Tests must verify fail-open behavior on network timeout. - **REQ-ERR-003:** Tests must verify that the above-threshold path triggers a double-fetch (two HTTP requests). ### COMPAT — Backward Compatibility - **REQ-COMPAT-001:** Existing `test_quota.py` tests must continue to pass unchanged. ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; START([START: check_and_sleep_if_needed]) subgraph GatePhase ["Gate Phase"] direction TB ENABLED{"config.enabled?"} DISABLED(["RETURN<br/>should_sleep: false"]) end subgraph CachePhase ["Cache Phase"] direction TB CACHE["_read_cache<br/>━━━━━━━━━━<br/>Read local JSON cache"] CACHE_HIT{"Cache fresh?<br/>━━━━━━━━━━<br/>age ≤ max_age?"} end subgraph FetchPhase ["HTTP Fetch Phase"] direction TB FETCH["● _fetch_quota<br/>━━━━━━━━━━<br/>★ base_url parameter<br/>httpx.AsyncClient GET"] BASEURL["★ base_url<br/>━━━━━━━━━━<br/>default: api.anthropic.com<br/>test: mock_http_server.url"] PARSE["Parse Response<br/>━━━━━━━━━━<br/>five_hour.utilization<br/>Z→+00:00 normalization"] end subgraph DecisionPhase ["Threshold Decision"] direction TB THRESHOLD{"utilization<br/>≥ threshold?"} RESETS_AT1{"resets_at<br/>is None?<br/>(Gate 1)"} REFETCH["● _fetch_quota re-fetch<br/>━━━━━━━━━━<br/>★ base_url threaded<br/>Double-fetch for accuracy"] 
RESETS_AT2{"resets_at<br/>still None?<br/>(Gate 2)"} end subgraph Results ["Results"] BELOW(["RETURN<br/>should_sleep: false"]) FALLBACK1(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"]) FALLBACK2(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"]) SLEEP(["RETURN<br/>should_sleep: true<br/>sleep_seconds computed"]) FAILOPEN(["RETURN<br/>should_sleep: false<br/>error key present"]) end subgraph TestInfra ["★ Test Infrastructure (test_quota_http.py)"] direction TB MOCK["★ mock_http_server<br/>━━━━━━━━━━<br/>api-simulator fixture<br/>HTTP server"] REGISTER["★ register / register_sequence<br/>━━━━━━━━━━<br/>Custom endpoint responses<br/>Status codes, delays"] INSPECT["★ get_requests / request_count<br/>━━━━━━━━━━<br/>Header verification<br/>Double-fetch assertion"] end START --> ENABLED ENABLED -->|"false"| DISABLED ENABLED -->|"true"| CACHE CACHE --> CACHE_HIT CACHE_HIT -->|"fresh + below threshold"| BELOW CACHE_HIT -->|"miss or expired"| FETCH FETCH --> BASEURL BASEURL --> PARSE PARSE --> THRESHOLD THRESHOLD -->|"below"| BELOW THRESHOLD -->|"above"| RESETS_AT1 RESETS_AT1 -->|"None"| FALLBACK1 RESETS_AT1 -->|"present"| REFETCH REFETCH --> RESETS_AT2 RESETS_AT2 -->|"None"| FALLBACK2 RESETS_AT2 -->|"present"| SLEEP FETCH -.->|"HTTP error / timeout"| FAILOPEN MOCK -.->|"serves responses to"| BASEURL REGISTER -.->|"configures"| MOCK INSPECT -.->|"verifies headers / count"| FETCH class START terminal; class DISABLED,BELOW,FALLBACK1,FALLBACK2,SLEEP,FAILOPEN phase; class ENABLED,CACHE_HIT,THRESHOLD,RESETS_AT1,RESETS_AT2 stateNode; class CACHE,PARSE handler; class FETCH,REFETCH handler; class BASEURL,MOCK,REGISTER,INSPECT newComponent; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Terminal | Entry point | | Teal | State | Decision points and routing | | Orange | Handler | Processing nodes (cache read, HTTP fetch, parse) | | Green | New Component | ★ New 
`base_url` parameter and test infrastructure | | Purple | Phase | Result return paths | ### Development Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; subgraph Deps ["● DEPENDENCY MANIFEST (pyproject.toml)"] direction TB PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>requires-python ≥ 3.11"] DEVDEPS["● dev optional-dependencies<br/>━━━━━━━━━━<br/>pytest, pytest-asyncio,<br/>pytest-httpx, pytest-xdist,<br/>pytest-timeout, ruff,<br/>import-linter, packaging"] APISIM["★ api-simulator<br/>━━━━━━━━━━<br/>New dev dependency<br/>HTTP mock fixture provider"] UVSRC["★ [tool.uv.sources]<br/>━━━━━━━━━━<br/>api-simulator pinned<br/>git: TalonT-Org/api-simulator<br/>branch: main"] UVLOCK["● uv.lock<br/>━━━━━━━━━━<br/>Regenerated with<br/>api-simulator entry"] end subgraph Quality ["CODE QUALITY GATES (pre-commit)"] direction TB FORMAT["ruff format<br/>━━━━━━━━━━<br/>Auto-fix code style<br/>reads + modifies src"] LINT["ruff check<br/>━━━━━━━━━━<br/>Auto-fix lint violations<br/>reads + modifies src"] TYPES["mypy<br/>━━━━━━━━━━<br/>Type checking<br/>reads src, reports only"] UVCHECK["uv lock check<br/>━━━━━━━━━━<br/>Verifies lockfile sync<br/>reads uv.lock"] SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning<br/>reads staged files"] IMPORTLINT["import-linter<br/>━━━━━━━━━━<br/>Layer contract enforcement<br/>IL-001 through IL-007"] end subgraph Testing ["TEST FRAMEWORK"] 
direction TB PYTEST["pytest + pytest-asyncio<br/>━━━━━━━━━━<br/>asyncio_mode=auto<br/>timeout=60s signal"] XDIST["pytest-xdist -n 4<br/>━━━━━━━━━━<br/>Parallel test workers<br/>worksteal distribution"] UNITQUOTA["● test_quota.py<br/>━━━━━━━━━━<br/>23 unit tests<br/>monkeypatch _fetch_quota<br/>mock signature updated"] HTTPQUOTA["★ test_quota_http.py<br/>━━━━━━━━━━<br/>7 end-to-end HTTP tests<br/>real httpx client path<br/>no monkeypatching"] MOCKSERVER["★ mock_http_server fixture<br/>━━━━━━━━━━<br/>api-simulator provides<br/>register / register_sequence<br/>get_requests / request_count"] end subgraph EntryPoints ["ENTRY POINTS"] CLI["autoskillit CLI<br/>━━━━━━━━━━<br/>autoskillit.cli:main"] end PYPROJECT --> DEVDEPS DEVDEPS --> APISIM APISIM --> UVSRC UVSRC --> UVLOCK PYPROJECT --> FORMAT FORMAT --> LINT LINT --> TYPES TYPES --> UVCHECK UVCHECK --> SECRETS SECRETS --> IMPORTLINT IMPORTLINT --> PYTEST PYTEST --> XDIST XDIST --> UNITQUOTA XDIST --> HTTPQUOTA APISIM -.->|"provides fixture"| MOCKSERVER MOCKSERVER -.->|"injected into"| HTTPQUOTA PYPROJECT --> CLI class PYPROJECT,DEVDEPS,UVLOCK phase; class APISIM,UVSRC,HTTPQUOTA,MOCKSERVER newComponent; class UNITQUOTA handler; class FORMAT,LINT,TYPES,UVCHECK,SECRETS,IMPORTLINT detector; class PYTEST,XDIST handler; class CLI output; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Purple | Build Config | pyproject.toml, dev deps, lockfile | | Green | New Component | ★ api-simulator dep, uv.sources, HTTP test file, mock fixture | | Orange | Test Framework | pytest, xdist, existing test_quota.py | | Red | Quality Gates | ruff, mypy, uv lock check, gitleaks, import-linter | | Dark Teal | Entry Points | CLI entry point | Closes #607 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260404-190816-816130/.autoskillit/temp/make-plan/integrate_api_simulator_quota_guard_plan_2026-04-04_191500.md` 🤖 Generated with [Claude 
Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | input | output | cached | count | time | |------|-------|--------|--------|-------|------| | plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s | | verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s | | implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s | | fix | 214 | 28.4k | 3.5M | 5 | 30m 58s | | audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s | | open_pr | 100 | 51.3k | 3.9M | 3 | 16m 38s | | **Total** | 10.2k | 417.5k | 44.5M | | 3h 2m | --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
The `zero_writes` gate in `execution/headless.py` fires whenever
`write_behavior.mode == "always"` and `write_call_count == 0`. The
`resolve-failures` contract declares `write_behavior: always`, but the
skill legitimately exits with zero `Edit`/`Write` calls when the
worktree is already green (zero fix iterations). The gate has no escape
path for this case: `success=True` is demoted to `zero_writes`, killing
an otherwise correct pipeline run.
This PR changes the contract to `conditional` mode with a pattern gated
on the `fixes_applied` structured token, extends the same fix to
`retry-worktree` and `resolve-review`, and adds a semantic rule to
prevent regression.
## Architecture Impact
### Process Flow Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
%% CLASS DEFINITIONS %%
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
%% TERMINALS %%
START([run_skill called])
SUCCESS(["✓ success=True<br/>subtype=success"])
DEMOTED(["✗ success=False<br/>subtype=zero_writes"])
subgraph Contract ["● Contract Resolution"]
direction TB
YAML["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>resolve-failures:<br/> write_behavior: conditional<br/> write_expected_when:<br/> - fixes_applied ≥ 1 regex"]
FACTORY["● _factory.py<br/>━━━━━━━━━━<br/>_resolve_write_behavior()<br/>reads contract via lru_cache"]
SPEC["WriteBehaviorSpec<br/>━━━━━━━━━━<br/>mode=conditional<br/>expected_when=(pattern,)"]
end
subgraph Execution ["● Skill Execution"]
direction TB
SESSION["headless subprocess<br/>━━━━━━━━━━<br/>run tests, apply fixes<br/>via Bash / Edit / Write"]
TOKEN["● Structured Token<br/>━━━━━━━━━━<br/>fixes_applied = N<br/>emitted at Step 4"]
COUNT["write_call_count<br/>━━━━━━━━━━<br/>count Edit + Write<br/>in tool_uses"]
end
subgraph Gate ["● Zero-Write Gate"]
direction TB
GUARD{"success=True AND<br/>write_count=0 AND<br/>write_behavior≠None?"}
MODE{"● mode?<br/>━━━━━━━━━━<br/>always vs conditional"}
PATTERN{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND-match all patterns<br/>against session output"}
EXPECT{"write_expected<br/>AND write_count=0?"}
end
%% FLOW %%
START --> YAML
YAML -->|"reads"| FACTORY
FACTORY -->|"builds"| SPEC
SPEC -->|"passed to executor"| SESSION
SESSION --> TOKEN
SESSION --> COUNT
TOKEN --> GUARD
COUNT --> GUARD
GUARD -->|"No — gate inactive"| SUCCESS
GUARD -->|"Yes"| MODE
MODE -->|"always"| EXPECT
MODE -->|"conditional"| PATTERN
PATTERN -->|"fixes_applied=0<br/>no match → False"| SUCCESS
PATTERN -->|"fixes_applied≥1<br/>match → True"| EXPECT
EXPECT -->|"write_count > 0<br/>artifact written"| SUCCESS
EXPECT -->|"write_count = 0<br/>no artifact"| DEMOTED
%% CLASS ASSIGNMENTS %%
class START,SUCCESS,DEMOTED terminal;
class YAML,SPEC stateNode;
class FACTORY,SESSION,COUNT handler;
class TOKEN output;
class GUARD,MODE,PATTERN,EXPECT detector;
```
### State Lifecycle Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
%% CLASS DEFINITIONS %%
classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
subgraph ContractFields ["● INIT_ONLY: Contract Fields (YAML → frozen)"]
direction TB
WB["● write_behavior<br/>━━━━━━━━━━<br/>always ∣ conditional ∣ null<br/>Set in skill_contracts.yaml<br/>Cached via @lru_cache"]
WEW["● write_expected_when<br/>━━━━━━━━━━<br/>list of regex patterns<br/>AND-semantics at gate<br/>Empty = no pattern gate"]
end
subgraph SpecFields ["INIT_ONLY: WriteBehaviorSpec (frozen dataclass)"]
direction TB
MODE["● mode: str ∣ None<br/>━━━━━━━━━━<br/>Mirrors write_behavior<br/>Frozen after construction"]
EXPECTED["● expected_when: tuple<br/>━━━━━━━━━━<br/>Immutable tuple of patterns<br/>Frozen after construction"]
end
subgraph SessionState ["MUTABLE + APPEND: Session State"]
direction TB
TOOLS["tool_uses: list<br/>━━━━━━━━━━<br/>APPEND_ONLY during session<br/>Each Edit/Write appended"]
RESULT["● session output: str<br/>━━━━━━━━━━<br/>Contains structured tokens<br/>fixes_applied = N"]
WCC["write_call_count: int<br/>━━━━━━━━━━<br/>DERIVED from tool_uses<br/>count(Edit + Write)"]
end
subgraph GateState ["● MUTABLE: SkillResult Fields (gate mutations)"]
direction TB
SUCCESS["● success: bool<br/>━━━━━━━━━━<br/>Init: True (if session ok)<br/>Gate may demote → False"]
SUBTYPE["● subtype: str<br/>━━━━━━━━━━<br/>Init: success<br/>Gate may set → zero_writes"]
RETRY["● needs_retry: bool<br/>━━━━━━━━━━<br/>Init: False<br/>Gate may set → True"]
end
subgraph Validation ["● VALIDATION GATES"]
direction TB
G1{"● mode check<br/>━━━━━━━━━━<br/>always → write_expected=True<br/>conditional → check patterns"}
G2{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND over all patterns<br/>re.search each on output"}
G3{"write_expected AND<br/>write_count == 0?<br/>━━━━━━━━━━<br/>Demote if both True"}
end
%% FLOW: Contract → Spec %%
WB -->|"reads"| MODE
WEW -->|"reads"| EXPECTED
%% FLOW: Spec → Gate %%
MODE -->|"determines gate path"| G1
EXPECTED -->|"provides patterns"| G2
%% FLOW: Session → Gate %%
TOOLS -->|"derives"| WCC
RESULT -->|"scanned by"| G2
WCC -->|"checked by"| G3
%% FLOW: Gate decisions %%
G1 -->|"conditional"| G2
G1 -->|"always"| G3
G2 -->|"match → True"| G3
G2 -->|"no match → False"| SUCCESS
%% FLOW: Gate → Mutation %%
G3 -->|"demote"| SUBTYPE
G3 -->|"demote"| RETRY
G3 -->|"preserve"| SUCCESS
%% CLASS ASSIGNMENTS %%
class WB,WEW detector;
class MODE,EXPECTED detector;
class TOOLS handler;
class RESULT output;
class WCC phase;
class SUCCESS,SUBTYPE,RETRY gap;
class G1,G2,G3 stateNode;
```
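The G2 gate's AND-semantics can be sketched like this (a standalone illustration; the real `_check_expected_patterns` signature may differ):

```python
import re

def check_expected_patterns(output: str, patterns: tuple[str, ...]) -> bool:
    """AND over all patterns: every regex must match the session output.

    Note: all() over an empty tuple is vacuously True, so callers should
    treat an empty patterns tuple as "no pattern gate" before calling.
    """
    return all(re.search(p, output) for p in patterns)

out = "step 4 done\nfixes_applied = 3\ntests: green"
assert check_expected_patterns(out, (r"fixes_applied\s*=\s*[1-9]",))
assert not check_expected_patterns(
    "fixes_applied = 0", (r"fixes_applied\s*=\s*[1-9]",)
)
```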
Closes #603
## Implementation Plan
Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260404-212507-745574/.autoskillit/temp/rectify/rectify_zero-writes-false-positive_2026-04-04_215019_part_a.md`
🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit
## Token Usage Summary
| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| investigate | 31 | 12.6k | 747.1k | 1 | 6m 34s |
| rectify | 11.4k | 57.9k | 2.0M | 1 | 27m 28s |
| review | 3.6k | 7.2k | 216.3k | 1 | 8m 0s |
| dry_walkthrough | 51 | 30.8k | 2.3M | 2 | 11m 22s |
| implement | 2.2k | 28.2k | 3.0M | 2 | 10m 56s |
| assess | 44 | 7.8k | 1.1M | 2 | 8m 43s |
| audit_impl | 30 | 18.6k | 654.7k | 2 | 9m 10s |
| open_pr | 28 | 15.8k | 1.0M | 1 | 7m 3s |
| **Total** | 17.3k | 178.9k | 11.1M | | 1h 29m |
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alation Routing, and Pack Fix (#620) ## Summary This part adds the post-review re-validation loop and escalation consumption infrastructure to `research.yaml`, adds the `needs_rerun` structured output token to `resolve-research-review/SKILL.md`, and fixes the missing `exp-lens` pack registration. Additionally adds the data provenance lifecycle across 5 research pipeline skills (plan-experiment, run-experiment, write-report, review-design, review-research-pr) with contract and guard tests. ## Requirements ### DATA — Data Provenance Lifecycle - **REQ-DATA-001:** The `plan-experiment` skill must generate a Data Manifest section in every experiment plan that maps each hypothesis to its required data source(s), specifying source type (synthetic, fixture, external, gitignored), acquisition method (generate, download, copy), and verification criteria. - **REQ-DATA-002:** When the research task directive or issue specifies using particular data, the `plan-experiment` skill must include explicit acquisition steps for that data in the plan — the plan must not assume data will already be present. - **REQ-DATA-003:** The `run-experiment` skill pre-flight must perform a hypothesis-to-data mapping check against the Data Manifest: for each hypothesis, verify its required data source is present and non-empty before execution begins. - **REQ-DATA-004:** When `run-experiment` pre-flight finds that data the plan said would be acquired is missing, it must emit a structured `blocked_hypotheses` list and treat this as a FAIL — not silently degrade to N/A. - **REQ-DATA-005:** The `review-design` skill must include data acquisition completeness as a reviewable dimension at sufficient weight to influence the verdict (not L-weight), checking that every hypothesis has a data source, every external source has an acquisition step, and every gitignored path has a generation/download step. 
- **REQ-DATA-006:** The `review-research-pr` skill must include a `data-scope` review dimension that checks whether the experiment's data coverage matches the research task directive and flags when all benchmarks used only synthetic data for a domain-specific project. ### REPORT — Write-Report Data Scope Guardrails - **REQ-REPORT-001:** The `write-report` skill must include a mandatory Data Scope Statement in the Executive Summary that explicitly states what data types were used for all benchmarks and whether domain target data was present, absent, or partial. - **REQ-REPORT-002:** The `write-report` skill must perform a Metrics Provenance Check before including any `*_metrics.json` files: verify they were generated during the current experiment. If stale or unrelated, disclose and omit with explanation rather than silently dropping. - **REQ-REPORT-003:** The `write-report` skill must enforce pre-specified hypothesis gate thresholds: when a gate is not met, the report must state this as a failure, and GO recommendations must reference the specific gate that was met rather than silently substituting a different threshold. ### REVAL — Post-Review Re-Validation Loop - **REQ-REVAL-001:** The `resolve-research-review` skill must emit a structured output token (`needs_rerun = true/false`) indicating whether any `rerun_required` escalations exist, so the recipe can capture and route on it. - **REQ-REVAL-002:** The `research.yaml` recipe must include a routing step after `resolve_research_review` that checks for `rerun_required` escalations and routes to a `re_run_experiment` step when present. - **REQ-REVAL-003:** The `re_run_experiment` step must perform a targeted re-run of affected benchmarks/analyses (not a full experiment replay) using the same data and scripts, then flow to `re_write_report` → `re_push_research`. 
- **REQ-REVAL-004:** When only `design_flaw` escalations exist (no `rerun_required`), the recipe must annotate the PR body with the escalation details and continue to push. ### ESC — Escalation Consumption - **REQ-ESC-001:** The `research.yaml` recipe must include a `check_escalations` step between `resolve_research_review` and `re_push_research` that reads `escalation_records_{pr}.json` and routes based on escalation strategy types. - **REQ-ESC-002:** The `check_escalations` step must distinguish between `rerun_required` escalations (route to re-validation) and `design_flaw`-only escalations (annotate and continue). ### PACK — Exp-Lens Pack Registration - **REQ-PACK-001:** The `research.yaml` recipe must declare `requires_packs: [research, exp-lens]` so that all 18 exp-lens skills are available in headless sessions during the research recipe pipeline. ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; PUSH_BR([push_branch<br/>━━━━━━━━━━<br/>git push worktree]) subgraph PRReview ["PR Review Phase"] direction TB OPEN["open_research_pr<br/>━━━━━━━━━━<br/>run_skill: open-pr"] GUARD{"guard_pr_url<br/>━━━━━━━━━━<br/>context.pr_url?"} REVIEW["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-research-pr<br/>captures: verdict"] end subgraph Resolution ["Review Resolution"] direction TB RESOLVE["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-research-review<br/>captures: needs_rerun<br/>retries: 2"] end 
subgraph EscalationRouting ["★ Escalation Routing (New)"] direction TB CHECK{"★ check_escalations<br/>━━━━━━━━━━<br/>action: route<br/>context.needs_rerun?"} end subgraph RevalidationLoop ["★ Re-Validation Loop (New)"] direction TB RERUN["★ re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust<br/>targeted benchmark re-run"] REWRITE["★ re_write_report<br/>━━━━━━━━━━<br/>write-report<br/>updated results"] RETEST["★ re_test<br/>━━━━━━━━━━<br/>test_check<br/>post-revalidation gate"] end REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>run_cmd: git push"] COMPLETE([research_complete<br/>━━━━━━━━━━<br/>action: stop]) PUSH_BR --> OPEN OPEN --> GUARD GUARD -->|"pr_url truthy"| REVIEW GUARD -->|"no pr_url"| COMPLETE REVIEW -->|"changes_requested"| RESOLVE REVIEW -->|"approved / needs_human"| COMPLETE RESOLVE -->|"on_success"| CHECK RESOLVE -->|"on_failure / exhausted"| COMPLETE CHECK -->|"needs_rerun == true"| RERUN CHECK -->|"default (false/absent)"| REPUSH RERUN -->|"on_success"| REWRITE RERUN -->|"on_failure / context_limit"| REPUSH REWRITE -->|"on_success"| RETEST REWRITE -->|"on_failure / context_limit"| REPUSH RETEST -->|"pass or fail"| REPUSH REPUSH --> COMPLETE class PUSH_BR,COMPLETE terminal; class GUARD,CHECK stateNode; class OPEN,REVIEW,RESOLVE handler; class RERUN,REWRITE,RETEST newComponent; class REPUSH phase; ``` ### State Lifecycle Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output 
fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; subgraph Manifest ["★ Data Manifest Contract (INIT_ONLY)"] direction TB DM["★ data_manifest<br/>━━━━━━━━━━<br/>hypothesis[], source_type,<br/>acquisition, location,<br/>verification, depends_on"] V9{"★ V9 Gate<br/>━━━━━━━━━━<br/>Every hypothesis has source?<br/>External has acquisition?<br/>Gitignored has generation?"} end subgraph DesignGate ["★ Design Review Gate"] direction TB DAQ{"★ data_acquisition L4<br/>━━━━━━━━━━<br/>Hypothesis coverage?<br/>External readiness?<br/>Directive compliance?"} end subgraph PreFlight ["★ Run-Experiment Pre-Flight"] direction TB PF{"★ Data Manifest<br/>Verification<br/>━━━━━━━━━━<br/>location exists?<br/>acquisition succeeds?"} BH["★ blocked_hypotheses<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>H5: missing at path"] end subgraph ReportGates ["★ Write-Report Validation Gates"] direction TB DSS["★ Data Scope Statement<br/>━━━━━━━━━━<br/>Mandatory in Executive Summary<br/>data types + domain coverage"] MPC["★ Metrics Provenance<br/>━━━━━━━━━━<br/>timestamp + relevance check<br/>disclose, never silently drop"] GE["★ Gate Enforcement<br/>━━━━━━━━━━<br/>pre-specified thresholds only<br/>no silent substitution"] end subgraph ReviewGate ["★ PR Review Gate"] direction TB DSCOPE["★ data-scope dimension<br/>━━━━━━━━━━<br/>Scope coverage?<br/>Claims qualified?<br/>Statement present?"] end subgraph EscalationState ["● Resolve Output Contract"] direction TB ESC["escalation_records<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>strategy: rerun_required<br/>strategy: design_flaw"] NR["● needs_rerun<br/>━━━━━━━━━━<br/>DERIVED from escalations<br/>any rerun_required → true<br/>else → false"] end DM -->|"writes"| V9 V9 -->|"PASS: plan saved"| DAQ V9 -->|"FAIL: plan rejected"| FAIL_PLAN([Plan Rejected]) DAQ -->|"GO: proceed"| PF DAQ -->|"STOP: hypothesis has no source"| REVISE([Revise Plan]) DAQ -->|"REVISE: missing verification"| REVISE PF 
-->|"ALL READY"| DSS PF -->|"BLOCKED: data missing"| BH BH --> FAIL_RUN([Status: FAILED]) DM -.->|"reads manifest"| PF DM -.->|"reads manifest"| DSS DM -.->|"reads manifest"| DSCOPE DSS --> MPC MPC --> GE GE -->|"report committed"| DSCOPE DSCOPE -->|"findings"| ESC ESC -->|"derive"| NR NR -->|"true → re-validate"| RERUN([Re-Validation Loop]) NR -->|"false → push"| PUSH([Direct Push]) class DM detector; class V9,DAQ,PF stateNode; class BH,ESC handler; class DSS,MPC,GE newComponent; class DSCOPE newComponent; class NR phase; class FAIL_PLAN,FAIL_RUN,REVISE gap; class RERUN,PUSH cli; ``` Closes #618 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260405-074034-301298/.autoskillit/temp/make-plan/research_recipe_data_provenance_plan_2026-04-05_074500_part_a.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | uncached | output | cache_read | cache_write | count | time | |------|----------|--------|------------|-------------|-------|------| | plan | 587 | 30.7k | 1.2M | 112.6k | 1 | 13m 29s | | verify | 73 | 35.9k | 3.7M | 137.0k | 2 | 11m 23s | | implement | 2.1k | 36.2k | 5.9M | 155.2k | 2 | 17m 4s | | fix | 50 | 13.2k | 2.1M | 64.5k | 1 | 10m 53s | | audit_impl | 28 | 17.3k | 786.1k | 51.7k | 1 | 5m 55s | | open_pr | 23 | 17.1k | 736.1k | 58.6k | 1 | 8m 12s | | **Total** | 2.9k | 150.3k | 14.5M | 579.5k | | 1h 6m | --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
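The escalation routing in REQ-ESC-001/002 can be sketched as follows; the record shape and the step names returned here are assumptions drawn from the requirement text, not the real recipe implementation:

```python
# Sketch of check_escalations: route on strategy types found in
# escalation_records_{pr}.json (record shape assumed).
def route_escalations(records: list[dict]) -> str:
    strategies = {r.get("strategy") for r in records}
    if "rerun_required" in strategies:
        return "re_run_experiment"  # targeted re-validation loop
    if "design_flaw" in strategies:
        return "annotate_and_push"  # annotate PR body, continue to push
    return "re_push_research"       # no escalations: direct push

# needs_rerun (REQ-REVAL-001) derives the same way:
def needs_rerun(records: list[dict]) -> bool:
    return any(r.get("strategy") == "rerun_required" for r in records)
```

A mixed list of `rerun_required` and `design_flaw` records routes to re-validation, since any `rerun_required` escalation dominates.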
## Summary
When a headless session spawns background agents via Claude Code's
`Agent` tool with `run_in_background: true`, Claude Code defers the
`type=result` NDJSON record until all background agents finish. If
autoskillit kills the process tree after Channel B confirms completion,
the deferred `type=result` is never flushed to stdout.
`parse_session_result` classifies the output as `UNPARSEABLE`, which
gates out every recovery path and the Channel B bypass, producing a
false failure for a session that actually completed successfully.
The fix adds a **pre-gate Channel B drain-race recovery** in
`_build_skill_result` that runs *before* the `session.session_complete`
gate. When Channel B has confirmed completion but the session parsed as
UNPARSEABLE/EMPTY_OUTPUT, the recovery reconstructs the result from
`assistant_messages` (which are written to stdout before the deferred
`type=result`) and promotes the session to SUCCESS, naturally unlocking
all downstream recovery paths and the Channel B bypass.
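A hedged sketch of the pre-gate recovery path, with a plain dict standing in for the session object and a placeholder completion-marker token (the real marker string and field names are not shown in this PR):

```python
RECOVERABLE_SUBTYPES = {"UNPARSEABLE", "EMPTY_OUTPUT"}
COMPLETION_MARKER = "SKILL_COMPLETE"  # placeholder; real token differs

def pre_gate_recover(session: dict) -> dict:
    """Promote a Channel-B-confirmed session whose type=result was lost."""
    if not (session.get("channel_b_confirmed")
            and session["subtype"] in RECOVERABLE_SUBTYPES):
        return session  # other channel / subtype: pass through unchanged
    # assistant_messages reach stdout before the deferred type=result,
    # so they survive the kill and can stand in for the lost record
    messages = session.get("assistant_messages", [])
    if not any(COMPLETION_MARKER in m for m in messages):
        return session  # no marker found: recovery fails, gate decides
    recovered = dict(session)
    recovered.update(subtype="SUCCESS", is_error=False,
                     result="\n".join(messages))
    return recovered

s = {"channel_b_confirmed": True, "subtype": "UNPARSEABLE",
     "assistant_messages": ["benchmarks done", "SKILL_COMPLETE"]}
assert pre_gate_recover(s)["subtype"] == "SUCCESS"
```

Because the promotion happens before the `session_complete` gate, the downstream marker/pattern recovery and the Channel B bypass then fire on their normal paths with no special-casing.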
## Architecture Impact
### Error/Resilience Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
%% CLASS DEFINITIONS %%
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
START(["● _build_skill_result<br/>━━━━━━━━━━<br/>Entry with SubprocessResult"])
subgraph PreGate ["● PRE-GATE: Channel B Drain-Race Recovery"]
direction TB
CB_CHECK{"● Channel B?<br/>+ subtype in<br/>RECOVERABLE_SUBTYPES?<br/>+ completion_marker?"}
CB_RECOVER["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Reconstruct result from<br/>assistant_messages"]
CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
CB_SKIP["No recovery needed<br/>━━━━━━━━━━<br/>Pass through unchanged"]
end
subgraph CompletionGate ["session.session_complete Gate"]
direction TB
GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype not in<br/>FAILURE_SUBTYPES"}
MARKER_RECOVER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Marker-based recovery"]
PATTERN_RECOVER["_recover_block_from_assistant_messages<br/>━━━━━━━━━━<br/>Pattern-based recovery"]
SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only"]
SKIP_RECOVERY["Skip all recovery<br/>━━━━━━━━━━<br/>TIMEOUT / genuine failure"]
end
subgraph Outcome ["● _compute_outcome"]
direction TB
CB_BYPASS{"● Channel B<br/>bypass in<br/>_compute_success?"}
CONTENT_CHECK["_check_session_content<br/>━━━━━━━━━━<br/>6-gate validation"]
DEAD_END{"Dead-end guard<br/>━━━━━━━━━━<br/>ABSENT → DRAIN_RACE<br/>CONTRACT_VIOLATION → FAIL"}
end
subgraph PostOutcome ["Post-Outcome Gates"]
direction TB
BUDGET["_apply_budget_guard<br/>━━━━━━━━━━<br/>Max consecutive retries"]
CONTRACT["CONTRACT_RECOVERY gate<br/>━━━━━━━━━━<br/>adjudicated_failure +<br/>write evidence"]
ZERO_WRITE["Zero-write gate<br/>━━━━━━━━━━<br/>Expected writes missing"]
end
subgraph Terminals ["TERMINAL STATES"]
T_SUCCESS([SUCCEEDED])
T_RETRY([RETRIABLE<br/>DRAIN_RACE / RESUME /<br/>CONTRACT_RECOVERY])
T_FAIL([FAILED])
T_BUDGET([BUDGET_EXHAUSTED])
end
START --> CB_CHECK
CB_CHECK -->|"Yes: CHANNEL_B +<br/>UNPARSEABLE or EMPTY_OUTPUT"| CB_RECOVER
CB_CHECK -->|"No: other channel<br/>or non-recoverable subtype"| CB_SKIP
CB_RECOVER -->|"Recovery succeeds:<br/>marker standalone +<br/>substantive content"| CB_PROMOTE
CB_RECOVER -->|"Recovery fails:<br/>no marker in messages"| CB_SKIP
CB_PROMOTE --> GATE
CB_SKIP --> GATE
GATE -->|"True: session promoted<br/>or originally complete"| MARKER_RECOVER
GATE -->|"False: TIMEOUT /<br/>unrecoverable subtype"| SKIP_RECOVERY
MARKER_RECOVER --> PATTERN_RECOVER
PATTERN_RECOVER --> SYNTH
SYNTH --> CB_BYPASS
SKIP_RECOVERY --> CB_BYPASS
CB_BYPASS -->|"CHANNEL_B + session_complete<br/>+ patterns pass"| T_SUCCESS
CB_BYPASS -->|"No bypass: falls to<br/>termination dispatch"| CONTENT_CHECK
CONTENT_CHECK -->|"All 6 gates pass"| T_SUCCESS
CONTENT_CHECK -->|"Any gate fails"| DEAD_END
DEAD_END -->|"ABSENT + channel confirmed"| T_RETRY
DEAD_END -->|"CONTRACT_VIOLATION /<br/>SESSION_ERROR"| T_FAIL
T_RETRY --> BUDGET
BUDGET -->|"Under limit"| CONTRACT
BUDGET -->|"Exceeded"| T_BUDGET
CONTRACT -->|"adjudicated_failure +<br/>writes ≥ 1"| T_RETRY
CONTRACT -->|"No match"| ZERO_WRITE
ZERO_WRITE -->|"Expected writes missing"| T_RETRY
ZERO_WRITE -->|"No issue"| T_SUCCESS
%% CLASS ASSIGNMENTS %%
class START terminal;
class CB_CHECK,GATE,CB_BYPASS,DEAD_END stateNode;
class CB_RECOVER,CB_PROMOTE newComponent;
class CB_SKIP,SKIP_RECOVERY gap;
class MARKER_RECOVER,PATTERN_RECOVER,SYNTH handler;
class CONTENT_CHECK phase;
class BUDGET,CONTRACT,ZERO_WRITE detector;
class T_SUCCESS,T_RETRY,T_FAIL,T_BUDGET terminal;
```
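The post-outcome budget gate in the diagram above can be reduced to a small clamp. A sketch with illustrative names (the real `_apply_budget_guard` signature is not shown in this PR):

```python
def apply_budget_guard(needs_retry: bool, consecutive_failures: int,
                       max_retries: int) -> str:
    # Illustrative reduction of the post-outcome budget gate: a retriable
    # outcome is clamped to a terminal state once the retry budget is spent.
    if not needs_retry:
        return "FAILED"
    if consecutive_failures > max_retries:
        return "BUDGET_EXHAUSTED"
    return "RETRIABLE"

print(apply_budget_guard(True, 1, 3))   # RETRIABLE
print(apply_budget_guard(True, 4, 3))   # BUDGET_EXHAUSTED
print(apply_budget_guard(False, 0, 3))  # FAILED
```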
### Process Flow Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
%% CLASS DEFINITIONS %%
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
START(["● _build_skill_result<br/>━━━━━━━━━━<br/>SubprocessResult input"])
subgraph EarlyExit ["Phase 1: Early Exit Interception"]
direction TB
TERM_CHECK{"termination<br/>reason?"}
STALE_PATH["STALE handler<br/>━━━━━━━━━━<br/>Attempt stdout recovery<br/>then retry or fail"]
TIMEOUT_PATH["TIMEOUT handler<br/>━━━━━━━━━━<br/>Override subtype=TIMEOUT<br/>is_error=True"]
end
PARSE["parse_session_result<br/>━━━━━━━━━━<br/>NDJSON → ClaudeSessionResult<br/>extracts assistant_messages"]
subgraph DrainRace ["● Phase 2: Channel B Drain-Race Recovery"]
direction TB
CB_MATCH{"● match channel<br/>━━━━━━━━━━<br/>CHANNEL_B +<br/>UNPARSEABLE/EMPTY_OUTPUT<br/>+ completion_marker?"}
CB_RECON["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Check marker standalone<br/>in assistant_messages"]
CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
CB_NONE["No drain-race<br/>━━━━━━━━━━<br/>Session unchanged"]
end
subgraph GatedRecovery ["Phase 3: Completion-Gated Recovery"]
direction TB
GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype ∉ FAILURE_SUBTYPES"}
REC_MARKER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Join assistant_messages<br/>when marker is standalone"]
REC_PATTERN["_recover_block_from_assistant<br/>━━━━━━━━━━<br/>Patterns in messages<br/>not in result"]
REC_SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only:<br/>inject write paths"]
GATE_SKIP["Skip recovery<br/>━━━━━━━━━━<br/>Incomplete session"]
end
subgraph ComputeOutcome ["● Phase 4: Outcome Adjudication"]
direction TB
COMPUTE["● _compute_outcome<br/>━━━━━━━━━━<br/>_compute_success +<br/>_compute_retry"]
SUCCESS_CHECK{"● success?"}
RETRY_CHECK{"needs_retry?"}
end
subgraph PostGates ["Phase 5: Post-Outcome Gates"]
direction TB
BUDGET_G["_apply_budget_guard<br/>━━━━━━━━━━<br/>consecutive_failures ><br/>max_retries?"]
CONTRACT_G{"CONTRACT_RECOVERY?<br/>━━━━━━━━━━<br/>adjudicated_failure<br/>+ write_count ≥ 1"}
ZERO_G{"zero_write_gate?<br/>━━━━━━━━━━<br/>success but no<br/>Write/Edit calls"}
end
T_SUCCESS([SUCCEEDED])
T_RETRY([RETRIABLE])
T_FAIL([FAILED])
%% FLOW %%
START --> TERM_CHECK
TERM_CHECK -->|"STALE"| STALE_PATH
TERM_CHECK -->|"TIMED_OUT"| TIMEOUT_PATH
TERM_CHECK -->|"COMPLETED /<br/>NATURAL_EXIT"| PARSE
STALE_PATH --> T_RETRY
TIMEOUT_PATH --> PARSE
PARSE --> CB_MATCH
CB_MATCH -->|"Yes: all 3 guards pass"| CB_RECON
CB_MATCH -->|"No: wrong channel /<br/>wrong subtype / no marker"| CB_NONE
CB_RECON -->|"Marker found standalone<br/>+ substantive content"| CB_PROMOTE
CB_RECON -->|"No marker or<br/>empty content"| CB_NONE
CB_PROMOTE --> GATE
CB_NONE --> GATE
GATE -->|"True: complete session"| REC_MARKER
GATE -->|"False: incomplete"| GATE_SKIP
REC_MARKER --> REC_PATTERN
REC_PATTERN --> REC_SYNTH
REC_SYNTH --> COMPUTE
GATE_SKIP --> COMPUTE
COMPUTE --> SUCCESS_CHECK
SUCCESS_CHECK -->|"True"| ZERO_G
SUCCESS_CHECK -->|"False"| RETRY_CHECK
RETRY_CHECK -->|"True"| BUDGET_G
RETRY_CHECK -->|"False"| CONTRACT_G
BUDGET_G -->|"Under limit"| T_RETRY
BUDGET_G -->|"Exhausted"| T_FAIL
CONTRACT_G -->|"Yes: promote to retry"| BUDGET_G
CONTRACT_G -->|"No"| T_FAIL
ZERO_G -->|"Writes expected<br/>but count = 0"| T_RETRY
ZERO_G -->|"OK"| T_SUCCESS
%% CLASS ASSIGNMENTS %%
class START,T_SUCCESS,T_RETRY,T_FAIL terminal;
class TERM_CHECK,CB_MATCH,GATE,SUCCESS_CHECK,RETRY_CHECK stateNode;
class STALE_PATH,TIMEOUT_PATH,PARSE handler;
class CB_RECON,CB_PROMOTE newComponent;
class CB_NONE,GATE_SKIP gap;
class REC_MARKER,REC_PATTERN,REC_SYNTH handler;
class COMPUTE phase;
class BUDGET_G,CONTRACT_G,ZERO_G detector;
```
Closes #619
## Implementation Plan
Plan file:
`/home/talon/projects/autoskillit-runs/impl-619-20260405-085642-620214/.autoskillit/temp/make-plan/channel_b_drain_race_recovery_plan_2026-04-05_090230.md`
🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit
## Token Usage Summary
| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 42 | 18.8k | 1.6M | 80.7k | 1 | 9m 8s |
| verify | 17 | 17.4k | 687.5k | 79.7k | 1 | 6m 55s |
| implement | 77 | 28.2k | 4.4M | 89.7k | 1 | 15m 40s |
| audit_impl | 14 | 8.9k | 348.9k | 43.4k | 1 | 3m 4s |
| open_pr | 3.0k | 17.7k | 865.3k | 63.1k | 1 | 7m 30s |
| **Total** | 3.1k | 91.0k | 8.0M | 356.6k | | 42m 19s |
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ulator FakeClaudeCLI (#624)

## Summary

Add 10 end-to-end tests in a new file `tests/execution/test_session_classification_e2e.py` that exercise the full session failure classification pipeline — from raw NDJSON subprocess output produced by api-simulator's `fake_claude` fixture through `parse_session_result()` and `_build_skill_result()` to final `SkillResult` classification.

Today all headless tests use `MockSubprocessRunner` with pre-constructed `SubprocessResult` objects; the NDJSON parsing and classification logic is never exercised against realistic subprocess output. These tests close that gap using 4 groups: NDJSON stream robustness (4 tests), context exhaustion edge cases (2 tests), kill boundary scenarios (2 tests), and process behavior simulation (2 tests). No production code changes are required. The `api-simulator` dev dependency was added by #607.

## Requirements

### BRIDGE — Integration Bridge

- **REQ-BRIDGE-001:** Tests must use `fake_claude.run()` to produce real subprocess output, not hand-constructed strings.
- **REQ-BRIDGE-002:** Tests must feed `proc.stdout` through `parse_session_result()` from `autoskillit.execution.session`.
- **REQ-BRIDGE-003:** Tests must wrap the parsed result in a `SubprocessResult` and pass it to `_build_skill_result()` for full classification.

### PARSE — NDJSON Parse Robustness

- **REQ-PARSE-001:** The parser must correctly skip `type=system` / `api_retry` records and still extract the final `type=result` record.
- **REQ-PARSE-002:** The parser must handle non-JSON lines (stream corruption) gracefully without losing valid records.
- **REQ-PARSE-003:** When multiple `type=result` records appear, the last one must determine classification.

### CTX — Context Exhaustion

- **REQ-CTX-001:** A flat assistant record containing the context exhaustion marker with no `type=result` record must classify as `context_exhaustion` with `needs_retry=True`.
- **REQ-CTX-002:** A `type=result` record with `is_error=True` and `errors` containing the marker must classify as retriable with `retry_reason=RESUME`.

### KILL — Kill Boundary

- **REQ-KILL-001:** A truncated stream (via `truncate_after`) must produce `subtype=unparseable` or partial classification with nonzero exit code.
- **REQ-KILL-002:** An `interrupted` subtype with nonzero exit code must result in `needs_retry=False` (gated by returncode).

### PROC — Process Behavior

- **REQ-PROC-001:** The hang-after-result scenario must verify that the result record was emitted to stdout before the process hung.
- **REQ-PROC-002:** Mid-stream exit via `inject_exit` must produce the correct exit code and truncated stdout.

### COMPAT — Compatibility

- **REQ-COMPAT-001:** Existing `test_headless.py` and `test_session.py` tests must remain unchanged and passing.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
START([FakeClaudeCLI<br/>━━━━━━━━━━<br/>api-simulator fixture])
subgraph Bridge ["★ E2E Test Bridge (new)"]
direction TB
RUN["★ fake_claude.run()<br/>━━━━━━━━━━<br/>CompletedProcess<br/>with real NDJSON stdout"]
WRAP["★ _classify() / inline<br/>━━━━━━━━━━<br/>Wrap in SubprocessResult<br/>pid=0, caller termination"]
end
subgraph Parse ["parse_session_result()"]
direction TB
SCAN{"stdout empty?"}
LOOP["Scan NDJSON lines<br/>━━━━━━━━━━<br/>JSON decode; skip errors<br/>last type=result wins"]
CTX_FLAG{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?"}
RESULT_FOUND{"result record<br/>found?"}
end
subgraph Classify ["_compute_outcome()"]
direction TB
SUCCESS_GATE{"_compute_success<br/>━━━━━━━━━━<br/>returncode=0?<br/>is_error? result?"}
RETRY_GATE{"_compute_retry<br/>━━━━━━━━━━<br/>session.needs_retry?<br/>kill anomaly?"}
CONTRA{"contradiction<br/>success+retry?"}
DEADEND{"dead-end<br/>failed+confirmed<br/>+ABSENT?"}
end
subgraph Normalize ["_normalize_subtype()"]
NORM["Map raw CLI subtype<br/>━━━━━━━━━━<br/>to final string label"]
end
subgraph Gates ["Post-Classification Gates"]
BUDGET{"budget<br/>exhausted?"}
ZERO{"zero writes<br/>when expected?"}
end
subgraph Outcomes ["SkillResult"]
direction LR
OK([success])
CTX([context_exhaustion<br/>needs_retry=True])
EMPTY([empty_output /<br/>unparseable])
INTR([interrupted<br/>needs_retry=False])
FAIL([failure<br/>terminal])
end
START --> RUN
RUN --> WRAP
WRAP --> SCAN
SCAN -->|"empty"| EMPTY
SCAN -->|"non-empty"| LOOP
LOOP --> CTX_FLAG
CTX_FLAG -->|"yes → jsonl_context_exhausted=True"| RESULT_FOUND
CTX_FLAG -->|"no"| RESULT_FOUND
RESULT_FOUND -->|"yes"| SUCCESS_GATE
RESULT_FOUND -->|"no → UNPARSEABLE / CTX_EXHAUSTION"| RETRY_GATE
SUCCESS_GATE --> RETRY_GATE
RETRY_GATE --> CONTRA
CONTRA -->|"demote success"| DEADEND
CONTRA -->|"consistent"| DEADEND
DEADEND -->|"DRAIN_RACE"| NORM
DEADEND -->|"terminal"| NORM
NORM --> BUDGET
BUDGET -->|"BUDGET_EXHAUSTED"| FAIL
BUDGET -->|"ok"| ZERO
ZERO -->|"zero_writes"| CTX
ZERO -->|"ok"| OK
SUCCESS_GATE -->|"returncode!=0"| INTR
class START terminal;
class RUN,WRAP newComponent;
class LOOP handler;
class SCAN,CTX_FLAG,RESULT_FOUND stateNode;
class SUCCESS_GATE,RETRY_GATE,CONTRA,DEADEND phase;
class NORM handler;
class BUDGET,ZERO detector;
class OK,CTX,EMPTY,INTR,FAIL terminal;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start (FakeClaudeCLI), final SkillResult outcomes |
| Green | New Component | ★ `_classify()` bridge helper and `fake_claude.run()` — new test code |
| Orange | Handler | NDJSON scan/accumulation and subtype normalization |
| Teal | State | Decision points: empty check, context flag, result found |
| Purple | Phase | Outcome computation gates (success, retry, contradiction, dead-end) |
| Red | Detector | Post-classification guards (budget, zero-write) |

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
START([★ E2E Test Suite<br/>━━━━━━━━━━<br/>10 failure scenarios<br/>via FakeClaudeCLI])
subgraph ParseGates ["NDJSON Parse Resilience Gates"]
direction TB
EMPTY_CHECK{"stdout<br/>empty?"}
JSON_ERR["Corrupt / non-JSON lines<br/>━━━━━━━━━━<br/>silently skipped<br/>(test 2: corrupt_stream)"]
API_RETRY["api_retry records<br/>━━━━━━━━━━<br/>skipped — not type=result<br/>(test 1: inject_api_retry)"]
LAST_WINS["Multiple result records<br/>━━━━━━━━━━<br/>last record wins<br/>(test 3: two results)"]
EXHAUST["Exhausted retries<br/>━━━━━━━━━━<br/>no result record emitted<br/>(test 4: exhaust=True)"]
end
subgraph CtxDetect ["Context Exhaustion Detection"]
direction TB
FLAT_DETECT{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?<br/>(test 5)"}
ERR_DETECT{"is_error=True AND<br/>marker in errors[]?<br/>(test 6)"}
CTX_FLAG["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>race-resilient flag"]
end
subgraph KillGates ["Kill Boundary Gates"]
direction TB
RC_CHECK{"returncode != 0?"}
KILL_ANOM{"_is_kill_anomaly?<br/>━━━━━━━━━━<br/>UNPARSEABLE /\nEMPTY_OUTPUT /\nINTERRUPTED"}
INTR_GATE{"subtype=interrupted<br/>+ rc != 0?<br/>(test 8)"}
end
subgraph PostGates ["Post-Classification Guards"]
BUDGET{"consecutive failures<br/>> budget max?"}
ZERO_WRITE{"success AND<br/>write_count=0<br/>AND write expected?"}
end
T_SUCCESS([success<br/>━━━━━━━━━━<br/>needs_retry=False])
T_CTX([context_exhaustion<br/>━━━━━━━━━━<br/>needs_retry=True, RESUME])
T_EMPTY([empty_output / unparseable<br/>━━━━━━━━━━<br/>needs_retry=True via RESUME])
T_INTR([interrupted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
T_BUDGET([budget_exhausted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
T_ZERO([zero_writes<br/>━━━━━━━━━━<br/>needs_retry=True])
START --> EMPTY_CHECK
EMPTY_CHECK -->|"empty stdout"| T_EMPTY
EMPTY_CHECK -->|"has content"| JSON_ERR
JSON_ERR -->|"skip bad lines, continue"| API_RETRY
API_RETRY -->|"skip, continue to result"| LAST_WINS
LAST_WINS -->|"no result"| EXHAUST
EXHAUST -->|"empty_output / unparseable"| T_EMPTY
LAST_WINS -->|"result found"| FLAT_DETECT
FLAT_DETECT -->|"yes"| CTX_FLAG
FLAT_DETECT -->|"no"| ERR_DETECT
ERR_DETECT -->|"yes"| CTX_FLAG
CTX_FLAG -->|"needs_retry=True"| T_CTX
ERR_DETECT -->|"no"| RC_CHECK
RC_CHECK -->|"nonzero (test 7,8,10)"| INTR_GATE
INTR_GATE -->|"yes → no retry"| T_INTR
INTR_GATE -->|"no"| T_EMPTY
RC_CHECK -->|"zero"| KILL_ANOM
KILL_ANOM -->|"anomaly → RESUME retry"| T_EMPTY
KILL_ANOM -->|"no anomaly"| BUDGET
BUDGET -->|"exceeded"| T_BUDGET
BUDGET -->|"ok"| ZERO_WRITE
ZERO_WRITE -->|"violation"| T_ZERO
ZERO_WRITE -->|"ok"| T_SUCCESS
class START newComponent;
class EMPTY_CHECK,FLAT_DETECT,ERR_DETECT,RC_CHECK,KILL_ANOM,INTR_GATE stateNode;
class JSON_ERR,API_RETRY,LAST_WINS,EXHAUST,CTX_FLAG handler;
class BUDGET,ZERO_WRITE detector;
class T_SUCCESS,T_CTX,T_EMPTY,T_INTR,T_BUDGET,T_ZERO terminal;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite (new) — exercises all failure paths |
| Teal | Decision Gates | Key detection and routing decisions |
| Orange | Handler | Parse resilience processing and flag setting |
| Red | Guard | Post-classification safety guards (budget, zero-write) |
| Dark Blue | Terminal | Final SkillResult outcome states |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
TEST["★ test_session_classification_e2e.py<br/>━━━━━━━━━━<br/>10 scenarios assert field contracts<br/>across all classification paths"]
subgraph ParseState ["INIT_ONLY — Set by Parser, Never Overwritten"]
direction LR
CTX_EX["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>flat assistant → True<br/>read by _is_context_exhausted()"]
RC["returncode / termination<br/>━━━━━━━━━━<br/>from SubprocessResult<br/>used in all compute_* gates"]
SID["session_id<br/>━━━━━━━━━━<br/>from result record<br/>passed through unchanged"]
end
subgraph DerivedState ["DERIVED — Computed, Not Stored During Parse"]
direction TB
SUCCESS_D["success<br/>━━━━━━━━━━<br/>returncode=0 AND content gates<br/>must be False if needs_retry=True"]
RETRY_D["needs_retry + retry_reason<br/>━━━━━━━━━━<br/>RESUME / ZERO_WRITES / etc.<br/>only valid pair if needs_retry=True"]
SUBTYPE_D["subtype (normalized)<br/>━━━━━━━━━━<br/>'success' / 'context_exhaustion'<br/>/ 'interrupted' / etc."]
end
subgraph Contracts ["CONTRACT ENFORCEMENT GATES"]
direction TB
CONTRA_GATE{"Contradiction Guard<br/>━━━━━━━━━━<br/>success=True AND<br/>needs_retry=True?"}
INTR_GATE{"Interrupted Gate<br/>━━━━━━━━━━<br/>subtype=interrupted AND<br/>rc != 0?"}
CTX_GATE{"Context Exhaustion<br/>━━━━━━━━━━<br/>jsonl_context_exhausted OR<br/>marker in errors[]?"}
BUDGET_GATE{"Budget Guard<br/>━━━━━━━━━━<br/>consecutive failures<br/>> budget max?"}
end
subgraph ResumeStates ["RESUME SAFETY — needs_retry contract"]
direction LR
RESUME_OK(["needs_retry=True<br/>retry_reason=RESUME<br/>━━━━━━━━━━<br/>context_exhaustion path"])
NO_RETRY(["needs_retry=False<br/>retry_reason=NONE<br/>━━━━━━━━━━<br/>interrupted + rc!=0 path"])
BUDGET_STOP(["needs_retry=False<br/>retry_reason=BUDGET_EXHAUSTED<br/>━━━━━━━━━━<br/>terminal, no more retries"])
end
TEST -->|"asserts all contracts"| CTX_EX
TEST --> RC
TEST --> SID
CTX_EX -->|"read by"| CTX_GATE
RC -->|"read by"| INTR_GATE
RC -->|"read by"| CONTRA_GATE
CTX_GATE -->|"exhausted → needs_retry=True"| RETRY_D
CTX_GATE -->|"not exhausted"| INTR_GATE
INTR_GATE -->|"interrupted+rc!=0 → terminal"| NO_RETRY
INTR_GATE -->|"other"| CONTRA_GATE
CONTRA_GATE -->|"contradiction → demote success"| SUCCESS_D
CONTRA_GATE -->|"consistent"| SUCCESS_D
RETRY_D --> BUDGET_GATE
SUCCESS_D --> BUDGET_GATE
SUBTYPE_D --> BUDGET_GATE
BUDGET_GATE -->|"exceeded → clamp"| BUDGET_STOP
BUDGET_GATE -->|"within budget"| RESUME_OK
class TEST newComponent;
class CTX_EX,RC,SID detector;
class SUCCESS_D,RETRY_D,SUBTYPE_D phase;
class CTX_GATE,INTR_GATE,CONTRA_GATE,BUDGET_GATE stateNode;
class RESUME_OK,NO_RETRY,BUDGET_STOP cli;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite — asserts all field contracts |
| Red | INIT_ONLY | Fields set by parser, never overwritten |
| Purple | Derived | Fields computed from classification, not stored during parse |
| Teal | Gates | Contract enforcement decision points |
| Dark Blue | Resume States | Terminal resume-safety outcomes |

Closes #608

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-608-20260405-085643-660865/.autoskillit/temp/make-plan/test_session_failure_classification_with_api_simulator_plan_2026-04-05_090300.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 31 | 22.4k | 812.6k | 59.1k | 1 | 12m 6s |
| verify | 21 | 17.2k | 863.3k | 66.7k | 1 | 9m 28s |
| implement | 2.5k | 9.4k | 1.1M | 48.2k | 1 | 5m 43s |
| fix | 21 | 7.3k | 703.0k | 42.4k | 1 | 7m 38s |
| audit_impl | 10 | 7.4k | 139.9k | 39.6k | 1 | 3m 29s |
| open_pr | 47 | 27.2k | 2.2M | 74.8k | 1 | 10m 44s |
| **Total** | 2.7k | 90.9k | 5.8M | 330.8k | | 49m 11s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
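The flat-assistant detection behind REQ-CTX-001 can be sketched in a few lines. All names here are illustrative (the real marker string and parser internals are not shown in this PR); the point is the shape of the check: an assistant record with `output_tokens=0` carrying the marker, and no `type=result` record anywhere in the stream.

```python
import json

CTX_MARKER = "Context low"  # assumed marker text, not the real CLI string

def is_flat_context_exhaustion(stdout: str) -> bool:
    # Sketch of REQ-CTX-001: a flat assistant record with output_tokens=0
    # and the exhaustion marker, with no type=result record at all, should
    # classify as context_exhaustion (needs_retry=True).
    saw_result, saw_marker = False, False
    for line in stdout.splitlines():
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # REQ-PARSE-002: tolerate stream corruption
        if rec.get("type") == "result":
            saw_result = True
        elif (rec.get("type") == "assistant"
              and rec.get("output_tokens") == 0
              and CTX_MARKER in rec.get("text", "")):
            saw_marker = True
    return saw_marker and not saw_result

stream = json.dumps({"type": "assistant", "output_tokens": 0,
                     "text": "Context low -- stopping"})
print(is_flat_context_exhaustion(stream))  # True
```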
…rect Changes (#623)

## Summary

When `implement-worktree-no-merge` runs and the model ignores instructions to create a worktree (via `git worktree add`), it edits files directly in the clone directory. This leaves dirty uncommitted changes (or direct commits) on the clone's branch. On retry, the next session inherits a contaminated working tree.

This plan adds a **clone contamination guard** to the headless execution pipeline. The guard:

1. Snapshots the clone's HEAD SHA before each worktree-based skill session
2. After a failed session where no worktree was created, detects contamination (uncommitted changes or direct commits)
3. Reverts the clone to its pre-session state
4. Logs the cleanup for pipeline observability

Key architectural insight: `EnterWorktree` does not exist in this codebase. Worktree creation uses standard `git worktree add` via Bash, and success is signaled by emitting `worktree_path = <path>` tokens in assistant messages. Detection of "no worktree created" is therefore: no `worktree_path` token in `session.assistant_messages`.

## Requirements

### Snapshot (SNAP)

- **REQ-SNAP-001:** The system must capture the clone HEAD SHA before each `run_skill` invocation for worktree-based skills (implement-worktree-no-merge, retry-worktree).
- **REQ-SNAP-002:** The system must capture the clone working tree cleanliness state (clean/dirty) before each `run_skill` invocation for worktree-based skills.

### Detection (DET)

- **REQ-DET-001:** The system must detect uncommitted changes in the clone CWD after a worktree-based skill session that was adjudicated as failure.
- **REQ-DET-002:** The system must detect direct commits in the clone (HEAD differs from pre-session SHA) after a worktree-based skill session that was adjudicated as failure.
- **REQ-DET-003:** The system must verify whether `EnterWorktree` was called during the session by inspecting tool_uses in the session result.
### Revert (REV)

- **REQ-REV-001:** The system must revert uncommitted changes in the clone when contamination is detected (git checkout + git clean).
- **REQ-REV-002:** The system must revert direct commits in the clone when contamination is detected (git reset to pre-session SHA).
- **REQ-REV-003:** The revert must only execute when all three conditions are met: worktree-based skill, adjudicated failure, and no EnterWorktree call in tool_uses.

### Observability (OBS)

- **REQ-OBS-001:** The system must log all contamination detection and revert actions in the audit log with sufficient detail for pipeline visibility.
- **REQ-OBS-002:** The audit log entry must include the pre-session SHA, post-session SHA, list of contaminated files, and revert action taken.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
START(["● run_headless_core()"])
subgraph PreSession ["★ Pre-Session Snapshot"]
direction TB
IS_WT{"★ is_worktree_skill?<br/>━━━━━━━━━━<br/>implement-worktree-no-merge<br/>or retry-worktree in cmd"}
IS_CLONE{"★ not is_git_worktree?<br/>━━━━━━━━━━<br/>cwd is clone root,<br/>not a worktree"}
SNAP["★ snapshot_clone_state()<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>→ CloneSnapshot(head_sha)"]
end
subgraph Session ["Existing Session Lifecycle"]
direction TB
RUN["● runner() subprocess<br/>━━━━━━━━━━<br/>Headless Claude CLI"]
BUILD["● _build_skill_result()<br/>━━━━━━━━━━<br/>Adjudication + gates<br/>worktree_path always extracted"]
end
subgraph PostGuard ["★ Post-Session Clone Guard"]
direction TB
CHK_SNAP{"★ snapshot captured?<br/>━━━━━━━━━━<br/>_clone_snapshot is not None"}
CHK_SUCC{"★ skill_result.success?"}
CHK_WT{"★ worktree_path set?<br/>━━━━━━━━━━<br/>skill_result.worktree_path<br/>is not None"}
DETECT["★ detect_contamination()<br/>━━━━━━━━━━<br/>git rev-parse HEAD → post_sha<br/>git status --porcelain → files"]
CHK_DIRTY{"★ contamination found?<br/>━━━━━━━━━━<br/>post_sha ≠ pre_sha<br/>OR dirty files"}
REVERT["★ revert_contamination()<br/>━━━━━━━━━━<br/>git reset --hard pre_sha<br/>git clean -fd"]
AUDIT["★ audit.record_failure()<br/>━━━━━━━━━━<br/>subtype=clone_contamination<br/>RetryReason.CLONE_CONTAMINATION"]
end
FLUSH["● flush_session_log()<br/>━━━━━━━━━━<br/>★ clone_contamination_reverted<br/>→ summary.json"]
RETURN(["● return skill_result"])
SKIP_SNAP(["skip → _clone_snapshot=None"])
START --> IS_WT
IS_WT -->|"no: not a worktree skill"| SKIP_SNAP
IS_WT -->|"yes"| IS_CLONE
IS_CLONE -->|"already a worktree CWD"| SKIP_SNAP
IS_CLONE -->|"clone root CWD"| SNAP
SNAP --> RUN
SKIP_SNAP --> RUN
RUN --> BUILD
BUILD --> CHK_SNAP
CHK_SNAP -->|"no snapshot"| FLUSH
CHK_SNAP -->|"snapshot exists"| CHK_SUCC
CHK_SUCC -->|"success=True"| FLUSH
CHK_SUCC -->|"success=False"| CHK_WT
CHK_WT -->|"worktree created"| FLUSH
CHK_WT -->|"no worktree"| DETECT
DETECT --> CHK_DIRTY
CHK_DIRTY -->|"clean"| FLUSH
CHK_DIRTY -->|"contaminated"| REVERT
REVERT --> AUDIT
AUDIT --> FLUSH
FLUSH --> RETURN
class START,RETURN,SKIP_SNAP terminal;
class IS_WT,IS_CLONE,CHK_SNAP,CHK_SUCC,CHK_WT,CHK_DIRTY stateNode;
class RUN,BUILD,FLUSH handler;
class SNAP,DETECT,REVERT,AUDIT newComponent;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry/exit points of `run_headless_core` |
| Teal | State/Decision | Routing decisions that control guard activation |
| Orange | Handler | Existing subprocess, adjudication, and telemetry nodes |
| Green | New Component | New clone contamination guard components (★) |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
subgraph L3 ["L3 — SERVER (existing, unchanged)"]
direction LR
SERVER["server/tools_execution.py<br/>━━━━━━━━━━<br/>run_skill, run_cmd handlers"]
end
subgraph L1 ["L1 — EXECUTION"]
direction TB
HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>run_headless_core()<br/>_build_skill_result()"]
CLONE_GUARD["★ execution/clone_guard.py<br/>━━━━━━━━━━<br/>is_worktree_skill()<br/>snapshot_clone_state()<br/>check_and_revert_clone_contamination()"]
SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>flush_session_log()<br/>★ clone_contamination_reverted"]
COMMANDS["execution/commands.py<br/>━━━━━━━━━━<br/>build_full_headless_cmd()"]
SESSION["execution/session.py<br/>━━━━━━━━━━<br/>ClaudeSessionResult"]
end
subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
direction TB
ENUMS["● core/_type_enums.py<br/>━━━━━━━━━━<br/>RetryReason enum<br/>★ CLONE_CONTAMINATION added"]
TYPES["core/types.py<br/>━━━━━━━━━━<br/>SkillResult, FailureRecord<br/>AuditStore, SubprocessRunner"]
PATHS["core/paths.py<br/>━━━━━━━━━━<br/>is_git_worktree()"]
LOGGING["core/logging.py<br/>━━━━━━━━━━<br/>get_logger()"]
CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-exports all L0 surface"]
end
subgraph Ext ["EXTERNAL (stdlib)"]
STDLIB["dataclasses, pathlib<br/>datetime, typing"]
end
SERVER -->|"imports run_headless"| HEADLESS
HEADLESS -->|"★ imports 3 functions"| CLONE_GUARD
HEADLESS -->|"imports"| COMMANDS
HEADLESS -->|"imports"| SESSION
HEADLESS -->|"imports"| SESSION_LOG
HEADLESS -->|"imports core surface"| CORE_INIT
CLONE_GUARD -->|"★ imports FailureRecord<br/>RetryReason, SkillResult<br/>get_logger, is_git_worktree"| CORE_INIT
SESSION_LOG -->|"imports"| LOGGING
CORE_INIT -->|"re-exports"| ENUMS
CORE_INIT -->|"re-exports"| TYPES
CORE_INIT -->|"re-exports"| PATHS
CORE_INIT -->|"re-exports"| LOGGING
TYPES -->|"imports RetryReason"| ENUMS
CLONE_GUARD -->|"stdlib only"| STDLIB
ENUMS -->|"stdlib only"| STDLIB
class SERVER cli;
class HEADLESS,SESSION_LOG,COMMANDS,SESSION handler;
class CLONE_GUARD newComponent;
class ENUMS,TYPES,PATHS,LOGGING,CORE_INIT stateNode;
class STDLIB integration;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Server (L3) | MCP tool handlers — top application layer |
| Orange | Execution (L1) | Service/orchestration layer modules |
| Green | New Module | `clone_guard.py` — new L1 execution module (★) |
| Teal | Core (L0) | Stable vocabulary/type layer — high fan-in |
| Red | External | Standard library dependencies |

Closes #617

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-617-20260405-085643-202786/.autoskillit/temp/make-plan/clone_contamination_guard_plan_2026-04-05_090600.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 6.9k | 23.4k | 1.7M | 82.7k | 1 | 10m 39s |
| verify | 33 | 20.7k | 1.4M | 55.6k | 1 | 8m 39s |
| implement | 81 | 24.3k | 4.4M | 89.7k | 1 | 10m 6s |
| fix | 40 | 14.4k | 1.7M | 62.9k | 1 | 9m 17s |
| audit_impl | 13 | 11.0k | 288.2k | 45.3k | 1 | 4m 14s |
| open_pr | 28 | 20.1k | 1.0M | 55.4k | 1 | 7m 18s |
| **Total** | 7.1k | 113.9k | 10.5M | 391.6k | | 50m 15s |

---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
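The snapshot/compare flow that `clone_guard.py` introduces can be sketched roughly as follows. This is an illustration only: the function names echo the PR's diagram, but the state representation (a set of `git status --porcelain` lines) is an assumption, and the revert half of `check_and_revert_clone_contamination` is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneState:
    """Snapshot of a clone's dirty paths, e.g. parsed from `git status --porcelain`."""
    dirty_files: frozenset


def snapshot_clone_state(porcelain_lines):
    # Record which paths are already dirty before the skill runs.
    return CloneState(dirty_files=frozenset(porcelain_lines))


def check_clone_contamination(before, porcelain_lines):
    # Any path dirty now but absent from the pre-run snapshot is contamination:
    # the skill wrote into the shared clone instead of its worktree.
    return sorted(frozenset(porcelain_lines) - before.dirty_files)


before = snapshot_clone_state([" M README.md"])
contaminated = check_clone_contamination(before, [" M README.md", "?? src/new_file.py"])
# contaminated == ["?? src/new_file.py"]
```

In the real module a hit would also record a `FailureRecord` with the new `RetryReason.CLONE_CONTAMINATION` before reverting.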
…rtifact Merge Phase (#625)

## Summary

Add a six-step archival phase to the end of the research recipe (`research.yaml`) that separates research artifacts from experimental code before completion. After all review cycles, re-runs, and CI checks finish, the new phase: (1) captures the experiment branch name, (2) creates a clean artifact-only branch containing only `research/` from a temporary worktree, (3) opens an artifact PR targeting the base branch, (4) tags the full experiment branch under `archive/research/` for permanent reference, (5) closes the original experiment PR with cross-reference links, then (6) proceeds to `research_complete`. Every archival step degrades gracefully — `on_failure` routes to `research_complete` so the pipeline never blocks on archival failures.

## Requirements

### SPLIT — Artifact Extraction

- **REQ-SPLIT-001:** The recipe must create a new branch from the base branch (e.g., main) containing only the `research/` directory contents from the experiment branch, with no production source file changes.
- **REQ-SPLIT-002:** The artifact extraction must use `git checkout <experiment-branch> -- research/` (or equivalent) to copy only the research directory's file state, not replay commit history.
- **REQ-SPLIT-003:** The artifact-only branch must produce a single clean commit with a descriptive message referencing the experiment name.

### PR — Artifact PR

- **REQ-PR-001:** The recipe must open a PR targeting the base branch with the artifact-only branch, referencing the original experiment PR number and summarizing key findings in the body.
- **REQ-PR-002:** The artifact PR must contain zero changes to production source files — only files under `research/`.

### TAG — Branch Archival

- **REQ-TAG-001:** The recipe must create an annotated git tag with the prefix `archive/research/` capturing the final state of the experiment branch (after all reviews, re-runs, and CI pass).
- **REQ-TAG-002:** The annotated tag message must include the experiment name and a note that the report was merged via the artifact PR. - **REQ-TAG-003:** The tag must be pushed to the remote before the experiment branch is cleaned up. ### CLOSE — Experiment PR Closure - **REQ-CLOSE-001:** The recipe must close the original experiment PR with a comment linking to the artifact PR, the archive tag, and any follow-up implementation issues. - **REQ-CLOSE-002:** The closure comment must explain why the PR was not merged (experimental code in production source files) and where the research record is preserved. ### ORDER — Execution Ordering - **REQ-ORDER-001:** The archival phase must execute only after all review cycles, review resolutions, experiment re-runs (per #618), and CI checks have completed successfully. - **REQ-ORDER-002:** The archival phase must be the final phase before `research_complete`, not interleaved with review or re-validation steps. ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; subgraph PostReview ["● Post-Review Phase (modified routing)"] direction TB GPR{"guard_pr_url<br/>━━━━━━━━━━<br/>pr_url set?"} RRP["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-pr<br/>skip_when_false: review_pr"] RRR["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-review<br/>retries: 2"] CE{"check_escalations<br/>━━━━━━━━━━<br/>needs_rerun?"} 
RERUN["re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust"] REWRITE["re_write_report<br/>━━━━━━━━━━<br/>write-report"] RETEST["re_test<br/>━━━━━━━━━━<br/>test_check"] REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>git push"] end subgraph Archival ["★ Archival Phase (new)"] direction TB BA{"★ begin_archival<br/>━━━━━━━━━━<br/>pr_url truthy?"} CEB["★ capture_experiment_branch<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>captures: experiment_branch"] CAB["★ create_artifact_branch<br/>━━━━━━━━━━<br/>worktree + checkout research/<br/>captures: artifact_branch"] OAP["★ open_artifact_pr<br/>━━━━━━━━━━<br/>gh pr create (research/ only)<br/>captures: artifact_pr_url"] TEB["★ tag_experiment_branch<br/>━━━━━━━━━━<br/>git tag -a archive/research/*<br/>captures: archive_tag"] CEP["★ close_experiment_pr<br/>━━━━━━━━━━<br/>gh pr close + comment"] end RC([research_complete<br/>━━━━━━━━━━<br/>action: stop]) GPR -->|"pr_url empty"| RC GPR -->|"pr_url truthy"| RRP RRP -->|"changes_requested"| RRR RRP -->|"needs_human / default / fail"| BA RRR -->|"success"| CE RRR -->|"exhausted / fail"| BA CE -->|"needs_rerun=true"| RERUN CE -->|"default"| REPUSH RERUN --> REWRITE --> RETEST --> REPUSH REPUSH -->|"success / fail"| BA BA -->|"pr_url truthy"| CEB BA -->|"default"| RC CEB -->|"success"| CAB CEB -->|"fail"| RC CAB -->|"success"| OAP CAB -->|"fail"| RC OAP -->|"success"| TEB OAP -->|"fail"| RC TEB -->|"success"| CEP TEB -->|"fail"| RC CEP -->|"success / fail"| RC class GPR,CE,BA stateNode; class RRP,RRR,RERUN,REWRITE,RETEST,REPUSH handler; class CEB,CAB,OAP,TEB,CEP newComponent; class RC terminal; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Terminal | `research_complete` stop state | | Teal | State/Route | Decision and routing steps (guard_pr_url, check_escalations, begin_archival) | | Orange | Handler | Existing processing steps — `●` marks modified routing targets | | Green | New Component | Six new archival steps 
(`★`) — linear chain with graceful degradation | Closes #621 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260405-101015-593986/.autoskillit/temp/make-plan/research_recipe_post_completion_archival_plan_2026-04-05_101500.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | uncached | output | cache_read | cache_write | count | time | |------|----------|--------|------------|-------------|-------|------| | plan | 2.2k | 36.6k | 1.4M | 90.3k | 1 | 16m 17s | | verify | 32 | 25.8k | 1.2M | 55.5k | 1 | 14m 5s | | implement | 48 | 14.0k | 1.9M | 50.5k | 1 | 5m 52s | | audit_impl | 16 | 9.7k | 178.9k | 55.3k | 2 | 4m 31s | | open_pr | 22 | 11.7k | 690.1k | 46.2k | 1 | 4m 26s | | **Total** | 2.3k | 97.7k | 5.4M | 297.8k | | 45m 13s | --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
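The SPLIT and TAG requirements above reduce to a short chain of git commands. A rough sketch that assembles that chain (branch, tag, and experiment names here are hypothetical; the real recipe runs each step as a separate `run_cmd` with `on_failure` routing to `research_complete`):

```python
def archival_commands(experiment_branch, base_branch, experiment_name):
    """Build the archival command chain (illustration only, not the recipe's exact commands)."""
    artifact_branch = f"research-artifacts/{experiment_name}"   # hypothetical naming
    archive_tag = f"archive/research/{experiment_name}"         # prefix per REQ-TAG-001
    return [
        # REQ-SPLIT-001/002: artifact branch cut from base, file-state copy of research/ only
        f"git checkout -b {artifact_branch} {base_branch}",
        f"git checkout {experiment_branch} -- research/",
        # REQ-SPLIT-003: single clean commit referencing the experiment
        f"git commit -m 'research artifacts: {experiment_name}'",
        # REQ-TAG-001/003: annotated tag on the experiment branch, pushed before cleanup
        f"git tag -a {archive_tag} {experiment_branch} -m 'archive {experiment_name}'",
        f"git push origin {archive_tag}",
    ]
```

The `git checkout <tree-ish> -- <path>` form copies file state rather than replaying history, which is exactly why REQ-SPLIT-002 mandates it.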
## Summary

- Add `Configure git auth for private deps` step to `patch-bump-integration.yml` and `version-bump.yml` before `uv lock` runs
- Fixes authentication failure when resolving the private `api-simulator` git dependency added in PR #613
- Mirrors the existing auth pattern already present in `tests.yml` (line 76)

## Root Cause

PR #613 added `api-simulator` as a private git dependency in `pyproject.toml`. The `tests.yml` workflow was updated with git auth, but both version-bump workflows were missed. Every PR merged to `integration` since then fails at the `uv lock` step with:

```
fatal: could not read Username for 'https://github.com': terminal prompts disabled
```

## Test plan

- [ ] This PR's own CI passes (tests.yml)
- [ ] After merge, the patch-bump workflow should succeed — verify by checking the `bump-patch` check on this PR's merge commit
- [ ] Re-run a recent failed bump-patch workflow to confirm the fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

Fixes a 3-iteration ejection loop in the merge queue pipeline by introducing ejection-cause enrichment (`ejected_ci_failure` state and `ejection_cause` field in `wait_for_merge_queue`), a CI gate after every force-push (`ci_watch_post_queue_fix` step), and two post-rebase manifest validation gates (language-aware validity check and duplicate key scan) in `resolve-merge-conflicts`. Closes all six gaps identified in #627: blind CI ejection routing, missing CI gate after re-push, absent manifest/semantic validation, and missing `head_sha` in CI results.

<details>
<summary>Individual Group Plans</summary>

### Group 1: Implementation Plan: Queue Ejection Loop Fix — PART A ONLY

This part addresses the Python code layer for the queue ejection loop fix (Gaps 2 and 5 from issue #627).

**Gap 2** — `execution/merge_queue.py` currently returns `pr_state="ejected"` for every ejection regardless of cause. When GitHub's CI fails on a merge-group commit, the recipe cannot distinguish a CI failure ejection from a conflict ejection, so it retries conflict resolution indefinitely (no-op rebase loop). The fix: when the ejection is confirmed and `checks_state == "FAILURE"`, return `pr_state="ejected_ci_failure"` plus an `ejection_cause="ci_failure"` field, allowing recipe `on_result` routing to send CI failures directly to `diagnose_ci` instead of `queue_ejected_fix`.

**Gap 5** — `server/tools_ci.py` infers `head_sha` from `git rev-parse HEAD` but never includes it in the JSON response. Recipe orchestrators cannot verify that CI results correspond to the current HEAD after a force-push. The fix: include `head_sha` in the `wait_for_ci` return dict when it was resolved.

### Group 2: Implementation Plan: Queue Ejection Loop Fix — PART B ONLY

This part addresses the recipe and skill layer of the queue ejection loop fix (Gaps 1, 3, 4, 6 from issue #627). Part A (code layer) must be implemented first — this part routes on `pr_state="ejected_ci_failure"` which Part A introduces.
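The Gap 2 enrichment amounts to a small classification gate at the point where an ejection is confirmed. A sketch of its shape (the real `wait_for_merge_queue` return dict carries more fields; `checks_state` values follow GitHub's StatusCheckRollup):

```python
def classify_ejection(checks_state):
    """Enrich a confirmed queue ejection with its cause (sketch of the Gap 2 fix)."""
    if checks_state == "FAILURE":
        # CI failed on the merge-group commit: conflict resolution cannot fix
        # this, so the recipe routes the new state to diagnose_ci instead.
        return {"pr_state": "ejected_ci_failure", "ejection_cause": "ci_failure"}
    # All other ejections keep the legacy state; the cause field is absent
    # (not null), so existing consumers are unaffected.
    return {"pr_state": "ejected"}
```

The recipe's `on_result` routing then matches `ejected_ci_failure` before the generic `ejected` route, breaking the no-op rebase loop.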
**Gap 1** — `re_push_queue_fix` routes directly to `reenter_merge_queue` after force-push, bypassing CI. Fix: insert a new `ci_watch_post_queue_fix` step between `re_push_queue_fix` and `reenter_merge_queue`, mirroring the existing `ci_watch` step.

**Gap 6** — `wait_for_queue` routes all `ejected` states to `queue_ejected_fix` (conflict resolution), even when the ejection was caused by a CI failure that conflict resolution cannot fix. Fix: add an `ejected_ci_failure` route before `ejected` in `wait_for_queue.on_result`, routing to `diagnose_ci` instead.

**Gap 3** — `resolve-merge-conflicts` SKILL.md runs only `pre-commit run --all-files` post-rebase. Fix: add Step 5a — language-detected manifest validation using fast non-compiling checks.

**Gap 4** — Even a clean rebase can produce duplicate keys when both branches independently added the same dependency. Fix: add Step 5b — targeted duplicate key scan in TOML/JSON manifest files.

Applied to: `recipes/implementation.yaml`, `recipes/remediation.yaml`, `recipes/implementation-groups.yaml`, `skills_extended/resolve-merge-conflicts/SKILL.md`.
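For the JSON half of the Gap 4 scan, `json.loads` with an `object_pairs_hook` (the mechanism Step 5b names) surfaces duplicates that normal parsing silently collapses. A sketch, covering only JSON; the TOML side would need a comparable section-aware scan:

```python
import json


def find_duplicate_keys(manifest_text):
    """Return duplicated object keys in a JSON manifest (e.g. package.json)."""
    duplicates = []

    def hook(pairs):
        # Called for every JSON object with its raw key/value pairs,
        # before duplicates are collapsed into a dict.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(manifest_text, object_pairs_hook=hook)
    return duplicates


# A rebase where both branches added the same dependency yields a duplicate key:
bad = '{"dependencies": {"left-pad": "1.0.0", "left-pad": "1.3.0"}}'
find_duplicate_keys(bad)  # → ["left-pad"]
```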
</details> ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; %% TERMINALS %% START([wait_for_queue\nrecipe step]) END_OK([release_issue_success]) END_FAIL([release_issue_failure]) END_TIMEOUT([release_issue_timeout]) END_DIAG([diagnose_ci]) subgraph MQPoll ["● Merge Queue Watcher (merge_queue.py)"] direction TB POLL["poll GitHub GraphQL\n━━━━━━━━━━\nPR state + queue state\n+ checks_state"] MERGED{"merged?"} CI_FAIL{"● checks_state\n== 'FAILURE'?"} CONFIRM["confirmation window\n━━━━━━━━━━\nnot_in_queue_cycles++"] CONFIRMED{"cycles ≥ threshold?"} STALL{"stall retries\nexhausted?"} TIMEOUT{"deadline\nexceeded?"} end subgraph EjectRoute ["● Recipe Ejection Routing (implementation.yaml)"] direction TB ROUTE{"● pr_state?"} REENROLL["reenroll_stalled_pr\n━━━━━━━━━━\ntoggle_auto_merge tool"] end subgraph ConflictFix ["● Conflict Fix Sub-Flow (implementation.yaml)"] direction TB QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"] ESC{"escalation_required?"} REPUSH["re_push_queue_fix\n━━━━━━━━━━\npush_to_remote force=true"] CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci tool\ntimeout=300s"] CI_PASS{"CI pass?"} DETECT["detect_ci_conflict\n━━━━━━━━━━\ndiagnose-ci skill"] REENTER["reenter_merge_queue\n━━━━━━━━━━\ngh pr merge --squash --auto"] end subgraph WFCITool ["● wait_for_ci tool handler (tools_ci.py)"] direction LR INFER["infer head_sha\n━━━━━━━━━━\ngit rev-parse HEAD"] 
CIWAIT["ci_watcher.wait(scope)"] ENRICH["● result includes head_sha\n━━━━━━━━━━\nverifies SHA matches HEAD\nafter force-push"] end %% MAIN FLOW %% START --> POLL POLL --> MERGED MERGED -->|"yes"| END_OK MERGED -->|"no"| CONFIRM CONFIRM --> CONFIRMED CONFIRMED -->|"no"| STALL CONFIRMED -->|"yes (not in queue)"| CI_FAIL STALL -->|"yes"| END_TIMEOUT STALL -->|"no"| TIMEOUT TIMEOUT -->|"yes"| END_TIMEOUT TIMEOUT -->|"no"| POLL CI_FAIL -->|"yes"| ROUTE CI_FAIL -->|"no"| ROUTE ROUTE -->|"ejected_ci_failure\n(● new route)"| END_DIAG ROUTE -->|"ejected"| QFIX ROUTE -->|"stalled"| REENROLL ROUTE -->|"timeout"| END_TIMEOUT REENROLL -->|"success"| START REENROLL -->|"failure"| END_FAIL QFIX --> ESC ESC -->|"true"| END_FAIL ESC -->|"false"| REPUSH REPUSH -->|"failure"| END_FAIL REPUSH -->|"success"| CI_WATCH CI_WATCH --> INFER --> CIWAIT --> ENRICH ENRICH --> CI_PASS CI_PASS -->|"failure"| DETECT CI_PASS -->|"success"| REENTER DETECT --> END_FAIL REENTER -->|"success"| START REENTER -->|"failure"| END_FAIL %% CLASS ASSIGNMENTS %% class START terminal; class END_OK,END_FAIL,END_TIMEOUT,END_DIAG terminal; class POLL,CONFIRM handler; class MERGED,CONFIRMED,STALL,TIMEOUT stateNode; class CI_FAIL,ROUTE,ESC,CI_PASS detector; class QFIX,REPUSH,REENTER handler; class REENROLL,DETECT handler; class CI_WATCH,INFER,CIWAIT,ENRICH newComponent; ``` ### State Lifecycle Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef gap 
fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; subgraph MQResult ["wait_for_merge_queue Return Dict (merge_queue.py)"] direction TB PS["● pr_state : str\n━━━━━━━━━━\nmerged | ejected\nejected_ci_failure | stalled\ntimeout | error\n(bare literals, no StrEnum)"] SUC["success : bool\n━━━━━━━━━━\ntrue only for 'merged'"] REASON["reason : str\n━━━━━━━━━━\nhuman-readable\nalways present"] STALL["stall_retries_attempted : int\n━━━━━━━━━━\nalways present\nexcept 'error' path"] EC["● ejection_cause : str\n━━━━━━━━━━\n'ci_failure' only\nwhen pr_state==ejected_ci_failure\nCONDITIONAL FIELD"] end subgraph InternalPoll ["PRFetchState — Internal Polling State (not returned)"] direction LR CHECKS["checks_state : str|None\n━━━━━━━━━━\nGitHub StatusCheckRollup\nNone = no checks configured"] INQUEUE["in_queue : bool\n━━━━━━━━━━\nPR in mergeQueue.entries"] QSTATE["queue_state : str|None\n━━━━━━━━━━\nUNMERGEABLE | AWAITING_CHECKS\n| LOCKED | null"] end subgraph Gate1 ["● Ejection Decision Gate (merge_queue.py)"] direction TB CFAIL{"checks_state\n== 'FAILURE'?"} SET_ECI["● set pr_state='ejected_ci_failure'\n━━━━━━━━━━\nejection_cause='ci_failure'\nINJECTED into result"] SET_EJ["set pr_state='ejected'\n━━━━━━━━━━\nno ejection_cause field\n(absent, not null)"] end subgraph CIScope ["CIRunScope — Frozen Input Scope (core/types)"] direction LR WF["workflow : str|None\n━━━━━━━━━━\ne.g. 
'tests.yml'"] HS["● head_sha : str|None\n━━━━━━━━━━\ngit rev-parse HEAD\nor caller-supplied"] end subgraph CIResult ["● wait_for_ci Return Dict (tools_ci.py)"] direction TB RUNID["run_id : int|None\n━━━━━━━━━━\nGitHub Actions run ID"] CONC["conclusion : str\n━━━━━━━━━━\nsuccess|failure|cancelled\naction_required|timed_out\nno_runs|error|unknown"] FJOBS["failed_jobs : list\n━━━━━━━━━━\nalways present\nempty on billing errors"] HSHA["● head_sha : str\n━━━━━━━━━━\nCONDITIONAL: present only\nwhen scope.head_sha truthy\ninjected by tool layer"] end subgraph ConsumerGate ["Recipe Routing Gate (on_result)"] direction TB ROUTE{"pr_state value?"} R1["ejected_ci_failure\n→ diagnose_ci"] R2["ejected\n→ queue_ejected_fix"] R3["merged|stalled|timeout\n→ other routes"] end %% FLOW %% CHECKS --> CFAIL INQUEUE --> CFAIL QSTATE --> CFAIL CFAIL -->|"FAILURE"| SET_ECI CFAIL -->|"other"| SET_EJ SET_ECI --> PS SET_ECI --> EC SET_EJ --> PS PS --> SUC PS --> REASON PS --> STALL HS --> CIResult WF --> CIResult RUNID --> CONC CONC --> FJOBS FJOBS --> HSHA PS --> ROUTE EC --> ROUTE ROUTE --> R1 ROUTE --> R2 ROUTE --> R3 HSHA -.->|"verifies HEAD\nafter force-push"| R2 %% CLASS ASSIGNMENTS %% class PS,EC,HSHA,SET_ECI,HS,CFAIL gap; class SUC,REASON,STALL,RUNID,CONC,FJOBS output; class CHECKS,INQUEUE,QSTATE,WF stateNode; class SET_EJ handler; class ROUTE,R1,R2,R3 detector; class InternalPoll phase; ``` ### Error/Resilience Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output 
fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; END_OK([release_issue_success]) END_FAIL([release_issue_failure\n━━━━━━━━━━\nhuman escalation\nclone preserved]) END_DIAG([diagnose_ci]) subgraph MQLoop ["● Merge Queue Poll Loop (merge_queue.py)"] direction TB POLL["GraphQL fetch\n━━━━━━━━━━\nPR + queue state"] POLL_ERR{"Exception\ncaught?"} TIMEOUT_CHK{"deadline\nexceeded?"} STALL_CHK{"stall retries\n≥ max (3)?"} end subgraph EjectGate ["● Ejection Classification Gate (merge_queue.py)"] direction TB EJECT_DECISION{"● checks_state\n== 'FAILURE'?"} CI_EJ["● ejected_ci_failure\n━━━━━━━━━━\nejection_cause=ci_failure\nskips conflict resolution"] CONF_EJ["ejected\n━━━━━━━━━━\nno cause field\nconflict resolution"] end subgraph StallBreaker ["Stall Circuit Breaker (merge_queue.py)"] direction LR TOGGLE["_toggle_auto_merge\n━━━━━━━━━━\ndisable → 2s → re-enable\nbackoff: 30/60/120s"] TOGGLE_ERR{"Exception\ncaught?"} end subgraph ConflictPath ["Conflict Resolution Path (implementation.yaml)"] direction TB QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"] ESC_CHK{"escalation\nrequired?"} REPUSH["re_push_queue_fix\n━━━━━━━━━━\nforce-push"] REPUSH_FAIL{"push\nfailed?"} end subgraph CIGate ["● CI Gate After Re-Push (implementation.yaml + tools_ci.py)"] direction TB CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci, timeout=300s\nincludes head_sha"] CI_CONC{"conclusion\n== success?"} DETECT["detect_ci_conflict\n━━━━━━━━━━\ngit merge-base check\n(stale base?)"] DETECT_CHK{"stale\nbase?"} CI_CF["ci_conflict_fix\n━━━━━━━━━━\nresolve-merge-conflicts"] end subgraph ManifestGates ["● Post-Rebase Manifest Validation (SKILL.md)"] direction TB STEP5A["● Step 5a: manifest validity\n━━━━━━━━━━\ncargo metadata / node JSON.parse\nuv lock --check / tomllib"] STEP5A_CHK{"manifest\nvalid?"} STEP5B["● Step 5b: duplicate key scan\n━━━━━━━━━━\nTOML dep sections\nJSON object_pairs_hook"] 
STEP5B_CHK{"duplicates\nfound?"} REBASE_ABORT["git rebase --abort\n━━━━━━━━━━\nescalation_required=true"] end %% POLL LOOP FLOW %% POLL --> POLL_ERR POLL_ERR -->|"yes: log + retry"| POLL POLL_ERR -->|"no"| TIMEOUT_CHK TIMEOUT_CHK -->|"yes"| END_FAIL TIMEOUT_CHK -->|"no"| STALL_CHK STALL_CHK -->|"yes: stalled"| END_FAIL STALL_CHK -->|"no: stall attempt"| TOGGLE TOGGLE --> TOGGLE_ERR TOGGLE_ERR -->|"yes: log + increment"| STALL_CHK TOGGLE_ERR -->|"no: success"| POLL %% EJECTION GATE %% STALL_CHK -->|"ejection confirmed"| EJECT_DECISION EJECT_DECISION -->|"FAILURE"| CI_EJ EJECT_DECISION -->|"other"| CONF_EJ CI_EJ --> END_DIAG CONF_EJ --> QFIX %% CONFLICT PATH %% QFIX --> STEP5A STEP5A --> STEP5A_CHK STEP5A_CHK -->|"invalid"| REBASE_ABORT STEP5A_CHK -->|"valid"| STEP5B STEP5B --> STEP5B_CHK STEP5B_CHK -->|"duplicates"| REBASE_ABORT STEP5B_CHK -->|"clean"| ESC_CHK REBASE_ABORT --> ESC_CHK ESC_CHK -->|"true"| END_FAIL ESC_CHK -->|"false"| REPUSH REPUSH --> REPUSH_FAIL REPUSH_FAIL -->|"yes"| END_FAIL REPUSH_FAIL -->|"no"| CI_WATCH %% CI GATE %% CI_WATCH --> CI_CONC CI_CONC -->|"yes"| END_OK CI_CONC -->|"no"| DETECT DETECT --> DETECT_CHK DETECT_CHK -->|"yes: stale base"| CI_CF DETECT_CHK -->|"no: code failure"| END_DIAG CI_CF --> ESC_CHK %% CLASS ASSIGNMENTS %% class END_OK,END_FAIL,END_DIAG terminal; class POLL,TOGGLE handler; class POLL_ERR,TOGGLE_ERR,TIMEOUT_CHK,STALL_CHK gap; class EJECT_DECISION,CI_CONC,DETECT_CHK,STEP5A_CHK,STEP5B_CHK,ESC_CHK,REPUSH_FAIL detector; class CI_EJ,CONF_EJ,REBASE_ABORT output; class QFIX,REPUSH,CI_WATCH,DETECT,CI_CF handler; class STEP5A,STEP5B phase; ``` Closes #627 ## Implementation Plan Plan files: - `/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_a.md` - `/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_b.md` 🤖 Generated with [Claude 
Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | uncached | output | cache_read | cache_write | count | time | |------|----------|--------|------------|-------------|-------|------| | plan | 37 | 31.7k | 1.9M | 113.2k | 1 | 11m 19s | | review | 3.4k | 5.6k | 147.3k | 41.5k | 1 | 5m 45s | | verify | 44 | 35.4k | 1.9M | 144.8k | 2 | 11m 15s | | implement | 100 | 33.5k | 4.6M | 123.5k | 2 | 12m 17s | | audit_impl | 15 | 14.0k | 279.5k | 44.2k | 1 | 3m 46s | | open_pr | 33 | 30.5k | 1.2M | 68.1k | 1 | 10m 58s | | **Total** | 3.6k | 150.8k | 9.9M | 535.3k | | 55m 23s | --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
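Gap 5's fix, described above, is a conditional field injection in the tool-layer result. A minimal sketch (field names follow the diagrams; in the real handler `head_sha` is resolved via `git rev-parse HEAD` when the caller does not supply it):

```python
def build_wait_for_ci_result(conclusion, failed_jobs, head_sha=None):
    """Assemble the wait_for_ci return dict; head_sha is present only when resolved."""
    result = {"conclusion": conclusion, "failed_jobs": failed_jobs}
    if head_sha:
        # Lets recipe orchestrators verify the CI result corresponds to the
        # current HEAD after a force-push, instead of a stale commit.
        result["head_sha"] = head_sha
    return result
```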
…Artifact Preservation (#630)

## Summary

The review-design skill has four compounding defects that make GO verdicts structurally unreachable. This plan fixes all four:

1. **Threshold unreachable** — Replace the static `>= 3` warning threshold with a proportional formula based on active dimensions (`active_dimensions * WARNING_BUDGET_PER_DIM` where budget = 5), calibrated so that the spectral-init v6 baseline (32 warnings across ~7 dimensions, deemed "substantively sound") would receive a GO verdict.
2. **Prescriptive findings** — Add evaluative-only constraints to Critical Constraints and a shared subagent evaluation scope block before Step 2, requiring findings to describe WHAT is lacking, never HOW to fix it.
3. **Scope drift** — Add a design scope boundary to the shared subagent block, prohibiting evaluation of implementation code snippets and constraining review to experimental design elements.
4. **Artifact preservation** — Enhance the `create_worktree` step in research.yaml to copy all review-cycle artifacts (dashboards, revision guidance, plan versions, resolve-design-review output) into `research/.../artifacts/`, and add a `commit_research_artifacts` step before `push_branch` to capture phase-groups and phase-plans from the worktree.
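The proportional threshold described above is simple enough to state as code. A sketch, with the verdict reduced to the warning-count comparison alone (the real Step 7 synthesis also weighs stop triggers and critical findings):

```python
WARNING_BUDGET_PER_DIM = 5  # per the plan's calibration


def warning_threshold(active_dimensions):
    # Scales with how many non-SILENT review dimensions actually ran.
    return active_dimensions * WARNING_BUDGET_PER_DIM


def exceeds_warning_budget(warning_count, active_dimensions):
    # REVISE is forced only when warnings reach the proportional budget.
    return warning_count >= warning_threshold(active_dimensions)


# Calibration check: the spectral-init v6 baseline (32 warnings, ~7 active
# dimensions) stays under its 35-warning budget, so it no longer forces REVISE.
exceeds_warning_budget(32, 7)  # → False
```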
## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; START([plan_experiment]) COMPLETE([research_complete]) STOP_OUT([design_rejected]) subgraph DesignReview ["● review_design Step (research.yaml)"] direction TB RD["● review_design<br/>━━━━━━━━━━<br/>run_skill<br/>retries: 2"] REVISE_ROUTE["revise_design<br/>━━━━━━━━━━<br/>route → plan_experiment"] RESOLVE["resolve_design_review<br/>━━━━━━━━━━<br/>run_skill, retries: 1"] end subgraph VerdictSynthesis ["● Step 7: Verdict Synthesis (review-design SKILL.md)"] direction TB SCOPE["● Evaluative Scope Gate<br/>━━━━━━━━━━<br/>Findings: WHAT is lacking<br/>Design boundary only"] RTCAP["rt_cap = RT_MAX_SEVERITY<br/>━━━━━━━━━━<br/>Downgrade red_team<br/>severity by type"] CLASSIFY["Classify findings<br/>━━━━━━━━━━<br/>critical_findings<br/>warning_findings"] ACTIVE["● active_dimensions<br/>━━━━━━━━━━<br/>count spawned non-SILENT<br/>dims (L1+L2+L3+L4+RT)"] THRESH["★ warning_threshold<br/>━━━━━━━━━━<br/>active_dims × 5<br/>WARNING_BUDGET_PER_DIM=5"] VERDICT{"● Verdict Decision<br/>━━━━━━━━━━<br/>stop_triggers?<br/>critical? 
warnings≥threshold?"} end subgraph ArtifactPath ["★ Artifact Commit Path (research.yaml)"] direction TB TEST["● test<br/>━━━━━━━━━━<br/>test_check"] FIX["fix_tests<br/>━━━━━━━━━━<br/>run_skill"] RETEST["● retest<br/>━━━━━━━━━━<br/>test_check"] COMMIT["★ commit_research_artifacts<br/>━━━━━━━━━━<br/>run_cmd: copy phase-groups<br/>phase-plans → artifacts/<br/>on_failure: push_branch"] end PUSH["push_branch<br/>━━━━━━━━━━<br/>run_cmd"] START -->|"run review_design"| RD RD -->|"STOP verdict"| RESOLVE RD -->|"REVISE verdict"| REVISE_ROUTE RD -->|"GO verdict"| create_worktree REVISE_ROUTE -->|"loop back"| START RESOLVE -->|"revised"| REVISE_ROUTE RESOLVE -->|"failed"| STOP_OUT RD -->|"on_failure / on_exhausted"| create_worktree create_worktree["create_worktree<br/>━━━━━━━━━━<br/>★ copies review-cycles<br/>plan-versions artifacts"] create_worktree --> decompose["decompose_phases<br/>plan_phase<br/>implement_phase"] decompose --> experiment["run_experiment<br/>write_report"] experiment --> TEST TEST -->|"pass"| COMMIT TEST -->|"fail"| FIX FIX --> RETEST RETEST -->|"pass"| COMMIT RETEST -->|"fail"| PUSH COMMIT -->|"success or failure"| PUSH PUSH --> COMPLETE SCOPE -.->|"constraint applied to<br/>all dimension subagents"| CLASSIFY RTCAP --> CLASSIFY CLASSIFY --> ACTIVE ACTIVE --> THRESH THRESH --> VERDICT VERDICT -->|"stop_triggers"| STOP_OUT VERDICT -->|"critical_findings or<br/>warnings ≥ threshold"| REVISE_ROUTE VERDICT -->|"else"| create_worktree class START,COMPLETE,STOP_OUT terminal; class RD,RESOLVE,decompose,experiment,FIX handler; class REVISE_ROUTE,RTCAP,CLASSIFY phase; class VERDICT,ACTIVE stateNode; class SCOPE detector; class THRESH,COMMIT,create_worktree newComponent; class TEST,RETEST,PUSH output; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Blue | Terminal | Start, complete, and terminal states | | Orange | Handler | Processing steps (run_skill, run_cmd) | | Purple | Phase | Control flow, routing, severity 
capping | | Teal | State | Decision and counting nodes | | Red | Detector | Constraint gates (evaluative scope) | | Green | New | ★ new components, ● modified components | | Dark Teal | Output | test_check steps and push_branch | Closes #629 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260405-160303-009353/.autoskillit/temp/make-plan/fix-review-design-threshold-unreachable-prescriptive-finding_plan_2026-04-05_161500.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | uncached | output | cache_read | cache_write | count | time | |------|----------|--------|------------|-------------|-------|------| | plan | 2.8k | 22.6k | 1.2M | 85.0k | 1 | 10m 36s | | verify | 30 | 14.6k | 1.5M | 74.8k | 1 | 8m 28s | | implement | 62 | 19.9k | 4.1M | 92.5k | 1 | 7m 41s | | audit_impl | 87 | 10.6k | 473.5k | 47.1k | 1 | 6m 41s | | open_pr | 25 | 11.7k | 806.3k | 48.9k | 1 | 4m 22s | | **Total** | 3.0k | 79.4k | 8.1M | 348.3k | | 37m 50s | --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ound Bash Tasks (#633)

## Summary

Headless sessions running long-lived background Bash tasks (e.g. `cargo bench` launched via `run_in_background: true`) are killed as stale because the staleness signal is JSONL file growth, not actual session liveness. When the LLM goes idle waiting for a background child, the JSONL stops growing and the 20-minute staleness threshold is breached — even though child processes are actively running. Three changes eliminate this class of false kill:

1. **`_has_active_child_processes`** — a second suppression gate in `_session_log_monitor` that checks child process CPU activity before issuing a kill. Added alongside the existing `_has_active_api_connection` port-443 gate.
2. **`RecipeStep.stale_threshold`** — an optional per-step threshold field that recipe authors can raise for steps known to run long-lived experiments, passed through `run_skill` → `run_headless_core` → `_session_log_monitor`.
3. **Recipe YAML overrides** — `stale_threshold: 2400` (40 min) on specific long-running steps in `research.yaml`, `implementation.yaml`, `remediation.yaml`, `implementation-groups.yaml`, and `merge-prs.yaml`.

## Requirements

### STALE — Staleness Suppression via Child Process Detection

- **REQ-STALE-001:** The system must detect active child processes in the headless session's process tree when the stale threshold is breached.
- **REQ-STALE-002:** The system must suppress the stale kill when any child process in the tree reports CPU usage exceeding ~10% via `cpu_percent(interval=0)`.
- **REQ-STALE-003:** The system must reset the staleness clock (`last_change`) when child process activity suppresses the stale kill, identical to the existing `_has_active_api_connection` suppression behavior.
- **REQ-STALE-004:** The child process detection must follow the established exception-handling pattern, silently skipping `NoSuchProcess`, `ZombieProcess`, and `AccessDenied` errors per process.
- **REQ-STALE-005:** The child process detection must only execute when the stale threshold has already been breached (zero performance impact during normal operation).
- **REQ-STALE-006:** The child process detection must emit a structured log warning when suppressing a stale kill, following the pattern established by `_has_active_api_connection`.

### SCHEMA — Per-Step Stale Threshold in RecipeStep

- **REQ-SCHEMA-001:** The `RecipeStep` dataclass must accept an optional `stale_threshold` field of type `int | None`, defaulting to `None`.
- **REQ-SCHEMA-002:** When `stale_threshold` is `None` on a recipe step, the global `RunSkillConfig.stale_threshold` (1200s) must apply.
- **REQ-SCHEMA-003:** The `run_skill` MCP tool handler must accept an optional `stale_threshold` parameter and forward it to `run_headless_core`.
- **REQ-SCHEMA-004:** The recipe validator must reject `stale_threshold` values that are not positive integers when set.

### RECIPE — Research Recipe Step Overrides

- **REQ-RECIPE-001:** Research-oriented recipes must set `stale_threshold: 2400` (40 minutes) on specific long-running steps (e.g., `implement_phase`, `run_experiment`).
- **REQ-RECIPE-002:** Fast-completing steps (e.g., `plan_phase`) must not have a `stale_threshold` override, relying on the global default.

### TEST — Test Coverage

- **REQ-TEST-001:** Unit tests must verify `_has_active_child_processes` returns `True` when a child process exceeds the CPU threshold.
- **REQ-TEST-002:** Unit tests must verify `_has_active_child_processes` returns `False` when all children are idle, when no children exist, and when exceptions are raised.
- **REQ-TEST-003:** An integration test must verify stale suppression when a child process is CPU-active but has no port-443 connection.
- **REQ-TEST-004:** The existing `TestSessionLogMonitorStaleSuppressionGate` test class must be extended with the child-process-active scenario.
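The child-process gate can be sketched with the process list abstracted out, so the CPU check (REQ-STALE-002) and per-process exception handling (REQ-STALE-004) are visible without a live process tree. In the real module the children would come from psutil (`Process(pid).children(recursive=True)`) and the skipped exceptions would be psutil's `NoSuchProcess`, `ZombieProcess`, and `AccessDenied`; here a duck-typed stand-in keeps the sketch self-contained:

```python
CPU_ACTIVE_THRESHOLD = 10.0  # percent, per REQ-STALE-002


def has_active_child_processes(children):
    """Return True if any child reports CPU activity above the threshold.

    `children` is any iterable of objects with a cpu_percent(interval) method
    (a duck-typed stand-in for psutil child processes).
    """
    for child in children:
        try:
            if child.cpu_percent(interval=0) > CPU_ACTIVE_THRESHOLD:
                return True
        except Exception:
            # Per REQ-STALE-004: silently skip children that vanished or
            # cannot be inspected (NoSuchProcess / ZombieProcess / AccessDenied
            # in the psutil-backed version).
            continue
    return False


class FakeChild:
    def __init__(self, cpu):
        self.cpu = cpu

    def cpu_percent(self, interval=0):
        return self.cpu


has_active_child_processes([FakeChild(2.0), FakeChild(85.0)])  # → True
has_active_child_processes([FakeChild(0.0)])                   # → False
```

On a `True` result the monitor resets `last_change` (REQ-STALE-003) instead of killing the session, mirroring the existing port-443 gate.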
## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; %% TERMINALS %% START([SESSION LAUNCHED]) T_COMPLETE([COMPLETION]) T_STALE([STALE — KILL]) %% CONFIG CHAIN %% subgraph Config ["● RECIPE STEP CONFIG (stale_threshold flow)"] direction TB RecipeStep["● RecipeStep YAML<br/>━━━━━━━━━━<br/>stale_threshold: 2400<br/>(or unset → None)"] RunSkill["● run_skill handler<br/>━━━━━━━━━━<br/>tools_execution.py<br/>stale_threshold: int | None"] Runner["DefaultSubprocessRunner<br/>━━━━━━━━━━<br/>process.py<br/>default: 1200s"] end %% PHASE 1 %% subgraph Phase1 ["PHASE 1 — JSONL File Discovery (poll 1s, timeout 30s)"] direction TB P1_Poll["Poll session_log_dir<br/>━━━━━━━━━━<br/>ctime > spawn_time?<br/>Match session_id?"] P1_Found{"File found<br/>within 30s?"} end %% PHASE 2 %% subgraph Phase2 ["● PHASE 2 — Staleness Monitor Loop (poll every 2s)"] direction TB P2_Stat["stat(session_file)<br/>━━━━━━━━━━<br/>current_size vs last_size"] P2_Grew{"JSONL<br/>grew?"} P2_Marker["Read new content<br/>━━━━━━━━━━<br/>scan for completion<br/>marker in JSONL"] P2_MarkerFound{"Completion<br/>marker found?"} P2_ResetGrow["last_size = current_size<br/>last_change = now()"] P2_Elapsed{"elapsed >=<br/>stale_threshold?"} end %% SUPPRESSION GATES %% subgraph Gates ["● SUPPRESSION GATES (only 
fire when stale threshold breached)"] direction TB Gate1["_has_active_api_connection<br/>━━━━━━━━━━<br/>Walk proc tree<br/>ESTABLISHED port-443?"] Gate1_Active{"API conn<br/>active?"} Gate2["● _has_active_child_processes<br/>━━━━━━━━━━<br/>Walk child procs<br/>cpu_percent > 10%?"] Gate2_Active{"Child CPU<br/>> 10%?"} ResetClock["last_change = now()<br/>━━━━━━━━━━<br/>Suppress stale kill<br/>reset staleness clock"] end %% CONNECTIONS %% START --> RecipeStep RecipeStep -->|"stale_threshold (int|None)"| RunSkill RunSkill -->|"float(x) or None → default 1200s"| Runner Runner -->|"stale_threshold, pid"| P1_Poll P1_Poll --> P1_Found P1_Found -->|"yes"| P2_Stat P1_Found -->|"no (30s timeout)"| T_STALE P2_Stat --> P2_Grew P2_Grew -->|"yes"| P2_ResetGrow P2_ResetGrow --> P2_Marker P2_Marker --> P2_MarkerFound P2_MarkerFound -->|"yes"| T_COMPLETE P2_MarkerFound -->|"no"| P2_Elapsed P2_Grew -->|"no"| P2_Elapsed P2_Elapsed -->|"no (wait)"| P2_Stat P2_Elapsed -->|"yes"| Gate1 Gate1 --> Gate1_Active Gate1_Active -->|"yes"| ResetClock Gate1_Active -->|"no"| Gate2 Gate2 --> Gate2_Active Gate2_Active -->|"yes"| ResetClock Gate2_Active -->|"no"| T_STALE ResetClock -->|"continue loop"| P2_Stat %% CLASS ASSIGNMENTS %% class START,T_COMPLETE,T_STALE terminal; class RecipeStep,RunSkill handler; class Runner stateNode; class P1_Poll,P2_Stat,P2_Marker,P2_ResetGrow,ResetClock phase; class P1_Found,P2_Grew,P2_MarkerFound,P2_Elapsed,Gate1_Active,Gate2_Active stateNode; class Gate1 handler; class Gate2 newComponent; ``` ### Concurrency Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent 
fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; %% TERMINALS %% START([SESSION LAUNCHED]) COMPLETE([TASK GROUP CANCELLED]) %% MAIN THREAD: Sequential setup %% subgraph MainSeq ["MAIN COROUTINE — Sequential Setup"] direction TB SpawnProc["Spawn Claude Code process<br/>━━━━━━━━━━<br/>asyncio subprocess<br/>get proc.pid"] CreateAcc["Create RaceAccumulator + trigger<br/>━━━━━━━━━━<br/>anyio.Event (idempotent set)<br/>channel_b_ready Event"] OpenTG["anyio.create_task_group()<br/>━━━━━━━━━━<br/>Fork: start 4–5 coroutines<br/>as tg.start_soon(...)"] TrigWait["await trigger.wait()<br/>━━━━━━━━━━<br/>Block until first watcher wins<br/>(or wall-clock timeout)"] DrainWait["Optional drain window<br/>━━━━━━━━━━<br/>await channel_b_ready if<br/>process exited but B pending"] CancelTG["tg.cancel_scope.cancel()<br/>━━━━━━━━━━<br/>Tear down all remaining tasks"] Resolve["resolve_termination(RaceSignals)<br/>━━━━━━━━━━<br/>Priority: exit > stale > completion"] end %% TASK GROUP: Concurrent watchers %% subgraph TaskGroup ["anyio TASK GROUP — Concurrent Watchers (cooperative, single event loop)"] direction LR subgraph ChA ["Channel A"] WatchProc["_watch_process<br/>━━━━━━━━━━<br/>await proc.wait()<br/>acc.process_exited=True"] WatchHB["_watch_heartbeat<br/>━━━━━━━━━━<br/>poll stdout NDJSON 0.5s<br/>acc.channel_a_confirmed=True"] end subgraph ChB ["● Channel B — Session Log"] ExtractID["_extract_stdout_session_id<br/>━━━━━━━━━━<br/>poll stdout for type=system<br/>sets stdout_session_id_ready"] WatchSL["● _watch_session_log<br/>━━━━━━━━━━<br/>calls _session_log_monitor<br/>acc.channel_b_status=COMPLETION|STALE"] end end %% STALENESS SUPPRESSION %% subgraph StaleGates ["● STALENESS SUPPRESSION — Sync psutil walks (inside _session_log_monitor)"] direction TB 
Gate1["_has_active_api_connection(pid)<br/>━━━━━━━━━━<br/>[parent + children(recursive=True)]<br/>net_connections port-443 ESTABLISHED?"] Gate2["● _has_active_child_processes(pid)<br/>━━━━━━━━━━<br/>[children(recursive=True) only]<br/>cpu_percent(interval=0) > 10%?"] ResetClock["last_change = monotonic()<br/>━━━━━━━━━━<br/>suppress stale kill<br/>continue Phase 2 loop"] ReturnStale["return STALE<br/>━━━━━━━━━━<br/>acc.channel_b_status = STALE<br/>trigger.set()"] end %% FLOW %% START --> SpawnProc SpawnProc --> CreateAcc CreateAcc --> OpenTG OpenTG -->|"tg.start_soon"| WatchProc OpenTG -->|"tg.start_soon"| WatchHB OpenTG -->|"tg.start_soon"| ExtractID OpenTG -->|"tg.start_soon"| WatchSL WatchProc -->|"trigger.set()"| TrigWait WatchHB -->|"trigger.set()"| TrigWait WatchSL -->|"trigger.set() after drain"| TrigWait WatchSL -->|"stale threshold breached"| Gate1 Gate1 -->|"no API conn"| Gate2 Gate2 -->|"child CPU active"| ResetClock Gate2 -->|"no activity"| ReturnStale Gate1 -->|"API conn active"| ResetClock ResetClock -->|"continue loop"| WatchSL TrigWait --> DrainWait DrainWait --> CancelTG CancelTG --> Resolve Resolve --> COMPLETE %% CLASS ASSIGNMENTS %% class START,COMPLETE terminal; class SpawnProc,CreateAcc,TrigWait,DrainWait,CancelTG,Resolve phase; class OpenTG detector; class WatchProc,WatchHB handler; class ExtractID handler; class WatchSL handler; class Gate1 handler; class Gate2 newComponent; class ResetClock output; class ReturnStale detector; ``` ### State Lifecycle Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent 
fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; %% TERMINALS %% START([RECIPE YAML LOADED]) T_PASS([VALID — forwarded to run_skill]) T_FAIL([INVALID — validation error]) %% PARSE LAYER %% subgraph Parse ["● YAML → RecipeStep (io.py _parse_step)"] direction TB YAMLRead["● YAML key read<br/>━━━━━━━━━━<br/>data.get('stale_threshold')<br/>absent → None (no coercion)"] Construct["● RecipeStep(...)<br/>━━━━━━━━━━<br/>stale_threshold: int | None = None<br/>No __post_init__ mutations"] IntegrityGuard["_PARSE_STEP_HANDLED_FIELDS guard<br/>━━━━━━━━━━<br/>compile-time assert: fields == dataclass<br/>RuntimeError if diverged"] end %% VALIDATION LAYER %% subgraph Validation ["● STRUCTURAL VALIDATION (validator.py validate_recipe)"] direction TB IsNone{"stale_threshold<br/>is None?"} TypeCheck{"isinstance(int)<br/>AND > 0?"} AppendError["append error<br/>━━━━━━━━━━<br/>'must be positive integer<br/>when set'"] PassThrough["field passes<br/>━━━━━━━━━━<br/>no validation error<br/>for None or valid int"] end %% SEMANTIC LAYER %% subgraph Semantic ["● SEMANTIC RULE — _TOOL_PARAMS registry (rules_tools.py)"] direction TB ToolParamsCheck["_TOOL_PARAMS['run_skill']<br/>━━━━━━━━━━<br/>frozenset includes 'stale_threshold'<br/>dead-with-param rule: NO warning"] OtherToolWarn["Other tools<br/>━━━━━━━━━━<br/>stale_threshold not in their params<br/>dead-with-param: WARNING emitted"] end %% EXECUTION FORWARDING %% subgraph Execution ["EXECUTION FORWARDING (tools_execution.py run_skill)"] direction TB NullPath["stale_threshold = None<br/>━━━━━━━━━━<br/>→ DefaultSubprocessRunner default<br/>= 1200s (global config)"] OverridePath["stale_threshold = int<br/>━━━━━━━━━━<br/>float(stale_threshold)<br/>→ 
overrides global default"] Monitor["_session_log_monitor<br/>━━━━━━━━━━<br/>stale_threshold used as<br/>breach-detection window"] end %% FLOW %% START --> YAMLRead YAMLRead --> Construct Construct --> IntegrityGuard IntegrityGuard -->|"fields match — import OK"| IsNone IsNone -->|"yes (absent or None)"| PassThrough IsNone -->|"no (value present)"| TypeCheck TypeCheck -->|"valid"| PassThrough TypeCheck -->|"invalid (non-int or ≤ 0)"| AppendError AppendError --> T_FAIL PassThrough --> ToolParamsCheck ToolParamsCheck -->|"tool: run_skill"| T_PASS ToolParamsCheck -->|"other tool"| OtherToolWarn T_PASS --> NullPath T_PASS --> OverridePath NullPath --> Monitor OverridePath --> Monitor Monitor --> T_PASS %% CLASS ASSIGNMENTS %% class START,T_PASS,T_FAIL terminal; class YAMLRead,Construct handler; class IntegrityGuard detector; class IsNone,TypeCheck stateNode; class AppendError detector; class PassThrough output; class ToolParamsCheck newComponent; class OtherToolWarn gap; class NullPath,OverridePath,Monitor phase; ``` Closes #631 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260405-170436-566038/.autoskillit/temp/make-plan/fix_false_stale_kills_plan_2026-04-05_000000.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | uncached | output | cache_read | cache_write | count | time | |------|----------|--------|------------|-------------|-------|------| | plan | 2.8k | 45.6k | 2.0M | 151.7k | 2 | 19m 31s | | verify | 62 | 36.0k | 3.3M | 155.3k | 2 | 15m 1s | | implement | 149 | 47.2k | 9.6M | 183.8k | 2 | 16m 24s | | audit_impl | 102 | 20.0k | 762.1k | 90.1k | 2 | 10m 31s | | open_pr | 69 | 39.4k | 2.6M | 116.8k | 2 | 15m 32s | | review_pr | 38 | 57.4k | 1.8M | 103.1k | 1 | 18m 47s | | resolve_review | 55 | 32.5k | 3.1M | 84.3k | 1 | 14m 9s | | fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s | | **Total** | 3.3k | 292.6k | 24.3M | 943.5k | | 1h 59m | --------- Co-authored-by: Claude 
Sonnet 4.6 <noreply@anthropic.com>
## Summary
All four bundled recipes (`implementation`, `remediation`, `merge-prs`, `implementation-groups`) currently ship with `audit: default: "true"`, meaning `audit-impl` runs unless explicitly disabled. This plan changes all four recipes to `default: "false"` so `audit-impl` is skipped by default and becomes opt-in. No structural changes to the step graph, routing, or test infrastructure are needed — only the ingredient default changes.
**Scope:** 4 YAML ingredient default changes + 1 test assertion added.
## Requirements
### RCFG — Recipe Configuration
- **REQ-RCFG-001:** The `audit` input in `implementation.yaml` must default to `"false"`.
- **REQ-RCFG-002:** The `audit` input in `implementation-groups.yaml` must default to `"false"`.
- **REQ-RCFG-003:** The `audit` input in `remediation.yaml` must default to `"false"`.
- **REQ-RCFG-004:** The `audit` input in `merge-prs.yaml` must default to `"false"`.
- **REQ-RCFG-005:** The `audit_impl` step definition and its `skip_when_false: "inputs.audit"` guard must remain unchanged in all recipes.
- **REQ-RCFG-006:** Callers must still be able to opt in to audit-impl by passing `audit: "true"` at pipeline invocation time.
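The single in-scope test assertion can be sketched without any recipe-loader dependency. The real version would parametrize over the four YAML files with `@pytest.mark.parametrize` (per the diagram's `test_audit_ingredient_defaults_to_false`); this sketch assumes recipes are already parsed into plain mappings:

```python
# The four bundled recipes that declare an `audit` ingredient.
RECIPES_WITH_AUDIT = [
    "implementation.yaml",
    "implementation-groups.yaml",
    "remediation.yaml",
    "merge-prs.yaml",
]


def audit_default(recipe_data: dict) -> str:
    """Pull the `audit` ingredient's default out of a parsed recipe mapping."""
    return recipe_data["inputs"]["audit"]["default"]


def recipes_with_wrong_default(recipes: dict[str, dict]) -> list[str]:
    """Return the names of recipes whose audit default is not 'false'."""
    return [name for name, data in recipes.items() if audit_default(data) != "false"]
```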
## Architecture Impact
### Process Flow Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
%% CLASS DEFINITIONS %%
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
%% TERMINALS %%
START([Pipeline Invoked])
CONTINUE([Continue to push / merge])
ERROR([escalate_stop / register_clone_failure])
subgraph Ingredient ["● Ingredient Resolution"]
direction TB
AuditIng["● audit ingredient<br/>━━━━━━━━━━<br/>BEFORE: default='true'<br/>AFTER: default='false'"]
end
subgraph Gate ["skip_when_false Gate"]
direction TB
SkipCheck{"inputs.audit == 'true'?"}
SkipBypass["BYPASS<br/>━━━━━━━━━━<br/>Skip audit_impl<br/>(now default path)"]
RunAudit["● run audit-impl skill<br/>━━━━━━━━━━<br/>runs /autoskillit:audit-impl<br/>(now opt-in path)"]
Verdict{"GO / NO GO?"}
Remediate["remediate<br/>━━━━━━━━━━<br/>Route to remediation<br/>or re-plan"]
end
%% FLOW %%
START --> AuditIng
AuditIng -->|"resolves to 'false'<br/>(new default)"| SkipCheck
SkipCheck -->|"false (default — bypass)"| SkipBypass
SkipCheck -->|"true (opt-in — explicit)"| RunAudit
RunAudit --> Verdict
Verdict -->|"GO"| CONTINUE
Verdict -->|"NO GO"| Remediate
Verdict -->|"error"| ERROR
Remediate -->|"re-plan loop"| START
SkipBypass --> CONTINUE
%% CLASS ASSIGNMENTS %%
class START,CONTINUE,ERROR terminal;
class AuditIng handler;
class SkipCheck,Verdict stateNode;
class SkipBypass phase;
class RunAudit detector;
class Remediate phase;
```
**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start, continuation, and error states |
| Teal | State | Decision gates (skip_when_false, GO/NO GO) |
| Orange | Handler | ● Audit ingredient (modified: default flipped to "false") |
| Red | Detector | ● audit-impl skill execution (now opt-in path) |
| Purple | Phase | Bypass path (now default) and remediation routing |
### State Lifecycle Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
%% CLASS DEFINITIONS %%
classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
%% TERMINALS %%
START([Recipe Invoked])
GATE([skip_when_false Evaluated])
subgraph Contracts ["● INGREDIENT CONTRACT DEFINITIONS"]
direction TB
ImplYaml["● implementation.yaml<br/>━━━━━━━━━━<br/>audit:<br/> default: 'false'<br/>(was: 'true')"]
ImplGroupsYaml["● implementation-groups.yaml<br/>━━━━━━━━━━<br/>audit:<br/> default: 'false'<br/>(was: 'true')"]
RemediationYaml["● remediation.yaml<br/>━━━━━━━━━━<br/>audit:<br/> default: 'false'<br/>(was: 'true')"]
MergePrsYaml["● merge-prs.yaml<br/>━━━━━━━━━━<br/>audit:<br/> default: 'false'<br/>(was: 'true')"]
end
subgraph Resolution ["INIT_ONLY: Ingredient Resolution"]
direction TB
CallerSupplied["Caller-supplied value<br/>━━━━━━━━━━<br/>audit='true' (opt-in)<br/>INIT_ONLY — frozen for run"]
DefaultApplied["● Contract default applied<br/>━━━━━━━━━━<br/>audit='false'<br/>INIT_ONLY — frozen for run"]
end
subgraph TestGate ["● CONTRACT VALIDATION (test_bundled_recipes.py)"]
direction TB
TestAssert["● test_audit_ingredient_defaults_to_false<br/>━━━━━━━━━━<br/>@pytest.mark.parametrize<br/>asserts audit.default == 'false'<br/>for all 4 recipes"]
end
%% FLOW %%
START -->|"caller passes audit='true'"| CallerSupplied
START -->|"no audit arg (default)"| DefaultApplied
ImplYaml --> DefaultApplied
ImplGroupsYaml --> DefaultApplied
RemediationYaml --> DefaultApplied
MergePrsYaml --> DefaultApplied
CallerSupplied --> GATE
DefaultApplied --> GATE
Contracts -.->|"validated by"| TestAssert
%% CLASS ASSIGNMENTS %%
class START terminal;
class GATE stateNode;
class ImplYaml,ImplGroupsYaml,RemediationYaml,MergePrsYaml handler;
class CallerSupplied detector;
class DefaultApplied phase;
class TestAssert gap;
```
**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline invocation point |
| Teal | Gate | skip_when_false evaluation (INIT_ONLY field read) |
| Orange | Contract | ● Recipe YAML ingredient contract definitions (modified) |
| Red | Opt-in | Caller-supplied value override (explicit audit='true') |
| Purple | Default | ● Contract default applied (now 'false') |
| Yellow | Test | ● Contract validation test assertion (new) |
Closes #632
## Implementation Plan
Plan file: `/home/talon/projects/autoskillit-runs/impl-20260405-180825-135856/.autoskillit/temp/make-plan/feat_default_audit_impl_off_plan_2026-04-05_181000.md`
🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit
## Token Usage Summary
| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 60.3k | 4.0M | 213.2k | 3 | 24m 25s |
| verify | 82 | 43.0k | 3.9M | 193.2k | 3 | 22m 22s |
| implement | 176 | 53.6k | 10.3M | 221.3k | 3 | 18m 51s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 101 | 60.0k | 3.7M | 168.5k | 3 | 22m 39s |
| review_pr | 71 | 112.5k | 3.4M | 189.2k | 2 | 33m 19s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.5k | 409.5k | 31.4M | 1.3M | | 2h 41m |
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Increase sensitivity to catch quota exhaustion earlier, giving more buffer before hard API limits are hit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l for Experiment Failures (#636) ## Summary This plan adds automated failure diagnosis to the research pipeline (issue #635). There are two distinct requirements: **DIAG**: Create a `troubleshoot-experiment` skill that reads session logs and process traces to classify why a research step failed, then emit a structured diagnostic artifact and `is_fixable` signal. Wire this skill into `research.yaml` so that `implement_phase` failures route to it instead of dying at `escalate_stop`. **SEP**: Fix the structural misuse of `retry-worktree` in `implement_phase`. The skill `retry-worktree` is designed to *resume* context-exhausted `implement-worktree` sessions — it is not a primary implementation driver. The research recipe already has the correct purpose-built skill: `implement-experiment`, which explicitly forbids experiment execution during implementation and routes context exhaustion directly to `run-experiment`. Switching `implement_phase` to use `implement-experiment` addresses REQ-SEP-001 and REQ-SEP-002 at the skill level, where the constraint is enforceable. ## Requirements ### DIAG — Experiment Failure Diagnosis - **REQ-DIAG-001:** The system must provide a skill that investigates why a research recipe step failed by reading session logs and process traces. - **REQ-DIAG-002:** The skill must classify the failure type (stale timeout, context exhaustion, build failure, data missing, parameter issue, unknown). - **REQ-DIAG-003:** The skill must emit a structured diagnostic artifact that downstream steps or the human can act on. - **REQ-DIAG-004:** The research recipe must route experiment failures to the diagnostic skill instead of `escalate_stop`. ### SEP — Structural Separation of Implementation and Execution - **REQ-SEP-001:** Implementation worktree steps must not perform experiment execution (benchmarks, profiling, data collection). 
- **REQ-SEP-002:** Experiment execution must route through the `run_experiment` step (or equivalent) which has appropriate timeout and retry semantics. ## Architecture Impact ### Process Flow Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; START([RESEARCH PIPELINE]) ESCALATE([escalate_stop]) COMPLETE([research_complete]) subgraph PhaseMgmt ["Phase Management"] plan_phase["● plan_phase<br/>━━━━━━━━━━<br/>make-plan skill<br/>plans current group"] implement_phase["● implement_phase<br/>━━━━━━━━━━<br/>implement-experiment<br/>(was: retry-worktree)<br/>stale_threshold: 2400"] next_phase{"next_phase_or_experiment<br/>━━━━━━━━━━<br/>more phases?"} end subgraph DiagPhase ["★ Failure Diagnosis (NEW)"] troubleshoot["★ troubleshoot_implement_failure<br/>━━━━━━━━━━<br/>troubleshoot-experiment skill<br/>worktree_path + implement_phase"] route_fix{"★ route_implement_failure<br/>━━━━━━━━━━<br/>is_fixable?"} end subgraph SkillInternals ["★ troubleshoot-experiment Internals"] direction TB init_idx["★ initialize code-index<br/>━━━━━━━━━━<br/>set_project_path(worktree_path)"] session_lookup["★ locate failed session<br/>━━━━━━━━━━<br/>sessions.jsonl<br/>select success=false + cwd match"] read_diags["★ read session diagnostics<br/>━━━━━━━━━━<br/>summary.json: termination_reason<br/>write_call_count, exit_code<br/>anomalies.jsonl: kind, severity"] classify{"★ classify failure 
type<br/>━━━━━━━━━━<br/>priority-ordered<br/>decision table"} write_diag["★ diagnosis_{ts}.md<br/>━━━━━━━━━━<br/>failure_type, is_fixable<br/>evidence + recommended action"] emit_tokens["★ emit output tokens<br/>━━━━━━━━━━<br/>diagnosis_path=<br/>failure_type=<br/>is_fixable="] end subgraph ExperimentPhase ["Experiment Phase"] run_experiment["run_experiment<br/>━━━━━━━━━━<br/>run-experiment skill<br/>stale_threshold: 2400, retries: 2"] end START --> plan_phase plan_phase --> implement_phase implement_phase -->|"on_success"| next_phase implement_phase -->|"on_failure"| troubleshoot implement_phase -->|"on_exhausted / on_context_limit"| run_experiment next_phase -->|"more_phases"| plan_phase next_phase -->|"done"| run_experiment troubleshoot --> init_idx init_idx --> session_lookup session_lookup -->|"session found"| read_diags session_lookup -->|"no session / missing log"| write_diag read_diags --> classify classify -->|"context_limit → context_exhaustion, fixable=true"| write_diag classify -->|"stale + write=0 → stale_timeout, fixable=true"| write_diag classify -->|"exit!=0 + build error → build_failure, fixable=true"| write_diag classify -->|"infra error / OOM → environment_error, fixable=false"| write_diag classify -->|"unknown"| write_diag write_diag --> emit_tokens emit_tokens --> route_fix route_fix -->|"is_fixable=true"| plan_phase route_fix -->|"is_fixable=false"| ESCALATE troubleshoot -->|"on_failure (skill crash)"| ESCALATE run_experiment --> COMPLETE class START,ESCALATE,COMPLETE terminal; class plan_phase,implement_phase handler; class next_phase,route_fix,classify stateNode; class troubleshoot,init_idx,session_lookup,read_diags,write_diag,emit_tokens newComponent; ``` ### Module Dependency Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%% graph TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef 
handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff; subgraph L2_Recipe ["L2 — Recipe System"] recipe_io["recipe/io.py<br/>━━━━━━━━━━<br/>load_recipe, builtin_recipes_dir"] recipe_validator["recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe"] recipe_contracts["recipe/contracts.py<br/>━━━━━━━━━━<br/>contract card generation"] end subgraph L1_Workspace ["L1 — Workspace"] workspace_skills["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>discovers skills_extended/"] end subgraph L0_Core ["L0 — Core"] core_paths["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()<br/>canonical package root"] end subgraph DataRecipes ["Data — Recipes (YAML)"] research_yaml["● recipes/research.yaml<br/>━━━━━━━━━━<br/>implement-experiment (was: retry-worktree)<br/>on_failure → troubleshoot_implement_failure<br/>on_exhausted → run_experiment"] end subgraph DataContracts ["Data — Contracts (YAML)"] skill_contracts["● recipe/skill_contracts.yaml<br/>━━━━━━━━━━<br/>★ troubleshoot-experiment entry<br/>is_fixable output pattern"] end subgraph DataSkills ["Data — Skills (SKILL.md)"] troubleshoot_skill["★ skills_extended/troubleshoot-experiment/<br/>━━━━━━━━━━<br/>session log reader<br/>failure classifier, is_fixable emitter"] implement_exp["skills_extended/implement-experiment/<br/>━━━━━━━━━━<br/>no experiment execution<br/>routes exhaustion → run-experiment"] end subgraph Tests ["Tests"] test_diag["★ tests/recipe/test_research_recipe_diag.py<br/>━━━━━━━━━━<br/>validates research.yaml routing<br/>asserts skill_command swap"] test_contracts["★ tests/skills/test_troubleshoot_experiment_contracts.py<br/>━━━━━━━━━━<br/>SkillResolver discovery<br/>SKILL.md existence"] 
test_skills_ws["● tests/workspace/test_skills.py<br/>━━━━━━━━━━<br/>skill count +1"] end recipe_io -->|"loads at runtime"| research_yaml recipe_validator -->|"validates"| research_yaml recipe_contracts -->|"loads at runtime"| skill_contracts research_yaml -->|"skill_command references"| troubleshoot_skill research_yaml -->|"skill_command references"| implement_exp skill_contracts -->|"contract entry for"| troubleshoot_skill workspace_skills -->|"discovers via pkg_root()"| troubleshoot_skill workspace_skills -->|"uses"| core_paths test_diag -->|"imports"| recipe_io test_diag -->|"imports"| recipe_validator test_contracts -->|"imports"| workspace_skills test_contracts -->|"imports"| core_paths test_skills_ws -->|"imports"| workspace_skills class recipe_io,recipe_validator,recipe_contracts phase; class workspace_skills handler; class core_paths stateNode; class research_yaml,skill_contracts output; class troubleshoot_skill newComponent; class implement_exp handler; class test_diag,test_contracts newComponent; class test_skills_ws handler; ``` Closes #635 ## Implementation Plan Plan file: `/home/talon/projects/autoskillit-runs/impl-20260405-193031-162971/.autoskillit/temp/make-plan/research_recipe_troubleshoot_plan_2026-04-05_193500.md` 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | uncached | output | cache_read | cache_write | count | time | |------|----------|--------|------------|-------------|-------|------| | plan | 2.9k | 93.0k | 4.6M | 271.4k | 4 | 37m 40s | | verify | 109 | 64.2k | 5.4M | 277.1k | 4 | 28m 55s | | implement | 224 | 71.2k | 12.5M | 282.5k | 4 | 32m 50s | | audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s | | open_pr | 131 | 76.9k | 4.8M | 232.2k | 4 | 27m 43s | | review_pr | 100 | 134.7k | 4.3M | 237.6k | 3 | 38m 8s | | resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s | | fix | 91 | 32.1k | 3.8M | 120.9k | 2 | 21m 36s | | diagnose_ci | 13 | 1.4k | 161.4k | 15.6k | 1 | 
37s | | resolve_ci | 18 | 3.7k | 293.8k | 29.1k | 1 | 3m 2s | | **Total** | 3.8k | 542.7k | 40.5M | 1.7M | | 3h 40m | --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
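The priority-ordered decision table inside `troubleshoot-experiment` (shown in the classify node of the process-flow diagram above) can be sketched as a plain function. The field names mirror the diagram (`termination_reason`, `write_call_count`, exit code, log tail); the specific string probes and the fixability of `unknown` are assumptions:

```python
def classify_failure(
    termination_reason: str,
    write_call_count: int,
    exit_code: int,
    log_tail: str,
) -> tuple[str, bool]:
    """Return (failure_type, is_fixable), checking causes in priority order."""
    if termination_reason == "context_limit":
        return ("context_exhaustion", True)
    if termination_reason == "stale" and write_call_count == 0:
        return ("stale_timeout", True)
    if exit_code != 0 and ("error[" in log_tail or "compilation failed" in log_tail):
        return ("build_failure", True)
    if "OOM" in log_tail or "infra error" in log_tail:
        return ("environment_error", False)  # not fixable by re-planning
    return ("unknown", False)  # fixability of 'unknown' is an assumption here
```

`route_implement_failure` then loops back to `plan_phase` when `is_fixable` is true and escalates otherwise.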
```python
from pathlib import Path
from typing import Any, Literal

import httpx
```
[warning] arch: httpx imported at module level — adds a hard runtime dep on every CLI invocation even when update checks are suppressed. Should be deferred inside the function that uses it.
Investigated — this is intentional. Duplicate of 3084082999. (category: false_positive_intentional_pattern)
```python
)


def _check_plugin_cache_exists(
```
[warning] bugs: _check_plugin_cache_exists has no exception handler around detect_install(). Any classification error will surface as an unhandled exception rather than a graceful DoctorResult.
Investigated — this is intentional. Duplicate of 3084083006. (category: false_positive_intentional_pattern)
```python
return

async with anyio.create_task_group() as tg:
    await tg.start(_watch, tg.cancel_scope)
```
[warning] bugs: mcp_server.run_async() exception before cancel_scope is cancelled propagates silently in the task group rather than being logged.
Investigated — this is intentional. Duplicate of 3084083010. (category: design_intent_misread)
```python
import sys

from autoskillit.cli._ansi import supports_color
from autoskillit.cli._init_helpers import _require_interactive_stdin
```
[warning] arch: _timed_input imports from _init_helpers creating a circular-risk peer dependency; _require_interactive_stdin should be inlined or moved to a lower layer.
Investigated — this is intentional. Duplicate of 3084083014. (category: false_positive_intentional_pattern)
```python
# Deferred import breaks the circular dependency with _analysis.py.
from autoskillit.recipe._analysis import _build_step_graph, extract_blocks  # noqa: PLC0415

recipe.blocks = extract_blocks(recipe, _build_step_graph(recipe))
```
[warning] arch: load_recipe mutates Recipe.blocks after construction via extract_blocks, breaking the dataclass immutability contract. Code that calls _parse_recipe directly (e.g. contracts.py) gets a Recipe with empty blocks, silently.
Investigated — this is intentional. Duplicate of 3084083031. (category: false_positive_intentional_pattern)
    budget_entry = _budget_for(bctx.block.name)
    if "run_cmd" not in budget_entry:
        return []  # No run_cmd budget declared for this block — skip check
    budget = int(budget_entry["run_cmd"])
[warning] bugs: _check_block_run_cmd_budget: int(budget_entry['run_cmd']) will raise ValueError/TypeError if the YAML value is not a valid integer (e.g. a float or string). Should guard with try/except or explicit type check.
    @lru_cache(maxsize=1)
    def _block_budgets() -> Mapping[str, Mapping[str, Any]]:
        """Load block_budgets.yaml, cached for the lifetime of the process."""
        path = pkg_root() / "recipe" / "block_budgets.yaml"
[warning] defense: _block_budgets() catches FileNotFoundError but not YAMLError or ValueError from load_yaml. A malformed block_budgets.yaml returns empty dict and all block rules silently skip.
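One way to address this is to keep returning an empty mapping in both cases but log the malformed case loudly, so the silent-skip is at least observable. The sketch below is an assumption: it uses `json` as a stand-in for the project's YAML loader (to stay stdlib-only here), and the function name differs from the real `_block_budgets()`.

```python
import json
import logging
from functools import lru_cache
from pathlib import Path

log = logging.getLogger(__name__)


@lru_cache(maxsize=1)
def load_budgets(path_str: str) -> dict:
    """Sketch: a genuinely absent budgets file means 'no budgets' quietly,
    but a malformed one is logged before degrading to the same empty dict."""
    path = Path(path_str)
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}  # legitimately absent: block rules skip by design
    except (ValueError, OSError):
        log.warning("malformed budgets file %s; block rules will be skipped", path)
        return {}
```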
    should never contain these characters, but the guard makes the failure
    loud and free.
    """
    if "\n" in temp_dir_relpath or ": " in temp_dir_relpath:
[warning] defense: substitute_temp_placeholder only guards against newline and ': '. A path containing # or [ could still produce malformed YAML. The guard comment says 'filesystem paths should never contain these' but this is an assertion, not enforcement.
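A broader guard could reject every character that can change YAML structure when the path is spliced into a plain scalar. The character set below is an assumption (and the safer alternative would be to always quote the scalar via the YAML dumper rather than validate at all):

```python
def is_yaml_safe_path(relpath: str) -> bool:
    """Hypothetical widened guard: reject characters that are structurally
    significant in an unquoted YAML scalar, not just newline and ': '."""
    if ": " in relpath:
        return False
    return not any(ch in relpath for ch in "\n\r#[]{}&*!|>'\"%@`")
```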
    @@ -97,6 +123,9 @@ class Recipe:
         kitchen_rules: list[str] = field(default_factory=list)
         version: str | None = None
         experimental: bool = False
    +    requires_packs: list[str] = field(default_factory=list)
    +    # Populated by extract_blocks() during load; empty tuple for recipes with no block: anchors.
    +    blocks: tuple[RecipeBlock, ...] = field(default_factory=tuple)
[warning] arch: Recipe.blocks has a two-phase initialization pattern: _parse_recipe produces empty blocks, while load_recipe populates them. Code calling _parse_recipe directly gets a Recipe with empty blocks silently, with no sentinel or incomplete-state marker.
        if isinstance(data, dict):
            spec = _parse_experiment_type(data, path)
            result[spec.name] = spec
    except Exception:
[warning] defense: _load_types_from_dir catches bare Exception and logs a warning, but this also hides TypeError/AttributeError from _parse_experiment_type bugs during development. Narrow to (ValueError, TypeError, OSError, KeyError).
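The narrowed handler the warning proposes would look roughly like this. `parse` stands in for `_parse_experiment_type` and the entry shape is assumed; the point is only which exceptions are tolerated versus propagated.

```python
import logging

log = logging.getLogger(__name__)


def load_specs(entries, parse):
    """Sketch: malformed *data* is tolerated and logged, but programming
    errors (AttributeError, NameError, ...) still propagate so they surface
    during development instead of being swallowed."""
    result = {}
    for name, data in entries:
        try:
            result[name] = parse(data)
        except (ValueError, TypeError, OSError, KeyError) as exc:
            log.warning("skipping experiment type %s: %s", name, exc)
    return result
```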
    _AUTOSKILLIT_LOG_DIR_ENV = "AUTOSKILLIT_LOG_DIR"


    def _read_quota_cache(cache_path_str: str, max_age: int) -> dict | None:
[warning] cohesion: _read_quota_cache is duplicated verbatim from quota_guard.py. These two sibling stdlib hooks share identical logic that should live in _hook_settings.py to avoid drift.
    return None


    def _resolve_quota_log_dir() -> Path | None:
[warning] cohesion: _resolve_quota_log_dir is duplicated verbatim from quota_guard.py. Same consolidation opportunity.
    return None


    def _write_quota_log_event(event: dict, log_dir: Path | None) -> None:
[warning] cohesion: _write_quota_log_event is duplicated verbatim from quota_guard.py. Three identical helper functions across two quota hooks with no shared home.
    _age = datetime.now(UTC) - _opened_at
    if _age.total_seconds() >= _ttl_hours * 3600:
        _p.unlink()
    except Exception:
[warning] defense: On a corrupt marker file, the sweep calls _p.unlink() — silently deleting a file that may be temporarily unreadable (e.g. EINTR or disk flush). At minimum log before deleting.
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(payload)
    os.replace(tmp, marker_path)
    except Exception:
[warning] defense: On exception in _write_kitchen_marker, bare raise re-raises after cleanup, but the caller catches and emits a warning message — the original exception traceback is lost to the user (only str(e) survives). Consider logging traceback to stderr.
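The caller-side fix is small: `logging.exception()` inside an `except` block records the full traceback before degrading to a warning. `write` below stands in for `_write_kitchen_marker`; the boolean return is an assumption about how the caller signals degradation.

```python
import logging

log = logging.getLogger(__name__)


def write_marker_safely(write, path: str) -> bool:
    """Sketch: preserve the traceback instead of collapsing it to str(e).
    log.exception() attaches the active exception's traceback to the record."""
    try:
        write(path)
    except Exception:
        log.exception("failed to write kitchen marker at %s", path)
        return False
    return True
```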
    for remote_name in ("upstream", "origin"):
        result = _probe_single_remote(source, remote_name)
        last_result = result
        if result.reason == "ok" and _is_not_file_url(result.url):
[warning] bugs: _probe_clone_source_url: when both upstream and origin timeout, caller gets 'timeout' reason with no indication that both probes timed out.
    def __init__(self) -> None:
        self._resolver = SkillResolver()

    def __init__(self, temp_dir_relpath: str = ".autoskillit/temp") -> None:
        if "\n" in temp_dir_relpath or ": " in temp_dir_relpath:
[warning] defense: SkillsDirectoryProvider.init validates temp_dir_relpath for newline and ': ' but not for other YAML-unsafe chars like bare colon, {}, or []. The guard catches the most dangerous cases but documents incomplete coverage.
- cli/_update_checks: _api_sha now tries refs/tags for tag revisions
- config/settings: annotate _EXIT_GRACE_BUFFER_MS as ClassVar[int]
- execution/_process_monitor: cache psutil.Process objects across calls so cpu_percent(interval=0) returns meaningful deltas
- hooks/_hook_settings: add ENV_DISABLED env-var override for disabled
- workspace/clone_registry: wrap open+flock in try/except in __enter__ to prevent fd leak if flock() raises
- recipe/_analysis: extract_blocks accepts precomputed predecessors map to avoid duplicate computation; add warning logs for fallback entry/exit selection
- recipe/rules_fixing: use deque.popleft() instead of list.pop(0)
- recipe/rules_reachability: use ctx.predecessors in _ancestors(); _find_capture_producers returns all producers
- recipe/rules_contracts: log warning on unreadable SKILL.md
- server/tools_kitchen: add gate.disable() on start_quota_refresh failure for consistency
- server/_factory: make recording ImportError degrade gracefully like replay path
- server/_wire_compat: use model_copy() instead of in-place mutation to avoid modifying shared FastMCP tool registry objects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update JSON write site allowlist line numbers for clone_registry and tools_kitchen after code changes shifted lines
- Wire compat middleware tests: use model_copy mock returns instead of in-place mutation expectations
- Process monitor tests: account for two-call priming pattern with cached psutil.Process objects; clear module cache between tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Promotion: integration to main
This promotion merges 382 commits across 198 PRs from the `integration` branch into `main`, advancing AutoSkillit from v0.5.2 to v0.8.38 across three minor release cycles. The release delivers a production-grade research pipeline with containerized Micromamba execution, a 31-lens visualization and experiment family, and full output-mode routing. The quota guard was fundamentally redesigned with a dual-window model, while headless session orchestration gained single-skill mode, anomaly detection, idle timeouts, and verdict-gated CI recovery. Merge workflow reliability was hardened with three-way routing and a cheap rebase gate, and the recording/replay infrastructure was rewritten in Rust via PyO3.

Stats: 671 files changed, 99570 insertions(+), 12642 deletions(-) | 102 fixes, 107 features, 7 tests, 1 infra, 4 docs
Highlights
- `output_mode` ingredient (breaking: default changed from implicit `pr` to `local`)
- `disable_quota_guard` MCP tool
- `resolve-failures` emits typed verdicts (`real_fix` / `already_green` / `flake_suspected` / `ci_only_failure`); recipes route via `on_result` gates enforced by a new semantic rule
- `idle_output_timeout` per step, structured crash-path error returns
- `prepare-pr` / `compose-pr` decomposition, `validate-audit` skill, `review-design` / `resolve-design-review` skills, first-run guided onboarding

Release Notes
New Features
Research Pipeline
- Experiment type registry (`ExperimentTypeSpec`) with 5 bundled types
- 12 `vis-lens` visualization skills and 19 `exp-lens` experiment lens skills
- `plan-visualization` step wired post-design-review; `output_mode` ingredient (`local` | `pr`)
- `bundle-local-report` skill for offline report packaging

Quota Guard
- `disable_quota_guard` MCP tool for session-scoped opt-out

Headless Session Orchestration
- `MAX_MCP_OUTPUT_TOKENS` injected at builder level for all session types
- `idle_output_timeout` per-step recipe override; bounded staleness suppression (1800s max)

Merge Workflow
- Three-way routing for `autoMergeAllowed` repos (queue / direct / immediate)

Skill System
- `prepare-pr` / `compose-pr` decomposition replacing `open-pr`
- `validate-audit`, `review-design`, `resolve-design-review`, `resolve-research-review` skills
- `--resume` flag for `cook` and `order` CLI commands
- `chart-course` and `check-bearing` project-local strategic skills

Recording/Replay Infrastructure
- `RecordingSubprocessRunner` and `SequencingSubprocessRunner` for api-simulator
- `McpRecordingMiddleware` for MCP-level capture; api-simulator rewritten in Rust via PyO3

CLI & Onboarding
- `terminal_guard()` alternate-screen buffer envelope; terminal freeze immunity
- `config.yaml`; stable recipe listing order

Token Telemetry
- `order_id` scoping
- `gh pr edit/view`

MCP Server
- Wire compatibility layer (`_wire_compat.py`)

Clone System
- `clone_repo` clones from remote URL
- `process-issues`

Pretty Output Hook
- Covers `run_skill`, `run_cmd`, `test_check`, `merge_worktree`, `kitchen_status`, `clone_repo`, `load_recipe`, `open_kitchen`, `list_recipes`

Bug Fixes
102 rectification PRs addressing:
- `run_cmd` env stripping regression; parallel pipeline deadlock (signal handling)

Test Suite
Infrastructure
- `patch-bump-integration.yml` workflow for auto-incrementing patch version on PR merge
- `version-bump.yml` updated: minor bump on promote (X.Y.Z → X.Y+1.0)

Breaking Changes
Migration `0.7.77-to-0.8.0` — Research Recipe Overhaul

- `write-report` renamed to `generate-report`
- `output_mode` default changed from implicit `pr` to `local`
- `commit_research_artifacts` replaced by `stage_bundle` + `finalize_bundle`
- `vis-lens` must appear in `requires_packs`

Migration `0.8.9-to-0.9.0` — Verdict-Gated CI Recovery

- `verdict` and `fixes_applied` outputs
- `on_result:` verdict dispatch instead of unconditional `on_success: re_push`
- `conditional-skill-ungated-push` semantic rule (ERROR severity) enforces this

Other

- `run_skill` `exit_after_stop_delay_ms` reduced from 120s to 2s
- quota `threshold` replaced by dual-window model
- New recipe fields: `optional_context_refs`, `stale_threshold`, `idle_output_timeout`, `block`, `requires_packs`

Merged PRs
Attention Required
- `GH_PAT` environment variable required to build
- `0.7.77-to-0.8.0` — failure defaults to `local` mode (no PR creation)
- `0.8.9-to-0.9.0` — unrouted pushes now fail validation
- `run_skill` callers: `exit_after_stop_delay_ms` reduced from 120s to 2s

Architecture Impact
Module Dependency (Structural — "How are modules coupled?")
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — APPLICATION"]
        direction LR
        SERVER["● server/<br/>━━━━━━━━━━<br/>FastMCP tools (18 files)<br/>Fan-in: 17"]
        CLI["● cli/<br/>━━━━━━━━━━<br/>CLI entry points (22 files)<br/>★ _onboarding, _update_checks<br/>★ _serve_guard, _terminal"]
    end
    subgraph L2 ["L2 — DOMAIN"]
        direction LR
        RECIPE["● recipe/<br/>━━━━━━━━━━<br/>Schema + validation (35 files)<br/>Fan-in: 40<br/>★ rules_blocks, rules_packs<br/>★ experiment_type_registry"]
        MIGRATION["● migration/<br/>━━━━━━━━━━<br/>Version migrations (5 files)<br/>★ 0.7.77→0.8.0<br/>★ 0.8.9→0.9.0"]
    end
    subgraph L1 ["L1 — SERVICES"]
        direction LR
        CONFIG["● config/<br/>━━━━━━━━━━<br/>Dynaconf settings (3 files)<br/>Fan-in: 20"]
        PIPELINE["● pipeline/<br/>━━━━━━━━━━<br/>DI + telemetry (9 files)<br/>Fan-in: 14<br/>★ background.py"]
        EXECUTION["● execution/<br/>━━━━━━━━━━<br/>Headless + process (21 files)<br/>Fan-in: 19<br/>★ recording, clone_guard<br/>★ _headless_scan"]
        WORKSPACE["● workspace/<br/>━━━━━━━━━━<br/>Clone + skills (7 files)<br/>Fan-in: 14<br/>★ clone_registry<br/>★ worktree"]
    end
    subgraph L0 ["L0 — FOUNDATION"]
        direction LR
        CORE["● core/<br/>━━━━━━━━━━<br/>Types + IO (15 files)<br/>Fan-in: 109<br/>★ _claude_env, readiness<br/>★ kitchen_state"]
    end
    subgraph STANDALONE ["STANDALONE — HOOKS"]
        direction LR
        HOOKS["● hooks/<br/>━━━━━━━━━━<br/>Pre/PostToolUse (19 files)<br/>★ pretty_output_hook<br/>★ quota_post_hook<br/>★ token_summary_hook"]
        HOOKREG["● hook_registry.py"]
    end
    subgraph EXT ["EXTERNAL"]
        direction LR
        FASTMCP["fastmcp"]
        HTTPX["httpx"]
        ANYIO["anyio"]
    end

    SERVER -->|"recipe, migration"| RECIPE
    SERVER -->|"migration"| MIGRATION
    CLI -->|"recipe, migration"| RECIPE
    SERVER -->|"pipeline, execution,<br/>workspace, config"| PIPELINE
    CLI -->|"config, execution,<br/>workspace"| CONFIG
    SERVER -->|"core (12 files)"| CORE
    CLI -->|"core (10 files)"| CORE
    RECIPE -->|"core (20 files)"| CORE
    MIGRATION -->|"core (3 files)"| CORE
    RECIPE -.->|"workspace (deferred)"| WORKSPACE
    CONFIG -->|"core"| CORE
    PIPELINE -->|"core"| CORE
    EXECUTION -->|"core"| CORE
    WORKSPACE -->|"core"| CORE
    EXECUTION -.->|"config (deferred)"| CONFIG
    HOOKREG -->|"core"| CORE
    HOOKS -->|"hook_registry"| HOOKREG
    SERVER -.->|"⚠ cli (hard L3→L3)"| CLI
    SERVER --> FASTMCP
    EXECUTION --> HTTPX
    CLI --> ANYIO

    class SERVER,CLI cli;
    class RECIPE,MIGRATION phase;
    class CONFIG,PIPELINE,EXECUTION,WORKSPACE handler;
    class CORE stateNode;
    class HOOKS,HOOKREG newComponent;
    class FASTMCP,HTTPX,ANYIO integration;
```

Legend: Dark Blue = L3 App | Purple = L2 Domain | Orange = L1 Services | Teal = L0 Foundation | Green = Hooks | Red = External | Dashed = Deferred/violation
Process Flow (Physiological — "How does it behave?")
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([START])
    COMPLETE([RECIPE COMPLETE])
    ESCALATE([ESCALATE TO USER])

    subgraph KITCHEN ["★ Kitchen Lifecycle"]
        OPEN["● open_kitchen<br/>━━━━━━━━━━<br/>Prime quota cache<br/>Start refresh loop"]
        LOAD["● load_recipe<br/>━━━━━━━━━━<br/>YAML → Recipe"]
    end
    subgraph SOUSCHEF ["● Sous-Chef Loop"]
        STEP{"● Step Eval<br/>━━━━━━━━━━<br/>skip? retries?"}
        QUOTA{"★ Quota Gate<br/>━━━━━━━━━━<br/>Dual-window"}
        DISPATCH["● run_skill"]
    end
    subgraph HEADLESS ["● Headless Session"]
        SPAWN["● run_managed_async<br/>━━━━━━━━━━<br/>anyio task group"]
        RACE{"● Channel Race<br/>━━━━━━━━━━<br/>A: stdout | B: JSONL<br/>★ idle_timeout"}
        CLASSIFY["● Result Classification<br/>━━━━━━━━━━<br/>★ Recovery pipeline<br/>★ Zero-write gate"]
    end
    subgraph VERDICT ["★ Verdict Routing"]
        ROUTE{"● on_result<br/>━━━━━━━━━━<br/>Typed dispatch"}
        REAL_FIX["re_push"]
        GREEN["★ pre_resolve_rebase"]
        HUMAN["release_issue_failure"]
        CI["● wait_for_ci"]
    end
    subgraph MERGE ["● Merge Workflow"]
        MERGE_EVAL{"● route_queue_mode<br/>━━━━━━━━━━<br/>★ 3-way routing"}
        QUEUE["queue path"]
        DIRECT["direct merge"]
        IMMEDIATE["★ immediate"]
    end

    START --> OPEN --> LOAD --> STEP
    STEP -->|"skip=false"| QUOTA
    QUOTA -->|"allowed"| DISPATCH
    QUOTA -->|"★ blocked"| QUOTA
    DISPATCH --> SPAWN --> RACE --> CLASSIFY
    CLASSIFY -->|"success"| ROUTE
    CLASSIFY -->|"needs_retry"| STEP
    CLASSIFY -->|"budget exhausted"| ESCALATE
    ROUTE -->|"on_success"| MERGE_EVAL
    ROUTE -->|"★ real_fix"| REAL_FIX
    ROUTE -->|"★ already_green"| GREEN
    ROUTE -->|"★ flake/ci_only"| HUMAN
    REAL_FIX --> CI
    GREEN --> CI
    CI -->|"green"| MERGE_EVAL
    CI -->|"failure"| ROUTE
    HUMAN --> ESCALATE
    MERGE_EVAL -->|"queue+trigger"| QUEUE
    MERGE_EVAL -->|"auto OK"| DIRECT
    MERGE_EVAL -->|"★ neither"| IMMEDIATE
    QUEUE --> COMPLETE
    DIRECT --> COMPLETE
    IMMEDIATE --> COMPLETE

    class START,COMPLETE,ESCALATE terminal;
    class OPEN,LOAD,DISPATCH,SPAWN,CLASSIFY handler;
    class STEP,QUOTA,RACE,MERGE_EVAL stateNode;
    class ROUTE,CI phase;
    class REAL_FIX,GREEN,HUMAN,QUEUE,DIRECT,IMMEDIATE newComponent;
```

Legend: Dark Blue = Terminal | Teal = Decisions | Orange = Processing | Purple = Verdict routing | Green = New components
C4 Container (Anatomical — "How is it built?")
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    USER(["Developer<br/>━━━━━━━━━━<br/>Claude Code user"])

    subgraph APP ["APPLICATION"]
        direction LR
        CLI_APP["● CLI<br/>━━━━━━━━━━<br/>cyclopts + anyio<br/>★ onboarding, update"]
        MCP["● MCP Server<br/>━━━━━━━━━━<br/>FastMCP v3 (stdio)<br/>★ wire_compat, lifespan"]
        CHEF["● Sous-Chef<br/>━━━━━━━━━━<br/>Tier 1 Claude<br/>★ wavefront, verdicts"]
    end
    subgraph HOOKS_L ["★ HOOKS"]
        direction LR
        PRE["★ PreToolUse<br/>━━━━━━━━━━<br/>quota_guard<br/>ask_user_guard"]
        POST["★ PostToolUse<br/>━━━━━━━━━━<br/>pretty_output<br/>quota_post, token_summary"]
    end
    subgraph DOMAIN ["DOMAIN"]
        direction LR
        RECIPE["● Recipe Engine<br/>━━━━━━━━━━<br/>igraph + YAML<br/>★ 7 new rule modules<br/>★ experiment types"]
        MIGR["● Migration<br/>━━━━━━━━━━<br/>★ 0.7.77→0.8.0<br/>★ 0.8.9→0.9.0"]
    end
    subgraph SERVICE ["SERVICES"]
        direction LR
        EXEC["● Execution<br/>━━━━━━━━━━<br/>anyio + psutil<br/>★ recording, anomaly<br/>★ idle timeout"]
        WS["● Workspace<br/>━━━━━━━━━━<br/>★ clone_registry<br/>★ worktree"]
        PIPE["● Pipeline DI<br/>━━━━━━━━━━<br/>★ background"]
        CONF["● Config<br/>━━━━━━━━━━<br/>dynaconf<br/>★ dual-window quota"]
    end
    subgraph FOUND ["FOUNDATION"]
        CORE["● Core<br/>━━━━━━━━━━<br/>structlog + PyYAML<br/>★ _claude_env, readiness"]
    end
    subgraph STORE ["STORAGE"]
        direction LR
        RECIPES[("● Recipes<br/>━━━━━━━━━━<br/>★ research.yaml<br/>★ experiment-types/")]
        SKILLS[("● Skills<br/>━━━━━━━━━━<br/>120+ SKILL.md<br/>★ 60+ new")]
        LOGS[("Session Logs")]
        CACHE[("★ Quota Cache")]
    end
    subgraph EXT ["EXTERNAL"]
        direction LR
        CLAUDE["Claude CLI<br/>━━━━━━━━━━<br/>subprocess + PTY"]
        GH["GitHub API<br/>━━━━━━━━━━<br/>REST + GraphQL"]
        ANTH["Anthropic API<br/>━━━━━━━━━━<br/>Token quota"]
    end

    USER -->|"CLI / MCP stdio"| CLI_APP
    CLI_APP -->|"starts"| MCP
    MCP -->|"injects"| CHEF
    CHEF -->|"MCP tools"| MCP
    CHEF -.->|"intercept"| PRE
    MCP -.->|"intercept"| POST
    MCP -->|"loads"| RECIPE
    MCP -->|"migrates"| MIGR
    MCP -->|"spawns"| EXEC
    MCP -->|"isolates"| WS
    MCP -->|"injects"| PIPE
    MCP -->|"reads"| CONF
    RECIPE --> CORE
    EXEC --> CORE
    WS --> CORE
    CONF --> CORE
    PIPE --> CORE
    EXEC -->|"reads"| RECIPES
    WS -->|"reads"| SKILLS
    EXEC -->|"writes"| LOGS
    EXEC -->|"reads/writes"| CACHE
    PRE -->|"reads"| CACHE
    EXEC -->|"subprocess"| CLAUDE
    EXEC -->|"CI/merge queue"| GH
    EXEC -->|"quota"| ANTH
    WS -->|"git"| GH

    class USER,CLI_APP,MCP,CHEF cli;
    class RECIPE,MIGR phase;
    class EXEC,WS,PIPE,CONF handler;
    class CORE stateNode;
    class PRE,POST newComponent;
    class RECIPES,SKILLS,LOGS,CACHE output;
    class CLAUDE,GH,ANTH integration;
```

Legend: Dark Blue = Application | Purple = Domain | Orange = Services | Teal = Foundation | Green = New hooks | Dark Teal = Storage | Red = External
Closes #401
Closes #427
Closes #429
Closes #439
Closes #440
Closes #441
Closes #444
Closes #445
Closes #447
Closes #448
Closes #449
Closes #456
Closes #457
Closes #461
Closes #462
Closes #466
Closes #468
Closes #469
Closes #470
Closes #471
Closes #475
Closes #477
Closes #480
Closes #481
Closes #486
Closes #487
Closes #488
Closes #494
Closes #498
Closes #499
Closes #503
Closes #504
Closes #506
Closes #507
Closes #509
Closes #512
Closes #513
Closes #514
Closes #516
Closes #522
Closes #524
Closes #525
Closes #526
Closes #527
Closes #529
Closes #532
Closes #533
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #547
Closes #550
Closes #553
Closes #554
Closes #555
Closes #565
Closes #566
Closes #567
Closes #572
Closes #576
Closes #579
Closes #589
Closes #590
Closes #591
Closes #592
Closes #593
Closes #599
Closes #600
Closes #601
Closes #603
Closes #604
Closes #605
Closes #607
Closes #608
Closes #609
Closes #610
Closes #617
Closes #618
Closes #619
Closes #621
Closes #627
Closes #629
Closes #631
Closes #632
Closes #635
Closes #637
Closes #638
Closes #641
Closes #643
Closes #644
Closes #646
Closes #647
Closes #652
Closes #653
Closes #655
Closes #657
Closes #662
Closes #663
Closes #664
Closes #666
Closes #669
Closes #672
Closes #673
Closes #676
Closes #680
Closes #681
Closes #686
Closes #688
Closes #690
Closes #692
Closes #693
Closes #697
Closes #698
Closes #700
Closes #703
Closes #704
Closes #707
Closes #710
Closes #711
Closes #712
Closes #716
Closes #717
Closes #718
Closes #719
Closes #721
Closes #723
Closes #724
Closes #725
Closes #729
Closes #739
Closes #741
Closes #742
Closes #744
Closes #745
Closes #747
Closes #755
Closes #756
Closes #757
Closes #758
Closes #759
Closes #760
Closes #763
Closes #771
Closes #774
Closes #775
Closes #777
Closes #778
Closes #784
Closes #785
Closes #786
Closes #787
Closes #788
Closes #789
Closes #790
Closes #801
Closes #802
Closes #804
Closes #805
Closes #806
Closes #807
Closes #811
Closes #814
Closes #815
Closes #816
Closes #817
Closes #819
Closes #859
Closes #892
Closes #894
Closes #895
Closes #902
Closes #904
Closes #907
Closes #911
Closes #912
Closes #913
Closes #914
Closes #915
Closes #916
Closes #919
Generated with Claude Code via AutoSkillit