Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307) by Trecek · Pull Request #337 · TalonT-Org/AutoSkillit

Trecek · 2026-03-11T05:50:55Z

Summary

The dry-walkthrough skill's Step 4.5 (Historical Regression Check) is already fully implemented in SKILL.md — including the git history scan, GitHub issues cross-reference, gh auth guard, actionable vs. informational classification, and the ### Historical Context section in Step 7. Contract tests for Step 4.5 exist and pass in tests/skills/test_dry_walkthrough_contracts.py.
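The actionable vs. informational split in Step 4.5 can be pictured with a small sketch. It assumes commit subjects have already been collected (for example from `git log -100` on the plan's files) and that a symbol-level match is what makes a signal actionable, as the process flow diagram suggests. `HistorySignal` and `classify_commits` are hypothetical names used only for illustration; they are not taken from SKILL.md.

```python
import re
from dataclasses import dataclass

# Keywords from the diagram's Git History Scan node; prefix match so
# "fixes", "reverted", "removed" are also caught, but "prefix" is not.
_KEYWORD_RE = re.compile(r"\b(fix|revert|remove|replace|delete)", re.IGNORECASE)


@dataclass(frozen=True)
class HistorySignal:
    """One historical commit that may be relevant to the plan (hypothetical type)."""
    subject: str
    actionable: bool  # True = strong signal (symbol-level match), False = informational


def classify_commits(subjects: list[str], plan_symbols: list[str]) -> list[HistorySignal]:
    """Keep commits whose subject contains a regression keyword.

    A commit is treated as actionable only when it also names a symbol
    that the plan touches; otherwise it is informational.
    """
    signals = []
    for subject in subjects:
        if not _KEYWORD_RE.search(subject):
            continue  # no regression keyword, ignore
        lowered = subject.lower()
        symbol_hit = any(symbol.lower() in lowered for symbol in plan_symbols)
        signals.append(HistorySignal(subject=subject, actionable=symbol_hit))
    return signals
```

Under these assumptions, actionable signals would feed the plan fix in Steps 5–6, while informational ones would only appear under `### Historical Context` in the Step 7 report.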

The remaining work is REQ-GEN-002 from Issue #311 (consolidated here): replace the hardcoded task test-all references in Step 4 of SKILL.md with a config-driven reference to test_check.command, so the skill validates correctly for any project regardless of its test runner.
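A minimal sketch of what a config-driven check could look like follows. It assumes the config has already been parsed into a dict and that `task test-check` is the fallback, as the diagram's `READ_CFG` node indicates. The function names and the pattern list are illustrative assumptions; the skill itself is prose in SKILL.md, not Python.

```python
import re

# Default taken from the process flow diagram; the pattern list is illustrative.
DEFAULT_TEST_COMMAND = "task test-check"
_TEST_RUNNER_PATTERNS = [
    r"\bpython -m pytest\b",  # listed before bare pytest so the longer form wins
    r"\bpytest\b",
    r"\bmake test\b",
    r"\btask test-all\b",
]


def resolve_test_command(config: dict) -> str:
    """Return test_check.command from a parsed config, or the default."""
    section = config.get("test_check") or {}
    return section.get("command") or DEFAULT_TEST_COMMAND


def find_non_config_test_commands(plan_text: str, configured: str) -> list[tuple[int, str]]:
    """Return (line number, matched command) for test commands that are not the configured one."""
    findings = []
    for lineno, line in enumerate(plan_text.splitlines(), start=1):
        # Drop the configured command first so it is never reported.
        remainder = line.replace(configured, "")
        for pattern in _TEST_RUNNER_PATTERNS:
            match = re.search(pattern, remainder)
            if match:
                findings.append((lineno, match.group(0)))
                break  # one finding per line is enough for a report
    return findings
```

With an empty config the resolved command is the default, so a plan line such as `Run pytest -x` would be flagged for replacement while `Run task test-check` would pass.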

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    PASS([PASS — Ready to implement])
    REVISED([REVISED — See changes])

    subgraph Load ["Step 1: Load & Detect"]
        direction TB
        LOAD["Load Plan<br/>━━━━━━━━━━<br/>arg path / pasted / temp/ scan"]
        MULTIPART{"Multi-part plan?<br/>━━━━━━━━━━<br/>filename contains _part_?"}
        SCOPE_WARN["Scope Warning<br/>━━━━━━━━━━<br/>Insert boundary block<br/>Restrict to this part only"]
    end

    subgraph Validate ["Steps 2–3: Validate Phases"]
        direction TB
        PHASE_V["Phase Subagents<br/>━━━━━━━━━━<br/>files exist · fns exist<br/>assumptions correct<br/>wiring complete"]
        XPHASE["Cross-Phase Deps<br/>━━━━━━━━━━<br/>Phase ordering · implicit deps<br/>reorder safety"]
    end

    subgraph Rules ["● Step 4: Validate Against Project Rules"]
        direction TB
        RULE_CHK["Rule Checklist<br/>━━━━━━━━━━<br/>No compat code · no hidden fallbacks<br/>no stakeholder sections · arch patterns"]
        READ_CFG["● Read test_check.command<br/>━━━━━━━━━━<br/>.autoskillit/config.yaml<br/>default: task test-check"]
        TEST_CMD{"● Non-config test<br/>command in plan?<br/>━━━━━━━━━━<br/>pytest / python -m pytest<br/>make test etc."}
        WORKTREE{"Hardcoded worktree<br/>setup in plan?<br/>━━━━━━━━━━<br/>uv venv / pip install etc."}
    end

    subgraph History ["Step 4.5: Historical Regression Check"]
        direction TB
        GH_AUTH{"gh auth status<br/>━━━━━━━━━━<br/>GitHub auth available?"}
        GIT_SCAN["Git History Scan<br/>━━━━━━━━━━<br/>git log -100 on PLAN_FILES<br/>fix/revert/remove/replace/delete"]
        GIT_SIG{"Signal strength?<br/>━━━━━━━━━━<br/>Symbol-level match?"}
        GH_ISSUES["GitHub Issues XRef<br/>━━━━━━━━━━<br/>open + closed (last 30d)<br/>keyword cross-reference"]
        GH_MATCH{"Issue type?<br/>━━━━━━━━━━<br/>open vs closed match"}
    end

    subgraph Fix ["Steps 5–6: Fix & Mark"]
        direction TB
        FIX["Fix the Plan<br/>━━━━━━━━━━<br/>Direct edits to plan file<br/>No gap-analysis sections"]
        MARK["Mark Verified<br/>━━━━━━━━━━<br/>Dry-walkthrough verified = TRUE<br/>(first line of plan)"]
    end

    REPORT["● Step 7: Report to Terminal<br/>━━━━━━━━━━<br/>Changes Made · Verified<br/>### Historical Context · Recommendation"]

    %% FLOW %%
    START --> LOAD
    LOAD --> MULTIPART
    MULTIPART -->|"YES"| SCOPE_WARN
    MULTIPART -->|"NO"| PHASE_V
    SCOPE_WARN --> PHASE_V
    PHASE_V --> XPHASE
    XPHASE --> RULE_CHK
    RULE_CHK --> READ_CFG
    READ_CFG --> TEST_CMD
    TEST_CMD -->|"YES — replace"| FIX
    TEST_CMD -->|"NO"| WORKTREE
    WORKTREE -->|"YES — flag & replace"| FIX
    WORKTREE -->|"NO"| GH_AUTH
    GH_AUTH -->|"available"| GIT_SCAN
    GH_AUTH -->|"unavailable"| GIT_SCAN
    GIT_SCAN --> GIT_SIG
    GIT_SIG -->|"strong — actionable"| FIX
    GIT_SIG -->|"weak — informational"| REPORT
    GH_AUTH -->|"available"| GH_ISSUES
    GH_ISSUES --> GH_MATCH
    GH_MATCH -->|"closed issue — actionable"| FIX
    GH_MATCH -->|"open issue — informational"| REPORT
    FIX --> MARK
    MARK --> REPORT
    REPORT -->|"issues found"| REVISED
    REPORT -->|"no issues"| PASS

    %% CLASS ASSIGNMENTS %%
    class START,PASS,REVISED terminal;
    class LOAD,PHASE_V,XPHASE,GIT_SCAN,GH_ISSUES handler;
    class RULE_CHK,READ_CFG phase;
    class MULTIPART,TEST_CMD,WORKTREE,GH_AUTH,GIT_SIG,GH_MATCH stateNode;
    class FIX,MARK detector;
    class SCOPE_WARN output;
    class REPORT output;

● Modified component | ★ New component

Closes #307

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-307-20260310-203646-026624/temp/make-plan/dry_walkthrough_historical_regression_plan_2026-03-10_120000.md

🤖 Generated with Claude Code via AutoSkillit

Replace hardcoded `task test-all` references in Step 4 of dry-walkthrough SKILL.md with config-driven references to `test_check.command` from .autoskillit/config.yaml. Add two contract tests that enforce the new behaviour and fail against the old hardcoded wording. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
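
The two contract checks described in this commit might be sketched as below. This is an assumption about their shape, not the actual contents of `tests/skills/test_dry_walkthrough_contracts.py`: the helper names are hypothetical, and the sketch assumes SKILL.md marks each step with a Markdown heading that begins with `Step <n>`.

```python
import re


def extract_step(skill_text: str, step: str) -> str:
    """Return the body under a 'Step <step>' heading, up to the next Step heading.

    The negative lookahead keeps 'Step 4' from matching 'Step 4.5' or 'Step 40'.
    """
    pattern = rf"^#+\s*Step {re.escape(step)}(?![.\d])[^\n]*\n(.*?)(?=^#+\s*Step |\Z)"
    match = re.search(pattern, skill_text, flags=re.MULTILINE | re.DOTALL)
    return match.group(1) if match else ""


def step4_violations(skill_text: str) -> list[str]:
    """List contract violations in Step 4; an empty list means the contract holds."""
    body = extract_step(skill_text, "4")
    violations = []
    if "test_check.command" not in body:
        violations.append("Step 4 does not reference test_check.command")
    if "task test-all" in body:
        violations.append("Step 4 still hardcodes task test-all")
    return violations
```

A pytest test could then read SKILL.md and assert `step4_violations(text) == []`, which would fail against the old hardcoded wording and pass once Step 4 is config-driven.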

Trecek

AutoSkillit review passed. No blocking issues found. (Self-review — cannot approve own PR)

…333, #342 into integration (#351)

## Integration Summary

Collapsed 9 PRs into `pr-batch/pr-merge-20260311-133920` targeting `integration`.

## Merged PRs

| # | Title | Complexity | Additions | Deletions | Overlaps |
|---|-------|-----------|-----------|-----------|---------|
| #337 | Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307) | simple | +29 | -2 | — |
| #339 | Implementation Plan: Release CI — Force-Push Integration Back-Sync | simple | +88 | -45 | — |
| #336 | Enhance prepare-issue with Duplicate Detection and Broader Triggers | needs_check | +161 | -8 | — |
| #332 | Rectify: Display Output Bugs #329 — Terminal Targets Consolidation — PART A ONLY | needs_check | +783 | -13 | — |
| #338 | Implementation Plan: Pre-release Readiness — Stability Fixes | needs_check | +238 | -36 | — |
| #343 | Implementation Plan: PR Pipeline Gates — Mergeability Gate and Review Cycle | needs_check | +384 | -5 | #338 |
| #341 | Pipeline observability: quota events, wall-clock timing, drift fix | needs_check | +480 | -5 | #332, #338 |
| #333 | Remove run_recipe — Eliminate Sub-Orchestrator Pattern | needs_check | +538 | -655 | #332, #338, #341 |
| #342 | feat: genericize codebase and bundle external dependencies for public release | needs_check | +5286 | -1062 | #332, #333, #338, #341, #343 |

## Audit

**Verdict:** GO

## Architecture Impact

### Development Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph SourceTree ["PROJECT STRUCTURE (● = modified)"]
        direction TB
        SRC["● src/autoskillit/ ━━━━━━━━━━ 105 .py source files cli · config · core execution · hooks · pipeline recipe · server · workspace"]
        SKILLS["● + ★ src/autoskillit/skills/ ━━━━━━━━━━ 52 bundled skills ★ 13 arch-lens-* SKILL.md added ★ 3 audit-* SKILL.md added ● 14 existing skills updated"]
        RECIPES["● src/autoskillit/recipes/ ━━━━━━━━━━ 8 bundled YAML recipes All recipes updated"]
        TESTS["● + ★ tests/ ━━━━━━━━━━ 173 .py test files ★ 6 new test files added"]
    end

    subgraph Build ["BUILD TOOLING"]
        direction TB
        PYPROJECT["● pyproject.toml ━━━━━━━━━━ hatchling build backend uv package manager 10 runtime deps"]
        TASKFILE["Taskfile.yml ━━━━━━━━━━ test-all · test-check test-smoke · install-worktree"]
    end

    subgraph Quality ["CODE QUALITY GATES"]
        direction TB
        RFMT["ruff-format ━━━━━━━━━━ Auto-fix formatting"]
        RLINT["ruff ━━━━━━━━━━ Lint + auto-fix"]
        MYPY["mypy src/ ━━━━━━━━━━ --ignore-missing-imports"]
        UVLOCK["uv lock --check ━━━━━━━━━━ Lock file integrity"]
        SECRETS["gitleaks ━━━━━━━━━━ Secret scanning"]
        GUARD["★ headless_orchestration_guard.py ━━━━━━━━━━ ★ PreToolUse hook Blocks run_skill/run_cmd/run_python from headless sessions"]
    end

    subgraph Testing ["TEST FRAMEWORK"]
        direction TB
        PYTEST["pytest + asyncio_mode=auto ━━━━━━━━━━ xdist -n 4 parallel timeout=60s signal method"]
        NEWTEST["★ New Test Files ━━━━━━━━━━ ★ test_headless_orchestration_guard ★ test_audit_and_fix_degradation ★ test_rules_inputs ★ test_skill_genericization ★ test_pyproject_metadata ★ test_release_sanity"]
    end

    subgraph CI ["CI/CD WORKFLOWS"]
        direction LR
        TESTS_WF["tests.yml ━━━━━━━━━━ PR test gate"]
        RELEASE_WF["release.yml ━━━━━━━━━━ Release automation"]
        BUMP_WF["● version-bump.yml ━━━━━━━━━━ ● Force-push back-sync integration → main"]
    end

    subgraph EntryPoints ["ENTRY POINTS"]
        EP["autoskillit CLI ━━━━━━━━━━ serve · init · skills recipes · doctor · workspace"]
    end

    SRC --> PYPROJECT
    SKILLS --> PYPROJECT
    TESTS --> PYTEST
    PYPROJECT --> TASKFILE
    PYPROJECT --> RFMT
    RFMT --> RLINT
    RLINT --> MYPY
    MYPY --> UVLOCK
    UVLOCK --> SECRETS
    SECRETS --> GUARD
    GUARD --> PYTEST
    PYTEST --> NEWTEST
    NEWTEST --> BUMP_WF
    TESTS_WF --> PYTEST
    PYPROJECT --> EP

    class SRC,TESTS stateNode;
    class SKILLS,RECIPES newComponent;
    class PYPROJECT,TASKFILE phase;
    class RFMT,RLINT,MYPY,UVLOCK,SECRETS detector;
    class GUARD newComponent;
    class PYTEST handler;
    class NEWTEST newComponent;
    class TESTS_WF,RELEASE_WF phase;
    class BUMP_WF newComponent;
    class EP output;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Teal | Structure | Source directories and test suite |
| Green (★) | New/Modified | New files and components added in this PR |
| Purple | Build | Build configuration and task automation |
| Red | Quality Gates | Pre-commit hooks, linters, type checker |
| Orange | Test Runner | pytest execution engine |
| Dark Teal | Entry Points | CLI commands |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction LR
        TYPES["● core/types.py ━━━━━━━━━━ GATED_TOOLS · UNGATED_TOOLS RecipeSource (★ promoted here) ClaudeFlags · StrEnums fan-in: ~75 files"]
        COREIO["core/io.py · logging.py · paths.py ━━━━━━━━━━ Atomic write · Logger · pkg_root()"]
    end

    subgraph L1P ["L1 — PIPELINE (imports L0 only)"]
        direction TB
        GATE["● pipeline/gate.py ━━━━━━━━━━ DefaultGateState gate_error_result() ★ headless_error_result() re-exports GATED/UNGATED_TOOLS"]
        PIPEINIT["● pipeline/__init__.py ━━━━━━━━━━ Re-exports public surface ToolContext · AuditLog TokenLog · DefaultGateState"]
    end

    subgraph L1E ["L1 — EXECUTION (imports L0 only)"]
        direction TB
        HEADLESS["● execution/headless.py ━━━━━━━━━━ Headless Claude sessions Imports core types via TYPE_CHECKING for ToolContext (no runtime cycle)"]
        COMMANDS["● execution/commands.py ━━━━━━━━━━ ClaudeHeadlessCmd builder"]
        SESSION_LOG["● execution/session_log.py ━━━━━━━━━━ Session diagnostics writer"]
    end

    subgraph L2 ["L2 — RECIPE (imports L0+L1)"]
        direction TB
        SCHEMA["● recipe/schema.py ━━━━━━━━━━ Recipe · RecipeStep · DataFlowWarning RecipeSource (now from L0)"]
        RULES["● recipe/rules_inputs.py ━━━━━━━━━━ ★ Ingredient validation rules reads GATED_TOOLS from L0 via pipeline re-export"]
        ANALYSIS["● recipe/_analysis.py ━━━━━━━━━━ Step graph builder"]
        VALIDATOR["● recipe/validator.py ━━━━━━━━━━ validate_recipe()"]
    end

    subgraph L3S ["L3 — SERVER (imports all layers)"]
        direction TB
        HELPERS["● server/helpers.py ━━━━━━━━━━ _require_enabled() — reads gate ★ _require_not_headless() Shared by all tool handlers"]
        TOOLS_EX["● server/tools_execution.py ━━━━━━━━━━ run_cmd · run_python · run_skill ✗ run_recipe REMOVED Uses _require_not_headless()"]
        TOOLS_GIT["● server/tools_git.py ━━━━━━━━━━ merge_worktree · classify_fix ● check_pr_mergeable (new gate)"]
        TOOLS_K["● server/tools_kitchen.py ━━━━━━━━━━ open_kitchen · close_kitchen"]
        FACTORY["● server/_factory.py ━━━━━━━━━━ Composition root Wires ToolContext"]
    end

    subgraph L3H ["L3 — HOOKS (stdlib only for guard)"]
        direction LR
        HOOK_GUARD["★ hooks/headless_orchestration_guard.py ━━━━━━━━━━ ★ PreToolUse hook (stdlib only) Blocks run_skill/run_cmd/run_python from AUTOSKILLIT_HEADLESS=1 sessions NO autoskillit imports"]
        PRETTY["● hooks/pretty_output.py ━━━━━━━━━━ PostToolUse response formatter"]
    end

    subgraph L3C ["L3 — CLI (imports all layers)"]
        direction LR
        CLI_APP["● cli/app.py ━━━━━━━━━━ serve · init · skills · recipes doctor · workspace"]
        CLI_PROMPTS["● cli/_prompts.py ━━━━━━━━━━ Orchestrator prompt builder"]
    end

    TYPES -->|"fan-in ~75"| GATE
    TYPES -->|"fan-in ~75"| HEADLESS
    TYPES -->|"fan-in ~75"| SCHEMA
    COREIO --> PIPEINIT
    GATE --> PIPEINIT
    PIPEINIT -->|"gate_error_result headless_error_result"| HELPERS
    HEADLESS --> HELPERS
    COMMANDS --> HEADLESS
    SESSION_LOG --> HELPERS
    SCHEMA -->|"RecipeSource from L0"| RULES
    RULES --> VALIDATOR
    ANALYSIS --> VALIDATOR
    HELPERS -->|"_require_not_headless"| TOOLS_EX
    HELPERS --> TOOLS_GIT
    HELPERS --> TOOLS_K
    VALIDATOR --> FACTORY
    PIPEINIT --> FACTORY
    FACTORY --> CLI_APP
    FACTORY --> CLI_PROMPTS
    HOOK_GUARD -.->|"ENV: AUTOSKILLIT_HEADLESS zero autoskillit imports"| TOOLS_EX

    class TYPES,COREIO stateNode;
    class GATE,PIPEINIT phase;
    class HEADLESS,COMMANDS,SESSION_LOG handler;
    class SCHEMA,RULES,ANALYSIS,VALIDATOR phase;
    class HELPERS,TOOLS_EX,TOOLS_GIT,TOOLS_K handler;
    class FACTORY cli;
    class CLI_APP,CLI_PROMPTS cli;
    class HOOK_GUARD newComponent;
    class PRETTY handler;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Teal | L0 Core | High fan-in foundation types (zero reverse deps) |
| Purple | L1/L2 Control | Pipeline gate, recipe schema and rules |
| Orange | L1/L3 Processors | Execution handlers, server tool handlers |
| Dark Blue | L3 CLI | Composition root and CLI entry points |
| Green (★) | New Components | headless_orchestration_guard — standalone hook |
| Dashed | ENV Signal | OS-level check; no code import relationship |

Closes #307
Closes #327
Closes #308
Closes #329
Closes #304
Closes #328
Closes #302
Closes #330
Closes #311

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

---

## Merge Conflict Resolution

The batch branch was rebased onto `integration` to resolve 17 file conflicts. All conflicts arose because PRs #337–#341 were squash-merged into both `integration` (directly) and the batch branch (via the pipeline), while PRs #333 and #342 required conflict resolution work that only exists on the batch branch.

**Resolution principle:** Batch branch version wins for all files touched by #333/#342 conflict resolution and remediation, since that state was fully tested (3752 passed). Integration-only additions (e.g. `TestGetQuotaEvents`) were preserved where they don't overlap.

### Per-file decisions

| File | Decision | Rationale |
|------|----------|-----------|
| `CLAUDE.md` | **Batch wins** | Batch has corrected tool inventory (run_recipe removed, get_quota_events added, 25 kitchen tools) |
| `core/types.py` | **Batch wins** | Batch splits monolithic UNGATED_TOOLS into WORKER_TOOLS + HEADLESS_BLOCKED_UNGATED_TOOLS; removes run_recipe from GATED_TOOLS |
| `execution/__init__.py` | **Batch wins** | Batch removes dead exports (build_subrecipe_cmd, run_subrecipe_session) |
| `execution/headless.py` | **Batch wins** | Batch deletes run_subrecipe_session function (530+ lines); keeps run_headless_core with token_log error handling |
| `hooks/pretty_output.py` | **Batch wins** | Batch removes run_recipe from _UNFORMATTED_TOOLS, adds get_quota_events |
| `recipes/pr-merge-pipeline.yaml` | **Batch wins** | Batch has base_branch required:true, updated kitchen rules (main instead of integration) |
| `server/_state.py` | **Batch wins** | Batch adds .telemetry_cleared_at marker reading in _initialize |
| `server/helpers.py` | **Batch wins** | Batch removes _run_subrecipe and run_subrecipe_session import; adds _require_not_headless |
| `server/tools_git.py` | **Batch wins** | Batch has updated classify_fix with git fetch and check_pr_mergeable gate |
| `server/tools_kitchen.py` | **Batch wins** | Batch adds headless gates to open_kitchen/close_kitchen; adds TOOL_CATEGORIES listing |
| `server/tools_status.py` | **Merge both** | Batch headless gates + wall_clock_seconds merged with integration's TestGetQuotaEvents (deduplicated) |
| `tests/conftest.py` | **Batch wins** | Batch replaces AUTOSKILLIT_KITCHEN_OPEN with AUTOSKILLIT_HEADLESS in fixture |
| `tests/execution/test_headless.py` | **Batch wins** | Batch removes run_subrecipe_session tests (deleted code); updates docstring |
| `tests/recipe/test_bundled_recipes.py` | **Merge both** | Batch base_branch=main assertions + integration WF7 graph test both kept |
| `tests/server/test_tools_kitchen.py` | **Batch wins** | Batch adds headless gate denial tests for open/close kitchen |
| `tests/server/test_tools_status.py` | **Merge both** | Batch headless gate tests merged with integration quota events tests |

### Post-rebase fixes

- Removed duplicate `TestGetQuotaEvents` class (existed in both batch commit and auto-merged integration code)
- Fixed stale `_build_tool_listing` → `_build_tool_category_listing` attribute reference
- Added `if diagram: print(diagram)` to `cli/app.py` cook function (test expected terminal output)

### Verification

- **3752 passed**, 23 skipped, 0 failures
- 7 architecture contracts kept, 0 broken
- Pre-commit hooks all pass

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…, Headless Isolation (#404) ## Summary Integration rollup of **43 PRs** (#293–#406) consolidating **62 commits** across **291 files** (+27,909 / −6,040 lines). This release advances AutoSkillit from v0.2.0 to v0.3.1 with GitHub merge queue integration, sub-recipe composition, a PostToolUse output reformatter, headless session isolation guards, and comprehensive pipeline observability — plus 24 new bundled skills, 3 new MCP tools, and 47 new test files. --- ## Major Features ### GitHub Merge Queue Integration (#370, #362, #390) - New `wait_for_merge_queue` MCP tool — polls a PR through GitHub's merge queue until merged, ejected, or timed out (default 600s). Uses REST + GraphQL APIs with stuck-queue detection and auto-merge re-enrollment - New `DefaultMergeQueueWatcher` L1 service (`execution/merge_queue.py`) — never raises; all outcomes are structured results - `parse_merge_queue_response()` pure function for GraphQL queue entry parsing - New `auto_merge` ingredient in `implementation.yaml` and `remediation.yaml` — enrolls PRs in the merge queue after CI passes - Full queue-mode path added to `merge-prs.yaml`: detect queue → enqueue → wait → handle ejections → re-enter - `analyze-prs` skill gains Step 0.5 (merge queue detection) and Step 1.5 (CI/review eligibility filtering) ### Sub-Recipe Composition (#380) - Recipe steps can now reference sub-recipes via `sub_recipe` + `gate` fields — lazy-loaded and merged at validation time - Composition engine in `recipe/_api.py`: `_merge_sub_recipe()` inlines sub-recipe steps with safe name-prefixing and route remapping (`done` → parent's `on_success`, `escalate` → parent's `on_failure`) - `_build_active_recipe()` evaluates gate ingredients against overrides/defaults; dual validation runs on both active and combined recipes - First sub-recipe: `sprint-prefix.yaml` — triage → plan → confirm → dispatch workflow, gated by `sprint_mode` ingredient (hidden, default false) - Both `implementation.yaml` and `remediation.yaml` gain 
`sprint_entry` placeholder step - New semantic rules: `unknown-sub-recipe` (ERROR), `circular-sub-recipe` (ERROR) with DFS cycle detection ### PostToolUse Output Reformatter (#293, #405) - `pretty_output.py` — new 671-line PostToolUse hook that rewrites raw MCP JSON responses to Markdown-KV before Claude consumes them (30–77% token overhead reduction) - Dedicated formatters for 11 high-traffic tools (`run_skill`, `run_cmd`, `test_check`, `merge_worktree`, `get_token_summary`, etc.) plus a generic KV formatter for remaining tools - Pipeline vs. interactive mode detection via hook config file - Unwraps Claude Code's `{"result": "<json-string>"}` envelope before dispatching - 1,516-line test file with 40+ behavioral tests ### Headless Session Isolation (#359, #393, #397, #405, #406) - **Env isolation**: `build_sanitized_env()` strips `AUTOSKILLIT_PRIVATE_ENV_VARS` from subprocess environments, preventing `AUTOSKILLIT_HEADLESS=1` from leaking into test runners - **CWD path contamination defense**: `_inject_cwd_anchor()` anchors all relative paths to session CWD; `_validate_output_paths()` checks structured output tokens against CWD prefix; `_scan_jsonl_write_paths()` post-session scanner catches actual Write/Edit/Bash tool calls outside CWD - **Headless orchestration guard**: new PreToolUse hook blocks `run_skill`/`run_cmd`/`run_python` when `AUTOSKILLIT_HEADLESS=1`, enforcing Tier 1/Tier 2 nesting invariant - **`_require_not_headless()` server-side guard**: blocks 10 orchestration-only tools from headless sessions at the handler layer - **Unified error response contract**: `headless_error_result()` produces consistent 9-field responses; `_build_headless_error_response()` canonical builder for all failure paths in `tools_integrations.py` ### Cook UX Overhaul (#375, #363) - `open_kitchen` now accepts optional `name` + `overrides` — opens kitchen AND loads recipe in a single call - Pre-launch terminal preview with ANSI-colored flow diagram and ingredients table via new 
`cli/_ansi.py` module - `--dangerously-skip-permissions` warning banner with interactive confirmation prompt - Randomized session greetings from themed pools - Orchestrator prompt rewritten: recipe YAML no longer injected via `--append-system-prompt`; session calls `open_kitchen('{recipe_name}')` as first action - Conversational ingredient collection replaces mechanical per-field prompting --- ## New MCP Tools | Tool | Gate | Description | |------|------|-------------| | `wait_for_merge_queue` | Kitchen | Polls PR through GitHub merge queue (REST + GraphQL) | | `set_commit_status` | Kitchen | Posts GitHub Commit Status to a SHA for review-first gating | | `get_quota_events` | Ungated | Surfaces quota guard decisions from `quota_events.jsonl` | --- ## Pipeline Observability (#318, #341) - **`TelemetryFormatter`** (`pipeline/telemetry_fmt.py`) — single source of truth for all telemetry rendering; replaces dual-formatter anti-pattern. Four rendering modes: Markdown table, terminal table, compact KV (for PostToolUse hook) - `get_token_summary` and `get_timing_summary` gain `format` parameter (`"json"` | `"table"`) - `wall_clock_seconds` merged into token summary output — see duration alongside token counts in one call - **Telemetry clear marker**: `write_telemetry_clear_marker()` / `read_telemetry_clear_marker()` prevent token accounting drift on MCP server restart after `clear=True` - **Quota event logging**: `quota_check.py` hook now writes structured JSONL events (`cache_miss`, `parse_error`, `blocked`, `approved`) to `quota_events.jsonl` --- ## CI Watcher & Remote Resolution Fixes (#395, #406) - **`CIRunScope` value object** — carries `workflow` + `head_sha` scope; replaces bare `head_sha` parameter across all CI watcher signatures - **Workflow filter**: `wait_for_ci` and `get_ci_status` accept `workflow` parameter (falls back to project-level `config.ci.workflow`), preventing unrelated workflows (version bumps, labelers) from satisfying CI checks - 
**`FAILED_CONCLUSIONS` expanded**: `failure` → `{failure, timed_out, startup_failure, cancelled}` - **Canonical remote resolver** (`execution/remote_resolver.py`): `resolve_remote_repo()` with `REMOTE_PRECEDENCE = (upstream, origin)` — correctly resolves `owner/repo` after `clone_repo` sets `origin` to `file://` isolation URL - **Clone isolation fix**: `clone_repo` now always clones from remote URL (never local path); sets `origin=file:///<clone>` for isolation and `upstream=<real_url>` for push/CI operations --- ## PR Pipeline Gates (#317, #343) - **`pipeline/pr_gates.py`**: `is_ci_passing()`, `is_review_passing()`, `partition_prs()` — partitions PRs into eligible/CI-blocked/review-blocked with human-readable reasons - **`pipeline/fidelity.py`**: `extract_linked_issues()` (Closes/Fixes/Resolves patterns), `is_valid_fidelity_finding()` schema validation - **`check_pr_mergeable`** now returns `mergeable_status` field alongside boolean - **`release_issue`** gains `target_branch` + `staged_label` parameters for staged issue lifecycle on non-default branches (#392) --- ## Recipe System Changes ### Structural - `RecipeIngredient.hidden` field — excluded from ingredients table (used for internal flags like `sprint_mode`) - `Recipe.experimental` flag parsed from YAML - `_TERMINAL_TARGETS` moved to `schema.py` as single source of truth - `format_ingredients_table()` with sorted display order (required → auto-detect → flags → optional → constants) - Diagram rendering engine (~670 lines) removed from `diagrams.py` — rendering now handled by `/render-recipe` skill; format version bumped to v7 ### Recipe YAML Changes - **Deleted**: `audit-and-fix.yaml`, `batch-implementation.yaml`, `bugfix-loop.yaml` - **Renamed**: `pr-merge-pipeline.yaml` → `merge-prs.yaml` - **`implementation.yaml`**: merge queue steps, `auto_merge`/`sprint_mode` ingredients, `base_branch` default → `""` (auto-detect), CI workflow filter, `extract_pr_number` step - **`remediation.yaml`**: `topic` → `task` 
rename, merge queue steps, `dry_walkthrough` retries:3 with forward-only routing, `verify` → `test` rename - **`merge-prs.yaml`**: full queue-mode path, `open-integration-pr` step (replaces `create-review-pr`), post-PR mergeability polling, review cycle with `resolve-review` retries ### New Semantic Rules - `missing-output-patterns` (WARNING) — flags `run_skill` steps without `expected_output_patterns` - `unknown-sub-recipe` (ERROR) — validates sub-recipe references exist - `circular-sub-recipe` (ERROR) — DFS cycle detection - `unknown-skill-command` (ERROR) — validates skill names against bundled set - `telemetry-before-open-pr` (WARNING) — ensures telemetry step precedes `open-pr` --- ## New Skills (24) ### Architecture Lens Family (13) `arch-lens-c4-container`, `arch-lens-concurrency`, `arch-lens-data-lineage`, `arch-lens-deployment`, `arch-lens-development`, `arch-lens-error-resilience`, `arch-lens-module-dependency`, `arch-lens-operational`, `arch-lens-process-flow`, `arch-lens-repository-access`, `arch-lens-scenarios`, `arch-lens-security`, `arch-lens-state-lifecycle` ### Audit Family (5) `audit-arch`, `audit-bugs`, `audit-cohesion`, `audit-defense-standards`, `audit-tests` ### Planning & Diagramming (3) `elaborate-phase`, `make-arch-diag`, `make-req` ### Bug/Guard Lifecycle (2) `design-guards`, `verify-diag` ### Pipeline (1) `open-integration-pr` — creates integration PRs with per-PR details, arch-lens diagrams, carried-forward `Closes #N` references, and auto-closes collapsed PRs ### Sprint Planning (1 — gated by sub-recipe) `sprint-planner` — selects a focused, conflict-free sprint from a triage manifest --- ## Skill Modifications (Highlights) - **`analyze-prs`**: merge queue detection, CI/review eligibility filtering, queue-mode ordering - **`dry-walkthrough`**: Step 4.5 Historical Regression Check (git history mining + GitHub issue cross-reference) - **`review-pr`**: deterministic diff annotation via `diff_annotator.py`, echo-primary-obligation step, 
post-completion confirmation, degraded-mode narration
- **`collapse-issues`**: content fidelity enforcement — per-issue `fetch_github_issue` calls, copy-mode body assembly (#388)
- **`prepare-issue`**: multi-keyword dedup search, numbered candidate selection, extend-existing-issue flow
- **`resolve-review`**: GraphQL thread auto-resolution after addressing findings (#379)
- **`resolve-merge-conflicts`**: conflict resolution decision report with per-file log (#389)
- **Cross-skill**: output tokens migrated to `key = value` format; code-index paths made generic with fallback notes; arch-lens references fully qualified; anti-prose guards at loop boundaries

---

## CLI & Hooks

### New CLI Commands

- `autoskillit install` — plugin installation + cache refresh
- `autoskillit upgrade` — `.autoskillit/scripts/` → `.autoskillit/recipes/` migration

### CLI Changes

- `doctor`: plugin-aware MCP check, PostToolUse hook scanning, `--fix` flag removed
- `init`: GitHub repo prompt, `.secrets.yaml` template, plugin-aware registration
- `chefs-hat`: pre-launch banner, `--dangerously-skip-permissions` confirmation
- `recipes render`: repurposed from generator to viewer (delegates to `/render-recipe`)
- `serve`: server import deferred to after `configure_logging()` to prevent stdout corruption

### New Hooks

- `branch_protection_guard.py` (PreToolUse) — denies `merge_worktree`/`push_to_remote` targeting protected branches
- `headless_orchestration_guard.py` (PreToolUse) — blocks orchestration tools in headless sessions
- `pretty_output.py` (PostToolUse) — MCP JSON → Markdown-KV reformatter

### Hook Infrastructure

- `HookDef.event_type` field — registry now handles both PreToolUse and PostToolUse
- `generate_hooks_json()` groups entries by event type
- `_evict_stale_autoskillit_hooks` and `sync_hooks_to_settings` made event-type-agnostic

---

## Core & Config

### New Core Modules

- `core/branch_guard.py` — `is_protected_branch()` pure function
- `core/github_url.py` — `parse_github_repo()` + `normalize_owner_repo()` canonical parsers

### Core Type Expansions

- `AUTOSKILLIT_PRIVATE_ENV_VARS` frozenset
- `WORKER_TOOLS` / `HEADLESS_BLOCKED_UNGATED_TOOLS` split from `UNGATED_TOOLS`
- `TOOL_CATEGORIES` — categorized listing for `open_kitchen` response
- `CIRunScope` — immutable scope for CI watcher calls
- `MergeQueueWatcher` protocol
- `SkillResult.cli_subtype` + `write_path_warnings` fields
- `SubprocessRunner.env` parameter

### Config

- `safety.protected_branches`: `[main, integration, stable]`
- `github.staged_label`: `"staged"`
- `ci.workflow`: workflow filename filter (e.g., `"tests.yml"`)
- `branching.default_base_branch`: `"integration"` → `"main"`
- `ModelConfig.default`: `str | None` → `str = "sonnet"`

---

## Infrastructure & Release

### Version

- `0.2.0` → `0.3.1` across `pyproject.toml`, `plugin.json`, `uv.lock`
- FastMCP dependency: `>=3.0.2` → `>=3.1.1,<4.0` (#399)

### CI/CD Workflows

- **`version-bump.yml`** (new) — auto patch-bumps `main` on integration PR merge, force-syncs integration branch one patch ahead
- **`release.yml`** (new) — minor version bump + GitHub Release on merge to `stable`
- **`codeql.yml`** (new) — CodeQL analysis for `stable` PRs (Python + Actions)
- **`tests.yml`** — `merge_group:` trigger added; multi-OS now only for `stable`

### PyPI Readiness

- `pyproject.toml`: `readme`, `license`, `authors`, `keywords`, `classifiers`, `project.urls`, `hatch.build.targets.sdist` inclusion list

### readOnlyHint Parallel Execution Fix

- All MCP tools annotated `readOnlyHint=True` — enables Claude Code parallel tool execution (~7x speedup). One deliberate exception: `wait_for_merge_queue` uses `readOnlyHint=False` (actually mutates queue state)

### Tool Response Exception Boundary

- `track_response_size` decorator catches unhandled exceptions and serializes them as `{"success": false, "subtype": "tool_exception"}` — prevents FastMCP opaque error wrapping

### SkillResult Subtype Normalization (#358)

- `_normalize_subtype()` gate eliminates dual-source contradiction between CLI subtype and session outcome
- Class 2 upward: `SUCCEEDED + error_subtype → "success"` (drain-race artifact)
- Class 1 downward: `non-SUCCEEDED + "success" → "empty_result"` / `"missing_completion_marker"` / `"adjudicated_failure"`

---

## Test Coverage

**47 new test files** (+12,703 lines) covering:

| Area | Key Tests |
|------|-----------|
| Merge queue watcher state machine | `test_merge_queue.py` (226 lines) |
| Clone isolation × CI resolution | `test_clone_ci_contract.py`, `test_remote_resolver.py` |
| PostToolUse hook | `test_pretty_output.py` (1,516 lines, 40+ cases) |
| Branch protection + headless guards | `test_branch_protection_guard.py`, `test_headless_orchestration_guard.py` |
| Sub-recipe composition | 5 test files (schema, loading, validation, sprint mode × 2) |
| Telemetry formatter | `test_telemetry_formatter.py` (281 lines) |
| PR pipeline gates | `test_analyze_prs_gates.py`, `test_review_pr_fidelity.py` |
| Diff annotator | `test_diff_annotator.py` (242 lines) |
| Skill compliance | Output token format, genericization, loop-boundary guards |
| Release workflows | Structural contracts for `version-bump.yml`, `release.yml` |
| Issue content fidelity | Body-assembling skills must call `fetch_github_issue` per-issue |
| CI watcher scope | `test_ci_params.py` — workflow_id query param composition |

---

## Consolidated PRs

#293, #295, #314, #315, #316, #317, #318, #319, #323, #332, #336, #337, #338, #339, #341, #343, #351, #358, #359, #360, #361, #362, #363, #366, #368, #370, #375, #377, #378, #379, #380, #388, #389, #390, #391, #392, #393, #395, #396, #397, #399, #405, #406

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
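
For readers unfamiliar with the "Tool Response Exception Boundary" item in the commit message above, here is a minimal sketch of how such a decorator might work. It is not the project's `track_response_size` implementation: the name `tool_exception_boundary`, the synchronous signature, the `flaky_tool` example, and the extra `error` field are all assumptions made for illustration. Only the `{"success": false, "subtype": "tool_exception"}` envelope comes from the commit message.

```python
import functools
import json
from typing import Any, Callable


def tool_exception_boundary(func: Callable[..., str]) -> Callable[..., str]:
    """Hypothetical stand-in for the boundary role described for `track_response_size`."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Serialize the failure using the envelope named in the commit message.
            # The "error" field is an assumption added here for readability.
            return json.dumps(
                {
                    "success": False,
                    "subtype": "tool_exception",
                    "error": f"{type(exc).__name__}: {exc}",
                }
            )

    return wrapper


@tool_exception_boundary
def flaky_tool(fail: bool) -> str:
    """Illustrative tool handler that returns a JSON string or raises."""
    if fail:
        raise RuntimeError("boom")
    return json.dumps({"success": True})
```

With a wrapper like this, the caller always receives a parseable JSON payload and can branch on `subtype`, rather than handling an error wrapped by the server framework.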

Trecek commented Mar 11, 2026

Trecek merged commit 79a77b3 into integration Mar 11, 2026
3 checks passed

Trecek deleted the dry-walkthrough-add-historical-regression-check-against-git/307 branch March 11, 2026 20:45

Trecek mentioned this pull request Mar 12, 2026

Integration: collapsed PRs #337, #339, #336, #332, #338, #343, #341, #333, #342 into integration #351

Merged

Trecek mentioned this pull request Mar 15, 2026

Integration v0.3.1: Merge Queue, Sub-Recipes, PostToolUse Reformatter, Headless Isolation #404

Merged

Trecek mentioned this pull request Mar 19, 2026

Promote integration → main (56 PRs, 46 issues) #438

Merged

Trecek merged 1 commit into integration from dry-walkthrough-add-historical-regression-check-against-git/307
