
Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307) #337

Merged
Trecek merged 1 commit into integration from
dry-walkthrough-add-historical-regression-check-against-git/307
Mar 11, 2026

@Trecek Trecek commented Mar 11, 2026

Summary

The dry-walkthrough skill's Step 4.5 (Historical Regression Check) is already fully implemented in SKILL.md — including the git history scan, GitHub issues cross-reference, gh auth guard, actionable vs. informational classification, and the ### Historical Context section in Step 7. Contract tests for Step 4.5 exist and pass in tests/skills/test_dry_walkthrough_contracts.py.
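
The actionable vs. informational split in Step 4.5 can be illustrated with a minimal sketch. It assumes commit subjects have already been collected (for example from `git log`), and it treats a keyword hit plus a symbol-level match as actionable, following the process flow diagram in this PR. The function name and the plain substring match are hypothetical, not the skill's actual logic.

```python
import re
from typing import Optional, Sequence

# Keywords taken from the Step 4.5 node in the process flow diagram.
_SIGNAL = re.compile(r"\b(fix|revert|remove|replace|delete)", re.IGNORECASE)


def classify_commit(subject: str, plan_symbols: Sequence[str]) -> Optional[str]:
    """Classify one commit subject against the symbols a plan touches.

    Returns "actionable" for a keyword hit that also names a plan symbol,
    "informational" for a keyword hit alone, and None when there is no signal.
    """
    if not _SIGNAL.search(subject):
        return None
    if any(symbol in subject for symbol in plan_symbols):
        return "actionable"
    return "informational"
```

Under these assumptions, a subject such as `Revert run_recipe wiring` checked against the symbol `run_recipe` is actionable, while `fix typo in docs` is only informational.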

The remaining work is REQ-GEN-002 from Issue #311 (consolidated here): replace the hardcoded task test-all references in Step 4 of SKILL.md with a config-driven reference to test_check.command, so the skill validates correctly for any project regardless of its test runner.
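
As a rough illustration of the config-driven lookup, the sketch below resolves the test command from an already-parsed config mapping and flags plan lines that hardcode a different runner. Only the `test_check.command` key and the `task test-check` default come from this PR; the function names, the regex, and the dictionary shape are assumptions for illustration.

```python
import re

# Default shown in the process flow diagram for this PR.
DEFAULT_TEST_COMMAND = "task test-check"

# Hypothetical pattern: runner invocations a plan should not hardcode.
_HARDCODED_RUNNERS = re.compile(r"\b(python -m pytest|pytest|make test)\b")


def resolve_test_command(config: dict) -> str:
    """Return test_check.command from a parsed config, or the default."""
    section = config.get("test_check") or {}
    return section.get("command") or DEFAULT_TEST_COMMAND


def find_hardcoded_test_commands(plan_text: str, configured: str) -> list:
    """List plan lines that invoke a test runner other than the configured one."""
    hits = []
    for line in plan_text.splitlines():
        if configured in line:
            continue  # the line already uses the configured command
        if _HARDCODED_RUNNERS.search(line):
            hits.append(line.strip())
    return hits
```

A project that sets `test_check.command` to `npm test` would get that value back from `resolve_test_command`, and any plan line that still calls `pytest` directly would be reported for replacement.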

Architecture Impact

Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    PASS([PASS — Ready to implement])
    REVISED([REVISED — See changes])

    subgraph Load ["Step 1: Load & Detect"]
        direction TB
        LOAD["Load Plan<br/>━━━━━━━━━━<br/>arg path / pasted / temp/ scan"]
        MULTIPART{"Multi-part plan?<br/>━━━━━━━━━━<br/>filename contains _part_?"}
        SCOPE_WARN["Scope Warning<br/>━━━━━━━━━━<br/>Insert boundary block<br/>Restrict to this part only"]
    end

    subgraph Validate ["Steps 2–3: Validate Phases"]
        direction TB
        PHASE_V["Phase Subagents<br/>━━━━━━━━━━<br/>files exist · fns exist<br/>assumptions correct<br/>wiring complete"]
        XPHASE["Cross-Phase Deps<br/>━━━━━━━━━━<br/>Phase ordering · implicit deps<br/>reorder safety"]
    end

    subgraph Rules ["● Step 4: Validate Against Project Rules"]
        direction TB
        RULE_CHK["Rule Checklist<br/>━━━━━━━━━━<br/>No compat code · no hidden fallbacks<br/>no stakeholder sections · arch patterns"]
        READ_CFG["● Read test_check.command<br/>━━━━━━━━━━<br/>.autoskillit/config.yaml<br/>default: task test-check"]
        TEST_CMD{"● Non-config test<br/>command in plan?<br/>━━━━━━━━━━<br/>pytest / python -m pytest<br/>make test etc."}
        WORKTREE{"Hardcoded worktree<br/>setup in plan?<br/>━━━━━━━━━━<br/>uv venv / pip install etc."}
    end

    subgraph History ["Step 4.5: Historical Regression Check"]
        direction TB
        GH_AUTH{"gh auth status<br/>━━━━━━━━━━<br/>GitHub auth available?"}
        GIT_SCAN["Git History Scan<br/>━━━━━━━━━━<br/>git log -100 on PLAN_FILES<br/>fix/revert/remove/replace/delete"]
        GIT_SIG{"Signal strength?<br/>━━━━━━━━━━<br/>Symbol-level match?"}
        GH_ISSUES["GitHub Issues XRef<br/>━━━━━━━━━━<br/>open + closed (last 30d)<br/>keyword cross-reference"]
        GH_MATCH{"Issue type?<br/>━━━━━━━━━━<br/>open vs closed match"}
    end

    subgraph Fix ["Steps 5–6: Fix & Mark"]
        direction TB
        FIX["Fix the Plan<br/>━━━━━━━━━━<br/>Direct edits to plan file<br/>No gap-analysis sections"]
        MARK["Mark Verified<br/>━━━━━━━━━━<br/>Dry-walkthrough verified = TRUE<br/>(first line of plan)"]
    end

    REPORT["● Step 7: Report to Terminal<br/>━━━━━━━━━━<br/>Changes Made · Verified<br/>### Historical Context · Recommendation"]

    %% FLOW %%
    START --> LOAD
    LOAD --> MULTIPART
    MULTIPART -->|"YES"| SCOPE_WARN
    MULTIPART -->|"NO"| PHASE_V
    SCOPE_WARN --> PHASE_V
    PHASE_V --> XPHASE
    XPHASE --> RULE_CHK
    RULE_CHK --> READ_CFG
    READ_CFG --> TEST_CMD
    TEST_CMD -->|"YES — replace"| FIX
    TEST_CMD -->|"NO"| WORKTREE
    WORKTREE -->|"YES — flag & replace"| FIX
    WORKTREE -->|"NO"| GH_AUTH
    GH_AUTH -->|"available"| GIT_SCAN
    GH_AUTH -->|"unavailable"| GIT_SCAN
    GIT_SCAN --> GIT_SIG
    GIT_SIG -->|"strong — actionable"| FIX
    GIT_SIG -->|"weak — informational"| REPORT
    GH_AUTH -->|"available"| GH_ISSUES
    GH_ISSUES --> GH_MATCH
    GH_MATCH -->|"closed issue — actionable"| FIX
    GH_MATCH -->|"open issue — informational"| REPORT
    FIX --> MARK
    MARK --> REPORT
    REPORT -->|"issues found"| REVISED
    REPORT -->|"no issues"| PASS

    %% CLASS ASSIGNMENTS %%
    class START,PASS,REVISED terminal;
    class LOAD,PHASE_V,XPHASE,GIT_SCAN,GH_ISSUES handler;
    class RULE_CHK,READ_CFG phase;
    class MULTIPART,TEST_CMD,WORKTREE,GH_AUTH,GIT_SIG,GH_MATCH stateNode;
    class FIX,MARK detector;
    class SCOPE_WARN output;
    class REPORT output;
```

● Modified component | ★ New component

Closes #307

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-307-20260310-203646-026624/temp/make-plan/dry_walkthrough_historical_regression_plan_2026-03-10_120000.md

🤖 Generated with Claude Code via AutoSkillit

Replace hardcoded `task test-all` references in Step 4 of dry-walkthrough
SKILL.md with config-driven references to `test_check.command` from
.autoskillit/config.yaml. Add two contract tests that enforce the new
behaviour and fail against the old hardcoded wording.
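
A contract test of this kind might look like the sketch below. It assumes SKILL.md marks steps with `### Step N` headings and that the check runs against the raw text; both the heading format and the helper names are hypothetical, and the real tests in `tests/skills/test_dry_walkthrough_contracts.py` may be structured differently.

```python
import re


def extract_step(skill_text: str, step: str) -> str:
    """Return the body under '### Step <step>' up to the next '### Step' heading."""
    pattern = (
        rf"^### Step {re.escape(step)}(?![.\d])[^\n]*\n"  # heading line for this step only
        r"(.*?)(?=^### Step |\Z)"  # lazily capture until the next step heading
    )
    match = re.search(pattern, skill_text, flags=re.MULTILINE | re.DOTALL)
    return match.group(1) if match else ""


def check_step4_is_config_driven(skill_text: str) -> None:
    """Fail if Step 4 still hardcodes the old command or omits the config key."""
    step4 = extract_step(skill_text, "4")
    assert "task test-all" not in step4, "Step 4 still hardcodes task test-all"
    assert "test_check.command" in step4, "Step 4 must reference test_check.command"
```

The negative lookahead keeps `Step 4` from also matching the `Step 4.5` heading, so mentions of `task test-all` elsewhere in the file do not trip the check.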

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@Trecek Trecek left a comment

AutoSkillit review passed. No blocking issues found. (Self-review — cannot approve own PR)

@Trecek Trecek merged commit 79a77b3 into integration Mar 11, 2026
3 checks passed
@Trecek Trecek deleted the dry-walkthrough-add-historical-regression-check-against-git/307 branch March 11, 2026 20:45
Trecek added a commit that referenced this pull request Mar 12, 2026
…333, #342 into integration (#351)

## Integration Summary

Collapsed 9 PRs into `pr-batch/pr-merge-20260311-133920` targeting
`integration`.

## Merged PRs

| # | Title | Complexity | Additions | Deletions | Overlaps |
|---|-------|-----------|-----------|-----------|---------|
| #337 | Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307) | simple | +29 | -2 | — |
| #339 | Implementation Plan: Release CI — Force-Push Integration Back-Sync | simple | +88 | -45 | — |
| #336 | Enhance prepare-issue with Duplicate Detection and Broader Triggers | needs_check | +161 | -8 | — |
| #332 | Rectify: Display Output Bugs #329 — Terminal Targets Consolidation — PART A ONLY | needs_check | +783 | -13 | — |
| #338 | Implementation Plan: Pre-release Readiness — Stability Fixes | needs_check | +238 | -36 | — |
| #343 | Implementation Plan: PR Pipeline Gates — Mergeability Gate and Review Cycle | needs_check | +384 | -5 | #338 |
| #341 | Pipeline observability: quota events, wall-clock timing, drift fix | needs_check | +480 | -5 | #332, #338 |
| #333 | Remove run_recipe — Eliminate Sub-Orchestrator Pattern | needs_check | +538 | -655 | #332, #338, #341 |
| #342 | feat: genericize codebase and bundle external dependencies for public release | needs_check | +5286 | -1062 | #332, #333, #338, #341, #343 |

## Audit

**Verdict:** GO

## Architecture Impact

### Development Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph SourceTree ["PROJECT STRUCTURE (● = modified)"]
        direction TB
        SRC["● src/autoskillit/<br/>━━━━━━━━━━<br/>105 .py source files<br/>cli · config · core<br/>execution · hooks · pipeline<br/>recipe · server · workspace"]
        SKILLS["● + ★ src/autoskillit/skills/<br/>━━━━━━━━━━<br/>52 bundled skills<br/>★ 13 arch-lens-* SKILL.md added<br/>★ 3 audit-* SKILL.md added<br/>● 14 existing skills updated"]
        RECIPES["● src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>8 bundled YAML recipes<br/>All recipes updated"]
        TESTS["● + ★ tests/<br/>━━━━━━━━━━<br/>173 .py test files<br/>★ 6 new test files added"]
    end

    subgraph Build ["BUILD TOOLING"]
        direction TB
        PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>uv package manager<br/>10 runtime deps"]
        TASKFILE["Taskfile.yml<br/>━━━━━━━━━━<br/>test-all · test-check<br/>test-smoke · install-worktree"]
    end

    subgraph Quality ["CODE QUALITY GATES"]
        direction TB
        RFMT["ruff-format<br/>━━━━━━━━━━<br/>Auto-fix formatting"]
        RLINT["ruff<br/>━━━━━━━━━━<br/>Lint + auto-fix"]
        MYPY["mypy src/<br/>━━━━━━━━━━<br/>--ignore-missing-imports"]
        UVLOCK["uv lock --check<br/>━━━━━━━━━━<br/>Lock file integrity"]
        SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning"]
        GUARD["★ headless_orchestration_guard.py<br/>━━━━━━━━━━<br/>★ PreToolUse hook<br/>Blocks run_skill/run_cmd/run_python<br/>from headless sessions"]
    end

    subgraph Testing ["TEST FRAMEWORK"]
        direction TB
        PYTEST["pytest + asyncio_mode=auto<br/>━━━━━━━━━━<br/>xdist -n 4 parallel<br/>timeout=60s signal method"]
        NEWTEST["★ New Test Files<br/>━━━━━━━━━━<br/>★ test_headless_orchestration_guard<br/>★ test_audit_and_fix_degradation<br/>★ test_rules_inputs<br/>★ test_skill_genericization<br/>★ test_pyproject_metadata<br/>★ test_release_sanity"]
    end

    subgraph CI ["CI/CD WORKFLOWS"]
        direction LR
        TESTS_WF["tests.yml<br/>━━━━━━━━━━<br/>PR test gate"]
        RELEASE_WF["release.yml<br/>━━━━━━━━━━<br/>Release automation"]
        BUMP_WF["● version-bump.yml<br/>━━━━━━━━━━<br/>● Force-push back-sync<br/>integration → main"]
    end

    subgraph EntryPoints ["ENTRY POINTS"]
        EP["autoskillit CLI<br/>━━━━━━━━━━<br/>serve · init · skills<br/>recipes · doctor · workspace"]
    end

    SRC --> PYPROJECT
    SKILLS --> PYPROJECT
    TESTS --> PYTEST
    PYPROJECT --> TASKFILE
    PYPROJECT --> RFMT
    RFMT --> RLINT
    RLINT --> MYPY
    MYPY --> UVLOCK
    UVLOCK --> SECRETS
    SECRETS --> GUARD
    GUARD --> PYTEST
    PYTEST --> NEWTEST
    NEWTEST --> BUMP_WF
    TESTS_WF --> PYTEST
    PYPROJECT --> EP

    class SRC,TESTS stateNode;
    class SKILLS,RECIPES newComponent;
    class PYPROJECT,TASKFILE phase;
    class RFMT,RLINT,MYPY,UVLOCK,SECRETS detector;
    class GUARD newComponent;
    class PYTEST handler;
    class NEWTEST newComponent;
    class TESTS_WF,RELEASE_WF phase;
    class BUMP_WF newComponent;
    class EP output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Teal | Structure | Source directories and test suite |
| Green (★) | New/Modified | New files and components added in this PR |
| Purple | Build | Build configuration and task automation |
| Red | Quality Gates | Pre-commit hooks, linters, type checker |
| Orange | Test Runner | pytest execution engine |
| Dark Teal | Entry Points | CLI commands |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction LR
        TYPES["● core/types.py<br/>━━━━━━━━━━<br/>GATED_TOOLS · UNGATED_TOOLS<br/>RecipeSource (★ promoted here)<br/>ClaudeFlags · StrEnums<br/>fan-in: ~75 files"]
        COREIO["core/io.py · logging.py · paths.py<br/>━━━━━━━━━━<br/>Atomic write · Logger · pkg_root()"]
    end

    subgraph L1P ["L1 — PIPELINE (imports L0 only)"]
        direction TB
        GATE["● pipeline/gate.py<br/>━━━━━━━━━━<br/>DefaultGateState<br/>gate_error_result()<br/>★ headless_error_result()<br/>re-exports GATED/UNGATED_TOOLS"]
        PIPEINIT["● pipeline/__init__.py<br/>━━━━━━━━━━<br/>Re-exports public surface<br/>ToolContext · AuditLog<br/>TokenLog · DefaultGateState"]
    end

    subgraph L1E ["L1 — EXECUTION (imports L0 only)"]
        direction TB
        HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>Headless Claude sessions<br/>Imports core types via TYPE_CHECKING<br/>for ToolContext (no runtime cycle)"]
        COMMANDS["● execution/commands.py<br/>━━━━━━━━━━<br/>ClaudeHeadlessCmd builder"]
        SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>Session diagnostics writer"]
    end

    subgraph L2 ["L2 — RECIPE (imports L0+L1)"]
        direction TB
        SCHEMA["● recipe/schema.py<br/>━━━━━━━━━━<br/>Recipe · RecipeStep · DataFlowWarning<br/>RecipeSource (now from L0)"]
        RULES["● recipe/rules_inputs.py<br/>━━━━━━━━━━<br/>★ Ingredient validation rules<br/>reads GATED_TOOLS from L0 via<br/>pipeline re-export"]
        ANALYSIS["● recipe/_analysis.py<br/>━━━━━━━━━━<br/>Step graph builder"]
        VALIDATOR["● recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe()"]
    end

    subgraph L3S ["L3 — SERVER (imports all layers)"]
        direction TB
        HELPERS["● server/helpers.py<br/>━━━━━━━━━━<br/>_require_enabled() — reads gate<br/>★ _require_not_headless()<br/>Shared by all tool handlers"]
        TOOLS_EX["● server/tools_execution.py<br/>━━━━━━━━━━<br/>run_cmd · run_python · run_skill<br/>✗ run_recipe REMOVED<br/>Uses _require_not_headless()"]
        TOOLS_GIT["● server/tools_git.py<br/>━━━━━━━━━━<br/>merge_worktree · classify_fix<br/>● check_pr_mergeable (new gate)"]
        TOOLS_K["● server/tools_kitchen.py<br/>━━━━━━━━━━<br/>open_kitchen · close_kitchen"]
        FACTORY["● server/_factory.py<br/>━━━━━━━━━━<br/>Composition root<br/>Wires ToolContext"]
    end

    subgraph L3H ["L3 — HOOKS (stdlib only for guard)"]
        direction LR
        HOOK_GUARD["★ hooks/headless_orchestration_guard.py<br/>━━━━━━━━━━<br/>★ PreToolUse hook (stdlib only)<br/>Blocks run_skill/run_cmd/run_python<br/>from AUTOSKILLIT_HEADLESS=1 sessions<br/>NO autoskillit imports"]
        PRETTY["● hooks/pretty_output.py<br/>━━━━━━━━━━<br/>PostToolUse response formatter"]
    end

    subgraph L3C ["L3 — CLI (imports all layers)"]
        direction LR
        CLI_APP["● cli/app.py<br/>━━━━━━━━━━<br/>serve · init · skills · recipes<br/>doctor · workspace"]
        CLI_PROMPTS["● cli/_prompts.py<br/>━━━━━━━━━━<br/>Orchestrator prompt builder"]
    end

    TYPES -->|"fan-in ~75"| GATE
    TYPES -->|"fan-in ~75"| HEADLESS
    TYPES -->|"fan-in ~75"| SCHEMA
    COREIO --> PIPEINIT
    GATE --> PIPEINIT
    PIPEINIT -->|"gate_error_result<br/>headless_error_result"| HELPERS
    HEADLESS --> HELPERS
    COMMANDS --> HEADLESS
    SESSION_LOG --> HELPERS
    SCHEMA -->|"RecipeSource from L0"| RULES
    RULES --> VALIDATOR
    ANALYSIS --> VALIDATOR
    HELPERS -->|"_require_not_headless"| TOOLS_EX
    HELPERS --> TOOLS_GIT
    HELPERS --> TOOLS_K
    VALIDATOR --> FACTORY
    PIPEINIT --> FACTORY
    FACTORY --> CLI_APP
    FACTORY --> CLI_PROMPTS
    HOOK_GUARD -.->|"ENV: AUTOSKILLIT_HEADLESS<br/>zero autoskillit imports"| TOOLS_EX

    class TYPES,COREIO stateNode;
    class GATE,PIPEINIT phase;
    class HEADLESS,COMMANDS,SESSION_LOG handler;
    class SCHEMA,RULES,ANALYSIS,VALIDATOR phase;
    class HELPERS,TOOLS_EX,TOOLS_GIT,TOOLS_K handler;
    class FACTORY cli;
    class CLI_APP,CLI_PROMPTS cli;
    class HOOK_GUARD newComponent;
    class PRETTY handler;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Teal | L0 Core | High fan-in foundation types (zero reverse deps) |
| Purple | L1/L2 Control | Pipeline gate, recipe schema and rules |
| Orange | L1/L3 Processors | Execution handlers, server tool handlers |
| Dark Blue | L3 CLI | Composition root and CLI entry points |
| Green (★) | New Components | headless_orchestration_guard — standalone hook |
| Dashed | ENV Signal | OS-level check; no code import relationship |

Closes #307
Closes #327
Closes #308
Closes #329
Closes #304
Closes #328
Closes #302
Closes #330
Closes #311

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit


---

## Merge Conflict Resolution

The batch branch was rebased onto `integration` to resolve 17 file
conflicts. All conflicts arose because PRs #337–#341 were squash-merged
into both `integration` (directly) and the batch branch (via the
pipeline), while PRs #333 and #342 required conflict resolution work
that only exists on the batch branch.

**Resolution principle:** Batch branch version wins for all files
touched by #333/#342 conflict resolution and remediation, since that
state was fully tested (3752 passed). Integration-only additions (e.g.
`TestGetQuotaEvents`) were preserved where they don't overlap.

### Per-file decisions

| File | Decision | Rationale |
|------|----------|-----------|
| `CLAUDE.md` | **Batch wins** | Batch has corrected tool inventory (run_recipe removed, get_quota_events added, 25 kitchen tools) |
| `core/types.py` | **Batch wins** | Batch splits monolithic UNGATED_TOOLS into WORKER_TOOLS + HEADLESS_BLOCKED_UNGATED_TOOLS; removes run_recipe from GATED_TOOLS |
| `execution/__init__.py` | **Batch wins** | Batch removes dead exports (build_subrecipe_cmd, run_subrecipe_session) |
| `execution/headless.py` | **Batch wins** | Batch deletes run_subrecipe_session function (530+ lines); keeps run_headless_core with token_log error handling |
| `hooks/pretty_output.py` | **Batch wins** | Batch removes run_recipe from _UNFORMATTED_TOOLS, adds get_quota_events |
| `recipes/pr-merge-pipeline.yaml` | **Batch wins** | Batch has base_branch required:true, updated kitchen rules (main instead of integration) |
| `server/_state.py` | **Batch wins** | Batch adds .telemetry_cleared_at marker reading in _initialize |
| `server/helpers.py` | **Batch wins** | Batch removes _run_subrecipe and run_subrecipe_session import; adds _require_not_headless |
| `server/tools_git.py` | **Batch wins** | Batch has updated classify_fix with git fetch and check_pr_mergeable gate |
| `server/tools_kitchen.py` | **Batch wins** | Batch adds headless gates to open_kitchen/close_kitchen; adds TOOL_CATEGORIES listing |
| `server/tools_status.py` | **Merge both** | Batch headless gates + wall_clock_seconds merged with integration's TestGetQuotaEvents (deduplicated) |
| `tests/conftest.py` | **Batch wins** | Batch replaces AUTOSKILLIT_KITCHEN_OPEN with AUTOSKILLIT_HEADLESS in fixture |
| `tests/execution/test_headless.py` | **Batch wins** | Batch removes run_subrecipe_session tests (deleted code); updates docstring |
| `tests/recipe/test_bundled_recipes.py` | **Merge both** | Batch base_branch=main assertions + integration WF7 graph test both kept |
| `tests/server/test_tools_kitchen.py` | **Batch wins** | Batch adds headless gate denial tests for open/close kitchen |
| `tests/server/test_tools_status.py` | **Merge both** | Batch headless gate tests merged with integration quota events tests |

### Post-rebase fixes
- Removed duplicate `TestGetQuotaEvents` class (existed in both batch
commit and auto-merged integration code)
- Fixed stale `_build_tool_listing` → `_build_tool_category_listing`
attribute reference
- Added `if diagram: print(diagram)` to `cli/app.py` cook function (test
expected terminal output)

### Verification
- **3752 passed**, 23 skipped, 0 failures
- 7 architecture contracts kept, 0 broken
- Pre-commit hooks all pass

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek added a commit that referenced this pull request Mar 15, 2026
…, Headless Isolation (#404)

## Summary

Integration rollup of **43 PRs** (#293–#406) consolidating **62
commits** across **291 files** (+27,909 / −6,040 lines). This release
advances AutoSkillit from v0.2.0 to v0.3.1 with GitHub merge queue
integration, sub-recipe composition, a PostToolUse output reformatter,
headless session isolation guards, and comprehensive pipeline
observability, plus 24 new bundled skills, 3 new MCP tools, and 47 new
test files.

---

## Major Features

### GitHub Merge Queue Integration (#370, #362, #390)
- New `wait_for_merge_queue` MCP tool — polls a PR through GitHub's
merge queue until merged, ejected, or timed out (default 600s). Uses
REST + GraphQL APIs with stuck-queue detection and auto-merge
re-enrollment
- New `DefaultMergeQueueWatcher` L1 service (`execution/merge_queue.py`)
— never raises; all outcomes are structured results
- `parse_merge_queue_response()` pure function for GraphQL queue entry
parsing
- New `auto_merge` ingredient in `implementation.yaml` and
`remediation.yaml` — enrolls PRs in the merge queue after CI passes
- Full queue-mode path added to `merge-prs.yaml`: detect queue → enqueue
→ wait → handle ejections → re-enter
- `analyze-prs` skill gains Step 0.5 (merge queue detection) and Step
1.5 (CI/review eligibility filtering)
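
The polling contract described above (wait until merged, ejected, or timed out, and never raise) can be sketched as follows. This is a simplified illustration under stated assumptions: `fetch_state` stands in for the REST/GraphQL lookup, the `QueueResult` shape and the 15-second interval are hypothetical, and stuck-queue detection and auto-merge re-enrollment are left out.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QueueResult:
    outcome: str  # "merged", "ejected", "timeout", or "error"
    detail: str = ""


def wait_for_queue(
    fetch_state: Callable[[], str],
    timeout_s: float = 600.0,  # default timeout stated in the release notes
    poll_interval_s: float = 15.0,  # assumed interval, not from the source
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> QueueResult:
    """Poll until the PR leaves the queue or the deadline passes; never raises."""
    deadline = clock() + timeout_s
    while True:
        try:
            state = fetch_state()
        except Exception as exc:  # report failures as structured results
            return QueueResult("error", str(exc))
        if state in ("merged", "ejected"):
            return QueueResult(state)
        if clock() >= deadline:
            return QueueResult("timeout", f"last state: {state}")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, and returning a result object for every path means callers branch on `outcome` instead of catching exceptions.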

### Sub-Recipe Composition (#380)
- Recipe steps can now reference sub-recipes via `sub_recipe` + `gate`
fields — lazy-loaded and merged at validation time
- Composition engine in `recipe/_api.py`: `_merge_sub_recipe()` inlines
sub-recipe steps with safe name-prefixing and route remapping (`done` →
parent's `on_success`, `escalate` → parent's `on_failure`)
- `_build_active_recipe()` evaluates gate ingredients against
overrides/defaults; dual validation runs on both active and combined
recipes
- First sub-recipe: `sprint-prefix.yaml` — triage → plan → confirm →
dispatch workflow, gated by `sprint_mode` ingredient (hidden, default
false)
- Both `implementation.yaml` and `remediation.yaml` gain `sprint_entry`
placeholder step
- New semantic rules: `unknown-sub-recipe` (ERROR),
`circular-sub-recipe` (ERROR) with DFS cycle detection
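
DFS cycle detection of the kind the `circular-sub-recipe` rule names can be sketched as below. The graph representation (recipe name mapped to the sub-recipes it references) and the function name are assumptions for illustration; the actual rule's data structures are not shown in these notes.

```python
from typing import Dict, List, Optional


def find_sub_recipe_cycle(graph: Dict[str, List[str]], root: str) -> Optional[List[str]]:
    """Return the first reference cycle reachable from root, or None."""
    path: List[str] = []  # current DFS stack, in visit order
    on_path = set()       # fast membership test for the stack
    finished = set()      # nodes fully explored with no cycle found

    def visit(name: str) -> Optional[List[str]]:
        if name in on_path:
            # Back edge: slice the stack from the first occurrence to close the loop.
            return path[path.index(name):] + [name]
        if name in finished:
            return None
        on_path.add(name)
        path.append(name)
        for child in graph.get(name, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(name)
        finished.add(name)
        return None

    return visit(root)
```

Names missing from the graph are treated as leaves here, since unknown references are the concern of the separate `unknown-sub-recipe` rule.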

### PostToolUse Output Reformatter (#293, #405)
- `pretty_output.py` — new 671-line PostToolUse hook that rewrites raw
MCP JSON responses to Markdown-KV before Claude consumes them (30–77%
token overhead reduction)
- Dedicated formatters for 11 high-traffic tools (`run_skill`,
`run_cmd`, `test_check`, `merge_worktree`, `get_token_summary`, etc.)
plus a generic KV formatter for remaining tools
- Pipeline vs. interactive mode detection via hook config file
- Unwraps Claude Code's `{"result": "<json-string>"}` envelope before
dispatching
- 1,516-line test file with 40+ behavioral tests
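
To make the envelope handling concrete, here is a minimal sketch of unwrapping `{"result": "<json-string>"}` and rendering the payload as key-value Markdown. The exact Markdown-KV layout used by `pretty_output.py` is not given in these notes, so the `**key:** value` format and both function names are assumptions.

```python
import json


def unwrap_envelope(raw: str) -> dict:
    """Parse a tool response, unwrapping a single-key 'result' envelope if present."""
    payload = json.loads(raw)
    if (
        isinstance(payload, dict)
        and set(payload) == {"result"}
        and isinstance(payload["result"], str)
    ):
        try:
            inner = json.loads(payload["result"])
        except json.JSONDecodeError:
            return payload  # 'result' was plain text, not nested JSON
        if isinstance(inner, dict):
            return inner
    # Non-dict payloads are wrapped so the formatter always receives a mapping.
    return payload if isinstance(payload, dict) else {"value": payload}


def to_markdown_kv(data: dict) -> str:
    """Render a flat mapping as one '**key:** value' line per entry."""
    lines = []
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            value = json.dumps(value, separators=(",", ":"))  # keep nested data compact
        lines.append(f"**{key}:** {value}")
    return "\n".join(lines)
```

A generic formatter like `to_markdown_kv` corresponds to the fallback path; the dedicated per-tool formatters would replace it for the high-traffic tools.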

### Headless Session Isolation (#359, #393, #397, #405, #406)
- **Env isolation**: `build_sanitized_env()` strips
`AUTOSKILLIT_PRIVATE_ENV_VARS` from subprocess environments, preventing
`AUTOSKILLIT_HEADLESS=1` from leaking into test runners
- **CWD path contamination defense**: `_inject_cwd_anchor()` anchors all
relative paths to session CWD; `_validate_output_paths()` checks
structured output tokens against CWD prefix; `_scan_jsonl_write_paths()`
post-session scanner catches actual Write/Edit/Bash tool calls outside
CWD
- **Headless orchestration guard**: new PreToolUse hook blocks
`run_skill`/`run_cmd`/`run_python` when `AUTOSKILLIT_HEADLESS=1`,
enforcing Tier 1/Tier 2 nesting invariant
- **`_require_not_headless()` server-side guard**: blocks 10
orchestration-only tools from headless sessions at the handler layer
- **Unified error response contract**: `headless_error_result()`
produces consistent 9-field responses;
`_build_headless_error_response()` canonical builder for all failure
paths in `tools_integrations.py`
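
The env-isolation idea can be shown in a few lines. This sketch assumes the private set contains `AUTOSKILLIT_HEADLESS` (the only member these notes name) and that the function accepts an optional base mapping; the real signature of `build_sanitized_env()` may differ.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed membership: only AUTOSKILLIT_HEADLESS is named in the release notes.
PRIVATE_ENV_VARS = frozenset({"AUTOSKILLIT_HEADLESS"})


def build_sanitized_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment, dropping variables that must not reach subprocesses."""
    source = os.environ if base is None else base
    return {key: value for key, value in source.items() if key not in PRIVATE_ENV_VARS}
```

The result would be passed as `env=` to `subprocess.run`, so a test runner launched from a headless session does not see `AUTOSKILLIT_HEADLESS=1` and mistake itself for one.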

### Cook UX Overhaul (#375, #363)
- `open_kitchen` now accepts optional `name` + `overrides` — opens
kitchen AND loads recipe in a single call
- Pre-launch terminal preview with ANSI-colored flow diagram and
ingredients table via new `cli/_ansi.py` module
- `--dangerously-skip-permissions` warning banner with interactive
confirmation prompt
- Randomized session greetings from themed pools
- Orchestrator prompt rewritten: recipe YAML no longer injected via
`--append-system-prompt`; session calls `open_kitchen('{recipe_name}')`
as first action
- Conversational ingredient collection replaces mechanical per-field
prompting

---

## New MCP Tools

| Tool | Gate | Description |
|------|------|-------------|
| `wait_for_merge_queue` | Kitchen | Polls PR through GitHub merge queue (REST + GraphQL) |
| `set_commit_status` | Kitchen | Posts GitHub Commit Status to a SHA for review-first gating |
| `get_quota_events` | Ungated | Surfaces quota guard decisions from `quota_events.jsonl` |

---

## Pipeline Observability (#318, #341)

- **`TelemetryFormatter`** (`pipeline/telemetry_fmt.py`): single source
of truth for all telemetry rendering; replaces the dual-formatter
anti-pattern. Four rendering modes, including Markdown table, terminal
table, and compact KV (for the PostToolUse hook)
- `get_token_summary` and `get_timing_summary` gain `format` parameter
(`"json"` | `"table"`)
- `wall_clock_seconds` merged into token summary output — see duration
alongside token counts in one call
- **Telemetry clear marker**: `write_telemetry_clear_marker()` /
`read_telemetry_clear_marker()` prevent token accounting drift on MCP
server restart after `clear=True`
- **Quota event logging**: `quota_check.py` hook now writes structured
JSONL events (`cache_miss`, `parse_error`, `blocked`, `approved`) to
`quota_events.jsonl`

---

## CI Watcher & Remote Resolution Fixes (#395, #406)

- **`CIRunScope` value object** — carries `workflow` + `head_sha` scope;
replaces bare `head_sha` parameter across all CI watcher signatures
- **Workflow filter**: `wait_for_ci` and `get_ci_status` accept
`workflow` parameter (falls back to project-level `config.ci.workflow`),
preventing unrelated workflows (version bumps, labelers) from satisfying
CI checks
- **`FAILED_CONCLUSIONS` expanded**: `failure` → `{failure, timed_out,
startup_failure, cancelled}`
- **Canonical remote resolver** (`execution/remote_resolver.py`):
`resolve_remote_repo()` with `REMOTE_PRECEDENCE = (upstream, origin)` —
correctly resolves `owner/repo` after `clone_repo` sets `origin` to
`file://` isolation URL
- **Clone isolation fix**: `clone_repo` now always clones from remote
URL (never local path); sets `origin=file:///<clone>` for isolation and
`upstream=<real_url>` for push/CI operations
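
The remote precedence rule can be sketched as below: try `upstream` first, then `origin`, and skip any URL that is not a GitHub remote (such as the `file://` isolation URL). The two constants mirror the values listed above; the mapping-based signature and the URL regex are assumptions for illustration.

```python
import re
from typing import Mapping, Optional

REMOTE_PRECEDENCE = ("upstream", "origin")
FAILED_CONCLUSIONS = frozenset({"failure", "timed_out", "startup_failure", "cancelled"})

# Hypothetical matcher for HTTPS and SSH GitHub URLs, with optional ".git" suffix.
_GITHUB_URL = re.compile(
    r"github\.com[:/](?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"
)


def resolve_remote_repo(remotes: Mapping[str, str]) -> Optional[str]:
    """Return 'owner/repo' from the first remote in precedence order that parses."""
    for name in REMOTE_PRECEDENCE:
        match = _GITHUB_URL.search(remotes.get(name, ""))
        if match:
            return f"{match['owner']}/{match['repo']}"
    return None


def is_failed(conclusion: str) -> bool:
    """Treat timeouts, startup failures, and cancellations as failures too."""
    return conclusion in FAILED_CONCLUSIONS
```

With an isolated clone whose `origin` is a `file://` URL, the lookup falls through to nothing on `origin` but still succeeds on `upstream`, which is the case the fix targets.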

---

## PR Pipeline Gates (#317, #343)

- **`pipeline/pr_gates.py`**: `is_ci_passing()`, `is_review_passing()`,
`partition_prs()` — partitions PRs into
eligible/CI-blocked/review-blocked with human-readable reasons
- **`pipeline/fidelity.py`**: `extract_linked_issues()`
(Closes/Fixes/Resolves patterns), `is_valid_fidelity_finding()` schema
validation
- **`check_pr_mergeable`** now returns `mergeable_status` field
alongside boolean
- **`release_issue`** gains `target_branch` + `staged_label` parameters
for staged issue lifecycle on non-default branches (#392)
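
A minimal sketch of the two gate helpers follows. The regex covers the Closes/Fixes/Resolves keywords named above; the PR dictionary shape (`number`, `ci_passing`, `review_passing`) and the reason strings are hypothetical stand-ins for whatever `pipeline/pr_gates.py` actually uses.

```python
import re
from typing import Dict, List, Tuple

_LINK = re.compile(r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE)


def extract_linked_issues(body: str) -> List[int]:
    """Return issue numbers from closing keywords, in first-seen order, deduplicated."""
    seen: List[int] = []
    for match in _LINK.finditer(body):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def partition_prs(prs: List[dict]) -> Dict[str, List[Tuple[int, str]]]:
    """Split PRs into eligible, CI-blocked, and review-blocked, each with a reason."""
    result: Dict[str, List[Tuple[int, str]]] = {
        "eligible": [], "ci_blocked": [], "review_blocked": [],
    }
    for pr in prs:
        if not pr["ci_passing"]:
            result["ci_blocked"].append((pr["number"], "CI checks are not passing"))
        elif not pr["review_passing"]:
            result["review_blocked"].append((pr["number"], "review has not passed"))
        else:
            result["eligible"].append((pr["number"], "CI and review passing"))
    return result
```

CI is checked before review in this sketch, so a PR failing both is reported once, as CI-blocked.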

---

## Recipe System Changes

### Structural
- `RecipeIngredient.hidden` field — excluded from ingredients table
(used for internal flags like `sprint_mode`)
- `Recipe.experimental` flag parsed from YAML
- `_TERMINAL_TARGETS` moved to `schema.py` as single source of truth
- `format_ingredients_table()` with sorted display order (required →
auto-detect → flags → optional → constants)
- Diagram rendering engine (~670 lines) removed from `diagrams.py` —
rendering now handled by `/render-recipe` skill; format version bumped
to v7

### Recipe YAML Changes
- **Deleted**: `audit-and-fix.yaml`, `batch-implementation.yaml`,
`bugfix-loop.yaml`
- **Renamed**: `pr-merge-pipeline.yaml` → `merge-prs.yaml`
- **`implementation.yaml`**: merge queue steps,
`auto_merge`/`sprint_mode` ingredients, `base_branch` default → `""`
(auto-detect), CI workflow filter, `extract_pr_number` step
- **`remediation.yaml`**: `topic` → `task` rename, merge queue steps,
`dry_walkthrough` retries:3 with forward-only routing, `verify` → `test`
rename
- **`merge-prs.yaml`**: full queue-mode path, `open-integration-pr` step
(replaces `create-review-pr`), post-PR mergeability polling, review
cycle with `resolve-review` retries

### New Semantic Rules
- `missing-output-patterns` (WARNING) — flags `run_skill` steps without
`expected_output_patterns`
- `unknown-sub-recipe` (ERROR) — validates sub-recipe references exist
- `circular-sub-recipe` (ERROR) — DFS cycle detection
- `unknown-skill-command` (ERROR) — validates skill names against
bundled set
- `telemetry-before-open-pr` (WARNING) — ensures telemetry step precedes
`open-pr`

---

## New Skills (24)

### Architecture Lens Family (13)
`arch-lens-c4-container`, `arch-lens-concurrency`,
`arch-lens-data-lineage`, `arch-lens-deployment`,
`arch-lens-development`, `arch-lens-error-resilience`,
`arch-lens-module-dependency`, `arch-lens-operational`,
`arch-lens-process-flow`, `arch-lens-repository-access`,
`arch-lens-scenarios`, `arch-lens-security`, `arch-lens-state-lifecycle`

### Audit Family (5)
`audit-arch`, `audit-bugs`, `audit-cohesion`, `audit-defense-standards`,
`audit-tests`

### Planning & Diagramming (3)
`elaborate-phase`, `make-arch-diag`, `make-req`

### Bug/Guard Lifecycle (2)
`design-guards`, `verify-diag`

### Pipeline (1)
`open-integration-pr` — creates integration PRs with per-PR details,
arch-lens diagrams, carried-forward `Closes #N` references, and
auto-closes collapsed PRs

### Sprint Planning (1 — gated by sub-recipe)
`sprint-planner` — selects a focused, conflict-free sprint from a triage
manifest

---

## Skill Modifications (Highlights)

- **`analyze-prs`**: merge queue detection, CI/review eligibility
filtering, queue-mode ordering
- **`dry-walkthrough`**: Step 4.5 Historical Regression Check (git
history mining + GitHub issue cross-reference)
- **`review-pr`**: deterministic diff annotation via
`diff_annotator.py`, echo-primary-obligation step, post-completion
confirmation, degraded-mode narration
- **`collapse-issues`**: content fidelity enforcement — per-issue
`fetch_github_issue` calls, copy-mode body assembly (#388)
- **`prepare-issue`**: multi-keyword dedup search, numbered candidate
selection, extend-existing-issue flow
- **`resolve-review`**: GraphQL thread auto-resolution after addressing
findings (#379)
- **`resolve-merge-conflicts`**: conflict resolution decision report
with per-file log (#389)
- **Cross-skill**: output tokens migrated to `key = value` format;
code-index paths made generic with fallback notes; arch-lens references
fully qualified; anti-prose guards at loop boundaries

---

## CLI & Hooks

### New CLI Commands
- `autoskillit install` — plugin installation + cache refresh
- `autoskillit upgrade` — `.autoskillit/scripts/` →
`.autoskillit/recipes/` migration

### CLI Changes
- `doctor`: plugin-aware MCP check, PostToolUse hook scanning, `--fix`
flag removed
- `init`: GitHub repo prompt, `.secrets.yaml` template, plugin-aware
registration
- `chefs-hat`: pre-launch banner, `--dangerously-skip-permissions`
confirmation
- `recipes render`: repurposed from generator to viewer (delegates to
`/render-recipe`)
- `serve`: server import deferred to after `configure_logging()` to
prevent stdout corruption

### New Hooks
- `branch_protection_guard.py` (PreToolUse) — denies
`merge_worktree`/`push_to_remote` targeting protected branches
- `headless_orchestration_guard.py` (PreToolUse) — blocks orchestration
tools in headless sessions
- `pretty_output.py` (PostToolUse) — MCP JSON → Markdown-KV reformatter

### Hook Infrastructure
- `HookDef.event_type` field — registry now handles both PreToolUse and
PostToolUse
- `generate_hooks_json()` groups entries by event type
- `_evict_stale_autoskillit_hooks` and `sync_hooks_to_settings` made
event-type-agnostic
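
The grouping described above could look roughly like the following. Treat it as a sketch under assumptions: only the `event_type` field and the `generate_hooks_json()` name come from this PR, while the other `HookDef` fields, the `python3` command prefix, and the output shape (modeled loosely on a settings-style `hooks` mapping) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HookDef:
    """Hypothetical stand-in for a registry entry; only event_type is from the PR."""

    script: str
    event_type: str  # "PreToolUse" or "PostToolUse"
    matcher: str = ".*"  # assumed field: which tool names the hook applies to


def generate_hooks_json(registry: list[HookDef]) -> dict:
    """Group registry entries by event type into one mapping."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for hook in registry:
        grouped[hook.event_type].append(
            {
                "matcher": hook.matcher,
                "hooks": [{"type": "command", "command": f"python3 {hook.script}"}],
            }
        )
    return {"hooks": dict(grouped)}


registry = [
    HookDef("branch_protection_guard.py", "PreToolUse"),
    HookDef("headless_orchestration_guard.py", "PreToolUse"),
    HookDef("pretty_output.py", "PostToolUse"),
]
result = generate_hooks_json(registry)
print({event: len(entries) for event, entries in result["hooks"].items()})
```

Keying on `event_type` rather than hardcoding `PreToolUse` means a new event kind needs only a new registry entry, not a change to the generator.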

---

## Core & Config

### New Core Modules
- `core/branch_guard.py` — `is_protected_branch()` pure function
- `core/github_url.py` — `parse_github_repo()` +
`normalize_owner_repo()` canonical parsers
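
As a rough picture of what these helpers might do, here is a minimal sketch. The function names come from the list above, but the signatures, the `refs/heads/` handling, the accepted URL forms, and the return types are assumptions made for illustration.

```python
import re

# Default list mirrors the `safety.protected_branches` config value in this PR.
DEFAULT_PROTECTED = ("main", "integration", "stable")

# Assumed URL forms: https, scp-style SSH, ssh://, or a bare "owner/repo".
_GITHUB_RE = re.compile(
    r"^(?:https?://github\.com/|git@github\.com:|ssh://git@github\.com/)?"
    r"(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?)(?:\.git)?/?$"
)


def is_protected_branch(branch: str, protected=DEFAULT_PROTECTED) -> bool:
    """Pure check: no git calls and no I/O, just a name comparison."""
    name = branch.strip()
    # Assumption: callers may pass a full ref such as "refs/heads/main".
    if name.startswith("refs/heads/"):
        name = name[len("refs/heads/"):]
    return name in protected


def parse_github_repo(remote: str):
    """Return (owner, repo) for a recognized GitHub remote, else None."""
    match = _GITHUB_RE.match(remote.strip())
    return (match["owner"], match["repo"]) if match else None


def normalize_owner_repo(remote: str):
    """Return the canonical "owner/repo" string, or None if unparseable."""
    parsed = parse_github_repo(remote)
    return f"{parsed[0]}/{parsed[1]}" if parsed else None


print(is_protected_branch("refs/heads/stable"))
print(normalize_owner_repo("git@github.com:octo/demo.git"))
```

Keeping `is_protected_branch()` pure lets a PreToolUse guard and the server-side tools share one definition of "protected" and test it without a git repository.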

### Core Type Expansions
- `AUTOSKILLIT_PRIVATE_ENV_VARS` frozenset
- `WORKER_TOOLS` / `HEADLESS_BLOCKED_UNGATED_TOOLS` split from
`UNGATED_TOOLS`
- `TOOL_CATEGORIES` — categorized listing for `open_kitchen` response
- `CIRunScope` — immutable scope for CI watcher calls
- `MergeQueueWatcher` protocol
- `SkillResult.cli_subtype` + `write_path_warnings` fields
- `SubprocessRunner.env` parameter

### Config
- `safety.protected_branches`: `[main, integration, stable]`
- `github.staged_label`: `"staged"`
- `ci.workflow`: workflow filename filter (e.g., `"tests.yml"`)
- `branching.default_base_branch`: `"integration"` → `"main"`
- `ModelConfig.default`: `str | None` → `str = "sonnet"`

---

## Infrastructure & Release

### Version
- `0.2.0` → `0.3.1` across `pyproject.toml`, `plugin.json`, `uv.lock`
- FastMCP dependency: `>=3.0.2` → `>=3.1.1,<4.0` (#399)

### CI/CD Workflows
- **`version-bump.yml`** (new) — auto patch-bumps `main` on integration
PR merge, force-syncs integration branch one patch ahead
- **`release.yml`** (new) — minor version bump + GitHub Release on merge
to `stable`
- **`codeql.yml`** (new) — CodeQL analysis for `stable` PRs (Python +
Actions)
- **`tests.yml`** — `merge_group:` trigger added; multi-OS now only for
`stable`
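
The version arithmetic implied by `version-bump.yml` and `release.yml` can be sketched as follows. The helper names are hypothetical, and resetting the patch number to 0 on a minor bump is an assumption borrowed from common semantic-versioning practice rather than something this PR states.

```python
def _parts(version: str) -> tuple[int, int, int]:
    """Split "MAJOR.MINOR.PATCH" into integers."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    return major, minor, patch


def bump_patch(version: str) -> str:
    major, minor, patch = _parts(version)
    return f"{major}.{minor}.{patch + 1}"


def bump_minor(version: str) -> str:
    major, minor, _ = _parts(version)
    # Assumption: a minor bump resets the patch number.
    return f"{major}.{minor + 1}.0"


def after_integration_merge(main_version: str) -> tuple[str, str]:
    """Return (new main version, new integration version).

    main gets a patch bump; integration is synced one patch ahead of main.
    """
    new_main = bump_patch(main_version)
    return new_main, bump_patch(new_main)


print(after_integration_merge("0.3.0"))
print(bump_minor("0.3.1"))
```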

### PyPI Readiness
- `pyproject.toml`: `readme`, `license`, `authors`, `keywords`,
`classifiers`, `project.urls`, `hatch.build.targets.sdist` inclusion
list

### readOnlyHint Parallel Execution Fix
- All MCP tools except one are annotated `readOnlyHint=True`, which
enables Claude Code parallel tool execution (~7x speedup). The
deliberate exception is `wait_for_merge_queue`, which uses
`readOnlyHint=False` because it actually mutates queue state

### Tool Response Exception Boundary
- `track_response_size` decorator catches unhandled exceptions and
serializes them as `{"success": false, "subtype": "tool_exception"}` —
prevents FastMCP opaque error wrapping
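
The exception boundary can be pictured with a small decorator like the one below. It is a sketch only: size tracking is left out, the wrapped tool is synchronous for brevity (the real tools may well be async), and the extra `error` field is an assumption beyond the two keys quoted above.

```python
import functools
import json


def track_response_size(func):
    """Sketch of the exception boundary only; response-size tracking is omitted."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Boundary: serialize the failure instead of letting it propagate
            # to the framework, so the caller always receives structured JSON.
            return json.dumps(
                {
                    "success": False,
                    "subtype": "tool_exception",
                    "error": f"{type(exc).__name__}: {exc}",  # assumed field
                }
            )

    return wrapper


@track_response_size
def flaky_tool(path: str) -> str:
    """Hypothetical tool that always fails, used to exercise the boundary."""
    raise FileNotFoundError(path)


payload = json.loads(flaky_tool("missing.txt"))
print(payload["subtype"])
```

Returning a structured payload keeps the failure in the same channel as normal results, so a recipe can branch on `subtype` rather than parse an opaque framework error.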

### SkillResult Subtype Normalization (#358)
- `_normalize_subtype()` gate eliminates dual-source contradiction
between CLI subtype and session outcome
- Class 2 upward: `SUCCEEDED + error_subtype → "success"` (drain-race
artifact)
- Class 1 downward: `non-SUCCEEDED + "success" → "empty_result"` /
`"missing_completion_marker"` / `"adjudicated_failure"`

---

## Test Coverage

**47 new test files** (+12,703 lines) covering:

| Area | Key Tests |
|------|-----------|
| Merge queue watcher state machine | `test_merge_queue.py` (226 lines) |
| Clone isolation × CI resolution | `test_clone_ci_contract.py`, `test_remote_resolver.py` |
| PostToolUse hook | `test_pretty_output.py` (1,516 lines, 40+ cases) |
| Branch protection + headless guards | `test_branch_protection_guard.py`, `test_headless_orchestration_guard.py` |
| Sub-recipe composition | 5 test files (schema, loading, validation, sprint mode × 2) |
| Telemetry formatter | `test_telemetry_formatter.py` (281 lines) |
| PR pipeline gates | `test_analyze_prs_gates.py`, `test_review_pr_fidelity.py` |
| Diff annotator | `test_diff_annotator.py` (242 lines) |
| Skill compliance | Output token format, genericization, loop-boundary guards |
| Release workflows | Structural contracts for `version-bump.yml`, `release.yml` |
| Issue content fidelity | Body-assembling skills must call `fetch_github_issue` per-issue |
| CI watcher scope | `test_ci_params.py`: `workflow_id` query param composition |

---

## Consolidated PRs

#293, #295, #314, #315, #316, #317, #318, #319, #323, #332, #336, #337,
#338, #339, #341, #343, #351, #358, #359, #360, #361, #362, #363, #366,
#368, #370, #375, #377, #378, #379, #380, #388, #389, #390, #391, #392,
#393, #395, #396, #397, #399, #405, #406

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>