Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307)#337
Merged
Trecek merged 1 commit intointegrationfrom Mar 11, 2026
Conversation
Replace hardcoded `task test-all` references in Step 4 of dry-walkthrough SKILL.md with config-driven references to `test_check.command` from .autoskillit/config.yaml. Add two contract tests that enforce the new behaviour and fail against the old hardcoded wording. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek
commented
Mar 11, 2026
Collaborator
Author
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit review passed. No blocking issues found. (Self-review — cannot approve own PR)
Trecek
added a commit
that referenced
this pull request
Mar 12, 2026
…333, #342 into integration (#351) ## Integration Summary Collapsed 9 PRs into `pr-batch/pr-merge-20260311-133920` targeting `integration`. ## Merged PRs | # | Title | Complexity | Additions | Deletions | Overlaps | |---|-------|-----------|-----------|-----------|---------| | #337 | Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307) | simple | +29 | -2 | — | | #339 | Implementation Plan: Release CI — Force-Push Integration Back-Sync | simple | +88 | -45 | — | | #336 | Enhance prepare-issue with Duplicate Detection and Broader Triggers | needs_check | +161 | -8 | — | | #332 | Rectify: Display Output Bugs #329 — Terminal Targets Consolidation — PART A ONLY | needs_check | +783 | -13 | — | | #338 | Implementation Plan: Pre-release Readiness — Stability Fixes | needs_check | +238 | -36 | — | | #343 | Implementation Plan: PR Pipeline Gates — Mergeability Gate and Review Cycle | needs_check | +384 | -5 | #338 | | #341 | Pipeline observability: quota events, wall-clock timing, drift fix | needs_check | +480 | -5 | #332, #338 | | #333 | Remove run_recipe — Eliminate Sub-Orchestrator Pattern | needs_check | +538 | -655 | #332, #338, #341 | | #342 | feat: genericize codebase and bundle external dependencies for public release | needs_check | +5286 | -1062 | #332, #333, #338, #341, #343 | ## Audit **Verdict:** GO ## Architecture Impact ### Development Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; subgraph SourceTree ["PROJECT STRUCTURE (● = modified)"] direction TB SRC["● src/autoskillit/<br/>━━━━━━━━━━<br/>105 .py source files<br/>cli · config · core<br/>execution · hooks · pipeline<br/>recipe · server · workspace"] SKILLS["● + ★ src/autoskillit/skills/<br/>━━━━━━━━━━<br/>52 bundled skills<br/>★ 13 arch-lens-* SKILL.md added<br/>★ 3 audit-* SKILL.md added<br/>● 14 existing skills updated"] RECIPES["● src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>8 bundled YAML recipes<br/>All recipes updated"] TESTS["● + ★ tests/<br/>━━━━━━━━━━<br/>173 .py test files<br/>★ 6 new test files added"] end subgraph Build ["BUILD TOOLING"] direction TB PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>uv package manager<br/>10 runtime deps"] TASKFILE["Taskfile.yml<br/>━━━━━━━━━━<br/>test-all · test-check<br/>test-smoke · install-worktree"] end subgraph Quality ["CODE QUALITY GATES"] direction TB RFMT["ruff-format<br/>━━━━━━━━━━<br/>Auto-fix formatting"] RLINT["ruff<br/>━━━━━━━━━━<br/>Lint + auto-fix"] MYPY["mypy src/<br/>━━━━━━━━━━<br/>--ignore-missing-imports"] UVLOCK["uv lock --check<br/>━━━━━━━━━━<br/>Lock file integrity"] SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning"] GUARD["★ headless_orchestration_guard.py<br/>━━━━━━━━━━<br/>★ PreToolUse hook<br/>Blocks run_skill/run_cmd/run_python<br/>from headless sessions"] end subgraph Testing ["TEST FRAMEWORK"] direction TB PYTEST["pytest + asyncio_mode=auto<br/>━━━━━━━━━━<br/>xdist -n 4 parallel<br/>timeout=60s signal method"] NEWTEST["★ New Test Files<br/>━━━━━━━━━━<br/>★ test_headless_orchestration_guard<br/>★ test_audit_and_fix_degradation<br/>★ test_rules_inputs<br/>★ test_skill_genericization<br/>★ test_pyproject_metadata<br/>★ test_release_sanity"] end subgraph CI ["CI/CD WORKFLOWS"] direction LR TESTS_WF["tests.yml<br/>━━━━━━━━━━<br/>PR test gate"] RELEASE_WF["release.yml<br/>━━━━━━━━━━<br/>Release automation"] BUMP_WF["● version-bump.yml<br/>━━━━━━━━━━<br/>● Force-push back-sync<br/>integration → main"] end subgraph EntryPoints ["ENTRY POINTS"] EP["autoskillit CLI<br/>━━━━━━━━━━<br/>serve · init · skills<br/>recipes · doctor · workspace"] end SRC --> PYPROJECT SKILLS --> PYPROJECT TESTS --> PYTEST PYPROJECT --> TASKFILE PYPROJECT --> RFMT RFMT --> RLINT RLINT --> MYPY MYPY --> UVLOCK UVLOCK --> SECRETS SECRETS --> GUARD GUARD --> PYTEST PYTEST --> NEWTEST NEWTEST --> BUMP_WF TESTS_WF --> PYTEST PYPROJECT --> EP class SRC,TESTS stateNode; class SKILLS,RECIPES newComponent; class PYPROJECT,TASKFILE phase; class RFMT,RLINT,MYPY,UVLOCK,SECRETS detector; class GUARD newComponent; class PYTEST handler; class NEWTEST newComponent; class TESTS_WF,RELEASE_WF phase; class BUMP_WF newComponent; class EP output; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Teal | Structure | Source directories and test suite | | Green (★) | New/Modified | New files and components added in this PR | | Purple | Build | Build configuration and task automation | | Red | Quality Gates | Pre-commit hooks, linters, type checker | | Orange | Test Runner | pytest execution engine | | Dark Teal | Entry Points | CLI commands | ### Module Dependency Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%% graph TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff; subgraph L0 ["L0 — CORE (zero autoskillit imports)"] direction LR TYPES["● core/types.py<br/>━━━━━━━━━━<br/>GATED_TOOLS · UNGATED_TOOLS<br/>RecipeSource (★ promoted here)<br/>ClaudeFlags · StrEnums<br/>fan-in: ~75 files"] COREIO["core/io.py · logging.py · paths.py<br/>━━━━━━━━━━<br/>Atomic write · Logger · pkg_root()"] end subgraph L1P ["L1 — PIPELINE (imports L0 only)"] direction TB GATE["● pipeline/gate.py<br/>━━━━━━━━━━<br/>DefaultGateState<br/>gate_error_result()<br/>★ headless_error_result()<br/>re-exports GATED/UNGATED_TOOLS"] PIPEINIT["● pipeline/__init__.py<br/>━━━━━━━━━━<br/>Re-exports public surface<br/>ToolContext · AuditLog<br/>TokenLog · DefaultGateState"] end subgraph L1E ["L1 — EXECUTION (imports L0 only)"] direction TB HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>Headless Claude sessions<br/>Imports core types via TYPE_CHECKING<br/>for ToolContext (no runtime cycle)"] COMMANDS["● execution/commands.py<br/>━━━━━━━━━━<br/>ClaudeHeadlessCmd builder"] SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>Session diagnostics writer"] end subgraph L2 ["L2 — RECIPE (imports L0+L1)"] direction TB SCHEMA["● recipe/schema.py<br/>━━━━━━━━━━<br/>Recipe · RecipeStep · DataFlowWarning<br/>RecipeSource (now from L0)"] RULES["● recipe/rules_inputs.py<br/>━━━━━━━━━━<br/>★ Ingredient validation rules<br/>reads GATED_TOOLS from L0 via<br/>pipeline re-export"] ANALYSIS["● recipe/_analysis.py<br/>━━━━━━━━━━<br/>Step graph builder"] VALIDATOR["● recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe()"] end subgraph L3S ["L3 — SERVER (imports all layers)"] direction TB HELPERS["● server/helpers.py<br/>━━━━━━━━━━<br/>_require_enabled() — reads gate<br/>★ _require_not_headless()<br/>Shared by all tool handlers"] TOOLS_EX["● server/tools_execution.py<br/>━━━━━━━━━━<br/>run_cmd · run_python · run_skill<br/>✗ run_recipe REMOVED<br/>Uses _require_not_headless()"] TOOLS_GIT["● server/tools_git.py<br/>━━━━━━━━━━<br/>merge_worktree · classify_fix<br/>● check_pr_mergeable (new gate)"] TOOLS_K["● server/tools_kitchen.py<br/>━━━━━━━━━━<br/>open_kitchen · close_kitchen"] FACTORY["● server/_factory.py<br/>━━━━━━━━━━<br/>Composition root<br/>Wires ToolContext"] end subgraph L3H ["L3 — HOOKS (stdlib only for guard)"] direction LR HOOK_GUARD["★ hooks/headless_orchestration_guard.py<br/>━━━━━━━━━━<br/>★ PreToolUse hook (stdlib only)<br/>Blocks run_skill/run_cmd/run_python<br/>from AUTOSKILLIT_HEADLESS=1 sessions<br/>NO autoskillit imports"] PRETTY["● hooks/pretty_output.py<br/>━━━━━━━━━━<br/>PostToolUse response formatter"] end subgraph L3C ["L3 — CLI (imports all layers)"] direction LR CLI_APP["● cli/app.py<br/>━━━━━━━━━━<br/>serve · init · skills · recipes<br/>doctor · workspace"] CLI_PROMPTS["● cli/_prompts.py<br/>━━━━━━━━━━<br/>Orchestrator prompt builder"] end TYPES -->|"fan-in ~75"| GATE TYPES -->|"fan-in ~75"| HEADLESS TYPES -->|"fan-in ~75"| SCHEMA COREIO --> PIPEINIT GATE --> PIPEINIT PIPEINIT -->|"gate_error_result<br/>headless_error_result"| HELPERS HEADLESS --> HELPERS COMMANDS --> HEADLESS SESSION_LOG --> HELPERS SCHEMA -->|"RecipeSource from L0"| RULES RULES --> VALIDATOR ANALYSIS --> VALIDATOR HELPERS -->|"_require_not_headless"| TOOLS_EX HELPERS --> TOOLS_GIT HELPERS --> TOOLS_K VALIDATOR --> FACTORY PIPEINIT --> FACTORY FACTORY --> CLI_APP FACTORY --> CLI_PROMPTS HOOK_GUARD -.->|"ENV: AUTOSKILLIT_HEADLESS<br/>zero autoskillit imports"| TOOLS_EX class TYPES,COREIO stateNode; class GATE,PIPEINIT phase; class HEADLESS,COMMANDS,SESSION_LOG handler; class SCHEMA,RULES,ANALYSIS,VALIDATOR phase; class HELPERS,TOOLS_EX,TOOLS_GIT,TOOLS_K handler; class FACTORY cli; class CLI_APP,CLI_PROMPTS cli; class HOOK_GUARD newComponent; class PRETTY handler; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Teal | L0 Core | High fan-in foundation types (zero reverse deps) | | Purple | L1/L2 Control | Pipeline gate, recipe schema and rules | | Orange | L1/L3 Processors | Execution handlers, server tool handlers | | Dark Blue | L3 CLI | Composition root and CLI entry points | | Green (★) | New Components | headless_orchestration_guard — standalone hook | | Dashed | ENV Signal | OS-level check; no code import relationship | Closes #307 Closes #327 Closes #308 Closes #329 Closes #304 Closes #328 Closes #302 Closes #330 Closes #311 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit --- ## Merge Conflict Resolution The batch branch was rebased onto `integration` to resolve 17 file conflicts. All conflicts arose because PRs #337–#341 were squash-merged into both `integration` (directly) and the batch branch (via the pipeline), while PRs #333 and #342 required conflict resolution work that only exists on the batch branch. **Resolution principle:** Batch branch version wins for all files touched by #333/#342 conflict resolution and remediation, since that state was fully tested (3752 passed). Integration-only additions (e.g. `TestGetQuotaEvents`) were preserved where they don't overlap. ### Per-file decisions | File | Decision | Rationale | |------|----------|-----------| | `CLAUDE.md` | **Batch wins** | Batch has corrected tool inventory (run_recipe removed, get_quota_events added, 25 kitchen tools) | | `core/types.py` | **Batch wins** | Batch splits monolithic UNGATED_TOOLS into WORKER_TOOLS + HEADLESS_BLOCKED_UNGATED_TOOLS; removes run_recipe from GATED_TOOLS | | `execution/__init__.py` | **Batch wins** | Batch removes dead exports (build_subrecipe_cmd, run_subrecipe_session) | | `execution/headless.py` | **Batch wins** | Batch deletes run_subrecipe_session function (530+ lines); keeps run_headless_core with token_log error handling | | `hooks/pretty_output.py` | **Batch wins** | Batch removes run_recipe from _UNFORMATTED_TOOLS, adds get_quota_events | | `recipes/pr-merge-pipeline.yaml` | **Batch wins** | Batch has base_branch required:true, updated kitchen rules (main instead of integration) | | `server/_state.py` | **Batch wins** | Batch adds .telemetry_cleared_at marker reading in _initialize | | `server/helpers.py` | **Batch wins** | Batch removes _run_subrecipe and run_subrecipe_session import; adds _require_not_headless | | `server/tools_git.py` | **Batch wins** | Batch has updated classify_fix with git fetch and check_pr_mergeable gate | | `server/tools_kitchen.py` | **Batch wins** | Batch adds headless gates to open_kitchen/close_kitchen; adds TOOL_CATEGORIES listing | | `server/tools_status.py` | **Merge both** | Batch headless gates + wall_clock_seconds merged with integration's TestGetQuotaEvents (deduplicated) | | `tests/conftest.py` | **Batch wins** | Batch replaces AUTOSKILLIT_KITCHEN_OPEN with AUTOSKILLIT_HEADLESS in fixture | | `tests/execution/test_headless.py` | **Batch wins** | Batch removes run_subrecipe_session tests (deleted code); updates docstring | | `tests/recipe/test_bundled_recipes.py` | **Merge both** | Batch base_branch=main assertions + integration WF7 graph test both kept | | `tests/server/test_tools_kitchen.py` | **Batch wins** | Batch adds headless gate denial tests for open/close kitchen | | `tests/server/test_tools_status.py` | **Merge both** | Batch headless gate tests merged with integration quota events tests | ### Post-rebase fixes - Removed duplicate `TestGetQuotaEvents` class (existed in both batch commit and auto-merged integration code) - Fixed stale `_build_tool_listing` → `_build_tool_category_listing` attribute reference - Added `if diagram: print(diagram)` to `cli/app.py` cook function (test expected terminal output) ### Verification - **3752 passed**, 23 skipped, 0 failures - 7 architecture contracts kept, 0 broken - Pre-commit hooks all pass --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek
added a commit
that referenced
this pull request
Mar 15, 2026
…, Headless Isolation (#404) ## Summary Integration rollup of **43 PRs** (#293–#406) consolidating **62 commits** across **291 files** (+27,909 / −6,040 lines). This release advances AutoSkillit from v0.2.0 to v0.3.1 with GitHub merge queue integration, sub-recipe composition, a PostToolUse output reformatter, headless session isolation guards, and comprehensive pipeline observability — plus 24 new bundled skills, 3 new MCP tools, and 47 new test files. --- ## Major Features ### GitHub Merge Queue Integration (#370, #362, #390) - New `wait_for_merge_queue` MCP tool — polls a PR through GitHub's merge queue until merged, ejected, or timed out (default 600s). Uses REST + GraphQL APIs with stuck-queue detection and auto-merge re-enrollment - New `DefaultMergeQueueWatcher` L1 service (`execution/merge_queue.py`) — never raises; all outcomes are structured results - `parse_merge_queue_response()` pure function for GraphQL queue entry parsing - New `auto_merge` ingredient in `implementation.yaml` and `remediation.yaml` — enrolls PRs in the merge queue after CI passes - Full queue-mode path added to `merge-prs.yaml`: detect queue → enqueue → wait → handle ejections → re-enter - `analyze-prs` skill gains Step 0.5 (merge queue detection) and Step 1.5 (CI/review eligibility filtering) ### Sub-Recipe Composition (#380) - Recipe steps can now reference sub-recipes via `sub_recipe` + `gate` fields — lazy-loaded and merged at validation time - Composition engine in `recipe/_api.py`: `_merge_sub_recipe()` inlines sub-recipe steps with safe name-prefixing and route remapping (`done` → parent's `on_success`, `escalate` → parent's `on_failure`) - `_build_active_recipe()` evaluates gate ingredients against overrides/defaults; dual validation runs on both active and combined recipes - First sub-recipe: `sprint-prefix.yaml` — triage → plan → confirm → dispatch workflow, gated by `sprint_mode` ingredient (hidden, default false) - Both `implementation.yaml` and `remediation.yaml` gain `sprint_entry` placeholder step - New semantic rules: `unknown-sub-recipe` (ERROR), `circular-sub-recipe` (ERROR) with DFS cycle detection ### PostToolUse Output Reformatter (#293, #405) - `pretty_output.py` — new 671-line PostToolUse hook that rewrites raw MCP JSON responses to Markdown-KV before Claude consumes them (30–77% token overhead reduction) - Dedicated formatters for 11 high-traffic tools (`run_skill`, `run_cmd`, `test_check`, `merge_worktree`, `get_token_summary`, etc.) plus a generic KV formatter for remaining tools - Pipeline vs. interactive mode detection via hook config file - Unwraps Claude Code's `{"result": "<json-string>"}` envelope before dispatching - 1,516-line test file with 40+ behavioral tests ### Headless Session Isolation (#359, #393, #397, #405, #406) - **Env isolation**: `build_sanitized_env()` strips `AUTOSKILLIT_PRIVATE_ENV_VARS` from subprocess environments, preventing `AUTOSKILLIT_HEADLESS=1` from leaking into test runners - **CWD path contamination defense**: `_inject_cwd_anchor()` anchors all relative paths to session CWD; `_validate_output_paths()` checks structured output tokens against CWD prefix; `_scan_jsonl_write_paths()` post-session scanner catches actual Write/Edit/Bash tool calls outside CWD - **Headless orchestration guard**: new PreToolUse hook blocks `run_skill`/`run_cmd`/`run_python` when `AUTOSKILLIT_HEADLESS=1`, enforcing Tier 1/Tier 2 nesting invariant - **`_require_not_headless()` server-side guard**: blocks 10 orchestration-only tools from headless sessions at the handler layer - **Unified error response contract**: `headless_error_result()` produces consistent 9-field responses; `_build_headless_error_response()` canonical builder for all failure paths in `tools_integrations.py` ### Cook UX Overhaul (#375, #363) - `open_kitchen` now accepts optional `name` + `overrides` — opens kitchen AND loads recipe in a single call - Pre-launch terminal preview with ANSI-colored flow diagram and ingredients table via new `cli/_ansi.py` module - `--dangerously-skip-permissions` warning banner with interactive confirmation prompt - Randomized session greetings from themed pools - Orchestrator prompt rewritten: recipe YAML no longer injected via `--append-system-prompt`; session calls `open_kitchen('{recipe_name}')` as first action - Conversational ingredient collection replaces mechanical per-field prompting --- ## New MCP Tools | Tool | Gate | Description | |------|------|-------------| | `wait_for_merge_queue` | Kitchen | Polls PR through GitHub merge queue (REST + GraphQL) | | `set_commit_status` | Kitchen | Posts GitHub Commit Status to a SHA for review-first gating | | `get_quota_events` | Ungated | Surfaces quota guard decisions from `quota_events.jsonl` | --- ## Pipeline Observability (#318, #341) - **`TelemetryFormatter`** (`pipeline/telemetry_fmt.py`) — single source of truth for all telemetry rendering; replaces dual-formatter anti-pattern. Four rendering modes: Markdown table, terminal table, compact KV (for PostToolUse hook) - `get_token_summary` and `get_timing_summary` gain `format` parameter (`"json"` | `"table"`) - `wall_clock_seconds` merged into token summary output — see duration alongside token counts in one call - **Telemetry clear marker**: `write_telemetry_clear_marker()` / `read_telemetry_clear_marker()` prevent token accounting drift on MCP server restart after `clear=True` - **Quota event logging**: `quota_check.py` hook now writes structured JSONL events (`cache_miss`, `parse_error`, `blocked`, `approved`) to `quota_events.jsonl` --- ## CI Watcher & Remote Resolution Fixes (#395, #406) - **`CIRunScope` value object** — carries `workflow` + `head_sha` scope; replaces bare `head_sha` parameter across all CI watcher signatures - **Workflow filter**: `wait_for_ci` and `get_ci_status` accept `workflow` parameter (falls back to project-level `config.ci.workflow`), preventing unrelated workflows (version bumps, labelers) from satisfying CI checks - **`FAILED_CONCLUSIONS` expanded**: `failure` → `{failure, timed_out, startup_failure, cancelled}` - **Canonical remote resolver** (`execution/remote_resolver.py`): `resolve_remote_repo()` with `REMOTE_PRECEDENCE = (upstream, origin)` — correctly resolves `owner/repo` after `clone_repo` sets `origin` to `file://` isolation URL - **Clone isolation fix**: `clone_repo` now always clones from remote URL (never local path); sets `origin=file:///<clone>` for isolation and `upstream=<real_url>` for push/CI operations --- ## PR Pipeline Gates (#317, #343) - **`pipeline/pr_gates.py`**: `is_ci_passing()`, `is_review_passing()`, `partition_prs()` — partitions PRs into eligible/CI-blocked/review-blocked with human-readable reasons - **`pipeline/fidelity.py`**: `extract_linked_issues()` (Closes/Fixes/Resolves patterns), `is_valid_fidelity_finding()` schema validation - **`check_pr_mergeable`** now returns `mergeable_status` field alongside boolean - **`release_issue`** gains `target_branch` + `staged_label` parameters for staged issue lifecycle on non-default branches (#392) --- ## Recipe System Changes ### Structural - `RecipeIngredient.hidden` field — excluded from ingredients table (used for internal flags like `sprint_mode`) - `Recipe.experimental` flag parsed from YAML - `_TERMINAL_TARGETS` moved to `schema.py` as single source of truth - `format_ingredients_table()` with sorted display order (required → auto-detect → flags → optional → constants) - Diagram rendering engine (~670 lines) removed from `diagrams.py` — rendering now handled by `/render-recipe` skill; format version bumped to v7 ### Recipe YAML Changes - **Deleted**: `audit-and-fix.yaml`, `batch-implementation.yaml`, `bugfix-loop.yaml` - **Renamed**: `pr-merge-pipeline.yaml` → `merge-prs.yaml` - **`implementation.yaml`**: merge queue steps, `auto_merge`/`sprint_mode` ingredients, `base_branch` default → `""` (auto-detect), CI workflow filter, `extract_pr_number` step - **`remediation.yaml`**: `topic` → `task` rename, merge queue steps, `dry_walkthrough` retries:3 with forward-only routing, `verify` → `test` rename - **`merge-prs.yaml`**: full queue-mode path, `open-integration-pr` step (replaces `create-review-pr`), post-PR mergeability polling, review cycle with `resolve-review` retries ### New Semantic Rules - `missing-output-patterns` (WARNING) — flags `run_skill` steps without `expected_output_patterns` - `unknown-sub-recipe` (ERROR) — validates sub-recipe references exist - `circular-sub-recipe` (ERROR) — DFS cycle detection - `unknown-skill-command` (ERROR) — validates skill names against bundled set - `telemetry-before-open-pr` (WARNING) — ensures telemetry step precedes `open-pr` --- ## New Skills (24) ### Architecture Lens Family (13) `arch-lens-c4-container`, `arch-lens-concurrency`, `arch-lens-data-lineage`, `arch-lens-deployment`, `arch-lens-development`, `arch-lens-error-resilience`, `arch-lens-module-dependency`, `arch-lens-operational`, `arch-lens-process-flow`, `arch-lens-repository-access`, `arch-lens-scenarios`, `arch-lens-security`, `arch-lens-state-lifecycle` ### Audit Family (5) `audit-arch`, `audit-bugs`, `audit-cohesion`, `audit-defense-standards`, `audit-tests` ### Planning & Diagramming (3) `elaborate-phase`, `make-arch-diag`, `make-req` ### Bug/Guard Lifecycle (2) `design-guards`, `verify-diag` ### Pipeline (1) `open-integration-pr` — creates integration PRs with per-PR details, arch-lens diagrams, carried-forward `Closes #N` references, and auto-closes collapsed PRs ### Sprint Planning (1 — gated by sub-recipe) `sprint-planner` — selects a focused, conflict-free sprint from a triage manifest --- ## Skill Modifications (Highlights) - **`analyze-prs`**: merge queue detection, CI/review eligibility filtering, queue-mode ordering - **`dry-walkthrough`**: Step 4.5 Historical Regression Check (git history mining + GitHub issue cross-reference) - **`review-pr`**: deterministic diff annotation via `diff_annotator.py`, echo-primary-obligation step, post-completion confirmation, degraded-mode narration - **`collapse-issues`**: content fidelity enforcement — per-issue `fetch_github_issue` calls, copy-mode body assembly (#388) - **`prepare-issue`**: multi-keyword dedup search, numbered candidate selection, extend-existing-issue flow - **`resolve-review`**: GraphQL thread auto-resolution after addressing findings (#379) - **`resolve-merge-conflicts`**: conflict resolution decision report with per-file log (#389) - **Cross-skill**: output tokens migrated to `key = value` format; code-index paths made generic with fallback notes; arch-lens references fully qualified; anti-prose guards at loop boundaries --- ## CLI & Hooks ### New CLI Commands - `autoskillit install` — plugin installation + cache refresh - `autoskillit upgrade` — `.autoskillit/scripts/` → `.autoskillit/recipes/` migration ### CLI Changes - `doctor`: plugin-aware MCP check, PostToolUse hook scanning, `--fix` flag removed - `init`: GitHub repo prompt, `.secrets.yaml` template, plugin-aware registration - `chefs-hat`: pre-launch banner, `--dangerously-skip-permissions` confirmation - `recipes render`: repurposed from generator to viewer (delegates to `/render-recipe`) - `serve`: server import deferred to after `configure_logging()` to prevent stdout corruption ### New Hooks - `branch_protection_guard.py` (PreToolUse) — denies `merge_worktree`/`push_to_remote` targeting protected branches - `headless_orchestration_guard.py` (PreToolUse) — blocks orchestration tools in headless sessions - `pretty_output.py` (PostToolUse) — MCP JSON → Markdown-KV reformatter ### Hook Infrastructure - `HookDef.event_type` field — registry now handles both PreToolUse and PostToolUse - `generate_hooks_json()` groups entries by event type - `_evict_stale_autoskillit_hooks` and `sync_hooks_to_settings` made event-type-agnostic --- ## Core & Config ### New Core Modules - `core/branch_guard.py` — `is_protected_branch()` pure function - `core/github_url.py` — `parse_github_repo()` + `normalize_owner_repo()` canonical parsers ### Core Type Expansions - `AUTOSKILLIT_PRIVATE_ENV_VARS` frozenset - `WORKER_TOOLS` / `HEADLESS_BLOCKED_UNGATED_TOOLS` split from `UNGATED_TOOLS` - `TOOL_CATEGORIES` — categorized listing for `open_kitchen` response - `CIRunScope` — immutable scope for CI watcher calls - `MergeQueueWatcher` protocol - `SkillResult.cli_subtype` + `write_path_warnings` fields - `SubprocessRunner.env` parameter ### Config - `safety.protected_branches`: `[main, integration, stable]` - `github.staged_label`: `"staged"` - `ci.workflow`: workflow filename filter (e.g., `"tests.yml"`) - `branching.default_base_branch`: `"integration"` → `"main"` - `ModelConfig.default`: `str | None` → `str = "sonnet"` --- ## Infrastructure & Release ### Version - `0.2.0` → `0.3.1` across `pyproject.toml`, `plugin.json`, `uv.lock` - FastMCP dependency: `>=3.0.2` → `>=3.1.1,<4.0` (#399) ### CI/CD Workflows - **`version-bump.yml`** (new) — auto patch-bumps `main` on integration PR merge, force-syncs integration branch one patch ahead - **`release.yml`** (new) — minor version bump + GitHub Release on merge to `stable` - **`codeql.yml`** (new) — CodeQL analysis for `stable` PRs (Python + Actions) - **`tests.yml`** — `merge_group:` trigger added; multi-OS now only for `stable` ### PyPI Readiness - `pyproject.toml`: `readme`, `license`, `authors`, `keywords`, `classifiers`, `project.urls`, `hatch.build.targets.sdist` inclusion list ### readOnlyHint Parallel Execution Fix - All MCP tools annotated `readOnlyHint=True` — enables Claude Code parallel tool execution (~7x speedup). One deliberate exception: `wait_for_merge_queue` uses `readOnlyHint=False` (actually mutates queue state) ### Tool Response Exception Boundary - `track_response_size` decorator catches unhandled exceptions and serializes them as `{"success": false, "subtype": "tool_exception"}` — prevents FastMCP opaque error wrapping ### SkillResult Subtype Normalization (#358) - `_normalize_subtype()` gate eliminates dual-source contradiction between CLI subtype and session outcome - Class 2 upward: `SUCCEEDED + error_subtype → "success"` (drain-race artifact) - Class 1 downward: `non-SUCCEEDED + "success" → "empty_result"` / `"missing_completion_marker"` / `"adjudicated_failure"` --- ## Test Coverage **47 new test files** (+12,703 lines) covering: | Area | Key Tests | |------|-----------| | Merge queue watcher state machine | `test_merge_queue.py` (226 lines) | | Clone isolation × CI resolution | `test_clone_ci_contract.py`, `test_remote_resolver.py` | | PostToolUse hook | `test_pretty_output.py` (1,516 lines, 40+ cases) | | Branch protection + headless guards | `test_branch_protection_guard.py`, `test_headless_orchestration_guard.py` | | Sub-recipe composition | 5 test files (schema, loading, validation, sprint mode × 2) | | Telemetry formatter | `test_telemetry_formatter.py` (281 lines) | | PR pipeline gates | `test_analyze_prs_gates.py`, `test_review_pr_fidelity.py` | | Diff annotator | `test_diff_annotator.py` (242 lines) | | Skill compliance | Output token format, genericization, loop-boundary guards | | Release workflows | Structural contracts for `version-bump.yml`, `release.yml` | | Issue content fidelity | Body-assembling skills must call `fetch_github_issue` per-issue | | CI watcher scope | `test_ci_params.py` — workflow_id query param composition | --- ## Consolidated PRs #293, #295, #314, #315, #316, #317, #318, #319, #323, #332, #336, #337, #338, #339, #341, #343, #351, #358, #359, #360, #361, #362, #363, #366, #368, #370, #375, #377, #378, #379, #380, #388, #389, #390, #391, #392, #393, #395, #396, #397, #399, #405, #406 --- 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
dry-walkthroughskill's Step 4.5 (Historical Regression Check) is already fully implemented inSKILL.md— including the git history scan, GitHub issues cross-reference,gh authguard, actionable vs. informational classification, and the### Historical Contextsection in Step 7. Contract tests for Step 4.5 exist and pass intests/skills/test_dry_walkthrough_contracts.py.The remaining work is REQ-GEN-002 from Issue #311 (consolidated here): replace the hardcoded
task test-allreferences in Step 4 ofSKILL.mdwith a config-driven reference totest_check.command, so the skill validates correctly for any project regardless of its test runner.Architecture Impact
Process Flow Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; %% TERMINALS %% START([START]) PASS([PASS — Ready to implement]) REVISED([REVISED — See changes]) subgraph Load ["Step 1: Load & Detect"] direction TB LOAD["Load Plan<br/>━━━━━━━━━━<br/>arg path / pasted / temp/ scan"] MULTIPART{"Multi-part plan?<br/>━━━━━━━━━━<br/>filename contains _part_?"} SCOPE_WARN["Scope Warning<br/>━━━━━━━━━━<br/>Insert boundary block<br/>Restrict to this part only"] end subgraph Validate ["Steps 2–3: Validate Phases"] direction TB PHASE_V["Phase Subagents<br/>━━━━━━━━━━<br/>files exist · fns exist<br/>assumptions correct<br/>wiring complete"] XPHASE["Cross-Phase Deps<br/>━━━━━━━━━━<br/>Phase ordering · implicit deps<br/>reorder safety"] end subgraph Rules ["● Step 4: Validate Against Project Rules"] direction TB RULE_CHK["Rule Checklist<br/>━━━━━━━━━━<br/>No compat code · no hidden fallbacks<br/>no stakeholder sections · arch patterns"] READ_CFG["● Read test_check.command<br/>━━━━━━━━━━<br/>.autoskillit/config.yaml<br/>default: task test-check"] TEST_CMD{"● Non-config test<br/>command in plan?<br/>━━━━━━━━━━<br/>pytest / python -m pytest<br/>make test etc."} WORKTREE{"Hardcoded worktree<br/>setup in plan?<br/>━━━━━━━━━━<br/>uv venv / pip install etc."} end subgraph History ["Step 4.5: Historical Regression Check"] direction TB GH_AUTH{"gh auth status<br/>━━━━━━━━━━<br/>GitHub auth available?"} GIT_SCAN["Git History Scan<br/>━━━━━━━━━━<br/>git log -100 on PLAN_FILES<br/>fix/revert/remove/replace/delete"] GIT_SIG{"Signal strength?<br/>━━━━━━━━━━<br/>Symbol-level match?"} GH_ISSUES["GitHub Issues XRef<br/>━━━━━━━━━━<br/>open + closed (last 30d)<br/>keyword cross-reference"] GH_MATCH{"Issue type?<br/>━━━━━━━━━━<br/>open vs closed match"} end subgraph Fix ["Steps 5–6: Fix & Mark"] direction TB FIX["Fix the Plan<br/>━━━━━━━━━━<br/>Direct edits to plan file<br/>No gap-analysis sections"] MARK["Mark Verified<br/>━━━━━━━━━━<br/>Dry-walkthrough verified = TRUE<br/>(first line of plan)"] end REPORT["● Step 7: Report to Terminal<br/>━━━━━━━━━━<br/>Changes Made · Verified<br/>### Historical Context · Recommendation"] %% FLOW %% START --> LOAD LOAD --> MULTIPART MULTIPART -->|"YES"| SCOPE_WARN MULTIPART -->|"NO"| PHASE_V SCOPE_WARN --> PHASE_V PHASE_V --> XPHASE XPHASE --> RULE_CHK RULE_CHK --> READ_CFG READ_CFG --> TEST_CMD TEST_CMD -->|"YES — replace"| FIX TEST_CMD -->|"NO"| WORKTREE WORKTREE -->|"YES — flag & replace"| FIX WORKTREE -->|"NO"| GH_AUTH GH_AUTH -->|"available"| GIT_SCAN GH_AUTH -->|"unavailable"| GIT_SCAN GIT_SCAN --> GIT_SIG GIT_SIG -->|"strong — actionable"| FIX GIT_SIG -->|"weak — informational"| REPORT GH_AUTH -->|"available"| GH_ISSUES GH_ISSUES --> GH_MATCH GH_MATCH -->|"closed issue — actionable"| FIX GH_MATCH -->|"open issue — informational"| REPORT FIX --> MARK MARK --> REPORT REPORT -->|"issues found"| REVISED REPORT -->|"no issues"| PASS %% CLASS ASSIGNMENTS %% class START,PASS,REVISED terminal; class LOAD,PHASE_V,XPHASE,GIT_SCAN,GH_ISSUES handler; class RULE_CHK,READ_CFG phase; class MULTIPART,TEST_CMD,WORKTREE,GH_AUTH,GIT_SIG,GH_MATCH stateNode; class FIX,MARK detector; class SCOPE_WARN output; class REPORT output;● Modified component | ★ New component
Closes #307
Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/impl-307-20260310-203646-026624/temp/make-plan/dry_walkthrough_historical_regression_plan_2026-03-10_120000.md🤖 Generated with Claude Code via AutoSkillit