Various fixes: integration rollup into main #366

Merged
Trecek merged 11 commits into integration from various_fixes
Mar 12, 2026

Conversation

Trecek commented Mar 12, 2026

Summary

  • Branch protection: Protected-branch validation for merge and push operations with hooks and structural tests
  • Pipeline observability: Canonical TelemetryFormatter, quota events, wall-clock timing, drift fix
  • PR pipeline gates: Mergeability gate, review cycle, fidelity checks, CI gating, review-first enforcement
  • Release CI: Version bump automation, branch sync, force-push integration back-sync, release workflows
  • Skill hardening: Anti-prose guards for loop constructs, loop-boundary detector, skill compliance tests, end-turn hazard documentation
  • Recipe validation: Unknown-skill-command semantic rule, arch import violation fixes
  • PostToolUse hook: Pretty output formatter with exception boundary and data-loss bug fixes
  • Dry walkthrough: Test command genericization, Step 4.5 historical regression check
  • prepare-issue: Duplicate detection and broader triggers
  • Display output: Terminal targets consolidation (Part A)
  • Pre-release stability: Test failures, arch exemptions, init idempotency, CLAUDE.md corrections
  • Documentation: Release docs sprint, getting-started, CLI reference, architecture, configuration, installation guides
  • readOnlyHint: Added to all MCP tools for parallel execution support
  • review-pr skill: Restored bundled skill and reverted erroneous deletions

Test plan

  • CI passes (Preflight checks + Tests on ubuntu-latest)
  • No regressions in existing test suite
  • New tests for branch protection, telemetry formatter, pretty output, release workflows, skill compliance all pass

🤖 Generated with Claude Code

Trecek and others added 11 commits March 12, 2026 08:10
Add is_protected_branch() guard in core/branch_guard.py, enforced in
perform_merge() and push_to_remote() to prevent accidental merges or
pushes to main, integration, or stable. Configurable via
SafetyConfig.protected_branches with defaults.yaml defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add branch_protection_guard.py PreToolUse hook for merge_worktree and push_to_remote
- Register branch_protection_guard.py and headless_orchestration_guard.py in HOOK_REGISTRY
- Add structural test: every hook script in hooks/ must be in HOOK_REGISTRY
- Add structural test: every destructive tool must have PreToolUse hook
- Add cwd assertions to 3 run_skill tests (MockSubprocessRunner[N][1])
- Fix BranchingConfig.default_base_branch: 'integration' -> 'main' (match defaults.yaml)
- Add test verifying Python default matches defaults.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add branch_protection_guard.py to _PRINT_EXEMPT (standalone hooks use print for JSON)
- Document branch_protection_guard.py in CLAUDE.md Architecture section
- Fix test_init_idempotent_no_duplicates: check for duplicate matchers instead
  of counting matchers containing "run_skill" (which now legitimately appears
  in both the run_skill matcher and the headless_orchestration_guard matcher)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…splay

Eliminates the dual-formatter anti-pattern where two independent formatting
implementations (server-side bullet lists and hook-side Markdown-KV) produced
incompatible outputs for the same telemetry data.

- Create TelemetryFormatter in pipeline layer with format_token_table(),
  format_timing_table(), and format_compact_kv() methods
- Add format parameter to get_token_summary and get_timing_summary MCP tools
  (format="json" default, format="table" returns pre-formatted markdown)
- Rewrite write_telemetry_files to use TelemetryFormatter and merge
  wall_clock_seconds (previously bypassed, root cause of field never
  reaching file output)
- Delete _format_token_summary() and _format_timing_summary() server-side
  formatters
- Update PostToolUse hook to prefer wall_clock_seconds, add dedicated
  _fmt_get_timing_summary formatter, handle pre-formatted responses
- Simplify recipe TOKEN SUMMARY instructions from 5-step manual chain
  to single get_token_summary(format=table) call
- Add telemetry-before-open-pr semantic rule (WARNING)
- Fix stale open-pr skill_contracts.yaml entry
- Add comprehensive tests: TelemetryFormatter unit tests, format parameter
  tests, output-equivalence test (hook ≡ canonical), format-structural
  assertions replacing weak presence-only checks, semantic rule tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a new semantic rule that validates skill_command references in run_skill
steps resolve to actual bundled skills on disk. Mirrors the existing unknown-tool
rule pattern. Prevents silent recipe breakage when skills are deleted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Import SkillResolver from workspace __init__ (not submodule) per REQ-ARCH-001
- Detect dynamic template expressions in skill name portion only, not in arguments
  (e.g. /autoskillit:audit-${{...}} is dynamic, /autoskillit:review-pr ${{...}} is not)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 6 failing tests correctly detect the same issue: bundled recipes
reference /autoskillit:review-pr which was erroneously deleted. Filters
are tagged TODO(Part-B) for removal once review-pr skill is restored.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extends the text-then-tool compliance suite with _check_loop_boundary()
that detects "For each" loops containing tool invocations without an
anti-prose guard. Adds 3 detector unit test fixtures and wires the new
check into the project-wide test_no_text_then_tool_in_any_step scan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds CRITICAL guard instructions to "For each" loops containing tool
invocations in open-pr, create-review-pr, process-issues, and
setup-project. Prevents stochastic end_turn windows at loop iteration
boundaries where the model would otherwise emit progress prose.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explains why text output between tool calls kills headless sessions,
the two anti-pattern classes (text-then-tool, loop-boundary), how to
reproduce the issue, why recipes are immune, and what upstream fixes
would make the guards unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit 5baa6b9 erroneously deleted the review-pr skill and made cascading
changes to tests and recipes. This restores the skill, its tests, and reverts
the degradation logic — making the codebase consistent so the
unknown-skill-command rule passes for all bundled recipes.

- Restore review-pr/SKILL.md and its two test files from main
- Revert audit-and-fix.yaml: remove check_review_pr_available gate,
  use /autoskillit:review-pr, restore on_failure → resolve_review
- Remove all 6 TODO(Part-B) suppression filters and xfail markers
- Update BUNDLED_SKILLS list, write-recipe skill list, CLAUDE.md (52→53)
- Replace pytest.skip with pytest.fail in review_pr_text fixture
- Genericize code-index path example in review-pr SKILL.md (REQ-GEN-004)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Trecek changed the base branch from main to integration March 12, 2026 22:33
Trecek added this pull request to the merge queue Mar 12, 2026
Trecek removed this pull request from the merge queue due to a manual request Mar 12, 2026
Trecek added this pull request to the merge queue Mar 12, 2026
Merged via the queue into integration with commit 302cea8 Mar 12, 2026
2 checks passed
Trecek deleted the various_fixes branch March 12, 2026 22:59
Trecek added a commit that referenced this pull request Mar 15, 2026
…, Headless Isolation (#404)

## Summary

Integration rollup of **43 PRs** (#293–#406) consolidating **62
commits** across **291 files** (+27,909 / −6,040 lines). This release
advances AutoSkillit from v0.2.0 to v0.3.1 with GitHub merge queue
integration, sub-recipe composition, a PostToolUse output reformatter,
headless session isolation guards, and comprehensive pipeline
observability — plus 24 new bundled skills, 3 new MCP tools, and 47 new
test files.

---

## Major Features

### GitHub Merge Queue Integration (#370, #362, #390)
- New `wait_for_merge_queue` MCP tool — polls a PR through GitHub's
merge queue until merged, ejected, or timed out (default 600s). Uses
REST + GraphQL APIs with stuck-queue detection and auto-merge
re-enrollment
- New `DefaultMergeQueueWatcher` L1 service (`execution/merge_queue.py`)
— never raises; all outcomes are structured results
- `parse_merge_queue_response()` pure function for GraphQL queue entry
parsing
- New `auto_merge` ingredient in `implementation.yaml` and
`remediation.yaml` — enrolls PRs in the merge queue after CI passes
- Full queue-mode path added to `merge-prs.yaml`: detect queue → enqueue
→ wait → handle ejections → re-enter
- `analyze-prs` skill gains Step 0.5 (merge queue detection) and Step
1.5 (CI/review eligibility filtering)
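
The "never raises; all outcomes are structured results" contract described above can be illustrated with a minimal polling sketch. This is not the project's `DefaultMergeQueueWatcher`: the `QueueResult` type, the `fetch_state` callable, the state strings, and the injected `sleep`/`clock` parameters are all assumptions made for illustration. Only the 600-second default timeout comes from the text.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QueueResult:
    """Hypothetical structured outcome; the field names are assumptions."""
    success: bool
    state: str  # "merged", "ejected", "timeout", or "error"
    detail: str = ""


def wait_for_merge(
    fetch_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> QueueResult:
    """Poll until the PR is merged, ejected, or the timeout elapses.

    Every failure path, including an exception raised by the fetcher,
    is converted into a QueueResult instead of propagating.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            state = fetch_state()
        except Exception as exc:  # boundary: report the error, never raise
            return QueueResult(False, "error", str(exc))
        if state == "merged":
            return QueueResult(True, "merged")
        if state == "ejected":
            return QueueResult(False, "ejected")
        if clock() >= deadline:
            return QueueResult(False, "timeout")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, which is one plausible way to test such a state machine quickly.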

### Sub-Recipe Composition (#380)
- Recipe steps can now reference sub-recipes via `sub_recipe` + `gate`
fields — lazy-loaded and merged at validation time
- Composition engine in `recipe/_api.py`: `_merge_sub_recipe()` inlines
sub-recipe steps with safe name-prefixing and route remapping (`done` →
parent's `on_success`, `escalate` → parent's `on_failure`)
- `_build_active_recipe()` evaluates gate ingredients against
overrides/defaults; dual validation runs on both active and combined
recipes
- First sub-recipe: `sprint-prefix.yaml` — triage → plan → confirm →
dispatch workflow, gated by `sprint_mode` ingredient (hidden, default
false)
- Both `implementation.yaml` and `remediation.yaml` gain `sprint_entry`
placeholder step
- New semantic rules: `unknown-sub-recipe` (ERROR),
`circular-sub-recipe` (ERROR) with DFS cycle detection
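
The name-prefixing and route remapping described for `_merge_sub_recipe()` can be sketched roughly as follows. This is an illustrative approximation, not the code in `recipe/_api.py`: the dict-based step shape, the `__` separator, and the function name `merge_sub_recipe` are assumptions. Only the `done` → `on_success` and `escalate` → `on_failure` mapping is taken from the text.

```python
def merge_sub_recipe(parent_name, parent_step, sub_steps):
    """Inline sub-recipe steps under a parent step.

    parent_step: dict with "on_success" and "on_failure" routes.
    sub_steps:   ordered dict of step name -> step dict (assumed shape).
    Returns a new dict; the inputs are not mutated.
    """
    prefix = f"{parent_name}__"  # assumed separator to avoid name clashes

    def remap(target):
        if target == "done":       # sub-recipe finished successfully
            return parent_step["on_success"]
        if target == "escalate":   # sub-recipe gave up
            return parent_step["on_failure"]
        return prefix + target     # internal route: follow the renamed step

    merged = {}
    for name, step in sub_steps.items():
        new_step = dict(step)
        for key in ("on_success", "on_failure"):
            if key in new_step:
                new_step[key] = remap(new_step[key])
        merged[prefix + name] = new_step
    return merged
```

For example, a `confirm` step that routes to `done` inside the sub-recipe ends up routing to whatever the parent placeholder step declares as `on_success`.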

### PostToolUse Output Reformatter (#293, #405)
- `pretty_output.py` — new 671-line PostToolUse hook that rewrites raw
MCP JSON responses to Markdown-KV before Claude consumes them (30–77%
token overhead reduction)
- Dedicated formatters for 11 high-traffic tools (`run_skill`,
`run_cmd`, `test_check`, `merge_worktree`, `get_token_summary`, etc.)
plus a generic KV formatter for remaining tools
- Pipeline vs. interactive mode detection via hook config file
- Unwraps Claude Code's `{"result": "<json-string>"}` envelope before
dispatching
- 1,516-line test file with 40+ behavioral tests
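
As a rough illustration of the envelope unwrapping and key-value rendering described above, the sketch below shows one way the two steps could fit together. It is not the contents of `pretty_output.py`: the function names and the exact `**key:** value` layout are assumptions, since the text does not specify the Markdown-KV format.

```python
import json


def unwrap_envelope(payload):
    """Return the inner object of a {"result": "<json-string>"} envelope.

    Anything that does not look like the envelope, or whose string is not
    valid JSON, is returned unchanged.
    """
    if (
        isinstance(payload, dict)
        and set(payload) == {"result"}
        and isinstance(payload["result"], str)
    ):
        try:
            return json.loads(payload["result"])
        except json.JSONDecodeError:
            return payload
    return payload


def to_markdown_kv(data):
    """Render a flat dict as one '**key:** value' line per entry (assumed layout)."""
    if not isinstance(data, dict):
        return str(data)
    lines = []
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            # Nested values stay compact JSON rather than being expanded.
            value = json.dumps(value, separators=(",", ":"))
        lines.append(f"**{key}:** {value}")
    return "\n".join(lines)
```

A generic formatter along these lines would cover tools without a dedicated formatter, while high-traffic tools could be dispatched to specialised functions first.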

### Headless Session Isolation (#359, #393, #397, #405, #406)
- **Env isolation**: `build_sanitized_env()` strips
`AUTOSKILLIT_PRIVATE_ENV_VARS` from subprocess environments, preventing
`AUTOSKILLIT_HEADLESS=1` from leaking into test runners
- **CWD path contamination defense**: `_inject_cwd_anchor()` anchors all
relative paths to session CWD; `_validate_output_paths()` checks
structured output tokens against CWD prefix; `_scan_jsonl_write_paths()`
post-session scanner catches actual Write/Edit/Bash tool calls outside
CWD
- **Headless orchestration guard**: new PreToolUse hook blocks
`run_skill`/`run_cmd`/`run_python` when `AUTOSKILLIT_HEADLESS=1`,
enforcing Tier 1/Tier 2 nesting invariant
- **`_require_not_headless()` server-side guard**: blocks 10
orchestration-only tools from headless sessions at the handler layer
- **Unified error response contract**: `headless_error_result()`
produces consistent 9-field responses;
`_build_headless_error_response()` canonical builder for all failure
paths in `tools_integrations.py`
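
The environment isolation idea can be sketched in a few lines. This is a simplified guess at the behaviour of `build_sanitized_env()`, not its actual implementation: the contents of the private-variable set and the optional `base` parameter are assumptions (the text only confirms that `AUTOSKILLIT_HEADLESS` must not leak).

```python
import os
from typing import Mapping, Optional

# Assumed contents; the real AUTOSKILLIT_PRIVATE_ENV_VARS may hold more names.
PRIVATE_ENV_VARS = frozenset({"AUTOSKILLIT_HEADLESS"})


def build_sanitized_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment without the private variables."""
    source = os.environ if base is None else base
    return {k: v for k, v in source.items() if k not in PRIVATE_ENV_VARS}
```

The result would then be passed as the `env` argument when spawning a subprocess, so that a test runner started from a headless session does not itself believe it is headless.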

### Cook UX Overhaul (#375, #363)
- `open_kitchen` now accepts optional `name` + `overrides` — opens
kitchen AND loads recipe in a single call
- Pre-launch terminal preview with ANSI-colored flow diagram and
ingredients table via new `cli/_ansi.py` module
- `--dangerously-skip-permissions` warning banner with interactive
confirmation prompt
- Randomized session greetings from themed pools
- Orchestrator prompt rewritten: recipe YAML no longer injected via
`--append-system-prompt`; session calls `open_kitchen('{recipe_name}')`
as first action
- Conversational ingredient collection replaces mechanical per-field
prompting

---

## New MCP Tools

| Tool | Gate | Description |
|------|------|-------------|
| `wait_for_merge_queue` | Kitchen | Polls PR through GitHub merge queue (REST + GraphQL) |
| `set_commit_status` | Kitchen | Posts GitHub Commit Status to a SHA for review-first gating |
| `get_quota_events` | Ungated | Surfaces quota guard decisions from `quota_events.jsonl` |

---

## Pipeline Observability (#318, #341)

- **`TelemetryFormatter`** (`pipeline/telemetry_fmt.py`) — single source
of truth for all telemetry rendering; replaces dual-formatter
anti-pattern. Rendering modes: Markdown table, terminal table, and
compact KV (for PostToolUse hook)
- `get_token_summary` and `get_timing_summary` gain `format` parameter
(`"json"` | `"table"`)
- `wall_clock_seconds` merged into token summary output — see duration
alongside token counts in one call
- **Telemetry clear marker**: `write_telemetry_clear_marker()` /
`read_telemetry_clear_marker()` prevent token accounting drift on MCP
server restart after `clear=True`
- **Quota event logging**: `quota_check.py` hook now writes structured
JSONL events (`cache_miss`, `parse_error`, `blocked`, `approved`) to
`quota_events.jsonl`
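
To make the `format` parameter and the merged `wall_clock_seconds` field concrete, here is a minimal sketch of a single formatter serving both output modes. It is not the project's `TelemetryFormatter`: the row field names other than `wall_clock_seconds`, the column layout, and the standalone function signatures are assumptions.

```python
import json


def format_token_table(rows):
    """Render per-step token usage as a Markdown table (assumed columns)."""
    lines = [
        "| Step | Input tokens | Output tokens | Wall clock (s) |",
        "|------|-------------:|--------------:|---------------:|",
    ]
    for row in rows:
        wall = row.get("wall_clock_seconds")
        wall_text = f"{wall:.1f}" if wall is not None else "n/a"
        lines.append(
            f"| {row['step']} | {row['input_tokens']:,} "
            f"| {row['output_tokens']:,} | {wall_text} |"
        )
    return "\n".join(lines)


def get_token_summary(rows, format="json"):
    """Return raw JSON by default, or a pre-formatted table on request."""
    if format == "table":
        return format_token_table(rows)
    if format == "json":
        return json.dumps(rows)
    raise ValueError(f"unsupported format: {format!r}")
```

Routing every consumer through one formatter is what removes the risk of two renderers drifting apart for the same data.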

---

## CI Watcher & Remote Resolution Fixes (#395, #406)

- **`CIRunScope` value object** — carries `workflow` + `head_sha` scope;
replaces bare `head_sha` parameter across all CI watcher signatures
- **Workflow filter**: `wait_for_ci` and `get_ci_status` accept
`workflow` parameter (falls back to project-level `config.ci.workflow`),
preventing unrelated workflows (version bumps, labelers) from satisfying
CI checks
- **`FAILED_CONCLUSIONS` expanded**: `failure` → `{failure, timed_out,
startup_failure, cancelled}`
- **Canonical remote resolver** (`execution/remote_resolver.py`):
`resolve_remote_repo()` with `REMOTE_PRECEDENCE = (upstream, origin)` —
correctly resolves `owner/repo` after `clone_repo` sets `origin` to
`file://` isolation URL
- **Clone isolation fix**: `clone_repo` now always clones from remote
URL (never local path); sets `origin=file:///<clone>` for isolation and
`upstream=<real_url>` for push/CI operations
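
The precedence rule explains why the `file://` origin no longer breaks `owner/repo` resolution. A minimal sketch, assuming remotes are supplied as a name-to-URL mapping and that only GitHub-style URLs need parsing (the regular expression and the function signature are illustrative, not the code in `execution/remote_resolver.py`):

```python
import re

REMOTE_PRECEDENCE = ("upstream", "origin")

# Matches https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git)
_GITHUB_RE = re.compile(
    r"github\.com[:/](?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"
)


def resolve_remote_repo(remotes):
    """Return 'owner/repo' from the first remote, in precedence order, that parses.

    remotes: mapping of remote name -> URL (assumed input shape).
    Returns None when no remote points at a recognisable GitHub URL.
    """
    for name in REMOTE_PRECEDENCE:
        url = remotes.get(name)
        if not url:
            continue
        match = _GITHUB_RE.search(url)
        if match:
            return f"{match['owner']}/{match['repo']}"
    return None
```

With an isolated clone, `origin` is a local `file://` URL that fails to parse, so the resolver falls through to nothing only if `upstream` is also missing.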

---

## PR Pipeline Gates (#317, #343)

- **`pipeline/pr_gates.py`**: `is_ci_passing()`, `is_review_passing()`,
`partition_prs()` — partitions PRs into
eligible/CI-blocked/review-blocked with human-readable reasons
- **`pipeline/fidelity.py`**: `extract_linked_issues()`
(Closes/Fixes/Resolves patterns), `is_valid_fidelity_finding()` schema
validation
- **`check_pr_mergeable`** now returns `mergeable_status` field
alongside boolean
- **`release_issue`** gains `target_branch` + `staged_label` parameters
for staged issue lifecycle on non-default branches (#392)
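
The Closes/Fixes/Resolves matching attributed to `extract_linked_issues()` could look something like the sketch below. The regular expression is an assumption modelled on GitHub's closing keywords; the real function in `pipeline/fidelity.py` may accept a different set of forms or return a different type.

```python
import re

# Closing keywords in their common inflections, followed by "#<number>".
_LINK_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s*:?\s+#(\d+)",
    re.IGNORECASE,
)


def extract_linked_issues(body):
    """Return issue numbers referenced by closing keywords, in first-seen order."""
    seen = []
    for match in _LINK_RE.finditer(body):
        number = int(match.group(1))
        if number not in seen:  # de-duplicate while preserving order
            seen.append(number)
    return seen
```

Plain mentions such as "see #99" are deliberately ignored, since only closing keywords establish a link.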

---

## Recipe System Changes

### Structural
- `RecipeIngredient.hidden` field — excluded from ingredients table
(used for internal flags like `sprint_mode`)
- `Recipe.experimental` flag parsed from YAML
- `_TERMINAL_TARGETS` moved to `schema.py` as single source of truth
- `format_ingredients_table()` with sorted display order (required →
auto-detect → flags → optional → constants)
- Diagram rendering engine (~670 lines) removed from `diagrams.py` —
rendering now handled by `/render-recipe` skill; format version bumped
to v7

### Recipe YAML Changes
- **Deleted**: `audit-and-fix.yaml`, `batch-implementation.yaml`,
`bugfix-loop.yaml`
- **Renamed**: `pr-merge-pipeline.yaml` → `merge-prs.yaml`
- **`implementation.yaml`**: merge queue steps,
`auto_merge`/`sprint_mode` ingredients, `base_branch` default → `""`
(auto-detect), CI workflow filter, `extract_pr_number` step
- **`remediation.yaml`**: `topic` → `task` rename, merge queue steps,
`dry_walkthrough` retries:3 with forward-only routing, `verify` → `test`
rename
- **`merge-prs.yaml`**: full queue-mode path, `open-integration-pr` step
(replaces `create-review-pr`), post-PR mergeability polling, review
cycle with `resolve-review` retries

### New Semantic Rules
- `missing-output-patterns` (WARNING) — flags `run_skill` steps without
`expected_output_patterns`
- `unknown-sub-recipe` (ERROR) — validates sub-recipe references exist
- `circular-sub-recipe` (ERROR) — DFS cycle detection
- `unknown-skill-command` (ERROR) — validates skill names against
bundled set
- `telemetry-before-open-pr` (WARNING) — ensures telemetry step precedes
`open-pr`
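
The DFS cycle detection behind `circular-sub-recipe` can be sketched with the classic three-colour traversal. This is a generic illustration, assuming the sub-recipe references have already been collected into an adjacency mapping; the function name and input shape are hypothetical.

```python
def find_sub_recipe_cycle(graph):
    """Return one reference cycle as a list of names, or None if acyclic.

    graph: mapping of recipe name -> iterable of referenced sub-recipe names.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for child in graph.get(node, ()):
            state = color.get(child, WHITE)
            if state == GRAY:
                # Back edge: child is already on the current path.
                return path[path.index(child):] + [child]
            if state == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Returning the offending path, rather than a boolean, makes it straightforward to produce a readable validation error.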

---

## New Skills (24)

### Architecture Lens Family (13)
`arch-lens-c4-container`, `arch-lens-concurrency`,
`arch-lens-data-lineage`, `arch-lens-deployment`,
`arch-lens-development`, `arch-lens-error-resilience`,
`arch-lens-module-dependency`, `arch-lens-operational`,
`arch-lens-process-flow`, `arch-lens-repository-access`,
`arch-lens-scenarios`, `arch-lens-security`, `arch-lens-state-lifecycle`

### Audit Family (5)
`audit-arch`, `audit-bugs`, `audit-cohesion`, `audit-defense-standards`,
`audit-tests`

### Planning & Diagramming (3)
`elaborate-phase`, `make-arch-diag`, `make-req`

### Bug/Guard Lifecycle (2)
`design-guards`, `verify-diag`

### Pipeline (1)
`open-integration-pr` — creates integration PRs with per-PR details,
arch-lens diagrams, carried-forward `Closes #N` references, and
auto-closes collapsed PRs

### Sprint Planning (1 — gated by sub-recipe)
`sprint-planner` — selects a focused, conflict-free sprint from a triage
manifest

---

## Skill Modifications (Highlights)

- **`analyze-prs`**: merge queue detection, CI/review eligibility
filtering, queue-mode ordering
- **`dry-walkthrough`**: Step 4.5 Historical Regression Check (git
history mining + GitHub issue cross-reference)
- **`review-pr`**: deterministic diff annotation via
`diff_annotator.py`, echo-primary-obligation step, post-completion
confirmation, degraded-mode narration
- **`collapse-issues`**: content fidelity enforcement — per-issue
`fetch_github_issue` calls, copy-mode body assembly (#388)
- **`prepare-issue`**: multi-keyword dedup search, numbered candidate
selection, extend-existing-issue flow
- **`resolve-review`**: GraphQL thread auto-resolution after addressing
findings (#379)
- **`resolve-merge-conflicts`**: conflict resolution decision report
with per-file log (#389)
- **Cross-skill**: output tokens migrated to `key = value` format;
code-index paths made generic with fallback notes; arch-lens references
fully qualified; anti-prose guards at loop boundaries

---

## CLI & Hooks

### New CLI Commands
- `autoskillit install` — plugin installation + cache refresh
- `autoskillit upgrade` — `.autoskillit/scripts/` →
`.autoskillit/recipes/` migration

### CLI Changes
- `doctor`: plugin-aware MCP check, PostToolUse hook scanning, `--fix`
flag removed
- `init`: GitHub repo prompt, `.secrets.yaml` template, plugin-aware
registration
- `chefs-hat`: pre-launch banner, `--dangerously-skip-permissions`
confirmation
- `recipes render`: repurposed from generator to viewer (delegates to
`/render-recipe`)
- `serve`: server import deferred to after `configure_logging()` to
prevent stdout corruption

### New Hooks
- `branch_protection_guard.py` (PreToolUse) — denies
`merge_worktree`/`push_to_remote` targeting protected branches
- `headless_orchestration_guard.py` (PreToolUse) — blocks orchestration
tools in headless sessions
- `pretty_output.py` (PostToolUse) — MCP JSON → Markdown-KV reformatter

### Hook Infrastructure
- `HookDef.event_type` field — registry now handles both PreToolUse and
PostToolUse
- `generate_hooks_json()` groups entries by event type
- `_evict_stale_autoskillit_hooks` and `sync_hooks_to_settings` made
event-type-agnostic

---

## Core & Config

### New Core Modules
- `core/branch_guard.py` — `is_protected_branch()` pure function
- `core/github_url.py` — `parse_github_repo()` +
`normalize_owner_repo()` canonical parsers
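
A pure function like `is_protected_branch()` is small enough to sketch in full. The version below is an assumption about its shape, not the code in `core/branch_guard.py`: the default tuple mirrors the `safety.protected_branches` defaults listed in this summary, while the `refs/heads/` normalisation is an illustrative extra.

```python
DEFAULT_PROTECTED = ("main", "integration", "stable")


def is_protected_branch(branch, protected=DEFAULT_PROTECTED):
    """Return True when the branch name is in the protected set.

    Accepts either a short name ("main") or a full ref ("refs/heads/main");
    stripping the ref prefix is an assumption made for this sketch.
    """
    name = branch.strip()
    prefix = "refs/heads/"
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name in protected
```

Keeping the check free of I/O lets both the server-side guard and the standalone PreToolUse hook reuse the same logic.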

### Core Type Expansions
- `AUTOSKILLIT_PRIVATE_ENV_VARS` frozenset
- `WORKER_TOOLS` / `HEADLESS_BLOCKED_UNGATED_TOOLS` split from
`UNGATED_TOOLS`
- `TOOL_CATEGORIES` — categorized listing for `open_kitchen` response
- `CIRunScope` — immutable scope for CI watcher calls
- `MergeQueueWatcher` protocol
- `SkillResult.cli_subtype` + `write_path_warnings` fields
- `SubprocessRunner.env` parameter

### Config
- `safety.protected_branches`: `[main, integration, stable]`
- `github.staged_label`: `"staged"`
- `ci.workflow`: workflow filename filter (e.g., `"tests.yml"`)
- `branching.default_base_branch`: `"integration"` → `"main"`
- `ModelConfig.default`: `str | None` → `str = "sonnet"`

---

## Infrastructure & Release

### Version
- `0.2.0` → `0.3.1` across `pyproject.toml`, `plugin.json`, `uv.lock`
- FastMCP dependency: `>=3.0.2` → `>=3.1.1,<4.0` (#399)

### CI/CD Workflows
- **`version-bump.yml`** (new) — auto patch-bumps `main` on integration
PR merge, force-syncs integration branch one patch ahead
- **`release.yml`** (new) — minor version bump + GitHub Release on merge
to `stable`
- **`codeql.yml`** (new) — CodeQL analysis for `stable` PRs (Python +
Actions)
- **`tests.yml`** — `merge_group:` trigger added; multi-OS now only for
`stable`

### PyPI Readiness
- `pyproject.toml`: `readme`, `license`, `authors`, `keywords`,
`classifiers`, `project.urls`, `hatch.build.targets.sdist` inclusion
list

### readOnlyHint Parallel Execution Fix
- MCP tools are annotated `readOnlyHint=True`, which enables Claude Code
parallel tool execution (~7x speedup), with one deliberate exception:
`wait_for_merge_queue` uses `readOnlyHint=False` because it mutates queue
state

### Tool Response Exception Boundary
- `track_response_size` decorator catches unhandled exceptions and
serializes them as `{"success": false, "subtype": "tool_exception"}` —
prevents FastMCP opaque error wrapping
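
The exception boundary can be pictured as a decorator that turns any unhandled exception into a structured payload. The sketch below is a synchronous simplification under stated assumptions: the real `track_response_size` decorator also tracks response size and likely wraps async handlers, and the `error` field shown here is hypothetical (only `success` and `subtype` appear in the text).

```python
import functools
import json


def exception_boundary(func):
    """Wrap a tool handler so failures become JSON instead of raised errors."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            return json.dumps({
                "success": False,
                "subtype": "tool_exception",
                # Hypothetical extra field carrying the failure description.
                "error": f"{type(exc).__name__}: {exc}",
            })
    return wrapper
```

Because the caller always receives the same JSON shape, downstream formatters such as the PostToolUse hook can treat tool crashes like any other failed result.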

### SkillResult Subtype Normalization (#358)
- `_normalize_subtype()` gate eliminates dual-source contradiction
between CLI subtype and session outcome
- Class 2 upward: `SUCCEEDED + error_subtype → "success"` (drain-race
artifact)
- Class 1 downward: `non-SUCCEEDED + "success" → "empty_result"` /
`"missing_completion_marker"` / `"adjudicated_failure"`

---

## Test Coverage

**47 new test files** (+12,703 lines) covering:

| Area | Key Tests |
|------|-----------|
| Merge queue watcher state machine | `test_merge_queue.py` (226 lines) |
| Clone isolation × CI resolution | `test_clone_ci_contract.py`, `test_remote_resolver.py` |
| PostToolUse hook | `test_pretty_output.py` (1,516 lines, 40+ cases) |
| Branch protection + headless guards | `test_branch_protection_guard.py`, `test_headless_orchestration_guard.py` |
| Sub-recipe composition | 5 test files (schema, loading, validation, sprint mode × 2) |
| Telemetry formatter | `test_telemetry_formatter.py` (281 lines) |
| PR pipeline gates | `test_analyze_prs_gates.py`, `test_review_pr_fidelity.py` |
| Diff annotator | `test_diff_annotator.py` (242 lines) |
| Skill compliance | Output token format, genericization, loop-boundary guards |
| Release workflows | Structural contracts for `version-bump.yml`, `release.yml` |
| Issue content fidelity | Body-assembling skills must call `fetch_github_issue` per-issue |
| CI watcher scope | `test_ci_params.py` — workflow_id query param composition |

---

## Consolidated PRs

#293, #295, #314, #315, #316, #317, #318, #319, #323, #332, #336, #337,
#338, #339, #341, #343, #351, #358, #359, #360, #361, #362, #363, #366,
#368, #370, #375, #377, #378, #379, #380, #388, #389, #390, #391, #392,
#393, #395, #396, #397, #399, #405, #406

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>