
Pipeline observability: quota events, wall-clock timing, drift fix #341

Merged
Trecek merged 9 commits into integration from
combined-pipeline-observability-quota-guard-logging-and-per/302
Mar 11, 2026


Trecek commented Mar 11, 2026

Summary

Adds three pipeline observability capabilities: a new get_quota_events MCP tool surfacing quota guard decisions from quota_events.jsonl, wall_clock_seconds merged into get_token_summary output for per-step wall-clock visibility, and a .telemetry_cleared_at replay fence preventing token accounting drift when the MCP server restarts after a clear=True call. Includes a follow-up refactor extracting _get_log_root() in tools_status.py to eliminate three identical inline log-root expressions.

Individual Plan Details

Group 1: Pipeline Observability — Quota Guard Logging and Per-Step Elapsed Time

Three related pipeline observability improvements, tracked as GitHub issue #302 (collapsing #218, #65, and the #304/#148 token accounting item):

  1. Quota guard MCP tool (feat: Add quota guard observability to diagnostic logging system #218): The quota_check.py hook already appends approved, blocked, and cache_miss events to quota_events.jsonl. Add a new ungated get_quota_events tool to tools_status.py to surface those decisions through the MCP API.

  2. Wall-clock time in token summary (feat: report per-step elapsed time in token summary #65): Merge total_seconds from the timing log into each step's get_token_summary output as wall_clock_seconds (falling back to the step's elapsed_seconds when no timing entry exists), so operators see wall-clock duration alongside token counts in one call. Also updates _format_token_summary and write_telemetry_files.

  3. Token accounting drift fix (Combined: Pre-release readiness — stability fixes #304/Stability and correctness fixes for public release #148): Persist a .telemetry_cleared_at timestamp when any log is cleared. _state._initialize reads this on startup and uses max(now - 24h, marker_ts) as the effective replay lower bound, excluding already-cleared sessions.
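The replay fence in item 3 can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes the marker holds a single ISO-8601 UTC timestamp, and `write_marker`, `read_marker`, `effective_since`, and `MARKER_NAME` are hypothetical names standing in for `write_telemetry_clear_marker` / `read_telemetry_clear_marker` and the bound computed in `_state._initialize`.

```python
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Hypothetical names; the real helpers live in execution/session_log.py.
MARKER_NAME = ".telemetry_cleared_at"
REPLAY_WINDOW = timedelta(hours=24)


def write_marker(log_dir: Path, now: Optional[datetime] = None) -> Path:
    """Record the clear instant as an ISO-8601 UTC timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    log_dir.mkdir(parents=True, exist_ok=True)
    target = log_dir / MARKER_NAME
    tmp = log_dir / (MARKER_NAME + ".tmp")
    tmp.write_text(now.isoformat(), encoding="utf-8")
    # os.replace swaps the file in one rename, so a reader never sees a
    # half-written marker (atomic on POSIX when both paths share a filesystem).
    os.replace(tmp, target)
    return target


def read_marker(log_dir: Path) -> Optional[datetime]:
    """Return the last clear instant, or None if absent or unparseable."""
    try:
        text = (log_dir / MARKER_NAME).read_text(encoding="utf-8").strip()
        ts = datetime.fromisoformat(text)
    except (FileNotFoundError, ValueError):
        return None
    # Treat a naive timestamp as UTC so it compares with aware datetimes.
    return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)


def effective_since(now: datetime, marker: Optional[datetime]) -> datetime:
    """Replay lower bound: max(now - 24h, marker)."""
    default = now - REPLAY_WINDOW
    return default if marker is None else max(default, marker)
```

Because the bound is a `max`, a marker older than the 24-hour window has no effect, and a missing or unreadable marker falls back to the plain 24-hour replay.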

Group 2: Remediation — Extract _get_log_root() helper in tools_status.py

The audit identified that tools_status.py had three identical inline expressions — resolve_log_dir(_get_ctx().config.linux_tracing.log_dir) — repeated in get_pipeline_report, get_token_summary, and get_timing_summary. This remediation adds _get_log_root() to centralize that computation and replaces all three inline call sites. No behavioral change.

Architecture Impact

Operational Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph UngatedTools ["UNGATED MCP TOOLS (tools_status.py)"]
        GTS["● get_token_summary<br/>━━━━━━━━━━<br/>clear=False<br/>+ ● wall_clock_seconds<br/>from timing_log"]
        GTIM["● get_timing_summary<br/>━━━━━━━━━━<br/>clear=False<br/>total_seconds per step"]
        GPR["● get_pipeline_report<br/>━━━━━━━━━━<br/>clear=False<br/>audit failures"]
        GQE["★ get_quota_events<br/>━━━━━━━━━━<br/>n=50<br/>quota guard decisions"]
    end

    subgraph LogRoot ["★ _get_log_root() helper (tools_status.py)"]
        LR["★ _get_log_root()<br/>━━━━━━━━━━<br/>resolve_log_dir(ctx.config<br/>.linux_tracing.log_dir)"]
    end

    subgraph InMemory ["IN-MEMORY PIPELINE LOGS"]
        TK["token_log<br/>━━━━━━━━━━<br/>step_name → tokens<br/>elapsed_seconds"]
        TI["timing_log<br/>━━━━━━━━━━<br/>step_name → total_seconds<br/>(monotonic clock)"]
        AU["audit_log<br/>━━━━━━━━━━<br/>list FailureRecord"]
    end

    subgraph DiskLogs ["PERSISTENT LOG FILES (~/.local/share/autoskillit/logs/)"]
        QE["quota_events.jsonl<br/>━━━━━━━━━━<br/>approved / blocked<br/>cache_miss / parse_error"]
        CM["★ .telemetry_cleared_at<br/>━━━━━━━━━━<br/>UTC ISO timestamp fence<br/>written on clear=True"]
    end

    subgraph Startup ["SERVER STARTUP (_state._initialize)"]
        INIT["● _state._initialize<br/>━━━━━━━━━━<br/>since = max(now−24h, marker)<br/>load_from_log_dir × 3"]
    end

    subgraph Hook ["HOOK (quota_check.py)"]
        QH["quota_check.py<br/>━━━━━━━━━━<br/>PreToolUse: approve/block<br/>_write_quota_log_event"]
    end

    GTS -->|"clear=True"| LR
    GTIM -->|"clear=True"| LR
    GPR -->|"clear=True"| LR
    LR -->|"write_telemetry_clear_marker"| CM
    GQE -->|"_read_quota_events(n)"| QE
    QH -->|"append event"| QE
    GTS -->|"get_report()"| TK
    GTS -->|"● merge wall_clock_seconds"| TI
    GTIM -->|"get_report()"| TI
    GPR -->|"get_report()"| AU
    CM -->|"read marker → since bound"| INIT
    INIT -->|"load_from_log_dir since=effective"| TK
    INIT -->|"load_from_log_dir since=effective"| TI
    INIT -->|"load_from_log_dir since=effective"| AU

    class GTS,GTIM,GPR cli;
    class GQE newComponent;
    class LR newComponent;
    class TK,TI,AU stateNode;
    class QE stateNode;
    class CM newComponent;
    class INIT phase;
    class QH detector;

Color Legend: Dark Blue = MCP query tools | Green = New components (get_quota_events, _get_log_root, .telemetry_cleared_at) | Teal = State/logs | Purple = Server startup | Dark Red = PreToolUse hook

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph InMem ["IN-MEMORY (clearable — MUTABLE)"]
        TK["● token_log<br/>━━━━━━━━━━<br/>MUTABLE<br/>step_name → tokens + elapsed_seconds<br/>cleared on clear=True"]
        TI["timing_log<br/>━━━━━━━━━━<br/>MUTABLE<br/>step_name → total_seconds<br/>cleared on clear=True"]
        AU["audit_log<br/>━━━━━━━━━━<br/>MUTABLE<br/>list FailureRecord<br/>cleared on clear=True"]
    end

    subgraph Derived ["DERIVED (computed per query)"]
        WC["★ wall_clock_seconds<br/>━━━━━━━━━━<br/>DERIVED<br/>timing_log.get_report()<br/>merged into token summary response<br/>never persisted"]
    end

    subgraph ClearFence ["★ CLEAR FENCE (write-then-read across restarts)"]
        CM["★ .telemetry_cleared_at<br/>━━━━━━━━━━<br/>WRITE-FENCE<br/>UTC ISO timestamp<br/>written atomically by write_telemetry_clear_marker<br/>read exactly once by _initialize"]
    end

    subgraph DiskReplay ["DISK REPLAY (bounded by clear fence)"]
        SJ["sessions.jsonl + session/<br/>━━━━━━━━━━<br/>REPLAY-SOURCE<br/>historical token + timing + audit data<br/>replayed with since= lower bound"]
        QE["quota_events.jsonl<br/>━━━━━━━━━━<br/>APPEND-ONLY<br/>quota hook writes, never rewrites<br/>read by ★ get_quota_events"]
    end

    subgraph ClearGate ["CLEAR GATE (state mutation trigger)"]
        ClearTrue["clear=True in<br/>● get_token_summary /<br/>● get_timing_summary /<br/>● get_pipeline_report<br/>━━━━━━━━━━<br/>1. Clear in-memory log<br/>2. ★ Write .telemetry_cleared_at"]
    end

    subgraph StartupGate ["★ STARTUP REPLAY GATE (_state._initialize)"]
        INIT["● _state._initialize<br/>━━━━━━━━━━<br/>1. Read .telemetry_cleared_at<br/>2. since = max(now−24h, marker)<br/>3. load_from_log_dir × 3<br/>Guards: no double-counting"]
    end

    ClearTrue -->|"1. in_memory.clear()"| TK
    ClearTrue -->|"1. in_memory.clear()"| TI
    ClearTrue -->|"1. in_memory.clear()"| AU
    ClearTrue -->|"2. write_telemetry_clear_marker()"| CM
    TI -->|"get_report() per query"| WC
    WC -->|"★ merged into response"| TK
    CM -->|"read → since bound"| INIT
    SJ -->|"load_from_log_dir since=effective"| INIT
    INIT -->|"populate (bounded)"| TK
    INIT -->|"populate (bounded)"| TI
    INIT -->|"populate (bounded)"| AU

    class TK,TI,AU handler;
    class WC newComponent;
    class CM newComponent;
    class ClearTrue detector;
    class INIT phase;
    class SJ,QE stateNode;

Color Legend: Orange = MUTABLE in-memory logs | Green = New (wall_clock_seconds, .telemetry_cleared_at fence) | Dark Red = clear=True trigger | Purple = startup replay gate | Teal = persistent disk state

Module Dependency Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — SERVER (tools_status.py, _state.py, helpers.py)"]
        direction LR
        TS["● tools_status.py<br/>━━━━━━━━━━<br/>★ _get_log_root()<br/>★ get_quota_events<br/>● 3 clear=True paths"]
        ST["● _state.py<br/>━━━━━━━━━━<br/>● _initialize<br/>reads clear marker"]
        HLP["● helpers.py<br/>━━━━━━━━━━<br/>re-exports<br/>write/read_telemetry_clear_marker<br/>resolve_log_dir"]
    end

    subgraph L1 ["L1 — EXECUTION (execution/__init__.py, session_log.py)"]
        direction LR
        EINIT["● execution/__init__.py<br/>━━━━━━━━━━<br/>★ exports write/read_telemetry_clear_marker<br/>public API surface"]
        SL["● session_log.py<br/>━━━━━━━━━━<br/>★ write_telemetry_clear_marker()<br/>★ read_telemetry_clear_marker()<br/>_CLEAR_MARKER_FILENAME"]
    end

    subgraph L0 ["L0 — CORE (core/types.py)"]
        TY["● core/types.py<br/>━━━━━━━━━━<br/>● UNGATED_TOOLS frozenset<br/>+ get_quota_events"]
    end

    TS -->|"import resolve_log_dir<br/>write/read_telemetry_clear_marker<br/>(via helpers shim)"| HLP
    ST -->|"★ import read_telemetry_clear_marker<br/>(direct from execution)"| EINIT
    HLP -->|"re-export from execution"| EINIT
    EINIT -->|"defined in"| SL
    TS -.->|"UNGATED_TOOLS check<br/>(via pipeline.gate)"| TY

    class TS,ST,HLP cli;
    class EINIT,SL handler;
    class TY stateNode;

Color Legend: Dark Blue = L3 server layer | Orange = L1 execution layer | Teal = L0 core types | Dashed = indirect (via pipeline.gate) | All imports flow downward (no violations)

Closes #302

Implementation Plans

Plan files:

  • temp/make-plan/302_pipeline_observability_plan_2026-03-10_204500.md
  • temp/make-plan/302_remediation_get_log_root_plan_2026-03-10_210500.md

🤖 Generated with Claude Code via AutoSkillit

Trecek and others added 3 commits March 10, 2026 22:18
…onds, and telemetry drift fix

- Add `get_quota_events` ungated MCP tool to surface quota_check.py hook
  decisions (approved/blocked/cache_miss) from quota_events.jsonl
- Merge `timing_log.total_seconds` into `get_token_summary` response as
  `wall_clock_seconds` per step; falls back to `elapsed_seconds` when no
  timing entry exists; also added to `_format_token_summary` markdown output
- Write `.telemetry_cleared_at` marker on `clear=True` in all three status
  tools (get_token_summary, get_timing_summary, get_pipeline_report)
- `_state._initialize` reads the marker on startup and uses
  `max(now-24h, marker)` as the effective `since` lower bound, preventing
  double-counting of cleared sessions on server restart
- Add `write_telemetry_clear_marker` / `read_telemetry_clear_marker` to
  `execution/session_log.py` and re-export from `execution/__init__.py`
- Update CLAUDE.md tool count 38 → 39, add get_quota_events to tool list
- Add `get_quota_events` to UNGATED_TOOLS frozenset in core/types.py
- Tests: clear marker roundtrip, _initialize drift prevention, get_quota_events,
  wall_clock_seconds fallback, clear=True marker writes for all three tools

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
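The wall-clock merge described in the first commit can be sketched as below. It assumes each token-summary step is a dict carrying `step_name` and `elapsed_seconds`, and that the timing report is a list of dicts carrying `step_name` and `total_seconds`. `merge_wall_clock` is a hypothetical stand-in; the project's `_merge_wall_clock_seconds` takes the timing log object itself rather than a pre-built report.

```python
from typing import Any, Optional


def merge_wall_clock(
    token_steps: list[dict[str, Any]],
    timing_steps: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return copies of token_steps with wall_clock_seconds added to each."""
    # Index timing totals by step name so each lookup is O(1).
    totals: dict[str, float] = {
        t["step_name"]: t["total_seconds"] for t in timing_steps
    }
    merged: list[dict[str, Any]] = []
    for step in token_steps:
        out = dict(step)  # copy: don't mutate the caller's report
        # Prefer the timing log's total; fall back to the token log's own
        # elapsed_seconds when no timing entry exists for this step.
        fallback: Optional[float] = step.get("elapsed_seconds")
        out["wall_clock_seconds"] = totals.get(step["step_name"], fallback)
        merged.append(out)
    return merged
```

A test of the fallback path only proves something if the step is genuinely absent from the timing data, so asserting that precondition first protects it from shared-state pollution.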
…lementation

- Route write_telemetry_clear_marker/resolve_log_dir through server/helpers.py
  re-exports so tools_status.py does not import from autoskillit.execution
  (REQ-IMP-003, test_server_tools_import_only_allowed_packages,
   test_no_cross_package_submodule_imports)
- Extract for-loop/dict-comprehension from get_token_summary into
  _merge_wall_clock_seconds() module-level helper (REQ-CNST-008)
- Replace except Exception: pass with logger.debug(..., exc_info=True) in
  tools_status.py and _state.py (ARCH-003)
- Fix except Exception: continue in _read_quota_events to use specific
  json.JSONDecodeError (ARCH-003)
- Add get_quota_events to expected frozenset in test_ungated_tools_contains_expected_names

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
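The quota-event reader touched by the second commit (skipping undecodable lines via `json.JSONDecodeError` rather than a bare `except Exception`) might look roughly like this. It assumes one JSON object per line in append order (oldest first), and that the tool returns the `n` newest events plus a `total_count` of all decodable events; `read_quota_events` and its exact return shape are illustrative, not the project's `_read_quota_events`.

```python
import json
from pathlib import Path
from typing import Any


def read_quota_events(path: Path, n: int = 50) -> dict[str, Any]:
    """Return the n newest quota events (newest first) and the total count."""
    events: list[dict[str, Any]] = []
    try:
        with path.open(encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                if not line:
                    continue  # tolerate blank lines, e.g. a trailing newline
                try:
                    events.append(json.loads(line))
                except json.JSONDecodeError:
                    # A torn or hand-edited line: skip it, don't fail the tool.
                    continue
    except FileNotFoundError:
        # The hook has not written anything yet.
        return {"events": [], "total_count": 0}
    # File order is oldest -> newest, so take the tail and reverse it.
    # Guard n <= 0 explicitly: events[-0:] would return the whole list.
    newest = list(reversed(events[-n:])) if n > 0 else []
    return {"events": newest, "total_count": len(events)}
```

With ten events written, `n=3` should yield the three newest in reverse order and a `total_count` of 10; asserting both pins the behavior that `test_limits_to_n_events` leaves unchecked.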
Centralizes the repeated resolve_log_dir(_get_ctx().config.linux_tracing.log_dir)
expression into a module-private helper. Replaces the three identical inline
call sites in get_pipeline_report, get_token_summary, and get_timing_summary.
Adds TestGetLogRoot unit tests for the new helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek left a comment


AutoSkillit PR Review — Verdict: changes_requested

```python
        assert result["events"][0]["event"] == "blocked" # most recent first

    @pytest.mark.anyio
    async def test_limits_to_n_events(self, tool_ctx, tmp_path, monkeypatch):
```

[warning] tests: test_limits_to_n_events does not assert that total_count equals 10 (the full log size). It only checks len(result["events"]) == 3, so total_count is never verified against the full dataset.

```python
        ]
        (log_dir / "quota_events.jsonl").write_text("\n".join(lines) + "\n")
        monkeypatch.setattr(tool_ctx.config.linux_tracing, "log_dir", str(log_dir))
        result = json.loads(await get_quota_events(n=3))
```

[warning] tests: test_limits_to_n_events does not verify ordering of returned events (most-recent-first). The 10-event pagination test never checks which 3 events are returned.

```python
        result = json.loads(await get_token_summary())
        step = next(s for s in result["steps"] if s["step_name"] == "step-b")
        # No timing_log entry → falls back to elapsed_seconds
        assert step["wall_clock_seconds"] == pytest.approx(5.0)
```

[warning] tests: test_wall_clock_falls_back_to_elapsed_when_no_timing never verifies timing_log has no step-b entry. Parallel test pollution could silently bypass the fallback path.

```python
        (log_dir / ".telemetry_cleared_at").write_text(three_hours_ago.isoformat())

        monkeypatch.setattr(tool_ctx.config.linux_tracing, "log_dir", str(log_dir))
        monkeypatch.setattr(_state, "_ctx", None)
```

[warning] tests: test_initialize_uses_clear_marker_as_since_bound: monkeypatching _state._ctx=None conflicts with the tool_ctx fixture, which already patches it. The interleaved None-then-reinit sequence may leave _ctx in an unexpected state during xdist teardown.

```python
        )

        monkeypatch.setattr(tool_ctx.config.linux_tracing, "log_dir", str(log_dir))
        monkeypatch.setattr(_state, "_ctx", None)
```

[warning] tests: Same xdist/fixture-teardown concern as L661. Also missing a boundary-condition test: marker timestamp == session timestamp (the boundary between <= and < in the since_dt logic).
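The boundary case this comment raises (a session stamped exactly at the clear instant) can be pinned by placing one session on the fence and one on each side. The sketch below assumes strict comparison, meaning a session at exactly the fence counts as already cleared; `replayable` is a hypothetical filter, and whichever operator the real `since` logic uses, the point is to encode that choice in a test.

```python
from datetime import datetime, timedelta, timezone


def replayable(session_times: list[datetime], since: datetime) -> list[datetime]:
    """Keep sessions strictly after the fence (assumed semantics)."""
    return [ts for ts in session_times if ts > since]


fence = datetime(2026, 3, 11, 9, 0, tzinfo=timezone.utc)
one_second = timedelta(seconds=1)
sessions = [fence - one_second, fence, fence + one_second]
# The session exactly at the fence is dropped; switching > to >= would keep it.
kept = replayable(sessions, fence)
```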

```python
from autoskillit.core import RESERVED_LOG_RECORD_KEYS, TerminationReason, get_logger
from autoskillit.execution import (
    resolve_log_dir, # noqa: F401 — used by tools_integrations.py
    read_telemetry_clear_marker, # noqa: F401 — used by tools_status.py
```

[warning] slop: noqa comment claims read_telemetry_clear_marker is used by tools_status.py but it is not called anywhere in tools_status.py. The re-export and comment are misleading dead weight.


```python
@mcp.tool(tags={"automation"})
@track_response_size("kitchen_status")
async def kitchen_status() -> str:
```

[warning] defense: _merge_wall_clock_seconds parameter timing_log is typed as Any, bypassing static type checking. Should be typed as DefaultTimingLog or its protocol to enable mypy to catch misuse at call sites.
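One way to address this is a structural `typing.Protocol`, sketched below. `TimingReportSource`, `timing_totals`, and `FakeTimingLog` are hypothetical names, and the report row shape (`step_name`, `total_seconds`) is an assumption; if `DefaultTimingLog` already implements an equivalent protocol, annotating with that directly is simpler.

```python
from typing import Any, Protocol, runtime_checkable


# runtime_checkable is only needed for isinstance checks; mypy's static
# checking of call sites works without it.
@runtime_checkable
class TimingReportSource(Protocol):
    """Anything exposing get_report() -> list of per-step rows."""

    def get_report(self) -> list[dict[str, Any]]: ...


def timing_totals(timing_log: TimingReportSource) -> dict[str, float]:
    # mypy now rejects call sites that pass objects without get_report().
    return {r["step_name"]: r["total_seconds"] for r in timing_log.get_report()}


class FakeTimingLog:
    """Test double: satisfies the protocol structurally, no inheritance."""

    def __init__(self, totals: dict[str, float]) -> None:
        self._totals = totals

    def get_report(self) -> list[dict[str, Any]]:
        return [
            {"step_name": name, "total_seconds": secs}
            for name, secs in self._totals.items()
        ]
```

Structural typing lets test doubles like `FakeTimingLog` satisfy the annotation without subclassing the production class.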

@@ -151,9 +190,66 @@ async def get_timing_summary(clear: bool = False) -> str:
```python
    total = _get_ctx().timing_log.compute_total()
    if clear:
```

[warning] fidelity: write_telemetry_clear_marker is called when get_timing_summary(clear=True) fires, advancing the shared fence even when token_log and audit are NOT cleared. On next restart, _state._initialize skips sessions for all three log types — may under-count token and audit data that was never cleared. Issue #302 does not describe this shared-fence side effect.

Trecek left a comment


AutoSkillit Review Findings

Verdict: changes_requested

8 actionable findings (all warning severity). Implementation is correct and addresses all three requirements from issue #302 (quota events tool, wall-clock seconds in token summary, drift prevention fence). Inline comments posted above.

src/autoskillit/server/helpers.py

  • L14 [warning/slop]: noqa comment claims read_telemetry_clear_marker is 'used by tools_status.py' but it is not called anywhere in tools_status.py — unused re-export with misleading comment.

src/autoskillit/server/tools_status.py

  • L37 [warning/defense]: _merge_wall_clock_seconds parameter timing_log typed as Any — bypasses static type checking; use DefaultTimingLog or its protocol.
  • L191 [warning/fidelity]: Shared-fence side effect — get_timing_summary(clear=True) advances the fence for all three log types (token_log, timing_log, audit), not just timing_log. On next restart, _state._initialize may skip token/audit sessions that were never explicitly cleared. Not described in issue #302.

tests/server/test_tools_status.py

  • L627 [warning/tests]: test_limits_to_n_events omits assert result["total_count"] == 10 — total_count goes untested for the n-limiting case.
  • L636 [warning/tests]: test_limits_to_n_events never asserts which 3 events are returned (oldest or newest) — ordering unverified in the paginated case.
  • L695 [warning/tests]: test_wall_clock_falls_back_to_elapsed_when_no_timing never asserts timing_log has no 'step-b' entry — parallel pollution could silently bypass the fallback.

tests/server/test_server_init.py

  • L661 [warning/tests]: test_initialize_uses_clear_marker_as_since_bound uses bare _state._ctx = None assignment conflicting with fixture monkeypatch — xdist teardown ordering may leave _ctx in unexpected state.
  • L705 [warning/tests]: Same xdist concern as L661; missing boundary condition test (marker ts == session ts).

Trecek and others added 6 commits March 11, 2026 10:46
…annotations

Both sides of the conflict were complementary: the PR added _get_log_root()
and _merge_wall_clock_seconds() helpers plus get_quota_events tool, while
integration added readOnlyHint=True annotations to all @mcp.tool decorators.
Resolution keeps all new code and applies readOnlyHint to every decorator
including the new get_quota_events tool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Trecek Trecek merged commit 20f174d into integration Mar 11, 2026
2 checks passed
@Trecek Trecek deleted the combined-pipeline-observability-quota-guard-logging-and-per/302 branch March 11, 2026 20:58
Trecek added a commit that referenced this pull request Mar 12, 2026
…333, #342 into integration (#351)

## Integration Summary

Collapsed 9 PRs into `pr-batch/pr-merge-20260311-133920` targeting
`integration`.

## Merged PRs

| # | Title | Complexity | Additions | Deletions | Overlaps |
|---|-------|------------|-----------|-----------|----------|
| #337 | Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307) | simple | +29 | -2 | — |
| #339 | Implementation Plan: Release CI — Force-Push Integration Back-Sync | simple | +88 | -45 | — |
| #336 | Enhance prepare-issue with Duplicate Detection and Broader Triggers | needs_check | +161 | -8 | — |
| #332 | Rectify: Display Output Bugs #329 — Terminal Targets Consolidation — PART A ONLY | needs_check | +783 | -13 | — |
| #338 | Implementation Plan: Pre-release Readiness — Stability Fixes | needs_check | +238 | -36 | — |
| #343 | Implementation Plan: PR Pipeline Gates — Mergeability Gate and Review Cycle | needs_check | +384 | -5 | #338 |
| #341 | Pipeline observability: quota events, wall-clock timing, drift fix | needs_check | +480 | -5 | #332, #338 |
| #333 | Remove run_recipe — Eliminate Sub-Orchestrator Pattern | needs_check | +538 | -655 | #332, #338, #341 |
| #342 | feat: genericize codebase and bundle external dependencies for public release | needs_check | +5286 | -1062 | #332, #333, #338, #341, #343 |

## Audit

**Verdict:** GO

## Architecture Impact

### Development Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph SourceTree ["PROJECT STRUCTURE (● = modified)"]
        direction TB
        SRC["● src/autoskillit/<br/>━━━━━━━━━━<br/>105 .py source files<br/>cli · config · core<br/>execution · hooks · pipeline<br/>recipe · server · workspace"]
        SKILLS["● + ★ src/autoskillit/skills/<br/>━━━━━━━━━━<br/>52 bundled skills<br/>★ 13 arch-lens-* SKILL.md added<br/>★ 3 audit-* SKILL.md added<br/>● 14 existing skills updated"]
        RECIPES["● src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>8 bundled YAML recipes<br/>All recipes updated"]
        TESTS["● + ★ tests/<br/>━━━━━━━━━━<br/>173 .py test files<br/>★ 6 new test files added"]
    end

    subgraph Build ["BUILD TOOLING"]
        direction TB
        PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>uv package manager<br/>10 runtime deps"]
        TASKFILE["Taskfile.yml<br/>━━━━━━━━━━<br/>test-all · test-check<br/>test-smoke · install-worktree"]
    end

    subgraph Quality ["CODE QUALITY GATES"]
        direction TB
        RFMT["ruff-format<br/>━━━━━━━━━━<br/>Auto-fix formatting"]
        RLINT["ruff<br/>━━━━━━━━━━<br/>Lint + auto-fix"]
        MYPY["mypy src/<br/>━━━━━━━━━━<br/>--ignore-missing-imports"]
        UVLOCK["uv lock --check<br/>━━━━━━━━━━<br/>Lock file integrity"]
        SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning"]
        GUARD["★ headless_orchestration_guard.py<br/>━━━━━━━━━━<br/>★ PreToolUse hook<br/>Blocks run_skill/run_cmd/run_python<br/>from headless sessions"]
    end

    subgraph Testing ["TEST FRAMEWORK"]
        direction TB
        PYTEST["pytest + asyncio_mode=auto<br/>━━━━━━━━━━<br/>xdist -n 4 parallel<br/>timeout=60s signal method"]
        NEWTEST["★ New Test Files<br/>━━━━━━━━━━<br/>★ test_headless_orchestration_guard<br/>★ test_audit_and_fix_degradation<br/>★ test_rules_inputs<br/>★ test_skill_genericization<br/>★ test_pyproject_metadata<br/>★ test_release_sanity"]
    end

    subgraph CI ["CI/CD WORKFLOWS"]
        direction LR
        TESTS_WF["tests.yml<br/>━━━━━━━━━━<br/>PR test gate"]
        RELEASE_WF["release.yml<br/>━━━━━━━━━━<br/>Release automation"]
        BUMP_WF["● version-bump.yml<br/>━━━━━━━━━━<br/>● Force-push back-sync<br/>integration → main"]
    end

    subgraph EntryPoints ["ENTRY POINTS"]
        EP["autoskillit CLI<br/>━━━━━━━━━━<br/>serve · init · skills<br/>recipes · doctor · workspace"]
    end

    SRC --> PYPROJECT
    SKILLS --> PYPROJECT
    TESTS --> PYTEST
    PYPROJECT --> TASKFILE
    PYPROJECT --> RFMT
    RFMT --> RLINT
    RLINT --> MYPY
    MYPY --> UVLOCK
    UVLOCK --> SECRETS
    SECRETS --> GUARD
    GUARD --> PYTEST
    PYTEST --> NEWTEST
    NEWTEST --> BUMP_WF
    TESTS_WF --> PYTEST
    PYPROJECT --> EP

    class SRC,TESTS stateNode;
    class SKILLS,RECIPES newComponent;
    class PYPROJECT,TASKFILE phase;
    class RFMT,RLINT,MYPY,UVLOCK,SECRETS detector;
    class GUARD newComponent;
    class PYTEST handler;
    class NEWTEST newComponent;
    class TESTS_WF,RELEASE_WF phase;
    class BUMP_WF newComponent;
    class EP output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Teal | Structure | Source directories and test suite |
| Green (★) | New/Modified | New files and components added in this PR |
| Purple | Build | Build configuration and task automation |
| Red | Quality Gates | Pre-commit hooks, linters, type checker |
| Orange | Test Runner | pytest execution engine |
| Dark Teal | Entry Points | CLI commands |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction LR
        TYPES["● core/types.py<br/>━━━━━━━━━━<br/>GATED_TOOLS · UNGATED_TOOLS<br/>RecipeSource (★ promoted here)<br/>ClaudeFlags · StrEnums<br/>fan-in: ~75 files"]
        COREIO["core/io.py · logging.py · paths.py<br/>━━━━━━━━━━<br/>Atomic write · Logger · pkg_root()"]
    end

    subgraph L1P ["L1 — PIPELINE (imports L0 only)"]
        direction TB
        GATE["● pipeline/gate.py<br/>━━━━━━━━━━<br/>DefaultGateState<br/>gate_error_result()<br/>★ headless_error_result()<br/>re-exports GATED/UNGATED_TOOLS"]
        PIPEINIT["● pipeline/__init__.py<br/>━━━━━━━━━━<br/>Re-exports public surface<br/>ToolContext · AuditLog<br/>TokenLog · DefaultGateState"]
    end

    subgraph L1E ["L1 — EXECUTION (imports L0 only)"]
        direction TB
        HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>Headless Claude sessions<br/>Imports core types via TYPE_CHECKING<br/>for ToolContext (no runtime cycle)"]
        COMMANDS["● execution/commands.py<br/>━━━━━━━━━━<br/>ClaudeHeadlessCmd builder"]
        SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>Session diagnostics writer"]
    end

    subgraph L2 ["L2 — RECIPE (imports L0+L1)"]
        direction TB
        SCHEMA["● recipe/schema.py<br/>━━━━━━━━━━<br/>Recipe · RecipeStep · DataFlowWarning<br/>RecipeSource (now from L0)"]
        RULES["● recipe/rules_inputs.py<br/>━━━━━━━━━━<br/>★ Ingredient validation rules<br/>reads GATED_TOOLS from L0 via<br/>pipeline re-export"]
        ANALYSIS["● recipe/_analysis.py<br/>━━━━━━━━━━<br/>Step graph builder"]
        VALIDATOR["● recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe()"]
    end

    subgraph L3S ["L3 — SERVER (imports all layers)"]
        direction TB
        HELPERS["● server/helpers.py<br/>━━━━━━━━━━<br/>_require_enabled() — reads gate<br/>★ _require_not_headless()<br/>Shared by all tool handlers"]
        TOOLS_EX["● server/tools_execution.py<br/>━━━━━━━━━━<br/>run_cmd · run_python · run_skill<br/>✗ run_recipe REMOVED<br/>Uses _require_not_headless()"]
        TOOLS_GIT["● server/tools_git.py<br/>━━━━━━━━━━<br/>merge_worktree · classify_fix<br/>● check_pr_mergeable (new gate)"]
        TOOLS_K["● server/tools_kitchen.py<br/>━━━━━━━━━━<br/>open_kitchen · close_kitchen"]
        FACTORY["● server/_factory.py<br/>━━━━━━━━━━<br/>Composition root<br/>Wires ToolContext"]
    end

    subgraph L3H ["L3 — HOOKS (stdlib only for guard)"]
        direction LR
        HOOK_GUARD["★ hooks/headless_orchestration_guard.py<br/>━━━━━━━━━━<br/>★ PreToolUse hook (stdlib only)<br/>Blocks run_skill/run_cmd/run_python<br/>from AUTOSKILLIT_HEADLESS=1 sessions<br/>NO autoskillit imports"]
        PRETTY["● hooks/pretty_output.py<br/>━━━━━━━━━━<br/>PostToolUse response formatter"]
    end

    subgraph L3C ["L3 — CLI (imports all layers)"]
        direction LR
        CLI_APP["● cli/app.py<br/>━━━━━━━━━━<br/>serve · init · skills · recipes<br/>doctor · workspace"]
        CLI_PROMPTS["● cli/_prompts.py<br/>━━━━━━━━━━<br/>Orchestrator prompt builder"]
    end

    TYPES -->|"fan-in ~75"| GATE
    TYPES -->|"fan-in ~75"| HEADLESS
    TYPES -->|"fan-in ~75"| SCHEMA
    COREIO --> PIPEINIT
    GATE --> PIPEINIT
    PIPEINIT -->|"gate_error_result<br/>headless_error_result"| HELPERS
    HEADLESS --> HELPERS
    COMMANDS --> HEADLESS
    SESSION_LOG --> HELPERS
    SCHEMA -->|"RecipeSource from L0"| RULES
    RULES --> VALIDATOR
    ANALYSIS --> VALIDATOR
    HELPERS -->|"_require_not_headless"| TOOLS_EX
    HELPERS --> TOOLS_GIT
    HELPERS --> TOOLS_K
    VALIDATOR --> FACTORY
    PIPEINIT --> FACTORY
    FACTORY --> CLI_APP
    FACTORY --> CLI_PROMPTS
    HOOK_GUARD -.->|"ENV: AUTOSKILLIT_HEADLESS<br/>zero autoskillit imports"| TOOLS_EX

    class TYPES,COREIO stateNode;
    class GATE,PIPEINIT phase;
    class HEADLESS,COMMANDS,SESSION_LOG handler;
    class SCHEMA,RULES,ANALYSIS,VALIDATOR phase;
    class HELPERS,TOOLS_EX,TOOLS_GIT,TOOLS_K handler;
    class FACTORY cli;
    class CLI_APP,CLI_PROMPTS cli;
    class HOOK_GUARD newComponent;
    class PRETTY handler;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Teal | L0 Core | High fan-in foundation types (zero reverse deps) |
| Purple | L1/L2 Control | Pipeline gate, recipe schema and rules |
| Orange | L1/L3 Processors | Execution handlers, server tool handlers |
| Dark Blue | L3 CLI | Composition root and CLI entry points |
| Green (★) | New Components | headless_orchestration_guard (standalone hook) |
| Dashed | ENV Signal | OS-level check; no code import relationship |

Closes #307
Closes #327
Closes #308
Closes #329
Closes #304
Closes #328
Closes #302
Closes #330
Closes #311

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit


---

## Merge Conflict Resolution

The batch branch was rebased onto `integration` to resolve 17 file
conflicts. All conflicts arose because PRs #337–#341 were squash-merged
into both `integration` (directly) and the batch branch (via the
pipeline), while PRs #333 and #342 required conflict resolution work
that only exists on the batch branch.

**Resolution principle:** Batch branch version wins for all files
touched by #333/#342 conflict resolution and remediation, since that
state was fully tested (3752 passed). Integration-only additions (e.g.
`TestGetQuotaEvents`) were preserved where they don't overlap.

### Per-file decisions

| File | Decision | Rationale |
|------|----------|-----------|
| `CLAUDE.md` | **Batch wins** | Batch has corrected tool inventory (run_recipe removed, get_quota_events added, 25 kitchen tools) |
| `core/types.py` | **Batch wins** | Batch splits monolithic UNGATED_TOOLS into WORKER_TOOLS + HEADLESS_BLOCKED_UNGATED_TOOLS; removes run_recipe from GATED_TOOLS |
| `execution/__init__.py` | **Batch wins** | Batch removes dead exports (build_subrecipe_cmd, run_subrecipe_session) |
| `execution/headless.py` | **Batch wins** | Batch deletes run_subrecipe_session function (530+ lines); keeps run_headless_core with token_log error handling |
| `hooks/pretty_output.py` | **Batch wins** | Batch removes run_recipe from _UNFORMATTED_TOOLS, adds get_quota_events |
| `recipes/pr-merge-pipeline.yaml` | **Batch wins** | Batch has base_branch required:true, updated kitchen rules (main instead of integration) |
| `server/_state.py` | **Batch wins** | Batch adds .telemetry_cleared_at marker reading in _initialize |
| `server/helpers.py` | **Batch wins** | Batch removes _run_subrecipe and run_subrecipe_session import; adds _require_not_headless |
| `server/tools_git.py` | **Batch wins** | Batch has updated classify_fix with git fetch and check_pr_mergeable gate |
| `server/tools_kitchen.py` | **Batch wins** | Batch adds headless gates to open_kitchen/close_kitchen; adds TOOL_CATEGORIES listing |
| `server/tools_status.py` | **Merge both** | Batch headless gates + wall_clock_seconds merged with integration's TestGetQuotaEvents (deduplicated) |
| `tests/conftest.py` | **Batch wins** | Batch replaces AUTOSKILLIT_KITCHEN_OPEN with AUTOSKILLIT_HEADLESS in fixture |
| `tests/execution/test_headless.py` | **Batch wins** | Batch removes run_subrecipe_session tests (deleted code); updates docstring |
| `tests/recipe/test_bundled_recipes.py` | **Merge both** | Batch base_branch=main assertions + integration WF7 graph test both kept |
| `tests/server/test_tools_kitchen.py` | **Batch wins** | Batch adds headless gate denial tests for open/close kitchen |
| `tests/server/test_tools_status.py` | **Merge both** | Batch headless gate tests merged with integration quota events tests |

### Post-rebase fixes
- Removed duplicate `TestGetQuotaEvents` class (existed in both batch
commit and auto-merged integration code)
- Fixed stale `_build_tool_listing` → `_build_tool_category_listing`
attribute reference
- Added `if diagram: print(diagram)` to the `cli/app.py` cook function (a test
expected terminal output)

### Verification
- **3752 passed**, 23 skipped, 0 failures
- 7 architecture contracts kept, 0 broken
- Pre-commit hooks all pass

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek added a commit that referenced this pull request Mar 15, 2026
…, Headless Isolation (#404)

## Summary

Integration rollup of **43 PRs** (#293–#406) consolidating **62
commits** across **291 files** (+27,909 / −6,040 lines). This release
advances AutoSkillit from v0.2.0 to v0.3.1 with GitHub merge queue
integration, sub-recipe composition, a PostToolUse output reformatter,
headless session isolation guards, and comprehensive pipeline
observability — plus 24 new bundled skills, 3 new MCP tools, and 47 new
test files.

---

## Major Features

### GitHub Merge Queue Integration (#370, #362, #390)
- New `wait_for_merge_queue` MCP tool — polls a PR through GitHub's
merge queue until merged, ejected, or timed out (default 600s). Uses
REST + GraphQL APIs with stuck-queue detection and auto-merge
re-enrollment
- New `DefaultMergeQueueWatcher` L1 service (`execution/merge_queue.py`)
— never raises; all outcomes are structured results
- `parse_merge_queue_response()` pure function for GraphQL queue entry
parsing
- New `auto_merge` ingredient in `implementation.yaml` and
`remediation.yaml` — enrolls PRs in the merge queue after CI passes
- Full queue-mode path added to `merge-prs.yaml`: detect queue → enqueue
→ wait → handle ejections → re-enter
- `analyze-prs` skill gains Step 0.5 (merge queue detection) and Step
1.5 (CI/review eligibility filtering)
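The "never raises; all outcomes are structured results" contract of the watcher can be sketched as a polling loop that converts every exit path into a value. This is a minimal sketch under stated assumptions: `watch_merge_queue`, `QueueOutcome`, the state strings, and the injected `fetch_state` callable are hypothetical, and the real service's REST + GraphQL calls, stuck-queue detection, and auto-merge re-enrollment are omitted.

```python
from dataclasses import dataclass
from typing import Callable
import time


@dataclass(frozen=True)
class QueueOutcome:
    # Hypothetical result shape: one terminal status per watch call.
    status: str  # "merged" | "ejected" | "timeout" | "error"
    polls: int
    detail: str = ""


def watch_merge_queue(
    fetch_state: Callable[[], str],  # hypothetical: returns e.g. "queued", "merged", "ejected"
    timeout: float = 600.0,          # mirrors the documented 600s default
    interval: float = 15.0,          # hypothetical poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> QueueOutcome:
    """Poll until merged, ejected, or timed out. Never raises."""
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        try:
            state = fetch_state()
        except Exception as exc:  # transport failures become a structured result
            return QueueOutcome("error", polls, str(exc))
        if state in ("merged", "ejected"):
            return QueueOutcome(state, polls)
        if clock() >= deadline:
            return QueueOutcome("timeout", polls)
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop deterministic under test, which suits a watcher whose whole job is a state machine over time.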

### Sub-Recipe Composition (#380)
- Recipe steps can now reference sub-recipes via `sub_recipe` + `gate`
fields — lazy-loaded and merged at validation time
- Composition engine in `recipe/_api.py`: `_merge_sub_recipe()` inlines
sub-recipe steps with safe name-prefixing and route remapping (`done` →
parent's `on_success`, `escalate` → parent's `on_failure`)
- `_build_active_recipe()` evaluates gate ingredients against
overrides/defaults; dual validation runs on both active and combined
recipes
- First sub-recipe: `sprint-prefix.yaml` — triage → plan → confirm →
dispatch workflow, gated by `sprint_mode` ingredient (hidden, default
false)
- Both `implementation.yaml` and `remediation.yaml` gain `sprint_entry`
placeholder step
- New semantic rules: `unknown-sub-recipe` (ERROR),
`circular-sub-recipe` (ERROR) with DFS cycle detection
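The name-prefixing and route remapping can be sketched as a pure function over a simplified step graph. This assumes each step is a plain dict of route name to target step; `merge_sub_recipe`, the `__` prefix separator, and the dict shape are hypothetical and are not the real `_merge_sub_recipe()` signature.

```python
def merge_sub_recipe(
    parent_step: str,
    parent_routes: dict[str, str],
    sub_steps: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Inline sub-recipe steps under a prefix and remap terminal routes."""
    # Terminal sentinels inside the sub-recipe hand control back to the parent.
    terminal = {
        "done": parent_routes["on_success"],
        "escalate": parent_routes["on_failure"],
    }
    prefix = f"{parent_step}__"  # hypothetical separator that avoids name clashes

    def remap(target: str) -> str:
        if target in terminal:
            return terminal[target]
        if target in sub_steps:
            return prefix + target  # internal route: follow the renamed step
        raise ValueError(f"unknown route target: {target!r}")

    return {
        prefix + name: {route: remap(target) for route, target in routes.items()}
        for name, routes in sub_steps.items()
    }
```

Raising on an unknown target keeps the merge honest; a dangling route would otherwise surface only at run time.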

### PostToolUse Output Reformatter (#293, #405)
- `pretty_output.py` — new 671-line PostToolUse hook that rewrites raw
MCP JSON responses to Markdown-KV before Claude consumes them (30–77%
token overhead reduction)
- Dedicated formatters for 11 high-traffic tools (`run_skill`,
`run_cmd`, `test_check`, `merge_worktree`, `get_token_summary`, etc.)
plus a generic KV formatter for remaining tools
- Pipeline vs. interactive mode detection via hook config file
- Unwraps Claude Code's `{"result": "<json-string>"}` envelope before
dispatching
- 1,516-line test file with 40+ behavioral tests
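The envelope unwrapping and generic fallback can be sketched as two small functions. This assumes the envelope is a dict whose only key is `result` holding a JSON string; `unwrap_envelope`, `format_kv`, and the `**key**: value` output style are hypothetical stand-ins for the hook's actual Markdown-KV layout.

```python
import json
from typing import Any


def unwrap_envelope(raw: str) -> Any:
    """Decode a tool response, unwrapping a {"result": "<json-string>"} envelope."""
    payload = json.loads(raw)
    if isinstance(payload, dict) and set(payload) == {"result"} and isinstance(payload["result"], str):
        try:
            return json.loads(payload["result"])
        except json.JSONDecodeError:
            return payload  # the result was plain text, so keep the envelope as-is
    return payload


def format_kv(payload: Any) -> str:
    """Generic fallback: render a dict as one Markdown key-value line per field."""
    if not isinstance(payload, dict):
        return json.dumps(payload)
    lines = []
    for key, value in payload.items():
        if isinstance(value, (dict, list)):
            value = json.dumps(value, separators=(",", ":"))  # compact nested values
        lines.append(f"**{key}**: {value}")
    return "\n".join(lines)
```

Unwrapping first matters because a per-tool formatter keyed on fields like `success` would otherwise see only the opaque `result` string.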

### Headless Session Isolation (#359, #393, #397, #405, #406)
- **Env isolation**: `build_sanitized_env()` strips
`AUTOSKILLIT_PRIVATE_ENV_VARS` from subprocess environments, preventing
`AUTOSKILLIT_HEADLESS=1` from leaking into test runners
- **CWD path contamination defense**: `_inject_cwd_anchor()` anchors all
relative paths to session CWD; `_validate_output_paths()` checks
structured output tokens against CWD prefix; `_scan_jsonl_write_paths()`
post-session scanner catches actual Write/Edit/Bash tool calls outside
CWD
- **Headless orchestration guard**: new PreToolUse hook blocks
`run_skill`/`run_cmd`/`run_python` when `AUTOSKILLIT_HEADLESS=1`,
enforcing Tier 1/Tier 2 nesting invariant
- **`_require_not_headless()` server-side guard**: blocks 10
orchestration-only tools from headless sessions at the handler layer
- **Unified error response contract**: `headless_error_result()`
produces consistent 9-field responses;
`_build_headless_error_response()` canonical builder for all failure
paths in `tools_integrations.py`
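The env isolation step amounts to copying the environment minus a denylist before spawning a subprocess. A minimal sketch: these notes name `AUTOSKILLIT_PRIVATE_ENV_VARS` but not its full membership, so the single entry below is an assumption, and the `build_sanitized_env` signature shown is hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed membership; the real frozenset in core/types.py may list more names.
AUTOSKILLIT_PRIVATE_ENV_VARS = frozenset({"AUTOSKILLIT_HEADLESS"})


def build_sanitized_env(base: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the environment without private AutoSkillit variables."""
    source = os.environ if base is None else base
    return {k: v for k, v in source.items() if k not in AUTOSKILLIT_PRIVATE_ENV_VARS}
```

A caller would pass the result as `env=` to `subprocess.run`, so a test runner started from a headless session no longer sees `AUTOSKILLIT_HEADLESS=1`.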

### Cook UX Overhaul (#375, #363)
- `open_kitchen` now accepts optional `name` + `overrides` — opens
kitchen AND loads recipe in a single call
- Pre-launch terminal preview with ANSI-colored flow diagram and
ingredients table via new `cli/_ansi.py` module
- `--dangerously-skip-permissions` warning banner with interactive
confirmation prompt
- Randomized session greetings from themed pools
- Orchestrator prompt rewritten: recipe YAML no longer injected via
`--append-system-prompt`; session calls `open_kitchen('{recipe_name}')`
as first action
- Conversational ingredient collection replaces mechanical per-field
prompting

---

## New MCP Tools

| Tool | Gate | Description |
|------|------|-------------|
| `wait_for_merge_queue` | Kitchen | Polls PR through GitHub merge queue (REST + GraphQL) |
| `set_commit_status` | Kitchen | Posts GitHub Commit Status to a SHA for review-first gating |
| `get_quota_events` | Ungated | Surfaces quota guard decisions from `quota_events.jsonl` |
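`get_quota_events` surfaces entries from `quota_events.jsonl`. A minimal sketch of the summarizing step, assuming each line is a JSON object whose `event` key holds one of `cache_miss`, `parse_error`, `blocked`, or `approved`; the function name, `limit` parameter, and return shape are hypothetical, not the tool's actual response schema.

```python
import json
from collections import Counter
from collections.abc import Iterable


def summarize_quota_events(lines: Iterable[str], limit: int = 50) -> dict:
    """Parse JSONL quota events; return per-type counts plus the newest entries."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a torn or partial line rather than failing the call
        if isinstance(record, dict):
            events.append(record)
    return {
        "total": len(events),
        # "event" as the discriminator key is an assumption about the log schema.
        "counts": dict(Counter(e.get("event", "unknown") for e in events)),
        "events": events[-limit:] if limit > 0 else [],
    }
```

Taking an iterable of lines rather than a path keeps the parser testable; a caller would pass something like `path.read_text().splitlines()`.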

---

## Pipeline Observability (#318, #341)

- **`TelemetryFormatter`** (`pipeline/telemetry_fmt.py`): single source
of truth for all telemetry rendering, replacing the dual-formatter
anti-pattern. It provides four rendering modes, among them a Markdown
table, a terminal table, and compact KV (for the PostToolUse hook)
- `get_token_summary` and `get_timing_summary` gain `format` parameter
(`"json"` | `"table"`)
- `wall_clock_seconds` merged into token summary output — see duration
alongside token counts in one call
- **Telemetry clear marker**: `write_telemetry_clear_marker()` /
`read_telemetry_clear_marker()` prevent token accounting drift on MCP
server restart after `clear=True`
- **Quota event logging**: `quota_check.py` hook now writes structured
JSONL events (`cache_miss`, `parse_error`, `blocked`, `approved`) to
`quota_events.jsonl`
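The clear marker acts as a replay fence: on startup the effective lower bound for replaying sessions is `max(now - 24h, marker_ts)`, as described in the PR summary. A minimal sketch, assuming the marker is an ISO-8601 timestamp in a `.telemetry_cleared_at` file under the log root; the signatures of the two marker functions and the `replay_lower_bound` helper are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from pathlib import Path

MARKER_NAME = ".telemetry_cleared_at"
REPLAY_WINDOW = timedelta(hours=24)


def write_telemetry_clear_marker(log_root: Path, when: datetime | None = None) -> None:
    """Persist the clear time so a restarted server skips older sessions."""
    when = when or datetime.now(timezone.utc)
    log_root.mkdir(parents=True, exist_ok=True)
    (log_root / MARKER_NAME).write_text(when.isoformat(), encoding="utf-8")


def read_telemetry_clear_marker(log_root: Path) -> datetime | None:
    """Return the stored clear time, or None if the marker is absent or unreadable."""
    try:
        text = (log_root / MARKER_NAME).read_text(encoding="utf-8").strip()
        return datetime.fromisoformat(text)
    except (OSError, ValueError):
        return None


def replay_lower_bound(log_root: Path, now: datetime) -> datetime:
    """Sessions older than this bound are not replayed into token totals."""
    window_start = now - REPLAY_WINDOW
    marker = read_telemetry_clear_marker(log_root)
    return window_start if marker is None else max(window_start, marker)
```

Falling back to the 24h window on a missing or corrupt marker means a bad file degrades to the old behavior instead of blocking startup.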

---

## CI Watcher & Remote Resolution Fixes (#395, #406)

- **`CIRunScope` value object** — carries `workflow` + `head_sha` scope;
replaces bare `head_sha` parameter across all CI watcher signatures
- **Workflow filter**: `wait_for_ci` and `get_ci_status` accept
`workflow` parameter (falls back to project-level `config.ci.workflow`),
preventing unrelated workflows (version bumps, labelers) from satisfying
CI checks
- **`FAILED_CONCLUSIONS` expanded**: `failure` → `{failure, timed_out,
startup_failure, cancelled}`
- **Canonical remote resolver** (`execution/remote_resolver.py`):
`resolve_remote_repo()` with `REMOTE_PRECEDENCE = (upstream, origin)` —
correctly resolves `owner/repo` after `clone_repo` sets `origin` to
`file://` isolation URL
- **Clone isolation fix**: `clone_repo` now always clones from remote
URL (never local path); sets `origin=file:///<clone>` for isolation and
`upstream=<real_url>` for push/CI operations
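The precedence rule can be sketched as a pure function over a mapping of remote names to URLs. This assumes the caller has already read the remotes (for example from `git remote -v`); the dict input, the regex, and this `resolve_remote_repo` signature are hypothetical simplifications of the real resolver.

```python
from __future__ import annotations

import re

REMOTE_PRECEDENCE = ("upstream", "origin")
# Matches both https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git).
_GITHUB_RE = re.compile(r"github\.com[:/]([^/]+)/([^/]+?)(?:\.git)?/?$")


def resolve_remote_repo(remotes: dict[str, str]) -> str | None:
    """Return 'owner/repo' from the first remote, by precedence, that points at GitHub."""
    for name in REMOTE_PRECEDENCE:
        url = remotes.get(name)
        if not url or url.startswith("file://"):
            continue  # isolation clones point origin at a local file:// URL
        match = _GITHUB_RE.search(url)
        if match:
            return f"{match.group(1)}/{match.group(2)}"
    return None
```

Checking `upstream` first is what lets the isolated clone still resolve to the real repository for CI and push operations.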

---

## PR Pipeline Gates (#317, #343)

- **`pipeline/pr_gates.py`**: `is_ci_passing()`, `is_review_passing()`,
`partition_prs()` — partitions PRs into
eligible/CI-blocked/review-blocked with human-readable reasons
- **`pipeline/fidelity.py`**: `extract_linked_issues()`
(Closes/Fixes/Resolves patterns), `is_valid_fidelity_finding()` schema
validation
- **`check_pr_mergeable`** now returns `mergeable_status` field
alongside boolean
- **`release_issue`** gains `target_branch` + `staged_label` parameters
for staged issue lifecycle on non-default branches (#392)
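Linked-issue extraction reduces to a keyword regex plus order-preserving deduplication. A minimal sketch, assuming the accepted keywords are the close/fix/resolve family (including `closed`, `fixed`, `resolved`) followed by `#N`; the exact pattern and this function body are hypothetical, not the code in `pipeline/fidelity.py`.

```python
import re

# Hypothetical pattern: closing keyword, optional colon, whitespace, then #N.
_LINK_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\b:?\s+#(\d+)",
    re.IGNORECASE,
)


def extract_linked_issues(body: str) -> list[int]:
    """Return issue numbers referenced by closing keywords, in first-seen order."""
    seen: dict[int, None] = {}  # dict keys preserve insertion order and deduplicate
    for match in _LINK_RE.finditer(body):
        seen.setdefault(int(match.group(1)), None)
    return list(seen)
```

Plain mentions such as "Related to #5" are ignored, so only issues the PR is meant to close are carried forward.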

---

## Recipe System Changes

### Structural
- `RecipeIngredient.hidden` field — excluded from ingredients table
(used for internal flags like `sprint_mode`)
- `Recipe.experimental` flag parsed from YAML
- `_TERMINAL_TARGETS` moved to `schema.py` as single source of truth
- `format_ingredients_table()` with sorted display order (required →
auto-detect → flags → optional → constants)
- Diagram rendering engine (~670 lines) removed from `diagrams.py` —
rendering now handled by `/render-recipe` skill; format version bumped
to v7
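The display order for ingredients can be expressed as a rank lookup used as a sort key, with hidden ingredients filtered out first. A minimal sketch: the order comes from the notes above, but the `Ingredient` dataclass, its `kind` field, and `ordered_for_display` are hypothetical, since the notes do not say how the real code classifies each ingredient.

```python
from dataclasses import dataclass

# Display order from the release notes; the kind labels themselves are hypothetical.
DISPLAY_ORDER = ("required", "auto-detect", "flag", "optional", "constant")


@dataclass(frozen=True)
class Ingredient:
    name: str
    kind: str
    hidden: bool = False  # mirrors RecipeIngredient.hidden


def ordered_for_display(ingredients: list[Ingredient]) -> list[str]:
    """Drop hidden ingredients and sort the rest by display category."""
    rank = {kind: i for i, kind in enumerate(DISPLAY_ORDER)}
    visible = [ing for ing in ingredients if not ing.hidden]
    # sorted() is stable, so ingredients of the same kind keep their YAML order.
    return [ing.name for ing in sorted(visible, key=lambda ing: rank[ing.kind])]
```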

### Recipe YAML Changes
- **Deleted**: `audit-and-fix.yaml`, `batch-implementation.yaml`,
`bugfix-loop.yaml`
- **Renamed**: `pr-merge-pipeline.yaml` → `merge-prs.yaml`
- **`implementation.yaml`**: merge queue steps,
`auto_merge`/`sprint_mode` ingredients, `base_branch` default → `""`
(auto-detect), CI workflow filter, `extract_pr_number` step
- **`remediation.yaml`**: `topic` → `task` rename, merge queue steps,
`dry_walkthrough` retries:3 with forward-only routing, `verify` → `test`
rename
- **`merge-prs.yaml`**: full queue-mode path, `open-integration-pr` step
(replaces `create-review-pr`), post-PR mergeability polling, review
cycle with `resolve-review` retries

### New Semantic Rules
- `missing-output-patterns` (WARNING) — flags `run_skill` steps without
`expected_output_patterns`
- `unknown-sub-recipe` (ERROR) — validates sub-recipe references exist
- `circular-sub-recipe` (ERROR) — DFS cycle detection
- `unknown-skill-command` (ERROR) — validates skill names against
bundled set
- `telemetry-before-open-pr` (WARNING) — ensures telemetry step precedes
`open-pr`
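`circular-sub-recipe` is described as DFS cycle detection. A standard three-color DFS fits that description: a back edge to a node still on the current path closes a cycle. This sketch assumes the references form a mapping of recipe name to referenced sub-recipe names; `find_sub_recipe_cycle` and that graph shape are hypothetical.

```python
from __future__ import annotations


def find_sub_recipe_cycle(graph: dict[str, list[str]]) -> list[str] | None:
    """Return one cycle as a path (first node repeated at the end), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    stack: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GRAY
        stack.append(node)
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the stack suffix starting at child.
                return stack[stack.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

Returning the path rather than a boolean gives the validation error something concrete to report.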

---

## New Skills (24)

### Architecture Lens Family (13)
`arch-lens-c4-container`, `arch-lens-concurrency`,
`arch-lens-data-lineage`, `arch-lens-deployment`,
`arch-lens-development`, `arch-lens-error-resilience`,
`arch-lens-module-dependency`, `arch-lens-operational`,
`arch-lens-process-flow`, `arch-lens-repository-access`,
`arch-lens-scenarios`, `arch-lens-security`, `arch-lens-state-lifecycle`

### Audit Family (5)
`audit-arch`, `audit-bugs`, `audit-cohesion`, `audit-defense-standards`,
`audit-tests`

### Planning & Diagramming (3)
`elaborate-phase`, `make-arch-diag`, `make-req`

### Bug/Guard Lifecycle (2)
`design-guards`, `verify-diag`

### Pipeline (1)
`open-integration-pr` — creates integration PRs with per-PR details,
arch-lens diagrams, carried-forward `Closes #N` references, and
auto-closes collapsed PRs

### Sprint Planning (1 — gated by sub-recipe)
`sprint-planner` — selects a focused, conflict-free sprint from a triage
manifest

---

## Skill Modifications (Highlights)

- **`analyze-prs`**: merge queue detection, CI/review eligibility
filtering, queue-mode ordering
- **`dry-walkthrough`**: Step 4.5 Historical Regression Check (git
history mining + GitHub issue cross-reference)
- **`review-pr`**: deterministic diff annotation via
`diff_annotator.py`, echo-primary-obligation step, post-completion
confirmation, degraded-mode narration
- **`collapse-issues`**: content fidelity enforcement — per-issue
`fetch_github_issue` calls, copy-mode body assembly (#388)
- **`prepare-issue`**: multi-keyword dedup search, numbered candidate
selection, extend-existing-issue flow
- **`resolve-review`**: GraphQL thread auto-resolution after addressing
findings (#379)
- **`resolve-merge-conflicts`**: conflict resolution decision report
with per-file log (#389)
- **Cross-skill**: output tokens migrated to `key = value` format;
code-index paths made generic with fallback notes; arch-lens references
fully qualified; anti-prose guards at loop boundaries

---

## CLI & Hooks

### New CLI Commands
- `autoskillit install` — plugin installation + cache refresh
- `autoskillit upgrade` — `.autoskillit/scripts/` →
`.autoskillit/recipes/` migration

### CLI Changes
- `doctor`: plugin-aware MCP check, PostToolUse hook scanning, `--fix`
flag removed
- `init`: GitHub repo prompt, `.secrets.yaml` template, plugin-aware
registration
- `chefs-hat`: pre-launch banner, `--dangerously-skip-permissions`
confirmation
- `recipes render`: repurposed from generator to viewer (delegates to
`/render-recipe`)
- `serve`: server import deferred to after `configure_logging()` to
prevent stdout corruption

### New Hooks
- `branch_protection_guard.py` (PreToolUse) — denies
`merge_worktree`/`push_to_remote` targeting protected branches
- `headless_orchestration_guard.py` (PreToolUse) — blocks orchestration
tools in headless sessions
- `pretty_output.py` (PostToolUse) — MCP JSON → Markdown-KV reformatter

### Hook Infrastructure
- `HookDef.event_type` field — registry now handles both PreToolUse and
PostToolUse
- `generate_hooks_json()` groups entries by event type
- `_evict_stale_autoskillit_hooks` and `sync_hooks_to_settings` made
event-type-agnostic
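Grouping by event type is the core of `generate_hooks_json()`. A minimal sketch, assuming the output follows the Claude Code settings layout of event name mapped to a list of `{matcher, hooks}` entries; the `matcher` and `command` fields on `HookDef` and this function body are hypothetical, and the layout should be checked against the current hooks documentation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HookDef:
    event_type: str  # "PreToolUse" or "PostToolUse"
    matcher: str     # hypothetical: tool-name pattern the hook applies to
    command: str     # hypothetical: shell command that runs the hook script


def generate_hooks_json(defs: list[HookDef]) -> dict:
    """Group hook definitions by event type, preserving first-seen event order."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for d in defs:
        grouped[d.event_type].append(
            {"matcher": d.matcher, "hooks": [{"type": "command", "command": d.command}]}
        )
    return {"hooks": dict(grouped)}
```

Because nothing branches on a specific event name, adding a new event type needs no change to the generator, which is the event-type-agnostic property described above.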

---

## Core & Config

### New Core Modules
- `core/branch_guard.py` — `is_protected_branch()` pure function
- `core/github_url.py` — `parse_github_repo()` +
`normalize_owner_repo()` canonical parsers

### Core Type Expansions
- `AUTOSKILLIT_PRIVATE_ENV_VARS` frozenset
- `WORKER_TOOLS` / `HEADLESS_BLOCKED_UNGATED_TOOLS` split from
`UNGATED_TOOLS`
- `TOOL_CATEGORIES` — categorized listing for `open_kitchen` response
- `CIRunScope` — immutable scope for CI watcher calls
- `MergeQueueWatcher` protocol
- `SkillResult.cli_subtype` + `write_path_warnings` fields
- `SubprocessRunner.env` parameter

### Config
- `safety.protected_branches`: `[main, integration, stable]`
- `github.staged_label`: `"staged"`
- `ci.workflow`: workflow filename filter (e.g., `"tests.yml"`)
- `branching.default_base_branch`: `"integration"` → `"main"`
- `ModelConfig.default`: `str | None` → `str = "sonnet"`

---

## Infrastructure & Release

### Version
- `0.2.0` → `0.3.1` across `pyproject.toml`, `plugin.json`, `uv.lock`
- FastMCP dependency: `>=3.0.2` → `>=3.1.1,<4.0` (#399)

### CI/CD Workflows
- **`version-bump.yml`** (new) — auto patch-bumps `main` on integration
PR merge, force-syncs integration branch one patch ahead
- **`release.yml`** (new) — minor version bump + GitHub Release on merge
to `stable`
- **`codeql.yml`** (new) — CodeQL analysis for `stable` PRs (Python +
Actions)
- **`tests.yml`** — `merge_group:` trigger added; multi-OS now only for
`stable`

### PyPI Readiness
- `pyproject.toml`: `readme`, `license`, `authors`, `keywords`,
`classifiers`, `project.urls`, `hatch.build.targets.sdist` inclusion
list

### readOnlyHint Parallel Execution Fix
- All MCP tools except one are annotated `readOnlyHint=True`, which
enables Claude Code parallel tool execution (~7x speedup). The deliberate
exception is `wait_for_merge_queue`, which uses `readOnlyHint=False`
because it actually mutates queue state

### Tool Response Exception Boundary
- `track_response_size` decorator catches unhandled exceptions and
serializes them as `{"success": false, "subtype": "tool_exception"}` —
prevents FastMCP opaque error wrapping
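The exception boundary can be sketched as a decorator around an async tool handler. This covers only the exception-catching half of `track_response_size`; the name `exception_boundary` and the extra `error` field are hypothetical, and async handlers returning JSON strings are an assumption.

```python
import functools
import json
from typing import Any, Awaitable, Callable


def exception_boundary(func: Callable[..., Awaitable[str]]) -> Callable[..., Awaitable[str]]:
    """Serialize any unhandled exception from a tool into a structured failure."""
    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return await func(*args, **kwargs)
        except Exception as exc:  # CancelledError is a BaseException and still propagates
            return json.dumps({
                "success": False,
                "subtype": "tool_exception",
                "error": f"{type(exc).__name__}: {exc}",  # hypothetical detail field
            })
    return wrapper
```

Catching `Exception` rather than `BaseException` keeps task cancellation and interpreter shutdown working while still turning ordinary bugs into a response the orchestrator can route on.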

### SkillResult Subtype Normalization (#358)
- `_normalize_subtype()` gate eliminates dual-source contradiction
between CLI subtype and session outcome
- Class 2 upward: `SUCCEEDED + error_subtype → "success"` (drain-race
artifact)
- Class 1 downward: `non-SUCCEEDED + "success" → "empty_result"` /
`"missing_completion_marker"` / `"adjudicated_failure"`

---

## Test Coverage

**47 new test files** (+12,703 lines) covering:

| Area | Key Tests |
|------|-----------|
| Merge queue watcher state machine | `test_merge_queue.py` (226 lines) |
| Clone isolation × CI resolution | `test_clone_ci_contract.py`, `test_remote_resolver.py` |
| PostToolUse hook | `test_pretty_output.py` (1,516 lines, 40+ cases) |
| Branch protection + headless guards | `test_branch_protection_guard.py`, `test_headless_orchestration_guard.py` |
| Sub-recipe composition | 5 test files (schema, loading, validation, sprint mode × 2) |
| Telemetry formatter | `test_telemetry_formatter.py` (281 lines) |
| PR pipeline gates | `test_analyze_prs_gates.py`, `test_review_pr_fidelity.py` |
| Diff annotator | `test_diff_annotator.py` (242 lines) |
| Skill compliance | Output token format, genericization, loop-boundary guards |
| Release workflows | Structural contracts for `version-bump.yml`, `release.yml` |
| Issue content fidelity | Body-assembling skills must call `fetch_github_issue` per-issue |
| CI watcher scope | `test_ci_params.py`: workflow_id query param composition |

---

## Consolidated PRs

#293, #295, #314, #315, #316, #317, #318, #319, #323, #332, #336, #337,
#338, #339, #341, #343, #351, #358, #359, #360, #361, #362, #363, #366,
#368, #370, #375, #377, #378, #379, #380, #388, #389, #390, #391, #392,
#393, #395, #396, #397, #399, #405, #406

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>