Pipeline observability: quota events, wall-clock timing, drift fix by Trecek · Pull Request #341 · TalonT-Org/AutoSkillit

Trecek · 2026-03-11T17:27:58Z

Summary

Adds three pipeline observability capabilities: a new get_quota_events MCP tool surfacing quota guard decisions from quota_events.jsonl, wall_clock_seconds merged into get_token_summary output for per-step wall-clock visibility, and a .telemetry_cleared_at replay fence preventing token accounting drift when the MCP server restarts after a clear=True call. Includes a follow-up refactor extracting _get_log_root() in tools_status.py to eliminate three identical inline log-root expressions.

Individual Plan Details

Group 1: Pipeline Observability — Quota Guard Logging and Per-Step Elapsed Time

Three related pipeline observability improvements, tracked as GitHub issue #302 (collapsing #218, #65, and the #304/#148 token accounting item):

Quota guard MCP tool (#218, "feat: Add quota guard observability to diagnostic logging system"): The quota_check.py hook already writes quota_events.jsonl with approved/blocked/cache-miss events. Add a new ungated get_quota_events tool to tools_status.py to surface those decisions through the MCP API.
Wall-clock time in token summary (#65, "feat: report per-step elapsed time in token summary"): Merge total_seconds from the timing log into each step's get_token_summary output as wall_clock_seconds, so operators see wall-clock duration alongside token counts in one call. Updates _format_token_summary and write_telemetry_files.
Token accounting drift fix (#304, "Combined: Pre-release readiness — stability fixes"; #148, "Stability and correctness fixes for public release"): Persist a .telemetry_cleared_at timestamp whenever a log is cleared. On startup, _state._initialize reads this marker and uses max(now - 24h, marker_ts) as the effective replay lower bound, so sessions that were already cleared are not replayed.
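A minimal sketch of the replay fence, assuming the marker file holds a single ISO-8601 UTC timestamp (as the diagrams below describe) and that replay uses a fixed 24-hour window. The names write_clear_marker, read_clear_marker, and effective_since are hypothetical stand-ins for the PR's write_telemetry_clear_marker / read_telemetry_clear_marker and the logic inside _state._initialize; the temp-file-then-rename write is an illustrative choice, not the PR's code.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical constants; the PR only names the marker file and the 24h window.
CLEAR_MARKER_FILENAME = ".telemetry_cleared_at"
REPLAY_WINDOW = timedelta(hours=24)


def write_clear_marker(log_root: Path, when: datetime | None = None) -> None:
    """Persist the clear time as a UTC ISO-8601 string."""
    ts = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    log_root.mkdir(parents=True, exist_ok=True)
    # Write a temp file, then rename it over the marker so readers never see a partial write.
    tmp = log_root / (CLEAR_MARKER_FILENAME + ".tmp")
    tmp.write_text(ts.isoformat(), encoding="utf-8")
    tmp.replace(log_root / CLEAR_MARKER_FILENAME)


def read_clear_marker(log_root: Path) -> datetime | None:
    """Return the marker time, or None if the file is missing or unparseable."""
    try:
        raw = (log_root / CLEAR_MARKER_FILENAME).read_text(encoding="utf-8").strip()
        ts = datetime.fromisoformat(raw)
    except (OSError, ValueError):
        return None
    # Treat a naive timestamp as UTC so it compares cleanly with aware datetimes.
    return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)


def effective_since(log_root: Path, now: datetime | None = None) -> datetime:
    """Replay lower bound: the later of (now - 24h) and the clear marker."""
    if now is None:
        now = datetime.now(timezone.utc)
    window_start = now - REPLAY_WINDOW
    marker = read_clear_marker(log_root)
    return window_start if marker is None else max(window_start, marker)
```

A marker older than 24 hours has no effect, because the rolling window already excludes those sessions; a recent marker moves the lower bound forward past everything that was cleared.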

Group 2: Remediation — Extract `_get_log_root()` helper in `tools_status.py`

The audit identified that tools_status.py had three identical inline expressions — resolve_log_dir(_get_ctx().config.linux_tracing.log_dir) — repeated in get_pipeline_report, get_token_summary, and get_timing_summary. This remediation adds _get_log_root() to centralize that computation and replaces all three inline call sites. No behavioral change.

Architecture Impact

Operational Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph UngatedTools ["UNGATED MCP TOOLS (tools_status.py)"]
        GTS["● get_token_summary<br/>━━━━━━━━━━<br/>clear=False<br/>+ ● wall_clock_seconds<br/>from timing_log"]
        GTIM["● get_timing_summary<br/>━━━━━━━━━━<br/>clear=False<br/>total_seconds per step"]
        GPR["● get_pipeline_report<br/>━━━━━━━━━━<br/>clear=False<br/>audit failures"]
        GQE["★ get_quota_events<br/>━━━━━━━━━━<br/>n=50<br/>quota guard decisions"]
    end

    subgraph LogRoot ["★ _get_log_root() helper (tools_status.py)"]
        LR["★ _get_log_root()<br/>━━━━━━━━━━<br/>resolve_log_dir(ctx.config<br/>.linux_tracing.log_dir)"]
    end

    subgraph InMemory ["IN-MEMORY PIPELINE LOGS"]
        TK["token_log<br/>━━━━━━━━━━<br/>step_name → tokens<br/>elapsed_seconds"]
        TI["timing_log<br/>━━━━━━━━━━<br/>step_name → total_seconds<br/>(monotonic clock)"]
        AU["audit_log<br/>━━━━━━━━━━<br/>list FailureRecord"]
    end

    subgraph DiskLogs ["PERSISTENT LOG FILES (~/.local/share/autoskillit/logs/)"]
        QE["quota_events.jsonl<br/>━━━━━━━━━━<br/>approved / blocked<br/>cache_miss / parse_error"]
        CM["★ .telemetry_cleared_at<br/>━━━━━━━━━━<br/>UTC ISO timestamp fence<br/>written on clear=True"]
    end

    subgraph Startup ["SERVER STARTUP (_state._initialize)"]
        INIT["● _state._initialize<br/>━━━━━━━━━━<br/>since = max(now−24h, marker)<br/>load_from_log_dir × 3"]
    end

    subgraph Hook ["HOOK (quota_check.py)"]
        QH["quota_check.py<br/>━━━━━━━━━━<br/>PreToolUse: approve/block<br/>_write_quota_log_event"]
    end

    GTS -->|"clear=True"| LR
    GTIM -->|"clear=True"| LR
    GPR -->|"clear=True"| LR
    LR -->|"write_telemetry_clear_marker"| CM
    GQE -->|"_read_quota_events(n)"| QE
    QH -->|"append event"| QE
    GTS -->|"get_report()"| TK
    GTS -->|"● merge wall_clock_seconds"| TI
    GTIM -->|"get_report()"| TI
    GPR -->|"get_report()"| AU
    CM -->|"read marker → since bound"| INIT
    INIT -->|"load_from_log_dir since=effective"| TK
    INIT -->|"load_from_log_dir since=effective"| TI
    INIT -->|"load_from_log_dir since=effective"| AU

    class GTS,GTIM,GPR cli;
    class GQE newComponent;
    class LR newComponent;
    class TK,TI,AU stateNode;
    class QE stateNode;
    class CM newComponent;
    class INIT phase;
    class QH detector;

Color Legend: Dark Blue = MCP query tools | Green = New components (get_quota_events, _get_log_root, .telemetry_cleared_at) | Teal = State/logs | Purple = Server startup | Dark Red = PreToolUse hook

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph InMem ["IN-MEMORY (clearable — MUTABLE)"]
        TK["● token_log<br/>━━━━━━━━━━<br/>MUTABLE<br/>step_name → tokens + elapsed_seconds<br/>cleared on clear=True"]
        TI["timing_log<br/>━━━━━━━━━━<br/>MUTABLE<br/>step_name → total_seconds<br/>cleared on clear=True"]
        AU["audit_log<br/>━━━━━━━━━━<br/>MUTABLE<br/>list FailureRecord<br/>cleared on clear=True"]
    end

    subgraph Derived ["DERIVED (computed per query)"]
        WC["★ wall_clock_seconds<br/>━━━━━━━━━━<br/>DERIVED<br/>timing_log.get_report()<br/>merged into token summary response<br/>never persisted"]
    end

    subgraph ClearFence ["★ CLEAR FENCE (write-then-read across restarts)"]
        CM["★ .telemetry_cleared_at<br/>━━━━━━━━━━<br/>WRITE-FENCE<br/>UTC ISO timestamp<br/>written atomically by write_telemetry_clear_marker<br/>read exactly once by _initialize"]
    end

    subgraph DiskReplay ["DISK REPLAY (bounded by clear fence)"]
        SJ["sessions.jsonl + session/<br/>━━━━━━━━━━<br/>REPLAY-SOURCE<br/>historical token + timing + audit data<br/>replayed with since= lower bound"]
        QE["quota_events.jsonl<br/>━━━━━━━━━━<br/>APPEND-ONLY<br/>quota hook writes, never rewrites<br/>read by ★ get_quota_events"]
    end

    subgraph ClearGate ["CLEAR GATE (state mutation trigger)"]
        ClearTrue["clear=True in<br/>● get_token_summary /<br/>● get_timing_summary /<br/>● get_pipeline_report<br/>━━━━━━━━━━<br/>1. Clear in-memory log<br/>2. ★ Write .telemetry_cleared_at"]
    end

    subgraph StartupGate ["★ STARTUP REPLAY GATE (_state._initialize)"]
        INIT["● _state._initialize<br/>━━━━━━━━━━<br/>1. Read .telemetry_cleared_at<br/>2. since = max(now−24h, marker)<br/>3. load_from_log_dir × 3<br/>Guards: no double-counting"]
    end

    ClearTrue -->|"1. in_memory.clear()"| TK
    ClearTrue -->|"1. in_memory.clear()"| TI
    ClearTrue -->|"1. in_memory.clear()"| AU
    ClearTrue -->|"2. write_telemetry_clear_marker()"| CM
    TI -->|"get_report() per query"| WC
    WC -->|"★ merged into response"| TK
    CM -->|"read → since bound"| INIT
    SJ -->|"load_from_log_dir since=effective"| INIT
    INIT -->|"populate (bounded)"| TK
    INIT -->|"populate (bounded)"| TI
    INIT -->|"populate (bounded)"| AU

    class TK,TI,AU handler;
    class WC newComponent;
    class CM newComponent;
    class ClearTrue detector;
    class INIT phase;
    class SJ,QE stateNode;

Color Legend: Orange = MUTABLE in-memory logs | Green = New (wall_clock_seconds, .telemetry_cleared_at fence) | Dark Red = clear=True trigger | Purple = startup replay gate | Teal = persistent disk state

Module Dependency Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — SERVER (tools_status.py, _state.py, helpers.py)"]
        direction LR
        TS["● tools_status.py<br/>━━━━━━━━━━<br/>★ _get_log_root()<br/>★ get_quota_events<br/>● 3 clear=True paths"]
        ST["● _state.py<br/>━━━━━━━━━━<br/>● _initialize<br/>reads clear marker"]
        HLP["● helpers.py<br/>━━━━━━━━━━<br/>re-exports<br/>write/read_telemetry_clear_marker<br/>resolve_log_dir"]
    end

    subgraph L1 ["L1 — EXECUTION (execution/__init__.py, session_log.py)"]
        direction LR
        EINIT["● execution/__init__.py<br/>━━━━━━━━━━<br/>★ exports write/read_telemetry_clear_marker<br/>public API surface"]
        SL["● session_log.py<br/>━━━━━━━━━━<br/>★ write_telemetry_clear_marker()<br/>★ read_telemetry_clear_marker()<br/>_CLEAR_MARKER_FILENAME"]
    end

    subgraph L0 ["L0 — CORE (core/types.py)"]
        TY["● core/types.py<br/>━━━━━━━━━━<br/>● UNGATED_TOOLS frozenset<br/>+ get_quota_events"]
    end

    TS -->|"import resolve_log_dir<br/>write/read_telemetry_clear_marker<br/>(via helpers shim)"| HLP
    ST -->|"★ import read_telemetry_clear_marker<br/>(direct from execution)"| EINIT
    HLP -->|"re-export from execution"| EINIT
    EINIT -->|"defined in"| SL
    TS -.->|"UNGATED_TOOLS check<br/>(via pipeline.gate)"| TY

    class TS,ST,HLP cli;
    class EINIT,SL handler;
    class TY stateNode;

Color Legend: Dark Blue = L3 server layer | Orange = L1 execution layer | Teal = L0 core types | Dashed = indirect (via pipeline.gate) | All imports flow downward (no violations)

Closes #302

Implementation Plans

Plan files:

temp/make-plan/302_pipeline_observability_plan_2026-03-10_204500.md
temp/make-plan/302_remediation_get_log_root_plan_2026-03-10_210500.md

🤖 Generated with Claude Code via AutoSkillit

…onds, and telemetry drift fix

- Add `get_quota_events` ungated MCP tool to surface quota_check.py hook decisions (approved/blocked/cache_miss) from quota_events.jsonl
- Merge `timing_log.total_seconds` into `get_token_summary` response as `wall_clock_seconds` per step; falls back to `elapsed_seconds` when no timing entry exists; also added to `_format_token_summary` markdown output
- Write `.telemetry_cleared_at` marker on `clear=True` in all three status tools (get_token_summary, get_timing_summary, get_pipeline_report)
- `_state._initialize` reads the marker on startup and uses `max(now-24h, marker)` as the effective `since` lower bound, preventing double-counting of cleared sessions on server restart
- Add `write_telemetry_clear_marker` / `read_telemetry_clear_marker` to `execution/session_log.py` and re-export from `execution/__init__.py`
- Update CLAUDE.md tool count 38 → 39, add get_quota_events to tool list
- Add `get_quota_events` to UNGATED_TOOLS frozenset in core/types.py
- Tests: clear marker roundtrip, _initialize drift prevention, get_quota_events, wall_clock_seconds fallback, clear=True marker writes for all three tools

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
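The fallback rule in this commit (prefer the timing log's total_seconds, else the step's own elapsed_seconds) can be sketched as a pure function. The name merge_wall_clock_seconds and the plain step_name → total_seconds mapping are assumptions; the PR's _merge_wall_clock_seconds() takes the timing log object itself, whose report format is not shown here.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def merge_wall_clock_seconds(
    steps: list[dict[str, Any]],
    timing_totals: Mapping[str, float],
) -> list[dict[str, Any]]:
    """Return copies of token-summary rows with wall_clock_seconds added.

    Prefers the timing log's total_seconds for the step; if the timing log has
    no entry for that step, falls back to the row's own elapsed_seconds.
    """
    merged: list[dict[str, Any]] = []
    for step in steps:
        name = step["step_name"]
        if name in timing_totals:
            wall = timing_totals[name]
        else:
            wall = step.get("elapsed_seconds")
        # Copy rather than mutate, so the caller's report is left untouched.
        merged.append({**step, "wall_clock_seconds": wall})
    return merged
```

Because the function takes the timing data as an argument, a test can assert up front that the fallback step is absent from it, which sidesteps the shared-state pollution concern raised in the review below.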

…lementation

- Route write_telemetry_clear_marker/resolve_log_dir through server/helpers.py re-exports so tools_status.py does not import from autoskillit.execution (REQ-IMP-003, test_server_tools_import_only_allowed_packages, test_no_cross_package_submodule_imports)
- Extract for-loop/dict-comprehension from get_token_summary into _merge_wall_clock_seconds() module-level helper (REQ-CNST-008)
- Replace except Exception: pass with logger.debug(..., exc_info=True) in tools_status.py and _state.py (ARCH-003)
- Fix except Exception: continue in _read_quota_events to use specific json.JSONDecodeError (ARCH-003)
- Add get_quota_events to expected frozenset in test_ungated_tools_contains_expected_names

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
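A minimal sketch of the reading side of get_quota_events, assuming one JSON object per line in quota_events.jsonl, newest-first output, and a total_count covering every parsed event (the shape the review comments below expect). read_quota_events is a hypothetical stand-in for _read_quota_events, and the real tool serializes its result to a JSON string. Only json.JSONDecodeError is swallowed, in line with the ARCH-003 fix above.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def read_quota_events(path: Path, n: int = 50) -> dict[str, Any]:
    """Return up to n quota events, newest first, plus the count of parsed events."""
    if not path.exists():
        return {"events": [], "total_count": 0}
    events: list[dict[str, Any]] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                # Skip a corrupt line; any other error still propagates.
                continue
    newest_first = events[::-1]  # the hook appends, so the last line is the newest
    return {"events": newest_first[: max(n, 0)], "total_count": len(events)}
```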

Centralizes the repeated resolve_log_dir(_get_ctx().config.linux_tracing.log_dir) expression into a module-private helper. Replaces the three identical inline call sites in get_pipeline_report, get_token_summary, and get_timing_summary. Adds TestGetLogRoot unit tests for the new helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek

AutoSkillit PR Review — Verdict: changes_requested

Trecek · 2026-03-11T17:42:41Z

tests/server/test_tools_status.py

+        assert result["events"][0]["event"] == "blocked"  # most recent first
+
+    @pytest.mark.anyio
+    async def test_limits_to_n_events(self, tool_ctx, tmp_path, monkeypatch):


[warning] tests: test_limits_to_n_events does not assert that total_count equals 10 (the full log size). It only checks len(result["events"]) == 3, so total_count is never verified against the full dataset.

Trecek · 2026-03-11T17:42:41Z

tests/server/test_tools_status.py

+        ]
+        (log_dir / "quota_events.jsonl").write_text("\n".join(lines) + "\n")
+        monkeypatch.setattr(tool_ctx.config.linux_tracing, "log_dir", str(log_dir))
+        result = json.loads(await get_quota_events(n=3))


[warning] tests: test_limits_to_n_events does not verify ordering of returned events (most-recent-first). The 10-event pagination test never checks which 3 events are returned.

Trecek · 2026-03-11T17:42:41Z

tests/server/test_tools_status.py

+        result = json.loads(await get_token_summary())
+        step = next(s for s in result["steps"] if s["step_name"] == "step-b")
+        # No timing_log entry → falls back to elapsed_seconds
+        assert step["wall_clock_seconds"] == pytest.approx(5.0)


[warning] tests: test_wall_clock_falls_back_to_elapsed_when_no_timing never verifies timing_log has no step-b entry. Parallel test pollution could silently bypass the fallback path.

Trecek · 2026-03-11T17:42:41Z

tests/server/test_server_init.py

+        (log_dir / ".telemetry_cleared_at").write_text(three_hours_ago.isoformat())
+
+        monkeypatch.setattr(tool_ctx.config.linux_tracing, "log_dir", str(log_dir))
+        monkeypatch.setattr(_state, "_ctx", None)


[warning] tests: test_initialize_uses_clear_marker_as_since_bound: monkeypatching _state._ctx=None conflicts with the tool_ctx fixture, which already patches it. The interleaved None-then-reinit sequence may leave _ctx in an unexpected state during xdist teardown.

Trecek · 2026-03-11T17:42:41Z

tests/server/test_server_init.py

+        )
+
+        monkeypatch.setattr(tool_ctx.config.linux_tracing, "log_dir", str(log_dir))
+        monkeypatch.setattr(_state, "_ctx", None)


[warning] tests: Same xdist/fixture-teardown concern as L661. There is also no test for the boundary condition where the marker timestamp equals the session timestamp (the boundary between <= and < in the since_dt logic).
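The boundary this comment asks about comes down to one comparison. In this sketch, sessions_after is a hypothetical helper, and which side the real since_dt check falls on is not stated in the PR; it shows the two choices side by side. With a strict > a session stamped exactly at the clear marker is treated as already cleared; with >= it is replayed.

```python
from __future__ import annotations

from collections.abc import Iterable
from datetime import datetime


def sessions_after(
    session_times: Iterable[datetime], since: datetime, *, inclusive: bool = False
) -> list[datetime]:
    """Keep sessions newer than the replay bound.

    inclusive=False skips a session stamped exactly at `since`;
    inclusive=True replays it.
    """
    if inclusive:
        return [t for t in session_times if t >= since]
    return [t for t in session_times if t > since]
```

A boundary test pins whichever behavior is intended by placing one session exactly on the marker and asserting whether it survives.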

Trecek · 2026-03-11T17:42:41Z

src/autoskillit/server/helpers.py

 from autoskillit.core import RESERVED_LOG_RECORD_KEYS, TerminationReason, get_logger
 from autoskillit.execution import (
-    resolve_log_dir,  # noqa: F401 — used by tools_integrations.py
+    read_telemetry_clear_marker,  # noqa: F401 — used by tools_status.py


[warning] slop: noqa comment claims read_telemetry_clear_marker is used by tools_status.py but it is not called anywhere in tools_status.py. The re-export and comment are misleading dead weight.

Trecek · 2026-03-11T17:42:41Z

src/autoskillit/server/tools_status.py

+
 @mcp.tool(tags={"automation"})
 @track_response_size("kitchen_status")
 async def kitchen_status() -> str:


[warning] defense: _merge_wall_clock_seconds parameter timing_log is typed as Any, bypassing static type checking. Should be typed as DefaultTimingLog or its protocol to enable mypy to catch misuse at call sites.
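One way to address this is a structural typing.Protocol, so mypy checks call sites without coupling the helper to a concrete class. TimingReportSource, wall_clock_by_step, FakeTimingLog, and the report row shape are all hypothetical; the protocol actually adopted in the follow-up commit is only partially visible in its truncated title further down.

```python
from __future__ import annotations

from typing import Any, Protocol


class TimingReportSource(Protocol):
    """Hypothetical structural type: any object exposing get_report()."""

    def get_report(self) -> list[dict[str, Any]]: ...


def wall_clock_by_step(timing_log: TimingReportSource) -> dict[str, float]:
    """Map step_name -> total_seconds (assumed row shape)."""
    return {
        row["step_name"]: float(row["total_seconds"])
        for row in timing_log.get_report()
    }


class FakeTimingLog:
    """Satisfies the protocol structurally; no inheritance needed."""

    def get_report(self) -> list[dict[str, Any]]:
        return [{"step_name": "plan", "total_seconds": 42.0}]


# Under mypy, wall_clock_by_step(object()) is now a type error; with Any it passed.
```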

Trecek · 2026-03-11T17:42:41Z

src/autoskillit/server/tools_status.py

@@ -151,9 +190,66 @@ async def get_timing_summary(clear: bool = False) -> str:
    total = _get_ctx().timing_log.compute_total()
    if clear:


[warning] fidelity: write_telemetry_clear_marker is called when get_timing_summary(clear=True) fires, advancing the shared fence even when token_log and audit are NOT cleared. On the next restart, _state._initialize skips sessions for all three log types, so it may under-count token and audit data that was never cleared. Issue #302 does not describe this shared-fence side effect.

Trecek

AutoSkillit Review Findings

Verdict: changes_requested

8 actionable findings (all warning severity). Implementation is correct and addresses all three requirements from issue #302 (quota events tool, wall-clock seconds in token summary, drift prevention fence). Inline comments posted above.

src/autoskillit/server/helpers.py

L14 [warning/slop]: noqa comment claims read_telemetry_clear_marker is 'used by tools_status.py' but it is not called anywhere in tools_status.py — unused re-export with misleading comment.

src/autoskillit/server/tools_status.py

L37 [warning/defense]: _merge_wall_clock_seconds parameter timing_log typed as Any — bypasses static type checking; use DefaultTimingLog or its protocol.
L191 [warning/fidelity]: Shared-fence side effect — get_timing_summary(clear=True) advances the fence for all three log types (token_log, timing_log, audit), not just timing_log. On next restart, _state._initialize may skip token/audit sessions that were never explicitly cleared. Not described in issue #302.

tests/server/test_tools_status.py

L627 [warning/tests]: test_limits_to_n_events omits assert result["total_count"] == 10 — total_count goes untested for the n-limiting case.
L636 [warning/tests]: test_limits_to_n_events never asserts which 3 events are returned (oldest or newest) — ordering unverified in the paginated case.
L695 [warning/tests]: test_wall_clock_falls_back_to_elapsed_when_no_timing never asserts timing_log has no 'step-b' entry — parallel pollution could silently bypass the fallback.

tests/server/test_server_init.py

L661 [warning/tests]: test_initialize_uses_clear_marker_as_since_bound uses bare _state._ctx = None assignment conflicting with fixture monkeypatch — xdist teardown ordering may leave _ctx in unexpected state.
L705 [warning/tests]: Same xdist concern as L661; missing boundary condition test (marker ts == session ts).

…misleading comment from helpers.py

…imingStore protocol instead of Any

…clear=True in docstring

…allback assertions

… boundary test

…annotations

Both sides of the conflict were complementary: the PR added _get_log_root() and _merge_wall_clock_seconds() helpers plus get_quota_events tool, while integration added readOnlyHint=True annotations to all @mcp.tool decorators. Resolution keeps all new code and applies readOnlyHint to every decorator including the new get_quota_events tool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…333, #342 into integration (#351) ## Integration Summary Collapsed 9 PRs into `pr-batch/pr-merge-20260311-133920` targeting `integration`. ## Merged PRs | # | Title | Complexity | Additions | Deletions | Overlaps | |---|-------|-----------|-----------|-----------|---------| | #337 | Implementation Plan: Dry Walkthrough — Test Command Genericization (Issue #307) | simple | +29 | -2 | — | | #339 | Implementation Plan: Release CI — Force-Push Integration Back-Sync | simple | +88 | -45 | — | | #336 | Enhance prepare-issue with Duplicate Detection and Broader Triggers | needs_check | +161 | -8 | — | | #332 | Rectify: Display Output Bugs #329 — Terminal Targets Consolidation — PART A ONLY | needs_check | +783 | -13 | — | | #338 | Implementation Plan: Pre-release Readiness — Stability Fixes | needs_check | +238 | -36 | — | | #343 | Implementation Plan: PR Pipeline Gates — Mergeability Gate and Review Cycle | needs_check | +384 | -5 | #338 | | #341 | Pipeline observability: quota events, wall-clock timing, drift fix | needs_check | +480 | -5 | #332, #338 | | #333 | Remove run_recipe — Eliminate Sub-Orchestrator Pattern | needs_check | +538 | -655 | #332, #338, #341 | | #342 | feat: genericize codebase and bundle external dependencies for public release | needs_check | +5286 | -1062 | #332, #333, #338, #341, #343 | ## Audit **Verdict:** GO ## Architecture Impact ### Development Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector 
fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; subgraph SourceTree ["PROJECT STRUCTURE (● = modified)"] direction TB SRC["● src/autoskillit/ ━━━━━━━━━━ 105 .py source files cli · config · core execution · hooks · pipeline recipe · server · workspace"] SKILLS["● + ★ src/autoskillit/skills/ ━━━━━━━━━━ 52 bundled skills ★ 13 arch-lens-* SKILL.md added ★ 3 audit-* SKILL.md added ● 14 existing skills updated"] RECIPES["● src/autoskillit/recipes/ ━━━━━━━━━━ 8 bundled YAML recipes All recipes updated"] TESTS["● + ★ tests/ ━━━━━━━━━━ 173 .py test files ★ 6 new test files added"] end subgraph Build ["BUILD TOOLING"] direction TB PYPROJECT["● pyproject.toml ━━━━━━━━━━ hatchling build backend uv package manager 10 runtime deps"] TASKFILE["Taskfile.yml ━━━━━━━━━━ test-all · test-check test-smoke · install-worktree"] end subgraph Quality ["CODE QUALITY GATES"] direction TB RFMT["ruff-format ━━━━━━━━━━ Auto-fix formatting"] RLINT["ruff ━━━━━━━━━━ Lint + auto-fix"] MYPY["mypy src/ ━━━━━━━━━━ --ignore-missing-imports"] UVLOCK["uv lock --check ━━━━━━━━━━ Lock file integrity"] SECRETS["gitleaks ━━━━━━━━━━ Secret scanning"] GUARD["★ headless_orchestration_guard.py ━━━━━━━━━━ ★ PreToolUse hook Blocks run_skill/run_cmd/run_python from headless sessions"] end subgraph Testing ["TEST FRAMEWORK"] direction TB PYTEST["pytest + asyncio_mode=auto ━━━━━━━━━━ xdist -n 4 parallel timeout=60s signal method"] NEWTEST["★ New Test Files ━━━━━━━━━━ ★ test_headless_orchestration_guard ★ test_audit_and_fix_degradation ★ test_rules_inputs ★ test_skill_genericization ★ test_pyproject_metadata ★ test_release_sanity"] end subgraph CI ["CI/CD WORKFLOWS"] direction LR TESTS_WF["tests.yml ━━━━━━━━━━ PR test gate"] RELEASE_WF["release.yml ━━━━━━━━━━ Release automation"] BUMP_WF["● version-bump.yml ━━━━━━━━━━ ● Force-push back-sync integration → main"] end subgraph EntryPoints ["ENTRY POINTS"] EP["autoskillit CLI ━━━━━━━━━━ 
serve · init · skills recipes · doctor · workspace"] end SRC --> PYPROJECT SKILLS --> PYPROJECT TESTS --> PYTEST PYPROJECT --> TASKFILE PYPROJECT --> RFMT RFMT --> RLINT RLINT --> MYPY MYPY --> UVLOCK UVLOCK --> SECRETS SECRETS --> GUARD GUARD --> PYTEST PYTEST --> NEWTEST NEWTEST --> BUMP_WF TESTS_WF --> PYTEST PYPROJECT --> EP class SRC,TESTS stateNode; class SKILLS,RECIPES newComponent; class PYPROJECT,TASKFILE phase; class RFMT,RLINT,MYPY,UVLOCK,SECRETS detector; class GUARD newComponent; class PYTEST handler; class NEWTEST newComponent; class TESTS_WF,RELEASE_WF phase; class BUMP_WF newComponent; class EP output; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Dark Teal | Structure | Source directories and test suite | | Green (★) | New/Modified | New files and components added in this PR | | Purple | Build | Build configuration and task automation | | Red | Quality Gates | Pre-commit hooks, linters, type checker | | Orange | Test Runner | pytest execution engine | | Dark Teal | Entry Points | CLI commands | ### Module Dependency Diagram ```mermaid %%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%% graph TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff; subgraph L0 ["L0 — CORE (zero autoskillit imports)"] direction LR TYPES["● core/types.py ━━━━━━━━━━ GATED_TOOLS · UNGATED_TOOLS RecipeSource (★ promoted here) 
ClaudeFlags · StrEnums fan-in: ~75 files"] COREIO["core/io.py · logging.py · paths.py ━━━━━━━━━━ Atomic write · Logger · pkg_root()"] end subgraph L1P ["L1 — PIPELINE (imports L0 only)"] direction TB GATE["● pipeline/gate.py ━━━━━━━━━━ DefaultGateState gate_error_result() ★ headless_error_result() re-exports GATED/UNGATED_TOOLS"] PIPEINIT["● pipeline/__init__.py ━━━━━━━━━━ Re-exports public surface ToolContext · AuditLog TokenLog · DefaultGateState"] end subgraph L1E ["L1 — EXECUTION (imports L0 only)"] direction TB HEADLESS["● execution/headless.py ━━━━━━━━━━ Headless Claude sessions Imports core types via TYPE_CHECKING for ToolContext (no runtime cycle)"] COMMANDS["● execution/commands.py ━━━━━━━━━━ ClaudeHeadlessCmd builder"] SESSION_LOG["● execution/session_log.py ━━━━━━━━━━ Session diagnostics writer"] end subgraph L2 ["L2 — RECIPE (imports L0+L1)"] direction TB SCHEMA["● recipe/schema.py ━━━━━━━━━━ Recipe · RecipeStep · DataFlowWarning RecipeSource (now from L0)"] RULES["● recipe/rules_inputs.py ━━━━━━━━━━ ★ Ingredient validation rules reads GATED_TOOLS from L0 via pipeline re-export"] ANALYSIS["● recipe/_analysis.py ━━━━━━━━━━ Step graph builder"] VALIDATOR["● recipe/validator.py ━━━━━━━━━━ validate_recipe()"] end subgraph L3S ["L3 — SERVER (imports all layers)"] direction TB HELPERS["● server/helpers.py ━━━━━━━━━━ _require_enabled() — reads gate ★ _require_not_headless() Shared by all tool handlers"] TOOLS_EX["● server/tools_execution.py ━━━━━━━━━━ run_cmd · run_python · run_skill ✗ run_recipe REMOVED Uses _require_not_headless()"] TOOLS_GIT["● server/tools_git.py ━━━━━━━━━━ merge_worktree · classify_fix ● check_pr_mergeable (new gate)"] TOOLS_K["● server/tools_kitchen.py ━━━━━━━━━━ open_kitchen · close_kitchen"] FACTORY["● server/_factory.py ━━━━━━━━━━ Composition root Wires ToolContext"] end subgraph L3H ["L3 — HOOKS (stdlib only for guard)"] direction LR HOOK_GUARD["★ hooks/headless_orchestration_guard.py ━━━━━━━━━━ ★ PreToolUse hook (stdlib only) Blocks 
run_skill/run_cmd/run_python from AUTOSKILLIT_HEADLESS=1 sessions NO autoskillit imports"] PRETTY["● hooks/pretty_output.py ━━━━━━━━━━ PostToolUse response formatter"] end subgraph L3C ["L3 — CLI (imports all layers)"] direction LR CLI_APP["● cli/app.py ━━━━━━━━━━ serve · init · skills · recipes doctor · workspace"] CLI_PROMPTS["● cli/_prompts.py ━━━━━━━━━━ Orchestrator prompt builder"] end TYPES -->|"fan-in ~75"| GATE TYPES -->|"fan-in ~75"| HEADLESS TYPES -->|"fan-in ~75"| SCHEMA COREIO --> PIPEINIT GATE --> PIPEINIT PIPEINIT -->|"gate_error_result headless_error_result"| HELPERS HEADLESS --> HELPERS COMMANDS --> HEADLESS SESSION_LOG --> HELPERS SCHEMA -->|"RecipeSource from L0"| RULES RULES --> VALIDATOR ANALYSIS --> VALIDATOR HELPERS -->|"_require_not_headless"| TOOLS_EX HELPERS --> TOOLS_GIT HELPERS --> TOOLS_K VALIDATOR --> FACTORY PIPEINIT --> FACTORY FACTORY --> CLI_APP FACTORY --> CLI_PROMPTS HOOK_GUARD -.->|"ENV: AUTOSKILLIT_HEADLESS zero autoskillit imports"| TOOLS_EX class TYPES,COREIO stateNode; class GATE,PIPEINIT phase; class HEADLESS,COMMANDS,SESSION_LOG handler; class SCHEMA,RULES,ANALYSIS,VALIDATOR phase; class HELPERS,TOOLS_EX,TOOLS_GIT,TOOLS_K handler; class FACTORY cli; class CLI_APP,CLI_PROMPTS cli; class HOOK_GUARD newComponent; class PRETTY handler; ``` **Color Legend:** | Color | Category | Description | |-------|----------|-------------| | Teal | L0 Core | High fan-in foundation types (zero reverse deps) | | Purple | L1/L2 Control | Pipeline gate, recipe schema and rules | | Orange | L1/L3 Processors | Execution handlers, server tool handlers | | Dark Blue | L3 CLI | Composition root and CLI entry points | | Green (★) | New Components | headless_orchestration_guard — standalone hook | | Dashed | ENV Signal | OS-level check; no code import relationship | Closes #307 Closes #327 Closes #308 Closes #329 Closes #304 Closes #328 Closes #302 Closes #330 Closes #311 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit 
---

## Merge Conflict Resolution

The batch branch was rebased onto `integration` to resolve 17 file conflicts. All conflicts arose because PRs #337–#341 were squash-merged into both `integration` (directly) and the batch branch (via the pipeline), while PRs #333 and #342 required conflict resolution work that only exists on the batch branch.

**Resolution principle:** Batch branch version wins for all files touched by #333/#342 conflict resolution and remediation, since that state was fully tested (3752 passed). Integration-only additions (e.g. `TestGetQuotaEvents`) were preserved where they don't overlap.

### Per-file decisions

| File | Decision | Rationale |
|------|----------|-----------|
| `CLAUDE.md` | **Batch wins** | Batch has corrected tool inventory (run_recipe removed, get_quota_events added, 25 kitchen tools) |
| `core/types.py` | **Batch wins** | Batch splits monolithic UNGATED_TOOLS into WORKER_TOOLS + HEADLESS_BLOCKED_UNGATED_TOOLS; removes run_recipe from GATED_TOOLS |
| `execution/__init__.py` | **Batch wins** | Batch removes dead exports (build_subrecipe_cmd, run_subrecipe_session) |
| `execution/headless.py` | **Batch wins** | Batch deletes run_subrecipe_session function (530+ lines); keeps run_headless_core with token_log error handling |
| `hooks/pretty_output.py` | **Batch wins** | Batch removes run_recipe from _UNFORMATTED_TOOLS, adds get_quota_events |
| `recipes/pr-merge-pipeline.yaml` | **Batch wins** | Batch has base_branch required:true, updated kitchen rules (main instead of integration) |
| `server/_state.py` | **Batch wins** | Batch adds .telemetry_cleared_at marker reading in _initialize |
| `server/helpers.py` | **Batch wins** | Batch removes _run_subrecipe and run_subrecipe_session import; adds _require_not_headless |
| `server/tools_git.py` | **Batch wins** | Batch has updated classify_fix with git fetch and check_pr_mergeable gate |
| `server/tools_kitchen.py` | **Batch wins** | Batch adds headless gates to open_kitchen/close_kitchen; adds TOOL_CATEGORIES listing |
| `server/tools_status.py` | **Merge both** | Batch headless gates + wall_clock_seconds merged with integration's TestGetQuotaEvents (deduplicated) |
| `tests/conftest.py` | **Batch wins** | Batch replaces AUTOSKILLIT_KITCHEN_OPEN with AUTOSKILLIT_HEADLESS in fixture |
| `tests/execution/test_headless.py` | **Batch wins** | Batch removes run_subrecipe_session tests (deleted code); updates docstring |
| `tests/recipe/test_bundled_recipes.py` | **Merge both** | Batch base_branch=main assertions + integration WF7 graph test both kept |
| `tests/server/test_tools_kitchen.py` | **Batch wins** | Batch adds headless gate denial tests for open/close kitchen |
| `tests/server/test_tools_status.py` | **Merge both** | Batch headless gate tests merged with integration quota events tests |

### Post-rebase fixes

- Removed duplicate `TestGetQuotaEvents` class (existed in both batch commit and auto-merged integration code)
- Fixed stale `_build_tool_listing` → `_build_tool_category_listing` attribute reference
- Added `if diagram: print(diagram)` to `cli/app.py` cook function (test expected terminal output)

### Verification

- **3752 passed**, 23 skipped, 0 failures
- 7 architecture contracts kept, 0 broken
- Pre-commit hooks all pass

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
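The per-file table says `server/helpers.py` gains `_require_not_headless` and `server/tools_kitchen.py` gates `open_kitchen`/`close_kitchen` with it. A minimal sketch of such a handler-layer gate follows. The function names, the `headless_blocked` subtype, and the three-field error payload are hypothetical simplifications (the real `headless_error_result()` is described as producing 9 fields).

```python
import json
import os
from typing import Mapping, Optional


def require_not_headless(tool_name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a JSON error string when called from a headless session, else None.

    Hypothetical stand-in for helpers._require_not_headless.
    """
    env = os.environ if env is None else env
    if env.get("AUTOSKILLIT_HEADLESS") == "1":
        return json.dumps(
            {
                "success": False,
                "subtype": "headless_blocked",  # hypothetical subtype name
                "error": f"{tool_name} is not available in headless sessions",
            }
        )
    return None


def open_kitchen(env: Optional[Mapping[str, str]] = None) -> str:
    # Gate first, and return the error verbatim so callers see one response shape.
    if (err := require_not_headless("open_kitchen", env)) is not None:
        return err
    return json.dumps({"success": True})  # placeholder for the real kitchen logic
```

Checking at the handler layer complements the PreToolUse hook: the hook stops the call before it reaches the server, and this gate still holds if the hook is not installed.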

…, Headless Isolation (#404)

## Summary

Integration rollup of **43 PRs** (#293–#406) consolidating **62 commits** across **291 files** (+27,909 / −6,040 lines). This release advances AutoSkillit from v0.2.0 to v0.3.1 with GitHub merge queue integration, sub-recipe composition, a PostToolUse output reformatter, headless session isolation guards, and comprehensive pipeline observability — plus 24 new bundled skills, 3 new MCP tools, and 47 new test files.

---

## Major Features

### GitHub Merge Queue Integration (#370, #362, #390)

- New `wait_for_merge_queue` MCP tool — polls a PR through GitHub's merge queue until merged, ejected, or timed out (default 600s). Uses REST + GraphQL APIs with stuck-queue detection and auto-merge re-enrollment
- New `DefaultMergeQueueWatcher` L1 service (`execution/merge_queue.py`) — never raises; all outcomes are structured results
- `parse_merge_queue_response()` pure function for GraphQL queue entry parsing
- New `auto_merge` ingredient in `implementation.yaml` and `remediation.yaml` — enrolls PRs in the merge queue after CI passes
- Full queue-mode path added to `merge-prs.yaml`: detect queue → enqueue → wait → handle ejections → re-enter
- `analyze-prs` skill gains Step 0.5 (merge queue detection) and Step 1.5 (CI/review eligibility filtering)

### Sub-Recipe Composition (#380)

- Recipe steps can now reference sub-recipes via `sub_recipe` + `gate` fields — lazy-loaded and merged at validation time
- Composition engine in `recipe/_api.py`: `_merge_sub_recipe()` inlines sub-recipe steps with safe name-prefixing and route remapping (`done` → parent's `on_success`, `escalate` → parent's `on_failure`)
- `_build_active_recipe()` evaluates gate ingredients against overrides/defaults; dual validation runs on both active and combined recipes
- First sub-recipe: `sprint-prefix.yaml` — triage → plan → confirm → dispatch workflow, gated by `sprint_mode` ingredient (hidden, default false)
- Both `implementation.yaml` and `remediation.yaml` gain `sprint_entry` placeholder step
- New semantic rules: `unknown-sub-recipe` (ERROR), `circular-sub-recipe` (ERROR) with DFS cycle detection

### PostToolUse Output Reformatter (#293, #405)

- `pretty_output.py` — new 671-line PostToolUse hook that rewrites raw MCP JSON responses to Markdown-KV before Claude consumes them (30–77% token overhead reduction)
- Dedicated formatters for 11 high-traffic tools (`run_skill`, `run_cmd`, `test_check`, `merge_worktree`, `get_token_summary`, etc.) plus a generic KV formatter for remaining tools
- Pipeline vs. interactive mode detection via hook config file
- Unwraps Claude Code's `{"result": "<json-string>"}` envelope before dispatching
- 1,516-line test file with 40+ behavioral tests

### Headless Session Isolation (#359, #393, #397, #405, #406)

- **Env isolation**: `build_sanitized_env()` strips `AUTOSKILLIT_PRIVATE_ENV_VARS` from subprocess environments, preventing `AUTOSKILLIT_HEADLESS=1` from leaking into test runners
- **CWD path contamination defense**: `_inject_cwd_anchor()` anchors all relative paths to session CWD; `_validate_output_paths()` checks structured output tokens against CWD prefix; `_scan_jsonl_write_paths()` post-session scanner catches actual Write/Edit/Bash tool calls outside CWD
- **Headless orchestration guard**: new PreToolUse hook blocks `run_skill`/`run_cmd`/`run_python` when `AUTOSKILLIT_HEADLESS=1`, enforcing Tier 1/Tier 2 nesting invariant
- **`_require_not_headless()` server-side guard**: blocks 10 orchestration-only tools from headless sessions at the handler layer
- **Unified error response contract**: `headless_error_result()` produces consistent 9-field responses; `_build_headless_error_response()` canonical builder for all failure paths in `tools_integrations.py`

### Cook UX Overhaul (#375, #363)

- `open_kitchen` now accepts optional `name` + `overrides` — opens kitchen AND loads recipe in a single call
- Pre-launch terminal preview with ANSI-colored flow diagram and ingredients table via new `cli/_ansi.py` module
- `--dangerously-skip-permissions` warning banner with interactive confirmation prompt
- Randomized session greetings from themed pools
- Orchestrator prompt rewritten: recipe YAML no longer injected via `--append-system-prompt`; session calls `open_kitchen('{recipe_name}')` as first action
- Conversational ingredient collection replaces mechanical per-field prompting

---

## New MCP Tools

| Tool | Gate | Description |
|------|------|-------------|
| `wait_for_merge_queue` | Kitchen | Polls PR through GitHub merge queue (REST + GraphQL) |
| `set_commit_status` | Kitchen | Posts GitHub Commit Status to a SHA for review-first gating |
| `get_quota_events` | Ungated | Surfaces quota guard decisions from `quota_events.jsonl` |

---

## Pipeline Observability (#318, #341)

- **`TelemetryFormatter`** (`pipeline/telemetry_fmt.py`) — single source of truth for all telemetry rendering; replaces dual-formatter anti-pattern. Four rendering modes: Markdown table, terminal table, compact KV (for PostToolUse hook)
- `get_token_summary` and `get_timing_summary` gain `format` parameter (`"json"` | `"table"`)
- `wall_clock_seconds` merged into token summary output — see duration alongside token counts in one call
- **Telemetry clear marker**: `write_telemetry_clear_marker()` / `read_telemetry_clear_marker()` prevent token accounting drift on MCP server restart after `clear=True`
- **Quota event logging**: `quota_check.py` hook now writes structured JSONL events (`cache_miss`, `parse_error`, `blocked`, `approved`) to `quota_events.jsonl`

---

## CI Watcher & Remote Resolution Fixes (#395, #406)

- **`CIRunScope` value object** — carries `workflow` + `head_sha` scope; replaces bare `head_sha` parameter across all CI watcher signatures
- **Workflow filter**: `wait_for_ci` and `get_ci_status` accept `workflow` parameter (falls back to project-level `config.ci.workflow`), preventing unrelated workflows (version bumps, labelers) from satisfying CI checks
- **`FAILED_CONCLUSIONS` expanded**: `failure` → `{failure, timed_out, startup_failure, cancelled}`
- **Canonical remote resolver** (`execution/remote_resolver.py`): `resolve_remote_repo()` with `REMOTE_PRECEDENCE = (upstream, origin)` — correctly resolves `owner/repo` after `clone_repo` sets `origin` to `file://` isolation URL
- **Clone isolation fix**: `clone_repo` now always clones from remote URL (never local path); sets `origin=file:///<clone>` for isolation and `upstream=<real_url>` for push/CI operations

---

## PR Pipeline Gates (#317, #343)

- **`pipeline/pr_gates.py`**: `is_ci_passing()`, `is_review_passing()`, `partition_prs()` — partitions PRs into eligible/CI-blocked/review-blocked with human-readable reasons
- **`pipeline/fidelity.py`**: `extract_linked_issues()` (Closes/Fixes/Resolves patterns), `is_valid_fidelity_finding()` schema validation
- **`check_pr_mergeable`** now returns `mergeable_status` field alongside boolean
- **`release_issue`** gains `target_branch` + `staged_label` parameters for staged issue lifecycle on non-default branches (#392)

---

## Recipe System Changes

### Structural

- `RecipeIngredient.hidden` field — excluded from ingredients table (used for internal flags like `sprint_mode`)
- `Recipe.experimental` flag parsed from YAML
- `_TERMINAL_TARGETS` moved to `schema.py` as single source of truth
- `format_ingredients_table()` with sorted display order (required → auto-detect → flags → optional → constants)
- Diagram rendering engine (~670 lines) removed from `diagrams.py` — rendering now handled by `/render-recipe` skill; format version bumped to v7

### Recipe YAML Changes

- **Deleted**: `audit-and-fix.yaml`, `batch-implementation.yaml`, `bugfix-loop.yaml`
- **Renamed**: `pr-merge-pipeline.yaml` → `merge-prs.yaml`
- **`implementation.yaml`**: merge queue steps, `auto_merge`/`sprint_mode` ingredients, `base_branch` default → `""` (auto-detect), CI workflow filter, `extract_pr_number` step
- **`remediation.yaml`**: `topic` → `task` rename, merge queue steps, `dry_walkthrough` retries:3 with forward-only routing, `verify` → `test` rename
- **`merge-prs.yaml`**: full queue-mode path, `open-integration-pr` step (replaces `create-review-pr`), post-PR mergeability polling, review cycle with `resolve-review` retries

### New Semantic Rules

- `missing-output-patterns` (WARNING) — flags `run_skill` steps without `expected_output_patterns`
- `unknown-sub-recipe` (ERROR) — validates sub-recipe references exist
- `circular-sub-recipe` (ERROR) — DFS cycle detection
- `unknown-skill-command` (ERROR) — validates skill names against bundled set
- `telemetry-before-open-pr` (WARNING) — ensures telemetry step precedes `open-pr`

---

## New Skills (24)

### Architecture Lens Family (13)

`arch-lens-c4-container`, `arch-lens-concurrency`, `arch-lens-data-lineage`, `arch-lens-deployment`, `arch-lens-development`, `arch-lens-error-resilience`, `arch-lens-module-dependency`, `arch-lens-operational`, `arch-lens-process-flow`, `arch-lens-repository-access`, `arch-lens-scenarios`, `arch-lens-security`, `arch-lens-state-lifecycle`

### Audit Family (5)

`audit-arch`, `audit-bugs`, `audit-cohesion`, `audit-defense-standards`, `audit-tests`

### Planning & Diagramming (3)

`elaborate-phase`, `make-arch-diag`, `make-req`

### Bug/Guard Lifecycle (2)

`design-guards`, `verify-diag`

### Pipeline (1)

`open-integration-pr` — creates integration PRs with per-PR details, arch-lens diagrams, carried-forward `Closes #N` references, and auto-closes collapsed PRs

### Sprint Planning (1 — gated by sub-recipe)

`sprint-planner` — selects a focused, conflict-free sprint from a triage manifest

---

## Skill Modifications (Highlights)

- **`analyze-prs`**: merge queue detection, CI/review eligibility filtering, queue-mode ordering
- **`dry-walkthrough`**: Step 4.5 Historical Regression Check (git history mining + GitHub issue cross-reference)
- **`review-pr`**: deterministic diff annotation via `diff_annotator.py`, echo-primary-obligation step, post-completion confirmation, degraded-mode narration
- **`collapse-issues`**: content fidelity enforcement — per-issue `fetch_github_issue` calls, copy-mode body assembly (#388)
- **`prepare-issue`**: multi-keyword dedup search, numbered candidate selection, extend-existing-issue flow
- **`resolve-review`**: GraphQL thread auto-resolution after addressing findings (#379)
- **`resolve-merge-conflicts`**: conflict resolution decision report with per-file log (#389)
- **Cross-skill**: output tokens migrated to `key = value` format; code-index paths made generic with fallback notes; arch-lens references fully qualified; anti-prose guards at loop boundaries

---

## CLI & Hooks

### New CLI Commands

- `autoskillit install` — plugin installation + cache refresh
- `autoskillit upgrade` — `.autoskillit/scripts/` → `.autoskillit/recipes/` migration

### CLI Changes

- `doctor`: plugin-aware MCP check, PostToolUse hook scanning, `--fix` flag removed
- `init`: GitHub repo prompt, `.secrets.yaml` template, plugin-aware registration
- `chefs-hat`: pre-launch banner, `--dangerously-skip-permissions` confirmation
- `recipes render`: repurposed from generator to viewer (delegates to `/render-recipe`)
- `serve`: server import deferred to after `configure_logging()` to prevent stdout corruption

### New Hooks

- `branch_protection_guard.py` (PreToolUse) — denies `merge_worktree`/`push_to_remote` targeting protected branches
- `headless_orchestration_guard.py` (PreToolUse) — blocks orchestration tools in headless sessions
- `pretty_output.py` (PostToolUse) — MCP JSON → Markdown-KV reformatter

### Hook Infrastructure

- `HookDef.event_type` field — registry now handles both PreToolUse and PostToolUse
- `generate_hooks_json()` groups entries by event type
- `_evict_stale_autoskillit_hooks` and `sync_hooks_to_settings` made event-type-agnostic

---

## Core & Config

### New Core Modules

- `core/branch_guard.py` — `is_protected_branch()` pure function
- `core/github_url.py` — `parse_github_repo()` + `normalize_owner_repo()` canonical parsers

### Core Type Expansions

- `AUTOSKILLIT_PRIVATE_ENV_VARS` frozenset
- `WORKER_TOOLS` / `HEADLESS_BLOCKED_UNGATED_TOOLS` split from `UNGATED_TOOLS`
- `TOOL_CATEGORIES` — categorized listing for `open_kitchen` response
- `CIRunScope` — immutable scope for CI watcher calls
- `MergeQueueWatcher` protocol
- `SkillResult.cli_subtype` + `write_path_warnings` fields
- `SubprocessRunner.env` parameter

### Config

- `safety.protected_branches`: `[main, integration, stable]`
- `github.staged_label`: `"staged"`
- `ci.workflow`: workflow filename filter (e.g., `"tests.yml"`)
- `branching.default_base_branch`: `"integration"` → `"main"`
- `ModelConfig.default`: `str | None` → `str = "sonnet"`

---

## Infrastructure & Release

### Version

- `0.2.0` → `0.3.1` across `pyproject.toml`, `plugin.json`, `uv.lock`
- FastMCP dependency: `>=3.0.2` → `>=3.1.1,<4.0` (#399)

### CI/CD Workflows

- **`version-bump.yml`** (new) — auto patch-bumps `main` on integration PR merge, force-syncs integration branch one patch ahead
- **`release.yml`** (new) — minor version bump + GitHub Release on merge to `stable`
- **`codeql.yml`** (new) — CodeQL analysis for `stable` PRs (Python + Actions)
- **`tests.yml`** — `merge_group:` trigger added; multi-OS now only for `stable`

### PyPI Readiness

- `pyproject.toml`: `readme`, `license`, `authors`, `keywords`, `classifiers`, `project.urls`, `hatch.build.targets.sdist` inclusion list

### readOnlyHint Parallel Execution Fix

- All MCP tools annotated `readOnlyHint=True` — enables Claude Code parallel tool execution (~7x speedup). One deliberate exception: `wait_for_merge_queue` uses `readOnlyHint=False` (actually mutates queue state)

### Tool Response Exception Boundary

- `track_response_size` decorator catches unhandled exceptions and serializes them as `{"success": false, "subtype": "tool_exception"}` — prevents FastMCP opaque error wrapping

### SkillResult Subtype Normalization (#358)

- `_normalize_subtype()` gate eliminates dual-source contradiction between CLI subtype and session outcome
- Class 2 upward: `SUCCEEDED + error_subtype → "success"` (drain-race artifact)
- Class 1 downward: `non-SUCCEEDED + "success" → "empty_result"` / `"missing_completion_marker"` / `"adjudicated_failure"`

---

## Test Coverage

**47 new test files** (+12,703 lines) covering:

| Area | Key Tests |
|------|-----------|
| Merge queue watcher state machine | `test_merge_queue.py` (226 lines) |
| Clone isolation × CI resolution | `test_clone_ci_contract.py`, `test_remote_resolver.py` |
| PostToolUse hook | `test_pretty_output.py` (1,516 lines, 40+ cases) |
| Branch protection + headless guards | `test_branch_protection_guard.py`, `test_headless_orchestration_guard.py` |
| Sub-recipe composition | 5 test files (schema, loading, validation, sprint mode × 2) |
| Telemetry formatter | `test_telemetry_formatter.py` (281 lines) |
| PR pipeline gates | `test_analyze_prs_gates.py`, `test_review_pr_fidelity.py` |
| Diff annotator | `test_diff_annotator.py` (242 lines) |
| Skill compliance | Output token format, genericization, loop-boundary guards |
| Release workflows | Structural contracts for `version-bump.yml`, `release.yml` |
| Issue content fidelity | Body-assembling skills must call `fetch_github_issue` per-issue |
| CI watcher scope | `test_ci_params.py` — workflow_id query param composition |

---

## Consolidated PRs

#293, #295, #314, #315, #316, #317, #318, #319, #323, #332, #336, #337, #338, #339, #341, #343, #351, #358, #359, #360, #361, #362, #363, #366, #368, #370, #375, #377, #378, #379, #380, #388, #389, #390, #391, #392, #393, #395, #396, #397, #399, #405, #406

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
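The Sub-Recipe Composition notes say `_merge_sub_recipe()` inlines sub-recipe steps with name-prefixing and remaps `done` to the parent's `on_success` and `escalate` to the parent's `on_failure`. A minimal sketch of that remapping follows. It assumes steps are plain dicts holding only route fields and uses a `<prefix>_<name>` scheme; both the data shape and the separator are hypothetical, not the actual `recipe/_api.py` code.

```python
from typing import Dict

# Hypothetical step shape: route fields only, e.g. {"on_success": "plan", "on_failure": "escalate"}.
Step = Dict[str, str]


def merge_sub_recipe(
    prefix: str,
    sub_steps: Dict[str, Step],
    parent_on_success: str,
    parent_on_failure: str,
) -> Dict[str, Step]:
    """Inline sub-recipe steps under prefixed names, rewiring terminal routes."""
    terminal = {"done": parent_on_success, "escalate": parent_on_failure}

    def remap(target: str) -> str:
        if target in terminal:
            return terminal[target]  # leaving the sub-recipe: hand control back to the parent
        return f"{prefix}_{target}"  # internal route: follow the renamed step

    return {
        f"{prefix}_{name}": {field: remap(target) for field, target in step.items()}
        for name, step in sub_steps.items()
    }
```

Prefixing every inlined name keeps sub-recipe steps from colliding with parent steps of the same name, and routing only `done`/`escalate` outward means the sub-recipe cannot jump to arbitrary parent steps.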
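The `circular-sub-recipe` rule is described as DFS cycle detection. A minimal sketch of the standard three-color DFS follows, assuming the recipe references have already been collected into a `name -> [referenced sub-recipes]` mapping; the function name and graph shape are hypothetical.

```python
from typing import Dict, List, Optional


def find_sub_recipe_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one reference cycle as a path (first node repeated at the end), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color = {name: WHITE for name in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:
                # Back edge: child is already on the current path, so this closes a cycle.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for name in graph:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

Returning the path rather than a boolean lets an ERROR-level finding name every recipe in the loop. References to recipes missing from the graph are treated as leaves here; flagging them is the separate `unknown-sub-recipe` rule's job.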
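The remote-resolution notes say `resolve_remote_repo()` walks `REMOTE_PRECEDENCE = (upstream, origin)` so that a `file://` isolation `origin` does not hide the real GitHub repository. A minimal sketch follows, assuming remotes are given as a `name -> URL` mapping. The regex-based `parse_github_repo` here is a hypothetical simplification of `core/github_url.py`, covering only HTTPS and SCP-style SSH URLs.

```python
import re
from typing import Mapping, Optional

REMOTE_PRECEDENCE = ("upstream", "origin")

# Matches https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git).
_GITHUB_RE = re.compile(r"github\.com[/:]([^/]+)/([^/]+?)(?:\.git)?/?$")


def parse_github_repo(url: str) -> Optional[str]:
    """Return "owner/repo" for a GitHub URL, or None for anything else."""
    match = _GITHUB_RE.search(url)
    return f"{match.group(1)}/{match.group(2)}" if match else None


def resolve_remote_repo(remotes: Mapping[str, str]) -> Optional[str]:
    """Return owner/repo from the first remote, by precedence, that points at GitHub."""
    for name in REMOTE_PRECEDENCE:
        url = remotes.get(name)
        if url is None:
            continue
        repo = parse_github_repo(url)
        if repo is not None:  # a file:// isolation URL parses to None and is skipped
            return repo
    return None
```

For a clone made by `clone_repo`, `origin` is `file:///<clone>` and `upstream` is the real URL, so the precedence order picks `upstream` first; a plain checkout with only `origin` still resolves.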
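`pipeline/fidelity.py` is described as extracting linked issues from Closes/Fixes/Resolves patterns. A minimal sketch follows that accepts the verb forms GitHub treats as closing keywords (close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved) followed by `#N`. The regex and the de-duplication policy are assumptions, not the actual `extract_linked_issues()` implementation.

```python
import re
from typing import List

# Case-insensitive closing keyword, whitespace, then "#<number>".
_LINK_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)\b",
    re.IGNORECASE,
)


def extract_linked_issues(body: str) -> List[int]:
    """Return issue numbers referenced by closing keywords, de-duplicated in first-seen order."""
    seen: List[int] = []
    for match in _LINK_RE.finditer(body):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen
```

Preserving first-seen order keeps the output stable when an integration PR carries forward `Closes #N` lines from many collapsed PRs.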
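The SkillResult notes give two normalization classes: a succeeded session with an error subtype is lifted to `"success"`, and a non-succeeded session that the CLI labelled `"success"` is demoted to `"empty_result"`, `"missing_completion_marker"`, or `"adjudicated_failure"`. A minimal sketch follows. The source does not say how the demotion target is chosen, so the empty-text and missing-marker checks (and the `Outcome` enum beyond `SUCCEEDED`) are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical outcome values; the source only names SUCCEEDED.
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def normalize_subtype(outcome: Outcome, cli_subtype: str, result_text: str, has_marker: bool) -> str:
    """Reconcile the CLI-reported subtype with the adjudicated session outcome."""
    if outcome is Outcome.SUCCEEDED:
        # A succeeded session always reports "success"; an error subtype here is
        # treated as a drain-race artifact (Class 2 upward).
        return "success"
    if cli_subtype == "success":
        # Class 1 downward: the CLI claimed success but adjudication disagreed.
        if not result_text.strip():
            return "empty_result"
        if not has_marker:
            return "missing_completion_marker"
        return "adjudicated_failure"
    # No contradiction between the two sources: keep the CLI's own error subtype.
    return cli_subtype
```

After this gate, `subtype == "success"` holds exactly when the session outcome is `SUCCEEDED`, which is the single-source property the normalization is meant to guarantee.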

Trecek and others added 3 commits March 10, 2026 22:18

Trecek commented Mar 11, 2026


Trecek and others added 6 commits March 11, 2026 10:46

- fix(review): remove unused read_telemetry_clear_marker re-export and misleading comment from helpers.py (`08a9bcf`)
- fix(review): type _merge_wall_clock_seconds timing_log parameter as TimingStore protocol instead of Any (`f3155eb`)
- fix(review): document shared-fence side effect of get_timing_summary clear=True in docstring (`48aaf2f`)
- fix(review): strengthen test_limits_to_n_events and test_wall_clock_fallback assertions (`3d4a44d`)
- fix(review): remove redundant _state._ctx=None patches and add marker boundary test (`9ad3019`)
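`get_quota_events` surfaces the decisions `quota_check.py` appends to `quota_events.jsonl`, and the review fixes mention a `test_limits_to_n_events` case. A minimal sketch of a bounded tail read follows. The function names, the default of 50 events, and the choice to skip malformed lines are assumptions rather than the tool's actual behaviour.

```python
import json
from collections import deque
from pathlib import Path
from typing import Any, Deque, Dict, Iterable, List


def tail_events(lines: Iterable[str], n: int) -> List[Dict[str, Any]]:
    """Parse JSONL lines and keep only the last n well-formed event objects."""
    if n <= 0:
        return []
    recent: Deque[Dict[str, Any]] = deque(maxlen=n)  # older events fall off the left
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # a partially written line should not break the whole read
        if isinstance(event, dict):
            recent.append(event)
    return list(recent)


def read_quota_events(path: Path, n: int = 50) -> List[Dict[str, Any]]:
    """Return the most recent n events from quota_events.jsonl, oldest first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return tail_events(fh, n)
```

A bounded `deque` streams the file once in constant memory for the kept window, which matters because the hook only ever appends to this log.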
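The review fix `f3155eb` types the `timing_log` argument of `_merge_wall_clock_seconds` as a `TimingStore` protocol instead of `Any`. A minimal sketch of that pattern follows. The protocol method `total_seconds_by_step()`, the `step_name` row key, and the 0.0 fallback for untimed steps are all hypothetical.

```python
from typing import Any, Dict, List, Mapping, Protocol


class TimingStore(Protocol):
    """Hypothetical minimal protocol: only the method this merge needs."""

    def total_seconds_by_step(self) -> Mapping[str, float]: ...


def merge_wall_clock_seconds(
    token_rows: List[Dict[str, Any]], timing_log: TimingStore
) -> List[Dict[str, Any]]:
    """Attach wall_clock_seconds to each per-step token row."""
    totals = timing_log.total_seconds_by_step()
    merged: List[Dict[str, Any]] = []
    for row in token_rows:
        out = dict(row)  # copy so the caller's rows are not mutated
        # Assumed fallback: a step with no timing entry reports 0.0 seconds.
        out["wall_clock_seconds"] = float(totals.get(row["step_name"], 0.0))
        merged.append(out)
    return merged


class InMemoryTimingStore:
    """Test double that satisfies TimingStore structurally (no inheritance needed)."""

    def __init__(self, totals: Mapping[str, float]) -> None:
        self._totals = dict(totals)

    def total_seconds_by_step(self) -> Mapping[str, float]:
        return dict(self._totals)
```

Because `Protocol` is structural, the real timing log and a test double both type-check without sharing a base class, which is what makes the typed parameter cheap to adopt.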

Trecek merged commit 20f174d into integration Mar 11, 2026
2 checks passed

Trecek deleted the combined-pipeline-observability-quota-guard-logging-and-per/302 branch March 11, 2026 20:58

Trecek mentioned this pull request Mar 12, 2026

- Integration: collapsed PRs #337, #339, #336, #332, #338, #343, #341, #333, #342 into integration #351 (Merged)

This was referenced Mar 13, 2026

- Token and timing summaries include stale data from prior pipeline runs in the same server session #340 (Closed)
- Integration v0.3.1: Merge Queue, Sub-Recipes, PostToolUse Reformatter, Headless Isolation #404 (Merged)

Trecek mentioned this pull request Mar 19, 2026

- Promote integration → main (56 PRs, 46 issues) #438 (Merged)


Pipeline observability: quota events, wall-clock timing, drift fix #341

Trecek merged 9 commits into `integration` from `combined-pipeline-observability-quota-guard-logging-and-per/302`


		@@ -151,9 +190,66 @@ async def get_timing_summary(clear: bool = False) -> str:
		total = _get_ctx().timing_log.compute_total()
		if clear:
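This `clear` branch is where the replay fence comes in: clearing a log persists `.telemetry_cleared_at`, and on startup `_state._initialize` replays only sessions newer than `max(now - 24h, marker_ts)`. A minimal sketch of that fence follows. The function names match the rollup's `write_telemetry_clear_marker()` / `read_telemetry_clear_marker()`, but their signatures, the ISO-8601 file format, and `replay_lower_bound` are assumptions.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

MARKER_NAME = ".telemetry_cleared_at"
REPLAY_WINDOW = timedelta(hours=24)


def write_telemetry_clear_marker(log_root: Path, now: Optional[datetime] = None) -> None:
    """Record when telemetry was cleared (assumed format: ISO-8601 with UTC offset)."""
    if now is None:
        now = datetime.now(timezone.utc)
    log_root.mkdir(parents=True, exist_ok=True)
    (log_root / MARKER_NAME).write_text(now.isoformat(), encoding="utf-8")


def read_telemetry_clear_marker(log_root: Path) -> Optional[datetime]:
    """Return the marker timestamp, or None if absent or unreadable."""
    try:
        text = (log_root / MARKER_NAME).read_text(encoding="utf-8").strip()
        return datetime.fromisoformat(text)
    except (FileNotFoundError, ValueError):
        return None


def replay_lower_bound(log_root: Path, now: Optional[datetime] = None) -> datetime:
    """Sessions older than this bound are not replayed on startup."""
    if now is None:
        now = datetime.now(timezone.utc)
    window_start = now - REPLAY_WINDOW
    marker = read_telemetry_clear_marker(log_root)
    return max(window_start, marker) if marker is not None else window_start
```

Because the token and timing logs share one marker, `get_timing_summary(clear=True)` also fences token replay after a restart; that is the shared-fence side effect the review asked to document.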


Trecek commented Mar 11, 2026

Summary

Group 1: Pipeline Observability — Quota Guard Logging and Per-Step Elapsed Time

Group 2: Remediation — Extract _get_log_root() helper in tools_status.py

Architecture Impact

Operational Diagram

State Lifecycle Diagram

Module Dependency Diagram

Implementation Plans


Trecek left a comment


AutoSkillit Review Findings

- `src/autoskillit/server/helpers.py`
- `src/autoskillit/server/tools_status.py`
- `tests/server/test_tools_status.py`
- `tests/server/test_server_init.py`


Group 2: Remediation — Extract `_get_log_root()` helper in `tools_status.py`