Implementation Plan: Conversational Ingredient Collection#363

Merged
Trecek merged 6 commits into integration from
cook-ingredient-collection-conversational-prompting-instead/331
Mar 12, 2026

@Trecek Trecek commented Mar 12, 2026

Summary

Replace the mechanical "Prompt for input values using AskUserQuestion" instruction — which causes Claude to call AskUserQuestion once per ingredient field — with a conversational block that asks the user what they want to do, infers ingredient values from the free-form response, and only follows up on required ingredients that could not be inferred.

The change touches exactly two text locations: src/autoskillit/cli/_prompts.py (_build_orchestrator_prompt()) and src/autoskillit/server/tools_recipe.py (load_recipe docstring). Both receive identical replacement text. A cross-reference comment establishes the sync contract, satisfying REQ-ALIGN-001 without Python logic changes.

Requirements

PROMPT

  • REQ-PROMPT-001: The orchestrator prompt must instruct Claude to ask the user what they want to do conversationally rather than prompting for each ingredient field individually.
  • REQ-PROMPT-002: Claude must extract ingredient values from the user's free-form response and only follow up on required values that could not be inferred.
  • REQ-PROMPT-003: The ingredient collection instructions must be identical between the cook path (_build_orchestrator_prompt) and the load_recipe tool docstring path.

ALIGN

  • REQ-ALIGN-001: The orchestrator behavioral instructions shared between cook and open_kitchen + load_recipe must originate from a single source of truth.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Recipe Loaded])
    DONE([Pipeline Executing])

    subgraph Entry ["Entry Paths"]
        direction LR
        CLIPath["autoskillit cook recipe<br/>━━━━━━━━━━<br/>CLI Entry Point"]
        MCPPath["load_recipe MCP tool<br/>━━━━━━━━━━<br/>Agent calls tool"]
    end

    subgraph Prompt ["● Prompt Construction (modified)"]
        direction LR
        OrchestratorPrompt["● _build_orchestrator_prompt()<br/>━━━━━━━━━━<br/>cli/_prompts.py<br/>Injects --append-system-prompt"]
        LoadRecipeDoc["● load_recipe docstring<br/>━━━━━━━━━━<br/>server/tools_recipe.py<br/>LLM-visible behavioral contract"]
    end

    subgraph Collection ["● Conversational Ingredient Collection (modified)"]
        direction TB
        AskOpen["● Ask open-ended question<br/>━━━━━━━━━━<br/>What would you like to do?<br/>Single question only"]
        Infer["● Infer ingredient values<br/>━━━━━━━━━━<br/>task, source_dir, run_name…<br/>from free-form response"]
        Gate{"● Required values<br/>all inferred?"}
        FollowUp["● Follow-up question<br/>━━━━━━━━━━<br/>Ask only missing required<br/>values in one question"]
        Defaults["Accept optional ingredients<br/>━━━━━━━━━━<br/>Use defaults unless user<br/>explicitly overrode them"]
    end

    Execute["Execute pipeline steps<br/>━━━━━━━━━━<br/>Call MCP tools directly<br/>per recipe step sequence"]

    %% FLOW %%
    START --> CLIPath & MCPPath
    CLIPath --> OrchestratorPrompt
    MCPPath --> LoadRecipeDoc
    OrchestratorPrompt --> AskOpen
    LoadRecipeDoc --> AskOpen
    AskOpen --> Infer
    Infer --> Gate
    Gate -->|"yes — all required inferred"| Defaults
    Gate -->|"no — gaps remain"| FollowUp
    FollowUp --> Defaults
    Defaults --> Execute
    Execute --> DONE

    %% CLASS ASSIGNMENTS %%
    class START,DONE terminal;
    class CLIPath,MCPPath cli;
    class OrchestratorPrompt,LoadRecipeDoc handler;
    class AskOpen,Infer,FollowUp,Defaults newComponent;
    class Gate stateNode;
    class Execute phase;

Color Legend:

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal/CLI | Entry points and completion state |
| Orange | Handler | ● Modified prompt construction locations |
| Green | New Component | ● Modified conversational collection flow nodes |
| Teal | State | Decision: all required values inferred? |
| Purple | Phase | Pipeline execution (unchanged) |
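
The gate in the diagram above can be illustrated with a small Python sketch. The actual change is prompt text only, so nothing like this exists in the PR; `Ingredient`, `missing_required`, and `resolve_ingredients` are hypothetical names, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ingredient:
    """Hypothetical ingredient description (not the project's real schema)."""

    name: str
    required: bool = False
    default: Optional[str] = None


def missing_required(ingredients, inferred):
    """Return the names of required ingredients that were not inferred."""
    return [i.name for i in ingredients if i.required and i.name not in inferred]


def resolve_ingredients(ingredients, inferred, follow_up=None):
    """Merge inferred values, follow-up answers, and optional defaults."""
    values = dict(inferred)
    values.update(follow_up or {})
    for ing in ingredients:
        # Optional ingredients fall back to their default unless overridden.
        if not ing.required and ing.name not in values and ing.default is not None:
            values[ing.name] = ing.default
    return values


ingredients = [
    Ingredient("task", required=True),
    Ingredient("source_dir", required=True),
    Ingredient("run_name", default="run"),
]
# Values inferred from one free-form reply (assumed, for illustration).
inferred = {"task": "fix the login bug"}
# All gaps are collected first so a single follow-up question can cover them.
gaps = missing_required(ingredients, inferred)
final = resolve_ingredients(ingredients, inferred, {"source_dir": "./app"})
```

Here `gaps` holds only `source_dir`, so the follow-up asks about that one value and `run_name` keeps its default.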

Closes #331

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/conv-ingredients-20260311-214046-753062/temp/make-plan/conversational_ingredient_collection_plan_2026-03-11_120000.md

Token Usage Summary

Token Summary

fix

  • input_tokens: 650
  • output_tokens: 261005
  • cache_creation_input_tokens: 873078
  • cache_read_input_tokens: 46076601
  • invocation_count: 13
  • elapsed_seconds: 6260.165456347986

resolve_review

  • input_tokens: 391
  • output_tokens: 168303
  • cache_creation_input_tokens: 449347
  • cache_read_input_tokens: 21658363
  • invocation_count: 7
  • elapsed_seconds: 4158.092358417003

audit_impl

  • input_tokens: 2771
  • output_tokens: 174758
  • cache_creation_input_tokens: 612009
  • cache_read_input_tokens: 4594262
  • invocation_count: 14
  • elapsed_seconds: 5043.723946458995

open_pr

  • input_tokens: 261
  • output_tokens: 170926
  • cache_creation_input_tokens: 528835
  • cache_read_input_tokens: 8445137
  • invocation_count: 10
  • elapsed_seconds: 4311.458001982992

review_pr

  • input_tokens: 187
  • output_tokens: 246219
  • cache_creation_input_tokens: 493945
  • cache_read_input_tokens: 6342581
  • invocation_count: 8
  • elapsed_seconds: 4665.401066615992

plan

  • input_tokens: 1665
  • output_tokens: 187361
  • cache_creation_input_tokens: 655841
  • cache_read_input_tokens: 11017819
  • invocation_count: 8
  • elapsed_seconds: 3779.893521053007

verify

  • input_tokens: 8274
  • output_tokens: 151282
  • cache_creation_input_tokens: 664012
  • cache_read_input_tokens: 12450916
  • invocation_count: 10
  • elapsed_seconds: 3138.836677256011

implement

  • input_tokens: 822
  • output_tokens: 279442
  • cache_creation_input_tokens: 928381
  • cache_read_input_tokens: 62486155
  • invocation_count: 11
  • elapsed_seconds: 6355.765075421015

analyze_prs

  • input_tokens: 15
  • output_tokens: 29915
  • cache_creation_input_tokens: 67472
  • cache_read_input_tokens: 374939
  • invocation_count: 1
  • elapsed_seconds: 524.8599999569997

merge_pr

  • input_tokens: 166
  • output_tokens: 56815
  • cache_creation_input_tokens: 314110
  • cache_read_input_tokens: 4466380
  • invocation_count: 9
  • elapsed_seconds: 1805.746022643012

create_review_pr

  • input_tokens: 27
  • output_tokens: 20902
  • cache_creation_input_tokens: 62646
  • cache_read_input_tokens: 856008
  • invocation_count: 1
  • elapsed_seconds: 444.64073076299974

resolve_merge_conflicts

  • input_tokens: 88
  • output_tokens: 52987
  • cache_creation_input_tokens: 122668
  • cache_read_input_tokens: 7419558
  • invocation_count: 1
  • elapsed_seconds: 975.0682389580033

diagnose_ci

  • input_tokens: 26
  • output_tokens: 3415
  • cache_creation_input_tokens: 31027
  • cache_read_input_tokens: 326408
  • invocation_count: 2
  • elapsed_seconds: 97.75469558501209

🤖 Generated with Claude Code via AutoSkillit

Trecek and others added 3 commits March 11, 2026 22:39
Adds test_orchestrator_prompt_ingredient_collection_is_conversational to
tests/cli/test_cli_prompts.py and TestIngredientCollectionAlignment class
to tests/contracts/test_instruction_surface.py. These tests fail against
the current mechanical text and will pass after the implementation changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t collection

Replaces the mechanical "Prompt for input values using AskUserQuestion" instruction
in both _build_orchestrator_prompt() and the load_recipe docstring with a 4-clause
conversational block that asks users one open-ended question, infers ingredient
values from their free-form response, and only follows up on required values that
could not be inferred.

Adds a cross-reference [SYNC] comment in both locations to establish and document
the sync contract (REQ-ALIGN-001). Fixes all pre-commit linting issues in tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Trecek Trecek left a comment

AutoSkillit PR Review — Verdict: changes_requested

9 actionable findings. See inline comments.

- Step overview with routing, retry, and capture info
- Kitchen rules
2. Prompt for input values using AskUserQuestion
[SYNC: identical block in load_recipe docstring — tools_recipe.py]

[warning] slop: [SYNC: identical block in load_recipe docstring] annotation is redundant; the sync contract is already structurally enforced by TestIngredientCollectionAlignment. Remove the annotation to avoid maintenance-burden slop.

and formatting constraints needed for correct changes. Do NOT edit the YAML
file directly — always delegate modifications to write-recipe.
4. Prompt for input values using AskUserQuestion
[SYNC: identical block in _build_orchestrator_prompt — cli/_prompts.py]

[warning] slop: [SYNC: identical block in _build_orchestrator_prompt] annotation is redundant; the sync contract is structurally enforced by TestIngredientCollectionAlignment. Remove the annotation.


prompt = _build_orchestrator_prompt("<dummy yaml>")
# New conversational behavior must be present
assert "infer" in prompt.lower(), (

[warning] tests: Assertion 'infer' in prompt.lower() is too broad; the word 'infer' can appear in unrelated prompt content (e.g. 'inferred', 'inference'). Assert the full sentinel phrase 'infer as many ingredient values' instead.
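
A minimal sketch of why the broader check is weaker, using made-up prompt strings (only the sentinel phrase itself comes from the finding above):

```python
SENTINEL = "infer as many ingredient values"

# Hypothetical prompt text that mentions inference in an unrelated place.
unrelated = "Routing is inferred from the step overview."
# Hypothetical prompt text containing the sentinel phrase.
conversational = "Infer as many ingredient values as possible from the reply."

# The substring "infer" matches the unrelated sentence as well.
broad_hits_unrelated = "infer" in unrelated.lower()
# The full phrase matches only the conversational instruction.
sentinel_hits_unrelated = SENTINEL in unrelated.lower()
sentinel_hits_target = SENTINEL in conversational.lower()
```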

None,
)
if start is None:
return ""

[warning] defense: _extract_block returns an empty string on extraction failure; outer non-empty assertions attribute the failure to a missing sentinel, obscuring a broken extractor. Raise a descriptive exception inside _extract_block rather than returning an empty sentinel value.

# Walk back to find the opening line ("Collect ingredient values conversationally")
for i in range(start, -1, -1):
if "Collect ingredient values conversationally" in lines[i]:
start = i

[warning] bugs: The walk-back loop to find 'Collect ingredient values conversationally' silently leaves start at the sentinel line if the opener is not found; this produces a truncated one-line extraction that passes the alignment assertion falsely.

start = i
break
# Walk forward to find the closing clause (clause d about optional defaults)
end = start

[warning] cohesion: _extract_block is a nested function inside test_ingredient_collection_instructions_are_aligned. Move it to class scope so a future third surface does not require duplicating the extraction logic.

end = start
for i in range(start, min(start + 15, len(lines))):
if (
"optional ingredients" in lines[i].lower()

[warning] defense: 15-line window hardcoded in _extract_block (start + 15); if the conversational block ever grows beyond 15 lines, extraction is silently truncated and the alignment assertion passes on an incomplete block. Use a named constant with a comment or make the search length-independent.

"optional ingredients" in lines[i].lower()
or "default values" in lines[i].lower()
):
end = i

[warning] bugs: If neither 'optional ingredients' nor 'default values' appears within the 15-line window, end stays at start and _extract_block returns a single line. The equality assertion then trivially passes on truncated single-line blocks, masking real divergence.


prompt_block = _extract_block(prompt)
doc_block = _extract_block(doc)


[warning] bugs: line.strip() normalises indentation during block extraction. If the prompt block and docstring block ever use semantically-meaningful differing indentation (e.g., sub-clause nesting), stripping masks real divergence and the alignment assertion produces a false positive.
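
The difference can be reproduced with two made-up blocks that share their wording but differ in nesting. This is an illustration only; the helper names are hypothetical.

```python
import textwrap

# Two hypothetical blocks: same words, but clause (b) is nested in only one.
flat = "    a. Ask one question\n    b. Infer values\n"
nested = "    a. Ask one question\n        b. Infer values\n"


def strip_lines(text):
    """Per-line strip: discards all leading indentation."""
    return [line.strip() for line in text.splitlines()]


def dedent_lines(text):
    """Dedent: removes only the indentation common to every line."""
    return textwrap.dedent(text).splitlines()


# strip() hides the nesting difference; dedent() keeps it visible.
strip_equal = strip_lines(flat) == strip_lines(nested)
dedent_equal = dedent_lines(flat) == dedent_lines(nested)
```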

assert "infer" in prompt.lower(), (
"Orchestrator prompt must instruct Claude to infer ingredient values"
)
assert "free-form" in prompt.lower() or "open-ended" in prompt.lower(), (

[warning] tests: Disjunctive assertion ('free-form' in prompt.lower() or 'open-ended' in prompt.lower()) is weak: removing either phrase individually would not fail the test. Clarify whether both phrases are required (use two assertions) or if only one is canonical (assert it directly).

@Trecek Trecek left a comment

AutoSkillit review found 9 blocking issues. See inline comments.

Trecek and others added 2 commits March 12, 2026 09:53
…ools_recipe.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…o class scope

- Assert full sentinel phrase 'infer as many ingredient values' instead of partial 'infer'
- Split disjunctive assertion into two separate assertions for 'open-ended' and 'free-form'
- Move _extract_block from nested function to class-level @staticmethod
- Raise AssertionError with descriptive message on extraction failure instead of returning ""
- Raise AssertionError if opener not found during walk-back
- Replace hardcoded start+15 window with named constant _BLOCK_MAX_LINES=30 with comment
- Raise AssertionError if closing clause not found within search window
- Use textwrap.dedent() instead of line.strip() to preserve relative indentation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
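
For orientation, an extractor with the properties listed in this commit message might look roughly like the sketch below. It is not the code from the PR: the function is written at module level for brevity, the sample prompt text is invented, and the backward search deliberately starts one line above the sentinel.

```python
import textwrap

OPENER = "Collect ingredient values conversationally"
SENTINEL = "infer as many ingredient values"
CLOSERS = ("optional ingredients", "default values")
# Upper bound on how far past the opener the closing clause may appear.
BLOCK_MAX_LINES = 30


def extract_block(text):
    """Return the dedented conversational block, or raise AssertionError."""
    lines = text.splitlines()
    sentinel_idx = next(
        (i for i, line in enumerate(lines) if SENTINEL in line.lower()), None
    )
    if sentinel_idx is None:
        raise AssertionError(f"Sentinel phrase {SENTINEL!r} not found")

    # Walk back from the line above the sentinel to the opening line.
    opener_idx = next(
        (i for i in range(sentinel_idx - 1, -1, -1) if OPENER in lines[i]), None
    )
    if opener_idx is None:
        raise AssertionError(f"Opener {OPENER!r} not found above the sentinel")

    # Walk forward to the closing clause within the bounded window.
    limit = min(opener_idx + BLOCK_MAX_LINES, len(lines))
    closer_idx = next(
        (
            i
            for i in range(opener_idx, limit)
            if any(c in lines[i].lower() for c in CLOSERS)
        ),
        None,
    )
    if closer_idx is None:
        raise AssertionError(
            f"Closing clause not found within {BLOCK_MAX_LINES} lines "
            f"of opener at line {opener_idx}"
        )
    # dedent() removes shared indentation but keeps relative nesting.
    return textwrap.dedent("\n".join(lines[opener_idx : closer_idx + 1]))


# Invented sample text, indented as it might be inside a larger prompt.
sample = """\
  2. Collect ingredient values conversationally:
     a. Ask one open-ended question.
     b. Infer as many ingredient values as possible.
     c. Follow up only on missing required values.
     d. Accept optional ingredients at their default values.
  3. Execute the pipeline.
"""
block = extract_block(sample)
```

Because every failure path raises, a broken extractor can no longer masquerade as a matching one-line block.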
@Trecek Trecek left a comment

AutoSkillit PR Review — Verdict: changes_requested


class TestQuotaGuardStructuralEnforcement:
"""Quota guard must be structurally enforced by the PreToolUse hook, not via docstring."""


[warning] tests: _extract_block uses textwrap.dedent for comparison, but the two sources have different surrounding indentation levels. A whitespace-only change in one location will break the test without any semantic drift. Strip trailing whitespace per-line before comparing to make the comparison robust.

closer_idx = i
break
if closer_idx is None:
max_lines = TestIngredientCollectionAlignment._BLOCK_MAX_LINES

[warning] tests: _extract_block backward search starts at sentinel_idx (inclusive) rather than sentinel_idx - 1. If the opener and sentinel ever share a line the opener would match on the sentinel line, producing a zero-length block prefix silently.

assert "free-form" in prompt.lower(), "Orchestrator prompt must describe a free-form response"
# Old mechanical per-field instruction must be gone from the input-collection step
# (AskUserQuestion may still appear in the confirm-step section — that's expected)
lines = prompt.splitlines()

[info] tests: Negative assertion scans all prompt lines for Prompt for input values using AskUserQuestion. If this phrase ever appears in a comment or example block in the prompt, the test fails spuriously. Restrict the search to numbered-list lines for precision.
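
One possible way to restrict the search, sketched with a regular expression. The pattern for "numbered-list line" and both sample prompts are assumptions for illustration, not taken from the test file.

```python
import re

OLD_PHRASE = "Prompt for input values using AskUserQuestion"
# A numbered-list line: optional indentation, digits, a dot, then whitespace.
NUMBERED = re.compile(r"^\s*\d+\.\s")


def old_instruction_present(prompt):
    """True only if the old phrase appears on a numbered-list line."""
    return any(
        bool(NUMBERED.match(line)) and OLD_PHRASE in line
        for line in prompt.splitlines()
    )


# Hypothetical prompts: the phrase as a real step versus inside a note.
in_step = "1. Present the recipe\n2. Prompt for input values using AskUserQuestion\n"
in_note = (
    "1. Present the recipe\n"
    "Note: 'Prompt for input values using AskUserQuestion' was removed.\n"
)
```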

def test_ingredient_collection_instructions_are_aligned(self):
"""The conversational ingredient collection block must be identical in both paths."""
from autoskillit.cli._prompts import _build_orchestrator_prompt
from autoskillit.server.tools_recipe import load_recipe

[info] tests: test_orchestrator_prompt_has_conversational_sentinel and test_load_recipe_docstring_has_conversational_sentinel are redundant - their failure modes are fully covered by test_ingredient_collection_instructions_are_aligned.

raise AssertionError(
f"Could not find closing clause ('optional ingredients' or 'default values') "
f"within {max_lines} lines of opener at line {opener_idx}"
)

[warning] bugs: closer_idx matches the first line of step d but the continuation line is excluded from the block slice (lines[opener_idx : closer_idx + 1]). Divergence introduced only in that continuation line would produce a false negative - the alignment test would not catch it.

opener_idx + TestIngredientCollectionAlignment._BLOCK_MAX_LINES, len(lines)
)
for i in range(opener_idx, search_limit):
if "optional ingredients" in lines[i].lower() or "default values" in lines[i].lower():

[warning] cohesion: _extract_block is a @staticmethod but references TestIngredientCollectionAlignment._BLOCK_MAX_LINES by fully-qualified class name. Convert to @classmethod and use cls._BLOCK_MAX_LINES for consistency.
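
The practical difference shows up as soon as a subclass overrides the constant. The classes below are hypothetical stand-ins for the real test class, kept minimal on purpose.

```python
class AlignmentSketch:
    """Hypothetical stand-in for the real test class."""

    _BLOCK_MAX_LINES = 30

    @staticmethod
    def window_static(opener_idx, total):
        # Hard-codes the class name, so subclass overrides are ignored.
        return min(opener_idx + AlignmentSketch._BLOCK_MAX_LINES, total)

    @classmethod
    def window_cls(cls, opener_idx, total):
        # Resolves the constant through cls, so subclasses can override it.
        return min(opener_idx + cls._BLOCK_MAX_LINES, total)


class WiderWindow(AlignmentSketch):
    """Subclass that widens the search window."""

    _BLOCK_MAX_LINES = 50


static_limit = WiderWindow.window_static(0, 100)
cls_limit = WiderWindow.window_cls(0, 100)
```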

from autoskillit.server.tools_recipe import load_recipe

prompt = _build_orchestrator_prompt("<dummy yaml>")
doc = load_recipe.__doc__ or ""

[warning] cohesion: test_orchestrator_prompt_has_conversational_sentinel and test_load_recipe_docstring_has_conversational_sentinel add no coverage beyond the alignment test. Remove them to reduce asymmetric test granularity.

@@ -34,6 +34,31 @@ def test_build_orchestrator_prompt_not_in_app_module():
)



[info] cohesion: Duplicates the sentinel assertion already in TestIngredientCollectionAlignment.test_orchestrator_prompt_has_conversational_sentinel (test_instruction_surface.py). Split across two files without clear boundary.

- Step overview with routing, retry, and capture info
- Kitchen rules
2. Prompt for input values using AskUserQuestion
2. Collect recipe ingredients from the user:

[info] cohesion: The conversational collection block is duplicated verbatim between _prompts.py and tools_recipe.py docstring. The alignment test guards against drift but the duplication itself is a maintenance risk. Consider extracting to a shared constant.
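
A rough sketch of what a shared constant could look like. Everything here is hypothetical: the block wording is invented rather than copied from the PR, and the two functions are simplified stand-ins for `_build_orchestrator_prompt()` and `load_recipe`. Whether the MCP tool registration reads the docstring before or after such an assignment would need checking in the real code.

```python
# Hypothetical wording; the PR's actual block text is not reproduced here.
INGREDIENT_COLLECTION_BLOCK = """\
Collect ingredient values conversationally:
  a. Ask the user one open-ended question about what they want to do.
  b. Infer as many ingredient values as possible from the free-form reply.
  c. Follow up only on required values that could not be inferred.
  d. Accept optional ingredients at their default values unless overridden."""


def build_orchestrator_prompt(recipe_name):
    """Hypothetical prompt builder that interpolates the shared block."""
    return f"Recipe: {recipe_name}\n2. {INGREDIENT_COLLECTION_BLOCK}\n"


def load_recipe(name):
    """Load a recipe by name (placeholder body for this sketch)."""
    return {"name": name}


# Docstrings are plain attributes, so the shared block can be appended once
# at import time instead of being pasted into the docstring by hand.
load_recipe.__doc__ = f"{load_recipe.__doc__}\n\n4. {INGREDIENT_COLLECTION_BLOCK}\n"
```

With one definition feeding both surfaces, the alignment test becomes a guard against accidental bypass rather than the only thing keeping two copies in step.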

@Trecek Trecek left a comment

AutoSkillit review found 5 blocking issues (warnings). See inline comments. Verdict: changes_requested.

…sentinel tests

- Convert @staticmethod to @classmethod so _extract_block references
  cls._BLOCK_MAX_LINES instead of the fully-qualified class name
- Start backward search at sentinel_idx - 1 (exclusive) so opener and
  sentinel can never share a line and produce a zero-length block prefix
- Extend block slice to include continuation lines of the closing clause
  (lines deeper-indented than the closer line) so divergence in the
  continuation line is caught by the alignment assertion
- Strip trailing whitespace per line before returning so whitespace-only
  differences in surrounding indentation context do not break the test
- Remove test_orchestrator_prompt_has_conversational_sentinel and
  test_load_recipe_docstring_has_conversational_sentinel — both are
  fully covered by _extract_block raising AssertionError when the
  sentinel is absent in test_ingredient_collection_instructions_are_aligned

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
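
The continuation-line and trailing-whitespace handling described in this commit message could be approximated as below. The helper names and the sample lines are hypothetical; the real `_extract_block` may structure this differently.

```python
def indent_of(line):
    """Number of leading whitespace characters on a line."""
    return len(line) - len(line.lstrip())


def extend_to_continuation(lines, closer_idx):
    """Return the index of the last line belonging to the closing clause.

    Lines after the closer count as continuation while they are non-blank
    and indented more deeply than the closer line itself.
    """
    end = closer_idx
    base = indent_of(lines[closer_idx])
    for i in range(closer_idx + 1, len(lines)):
        if not lines[i].strip() or indent_of(lines[i]) <= base:
            break
        end = i
    return end


def normalize(lines):
    """Strip trailing whitespace so only meaningful differences remain."""
    return [line.rstrip() for line in lines]


# Hypothetical block: clause d wraps onto a second, deeper-indented line,
# and its first line carries stray trailing spaces.
lines = [
    "   c. Follow up only on missing required values.",
    "   d. Accept optional ingredients at their   ",
    "      default values unless overridden.",
    "3. Execute the pipeline.",
]
end = extend_to_continuation(lines, closer_idx=1)
block = normalize(lines[1 : end + 1])
```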
@Trecek Trecek added this pull request to the merge queue Mar 12, 2026
Merged via the queue into integration with commit cbb2da4 Mar 12, 2026
2 checks passed
@Trecek Trecek deleted the cook-ingredient-collection-conversational-prompting-instead/331 branch March 12, 2026 23:36
Trecek added a commit that referenced this pull request Mar 15, 2026
…, Headless Isolation (#404)

## Summary

Integration rollup of **43 PRs** (#293–#406) consolidating **62
commits** across **291 files** (+27,909 / −6,040 lines). This release
advances AutoSkillit from v0.2.0 to v0.3.1 with GitHub merge queue
integration, sub-recipe composition, a PostToolUse output reformatter,
headless session isolation guards, and comprehensive pipeline
observability — plus 24 new bundled skills, 3 new MCP tools, and 47 new
test files.

---

## Major Features

### GitHub Merge Queue Integration (#370, #362, #390)
- New `wait_for_merge_queue` MCP tool — polls a PR through GitHub's
merge queue until merged, ejected, or timed out (default 600s). Uses
REST + GraphQL APIs with stuck-queue detection and auto-merge
re-enrollment
- New `DefaultMergeQueueWatcher` L1 service (`execution/merge_queue.py`)
— never raises; all outcomes are structured results
- `parse_merge_queue_response()` pure function for GraphQL queue entry
parsing
- New `auto_merge` ingredient in `implementation.yaml` and
`remediation.yaml` — enrolls PRs in the merge queue after CI passes
- Full queue-mode path added to `merge-prs.yaml`: detect queue → enqueue
→ wait → handle ejections → re-enter
- `analyze-prs` skill gains Step 0.5 (merge queue detection) and Step
1.5 (CI/review eligibility filtering)
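
As a rough illustration of the "poll until merged, ejected, or timed out, never raise" behavior described above, the following sketch uses an injected state source instead of the GitHub APIs. `QueueResult`, `wait_for_queue`, the state strings, and the polling interval are all assumptions; only the 600-second default comes from the text.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueResult:
    """Hypothetical structured outcome; the real result shape may differ."""

    status: str  # "merged", "ejected", "timeout", or "error"
    detail: str = ""


def wait_for_queue(fetch_state, timeout=600.0, interval=10.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_state() until a terminal state or the timeout elapses.

    fetch_state is an injected callable returning "queued", "merged", or
    "ejected". Exceptions are converted to results rather than raised.
    """
    deadline = clock() + timeout
    while True:
        try:
            state = fetch_state()
        except Exception as exc:  # never raise; report the failure instead
            return QueueResult("error", str(exc))
        if state in ("merged", "ejected"):
            return QueueResult(state)
        if clock() >= deadline:
            return QueueResult("timeout", f"still {state} after {timeout}s")
        sleep(interval)


# Usage with a fake state source and no real sleeping.
states = iter(["queued", "queued", "merged"])
result = wait_for_queue(lambda: next(states), sleep=lambda _: None)
```

Injecting `clock` and `sleep` keeps the loop testable without waiting on real time.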

### Sub-Recipe Composition (#380)
- Recipe steps can now reference sub-recipes via `sub_recipe` + `gate`
fields — lazy-loaded and merged at validation time
- Composition engine in `recipe/_api.py`: `_merge_sub_recipe()` inlines
sub-recipe steps with safe name-prefixing and route remapping (`done` →
parent's `on_success`, `escalate` → parent's `on_failure`)
- `_build_active_recipe()` evaluates gate ingredients against
overrides/defaults; dual validation runs on both active and combined
recipes
- First sub-recipe: `sprint-prefix.yaml` — triage → plan → confirm →
dispatch workflow, gated by `sprint_mode` ingredient (hidden, default
false)
- Both `implementation.yaml` and `remediation.yaml` gain `sprint_entry`
placeholder step
- New semantic rules: `unknown-sub-recipe` (ERROR),
`circular-sub-recipe` (ERROR) with DFS cycle detection
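
DFS cycle detection of the kind named by `circular-sub-recipe` can be sketched as follows. This is a generic illustration, not the rule's implementation; `find_cycle` and the graph shape (recipe name mapped to referenced sub-recipe names) are assumptions, and the recipe names in the cyclic example are made up.

```python
def find_cycle(graph, start):
    """Return one cycle reachable from start as a list of names, or None.

    graph maps a recipe name to the sub-recipes it references.
    """
    path = []          # current DFS path, in visiting order
    on_path = set()    # same names, for fast membership checks
    done = set()       # nodes fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            # Back edge: the cycle is the path from the first visit of node.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for child in graph.get(node, ()):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


# Hypothetical reference graphs.
acyclic = {"implementation": ["sprint-prefix"], "sprint-prefix": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
```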

### PostToolUse Output Reformatter (#293, #405)
- `pretty_output.py` — new 671-line PostToolUse hook that rewrites raw
MCP JSON responses to Markdown-KV before Claude consumes them (30–77%
token overhead reduction)
- Dedicated formatters for 11 high-traffic tools (`run_skill`,
`run_cmd`, `test_check`, `merge_worktree`, `get_token_summary`, etc.)
plus a generic KV formatter for remaining tools
- Pipeline vs. interactive mode detection via hook config file
- Unwraps Claude Code's `{"result": "<json-string>"}` envelope before
dispatching
- 1,516-line test file with 40+ behavioral tests
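
The envelope unwrapping and generic key-value rendering might look something like this sketch. The exact Markdown-KV layout used by `pretty_output.py` is not shown in this summary, so the `**key**: value` format, `unwrap`, and `to_markdown_kv` are assumptions for illustration.

```python
import json


def unwrap(payload):
    """Unwrap a {"result": "<json-string>"} envelope if present."""
    if isinstance(payload, dict) and set(payload) == {"result"}:
        inner = payload["result"]
        if isinstance(inner, str):
            try:
                return json.loads(inner)
            except json.JSONDecodeError:
                return inner  # plain text result; pass through unchanged
    return payload


def to_markdown_kv(data):
    """Render a flat dict as 'key: value' lines; nested values stay JSON."""
    if not isinstance(data, dict):
        return str(data)
    lines = []
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            value = json.dumps(value, separators=(",", ":"))
        lines.append(f"**{key}**: {value}")
    return "\n".join(lines)


# Hypothetical tool response wrapped in the envelope.
raw = {"result": json.dumps({"success": True, "exit_code": 0})}
rendered = to_markdown_kv(unwrap(raw))
```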

### Headless Session Isolation (#359, #393, #397, #405, #406)
- **Env isolation**: `build_sanitized_env()` strips
`AUTOSKILLIT_PRIVATE_ENV_VARS` from subprocess environments, preventing
`AUTOSKILLIT_HEADLESS=1` from leaking into test runners
- **CWD path contamination defense**: `_inject_cwd_anchor()` anchors all
relative paths to session CWD; `_validate_output_paths()` checks
structured output tokens against CWD prefix; `_scan_jsonl_write_paths()`
post-session scanner catches actual Write/Edit/Bash tool calls outside
CWD
- **Headless orchestration guard**: new PreToolUse hook blocks
`run_skill`/`run_cmd`/`run_python` when `AUTOSKILLIT_HEADLESS=1`,
enforcing Tier 1/Tier 2 nesting invariant
- **`_require_not_headless()` server-side guard**: blocks 10
orchestration-only tools from headless sessions at the handler layer
- **Unified error response contract**: `headless_error_result()`
produces consistent 9-field responses;
`_build_headless_error_response()` canonical builder for all failure
paths in `tools_integrations.py`
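
The env-isolation idea can be shown in a few lines. This is a sketch under assumptions: the real contents of `AUTOSKILLIT_PRIVATE_ENV_VARS` are not listed here, and `sanitized_env_sketch` is a hypothetical stand-in for `build_sanitized_env()`.

```python
import os

# Assumed contents; the real AUTOSKILLIT_PRIVATE_ENV_VARS set is not shown here.
PRIVATE_ENV_VARS = frozenset({"AUTOSKILLIT_HEADLESS"})


def sanitized_env_sketch(base=None, private=PRIVATE_ENV_VARS):
    """Copy an environment mapping, dropping the private variables."""
    source = os.environ if base is None else base
    return {k: v for k, v in source.items() if k not in private}


parent = {"PATH": "/usr/bin", "AUTOSKILLIT_HEADLESS": "1"}
child_env = sanitized_env_sketch(parent)
# The result could then be passed as env= to subprocess.run(...).
```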

### Cook UX Overhaul (#375, #363)
- `open_kitchen` now accepts optional `name` + `overrides` — opens
kitchen AND loads recipe in a single call
- Pre-launch terminal preview with ANSI-colored flow diagram and
ingredients table via new `cli/_ansi.py` module
- `--dangerously-skip-permissions` warning banner with interactive
confirmation prompt
- Randomized session greetings from themed pools
- Orchestrator prompt rewritten: recipe YAML no longer injected via
`--append-system-prompt`; session calls `open_kitchen('{recipe_name}')`
as first action
- Conversational ingredient collection replaces mechanical per-field
prompting

---

## New MCP Tools

| Tool | Gate | Description |
|------|------|-------------|
| `wait_for_merge_queue` | Kitchen | Polls PR through GitHub merge queue
(REST + GraphQL) |
| `set_commit_status` | Kitchen | Posts GitHub Commit Status to a SHA
for review-first gating |
| `get_quota_events` | Ungated | Surfaces quota guard decisions from
`quota_events.jsonl` |

---

## Pipeline Observability (#318, #341)

- **`TelemetryFormatter`** (`pipeline/telemetry_fmt.py`) — single source
of truth for all telemetry rendering; replaces dual-formatter
anti-pattern. Four rendering modes, including Markdown table, terminal
table, and compact KV (for the PostToolUse hook)
- `get_token_summary` and `get_timing_summary` gain `format` parameter
(`"json"` | `"table"`)
- `wall_clock_seconds` merged into token summary output — see duration
alongside token counts in one call
- **Telemetry clear marker**: `write_telemetry_clear_marker()` /
`read_telemetry_clear_marker()` prevent token accounting drift on MCP
server restart after `clear=True`
- **Quota event logging**: `quota_check.py` hook now writes structured
JSONL events (`cache_miss`, `parse_error`, `blocked`, `approved`) to
`quota_events.jsonl`

---

## CI Watcher & Remote Resolution Fixes (#395, #406)

- **`CIRunScope` value object** — carries `workflow` + `head_sha` scope;
replaces bare `head_sha` parameter across all CI watcher signatures
- **Workflow filter**: `wait_for_ci` and `get_ci_status` accept
`workflow` parameter (falls back to project-level `config.ci.workflow`),
preventing unrelated workflows (version bumps, labelers) from satisfying
CI checks
- **`FAILED_CONCLUSIONS` expanded**: `failure` → `{failure, timed_out,
startup_failure, cancelled}`
- **Canonical remote resolver** (`execution/remote_resolver.py`):
`resolve_remote_repo()` with `REMOTE_PRECEDENCE = (upstream, origin)` —
correctly resolves `owner/repo` after `clone_repo` sets `origin` to
`file://` isolation URL
- **Clone isolation fix**: `clone_repo` now always clones from remote
URL (never local path); sets `origin=file:///<clone>` for isolation and
`upstream=<real_url>` for push/CI operations
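
A minimal sketch of precedence-based remote resolution, assuming remotes are available as a name-to-URL mapping. Skipping `file://` URLs and the GitHub URL pattern are assumptions of this sketch, and `pick_remote_url` and `owner_repo` are hypothetical helpers, not the functions in `execution/remote_resolver.py`.

```python
import re

REMOTE_PRECEDENCE = ("upstream", "origin")

# Matches HTTPS and SSH GitHub URLs, with or without a trailing ".git".
_OWNER_REPO = re.compile(
    r"github\.com[:/](?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?$"
)


def pick_remote_url(remotes, precedence=REMOTE_PRECEDENCE):
    """Return the first non-file:// URL following the precedence order."""
    for name in precedence:
        url = remotes.get(name)
        if url and not url.startswith("file://"):
            return url
    return None


def owner_repo(url):
    """Extract 'owner/repo' from a GitHub remote URL, or None."""
    match = _OWNER_REPO.search(url)
    return f"{match['owner']}/{match['repo']}" if match else None


# Remotes as they might look after an isolated clone (values are made up).
remotes = {
    "origin": "file:///tmp/clone",
    "upstream": "https://github.com/acme/widgets.git",
}
resolved = owner_repo(pick_remote_url(remotes))
```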

---

## PR Pipeline Gates (#317, #343)

- **`pipeline/pr_gates.py`**: `is_ci_passing()`, `is_review_passing()`,
`partition_prs()` — partitions PRs into
eligible/CI-blocked/review-blocked with human-readable reasons
- **`pipeline/fidelity.py`**: `extract_linked_issues()`
(Closes/Fixes/Resolves patterns), `is_valid_fidelity_finding()` schema
validation
- **`check_pr_mergeable`** now returns `mergeable_status` field
alongside boolean
- **`release_issue`** gains `target_branch` + `staged_label` parameters
for staged issue lifecycle on non-default branches (#392)
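
Linked-issue extraction along the lines of the Closes/Fixes/Resolves patterns might be sketched like this. The regular expression and `linked_issues` are assumptions for illustration; the real `extract_linked_issues()` may accept more forms (for example cross-repository references).

```python
import re

# Closing keywords followed by "#<number>"; case-insensitive.
_LINK = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def linked_issues(body):
    """Return issue numbers referenced by closing keywords, in order, deduplicated."""
    seen = []
    for match in _LINK.finditer(body):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


# Hypothetical PR body.
body = "Closes #331\nAlso fixes #12 and Resolves #331. See #99 for context."
issues = linked_issues(body)
```

A bare mention such as `See #99` is ignored because it lacks a closing keyword.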

---

## Recipe System Changes

### Structural
- `RecipeIngredient.hidden` field — excluded from ingredients table
(used for internal flags like `sprint_mode`)
- `Recipe.experimental` flag parsed from YAML
- `_TERMINAL_TARGETS` moved to `schema.py` as single source of truth
- `format_ingredients_table()` with sorted display order (required →
auto-detect → flags → optional → constants)
- Diagram rendering engine (~670 lines) removed from `diagrams.py` —
rendering now handled by `/render-recipe` skill; format version bumped
to v7

### Recipe YAML Changes
- **Deleted**: `audit-and-fix.yaml`, `batch-implementation.yaml`,
`bugfix-loop.yaml`
- **Renamed**: `pr-merge-pipeline.yaml` → `merge-prs.yaml`
- **`implementation.yaml`**: merge queue steps,
`auto_merge`/`sprint_mode` ingredients, `base_branch` default → `""`
(auto-detect), CI workflow filter, `extract_pr_number` step
- **`remediation.yaml`**: `topic` → `task` rename, merge queue steps,
`dry_walkthrough` retries:3 with forward-only routing, `verify` → `test`
rename
- **`merge-prs.yaml`**: full queue-mode path, `open-integration-pr` step
(replaces `create-review-pr`), post-PR mergeability polling, review
cycle with `resolve-review` retries

### New Semantic Rules
- `missing-output-patterns` (WARNING) — flags `run_skill` steps without
`expected_output_patterns`
- `unknown-sub-recipe` (ERROR) — validates sub-recipe references exist
- `circular-sub-recipe` (ERROR) — DFS cycle detection
- `unknown-skill-command` (ERROR) — validates skill names against
bundled set
- `telemetry-before-open-pr` (WARNING) — ensures telemetry step precedes
`open-pr`

---

## New Skills (24)

### Architecture Lens Family (13)
`arch-lens-c4-container`, `arch-lens-concurrency`,
`arch-lens-data-lineage`, `arch-lens-deployment`,
`arch-lens-development`, `arch-lens-error-resilience`,
`arch-lens-module-dependency`, `arch-lens-operational`,
`arch-lens-process-flow`, `arch-lens-repository-access`,
`arch-lens-scenarios`, `arch-lens-security`, `arch-lens-state-lifecycle`

### Audit Family (5)
`audit-arch`, `audit-bugs`, `audit-cohesion`, `audit-defense-standards`,
`audit-tests`

### Planning & Diagramming (3)
`elaborate-phase`, `make-arch-diag`, `make-req`

### Bug/Guard Lifecycle (2)
`design-guards`, `verify-diag`

### Pipeline (1)
`open-integration-pr` — creates integration PRs with per-PR details,
arch-lens diagrams, carried-forward `Closes #N` references, and
auto-closes collapsed PRs

### Sprint Planning (1 — gated by sub-recipe)
`sprint-planner` — selects a focused, conflict-free sprint from a triage
manifest

---

## Skill Modifications (Highlights)

- **`analyze-prs`**: merge queue detection, CI/review eligibility
filtering, queue-mode ordering
- **`dry-walkthrough`**: Step 4.5 Historical Regression Check (git
history mining + GitHub issue cross-reference)
- **`review-pr`**: deterministic diff annotation via
`diff_annotator.py`, echo-primary-obligation step, post-completion
confirmation, degraded-mode narration
- **`collapse-issues`**: content fidelity enforcement — per-issue
`fetch_github_issue` calls, copy-mode body assembly (#388)
- **`prepare-issue`**: multi-keyword dedup search, numbered candidate
selection, extend-existing-issue flow
- **`resolve-review`**: GraphQL thread auto-resolution after addressing
findings (#379)
- **`resolve-merge-conflicts`**: conflict resolution decision report
with per-file log (#389)
- **Cross-skill**: output tokens migrated to `key = value` format;
code-index paths made generic with fallback notes; arch-lens references
fully qualified; anti-prose guards at loop boundaries

---

## CLI & Hooks

### New CLI Commands
- `autoskillit install` — plugin installation + cache refresh
- `autoskillit upgrade` — `.autoskillit/scripts/` →
`.autoskillit/recipes/` migration

### CLI Changes
- `doctor`: plugin-aware MCP check, PostToolUse hook scanning, `--fix`
flag removed
- `init`: GitHub repo prompt, `.secrets.yaml` template, plugin-aware
registration
- `chefs-hat`: pre-launch banner, `--dangerously-skip-permissions`
confirmation
- `recipes render`: repurposed from generator to viewer (delegates to
`/render-recipe`)
- `serve`: server import deferred to after `configure_logging()` to
prevent stdout corruption

### New Hooks
- `branch_protection_guard.py` (PreToolUse) — denies
`merge_worktree`/`push_to_remote` targeting protected branches
- `headless_orchestration_guard.py` (PreToolUse) — blocks orchestration
tools in headless sessions
- `pretty_output.py` (PostToolUse) — MCP JSON → Markdown-KV reformatter

### Hook Infrastructure
- `HookDef.event_type` field — registry now handles both PreToolUse and
PostToolUse
- `generate_hooks_json()` groups entries by event type
- `_evict_stale_autoskillit_hooks` and `sync_hooks_to_settings` made
event-type-agnostic
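
A rough sketch of the grouping that `generate_hooks_json()` is described as doing is shown below. Only the `event_type` field comes from the notes above. The other `HookDef` fields (`matcher`, `command`), the helper name `group_hooks_by_event`, and the sample matchers and commands are assumptions for illustration, and the emitted entry shape is assumed to follow the Claude Code hooks settings layout.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HookDef:
    # Hypothetical fields; only `event_type` is named in this PR.
    matcher: str
    command: str
    event_type: str = "PreToolUse"


def group_hooks_by_event(hooks: list[HookDef]) -> dict[str, list[dict]]:
    """Bucket hook definitions by event type, preserving registry order."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for hook in hooks:
        grouped[hook.event_type].append(
            {
                "matcher": hook.matcher,
                "hooks": [{"type": "command", "command": hook.command}],
            }
        )
    return dict(grouped)


# Example registry with placeholder matchers and commands.
registry = [
    HookDef("merge_worktree|push_to_remote", "branch_protection_guard.py"),
    HookDef(".*", "pretty_output.py", event_type="PostToolUse"),
]
hooks_section = group_hooks_by_event(registry)
```

Keying on `event_type` means adding a new event kind needs no change to the grouping logic, which is what being event-type-agnostic amounts to here.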

---

## Core & Config

### New Core Modules
- `core/branch_guard.py` — `is_protected_branch()` pure function
- `core/github_url.py` — `parse_github_repo()` +
`normalize_owner_repo()` canonical parsers
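
As a minimal sketch of what a pure `is_protected_branch()` might look like, assuming the `safety.protected_branches` default listed under Config: the signature, the `DEFAULT_PROTECTED` constant, and the `refs/heads/` normalization are assumptions for illustration, not the module's confirmed behavior.

```python
from __future__ import annotations

from collections.abc import Iterable

# Mirrors the `safety.protected_branches` default from this PR.
DEFAULT_PROTECTED = ("main", "integration", "stable")


def is_protected_branch(
    branch: str, protected: Iterable[str] = DEFAULT_PROTECTED
) -> bool:
    """Pure check: no git calls, no I/O, same answer for the same inputs."""
    # Assumption: accept both short names and fully qualified refs.
    name = branch.strip().removeprefix("refs/heads/")
    return name in set(protected)
```

Because the check takes the protected list as an argument and does no I/O, a guard such as `branch_protection_guard.py` could call it directly and tests could cover it without a repository.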

### Core Type Expansions
- `AUTOSKILLIT_PRIVATE_ENV_VARS` frozenset
- `WORKER_TOOLS` / `HEADLESS_BLOCKED_UNGATED_TOOLS` split from
`UNGATED_TOOLS`
- `TOOL_CATEGORIES` — categorized listing for `open_kitchen` response
- `CIRunScope` — immutable scope for CI watcher calls
- `MergeQueueWatcher` protocol
- `SkillResult.cli_subtype` + `write_path_warnings` fields
- `SubprocessRunner.env` parameter

### Config
- `safety.protected_branches`: `[main, integration, stable]`
- `github.staged_label`: `"staged"`
- `ci.workflow`: workflow filename filter (e.g., `"tests.yml"`)
- `branching.default_base_branch`: `"integration"` → `"main"`
- `ModelConfig.default`: `str | None` → `str = "sonnet"`

---

## Infrastructure & Release

### Version
- `0.2.0` → `0.3.1` across `pyproject.toml`, `plugin.json`, `uv.lock`
- FastMCP dependency: `>=3.0.2` → `>=3.1.1,<4.0` (#399)

### CI/CD Workflows
- **`version-bump.yml`** (new) — auto patch-bumps `main` on integration
PR merge, force-syncs integration branch one patch ahead
- **`release.yml`** (new) — minor version bump + GitHub Release on merge
to `stable`
- **`codeql.yml`** (new) — CodeQL analysis for `stable` PRs (Python +
Actions)
- **`tests.yml`** — `merge_group:` trigger added; multi-OS now only for
`stable`

### PyPI Readiness
- `pyproject.toml`: `readme`, `license`, `authors`, `keywords`,
`classifiers`, `project.urls`, `hatch.build.targets.sdist` inclusion
list

### readOnlyHint Parallel Execution Fix
- All MCP tools except one are annotated `readOnlyHint=True`, which
enables Claude Code parallel tool execution (~7x speedup). The
deliberate exception is `wait_for_merge_queue`, which uses
`readOnlyHint=False` because it actually mutates queue state

### Tool Response Exception Boundary
- `track_response_size` decorator catches unhandled exceptions and
serializes them as `{"success": false, "subtype": "tool_exception"}` —
prevents FastMCP opaque error wrapping
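
The exception boundary can be sketched as a decorator along these lines. This is a simplified, synchronous illustration under stated assumptions: the name `exception_boundary` and the extra `error` field are hypothetical, and the real `track_response_size` decorator also tracks response size, which is omitted here.

```python
from __future__ import annotations

import functools
import json
from typing import Any, Callable


def exception_boundary(func: Callable[..., str]) -> Callable[..., str]:
    """Turn any unhandled exception into a structured JSON tool response."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # `success` and `subtype` follow the PR notes; `error` is an
            # illustrative extra field.
            return json.dumps(
                {
                    "success": False,
                    "subtype": "tool_exception",
                    "error": f"{type(exc).__name__}: {exc}",
                }
            )

    return wrapper


@exception_boundary
def flaky_tool() -> str:
    raise ValueError("bad input")


response = json.loads(flaky_tool())
```

The caller always receives a parseable payload with `success` and `subtype`, so the orchestrator can branch on the failure instead of seeing a wrapped framework error.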

### SkillResult Subtype Normalization (#358)
- `_normalize_subtype()` gate eliminates dual-source contradiction
between CLI subtype and session outcome
- Class 2 upward: `SUCCEEDED + error_subtype → "success"` (drain-race
artifact)
- Class 1 downward: `non-SUCCEEDED + "success" → "empty_result"` /
`"missing_completion_marker"` / `"adjudicated_failure"`
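
The two normalization classes can be illustrated with a small sketch. The function below is not the actual `_normalize_subtype()`: how the real gate chooses among `"empty_result"`, `"missing_completion_marker"`, and `"adjudicated_failure"` is not stated here, so the sketch takes the downgrade label as a caller-supplied parameter (`downgrade_to`, a hypothetical name).

```python
from __future__ import annotations


def normalize_subtype(
    outcome: str, cli_subtype: str, downgrade_to: str = "empty_result"
) -> str:
    """Reconcile the CLI-reported subtype with the session outcome.

    `outcome` is the session outcome (e.g. "SUCCEEDED"); `downgrade_to`
    is a hypothetical parameter standing in for the real selection logic.
    """
    if outcome == "SUCCEEDED":
        # Class 2 (upward): a succeeded session overrides an error subtype.
        return "success"
    if cli_subtype == "success":
        # Class 1 (downward): a non-succeeded session cannot report success.
        return downgrade_to
    # No contradiction: keep the CLI subtype as reported.
    return cli_subtype
```

After this gate the subtype and the outcome can no longer disagree, which is the dual-source contradiction the change removes.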

---

## Test Coverage

**47 new test files** (+12,703 lines) covering:

| Area | Key Tests |
|------|-----------|
| Merge queue watcher state machine | `test_merge_queue.py` (226 lines) |
| Clone isolation × CI resolution | `test_clone_ci_contract.py`, `test_remote_resolver.py` |
| PostToolUse hook | `test_pretty_output.py` (1,516 lines, 40+ cases) |
| Branch protection + headless guards | `test_branch_protection_guard.py`, `test_headless_orchestration_guard.py` |
| Sub-recipe composition | 5 test files (schema, loading, validation, sprint mode × 2) |
| Telemetry formatter | `test_telemetry_formatter.py` (281 lines) |
| PR pipeline gates | `test_analyze_prs_gates.py`, `test_review_pr_fidelity.py` |
| Diff annotator | `test_diff_annotator.py` (242 lines) |
| Skill compliance | Output token format, genericization, loop-boundary guards |
| Release workflows | Structural contracts for `version-bump.yml`, `release.yml` |
| Issue content fidelity | Body-assembling skills must call `fetch_github_issue` per-issue |
| CI watcher scope | `test_ci_params.py` (`workflow_id` query param composition) |

---

## Consolidated PRs

#293, #295, #314, #315, #316, #317, #318, #319, #323, #332, #336, #337,
#338, #339, #341, #343, #351, #358, #359, #360, #361, #362, #363, #366,
#368, #370, #375, #377, #378, #379, #380, #388, #389, #390, #391, #392,
#393, #395, #396, #397, #399, #405, #406

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>