Rectify: stdlib-only Contract for SKILL.md Python Blocks#508
Conversation
…c rule Adds a new Severity.ERROR semantic rule that detects when SKILL.md bash blocks import from the autoskillit package. Headless sessions run with the system Python interpreter which may not have autoskillit installed, making such imports unreliable or silently broken. The rule scans bash blocks via extract_bash_blocks() and matches three patterns: `from autoskillit`, `import autoskillit`, and string-literal __import__ form (`'autoskillit'`). Fires one RuleFinding per violating block with Severity.ERROR, blocking validate_recipe() from succeeding. Adds 7 tests (SC-PKG-1 through SC-PKG-7): positive cases for from-import, heredoc, bare import, and __import__ forms; negative cases for stdlib-only and no-python blocks; and an xfail(strict=False) integration test that asserts zero findings on the bundled merge-prs.yaml recipe — expected to pass once Parts B and C remove the four known violations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tic error assertions Pre-existing tests that assert zero ERROR-severity semantic findings on bundled recipes now fail because the new rule fires on open-pr and review-pr SKILL.md files. These violations are known and scheduled for Parts B and C. Filter the no-autoskillit-import-in-skill-python-block rule from the affected assertions so the broader test suite remains green until the violations are resolved.
… through TSA-8) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…y fixes - Add stdlib-only token_summary_appender.py PostToolUse hook that fires after every run_skill response, reads on-disk session logs, and appends ## Token Usage Summary to any PR opened via gh - Register new HookDef in HOOK_REGISTRY with run_skill-scoped matcher - Extend _extract_hook_commands() in test_hook_executability.py to cover PostToolUse hooks (previously only PreToolUse was subprocess-tested) - Replace hardcoded test_posttooluse_hooks_are_registered with generic structural invariant: every hook script must be registered as either PreToolUse or PostToolUse (no orphaned scripts) - Remove Step 0b from open-pr/SKILL.md (LLM-dependent autoskillit import inside Python heredoc — replaced by PostToolUse hook) - Remove stale token_summary_path input from open-pr skill_contracts.yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add token_summary_appender.py to _BROAD_EXCEPT_EXEMPT (stdlib-only hook, same treatment as pretty_output.py) - Document token_summary_appender.py in CLAUDE.md hooks section - Update test_open_pr_skill_self_retrieves_token_summary → _via_hook to reflect PostToolUse hook approach instead of removed Step 0b self-retrieval - Add hook reference to open-pr/SKILL.md Output section Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r, review-pr, analyze-prs SKILL.md - Add compute_domain_partitions, annotate_pr_diff, fetch_merge_queue_data callables to smoke_utils.py - Replace python3/python import blocks in all three SKILL.md files with stdlib file-reads - Insert run_python pre-staging steps in merge-prs.yaml (fetch_merge_queue_data, compute_domain_partitions, annotate_pr_diff) - Add domain_partitions_path, annotated_diff_path, hunk_ranges_path, merge_queue_data_path to skill_contracts.yaml - Upgrade SC-PKG-7 from xfail to a normal assertion (all violations resolved) - Add C-OIP-1, C-APR-1, C-RPR-1 contract tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nputs in all recipes - Update test_merge_prs.py routing assertions to reflect new pre-staging step routing (fetch_merge_queue_data before analyze_prs, compute_domain_partitions before open_integration_pr, annotate_pr_diff before review_pr_integration) - Wire annotated_diff_path and hunk_ranges_path in review_pr steps of implementation.yaml, implementation-groups.yaml, and remediation.yaml to satisfy uncaptured-handoff-consumer rule (context values are empty for non-pipeline callers; SKILL.md fallback handles absent paths gracefully) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… in review-pr contract Change annotated_diff_path and hunk_ranges_path from file_path to string type in skill_contracts.yaml. The file_path type triggers the uncaptured-handoff-consumer semantic rule for other recipe callers (implementation.yaml, remediation.yaml, implementation-groups.yaml) where open_pr_step (outputs: []) appears as an indirect predecessor via bypass-edge injection. Since these files are produced by a run_python step (not a skill), string type is architecturally correct and avoids false-positive rule violations in non-merge-prs recipe callers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace cross-package submodule imports (autoskillit.execution.pr_analysis, autoskillit.execution.diff_annotator, autoskillit.execution.github, autoskillit.core.io) with package __init__ re-exports. Replace direct write_text() calls with atomic_write() from autoskillit.core. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit PR Review — Verdict: changes_requested (10 inline findings: 1 critical, 9 warning)
| cwd=cwd, | ||
| ) | ||
| info = json.loads(repo_info.stdout) | ||
| owner = info["owner"]["login"] |
There was a problem hiding this comment.
[warning] bugs: info["owner"]["login"] uses bare dict access. If gh repo view returns unexpected JSON schema, KeyError or TypeError propagates uncaught. Wrap in try/except with descriptive error or validate keys.
There was a problem hiding this comment.
Investigated — this is intentional. The subprocess at L109-115 uses check=True, so CalledProcessError is raised before json.loads if gh exits non-zero. The gh repo view --json owner,name command returns a stable schema where owner.login is always present on success.
| from autoskillit.execution import parse_merge_queue_response # noqa: PLC0415 | ||
|
|
||
| repo_info = subprocess.run( | ||
| ["gh", "repo", "view", "--json", "owner,name"], |
There was a problem hiding this comment.
[warning] cohesion: fetch_merge_queue_data uses check=True for gh repo view but manually checks returncode for gh api graphql. Inconsistent error handling strategy within the same function.
There was a problem hiding this comment.
Investigated — this is intentional. check=True for gh repo view is a hard prerequisite (function cannot proceed without owner/name). Manual returncode check for GraphQL is a soft dependency — empty entries list is a valid degraded state (no PRs in queue). This dual-mode error handling reflects different criticality levels.
| tool: run_skill | ||
| with: | ||
| skill_command: "/autoskillit:analyze-prs ${{ inputs.base_branch }}" | ||
| skill_command: "/autoskillit:analyze-prs ${{ inputs.base_branch }} merge_queue_data_path=${{ context.merge_queue_data_path }}" |
There was a problem hiding this comment.
[warning] bugs: When fetch_merge_queue_data fails and routes to analyze_prs via on_failure, context.merge_queue_data_path is empty string. The skill_command passes merge_queue_data_path= (empty value). Verify analyze-prs handles empty-string path correctly via its bash guard.
There was a problem hiding this comment.
Investigated — this is intentional. The step note at lines 179-183 documents the failure path. The analyze-prs SKILL.md bash guard uses [ -n "${merge_queue_data_path:-}" ] to handle empty/unset values, falling back to QUEUE_ENTRIES="[]". This is the designed fallback (QUEUE_MODE=false).
| - name: base_branch | ||
| type: string | ||
| required: true | ||
| - name: annotated_diff_path |
There was a problem hiding this comment.
[warning] cohesion: annotated_diff_path and hunk_ranges_path use type string while merge_queue_data_path and domain_partitions_path use type file_path. All four are pre-staged file paths. Use consistent type file_path for all.
There was a problem hiding this comment.
Investigated — this is intentional. The type 'path' does not exist in skill_contracts.yaml (vocabulary: string, file_path, directory_path, file_path_list). The change from file_path to string for annotated_diff_path/hunk_ranges_path was deliberate (commit ee3b894) — file_path triggered false-positive uncaptured-handoff-consumer semantic rule violations.
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit review found 10 blocking issues (1 critical, 9 warning). See inline comments.
Critical
- token_summary_appender.py:69 —
re.searchreceives non-stringresult_textwhen inner JSONresultfield is a dict/list/int. TypeError at runtime.
Warnings (source)
- token_summary_appender.py:31 —
_log_root()ignoresXDG_DATA_HOME, diverging fromresolve_log_dir(). Silent zero-session result in CI. - token_summary_appender.py:111 — TOCTOU:
read_text()not guarded afterexists()check. - token_summary_appender.py:232 — Empty
gh pr viewstdout causes PR body replacement with only the token table. - token_summary_appender.py:234 —
CalledProcessErrorcaught by broadexcept Exceptionwith generic message. - smoke_utils.py:117 — Bare
info["owner"]["login"]dict access;KeyError/TypeErroron unexpected JSON. - smoke_utils.py:135 —
json.loadson GraphQL stdout not guarded againstJSONDecodeError. - smoke_utils.py:110 — Asymmetric error handling:
check=Trueforgh repo viewbut manual returncode forgh api graphql. - merge-prs.yaml:188 — Empty
context.merge_queue_data_pathpassed asmerge_queue_data_path=on failure path. - skill_contracts.yaml:389 —
annotated_diff_path/hunk_ranges_pathuse typestringwhile sibling inputs usefile_path.
Additional info-level findings (non-blocking)
- Redundant
mkdirinannotate_pr_diff(smoke_utils.py:86) - "Parts B and C" implementation-phase comments in 5 test files
- Weak assertions in TSA-2 and TSA-5 tests
post_only = all_scripts - pre_registeredlogic flaw in hook registration coverage test
…checks - Guard result_text with isinstance(str) before re.search in _extract_pr_url - Honor XDG_DATA_HOME in _log_root() instead of hardcoding ~/.local/share - Replace TOCTOU .exists()/.read_text() with try/except in _load_sessions - Guard empty PR body from gh pr view before appending token summary - Catch CalledProcessError separately from gh pr edit for better diagnostics - Wrap json.loads(graphql_result.stdout) in try/except JSONDecodeError Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
SKILL.md bash blocks can embed
python3 -corpython3 - <<'EOF'blocks that import from theautoskillitpackage. These blocks execute inside headless Claude Code sessions where the Python interpreter is the user's system Python — not the project's virtual environment. The package may not be installed there, causing silent or visible failures. The architecture has no structural mechanism preventing this pattern.This part adds a semantic validation rule that makes the pattern statically unacceptable: any SKILL.md that imports
autoskillitin a python3 bash block fails validation atvalidate_recipe()time, before the skill can be executed. Once merged, CI rejects future violations at commit time rather than surfacing them silently at runtime.Part B adds a deterministic PostToolUse hook (
token_summary_appender) replacing the LLM-dependent Step 0b token summary injection. Part C fixes the remaining four SKILL.md import violations caught by the new rule.Architecture Impact
Module Dependency Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%% graph TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; subgraph L3_Hooks ["L3 — HOOKS (PostToolUse)"] direction LR TSA["★ token_summary_appender.py<br/>━━━━━━━━━━<br/>stdlib-only PostToolUse hook<br/>Appends token table to PR body"] HREG["● hook_registry.py<br/>━━━━━━━━━━<br/>HookDef, HOOK_REGISTRY<br/>imports: core.pkg_root"] end subgraph L2_Recipe ["L2 — RECIPE (Semantic Validation)"] direction LR RSC["● rules_skill_content.py<br/>━━━━━━━━━━<br/>+no-autoskillit-import rule<br/>imports: core, recipe._analysis,<br/>recipe._skill_placeholder_parser,<br/>recipe.contracts, recipe.registry"] REG["recipe/registry.py<br/>━━━━━━━━━━<br/>@semantic_rule, RuleFinding<br/>imports: core, recipe._analysis,<br/>recipe.schema"] SPP["recipe/_skill_placeholder_parser.py<br/>━━━━━━━━━━<br/>extract_bash_blocks()<br/>imports: stdlib only"] CON["● recipe/skill_contracts.yaml<br/>━━━━━━━━━━<br/>Contract definitions"] end subgraph L1_Exec ["L1 — EXECUTION"] direction LR EINIT["● execution/__init__.py<br/>━━━━━━━━━━<br/>Re-export hub<br/>+parse_hunk_ranges"] end subgraph L0_Core ["L0 — CORE (Foundation)"] direction LR CORE["core/<br/>━━━━━━━━━━<br/>Severity, pkg_root,<br/>SkillResult, types"] end subgraph Skills ["SKILL.md FILES (bash blocks — not Python imports)"] direction LR OPR["● open-pr/SKILL.md<br/>━━━━━━━━━━<br/>Removed python3 heredoc<br/>importing autoskillit"] OIP["● open-integration-pr/SKILL.md<br/>━━━━━━━━━━<br/>Replaced autoskillit import<br/>with stdlib inline"] RPR["● review-pr/SKILL.md<br/>━━━━━━━━━━<br/>Replaced autoskillit import<br/>with stdlib inline"] APR["● analyze-prs/SKILL.md<br/>━━━━━━━━━━<br/>Replaced __import__ autoskillit<br/>with stdlib inline"] end subgraph Tests ["TEST COVERAGE"] direction LR TTSA["★ test_token_summary_appender.py<br/>━━━━━━━━━━<br/>TSA-1 through TSA-8"] TRSC["● test_rules_skill_content.py<br/>━━━━━━━━━━<br/>SC-PKG-1 through SC-PKG-7"] TRVD["★ test_review_pr_diff_annotation.py"] THEX["● test_hook_executability.py"] THRC["● test_hook_registration_coverage.py"] end %% L2 internal dependencies %% RSC -->|"imports"| REG RSC -->|"imports"| SPP RSC -->|"imports"| CON %% Downward dependencies (valid) %% RSC -->|"imports Severity"| CORE REG -->|"imports Severity"| CORE HREG -->|"imports pkg_root"| CORE EINIT -->|"re-exports from"| CORE %% TSA is stdlib-only — no package dependencies %% TSA -.-x|"NO autoskillit imports<br/>(stdlib-only by design)"| CORE %% Semantic rule scans SKILL.md files %% RSC -.->|"validates bash blocks"| OPR RSC -.->|"validates bash blocks"| OIP RSC -.->|"validates bash blocks"| RPR RSC -.->|"validates bash blocks"| APR %% Test dependencies %% TTSA -.->|"tests"| TSA TRSC -.->|"tests"| RSC THEX -.->|"tests"| HREG THRC -.->|"tests"| HREG %% Class assignments %% class TSA,TTSA,TRVD newComponent; class RSC,REG,SPP,CON phase; class HREG handler; class EINIT handler; class CORE stateNode; class OPR,OIP,RPR,APR gap; class TRSC,THEX,THRC output;Process Flow Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; START([Pipeline Start]) subgraph Validation ["★ SEMANTIC VALIDATION (no-autoskillit-import rule)"] direction TB VR["● validate_recipe / load_recipe<br/>━━━━━━━━━━<br/>MCP tool entry point<br/>calls run_semantic_rules()"] VS["● run_semantic_rules()<br/>━━━━━━━━━━<br/>Iterates _RULE_REGISTRY<br/>calls each @semantic_rule"] VI{"★ _check_no_autoskillit_import<br/>━━━━━━━━━━<br/>step.tool == run_skill?"} VB["★ extract_bash_blocks()<br/>━━━━━━━━━━<br/>Parse SKILL.md content<br/>extract python3 blocks"] VM{"★ Pattern match?<br/>━━━━━━━━━━<br/>from autoskillit...<br/>import autoskillit...<br/>__import__ autoskillit"} VF["★ RuleFinding(ERROR)<br/>━━━━━━━━━━<br/>Skill X bash block contains<br/>autoskillit import"] VP["Pass<br/>━━━━━━━━━━<br/>No violation found"] end subgraph Hook ["★ POSTTOOLUSE HOOK (token_summary_appender)"] direction TB HT["★ PostToolUse fires<br/>━━━━━━━━━━<br/>on run_skill response"] HP{"★ PR URL in response?<br/>━━━━━━━━━━<br/>regex: github.com/.../pull/N"} HS["★ Read session logs<br/>━━━━━━━━━━<br/>sessions.jsonl + token_usage.json<br/>filter by cwd"] HF["★ Format token table<br/>━━━━━━━━━━<br/>stdlib-only markdown formatter"] HA["★ gh pr edit --body-file<br/>━━━━━━━━━━<br/>Append Token Usage Summary<br/>to PR body"] HN["Skip<br/>━━━━━━━━━━<br/>No PR URL or no data"] end subgraph SkillFix ["● SKILL.MD FIXES (bash block cleanup)"] direction TB SF["● 4 SKILL.md files<br/>━━━━━━━━━━<br/>open-pr, open-integration-pr,<br/>review-pr, analyze-prs"] SR["Replaced autoskillit imports<br/>━━━━━━━━━━<br/>with stdlib-only inline code<br/>or PostToolUse hook delegation"] end COMPLETE([Pipeline Complete]) %% Validation flow %% START -->|"recipe loaded"| VR VR -->|"iterates rules"| VS VS -->|"invokes new rule"| VI VI -->|"yes: run_skill step"| VB VI -->|"no: skip step"| VP VB -->|"for each bash block"| VM VM -->|"match found"| VF VM -->|"no match"| VP %% Hook flow %% START -->|"run_skill completes"| HT HT --> HP HP -->|"yes"| HS HP -->|"no"| HN HS -->|"tokens found"| HF HS -->|"no data"| HN HF --> HA HA --> COMPLETE %% Skill fix flow %% SF --> SR SR -.->|"now passes"| VI class START,COMPLETE terminal; class VR,VS phase; class VI,VM,HP stateNode; class VB,HT,HS,HF,HA newComponent; class VF detector; class VP,HN handler; class SF,SR gap;Closes #506
Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/remediation-20260325-123256-483611/.autoskillit/temp/rectify/rectify_skill-python-autoskillit-import-contract_2026-03-25_160000_part_a.md🤖 Generated with Claude Code via AutoSkillit