Rectify: Skill Contracts Pattern-Example Binding — PART A ONLY#425
Conversation
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit PR Review — Verdict: changes_requested (5 actionable findings)
| Regression guard for issue #418: pattern had NO_GO (underscore) while SKILL.md | ||
| mandates NO GO (space). Must stay RED until skill_contracts.yaml is fixed. | ||
| """ | ||
| import re |
There was a problem hiding this comment.
[info] tests: import re inside the test function body is repeated across multiple test functions (lines 67 and 85). Move to module-level imports per Python convention.
| """For every skill with expected_output_patterns and pattern_examples, | ||
| every pattern must re.search-match at least one example. | ||
|
|
||
| Permanent architectural guard: pattern/SKILL.md divergence fails CI before production. |
There was a problem hiding this comment.
[info] slop: Permanent architectural guard is an unnecessary phrase in the docstring — all tests are permanent until removed. Reads as AI-generated justification prose; remove.
| ) | ||
|
|
||
|
|
||
| def test_pattern_examples_match_rule_fires_on_mismatch(monkeypatch) -> None: |
There was a problem hiding this comment.
[info] cohesion: Rule-level tests live in tests/recipe/test_rules_contracts.py while data-invariant tests live in tests/contracts/test_skill_contracts.py. These are symmetric pairs for the same feature; grouping strategy is inconsistent. Consider whether rule tests or data tests belong together.
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit review found 5 blocking issues (warning severity). See inline comments for details.
Findings summary:
src/autoskillit/recipe/rules_contracts.pyL90 [defense]: Unguardedre.search—re.errorfrom malformed patterns crashes the rule passsrc/autoskillit/recipe/rules_contracts.pyL72, L112 [slop]: Function docstrings restate decorator descriptions verbatimtests/contracts/test_skill_contracts.pyL73 [tests]: Test asserts ALL patterns match NO GO output, not just the relevant onetests/contracts/test_skill_contracts.pyL61 [cohesion]: Specifictest_audit_impl_no_go_pattern_matches_literal_outputis made redundant by the new parametrictest_every_pattern_example_matches_its_patterns
Fixes issue #418: `skill_contracts.yaml` had `(GO|NO_GO)` (underscore) while `audit-impl/SKILL.md` mandates `verdict = NO GO` (space). This caused any NO GO verdict to trigger CONTRACT_VIOLATION instead of routing to remediation. Changes: - skill_contracts.yaml: fix expected_output_patterns for audit-impl to use `(GO|NO GO)` instead of `(GO|NO_GO)` - tests/contracts/test_skill_contracts.py: add regression guard `test_audit_impl_no_go_pattern_matches_literal_output` that fails if the pattern stops matching `verdict = NO GO` - tests/recipe/test_skill_emit_consistency.py: expand scan from `skills/` (3 skills) to `skills/ + skills_extended/` (60 skills) - tests/execution/test_session_adjudication.py: fix 4 hardcoded `(GO|NO_GO)` patterns to `(GO|NO GO)` - tests/execution/test_headless.py: fix 2 hardcoded patterns - tests/execution/test_session_debug_logging.py: fix 1 hardcoded pattern Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds four tests that are RED on the post-Part-A codebase: - test_every_pattern_example_matches_its_patterns: CI guard that all expected_output_patterns match at least one declared example via re.search - test_every_skill_with_patterns_has_examples: every skill with patterns must also have pattern_examples - test_pattern_examples_match_rule_fires_on_mismatch: semantic rule ERROR fires when pattern matches none of the examples - test_missing_pattern_examples_rule_fires_when_examples_absent: semantic rule WARNING fires when patterns exist but examples are absent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends SkillContract with pattern_examples: list[str], loaded from skill_contracts.yaml by get_skill_contract() and serialized by generate_recipe_card(). Backward-compatible: defaults to [] when absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For every skill with expected_output_patterns, add a companion pattern_examples list with canonical literal emit strings drawn from SKILL.md output specifications. All 38 skills covered; all patterns verified to re.search-match at least one of their examples. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…c rules Two new semantic rules in rules_contracts.py: - pattern-examples-match (ERROR): fires when an expected_output_patterns regex matches none of the declared pattern_examples. Closes the static validation gap where a pattern could be syntactically valid but never match any real skill output. - missing-pattern-examples (WARNING): fires when a skill has expected_output_patterns but no pattern_examples, signaling that static validation cannot occur. Together with the test invariants and YAML updates, these rules create a triangular binding between patterns, examples, and actual skill output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…xample_matches_its_patterns
6af2b27 to
b1688d0
Compare
Summary
skill_contracts.yamlholdsexpected_output_patterns— regexes used at runtime to validatesession output. These regexes are authored manually, independently from the SKILL.md output
specs they're meant to validate. The
audit-implentry used(GO|NO_GO)(underscore) whileaudit-impl/SKILL.mdline 340 mandatesverdict = NO GO(space). When the skill ran tocompletion with a NO GO verdict,
_check_expected_patterns()returnedFalse, the session wasclassified as
ContentState.CONTRACT_VIOLATION, and the pipeline hard-stopped instead of routingto remediation.
Part A fixes the immediate mismatch, expands the test scope guard that should have caught it,
and corrects the seven test locations that propagated the wrong pattern.
Architecture Impact
Process Flow Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef modified fill:#0d47a1,stroke:#42a5f5,stroke-width:3px,color:#fff; %% TERMINALS %% START([run_skill called]) PIPELINE_OK([pipeline continues]) PIPELINE_RETRY([pipeline retries]) PIPELINE_FAIL([on_failure: escalate_stop]) subgraph ContractLookup ["Contract Lookup"] direction TB Contracts["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>expected_output_patterns:<br/>- verdict\\s*=\\s*(GO|NO GO)"] Resolver["● tools_execution.py<br/>━━━━━━━━━━<br/>output_pattern_resolver(skill_command)<br/>→ list[str]"] end subgraph SessionExec ["Session Execution"] direction TB Headless["● headless.py<br/>━━━━━━━━━━<br/>run_headless_core()<br/>subprocess emits NDJSON"] SubResult["SubprocessResult<br/>━━━━━━━━━━<br/>stdout / returncode<br/>termination / channel_confirmation"] end subgraph Recovery ["Recovery Attempts"] direction TB Rec1{"Separate Marker?<br/>━━━━━━━━━━<br/>marker in standalone msg?"} Rec2{"Pattern Recovery?<br/>━━━━━━━━━━<br/>patterns match assistant_messages<br/>+ channel != UNMONITORED?"} end subgraph OutcomeComputation ["Outcome Computation"] direction TB PatternCheck["_check_expected_patterns()<br/>━━━━━━━━━━<br/>re.search(pattern, result)<br/>AND across all patterns"] ContentState{"_evaluate_content_state()<br/>━━━━━━━━━━<br/>COMPLETE / ABSENT<br/>CONTRACT_VIOLATION / SESSION_ERROR"} Contradiction{"Contradiction guard<br/>━━━━━━━━━━<br/>success=True AND retry=True?"} DeadEnd{"Dead-end guard<br/>━━━━━━━━━━<br/>failed + no-retry<br/>+ channel confirmed?"} end subgraph Outcomes ["SkillResult"] direction LR Succeeded["SUCCEEDED<br/>━━━━━━━━━━<br/>success=True"] Retriable["RETRIABLE<br/>━━━━━━━━━━<br/>needs_retry=True"] Failed["FAILED<br/>━━━━━━━━━━<br/>subtype=adjudicated_failure"] end %% FLOW %% START --> Contracts Contracts -->|"patterns resolved"| Resolver Resolver -->|"patterns injected"| Headless Headless -->|"output captured"| SubResult SubResult --> Rec1 Rec1 -->|"yes: combine messages"| PatternCheck Rec1 -->|"no"| Rec2 Rec2 -->|"yes: reconstruct result"| PatternCheck Rec2 -->|"no"| PatternCheck PatternCheck -->|"all match → COMPLETE"| Contradiction PatternCheck -->|"any fail → CONTRACT_VIOLATION"| ContentState ContentState -->|"ABSENT → promote"| Retriable ContentState -->|"CONTRACT_VIOLATION<br/>SESSION_ERROR → terminal"| Failed Contradiction -->|"demote: retry wins"| Retriable Contradiction -->|"no conflict"| DeadEnd DeadEnd -->|"ABSENT → promote"| Retriable DeadEnd -->|"FAILED terminal"| Failed DeadEnd -->|"SUCCEEDED"| Succeeded Succeeded --> PIPELINE_OK Retriable --> PIPELINE_RETRY Failed --> PIPELINE_FAIL %% CLASS ASSIGNMENTS %% class START,PIPELINE_OK,PIPELINE_RETRY,PIPELINE_FAIL terminal; class Contracts,Resolver,Headless modified; class SubResult stateNode; class Rec1,Rec2 phase; class PatternCheck handler; class ContentState,Contradiction,DeadEnd detector; class Succeeded,Retriable,Failed stateNode;State Lifecycle Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; classDef modified fill:#0d47a1,stroke:#42a5f5,stroke-width:3px,color:#fff; subgraph ContractFields ["SKILL CONTRACT FIELDS (SkillContract)"] direction LR InitOnly["inputs / outputs<br/>━━━━━━━━━━<br/>INIT_ONLY<br/>Set at YAML load"] Patterns["● expected_output_patterns<br/>━━━━━━━━━━<br/>MUTABLE: corrected value<br/>verdict\\s*=\\s*(GO|NO GO)"] Examples["pattern_examples<br/>━━━━━━━━━━<br/>MUTABLE: examples list<br/>Binding cross-check target"] end subgraph Layer1 ["LAYER 1 — Semantic Rules (run_semantic_rules)"] direction TB MissingPatterns["missing-output-patterns<br/>━━━━━━━━━━<br/>WARNING if file_path output<br/>has no expected_output_patterns"] MissingExamples["missing-pattern-examples<br/>━━━━━━━━━━<br/>WARNING if patterns exist<br/>but pattern_examples is empty"] PatternMatch["pattern-examples-match<br/>━━━━━━━━━━<br/>ERROR if re.search(pattern, example)<br/>fails for ALL examples → blocks recipe"] end subgraph Layer2 ["LAYER 2 — Test Suite (pytest invariants)"] direction TB RegressionGuard["● test_audit_impl_no_go_pattern_matches_literal_output<br/>━━━━━━━━━━<br/>re.search(pattern, 'verdict = NO GO') must succeed<br/>Regression guard for issue #418"] AllPatternsTest["● test_every_pattern_example_matches_its_patterns<br/>━━━━━━━━━━<br/>ALL skills: every pattern must match<br/>at least one declared example"] EmitConsistency["● test_every_declared_output_has_emit_instruction<br/>━━━━━━━━━━<br/>skills/ + skills_extended/ scanned<br/>SKILL.md must have output_name = ..."] end subgraph Layer3 ["LAYER 3 — Runtime Enforcement (session.py)"] direction TB CheckPatterns["_check_expected_patterns()<br/>━━━━━━━━━━<br/>re.search(pattern, result)<br/>AND across all patterns"] ContentOutcome{"ContentState<br/>━━━━━━━━━━<br/>COMPLETE if all match<br/>CONTRACT_VIOLATION if any fail"} end subgraph Outcomes ["PIPELINE OUTCOME"] direction LR GoodRoute["SUCCEEDED<br/>━━━━━━━━━━<br/>→ on_result routes"] BadRoute["FAILED<br/>━━━━━━━━━━<br/>→ escalate_stop<br/>(was bug path)"] end %% FLOW %% InitOnly -->|"loaded once"| MissingPatterns Patterns -->|"validated by"| MissingPatterns Patterns -->|"cross-checked"| PatternMatch Examples -->|"cross-checked"| PatternMatch Examples -->|"validated by"| MissingExamples MissingPatterns -->|"WARNING only"| MissingExamples MissingExamples -->|"WARNING only"| PatternMatch PatternMatch -->|"ERROR → blocks recipe"| RegressionGuard RegressionGuard -->|"FAIL if NO_GO pattern"| AllPatternsTest AllPatternsTest -->|"FAIL if example mismatch"| EmitConsistency EmitConsistency -->|"FAIL if no emit line in SKILL.md"| CheckPatterns CheckPatterns -->|"match result"| ContentOutcome ContentOutcome -->|"COMPLETE"| GoodRoute ContentOutcome -->|"CONTRACT_VIOLATION"| BadRoute %% CLASS ASSIGNMENTS %% class InitOnly detector; class Patterns modified; class Examples phase; class MissingPatterns,MissingExamples gap; class PatternMatch stateNode; class RegressionGuard,AllPatternsTest,EmitConsistency modified; class CheckPatterns handler; class ContentOutcome stateNode; class GoodRoute output; class BadRoute detector;Closes #418
Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/remediation-418-20260316-202127-835503/temp/rectify/rectify_skill_contracts_pattern_examples_2026-03-16_000000_part_a.mdToken Usage Summary
Token Summary\n\n(No token data recorded for this pipeline run)
🤖 Generated with Claude Code via AutoSkillit