
Rectify: Skill Contracts Pattern-Example Binding — PART A ONLY #425

Merged
Trecek merged 9 commits into integration from skill-contracts-yaml-no-go-pattern-mismatch-causes-legitimat/418
Mar 17, 2026

@Trecek

@Trecek Trecek commented Mar 17, 2026

Summary

skill_contracts.yaml holds expected_output_patterns — regexes used at runtime to validate
session output. These regexes are authored manually, independently from the SKILL.md output
specs they're meant to validate. The audit-impl entry used (GO|NO_GO) (underscore) while
audit-impl/SKILL.md line 340 mandates verdict = NO GO (space). When the skill ran to
completion with a NO GO verdict, _check_expected_patterns() returned False, the session was
classified as ContentState.CONTRACT_VIOLATION, and the pipeline hard-stopped instead of routing
to remediation.
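
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, assuming the full audit-impl regex has the form shown in the diagrams in this PR (`verdict\s*=\s*(...)`); the constant names are illustrative and do not come from the codebase.

```python
import re

# Underscore alternative, as in the buggy skill_contracts.yaml entry.
OLD_PATTERN = r"verdict\s*=\s*(GO|NO_GO)"
# Space alternative, matching what audit-impl/SKILL.md mandates.
NEW_PATTERN = r"verdict\s*=\s*(GO|NO GO)"

# The literal line the skill emits on a NO GO verdict.
output = "verdict = NO GO"

old_match = re.search(OLD_PATTERN, output)
new_match = re.search(NEW_PATTERN, output)

print(old_match)           # None: neither "GO" nor "NO_GO" starts at "NO GO"
print(new_match.group(1))  # NO GO
```

A plain `verdict = GO` line matches both patterns, which is consistent with the failure only showing up on NO GO verdicts.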

Part A fixes the immediate mismatch, expands the test scope guard that should have caught it,
and corrects the seven test locations that propagated the wrong pattern.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef modified fill:#0d47a1,stroke:#42a5f5,stroke-width:3px,color:#fff;

    %% TERMINALS %%
    START([run_skill called])
    PIPELINE_OK([pipeline continues])
    PIPELINE_RETRY([pipeline retries])
    PIPELINE_FAIL([on_failure: escalate_stop])

    subgraph ContractLookup ["Contract Lookup"]
        direction TB
        Contracts["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>expected_output_patterns:<br/>- verdict\\s*=\\s*(GO|NO GO)"]
        Resolver["● tools_execution.py<br/>━━━━━━━━━━<br/>output_pattern_resolver(skill_command)<br/>→ list[str]"]
    end

    subgraph SessionExec ["Session Execution"]
        direction TB
        Headless["● headless.py<br/>━━━━━━━━━━<br/>run_headless_core()<br/>subprocess emits NDJSON"]
        SubResult["SubprocessResult<br/>━━━━━━━━━━<br/>stdout / returncode<br/>termination / channel_confirmation"]
    end

    subgraph Recovery ["Recovery Attempts"]
        direction TB
        Rec1{"Separate Marker?<br/>━━━━━━━━━━<br/>marker in standalone msg?"}
        Rec2{"Pattern Recovery?<br/>━━━━━━━━━━<br/>patterns match assistant_messages<br/>+ channel != UNMONITORED?"}
    end

    subgraph OutcomeComputation ["Outcome Computation"]
        direction TB
        PatternCheck["_check_expected_patterns()<br/>━━━━━━━━━━<br/>re.search(pattern, result)<br/>AND across all patterns"]
        ContentState{"_evaluate_content_state()<br/>━━━━━━━━━━<br/>COMPLETE / ABSENT<br/>CONTRACT_VIOLATION / SESSION_ERROR"}
        Contradiction{"Contradiction guard<br/>━━━━━━━━━━<br/>success=True AND retry=True?"}
        DeadEnd{"Dead-end guard<br/>━━━━━━━━━━<br/>failed + no-retry<br/>+ channel confirmed?"}
    end

    subgraph Outcomes ["SkillResult"]
        direction LR
        Succeeded["SUCCEEDED<br/>━━━━━━━━━━<br/>success=True"]
        Retriable["RETRIABLE<br/>━━━━━━━━━━<br/>needs_retry=True"]
        Failed["FAILED<br/>━━━━━━━━━━<br/>subtype=adjudicated_failure"]
    end

    %% FLOW %%
    START --> Contracts
    Contracts -->|"patterns resolved"| Resolver
    Resolver -->|"patterns injected"| Headless
    Headless -->|"output captured"| SubResult
    SubResult --> Rec1
    Rec1 -->|"yes: combine messages"| PatternCheck
    Rec1 -->|"no"| Rec2
    Rec2 -->|"yes: reconstruct result"| PatternCheck
    Rec2 -->|"no"| PatternCheck
    PatternCheck -->|"all match → COMPLETE"| Contradiction
    PatternCheck -->|"any fail → CONTRACT_VIOLATION"| ContentState
    ContentState -->|"ABSENT → promote"| Retriable
    ContentState -->|"CONTRACT_VIOLATION<br/>SESSION_ERROR → terminal"| Failed
    Contradiction -->|"demote: retry wins"| Retriable
    Contradiction -->|"no conflict"| DeadEnd
    DeadEnd -->|"ABSENT → promote"| Retriable
    DeadEnd -->|"FAILED terminal"| Failed
    DeadEnd -->|"SUCCEEDED"| Succeeded
    Succeeded --> PIPELINE_OK
    Retriable --> PIPELINE_RETRY
    Failed --> PIPELINE_FAIL

    %% CLASS ASSIGNMENTS %%
    class START,PIPELINE_OK,PIPELINE_RETRY,PIPELINE_FAIL terminal;
    class Contracts,Resolver,Headless modified;
    class SubResult stateNode;
    class Rec1,Rec2 phase;
    class PatternCheck handler;
    class ContentState,Contradiction,DeadEnd detector;
    class Succeeded,Retriable,Failed stateNode;
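
The pattern check and content-state classification in the flow above can be sketched roughly as follows. This is an illustrative stand-in, not the code in the repository: the function bodies, the rule that an empty result means ABSENT, and the omission of SESSION_ERROR are all assumptions made to keep the example small.

```python
import re
from enum import Enum


class ContentState(Enum):
    # Subset of the states named in the diagram; SESSION_ERROR is omitted here.
    COMPLETE = "complete"
    ABSENT = "absent"
    CONTRACT_VIOLATION = "contract_violation"


def check_expected_patterns(result: str, patterns: list[str]) -> bool:
    """Return True only if every pattern is found in result (AND semantics)."""
    return all(re.search(p, result) is not None for p in patterns)


def evaluate_content_state(result: str, patterns: list[str]) -> ContentState:
    """Hypothetical classifier: the empty-result rule is an assumption."""
    if not result.strip():
        return ContentState.ABSENT
    if check_expected_patterns(result, patterns):
        return ContentState.COMPLETE
    return ContentState.CONTRACT_VIOLATION


buggy = [r"verdict\s*=\s*(GO|NO_GO)"]
fixed = [r"verdict\s*=\s*(GO|NO GO)"]

state_buggy = evaluate_content_state("verdict = NO GO", buggy)
state_fixed = evaluate_content_state("verdict = NO GO", fixed)
```

Under these assumptions, the underscore pattern sends a legitimate NO GO verdict down the CONTRACT_VIOLATION branch, while the corrected pattern yields COMPLETE.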

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef modified fill:#0d47a1,stroke:#42a5f5,stroke-width:3px,color:#fff;

    subgraph ContractFields ["SKILL CONTRACT FIELDS (SkillContract)"]
        direction LR
        InitOnly["inputs / outputs<br/>━━━━━━━━━━<br/>INIT_ONLY<br/>Set at YAML load"]
        Patterns["● expected_output_patterns<br/>━━━━━━━━━━<br/>MUTABLE: corrected value<br/>verdict\\s*=\\s*(GO|NO GO)"]
        Examples["pattern_examples<br/>━━━━━━━━━━<br/>MUTABLE: examples list<br/>Binding cross-check target"]
    end

    subgraph Layer1 ["LAYER 1 — Semantic Rules (run_semantic_rules)"]
        direction TB
        MissingPatterns["missing-output-patterns<br/>━━━━━━━━━━<br/>WARNING if file_path output<br/>has no expected_output_patterns"]
        MissingExamples["missing-pattern-examples<br/>━━━━━━━━━━<br/>WARNING if patterns exist<br/>but pattern_examples is empty"]
        PatternMatch["pattern-examples-match<br/>━━━━━━━━━━<br/>ERROR if re.search(pattern, example)<br/>fails for ALL examples → blocks recipe"]
    end

    subgraph Layer2 ["LAYER 2 — Test Suite (pytest invariants)"]
        direction TB
        RegressionGuard["● test_audit_impl_no_go_pattern_matches_literal_output<br/>━━━━━━━━━━<br/>re.search(pattern, 'verdict = NO GO') must succeed<br/>Regression guard for issue #418"]
        AllPatternsTest["● test_every_pattern_example_matches_its_patterns<br/>━━━━━━━━━━<br/>ALL skills: every pattern must match<br/>at least one declared example"]
        EmitConsistency["● test_every_declared_output_has_emit_instruction<br/>━━━━━━━━━━<br/>skills/ + skills_extended/ scanned<br/>SKILL.md must have output_name = ..."]
    end

    subgraph Layer3 ["LAYER 3 — Runtime Enforcement (session.py)"]
        direction TB
        CheckPatterns["_check_expected_patterns()<br/>━━━━━━━━━━<br/>re.search(pattern, result)<br/>AND across all patterns"]
        ContentOutcome{"ContentState<br/>━━━━━━━━━━<br/>COMPLETE if all match<br/>CONTRACT_VIOLATION if any fail"}
    end

    subgraph Outcomes ["PIPELINE OUTCOME"]
        direction LR
        GoodRoute["SUCCEEDED<br/>━━━━━━━━━━<br/>→ on_result routes"]
        BadRoute["FAILED<br/>━━━━━━━━━━<br/>→ escalate_stop<br/>(was bug path)"]
    end

    %% FLOW %%
    InitOnly -->|"loaded once"| MissingPatterns
    Patterns -->|"validated by"| MissingPatterns
    Patterns -->|"cross-checked"| PatternMatch
    Examples -->|"cross-checked"| PatternMatch
    Examples -->|"validated by"| MissingExamples

    MissingPatterns -->|"WARNING only"| MissingExamples
    MissingExamples -->|"WARNING only"| PatternMatch
    PatternMatch -->|"ERROR → blocks recipe"| RegressionGuard

    RegressionGuard -->|"FAIL if NO_GO pattern"| AllPatternsTest
    AllPatternsTest -->|"FAIL if example mismatch"| EmitConsistency
    EmitConsistency -->|"FAIL if no emit line in SKILL.md"| CheckPatterns

    CheckPatterns -->|"match result"| ContentOutcome
    ContentOutcome -->|"COMPLETE"| GoodRoute
    ContentOutcome -->|"CONTRACT_VIOLATION"| BadRoute

    %% CLASS ASSIGNMENTS %%
    class InitOnly detector;
    class Patterns modified;
    class Examples phase;
    class MissingPatterns,MissingExamples gap;
    class PatternMatch stateNode;
    class RegressionGuard,AllPatternsTest,EmitConsistency modified;
    class CheckPatterns handler;
    class ContentOutcome stateNode;
    class GoodRoute output;
    class BadRoute detector;
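
The Layer 1 cross-check between patterns and examples might look something like the sketch below. The rule names and severities come from the diagram; the `Finding` type and the `check_pattern_examples` function are hypothetical and are not the API of `rules_contracts.py`.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str
    severity: str  # "ERROR" or "WARNING"
    message: str


def check_pattern_examples(
    skill: str, patterns: list[str], examples: list[str]
) -> list[Finding]:
    """Hypothetical combined check for the two rules named in the diagram."""
    findings: list[Finding] = []
    if patterns and not examples:
        # Nothing to cross-check against: warn, do not block.
        findings.append(
            Finding(
                "missing-pattern-examples",
                "WARNING",
                f"{skill}: patterns declared but pattern_examples is empty",
            )
        )
        return findings
    for pattern in patterns:
        # A pattern fails only if it matches none of the declared examples.
        if not any(re.search(pattern, ex) for ex in examples):
            findings.append(
                Finding(
                    "pattern-examples-match",
                    "ERROR",
                    f"{skill}: pattern {pattern!r} matches no example",
                )
            )
    return findings


errors = check_pattern_examples(
    "audit-impl", [r"verdict\s*=\s*(GO|NO_GO)"], ["verdict = NO GO"]
)
clean = check_pattern_examples(
    "audit-impl", [r"verdict\s*=\s*(GO|NO GO)"], ["verdict = NO GO"]
)
warned = check_pattern_examples("audit-impl", [r"verdict\s*=\s*(GO|NO GO)"], [])
```

Note that in this sketch a pattern passes as soon as any one example matches it, so the choice of examples determines how much the check can catch.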

Closes #418

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-418-20260316-202127-835503/temp/rectify/rectify_skill_contracts_pattern_examples_2026-03-16_000000_part_a.md

Token Usage Summary

(No token data recorded for this pipeline run)

🤖 Generated with Claude Code via AutoSkillit


@Trecek Trecek left a comment


AutoSkillit PR Review — Verdict: changes_requested (5 actionable findings)

Regression guard for issue #418: pattern had NO_GO (underscore) while SKILL.md
mandates NO GO (space). Must stay RED until skill_contracts.yaml is fixed.
"""
import re

[info] tests: import re inside the test function body is repeated across multiple test functions (lines 67 and 85). Move to module-level imports per Python convention.

"""For every skill with expected_output_patterns and pattern_examples,
every pattern must re.search-match at least one example.

Permanent architectural guard: pattern/SKILL.md divergence fails CI before production.

[info] slop: Permanent architectural guard is an unnecessary phrase in the docstring — all tests are permanent until removed. Reads as AI-generated justification prose; remove.

)


def test_pattern_examples_match_rule_fires_on_mismatch(monkeypatch) -> None:

[info] cohesion: Rule-level tests live in tests/recipe/test_rules_contracts.py while data-invariant tests live in tests/contracts/test_skill_contracts.py. These are symmetric pairs for the same feature, so the grouping strategy is inconsistent. Consider whether the rule tests and the data tests should live together.


@Trecek Trecek left a comment


AutoSkillit review found 5 blocking issues (warning severity). See inline comments for details.

Findings summary:

  • src/autoskillit/recipe/rules_contracts.py L90 [defense]: Unguarded re.search: an re.error from a malformed pattern crashes the rule pass
  • src/autoskillit/recipe/rules_contracts.py L72, L112 [slop]: Function docstrings restate decorator descriptions verbatim
  • tests/contracts/test_skill_contracts.py L73 [tests]: Test asserts ALL patterns match NO GO output, not just the relevant one
  • tests/contracts/test_skill_contracts.py L61 [cohesion]: Specific test_audit_impl_no_go_pattern_matches_literal_output is made redundant by the new parametric test_every_pattern_example_matches_its_patterns
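
For the [defense] finding, one possible shape of a guarded search is sketched below. The helper name `safe_search` and its return convention are hypothetical; how the actual rule should report a malformed pattern is left to the author.

```python
import re
from typing import Optional


def safe_search(pattern: str, text: str) -> tuple[bool, Optional[str]]:
    """Return (matched, error).

    A malformed pattern produces (False, message) instead of raising,
    so one bad regex cannot abort the whole rule pass.
    """
    try:
        return re.search(pattern, text) is not None, None
    except re.error as exc:
        return False, f"invalid regex {pattern!r}: {exc}"


# Valid pattern: behaves like a plain re.search.
ok_matched, ok_error = safe_search(r"verdict\s*=\s*(GO|NO GO)", "verdict = NO GO")

# Malformed pattern (unclosed group): reported, not raised.
bad_matched, bad_error = safe_search(r"verdict\s*=\s*(GO|NO GO", "verdict = NO GO")
```

A caller could turn the returned error string into its own finding rather than treating it as a simple non-match, which keeps a broken pattern distinguishable from a mismatched one.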

@Trecek Trecek enabled auto-merge March 17, 2026 06:11
Trecek and others added 9 commits March 17, 2026 07:58
Fixes issue #418: `skill_contracts.yaml` had `(GO|NO_GO)` (underscore)
while `audit-impl/SKILL.md` mandates `verdict = NO GO` (space). This
caused any NO GO verdict to trigger CONTRACT_VIOLATION instead of
routing to remediation.

Changes:
- skill_contracts.yaml: fix expected_output_patterns for audit-impl
  to use `(GO|NO GO)` instead of `(GO|NO_GO)`
- tests/contracts/test_skill_contracts.py: add regression guard
  `test_audit_impl_no_go_pattern_matches_literal_output` that fails
  if the pattern stops matching `verdict = NO GO`
- tests/recipe/test_skill_emit_consistency.py: expand scan from
  `skills/` (3 skills) to `skills/ + skills_extended/` (60 skills)
- tests/execution/test_session_adjudication.py: fix 4 hardcoded
  `(GO|NO_GO)` patterns to `(GO|NO GO)`
- tests/execution/test_headless.py: fix 2 hardcoded patterns
- tests/execution/test_session_debug_logging.py: fix 1 hardcoded pattern

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds four tests that are RED on the post-Part-A codebase:
- test_every_pattern_example_matches_its_patterns: CI guard that all
  expected_output_patterns match at least one declared example via re.search
- test_every_skill_with_patterns_has_examples: every skill with patterns
  must also have pattern_examples
- test_pattern_examples_match_rule_fires_on_mismatch: semantic rule ERROR
  fires when pattern matches none of the examples
- test_missing_pattern_examples_rule_fires_when_examples_absent: semantic
  rule WARNING fires when patterns exist but examples are absent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends SkillContract with pattern_examples: list[str], loaded from
skill_contracts.yaml by get_skill_contract() and serialized by
generate_recipe_card(). Backward-compatible: defaults to [] when absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
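
The backward-compatible default described in this commit can be illustrated with a simplified sketch. The real `SkillContract` has more fields (for example inputs and outputs), and `contract_from_mapping` is a hypothetical stand-in for the loading done by `get_skill_contract()`, whose actual signature is not shown in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class SkillContract:
    # Simplified: only the two fields relevant to this change are modelled.
    expected_output_patterns: list[str] = field(default_factory=list)
    pattern_examples: list[str] = field(default_factory=list)


def contract_from_mapping(raw: dict) -> SkillContract:
    """Build a contract from one parsed YAML entry (hypothetical helper)."""
    return SkillContract(
        expected_output_patterns=list(raw.get("expected_output_patterns", [])),
        # Entries written before this change have no pattern_examples key.
        pattern_examples=list(raw.get("pattern_examples", [])),
    )


legacy = contract_from_mapping(
    {"expected_output_patterns": [r"verdict\s*=\s*(GO|NO GO)"]}
)
current = contract_from_mapping(
    {
        "expected_output_patterns": [r"verdict\s*=\s*(GO|NO GO)"],
        "pattern_examples": ["verdict = GO", "verdict = NO GO"],
    }
)
```

Using `default_factory=list` rather than a shared `[]` default gives each contract its own list, so mutating one contract's examples cannot leak into another.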
For every skill with expected_output_patterns, add a companion
pattern_examples list with canonical literal emit strings drawn from
SKILL.md output specifications. All 38 skills covered; all patterns
verified to re.search-match at least one of their examples.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…c rules

Two new semantic rules in rules_contracts.py:
- pattern-examples-match (ERROR): fires when an expected_output_patterns
  regex matches none of the declared pattern_examples. Closes the static
  validation gap where a pattern could be syntactically valid but never
  match any real skill output.
- missing-pattern-examples (WARNING): fires when a skill has
  expected_output_patterns but no pattern_examples, signaling that
  static validation cannot occur.

Together with the test invariants and YAML updates, these rules create a
triangular binding between patterns, examples, and actual skill output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Trecek Trecek force-pushed the skill-contracts-yaml-no-go-pattern-mismatch-causes-legitimat/418 branch from 6af2b27 to b1688d0 Compare March 17, 2026 15:12
@Trecek Trecek added this pull request to the merge queue Mar 17, 2026
Merged via the queue into integration with commit 17ce2d6 Mar 17, 2026
2 checks passed
@Trecek Trecek deleted the skill-contracts-yaml-no-go-pattern-mismatch-causes-legitimat/418 branch March 17, 2026 15:16