
Rectify: Structured Output Instruction Hardening — PART B ONLY#484

Merged
Trecek merged 9 commits into integration from adjudicated-failure-false-positive-make-plan-session-killed/477 on Mar 23, 2026
Trecek (Collaborator) commented on Mar 22, 2026

Summary

Part A addressed the execution engine: when a skill writes a file but omits the structured output token, the system can now recover using tool call evidence (_synthesize_from_write_artifacts) and promotes the result to RETRIABLE(CONTRACT_RECOVERY) instead of abandoning with a terminal failure.

Part B addresses the source: SKILL.md instruction quality across 20+ path-capture skills. Two compounding defects caused models to intermittently omit the structured output token:

- Late instruction positioning: the token requirement appeared only in ## Output, not in ## Critical Constraints.
- A relative/absolute path contradiction between the save instruction and the contract regex.

This PR establishes the "Concrete Token Instruction" canonical pattern in Critical Constraints for every affected skill and adds a static CI test that prevents regression as new skills are added. Together, Part A (recovery when the model still fails) and Part B (a lower failure rate from improved instructions) provide defense-in-depth for structured output compliance.
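To make the relative/absolute contradiction concrete, here is a minimal sketch. It assumes the contract regex has the path-capture shape quoted in the commit log (`token\s*=\s*/.+`); the example paths and the `satisfies_contract` helper are hypothetical and not taken from the repository.

```python
import re

# Assumed contract regex shape for a path-capture token:
# the captured value must begin with "/" (an absolute path).
PLAN_PATH_PATTERN = re.compile(r"plan_path\s*=\s*/.+")


def satisfies_contract(output: str) -> bool:
    """Return True if the output contains an absolute-path plan_path token."""
    return PLAN_PATH_PATTERN.search(output) is not None


# Hypothetical outputs: a skill told to "save to temp/..." tends to echo
# the relative path, which the contract regex rejects.
relative_output = "plan_path = temp/make-plan/plan.md"
absolute_output = "plan_path = /home/user/project/temp/make-plan/plan.md"

print(satisfies_contract(relative_output))  # False
print(satisfies_contract(absolute_output))  # True
```

A model that follows a relative-path save instruction literally can therefore emit a token and still fail the contract, which is why the hardened instruction asks for the absolute path explicitly.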

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([run_skill invoked])
    SUCCEEDED([SkillResult: SUCCEEDED])
    RETRIABLE([SkillResult: RETRIABLE])
    FAILED([SkillResult: FAILED])

    subgraph Parsing ["Phase 1 — NDJSON Parsing"]
        direction TB
        Parse["● parse_session_result<br/>━━━━━━━━━━<br/>Scan stdout NDJSON<br/>Accumulate tool_uses + messages"]
        CSR["ClaudeSessionResult<br/>━━━━━━━━━━<br/>result, subtype, is_error<br/>tool_uses, assistant_messages<br/>write_call_count"]
    end

    subgraph Recovery ["Phase 2 — Recovery Chain"]
        direction TB
        RecA["Recovery A<br/>━━━━━━━━━━<br/>_recover_from_separate_marker<br/>Standalone %%ORDER_UP%% → join messages"]
        RecB["● Recovery B<br/>━━━━━━━━━━<br/>_recover_block_from_assistant_messages<br/>Channel confirmed + patterns missing?<br/>Scan assistant_messages for tokens"]
        RecC["● Recovery C (NEW)<br/>━━━━━━━━━━<br/>_synthesize_from_write_artifacts<br/>write_count≥1 + patterns still missing?<br/>Synthesize token from Write tool_use file_path"]
    end

    subgraph Outcome ["Phase 3 — Outcome Computation"]
        direction TB
        CompS["● _compute_success<br/>━━━━━━━━━━<br/>CHANNEL_B bypass gate<br/>TerminationReason dispatch<br/>_check_session_content"]
        CompR["_compute_retry<br/>━━━━━━━━━━<br/>context_exhausted → RESUME<br/>kill_anomaly → RESUME<br/>marker absent → EARLY_STOP"]
        ContradictionGuard{"Contradiction<br/>Guard<br/>success ∧ retry?"}
        DeadEnd{"Dead-End<br/>Guard<br/>¬success ∧ ¬retry<br/>+ channel confirmed?"}
        ContentEval["● _evaluate_content_state<br/>━━━━━━━━━━<br/>COMPLETE / ABSENT<br/>CONTRACT_VIOLATION / SESSION_ERROR"]
    end

    subgraph PostProcess ["Phase 4 — Post-Processing"]
        direction TB
        NormSub["● _normalize_subtype<br/>━━━━━━━━━━<br/>Resolve CLI vs adjudicated contradiction<br/>→ adjudicated_failure / empty_result / etc."]
        BudgetG1["_apply_budget_guard (pass 1)<br/>━━━━━━━━━━<br/>Consecutive failures > max?<br/>Override needs_retry=False"]
        CRGate["● CONTRACT_RECOVERY gate<br/>━━━━━━━━━━<br/>adjudicated_failure + write_count≥1?<br/>Promote to RETRIABLE(CONTRACT_RECOVERY)"]
        BudgetG2["_apply_budget_guard (pass 2)<br/>━━━━━━━━━━<br/>Cap CONTRACT_RECOVERY retries<br/>→ BUDGET_EXHAUSTED"]
        ZeroWrite["Zero-Write Gate<br/>━━━━━━━━━━<br/>success + write=0 + expected?<br/>Demote to RETRIABLE(ZERO_WRITES)"]
    end

    %% MAIN FLOW %%
    START --> Parse
    Parse --> CSR
    CSR --> RecA
    RecA -->|"completion_marker configured"| RecB
    RecA -->|"no marker config"| RecB
    RecB -->|"channel confirmed + patterns found in messages"| RecC
    RecB -->|"patterns not in messages"| RecC
    RecC -->|"write evidence + path-token patterns → synthesize tokens"| CompS
    RecC -->|"no write evidence or non-path patterns"| CompS
    CompS --> CompR
    CompR --> ContradictionGuard
    ContradictionGuard -->|"success=True AND retry=True<br/>demote success"| DeadEnd
    ContradictionGuard -->|"no contradiction"| DeadEnd
    DeadEnd -->|"¬success ∧ ¬retry ∧ channel confirmed"| ContentEval
    DeadEnd -->|"otherwise"| NormSub
    ContentEval -->|"ABSENT → DRAIN_RACE<br/>promote to RETRIABLE"| NormSub
    ContentEval -->|"CONTRACT_VIOLATION<br/>SESSION_ERROR → FAILED"| NormSub
    NormSub --> BudgetG1
    BudgetG1 -->|"needs_retry=True → budget check"| CRGate
    BudgetG1 -->|"budget exhausted → BUDGET_EXHAUSTED"| FAILED
    CRGate -->|"adjudicated_failure + write_count≥1<br/>→ needs_retry=True, CONTRACT_RECOVERY"| BudgetG2
    CRGate -->|"conditions not met"| ZeroWrite
    BudgetG2 -->|"budget not exhausted"| ZeroWrite
    BudgetG2 -->|"budget exhausted"| FAILED
    ZeroWrite -->|"success + write=0 + expected"| RETRIABLE
    ZeroWrite -->|"all gates passed"| SUCCEEDED
    ZeroWrite -->|"success=False, no retry"| FAILED
    ZeroWrite -->|"needs_retry=True"| RETRIABLE

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class SUCCEEDED,RETRIABLE,FAILED terminal;
    class Parse,RecA handler;
    class CSR stateNode;
    class RecB,RecC newComponent;
    class CompS,CompR phase;
    class ContradictionGuard,DeadEnd,ContentEval detector;
    class NormSub,BudgetG1,BudgetG2,ZeroWrite handler;
    class CRGate newComponent;

Color Legend:

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and result states (SUCCEEDED / RETRIABLE / FAILED) |
| Orange | Handler | Processing nodes (parse, normalize, budget guard) |
| Teal | State | ClaudeSessionResult data container |
| Purple | Phase | Outcome computation nodes |
| Green | New/Modified | Nodes changed in this PR (● Recovery B, ● Recovery C, ● CONTRACT_RECOVERY gate) |
| Red | Detector | Validation and guard nodes (dead-end guard, content evaluation) |
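The Phase 4 ordering in the diagram (budget guard, then the CONTRACT_RECOVERY gate, then a second budget guard) can be sketched roughly as follows. This is a simplified illustration, not the code in headless.py: the `Outcome` dataclass, the function signatures, and the string constants are assumptions, and the zero-write gate is omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """Simplified stand-in for the fields the post-processing gates read."""
    success: bool
    needs_retry: bool
    subtype: str
    retry_reason: Optional[str] = None


def apply_budget_guard(outcome: Outcome, consecutive_failures: int, max_retries: int) -> Outcome:
    """Cancel a pending retry once consecutive failures exceed the cap."""
    if outcome.needs_retry and consecutive_failures > max_retries:
        return replace(outcome, needs_retry=False, retry_reason="BUDGET_EXHAUSTED")
    return outcome


def contract_recovery_gate(outcome: Outcome, write_call_count: int) -> Outcome:
    """Promote an adjudicated failure to a retry when a file was actually written."""
    if (
        outcome.subtype == "adjudicated_failure"
        and not outcome.needs_retry
        and write_call_count >= 1
    ):
        return replace(outcome, needs_retry=True, retry_reason="CONTRACT_RECOVERY")
    return outcome


def post_process(outcome: Outcome, write_call_count: int,
                 consecutive_failures: int, max_retries: int) -> Outcome:
    """Budget guard (pass 1), CONTRACT_RECOVERY gate, budget guard (pass 2)."""
    outcome = apply_budget_guard(outcome, consecutive_failures, max_retries)
    if outcome.retry_reason == "BUDGET_EXHAUSTED":
        return outcome  # pass 1 already terminated the retry chain
    promoted = contract_recovery_gate(outcome, write_call_count)
    if promoted.retry_reason == "CONTRACT_RECOVERY":
        # Pass 2 caps the retries that the gate has just introduced.
        promoted = apply_budget_guard(promoted, consecutive_failures, max_retries)
    return promoted
```

The second guard pass is what keeps a skill that repeatedly writes a file but never emits its token from retrying without bound.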

Error Resilience Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    T_COMPLETE([SUCCEEDED])
    T_RETRIABLE([RETRIABLE])
    T_FAILED([FAILED — terminal])

    subgraph Prevention ["PREVENTION — Part B: SKILL.md Instruction Hardening"]
        direction TB
        SKILLMd["● SKILL.md<br/>━━━━━━━━━━<br/>Token instruction moved to<br/>Critical Constraints section<br/>Absolute path example given"]
        StaticTest["● test_skill_output_compliance.py<br/>━━━━━━━━━━<br/>Static regex: token instruction<br/>must appear inside ## Critical Constraints<br/>Catches regression as new skills added"]
        Contracts["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>setup-project contract removed<br/>(no emit instruction existed)"]
    end

    subgraph Detection ["DETECTION — Contract Violation Recognition"]
        direction TB
        PatternCheck["● _check_expected_patterns<br/>━━━━━━━━━━<br/>Normalize bold markdown<br/>AND-match all regex patterns<br/>vs session.result"]
        ContentEval["● _evaluate_content_state<br/>━━━━━━━━━━<br/>COMPLETE / ABSENT<br/>CONTRACT_VIOLATION / SESSION_ERROR"]
        DeadEnd{"Dead-End Guard<br/>━━━━━━━━━━<br/>¬success ∧ ¬retry<br/>+ channel confirmed?"}
    end

    subgraph RecoveryChain ["RECOVERY CHAIN — Three-Stage Fallback"]
        direction TB
        RecA["Recovery A: Separate Marker<br/>━━━━━━━━━━<br/>Standalone %%ORDER_UP%% message<br/>→ join assistant_messages"]
        RecB["● Recovery B: Assistant Messages<br/>━━━━━━━━━━<br/>Channel confirmed + patterns missing<br/>→ scan all assistant_messages<br/>(drain-race artifact fix)"]
        RecC["● Recovery C: Artifact Synthesis (NEW)<br/>━━━━━━━━━━<br/>write_count≥1 + patterns still absent<br/>→ scan tool_uses for Write file_path<br/>→ synthesize token = /abs/path"]
    end

    subgraph CircuitBreakers ["CIRCUIT BREAKERS — Retry Caps"]
        direction TB
        BudgetG1["_apply_budget_guard (pass 1)<br/>━━━━━━━━━━<br/>consecutive failures > max_consecutive_retries<br/>→ BUDGET_EXHAUSTED, needs_retry=False"]
        CRGate["● CONTRACT_RECOVERY Gate (NEW)<br/>━━━━━━━━━━<br/>adjudicated_failure + write_count≥1<br/>→ promote to RETRIABLE(CONTRACT_RECOVERY)"]
        BudgetG2["_apply_budget_guard (pass 2)<br/>━━━━━━━━━━<br/>Caps CONTRACT_RECOVERY retries<br/>(prevents infinite loop)"]
        DrainRace["Dead-End Guard → DRAIN_RACE<br/>━━━━━━━━━━<br/>ABSENT state: channel confirmed completion<br/>but result empty → transient → retry"]
    end

    %% PREVENTION → DETECTION %%
    SKILLMd -->|"reduced omission rate"| PatternCheck
    StaticTest -->|"regression guard"| SKILLMd
    Contracts -->|"removes false positives"| PatternCheck

    %% DETECTION %%
    PatternCheck -->|"patterns match"| T_COMPLETE
    PatternCheck -->|"patterns absent"| ContentEval
    ContentEval --> DeadEnd

    %% RECOVERY CHAIN (pre-detection) %%
    RecA -->|"token found in messages"| PatternCheck
    RecA -->|"not found"| RecB
    RecB -->|"token found in assistant_messages"| PatternCheck
    RecB -->|"not found"| RecC
    RecC -->|"synthesized token → updated result"| PatternCheck
    RecC -->|"no write evidence"| PatternCheck

    %% DEAD-END ROUTING %%
    DeadEnd -->|"ABSENT → drain-race"| DrainRace
    DeadEnd -->|"CONTRACT_VIOLATION"| BudgetG1
    DeadEnd -->|"SESSION_ERROR"| T_FAILED

    %% CIRCUIT BREAKERS %%
    DrainRace -->|"RETRIABLE(DRAIN_RACE)"| T_RETRIABLE
    BudgetG1 -->|"budget not exhausted"| CRGate
    BudgetG1 -->|"budget exhausted"| T_FAILED
    CRGate -->|"write_count≥1 → RETRIABLE(CONTRACT_RECOVERY)"| BudgetG2
    CRGate -->|"no write evidence → terminal"| T_FAILED
    BudgetG2 -->|"budget not exhausted"| T_RETRIABLE
    BudgetG2 -->|"budget exhausted → BUDGET_EXHAUSTED"| T_FAILED

    %% CLASS ASSIGNMENTS %%
    class T_COMPLETE,T_RETRIABLE,T_FAILED terminal;
    class SKILLMd,Contracts newComponent;
    class StaticTest newComponent;
    class PatternCheck,ContentEval detector;
    class DeadEnd stateNode;
    class RecA handler;
    class RecB,RecC newComponent;
    class BudgetG1,BudgetG2 phase;
    class CRGate newComponent;
    class DrainRace output;

Color Legend:

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Final states: SUCCEEDED, RETRIABLE, FAILED |
| Green | New/Modified | Components changed in this PR |
| Red | Detector | Pattern matching and content state evaluation |
| Teal | State | Dead-end guard decision node |
| Orange | Handler | Recovery A (existing) |
| Purple | Phase | Budget guard passes |
| Dark Teal | Recovery | Drain-race promotion |
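As a rough illustration of Recovery C, the sketch below synthesizes a missing path token from Write/Edit tool-use evidence and prepends it to the result. The tool-use dictionary shape (`name`, `file_path`), the choice of the most recent write, and the function signature are assumptions made for this example; the real `_synthesize_from_write_artifacts` may differ.

```python
import re

# Recognizes contract patterns of the path-capture shape "<token>\s*=\s*/.+".
# Anything else (for example a verdict pattern) is left to text compliance.
_PATH_CAPTURE = re.compile(r"^(?P<token>\w+)\\s\*=\\s\*/\.\+$")


def synthesize_from_write_artifacts(result: str, tool_uses: list, patterns: list) -> str:
    """Prepend synthesized path tokens for unmet path-capture patterns."""
    written = [
        str(use.get("file_path") or "")
        for use in tool_uses
        if use.get("name") in {"Write", "Edit"}
    ]
    written = [path for path in written if path.startswith("/")]
    if not written:
        return result  # no write evidence: nothing to synthesize from

    synthesized = []
    for pattern in patterns:
        if re.search(pattern, result):
            continue  # the model already emitted this token
        match = _PATH_CAPTURE.match(pattern)
        if match is None:
            continue  # non-path pattern: never synthesized
        # Assumption: the most recent write is the artifact of interest.
        synthesized.append(f"{match.group('token')} = {written[-1]}")

    if not synthesized:
        return result
    return "\n".join(synthesized + [result])
```

Restricting synthesis to path-capture patterns matters: a file path is objective evidence recorded by the tool call, whereas a value such as a verdict can only come from the model's own text.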

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    T_OK([Contract Satisfied])
    T_RETRY([Contract: Retry Eligible])
    T_VIOLATED([Contract Violated — Terminal])

    subgraph ContractDef ["CONTRACT DEFINITION LAYER"]
        direction LR
        SkillContracts["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>expected_output_patterns<br/>completion_marker<br/>write_behavior<br/>setup-project contract removed"]
        SkillMD["● SKILL.md<br/>━━━━━━━━━━<br/>Critical Constraints section<br/>Concrete token instruction<br/>Absolute path example<br/>(20+ skills updated)"]
        StaticTest["● test_skill_output_compliance.py<br/>━━━━━━━━━━<br/>CI gate: token instruction<br/>must be in ## Critical Constraints<br/>Regex: r'## Critical Constraints.*plan_path\\s*='<br/>Covers all path-capture skills"]
    end

    subgraph ModelExecution ["MODEL EXECUTION — Headless Session"]
        direction TB
        WriteArtifact["Model writes artifact<br/>━━━━━━━━━━<br/>Write tool call<br/>file_path → disk<br/>write_call_count += 1"]
        EmitToken{"● Emits structured token?<br/>━━━━━━━━━━<br/>plan_path = /abs/path<br/>or investigation_path = ...<br/>or diagram_path = ..."}
    end

    subgraph RuntimeRecovery ["RUNTIME RECOVERY — Three-Stage Chain"]
        direction TB
        RecovB["● Recovery B<br/>━━━━━━━━━━<br/>Scan assistant_messages<br/>for token in JSONL stream<br/>(drain-race: stdout not flushed)"]
        RecovC["● Recovery C (NEW)<br/>━━━━━━━━━━<br/>Scan tool_uses for Write.file_path<br/>Synthesize: token_name = file_path<br/>Only for path-capture patterns"]
        Synthesized["● Synthesized contract token<br/>━━━━━━━━━━<br/>plan_path = /abs/path/plan.md<br/>(from Write tool_use metadata)<br/>Prepended to session.result"]
    end

    subgraph ContentStateEval ["CONTENT STATE EVALUATION — session.py"]
        direction TB
        MarkerCheck{"Completion marker<br/>━━━━━━━━━━<br/>%%ORDER_UP%% present<br/>in session.result?"}
        PatternCheck{"● Patterns match?<br/>━━━━━━━━━━<br/>_check_expected_patterns<br/>AND-match all regexes<br/>normalize bold markup"}
        StateDecide["● _evaluate_content_state<br/>━━━━━━━━━━<br/>COMPLETE / ABSENT<br/>CONTRACT_VIOLATION<br/>SESSION_ERROR"]
    end

    subgraph ContractGates ["CONTRACT GATES — Dead-End Guard"]
        direction TB
        AbsentGate{"ContentState<br/>ABSENT?<br/>━━━━━━━━━━<br/>result empty or<br/>marker missing"}
        CVGate{"ContentState<br/>CONTRACT_VIOLATION?<br/>━━━━━━━━━━<br/>marker present<br/>patterns failed"}
        WriteEvidence{"● Write evidence?<br/>━━━━━━━━━━<br/>write_call_count ≥ 1<br/>AND adjudicated_failure"}
    end

    %% CONTRACT DEFINITION FLOW %%
    StaticTest -->|"CI enforces"| SkillMD
    SkillMD -->|"instructs model"| EmitToken
    SkillContracts -->|"defines patterns"| PatternCheck

    %% MODEL EXECUTION %%
    WriteArtifact --> EmitToken
    EmitToken -->|"YES — token emitted"| PatternCheck
    EmitToken -->|"NO — token omitted"| RecovB

    %% RECOVERY %%
    RecovB -->|"found in messages"| PatternCheck
    RecovB -->|"not found"| RecovC
    RecovC -->|"Write.file_path found"| Synthesized
    RecovC -->|"no write evidence"| PatternCheck
    Synthesized --> PatternCheck

    %% CONTENT STATE EVALUATION %%
    PatternCheck -->|"all patterns match"| MarkerCheck
    PatternCheck -->|"patterns absent"| StateDecide
    MarkerCheck -->|"present"| T_OK
    MarkerCheck -->|"absent"| StateDecide
    StateDecide --> AbsentGate
    AbsentGate -->|"ABSENT"| T_RETRY
    AbsentGate -->|"not ABSENT"| CVGate
    CVGate -->|"SESSION_ERROR"| T_VIOLATED
    CVGate -->|"CONTRACT_VIOLATION"| WriteEvidence
    WriteEvidence -->|"write evidence present<br/>CONTRACT_RECOVERY gate"| T_RETRY
    WriteEvidence -->|"no write evidence<br/>terminal violation"| T_VIOLATED

    %% OUTCOMES %%
    T_OK -->|"subtype=success"| T_OK
    T_RETRY -->|"DRAIN_RACE or CONTRACT_RECOVERY<br/>budget-capped by _apply_budget_guard"| T_RETRY

    %% CLASS ASSIGNMENTS %%
    class T_OK,T_RETRY,T_VIOLATED terminal;
    class SkillContracts,SkillMD newComponent;
    class StaticTest newComponent;
    class WriteArtifact handler;
    class EmitToken stateNode;
    class RecovB,RecovC,Synthesized newComponent;
    class PatternCheck,MarkerCheck detector;
    class StateDecide phase;
    class AbsentGate,CVGate,WriteEvidence stateNode;

Color Legend:

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Contract outcomes: Satisfied, Retry Eligible, Violated |
| Green | New/Modified | Components changed in this PR |
| Red | Detector | Pattern matching gates and marker checks |
| Purple | Phase | ContentState evaluation dispatcher |
| Teal | State | Decision nodes (token emission, content state gates, write evidence) |
| Orange | Handler | Model Write tool call execution |
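The content state classification shown in the diagram can be approximated as below. This is a hedged sketch: the enum values come from the diagram, but the function signatures, the order of the checks, and the way bold markup is stripped are assumptions rather than the actual session.py logic.

```python
import re
from enum import Enum


class ContentState(Enum):
    COMPLETE = "complete"
    ABSENT = "absent"
    CONTRACT_VIOLATION = "contract_violation"
    SESSION_ERROR = "session_error"


def check_expected_patterns(result: str, patterns: list) -> bool:
    """AND-match every pattern after stripping markdown bold markers."""
    normalized = result.replace("**", "")
    return all(re.search(pattern, normalized) for pattern in patterns)


def evaluate_content_state(result: str, is_error: bool, marker: str, patterns: list) -> ContentState:
    """Classify a session result into one of the four content states."""
    if is_error:
        return ContentState.SESSION_ERROR
    if not result.strip() or marker not in result:
        return ContentState.ABSENT  # result empty or marker missing
    if not check_expected_patterns(result, patterns):
        return ContentState.CONTRACT_VIOLATION  # marker present, patterns failed
    return ContentState.COMPLETE
```

For example, a result of `**plan_path** = /a/b.md` followed by `%%ORDER_UP%%` classifies as COMPLETE under this sketch, because the bold markers are removed before matching.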

Closes #477

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260322-120141-753065/temp/rectify/rectify_artifact-aware-contract-recovery_2026-03-22_120141_part_b.md

🤖 Generated with Claude Code via AutoSkillit

Trecek (Collaborator, Author) left a comment

AutoSkillit PR Review — Verdict: changes_requested


# Skills with path-capture contracts that must have their token instruction
# in ## Critical Constraints (not only in ## Output or a late workflow step).
PATH_CAPTURE_SKILLS: dict[str, list[str]] = {

[warning] cohesion: PATH_CAPTURE_SKILLS is a hardcoded static dict that duplicates data already derivable from skill_contracts.yaml via _get_contracted_path_capture_skills(). Skills added to skill_contracts.yaml with path-capture contracts will not automatically be covered by test_path_capture_token_instruction_in_critical_constraints or test_path_capture_token_instruction_mentions_absolute, creating a silent coverage gap. Fix: replace list(PATH_CAPTURE_SKILLS.items()) in both @pytest.mark.parametrize calls with list(_get_contracted_path_capture_skills().items()).


Valid observation — flagged for design decision. Investigated — the coverage gap is real (6 skills with path-capture contracts missing from PATH_CAPTURE_SKILLS: diagnose-ci, implement-worktree, implement-worktree-no-merge, resolve-merge-conflicts, retry-worktree, write-recipe). However, applying the proposed mechanical fix has two consequences requiring human decision: (1) audit-impl would be dropped from the parametrize tests because its contract pattern (verdict\s*=\s*(GO|NO GO)) doesn't match the path-capture regex, yet PATH_CAPTURE_SKILLS intentionally includes remediation_path for it; (2) 4 newly-covered skills (implement-worktree, implement-worktree-no-merge, retry-worktree, write-recipe) don't yet have the required tokens in ## Critical Constraints, so the fix would immediately break CI without coordinated SKILL.md updates. Flagged for design decision — see temp/resolve-review/analysis_484_20260322-143154.md.
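One possible shape for resolving the first consequence is to derive the covered skills from the contracts and merge in explicit extras for cases such as audit-impl. The sketch below is only an illustration of that idea, not the decision taken in this PR: the `CONTRACTS` dictionary is a hypothetical in-memory stand-in for the parsed skill_contracts.yaml, and `EXTRA_TOKENS`, `derive_path_capture_skills`, and `covered_skills` are invented names.

```python
import re

# Hypothetical stand-in for the parsed skill_contracts.yaml. Skill entries
# live under "skills", next to a top-level "version" key.
CONTRACTS = {
    "version": 1,
    "skills": {
        "make-plan": {"expected_output_patterns": [r"plan_path\s*=\s*/.+"]},
        "audit-impl": {"expected_output_patterns": [r"verdict\s*=\s*(GO|NO GO)"]},
    },
}

# Tokens that no contract pattern reveals but that the tests should still cover.
EXTRA_TOKENS = {"audit-impl": ["remediation_path"]}

_PATH_CAPTURE = re.compile(r"^(\w+)\\s\*=\\s\*/\.\+$")


def derive_path_capture_skills(contracts: dict) -> dict:
    """Map each skill to the path tokens named by its contract patterns."""
    derived = {}
    # Iterate only the nested "skills" mapping, never top-level keys.
    for skill, contract in contracts.get("skills", {}).items():
        tokens = []
        for pattern in contract.get("expected_output_patterns", []):
            match = _PATH_CAPTURE.match(pattern)
            if match:
                tokens.append(match.group(1))
        if tokens:
            derived[skill] = tokens
    return derived


def covered_skills(contracts: dict, extras: dict) -> dict:
    """Union of contract-derived tokens and explicitly listed extras."""
    merged = derive_path_capture_skills(contracts)
    for skill, tokens in extras.items():
        merged[skill] = sorted(set(merged.get(skill, [])) | set(tokens))
    return merged
```

Under this approach the parametrize list would grow automatically with the contracts file. The second consequence still applies: newly derived skills fail until their SKILL.md files carry the token instruction.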

# T2: _synthesize_from_write_artifacts
# ---------------------------------------------------------------------------

from autoskillit.execution.session import ClaudeSessionResult # noqa: E402

[info] slop: Mid-file import from autoskillit.execution.session import ClaudeSessionResult with # noqa: E402. This belongs at the top of the file with other imports.

# ---------------------------------------------------------------------------
# T1: parse_session_result preserves file_path from Write/Edit tool_use input
# ---------------------------------------------------------------------------


[info] slop: Mid-file import json with # noqa: E402. json is a stdlib module and belongs at the top of the file. The comment justifying mid-file placement ("to keep T1 tests self-contained") rationalizes poor structure rather than fixing it.

Trecek (Collaborator, Author) left a comment

AutoSkillit review found 1 blocking issue (cohesion/warning) and 2 info findings. See inline comments. The critical/warning bugs flagged by early analysis were false positives — the actual implementation in headless.py has correct indentation, valid regex for YAML-loaded patterns, and sound logic. The one actionable issue is PATH_CAPTURE_SKILLS static dict creating a second source of truth.

Trecek and others added 9 commits March 22, 2026 17:31
- T4: test_contract_recovery_retry_reason_exists + update test_retry_reason_values
- T1: parse_session_result preserves file_path for Write/Edit tool_uses
- T2: _synthesize_from_write_artifacts synthesizes missing structured output tokens
- T3: _build_skill_result CONTRACT_RECOVERY gate integration tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
marker present + write evidence — omission not structural.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion_result

Only Write and Edit tool_uses carry file_path evidence of artifact creation.
Other tools (Bash, Glob, Read, etc.) are excluded to keep the data model narrow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Synthesizes missing structured output tokens from Write/Edit tool_use
file_path data when the model wrote a file but omitted the path token.
Only activates for path-capture patterns (token\s*=\s*/.+) — non-path
patterns like verdict= remain text-compliance-only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…build_skill_result

- 5a: Move write_call_count before recovery chain (used by synthesis step)
- 5b: Add R3 synthesis recovery step after R2 pattern recovery
- 5c: Add CONTRACT_RECOVERY gate after _apply_budget_guard — when adjudicated_failure
      has write evidence (write_call_count >= 1), promote to RETRIABLE(CONTRACT_RECOVERY)
      then re-apply budget guard to cap retries

Also record CONTRACT_RECOVERY failures as needs_retry=True in audit so the consecutive
chain is preserved for the second budget guard call: CONTRACT_RECOVERY failures are
genuinely retriable and should be counted as such in the consecutive run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…placement

T1: verify token instruction appears in ## Critical Constraints section
T2: verify every contracted skill has an emit instruction in SKILL.md
T3: verify Critical Constraints token instruction mentions absolute path

These tests fail before the SKILL.md updates in subsequent commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
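The static checks described in T1 and T3 above could look roughly like the following. This is a sketch under assumptions: the helper names and the sample SKILL.md text are hypothetical, and the real test_skill_output_compliance.py may extract the section and match the instruction differently.

```python
import re


def critical_constraints_section(skill_md: str) -> str:
    """Return the body of '## Critical Constraints' up to the next '## ' heading."""
    match = re.search(
        r"^## Critical Constraints\s*\n(.*?)(?=^## |\Z)",
        skill_md,
        flags=re.MULTILINE | re.DOTALL,
    )
    return match.group(1) if match else ""


def has_token_instruction(skill_md: str, token: str) -> bool:
    """T1-style check: the token assignment appears inside Critical Constraints."""
    section = critical_constraints_section(skill_md)
    return re.search(rf"{re.escape(token)}\s*=", section) is not None


def mentions_absolute(skill_md: str) -> bool:
    """T3-style check: the section tells the model to use an absolute path."""
    return "absolute" in critical_constraints_section(skill_md).lower()


# Hypothetical SKILL.md excerpts, used only to exercise the checks.
GOOD_SKILL_MD = """# make-plan

## Critical Constraints

- Emit `plan_path = /absolute/path/to/plan.md` as the final line (absolute path).

## Output

Save the plan.
"""

BAD_SKILL_MD = """# make-plan

## Critical Constraints

- Never modify source files.

## Output

Emit plan_path = temp/plan.md
"""

print(has_token_instruction(GOOD_SKILL_MD, "plan_path"))  # True
print(has_token_instruction(BAD_SKILL_MD, "plan_path"))   # False
```

Scoping the search to the section body is the important part: a token instruction that appears only under ## Output, as in the second excerpt, does not satisfy the check.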
…ath-capture skills

Establishes the Canonical Token Instruction pattern across 20 SKILL.md files:
- make-plan, rectify: plan_path / plan_parts tokens
- investigate: investigation_path token
- make-groups: groups_path / manifest_path / group_files tokens
- review-approach: review_path token
- audit-impl: remediation_path token (NO GO path only)
- arch-lens-* (x13): diagram_path token

Each instruction states the token must use the absolute path (resolve CWD prefix),
eliminating the relative/absolute contradiction that caused ~26% emission failures.

Also removes setup-project from skill_contracts.yaml (Option B): analysis_path
is not captured by any bundled recipe, making the contract a guaranteed 100%
CONTRACT_VIOLATION with no benefit. Removes setup-project from the static
coverage list in test_contracts.py accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…project removal

- Remove analysis_path and config_path from _EXPECTED_OUTPUT_PATH_TOKENS in
  test_headless.py — these tokens were owned solely by setup-project which is
  no longer in skill_contracts.yaml
- Remove analysis_path and config_path from expected_path_tokens in
  test_skill_output_compliance.py for the same reason
- Fix _get_contracted_path_capture_skills() to navigate the nested YAML
  structure (contracts["skills"]) instead of top-level iteration which
  iterated over the "version" key as well

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek force-pushed the adjudicated-failure-false-positive-make-plan-session-killed/477 branch from 167be0c to 8c0b80d on March 23, 2026 00:35
Trecek added this pull request to the merge queue on Mar 23, 2026
Merged via the queue into integration with commit 926bbf8 on Mar 23, 2026
2 checks passed
Trecek deleted the adjudicated-failure-false-positive-make-plan-session-killed/477 branch on March 23, 2026 00:40
Trecek restored the adjudicated-failure-false-positive-make-plan-session-killed/477 branch on March 23, 2026 00:56