Skip to content

Rectify: Context-Limit Dirty-Tree Architectural Immunity#901

Merged
Trecek merged 4 commits intointegrationfrom
resolve-failures-skill-lacks-context-limit-behavior-leaves-u/894
Apr 14, 2026
Merged

Rectify: Context-Limit Dirty-Tree Architectural Immunity#901
Trecek merged 4 commits intointegrationfrom
resolve-failures-skill-lacks-context-limit-behavior-leaves-u/894

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented Apr 14, 2026

Summary

When any file-writing skill (write_behavior: always) hits a context limit mid-execution, edits land on disk without being committed. The recipe's on_context_limit routing then bypasses the skill's commit protocol, sending the pipeline to test_check (which has zero git awareness) and onward to merge_worktree, which hard-rejects the dirty tree. The pipeline self-heals via the dirty_tree -> fix loop, but burns ~30+ minutes per occurrence (9 confirmed since March 16).

The issue is not specific to resolve-failures — it is a class of bug affecting any file-writing skill invoked from a recipe step with on_context_limit routing. The architecture currently validates only worktree-creating skills (2 of 107), leaving all worktree-operating skills unprotected.

Architectural weakness: There is no structural guarantee that a worktree is clean before reaching merge_worktree. The dirty-tree gate in perform_merge() correctly detects the problem but can only reject — it cannot commit. The expensive dirty_tree -> fix -> Step 0.5 -> test -> merge recovery loop is a symptom-level mitigation, not a structural fix.

Proposed immunity: A three-layer defense:

  1. Recipe layer — A commit_guard step immediately before every merge_worktree step that auto-commits any pending changes
  2. Validation layer — New semantic rules that catch unsafe on_context_limit routing and missing context-limit declarations at recipe lint time
  3. Contract layer — Expanded skill classifications and SKILL.md sections that make context-limit behavior declarative and testable

Architecture Impact

Error/Resilience Diagram

%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ENTRY %%
    RECIPE(["Recipe YAML<br/>━━━━━━━━━━<br/>Loaded recipe object"])

    subgraph StructLayer ["● STRUCTURAL VALIDATION — validator.py"]
        direction TB
        SV_NAME["Gate: name present<br/>━━━━━━━━━━<br/>errors += 'must have name'"]
        SV_STEPS["Gate: steps non-empty<br/>━━━━━━━━━━<br/>errors += 'must have steps'"]
        SV_DISC["Gate: step discriminators<br/>━━━━━━━━━━<br/>tool|action|python|constant"]
        SV_ROUTE["Gate: routing refs valid<br/>━━━━━━━━━━<br/>on_success/on_failure/on_context_limit<br/>→ known steps only"]
        SV_RETRIES["Gate: retries/timeouts<br/>━━━━━━━━━━<br/>non-negative integers"]
        SV_KITCHEN["Gate: kitchen_rules present<br/>━━━━━━━━━━<br/>orchestrator discipline"]
        SV_ACCUM["Accumulate list[str]<br/>━━━━━━━━━━<br/>all errors collected<br/>never fail-fast"]
    end

    subgraph CtxBuild ["CONTEXT BUILDER"]
        CTX["make_validation_context()<br/>━━━━━━━━━━<br/>step graph + dataflow report<br/>computed once, shared"]
    end

    subgraph SemanticLayer ["SEMANTIC RULE REGISTRY — run_semantic_rules()"]
        direction TB

        subgraph MergeRules ["● rules_merge.py"]
            MR1["● merge-routing-incomplete<br/>━━━━━━━━━━<br/>Guards: tool==merge_worktree<br/>+ on_result.conditions present<br/>Checks: all 4 recoverable failed_step<br/>routes declared<br/>(DIRTY_TREE, TEST_GATE,<br/>POST_REBASE_TEST_GATE, REBASE)<br/>Severity: ERROR"]
            MR2["● merge-without-commit-guard<br/>━━━━━━━━━━<br/>Guards: tool==merge_worktree<br/>Checks: predecessor is commit_guard<br/>or run_cmd with 'git commit'<br/>Severity: ERROR"]
        end

        subgraph WorktreeRules ["● rules_worktree.py"]
            WR1["● retries-on-worktree-creating-skill<br/>━━━━━━━━━━<br/>Guards: skill tool + retries>0<br/>+ skill in worktree-modifying set<br/>Severity: ERROR"]
            WR2["● missing-context-limit-on-worktree<br/>━━━━━━━━━━<br/>Guards: skill tool + worktree-modifying<br/>+ retries<=0 + on_context_limit None<br/>Severity: WARNING"]
            WR3["● retry-worktree-cwd<br/>━━━━━━━━━━<br/>Guards: skill==retry-worktree<br/>Checks: cwd contains '${{ context.'<br/>Severity: ERROR"]
            WR4["● file-writing-skill-missing-context-limit<br/>━━━━━━━━━━<br/>Guards: manifest loadable + skill tool<br/>+ write_behavior==always<br/>+ on_context_limit absent<br/>Severity: WARNING"]
            WR5["model-on-non-skill<br/>━━━━━━━━━━<br/>Guards: model set + not skill tool<br/>Severity: WARNING"]
        end

        MANIFEST_GATE{"Manifest<br/>loadable?"}
        MANIFEST_DEGRADE["Silent skip<br/>━━━━━━━━━━<br/>load_bundled_manifest()==None<br/>→ return [] (no findings)<br/>log WARNING only"]

        SEM_ACCUM["Accumulate list[RuleFinding]<br/>━━━━━━━━━━<br/>extend per rule, no exception<br/>handling — rule exception<br/>propagates unhandled"]
    end

    subgraph ContractLayer ["CONTRACT VALIDATION — validate_recipe_cards()"]
        CV1["contract-unreferenced-required<br/>━━━━━━━━━━<br/>Required input not referenced<br/>Severity: error"]
        CV2["contract-unsatisfied-input<br/>━━━━━━━━━━<br/>Input contract not satisfied<br/>Severity: error"]
        CV_ACCUM["Accumulate list[dict]<br/>━━━━━━━━━━<br/>contract findings"]
    end

    subgraph DataFlowLayer ["DATAFLOW ANALYSIS — analyze_dataflow() (advisory)"]
        DF["DataFlowReport<br/>━━━━━━━━━━<br/>DEAD_OUTPUT<br/>IMPLICIT_HANDOFF<br/>REF_INVALIDATED<br/>Non-blocking warnings"]
    end

    subgraph ValidityGate ["VALIDITY PREDICATE — compute_recipe_validity()"]
        VP{"Any structural<br/>errors?"}
        VP2{"Any Severity.ERROR<br/>findings?"}
        VP3{"Any contract<br/>severity==error?"}
    end

    %% TERMINALS %%
    T_VALID(["✓ VALID<br/>━━━━━━━━━━<br/>Recipe accepted"])
    T_INVALID(["✗ INVALID<br/>━━━━━━━━━━<br/>Recipe rejected"])
    T_WARN(["⚠ WARNINGS<br/>━━━━━━━━━━<br/>Advisory findings<br/>recipe still valid"])

    %% FLOW %%
    RECIPE --> SV_NAME
    RECIPE --> SV_STEPS
    RECIPE --> SV_DISC
    RECIPE --> SV_ROUTE
    RECIPE --> SV_RETRIES
    RECIPE --> SV_KITCHEN
    SV_NAME --> SV_ACCUM
    SV_STEPS --> SV_ACCUM
    SV_DISC --> SV_ACCUM
    SV_ROUTE --> SV_ACCUM
    SV_RETRIES --> SV_ACCUM
    SV_KITCHEN --> SV_ACCUM

    SV_ACCUM --> CTX

    CTX --> MR1
    CTX --> MR2
    CTX --> WR1
    CTX --> WR2
    CTX --> WR3
    CTX --> WR4
    CTX --> WR5
    CTX --> CV1
    CTX --> CV2
    CTX --> DF

    WR4 --> MANIFEST_GATE
    MANIFEST_GATE -->|"None"| MANIFEST_DEGRADE
    MANIFEST_GATE -->|"loaded"| SEM_ACCUM

    MR1 --> SEM_ACCUM
    MR2 --> SEM_ACCUM
    WR1 --> SEM_ACCUM
    WR2 --> SEM_ACCUM
    WR3 --> SEM_ACCUM
    WR5 --> SEM_ACCUM
    MANIFEST_DEGRADE --> SEM_ACCUM

    CV1 --> CV_ACCUM
    CV2 --> CV_ACCUM

    SV_ACCUM --> VP
    SEM_ACCUM --> VP2
    CV_ACCUM --> VP3

    VP -->|"yes"| T_INVALID
    VP -->|"no"| VP2
    VP2 -->|"yes"| T_INVALID
    VP2 -->|"no"| VP3
    VP3 -->|"yes"| T_INVALID
    VP3 -->|"no"| T_VALID

    DF -->|"advisory"| T_WARN
    T_VALID --> T_WARN

    %% CLASS ASSIGNMENTS %%
    class RECIPE cli;
    class SV_NAME,SV_STEPS,SV_DISC,SV_ROUTE,SV_RETRIES,SV_KITCHEN detector;
    class SV_ACCUM,SEM_ACCUM,CV_ACCUM phase;
    class CTX stateNode;
    class MR1,MR2 handler;
    class WR1,WR2,WR3,WR4,WR5 handler;
    class CV1,CV2 detector;
    class MANIFEST_GATE stateNode;
    class MANIFEST_DEGRADE gap;
    class VP,VP2,VP3 stateNode;
    class DF output;
    class T_VALID,T_INVALID,T_WARN terminal;
Loading

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    VALID_TERMINAL(["VALID: Recipe Deployed to Agent"])
    BLOCKED(["BLOCKED: Validation Errors"])
    DONE(["DONE: Merge Complete"])
    FAILED(["FAILED: escalate_stop"])

    %% VALIDATION PIPELINE %%
    subgraph Validation ["Pre-Flight Validation Pipeline"]
        direction TB
        LoadRecipe["load_recipe<br/>━━━━━━━━━━<br/>YAML parse + temp substitution<br/>sub-recipe composition + block extraction"]
        StructCheck["validate_recipe<br/>━━━━━━━━━━<br/>Structural schema checks<br/>routing targets · retries · fields"]
        SemCtx["make_validation_context<br/>━━━━━━━━━━<br/>Build step graph<br/>predecessor map + dataflow analysis"]
        RunRules["run_semantic_rules<br/>━━━━━━━━━━<br/>Iterate _RULE_REGISTRY per spec<br/>then _BLOCK_RULE_REGISTRY per block"]
    end

    %% MERGE RULES %%
    subgraph MergeRules ["● rules_merge.py — Merge Guard Rules"]
        direction TB
        MergeRouting["● merge-routing-incomplete<br/>━━━━━━━━━━<br/>For each merge_worktree step:<br/>scan on_result when-clauses for<br/>failed_step == VALUE patterns<br/>Must cover all 4: dirty_tree,<br/>test_gate, post_rebase_test_gate, rebase<br/>Severity: ERROR"]
        CommitGuardRule["● merge-without-commit-guard<br/>━━━━━━━━━━<br/>For each merge_worktree step:<br/>check ctx.predecessors for a step<br/>named commit_guard* OR<br/>run_cmd containing 'git commit'<br/>Severity: ERROR"]
    end

    %% WORKTREE RULES %%
    subgraph WorktreeRules ["● rules_worktree.py — Worktree Guard Rules"]
        direction TB
        RetriesRule["● retries-on-worktree-creating-skill<br/>━━━━━━━━━━<br/>run_skill steps where skill resolves to<br/>implement-worktree, implement-worktree-no-merge,<br/>or implement-experiment<br/>Must have retries: 0<br/>Severity: ERROR"]
        CtxLimitRule["● missing-context-limit-on-worktree<br/>━━━━━━━━━━<br/>Worktree-modifying skill with<br/>retries le 0 AND no on_context_limit set<br/>Use on_context_limit to route to<br/>retry-worktree instead<br/>Severity: WARNING"]
        RetryCwdRule["● retry-worktree-cwd<br/>━━━━━━━━━━<br/>retry-worktree steps must reference<br/>a context capture in the cwd arg<br/>e.g. context.worktree_path<br/>Severity: ERROR"]
        FileWriteRule["● file-writing-skill-missing-context-limit<br/>━━━━━━━━━━<br/>Skills with write_behavior=always<br/>in bundled manifest require<br/>on_context_limit to be set<br/>Severity: WARNING"]
    end

    %% VALIDITY GATE %%
    ValidDecision{"compute_recipe_validity<br/>━━━━━━━━━━<br/>Any structural errors?<br/>Any Severity.ERROR findings?<br/>Any contract violations?"}

    %% RUNTIME EXECUTION %%
    subgraph WorktreeExec ["● Recipe Execution: Worktree Pattern (implementation · remediation · merge-prs)"]
        direction TB
        Implement["● implement<br/>━━━━━━━━━━<br/>run_skill: implement-worktree-no-merge<br/>retries: 0 (enforced by rule)<br/>on_context_limit: retry_worktree"]
        CtxDecision{"context<br/>limit hit?"}
        RetryWorktree["retry_worktree<br/>━━━━━━━━━━<br/>run_skill: retry-worktree<br/>cwd: context capture ref<br/>(cwd rule enforced)"]
        Test["test<br/>━━━━━━━━━━<br/>test_check"]
        TestDecision{"tests<br/>pass?"}
        Fix["fix / assess<br/>━━━━━━━━━━<br/>run_skill: resolve-failures<br/>on_result: verdict dispatch"]
        CommitGuardStep["commit_guard<br/>━━━━━━━━━━<br/>run_cmd: git commit<br/>belt-and-suspenders dirty-tree guard<br/>(required predecessor for merge)"]
        MergeStep["● merge<br/>━━━━━━━━━━<br/>merge_worktree<br/>on_result: multi-condition routing"]
        MergeDecision{"failed_step<br/>value?"}
    end

    %% VALIDATION FLOW %%
    START --> LoadRecipe
    LoadRecipe --> StructCheck
    StructCheck --> SemCtx
    SemCtx --> RunRules
    RunRules --> MergeRules
    RunRules --> WorktreeRules
    MergeRules --> ValidDecision
    WorktreeRules --> ValidDecision
    ValidDecision -->|"all clean"| VALID_TERMINAL
    ValidDecision -->|"errors found"| BLOCKED

    %% RUNTIME FLOW %%
    VALID_TERMINAL --> Implement
    Implement --> CtxDecision
    CtxDecision -->|"yes"| RetryWorktree
    CtxDecision -->|"no: completed"| Test
    RetryWorktree --> Test
    Test --> TestDecision
    TestDecision -->|"fail"| Fix
    TestDecision -->|"pass"| CommitGuardStep
    Fix -->|"retry"| Test
    CommitGuardStep --> MergeStep
    MergeStep --> MergeDecision
    MergeDecision -->|"dirty_tree / test_gate /<br/>post_rebase_test_gate / rebase"| Fix
    MergeDecision -->|"success"| DONE
    MergeDecision -->|"error"| FAILED

    %% CLASS ASSIGNMENTS %%
    class START,VALID_TERMINAL,BLOCKED,DONE,FAILED terminal;
    class LoadRecipe,StructCheck,SemCtx,RunRules phase;
    class MergeRouting,CommitGuardRule,RetriesRule,CtxLimitRule,RetryCwdRule,FileWriteRule detector;
    class Implement,RetryWorktree,Test,Fix,CommitGuardStep,MergeStep handler;
    class CtxDecision,TestDecision,MergeDecision,ValidDecision stateNode;
Loading

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([SKILL INVOCATION])

    %% ─────────────── FIELD LIFECYCLES ─────────────── %%
    subgraph Lifecycles ["FIELD LIFECYCLE CATEGORIES"]
        direction LR
        INIT_ONLY["INIT_ONLY<br/>━━━━━━━━━━<br/>worktree_path, plan_path<br/>base_branch, PR_NUMBER<br/>NEVER modify after Step 0"]
        INIT_PRES["INIT_PRESERVE<br/>━━━━━━━━━━<br/>FEATURE_BRANCH (git rev-parse)<br/>BASE_BRANCH (upstream→file fallback)<br/>ts (timestamp for file naming)<br/>Stable across retry iterations"]
        MUTABLE["MUTABLE<br/>━━━━━━━━━━<br/>verdict, failure_type<br/>failure_subtype, resolution<br/>stop_triggers<br/>Refined during analysis phases"]
        APPEND["APPEND_ONLY<br/>━━━━━━━━━━<br/>fixes_applied (0→3)<br/>PHASES_IMPLEMENTED<br/>classification_map<br/>resolved_count<br/>Only grows; never shrinks"]
    end

    %% ─────────────── VALIDATION GATES ─────────────── %%
    subgraph Gates ["VALIDATION GATES (ordered)"]
        direction TB
        GATE_ARG["Gate 1: Arg Presence<br/>━━━━━━━━━━<br/>FAIL-FAST: required args missing<br/>→ abort with usage message<br/>emit empty token, exit 0"]
        GATE_RECIPE["● Gate 2: Recipe Structural<br/>━━━━━━━━━━<br/>validator.py: validate_recipe()<br/>ACCUMULATE (non-fail-fast)<br/>on_context_limit → valid step name<br/>or 'done'; routing target check<br/>stale_threshold, idle_output_timeout<br/>with_args context resolution"]
        GATE_CONTRACT["● Gate 3: Context-Limit Contract<br/>━━━━━━━━━━<br/>test_instruction_surface.py<br/>TestContextLimitBehaviorContract<br/>write_behavior=always → SKILL.md<br/>MUST contain '## Context Limit Behavior'<br/>Enforced at CI time, not runtime"]
        GATE_DIRTY["● Gate 4: Dirty Tree Guard<br/>━━━━━━━━━━<br/>git status --porcelain<br/>if non-empty:<br/>  git add -A && git commit<br/>Runs: before test AND before tokens<br/>Write skills only (resolve-failures,<br/>resolve-review, generate-report,<br/>retry-worktree)"]
    end

    %% ─────────────── SKILL WRITE CATEGORIES ─────────────── %%
    subgraph SkillMutation ["● SKILL WRITE BEHAVIOR CLASSIFICATION"]
        direction LR
        WRITE_SKILLS["● Write Skills<br/>━━━━━━━━━━<br/>resolve-failures<br/>resolve-review<br/>generate-report<br/>retry-worktree<br/>write_behavior = always<br/>Dirty-tree guard active<br/>on_context_limit routed"]
        READONLY_SKILLS["● Read-Only Skills<br/>━━━━━━━━━━<br/>diagnose-ci<br/>prepare-pr, compose-pr<br/>review-design<br/>resolve-design-review<br/>run-experiment<br/>dry-walkthrough<br/>write_behavior ≠ always<br/>Graceful degradation<br/>on_context_limit: empty token"]
    end

    %% ─────────────── CONTEXT LIMIT PATHS ─────────────── %%
    subgraph ContextLimit ["● CONTEXT-LIMIT BEHAVIOR (new contract)"]
        direction TB
        CL_COMMIT["Dirty-Tree Commit<br/>━━━━━━━━━━<br/>chore: commit pending changes<br/>before context limit<br/>All disk edits preserved<br/>No work lost on exhaustion"]
        CL_EMPTY["Empty Token Emit<br/>━━━━━━━━━━<br/>emit token = (empty string)<br/>exit 0 / exit non-zero<br/>Orchestrator on_context_limit<br/>routes to recovery step"]
        CL_FALLBACK["Safe Fallback State<br/>━━━━━━━━━━<br/>diagnose-ci: failure_type=unknown<br/>review-design: verdict=STOP<br/>resolve-design-review: resolution=failed<br/>Never leaves pipeline unroutable"]
    end

    %% ─────────────── RESUME DETECTION ─────────────── %%
    subgraph Resume ["RESUME DETECTION TIERS"]
        direction TB
        TIER1["Tier 1: Explicit Resume<br/>━━━━━━━━━━<br/>retry-worktree:<br/>git log scan → identify<br/>complete/partial/unstarted phases<br/>Continue from partial phase"]
        TIER2["Tier 2: Adjust Mode<br/>━━━━━━━━━━<br/>run-experiment --adjust:<br/>reads prior temp results<br/>identifies failure mode<br/>minimal re-run only"]
        TIER3["Tier 3: Implicit Safety<br/>━━━━━━━━━━<br/>Dirty-tree commit guard<br/>All write skills: pre-test +<br/>pre-token commit ensures<br/>no uncommitted edits on disk"]
    end

    %% ─────────────── OUTPUT TOKENS ─────────────── %%
    subgraph Tokens ["STRUCTURED OUTPUT TOKENS (DERIVED)"]
        direction TB
        TOKEN_RULES["Token Emission Rules<br/>━━━━━━━━━━<br/>ALWAYS after dirty-tree guard<br/>Literal plain text (no markdown)<br/>Adjudicator: regex match<br/>Empty value = graceful degradation<br/>%%ORDER_UP%% = pipeline advance"]
    end

    %% ─────────────── FLOW ─────────────── %%
    START --> Lifecycles
    START --> GATE_ARG
    GATE_ARG -->|"pass"| GATE_RECIPE
    GATE_ARG -->|"fail → emit empty token, exit 0"| TOKEN_RULES
    GATE_RECIPE -->|"structural errors accumulate"| GATE_CONTRACT
    GATE_CONTRACT -->|"CI enforcement"| GATE_DIRTY

    GATE_DIRTY --> WRITE_SKILLS
    GATE_RECIPE --> READONLY_SKILLS

    WRITE_SKILLS --> CL_COMMIT
    READONLY_SKILLS --> CL_EMPTY
    READONLY_SKILLS --> CL_FALLBACK

    CL_COMMIT --> TOKEN_RULES
    CL_EMPTY --> TOKEN_RULES
    CL_FALLBACK --> TOKEN_RULES

    TIER1 --> WRITE_SKILLS
    TIER2 --> READONLY_SKILLS
    TIER3 --> WRITE_SKILLS

    INIT_ONLY --> GATE_ARG
    INIT_PRES --> GATE_RECIPE
    MUTABLE --> CL_FALLBACK
    APPEND --> CL_COMMIT

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class INIT_ONLY detector;
    class INIT_PRES gap;
    class MUTABLE phase;
    class APPEND handler;
    class GATE_ARG detector;
    class GATE_RECIPE stateNode;
    class GATE_CONTRACT stateNode;
    class GATE_DIRTY stateNode;
    class WRITE_SKILLS handler;
    class READONLY_SKILLS phase;
    class CL_COMMIT output;
    class CL_EMPTY output;
    class CL_FALLBACK gap;
    class TIER1 cli;
    class TIER2 cli;
    class TIER3 cli;
    class TOKEN_RULES output;
Loading

Closes #894

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260413-220101-339308/.autoskillit/temp/rectify/rectify_context-limit-dirty-tree-immunity_2026-04-13_223000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step uncached output cache_read cache_write count time
investigate 54 9.2k 759.2k 63.6k 1 6m 23s
rectify 7.5k 19.5k 1.7M 74.6k 1 10m 45s
dry_walkthrough 240 22.6k 1.8M 73.8k 1 5m 52s
implement 82 4.1k 271.0k 34.1k 1 3m 21s
retry_worktree 1.5k 81.3k 19.2M 214.4k 2 33m 4s
prepare_pr 52 6.2k 161.6k 29.5k 1 1m 53s
run_arch_lenses 4.0k 28.3k 727.5k 104.0k 3 13m 31s
compose_pr 75 12.0k 265.2k 37.5k 1 3m 21s
Total 13.5k 183.3k 24.9M 631.5k 1h 18m

Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit PR Review — Verdict: changes_requested

Comment thread tests/contracts/test_instruction_surface.py
from autoskillit.recipe.contracts import load_bundled_manifest, resolve_skill_name
from autoskillit.recipe.io import builtin_recipes_dir, load_recipe

manifest = load_bundled_manifest()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: test_pipeline_file_writing_skills_have_context_limit_section silently skips skills whose SKILL.md does not exist (L527-L528: if not skill_md_path.exists(): continue). A missing SKILL.md likely indicates a misconfiguration; silently skipping weakens the contract guard — the test would pass even when the skill directory exists but has no SKILL.md. Log or fail with a diagnostic message instead of skipping.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid observation — flagged for design decision. Whether a missing SKILL.md is a misconfiguration error or a valid configuration state for non-bundled/external skills is a policy question requiring a human decision about the intended contract scope.

Comment thread src/autoskillit/recipe/rules_worktree.py Outdated
Comment thread src/autoskillit/recipe/rules_worktree.py Outdated
wf = ctx.recipe
findings: list[RuleFinding] = []

manifest = load_bundled_manifest()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: load_bundled_manifest() is called inside the rule function body at every rule evaluation. If the manifest fails to load, the rule silently returns [] (an empty findings list), causing all file-writing skill checks to be silently skipped rather than reported. The None-return path should log a warning at ERROR severity or the manifest should be validated eagerly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. The guard at lines 162-164 explicitly calls logger.warning() before returning — the failure is not silent. Additionally, load_bundled_manifest() is decorated with @lru_cache(maxsize=1) (contracts.py), so it is evaluated once and cached; the framing of 'called at every rule evaluation' is factually incorrect.

)

def test_resolve_failures_has_context_limit_section(self):
"""resolve-failures SKILL.md must contain '## Context Limit Behavior'."""
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] slop: test_resolve_failures_has_context_limit_section is strictly subsumed by test_pipeline_file_writing_skills_have_context_limit_section in the same class: the broader test already checks every write_behavior=always skill (which includes resolve-failures) for the '## Context Limit Behavior' section. The single-skill test provides no additional coverage and will become stale if resolve-failures changes write_behavior. Remove it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. test_resolve_failures_has_context_limit_section is NOT subsumed by the broader test: test_pipeline_file_writing_skills_have_context_limit_section only inspects steps where step.on_context_limit is not None (L517 filter). The narrow test checks resolve-failures/SKILL.md content unconditionally, providing independent coverage when resolve-failures is used in steps without on_context_limit declared.

Comment thread tests/recipe/test_rules_worktree.py Outdated


def _is_commit_guard(step_name: str, ctx: ValidationContext) -> bool:
"""Return True if step_name is a commit_guard predecessor for merge_worktree.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] slop: _is_commit_guard is a 7-line private helper with a self-evident name. The docstring repeats exactly what the 7 lines of code do and adds no information. Remove it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. The docstring explains two distinct detection heuristics (name-prefix AND tool/cmd) which is non-trivially useful; the two-branch logic is not self-evident from the function name alone. Not removing.

commit_guard:
tool: run_cmd
with:
cmd: "cd ${{ context.worktree_path }} && if [ -n \"$(git status --porcelain)\" ]; then git add -A && git commit -m 'chore: commit pending session changes'; fi"
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: commit_guard uses git add -A which stages all untracked files including build artifacts and temp files not covered by .gitignore. If the worktree has .autoskillit/temp/ files or other ignored-but-present artifacts, they will be committed under 'chore: commit pending session changes'. Consider git add -u (tracked files only) unless staging all untracked files is intentional.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. git add -A is a deliberate design choice: the commit_guard purpose is to catch ALL dirty state including new untracked files from skills (commit e9cf6cf, note field at lines 339-342). Using git add -u would miss untracked files created by skills. The merge_worktree dirty-tree gate is the final arbiter for real issues.

commit_guard:
tool: run_cmd
with:
cmd: "cd ${{ context.worktree_path }} && if [ -n \"$(git status --porcelain)\" ]; then git add -A && git commit -m 'chore: commit pending session changes'; fi"
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: commit_guard uses git add -A which stages all untracked files. Same concern as implementation.yaml: temp/build artifacts not covered by .gitignore may be committed. This pattern appears in all three recipe files (implementation.yaml, implementation-groups.yaml, remediation.yaml) and should be reviewed consistently.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. Consistent with implementation.yaml: git add -A across all three recipe files is a deliberate pattern for capturing all dirty state including new untracked files before merge.

Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit review found 10 blocking issues. See inline comments.

Trecek and others added 4 commits April 14, 2026 00:23
Three-layer defense against the class of bug where file-writing skills hit
context limits mid-execution, leaving uncommitted edits on disk that later
fail the merge_worktree dirty-tree gate — causing expensive 30-min recovery
cycles (9 confirmed since March 16).

**Layer 1: Recipe commit_guard steps**
- Add `commit_guard` step before every `merge_worktree` step in
  implementation.yaml, implementation-groups.yaml, and remediation.yaml
- Auto-commits any uncommitted changes (from context-exhausted skills,
  pre-commit auto-fixes, or any other source) before merge
- on_failure routes to merge so the dirty-tree gate remains the final arbiter

**Layer 2: Missing on_context_limit fields**
- Add `on_context_limit: re_push` to resolve_ci steps in all 3 pipeline
  recipes (implementation, remediation, implementation-groups)
- Add `on_context_limit: retest` to fix_tests in research.yaml
- Add `on_context_limit` to merge-prs.yaml resolve_review_integration and
  research.yaml generate_report / generate_report_inconclusive steps

**Layer 3: Semantic validation rules**
- New rule `merge-without-commit-guard` (ERROR): merge_worktree step without
  a commit_guard predecessor — fires at lint time to block regression
- New rule `file-writing-skill-missing-context-limit` (WARNING): run_skill
  step invoking a write_behavior=always skill without on_context_limit
- Rename _WORKTREE_CREATING_SKILLS → _WORKTREE_MODIFYING_SKILLS; add
  implement-experiment to coverage

**Layer 4: SKILL.md Context Limit Behavior sections**
- Add ## Context Limit Behavior sections to resolve-failures, resolve-review,
  generate-report, retry-worktree, run-experiment, compose-pr, prepare-pr,
  diagnose-ci, dry-walkthrough, resolve-design-review, review-design
- Strengthen resolve-failures Step 3 commit instructions to stage all modified
  files (including pre-commit auto-fixes) before structured output emission

**Tests (all new, all pass)**
- test_merge_worktree_has_commit_guard_predecessor (parametric, all recipes)
- test_resolve_failures_steps_have_on_context_limit (parametric, all recipes)
- test_worktree_creating_skills_includes_experiment
- test_file_writing_skill_missing_context_limit_fires + negative cases
- test_merge_without_commit_guard_fires + negative cases
- test_resolve_failures_has_context_limit_section
- test_pipeline_file_writing_skills_have_context_limit_section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
load_bundled_manifest() can return None for a corrupt/empty YAML;
the test now asserts clearly rather than raising AttributeError.
…skill

The frozenset was already renamed to _WORKTREE_MODIFYING_SKILLS; the
semantic rule name, function name, and all test assertions now match.
…uction fn

The test helper duplicated _is_commit_guard() from rules_merge.py verbatim.
The test now imports and calls the production function directly, eliminating
silent drift if the detection logic changes.
@Trecek Trecek force-pushed the resolve-failures-skill-lacks-context-limit-behavior-leaves-u/894 branch from 16dade1 to 52218be Compare April 14, 2026 07:25
@Trecek Trecek added this pull request to the merge queue Apr 14, 2026
Merged via the queue into integration with commit 38c424d Apr 14, 2026
2 checks passed
@Trecek Trecek deleted the resolve-failures-skill-lacks-context-limit-behavior-leaves-u/894 branch April 14, 2026 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant