Rectify: Context-Limit Dirty-Tree Architectural Immunity by Trecek · Pull Request #901 · TalonT-Org/AutoSkillit

Trecek · 2026-04-14T06:43:54Z

Summary

When any file-writing skill (write_behavior: always) hits a context limit mid-execution, edits land on disk without being committed. The recipe's on_context_limit routing then bypasses the skill's commit protocol, sending the pipeline to test_check (which has zero git awareness) and onward to merge_worktree, which hard-rejects the dirty tree. The pipeline self-heals via the dirty_tree -> fix loop, but burns ~30+ minutes per occurrence (9 confirmed since March 16).

The issue is not specific to resolve-failures — it is a class of bug affecting any file-writing skill invoked from a recipe step with on_context_limit routing. The architecture currently validates only worktree-creating skills (2 of 107), leaving all worktree-operating skills unprotected.

Architectural weakness: There is no structural guarantee that a worktree is clean before reaching merge_worktree. The dirty-tree gate in perform_merge() correctly detects the problem but can only reject — it cannot commit. The expensive dirty_tree -> fix -> Step 0.5 -> test -> merge recovery loop is a symptom-level mitigation, not a structural fix.

Proposed immunity: A three-layer defense:

Recipe layer — A commit_guard step immediately before every merge_worktree step that auto-commits any pending changes
Validation layer — New semantic rules that catch unsafe on_context_limit routing and missing context-limit declarations at recipe lint time
Contract layer — Expanded skill classifications and SKILL.md sections that make context-limit behavior declarative and testable

Architecture Impact

Error/Resilience Diagram

%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ENTRY %%
    RECIPE(["Recipe YAML<br/>━━━━━━━━━━<br/>Loaded recipe object"])

    subgraph StructLayer ["● STRUCTURAL VALIDATION — validator.py"]
        direction TB
        SV_NAME["Gate: name present<br/>━━━━━━━━━━<br/>errors += 'must have name'"]
        SV_STEPS["Gate: steps non-empty<br/>━━━━━━━━━━<br/>errors += 'must have steps'"]
        SV_DISC["Gate: step discriminators<br/>━━━━━━━━━━<br/>tool|action|python|constant"]
        SV_ROUTE["Gate: routing refs valid<br/>━━━━━━━━━━<br/>on_success/on_failure/on_context_limit<br/>→ known steps only"]
        SV_RETRIES["Gate: retries/timeouts<br/>━━━━━━━━━━<br/>non-negative integers"]
        SV_KITCHEN["Gate: kitchen_rules present<br/>━━━━━━━━━━<br/>orchestrator discipline"]
        SV_ACCUM["Accumulate list[str]<br/>━━━━━━━━━━<br/>all errors collected<br/>never fail-fast"]
    end

    subgraph CtxBuild ["CONTEXT BUILDER"]
        CTX["make_validation_context()<br/>━━━━━━━━━━<br/>step graph + dataflow report<br/>computed once, shared"]
    end

    subgraph SemanticLayer ["SEMANTIC RULE REGISTRY — run_semantic_rules()"]
        direction TB

        subgraph MergeRules ["● rules_merge.py"]
            MR1["● merge-routing-incomplete<br/>━━━━━━━━━━<br/>Guards: tool==merge_worktree<br/>+ on_result.conditions present<br/>Checks: all 4 recoverable failed_step<br/>routes declared<br/>(DIRTY_TREE, TEST_GATE,<br/>POST_REBASE_TEST_GATE, REBASE)<br/>Severity: ERROR"]
            MR2["● merge-without-commit-guard<br/>━━━━━━━━━━<br/>Guards: tool==merge_worktree<br/>Checks: predecessor is commit_guard<br/>or run_cmd with 'git commit'<br/>Severity: ERROR"]
        end

        subgraph WorktreeRules ["● rules_worktree.py"]
            WR1["● retries-on-worktree-creating-skill<br/>━━━━━━━━━━<br/>Guards: skill tool + retries>0<br/>+ skill in worktree-modifying set<br/>Severity: ERROR"]
            WR2["● missing-context-limit-on-worktree<br/>━━━━━━━━━━<br/>Guards: skill tool + worktree-modifying<br/>+ retries<=0 + on_context_limit None<br/>Severity: WARNING"]
            WR3["● retry-worktree-cwd<br/>━━━━━━━━━━<br/>Guards: skill==retry-worktree<br/>Checks: cwd contains '${{ context.'<br/>Severity: ERROR"]
            WR4["● file-writing-skill-missing-context-limit<br/>━━━━━━━━━━<br/>Guards: manifest loadable + skill tool<br/>+ write_behavior==always<br/>+ on_context_limit absent<br/>Severity: WARNING"]
            WR5["model-on-non-skill<br/>━━━━━━━━━━<br/>Guards: model set + not skill tool<br/>Severity: WARNING"]
        end

        MANIFEST_GATE{"Manifest<br/>loadable?"}
        MANIFEST_DEGRADE["Silent skip<br/>━━━━━━━━━━<br/>load_bundled_manifest()==None<br/>→ return [] (no findings)<br/>log WARNING only"]

        SEM_ACCUM["Accumulate list[RuleFinding]<br/>━━━━━━━━━━<br/>extend per rule, no exception<br/>handling — rule exception<br/>propagates unhandled"]
    end

    subgraph ContractLayer ["CONTRACT VALIDATION — validate_recipe_cards()"]
        CV1["contract-unreferenced-required<br/>━━━━━━━━━━<br/>Required input not referenced<br/>Severity: error"]
        CV2["contract-unsatisfied-input<br/>━━━━━━━━━━<br/>Input contract not satisfied<br/>Severity: error"]
        CV_ACCUM["Accumulate list[dict]<br/>━━━━━━━━━━<br/>contract findings"]
    end

    subgraph DataFlowLayer ["DATAFLOW ANALYSIS — analyze_dataflow() (advisory)"]
        DF["DataFlowReport<br/>━━━━━━━━━━<br/>DEAD_OUTPUT<br/>IMPLICIT_HANDOFF<br/>REF_INVALIDATED<br/>Non-blocking warnings"]
    end

    subgraph ValidityGate ["VALIDITY PREDICATE — compute_recipe_validity()"]
        VP{"Any structural<br/>errors?"}
        VP2{"Any Severity.ERROR<br/>findings?"}
        VP3{"Any contract<br/>severity==error?"}
    end

    %% TERMINALS %%
    T_VALID(["✓ VALID<br/>━━━━━━━━━━<br/>Recipe accepted"])
    T_INVALID(["✗ INVALID<br/>━━━━━━━━━━<br/>Recipe rejected"])
    T_WARN(["⚠ WARNINGS<br/>━━━━━━━━━━<br/>Advisory findings<br/>recipe still valid"])

    %% FLOW %%
    RECIPE --> SV_NAME
    RECIPE --> SV_STEPS
    RECIPE --> SV_DISC
    RECIPE --> SV_ROUTE
    RECIPE --> SV_RETRIES
    RECIPE --> SV_KITCHEN
    SV_NAME --> SV_ACCUM
    SV_STEPS --> SV_ACCUM
    SV_DISC --> SV_ACCUM
    SV_ROUTE --> SV_ACCUM
    SV_RETRIES --> SV_ACCUM
    SV_KITCHEN --> SV_ACCUM

    SV_ACCUM --> CTX

    CTX --> MR1
    CTX --> MR2
    CTX --> WR1
    CTX --> WR2
    CTX --> WR3
    CTX --> WR4
    CTX --> WR5
    CTX --> CV1
    CTX --> CV2
    CTX --> DF

    WR4 --> MANIFEST_GATE
    MANIFEST_GATE -->|"None"| MANIFEST_DEGRADE
    MANIFEST_GATE -->|"loaded"| SEM_ACCUM

    MR1 --> SEM_ACCUM
    MR2 --> SEM_ACCUM
    WR1 --> SEM_ACCUM
    WR2 --> SEM_ACCUM
    WR3 --> SEM_ACCUM
    WR5 --> SEM_ACCUM
    MANIFEST_DEGRADE --> SEM_ACCUM

    CV1 --> CV_ACCUM
    CV2 --> CV_ACCUM

    SV_ACCUM --> VP
    SEM_ACCUM --> VP2
    CV_ACCUM --> VP3

    VP -->|"yes"| T_INVALID
    VP -->|"no"| VP2
    VP2 -->|"yes"| T_INVALID
    VP2 -->|"no"| VP3
    VP3 -->|"yes"| T_INVALID
    VP3 -->|"no"| T_VALID

    DF -->|"advisory"| T_WARN
    T_VALID --> T_WARN

    %% CLASS ASSIGNMENTS %%
    class RECIPE cli;
    class SV_NAME,SV_STEPS,SV_DISC,SV_ROUTE,SV_RETRIES,SV_KITCHEN detector;
    class SV_ACCUM,SEM_ACCUM,CV_ACCUM phase;
    class CTX stateNode;
    class MR1,MR2 handler;
    class WR1,WR2,WR3,WR4,WR5 handler;
    class CV1,CV2 detector;
    class MANIFEST_GATE stateNode;
    class MANIFEST_DEGRADE gap;
    class VP,VP2,VP3 stateNode;
    class DF output;
    class T_VALID,T_INVALID,T_WARN terminal;

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    VALID_TERMINAL(["VALID: Recipe Deployed to Agent"])
    BLOCKED(["BLOCKED: Validation Errors"])
    DONE(["DONE: Merge Complete"])
    FAILED(["FAILED: escalate_stop"])

    %% VALIDATION PIPELINE %%
    subgraph Validation ["Pre-Flight Validation Pipeline"]
        direction TB
        LoadRecipe["load_recipe<br/>━━━━━━━━━━<br/>YAML parse + temp substitution<br/>sub-recipe composition + block extraction"]
        StructCheck["validate_recipe<br/>━━━━━━━━━━<br/>Structural schema checks<br/>routing targets · retries · fields"]
        SemCtx["make_validation_context<br/>━━━━━━━━━━<br/>Build step graph<br/>predecessor map + dataflow analysis"]
        RunRules["run_semantic_rules<br/>━━━━━━━━━━<br/>Iterate _RULE_REGISTRY per spec<br/>then _BLOCK_RULE_REGISTRY per block"]
    end

    %% MERGE RULES %%
    subgraph MergeRules ["● rules_merge.py — Merge Guard Rules"]
        direction TB
        MergeRouting["● merge-routing-incomplete<br/>━━━━━━━━━━<br/>For each merge_worktree step:<br/>scan on_result when-clauses for<br/>failed_step == VALUE patterns<br/>Must cover all 4: dirty_tree,<br/>test_gate, post_rebase_test_gate, rebase<br/>Severity: ERROR"]
        CommitGuardRule["● merge-without-commit-guard<br/>━━━━━━━━━━<br/>For each merge_worktree step:<br/>check ctx.predecessors for a step<br/>named commit_guard* OR<br/>run_cmd containing 'git commit'<br/>Severity: ERROR"]
    end

    %% WORKTREE RULES %%
    subgraph WorktreeRules ["● rules_worktree.py — Worktree Guard Rules"]
        direction TB
        RetriesRule["● retries-on-worktree-creating-skill<br/>━━━━━━━━━━<br/>run_skill steps where skill resolves to<br/>implement-worktree, implement-worktree-no-merge,<br/>or implement-experiment<br/>Must have retries: 0<br/>Severity: ERROR"]
        CtxLimitRule["● missing-context-limit-on-worktree<br/>━━━━━━━━━━<br/>Worktree-modifying skill with<br/>retries le 0 AND no on_context_limit set<br/>Use on_context_limit to route to<br/>retry-worktree instead<br/>Severity: WARNING"]
        RetryCwdRule["● retry-worktree-cwd<br/>━━━━━━━━━━<br/>retry-worktree steps must reference<br/>a context capture in the cwd arg<br/>e.g. context.worktree_path<br/>Severity: ERROR"]
        FileWriteRule["● file-writing-skill-missing-context-limit<br/>━━━━━━━━━━<br/>Skills with write_behavior=always<br/>in bundled manifest require<br/>on_context_limit to be set<br/>Severity: WARNING"]
    end

    %% VALIDITY GATE %%
    ValidDecision{"compute_recipe_validity<br/>━━━━━━━━━━<br/>Any structural errors?<br/>Any Severity.ERROR findings?<br/>Any contract violations?"}

    %% RUNTIME EXECUTION %%
    subgraph WorktreeExec ["● Recipe Execution: Worktree Pattern (implementation · remediation · merge-prs)"]
        direction TB
        Implement["● implement<br/>━━━━━━━━━━<br/>run_skill: implement-worktree-no-merge<br/>retries: 0 (enforced by rule)<br/>on_context_limit: retry_worktree"]
        CtxDecision{"context<br/>limit hit?"}
        RetryWorktree["retry_worktree<br/>━━━━━━━━━━<br/>run_skill: retry-worktree<br/>cwd: context capture ref<br/>(cwd rule enforced)"]
        Test["test<br/>━━━━━━━━━━<br/>test_check"]
        TestDecision{"tests<br/>pass?"}
        Fix["fix / assess<br/>━━━━━━━━━━<br/>run_skill: resolve-failures<br/>on_result: verdict dispatch"]
        CommitGuardStep["commit_guard<br/>━━━━━━━━━━<br/>run_cmd: git commit<br/>belt-and-suspenders dirty-tree guard<br/>(required predecessor for merge)"]
        MergeStep["● merge<br/>━━━━━━━━━━<br/>merge_worktree<br/>on_result: multi-condition routing"]
        MergeDecision{"failed_step<br/>value?"}
    end

    %% VALIDATION FLOW %%
    START --> LoadRecipe
    LoadRecipe --> StructCheck
    StructCheck --> SemCtx
    SemCtx --> RunRules
    RunRules --> MergeRules
    RunRules --> WorktreeRules
    MergeRules --> ValidDecision
    WorktreeRules --> ValidDecision
    ValidDecision -->|"all clean"| VALID_TERMINAL
    ValidDecision -->|"errors found"| BLOCKED

    %% RUNTIME FLOW %%
    VALID_TERMINAL --> Implement
    Implement --> CtxDecision
    CtxDecision -->|"yes"| RetryWorktree
    CtxDecision -->|"no: completed"| Test
    RetryWorktree --> Test
    Test --> TestDecision
    TestDecision -->|"fail"| Fix
    TestDecision -->|"pass"| CommitGuardStep
    Fix -->|"retry"| Test
    CommitGuardStep --> MergeStep
    MergeStep --> MergeDecision
    MergeDecision -->|"dirty_tree / test_gate /<br/>post_rebase_test_gate / rebase"| Fix
    MergeDecision -->|"success"| DONE
    MergeDecision -->|"error"| FAILED

    %% CLASS ASSIGNMENTS %%
    class START,VALID_TERMINAL,BLOCKED,DONE,FAILED terminal;
    class LoadRecipe,StructCheck,SemCtx,RunRules phase;
    class MergeRouting,CommitGuardRule,RetriesRule,CtxLimitRule,RetryCwdRule,FileWriteRule detector;
    class Implement,RetryWorktree,Test,Fix,CommitGuardStep,MergeStep handler;
    class CtxDecision,TestDecision,MergeDecision,ValidDecision stateNode;

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([SKILL INVOCATION])

    %% ─────────────── FIELD LIFECYCLES ─────────────── %%
    subgraph Lifecycles ["FIELD LIFECYCLE CATEGORIES"]
        direction LR
        INIT_ONLY["INIT_ONLY<br/>━━━━━━━━━━<br/>worktree_path, plan_path<br/>base_branch, PR_NUMBER<br/>NEVER modify after Step 0"]
        INIT_PRES["INIT_PRESERVE<br/>━━━━━━━━━━<br/>FEATURE_BRANCH (git rev-parse)<br/>BASE_BRANCH (upstream→file fallback)<br/>ts (timestamp for file naming)<br/>Stable across retry iterations"]
        MUTABLE["MUTABLE<br/>━━━━━━━━━━<br/>verdict, failure_type<br/>failure_subtype, resolution<br/>stop_triggers<br/>Refined during analysis phases"]
        APPEND["APPEND_ONLY<br/>━━━━━━━━━━<br/>fixes_applied (0→3)<br/>PHASES_IMPLEMENTED<br/>classification_map<br/>resolved_count<br/>Only grows; never shrinks"]
    end

    %% ─────────────── VALIDATION GATES ─────────────── %%
    subgraph Gates ["VALIDATION GATES (ordered)"]
        direction TB
        GATE_ARG["Gate 1: Arg Presence<br/>━━━━━━━━━━<br/>FAIL-FAST: required args missing<br/>→ abort with usage message<br/>emit empty token, exit 0"]
        GATE_RECIPE["● Gate 2: Recipe Structural<br/>━━━━━━━━━━<br/>validator.py: validate_recipe()<br/>ACCUMULATE (non-fail-fast)<br/>on_context_limit → valid step name<br/>or 'done'; routing target check<br/>stale_threshold, idle_output_timeout<br/>with_args context resolution"]
        GATE_CONTRACT["● Gate 3: Context-Limit Contract<br/>━━━━━━━━━━<br/>test_instruction_surface.py<br/>TestContextLimitBehaviorContract<br/>write_behavior=always → SKILL.md<br/>MUST contain '## Context Limit Behavior'<br/>Enforced at CI time, not runtime"]
        GATE_DIRTY["● Gate 4: Dirty Tree Guard<br/>━━━━━━━━━━<br/>git status --porcelain<br/>if non-empty:<br/>  git add -A && git commit<br/>Runs: before test AND before tokens<br/>Write skills only (resolve-failures,<br/>resolve-review, generate-report,<br/>retry-worktree)"]
    end

    %% ─────────────── SKILL WRITE CATEGORIES ─────────────── %%
    subgraph SkillMutation ["● SKILL WRITE BEHAVIOR CLASSIFICATION"]
        direction LR
        WRITE_SKILLS["● Write Skills<br/>━━━━━━━━━━<br/>resolve-failures<br/>resolve-review<br/>generate-report<br/>retry-worktree<br/>write_behavior = always<br/>Dirty-tree guard active<br/>on_context_limit routed"]
        READONLY_SKILLS["● Read-Only Skills<br/>━━━━━━━━━━<br/>diagnose-ci<br/>prepare-pr, compose-pr<br/>review-design<br/>resolve-design-review<br/>run-experiment<br/>dry-walkthrough<br/>write_behavior ≠ always<br/>Graceful degradation<br/>on_context_limit: empty token"]
    end

    %% ─────────────── CONTEXT LIMIT PATHS ─────────────── %%
    subgraph ContextLimit ["● CONTEXT-LIMIT BEHAVIOR (new contract)"]
        direction TB
        CL_COMMIT["Dirty-Tree Commit<br/>━━━━━━━━━━<br/>chore: commit pending changes<br/>before context limit<br/>All disk edits preserved<br/>No work lost on exhaustion"]
        CL_EMPTY["Empty Token Emit<br/>━━━━━━━━━━<br/>emit token = (empty string)<br/>exit 0 / exit non-zero<br/>Orchestrator on_context_limit<br/>routes to recovery step"]
        CL_FALLBACK["Safe Fallback State<br/>━━━━━━━━━━<br/>diagnose-ci: failure_type=unknown<br/>review-design: verdict=STOP<br/>resolve-design-review: resolution=failed<br/>Never leaves pipeline unroutable"]
    end

    %% ─────────────── RESUME DETECTION ─────────────── %%
    subgraph Resume ["RESUME DETECTION TIERS"]
        direction TB
        TIER1["Tier 1: Explicit Resume<br/>━━━━━━━━━━<br/>retry-worktree:<br/>git log scan → identify<br/>complete/partial/unstarted phases<br/>Continue from partial phase"]
        TIER2["Tier 2: Adjust Mode<br/>━━━━━━━━━━<br/>run-experiment --adjust:<br/>reads prior temp results<br/>identifies failure mode<br/>minimal re-run only"]
        TIER3["Tier 3: Implicit Safety<br/>━━━━━━━━━━<br/>Dirty-tree commit guard<br/>All write skills: pre-test +<br/>pre-token commit ensures<br/>no uncommitted edits on disk"]
    end

    %% ─────────────── OUTPUT TOKENS ─────────────── %%
    subgraph Tokens ["STRUCTURED OUTPUT TOKENS (DERIVED)"]
        direction TB
        TOKEN_RULES["Token Emission Rules<br/>━━━━━━━━━━<br/>ALWAYS after dirty-tree guard<br/>Literal plain text (no markdown)<br/>Adjudicator: regex match<br/>Empty value = graceful degradation<br/>%%ORDER_UP%% = pipeline advance"]
    end

    %% ─────────────── FLOW ─────────────── %%
    START --> Lifecycles
    START --> GATE_ARG
    GATE_ARG -->|"pass"| GATE_RECIPE
    GATE_ARG -->|"fail → emit empty token, exit 0"| TOKEN_RULES
    GATE_RECIPE -->|"structural errors accumulate"| GATE_CONTRACT
    GATE_CONTRACT -->|"CI enforcement"| GATE_DIRTY

    GATE_DIRTY --> WRITE_SKILLS
    GATE_RECIPE --> READONLY_SKILLS

    WRITE_SKILLS --> CL_COMMIT
    READONLY_SKILLS --> CL_EMPTY
    READONLY_SKILLS --> CL_FALLBACK

    CL_COMMIT --> TOKEN_RULES
    CL_EMPTY --> TOKEN_RULES
    CL_FALLBACK --> TOKEN_RULES

    TIER1 --> WRITE_SKILLS
    TIER2 --> READONLY_SKILLS
    TIER3 --> WRITE_SKILLS

    INIT_ONLY --> GATE_ARG
    INIT_PRES --> GATE_RECIPE
    MUTABLE --> CL_FALLBACK
    APPEND --> CL_COMMIT

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class INIT_ONLY detector;
    class INIT_PRES gap;
    class MUTABLE phase;
    class APPEND handler;
    class GATE_ARG detector;
    class GATE_RECIPE stateNode;
    class GATE_CONTRACT stateNode;
    class GATE_DIRTY stateNode;
    class WRITE_SKILLS handler;
    class READONLY_SKILLS phase;
    class CL_COMMIT output;
    class CL_EMPTY output;
    class CL_FALLBACK gap;
    class TIER1 cli;
    class TIER2 cli;
    class TIER3 cli;
    class TOKEN_RULES output;

Closes #894

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260413-220101-339308/.autoskillit/temp/rectify/rectify_context-limit-dirty-tree-immunity_2026-04-13_223000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step	uncached	output	cache_read	cache_write	count	time
investigate	54	9.2k	759.2k	63.6k	1	6m 23s
rectify	7.5k	19.5k	1.7M	74.6k	1	10m 45s
dry_walkthrough	240	22.6k	1.8M	73.8k	1	5m 52s
implement	82	4.1k	271.0k	34.1k	1	3m 21s
retry_worktree	1.5k	81.3k	19.2M	214.4k	2	33m 4s
prepare_pr	52	6.2k	161.6k	29.5k	1	1m 53s
run_arch_lenses	4.0k	28.3k	727.5k	104.0k	3	13m 31s
compose_pr	75	12.0k	265.2k	37.5k	1	3m 21s
Total	13.5k	183.3k	24.9M	631.5k		1h 18m

Trecek

AutoSkillit PR Review — Verdict: changes_requested

Trecek · 2026-04-14T06:49:39Z

+        from autoskillit.recipe.contracts import load_bundled_manifest, resolve_skill_name
+        from autoskillit.recipe.io import builtin_recipes_dir, load_recipe
+
+        manifest = load_bundled_manifest()


[warning] defense: test_pipeline_file_writing_skills_have_context_limit_section silently skips skills whose SKILL.md does not exist (L527-L528: if not skill_md_path.exists(): continue). A missing SKILL.md likely indicates a misconfiguration; silently skipping weakens the contract guard — the test would pass even when the skill directory exists but has no SKILL.md. Log or fail with a diagnostic message instead of skipping.

Valid observation — flagged for design decision. Whether a missing SKILL.md is a misconfiguration error or a valid configuration state for non-bundled/external skills is a policy question requiring a human decision about the intended contract scope.

Trecek · 2026-04-14T06:49:39Z

+    wf = ctx.recipe
+    findings: list[RuleFinding] = []
+
+    manifest = load_bundled_manifest()


[warning] defense: load_bundled_manifest() is called inside the rule function body at every rule evaluation. If the manifest fails to load, the rule silently returns [] (an empty findings list), causing all file-writing skill checks to be silently skipped rather than reported. The None-return path should log a warning at ERROR severity or the manifest should be validated eagerly.

Investigated — this is intentional. The guard at lines 162-164 explicitly calls logger.warning() before returning — the failure is not silent. Additionally, load_bundled_manifest() is decorated with @lru_cache(maxsize=1) (contracts.py), so it is evaluated once and cached; the framing of 'called at every rule evaluation' is factually incorrect.

Trecek · 2026-04-14T06:49:40Z

+    )
+
+    def test_resolve_failures_has_context_limit_section(self):
+        """resolve-failures SKILL.md must contain '## Context Limit Behavior'."""


[warning] slop: test_resolve_failures_has_context_limit_section is strictly subsumed by test_pipeline_file_writing_skills_have_context_limit_section in the same class: the broader test already checks every write_behavior=always skill (which includes resolve-failures) for the '## Context Limit Behavior' section. The single-skill test provides no additional coverage and will become stale if resolve-failures changes write_behavior. Remove it.

Investigated — this is intentional. test_resolve_failures_has_context_limit_section is NOT subsumed by the broader test: test_pipeline_file_writing_skills_have_context_limit_section only inspects steps where step.on_context_limit is not None (L517 filter). The narrow test checks resolve-failures/SKILL.md content unconditionally, providing independent coverage when resolve-failures is used in steps without on_context_limit declared.

Trecek · 2026-04-14T06:49:40Z


+
+def _is_commit_guard(step_name: str, ctx: ValidationContext) -> bool:
+    """Return True if step_name is a commit_guard predecessor for merge_worktree.


[warning] slop: _is_commit_guard is a 7-line private helper with a self-evident name. The docstring repeats exactly what the 7 lines of code do and adds no information. Remove it.

Investigated — this is intentional. The docstring explains two distinct detection heuristics (name-prefix AND tool/cmd) which is non-trivially useful; the two-branch logic is not self-evident from the function name alone. Not removing.

Trecek · 2026-04-14T06:49:40Z

+  commit_guard:
+    tool: run_cmd
+    with:
+      cmd: "cd ${{ context.worktree_path }} && if [ -n \"$(git status --porcelain)\" ]; then git add -A && git commit -m 'chore: commit pending session changes'; fi"


[warning] defense: commit_guard uses git add -A which stages all untracked files including build artifacts and temp files not covered by .gitignore. If the worktree has .autoskillit/temp/ files or other ignored-but-present artifacts, they will be committed under 'chore: commit pending session changes'. Consider git add -u (tracked files only) unless staging all untracked files is intentional.

Investigated — this is intentional. git add -A is a deliberate design choice: the commit_guard purpose is to catch ALL dirty state including new untracked files from skills (commit e9cf6cf, note field at lines 339-342). Using git add -u would miss untracked files created by skills. The merge_worktree dirty-tree gate is the final arbiter for real issues.

Trecek · 2026-04-14T06:49:40Z

+  commit_guard:
+    tool: run_cmd
+    with:
+      cmd: "cd ${{ context.worktree_path }} && if [ -n \"$(git status --porcelain)\" ]; then git add -A && git commit -m 'chore: commit pending session changes'; fi"


[warning] defense: commit_guard uses git add -A which stages all untracked files. Same concern as implementation.yaml: temp/build artifacts not covered by .gitignore may be committed. This pattern appears in all three recipe files (implementation.yaml, implementation-groups.yaml, remediation.yaml) and should be reviewed consistently.

Investigated — this is intentional. Consistent with implementation.yaml: git add -A across all three recipe files is a deliberate pattern for capturing all dirty state including new untracked files before merge.

Trecek

AutoSkillit review found 10 blocking issues. See inline comments.

Three-layer defense against the class of bug where file-writing skills hit context limits mid-execution, leaving uncommitted edits on disk that later fail the merge_worktree dirty-tree gate — causing expensive 30-min recovery cycles (9 confirmed since March 16). **Layer 1: Recipe commit_guard steps** - Add `commit_guard` step before every `merge_worktree` step in implementation.yaml, implementation-groups.yaml, and remediation.yaml - Auto-commits any uncommitted changes (from context-exhausted skills, pre-commit auto-fixes, or any other source) before merge - on_failure routes to merge so the dirty-tree gate remains the final arbiter **Layer 2: Missing on_context_limit fields** - Add `on_context_limit: re_push` to resolve_ci steps in all 3 pipeline recipes (implementation, remediation, implementation-groups) - Add `on_context_limit: retest` to fix_tests in research.yaml - Add `on_context_limit` to merge-prs.yaml resolve_review_integration and research.yaml generate_report / generate_report_inconclusive steps **Layer 3: Semantic validation rules** - New rule `merge-without-commit-guard` (ERROR): merge_worktree step without a commit_guard predecessor — fires at lint time to block regression - New rule `file-writing-skill-missing-context-limit` (WARNING): run_skill step invoking a write_behavior=always skill without on_context_limit - Rename _WORKTREE_CREATING_SKILLS → _WORKTREE_MODIFYING_SKILLS; add implement-experiment to coverage **Layer 4: SKILL.md Context Limit Behavior sections** - Add ## Context Limit Behavior sections to resolve-failures, resolve-review, generate-report, retry-worktree, run-experiment, compose-pr, prepare-pr, diagnose-ci, dry-walkthrough, resolve-design-review, review-design - Strengthen resolve-failures Step 3 commit instructions to stage all modified files (including pre-commit auto-fixes) before structured output emission **Tests (all new, all pass)** - test_merge_worktree_has_commit_guard_predecessor (parametric, all recipes) - test_resolve_failures_steps_have_on_context_limit (parametric, all recipes) - test_worktree_creating_skills_includes_experiment - test_file_writing_skill_missing_context_limit_fires + negative cases - test_merge_without_commit_guard_fires + negative cases - test_resolve_failures_has_context_limit_section - test_pipeline_file_writing_skills_have_context_limit_section Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

load_bundled_manifest() can return None for a corrupt/empty YAML; the test now asserts clearly rather than raising AttributeError.

…skill The frozenset was already renamed to _WORKTREE_MODIFYING_SKILLS; the semantic rule name, function name, and all test assertions now match.

…uction fn The test helper duplicated _is_commit_guard() from rules_merge.py verbatim. The test now imports and calls the production function directly, eliminating silent drift if the detection logic changes.

Trecek commented Apr 14, 2026

View reviewed changes

Trecek and others added 4 commits April 14, 2026 00:23

fix(review): assert manifest is not None before calling .get() in test

a5cb5f3

load_bundled_manifest() can return None for a corrupt/empty YAML; the test now asserts clearly rather than raising AttributeError.

fix(review): rename retries-on-worktree-creating-skill to -modifying-…

9602782

…skill The frozenset was already renamed to _WORKTREE_MODIFYING_SKILLS; the semantic rule name, function name, and all test assertions now match.

fix(review): remove duplicate _is_commit_guard_step; delegate to prod…

52218be

…uction fn The test helper duplicated _is_commit_guard() from rules_merge.py verbatim. The test now imports and calls the production function directly, eliminating silent drift if the detection logic changes.

Trecek force-pushed the resolve-failures-skill-lacks-context-limit-behavior-leaves-u/894 branch from 16dade1 to 52218be Compare April 14, 2026 07:25

Trecek added this pull request to the merge queue Apr 14, 2026

Merged via the queue into integration with commit 38c424d Apr 14, 2026
2 checks passed

Trecek deleted the resolve-failures-skill-lacks-context-limit-behavior-leaves-u/894 branch April 14, 2026 07:36

Trecek mentioned this pull request Apr 14, 2026

Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs) #925

Merged



		def _is_commit_guard(step_name: str, ctx: ValidationContext) -> bool:
		"""Return True if step_name is a commit_guard predecessor for merge_worktree.

Conversation

Trecek commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture Impact

Error/Resilience Diagram

Process Flow Diagram

State Lifecycle Diagram

Implementation Plan

Token Usage Summary

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Trecek commented Apr 14, 2026 •

edited

Loading