Skip to content

Implementation Plan: resolve-failures — Test Polling Cascade + Output Context Bloat#862

Merged
Trecek merged 3 commits intointegrationfrom
resolve-failures-test-polling-cascade-from-bash-backgroundin/815
Apr 14, 2026
Merged

Implementation Plan: resolve-failures — Test Polling Cascade + Output Context Bloat#862
Trecek merged 3 commits intointegrationfrom
resolve-failures-test-polling-cascade-from-bash-backgroundin/815

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented Apr 14, 2026

Summary

Two compounding inefficiencies in resolve-failures/SKILL.md waste tokens and API calls:

  1. Polling cascade (22+ wasted API calls per session): Steps 2 and 3 instruct test
    execution via `cd {worktree_path} && {test_command}` through the Bash tool. When
    tests exceed the Bash timeout (~5–7 min), the command auto-backgrounds and the LLM
    enters a manual polling loop of 18+ Read/Bash calls to wait for completion.

  2. Output context bloat (~17K tokens per session): Each test run injects 3–10K chars
    of raw pytest stdout (progress dots, install lines, timing) into conversation context.
    Across 5+ runs this accumulates to ~68K chars carried in every subsequent API call.

The fix is entirely in resolve-failures/SKILL.md: replace both Bash-based test
invocations with test_check MCP tool calls (which block synchronously, returning in one
call), and add an instruction in the fix loop to extract only the failure signal from each
result and discard the full stdout.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START(["START<br/>run_skill resolve-failures"])
    COMPLETE(["COMPLETE<br/>fixes_applied = N emitted"])
    ERROR(["ERROR<br/>max iterations reached"])

    subgraph Setup ["Setup & Validation — Steps 0 through 0.7"]
        direction TB
        S0["Step 0: Validate Args<br/>━━━━━━━━━━<br/>Path-detect worktree_path,<br/>plan_path, base_branch tokens"]
        S03["Step 0.3: Code-Index Init<br/>━━━━━━━━━━<br/>set_project_path repo root<br/>enables find_files / search_code"]
        S05["Step 0.5: Commit Dirty Tree<br/>━━━━━━━━━━<br/>git status --porcelain<br/>auto-commit if non-empty"]
        S07["Step 0.7: Switch Code-Index<br/>━━━━━━━━━━<br/>set_project_path worktree<br/>isolates queries to worktree"]
    end

    subgraph Context ["Context Understanding — Steps 1 and 2a"]
        direction TB
        S1["Step 1: Read Plan + Git History<br/>━━━━━━━━━━<br/>Plan intent, git log --oneline,<br/>git diff --stat from merge-base"]
        S2a["Step 2a: Reproduce CI Env<br/>━━━━━━━━━━<br/>Read CI workflow; run pre-test<br/>generation steps if present"]
    end

    subgraph TestRun ["● Step 2: Test Run — Single Synchronous MCP Call"]
        direction TB
        S2["● test_check MCP<br/>━━━━━━━━━━<br/>test_check(worktree_path)<br/>Blocks until suite completes<br/>Returns passed true or false<br/>in exactly 1 API call<br/><br/>BEFORE: Bash auto-backgrounds<br/>after timeout; LLM polls<br/>via 20+ Read/Bash calls"]
        D_PASS{"passed?"}
    end

    subgraph FixLoop ["● Step 3: Fix Loop — max 3 iterations"]
        direction TB
        S3A["Analyze Failures vs Plan<br/>━━━━━━━━━━<br/>Root cause vs implementation intent"]
        S3B["Apply Targeted Fixes<br/>━━━━━━━━━━<br/>pre-commit run --all-files<br/>git commit: fix description"]
        S3C["Write Fix Log<br/>━━━━━━━━━━<br/>AUTOSKILLIT_TEMP/resolve-failures/<br/>fix_log_N_ts.md per iteration"]
        S3D["● test_check MCP re-run<br/>━━━━━━━━━━<br/>test_check(worktree_path)<br/>Blocks synchronously<br/>BEFORE: Bash re-run, same<br/>polling-cascade risk"]
        S3E["● Extract Failure Signal<br/>━━━━━━━━━━<br/>Retain: counts, test names,<br/>error msgs 10-15 lines each<br/>Discard: full pytest stdout,<br/>progress dots, timing lines"]
        D_GREEN{"green?"}
        D_ITER{"iter >= 3?"}
    end

    subgraph GuardSuite ["★ tests/skills/test_resolve_failures_guards.py — new file"]
        direction TB
        G12["★ Step 2 Poll-Cascade Guards<br/>━━━━━━━━━━<br/>prohibits_bash_test_run<br/>prescribes_test_check_mcp"]
        G34["★ Step 3 Poll-Cascade Guards<br/>━━━━━━━━━━<br/>prohibits_bash_rerun<br/>prescribes_test_check_mcp"]
        G56["★ Step 3 Output-Bloat Guards<br/>━━━━━━━━━━<br/>instructs_output_summarization<br/>instructs_stdout_discard"]
    end

    %% MAIN FLOW %%
    START --> S0
    S0 --> S03
    S03 --> S05
    S05 --> S07
    S07 --> S1
    S1 --> S2a
    S2a --> S2
    S2 --> D_PASS
    D_PASS -->|"passed: true"| COMPLETE
    D_PASS -->|"passed: false"| S3A
    S3A --> S3B
    S3B --> S3C
    S3C --> S3D
    S3D --> S3E
    S3E --> D_GREEN
    D_GREEN -->|"green"| COMPLETE
    D_GREEN -->|"still red"| D_ITER
    D_ITER -->|"yes — report failure"| ERROR
    D_ITER -->|"no — next iteration"| S3A

    %% GUARD CONNECTIONS (build-time validation of SKILL.md text) %%
    G12 -. "validates SKILL.md Step 2" .-> S2
    G34 -. "validates SKILL.md Step 3 re-run" .-> S3D
    G56 -. "validates SKILL.md Step 3 extract" .-> S3E

    %% CLASS ASSIGNMENTS %%
    class START,COMPLETE,ERROR terminal;
    class S0,S03,S05,S07 phase;
    class S1,S2a handler;
    class S2,S3D stateNode;
    class D_PASS,D_GREEN,D_ITER stateNode;
    class S3A,S3B,S3C handler;
    class S3E output;
    class G12,G34,G56 newComponent;
Loading

Error/Resilience Diagram

%%{init: {'flowchart': {'nodeSpacing': 42, 'rankSpacing': 54, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ENTRY POINT %%
    INVOKE(["run_skill → resolve-failures<br/>━━━━━━━━━━<br/>Orchestrator triggers after merge FAIL"])

    subgraph OLD ["❌ ELIMINATED FAILURE PATH (old Bash invocation)"]
        direction TB
        BASH["Bash: cd {worktree} &&<br/>task test-all<br/>━━━━━━━━━━<br/>Blocks on 5–7 min test suite"]
        TIMEOUT["Bash Auto-Background<br/>━━━━━━━━━━<br/>Built-in timeout exceeded<br/>Command silently backgrounded"]
        LOST_SIGNAL["Lost DONE Signal<br/>━━━━━━━━━━<br/>LLM has no completion event<br/>Process runs detached"]
        CASCADE["Polling Cascade<br/>━━━━━━━━━━<br/>18–22+ API calls (Read/Bash)<br/>Full context resent each call"]
        BLOAT["Output Bloat<br/>━━━━━━━━━━<br/>3–10K chars pytest stdout/run<br/>~17K tokens accumulated/session<br/>cache_read cost inflated"]
    end

    subgraph STEP2 ["● STEP 2 — test_check BLOCKING PATH (resolve-failures/SKILL.md)"]
        direction TB
        TC_CALL["● test_check MCP Tool<br/>━━━━━━━━━━<br/>server/tools_workspace.py<br/>Single blocking async call"]
        RUNNER["DefaultTestRunner.run()<br/>━━━━━━━━━━<br/>execution/testing.py<br/>Configures cmd + timeout + env"]
        MANAGED["run_managed_async()<br/>━━━━━━━━━━<br/>execution/process.py<br/>• Temp-file I/O (not pipes)<br/>• start_new_session=True<br/>• stdin=DEVNULL<br/>• anyio task-group watchers"]
        PROC_DONE["Process Exits / Timeout<br/>━━━━━━━━━━<br/>On timeout: SIGTERM → SIGKILL<br/>process tree killed"]
        TC_RESULT["Test Result<br/>━━━━━━━━━━<br/>passed: bool, stdout, stderr<br/>Single atomic MCP response"]
    end

    subgraph STEP3 ["● STEP 3 — FIX LOOP (circuit-breaker: max 3 iterations)"]
        direction TB
        APPLY_FIX["Apply Code Fix<br/>━━━━━━━━━━<br/>Read/Edit/Write on failing tests<br/>No unrelated changes"]
        RERUN["● test_check MCP Re-run<br/>━━━━━━━━━━<br/>Same blocking path as Step 2<br/>NOT Bash — identical isolation"]
        EXTRACT["● Output Extraction<br/>━━━━━━━━━━<br/>Retain ONLY:<br/>• pass/fail counts<br/>• failing test names<br/>• first 10–15 lines/error"]
        DISCARD["● Stdout Discard<br/>━━━━━━━━━━<br/>Drop: full pytest stdout<br/>progress dots, timing lines<br/>install-worktree output"]
        CB{"Circuit Breaker<br/>━━━━━━━━━━<br/>iteration < 3?"}
    end

    subgraph GUARDS ["★ REGRESSION GUARDS (tests/skills/test_resolve_failures_guards.py)"]
        direction LR
        G1["★ prohibits_bash_test_run<br/>━━━━━━━━━━<br/>Step 2: 'cd {worktree}' absent"]
        G2["★ prescribes_test_check_mcp<br/>━━━━━━━━━━<br/>Step 2: 'test_check' present"]
        G3["★ fix_loop_prohibits_bash_rerun<br/>━━━━━━━━━━<br/>Step 3: 'cd {worktree}' absent"]
        G4["★ fix_loop_prescribes_test_check<br/>━━━━━━━━━━<br/>Step 3: 'test_check' present"]
        G5["★ instructs_output_summarization<br/>━━━━━━━━━━<br/>Step 3: extract|retain only<br/>failure signal present"]
        G6["★ instructs_stdout_discard<br/>━━━━━━━━━━<br/>Step 3: discard|do not retain<br/>full output present"]
    end

    %% TERMINAL STATES %%
    T_GREEN(["WORKTREE GREEN<br/>━━━━━━━━━━<br/>passed=true → merge-ready"])
    T_RED(["FAILURE — MAX ITERATIONS<br/>━━━━━━━━━━<br/>Worktree preserved for inspection<br/>Orchestrator notified"])
    T_CASCADE(["UNCONTROLLED CASCADE<br/>━━━━━━━━━━<br/>Session cost explodes"])

    %% ENTRY ROUTING %%
    INVOKE -.->|"OLD: eliminated"| BASH
    INVOKE -->|"NEW: active"| TC_CALL

    %% OLD FAILURE PATH (dead end) %%
    BASH -->|"timeout fires"| TIMEOUT
    TIMEOUT -->|"silent background"| LOST_SIGNAL
    LOST_SIGNAL -->|"LLM polls"| CASCADE
    CASCADE -->|"accumulates"| BLOAT
    BLOAT --> T_CASCADE

    %% NEW EXECUTION PATH %%
    TC_CALL --> RUNNER
    RUNNER --> MANAGED
    MANAGED -->|"process lifecycle"| PROC_DONE
    PROC_DONE --> TC_RESULT

    %% RESULT ROUTING %%
    TC_RESULT -->|"passed=true"| T_GREEN
    TC_RESULT -->|"passed=false"| APPLY_FIX

    %% FIX LOOP %%
    APPLY_FIX --> RERUN
    RERUN --> EXTRACT
    EXTRACT --> DISCARD
    DISCARD --> CB
    CB -->|"yes — another iteration"| APPLY_FIX
    CB -->|"no — exhausted"| T_RED

    %% GUARD ENFORCEMENT (static analysis, dashed) %%
    G1 -.->|"asserts absent"| TC_CALL
    G2 -.->|"asserts present"| TC_CALL
    G3 -.->|"asserts absent"| RERUN
    G4 -.->|"asserts present"| RERUN
    G5 -.->|"asserts present"| EXTRACT
    G6 -.->|"asserts present"| DISCARD

    %% CLASS ASSIGNMENTS %%
    class INVOKE terminal;
    class BASH,TIMEOUT integration;
    class LOST_SIGNAL,CASCADE,BLOAT,T_CASCADE gap;
    class TC_CALL,RERUN handler;
    class RUNNER,MANAGED phase;
    class PROC_DONE,APPLY_FIX handler;
    class TC_RESULT stateNode;
    class EXTRACT,DISCARD output;
    class CB stateNode;
    class G1,G2,G3,G4,G5,G6 detector;
    class T_GREEN,T_RED terminal;
Loading

Scenarios Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph S1 ["S1 (BEFORE): Bash Polling Cascade — Mitigated"]
        direction LR
        S1_SKILL["● SKILL.md Step 2<br/>━━━━━━━━━━<br/>Old: run tests via Bash"]
        S1_BASH["Bash Tool<br/>━━━━━━━━━━<br/>task test-all"]
        S1_BG["Auto-Background<br/>━━━━━━━━━━<br/>test suite over 5 min<br/>command detaches"]
        S1_POLL["LLM Polling Loop<br/>━━━━━━━━━━<br/>20+ API calls<br/>to detect completion"]
    end

    subgraph S2 ["S2 (AFTER): test_check Single Blocking Call"]
        direction LR
        S2_SKILL["● SKILL.md Step 2<br/>━━━━━━━━━━<br/>Now: use test_check MCP<br/>not Bash"]
        S2_MCP["test_check MCP<br/>━━━━━━━━━━<br/>tools_workspace.py<br/>Blocking async call"]
        S2_PROC["subprocess<br/>━━━━━━━━━━<br/>task test-all<br/>awaited to completion"]
        S2_ADJ["check_test_passed()<br/>━━━━━━━━━━<br/>Cross-validates exit code<br/>and pytest summary lines"]
        S2_RESULT["Structured Result<br/>━━━━━━━━━━<br/>passed: true/false<br/>One call — no polling"]
    end

    subgraph S3 ["S3 (AFTER): Fix Loop — Summarized Output (max 3 iterations)"]
        direction LR
        S3_SKILL["● SKILL.md Step 3<br/>━━━━━━━━━━<br/>Fix loop: test_check only<br/>Do NOT re-run via Bash"]
        S3_FIX["Apply Fix<br/>━━━━━━━━━━<br/>Edit source files<br/>in worktree"]
        S3_MCP["test_check MCP<br/>━━━━━━━━━━<br/>Re-run tests<br/>per iteration"]
        S3_EXTRACT["Extract Signal<br/>━━━━━━━━━━<br/>pass/fail counts<br/>failing test names<br/>first 10-15 error lines"]
        S3_DISCARD["Discard stdout<br/>━━━━━━━━━━<br/>No dots / timing<br/>No install output"]
        S3_CTX["Bounded Context<br/>━━━━━━━━━━<br/>Minimal token growth<br/>across all iterations"]
    end

    subgraph S4 ["S4: Guard Enforcement — Structural SKILL.md Validation"]
        direction LR
        S4_CI["CI / task test-check<br/>━━━━━━━━━━<br/>Runs test suite in parallel"]
        S4_GUARD["★ test_resolve_failures_guards.py<br/>━━━━━━━━━━<br/>6 regex guard tests<br/>xdist-safe: pure text assertions"]
        S4_LOAD["SKILL.md loaded<br/>━━━━━━━━━━<br/>pkg_root() path lookup<br/>read once at module scope"]
        S4_SCOPE["_extract_step()<br/>━━━━━━━━━━<br/>Regex scope to Step 2 or 3<br/>lookahead: Step 2 not Step 2a"]
        S4_POLLING["4 Polling-Cascade Guards<br/>━━━━━━━━━━<br/>Bash pattern ABSENT<br/>test_check PRESENT<br/>in Steps 2 and 3"]
        S4_BLOAT["2 Output-Bloat Guards<br/>━━━━━━━━━━<br/>summarize instruction present<br/>discard instruction present<br/>in Step 3"]
        S4_PASS["CI PASS<br/>━━━━━━━━━━<br/>SKILL.md structurally<br/>compliant — both fixes hold"]
    end

    %% S1: Bash polling cascade flow
    S1_SKILL -->|"LLM invokes Bash"| S1_BASH
    S1_BASH -->|"auto-backgrounds"| S1_BG
    S1_BG -->|"LLM must detect completion"| S1_POLL

    %% S2: test_check blocking flow
    S2_SKILL -->|"LLM invokes MCP"| S2_MCP
    S2_MCP -->|"awaits subprocess"| S2_PROC
    S2_PROC -->|"exit code + stdout"| S2_ADJ
    S2_ADJ -->|"single JSON response"| S2_RESULT

    %% S3: fix loop summarization flow
    S3_SKILL --> S3_FIX
    S3_FIX -->|"test_check MCP call"| S3_MCP
    S3_MCP -->|"full stdout returned"| S3_EXTRACT
    S3_EXTRACT -->|"retain failure signal only"| S3_DISCARD
    S3_DISCARD -->|"loop if still failing"| S3_CTX

    %% S4: guard enforcement flow
    S4_CI --> S4_GUARD
    S4_GUARD -->|"reads"| S4_LOAD
    S4_LOAD -->|"scopes text"| S4_SCOPE
    S4_SCOPE -->|"Steps 2 + 3"| S4_POLLING
    S4_SCOPE -->|"Step 3 only"| S4_BLOAT
    S4_POLLING -->|"assert"| S4_PASS
    S4_BLOAT -->|"assert"| S4_PASS

    %% CLASS ASSIGNMENTS %%
    class S1_SKILL,S2_SKILL,S3_SKILL phase;
    class S1_BASH,S1_BG,S1_POLL gap;
    class S2_MCP,S3_MCP handler;
    class S2_PROC,S2_ADJ phase;
    class S2_RESULT,S3_CTX output;
    class S3_FIX handler;
    class S3_EXTRACT phase;
    class S3_DISCARD detector;
    class S4_CI cli;
    class S4_GUARD newComponent;
    class S4_LOAD stateNode;
    class S4_SCOPE phase;
    class S4_POLLING,S4_BLOAT detector;
    class S4_PASS output;
Loading

Closes #815

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260413-185024-166119/.autoskillit/temp/make-plan/resolve-failures-test-polling-cascade_plan_2026-04-13_185340.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step uncached output cache_read cache_write count time
plan 225 15.6k 1.2M 70.1k 1 4m 19s
review 1.3k 5.8k 152.3k 26.3k 1 3m 45s
verify 140 13.5k 678.9k 41.1k 1 3m 43s
implement 184 7.5k 788.3k 32.5k 1 5m 13s
prepare_pr 102 5.5k 323.2k 23.8k 1 1m 48s
run_arch_lenses 313 40.1k 925.6k 200.7k 6 13m 52s
compose_pr 59 10.1k 210.4k 33.8k 1 2m 18s
Total 2.3k 98.1k 4.3M 428.3k 35m 1s

Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit PR Review — Verdict: changes_requested

step3 = _extract_step(text, "Step 3")
# Must mention extracting/retaining only the key failure info
has_extract = bool(
re.search(r"extract|retain only|only.{0,40}(fail|count|name)", step3, re.IGNORECASE)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] bugs: Duplicate line number [L89] in the diff annotation — both the re.search(...) call and its closing ) carry the L89 marker; L90 is skipped entirely. In the actual source file, every line from 91 onward maps to one line number lower than the diff label, causing off-by-one navigation errors in editors and coverage tools. The annotation tooling should be fixed to emit unique sequential markers.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. The test file at line 89 is functionally correct. The duplicate L89 annotation label is an artifact of how diff_annotator.py formats diff hunks, not a bug in this test. The actual source lines 89 (re.search call) and 90 (closing paren) are correctly sequenced. Any fix to the annotation tooling belongs in a separate issue against diff_annotator.py, not in this test file.

Comment thread tests/skills/test_resolve_failures_guards.py Outdated
SKILL_MD = pkg_root() / "skills_extended" / "resolve-failures" / "SKILL.md"


def _skill_text() -> str:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: _skill_text() calls SKILL_MD.read_text() with no error handling. A missing or unreadable file surfaces as a bare FileNotFoundError/PermissionError with no context about which path failed. Wrap in a try/except that re-raises with the resolved path so CI diagnostics are self-explanatory.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. Python's Path.read_text() raises FileNotFoundError with the full path already embedded in the OS-level error message (e.g., [Errno 2] No such file or directory: '/path/to/SKILL.md'). The reviewer's concern about 'no context about which path failed' does not apply here — the path is included automatically. Additionally, CLAUDE.md guidelines prohibit adding error handling for scenarios that cannot occur in a correctly installed package. No code change applied.


# --- Polling-cascade fix ---

def test_resolve_failures_step2_prohibits_bash_test_run() -> None:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] tests: test_resolve_failures_step2_prohibits_bash_test_run and test_resolve_failures_step2_prescribes_test_check_mcp both call _extract_step(text, 'Step 2') and verify the same invariant. Separate tests add no isolation value; consider collapsing into a single test with two assertions.


from autoskillit.core.paths import pkg_root

SKILL_MD = pkg_root() / "skills_extended" / "resolve-failures" / "SKILL.md"
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] tests: SKILL_MD is computed at module import time via pkg_root(). If pkg_root() raises, the error surfaces as a pytest collection error rather than a clear test failure. Deferring the computation inside _skill_text() would make failures more transparent.

"""
text = _skill_text()
step3 = _extract_step(text, "Step 3")
# Must mention extracting/retaining only the key failure info
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] slop: The inline comment # Must mention extracting/retaining only the key failure info restates exactly what the docstring and assertion message already say. Remove it.

```
test_check(worktree_path="{worktree_path}")
```
Do NOT re-run via Bash — see Step 2 rationale.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] slop: "Do NOT re-run via Bash — see Step 2 rationale." duplicates the rationale already stated in full in Step 2. The cross-reference alone is sufficient; the redundant prose can be dropped.

Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit review found 3 blocking issues (warnings). See inline comments. (Note: REQUEST_CHANGES is not available on own PRs — verdict is changes_requested.)

@Trecek Trecek enabled auto-merge April 14, 2026 04:43
Trecek added 3 commits April 13, 2026 21:56
…t fixes

Six new tests that verify:
- Step 2 and Step 3 no longer instruct running tests via Bash (cd {worktree_path} && {test_command})
- Both steps prescribe test_check MCP tool instead
- Step 3 instructs extracting only the failure signal and discarding full stdout
…lures skill

Eliminates polling cascade (22+ wasted API calls) caused by Bash auto-backgrounding
long-running test commands. Replaces both Step 2 and Step 3 cd/test_command patterns
with synchronous test_check MCP calls. Adds Step 3 instruction to extract and discard
full pytest stdout, preventing ~17K token context accumulation per session.
… anchor on pytest/stdout context

Replaces the overly broad `discard` alternative with `discard.{0,60}(pytest|stdout|full output)`,
which requires the keyword to appear in the context of discarding test output — preventing false
passes on unrelated uses of "discard" in SKILL.md prose.
@Trecek Trecek force-pushed the resolve-failures-test-polling-cascade-from-bash-backgroundin/815 branch from 327ae5d to a865416 Compare April 14, 2026 05:00
@Trecek Trecek added this pull request to the merge queue Apr 14, 2026
Merged via the queue into integration with commit 0a6c310 Apr 14, 2026
2 checks passed
@Trecek Trecek deleted the resolve-failures-test-polling-cascade-from-bash-backgroundin/815 branch April 14, 2026 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant