fix: resolve-failures ci_only_failure verdict when fix was applied by Trecek · Pull Request #1969 · TalonT-Org/AutoSkillit

Trecek · 2026-05-06T01:59:53Z

Summary

The resolve-failures SKILL.md has an ambiguous verdict decision flow that allows an LLM executor to emit ci_only_failure even after successfully applying a fix. The fix restructures the Step 2d decision tree to make the override rule explicit: any time a code change is committed and tests pass, the verdict is real_fix, regardless of failure_subtype. The Step 2d table is clarified to apply ONLY to the "no fix applied" path, and a post-fix-loop verdict override is added to prevent re-evaluation through the wrong decision path.

Requirements

REQ-RF-001: When resolve-failures applies a code change AND the subsequent CI run passes, the verdict MUST be real_fix, not ci_only_failure
REQ-RF-002: ci_only_failure should only be emitted when no fix was applied or when the applied fix did not resolve the CI failure
REQ-RF-003: The fix must not break the existing ci_only_failure path for genuinely unfixable CI failures
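The override rule and how it is scoped can be sketched as a small Python function. This is a minimal illustration, not the skill itself, because resolve-failures is a SKILL.md prompt rather than code. The parameter names fixes_applied, tests_pass and failure_subtype come from the prose above. no_fix_verdict is a hypothetical stand-in for the Step 2d table, whose rows are not reproduced here.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the Step 2d "No-Fix Verdict Decision Tree":
# maps a failure_subtype to a verdict string.
NoFixTree = Callable[[str], str]


def decide_verdict(
    fixes_applied: int,
    tests_pass: bool,
    failure_subtype: str,
    no_fix_verdict: NoFixTree,
) -> Optional[str]:
    """Sketch of the verdict precedence described in this PR.

    Returns None when neither rule applies (a fix was applied but tests
    still fail). How the skill handles that case is outside this sketch.
    """
    # Step 2c override: a committed fix with passing tests is always real_fix,
    # regardless of failure_subtype (REQ-RF-001).
    if fixes_applied > 0 and tests_pass:
        return "real_fix"
    # Step 2d is consulted only on the no-fix path (REQ-RF-002/003).
    if fixes_applied == 0:
        return no_fix_verdict(failure_subtype)
    return None


def example_no_fix_tree(failure_subtype: str) -> str:
    # Hypothetical: pretend every no-fix subtype maps to ci_only_failure.
    return "ci_only_failure"
```

The override is checked before the no-fix table. As a result, the table can never downgrade a successful fix, whatever failure_subtype the earlier analysis assigned.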

Closes #1954

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260505-180603-520539/.autoskillit/temp/make-plan/resolve_failures_ci_only_failure_verdict_fix_plan_2026-05-05_181000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step	count	uncached	output	cache_read	peak_ctx	turns	cache_write	time
plan	1	52	11.3k	677.5k	78.1k	83	52.0k	8m 25s
verify	1	46	13.6k	941.1k	63.6k	49	50.5k	5m 18s
implement	1	3.3M	37.4k	1.7M	25.6k	152	42.0k	16m 53s
prepare_pr	1	60	4.1k	182.5k	35.8k	17	23.4k	1m 27s
compose_pr	1	59	2.2k	159.8k	26.2k	15	13.2k	56s
review_pr	1	260	21.0k	809.4k	59.7k	59	48.5k	6m 59s
resolve_review	1	197	9.5k	911.4k	50.7k	58	38.0k	4m 57s
Total		3.3M	99.1k	5.3M	78.1k		267.7k	44m 57s

Token Efficiency

Step	LoC Changed	cache_read/LoC	cache_write/LoC	output/LoC
plan	0	—	—	—
verify	0	—	—	—
implement	114	14574.8	368.9	327.9
prepare_pr	0	—	—	—
compose_pr	0	—	—	—
review_pr	0	—	—	—
resolve_review	4	227847.0	9511.0	2371.2
Total	118	45281.8	2268.6	839.7
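The Token Efficiency columns appear to be each step's token count divided by its LoC changed. Steps with no changed lines show a dash. The Total row appears to divide summed tokens by summed LoC (114 + 4 = 118) rather than averaging the per-step ratios. Here is a minimal sketch of that arithmetic under that assumption. The inputs are the rounded values from the usage table, so the results can differ slightly from the reported figures, which were presumably computed from exact counts.

```python
from typing import Optional


def per_loc(tokens: float, loc_changed: int) -> Optional[float]:
    """Tokens per line of code changed; None when no lines changed."""
    if loc_changed == 0:
        return None  # rendered as a dash in the table
    return tokens / loc_changed


# Rounded inputs from the Token Usage Summary (output column) and the
# LoC Changed column. The exact underlying counts are not shown in the PR.
output_tokens = {"plan": 11_300, "implement": 37_400, "resolve_review": 9_500}
loc_changed = {"plan": 0, "implement": 114, "resolve_review": 4}

for step, tokens in output_tokens.items():
    ratio = per_loc(tokens, loc_changed[step])
    print(step, "-" if ratio is None else round(ratio, 1))

# Total row: summed tokens over summed LoC (the omitted steps changed 0 lines).
total_output = 99_100
total_loc = sum(loc_changed.values())
print("Total", round(per_loc(total_output, total_loc), 1))
```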

- Add Step 2c (Verdict Override Rule): any fix committed + tests pass → verdict = real_fix unconditionally, regardless of failure_subtype
- Rename Step 2d to "No-Fix Verdict Decision Tree" and scope it to fixes_applied == 0 only
- Remove confusing rows from decision table (fix-applied path is now handled exclusively by Step 2c/Step 3)
- Add Step 2c override to Step 3 green exit condition
- Add Step 2c override to Step 2.5 fix-loop entry
- Update verdict invariant: ci_only_failure never emitted when fix applied
- Add four new structural tests in test_resolve_failures_ci_aware.py (REQ-RF-001/002 guards)
- Minor rewording of Step 2a, 2c, 2d, 2.5 body references to avoid spurious regex matches in existing test patterns

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
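The new structural tests check the text of SKILL.md rather than runtime behavior. The following is a hypothetical sketch of that style of test, not the code in test_resolve_failures_ci_aware.py. The heading names, the sample SKILL.md excerpt, and the helpers split_sections, find_section and check_override_structure are assumptions made for illustration.

```python
import re

# Hypothetical SKILL.md excerpt; the real file's wording is not reproduced here.
SAMPLE_SKILL_MD = """\
### Step 2c: Verdict Override Rule
If any fix was committed and tests pass, the verdict is real_fix,
regardless of failure_subtype.

### Step 2d: No-Fix Verdict Decision Tree
Applies only when fixes_applied == 0.

### Step 3
Green exit: apply the Step 2c override before reporting a verdict.
"""


def split_sections(md: str) -> dict[str, str]:
    """Map each '### ' heading (without the marker) to the body beneath it."""
    sections: dict[str, str] = {}
    current = None
    for line in md.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def find_section(sections: dict[str, str], prefix: str) -> str:
    """Return the body of the first section whose heading starts with prefix."""
    for heading, body in sections.items():
        if heading.startswith(prefix):
            return body
    raise AssertionError(f"no section heading starts with {prefix!r}")


def check_override_structure(md: str) -> None:
    sections = split_sections(md)
    # Step 2c must state the override and that it ignores failure_subtype.
    step2c = find_section(sections, "Step 2c")
    assert "real_fix" in step2c and "failure_subtype" in step2c, step2c
    # Step 2d must be scoped to the no-fix path.
    step2d = find_section(sections, "Step 2d")
    assert re.search(r"fixes_applied\s*==\s*0", step2d), step2d
    # Step 3's green exit must defer to the Step 2c override.
    step3 = find_section(sections, "Step 3")
    assert "Step 2c" in step3, step3
```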

Trecek

AutoSkillit PR Review — Verdict: approved_with_comments

Trecek

AutoSkillit review: warning-only findings detected. See inline comments — no blocking changes required.

Info findings (no action required):

tests/skills/test_resolve_failures_ci_aware.py

L229 [info/tests]: test_step2d_table_scoped_to_no_fix_path includes "without entering step 3" as a valid matching phrase, but SKILL.md uses "the fix loop" in all prose after the rename. This phrase is dead and will never match. Replace with "fix loop was entered" or similar.
L204 [info/tests]: test_skill_fix_applied_overrides_to_real_fix uses assert re.search(...) or re.search(...), msg with two alternatives. If both fail, the error message gives no hint as to which pattern was expected. Consider splitting or embedding pattern strings in the failure message.
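Both info findings concern how alternative phrasings are asserted. The sketch below shows one way to address them; it is not the PR's actual fix, and its names (assert_any_match, unmatched_patterns) are hypothetical. The accepted patterns go in a list. The failure message names every pattern that was tried. Any pattern that never matches the document is flagged as dead.

```python
import re
from typing import Sequence


def assert_any_match(text: str, patterns: Sequence[str], what: str) -> re.Match:
    """Assert that at least one pattern matches; name every tried pattern on failure."""
    for pattern in patterns:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return match
    tried = ", ".join(repr(p) for p in patterns)
    raise AssertionError(f"{what}: none of the expected patterns matched; tried {tried}")


def unmatched_patterns(text: str, patterns: Sequence[str]) -> list[str]:
    """Return the patterns that never match text (candidates for dead phrases)."""
    return [p for p in patterns if not re.search(p, text, re.IGNORECASE)]
```

Running unmatched_patterns against the real SKILL.md text in a separate test would have surfaced the stale "without entering step 3" alternative without waiting for a reviewer to spot it.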

…truncation

Regex r"Step 3.*?Step 4" stopped at the first inline "Step 4" cross-reference inside the Step 3 body rather than at the ### Step 4 section boundary, making the test fragile. Replace with r"### Step 3.*?(?=\n### Step [45]|\Z)", which anchors on Markdown section headers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
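The difference between the two section-extraction regexes shows up on a small made-up SKILL.md excerpt. This sketch assumes the test compiles both patterns with re.DOTALL. Without that flag the dot does not match newlines, and the original pattern could not span lines at all. The sample text is hypothetical.

```python
import re

# Hypothetical excerpt: the Step 3 body mentions "Step 4" inline before the
# actual "### Step 4" header.
SKILL_MD = (
    "### Step 2\n"
    "Collect failures.\n"
    "### Step 3\n"
    "If CI is green, continue to Step 4.\n"
    "Green exit: the Step 2c override yields real_fix.\n"
    "### Step 4\n"
    "Report the verdict.\n"
)

# Original pattern: stops at the first "Step 4" anywhere, including prose.
OLD = re.compile(r"Step 3.*?Step 4", re.DOTALL)
# Revised pattern: stops only just before a "### Step 4" or "### Step 5"
# header line, or at the end of the input.
NEW = re.compile(r"### Step 3.*?(?=\n### Step [45]|\Z)", re.DOTALL)

old_body = OLD.search(SKILL_MD).group(0)
new_body = NEW.search(SKILL_MD).group(0)

print("real_fix" in old_body)  # prints False: the override line is cut off
print("real_fix" in new_body)  # prints True: the whole Step 3 section is captured
```

The lookahead keeps the next header out of the match. The \Z alternative lets the pattern still succeed when Step 3 is the last section in the file.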


Trecek commented May 6, 2026


Comment thread on tests/skills/test_resolve_failures_ci_aware.py (outdated)

Trecek commented May 6, 2026


Trecek added this pull request to the merge queue May 6, 2026

Merged via the queue into develop with commit 11a9561 May 6, 2026
2 checks passed

Trecek deleted the resolve-failures-skill-ci-only-failure-verdict-when-fix-was/1954 branch May 6, 2026 02:51

Trecek merged 2 commits into develop from resolve-failures-skill-ci-only-failure-verdict-when-fix-was/1954
