fix: resolve-failures ci_only_failure verdict when fix was applied #1969

Merged: Trecek merged 2 commits into develop from resolve-failures-skill-ci-only-failure-verdict-when-fix-was/1954 on May 6, 2026

@Trecek Trecek commented May 6, 2026

Summary

The resolve-failures SKILL.md has an ambiguous verdict decision flow that allows an LLM executor to emit ci_only_failure even after successfully applying a fix. The fix restructures the Step 2d decision tree to make the override rule explicit: any time a code change is committed and tests pass, the verdict is real_fix, regardless of failure_subtype. The Step 2d table is clarified to apply ONLY to the "no fix applied" path, and a post-fix-loop verdict override is added to prevent re-evaluation through the wrong decision path.

Requirements

  • REQ-RF-001: When resolve-failures applies a code change AND the subsequent CI run passes, the verdict MUST be real_fix, not ci_only_failure
  • REQ-RF-002: ci_only_failure should only be emitted when no fix was applied or when the applied fix did not resolve the CI failure
  • REQ-RF-003: The fix must not break the existing ci_only_failure path for genuinely unfixable CI failures
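The three requirements above reduce to a small predicate. The following is a minimal Python sketch of that rule, not the skill's actual implementation (SKILL.md states it in prose for an LLM executor). The `FixLoopOutcome` record, its field names, and both function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FixLoopOutcome:
    """Hypothetical summary of a resolve-failures run (names are assumptions)."""
    fixes_applied: int  # number of code changes committed during the run
    ci_passed: bool     # whether the CI run after the last change passed


def override_verdict(outcome: FixLoopOutcome) -> Optional[str]:
    """REQ-RF-001: a committed fix plus passing CI always yields real_fix.

    Returns None when the override does not apply, meaning the caller
    should fall through to the no-fix decision path.
    """
    if outcome.fixes_applied > 0 and outcome.ci_passed:
        return "real_fix"
    return None


def ci_only_failure_allowed(outcome: FixLoopOutcome) -> bool:
    """REQ-RF-002: ci_only_failure is permitted only when no fix was applied
    or when the applied fix did not resolve the CI failure."""
    return outcome.fixes_applied == 0 or not outcome.ci_passed
```

Under this sketch the two functions never both fire: whenever `override_verdict` returns `real_fix`, `ci_only_failure_allowed` is false, which is the invariant the PR adds. REQ-RF-003 corresponds to the `fixes_applied == 0` case still being allowed.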

Closes #1954

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260505-180603-520539/.autoskillit/temp/make-plan/resolve_failures_ci_only_failure_verdict_fix_plan_2026-05-05_181000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|---|---|---|---|---|---|---|---|---|
| plan | 1 | 52 | 11.3k | 677.5k | 78.1k | 83 | 52.0k | 8m 25s |
| verify | 1 | 46 | 13.6k | 941.1k | 63.6k | 49 | 50.5k | 5m 18s |
| implement | 1 | 3.3M | 37.4k | 1.7M | 25.6k | 152 | 42.0k | 16m 53s |
| prepare_pr | 1 | 60 | 4.1k | 182.5k | 35.8k | 17 | 23.4k | 1m 27s |
| compose_pr | 1 | 59 | 2.2k | 159.8k | 26.2k | 15 | 13.2k | 56s |
| review_pr | 1 | 260 | 21.0k | 809.4k | 59.7k | 59 | 48.5k | 6m 59s |
| resolve_review | 1 | 197 | 9.5k | 911.4k | 50.7k | 58 | 38.0k | 4m 57s |
| Total | | 3.3M | 99.1k | 5.3M | 78.1k | | 267.7k | 44m 57s |

Token Efficiency

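| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|---|---|---|---|---|
| plan | 0 | | | |
| verify | 0 | | | |
| implement | 114 | 14574.8 | 368.9 | 327.9 |
| prepare_pr | 0 | | | |
| compose_pr | 0 | | | |
| review_pr | 0 | | | |
| resolve_review | 4 | 227847.0 | 9511.0 | 2371.2 |
| Total | 118 | 45281.8 | 2268.6 | 839.7 |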

- Add Step 2c: Verdict Override Rule — any fix committed + tests pass
  → verdict = real_fix unconditionally, regardless of failure_subtype
- Rename Step 2d to "No-Fix Verdict Decision Tree" and scope it to
  fixes_applied == 0 only
- Remove confusing rows from decision table (fix-applied path is now
  handled exclusively by Step 2c/Step 3)
- Add Step 2c override to Step 3 green exit condition
- Add Step 2c override to Step 2.5 fix-loop entry
- Update verdict invariant: ci_only_failure never emitted when fix applied
- Add four new structural tests in test_resolve_failures_ci_aware.py
  (REQ-RF-001/002 guards)
- Minor rewording of Step 2a, 2c, 2d, 2.5 body references to avoid
  spurious regex matches in existing test patterns

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@Trecek Trecek left a comment


AutoSkillit PR Review — Verdict: approved_with_comments

Comment thread tests/skills/test_resolve_failures_ci_aware.py Outdated

@Trecek Trecek left a comment


AutoSkillit review: warning-only findings detected. See inline comments — no blocking changes required.

Info findings (no action required):

tests/skills/test_resolve_failures_ci_aware.py

  • L229 [info/tests]: test_step2d_table_scoped_to_no_fix_path includes "without entering step 3" as a valid matching phrase, but SKILL.md uses "the fix loop" in all prose after the rename. This phrase is dead and will never match. Replace with "fix loop was entered" or similar.
  • L204 [info/tests]: test_skill_fix_applied_overrides_to_real_fix uses assert re.search(...) or re.search(...), msg with two alternatives. If both fail, the error message gives no hint as to which pattern was expected. Consider splitting or embedding pattern strings in the failure message.
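The second finding (an `assert a or b` whose message does not say which patterns were tried) could be addressed with a small helper along these lines. This is a sketch only: `assert_any_pattern` is a hypothetical name, the default flags are an assumption, and the patterns in the usage line are placeholders, not the ones used in `test_skill_fix_applied_overrides_to_real_fix`.

```python
import re


def assert_any_pattern(text, patterns, flags=re.IGNORECASE | re.DOTALL):
    """Assert that at least one regex in `patterns` matches `text`.

    On failure, the AssertionError lists every pattern that was tried,
    so the reader can see what the test expected to find.
    """
    if any(re.search(pattern, text, flags) for pattern in patterns):
        return
    tried = "\n".join(f"  - {pattern!r}" for pattern in patterns)
    raise AssertionError(f"none of the expected patterns matched:\n{tried}")


# Usage with placeholder text and patterns (passes silently):
assert_any_pattern(
    "If a fix was committed and tests pass, the verdict is real_fix.",
    [r"fix was committed.*real_fix", r"verdict override"],
)
```

Listing the alternatives in one place also makes a dead phrase, like the one flagged in the first finding, easier to spot during review.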

…truncation

Regex r"Step 3.*?Step 4" stopped at the first inline "Step 4" cross-reference
inside the Step 3 body rather than at the ### Step 4 section boundary, making
the test fragile. Replace with r"### Step 3.*?(?=\n### Step [45]|\Z)" which
anchors on Markdown section headers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
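The difference between the two patterns in this commit message can be reproduced on a small sample. The Markdown excerpt below is invented for illustration and is not the real SKILL.md text; the use of `re.DOTALL` is also an assumption, since the commit message does not state which flags the test passes.

```python
import re

# Invented excerpt: the Step 3 body mentions "Step 4" inline before the
# real "### Step 4" header appears.
SAMPLE = (
    "### Step 3: Fix loop\n"
    "Apply the fix, then continue to Step 4 once tests pass.\n"
    "If a fix was committed, the verdict is real_fix.\n"
    "### Step 4: Report\n"
    "Emit the verdict.\n"
)

# Original pattern: the lazy match ends at the first "Step 4" of any kind.
FRAGILE = re.compile(r"Step 3.*?Step 4", re.DOTALL)

# Replacement pattern: the lookahead only accepts a Markdown section header
# (or end of input) as the boundary, so inline mentions are skipped.
ANCHORED = re.compile(r"### Step 3.*?(?=\n### Step [45]|\Z)", re.DOTALL)

fragile_body = FRAGILE.search(SAMPLE).group(0)
anchored_body = ANCHORED.search(SAMPLE).group(0)

# The fragile match is cut off before the sentence a test would look for.
print("real_fix" in fragile_body)
print("real_fix" in anchored_body)
```

On this sample the fragile pattern stops inside the second line, while the anchored pattern captures the whole Step 3 section and nothing from Step 4.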
@Trecek Trecek added this pull request to the merge queue May 6, 2026
Merged via the queue into develop with commit 11a9561 May 6, 2026
2 checks passed
@Trecek Trecek deleted the resolve-failures-skill-ci-only-failure-verdict-when-fix-was/1954 branch May 6, 2026 02:51
Trecek added a commit that referenced this pull request May 8, 2026
…1969)
