
fix(workflows): make impl pipeline resilient to transient Claude failures#5410

Merged
MarkusNeusinger merged 2 commits into main from fix/workflow-retry-resilience on Apr 25, 2026

Conversation

@MarkusNeusinger
Owner

Summary

The implementation pipeline was leaving PRs and issues stuck after a single Claude Code Action hiccup. Three fixes restore self-healing behavior:

  • impl-generate.yml: cap raised from 2 → 3 generation attempts, aligning with the existing impl:{lib}:failed label description ("max retries exhausted (3 attempts)") and the repair phase's 3-attempt budget. Failure comments now read Attempt N/3.
  • impl-repair.yml: previously had no failure handler — when the Claude Code Action itself crashed, the workflow ended with ai-rejected already removed and re-review never fired, leaving the PR silently stuck. Added a Handle repair failure step that restores ai-rejected and auto-retries the same attempt once via a marker comment, then falls back to manual.
  • impl-review.yml: both failure paths (Claude crash → Handle review failure, and score=0 from a missing quality_score.txt → Validate review output) immediately surfaced ai-review-failed, requiring a manual rerun. Both now auto-retry once via repository_dispatch with a shared marker comment before giving up.

The >=50% after 3 attempts merge logic in impl-review.yml was already correct and is unchanged — these fixes only ensure PRs reach that gate instead of stalling earlier.
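The one-shot guard the three workflows share can be sketched roughly as below. This is an illustrative sketch only: the marker string and function names are assumptions, not the workflows' exact code, and the real steps read comment bodies via `gh api` rather than stdin.

```shell
# Illustrative sketch of the marker-comment retry guard (marker string and
# function names are assumptions, not the workflows' exact code).
MARKER="<!-- auto-retry-marker -->"

# Count marker comments from comment bodies fed on stdin. In the workflow
# the bodies would come from something like:
#   gh api --paginate "repos/$REPO/issues/$PR_NUM/comments?per_page=100" --jq '.[].body'
count_markers() {
  grep -cF "$MARKER" || true   # grep -c prints 0 (and exits 1) on no match
}

# Auto-retry only when no marker exists yet, i.e. this is the first failure;
# a second failure falls through to the manual path.
should_retry() {
  [ "$1" -eq 0 ]
}
```

Posting the marker comment before dispatching the retry is what caps the loop at one automatic attempt.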

Concrete trigger (not added to the PR but motivated it)

Issue #5365 (area-mountain-panorama) had 4/9 libraries hard-failed without ever creating a PR (transient Claude crashes during generate, capped at 2 attempts), 1 PR stuck with ai-review-failed (plotnine #5372), and 1 PR stuck mid-repair (altair #5370 — repair workflow itself crashed on attempt 1). Manual recovery was triggered earlier in the conversation.

Test plan

  • Trigger a generate that fails twice (e.g., simulate or wait for transient flake) — should auto-retry to attempt 3 instead of stopping at 2
  • Trigger a repair where Claude Code Action crashes — should restore ai-rejected and auto-retry the same attempt once via marker comment
  • Trigger a review where Claude crashes — should auto-retry via repository_dispatch once before adding ai-review-failed
  • Trigger a review where Claude runs but writes no quality_score.txt — same auto-retry behavior
  • Verify markers prevent infinite retry loops (each marker only allows one auto-retry)

🤖 Generated with Claude Code

fix(workflows): make impl pipeline resilient to transient Claude failures

Three resilience gaps were leaving PRs stuck after a single Claude Code
Action hiccup:

- impl-generate: only allowed 2 attempts before declaring `failed`. Bumped
  cap to 3, aligning with the existing "max retries exhausted (3 attempts)"
  label description and the repair phase's 3-attempt budget.
- impl-repair: had no failure handler — if the repair workflow itself
  crashed, `ai-rejected` was already removed and re-review never fired,
  leaving the PR silently stuck. Added handler that restores the label and
  auto-retries once via a marker comment, then falls back to manual.
- impl-review: both failure paths (Claude crash + score=0 from missing
  quality_score.txt) immediately surfaced `ai-review-failed`, requiring
  manual rerun. Both now auto-retry once via repository_dispatch with a
  shared marker comment before giving up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 25, 2026 21:47
Contributor

Copilot AI left a comment


Pull request overview

This PR hardens the implementation automation workflows against transient Claude Code Action failures so that generate/review/repair loops self-heal instead of leaving issues/PRs stuck.

Changes:

  • Increase impl-generate failure retry budget to 3 attempts and update failure messaging accordingly.
  • Add a failure handler to impl-repair to restore ai-rejected and auto-retry a crashed repair attempt once.
  • Add one-time auto-retry behavior to impl-review for both Claude crashes and missing/invalid review outputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| `.github/workflows/impl-generate.yml` | Raises generation retry cap to 3 and updates attempt numbering/messages. |
| `.github/workflows/impl-repair.yml` | Adds failure handling to reapply `ai-rejected` and auto-retry a crashed repair attempt once. |
| `.github/workflows/impl-review.yml` | Adds marker-based one-time auto-retry via `repository_dispatch` for crash and score=0 failure paths. |

Comment thread .github/workflows/impl-review.yml Outdated
Comment on lines +181 to +182
```shell
RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUM}/comments" \
  --jq "[.[] | select(.body | contains(\"$MARKER\"))] | length" 2>/dev/null || echo "0")
```

Copilot AI Apr 25, 2026


The retry guard counts marker comments via gh api .../comments but does not paginate (GitHub API defaults to 30 comments). On PRs with >30 comments, the marker may be older than the first page, causing RETRY_COUNT to incorrectly be 0 and potentially triggering repeated “first” auto-retries.

Consider adding pagination (e.g., gh api --paginate and/or per_page=100) when counting markers, or track the auto-retry via a label/check that can’t be pushed out of the first page.

Suggested change

```diff
-RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUM}/comments" \
-  --jq "[.[] | select(.body | contains(\"$MARKER\"))] | length" 2>/dev/null || echo "0")
+RETRY_COUNT=$(gh api --paginate --slurp "repos/${{ github.repository }}/issues/${PR_NUM}/comments?per_page=100" \
+  --jq "[.[][] | select(.body | contains(\"$MARKER\"))] | length" 2>/dev/null || echo "0")
```

Comment thread .github/workflows/impl-review.yml Outdated
Comment on lines +471 to +472
```shell
RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUM}/comments" \
  --jq "[.[] | select(.body | contains(\"$MARKER\"))] | length" 2>/dev/null || echo "0")
```

Copilot AI Apr 25, 2026


Same pagination issue as above: RETRY_COUNT is computed from the first page of PR comments only. If the marker comment is older than the most recent 30 comments, this step will incorrectly think no auto-retry has occurred and will re-dispatch again.

Recommendation: use a paginated comments fetch when searching for the marker, or persist retry state in a label.

Suggested change

```diff
-RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUM}/comments" \
-  --jq "[.[] | select(.body | contains(\"$MARKER\"))] | length" 2>/dev/null || echo "0")
+RETRY_COUNT=$(gh api --paginate "repos/${{ github.repository }}/issues/${PR_NUM}/comments" 2>/dev/null \
+  | jq -s --arg marker "$MARKER" '[.[][] | select(.body != null and (.body | contains($marker)))] | length' \
+  || echo "0")
```

Comment thread .github/workflows/impl-repair.yml Outdated
Comment on lines +262 to +263
```shell
RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUM}/comments" \
  --jq "[.[] | select(.body | contains(\"$MARKER\"))] | length" 2>/dev/null || echo "0")
```

Copilot AI Apr 25, 2026


RETRY_COUNT is derived from gh api .../comments without pagination (default 30). If the retry marker comment is older than the most recent page, this will miscount retries and can cause extra unintended auto-repair reruns.

Suggestion: fetch comments with pagination when searching for the marker (or store retry state in a label) to keep the retry cap reliable on long-lived PRs.

Suggested change

```diff
-RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUM}/comments" \
-  --jq "[.[] | select(.body | contains(\"$MARKER\"))] | length" 2>/dev/null || echo "0")
+RETRY_COUNT=$(
+  (
+    gh api --paginate "repos/${{ github.repository }}/issues/${PR_NUM}/comments?per_page=100" \
+      --jq ".[] | select(.body != null and (.body | contains(\"$MARKER\"))) | 1" 2>/dev/null || true
+  ) | wc -l | tr -d '[:space:]'
+)
```

```shell
gh api repos/${{ github.repository }}/dispatches \
  -f event_type=review-pr \
  -f "client_payload[pr_number]=$PR_NUM"
echo "::notice::Auto-re-triggered impl-review.yml for PR #$PR_NUM"
```

Copilot AI Apr 25, 2026


In the “first failure” path this step triggers a re-review via repository_dispatch but then exits successfully, so the workflow run can show as green even though no valid score/verdict was produced. That can be misleading and can also mask a dispatch failure (which would keep the PR stuck).

Consider exiting non-zero after dispatching the retry (without adding ai-review-failed) so the run status accurately reflects that the review did not complete, while still allowing the auto-retry to proceed.

Suggested change

```diff
 echo "::notice::Auto-re-triggered impl-review.yml for PR #$PR_NUM"
+exit 1
```

Comment on lines +499 to +502
```shell
gh api repos/${{ github.repository }}/dispatches \
  -f event_type=review-pr \
  -f "client_payload[pr_number]=$PR_NUM"
echo "::notice::Auto-re-triggered impl-review.yml for PR #$PR_NUM"
```

Copilot AI Apr 25, 2026


After dispatching the auto-retry, this step exits successfully, which means the overall impl-review workflow can report success even though the AI review actually crashed and no verdict was produced. If this workflow is used as a required status check, it could incorrectly unblock merges.

Consider failing the job/run after scheduling the retry (or otherwise emitting a failing conclusion) while still avoiding the ai-review-failed label on the first crash.
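The fail-after-dispatch shape being suggested can be sketched as below. The function name and messages are illustrative assumptions, not the workflow's actual step; the argument stands in for the exit status of the preceding `gh api .../dispatches` call.

```shell
# Illustrative sketch only: names and messages are assumptions.
# $1 = exit status of the preceding `gh api .../dispatches` call.
finish_after_dispatch() {
  if [ "$1" -ne 0 ]; then
    echo "::error::repository_dispatch failed; review retry was NOT scheduled"
  else
    echo "::notice::Auto-re-triggered impl-review.yml"
  fi
  return 1   # either way, this run produced no verdict, so fail the step
}
```

Returning non-zero keeps a required review status check red until some run actually produces a verdict, while the dispatched retry proceeds in its own, separate workflow run.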

…etry

Address Copilot review on #5410:

- Marker counts (impl-generate / impl-repair / impl-review) used the default
  `gh api .../comments` page size of 30. On long-lived issues/PRs the marker
  could fall off the first page, causing repeated "first" auto-retries.
  Switched to `--paginate --per_page=100` so the count is reliable.
- impl-review's auto-retry paths now `exit 1` after dispatching the retry,
  so the run status accurately reflects that no verdict was produced and
  any dispatch failure stays visible. The auto-retry runs in a separate
  workflow run, so this doesn't break the recovery chain.
- Also paginate the failure-marker count in impl-generate (same bug class,
  not flagged by Copilot but same fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MarkusNeusinger MarkusNeusinger merged commit 048f38f into main Apr 25, 2026
7 checks passed
@MarkusNeusinger MarkusNeusinger deleted the fix/workflow-retry-resilience branch April 25, 2026 22:00
MarkusNeusinger added a commit that referenced this pull request Apr 29, 2026
…ompts (#5520)

## Summary

Three workflows (`impl-review.yml`, `spec-create.yml`,
`report-validate.yml`) used shell-style `$VAR` inside `with: prompt: |`
blocks of `claude-code-action`. That block is a YAML string handed to a
Node/Bun action — **no shell ever runs**, so `$VAR` was sent to Claude
as a literal placeholder instead of the actual value. Result: Claude
couldn't reliably identify the PR / spec / library to review and
silently produced no `quality_score.txt`, which the validate step turns
into `ai-review-failed`.

## Symptoms observed today (2026-04-29)

5 stuck implementation PRs from 2026-04-27, all with `ai-review-failed`
despite the prior fixes branch (#5410) and the audit branch (#5515)
landing in between:

| PR | Branch | Pre-fix labels |
|----|--------|----------------|
| #5476 | seaborn/marimekko-basic | `ai-review-failed`, `quality:78` |
| #5480 | altair/marimekko-basic | `ai-review-failed`, `quality:82` |
| #5481 | letsplot/marimekko-basic | `ai-rejected`, `quality:76` |
| #5483 | plotnine/marimekko-basic | `ai-review-failed` |
| #5486 | plotly/line-basic | `ai-review-failed` |

Re-dispatching review on each confirmed the bug: the run log of `Run AI
Quality Review` shows the prompt being passed verbatim:

```
PROMPT: Read prompts/workflow-prompts/ai-quality-review.md and follow those instructions.

Variables for this run:
- LIBRARY: $LIBRARY    # ← literal, never expanded
- SPEC_ID: $SPEC_ID
- PR_NUMBER: $PR_NUMBER
- ATTEMPT: $ATTEMPT
```

Claude's review then either ran for ~20s and exited with no
`quality_score.txt` (4 PRs failed), or recovered by inferring values
from cwd (1 PR succeeded with `quality:82`). The intermittent pattern is
exactly what you'd expect from "the prompt is ambiguous and Claude has
to guess from context."

## Root cause

Commit `252977cf3` ("chore: fix critical audit findings", 2026-04-28
22:46) routed several `${{ github.event.* }}` and step-output values
through step-level `env:` and rewrote the in-prompt references as
`$VAR`. That is the correct mitigation for `run:` shell steps and Python
heredocs in the same workflows (and those changes stay in place). Inside
`with: prompt: |` it is the wrong tool: the value is consumed by a JS
action, not a shell, so there is no injection surface to mitigate and
`$VAR` does not interpolate.
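The two contexts can be illustrated side by side. This is a minimal sketch, not the repository's actual workflow; the step names are hypothetical and the action version pin is assumed.

```yaml
# A `run:` step goes through a shell, so env + $VAR is the safe pattern:
- name: Shell step (hypothetical)
  env:
    PR_NUMBER: ${{ github.event.pull_request.number }}
  run: echo "Reviewing PR #$PR_NUMBER"   # the shell expands $PR_NUMBER

# A `with: prompt:` block is consumed by the JS action -- no shell runs, so
# only ${{ }} expressions are substituted (by the runner, before the action):
- name: Claude step (hypothetical)
  uses: anthropics/claude-code-action@v1   # version pin assumed
  with:
    prompt: |
      PR_NUMBER: ${{ github.event.pull_request.number }}  # interpolated
      # A bare $PR_NUMBER here would reach Claude as a literal string
```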

`spec-create.yml` and `report-validate.yml` carry the identical
anti-pattern in their `prompt:` blocks. They haven't surfaced as
failures yet only because no triggering issue has come in since
2026-04-28.

## The fix

Revert **only** the descriptive header lines of each `prompt:` block
back to GitHub Actions Expression syntax (`${{ ... }}`), which the
runner substitutes into the YAML string before the action receives it.
Keep:

- All `env:` blocks (harmless; lets future prompt content reference env
vars if useful)
- All `$VAR` references inside **embedded bash code samples** in the
prompt (e.g. `gh issue edit $ISSUE_NUMBER`). Those are executed by
Claude's Bash tool which inherits the step `env:` and expands them
correctly — and rewriting them would re-enable the injection vector the
audit was right to close.

```diff
             Variables for this run:
-            - LIBRARY: $LIBRARY
-            - SPEC_ID: $SPEC_ID
-            - PR_NUMBER: $PR_NUMBER
-            - ATTEMPT: $ATTEMPT
+            - LIBRARY: ${{ steps.pr.outputs.library }}
+            - SPEC_ID: ${{ steps.pr.outputs.specification_id }}
+            - PR_NUMBER: ${{ steps.pr.outputs.pr_number }}
+            - ATTEMPT: ${{ steps.attempts.outputs.display }}
```

(analogous 8-line revert in `spec-create.yml` × 2 prompt blocks and
4-line revert in `report-validate.yml`).

Diff total: **3 files, ±16 lines**.

## Test plan

- [ ] After merge, redispatch `impl-review.yml` for the 4 stuck PRs (`gh
workflow run impl-review.yml -f pr_number=<N>` for 5476, 5483, 5486;
5480 already got a 82 in the redispatch and should now stabilize)
- [ ] Verify each run's `Run AI Quality Review` step log shows real
values (e.g. `- LIBRARY: plotly`) in the PROMPT echo, not `$LIBRARY`
- [ ] Verify `quality_score.txt` is produced and `ai-review-failed`
label is removed
- [ ] On next `spec-request`-labeled issue, verify the spec-create
prompt sees the issue title/body
- [ ] On next `report-pending`-labeled issue, verify the report-validate
prompt sees the issue title/body

🤖 Generated with [Claude Code](https://claude.com/claude-code)