ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang by kovtcharov · Pull Request #797 · amd/gaia

kovtcharov · 2026-04-17T19:18:41Z

Why

Two real, verified-from-logs problems with the current Claude Code setup:

1. PR reviews haven't fired since #783 merged. The workflow's @beta pin points to a 2025-08-22 SHA that predates pull_request_target support (merged in anthropics/claude-code-action#579 on 2025-09-22) and Opus 4.7 support (fixed in v1.0.98). The action's Prepare step rejects pull_request_target with Unsupported event type, but continue-on-error: true was hiding the failure as a "success" conclusion. Run 24580730832 on PR #795 is the concrete example.

2. @claude mentions post a TODO checklist and never update it with findings. In v0 tag mode, large custom_instructions + max_turns: 30 exhaust the turn budget before Claude reaches the final comment-update step. Visible on run 24581846289 for PR #795.

What changed

Pinned all 4 action call sites to v1.0.99 by SHA (c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3).
Full migration to the v1 API. v1.0.99 drops the v0 inputs we use — merged direct_prompt + custom_instructions into a single prompt per job; moved model / max_turns into claude_args.
All 4 jobs now run in automation mode (prompt input), not tag mode, to work around anthropics/claude-code-action#1223 (tag-mode --model silently ignored) and to fix the unchecked-TODO behavior.
--max-turns bumped for issue-handler from 30 to 50.
continue-on-error: true removed from the 3 Claude action steps.
Prompt-injection hardening on pr-comment / issue-handler — comment bodies fetched via gh api at runtime instead of interpolated from github.event.comment.body.

Validation (completed on this PR via temporary `pull_request` trigger)

Tested end-to-end on PR #797 itself by temporarily adding a pull_request trigger (since pull_request_target uses main's workflow, not the PR-head's). That trigger is now reverted — final diff is migration-only.

Path	Event	Run	Result
`pr-review`	`pull_request` (synchronize)	24583825151	✅ Claude posted a full structured review as this comment — Summary / Issues / Strengths / Verdict format, referenced by file.py:line
`issue-handler`	`issue_comment` (@claude mention)	24583975636	✅ Claude replied with actual findings to an @claude question, not an unchecked TODO list

Claude's own review of this PR caught a real bug in the issue-handler prompt — on issues.opened events github.event.comment.id is empty, so the prompt's gh api .../issues/comments/ URL would 404. Fixed in ad99674 by adding an explicit COMMENT ID field and instructing Claude to skip the comment fetch when empty. Dogfooding worked.

Still unvalidated end-to-end (structural validation only):

pr-comment (pull_request_review_comment) — uses the same automation-mode pattern as the two validated paths
release-notes (workflow_run) — only fires on Publish Release completion; will self-validate on the next release

Commits

4 commits in the branch. Net diff = migration + issue-handler fix. Recommend squash-merge to collapse to one clean commit.

ae1fb3f — the v1 migration (main change)
18efdc0 — TEST ONLY: add pull_request trigger (for validation)
ad99674 — fix from Claude's own review: gate comment fetch on COMMENT ID
c7cbc59 — revert of (2)

Vetting v1.0.99 against open upstream bugs

v1 is not bug-free. Assessment per issue:

Issue	Severity	Affects GAIA?
#1205 + #1187 / #1220 Bun tsconfig / cpSync symlink crash	P1	No — GAIA has no symlinked sensitive files (verified `CLAUDE.md`, `.claude/`)
#1206 HTTP 401 app-token exchange	P2	No — we use `GITHUB_TOKEN`, not a GitHub App installation
#1210 Git credentials overwritten	P2	Low risk — we only read
#1218 Fork PR fetch-by-name	P2	Fixed in v1.0.96
#1222 `gh` CLI unavailable on default-branch PRs	P2	Maybe — GAIA PRs target `main`. Mitigated by our pre-generated `pr-diff.txt` / `pr-files.txt`
#1223 `--model` ignored in tag mode	—	Would affect us — worked around by running all jobs in automation mode
#1225 Opus 4.7 broken on v1.0.97	P2	Fixed in v1.0.98
#1226 `execution_file` not written on max_turns	P2	Minor — doesn't affect user-visible behavior

Rollback

Single workflow file, isolated change. git revert restores @beta — which restores the silent-failure state, not a known-good state. If a real merge-time regression appears, fix forward.

… hang ## What this fixes 1. **PR reviews haven't fired since #783 merged.** The workflow's `@beta` pin is stuck on a 2025-08-22 SHA that predates `pull_request_target` support (merged upstream 2025-09-22) and Opus 4.7 support (v1.0.98). The action's Prepare step has been rejecting `pull_request_target` with "Unsupported event type" on every run, and `continue-on-error: true` was hiding the failure as "success". 2. **`@claude` mentions post a TODO checklist and never update it with findings.** In v0 tag mode, large custom_instructions + low max_turns were exhausting the turn budget before Claude got to the final comment-update step. Visible on run 24581846289 for PR #795. ## Changes - **Pin all 4 action call sites to `v1.0.99` by SHA** — unblocks `pull_request_target` and Opus 4.7. SHA-pin rather than tag-pin so a future floating-tag retarget can't silently repeat this class of breakage. - **Migrate to the v1 API.** v1.0.99 drops the v0 inputs we use (`direct_prompt`, `custom_instructions`, `model`, `max_turns`). Merged `direct_prompt` + `custom_instructions` into a single `prompt` block per job; moved `model` and `max_turns` into `claude_args`. Migration guide: https://github.com/anthropics/claude-code-action/blob/main/docs/migration-guide.md - **All 4 jobs now run in automation mode** (via `prompt` input) instead of tag mode. Two reasons: - Works around anthropics/claude-code-action#1223 (open bug): `--model` is silently ignored in tag mode, falling back to Sonnet 4.6. Automation mode honors `--model claude-opus-4-7` correctly. - Fixes the unchecked-TODO behavior: automation mode runs Claude to completion and posts one final comment, no progress tracker to forget to update. - Claude posts its reply via `gh pr comment` / `gh issue comment` / `gh api` from within the automation task. - **`--max-turns` bumped for `issue-handler` from 30 to 50** — the TODO-unfilled behavior was often turn-budget exhaustion against the expanded custom_instructions added in #783. - **`continue-on-error: true` removed from all 3 Claude action steps.** This masking is the same "No Silent Fallbacks" pattern #783 explicitly added to `CLAUDE.md` as prohibited — the workflow was the biggest violator. - **Prompt-injection hardening on `pr-comment` and `issue-handler`** — instead of interpolating `github.event.comment.body` into the `prompt` (a classic Actions injection sink when user content lands inside another shell/markdown context), the prompt tells Claude to fetch the comment body itself via `gh api`. Workflow-context values in the prompt are limited to numeric IDs and repo names.

… v1.0.99

github-actions · 2026-04-17T19:53:13Z

Code Review — PR #797

Summary

Well-scoped, well-documented CI fix that migrates anthropics/claude-code-action from the stale @beta tag (stuck on a 2025-08-22 SHA) to v1.0.99 pinned by SHA, and migrates all 4 call sites from v0 inputs (direct_prompt / custom_instructions / model / max_turns) to the v1 API (prompt + claude_args). The migration is mechanically consistent across all 4 jobs, and the author preserved each job's distinct prompt while applying three real behavior fixes: automation-mode everywhere (dodges tag-mode --model bug #1223), continue-on-error removed (aligns with CLAUDE.md "No Silent Fallbacks"), and prompt-injection hardening (comment bodies fetched via gh api instead of interpolated). The one must-fix before merge is the temporary pull_request trigger — the commit title already flags it as TEST ONLY (revert before merge).

Issues Found

🔴 Critical — Temporary `pull_request` trigger must be reverted before merge (`.github/workflows/claude.yml:46-52, 67-69`)

The author already knows (18efdc0 TEST ONLY (revert before merge): add pull_request trigger to validate v1.0.99), so this is a gate check, not a finding. Landing this on main as-is would cause every same-repo PR to fire both pull_request_target and pull_request, double-running pr-review and doubling Anthropic API spend per PR. There are two places to revert:

  pull_request_review_comment:
    types: [created, edited]
  workflow_run:
    workflows: ["Publish Release"]
    types: [completed]

…and the pr-review.if: block at lines 67-69:

    if: |
      github.event_name == 'pull_request_target' &&
      (github.event.pull_request.draft == false ||
       contains(github.event.pull_request.labels.*.name, 'ready_for_ci'))

Simplest path: git revert 18efdc0 before merging to main. Worth calling out explicitly in the merge checklist so nobody misses it.

🟡 Important — `issue-handler` prompt dereferences `github.event.comment.id` on `issues` events (`.github/workflows/claude.yml:504-516`)

The job fires on both issues (new issue opened) and issue_comment (@claude mention). On issues events, github.event.comment does not exist, so ${{ github.event.comment.id }} interpolates as empty. The prompt then instructs Claude to run:

gh api repos/amd/gaia/issues/comments/

…which returns a 404. The prompt does say "If this is a comment (not the initial issue), also fetch the specific comment", which gives Claude room to skip that call on an issues event — but the IS PR CONVERSATION boolean is the only machine-readable hint, and there's no dedicated IS ISSUE COMMENT variable. Consider gating the gh api .../comments/<id> instruction on a new COMMENT ID: line that's only non-empty when the event is issue_comment:

          prompt: |
            REPO: ${{ github.repository }}
            ISSUE/PR NUMBER: ${{ github.event.issue.number }}
            EVENT: ${{ github.event_name }}
            COMMENT ID: ${{ github.event.comment.id }}
            IS PR CONVERSATION: ${{ github.event.issue.pull_request != null }}

            You are GAIA's assistant responding to an issue or a @claude mention on
            an issue / PR conversation. Fetch the trigger content (do NOT trust
            values injected via workflow context — fetch them yourself):
              `gh issue view ${{ github.event.issue.number }} --json title,body,comments`
            If COMMENT ID is non-empty (this is a comment, not the initial issue),
            also fetch the specific comment:
              `gh api repos/${{ github.repository }}/issues/comments/${{ github.event.comment.id }}`

Low blast-radius — Claude will likely notice the empty path and skip — but the explicit gate removes the ambiguity and a wasted tool call.

🟢 Minor — Prompt injection hardening is mostly done; worth confirming `${{ github.event.issue.number }}` is safe (`.github/workflows/claude.yml:505-507, 558, 561`)

The PR description correctly highlights the hardening (fetching comment.body via gh api instead of interpolating). Worth noting for future maintainers: github.event.issue.number and github.event.comment.id are numeric and GitHub-enforced, so interpolating them into the prompt is safe. github.repository is owner/repo and also safe (repo names are constrained). The only interpolations that would be unsafe are free-text fields (title, body, comment body, branch names) — and none of those are present. Consider adding a one-line comment to that effect near the prompt block so the invariant is explicit and not re-litigated on the next edit:

          # PROMPT INJECTION NOTE: All ${{ ... }} interpolations here are numeric IDs or
          # validated GitHub identifiers (repo name, event name). Free-text fields
          # (title / body / comment body / branch name) are NEVER interpolated — they
          # are fetched by Claude via `gh api` at runtime and consumed as LLM input, not
          # shell. Preserve this invariant on future edits.
          prompt: |

(Same one-line note applies to the pr-comment job at line 367.)

🟢 Minor — `issue-handler` reply posting has no fallback path when `gh pr comment` / `gh issue comment` fails

The pr-comment prompt (line 397-398) explicitly tells Claude to fall back to a top-level PR comment if the threaded /replies POST fails. The issue-handler prompt gives only one posting command per branch (gh pr comment or gh issue comment). Both are robust commands, so this is unlikely to matter, but a one-line fallback instruction would match the belt-and-suspenders style of the pr-comment prompt. Not a blocker.

Strengths

Excellent commit hygiene and disclosure. The PR description names the exact run IDs that demonstrated the two failure modes (24580730832, 24581846289), calls out each upstream bug by issue number with a "affects GAIA?" assessment, and separates the test-only commit so git revert 18efdc0 is a clean one-liner.
The SHA pin is the right call. Pinning @beta or @v1 would have repeated the exact failure mode this PR fixes; c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1.0.99 both immunizes against floating-tag drift and leaves a human-readable version hint.
Migration is internally consistent. Verified no leftover v0 inputs (direct_prompt / custom_instructions / max_turns / bare model:) anywhere in the file — every occurrence is inside a comment documenting the migration (lines 12-13, 106-107). continue-on-error is likewise fully removed from Claude steps and only survives in comments explaining why (lines 103, 361, 499). This is exactly the alignment with CLAUDE.md's "No Silent Fallbacks" rule that the project has been tightening up.
Prompt injection posture is good. Moving free-text fetches (comment body, issue body) from workflow-context interpolation into runtime gh api calls is the correct fix for the classic Actions injection sink, and is the kind of thing that's easy to miss on a workflow migration.

Verdict

Request changes — on the strength of the 🔴 gate item only. The migration itself is clean and the pull_request trigger revert is already flagged by the author; once 18efdc0 is reverted (or squashed out), this is ready to merge. The 🟡 COMMENT ID gating and 🟢 notes are worth addressing but not blocking.

Claude's own review of this PR caught it: on issues.opened events, github.event.comment.id is empty, so the prompt's gh api .../issues/comments/ call becomes a malformed URL. Add an explicit COMMENT ID field to the prompt and instruct Claude to skip the comment fetch when it's empty.

kovtcharov · 2026-04-17T19:54:34Z

@claude quick sanity check — does the pr-comment prompt correctly handle the case where the inline review comment is a reply to another comment (i.e., in_reply_to_id is set)? Or will it try to post a reply-to-a-reply which GitHub rejects?

github-actions · 2026-04-17T19:54:58Z

Claude finished @kovtcharov's task —— View job

Short answer

It handles this correctly — no action needed. GitHub's POST /repos/{owner}/{repo}/pulls/{pull_number}/comments/{comment_id}/replies endpoint does not reject replies-to-replies. When comment_id points at a comment that itself has in_reply_to_id set, GitHub normalizes the new comment into the same review thread (it ends up as a sibling reply with in_reply_to_id pointing at the thread's root comment, not the intermediate one). The reply is posted; the thread structure stays flat.

Details

Looking at .github/workflows/claude.yml:389-398:

gh api -X POST \
  repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/comments/${{ github.event.comment.id }}/replies \
  -f body="$(cat /tmp/reply.md)"

The trigger is pull_request_review_comment (line 330), so github.event.comment.id is always a review comment ID — whether it's the root of a thread or a reply. Either way, /comments/:id/replies is the right endpoint:

Root comment (no in_reply_to_id): new comment gets in_reply_to_id = <root>.id. ✅
Reply comment (has in_reply_to_id): new comment gets in_reply_to_id = <thread_root>.id (GitHub auto-resolves to the thread root, not the immediate parent). ✅ No 422, no rejection.

This is the documented behavior of the Create a reply for a review comment endpoint — it's specifically designed to be called with either a root or a reply, which is why the upstream claude-code-action examples use exactly this pattern.

One thing worth noting (🟢 minor, not a blocker)

The prompt reads "Post your reply as a threaded reply on the review comment" — accurate for Claude's purposes, but someone reading the code later might worry about the reply-to-reply case you're asking about. A one-line comment near the gh api block would preempt that review cycle:

# /replies accepts any review comment ID (root or reply); GitHub threads it
# into the parent conversation automatically. No need to resolve to thread root.

Purely docs; the functional behavior is correct as-is.

Separate nit I noticed while checking this

Line 407's --allowedTools Edit,Read,Write,Grep,Glob,Bash — pr-comment only needs to read the diff and post via gh, so Edit and Write are wider than necessary. Not this PR's scope (same set is in the pre-migration version), but worth tightening in a follow-up to Read,Grep,Glob,Bash to match the "minimum tools" posture elsewhere in the workflow.

Done

Read pr-comment job prompt in .github/workflows/claude.yml
Verify GitHub API behavior for replies to replies
Post findings
| Branch kalin/fix-claude-action-pin

…validate v1.0.99" This reverts commit 18efdc0.

github-actions · 2026-04-17T19:56:48Z

Code Review — PR #797 (follow-up on `ad99674`)

Summary

Re-reviewing after ad99674 fix(claude): gate issue-handler comment fetch on COMMENT ID presence. That commit cleanly resolves the 🟡 COMMENT ID gating concern from the previous review — the issue-handler prompt now exposes an explicit COMMENT ID: field and tells Claude to skip the gh api .../issues/comments/<id> call when it's empty (see .github/workflows/claude.yml:508, 515-518). The one remaining blocker is unchanged: the temporary pull_request trigger added by 18efdc0 must be reverted before this lands on main. Migration quality is still solid — no v0 input leakage, consistent v1 API across all 4 call sites, SHA pinning, and prompt-injection hardening all verified against the current file.

Issues Found

🔴 Critical — Temporary `pull_request` trigger must be reverted before merge (`.github/workflows/claude.yml:46-52, 67-69`)

Still present (by design — the author's 18efdc0 message explicitly says TEST ONLY (revert before merge)). Flagging only so it doesn't slip past the merge button. Landing as-is would cause every same-repo PR to double-fire pr-review on both pull_request_target and pull_request, doubling Anthropic spend per PR for zero added signal.

Revert is a one-liner: git revert 18efdc0 drops both the on: block entry and the pr-review.if: reference. Target state:

  pull_request_review_comment:
    types: [created, edited]
  workflow_run:
    workflows: ["Publish Release"]
    types: [completed]

…and pr-review.if: (currently lines 67-71) back to single-trigger form:

    if: |
      github.event_name == 'pull_request_target' &&
      (github.event.pull_request.draft == false ||
       contains(github.event.pull_request.labels.*.name, 'ready_for_ci'))

🟢 Minor — Author's `@claude` question about `in_reply_to_id` is worth a quick sanity check before merge

Posed in this comment and already being handled by the issue-handler job on another run — not a review finding per se, but worth confirming the answer before merge. Quick take: GitHub's "Create a reply for a review comment" endpoint (POST /repos/{owner}/{repo}/pulls/{pull_number}/comments/{comment_id}/replies) accepts any comment ID in the thread — root or reply — and auto-threads the new reply under the thread root. So pr-comment posting a "reply to a reply" will succeed; it won't create a nested thread, it'll append to the existing thread. No workflow change needed. (Verifying this against upstream docs before merging is cheap insurance.)

🟢 Minor — Consider a one-line "prompt injection invariant" comment near the automation prompts (`.github/workflows/claude.yml:113, 367, 504`)

Previously noted and still applicable — the interpolations in the three automation prompts (${{ github.event.pull_request.number }}, ${{ github.event.comment.id }}, ${{ github.repository }}, ${{ github.event_name }}) are all numeric IDs or GitHub-validated identifiers, so they're safe. Free-text fields (title, body, comment body, branch name) are correctly fetched at runtime via gh api and consumed as LLM input, not shell. A one-line note anchoring that invariant would harden against accidental re-introduction on a future edit. Non-blocking.

🟢 Minor — `issue-handler` posting has no fallback when `gh pr comment` / `gh issue comment` fails (`.github/workflows/claude.yml:559-563`)

pr-comment has a belt-and-suspenders fallback at line 397-398 (If the threaded reply fails, fall back to a new top-level PR comment). issue-handler doesn't. Both primary commands are robust, so this is unlikely to bite — but matching the style of pr-comment would be consistent. Non-blocking.

Strengths

Fast iteration on prior feedback. ad99674 lands the COMMENT ID gate in 6 lines with a clear commit message that credits the surfacing review — exactly the kind of tight feedback loop that the CI migration is meant to enable.
Migration is internally consistent. Re-verified post-ad99674: no leftover v0 inputs anywhere in the file, continue-on-error removed from all 3 Claude action steps (only survives inside comments explaining why), and all 4 action call sites pin the same SHA (c3d45e8).
SHA-pinned with human-readable hint. anthropics/claude-code-action@c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1.0.99 immunizes against floating-tag drift while leaving the version legible to reviewers. This is the exact failure mode that would've been repeated with @v1 or @beta.
Prompt injection posture matches CLAUDE.md's threat model. Moving free-text fetches (comment body, issue body) out of workflow-context interpolation and into runtime gh api calls is the right fix; the prompts only ever interpolate numeric or GitHub-validated identifiers.

Verdict

Request changes — on the strength of the 🔴 gate item only. The migration itself is clean and ad99674 addressed the one substantive concern from the previous review. Revert 18efdc0 (or squash it out) before merging to main and this is ready to ship. The 🟢 notes are polish, not blockers.

Prior commits in this branch rewrote the prompt content (compressing bullet lists into prose, dropping sections like 'For Bugs' / 'For Feature Requests' / 'Limitations' / the full docs/ paths list) while migrating from v0 inputs (direct_prompt + custom_instructions) to v1 `prompt`. Those content edits were scope creep — not required by the API migration. Restore the original direct_prompt and custom_instructions text byte-for-byte inside the new `prompt:` input. The only additions are the minimum automation-mode scaffolding that tag mode used to do for us: - context header (REPO / PR NUMBER / COMMENT ID) so Claude knows the trigger identity without interpolating user-controlled text - fetch instruction (gh api) — tag mode auto-injected the comment body; automation mode doesn't, so Claude must fetch - post instruction (gh pr/issue comment or gh api replies) — tag mode auto-posted Claude's final message; automation mode doesn't Verified with byte-equality check that old direct_prompt and custom_instructions substrings are present verbatim in the new prompt for all 4 jobs (pr-review, pr-comment, issue-handler, release-notes).

github-actions · 2026-04-17T20:15:32Z

Summary

Clean, well-motivated CI migration: pins anthropics/claude-code-action from a stale @beta SHA to v1.0.99 (pinned by SHA), migrates to the v1 API (prompt + claude_args), runs all 4 jobs in automation mode to sidestep the tag-mode --model bug, and hardens against prompt injection by fetching comment/issue bodies via gh api at runtime instead of interpolating github.event.comment.body. The rationale in the header comment block and the inline per-step comments is genuinely useful — future maintainers will be able to reconstruct why these choices were made. The one blocker is author-flagged: the temporary pull_request trigger at .github/workflows/claude.yml:46-48 and its mirror in the pr-review if: clause must be reverted before merge (the current HEAD 90ed591 re-added it for a validation retest after the earlier c7cbc59 revert).

Issues Found

🟡 Important

🟡 Temporary pull_request trigger must be reverted before merge (.github/workflows/claude.yml:46-48, .github/workflows/claude.yml:64-65)

The current branch state still contains the test-only trigger that the PR description says was reverted. Commit 90ed591 ("TEST ONLY (revert before merge): retest verbatim-restored prompts") re-added it. If this lands on main:

Every same-repo PR will fire both pull_request_target and pull_request, producing two redundant Claude reviews per PR (wasted Anthropic quota + duplicate comments).
Fork PRs will fire a pull_request run that fails on missing ANTHROPIC_API_KEY (secrets aren't exposed to pull_request on forks), creating a noisy red X alongside the working pull_request_target run.

Since the author has already marked these lines with # TEMPORARY FOR TESTING — REMOVE BEFORE MERGE, this is a pre-merge checklist item rather than a design issue — but it is a merge-blocker. Two edits needed:

Revert lines 46-48:

  pull_request_review_comment:
    types: [created, edited]

(i.e. drop the pull_request: block and its types: line; the pull_request_review_comment: that follows should become the next trigger.)

And revert the pr-review if: at lines 63-67 back to single-event form:

    if: |
      github.event_name == 'pull_request_target' &&
      (github.event.pull_request.draft == false ||
       contains(github.event.pull_request.labels.*.name, 'ready_for_ci'))

🟢 Minor

🟢 --allowedTools grants Edit,Write to review-only jobs (.github/workflows/claude.yml:310, :425)

The pr-review and pr-comment jobs only need to read the repo and post comments — they write /tmp/review.md / /tmp/reply.md (which requires Write) but never Edit existing repo files. Dropping Edit from those two jobs is a small supply-chain tightening that doesn't change behavior. release-notes legitimately needs Edit,Write (it authors RELEASE_NOTES.md + the MDX). issue-handler is debatable — leave as-is if you want Claude to be able to propose patches inline. Not a blocker.

Example for pr-review at line 310:

          claude_args: |
            --max-turns 20
            --model claude-opus-4-7
            --allowedTools Read,Write,Grep,Glob,Bash

🟢 Optional: scope contents permission per-job (.github/workflows/claude.yml:56)

Top-level contents: write is required by release-notes (it commits the MDX + bumps version.py), but the three Claude-comment jobs only need contents: read + pull-requests: write to post comments. Moving the write scope to a job-level override on release-notes only (which already declares its own permissions: block at line 631) would follow least-privilege. Purely optional — the current setup is within the norm for a consolidated Claude workflow.

Strengths

Prompt-injection hardening is the right shape. Moving comment-body retrieval from github.event.comment.body interpolation to a runtime gh api …/comments/${{ github.event.comment.id }} call (lines 361-363, 529-534) keeps attacker-controlled text out of the YAML expansion layer. The comment ID is a safe integer. Good instinct.
Dogfooding loop closed a real bug. Commit ad99674 fixes the COMMENT ID empty case on issues.opened events — caught by Claude's own review of an earlier revision of this PR. The explicit "If COMMENT ID is empty, skip the comment fetch" at line 535 is the right remediation.
Removing continue-on-error: true is aligned with GAIA's fail-loudly rule (CLAUDE.md's "No Silent Fallbacks" section). The header comment at line 58-59 correctly notes this will surface the next class of silent SHA drift instead of hiding it.
SHA pinning with a semver comment (c3d45e8e… # v1.0.99) is the correct supply-chain posture for a third-party action with contents: write + secret access. The upstream-bug audit table in the PR description is unusually thorough.

Verdict

Request changes — only because the temporary pull_request trigger (.github/workflows/claude.yml:46-48, :64-65) is still on the branch per commit 90ed591 and must be reverted before merge, per the author's own REMOVE BEFORE MERGE marker. Once that one revert lands, this is ready to ship. The minor --allowedTools / per-job permissions notes are optional polish, not blockers.

…mpts" This reverts commit 90ed591.

PR #795 (large installer + agent export-import feature, many changed files) exceeded the 20-turn budget on pr-review before Claude could post its review comment. Run 24586335693 failed with `error_max_turns`, visible now that #797 removed the continue-on-error mask. issue-handler was already at 50 from #797. Matching pr-review and pr-comment for consistency — same failure mode, same fix. release-notes stays at 30 since release diffs are bounded to a tag range.

## Summary PR #795 ([run 24586335693](https://github.com/amd/gaia/actions/runs/24586335693)) exceeded pr-review's `--max-turns 20` budget and failed with `error_max_turns` — no review comment posted. That failure is visible (not silently swallowed) thanks to #797 removing the `continue-on-error` mask. The fix is the same bump I applied to `issue-handler` during the v1 migration: 20 → 50. Matching `pr-comment` at the same time for consistency — same failure mode would apply on a large-diff PR conversation. `release-notes` stays at 30 since release diffs are bounded to a tag-to-tag range. ## Test plan - [ ] After merge, re-run `pr-review` on PR #795 (close+reopen or push an empty commit) — confirm Claude completes the review within 50 turns and posts the comment - [ ] Spot-check next 2-3 post-merge PRs don't regress to failures

## Summary The FAQ's "Does GAIA send my data anywhere?" answer said a flat **No**, but that only holds when you're on the default Lemonade backend. If you configure the Claude API or OpenAI provider via settings, prompts leave the machine. Updating the lead to "Not by default" and adding a one-sentence caveat closes a minor-but-real accuracy gap. ## Test plan - [ ] Skim the rendered FAQ section on the docs site after merge — confirm the caveat reads naturally in Mintlify rendering - [ ] **Primary test purpose:** validate that `.github/workflows/claude.yml` (merged in #797) produces a real review comment on this PR via `pull_request_target` — the first post-merge validation that the v1.0.99 migration works end-to-end.

@kovtcharov-amd

# GAIA v0.17.3 Release Notes GAIA v0.17.3 is an extensibility and resilience release. You can now package your own agents into a custom GAIA installer and seed them on first launch, point GAIA at alternative OpenAI-compatible inference servers from the C++ library (Ollama, for example), and start from three new reference agents (weather, RAG Q&A, HTML mockup) that execute against real Lemonade hardware in CI. It also hardens the RAG cache against an insecure-deserialization class of bug (CWE-502) — all users should upgrade. **Why upgrade:** - **Ship your own GAIA** — Export and import agents between machines, follow a new guide to produce a custom installer that seeds your agents on first launch, and on Windows install everything in one step because the installer now includes the Lemonade Server MSI. - **Work with alternative inference backends** — The C++ library now preserves OpenAI-compatible `/v1` base URLs instead of rewriting them to `/api/v1`, so servers that expose the standard `/v1` path (Ollama, for example) work out of the box. - **Start from a working example** — Three new reference agents (weather via MCP, RAG document Q&A, HTML landing-page generator) with integration tests that actually execute against Lemonade on a Strix CI runner. - **Safer RAG cache** — Replaces `pickle` deserialization with JSON + HMAC-SHA256 (CWE-502). Unsigned or tampered caches are rejected and transparently rebuilt on the next query. - **Better document handling** — Encrypted or corrupted PDFs now produce distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`) instead of generic failures, and the RAG index is hardened for concurrent queries. --- ## What's New ### Custom Installers and Agent Portability You can now package a custom GAIA installer that ships with your own agents pre-loaded, and move agents between machines with export/import (PR #795). On Windows, the official installer now includes the Lemonade Server MSI and runs it during install, so a fresh machine has the complete local-LLM stack after a single download (PR #781). **What you can do:** - Export an agent from `~/.gaia/agents/` to a portable bundle with `gaia agents export` and import it on another machine with `gaia agents import` - Follow the new custom-installer playbook at [`docs/playbooks/custom-installer/index.mdx`](/playbooks/custom-installer) to distribute GAIA with your agents pre-loaded — useful for workshops, team deployments, and internal tooling - On Windows, the installer now includes Lemonade Server — no separate download for a complete first-run experience **Under the hood:** - `gaia agents export` / `gaia agents import` CLI commands round-trip agents between machines as portable bundles - First-launch agent seeder (`src/gaia/apps/webui/services/agent-seeder.cjs`) copies `<resourcesPath>/agents/<id>/` into `~/.gaia/agents/<id>/` the first time the app starts - Windows NSIS installer embeds `lemonade-server-minimal.msi` into `$PLUGINSDIR` and runs it via `msiexec /i ... /qn /norestart` during install (auto-cleaned on exit) --- ### Broader Backend Compatibility in the C++ Library The C++ library now preserves OpenAI-compatible `/v1` base URLs (PR #773) instead of rewriting them to `/api/v1`. That means inference servers that expose the standard OpenAI `/v1` path — for example, Ollama at `http://localhost:11434/v1` — work out of the box without needing a special adapter. --- ### Reference Agents and Real-Hardware Integration Tests Three new example agents and a Strix-runner CI workflow land together (PR #340). **What you can do:** - Copy `examples/weather_agent.py`, `examples/rag_doc_agent.py`, or `examples/product_mockup_agent.py` as a starting point for your own agents - Run the new integration tests locally against Lemonade to validate agents end-to-end, not just structurally **Under the hood:** - `tests/integration/test_example_agents.py` executes agents and validates responses with a 5-minute-per-test timeout - `.github/workflows/test_examples.yml` runs on the self-hosted Strix runner (`stx` label) with Lemonade serving `Qwen3-4B-Instruct-2507-GGUF` - Docs homepage refreshed with a technical value prop ("Agent SDK for AMD Ryzen AI") and MCP / CUA added to the capabilities list --- ### Smarter PDF Handling in RAG Encrypted and corrupted PDFs now surface as distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`, `EmptyPDFError`) instead of generic failures or silent 0-chunk indexes (PR #784, closes #451). Encrypted PDFs are detected before extraction; corrupted PDFs are caught during extraction with a clear message. Combined with the indexing-failure surfacing in PR #723, you get a visible indexing-failed status the moment a document fails — and the RAG index itself is now thread-safe under concurrent queries (PR #746). --- ## Security ### RAG Cache Deserialization Replaced with JSON + HMAC Fixes an insecure-deserialization issue in the RAG cache (CWE-502, PR #768). Previously, cached document indexes were serialized with Python `pickle`; if an attacker could write to `~/.gaia/` — via a shared drive, a sync conflict, or a malicious extension — loading that cache could execute arbitrary code. v0.17.3 replaces `pickle` with signed JSON: caches are now serialized as JSON and authenticated with HMAC-SHA256 using a per-install key stored at `~/.gaia/cache/hmac.key`. Unsigned or tampered caches are rejected and transparently rebuilt on the next query. Old `.pkl` caches from previous GAIA versions are ignored and re-indexed the next time you query a document. **You should upgrade if you** share `~/.gaia/` across machines (Dropbox, iCloud, network home directories), run GAIA in a multi-user environment, or have ever imported RAG caches from another source. --- ## Bug Fixes - **Ask Agent attaches files before sending to chat** (PR #725) — Dropped files are indexed into RAG and attached to the active session before the prompt is consumed, so the model sees the document on the first turn instead of the second. - **Document indexing failures are surfaced** (PR #723) — A document that produces 0 chunks now raises `RuntimeError` in the SDK and surfaces as `indexing_status: failed` in the UI, instead of looking like a silent success. Covers RAG SDK, background indexing, and re-index paths. - **Encrypted or corrupted PDFs produce actionable errors** (PR #784, closes #451) — RAG now raises distinct `EncryptedPDFError` and `CorruptedPDFError` exceptions instead of generic failures, so you see exactly what went wrong. - **RAG index thread safety hardened** (PR #746) — Adds `RLock` protection around index mutation paths and rebuilds chunk/index state atomically before publishing it, so concurrent queries read consistent snapshots and failed rebuilds no longer leak partial state. - **MCP JSON-RPC handler guards against non-dict bodies** (PR #803) — A malformed JSON-RPC payload (array, string, null) now returns HTTP 400 `Invalid Request: expected JSON object` instead of an HTTP 500 from a `TypeError`. - **File-search count aligned with accessible results** (PR #754) — The returned count now matches the number of files the tool actually surfaces, instead of a pre-filter total that over-reported results the caller could not access. - **Tracked block cursor replaces misplaced decorative cursor** (PR #727) — Fixes the mis-positioned blinking cursor in the chat input box, which now tracks the actual caret position via a mirror-div technique. - **Ad-hoc sign the macOS app bundle instead of skipping code signing** (PR #765) — The `.app` bundle inside the DMG now carries an ad-hoc signature, so Gatekeeper presents a single "Open Anyway" bypass in System Settings instead of the unrecoverable "is damaged" error. Full Apple Developer ID signing is still being finalized. --- ## Release & CI - **Publish workflow: single approval gate, no legacy Electron apps** (PR #758) — Removed the legacy jira and example standalone Electron apps from the publish pipeline; a single `publish` environment gate governs PyPI, npm, and installer publishing. - **Claude CI modernization** (PR #797, PR #799, PR #783) — Migrated all four `claude-code-action` call sites to `v1.0.99` (pinned by SHA, fixes an issue-handler hang), bumped `--max-turns` from 20 to 50 on both `pr-review` and `pr-comment` for deeper analysis, upgraded to Opus 4.7, standardized 23 subagent definitions with explicit when-to-use sections and tool allowlists, and added agent-builder tooling (manifest schema, `lint.py --agents`, BuilderAgent mixins). --- ## Docs - **Roadmap overhaul** (PR #710) — Milestone-aligned plans with voice-first as P0 and 9 new plan documents for upcoming initiatives. - **Plan: email triage agent** (PR #796) — Specification for an upcoming email triage agent. - **Docs/source drift resolved** (PR #794) — Fixed broken SDK examples across 15 docs, rewrote 5 spec files against the current source (including two that documented entire APIs that don't exist in code), added 20+ missing CLI flags to the CLI reference, and removed 2 already-shipped plan documents (installer, mcp-client). - **FAQ: data-privacy answer clarified for external LLM providers** (PR #798) — Sharper guidance on what leaves your machine when you point GAIA at Claude or OpenAI. --- ## Full Changelog **21 commits** since v0.17.2: - `6d3f3f71` — fix: replace misplaced decorative cursor with tracked terminal block cursor (#727) - `874cf2a3` — fix: Ask Agent indexes and attaches files before sending to chat (#725) - `4fa121e2` — fix: surface document indexing failures instead of silent 0-chunk success (#723) - `34b1d06e` — fix(ci): ad-hoc sign macOS DMG instead of skipping code signing (#765) - `7188b83c` — Roadmap overhaul: milestone-aligned plans with voice-first P0 and 9 new plan documents (#710) - `1beddac5` — cpp: support Ollama-compatible /v1 endpoints (#773) - `cf9ac995` — fix: harden rag index thread safety (#746) - `1c55c31b` — fix(ci): remove legacy electron apps from publish, single approval gate (#758) - `52946a7a` — feat(installer): bundle Lemonade Server MSI into Windows installer (#774) (#781) - `e96b3686` — ci(claude): review infra + conventions + subagent overhaul + agent-builder tooling (#783) - `058674b5` — fix(rag): detect encrypted and corrupted PDFs with actionable errors (#451) (#784) - `7bcb5d51` — fix: replace insecure pickle deserialization with JSON + HMAC in RAG cache (CWE-502) (#768) - `a5167e5f` — fix: keep file-search count aligned with accessible results (#754) - `da5ba458` — ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang (#797) - `03f546b9` — ci(claude): bump pr-review and pr-comment --max-turns 20 -> 50 (#799) - `4119d564` — docs(faq): clarify data-privacy answer re: external LLM providers (#798) - `0cfbcf41` — Add example agents and integration test workflow (#340) - `c4bd15fb` — docs: fix drift between docs and source (docs review pass 1 + 2) (#794) - `407ed5b8` — docs(plans): add email triage agent spec (#796) - `06fb04a4` — fix(mcp): guard JSON-RPC handler against non-dict body (#803) - `880ad603` — feat(installer): custom installer guide, agent export/import, first-launch seeder (#795) Full Changelog: [v0.17.2...v0.17.3](v0.17.2...v0.17.3) --- ## Release checklist - [x] `util/validate_release_notes.py docs/releases/v0.17.3.mdx --tag v0.17.3` passes - [x] `src/gaia/version.py` → `0.17.3` - [x] `src/gaia/apps/webui/package.json` → `0.17.3` - [x] Navbar label in `docs/docs.json` → `v0.17.3 · Lemonade 10.0.0` - [x] All 21 PRs in the range (v0.17.2..HEAD) are represented in the notes - [ ] Review from @kovtcharov-amd addressed

@v1

Known bug (anthropics/claude-code-action#1206): the OIDC-to-app-token exchange returns 401 on @v1. Pinning to v1.0.99 per the referenced workaround in amd/gaia#797.

kovtcharov requested a review from kovtcharov-amd as a code owner April 17, 2026 19:18

github-actions bot added the devops DevOps/infrastructure changes label Apr 17, 2026

itomek approved these changes Apr 17, 2026

View reviewed changes

kovtcharov force-pushed the kalin/fix-claude-action-pin branch from 47b93ac to ae1fb3f Compare April 17, 2026 19:35

kovtcharov changed the title ~~ci(claude): pin claude-code-action to v1.0.99 and unmask pr-review failures~~ ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang Apr 17, 2026

kovtcharov mentioned this pull request Apr 17, 2026

test: workflow smoke-test (do not merge) kovtcharov/gaia#1

Open

TEST ONLY (revert before merge): add pull_request trigger to validate…

18efdc0

… v1.0.99

kovtcharov force-pushed the kalin/fix-claude-action-pin branch from 14673b7 to ad99674 Compare April 17, 2026 19:54

Revert "TEST ONLY (revert before merge): add pull_request trigger to …

c7cbc59

…validate v1.0.99" This reverts commit 18efdc0.

kovtcharov added 2 commits April 17, 2026 13:11

TEST ONLY (revert before merge): retest verbatim-restored prompts

90ed591

Revert "TEST ONLY (revert before merge): retest verbatim-restored pro…

ce8c5ca

…mpts" This reverts commit 90ed591.

kovtcharov added this pull request to the merge queue Apr 17, 2026

Merged via the queue into main with commit da5ba45 Apr 17, 2026
19 checks passed

kovtcharov deleted the kalin/fix-claude-action-pin branch April 17, 2026 20:35

kovtcharov mentioned this pull request Apr 17, 2026

docs(faq): clarify data-privacy answer re: external LLM providers #798

Merged

2 tasks

kovtcharov mentioned this pull request Apr 17, 2026

ci(claude): bump pr-review and pr-comment --max-turns 20 -> 50 #799

Merged

2 tasks

itomek mentioned this pull request Apr 20, 2026

Release v0.17.3 #831

Merged

6 tasks

exiao mentioned this pull request Apr 21, 2026

ci: pin claude-code-action to v1.0.99 to fix OIDC 401 exiao/skills#107

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang#797

ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang#797
kovtcharov merged 7 commits intomainfrom
kalin/fix-claude-action-pin

kovtcharov commented Apr 17, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 17, 2026

Uh oh!

kovtcharov commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kovtcharov commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

Validation (completed on this PR via temporary pull_request trigger)

Commits

Vetting v1.0.99 against open upstream bugs

Rollback

Uh oh!

github-actions bot commented Apr 17, 2026

Code Review — PR #797

Summary

Issues Found

🔴 Critical — Temporary pull_request trigger must be reverted before merge (.github/workflows/claude.yml:46-52, 67-69)

🟡 Important — issue-handler prompt dereferences github.event.comment.id on issues events (.github/workflows/claude.yml:504-516)

🟢 Minor — Prompt injection hardening is mostly done; worth confirming ${{ github.event.issue.number }} is safe (.github/workflows/claude.yml:505-507, 558, 561)

🟢 Minor — issue-handler reply posting has no fallback path when gh pr comment / gh issue comment fails

Strengths

Verdict

Uh oh!

kovtcharov commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Short answer

Details

One thing worth noting (🟢 minor, not a blocker)

Separate nit I noticed while checking this

Done

Uh oh!

github-actions bot commented Apr 17, 2026

Code Review — PR #797 (follow-up on ad99674)

Summary

Issues Found

🔴 Critical — Temporary pull_request trigger must be reverted before merge (.github/workflows/claude.yml:46-52, 67-69)

🟢 Minor — Author's @claude question about in_reply_to_id is worth a quick sanity check before merge

🟢 Minor — Consider a one-line "prompt injection invariant" comment near the automation prompts (.github/workflows/claude.yml:113, 367, 504)

🟢 Minor — issue-handler posting has no fallback when gh pr comment / gh issue comment fails (.github/workflows/claude.yml:559-563)

Strengths

Verdict

Uh oh!

github-actions bot commented Apr 17, 2026

Summary

Issues Found

🟡 Important

🟢 Minor

Strengths

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kovtcharov commented Apr 17, 2026 •

edited

Loading

Validation (completed on this PR via temporary `pull_request` trigger)

🔴 Critical — Temporary `pull_request` trigger must be reverted before merge (`.github/workflows/claude.yml:46-52, 67-69`)

🟡 Important — `issue-handler` prompt dereferences `github.event.comment.id` on `issues` events (`.github/workflows/claude.yml:504-516`)

🟢 Minor — Prompt injection hardening is mostly done; worth confirming `${{ github.event.issue.number }}` is safe (`.github/workflows/claude.yml:505-507, 558, 561`)

🟢 Minor — `issue-handler` reply posting has no fallback path when `gh pr comment` / `gh issue comment` fails

github-actions bot commented Apr 17, 2026 •

edited

Loading

Code Review — PR #797 (follow-up on `ad99674`)

🔴 Critical — Temporary `pull_request` trigger must be reverted before merge (`.github/workflows/claude.yml:46-52, 67-69`)

🟢 Minor — Author's `@claude` question about `in_reply_to_id` is worth a quick sanity check before merge

🟢 Minor — Consider a one-line "prompt injection invariant" comment near the automation prompts (`.github/workflows/claude.yml:113, 367, 504`)

🟢 Minor — `issue-handler` posting has no fallback when `gh pr comment` / `gh issue comment` fails (`.github/workflows/claude.yml:559-563`)