Skip to content

ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang#797

Merged
kovtcharov merged 7 commits intomainfrom
kalin/fix-claude-action-pin
Apr 17, 2026
Merged

ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang#797
kovtcharov merged 7 commits intomainfrom
kalin/fix-claude-action-pin

Conversation

@kovtcharov
Copy link
Copy Markdown
Collaborator

@kovtcharov kovtcharov commented Apr 17, 2026

Why

Two real, verified-from-logs problems with the current Claude Code setup:

1. PR reviews haven't fired since #783 merged. The workflow's @beta pin points to a 2025-08-22 SHA that predates pull_request_target support (merged in anthropics/claude-code-action#579 on 2025-09-22) and Opus 4.7 support (fixed in v1.0.98). The action's Prepare step rejects pull_request_target with Unsupported event type, but continue-on-error: true was hiding the failure as a "success" conclusion. Run 24580730832 on PR #795 is the concrete example.

2. @claude mentions post a TODO checklist and never update it with findings. In v0 tag mode, large custom_instructions + max_turns: 30 exhaust the turn budget before Claude reaches the final comment-update step. Visible on run 24581846289 for PR #795.

What changed

  • Pinned all 4 action call sites to v1.0.99 by SHA (c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3).
  • Full migration to the v1 API. v1.0.99 drops the v0 inputs we use — merged direct_prompt + custom_instructions into a single prompt per job; moved model / max_turns into claude_args.
  • All 4 jobs now run in automation mode (prompt input), not tag mode, to work around anthropics/claude-code-action#1223 (tag-mode --model silently ignored) and to fix the unchecked-TODO behavior.
  • --max-turns bumped for issue-handler from 30 to 50.
  • continue-on-error: true removed from the 3 Claude action steps.
  • Prompt-injection hardening on pr-comment / issue-handler — comment bodies fetched via gh api at runtime instead of interpolated from github.event.comment.body.

Validation (completed on this PR via temporary pull_request trigger)

Tested end-to-end on PR #797 itself by temporarily adding a pull_request trigger (since pull_request_target uses main's workflow, not the PR-head's). That trigger is now reverted — final diff is migration-only.

Path Event Run Result
pr-review pull_request (synchronize) 24583825151 ✅ Claude posted a full structured review as this comment — Summary / Issues / Strengths / Verdict format, referenced by file.py:line
issue-handler issue_comment (@claude mention) 24583975636 ✅ Claude replied with actual findings to an @claude question, not an unchecked TODO list

Claude's own review of this PR caught a real bug in the issue-handler prompt — on issues.opened events github.event.comment.id is empty, so the prompt's gh api .../issues/comments/ URL would 404. Fixed in ad99674 by adding an explicit COMMENT ID field and instructing Claude to skip the comment fetch when empty. Dogfooding worked.

Still unvalidated end-to-end (structural validation only):

  • pr-comment (pull_request_review_comment) — uses the same automation-mode pattern as the two validated paths
  • release-notes (workflow_run) — only fires on Publish Release completion; will self-validate on the next release

Commits

4 commits in the branch. Net diff = migration + issue-handler fix. Recommend squash-merge to collapse to one clean commit.

  1. ae1fb3f — the v1 migration (main change)
  2. 18efdc0 — TEST ONLY: add pull_request trigger (for validation)
  3. ad99674 — fix from Claude's own review: gate comment fetch on COMMENT ID
  4. c7cbc59 — revert of (2)

Vetting v1.0.99 against open upstream bugs

v1 is not bug-free. Assessment per issue:

Issue Severity Affects GAIA?
#1205 + #1187 / #1220 Bun tsconfig / cpSync symlink crash P1 No — GAIA has no symlinked sensitive files (verified CLAUDE.md, .claude/)
#1206 HTTP 401 app-token exchange P2 No — we use GITHUB_TOKEN, not a GitHub App installation
#1210 Git credentials overwritten P2 Low risk — we only read
#1218 Fork PR fetch-by-name P2 Fixed in v1.0.96
#1222 gh CLI unavailable on default-branch PRs P2 Maybe — GAIA PRs target main. Mitigated by our pre-generated pr-diff.txt / pr-files.txt
#1223 --model ignored in tag mode Would affect us — worked around by running all jobs in automation mode
#1225 Opus 4.7 broken on v1.0.97 P2 Fixed in v1.0.98
#1226 execution_file not written on max_turns P2 Minor — doesn't affect user-visible behavior

Rollback

Single workflow file, isolated change. git revert restores @beta — which restores the silent-failure state, not a known-good state. If a real merge-time regression appears, fix forward.

@github-actions github-actions bot added the devops DevOps/infrastructure changes label Apr 17, 2026
… hang

## What this fixes

1. **PR reviews haven't fired since #783 merged.** The workflow's `@beta` pin is
   stuck on a 2025-08-22 SHA that predates `pull_request_target` support (merged
   upstream 2025-09-22) and Opus 4.7 support (v1.0.98). The action's Prepare step
   has been rejecting `pull_request_target` with "Unsupported event type" on
   every run, and `continue-on-error: true` was hiding the failure as "success".

2. **`@claude` mentions post a TODO checklist and never update it with findings.**
   In v0 tag mode, large custom_instructions + low max_turns were exhausting the
   turn budget before Claude got to the final comment-update step. Visible on
   run 24581846289 for PR #795.

## Changes

- **Pin all 4 action call sites to `v1.0.99` by SHA** — unblocks
  `pull_request_target` and Opus 4.7. SHA-pin rather than tag-pin so a future
  floating-tag retarget can't silently repeat this class of breakage.

- **Migrate to the v1 API.** v1.0.99 drops the v0 inputs we use
  (`direct_prompt`, `custom_instructions`, `model`, `max_turns`). Merged
  `direct_prompt` + `custom_instructions` into a single `prompt` block per job;
  moved `model` and `max_turns` into `claude_args`. Migration guide:
  https://github.com/anthropics/claude-code-action/blob/main/docs/migration-guide.md

- **All 4 jobs now run in automation mode** (via `prompt` input) instead of tag
  mode. Two reasons:
  - Works around anthropics/claude-code-action#1223 (open bug): `--model` is
    silently ignored in tag mode, falling back to Sonnet 4.6. Automation mode
    honors `--model claude-opus-4-7` correctly.
  - Fixes the unchecked-TODO behavior: automation mode runs Claude to completion
    and posts one final comment, no progress tracker to forget to update.
  - Claude posts its reply via `gh pr comment` / `gh issue comment` / `gh api`
    from within the automation task.

- **`--max-turns` bumped for `issue-handler` from 30 to 50** — the TODO-unfilled
  behavior was often turn-budget exhaustion against the expanded
  custom_instructions added in #783.

- **`continue-on-error: true` removed from all 3 Claude action steps.** This
  masking is the same "No Silent Fallbacks" pattern #783 explicitly added to
  `CLAUDE.md` as prohibited — the workflow was the biggest violator.

- **Prompt-injection hardening on `pr-comment` and `issue-handler`** — instead
  of interpolating `github.event.comment.body` into the `prompt` (a classic
  Actions injection sink when user content lands inside another shell/markdown
  context), the prompt tells Claude to fetch the comment body itself via `gh
  api`. Workflow-context values in the prompt are limited to numeric IDs and
  repo names.
@kovtcharov kovtcharov force-pushed the kalin/fix-claude-action-pin branch from 47b93ac to ae1fb3f Compare April 17, 2026 19:35
@kovtcharov kovtcharov changed the title ci(claude): pin claude-code-action to v1.0.99 and unmask pr-review failures ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang Apr 17, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Code Review — PR #797

Summary

Well-scoped, well-documented CI fix that migrates anthropics/claude-code-action from the stale @beta tag (stuck on a 2025-08-22 SHA) to v1.0.99 pinned by SHA, and migrates all 4 call sites from v0 inputs (direct_prompt / custom_instructions / model / max_turns) to the v1 API (prompt + claude_args). The migration is mechanically consistent across all 4 jobs, and the author preserved each job's distinct prompt while applying three real behavior fixes: automation-mode everywhere (dodges tag-mode --model bug #1223), continue-on-error removed (aligns with CLAUDE.md "No Silent Fallbacks"), and prompt-injection hardening (comment bodies fetched via gh api instead of interpolated). The one must-fix before merge is the temporary pull_request trigger — the commit title already flags it as TEST ONLY (revert before merge).

Issues Found

🔴 Critical — Temporary pull_request trigger must be reverted before merge (.github/workflows/claude.yml:46-52, 67-69)

The author already knows (18efdc0 TEST ONLY (revert before merge): add pull_request trigger to validate v1.0.99), so this is a gate check, not a finding. Landing this on main as-is would cause every same-repo PR to fire both pull_request_target and pull_request, double-running pr-review and doubling Anthropic API spend per PR. There are two places to revert:

  pull_request_review_comment:
    types: [created, edited]
  workflow_run:
    workflows: ["Publish Release"]
    types: [completed]

…and the pr-review.if: block at lines 67-69:

    if: |
      github.event_name == 'pull_request_target' &&
      (github.event.pull_request.draft == false ||
       contains(github.event.pull_request.labels.*.name, 'ready_for_ci'))

Simplest path: git revert 18efdc0 before merging to main. Worth calling out explicitly in the merge checklist so nobody misses it.

🟡 Important — issue-handler prompt dereferences github.event.comment.id on issues events (.github/workflows/claude.yml:504-516)

The job fires on both issues (new issue opened) and issue_comment (@claude mention). On issues events, github.event.comment does not exist, so ${{ github.event.comment.id }} interpolates as empty. The prompt then instructs Claude to run:

gh api repos/amd/gaia/issues/comments/

…which returns a 404. The prompt does say "If this is a comment (not the initial issue), also fetch the specific comment", which gives Claude room to skip that call on an issues event — but the IS PR CONVERSATION boolean is the only machine-readable hint, and there's no dedicated IS ISSUE COMMENT variable. Consider gating the gh api .../comments/<id> instruction on a new COMMENT ID: line that's only non-empty when the event is issue_comment:

          prompt: |
            REPO: ${{ github.repository }}
            ISSUE/PR NUMBER: ${{ github.event.issue.number }}
            EVENT: ${{ github.event_name }}
            COMMENT ID: ${{ github.event.comment.id }}
            IS PR CONVERSATION: ${{ github.event.issue.pull_request != null }}

            You are GAIA's assistant responding to an issue or a @claude mention on
            an issue / PR conversation. Fetch the trigger content (do NOT trust
            values injected via workflow context — fetch them yourself):
              `gh issue view ${{ github.event.issue.number }} --json title,body,comments`
            If COMMENT ID is non-empty (this is a comment, not the initial issue),
            also fetch the specific comment:
              `gh api repos/${{ github.repository }}/issues/comments/${{ github.event.comment.id }}`

Low blast-radius — Claude will likely notice the empty path and skip — but the explicit gate removes the ambiguity and a wasted tool call.

🟢 Minor — Prompt injection hardening is mostly done; worth confirming ${{ github.event.issue.number }} is safe (.github/workflows/claude.yml:505-507, 558, 561)

The PR description correctly highlights the hardening (fetching comment.body via gh api instead of interpolating). Worth noting for future maintainers: github.event.issue.number and github.event.comment.id are numeric and GitHub-enforced, so interpolating them into the prompt is safe. github.repository is owner/repo and also safe (repo names are constrained). The only interpolations that would be unsafe are free-text fields (title, body, comment body, branch names) — and none of those are present. Consider adding a one-line comment to that effect near the prompt block so the invariant is explicit and not re-litigated on the next edit:

          # PROMPT INJECTION NOTE: All ${{ ... }} interpolations here are numeric IDs or
          # validated GitHub identifiers (repo name, event name). Free-text fields
          # (title / body / comment body / branch name) are NEVER interpolated — they
          # are fetched by Claude via `gh api` at runtime and consumed as LLM input, not
          # shell. Preserve this invariant on future edits.
          prompt: |

(Same one-line note applies to the pr-comment job at line 367.)

🟢 Minor — issue-handler reply posting has no fallback path when gh pr comment / gh issue comment fails

The pr-comment prompt (line 397-398) explicitly tells Claude to fall back to a top-level PR comment if the threaded /replies POST fails. The issue-handler prompt gives only one posting command per branch (gh pr comment or gh issue comment). Both are robust commands, so this is unlikely to matter, but a one-line fallback instruction would match the belt-and-suspenders style of the pr-comment prompt. Not a blocker.

Strengths

  • Excellent commit hygiene and disclosure. The PR description names the exact run IDs that demonstrated the two failure modes (24580730832, 24581846289), calls out each upstream bug by issue number with a "affects GAIA?" assessment, and separates the test-only commit so git revert 18efdc0 is a clean one-liner.
  • The SHA pin is the right call. Pinning @beta or @v1 would have repeated the exact failure mode this PR fixes; c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1.0.99 both immunizes against floating-tag drift and leaves a human-readable version hint.
  • Migration is internally consistent. Verified no leftover v0 inputs (direct_prompt / custom_instructions / max_turns / bare model:) anywhere in the file — every occurrence is inside a comment documenting the migration (lines 12-13, 106-107). continue-on-error is likewise fully removed from Claude steps and only survives in comments explaining why (lines 103, 361, 499). This is exactly the alignment with CLAUDE.md's "No Silent Fallbacks" rule that the project has been tightening up.
  • Prompt injection posture is good. Moving free-text fetches (comment body, issue body) from workflow-context interpolation into runtime gh api calls is the correct fix for the classic Actions injection sink, and is the kind of thing that's easy to miss on a workflow migration.

Verdict

Request changes — on the strength of the 🔴 gate item only. The migration itself is clean and the pull_request trigger revert is already flagged by the author; once 18efdc0 is reverted (or squashed out), this is ready to merge. The 🟡 COMMENT ID gating and 🟢 notes are worth addressing but not blocking.

Claude's own review of this PR caught it: on issues.opened events,
github.event.comment.id is empty, so the prompt's
gh api .../issues/comments/ call becomes a malformed URL. Add an
explicit COMMENT ID field to the prompt and instruct Claude to skip the
comment fetch when it's empty.
@kovtcharov kovtcharov force-pushed the kalin/fix-claude-action-pin branch from 14673b7 to ad99674 Compare April 17, 2026 19:54
@kovtcharov
Copy link
Copy Markdown
Collaborator Author

@claude quick sanity check — does the pr-comment prompt correctly handle the case where the inline review comment is a reply to another comment (i.e., in_reply_to_id is set)? Or will it try to post a reply-to-a-reply which GitHub rejects?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 17, 2026

Claude finished @kovtcharov's task —— View job


Short answer

It handles this correctly — no action needed. GitHub's POST /repos/{owner}/{repo}/pulls/{pull_number}/comments/{comment_id}/replies endpoint does not reject replies-to-replies. When comment_id points at a comment that itself has in_reply_to_id set, GitHub normalizes the new comment into the same review thread (it ends up as a sibling reply with in_reply_to_id pointing at the thread's root comment, not the intermediate one). The reply is posted; the thread structure stays flat.

Details

Looking at .github/workflows/claude.yml:389-398:

gh api -X POST \
  repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/comments/${{ github.event.comment.id }}/replies \
  -f body="$(cat /tmp/reply.md)"

The trigger is pull_request_review_comment (line 330), so github.event.comment.id is always a review comment ID — whether it's the root of a thread or a reply. Either way, /comments/:id/replies is the right endpoint:

  • Root comment (no in_reply_to_id): new comment gets in_reply_to_id = <root>.id. ✅
  • Reply comment (has in_reply_to_id): new comment gets in_reply_to_id = <thread_root>.id (GitHub auto-resolves to the thread root, not the immediate parent). ✅ No 422, no rejection.

This is the documented behavior of the Create a reply for a review comment endpoint — it's specifically designed to be called with either a root or a reply, which is why the upstream claude-code-action examples use exactly this pattern.

One thing worth noting (🟢 minor, not a blocker)

The prompt reads "Post your reply as a threaded reply on the review comment" — accurate for Claude's purposes, but someone reading the code later might worry about the reply-to-reply case you're asking about. A one-line comment near the gh api block would preempt that review cycle:

# /replies accepts any review comment ID (root or reply); GitHub threads it
# into the parent conversation automatically. No need to resolve to thread root.

Purely docs; the functional behavior is correct as-is.

Separate nit I noticed while checking this

Line 407's --allowedTools Edit,Read,Write,Grep,Glob,Bashpr-comment only needs to read the diff and post via gh, so Edit and Write are wider than necessary. Not this PR's scope (same set is in the pre-migration version), but worth tightening in a follow-up to Read,Grep,Glob,Bash to match the "minimum tools" posture elsewhere in the workflow.


Done

@github-actions
Copy link
Copy Markdown
Contributor

Code Review — PR #797 (follow-up on ad99674)

Summary

Re-reviewing after ad99674 fix(claude): gate issue-handler comment fetch on COMMENT ID presence. That commit cleanly resolves the 🟡 COMMENT ID gating concern from the previous review — the issue-handler prompt now exposes an explicit COMMENT ID: field and tells Claude to skip the gh api .../issues/comments/<id> call when it's empty (see .github/workflows/claude.yml:508, 515-518). The one remaining blocker is unchanged: the temporary pull_request trigger added by 18efdc0 must be reverted before this lands on main. Migration quality is still solid — no v0 input leakage, consistent v1 API across all 4 call sites, SHA pinning, and prompt-injection hardening all verified against the current file.

Issues Found

🔴 Critical — Temporary pull_request trigger must be reverted before merge (.github/workflows/claude.yml:46-52, 67-69)

Still present (by design — the author's 18efdc0 message explicitly says TEST ONLY (revert before merge)). Flagging only so it doesn't slip past the merge button. Landing as-is would cause every same-repo PR to double-fire pr-review on both pull_request_target and pull_request, doubling Anthropic spend per PR for zero added signal.

Revert is a one-liner: git revert 18efdc0 drops both the on: block entry and the pr-review.if: reference. Target state:

  pull_request_review_comment:
    types: [created, edited]
  workflow_run:
    workflows: ["Publish Release"]
    types: [completed]

…and pr-review.if: (currently lines 67-71) back to single-trigger form:

    if: |
      github.event_name == 'pull_request_target' &&
      (github.event.pull_request.draft == false ||
       contains(github.event.pull_request.labels.*.name, 'ready_for_ci'))

🟢 Minor — Author's @claude question about in_reply_to_id is worth a quick sanity check before merge

Posed in this comment and already being handled by the issue-handler job on another run — not a review finding per se, but worth confirming the answer before merge. Quick take: GitHub's "Create a reply for a review comment" endpoint (POST /repos/{owner}/{repo}/pulls/{pull_number}/comments/{comment_id}/replies) accepts any comment ID in the thread — root or reply — and auto-threads the new reply under the thread root. So pr-comment posting a "reply to a reply" will succeed; it won't create a nested thread, it'll append to the existing thread. No workflow change needed. (Verifying this against upstream docs before merging is cheap insurance.)

🟢 Minor — Consider a one-line "prompt injection invariant" comment near the automation prompts (.github/workflows/claude.yml:113, 367, 504)

Previously noted and still applicable — the interpolations in the three automation prompts (${{ github.event.pull_request.number }}, ${{ github.event.comment.id }}, ${{ github.repository }}, ${{ github.event_name }}) are all numeric IDs or GitHub-validated identifiers, so they're safe. Free-text fields (title, body, comment body, branch name) are correctly fetched at runtime via gh api and consumed as LLM input, not shell. A one-line note anchoring that invariant would harden against accidental re-introduction on a future edit. Non-blocking.

🟢 Minor — issue-handler posting has no fallback when gh pr comment / gh issue comment fails (.github/workflows/claude.yml:559-563)

pr-comment has a belt-and-suspenders fallback at line 397-398 (If the threaded reply fails, fall back to a new top-level PR comment). issue-handler doesn't. Both primary commands are robust, so this is unlikely to bite — but matching the style of pr-comment would be consistent. Non-blocking.

Strengths

  • Fast iteration on prior feedback. ad99674 lands the COMMENT ID gate in 6 lines with a clear commit message that credits the surfacing review — exactly the kind of tight feedback loop that the CI migration is meant to enable.
  • Migration is internally consistent. Re-verified post-ad99674: no leftover v0 inputs anywhere in the file, continue-on-error removed from all 3 Claude action steps (only survives inside comments explaining why), and all 4 action call sites pin the same SHA (c3d45e8).
  • SHA-pinned with human-readable hint. anthropics/claude-code-action@c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1.0.99 immunizes against floating-tag drift while leaving the version legible to reviewers. This is the exact failure mode that would've been repeated with @v1 or @beta.
  • Prompt injection posture matches CLAUDE.md's threat model. Moving free-text fetches (comment body, issue body) out of workflow-context interpolation and into runtime gh api calls is the right fix; the prompts only ever interpolate numeric or GitHub-validated identifiers.

Verdict

Request changes — on the strength of the 🔴 gate item only. The migration itself is clean and ad99674 addressed the one substantive concern from the previous review. Revert 18efdc0 (or squash it out) before merging to main and this is ready to ship. The 🟢 notes are polish, not blockers.

Prior commits in this branch rewrote the prompt content (compressing
bullet lists into prose, dropping sections like 'For Bugs' / 'For
Feature Requests' / 'Limitations' / the full docs/ paths list) while
migrating from v0 inputs (direct_prompt + custom_instructions) to v1
`prompt`. Those content edits were scope creep — not required by the
API migration.

Restore the original direct_prompt and custom_instructions text
byte-for-byte inside the new `prompt:` input. The only additions are
the minimum automation-mode scaffolding that tag mode used to do for
us:
  - context header (REPO / PR NUMBER / COMMENT ID) so Claude knows the
    trigger identity without interpolating user-controlled text
  - fetch instruction (gh api) — tag mode auto-injected the comment
    body; automation mode doesn't, so Claude must fetch
  - post instruction (gh pr/issue comment or gh api replies) — tag mode
    auto-posted Claude's final message; automation mode doesn't

Verified with byte-equality check that old direct_prompt and
custom_instructions substrings are present verbatim in the new prompt
for all 4 jobs (pr-review, pr-comment, issue-handler, release-notes).
@github-actions
Copy link
Copy Markdown
Contributor

Summary

Clean, well-motivated CI migration: pins anthropics/claude-code-action from a stale @beta SHA to v1.0.99 (pinned by SHA), migrates to the v1 API (prompt + claude_args), runs all 4 jobs in automation mode to sidestep the tag-mode --model bug, and hardens against prompt injection by fetching comment/issue bodies via gh api at runtime instead of interpolating github.event.comment.body. The rationale in the header comment block and the inline per-step comments is genuinely useful — future maintainers will be able to reconstruct why these choices were made. The one blocker is author-flagged: the temporary pull_request trigger at .github/workflows/claude.yml:46-48 and its mirror in the pr-review if: clause must be reverted before merge (the current HEAD 90ed591 re-added it for a validation retest after the earlier c7cbc59 revert).

Issues Found

🟡 Important

🟡 Temporary pull_request trigger must be reverted before merge (.github/workflows/claude.yml:46-48, .github/workflows/claude.yml:64-65)

The current branch state still contains the test-only trigger that the PR description says was reverted. Commit 90ed591 ("TEST ONLY (revert before merge): retest verbatim-restored prompts") re-added it. If this lands on main:

  • Every same-repo PR will fire both pull_request_target and pull_request, producing two redundant Claude reviews per PR (wasted Anthropic quota + duplicate comments).
  • Fork PRs will fire a pull_request run that fails on missing ANTHROPIC_API_KEY (secrets aren't exposed to pull_request on forks), creating a noisy red X alongside the working pull_request_target run.

Since the author has already marked these lines with # TEMPORARY FOR TESTING — REMOVE BEFORE MERGE, this is a pre-merge checklist item rather than a design issue — but it is a merge-blocker. Two edits needed:

Revert lines 46-48:

  pull_request_review_comment:
    types: [created, edited]

(i.e. drop the pull_request: block and its types: line; the pull_request_review_comment: that follows should become the next trigger.)

And revert the pr-review if: at lines 63-67 back to single-event form:

    if: |
      github.event_name == 'pull_request_target' &&
      (github.event.pull_request.draft == false ||
       contains(github.event.pull_request.labels.*.name, 'ready_for_ci'))

🟢 Minor

🟢 --allowedTools grants Edit,Write to review-only jobs (.github/workflows/claude.yml:310, :425)

The pr-review and pr-comment jobs only need to read the repo and post comments — they write /tmp/review.md / /tmp/reply.md (which requires Write) but never Edit existing repo files. Dropping Edit from those two jobs is a small supply-chain tightening that doesn't change behavior. release-notes legitimately needs Edit,Write (it authors RELEASE_NOTES.md + the MDX). issue-handler is debatable — leave as-is if you want Claude to be able to propose patches inline. Not a blocker.

Example for pr-review at line 310:

          claude_args: |
            --max-turns 20
            --model claude-opus-4-7
            --allowedTools Read,Write,Grep,Glob,Bash

🟢 Optional: scope contents permission per-job (.github/workflows/claude.yml:56)

Top-level contents: write is required by release-notes (it commits the MDX + bumps version.py), but the three Claude-comment jobs only need contents: read + pull-requests: write to post comments. Moving the write scope to a job-level override on release-notes only (which already declares its own permissions: block at line 631) would follow least-privilege. Purely optional — the current setup is within the norm for a consolidated Claude workflow.

Strengths

  • Prompt-injection hardening is the right shape. Moving comment-body retrieval from github.event.comment.body interpolation to a runtime gh api …/comments/${{ github.event.comment.id }} call (lines 361-363, 529-534) keeps attacker-controlled text out of the YAML expansion layer. The comment ID is a safe integer. Good instinct.
  • Dogfooding loop closed a real bug. Commit ad99674 fixes the COMMENT ID empty case on issues.opened events — caught by Claude's own review of an earlier revision of this PR. The explicit "If COMMENT ID is empty, skip the comment fetch" at line 535 is the right remediation.
  • Removing continue-on-error: true is aligned with GAIA's fail-loudly rule (CLAUDE.md's "No Silent Fallbacks" section). The header comment at line 58-59 correctly notes this will surface the next class of silent SHA drift instead of hiding it.
  • SHA pinning with a semver comment (c3d45e8e… # v1.0.99) is the correct supply-chain posture for a third-party action with contents: write + secret access. The upstream-bug audit table in the PR description is unusually thorough.

Verdict

Request changes — only because the temporary pull_request trigger (.github/workflows/claude.yml:46-48, :64-65) is still on the branch per commit 90ed591 and must be reverted before merge, per the author's own REMOVE BEFORE MERGE marker. Once that one revert lands, this is ready to ship. The minor --allowedTools / per-job permissions notes are optional polish, not blockers.

@kovtcharov kovtcharov added this pull request to the merge queue Apr 17, 2026
Merged via the queue into main with commit da5ba45 Apr 17, 2026
19 checks passed
@kovtcharov kovtcharov deleted the kalin/fix-claude-action-pin branch April 17, 2026 20:35
kovtcharov added a commit that referenced this pull request Apr 17, 2026
PR #795 (large installer + agent export-import feature, many changed
files) exceeded the 20-turn budget on pr-review before Claude could
post its review comment. Run 24586335693 failed with
`error_max_turns`, visible now that #797 removed the continue-on-error
mask.

issue-handler was already at 50 from #797. Matching pr-review and
pr-comment for consistency — same failure mode, same fix. release-notes
stays at 30 since release diffs are bounded to a tag range.
github-merge-queue bot pushed a commit that referenced this pull request Apr 17, 2026
## Summary

PR #795 ([run
24586335693](https://github.com/amd/gaia/actions/runs/24586335693))
exceeded pr-review's `--max-turns 20` budget and failed with
`error_max_turns` — no review comment posted. That failure is visible
(not silently swallowed) thanks to #797 removing the `continue-on-error`
mask. The fix is the same bump I applied to `issue-handler` during the
v1 migration: 20 → 50.

Matching `pr-comment` at the same time for consistency — same failure
mode would apply on a large-diff PR conversation. `release-notes` stays
at 30 since release diffs are bounded to a tag-to-tag range.

## Test plan

- [ ] After merge, re-run `pr-review` on PR #795 (close+reopen or push
an empty commit) — confirm Claude completes the review within 50 turns
and posts the comment
- [ ] Spot-check next 2-3 post-merge PRs don't regress to failures
github-merge-queue bot pushed a commit that referenced this pull request Apr 17, 2026
## Summary

The FAQ's "Does GAIA send my data anywhere?" answer said a flat **No**,
but that only holds when you're on the default Lemonade backend. If you
configure the Claude API or OpenAI provider via settings, prompts leave
the machine. Updating the lead to "Not by default" and adding a
one-sentence caveat closes a minor-but-real accuracy gap.

## Test plan

- [ ] Skim the rendered FAQ section on the docs site after merge —
confirm the caveat reads naturally in Mintlify rendering
- [ ] **Primary test purpose:** validate that
`.github/workflows/claude.yml` (merged in #797) produces a real review
comment on this PR via `pull_request_target` — the first post-merge
validation that the v1.0.99 migration works end-to-end.
@itomek itomek mentioned this pull request Apr 20, 2026
6 tasks
github-merge-queue bot pushed a commit that referenced this pull request Apr 20, 2026
# GAIA v0.17.3 Release Notes

GAIA v0.17.3 is an extensibility and resilience release. You can now
package your own agents into a custom GAIA installer and seed them on
first launch, point GAIA at alternative OpenAI-compatible inference
servers from the C++ library (Ollama, for example), and start from three
new reference agents (weather, RAG Q&A, HTML mockup) that execute
against real Lemonade hardware in CI. It also hardens the RAG cache
against an insecure-deserialization class of bug (CWE-502) — all users
should upgrade.

**Why upgrade:**
- **Ship your own GAIA** — Export and import agents between machines,
follow a new guide to produce a custom installer that seeds your agents
on first launch, and on Windows install everything in one step because
the installer now includes the Lemonade Server MSI.
- **Work with alternative inference backends** — The C++ library now
preserves OpenAI-compatible `/v1` base URLs instead of rewriting them to
`/api/v1`, so servers that expose the standard `/v1` path (Ollama, for
example) work out of the box.
- **Start from a working example** — Three new reference agents (weather
via MCP, RAG document Q&A, HTML landing-page generator) with integration
tests that actually execute against Lemonade on a Strix CI runner.
- **Safer RAG cache** — Replaces `pickle` deserialization with JSON +
HMAC-SHA256 (CWE-502). Unsigned or tampered caches are rejected and
transparently rebuilt on the next query.
- **Better document handling** — Encrypted or corrupted PDFs now produce
distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`)
instead of generic failures, and the RAG index is hardened for
concurrent queries.

---

## What's New

### Custom Installers and Agent Portability

You can now package a custom GAIA installer that ships with your own
agents pre-loaded, and move agents between machines with export/import
(PR #795). On Windows, the official installer now includes the Lemonade
Server MSI and runs it during install, so a fresh machine has the
complete local-LLM stack after a single download (PR #781).

**What you can do:**
- Export an agent from `~/.gaia/agents/` to a portable bundle with `gaia
agents export` and import it on another machine with `gaia agents
import`
- Follow the new custom-installer playbook at
[`docs/playbooks/custom-installer/index.mdx`](/playbooks/custom-installer)
to distribute GAIA with your agents pre-loaded — useful for workshops,
team deployments, and internal tooling
- On Windows, the installer now includes Lemonade Server — no separate
download for a complete first-run experience

**Under the hood:**
- `gaia agents export` / `gaia agents import` CLI commands round-trip
agents between machines as portable bundles
- First-launch agent seeder
(`src/gaia/apps/webui/services/agent-seeder.cjs`) copies
`<resourcesPath>/agents/<id>/` into `~/.gaia/agents/<id>/` the first
time the app starts
- Windows NSIS installer embeds `lemonade-server-minimal.msi` into
`$PLUGINSDIR` and runs it via `msiexec /i ... /qn /norestart` during
install (auto-cleaned on exit)

---

### Broader Backend Compatibility in the C++ Library

The C++ library now preserves OpenAI-compatible `/v1` base URLs (PR
#773) instead of rewriting them to `/api/v1`. That means inference
servers that expose the standard OpenAI `/v1` path — for example, Ollama
at `http://localhost:11434/v1` — work out of the box without needing a
special adapter.

---

### Reference Agents and Real-Hardware Integration Tests

Three new example agents and a Strix-runner CI workflow land together
(PR #340).

**What you can do:**
- Copy `examples/weather_agent.py`, `examples/rag_doc_agent.py`, or
`examples/product_mockup_agent.py` as a starting point for your own
agents
- Run the new integration tests locally against Lemonade to validate
agents end-to-end, not just structurally

**Under the hood:**
- `tests/integration/test_example_agents.py` executes agents and
validates responses with a 5-minute-per-test timeout
- `.github/workflows/test_examples.yml` runs on the self-hosted Strix
runner (`stx` label) with Lemonade serving `Qwen3-4B-Instruct-2507-GGUF`
- Docs homepage refreshed with a technical value prop ("Agent SDK for
AMD Ryzen AI") and MCP / CUA added to the capabilities list

---

### Smarter PDF Handling in RAG

Encrypted and corrupted PDFs now surface as distinct, actionable errors
(`EncryptedPDFError`, `CorruptedPDFError`, `EmptyPDFError`) instead of
generic failures or silent 0-chunk indexes (PR #784, closes #451).
Encrypted PDFs are detected before extraction; corrupted PDFs are caught
during extraction with a clear message. Combined with the
indexing-failure surfacing in PR #723, you get a visible indexing-failed
status the moment a document fails — and the RAG index itself is now
thread-safe under concurrent queries (PR #746).

---

## Security

### RAG Cache Deserialization Replaced with JSON + HMAC

Fixes an insecure-deserialization issue in the RAG cache (CWE-502, PR
#768). Previously, cached document indexes were serialized with Python
`pickle`; if an attacker could write to `~/.gaia/` — via a shared drive,
a sync conflict, or a malicious extension — loading that cache could
execute arbitrary code.

v0.17.3 replaces `pickle` with signed JSON: caches are now serialized as
JSON and authenticated with HMAC-SHA256 using a per-install key stored
at `~/.gaia/cache/hmac.key`. Unsigned or tampered caches are rejected
and transparently rebuilt on the next query. Old `.pkl` caches from
previous GAIA versions are ignored and re-indexed the next time you
query a document.

**You should upgrade if you** share `~/.gaia/` across machines (Dropbox,
iCloud, network home directories), run GAIA in a multi-user environment,
or have ever imported RAG caches from another source.

---

## Bug Fixes

- **Ask Agent attaches files before sending to chat** (PR #725) —
Dropped files are indexed into RAG and attached to the active session
before the prompt is consumed, so the model sees the document on the
first turn instead of the second.
- **Document indexing failures are surfaced** (PR #723) — A document
that produces 0 chunks now raises `RuntimeError` in the SDK and surfaces
as `indexing_status: failed` in the UI, instead of looking like a silent
success. Covers RAG SDK, background indexing, and re-index paths.
- **Encrypted or corrupted PDFs produce actionable errors** (PR #784,
closes #451) — RAG now raises distinct `EncryptedPDFError` and
`CorruptedPDFError` exceptions instead of generic failures, so you see
exactly what went wrong.
- **RAG index thread safety hardened** (PR #746) — Adds `RLock`
protection around index mutation paths and rebuilds chunk/index state
atomically before publishing it, so concurrent queries read consistent
snapshots and failed rebuilds no longer leak partial state.
- **MCP JSON-RPC handler guards against non-dict bodies** (PR #803) — A
malformed JSON-RPC payload (array, string, null) now returns HTTP 400
`Invalid Request: expected JSON object` instead of an HTTP 500 from a
`TypeError`.
- **File-search count aligned with accessible results** (PR #754) — The
returned count now matches the number of files the tool actually
surfaces, instead of a pre-filter total that over-reported results the
caller could not access.
- **Tracked block cursor replaces misplaced decorative cursor** (PR
#727) — Fixes the mis-positioned blinking cursor in the chat input box,
which now tracks the actual caret position via a mirror-div technique.
- **Ad-hoc sign the macOS app bundle instead of skipping code signing**
(PR #765) — The `.app` bundle inside the DMG now carries an ad-hoc
signature, so Gatekeeper presents a single "Open Anyway" bypass in
System Settings instead of the unrecoverable "is damaged" error. Full
Apple Developer ID signing is still being finalized.

---

## Release & CI

- **Publish workflow: single approval gate, no legacy Electron apps**
(PR #758) — Removed the legacy jira and example standalone Electron apps
from the publish pipeline; a single `publish` environment gate governs
PyPI, npm, and installer publishing.
- **Claude CI modernization** (PR #797, PR #799, PR #783) — Migrated all
four `claude-code-action` call sites to `v1.0.99` (pinned by SHA, fixes
an issue-handler hang), bumped `--max-turns` from 20 to 50 on both
`pr-review` and `pr-comment` for deeper analysis, upgraded to Opus 4.7,
standardized 23 subagent definitions with explicit when-to-use sections
and tool allowlists, and added agent-builder tooling (manifest schema,
`lint.py --agents`, BuilderAgent mixins).

---

## Docs

- **Roadmap overhaul** (PR #710) — Milestone-aligned plans with
voice-first as P0 and 9 new plan documents for upcoming initiatives.
- **Plan: email triage agent** (PR #796) — Specification for an upcoming
email triage agent.
- **Docs/source drift resolved** (PR #794) — Fixed broken SDK examples
across 15 docs, rewrote 5 spec files against the current source
(including two that documented entire APIs that don't exist in code),
added 20+ missing CLI flags to the CLI reference, and removed 2
already-shipped plan documents (installer, mcp-client).
- **FAQ: data-privacy answer clarified for external LLM providers** (PR
#798) — Sharper guidance on what leaves your machine when you point GAIA
at Claude or OpenAI.

---

## Full Changelog

**21 commits** since v0.17.2:

- `6d3f3f71` — fix: replace misplaced decorative cursor with tracked
terminal block cursor (#727)
- `874cf2a3` — fix: Ask Agent indexes and attaches files before sending
to chat (#725)
- `4fa121e2` — fix: surface document indexing failures instead of silent
0-chunk success (#723)
- `34b1d06e` — fix(ci): ad-hoc sign macOS DMG instead of skipping code
signing (#765)
- `7188b83c` — Roadmap overhaul: milestone-aligned plans with
voice-first P0 and 9 new plan documents (#710)
- `1beddac5` — cpp: support Ollama-compatible /v1 endpoints (#773)
- `cf9ac995` — fix: harden rag index thread safety (#746)
- `1c55c31b` — fix(ci): remove legacy electron apps from publish, single
approval gate (#758)
- `52946a7a` — feat(installer): bundle Lemonade Server MSI into Windows
installer (#774) (#781)
- `e96b3686` — ci(claude): review infra + conventions + subagent
overhaul + agent-builder tooling (#783)
- `058674b5` — fix(rag): detect encrypted and corrupted PDFs with
actionable errors (#451) (#784)
- `7bcb5d51` — fix: replace insecure pickle deserialization with JSON +
HMAC in RAG cache (CWE-502) (#768)
- `a5167e5f` — fix: keep file-search count aligned with accessible
results (#754)
- `da5ba458` — ci(claude): migrate to claude-code-action v1.0.99 + fix
issue-handler hang (#797)
- `03f546b9` — ci(claude): bump pr-review and pr-comment --max-turns 20
-> 50 (#799)
- `4119d564` — docs(faq): clarify data-privacy answer re: external LLM
providers (#798)
- `0cfbcf41` — Add example agents and integration test workflow (#340)
- `c4bd15fb` — docs: fix drift between docs and source (docs review pass
1 + 2) (#794)
- `407ed5b8` — docs(plans): add email triage agent spec (#796)
- `06fb04a4` — fix(mcp): guard JSON-RPC handler against non-dict body
(#803)
- `880ad603` — feat(installer): custom installer guide, agent
export/import, first-launch seeder (#795)

Full Changelog:
[v0.17.2...v0.17.3](v0.17.2...v0.17.3)

---

## Release checklist
- [x] `util/validate_release_notes.py docs/releases/v0.17.3.mdx --tag
v0.17.3` passes
- [x] `src/gaia/version.py` → `0.17.3`
- [x] `src/gaia/apps/webui/package.json` → `0.17.3`
- [x] Navbar label in `docs/docs.json` → `v0.17.3 · Lemonade 10.0.0`
- [x] All 21 PRs in the range (v0.17.2..HEAD) are represented in the
notes
- [ ] Review from @kovtcharov-amd addressed
exiao added a commit to exiao/skills that referenced this pull request Apr 21, 2026
Known bug (anthropics/claude-code-action#1206): the OIDC-to-app-token
exchange returns 401 on @v1. Pinning to v1.0.99 per the referenced
workaround in amd/gaia#797.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devops DevOps/infrastructure changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants