ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang #797
kovtcharov merged 7 commits into main
… hang

## What this fixes

1. **PR reviews haven't fired since #783 merged.** The workflow's `@beta` pin is stuck on a 2025-08-22 SHA that predates `pull_request_target` support (merged upstream 2025-09-22) and Opus 4.7 support (v1.0.98). The action's Prepare step has been rejecting `pull_request_target` with "Unsupported event type" on every run, and `continue-on-error: true` was hiding the failure as "success".
2. **`@claude` mentions post a TODO checklist and never update it with findings.** In v0 tag mode, large custom_instructions + low max_turns were exhausting the turn budget before Claude got to the final comment-update step. Visible on run 24581846289 for PR #795.

## Changes

- **Pin all 4 action call sites to `v1.0.99` by SHA**: unblocks `pull_request_target` and Opus 4.7. SHA-pin rather than tag-pin so a future floating-tag retarget can't silently repeat this class of breakage.
- **Migrate to the v1 API.** v1.0.99 drops the v0 inputs we use (`direct_prompt`, `custom_instructions`, `model`, `max_turns`). Merged `direct_prompt` + `custom_instructions` into a single `prompt` block per job; moved `model` and `max_turns` into `claude_args`. Migration guide: https://github.com/anthropics/claude-code-action/blob/main/docs/migration-guide.md
- **All 4 jobs now run in automation mode** (via `prompt` input) instead of tag mode. Two reasons:
  - Works around anthropics/claude-code-action#1223 (open bug): `--model` is silently ignored in tag mode, falling back to Sonnet 4.6. Automation mode honors `--model claude-opus-4-7` correctly.
  - Fixes the unchecked-TODO behavior: automation mode runs Claude to completion and posts one final comment, no progress tracker to forget to update.
  - Claude posts its reply via `gh pr comment` / `gh issue comment` / `gh api` from within the automation task.
- **`--max-turns` bumped for `issue-handler` from 30 to 50**: the TODO-unfilled behavior was often turn-budget exhaustion against the expanded custom_instructions added in #783.
- **`continue-on-error: true` removed from all 3 Claude action steps.** This masking is the same "No Silent Fallbacks" pattern #783 explicitly added to `CLAUDE.md` as prohibited; the workflow was the biggest violator.
- **Prompt-injection hardening on `pr-comment` and `issue-handler`**: instead of interpolating `github.event.comment.body` into the `prompt` (a classic Actions injection sink when user content lands inside another shell/markdown context), the prompt tells Claude to fetch the comment body itself via `gh api`. Workflow-context values in the prompt are limited to numeric IDs and repo names.
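The hardening rule in the last bullet (only numeric IDs and repo names reach the prompt) can be illustrated with a minimal Python sketch. `build_context_header` and its field labels are hypothetical names chosen for illustration; the workflow itself is YAML and may express this differently.

```python
import re

# Hypothetical helper, not part of the workflow: it only admits values that
# cannot carry free-form user text (an owner/name repo slug and digit-only IDs).
_REPO_RE = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")
_ID_RE = re.compile(r"[0-9]+")


def build_context_header(repo: str, pr_number: int, comment_id: str = "") -> str:
    """Return a prompt header limited to a repo slug and numeric IDs."""
    if not _REPO_RE.fullmatch(repo):
        raise ValueError(f"unexpected repo name: {repo!r}")
    # An empty comment ID is allowed (no triggering comment); anything else
    # must be digits only.
    if comment_id and not _ID_RE.fullmatch(comment_id):
        raise ValueError(f"comment id must be numeric: {comment_id!r}")
    return "\n".join(
        [
            f"REPO: {repo}",
            f"PR NUMBER: {int(pr_number)}",
            f"COMMENT ID: {comment_id}",
        ]
    )
```

The comment body never appears as an argument here; under this approach the model fetches it at runtime, so untrusted text is never spliced into the prompt template.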
Force-pushed from 47b93ac to ae1fb3f.
Code Review: PR #797

Summary
Well-scoped, well-documented CI fix that migrates …

Issues Found
🔴 Critical: Temporary …
Claude's own review of this PR caught it: on issues.opened events, github.event.comment.id is empty, so the prompt's gh api .../issues/comments/ call becomes a malformed URL. Add an explicit COMMENT ID field to the prompt and instruct Claude to skip the comment fetch when it's empty.
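The guard described in this commit can be sketched in Python as follows. `comment_api_path` is a hypothetical helper used only to show the logic; in the workflow the same decision is expressed as a prompt instruction to Claude, not as code.

```python
from typing import Optional


def comment_api_path(repo: str, comment_id: str) -> Optional[str]:
    """Return the REST path for an issue comment, or None when there is none.

    On issues.opened events there is no triggering comment, so the ID is
    empty and no fetch should be attempted.
    """
    comment_id = comment_id.strip()
    if not comment_id:
        return None
    return f"repos/{repo}/issues/comments/{comment_id}"
```

Returning `None` rather than a path with a trailing slash makes the "no comment" case explicit, so the caller has to decide what to do instead of issuing a malformed request.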
Force-pushed from 14673b7 to ad99674.
@claude quick sanity check: does the …
Claude finished @kovtcharov's task

Short answer
It handles this correctly; no action needed. GitHub's …

Details
Looking at … The trigger is …
This is the documented behavior of the "Create a reply for a review comment" endpoint. It's specifically designed to be called with either a root or a reply, which is why the upstream …

One thing worth noting (🟢 minor, not a blocker)
The prompt reads "Post your reply as a threaded reply on the review comment". That is accurate for Claude's purposes, but someone reading the code later might worry about the reply-to-reply case you're asking about. A one-line comment near the …

    # /replies accepts any review comment ID (root or reply); GitHub threads it
    # into the parent conversation automatically. No need to resolve to thread root.

Purely docs; the functional behavior is correct as-is.

Separate nit I noticed while checking this
Line 407's …

Done
…validate v1.0.99" This reverts commit 18efdc0.
Code Review — PR #797 (follow-up on
Prior commits in this branch rewrote the prompt content (compressing
bullet lists into prose, dropping sections like 'For Bugs' / 'For
Feature Requests' / 'Limitations' / the full docs/ paths list) while
migrating from v0 inputs (direct_prompt + custom_instructions) to v1
`prompt`. Those content edits were scope creep — not required by the
API migration.
Restore the original direct_prompt and custom_instructions text
byte-for-byte inside the new `prompt:` input. The only additions are
the minimum automation-mode scaffolding that tag mode used to do for
us:
- context header (REPO / PR NUMBER / COMMENT ID) so Claude knows the
trigger identity without interpolating user-controlled text
- fetch instruction (gh api) — tag mode auto-injected the comment
body; automation mode doesn't, so Claude must fetch
- post instruction (gh pr/issue comment or gh api replies) — tag mode
auto-posted Claude's final message; automation mode doesn't
Verified with byte-equality check that old direct_prompt and
custom_instructions substrings are present verbatim in the new prompt
for all 4 jobs (pr-review, pr-comment, issue-handler, release-notes).
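
A byte-equality check of the kind mentioned above could look like the following Python sketch. The function name and the shape of its inputs are assumptions for illustration; the commit does not show the script that was actually used.

```python
from typing import Dict, List


def missing_fragments(new_prompt: str, old_fragments: Dict[str, str]) -> List[str]:
    """Return the names of old fragments not present verbatim in new_prompt.

    Comparison is done on UTF-8 bytes, so any whitespace or punctuation
    change in a fragment counts as a mismatch.
    """
    new_bytes = new_prompt.encode("utf-8")
    return [
        name
        for name, text in old_fragments.items()
        if text.encode("utf-8") not in new_bytes
    ]
```

An empty result for each of the four jobs would correspond to the "present verbatim" claim; any returned name points at the fragment that drifted.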
Summary
Clean, well-motivated CI migration: pins …

Issues Found

🟡 Important
🟡 Temporary … The current branch state still contains the test-only trigger that the PR description says was reverted. Commit …
Since the author has already marked these lines with … Revert lines 46-48: … (i.e. drop the … And revert the …

🟢 Minor
🟢 The … Example for …
🟢 Optional: scope … Top-level …

Strengths

Verdict
Request changes, only because the temporary …
…mpts" This reverts commit 90ed591.
PR #795 (large installer + agent export-import feature, many changed files) exceeded the 20-turn budget on pr-review before Claude could post its review comment. Run 24586335693 failed with `error_max_turns`, visible now that #797 removed the continue-on-error mask. issue-handler was already at 50 from #797. Matching pr-review and pr-comment for consistency — same failure mode, same fix. release-notes stays at 30 since release diffs are bounded to a tag range.
## Summary

PR #795 ([run 24586335693](https://github.com/amd/gaia/actions/runs/24586335693)) exceeded pr-review's `--max-turns 20` budget and failed with `error_max_turns`; no review comment was posted. That failure is visible (not silently swallowed) thanks to #797 removing the `continue-on-error` mask.

The fix is the same bump I applied to `issue-handler` during the v1 migration: 20 → 50. Matching `pr-comment` at the same time for consistency, since the same failure mode would apply on a large-diff PR conversation. `release-notes` stays at 30 since release diffs are bounded to a tag-to-tag range.

## Test plan

- [ ] After merge, re-run `pr-review` on PR #795 (close+reopen or push an empty commit) and confirm Claude completes the review within 50 turns and posts the comment
- [ ] Spot-check that the next 2-3 post-merge PRs don't regress to failures
## Summary

The FAQ's "Does GAIA send my data anywhere?" answer said a flat **No**, but that only holds when you're on the default Lemonade backend. If you configure the Claude API or OpenAI provider via settings, prompts leave the machine. Updating the lead to "Not by default" and adding a one-sentence caveat closes a minor-but-real accuracy gap.

## Test plan

- [ ] Skim the rendered FAQ section on the docs site after merge and confirm the caveat reads naturally in Mintlify rendering
- [ ] **Primary test purpose:** validate that `.github/workflows/claude.yml` (merged in #797) produces a real review comment on this PR via `pull_request_target`. This is the first post-merge validation that the v1.0.99 migration works end-to-end.
# GAIA v0.17.3 Release Notes

GAIA v0.17.3 is an extensibility and resilience release. You can now package your own agents into a custom GAIA installer and seed them on first launch, point GAIA at alternative OpenAI-compatible inference servers from the C++ library (Ollama, for example), and start from three new reference agents (weather, RAG Q&A, HTML mockup) that execute against real Lemonade hardware in CI. It also hardens the RAG cache against an insecure-deserialization class of bug (CWE-502); all users should upgrade.

**Why upgrade:**

- **Ship your own GAIA**: Export and import agents between machines, follow a new guide to produce a custom installer that seeds your agents on first launch, and on Windows install everything in one step because the installer now includes the Lemonade Server MSI.
- **Work with alternative inference backends**: The C++ library now preserves OpenAI-compatible `/v1` base URLs instead of rewriting them to `/api/v1`, so servers that expose the standard `/v1` path (Ollama, for example) work out of the box.
- **Start from a working example**: Three new reference agents (weather via MCP, RAG document Q&A, HTML landing-page generator) with integration tests that actually execute against Lemonade on a Strix CI runner.
- **Safer RAG cache**: Replaces `pickle` deserialization with JSON + HMAC-SHA256 (CWE-502). Unsigned or tampered caches are rejected and transparently rebuilt on the next query.
- **Better document handling**: Encrypted or corrupted PDFs now produce distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`) instead of generic failures, and the RAG index is hardened for concurrent queries.

---

## What's New

### Custom Installers and Agent Portability

You can now package a custom GAIA installer that ships with your own agents pre-loaded, and move agents between machines with export/import (PR #795).
On Windows, the official installer now includes the Lemonade Server MSI and runs it during install, so a fresh machine has the complete local-LLM stack after a single download (PR #781).

**What you can do:**

- Export an agent from `~/.gaia/agents/` to a portable bundle with `gaia agents export` and import it on another machine with `gaia agents import`
- Follow the new custom-installer playbook at [`docs/playbooks/custom-installer/index.mdx`](/playbooks/custom-installer) to distribute GAIA with your agents pre-loaded (useful for workshops, team deployments, and internal tooling)
- On Windows, the installer now includes Lemonade Server: no separate download for a complete first-run experience

**Under the hood:**

- `gaia agents export` / `gaia agents import` CLI commands round-trip agents between machines as portable bundles
- First-launch agent seeder (`src/gaia/apps/webui/services/agent-seeder.cjs`) copies `<resourcesPath>/agents/<id>/` into `~/.gaia/agents/<id>/` the first time the app starts
- Windows NSIS installer embeds `lemonade-server-minimal.msi` into `$PLUGINSDIR` and runs it via `msiexec /i ... /qn /norestart` during install (auto-cleaned on exit)

---

### Broader Backend Compatibility in the C++ Library

The C++ library now preserves OpenAI-compatible `/v1` base URLs (PR #773) instead of rewriting them to `/api/v1`. That means inference servers that expose the standard OpenAI `/v1` path (for example, Ollama at `http://localhost:11434/v1`) work out of the box without needing a special adapter.

---

### Reference Agents and Real-Hardware Integration Tests

Three new example agents and a Strix-runner CI workflow land together (PR #340).
**What you can do:**

- Copy `examples/weather_agent.py`, `examples/rag_doc_agent.py`, or `examples/product_mockup_agent.py` as a starting point for your own agents
- Run the new integration tests locally against Lemonade to validate agents end-to-end, not just structurally

**Under the hood:**

- `tests/integration/test_example_agents.py` executes agents and validates responses with a 5-minute-per-test timeout
- `.github/workflows/test_examples.yml` runs on the self-hosted Strix runner (`stx` label) with Lemonade serving `Qwen3-4B-Instruct-2507-GGUF`
- Docs homepage refreshed with a technical value prop ("Agent SDK for AMD Ryzen AI") and MCP / CUA added to the capabilities list

---

### Smarter PDF Handling in RAG

Encrypted and corrupted PDFs now surface as distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`, `EmptyPDFError`) instead of generic failures or silent 0-chunk indexes (PR #784, closes #451). Encrypted PDFs are detected before extraction; corrupted PDFs are caught during extraction with a clear message. Combined with the indexing-failure surfacing in PR #723, you get a visible indexing-failed status the moment a document fails, and the RAG index itself is now thread-safe under concurrent queries (PR #746).

---

## Security

### RAG Cache Deserialization Replaced with JSON + HMAC

Fixes an insecure-deserialization issue in the RAG cache (CWE-502, PR #768). Previously, cached document indexes were serialized with Python `pickle`; if an attacker could write to `~/.gaia/` (via a shared drive, a sync conflict, or a malicious extension), loading that cache could execute arbitrary code. v0.17.3 replaces `pickle` with signed JSON: caches are now serialized as JSON and authenticated with HMAC-SHA256 using a per-install key stored at `~/.gaia/cache/hmac.key`. Unsigned or tampered caches are rejected and transparently rebuilt on the next query. Old `.pkl` caches from previous GAIA versions are ignored and re-indexed the next time you query a document.
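
The signed-JSON idea can be illustrated with a short Python sketch using only the standard library. The on-disk layout (hex tag, newline, JSON body) and the function names are assumptions made for this example; GAIA's actual cache format is not shown in these notes and may differ.

```python
import hashlib
import hmac
import json
from typing import Any, Optional


def dump_signed(payload: dict, key: bytes) -> bytes:
    """Serialize payload as JSON and prepend an HMAC-SHA256 tag (hex) on its own line."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"\n" + body


def load_signed(blob: bytes, key: bytes) -> Optional[Any]:
    """Return the payload, or None if the blob is unsigned or fails verification."""
    tag, sep, body = blob.partition(b"\n")
    if not sep:
        return None  # no tag line at all: treat as unsigned
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(body)
```

A caller would treat `None` as a cache miss and rebuild the index. Because the body is only parsed as JSON after the tag verifies, a file planted by someone without the key is discarded and never executed.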
**You should upgrade if you** share `~/.gaia/` across machines (Dropbox, iCloud, network home directories), run GAIA in a multi-user environment, or have ever imported RAG caches from another source.

---

## Bug Fixes

- **Ask Agent attaches files before sending to chat** (PR #725): Dropped files are indexed into RAG and attached to the active session before the prompt is consumed, so the model sees the document on the first turn instead of the second.
- **Document indexing failures are surfaced** (PR #723): A document that produces 0 chunks now raises `RuntimeError` in the SDK and surfaces as `indexing_status: failed` in the UI, instead of looking like a silent success. Covers RAG SDK, background indexing, and re-index paths.
- **Encrypted or corrupted PDFs produce actionable errors** (PR #784, closes #451): RAG now raises distinct `EncryptedPDFError` and `CorruptedPDFError` exceptions instead of generic failures, so you see exactly what went wrong.
- **RAG index thread safety hardened** (PR #746): Adds `RLock` protection around index mutation paths and rebuilds chunk/index state atomically before publishing it, so concurrent queries read consistent snapshots and failed rebuilds no longer leak partial state.
- **MCP JSON-RPC handler guards against non-dict bodies** (PR #803): A malformed JSON-RPC payload (array, string, null) now returns HTTP 400 `Invalid Request: expected JSON object` instead of an HTTP 500 from a `TypeError`.
- **File-search count aligned with accessible results** (PR #754): The returned count now matches the number of files the tool actually surfaces, instead of a pre-filter total that over-reported results the caller could not access.
- **Tracked block cursor replaces misplaced decorative cursor** (PR #727): Fixes the mis-positioned blinking cursor in the chat input box, which now tracks the actual caret position via a mirror-div technique.
- **Ad-hoc sign the macOS app bundle instead of skipping code signing** (PR #765): The `.app` bundle inside the DMG now carries an ad-hoc signature, so Gatekeeper presents a single "Open Anyway" bypass in System Settings instead of the unrecoverable "is damaged" error. Full Apple Developer ID signing is still being finalized.

---

## Release & CI

- **Publish workflow: single approval gate, no legacy Electron apps** (PR #758): Removed the legacy jira and example standalone Electron apps from the publish pipeline; a single `publish` environment gate governs PyPI, npm, and installer publishing.
- **Claude CI modernization** (PR #797, PR #799, PR #783): Migrated all four `claude-code-action` call sites to `v1.0.99` (pinned by SHA, fixes an issue-handler hang), bumped `--max-turns` from 20 to 50 on both `pr-review` and `pr-comment` for deeper analysis, upgraded to Opus 4.7, standardized 23 subagent definitions with explicit when-to-use sections and tool allowlists, and added agent-builder tooling (manifest schema, `lint.py --agents`, BuilderAgent mixins).

---

## Docs

- **Roadmap overhaul** (PR #710): Milestone-aligned plans with voice-first as P0 and 9 new plan documents for upcoming initiatives.
- **Plan: email triage agent** (PR #796): Specification for an upcoming email triage agent.
- **Docs/source drift resolved** (PR #794): Fixed broken SDK examples across 15 docs, rewrote 5 spec files against the current source (including two that documented entire APIs that don't exist in code), added 20+ missing CLI flags to the CLI reference, and removed 2 already-shipped plan documents (installer, mcp-client).
- **FAQ: data-privacy answer clarified for external LLM providers** (PR #798): Sharper guidance on what leaves your machine when you point GAIA at Claude or OpenAI.
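
For readers who want a concrete picture of the MCP JSON-RPC guard listed under Bug Fixes (PR #803), here is a minimal Python sketch. `parse_jsonrpc_body` is a hypothetical function, and the wording of the invalid-JSON error is an assumption; only the `Invalid Request: expected JSON object` message comes from the notes above.

```python
import json
from typing import Any, Dict, Tuple


def parse_jsonrpc_body(raw: bytes) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, body) for a raw request payload.

    Payloads that decode to something other than a JSON object (array,
    string, null) are rejected with 400 up front, so later code that does
    payload.get(...) or payload["method"] never hits a TypeError.
    """
    try:
        payload = json.loads(raw)
    except ValueError:
        # Assumed message; covers malformed JSON and undecodable bytes.
        return 400, {"error": "Invalid Request: body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "Invalid Request: expected JSON object"}
    return 200, payload
```

Checking the type once at the boundary keeps every downstream handler free of defensive `isinstance` checks.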
---

## Full Changelog

**21 commits** since v0.17.2:

- `6d3f3f71` - fix: replace misplaced decorative cursor with tracked terminal block cursor (#727)
- `874cf2a3` - fix: Ask Agent indexes and attaches files before sending to chat (#725)
- `4fa121e2` - fix: surface document indexing failures instead of silent 0-chunk success (#723)
- `34b1d06e` - fix(ci): ad-hoc sign macOS DMG instead of skipping code signing (#765)
- `7188b83c` - Roadmap overhaul: milestone-aligned plans with voice-first P0 and 9 new plan documents (#710)
- `1beddac5` - cpp: support Ollama-compatible /v1 endpoints (#773)
- `cf9ac995` - fix: harden rag index thread safety (#746)
- `1c55c31b` - fix(ci): remove legacy electron apps from publish, single approval gate (#758)
- `52946a7a` - feat(installer): bundle Lemonade Server MSI into Windows installer (#774) (#781)
- `e96b3686` - ci(claude): review infra + conventions + subagent overhaul + agent-builder tooling (#783)
- `058674b5` - fix(rag): detect encrypted and corrupted PDFs with actionable errors (#451) (#784)
- `7bcb5d51` - fix: replace insecure pickle deserialization with JSON + HMAC in RAG cache (CWE-502) (#768)
- `a5167e5f` - fix: keep file-search count aligned with accessible results (#754)
- `da5ba458` - ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang (#797)
- `03f546b9` - ci(claude): bump pr-review and pr-comment --max-turns 20 -> 50 (#799)
- `4119d564` - docs(faq): clarify data-privacy answer re: external LLM providers (#798)
- `0cfbcf41` - Add example agents and integration test workflow (#340)
- `c4bd15fb` - docs: fix drift between docs and source (docs review pass 1 + 2) (#794)
- `407ed5b8` - docs(plans): add email triage agent spec (#796)
- `06fb04a4` - fix(mcp): guard JSON-RPC handler against non-dict body (#803)
- `880ad603` - feat(installer): custom installer guide, agent export/import, first-launch seeder (#795)

Full Changelog: [v0.17.2...v0.17.3](v0.17.2...v0.17.3)

---

## Release checklist

- [x] `util/validate_release_notes.py docs/releases/v0.17.3.mdx --tag v0.17.3` passes
- [x] `src/gaia/version.py` → `0.17.3`
- [x] `src/gaia/apps/webui/package.json` → `0.17.3`
- [x] Navbar label in `docs/docs.json` → `v0.17.3 · Lemonade 10.0.0`
- [x] All 21 PRs in the range (v0.17.2..HEAD) are represented in the notes
- [ ] Review from @kovtcharov-amd addressed
Known bug (anthropics/claude-code-action#1206): the OIDC-to-app-token exchange returns 401 on @v1. Pinning to v1.0.99 per the referenced workaround in amd/gaia#797.
Why

Two real, verified-from-logs problems with the current Claude Code setup:

1. PR reviews haven't fired since #783 merged. The workflow's `@beta` pin points to a 2025-08-22 SHA that predates `pull_request_target` support (merged in anthropics/claude-code-action#579 on 2025-09-22) and Opus 4.7 support (fixed in v1.0.98). The action's Prepare step rejects `pull_request_target` with `Unsupported event type`, but `continue-on-error: true` was hiding the failure as a "success" conclusion. Run 24580730832 on PR #795 is the concrete example.
2. `@claude` mentions post a TODO checklist and never update it with findings. In v0 tag mode, large `custom_instructions` + `max_turns: 30` exhaust the turn budget before Claude reaches the final comment-update step. Visible on run 24581846289 for PR #795.

What changed
- … `c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3`).
- … `direct_prompt` + `custom_instructions` into a single `prompt` per job; moved `model` / `max_turns` into `claude_args`.
- … `prompt` input), not tag mode, to work around anthropics/claude-code-action#1223 (tag-mode `--model` silently ignored) and to fix the unchecked-TODO behavior.
- `--max-turns` bumped for `issue-handler` from 30 to 50.
- `continue-on-error: true` removed from the 3 Claude action steps.
- … `pr-comment` / `issue-handler`: comment bodies fetched via `gh api` at runtime instead of interpolated from `github.event.comment.body`.

Validation (completed on this PR via temporary `pull_request` trigger)

Tested end-to-end on PR #797 itself by temporarily adding a `pull_request` trigger (since `pull_request_target` uses `main`'s workflow, not the PR-head's). That trigger is now reverted; final diff is migration-only.

| Job | Trigger |
| --- | --- |
| `pr-review` | `pull_request` (synchronize) |
| `issue-handler` | `issue_comment` (@claude mention) |

Claude's own review of this PR caught a real bug in the `issue-handler` prompt: on `issues.opened` events `github.event.comment.id` is empty, so the prompt's `gh api .../issues/comments/` URL would 404. Fixed in ad99674 by adding an explicit `COMMENT ID` field and instructing Claude to skip the comment fetch when empty. Dogfooding worked.

Still unvalidated end-to-end (structural validation only):

- `pr-comment` (`pull_request_review_comment`): uses the same automation-mode pattern as the two validated paths
- `release-notes` (`workflow_run`): only fires on `Publish Release` completion; will self-validate on the next release

Commits
4 commits in the branch. Net diff = migration + issue-handler fix. Recommend squash-merge to collapse to one clean commit.

1. ae1fb3f: the v1 migration (main change)
2. 18efdc0: TEST ONLY: add `pull_request` trigger (for validation)
3. ad99674: fix from Claude's own review: gate comment fetch on COMMENT ID
4. c7cbc59: revert of (2)

Vetting v1.0.99 against open upstream bugs
v1 is not bug-free. Assessment per issue:

- … `CLAUDE.md`, `.claude/`)
- … `GITHUB_TOKEN`, not a GitHub App installation
- `gh` CLI unavailable on default-branch PRs: … `main`. Mitigated by our pre-generated `pr-diff.txt` / `pr-files.txt`
- `--model` ignored in tag mode …
- `execution_file` not written on max_turns …

Rollback
Single workflow file, isolated change. `git revert` restores `@beta`, which restores the silent-failure state, not a known-good state. If a real merge-time regression appears, fix forward.