From 0f1ada86f19487b3265a7650dbf51134d3c9eed4 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Wed, 13 May 2026 12:59:55 -0700 Subject: [PATCH 1/8] ci: bump AI-review caller pins to ai-review-prompts@d446b4c6 (debug iteration) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Symmetric pin across `claude-review.yml` and `gemini-review.yml`, bumped from `128656e4` → `d446b4c6` (skipping the intermediate `7c8aae0a` since `d446b4c6` includes everything from #30/#31 PLUS the debug-iteration changes from #32). ## What `gemini-review.yml` picks up (substantive) The MCP-based agentic Gemini reviewer (#30) + debug iteration settings (#32): - Architecture: workflow-posts-single-shot → MCP-based agentic tool use (Docker'd `github-mcp-server`, inline comments, `pull_request_review_write` etc.) - Custom `/harper-review` slash command composed from our layered review scope - Severity tagging (`🔴` Critical / `🟠` High only) - Debug iteration (this PR's new content): - `gemini_debug: 'true'` — CLI stdout/stderr streamed inline in the workflow log so MCP startup and tool registration are visible - `tools.core` allowlist REMOVED — opens shell surface so we can observe what the agent actually wants to call (the first trial of #30 hit "Tool execution denied by policy" on `run_shell_command` without surfacing which commands were attempted) - `maxSessionTurns: 15` → `30` — headroom while iterating Patterns adopted from Google's official PR review example for `run-gemini-cli`; reimplemented around our auth gate, layered scope, and log-issue threading. See the upstream `_gemini-review.yml` header comment for the upstream-vs-ours diff. ## What `claude-review.yml` picks up (no functional change) Just stays in lockstep on the SHA. The Gemini-side changes don't affect Claude: shared `find-prior-review-comment.sh` and `log-review-to-ai-review-log.sh` continue to default through their pre-Gemini code paths. 
Also inherits the validator hardening (#31): Docker image references in reusable workflows are now lint-required to use `@sha256:` digest pinning. Claude's caller has none, so it's a no-op for Claude — but documents the discipline going forward. ## This PR's history Originally opened as a bump from `128656e4` → `7c8aae0a` (the post-#30 pin). The initial trial-run failed at the gemini-cli step with three compounding issues (MCP tools never registered, shell commands denied, maxSessionTurns hit). Branch is now amended forward to `d446b4c6` which includes the debug iteration that should surface root causes on the next run. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/claude-review.yml | 4 ++-- .github/workflows/gemini-review.yml | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml index ce6f4a7..c363586 100644 --- a/.github/workflows/claude-review.yml +++ b/.github/workflows/claude-review.yml @@ -24,7 +24,7 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@128656e40c87c0e1293c542a5500df4f68dbff85 # main 2026-05-12 (post #25 — symmetric pin with gemini-review.yml; picks up shared-script refactor and authorize-ai-workflow.sh rename) + uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@d446b4c68bb02068ede471ad5ac34c9af380bdfb # main 2026-05-13 (post #30/#31/#32 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) with: # Same SHA as the `uses:` ref above. The reusable uses this to # check out HarperFast/ai-review-prompts (layer files + bash @@ -35,7 +35,7 @@ jobs: # introspect their own ref (`github.workflow_ref` resolves to the # CALLER's ref in `workflow_call` context), and `uses: …@` # is parsed literally so we can't interpolate a variable. 
- ai-review-prompts-ref: 128656e40c87c0e1293c542a5500df4f68dbff85 + ai-review-prompts-ref: d446b4c68bb02068ede471ad5ac34c9af380bdfb review-layers: | universal harper/common diff --git a/.github/workflows/gemini-review.yml b/.github/workflows/gemini-review.yml index 936a315..4e074a9 100644 --- a/.github/workflows/gemini-review.yml +++ b/.github/workflows/gemini-review.yml @@ -34,13 +34,13 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@128656e40c87c0e1293c542a5500df4f68dbff85 # main 2026-05-12 (post #25 — workflow posts Gemini response, output-name fix, default model gemini-3-flash-preview) + uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@d446b4c68bb02068ede471ad5ac34c9af380bdfb # main 2026-05-13 (post #30/#31/#32 — MCP pivot + debug-iteration: gemini_debug=true, no tools.core allowlist, maxSessionTurns=30 to surface what's happening with MCP registration and shell denials) with: # Same SHA as the `uses:` ref above. See claude-review.yml # in this repo for why the duplication is unavoidable # (reusable workflows can't introspect their own ref in # workflow_call context). - ai-review-prompts-ref: 128656e40c87c0e1293c542a5500df4f68dbff85 + ai-review-prompts-ref: d446b4c68bb02068ede471ad5ac34c9af380bdfb review-layers: | universal harper/common From 85f25e059f70c40331ecd0e5b7a7ac5a3bba0f07 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Wed, 13 May 2026 16:05:00 -0700 Subject: [PATCH 2/8] ci: bump ai-review-prompts pin to fa9ba10 (post #33) Atomic single-call submission for Gemini (`pull_request_review_write` with `method=create, event=COMMENT`) replaces the multi-step pending-review flow that hit submission races in the previous iteration. Prompt also tightens scope discipline so the agent stops reviewing source files outside the diff. Claude has no functional change from #33; pin moves to stay symmetric with the Gemini pin within this repo. 
Re-using this PR as the comparison-test surface for one more iteration. If the run is clean, next bump disables `gemini_debug` and finalizes the settings. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/claude-review.yml | 4 ++-- .github/workflows/gemini-review.yml | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml index c363586..d2ac174 100644 --- a/.github/workflows/claude-review.yml +++ b/.github/workflows/claude-review.yml @@ -24,7 +24,7 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@d446b4c68bb02068ede471ad5ac34c9af380bdfb # main 2026-05-13 (post #30/#31/#32 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) + uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@fa9ba10643037ef9d42b2a6b954617cd411317b3 # main 2026-05-13 (post #30/#31/#32/#33 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) with: # Same SHA as the `uses:` ref above. The reusable uses this to # check out HarperFast/ai-review-prompts (layer files + bash @@ -35,7 +35,7 @@ jobs: # introspect their own ref (`github.workflow_ref` resolves to the # CALLER's ref in `workflow_call` context), and `uses: …@` # is parsed literally so we can't interpolate a variable. 
- ai-review-prompts-ref: d446b4c68bb02068ede471ad5ac34c9af380bdfb + ai-review-prompts-ref: fa9ba10643037ef9d42b2a6b954617cd411317b3 review-layers: | universal harper/common diff --git a/.github/workflows/gemini-review.yml b/.github/workflows/gemini-review.yml index 4e074a9..50a9a73 100644 --- a/.github/workflows/gemini-review.yml +++ b/.github/workflows/gemini-review.yml @@ -34,13 +34,13 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@d446b4c68bb02068ede471ad5ac34c9af380bdfb # main 2026-05-13 (post #30/#31/#32 — MCP pivot + debug-iteration: gemini_debug=true, no tools.core allowlist, maxSessionTurns=30 to surface what's happening with MCP registration and shell denials) + uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@fa9ba10643037ef9d42b2a6b954617cd411317b3 # main 2026-05-13 (post #30/#31/#32/#33 — atomic submission via pull_request_review_write create+COMMENT; strict scope-to-diff prompt; minimal tools.core allowlist back; maxSessionTurns=20; gemini_debug=true kept for one more verification cycle) with: # Same SHA as the `uses:` ref above. See claude-review.yml # in this repo for why the duplication is unavoidable # (reusable workflows can't introspect their own ref in # workflow_call context). - ai-review-prompts-ref: d446b4c68bb02068ede471ad5ac34c9af380bdfb + ai-review-prompts-ref: fa9ba10643037ef9d42b2a6b954617cd411317b3 review-layers: | universal harper/common From 927e4d663e1e7f45c8bd3e973517cdc2ced62d1a Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Thu, 14 May 2026 06:28:38 -0700 Subject: [PATCH 3/8] ci: bump ai-review-prompts pin to 832d8e6 (post #34) #34 fixes the post-#33 turn-exhaustion: passes PR-context env vars (PULL_REQUEST_NUMBER, REPOSITORY, ISSUE_TITLE, ISSUE_BODY) so the agent doesn't burn turns running `printenv` / `env | grep` / `git remote -v` trying to figure out which PR it's reviewing. 
Also widens tools.core to match upstream (`head`/`tail` added) so denial retries stop eating the turn budget. Claude has no functional change from #34; pin moves to stay symmetric with the Gemini pin within this repo. Continuing to use this PR as the comparison-test surface for one more iteration. If clean, next bump disables `gemini_debug`. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/claude-review.yml | 4 ++-- .github/workflows/gemini-review.yml | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml index d2ac174..4bb69fb 100644 --- a/.github/workflows/claude-review.yml +++ b/.github/workflows/claude-review.yml @@ -24,7 +24,7 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@fa9ba10643037ef9d42b2a6b954617cd411317b3 # main 2026-05-13 (post #30/#31/#32/#33 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) + uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@832d8e6bb1ab2068585ded4288797ac191b21360 # main 2026-05-14 (post #30..#34 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) with: # Same SHA as the `uses:` ref above. The reusable uses this to # check out HarperFast/ai-review-prompts (layer files + bash @@ -35,7 +35,7 @@ jobs: # introspect their own ref (`github.workflow_ref` resolves to the # CALLER's ref in `workflow_call` context), and `uses: …@` # is parsed literally so we can't interpolate a variable. 
- ai-review-prompts-ref: fa9ba10643037ef9d42b2a6b954617cd411317b3 + ai-review-prompts-ref: 832d8e6bb1ab2068585ded4288797ac191b21360 review-layers: | universal harper/common diff --git a/.github/workflows/gemini-review.yml b/.github/workflows/gemini-review.yml index 50a9a73..b83f7d3 100644 --- a/.github/workflows/gemini-review.yml +++ b/.github/workflows/gemini-review.yml @@ -34,13 +34,13 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@fa9ba10643037ef9d42b2a6b954617cd411317b3 # main 2026-05-13 (post #30/#31/#32/#33 — atomic submission via pull_request_review_write create+COMMENT; strict scope-to-diff prompt; minimal tools.core allowlist back; maxSessionTurns=20; gemini_debug=true kept for one more verification cycle) + uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@832d8e6bb1ab2068585ded4288797ac191b21360 # main 2026-05-14 (post #30..#34 — atomic submission + scope-to-diff + PR-context env vars (PULL_REQUEST_NUMBER/REPOSITORY/ISSUE_TITLE/ISSUE_BODY) so the agent stops thrashing on context discovery; tools.core widened to match upstream (cat/echo/git/grep/head/tail); gemini_debug=true for one more verification cycle) with: # Same SHA as the `uses:` ref above. See claude-review.yml # in this repo for why the duplication is unavoidable # (reusable workflows can't introspect their own ref in # workflow_call context). 
- ai-review-prompts-ref: fa9ba10643037ef9d42b2a6b954617cd411317b3 + ai-review-prompts-ref: 832d8e6bb1ab2068585ded4288797ac191b21360 review-layers: | universal harper/common From 6ab4594bbdae429030eb68408b94cd02ebb876ea Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Thu, 14 May 2026 08:06:20 -0700 Subject: [PATCH 4/8] ci: bump ai-review-prompts pin to e1f8d43 (post #35) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit #35 pivots Gemini from our custom `/harper-review` slash command to the upstream `/pr-code-review` from the gemini-cli-extensions/code-review extension. The custom command was fighting the model's training: agent reverted to shell-based PR exploration (printenv, gh pr view, ls -R, npm list — all denied, all retried) and never reached MCP. Upstream's slash command is the pattern the model was trained on. Harper-specific scope (layered reviews, severity discipline, trivial-diff short-circuit, marker requirement) is now layered in via the ADDITIONAL_CONTEXT env var, which the upstream prompt substitutes at render time. Claude has no functional change from #35; pin moves to stay symmetric with the Gemini pin within this repo. Continuing to use this PR as the comparison-test surface. If clean, next bump disables gemini_debug and removes the now-unused legacy prompt template. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/claude-review.yml | 4 ++-- .github/workflows/gemini-review.yml | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml index 4bb69fb..680dfc5 100644 --- a/.github/workflows/claude-review.yml +++ b/.github/workflows/claude-review.yml @@ -24,7 +24,7 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@832d8e6bb1ab2068585ded4288797ac191b21360 # main 2026-05-14 (post #30..#34 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) + uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@e1f8d434aafbd71943649ba7a8c6727638a1f5a3 # main 2026-05-14 (post #30..#35 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) with: # Same SHA as the `uses:` ref above. The reusable uses this to # check out HarperFast/ai-review-prompts (layer files + bash @@ -35,7 +35,7 @@ jobs: # introspect their own ref (`github.workflow_ref` resolves to the # CALLER's ref in `workflow_call` context), and `uses: …@` # is parsed literally so we can't interpolate a variable. 
- ai-review-prompts-ref: 832d8e6bb1ab2068585ded4288797ac191b21360 + ai-review-prompts-ref: e1f8d434aafbd71943649ba7a8c6727638a1f5a3 review-layers: | universal harper/common diff --git a/.github/workflows/gemini-review.yml b/.github/workflows/gemini-review.yml index b83f7d3..6d8328d 100644 --- a/.github/workflows/gemini-review.yml +++ b/.github/workflows/gemini-review.yml @@ -34,13 +34,13 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@832d8e6bb1ab2068585ded4288797ac191b21360 # main 2026-05-14 (post #30..#34 — atomic submission + scope-to-diff + PR-context env vars (PULL_REQUEST_NUMBER/REPOSITORY/ISSUE_TITLE/ISSUE_BODY) so the agent stops thrashing on context discovery; tools.core widened to match upstream (cat/echo/git/grep/head/tail); gemini_debug=true for one more verification cycle) + uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@e1f8d434aafbd71943649ba7a8c6727638a1f5a3 # main 2026-05-14 (post #30..#35 — pivot to upstream /pr-code-review slash command; Harper scope layered via ADDITIONAL_CONTEXT env var; tools.core matches upstream exactly (cat/echo/grep/head/tail, no git); gemini_debug=true for one more verification cycle) with: # Same SHA as the `uses:` ref above. See claude-review.yml # in this repo for why the duplication is unavoidable # (reusable workflows can't introspect their own ref in # workflow_call context). - ai-review-prompts-ref: 832d8e6bb1ab2068585ded4288797ac191b21360 + ai-review-prompts-ref: e1f8d434aafbd71943649ba7a8c6727638a1f5a3 review-layers: | universal harper/common From 15cc49d18754fd207fdc4e85a18ce620e41a272a Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Thu, 14 May 2026 08:49:45 -0700 Subject: [PATCH 5/8] ci: bump ai-review-prompts pin to 045c81b (post #36 visibility fix) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit #36 adds caller-side `if: failure()` artifact upload + job-summary tail. 
The previous run-attempt failed at gemini-cli but produced no artifacts because upstream's run-gemini-cli@v0.1.22 doesn't gate its inner upload step with `if: always()` — composite-action failure semantics short-circuit it. Purpose of THIS run-attempt: capture the full post-#35 agent trace so we can finish diagnosing the actual failure mode (the prior attempt's gh-CLI log truncated to the first ~5 seconds out of ~38). No functional change to the Gemini run itself. Claude has no functional change from #36; pin moves to stay symmetric with the Gemini pin within this repo. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/claude-review.yml | 4 ++-- .github/workflows/gemini-review.yml | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml index 680dfc5..8284a35 100644 --- a/.github/workflows/claude-review.yml +++ b/.github/workflows/claude-review.yml @@ -24,7 +24,7 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@e1f8d434aafbd71943649ba7a8c6727638a1f5a3 # main 2026-05-14 (post #30..#35 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) + uses: HarperFast/ai-review-prompts/.github/workflows/_claude-review.yml@045c81bf2c2b5b5b4a520fa6b2de137f86fcbc2a # main 2026-05-14 (post #30..#36 — symmetric pin with gemini-review.yml; Claude has no functional change from any of these; pin just stays in lockstep) with: # Same SHA as the `uses:` ref above. The reusable uses this to # check out HarperFast/ai-review-prompts (layer files + bash @@ -35,7 +35,7 @@ jobs: # introspect their own ref (`github.workflow_ref` resolves to the # CALLER's ref in `workflow_call` context), and `uses: …@` # is parsed literally so we can't interpolate a variable. 
- ai-review-prompts-ref: e1f8d434aafbd71943649ba7a8c6727638a1f5a3 + ai-review-prompts-ref: 045c81bf2c2b5b5b4a520fa6b2de137f86fcbc2a review-layers: | universal harper/common diff --git a/.github/workflows/gemini-review.yml b/.github/workflows/gemini-review.yml index 6d8328d..53c7c1a 100644 --- a/.github/workflows/gemini-review.yml +++ b/.github/workflows/gemini-review.yml @@ -34,13 +34,13 @@ concurrency: jobs: review: - uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@e1f8d434aafbd71943649ba7a8c6727638a1f5a3 # main 2026-05-14 (post #30..#35 — pivot to upstream /pr-code-review slash command; Harper scope layered via ADDITIONAL_CONTEXT env var; tools.core matches upstream exactly (cat/echo/grep/head/tail, no git); gemini_debug=true for one more verification cycle) + uses: HarperFast/ai-review-prompts/.github/workflows/_gemini-review.yml@045c81bf2c2b5b5b4a520fa6b2de137f86fcbc2a # main 2026-05-14 (post #30..#36 — visibility fix: caller-side if-failure upload of gemini-artifacts (works around upstream's missing if:always() on its own upload step) + job-summary tail; this run-attempt's purpose is to capture the full post-#35 agent trace) with: # Same SHA as the `uses:` ref above. See claude-review.yml # in this repo for why the duplication is unavoidable # (reusable workflows can't introspect their own ref in # workflow_call context). 
- ai-review-prompts-ref: e1f8d434aafbd71943649ba7a8c6727638a1f5a3 + ai-review-prompts-ref: 045c81bf2c2b5b5b4a520fa6b2de137f86fcbc2a review-layers: | universal harper/common From 447449622b186b8078a84df058f2d6d64285c758 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Thu, 14 May 2026 10:20:53 -0700 Subject: [PATCH 6/8] ci: add side-by-side gemini-review-debug.yml for MCP isolation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Throwaway debug workflow to isolate why MCP tools never propagate to the model's function-call surface in the production gemini-review.yml (despite the github-mcp-server starting fine and registering session- side). Telemetry on the prior runs proves it definitively: every tool call has `tool_type: "native"`; zero `tool_type: "mcp"`. Hypothesis under test --------------------- The production caller pins `ghcr.io/github/github-mcp-server@sha256: e3816a4...` which is v1.0.4 (released 2026-05-11). v1.0.0 (2026-04-16) was a major-version bump that upgraded modelcontextprotocol/go-sdk to v1.5.0 and reorganized toolsets, introducing feature-flagged tool groups gated by `dynamicToolsets` (visible in our server logs as `dynamicToolsets=false`). Upstream's example pins to `:v0.27.0` (Feb 2026 era), predating that reorganization. Our bump to v1.0.x was Harper-side, well-intended ("newer is better"), but appears to be the regression. This workflow tests the simplest variable: same gemini-cli, same action, same prompt, same model — only the MCP server version differs. If MCP propagates with `:v0.27.0`, the root cause is the v1.0.x server. Approach -------- Mirrors upstream's tip-of-main pr-review example as closely as practical, with minimal Harper-specific deviations: - Auth: skip App-token-mint step. Default GITHUB_TOKEN only. Single- repo debug branch; only org members push. Backport will reintroduce the gate via the reusable. - Model: hardcoded gemini-3-flash-preview (our known-good choice). 
- Layered Harper scope: NOT applied. ADDITIONAL_CONTEXT is empty. Prove MCP propagates first; reintroduce scope after. - tools.core: [] (empty) — forces all I/O through MCP. Cleanest signal: if MCP is broken, the agent has nothing. - MCP server pin: TAG `:v0.27.0` instead of `@sha256:...` digest. SHA-pinning discipline waived for the spike; will digest-pin on backport. - Visibility: failure-path artifact upload + summary tail (same as _gemini-review.yml post-#36, with the pipefail/SIGPIPE bug fixed via `set -eu` instead of `set -euo pipefail` plus `|| true` around the offending pipe). Iteration model --------------- Edit this file → push → wait ~90s. No PR ceremony on ai-review-prompts. Runs in parallel with the production caller via a distinct concurrency group, so we can A/B compare side-by-side on the same PR. When the working config is identified, backport to `HarperFast/ai-review-prompts/_gemini-review.yml` in one focused PR. Delete this file after backport. --- .github/workflows/gemini-review-debug.yml | 201 ++++++++++++++++++++++ 1 file changed, 201 insertions(+) create mode 100644 .github/workflows/gemini-review-debug.yml diff --git a/.github/workflows/gemini-review-debug.yml b/.github/workflows/gemini-review-debug.yml new file mode 100644 index 0000000..bb5de1b --- /dev/null +++ b/.github/workflows/gemini-review-debug.yml @@ -0,0 +1,201 @@ +name: Gemini PR Review (debug) + +# Single-repo debug spike to isolate why MCP tools never propagate to +# the model's function-call surface in our production `gemini-review.yml` +# (despite the github-mcp-server starting fine and registering session- +# side). This workflow runs in PARALLEL with the production caller — +# different concurrency group, no shared steps. +# +# Hypothesis under test: +# Our production caller pins `ghcr.io/github/github-mcp-server@sha256: +# e3816a4...` which is the v1.0.4 image (released 2026-05-11). 
v1.0.0 +# (2026-04-16) bumped the modelcontextprotocol/go-sdk to v1.5.0 and +# reorganized toolsets, introducing feature-flagged tool groups gated +# by `dynamicToolsets` (visible in our server logs as +# `dynamicToolsets=false`). Upstream's example pins to `:v0.27.0` +# (2026-02-03 era) which predates that reorganization. The bump to +# v1.0.x was Harper-side and well-intended but likely the regression. +# +# This workflow mirrors upstream's tip-of-main example as closely as +# practical: +# https://github.com/google-github-actions/run-gemini-cli/tree/main/examples/workflows/pr-review +# +# Deviations from upstream main, kept minimal: +# - Auth: skip the App-token-mint step entirely. Use the workflow's +# default GITHUB_TOKEN. This loses the auth gate but we're on a +# single-repo debug branch — only Nathan / org members push here. +# Backport will reintroduce the gate via the existing reusable. +# - Model: hardcoded `gemini-3-flash-preview` (no `vars.GEMINI_MODEL`). +# - Layered Harper scope: NOT applied. Goal is to prove MCP propagates +# before reintroducing scope complexity. ADDITIONAL_CONTEXT is empty. +# - Visibility: same failure-path artifact upload + summary-tail steps +# we added to the reusable in #36 (with the pipefail / SIGPIPE bug +# fixed). Upstream's inner upload step still doesn't gate on +# `if: always()` so we need our own failure-path upload. +# +# Iteration model: +# Edit this file → push → wait ~90s for run. No PR ceremony on +# ai-review-prompts. When we identify the working config, backport +# to `_gemini-review.yml` in one focused PR. + +on: + pull_request: + types: [opened, synchronize, reopened] + +concurrency: + # Distinct from `gemini-review-${pr}` so the production caller and + # this debug workflow run in parallel on the same PR without + # cancelling each other. 
+ group: gemini-review-debug-${{ github.event.pull_request.number }} + cancel-in-progress: true + +jobs: + review: + runs-on: ubuntu-latest + timeout-minutes: 10 + permissions: + contents: read + id-token: write + pull-requests: write + steps: + - name: Checkout + uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.2 + with: + persist-credentials: 'false' + + - name: Prepare prompt context + # Mirrors upstream main's pattern. Writes a JSON file the + # agent can `@{...}`-include or read directly. The currently- + # released `/pr-code-review` slash command (extension @ main) + # doesn't actually read this file — it references $REPOSITORY + # / $PULL_REQUEST_NUMBER / $ADDITIONAL_CONTEXT in its prompt + # body — but upstream main's workflow still writes the file, + # so we follow. + shell: bash + run: |- + mkdir -p .gemini + jq -n \ + --arg repo "${REPOSITORY}" \ + --arg pr "${PULL_REQUEST_NUMBER}" \ + --arg context "${ADDITIONAL_CONTEXT}" \ + '{repository: $repo, pull_request_number: $pr, additional_context: $context}' \ + > .gemini/context.json + env: + REPOSITORY: ${{ github.repository }} + PULL_REQUEST_NUMBER: ${{ github.event.pull_request.number }} + ADDITIONAL_CONTEXT: '' + + - name: Run Gemini pull request review + id: gemini-review + uses: google-github-actions/run-gemini-cli@f77273f4c914e4bf38440cf36a0369cb64a37489 # v0.1.22 + env: + GEMINI_CLI_TRUST_WORKSPACE: 'true' + GITHUB_TOKEN: ${{ github.token }} + GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} + # Upstream main keeps these as env vars too, alongside the + # context.json file. The currently-released `/pr-code-review` + # slash command reads them via shell echo at runtime. 
+ PULL_REQUEST_NUMBER: ${{ github.event.pull_request.number }} + REPOSITORY: ${{ github.repository }} + ISSUE_TITLE: ${{ github.event.pull_request.title }} + ISSUE_BODY: ${{ github.event.pull_request.body }} + ADDITIONAL_CONTEXT: '' + with: + gemini_api_key: ${{ secrets.GEMINI_API_KEY }} + gemini_model: 'gemini-3-flash-preview' + gemini_cli_version: 'latest' + gemini_debug: 'true' + upload_artifacts: 'true' + github_pr_number: ${{ github.event.pull_request.number }} + extensions: | + [ + "https://github.com/gemini-cli-extensions/code-review" + ] + # MCP server pinned to TAG `:v0.27.0` — the version upstream + # tests with. Debug-spike-only deviation from our SHA-pinning + # discipline; when we backport to the reusable we'll resolve + # to a digest. No `modelConfigs`, no `tools.core` shell access + # (forces all I/O through MCP — proves whether MCP works). + settings: |- + { + "model": { + "maxSessionTurns": 25 + }, + "telemetry": { + "enabled": true, + "target": "local", + "outfile": ".gemini/telemetry.log" + }, + "mcpServers": { + "github": { + "command": "docker", + "args": [ + "run", + "-i", + "--rm", + "-e", + "GITHUB_PERSONAL_ACCESS_TOKEN", + "ghcr.io/github/github-mcp-server:v0.27.0" + ], + "includeTools": [ + "add_comment_to_pending_review", + "pull_request_read", + "pull_request_review_write" + ], + "env": { + "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" + } + } + }, + "tools": { + "core": [] + } + } + prompt: '/pr-code-review' + + - name: Upload gemini-artifacts (failure path) + # Same workaround as in _gemini-review.yml (post-#36): the + # run-gemini-cli action's inner upload step is `if: upload_ + # artifacts == 'true'` but not `if: always()`, so on the failure + # path the upload is skipped. The wrapper writes the artifact + # files BEFORE its exit 1, so we just upload them ourselves. 
+ if: failure() + uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2 + with: + name: gemini-debug-output-failure + path: gemini-artifacts/ + if-no-files-found: warn + retention-days: 14 + + - name: Append gemini logs tail to job summary + # Pipefail-safe: `tail | head -c N` exits SIGPIPE under pipefail; + # we drop pipefail for this step and use `|| true` to swallow. + if: failure() + shell: bash + run: | + set -eu + if [ ! -d gemini-artifacts ]; then + { + echo "## ⚠️ Gemini debug run failed (no gemini-artifacts/ directory)" + echo + echo "The gemini-cli wrapper exited before writing logs to disk." + } >> "$GITHUB_STEP_SUMMARY" + exit 0 + fi + { + echo "## ⚠️ Gemini debug run failed — log tails" + echo + echo "Full logs uploaded as the \`gemini-debug-output-failure\` artifact." + echo + for f in stdout.log stderr.log telemetry.log; do + if [ -f "gemini-artifacts/$f" ] && [ -s "gemini-artifacts/$f" ]; then + echo "### \`gemini-artifacts/$f\` (last 200 lines)" + echo + echo '```' + (tail -n 200 "gemini-artifacts/$f" | head -c 50000) || true + echo + echo '```' + echo + fi + done + } >> "$GITHUB_STEP_SUMMARY" From 485437fb2cdda37c52d4f02b067edabbb5b57481 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Thu, 14 May 2026 11:51:49 -0700 Subject: [PATCH 7/8] debug: try /pr-review instead of /pr-code-review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previous debug run (25874573129) confirmed MCP tools register correctly under server v0.27.0 (3 tools discovered: pull_request_read, pull_request_review_write, add_comment_to_pending_review). But the model failed to use them — INVALID_STREAM error with 0 output tokens across 4 API requests. Mute model. 
The extension's GEMINI.md loaded into agent context lists two top-level commands: /code-review - "When a user requests that code changes be reviewed" /pr-review - "When a user requests that pr to be reviewed, look at user provided input '{{args}}' and environment variables to see if $REPOSITORY, $PULL_REQUEST_NUMBER, and $ADDITIONAL_CONTEXT are set." Notably `/pr-review` (NOT `/pr-code-review`) is what the extension documents as the entry point. `/pr-code-review` exists as a file in the extension's commands/ directory but isn't mentioned in GEMINI.md. Possibility: `/pr-code-review` is internal / older / deprecated, and `/pr-review` is the intended public command (may be a routing alias). Cheap test. If `/pr-review` doesn't resolve, gemini-cli will fail clearly with a "command not found" error. If it does resolve and behaves better than `/pr-code-review`, we've found the right command. --- .github/workflows/gemini-review-debug.yml | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/.github/workflows/gemini-review-debug.yml b/.github/workflows/gemini-review-debug.yml index bb5de1b..22a8515 100644 --- a/.github/workflows/gemini-review-debug.yml +++ b/.github/workflows/gemini-review-debug.yml @@ -151,7 +151,17 @@ jobs: "core": [] } } - prompt: '/pr-code-review' + # Experiment: switch from `/pr-code-review` (which got the + # model into INVALID_STREAM with 0 output tokens across 4 + # API requests, despite MCP tools registering successfully + # under v0.27.0) to `/pr-review`. The extension's + # GEMINI.md (loaded into the agent's context at startup) + # documents `/pr-review` as the top-level command for + # "user requests that pr to be reviewed", though it's not + # in the extension's `commands/` directory — suggesting + # it might be a routing alias or skill-registered command. + # If it doesn't exist, gemini-cli will fail clearly. 
+ prompt: '/pr-review' - name: Upload gemini-artifacts (failure path) # Same workaround as in _gemini-review.yml (post-#36): the From 861eed7a8c7cb27280c5f64fbfdcd5b4bad413c6 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Thu, 14 May 2026 17:03:29 -0700 Subject: [PATCH 8/8] debug: pivot to shell-based reviewer (gh CLI, no MCP) After 3 MCP-debug iterations proving it's not happening: - Run 25869995910: server v1.0.4, 0 MCP tools propagated - Run 25874573129: server v0.27.0, 3 MCP tools registered, model silent (INVALID_STREAM, 0 output tokens) - Run 25879095909: server v0.27.0 + /pr-review slash command, model hallucinated a full review of fictional google/oauth#42 using tags as prose (0 actual tool calls) Across all 3, the model demonstrated a strong preference for `gh` CLI patterns even when shell was denied. It tried `gh pr view`, `gh pr diff`, `printenv`, etc. in every run. Pragmatic pivot: stop fighting the model. - Drop mcpServers entirely - Drop the upstream code-review extension entirely - No custom slash command file; inline prompt only - tools.core: gh, git, cat, echo, grep, head, tail - Agent reads diff via `gh pr diff`, posts via `gh pr review --comment --body` Loses true inline comments (would require MCP). Findings reference file:line as plain-text `**File:** path:N` in the body. Goal of this iteration: produce A working Gemini review on PR #83 to compare against Claude. Architecture refinement comes later. 
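The shell-only flow described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation: it assumes `gh` is installed and authenticated through `GH_TOKEN`, and the helper names `build_review_body` and `post_review`, the `DRY_RUN` switch, and the sample summary text are all hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: assemble a review body whose first line is the
# threading marker that the prompt requires.
build_review_body() {
  local summary="$1"
  printf '%s\n\n## 📋 Review Summary\n\n%s\n' '<!-- gemini-review:v1 -->' "$summary"
}

# Hypothetical wrapper: read the diff, then post a comment-type review.
# DRY_RUN defaults to 1 in this sketch, so it only prints the gh commands.
post_review() {
  local pr="$1" repo="$2" body="$3"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "gh pr diff $pr --repo $repo"
    echo "gh pr review $pr --repo $repo --comment --body <body>"
    return 0
  fi
  # Real path: both subcommands and flags exist in the GitHub CLI.
  gh pr diff "$pr" --repo "$repo" > /dev/null
  gh pr review "$pr" --repo "$repo" --comment --body "$body"
}
```

With `DRY_RUN=0` and `GH_TOKEN` exported, the same call would post a single comment-type review. Per-line inline comments stay out of scope, matching the trade-off noted above.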
--- .github/workflows/gemini-review-debug.yml | 209 ++++++++++------------ 1 file changed, 90 insertions(+), 119 deletions(-) diff --git a/.github/workflows/gemini-review-debug.yml b/.github/workflows/gemini-review-debug.yml index 22a8515..59c4eec 100644 --- a/.github/workflows/gemini-review-debug.yml +++ b/.github/workflows/gemini-review-debug.yml @@ -1,51 +1,30 @@ name: Gemini PR Review (debug) -# Single-repo debug spike to isolate why MCP tools never propagate to -# the model's function-call surface in our production `gemini-review.yml` -# (despite the github-mcp-server starting fine and registering session- -# side). This workflow runs in PARALLEL with the production caller — -# different concurrency group, no shared steps. +# Pragmatic shell-based Gemini reviewer. After 3 debug iterations +# proving MCP tool propagation is unreliable (v1.0.x server bug, +# umbrella-tool-shape mismatch with model's training, INVALID_STREAM +# on /pr-code-review, hallucinated review on /pr-review), pivoting +# to what the model wants to do anyway: use the `gh` CLI. # -# Hypothesis under test: -# Our production caller pins `ghcr.io/github/github-mcp-server@sha256: -# e3816a4...` which is the v1.0.4 image (released 2026-05-11). v1.0.0 -# (2026-04-16) bumped the modelcontextprotocol/go-sdk to v1.5.0 and -# reorganized toolsets, introducing feature-flagged tool groups gated -# by `dynamicToolsets` (visible in our server logs as -# `dynamicToolsets=false`). Upstream's example pins to `:v0.27.0` -# (2026-02-03 era) which predates that reorganization. The bump to -# v1.0.x was Harper-side and well-intended but likely the regression. +# Goal: produce A working Gemini review on PR #83 to compare against +# Claude. Architectural elegance is a non-goal for this iteration. 
# -# This workflow mirrors upstream's tip-of-main example as closely as -# practical: -# https://github.com/google-github-actions/run-gemini-cli/tree/main/examples/workflows/pr-review -# -# Deviations from upstream main, kept minimal: -# - Auth: skip the App-token-mint step entirely. Use the workflow's -# default GITHUB_TOKEN. This loses the auth gate but we're on a -# single-repo debug branch — only Nathan / org members push here. -# Backport will reintroduce the gate via the existing reusable. -# - Model: hardcoded `gemini-3-flash-preview` (no `vars.GEMINI_MODEL`). -# - Layered Harper scope: NOT applied. Goal is to prove MCP propagates -# before reintroducing scope complexity. ADDITIONAL_CONTEXT is empty. -# - Visibility: same failure-path artifact upload + summary-tail steps -# we added to the reusable in #36 (with the pipefail / SIGPIPE bug -# fixed). Upstream's inner upload step still doesn't gate on -# `if: always()` so we need our own failure-path upload. -# -# Iteration model: -# Edit this file → push → wait ~90s for run. No PR ceremony on -# ai-review-prompts. When we identify the working config, backport -# to `_gemini-review.yml` in one focused PR. +# Architecture: +# - No MCP. No upstream `code-review` extension. No custom slash +# command file. Just an inline prompt. +# - tools.core allows `gh`, `git`, `cat`, `echo`, `grep`, `head`, +# `tail`. The agent uses these natively. +# - Agent reads the diff via `gh pr diff`, generates a review, +# posts it via `gh pr review --comment --body`. +# - Body-anchored findings (no per-line inline comments — would +# require MCP, which we just spent 3 iterations failing to wire). +# Findings reference file:line as `**File:** path:N` plain text. on: pull_request: types: [opened, synchronize, reopened] concurrency: - # Distinct from `gemini-review-${pr}` so the production caller and - # this debug workflow run in parallel on the same PR without - # cancelling each other. 
group: gemini-review-debug-${{ github.event.pull_request.number }} cancel-in-progress: true @@ -63,28 +42,6 @@ jobs: with: persist-credentials: 'false' - - name: Prepare prompt context - # Mirrors upstream main's pattern. Writes a JSON file the - # agent can `@{...}`-include or read directly. The currently- - # released `/pr-code-review` slash command (extension @ main) - # doesn't actually read this file — it references $REPOSITORY - # / $PULL_REQUEST_NUMBER / $ADDITIONAL_CONTEXT in its prompt - # body — but upstream main's workflow still writes the file, - # so we follow. - shell: bash - run: |- - mkdir -p .gemini - jq -n \ - --arg repo "${REPOSITORY}" \ - --arg pr "${PULL_REQUEST_NUMBER}" \ - --arg context "${ADDITIONAL_CONTEXT}" \ - '{repository: $repo, pull_request_number: $pr, additional_context: $context}' \ - > .gemini/context.json - env: - REPOSITORY: ${{ github.repository }} - PULL_REQUEST_NUMBER: ${{ github.event.pull_request.number }} - ADDITIONAL_CONTEXT: '' - - name: Run Gemini pull request review id: gemini-review uses: google-github-actions/run-gemini-cli@f77273f4c914e4bf38440cf36a0369cb64a37489 # v0.1.22 @@ -92,83 +49,101 @@ jobs: GEMINI_CLI_TRUST_WORKSPACE: 'true' GITHUB_TOKEN: ${{ github.token }} GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} - # Upstream main keeps these as env vars too, alongside the - # context.json file. The currently-released `/pr-code-review` - # slash command reads them via shell echo at runtime. 
+ GH_TOKEN: ${{ github.token }} PULL_REQUEST_NUMBER: ${{ github.event.pull_request.number }} REPOSITORY: ${{ github.repository }} - ISSUE_TITLE: ${{ github.event.pull_request.title }} - ISSUE_BODY: ${{ github.event.pull_request.body }} - ADDITIONAL_CONTEXT: '' with: gemini_api_key: ${{ secrets.GEMINI_API_KEY }} gemini_model: 'gemini-3-flash-preview' gemini_cli_version: 'latest' gemini_debug: 'true' upload_artifacts: 'true' - github_pr_number: ${{ github.event.pull_request.number }} - extensions: | - [ - "https://github.com/gemini-cli-extensions/code-review" - ] - # MCP server pinned to TAG `:v0.27.0` — the version upstream - # tests with. Debug-spike-only deviation from our SHA-pinning - # discipline; when we backport to the reusable we'll resolve - # to a digest. No `modelConfigs`, no `tools.core` shell access - # (forces all I/O through MCP — proves whether MCP works). settings: |- { "model": { - "maxSessionTurns": 25 + "maxSessionTurns": 15 }, "telemetry": { "enabled": true, "target": "local", "outfile": ".gemini/telemetry.log" }, - "mcpServers": { - "github": { - "command": "docker", - "args": [ - "run", - "-i", - "--rm", - "-e", - "GITHUB_PERSONAL_ACCESS_TOKEN", - "ghcr.io/github/github-mcp-server:v0.27.0" - ], - "includeTools": [ - "add_comment_to_pending_review", - "pull_request_read", - "pull_request_review_write" - ], - "env": { - "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" - } - } - }, "tools": { - "core": [] + "core": [ + "run_shell_command(gh)", + "run_shell_command(git)", + "run_shell_command(cat)", + "run_shell_command(echo)", + "run_shell_command(grep)", + "run_shell_command(head)", + "run_shell_command(tail)" + ] } } - # Experiment: switch from `/pr-code-review` (which got the - # model into INVALID_STREAM with 0 output tokens across 4 - # API requests, despite MCP tools registering successfully - # under v0.27.0) to `/pr-review`. 
The extension's - # GEMINI.md (loaded into the agent's context at startup) - # documents `/pr-review` as the top-level command for - # "user requests that pr to be reviewed", though it's not - # in the extension's `commands/` directory — suggesting - # it might be a routing alias or skill-registered command. - # If it doesn't exist, gemini-cli will fail clearly. - prompt: '/pr-review' + prompt: | + You are a senior software engineer reviewing pull request + #${{ github.event.pull_request.number }} on + ${{ github.repository }}. + + Step 1 — Read the diff. Run: + + gh pr diff ${{ github.event.pull_request.number }} \ + --repo ${{ github.repository }} + + Step 2 — Decide if the diff is trivial. Trivial means + dependency version bumps, CI workflow pin updates, + lockfile-only churn, prose-only doc edits, version-string + changes. If trivial, skip to Step 4 with the no-blockers + template. + + Step 3 — For substantive diffs, review for BLOCKERS only. + A blocker is a 🔴 Critical (security vulnerability, + data-loss bug, broken public API contract) or 🟠 High + (clear correctness/security issue likely to bite in + production) finding. Do NOT post 🟡 Medium or 🟢 Low + findings. Cap at 10 findings. Reference each finding's + file:line as plain-text `**File:** path:LINE`. + + Step 4 — Post the review via: + + gh pr review ${{ github.event.pull_request.number }} \ + --repo ${{ github.repository }} \ + --comment \ + --body "$REVIEW_BODY" + + Where REVIEW_BODY is your review markdown formatted as: + + <!-- gemini-review:v1 --> + + ## 📋 Review Summary + + + + ## 🔍 Findings + + ### 1. 🔴 + + **File:** `path/to/file.ext:LINE` + + **What:** <one or two sentences> + + **Why it matters:** <impact> + + **Suggested fix:** <concrete change or fenced code block> + + ### 2. ... + + Omit the `## 🔍 Findings` section entirely if you have + zero blockers. + + The `<!-- gemini-review:v1 -->` marker on the first line + is REQUIRED — it's how the team's tooling threads runs + together across pushes.
+ + After posting, your job is done. Do not post a duplicate. + Do not edit the review. - name: Upload gemini-artifacts (failure path) - # Same workaround as in _gemini-review.yml (post-#36): the - # run-gemini-cli action's inner upload step is `if: upload_ - # artifacts == 'true'` but not `if: always()`, so on the failure - # path the upload is skipped. The wrapper writes the artifact - # files BEFORE its exit 1, so we just upload them ourselves. if: failure() uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2 with: @@ -178,8 +153,6 @@ jobs: retention-days: 14 - name: Append gemini logs tail to job summary - # Pipefail-safe: `tail | head -c N` exits SIGPIPE under pipefail; - # we drop pipefail for this step and use `|| true` to swallow. if: failure() shell: bash run: | @@ -187,8 +160,6 @@ jobs: if [ ! -d gemini-artifacts ]; then { echo "## ⚠️ Gemini debug run failed (no gemini-artifacts/ directory)" - echo - echo "The gemini-cli wrapper exited before writing logs to disk." } >> "$GITHUB_STEP_SUMMARY" exit 0 fi