Inline Gemini reviewer for calibration by heskew · Pull Request #87 · HarperFast/oauth

heskew · 2026-05-20T14:35:21Z

Summary

Replaces oauth's thin caller of the _gemini-review.yml reusable in HarperFast/ai-review-prompts with a self-contained inlined copy of the same logic at pin 128656e40 (2026-05-12). The 5 scripts + 4 layer files the reusable depends on come along into .github/scripts/ and .github/review-layers/.

Why

3 logged Gemini production runs to date, 0 useful — two short-circuited to "no blockers" on non-trivial CI/permissions changes; one posted a false-positive finding (claimed CI pin "regression" on a release branch that pre-dated the "regressed" commit). The output gap from Claude is in the prompt and posting discipline, not the architecture.

Iterating on those through ai-review-prompts is PR + merge + pin bump + CI wait per change — too slow when none of the prior iterations produced useful output. Inlining gives a single-repo edit-push-observe loop. Once Gemini output reliably matches Claude's quality, the refined prompt promotes back to ai-review-prompts and this file reverts to its thin-caller form.

What changed

Workflow file (.github/workflows/gemini-review.yml):

on: workflow_call → on: pull_request + workflow_dispatch (the latter exposes model and gemini-cli-version for ad-hoc tuning from the Actions UI)
dropped both ai-review-prompts checkouts (sparse-checkout for the auth script + full clone for layers/scripts)
dropped the HarperFast/skills checkout — verified layer files reference harper-best-practices/ paths as text only, not filesystem reads
inlined the review-layers list and the OAuth repo-specific-checks block (previously workflow_call inputs)
bash refs point at local .github/scripts/<name>.sh

Scripts (.github/scripts/): 5 files, verbatim copy except compose-review-scope.sh, which gets a LAYERS_DIR env-var indirection (default .github/review-layers) so it works from either the inlined or reusable shape.

Layer files (.github/review-layers/): 4 files copied verbatim from ai-review-prompts@128656e40 — universal.md, harper/common.md, harper/v5.md, repo-type/plugin.md.

Claude reviewer (.github/workflows/claude-review.yml) is unchanged — still consumes the ai-review-prompts reusable at f22bf7d. Both reviewers run on this PR: Claude reviews the inlined workflow change itself; Gemini exercises the inlined logic on its own diff.

Test plan

Claude review fires on this PR and reviews the inlined workflow
Gemini review fires on this PR from the branch's own copy of the inlined workflow
compose-review-scope.sh emits a non-empty composed scope (≥ ~5 KB) under LAYERS_DIR=.github/review-layers
Posted Gemini comment starts with the  marker
ai-review-log issue threads a provider:gemini comment

Once those land, iterate on the prompt in subsequent commits to this branch and observe.

🤖 Generated with Claude Code

Inlines _gemini-review.yml at pin 128656e40 + the 5 scripts and 4 layer files it depends on, replacing the thin caller of the reusable in HarperFast/ai-review-prompts with a self-contained workflow. Why: 3 logged Gemini production runs, 0 useful (2 short-circuited to "no blockers" on non-trivial CI changes; 1 false-positive finding). Iterating on the prompt through ai-review-prompts requires PR + merge + pin bump + CI wait per change. Inlining gives a single-repo edit-push-observe loop until output is comparable to Claude's. Surgery applied to the workflow file: - on: workflow_call -> on: pull_request + workflow_dispatch - dropped both ai-review-prompts checkouts (sparse + full) - dropped the HarperFast/skills checkout (verified: layers reference skills paths as text only, no filesystem reads) - inlined the review-layers list and the OAuth repo-specific-checks block as literals (previously workflow_call inputs from the caller) - bash steps reference .github/scripts/<name>.sh instead of .ai-review-prompts/.github/scripts/<name>.sh - workflow_dispatch inputs expose model + gemini-cli-version for per-run tuning from the Actions UI One non-cosmetic patch to compose-review-scope.sh: LAYERS_DIR env var indirection (default .github/review-layers) so the script can be driven from either the inlined or reusable shape with the same code. Claude reviewer is unchanged - still consumes the ai-review-prompts reusable at f22bf7d. Both reviewers run in parallel on this PR; Claude will review the inlined workflow itself, Gemini will exercise the inlined logic on its own diff. Promotion path: once Gemini output reliably matches Claude's quality, fold the refined prompt + any script tweaks back to ai-review-prompts and revert this file to its thin-caller form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-20T14:38:00Z

Reviewed; no blockers found.

claude · 2026-05-20T14:40:54Z

+      gemini-cli-version:
+        description: 'Gemini CLI version pin (tag / branch / SHA / latest)'
+        type: string
+        default: 'latest'


workflow_dispatch missing pr-number input — runs are non-functional from the Actions UI

workflow_dispatch doesn't carry github.event.pull_request.number, so every downstream use of that context is empty:

Find prior review comment calls gh api "…/issues//comments" (empty issue path) — fails, but set -uo pipefail without -e swallows it, so prior_review.outputs.id is silently empty.

The Gemini prompt says "reviewing pull request #" (blank number) — the model has no useful PR context.

Post Gemini review comment calls gh pr comment "" --body-file … — fails silently for the same reason. Nothing is posted.

Net result: an Actions UI run shows green but does nothing.

The fix depends on intent. If workflow_dispatch is meant for ad-hoc review of an existing PR:

Suggested change

default: 'latest'

gemini-cli-version:

description: 'Gemini CLI version pin (tag / branch / SHA / latest)'

type: string

default: 'latest'

pr-number:

description: 'PR number to review (required for workflow_dispatch)'

type: string

required: true

…and then use github.event.pull_request.number || inputs.pr-number wherever PR_NUMBER is set. If workflow_dispatch is only meant to test workflow plumbing (not actually post a review), a guard at the top of the review job and a clear comment explaining the limitation would do.

claude · 2026-05-20T14:41:06Z

Reviewed; no blockers found.

Added in the previous commit without a real use case ("for ad-hoc tuning from the Actions UI") — every calibration tweak goes through a commit on the branch anyway. Both reviewers caught the cost: the trigger had no pr-number input, so any manual dispatch would have queried API paths with an empty PR number and failed silently. Hardcodes the two values that were behind the inputs fallback (gemini_model + gemini_cli_version) at their previous default values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Single-shot Gemini has no shell/API access from inside the prompt, so any awareness of prior review state has to be pre-fetched and injected. Adds a new workflow step between `find-prior-review-comment` and the gemini-cli step that fetches the prior comment body when one exists, and exposes it via a step output the prompt interpolates. Updates the prompt with a new `## Prior review` section that: - Documents the empty-vs-non-empty signal (first review vs update) - Instructs that findings present in the prior review but absent from the current diff should be acknowledged with the resolving commit SHA, not re-posted - Instructs "no blockers" summaries on follow-up runs to briefly note what changed (matching Claude's continuity pattern) Closes the one usability gap observed in PR #87's first calibration cycle — Claude noted "prior blocker resolved by commit 52b3a24", Gemini just said "no blockers" without continuity. find-prior-review-comment.sh is unchanged (byte-identical to its ai-review-prompts@128656e40 source) — the continuity logic lives in the workflow file, which is what's being calibrated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round 3 of PR #87 calibration regressed: injecting a "no blockers" prior review primed Gemini to hunt for issues without giving it anything real to evaluate, and it manufactured two weak findings against the prompt itself. Two changes addressing the priming: 1. The prior-body fetch step now checks for `### N.` finding headers in the body before emitting. Empty / "no blockers" priors produce no output, so the prompt sees an empty continuity section and reviews the current diff normally. 2. The "## Prior review" prompt section is reframed as a "Continuity reference" — neutral about whether the section exists ("ignore it and review the current diff normally" if empty), permissive rather than prescriptive about referencing it ("your summary line can cite..." instead of "for each finding ... decide whether ..."). Drops the checklist-y enumeration that pushed the model into completing-a-checklist mode against an empty input. Drops the `git log` reference that contributed to the "do tools or don't I" confusion in single-shot mode. Continuity behavior is preserved when it's actually useful (a real prior finding exists) and suppressed when it isn't (clean prior or first run). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Both findings from PR #87 rounds 3-4 are legitimate latent bugs in the prompt + post-script that were carried over from the ai-review-prompts@128656e40 source. Rounds 1-2 happened to not surface them because the model was focused on the diff content; once raised, they keep recurring because they remain unfixed. Finding 1 (prompt contradiction): the prompt forbade "shell commands, gh, or any API" while also instructing the model to "read agent context files first" and apply a layered review scope that requires reading the diff. In our configuration gemini-cli has default tools.core (full shell), so the model DOES use shell to research the diff — the prohibition was overreaching. Reworded to allow research tools (read files, git log/blame/diff, codebase search) while keeping the actual intent: do not attempt to post the review yourself, the workflow handles that. Finding 2 (marker check on indented output): the existing sed expression in post-review-comment.sh removed leading blank lines but preserved leading whitespace on the first non-blank line. If the model ever indented the marker line (e.g. " "), the case check would fail and a valid review wouldn't post. Added a second sed pass to strip leading whitespace from the first non-blank line. The defense against mystery content (refuse if marker absent) is unchanged. post-review-comment.sh is no longer byte-identical to its source at 128656e40 — first divergence from the inlined-as-copy state. Note for promotion: this fix belongs in ai-review-prompts when the calibration cycle wraps up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Bump both reviewer pins to ai-review-prompts@ea19009 Adopts the promoted Gemini calibration (ai-review-prompts#41) and restores both reviewers to a single shared pin. Gemini changes vs the prior pin (128656e40): - Single-shot architecture validated empirically over 5 iteration rounds in PR #87 (gemini/inline-calibrate) - Prior-review body fetched and injected as continuity context when the prior had findings (### N. headers); skipped on no-blockers priors so the model doesn't get primed to hunt for issues against uninformative context - Prompt's "your output IS the PR comment" section reworded to allow shell/file research tools while still forbidding self- posting (the workflow handles posting) - post-review-comment.sh now strips leading whitespace from the marker line so a non-whitespace-stable provider can't break the marker check Claude reviewer is byte-identical between f22bf7d and ea19009 in ai-review-prompts; bumping the pin is a no-op for Claude behavior but restores the two reviewers to a single shared upgrade motion. The "intentionally NOT in lockstep" comment from the divergence period is replaced with a note that lockstep is restored. PR #87 (gemini/inline-calibrate) is now superseded by this PR and will be closed without merging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add caller-side permissions block to gemini-review.yml Mirrors the hardening on claude-review.yml — explicit grant at the calling-job level for the union the reusable's authorize + review jobs need (contents: read + pull-requests: write + id-token: write). The workflow happens to work without this on oauth today because the repo default is permissive enough, but the explicit grant survives repo-default changes and matches claude-review.yml's shape and discipline. Surfaced as a finding by Gemini's own review of this PR (#88) — exactly the loop the calibration was built for. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

heskew · 2026-05-21T01:35:57Z

Superseded by #88 (which has merged). The calibration this branch was running is now in production via HarperFast/ai-review-prompts#41; oauth's gemini-review.yml caller is back to the thin-caller form pointing at ea190091328bcee674c4739ccc97dda177ecf0c5.

Calibration summary for the record (rounds 1-5 on this branch):

Round	Trigger	Gemini	Claude
1	Inlining setup	1 valid finding (workflow_dispatch missing pr-number)	Same finding
2	Fix #1	No blockers	No blockers + continuity note
3	First continuity attempt	2 weak findings (priming regression)	No blockers + continuity
4	Filter no-blockers priors	Same 2 findings (priors were real bugs)	No blockers
5	Fix the prompt/script bugs	No blockers	No blockers

Closing without merging — the inlined files served their purpose as a fast iteration loop; the changes have promoted upstream.

claude Bot reviewed May 20, 2026

View reviewed changes

heskew and others added 4 commits May 20, 2026 08:18

This was referenced May 20, 2026

Promote Gemini calibration from oauth#87 — single-shot supersedes MCP HarperFast/ai-review-prompts#41

Merged

Bump both reviewer pins to ai-review-prompts@ea19009 #88

Merged

heskew closed this May 21, 2026

heskew deleted the gemini/inline-calibrate branch May 21, 2026 01:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inline Gemini reviewer for calibration#87

Inline Gemini reviewer for calibration#87
heskew wants to merge 5 commits into
mainfrom
gemini/inline-calibrate

heskew commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026 •

edited

Loading

Uh oh!

claude Bot May 20, 2026

Uh oh!

claude Bot commented May 20, 2026 •

edited

Loading

Uh oh!

heskew commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

heskew commented May 20, 2026

Summary

Why

What changed

Test plan

Uh oh!

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

heskew commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 20, 2026 •

edited

Loading

claude Bot commented May 20, 2026 •

edited

Loading