Skip to content

Inline Gemini reviewer for calibration#87

Closed
heskew wants to merge 5 commits into
mainfrom
gemini/inline-calibrate
Closed

Inline Gemini reviewer for calibration#87
heskew wants to merge 5 commits into
mainfrom
gemini/inline-calibrate

Conversation

@heskew
Copy link
Copy Markdown
Member

@heskew heskew commented May 20, 2026

Summary

Replaces oauth's thin caller of the _gemini-review.yml reusable in HarperFast/ai-review-prompts with a self-contained inlined copy of the same logic at pin 128656e40 (2026-05-12). The 5 scripts + 4 layer files the reusable depends on come along into .github/scripts/ and .github/review-layers/.

Why

3 logged Gemini production runs to date, 0 useful — two short-circuited to "no blockers" on non-trivial CI/permissions changes; one posted a false-positive finding (claimed CI pin "regression" on a release branch that pre-dated the "regressed" commit). The output gap from Claude is in the prompt and posting discipline, not the architecture.

Iterating on those through ai-review-prompts is PR + merge + pin bump + CI wait per change — too slow when none of the prior iterations produced useful output. Inlining gives a single-repo edit-push-observe loop. Once Gemini output reliably matches Claude's quality, the refined prompt promotes back to ai-review-prompts and this file reverts to its thin-caller form.

What changed

Workflow file (.github/workflows/gemini-review.yml):

  • on: workflow_callon: pull_request + workflow_dispatch (the latter exposes model and gemini-cli-version for ad-hoc tuning from the Actions UI)
  • dropped both ai-review-prompts checkouts (sparse-checkout for the auth script + full clone for layers/scripts)
  • dropped the HarperFast/skills checkout — verified layer files reference harper-best-practices/ paths as text only, not filesystem reads
  • inlined the review-layers list and the OAuth repo-specific-checks block (previously workflow_call inputs)
  • bash refs point at local .github/scripts/<name>.sh

Scripts (.github/scripts/): 5 files, verbatim copy except compose-review-scope.sh, which gets a LAYERS_DIR env-var indirection (default .github/review-layers) so it works from either the inlined or reusable shape.

Layer files (.github/review-layers/): 4 files copied verbatim from ai-review-prompts@128656e40universal.md, harper/common.md, harper/v5.md, repo-type/plugin.md.

Claude reviewer (.github/workflows/claude-review.yml) is unchanged — still consumes the ai-review-prompts reusable at f22bf7d. Both reviewers run on this PR: Claude reviews the inlined workflow change itself; Gemini exercises the inlined logic on its own diff.

Test plan

  • Claude review fires on this PR and reviews the inlined workflow
  • Gemini review fires on this PR from the branch's own copy of the inlined workflow
  • compose-review-scope.sh emits a non-empty composed scope (≥ ~5 KB) under LAYERS_DIR=.github/review-layers
  • Posted Gemini comment starts with the <!-- gemini-review:v1 --> marker
  • ai-review-log issue threads a provider:gemini comment

Once those land, iterate on the prompt in subsequent commits to this branch and observe.

🤖 Generated with Claude Code

Inlines _gemini-review.yml at pin 128656e40 + the 5 scripts and 4 layer
files it depends on, replacing the thin caller of the reusable in
HarperFast/ai-review-prompts with a self-contained workflow.

Why: 3 logged Gemini production runs, 0 useful (2 short-circuited to
"no blockers" on non-trivial CI changes; 1 false-positive finding).
Iterating on the prompt through ai-review-prompts requires PR + merge +
pin bump + CI wait per change. Inlining gives a single-repo
edit-push-observe loop until output is comparable to Claude's.

Surgery applied to the workflow file:
- on: workflow_call -> on: pull_request + workflow_dispatch
- dropped both ai-review-prompts checkouts (sparse + full)
- dropped the HarperFast/skills checkout (verified: layers reference
  skills paths as text only, no filesystem reads)
- inlined the review-layers list and the OAuth repo-specific-checks
  block as literals (previously workflow_call inputs from the caller)
- bash steps reference .github/scripts/<name>.sh instead of
  .ai-review-prompts/.github/scripts/<name>.sh
- workflow_dispatch inputs expose model + gemini-cli-version for
  per-run tuning from the Actions UI

One non-cosmetic patch to compose-review-scope.sh: LAYERS_DIR env var
indirection (default .github/review-layers) so the script can be
driven from either the inlined or reusable shape with the same code.

Claude reviewer is unchanged - still consumes the ai-review-prompts
reusable at f22bf7d. Both reviewers run in parallel on this PR; Claude
will review the inlined workflow itself, Gemini will exercise the
inlined logic on its own diff.

Promotion path: once Gemini output reliably matches Claude's quality,
fold the refined prompt + any script tweaks back to ai-review-prompts
and revert this file to its thin-caller form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Reviewed; no blockers found.

Comment thread .github/workflows/gemini-review.yml Outdated
gemini-cli-version:
description: 'Gemini CLI version pin (tag / branch / SHA / latest)'
type: string
default: 'latest'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workflow_dispatch missing pr-number input — runs are non-functional from the Actions UI

workflow_dispatch doesn't carry github.event.pull_request.number, so every downstream use of that context is empty:

  • Find prior review comment calls gh api "…/issues//comments" (empty issue path) — fails, but set -uo pipefail without -e swallows it, so prior_review.outputs.id is silently empty.
  • The Gemini prompt says "reviewing pull request #" (blank number) — the model has no useful PR context.
  • Post Gemini review comment calls gh pr comment "" --body-file … — fails silently for the same reason. Nothing is posted.

Net result: an Actions UI run shows green but does nothing.

The fix depends on intent. If workflow_dispatch is meant for ad-hoc review of an existing PR:

Suggested change
default: 'latest'
gemini-cli-version:
description: 'Gemini CLI version pin (tag / branch / SHA / latest)'
type: string
default: 'latest'
pr-number:
description: 'PR number to review (required for workflow_dispatch)'
type: string
required: true

…and then use github.event.pull_request.number || inputs.pr-number wherever PR_NUMBER is set. If workflow_dispatch is only meant to test workflow plumbing (not actually post a review), a guard at the top of the review job and a clear comment explaining the limitation would do.

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Reviewed; no blockers found.

heskew and others added 4 commits May 20, 2026 08:18
Added in the previous commit without a real use case ("for ad-hoc
tuning from the Actions UI") — every calibration tweak goes through
a commit on the branch anyway. Both reviewers caught the cost: the
trigger had no pr-number input, so any manual dispatch would have
queried API paths with an empty PR number and failed silently.

Hardcodes the two values that were behind the inputs fallback
(gemini_model + gemini_cli_version) at their previous default
values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-shot Gemini has no shell/API access from inside the prompt,
so any awareness of prior review state has to be pre-fetched and
injected. Adds a new workflow step between `find-prior-review-comment`
and the gemini-cli step that fetches the prior comment body when one
exists, and exposes it via a step output the prompt interpolates.

Updates the prompt with a new `## Prior review` section that:
- Documents the empty-vs-non-empty signal (first review vs update)
- Instructs that findings present in the prior review but absent from
  the current diff should be acknowledged with the resolving commit
  SHA, not re-posted
- Instructs "no blockers" summaries on follow-up runs to briefly note
  what changed (matching Claude's continuity pattern)

Closes the one usability gap observed in PR #87's first calibration
cycle — Claude noted "prior blocker resolved by commit 52b3a24",
Gemini just said "no blockers" without continuity.

find-prior-review-comment.sh is unchanged (byte-identical to its
ai-review-prompts@128656e40 source) — the continuity logic lives in
the workflow file, which is what's being calibrated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 3 of PR #87 calibration regressed: injecting a "no blockers"
prior review primed Gemini to hunt for issues without giving it
anything real to evaluate, and it manufactured two weak findings
against the prompt itself.

Two changes addressing the priming:

1. The prior-body fetch step now checks for `### N.` finding headers
   in the body before emitting. Empty / "no blockers" priors produce
   no output, so the prompt sees an empty continuity section and
   reviews the current diff normally.

2. The "## Prior review" prompt section is reframed as a
   "Continuity reference" — neutral about whether the section exists
   ("ignore it and review the current diff normally" if empty),
   permissive rather than prescriptive about referencing it
   ("your summary line can cite..." instead of "for each finding
   ... decide whether ..."). Drops the checklist-y enumeration that
   pushed the model into completing-a-checklist mode against an
   empty input. Drops the `git log` reference that contributed to
   the "do tools or don't I" confusion in single-shot mode.

Continuity behavior is preserved when it's actually useful (a real
prior finding exists) and suppressed when it isn't (clean prior or
first run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both findings from PR #87 rounds 3-4 are legitimate latent bugs in
the prompt + post-script that were carried over from the
ai-review-prompts@128656e40 source. Rounds 1-2 happened to not
surface them because the model was focused on the diff content;
once raised, they keep recurring because they remain unfixed.

Finding 1 (prompt contradiction): the prompt forbade "shell
commands, gh, or any API" while also instructing the model to
"read agent context files first" and apply a layered review scope
that requires reading the diff. In our configuration gemini-cli
has default tools.core (full shell), so the model DOES use shell
to research the diff — the prohibition was overreaching. Reworded
to allow research tools (read files, git log/blame/diff, codebase
search) while keeping the actual intent: do not attempt to post
the review yourself, the workflow handles that.

Finding 2 (marker check on indented output): the existing sed
expression in post-review-comment.sh removed leading blank lines
but preserved leading whitespace on the first non-blank line. If
the model ever indented the marker line (e.g. "  <!-- gemini-
review:v1 -->"), the case check would fail and a valid review
wouldn't post. Added a second sed pass to strip leading whitespace
from the first non-blank line. The defense against mystery content
(refuse if marker absent) is unchanged.

post-review-comment.sh is no longer byte-identical to its source
at 128656e40 — first divergence from the inlined-as-copy state.
Note for promotion: this fix belongs in ai-review-prompts when
the calibration cycle wraps up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
heskew added a commit that referenced this pull request May 21, 2026
* Bump both reviewer pins to ai-review-prompts@ea19009

Adopts the promoted Gemini calibration (ai-review-prompts#41) and
restores both reviewers to a single shared pin.

Gemini changes vs the prior pin (128656e40):
- Single-shot architecture validated empirically over 5 iteration
  rounds in PR #87 (gemini/inline-calibrate)
- Prior-review body fetched and injected as continuity context when
  the prior had findings (### N. headers); skipped on no-blockers
  priors so the model doesn't get primed to hunt for issues against
  uninformative context
- Prompt's "your output IS the PR comment" section reworded to
  allow shell/file research tools while still forbidding self-
  posting (the workflow handles posting)
- post-review-comment.sh now strips leading whitespace from the
  marker line so a non-whitespace-stable provider can't break the
  marker check

Claude reviewer is byte-identical between f22bf7d and ea19009 in
ai-review-prompts; bumping the pin is a no-op for Claude behavior
but restores the two reviewers to a single shared upgrade motion.
The "intentionally NOT in lockstep" comment from the divergence
period is replaced with a note that lockstep is restored.

PR #87 (gemini/inline-calibrate) is now superseded by this PR and
will be closed without merging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add caller-side permissions block to gemini-review.yml

Mirrors the hardening on claude-review.yml — explicit grant at the
calling-job level for the union the reusable's authorize + review
jobs need (contents: read + pull-requests: write + id-token: write).
The workflow happens to work without this on oauth today because
the repo default is permissive enough, but the explicit grant
survives repo-default changes and matches claude-review.yml's
shape and discipline.

Surfaced as a finding by Gemini's own review of this PR (#88) —
exactly the loop the calibration was built for.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heskew
Copy link
Copy Markdown
Member Author

heskew commented May 21, 2026

Superseded by #88 (which has merged). The calibration this branch was running is now in production via HarperFast/ai-review-prompts#41; oauth's gemini-review.yml caller is back to the thin-caller form pointing at ea190091328bcee674c4739ccc97dda177ecf0c5.

Calibration summary for the record (rounds 1-5 on this branch):

Round Trigger Gemini Claude
1 Inlining setup 1 valid finding (workflow_dispatch missing pr-number) Same finding
2 Fix #1 No blockers No blockers + continuity note
3 First continuity attempt 2 weak findings (priming regression) No blockers + continuity
4 Filter no-blockers priors Same 2 findings (priors were real bugs) No blockers
5 Fix the prompt/script bugs No blockers No blockers

Closing without merging — the inlined files served their purpose as a fast iteration loop; the changes have promoted upstream.

@heskew heskew closed this May 21, 2026
@heskew heskew deleted the gemini/inline-calibrate branch May 21, 2026 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant