Skip to content

feat(ci): headless PR review agent (phase 1)#28

Merged
vaderyang merged 2 commits into
mainfrom
feat/pr-review-agent
May 20, 2026
Merged

feat(ci): headless PR review agent (phase 1)#28
vaderyang merged 2 commits into
mainfrom
feat/pr-review-agent

Conversation

@vaderyang
Copy link
Copy Markdown
Collaborator

Summary

A code-review agent that fires after ci passes on a PR and posts a single PR review (APPROVE / COMMENT / REQUEST_CHANGES) before a human reviewer picks it up. The agent runs on the existing tokenscope self-hosted runner; routes Anthropic-shaped Claude Code calls through LiteLLM (172.16.103.81:4200) onto GLM-5 (SGLang :9000). No new infrastructure to stand up — all the model serving is already running on wuneng.

The point isn't to replace human review. It's to triage the obvious-in-hindsight stuff (schema mirror drift, body-column scans on wide windows, missing tanstack-query queryKey deps, classifier rules sensitive to window width — recent footguns this repo has hit) so the human reviewer arrives at a PR with the easy 80% already flagged.

Trigger sequence

1. PR opened / synchronize
2. `ci` workflow runs
3. `ci` completes with success
4. `workflow_run` fires `pr-review`
5. agent posts a single PR review

If CI fails, the agent never runs.

Files

File Role
.github/workflows/pr-review.yml workflow_run trigger gated on CI success + workflow_dispatch manual re-run; concurrency-keyed per PR
scripts/pr-review/run_review.sh Substitutes the prompt, pre-flights LiteLLM, runs claude -p with read-only tools + 1800 s outer timeout
scripts/pr-review/prompt.md Repo-specific reviewer prompt: crate map, schema-mirror rules, explicit "things this repo has been bitten by" section
scripts/pr-review/allowed_tools.txt Read / Grep / Glob / Bash(gh pr diff:*) / etc — no Edit, no Write, no unrestricted Bash
scripts/pr-review/post_review.py Picks gh-review event from agent output sections; falls back to plain comment if the bot can't review
docs/pr-review-agent.md Architecture + ops notes — runner setup, cost/latency, failure modes, phasing

Runner setup (one-time, not in this PR)

On the tokenscope-ci VM:

npm i -g @anthropic-ai/claude-code   # CLI
gh auth login                         # bot account

The runner is already labeled tokenscope (same one the ci workflow uses). Network path to LiteLLM at 172.16.103.81:4200 already exists via the libvirt bridge.

Cost / latency

GLM-5 runs on-prem — no per-request cost, the constraint is GPU minutes.

PR size Files Tokens (in / out) Wall clock
Small 1–2 20–40 K / 3–6 K 2–3 min
Medium 5–10 60–150 K / 8–15 K 5–8 min
Large 30+ 250–500 K / 15–25 K 15–25 min

Test plan

🤖 Generated with Claude Code

Vader Yang and others added 2 commits May 20, 2026 15:50
A code-review agent that fires after `ci` passes on a PR and posts
a single PR review (APPROVE / COMMENT / REQUEST_CHANGES) before a
human reviewer picks it up. The intent: triage the obvious-in-
hindsight stuff (schema mirror drift, body-column scans, missing
queryKey deps, classifier rules sensitive to window width, …) so the
human reviewer arrives at a PR with the easy 80% already surfaced.

## Pieces

* `.github/workflows/pr-review.yml`
  Trigger = `workflow_run` on `ci` completion, gated on
  `conclusion == 'success'` and `event == 'pull_request'`. Manual
  `workflow_dispatch` (with pr_number input) is kept as a re-run
  hatch. Concurrency-keyed per PR so a re-trigger cancels the
  in-flight job.

* `scripts/pr-review/run_review.sh`
  Substitutes PR_NUMBER / HEAD_SHA / BASE_REF into the prompt
  template, pre-flights LiteLLM with a 5 s curl, runs `claude -p`
  in print mode with a read-only tool allowlist + 1800 s outer
  timeout, drops the model's stdout into
  `/tmp/pr-review-${N}-out.md`.

* `scripts/pr-review/prompt.md`
  Repo-specific reviewer prompt. Carries the crate map, the schema-
  mirror rules (Rust serde ↔ console TS types), and an explicit
  "things this repo has been bitten by" section that the agent must
  actively look for (body-column scans like commit bf4887f, window-
  width-sensitive classifier heuristics like fea1d83, etc).
  Output format is strict — empty sections must be omitted, every
  Blocking/Suggestion item must cite file:line.

* `scripts/pr-review/allowed_tools.txt`
  Read / Grep / Glob / Bash(gh pr diff:*) / Bash(rg:*) / …
  No Edit, no Write, no unrestricted Bash(*) — the agent is
  read-only.

* `scripts/pr-review/post_review.py`
  Parses the agent's markdown for which sections are populated,
  picks the gh-review event accordingly (Blocking →
  REQUEST_CHANGES, Suggestions/Questions only → COMMENT, none →
  APPROVE), posts via `gh pr review`. Falls back to a plain
  `gh pr comment` when the bot can't review its own PRs.

* `docs/pr-review-agent.md`
  Architecture + ops notes — runner setup, cost/latency budget,
  failure modes, phasing.

## Model routing

Agent → LiteLLM (172.16.103.81:4200) → GLM-5 SGLang (172.16.103.81
:9000). LiteLLM rewrites the Anthropic-shaped
`claude-3-5-sonnet-20241022` calls onto GLM-5, which has
sglang's `glm47` tool-call parser configured. No new infrastructure
needed.

## Phasing

Phase 1 (this PR): manual + post-CI auto-trigger. Test on a few
real PRs to calibrate the prompt. Phase 2: tune prompt against a
labeled set of past PRs + reviewer comments. Phase 3: structured
inline review comments once line numbers prove reliable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One-off probe runnable via workflow_dispatch. Verifies the
self-hosted runner has:
  * claude CLI on PATH (Claude Code)
  * gh CLI installed + authenticated
  * python3 + envsubst
  * network path to LiteLLM at 172.16.103.81:4200
  * round-trip claude → LiteLLM → GLM-5 returns expected token

Run before merging pr-review.yml or after re-imaging the runner so
prereq failures surface as clearly-labeled step failures instead of
opaque agent crashes in the real review path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vaderyang vaderyang merged commit 091fc5b into main May 20, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant