feat(ci): headless PR review agent (phase 1) by vaderyang · Pull Request #28 · Netis/TokenScope

vaderyang · 2026-05-20T07:51:29Z

Summary

A code-review agent that fires after ci passes on a PR and posts a single PR review (APPROVE / COMMENT / REQUEST_CHANGES) before a human reviewer picks it up. The agent runs on the existing tokenscope self-hosted runner; routes Anthropic-shaped Claude Code calls through LiteLLM (172.16.103.81:4200) onto GLM-5 (SGLang :9000). No new infrastructure to stand up — all the model serving is already running on wuneng.

The point isn't to replace human review. It's to triage the obvious-in-hindsight stuff (schema mirror drift, body-column scans on wide windows, missing tanstack-query queryKey deps, classifier rules sensitive to window width — recent footguns this repo has hit) so the human reviewer arrives at a PR with the easy 80% already flagged.

Trigger sequence

1. PR opened / synchronize
2. `ci` workflow runs
3. `ci` completes with success
4. `workflow_run` fires `pr-review`
5. agent posts a single PR review

If CI fails, the agent never runs.

Files

File	Role
`.github/workflows/pr-review.yml`	`workflow_run` trigger gated on CI success + `workflow_dispatch` manual re-run; concurrency-keyed per PR
`scripts/pr-review/run_review.sh`	Substitutes the prompt, pre-flights LiteLLM, runs `claude -p` with read-only tools + 1800 s outer timeout
`scripts/pr-review/prompt.md`	Repo-specific reviewer prompt: crate map, schema-mirror rules, explicit "things this repo has been bitten by" section
`scripts/pr-review/allowed_tools.txt`	Read / Grep / Glob / `Bash(gh pr diff:*)` / etc — no Edit, no Write, no unrestricted Bash
`scripts/pr-review/post_review.py`	Picks gh-review event from agent output sections; falls back to plain comment if the bot can't review
`docs/pr-review-agent.md`	Architecture + ops notes — runner setup, cost/latency, failure modes, phasing

Runner setup (one-time, not in this PR)

On the tokenscope-ci VM:

npm i -g @anthropic-ai/claude-code   # CLI
gh auth login                         # bot account

The runner is already labeled tokenscope (same one the ci workflow uses). Network path to LiteLLM at 172.16.103.81:4200 already exists via the libvirt bridge.

Cost / latency

GLM-5 runs on-prem — no per-request cost, the constraint is GPU minutes.

PR size	Files	Tokens (in / out)	Wall clock
Small	1–2	20–40 K / 3–6 K	2–3 min
Medium	5–10	60–150 K / 8–15 K	5–8 min
Large	30+	250–500 K / 15–25 K	15–25 min

Test plan

Land this PR with manual-trigger only (the workflow_dispatch path).
gh workflow run pr-review.yml -f pr_number=<test PR> against a small recent PR; verify it posts a review comment.
Re-run against PR feat(services): per-endpoint Services page (server_ip:port → models + perf) #25 (medium) and PR feat(services): Path view + Overview agent charts (deploy roll-up) #27 (large) to calibrate.
Enable the workflow_run auto-trigger once the prompt produces reviewer-grade output on the three test PRs.
Watch the first auto-fired review for line-number accuracy and false-positive rate.

🤖 Generated with Claude Code

A code-review agent that fires after `ci` passes on a PR and posts a single PR review (APPROVE / COMMENT / REQUEST_CHANGES) before a human reviewer picks it up. The intent: triage the obvious-in- hindsight stuff (schema mirror drift, body-column scans, missing queryKey deps, classifier rules sensitive to window width, …) so the human reviewer arrives at a PR with the easy 80% already surfaced. ## Pieces * `.github/workflows/pr-review.yml` Trigger = `workflow_run` on `ci` completion, gated on `conclusion == 'success'` and `event == 'pull_request'`. Manual `workflow_dispatch` (with pr_number input) is kept as a re-run hatch. Concurrency-keyed per PR so a re-trigger cancels the in-flight job. * `scripts/pr-review/run_review.sh` Substitutes PR_NUMBER / HEAD_SHA / BASE_REF into the prompt template, pre-flights LiteLLM with a 5 s curl, runs `claude -p` in print mode with a read-only tool allowlist + 1800 s outer timeout, drops the model's stdout into `/tmp/pr-review-${N}-out.md`. * `scripts/pr-review/prompt.md` Repo-specific reviewer prompt. Carries the crate map, the schema- mirror rules (Rust serde ↔ console TS types), and an explicit "things this repo has been bitten by" section that the agent must actively look for (body-column scans like commit bf4887f, window- width-sensitive classifier heuristics like fea1d83, etc). Output format is strict — empty sections must be omitted, every Blocking/Suggestion item must cite file:line. * `scripts/pr-review/allowed_tools.txt` Read / Grep / Glob / Bash(gh pr diff:*) / Bash(rg:*) / … No Edit, no Write, no unrestricted Bash(*) — the agent is read-only. * `scripts/pr-review/post_review.py` Parses the agent's markdown for which sections are populated, picks the gh-review event accordingly (Blocking → REQUEST_CHANGES, Suggestions/Questions only → COMMENT, none → APPROVE), posts via `gh pr review`. Falls back to a plain `gh pr comment` when the bot can't review its own PRs. * `docs/pr-review-agent.md` Architecture + ops notes — runner setup, cost/latency budget, failure modes, phasing. ## Model routing Agent → LiteLLM (172.16.103.81:4200) → GLM-5 SGLang (172.16.103.81 :9000). LiteLLM rewrites the Anthropic-shaped `claude-3-5-sonnet-20241022` calls onto GLM-5, which has sglang's `glm47` tool-call parser configured. No new infrastructure needed. ## Phasing Phase 1 (this PR): manual + post-CI auto-trigger. Test on a few real PRs to calibrate the prompt. Phase 2: tune prompt against a labeled set of past PRs + reviewer comments. Phase 3: structured inline review comments once line numbers prove reliable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

One-off probe runnable via workflow_dispatch. Verifies the self-hosted runner has: * claude CLI on PATH (Claude Code) * gh CLI installed + authenticated * python3 + envsubst * network path to LiteLLM at 172.16.103.81:4200 * round-trip claude → LiteLLM → GLM-5 returns expected token Run before merging pr-review.yml or after re-imaging the runner so prereq failures surface as clearly-labeled step failures instead of opaque agent crashes in the real review path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Vader Yang and others added 2 commits May 20, 2026 15:50

vaderyang merged commit 091fc5b into main May 20, 2026
1 check passed

vaderyang mentioned this pull request May 20, 2026

fix(pr-review): :4000 LiteLLM + auth secret + NO_PROXY + auto-merge trusted PRs #31

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ci): headless PR review agent (phase 1)#28

feat(ci): headless PR review agent (phase 1)#28
vaderyang merged 2 commits into
mainfrom
feat/pr-review-agent

vaderyang commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vaderyang commented May 20, 2026

Summary

Trigger sequence

Files

Runner setup (one-time, not in this PR)

Cost / latency

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant