From be2f49007020e1b8e6c2d2fb63edb437e2d8f1a1 Mon Sep 17 00:00:00 2001 From: Jordan Date: Thu, 21 May 2026 15:26:42 -0400 Subject: [PATCH 1/3] tighten review prompt: validate findings before posting Add an explicit draft -> validate -> filter -> post loop to the inline review prompt so findings get a second-pass check against the diff before being posted. The validator drops anything that can't be tied to a specific token in the diff, anything that depends on context outside the diff, anything CI already enforces, and style/opinion calls. Only high-signal survivors (compile/parse failures, definite logic errors, concrete security or data-loss risks, or quotable convention violations) make it into the posted comment. Output adds a 'No high-signal issues found.' fallback when validation drops everything. Goal: cut false-positive noise from the org-wide PR review bot. --- .../workflows/claude-code-review-reusable.yml | 26 ++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/.github/workflows/claude-code-review-reusable.yml b/.github/workflows/claude-code-review-reusable.yml index 780a75e..b7ca4d9 100644 --- a/.github/workflows/claude-code-review-reusable.yml +++ b/.github/workflows/claude-code-review-reusable.yml @@ -36,6 +36,30 @@ jobs: 2. Read the PR description to understand intent 3. Examine the full diff — only flag issues visible in the diff, not pre-existing code + **Process: draft → validate → filter → post** + + Do these steps in order. Do not skip the validation step. Under-reporting is better than over-reporting — false positives erode trust and waste reviewer time. + + 1. **Draft findings privately.** Make an internal list of every issue you'd flag. For each one, record: severity bucket, file:line in the diff, and a one-sentence reason. Do not post yet. + + 2. **Validate every drafted finding before posting.** For each one, re-read the cited file:line in the diff and answer honestly: + - Does the cited line actually contain the problem I described? If you can't point to specific tokens in the diff that prove the issue, drop it. + - Is this a real defect, or just a style/quality opinion? Drop opinions. + - Could this depend on context outside the diff (caller behavior, runtime state, config not shown)? If you can't validate it from the diff alone, drop it. + - Is this something CI already enforces (lint, types, formatting)? Drop it. + - Is this a pedantic nitpick a senior engineer wouldn't raise? Drop it. + - Is this a pre-existing issue not introduced by the diff? Drop it. + + 3. **Keep only HIGH-SIGNAL survivors.** A finding survives only if at least one of these is true: + - The code will fail to compile or parse (syntax errors, type errors, missing imports, unresolved references) + - The code will definitely produce wrong results regardless of inputs (clear logic error) + - There's a concrete security or data-loss risk traceable to a specific line in the diff + - A documented project convention (REVIEW.md, CLAUDE.md) is unambiguously violated, and you can quote the rule + + If you're not confident the issue is real, drop it. + + 4. **Post the review using only the surviving findings.** + **Rules:** - Every finding needs a file:line citation pointing to the exact problem in the diff - Do not speculate about issues without a direct line reference @@ -49,7 +73,7 @@ jobs: ## 🔴 Critical (security, data loss, breaking changes) ## 🟠 High (bugs, incorrect logic) ## 🟡 Medium (maintainability — max 3 items) - [Use file:line references. Skip empty sections. If nothing critical, keep it short.] + [Use file:line references. Skip empty sections. If nothing critical, keep it short. If no findings survive validation in any bucket, say "No high-signal issues found."] Post using: ```bash From 6b9c478798a83f001f098a401e1f3fa2847436fd Mon Sep 17 00:00:00 2001 From: Jordan Date: Thu, 21 May 2026 15:59:33 -0400 Subject: [PATCH 2/3] let validator look outside the diff Validation was forced to drop any finding that needed context beyond the diff to confirm, which kept too much noise out but also killed real bugs whose proof lives one file over (e.g. an awaited-but-not- async function defined elsewhere, a misnamed import). Grant the validator Read/Grep/Glob and gh-api/git-log/show/blame/diff Bash patterns so it can actually investigate. Update the prompt to encourage purposeful look-outside-the-diff checks (specific question in mind) and to use git blame/log to confirm whether a problem was introduced by the PR vs pre-existing. --- .github/workflows/claude-code-review-reusable.yml | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/.github/workflows/claude-code-review-reusable.yml b/.github/workflows/claude-code-review-reusable.yml index b7ca4d9..de7d517 100644 --- a/.github/workflows/claude-code-review-reusable.yml +++ b/.github/workflows/claude-code-review-reusable.yml @@ -42,13 +42,15 @@ jobs: 1. **Draft findings privately.** Make an internal list of every issue you'd flag. For each one, record: severity bucket, file:line in the diff, and a one-sentence reason. Do not post yet. - 2. **Validate every drafted finding before posting.** For each one, re-read the cited file:line in the diff and answer honestly: + 2. **Validate every drafted finding before posting.** The full repo is checked out and you have `Read`, `Grep`, and `Glob` available — use them to look outside the diff when it would change the verdict on a finding. Investigate with purpose: only look outside the diff when you have a specific question that would confirm or refute the issue (e.g. "is this function awaited at any callsite?", "is this symbol defined in the imported module?", "does this config key exist elsewhere?"). Don't go fishing. + + For each finding, answer honestly: - Does the cited line actually contain the problem I described? If you can't point to specific tokens in the diff that prove the issue, drop it. - Is this a real defect, or just a style/quality opinion? Drop opinions. - - Could this depend on context outside the diff (caller behavior, runtime state, config not shown)? If you can't validate it from the diff alone, drop it. + - Does this depend on context outside the diff (caller behavior, related files, config)? Go check that context — read the relevant files, grep for usages, look at the surrounding code. If after checking you still can't confirm the issue, drop it. - Is this something CI already enforces (lint, types, formatting)? Drop it. - Is this a pedantic nitpick a senior engineer wouldn't raise? Drop it. - - Is this a pre-existing issue not introduced by the diff? Drop it. + - Is this a pre-existing issue not introduced by the diff? Drop it — only flag what this PR introduces or makes worse. Use `git blame` or `git log` on the cited lines if you're unsure whether the diff introduced the problem. 3. **Keep only HIGH-SIGNAL survivors.** A finding survives only if at least one of these is true: - The code will fail to compile or parse (syntax errors, type errors, missing imports, unresolved references) @@ -83,4 +85,4 @@ jobs: )" ``` - claude_args: '--allowed-tools "Bash(gh issue view:*),Bash(gh search:*),Bash(gh issue list:*),Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr list:*)"' + claude_args: '--allowed-tools "Read,Grep,Glob,Bash(gh issue view:*),Bash(gh search:*),Bash(gh issue list:*),Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr list:*),Bash(gh api:*),Bash(git log:*),Bash(git show:*),Bash(git blame:*),Bash(git diff:*)"' From e474531299f4074cf6f2f9e18067a9adb4e1312a Mon Sep 17 00:00:00 2001 From: Jordan Date: Thu, 21 May 2026 16:00:12 -0400 Subject: [PATCH 3/3] bump review timeout to 8 minutes Validator now reads outside the diff (grep/read, gh api, git blame), which takes longer on bigger PRs. 5 min was already tight for the single-pass review; with investigation it'll be tighter. 8 gives a comfortable margin without letting a runaway review burn budget. --- .github/workflows/claude-code-review-reusable.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/claude-code-review-reusable.yml b/.github/workflows/claude-code-review-reusable.yml index de7d517..090de3a 100644 --- a/.github/workflows/claude-code-review-reusable.yml +++ b/.github/workflows/claude-code-review-reusable.yml @@ -10,7 +10,7 @@ on: jobs: claude-review: runs-on: ubuntu-latest - timeout-minutes: 5 + timeout-minutes: 8 permissions: contents: read pull-requests: write