Skip to content

fix(pr-review): per-file diff budget with manifest, replacing byte-slice truncation#234

Merged
simonrosenberg merged 2 commits into
mainfrom
fix-pr-review-diff-truncation
May 15, 2026
Merged

fix(pr-review): per-file diff budget with manifest, replacing byte-slice truncation#234
simonrosenberg merged 2 commits into
mainfrom
fix-pr-review-diff-truncation

Conversation

@simonrosenberg
Copy link
Copy Markdown
Contributor

@simonrosenberg simonrosenberg commented May 13, 2026

Closes #233.

Why

The PR review bot's diff truncation (MAX_TOTAL_DIFF = 100000 + naive byte slice) silently dropped any patch whose bytes landed past 100K. Because the GitHub raw-diff endpoint returns files alphabetically, this systematically punished late-alphabet frontend files and all openhands/**/*.py backend changes on large PRs. The bot then confidently flagged dropped files as "missing from the diff" — see OpenHands/OpenHands#14401, where it blocked approval four times on a file that was actually in the PR but lived past byte 105,151 of a 190KB diff.

What changed

agent_script.py — replace the raw-diff API + byte slice with /pulls/{n}/files + per-file budget:

  • New get_pr_files(): paginated fetch of structured per-file objects (filename, status, additions, deletions, patch).
  • New format_files_manifest(): lists every file in the PR with status and +/- counts. Invariant: every file in the input appears exactly once, regardless of budget.
  • New format_patches(): emits each file's patch up to MAX_PER_FILE_PATCH = 8000 chars; when the running total would exceed MAX_TOTAL_DIFF = 100000 chars, later files get a header-only stub with an explicit [patch omitted: ...] marker. Per-file truncation is annotated inline with [patch abbreviated: ...]. Never silently drops a file.
  • Caps unchanged from the prior code (100K total, 8K per file). With the manifest + per-file budget, the cap no longer drops whole files — abbreviated patches and the "read from workspace" skill rule cover the rest.

prompt.py — add {files_manifest} slot above the ```diff fence, with explanatory text telling the agent that abbreviated/omitted patches are still in the PR and should be read from the workspace.

code-review/SKILL.md — new GROUNDING section requiring: (a) check the manifest + read the file from workspace before flagging anything as missing; (b) verify inline-comment line numbers with sed -n / view rather than counting from @@ hunk headers (which LLMs do unreliably).

tests/test_pr_review_diff_payload.py — 10 new tests, including a regression test that mirrors the PR #14401 shape (small file behind two large ones).

Test plan

  • uv run --group test pytest tests/test_pr_review_diff_payload.py tests/test_pr_review_prompt.py tests/test_pr_review_review_context.py — 23/23 pass.
  • Full suite (rebased on fix(ci): catch up with openhands-sdk v1.22.0 import path changes #235): 269/269 pass.
  • CI green: test, sync-sdk-skill, sync-extensions, validate-claude-code all pass.
  • Live verification on a large PR after merge — confirm manifest renders and large files no longer trigger "file missing" findings.

Out of scope

  • Line-number hallucinations — the skill change addresses this with a "verify before commenting" rule, but a deeper fix (e.g. inline line-number annotations in patches) is a separate PR.
  • Cross-round dedup of false positives — the existing soft "don't repeat" instruction in prompt.py is unchanged. Improvement needs measurement, not just more prompting.

A matching README fix lands as a separate PR in software-agent-sdk (#3239).

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Elegant solution to a real, documented bug.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

High-impact area (core diff handling logic) but well-tested with comprehensive regression coverage. The structured file API eliminates the silent-drop failure mode entirely. Live verification pending (acknowledged in PR description).

VERDICT:
Worth merging: Solves the exact problem described, clean architecture, proper tests.

KEY INSIGHT:
Switching from byte-slice truncation to structured per-file API with explicit markers converts a silent failure into visible, actionable information for the agent.


patch = file.get("patch") or ""
remaining_budget = max_total - total
if remaining_budget <= len(header):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Minor observation - when remaining_budget <= len(header), appending header + "[patch omitted: ...]" can cause total to slightly exceed max_total by the marker length.

This is acceptable since the marker is essential (agent needs to know the patch was omitted), but worth documenting. Consider adding a comment:

# Note: This may push total slightly over max_total, but the marker
# is essential so the agent knows to read from workspace

Debug Agent and others added 2 commits May 13, 2026 17:55
…ice truncation (#233)

Switch agent_script.py from the raw-diff endpoint to `/pulls/{n}/files`
and feed the agent a Files Changed manifest plus per-file budgeted patches.
Update the code-review skill to require manifest + workspace checks before
flagging a file as missing, and to ground line numbers against the file
before posting inline comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cap bump from the previous commit was scope creep. With per-file
budgeting and the manifest in place, files past the cap are already
visible to the agent and reachable from the workspace — the bug is
fixed without raising the cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simonrosenberg simonrosenberg force-pushed the fix-pr-review-diff-truncation branch from b1c63b4 to 80f647b Compare May 13, 2026 15:55
Copy link
Copy Markdown
Contributor

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@xingyaoww xingyaoww requested a review from all-hands-bot May 13, 2026 19:13
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Solid fix for a real bug, clean architecture, comprehensive tests. One critical documentation error must be corrected before merge.

[CRITICAL ISSUES]

See inline comment on line 92.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

High-impact area (core diff handling logic) but well-tested with comprehensive regression coverage including the exact failure mode from issue #233. The structured API eliminates the silent-drop bug entirely. Documentation mismatch must be resolved but doesn't affect the implementation quality.

VERDICT:
Worth merging after fix: Solves a documented production bug with an elegant solution. Fix the MAX_TOTAL_DIFF documentation discrepancy and this is ready.

KEY INSIGHT:
Switching from byte-slice truncation to structured per-file budgeting with explicit markers transforms a silent failure mode into a visible, debuggable one - this is exactly the kind of defensive engineering that prevents future "file missing" false positives.

Comment thread plugins/pr-review/scripts/agent_script.py
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Elegant fix for a real, documented bug (issue #233). The structured file API eliminates the root cause (byte-slicing a format that isn't byte-alignable), and the manifest + per-file budget with explicit markers is clean design. Comprehensive test coverage including the exact regression scenario. The unresolved thread about marker length is fine as-is — the agent needs that text.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

High-impact area (core diff handling logic) but well-tested with regression coverage for the exact failure mode. The structured API eliminates silent file drops entirely. Live verification pending (acknowledged in PR description) is acceptable given urgency and test coverage.

VERDICT:
Worth merging - Solves the documented problem, clean architecture, comprehensive tests.

KEY INSIGHT:
Switching from byte-slicing raw diff to per-file structured data eliminates an entire class of bugs where arbitrary truncation points corrupt patches or silently drop files.

@simonrosenberg simonrosenberg merged commit 0275b6b into main May 15, 2026
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Code review bot repeatedly hallucinated about PR contents

3 participants