fix(pr-review): per-file diff budget with manifest, replacing byte-slice truncation#234
Conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Elegant solution to a real, documented bug.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
High-impact area (core diff handling logic) but well-tested with comprehensive regression coverage. The structured file API eliminates the silent-drop failure mode entirely. Live verification pending (acknowledged in PR description).
VERDICT:
✅ Worth merging: Solves the exact problem described, clean architecture, proper tests.
KEY INSIGHT:
Switching from byte-slice truncation to structured per-file API with explicit markers converts a silent failure into visible, actionable information for the agent.
|
|
||
| patch = file.get("patch") or "" | ||
| remaining_budget = max_total - total | ||
| if remaining_budget <= len(header): |
There was a problem hiding this comment.
🟡 Suggestion: Minor observation - when remaining_budget <= len(header), appending header + "[patch omitted: ...]" can cause total to slightly exceed max_total by the marker length.
This is acceptable since the marker is essential (agent needs to know the patch was omitted), but worth documenting. Consider adding a comment:
# Note: This may push total slightly over max_total, but the marker
# is essential so the agent knows to read from workspace…ice truncation (#233) Switch agent_script.py from the raw-diff endpoint to `/pulls/{n}/files` and feed the agent a Files Changed manifest plus per-file budgeted patches. Update the code-review skill to require manifest + workspace checks before flagging a file as missing, and to ground line numbers against the file before posting inline comments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cap bump from the previous commit was scope creep. With per-file budgeting and the manifest in place, files past the cap are already visible to the agent and reachable from the workspace — the bug is fixed without raising the cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b1c63b4 to
80f647b
Compare
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Solid fix for a real bug, clean architecture, comprehensive tests. One critical documentation error must be corrected before merge.
[CRITICAL ISSUES]
See inline comment on line 92.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
High-impact area (core diff handling logic) but well-tested with comprehensive regression coverage including the exact failure mode from issue #233. The structured API eliminates the silent-drop bug entirely. Documentation mismatch must be resolved but doesn't affect the implementation quality.
VERDICT:
✅ Worth merging after fix: Solves a documented production bug with an elegant solution. Fix the MAX_TOTAL_DIFF documentation discrepancy and this is ready.
KEY INSIGHT:
Switching from byte-slice truncation to structured per-file budgeting with explicit markers transforms a silent failure mode into a visible, debuggable one - this is exactly the kind of defensive engineering that prevents future "file missing" false positives.
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Elegant fix for a real, documented bug (issue #233). The structured file API eliminates the root cause (byte-slicing a format that isn't byte-alignable), and the manifest + per-file budget with explicit markers is clean design. Comprehensive test coverage including the exact regression scenario. The unresolved thread about marker length is fine as-is — the agent needs that text.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
High-impact area (core diff handling logic) but well-tested with regression coverage for the exact failure mode. The structured API eliminates silent file drops entirely. Live verification pending (acknowledged in PR description) is acceptable given urgency and test coverage.
VERDICT:
✅ Worth merging - Solves the documented problem, clean architecture, comprehensive tests.
KEY INSIGHT:
Switching from byte-slicing raw diff to per-file structured data eliminates an entire class of bugs where arbitrary truncation points corrupt patches or silently drop files.
Closes #233.
Why
The PR review bot's diff truncation (
MAX_TOTAL_DIFF = 100000+ naive byte slice) silently dropped any patch whose bytes landed past 100K. Because the GitHub raw-diff endpoint returns files alphabetically, this systematically punished late-alphabet frontend files and allopenhands/**/*.pybackend changes on large PRs. The bot then confidently flagged dropped files as "missing from the diff" — see OpenHands/OpenHands#14401, where it blocked approval four times on a file that was actually in the PR but lived past byte 105,151 of a 190KB diff.What changed
agent_script.py— replace the raw-diff API + byte slice with/pulls/{n}/files+ per-file budget:get_pr_files(): paginated fetch of structured per-file objects (filename, status, additions, deletions, patch).format_files_manifest(): lists every file in the PR with status and +/- counts. Invariant: every file in the input appears exactly once, regardless of budget.format_patches(): emits each file's patch up toMAX_PER_FILE_PATCH = 8000chars; when the running total would exceedMAX_TOTAL_DIFF = 100000chars, later files get a header-only stub with an explicit[patch omitted: ...]marker. Per-file truncation is annotated inline with[patch abbreviated: ...]. Never silently drops a file.prompt.py— add{files_manifest}slot above the ```diff fence, with explanatory text telling the agent that abbreviated/omitted patches are still in the PR and should be read from the workspace.code-review/SKILL.md— newGROUNDINGsection requiring: (a) check the manifest + read the file from workspace before flagging anything as missing; (b) verify inline-comment line numbers withsed -n/viewrather than counting from@@hunk headers (which LLMs do unreliably).tests/test_pr_review_diff_payload.py— 10 new tests, including a regression test that mirrors the PR #14401 shape (small file behind two large ones).Test plan
uv run --group test pytest tests/test_pr_review_diff_payload.py tests/test_pr_review_prompt.py tests/test_pr_review_review_context.py— 23/23 pass.test,sync-sdk-skill,sync-extensions,validate-claude-codeall pass.Out of scope
prompt.pyis unchanged. Improvement needs measurement, not just more prompting.A matching README fix lands as a separate PR in software-agent-sdk (#3239).
🤖 Generated with Claude Code