Skip to content

[tasks/github] Batch processing for PR reviews collection#3439

Merged
sgoggins merged 4 commits intochaoss:mainfrom
shlokgilda:fix/pr-reviews-batched-processing
Dec 17, 2025
Merged

[tasks/github] Batch processing for PR reviews collection#3439
sgoggins merged 4 commits intochaoss:mainfrom
shlokgilda:fix/pr-reviews-batched-processing

Conversation

@shlokgilda
Copy link
Collaborator

@shlokgilda shlokgilda commented Dec 4, 2025

Description

  • Adds batched processing to collect_pull_request_reviews to reduce memory usage
  • Processes reviews in batches of 1000 instead of accumulating all in memory before insertion
  • Combines contributor and review extraction into a single pass (was two separate loops)
  • Removes redundant repo_id database lookup
  • Fixes incorrect argument order in extract_needed_pr_review_data call (was storing wrong metadata values)

Notes for Reviewers

  • The original code had a subtle bug: it passed tool_source, tool_version where tool_version, data_source was expected. This didn't crash because tool_source is hardcoded inside the function, but it stored "pull_request_reviews_reviews" as tool_version and "2.0" as data_source instead of the correct values.
  • Memory impact: For a repo with 5000 PRs averaging 3 reviews each, old code held ~15,000 review objects in memory. New code caps at ~1000.
  • Follows the same batching pattern already used in collect_pull_requests.

Testing

  • Testing this code with large repos with around 20-30K PRs each. I can see the logs and lower memory usage for workers working on collect_pull_request_reviews. Will post testing related updates in comments if any.

Signed commits

  • Yes, I signed my commits.

AI Disclosure: I used Claude Code to generate docstrings and write this PR draft.

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
@shlokgilda shlokgilda marked this pull request as draft December 4, 2025 00:16
@shlokgilda shlokgilda marked this pull request as ready for review December 4, 2025 01:42
Copy link
Contributor

@MoralCode MoralCode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Im a little hesitant about introducing new functions for inserting things into the DB, but I can see that this was added to deduplicate the code since you need to repeat the call after the batches are processed to get any remaining items that don't fill a full batch.

@shlokgilda
Copy link
Collaborator Author

Testing update: Did not encounter any errors. Table values correctly updated. Tested with https://github.com/tensorflow/tensorflow (large repo; ~75K PRs).

@MoralCode MoralCode added the discussion Seeking active feedback, usually for items under active development label Dec 6, 2025
@sgoggins
Copy link
Member

sgoggins commented Dec 9, 2025

@shlokgilda : I merged a PR that fixed a seperate issue on PR reviews from @MoralCode just now ... which created a conflict with your code. I think it looks pretty straightforward as both changes add safety. While I am nearly certain I know how to resolve the conflict while incorporating both changes, I would like @shlokgilda to change it in their fork/branch and update the PR ... So I don't mess something up editing online.

sgoggins
sgoggins previously approved these changes Dec 9, 2025
Copy link
Member

@sgoggins sgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needs the conflict I introduced by merging another PR fixed.

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
@shlokgilda
Copy link
Collaborator Author

@sgoggins Fixed and pushed.

ABrain7710
ABrain7710 previously approved these changes Dec 10, 2025
@MoralCode MoralCode requested a review from sgoggins December 11, 2025 14:52
@MoralCode MoralCode added the ready Items tested and seeking additional approvals or a merge. Usually for items under active development label Dec 15, 2025
sgoggins
sgoggins previously approved these changes Dec 16, 2025
Copy link
Member

@sgoggins sgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sgoggins
Copy link
Member

@shlokgilda : We will merge this as soon as the conflict is resolved! Thanks!

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
@shlokgilda shlokgilda dismissed stale reviews from sgoggins and ABrain7710 via 6f9ebef December 16, 2025 05:08
@shlokgilda
Copy link
Collaborator Author

@sgoggins Merge conflict resolved.

Copy link
Contributor

@MoralCode MoralCode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments were addressed. I trust that this has been tested by the author and the code looks reasonable and well documented

Copy link
Member

@sgoggins sgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sgoggins sgoggins merged commit 28bc9d4 into chaoss:main Dec 17, 2025
13 of 14 checks passed
@shlokgilda shlokgilda deleted the fix/pr-reviews-batched-processing branch December 17, 2025 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

discussion Seeking active feedback, usually for items under active development ready Items tested and seeking additional approvals or a merge. Usually for items under active development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants