
Fix: Eliminate redundant full table scans in messages and events collection (phase 1, only events.py)#3479

Open
PredictiveManish wants to merge 4 commits into augurlabs:main from PredictiveManish:Map-once

Conversation

@PredictiveManish
Contributor

@PredictiveManish PredictiveManish commented Dec 18, 2025

Description

augur/tasks/github/events.py

  • Built issue_url_to_id_map and pr_url_to_id_map once in BulkGithubEventCollection.collect() before the batch loop

  • Updated _process_events(), _process_issue_events(), and _process_pr_events() to accept mappings as parameters

  • Removed redundant _get_map_from_*() calls from batch processing methods
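
The change described in these bullets can be modeled in plain Python. This is a simplified, hypothetical sketch, not Augur's code: `FakeDb`, `scan_count`, and the standalone `process_events()` and `collect()` functions stand in for the database and for `BulkGithubEventCollection`, and each map-building call counts as one full table scan.

```python
from typing import Dict, Iterable, List


class FakeDb:
    """Hypothetical stand-in for the issues/PRs tables; counts full scans."""

    def __init__(self, issues: Dict[str, int], prs: Dict[str, int]) -> None:
        self._issues = issues
        self._prs = prs
        self.scan_count = 0

    def issue_url_to_id(self) -> Dict[str, int]:
        self.scan_count += 1  # models one full scan of the issues table
        return dict(self._issues)

    def pr_url_to_id(self) -> Dict[str, int]:
        self.scan_count += 1  # models one full scan of the PRs table
        return dict(self._prs)


def process_events(events: List[dict], issue_map: Dict[str, int],
                   pr_map: Dict[str, int]) -> List[tuple]:
    """Resolve each event URL via the prebuilt maps; unmatched URLs are skipped."""
    resolved = []
    for event in events:
        url = event["url"]
        if url in issue_map:
            resolved.append(("issue", issue_map[url]))
        elif url in pr_map:
            resolved.append(("pr", pr_map[url]))
    return resolved


def collect(db: FakeDb, events: Iterable[dict], batch_size: int) -> List[tuple]:
    # Build both maps once, before the batch loop, and reuse them for every batch.
    issue_map = db.issue_url_to_id()
    pr_map = db.pr_url_to_id()

    results: List[tuple] = []
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            results.extend(process_events(batch, issue_map, pr_map))
            batch = []
    if batch:  # flush the final partial batch with the same maps
        results.extend(process_events(batch, issue_map, pr_map))
    return results
```

With five events and `batch_size=2` there are three batches but only two scans; rebuilding both maps inside every batch, as before this PR, would have cost six.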

Notes for Reviewers

  • This PR partially fixes #3440: Full table scans on every batch in messages and events collection

Signed commits

  • Yes, I signed my commits.

Collaborator

@MoralCode MoralCode left a comment


code LGTM! Would like to better understand how this has been tested/whether it still needs testing to confirm it works

@PredictiveManish
Contributor Author

code LGTM! Would like to better understand how this has been tested/whether it still needs testing to confirm it works

Yes, it still needs testing to confirm! I'm running into a small issue setting up my environment because I recently got a new device, so I'll share the results once it's tested.

@MoralCode MoralCode added the waiting This change is waiting for some other changes to land first label Jan 9, 2026
@MoralCode MoralCode requested a review from shlokgilda January 9, 2026 20:04
@MoralCode
Collaborator

Marked as waiting pending testing (it also probably needs a rebase).

Signed-off-by: PredictiveManish <manish.tiwari.09@zohomail.in>
shlokgilda previously approved these changes Jan 9, 2026
Collaborator

@shlokgilda shlokgilda left a comment


LGTM. Waiting to hear back about the testing status.

@MoralCode
Collaborator

Can you resolve the merge conflicts?

Signed-off-by: Manish Tiwari <manish.tiwari.09@zohomail.in>
@PredictiveManish
Contributor Author

Can you resolve the merge conflicts?

Yes, sure!

shlokgilda previously approved these changes Jan 31, 2026
Collaborator

@shlokgilda shlokgilda left a comment


LGTM.

@MoralCode MoralCode changed the title Fix: Splitting PR #3444 only events.py changes included Fix: Eliminate redundant full table scans in messages and events collection (phase 1, only events.py) Feb 4, 2026
@MoralCode
Collaborator

Once this has been tested, it can be merged.

sgoggins previously approved these changes Feb 24, 2026
Collaborator

@sgoggins sgoggins left a comment


LGTM

Signed-off-by: Manish Tiwari <manish.tiwari.09@zohomail.in>
@PredictiveManish PredictiveManish dismissed stale reviews from sgoggins and shlokgilda via ae07921 February 24, 2026 15:22
@MoralCode MoralCode added the testing Related to Augur's testing suite label Feb 24, 2026
Collaborator

@shlokgilda shlokgilda left a comment


The optimization idea is right, but issue_url_to_id_map and pr_url_to_id_map are never actually defined in collect(). They're just passed as undefined variables to _process_events, so this will crash with a NameError at runtime.

You need to add these two lines in collect() after repo_id is set, before the loop:

issue_url_to_id_map = self._get_map_from_issue_url_to_id(repo_id)     
pr_url_to_id_map = self._get_map_from_pr_url_to_id(repo_id)

The helper methods still exist but nothing calls them now, so they're dead code until this is wired up.
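Why this only fails at runtime: Python resolves names in a function body when the function executes, not when the module is imported, so events.py would import cleanly and the error would surface only once collect() runs. A minimal, hypothetical reproduction follows; the function bodies are stand-ins rather than Augur code, and the map getters are injected as parameters only to keep the sketch self-contained.

```python
def process_events(events, repo_id, issue_url_to_id_map, pr_url_to_id_map):
    # Stand-in for _process_events; just reports how many events it received.
    return len(events)


def collect_buggy(events, repo_id):
    # The maps are referenced but never assigned in any enclosing scope,
    # mirroring the bug described above: this raises NameError when called.
    return process_events(events, repo_id, issue_url_to_id_map, pr_url_to_id_map)


def collect_fixed(events, repo_id, get_issue_map, get_pr_map):
    # Assign both maps once, after repo_id is known and before any batching.
    issue_url_to_id_map = get_issue_map(repo_id)
    pr_url_to_id_map = get_pr_map(repo_id)
    return process_events(events, repo_id, issue_url_to_id_map, pr_url_to_id_map)
```

A static check such as pyflakes (flake8 rule F821, undefined name) would flag collect_buggy before it ever runs.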

Signed-off-by: Manish Tiwari <manish.tiwari.09@zohomail.in>
Collaborator

@shlokgilda shlokgilda left a comment


Left an inline comment. Should be GTG after that


# making this a decent size since process_events retrieves all the issues and prs each time
if len(events) >= event_batch_size:
    self._process_events(events, repo_id)
Collaborator


This call wasn't updated to pass the new map arguments. It'll raise a TypeError at runtime for any repo with more events than event_batch_size. Needs to be:

self._process_events(events, repo_id, issue_url_to_id_map, pr_url_to_id_map)

Also the comment right above (# making this a decent size since process_events retrieves all the issues and prs each time) is now stale and should be removed.
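A cheap regression check for this kind of missed call site is to feed the loop more events than the batch size, so that both the in-loop call and the final flush actually execute. The sketch below assumes collect() flushes a leftover partial batch after the loop; `EventCollector`, `calls`, and the trivial map helpers are hypothetical stand-ins, and only the method names `_process_events`, `_get_map_from_issue_url_to_id`, and `_get_map_from_pr_url_to_id` come from this thread.

```python
from typing import Dict, List


class EventCollector:
    """Hypothetical, simplified model of the batching loop in collect()."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.calls: List[tuple] = []  # records (batch length, issue map, pr map)

    def _get_map_from_issue_url_to_id(self, repo_id: int) -> Dict[str, int]:
        return {}  # stand-in for the real database query

    def _get_map_from_pr_url_to_id(self, repo_id: int) -> Dict[str, int]:
        return {}  # stand-in for the real database query

    def _process_events(self, events: List[dict], repo_id: int,
                        issue_url_to_id_map: Dict[str, int],
                        pr_url_to_id_map: Dict[str, int]) -> None:
        self.calls.append((len(events), issue_url_to_id_map, pr_url_to_id_map))

    def collect(self, all_events: List[dict], repo_id: int) -> None:
        issue_url_to_id_map = self._get_map_from_issue_url_to_id(repo_id)
        pr_url_to_id_map = self._get_map_from_pr_url_to_id(repo_id)

        events: List[dict] = []
        for event in all_events:
            events.append(event)
            if len(events) >= self.batch_size:
                # In-loop call site: must pass the maps as well.
                self._process_events(events, repo_id, issue_url_to_id_map, pr_url_to_id_map)
                events = []
        if events:
            # Final flush call site for the leftover partial batch.
            self._process_events(events, repo_id, issue_url_to_id_map, pr_url_to_id_map)
```

With batch_size=3 and four events, both call sites run and receive the same map objects; calling `_process_events(events, repo_id)` with the old two-argument form raises TypeError, which is the failure described above.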


Labels

testing (Related to Augur's testing suite), waiting (This change is waiting for some other changes to land first)


4 participants