
current insert_facade_contributors logic not linking previously resolved contributors #252

@collectoss-issue-migrator

Description


Note

Migrated from augurlabs/augur#3779
Originally opened by @MoralCode on 2026-03-19


https://github.com/chaoss/augur/blob/49a008ab97c43472339e400cb316a5323110d78d/augur/tasks/github/facade_github/tasks.py#L210-L246

Query error
Based on the code comments and my understanding of the implementation, this query selects all commit author emails and names from the commit table that appear in neither the contributors table nor the contributors_aliases table.

This works fine for new contributors that are not yet in the contributors table.

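This fails when a record slips through (for example, commits made with an email that was later linked to a GitHub account). Once that email is resolved, it appears in the contributors or contributors_aliases table, so the query no longer selects it. As a result, the commit records with a NULL cmt_ght_author_id are never revisited and never linked to the contributor, even though that contributor IS resolved; only the commits that slipped through remain unlinked.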
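To make the failure mode concrete, here is a minimal sketch using an in-memory SQLite database. The tables and columns are a simplified, hypothetical stand-in for augur's commit, contributors and contributors_aliases tables (augur runs on PostgreSQL and its real query has more columns and filters), and `UNMATCHED_EMAILS` only approximates the email-based selection described above. Once the email from the slipped-through commit lands in contributors, the email-based selection returns nothing, while the commit still has a NULL cmt_ght_author_id.

```python
import sqlite3

# Simplified, hypothetical stand-ins for augur's tables; the real schemas
# have many more columns and live in PostgreSQL.
SCHEMA = """
CREATE TABLE contributors (cntrb_id INTEGER PRIMARY KEY, cntrb_email TEXT);
CREATE TABLE contributors_aliases (alias_email TEXT, cntrb_id INTEGER);
CREATE TABLE commits (cmt_id INTEGER PRIMARY KEY, cmt_author_email TEXT,
                      cmt_ght_author_id INTEGER);
"""

# Approximation of the current selection: emails that appear in neither
# the contributors table nor the aliases table. NULLs are filtered out of
# the subqueries because NOT IN against a NULL never evaluates to true.
UNMATCHED_EMAILS = """
SELECT DISTINCT c.cmt_author_email
FROM commits c
WHERE c.cmt_author_email NOT IN
        (SELECT cntrb_email FROM contributors WHERE cntrb_email IS NOT NULL)
  AND c.cmt_author_email NOT IN
        (SELECT alias_email FROM contributors_aliases WHERE alias_email IS NOT NULL)
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Commit 1 slipped through: its author email was not resolvable at the time.
    conn.execute("INSERT INTO commits VALUES (1, 'dev@example.com', NULL)")
    # Later, that email gets linked to a contributor (e.g. via another repo).
    conn.execute("INSERT INTO contributors VALUES (42, 'dev@example.com')")
    selected = [row[0] for row in conn.execute(UNMATCHED_EMAILS)]
    still_unlinked = [row[0] for row in conn.execute(
        "SELECT cmt_id FROM commits WHERE cmt_ght_author_id IS NULL")]
    conn.close()
    return selected, still_unlinked


if __name__ == "__main__":
    selected, unlinked = demo()
    print("emails selected for resolution:", selected)
    print("commits still unlinked:", unlinked)
```

Running it shows an empty selection next to a commit that is still unlinked, which is exactly the gap described above.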

Last Collection Date

Since a filter on last collection date was added (by @IsaacMilarky in 8539825bb217c388735dfa1bc43d25dc4cee0d51, PR augurlabs/augur#3253), this problem is worse: anything that slipped through or didn't get properly linked is now systematically ignored because of its older last collection date.

When this PR was filed, I called out that the change it was making seemed unrelated to the analyze_commits_in_parallel problem that was being solved at the time (augurlabs/augur#3253 (comment)).

This query is essentially run as a precondition check to establish which records we should attempt contributor resolution on.

Currently, our logic finds all email addresses that haven't been matched yet, checking them against two sources: the contributors table and the aliases table (see #237).

That is much less simple than "we should run contributor resolution on all commits we don't currently have linked to a contributor", i.e. "is cmt_ght_author_id NULL?"
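A minimal sketch of the simpler precondition, again using a simplified, hypothetical SQLite schema rather than augur's real PostgreSQL tables. `NEEDS_RESOLUTION` selects every commit whose cmt_ght_author_id is NULL. `BACKFILL_KNOWN` is a hypothetical extra step (not something augur currently does) that links commits whose email is already known in contributors or contributors_aliases, leaving only genuinely unknown emails for full contributor resolution.

```python
import sqlite3

# Simplified, hypothetical stand-ins for augur's tables.
SCHEMA = """
CREATE TABLE contributors (cntrb_id INTEGER PRIMARY KEY, cntrb_email TEXT);
CREATE TABLE contributors_aliases (alias_email TEXT, cntrb_id INTEGER);
CREATE TABLE commits (cmt_id INTEGER PRIMARY KEY, cmt_author_email TEXT,
                      cmt_ght_author_id INTEGER);
"""

# Proposed precondition: every commit without a linked contributor is a candidate.
NEEDS_RESOLUTION = """
SELECT cmt_id, cmt_author_email FROM commits
WHERE cmt_ght_author_id IS NULL
ORDER BY cmt_id
"""

# Hypothetical back-fill: link candidates whose email is already resolved,
# checking contributors first and then aliases. If neither matches, the
# scalar subquery yields NULL and the commit stays unlinked.
BACKFILL_KNOWN = """
UPDATE commits
SET cmt_ght_author_id = (
    SELECT cntrb_id FROM contributors
    WHERE cntrb_email = commits.cmt_author_email
    UNION ALL
    SELECT cntrb_id FROM contributors_aliases
    WHERE alias_email = commits.cmt_author_email
    LIMIT 1
)
WHERE cmt_ght_author_id IS NULL
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", [
        (1, "dev@example.com", None),   # slipped through earlier
        (2, "new@example.com", None),   # genuinely new author
        (3, "old@example.com", 7),      # already linked
    ])
    # dev@example.com was resolved after commit 1 was collected.
    conn.execute("INSERT INTO contributors VALUES (42, 'dev@example.com')")
    before = conn.execute(NEEDS_RESOLUTION).fetchall()
    conn.execute(BACKFILL_KNOWN)
    after = conn.execute(NEEDS_RESOLUTION).fetchall()
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("candidates before back-fill:", before)
    print("candidates left for full resolution:", after)
```

Because the selection keys off the link itself rather than off whether an email exists somewhere, a commit that slipped through is picked up again on the next run instead of being silently skipped.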

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), deployed version (Live problems with deployed version), high priority (Blocking multiple other things, causing data loss, or other incredibly urgent things), tech debt
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
