Skip to content

Legacy Data Issue: adding all commit.author_emails that have a cntrb_id are not in contributor_alias.alias_email #274

@cdolfi

Description

@cdolfi

Now, collectoss adds all emails from the facade task to the contributor_alias.alias_email column. For databases that existed before collectoss there needs to be row added for all commit.author_emails that have a cntrb_id are not in contributor_alias.alias_email

Note: this is different from #269/#270

Query to add to the alias table:

INSERT INTO augur_data.contributors_aliases (
    alias_email,
    canonical_email,
    cntrb_active,
    cntrb_last_modified,
    tool_source,
    tool_version,
    data_source,
    data_collection_date,
    cntrb_id
)
SELECT DISTINCT ON (c.cmt_author_email)
    c.cmt_author_email AS alias_email,
    c.cmt_author_email AS canonical_email,
    1 AS cntrb_active,
    NOW() AS cntrb_last_modified,
    'Manual' AS tool_source,
    '0.0.1' AS tool_version,
    'Commit table' AS data_source,
    NOW() AS data_collection_date,
    c.cmt_ght_author_id AS cntrb_id
FROM augur_data.commits c
WHERE c.cmt_ght_author_id IS NOT NULL
AND c.cmt_author_email IS NOT NULL
AND c.cmt_author_email LIKE '%@%'
AND NOT EXISTS (
    SELECT 1
    FROM augur_data.contributors_aliases ca
    WHERE ca.alias_email = c.cmt_author_email
);

If email is added manually it may need to get removed from the unresolved table"

DELETE FROM augur_data.unresolved_commit_emails
WHERE email IN (
    SELECT alias_email
    FROM augur_data.contributors_aliases
);

Metadata

Metadata

Assignees

No one assigned

    Labels

    data issuesissues that require going back and fixing affected data in a deployed database

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions