Skip to content

Legacy Data Issue: adding all contributors.canonical_email that are not in contributor_alias.alias_email #270

@cdolfi

Description

@cdolfi

Now, collectoss adds all emails from the facade task to the contributor_alias.alias_email column. For databases that existed before collectoss there needs to be row added for all contributors.canonical_email that are not in the contributor_alias.alias_email.

Note: this is different from #269 as that one is for contributor_alias.canonical_email and this one is for contributors.canonical_email

How many are not in the alias table:

SELECT COUNT(*) FROM augur_data.contributors c WHERE NOT EXISTS ( SELECT 1 FROM augur_data.contributors_aliases ca WHERE ca.alias_email = c.cntrb_canonical ) AND c.cntrb_canonical IS NOT NULL;

Query to add to the alias table:

    alias_email,
    canonical_email,
    cntrb_active,
    cntrb_last_modified,
    tool_source,
    tool_version,
    data_source,
    data_collection_date,
    cntrb_id
)
SELECT
    c.cntrb_canonical AS alias_email,
    c.cntrb_canonical AS canonical_email,
    1 AS cntrb_active,
    NOW() AS cntrb_last_modified,
    'Manual' AS tool_source,
    '0.0.1' AS tool_version,
    'Contributors Table' AS data_source,
    NOW() AS data_collection_date,
    c.cntrb_id
FROM augur_data.contributors c
WHERE NOT EXISTS (
    SELECT 1
    FROM augur_data.contributors_aliases ca
    WHERE ca.alias_email = c.cntrb_canonical
)
AND c.cntrb_canonical IS NOT NULL;

If email is added manually it may need to get removed from the unresolved table"

DELETE FROM augur_data.unresolved_commit_emails
WHERE email IN (
    SELECT alias_email
    FROM augur_data.contributors_aliases
);

Metadata

Metadata

Assignees

No one assigned

    Labels

    databaseRelated to the unifed data model/schema

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions