GitHub seems to be migrating to a new node_id format (as discovered in #93): https://docs.github.com/en/graphql/guides/migrating-graphql-global-node-ids
I'm currently on the fence about whether we need a dedicated task/logic in CollectOSS to update these values, whether a collectoss-utilities fixup script will work (chaoss/collectoss-utilities#5), or whether we can just let regular collection handle it.
I initially assumed these legacy IDs would only show up on old data, but even my local test instance (which occasionally gets reset) had a surprising number of them (roughly 48k out of about 100k total pull request rows).
Even if a fixup script/job of some kind is warranted, a big issue is going to be how to efficiently query for the values that need updating, since they are essentially strings (and probably not indexed, so ILIKE is going to be slow, especially on multi-terabyte databases).
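One possible way to avoid a full-table ILIKE scan is to walk the table by primary key in fixed-size batches and do the prefix check in application code. The sketch below is only an illustration of that idea, not a proposed implementation: it uses an in-memory SQLite table as a stand-in for the real database, `pull_request_id` is an assumed name for the primary key column, and the node ID values are placeholders rather than real GitHub IDs.

```python
import sqlite3

# New-format prefix for pull request node IDs.
NEW_PR_PREFIX = "PR_"


def legacy_rows_in_batches(conn, batch_size=1000):
    """Yield lists of (pull_request_id, pr_src_node_id) lacking the new prefix.

    Walks the primary key in fixed-size batches (keyset pagination), so each
    query is a bounded range scan on the key instead of a string match over
    the whole table. The prefix check itself happens in Python.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT pull_request_id, pr_src_node_id FROM pull_requests "
            "WHERE pull_request_id > ? ORDER BY pull_request_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        last_id = rows[-1][0]
        # Rows with no node ID stored are skipped rather than treated as legacy.
        legacy = [
            row for row in rows
            if row[1] is not None and not row[1].startswith(NEW_PR_PREFIX)
        ]
        if legacy:
            yield legacy


def demo():
    """Build a tiny stand-in table and return every legacy row found."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pull_requests ("
        "pull_request_id INTEGER PRIMARY KEY, pr_src_node_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO pull_requests VALUES (?, ?)",
        [
            (1, "PR_aaa"),       # placeholder new-format value
            (2, "legacy-one"),   # placeholder legacy value
            (3, None),
            (4, "legacy-two"),   # placeholder legacy value
            (5, "PR_bbb"),       # placeholder new-format value
        ],
    )
    return [row for batch in legacy_rows_in_batches(conn, batch_size=2) for row in batch]
```

Batching by key also means a fixup job could be stopped and resumed from the last processed ID, at the cost of reading every row once.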
Leaving this here as an FYI
Some (non-exhaustive) places we store these node IDs, along with the new-format prefix for each data type:
- pull_requests.pr_src_node_id (prefix PR_)
- issues.issue_node_id (prefix I_)
- message.platform_node_id (prefix IC_ or PRRC_)
- contributor.gh_node_id (prefix U_)
- releases.release_id (prefix RE_)
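
The list above could be captured as a small lookup table that any fixup script or collection task shares. This is a rough sketch only: the dictionary and function names are made up for illustration, and the mapping covers just the columns listed here. The `like_pattern` helper is included because `_` is a single-character wildcard in SQL LIKE/ILIKE, so a prefix such as `PR_` needs escaping (and an explicit `ESCAPE '\'` clause, if the database does not default to backslash) to match literally.

```python
# Column -> new-format prefixes, taken from the list in this issue.
# Names here are illustrative, not existing CollectOSS identifiers.
NEW_FORMAT_PREFIXES = {
    "pull_requests.pr_src_node_id": ("PR_",),
    "issues.issue_node_id": ("I_",),
    "message.platform_node_id": ("IC_", "PRRC_"),
    "contributor.gh_node_id": ("U_",),
    "releases.release_id": ("RE_",),
}


def is_new_format(column, node_id):
    """Return True if node_id starts with one of the new prefixes for column."""
    # str.startswith accepts a tuple and matches if any entry is a prefix.
    return node_id.startswith(NEW_FORMAT_PREFIXES[column])


def like_pattern(prefix, escape="\\"):
    """Return a LIKE pattern that matches values starting with prefix literally.

    '%' and '_' are wildcards in LIKE, so they (and the escape character
    itself) are escaped before the trailing '%' is appended.
    """
    escaped = "".join(
        escape + ch if ch in ("%", "_", escape) else ch for ch in prefix
    )
    return escaped + "%"
```

For example, `is_new_format("message.platform_node_id", "PRRC_xyz")` is true, and `like_pattern("PR_")` gives `PR\_%`, which could be passed as a bound parameter rather than interpolated into the query string.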