Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch databaseId for repos and users when initializing streams #112

Merged
merged 15 commits into from
Apr 29, 2022

Conversation

laurentS
Copy link
Contributor

This PR aims to resolve a number of issues by adding the numeric repo id to the context of all streams in the tap.

We have been having issues with repos changing names, incorrect case in the config file, etc... leading to problems with joins downstream. We are using org/repo to connect the various streams together, but since those values are not stable (for instance if a repo is renamed), this can lead to issues.

To remedy this, we fetch the github id (or databaseId in their graphql API lingo) and add it to all streams via the child context. This id is stable across renames, which should help match records over time.

The PR adds a "preflight" request to the graphql API which fetches said id for all the repos in the config file (if repositories is indeed in the config). At the same time, it:

  • gets the correct casing for mistyped names (so meltanolabs/tap-github is corrected to MeltanoLabs/tap-github, see Tap-github allows inputting buggy data into its output #110)
  • gets new names for repos that have been renamed (running a REST API call on myorg/oldname will return an HTTP 301, but we save this step here)
  • filters out repos which don't exist and issues a warning in the logs for each of them. @aaronsteers you mentioned elsewhere that you'd be in favour of letting the tap error out, does the above seem like an acceptable trade-off to you?

These changes should help increase the resilience of the tap to invalid manual input, as well as repo renames. If this logic seems ok to everyone, we can also apply it to the user streams, as username are likely to change more often than repo names.

tap_github/repository_streams.py Outdated Show resolved Hide resolved
@laurentS
Copy link
Contributor Author

@edgarrmondragon is there a way to disable sonarcloud notifications? It's particularly annoying to get a github notification + email each time we're pushing a commit, to tell me it's green. I've looked around sonarcloud and their user forum, but I suspect this is configurable at admin level, from what I'm reading.

@ericboucher ericboucher linked an issue Apr 28, 2022 that may be closed by this pull request
@edgarrmondragon
Copy link
Member

@edgarrmondragon is there a way to disable sonarcloud notifications? It's particularly annoying to get a github notification + email each time we're pushing a commit, to tell me it's green. I've looked around sonarcloud and their user forum, but I suspect this is configurable at admin level, from what I'm reading.

@laurentS I don't think it's possible at the moment to disable the PR comments without uninstalling SonarCloud completely: https://community.sonarsource.com/t/disable-github-merge-request-comments/18520

@ericboucher ericboucher changed the title Add repo id on streams Fetch databaseId for repos and users when initializing streams Apr 28, 2022
pytest.ini Outdated Show resolved Hide resolved
@sonarcloud
Copy link

sonarcloud bot commented Apr 29, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tap-github allows inputting buggy data into its output
3 participants