Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing endpoints available in singer-io/tap-github #93

Merged
merged 50 commits into from
Apr 7, 2022

Conversation

Ry-DS
Copy link
Contributor

@Ry-DS Ry-DS commented Mar 10, 2022

Closes #83

TODO:

  • Review (Meltano)
  • Discuss the code duplication (CI SonarCloud)
  • Convert Reviews and Pull Request Commits to GraphQL streams (they are too slow with rest since PR is the parent) See Add missing endpoints available in singer-io/tap-github #93 (comment)
  • Add pagination for the Graphql client #89 Decided that pagination of the graphql api is not worthwhile to explore (see above comment)
  • Test all streams individually to make sure they give some output
  • (Maybe) write test just for organization
  • Make sure the following are correct
    • Parent keys are missing in the new streams. Is this ok? Decided to add them
    • Set ignore_parent_replication_key to true unless sure if not
  • Finish implementing all streams

Streams:

  • Collaborators
  • Assignees
  • Reviews
  • Review Comments
  • Commit Comments
  • Issue Milestones
  • Projects
  • Project Cards
  • Project Columns
  • PR Commits
  • Releases
  • Organization stream
    • Teams
    • Team members
    • Team memberships

This PR involved a lot of json to propertylist conversions. Here are some regex conversion expressions to make things easier:

  • Automatic string type conversion: "(\w+)": .+" th.Property("$1", th.StringType)
  • Integer type conversion: "(\w+)": \d+ th.Property("$1", th.IntegerType)
  • Boolean type conversion: "(\w+)": (true|false)+ th.Property("$1", th.BooleanType)
  • Object type conversion "(\w+)": \{(.|\n+?)\}, th.Property("$1", th.ObjectType($2),),

Maybe we could standardise the above regex queries and document them in the README?

Discussion points:

  • We could use the graphql api for Pull Request Reviews if we want to extend our quota potential. Thoughts? Went with the path of least resistance for now (REST) but if you do see huge savings by using graphql let me know.
  • Not sure how to fix the test as it now needs authentication to work (collaborators needs a github token to work)

@Ry-DS Ry-DS marked this pull request as ready for review March 10, 2022 15:10
@ericboucher ericboucher marked this pull request as draft March 11, 2022 16:53
@ericboucher
Copy link
Contributor

@Ry-DS make sure to have a look @edgarrmondragon's PR #31 to avoid duplication

@Ry-DS
Copy link
Contributor Author

Ry-DS commented Mar 13, 2022

Conversions using regex patterns in description. Some need manual fixes such as arrays, nested objects, and null as the regex doesn't match those.

Click to expand
// projects
{
 th.Property("owner_url", th.StringType),
 th.Property("url", th.StringType),
 th.Property("html_url", th.StringType),
 th.Property("columns_url", th.StringType),
 th.Property("id", th.IntegerType),
 th.Property("node_id", th.StringType),
 th.Property("name", th.StringType),
 th.Property("body", th.StringType),
 th.Property("number", th.IntegerType),
 th.Property("state", th.StringType),
 th.Property("creator", th.ObjectType(
   th.Property("login", th.StringType),
   th.Property("id", th.IntegerType),
   th.Property("node_id", th.StringType),
   th.Property("avatar_url", th.StringType),
   th.Property("gravatar_id", th.StringType),
   th.Property("url", th.StringType),
   th.Property("html_url", th.StringType),
   th.Property("followers_url", th.StringType),
   th.Property("following_url", th.StringType),
   th.Property("gists_url", th.StringType),
   th.Property("starred_url", th.StringType),
   th.Property("subscriptions_url", th.StringType),
   th.Property("organizations_url", th.StringType),
   th.Property("repos_url", th.StringType),
   th.Property("events_url", th.StringType),
   th.Property("received_events_url", th.StringType),
   th.Property("type", th.StringType),
   th.Property("site_admin", th.BooleanType)
 ),),
 th.Property("created_at", th.StringType),
 th.Property("updated_at", th.StringType)
}

// project cards
{
 th.Property("url", th.StringType),
 th.Property("id", th.IntegerType),
 th.Property("node_id", th.StringType),
 th.Property("note", th.StringType),
 th.Property("creator", th.ObjectType(
   th.Property("login", th.StringType),
   th.Property("id", th.IntegerType),
   th.Property("node_id", th.StringType),
   th.Property("avatar_url", th.StringType),
   th.Property("gravatar_id", th.StringType),
   th.Property("url", th.StringType),
   th.Property("html_url", th.StringType),
   th.Property("followers_url", th.StringType),
   th.Property("following_url", th.StringType),
   th.Property("gists_url", th.StringType),
   th.Property("starred_url", th.StringType),
   th.Property("subscriptions_url", th.StringType),
   th.Property("organizations_url", th.StringType),
   th.Property("repos_url", th.StringType),
   th.Property("events_url", th.StringType),
   th.Property("received_events_url", th.StringType),
   th.Property("type", th.StringType),
   th.Property("site_admin", th.BooleanType)
 ),),
 th.Property("created_at", th.StringType),
 th.Property("updated_at", th.StringType),
 th.Property("archived", th.BooleanType),
 th.Property("column_url", th.StringType),
 th.Property("content_url", th.StringType),
 th.Property("project_url", th.StringType)
}
// project columns
{
 th.Property("url", th.StringType),
 th.Property("project_url", th.StringType),
 th.Property("cards_url", th.StringType),
 th.Property("id", th.IntegerType),
 th.Property("node_id", th.StringType),
 th.Property("name", th.StringType),
 th.Property("created_at", th.StringType),
 th.Property("updated_at", th.StringType)
}
// pr commits
{
 th.Property("url", th.StringType),
 th.Property("sha", th.StringType),
 th.Property("node_id", th.StringType),
 th.Property("html_url", th.StringType),
 th.Property("comments_url", th.StringType),
 th.Property("commit", th.ObjectType(
   th.Property("url", th.StringType),
   th.Property("author", th.ObjectType(
     th.Property("name", th.StringType),
     th.Property("email", th.StringType),
     th.Property("date", th.StringType)
   ),),
   th.Property("committer", th.ObjectType(
     th.Property("name", th.StringType),
     th.Property("email", th.StringType),
     th.Property("date", th.StringType)
   ),),
   th.Property("message", th.StringType),
   th.Property("tree", th.ObjectType(
     th.Property("url", th.StringType),
     th.Property("sha", th.StringType)
   ),),
   th.Property("comment_count", th.IntegerType),
   th.Property("verification", th.ObjectType(
     th.Property("verified", th.BooleanType),
     th.Property("reason", th.StringType),
     "signature": null,
     "payload": null
   ),),
 ),),
 th.Property("author", th.ObjectType(
   th.Property("login", th.StringType),
   th.Property("id", th.IntegerType),
   th.Property("node_id", th.StringType),
   th.Property("avatar_url", th.StringType),
   th.Property("gravatar_id", th.StringType),
   th.Property("url", th.StringType),
   th.Property("html_url", th.StringType),
   th.Property("followers_url", th.StringType),
   th.Property("following_url", th.StringType),
   th.Property("gists_url", th.StringType),
   th.Property("starred_url", th.StringType),
   th.Property("subscriptions_url", th.StringType),
   th.Property("organizations_url", th.StringType),
   th.Property("repos_url", th.StringType),
   th.Property("events_url", th.StringType),
   th.Property("received_events_url", th.StringType),
   th.Property("type", th.StringType),
   th.Property("site_admin", th.BooleanType)
 ),),
 th.Property("committer", th.ObjectType(
   th.Property("login", th.StringType),
   th.Property("id", th.IntegerType),
   th.Property("node_id", th.StringType),
   th.Property("avatar_url", th.StringType),
   th.Property("gravatar_id", th.StringType),
   th.Property("url", th.StringType),
   th.Property("html_url", th.StringType),
   th.Property("followers_url", th.StringType),
   th.Property("following_url", th.StringType),
   th.Property("gists_url", th.StringType),
   th.Property("starred_url", th.StringType),
   th.Property("subscriptions_url", th.StringType),
   th.Property("organizations_url", th.StringType),
   th.Property("repos_url", th.StringType),
   th.Property("events_url", th.StringType),
   th.Property("received_events_url", th.StringType),
   th.Property("type", th.StringType),
   th.Property("site_admin", th.BooleanType)
 ),),
 "parents": [
   {
     th.Property("url", th.StringType),
     th.Property("sha", th.StringType)
   }
 ]
}


// releases
{
 th.Property("url", th.StringType),
 th.Property("html_url", th.StringType),
 th.Property("assets_url", th.StringType),
 th.Property("upload_url", th.StringType),
 th.Property("tarball_url", th.StringType),
 th.Property("zipball_url", th.StringType),
 th.Property("id", th.IntegerType),
 th.Property("node_id", th.StringType),
 th.Property("tag_name", th.StringType),
 th.Property("target_commitish", th.StringType),
 th.Property("name", th.StringType),
 th.Property("body", th.StringType),
 th.Property("draft", th.BooleanType),
 th.Property("prerelease", th.BooleanType),
 th.Property("created_at", th.StringType),
 th.Property("published_at", th.StringType),
 th.Property("author", th.ObjectType(
   th.Property("login", th.StringType),
   th.Property("id", th.IntegerType),
   th.Property("node_id", th.StringType),
   th.Property("avatar_url", th.StringType),
   th.Property("gravatar_id", th.StringType),
   th.Property("url", th.StringType),
   th.Property("html_url", th.StringType),
   th.Property("followers_url", th.StringType),
   th.Property("following_url", th.StringType),
   th.Property("gists_url", th.StringType),
   th.Property("starred_url", th.StringType),
   th.Property("subscriptions_url", th.StringType),
   th.Property("organizations_url", th.StringType),
   th.Property("repos_url", th.StringType),
   th.Property("events_url", th.StringType),
   th.Property("received_events_url", th.StringType),
   th.Property("type", th.StringType),
   th.Property("site_admin", th.BooleanType)
 ),),
 "assets": [
   {
     th.Property("url", th.StringType),
     th.Property("browser_download_url", th.StringType),
     th.Property("id", th.IntegerType),
     th.Property("node_id", th.StringType),
     th.Property("name", th.StringType),
     th.Property("label", th.StringType),
     th.Property("state", th.StringType),
     th.Property("content_type", th.StringType),
     th.Property("size", th.IntegerType),
     th.Property("download_count", th.IntegerType),
     th.Property("created_at", th.StringType),
     th.Property("updated_at", th.StringType),
     th.Property("uploader", th.ObjectType(
       th.Property("login", th.StringType),
       th.Property("id", th.IntegerType),
       th.Property("node_id", th.StringType),
       th.Property("avatar_url", th.StringType),
       th.Property("gravatar_id", th.StringType),
       th.Property("url", th.StringType),
       th.Property("html_url", th.StringType),
       th.Property("followers_url", th.StringType),
       th.Property("following_url", th.StringType),
       th.Property("gists_url", th.StringType),
       th.Property("starred_url", th.StringType),
       th.Property("subscriptions_url", th.StringType),
       th.Property("organizations_url", th.StringType),
       th.Property("repos_url", th.StringType),
       th.Property("events_url", th.StringType),
       th.Property("received_events_url", th.StringType),
       th.Property("type", th.StringType),
       th.Property("site_admin", th.BooleanType)
     ),),
   }
 ]
}
// teams
{
 th.Property("id", th.IntegerType),
 th.Property("node_id", th.StringType),
 th.Property("url", th.StringType),
 th.Property("html_url", th.StringType),
 th.Property("name", th.StringType),
 th.Property("slug", th.StringType),
 th.Property("description", th.StringType),
 th.Property("privacy", th.StringType),
 th.Property("permission", th.StringType),
 th.Property("members_url", th.StringType),
 th.Property("repositories_url", th.StringType),
 "parent": null
}

// team members
{
 th.Property("login", th.StringType),
 th.Property("id", th.IntegerType),
 th.Property("node_id", th.StringType),
 th.Property("avatar_url", th.StringType),
 th.Property("gravatar_id", th.StringType),
 th.Property("url", th.StringType),
 th.Property("html_url", th.StringType),
 th.Property("followers_url", th.StringType),
 th.Property("following_url", th.StringType),
 th.Property("gists_url", th.StringType),
 th.Property("starred_url", th.StringType),
 th.Property("subscriptions_url", th.StringType),
 th.Property("organizations_url", th.StringType),
 th.Property("repos_url", th.StringType),
 th.Property("events_url", th.StringType),
 th.Property("received_events_url", th.StringType),
 th.Property("type", th.StringType),
 th.Property("site_admin", th.BooleanType)
}

// team roles
{
th.Property("url", th.StringType),
th.Property("role", th.StringType),
th.Property("state", th.StringType)
}

todo_streams.py Outdated Show resolved Hide resolved
Copy link
Contributor

@laurentS laurentS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a lot! I think there's not much missing, but some changes are probably required for the tap to work.
I've only written some of the comments once to avoid repeating the same thing over and over, but they might apply to multiple streams.

.github/workflows/test_tap.yml Show resolved Hide resolved
tap_github/organization_streams.py Show resolved Hide resolved
tap_github/organization_streams.py Outdated Show resolved Hide resolved
tap_github/organization_streams.py Outdated Show resolved Hide resolved
tap_github/organization_streams.py Outdated Show resolved Hide resolved
tap_github/repository_streams.py Outdated Show resolved Hide resolved
tap_github/repository_streams.py Outdated Show resolved Hide resolved
tap_github/streams.py Outdated Show resolved Hide resolved
tap_github/streams.py Show resolved Hide resolved
tap_github/tests/test_tap.py Outdated Show resolved Hide resolved
@Ry-DS Ry-DS requested a review from laurentS March 25, 2022 11:04
@aaronsteers
Copy link
Contributor

aaronsteers commented Mar 25, 2022

@Ry-DS It looks like the API is indeed returning them as strings, but they are clearly integers... Can we force the conversion? Do you get an error?
-> https://api.github.com/repos/facebook/react/events

It makes sense that it's a string as the number is quite large (20 billion) which definitely won't fit in a 32bit signed integer (max 2 billion). I think it's the right move to keep it as strings (it would error out otherwise).

I've logged an issue in the SDK to keep this discussion going. For now, I agree it sounds like preserving as a string type is the "safe" option.

https://gitlab.com/meltano/sdk/-/issues/351

tap_github/tests/fixtures.py Outdated Show resolved Hide resolved
tap_github/tests/test_tap.py Outdated Show resolved Hide resolved
@laurentS
Copy link
Contributor

@Ry-DS It looks like the API is indeed returning them as strings, but they are clearly integers... Can we force the conversion? Do you get an error?
-> https://api.github.com/repos/facebook/react/events

It makes sense that it's a string as the number is quite large (20 billion) which definitely won't fit in a 32bit signed integer (max 2 billion). I think it's the right move to keep it as strings (it would error out otherwise).

I've logged an issue in the SDK to keep this discussion going. For now, I agree it sounds like preserving as a string type is the "safe" option.

https://gitlab.com/meltano/sdk/-/issues/351

In our use-case, we pipe the output from this tap into target-postgres which actually converts integer into bigint which is an 8 byte integer in pg, so ranging way beyond 2 billion.
I don't know if there is a standard conversion across taps/targets...

@ericboucher ericboucher mentioned this pull request Mar 28, 2022
# Conflicts:
#	tap_github/repository_streams.py
#	tap_github/tap.py
@laurentS
Copy link
Contributor

laurentS commented Apr 5, 2022

There's something very weird with pytest, in the 3.10 environment, it says it's running python 3.8.10:
image
I suspect it came from the cached environment, despite the key being per python version, because I can't think of another reason for this to happen.

@laurentS laurentS mentioned this pull request Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add all endpoints available in singer-io/tap-github
5 participants