🎉 Source Github: re-implement PullRequestCommentReactions stream using GraphQL API #14408

Closed
grubberr opened this issue Jul 5, 2022 · 2 comments · Fixed by #14795

Comments

@grubberr
Contributor

grubberr commented Jul 5, 2022

Problem

Source GitHub's Reactions streams are still slow because they are nested and require many HTTP requests
to sync all data.

For example, to sync the pull_request_comment_reactions stream we need to iterate over comments
and, for every comment, make a separate HTTP request to get the list of reactions for that comment.
That adds up to a lot of HTTP requests, which makes the sync slow.
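
To make the request pattern concrete, here is a minimal sketch that only counts requests for a nested REST stream. The function name `rest_request_count`, the page size of 100, and the sample numbers are illustrative assumptions, not measurements from the connector.

```python
import math
from typing import List


def rest_request_count(reactions_per_comment: List[int], page_size: int = 100) -> int:
    """Count HTTP requests for a nested REST sync (hypothetical model).

    One paginated listing fetches the comments, then every comment needs
    at least one extra request for its reactions, even if it has none.
    """
    num_comments = len(reactions_per_comment)
    # Requests needed to page through the parent list of comments.
    parent_requests = max(1, math.ceil(num_comments / page_size))
    # Requests needed for the child lists: one or more per comment.
    child_requests = sum(
        max(1, math.ceil(count / page_size)) for count in reactions_per_comment
    )
    return parent_requests + child_requests


# 250 comments without any reactions still cost 3 + 250 requests.
print(rest_request_count([0] * 250))  # 253
```

In this model the child requests dominate: the total grows linearly with the number of comments, regardless of how few reactions exist.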

Solution

One solution is to re-implement the streams with the GraphQL API instead of the REST API.
This reduces the number of HTTP requests:
a single GraphQL HTTP request can fetch many reactions for many comments.
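
As a rough illustration, the sketch below builds a single GraphQL request body that asks for pull requests, their review comments, and the reactions on those comments in one round trip. The connection path (`pullRequests`, `reviews`, `comments`, `reactions`), the page sizes, and the placeholder repository `octocat/Hello-World` are assumptions to be checked against GitHub's GraphQL schema; this is not the query used in the PR.

```python
import json
from typing import Optional

# Assumed query shape; verify field names against the GitHub GraphQL schema.
QUERY = """
query($owner: String!, $name: String!, $after: String) {
  repository(owner: $owner, name: $name) {
    pullRequests(first: 10, after: $after) {
      pageInfo { hasNextPage endCursor }
      nodes {
        reviews(first: 10) {
          pageInfo { hasNextPage endCursor }
          nodes {
            comments(first: 10) {
              pageInfo { hasNextPage endCursor }
              nodes {
                databaseId
                reactions(first: 10) {
                  pageInfo { hasNextPage endCursor }
                  nodes { content createdAt user { login } }
                }
              }
            }
          }
        }
      }
    }
  }
}
"""


def build_request_body(owner: str, name: str, after: Optional[str] = None) -> str:
    """Serialize the query and its variables as a JSON POST body."""
    variables = {"owner": owner, "name": name, "after": after}
    return json.dumps({"query": QUERY, "variables": variables})


# Placeholder repository, used only to show the shape of the payload.
body = build_request_body("octocat", "Hello-World")
```

Sending the body (an authenticated POST to the GraphQL endpoint) is left out on purpose; the point is that one payload covers several levels of nesting that REST would split into separate requests.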

We can re-implement all Reactions streams using the GraphQL API to improve sync performance.

The GraphQL API is still not a silver bullet: it reduces the number of HTTP requests, but it is CPU intensive,
and GitHub counts the total cost of such GraphQL requests. If you spend your whole hourly budget, you have to wait until it resets.
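
GitHub's GraphQL API exposes a `rateLimit` object with `cost`, `remaining`, and `resetAt` fields that can be requested alongside the data. The helper below is a hypothetical sketch (the name `seconds_to_wait` and the sample values are assumptions) of how a client might decide whether to pause before the next request.

```python
from datetime import datetime, timezone


def seconds_to_wait(rate_limit: dict, now: datetime) -> float:
    """Return how long to sleep before sending another query of the same cost.

    `rate_limit` mirrors the `rateLimit { cost remaining resetAt }` selection.
    """
    if rate_limit["remaining"] >= rate_limit["cost"]:
        return 0.0  # enough budget left for another query like the last one
    # resetAt is an ISO 8601 UTC timestamp such as "2022-07-05T13:00:00Z".
    reset_at = datetime.fromisoformat(rate_limit["resetAt"].replace("Z", "+00:00"))
    return max(0.0, (reset_at - now).total_seconds())


# Made-up values: the last query cost 12 points and only 5 remain.
now = datetime(2022, 7, 5, 12, 30, tzinfo=timezone.utc)
sample = {"cost": 12, "remaining": 5, "resetAt": "2022-07-05T13:00:00Z"}
print(seconds_to_wait(sample, now))  # 1800.0
```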

@sherifnada
Contributor

@grubberr @lazebnyi I saw we are updating a number of issues to GraphQL. How does this pattern compare with the investigation lazebnyi did here: #8705?

@grubberr
Contributor Author

@sherifnada @lazebnyi

Quote from #8705

As we see from the table GQL can improve the performance of connectors, especially for nested streams.

That's true: for nested streams (and, in my opinion, only for nested streams) it significantly improves performance.

I measured the performance of a full_refresh sync of the pull_request_comment_reactions stream:

  1. REST API: 28076 HTTP requests, 340 minutes of sync
  2. GraphQL API: 2757 HTTP requests, 21 minutes of sync
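
For scale, the ratios implied by these two measurements can be worked out directly. The snippet only restates the numbers above; the variable names are illustrative.

```python
rest_requests, rest_minutes = 28076, 340
graphql_requests, graphql_minutes = 2757, 21

# How many times fewer requests and how many times less wall-clock time.
request_ratio = rest_requests / graphql_requests
time_ratio = rest_minutes / graphql_minutes

print(f"{request_ratio:.1f}x fewer HTTP requests")  # 10.2x fewer HTTP requests
print(f"{time_ratio:.1f}x shorter sync")  # 16.2x shorter sync
```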

The downside of GraphQL is code complexity: you can see a fairly complex paging algorithm in the PR.
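
One source of that complexity is that every nested connection in a GraphQL response carries its own cursor. The sketch below is not the algorithm from the PR; it is a hypothetical helper (`pending_cursors`) that walks a response and lists every connection that still has pages left, which a client would then have to re-query. The sample response is made up.

```python
from typing import Any, Iterator, Tuple


def pending_cursors(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, str]]:
    """Yield (path, endCursor) for every connection that has more pages."""
    if isinstance(node, dict):
        page_info = node.get("pageInfo")
        if isinstance(page_info, dict) and page_info.get("hasNextPage"):
            yield path, page_info["endCursor"]
        # Descend into child fields, skipping the pageInfo object itself.
        for key, value in node.items():
            if key != "pageInfo":
                yield from pending_cursors(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from pending_cursors(item, path + (index,))


# Made-up response: the outer list and one inner list are both incomplete.
response = {
    "pullRequests": {
        "pageInfo": {"hasNextPage": True, "endCursor": "PR_CURSOR"},
        "nodes": [
            {"reviews": {"pageInfo": {"hasNextPage": False, "endCursor": None}, "nodes": []}},
            {"reviews": {"pageInfo": {"hasNextPage": True, "endCursor": "REVIEW_CURSOR"}, "nodes": []}},
        ],
    }
}

for path, cursor in pending_cursors(response):
    print(path, cursor)
```

With four levels of nesting, each level can be incomplete independently, so a single "next page" cursor is not enough to resume the sync.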

@grubberr grubberr changed the title from "🎉 Source Github: re-implement Reactions streams using GraphQL API" to "🎉 Source Github: re-implement PullRequestCommentReactions stream using GraphQL API" on Aug 2, 2022