Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BaseGraphqlService, support [github] V4 API #3763

Merged
merged 11 commits into from Jul 29, 2019

Conversation

chris48s
Copy link
Member

This bit of work sprouted off from #3610 but I've decided to do it in its own PR because this will require some review to get right and trying to load all this into a PR that is already a ~500 line diff is going to make that annoying. Lets get this bit right then I'll go back, rebase #3610 onto this work and pick that up.

Main things going on here are:

@chris48s chris48s added service-badge Accepted and actionable changes, features, and bugs core Server, BaseService, GitHub auth labels Jul 23, 2019
@shields-ci
Copy link

shields-ci commented Jul 23, 2019

Warnings
⚠️ This PR modified service code for github but not its test code.
That's okay so long as it's refactoring existing code.
⚠️

📚 Remember to ensure any changes to config.private in services/github/github-helpers.js are reflected in the server secrets documentation

Messages
📖 ✨ Thanks for your contribution to Shields, @chris48s!

Generated by 🚫 dangerJS against ba974b0

schema,
url,
query,
variables = {},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made query and variables top-level params here. I'm in 2 minds about whether it makes more sense to collect them into an object. Also note here I deliberately haven't implemented mutations. In general, calling a shields badge should read data from upstream service, not write data to it, so we shouldn't need mutations.

core/base-service/base-graphql.js Outdated Show resolved Hide resolved
services/github/github-api-provider.js Show resolved Hide resolved
services/github/github-api-provider.js Show resolved Hide resolved
*/
attrs.query = mergeQueries(
attrs.query,
'query { rateLimit { limit cost remaining resetAt } }'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly asking the GH API to just tell us what our remaining usage is in the body seems like an easier approach than attempting to parse the query and calculate the projected cost of each call and less likely to result in a discrepency: https://developer.github.com/v4/guides/resource-limitations/#calculating-nodes-in-a-call

That said, we probably do need to watch out for query count on contributions. With the V3 API, its pretty hard for someone to contribute a badge that smashes our usage limit because 1 http request = 1 point, but with a graphql a single http request can cost you a whole truckload of points. Not sure what the solution is for that other than to watch out for it in review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about establishing a max number of queries, which could be overridden per service? If the response exceeds the configured number it could error out rather than rendering. That way we'll see it right in the code, and can even search for it.

On the other hand, we have a rather huge amount of GitHub quota – I want to say something like 2.2k tokens – so maybe we don't need to worry about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking of that as a runtime check or a static check? Either way, I'm not against adding it, but I think it can be raised as an issue to address in a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't know the answer until the response comes back, it'd have to be a runtime check. Agreed, it can be addressed later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, ok. I misunderstood and thought you were still talking about pre-parsing the query before we send it.

Throwing the exception after we've already made the query is obviously simpler, but also of more limited value (esp if it makes it through to production) in that we've already made the request we don't want to make before deciding we didn't want to make it, if you see what I mean :)

Maybe its a useful MVP to start with, in that if we start getting exceptions in sentry saying "lol - just rinsed our token and now we're not even going to render a badge with all the expensive data we just requested" we can at least be reactive. Bear in mind it might lull us into a false sense of security though as we'd be unlikely to throw it in dev because we treat global tokens differently from pooled ones. My instinct is most contributors probably use a global token as opposed to a pool with one token in it (I know that's what I usually do - I think working on this PR is the first time I've set up a pool locally).

Unfortunately I think we do still actually need to work out the query cost up-front to implement it in a useful way.

function graphqlErrorHandler(errors) {
if (errors[0].type === 'NOT_FOUND') {
throw new NotFound({ prettyMessage: 'repo not found' })
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In time, it might be useful to add other cases in here, but I don't know what they are yet.

Copy link
Member

@paulmelnikow paulmelnikow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Chris, this is looking really good. Not an insignificant piece of work! I left a few comments though there's nothing really major here.

As for porting the service or not, one strategy might be to reserve the GraphQL endpoints for the things that really need it, and the other might be to port things over to GraphQL as much as possible.

Is there a performance difference?

services/github/github-api-provider.js Show resolved Hide resolved
services/github/github-api-provider.js Outdated Show resolved Hide resolved
services/github/github-api-provider.js Outdated Show resolved Hide resolved
}

getRateLimitFromBody(body) {
const parsedBody = JSON.parse(body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it would be nice to avoid parsing the body twice for all these responses. Any idea if there's a good way to do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a bit annoying.. I think we could, but I think its also a fairly major change. At the moment we're essentially doing all this token work inside the call to _request(), which means we can deal with manipulating the request and processing the response in the same place. We can also deal with standardTokens, searchTokens and graphqlTokens in the same GithubApiProvider class. In order to avoid double-parsing, we'd need to restructure so we can still manipulate the request, but then update the token pool after we call _parseJson() in either _requestJson() or _requestGraphql().

I've not really completely thought through the consequences of all this, but here's a couple of thoughts:

One way to deal with that would be to split this up so we deal with standardTokens / searchTokens in by overriding the _requestJson() method in GithubAuthV3Service.. and graphqlTokens by overriding the _requestGraphql() method of GithubAuthV4Service. Although at that point if GithubAuthV4Service overrides _requestGraphql(), its not really a subclass of BaseGraphqlService anymore.

Maybe another way would be to allow the _requestJson()/_requestGraphql()/_requestFomat() methods in base classes to accept an optional callback param or something that cna be used to run a post-process after parsing but before we return the response.

Any further thoughts or ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having GithubAuthV4Service rely on BaseGraphqlService to do the json parsing, but pass in an empty schema so the validation is a no-op. Then the token-related stuff can happen, then GithubAuthV4Service can invoke _validate?


Partly related, I dislike deep inheritance because the indirection makes it hard for people unfamiliar with the code to dig through. Each layer is one more thing to keep in your head at once.

So I think it would be good to make BaseGraphqlService extend directly from BaseService. All that would take is exporting parseJson as its own function (or moving it to its own file).

There are a couple places in GitHub where we invoke _parseJson from within a service, and those will be made cleaner by importing it directly.

In Gerritt there's also a place where we override it, though it'd be less magical and easier to understand if we used BaseService directly (invoking _request, doing the trim, then invoking parse and validate).

The refactor is a bit involved so maybe it's best to leave it for a follow-on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a first step on this, I've moved parseJson() out to a helper and switched BaseGraphqlService to inherit from BaseService. That seems sensible anyway, but I've not taken it further.

Not sure I agree that overriding a method from the base class is "magical" in the context of gerrit though. Its what pretty much every other function we declare in a badge service class is doing - seems like a pretty consistent pattern 🤷‍♂️

services/github/github-api-provider.js Show resolved Hide resolved
services/github/github-auth-service.js Outdated Show resolved Hide resolved
*/
attrs.query = mergeQueries(
attrs.query,
'query { rateLimit { limit cost remaining resetAt } }'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about establishing a max number of queries, which could be overridden per service? If the response exceeds the configured number it could error out rather than rendering. That way we'll see it right in the code, and can even search for it.

On the other hand, we have a rather huge amount of GitHub quota – I want to say something like 2.2k tokens – so maybe we don't need to worry about this.

services/github/github-forks.service.js Outdated Show resolved Hide resolved
core/base-service/base-graphql.js Outdated Show resolved Hide resolved
core/base-service/graphql.js Show resolved Hide resolved
@shields-cd shields-cd temporarily deployed to shields-staging-pr-3763 July 23, 2019 22:16 Inactive
@shields-cd shields-cd temporarily deployed to shields-staging-pr-3763 July 26, 2019 20:41 Inactive
@chris48s
Copy link
Member Author

As for porting the service or not, one strategy might be to reserve the GraphQL endpoints for the things that really need it, and the other might be to port things over to GraphQL as much as possible.

Is there a performance difference?

I thought about this a bit already. As I see it, there are a few tradeoffs to think about here (some performance-related and some not). Predictably there is not a single clear-cut answer in all cases:

  • In most cases, a sensible graphql query will return a smaller JSON response than a REST call. For example, in the forks example we need to request this object with the V3 API to get one value: https://api.github.com/repos/badges/shields but with graphql the response is only like { data: { repository: { forks: { totalCount: 1984 } }, rateLimit: { limit: 5000, cost: 1, remaining: 4999, resetAt: '2019-07-24T21:23:26Z' } } } That's less bytes over the wire and a faster object to parse.
  • Beyond that, I think we can probably assume that making one HTTP request to the V3 API and making one HTTP request to the V4 API are basically identical in terms of performance (even if theoretically one or other takes longer to calculate/serve on their end) because the main thing we're waiting for is network latency.
  • With the V3 API, one request = one point off the usage limit. With the V4 API one request may be many points off the usage limit (but will be a minimum of one). Assuming we can produce the same badge with one REST call or a graphql request: In the best-case scenario, the graphql query will be the same cost as the REST request. In all other cases, graphql will be more expensive in this respect. Given we won't accept a badge that makes more than one V3 request, using the V3 request will sometimes be more efficient on our token usage.
  • At this point in time, REST is more familiar to contributors and more well-understood.
  • I don't think we're in any danger of GitHub deprecating the V3 API in the near future. They haven't published any info about when it will be deprecated or retired: https://developer.github.com/v3/versions/ . That said, presumably at some point in time it will go away and we'll need to migrate everything to V4 one day anyway, unless graphql goes out of fashion or something.

@paulmelnikow
Copy link
Member

I thought about this a bit already. As I see it, there are a few tradeoffs to think about here (some performance-related and some not). Predictably there is not a single clear-cut answer in all cases:

  • In most cases, a sensible graphql query will return a smaller JSON response than a REST call. For example, in the forks example we need to request this object with the V3 API to get one value: https://api.github.com/repos/badges/shields but with graphql the response is only like { data: { repository: { forks: { totalCount: 1984 } }, rateLimit: { limit: 5000, cost: 1, remaining: 4999, resetAt: '2019-07-24T21:23:26Z' } } } That's less bytes over the wire and a faster object to parse.
  • Beyond that, I think we can probably assume that making one HTTP request to the V3 API and making one HTTP request to the V4 API are basically identical in terms of performance (even if theoretically one or other takes longer to calculate/serve on their end) because the main thing we're waiting for is network latency.
  • With the V3 API, one request = one point off the usage limit. With the V4 API one request may be many points off the usage limit (but will be a minimum of one). Assuming we can produce the same badge with one REST call or a graphql request: In the best-case scenario, the graphql query will be the same cost as the REST request. In all other cases, graphql will be more expensive in this respect. Given we won't accept a badge that makes more than one V3 request, using the V3 request will sometimes be more efficient on our token usage.
  • At this point in time, REST is more familiar to contributors and more well-understood.
  • I don't think we're in any danger of GitHub deprecating the V3 API in the near future. They haven't published any info about when it will be deprecated or retired: https://developer.github.com/v3/versions/ . That said, presumably at some point in time it will go away and we'll need to migrate everything to V4 one day anyway, unless graphql goes out of fashion or something.

Everything you're saying makes sense.

Assuming the GraphQL API is otherwise fast, since network latency is the key issue I'd guess the smaller responses probably will have a bigger impact than anything else.

Could you paste what you wrote into a comment in github-auth-service.js? Or else distill it down to what you think contributors need to know?

@shields-cd shields-cd temporarily deployed to shields-staging-pr-3763 July 28, 2019 17:33 Inactive
@chris48s
Copy link
Member Author

Cheers. I'll raise issues for the additional bits of work

@chris48s chris48s merged commit 75ee413 into badges:master Jul 29, 2019
@shields-deployment
Copy link

This pull request was merged to master branch. This change is now waiting for deployment, which will usually happen within a few days. Stay tuned by joining our #ops channel on Discord!

After deployment, changes are copied to gh-pages branch:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Server, BaseService, GitHub auth service-badge Accepted and actionable changes, features, and bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants