
Investigate using GraphQL as our backend #5

Open
Synthetica9 opened this issue Jan 21, 2020 · 9 comments

Comments

@Synthetica9
Collaborator

It seems like we should be able to request all releases in one go (or in batches) instead of making one request per repo.

This should be much faster.

@ryantm
Contributor

ryantm commented Jan 26, 2020

We really need to do this; my token just got rate limited!

@Synthetica9
Collaborator Author

I looked into this in the meantime, and there doesn't seem to be a good way to get multiple releases in a single request. However, this is my first time using GraphQL, so I might just be overlooking something.

@Synthetica9
Collaborator Author

@ryantm 5f9acc4 should alleviate this issue somewhat.

@ryantm
Contributor

ryantm commented Dec 2, 2020

Yes, that definitely helped; I haven't had a rate-limiting issue since then.

@Mic92
Member

Mic92 commented Dec 2, 2020

I also struggled at first to figure out how to do multiple queries in one request with GraphQL, but I eventually figured it out:

{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "some-other-user") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}

Both a and b are arbitrarily chosen aliases and can later be used when looking at the result:

{'a': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'bd79477f2333510f2e4f6440983977e1c5a69ce8'}}]}}}}, 'b': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'f39fb799c516cb986945e8a6f8b6cbf5b9d5af2e'}}]}}}}}
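
As a minimal sketch (assuming the response has already been parsed into a Python dict named data, as in the snippet below), the latest commit hash for each alias can be read back out like this:

# "data" is the parsed "data" field of the GraphQL response; the alias names
# are the keys chosen in the query above.
for alias in ("a", "b"):
    edges = data[alias]["ref"]["target"]["history"]["edges"]
    print(alias, edges[0]["node"]["oid"])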

Here is a full Python snippet:

from typing import Optional, Dict, Any
import urllib.parse
import urllib.request
import json
import sys
import os


class GithubClient:
    # Minimal GitHub API client built on the standard library only.
    def __init__(self, api_token: Optional[str]) -> None:
        self.api_token = api_token

    def _request(
        self, path: str, method: str, data: Optional[Dict[str, Any]] = None
    ) -> Any:
        url = urllib.parse.urljoin("https://api.github.com/", path)
        headers = {"Content-Type": "application/json"}
        if self.api_token:
            headers["Authorization"] = f"token {self.api_token}"

        body = None
        if data:
            body = json.dumps(data).encode("ascii")

        req = urllib.request.Request(url, headers=headers, method=method, data=body)
        resp = urllib.request.urlopen(req)
        return json.loads(resp.read())

    def post(self, path: str, data: Dict[str, str]) -> Any:
        return self._request(path, "POST", data)

    def graphql(self, query: str) -> Dict[str, Any]:
        # POST the query to the GraphQL endpoint and return only the "data" field.
        resp = self.post("/graphql", data=dict(query=query))
        if "errors" in resp:
            raise RuntimeError(f"Expected data from graphql api, got: {resp}")
        data: Dict[str, Any] = resp["data"]
        return data


query = """
{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "balsoft") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}
"""

token = os.environ.get("GITHUB_TOKEN")
if not token:
    print("GITHUB_TOKEN not set")
    sys.exit(1)
client = GithubClient(api_token=token)
d = client.graphql(query)
print(d)

@Synthetica9
Collaborator Author

> Yes, that definitely helped; I haven't had a rate-limiting issue since then.

Is this repo still used? I was under the impression that this functionality had been ported to Haskell and merged into the main nixpkgs-update repo, but apparently it hasn't been?

@ryantm
Contributor

ryantm commented Dec 2, 2020

It is still in use! nix-community/infra@5e0e53f/build01/nixpkgs-update.nix#L67

@Synthetica9
Collaborator Author

> It is still in use! nix-community/infra@5e0e53f/build01/nixpkgs-update.nix#L67

Oh, cool! I guess it's just low-maintenance code then...

I've looked into using the GraphQL API, and it definitely seems to be an improvement: we can easily hit multiple repos for a single "request point" (I think every repo only counts as a single request, so we should be able to hit 100 repos per request point). For reference, here is the query I used:

{
  a: repository(owner: "microsoft", name: "vscode") {
    ...releaseInfo
  }
  b: repository(owner: "junegunn", name: "fzf") {
    ...releaseInfo
  }
  c: repository(owner: "foobar", name: "arsadf") {
    ...releaseInfo
  }
  d: repository(owner: "jgm", name: "pandoc") {
    ...releaseInfo
  }
  e: repository(owner: "swaywm", name: "sway") {
    ...releaseInfo
  }
  f: repository(owner: "sagemath", name: "sage") {
    ...releaseInfo
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}

fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes {
      tagName
      isPrerelease
      isDraft
      publishedAt
    }
  }
}

Perhaps we'll even be able to do a full query of all repos in one shot, but I think some batching is still in order.
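
To batch this over an arbitrary list of repos, the aliased fields could be generated instead of written by hand. Here is a rough sketch (the build_releases_query helper and the r0, r1, ... alias scheme are hypothetical, not something already in this repo), reusing the releaseInfo fragment from above:

from typing import List, Tuple

RELEASE_INFO_FRAGMENT = """
fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes {
      tagName
      isPrerelease
      isDraft
      publishedAt
    }
  }
}
"""

def build_releases_query(repos: List[Tuple[str, str]]) -> str:
    # One aliased repository(...) field per (owner, name) pair; the aliases only
    # need to be unique so the results can be matched back to the input list.
    fields = [
        f'  r{i}: repository(owner: "{owner}", name: "{name}") {{ ...releaseInfo }}'
        for i, (owner, name) in enumerate(repos)
    ]
    fields.append("  rateLimit { limit cost remaining resetAt }")
    return "{\n" + "\n".join(fields) + "\n}\n" + RELEASE_INFO_FRAGMENT

# e.g. build_releases_query([("microsoft", "vscode"), ("junegunn", "fzf")])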

@Mic92
Member

Mic92 commented Dec 4, 2020

> Perhaps we'll even be able to do a full query of all repos in one shot, but I think some batching is still in order.

You would probably run into a timeout eventually, but a higher batch size would definitely be more efficient than scraping each repo individually.
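
For what it's worth, a rough sketch of that batching, reusing the GithubClient from the earlier snippet and the hypothetical build_releases_query helper above (the batch size of 100 is an assumption, not a measured limit):

from typing import Any, Dict, List, Optional, Tuple

def fetch_all_releases(
    client: GithubClient,
    repos: List[Tuple[str, str]],
    batch_size: int = 100,
) -> Dict[Tuple[str, str], Optional[Any]]:
    # One GraphQL request per chunk of repos instead of one REST request per repo.
    # Note: GithubClient.graphql above raises if the response contains any errors,
    # e.g. when one of the requested repositories does not exist.
    results: Dict[Tuple[str, str], Optional[Any]] = {}
    for start in range(0, len(repos), batch_size):
        chunk = repos[start:start + batch_size]
        data = client.graphql(build_releases_query(chunk))
        for i, (owner, name) in enumerate(chunk):
            results[(owner, name)] = data.get(f"r{i}")
    return results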
