
Investigate using GraphQL as our backend #5

Open
Synthetica9 opened this issue Jan 21, 2020 · 9 comments

Comments

@Synthetica9
Collaborator

It seems like we should be able to request all releases in one go (or in batches) instead of making one request per repo.

This should be much faster.

@ryantm
Contributor

ryantm commented Jan 26, 2020

We really need to do this; my token just got rate limited!

@Synthetica9
Collaborator Author

I looked into this in the meantime, and there doesn't seem to be a good way to get multiple releases in a single request. However, this is my first time using GraphQL, so I might just be overlooking something.

@Synthetica9
Collaborator Author

@ryantm 5f9acc4 should alleviate this issue somewhat.

@ryantm
Contributor

ryantm commented Dec 2, 2020

Yes, that definitely helped; I haven't had a rate-limiting issue since then.

@Mic92
Member

Mic92 commented Dec 2, 2020

I also struggled at first to figure out how to do multiple queries in one request with GraphQL, but I eventually figured it out:

{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "some-other-user") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}

Both a and b are arbitrarily chosen aliases and can later be used when looking at the result:

{'a': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'bd79477f2333510f2e4f6440983977e1c5a69ce8'}}]}}}}, 'b': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'f39fb799c516cb986945e8a6f8b6cbf5b9d5af2e'}}]}}}}}
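
As a minimal sketch (assuming the response has already been parsed into a Python dict named data, as in the snippet below), the latest commit hash for each alias can be read back out like this:

# "data" is the parsed "data" field of the GraphQL response; the alias names
# are the keys chosen in the query above.
for alias in ("a", "b"):
    edges = data[alias]["ref"]["target"]["history"]["edges"]
    print(alias, edges[0]["node"]["oid"])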

Here is a full Python snippet:

from typing import Optional, Dict, Any
import urllib.parse
import urllib.request
import json
import sys
import os


class GithubClient:
    # Minimal GitHub API client built on the standard library only.
    def __init__(self, api_token: Optional[str]) -> None:
        self.api_token = api_token

    def _request(
        self, path: str, method: str, data: Optional[Dict[str, Any]] = None
    ) -> Any:
        url = urllib.parse.urljoin("https://api.github.com/", path)
        headers = {"Content-Type": "application/json"}
        if self.api_token:
            headers["Authorization"] = f"token {self.api_token}"

        body = None
        if data:
            body = json.dumps(data).encode("ascii")

        req = urllib.request.Request(url, headers=headers, method=method, data=body)
        resp = urllib.request.urlopen(req)
        return json.loads(resp.read())

    def post(self, path: str, data: Dict[str, str]) -> Any:
        return self._request(path, "POST", data)

    def graphql(self, query: str) -> Dict[str, Any]:
        # POST the query to the GraphQL endpoint and return only the "data" field.
        resp = self.post("/graphql", data=dict(query=query))
        if "errors" in resp:
            raise RuntimeError(f"Expected data from graphql api, got: {resp}")
        data: Dict[str, Any] = resp["data"]
        return data


query = """
{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "balsoft") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}
"""

token = os.environ.get("GITHUB_TOKEN")
if not token:
    print("GITHUB_TOKEN not set")
    sys.exit(1)
client = GithubClient(api_token=token)
d = client.graphql(query)
print(d)

@Synthetica9
Collaborator Author

> Yes, that definitely helped; I haven't had a rate-limiting issue since then.

Is this repo still used? I was under the impression that this functionality had been ported to Haskell and merged into the main nixpkgs-update repo, but apparently it hasn't been?

@ryantm
Contributor

ryantm commented Dec 2, 2020

It is still in use! nix-community/infra@5e0e53f/build01/nixpkgs-update.nix#L67

@Synthetica9
Collaborator Author

> It is still in use! nix-community/infra@5e0e53f/build01/nixpkgs-update.nix#L67

Oh, cool! I guess it's just low-maintenance code then...

I've looked into using the GraphQL API, and it definitely seems to be an improvement: we can easily hit multiple repos for a single "request point" (I think every repo only counts as a single request, so we should be able to hit 100 repos per request point). For reference, here is the query I used:

{
  a: repository(owner: "microsoft", name: "vscode") {
    ...releaseInfo
  }
  b: repository(owner: "junegunn", name: "fzf") {
    ...releaseInfo
  }
  c: repository(owner: "foobar", name: "arsadf") {
    ...releaseInfo
  }
  d: repository(owner: "jgm", name: "pandoc") {
    ...releaseInfo
  }
  e: repository(owner: "swaywm", name: "sway") {
    ...releaseInfo
  }
  f: repository(owner: "sagemath", name: "sage") {
    ...releaseInfo
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}

fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes {
      tagName
      isPrerelease
      isDraft
      publishedAt
    }
  }
}

Perhaps we'll even be able to do a full query of all repos in one shot, but I think some batching is still in order.
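
To batch this over an arbitrary list of repos, the aliased fields could be generated instead of written by hand. Here is a rough sketch (the build_releases_query helper and the r0, r1, ... alias scheme are hypothetical, not something already in this repo), reusing the releaseInfo fragment from above:

from typing import List, Tuple

RELEASE_INFO_FRAGMENT = """
fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes {
      tagName
      isPrerelease
      isDraft
      publishedAt
    }
  }
}
"""

def build_releases_query(repos: List[Tuple[str, str]]) -> str:
    # One aliased repository(...) field per (owner, name) pair; the aliases only
    # need to be unique so the results can be matched back to the input list.
    fields = [
        f'  r{i}: repository(owner: "{owner}", name: "{name}") {{ ...releaseInfo }}'
        for i, (owner, name) in enumerate(repos)
    ]
    fields.append("  rateLimit { limit cost remaining resetAt }")
    return "{\n" + "\n".join(fields) + "\n}\n" + RELEASE_INFO_FRAGMENT

# e.g. build_releases_query([("microsoft", "vscode"), ("junegunn", "fzf")])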

@Mic92
Member

Mic92 commented Dec 4, 2020

> Perhaps we'll even be able to do a full query of all repos in one shot, but I think some batching is still in order.

You would probably run into a timeout eventually, but a higher batch size would definitely be more efficient than scraping each repo individually.
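
For what it's worth, a rough sketch of that batching, reusing the GithubClient from the earlier snippet and the hypothetical build_releases_query helper above (the batch size of 100 is an assumption, not a measured limit):

from typing import Any, Dict, List, Optional, Tuple

def fetch_all_releases(
    client: GithubClient,
    repos: List[Tuple[str, str]],
    batch_size: int = 100,
) -> Dict[Tuple[str, str], Optional[Any]]:
    # One GraphQL request per chunk of repos instead of one REST request per repo.
    # Note: GithubClient.graphql above raises if the response contains any errors,
    # e.g. when one of the requested repositories does not exist.
    results: Dict[Tuple[str, str], Optional[Any]] = {}
    for start in range(0, len(repos), batch_size):
        chunk = repos[start:start + batch_size]
        data = client.graphql(build_releases_query(chunk))
        for i, (owner, name) in enumerate(chunk):
            results[(owner, name)] = data.get(f"r{i}")
    return results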
