# Rate-Limited Querying of GitHub's GraphQL API

In [1]:
import pandas as pd
import requests

In [2]:
# Read the API token from a local file; the `with` block closes the file afterwards.
with open("/home/joseph/graphql_token.txt", "r") as file:
    api_token = file.read().strip()

url = "https://api.github.com/graphql"
headers = {"Authorization": "token %s" % api_token}

In [3]:
def query(json: dict) -> str:
    """POST a GraphQL payload to the endpoint and return the raw response body."""
    r = requests.post(url=url, json=json, headers=headers)
    return r.text

### Sample Query

The response is a JSON document, returned as a Python `str`.

In [4]:
json = {
    "query": "{ viewer { repositories(first: 1) { totalCount pageInfo { hasNextPage endCursor } edges { node { name } } } } }"
}

query(json)

'{"data":{"viewer":{"repositories":{"totalCount":38,"pageInfo":{"hasNextPage":true,"endCursor":"Y3Vyc29yOnYyOpHOAvjC2Q=="},"edges":[{"node":{"name":"Yendors-Analysis"}}]}}}}'
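Because the response arrives as text, it has to be parsed before individual fields can be read. The following is a minimal sketch, not part of the original notebook: it uses the sample output above as a literal string and the standard-library `json` module, imported under the alias `json_lib` (an assumption made here because this notebook already binds the name `json` to a payload dict).

```python
import json as json_lib  # aliased: the notebook binds the name `json` to a dict

# Sample response text copied from the output above.
response_text = '{"data":{"viewer":{"repositories":{"totalCount":38,"pageInfo":{"hasNextPage":true,"endCursor":"Y3Vyc29yOnYyOpHOAvjC2Q=="},"edges":[{"node":{"name":"Yendors-Analysis"}}]}}}}'

# Parse the JSON text into nested Python dicts and lists.
response = json_lib.loads(response_text)
repositories = response["data"]["viewer"]["repositories"]

total_count = repositories["totalCount"]
has_next_page = repositories["pageInfo"]["hasNextPage"]
end_cursor = repositories["pageInfo"]["endCursor"]
repo_names = [edge["node"]["name"] for edge in repositories["edges"]]

print(total_count, has_next_page, end_cursor, repo_names)
```

The `hasNextPage` and `endCursor` values are the ones a paginated follow-up query would need.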

The following query returns how many rate-limit points remain, as well as the time at which the limit will reset. The response can be loaded into a pandas `DataFrame` using `pd.read_json()`, and from there the remaining limit and reset time can be parsed to allow rate-limited programmatic scraping of GitHub's GraphQL API.

In [5]:
query_text = """query {
  viewer {
    login
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}"""

json = {"query": query_text}
query(json)

'{"data":{"viewer":{"login":"beverast"},"rateLimit":{"limit":5000,"cost":1,"remaining":4992,"resetAt":"2019-08-23T00:21:36Z"}}}'
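To illustrate how `remaining` and `resetAt` could drive rate limiting, here is a rough sketch that works on the response text directly, without pandas. The function name `parse_rate_limit`, the alias `json_lib`, and the fixed `now` value are all hypothetical and chosen for this example; the sample string is the output shown above.

```python
import json as json_lib  # aliased: the notebook binds the name `json` to a dict
from datetime import datetime, timezone


def parse_rate_limit(response_text: str, now: datetime) -> tuple:
    """Return (remaining points, seconds from `now` until the limit resets)."""
    rate_limit = json_lib.loads(response_text)["data"]["rateLimit"]
    # resetAt is an ISO 8601 UTC timestamp such as "2019-08-23T00:21:36Z".
    reset_at = datetime.strptime(
        rate_limit["resetAt"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    # Clamp at zero in case the reset time has already passed.
    wait_seconds = max(0.0, (reset_at - now).total_seconds())
    return rate_limit["remaining"], wait_seconds


sample = '{"data":{"viewer":{"login":"beverast"},"rateLimit":{"limit":5000,"cost":1,"remaining":4992,"resetAt":"2019-08-23T00:21:36Z"}}}'

# Hypothetical "current" time, fixed so the example is reproducible.
now = datetime(2019, 8, 23, 0, 0, 0, tzinfo=timezone.utc)
remaining, wait_seconds = parse_rate_limit(sample, now)
print(remaining, wait_seconds)
```

In a live loop, `now` would be `datetime.now(timezone.utc)`, and the caller could pause with `time.sleep(wait_seconds)` once `remaining` falls below a chosen threshold.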

### Ingest GraphQL Responses Into a DataFrame
1. Query the endpoint and ingest the JSON response as a DataFrame

In [6]:
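from io import StringIO
# Wrap the JSON text in StringIO: newer pandas versions deprecate passing a raw JSON string.
limit_df = pd.read_json(StringIO(query(json)))
limit_df = limit_df.reset_index()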

In [7]:
limit_df.head()

|   | index | data |
|---|---|---|
| 0 | rateLimit | {'limit': 5000, 'cost': 1, 'remaining': 4991, ... |
| 1 | viewer | {'login': 'beverast'} |


2. Create columns for the necessary data: `remaining` and `resetAt`

In [8]:
# Pull both values out of the rateLimit dict stored in row 0 of the `data` column.
limit_df["remaining"] = limit_df.loc[0, "data"]["remaining"]
limit_df["resetAt"] = limit_df.loc[0, "data"]["resetAt"]

3. Drop unnecessary `viewer` data

In [9]:
# Drop the row labeled 1 (the `viewer` row); `axis=1` was ignored when `index` is given.
limit_df = limit_df.drop(index=1)

In [10]:
limit_df.head()

|   | index | data | remaining | resetAt |
|---|---|---|---|---|
| 0 | rateLimit | {'limit': 5000, 'cost': 1, 'remaining': 4991, ... | 4991 | 2019-08-23T00:21:36Z |
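As an aside, steps 1 to 3 can probably be collapsed by flattening only the `rateLimit` object. This is an alternative sketch, not the approach the notebook takes: it assumes `pd.json_normalize()` is available (pandas 1.0 or later) and uses a literal response string adapted from the outputs above in place of a live call to `query()`. The alias `json_lib` and the name `flat_df` are hypothetical.

```python
import json as json_lib  # aliased: the notebook binds the name `json` to a dict

import pandas as pd

# Literal stand-in for the text returned by query(json).
response_text = '{"data":{"viewer":{"login":"beverast"},"rateLimit":{"limit":5000,"cost":1,"remaining":4991,"resetAt":"2019-08-23T00:21:36Z"}}}'

# Keep only the rateLimit object and flatten it into a one-row DataFrame,
# with one column per key.
rate_limit = json_lib.loads(response_text)["data"]["rateLimit"]
flat_df = pd.json_normalize(rate_limit)

print(flat_df)
```

This yields `limit`, `cost`, `remaining`, and `resetAt` as separate columns, so no dict-valued `data` column or `viewer` row needs to be cleaned up afterwards.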


4. Convert time to a mathematically usable format