## GitHub API v4 preview using GraphQL ##

GitHub's API is (slowly) moving from REST to GraphQL. Given my prior exploratory analysis of GitHub's data [found here](https://github.com/EldritchJS/kono-notebook/blob/master/kono.ipynb), I set out to see what moving to the new API buys me, or loses me, as it were. Here are a few things that stand out as positives:

1) Single Endpoint  

2) Query-driven syntax

3) Previously unavailable data fields (e.g. reactions) 

4) Less event driven

The single endpoint trades looking up endpoint formats and return values for looking up query fields and connections. IMHO the latter is preferable, as endpoint syntax and formatting can get unruly in a hurry. Having a single endpoint lets data scientists focus on their queries while staying within JSON for both request and response. The queries follow a (mostly) intuitive graph pattern, wherein one traverses the docs/graph until reaching the desired node(s). The reactions in particular permit further sentiment and engagement analyses, which could build on my previous work.
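As a rough illustration of the reaction data, here is a sketch that assumes a query asking each issue for the `content` of its reactions (GitHub's `Reaction` objects expose a `content` value such as `THUMBS_UP` or `HEART`). The names `REACTIONS_QUERY` and `tally_reactions` are hypothetical; the helper simply counts reaction types across the returned issues, which could serve as a crude engagement signal.

```python
from collections import Counter

# Hypothetical query: reactions on the first 100 issues of a repository.
# Each Reaction node exposes a `content` enum such as THUMBS_UP or HEART.
REACTIONS_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    issues(first: 100) {
      nodes {
        number
        reactions(first: 100) {
          nodes { content }
        }
      }
    }
  }
}
"""


def tally_reactions(data):
    """Count reaction types across every issue in the `data` member of a response."""
    counts = Counter()
    for issue in data["repository"]["issues"]["nodes"]:
        for reaction in issue["reactions"]["nodes"]:
            counts[reaction["content"]] += 1
    return counts
```

Sending `REACTIONS_QUERY` with variables like `{"owner": "EldritchJS", "name": "equoid-openshift"}` and passing the response's `data` member to `tally_reactions` would give a `Counter` keyed by reaction type. Without pagination it only covers the first 100 issues and the first 100 reactions on each.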

and some negatives: 

1) Legacy REST-based code needs to be overhauled

2) Temporal queries not native

3) API still very much in flux


### Getting Started ###
To get started, one need only generate a personal access token on GitHub's site.

Programmatically, one needs the requests and json Python libraries imported, per the following (pandas comes along too, for reshaping results later). Additionally, the single endpoint is defined along with my GitHub authorization token (which is a whole bunch of Xs for what's in the repo; go get your own token please).

In [4]:
```python
import requests
import json
import pandas as pd

endpoint = 'https://api.github.com/graphql'
# Replace the Xs with your own personal access token
headers = {'Authorization': 'bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'}
```
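Hard-coding a token in a notebook makes it easy to leak. A minimal alternative, assuming the token lives in an environment variable named `GITHUB_TOKEN` (a name chosen here for illustration), is to build the headers at runtime. The `check_response` helper is likewise hypothetical: GraphQL servers report query problems in an `errors` list in the JSON body, so it is worth checking for that key before digging into `data`.

```python
import os


def make_headers(token=None):
    """Build the Authorization header, falling back to the (assumed) GITHUB_TOKEN env var."""
    token = token if token is not None else os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("Set GITHUB_TOKEN or pass a token explicitly")
    return {"Authorization": "bearer " + token}


def check_response(result):
    """Return the 'data' member of a parsed GraphQL response, raising if errors were reported."""
    if result.get("errors"):
        messages = "; ".join(e.get("message", str(e)) for e in result["errors"])
        raise RuntimeError("GraphQL errors: " + messages)
    return result["data"]
```

With `requests` and the `endpoint` from the setup cell, a call could then look like `check_response(requests.post(endpoint, json={"query": query}, headers=make_headers()).json())`.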


### Organization member query ###

I'll start with a call to get the pinned repositories and members of my team's GitHub organization. Note: a connection returns at most 100 nodes per request (the `first` or `last` argument is capped at 100), so one can get either the first or the last 100 members. Cursor-based pagination is provided for larger groups. Also of interest: our mascot's account is spelled Radly, as opposed to Radley or Radlee.
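The pagination is cursor based: a connection can return `pageInfo { hasNextPage endCursor }`, and the `endCursor` value is passed back through the `after` argument to fetch the next page. The sketch below assumes the `membersWithRole` connection (GitHub's current name for organization members) and takes a `fetch` callable that posts a JSON payload and returns the parsed response, so it does not depend on a particular HTTP library. `MEMBERS_QUERY` and `iter_members` are hypothetical names.

```python
# Hypothetical paginated members query; $cursor is null on the first request.
MEMBERS_QUERY = """
query($org: String!, $cursor: String) {
  organization(login: $org) {
    membersWithRole(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { login name }
    }
  }
}
"""


def iter_members(fetch, org):
    """Yield member nodes page by page until pageInfo reports no next page."""
    cursor = None
    while True:
        payload = {"query": MEMBERS_QUERY, "variables": {"org": org, "cursor": cursor}}
        conn = fetch(payload)["data"]["organization"]["membersWithRole"]
        yield from conn["nodes"]
        if not conn["pageInfo"]["hasNextPage"]:
            break
        cursor = conn["pageInfo"]["endCursor"]
```

For example, `fetch = lambda p: requests.post(endpoint, json=p, headers=headers).json()` followed by `[m["name"] for m in iter_members(fetch, "radanalyticsio")]` would collect every member name regardless of group size.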

In [18]:
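```python
# GitHub has since replaced `pinnedRepositories` with `pinnedItems` and
# `members` with `membersWithRole`; the aliases keep the original response keys.
query = '''
{
  organization(login: "radanalyticsio") {
    pinnedRepositories: pinnedItems(first: 100) {
      edges {
        node {
          ... on Repository {
            name
          }
        }
      }
    }
    members: membersWithRole(first: 100) {
      edges {
        node {
          name
          avatarUrl
        }
      }
    }
  }
}
'''
```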

In [19]:
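```python
import io

r = requests.post(endpoint, json={"query": query}, headers=headers)
r.raise_for_status()  # surface HTTP errors such as a rejected token
# StringIO avoids newer pandas' deprecation of literal JSON strings in read_json
df = pd.read_json(io.StringIO(r.text))
```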

In [20]:
```python
print(df['data']['organization']['members']['edges'])
```

```
[{'node': {'name': 'Will Benton', 'avatarUrl': 'https://avatars2.githubusercontent.com/u/1161?v=4'}}, {'node': {'name': 'Matthew Farrellee', 'avatarUrl': 'https://avatars2.githubusercontent.com/u/112653?v=4'}}, {'node': {'name': 'michael mccune', 'avatarUrl': 'https://avatars3.githubusercontent.com/u/190649?v=4'}}, {'node': {'name': 'Erik Erlandson', 'avatarUrl': 'https://avatars0.githubusercontent.com/u/259898?v=4'}}, {'node': {'name': 'Rui Vieira', 'avatarUrl': 'https://avatars1.githubusercontent.com/u/327909?v=4'}}, {'node': {'name': 'Jirka Kremser', 'avatarUrl': 'https://avatars0.githubusercontent.com/u/535866?v=4'}}, {'node': {'name': 'Ricardo Martinelli de Oliveira', 'avatarUrl': 'https://avatars2.githubusercontent.com/u/813430?v=4'}}, {'node': {'name': 'Chad Roberts', 'avatarUrl': 'https://avatars0.githubusercontent.com/u/889317?v=4'}}, {'node': {'name': 'Trevor McKay', 'avatarUrl': 'https://avatars0.githubusercontent.com/u/1172537?v=4'}}, {'node': {'name': 'Zak Hassan', 'avatar
```

In [14]:
```python
print(json.dumps(r.json(), indent=2, sort_keys=True))
```

```
{
  "data": {
    "organization": {
      "members": {
        "edges": [
          {
            "node": {
              "avatarUrl": "https://avatars2.githubusercontent.com/u/1161?v=4",
              "name": "Will Benton"
            }
          },
          {
            "node": {
              "avatarUrl": "https://avatars2.githubusercontent.com/u/112653?v=4",
              "name": "Matthew Farrellee"
            }
          },
          {
            "node": {
              "avatarUrl": "https://avatars3.githubusercontent.com/u/190649?v=4",
              "name": "michael mccune"
            }
          },
          {
            "node": {
              "avatarUrl": "https://avatars0.githubusercontent.com/u/259898?v=4",
              "name": "Erik Erlandson"
            }
          },
          {
            "node": {
              "avatarUrl": "https://avatars1.githubusercontent.com/u/327909?v=4",
              "name": "Rui Vieira"
            }
          },
          {
```

### Individual user's owned repo query ###

Next I can do a simple query of the first 100 repositories I own, listing each repo's name, disk usage (which GitHub reports in kilobytes), and number of forks.
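The cell below hard-codes the login inside the query text. For reuse with other accounts, GraphQL lets a query declare variables that travel alongside it in the request body. The sketch below shows that pattern for the same fields, plus one way to summarize the result; `REPOS_QUERY`, `repos_payload`, and `largest_repos` are hypothetical names, and it uses the `nodes` shorthand rather than `edges { node }`.

```python
import json

# Same fields as the query below, but with the owner supplied as a variable.
REPOS_QUERY = """
query($login: String!) {
  repositoryOwner(login: $login) {
    login
    repositories(first: 100) {
      nodes { name diskUsage forkCount }
    }
  }
}
"""


def repos_payload(login):
    """JSON request body: the query text plus its variables."""
    return json.dumps({"query": REPOS_QUERY, "variables": {"login": login}})


def largest_repos(data, n=3):
    """Return the n (name, diskUsage) pairs with the largest disk usage (in kilobytes)."""
    nodes = data["repositoryOwner"]["repositories"]["nodes"]
    ranked = sorted(nodes, key=lambda repo: repo["diskUsage"] or 0, reverse=True)
    return [(repo["name"], repo["diskUsage"]) for repo in ranked[:n]]
```

The payload would be posted as the request body, e.g. `requests.post(endpoint, data=repos_payload("EldritchJS"), headers=headers)`, and `largest_repos` applied to the response's `data` member.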

In [17]:
```python
query = '''
{
  repositoryOwner(login: "EldritchJS") {
    login
    repositories(first: 100) {
      edges { node { name diskUsage forkCount } }
    }
  }
}
'''
```

In [18]:
```python
r = requests.post(endpoint, json.dumps({"query": query}), headers=headers)
print(json.dumps(r.json(), indent=2, sort_keys=True))
```

```
{
  "data": {
    "repositoryOwner": {
      "login": "EldritchJS",
      "repositories": {
        "edges": [
          {
            "node": {
              "diskUsage": 0,
              "forkCount": 0,
              "name": "mimisbrunnr"
            }
          },
          {
            "node": {
              "diskUsage": 2,
              "forkCount": 0,
              "name": "kono"
            }
          },
          {
            "node": {
              "diskUsage": 13,
              "forkCount": 0,
              "name": "eldritchjs.github.io"
            }
          },
          {
            "node": {
              "diskUsage": 9072,
              "forkCount": 0,
              "name": "radanalyticsio.github.io"
            }
          },
          {
            "node": {
              "diskUsage": 62,
              "forkCount": 0,
              "name": "ophicleide-training"
            }
          },
          {
            "node": {
              "diskUsage": 42,
```

### Individual repo stars, watches, and issues query ###

Once I have a repository I'm interested in, I can acquire data pertaining to it. In the next example I get the names of users who have starred the equoid-openshift repo, the names of those watching it, and the comment bodies on its first 100 issues (counting the returned issue edges gives the number of issues, up to that limit). All of these were fields of interest in my previous work on repository health metrics.
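When only the counts matter, GitHub's connections also expose a `totalCount` field, which avoids pulling node lists and is not limited to the first 100 items. A minimal sketch, with `COUNTS_QUERY` and `repo_counts` as hypothetical names:

```python
# Hypothetical counts-only query for a single repository.
COUNTS_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    stargazers { totalCount }
    watchers { totalCount }
    issues { totalCount }
  }
}
"""


def repo_counts(data):
    """Flatten the totalCount fields of a `repository` response into a plain dict."""
    repo = data["repository"]
    return {field: repo[field]["totalCount"] for field in ("stargazers", "watchers", "issues")}
```

Note that `issues { totalCount }` counts open and closed issues together.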

In [19]:
```python
query = '''
{
  repository(owner: "EldritchJS", name: "equoid-openshift") {
    stargazers(first: 100) {
      edges { node { name } }
    }
    watchers(first: 100) {
      edges { node { name } }
    }
    issues(first: 100) {
      edges {
        node {
          comments(first: 100) {
            edges { node { body } }
          }
        }
      }
    }
  }
}
'''
```

In [20]:
```python
r = requests.post(endpoint, json.dumps({"query": query}), headers=headers)
print(json.dumps(r.json(), indent=2, sort_keys=True))
```

```
{
  "data": {
    "repository": {
      "issues": {
        "edges": []
      },
      "stargazers": {
        "edges": [
          {
            "node": {
              "name": "Jirka Kremser"
            }
          }
        ]
      },
      "watchers": {
        "edges": []
      }
    }
  }
}
```
