## GitHub API v4 preview using GraphQL ##

GitHub's API is moving (slowly) from REST to GraphQL. Given my prior exploratory analysis of GitHub's data ([found here](https://github.com/EldritchJS/kono-notebook/blob/master/kono.ipynb)), I set out to see what moving to the new API buys me, or loses me, as it were. Here are a few things that stand out as positives:

1) Single Endpoint  

2) Query-driven syntax (only get the data one wants)

3) Drill down of data without additional calls

4) Previously unavailable data fields (e.g. reactions) 

5) Less event-dependent

The single endpoint trades looking up endpoint formats and return values for looking up query fields and connection values. IMHO the latter is more advantageous, as endpoint syntax and formatting can get unruly in a hurry. Having a single endpoint lets data scientists focus on their queries and work purely in JSON. The queries follow a (mostly) intuitive graph pattern wherein one traverses the docs/graph until reaching the desired node(s). The reactions in particular permit further sentiment and engagement analyses, which could build on my previous work.

and some negatives: 

1) Legacy REST-based code needs to be overhauled

2) Temporal queries not native

3) API still very much in flux
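
On the second negative: the connections used in this notebook take no date-range argument, so one workaround is to request a timestamp field and filter client-side. The following is a minimal sketch, assuming each node was requested with a `createdAt` field holding a UTC timestamp string. The helper names `parse_ts` and `nodes_between` and the sample data are hypothetical, made up for illustration.

```python
from datetime import datetime, timezone

def parse_ts(ts):
    # Timestamps are assumed to look like '2018-03-01T12:00:00Z' (UTC)
    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)

def nodes_between(edges, start, end, field='createdAt'):
    # Keep nodes whose timestamp falls in the half-open window [start, end)
    return [e['node'] for e in edges if start <= parse_ts(e['node'][field]) < end]

# Made-up edges in the same edges/node shape the queries return
sample_edges = [
    {'node': {'name': 'old-repo', 'createdAt': '2017-06-15T08:30:00Z'}},
    {'node': {'name': 'new-repo', 'createdAt': '2018-02-01T10:00:00Z'}},
]
start = datetime(2018, 1, 1, tzinfo=timezone.utc)
end = datetime(2019, 1, 1, tzinfo=timezone.utc)
recent = nodes_between(sample_edges, start, end)
print([n['name'] for n in recent])
```

The cost of this approach is that every node still has to be fetched before it can be discarded, so it suits small result sets best.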

### Goals ###

I've set out to make a Jupyter-driven dashboard of organization/repo/user health within GitHub. For starters, I would like to display, for a given organization, repository, or user, some static metrics such as forks, stars, and number of members. Admittedly, some of these are available directly on GitHub.com; however, I expect to add sentiment analysis and similar measures from my previous work in this area, which would differentiate the dashboard from GitHub's direct offerings.

### Getting Started ###
In terms of connections, one need only generate a personal access token on GitHub's site to get started.

Programmatically, one needs the requests and json Python libraries imported, per the following. (Note: I added Pandas as well for use in a dashboard later.) Additionally, the single endpoint is defined along with my GitHub authorization token (which is a whole bunch of Xs in the repo; go get your own token, please).

In [1]:
import requests
import json
import pandas as pd

endpoint = 'https://api.github.com/graphql'
headers = {'Authorization': 'bearer 5b151dd275afed8d9e51f4fb063b4aa36305c346'}
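
To keep a real token out of the notebook altogether, one option is to read it from the environment. This is a sketch only: the variable name `GITHUB_TOKEN` and the helper `build_headers` are assumptions of mine, not something the API or this repo prescribes.

```python
import os

def build_headers(token=None, env_var='GITHUB_TOKEN'):
    # env_var is an assumed name; use whatever variable holds your token
    token = token or os.environ.get(env_var)
    if not token:
        raise RuntimeError('No token found; set the {} environment variable'.format(env_var))
    return {'Authorization': 'bearer ' + token}

# Placeholder token for illustration; a real one comes from your GitHub settings
headers = build_headers(token='X' * 40)
```

Calling `build_headers()` with no arguments would then pick up the token from the environment and fail loudly when it is missing.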

### Organization member query ###

I'll start with a call to get the members of my team's GitHub organization, along with its repositories and pinned repositories. Note: a single request can return at most 100 members, either the first or the last 100. Pagination is provided for larger groups. Also of interest: our mascot's account is spelled Radly, as opposed to Radley or Radlee. The dataframes resulting from these calls hold more data than our eventual dashboard needs; they are shown here to illustrate how flexible these queries are in terms of verbosity versus conciseness.


In [30]:
query = ''' 
{ 
    organization(login: \"radanalyticsio\") { 
      pinnedRepositories(first: 100) { 
        edges { 
          node { 
            name
          } 
        } 
        totalCount
      } 
      repositories(first: 100) {
        edges {
          node {
            name
            id
            pullRequests(last: 100) {
                totalCount
            }
            forks(last: 100) {
                totalCount
            }
          }
        }
        totalCount
      }
      members(first: 100) { 
        edges { 
          node { 
            name
            id
          } 
        }
        totalCount
      } 
    } 
} 
'''
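# POST the query as a JSON body to the single GraphQL endpoint
r = requests.post(endpoint, json.dumps({"query": query}), headers=headers)
# The response wraps results in a top-level 'data' key, which orient='split' reads as the frame's data
orgdf = pd.read_json(json.dumps(r.json()), orient='split')['organization']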
print(orgdf['members']['edges'])
print('\n\n')
print(orgdf['repositories']['edges'])
print('\n\n')
print(orgdf['pinnedRepositories']['edges'])

[{'node': {'name': 'Will Benton', 'id': 'MDQ6VXNlcjExNjE='}}, {'node': {'name': 'Matthew Farrellee', 'id': 'MDQ6VXNlcjExMjY1Mw=='}}, {'node': {'name': 'michael mccune', 'id': 'MDQ6VXNlcjE5MDY0OQ=='}}, {'node': {'name': 'Erik Erlandson', 'id': 'MDQ6VXNlcjI1OTg5OA=='}}, {'node': {'name': 'Rui Vieira', 'id': 'MDQ6VXNlcjMyNzkwOQ=='}}, {'node': {'name': 'Jirka Kremser', 'id': 'MDQ6VXNlcjUzNTg2Ng=='}}, {'node': {'name': 'Ricardo Martinelli de Oliveira', 'id': 'MDQ6VXNlcjgxMzQzMA=='}}, {'node': {'name': 'Chad Roberts', 'id': 'MDQ6VXNlcjg4OTMxNw=='}}, {'node': {'name': 'Trevor McKay', 'id': 'MDQ6VXNlcjExNzI1Mzc='}}, {'node': {'name': 'Zak Hassan', 'id': 'MDQ6VXNlcjEyNjk3NTk='}}, {'node': {'name': 'Pete MacKinnon', 'id': 'MDQ6VXNlcjIzODA1NDU='}}, {'node': {'name': 'Diane Feddema', 'id': 'MDQ6VXNlcjMyMDg3MTk='}}, {'node': {'name': 'Mike Barrett', 'id': 'MDQ6VXNlcjQ0ODIzMzM='}}, {'node': {'name': 'Sophie Watson', 'id': 'MDQ6VXNlcjUzMTEyODk='}}, {'node': {'name': 'Paolo Patierno', 'id': 'MDQ6VXNlc
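
For groups larger than 100, the pagination mentioned above works with cursors: each connection can be asked for `pageInfo { hasNextPage endCursor }`, and the cursor is passed back through the `after:` argument on the next request. Below is a rough sketch of that loop. `fetch_all_edges`, `PAGED_MEMBERS_QUERY`, and the canned `fake_pages` are hypothetical names and data for illustration; the field names simply mirror the query above.

```python
# Query shape for one page; $cursor is null on the first request
PAGED_MEMBERS_QUERY = '''
query($cursor: String) {
  organization(login: "radanalyticsio") {
    members(first: 100, after: $cursor) {
      edges { node { name id } }
      pageInfo { hasNextPage endCursor }
    }
  }
}
'''

def fetch_all_edges(fetch_page):
    # fetch_page(cursor) must return a connection dict with 'edges' and 'pageInfo'
    edges, cursor = [], None
    while True:
        connection = fetch_page(cursor)
        edges.extend(connection['edges'])
        page_info = connection['pageInfo']
        if not page_info['hasNextPage']:
            return edges
        cursor = page_info['endCursor']

# Canned two-page response so the loop can be exercised without network access
fake_pages = {
    None: {'edges': [{'node': {'name': 'a'}}],
           'pageInfo': {'hasNextPage': True, 'endCursor': 'c1'}},
    'c1': {'edges': [{'node': {'name': 'b'}}],
           'pageInfo': {'hasNextPage': False, 'endCursor': None}},
}
all_edges = fetch_all_edges(lambda cursor: fake_pages[cursor])
print([e['node']['name'] for e in all_edges])
```

Against the live API, `fetch_page` would post `PAGED_MEMBERS_QUERY` with `{'cursor': cursor}` as its variables and return the `members` connection from the response.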

### Individual user's owned repo query ###
Next I can do a simple query of the first 100 repositories I own, listing the repo's name, disk usage, and number of forks.

In [3]:
query=''' 
{ 
    repositoryOwner(login : \"EldritchJS\") {
        login
        repositories(first: 100) {
            edges {
                node {
                    name
                    diskUsage
                    forkCount 
                }
            }
            totalCount
        } 
    } 
} 
'''
r = requests.post(endpoint, json.dumps({"query": query}), headers=headers)
userdf = pd.read_json(json.dumps(r.json()), orient='split')['repositoryOwner']
#print(userdf)
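
The cells in this notebook read `r.json()` directly, which is fine while the queries are valid. GraphQL servers generally report query problems in a top-level `errors` list rather than only through the HTTP status, so a small guard can help while experimenting. This is a sketch; `unwrap` is a hypothetical helper and the sample payloads are made up.

```python
def unwrap(payload):
    # Raise if the response carries an 'errors' list; otherwise return the 'data' part
    if payload.get('errors'):
        messages = '; '.join(err.get('message', str(err)) for err in payload['errors'])
        raise RuntimeError('GraphQL query failed: ' + messages)
    return payload['data']

# Made-up payloads in the usual GraphQL response shape
ok = {'data': {'repositoryOwner': {'login': 'EldritchJS'}}}
bad = {'errors': [{'message': "Field 'nope' doesn't exist"}]}

print(unwrap(ok)['repositoryOwner']['login'])
try:
    unwrap(bad)
except RuntimeError as exc:
    print(exc)
```

In the notebook this would be used as `data = unwrap(r.json())` before indexing into the result.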


In [12]:
data = r.json()['data']['repositoryOwner']['repositories']['edges']
repos = []
for repo in data:
    repos.append(repo['node']['name'])
    
repos

['mimisbrunnr',
 'kono',
 'eldritchjs.github.io',
 'radanalyticsio.github.io',
 'ophicleide-training',
 'ophicleide-web',
 'calculator-sample',
 'scotch-io.github.io',
 'dinky',
 'nationalparks-py',
 'word-fountain-amq',
 'equoid-data-handler',
 'workshop-notebook',
 'base-notebook',
 'workshop',
 'kono-notebook',
 'stackhub',
 'kono-webapp',
 'kono-metrics',
 'pixiedust',
 'rhinsights',
 'pixiedust-notebook',
 'datascienceworkflowtalk',
 'django-example',
 'insights-anonymizer',
 'insights-anonymizer',
 'countminsketch',
 'topk-datasci',
 'topk-receiver-amq',
 'topk-amq',
 'insights-anonymizer',
 'jgrafzahl',
 'streaming-amqp',
 'equoid-data-publisher',
 'equoid-openshift',
 'tesserae',
 'SWGoHBot',
 'github-graphql-notebook',
 'infinispan-dump']
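
The same edges/node unwrapping can be generalized so that every requested field is kept, not just the name. A minimal sketch follows, with hypothetical helpers `edges_to_rows` and `top_by` and made-up numbers in the sample connection.

```python
def edges_to_rows(connection):
    # Strip the edges/node wrappers, leaving one plain dict per repository
    return [edge['node'] for edge in connection['edges']]

def top_by(rows, key, n=5):
    # Largest values first; sorted() is stable, so ties keep their original order
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]

# Made-up connection in the shape returned by the repositories query above
sample_connection = {
    'totalCount': 3,
    'edges': [
        {'node': {'name': 'alpha', 'diskUsage': 120, 'forkCount': 0}},
        {'node': {'name': 'beta', 'diskUsage': 4500, 'forkCount': 2}},
        {'node': {'name': 'gamma', 'diskUsage': 800, 'forkCount': 1}},
    ],
}
rows = edges_to_rows(sample_connection)
print([row['name'] for row in top_by(rows, 'diskUsage', n=2)])
```

The resulting list of dicts can also be handed to `pd.DataFrame(rows)` to get one column per field.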

### Individual repo stars, watches, and issues query ###

Once I have a repository I'm interested in, I can acquire data pertaining to it. In the next example I get the number of users who have starred the equoid-openshift repo, the number watching it, and the number of issues along with the body of each issue comment. All are fields of interest in my previous work on repository health metrics.

In [32]:
query = '''
{
    repository(owner: \"EldritchJS\", name: \"equoid-openshift\") {
        stargazers(first: 100) {
            totalCount
        }
        watchers(first: 100) {
            totalCount
        }
        issues(first: 100) {
            totalCount
            edges {
                node {
                    comments(first: 100) {
                        edges {
                            node {
                                body
                            }
                        }
                    }
                }
            }
        }
    }
}
'''
r = requests.post(endpoint, json.dumps({"query": query}), headers=headers)
repodf = pd.read_json(json.dumps(r.json()), orient='split')['repository']
print(repodf)

issues        {'totalCount': 0, 'edges': []}
stargazers                 {'totalCount': 1}
watchers                   {'totalCount': 1}
Name: repository, dtype: object
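
The owner and repository name are hard-coded in the query string above. GraphQL also accepts a `variables` object alongside the query, which lets one query string be reused across repositories. The sketch below only builds the request body; `REPO_QUERY` and `build_payload` are hypothetical names, and the selected fields are trimmed to the counts for brevity.

```python
import json

# Same repository lookup as above, with owner and name supplied as variables
REPO_QUERY = '''
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    stargazers(first: 100) { totalCount }
    watchers(first: 100) { totalCount }
    issues(first: 100) { totalCount }
  }
}
'''

def build_payload(query, **variables):
    # The request body is a JSON object with 'query' and 'variables' keys
    return json.dumps({'query': query, 'variables': variables})

payload = build_payload(REPO_QUERY, owner='EldritchJS', name='equoid-openshift')
print(json.loads(payload)['variables'])
```

The resulting string can be posted the same way as before, e.g. `requests.post(endpoint, payload, headers=headers)`.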


### Text report ###

In lieu of a pretty dashboard (for now), the following is a text report of metrics for the aforementioned organization, repository, and user.

In [33]:
print('Organization: radanalyticsio\n\nPinned Repos: ' \
      + str(orgdf['pinnedRepositories']['totalCount']) + '\nTotal Repos: ' \
      + str(orgdf['repositories']['totalCount']) + '\nMembers: ' + str(orgdf['members']['totalCount']) + '\n\n')

print('Repository: equoid-openshift\n\nStargazers: ' + str(repodf['stargazers']['totalCount']) + '\nWatchers: ' \
     + str(repodf['watchers']['totalCount']) + '\nIssues: ' + str(repodf['issues']['totalCount']) + '\n\n')

print('User: EldritchJS\n\nRepos: ' + str(userdf['repositories']['totalCount']))

Organization: radanalyticsio

Pinned Repos: 6
Total Repos: 49
Members: 23


Repository: equoid-openshift

Stargazers: 1
Watchers: 1
Issues: 0


User: EldritchJS

Repos: 38
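
The three print statements share one layout, so the report could also be produced by a small formatter. This is just one possible arrangement: `format_report` is a hypothetical helper, and the numbers are copied from the output above.

```python
def format_report(kind, name, metrics):
    # kind is e.g. 'Organization'; metrics is a list of (label, value) pairs
    lines = ['{}: {}'.format(kind, name), '']
    lines += ['{}: {}'.format(label, value) for label, value in metrics]
    return '\n'.join(lines)

report = format_report('Organization', 'radanalyticsio',
                       [('Pinned Repos', 6), ('Total Repos', 49), ('Members', 23)])
print(report)
```

Keeping the metrics as label/value pairs means the same data can later feed a plot instead of a print.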


## matplotlib dashboard (coming soon) ##

In [34]:
%matplotlib notebook

In [35]:
import matplotlib.pyplot as plt

Activity - Event breakdown

Interest - Forks, issues, comments, PRs, Follows

Responsiveness - Time to address issues, Time to act on PRs

Contributor Demeanor - Commits, reactions? 

Starring - Class of its own
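
As an illustration of the Responsiveness category, time to address issues could be computed from issue timestamps. The sketch below assumes each issue node was requested with `createdAt` and `closedAt` fields (with `closedAt` null while the issue is open). The helper names and the sample issues are made up for illustration, and the median is only one possible summary.

```python
from datetime import datetime
from statistics import median

def hours_to_close(issue):
    # Open issues have no closedAt value and are skipped by the caller
    if issue.get('closedAt') is None:
        return None
    opened = datetime.fromisoformat(issue['createdAt'].replace('Z', '+00:00'))
    closed = datetime.fromisoformat(issue['closedAt'].replace('Z', '+00:00'))
    return (closed - opened).total_seconds() / 3600

def median_hours_to_close(issues):
    durations = [h for h in map(hours_to_close, issues) if h is not None]
    return median(durations) if durations else None

# Made-up issue nodes: closed after 36 hours, closed after 12 hours, still open
sample_issues = [
    {'createdAt': '2018-03-01T00:00:00Z', 'closedAt': '2018-03-02T12:00:00Z'},
    {'createdAt': '2018-03-05T06:00:00Z', 'closedAt': '2018-03-05T18:00:00Z'},
    {'createdAt': '2018-03-10T00:00:00Z', 'closedAt': None},
]
print(median_hours_to_close(sample_issues))
```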



Interest

In [None]:


# Forks




# Issues
# Comments
# PRs
# Follows