# Interacting with GitHub

This notebook shows how to access files on GitHub, extract their content, and calculate the diff between two revisions.

It uses the PyGithub library to access the GitHub API. The library is available at https://github.com/PyGithub/PyGithub and can be installed with pip.

In [None]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# library to access github api
from github import Github

In [2]:
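# read the personal access token from a local file
# strip() removes a trailing newline that would otherwise become part of the token
with open('access_token.txt', 'r') as file:
    token = file.read().strip()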
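
Reading the token from a file works, but stray whitespace or a missing file can cause confusing authentication errors. Below is a minimal sketch of a more defensive loader. The helper name `load_token` and the `GITHUB_TOKEN` environment variable are assumptions made for illustration; neither comes from this notebook or from PyGithub.

```python
import os
from pathlib import Path


def load_token(path="access_token.txt", env_var="GITHUB_TOKEN"):
    """Return a token from an environment variable or, failing that, a file.

    The helper name and the env_var default are hypothetical.
    """
    # Prefer the environment variable so the token need not live on disk
    token = os.environ.get(env_var, "").strip()
    if token:
        return token

    # Fall back to the file; strip() removes the trailing newline
    token_file = Path(path)
    if not token_file.is_file():
        raise FileNotFoundError(f"No token in ${env_var} and no file at {path}")
    token = token_file.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file {path} is empty")
    return token
```

With this helper, the cell above could be reduced to `token = load_token()`.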

In [3]:
# First create a Github instance:

# using an access token
g = Github(token, per_page=100)

# get the repo for this book
repo = g.get_repo("miroslawstaron/machine_learning_best_practices")

# get all commits
commits = repo.get_commits()


In [4]:

# print the number of commits
print(f'Number of commits in this repo: {commits.totalCount}')

# print the last commit
print(f'The last commit message: {commits[0].commit.message}')

Number of commits in this repo: 13
The last commit message: Full information quality example
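
The commit list can also be iterated to get a quick overview of the history. The sketch below relies only on the two attributes already used in this notebook (`sha` and `commit.message`). The helper name `summarize_commits` is hypothetical, and stand-in objects with made-up values replace the API so that the example runs offline.

```python
from itertools import islice
from types import SimpleNamespace


def summarize_commits(commits, limit=5):
    """Return (short sha, first line of message) for the first `limit` commits."""
    summary = []
    for c in islice(commits, limit):
        lines = c.commit.message.splitlines()
        first_line = lines[0] if lines else ""
        summary.append((c.sha[:7], first_line))
    return summary


# Stand-in objects with made-up values; with PyGithub, pass repo.get_commits()
fake_commits = [
    SimpleNamespace(sha="a" * 40,
                    commit=SimpleNamespace(message="Add example\n\nDetails")),
    SimpleNamespace(sha="b" * 40,
                    commit=SimpleNamespace(message="Initial commit")),
]

for short_sha, title in summarize_commits(fake_commits):
    print(short_sha, title)
```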


In [47]:
# print the names of all files changed in the commit
# index 0 is the latest commit
for f in commits[0].files:
    print(f.filename)

chapter_4/gerrit_reviews.csv
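
Each entry in a commit's file list carries more than a name. The sketch below formats a one-line summary per changed file, assuming the entries expose `filename`, `status`, `additions` and `deletions` as PyGithub's file objects do. The helper name `describe_files` is hypothetical, and the sample values are made up so that the example runs without network access.

```python
from types import SimpleNamespace


def describe_files(files):
    """Return one summary line per changed file."""
    return [
        f"{f.status}: {f.filename} (+{f.additions}/-{f.deletions})"
        for f in files
    ]


# Stand-in objects with made-up values; with PyGithub, pass commits[0].files
fake_files = [
    SimpleNamespace(filename="data/example.csv", status="modified",
                    additions=3, deletions=1),
    SimpleNamespace(filename="notes.txt", status="added",
                    additions=10, deletions=0),
]

for line in describe_files(fake_files):
    print(line)
```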


In [48]:
# get the first file changed in the latest commit
fileOne = commits[0].files[0]

# get the first file changed in the previous commit
# note: this is not necessarily the same file as fileOne
fileTwo = commits[1].files[0]

In [50]:
# fetch each file's content as of a specific commit by passing the commit sha as ref
# without ref, get_contents returns the version on the default branch
fl = repo.get_contents(fileOne.filename, ref=commits[0].sha)
fr = repo.get_contents(fileTwo.filename, ref=commits[1].sha)

In [53]:
# decoded_content returns the raw bytes of the file,
# so decode them to text (assuming UTF-8) and split the text into lines
linesOne = fl.decoded_content.decode('utf-8').splitlines()
linesTwo = fr.decoded_content.decode('utf-8').splitlines()

In [None]:
# calculate the diff using the difflib library
import difflib

# unified_diff expects sequences of lines, not whole strings;
# lineterm='' fits the lines above, which carry no trailing newline
for line in difflib.unified_diff(linesOne,
                                 linesTwo,
                                 fromfile=fileOne.filename,
                                 tofile=fileTwo.filename,
                                 lineterm=''):
    print(line)
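
`difflib.unified_diff` compares two sequences of lines, so a whole string passed to it is compared character by character. The sketch below illustrates the difference on two small in-memory byte strings, which stand in for file content fetched from the API. The sample content and file names are made up for illustration.

```python
import difflib

# Made-up content standing in for two revisions of a file
old_bytes = b"id,score\n1,10\n2,20\n"
new_bytes = b"id,score\n1,10\n2,25\n3,30\n"

# str() on bytes yields the repr, including the b'...' wrapper,
# so it is not a substitute for decoding
print(str(old_bytes)[:4])

# Decode to text and split into lines before diffing
old_lines = old_bytes.decode("utf-8").splitlines()
new_lines = new_bytes.decode("utf-8").splitlines()

diff = list(difflib.unified_diff(old_lines, new_lines,
                                 fromfile="old.csv", tofile="new.csv",
                                 lineterm=""))
print("\n".join(diff))
```

Lines starting with `-` exist only in the first input and lines starting with `+` only in the second, so the order of the two arguments determines the direction of the diff.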