# API Scavenger Game

## Challenge 1: Fork Languages

You will find out how many programming languages are used among all the forks created from the main lab repo of your bootcamp.

In [1]:
# import libraries here
import requests
import json
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.parser import parse
import re
import base64

In [2]:
# pretty print function for JSON
def pprint(json_var):
    print(json.dumps(json_var, indent=4, sort_keys=True))
    
# check status function
def check_status(response):
    if response.status_code == 200:
        print('Satisfactory request')
    else:
        print('Your status code is:', response.status_code)
        
# function: pretty-write JSON to a file
def pwrite(json_var,fileName):
    with open(fileName,'w') as respFile:
        json.dump(json_var, respFile, indent=4, sort_keys=True)

Assuming the main lab repo is ironhack-datalabs/madrid-oct-2018, you will:

#### 1. Obtain the full list of forks created from the main lab repo via the GitHub API.

To list forks, we send a GET request to the endpoint described in the GitHub API documentation: `GET /repos/:owner/:repo/forks`.
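The forks endpoint is paginated, so a single request returns at most one page (30 items by default, up to 100 with `per_page`). The 15 forks below fit on one page, but for larger repos a loop that follows the `next` relation of the `Link` header would collect everything. A minimal sketch, assuming the HTTP getter is passed in (for example `requests.get`, whose responses expose the parsed header as `.links`) so the loop can be exercised offline; `get_all_pages` is a hypothetical helper name, not part of any library.

```python
def get_all_pages(url, get, auth=None, per_page=100):
    """Collect every item from a paginated GitHub list endpoint.

    `get` is any callable with the call shape of `requests.get`; its response
    must provide `.raise_for_status()`, `.json()` and `.links`.
    """
    items = []
    params = {"per_page": per_page}
    while url:
        response = get(url, params=params, auth=auth)
        response.raise_for_status()
        items.extend(response.json())
        # The "next" URL already carries page/per_page in its query string,
        # so no extra params are sent on later requests.
        url = response.links.get("next", {}).get("url")
        params = None
    return items

# Usage (needs network access and the credentials defined in the notebook):
# import requests
# forks = get_all_pages("https://api.github.com/repos/ironhack-datalabs/madrid-oct-2018/forks",
#                       requests.get, auth=(userN, userT))
```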

In [3]:
# your code here
hostName = 'repos/ironhack-datalabs/'
repoName = 'madrid-oct-2018/'
url = 'https://api.github.com/'
search = 'forks'

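# authorization for requests: GitHub username plus a personal access token
# (the token below is a placeholder; replace it with your own)
userN ='beto-Sibileau'
userT = "beto-Token"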

response = requests.get(f"{url}{hostName}{repoName}{search}", auth=(userN,userT))
#pprint(response.json())

# check status
check_status(response)

# convert response to list of dictionaries
forksJson = response.json()
print(f"Number of forks created at '{hostName[6:]}{repoName}' is: {len(forksJson)}")

Satisfactory request
Number of forks created at 'ironhack-datalabs/madrid-oct-2018/' is: 15


#### 2. Loop through the JSON response to find the language attribute of each fork. Use an array to store the language attributes of each fork.
Hint: Each language should appear only once in your array.
Print the language array. It should be something like: ["Python", "Jupyter Notebook", "HTML"]
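Because dicts preserve insertion order (Python 3.7+), `dict.fromkeys` is a compact way to de-duplicate while keeping first-seen order and skipping forks whose `language` is `null`. A small sketch on made-up fork records; `unique_languages` and the sample data are hypothetical.

```python
def unique_languages(forks):
    """Return each non-null fork language once, in first-seen order."""
    return list(dict.fromkeys(f["language"] for f in forks if f.get("language")))

# Hypothetical sample shaped like the forks payload (only the key used here).
sample = [
    {"language": None},
    {"language": "Jupyter Notebook"},
    {"language": "HTML"},
    {"language": "Jupyter Notebook"},
]
langs = unique_languages(sample)
print(langs)       # ['Jupyter Notebook', 'HTML']
print(len(langs))  # 2
```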

In [4]:
# your code here

# loop on the Json response (filters None)
languages = []
for elem in forksJson:
    if elem['language']:
        if elem['language'] not in languages:
            languages.append(elem['language'])

print(languages)

# solution directly from Pandas (includes None)
df = pd.DataFrame(forksJson)
print(df.language.unique())


['Jupyter Notebook', 'HTML']
[None 'Jupyter Notebook' 'HTML']


## Challenge 2: Count Commits
Count how many commits were made in October 2018.
#### 1. Obtain all the commits made in October 2018 via API, which is a JSON array that contains multiple commit objects.

In [5]:
# your code here

search = 'commits'
response = requests.get(f"{url}{hostName}{repoName}{search}", auth=(userN,userT))

# check status
check_status(response)

# convert response to list of dictionaries
commitsJson = response.json()
#pprint(commitsJson)
#pwrite(commitsJson,'commits')

# json to pandas dataframe with normalize
dfCommits = pd.json_normalize(commitsJson)

#pd.set_option('display.max_columns', None)
#dfCommits.head()

# filter using pandas: parse each committer date once, then match year and month
yearSearch  = 2018
monthSearch = 'October'
monthNumber = dt.datetime.strptime(monthSearch, '%B').month  # 'October' -> 10
commitDates = pd.to_datetime(dfCommits['commit.committer.date'])
commits_found = dfCommits[(commitDates.dt.year == yearSearch) & (commitDates.dt.month == monthNumber)]

if commits_found.empty:
    print(f"There are no commits made in {monthSearch} {yearSearch}")
else:
    print(f"Number of commits made in {monthSearch} {yearSearch} is: {len(commits_found)}")
    

Satisfactory request
There are no commits made in October 2018
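The filter above only sees what the first request returned, and the commits endpoint is paginated (30 commits per page by default), so older commits may simply not be in that page. The endpoint also accepts `since` and `until` query parameters (ISO 8601 timestamps), which lets the API do the date filtering. A minimal sketch that builds a one-month window in UTC; `month_window` is a hypothetical helper, and the request itself is shown only as a comment.

```python
from datetime import datetime, timezone

def month_window(year, month):
    """Return GitHub-style ISO 8601 bounds covering one calendar month (UTC)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First instant of the following month; December rolls over to January.
    end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {"since": start.strftime(fmt), "until": end.strftime(fmt)}

print(month_window(2018, 10))
# {'since': '2018-10-01T00:00:00Z', 'until': '2018-11-01T00:00:00Z'}

# Usage (network access required):
# params = {**month_window(2018, 10), "per_page": 100}
# response = requests.get(f"{url}{hostName}{repoName}commits", params=params, auth=(userN, userT))
```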


#### 2. Count how many commit objects are contained in the array.

In [6]:
# your code here

# (NO FILTER) --> number of commits in ironhack-datalabs/madrid-oct-2018/
print(f"Number of commits created at '{hostName[6:]}{repoName}' is: {len(commitsJson)}")

Number of commits created at 'ironhack-datalabs/madrid-oct-2018/' is: 30
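The 30 reported here equals the endpoint's default page size, so it may count only the first page rather than the whole history. One commonly used workaround, sketched under the assumption that the API sends a `Link` header with a `last` relation whenever there is more than one page: request `per_page=1`, and the `page` number in the `last` URL is then the total number of commits. `total_from_links` is a hypothetical helper that reads the `response.links` dict built by requests; the URLs in the sample are made up.

```python
from urllib.parse import urlparse, parse_qs

def total_from_links(links, items_on_page):
    """Infer the total item count from a per_page=1 response's parsed Link header.

    Without a "last" relation there is only one page, so the number of
    items on that page is the total.
    """
    last = links.get("last")
    if last is None:
        return items_on_page
    query = parse_qs(urlparse(last["url"]).query)
    return int(query["page"][0])

# Made-up parsed Link header for illustration.
links = {
    "next": {"url": "https://api.github.com/repositories/1/commits?per_page=1&page=2", "rel": "next"},
    "last": {"url": "https://api.github.com/repositories/1/commits?per_page=1&page=57", "rel": "last"},
}
print(total_from_links(links, 1))  # 57

# Usage (network access required):
# r = requests.get(f"{url}{hostName}{repoName}commits", params={"per_page": 1}, auth=(userN, userT))
# print(total_from_links(r.links, len(r.json())))
```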


## Challenge 3: Hidden Cold Joke

Using Python, call the GitHub API to find the cold joke contained in the 24 secret files in the following repo:

https://github.com/ironhack-datalabs/scavenger

The filenames of the secret files contain `.scavengerhunt`, and the files are scattered randomly across different directories of this repo. They are named from `.0001.scavengerhunt` to `.0024.scavengerhunt`. You need to find them by calling the GitHub API, not by searching local files on your computer.

#### 1. Find the secret files.

In [7]:
# your code here
hostName = 'repos/ironhack-datalabs/'
repoName = 'scavenger/'
url = 'https://api.github.com/'
search = 'commits'

response = requests.get(f"{url}{hostName}{repoName}{search}", auth=(userN,userT))

# check status
check_status(response)

# build dataframe from commits
commitJson = response.json()
dfCommit = pd.DataFrame(commitJson)
print(f"There are {len(commitJson)} commits with SHA's: {dfCommit.sha.values}")

# load the files touched by the first (here, the only) commit into a pandas DataFrame
search = f"commits/{dfCommit.sha[0]}"
response = requests.get(f"{url}{hostName}{repoName}{search}", auth=(userN,userT))
dfFiles = pd.DataFrame(response.json()['files'])
print(f"\nDataFrame 'dfFiles' has these data types:\n{dfFiles.dtypes}")

# Apply regex to dfFiles.filename (iterable object); a raw string with an
# escaped dot, so '.' matches only a literal dot
pat = r'\w+\.scavengerhunt'
filesMatch = [re.findall(pat,elem) for elem in dfFiles.filename if re.findall(pat,elem)]
print(f"\nThe Secret files are:\n{filesMatch}")

Satisfactory request
There are 1 commits with SHA's: ['9308ccc8a4c34c5e3a991ee815222a9691c32476']

DataFrame 'dfFiles' has these data types:
sha             object
filename        object
status          object
additions        int64
deletions        int64
changes          int64
blob_url        object
raw_url         object
contents_url    object
patch           object
dtype: object

The Secret files are:
[['0006.scavengerhunt'], ['0008.scavengerhunt'], ['0012.scavengerhunt'], ['0007.scavengerhunt'], ['0021.scavengerhunt'], ['0022.scavengerhunt'], ['0005.scavengerhunt'], ['0018.scavengerhunt'], ['0016.scavengerhunt'], ['0024.scavengerhunt'], ['0010.scavengerhunt'], ['0014.scavengerhunt'], ['0011.scavengerhunt'], ['0023.scavengerhunt'], ['0020.scavengerhunt'], ['0003.scavengerhunt'], ['0004.scavengerhunt'], ['0019.scavengerhunt'], ['0017.scavengerhunt'], ['0002.scavengerhunt'], ['0013.scavengerhunt'], ['0015.scavengerhunt'], ['0009.scavengerhunt'], ['0001.scavengerhunt']]
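Reading the file list from the first commit works here because the repo has a single commit that added every file. A sketch that does not rely on that, assuming the Git Trees endpoint `GET /repos/:owner/:repo/git/trees/:tree_sha?recursive=1`, whose JSON holds a `tree` list of entries with `path`, `type` and `sha`: walk the full tree and keep the blobs whose path ends in `.NNNN.scavengerhunt`. `find_secret_files` is a hypothetical helper, and the sample tree is made up.

```python
import re

# Matches paths such as "some/dir/.0001.scavengerhunt"; the dots are escaped.
SECRET = re.compile(r"(?:^|/)\.(\d{4})\.scavengerhunt$")

def find_secret_files(tree_json):
    """Return (number, path, sha) for each secret blob, sorted by number."""
    found = []
    for entry in tree_json.get("tree", []):
        match = SECRET.search(entry["path"])
        if entry.get("type") == "blob" and match:
            found.append((int(match.group(1)), entry["path"], entry["sha"]))
    return sorted(found)

# Hypothetical, trimmed tree payload.
sample_tree = {"tree": [
    {"path": "a/b/.0002.scavengerhunt", "type": "blob", "sha": "bbb"},
    {"path": "a", "type": "tree", "sha": "ttt"},
    {"path": "c/.0001.scavengerhunt", "type": "blob", "sha": "aaa"},
    {"path": "c/notes.txt", "type": "blob", "sha": "ccc"},
]}
for number, path, sha in find_secret_files(sample_tree):
    print(number, path, sha)

# Usage (network access required):
# tree_sha = commitJson[0]["commit"]["tree"]["sha"]
# tree = requests.get(f"{url}{hostName}{repoName}git/trees/{tree_sha}",
#                     params={"recursive": 1}, auth=(userN, userT)).json()
# secrets = find_secret_files(tree)
```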


#### 2. Sort the filenames in ascending order.

In [8]:
# your code here
dfSortedFiles = pd.DataFrame(filesMatch, columns=['FileName']).sort_values('FileName')
dfSortedFiles

|    | FileName           |
|---:|:-------------------|
| 23 | 0001.scavengerhunt |
| 19 | 0002.scavengerhunt |
| 15 | 0003.scavengerhunt |
| 16 | 0004.scavengerhunt |
| 6  | 0005.scavengerhunt |
| 0  | 0006.scavengerhunt |
| 3  | 0007.scavengerhunt |
| 1  | 0008.scavengerhunt |
| 22 | 0009.scavengerhunt |
| 10 | 0010.scavengerhunt |
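Sorting the names as strings gives numeric order here only because the numbers are zero-padded to four digits; with unpadded names, `'10'` would sort before `'2'`. A sketch of a sort key that uses the embedded number instead; `secret_number` and the sample names are illustrative.

```python
import re

def secret_number(filename):
    """Extract the integer from names like '0007.scavengerhunt' or '.0007.scavengerhunt'."""
    return int(re.match(r"\.?(\d+)\.scavengerhunt$", filename).group(1))

names = ["0010.scavengerhunt", "2.scavengerhunt", "0001.scavengerhunt"]
print(sorted(names))                     # ['0001.scavengerhunt', '0010.scavengerhunt', '2.scavengerhunt']
print(sorted(names, key=secret_number))  # ['0001.scavengerhunt', '2.scavengerhunt', '0010.scavengerhunt']
```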


#### 3. Read the content of each secret file into an array of strings.
Since the response is encoded, you will need to send the following information in the header of your request:
````python
headers = {'Accept': 'application/vnd.github.v3.raw'}
````

In [9]:
# your code here
# Get the blob SHA of each secret file, then reorder them to match the sorted filenames
filesPointer = [bool(re.search(pat, elem)) for elem in dfFiles.filename]
filesSha = dfFiles.sha[filesPointer].reset_index()              # row i corresponds to filesMatch[i]
filesSha = filesSha.reindex(dfSortedFiles.index).reset_index()  # reorder by sorted filename

# now get each blob's content by SHA and decode it from base64
message = []
for sha in filesSha.sha:
    search = f"git/blobs/{sha}"
    response = requests.get(f"{url}{hostName}{repoName}{search}", auth=(userN,userT))
    base64_message = response.json()['content']     # base64 text, wrapped with newlines
    base64_bytes = base64_message.encode('ascii')
    message_bytes = base64.b64decode(base64_bytes)  # non-alphabet characters such as '\n' are discarded
    message.append(message_bytes.decode('ascii').rstrip("\n"))

print(message)

['In', 'data', 'science,', '80', 'percent', 'of', 'time', 'spent', 'is', 'preparing', 'data,', '20', 'percent', 'of', 'time', 'is', 'spent', 'complaining', 'about', 'the', 'need', 'to', 'prepare', 'data.']
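The cell above fetches each blob and decodes its base64 `content` field by hand. The header quoted in the instructions offers a shortcut: with `Accept: application/vnd.github.v3.raw`, the contents endpoint (`GET /repos/:owner/:repo/contents/:path`) returns the file body itself, so `response.text` is the word. A minimal sketch with an injectable getter (for example `requests.get`); `read_raw_file` is a hypothetical helper, and the usage comment assumes the paths come from `dfFiles.filename` in sorted order.

```python
RAW_HEADERS = {"Accept": "application/vnd.github.v3.raw"}

def read_raw_file(get, api_root, owner_repo, path, auth=None):
    """Fetch one file's raw text from the GitHub contents endpoint."""
    response = get(f"{api_root}/repos/{owner_repo}/contents/{path}",
                   headers=RAW_HEADERS, auth=auth)
    response.raise_for_status()
    return response.text.rstrip("\n")

# Usage (network access required); `sorted_paths` is a hypothetical list of
# repo-relative paths taken from dfFiles.filename in sorted order:
# words = [read_raw_file(requests.get, "https://api.github.com",
#                        "ironhack-datalabs/scavenger", p, auth=(userN, userT))
#          for p in sorted_paths]
```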


#### 4. Concatenate the strings in the array separating each two with a whitespace.

In [10]:
# your code here
jointMessage = " ".join(message)

#### 5. Print out the joke.

In [11]:
# your code here
print(jointMessage)

In data science, 80 percent of time spent is preparing data, 20 percent of time is spent complaining about the need to prepare data.


#### Draft

In [None]:
# commented junk code below

#filesSha.reindex(dfSortedFiles.index)

#filesSha.index[dfSortedFiles.index.values]

#filesSha.sort_values(index=dfSortedFiles.index)
#print(filesSha[dfSortedFiles.index.values])

#re.findall(pat,dfFiles.filename[1])

#print(dfFiles.filename[1])
#dfFiles.str.findall(pat,dfFiles.filename)

#fileSeries = pd.json_normalize(response.json()).files.values
#fileSeries