# Data Acquisition Using APIs 

In this notebook, we go over the basics of data acquisition using APIs, including authentication. We illustrate this with the OpenNotify, GitHub and Reddit APIs.

## OpenNotify API

Here we make a GET request to retrieve the current position of the ISS from the OpenNotify API.

In [1]:
import requests
import os

# Make a GET request and print the status code.
response = requests.get("http://api.open-notify.org/iss-now.json")
print(response.status_code)

# Make the same kind of request to the iss-pass endpoint, without any parameters.
response = requests.get('http://api.open-notify.org/iss-pass.json')
print(response.status_code)

200
400
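
The two status codes printed above can be looked up with the standard library. The following is a minimal sketch using `http.HTTPStatus`; `describe_status` is a hypothetical helper name introduced here for illustration, not part of requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase
    # 2xx codes signal success; 4xx codes signal a problem with the request itself.
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error"
    else:
        outcome = "other"
    return f"{code} {phrase} ({outcome})"

print(describe_status(200))
print(describe_status(400))
```

A 400 here most likely means the request was incomplete: the second call was made without the latitude and longitude parameters that the next cell supplies.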


Let's set up the parameters to retrieve the next times the ISS will pass over New York City.

In [7]:
# This is the latitude and longitude of New York City.
parameters = {"lat": 40.71, "lon": -74}

# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

# Print the content of the response (the data the server returned)
print(response.content)

# This gets the same data as the command above
response = requests.get("http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74")
print(response.content)

b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1512932557, \n    "latitude": 40.71, \n    "longitude": -74.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 615, \n      "risetime": 1512933806\n    }, \n    {\n      "duration": 550, \n      "risetime": 1512939678\n    }, \n    {\n      "duration": 581, \n      "risetime": 1512945513\n    }, \n    {\n      "duration": 641, \n      "risetime": 1512951300\n    }, \n    {\n      "duration": 538, \n      "risetime": 1512957116\n    }\n  ]\n}\n'


b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1512932557, \n    "latitude": 40.71, \n    "longitude": -74.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 615, \n      "risetime": 1512933806\n    }, \n    {\n      "duration": 550, \n      "risetime": 1512939678\n    }, \n    {\n      "duration": 581, \n      "risetime": 1512945513\n    }, \n    {\n      "duration": 641, \n      "risetime": 1512951300\n    }, \n    {\n      "duration": 538, \n      "risetime": 1512957116\n    }\n  ]\n}\n'
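
Both requests above return the same data because the `params` dictionary ends up as a query string appended to the URL. The sketch below reproduces that encoding with `urllib.parse.urlencode` from the standard library. It illustrates the idea and is not a copy of how requests implements it internally.

```python
from urllib.parse import urlencode

base_url = "http://api.open-notify.org/iss-pass.json"
parameters = {"lat": 40.71, "lon": -74}

# urlencode turns the dictionary into "key=value" pairs joined by "&".
query_string = urlencode(parameters)
full_url = base_url + "?" + query_string
print(full_url)
```

Passing a dictionary is usually preferable to writing the query string by hand, since special characters in the values are escaped automatically.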

Let's do the same for San Francisco.

In [8]:
# This is the latitude and longitude of San Francisco.
parameters = {"lat": 37.78, "lon": -122.41}

# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

content = response.content

# Print the content of the response (the data the server returned)
print(response.content)

b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1512932775, \n    "latitude": 37.78, \n    "longitude": -122.41, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 386, \n      "risetime": 1512933442\n    }, \n    {\n      "duration": 639, \n      "risetime": 1512939065\n    }, \n    {\n      "duration": 587, \n      "risetime": 1512944895\n    }, \n    {\n      "duration": 480, \n      "risetime": 1512950803\n    }, \n    {\n      "duration": 521, \n      "risetime": 1512956651\n    }\n  ]\n}\n'


## Using JSON format

The json.dumps and json.loads functions:

In [9]:
# Make a list of fast food chains.
best_food_chains = ["Taco Bell", "Shake Shack", "Chipotle"]
print(type(best_food_chains))

# Import the JSON library.
import json

# Use json.dumps to convert best_food_chains to a string.
best_food_chains_string = json.dumps(best_food_chains)
print(type(best_food_chains_string))

# Convert best_food_chains_string back to a list.
print(type(json.loads(best_food_chains_string)))

# Make a dictionary
fast_food_franchise = {
    "Subway": 24722,
    "McDonalds": 14098,
    "Starbucks": 10821,
    "Pizza Hut": 7600
}

# We can also dump a dictionary to a string and load it.
fast_food_franchise_string = json.dumps(fast_food_franchise)
print(type(fast_food_franchise_string))

fast_food_franchise_2 = json.loads(fast_food_franchise_string)

<class 'list'>
<class 'str'>
<class 'list'>
<class 'str'>
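
A round trip through `json.dumps` and `json.loads` does not always return an identical object. The short sketch below shows the common cases; the `by_rank` dictionary and the tuple are made-up examples.

```python
import json

fast_food_franchise = {"Subway": 24722, "McDonalds": 14098}

# A dictionary with string keys survives the round trip unchanged.
round_trip = json.loads(json.dumps(fast_food_franchise))
print(round_trip == fast_food_franchise)

# Non-string keys are converted to strings, so the round trip is not exact.
by_rank = {1: "Subway", 2: "McDonalds"}
by_rank_round_trip = json.loads(json.dumps(by_rank))
print(by_rank_round_trip)

# Tuples are encoded as JSON arrays and come back as lists.
coords_round_trip = json.loads(json.dumps(("lat", "lon")))
print(coords_round_trip)
```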


Using the JSON format in a request:

In [10]:
parameters = {"lat": 37.78, "lon": -122.41}
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

# Get the response data as a Python object.  We verify that it's a dictionary.
json_data = response.json()
print(type(json_data))
print(json_data)

first_pass_duration = json_data['response'][0]['duration']

# Headers is a dictionary
print(response.headers)

content_type = response.headers['content-type']

<class 'dict'>
{'message': 'success', 'request': {'altitude': 100, 'datetime': 1512932775, 'latitude': 37.78, 'longitude': -122.41, 'passes': 5}, 'response': [{'duration': 386, 'risetime': 1512933442}, {'duration': 639, 'risetime': 1512939065}, {'duration': 587, 'risetime': 1512944895}, {'duration': 480, 'risetime': 1512950803}, {'duration': 521, 'risetime': 1512956651}]}
{'Server': 'nginx/1.10.3', 'Date': 'Sun, 10 Dec 2017 19:06:26 GMT', 'Content-Type': 'application/json', 'Content-Length': '521', 'Connection': 'keep-alive', 'Via': '1.1 vegur'}
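
The `risetime` values in the parsed response look like Unix timestamps, and the sketch below assumes they are seconds since the epoch and that `duration` is in seconds. It converts the first two passes from the output above into readable UTC times; `format_pass` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Sample copied from the response printed above (first two passes only).
json_data = {
    "message": "success",
    "response": [
        {"duration": 386, "risetime": 1512933442},
        {"duration": 639, "risetime": 1512939065},
    ],
}

def format_pass(iss_pass):
    """Format one pass as 'YYYY-MM-DD HH:MM:SS UTC for N s'."""
    # Assumption: risetime is a Unix timestamp in seconds.
    rise = datetime.fromtimestamp(iss_pass["risetime"], tz=timezone.utc)
    return f"{rise:%Y-%m-%d %H:%M:%S} UTC for {iss_pass['duration']} s"

for iss_pass in json_data["response"]:
    print(format_pass(iss_pass))
```

Under that assumption, the first pass starts at 19:17:22 UTC on 2017-12-10, a few minutes after the `Date` header of the response.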


Let's find the number of people in space:

In [12]:
response = requests.get('http://api.open-notify.org/astros.json')
in_space_count = response.json()['number']
print(in_space_count)

6


At the time of this request, 6 people were in space.

## API Authentication using GitHub

In [2]:
import os
token_github = os.environ['TOKEN_GITHUB']

# Create a dictionary of headers containing our Authorization header.
headers = {"Authorization": "token " + token_github}

# Make a GET request to the GitHub API with our headers.
# This API endpoint returns details about the user cnoza.
response = requests.get("https://api.github.com/users/cnoza", headers=headers)

# Print the content of the response. As the output shows, it describes the cnoza account.
print(response.json())

orgs = requests.get('https://api.github.com/users/cnoza/orgs',headers=headers).json()

{'login': 'cnoza', 'id': 31512667, 'avatar_url': 'https://avatars1.githubusercontent.com/u/31512667?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/cnoza', 'html_url': 'https://github.com/cnoza', 'followers_url': 'https://api.github.com/users/cnoza/followers', 'following_url': 'https://api.github.com/users/cnoza/following{/other_user}', 'gists_url': 'https://api.github.com/users/cnoza/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/cnoza/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/cnoza/subscriptions', 'organizations_url': 'https://api.github.com/users/cnoza/orgs', 'repos_url': 'https://api.github.com/users/cnoza/repos', 'events_url': 'https://api.github.com/users/cnoza/events{/privacy}', 'received_events_url': 'https://api.github.com/users/cnoza/received_events', 'type': 'User', 'site_admin': False, 'name': 'Christophe Nozaradan', 'company': None, 'blog': 'https://www.linkedin.com/in/christophenozaradan/', 'location': None, 
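
Reading the token with `os.environ['TOKEN_GITHUB']` raises a bare `KeyError` when the variable is not set. A slightly more defensive way to build the same header could look like the sketch below; `build_github_headers` is a hypothetical helper, and the token value in the demonstration is a placeholder.

```python
import os

def build_github_headers(env=os.environ, variable="TOKEN_GITHUB"):
    """Build the Authorization header from an environment variable.

    Raises a descriptive error instead of a bare KeyError when the
    variable is missing or empty.
    """
    token = env.get(variable)
    if not token:
        raise RuntimeError(f"Set the {variable} environment variable first.")
    return {"Authorization": "token " + token}

# Demonstration with a placeholder value; a real token should never be hard-coded.
print(build_github_headers(env={"TOKEN_GITHUB": "example-token"}))
```

Keeping the token in the environment, as the notebook does, keeps it out of the saved notebook file.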

Using pagination:

In [11]:
params = {"per_page": 50, "page": 1}
response = requests.get("https://api.github.com/users/cnoza/starred", headers=headers, params=params)
print(response.json())

[{'id': 17829481, 'name': 'data', 'full_name': 'fivethirtyeight/data', 'owner': {'login': 'fivethirtyeight', 'id': 6267336, 'avatar_url': 'https://avatars1.githubusercontent.com/u/6267336?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/fivethirtyeight', 'html_url': 'https://github.com/fivethirtyeight', 'followers_url': 'https://api.github.com/users/fivethirtyeight/followers', 'following_url': 'https://api.github.com/users/fivethirtyeight/following{/other_user}', 'gists_url': 'https://api.github.com/users/fivethirtyeight/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/fivethirtyeight/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/fivethirtyeight/subscriptions', 'organizations_url': 'https://api.github.com/users/fivethirtyeight/orgs', 'repos_url': 'https://api.github.com/users/fivethirtyeight/repos', 'events_url': 'https://api.github.com/users/fivethirtyeight/events{/privacy}', 'received_events_url': 'https://api.github.com/users/
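
The request above fetches only the first page. One way to collect every page is to keep incrementing `page` until a page comes back with fewer than `per_page` items. The sketch below shows that loop with an offline stand-in for the API: `fetch_all_pages`, `fake_fetch_page` and the fake repository names are all hypothetical.

```python
def fetch_all_pages(fetch_page, per_page=50):
    """Collect items from successive pages until a short page is returned.

    fetch_page(page, per_page) is a caller-supplied function that returns
    the list of items for one page.
    """
    items = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        items.extend(batch)
        # A page with fewer items than requested is taken to be the last one.
        if len(batch) < per_page:
            break
        page += 1
    return items

# Offline stand-in for the API: 120 fake repository names.
fake_repos = [f"repo-{i}" for i in range(120)]

def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return fake_repos[start:start + per_page]

all_repos = fetch_all_pages(fake_fetch_page, per_page=50)
print(len(all_repos))
```

With the real API, `fetch_page` could wrap the `requests.get(..., params={"per_page": per_page, "page": page}).json()` call used in the cell above.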

Exploring other endpoints of the GitHub API:

In [13]:
# Creating a new repo (named "test") using a "post" request :
payload = {"name": "test"}

response = requests.post("https://api.github.com/user/repos", json=payload, headers=headers)
print(response.status_code)

201


In [14]:
# PATCH request to modify the repository description:
payload = {"description": "The best repository ever!", "name": "test"}
response = requests.patch("https://api.github.com/repos/cnoza/test", json=payload, headers=headers)
print(response.status_code)

200


In [15]:
## DELETE Requests ##
response = requests.delete("https://api.github.com/repos/cnoza/test", headers=headers)
status = response.status_code

## Reddit API

Authentication on Reddit:

In [3]:
import requests.auth

client_id = os.environ['REDDIT_ID']
client_secret = os.environ['REDDIT_SECRET']
password = os.environ['REDDIT_PWD']

client_auth = requests.auth.HTTPBasicAuth(client_id, client_secret)
post_data = {"grant_type": "password", "username": "cnoza", "password": password}
headers = {"User-Agent": "cnoza/0.1"}
response = requests.post("https://www.reddit.com/api/v1/access_token", auth=client_auth, data=post_data, headers=headers)

access_token = response.json()['access_token']
response.json()

{'access_token': '_OibsbOevBzPL7plOM8MY8fNDXQ',
 'expires_in': 3600,
 'scope': '*',
 'token_type': 'bearer'}
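
The `HTTPBasicAuth(client_id, client_secret)` object used above sends the two credentials in an `Authorization: Basic ...` header. The sketch below shows roughly what that header contains, built with the standard library only. `basic_auth_header` is a hypothetical helper and the credentials are placeholders.

```python
import base64

def basic_auth_header(client_id, client_secret):
    """Build an HTTP Basic Authorization header value from two credentials."""
    # The two values are joined with a colon and then Base64-encoded.
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Placeholder credentials for illustration only.
print(basic_auth_header("my-client-id", "my-client-secret"))
```

Base64 is an encoding, not encryption, which is one reason such requests should go over HTTPS as they do above.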

In [4]:
headers = {'Authorization': 'bearer ' + access_token, 'User-Agent': 'cnoza/0.1'}
params = {'t': 'day'}
response = requests.get('https://oauth.reddit.com/r/python/top', headers=headers, params=params)
python_top = response.json()
print(python_top)

{'kind': 'Listing', 'data': {'modhash': None, 'whitelist_status': 'all_ads', 'children': [{'kind': 't3', 'data': {'domain': 'teklern.blogspot.com', 'approved_at_utc': None, 'banned_by': None, 'media_embed': {}, 'subreddit': 'Python', 'selftext_html': None, 'selftext': '', 'likes': None, 'suggested_sort': None, 'user_reports': [], 'secure_media': None, 'is_reddit_media_domain': False, 'saved': False, 'id': '7ipz81', 'banned_at_utc': None, 'view_count': None, 'archived': False, 'clicked': False, 'report_reasons': None, 'title': 'Download information on all your gmail emails and the body text to either csv or json. I developed this to download my 100K + emails stored over several years on gmail.', 'num_crossposts': 0, 'link_flair_text': None, 'mod_reports': [], 'can_mod_post': False, 'is_crosspostable': True, 'pinned': False, 'score': 401, 'approved_by': None, 'over_18': False, 'hidden': False, 'thumbnail': '', 'subreddit_id': 't5_2qh0y', 'edited': False, 'link_flair_css_class': None, 'au

In [6]:
## Getting the Most Upvoted Post ##

python_top_articles = [item['data'] for item in python_top['data']['children']]

upvotes_max = 0
for post in python_top_articles:
    if post['ups'] > upvotes_max:
        most_upvoted = post['id']
        upvotes_max = post['ups']

print('Post ID: ', most_upvoted)

Post ID:  7ipz81
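
The loop above can also be written with the built-in `max` and a key function. The sketch below uses made-up sample posts shaped like the entries of `python_top_articles`; only the `id` and `ups` fields are included.

```python
# Sample posts shaped like the entries of python_top_articles (values are made up).
sample_posts = [
    {"id": "aaa111", "ups": 12},
    {"id": "bbb222", "ups": 401},
    {"id": "ccc333", "ups": 57},
]

# max() with a key function returns the dictionary with the largest 'ups' value.
top_post = max(sample_posts, key=lambda post: post["ups"])
print("Post ID:", top_post["id"])

# default=None avoids a ValueError when the list is empty.
no_post = max([], key=lambda post: post["ups"], default=None)
print(no_post)
```

Unlike the explicit loop, this version does not leave the result undefined when no post has more than zero upvotes.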


In [8]:
## Getting Post Comments ##

full_url = 'https://oauth.reddit.com/r/python/comments/' + most_upvoted
response = requests.get(full_url, headers=headers)
comments = response.json()
print(comments)

[{'kind': 'Listing', 'data': {'modhash': None, 'whitelist_status': 'all_ads', 'children': [{'kind': 't3', 'data': {'domain': 'teklern.blogspot.com', 'approved_at_utc': None, 'banned_by': None, 'media_embed': {}, 'subreddit': 'Python', 'selftext_html': None, 'selftext': '', 'likes': None, 'suggested_sort': None, 'user_reports': [], 'secure_media': None, 'is_reddit_media_domain': False, 'link_flair_text': None, 'id': '7ipz81', 'banned_at_utc': None, 'view_count': None, 'archived': False, 'clicked': False, 'report_reasons': None, 'title': 'Download information on all your gmail emails and the body text to either csv or json. I developed this to download my 100K + emails stored over several years on gmail.', 'num_crossposts': 0, 'saved': False, 'can_mod_post': False, 'is_crosspostable': True, 'pinned': False, 'score': 401, 'approved_by': None, 'over_18': False, 'hidden': False, 'num_comments': 29, 'thumbnail': '', 'subreddit_id': 't5_2qh0y', 'hide_score': False, 'edited': False, 'link_flai

In [9]:
## Getting the Most Upvoted Comment ##

comments_list = [item['data'] for item in comments[1]['data']['children']]

ups_max = 0
for item in comments_list:
    if item['ups'] > ups_max:
        most_upvoted_comment = item['id']
        ups_max = item['ups']

print('Most Upvoted Comment ID :', most_upvoted_comment)

Most Upvoted Comment ID : dr0loqs
