<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practice Using APIs

_Authors: Dave Yerrington (SF), Sam Stack (DC)_

---

In this lab, we'll practice using some popular APIs to retrieve and store data.

In [1]:
# Imports at the top.
import json
import re
import urllib

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests

%matplotlib inline

## Exercise 1: Get Data From Sheetsu

---

[Sheetsu](https://sheetsu.com/) is an online service that allows you to access any Google spreadsheet from an API. This can be a powerful way to share a data set with colleagues, as well as create mini, centralized data storage that is simpler to edit than a database.

A Google spreadsheet with wine data can be found [here](https://docs.google.com/spreadsheets/d/1pBwap3K4Blwbx3Su07HAxxZCyy0lOGAiwBrUIvuDbsE).

It can be accessed through the Sheetsu API at this endpoint: https://sheetsu.com/apis/v1.0/1a4050d2ae98.

**Questions:**

1) Use the `requests` library to access the document. Inspect the response text. What kind of data is it?
2) Check the status code of the response object. What code is it?
3) Use the appropriate libraries and read functions to read the response into a Pandas DataFrame.
4) Once you've imported the data into a DataFrame, check the value of the fifth row. What's the price?

In [40]:
# Request the sheet's contents from the Sheetsu endpoint (this API accepts both GET and POST).
api_base_url = 'https://sheetsu.com/apis/v1.0/1a4050d2ae98'
result = requests.get(api_base_url)
# The body is JSON; result.json() decodes it into Python objects.
google_result = result.json()
df = pd.DataFrame(google_result)
df

| | Color | Consumed In | Country | Grape | Name | Price | Region | Score | Vintage | Vinyard |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W | 2015 | Portugal | | | | Portugal | 4.0 | 2013 | Vinho Verde |
| 1 | W | 2015 | France | | | 17.8 | France | 3.0 | 2013 | Peyruchet |
| 2 | W | 2015 | Oregon | | | 20.0 | Oregon | 3.0 | 2013 | Abacela |
| 3 | W | 2015 | Spain | chardonay | | 7.0 | Spain | 2.5 | 2012 | Ochoa |
| 4 | R | 2015 | US | chiraz, cab | Spice Trader | 6.0 | | 3.0 | 2012 | Heartland |
| 5 | R | 2015 | US | cab | | 13.0 | California | 3.5 | 2012 | Crow Canyon |
| 6 | R | 2015 | US | | #14 | 21.0 | Oregon | 2.5 | 2013 | Abacela |
| 7 | R | 2015 | France | merlot, cab | | 12.0 | Bordeaux | 3.5 | 2012 | David Beaulieu |
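
To make the last question concrete, here is a minimal sketch that rebuilds the first five rows shown above as a list of records (the shape `result.json()` is assumed to return) and reads the price from the fifth row. The name `sample_records` and the trimmed set of columns are illustrative only. The live endpoint may return every value as a string, so the sketch assumes that and converts with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical stand-in for result.json(): one dict per spreadsheet row,
# copied from the first five rows of the output above (columns trimmed).
sample_records = [
    {"Color": "W", "Country": "Portugal", "Price": "", "Vinyard": "Vinho Verde"},
    {"Color": "W", "Country": "France", "Price": "17.8", "Vinyard": "Peyruchet"},
    {"Color": "W", "Country": "Oregon", "Price": "20.0", "Vinyard": "Abacela"},
    {"Color": "W", "Country": "Spain", "Price": "7.0", "Vinyard": "Ochoa"},
    {"Color": "R", "Country": "US", "Price": "6.0", "Vinyard": "Heartland"},
]

sample_df = pd.DataFrame(sample_records)

# Convert prices to numbers; blank or malformed cells become NaN.
sample_df["Price"] = pd.to_numeric(sample_df["Price"], errors="coerce")

# The fifth row sits at integer position 4.
fifth_price = sample_df.iloc[4]["Price"]
print(fifth_price)
```

Using `iloc` selects by position, so the answer does not depend on how the index happens to be labeled.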


## Exercise 2: IMDb TV Shows

---

Sometimes an API doesn't provide all of the information we'd like and we need to get creative.

Here we'll use a combination of scraping and API calls to find the ratings and networks of famous television shows.

### 2.A) Get the Top TV Shows

IMDb contains data about movies and TV shows. Unfortunately, it doesn't have a public API.

The page http://www.imdb.com/chart/toptv/?ref_=nv_tp_tv250_2 contains the list of the top 250 television shows of all time. Retrieve the page using the `requests` library and then parse the HTML to obtain a list of the `television_ids` for these shows. You can parse it with a regular expression or by using a library like `BeautifulSoup`.

> **Hint:** television_ids look like this: `tt2582802`.
> _Everything after "/title/" and before "/?"_
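
As a sketch of the regular-expression route, the snippet below pulls IDs out of a small, made-up HTML fragment. The fragment `sample_html`, its class names, and the `seen` bookkeeping are assumptions for illustration. The markup of the real page may differ, and a link to the same show can appear more than once, which is why repeats are dropped while the original order is kept.

```python
import re

# Hypothetical fragment imitating links on the chart page.
sample_html = '''
<td class="titleColumn"><a href="/title/tt5491994/?ref_=chttvtp_tt_1">Planet Earth II</a></td>
<td class="posterColumn"><a href="/title/tt5491994/?ref_=chttvtp_i_1"><img/></a></td>
<td class="titleColumn"><a href="/title/tt0185906/?ref_=chttvtp_tt_2">Band of Brothers</a></td>
'''

# Capture "tt" plus digits between "/title/" and the following "/".
matches = re.findall(r'/title/(tt\d+)/', sample_html)

# Drop repeats while keeping first-seen order.
seen = set()
television_ids = []
for show_id in matches:
    if show_id not in seen:
        seen.add(show_id)
        television_ids.append(show_id)

print(television_ids)
```

Matching on `tt\d+` rather than a fixed character count keeps the pattern working if an ID has more digits than expected.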

In [17]:
import requests
from bs4 import BeautifulSoup

In [18]:
# Target web page:
url = "http://www.imdb.com/chart/toptv/?ref_=nv_tp_tv250_2"
# Establishing the connection to the web page:
response = requests.get(url)

# You can use status codes to understand how the target server responds to your request.
# Ex., 200 = OK, 400 = Bad Request, 403 = Forbidden, 404 = Not Found.
print 'Status Code: ',response.status_code

# Pull the HTML string out of requests and convert it to a Python string.
html = response.text

# The first 500 characters of the content.
print "\nFirst part of HTML document fetched as string:\n"
print html[:500]

Status Code:  200

First part of HTML document fetched as string:




<!DOCTYPE html>
<html
    xmlns:og="http://ogp.me/ns#"
    xmlns:fb="http://www.facebook.com/2008/fbml">
    <head>
         
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">

    
    
    

    
    
    

    <meta name="apple-itunes-app" content="app-id=342792525, app-argument=imdb:///?src=mdot">
            <style>
                body#styleguide-v2 {
                    background: no-repeat fixed center top #000;
                }
           


In [19]:
soup = BeautifulSoup(html, 'lxml')

In [22]:
soup

<!DOCTYPE html>\n<html xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#">\n<head>\n<meta charset="unicode-escape"/>\n<meta content="IE=edge" http-equiv="X-UA-Compatible"/>\n<meta content="app-id=342792525, app-argument=imdb:///?src=mdot" name="apple-itunes-app"/>\n<style>\n                body#styleguide-v2 {\n                    background: no-repeat fixed center top #000;\n                }\n            </style>\n<script type="text/javascript">var IMDbTimer={starttime: new Date().getTime(),pt:'java'};</script>\n<script>\n    if (typeof uet == 'function') {\n      uet("bb", "LoadTitle", {wb: 1});\n    }\n</script>\n<script>(function(t){ (t.events = t.events || {})["csm_head_pre_title"] = new Date().getTime(); })(IMDbTimer);</script>\n<title>IMDb Top 250 TV - IMDb</title>\n<script>(function(t){ (t.events = t.events || {})["csm_head_post_title"] = new Date().getTime(); })(IMDbTimer);</script>\n<script>\n    if (typeof uet == 'function') {\n      uet("be", "LoadTi

In [26]:
tds = soup.findAll("td", {"class": "titleColumn"})

In [104]:
type(tds)

bs4.element.ResultSet

In [89]:
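ids = []
for td in tds:
    for a in td.findChildren('a'):
        # Take the ID between "/title/" and the next "/" rather than slicing fixed positions.
        ids.append(re.search(r'/title/(tt\d+)/', a.get('href')).group(1))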

In [109]:
ids

['tt5491994',
 'tt0185906',
 'tt0795176',
 'tt0944947',
 'tt0903747',
 'tt0306414',
 'tt2395695',
 'tt2861424',
 'tt0081846',
 'tt0141842',
 'tt0071075',
 'tt0417299',
 'tt1533395',
 'tt6769208',
 'tt1475582',
 'tt1806234',
 'tt0052520',
 'tt0098769',
 'tt0092337',
 'tt0303461',
 'tt2356777',
 'tt1355642',
 'tt3530232',
 'tt2802850',
 'tt0103359',
 'tt0877057',
 'tt0296310',
 'tt0213338',
 'tt4508902',
 'tt2085059',
 'tt0063929',
 'tt0112130',
 'tt0081834',
 'tt2571774',
 'tt2092588',
 'tt4574334',
 'tt0367279',
 'tt0475784',
 'tt0108778',
 'tt1856010',
 'tt7221388',
 'tt0098904',
 'tt0081912',
 'tt3718778',
 'tt0098936',
 'tt2707408',
 'tt1865718',
 'tt0193676',
 'tt0074006',
 'tt4742876',
 'tt0072500',
 'tt0096548',
 'tt0384766',
 'tt0386676',
 'tt2442560',
 'tt0118421',
 'tt0096697',
 'tt0472954',
 'tt2560140',
 'tt0121955',
 'tt0412142',
 'tt0200276',
 'tt4299972',
 'tt0353049',
 'tt2297757',
 'tt0096639',
 'tt0214341',
 'tt0086661',
 'tt0108855',
 'tt0264235',
 'tt0248654',
 'tt03

In [108]:
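# Caution: the `for` clauses are reversed here, so `td` is the stale last cell from the loop above
# and its link repeats; the intended order is `for td in tds for text in td.findChildren('a')`.
[text.get('href') for text in td.findChildren('a') for td in tds]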

['/title/tt3673794/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=12230b0e-0e00-43ed-9e59-8d5353703cce&pf_rd_r=34Q58F3BHH9D4S0YRRS3&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=toptv&ref_=chttvtp_tt_250',
 '/title/tt3673794/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=12230b0e-0e00-43ed-9e59-8d5353703cce&pf_rd_r=34Q58F3BHH9D4S0YRRS3&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=toptv&ref_=chttvtp_tt_250',
 '/title/tt3673794/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=12230b0e-0e00-43ed-9e59-8d5353703cce&pf_rd_r=34Q58F3BHH9D4S0YRRS3&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=toptv&ref_=chttvtp_tt_250',
 '/title/tt3673794/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=12230b0e-0e00-43ed-9e59-8d5353703cce&pf_rd_r=34Q58F3BHH9D4S0YRRS3&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=toptv&ref_=chttvtp_tt_250',
 '/title/tt3673794/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=12230b0e-0e00-43ed-9e59-8d5353703cce&pf_rd_r=34Q58F3BHH9D4S0YRRS3&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=toptv&ref_=chttvtp_tt_250',
 '/title/tt3673794/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=12230b0e-0e00-43ed-9e59-8d535

### 2.B) Get Data on the Top TV Shows

Although IMDb doesn't have a public API, TVmaze offers an open one at http://www.tvmaze.com/api.

Use this API to retrieve information about each of the 250 TV shows you extracted in the previous step.
1) Check the documentation of TVmaze's API to learn how to request show data by ID.
2) Define a function that returns a Python object with select information for a given ID:
    - Show name.
    - Rating (avg).
    - Genre(s).
    - Network name.
    - Premiere date.
    - Status.

> Tip: The JSON object can easily be converted into a Python dictionary.

3) Store the gathered information in a Pandas DataFrame.

Because the response body is JSON, decode it with `json.loads(res.text)` (or the equivalent `res.json()`) before pulling out the fields you need.
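
A minimal sketch of such a function follows, run against a trimmed, hard-coded copy of the Game of Thrones response printed in the cells below so that it works offline. The function name `extract_show_info` and the `sample_text` string are illustrative assumptions, not part of TVmaze's API. `.get()` is used so that a missing field yields `None` instead of raising.

```python
import json

# Trimmed, hard-coded sample shaped like the lookup response for tt0944947.
sample_text = '''{
  "name": "Game of Thrones",
  "rating": {"average": 9.4},
  "genres": ["Drama", "Adventure", "Fantasy"],
  "premiered": "2011-04-17",
  "network": {"id": 8, "name": "HBO"},
  "status": "Running"
}'''


def extract_show_info(show):
    """Return the selected fields from a decoded show dictionary."""
    rating = show.get("rating") or {}
    network = show.get("network") or {}
    return {
        "name": show.get("name"),
        "rating": rating.get("average"),
        "genres": show.get("genres", []),
        "network": network.get("name"),
        "premiered": show.get("premiered"),
        "status": show.get("status"),
    }


show = json.loads(sample_text)   # JSON text -> Python dict
info = extract_show_info(show)
print(info["name"])
print(info["network"])
```

Returning the network name rather than the whole nested dictionary keeps each record flat, which makes the later DataFrame easier to read.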

In [58]:
# Look up one show by its IMDb ID (tt0944947) to inspect the shape of the response.
api_base_url = 'http://api.tvmaze.com/lookup/shows?imdb=tt0944947'
res = requests.get(api_base_url)
text = json.loads(res.text)

In [88]:
print text['name']
print text['rating']['average']
print text['genres']
print text['premiered']
print text['network']
print text['status']

Game of Thrones
9.4
[u'Drama', u'Adventure', u'Fantasy']
2011-04-17
{u'country': {u'timezone': u'America/New_York', u'code': u'US', u'name': u'United States'}, u'id': 8, u'name': u'HBO'}
Running


In [127]:
show_dicts = {}
for idx, imdb_id in enumerate(ids):
    show_dict = {}
    print idx
    # Query TVmaze's lookup endpoint with the IMDb ID.
    api_base_url = 'http://api.tvmaze.com/lookup/shows?imdb=%s' % imdb_id
    res = requests.get(api_base_url)
    # Raises ValueError if the response body is not valid JSON.
    text = json.loads(res.text)
    try:
        show_dict['name'] = text['name']
        show_dict['rating'] = text['rating']['average']
        show_dict['genres'] = text['genres']
        show_dict['premiered'] = text['premiered']
        show_dict['network'] = text['network']
        show_dict['status'] = text['status']
    except (KeyError, TypeError):
        # Report IDs whose response lacks an expected field.
        print imdb_id

    show_dicts[idx] = show_dict

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51


ValueError: No JSON object could be decoded
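
The `ValueError` above is raised by `json.loads`, which sits outside the `try` block, so a single response without a JSON body stops the whole loop. A likely cause is an ID that TVmaze cannot match, though that is an assumption here. The sketch below shows one way to guard the decoding step. `parse_show` is a hypothetical helper, and the two strings stand in for response bodies.

```python
import json


def parse_show(body):
    """Decode a response body, returning None when it is not valid JSON."""
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError in Python 3
        return None


# Stand-ins for res.text: one valid body and one empty body.
# "tt0000000" is a made-up ID used only for this illustration.
bodies = {"tt0944947": '{"name": "Game of Thrones"}', "tt0000000": ""}

shows = {}
failed_ids = []
for imdb_id, body in bodies.items():
    show = parse_show(body)
    if show is None:
        failed_ids.append(imdb_id)   # remember the ID and keep going
    else:
        shows[imdb_id] = show

print(failed_ids)
```

Collecting the failed IDs in a list, instead of only printing them, makes it possible to retry or inspect them after the loop finishes.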

In [128]:
show_dicts

{0: {'genres': [u'Nature'],
  'name': u'Planet Earth II',
  'network': {u'country': {u'code': u'GB',
    u'name': u'United Kingdom',
    u'timezone': u'Europe/London'},
   u'id': 12,
   u'name': u'BBC One'},
  'premiered': u'2016-11-06',
  'rating': 9.7,
  'status': u'Ended'},
 1: {'genres': [u'Drama', u'Action', u'War'],
  'name': u'Band of Brothers',
  'network': {u'country': {u'code': u'US',
    u'name': u'United States',
    u'timezone': u'America/New_York'},
   u'id': 8,
   u'name': u'HBO'},
  'premiered': u'2001-09-09',
  'rating': 9.4,
  'status': u'Ended'},
 2: {'genres': [u'Nature'],
  'name': u'Planet Earth',
  'network': {u'country': {u'code': u'GB',
    u'name': u'United Kingdom',
    u'timezone': u'Europe/London'},
   u'id': 12,
   u'name': u'BBC One'},
  'premiered': u'2006-03-05',
  'rating': 9.3,
  'status': u'Ended'},
 3: {'genres': [u'Drama', u'Adventure', u'Fantasy'],
  'name': u'Game of Thrones',
  'network': {u'country': {u'code': u'US',
    u'name': u'United State

In [125]:
df = pd.DataFrame.from_dict(show_dicts, orient='index')

In [126]:
df.head()

| | genres | name | network | premiered | rating | status |
|---|---|---|---|---|---|---|
| 0 | [Nature] | Planet Earth II | {u'country': {u'timezone': u'Europe/London', u... | 2016-11-06 | 9.7 | Ended |
| 1 | [Drama, Action, War] | Band of Brothers | {u'country': {u'timezone': u'America/New_York'... | 2001-09-09 | 9.4 | Ended |
| 2 | [Nature] | Planet Earth | {u'country': {u'timezone': u'Europe/London', u... | 2006-03-05 | 9.3 | Ended |
| 3 | [Drama, Adventure, Fantasy] | Game of Thrones | {u'country': {u'timezone': u'America/New_York'... | 2011-04-17 | 9.4 | Running |
| 4 | [Drama, Crime, Thriller] | Breaking Bad | {u'country': {u'timezone': u'America/New_York'... | 2008-01-20 | 9.3 | Ended |
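
The `network` column still holds the nested dictionary rather than the network name the exercise asks for. One possible way to flatten it, sketched on two hand-copied rows (the `sample` frame and the `network_name` column are illustrative names), is to map each dictionary to its `name` key and fall back to `None` when the value is missing.

```python
import pandas as pd

# Two rows copied by hand from the output above, with the nested dicts trimmed.
sample = pd.DataFrame({
    "name": ["Planet Earth II", "Band of Brothers"],
    "genres": [["Nature"], ["Drama", "Action", "War"]],
    "network": [{"id": 12, "name": "BBC One"}, {"id": 8, "name": "HBO"}],
})

# Pull the name out of each network dict; non-dict values (e.g. None) become None.
sample["network_name"] = sample["network"].apply(
    lambda net: net.get("name") if isinstance(net, dict) else None
)

# Join each genre list into one readable string.
sample["genres"] = sample["genres"].apply(lambda genres: ", ".join(genres))

print(sample[["name", "network_name", "genres"]])
```

The `isinstance` check matters for shows that have no broadcast network recorded, where the field may be empty rather than a dictionary.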
