## Intro to APIs in Python Workshop

#### h/t to @josephofiowa

We will be using the OMDb Open Movie Database here: http://www.omdbapi.com/

First, we import the necessary libraries:

- **pandas** is a common Python library for data analysis.
- **requests** is a popular Python library for making HTTP requests, which is how we call web APIs.
- **time** is part of Python's standard library; its `sleep` function lets us pause between API calls so we don't get rate limited.

```python
import pandas as pd
import requests
from time import sleep
```

Read the IMDb movie data (`imdb_1000.csv`) into a DataFrame. It has no year column, so we will use the API to add one.

```python
movies = pd.read_csv('https://raw.githubusercontent.com/josephnelson93/GWDATA/master/imdb_1000.csv')
movies.head()
```

| | star_rating | title | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|---|
| 0 | 9.3 | The Shawshank Redemption | R | Crime | 142 | [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... |
| 1 | 9.2 | The Godfather | R | Crime | 175 | [u'Marlon Brando', u'Al Pacino', u'James Caan'] |
| 2 | 9.1 | The Godfather: Part II | R | Crime | 200 | [u'Al Pacino', u'Robert De Niro', u'Robert Duv... |
| 3 | 9.0 | The Dark Knight | PG-13 | Action | 152 | [u'Christian Bale', u'Heath Ledger', u'Aaron E... |
| 4 | 8.9 | Pulp Fiction | R | Crime | 154 | [u'John Travolta', u'Uma Thurman', u'Samuel L.... |


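Use the requests library to send a GET request to the OMDb API. The query string sets the API key (`apikey`), the title to search for (`t`), the response format (`r=json`), and the type of result (`type=movie`).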

```python
r = requests.get('http://www.omdbapi.com/?apikey=f4e95a92&t=the+shawshank+redemption&r=json&type=movie')
```
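Rather than writing the query string by hand, `requests` can build it from a `params` dictionary, which also URL-encodes values such as titles containing spaces, colons, or `&`. This sketch builds the same request without sending it (via `requests.Request(...).prepare()`), so it runs offline, and prints the URL it would call:

```python
import requests

# The same query as above, expressed as a params dict (values copied from the URL)
params = {
    'apikey': 'f4e95a92',
    't': 'the shawshank redemption',
    'r': 'json',
    'type': 'movie',
}

# Build, but do not send, the request so we can inspect the final encoded URL
prepared = requests.Request('GET', 'http://www.omdbapi.com/', params=params).prepare()
print(prepared.url)
```

Sending it for real would be `requests.get('http://www.omdbapi.com/', params=params)`.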

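Check the status code: 200 means success, codes in the 4xx range mean there was a problem with the request itself (a client error), and 5xx codes mean the server failed.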

```python
r.status_code
```

```
200
```
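The same ranges can be checked in code. This sketch uses a hypothetical helper, `describe_status`, to bucket codes, and a hand-built `requests.Response` (constructed only to avoid a network call) to show the library's own `ok` attribute, which is `False` for 4xx and 5xx codes:

```python
import requests

def describe_status(code):
    """Map an HTTP status code to a rough category (hypothetical helper)."""
    if 200 <= code < 300:
        return 'success'
    if 300 <= code < 400:
        return 'redirect'
    if 400 <= code < 500:
        return 'client error'
    if 500 <= code < 600:
        return 'server error'
    return 'unknown'

print(describe_status(200))  # success
print(describe_status(404))  # client error

# A Response object built by hand, with no request behind it, just to show `.ok`
fake = requests.Response()
fake.status_code = 404
print(fake.ok)  # False; on a real response, r.raise_for_status() would raise requests.HTTPError here
```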

View the raw response text

```python
r.text
```

```
u'{"Title":"The Shawshank Redemption","Year":"1994","Rated":"R","Released":"14 Oct 1994","Runtime":"142 min","Genre":"Drama","Director":"Frank Darabont","Writer":"Stephen King (short story \\"Rita Hayworth and Shawshank Redemption\\"), Frank Darabont (screenplay)","Actors":"Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler","Plot":"Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.","Language":"English","Country":"USA","Awards":"Nominated for 7 Oscars. Another 19 wins & 32 nominations.","Poster":"https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_SX300.jpg","Ratings":[{"Source":"Internet Movie Database","Value":"9.3/10"},{"Source":"Rotten Tomatoes","Value":"91%"},{"Source":"Metacritic","Value":"80/100"}],"Metascore":"80","imdbRating":"9.3","imdbVotes":"2,069,380","imdbID":"tt0111161","Type":"movie","DVD":"27 Jan 1998","BoxOffice":"N/A","Product
```

Decode the JSON response body into a dictionary

```python
r.json()
```

```
{u'Actors': u'Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler',
 u'Awards': u'Nominated for 7 Oscars. Another 19 wins & 32 nominations.',
 u'BoxOffice': u'N/A',
 u'Country': u'USA',
 u'DVD': u'27 Jan 1998',
 u'Director': u'Frank Darabont',
 u'Genre': u'Drama',
 u'Language': u'English',
 u'Metascore': u'80',
 u'Plot': u'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.',
 u'Poster': u'https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_SX300.jpg',
 u'Production': u'Columbia Pictures',
 u'Rated': u'R',
 u'Ratings': [{u'Source': u'Internet Movie Database', u'Value': u'9.3/10'},
  {u'Source': u'Rotten Tomatoes', u'Value': u'91%'},
  {u'Source': u'Metacritic', u'Value': u'80/100'}],
 u'Released': u'14 Oct 1994',
 u'Response': u'True',
 u'Runtime': u'142 min',
 u'Title': u'The Shawshank Redemption',
 u'Type': u'movie',
 u'Website': u'N/A',
 u'Write
```
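`r.json()` parses the response body as JSON, much like calling `json.loads(r.text)` yourself. The sketch below does this with the standard-library `json` module on a shortened excerpt of the body above. (The `u'...'` prefixes in the notebook output mean it was run under Python 2; Python 3 prints the same strings without them.)

```python
import json

# A shortened excerpt of the raw response body shown above
raw = '{"Title":"The Shawshank Redemption","Year":"1994","Response":"True"}'

info = json.loads(raw)  # parse the JSON text into a Python dict
print(type(info))       # <class 'dict'>
print(info['Title'])    # The Shawshank Redemption
```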

Extract the year from the dictionary. Note that it comes back as a string, not an integer.

```python
r.json()['Year']
```

```
u'1994'
```
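Nested values work the same way. `Ratings` is a list of dictionaries, so one approach (a sketch using an excerpt of the response above) is to turn it into a `{source: value}` lookup and strip units before converting to numbers:

```python
# An excerpt of the decoded dictionary shown above
info = {
    'Year': '1994',
    'Ratings': [
        {'Source': 'Internet Movie Database', 'Value': '9.3/10'},
        {'Source': 'Rotten Tomatoes', 'Value': '91%'},
        {'Source': 'Metacritic', 'Value': '80/100'},
    ],
}

year = int(info['Year'])  # the API returns the year as a string

# 'Ratings' is a list of dicts; build a {source: value} lookup from it
ratings = {item['Source']: item['Value'] for item in info['Ratings']}
rotten = int(ratings['Rotten Tomatoes'].rstrip('%'))  # '91%' -> 91
print(year, rotten)  # 1994 91
```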

What happens if the movie title is not recognized? OMDb reports the failure in the JSON body: `Response` is `'False'` and `Error` says why.

```python
r = requests.get('http://www.omdbapi.com/?apikey=f4e95a92&t=blahblahblah&r=json&type=movie')
r.status_code  # not displayed: Jupyter only shows the value of a cell's last line
r.json()
```

```
{u'Error': u'Movie not found!', u'Response': u'False'}
```
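Because `Response` is the *string* `'True'` or `'False'`, a check like `if info['Response']:` would always pass, since any non-empty string is truthy. A hypothetical helper, `omdb_error`, that compares against the string explicitly and surfaces OMDb's message, tested on the two responses shown in this notebook:

```python
def omdb_error(info):
    """Return OMDb's error message if the lookup failed, else None (hypothetical helper)."""
    # Response is the string 'True' or 'False', not a Python bool
    if info.get('Response') == 'True':
        return None
    return info.get('Error', 'Unknown error')

not_found = {'Error': 'Movie not found!', 'Response': 'False'}
found = {'Title': 'The Shawshank Redemption', 'Year': '1994', 'Response': 'True'}

print(omdb_error(not_found))  # Movie not found!
print(omdb_error(found))      # None
```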

Define a function that returns a movie's year as an integer, or `None` if OMDb does not find the title.

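```python
def get_movie_year(title):
    """Return the release year of `title` as an int, or None if OMDb can't find it."""
    # Passing the query as `params` lets requests URL-encode titles containing spaces, ':' or '&'
    params = {'apikey': 'f4e95a92', 't': title, 'r': 'json', 'type': 'movie'}
    r = requests.get('http://www.omdbapi.com/', params=params)
    info = r.json()
    if info['Response'] == 'True':
        return int(info['Year'])
    return None
```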

Test the function

```python
get_movie_year('The Shawshank Redemption')
```

```
1994
```

```python
get_movie_year('blahblahblah')  # returns None, so Jupyter displays no output
```
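`int(info['Year'])` works for these movies, but it assumes `Year` is always a plain four-digit string. As an assumption not confirmed by this workshop, other results might carry a range such as `'2008–2013'` or a placeholder such as `'N/A'`. A more defensive, hypothetical helper `parse_year` pulls out the first four-digit number and returns `None` otherwise:

```python
import re

def parse_year(year_str):
    """Return the first four-digit year in year_str as an int, or None (hypothetical helper)."""
    match = re.search(r'\d{4}', year_str)
    return int(match.group()) if match else None

print(parse_year('1994'))       # 1994
print(parse_year('2008–2013'))  # 2008 (assumed range format)
print(parse_year('N/A'))        # None
```

Inside `get_movie_year`, the last successful branch would then become `return parse_year(info['Year'])`.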

Create a smaller DataFrame (the first five movies) for testing, so we only make a few API calls.

```python
top_movies = movies.head().copy()
```
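`.copy()` gives `top_movies` its own data rather than a view of `movies`, so adding a `year` column later changes only the copy and avoids the `SettingWithCopyWarning` that some pandas versions raise on slices. A small offline sketch, using a hypothetical stand-in DataFrame `movies_demo` instead of the downloaded CSV:

```python
import pandas as pd

# A tiny hypothetical stand-in for `movies`, so this runs without downloading the CSV
movies_demo = pd.DataFrame({'title': ['A', 'B', 'C'], 'star_rating': [9.3, 9.2, 9.1]})

subset = movies_demo.head(2).copy()  # an independent DataFrame, not a view
subset['year'] = [1994, 1972]        # adding a column touches only the copy

print('year' in subset.columns)       # True
print('year' in movies_demo.columns)  # False
```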

Loop over the titles to build a list of years, pausing for one second after each call to avoid hitting the API's rate limit.

```python
years = []
for title in top_movies.title:
    years.append(get_movie_year(title))
    sleep(1)  # pause between requests so we don't get rate limited
```
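The loop above sleeps after every call, including the last. As a sketch (the helper `throttled_map` and the lookup dict below are hypothetical, not part of the workshop), the same pattern can be wrapped in a reusable function that pauses only between calls. A dict's `.get` stands in for `get_movie_year`, so the example runs without an API key:

```python
from time import sleep

def throttled_map(func, items, delay=1.0):
    """Call func on each item in order, pausing `delay` seconds between calls."""
    results = []
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay)  # wait only between calls, not after the last one
        results.append(func(item))
    return results

# Hypothetical offline stand-in for get_movie_year
fake_years = {'The Godfather': 1972, 'Pulp Fiction': 1994}
demo_years = throttled_map(fake_years.get, ['The Godfather', 'Pulp Fiction', 'Unknown'], delay=0.1)
print(demo_years)  # [1972, 1994, None]
```

With the real function, this would be `years = throttled_map(get_movie_year, top_movies.title)`.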

Check that the DataFrame and the list of years are the same length

```python
assert len(top_movies) == len(years)
```

Save that list as a new column

```python
top_movies['year'] = years
```

```python
top_movies
```

| | star_rating | title | content_rating | genre | duration | actors_list | year |
|---|---|---|---|---|---|---|---|
| 0 | 9.3 | The Shawshank Redemption | R | Crime | 142 | [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... | 1994 |
| 1 | 9.2 | The Godfather | R | Crime | 175 | [u'Marlon Brando', u'Al Pacino', u'James Caan'] | 1972 |
| 2 | 9.1 | The Godfather: Part II | R | Crime | 200 | [u'Al Pacino', u'Robert De Niro', u'Robert Duv... | 1974 |
| 3 | 9.0 | The Dark Knight | PG-13 | Action | 152 | [u'Christian Bale', u'Heath Ledger', u'Aaron E... | 2008 |
| 4 | 8.9 | Pulp Fiction | R | Crime | 154 | [u'John Travolta', u'Uma Thurman', u'Samuel L.... | 1994 |
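Here every lookup succeeded, so `year` holds whole numbers. If `get_movie_year` had returned `None` for any title, pandas would store the column as floats with `NaN`. One option, sketched below with made-up sample values, is pandas' nullable integer dtype `'Int64'`:

```python
import pandas as pd

# Made-up sample: get_movie_year returns None for titles OMDb can't find
sample_years = [1994, None, 1974]

as_float = pd.Series(sample_years)               # None becomes NaN, so ints become floats
as_int = pd.Series(sample_years, dtype='Int64')  # nullable integers; None becomes <NA>

print(as_float.dtype)       # float64
print(as_int.dtype)         # Int64
print(as_int.isna().sum())  # 1
```

Applied to the notebook, that would be `top_movies['year'] = pd.array(years, dtype='Int64')`.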
