## Getting Data From APIs

What is an API?
- Application Programming Interface
- Structured way to expose specific functionality and data access to users
- Many web APIs follow the "REST" architectural style

How to interact with a REST API:
- Make a "request" to a specific URL (an "endpoint"), and get the data back in a "response"
- Most relevant request method for us is GET (other methods: POST, PUT, DELETE)
- Response is often JSON format
- Web console is sometimes available (allows you to explore an API)
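
To make the request/endpoint vocabulary concrete, here is a minimal sketch (Python 3, standard library only) of how an endpoint URL is assembled from a base URL plus query parameters. The helper name `build_endpoint_url` is made up for illustration; the URL and parameters are the ones used in the cells further down.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_endpoint_url(base_url, params):
    """Append URL-encoded query parameters to a base URL."""
    return base_url + "?" + urlencode(params)

url = build_endpoint_url(
    "http://www.omdbapi.com/",
    {"t": "the shawshank redemption", "r": "json", "type": "movie"},
)
print(url)

# Splitting the URL recovers the pieces the server sees.
parts = urlsplit(url)
print(parts.netloc)           # host that receives the request
print(parse_qs(parts.query))  # decoded query parameters
```

Note that the spaces in the title are encoded (here as `+`), which is why building the query string with a helper is usually safer than pasting raw text into a URL.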

### Read the IMDb data into a DataFrame: we want a year column!


In [2]:
import pandas as pd
movies = pd.read_csv('../data/imdb_1000.csv')
movies.head(2)

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"


We can use the **requests** library to send HTTP requests to URLs and web pages.

**Homework:** Read up on the requests library

In [4]:
import requests
# from the omdbapi.com site:
r1 = requests.get('http://www.omdbapi.com/?t=Minions&y=&plot=short&r=json&type=movie')
r = requests.get('http://www.omdbapi.com/?t=the shawshank redemption&r=json&type=movie')

In [8]:
# check the status: 200 means success, 4xx means client error, 5xx means server error
r.status_code

200
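
As a rough guide to reading status codes, the sketch below (Python 3) maps a code to its general class. The function name `describe_status` is hypothetical and only covers the standard ranges.

```python
def describe_status(status_code):
    """Map an HTTP status code to its general class."""
    if 200 <= status_code < 300:
        return "success"
    if 300 <= status_code < 400:
        return "redirection"
    if 400 <= status_code < 500:
        return "client error"
    if 500 <= status_code < 600:
        return "server error"
    return "informational or unknown"

for code in (200, 404, 500):
    print(code, describe_status(code))
```

In practice, `requests` can do this check for you: `r.raise_for_status()` raises an exception when the response is a 4xx or 5xx.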

In [10]:
# view the raw response text
r.text

u'{"Title":"The Shawshank Redemption","Year":"1994","Rated":"R","Released":"14 Oct 1994","Runtime":"142 min","Genre":"Crime, Drama","Director":"Frank Darabont","Writer":"Stephen King (short story \\"Rita Hayworth and Shawshank Redemption\\"), Frank Darabont (screenplay)","Actors":"Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler","Plot":"Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.","Language":"English","Country":"USA","Awards":"Nominated for 7 Oscars. Another 14 wins & 19 nominations.","Poster":"http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_SX300.jpg","Metascore":"80","imdbRating":"9.3","imdbVotes":"1,569,613","imdbID":"tt0111161","Type":"movie","Response":"True"}'

In [13]:
r1.json()

{u'Actors': u'Sandra Bullock, Jon Hamm, Michael Keaton, Allison Janney',
 u'Awards': u'N/A',
 u'Country': u'USA',
 u'Director': u'Kyle Balda, Pierre Coffin',
 u'Genre': u'Animation, Comedy, Family',
 u'Language': u'English',
 u'Metascore': u'56',
 u'Plot': u'Minions Stuart, Kevin and Bob are recruited by Scarlet Overkill, a super-villain who, alongside her inventor husband Herb, hatches a plot to take over the world.',
 u'Poster': u'http://ia.media-imdb.com/images/M/MV5BMTg2MTMyMzU0M15BMl5BanBnXkFtZTgwOTU3ODk4NTE@._V1_SX300.jpg',
 u'Rated': u'PG',
 u'Released': u'10 Jul 2015',
 u'Response': u'True',
 u'Runtime': u'91 min',
 u'Title': u'Minions',
 u'Type': u'movie',
 u'Writer': u'Brian Lynch',
 u'Year': u'2015',
 u'imdbID': u'tt2293640',
 u'imdbRating': u'6.5',
 u'imdbVotes': u'100,622'}

In [12]:
r1.json()["Year"]

u'2015'

In [14]:
r1.json()["Plot"]

u'Minions Stuart, Kevin and Bob are recruited by Scarlet Overkill, a super-villain who, alongside her inventor husband Herb, hatches a plot to take over the world.'

### Decode the JSON response body into a Python data structure

#### Quiz: What is the data structure, and how would we look up the 'Year' the movie came out?

In [None]:
r.json()
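
Calling `r.json()` is roughly equivalent to parsing `r.text` with the standard `json` module. The sketch below (Python 3) can be run offline; `raw_text` is an abbreviated copy of the Shawshank response text shown earlier, with most fields trimmed.

```python
import json

# Abbreviated copy of the response text shown earlier (fields trimmed for brevity).
raw_text = '{"Title": "The Shawshank Redemption", "Year": "1994", "Runtime": "142 min", "Response": "True"}'

info = json.loads(raw_text)   # a JSON object becomes a Python dict
print(type(info).__name__)
print(sorted(info.keys()))

# Every value arrives as a string, so convert before doing arithmetic.
year = int(info["Year"])
print(year + 1)
```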

In [15]:
r = requests.get('http://www.omdbapi.com/?t=blahblahblah&r=json&type=movie')
r.status_code
r.json()

{u'Error': u'Movie not found!', u'Response': u'False'}
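
The output above shows that this API can report failure inside the JSON body, so it is worth checking the `Response` field before reading anything else. A minimal offline sketch (Python 3), using trimmed copies of the two payloads seen so far; the helper name `get_field` and the field `NoSuchField` are made up for illustration:

```python
def get_field(info, field, default=None):
    """Return info[field] only when the payload reports success."""
    if info.get("Response") != "True":
        return default
    return info.get(field, default)

found = {"Title": "Minions", "Year": "2015", "Response": "True"}
missing = {"Error": "Movie not found!", "Response": "False"}

print(get_field(found, "Year"))
print(get_field(missing, "Year"))
print(get_field(found, "NoSuchField", default="N/A"))
```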

## Let's define a function to return the year

In [50]:
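def get_movie_year(title, param):
    # look up a movie by title and return one field (e.g. 'Year') from the response
    r = requests.get('http://www.omdbapi.com/?t=' + title + '&r=json&type=movie')
    info = r.json()
    if info['Response'] == 'True':
        try:
            return info[param]
        except KeyError:
            # the movie was found, but the response has no such field
            return 'Exception NA'
    else:
        # the API reported that the movie was not found
        return ""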

### Let's test that the function works

In [40]:
get_movie_year('The Shawshank Redemption', 'Year')
get_movie_year("Minions", 'Plot')
#get_movie_year('blahblahblah', 'Year')

u'Minions Stuart, Kevin and Bob are recruited by Scarlet Overkill, a super-villain who, alongside her inventor husband Herb, hatches a plot to take over the world.'

In [21]:
print "" == None

False


## Let's do something cool with our new function

#### Create a smaller DataFrame for testing

In [44]:
top_movies = movies.head().copy()
top_movies.head(3)

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."


#### Write a for loop to build a list of years

In [54]:
from time import sleep
years = []
for title in top_movies.title:
    years.append(get_movie_year(title, "Year"))
    sleep(3)
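
The pause-between-calls pattern in the loop above can be factored into a helper. This is only a sketch (Python 3): `fetch_all`, the `sleep_fn` parameter, and the fake lookup table are all invented here so the example runs offline without actually waiting.

```python
import time

def fetch_all(items, fetch, delay_seconds, sleep_fn=time.sleep):
    """Call fetch(item) for each item, pausing between consecutive calls."""
    results = []
    for position, item in enumerate(items):
        if position > 0:
            sleep_fn(delay_seconds)  # space out the requests
        results.append(fetch(item))
    return results

# Stand-ins so the sketch runs offline: a fake lookup and a recorded "sleep".
fake_years = {"The Shawshank Redemption": "1994", "The Godfather": "1972"}
pauses = []
years_demo = fetch_all(list(fake_years), fake_years.get, 3, sleep_fn=pauses.append)
print(years_demo)
print(pauses)
```

With a real API you would leave `sleep_fn` at its default and pass a function that performs the request.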

In [59]:
a = [1,2,3]
b = a
b.append(4)
a

[1, 2, 3, 4]
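
The cell above shows that `b = a` creates a second name for the same list rather than a new list. For contrast, this small sketch (Python 3) makes an independent copy, which is the same reason `.copy()` was used when creating `top_movies`:

```python
a = [1, 2, 3]
alias = a               # second name for the same list object
independent = list(a)   # new list with the same elements

alias.append(4)         # also changes a
independent.append(99)  # leaves a untouched

print(a)
print(alias is a)
print(independent)
```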

### Quiz: What is the sleep function and why is it a good idea to use it?

#### Check that the DataFrame and the list of years are the same length

In [64]:
assert len(top_movies) == len(years)


### Quiz: How would we save the list of years to a new column in top_movies?

In [61]:
years

[u'1994', u'1972', u'1974', u'2008', u'1994']

In [68]:
top_movies["Year"] = years
#top_movies.insert(0, "Year", years)

In [69]:
top_movies.head(3)

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list,Year
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...",1994
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']",1972
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv...",1974
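
The years returned by the API are strings, and `get_movie_year` can also return `""` or `'Exception NA'`. If numeric years are needed, one possible cleanup step is sketched below (Python 3); the helper name `to_year` is hypothetical.

```python
def to_year(value):
    """Convert a year string to int; return None when it is not a number."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

raw_years = ["1994", "1972", "1974", "2008", "1994", "", "Exception NA"]
clean_years = [to_year(y) for y in raw_years]
print(clean_years)
```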


**Bonus Content (Time Permitting)**

In [70]:
# enumerate gives you each item's index while iterating
letters = ['a', 'b', 'c']
for index, letter in enumerate(letters):
    print index, letter

0 a
1 b
2 c
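
`enumerate` also accepts a `start` argument when counting should not begin at zero. A short sketch in Python 3 syntax (note that `print` is a function there):

```python
letters = ['a', 'b', 'c']

# Count from 1 instead of 0.
for position, letter in enumerate(letters, start=1):
    print(position, letter)

numbered = list(enumerate(letters, start=1))
print(numbered)
```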


In [None]:
# iterrows method for DataFrames is similar
for index, row in top_movies.iterrows():
    print index, row.title

In [83]:
for index, row in top_movies.iterrows():
    print index, row["Year"]

0 1994
1 1972
2 1974
3 2008
4 1994


In [None]:
# create a new column and set a default value
movies['year'] = -1

In [84]:
# loc method allows you to access a DataFrame element by 'label'
movies.loc[0, 'year'] = 1994

In [87]:
movies.head(3)

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list,year
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...",1994.0
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']",
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv...",


In [89]:
# write a for loop to update the year for the first three movies
for index, row in movies.iterrows():
    if index < 3:
        movies.loc[index, 'year'] = get_movie_year(row.title, "Year")
        sleep(1)
    else:
        break

In [90]:
movies.head(5)

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list,year
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...",1994.0
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']",1972.0
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv...",1974.0
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E...",
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L....",
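
The `if index < 3 ... else: break` pattern above can also be written with `itertools.islice`, which stops an iterator after a fixed number of items. The sketch below (Python 3) uses a plain list of the five titles shown above so it runs without pandas or network access.

```python
from itertools import islice

titles = [
    "The Shawshank Redemption",
    "The Godfather",
    "The Godfather: Part II",
    "The Dark Knight",
    "Pulp Fiction",
]

# Take only the first three (index, title) pairs.
first_three = list(islice(enumerate(titles), 3))
for index, title in first_three:
    print(index, title)
```

Because `iterrows()` returns an iterator, `islice(movies.iterrows(), 3)` should work the same way on the DataFrame.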


Other considerations when accessing APIs:
- Most APIs require you to have an access key (which you should store outside your code)
- Most APIs limit the number of API calls you can make (per day, hour, minute, etc.)
- Not all APIs are free
- Not all APIs are well-documented
- Pay attention to the API version

Python wrapper is another option for accessing an API:
- Set of functions that "wrap" the API code for ease of use
- Potentially simplifies your code
- But a wrapper could have bugs, be out of date, or be poorly documented
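
Two of the points above, keeping the access key outside your code and wrapping the API in functions, can be sketched together. This is an illustration only (Python 3): the environment variable name `OMDB_API_KEY`, the `apikey` query parameter, and both function names are assumptions, not something these notes or the API documentation confirm.

```python
import os
from urllib.parse import urlencode

BASE_URL = "http://www.omdbapi.com/"  # endpoint used earlier in these notes

def load_api_key(env_var="OMDB_API_KEY"):
    """Read the key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return key

def get_movie(title, fetch_json, api_key):
    """Tiny wrapper: hide URL construction behind one call.

    fetch_json is any callable that takes a URL and returns a dict.
    The 'apikey' parameter name is an assumption.
    """
    query = urlencode({"t": title, "r": "json", "type": "movie", "apikey": api_key})
    return fetch_json(BASE_URL + "?" + query)

# Offline demo: a fake fetcher that records the URL instead of calling the API.
requested_urls = []

def fake_fetch(url):
    requested_urls.append(url)
    return {"Title": "Minions", "Response": "True"}

demo = get_movie("Minions", fake_fetch, api_key="demo-key")
print(demo)
print(requested_urls[0])
```

Against the real service, the call would look something like `get_movie("Minions", lambda url: requests.get(url).json(), load_api_key())`.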