<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practice Using APIs

---

In this lab we will practice web scraping and using an API to retrieve and store data.

In [2]:
# Imports at the top
import json
import pandas as pd
import numpy as np
import requests
import re
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
%matplotlib inline

## IMDB TV Shows

---

Sometimes an API doesn't provide all the information we would like to get and we need to be creative.

Here we will use a combination of scraping and API calls to find the ratings and networks of famous television shows.

### 1 Get the top TV Shows

The Internet Movie Database contains data about movies and TV shows. Unfortunately it does not have a public API.

The page http://www.imdb.com/chart/toptv/?ref_=nv_tp_tv250_2 contains the list of the top 250 TV shows of all time.

Let's try to get the web page directly:

In [3]:
# Send a request to get the IMDB top 250 TV shows page

web_url = "https://www.imdb.com/chart/toptv/?ref_=nv_tp_tv250_2"
response = requests.get(web_url)

# Check the response 
if (response.status_code == 200):
    print(response.headers['Content-Type'])
else:
    print(response.reason)

Forbidden


**Forbidden??**
It looks like we cannot scrape the page directly, because the site rejects automated requests.


However, we can save the HTML page as a file, then parse the file using `BeautifulSoup`.

I have saved it as "Top250TVShows.html".


In [None]:
# Open the saved file and parse it into a BeautifulSoup object;
# the with-block closes the file once it has been read
with open("../python-web_services_apis-lab/files/Top250TVShows.html", "r", encoding="utf-8") as topshows_file:
    soup = BeautifulSoup(topshows_file.read(), 'html.parser')
type(soup)

**1.1 Find the movie_ids**

Parse the HTML to obtain a list of the `movie_ids` for these shows.


> **Hint:** movie_ids look like this: `tt2582802`
> _Everything after "/title/" and before "/?"_.
> We can use regular expressions (the `re` library) to find the tags whose `href` value contains
> `/title`, by passing `href=re.compile('/title')` to the search.
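
As a small illustration of the hint, the sketch below applies a regular expression to a few made-up `href` strings shaped the way the hint describes (they are assumptions, not copied from the saved page) and keeps the part after `/title/`. Finding the matching tags in the parsed page is left to the exercise.

```python
import re

# Hypothetical href values, shaped the way the hint describes.
hrefs = [
    "/title/tt2582802/?ref_=example",
    "/title/tt0000001/?ref_=example",
    "/chart/toptv/",
]

# Capture the id that follows "/title/" and ends at the next "/".
id_pattern = re.compile(r"/title/(tt\d+)/")

movie_ids = []
for href in hrefs:
    match = id_pattern.search(href)
    if match:  # skip links that do not point at a title page
        movie_ids.append(match.group(1))

print(movie_ids)
```

The same loop can be written as a list comprehension once the pattern is known to match every link being processed.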


In [None]:
# Where can we find the top 250 tv shows? Inspect the elements to find it.


In [None]:
# Get the movie IDs only as a list (practice list comprehension...)


### 2 Get data on the top shows

Although the Internet Movie Database does not have a public API, an open API exists at http://www.tvmaze.com/api.

We will use this API to retrieve information about each of the 250 TV shows you have extracted in the previous step.



**2.1 Find the Correct API**

Check the TVmaze API documentation to select the endpoint that returns show data for a given IMDB movie id.
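
One way to build the request URL is sketched below. It assumes the lookup endpoint `https://api.tvmaze.com/lookup/shows` with an `imdb` query parameter, which is the endpoint `get_show_info` uses later in this lab; confirm it against the TVmaze documentation before relying on it. The helper name `build_lookup_url` is made up for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint: the same one get_show_info uses in this lab.
LOOKUP_ENDPOINT = "https://api.tvmaze.com/lookup/shows"


def build_lookup_url(imdb_id):
    """Return the lookup URL for one IMDB id (hypothetical helper)."""
    return LOOKUP_ENDPOINT + "?" + urlencode({"imdb": imdb_id})


url = build_lookup_url("tt2582802")
print(url)
```

With `requests`, the same query string can be produced by passing `params={'imdb': imdb_id}` to `requests.get` instead of concatenating strings.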

In [None]:
## What's the endpoint URI?


In [None]:
## Test the request with the first movie id that you found
# and decode the json info


**2.2 Let's get some data**

Let's check that you can retrieve each piece of information we want to store.

In [None]:
# Get the name of the show


In [None]:
# Get the average rating, return np.nan if the value for the key is None


In [None]:
# Get the genres as a single string using str.join()


In [None]:
# Get the network name, return None if the value for the key is None


In [None]:
# Get the Premiere date


In [None]:
# Get the status


**2.3 Write a Function**

- Define a function that returns a Python list with the IMDB id followed by the selected information for a given id:
    - Show name
    - Rating (avg)
    - Genre(s)
    - Network name
    - Premiere date
    - Status


In [None]:
# A:
def get_show_info(movie_id):
    """Return the show's details as a list, or [] if the lookup fails."""
    info_list = []
    response = requests.get('https://api.tvmaze.com/lookup/shows?imdb=' + movie_id)
    if response.status_code == 200:
        show_info = response.json()
        rating = show_info['rating']['average']
        info_list.append(movie_id)
        info_list.append(show_info['name'])
        # Store a missing rating as np.nan so the column stays numeric
        info_list.append(rating if rating is not None else np.nan)
        info_list.append(','.join(show_info['genres']))
        # The network entry can be None, so guard before reading its name
        info_list.append(show_info['network']['name'] if show_info['network'] else None)
        info_list.append(show_info['premiered'])
        info_list.append(show_info['status'])
    return info_list
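
To exercise the field extraction without calling the API, the parsing step can be separated from the request. The sketch below is one possible variation, not part of the lab solution: `parse_show` is a hypothetical helper, and `sample_show` is a made-up dictionary containing only the keys that `get_show_info` reads. It uses `math.nan`, a plain Python float NaN like `np.nan`.

```python
import math


def parse_show(movie_id, show_info):
    """Build the row for one show from an already-decoded JSON dict."""
    rating = show_info["rating"]["average"]
    network = show_info["network"]
    return [
        movie_id,
        show_info["name"],
        rating if rating is not None else math.nan,  # missing rating -> NaN
        ",".join(show_info["genres"]),
        network["name"] if network else None,        # network may be None
        show_info["premiered"],
        show_info["status"],
    ]


# Made-up data for illustration only; not a real API response.
sample_show = {
    "name": "Example Show",
    "rating": {"average": None},
    "genres": ["Drama", "Crime"],
    "network": None,
    "premiered": "2000-01-01",
    "status": "Ended",
}

row = parse_show("tt0000000", sample_show)
print(row)
```

Keeping the parsing separate means the handling of `None` values can be checked with hand-written dictionaries before any requests are sent.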


In [None]:
# Test the Function


**2.4 Create a DataFrame**

Let's create a `pandas` DataFrame to store all the show data:


In [None]:
shows_df = pd.DataFrame(columns=('IMDB_id','Name','Rating','Genres','Network','Premier_Date','Status'))
shows_df

In [None]:
# Let's test getting one show before adding it to the DataFrame
show_info = get_show_info('tt2707408')
show_info

In [None]:
# DataFrame.append was removed in pandas 2.0, so add the row to a temporary copy with .loc
temp_df = shows_df.copy()
temp_df.loc[len(temp_df)] = show_info
temp_df

**2.5 Store all the shows**

Now let's add the data for all 250 shows to `shows_df`.
Because `get_show_info` returns an empty list when the API lookup fails, check that the list is non-empty before adding it to the DataFrame.
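
The non-empty check can be sketched as follows. To keep the example runnable offline, a stand-in function `fake_get_show_info` (made up for this sketch) replaces the real API call and returns an empty list for one id, as `get_show_info` does when the lookup fails. The ids and returned rows are placeholders.

```python
def fake_get_show_info(movie_id):
    """Stand-in for get_show_info: returns [] for an id that fails to look up."""
    if movie_id == "tt0000002":
        return []
    return [movie_id, "Show " + movie_id, 8.0, "Drama", None, "2000-01-01", "Ended"]


# Placeholder ids; in the lab these would be the first 10 scraped movie_ids.
movie_ids = ["tt0000001", "tt0000002", "tt0000003"]

rows = []
skipped = []
for movie_id in movie_ids:
    info = fake_get_show_info(movie_id)
    if info:  # an empty list is falsy, so failed lookups are not stored
        rows.append(info)
    else:
        skipped.append(movie_id)

print(len(rows), skipped)
```

The collected `rows` can then be turned into a DataFrame in one step with `pd.DataFrame(rows, columns=shows_df.columns)`, which avoids growing the DataFrame one row at a time.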

In [None]:
# Try adding just the first 10 shows
