## **Data Collection Using an API**

**Business Scenario**
You are the co-founder of Tuber, a start-up offering a free video streaming service aimed at college students. Before launching Tuber, you need to build a movie database that covers major movie releases worldwide. After some research, you find that the OMDb website has the data you need. OMDb provides a set of APIs (Application Programming Interfaces) that you can use to collect the data.

**Please use the Google Chrome browser. You might run into issues with other browsers such as Safari.**

**Save a copy of this document to your Google Drive: Go to the upper left-hand corner of this page, click "File" and choose "Save a copy in Drive".** You will see a folder named "Colab Notebooks" in your Google Drive.

First, let us import all the necessary libraries:

In [1]:
import requests       # Used to send HTTP requests to the API; responses are often in JSON format
import json           # Used to process JSON data

Run the program!

In [3]:
url_head = 'https://www.omdbapi.com/?t='
apikey = '&apikey=6ae98900'
title = 'Frozen'

url = url_head + title + apikey # The URL has the form: https://www.omdbapi.com/?t=Frozen&apikey=YOURAPIKEY

print(url)

response = requests.get(url)    # Connect to the API to download the data

print(response) # Status code 200 means the HTTP request succeeded and the server returned the requested data

print(response.text)

https://www.omdbapi.com/?t=Frozen&apikey=6ae98900
<Response [200]>
{"Title":"Frozen","Year":"2013","Rated":"PG","Released":"27 Nov 2013","Runtime":"102 min","Genre":"Animation, Adventure, Comedy","Director":"Chris Buck, Jennifer Lee","Writer":"Jennifer Lee, Hans Christian Andersen, Chris Buck","Actors":"Kristen Bell, Idina Menzel, Jonathan Groff","Plot":"Fearless optimist Anna teams up with rugged mountain man Kristoff and his loyal reindeer Sven in an epic journey to find Anna's sister Elsa, whose icy powers have trapped the kingdom of Arendelle in eternal winter.","Language":"English","Country":"United States","Awards":"Won 2 Oscars. 83 wins & 60 nominations total","Poster":"https://m.media-amazon.com/images/M/MV5BMTQ1MjQwMTE5OF5BMl5BanBnXkFtZTgwNjk3MTcyMDE@._V1_SX300.jpg","Ratings":[{"Source":"Internet Movie Database","Value":"7.4/10"},{"Source":"Rotten Tomatoes","Value":"89%"},{"Source":"Metacritic","Value":"75/100"}],"Metascore":"75","imdbRating":"7.4","imdbVotes":"674,029","imdbI
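Titles that contain spaces or punctuation are safer to pass as encoded query parameters than to concatenate by hand. The following is a minimal sketch using Python's standard `urllib.parse.urlencode`; the helper name `build_omdb_url` and the placeholder key `YOURAPIKEY` are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

OMDB_BASE = 'https://www.omdbapi.com/'  # Same endpoint as url_head, without the query string


def build_omdb_url(title, api_key):
    """Return an OMDb title-lookup URL with the query parameters encoded."""
    query = urlencode({'t': title, 'apikey': api_key})  # Spaces become '+'
    return OMDB_BASE + '?' + query


print(build_omdb_url('The Wolf of Wall Street', 'YOURAPIKEY'))
```

`requests.get` also accepts a `params` dictionary and performs this encoding itself, for example `requests.get('https://www.omdbapi.com/', params={'t': title, 'apikey': key})`.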

Extract relevant data from the JSON response

In [4]:
jc = response.json() # Parse the JSON response into a Python dictionary
jc

{'Title': 'Frozen',
 'Year': '2013',
 'Rated': 'PG',
 'Released': '27 Nov 2013',
 'Runtime': '102 min',
 'Genre': 'Animation, Adventure, Comedy',
 'Director': 'Chris Buck, Jennifer Lee',
 'Writer': 'Jennifer Lee, Hans Christian Andersen, Chris Buck',
 'Actors': 'Kristen Bell, Idina Menzel, Jonathan Groff',
 'Plot': "Fearless optimist Anna teams up with rugged mountain man Kristoff and his loyal reindeer Sven in an epic journey to find Anna's sister Elsa, whose icy powers have trapped the kingdom of Arendelle in eternal winter.",
 'Language': 'English',
 'Country': 'United States',
 'Awards': 'Won 2 Oscars. 83 wins & 60 nominations total',
 'Poster': 'https://m.media-amazon.com/images/M/MV5BMTQ1MjQwMTE5OF5BMl5BanBnXkFtZTgwNjk3MTcyMDE@._V1_SX300.jpg',
 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.4/10'},
  {'Source': 'Rotten Tomatoes', 'Value': '89%'},
  {'Source': 'Metacritic', 'Value': '75/100'}],
 'Metascore': '75',
 'imdbRating': '7.4',
 'imdbVotes': '674,029',
 'imd
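`response.json()` parses the JSON text into ordinary Python dictionaries and lists, much as `json.loads(response.text)` would. The sketch below uses a shortened copy of the Frozen response, hard-coded so that it runs without a network connection; the variable names `sample_text` and `movie` are illustrative.

```python
import json

# A shortened copy of the response text, hard-coded for illustration
sample_text = ('{"Title":"Frozen","Year":"2013","Ratings":['
               '{"Source":"Internet Movie Database","Value":"7.4/10"},'
               '{"Source":"Rotten Tomatoes","Value":"89%"}]}')

movie = json.loads(sample_text)   # JSON object -> Python dict

print(type(movie).__name__)       # The parsed result is a dict
print(movie['Title'])

# 'Ratings' is a list of dictionaries, so loop over the list, then read each key
for rating in movie['Ratings']:
    print(rating['Source'], '->', rating['Value'])
```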

In [5]:

imdbID = jc['imdbID']        # The unique ID of the movie
MovieTitle = jc['Title']     # The title of the movie
ReleaseYear = jc['Year']     # The release year
Runtime = jc['Runtime']      # The duration of the movie
Genre = jc['Genre']          # The genre(s)

data = [imdbID, MovieTitle, ReleaseYear, Runtime, Genre]

columns = ['imdbID', 'MovieTitle', 'ReleaseYear', 'Runtime', 'Genre']
print(columns)
print(data) # Printing the content is a useful check that your program works. Comment out this line when your program is ready.

['imdbID', 'MovieTitle', 'ReleaseYear', 'Runtime', 'Genre']
['tt2294629', 'Frozen', '2013', '102 min', 'Animation, Adventure, Comedy']
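Every value in the response is a string, and a key may be absent when a lookup fails. A minimal sketch of a more defensive extraction step follows; the helper names `extract_fields` and `runtime_minutes`, and the choice of `None` for missing values, are assumptions made for illustration.

```python
def runtime_minutes(runtime_text):
    """Convert a string such as '102 min' to the integer 102, or None if it cannot be parsed."""
    parts = (runtime_text or '').split()
    if parts and parts[0].isdigit():
        return int(parts[0])
    return None


def extract_fields(jc):
    """Pick the five fields used in this notebook; missing keys become None."""
    keys = ['imdbID', 'Title', 'Year', 'Runtime', 'Genre']
    return [jc.get(key) for key in keys]


# Sample values copied from the Frozen response
sample = {'imdbID': 'tt2294629', 'Title': 'Frozen', 'Year': '2013',
          'Runtime': '102 min', 'Genre': 'Animation, Adventure, Comedy'}

print(extract_fields(sample))
print(runtime_minutes(sample['Runtime']))
```

Storing the runtime as a number makes later steps such as sorting or averaging straightforward.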


**Questions:**

1. The Python code above collects data for one movie. Can you modify the code so that it collects data for the list of movies below?

['Frozen', 'American Hustle', 'The Wolf of Wall Street', 'This is the End', 'Monsters University']

2. Can you save the data for the list of movies as an Excel/CSV file?

In [6]:
movies = ['Frozen', 'American Hustle', 'The Wolf of Wall Street', 'This is the End', 'Monsters University']

imdbID = []
MovieTitle = []
ReleaseYear = []
Runtime = []
Genre = []

for movie in movies:
    jc = requests.get(url_head + movie + apikey).json()  # Download and parse the data for one movie
    imdbID.append(jc['imdbID'])        # The unique ID of the movie
    MovieTitle.append(jc['Title'])     # The title of the movie
    ReleaseYear.append(jc['Year'])     # The release year
    Runtime.append(jc['Runtime'])      # The duration of the movie
    Genre.append(jc['Genre'])          # The genre(s)
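The loop assumes every lookup succeeds. One possible way to make it tolerant of failed lookups is sketched below. It assumes that a failed OMDb lookup returns a JSON object whose `Response` field is the string `"False"` (worth checking against the OMDb documentation), and it takes the download step as a function argument so the logic can be tried without a network connection. The names `collect_movies` and `fake_fetch`, and the canned data, are hypothetical.

```python
def collect_movies(titles, fetch):
    """Return (rows, failed): one row per movie found, plus the titles that were not found.

    `fetch` is any function that takes a title and returns the parsed JSON dict.
    """
    rows, failed = [], []
    for title in titles:
        jc = fetch(title)
        # Assumption: a failed lookup has Response == 'False' or lacks 'imdbID'
        if jc.get('Response') == 'False' or 'imdbID' not in jc:
            failed.append(title)
            continue
        rows.append([jc['imdbID'], jc.get('Title'), jc.get('Year'),
                     jc.get('Runtime'), jc.get('Genre')])
    return rows, failed


def fake_fetch(title):
    """Stand-in for the API call, returning canned data for illustration only."""
    canned = {'Frozen': {'imdbID': 'tt2294629', 'Title': 'Frozen', 'Year': '2013',
                         'Runtime': '102 min', 'Genre': 'Animation, Adventure, Comedy',
                         'Response': 'True'}}
    return canned.get(title, {'Response': 'False', 'Error': 'Movie not found!'})


rows, failed = collect_movies(['Frozen', 'No Such Movie 12345'], fake_fetch)
print(rows)
print(failed)
```

With the real API, `fetch` could be `lambda t: requests.get(url_head + t + apikey).json()`.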


In [8]:
import pandas as pd
df = pd.DataFrame(list(zip(imdbID, MovieTitle, ReleaseYear, Runtime, Genre)), columns=['imdbID', 'MovieTitle', 'ReleaseYear', 'Runtime', 'Genre'])
df.to_csv('movie_list.csv', index=False)  # Save the data as a CSV file so it can be downloaded in the next cell
df

| | imdbID | MovieTitle | ReleaseYear | Runtime | Genre |
|---|---|---|---|---|---|
| 0 | tt2294629 | Frozen | 2013 | 102 min | Animation, Adventure, Comedy |
| 1 | tt1800241 | American Hustle | 2013 | 138 min | Crime, Drama |
| 2 | tt0993846 | The Wolf of Wall Street | 2013 | 180 min | Biography, Comedy, Crime |
| 3 | tt1245492 | This Is the End | 2013 | 107 min | Comedy, Fantasy |
| 4 | tt1453405 | Monsters University | 2013 | 104 min | Animation, Adventure, Comedy |
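If pandas is not available, the same kind of table can be written with Python's built-in `csv` module. This is a sketch rather than a replacement for `df.to_csv`; the two rows are copied from the results for Frozen and American Hustle, and the file is written to a temporary directory so that no existing file is overwritten.

```python
import csv
import os
import tempfile

columns = ['imdbID', 'MovieTitle', 'ReleaseYear', 'Runtime', 'Genre']
rows = [
    ['tt2294629', 'Frozen', '2013', '102 min', 'Animation, Adventure, Comedy'],
    ['tt1800241', 'American Hustle', '2013', '138 min', 'Crime, Drama'],
]

csv_path = os.path.join(tempfile.mkdtemp(), 'movie_list.csv')

# newline='' lets the csv module control line endings itself
with open(csv_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(columns)   # Header row
    writer.writerows(rows)     # Fields containing commas are quoted automatically

# Read the file back to confirm what was saved
with open(csv_path, newline='', encoding='utf-8') as f:
    saved = list(csv.reader(f))

print(saved[0])
print(len(saved) - 1, 'movies saved')
```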


In [None]:
from google.colab import files

files.download('movie_list.csv')


**Tips:**

A JSON file can be messy when it contains a lot of data. There are ways to format it so that it is easier to read. Some browsers (e.g., Firefox and Edge) automatically format JSON data for you. You can also format JSON content using the following website:

https://jsonformatter.org/json-viewer
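The same formatting can be done inside the notebook with the `json` module imported at the start. A minimal sketch, using a shortened hard-coded sample in place of the full response:

```python
import json

# Shortened sample standing in for the full API response
sample = {'Title': 'Frozen', 'Year': '2013',
          'Ratings': [{'Source': 'Rotten Tomatoes', 'Value': '89%'}]}

pretty = json.dumps(sample, indent=2)   # indent=2 puts each key on its own indented line
print(pretty)
```

With the live response, `print(json.dumps(jc, indent=2))` would print the whole record in this layout.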