# Recommendation Engine API Calls

Analysis by Brendan Bullivant & Frank Flavell

## Overview

In this notebook we call the TMDB API to obtain the movie descriptions, which we will use in the content-based recommendation engine.

## Table of Contents<span id="0"></span>

1. [**Data Import & API Calls**](#1)
    * Merge links.csv df with master df for TMDB IDs
    * API Calls to TMDB for Movie Descriptions
    * Clean API df
    * Merge API df with master df

# Package Import

In [55]:
# import external libraries
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import re #regex

#For API Calls
import json
import time
import requests

# Configure matplotlib for jupyter.
%matplotlib inline

# <span id="1"></span>1. Data Import & API Calls
#### [Return Contents](#0)

We import links.csv and tags.csv from the ml-latest-small folder, together with the cleaned master dataframe, and review their contents. The master dataframe already combines the movies and the ratings. We need links.csv because it supplies the TMDB ID for each movie, which the API calls below depend on. We do not need the tags at this time.

In [56]:
#Imports the dataframes
links = pd.read_csv("ml-latest-small/links.csv")
tags = pd.read_csv("ml-latest-small/tags.csv")
df = pd.read_pickle('cleaned.pickle')

In [57]:
links.head()

Unnamed: 0,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0


In [58]:
links.dropna(inplace=True)

In [59]:
links.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9734 entries, 0 to 9741
Data columns (total 3 columns):
movieId    9734 non-null int64
imdbId     9734 non-null int64
tmdbId     9734 non-null float64
dtypes: float64(1), int64(2)
memory usage: 304.2 KB


In [60]:
links['tmdbId'] = links['tmdbId'].astype(int)
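
The order of the two preceding cells matters: `tmdbId` is read as a float column because some rows have no TMDB ID, and a column that still contains NaN cannot be cast to `int`. A minimal sketch of that behavior, using a made-up three-row frame (`toy_links` and its missing value are illustrative, not rows from links.csv):

```python
import numpy as np
import pandas as pd

# Toy stand-in for links.csv; the third row has no TMDB ID (hypothetical data).
toy_links = pd.DataFrame({
    "movieId": [1, 2, 3],
    "tmdbId": [862.0, 8844.0, np.nan],
})

# Casting while NaN is present fails, because NaN has no integer representation.
try:
    toy_links["tmdbId"].astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Dropping the incomplete rows first makes the cast safe.
clean_links = toy_links.dropna(subset=["tmdbId"]).copy()
clean_links["tmdbId"] = clean_links["tmdbId"].astype(int)
```

Integer IDs also keep the request URLs clean, since a float would be rendered with a trailing `.0` when converted to a string.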

In [61]:
tags.head()

Unnamed: 0,userId,movieId,tag,timestamp
0,2,60756,funny,1445714994
1,2,60756,Highly quotable,1445714996
2,2,60756,will ferrell,1445714992
3,2,89774,Boxing story,1445715207
4,2,89774,MMA,1445715200


In [62]:
df.head()

Unnamed: 0,userId,movieId,rating,title,genres,year
0,1,1,4.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1995
1,5,1,4.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1995
2,7,1,4.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1995
3,15,1,2.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1995
4,17,1,4.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1995


In [None]:
df = pd.merge(df, links, on="movieId")
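
Because `pd.merge` defaults to an inner join, any rating whose `movieId` has no row left in `links` is dropped from `df` by the merge above. The sketch below illustrates this on toy data (the frames and the unmatched `movieId` 3 are hypothetical); `validate="many_to_one"` is an optional safeguard that raises an error if `movieId` is duplicated in the links table.

```python
import pandas as pd

# Toy ratings: movieId 3 has no matching row in the links table below.
toy_ratings = pd.DataFrame({
    "userId": [1, 5, 7, 9],
    "movieId": [1, 1, 2, 3],
    "rating": [4.0, 4.0, 3.5, 2.0],
})
toy_links = pd.DataFrame({"movieId": [1, 2], "tmdbId": [862, 8844]})

# Inner join (the default): many ratings map to one links row per movie.
merged = pd.merge(toy_ratings, toy_links, on="movieId", validate="many_to_one")

# Number of ratings lost because their movie has no TMDB ID.
dropped = len(toy_ratings) - len(merged)
```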

## TMDB API Calls

Using the TMDB IDs in the links dataframe, we grab the essential information we need for each movie, including the description and keywords, so we can build an effective content-based recommendation engine.

In [63]:
num = links.loc[:, 'tmdbId']
num

0          862
1         8844
2        15602
3        31357
4        11862
         ...  
9737    432131
9738    445030
9739    479308
9740    483455
9741     37891
Name: tmdbId, Length: 9734, dtype: int64

We made a list of the movie IDs, which made it easier to conduct the API calls.

In [66]:
# Collect the TMDB IDs into a plain Python list
movie_ids = links['tmdbId'].tolist()

print(movie_ids)

[862, 8844, 15602, 31357, 11862, 949, 11860, 45325, 9091, 710, 9087, 12110, 21032, 10858, 1408, 524, 4584, 5, 9273, 11517, 8012, 1710, 9691, 12665, 451, 16420, 9263, 17015, 902, 37557, 9909, 63, 9598, 687, 33689, 9603, 34615, 31174, 11443, 35196, 9312, 577, 11861, 807, 10530, 8391, 629, 11448, 49133, 26441, 97406, 9089, 11010, 11359, 17182, 2054, 10607, 19760, 9536, 11525, 4482, 10634, 755, 11859, 28387, 48750, 20927, 36929, 9102, 124626, 27526, 9623, 46785, 400, 880, 146599, 8447, 10534, 17414, 13997, 2086, 9095, 12158, 9283, 9208, 40154, 406, 63076, 11062, 13685, 47475, 2045, 9614, 688, 11907, 10874, 89333, 197, 103, 33542, 43566, 51352, 16934, 10324, 78406, 32119, 11066, 11104, 2074, 27793, 290157, 110972, 11863, 9101, 5757, 9302, 11000, 16388, 9737, 30765, 10474, 22279, 30157, 568, 11780, 34996, 414, 649, 1873, 5894, 1775, 8839, 20649, 10329, 8963, 26564, 8068, 8512, 1572, 13552, 6520, 9073, 10428, 17447, 9886, 9482, 19326, 9344, 9071, 8973, 15730, 47608, 2293, 9070, 48787, 34574, 

In [101]:
len(movie_ids)

9734

## API Calls to TMDB for Details including Descriptions

If you wish to make API calls to TMDB, visit their site to create an account and obtain your own API key. The code block below expects the key to be available as `config.key`, for example from a local `config.py` module.
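
The notebook does not show how the key is stored. One alternative sketch is to read it from an environment variable, which keeps the key out of the notebook and out of version control. The helper name `load_api_key` and the variable name `TMDB_API_KEY` are assumptions made for illustration.

```python
import os

def load_api_key(env_var="TMDB_API_KEY"):
    """Return the API key stored in an environment variable (name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message rather than sending an empty key to the API.
        raise RuntimeError(f"Set the {env_var} environment variable to your TMDB API key.")
    return key
```

With this helper, `load_api_key()` could stand in for `config.key` in the request loop.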

In [74]:
import config  # local module expected to define `key`, your TMDB API key

List_of_responses = []
for num in movie_ids[0:5]:  # limited to the first 5 IDs after we obtained the full data, so other code can be rerun if necessary
    response = requests.get(
        "https://api.themoviedb.org/3/movie/" + str(num),
        params={"api_key": config.key, "language": "en-US"},
    )
    data = response.json()
    time.sleep(.5)  # pause between requests to avoid flooding the API
    List_of_responses.append(data)
print(List_of_responses)



In [75]:
List_of_responses[0].keys()

dict_keys(['adult', 'backdrop_path', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id', 'imdb_id', 'original_language', 'original_title', 'overview', 'popularity', 'poster_path', 'production_companies', 'production_countries', 'release_date', 'revenue', 'runtime', 'spoken_languages', 'status', 'tagline', 'title', 'video', 'vote_average', 'vote_count'])
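
The request loop stores whatever TMDB returns, so an ID that no longer exists would add an error payload to the list instead of a movie record. A minimal sketch of one way to separate the two cases by HTTP status code follows. The helper names `collect_responses` and `make_tmdb_fetcher` are hypothetical, and the fetcher is passed in as a function so the loop can be tried without network access.

```python
import time

def collect_responses(movie_ids, fetch_one, pause=0.5, sleep=time.sleep):
    """Call fetch_one(tmdb_id) -> (status_code, payload) for each ID.

    Returns the payloads of successful calls and a list of (id, status) failures.
    """
    results, failed = [], []
    for tmdb_id in movie_ids:
        status, payload = fetch_one(tmdb_id)
        if status == 200:
            results.append(payload)
        else:
            failed.append((tmdb_id, status))
        sleep(pause)  # pause between requests, as in the loop above
    return results, failed

def make_tmdb_fetcher(api_key):
    """Build a fetch_one function for the movie details endpoint used above."""
    import requests  # imported here so collect_responses can be tested offline

    def fetch_one(tmdb_id):
        response = requests.get(
            f"https://api.themoviedb.org/3/movie/{tmdb_id}",
            params={"api_key": api_key, "language": "en-US"},
            timeout=10,
        )
        # Assumes the API answers with a JSON body for errors as well.
        return response.status_code, response.json()

    return fetch_one
```

A call such as `collect_responses(movie_ids[0:5], make_tmdb_fetcher(config.key))` would then return the movie records together with the IDs that need a second look.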

We turned the JSON responses into a dataframe.

In [None]:
tmdb_df = pd.DataFrame(List_of_responses)
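
The table of contents lists cleaning and merging steps that are not shown in this section. One plausible sketch of the cleaning step keeps only the ID and the description, renames `id` to match the `tmdbId` column in `links`, and drops empty descriptions. The toy records and their placeholder text are hypothetical; only the key names come from the response shown above.

```python
import pandas as pd

# Toy records shaped like TMDB responses; the overview text is placeholder data.
toy_responses = [
    {"id": 862, "title": "Toy Story", "overview": "Placeholder overview text.", "budget": 0},
    {"id": 8844, "title": "Placeholder title", "overview": "", "budget": 0},
]
toy_tmdb = pd.DataFrame(toy_responses)

# Keep the merge key and the description, using the same key name as links.
descriptions = toy_tmdb[["id", "overview"]].rename(columns={"id": "tmdbId"})

# Drop movies with an empty description, since they carry no content signal.
descriptions = descriptions[descriptions["overview"].str.strip() != ""]
```

The resulting frame could be merged with the master dataframe on `tmdbId`.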

We made a pickle of the raw TMDB data for safekeeping.

In [104]:
tmdb_df.to_pickle("raw_tmdb.pickle")
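
Saving the raw responses means the API does not have to be called again in later sessions. A short sketch of the round trip, using a toy frame and a temporary directory so that no real file is overwritten (both are assumptions for illustration):

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for tmdb_df; the overview text is placeholder data.
toy_tmdb = pd.DataFrame([{"id": 862, "overview": "Placeholder overview text."}])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "raw_tmdb.pickle")
    toy_tmdb.to_pickle(path)         # write the dataframe to disk
    restored = pd.read_pickle(path)  # read it back in a later session
```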