# **Movies Dataset Content-Based, Collaborative Filtering, and Hybrid Recommender Models and Demo of Hybrid Recommender Model with 3 User Profiles**

## **Collaborators** 
- Ashna Sood 
- Urmi Suresh
- Tae Kim 
- Xianglong Wang

## **Imports** 

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ast
import os 
import pickle
import math

import seaborn as sns
sns.set()
sns.set_context('talk')

import warnings
warnings.filterwarnings('ignore')

import patsy
import statsmodels.api as sm
import scipy.stats as stats

from sklearn.metrics import make_scorer, accuracy_score, plot_confusion_matrix
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split, StratifiedKFold, KFold, GridSearchCV
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity

from nltk.stem.snowball import SnowballStemmer

from sklearn import metrics

## **Content Based Recommender System Using Cleaned Metadata**

In [2]:
# read in cleaned movies metadata csv file
movies_df = pd.read_csv("movies_metadata_cleaned.csv")
movies_df

Unnamed: 0,ID,IMDB ID,Title,Collection,Genres,Language,Spoken Languages,Release Date,Runtime,Revenue,...,Production Countries,Popularity Rating,Vote Count,Vote Average,Keywords,Cast,Director,Writer,Producer,Metadata
0,461257,tt6980792,Queerama,,[],en,['en'],2017-06-09,75.0,,...,['United Kingdom'],0.163015,,,[],[],daisyasquith,,,daisyasquith
1,92323,tt0081758,Willie and Phil,,[],en,[],1980-08-15,115.0,,...,[],0.326500,,,[],"['michaelontkean', 'raysharkey', 'margotkidder']",paulmazursky,paulmazursky,,paulmazursky paulmazursky michaelontkean rays...
2,114838,tt0029949,Brother Rat,,['Comedy'],en,['en'],1938-10-29,87.0,,...,['United States of America'],0.174691,,,['basedonplayormusical'],"['ronaldreagan', 'janewyman', 'priscillalane',...",williamkeighley,jerrywald,,williamkeighley jerrywald Comedy ronaldreagan...
3,264723,tt0070580,Le pélican,,[],en,[],1974-02-06,83.0,,...,[],0.000115,,,[],[],gérardblain,,,gérardblain
4,88061,tt0055459,"So Evil, So Young",,['Drama'],en,['en'],1963-01-01,77.0,,...,[],0.001662,,,"['prison', ""women'sprison""]","['jillireland', 'ellenpollock', 'joanhaythorne...",godfreygrayson,markgrantham,,godfreygrayson markgrantham Drama jillireland...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42273,24428,tt0848228,The Avengers,theavengerscollection,"['Science Fiction', 'Action', 'Adventure']",en,['en'],2012-04-25,143.0,1.519558e+09,...,['United States of America'],89.887648,12000.0,7.4,"['newyork', 'shield', 'marvelcomic', 'superher...","['robertdowneyjr.', 'chrisevans', 'markruffalo...",josswhedon,josswhedon,stanlee,josswhedon josswhedon stanlee theavengerscolle...
42274,19995,tt0499549,Avatar,avatarcollection,"['Action', 'Adventure', 'Fantasy', 'Science Fi...",en,"['en', 'es']",2009-12-10,162.0,2.787965e+09,...,"['United States of America', 'United Kingdom']",185.070892,12114.0,7.2,"['cultureclash', 'future', 'spacewar', 'spacec...","['samworthington', 'zoesaldana', 'sigourneywea...",jamescameron,jamescameron,jamescameron,jamescameron jamescameron jamescameron avatarc...
42275,155,tt0468569,The Dark Knight,thedarkknightcollection,"['Drama', 'Action', 'Crime', 'Thriller']",en,"['en', 'zh']",2008-07-16,152.0,1.004558e+09,...,"['United Kingdom', 'United States of America']",123.167259,12269.0,8.3,"['dccomics', 'crimefighter', 'secretidentity',...","['christianbale', 'michaelcaine', 'heathledger...",christophernolan,christophernolan,charlesroven,christophernolan christophernolan charlesroven...
42276,27205,tt1375666,Inception,,"['Action', 'Thriller', 'Science Fiction', 'Mys...",en,['en'],2010-07-14,148.0,8.255328e+08,...,"['United Kingdom', 'United States of America']",29.108149,14075.0,8.1,"['lossoflover', 'dream', 'kidnapping', 'sleep'...","['leonardodicaprio', 'josephgordon-levitt', 'e...",christophernolan,christophernolan,christophernolan,christophernolan christophernolan christophern...


Content based recommendation systems use specific features of items to produce recommendations of other items that are similar to what a user has already indicated that they like. The user provided data is usually explicitly expressed via a rating the user has previously given, or implicitly expressed via clicking on a link. For our recommendation system, the specific features of the items that we wanted to use to produce recommendations was something we created ourselves using the features that were given to us in the original movies dataset. This was our metadata. We then created a cosine similarity matrix using a TF-IDF table of every pair of words in each one of the movies’ metadata strings. We decided on the cosine similarity matrix after exploring other options such as the Euclidean Distance and Pearson Correlation, as the cosine similarity is efficient at comparing how similar the contents of two vectors is and is minimally affected by what the magnitudes of the vectors are. Our output for each movie inputted was 20 movies with the highest similarity score. 

In [None]:
# vectorize the movies' metadata
count = CountVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
# tokenizes the strings and returns a vector for each string 
count_matrix = count.fit_transform(movies_df["Metadata"])

# calculate cosine similarity between the movies
cosine_sim = cosine_similarity(count_matrix, count_matrix)

# save cosine similarity matrix 
outfile = "metadata_cosineSim"
np.save(outfile, cosine_sim)

In [3]:
# load metadata cosine similarity matrix 
cosine_sim_loaded = np.load("metadata_cosineSim.npy")

In [4]:
movies_df = movies_df.reset_index()
movie_titles = movies_df['Title']
indices = pd.Series(movies_df.index, index=movies_df['Title'])

Content Based recommender method to extract movie recommendations based on input movie title.

In [5]:
# method concept inspired from kaggle notebook 
def get_recommendations(movie_title):
    movie_index = indices[movie_title]
    sim_scores = list(enumerate(cosine_sim_loaded[movie_index]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:20]
    movie_indices = [i[0] for i in sim_scores]
    return list(movie_titles.iloc[movie_indices])

## **Collaborative Filtering**

In [7]:
!pip install scikit-surprise
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate
from surprise import dump



In [6]:
# read in movie ratings 
ratings_df = pd.read_csv('Movies Data/ratings.csv')

# rename columns
ratings_df = ratings_df.rename(columns={"userId": "User ID", 
                                        "movieId": "Movie ID", 
                                        "timestamp": "Timestamp"})

In [7]:
ratings_df

Unnamed: 0,User ID,Movie ID,rating,Timestamp
0,1,110,1.0,1425941529
1,1,147,4.5,1425942435
2,1,858,5.0,1425941523
3,1,1221,5.0,1425941546
4,1,1246,5.0,1425941556
...,...,...,...,...
26024284,270896,58559,5.0,1257031564
26024285,270896,60069,5.0,1257032032
26024286,270896,63082,4.5,1257031764
26024287,270896,64957,4.5,1257033990


In [5]:
# check for null values
ratings_df.isnull().any()

User ID      False
Movie ID     False
rating       False
Timestamp    False
dtype: bool

Collaborative filtering is an algorithm that creates recommendations based on data collected from other users, using the assumption that users who have similar interests in certain items are more likely to see eye to eye again. This stems from the idea that if we have friends who share our similar interests, then we would trust their recommendations in the future simply because we think we have the same tastes. Our collaborative filtering algorithm uses the similarity index technique where a certain number of users are chosen based on how similar they are to the user we are focusing on. The algorithm to calculate this is Signular Value Decompositon (SVD), which is a matrix factorization technique that reduces the number of features of a dataset. In the context of collaborative filtering, each row is a user and each column is a movie item, and the values in the matrix are the user ratings.  Then a weighted average of the selected users is created and that number provides the suggestions for the focused user. 

In [4]:
reader = Reader()

In [8]:
data = Dataset.load_from_df(ratings_df[['User ID', 'Movie ID', 'rating']], reader)
algo = SVD()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.7965  0.7962  0.7956  0.7962  0.7960  0.7961  0.0003  
MAE (testset)     0.6023  0.6025  0.6019  0.6023  0.6021  0.6022  0.0002  
Fit time          685.16  694.63  693.42  692.81  695.55  692.31  3.70    
Test time         57.71   58.90   49.77   49.97   46.38   52.54   4.88    


{'test_rmse': array([0.79654272, 0.79619573, 0.79561253, 0.79622099, 0.79598232]),
 'test_mae': array([0.60225976, 0.60245458, 0.60187422, 0.6023217 , 0.6021279 ]),
 'fit_time': (685.1593697071075,
  694.6251404285431,
  693.4244229793549,
  692.8122618198395,
  695.5460493564606),
 'test_time': (57.705753564834595,
  58.895099401474,
  49.7657413482666,
  49.96788167953491,
  46.383798360824585)}

In [9]:
training_data = data.build_full_trainset()
algo.fit(training_data)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7fd5a709e520>

Surprise package predict method takes in: 
- uid = user ID
- iid = item id 
- rui = true rating -- optional

In [None]:
algo.predict(1, 302, 3).est

In [None]:
# Dump algorithm to save 
file_name = "SVD_model_ex"
dump.dump(file_name, algo=algo)

### Reload Collaboraitve Filtering Model

In [17]:
# reload algorithm
from surprise import dump
file_name = "SVD_model_ex"
_, loaded_algo = dump.load(file_name)

## Hybrid Recommender combining Content Based and Collaborative Filtering Models

**Hybrid Recommender combining content based model and collaborative filtering model. Going to input the User ID and title of the movie and return top 20 movies that are similar based on both the metadata of the input movie and the user's preferences and predicted ratings of those movies.**

In [8]:
movies_df

Unnamed: 0,index,ID,IMDB ID,Title,Collection,Genres,Language,Spoken Languages,Release Date,Runtime,...,Production Countries,Popularity Rating,Vote Count,Vote Average,Keywords,Cast,Director,Writer,Producer,Metadata
0,0,461257,tt6980792,Queerama,,[],en,['en'],2017-06-09,75.0,...,['United Kingdom'],0.163015,,,[],[],daisyasquith,,,daisyasquith
1,1,92323,tt0081758,Willie and Phil,,[],en,[],1980-08-15,115.0,...,[],0.326500,,,[],"['michaelontkean', 'raysharkey', 'margotkidder']",paulmazursky,paulmazursky,,paulmazursky paulmazursky michaelontkean rays...
2,2,114838,tt0029949,Brother Rat,,['Comedy'],en,['en'],1938-10-29,87.0,...,['United States of America'],0.174691,,,['basedonplayormusical'],"['ronaldreagan', 'janewyman', 'priscillalane',...",williamkeighley,jerrywald,,williamkeighley jerrywald Comedy ronaldreagan...
3,3,264723,tt0070580,Le pélican,,[],en,[],1974-02-06,83.0,...,[],0.000115,,,[],[],gérardblain,,,gérardblain
4,4,88061,tt0055459,"So Evil, So Young",,['Drama'],en,['en'],1963-01-01,77.0,...,[],0.001662,,,"['prison', ""women'sprison""]","['jillireland', 'ellenpollock', 'joanhaythorne...",godfreygrayson,markgrantham,,godfreygrayson markgrantham Drama jillireland...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42273,42273,24428,tt0848228,The Avengers,theavengerscollection,"['Science Fiction', 'Action', 'Adventure']",en,['en'],2012-04-25,143.0,...,['United States of America'],89.887648,12000.0,7.4,"['newyork', 'shield', 'marvelcomic', 'superher...","['robertdowneyjr.', 'chrisevans', 'markruffalo...",josswhedon,josswhedon,stanlee,josswhedon josswhedon stanlee theavengerscolle...
42274,42274,19995,tt0499549,Avatar,avatarcollection,"['Action', 'Adventure', 'Fantasy', 'Science Fi...",en,"['en', 'es']",2009-12-10,162.0,...,"['United States of America', 'United Kingdom']",185.070892,12114.0,7.2,"['cultureclash', 'future', 'spacewar', 'spacec...","['samworthington', 'zoesaldana', 'sigourneywea...",jamescameron,jamescameron,jamescameron,jamescameron jamescameron jamescameron avatarc...
42275,42275,155,tt0468569,The Dark Knight,thedarkknightcollection,"['Drama', 'Action', 'Crime', 'Thriller']",en,"['en', 'zh']",2008-07-16,152.0,...,"['United Kingdom', 'United States of America']",123.167259,12269.0,8.3,"['dccomics', 'crimefighter', 'secretidentity',...","['christianbale', 'michaelcaine', 'heathledger...",christophernolan,christophernolan,charlesroven,christophernolan christophernolan charlesroven...
42276,42276,27205,tt1375666,Inception,,"['Action', 'Thriller', 'Science Fiction', 'Mys...",en,['en'],2010-07-14,148.0,...,"['United Kingdom', 'United States of America']",29.108149,14075.0,8.1,"['lossoflover', 'dream', 'kidnapping', 'sleep'...","['leonardodicaprio', 'josephgordon-levitt', 'e...",christophernolan,christophernolan,christophernolan,christophernolan christophernolan christophern...


In [9]:
# read in movies ID map csv file
movies_ID_map = pd.read_csv("movies_ID_map.csv")
movies_ID_map

Unnamed: 0,Title,ID,Movie ID
0,Queerama,461257,176279
1,Willie and Phil,92323,112577
2,Brother Rat,114838,112548
3,Le pélican,264723,112510
4,"So Evil, So Young",88061,112467
...,...,...,...
42272,Deadpool,293660,122904
42273,The Avengers,24428,89745
42274,Avatar,19995,72998
42275,The Dark Knight,155,58559


We chose to ultimately create a Hybrid Recommender Model that combined the content based model and collaborative filtering model because we wanted to create a recommender that used both the user’s past preferences as well as other user’s preferences who were similar to the active user. To do this our initial goal was to input the User ID and title of the movie we wanted similar movies for and to output the top 20 movies that were similar to the input movie based on both the metadata of the input movie and the user’s preferences and predicted ratings of those movies. Thus, the movies that are recommended from this hybrid model are not only preferenced by similar movie features but also a predicted higher rating that the user would give to that movie. 

In [11]:
# method that produces recommendations based off of both metadata and user preferences
def hybrid_recommender(userID, title):
    index = indices[title]
    tmdbId = movies_ID_map.loc[movies_ID_map["Title"] == title]['ID']
    movie_id = movies_ID_map.loc[movies_ID_map["Title"] == title]['Movie ID']
    
    # take top 20 movies based on similarity scores to calculate the vote of the 60th percentile movie
    # then calculate the weighted rating of each movie using IMDB formula 
    sim_scores = list(enumerate(cosine_sim_loaded[int(index)]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:21]
    similar_movie_indices = [i[0] for i in sim_scores]
    
    movies = movies_df.iloc[similar_movie_indices][['Title', 'Vote Count', 'Vote Average', 'ID']]
    similar_movies_IDs = list(map(int, movies["ID"].values))

    for ID, index in zip(similar_movies_IDs, similar_movie_indices):
        movieID = int(movies_ID_map.loc[movies_ID_map["ID"] == ID]['Movie ID'])
        est_rating = loaded_algo.predict(userID, movieID).est
        movies.loc[index, "est"] = est_rating
        
    movies = movies.sort_values('est', ascending=False)
    return movies

Basic Testing of the Hybrid Recommender Comparing between User 10 and User 35

In [80]:
hybrid_recommender(10, 'The Avengers')

Unnamed: 0,Title,Vote Count,Vote Average,ID,est
42262,Iron Man,8951.0,7.4,1726,4.358564
36165,Team Thor,93.0,7.5,413279,4.226156
41411,Serenity,1287.0,7.4,16320,4.225701
42248,Captain America: Civil War,7462.0,7.1,271110,4.222177
42272,Deadpool,11444.0,7.4,293660,4.203255
32729,Marvel Studios: Assembling a Universe,44.0,6.6,259910,4.160966
42263,Iron Man 3,8951.0,6.8,68721,4.054722
42206,Doctor Strange,5880.0,7.1,284052,4.051236
42207,Captain America: The Winter Soldier,5881.0,7.6,100402,4.002187
42243,Captain America: The First Avenger,7174.0,6.6,1771,3.992446


In [81]:
hybrid_recommender(35, 'The Avengers')

Unnamed: 0,Title,Vote Count,Vote Average,ID,est
41411,Serenity,1287.0,7.4,16320,4.361905
42206,Doctor Strange,5880.0,7.1,284052,4.153873
42262,Iron Man,8951.0,7.4,1726,4.126833
36165,Team Thor,93.0,7.5,413279,4.112717
32729,Marvel Studios: Assembling a Universe,44.0,6.6,259910,4.099205
42248,Captain America: Civil War,7462.0,7.1,271110,4.091808
42272,Deadpool,11444.0,7.4,293660,4.054944
42216,Ant-Man,6029.0,7.0,102899,4.043048
42207,Captain America: The Winter Soldier,5881.0,7.6,100402,3.992822
42240,Avengers: Age of Ultron,6908.0,7.3,99861,3.966149


In [83]:
hybrid_recommender(10, 'Mean Girls')

Unnamed: 0,Title,Vote Count,Vote Average,ID,est
31692,Puella Magi Madoka Magica the Movie Part III: ...,36.0,7.3,212162,4.394826
14299,Live from New York!,5.0,5.4,334328,3.755567
34656,Just One of the Guys,64.0,6.4,24548,3.75471
38215,Geek Charming,188.0,6.0,81250,3.725091
28333,Screwballs,22.0,4.7,25164,3.665975
35695,Frenemies,83.0,5.2,84105,3.548348
39125,It's a Boy Girl Thing,279.0,6.3,37725,3.443764
37231,Zapped,131.0,5.6,278774,3.432579
41469,The DUFF,1372.0,6.8,272693,3.397726
35992,The Cheetah Girls,90.0,4.9,32293,3.391446


In [84]:
hybrid_recommender(35, 'Mean Girls')

Unnamed: 0,Title,Vote Count,Vote Average,ID,est
31692,Puella Magi Madoka Magica the Movie Part III: ...,36.0,7.3,212162,4.404914
14299,Live from New York!,5.0,5.4,334328,3.952208
35695,Frenemies,83.0,5.2,84105,3.521542
39125,It's a Boy Girl Thing,279.0,6.3,37725,3.472875
34656,Just One of the Guys,64.0,6.4,24548,3.401603
38215,Geek Charming,188.0,6.0,81250,3.387064
35099,How to Build a Better Boy,71.0,5.7,286987,3.364189
28333,Screwballs,22.0,4.7,25164,3.328902
37231,Zapped,131.0,5.6,278774,3.312382
41469,The DUFF,1372.0,6.8,272693,3.311737


## Demo of Hybrid Recommender System

In order to showcase the efficacy of our hybrid recommender system, we will be displaying the recommended movies and results of 3 different user profiles based on various movies they liked/disliked:

Movie Name (Movie ID) - User Rating

**User 1 -- Romantic Comedies Movies Fanatic:**  

User Rated movies: 
1. He’s Just Not That Into You (66203) - 4.5 
2. The Proposal (69406) - 5.0 
3. Bridget Jones’ Diary (4246) - 5.0  
4. 13 Going on 30 (7444) - 4.0 
5. What a Girl Wants (6266) - 3.5 
6. The Great Kidnapping (118510) - 1.5 
7. My Favorite Blonde (25884) - 3.0 
8. Being Ginger (151371) - 2.0   
9. The Blind Sunflowers (88652)- 1.0 
10. The Avengers (89745) - 3.5 

The movies that will be fed into the model to see the results are: 

a. My Big Fat Greek Wedding 

b. Father of the Bride 

**User 2 -- Action Movies Fanatic:** 

User Rated movies:
1. The Avengers (89745) - 5.0 
2. Captain America (26764) - 5.0 
3. Mission: Impossible (648) - 4.5  
4. National Treasure (8972) - 3.5 
5. Skyfall (96079) - 4.0 
6. Senseless (1746) - 1.0 
7. Get Over It (4168) - 2.5 
8. Moana (166461) - 1.5 
9. La La Land (164909) - 0.5 
10. Divergent (108190) - 3.0  

The movies that will be fed into the model to see the results are: 

a. Die Hard 

b. Lethal Weapon 

**User 3 -- Bollywood Movies Fanatic:**  

User Rated movies:
1. Zindagi Na Milegi Dobara (89554) - 5.0 
2. Rab Ne Bana Di Jodi (64650) - 5.0 
3. Dhoom (107707) - 4.5 
4. Om Shanti Om (56167) - 5.0  
5. Main Hoon Na (36083) - 4.5 
6. Bigfoot (103869) - 2.0 
7. Road to Paloma (143637) - 3.0 
8. Rocket Singh: Salesman of the Year (94661) - 1.5 
9. Hansel and Gretel (125401) - 1.0 
10. The Boy and the Pirates (107157) - 0.5 

The movies that will be fed into the model to see the results are: 

a. Dilwale Dulhania Le Jayenge 

b. Kabhi Khushi Kabhie Gham

In order for the model to work, we will be incorporating the user's ratings of each movies to create a user profile to add to our current dataset in order to produce user-based recommendations. 

In [10]:
ratings_df

Unnamed: 0,User ID,Movie ID,rating,Timestamp
0,1,110,1.0,1425941529
1,1,147,4.5,1425942435
2,1,858,5.0,1425941523
3,1,1221,5.0,1425941546
4,1,1246,5.0,1425941556
...,...,...,...,...
26024284,270896,58559,5.0,1257031564
26024285,270896,60069,5.0,1257032032
26024286,270896,63082,4.5,1257031764
26024287,270896,64957,4.5,1257033990


In [11]:
# method to add user ratings for movies to the ratings_df 
def add_user_rating(df, user_ID, movie_ID, rating, timestamp=1257031858):
    df.loc[len(df.index)] = [user_ID, movie_ID, rating, timestamp]

In [12]:
# user 270897 ratings 
user_270897 = [270897, 270897, 270897, 270897, 270897, 270897, 270897, 270897, 270897, 270897]
movie_IDs_270897 = [66203, 69406, 4246, 7444, 6266, 118510, 25884, 151371, 88652, 89745]
ratings_270897 = [4.5, 5.0, 5.0, 4.0, 3.5, 1.5, 3.0, 2.0, 1.0, 3.0]

# add user ratings to ratings_df
for user_ID, movie_ID, rating in zip(user_270897, movie_IDs_270897, ratings_270897):
    add_user_rating(ratings_df, user_ID, movie_ID, rating)
    
# user 270898 ratings 
user_270898 = [270898, 270898, 270898, 270898, 270898, 270898, 270898, 270898, 270898, 270898]
movie_IDs_270898 = [89745, 26764, 648, 8972, 96079, 1746, 4168, 166461, 164909, 108190]
ratings_270898 = [5.0, 5.0, 4.5, 3.5, 4.0, 1.0, 2.5, 1.5, 0.5, 3.0]

# add user ratings to ratings_df
for user_ID, movie_ID, rating in zip(user_270898, movie_IDs_270898, ratings_270898):
    add_user_rating(ratings_df, user_ID, movie_ID, rating)
    
# user 270899 ratings 
user_270899 = [270899, 270899, 270899, 270899, 270899, 270899, 270899, 270899, 270899, 270899]
movie_IDs_270899 = [89554, 64650, 107707, 56167, 36083, 103869, 143637, 94661, 125401, 107157]
ratings_270899 = [5.0, 5.0, 4.5, 5.0, 4.5, 2.0, 3.0, 1.5, 1.0, 0.5]

# add user ratings to ratings_df
for user_ID, movie_ID, rating in zip(user_270899, movie_IDs_270899, ratings_270899):
    add_user_rating(ratings_df, user_ID, movie_ID, rating)

ratings_df

Unnamed: 0,User ID,Movie ID,rating,Timestamp
0,1.0,110.0,1.0,1.425942e+09
1,1.0,147.0,4.5,1.425942e+09
2,1.0,858.0,5.0,1.425942e+09
3,1.0,1221.0,5.0,1.425942e+09
4,1.0,1246.0,5.0,1.425942e+09
...,...,...,...,...
26024314,270899.0,103869.0,2.0,1.257032e+09
26024315,270899.0,143637.0,3.0,1.257032e+09
26024316,270899.0,94661.0,1.5,1.257032e+09
26024317,270899.0,125401.0,1.0,1.257032e+09


In [7]:
reader = Reader()

In [8]:
data = Dataset.load_from_df(ratings_df[['User ID', 'Movie ID', 'rating']], reader)
algo = SVD()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.7958  0.7958  0.7952  0.7962  0.7960  0.7958  0.0004  
MAE (testset)     0.6018  0.6018  0.6015  0.6020  0.6020  0.6018  0.0002  
Fit time          683.28  695.24  696.22  695.18  688.76  691.74  4.99    
Test time         60.27   57.21   51.74   57.78   57.82   56.97   2.82    


{'test_rmse': array([0.79580782, 0.79579842, 0.79515564, 0.79620523, 0.79598913]),
 'test_mae': array([0.60175944, 0.60179018, 0.60148886, 0.60201913, 0.60203228]),
 'fit_time': (683.2793922424316,
  695.2398734092712,
  696.2245357036591,
  695.1762704849243,
  688.7601597309113),
 'test_time': (60.27135252952576,
  57.21423673629761,
  51.74291276931763,
  57.77811813354492,
  57.81974959373474)}

In [9]:
training_data = data.build_full_trainset()
algo.fit(training_data)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7ff9851f3190>

In [10]:
# Dump algorithm to save 
file_name = "SVD_model_demo"
dump.dump(file_name, algo=algo)

#### Reload SVD demo 

In [13]:
# reload algorithm
from surprise import dump
file_name = "SVD_model_demo"
_, loaded_algo_demo = dump.load(file_name)

In [14]:
# testing new SVD model with added user profiles
user1_pred1 = loaded_algo_demo.predict(270897, 302, 3)
rating1 = user1_pred1.est
rating1

3.6565093784809406

In [15]:
# modified method with demo svd model and user friendly output
# method that produces recommendations based off of both metadata and user preferences
def hybrid_recommender(userID, title):
    index = indices[title]
    tmdbId = movies_ID_map.loc[movies_ID_map["Title"] == title]['ID']
    movie_id = movies_ID_map.loc[movies_ID_map["Title"] == title]['Movie ID']
    
    # take top 20 movies based on similarity scores to calculate the vote of the 60th percentile movie
    # then calculate the weighted rating of each movie using IMDB formula 
    sim_scores = list(enumerate(cosine_sim_loaded[int(index)]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:21]
    similar_movie_indices = [i[0] for i in sim_scores]
    
    movies = movies_df.iloc[similar_movie_indices][['Title', 'Vote Count', 'Vote Average', 'ID']]
    similar_movies_IDs = list(map(int, movies["ID"].values))

    for ID, index in zip(similar_movies_IDs, similar_movie_indices):
        movieID = int(movies_ID_map.loc[movies_ID_map["ID"] == ID]['Movie ID'])
        est_rating = loaded_algo_demo.predict(userID, movieID).est
        movies.loc[index, "est"] = est_rating
        
    movies = movies.sort_values('est', ascending=False)
    return list(movies["Title"]), movies

**User 1 (270897) - Romantic Comedies Movies Fanatic:**

In [17]:
# User 1 (270897) Movie Recc 1: My Big Fat Greek Wedding
u1_recc_titles1, u1_recc_movies1 = hybrid_recommender(270897, "My Big Fat Greek Wedding")
print("Hybrid Recommender Suggested Movies: \n", u1_recc_titles1)
u1_recc_movies1

Hybrid Recommender Suggested Movies: 
 ['The Big City', 'Father of the Bride', 'Brothers', 'Meet the Fockers', 'Days of Wine and Roses', 'The Front Page', 'Make Way for Tomorrow', 'The Wedding', 'Junebug', 'Guess Who', 'Herbstmilch', 'Aşk Kırmızı', 'Julia Misbehaves', 'The Ref', 'Lawyer Man', 'Five Nights in Maine', 'Monster-in-Law', "I Hate Valentine's Day", 'Over the Top', 'Date Movie']


Unnamed: 0,Title,Vote Count,Vote Average,ID,est
25303,The Big City,16.0,8.0,60567,3.924296
39630,Father of the Bride,355.0,6.2,11846,3.753976
40640,Brothers,650.0,6.8,7445,3.633077
41491,Meet the Fockers,1412.0,6.1,693,3.591512
32124,Days of Wine and Roses,39.0,7.5,32488,3.566119
34932,The Front Page,69.0,6.9,987,3.512676
31768,Make Way for Tomorrow,37.0,7.7,41059,3.487779
16918,The Wedding,7.0,6.6,10819,3.459577
34768,Junebug,66.0,6.5,1444,3.450242
38720,Guess Who,230.0,5.5,11638,3.357541


In [20]:
# User 1 (270897) Movie Recc 2: Father of the Bride
u1_recc_titles2, u1_recc_movies2 = hybrid_recommender(270897, 'Father of the Bride')
print("Hybrid Recommender Suggested Movies: \n", u1_recc_titles2)
u1_recc_movies2

Hybrid Recommender Suggested Movies: 
 ['Meet the Parents', 'The Intern', 'My Big Fat Greek Wedding', 'The Holiday', "Something's Gotta Give", 'What Women Want', 'Counting Backwards', 'Lucky 7', 'Make Way for Tomorrow', 'Путь к себе', 'Shopgirl', 'Father of the Bride Part II', 'Baby Boom', "It's Complicated", 'Private Benjamin', '¡A volar joven!', 'The Parent Trap', 'Irreconcilable Differences', 'Little Fockers', 'I Love Trouble']


Unnamed: 0,Title,Vote Count,Vote Average,ID,est
41681,Meet the Parents,1743.0,6.6,1597,3.854781
41741,The Intern,1926.0,7.1,257211,3.84429
40715,My Big Fat Greek Wedding,686.0,6.2,8346,3.760154
41391,The Holiday,1259.0,6.7,1581,3.699525
39986,Something's Gotta Give,422.0,6.3,6964,3.68308
41192,What Women Want,1021.0,6.1,3981,3.528582
4304,Counting Backwards,1.0,7.0,100634,3.510288
26474,Lucky 7,18.0,4.9,45693,3.506817
31768,Make Way for Tomorrow,37.0,7.7,41059,3.487779
3800,Путь к себе,1.0,7.0,420481,3.426556


From these results, it is evident that the hybrid recommender successfully produces movies that are both similar in genre but also similar to the user's previous ratings and preferences for romantic comedies. The hybrid model also returns the predicted user rating for each suggested movie and ranks from highest to lowest rating, indicating that the user's movie likes and dislikes are playing a role in the movies that are suggested, instead of just solely based off of the metadata. 

**User 2 (270898) - Action Movies Fanatic:** 

In [21]:
# User 2 (270898) Movie Recc 1: Die Hard
u2_recc_titles1, u2_recc_movies1 = hybrid_recommender(270898, 'Die Hard')
print("Hybrid Recommender Suggested Movies: \n", u2_recc_titles1)
u2_recc_movies1

Hybrid Recommender Suggested Movies: 
 ['Live Free or Die Hard', 'Die Hard 2', 'Hitch Hike', 'Die Hard: With a Vengeance', 'A Life Less Ordinary', 'Commando', 'Death Walks on High Heels', 'Killing Zoe', 'Blast', 'Wind River', 'Executive Decision', 'Crash Landing', 'First Kill', 'Hear No Evil', 'End Game', '7 Seconds', 'A Good Day to Die Hard', 'Stratton', 'Extraction', 'Street Fighter']


Unnamed: 0,Title,Vote Count,Vote Average,ID,est
41808,Live Free or Die Hard,2122.0,6.4,1571,3.627363
41737,Die Hard 2,1920.0,6.6,1573,3.572741
28286,Hitch Hike,22.0,6.5,54523,3.571463
41798,Die Hard: With a Vengeance,2094.0,6.9,1572,3.534645
37216,A Life Less Ordinary,130.0,6.2,8067,3.461202
40835,Commando,753.0,6.4,10999,3.401308
19932,Death Walks on High Heels,9.0,6.7,77029,3.347479
36744,Killing Zoe,111.0,6.1,507,3.288906
23130,Blast,12.0,4.6,24775,3.222696
38150,Wind River,181.0,7.4,395834,3.208929


In [22]:
# User 2 (270898) Movie Recc 2: Lethal Weapon
u2_recc_titles2, u2_recc_movies2 = hybrid_recommender(270898, 'Lethal Weapon')
print("Hybrid Recommender Suggested Movies: \n", u2_recc_titles2)
u2_recc_movies2

Hybrid Recommender Suggested Movies: 
 ['Lethal Weapon 4', 'Lethal Weapon 2', 'Confidence', 'Lethal Weapon 3', 'Under New Management', 'Guardian Angels', 'Sweet Alibis', 'Once a Thief', 'I Accidentally Domed Your Son', 'Love, Honour and Obey', 'Red Eagle', 'Johnny Dangerously', 'Turner & Hooch', 'Kill Me Later', 'Triggermen', 'Point Break', 'Rege', 'Homegrown', 'Money for Nothing', 'Bitch Slap']


Unnamed: 0,Title,Vote Count,Vote Average,ID,est
40884,Lethal Weapon 4,782.0,6.3,944,3.561604
41235,Lethal Weapon 2,1066.0,6.7,942,3.547417
37160,Confidence,127.0,6.4,10743,3.45978
40934,Lethal Weapon 3,824.0,6.4,943,3.448096
5654,Under New Management,2.0,6.0,62399,3.2938
32605,Guardian Angels,43.0,6.0,51191,3.242955
4353,Sweet Alibis,1.0,6.0,254155,3.236568
30052,Once a Thief,28.0,6.7,47423,3.215961
4822,I Accidentally Domed Your Son,1.0,7.0,77118,3.181557
28405,"Love, Honour and Obey",23.0,6.6,18002,3.17268


From these results, it is evident that the hybrid recommender successfully produces movies that are both similar in the action genre but also similar to the user's previous ratings and preferences for action/thriller movies. The hybrid model also returns the predicted user rating for each suggested movie and ranks from highest to lowest rating, indicating that the user's movie likes and dislikes are playing a role in the movies that are suggested, instead of just solely based off of the metadata. 

**User 3 (270899) - Bollywood Movies Fanatic:**

In [23]:
# User 3 (270899) Movie Recc 1: Dilwale Dulhania Le Jayenge
u3_recc_titles1, u3_recc_movies1 = hybrid_recommender(270899, 'Dilwale Dulhania Le Jayenge')
print("Hybrid Recommender Suggested Movies: \n", u3_recc_titles1)
u3_recc_movies1

Hybrid Recommender Suggested Movies: 
 ['Rab Ne Bana Di Jodi', 'Om Shanti Om', 'Veer-Zaara', 'Silja - nuorena nukkunut', 'Daawat-e-Ishq', 'A perfect match', 'All The Days Before Tomorrow', 'Aşk Kırmızı', 'Boynton Beach Club', 'Divine Intervention', 'Love Stories', 'Jab Tak Hai Jaan', 'Dil To Pagal Hai', 'The Bridal Party in Hardanger', 'The Trouble with Dee Dee', 'Babycakes', 'The Matriarch', 'Chalte Chalte', 'My Name Is Khan', 'Mujhse Dosti Karoge!']


Unnamed: 0,Title,Vote Count,Vote Average,ID,est
34655,Rab Ne Bana Di Jodi,64.0,6.7,14072,3.847915
35766,Om Shanti Om,85.0,7.2,8079,3.601545
34850,Veer-Zaara,67.0,7.5,4251,3.594865
158,Silja - nuorena nukkunut,,,468343,3.593886
25200,Daawat-e-Ishq,15.0,6.8,291154,3.503471
2222,A perfect match,,,52738,3.501428
1275,All The Days Before Tomorrow,,,19509,3.500995
10532,Aşk Kırmızı,3.0,5.3,210408,3.497808
10839,Boynton Beach Club,3.0,6.8,55831,3.401184
894,Divine Intervention,,,99885,3.333921


In [24]:
# User 3 (270899) Movie Recc 2: Kabhi Khushi Kabhie Gham
u3_recc_titles2, u3_recc_movies2 = hybrid_recommender(270899, 'Kabhi Khushi Kabhie Gham')
print("Hybrid Recommender Suggested Movies: \n", u3_recc_titles2)
u3_recc_movies2

Hybrid Recommender Suggested Movies: 
 ['Kuch Kuch Hota Hai', 'Dilwale Dulhania Le Jayenge', 'Dark Blue Almost Black', 'Wrong Side Up', 'The Living Room of the Nation', 'Veer-Zaara', 'Elizabeth Ekadashi', 'Prince of Broadway', 'Kal Ho Naa Ho', 'Chupke Chupke', "The Fourth Annual 'On Cinema' Oscar Special", 'The Flying Dutchman', 'Return to Me', 'The Trouble with Dee Dee', 'Ae Dil Hai Mushkil', 'The Matriarch', 'Yedyanchi Jatra', 'Kabhi Alvida Naa Kehna', 'My Name Is Khan', 'Student of the Year']


Unnamed: 0,Title,Vote Count,Vote Average,ID,est
36169,Kuch Kuch Hota Hai,93.0,7.5,11854,3.85927
40663,Dilwale Dulhania Le Jayenge,661.0,9.1,19404,3.84059
28607,Dark Blue Almost Black,23.0,6.7,3865,3.710355
5643,Wrong Side Up,2.0,6.0,26826,3.679387
7459,The Living Room of the Nation,2.0,4.8,266740,3.625779
34850,Veer-Zaara,67.0,7.5,4251,3.594865
1937,Elizabeth Ekadashi,,,305594,3.591911
10026,Prince of Broadway,3.0,6.7,87081,3.552827
36719,Kal Ho Naa Ho,110.0,7.3,4254,3.543006
20663,Chupke Chupke,10.0,7.5,21459,3.482539


From these results, it is evident that the hybrid recommender successfully produces movies that are both similar in the international Bollywood genre but also similar to the user's previous ratings and preferences for Bollywood movies. The hybrid model also returns the predicted user rating for each suggested movie and ranks from highest to lowest rating, indicating that the user's movie likes and dislikes are playing a role in the movies that are suggested, instead of just solely based off of the metadata. 

## Models Comparison

In order to display the benefits of a hybrid recommender system, the following code will test each model with the same 7 movies and 2 user profiles to display the varying results of both the individual content-based and collabroative filtering models and the combination in the hybrid model. 

To compare the results of each recommender, the 3 models will be tested with the following 7 movies: 
- Legally Blonde - 106910
- Kabhi Khushi Kabhie Gham - 27362
- The Avengers - 89745
- The Dark Knight - 58559
- Harry Potter and the Philosopher's Stone - 4896
- Tangled - 81847
- Cinderella - 130073

The user profiles will be Users 32, 7954, and 270898 that were randomly selected from the user ratings dataset, as they each have varying movie tastes that are included in this selection of 7 movies to test with. 

In [32]:
movie_names = ["Legally Blonde", "Kabhi Khushi Kabhie Gham", "The Avengers", "The Dark Knight",
         "Harry Potter and the Philosopher's Stone", "Tangled", "Cinderella"]

movie_ids = [106910, 27362, 89745, 58559, 4896, 81847, 130073]
user_ids = [32, 7954, 270898]

### Testing Content Based Recommender System

In [34]:
for movie in movie_names:
    print("Movie:", movie)
    print("Recommendations:\n", get_recommendations(movie), "\n")

Movie: Legally Blonde
Recommendations:
 ['Legally Blonde 2: Red, White & Blonde', 'The Wendell Baker Story', 'Four Christmases', 'Wild', 'Penelope', 'Cruel Intentions', 'Mentor', 'The Last Templar', 'Legally Blondes', 'Tenure', 'Girl', 'Godspell: A Musical Based on the Gospel According to St. Matthew', 'Finding Bliss', 'Life With Mikey', 'Overnight Delivery', 'Dear Eleanor', 'Ordinary World', 'Meadowland', 'Pretty Persuasion'] 

Movie: Kabhi Khushi Kabhie Gham
Recommendations:
 ['Kal Ho Naa Ho', 'Student of the Year', 'The Matriarch', 'Kuch Kuch Hota Hai', 'Kabhi Alvida Naa Kehna', 'My Name Is Khan', 'Ae Dil Hai Mushkil', 'The Trouble with Dee Dee', 'Elizabeth Ekadashi', 'Yedyanchi Jatra', 'Chupke Chupke', 'Dilwale Dulhania Le Jayenge', 'Wrong Side Up', 'The Flying Dutchman', 'The Living Room of the Nation', 'Prince of Broadway', 'Return to Me', 'Dark Blue Almost Black', 'Veer-Zaara'] 

Movie: The Avengers
Recommendations:
 ['Avengers: Age of Ultron', 'Captain America: The Winter Soldi

### Testing Collaborative Filtering Recommender System

Surprise package predict method takes in: 
- uid = user ID
- iid = item id 
- rui = true rating -- optional

In [33]:
for user_id in user_ids:
    print("User ID:", user_id, "\n")
    for movie, movie_id in zip(movie_names, movie_ids):
        print("Movie:", movie)
        print("Predicted User Rating:", loaded_algo_demo.predict(movie_id, user_id).est, "\n")
    print("\n")

User ID: 32 

Movie: Legally Blonde
Predicted User Rating: 3.6934934876614434 

Movie: Kabhi Khushi Kabhie Gham
Predicted User Rating: 4.1802906024813575 

Movie: The Avengers
Predicted User Rating: 4.948186468730362 

Movie: The Dark Knight
Predicted User Rating: 3.823194515488192 

Movie: Harry Potter and the Philosopher's Stone
Predicted User Rating: 3.8382930494494225 

Movie: Tangled
Predicted User Rating: 3.683635496351337 

Movie: Cinderella
Predicted User Rating: 3.3483236642270393 



User ID: 7954 

Movie: Legally Blonde
Predicted User Rating: 2.92933795379747 

Movie: Kabhi Khushi Kabhie Gham
Predicted User Rating: 2.9390543172382326 

Movie: The Avengers
Predicted User Rating: 3.3426800816452267 

Movie: The Dark Knight
Predicted User Rating: 2.8143656348049277 

Movie: Harry Potter and the Philosopher's Stone
Predicted User Rating: 3.0734163658775433 

Movie: Tangled
Predicted User Rating: 3.0705188458955885 

Movie: Cinderella
Predicted User Rating: 2.4473142252813513 




### Testing Hybrid Recommender System

In [35]:
for user_id in user_ids:
    print("User ID:", user_id, "\n")
    for movie in movie_names:
        print("Movie:", movie)
        titles, movies = hybrid_recommender(user_id, movie)
        print("Suggested Movies and Predicted User Ratings: \n")
        for movie, rating in zip(list(movies["Title"]), list(movies["est"])):
            print(movie, rating)
        print("\n")

User ID: 32 

Movie: Legally Blonde
Suggested Movies and Predicted User Ratings: 

Godspell: A Musical Based on the Gospel According to St. Matthew 3.9334896722137738
Pretty Persuasion 3.781224660451396
Cruel Intentions 3.746325943668475
Wild 3.7350694409513294
Girl 3.723586531289791
Overnight Delivery 3.721213307054958
Tenure 3.6657161656638615
Meadowland 3.6464649707900874
The Man in the Moon 3.583872461106615
Mentor 3.5818062281133547
Dear Eleanor 3.556816034437301
The Last Templar 3.540615656821905
Penelope 3.402715989758105
Ordinary World 3.3224954607287582
The Wendell Baker Story 3.3022977886417704
Life With Mikey 3.293558262474821
Four Christmases 3.2689747023010414
Finding Bliss 3.2679770091376534
Legally Blonde 2: Red, White & Blonde 3.0550069382143725
Legally Blondes 2.8295076745077523


Movie: Kabhi Khushi Kabhie Gham
Suggested Movies and Predicted User Ratings: 

Dilwale Dulhania Le Jayenge 4.278663970650635
Kuch Kuch Hota Hai 4.148588462335488
Chupke Chupke 4.1133687030271

Suggested Movies and Predicted User Ratings: 

Three Wishes for Cinderella 3.9937832433595566
Το γάλα 3.9219585174450753
Counting Backwards 3.7737440745319377
Prince and the Evening Star 3.7181623116803078
Cirque du Soleil: Varekai 3.7160264693805964
Ever After: A Cinderella Story 3.643994560486242
National Geographic American Blackout 3.640569039090282
Sleeping Beauty 3.5903781259746617
Jill And Joy's Winter 3.5753113011982234
Joni 3.5736446990015382
More Than a Miracle 3.568924865351908
A Journey Through Fairyland 3.539523663505201
Aşk Kırmızı 3.503275337771978
The Cave of the Golden Rose 3.412699266603725
Bhoot Unkle 3.2758808957386836
Sundome 3.2622672991748796
Jails, Hospitals & Hip-Hop 3.18585868166966
Cinderella III: A Twist in Time 3.0158728512904007
Ill Gotten Gains 2.823800725544186
Cinderella II: Dreams Come True 2.612152335007203


User ID: 270898 

Movie: Legally Blonde
Suggested Movies and Predicted User Ratings: 

Godspell: A Musical Based on the Gospel According to St. 

From the results of the 3 models, it is evident that the hybrid recommender system performs the best at recommending movies tailored to each user while also incorporating features about the movie's metadata. While the content-based recommender solely recommends movies based on the movie features and the content-based recommender reports predicted ratings based on the user's past movie preferences, the hybrid model combines the strengths of both models and suggests movies that not only have similar metadata features but also that the user is predicted to give a higher rating to. 