## Hybrid Recommender

Because of their weaknesses, neither content-based nor collaborative filtering makes consistently good recommendations on its own. A third approach, the hybrid recommender, combines the strengths of both and offsets their weaknesses. This notebook builds a hybrid recommender by ensembling the two filtering algorithms.

In [1]:
import pandas as pd
import numpy as np
from scipy import sparse
import pickle
from sklearn.metrics.pairwise import cosine_similarity

### Data Imports

In [2]:
df_content = pd.read_csv('../data/clean_content.csv')
df_content.head()

|   | movie_id | title | genres | year | tmdb_id | imdb_id | tmdb_rating | tmdb_votes | imdb_rating | imdb_votes | body | sentiment_score | weighted_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 1995 | 862 | tt0114709 | 7.7 | 5415 | 8.3 | 956821 | led woody andys toy live happily room andys bi... | 0.8625 | 1.60957 |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy | 1995 | 8844 | tt0113497 | 6.9 | 2413 | 7.0 | 334566 | sibling judy peter discover enchanted board ga... | 0.3612 | 0.616703 |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 1995 | 15602 | tt0113228 | 6.5 | 92 | 6.6 | 26930 | family wedding reignites ancient feud nextdoor... | 0.9081 | 0.111535 |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance | 1995 | 31357 | tt0114885 | 6.1 | 34 | 5.9 | 10784 | cheated mistreated stepped woman holding breat... | 0.9725 | -0.327571 |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 1995 | 11862 | tt0113041 | 5.7 | 173 | 6.0 | 37433 | george bank recovered daughter wedding receive... | 0.6486 | -0.844647 |


In [4]:
df_user = pd.read_csv('../data/ratings_title.csv')
df_user.head()

|   | userId | movieId | rating | title | genres | year |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 1995 |
| 1 | 5 | 1 | 4.0 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 1995 |
| 2 | 7 | 1 | 4.5 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 1995 |
| 3 | 15 | 1 | 2.5 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 1995 |
| 4 | 17 | 1 | 4.5 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 1995 |


In [5]:
df_user.rename(columns={'userId':'user_id', 'movieId':'movie_id'}, inplace=True)

### Content-Based Filtering

Load the precomputed content similarity matrix, `movie_similarity_matrix.pkl`.

In [6]:
#Use a context manager so the file handle is closed after loading
with open('../data/movie_similarity_matrix.pkl', 'rb') as f:
    content_similarity = pickle.load(f)

Create a dataframe from the similarity matrix.

In [8]:
df_content_sim = pd.DataFrame(content_similarity, index=df_content['title'].values, 
             columns=df_content['title'].values)
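This notebook does not show how the pickled similarity matrix was built. As a hedged illustration only, the sketch below builds a comparable title-by-title matrix from plot text. It assumes TF-IDF vectors of a `body`-style text column compared with cosine similarity, which may differ from how `movie_similarity_matrix.pkl` was actually produced. The titles and descriptions are hypothetical toy data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue: titles plus short, preprocessed plot descriptions
toy_content = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'body': ['space crew explores alien planet',
             'alien invasion threatens planet earth',
             'two friends open a bakery in paris'],
})

# Assumption: TF-IDF vectors of the plot text (the real pickle may use other features)
tfidf = TfidfVectorizer().fit_transform(toy_content['body'])
toy_sim = cosine_similarity(tfidf)

# Label rows and columns with titles, as done for df_content_sim above
toy_content_sim = pd.DataFrame(toy_sim, index=toy_content['title'].values,
                               columns=toy_content['title'].values)
print(toy_content_sim.round(2))
```

`Movie A` and `Movie B` share the words "alien" and "planet", so they get a positive score. `Movie C` shares no words with either, so its similarity to both is 0.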

Inspect the viewing history of user `569`, the target user for the examples below.

In [9]:
df_current_user = df_user[df_user['user_id'] == 569]
df_current_user

|   | user_id | movie_id | rating | title | genres | year |
|---|---|---|---|---|---|---|
| 760 | 569 | 50 | 3.0 | Usual Suspects, The (1995) | Crime\|Mystery\|Thriller | 1995 |
| 1480 | 569 | 231 | 3.0 | Dumb & Dumber (Dumb and Dumber) (1994) | Adventure\|Comedy | 1994 |
| 2102 | 569 | 296 | 5.0 | Pulp Fiction (1994) | Comedy\|Crime\|Drama\|Thriller | 1994 |
| 2250 | 569 | 316 | 4.0 | Stargate (1994) | Action\|Adventure\|Sci-Fi | 1994 |
| 2415 | 569 | 349 | 4.0 | Clear and Present Danger (1994) | Action\|Crime\|Drama\|Thriller | 1994 |
| 2727 | 569 | 356 | 3.0 | Forrest Gump (1994) | Comedy\|Drama\|Romance\|War | 1994 |
| 3405 | 569 | 480 | 4.0 | Jurassic Park (1993) | Action\|Adventure\|Sci-Fi\|Thriller | 1993 |
| 4105 | 569 | 590 | 4.0 | Dances with Wolves (1990) | Adventure\|Drama\|Western | 1990 |
| 4287 | 569 | 592 | 3.0 | Batman (1989) | Action\|Crime\|Thriller | 1989 |
| 20105 | 569 | 588 | 4.0 | Aladdin (1992) | Adventure\|Animation\|Children\|Comedy\|Musical | 1992 |


In [10]:
def get_content_similar_movies(user):
    
    #Current/target user
    df_current_user = df_user[df_user['user_id'] == user]
    
    #Movies watched by the current/target user
    user_watched_movies = df_current_user['title'].values
    
    #User's mean rating
    user_mean_rating = df_current_user['rating'].mean()
    
    #Keep only the movies the user liked, i.e. rated at or above their own mean rating
    user_movies = df_current_user.loc[df_current_user['rating'] >= user_mean_rating, 'title'].values
            
    #For each liked movie, take its similarity to every other movie and
    #remove the movies the user has already seen
    #(pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    similar_movies = pd.concat(
        [df_content_sim[movie].drop(user_watched_movies, errors='ignore') for movie in user_movies],
        axis=1)
    #Sum each candidate's similarity scores across the liked movies; higher totals rank first
    content_rec = (similar_movies.sum(axis=1)
                   .rename_axis('title')
                   .reset_index(name='content_similarity'))
    return pd.merge(df_content[['title', 'genres']], content_rec, how='inner').sort_values(by='content_similarity', ascending=False)
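To make the scoring concrete, here is the same logic on a hypothetical four-movie similarity matrix. The titles `A` to `D`, the ratings and the similarity values are made up for illustration. Only movies rated at or above the user's mean count as liked, so a disliked movie contributes nothing, even to candidates that are very similar to it.

```python
import pandas as pd

# Hypothetical 4-movie similarity matrix (symmetric, 1.0 on the diagonal)
titles = ['A', 'B', 'C', 'D']
sim = pd.DataFrame([[1.0, 0.2, 0.8, 0.1],
                    [0.2, 1.0, 0.3, 0.9],
                    [0.8, 0.3, 1.0, 0.4],
                    [0.1, 0.9, 0.4, 1.0]], index=titles, columns=titles)

# Hypothetical history: the user watched A and B
history = pd.DataFrame({'title': ['A', 'B'], 'rating': [5.0, 2.0]})

# Liked movies are rated at or above the user's own mean (3.5 here), so only A qualifies
liked = history.loc[history['rating'] >= history['rating'].mean(), 'title']

# Sum each unwatched movie's similarity to the liked movies
scores = pd.concat([sim[m].drop(history['title'].tolist()) for m in liked], axis=1).sum(axis=1)
print(scores.sort_values(ascending=False))
```

`B` is rated below the mean of 3.5, so `D` scores only 0.1 even though it is very similar to `B`. `C` scores 0.8 through its similarity to `A`.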

In [11]:
content_based_scores = get_content_similar_movies(569)
content_based_scores[0:10]

|   | title | genres | content_similarity |
|---|---|---|---|
| 8536 | Jurassic World (2015) | Action\|Adventure\|Drama\|Sci-Fi\|Thriller | 0.735135 |
| 2230 | Live and Let Die (1973) | Action\|Adventure\|Thriller | 0.735012 |
| 1532 | Return of Jafar, The (1994) | Adventure\|Animation\|Children\|Fantasy\|Musical\|R... | 0.719296 |
| 5879 | Batman Begins (2005) | Action\|Crime\| | 0.700795 |
| 7719 | Dark Knight Rises, The (2012) | Action\|Adventure\|Crime\| | 0.675353 |
| 8052 | Good Day to Die Hard, A (2013) | Action\|Crime\|Thriller\| | 0.645506 |
| 1876 | Rollercoaster (1977) | Drama\|Thriller | 0.640544 |
| 771 | Die Hard (1988) | Action\|Crime\|Thriller | 0.638042 |
| 6477 | Live Free or Die Hard (2007) | Action\|Adventure\|Crime\|Thriller | 0.637135 |
| 2419 | Patriot Games (1992) | Action\|Crime\|Drama\|Thriller | 0.632126 |


### Collaborative Filtering

In [12]:
#User-item interaction matrix
user_item = df_user.pivot_table(values = 'rating', index = 'user_id', columns= 'title') 

In [13]:
#Normalize user-item matrix
norm_user_item = user_item.subtract(user_item.mean(axis=1), axis = 'rows')

In [14]:
#User-User similarity matrix
user_similarity = cosine_similarity(sparse.csr_matrix(norm_user_item.fillna(0)))

In [15]:
#Convert similarity matrix into a dataframe
df_user_sim = pd.DataFrame(user_similarity, index=user_item.index, columns=user_item.index)

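Find the users most similar to the target user, along with their similarity scores, and keep only those whose similarity exceeds a threshold. The threshold matters because every similar user's rating feeds into the predicted rating for the target user. A movie's score is the similarity-weighted average of the similar users' mean-centered ratings for that movie.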
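Written out, the score for a target user $u$ and a movie $i$ that $u$ has not seen is a similarity-weighted average. In the notation introduced here for illustration, $N(u)$ is the set of other users whose cosine similarity to $u$ exceeds the threshold. $\tilde r_{v,i} = r_{v,i} - \bar r_v$ is user $v$'s rating of $i$ minus $v$'s mean rating. Both sums run only over users in $N(u)$ who rated $i$:

$$
\hat{s}_{u,i} = \frac{\sum_{v \in N(u),\ v \text{ rated } i} \operatorname{sim}(u,v)\,\tilde{r}_{v,i}}{\sum_{v \in N(u),\ v \text{ rated } i} \operatorname{sim}(u,v)}
$$

Because $\tilde r_{v,i}$ is a rating offset rather than a similarity, $\hat s_{u,i}$ is not confined to $[0, 1]$.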

In [16]:
def get_user_similar_movies(user, similarity_threshold):
    
    #Extract similar users and their similarity score with the target user,
    #excluding the target user itself (self-similarity is 1)
    similar_users = (df_user_sim.loc[df_user_sim[user] > similarity_threshold, user]
                     .drop(user, errors='ignore')
                     .sort_values(ascending=False))
    
    #Movies watched by the target user: the non-empty columns of the target user's row
    target_user_movies = norm_user_item.loc[[user]].dropna(axis=1, how='all')
    
    #Extract movies watched by similar users and their normalized ratings
    similar_user_movies = norm_user_item[norm_user_item.index.isin(similar_users.index)].dropna(axis=1, how='all')
    
    #Keep the movies watched by similar users but not by the target user
    similar_user_movies = similar_user_movies.drop(columns=target_user_movies.columns, errors='ignore')
            
    #Weighted average per movie: sum(similarity * rating) / sum(similarity),
    #taken over the similar users who rated that movie (NaN ratings are skipped)
    numerator = similar_user_movies.mul(similar_users, axis=0).sum()
    denominator = similar_user_movies.notna().mul(similar_users, axis=0).sum()
    #Note: despite its name, 'user_similarity' holds a predicted mean-centered rating
    movie_score = (numerator / denominator).rename_axis('title').reset_index(name='user_similarity')
    user_rec = pd.merge(df_content[['title','genres','year']], movie_score[['title', 'user_similarity']], how='inner')
    return user_rec.sort_values(by=['user_similarity', 'year'], ascending=False)
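The weighted average can be checked on toy numbers. The sketch below uses two hypothetical similar users (IDs 10 and 20) and two unseen movies (`X`, `Y`), all made up for illustration. It applies the same vectorized computation used in the function and shows why values above 1, such as the 1.746544 in the output below, are possible: a movie rated by only one similar user simply gets that user's mean-centered rating.

```python
import pandas as pd

# Hypothetical mean-centered ratings: rows are similar users, columns are unseen movies
ratings = pd.DataFrame({'X': [1.5, -0.5], 'Y': [2.0, None]}, index=[10, 20])
# Hypothetical similarities of users 10 and 20 to the target user
sims = pd.Series([0.6, 0.2], index=[10, 20])

# Similarity-weighted average, ignoring users who did not rate a movie
numerator = ratings.mul(sims, axis=0).sum()
denominator = ratings.notna().mul(sims, axis=0).sum()
predicted = numerator / denominator
print(predicted)
```

Movie `X` gets (0.6 × 1.5 + 0.2 × -0.5) / 0.8 = 1.0. Movie `Y` is rated only by user 10, so its score is that user's 2.0.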

In [17]:
user_based_scores = get_user_similar_movies(569, .1)
user_based_scores[0:10]

|   | title | genres | year | user_similarity |
|---|---|---|---|---|
| 446 | Wild Tales (2014) | Comedy\|Drama\|Thriller | 2014 | 1.746544 |
| 431 | Prisoners (2013) | Drama\|Mystery\|Thriller | 2013 | 1.746544 |
| 415 | Horrible Bosses (2011) | Comedy\|Crime | 2011 | 1.746544 |
| 379 | No Country for Old Men (2007) | Crime\|Drama | 2007 | 1.746544 |
| 329 | Along Came Polly (2004) | Comedy\|Romance | 2004 | 1.746544 |
| 330 | 50 First Dates (2004) | Comedy\|Romance | 2004 | 1.746544 |
| 333 | Kill Bill: Vol. 2 (2004) | Action\|Drama\|Thriller | 2004 | 1.746544 |
| 317 | Anger Management (2003) | Comedy | 2003 | 1.746544 |
| 323 | Duplex (2003) | Comedy\|Crime | 2003 | 1.746544 |
| 325 | Kill Bill: Vol. 1 (2003) | Action\|Crime\|Thriller | 2003 | 1.746544 |


### Hybrid Recommender

In [18]:
def hybrid_recommender(user):
    content_user_scores = pd.merge(get_content_similar_movies(user), get_user_similar_movies(user, 0.1))
    content_user_scores['similarity_score'] = (content_user_scores['content_similarity'] + content_user_scores['user_similarity']) / 2
    top_scores = content_user_scores.sort_values(by=['similarity_score', 'year'], ascending=False)[:10]
    recommendations = pd.merge(df_content[['title','genres','imdb_rating', 'tmdb_rating']], top_scores[['title','similarity_score']], on='title')
    recommendations.rename(columns={'title':'Movie Title', 'imdb_rating': 'IMDb Rating', 'tmdb_rating':'TMDB rating', 'similarity_score':'Similarity Score'}, inplace=True)
    return recommendations.sort_values(by='Similarity Score', ascending=False)

In [19]:
#Movies watched by target user
df_current_user

|   | user_id | movie_id | rating | title | genres | year |
|---|---|---|---|---|---|---|
| 760 | 569 | 50 | 3.0 | Usual Suspects, The (1995) | Crime\|Mystery\|Thriller | 1995 |
| 1480 | 569 | 231 | 3.0 | Dumb & Dumber (Dumb and Dumber) (1994) | Adventure\|Comedy | 1994 |
| 2102 | 569 | 296 | 5.0 | Pulp Fiction (1994) | Comedy\|Crime\|Drama\|Thriller | 1994 |
| 2250 | 569 | 316 | 4.0 | Stargate (1994) | Action\|Adventure\|Sci-Fi | 1994 |
| 2415 | 569 | 349 | 4.0 | Clear and Present Danger (1994) | Action\|Crime\|Drama\|Thriller | 1994 |
| 2727 | 569 | 356 | 3.0 | Forrest Gump (1994) | Comedy\|Drama\|Romance\|War | 1994 |
| 3405 | 569 | 480 | 4.0 | Jurassic Park (1993) | Action\|Adventure\|Sci-Fi\|Thriller | 1993 |
| 4105 | 569 | 590 | 4.0 | Dances with Wolves (1990) | Adventure\|Drama\|Western | 1990 |
| 4287 | 569 | 592 | 3.0 | Batman (1989) | Action\|Crime\|Thriller | 1989 |
| 20105 | 569 | 588 | 4.0 | Aladdin (1992) | Adventure\|Animation\|Children\|Comedy\|Musical | 1992 |


In [20]:
#Movies recommended by hybrid recommender
hybrid_recommender(569)

|   | Movie Title | genres | IMDb Rating | TMDB rating | Similarity Score |
|---|---|---|---|---|---|
| 5 | Kill Bill: Vol. 1 (2003) | Action\|Crime\|Thriller | 8.2 | 7.7 | 1.046144 |
| 7 | No Country for Old Men (2007) | Crime\|Drama | 8.2 | 7.7 | 1.022244 |
| 6 | Kill Bill: Vol. 2 (2004) | Action\|Drama\|Thriller | 8.0 | 7.7 | 1.016691 |
| 4 | Donnie Darko (2001) | Drama\|Mystery\|Sci-Fi\|Thriller | 8.0 | 7.7 | 1.016484 |
| 8 | Horrible Bosses (2011) | Comedy\|Crime | 6.9 | 6.4 | 1.01491 |
| 1 | Clockwork Orange, A (1971) | Crime\|Drama\|Sci-Fi\|Thriller | 8.3 | 8.0 | 1.013266 |
| 3 | Mulholland Drive (2001) | Crime\|Drama\|Film-Noir\|Mystery\|Thriller | 7.9 | 7.7 | 0.997443 |
| 0 | Red Rock West (1992) | Thriller | 7.0 | 6.4 | 0.967125 |
| 9 | Wild Tales (2014) | Comedy\|Drama\|Thriller | 8.1 | 7.7 | 0.966625 |
| 2 | Cast Away (2000) | Drama | 7.8 | 7.5 | 0.963425 |


Judged by inspection, the hybrid list is more balanced than either individual list: it mixes Action, Crime, Thriller, Drama and Comedy titles. The content-based score ties the collaborative predictions back to the user's own viewing history. The collaborative score adds diversity and popularity signals to the content-based list.
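The two scores being averaged are on different scales. For user `569`, the top collaborative scores are about 1.75 (a predicted mean-centered rating), while the top content similarities are at most about 0.74. A plain average therefore leans toward the collaborative side. One possible variation, not used in this notebook, is to min-max scale each score to [0, 1] before taking a weighted average. `min_max`, `blend_scores` and the weight `alpha` are hypothetical names, and the toy scores are made up.

```python
import pandas as pd

def min_max(s):
    """Rescale a score column to [0, 1]; a constant column maps to 0."""
    spread = s.max() - s.min()
    return (s - s.min()) / spread if spread > 0 else s * 0.0

def blend_scores(scores, alpha=0.5):
    """Hypothetical helper: weighted average of rescaled content and user scores.

    `scores` needs 'content_similarity' and 'user_similarity' columns, as produced
    by merging the outputs of the two filtering functions.
    """
    out = scores.copy()
    out['similarity_score'] = (alpha * min_max(out['content_similarity'])
                               + (1 - alpha) * min_max(out['user_similarity']))
    return out.sort_values('similarity_score', ascending=False)

# Toy scores on two different scales
toy = pd.DataFrame({'title': ['P', 'Q', 'R'],
                    'content_similarity': [0.70, 0.40, 0.10],
                    'user_similarity': [0.5, 1.7, 1.1]})
blended = blend_scores(toy)
print(blended[['title', 'similarity_score']])
```

After rescaling, each signal spans the same [0, 1] range, so `alpha` directly controls how much each one contributes.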

### Recommendations for New User

The recommendation engine built above works well for existing users in the database, who have at least 20 user-item interactions. The cells below make recommendations for a new user.

Let's create a new user, `611`, with the following ratings:

In [20]:
new_user_data = [('Copycat (1995)', 3), ('First Knight (1995)', 3.5), ("Muriel's Wedding (1994)", 0.5),
                ('So I Married an Axe Murderer (1993)', 0.5), ('Bridge on the River Kwai, The (1957)', 3.5), 
                ('Grease (1978)', 3.5), ('Last of the Mohicans, The (1992)', 4.5), ('Sneakers (1992)', 3.5), ('Chasing Amy (1997)', 0.5), 
                ('Untouchables, The (1987)', 3.5),('Romancing the Stone (1984)', 3.5),('South Park: Bigger, Longer and Uncut (1999)', 4.5),
                ('Talented Mr. Ripley, The (1999)', 1.5), ('Shrek (2001)', 5),('Lord of the Rings: The Fellowship of the Ring, The (2001)', 5)]

In [21]:
new_userId = df_user['user_id'].sort_values().values[-1] + 1
new_user = []

for movie,rating in new_user_data:
    new_ratings = {}
    new_ratings['user_id'] = new_userId
    new_ratings['rating'] = rating
    new_ratings['movie_id'] = df_content.loc[df_content['title'] == movie, 'movie_id'].values[0]
    new_ratings['title'] = movie
    new_ratings['genres'] = df_content.loc[df_content['title'] == movie, 'genres'].values[0]
    new_ratings['year'] = df_content[df_content['title'] == movie]['year'].values[0]
    new_user.append(new_ratings)
df_new_user = pd.DataFrame(new_user).drop_duplicates()
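The loop above can be wrapped in a reusable function. `make_new_user` is a hypothetical helper, and the toy catalogue and ratings are made up. It computes the new ID from the ratings table it receives each time it is called, so a later call, after the new rows have been concatenated into the ratings table, gets a fresh ID instead of reusing an existing one.

```python
import pandas as pd

def make_new_user(ratings, catalogue, existing_ratings):
    """Hypothetical helper: turn (title, rating) pairs into rows for a brand-new user.

    The ID is one more than the current maximum in `existing_ratings`.
    """
    new_id = existing_ratings['user_id'].max() + 1
    rows = []
    for title, rating in ratings:
        movie = catalogue.loc[catalogue['title'] == title].iloc[0]  # first match for the title
        rows.append({'user_id': new_id, 'rating': rating, 'movie_id': movie['movie_id'],
                     'title': title, 'genres': movie['genres'], 'year': movie['year']})
    return pd.DataFrame(rows).drop_duplicates()

# Toy catalogue and ratings table
catalogue = pd.DataFrame({'movie_id': [1, 2], 'title': ['Movie A', 'Movie B'],
                          'genres': ['Comedy', 'Drama|War'], 'year': [1995, 1957]})
existing = pd.DataFrame({'user_id': [1, 5, 7], 'movie_id': [1, 1, 2], 'rating': [4.0, 4.0, 4.5]})
toy_new_user = make_new_user([('Movie B', 3.5)], catalogue, existing)
print(toy_new_user)
```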

In [22]:
df_new_user[['user_id', 'movie_id', 'rating', 'title', 'genres', 'year']]

|   | user_id | movie_id | rating | title | genres | year |
|---|---|---|---|---|---|---|
| 0 | 611 | 22 | 3.0 | Copycat (1995) | Crime\|Drama\|Horror\|Mystery\|Thriller | 1995 |
| 1 | 611 | 168 | 3.5 | First Knight (1995) | Action\|Drama\|Romance | 1995 |
| 2 | 611 | 342 | 0.5 | Muriel's Wedding (1994) | Comedy | 1994 |
| 3 | 611 | 543 | 0.5 | So I Married an Axe Murderer (1993) | Comedy\|Romance\|Thriller | 1993 |
| 4 | 611 | 1250 | 3.5 | Bridge on the River Kwai, The (1957) | Adventure\|Drama\|War | 1957 |
| 5 | 611 | 1380 | 3.5 | Grease (1978) | Comedy\|Musical\|Romance | 1978 |
| 6 | 611 | 1408 | 4.5 | Last of the Mohicans, The (1992) | Action\|Romance\|War\|Western | 1992 |
| 7 | 611 | 1396 | 3.5 | Sneakers (1992) | Action\|Comedy\|Crime\|Drama\|Sci-Fi | 1992 |
| 8 | 611 | 1639 | 0.5 | Chasing Amy (1997) | Comedy\|Drama\|Romance | 1997 |
| 9 | 611 | 2194 | 3.5 | Untouchables, The (1987) | Action\|Crime\|Drama | 1987 |


In [23]:
df_user = pd.concat([df_user, df_new_user])

In [24]:
#User-Item Matrix
user_item = df_user.pivot_table(values = 'rating', index = 'user_id', columns= 'title')

In [25]:
#Normalize User-Item Matrix
norm_user_item = user_item.subtract(user_item.mean(axis=1), axis = 'rows')

In [26]:
#User-User similarity matrix
user_similarity = cosine_similarity(sparse.csr_matrix(norm_user_item.fillna(0)))

In [27]:
df_user_sim = pd.DataFrame(user_similarity, index=user_item.index, columns=user_item.index)
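The pivot, normalization and similarity steps above repeat In [12] to In [15], and they have to run again, in order, every time a user is added. One way to keep the three matrices in sync is to rebuild them in a single function. `build_user_similarity` is a hypothetical name, and the toy ratings are made up.

```python
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

def build_user_similarity(ratings):
    """Rebuild the user-item matrix, its mean-centered version and the user-user
    cosine similarity matrix from a table with user_id, title and rating columns."""
    user_item = ratings.pivot_table(values='rating', index='user_id', columns='title')
    norm_user_item = user_item.subtract(user_item.mean(axis=1), axis='rows')
    sim = cosine_similarity(sparse.csr_matrix(norm_user_item.fillna(0)))
    df_user_sim = pd.DataFrame(sim, index=user_item.index, columns=user_item.index)
    return user_item, norm_user_item, df_user_sim

# Toy ratings: users 1 and 2 share the same taste, user 3 has the opposite taste
toy_ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3],
    'title':   ['A', 'B', 'A', 'B', 'A', 'B'],
    'rating':  [5.0, 1.0, 4.0, 2.0, 1.0, 5.0],
})
_, _, toy_user_sim = build_user_similarity(toy_ratings)
print(toy_user_sim.round(2))
```

After mean-centering, users 1 and 2 both prefer `A` over `B`, so their cosine similarity is 1. User 3 prefers the opposite, so their similarity to user 1 is -1.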

In [37]:
def hybrid_recommender(user):
    content_user_scores = pd.merge(get_content_similar_movies(user), get_user_similar_movies(user, 0.1))
    #Average the content-based and collaborative scores
    content_user_scores['similarity_score'] = (content_user_scores['content_similarity'] + content_user_scores['user_similarity']) / 2
    top_scores = content_user_scores.sort_values(by='similarity_score', ascending=False)[:10]
    #Keep both component scores in the output so their contributions can be compared
    recommendations = pd.merge(df_content[['title','genres','imdb_rating', 'tmdb_rating']],
                               top_scores[['title','content_similarity', 'user_similarity','similarity_score']], on='title')
    return recommendations.sort_values(by='similarity_score', ascending=False)

In [38]:
get_content_similar_movies(611)[:10]

|   | title | genres | content_similarity |
|---|---|---|---|
| 43 | Pocahontas (1995) | Animation\|Children\|Drama\|Musical\|Romance | 0.556938 |
| 8008 | Hobbit: An Unexpected Journey, The (2012) | Adventure\|Fantasy\| | 0.536777 |
| 856 | Monty Python and the Holy Grail (1975) | Adventure\|Comedy\|Fantasy | 0.516482 |
| 5135 | Shrek 2 (2004) | Adventure\|Animation\|Children\|Comedy\|Musical\|Ro... | 0.498551 |
| 1792 | Jewel of the Nile, The (1985) | Action\|Adventure\|Comedy\|Romance | 0.496085 |
| 5584 | Merlin (1998) | Action\|Adventure\|Drama\|Fantasy\|Romance | 0.476997 |
| 8250 | Hobbit: The Desolation of Smaug, The (2013) | Adventure\|Fantasy\| | 0.470085 |
| 8568 | The Hobbit: The Battle of the Five Armies (2014) | Adventure\|Fantasy | 0.465219 |
| 7320 | Shrek Forever After (a.k.a. Shrek: The Final C... | Adventure\|Animation\|Children\|Comedy\|Fantasy\| | 0.45585 |
| 4113 | Lord of the Rings: The Two Towers, The (2002) | Adventure\|Fantasy | 0.445944 |


In [39]:
get_user_similar_movies(611, 0.1)[:10][['title', 'genres','user_similarity']]

|   | title | genres | user_similarity |
|---|---|---|---|
| 158 | Jungle Book, The (1967) | Animation\|Children\|Comedy\|Musical | 1.768116 |
| 19 | Star Wars: Episode IV - A New Hope (1977) | Action\|Adventure\|Sci-Fi | 1.574452 |
| 204 | Close Encounters of the Third Kind (1977) | Adventure\|Drama\|Sci-Fi | 1.307692 |
| 146 | Rocky (1976) | Drama | 1.307692 |
| 244 | Moby Dick (1956) | Drama | 1.307692 |
| 307 | V for Vendetta (2006) | Action\|Sci-Fi\|Thriller\| | 1.268116 |
| 40 | Terminator 2: Judgment Day (1991) | Action\|Sci-Fi | 1.268116 |
| 122 | Indiana Jones and the Last Crusade (1989) | Action\|Adventure | 1.268116 |
| 151 | Driving Miss Daisy (1989) | Drama | 1.268116 |
| 85 | Star Wars: Episode V - The Empire Strikes Back... | Action\|Adventure\|Sci-Fi | 1.268116 |


In [40]:
#Movies watched by user 611 
df_new_user

|   | user_id | rating | movie_id | title | genres | year |
|---|---|---|---|---|---|---|
| 0 | 611 | 3.0 | 22 | Copycat (1995) | Crime\|Drama\|Horror\|Mystery\|Thriller | 1995 |
| 1 | 611 | 3.5 | 168 | First Knight (1995) | Action\|Drama\|Romance | 1995 |
| 2 | 611 | 0.5 | 342 | Muriel's Wedding (1994) | Comedy | 1994 |
| 3 | 611 | 0.5 | 543 | So I Married an Axe Murderer (1993) | Comedy\|Romance\|Thriller | 1993 |
| 4 | 611 | 3.5 | 1250 | Bridge on the River Kwai, The (1957) | Adventure\|Drama\|War | 1957 |
| 5 | 611 | 3.5 | 1380 | Grease (1978) | Comedy\|Musical\|Romance | 1978 |
| 6 | 611 | 4.5 | 1408 | Last of the Mohicans, The (1992) | Action\|Romance\|War\|Western | 1992 |
| 7 | 611 | 3.5 | 1396 | Sneakers (1992) | Action\|Comedy\|Crime\|Drama\|Sci-Fi | 1992 |
| 8 | 611 | 0.5 | 1639 | Chasing Amy (1997) | Comedy\|Drama\|Romance | 1997 |
| 9 | 611 | 3.5 | 2194 | Untouchables, The (1987) | Action\|Crime\|Drama | 1987 |


In [42]:
#Movies recommended by the hybrid recommender to user 611
hybrid_recommender(611)[['title','genres','content_similarity', 'user_similarity', 'similarity_score']]

|   | title | genres | content_similarity | user_similarity | similarity_score |
|---|---|---|---|---|---|
| 7 | Jungle Book, The (1967) | Animation\|Children\|Comedy\|Musical | 0.228229 | 1.768116 | 0.998172 |
| 0 | Star Wars: Episode IV - A New Hope (1977) | Action\|Adventure\|Sci-Fi | 0.182753 | 1.574452 | 0.878602 |
| 3 | Monty Python and the Holy Grail (1975) | Adventure\|Comedy\|Fantasy | 0.516482 | 1.117117 | 0.8168 |
| 4 | Indiana Jones and the Last Crusade (1989) | Action\|Adventure | 0.234489 | 1.268116 | 0.751302 |
| 9 | Shrek 2 (2004) | Adventure\|Animation\|Children\|Comedy\|Musical\|Ro... | 0.498551 | 0.984649 | 0.7416 |
| 5 | Rocky (1976) | Drama | 0.142002 | 1.307692 | 0.724847 |
| 2 | Rebecca (1940) | Drama\|Mystery\|Romance\|Thriller | 0.170906 | 1.268116 | 0.719511 |
| 8 | Lord of the Rings: The Two Towers, The (2002) | Adventure\|Fantasy | 0.445944 | 0.984649 | 0.715297 |
| 6 | Driving Miss Daisy (1989) | Drama | 0.147052 | 1.268116 | 0.707584 |
| 1 | Schindler's List (1993) | Drama\|War | 0.269673 | 1.117117 | 0.693395 |




Based on the viewing history and ratings, user `611` rates action, drama, adventure and fantasy movies highly, gives comedy mixed ratings, and rates thriller and mystery movies low. Given these ratings and the movies the user rated highest, the hybrid recommender returns a list that is diverse yet relevant.

Genre tally by rating for user `611`:

- 0.5: Comedy × 3, Romance × 2, Thriller, Drama
- 1.5: Drama, Mystery, Thriller
- 3: Crime, Drama, Horror, Mystery, Thriller
- 3.5: Action × 4, Drama × 4, Romance × 3, Comedy × 3, Adventure × 2, Crime × 2, War, Musical, Sci-Fi
- 4.5: Action, Romance, War, Western (one movie); Animation, Comedy, Musical (another)
- 5: Adventure × 2, Fantasy × 2, Animation, Children, Comedy, Romance
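A tally like this can be computed rather than counted by hand. The sketch below assumes a ratings table with `rating` and pipe-separated `genres` columns, like `df_new_user`. It uses the genres of five of user `611`'s movies as self-contained toy input. Empty strings from trailing pipes (as in `Action|Crime|`) are dropped.

```python
import pandas as pd

# Ratings and genres of five movies from user 611's list, used as toy input
ratings = pd.DataFrame({
    'rating': [0.5, 0.5, 3.5, 3.5, 5.0],
    'genres': ['Comedy', 'Comedy|Romance|Thriller', 'Action|Drama|Romance',
               'Action|Crime|Drama', 'Adventure|Fantasy'],
})

# One row per (rating, genre), then count how often each genre appears at each rating
tally = (ratings.assign(genre=ratings['genres'].str.split('|'))
                .explode('genre')
                .loc[lambda d: d['genre'] != '']
                .groupby(['rating', 'genre'])
                .size())
print(tally)
```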

Next, consider a user who does not have much user-item interaction data.

In [43]:
new_user_data = [("2001: A Space Odyssey (1968)",5),("Star Wars: Episode III - Revenge of the Sith (2005)",0.5),
                 ("Duellists, The (1977)", 5), ("Philadelphia Story, The (1940)", 5), ("Batman Returns (1992)", 3)]

In [44]:
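#Note: there is no `+ 1` here, so this reuses the current largest ID (611, the user created above)
#and the ratings below are appended to user 611's existing ratings; add `+ 1` to create a separate user
new_userId = df_user['user_id'].sort_values().values[-1]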
new_user = []

for movie,rating in new_user_data:
    new_ratings = {}
    new_ratings['user_id'] = new_userId
    new_ratings['rating'] = rating
    new_ratings['movie_id'] = df_content.loc[df_content['title'] == movie, 'movie_id'].values[0]
    new_ratings['title'] = movie
    new_ratings['genres'] = df_content.loc[df_content['title'] == movie, 'genres'].values[0]
    new_ratings['year'] = df_content[df_content['title'] == movie]['year'].values[0]
    new_user.append(new_ratings)
df_new_user = pd.DataFrame(new_user).drop_duplicates()

In [45]:
df_new_user

|   | user_id | rating | movie_id | title | genres | year |
|---|---|---|---|---|---|---|
| 0 | 611 | 5.0 | 924 | 2001: A Space Odyssey (1968) | Adventure\|Drama\|Sci-Fi | 1968 |
| 1 | 611 | 0.5 | 33493 | Star Wars: Episode III - Revenge of the Sith (... | Action\|Adventure\|Sci-Fi | 2005 |
| 2 | 611 | 5.0 | 5965 | Duellists, The (1977) | Action\|War | 1977 |
| 3 | 611 | 5.0 | 898 | Philadelphia Story, The (1940) | Comedy\|Drama\|Romance | 1940 |
| 4 | 611 | 3.0 | 1377 | Batman Returns (1992) | Action\|Crime | 1992 |


In [46]:
df_user = pd.concat([df_user, df_new_user])

In [47]:
#User-Item Matrix
user_item = df_user.pivot_table(values = 'rating', index = 'user_id', columns= 'title')

In [50]:
#Normalize User-Item Matrix
norm_user_item = user_item.subtract(user_item.mean(axis=1), axis = 'rows')

In [49]:
#User-User similarity matrix
user_similarity = cosine_similarity(sparse.csr_matrix(norm_user_item.fillna(0)))

In [51]:
df_user_sim = pd.DataFrame(user_similarity, index=user_item.index, columns=user_item.index)

In [77]:
df_new_user

|   | user_id | rating | movie_id | title | genres | year |
|---|---|---|---|---|---|---|
| 0 | 611 | 5.0 | 924 | 2001: A Space Odyssey (1968) | Adventure\|Drama\|Sci-Fi | 1968 |
| 1 | 611 | 0.5 | 33493 | Star Wars: Episode III - Revenge of the Sith (... | Action\|Adventure\|Sci-Fi | 2005 |
| 2 | 611 | 5.0 | 5965 | Duellists, The (1977) | Action\|War | 1977 |
| 3 | 611 | 5.0 | 898 | Philadelphia Story, The (1940) | Comedy\|Drama\|Romance | 1940 |
| 4 | 611 | 3.0 | 1377 | Batman Returns (1992) | Action\|Crime | 1992 |


In [52]:
#Recommendations 
hybrid_recommender(611)

|   | title | genres | imdb_rating | tmdb_rating | content_similarity | user_similarity | similarity_score |
|---|---|---|---|---|---|---|---|
| 4 | Time Bandits (1981) | Adventure\|Comedy\|Fantasy\|Sci-Fi | 6.9 | 6.6 | 0.382492 | 2.008519 | 1.195505 |
| 6 | Love and Death (1975) | Comedy | 7.7 | 7.5 | 0.355772 | 2.008519 | 1.182145 |
| 7 | Adventures of Baron Munchausen, The (1988) | Adventure\|Comedy\|Fantasy | 7.1 | 6.9 | 0.256945 | 2.008519 | 1.132732 |
| 2 | Heathers (1989) | Comedy | 7.2 | 7.3 | 0.245929 | 2.008519 | 1.127224 |
| 5 | Fisher King, The (1991) | Comedy\|Drama\|Fantasy\|Romance | 7.5 | 7.2 | 0.243999 | 2.008519 | 1.126259 |
| 8 | Hedwig and the Angry Inch (2000) | Comedy\|Drama\|Musical | 7.7 | 7.4 | 0.242507 | 2.008519 | 1.125513 |
| 9 | Cat Returns, The (Neko no ongaeshi) (2002) | Adventure\|Animation\|Children\|Fantasy | 7.2 | 7.2 | 0.223704 | 2.008519 | 1.116111 |
| 3 | Labyrinth (1986) | Adventure\|Fantasy\|Musical | 7.3 | 7.1 | 0.223064 | 2.008519 | 1.115792 |
| 1 | Annie Hall (1977) | Comedy\|Romance | 8.0 | 7.8 | 0.215709 | 2.008519 | 1.112114 |
| 0 | Willy Wonka & the Chocolate Factory (1971) | Children\|Comedy\|Fantasy\|Musical | 7.8 | 7.4 | 0.210932 | 2.008519 | 1.109725 |


Only five ratings were entered for user `611` in this step. Even so, the recommender does a decent job of recommending movies relevant to what the user watched: it picks up most of the user's genres and suggests related movies. Comedy is the most frequently recommended genre, likely because *The Philadelphia Story* (Comedy, Drama, Romance) is one of the three movies rated 5. Action does not appear in the recommendations, possibly because one of the three action movies (*Star Wars: Episode III*) received the user's lowest rating. The list is not perfect, since there are few data points for this user. However, all ten recommendations share the same collaborative score (2.008519), so their order is set entirely by the content-based score, and that is what keeps the recommendations relevant.

### Inference

Overall, the hybrid recommender does a good job of recommending movies to both existing and new users. It makes recommendations for new users with few ratings, and it gives diverse recommendations to existing users with relatively large amounts of data.

However, the recommender does not completely overcome the limitations of the individual filtering algorithms. The collaborative filtering used in the hybrid recommender is a memory-based technique. Its big advantage is that the results are easy to interpret and analyze. Its major drawback is that it is memory intensive. Every time a new user is added, the user-item matrix must be rebuilt and the user-user similarity matrix recalculated. The similarity matrix alone holds one entry per pair of users, so it grows quadratically with the number of users. This makes the recommender slow and expensive on a large dataset.
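Part of that cost can be avoided. Each row of the normalized matrix depends only on that user's own ratings, and a new user's unrated titles are filled with 0. Adding a user therefore leaves every existing pairwise similarity unchanged, and only the new user's row has to be computed. The sketch below shows that idea. `new_user_similarities` is a hypothetical helper, it assumes the new user's ratings are already mean-centered, and titles the existing matrix does not contain are ignored.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def new_user_similarities(norm_user_item, new_row):
    """Hypothetical helper: cosine similarity of one new user to every existing user.

    `new_row` is a Series of the new user's mean-centered ratings indexed by title.
    Only one row of similarities is computed, instead of the full user-user matrix.
    """
    # Align to the existing columns; unrated titles become 0, unknown titles are dropped
    aligned = new_row.reindex(norm_user_item.columns).fillna(0).to_numpy().reshape(1, -1)
    existing = norm_user_item.fillna(0).to_numpy()
    sims = cosine_similarity(aligned, existing)[0]
    return pd.Series(sims, index=norm_user_item.index)

# Toy normalized matrix: user 1 prefers A, user 3 prefers B
toy_norm = pd.DataFrame({'A': [2.0, -2.0], 'B': [-2.0, 2.0]}, index=[1, 3])
new_sims = new_user_similarities(toy_norm, pd.Series({'A': 1.0, 'B': -1.0}))
print(new_sims)
```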

Model-based filtering techniques, built with machine learning algorithms, were developed to address the speed and scalability issues of memory-based filtering. Instead of scanning the full user-item matrix for every prediction, a model-based method learns a compact model from the data ahead of time (for example, latent factors from matrix factorization) and predicts from that model. Model-based algorithms can also be ensembled with memory-based ones. The user-item or user-user similarity matrix can be computed once and stored, and a recommendation model can then use these stored similarities to predict user ratings and make recommendations. This trims the data by limiting the number of relevant users or items considered for each prediction.
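One simple form of that trimming is to keep only each user's k most similar neighbours from a stored similarity matrix. This is still a memory-based neighbourhood step, not a trained model, but it shows how the stored data shrinks from n × n to n × k. `top_k_neighbours` is a hypothetical helper, and the similarity values are made up.

```python
import pandas as pd

def top_k_neighbours(df_user_sim, k):
    """Hypothetical helper: keep only each user's k most similar other users.

    Returns a dict mapping user_id -> Series of that user's top-k similarities.
    """
    neighbours = {}
    for user in df_user_sim.index:
        # Drop the self-similarity, then keep the k largest scores
        neighbours[user] = df_user_sim.loc[user].drop(user).nlargest(k)
    return neighbours

# Toy symmetric user-user similarity matrix
sim = pd.DataFrame([[1.0, 0.9, 0.1, 0.5],
                    [0.9, 1.0, 0.2, 0.4],
                    [0.1, 0.2, 1.0, 0.3],
                    [0.5, 0.4, 0.3, 1.0]], index=[1, 2, 3, 4], columns=[1, 2, 3, 4])
top2 = top_k_neighbours(sim, 2)
print(top2[1])
```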

As mentioned earlier, model-based algorithms address speed and scalability by predicting from a compact model rather than from the entire dataset. Because the model summarizes the data instead of using all of it directly, its predictions can be less accurate. And while this approach scales to large datasets, adding new users or items usually means retraining or updating the model before it can make predictions for them, which makes it less flexible. Additionally, model-based methods can struggle when the user-item interaction matrix is sparse.


Model-based collaborative filtering is not part of this project at this time. Eventually, I intend to incorporate the matrix factorization algorithms Singular Value Decomposition (SVD) and SVD++ to build a model-based hybrid recommendation engine.
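As a rough illustration of the latent-factor idea behind SVD-style recommenders, the sketch below computes a rank-k reconstruction of a mean-centered user-item matrix with NumPy. It makes a simplifying assumption: unrated entries are filled with 0 before a plain SVD. The SVD and SVD++ models commonly used for rating prediction instead learn factors from the observed ratings only, typically by stochastic gradient descent, and SVD++ also uses implicit feedback. `truncated_svd_predict` is a hypothetical helper, and the matrix values are made up.

```python
import numpy as np

def truncated_svd_predict(norm_matrix, k):
    """Rank-k reconstruction of a mean-centered user-item matrix (NaN = unrated).

    Sketch only: unrated entries are filled with 0 and a plain SVD is taken.
    """
    filled = np.nan_to_num(norm_matrix, nan=0.0)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    # Keep the k largest singular values and their singular vectors
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Toy mean-centered ratings: 3 users x 4 movies, NaN marks unrated movies
toy = np.array([[2.0, -1.0, np.nan, -1.0],
                [np.nan, 1.0, -1.0, 0.0],
                [1.5, np.nan, -0.5, -1.0]])
pred = truncated_svd_predict(toy, k=2)
print(pred.round(2))
```

The reconstructed matrix has a value for every user-movie pair, including the unrated ones. Keeping only k factors is what compresses the data into a smaller model.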