## Introduction
Recommendation systems are a type of information filtering system: they improve the quality of search results and surface items that are more relevant to the search query or related to the user's search history.

This project provides movie recommendations based on:

1. Similarity between movies according to certain metrics (such as genre, director, description, actors, etc.). The idea is that if a person liked a particular item, he or she will also like an item that is similar to it.

2. Persons with similar interests. The idea is that users similar to a particular user can be used to predict how much that user will like a particular product or service (for example, a movie that similar users have seen but the particular user has not).
  

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Importing Libraries

In [3]:
%%capture
!pip install surprise

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from ast import literal_eval
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from surprise import SVD, Reader, Dataset 
from surprise.model_selection import cross_validate

## Loading Datasets and preprocessing

In [4]:
cd /content/drive/MyDrive/kaggle

/content/drive/MyDrive/kaggle


In [5]:
df_credits=pd.read_csv('./input/tmdb-movie-metadata/tmdb_5000_credits.csv')
df_credits.head(2)

,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [6]:
df_movies=pd.read_csv('./input/tmdb-movie-metadata/tmdb_5000_movies.csv')
df_movies.head(2)

,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500


Now we join these two datasets on the 'id' column. As both dataframes contain a 'title' column, we remove one of the title columns.

In [7]:
df_credits.columns=['id', 'title', 'cast', 'crew']
df_movies.drop(['title'], axis=1, inplace=True)
df_movielens=pd.merge(df_credits,df_movies,on='id')

In [8]:
df_movielens.to_csv("df_movie.csv", index = False)
from google.colab import files
files.download("df_movie.csv")

In [None]:
df_movielens.shape

(4803, 22)

## Content-Based Recommender

To personalize the recommendations, the content-based recommender computes similarity between movies based on certain metrics and suggests movies that are most similar to a particular movie that a user liked.

### Movie Overview Based Recommender

First, we compute pairwise similarity scores for all movies based on the 'overview' column. Then we recommend movies based on those similarity scores.

In [None]:
df_movielens['overview'].head()

0    In the 22nd century, a paraplegic Marine is di...
1    Captain Barbossa, long believed to be dead, ha...
2    A cryptic message from Bond’s past sends him o...
3    Following the death of District Attorney Harve...
4    John Carter is a war-weary, former military ca...
Name: overview, dtype: object

In [None]:
# Count missing values in the 'overview' column
df_movielens['overview'].isnull().sum()
# Replace NaN in the 'overview' column with a single-space string
df_movielens['overview'] = df_movielens['overview'].fillna(' ')

#### Constructing TF-IDF Matrix 

Now we compute Term Frequency-Inverse Document Frequency (TF-IDF) vectors for each overview.

This gives us a matrix where each column represents a word in the overall overview vocabulary and each row represents a movie. TF-IDF reduces the importance of words that occur frequently across plot overviews and, therefore, their influence on the final similarity score.
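
To make the weighting concrete, here is a minimal sketch of TF-IDF on a toy corpus, using only the standard library. The corpus and the helper name `tfidf` are illustrative and not part of this project. The sketch uses the textbook form tf × ln(N / df); scikit-learn's TfidfVectorizer applies a smoothed IDF and L2-normalizes each row by default, so its exact numbers will differ.

```python
import math
from collections import Counter

# Toy "overviews" (hypothetical data, not taken from the TMDB dataset)
docs = [
    "hero saves the city",
    "hero fights the villain",
    "the villain robs the city bank",
]

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

weights = tfidf(docs)
# "the" occurs in every document, so its weight is zero;
# rare terms such as "bank" receive the largest weights.
print(weights[2])
```

Words that appear in every overview contribute nothing to the similarity score, while rare words dominate it.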

In [None]:
tfidfv=TfidfVectorizer(analyzer='word', stop_words='english')
tfidfv_matrix=tfidfv.fit_transform(df_movielens['overview'])
print(tfidfv_matrix.todense())
tfidfv_matrix.todense().shape

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


(4803, 20978)

Over 20,000 different words were used to describe the 4803 movies in our dataset.

In [None]:
# Computing Similarity Score based on movie overview
cosine_sim1 = linear_kernel(tfidfv_matrix, tfidfv_matrix)
cosine_sim1.shape 

(4803, 4803)

We now have a pairwise cosine similarity matrix for all the movies in our dataset. 
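
Note that linear_kernel computes plain dot products. It yields cosine similarities here because TfidfVectorizer L2-normalizes each row by default. The following minimal sketch, with made-up vectors and hypothetical helper names, illustrates that equivalence:

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def cosine(u, v):
    """Cosine similarity computed from the definition."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

u = [1.0, 2.0, 0.0]
v = [2.0, 1.0, 1.0]

# After L2 normalization, the plain dot product equals the cosine similarity
print(cosine(u, v), dot(l2_normalize(u), l2_normalize(v)))
```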

#### Defining Recommendation Function

Define a recommendation function that takes a movie title as input and outputs a list of the 10 most similar movies. To do this:

- Reverse map movie titles and dataframe indices. To achieve this, build a series to identify the index of a movie in our dataframe, given its title. The function gets the index of the movie given its title.

- Get the list of cosine similarity scores for that particular movie with all movies. Convert it into a list of tuples where the first element is its position and the second is the similarity score.

- Sort the aforementioned list of tuples based on the similarity scores; that is, the second element.

- Get the top 10 elements of this list. Ignore the first element as it refers to self (the movie most similar to a particular movie is the movie itself).

- Return the titles corresponding to the indices of the top elements.

In [None]:
indices=pd.Series(data=list(df_movielens.index), index= df_movielens['title'] )
indices.head()

title
Avatar                                      0
Pirates of the Caribbean: At World's End    1
Spectre                                     2
The Dark Knight Rises                       3
John Carter                                 4
dtype: int64

In [None]:
# Function that takes in movie title as input and outputs most similar movies
def content_recommendations(title, cosine_sim):
    
    # Get the index of the movie that matches the title
    idx = indices[title]
    
    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    
    # Sort the movies based on the similarity scores
    sim_scores.sort(key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar movies
    sim_scores=sim_scores[1:11]
    
    # Get the movie indices
    ind=[]
    for (x,y) in sim_scores:
        ind.append(x)
        
    # Return the top 10 most similar movies
    tit=[]
    for x in ind:
        tit.append(df_movielens.iloc[x]['title'])
    return pd.Series(data=tit, index=ind)

In [None]:
content_recommendations('The Dark Knight Rises',cosine_sim1)

65                              The Dark Knight
299                              Batman Forever
428                              Batman Returns
1359                                     Batman
3854    Batman: The Dark Knight Returns, Part 2
119                               Batman Begins
2507                                  Slow Burn
9            Batman v Superman: Dawn of Justice
1181                                        JFK
210                              Batman & Robin
dtype: object

In [None]:
content_recommendations('Avatar',cosine_sim1)

In [None]:
content_recommendations('The Avengers',cosine_sim1)

7               Avengers: Age of Ultron
3144                            Plastic
1715                            Timecop
4124                 This Thing of Ours
3311              Thank You for Smoking
3033                      The Corruptor
588     Wall Street: Money Never Sleeps
2136         Team America: World Police
1468                       The Fountain
1286                        Snowpiercer
dtype: object

### Movie Overview, Cast, Crew, Keywords, Genres Based Recommender

To improve the quality of the content-based recommender, add more relevant metadata.
- the director
- the 3 top actors
- the 3 top related genres 
- the 3 top movie plot keywords

From the crew, cast, genres and keywords features, extract the director, and the three most important actors, genres and keywords associated with that movie. 

#### Preprocessing the Contents

##### Applying literal_eval Function on Stringified Lists

Convert the 'crew', 'cast', 'genres' and 'keywords' columns from "stringified" lists into usable Python structures. literal_eval safely evaluates a string containing a Python literal (such as a list of dictionaries) and returns the corresponding object; unlike eval, it does not execute arbitrary code.
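
As a small standalone illustration, the sketch below parses a string shaped like one 'genres' cell and shows that a non-literal expression is rejected. The variable names are illustrative.

```python
from ast import literal_eval

# A stringified list of dicts, shaped like a 'genres' cell
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = literal_eval(raw)
print(type(parsed), [g["name"] for g in parsed])

# literal_eval only accepts literals; arbitrary expressions are rejected
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", type(err).__name__)
```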

In [None]:
type(df_movielens['cast'].iloc[0])

str

In [None]:
features = ['cast', 'crew', 'keywords', 'genres']
for feature in features:
    df_movielens[feature] = df_movielens[feature].apply(literal_eval)

In [None]:
type(df_movielens['cast'].iloc[0])

list

In [None]:
df_movielens.head(3)

,id,title,cast,crew,budget,genres,homepage,keywords,original_language,original_title,...,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,vote_average,vote_count
0,19995,Avatar,"[{'cast_id': 242, 'character': 'Jake Sully', '...","[{'credit_id': '52fe48009251416c750aca23', 'de...",237000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",http://www.avatarmovie.com/,"[{'id': 1463, 'name': 'culture clash'}, {'id':...",en,Avatar,...,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,7.2,11800
1,285,Pirates of the Caribbean: At World's End,"[{'cast_id': 4, 'character': 'Captain Jack Spa...","[{'credit_id': '52fe4232c3a36847f800b579', 'de...",300000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",http://disney.go.com/disneypictures/pirates/,"[{'id': 270, 'name': 'ocean'}, {'id': 726, 'na...",en,Pirates of the Caribbean: At World's End,...,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",6.9,4500
2,206647,Spectre,"[{'cast_id': 1, 'character': 'James Bond', 'cr...","[{'credit_id': '54805967c3a36829b5002c41', 'de...",245000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",http://www.sonypictures.com/movies/spectre/,"[{'id': 470, 'name': 'spy'}, {'id': 818, 'name...",en,Spectre,...,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,6.3,4466


##### Defining Functions to Grab the Contents

In [None]:
# Get the director's name from the crew feature. If no director is listed, return the string 'NaN'
def get_director(x):
    for a in x:
        if a['job']=='Director':
            return a['name'] 
    return 'NaN'

In [None]:
# Get the names of the first 3 elements, or of the entire list if it has fewer than 3 (used for the cast, genres and keywords columns).
def get_top3(x):
    # Return an empty list in case of missing/malformed data
    if not isinstance(x, list):
        return []
    new=[]
    for a in x[:3]:
        new.append(a['name'])
    return new

Now we define new director, actor, genres and keywords columns.

In [None]:
df_movielens['director']=df_movielens['crew'].apply(lambda x: get_director(x))
df_movielens['actor']=df_movielens['cast'].apply(lambda x:get_top3(x))

In [None]:
df_movielens['genres'][0]

[{'id': 28, 'name': 'Action'},
 {'id': 12, 'name': 'Adventure'},
 {'id': 14, 'name': 'Fantasy'},
 {'id': 878, 'name': 'Science Fiction'}]

In [None]:
df_movielens['genres']=df_movielens['genres'].apply(lambda x:get_top3(x))
df_movielens['genres'][0]

['Action', 'Adventure', 'Fantasy']

In [None]:
df_movielens['keywords']=df_movielens['keywords'].apply(lambda x:get_top3(x))
df_movielens[['title', 'actor', 'director', 'keywords', 'genres']].head(3)

,title,actor,director,keywords,genres
0,Avatar,"[Sam Worthington, Zoe Saldana, Sigourney Weaver]",James Cameron,"[culture clash, future, space war]","[Action, Adventure, Fantasy]"
1,Pirates of the Caribbean: At World's End,"[Johnny Depp, Orlando Bloom, Keira Knightley]",Gore Verbinski,"[ocean, drug abuse, exotic island]","[Adventure, Fantasy, Action]"
2,Spectre,"[Daniel Craig, Christoph Waltz, Léa Seydoux]",Sam Mendes,"[spy, based on novel, secret agent]","[Action, Adventure, Crime]"


In [None]:
def clean_director(x):
    return x.lower().replace(' ','')

In [None]:
def clean_top3(x):
    new=[]
    for a in x:
        new.append(a.lower().replace(' ',''))
    return new

Now we apply these clean functions on director, actor, genres and keywords columns.

In [None]:
df_movielens['director']=df_movielens['director'].apply(lambda x: clean_director(x))
df_movielens['actor']=df_movielens['actor'].apply(lambda x:clean_top3(x))
df_movielens['keywords']=df_movielens['keywords'].apply(lambda x:clean_top3(x))
df_movielens['genres']=df_movielens['genres'].apply(lambda x:clean_top3(x))
df_movielens[['title', 'actor', 'director', 'keywords', 'genres']].head(3)

,title,actor,director,keywords,genres
0,Avatar,"[samworthington, zoesaldana, sigourneyweaver]",jamescameron,"[cultureclash, future, spacewar]","[action, adventure, fantasy]"
1,Pirates of the Caribbean: At World's End,"[johnnydepp, orlandobloom, keiraknightley]",goreverbinski,"[ocean, drugabuse, exoticisland]","[adventure, fantasy, action]"
2,Spectre,"[danielcraig, christophwaltz, léaseydoux]",sammendes,"[spy, basedonnovel, secretagent]","[action, adventure, crime]"


Now we create the 'combo' column, which contains all the text that we want to feed to our vectorizer (namely the overview, keywords, actors, director and genres).

In [None]:
df_movielens['overview']

0       In the 22nd century, a paraplegic Marine is di...
1       Captain Barbossa, long believed to be dead, ha...
2       A cryptic message from Bond’s past sends him o...
3       Following the death of District Attorney Harve...
4       John Carter is a war-weary, former military ca...
                              ...                        
4798    El Mariachi just wants to play his guitar and ...
4799    A newlywed couple's honeymoon is upended by th...
4800    "Signed, Sealed, Delivered" introduces a dedic...
4801    When ambitious New York attorney Sam is sent t...
4802    Ever since the second grade when he first saw ...
Name: overview, Length: 4803, dtype: object

In [None]:
def create_combo(x):
    return ''.join(x['overview']) + ' ' + ' '.join(x['keywords']) + ' ' + ' '.join(x['actor']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])

df_movielens['combo'] = df_movielens.apply(create_combo, axis=1)
df_movielens['combo'][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. cultureclash future spacewar samworthington zoesaldana sigourneyweaver jamescameron action adventure fantasy'

#### Constructing Count Matrix

The next steps are the same as for the Movie Overview Based Recommender, with one important difference: we use CountVectorizer() instead of TF-IDF. This is because we do not want to down-weight an actor or director merely because he or she has acted in or directed relatively more movies.
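
The effect can be sketched on made-up metadata strings. In the toy example below (hypothetical tokens, textbook unsmoothed IDF), a director who appears in three films gets a lower IDF weight than one who appears in a single film, whereas a raw count treats both tokens equally:

```python
import math
from collections import Counter

# Hypothetical metadata strings: one director appears in three films, another in one
combos = [
    "prolificdirector action",
    "prolificdirector drama",
    "prolificdirector comedy",
    "raredirector action",
]

tokenized = [c.split() for c in combos]
n_docs = len(tokenized)
# Document frequency: number of films whose metadata contains each token
df = Counter(t for tokens in tokenized for t in set(tokens))

for token in ("prolificdirector", "raredirector"):
    count_weight = 1  # raw count of the token inside a film that has it
    idf_weight = math.log(n_docs / df[token])  # textbook IDF, unsmoothed
    print(token, count_weight, round(idf_weight, 3))
```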

In [None]:
cv = CountVectorizer(stop_words='english')
cv_matrix = cv.fit_transform(df_movielens['combo'])

#### Computing Similarity Score

Since we have used CountVectorizer, we use **cosine_similarity** to compute the similarity score. Count vectors are not normalized, so a plain dot product would not give the cosine.

In [None]:
cv2 = CountVectorizer(stop_words='english')
cv_matrix2 = cv2.fit_transform(df_movielens['combo'])

In [None]:
cosine_sim2 = cosine_similarity(cv_matrix2, cv_matrix2)

In [None]:
cosine_sim3 = cosine_similarity(cv_matrix, cv_matrix)

#### Applying Recommendation Function

We can now reuse our content_recommendations function by passing in a new similarity matrix (cosine_sim2 or cosine_sim3) as the second argument.

In [None]:
content_recommendations('The Dark Knight Rises', cosine_sim3)

65                              The Dark Knight
299                              Batman Forever
119                               Batman Begins
1359                                     Batman
428                              Batman Returns
210                              Batman & Robin
2507                                  Slow Burn
3854    Batman: The Dark Knight Returns, Part 2
590                                   The Siege
238                Teenage Mutant Ninja Turtles
dtype: object

In [None]:
content_recommendations('The Dark Knight Rises', cosine_sim1)

65                              The Dark Knight
299                              Batman Forever
428                              Batman Returns
1359                                     Batman
3854    Batman: The Dark Knight Returns, Part 2
119                               Batman Begins
2507                                  Slow Burn
9            Batman v Superman: Dawn of Justice
1181                                        JFK
210                              Batman & Robin
dtype: object

In [None]:
content_recommendations('The Dark Knight Rises', cosine_sim2)

65               The Dark Knight
119                Batman Begins
4638    Amidst the Devil's Wings
1196                The Prestige
3073           Romeo Is Bleeding
3326              Black November
1503                      Takers
1986                      Faster
303                     Catwoman
747               Gangster Squad
dtype: object

In [None]:
content_recommendations('The Godfather', cosine_sim3)

2731      The Godfather: Part II
1873                  Blood Ties
867      The Godfather: Part III
3727                  Easy Money
4226                 Nine Queens
444            Road to Perdition
3726                  Sexy Beast
1635     The Replacement Killers
4638    Amidst the Devil's Wings
1247             City By The Sea
dtype: object

In [None]:
content_recommendations('The Godfather', cosine_sim2)

867      The Godfather: Part III
2731      The Godfather: Part II
2649           The Son of No One
1525              Apocalypse Now
4638    Amidst the Devil's Wings
1018             The Cotton Club
1170     The Talented Mr. Ripley
1209               The Rainmaker
1394               Donnie Brasco
1850                    Scarface
dtype: object

We see that the recommender captures more information thanks to the additional metadata and arguably gives better recommendations. Fans of Marvel or DC Comics are likely to enjoy other movies from the same production house, so production_companies could be added to the features above. We could also increase the weight of the director by adding that feature multiple times to the 'combo' string.
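
One possible way to weight the director is sketched below. The function name `create_weighted_combo` and the `director_weight` parameter are hypothetical, and the overview is left out for brevity. Repeating the director token raises its count in the CountVectorizer matrix and hence its share of the cosine similarity.

```python
def create_weighted_combo(row, director_weight=3):
    """Build the metadata string, repeating the director token to boost its count.

    `director_weight` and the function name are hypothetical.
    """
    parts = (
        row["keywords"]
        + row["actor"]
        + [row["director"]] * director_weight
        + row["genres"]
    )
    return " ".join(parts)

# Example row using the cleaned values shown earlier for Avatar
row = {
    "keywords": ["cultureclash", "future", "spacewar"],
    "actor": ["samworthington", "zoesaldana", "sigourneyweaver"],
    "director": "jamescameron",
    "genres": ["action", "adventure", "fantasy"],
}
print(create_weighted_combo(row))
```

Assuming the cleaned columns hold Python lists as above, the same function could be applied row-wise with `df_movielens.apply(create_weighted_combo, axis=1)`.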

## Collaborative Recommender

The content-based engine suffers from some severe limitations. It is only capable of suggesting movies which are close to a certain movie. That is, it is not capable of capturing tastes and providing recommendations across genres.

Also, the engine that we built is not really personal in that it does not capture the personal tastes and biases of a user. Anyone querying our engine for recommendations based on a movie will receive the same recommendations for that movie, regardless of who she/he is.

Therefore, in this section, we will use **Collaborative Filtering** to make recommendations to Movie Watchers.

Collaborative Filtering matches persons with similar interests and provides recommendations based on this matching. It is based on the idea that users similar to me can be used to predict how much I will like a particular product or service those users have used/experienced but I have not. This system does not require item metadata like its content-based counterparts.
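
The idea of "users similar to me" can be sketched with a tiny neighborhood-based predictor. This is only an illustration on made-up ratings and is not the SVD model used in the rest of this section; all names and values are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie: rating}
ratings = {
    "alice": {"A": 5.0, "B": 4.0, "C": 1.0},
    "bob":   {"A": 5.0, "B": 5.0, "C": 1.0, "D": 4.5},
    "carol": {"A": 1.0, "B": 2.0, "C": 5.0, "D": 1.5},
}

def cosine_sim(u, v):
    """Cosine similarity over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[m] * v[m] for m in common)
    den = (math.sqrt(sum(u[m] ** 2 for m in common))
           * math.sqrt(sum(v[m] ** 2 for m in common)))
    return num / den

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine_sim(ratings[user], their)
        num += sim * their[movie]
        den += sim
    return num / den if den else None

# Alice has not rated "D"; her taste is closer to Bob's than to Carol's,
# so the prediction is pulled towards Bob's rating.
print(predict("alice", "D"))
```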


### SVD: Matrix Factorization Based Algorithm

Singular Value Decomposition (SVD) extracts latent features and correlations from the user-item matrix. For movies, for example, SVD might produce factors that correspond to dimensions such as action vs. comedy, Hollywood vs. Bollywood, or Marvel vs. Disney.

SVD decomposes a matrix into three other matrices and extracts the factors from the factorization of a high-level (user-item-rating) matrix.
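
In practice, the `SVD` class in Surprise does not decompose a dense matrix directly. It learns user and item factor vectors plus bias terms by stochastic gradient descent over the observed ratings, predicting a rating as global mean + user bias + item bias + dot product of the two factor vectors. The following is a minimal pure-Python sketch of that idea on made-up ratings; the hyperparameters and data are illustrative assumptions, not the library's implementation.

```python
import math
import random

random.seed(0)

# Hypothetical (user, item, rating) triples
ratings = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0),
    (1, 0, 4.5), (1, 1, 5.0), (1, 3, 1.5),
    (2, 2, 5.0), (2, 3, 4.0), (2, 0, 1.0),
]
n_users, n_items, n_factors = 3, 4, 2
lr, reg, n_epochs = 0.01, 0.02, 300

mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
bu = [0.0] * n_users                                # user biases
bi = [0.0] * n_items                                # item biases
p = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
q = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]

def predict(u, i):
    """Estimated rating: mean + biases + user/item factor interaction."""
    return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

def rmse():
    """Root mean square error over the training ratings."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

rmse_before = rmse()
for _ in range(n_epochs):
    for u, i, r in ratings:
        err = r - predict(u, i)
        # Regularized gradient steps on the biases and factors
        bu[u] += lr * (err - reg * bu[u])
        bi[i] += lr * (err - reg * bi[i])
        for f in range(n_factors):
            puf, qif = p[u][f], q[i][f]
            p[u][f] += lr * (err * qif - reg * puf)
            q[i][f] += lr * (err * puf - reg * qif)
rmse_after = rmse()

print(round(rmse_before, 3), round(rmse_after, 3))
```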

In [None]:
# We will use the famous SVD algorithm.
svd = SVD()
reader = Reader()
# Build a Surprise dataset from the ratings_small data; df_rating must already be loaded as a DataFrame with userId, movieId and rating columns
data = Dataset.load_from_df(df_rating[['userId', 'movieId', 'rating']], reader)

NameError: ignored

In [None]:
# Run 5-fold cross-validation and print the results
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

We get a mean Root Mean Square Error of approximately 0.89, which is good enough for our case. Let us now train on the full dataset and arrive at predictions.
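
For reference, the two measures requested from cross_validate can be computed by hand as follows. This is a small sketch on made-up numbers; the helper names are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]
print(rmse(actual, predicted), mae(actual, predicted))  # 0.75 0.625
```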

In [None]:
# Build the full trainset from all available ratings
trainset = data.build_full_trainset()

In [None]:
# Train the algorithm on the trainset
svd.fit(trainset)

Let us pick the user with user ID 1 and check the ratings she/he has given.


In [None]:
df_rating[df_rating['userId'] == 1]

We use the algorithm to predict his/her rating for the movie with ID 302.

In [None]:
# Predict the rating of user 1 for movie 302
svd.predict(uid=1, iid=302, r_ui=None)

In [None]:
# Directly grab the estimated rating from the prediction
svd.predict(uid=1, iid=302, r_ui=None).est

For the movie with ID 302, we get an estimated rating of 2.63. One striking feature of this recommender is that it does not care what the movie is (or what it contains). It works purely on the basis of an assigned movie ID and predicts ratings based on how other users have rated the movie.

## Hybrid Recommender

In this section, we try to build a simple hybrid recommender that brings together techniques we have implemented in the content-based and collaborative filter based engines. This is how it works:

- Input: User ID and the Title of a Movie

- Output: Similar movies sorted on the basis of expected ratings by that particular user.

In [None]:
# Rename 'id' to 'movieId' to match the ratings data; all other columns keep their names
df_movielens.rename(columns={'id': 'movieId'}, inplace=True)

In [None]:
# Function that takes in a user ID and a movie title and outputs similar movies sorted by that user's estimated rating
def hybrid_recommendations(userId, title):
    
    # Get the index of the movie that matches the title
    idx = indices[title]
    
    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim2[idx]))
    
    # Sort the movies based on the similarity scores
    sim_scores.sort(key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar movies
    sim_scores=sim_scores[1:11]
    
    # Get the movie indices
    ind=[]
    for (x,y) in sim_scores:
        ind.append(x)
        
    # Grab the title,movieid,vote_average and vote_count of the top 10 most similar movies
    tit=[]
    movieid=[]
    vote_average=[]
    vote_count=[]
    for x in ind:
        tit.append(df_movielens.iloc[x]['title'])
        movieid.append(df_movielens.iloc[x]['movieId'])
        vote_average.append(df_movielens.iloc[x]['vote_average'])
        vote_count.append(df_movielens.iloc[x]['vote_count'])

        
    # Predict the ratings a user might give to these top 10 most similar movies
    est_rating=[]
    for a in movieid:
        est_rating.append(svd.predict(userId, a, r_ui=None).est)  
        
    return pd.DataFrame({'index': ind, 'title':tit, 'movieId':movieid, 'vote_average':vote_average, 'vote_count':vote_count,'estimated_rating':est_rating}).set_index('index').sort_values(by='estimated_rating', ascending=False)


In [None]:
hybrid_recommendations(1,'Avatar')

index,title,movieId,vote_average,vote_count,estimated_rating
103,The Sorcerer's Apprentice,27022,5.8,1470,3.029158
5,Spider-Man 3,559,5.9,3576,2.653732
206,Clash of the Titans,18823,5.6,2233,2.651462
786,The Monkey King 2,381902,6.0,24,2.651462
131,G-Force,19585,5.1,510,2.651462
715,The Scorpion King,9334,5.3,779,2.651462
1,Pirates of the Caribbean: At World's End,285,6.9,4500,2.590907
215,Fantastic 4: Rise of the Silver Surfer,1979,5.4,2589,2.386699
466,The Time Machine,2135,5.8,631,2.232502
71,The Mummy: Tomb of the Dragon Emperor,1735,5.2,1387,2.188348


In [None]:

hybrid_recommendations(4,'Avatar')

index,title,movieId,vote_average,vote_count,estimated_rating
103,The Sorcerer's Apprentice,27022,5.8,1470,4.610125
5,Spider-Man 3,559,5.9,3576,4.272945
206,Clash of the Titans,18823,5.6,2233,4.163664
786,The Monkey King 2,381902,6.0,24,4.163664
131,G-Force,19585,5.1,510,4.163664
715,The Scorpion King,9334,5.3,779,4.163664
1,Pirates of the Caribbean: At World's End,285,6.9,4500,4.161822
466,The Time Machine,2135,5.8,631,4.033034
215,Fantastic 4: Rise of the Silver Surfer,1979,5.4,2589,3.751639
71,The Mummy: Tomb of the Dragon Emperor,1735,5.2,1387,3.634847


In [None]:
content_recommendations('Avatar', cosine_sim2)

206                         Clash of the Titans
71        The Mummy: Tomb of the Dragon Emperor
786                           The Monkey King 2
103                   The Sorcerer's Apprentice
131                                     G-Force
215      Fantastic 4: Rise of the Silver Surfer
466                            The Time Machine
715                           The Scorpion King
1      Pirates of the Caribbean: At World's End
5                                  Spider-Man 3
dtype: object

We can see that when content-based recommendation is used alone, with the 'combo' column as the content, the order of the movies recommended for a particular movie is fixed regardless of the user. When we combine content-based and collaborative recommendation into a hybrid recommender, the same set of similar movies is returned, but its order varies from user to user.