# Movie Recommender System


This notebook has three parts:
* **Content based recommender**
* **Collaborative filtering**
* **Hybrid system**

                                    20.08.2018 by Anton Fedun

In [49]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from surprise import Reader, Dataset, SVD, evaluate
from nltk.stem.snowball import SnowballStemmer
import warnings

warnings.simplefilter('ignore')

In the first part of this notebook I will build a Content Based Recommender System.
The system recommends films that are similar to a chosen one, based on two kinds of content:
* **Description and taglines of the movies**
* **Cast, director, keywords and genres of the movies**

Let's see which one works better.

**Part I**

**Content Based Recommender**

File **metadata.csv** contains all information about movies in the dataset.

In [50]:
metadata = pd.read_csv("data/metadata.csv")
metadata = metadata.drop(['Unnamed: 0'], axis=1)
metadata.genres = metadata.genres.fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
# `x != np.nan` is always True, so test for missing dates with pd.notnull instead
metadata['year'] = pd.to_datetime(metadata['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if pd.notnull(x) else np.nan)
metadata.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,year
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,1995
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,1995
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,1995
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,1995
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0,1995


File **links_small.csv** contains the ids that link each movie to the other files. Only *tmdbId* is needed here.
Filtering the metadata dataframe by these ids leaves **9099** films.
Let the **avail** dataframe be the metadata films that are available for processing in the next step.

In [51]:
links_small = pd.read_csv('data/links_small.csv')
links_small = links_small[links_small.tmdbId.notnull()].tmdbId
links_small = links_small.astype('int64')
links_small.head(7)

0      862
1     8844
2    15602
3    31357
4    11862
5      949
6    11860
Name: tmdbId, dtype: int64

In [52]:
avail = metadata[metadata.id.isin(links_small)].copy()  # copy, so later column assignments do not write to a slice of metadata
len(avail)

9099

**Recommendations based on taglines and film descriptions**

I will use cosine similarity as the numeric measure of how similar two movies are. Each description is converted to a TF-IDF vector with `TfidfVectorizer`. Because the vectorizer L2-normalizes every row by default, the dot product of the TF-IDF matrix with itself (computed by `linear_kernel`) is exactly the cosine similarity matrix.

The *i-th* row of **cosine_sim** holds the similarity of the *i-th* movie to every movie in **avail**.
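
To make the link between the dot product and cosine similarity concrete, here is a minimal pure-Python sketch. It assumes a simplified TF-IDF (raw term frequency times a smoothed IDF, followed by L2 normalization), which mirrors the idea behind `TfidfVectorizer` but is not guaranteed to reproduce its exact numbers. The toy documents and the helper names `tfidf_vectors` and `dot` are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dict per document (simplified sketch)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every document keep a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Hypothetical descriptions, not taken from the dataset.
toy_docs = [
    "cyborg assassin sent back in time",
    "cyborg protects boy from assassin",
    "couple adopts a dog",
]
vecs = tfidf_vectors(toy_docs)
# Rows have unit length, so the plain dot product already is the cosine similarity.
sim = [[dot(u, v) for v in vecs] for u in vecs]
for row in sim:
    print([round(value, 3) for value in row])
```

Each film has similarity 1 with itself, the two cyborg descriptions share terms and get a positive score, and the unrelated description scores 0 against both.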

In [53]:
avail.tagline = avail.tagline.fillna('')
avail.overview = avail.overview.fillna('')
avail['description'] = avail.tagline + " " + avail.overview

In [54]:
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(avail.description)
print(tfidf_matrix.shape)

(9099, 30570)


In [55]:
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim[1]

array([ 0.01667385,  1.        ,  0.04322203, ...,  0.00892538,
        0.01697785,  0.        ])

In [56]:
titles = avail.title
# Map each title to its row position in avail (and therefore in cosine_sim)
indices = pd.Series(range(len(avail)), index=avail.title)

In [57]:
def get_recommendations(title, number):
    try:
        idx = indices[title]
    except KeyError:
        print("Film (%s) does not exist in the dataset" % title)
        return

    # A title shared by several films yields a Series of positions, not a scalar
    if isinstance(idx, pd.Series) and len(idx) > 1:
        print("There are several films called (%s)" % title)
        print("Their indices are: ", avail[avail.title == title].index)
        idx = sorted(idx, key=lambda x: avail.iloc[x].popularity, reverse=True)
        idx = idx[0]
        print("For recommendation, I will take the most popular one with id ", avail.iloc[idx].id)

    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:number+1]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]

Now, let's get 10 recommendations for the film **The Terminator**

In [58]:
get_recommendations('The Terminator', 10)

582              Terminator 2: Judgment Day
13693                  Terminator Salvation
14917                       Teenage Caveman
14631                       The Book of Eli
6388     Terminator 3: Rise of the Machines
25864                    Terminator Genisys
5868                           Just Married
19669                           Cloud Atlas
10228                        Must Love Dogs
3428                             The Hunger
Name: title, dtype: object

If the input film is not in the dataframe, a message is printed and nothing is returned.

Sometimes several films share the same title. In that case, I choose the most popular one.
For example, the movie **Titanic**:

In [59]:
get_recommendations('Titanic', 10)

There are several films called (Titanic)
Their indices are:  Int64Index([1639, 3285], dtype='int64')
For recommendation, I will take the most popular one with id  597


2576               The Legend of 1900
3285                          Titanic
2424    Beyond the Poseidon Adventure
5225                    Rambling Rose
5646                       Ghost Ship
6922                    The Navigator
3482                            Gypsy
2823                          Niagara
1518                    Event Horizon
533         Six Degrees of Separation
Name: title, dtype: object

**Recommendations based on cast, crew, keywords and genres**

First, merge the **credits** and **keywords** dataframes with our **metadata**.

In [60]:
credits = pd.read_csv('data/credits.csv')
keywords = pd.read_csv('data/keywords.csv')

metadata = metadata.merge(credits, on='id').merge(keywords, on='id')
metadata = metadata.drop(['Unnamed: 0'], axis=1)
metadata.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,tagline,title,video,vote_average,vote_count,year,Unnamed: 0.1,cast,crew,keywords
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,,Toy Story,False,7.7,5415.0,1995,0,"['Tom Hanks', 'Tim Allen', 'Don Rickles', 'Jim...","[{'name': 'John Lasseter', 'job': 'Director'},...","[{'id': 931, 'name': 'jealousy'}, {'id': 4290,..."
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,1995,1,"['Robin Williams', 'Jonathan Hyde', 'Kirsten D...","[{'name': 'Larry J. Franco', 'job': 'Executive...","[{'id': 10090, 'name': 'board game'}, {'id': 1..."
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,1995,2,"['Walter Matthau', 'Jack Lemmon', 'Ann-Margret...","[{'name': 'Howard Deutch', 'job': 'Director'},...","[{'id': 1495, 'name': 'fishing'}, {'id': 12392..."
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,1995,3,"['Whitney Houston', 'Angela Bassett', 'Loretta...","[{'name': 'Forest Whitaker', 'job': 'Director'...","[{'id': 818, 'name': 'based on novel'}, {'id':..."
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0,1995,4,"['Steve Martin', 'Diane Keaton', 'Martin Short...","[{'name': 'Alan Silvestri', 'job': 'Original M...","[{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n..."


Then reassign **avail**, the set of available films. It now contains **9663** movies.

In [61]:
avail = metadata[metadata.id.isin(links_small)].copy()  # copy, so later column assignments do not write to a slice of metadata
print(len(avail))

9663


Next, I preprocess the crew, cast and keywords Series:
* From crew, I pick only the director as a feature, since the other crew members contribute less to the feel of the movie. To give the director more influence than a regular keyword, the director's name is repeated 3 times.
* Keywords are converted to lists of stemmed words. I keep only keywords that occur more than 3 times in the dataset.
* From cast, I keep only the three main actors.
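
The steps above can be sketched for a single movie in plain Python. This is only an illustration under simplifying assumptions: stemming is left out, genres are simply lower-cased, and the function name `build_stack`, its parameters and the keyword frequencies are hypothetical rather than part of the notebook.

```python
from collections import Counter

def build_stack(keywords, cast, genres, director, keyword_counts,
                min_count=3, top_cast=3, director_weight=3):
    """Combine one movie's features into a single space-separated string."""
    def squash(name):
        # "Tom Hanks" -> "tomhanks", so first and last name stay one token.
        return name.replace(" ", "").lower()

    # Keep only keywords that occur more than min_count times in the dataset.
    kept = [squash(k) for k in keywords if keyword_counts[k] > min_count]
    # Keep only the first top_cast actors.
    actors = [squash(a) for a in cast[:top_cast]]
    # Repeating the director raises its term count relative to other features.
    directors = [squash(director)] * director_weight
    return " ".join(kept + actors + [g.lower() for g in genres] + directors)

# Hypothetical keyword frequencies over a whole dataset.
counts = Counter({"cyborg": 12, "time travel": 40, "shotgun": 2})
stack = build_stack(
    keywords=["cyborg", "time travel", "shotgun"],
    cast=["Arnold Schwarzenegger", "Michael Biehn", "Linda Hamilton", "Paul Winfield"],
    genres=["Action", "Science Fiction"],
    director="James Cameron",
    keyword_counts=counts,
)
print(stack)
```

The rare keyword and the fourth actor are dropped, while the director token appears three times, so two films by the same director share more terms than two films that merely share one keyword.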

In [62]:
avail['cast'] = avail['cast'].apply(literal_eval)
avail['keywords'] = avail['keywords'].apply(literal_eval)
avail['crew'] = avail['crew'].apply(literal_eval)
avail['cast_size'] = avail['cast'].apply(lambda x: len(x))
avail['crew_size'] = avail['crew'].apply(lambda x: len(x))

In [63]:
def get_director(crew):
    for member in crew:
        if member['job'] == 'Director':
            return member['name']
    return np.nan    
        
avail['director'] = avail['crew'].apply(get_director)
# cast already holds plain actor names, so only truncation to the three top-billed actors is needed
avail['cast'] = avail['cast'].apply(lambda x: x[:3])
avail['keywords'] = avail['keywords'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

In [64]:
avail.cast = avail.cast.apply(lambda x: [w.replace(" ", "").lower() for w in x])
avail.director = avail.director.astype('str').apply(lambda x: x.replace(" ", "").lower())
avail.director = avail.director.apply(lambda x: [x, x, x])

In [65]:
avail.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,vote_average,vote_count,year,Unnamed: 0.1,cast,crew,keywords,cast_size,crew_size,director
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,7.7,5415.0,1995,0,"[tomhanks, timallen, donrickles]","[{'name': 'John Lasseter', 'job': 'Director'},...","[jealousy, toy, boy, friendship, friends, riva...",13,106,"[johnlasseter, johnlasseter, johnlasseter]"
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,6.9,2413.0,1995,1,"[robinwilliams, jonathanhyde, kirstendunst]","[{'name': 'Larry J. Franco', 'job': 'Executive...","[board game, disappearance, based on children'...",26,16,"[joejohnston, joejohnston, joejohnston]"
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,6.5,92.0,1995,2,"[waltermatthau, jacklemmon, ann-margret]","[{'name': 'Howard Deutch', 'job': 'Director'},...","[fishing, best friend, duringcreditsstinger, o...",7,4,"[howarddeutch, howarddeutch, howarddeutch]"
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,6.1,34.0,1995,3,"[whitneyhouston, angelabassett, lorettadevine]","[{'name': 'Forest Whitaker', 'job': 'Director'...","[based on novel, interracial relationship, sin...",10,10,"[forestwhitaker, forestwhitaker, forestwhitaker]"
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,5.7,173.0,1995,4,"[stevemartin, dianekeaton, martinshort]","[{'name': 'Alan Silvestri', 'job': 'Original M...","[baby, midlife crisis, confidence, aging, daug...",12,7,"[charlesshyer, charlesshyer, charlesshyer]"


Keywords preprocessing

In [66]:
key = avail.apply(lambda x: pd.Series(x.keywords), axis=1).stack().reset_index(level=1, drop=True)
key.name = 'keyword'
key = key.value_counts()
print(key[:10])
key = key[key > 3]

independent film        634
woman director          578
murder                  403
based on novel          346
duringcreditsstinger    327
violence                266
biography               228
friendship              226
love                    224
sex                     219
Name: keyword, dtype: int64


In [67]:
stemmer = SnowballStemmer('english')
stemmer.stem('films')

'film'

In [68]:
def filter_keywords(keywords):
    # `in` on a Series tests its index, i.e. the keywords kept by the frequency filter above
    return [k for k in keywords if k in key]

In [69]:
avail.keywords = avail.keywords.apply(filter_keywords)
avail.keywords = avail.keywords.apply(lambda x: [stemmer.stem(i) for i in x])
avail.keywords = avail.keywords.apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])

In [70]:
avail['stack'] = avail.keywords + avail.cast + avail.genres + avail.director
avail['stack'] = avail['stack'].apply(lambda x: ' '.join(x))
avail.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,vote_count,year,Unnamed: 0.1,cast,crew,keywords,cast_size,crew_size,director,stack
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,5415.0,1995,0,"[tomhanks, timallen, donrickles]","[{'name': 'John Lasseter', 'job': 'Director'},...","[jealousi, toy, boy, friendship, friend, rival...",13,106,"[johnlasseter, johnlasseter, johnlasseter]",jealousi toy boy friendship friend rivalri toy...
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,2413.0,1995,1,"[robinwilliams, jonathanhyde, kirstendunst]","[{'name': 'Larry J. Franco', 'job': 'Executive...","[disappear, basedonchildren'sbook, newhom, rec...",26,16,"[joejohnston, joejohnston, joejohnston]",disappear basedonchildren'sbook newhom reclus ...
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,92.0,1995,2,"[waltermatthau, jacklemmon, ann-margret]","[{'name': 'Howard Deutch', 'job': 'Director'},...","[fish, bestfriend, duringcreditssting]",7,4,"[howarddeutch, howarddeutch, howarddeutch]",fish bestfriend duringcreditssting waltermatth...
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,34.0,1995,3,"[whitneyhouston, angelabassett, lorettadevine]","[{'name': 'Forest Whitaker', 'job': 'Director'...","[basedonnovel, interracialrelationship, single...",10,10,"[forestwhitaker, forestwhitaker, forestwhitaker]",basedonnovel interracialrelationship singlemot...
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,173.0,1995,4,"[stevemartin, dianekeaton, martinshort]","[{'name': 'Alan Silvestri', 'job': 'Original M...","[babi, midlifecrisi, confid, age, daughter, mo...",12,7,"[charlesshyer, charlesshyer, charlesshyer]",babi midlifecrisi confid age daughter motherda...


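We now have our cast, director, keywords and genres stacked in the **stack** column.
I will score similarity the same way as before, but now on a CountVectorizer matrix. Note that count vectors are not length-normalized, so `linear_kernel` returns raw dot products (weighted counts of shared n-grams) rather than true cosine similarity; `cosine_similarity`, imported at the top, would return the normalized score.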
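
The difference between a raw dot product and cosine similarity on count vectors can be seen in a small pure-Python sketch. The three feature strings are invented for illustration, and the helpers `count_vector`, `dot` and `cosine` are hypothetical stand-ins for the scikit-learn functions.

```python
import math
from collections import Counter

def count_vector(text):
    """Bag-of-words term counts for a space-separated string."""
    return Counter(text.split())

def dot(u, v):
    """Raw dot product of two count vectors (missing terms count as 0)."""
    return sum(c * v[t] for t, c in u.items())

def cosine(u, v):
    """Dot product divided by both vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

query = count_vector("robot war jamescameron jamescameron jamescameron")
# An identical feature string and a much longer one that repeats some terms.
short = count_vector("robot war jamescameron jamescameron jamescameron")
long_ = count_vector("robot robot war war love dog cat sea ship jamescameron "
                     "jamescameron jamescameron ocean storm island")

raw_short, raw_long = dot(query, short), dot(query, long_)
cos_short, cos_long = cosine(query, short), cosine(query, long_)
print(raw_short, raw_long)
print(round(cos_short, 3), round(cos_long, 3))
```

With raw dot products the longer string outranks the identical one simply because it repeats shared terms; after normalization the identical string scores highest. Which behaviour is preferable for recommendations is a design choice, but the two scores should not be treated as interchangeable.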

In [71]:
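# min_df=1 keeps every term, as before; recent scikit-learn versions reject an integer min_df of 0
count = CountVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1)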
count_matrix = count.fit_transform(avail['stack'])

In [72]:
cosine_sim = linear_kernel(count_matrix, count_matrix)

titles = avail.title
indices = pd.Series(range(len(avail)), index=avail.title)

Let's get recommendations for **The Terminator**. As you can see, **Avatar** and **Titanic** are among the recommendations. That's because they all share the same director, James Cameron. In my opinion, this system works better than the previous one, so I will use these cosine_sim values in the Hybrid system below.

In [73]:
print(get_recommendations('The Terminator', 10))

582              Terminator 2: Judgment Day
1185                              The Abyss
1251                                 Aliens
15479                                Avatar
375                               True Lies
6697     Terminator 3: Rise of the Machines
1731                                Titanic
5888         Piranha Part Two: The Spawning
7092                 The Matrix Revolutions
6530                    The Matrix Reloaded
Name: title, dtype: object


In [74]:
print(get_recommendations('Rocky', 10))

2391                     Rocky V
2399              The Karate Kid
4672                  Lean On Me
7911            The Power of One
403                    8 Seconds
2400     The Karate Kid, Part II
2401    The Karate Kid, Part III
5098                 The Formula
5882                   Neighbors
2388                    Rocky II
Name: title, dtype: object


**Part II**

**Collaborative Filtering**


In this part, I will use a technique called **Collaborative Filtering** to make recommendations. Collaborative Filtering is based on the idea that users similar to me can be used to predict how much I will like a particular product or service those users have experienced but I have not.

For that I will use the **Surprise** library, which provides algorithms such as **Singular Value Decomposition (SVD)** that are trained to minimise the prediction error, measured here by RMSE (Root Mean Square Error).

**Surprise** is a Python scikit for building and analysing recommender systems.
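
To show the idea behind this kind of model, here is a minimal pure-Python sketch of matrix factorization trained by stochastic gradient descent. It predicts a rating as global mean plus user bias plus item bias plus the dot product of user and item factors, which is the model form Surprise documents for its `SVD` class. It is not Surprise's implementation: the function names, the toy ratings and the default hyperparameters are assumptions chosen for illustration.

```python
import math
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit biases and latent factors by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = dict.fromkeys(users, 0.0)                      # user biases
    bi = dict.fromkeys(items, 0.0)                      # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Only works for users and items seen during training.
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient step on the regularized squared error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict, mu

def rmse_of(predict, ratings):
    """Root mean square error of a predictor on the given triples."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical ratings: users 1 and 2 like films A and B, user 3 prefers C.
toy = [(1, "A", 5.0), (1, "B", 4.0), (1, "C", 1.0), (2, "A", 4.5),
       (2, "C", 1.5), (3, "B", 2.0), (3, "C", 5.0), (3, "A", 1.0)]
predict, global_mean = train_mf(toy)
baseline_rmse = rmse_of(lambda u, i: global_mean, toy)
trained_rmse = rmse_of(predict, toy)
print(round(baseline_rmse, 3), round(trained_rmse, 3))
```

The sketch measures error on the training data only, so it shows that the factors fit the ratings, not how well the model generalizes; that is what the cross-validation below is for.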

In [75]:
reader = Reader()
ratings = pd.read_csv("data/ratings_small.csv")
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [76]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
print(data.df.head())
data.split(n_folds=5)

   userId  movieId  rating
0       1       31     2.5
1       1     1029     3.0
2       1     1061     3.0
3       1     1129     2.0
4       1     1172     4.0


In [77]:
svd = SVD(n_factors=100, n_epochs=40, lr_all=0.005, reg_all=0.2, verbose=False)
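# Note: `evaluate` and `data.split` were removed in Surprise 1.1. In newer releases the equivalent is
# `from surprise.model_selection import cross_validate` followed by
# `cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)`.
evaluate(svd, data, measures=['RMSE', 'MAE'])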

Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 0.8879
MAE:  0.6883
------------
Fold 2
RMSE: 0.9038
MAE:  0.6964
------------
Fold 3
RMSE: 0.8864
MAE:  0.6838
------------
Fold 4
RMSE: 0.8892
MAE:  0.6867
------------
Fold 5
RMSE: 0.8833
MAE:  0.6813
------------
------------
Mean RMSE: 0.8901
Mean MAE : 0.6873
------------
------------


CaseInsensitiveDefaultDict(list,
                           {'rmse': [0.88785076194795043,
                             0.90376316577949545,
                             0.88641168965686701,
                             0.88917105529141638,
                             0.88326203554260141],
                            'mae': [0.68834255339709616,
                             0.69643194888354076,
                             0.6838143622347389,
                             0.68667602264358341,
                             0.68133397239999249]})

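Averaged over the five folds, we get RMSE = **0.8901** and MAE = **0.6873**, so on average a predicted rating is off by roughly 0.7 stars. That is good enough for our model.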
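
For reference, the two error measures can be written out in a few lines of plain Python. The helper names `rmse` and `mae` and the three example ratings are illustrative; the per-fold values are the rounded RMSE figures printed above.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more strongly."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: the average size of the error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up true and predicted ratings.
actual = [4.0, 3.0, 5.0]
predicted = [3.5, 3.0, 4.0]
print(round(rmse(actual, predicted), 4), mae(actual, predicted))

# Per-fold RMSE values reported above, rounded to four decimals.
fold_rmse = [0.8879, 0.9038, 0.8864, 0.8892, 0.8833]
mean_rmse = sum(fold_rmse) / len(fold_rmse)
print(round(mean_rmse, 4))
```

The mean of the five fold values reproduces the reported 0.8901.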

In [78]:
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x15697229e10>

**Part III**

**Hybrid system**

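In this part, I will build a hybrid system. How the model works: take the 49 films most similar to the chosen one according to the cosine_sim matrix (the slice `sim_scores[1:50]`, which skips the film itself), then, for a particular user, sort them by the rating that the SVD model predicts for that user. The predicted rating is multiplied by 2, which puts it on a 10-point scale like vote_average.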
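
The two-stage idea can be sketched independently of pandas and Surprise. In this toy version the similarity row, the rating table and the names `hybrid_rank` and `predict` are all hypothetical, and the query film is excluded by index rather than by slicing off the first result.

```python
def hybrid_rank(similarity_row, query_idx, predict_rating, user_id,
                n_candidates=49, top_n=10):
    """Shortlist by content similarity, then order the shortlist by predicted rating."""
    ranked = sorted(range(len(similarity_row)),
                    key=lambda j: similarity_row[j], reverse=True)
    # Drop the query film itself, then keep the closest candidates.
    candidates = [j for j in ranked if j != query_idx][:n_candidates]
    scored = [(j, predict_rating(user_id, j)) for j in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Toy data: similarity of film 0 to five films, and made-up predicted ratings.
similarity_to_film0 = [1.0, 0.9, 0.1, 0.8, 0.7]
fake_ratings = {("ann", 1): 2.0, ("ann", 3): 4.5, ("ann", 4): 3.0,
                ("bob", 1): 5.0, ("bob", 3): 1.0, ("bob", 4): 3.5}

def predict(user, film):
    """Stand-in for a trained collaborative filtering model."""
    return fake_ratings.get((user, film), 3.0)

ann = hybrid_rank(similarity_to_film0, 0, predict, "ann", n_candidates=3, top_n=3)
bob = hybrid_rank(similarity_to_film0, 0, predict, "bob", n_candidates=3, top_n=3)
print(ann)
print(bob)
```

Both users get the same shortlist (films 1, 3 and 4), because the shortlist depends only on content, but the order differs because it depends on each user's predicted ratings. Film 2 never appears, since it is too dissimilar to make the shortlist.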

In [79]:
def convert_int(x):
    try:
        return int(x)
    except (TypeError, ValueError):
        return np.nan
    
id_map = pd.read_csv('data/links_small.csv')[['movieId', 'tmdbId']]
id_map.tmdbId = id_map.tmdbId.apply(convert_int)
id_map.columns = ['movieId', 'id']
id_map = id_map.merge(avail[['title', 'id']], on='id').set_index('title')
indices_map = id_map.set_index('id')

In [80]:
def hybrid(userId, title, number=10):
    try:
        idx = indices[title]
    except KeyError:
        print("Film (%s) does not exist in the dataset" % title)
        return

    # A title shared by several films yields a Series of positions, not a scalar
    if isinstance(idx, pd.Series) and len(idx) > 1:
        print("There are several films called (%s)" % title)
        print("Their indices are: ", avail[avail.title == title].index)
        idx = sorted(idx, key=lambda x: avail.iloc[x].popularity, reverse=True)
        idx = idx[0]
        print("For recommendation, I will take the most popular one with id ", avail.iloc[idx].id)
        
    tmdbId = id_map.loc[title]['id']
    movie_id = id_map.loc[title]['movieId']    
    sim_scores = list(enumerate(cosine_sim[int(idx)]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:50]
    
    movie_indices = [i[0] for i in sim_scores]
    movies = avail.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year', 'id']]
    movies['est'] = movies['id'].apply(lambda x: svd.predict(userId, indices_map.loc[x]['movieId']).est * 2)
    movies = movies.sort_values('est', ascending=False)
    return movies.head(number)

In [81]:
hybrid(34, 'Inception', 10)

Unnamed: 0,title,vote_count,vote_average,year,id,est
4222,Memento,4168.0,8.1,2000,77,8.501567
11847,The Prestige,4510.0,8.0,2006,1124,8.421738
12973,The Dark Knight,12269.0,8.3,2008,155,8.398073
2213,Cube,1101.0,6.9,1997,431,8.3303
23952,Interstellar,11187.0,8.1,2014,157336,8.239023
7800,Cypher,196.0,6.7,2002,10133,8.05433
582,Terminator 2: Judgment Day,4274.0,7.7,1991,280,8.046531
1288,The Terminator,4208.0,7.4,1984,218,8.033219
1264,Alien,4564.0,7.9,1979,348,8.019728
1251,Aliens,3282.0,7.7,1986,679,7.983232


In [82]:
hybrid(10, 'Inception', 10)

Unnamed: 0,title,vote_count,vote_average,year,id,est
4222,Memento,4168.0,8.1,2000,77,8.410319
12973,The Dark Knight,12269.0,8.3,2008,155,8.30037
11847,The Prestige,4510.0,8.0,2006,1124,8.291887
2213,Cube,1101.0,6.9,1997,431,8.222788
23952,Interstellar,11187.0,8.1,2014,157336,8.148933
7800,Cypher,196.0,6.7,2002,10133,8.028748
582,Terminator 2: Judgment Day,4274.0,7.7,1991,280,7.957318
1288,The Terminator,4208.0,7.4,1984,218,7.939911
1264,Alien,4564.0,7.9,1979,348,7.92858
1251,Aliens,3282.0,7.7,1986,679,7.907275


In [83]:
hybrid(44, 'Alien', 10)

Unnamed: 0,title,vote_count,vote_average,year,id,est
31161,The Martian,7442.0,7.6,2015,286217,7.756493
536,Blade Runner,3833.0,7.9,1982,78,7.692884
3579,Gladiator,5566.0,7.9,2000,98,7.582143
582,Terminator 2: Judgment Day,4274.0,7.7,1991,280,7.57875
1288,The Terminator,4208.0,7.4,1984,218,7.545517
1251,Aliens,3282.0,7.7,1986,679,7.520418
3704,Mad Max 2: The Road Warrior,981.0,7.3,1981,8810,7.343468
11131,District B13,572.0,6.5,2004,10045,7.333072
3703,Mad Max,1235.0,6.6,1979,9659,7.315581
13492,Body of Lies,919.0,6.5,2008,12113,7.306571


We see that the hybrid recommender produces different estimated ratings, and therefore a different ordering, for different users even when the movie is the same (compare users 34 and 10 for **Inception**).

# Conclusion

In this notebook, I compared two different Content Based Recommender algorithms.
The better one is then used in the Hybrid System together with Collaborative Filtering to recommend similar films to a particular user. This was done with the help of the amazing **Surprise** library.

I was inspired to do this by Rounak Banik and [kinopoisk.ru](https://www.kinopoisk.ru/) movie recommender.