# Sources
https://www.kaggle.com/rounakbanik/movie-recommender-systems

## Simple Recommender

The Simple Recommender offers generalized recommendations to every user based on movie popularity and (sometimes) genre. The basic idea is that movies that are more popular and more critically acclaimed have a higher probability of being liked by the average audience. This model does not give personalized recommendations based on the user.

The implementation of this model is straightforward. All we have to do is sort our movies by rating and popularity and display the top movies of the list. As an added step, we can pass in a genre argument to get the top movies of a particular genre.
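As a rough illustration of the idea, here is a minimal sketch on a made-up catalogue. The `toy` table, its column names, and the helper `top_movies` are assumptions for this example only and are not part of the dataset or the notebook code.

```python
import pandas as pd

# Tiny made-up catalogue; titles, genres and ratings are placeholders.
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'genre': ['Drama', 'Comedy', 'Drama', 'Comedy'],
    'rating': [7.5, 8.1, 6.9, 7.8],
})

def top_movies(df, genre=None, n=2):
    """Return the n highest-rated movies, optionally restricted to one genre."""
    if genre is not None:
        df = df[df['genre'] == genre]
    return df.sort_values('rating', ascending=False).head(n)

print(top_movies(toy)['title'].tolist())                  # ['B', 'D']
print(top_movies(toy, genre='Drama')['title'].tolist())   # ['A', 'C']
```

Sorting on a raw rating like this ignores how many people voted, which is why a weighted rating is preferable to a plain average.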

In [187]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet

import warnings; warnings.simplefilter('ignore')

In [188]:
md = pd.read_csv('../data/movies_metadata.csv')
md.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


In [189]:
md.columns

Index([u'adult', u'belongs_to_collection', u'budget', u'genres', u'homepage',
       u'id', u'imdb_id', u'original_language', u'original_title', u'overview',
       u'popularity', u'poster_path', u'production_companies',
       u'production_countries', u'release_date', u'revenue', u'runtime',
       u'spoken_languages', u'status', u'tagline', u'title', u'video',
       u'vote_average', u'vote_count'],
      dtype='object')

In [190]:
md['genres'] = md['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

I use the TMDB ratings to come up with our **Top Movies Chart**. I will use IMDB's *weighted rating* formula to construct the chart. Mathematically, it is represented as follows:

Weighted Rating (WR) = $\left(\frac{v}{v + m} \cdot R\right) + \left(\frac{m}{v + m} \cdot C\right)$

where,
* *v* is the number of votes for the movie
* *m* is the minimum votes required to be listed in the chart
* *R* is the average rating of the movie
* *C* is the mean rating across all movies in the dataset
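
To see how the formula behaves, here is a minimal sketch with made-up numbers. The function name `weighted_rating_example` and the values of `m_demo` and `C_demo` are assumptions chosen for illustration, not values from the dataset.

```python
def weighted_rating_example(v, R, m, C):
    """IMDB-style weighted rating for a single movie."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values chosen only for illustration.
m_demo = 400   # minimum votes required to be listed
C_demo = 5.0   # mean rating across all movies

# Same average rating R, very different vote counts v.
few_votes = weighted_rating_example(v=400, R=8.0, m=m_demo, C=C_demo)
many_votes = weighted_rating_example(v=12000, R=8.0, m=m_demo, C=C_demo)

print(few_votes)    # 6.5
print(many_votes)   # roughly 7.90
```

With exactly *m* votes, the score sits halfway between *R* and *C*. As *v* grows, the weight on *R* approaches 1 and the score approaches the movie's own average.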

The next step is to determine an appropriate value for *m*, the minimum votes required to be listed in the chart. We will use the **95th percentile** of the vote counts as our cutoff. In other words, for a movie to feature in the chart, it must have at least as many votes as 95% of the movies in the list.
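
The percentile cutoff can be sketched on a small made-up series. The names `demo_votes`, `m_demo` and `qualified_demo` are hypothetical and the vote counts are invented; only `Series.quantile` is the same call used on the real data.

```python
import pandas as pd

# Hypothetical vote counts for 20 movies: 1, 2, ..., 20 (not from the dataset).
demo_votes = pd.Series(range(1, 21))

# 95th percentile of the vote counts (pandas interpolates linearly by default).
m_demo = demo_votes.quantile(0.95)

# Keep only the movies whose vote count meets the cutoff.
qualified_demo = demo_votes[demo_votes >= m_demo]

print(m_demo)
print(list(qualified_demo))   # [20]
```

Here the cutoff lands at about 19.05, so only the single most-voted movie qualifies. A lower percentile admits more movies at the price of including ratings backed by fewer votes.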

I will build our overall Top 250 Chart and will define a function to build charts for a particular genre. Let's begin!

In [191]:
vote_counts = md[md['vote_count'].notnull()]['vote_count'].astype('int')
# astype('int') truncates each average (e.g. 7.7 becomes 7) before the mean is taken
vote_averages = md[md['vote_average'].notnull()]['vote_average'].astype('int')
C = vote_averages.mean()
C

5.244896612406511

In [192]:
m = vote_counts.quantile(0.95)
m

434.0

In [193]:
# `x != np.nan` is always True, so missing dates are detected with pd.notnull instead
md['year'] = pd.to_datetime(md['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if pd.notnull(x) else np.nan)

In [194]:
qualified = md[(md['vote_count'] >= m) & (md['vote_count'].notnull()) & (md['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres','homepage','poster_path']]
qualified['vote_count'] = qualified['vote_count'].astype('int')
qualified['vote_average'] = qualified['vote_average'].astype('int')
qualified.shape

(2274, 8)

In [195]:
def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

In [196]:
qualified['wr'] = qualified.apply(weighted_rating, axis=1)

In [197]:
qualified = qualified.sort_values('wr', ascending=False).head(250)

In [198]:
qualified.head(15)

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,homepage,poster_path,wr
15480,Inception,2010,14075,8,29.1081,"[Action, Thriller, Science Fiction, Mystery, A...",http://inceptionmovie.warnerbros.com/,/qmDpIHrmpJINaRKAfWQfftjCdyi.jpg,7.917588
12481,The Dark Knight,2008,12269,8,123.167,"[Drama, Action, Crime, Thriller]",http://thedarkknight.warnerbros.com/dvdsite/,/1hRoyzDtpgMU7Dz4JF22RANzQO7.jpg,7.905871
22879,Interstellar,2014,11187,8,32.2135,"[Adventure, Drama, Science Fiction]",http://www.interstellarmovie.net/,/nBNZadXqJSdt05SHLqgT0HuC5Gm.jpg,7.897107
2843,Fight Club,1999,9678,8,63.8696,[Drama],http://www.foxmovies.com/movies/fight-club,/adw6Lq9FiC9zjYEpOqfq03ituwp.jpg,7.881753
4863,The Lord of the Rings: The Fellowship of the Ring,2001,8892,8,32.0707,"[Adventure, Fantasy, Action]",http://www.lordoftherings.net/,/bxVxZb5O9OxCO0oRUNdCnpy9NST.jpg,7.871787
292,Pulp Fiction,1994,8670,8,140.95,"[Thriller, Crime]",,/dM2w364MScsjFf8pfMbaWUcWrR.jpg,7.86866
314,The Shawshank Redemption,1994,8358,8,51.6454,"[Drama, Crime]",,/9O7gLzmreU0nGkIB6K3BsJbzvNv.jpg,7.864
7000,The Lord of the Rings: The Return of the King,2003,8226,8,29.3244,"[Adventure, Fantasy, Action]",http://www.lordoftherings.net,/uexxR7Kw1qYbZk0RYaF9Rx5ykbj.jpg,7.861927
351,Forrest Gump,1994,8147,8,48.3072,"[Comedy, Drama, Romance]",,/yE5d3BUhE8hCnkMUJOo1QDoOGNz.jpg,7.860656
5814,The Lord of the Rings: The Two Towers,2002,7641,8,29.4235,"[Adventure, Fantasy, Action]",http://www.lordoftherings.net/,/wf3v0Pn09jnT5HSaYal7Ami3bdA.jpg,7.851924


In [199]:
s = md.apply(lambda x: pd.Series(x['genres']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'genre'
gen_md = md.drop('genres', axis=1).join(s)

In [200]:
def build_chart(genre, percentile=0.85):
    df = gen_md[gen_md['genre'] == genre]
    vote_counts = df[df['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = df[df['vote_average'].notnull()]['vote_average'].astype('int')
    C = vote_averages.mean()
    m = vote_counts.quantile(percentile)
    
    qualified = df[(df['vote_count'] >= m) & (df['vote_count'].notnull()) & (df['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity','homepage','poster_path']]
    qualified['vote_count'] = qualified['vote_count'].astype('int')
    qualified['vote_average'] = qualified['vote_average'].astype('int')
    
    qualified['wr'] = qualified.apply(lambda x: (x['vote_count']/(x['vote_count']+m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1)
    qualified = qualified.sort_values('wr', ascending=False).head(250)
    
    return qualified

In [201]:
build_chart('Romance').head(15)

Unnamed: 0,title,year,vote_count,vote_average,popularity,homepage,poster_path,wr
10309,Dilwale Dulhania Le Jayenge,1995,661,9,34.457,,/2gvbZMtV1Zsl7FedJa5ysbpBx2G.jpg,8.565285
351,Forrest Gump,1994,8147,8,48.3072,,/yE5d3BUhE8hCnkMUJOo1QDoOGNz.jpg,7.971357
876,Vertigo,1958,1162,8,18.2082,,/obhM86qyv8RsE69XSMTtT9FdE0b.jpg,7.811667
40251,Your Name.,2016,1030,8,34.461252,https://www.funimationfilms.com/movie/yourname/,/xq1Ugd62d23K2knRUx6xxuALTZB.jpg,7.789489
883,Some Like It Hot,1959,835,8,11.8451,,/pxc9EFCMYkItESpqqrI783yl8Gh.jpg,7.745154
1132,Cinema Paradiso,1988,834,8,14.177,,/xKAweeF2ZPMNn6ce4GclSbr59Pv.jpg,7.744878
19901,Paperman,2012,734,8,7.19863,http://www.disneyanimation.com/projects/shorts...,/3TpMBcAYH4cxCw5WoRacWodMTCG.jpg,7.713951
37863,Sing Street,2016,669,8,10.672862,,/s0C78plmx3dFcO3WMnoXCz56FiN.jpg,7.689483
882,The Apartment,1960,498,8,11.9943,,/6OH3HlplEGoGjqeqRmWaVBGqbav.jpg,7.599317
38718,The Handmaiden,2016,453,8,16.727405,,/wvzfK5QR6dGLwND8MCzWjsQWG4Q.jpg,7.566166


In [None]:
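genreSet = list(set(s))
# Collect the top 5 movies of each genre, starting from genreSet[2]
dfGenre = build_chart(genreSet[2]).head(5)
dfGenre['Genre'] = genreSet[2]
for i in range(3,len(genreSet)):
    try:
        x = build_chart(genreSet[i]).head(5)
        x['Genre'] = genreSet[i]
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        dfGenre = pd.concat([dfGenre, x])
    except Exception:
        continue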

In [None]:
dfGenre

In [None]:
# Fall back to a Google search for the title when a movie has no homepage
dfGenre['homepage'] = dfGenre['homepage'].fillna('https://www.google.com/search?q=' + dfGenre['title'])

In [218]:
dfGenre

Unnamed: 0,title,year,vote_count,vote_average,popularity,homepage,poster_path,wr,Genre
10309,Dilwale Dulhania Le Jayenge,1995,661,9,34.457,https://www.google.com/search?q=Dilwale Dulhan...,/2gvbZMtV1Zsl7FedJa5ysbpBx2G.jpg,8.565285,Romance
351,Forrest Gump,1994,8147,8,48.3072,https://www.google.com/search?q=Forrest Gump,/yE5d3BUhE8hCnkMUJOo1QDoOGNz.jpg,7.971357,Romance
876,Vertigo,1958,1162,8,18.2082,https://www.google.com/search?q=Vertigo,/obhM86qyv8RsE69XSMTtT9FdE0b.jpg,7.811667,Romance
40251,Your Name.,2016,1030,8,34.461252,https://www.funimationfilms.com/movie/yourname/,/xq1Ugd62d23K2knRUx6xxuALTZB.jpg,7.789489,Romance
883,Some Like It Hot,1959,835,8,11.8451,https://www.google.com/search?q=Some Like It Hot,/pxc9EFCMYkItESpqqrI783yl8Gh.jpg,7.745154,Romance
359,The Lion King,1994,5520,8,21.6058,http://movies.disney.com/the-lion-king,/bKPtXn9n4M4s8vvZrbw40mYsefB.jpg,7.909339,Animation
5481,Spirited Away,2001,3968,8,41.0489,http://movies.disney.com/spirited-away,/ynXoOxmDHNQ4UAy0oU6avW71HVW.jpg,7.875933,Animation
9698,Howl's Moving Castle,2004,2049,8,16.136,https://www.google.com/search?q=Howl's Moving ...,/iMarB2ior30OAXjPa7QIdeyUfM1.jpg,7.772103,Animation
2884,Princess Mononoke,1997,2041,8,17.1667,https://www.google.com/search?q=Princess Mononoke,/gzlJkVfWV5VEG5xK25cvFGJgkDz.jpg,7.771305,Animation
5833,My Neighbor Totoro,1988,1730,8,13.5073,https://www.google.com/search?q=My Neighbor To...,/2i0OOjvi7CqNQA6ZtYJtL65P9oZ.jpg,7.735274,Animation


In [140]:
dfGenre.to_csv("../data/top5Genre.csv")