## Simple recommenders
These recommenders offer generalized recommendations to every user, based on movie popularity and/or genre. The basic idea behind this system is that movies that are more popular and critically acclaimed will have a higher probability of being liked by the average audience.

Simple recommenders are basic systems that recommend the top items based on a certain metric or score. In this notebook, we will build a simplified clone of IMDB Top 250 Movies using metadata collected from TMDB.

The following are the steps involved:
- Decide on the metric or score to rate movies on.
- Calculate the score for every movie.
- Sort the movies based on the score and output the top results.
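
The three steps above can be sketched on a toy DataFrame. The column names `title` and `rating` and all values here are made up for illustration. The notebook itself uses the TMDB columns and a weighted rating as the score.

```python
import pandas as pd

# Toy data, purely illustrative (not taken from the TMDB dataset).
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [6.1, 8.4, 7.3, 5.0],
})

# Step 1: decide on the metric. Here the raw rating stands in for the score.
# Step 2: calculate the score for every row.
toy["score"] = toy["rating"]

# Step 3: sort by score (highest first) and keep the top results.
top2 = toy.sort_values("score", ascending=False).head(2)
print(top2[["title", "score"]])
```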

In [1]:
# importing packages
import numpy as np
import pandas as pd

In [2]:
# reading input files
# dataset: https://www.kaggle.com/tmdb/tmdb-movie-metadata
df_credits = pd.read_csv("tmdb_5000_credits.csv")
df_movies = pd.read_csv("tmdb_5000_movies.csv")
df_credits.shape, df_movies.shape

((4803, 4), (4803, 20))

In [3]:
df_movies.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800


In [4]:
df_credits.head(1)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [5]:
# initial cleaning of data
df_credits.rename(columns = {"movie_id": "id"}, inplace = True)
df_movies_merge = df_movies.merge(df_credits[['id', 'cast', 'crew']], on = 'id')
df_movies_merge.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [6]:
df_movies_merge.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'vote_average',
       'vote_count', 'cast', 'crew'],
      dtype='object')

In [7]:
df_movies_merge.drop(columns = ['homepage', 'status', 'production_countries', 'title'], inplace = True)
df_movies_merge.head(1)

Unnamed: 0,budget,genres,id,keywords,original_language,original_title,overview,popularity,production_companies,release_date,revenue,runtime,spoken_languages,tagline,vote_average,vote_count,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Enter the World of Pandora.,7.2,11800,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


Let's use IMDB's weighted rating formula as the metric/score. Mathematically, it is represented as follows:
<img src="wt_rating_formula.jpg" alt="IMDB weighted rating formula" title="Weighted rating formula" />
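
For reference, the computation in the cells that follow corresponds to the expression below. The symbol names are a notational assumption and may differ from those in the image: $v$ is `numberOfVotes`, $m$ is `minVoteReqd`, $R$ is `averageRatingMovie`, and $C$ is `meanVote`.

$$
WR = \frac{v}{v+m}\,R + \frac{m}{v+m}\,C = \frac{R\,v + C\,m}{v+m}
$$

Read this way, the score is a blend of a movie's own average $R$ and the global mean $C$, with the movie's own average gaining weight as its vote count $v$ grows relative to $m$.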

In [8]:
# assigning variables for weighted rating calculation
averageRatingMovie = df_movies_merge['vote_average']
numberOfVotes = df_movies_merge['vote_count']
minVoteReqd = df_movies_merge['vote_count'].quantile(0.70)
meanVote = df_movies_merge['vote_average'].mean()

In [9]:
df_movies_merge['weighted_rating'] = ((averageRatingMovie * numberOfVotes) + (meanVote * minVoteReqd)) / (numberOfVotes + minVoteReqd)
df_movies_merge.head(1)

Unnamed: 0,budget,genres,id,keywords,original_language,original_title,overview,popularity,production_companies,release_date,revenue,runtime,spoken_languages,tagline,vote_average,vote_count,cast,crew,weighted_rating
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Enter the World of Pandora.,7.2,11800,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...",7.148013
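
To see what this calculation does to movies with few votes, here is a small standalone sketch. The function name `weighted_rating` and the numbers (`m = 500`, `C = 6.0`, and the two made-up movies) are illustrative assumptions, not values from the dataset.

```python
def weighted_rating(R, v, m, C):
    """Same expression as the notebook cell: (R*v + C*m) / (v + m)."""
    return (R * v + C * m) / (v + m)

m, C = 500, 6.0  # hypothetical vote threshold and global mean rating

# A highly rated movie with almost no votes.
few_votes = weighted_rating(R=9.0, v=10, m=m, C=C)
# A slightly lower-rated movie with many votes.
many_votes = weighted_rating(R=8.0, v=10_000, m=m, C=C)

print(round(few_votes, 3), round(many_votes, 3))
```

With these inputs the first movie lands near 6.06, close to the assumed mean, while the second stays near 7.90, close to its own average. This is the effect that keeps obscure titles with a handful of perfect votes out of the top of the list.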


Finally, let's sort the DataFrame in descending order of weighted rating and output the title, vote count, vote average, weighted rating (score), and popularity of the top 20 movies.

In [10]:
# Best rated movies
df_movies_wt_sorted = df_movies_merge.sort_values('weighted_rating', ascending = False)
df_movies_wt_sorted[['original_title', 'vote_count', 'vote_average', 'weighted_rating', 'popularity']].head(20)

Unnamed: 0,original_title,vote_count,vote_average,weighted_rating,popularity
1881,The Shawshank Redemption,8205,8.5,8.340775,136.747729
3337,The Godfather,5893,8.4,8.192887,143.659698
662,Fight Club,9413,8.3,8.171648,146.757391
3232,Pulp Fiction,8428,8.3,8.157615,121.463076
65,The Dark Knight,12002,8.2,8.102674,187.322927
809,Forrest Gump,7927,8.2,8.056059,138.133331
1818,Schindler's List,4329,8.3,8.038748,104.469351
3865,Whiplash,4254,8.3,8.034695,192.528841
96,Inception,13752,8.1,8.018611,167.58371
1990,The Empire Strikes Back,5879,8.2,8.010426,78.51783


The top of the list is filled with widely acclaimed films such as The Shawshank Redemption, The Godfather, and Pulp Fiction, so the `simple recommender` produces sensible results.

#### Recommendation based on scaled weighted rating and popularity score (each weighted 50%)
Next, bring popularity into the ranking: scale the weighted rating (computed with the IMDB formula from above) and the popularity to a common 0 to 1 range, average the two with equal weights, sort the movies by this combined score, and return the top results.

In [11]:
from sklearn.preprocessing import MinMaxScaler

scaling = MinMaxScaler()
df_movies_col_scaled = scaling.fit_transform(df_movies_merge[['weighted_rating','popularity']])
df_movies_col_scaled

array([[0.6743388 , 0.17181451],
       [0.5814027 , 0.15884603],
       [0.43627257, 0.12263486],
       ...,
       [0.38859469, 0.00164973],
       [0.38478644, 0.00097879],
       [0.38758191, 0.00220412]])
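
With its default settings, `MinMaxScaler` rescales each column independently to the range [0, 1] using `(x - min) / (max - min)`. A minimal NumPy sketch of that computation follows. The array `X` holds made-up numbers for illustration only.

```python
import numpy as np

# Two made-up columns standing in for weighted_rating and popularity.
X = np.array([
    [6.0, 10.0],
    [7.0, 50.0],
    [8.0, 210.0],
])

col_min = X.min(axis=0)
col_max = X.max(axis=0)

# Column-wise min-max scaling: each column's smallest value maps to 0
# and its largest value maps to 1.
X_scaled = (X - col_min) / (col_max - col_min)
print(X_scaled)
```

Because each column is scaled by its own range, a single very large value in a column compresses all the other values in that column toward 0.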

In [12]:
df_movies_norm = pd.DataFrame(df_movies_col_scaled, columns=['norm_weighted_rating','norm_popularity'])
df_movies_norm.head(2)

Unnamed: 0,norm_weighted_rating,norm_popularity
0,0.674339,0.171815
1,0.581403,0.158846


In [13]:
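# assigning 50% weights to both features
# .to_numpy() copies the values by position, so the differing column names in df_movies_norm do not matter
df_movies_merge[['normalized_weighted_rating','normalized_popularity']] = df_movies_norm.to_numpy()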
df_movies_merge['score'] = df_movies_merge['normalized_weighted_rating'] * 0.5 + df_movies_merge['normalized_popularity'] * 0.5
df_scored = df_movies_merge.sort_values(['score'], ascending=False)
df_scored[['original_title', 'normalized_weighted_rating', 'normalized_popularity', 'score']].head(20)

Unnamed: 0,original_title,normalized_weighted_rating,normalized_popularity,score
95,Interstellar,0.906439,0.827162,0.866801
546,Minions,0.46063,1.0,0.730315
94,Guardians of the Galaxy,0.851874,0.549462,0.700668
788,Deadpool,0.725217,0.58769,0.656453
127,Mad Max: Fury Road,0.670973,0.495989,0.583481
1881,The Shawshank Redemption,1.0,0.156179,0.57809
65,The Dark Knight,0.934991,0.213941,0.574466
3865,Whiplash,0.916431,0.219887,0.568159
3337,The Godfather,0.959622,0.164074,0.561848
662,Fight Club,0.953823,0.167611,0.560717
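
The 50/50 split is one choice among many. As a sketch, the blend can be wrapped in a small helper with an adjustable weight. The function `blend_scores` and its `rating_weight` parameter are hypothetical names introduced here, and the three sample rows are copied from the normalized values shown above.

```python
import pandas as pd

def blend_scores(df, rating_weight=0.5):
    """Weighted average of the two normalized columns; the weights sum to 1."""
    popularity_weight = 1.0 - rating_weight
    return (df["normalized_weighted_rating"] * rating_weight
            + df["normalized_popularity"] * popularity_weight)

sample = pd.DataFrame({
    "original_title": ["Interstellar", "Minions", "The Shawshank Redemption"],
    "normalized_weighted_rating": [0.906439, 0.460630, 1.000000],
    "normalized_popularity": [0.827162, 1.000000, 0.156179],
})

sample["score_50_50"] = blend_scores(sample, rating_weight=0.5)
sample["score_80_20"] = blend_scores(sample, rating_weight=0.8)

ranked = sample.sort_values("score_80_20", ascending=False)
print(ranked[["original_title", "score_50_50", "score_80_20"]])
```

With more weight on the rating, The Shawshank Redemption moves ahead of Minions in this sample, which shows how sensitive the ranking is to the chosen weights.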
