### Rating-Based Movie Recommendation System
We could use each movie's average rating as its score, but that would not be fair: a movie with an average rating of 8.9 from only 3 votes cannot be considered better than a movie with an average rating of 7.8 from 40 votes.

So we use IMDb's weighted rating (wr) formula:

wr = (v * R + m * C) / (v + m)

Where:
- v is the number of votes the movie has received.
- R is the average rating of the movie.
- m is the minimum number of votes required for the movie to be considered in the calculation.
- C is the mean vote across the whole dataset (computed below as the mean of vote_average).
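
To see how the formula treats the two movies from the example above, here is a minimal sketch. The values m = 25 and C = 6.0 are illustrative assumptions chosen for this toy comparison, not values from the dataset, and the helper name `weighted_rating_example` is hypothetical.

```python
def weighted_rating_example(v, R, m, C):
    """IMDb-style weighted rating: wr = (v * R + m * C) / (v + m)."""
    return (v * R + m * C) / (v + m)

# Assumed, illustrative values (not taken from the dataset)
m_assumed = 25    # minimum number of votes
C_assumed = 6.0   # mean rating across all movies

few_votes = weighted_rating_example(v=3, R=8.9, m=m_assumed, C=C_assumed)
many_votes = weighted_rating_example(v=40, R=7.8, m=m_assumed, C=C_assumed)

print(f"8.9 average, 3 votes:  {few_votes:.2f}")
print(f"7.8 average, 40 votes: {many_votes:.2f}")
```

With only a few votes the score is pulled strongly toward C, so under these assumptions the 40-vote movie ends up with the higher weighted rating.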


In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('data/tmdb_5000.csv')

In [3]:
# v = vote_count, R = vote_average; C is the mean of vote_average over all movies

C = df['vote_average'].mean()
C

6.092171559442016

Next, we need an appropriate value for m, the minimum number of votes required to be listed in the chart. We use the 90th percentile of vote_count as the cutoff: to feature in the chart, a movie must have at least as many votes as 90% of the movies in the dataset.

In [4]:
m = df['vote_count'].quantile(0.9)
m

1838.4000000000015
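
The cutoff above is fractional even though vote counts are integers. By default, `Series.quantile` interpolates linearly between the two nearest data points. The sketch below shows this on a made-up series; the `toy_votes` values are invented for illustration and are not from the dataset.

```python
import pandas as pd

# Invented vote counts: nine small values and one very popular movie
toy_votes = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 1000])

# 90th percentile with the default linear interpolation
toy_m = toy_votes.quantile(0.9)

# Movies whose vote count meets the cutoff
qualifying = toy_votes[toy_votes >= toy_m]

print(toy_m)
print(len(qualifying), "of", len(toy_votes), "movies qualify")
```

Here the cutoff falls between 90 and 1000, so it is not equal to any vote count in the series.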

Keep only the movies that qualify for the chart:

In [5]:
# Select qualifying rows, then copy so that later column assignments do not touch df
q_movies = df.loc[df['vote_count'] >= m].copy()
q_movies.shape

(481, 22)

481 movies qualify for this chart. Next, we calculate the metric for each qualifying movie. To do this, we define a function, weighted_rating(), and store its result in a new feature, score.

In [6]:
def weighted_rating(x, m=m, C=C):
    v = x['vote_count']
    R = x['vote_average']
    # Calculation based on the IMDB formula
    return (v/(v+m) * R) + (m/(m+v) * C)

In [7]:
# Define a new feature 'score' and calculate its value with `weighted_rating()`
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)

In [8]:
#Sort movies based on score calculated above
q_movies = q_movies.sort_values('score', ascending=False)

#Print the top 10 movies
q_movies[['title', 'vote_count', 'vote_average', 'score']].head(10)

| index | title | vote_count | vote_average | score |
|---|---|---|---|---|
| 1881 | The Shawshank Redemption | 8205 | 8.5 | 8.059258 |
| 662 | Fight Club | 9413 | 8.3 | 7.939256 |
| 65 | The Dark Knight | 12002 | 8.2 | 7.92002 |
| 3232 | Pulp Fiction | 8428 | 8.3 | 7.904645 |
| 96 | Inception | 13752 | 8.1 | 7.863239 |
| 3337 | The Godfather | 5893 | 8.4 | 7.851236 |
| 95 | Interstellar | 10867 | 8.1 | 7.809479 |
| 809 | Forrest Gump | 7927 | 8.2 | 7.803188 |
| 329 | The Lord of the Rings: The Return of the King | 8064 | 8.1 | 7.727243 |
| 1990 | The Empire Strikes Back | 5879 | 8.2 | 7.697884 |
