# Recommender Systems

Based on https://www.datacamp.com/tutorial/recommender-systems-python


## Import MovieLens dataset

In [1]:
import pandas as pd

metadata = pd.read_csv('resources/movies/movies_metadata.csv', low_memory=False)
pd.DataFrame(metadata.columns, columns=['columns'])

Unnamed: 0,columns
0,adult
1,belongs_to_collection
2,budget
3,genres
4,homepage
5,id
6,imdb_id
7,original_language
8,original_title
9,overview


## Simple Recommendation

$\text{Weighted Rating} = \left(\frac{v}{v + m} \cdot R\right) + \left(\frac{m}{v + m} \cdot C\right)$

In the above equation,

* v is the number of votes for the movie;
* m is the minimum votes required to be listed in the chart;
* R is the average rating of the movie;
* C is the mean vote across the whole dataset.
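
To make the formula concrete, here is a minimal sketch that evaluates it for a single movie. The helper name `weighted_rating_scalar` and all numbers are illustrative assumptions, not values from the dataset. It shows how a movie with few votes is pulled toward the global mean C, while a movie with many votes keeps a score close to its own average R.

```python
def weighted_rating_scalar(v, R, m, C):
    """Blend a movie's own average R with the global mean C, weighted by its vote count v."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Illustrative values only (assumptions, not taken from the dataset).
m, C = 100, 6.0

# The same average rating R = 9.0 with increasing numbers of votes.
for v in (10, 100, 10_000):
    print(v, round(weighted_rating_scalar(v, R=9.0, m=m, C=C), 3))
```

When v equals m, both weights are 0.5, so the score lands exactly halfway between R and C (7.5 in this example).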

In [2]:
m = metadata.vote_count.quantile(0.9)
C = metadata.vote_average.mean()
print(f'Minimum Votes needed (90th percentile): {m}\nMean Vote: {C:.3f}')


Minimum Votes needed (90th percentile): 160.0
Mean Vote: 5.618
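
As a rough illustration of what `quantile(0.9)` does, the sketch below applies it to a small, made-up series of vote counts (the data is an assumption for demonstration only). By default pandas interpolates linearly between neighbouring values, so the cutoff need not be a value that appears in the data.

```python
import pandas as pd

# Hypothetical vote counts for ten movies (not from the dataset).
votes = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

cutoff = votes.quantile(0.9)    # 90th percentile, linear interpolation by default
kept = votes[votes >= cutoff]   # movies that meet the minimum-vote threshold

print(cutoff, len(kept))
```

Only the movies at or above the cutoff remain, which is roughly the top 10% by vote count.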


In [3]:
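def weighted_rating(x, m=m, C=C):
    """Return the weighted rating for a single movie row `x`."""
    v = x['vote_count']    # number of votes for the movie
    R = x['vote_average']  # average rating of the movie
    return (v / (v + m) * R) + (m / (v + m) * C)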

Filter out all movies that have fewer than **m** votes:

In [4]:
q_movies = metadata.loc[metadata.vote_count >= m].copy()
q_movies.shape


(4555, 24)

In [5]:
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)
# Vectorized alternative to the row-wise apply above:
# q_movies['score'] = q_movies.vote_count / (q_movies.vote_count + m) * q_movies.vote_average + (m / (q_movies.vote_count + m) * C)
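
The row-wise `apply` and the column-wise expression should give the same scores. The sketch below checks this on a tiny stand-in frame. The names `demo`, `m_demo`, `C_demo`, and `weighted_rating_row` and all values are hypothetical, chosen only so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for q_movies; m_demo and C_demo are illustrative values.
demo = pd.DataFrame({'vote_count': [200.0, 1000.0, 5000.0],
                     'vote_average': [7.0, 8.0, 6.5]})
m_demo, C_demo = 160.0, 5.6

def weighted_rating_row(x, m=m_demo, C=C_demo):
    # Same formula as weighted_rating, applied to one row at a time.
    v, R = x['vote_count'], x['vote_average']
    return (v / (v + m) * R) + (m / (v + m) * C)

# Row-wise: calls the Python function once per row.
row_wise = demo.apply(weighted_rating_row, axis=1)

# Column-wise: the same arithmetic on whole columns at once.
v, R = demo['vote_count'], demo['vote_average']
vectorized = v / (v + m_demo) * R + m_demo / (v + m_demo) * C_demo

print(np.allclose(row_wise, vectorized))
```

On large frames the column-wise form is usually faster, because it avoids a Python-level function call for every row.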


In [6]:
q_movies = q_movies.sort_values('score', ascending=False)
q_movies[['title', 'vote_count', 'vote_average', 'score']].head(10)


Unnamed: 0,title,vote_count,vote_average,score
314,The Shawshank Redemption,8358.0,8.5,8.445869
834,The Godfather,6024.0,8.5,8.425439
10309,Dilwale Dulhania Le Jayenge,661.0,9.1,8.421453
12481,The Dark Knight,12269.0,8.3,8.265477
2843,Fight Club,9678.0,8.3,8.256385
292,Pulp Fiction,8670.0,8.3,8.251406
522,Schindler's List,4436.0,8.3,8.206639
23673,Whiplash,4376.0,8.3,8.205404
5481,Spirited Away,3968.0,8.3,8.196055
2211,Life Is Beautiful,3643.0,8.3,8.187171
