## Popularity-Based Model

The first model recommends the movies that are most popular across our entire movie database. Movies that are liked by a majority of viewers are a good starting point for a basic model.
<br><br>

For example, The Shawshank Redemption, The Godfather, and The Dark Knight are the top three rated movies on IMDb (https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating). These movies have broad appeal, so there is a high probability that anyone we recommend them to will enjoy them as well.
<br><br>
But if we simply rank movies by their average rating, we run into a problem. Suppose there is a new movie, or a movie that very few people have rated, and everyone who rated it gave it 5 stars. Even though its average is higher than that of The Shawshank Redemption, it would be a mistake to call it a popular movie, because there are too few data points to support the claim.
<br><br>
So, to make the recommendations fair, we use a weighted popularity calculation, which pulls the rating of a movie with few votes toward the overall mean rating.
<pre>
            WR = (v / (v + m)) × R + (m / (v + m)) × C

            Where:

            WR = weighted rating for the movie

            R = average (mean) rating for the movie

            v = number of votes (ratings) for the movie

            m = minimum number of votes required to be listed in the Top Rated list

            C = the mean rating across the whole dataset
</pre>
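As a quick sanity check of the formula, the sketch below plugs in the values this notebook reports for movieId 4 (R ≈ 2.357143, v = 7) together with m = 9 and C ≈ 3.501557. The helper name `weighted_rating` and the single-vote movie are assumptions made for illustration; they are not part of the notebook.

```python
def weighted_rating(R: float, v: int, m: float, C: float) -> float:
    """Shrink a movie's mean rating R toward the global mean C.

    The fewer votes v a movie has relative to m, the more weight C gets.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# Values taken from the notebook's own output for movieId 4.
wr_low_votes = weighted_rating(R=2.357143, v=7, m=9.0, C=3.501556983616962)
print(round(wr_low_votes, 6))

# Hypothetical movie with a single 5-star vote: its score stays close to C.
wr_single_vote = weighted_rating(R=5.0, v=1, m=9.0, C=3.501556983616962)
print(round(wr_single_vote, 6))
```

With only one vote, the perfect average of 5.0 contributes a weight of 1/10, so the weighted rating stays near the global mean instead of jumping to the top of the list.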

In [1]:
#import
import pandas as pd

In [2]:
ratings = pd.read_csv('Data/ratings.csv')
ratings

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


In [3]:
## Making a list of all the movie ids
movies = pd.read_csv('Data/movies.csv')
movieIds = movies['movieId']

In [4]:
## Calculating R - average rating for the movie
R_series = ratings.groupby(['movieId'])['rating'].mean()
R = pd.DataFrame({'movieId':R_series.index, 'R': R_series.values})

In [5]:
## Calculating v - number of votes for each movie
v_series = ratings['movieId'].value_counts()
v = pd.DataFrame({'movieId':v_series.index, 'v':v_series.values})

In [6]:
## Joining v and R on movieId
wr = pd.merge(R, v , on='movieId', how='inner')

In [7]:
wr

Unnamed: 0,movieId,R,v
0,1,3.920930,215
1,2,3.431818,110
2,3,3.259615,52
3,4,2.357143,7
4,5,3.071429,49
...,...,...,...
9719,193581,4.000000,1
9720,193583,3.500000,1
9721,193585,3.500000,1
9722,193587,3.500000,1


In [8]:
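## Calculating m - minimum votes required for a movie to be considered
## We use the 75th percentile of the per-movie vote counts for this value
## Note: passing a list to quantile() returns a one-element Series, not a scalar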
m = v_series.quantile([.75])
m

0.75    9.0
Name: movieId, dtype: float64
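The output above is a one-element Series because `quantile` was called with a list. The sketch below, using made-up vote counts rather than the notebook's data, shows the difference between the list and scalar forms. This matters for the next step: arithmetic between two Series aligns on index labels, so a Series indexed by `0.75` does not broadcast against a column indexed by row number.

```python
import pandas as pd

# Hypothetical vote counts, not the notebook's data.
counts = pd.Series([1, 2, 3, 50, 200])

m_series = counts.quantile([0.75])  # list argument -> Series indexed by 0.75
m_scalar = counts.quantile(0.75)    # float argument -> a single number

print(type(m_series).__name__)
print(float(m_scalar))

# Adding the Series form to a column aligns on labels and yields only NaN,
# while the scalar form broadcasts to every row.
misaligned = counts + m_series
broadcast = counts + m_scalar
```

Either `m.values` or a scalar `m` avoids the alignment problem; the scalar form is arguably the simpler choice when only one quantile is needed.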

In [9]:
## Calculating C - the mean rating across the whole dataset
C = ratings['rating'].mean()
C

3.501556983616962

In [11]:
# Calculate the weighted rating for every movie.
# m is a one-element Series indexed by 0.75, so m.values is used to broadcast it;
# using m directly would align on index labels and fill the column with NaN.
wr['weightedRating'] = ((wr['v'] / (wr['v']+m.values)) * wr['R']) + ((m.values/(wr['v']+m.values)) * C)
wr

Unnamed: 0,movieId,R,v,weightedRating
0,1,3.920930,215,3.904080
1,2,3.431818,110,3.437093
2,3,3.259615,52,3.295312
3,4,2.357143,7,3.000876
4,5,3.071429,49,3.138173
...,...,...,...,...
9719,193581,4.000000,1,3.551401
9720,193583,3.500000,1,3.501401
9721,193585,3.500000,1,3.501401
9722,193587,3.500000,1,3.501401
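The steps in cells 4 through 11 could also be condensed into one function. The following is a minimal sketch under stated assumptions, not the notebook's own code: the function name `weighted_ratings`, its `quantile` parameter, and the toy ratings table are all hypothetical, and named aggregation is swapped in for the separate `groupby` and `value_counts` calls.

```python
import pandas as pd


def weighted_ratings(ratings: pd.DataFrame, quantile: float = 0.75) -> pd.DataFrame:
    """Return one row per movie with R, v and weightedRating (hypothetical helper)."""
    # R = mean rating per movie, v = number of ratings per movie.
    stats = (
        ratings.groupby("movieId")["rating"]
        .agg(R="mean", v="count")
        .reset_index()
    )
    m = stats["v"].quantile(quantile)  # scalar vote threshold
    C = ratings["rating"].mean()       # mean rating over the whole dataset
    # Algebraically the same as (v / (v + m)) * R + (m / (v + m)) * C.
    stats["weightedRating"] = (stats["v"] * stats["R"] + m * C) / (stats["v"] + m)
    return stats


# Toy data (assumption): movie 10 has four ratings, movie 20 a single 5-star rating.
toy = pd.DataFrame({
    "userId": [1, 2, 3, 4, 1],
    "movieId": [10, 10, 10, 10, 20],
    "rating": [4.0, 4.0, 5.0, 3.0, 5.0],
})
result = weighted_ratings(toy)
print(result)
```

On real data the function would be called as `weighted_ratings(ratings)`. In the toy table, the single-vote movie ends up with a weighted rating below its raw average of 5.0.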


In [14]:
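# Print the titles of the 20 movies with the highest weighted rating.
# The loop variable is named movie_id so it does not overwrite the threshold m.
for movie_id in wr.nlargest(20, 'weightedRating').movieId:
    print(movies[movies.movieId==movie_id].title)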

277    Shawshank Redemption, The (1994)
Name: title, dtype: object
659    Godfather, The (1972)
Name: title, dtype: object
2226    Fight Club (1999)
Name: title, dtype: object
922    Godfather: Part II, The (1974)
Name: title, dtype: object
46    Usual Suspects, The (1995)
Name: title, dtype: object
224    Star Wars: Episode IV - A New Hope (1977)
Name: title, dtype: object
602    Dr. Strangelove or: How I Learned to Stop Worr...
Name: title, dtype: object
914    Goodfellas (1990)
Name: title, dtype: object
461    Schindler's List (1993)
Name: title, dtype: object
6710    Dark Knight, The (2008)
Name: title, dtype: object
6315    Departed, The (2006)
Name: title, dtype: object
899    Princess Bride, The (1987)
Name: title, dtype: object
686    Rear Window (1954)
Name: title, dtype: object
898    Star Wars: Episode V - The Empire Strikes Back...
Name: title, dtype: object
694    Casablanca (1942)
Name: title, dtype: object
257    Pulp Fiction (1994)
Name: title, dtype: object
900    Rai