# Movie Recommendations using MovieLens Dataset

## Loading the Datasets

```python
import numpy as np
import pandas as pd

df_movies = pd.read_csv('https://raw.githubusercontent.com/warriorkitty/orientlens/master/movielens/movies.csv')
df_ratings = pd.read_csv('https://raw.githubusercontent.com/warriorkitty/orientlens/master/movielens/ratings.csv')
```

```python
df_movies.head()
```

| Unnamed: 0 | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


```python
df_ratings.head()
```

| Unnamed: 0 | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 5.0 | 847117005 |
| 1 | 1 | 2 | 3.0 | 847642142 |
| 2 | 1 | 10 | 3.0 | 847641896 |
| 3 | 1 | 32 | 4.0 | 847642008 |
| 4 | 1 | 34 | 4.0 | 847641956 |
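Both tables include an `Unnamed: 0` column. pandas uses that name when the first field of a CSV header is empty, which typically means the file was saved together with its index. If that column isn't needed, `read_csv` can use it as the index via `index_col=0`. The sketch below demonstrates this on a hypothetical two-row excerpt held in memory, assuming the remote files have the same layout.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same layout as movies.csv: the header starts
# with an empty field, which pandas reports as "Unnamed: 0".
csv_text = """,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
"""

df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))  # ['Unnamed: 0', 'movieId', 'title', 'genres']

# index_col=0 turns that leading column into the DataFrame index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))  # ['movieId', 'title', 'genres']
```

The rest of the notebook only reads `movieId`, `title`, `userId` and `rating`, so either form works with the code that follows.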


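Next, we build two dictionaries from the DataFrames: `movies` maps each `movieId` to its title, and `ratings` maps each `userId` to a dictionary of `{movieId: rating}` pairs.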

```python
# movies: movieId -> title
movies = dict(zip(df_movies['movieId'], df_movies['title']))

# ratings: userId -> {movieId: rating}
# (zip over the columns is much faster than iterrows() on ~100k rows)
ratings = {}
for user, mov, rating in zip(df_ratings['userId'], df_ratings['movieId'], df_ratings['rating']):
  ratings.setdefault(user, {})[mov] = rating
```
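To make the shape of the nested `ratings` dictionary concrete, here is a minimal sketch on a hypothetical five-row table (`toy_ratings` is an illustrative name, not part of the dataset). It builds the dictionary with a single loop and, alternatively, with `DataFrame.groupby`, and checks that both give the same result.

```python
import pandas as pd

# Hypothetical miniature ratings table with the columns used in this notebook
toy_ratings = pd.DataFrame({
  'userId':  [1, 1, 2, 2, 2],
  'movieId': [1, 2, 1, 3, 4],
  'rating':  [5.0, 3.0, 4.0, 2.5, 4.5],
})

# Loop version: one pass over the rows, creating each inner dict on demand
ratings_loop = {}
for user, mov, rating in zip(toy_ratings['userId'], toy_ratings['movieId'], toy_ratings['rating']):
  ratings_loop.setdefault(user, {})[mov] = rating

# groupby version: one inner {movieId: rating} dict per user
ratings_grouped = {
  user: dict(zip(group['movieId'], group['rating']))
  for user, group in toy_ratings.groupby('userId')
}

# Both structures map user 2 to {1: 4.0, 3: 2.5, 4: 4.5}
assert ratings_grouped == ratings_loop
```

The keys are NumPy integer scalars, but they hash and compare equal to plain Python ints, so lookups such as `ratings_grouped[2]` work as expected.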

## Getting Recommendations by Collaborative Filtering
Here, we use the **Pearson correlation coefficient** to measure how similar two users are, computed over the movies both of them have rated. The idea is to recommend movies that similar users have rated highly.


We calculate the Pearson correlation using the following expression from Wikipedia:
<img src="images/pearson_formula.png">
<img src="images/pearson_variables.png">
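Written out, the computation performed by `pearsonCorrelation` corresponds to the following form, where $x_i$ and $y_i$ are the two users' ratings of the $i$-th of the $n$ movies they both rated, and $\bar{x}$, $\bar{y}$ are the means of those ratings:

$$
r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}\;\sqrt{\sum_{i=1}^{n} y_i^2 - n\,\bar{y}^2}}
$$

It is algebraically equivalent to the mean-deviation form $\sum_i (x_i - \bar{x})(y_i - \bar{y}) \big/ \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$.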

```python
from math import sqrt

def pearsonCorrelation(ratings, u1, u2):
  # Movies rated by both u1 and u2
  both_rated = set(ratings[u1]) & set(ratings[u2])

  n = len(both_rated)
  if n == 0:  # The users haven't rated any movie in common
    return 0

  # Means of each user's ratings over the common movies
  x_mean = sum(ratings[u1][mov] for mov in both_rated) / n
  y_mean = sum(ratings[u2][mov] for mov in both_rated) / n

  # Sum of products of the paired ratings
  xy_sum = sum(ratings[u1][mov] * ratings[u2][mov] for mov in both_rated)

  # Sums of squares
  x_sqsum = sum(ratings[u1][mov] ** 2 for mov in both_rated)
  y_sqsum = sum(ratings[u2][mov] ** 2 for mov in both_rated)

  numerator = xy_sum - n * x_mean * y_mean
  denominator = sqrt(x_sqsum - n * x_mean ** 2) * sqrt(y_sqsum - n * y_mean ** 2)

  if denominator == 0:  # Avoid division by zero (e.g. one user gave every common movie the same rating)
    return 0

  return numerator / denominator
```
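As a sanity check on the formula, the sketch below computes Pearson's r for a hypothetical pair of users in two ways: the sum-of-products form used in `pearsonCorrelation`, and the textbook mean-deviation form. Only the movies both users rated enter the calculation. The user names, ratings and helper functions are illustrative assumptions, not part of the notebook.

```python
from math import sqrt

# Hypothetical ratings: movies 1-3 are rated by both users; movies 4 and 5
# are rated by only one of them, so they are ignored.
toy_ratings = {
  'alice': {1: 5.0, 2: 3.0, 3: 4.0, 4: 1.0},
  'bob':   {1: 4.0, 2: 2.0, 3: 5.0, 5: 2.0},
}

def common_pairs(r, u1, u2):
  # Paired ratings (x_i, y_i) over the movies both users rated
  common = sorted(set(r[u1]) & set(r[u2]))
  return [r[u1][m] for m in common], [r[u2][m] for m in common]

def pearson_shortcut(xs, ys):
  # Sum-of-products form, the same algebra as pearsonCorrelation
  n = len(xs)
  x_mean, y_mean = sum(xs) / n, sum(ys) / n
  num = sum(x * y for x, y in zip(xs, ys)) - n * x_mean * y_mean
  den = sqrt(sum(x * x for x in xs) - n * x_mean ** 2) * sqrt(sum(y * y for y in ys) - n * y_mean ** 2)
  return num / den

def pearson_deviation(xs, ys):
  # Mean-deviation form: sum of co-deviations over the product of their norms
  x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
  cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
  x_ss = sum((x - x_mean) ** 2 for x in xs)
  y_ss = sum((y - y_mean) ** 2 for y in ys)
  return cov / sqrt(x_ss * y_ss)

xs, ys = common_pairs(toy_ratings, 'alice', 'bob')
print(xs, ys)  # [5.0, 3.0, 4.0] [4.0, 2.0, 5.0]
print(round(pearson_shortcut(xs, ys), 4), round(pearson_deviation(xs, ys), 4))  # 0.6547 0.6547
```

Both forms give $\sqrt{3/7} \approx 0.6547$ here. The sum-of-products form needs only one pass of running sums, while the deviation form is less prone to cancellation error when ratings are large relative to their spread.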

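Using the Pearson correlation scores as weights, we calculate, for every movie the user hasn't rated yet, the weighted average of the ratings given by other users whose correlation with the user is positive. This gives a predicted rating (the recommendation score) for each movie, and sorting these scores gives the best recommendations.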
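A worked example of the weighted average for a single movie, using two hypothetical neighbors (the similarities and ratings are made up for illustration):

```python
# Hypothetical neighbors who rated a movie the target user hasn't seen:
# (similarity to the target user, their rating of the movie)
neighbors = [
  (0.8, 5.0),
  (0.4, 3.0),
]

weighted_sum = sum(sim * rating for sim, rating in neighbors)  # 0.8*5.0 + 0.4*3.0 = 5.2
sim_sum = sum(sim for sim, _ in neighbors)                      # 0.8 + 0.4 = 1.2
score = weighted_sum / sim_sum                                  # 5.2 / 1.2

print(round(score, 3))  # 4.333
```

Dividing by the sum of similarities, rather than using the weighted sum directly, keeps the score within the range of the ratings being averaged, so a movie is not favored merely because more similar users happened to rate it.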

```python
def getRecommendations(ratings, user):
  totals = {}   # Similarity-weighted sum of other users' ratings, per movie
  simSums = {}  # Sum of the similarity scores used for each movie
  for other_user in ratings:
    if other_user == user:  # No need to compare the user with themselves
      continue

    similarity = pearsonCorrelation(ratings, user, other_user)
    if similarity <= 0:  # Ignore users with zero or negative correlation
      continue

    for mov in ratings[other_user]:
      # Only consider recommending movies that the user hasn't rated yet
      if mov not in ratings[user]:
        totals[mov] = totals.get(mov, 0) + ratings[other_user][mov] * similarity
        simSums[mov] = simSums.get(mov, 0) + similarity

  # Weighted average rating for each candidate movie
  rankings = [(total / simSums[mov], mov) for mov, total in totals.items()]
  rankings.sort(reverse=True)  # Sort in descending order of score

  return rankings
```
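Because `rankings` holds `(score, movieId)` tuples, `sort(reverse=True)` breaks ties between equal scores by descending `movieId`. When many movies tie at the top score, that tie-breaker decides which ones make a top-10 list. The sketch below uses hypothetical pairs to show the default order and one alternative sort key; the alternative is an assumption, not the notebook's choice.

```python
# Hypothetical (score, movieId) pairs in the shape getRecommendations builds
toy_rankings = [(5.0, 10), (4.5, 3), (5.0, 42)]

# Tuples compare element by element: by score first, then by movieId
by_default = sorted(toy_rankings, reverse=True)
print(by_default)  # [(5.0, 42), (5.0, 10), (4.5, 3)]

# Alternative: highest score first, ties broken by the smaller movieId
by_score_then_id = sorted(toy_rankings, key=lambda pair: (-pair[0], pair[1]))
print(by_score_then_id)  # [(5.0, 10), (5.0, 42), (4.5, 3)]
```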

## Printing the Top 10 Recommendations for a Few Users

```python
def printTopRecommendations(ratings, movies, user, n=10):
  rankings = getRecommendations(ratings, user)
  # Slicing shows at most n movies (fewer if there are fewer candidates)
  for score, mov in rankings[:n]:
    match = int((score / 5) * 100)  # Predicted rating as a percentage of the maximum rating (5.0)
    print(movies[mov], ' (Match =', match, '%)')
```
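The Match value divides the predicted rating by the maximum rating of 5.0 and scales it to a percentage; `int()` truncates rather than rounds. A small sketch of that arithmetic, with a hypothetical rounding variant for comparison:

```python
def match_percent(score, max_rating=5.0):
  # Same arithmetic as printTopRecommendations: int() truncates toward zero
  return int((score / max_rating) * 100)

def match_percent_rounded(score, max_rating=5.0):
  # Hypothetical variant that rounds to the nearest percent instead
  return round((score / max_rating) * 100)

print(match_percent(5.0))           # 100
print(match_percent(4.5))           # 90
print(match_percent(4.99))          # 99
print(match_percent_rounded(4.99))  # 100
```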

```python
printTopRecommendations(ratings, movies, 568, 10)
```

```
Louis C.K.: Live at The Comedy Store (2015)  (Match = 100 %)
Comedy Central Roast of Justin Bieber (2015)  (Match = 100 %)
Going Clear: Scientology and the Prison of Belief (2015)  (Match = 100 %)
Kurt Cobain: Montage of Heck (2015)  (Match = 100 %)
The Missing (2014)  (Match = 100 %)
Bill Burr: You People Are All the Same (2012)  (Match = 100 %)
A Most Violent Year (2014)  (Match = 100 %)
Mommy (2014)  (Match = 100 %)
Unbroken (2014)  (Match = 100 %)
Selma (2014)  (Match = 100 %)
```


```python
printTopRecommendations(ratings, movies, 389, 10)
```

```
Silent Movie (1976)  (Match = 100 %)
Noi the Albino (Nói albinói) (2003)  (Match = 100 %)
Some Mother's Son (1996)  (Match = 100 %)
Diva (1981)  (Match = 100 %)
Great Day in Harlem, A (1994)  (Match = 100 %)
Comedy Central Roast of Justin Bieber (2015)  (Match = 100 %)
Going Clear: Scientology and the Prison of Belief (2015)  (Match = 100 %)
Kurt Cobain: Montage of Heck (2015)  (Match = 100 %)
Louis C.K.: Live at The Comedy Store (2015)  (Match = 100 %)
The Missing (2014)  (Match = 100 %)
```
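Every recommendation in both lists has a Match of 100%. Because the score is a weighted average, a movie that every contributing (positively correlated) user rated 5.0 scores exactly 5.0, however weak those correlations are, so it can outrank a movie backed by several strongly correlated users. The sketch below reproduces that arithmetic for two hypothetical movies, using the same weighted-average and percentage formulas as `getRecommendations` and `printTopRecommendations`; the helper name and the numbers are illustrative assumptions.

```python
def weighted_score(contributions):
  # Similarity-weighted average rating, as computed in getRecommendations.
  # contributions: (similarity, rating) pairs from users with similarity > 0
  total = sum(sim * rating for sim, rating in contributions)
  sim_sum = sum(sim for sim, _ in contributions)
  return total / sim_sum

# Hypothetical movie A: a single, weakly correlated user rated it 5.0
score_a = weighted_score([(0.25, 5.0)])
# Hypothetical movie B: two strongly correlated users rated it 5.0 and 4.0
score_b = weighted_score([(0.9, 5.0), (0.8, 4.0)])

print(score_a, int((score_a / 5) * 100))            # 5.0 100
print(round(score_b, 3), int((score_b / 5) * 100))  # 4.529 90
```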
