# Lab 10: Recommender System

In this assignment, we will implement user-based and item-based collaborative filtering (CF).

## 1. Dataset

In this assignment, we will use the MovieLens-100K dataset. It contains about 100,000 ratings from roughly 1,000 users on roughly 1,700 movies.

In [2]:
from math import sqrt
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.metrics.pairwise import linear_kernel
from sklearn.neighbors import NearestNeighbors


# 1. load data
user_ratings_train = pd.read_csv('./ml-100k/u1.base',
                            sep='\t',names=['user_id','movie_id','rating'], usecols=[0,1,2])

user_ratings_test = pd.read_csv('./ml-100k/u1.test',
                            sep='\t',names=['user_id','movie_id','rating'], usecols=[0,1,2])

movie_info =  pd.read_csv('./ml-100k/u.item', 
                          sep='|', names=['movie_id','title'], usecols=[0,1],
                          encoding="ISO-8859-1")

user_ratings_train = pd.merge(movie_info, user_ratings_train)
user_ratings_test = pd.merge(movie_info, user_ratings_test)

# 2. get the rating matrix. Each row is a user, and each column is a movie.
user_ratings_train = user_ratings_train.pivot_table(index=['user_id'],
                                        columns=['title'],
                                        values='rating')

user_ratings_test = user_ratings_test.pivot_table(index=['user_id'],
                                        columns=['title'],
                                        values='rating')




# 3. align both matrices on the same users (rows) and movies (columns),
#    so that a missing rating is NaN in either matrix.
all_users = user_ratings_train.index.union(user_ratings_test.index)
all_movies = user_ratings_train.columns.union(user_ratings_test.columns)

user_ratings_train = user_ratings_train.reindex(index=all_users, columns=all_movies)
user_ratings_test = user_ratings_test.reindex(index=all_users, columns=all_movies)

print(user_ratings_train.shape)
print(user_ratings_test.shape)

(943, 1664)
(943, 1664)
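The pivot-and-align step above can be tried on a toy example first. This is a minimal sketch on made-up data: the frames `toy_train` and `toy_test` and their ratings are hypothetical, not taken from MovieLens. It also shows how to measure how sparse the resulting matrix is.

```python
import pandas as pd

# Hypothetical toy ratings, in the same long format as u1.base / u1.test.
toy_train = pd.DataFrame({
    "user_id": [1, 1, 2],
    "title": ["A", "B", "A"],
    "rating": [5, 3, 4],
})
toy_test = pd.DataFrame({
    "user_id": [2, 3],
    "title": ["B", "C"],
    "rating": [2, 1],
})

# Long format -> user x movie matrix; unrated cells become NaN.
train_matrix = toy_train.pivot_table(index="user_id", columns="title", values="rating")
test_matrix = toy_test.pivot_table(index="user_id", columns="title", values="rating")

# Align both matrices on the union of users and movies.
all_users = train_matrix.index.union(test_matrix.index)
all_movies = train_matrix.columns.union(test_matrix.columns)
train_matrix = train_matrix.reindex(index=all_users, columns=all_movies)
test_matrix = test_matrix.reindex(index=all_users, columns=all_movies)

# Fraction of cells that hold an observed rating.
train_density = train_matrix.notna().to_numpy().mean()
print(train_matrix.shape, test_matrix.shape, train_density)
```

After alignment both matrices share the same shape, which is why the two printed shapes above are identical. Most cells are NaN, so any similarity measure has to decide how to treat unrated movies.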


## Task 1. User-based CF

* Use Pearson correlation to compute the similarity between users.
* Based on the similarity scores, predict the ratings. You can use either the 5 or the 10 nearest neighbors.
* Compute the mean absolute error (MAE) on the test set.
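The three steps above can be sketched on a toy matrix as follows. This is one possible approach, not the reference solution. It makes several assumptions: the similarity is the Pearson correlation over co-rated movies only (unrated cells stay NaN), only positively correlated neighbors are used, and the prediction is the user's mean plus a similarity-weighted average of the neighbors' mean-centered ratings. The helper names `predict_user_based` and `user_based_mae` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def predict_user_based(train, user_means, sim, user, item, k=5):
    """Predict the rating of `user` for `item` from the k most similar users."""
    rated = train[item].dropna().index                  # users who rated the item
    candidates = sim.loc[user, rated].drop(user, errors="ignore").dropna()
    neighbours = candidates[candidates > 0].nlargest(k)  # assumption: positive only
    if neighbours.empty:
        return float(user_means.loc[user])               # fall back to the user's mean
    deviations = train.loc[neighbours.index, item] - user_means.loc[neighbours.index]
    offset = (neighbours * deviations).sum() / neighbours.abs().sum()
    return float(user_means.loc[user] + offset)


def user_based_mae(train, test, k=5):
    """Mean absolute error over every observed (non-NaN) cell of `test`."""
    user_means = train.mean(axis=1)          # NaN cells are skipped
    sim = train.T.corr(method="pearson")     # user-user, over co-rated movies
    errors = []
    for user, row in test.iterrows():
        for item, actual in row.dropna().items():
            pred = predict_user_based(train, user_means, sim, user, item, k)
            errors.append(abs(pred - actual))
    return float(np.mean(errors))


# Hypothetical toy data: 4 users x 4 movies, NaN means "not rated".
items = ["A", "B", "C", "D"]
toy_train = pd.DataFrame(
    [[5, 4, 1, np.nan], [4, 5, 2, 5], [1, 2, 5, 1], [5, np.nan, 1, 4]],
    index=[1, 2, 3, 4], columns=items,
)
toy_test = pd.DataFrame(np.nan, index=toy_train.index, columns=items)
toy_test.loc[1, "D"] = 5   # the single held-out rating

toy_mae = user_based_mae(toy_train, toy_test, k=2)
print(round(toy_mae, 3))
```

On the full matrices, the same functions could be called with `user_ratings_train` and `user_ratings_test` before they are filled with zeros. The double loop is slow but easy to follow.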

In [75]:
# your code
import math

# Treat unrated movies as 0 so the matrices are dense.
# Note: the per-user means and the correlations below therefore include unrated movies.
user_ratings_train = user_ratings_train.fillna(0)
user_ratings_test = user_ratings_test.fillna(0)

# Per-user mean rating, then mean-center each row.
train_mean = user_ratings_train.mean(axis=1)
test_mean = user_ratings_test.mean(axis=1)

train_means_subtracted = user_ratings_train.sub(train_mean, axis=0)
test_means_subtracted = user_ratings_test.sub(test_mean, axis=0)

# Pearson similarity between user 1 and every other user.
similarity = []

centered_1 = train_means_subtracted.loc[1]
denominator_0 = math.sqrt(np.square(centered_1).sum())

# Iterate over every user id after user 1, including the last one.
for i in train_means_subtracted.index[1:]:
    centered_i = train_means_subtracted.loc[i]
    numerator = (centered_1 * centered_i).sum()
    denominator_i = math.sqrt(np.square(centered_i).sum())
    similarity.append(numerator / (denominator_0 * denominator_i))
    
     
        




[0.06023133550939565, 0.0217585221493863, -0.0028237515721901414, 0.14656521317545895, 0.2337078388084833, 0.12184548473856181, 0.06722637292591452, 0.04073439767659206, 0.14849943814687058, 0.1032164913065394, 0.14635594271403227, 0.1583994499723617, 0.11942411319834234, 0.10396175268221255, 0.2059404194388766, 0.09665501083196565, 0.23467263447490413, 0.03245186954049099, 0.08906373042659763, 0.08088909018290424, 0.13876728419503104, 0.20641786048630123, 0.16332016580165476, 0.11888342903689968, 0.0750149741950792, 0.11056411169746798, 0.03754345216853656, 0.05199845974872769, 0.06502813793278132, 0.07796024807467745, 0.13532041033409375, 0.00018201863591738054, -0.01981620844325676, -0.020725264243110494, -0.01052050200618659, 0.10018648381407691, -0.001227832690646877, 0.04068451280844907, 0.014132217497557751, 0.20462112633728993, 0.1593798988520586, 0.14145387907823243, 0.20738920375838166, 0.11098629809623621, 0.053295396249772656, 0.0049739244578022, 0.022549384804283517, 0.087

## Task 2. Item-based CF
* Use cosine similarity to compute the similarity between items.
* Based on the similarity scores, predict the ratings. You can use either the 5 or the 10 nearest neighbors.
* Compute the mean absolute error (MAE) on the test set.
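A minimal sketch of these steps on a toy matrix is given below. It is one possible approach under stated assumptions, not the reference solution. Unrated cells are treated as 0 when computing cosine similarity, and a prediction is the similarity-weighted average of the user's own ratings on the k movies most similar to the target. The helper names `item_cosine_similarity`, `predict_item_based`, and `item_based_mae` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def item_cosine_similarity(train):
    """Item-item cosine similarity, treating unrated cells as 0 (an assumption)."""
    filled = train.fillna(0).to_numpy(dtype=float)
    norms = np.linalg.norm(filled, axis=0)   # one norm per movie column
    norms[norms == 0] = 1.0                  # avoid dividing by zero for unrated movies
    unit = filled / norms
    return pd.DataFrame(unit.T @ unit, index=train.columns, columns=train.columns)


def predict_item_based(train, sim, user, item, k=5):
    """Predict from the user's own ratings on the k movies most similar to `item`."""
    rated = train.loc[user].dropna().drop(item, errors="ignore")
    neighbours = sim.loc[item, rated.index].nlargest(k)
    neighbours = neighbours[neighbours > 0]
    if neighbours.empty:
        return float(train.loc[user].mean())  # fall back to the user's mean rating
    weighted = (neighbours * rated.loc[neighbours.index]).sum()
    return float(weighted / neighbours.sum())


def item_based_mae(train, test, k=5):
    """Mean absolute error over every observed (non-NaN) cell of `test`."""
    sim = item_cosine_similarity(train)
    errors = []
    for user, row in test.iterrows():
        for item, actual in row.dropna().items():
            errors.append(abs(predict_item_based(train, sim, user, item, k) - actual))
    return float(np.mean(errors))


# Hypothetical toy data: 4 users x 4 movies, NaN means "not rated".
items = ["A", "B", "C", "D"]
toy_train = pd.DataFrame(
    [[5, 4, 1, np.nan], [4, 5, 2, 5], [1, 2, 5, 1], [5, np.nan, 1, 4]],
    index=[1, 2, 3, 4], columns=items,
)
toy_test = pd.DataFrame(np.nan, index=toy_train.index, columns=items)
toy_test.loc[1, "D"] = 5   # the single held-out rating

toy_mae = item_based_mae(toy_train, toy_test, k=2)
print(round(toy_mae, 3))
```

Because each prediction is a weighted average of the user's existing ratings, it always stays within the range of ratings that user has already given.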

In [23]:
# your code