# Lab 8: Recommender System

In this assignment, we will implement two recommendation approaches: user-based collaborative filtering and item-based collaborative filtering.

## 1. Dataset

In this assignment, we will use the MovieLens-100K dataset. It includes 100,000 ratings (on a 1-5 scale) from 943 users on 1,682 movies.

In [31]:
from math import sqrt
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.metrics.pairwise import linear_kernel
from sklearn.neighbors import NearestNeighbors


# 1. load data
user_ratings_train = pd.read_csv('./ml-100k/u1.base',
                            sep='\t',names=['user_id','movie_id','rating'], usecols=[0,1,2])

user_ratings_test = pd.read_csv('./ml-100k/u1.test',
                            sep='\t',names=['user_id','movie_id','rating'], usecols=[0,1,2])

movie_info =  pd.read_csv('./ml-100k/u.item', 
                          sep='|', names=['movie_id','title'], usecols=[0,1],
                          encoding="ISO-8859-1")

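# attach movie titles to the ratings (join explicitly on the shared key)
user_ratings_train = pd.merge(movie_info, user_ratings_train, on='movie_id')
user_ratings_test = pd.merge(movie_info, user_ratings_test, on='movie_id')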

# 2. get the rating matrix. Each row is a user, and each column is a movie.
user_ratings_train = user_ratings_train.pivot_table(index=['user_id'],
                                        columns=['title'],
                                        values='rating')

user_ratings_test = user_ratings_test.pivot_table(index=['user_id'],
                                        columns=['title'],
                                        values='rating')




# 3. align both matrices on the same users (rows) and movies (columns)
all_users = user_ratings_train.index.union(user_ratings_test.index)
all_movies = user_ratings_train.columns.union(user_ratings_test.columns)

user_ratings_train = user_ratings_train.reindex(index=all_users, columns=all_movies)
user_ratings_test = user_ratings_test.reindex(index=all_users, columns=all_movies)

print(user_ratings_train.shape)
print(user_ratings_test.shape)

(943, 1664)
(943, 1664)


## Task 1. User-based CF

* Use Pearson correlation to compute the similarity between users.
* Based on the similarity scores, predict the ratings in the testing set. You can use the 5 or the 10 nearest neighbors.
* Compute the mean absolute error (MAE) on the testing set.
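
The steps above can be sketched on a tiny rating matrix before running them on the full data. This is only one possible formulation, not the required solution: the helper names `predict_user_based` and `mean_absolute_error`, the toy matrix, the mean-centered weighted average, and the choice to keep only positively correlated neighbors are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def predict_user_based(train, user, item, k=5):
    """Predict train.loc[user, item] from the k most similar users (Pearson)."""
    user_means = train.mean(axis=1)  # per-user mean rating, NaN ignored
    # DataFrame.corr correlates columns, so transpose to correlate users.
    sims = train.T.corr(method="pearson", min_periods=2)[user].drop(user)
    # Candidate neighbors: users who actually rated the item.
    raters = train.index[train[item].notna()]
    sims = sims.reindex(raters).dropna()
    neighbors = sims[sims > 0].nlargest(k)  # assumption: positive similarity only
    if neighbors.empty:
        return user_means[user]  # fallback: the user's own mean rating
    deviations = train.loc[neighbors.index, item] - user_means[neighbors.index]
    return user_means[user] + (neighbors * deviations).sum() / neighbors.sum()


def mean_absolute_error(truth, pred):
    """Average absolute difference between true and predicted ratings."""
    truth, pred = np.asarray(truth, dtype=float), np.asarray(pred, dtype=float)
    return np.abs(truth - pred).mean()


# Hypothetical toy data: rows are users, columns are movies, NaN = not rated.
toy = pd.DataFrame(
    {"a": [5, 4, 1], "b": [3, 2, 5], "c": [4, 3, 2], "d": [np.nan, 4, 1]},
    index=["u1", "u2", "u3"],
).astype(float)

pred = predict_user_based(toy, "u1", "d", k=5)
print(pred)
```

Here `u2` is the only positively correlated neighbor of `u1`, so the prediction is `u1`'s mean rating (4.0) plus `u2`'s deviation from its own mean on movie `d` (4 - 3.25 = 0.75), which gives about 4.75.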

In [36]:

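# Pearson similarity between every pair of users, over co-rated movies only.
# Ratings are 1-5, so a 0 can only be a filled-in missing value; treat it as NaN.
train = user_ratings_train.replace(0, np.nan)
user_sim = train.T.corr(method='pearson', min_periods=2)
# per-user mean rating; users without training ratings fall back to the global mean
user_means = train.mean(axis=1).fillna(train.stack().mean())

k = 10  # number of nearest neighbors
truth_values = []
pred_values = []
for user_id, test_row in user_ratings_test.iterrows():
    sims = user_sim[user_id].dropna()
    sims = sims[sims > 0].drop(user_id, errors='ignore')
    for title, truth in test_row.dropna().items():
        # k most similar users who rated this movie in the training set
        raters = train.index[train[title].notna()]
        neighbors = sims.reindex(raters).dropna().nlargest(k)
        pred = user_means[user_id]  # fallback: the user's own mean rating
        if not neighbors.empty:
            deviations = train.loc[neighbors.index, title] - user_means[neighbors.index]
            pred += (neighbors * deviations).sum() / neighbors.sum()
        truth_values.append(truth)
        pred_values.append(min(max(pred, 1), 5))  # clip to the rating scale

mae = np.mean(np.abs(np.array(truth_values) - np.array(pred_values)))
print('User-based CF MAE:', mae)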

In [37]:
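# Note: DataFrame.corr() correlates columns, so this is a movie-by-movie matrix.
# For user-user similarity, correlate the transpose: user_ratings_train.T.corr()
user_ratings_train.corr()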

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
'Til There Was You (1997),1.000000,-0.003942,-0.007521,0.008973,0.199314,0.112919,-0.019809,-0.012876,-0.003552,-0.017973,...,,-0.005335,-0.002510,-0.033287,0.056328,0.096290,-0.013635,-0.004496,0.257413,-0.002510
1-900 (1994),-0.003942,1.000000,-0.015482,-0.017549,-0.009814,-0.008317,-0.013153,0.019911,-0.002358,-0.011934,...,,0.019656,-0.001667,0.016273,-0.014746,-0.009560,0.001110,-0.002985,0.113069,-0.001667
101 Dalmatians (1996),-0.007521,-0.015482,1.000000,0.063267,-0.024281,0.058784,0.107000,0.056892,-0.013952,0.037125,...,,-0.004130,0.025765,0.165362,0.091784,0.045611,-0.022228,0.117922,-0.015482,-0.009860
12 Angry Men (1957),0.008973,-0.017549,0.063267,1.000000,-0.014412,0.083147,0.169075,0.254716,0.051972,0.298771,...,,-0.023753,-0.011177,0.272342,0.124054,0.094188,0.037182,-0.003929,-0.017549,0.084638
187 (1997),0.199314,-0.009814,-0.024281,-0.014412,1.000000,0.077652,-0.016475,-0.010231,0.061833,-0.034333,...,,0.151848,-0.006250,-0.033014,0.006570,0.013975,0.084908,-0.011194,0.107828,0.093649
2 Days in the Valley (1996),0.112919,-0.008317,0.058784,0.083147,0.077652,1.000000,0.053702,0.080852,0.058557,-0.010489,...,,-0.016268,0.058359,0.060634,0.146263,0.148465,0.165949,0.051316,0.084969,0.092309
"20,000 Leagues Under the Sea (1954)",-0.019809,-0.013153,0.107000,0.169075,-0.016475,0.053702,1.000000,0.355740,0.065766,0.229815,...,,-0.000531,0.101334,0.219077,0.136834,0.038591,0.030172,0.009559,-0.013153,-0.008377
2001: A Space Odyssey (1968),-0.012876,0.019911,0.056892,0.254716,-0.010231,0.080852,0.355740,1.000000,0.045300,0.275574,...,,0.045212,0.041682,0.344152,0.128604,0.057061,0.059375,-0.011915,0.023705,0.061015
3 Ninjas: High Noon At Mega Mountain (1998),-0.003552,-0.002358,-0.013952,0.051972,0.061833,0.058557,0.065766,0.045300,1.000000,0.063090,...,,-0.003192,-0.001502,0.041562,-0.013289,-0.008615,0.028478,-0.002690,-0.002358,-0.001502
"39 Steps, The (1935)",-0.017973,-0.011934,0.037125,0.298771,-0.034333,-0.010489,0.229815,0.275574,0.063090,1.000000,...,,0.086545,-0.007600,0.133283,0.037412,0.021478,0.012711,0.017543,-0.011934,-0.007600


In [33]:
# Replace missing ratings with 0 (vectorized equivalent of looping over every cell).
# Note: after this step a 0 means "not rated", not a rating of 0.
user_ratings_train = user_ratings_train.fillna(0)

In [34]:
user_ratings_train

user_id,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
1,0.0,0.0,0.0,5.0,0.0,0.0,3.0,4.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,...,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,4.0,0.0,0.0,5.0,5.0,0.0,4.0,...,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Task 2. Item-based CF
* Use cosine similarity to compute the similarity between items.
* Based on the similarity scores, predict the ratings in the testing set. You can use the 5 or the 10 nearest neighbors.
* Compute the mean absolute error (MAE) on the testing set.
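
As a rough illustration of these steps, the sketch below predicts one held-out rating on a tiny matrix. It is a sketch under stated assumptions rather than the expected solution: the helper name `predict_item_based`, the toy data, the treatment of unrated entries as 0 when computing cosine similarity, and the plain similarity-weighted average are all choices made for this example.

```python
import numpy as np
import pandas as pd


def predict_item_based(train, user, item, k=5):
    """Predict train.loc[user, item] from the k most similar items (cosine)."""
    filled = train.fillna(0).to_numpy()  # assumption: unrated counts as 0
    target = train[item].fillna(0).to_numpy()
    norms = np.linalg.norm(filled, axis=0) * np.linalg.norm(target)
    # Cosine similarity between the target column and every item column;
    # items with no ratings give 0/0 = NaN and are dropped further down.
    with np.errstate(divide="ignore", invalid="ignore"):
        sims = pd.Series(filled.T @ target / norms, index=train.columns)
    # Candidate neighbors: other items this user has rated.
    rated = train.loc[user].dropna().index.drop(item, errors="ignore")
    neighbors = sims.reindex(rated).dropna()
    neighbors = neighbors[neighbors > 0].nlargest(k)
    if neighbors.empty:
        return train.loc[user].mean()  # fallback: the user's own mean rating
    user_ratings = train.loc[user, neighbors.index]
    return (neighbors * user_ratings).sum() / neighbors.sum()


# Hypothetical toy data: rows are users, columns are movies, NaN = not rated.
toy = pd.DataFrame(
    {"a": [3, 4, np.nan], "b": [np.nan, 4, 3], "c": [4, np.nan, 3]},
    index=["u1", "u2", "u3"],
)

# Hypothetical held-out ratings as (user, movie, true rating) triples.
held_out = [("u1", "b", 4.0)]
errors = [abs(truth - predict_item_based(toy, u, i)) for u, i, truth in held_out]
mae = float(np.mean(errors))
print(mae)
```

In this toy matrix the cosine similarity of `b` to `a` is 16/25 = 0.64 and to `c` is 9/25 = 0.36, so the prediction for `u1` is 0.64 * 3 + 0.36 * 4 = 3.36 and the MAE on the single held-out rating is about 0.64.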

In [23]:
# your code
