# Recommender System

This code aims to create a Recommender system,for estimating the movie ratings of a user. It employs a Model-based Collaborative filtering system whereby it estimates the movie ratings of movie which the user hasn't reviewed with help of the ratings of other users who have reviewed the movie

## Importing Libraries and Reading Data

In [2]:
import numpy as np
import pandas as pd

In [3]:
#Creating a data frame with relevant information
columns = ['user_id', 'item_id', 'rating', 'timestamp']
#Since the file is tab separated the sep parameter is specified
data = pd.read_csv('u.data', sep='\t', names=columns)
movie_data=pd.read_csv("Movie_Id_Titles")

## Quick look at the data

In [4]:
data.head()

Unnamed: 0,user_id,item_id,rating,timestamp
0,0,50,5,881250949
1,0,172,5,881250949
2,0,133,1,881250949
3,196,242,3,881250949
4,186,302,3,891717742


In [5]:
movie_data.head()

Unnamed: 0,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


In [6]:
#Here we merge the 2 data frames along the item_id column to obtain a combined data fram
data = pd.merge(data,movie_data,on='item_id')
data.head()

Unnamed: 0,user_id,item_id,rating,timestamp,title
0,0,50,5,881250949,Star Wars (1977)
1,290,50,5,880473582,Star Wars (1977)
2,79,50,4,891271545,Star Wars (1977)
3,2,50,5,888552084,Star Wars (1977)
4,8,50,5,879362124,Star Wars (1977)


In [7]:
user_no = data.user_id.nunique()
item_no = data.item_id.nunique()
#Total number of Users and movies respectively
user_no,item_no

(944, 1682)

In [8]:
#Implementing the train-test split
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data, test_size=0.25)

# Collaborative Filtering

Here we will implement Model-based Collaborative filtering

In [9]:
#Here we first create 2 empty matrices,for our training and test data
#Then for each row in the traina and test data we assign the value of the rating(row[3])
#to the corresponding position in our empty matrices(row[1]-1,row[2]-1)
train_data_matrix = np.zeros((user_no, item_no))
for row in train_data.itertuples():
    train_data_matrix[row[1]-1, row[2]-1] = row[3]  

test_data_matrix = np.zeros((user_no,item_no))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = row[3]

In [10]:
#Next we calculate the sparsity level of our matrix to get an idea of amount of zero elements it contains
#Sparsity level is the number of zero elements divided by the total number of elements
sparsity_level=1.0-(len(data)/(user_no*item_no))
print("Sparsity level:",sparsity_level)

Sparsity level: 0.9370182037122876


### This confirms the fact that our user-item matrix is a sparse matrix,which is expected since not every user has reviewed every movie in our dataset and there must be quite a lot of zeros

# Singular Value Decomposition

We use SVD to do matrix factorisation,where we basically convert our sparse user-item matrix into a low rank structure where
our matrix can be approximated as a product of low rank matrices 

Let X be the matrix we want to approximate.Let A be the matrix where each row represents the feature vector of a user
Similarly let be B be such a matrix for the movies.Then X can be approximated as 

                              **X=ADB'**

Here D is a diagonal matrix with each diagonal element being a non_negative singular value of X.
A singular value of X is basically the square root of an eigenvalue of the matrix XX' 

In [21]:
import scipy.sparse as sp
from scipy.sparse.linalg import svds
from sklearn.metrics import mean_squared_error
#get SVD components from train matrix. Choose k.
a, d, bt = svds(train_data_matrix, k = 20)
d_diag_matrix=np.diag(d)
X_pred = np.dot(np.dot(a, d_diag_matrix), bt)

## Error Calculation

In [22]:
print('Root mean squared error:',mean_squared_error(X_pred, test_data_matrix)**0.5)

Root mean squared error: 0.6184969149423699


## Final Prediction

In [13]:
#Test_data matrix with a lot of zeros
test_data_matrix


In [23]:
#Prediction matrix where we have filled all the zero values with estimated values of ratings
X_pred

In [24]:
#Matrix with rounded off values
X_pred.round(decimals=1)