In this project, we build a **Recommender system** using the MovieLens Dataset.

Given the name of a movie, the model should output the five movies most similar to it.

First, we create a rating matrix and normalise it.
Then, we compute the SVD (singular value decomposition) of this normalised rating matrix.

We define a function that computes the cosine similarity between a given movie and every other movie.
Based on that cosine similarity, we sort the movies from most to least similar and return the top 5 matches for the given movie.

In [1]:
# Import all the required libraries

import numpy as np
import pandas as pd
from numpy import dot

#### Read the dataset from three files containing the ratings, movies, and user info

In [2]:
ratings_data = pd.read_csv('ratings.dat', delimiter='::', names=['userId', 'movieId', 'rating', 'timestamp'], engine='python')
movies_data = pd.read_csv('movies.dat', delimiter='::', names=['movieId', 'movie_title', 'genres'], engine='python', encoding='ISO-8859-1')
user_data = pd.read_csv('users.dat', delimiter='::', names=['userId', 'Gender', 'Age', 'Occupation', 'Zip-code'], engine='python')

### Rating Matrix

The rows of the matrix represent movies and the columns represent users. Because IDs start at 1, movie `m` and user `u` map to the entry at index `[m-1, u-1]`.
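
As a small illustration of this layout, the sketch below builds a movies x users matrix from a handful of made-up ratings (the values are not taken from MovieLens). It assumes that unrated pairs should be stored as 0, so the array is allocated with `np.zeros`; the names `toy_ratings` and `toy_matrix` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up ratings for illustration only (not MovieLens data)
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [1, 3, 2, 1, 3],
    "rating":  [5, 3, 4, 2, 1],
})

n_movies = toy_ratings.movieId.max()
n_users = toy_ratings.userId.max()

# Assumption: unrated (movie, user) pairs are stored as 0
toy_matrix = np.zeros((n_movies, n_users), dtype=np.uint8)

# IDs are 1-based, array indices are 0-based
toy_matrix[toy_ratings.movieId.to_numpy() - 1,
           toy_ratings.userId.to_numpy() - 1] = toy_ratings.rating.to_numpy()

print(toy_matrix)
# [[5 0 2]
#  [0 4 0]
#  [3 0 1]]
```

Row 0 is movie 1, which was rated 5 by user 1 and 2 by user 3; user 2 did not rate it, so that entry stays 0.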

In [3]:
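# np.zeros (rather than np.ndarray) so that unrated (movie, user) pairs are 0, not uninitialised memory
rating_matrix = np.zeros((ratings_data.movieId.max(), ratings_data.userId.max()), dtype=np.uint8)
# IDs are 1-based, so subtract 1 to get 0-based row/column indices
rating_matrix[ratings_data.movieId.to_numpy()-1, ratings_data.userId.to_numpy()-1] = ratings_data.rating.to_numpy()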

In [4]:
# Normalise the rating matrix (global z-score), then transpose to users x movies and scale

temp_norm_matrix = (rating_matrix-rating_matrix.mean())/rating_matrix.std()
normalised_matrix = temp_norm_matrix.T/np.sqrt(rating_matrix.shape[0]-1)
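
To see what these two lines do, here is a minimal sketch on a made-up 3 x 3 matrix (the values and the names `toy`, `z` and `normalised` are hypothetical). It applies the same global z-score, then the same transpose and scaling, and checks the resulting mean, standard deviation and shape.

```python
import numpy as np

# Made-up movies x users matrix, same dtype as rating_matrix
toy = np.array([[5, 0, 2],
                [0, 4, 0],
                [3, 0, 1]], dtype=np.uint8)

# Global z-score: one mean and one std for the whole matrix.
# Subtracting a float promotes the uint8 values to float64.
z = (toy - toy.mean()) / toy.std()

# Transpose to users x movies and divide by sqrt(n_movies - 1)
normalised = z.T / np.sqrt(toy.shape[0] - 1)

print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))
print(normalised.shape, normalised.dtype)
```

Note that the mean and standard deviation are taken over all entries at once, including the zeros that stand for missing ratings, rather than per movie or per user.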

### Singular Value Decomposition

$$ \mathbf {M} =\mathbf {U\Sigma V^{T}} $$

SVD breaks down any matrix (viewed as a linear transformation) into three fundamental parts. Reading the product from right to left, in the order it acts on a vector: the transposed right singular matrix $\mathbf{V^{T}}$ rotates the space to prepare it for scaling, the diagonal matrix $\mathbf{\Sigma}$ applies an axis-aligned scaling, and the left singular matrix $\mathbf{U}$ applies another rotation that moves the scaled space into its final position.
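
The decomposition can be checked numerically on a small made-up matrix. The sketch below uses `np.linalg.svd` with `full_matrices=False` (the compact form, chosen here only to keep the shapes small) and multiplies the three factors back together. The third return value is already transposed, so it is named `Vt` here.

```python
import numpy as np

# Small made-up matrix, for illustration only
M = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# S holds the singular values as a 1-D array, largest first;
# Vt is V transposed, as returned by NumPy
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Rebuild M from its three factors
reconstructed = U @ np.diag(S) @ Vt

print(U.shape, S.shape, Vt.shape)     # (2, 2) (2,) (2, 3)
print(np.allclose(M, reconstructed))  # True
```

Because NumPy returns $\mathbf{V^{T}}$ rather than $\mathbf{V}$, the rows of `Vt` are the right singular vectors, and transposing it gives one row per column of `M`.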

In [5]:
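# To compute the SVD of the normalised matrix, we can use the svd function from np.linalg.
# Note: its third return value is the right singular matrix already transposed, so V below holds V^T.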

U,S,V = np.linalg.svd(normalised_matrix)

### Cosine Similarity
Now we compute the cosine similarity to measure how similar any two movies are. For two row vectors $x$ and $y$, the cosine similarity is:

$$ \text{cosine}(x,y) = \frac{x\, y^T}{\lVert x\rVert \, \lVert y\rVert} $$
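
As a direct translation of this formula, here is a minimal sketch for two 1-D NumPy vectors. The function name `cosine_similarity` is hypothetical, and it assumes neither vector is all zeros (otherwise the denominator is 0).

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two 1-D vectors (assumes non-zero norms)."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([1.0, 2.0, 3.0])

same_direction = cosine_similarity(a, 2 * a)   # close to 1
opposite = cosine_similarity(a, -a)            # close to -1
orthogonal = cosine_similarity(np.array([1.0, 0.0]),
                               np.array([0.0, 1.0]))  # 0

print(same_direction, opposite, orthogonal)
```

The result depends only on direction, not on length, which is why scaling `a` by 2 leaves the similarity at (approximately) 1.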

In [6]:
# Of the three factors returned by the SVD, only the right singular vectors are needed for the
# cosine similarity, because they relate the movies to the latent factors.
# np.linalg.svd returns them transposed, so V.transpose() has one row per movie and one column per latent factor.
column = V.transpose()[:,:500]
# The value 500 above is the number of latent factors (singular values) kept and can be changed.
# 500 gave me the closest output for the Universal Soldier (id=2808) example.
# Euclidean norm of each movie's row, reused when computing the cosine similarity
modulus = np.sqrt(np.einsum('ab,ab->a',column,column))

In [7]:
def print_top5_recommendations(movieId):
    print('Top 5 Recommendations for '+movies_data[movies_data.movieId == movieId].movie_title.to_numpy()[0]+': \n')
    i = movieId - 1  # movieId is one more than the row index of that movie (row 0 is movie 1, Toy Story)
    r = column[i, :]  # latent-factor row of this particular movie
    cos_sim = np.dot(r, column.T)/(modulus[i]*modulus)  # cosine similarity between this movie and every movie
    descending_list = np.argsort(-(cos_sim))  # indices sorted from most similar to least similar
    top_six = descending_list[:6]  # six candidates, because the most similar movie is normally the movie itself
    # Print the similar movies, skipping the query movie itself
    for similar_id in top_six+1:
        if similar_id != movieId:
            print(movies_data[movies_data.movieId == similar_id].movie_title.to_numpy()[0])

##### Get the top 5 recommendations given a movie ID

In [8]:
print_top5_recommendations(2808)  # 2808 is the id for Universal Soldier (1992)

Top 5 Recommendations for Universal Soldier (1992): 

Soldier (1998)
Solo (1996)
Universal Soldier: The Return (1999)
Judge Dredd (1995)
Timecop (1994)
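
The function above takes a movie ID, while the goal stated at the start is to look up movies by name. One possible way to bridge the two is sketched below. It assumes a DataFrame with the same `movieId` and `movie_title` columns as `movies_data`; the helper `find_movie_ids` and the two-row stand-in table `toy_movies` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Two-row stand-in for movies_data, using the IDs mentioned in this notebook
toy_movies = pd.DataFrame({
    "movieId": [1, 2808],
    "movie_title": ["Toy Story (1995)", "Universal Soldier (1992)"],
})

def find_movie_ids(movies, query):
    """Hypothetical helper: IDs of all movies whose title contains `query`.

    The match is case-insensitive and treats `query` as plain text, not a regex.
    """
    mask = movies.movie_title.str.contains(query, case=False, regex=False)
    return movies.loc[mask, "movieId"].tolist()

print(find_movie_ids(toy_movies, "universal soldier"))  # [2808]
print(find_movie_ids(toy_movies, "no such title"))      # []
```

With the real data, each ID returned by `find_movie_ids(movies_data, ...)` could then be passed to `print_top5_recommendations`. A query may match several titles, so the helper returns a list rather than a single ID.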
