# Introduction

The algorithm I will use for movie recommendation is based on Matrix Factorization using Singular Value Decomposition (SVD).


## Mathematical Concepts Behind the Algorithm:

### 1. Matrix Factorization:
Matrix factorization is a collaborative filtering technique where we decompose the user-item interaction matrix into lower-dimensional matrices, capturing latent factors that represent users and items.

### 2. Singular Value Decomposition (SVD):
SVD is a type of matrix factorization that decomposes a matrix 𝐴 into three matrices:

$$A = U\Sigma V^{T}$$


* 𝑈 is a matrix of user factors.
* Σ is a diagonal matrix of singular values.
* $V^{T}$ is a matrix of item factors.

## Steps of the Algorithm

### 1. Construct the User-Item Matrix:

A matrix 𝑅 is created where each row represents a user, each column represents a movie, and each cell contains the rating given by the user to the movie. If a user has not rated a movie, the cell value is zero.

### 2. Apply SVD:

* I apply Truncated SVD to 𝑅, reducing it to lower-dimensional matrices:

$$𝑅≈𝑈_{𝑘}Σ_{𝑘} 𝑉^{T}_{k}$$
* $𝑈_{𝑘}$ contains the user factors
* $Σ_{𝑘}$ contains the top 𝑘 singular values
* $𝑉^{T}_{k}$ contains the item factors.

### 3. Reconstruct the Matrix:

I reconstruct the original matrix using the decomposed matrices:

$$\hat{𝑅}=𝑈_{𝑘}Σ_{𝑘} 𝑉^{T}_{k}$$

The reconstructed matrix $\hat{R}$ approximates the original user-item matrix 𝑅, filling in the missing ratings with predicted values.

### 4. Generate Recommendations:

* For a given user, we use the reconstructed matrix $\hat{R}$ to identify movies with the highest predicted ratings that the user hasn't seen yet.

## Summary

The key idea behind matrix factorization is to identify latent factors that capture the underlying relationships between users and items. SVD helps us decompose the user-item matrix into these latent factors, enabling us to predict missing ratings and generate personalized recommendations. The mathematical elegance of SVD lies in its ability to efficiently capture the most significant patterns in the data, even when dealing with large and sparse matrices.

# Python Code

## The Data

For this report, I use the MovieLens dataset from GroupLens Research. This dataset contains 87382 movie titles and 6843 users.

In [4]:
import pandas as pd
import numpy as np


# Load the dataset
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

In [16]:
num_unique_movies = movies['title'].nunique()
print(f"Number of unique Movie Titles: {num_unique_movies}")

Number of unique Movie Titles: 87382


In [12]:
num_unique_users = ratings['userId'].nunique()
print(f"Number of unique user IDs: {num_unique_users}")

Number of unique user IDs: 6843


In [6]:
print(movies.head(20))

    movieId                                  title  \
0         1                       Toy Story (1995)   
1         2                         Jumanji (1995)   
2         3                Grumpier Old Men (1995)   
3         4               Waiting to Exhale (1995)   
4         5     Father of the Bride Part II (1995)   
5         6                            Heat (1995)   
6         7                         Sabrina (1995)   
7         8                    Tom and Huck (1995)   
8         9                    Sudden Death (1995)   
9        10                       GoldenEye (1995)   
10       11         American President, The (1995)   
11       12     Dracula: Dead and Loving It (1995)   
12       13                           Balto (1995)   
13       14                           Nixon (1995)   
14       15                Cutthroat Island (1995)   
15       16                          Casino (1995)   
16       17           Sense and Sensibility (1995)   
17       18                 

In [7]:
print(ratings.head(20))

    userId  movieId  rating    timestamp
0        1     17.0     4.0  944249077.0
1        1     25.0     1.0  944250228.0
2        1     29.0     2.0  943230976.0
3        1     30.0     5.0  944249077.0
4        1     32.0     5.0  943228858.0
5        1     34.0     2.0  943228491.0
6        1     36.0     1.0  944249008.0
7        1     80.0     5.0  944248943.0
8        1    110.0     3.0  943231119.0
9        1    111.0     5.0  944249008.0
10       1    161.0     1.0  943231162.0
11       1    166.0     5.0  943228442.0
12       1    176.0     4.0  944079496.0
13       1    223.0     3.0  944082810.0
14       1    232.0     5.0  943228442.0
15       1    260.0     5.0  943228696.0
16       1    302.0     4.0  944253272.0
17       1    306.0     5.0  944248888.0
18       1    307.0     5.0  944253207.0
19       1    322.0     4.0  944053801.0


## Step 1: Construct the User-Item Matrix:

In [8]:
# Create the user-item matrix
user_item_matrix = ratings.pivot(index='userId', columns='movieId', values='rating').fillna(0)


## Step 2: Apply SVD

In [9]:
from sklearn.decomposition import TruncatedSVD

# Fit the SVD model
svd = TruncatedSVD(n_components=50, random_state=42)
svd.fit(user_item_matrix)

# Get the user and item factors
user_factors = svd.transform(user_item_matrix)
item_factors = svd.components_


## Step 3: Reconstruct the Matrix:

In [17]:

# Reconstruct the matrix
reconstructed_matrix = np.dot(user_factors, item_factors)

def get_recommendations(user_id, user_item_matrix, reconstructed_matrix, movies, num_recommendations=3):
    user_index = user_id - 1  # assuming user IDs start at 1
    user_ratings = reconstructed_matrix[user_index]
    sorted_movie_indices = np.argsort(user_ratings)[::-1]
    recommended_movie_indices = [idx for idx in sorted_movie_indices if user_item_matrix.iloc[user_index, idx] == 0]
    top_recommendations = recommended_movie_indices[:num_recommendations]

    recommended_movie_titles = []
    for movie_id in top_recommendations:
        movie_title = movies[movies['movieId'] == movie_id].title.values
        if len(movie_title) > 0:  # Ensure the movie title exists
            recommended_movie_titles.append(movie_title[0])
    return recommended_movie_titles


## Step 4: Generate Recommendations

In [18]:
# Generate Recommendations
for i in range(1, 11):
  user_id = i
  recommendations = get_recommendations(user_id, user_item_matrix, reconstructed_matrix, movies)
  print(f"Recommendations for user {user_id}: {recommendations}")

Recommendations for user 1: ['Old Lady Who Walked in the Sea, The (Vieille qui marchait dans la mer, La) (1991)', "Mummy's Hand, The (1940)", 'Mille bolle blu (1993)']
Recommendations for user 2: ['Wedding Gift, The (1994)', 'Hot Shots! Part Deux (1993)', 'Brothers McMullen, The (1995)']
Recommendations for user 3: ['Toy Story (1995)', 'Bhaji on the Beach (1993)', 'New Jersey Drive (1995)']
Recommendations for user 4: ['Streetcar Named Desire, A (1951)', 'I.Q. (1994)', "Mummy's Hand, The (1940)"]
Recommendations for user 5: ['Slingshot, The (Kådisbellan) (1993)', 'Mask, The (1994)', 'Four Weddings and a Funeral (1994)']
Recommendations for user 6: ['Hanging Up (2000)', 'Halloween: Resurrection (Halloween 8) (2002)', 'An Amazing Couple (2002)']
Recommendations for user 7: ['Beyond Rangoon (1995)', 'Cobb (1994)', 'Hot Shots! Part Deux (1993)']
Recommendations for user 8: ['Usual Suspects, The (1995)', 'Red Firecracker, Green Firecracker (Pao Da Shuang Deng) (1994)', 'Godzilla (Gojira) (1