<center>
    <a href="https://www.aus.edu/"><img src="https://i.imgur.com/pdZvnSD.png" width=200> </a>    
</center>
<h1 align=center><font size = 5>CMP 49412 - Personalized Recommendations</font></h1>
<h1 align=center><font size = 5>Matrix Factorization - Singular Value Decomposition (SVD)</font></h1>
<h1 align=center><font size = 5>Prepared by Alex Aklson, Ph.D.</font></h1>
<h1 align=center><font size = 5>March 12, 2025</font></h1>

Import the libraries.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy.sparse.linalg import svds

#### Read in the Data and Explore It

Read in the data.

In [None]:
rating_df = pd.read_csv('movie_ratings_data.csv')

Quick exploration of the data.

In [None]:
rating_df.head()

In [None]:
rating_df.shape

In [None]:
rating_df['user_id'].nunique()

In [None]:
rating_df['movie_title'].nunique()

Let's create the user-item matrix.

In [None]:
user_item_matrix = rating_df.pivot_table(index='user_id', columns='movie_title', values='rating')

In [None]:
user_item_matrix
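
To make the reshaping step concrete, here is a minimal sketch on a hypothetical toy table (`toy_df` and its values are invented for illustration and are not part of `movie_ratings_data.csv`). It shows how `pivot_table` turns long-format ratings into one row per user and one column per movie, leaving `NaN` wherever a user has not rated a movie. Note that `pivot_table` averages duplicate user-movie pairs by default.

```python
import pandas as pd

# Hypothetical toy ratings in long format (one row per user-movie rating)
toy_df = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'movie_title': ['A', 'B', 'A', 'C'],
    'rating': [5.0, 3.0, 4.0, 2.0],
})

# One row per user, one column per movie; unrated pairs become NaN
toy_matrix = toy_df.pivot_table(index='user_id', columns='movie_title', values='rating')
print(toy_matrix)
```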

#### Calculate the Sparsity

In [None]:
sparsity_count = user_item_matrix.isnull().values.sum()

In [None]:
print(sparsity_count)

In [None]:
full_count = user_item_matrix.size

In [None]:
print(full_count)

In [None]:
sparsity = sparsity_count / full_count
print(sparsity)
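
As a sanity check on the sparsity calculation, the sketch below repeats it on a small hand-made matrix (the `toy_matrix` values are hypothetical). It also shows an equivalent shortcut: the mean of a boolean mask is the fraction of `True` entries, so it gives the same number in one step.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 user-item matrix with 5 of 9 cells missing
toy_matrix = pd.DataFrame(
    [[5.0, 3.0, np.nan],
     [4.0, np.nan, np.nan],
     [np.nan, np.nan, 2.0]],
    index=[1, 2, 3], columns=['A', 'B', 'C'])

missing = toy_matrix.isnull().values.sum()   # number of empty cells
total = toy_matrix.size                      # rows * columns
toy_sparsity = missing / total

# Shortcut: average the boolean mask directly
toy_sparsity_short = toy_matrix.isnull().values.mean()
print(toy_sparsity, toy_sparsity_short)
```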

Count the occupied cells per column, i.e. the number of ratings each movie received.

In [None]:
occupied_count = user_item_matrix.notnull().sum()
print(occupied_count)

Sort the resulting series from low to high.

In [None]:
sorted_occupied_count = occupied_count.sort_values()
print(sorted_occupied_count)

Plot a histogram of the values in `sorted_occupied_count`.

In [None]:
plt.hist(sorted_occupied_count, edgecolor='k')
plt.xlabel('Number of ratings per movie')
plt.ylabel('Number of movies')
plt.show()

Get the average rating for each user. 

In [None]:
avg_ratings = user_item_matrix.mean(axis=1)

In [None]:
avg_ratings

Center each user's ratings around 0.

In [None]:
user_ratings_centered = user_item_matrix.sub(avg_ratings, axis=0)

Fill in all missing values with 0s. Because the ratings are now centered, a 0 corresponds to the user's average rating.

In [None]:
user_ratings_centered.fillna(0, inplace=True)

In [None]:
user_ratings_centered.head()

Print the mean of each row. Each value should be approximately 0.

In [None]:
print(user_ratings_centered.mean(axis=1))
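
The centering and filling steps can be traced by hand on a small example. The sketch below uses a hypothetical two-user matrix (`toy_matrix` is invented for illustration) and follows the same three calls: `mean(axis=1)`, `sub(..., axis=0)`, and `fillna(0)`.

```python
import numpy as np
import pandas as pd

# Hypothetical user-item matrix; NaN marks an unrated movie
toy_matrix = pd.DataFrame(
    [[5.0, 3.0, np.nan],
     [4.0, np.nan, 2.0]],
    index=[1, 2], columns=['A', 'B', 'C'])

toy_avg = toy_matrix.mean(axis=1)               # NaNs are skipped: user 1 -> 4.0, user 2 -> 3.0
toy_centered = toy_matrix.sub(toy_avg, axis=0)  # rated cells become deviations from the user's mean
toy_filled = toy_centered.fillna(0)             # unrated cells are set to 0 (the user's average)

print(toy_filled)
print(toy_filled.mean(axis=1))
```

The deviations of the rated cells sum to zero for each user, and the filled cells add nothing, which is why every row mean comes out at zero (up to floating-point error on real data).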

Decompose the matrix using SVD, keeping only the 50 largest singular values (`k=50`).

In [None]:
U, Sigma, Vt = svds(user_ratings_centered.to_numpy(), k=50)

In [None]:
# Reverse the singular values (svds returns them smallest first) and place them on a diagonal
Sigma = np.diag(Sigma[::-1])
print(Sigma)

In [None]:
Sigma.shape

Reconstruct the user-item matrix.

In [None]:
# Reverse the columns of U so they line up with the reordered singular values
U_Sigma = np.dot(U[:, ::-1], Sigma)

Take the dot product of the result and `Vt`.

In [None]:
# Reverse the rows of Vt in the same way before multiplying
U_Sigma_Vt = np.dot(U_Sigma, Vt[::-1, :])

In [None]:
print(U_Sigma_Vt)
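
The decomposition and reconstruction steps can be checked on a small matrix. The sketch below is an illustration under stated assumptions, not the notebook's own data: `A` is a hypothetical random matrix standing in for the centered ratings. Rather than relying on the order in which `svds` returns the singular values, it sorts the factors explicitly with `argsort`, then compares the rank-`k` product against a truncated full SVD from `np.linalg.svd`.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))   # hypothetical dense matrix in place of the centered ratings

k = 2
U, s, Vt = svds(A, k=k)       # k must be smaller than min(A.shape)

# Sort the factors so the singular values run from largest to smallest
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

approx = U @ np.diag(s) @ Vt  # rank-k approximation of A

# Reference: full SVD from NumPy, truncated to the same k
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)
reference = U_full[:, :k] @ np.diag(s_full[:k]) @ Vt_full[:k, :]

print(np.allclose(approx, reference))
```

Reordering the factors does not change the product as long as the same permutation is applied to the columns of `U`, the singular values, and the rows of `Vt`.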

Add each user's mean rating back to the reconstructed matrix.

In [None]:
uncentered_ratings = U_Sigma_Vt + avg_ratings.values.reshape(-1, 1)

Convert the reconstructed matrix into a DataFrame.

In [None]:
calc_pred_ratings_df = pd.DataFrame(np.round(uncentered_ratings, 1), 
                                    index=user_item_matrix.index,
                                    columns=user_item_matrix.columns
                                   )

In [None]:
print(calc_pred_ratings_df.head())

In [None]:
print(user_item_matrix.head())

Recommend 5 movies to the fifth user.

In [None]:
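# iloc[4] selects the fifth row (positions start at 0).
# Note: this ranking also includes movies the user has already rated.
user_5_ratings = calc_pred_ratings_df.iloc[4, :].sort_values(ascending=False)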

print(user_5_ratings[:5])
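
In practice, one usually wants to recommend only movies the user has not rated yet. A minimal sketch of that filter is shown below, assuming an observed matrix with `NaN` for unrated movies and a predicted matrix with the same index and columns. The `observed` and `predicted` frames and their values are hypothetical stand-ins for `user_item_matrix` and `calc_pred_ratings_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: observed ratings (NaN = not rated) and predicted ratings
observed = pd.DataFrame(
    [[5.0, np.nan, np.nan, 2.0]],
    index=['u1'], columns=['A', 'B', 'C', 'D'])
predicted = pd.DataFrame(
    [[4.8, 3.1, 4.2, 2.3]],
    index=['u1'], columns=['A', 'B', 'C', 'D'])

user = 'u1'
unseen = observed.loc[user].isnull()   # True where the user has no rating

# Keep only unseen movies, then rank them by predicted rating
top_unseen = predicted.loc[user][unseen].sort_values(ascending=False).head(2)
print(top_unseen)
```

Here movie `A` has the highest predicted rating but is dropped because the user has already rated it.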

### Exercise

Load `rating_exercise.csv` and use matrix factorization (SVD) to predict the missing ratings. Use the predicted ratings to recommend 5 movies to **User5**.