
# Outline
- [ 1 - Notation](#1)
- [ 2 - Recommender Systems](#2)
- [ 3 - Movie ratings dataset](#3)
- [ 4 - Collaborative filtering learning algorithm](#4)
  - [ 4.1 Collaborative filtering cost function](#4.1)
    - [ Exercise 1](#ex01)
- [ 5 - Learning movie recommendations](#5)
- [ 6 - Recommendations](#6)
- [ 7 - Congratulations!](#7)




##  Packages
We will use the NumPy, pandas, and TensorFlow packages.

In [3]:
import numpy as np
import tensorflow as tf
import pandas as pd
from tensorflow import keras

<a name="1"></a>
## 1 - Notation


| General Notation       | Description                                                             | Python (if any) |
|:-----------------------|:------------------------------------------------------------------------|:---------------|
| $r(i,j)$               | scalar; = 1 if user j rated movie i, = 0 otherwise                      |                |
| $y(i,j)$               | scalar; rating given by user j on movie i (defined only if r(i,j) = 1)  |                |
| $\mathbf{w}^{(j)}$     | vector; parameters for user j                                           |                |
| $b^{(j)}$              | scalar; parameter for user j                                            |                |
| $\mathbf{x}^{(i)}$     | vector; feature ratings for movie i                                     |                |
| $n_u$                  | number of users                                                         | num_users      |
| $n_m$                  | number of movies                                                        | num_movies     |
| $n$                    | number of features                                                      | num_features   |
| $\mathbf{X}$           | matrix of vectors $\mathbf{x}^{(i)}$                                    | X              |
| $\mathbf{W}$           | matrix of vectors $\mathbf{w}^{(j)}$                                    | W              |
| $\mathbf{b}$           | vector of bias parameters $b^{(j)}$                                     | b              |
| $\mathbf{R}$           | matrix of elements $r(i,j)$                                             | R              |


<a name="2"></a>
## 2 - Recommender Systems
In this project, we will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.
The goal of a collaborative filtering recommender system is to generate two sets of vectors: for each user, a parameter vector that embodies that user's movie tastes, and for each movie, a feature vector of the same size that embodies some description of the movie. The dot product of the two vectors plus the user's bias term should produce an estimate of the rating the user might give to that movie.
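
As a minimal sketch of that prediction rule, the snippet below computes one estimated rating from a user's parameter vector, a movie's feature vector, and the user's bias. The numbers and the names `w_j`, `x_i`, and `b_j` are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Hypothetical values for one user and one movie (n = 3 features).
w_j = np.array([0.5, -1.0, 2.0])   # parameter vector for user j
x_i = np.array([1.0, 0.5, 1.5])    # feature vector for movie i
b_j = 0.25                         # bias term for user j

# Predicted rating: dot product of the two vectors plus the bias.
predicted_rating = np.dot(w_j, x_i) + b_j
print(predicted_rating)  # 3.25
```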


<a name="3"></a>
## 3 - Movie ratings dataset 
The data set is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset.   
[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]

The original dataset has  9000 movies rated by 600 users. The dataset has been reduced in size to focus on movies from the years since 2000. This dataset consists of ratings on a scale of 0.5 to 5 in 0.5 step increments. The reduced dataset has $n_u = 443$ users, and $n_m= 4778$ movies. 

Below, you will load the movie dataset into the variables $Y$ and $R$.

The matrix $Y$ (an $n_m \times n_u$ matrix) stores the ratings $y^{(i,j)}$. The matrix $R$ is a binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise.
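
To make the relationship between the two matrices concrete, here is a small sketch with made-up ratings for 3 movies and 2 users. It assumes that unrated entries are stored as 0, which is safe here because valid ratings start at 0.5. The names `Y_toy` and `R_toy` are hypothetical.

```python
import numpy as np

# Hypothetical ratings: 3 movies (rows) by 2 users (columns); 0 means "not rated".
Y_toy = np.array([[5.0, 0.0],
                  [0.0, 3.5],
                  [4.0, 1.0]])

# Indicator matrix: 1 where a rating exists, 0 otherwise.
R_toy = (Y_toy != 0).astype(int)
print(R_toy)

# Number of movies rated by each user (column sums of R).
ratings_per_user = R_toy.sum(axis=0)
print(ratings_per_user)  # [2 2]
```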

Throughout this part of the exercise, we will also be working with the
matrices $\mathbf{X}$ and $\mathbf{W}$ and the vector $\mathbf{b}$:

$$\mathbf{X} = 
\begin{bmatrix}
--- (\mathbf{x}^{(0)})^T --- \\
--- (\mathbf{x}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{x}^{(n_m-1)})^T --- \\
\end{bmatrix} , \quad
\mathbf{W} = 
\begin{bmatrix}
--- (\mathbf{w}^{(0)})^T --- \\
--- (\mathbf{w}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{w}^{(n_u-1)})^T --- \\
\end{bmatrix},\quad
\mathbf{ b} = 
\begin{bmatrix}
 b^{(0)}  \\
 b^{(1)} \\
\vdots \\
b^{(n_u-1)} \\
\end{bmatrix}\quad
$$ 

The $i$-th row of $\mathbf{X}$ corresponds to the
feature vector $\mathbf{x}^{(i)}$ for the $i$-th movie, and the $j$-th row of
$\mathbf{W}$ corresponds to the parameter vector $\mathbf{w}^{(j)}$ for the
$j$-th user. Both $\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ are $n$-dimensional
vectors. For the initial cost check in this exercise we use $n=10$, so
$\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ have 10 elements;
correspondingly, $\mathbf{X}$ is an
$n_m \times 10$ matrix and $\mathbf{W}$ is an $n_u \times 10$ matrix. (The training run in Section 5 uses `num_features = 100`.)

We will start by loading the movie ratings dataset into $Y$ and $R$ to understand the structure of the data.

In [11]:
def load_data(csv_file):
    # Read CSV file
    df = pd.read_csv(csv_file)
    # Create a pivot table for the rating matrix
    rating_matrix = df.pivot_table(index='movieId', columns='userId', values='rating')
    # Create a binary matrix
    binary_matrix = rating_matrix.notnull().astype(int)
    # Replace NaN with 0 in rating matrix
    rating_matrix = rating_matrix.fillna(0)
    return rating_matrix, binary_matrix, rating_matrix.to_numpy(), binary_matrix.to_numpy()

rating, binary_matrix, Y, R = load_data('dataset/ratings.csv')


<a name="4"></a>
## 4 - Collaborative filtering learning algorithm

Now we will begin implementing the collaborative filtering learning
algorithm. We will start by implementing the objective function.

The collaborative filtering algorithm in the setting of movie
recommendations considers a set of $n$-dimensional vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$ and $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$, together with scalar biases $b^{(0)},...,b^{(n_u-1)}$, where the
model predicts the rating for movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$. Given a dataset that consists of
a set of ratings produced by some users on some movies, we wish to
learn the parameters $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},
\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (that is, minimize
the squared error).

We will write the function `cofi_cost_func` to compute the cost
function for collaborative filtering.


<a name="4.1"></a>
### 4.1 Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\underbrace{
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
}_{regularization}
\tag{1}$$
The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\text{regularization}
$$
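
The double sum above can be checked against a vectorized expression on a small example. The following is a sketch in plain NumPy on random toy data. The helper name `cofi_cost_loop` and all toy values are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def cofi_cost_loop(X, W, b, Y, R, lambda_):
    """Loop-based reference for the cost; b is assumed to have shape (1, num_users)."""
    num_movies, num_users = Y.shape
    J = 0.0
    for j in range(num_users):
        for i in range(num_movies):
            err = np.dot(W[j], X[i]) + b[0, j] - Y[i, j]
            J += R[i, j] * err ** 2          # r(i,j) masks out unrated pairs
    J = 0.5 * J
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))  # regularization
    return J

# Toy problem: 4 movies, 2 users, 3 features (values are arbitrary).
rng = np.random.default_rng(0)
X_t = rng.normal(size=(4, 3))
W_t = rng.normal(size=(2, 3))
b_t = rng.normal(size=(1, 2))
Y_t = rng.uniform(0.5, 5.0, size=(4, 2))
R_t = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
lam = 1.5

# Vectorized form of the same cost.
masked_err = (X_t @ W_t.T + b_t - Y_t) * R_t
vectorized = 0.5 * np.sum(masked_err ** 2) + (lam / 2) * (np.sum(X_t ** 2) + np.sum(W_t ** 2))
loop = cofi_cost_loop(X_t, W_t, b_t, Y_t, R_t, lam)
print(np.isclose(loop, vectorized))
```

Because $r(i,j)$ is either 0 or 1, multiplying the error by the mask before squaring gives the same result as multiplying the squared error by the mask.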

We can now write `cofi_cost_func` (the collaborative filtering cost function) to return this cost.

In [15]:
def initialize_matrix(n, m):
    # Create an n by m matrix with values drawn uniformly from [1, 5)
    return np.random.uniform(low=1, high=5, size=(n, m))


In [16]:
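# Initialize the parameter matrices
# Y has shape (num_movies, num_users)
X = initialize_matrix(Y.shape[0], 10)  # one row of features per movie
W = initialize_matrix(Y.shape[1], 10)  # one row of parameters per user
b = initialize_matrix(1, Y.shape[1])   # one bias per user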

In [17]:
def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering.
    Vectorized for speed. Uses TensorFlow operations to be compatible with a custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user bias parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

In [18]:
# Reduce the data set size so that this runs faster
num_users_r = 40
num_movies_r = 50
num_features_r = 10

X_r = X[:num_movies_r, :num_features_r]
W_r = W[:num_users_r,  :num_features_r]
b_r = b[0, :num_users_r].reshape(1,-1)
Y_r = Y[:num_movies_r, :num_users_r]
R_r = R[:num_movies_r, :num_users_r]

# Evaluate cost function
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0)
print(f"Cost: {J:0.2f}")

Cost: 777969.20


<a name="5"></a>
## 5 - Learning movie recommendations

Now that we have implemented the collaborative filtering cost
function, we can start training the algorithm to make
movie recommendations for ourselves.

In [34]:
movie_list = pd.read_csv('dataset/movies.csv', index_col='movieId')

In [35]:
rate_0 = np.zeros(len(Y))
my_ratings = pd.Series(rate_0)
my_ratings.index = rating.index

In [36]:
movie_list[movie_list['genres'].apply(lambda x: 'Adventure' in x)][500:550]

movieId,title,genres
6162,Gerry (2002),Adventure|Drama
6169,"Black Stallion Returns, The (1983)",Adventure|Children
6170,"Black Stallion, The (1979)",Adventure|Children|Drama
6232,Born Free (1966),Adventure|Children|Drama
6239,Journey to the Center of the Earth (1959),Adventure|Children|Sci-Fi
6294,Bulletproof Monk (2003),Action|Adventure|Sci-Fi
6297,Holes (2003),Adventure|Children|Comedy|Mystery
6316,"Wiz, The (1978)",Adventure|Children|Comedy|Fantasy|Musical
6333,X2: X-Men United (2003),Action|Adventure|Sci-Fi|Thriller
6350,Laputa: Castle in the Sky (Tenkû no shiro Rapy...,Action|Adventure|Animation|Children|Fantasy|Sc...


In [37]:
# Rate some movies ourselves, giving high ratings mainly to Action, Adventure, and Animation titles
my_ratings[631] = 5 
my_ratings[3000] = 4.5
my_ratings[7153]  = 5   
my_ratings[1032]  = 5   
my_ratings[3687] = 4   
my_ratings[558] = 4
my_ratings[2857] = 4.5
my_ratings[8] = 4.5
my_ratings[6294]  = 5   
my_ratings[6645]  = 5  
my_ratings[6902]  = 5  
my_ratings[11] = 2   
my_ratings[15] = 1
my_ratings[17] = 2
my_ratings[25] = 1
my_rated = np.array([1 if my_ratings[index] > 0 else 0 for index in my_ratings.index])


In [38]:

print('\nNew user ratings:\n')
for index in my_ratings.index:
    if my_ratings[index] > 0 :
        print(f'Rated {my_ratings[index]} for  {movie_list.loc[index,"title"]}; Genres: {movie_list.loc[index, "genres"]}')


New user ratings:

Rated 4.5 for  Tom and Huck (1995); Genres: Adventure|Children
Rated 2.0 for  American President, The (1995); Genres: Comedy|Drama|Romance
Rated 1.0 for  Cutthroat Island (1995); Genres: Action|Adventure|Romance
Rated 2.0 for  Sense and Sensibility (1995); Genres: Drama|Romance
Rated 1.0 for  Leaving Las Vegas (1995); Genres: Drama|Romance
Rated 4.0 for  Pagemaster, The (1994); Genres: Action|Adventure|Animation|Children|Fantasy
Rated 5.0 for  All Dogs Go to Heaven 2 (1996); Genres: Adventure|Animation|Children|Fantasy|Musical|Romance
Rated 5.0 for  Alice in Wonderland (1951); Genres: Adventure|Animation|Children|Fantasy|Musical
Rated 4.5 for  Yellow Submarine (1968); Genres: Adventure|Animation|Comedy|Fantasy|Musical
Rated 4.5 for  Princess Mononoke (Mononoke-hime) (1997); Genres: Action|Adventure|Animation|Drama|Fantasy
Rated 4.0 for  Light Years (Gandahar) (1988); Genres: Adventure|Animation|Fantasy|Sci-Fi
Rated 5.0 for  Bulletproof Monk (2003); Genres: Action|Ad

Now, let's add these ratings to $Y$ and $R$ and normalize the ratings by subtracting each user's mean rating.

In [39]:
def normalizeRatings(rating_Y, R_matrix):
    rating_Y_copy =  np.empty(rating_Y.shape)
    mean_list = []
    for col in range(rating_Y.shape[1]):
        valid_rating = R_matrix[:, col].sum()
        if valid_rating > 0:
            mean_col = rating_Y[:, col].sum() / valid_rating
        else:
            mean_col = 0  # Handle the case where there are no valid ratings

        mean_list.append(mean_col)
        for i in range(rating_Y.shape[0]):  # Iterate over rows
            if R_matrix[i, col] == 1:
                rating_Y_copy[i, col] = rating_Y[i, col] - mean_col
            else:
                rating_Y_copy[i, col] = 0

    return rating_Y_copy, mean_list
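
The per-user normalization above can also be written without explicit loops. The following is a sketch of an equivalent vectorized version, assuming (as in the loop version) that unrated entries of `Y` are 0. The name `normalize_ratings_vectorized` and the toy arrays are hypothetical and not part of the original notebook.

```python
import numpy as np

def normalize_ratings_vectorized(Y, R):
    """Subtract each user's (column's) mean rating from that user's rated entries."""
    counts = R.sum(axis=0)                 # number of ratings per user
    sums = (Y * R).sum(axis=0)             # sum of ratings per user
    # Users with no ratings get a mean of 0, matching the loop version.
    means = np.divide(sums, counts, out=np.zeros(Y.shape[1]), where=counts > 0)
    Y_norm = (Y - means) * R               # unrated entries stay at 0
    return Y_norm, means

# Toy check: 3 movies, 3 users; the last user has rated nothing.
Y_toy = np.array([[5.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0],
                  [3.0, 1.0, 0.0]])
R_toy = (Y_toy != 0).astype(int)
Y_norm_toy, means_toy = normalize_ratings_vectorized(Y_toy, R_toy)
print(means_toy)  # [4. 2. 0.]
```

The returned means form an array with one entry per user, so adding them back to a `(num_movies, num_users)` prediction matrix broadcasts across rows in the same way as the list returned by the loop version.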


In [40]:

# Add new user ratings to Y
Y = np.c_[my_ratings.to_numpy(), Y]

# Add new user indicator matrix to R
R = np.c_[(my_ratings.to_numpy() != 0).astype(int), R]

# Normalize the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)

Let's prepare to train the model. Initialize the parameters and select the Adam optimizer.

In [41]:
#  Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set Initial Parameters (W, X), use tf.Variable to track these variables
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

Let's now train the collaborative filtering model. This will learn the parameters $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$. 

The operations involved in learning $w$, $b$, and $x$ simultaneously do not fall into the typical 'layers' offered in the TensorFlow neural network package. Instead, we can use a custom training loop.

Recall from earlier labs the steps of gradient descent.
- repeat until convergence:
    - compute forward pass
    - compute the derivatives of the loss relative to parameters
    - update the parameters using the learning rate and the computed derivatives 
    
TensorFlow can calculate the derivatives for you, as shown below. Within the `tf.GradientTape()` block, operations on TensorFlow Variables are tracked. When `tape.gradient()` is later called, it returns the gradient of the loss with respect to the tracked variables. The gradients can then be applied to the parameters using an optimizer.
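
To make the three gradient-descent steps explicit, here is a sketch that performs them by hand in NumPy on a tiny random problem. It uses plain gradient descent with hand-derived gradients of the cost in equation (1). It does not use `tf.GradientTape` or the Adam optimizer, and the function name `cost_and_grads`, the toy sizes, and the learning rate are assumptions chosen for illustration.

```python
import numpy as np

def cost_and_grads(X, W, b, Y, R, lambda_):
    """Cost from equation (1) and its gradients with respect to X, W, and b."""
    E = (X @ W.T + b - Y) * R                 # forward pass: masked prediction errors
    J = 0.5 * np.sum(E ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))
    dX = E @ W + lambda_ * X                  # dJ/dX
    dW = E.T @ X + lambda_ * W                # dJ/dW
    db = E.sum(axis=0, keepdims=True)         # dJ/db (b is not regularized)
    return J, dX, dW, db

# Toy problem (all values are arbitrary).
rng = np.random.default_rng(0)
num_movies, num_users, num_features = 6, 4, 2
Y = rng.integers(1, 6, size=(num_movies, num_users)).astype(float)
R = (rng.random((num_movies, num_users)) < 0.7).astype(int)
X = rng.normal(size=(num_movies, num_features))
W = rng.normal(size=(num_users, num_features))
b = np.zeros((1, num_users))

alpha, lambda_ = 0.01, 1.0
losses = []
for step in range(200):
    J, dX, dW, db = cost_and_grads(X, W, b, Y, R, lambda_)  # forward pass + derivatives
    losses.append(J)
    X -= alpha * dX                                          # parameter updates
    W -= alpha * dW
    b -= alpha * db

print(f"first loss {losses[0]:.2f}, last loss {losses[-1]:.2f}")
```

With automatic differentiation, the three gradient lines inside `cost_and_grads` are what the tape computes for us, which is why the training loop only needs the cost function.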


In [42]:
iterations = 400
lambda_ = 1
for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

Training loss at iteration 0: 4819643.8
Training loss at iteration 20: 228347.7
Training loss at iteration 40: 92015.2
Training loss at iteration 60: 46482.4
Training loss at iteration 80: 27195.6
Training loss at iteration 100: 17736.1
Training loss at iteration 120: 12603.9
Training loss at iteration 140: 9634.4
Training loss at iteration 160: 7837.8
Training loss at iteration 180: 6712.9
Training loss at iteration 200: 5987.8
Training loss at iteration 220: 5507.4
Training loss at iteration 240: 5180.4
Training loss at iteration 260: 4951.6
Training loss at iteration 280: 4787.2
Training loss at iteration 300: 4665.8
Training loss at iteration 320: 4573.8
Training loss at iteration 340: 4502.5
Training loss at iteration 360: 4446.1
Training loss at iteration 380: 4400.3


<a name="6"></a>
## 6 - Recommendations
Below, we compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as `my_ratings[]` above. To predict the rating of movie $i$ for user $j$, you compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication.
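
As a small sketch of that matrix form, the snippet below predicts every rating for a toy problem, adds back per-user means, and ranks the movies for the first user. All values and the `_toy` names are made up for illustration.

```python
import numpy as np

X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])          # 3 movies, 2 features
W_toy = np.array([[0.5, -0.5],
                  [1.0, 1.0]])          # 2 users, 2 features
b_toy = np.array([[0.1, -0.2]])         # one bias per user
means_toy = np.array([3.0, 4.0])        # hypothetical per-user mean ratings

# Entry (i, j) is w^(j) . x^(i) + b^(j) for every movie/user pair at once.
p_toy = X_toy @ W_toy.T + b_toy

# Restore the per-user means that were removed by normalization.
pm_toy = p_toy + means_toy

# Rank movies for user 0 from highest to lowest predicted rating.
order_toy = np.argsort(-pm_toy[:, 0])
print(order_toy)  # [0 2 1]
```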

In [43]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# Restore each user's mean rating (Ymean has one entry per user and broadcasts across rows)
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')


In [44]:
print("The recommended movies according to the predicted rating is: \n")
for i in range(15):
    j = ix.numpy()[i]
    movie_id = my_ratings.index[j]
    if my_rated[j] == 0:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movie_list.loc[movie_id, "title"]}; Genres: {movie_list.loc[movie_id, "genres"]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings.values[i] > 0:
        movie_id = my_ratings.index[i]
        print(f'Original {my_ratings.values[i]}, Predicted {my_predictions[i]:0.2f} for {movie_list.loc[movie_id, "title"]} ; Genres: {movie_list.loc[movie_id, "genres"]}')

The recommended movies according to the predicted rating is: 

Predicting rating 5.31 for movie Sixth Sense, The (1999); Genres: Drama|Horror|Mystery
Predicting rating 5.24 for movie Big Lebowski, The (1998); Genres: Comedy|Crime
Predicting rating 5.17 for movie Thing, The (1982); Genres: Action|Horror|Sci-Fi|Thriller
Predicting rating 5.10 for movie Corpse Bride (2005); Genres: Animation|Comedy|Fantasy|Musical|Romance
Predicting rating 5.07 for movie Austin Powers: The Spy Who Shagged Me (1999); Genres: Action|Adventure|Comedy
Predicting rating 5.05 for movie Lord of the Rings: The Two Towers, The (2002); Genres: Adventure|Fantasy
Predicting rating 4.96 for movie Matrix, The (1999); Genres: Action|Sci-Fi|Thriller
Predicting rating 4.88 for movie Quick and the Dead, The (1995); Genres: Action|Thriller|Western
Predicting rating 4.87 for movie Crying Game, The (1992); Genres: Drama|Romance|Thriller
Predicting rating 4.85 for movie WALL·E (2008); Genres: Adventure|Animation|Children|Roman