# Collaborative Filtering Recommender System
In this notebook, we'll implement collaborative filtering to build a movie recommender system and use it to recommend new movies for myself.  
The code here is based on my own implementation in the graded lab, reorganized and rewritten to be more succinct and clear.

### Tensors and operations in TensorFlow
See the [TensorFlow basics tutorial](https://www.tensorflow.org/tutorials/customization/basics) for tensors and their operations, as well as conversion between tensors and NumPy arrays.

## Tools

In [82]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

## Collaborative Filtering Algorithm

We'll predict movie ratings as follows: for user $j$, the predicted rating for movie $i$ is:

$$
Y(i,j) = R(i,j) * (\vec{w}^{(j)} \cdot \vec{x}^{(i)} + b^{(j)})
$$

where $R(i,j) = 1$ if user $j$ has rated movie $i$, and $R(i,j) = 0$ if not.
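
As a small illustration of the prediction term $\vec{w}^{(j)} \cdot \vec{x}^{(i)} + b^{(j)}$, the sketch below evaluates it for one movie and one user, and then for all movie and user pairs at once. All numbers are made up, and two features are assumed purely for readability.

```python
import numpy as np

# Toy, made-up values: 2 features, one movie and one user.
x_i = np.array([0.9, 0.1])   # feature vector of movie i
w_j = np.array([4.0, 1.0])   # parameter vector of user j
b_j = 0.5                    # bias of user j

# Predicted rating for a single (movie, user) pair: w . x + b
pred = float(np.dot(w_j, x_i) + b_j)
print(f"single prediction: {pred:.2f}")

# All pairs at once: X @ W.T + b has shape (n_m, n_u).
X = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # 2 movies
W = np.array([[4.0, 1.0],
              [1.0, 5.0],
              [0.0, 0.0]])            # 3 users
b = np.array([[0.5, 0.0, 3.0]])       # shape (1, n_u), broadcast over movies
P = X @ W.T + b
print(P.shape)
```

A user whose parameter vector is all zeros (the third user here) is predicted to give every movie the same rating, namely the bias.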

We'll use gradient descent with the following cost function to learn the parameters $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$ collaboratively.

$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
$$
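
For reference, differentiating $J$ term by term gives the gradients below. This is only a worked derivation; the training loop in this notebook relies on TensorFlow's automatic differentiation instead of these formulas. Note that $b^{(j)}$ is not regularized, so its gradient has no $\lambda$ term.

$$
\frac{\partial J}{\partial \mathbf{w}^{(j)}} = \sum_{i:r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})\,\mathbf{x}^{(i)} + \lambda\,\mathbf{w}^{(j)}
$$

$$
\frac{\partial J}{\partial b^{(j)}} = \sum_{i:r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})
$$

$$
\frac{\partial J}{\partial \mathbf{x}^{(i)}} = \sum_{j:r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})\,\mathbf{w}^{(j)} + \lambda\,\mathbf{x}^{(i)}
$$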

## Implementation

### Cost function
We'll implement the cost function with the following function:
- `cofi_cost_func`: compute cost for collaborative filtering

In [83]:
def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering.
    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    J = tf.math.reduce_sum((R * (tf.linalg.matmul(X, tf.transpose(W)) + b - Y))**2)
    reg = lambda_ * (tf.math.reduce_sum(W**2) + tf.math.reduce_sum(X**2))
    J = (J + reg) / 2
    
    return J
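
One way to gain confidence in a vectorized cost is to compare it with a slow, loop-based version on a tiny random problem. The sketch below does this in plain NumPy; `cofi_cost_loop` and `cofi_cost_vec` are hypothetical helper names introduced here, and the sizes and random data are assumptions for illustration.

```python
import numpy as np

def cofi_cost_loop(X, W, b, Y, R, lambda_):
    """Loop-based reference cost (hypothetical helper, for checking only)."""
    n_m, n_u = Y.shape
    J = 0.0
    for j in range(n_u):
        for i in range(n_m):
            if R[i, j] == 1:  # only rated (movie, user) pairs contribute
                err = np.dot(W[j], X[i]) + b[0, j] - Y[i, j]
                J += 0.5 * err ** 2
    # Regularization on W and X (b is not regularized)
    J += 0.5 * lambda_ * (np.sum(W ** 2) + np.sum(X ** 2))
    return J

def cofi_cost_vec(X, W, b, Y, R, lambda_):
    """Vectorized NumPy version of the same cost (hypothetical helper)."""
    err = R * (X @ W.T + b - Y)
    return 0.5 * (np.sum(err ** 2) + lambda_ * (np.sum(W ** 2) + np.sum(X ** 2)))

# Tiny random problem (made-up sizes and data)
rng = np.random.default_rng(0)
n_m, n_u, n_f = 5, 4, 3
X = rng.normal(size=(n_m, n_f))
W = rng.normal(size=(n_u, n_f))
b = rng.normal(size=(1, n_u))
Y = rng.integers(1, 6, size=(n_m, n_u)).astype(float)
R = (rng.random((n_m, n_u)) > 0.4).astype(float)

J_loop = cofi_cost_loop(X, W, b, Y, R, 1.5)
J_vec = cofi_cost_vec(X, W, b, Y, R, 1.5)
print(np.isclose(J_loop, J_vec))
```

The two versions should agree up to floating-point rounding; a mismatch usually points to a wrong transpose or to the mask `R` being applied in the wrong place.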

## Making movie recommendations for myself

### Dataset

The movie ratings dataset comes from the course graded lab and is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset.   
[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]

The original dataset has roughly 9000 movies rated by 600 users. It has been reduced in size to focus on movies released since 2000. Ratings are on a scale of 0.5 to 5 in steps of 0.5. The reduced dataset has $n_u = 443$ users and $n_m = 4778$ movies.

In [84]:
Y = np.genfromtxt('./data_movies/small_movies_Y.csv', delimiter=',')
R = np.genfromtxt('./data_movies/small_movies_R.csv', delimiter=',')

print("Y", Y.shape, "R", R.shape)

Y (4778, 443) R (4778, 443)


In [85]:
# Useful parameters
n_m, n_u = Y.shape  # n_m: number of movies, n_u: number of users
n_f = 100  # number of features for each movie

### New user ratings
In the cell below, I'll select a few movies from the movie list (filename: small_movie_list.csv) and give them my own ratings. I'll then train the recommender system to recommend movies for me.

In [86]:
movieList_df = pd.read_csv('./data_movies/small_movie_list.csv', header=0, index_col=0,  delimiter=',', quotechar='"')
movieList = movieList_df["title"].to_list()
my_ratings = np.zeros(n_m)  #  Initialize my ratings

# my_ratings[movie_id] = rating
my_ratings[393]  = 5   # Lord of the Rings: The Fellowship of the Ring, The 
my_ratings[653]  = 5   # Lord of the Rings: The Two Towers, The
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[3326]  = 4   # Hobbit: An Unexpected Journey, The
my_ratings[3547] = 4.5   # Hobbit: The Desolation of Smaug, The
my_ratings[3843] = 3.5   # The Hobbit: The Battle of the Five Armies
my_ratings[580]  = 5   # Spirited Away (Sen to Chihiro no kamikakushi)
my_ratings[4478]  = 4.5   # Your Name
my_ratings[793]  = 3.5   # Pirates of the Caribbean: The Curse of the Black Pearl
my_ratings[3304]  = 4.5   # Life of Pi 
my_ratings[2716]  = 3.5   # Inception
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}')


New user ratings:

Rated 5.0 for  Lord of the Rings: The Fellowship of the Ring, The (2001)
Rated 5.0 for  Spirited Away (Sen to Chihiro no kamikakushi) (2001)
Rated 5.0 for  Lord of the Rings: The Two Towers, The (2002)
Rated 3.5 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 3.5 for  Inception (2010)
Rated 4.5 for  Life of Pi (2012)
Rated 4.0 for  Hobbit: An Unexpected Journey, The (2012)
Rated 4.5 for  Hobbit: The Desolation of Smaug, The (2013)
Rated 3.5 for  The Hobbit: The Battle of the Five Armies (2014)
Rated 4.5 for  Your Name. (2016)


### Normalize the ratings
To make more reasonable predictions for a new user who has rated no movies or only a few, we'll apply mean normalization to the data. The following function `normalizeRatings` is from the course graded lab.

In [87]:
def normalizeRatings(Y, R):
    """
    Preprocess data by subtracting the mean rating of every movie (every row).
    Only real ratings, where R(i,j)=1, are included.
    Ynorm, Ymean = normalizeRatings(Y, R) normalizes Y so that each movie
    has a rating of 0 on average. Unrated movies then have a mean rating of 0.
    Returns the normalized ratings Ynorm and the mean ratings Ymean.
    """
    Ymean = (np.sum(Y*R,axis=1)/(np.sum(R, axis=1)+1e-12)).reshape(-1,1)
    Ynorm = Y - np.multiply(Ymean, R) 
    return(Ynorm, Ymean)
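
To see what mean normalization does, the sketch below applies the same two formulas to a tiny made-up ratings matrix in which the third user has rated nothing. The values are assumptions chosen so the means are easy to verify by hand.

```python
import numpy as np

# Toy ratings: 3 movies x 3 users; the third user has rated nothing.
Y = np.array([[5.0, 4.0, 0.0],
              [0.0, 2.0, 0.0],
              [3.0, 1.0, 0.0]])
R = np.array([[1, 1, 0],
              [0, 1, 0],
              [1, 1, 0]])

# Per-movie mean over rated entries only (the epsilon avoids division by zero)
Ymean = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

# Subtract the mean only where a rating exists
Ynorm = Y - Ymean * R

print(Ymean.ravel())
print(Ynorm)

# A prediction of 0 in normalized space maps back to the movie's mean rating.
restored = 0.0 + Ymean
```

After normalization, each movie's rated entries average to zero, and adding `Ymean` back to a normalized prediction restores the original rating scale.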

In [88]:
# Add new user ratings to Y 
Y = np.c_[my_ratings, Y]

# Add new user indicator matrix to R
R = np.c_[(my_ratings != 0).astype(int), R]

# Normalize the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)

### Model training
We'll train the model with a custom TensorFlow training loop using the Adam optimizer:

In [89]:
# Useful parameters
n_m, n_u = Y.shape  # n_m: number of movies, n_u: number of users
n_f = 100  # number of features for each movie
alpha = 1e-1  # learning rate

# Set initial parameters (W, X, b); use tf.Variable so TensorFlow tracks them
# Parameters W, b, X are randomly initialized
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal([n_u, n_f], dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal([n_m, n_f], dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal([1, n_u], dtype=tf.float64), name='b')

# Instantiate an optimizer
optimizer = keras.optimizers.Adam(learning_rate = alpha)

In [90]:
iterations = 300
lambda_ = 1.

for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

Training loss at iteration 0: 2321299.7
Training loss at iteration 20: 136152.4
Training loss at iteration 40: 51855.9
Training loss at iteration 60: 24596.6
Training loss at iteration 80: 13629.4
Training loss at iteration 100: 8486.6
Training loss at iteration 120: 5806.5
Training loss at iteration 140: 4310.3
Training loss at iteration 160: 3433.8
Training loss at iteration 180: 2900.5
Training loss at iteration 200: 2564.9
Training loss at iteration 220: 2346.9
Training loss at iteration 240: 2201.0
Training loss at iteration 260: 2100.4
Training loss at iteration 280: 2029.2
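
The same kind of loop can be sketched without TensorFlow. The example below is a minimal sketch, not the notebook's training code: it swaps Adam and automatic differentiation for plain gradient descent with hand-derived gradients, on a tiny randomly generated problem. The sizes, the random data, and the learning rate are all assumptions for illustration.

```python
import numpy as np

# Tiny made-up problem
rng = np.random.default_rng(0)
n_m, n_u, n_f = 6, 4, 2
Y = rng.integers(1, 6, size=(n_m, n_u)).astype(float)
R = (rng.random((n_m, n_u)) < 0.7).astype(float)
X = rng.normal(size=(n_m, n_f))
W = rng.normal(size=(n_u, n_f))
b = np.zeros((1, n_u))
lambda_, alpha = 1.0, 0.01

def cost(X, W, b):
    """Regularized squared-error cost over rated entries."""
    err = R * (X @ W.T + b - Y)
    return 0.5 * (np.sum(err ** 2) + lambda_ * (np.sum(W ** 2) + np.sum(X ** 2)))

J0 = cost(X, W, b)
history = []
for step in range(200):
    # Masked prediction error, shape (n_m, n_u)
    err = R * (X @ W.T + b - Y)

    # Hand-derived gradients of the cost (b is not regularized)
    grad_X = err @ W + lambda_ * X
    grad_W = err.T @ X + lambda_ * W
    grad_b = err.sum(axis=0, keepdims=True)

    # Simultaneous plain gradient-descent update
    X -= alpha * grad_X
    W -= alpha * grad_W
    b -= alpha * grad_b
    history.append(cost(X, W, b))

print(f"initial cost {J0:.2f}, final cost {history[-1]:.2f}")
```

All three gradients are computed before any parameter is updated, so each step uses a consistent set of parameters.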


### Make recommendations
After learning the parameters $\mathbf{W}$ and $\mathbf{b}$ for the users (including the new user), as well as the feature vectors $\mathbf{X}$ for the movies, we can predict the ratings that the new user would give to all movies. One simple way of making recommendations is to recommend the few (say, 10) movies with the highest predicted ratings.
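
The selection step on its own can be sketched with NumPy: sort the predicted ratings in descending order, skip movies the user has already rated, and keep the first few. The predictions, the rated set, and the value of `k` below are made up for illustration.

```python
import numpy as np

# Made-up predicted ratings for 6 movies (index = movie id)
predictions = np.array([3.2, 4.8, 4.1, 2.5, 4.9, 3.9])
already_rated = {4}   # movie ids the user has rated (hypothetical)
k = 3                 # number of recommendations to keep

# argsort on the negated array gives indices in descending order of rating
order = np.argsort(-predictions)

# Keep the k best movies the user has not rated yet
top_k = [int(j) for j in order if int(j) not in already_rated][:k]

for j in top_k:
    print(f"movie {j}: predicted rating {predictions[j]:.2f}")
```

Filtering before truncating guarantees `k` recommendations whenever enough unrated movies exist, whereas truncating first can return fewer.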

In [91]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# Restore the mean
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')

for i in range(30):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')

Predicting rating 5.45 for movie Bossa Nova (2000)
Predicting rating 5.45 for movie Colourful (Karafuru) (2010)
Predicting rating 5.45 for movie Raise Your Voice (2004)
Predicting rating 5.45 for movie Kung Fu Panda: Secrets of the Masters (2011)
Predicting rating 5.44 for movie I'm the One That I Want (2000)
Predicting rating 5.44 for movie Son of the Bride (Hijo de la novia, El) (2001)
Predicting rating 5.44 for movie Wonder Woman (2009)
Predicting rating 5.44 for movie Justice League: Doom (2012) 
Predicting rating 5.44 for movie A Detective Story (2003)
Predicting rating 5.44 for movie Superman/Batman: Public Enemies (2009)
Predicting rating 5.44 for movie Faster (2010)
Predicting rating 5.44 for movie Max Manus (2008)
Predicting rating 5.44 for movie Deathgasm (2015)
Predicting rating 5.44 for movie I Am Not Your Negro (2017)
Predicting rating 5.44 for movie Won't You Be My Neighbor? (2018)
Predicting rating 5.44 for movie Act of Killing, The (2012)
Predicting rating 5.44 for movi