# Collaborative Filtering Recommender System using TensorFlow

We will learn how to implement collaborative filtering to build a recommender system for movies.

We will use NumPy and TensorFlow to build the model.

In [8]:
import numpy as np
import tensorflow as tf 
import keras 
import matplotlib.pyplot as plt
import seaborn as sns 
import pandas as pd

plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'

# 1. Notation

| General Notation | Description | Python (if any) |
|:-----------------|:------------|:----------------|
| $r(i,j)$ | scalar; = 1 if user j rated movie i, = 0 otherwise | |
| $y(i,j)$ | scalar; rating given by user j to movie i (defined only if $r(i,j) = 1$) | |
| $\mathbf{w}^{(j)}$ | vector; parameters for user j | |
| $b^{(j)}$ | scalar; bias parameter for user j | |
| $\mathbf{x}^{(i)}$ | vector; feature vector for movie i | |
| $n_u$ | number of users | `num_users` |
| $n_m$ | number of movies | `num_movies` |
| $n$ | number of features | `num_features` |
| $\mathbf{X}$ | matrix of vectors $\mathbf{x}^{(i)}$ | `X` |
| $\mathbf{W}$ | matrix of vectors $\mathbf{w}^{(j)}$ | `W` |
| $\mathbf{b}$ | vector of bias parameters $b^{(j)}$ | `b` |
| $\mathbf{R}$ | matrix of elements $r(i,j)$ | `R` |


# 2. Recommender System

The goal of a collaborative filtering recommender system is to generate two vectors: for each user, a **parameter vector** that embodies the user's preferences, and for each movie, a **feature vector** of the same length that embodies some description of the movie.

The dot product $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)}$ plus the bias $b^{(j)}$ should produce an estimate of the rating that user j would give to movie i.

The ratings are stored in a matrix $\mathbf{Y}$, where $y(i,j)$ is the rating given by user j to movie i.

Ratings go from 0.5 to 5 in steps of 0.5. If a user has not rated a movie, the corresponding entry of $\mathbf{Y}$ is 0.

The matrix $\mathbf{R}$ is a binary-valued indicator matrix, where $r(i,j)$ = 1 if user j rated movie i, and $r(i,j)$ = 0 otherwise. 

Movies are in rows and users are in columns. 
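
To make the roles of $\mathbf{Y}$ and $\mathbf{R}$ concrete, here is a small NumPy sketch that builds both matrices from a handful of hypothetical (movie, user, rating) triples. It also shows why $\mathbf{R}$ matters: averaging a row of $\mathbf{Y}$ naively counts the 0 placeholders of unrated entries, whereas masking with $\mathbf{R}$ averages only real ratings. The triples and sizes are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: (movie index, user index, rating) triples
triples = [(0, 0, 5.0), (0, 2, 4.0), (1, 1, 3.5), (2, 0, 1.0), (2, 1, 0.5)]
num_movies, num_users = 3, 3

# Movies in rows, users in columns; unrated entries stay 0 in both matrices
Y = np.zeros((num_movies, num_users))
R = np.zeros((num_movies, num_users), dtype=int)
for i, j, rating in triples:
    Y[i, j] = rating
    R[i, j] = 1

naive_mean = Y[0].mean()                      # wrongly counts the unrated 0 as a rating
masked_mean = Y[0, R[0].astype(bool)].mean()  # averages only users who rated movie 0
print(naive_mean, masked_mean)                # 3.0 4.5
```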

Each movie has a feature vector $\mathbf{x}^{(i)}$ of length $n$ and each user has a parameter vector $\mathbf{w}^{(j)}$ of length $n$, as well as a bias parameter $b^{(j)}$. 
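
A minimal NumPy sketch of the shapes involved, using small random matrices in place of learned parameters (all sizes and values are arbitrary): stacking the $\mathbf{x}^{(i)}$ as rows of $\mathbf{X}$ and the $\mathbf{w}^{(j)}$ as rows of $\mathbf{W}$ lets a single product $\mathbf{X}\mathbf{W}^T + \mathbf{b}$ produce every prediction at once, and each entry matches the per-pair formula $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
num_movies, num_users, num_features = 4, 3, 2

X = rng.normal(size=(num_movies, num_features))  # one row x^(i) per movie
W = rng.normal(size=(num_users, num_features))   # one row w^(j) per user
b = rng.normal(size=(1, num_users))              # one bias b^(j) per user

# All predictions at once; b broadcasts across rows, so entry [i, j] = w^(j) . x^(i) + b^(j)
P = X @ W.T + b

# The same prediction for a single (movie, user) pair
i, j = 2, 1
single = np.dot(W[j], X[i]) + b[0, j]
print(P.shape)                      # (4, 3)
print(np.isclose(P[i, j], single))  # True
```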

These vectors are learned simultaneously by using the existing user/movie ratings as training data. 

Once the feature vectors and parameters are learned, they can be used to predict how a user might rate a movie that they have not yet rated.

The learning is driven by a cost function, `cofi_cost_func_v`, which we write ourselves below using TensorFlow operations so that TensorFlow can differentiate it automatically.

While TensorFlow is typically used for supervised learning, its core functions (tensors, automatic differentiation and optimizers) can also be used for other purposes.
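
As a small illustration of using TensorFlow outside the usual `model.fit` workflow, the sketch below minimizes a toy one-parameter cost $(w-1)^2$ with `tf.GradientTape` and a hand-written gradient-descent update. The cost, learning rate and number of steps are arbitrary choices for the example; the training loop later in this notebook follows the same record-then-differentiate pattern, with Adam in place of the plain update.

```python
import tensorflow as tf

w = tf.Variable(3.0)  # parameter tracked by the tape
learning_rate = 0.1

for step in range(100):
    with tf.GradientTape() as tape:
        cost = (w - 1.0) ** 2  # toy cost with its minimum at w = 1
    grad = tape.gradient(cost, w)  # d(cost)/dw = 2 * (w - 1)
    if step == 0:
        print(grad.numpy())  # 4.0 at w = 3
    w.assign_sub(learning_rate * grad)  # plain gradient-descent update

print(w.numpy())  # approximately 1.0
```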

# 3. Movie ratings dataset 

The dataset is a subset of the [MovieLens 100k dataset](https://grouplens.org/datasets/movielens/100k/), reduced to movies released since the year 2000.

Ratings range from 0.5 to 5 in 0.5 increments. The dataset has $n_u = 443$ users and $n_m = 4778$ movies.

Once loaded, the data can be stored in the following matrices:

$$\mathbf{X} = 
\begin{bmatrix}
--- (\mathbf{x}^{(0)})^T --- \\
--- (\mathbf{x}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{x}^{(n_m-1)})^T --- \\
\end{bmatrix} , \quad
\mathbf{W} = 
\begin{bmatrix}
--- (\mathbf{w}^{(0)})^T --- \\
--- (\mathbf{w}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{w}^{(n_u-1)})^T --- \\
\end{bmatrix},\quad
\mathbf{ b} = 
\begin{bmatrix}
 b^{(0)}  \\
 b^{(1)} \\
\vdots \\
b^{(n_u-1)} \\
\end{bmatrix}\quad
$$ 

In [35]:
R = pd.read_csv('data/small_movies_R.csv', header=None).values
b = pd.read_csv('data/small_movies_b.csv', header=None).values 
W = pd.read_csv('data/small_movies_W.csv', header=None).values
X = pd.read_csv('data/small_movies_X.csv', header=None).values
Y = pd.read_csv('data/small_movies_Y.csv', header=None).values
raw_df = pd.read_csv('data/small_movie_list.csv')

In [43]:
raw_df = raw_df.drop('Unnamed: 0', axis=1)

In [46]:
raw_df

|      | mean rating | number of ratings | title |
|-----:|------------:|------------------:|:------|
| 0 | 3.400000 | 5 | Yards, The (2000) |
| 1 | 3.250000 | 6 | Next Friday (2000) |
| 2 | 2.000000 | 4 | Supernova (2000) |
| 3 | 2.000000 | 4 | Down to You (2000) |
| 4 | 2.672414 | 29 | Scream 3 (2000) |
| ... | ... | ... | ... |
| 4773 | 3.500000 | 1 | Jon Stewart Has Left the Building (2015) |
| 4774 | 4.000000 | 1 | Black Butler: Book of the Atlantic (2017) |
| 4775 | 3.500000 | 1 | No Game No Life: Zero (2017) |
| 4776 | 3.500000 | 1 | Flint (2017) |


In [51]:
tsmean = np.mean(Y[0, R[0, :].astype(bool)])
print(f'Average rating for movie 1 : {tsmean:0.3f} / 5' )

Average rating for movie 1 : 3.400 / 5


# 4. Collaborative filtering learning algorithm

Let's implement the objective function first.

## 4.1 Collaborative filtering cost function

The cost function for collaborative filtering is given by:

$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
\tag{1}$$
The first summation in (1) is 'for all $i$, $j$ where $r(i,j)$ equals $1$' and could be written:

$$
J = \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)\,(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$
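
To check that the masked, vectorized expression agrees with the double sum over the pairs $(i,j)$ with $r(i,j)=1$, the sketch below implements equation (1) both ways in plain NumPy and compares them on random data. The function names `cofi_cost_loop` and `cofi_cost_vectorized` are hypothetical, and the random matrices stand in for the real dataset; the loop version is only meant as a readable reference, since it is far slower on a full-size $\mathbf{Y}$.

```python
import numpy as np

def cofi_cost_loop(X, W, b, Y, R, lambda_):
    """Unvectorized cost: sum squared errors only where R[i, j] == 1, then add L2 terms."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            if R[i, j] == 1:
                J += (np.dot(W[j], X[i]) + b[0, j] - Y[i, j]) ** 2
    J = J / 2
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))  # b is not regularized
    return J

def cofi_cost_vectorized(X, W, b, Y, R, lambda_):
    err = (X @ W.T + b - Y) * R  # multiplying by R zeroes out unrated entries
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

rng = np.random.default_rng(42)
nm, nu, n = 5, 4, 3
X = rng.normal(size=(nm, n))
W = rng.normal(size=(nu, n))
b = rng.normal(size=(1, nu))
R = (rng.random((nm, nu)) < 0.6).astype(int)
Y = (rng.integers(1, 11, size=(nm, nu)) / 2) * R  # ratings 0.5 .. 5.0, 0 where unrated

print(np.isclose(cofi_cost_loop(X, W, b, Y, R, 1.5),
                 cofi_cost_vectorized(X, W, b, Y, R, 1.5)))  # True
```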

In [52]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    '''
    Returns the cost for collaborative filtering.
    Vectorized for speed. Uses TensorFlow operations so that it is compatible with the custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user bias parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    '''
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

Let's evaluate the cost function for a small dataset.

In [55]:
# Reduce the data set size so that this runs faster
num_users_r = 4
num_movies_r = 5 
num_features_r = 3

X_r = X[:num_movies_r, :num_features_r]
W_r = W[:num_users_r,  :num_features_r]
b_r = b[0, :num_users_r].reshape(1,-1)
Y_r = Y[:num_movies_r, :num_users_r]
R_r = R[:num_movies_r, :num_users_r]

In [56]:
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, lambda_=1.5)
print(f'Cost (with regularization): {J:0.2f}')

Cost (with regularization): 28.09


# 5. Learning movie recommendations

You can choose your movies based on their index in the original dataframe, and rate them as you wish. 

In [110]:
# locate your movie index by title (regex=False matches the text literally, e.g. parentheses)

movie = 'Knock Knock'

raw_df[raw_df['title'].str.contains(movie, regex=False)]

|      | mean rating | number of ratings | title |
|-----:|------------:|------------------:|:------|
| 4209 | 1.5 | 1 | Knock Knock (2015) |


In [113]:
# rate them! 

my_ratings = np.zeros(len(raw_df))  ## one entry per movie; 0 means "not rated"

my_ratings[246]  = 5   ## of course Shrek (2001) is 5/5
my_ratings[1045] = 5   ## Shrek 2 (2004)
my_ratings[393]  = 5   ## LOTR 1
my_ratings[653]  = 5   ## LOTR 2
my_ratings[929]  = 5   ## LOTR 3
my_ratings[2716] = 4   ## Inception (2010)
my_ratings[3014] = 2   # Avengers, The (2012)
my_ratings[2165] = 3   # You Don't Mess with the Zohan (2008)
my_ratings[4083] = 4   # Inside Out (2015)
my_ratings[1841] = 4   # Hot Fuzz (2007)
my_ratings[4693] = 5   # The Shape of Water (2017)
my_ratings[3962] = 4   # The Hateful Eight (2015)
my_ratings[3336] = 5   # Django Unchained (2012)
my_ratings[877]  = 4.5 # Kill Bill: Vol. 1 (2003)
my_ratings[1006] = 4.5 # Kill Bill: Vol. 2 (2004)
my_ratings[1352] = 4   # Sin City (2005)
my_ratings[4209] = 1   # Knock Knock (2015)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)
my_ratings[1150] = 4   # Incredibles, The (2004)
my_ratings[382]  = 1   # Amelie (Fabuleux destin d'Amélie Poulain, Le)


my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        # double quotes inside the f-string keep this valid before Python 3.12
        print(f'Rated {my_ratings[i]} for  {raw_df.loc[i, "title"]}')



New user ratings:

Rated 5.0 for  Shrek (2001)
Rated 1.0 for  Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for  Lord of the Rings: The Fellowship of the Ring, The (2001)
Rated 5.0 for  Lord of the Rings: The Two Towers, The (2002)
Rated 5.0 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 4.5 for  Kill Bill: Vol. 1 (2003)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 4.5 for  Kill Bill: Vol. 2 (2004)
Rated 5.0 for  Shrek 2 (2004)
Rated 4.0 for  Incredibles, The (2004)
Rated 4.0 for  Sin City (2005)
Rated 4.0 for  Hot Fuzz (2007)
Rated 3.0 for  You Don't Mess with the Zohan (2008)
Rated 4.0 for  Inception (2010)
Rated 1.0 for  Louis Theroux: Law & Disorder (2008)
Rated 1.0 for  Nothing to Declare (Rien à déclarer) (2010)
Rated 2.0 for  Avengers, The (2012)
Rated 5.0 for  Django Unchained (2012)
Rated 4.0 for  The Hateful Eight (2015)
Rated 4.0 for  Inside Out (2015)
Rated 1.0 for  Knock Knock (2015)
Rated 5.0 for  T

Now let's add these ratings to $\mathbf{Y}$ and $\mathbf{R}$ as a new first column (the new user becomes user 0) and normalize the ratings.

In [134]:
# Add new user ratings to Y 
Y = np.c_[my_ratings, Y]

# Add new user indicator to R
R = np.c_[(my_ratings != 0).astype(int), R]

# Normalize the dataset (normalizeRatings comes from the recsys_utils helper module, not shown here)
from recsys_utils import normalizeRatings
Ynorm, Ymean = normalizeRatings(Y, R)
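
The helper `normalizeRatings` lives in `recsys_utils`, which is not reproduced here. Assuming it performs per-movie mean normalization (which would make the later step `pm = p + Ymean` add back one mean per movie row), a minimal sketch could look like the function below. The name `normalize_ratings` is hypothetical, and guarding movies with no ratings via `np.maximum(counts, 1)` is a choice made for this sketch. Only rated entries are shifted, so unrated entries of `Ynorm` stay 0 and are still ignored by the cost through $\mathbf{R}$.

```python
import numpy as np

def normalize_ratings(Y, R):
    """Hypothetical sketch: subtract each movie's mean rating from its rated entries only."""
    counts = R.sum(axis=1, keepdims=True)  # number of ratings per movie (row)
    # per-movie mean over rated entries; movies with no ratings get mean 0
    Ymean = (Y * R).sum(axis=1, keepdims=True) / np.maximum(counts, 1)
    Ynorm = (Y - Ymean) * R                # unrated entries stay 0
    return Ynorm, Ymean

Y = np.array([[5.0, 0.0, 4.0],
              [0.0, 3.0, 0.0],
              [1.0, 2.0, 0.0]])
R = (Y != 0).astype(int)
Ynorm, Ymean = normalize_ratings(Y, R)
print(Ymean.ravel())
print(Ynorm)
```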

Let's prepare to train the model. Initialize the parameters and select the Adam optimizer.

In [135]:
# Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set Initial Parameters (W, X), use tf.Variable to track them
tf.random.set_seed(1234) # for consistent results

W = tf.Variable(tf.random.normal((num_users,  num_features), dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features), dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal((1,          num_users),    dtype=tf.float64), name='b')

# Instantiate an optimizer 
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

Let's now train the collaborative filtering model. This will learn the parameters $\mathbf{X}$, $\mathbf{W}$ and $\mathbf{b}$.

The operations involved do not fit the typical supervised learning workflow (`model.fit`), so we write a custom training loop that uses `tf.GradientTape` to compute the gradients.

In [136]:
iterations = 200
lambda_ = 1

for step in range(iterations):
    # Use TF's GradientTape to record the operations used to compute the cost (similar to PyTorch's autograd)
    with tf.GradientTape() as tape:
        # compute the cost
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to retrieve the gradients of the cost with respect to X, W and b
    grads = tape.gradient(cost_value, [X, W, b])

    # run one step of gradient descent (Adam) by updating the variables to minimize the cost
    optimizer.apply_gradients(zip(grads, [X, W, b]))

    # Log periodically
    if step % 20 == 0:
        print(f'Training loss at iteration {step}: {cost_value}')

Training loss at iteration 0: 2272368.5994359604
Training loss at iteration 20: 131261.93090126733
Training loss at iteration 40: 49322.11601578797
Training loss at iteration 60: 23408.787979445973
Training loss at iteration 80: 13054.027365913933
Training loss at iteration 100: 8191.663217169939
Training loss at iteration 120: 5644.20360613328
Training loss at iteration 140: 4213.595078310591
Training loss at iteration 160: 3371.822073570485
Training loss at iteration 180: 2858.6764583729746


# 6. Recommendations

We compute the predicted ratings for all movies and users, then display the movies recommended for the new user, based on the ratings entered in `my_ratings`.

To predict the rating of movie $i$ for user $j$, we compute the dot product of the feature vector $\mathbf{x}^{(i)}$ and the parameter vector $\mathbf{w}^{(j)}$ and add the bias term $b^{(j)}$. A single matrix product $\mathbf{X}\mathbf{W}^T + \mathbf{b}$ does this for every (movie, user) pair at once.

Remember that TensorFlow tensors can be converted back to NumPy arrays with the `.numpy()` method (similarly to PyTorch).

In [137]:
titles = raw_df['title'].values.astype(str)

In [139]:
# make a prediction using the trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# Ynorm was obtained by subtracting each movie's mean rating, so add the means back
pm = p + Ymean

# the new user is column 0 (prepended with np.c_ above)
my_predictions = pm[:, 0]

# sort predictions in descending order (indices of the highest predicted ratings first)
ix = np.argsort(-my_predictions)

for i in range(17): ## examples to print
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]} for movie {titles[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {titles[i]}')

Predicting rating 5.070957292417822 for movie Colourful (Karafuru) (2010)
Predicting rating 5.069214354793804 for movie 'Salem's Lot (2004)
Predicting rating 5.028309403530229 for movie One I Love, The (2014)
Predicting rating 5.0277120013977585 for movie Delirium (2014)
Predicting rating 5.027635664384478 for movie Laggies (2014)
Predicting rating 5.019451070251684 for movie Act of Killing, The (2012)
Predicting rating 5.014071989475742 for movie Odd Life of Timothy Green, The (2012)
Predicting rating 5.011087427063315 for movie Into the Abyss (2011)
Predicting rating 5.011011838760622 for movie Eichmann (2007)
Predicting rating 5.010943070547765 for movie Battle Royale 2: Requiem (Batoru rowaiaru II: Chinkonka) (2003)
Predicting rating 5.002322193775573 for movie Valet, The (La doublure) (2006)
Predicting rating 4.987960620146327 for movie Max Manus (2008)
Predicting rating 4.985631242984548 for movie Dylan Moran: Monster (2004)
Predicting rating 4.9854135620603826 for movie Who Kill
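
The loop above only looks at the 17 highest-ranked movies and skips those already rated, so it can print fewer than 17 titles. A minimal NumPy alternative, sketched below with a hypothetical `top_k_unrated` function and made-up numbers, removes the rated movies first and then ranks the rest, so it always returns `k` unrated movies when enough exist.

```python
import numpy as np

def top_k_unrated(predictions, my_ratings, k):
    """Return indices of the k highest-predicted movies the user has not rated yet."""
    candidates = np.flatnonzero(my_ratings == 0)                 # unrated movie indices
    order = np.argsort(-predictions[candidates], kind="stable")  # descending by prediction
    return candidates[order[:k]]

predictions = np.array([4.9, 3.1, 5.0, 4.2, 2.0])
my_ratings = np.array([0.0, 0.0, 5.0, 0.0, 1.0])  # movies 2 and 4 are already rated
print(top_k_unrated(predictions, my_ratings, 2))  # [0 3]
```

Ranking only the unrated candidates avoids the need for a membership test against a list of rated indices inside the loop.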

In [140]:
popular = raw_df['number of ratings'] > 20  # keep only movies with more than 20 ratings
raw_df['pred'] = my_predictions
raw_df = raw_df.reindex(columns=['pred', 'mean rating', 'number of ratings', 'title'])
raw_df.loc[ix[:300]].loc[popular].sort_values('mean rating', ascending=False)

|      | pred | mean rating | number of ratings | title |
|-----:|-----:|------------:|------------------:|:------|
| 1743 | 4.631884 | 4.252336 | 107 | Departed, The (2006) |
| 988 | 4.495105 | 4.160305 | 131 | Eternal Sunshine of the Spotless Mind (2004) |
| 2395 | 4.902303 | 4.136364 | 88 | Inglourious Basterds (2009) |
| 929 | 4.953829 | 4.118919 | 185 | Lord of the Rings: The Return of the King, The... |
| 393 | 4.960439 | 4.106061 | 198 | Lord of the Rings: The Fellowship of the Ring,... |
| 1318 | 4.526754 | 4.075 | 40 | Howl's Moving Castle (Hauru no ugoku shiro) (2... |
| 3714 | 4.551042 | 4.050847 | 59 | Guardians of the Galaxy (2014) |
| 653 | 4.956583 | 4.021277 | 188 | Lord of the Rings: The Two Towers, The (2002) |
| 3083 | 4.490959 | 3.993421 | 76 | Dark Knight Rises, The (2012) |
| 2399 | 4.614411 | 3.96875 | 32 | Moon (2009) |
