# Collaborative Filtering

## Librarys

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import tensorflow as tf

random.seed(99999)

## Data - Movie ratings dataset

The data set is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset.   
[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]

The original dataset has  9000 movies rated by 600 users. The dataset has been reduced in size to focus on movies from the years since 2000. This dataset consists of ratings on a scale of 0.5 to 5 in 0.5 step increments. The reduced dataset has $n_u = 443$ users, and $n_m= 4778$ movies. 

Below, you will load the movie dataset into the variables $Y$ and $R$.

The matrix $Y$ (a  $n_m \times n_u$ matrix) stores the ratings $y^{(i,j)}$. The matrix $R$ is an binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise. 

Throughout this part of the exercise, you will also be working with the
matrices, $\mathbf{X}$, $\mathbf{W}$ and $\mathbf{b}$: 

$$\mathbf{X} = 
\begin{bmatrix}
--- (\mathbf{x}^{(0)})^T --- \\
--- (\mathbf{x}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{x}^{(n_m-1)})^T --- \\
\end{bmatrix} , \quad
\mathbf{W} = 
\begin{bmatrix}
--- (\mathbf{w}^{(0)})^T --- \\
--- (\mathbf{w}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{w}^{(n_u-1)})^T --- \\
\end{bmatrix},\quad
\mathbf{ b} = 
\begin{bmatrix}
 b^{(0)}  \\
 b^{(1)} \\
\vdots \\
b^{(n_u-1)} \\
\end{bmatrix}\quad
$$ 

The $i$-th row of $\mathbf{X}$ corresponds to the
feature vector $x^{(i)}$ for the $i$-th movie, and the $j$-th row of
$\mathbf{W}$ corresponds to one parameter vector $\mathbf{w}^{(j)}$, for the
$j$-th user. Both $x^{(i)}$ and $\mathbf{w}^{(j)}$ are $n$-dimensional
vectors. For the purposes of this exercise, you will use $n=10$, and
therefore, $\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ have 10 elements.
Correspondingly, $\mathbf{X}$ is a
$n_m \times 10$ matrix and $\mathbf{W}$ is a $n_u \times 10$ matrix.

We will start by loading the movie ratings dataset to understand the structure of the data.
We will load $Y$ and $R$ with the movie dataset.  
We'll also load $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$ with pre-computed values. These values will be learned later in the lab, but we'll use pre-computed values to develop the cost model.

In [2]:
#Load data
df_movies = pd.read_csv('data\movies\movies.csv')
df_ratings = pd.read_csv('data\movies\\ratings.csv')

In [3]:
years = [(str(i)) for i in range(2000, 2022)]
df_movies['year'] = df_movies['title'].str.extract(r'\((\d{4})\)')

mask = df_movies['title'].str.contains('|'.join(years))
df_movies = df_movies[mask]

genres_one_hot_encoding = df_movies['genres'].str.get_dummies(sep='|')
df_movies = pd.concat([df_movies, genres_one_hot_encoding], axis=1)

df_movies.reset_index(inplace=True)
df_movies.drop(columns=['index', 'genres', 'year', 'title', '(no genres listed)'], inplace=True)

In [4]:
ratings_filtered_mask = df_ratings['movieId'].isin(df_movies['movieId'])
df_ratings = df_ratings[ratings_filtered_mask]
df_ratings.reset_index(inplace=True)
df_ratings.drop(columns=['index', 'timestamp'], inplace=True)

In [5]:
# Create new ids
new_movie_ids = df_movies.index
new_user_ids = range(df_ratings['userId'].nunique())

# Dictionarys to map old ids with new ones
id_mapping = dict(zip(df_movies['movieId'], new_movie_ids))
user_id_mapping = dict(zip(sorted(df_ratings['userId'].unique()), new_user_ids))

df_ratings['movieId'] = df_ratings['movieId'].map(id_mapping)
df_ratings['userId'] = df_ratings['userId'].map(user_id_mapping)
df_movies['movieId'] = df_movies['movieId'].map(id_mapping)


In [6]:
df_ratings.head()

Unnamed: 0,userId,movieId,rating
0,0,8,5.0
1,0,56,5.0
2,0,65,4.0
3,0,73,4.0
4,0,84,5.0


In [7]:
df_movies.head()

Unnamed: 0,movieId,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0
1,1,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0
2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,3,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0
4,4,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0


In [8]:
X = df_movies.drop(columns='movieId')
num_movies = X.shape[0]
num_features = X.shape[1]
num_users = df_ratings['userId'].nunique()
R = np.zeros((num_movies, num_users))
Y = np.zeros((num_movies, num_users))

for index, row in df_ratings.iterrows():
    # Movie index and user index for the matrixes
    movie_index = int(row['movieId']) - 1  # Subtract 1 to adjust to base 0
    user_id = int(row['userId']) - 1  # Subtract 1 to adjust to base 0
    
    # Fill R with 1 if user rated the movie
    R[movie_index, user_id] = 1
    
    # Fill Y with corresponding rate
    Y[movie_index, user_id] = row['rating']

In [9]:
print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
# print("W", W.shape)
# print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (4789, 458) R (4789, 458)
X (4789, 19)
num_features 19
num_movies 4789
num_users 458


In [10]:
#  From the matrix, we can compute statistics like average rating.
tsmean =  np.mean(Y[0, R[0, :].astype(bool)])
print(f"Average rating for movie 1 : {tsmean:0.3f} / 5" )

Average rating for movie 1 : 2.708 / 5


## Recommender systems


The goal of a collaborative filtering recommender system is to generate two vectors: For each user, a 'parameter vector' that embodies the movie tastes of a user. For each movie, a feature vector of the same size which embodies some description of the movie. The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.

### Collaborative filtering learning algorithm

Now, you will begin implementing the collaborative filtering learning
algorithm. You will start by implementing the objective function. 

The collaborative filtering algorithm in the setting of movie
recommendations considers a set of $n$-dimensional parameter vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$, $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$, where the
model predicts the rating for movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$ . Given a dataset that consists of
a set of ratings produced by some users on some movies, you wish to
learn the parameter vectors $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},
\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$  and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimizes
the squared error).

You will complete the code in cofiCostFunc to compute the cost
function for collaborative filtering. 

### 4.1 Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
\tag{1}$$
The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$

You should now write cofiCostFunc (collaborative filtering cost function) to return this cost.

In [11]:
def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for the content-based filtering
    Args:
        X (ndarray (num_movies,num_features)): matrix of item features
        W (ndarray (num_users,num_features)) : matrix of user parameters
        b (ndarray (1, num_users)            : vector of user parameters
        Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies
        R (ndarray (num_movies,num_users)    : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
        lambda_ (float): regularization parameter
    Returns:
        J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    
    for j in range(nu):
        for i in range(nm):
            f_wxb = (np.dot(W[j], X[i]) + b[0][j])
            J += R[i][j] * (f_wxb - Y[i][j])**2
            
            
#     w_sum = 0
#     for j in range(nu):
#         for k in range(len(W[j])):
#             w_sum += W[j][k]**2
#     w_sum *= lambda_

#     x_sum = 0
#     for j in range(nu):
#         print(X[j])
#         for k in range(len(X[j])):
#             print(X[j][k])
#             x_sum += X[j][k]**2
#     x_sum *= lambda_

#     J += w_sum + x_sum

    J *= 1/2
    J += (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    

    return J

**Vectorized Implementation**

It is important to create a vectorized implementation to compute $J$, since it will later be called many times during optimization.

In [12]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for the content-based filtering
    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.
    Args:
        X (ndarray (num_movies,num_features)): matrix of item features
        W (ndarray (num_users,num_features)) : matrix of user parameters
        b (ndarray (1, num_users)            : vector of user parameters
        Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies
        R (ndarray (num_movies,num_users)    : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
        lambda_ (float): regularization parameter
    Returns:
        J (float) : Cost
    """
    X = tf.convert_to_tensor(X, dtype=tf.float64)
    W = tf.convert_to_tensor(W, dtype=tf.float64)
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

In [13]:
def initialize_parameters(num_users, num_features):
    """
    Initializes the parameters W and b for collaborative filtering algorithm
    
    Args:
    num_users (int): Number of users
    num_features (int): Number of features
    
    Returns:
    W (ndarray): Initialized matrix of user parameters
    b (ndarray): Initialized vector of user parameters
    """
    # Initialize W with random values
    W = np.random.randn(num_users, num_features)
    
    # Initialize b with zeros
    b = np.zeros((1, num_users))
    
    return W, b

In [14]:
W, b = initialize_parameters(num_users, num_features)

# Evaluate cost function
J = cofi_cost_func(np.array(X), W, b, Y, R, 0)
print(f"Cost: {J:0.2f}")

# Evaluate cost function with regularization 
J = cofi_cost_func(np.array(X), W, b, Y, R, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost: 291800.11
Cost (with regularization): 306662.63


In [15]:
# Evaluate cost function
J = cofi_cost_func_v(np.array(X), W, b, Y, R, 0)
print(f"Cost: {J:0.2f}")

# Evaluate cost function with regularization 
J = cofi_cost_func_v(np.array(X), W, b, Y, R, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost: 291800.11
Cost (with regularization): 306662.63
