# Collaborative Filtering
### Author: _Calvin Chi_

We will use collaborative filtering to build a joke recommender system to recommend jokes that match people's sense of humour, based on historical ratings of $m = 24,983$ users for $p = 100$ jokes. The historical data can be represented as a matrix $R \in \mathbb{R}^{m \times p}$, where each user rates a subset of jokes and gives each a rating in $[-10, 10]$, with a higher value representing higher satisfaction. 

Define directory locations and load libraries

In [1]:
from scipy import io
import numpy as np
import pandas as pd
DIR = "datasets"

Load the training and validation data

In [2]:
train = pd.DataFrame(io.loadmat(DIR + '/joke_train.mat')['train'])
# Read the raw validation lines and close the file afterwards
with open(DIR + '/validation.txt') as f:
    validation = f.readlines()

Preprocess validation dataset

In [8]:
validation = np.array([list(map(int, line.strip().split(','))) for line in validation])

Set the regularization parameter $\lambda = 125$ and number of latent features to learn $d = 10$

In [18]:
lda = 125
d = 10

Fill `NaN` with 0. The implementation treats an entry of 0 as a missing rating, so a genuine rating of exactly 0 would also be counted as unrated.

In [19]:
train.fillna(0, inplace=True)
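
As a small illustration of the zero-as-missing convention, the sketch below compares a mask taken from the filled matrix with a mask taken from the `NaN` entries before filling. The toy frame and the names `toy`, `observed`, and `zero_mask` are made up for this example and are not part of the joke dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 ratings table; NaN marks a joke the user did not rate
toy = pd.DataFrame([[5.0, np.nan, 0.0],
                    [np.nan, -3.0, 2.0]])

# Mask taken before filling: True wherever a rating exists
observed = toy.notna().to_numpy()

# Mask taken after filling with 0, as the notebook does
filled = toy.fillna(0).to_numpy()
zero_mask = filled != 0

# The two masks differ only where a real rating equals 0
disagreement = observed & ~zero_mask
```

Here `disagreement` is `True` only at position `(0, 2)`, the one genuine rating of 0. If such ratings matter, keeping a mask like `observed` before calling `fillna` is one possible alternative.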

Let $R \in \mathbb{R}^{m \times p}$ be the ratings matrix, in which missing ratings were originally $\texttt{NaN}$ and are now 0, $U \in \mathbb{R}^{m \times d}$ be the latent factor matrix for people, $V \in \mathbb{R}^{d \times p}$ be the latent factor matrix for jokes, and $W \in \{0, 1\}^{m \times p}$ be the indicator matrix of whether a rating was given, where

$$W_{ij}=\begin{cases}
    1, & \text{if $R_{ij} \neq 0$}\\
    0, & \text{Otherwise}\\
  \end{cases}$$
  
Let $\lambda$ be the regularization hyperparameter, and let $u_{i} = U[i, :]$ and $v_{j} = V[:, j]$ be the latent factors of person $i$ and joke $j$ respectively. The latent factor matrices $U$ and $V$ are learned by minimizing the regularized squared error over the observed ratings, denoted $MSE$ here and defined as

$$MSE = \sum_{(i, j) \in S}(\langle u_{i}, v_{j} \rangle - R_{ij})^{2} + \lambda \sum_{i = 1}^{m}||u_{i}||_{2}^{2} + \lambda \sum_{j = 1}^{p}||v_{j}||_{2}^{2}, \quad \text{where }S = \{(i, j): R_{ij} \neq \texttt{NaN}\} $$
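
To make the objective concrete, here is a minimal sketch that evaluates it on a tiny hand-made example. The helper name `als_objective` and the toy matrices are assumptions for illustration only; missing ratings are represented as `NaN`, matching the definition of $S$.

```python
import numpy as np

def als_objective(R, U, V, lda):
    """Regularized squared error over the observed (non-NaN) entries of R."""
    observed = ~np.isnan(R)
    # Zero out residuals at unobserved positions so they do not contribute
    residual = np.where(observed, U.dot(V) - R, 0.0)
    return (residual ** 2).sum() + lda * (U ** 2).sum() + lda * (V ** 2).sum()

# Two users, two jokes, one latent feature; each user rated one joke
R_toy = np.array([[1.0, np.nan],
                  [np.nan, 2.0]])
U_toy = np.ones((2, 1))
V_toy = np.ones((1, 2))

objective_toy = als_objective(R_toy, U_toy, V_toy, lda=0.5)
```

With every prediction equal to 1, the squared errors are $0$ and $1$, and each regularization term is $0.5 \cdot 2 = 1$, so `objective_toy` is 3.0.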

where $\langle u_{i}, v_{j} \rangle$ denotes the inner product between $u_{i}$ and $v_{j}$. Solving for $U$ and $V$ can be done via [alternating least squares](https://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/): fix $V$ and solve for each $u_{i}$, then fix $U$ and solve for each $v_{j}$, and repeat. Let $R_{u_{i}} \in \mathbb{R}^{1 \times p}$ be the row of joke ratings given by person $i$, $R_{v_{j}} \in \mathbb{R}^{m \times 1}$ be the column of ratings given by the $m$ people for joke $j$, $W_{u_{i}} \in \mathbb{R}^{p \times p}$ be the diagonal matrix formed from $W[i, :]$, and $W_{v_{j}} \in \mathbb{R}^{m \times m}$ be the diagonal matrix formed from $W[:, j]$. In matrix notation, keeping only the terms of the objective that depend on $u_{i}$ (with $V$ fixed) and on $v_{j}$ (with $U$ fixed), we have respectively

$$MSE(u_{i}) = (R_{u_{i}} - u_{i}V)W_{u_{i}}(R_{u_{i}} - u_{i}V)^{T} + \lambda u_{i}u_{i}^{T}$$

$$MSE(v_{j}) = (R_{v_{j}} - Uv_{j})^{T}W_{v_{j}}(R_{v_{j}} - Uv_{j}) + \lambda v_{j}^{T}v_{j}$$

Setting the gradient with respect to $u_{i}$ to zero gives the update equation for $u_{i}$:

\begin{align*}
\frac{1}{2}\frac{\partial MSE(u_{i})}{\partial u_{i}} &= -R_{u_{i}}W_{u_{i}}V^{T} + u_{i}VW_{u_{i}}V^{T} + \lambda u_{i} = 0\\
&\Rightarrow u_{i}(VW_{u_{i}}V^{T} + \lambda I) = R_{u_{i}}W_{u_{i}}V^{T}\\
&\Rightarrow u_{i} = R_{u_{i}}W_{u_{i}}V^{T}(VW_{u_{i}}V^{T} + \lambda I)^{-1}
\end{align*}

Likewise, setting the gradient with respect to $v_{j}$ to zero gives the update equation for $v_{j}$:

\begin{align*}
\frac{1}{2}\frac{\partial MSE(v_{j})}{\partial v_{j}} &= -U^{T}W_{v_{j}}R_{v_{j}} + U^{T}W_{v_{j}}Uv_{j} + \lambda v_{j} = 0\\
&\Rightarrow (U^{T}W_{v_{j}}U + \lambda I)v_{j} = U^{T}W_{v_{j}}R_{v_{j}}\\
&\Rightarrow v_{j} = (U^{T}W_{v_{j}}U + \lambda I)^{-1}U^{T}W_{v_{j}}R_{v_{j}}
\end{align*}
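
As a numerical check of the closed-form update for $u_{i}$, the sketch below uses small random matrices (not the joke data) and hypothetical variable names. It confirms that the solution makes the gradient vanish and that it coincides with ordinary ridge regression restricted to the jokes the user actually rated.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, lda = 3, 6, 0.5

V = rng.normal(size=(d, p))                    # joke factors, held fixed
r = rng.normal(size=p)                         # one user's ratings (R_{u_i})
w = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])   # diagonal of W_{u_i}

# Closed form: u_i = R W V^T (V W V^T + lambda I)^{-1}, written as a linear solve.
# V * w scales column j of V by w[j], which equals V.dot(np.diag(w)).
A = (V * w).dot(V.T) + lda * np.eye(d)
b = (V * w).dot(r)
u_closed = np.linalg.solve(A, b)

# Gradient from the derivation, evaluated at the solution
grad = -b + (V * w).dot(V.T).dot(u_closed) + lda * u_closed

# Ridge regression that simply drops the unrated jokes
rated = w == 1
V_rated = V[:, rated]
u_ridge = np.linalg.solve(V_rated.dot(V_rated.T) + lda * np.eye(d),
                          V_rated.dot(r[rated]))
```

Both `np.allclose(grad, 0)` and `np.allclose(u_closed, u_ridge)` should hold, which shows that $W_{u_{i}}$ only serves to remove unrated jokes from the least-squares problem.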

And now the implementation

In [68]:
def collaborative_filtering(R, d, lda, threshold):
    # Work on a plain array; zeros mark missing ratings (see the fillna step)
    R = np.asarray(R, dtype=np.float64)
    W = (R != 0).astype(np.float64)
    # Random initialization of the latent factor matrices
    jokes = np.random.normal(0, 1, (d, R.shape[1]))
    users = np.random.normal(0, 1, (R.shape[0], d))
    # S_idx is a tuple: 1st array holds row indices, 2nd array holds column indices
    S_idx = np.where(R != 0)
    reg = lda * np.eye(d)
    MSE1 = float("inf")
    MSE2 = np.mean((users.dot(jokes)[S_idx] - R[S_idx])**2)
    # Alternate updates until the training MSE improves by no more than threshold
    while MSE1 - MSE2 > threshold:
        MSE1 = MSE2
        for u, Wu in enumerate(W):
            # jokes * Wu equals jokes.dot(np.diag(Wu)) without building a p x p matrix
            VW = jokes * Wu
            users[u, :] = np.linalg.solve(VW.dot(jokes.T) + reg, VW.dot(R[u, :]))
        for r, Wr in enumerate(W.T):
            # users.T * Wr equals users.T.dot(np.diag(Wr)) without building an m x m matrix
            UW = users.T * Wr
            jokes[:, r] = np.linalg.solve(UW.dot(users) + reg, UW.dot(R[:, r]))
        MSE2 = np.mean((users.dot(jokes)[S_idx] - R[S_idx])**2)
    return users, jokes

Train on the historical data

In [69]:
users, jokes = collaborative_filtering(train, d, lda, 0.1)
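
One way to sanity-check an alternating least squares implementation is to track the regularized objective across sweeps. Each block update minimizes that objective exactly with the other block held fixed, so it should never increase. The sketch below does this on small synthetic data; `als_sweep`, `objective`, and all sizes are assumptions for illustration and this is not the training run on the joke data.

```python
import numpy as np

def als_sweep(R, W, U, V, lda):
    """One ALS sweep: update every row of U, then every column of V (in place)."""
    reg = lda * np.eye(U.shape[1])
    for i in range(R.shape[0]):
        VW = V * W[i]                     # same as V.dot(np.diag(W[i]))
        U[i] = np.linalg.solve(VW.dot(V.T) + reg, VW.dot(R[i]))
    for j in range(R.shape[1]):
        UW = U.T * W[:, j]                # same as U.T.dot(np.diag(W[:, j]))
        V[:, j] = np.linalg.solve(UW.dot(U) + reg, UW.dot(R[:, j]))

def objective(R, W, U, V, lda):
    """Squared error on observed entries plus the two regularization terms."""
    return (W * (U.dot(V) - R) ** 2).sum() + lda * ((U ** 2).sum() + (V ** 2).sum())

rng = np.random.default_rng(0)
m, p, d, lda = 30, 8, 2, 0.1
W = (rng.random((m, p)) < 0.7).astype(float)   # roughly 70% of ratings observed
R = W * rng.uniform(-10, 10, (m, p))           # unobserved entries stay 0
U = rng.normal(size=(m, d))
V = rng.normal(size=(d, p))

history = [objective(R, W, U, V, lda)]
for _ in range(5):
    als_sweep(R, W, U, V, lda)
    history.append(objective(R, W, U, V, lda))
```

The values in `history` should be non-increasing. Note that the stopping rule in `collaborative_filtering` monitors the unregularized mean squared error on the observed ratings, which is a related but different quantity.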

Now that collaborative filtering has produced the latent factor matrices, the matrix product $P = UV$ gives a matrix in $\mathbb{R}^{m \times p}$ whose $(i, j)$ entry is the predicted rating of joke $j$ by user $i$. We can now use $P$ to predict whether user $i$ will like joke $j$, with liking a joke defined as giving it a rating greater than 0.

The validation dataset has format

```
1, 5, 1
1, 8, 1
```

where each line follows the format $\texttt{i, j, s}$: user $\texttt{i}$ and joke $\texttt{j}$, both indexed from 1, and a binary label $\texttt{s}$ that is 1 if the user liked the joke and 0 otherwise. 

In [75]:
predicted_ratings = users.dot(jokes)

Convert ratings to whether to recommend joke to user. 

In [78]:
positive_rating = (predicted_ratings > 0).astype(int)

Get the right row and column indices from $P$ for prediction on validation dataset.

In [84]:
row_idx = validation[:, 0] - 1
col_idx = validation[:, 1] - 1

Make prediction and calculate mean accuracy.

In [91]:
accuracy = np.mean(positive_rating[row_idx, col_idx] == validation[:, 2])
accuracy

0.73170731707317072
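
The indexing and accuracy steps above can be traced by hand on a tiny example. The matrices below are invented for illustration (the names `P_toy` and `validation_toy` are hypothetical) and follow the same 1-based `i, j, s` layout as the validation file.

```python
import numpy as np

# Hypothetical predicted ratings for 2 users and 2 jokes
P_toy = np.array([[2.5, -1.0],
                  [-0.5, 4.0]])

# Rows are (user, joke, liked) with 1-based user and joke indices
validation_toy = np.array([[1, 1, 1],
                           [1, 2, 1],
                           [2, 2, 1]])

liked = (P_toy > 0).astype(int)

# Shift to 0-based indices before looking up entries of P
rows = validation_toy[:, 0] - 1
cols = validation_toy[:, 1] - 1

accuracy_toy = np.mean(liked[rows, cols] == validation_toy[:, 2])
```

The predictions for the three pairs are 1, 0, and 1 against labels of 1, 1, and 1, so `accuracy_toy` is 2/3.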

This concludes my notes on collaborative filtering.