# Unit 5: Model-based Collaborative Filtering for **Rating** Prediction

In this unit, we switch from neighborhood-based to **model-based** collaborative filtering (CF). Instead of looking up the $k$ nearest neighbors, we create and train a model that describes users and items. The model parameters are latent representations of users and items.

The key idea is to compress the sparse interaction information of $R$ by finding two matrices $U$ and $V$ whose product approximately reconstructs $R$. The decomposition of $R$ into $U \times V^T$ is called _matrix factorization_, and we refer to $U$ as the user latent factor matrix and $V$ as the item latent factor matrix.

Compressing the sparse matrix into the product of two matrices pays off because the two factor matrices are much smaller than $R$. Their size is governed by the dimension of the latent user and item vectors, denoted $d \in \mathbb{N}$. We choose $d$ to be much smaller than the number of users or items:

\begin{equation*}
\underset{m\times n}{R} \approx \underset{m\times d}{U} \times \underset{d\times n}{V^T},
\qquad d \ll \min\{m, n\}
\end{equation*}
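
To make the size reduction concrete, here is a minimal NumPy sketch. The sizes `m_toy`, `n_toy`, and `d_toy` are illustrative assumptions (roughly MovieLens-100k scale), not values read from the dataset, and the factors are random rather than trained.

```python
import numpy as np

# Illustrative sizes (assumptions, roughly MovieLens-100k scale)
m_toy, n_toy, d_toy = 943, 1682, 8

rng = np.random.default_rng(0)
U_toy = rng.normal(size=(m_toy, d_toy))  # user latent factors, one row per user
V_toy = rng.normal(size=(n_toy, d_toy))  # item latent factors, one row per item

# Reconstruct a full m x n matrix as U V^T
R_hat = U_toy @ V_toy.T

dense_entries = m_toy * n_toy             # entries of the full matrix
factor_entries = (m_toy + n_toy) * d_toy  # entries stored in both factor matrices
print(R_hat.shape)                        # (943, 1682)
print(dense_entries, factor_entries)      # 1586126 21000
```

With these assumed sizes, the two factor matrices hold about 1.3% as many numbers as the full matrix.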

In [None]:
from collections import OrderedDict
import itertools
from typing import Dict, List, Tuple

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [None]:
from recsys_training.data import Dataset
from recsys_training.evaluation import get_relevant_items

In [None]:
ml100k_ratings_filepath = '../data/raw/ml-100k/u.data'

## Load Data

In [None]:
data = Dataset(ml100k_ratings_filepath)
data.rating_split(seed=42)
user_ratings = data.get_user_ratings()

## Initialize the user and item latent factors, i.e. the model parameters

In [None]:
seed = 42
m = data.n_users
n = data.n_items
d = 8

As we want to learn the user and item latent factors from rating data, we first initialize them randomly:

In [None]:
np.random.seed(seed)
user_factors = np.random.normal(0, 1, (m, d))
item_factors = np.random.normal(0, 1, (n, d))
ratings = data.train_ratings[['user', 'item', 'rating']].sample(frac=1, random_state=seed)

## Training

We fit the model to the data with a technique called _minibatch gradient descent_.

This means that we make several full passes through the training ratings, called epochs. The ratings are shuffled once up front, and in each epoch we step through them in small consecutive slices (our minibatches), each holding user, item, and rating for every instance. For each minibatch, we compute the rating predictions as the dot products of the user and item latent vectors (also called embeddings) and measure the mean squared error between predicted and true ratings. We then differentiate this error with respect to the user and item latent vectors to obtain the gradients. Finally, we subtract the gradients, scaled by the learning rate, from the latent vectors, which moves them in the direction that reduces the error, i.e. the deviation between true and predicted ratings.
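
Written out for a minibatch $B$ of (user, item, rating) triples, the loss and its partial derivatives would read roughly as follows. The symbols $B$, $\hat{r}_{ij}$, $L$, and the learning rate $\eta$ are notation assumed here for illustration; $u_i$ and $v_j$ denote rows of $U$ and $V$.

\begin{equation*}
\begin{aligned}
\hat{r}_{ij} &= u_i^T v_j, \qquad
L = \frac{1}{|B|} \sum_{(i,j) \in B} \left(r_{ij} - \hat{r}_{ij}\right)^2 \\
\frac{\partial L}{\partial u_i} &= -\frac{2}{|B|} \sum_{j:\,(i,j) \in B} \left(r_{ij} - \hat{r}_{ij}\right) v_j, \qquad
\frac{\partial L}{\partial v_j} = -\frac{2}{|B|} \sum_{i:\,(i,j) \in B} \left(r_{ij} - \hat{r}_{ij}\right) u_i \\
u_i &\leftarrow u_i - \eta \, \frac{\partial L}{\partial u_i}, \qquad
v_j \leftarrow v_j - \eta \, \frac{\partial L}{\partial v_j}
\end{aligned}
\end{equation*}

The sums run over all instances in the minibatch that involve the same user (or item), so a user who appears several times in $B$ collects one contribution per occurrence.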

To keep track of the error during training, we compute the root mean squared error (RMSE) every 300 minibatches, on the current minibatch and on the test ratings, and print it.
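
As a rough illustration of a single descent step, the sketch below computes per-instance gradients of the minibatch mean squared error on random toy data and checks that one small step lowers the RMSE. The helper names `mse_gradients_sketch` and `rmse_sketch`, the toy arrays, and the step size are assumptions made for this example. It is one possible formulation, not the notebook's reference solution.

```python
import numpy as np

def mse_gradients_sketch(rating, u, v):
    """Per-instance gradients of the minibatch mean squared error.

    Hypothetical helper for illustration only.
    rating: shape (b,); u and v: shape (b, d), one row per rating.
    """
    error = rating - np.sum(u * v, axis=1)   # shape (b,)
    scale = -2.0 / len(rating)
    u_grad = scale * error[:, None] * v      # derivative w.r.t. each user row
    v_grad = scale * error[:, None] * u      # derivative w.r.t. each item row
    return u_grad, v_grad

def rmse_sketch(rating, u, v):
    """Root mean squared error between ratings and dot-product predictions."""
    return np.sqrt(np.mean((rating - np.sum(u * v, axis=1)) ** 2))

# Random toy minibatch (made-up data, not MovieLens)
rng = np.random.default_rng(42)
b, d_toy = 64, 8
rating_toy = rng.integers(1, 6, size=b).astype(float)
u_toy = rng.normal(size=(b, d_toy))
v_toy = rng.normal(size=(b, d_toy))

rmse_before = rmse_sketch(rating_toy, u_toy, v_toy)
u_grad, v_grad = mse_gradients_sketch(rating_toy, u_toy, v_toy)

step = 0.01  # assumed step size for the illustration
rmse_after = rmse_sketch(rating_toy, u_toy - step * u_grad, v_toy - step * v_grad)
print(rmse_before, rmse_after)
```

Because the step follows the negative gradient and is small, the second value printed should be slightly lower than the first.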

In [None]:
epochs = 10
batch_size = 64
learning_rate = 0.01

num_batches = int(np.ceil(len(ratings) / batch_size))
rmse_trace = []
rmse_test_trace = []

**Task:** Implement `compute_gradients` that receives a minibatch and computes the gradients for user and item latent vectors involved.

In [None]:
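def compute_gradients(ratings: np.ndarray,
                      u: np.ndarray,
                      v: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    # ratings: shape (batch_size,); u, v: shape (batch_size, d), one row per rating
    # TODO: compute u_grad and v_grad, each with the same shape as u and v
    pass

    return u_grad, v_grad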

In [None]:
def get_rmse(rating, u, v) -> float:
    pred = np.sum(u * v, axis=1)
    error = rating - pred
    rmse = np.sqrt(np.mean(error ** 2))
    return rmse

In [None]:
for epoch in range(epochs):
    for idx in range(num_batches):
        
        minibatch = ratings.iloc[idx * batch_size:(idx + 1) * batch_size]
        
        # subtract 1 because user and item ids are 1-indexed, but arrays are 0-indexed
        user_embeds = user_factors[minibatch['user'].values - 1]
        item_embeds = item_factors[minibatch['item'].values - 1]

        user_grads, item_grads = compute_gradients(minibatch['rating'].values,
                                                   user_embeds,
                                                   item_embeds)
        
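        # update user and item factors; np.subtract.at accumulates the updates
        # for users/items that occur more than once in the minibatch
        np.subtract.at(user_factors, minibatch['user'].values - 1, learning_rate * user_grads)
        np.subtract.at(item_factors, minibatch['item'].values - 1, learning_rate * item_grads)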

        if not idx % 300:
            rmse = get_rmse(minibatch['rating'].values,
                            user_embeds,
                            item_embeds)
            rmse_test = get_rmse(data.test_ratings['rating'].values,
                                 user_factors[data.test_ratings['user'].values - 1],
                                 item_factors[data.test_ratings['item'].values - 1])
            rmse_trace.append(rmse)
            rmse_test_trace.append(rmse_test)
            print(f"Epoch: {epoch:02d}, RMSE: {rmse:.3f}, Test RMSE: {rmse_test:.3f}")
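
A minibatch may contain the same user or item more than once. The following sketch, on made-up toy arrays, illustrates how NumPy treats repeated indices in an in-place fancy-indexed update compared with the unbuffered `np.subtract.at`. The array names are hypothetical.

```python
import numpy as np

factors_buffered = np.zeros((3, 2))
factors_accumulated = np.zeros((3, 2))

rows = np.array([0, 0, 2])   # row 0 appears twice, as a repeated user would
updates = np.ones((3, 2))    # one update row per entry in `rows`

# Buffered fancy-index update: for the repeated row only one update survives
factors_buffered[rows] -= updates

# Unbuffered ufunc.at: every occurrence of a row contributes
np.subtract.at(factors_accumulated, rows, updates)

print(factors_buffered[0])      # [-1. -1.]
print(factors_accumulated[0])   # [-2. -2.]
```

Which behavior is wanted depends on how the gradients are defined. If each row of the gradient array is the contribution of one rating, accumulating over repeated rows matches the summed partial derivative.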

In [None]:
plt.figure(figsize=(12,8))
plt.plot(range(len(rmse_trace)), rmse_trace, 'b--', label='Train')
plt.plot(range(len(rmse_test_trace)), rmse_test_trace, 'g--', label='Test')
plt.grid(True)
plt.legend()
plt.xlabel('Logging step (every 300 minibatches)')
plt.ylabel('RMSE')
plt.show()

### Using the model for Recommendations

We have now created a model that describes users and items in terms of latent vectors, fitted so that their dot products reconstruct the observed ratings. To obtain recommendations for a user, we compute the dot product of the user's latent vector with the latent vectors of the candidate items and recommend the items with the highest predicted ratings.
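
The ranking idea can be sketched on toy data as follows. The arrays, the set of known positives, and `top_n` are made up for illustration. This is not the implementation of `get_prediction`, only the underlying scoring and masking step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items_toy, d_toy = 6, 3
item_factors_toy = rng.normal(size=(n_items_toy, d_toy))  # one row per item
user_vector_toy = rng.normal(size=d_toy)                   # one user's latent vector

# Predicted rating for every item: one dot product per item row
scores = item_factors_toy @ user_vector_toy                # shape (n_items_toy,)

# Hypothetical known positives (0-indexed rows) the user has already rated
known_pos = np.array([1, 4])
scores_masked = scores.copy()
scores_masked[known_pos] = -np.inf                         # exclude from the ranking

top_n = 3
top_items = np.argsort(-scores_masked)[:top_n]             # item rows, best first
print(top_items, scores_masked[top_items])
```

Setting the scores of known positives to negative infinity keeps the array shape intact while guaranteeing that those items land at the end of the ranking.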

Thus, before writing `get_recommendations`, we first implement `get_prediction`.

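**Task:** Implement `get_prediction` for predicting ratings for a user and all items or a set of provided items. Remember to remove _known positives_. Return the predictions sorted by predicted rating in descending order, since `get_recommendations` simply takes the first $N$ entries.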

In [None]:
def get_prediction(user,
                   items: np.array = None,
                   remove_known_pos: bool = True) -> Dict[int, Dict[str, float]]:
    pass

    return preds

In [None]:
item_predictions = get_prediction(1)

In [None]:
list(item_predictions.items())[:20]

In [None]:
def get_recommendations(user: int, N: int, remove_known_pos: bool = False) -> List[Tuple[int, Dict[str, float]]]:
    predictions = get_prediction(user, remove_known_pos=remove_known_pos)
    recommendations = []
    for item, pred in predictions.items():
        add_item = (item, pred)
        recommendations.append(add_item)
        if len(recommendations) == N:
            break

    return recommendations

In [None]:
recommendations = get_recommendations(1, 10)

In [None]:
recommendations

### Evaluation

In [None]:
N = 10

In [None]:
relevant_items = get_relevant_items(data.test_ratings)

In [None]:
users = relevant_items.keys()
prec_at_N = dict.fromkeys(data.users)

for user in users:
    recommendations = get_recommendations(user, N, remove_known_pos=True)
    recommendations = [val[0] for val in recommendations]
    hits = np.intersect1d(recommendations,
                          relevant_items[user])
    prec_at_N[user] = len(hits)/N
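
The quantity computed in the loop above is precision at $N$: the share of the top-$N$ recommended items that also appear among the user's relevant test items. A minimal toy sketch, with made-up item ids and a hypothetical helper name, looks like this:

```python
import numpy as np

def precision_at_n_sketch(recommended, relevant, n):
    """Share of the top-n recommended items that are relevant (hypothetical helper)."""
    hits = np.intersect1d(recommended[:n], relevant)
    return len(hits) / n

recommended_toy = [50, 181, 100, 258, 7]   # made-up item ids, best first
relevant_toy = [181, 7, 300]               # made-up relevant items from a test set

print(precision_at_n_sketch(recommended_toy, relevant_toy, n=5))   # 0.4
```

Two of the five recommended items are relevant here, which gives 2/5.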

In [None]:
recommendations

In [None]:
np.mean([val for val in prec_at_N.values() if val is not None])