In [1]:
import numpy as np
import pytest
import torch
from torch import nn

from typing import Callable, List, Optional, Union

# RecPack: An Experimental Toolkit for Recommendation Algorithms

by Lien Michiels and Robin Verachtert

Welcome to this demo of the RecPack experimentation framework.

## What is Recpack?
- Python library for recommendation algorithm implementation and evaluation
- Focus on being easily extendable

Extendable:
- Makes it easy for researchers to compare their new algorithms to the baselines.
- Loosely coupled modules make for a shallow learning curve when adding your own components.

## What is Recpack?
### Modules and interactions
![pipeline](RecpackPipeline.png)

## This Demo
- Demonstrate implementation of new algorithm using "Neural Matrix Factorization (He et al. 2017)"

To show how to use the package, and what we mean by extendable, we implement a neural network using PyTorch.
Thanks to a feature-rich baseclass this requires only a limited amount of code, which speeds up development.

## This Demo
- Demonstrate implementation of new algorithm using "Neural Matrix Factorization (He et al. 2017)"
- Run experiment to compare performance of new algorithm to baselines

Once we have implemented the algorithm, we throw it into the arena against some baselines.
This showcases the pipeline functionality, which acts as glue between the different RecPack components and provides an intuitive way to create and run experiments.

## Neural Matrix Factorization
- Users and items are represented by embedding vectors
- Similarity is modeled using an MLP network, rather than computing <u,i> as in traditional matrix factorization embedding techniques.

![Architecture](NeuMFArchitecture.png)

For simplicity, and to focus on the recpack functionality, we implement the NMF approach that only uses an MLP, as presented in Figure 2 in the original paper.

The architecture boils down to 2 embeddings, an MLP module and a final output conversion function.


### Implementing MLP network

![MLPArchitecture](MLPArchitecture.png)


In order to make the MLP reusable we will implement it as its own Torch module.
The parameters include: layer dimensions for each layer, activation function to be used after each linear operation and the number of output nodes.


In [2]:
# Code used from https://github.com/facebookresearch/multimodal/blob/5dec8a/torchmultimodal/modules/layers/mlp.py
# Another option is to use torchvision.ops.MLP 
# which is nearly identical in implementation, but is in torchvision and not in base torch.
class MLP(nn.Module):
    """A multi-layer perceptron module.
    This module is a sequence of linear layers plus activation functions.
    The user can optionally add dropout after each of the hidden layers.
    
    Code used from https://github.com/facebookresearch/multimodal/blob/5dec8a/torchmultimodal/modules/layers/mlp.py
    
    :param in_dim: Input dimension.
    :type in_dim: int
    :param out_dim: Output dimension.
    :type out_dim: int
    :param hidden_dims: Output dimension for each hidden layer.
    :type hidden_dims: Optional[Union[int, List[int]]] 
    :param dropout: Probability for dropout layers between each hidden layer.
    :type dropout: float
    :param activation: Which activation function to use. 
        Supports module type or partial.
    :type activation: Callable[..., nn.Module]
    """
    pass

In [3]:
class MLP(MLP):
    def __init__(
        self, 
        in_dim: int, 
        out_dim: int,
        hidden_dims: Optional[Union[int, List[int]]] = None,
        dropout: float = 0.5,
        activation: Callable[..., nn.Module] = nn.ReLU,
    ):
        super().__init__()

        layers = nn.ModuleList()

        if hidden_dims is None:
            hidden_dims = []

        if isinstance(hidden_dims, int):
            hidden_dims = [hidden_dims]

        for hidden_dim in hidden_dims:
            layers.append(nn.Linear(in_dim, hidden_dim))
            layers.append(activation())
            layers.append(nn.Dropout(dropout))
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, out_dim))
        self.model = nn.Sequential(*layers)

In [4]:
class MLP(MLP):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)

In [5]:
EMBEDDING_SIZE = 3
N_USERS = 5
N_ITEMS = 10
a = MLP(2*EMBEDDING_SIZE, N_ITEMS, [6, 4, 2], dropout=0)

In [6]:
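def test_MLP_shapes():
    # A batch of 4 concatenated (user, item) embeddings should yield 4 rows of N_ITEMS outputs.
    x = torch.rand(4, 2 * EMBEDDING_SIZE)
    assert a(x).shape == (4, N_ITEMS)

test_MLP_shapes()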


## Implementing the Neural Architecture

![Architecture](NeuMFArchitecture.png)

Given the MLP module implemented before, our NeuMF module includes 2 embeddings, 1 MLP block and a final activation function that maps the scores to the [0, 1] interval.

We also need to define the interface for the forward function.
Since we need both a user id and an item id, it makes sense to have two parameters: `users` and `items`.
Both are 1D tensors of size L that contain ids.

When calling forward, the embeddings are looked up for each user and item, generating two 2D tensors of dimensions L x num_components.
These are concatenated (horizontally) into a single matrix of dimensions L x (2 * num_components).

The concatenated matrix is passed through the MLP, which predicts a final score for each u,i pair.
These scores are mapped to the [0, 1] interval by a sigmoid function.

The resulting scores are returned. Comparison to targets and learning happen in the baseclass.
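
The shape bookkeeping of the forward pass can be checked with a small standalone sketch. The sizes used here (`L = 4`, `num_components = 6`) and the single `nn.Linear` layer standing in for the MLP are illustrative assumptions, not part of the module implemented in this demo.

```python
import torch
from torch import nn

L, num_components = 4, 6          # illustrative sizes (assumed)
n_users, n_items = 5, 10

user_embedding = nn.Embedding(n_users, num_components)
item_embedding = nn.Embedding(n_items, num_components)

users = torch.LongTensor([0, 1, 2, 3])   # 1D tensor of size L
items = torch.LongTensor([9, 8, 7, 6])   # 1D tensor of size L

user_emb = user_embedding(users)          # shape (L, num_components)
item_emb = item_embedding(items)          # shape (L, num_components)

# Horizontal concatenation doubles the second dimension.
pairs = torch.hstack([user_emb, item_emb])   # shape (L, 2 * num_components)

# A single linear layer stands in for the MLP here (assumption for brevity).
stand_in_mlp = nn.Linear(2 * num_components, 1)
scores = torch.sigmoid(stand_in_mlp(pairs))  # shape (L, 1), values in [0, 1]
```

Each row of `scores` belongs to the pair `(users[i], items[i])`, which is why both id tensors must have the same length.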


In [7]:
class NMFModule(nn.Module):
    """Model that encodes the Neural Matrix Factorization Network.
    
    Implements the 3 tiered network defined in the He et al. paper.

    :param predictive_powers: size of the last hidden layer in MLP.
        Embedding sizes computed as 2 * predictive powers.
    :type predictive_powers: int
    :param n_users: number of users in the network
    :type n_users: int
    :param n_items: number of items in the network
    :type n_items: int
    :param dropout: Dropout chance between layers of the MLP
    :type dropout: float
    """
    pass

In [8]:
class NMFModule(NMFModule):
    def __init__(
        self, predictive_powers: int, n_users: int, n_items: int, dropout: float
    ):
        super().__init__()
        num_components = 2 * predictive_powers
        
        self.user_embedding = nn.Embedding(n_users, num_components)
        self.item_embedding = nn.Embedding(n_items, num_components)

        # we use a three tiered MLP as described in the experiments of the paper.
        hidden_dims = [
            4 * predictive_powers, 
            2 * predictive_powers, 
            predictive_powers
        ]

        # Output is always 1, since we need a single score for u,i
        self.mlp = MLP(4 * predictive_powers, 1, 
                       hidden_dims, dropout=dropout)

        self.final = nn.Sigmoid()

        # weight initialization
        self.user_embedding.weight.data.normal_(0, 
            1.0 / self.user_embedding.embedding_dim)
        self.item_embedding.weight.data.normal_(0, 
            1.0 / self.item_embedding.embedding_dim)

In [9]:
class NMFModule(NMFModule):
    def forward(self, users: torch.LongTensor, items: torch.LongTensor) -> torch.FloatTensor:
        """Predict scores for the user item pairs obtained 
        by zipping together the two inputs

        :param users: 1D tensor with user ids
        :type users: torch.LongTensor
        :param items: 1D tensor with item ids
        :type items: torch.LongTensor
        :return: Tensor of shape (L, 1) with predicted similarities.
            Row i is the similarity between
            `users[i]` and `items[i]`
        :rtype: torch.FloatTensor
        """

        # Embedding lookups
        user_emb = self.user_embedding(users)
        item_emb = self.item_embedding(items)

        # Pass concatenated through MLP and apply sigmoid
        return self.final(
            self.mlp(
                torch.hstack([user_emb, item_emb])
            )
        )

In [10]:
def test_output_shapes_NMF(
    predictive_factors, num_users, num_items
):
    """Check that no matter the inner settings of the network, the output is always correct"""
    mod = NMFModule(predictive_factors, num_users, num_items, 0.0)
    
    user_tensor = torch.LongTensor([1, 2])
    item_tensor = torch.LongTensor([1, 2])
    
    res = mod(user_tensor, item_tensor) # predict scores for items given the users
    
    assert res.shape == (2, 1)

    assert (res.detach().numpy() <= 1).all()
    assert (res.detach().numpy() >= 0).all()


test_output_shapes_NMF(5, 10, 10)
test_output_shapes_NMF(5, 3, 10)
test_output_shapes_NMF(1, 3, 3)



In [11]:
from typing import List, Union, Optional

import pandas as pd
from recpack.algorithms.base import TorchMLAlgorithm
from recpack.algorithms.samplers import PositiveNegativeSampler
from recpack.algorithms.util import get_users
from recpack.matrix import InteractionMatrix
from scipy.sparse import csr_matrix, lil_matrix


## Implementing the algorithm

### Choosing the right baseclass
`Algorithm`

Follows the sklearn interface
- `__init__`: sets the (hyper)parameters of the algorithm
- `fit`: train the algorithm, and build a model that can be used for prediction
- `predict`: Given a matrix of user histories, recommend items for these users.

TODO: explain Algorithm
    
You can override these base functions, but it is advised to override `_fit` and `_predict` instead, because `fit` and `predict` contain some handy shared functionality, such as:
- timing of training
- conversion of input to csr (can be overridden if timestamps are needed, or an InteractionMatrix is needed for another reason)
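
The split between the public `fit`/`predict` and the private `_fit`/`_predict` can be illustrated with a minimal sketch. The classes `SketchAlgorithm` and `MostPopular` below are hypothetical stand-ins written for this illustration; they are not RecPack's actual baseclass, and they use plain Python lists instead of sparse matrices.

```python
import time


class SketchAlgorithm:
    """Hypothetical stand-in for a baseclass; not RecPack's actual code."""

    def fit(self, X):
        # Shared functionality lives here, e.g. timing the training.
        start = time.time()
        self._fit(X)
        self.fit_seconds_ = time.time() - start
        return self

    def predict(self, X):
        return self._predict(X)

    def _fit(self, X):
        raise NotImplementedError

    def _predict(self, X):
        raise NotImplementedError


class MostPopular(SketchAlgorithm):
    """Toy subclass: scores every item by how often it was interacted with."""

    def _fit(self, X):
        # X is a list of per-user item lists in this sketch.
        counts = {}
        for history in X:
            for item in history:
                counts[item] = counts.get(item, 0) + 1
        self.counts_ = counts

    def _predict(self, X):
        # Every user receives the same popularity scores.
        return [dict(self.counts_) for _ in X]


model = MostPopular().fit([[0, 1], [1, 2], [1]])
scores = model.predict([[0]])
```

The subclass only supplies the algorithm-specific parts, while the timing in `fit` comes for free.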



### Choosing the right baseclass
`TorchMLAlgorithm`
- Implements common functionality
    - Fitting:
        - Epoch loop
        - Early stopping
        - Keep best/last model
        - Progress logs
    - Prediction
        - Batched prediction loop
        - Prune recommendations 
    - Saving + loading


We could start from the `Algorithm` baseclass, and implement the whole learning loop and prediction from scratch.
However, RecPack provides a more specific baseclass that facilitates implementations of torch algorithms.

It provides additional functionality on top of the `Algorithm` baseclass. The fit function already loops over the selected number of epochs. Evaluation after each epoch and checking which model to keep are also implemented.

For prediction a loop over the active users in batches is implemented, recommendations are requested per batch, and then pruned.
This way we never need to store a dense user x item matrix.

Finally the baseclass also provides saving and loading functionality, which can be handy while prototyping and is also used to keep the best model at the end of the training if so desired.
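
To give a feel for what such a fit loop takes off our hands, here is a rough sketch of an epoch loop with best-model tracking and early stopping. The helper `train_with_early_stopping` and its callbacks are hypothetical and written only for this illustration; RecPack's own implementation may differ in its details.

```python
def train_with_early_stopping(train_epoch, evaluate, max_epochs,
                              max_iter_no_change=5, min_improvement=0.0):
    """Hypothetical helper: returns (best_epoch, best_score)."""
    best_score, best_epoch, stale = float("-inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch()
        score = evaluate()
        if score > best_score + min_improvement:
            # A real implementation would store the model here.
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= max_iter_no_change:
                break
    return best_epoch, best_score


# Toy run: validation scores improve, then plateau.
validation_scores = iter([0.60, 0.64, 0.63, 0.63, 0.62, 0.70])
best_epoch, best_score = train_with_early_stopping(
    train_epoch=lambda: None,
    evaluate=lambda: next(validation_scores),
    max_epochs=6,
    max_iter_no_change=3,
)
```

In this toy run the best score is found at epoch 1, and training stops after three consecutive epochs without improvement, so the last epoch is never reached.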

### Choosing the right baseclass
We need to implement:

- `__init__`: Sets the hyperparameters
- `_init_model`: Initialises the model during training
- `_train_epoch`: Trains for a single epoch
- `_batch_predict`: Predicts for a batch of users


Given the baseclass, we need to implement 4 functions:
- `__init__`: Here we define the additional parameters introduced by this algorithm, as well as the standard ones expected by the baseclass.
- `_init_model`: This function is called at the start of `fit` to construct the neural network itself based on the input matrix. In this function we need to make sure the model is constructed and initialised correctly.
- `_train_epoch`: While the baseclass contains a lot of functionality, we still need to implement how to perform training within a single epoch.
- `_batch_predict`: The final function to implement performs recommendation for a selected set of users.

In [12]:
class NeuMF(TorchMLAlgorithm):
    """Implementation of Neural Matrix Factorization.

    Neural Matrix Factorization based on MLP architecture
    as presented in Figure 2 in He, Xiangnan, et al. 
    "Neural collaborative filtering."
    In Proceedings of the 26th international conference on world wide web. 2017.

    Represents the users and items using an embedding, 
    similarity between the two is modelled using a neural network.

    The network consists of an embedding for both users and items.
    To compute similarity those two embeddings are 
    concatenated and passed through the MLP
    Finally the similarity is transformed to the [0,1] domain
    using a sigmoid function.

    As in the paper, the sum of square errors is used as loss function.
    Positive items should get a prediction close to 1, 
    while sampled negatives should get a value close to 0.

    The MLP has 3 layers, as suggested in the experiments section.
    Bottom layer has dimension `4 * predictive_factors`,
    middle layer `2 * predictive_factors`
    and the top layer has `predictive_factors`.

    :param predictive_factors: Size of the last hidden layer in the MLP network.
        Embedding size is 2 * predictive_factors
    :type predictive_factors: int
    :param batch_size: How many samples to use in each update step.
        Higher batch sizes make each epoch more efficient,
        but increase the number of epochs needed to converge to the optimum,
        by reducing the number of updates per epoch.
        Defaults to 512.
    :type batch_size: Optional[int]
    :param max_epochs: The max number of epochs to train.
        If the stopping criterion uses early stopping, fewer epochs could be used.
        Defaults to 10.
    :type max_epochs: Optional[int]
    :param learning_rate: How much to update the weights at each update. Defaults to 0.01
    :type learning_rate: Optional[float]
    :param stopping_criterion: Name of the stopping criterion to use for training.
        For available values,
        check :meth:`recpack.algorithms.stopping_criterion.StoppingCriterion.FUNCTIONS`
        Defaults to 'ndcg'
    :type stopping_criterion: Optional[str]
    :param stop_early: If True, early stopping is enabled,
        and after ``max_iter_no_change`` iterations where improvement of loss function
        is below ``min_improvement`` the optimisation is stopped,
        even if max_epochs is not reached.
        Defaults to False
    :type stop_early: bool, optional
    :param max_iter_no_change: If early stopping is enabled,
        stop after this amount of iterations without change.
        Defaults to 5
    :type max_iter_no_change: int, optional
    :param min_improvement: If early stopping is enabled, no change is detected,
        if the improvement is below this value.
        Defaults to 0.0
    :type min_improvement: float, optional
    :param seed: Seed to the randomizers, useful for reproducible results,
        defaults to None
    :type seed: int, optional
    :param save_best_to_file: If true, the best model will be saved after training,
        defaults to False
    :type save_best_to_file: bool, optional
    :param keep_last: Retain last model, rather than best
        (according to stopping criterion value on validation data), defaults to False
    :type keep_last: bool, optional
    :param predict_topK: The topK recommendations to keep per row in the matrix.
        Use when the user x item output matrix would become too large for RAM.
        Defaults to None, which results in no filtering.
    :type predict_topK: int, optional
    :param n_negatives_per_positive: Amount of negatives to sample for each positive example, defaults to 1
    :type n_negatives_per_positive: int, optional
    :param dropout: Dropout parameter used in MLP, defaults to 0.0
    :type dropout: float, optional
    :param exact_sampling: Enable or disable exact checks while sampling. 
        With exact sampling the sampled negatives are guaranteed to not have been visited by the user. 
        Non exact sampling assumes that the space for item selection is large enough, 
        such that most items are likely not seen before.
        Defaults to False.
    :type exact_sampling: bool, optional
    """
    pass

In [13]:
class NeuMF(NeuMF):
    def __init__(
        self,
        predictive_factors: int,
        batch_size: Optional[int] = 512,
        max_epochs: Optional[int] = 10,
        learning_rate: Optional[float] = 0.01,
        stopping_criterion: Optional[str] = "ndcg",
        stop_early: Optional[bool] = False,
        max_iter_no_change: Optional[int] = 5,
        min_improvement: Optional[float] = 0.0,
        seed: Optional[int] = None,
        save_best_to_file: Optional[bool] = False,
        keep_last: Optional[bool] = False,
        predict_topK: Optional[int] = None,
        n_negatives_per_positive: Optional[int] = 1,
        exact_sampling: Optional[bool] = False,
        dropout: Optional[float] = 0.0,
    ):
        print(batch_size, max_epochs, learning_rate, stopping_criterion)
        super().__init__(batch_size, max_epochs, learning_rate,
            stopping_criterion, stop_early, max_iter_no_change,
            min_improvement, seed, save_best_to_file, keep_last,
            predict_topK,
        )

        self.predictive_factors = predictive_factors

        self.n_negatives_per_positive = n_negatives_per_positive
        self.dropout = dropout
        self.exact_sampling = exact_sampling

        self.sampler = PositiveNegativeSampler(
            U=self.n_negatives_per_positive, replace=False, exact=exact_sampling, 
            batch_size=self.batch_size
        )

In choosing our parameters we follow the principle that simple is better. We could let the user configure the structure of the MLP entirely, but then we would need to validate that configuration to catch mistakes.


Instead we follow the authors in their experiment, and use a single parameter to define the structure of the MLP: __predictive factors__.
This is the size of the final hidden layer in the MLP. The second layer is twice as big, while the bottom layer is twice as big again.

The embedding size is also 2 x the predictive factors. Since the user and item embeddings are concatenated and passed to the bottom layer of the MLP, this guarantees that the concatenated input has the same size as the bottom layer.
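
The sizing rule can be written down as a few lines of arithmetic. The helper `neumf_layer_sizes` is hypothetical and exists only to make the rule concrete; it mirrors the sizes used in the module above.

```python
def neumf_layer_sizes(predictive_factors: int) -> dict:
    """Hypothetical helper that only mirrors the sizing rule described above."""
    embedding_size = 2 * predictive_factors
    hidden_dims = [
        4 * predictive_factors,   # bottom layer
        2 * predictive_factors,   # middle layer
        predictive_factors,       # top layer
    ]
    return {
        "embedding_size": embedding_size,
        "mlp_input": 2 * embedding_size,   # user and item embeddings concatenated
        "hidden_dims": hidden_dims,
        "output": 1,                        # a single score per u,i pair
    }


sizes = neumf_layer_sizes(8)
```

For `predictive_factors=8` this gives embeddings of size 16, an MLP input of size 32 and hidden layers of sizes 32, 16 and 8.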

We also allow the user to specify the dropout parameter of the MLP architecture. We don't expose the other parameters for simplicity, and keep them at reasonable defaults.

`n_negatives_per_positive` is the number of negatives to sample when training the model. For each positive, U negatives will be sampled.
Related to sampling, we also added the `exact_sampling` boolean. For datasets with enough items, we can assume that a randomly selected item is unlikely to have been interacted with, so the sampler only checks against the positive example for which a negative is sampled. For small datasets it can be advisable to set `exact_sampling` to True, such that the sampler checks the negatives against each of the items the user has interacted with. This way you avoid asking the network to learn contradictory information.
Exact sampling is a lot slower than approximate sampling, especially for large datasets.
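
The difference between the two sampling modes can be sketched in plain Python. The function `sample_negative` is a hypothetical illustration of the idea only; it is not how `PositiveNegativeSampler` is implemented.

```python
import random


def sample_negative(history, positive, n_items, exact, rng):
    """Hypothetical sketch of the two sampling modes; not RecPack's sampler."""
    # Exact: reject every item in the user's history.
    # Non-exact: reject only the positive this negative is paired with.
    forbidden = set(history) if exact else {positive}
    while True:
        candidate = rng.randrange(n_items)
        if candidate not in forbidden:
            return candidate


rng = random.Random(42)
history = [0, 1, 2]          # items the user interacted with
exact_draws = [sample_negative(history, 0, 5, True, rng) for _ in range(200)]
loose_draws = [sample_negative(history, 0, 5, False, rng) for _ in range(200)]
```

With only 5 items, the non-exact draws may contain items 1 and 2, which the user did interact with. The exact draws can only ever be items 3 and 4.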



In [14]:
class NeuMF(NeuMF):
    def _init_model(self, X: csr_matrix):
        num_users, num_items = X.shape
        self.model_ = NMFModule(self.predictive_factors, num_users, num_items, self.dropout).to(
            self.device
        )

        self.optimizer = torch.optim.Adam(
            self.model_.parameters(), lr=self.learning_rate
        )

In the `_init_model` function we construct our NMFModule given the hyperparameters and the shape of the training matrix.

We also create our optimiser, using the Adam optimiser as suggested in the paper.

In [15]:
class NeuMF(NeuMF):
    def _train_epoch(self, X: csr_matrix) -> List[float]:
        losses = []
        for users, positives, negatives in self.sampler.sample(X):

            self.optimizer.zero_grad()

            # Predict for the positives
            positive_scores = self.model_.forward(
                users.to(self.device), positives.to(self.device))
            # Predict for the negatives
            negative_scores = self.model_.forward(
                *self._construct_negative_prediction_input(
                    users.to(self.device), negatives.to(self.device))
            )

            loss = self._compute_loss(
                positive_scores, negative_scores)

            # Backwards propagation of the loss
            loss.backward()
            self.optimizer.step()

            losses.append(loss.item())

        return losses



During a single epoch the algorithm loops through the positives in batches, in random order. For each positive user, item pair, U negatives are also sampled.

This is all handled by a sampler that RecPack provides: `PositiveNegativeSampler`.

<!-- TODO: Add slide with docs of the sampler -->

For each batch we compute similarities between the users and the sampled positives and negatives, given the 1D tensors for users and positives, and the 2D tensor for negatives.

For the positives this is straightforward since the tensors are already in the right format.
For the negatives we need to restructure the 2D tensor into a 1D tensor.

Once we have computed similarities for both positives and negatives, we can compute the loss, which we will implement next.

Given the loss we let torch handle the update to the network by backpropagation.

Finally we store the loss, so we can log the average loss to the user, allowing them to monitor the training.

The list of losses accumulated during the epoch is returned by the `_train_epoch` function.

In [16]:
class NeuMF(NeuMF):
    def _compute_loss(
        self, positive_scores: torch.FloatTensor, negative_scores: torch.FloatTensor
    ) -> torch.FloatTensor:
        """Compute the Square Error loss given recommendations 
        for positive items, and sampled negatives.
        """

        mse = nn.MSELoss(reduction="sum")
        return mse(positive_scores, torch.ones_like(positive_scores, dtype=torch.float)) + mse(
            negative_scores, torch.zeros_like(negative_scores, dtype=torch.float)
        )

    def _construct_negative_prediction_input(self, users, negatives):
        """Construct the prediction input given a 1D user tensor and a 2D negatives tensor.
        
        Since negatives has shape |batch| x U, and users is a 1d vector,
        these need to be turned into two 1D vectors of shape |batch| * U

        First the users as a row are stacked U times and transposed,
        so that this is also a batch x U tensor.
        Then both are reshaped to remove the 2nd dimension, 
        resulting in a single long 1d vector.
        """
        return (
            users.repeat(self.n_negatives_per_positive, 1).T.reshape(-1), 
            negatives.reshape(-1)
        )

Squared loss: this is simply the sum of the squared errors in the predictions. Because we are reconstructing a binary matrix, the target value is 1 for positives and 0 for negatives.
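
A small worked example in plain Python makes the loss concrete. The function name and the scores are made up for illustration; the computation matches the sum-reduced squared error with targets 1 and 0.

```python
def squared_error_loss(positive_scores, negative_scores):
    """Sum of squared errors with target 1 for positives and 0 for negatives."""
    positive_part = sum((1.0 - s) ** 2 for s in positive_scores)
    negative_part = sum((0.0 - s) ** 2 for s in negative_scores)
    return positive_part + negative_part


# Made-up scores: two positives and two sampled negatives.
loss = squared_error_loss([0.9, 0.8], [0.1, 0.3])
```

Here the loss is 0.01 + 0.04 + 0.01 + 0.09 = 0.15, up to floating point rounding.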

Construction of input:
repeat the users tensor such that each user is still associated with the correct negatives.
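
The repeat, transpose and reshape steps are easiest to follow on a tiny example. The tensors below are made up for illustration, with `U = 2` negatives per positive.

```python
import torch

U = 2
users = torch.LongTensor([4, 5, 6])                      # shape (3,)
negatives = torch.LongTensor([[1, 2], [3, 4], [5, 6]])   # shape (3, U)

stacked = users.repeat(U, 1)     # shape (U, 3): the user row stacked U times
per_user = stacked.T             # shape (3, U): row i holds user i, U times

users_flat = per_user.reshape(-1)         # tensor([4, 4, 5, 5, 6, 6])
negatives_flat = negatives.reshape(-1)    # tensor([1, 2, 3, 4, 5, 6])
```

Position `i` in `users_flat` now lines up with position `i` in `negatives_flat`, so the pair can be passed to the module's forward function directly.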

In [17]:
class NeuMF(NeuMF):
    def _batch_predict(self, X: csr_matrix, users: List[int]) -> csr_matrix:
        """Generate recommendations for each of the users."""

        X_pred = lil_matrix(X.shape)
        if users is None:
            users = get_users(X)

        _, n_items = X.shape
        n_users = len(users)

        # Create two 1D arrays such that each item gets a score for each of the users.
        # The user tensor contains each user repeated n_items times (eg. [1, 1, 1, 2, 2, 2]),
        # while the item tensor contains the item indices repeated per user (eg. [0, 1, 2, 0, 1, 2]).
        user_tensor = torch.LongTensor(users).repeat(n_items, 1).T.reshape(-1).to(self.device)
        item_tensor = torch.arange(n_items).repeat(n_users).to(self.device)

        X_pred[users] = self.model_(user_tensor, item_tensor).detach().cpu().numpy().reshape(n_users, n_items)
        return X_pred.tocsr()

For any decent-sized dataset, computing predictions for all users at once requires too much RAM to be feasible.
Instead we compute the recommendations for a batch of users, and then only keep the topK recommendations.

In the implementation our only task is to generate the correct tensors to give as input to the module.
We can then rely on the fitted model to generate the recommendation scores, and on the baseclass to make sure the model is in prediction mode, to prune recommendations and to construct the full prediction output.
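
The pruning step that keeps memory use bounded can be sketched as follows. The function `prune_top_k` and the dictionary-based score format are assumptions made for this illustration; the baseclass works on sparse matrices and its implementation may differ.

```python
import heapq


def prune_top_k(scores_per_user, k):
    """Hypothetical sketch: keep the k highest-scoring items per user."""
    pruned = {}
    for user, scores in scores_per_user.items():
        top = heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
        pruned[user] = dict(top)
    return pruned


# Made-up scores for a batch of two users over four items.
batch_scores = {
    0: {0: 0.1, 1: 0.9, 2: 0.4, 3: 0.7},
    1: {0: 0.8, 1: 0.2, 2: 0.6, 3: 0.3},
}
pruned = prune_top_k(batch_scores, k=2)
```

After pruning, each user keeps only `k` scores, so the stored result grows with `n_users * k` rather than `n_users * n_items`.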

In [18]:
TIMESTAMP_IX = 'ts'
ITEM_IX = 'iid'
USER_IX = 'uid'

data = {
    TIMESTAMP_IX: [3, 2, 1, 4, 0, 1, 2, 4, 0, 1, 2],
    ITEM_IX: [0, 1, 2, 3, 0, 1, 2, 4, 0, 1, 2],
    USER_IX: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5],
}
df = pd.DataFrame.from_dict(data)

mat = InteractionMatrix(df, ITEM_IX, USER_IX, timestamp_ix=TIMESTAMP_IX)


In [19]:
from recpack.tests.test_algorithms.util import assert_changed, assert_same

def test_training_epoch(X):
    a = NeuMF(
        predictive_factors=2, 
        n_negatives_per_positive=2,
        exact_sampling=True
    )
    device = a.device
    a._init_model(X)

    # Each training epoch should update the parameters
    params = [(name, p) for (name, p) in a.model_.named_parameters() if p.requires_grad]
    params_before = [(name, p.clone()) for (name, p) in params]
    for _ in range(5):
        a._train_epoch(X)
    params = [(name, p) for (name, p) in a.model_.named_parameters() if p.requires_grad]
    assert_changed(params_before, params, device)

test_training_epoch(mat.binary_values)

512 10 0.01 ndcg


AssertionError: 

In [20]:
def test_batch_predict(mat, users):
    a = NeuMF(
        predictive_factors=2, 
        n_negatives_per_positive=2,
        exact_sampling=True
    )
    device = a.device
    a.fit(mat, (mat, mat))
    params = [(name, p) for (name, p) in a.model_.named_parameters() if p.requires_grad]
    params_before = [(name, p.clone()) for (name, p) in params]

    pred = a._batch_predict(mat.users_in(users), users=users)

    assert pred.shape == mat.shape
    np.testing.assert_array_equal(pred.sum(axis=1).nonzero()[0], users)

    params = [(name, p) for (name, p) in a.model_.named_parameters() if p.requires_grad]
    assert_same(params_before, params, device)

    

test_batch_predict(mat, [0, 1])
test_batch_predict(mat, [0])
test_batch_predict(mat, [0, 1, 3])

512 10 0.01 ndcg
2022-06-22 14:12:44,655 - base - recpack - INFO - Processed epoch 0 in 0.01 s.Batch Training Loss = 10.4516
2022-06-22 14:12:44,662 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6405398473870061, which is better than previous iterations.
2022-06-22 14:12:44,662 - base - recpack - INFO - Model improved. Storing better model.
2022-06-22 14:12:44,666 - base - recpack - INFO - Evaluation at end of 0 took 0.01 s.
2022-06-22 14:12:44,676 - base - recpack - INFO - Processed epoch 1 in 0.01 s.Batch Training Loss = 10.3924
2022-06-22 14:12:44,682 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6028241164651866, which is worse than previous iterations.
2022-06-22 14:12:44,683 - base - recpack - INFO - Evaluation at end of 1 took 0.01 s.
2022-06-22 14:12:44,696 - base - recpack - INFO - Processed epoch 2 in 0.01 s.Batch Training Loss = 10.3358
2022-06-22 14:12:44,702 - stopping_criterion - recpack - INFO - StoppingCriterion has value 

2022-06-22 14:12:45,149 - base - recpack - INFO - Processed epoch 2 in 0.02 s.Batch Training Loss = 9.4404
2022-06-22 14:12:45,155 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.7838062179781108, which is worse than previous iterations.
2022-06-22 14:12:45,155 - base - recpack - INFO - Evaluation at end of 2 took 0.01 s.
2022-06-22 14:12:45,165 - base - recpack - INFO - Processed epoch 3 in 0.01 s.Batch Training Loss = 9.3179
2022-06-22 14:12:45,170 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.7659479478689263, which is worse than previous iterations.
2022-06-22 14:12:45,170 - base - recpack - INFO - Evaluation at end of 3 took 0.01 s.
2022-06-22 14:12:45,181 - base - recpack - INFO - Processed epoch 4 in 0.01 s.Batch Training Loss = 9.2018
2022-06-22 14:12:45,187 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.8524021713512242, which is better than previous iterations.
2022-06-22 14:12:45,188 - base - recpack - INFO

In [21]:
def test_negative_input_construction(users, negatives, U):
    
    a = NeuMF(
        predictive_factors=8, 
        n_negatives_per_positive=U
    )
    
    num_users = users.shape[0]
    users_input, negatives_input = a._construct_negative_prediction_input(users, negatives)
    assert users_input.shape == negatives_input.shape
    assert len(users_input.shape) == 1 # 1d vectors
    
    # Check that both are in the right order (each user is repeated U times before the next user is present)
    for ix in range(users_input.shape[0]):
        assert users_input[ix] == users[ix // U]
        assert negatives_input[ix] == negatives[ix // U, ix % U]

test_negative_input_construction(torch.LongTensor([4, 5, 6]), torch.LongTensor([[1, 2], [1, 2], [1, 2]]), U=2)
test_negative_input_construction(torch.LongTensor([4, 5, 6]), torch.LongTensor([[1], [1], [1]]), U=1)


512 10 0.01 ndcg
512 10 0.01 ndcg


In [22]:
def test_overfit(mat):
    m = NeuMF(
        predictive_factors=5,
        batch_size=1,
        max_epochs=20,
        learning_rate=0.02,
        stopping_criterion="ndcg",
        n_negatives_per_positive=1,
    )

    # set sampler to exact sampling
    m.sampler.exact = True
    m.fit(mat, (mat, mat))
    bin_mat = mat.binary_values
    pred = m.predict(mat.binary_values).toarray()
    for user in mat.active_users:
        # The model should have overfitted, so that the visited items have the highest similarities
        positives = bin_mat[user].nonzero()[1]
        negatives = list(set(range(mat.shape[1])) - set(positives))

        for item in positives:
            assert (pred[user][negatives] < pred[user, item]).all()
            
test_overfit(mat)
    

1 20 0.02 ndcg
2022-06-22 14:12:47,684 - base - recpack - INFO - Processed epoch 0 in 0.03 s.Batch Training Loss = 0.5015
2022-06-22 14:12:47,695 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.7633421038677857, which is better than previous iterations.
2022-06-22 14:12:47,695 - base - recpack - INFO - Model improved. Storing better model.
2022-06-22 14:12:47,699 - base - recpack - INFO - Evaluation at end of 0 took 0.01 s.
2022-06-22 14:12:47,731 - base - recpack - INFO - Processed epoch 1 in 0.03 s.Batch Training Loss = 0.5001
2022-06-22 14:12:47,745 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.7819889967717142, which is better than previous iterations.
2022-06-22 14:12:47,746 - base - recpack - INFO - Model improved. Storing better model.
2022-06-22 14:12:47,749 - base - recpack - INFO - Evaluation at end of 1 took 0.02 s.
2022-06-22 14:12:47,781 - base - recpack - INFO - Processed epoch 2 in 0.03 s.Batch Training Loss = 0.4998
2022-06-

## Experiment

Use RecPack Pipeline to compare the newly implemented algorithm to frequently used baselines

Now that we have implemented a new algorithm, we want to compare its performance to some baseline algorithms.
This acts both as a sanity check in terms of performance, and as an important step towards making a contribution to the field.

For easy experimentation RecPack provides a pipeline setup.

We’ll use the PipelineBuilder to construct our pipeline, and once constructed, run it to get comparative results.


In [23]:
from recpack.pipelines import PipelineBuilder
from recpack.datasets import MovieLens25M
from recpack.scenarios import WeakGeneralization

We need the following components:

- PipelineBuilder
- A Dataset
- A Scenario

In [24]:
dataset = MovieLens25M(
#     path='/home/robinverachtert/datasets',
    path='/Users/robinverachtert/workspace/doctorate/datasets/'
)
data = dataset.load_interaction_matrix()

  0%|          | 0/12415224 [00:00<?, ?it/s]


In [25]:
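# Subsample to 1000 distinct users to make the experiment faster.
# replace=False avoids drawing the same user twice (np.random.choice samples with replacement by default).
import numpy as np

users = np.random.choice(list(data.active_users), 1000, replace=False)
data = data.users_in(users)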
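
When subsampling users with `np.random.choice`, note that NumPy draws with replacement unless `replace=False` is passed, so a sample of 1000 may contain fewer than 1000 distinct users. The sketch below illustrates this on made-up user ids (`user_ids` is hypothetical and not the MovieLens data):

```python
import numpy as np

np.random.seed(0)
user_ids = np.arange(5000)  # hypothetical user ids, for illustration only

# Default behaviour: sampling with replacement, duplicates are possible.
with_replacement = np.random.choice(user_ids, 1000)

# Sampling without replacement: every drawn id is distinct.
without_replacement = np.random.choice(user_ids, 1000, replace=False)

n_unique_with = len(np.unique(with_replacement))
n_unique_without = len(np.unique(without_replacement))
print(n_unique_with, n_unique_without)
```

Only the second call guarantees exactly 1000 distinct users.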

In [26]:
scenario = WeakGeneralization(frac_data_in=0.8, validation=True)
scenario.split(data)

0it [00:00, ?it/s]


In [27]:
from recpack.pipelines import ALGORITHM_REGISTRY
ALGORITHM_REGISTRY.register('NeuMF', NeuMF)

To use our new algorithm in a pipeline, we first need to register it with the algorithm registry.
This allows the pipeline to find the right class by name when constructing the algorithm.
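
To make the idea of a registry concrete, here is a minimal sketch of a name-to-class lookup. `SimpleRegistry` and `MyAlgorithm` are hypothetical names used only for illustration; this is not RecPack's actual implementation of `ALGORITHM_REGISTRY`.

```python
from typing import Dict, Type


class SimpleRegistry:
    """Toy name -> class registry (hypothetical, not RecPack's implementation)."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type] = {}

    def register(self, name: str, cls: Type) -> None:
        # Refuse silent overwrites so two algorithms cannot share a name.
        if name in self._classes:
            raise KeyError(f"{name} is already registered")
        self._classes[name] = cls

    def get(self, name: str) -> Type:
        return self._classes[name]

    def __contains__(self, name: str) -> bool:
        return name in self._classes


class MyAlgorithm:
    """Stand-in for a user-defined algorithm class."""

    def __init__(self, predictive_factors: int = 16) -> None:
        self.predictive_factors = predictive_factors


registry = SimpleRegistry()
registry.register("MyAlgorithm", MyAlgorithm)

# With the class registered, a string plus a parameter dict is enough to build an instance.
algorithm = registry.get("MyAlgorithm")(**{"predictive_factors": 32})
```

Referring to algorithms by string keeps an experiment configuration free of direct class references, which is presumably why the builder accepts names such as `'NeuMF'`.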

In [28]:
builder = PipelineBuilder()
builder.set_data_from_scenario(scenario)
builder.set_optimisation_metric('NormalizedDiscountedCumulativeGainK', K=10)
builder.add_metric('NormalizedDiscountedCumulativeGainK', K=10)
builder.add_metric('CoverageK', K=10)

builder.add_algorithm(
    algorithm = 'NeuMF', 
    params = {
        'batch_size': 128,
        'max_epochs': 10,
        'learning_rate': 0.01,
        'stopping_criterion': 'ndcg',
        'predict_topK': 20,
        'n_negatives_per_positive': 3,
        'dropout': 0.01
    },
    grid = {
        'predictive_factors': [8, 16, 32],
    }
)

builder.add_algorithm('Popularity', params={'K': 20})
builder.add_algorithm('ItemKNN', grid={'similarity': ['conditional_probability', 'cosine']})

Now we can construct our pipeline using the PipelineBuilder.

We set the data from the scenario, select an optimisation metric, and add the metrics to evaluate on.
Finally, we add our algorithms and the parameters we want to use.
If parameters should be optimised, like the predictive factors here, we use grid search
(more advanced methods are under development).

We add our baselines and their parameters in the same way.
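
As a rough illustration of what grid search over `params` and `grid` amounts to, the sketch below expands a grid into one full parameter dictionary per combination. The helper `expand_grid` is hypothetical and only mirrors the idea; it is not how RecPack implements it internally.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_grid(
    params: Dict[str, Any], grid: Dict[str, List[Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield one full parameter dict per combination of grid values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        # Fixed params first, then the grid values for this combination.
        yield {**params, **dict(zip(names, values))}


params = {"batch_size": 128, "learning_rate": 0.01}
grid = {"predictive_factors": [8, 16, 32]}

candidates = list(expand_grid(params, grid))
for candidate in candidates:
    print(candidate)
```

With a single grid parameter of three values this gives three candidates; adding a second grid parameter multiplies the number of training runs, which is worth keeping in mind for slow neural models.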

In [29]:
pipeline = builder.build()

When you call build(), the builder performs some checks on the values you have set and raises an error if anything is wrong. For example, if you expect parameters to be optimised, you need both validation data and an optimisation metric. Validation data comes from the splitting process (the scenario), while the optimisation metric is set on the builder.

In [30]:
pipeline.run()

  0%|          | 0/3 [00:00<?, ?it/s]

128 10 0.01 ndcg
2022-06-22 14:14:00,036 - base - recpack - INFO - Processed epoch 0 in 5.72 s.Batch Training Loss = 53.6188
2022-06-22 14:14:15,932 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.07013975762846296, which is better than previous iterations.
2022-06-22 14:14:15,933 - base - recpack - INFO - Model improved. Storing better model.
2022-06-22 14:14:15,950 - base - recpack - INFO - Evaluation at end of 0 took 15.91 s.
2022-06-22 14:14:22,350 - base - recpack - INFO - Processed epoch 1 in 6.40 s.Batch Training Loss = 42.9730
2022-06-22 14:14:38,962 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.06826195413245605, which is worse than previous iterations.
2022-06-22 14:14:38,963 - base - recpack - INFO - Evaluation at end of 1 took 16.61 s.
2022-06-22 14:14:44,896 - base - recpack - INFO - Processed epoch 2 in 5.93 s.Batch Training Loss = 40.6114
2022-06-22 14:15:00,385 - stopping_criterion - recpack - INFO - StoppingCriterion has va

2022-06-22 14:25:14,927 - base - recpack - INFO - Model improved. Storing better model.
2022-06-22 14:25:15,011 - base - recpack - INFO - Evaluation at end of 0 took 41.22 s.
2022-06-22 14:25:50,676 - base - recpack - INFO - Processed epoch 1 in 35.66 s.Batch Training Loss = 42.2653
2022-06-22 14:26:28,712 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.0668488138004705, which is worse than previous iterations.
2022-06-22 14:26:28,713 - base - recpack - INFO - Evaluation at end of 1 took 38.04 s.
2022-06-22 14:27:04,366 - base - recpack - INFO - Processed epoch 2 in 35.65 s.Batch Training Loss = 40.5017
2022-06-22 14:27:42,424 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.06931486242399064, which is better than previous iterations.
2022-06-22 14:27:42,425 - base - recpack - INFO - Model improved. Storing better model.
2022-06-22 14:27:42,515 - base - recpack - INFO - Evaluation at end of 2 took 38.15 s.
2022-06-22 14:28:18,138 - base - recp

  self._set_arrayXarray(i, j, x)


2022-06-22 14:43:41,163 - base - recpack - INFO - Fitting ItemKNN complete - Took 2.13s


  self._set_arrayXarray(i, j, x)


2022-06-22 14:43:43,289 - base - recpack - INFO - Fitting ItemKNN complete - Took 1.87s


  self._set_arrayXarray(i, j, x)


2022-06-22 14:43:45,598 - base - recpack - INFO - Fitting ItemKNN complete - Took 1.88s




Running the pipeline goes through the algorithms in order. If there are parameters to optimise, they are optimised by training and evaluating (on the validation dataset) with each of the grid combinations, and the best combination is chosen according to the optimisation metric.

Then a final training and evaluation (on the test dataset) with the selected metrics gives us the results, which we can fetch once the run is done.
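
The selection step can be sketched as follows. The function `select_best` and the scores are made up for illustration: the lambda stands in for "train on the training data, evaluate on the validation data", and the sketch assumes that a higher metric value is better.

```python
from typing import Any, Callable, Dict, List, Tuple


def select_best(
    candidates: List[Dict[str, Any]],
    validation_score: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Return the candidate with the highest validation score."""
    scored = [(validation_score(candidate), candidate) for candidate in candidates]
    best_score, best_params = max(scored, key=lambda pair: pair[0])
    return best_params, best_score


# Arbitrary toy scores, not results from any real run.
toy_scores = {8: 0.1, 16: 0.3, 32: 0.2}
candidates = [{"predictive_factors": factors} for factors in toy_scores]

best_params, best_score = select_best(
    candidates, lambda candidate: toy_scores[candidate["predictive_factors"]]
)
print(best_params, best_score)
```

Only the selected parameters would then be used for the final training run and the evaluation on the test data.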

In [45]:
import pandas as pd

In [51]:
pd.DataFrame.from_dict(pipeline.get_metrics()).T

| Algorithm | normalizeddiscountedcumulativegaink_10 | coveragek_10 |
|---|---|---|
| `NeuMF(U=3,batch_size=128,dropout=0.01,exact_sampling=False,keep_last=False,learning_rate=0.01,max_epochs=10,max_iter_no_change=5,min_improvement=0.0,num_components=16,predict_topK=20,save_best_to_file=False,seed=2404017312,stop_early=False,stopping_criterion=<recpack.algorithms.stopping_criterion.StoppingCriterion object at 0x7fa205cd3c40>)` | 0.045246 | 0.0383 |
| `Popularity(K=20)` | 0.094183 | 0.000313 |
| `ItemKNN(K=200,normalize=False,normalize_X=False,normalize_sim=False,pop_discount=None,similarity=cosine)` | 0.153369 | 0.08384 |


In [31]:
pipeline.get_metrics_dataframe(short=True)

| Algorithm | normalizeddiscountedcumulativegaink_10 | coveragek_10 |
|---|---|---|
| NeuMF | 0.077922 | 0.004364 |
| Popularity | 0.079727 | 0.000502 |
| ItemKNN | 0.090906 | 0.107338 |
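
To help read the two metric columns, here is a minimal sketch of NDCG@K and coverage@K. It assumes the common binary-relevance definitions; RecPack's own implementations may differ in details. The function names and the toy recommendations are hypothetical.

```python
import numpy as np


def ndcg_at_k(recommended: list, relevant: set, k: int) -> float:
    """Binary-relevance NDCG@K for a single user."""
    top_k = recommended[:k]
    # Discount 1 / log2(rank + 1) for ranks 1..K.
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    hits = np.array([1.0 if item in relevant else 0.0 for item in top_k])
    dcg = float((hits * discounts[: len(top_k)]).sum())
    # Ideal ranking: all relevant items (at most K) placed at the top.
    idcg = float(discounts[: min(k, len(relevant))].sum())
    return dcg / idcg if idcg > 0 else 0.0


def coverage_at_k(recommendations: dict, n_items: int, k: int) -> float:
    """Fraction of the catalogue that appears in at least one user's top-K list."""
    covered = {item for recs in recommendations.values() for item in recs[:k]}
    return len(covered) / n_items


# Toy top-3 recommendations for two users in a catalogue of 10 items.
toy_recommendations = {0: [3, 1, 2], 1: [3, 2, 0]}

example_ndcg = ndcg_at_k(toy_recommendations[0], relevant={1}, k=3)
example_coverage = coverage_at_k(toy_recommendations, n_items=10, k=3)
print(example_ndcg, example_coverage)
```

Under these definitions, an algorithm that recommends the same K items to every user can cover at most K items of the catalogue, which is consistent with the very low coverage of the Popularity baseline.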


We see that the new algorithm does not perform as well as either ItemKNN or Popularity. This may be due to poor hyperparameter choices, since tuning them is an important step