# PinSAGE with MSD Dataset and StrongGeneralization Scenario

In this notebook, the implementation of PinSAGE in RecPack and the experimental part to generate the results of the algorithm will be presented. 
The notebook contains:
1. The implementation of PinSAGE in RecPack.
2. The 10% of MSD Dataset from RecPack and the StrongGeneralization Scenario has been used to split the data.
3. The StrongGeneralization Scenario to split the data.
4. The RecPack Pipeline Builder to run the experiments, including the splitted dataset, the algorithms and metrics to run. Hyperparameter has been performed in the Pipeline.

Please make sure you have installed all the latest libraries in your Python environment, in order to have a successful run of the code.

## PinSAGE implementation in RecPack

In [115]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch_sparse import SparseTensor, matmul
from recpack.algorithms.base import TorchMLAlgorithm
from recpack.matrix.interaction_matrix import InteractionMatrix
from recpack.algorithms.loss_functions import bpr_loss, bpr_max_loss
from recpack.algorithms.samplers import PositiveNegativeSampler
from recpack.matrix.util import to_csr_matrix 
from scipy.sparse import csr_matrix, lil_matrix
from typing import List, Optional
import logging

logger = logging.getLogger(__name__)

# PinSAGEConv: A single convolutional layer for the PinSAGE model
class PinSAGEConv(nn.Module):
    def __init__(self, in_channels, out_channels, dropout=0.0):
        """
        Initialize the PinSAGEConv layer.

        Args:
            in_channels (int): Number of input channels (dimensions of the input features).
            out_channels (int): Number of output channels (dimensions of the output features).
            dropout (float): Dropout rate for regularization.
        """
        super(PinSAGEConv, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.dropout = dropout

        # Define a linear transformation and a dropout layer
        self.linear = nn.Linear(in_channels, out_channels)
        self.dropout_layer = nn.Dropout(dropout)
        self.reset_parameters()

    def reset_parameters(self):
        """
        Initialize the parameters of the layer using Xavier uniform initialization.
        """
        nn.init.xavier_uniform_(self.linear.weight)
        if self.linear.bias is not None:
            nn.init.zeros_(self.linear.bias)

    def forward(self, x, graph):
        """
        Forward pass for the PinSAGEConv layer.

        Args:
            x (torch.Tensor): Input feature matrix.
            graph (SparseTensor): Sparse adjacency matrix representing the graph.

        Returns:
            torch.Tensor: Output features after convolution and activation.
        """
        try:
            out = matmul(graph, x)  # Perform graph convolution
        except RuntimeError as e:
            # Log the error and return the input unchanged if the operation fails
            # logger.error(f"matmul failed with error: {e}")
            return x  
        out = self.linear(out)
        out = torch.relu(out)
        out = self.dropout_layer(out)
        
        return out

# PinSAGE: A model implementation based on the PinSAGE algorithm
class PinSAGE(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim=64, n_layers=2, dropout=0.0):
        """
        Initialize the PinSAGE model.

        Args:
            num_users (int): Number of users.
            num_items (int): Number of items.
            embedding_dim (int): Dimension of the embedding vectors.
            n_layers (int): Number of PinSAGEConv layers.
            dropout (float): Dropout rate for regularization.
        """
        super(PinSAGE, self).__init__()
        self.num_users = num_users
        self.num_items = num_items
        self.embedding_dim = embedding_dim
        self.n_layers = n_layers
        self.dropout = dropout

        # Initialize user and item embeddings
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)
        
        # Create a list of PinSAGEConv layers
        self.convs = nn.ModuleList([
            PinSAGEConv(embedding_dim, embedding_dim, dropout) for _ in range(n_layers)
        ])
        
        # Final linear layers for users and items
        self.user_final_linear = nn.Linear(embedding_dim, embedding_dim)
        self.item_final_linear = nn.Linear(embedding_dim, embedding_dim)
        self.reset_parameters()

    def reset_parameters(self):
        """
        Initialize the parameters of the model using Xavier uniform initialization.
        """
        nn.init.xavier_uniform_(self.user_embedding.weight)
        nn.init.xavier_uniform_(self.item_embedding.weight)
        nn.init.xavier_uniform_(self.user_final_linear.weight)
        nn.init.xavier_uniform_(self.item_final_linear.weight)

    def forward(self, graph):
        """
        Forward pass for the PinSAGE model.

        Args:
            graph (SparseTensor): Sparse adjacency matrix representing the graph.

        Returns:
            Tuple[torch.Tensor, torch.Tensor]: Final user and item embeddings.
        """
        user_emb = self.user_embedding.weight
        item_emb = self.item_embedding.weight
        
        # Concatenate user and item embeddings
        all_emb = torch.cat([user_emb, item_emb], dim=0)
        embs = [all_emb]

        # Pass through each PinSAGEConv layer
        for conv in self.convs:
            all_emb = conv(all_emb, graph)
            embs.append(all_emb)

        # Compute the final embeddings by averaging the embeddings across layers
        final_embedding = torch.mean(torch.stack(embs, dim=1), dim=1)
        user_emb_final, item_emb_final = torch.split(final_embedding, [self.num_users, self.num_items])
        
        # Separate final transformations for users and items
        final_user_emb = torch.relu(self.user_final_linear(user_emb_final))
        final_item_emb = torch.relu(self.item_final_linear(item_emb_final))

        # Normalize final embeddings
        final_user_emb = final_user_emb / torch.norm(final_user_emb, p=2, dim=1, keepdim=True)
        final_item_emb = final_item_emb / torch.norm(final_item_emb, p=2, dim=1, keepdim=True)

        return final_user_emb, final_item_emb

In [116]:
from recpack.algorithms.base import TorchMLAlgorithm
from recpack.matrix import Matrix
from recpack.matrix.interaction_matrix import InteractionMatrix
from recpack.algorithms.loss_functions import bpr_loss
from recpack.algorithms.samplers import PositiveNegativeSampler
from recpack.algorithms.stopping_criterion import (
    EarlyStoppingException,
    StoppingCriterion,
)
from typing import List, Tuple, Optional
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix, coo_matrix
import torch
import torch.optim as optim
import tempfile
import time
import logging

logger = logging.getLogger(__name__)

# PinSAGEAlgorithm: An implementation of the PinSAGE algorithm using TorchMLAlgorithm as a base class
class PinSAGEAlgorithm(TorchMLAlgorithm):
    def __init__(
        self,
        batch_size: int = 256,
        max_epochs: int = 100,
        learning_rate: float = 0.001,
        embedding_dim: int = 64,
        n_layers: int = 3,
        dropout: float = 0.1,
        stopping_criterion: str = "bpr",
        stop_early: bool = True,
        max_iter_no_change: int = 5,
        min_improvement: float = 0.01,
        seed: Optional[int] = None,
        save_best_to_file: bool = False,
        keep_last: bool = False,
        predict_topK: Optional[int] = None,
        validation_sample_size: Optional[int] = None,
        grad_clip: float = 1.0,  # Gradient clipping value
    ):
        """
        Initialize the PinSAGEAlgorithm with various hyperparameters.

        Args:
            batch_size (int): Number of samples per batch.
            max_epochs (int): Maximum number of training epochs.
            learning_rate (float): Learning rate for the optimizer.
            embedding_dim (int): Dimension of the embedding vectors.
            n_layers (int): Number of PinSAGEConv layers.
            dropout (float): Dropout rate for regularization.
            stopping_criterion (str): Criterion to stop training early.
            stop_early (bool): Whether to enable early stopping.
            max_iter_no_change (int): Maximum iterations with no improvement for early stopping.
            min_improvement (float): Minimum improvement required for early stopping.
            seed (Optional[int]): Random seed for reproducibility.
            save_best_to_file (bool): Whether to save the best model to a file.
            keep_last (bool): Whether to keep the last model.
            predict_topK (Optional[int]): Number of top-K predictions to consider.
            validation_sample_size (Optional[int]): Size of the validation sample.
            grad_clip (float): Maximum gradient norm for clipping.
        """
        self.embedding_dim = embedding_dim
        self.n_layers = n_layers
        self.dropout = dropout
        self.grad_clip = grad_clip
        super().__init__(
            batch_size=batch_size,
            max_epochs=max_epochs,
            learning_rate=learning_rate,
            stopping_criterion=stopping_criterion,
            stop_early=stop_early,
            max_iter_no_change=max_iter_no_change,
            min_improvement=min_improvement,
            seed=seed,
            save_best_to_file=save_best_to_file,
            keep_last=keep_last,
            predict_topK=predict_topK,
            validation_sample_size=validation_sample_size,
        )
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def _init_model(self, train: InteractionMatrix) -> None:
        """
        Initialize the PinSAGE model and optimizer.

        Args:
            train (InteractionMatrix): The training interaction matrix.
        """
        num_users, num_items = train.shape
        self.model_ = PinSAGE(num_users, num_items, self.embedding_dim, self.n_layers, self.dropout).to(self.device)
        self.optimizer = optim.Adam(self.model_.parameters(), lr=self.learning_rate)

    def _create_sparse_graph(self, interaction_matrix: csr_matrix, num_users: int, num_items: int) -> SparseTensor:
        """
        Create a sparse graph from the interaction matrix.

        Args:
            interaction_matrix (csr_matrix): The interaction matrix in CSR format.
            num_users (int): Number of users.
            num_items (int): Number of items.

        Returns:
            SparseTensor: A sparse tensor representing the graph.
        """
        coo = interaction_matrix.tocoo()
        row = torch.tensor(coo.row, dtype=torch.long)
        col = torch.tensor(coo.col, dtype=torch.long)
        value = torch.tensor(coo.data, dtype=torch.float32)
        shape = (num_users + num_items, num_users + num_items)
        graph = SparseTensor(row=row, col=col, value=value, sparse_sizes=shape).to(self.device)
        return graph

    def _train_epoch(self, train: InteractionMatrix) -> List[float]:
        """
        Train the model for one epoch.

        Args:
            train (InteractionMatrix): The training interaction matrix.

        Returns:
            List[float]: A list of losses for each batch.
        """
        self.model_.train()
        interaction_matrix = train  # Get the sparse matrix directly
        graph = self._create_sparse_graph(interaction_matrix, train.shape[0], train.shape[1])
        total_loss = 0
        losses = []

        sampler = PositiveNegativeSampler(num_negatives=1, batch_size=self.batch_size)

        # Iterate over samples generated by the PositiveNegativeSampler
        for user_indices, pos_item_indices, neg_item_indices in sampler.sample(interaction_matrix):
            user_indices = torch.tensor(user_indices).to(self.device)
            pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
            neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()

            self.optimizer.zero_grad()
            user_emb_final, item_emb_final = self.model_(graph)  # Call model only once
            pos_scores = user_emb_final[user_indices] @ item_emb_final[pos_item_indices].t()
            neg_scores = user_emb_final[user_indices] @ item_emb_final[neg_item_indices].t()

            loss = bpr_loss(pos_scores, neg_scores)

            if torch.isnan(loss).any() or torch.isinf(loss).any():
                continue

            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model_.parameters(), max_norm=self.grad_clip)  # Gradient clipping
            self.optimizer.step()

            total_loss += loss.item()
            losses.append(loss.item())

        if len(losses) == 0:
            return [float('nan')]

        return losses

    def _batch_predict(self, X: InteractionMatrix, users: List[int]) -> csr_matrix:
        """
        Make batch predictions for a list of users.

        Args:
            X (InteractionMatrix): The interaction matrix.
            users (List[int]): List of user indices to make predictions for.

        Returns:
            csr_matrix: A sparse matrix with the prediction scores.
        """
        self.model_.eval()
        graph = self._create_sparse_graph(X, X.shape[0], X.shape[1])
        user_indices = torch.tensor(users).to(self.device)
        item_indices = torch.arange(X.shape[1]).to(self.device)
        
        with torch.no_grad():
            user_emb_final, item_emb_final = self.model_(graph)
            scores = user_emb_final[user_indices] @ item_emb_final.t()
            scores = scores.cpu().numpy()
        
        result = lil_matrix((X.shape[0], X.shape[1]))
        for i, user in enumerate(users):
            result[user] = scores[i]
        
        return result.tocsr()

In [5]:
from recpack.datasets import Netflix, DummyDataset
from recpack.pipelines import PipelineBuilder
from recpack.scenarios import StrongGeneralization
from recpack.pipelines import ALGORITHM_REGISTRY
import pandas as pd

In [117]:
ALGORITHM_REGISTRY.register("PinSAGEAlgorithm16", PinSAGEAlgorithm)

## RecPack Dataset Importing

In [120]:
from recpack.datasets import MillionSongDataset
dataset = MillionSongDataset()

In [121]:
dataset.fetch_dataset()

In [122]:
dataset

<recpack.datasets.million_song_dataset.MillionSongDataset at 0x7f8ee3caa750>

In [123]:
df = dataset._load_dataframe()

## Datasets without Timestamps sampling

In [124]:
# Count interactions per user and per song
user_interactions = df['userId'].value_counts().reset_index()
user_interactions.columns = ['userId', 'user_interactions']

song_interactions = df['songId'].value_counts().reset_index()
song_interactions.columns = ['songId', 'song_interactions']

# Merge the interaction counts back to the original dataframe
df = df.merge(user_interactions, on='userId')
df = df.merge(song_interactions, on='songId')

# Calculate a combined interaction score
df['interaction_score'] = df['user_interactions'] + df['song_interactions']

# Rank based on the interaction score
df['rank'] = df['interaction_score'].rank(method='first', ascending=False)

# Select the top 10%
filtered_df = df[df['rank'] <= len(df) * 0.1]

# Drop helper columns
filtered_df = filtered_df.drop(columns=['user_interactions', 'song_interactions', 'interaction_score', 'rank'])

In [125]:
df

Unnamed: 0,userId,songId,user_interactions,song_interactions,interaction_score,rank
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,142,6698,6840,55439149.0
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAPDEY12A81C210A9,142,2012,2154,91117622.0
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,142,6383,6525,56840187.0
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,142,6383,6525,56840188.0
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBFNSP12AF72A0E22,142,687,829,117735248.0
...,...,...,...,...,...,...
138680238,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOUSMXX12AB0185C24,56,155529,155585,6266409.0
138680239,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOWYSKH12AF72A303A,56,3306,3362,77443084.0
138680240,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOWYSKH12AF72A303A,56,3306,3362,77443085.0
138680241,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOWYSKH12AF72A303A,56,3306,3362,77443086.0


In [126]:
filtered_df

Unnamed: 0,userId,songId
28,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOFRQTD12A81C233C0
59,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOMGIYR12AB0187973
60,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOMGIYR12AB0187973
61,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOMGIYR12AB0187973
62,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOMGIYR12AB0187973
...,...,...
138680210,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOOFYTN12A6D4F9B35
138680211,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOOFYTN12A6D4F9B35
138680212,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOOFYTN12A6D4F9B35
138680237,b7815dbb206eb2831ce0fe040d0aa537e2e800f7,SOUJVIT12A8C1451C1


## Dataset Preprocessing to Interaction Matrix

In [127]:
from recpack.matrix import InteractionMatrix
from recpack.preprocessing.preprocessors import DataFramePreprocessor
item_ix = 'songId'
user_ix = 'userId'

preprocessor = DataFramePreprocessor(item_ix=item_ix, user_ix=user_ix)
interaction_matrix = preprocessor.process(filtered_df)

  0%|          | 0/13868024 [00:00<?, ?it/s]

  0%|          | 0/13868024 [00:00<?, ?it/s]

## StrongGeneralization Scenario Splitting of Data

In [128]:
scenario = StrongGeneralization(frac_users_train=0.7, frac_interactions_in=0.8, validation=True)
scenario.split(interaction_matrix)

0it [00:00, ?it/s]

0it [00:00, ?it/s]

## Experimental RecPack Pipeline

In [129]:
pipeline_builder = PipelineBuilder()
ok = (scenario._validation_data_in, scenario._validation_data_out)
pipeline_builder.set_data_from_scenario(scenario)


# Add the baseline algorithms
#pipeline_builder.add_algorithm('ItemKNN', grid={'K': [100, 200, 400, 800]})
#pipeline_builder.add_algorithm('EASE', grid={'l2': [10, 100, 1000], 'alpha': [0, 0.1, 0.5]})

# Add our LightGCN algorithm
pipeline_builder.add_algorithm(
    'PinSAGEAlgorithm16',
    grid={
        'learning_rate': [0.1, 0.01, 0.001],
        'embedding_dim': [100, 200, 400]
    },
    params={
        'max_epochs': 5,
        'batch_size': 1024,
        'n_layers': 3
    }
)

# Add NDCG, Recall, and HR metrics to be evaluated at 10, 20, and 50
pipeline_builder.add_metric('NDCGK', [10, 20, 50])
pipeline_builder.add_metric('RecallK', [10, 20, 50])
pipeline_builder.add_metric('HitK', [10, 20, 50])

# Set the optimisation metric
pipeline_builder.set_optimisation_metric('RecallK', 20)

# Construct pipeline
pipeline = pipeline_builder.build()

# Debugging: Output the shape of the training data
#print(f"Training data shape: {im.shape}")

# Run pipeline, will first do optimisation, and then evaluation
pipeline.run()



  0%|          | 0/1 [00:00<?, ?it/s]

  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 13:24:30,100 - base - recpack - INFO - Processed epoch 0 in 233.16 s.Batch Training Loss = 0.6021
2024-08-02 13:27:38,075 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6040709028655837, which is better than previous iterations.
2024-08-02 13:27:38,077 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 13:27:38,670 - base - recpack - INFO - Evaluation at end of 0 took 188.57 s.
2024-08-02 13:31:30,884 - base - recpack - INFO - Processed epoch 1 in 232.21 s.Batch Training Loss = 0.6016
2024-08-02 13:34:42,009 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6035805572514483, which is worse than previous iterations.
2024-08-02 13:34:42,011 - base - recpack - INFO - Evaluation at end of 1 took 191.12 s.
2024-08-02 13:38:34,720 - base - recpack - INFO - Processed epoch 2 in 232.71 s.Batch Training Loss = 0.6015
2024-08-02 13:41:50,089 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60365101

  self.model_ = torch.load(self.best_model)


2024-08-02 13:56:09,248 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 2.13e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 14:03:25,889 - base - recpack - INFO - Processed epoch 0 in 231.78 s.Batch Training Loss = 0.6021
2024-08-02 14:06:41,895 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6042479317261902, which is better than previous iterations.
2024-08-02 14:06:41,897 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 14:06:42,486 - base - recpack - INFO - Evaluation at end of 0 took 196.59 s.
2024-08-02 14:10:34,643 - base - recpack - INFO - Processed epoch 1 in 232.16 s.Batch Training Loss = 0.6014
2024-08-02 14:13:47,859 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6034940289293175, which is worse than previous iterations.
2024-08-02 14:13:47,861 - base - recpack - INFO - Evaluation at end of 1 took 193.21 s.
2024-08-02 14:17:40,283 - base - recpack - INFO - Processed epoch 2 in 232.42 s.Batch Training Loss = 0.6017
2024-08-02 14:20:53,820 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60347279

  self.model_ = torch.load(self.best_model)


2024-08-02 14:35:06,136 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 2.14e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 14:42:23,197 - base - recpack - INFO - Processed epoch 0 in 232.04 s.Batch Training Loss = 0.6028
2024-08-02 14:45:38,155 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6023080691371367, which is better than previous iterations.
2024-08-02 14:45:38,156 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 14:45:38,734 - base - recpack - INFO - Evaluation at end of 0 took 195.53 s.
2024-08-02 14:49:30,952 - base - recpack - INFO - Processed epoch 1 in 232.22 s.Batch Training Loss = 0.6015
2024-08-02 14:52:45,974 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6020454846556061, which is worse than previous iterations.
2024-08-02 14:52:45,976 - base - recpack - INFO - Evaluation at end of 1 took 195.02 s.
2024-08-02 14:56:38,005 - base - recpack - INFO - Processed epoch 2 in 232.03 s.Batch Training Loss = 0.6013
2024-08-02 14:59:55,715 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60242092

  self.model_ = torch.load(self.best_model)


2024-08-02 15:14:16,445 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 2.15e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 15:24:31,035 - base - recpack - INFO - Processed epoch 0 in 407.42 s.Batch Training Loss = 0.6022
2024-08-02 15:27:48,588 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6031040137510385, which is better than previous iterations.
2024-08-02 15:27:48,590 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 15:27:49,769 - base - recpack - INFO - Evaluation at end of 0 took 198.73 s.
2024-08-02 15:34:37,590 - base - recpack - INFO - Processed epoch 1 in 407.82 s.Batch Training Loss = 0.6015
2024-08-02 15:37:55,526 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6030987599678811, which is worse than previous iterations.
2024-08-02 15:37:55,528 - base - recpack - INFO - Evaluation at end of 1 took 197.94 s.
2024-08-02 15:44:43,082 - base - recpack - INFO - Processed epoch 2 in 407.55 s.Batch Training Loss = 0.6015
2024-08-02 15:48:01,561 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60427287

  self.model_ = torch.load(self.best_model)


2024-08-02 16:08:12,865 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 3.03e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 16:18:24,158 - base - recpack - INFO - Processed epoch 0 in 407.03 s.Batch Training Loss = 0.6020
2024-08-02 16:21:42,312 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6035723526956015, which is better than previous iterations.
2024-08-02 16:21:42,313 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 16:21:43,508 - base - recpack - INFO - Evaluation at end of 0 took 199.35 s.
2024-08-02 16:28:30,732 - base - recpack - INFO - Processed epoch 1 in 407.22 s.Batch Training Loss = 0.6014
2024-08-02 16:31:51,196 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.603582116623991, which is worse than previous iterations.
2024-08-02 16:31:51,198 - base - recpack - INFO - Evaluation at end of 1 took 200.46 s.
2024-08-02 16:38:38,416 - base - recpack - INFO - Processed epoch 2 in 407.22 s.Batch Training Loss = 0.6016
2024-08-02 16:41:55,774 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.602920727

  self.model_ = torch.load(self.best_model)


2024-08-02 17:02:06,536 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 3.03e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 17:12:19,317 - base - recpack - INFO - Processed epoch 0 in 407.45 s.Batch Training Loss = 0.6026
2024-08-02 17:15:36,123 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.603296939695735, which is better than previous iterations.
2024-08-02 17:15:36,124 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 17:15:37,293 - base - recpack - INFO - Evaluation at end of 0 took 197.97 s.
2024-08-02 17:22:24,734 - base - recpack - INFO - Processed epoch 1 in 407.44 s.Batch Training Loss = 0.6016
2024-08-02 17:25:42,381 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6029373651026743, which is worse than previous iterations.
2024-08-02 17:25:42,383 - base - recpack - INFO - Evaluation at end of 1 took 197.65 s.
2024-08-02 17:32:29,492 - base - recpack - INFO - Processed epoch 2 in 407.11 s.Batch Training Loss = 0.6019
2024-08-02 17:35:47,827 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.604386299

  self.model_ = torch.load(self.best_model)


2024-08-02 17:56:10,889 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 3.04e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 18:14:10,648 - base - recpack - INFO - Processed epoch 0 in 869.52 s.Batch Training Loss = 0.6019
2024-08-02 18:17:31,861 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6038143398307573, which is better than previous iterations.
2024-08-02 18:17:31,862 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 18:17:34,216 - base - recpack - INFO - Evaluation at end of 0 took 203.57 s.
2024-08-02 18:32:03,698 - base - recpack - INFO - Processed epoch 1 in 869.48 s.Batch Training Loss = 0.6015
2024-08-02 18:35:26,838 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6038868522573765, which is worse than previous iterations.
2024-08-02 18:35:26,840 - base - recpack - INFO - Evaluation at end of 1 took 203.14 s.
2024-08-02 18:49:55,797 - base - recpack - INFO - Processed epoch 2 in 868.95 s.Batch Training Loss = 0.6013
2024-08-02 18:53:16,555 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60317396

  self.model_ = torch.load(self.best_model)


2024-08-02 19:28:59,929 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 5.36e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 19:46:59,300 - base - recpack - INFO - Processed epoch 0 in 869.33 s.Batch Training Loss = 0.6022
2024-08-02 19:50:19,176 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6041090124886958, which is better than previous iterations.
2024-08-02 19:50:19,177 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 19:50:21,438 - base - recpack - INFO - Evaluation at end of 0 took 202.13 s.
2024-08-02 20:04:50,818 - base - recpack - INFO - Processed epoch 1 in 869.38 s.Batch Training Loss = 0.6017
2024-08-02 20:08:10,170 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6036722444056799, which is worse than previous iterations.
2024-08-02 20:08:10,172 - base - recpack - INFO - Evaluation at end of 1 took 199.35 s.
2024-08-02 20:22:39,854 - base - recpack - INFO - Processed epoch 2 in 869.68 s.Batch Training Loss = 0.6014
2024-08-02 20:26:02,217 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60397167

  self.model_ = torch.load(self.best_model)


2024-08-02 21:01:43,119 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 5.36e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 21:19:41,987 - base - recpack - INFO - Processed epoch 0 in 868.33 s.Batch Training Loss = 0.6022
2024-08-02 21:23:03,869 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6032488127412081, which is better than previous iterations.
2024-08-02 21:23:03,870 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 21:23:06,184 - base - recpack - INFO - Evaluation at end of 0 took 204.19 s.
2024-08-02 21:37:34,963 - base - recpack - INFO - Processed epoch 1 in 868.78 s.Batch Training Loss = 0.6016
2024-08-02 21:40:55,785 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6027551506246213, which is worse than previous iterations.
2024-08-02 21:40:55,787 - base - recpack - INFO - Evaluation at end of 1 took 200.82 s.
2024-08-02 21:55:24,604 - base - recpack - INFO - Processed epoch 2 in 868.82 s.Batch Training Loss = 0.6012
2024-08-02 21:58:46,724 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60321732

  self.model_ = torch.load(self.best_model)


2024-08-02 22:34:26,194 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 5.36e+03s


  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-02 22:41:46,183 - base - recpack - INFO - Processed epoch 0 in 231.36 s.Batch Training Loss = 0.6022
2024-08-02 22:45:00,169 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6047710575296412, which is better than previous iterations.
2024-08-02 22:45:00,171 - base - recpack - INFO - Model improved. Storing better model.
2024-08-02 22:45:00,762 - base - recpack - INFO - Evaluation at end of 0 took 194.58 s.
2024-08-02 22:48:52,793 - base - recpack - INFO - Processed epoch 1 in 232.03 s.Batch Training Loss = 0.6018
2024-08-02 22:52:08,619 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6036408338198848, which is worse than previous iterations.
2024-08-02 22:52:08,621 - base - recpack - INFO - Evaluation at end of 1 took 195.82 s.
2024-08-02 22:56:00,531 - base - recpack - INFO - Processed epoch 2 in 231.91 s.Batch Training Loss = 0.6018
2024-08-02 22:59:14,927 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.60269772

  self.model_ = torch.load(self.best_model)


2024-08-02 23:13:29,934 - base - recpack - INFO - Fitting PinSAGEAlgorithm complete - Took 2.14e+03s


## Results

In [130]:
pipeline.get_metrics()

Unnamed: 0,NDCGK_10,NDCGK_20,NDCGK_50,RecallK_10,RecallK_20,RecallK_50,HitK_10,HitK_20,HitK_50
"PinSAGEAlgorithm(batch_size=1024,dropout=0.1,embedding_dim=100,grad_clip=1.0,keep_last=False,learning_rate=0.1,max_epochs=5,max_iter_no_change=5,min_improvement=0.01,n_layers=3,predict_topK=None,save_best_to_file=False,seed=2149616359,stop_early=True,stopping_criterion=<recpack.algorithms.stopping_criterion.StoppingCriterion object at 0x7f8ee460e450>,validation_sample_size=None)",0.038847,0.048153,0.064221,0.057134,0.087331,0.150476,0.17533,0.276309,0.463195


In [131]:
pd.DataFrame.from_dict(pipeline.get_metrics()).T

Unnamed: 0,"PinSAGEAlgorithm(batch_size=1024,dropout=0.1,embedding_dim=100,grad_clip=1.0,keep_last=False,learning_rate=0.1,max_epochs=5,max_iter_no_change=5,min_improvement=0.01,n_layers=3,predict_topK=None,save_best_to_file=False,seed=2149616359,stop_early=True,stopping_criterion=<recpack.algorithms.stopping_criterion.StoppingCriterion object at 0x7f8ee460e450>,validation_sample_size=None)"
NDCGK_10,0.038847
NDCGK_20,0.048153
NDCGK_50,0.064221
RecallK_10,0.057134
RecallK_20,0.087331
RecallK_50,0.150476
HitK_10,0.17533
HitK_20,0.276309
HitK_50,0.463195
