# PinSAGE with Globo Dataset and StrongGeneralization Scenario

In this notebook, the implementation of PinSAGE in RecPack and the experimental part to generate the results of the algorithm will be presented. 
The notebook contains:
1. The implementation of PinSAGE in RecPack.
2. The 10% of Globo Dataset from RecPack and the StrongGeneralization Scenario has been used to split the data.
3. The StrongGeneralization Scenario to split the data.
4. The RecPack Pipeline Builder to run the experiments, including the splitted dataset, the algorithms and metrics to run. Hyperparameter has been performed in the Pipeline.

Please make sure you have installed all the latest libraries in your Python environment, in order to have a successful run of the code.

## PinSAGE implementation in RecPack

In [21]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch_sparse import SparseTensor, matmul
from recpack.algorithms.base import TorchMLAlgorithm
from recpack.matrix.interaction_matrix import InteractionMatrix
from recpack.algorithms.loss_functions import bpr_loss, bpr_max_loss
from recpack.algorithms.samplers import PositiveNegativeSampler
from recpack.matrix.util import to_csr_matrix 
from scipy.sparse import csr_matrix, lil_matrix
from typing import List, Optional
import logging

logger = logging.getLogger(__name__)

# PinSAGEConv: A single convolutional layer for the PinSAGE model
class PinSAGEConv(nn.Module):
    def __init__(self, in_channels, out_channels, dropout=0.0):
        """
        Initialize the PinSAGEConv layer.

        Args:
            in_channels (int): Number of input channels (dimensions of the input features).
            out_channels (int): Number of output channels (dimensions of the output features).
            dropout (float): Dropout rate for regularization.
        """
        super(PinSAGEConv, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.dropout = dropout

        # Define a linear transformation and a dropout layer
        self.linear = nn.Linear(in_channels, out_channels)
        self.dropout_layer = nn.Dropout(dropout)
        self.reset_parameters()

    def reset_parameters(self):
        """
        Initialize the parameters of the layer using Xavier uniform initialization.
        """
        nn.init.xavier_uniform_(self.linear.weight)
        if self.linear.bias is not None:
            nn.init.zeros_(self.linear.bias)

    def forward(self, x, graph):
        """
        Forward pass for the PinSAGEConv layer.

        Args:
            x (torch.Tensor): Input feature matrix.
            graph (SparseTensor): Sparse adjacency matrix representing the graph.

        Returns:
            torch.Tensor: Output features after convolution and activation.
        """
        try:
            out = matmul(graph, x)  # Perform graph convolution
        except RuntimeError as e:
            # Log the error and return the input unchanged if the operation fails
            # logger.error(f"matmul failed with error: {e}")
            return x  
        out = self.linear(out)
        out = torch.relu(out)
        out = self.dropout_layer(out)
        
        return out

# PinSAGE: A model implementation based on the PinSAGE algorithm
class PinSAGE(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim=64, n_layers=2, dropout=0.0):
        """
        Initialize the PinSAGE model.

        Args:
            num_users (int): Number of users.
            num_items (int): Number of items.
            embedding_dim (int): Dimension of the embedding vectors.
            n_layers (int): Number of PinSAGEConv layers.
            dropout (float): Dropout rate for regularization.
        """
        super(PinSAGE, self).__init__()
        self.num_users = num_users
        self.num_items = num_items
        self.embedding_dim = embedding_dim
        self.n_layers = n_layers
        self.dropout = dropout

        # Initialize user and item embeddings
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)
        
        # Create a list of PinSAGEConv layers
        self.convs = nn.ModuleList([
            PinSAGEConv(embedding_dim, embedding_dim, dropout) for _ in range(n_layers)
        ])
        
        # Final linear layers for users and items
        self.user_final_linear = nn.Linear(embedding_dim, embedding_dim)
        self.item_final_linear = nn.Linear(embedding_dim, embedding_dim)
        self.reset_parameters()

    def reset_parameters(self):
        """
        Initialize the parameters of the model using Xavier uniform initialization.
        """
        nn.init.xavier_uniform_(self.user_embedding.weight)
        nn.init.xavier_uniform_(self.item_embedding.weight)
        nn.init.xavier_uniform_(self.user_final_linear.weight)
        nn.init.xavier_uniform_(self.item_final_linear.weight)

    def forward(self, graph):
        """
        Forward pass for the PinSAGE model.

        Args:
            graph (SparseTensor): Sparse adjacency matrix representing the graph.

        Returns:
            Tuple[torch.Tensor, torch.Tensor]: Final user and item embeddings.
        """
        user_emb = self.user_embedding.weight
        item_emb = self.item_embedding.weight
        
        # Concatenate user and item embeddings
        all_emb = torch.cat([user_emb, item_emb], dim=0)
        embs = [all_emb]

        # Pass through each PinSAGEConv layer
        for conv in self.convs:
            all_emb = conv(all_emb, graph)
            embs.append(all_emb)

        # Compute the final embeddings by averaging the embeddings across layers
        final_embedding = torch.mean(torch.stack(embs, dim=1), dim=1)
        user_emb_final, item_emb_final = torch.split(final_embedding, [self.num_users, self.num_items])
        
        # Separate final transformations for users and items
        final_user_emb = torch.relu(self.user_final_linear(user_emb_final))
        final_item_emb = torch.relu(self.item_final_linear(item_emb_final))

        # Normalize final embeddings
        final_user_emb = final_user_emb / torch.norm(final_user_emb, p=2, dim=1, keepdim=True)
        final_item_emb = final_item_emb / torch.norm(final_item_emb, p=2, dim=1, keepdim=True)

        return final_user_emb, final_item_emb

In [22]:
from recpack.algorithms.base import TorchMLAlgorithm
from recpack.matrix import Matrix
from recpack.matrix.interaction_matrix import InteractionMatrix
from recpack.algorithms.loss_functions import bpr_loss
from recpack.algorithms.samplers import PositiveNegativeSampler
from recpack.algorithms.stopping_criterion import (
    EarlyStoppingException,
    StoppingCriterion,
)
from typing import List, Tuple, Optional
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix, coo_matrix
import torch
import torch.optim as optim
import tempfile
import time
import logging

logger = logging.getLogger(__name__)

# PinSAGEAlgorithm: An implementation of the PinSAGE algorithm using TorchMLAlgorithm as a base class
class PinSAGEAlgorithm(TorchMLAlgorithm):
    def __init__(
        self,
        batch_size: int = 256,
        max_epochs: int = 100,
        learning_rate: float = 0.001,
        embedding_dim: int = 64,
        n_layers: int = 3,
        dropout: float = 0.1,
        stopping_criterion: str = "bpr",
        stop_early: bool = True,
        max_iter_no_change: int = 5,
        min_improvement: float = 0.01,
        seed: Optional[int] = None,
        save_best_to_file: bool = False,
        keep_last: bool = False,
        predict_topK: Optional[int] = None,
        validation_sample_size: Optional[int] = None,
        grad_clip: float = 1.0,  # Gradient clipping value
    ):
        """
        Initialize the PinSAGEAlgorithm with various hyperparameters.

        Args:
            batch_size (int): Number of samples per batch.
            max_epochs (int): Maximum number of training epochs.
            learning_rate (float): Learning rate for the optimizer.
            embedding_dim (int): Dimension of the embedding vectors.
            n_layers (int): Number of PinSAGEConv layers.
            dropout (float): Dropout rate for regularization.
            stopping_criterion (str): Criterion to stop training early.
            stop_early (bool): Whether to enable early stopping.
            max_iter_no_change (int): Maximum iterations with no improvement for early stopping.
            min_improvement (float): Minimum improvement required for early stopping.
            seed (Optional[int]): Random seed for reproducibility.
            save_best_to_file (bool): Whether to save the best model to a file.
            keep_last (bool): Whether to keep the last model.
            predict_topK (Optional[int]): Number of top-K predictions to consider.
            validation_sample_size (Optional[int]): Size of the validation sample.
            grad_clip (float): Maximum gradient norm for clipping.
        """
        self.embedding_dim = embedding_dim
        self.n_layers = n_layers
        self.dropout = dropout
        self.grad_clip = grad_clip
        super().__init__(
            batch_size=batch_size,
            max_epochs=max_epochs,
            learning_rate=learning_rate,
            stopping_criterion=stopping_criterion,
            stop_early=stop_early,
            max_iter_no_change=max_iter_no_change,
            min_improvement=min_improvement,
            seed=seed,
            save_best_to_file=save_best_to_file,
            keep_last=keep_last,
            predict_topK=predict_topK,
            validation_sample_size=validation_sample_size,
        )
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def _init_model(self, train: InteractionMatrix) -> None:
        """
        Initialize the PinSAGE model and optimizer.

        Args:
            train (InteractionMatrix): The training interaction matrix.
        """
        num_users, num_items = train.shape
        self.model_ = PinSAGE(num_users, num_items, self.embedding_dim, self.n_layers, self.dropout).to(self.device)
        self.optimizer = optim.Adam(self.model_.parameters(), lr=self.learning_rate)

    def _create_sparse_graph(self, interaction_matrix: csr_matrix, num_users: int, num_items: int) -> SparseTensor:
        """
        Create a sparse graph from the interaction matrix.

        Args:
            interaction_matrix (csr_matrix): The interaction matrix in CSR format.
            num_users (int): Number of users.
            num_items (int): Number of items.

        Returns:
            SparseTensor: A sparse tensor representing the graph.
        """
        coo = interaction_matrix.tocoo()
        row = torch.tensor(coo.row, dtype=torch.long)
        col = torch.tensor(coo.col, dtype=torch.long)
        value = torch.tensor(coo.data, dtype=torch.float32)
        shape = (num_users + num_items, num_users + num_items)
        graph = SparseTensor(row=row, col=col, value=value, sparse_sizes=shape).to(self.device)
        return graph

    def _train_epoch(self, train: InteractionMatrix) -> List[float]:
        """
        Train the model for one epoch.

        Args:
            train (InteractionMatrix): The training interaction matrix.

        Returns:
            List[float]: A list of losses for each batch.
        """
        self.model_.train()
        interaction_matrix = train  # Get the sparse matrix directly
        graph = self._create_sparse_graph(interaction_matrix, train.shape[0], train.shape[1])
        total_loss = 0
        losses = []

        sampler = PositiveNegativeSampler(num_negatives=1, batch_size=self.batch_size)

        # Iterate over samples generated by the PositiveNegativeSampler
        for user_indices, pos_item_indices, neg_item_indices in sampler.sample(interaction_matrix):
            user_indices = torch.tensor(user_indices).to(self.device)
            pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
            neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()

            self.optimizer.zero_grad()
            user_emb_final, item_emb_final = self.model_(graph)  # Call model only once
            pos_scores = user_emb_final[user_indices] @ item_emb_final[pos_item_indices].t()
            neg_scores = user_emb_final[user_indices] @ item_emb_final[neg_item_indices].t()

            loss = bpr_loss(pos_scores, neg_scores)

            if torch.isnan(loss).any() or torch.isinf(loss).any():
                continue

            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model_.parameters(), max_norm=self.grad_clip)  # Gradient clipping
            self.optimizer.step()

            total_loss += loss.item()
            losses.append(loss.item())

        if len(losses) == 0:
            return [float('nan')]

        return losses

    def _batch_predict(self, X: InteractionMatrix, users: List[int]) -> csr_matrix:
        """
        Make batch predictions for a list of users.

        Args:
            X (InteractionMatrix): The interaction matrix.
            users (List[int]): List of user indices to make predictions for.

        Returns:
            csr_matrix: A sparse matrix with the prediction scores.
        """
        self.model_.eval()
        graph = self._create_sparse_graph(X, X.shape[0], X.shape[1])
        user_indices = torch.tensor(users).to(self.device)
        item_indices = torch.arange(X.shape[1]).to(self.device)
        
        with torch.no_grad():
            user_emb_final, item_emb_final = self.model_(graph)
            scores = user_emb_final[user_indices] @ item_emb_final.t()
            scores = scores.cpu().numpy()
        
        result = lil_matrix((X.shape[0], X.shape[1]))
        for i, user in enumerate(users):
            result[user] = scores[i]
        
        return result.tocsr()

In [3]:
from recpack.datasets import Netflix, DummyDataset
from recpack.pipelines import PipelineBuilder
from recpack.scenarios import StrongGeneralization
from recpack.pipelines import ALGORITHM_REGISTRY
import pandas as pd

In [23]:
ALGORITHM_REGISTRY.register("PinSAGEAlgorithm2", PinSAGEAlgorithm)

## RecPack Dataset Importing

In [5]:
from recpack.datasets import Globo
dataset = Globo(path="", filename="archive.zip")

In [6]:
dataset.fetch_dataset()

In [7]:
dataset

<recpack.datasets.globo.Globo at 0x7fb075b0f690>

In [None]:
df = dataset._load_dataframe()

## Datasets with Timestamps sampling

In [10]:
timestamp_counts = df['click_timestamp'].value_counts().sort_index(ascending=False)
cumulative_counts = timestamp_counts.cumsum()
total_counts = cumulative_counts.max()
threshold_count = total_counts * 0.1
threshold_timestamp = cumulative_counts[cumulative_counts >= threshold_count].index[0]

In [11]:
filtered_df = df[df['click_timestamp'] >= threshold_timestamp]

In [12]:
df

Unnamed: 0,user_id,click_article_id,click_timestamp
0,0,157541,1.506827e+09
1,0,68866,1.506827e+09
2,1,235840,1.506827e+09
3,1,96663,1.506827e+09
4,2,119592,1.506827e+09
...,...,...,...
2564,10051,84911,1.508212e+09
2565,322896,30760,1.508212e+09
2566,322896,157507,1.508212e+09
2567,123718,234481,1.508212e+09


In [13]:
filtered_df

Unnamed: 0,user_id,click_article_id,click_timestamp
4289,22712,158772,1.508196e+09
4290,22712,284638,1.508633e+09
4291,22712,95633,1.508678e+09
4292,22712,95524,1.508679e+09
4293,22712,184427,1.508679e+09
...,...,...,...
2564,10051,84911,1.508212e+09
2565,322896,30760,1.508212e+09
2566,322896,157507,1.508212e+09
2567,123718,234481,1.508212e+09


## Dataset Preprocessing to Interaction Matrix

In [14]:
from recpack.matrix import InteractionMatrix
from recpack.preprocessing.preprocessors import DataFramePreprocessor

item_ix = 'click_article_id'
user_ix = 'user_id'
timestamp_ix = 'click_timestamp'

preprocessor = DataFramePreprocessor(item_ix=item_ix, user_ix=user_ix, timestamp_ix=timestamp_ix)

interaction_matrix = preprocessor.process(filtered_df)

  0%|          | 0/298819 [00:00<?, ?it/s]

  0%|          | 0/298819 [00:00<?, ?it/s]

## StrongGeneralization Scenario Splitting of Data

In [15]:
scenario = StrongGeneralization(frac_users_train=0.8, frac_interactions_in=0.8, validation=True)
scenario.split(interaction_matrix)

0it [00:00, ?it/s]

0it [00:00, ?it/s]

## Experimental RecPack Pipeline

In [24]:
pipeline_builder = PipelineBuilder()
ok = (scenario._validation_data_in, scenario._validation_data_out)
pipeline_builder.set_data_from_scenario(scenario)


# Add the baseline algorithms
#pipeline_builder.add_algorithm('ItemKNN', grid={'K': [100, 200, 400, 800]})
#pipeline_builder.add_algorithm('EASE', grid={'l2': [10, 100, 1000], 'alpha': [0, 0.1, 0.5]})

# Add our LightGCN algorithm
pipeline_builder.add_algorithm(
    'PinSAGEAlgorithm2',
    grid={
        'learning_rate': [0.1, 0.01, 0.001],
        'embedding_dim': [100, 200, 400]
    },
    params={
        'max_epochs': 5,
        'batch_size': 1024,
        'n_layers': 3
    }
)

# Add NDCG, Recall, and HR metrics to be evaluated at 10, 20, and 50
pipeline_builder.add_metric('NDCGK', [10, 20, 50])
pipeline_builder.add_metric('RecallK', [10, 20, 50])
pipeline_builder.add_metric('HitK', [10, 20, 50])

# Set the optimisation metric
pipeline_builder.set_optimisation_metric('RecallK', 20)

# Construct pipeline
pipeline = pipeline_builder.build()

# Debugging: Output the shape of the training data
#print(f"Training data shape: {im.shape}")

# Run pipeline, will first do optimisation, and then evaluation
pipeline.run()



  0%|          | 0/1 [00:00<?, ?it/s]

  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 18:58:40,776 - base - recpack - INFO - Processed epoch 0 in 4.94 s.Batch Training Loss = 0.3942
2024-08-04 18:58:48,548 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.4024587547305186, which is better than previous iterations.
2024-08-04 18:58:48,549 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 18:58:48,611 - base - recpack - INFO - Evaluation at end of 0 took 7.83 s.
2024-08-04 18:58:52,527 - base - recpack - INFO - Processed epoch 1 in 3.92 s.Batch Training Loss = 0.3879
2024-08-04 18:59:00,178 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.40268481369275827, which is worse than previous iterations.
2024-08-04 18:59:00,179 - base - recpack - INFO - Evaluation at end of 1 took 7.65 s.
2024-08-04 18:59:04,088 - base - recpack - INFO - Processed epoch 2 in 3.91 s.Batch Training Loss = 0.3874
2024-08-04 18:59:11,511 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.40005855492659076

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 18:59:46,246 - base - recpack - INFO - Processed epoch 0 in 3.76 s.Batch Training Loss = 0.3921
2024-08-04 18:59:53,768 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3996632608632707, which is better than previous iterations.
2024-08-04 18:59:53,769 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 18:59:53,830 - base - recpack - INFO - Evaluation at end of 0 took 7.58 s.
2024-08-04 18:59:57,750 - base - recpack - INFO - Processed epoch 1 in 3.92 s.Batch Training Loss = 0.3837
2024-08-04 19:00:05,238 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39898707062485184, which is worse than previous iterations.
2024-08-04 19:00:05,239 - base - recpack - INFO - Evaluation at end of 1 took 7.49 s.
2024-08-04 19:00:09,191 - base - recpack - INFO - Processed epoch 2 in 3.95 s.Batch Training Loss = 0.3834
2024-08-04 19:00:16,729 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3959093159478139,

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:00:51,354 - base - recpack - INFO - Processed epoch 0 in 3.73 s.Batch Training Loss = 0.4180
2024-08-04 19:00:59,484 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.38641763651829425, which is better than previous iterations.
2024-08-04 19:00:59,485 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:00:59,565 - base - recpack - INFO - Evaluation at end of 0 took 8.21 s.
2024-08-04 19:01:03,334 - base - recpack - INFO - Processed epoch 1 in 3.77 s.Batch Training Loss = 0.3772
2024-08-04 19:01:11,548 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3841856722533237, which is worse than previous iterations.
2024-08-04 19:01:11,549 - base - recpack - INFO - Evaluation at end of 1 took 8.21 s.
2024-08-04 19:01:15,356 - base - recpack - INFO - Processed epoch 2 in 3.81 s.Batch Training Loss = 0.3761
2024-08-04 19:01:23,484 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.385019579084985, 

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:02:03,102 - base - recpack - INFO - Processed epoch 0 in 6.60 s.Batch Training Loss = 0.3919
2024-08-04 19:02:11,038 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39749064210010265, which is better than previous iterations.
2024-08-04 19:02:11,039 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:02:11,198 - base - recpack - INFO - Evaluation at end of 0 took 8.09 s.
2024-08-04 19:02:17,689 - base - recpack - INFO - Processed epoch 1 in 6.49 s.Batch Training Loss = 0.3849
2024-08-04 19:02:25,046 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.4001463287241852, which is worse than previous iterations.
2024-08-04 19:02:25,047 - base - recpack - INFO - Evaluation at end of 1 took 7.36 s.
2024-08-04 19:02:31,552 - base - recpack - INFO - Processed epoch 2 in 6.50 s.Batch Training Loss = 0.3846
2024-08-04 19:02:39,116 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3976232598284455,

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:03:22,084 - base - recpack - INFO - Processed epoch 0 in 6.43 s.Batch Training Loss = 0.3931
2024-08-04 19:03:29,518 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39469694755827445, which is better than previous iterations.
2024-08-04 19:03:29,519 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:03:29,695 - base - recpack - INFO - Evaluation at end of 0 took 7.61 s.
2024-08-04 19:03:36,098 - base - recpack - INFO - Processed epoch 1 in 6.40 s.Batch Training Loss = 0.3862
2024-08-04 19:03:43,595 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39953250240969157, which is worse than previous iterations.
2024-08-04 19:03:43,596 - base - recpack - INFO - Evaluation at end of 1 took 7.50 s.
2024-08-04 19:03:49,975 - base - recpack - INFO - Processed epoch 2 in 6.38 s.Batch Training Loss = 0.3858
2024-08-04 19:03:57,482 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.4008720571921753

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:04:39,891 - base - recpack - INFO - Processed epoch 0 in 6.59 s.Batch Training Loss = 0.4073
2024-08-04 19:04:47,877 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39328772776925724, which is better than previous iterations.
2024-08-04 19:04:47,878 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:04:48,038 - base - recpack - INFO - Evaluation at end of 0 took 8.15 s.
2024-08-04 19:04:54,480 - base - recpack - INFO - Processed epoch 1 in 6.44 s.Batch Training Loss = 0.3781
2024-08-04 19:05:02,442 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39175125263489907, which is worse than previous iterations.
2024-08-04 19:05:02,443 - base - recpack - INFO - Evaluation at end of 1 took 7.96 s.
2024-08-04 19:05:08,997 - base - recpack - INFO - Processed epoch 2 in 6.55 s.Batch Training Loss = 0.3780
2024-08-04 19:05:17,406 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3932798324401023

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:06:08,449 - base - recpack - INFO - Processed epoch 0 in 12.93 s.Batch Training Loss = 0.3928
2024-08-04 19:06:16,328 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.40000605424172253, which is better than previous iterations.
2024-08-04 19:06:16,330 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:06:16,644 - base - recpack - INFO - Evaluation at end of 0 took 8.19 s.
2024-08-04 19:06:29,580 - base - recpack - INFO - Processed epoch 1 in 12.93 s.Batch Training Loss = 0.3860
2024-08-04 19:06:37,478 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.4029636710814282, which is worse than previous iterations.
2024-08-04 19:06:37,479 - base - recpack - INFO - Evaluation at end of 1 took 7.90 s.
2024-08-04 19:06:50,401 - base - recpack - INFO - Processed epoch 2 in 12.92 s.Batch Training Loss = 0.3846
2024-08-04 19:06:58,066 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39911772792876

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:08:01,046 - base - recpack - INFO - Processed epoch 0 in 12.89 s.Batch Training Loss = 0.3919
2024-08-04 19:08:08,570 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3989792840029767, which is better than previous iterations.
2024-08-04 19:08:08,571 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:08:08,883 - base - recpack - INFO - Evaluation at end of 0 took 7.84 s.
2024-08-04 19:08:21,756 - base - recpack - INFO - Processed epoch 1 in 12.87 s.Batch Training Loss = 0.3845
2024-08-04 19:08:29,326 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3988235079663505, which is worse than previous iterations.
2024-08-04 19:08:29,327 - base - recpack - INFO - Evaluation at end of 1 took 7.57 s.
2024-08-04 19:08:42,247 - base - recpack - INFO - Processed epoch 2 in 12.92 s.Batch Training Loss = 0.3843
2024-08-04 19:08:49,591 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.394269958879063

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:09:51,995 - base - recpack - INFO - Processed epoch 0 in 12.85 s.Batch Training Loss = 0.4007
2024-08-04 19:10:00,022 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.39088498744210454, which is better than previous iterations.
2024-08-04 19:10:00,024 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:10:00,341 - base - recpack - INFO - Evaluation at end of 0 took 8.34 s.
2024-08-04 19:10:13,287 - base - recpack - INFO - Processed epoch 1 in 12.95 s.Batch Training Loss = 0.3772
2024-08-04 19:10:21,156 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3906829682043011, which is worse than previous iterations.
2024-08-04 19:10:21,157 - base - recpack - INFO - Evaluation at end of 1 took 7.87 s.
2024-08-04 19:10:34,098 - base - recpack - INFO - Processed epoch 2 in 12.94 s.Batch Training Loss = 0.3766
2024-08-04 19:10:42,104 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.38791932261865

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 19:11:45,726 - base - recpack - INFO - Processed epoch 0 in 12.95 s.Batch Training Loss = 0.4001
2024-08-04 19:11:53,638 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.3875605214055656, which is better than previous iterations.
2024-08-04 19:11:53,639 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 19:11:53,956 - base - recpack - INFO - Evaluation at end of 0 took 8.23 s.
2024-08-04 19:12:06,885 - base - recpack - INFO - Processed epoch 1 in 12.93 s.Batch Training Loss = 0.3785
2024-08-04 19:12:14,786 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.38863432716042834, which is worse than previous iterations.
2024-08-04 19:12:14,786 - base - recpack - INFO - Evaluation at end of 1 took 7.90 s.
2024-08-04 19:12:27,691 - base - recpack - INFO - Processed epoch 2 in 12.90 s.Batch Training Loss = 0.3773
2024-08-04 19:12:35,903 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.38411413330240

  self.model_ = torch.load(self.best_model)


## Results

In [25]:
pipeline.get_metrics()

Unnamed: 0,NDCGK_10,NDCGK_20,NDCGK_50,RecallK_10,RecallK_20,RecallK_50,HitK_10,HitK_20,HitK_50
"PinSAGEAlgorithm(batch_size=1024,dropout=0.1,embedding_dim=400,grad_clip=1.0,keep_last=False,learning_rate=0.001,max_epochs=5,max_iter_no_change=5,min_improvement=0.01,n_layers=3,predict_topK=None,save_best_to_file=False,seed=1663951561,stop_early=True,stopping_criterion=<recpack.algorithms.stopping_criterion.StoppingCriterion object at 0x7fb0756e6310>,validation_sample_size=None)",0.126869,0.147992,0.17897,0.23078,0.310627,0.458474,0.27781,0.376593,0.569525
