# LightGCN with Globo Dataset and StrongGeneralization Scenario

In this notebook, the implementation of LightGCN in RecPack and the experimental part to generate the results of the algorithm will be presented. 
The notebook contains:
1. The implementation of LightGCN in RecPack.
2. The 10% of Globo Dataset from RecPack and the StrongGeneralization Scenario has been used to split the data.
3. The StrongGeneralization Scenario to split the data.
4. The RecPack Pipeline Builder to run the experiments, including the splitted dataset, the algorithms and metrics to run. Hyperparameter has been performed in the Pipeline.

Please make sure you have installed all the latest libraries in your Python environment, in order to have a successful run of the code.

In [2]:
import torch
from tqdm import tqdm

In [4]:
from torch_sparse import SparseTensor, matmul

## LightGCN implementation in RecPack

In [5]:
import torch
import torch.nn as nn
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree
import time
from typing import List, Tuple, Optional
from tqdm import tqdm
from recpack.algorithms.base import TorchMLAlgorithm
from recpack.matrix.interaction_matrix import InteractionMatrix
from recpack.matrix import to_csr_matrix
from recpack.algorithms.loss_functions import bpr_loss
from recpack.algorithms.samplers import PositiveNegativeSampler
from scipy.sparse import csr_matrix, lil_matrix
import torch.optim as optim
import logging

# logger = logging.getLogger(__name__)

# LightGCN model definition using MessagePassing from PyTorch Geometric
class LightGCN(MessagePassing):
    def __init__(self, num_users, num_items, embedding_dim=64, K=3, add_self_loops=False):
        """
        Initialize the LightGCN model with user and item embeddings.

        Args:
            num_users (int): Number of users.
            num_items (int): Number of items.
            embedding_dim (int): Dimension of the embedding vectors.
            K (int): Number of propagation layers.
            add_self_loops (bool): Whether to add self-loops to the adjacency matrix.
        """
        super(LightGCN, self).__init__(aggr='add')
        self.num_users, self.num_items = num_users, num_items
        self.embedding_dim, self.K = embedding_dim, K
        self.add_self_loops = add_self_loops

        # Initialize user and item embeddings
        self.users_emb = nn.Embedding(num_embeddings=self.num_users, embedding_dim=self.embedding_dim)
        self.items_emb = nn.Embedding(num_embeddings=self.num_items, embedding_dim=self.embedding_dim)

        # Initialize embeddings with normal distribution
        nn.init.normal_(self.users_emb.weight, std=0.1)
        nn.init.normal_(self.items_emb.weight, std=0.1)

    def forward(self, edge_index: SparseTensor):
        """
        Forward pass for the LightGCN model.

        Args:
            edge_index (SparseTensor): Sparse tensor representing the adjacency matrix.

        Returns:
            Tuple: Final user and item embeddings after propagation, and the initial embeddings.
        """
        if self.add_self_loops:
            edge_index, _ = add_self_loops(edge_index, num_nodes=self.num_users + self.num_items)
        
        # Normalize the adjacency matrix
        edge_index_norm = self.normalize_adj(edge_index)
        
        # Concatenate user and item embeddings
        emb_0 = torch.cat([self.users_emb.weight, self.items_emb.weight])
        embs = [emb_0]
        emb_k = emb_0

        # Perform K propagation steps
        for i in range(self.K):
            # Preventing CUDA/Library version error
            try:
                emb_k = self.propagate(edge_index_norm, x=emb_k)
            except RuntimeError as e:
                break
            embs.append(emb_k)

        # Stack and average embeddings from each propagation step
        embs = torch.stack(embs, dim=1)
        emb_final = torch.mean(embs, dim=1)

        # Split the final embeddings back into user and item embeddings
        users_emb_final, items_emb_final = torch.split(emb_final, [self.num_users, self.num_items])

        return users_emb_final, self.users_emb.weight, items_emb_final, self.items_emb.weight

    def message(self, x_j: torch.Tensor) -> torch.Tensor:
        """
        Message function that aggregates messages from neighboring nodes.

        Args:
            x_j (torch.Tensor): Features of the neighboring nodes.

        Returns:
            torch.Tensor: Aggregated messages.
        """
        return x_j

    def message_and_aggregate(self, adj_t: SparseTensor, x: torch.Tensor) -> torch.Tensor:
        """
        Message and aggregate function using matrix multiplication.

        Args:
            adj_t (SparseTensor): Transposed adjacency matrix.
            x (torch.Tensor): Node features.

        Returns:
            torch.Tensor: Result of multiplying adjacency matrix with node features.
        """
        return matmul(adj_t, x)

    def normalize_adj(self, edge_index: SparseTensor) -> SparseTensor:
        """
        Normalize the adjacency matrix.

        Args:
            edge_index (SparseTensor): Sparse tensor representing the adjacency matrix.

        Returns:
            SparseTensor: Normalized adjacency matrix.
        """
        row, col, value = edge_index.coo()
        row = row.long()  # Ensure indices are long type
        col = col.long()  # Ensure indices are long type
        deg = degree(row, num_nodes=edge_index.size(0), dtype=value.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        value = deg_inv_sqrt[row] * value * deg_inv_sqrt[col]
        return SparseTensor(row=row, col=col, value=value, sparse_sizes=edge_index.sparse_sizes())


In [6]:
from recpack.algorithms.base import TorchMLAlgorithm
from recpack.matrix import Matrix
from recpack.matrix.interaction_matrix import InteractionMatrix
from recpack.algorithms.loss_functions import bpr_loss, bpr_max_loss
from recpack.algorithms.samplers import PositiveNegativeSampler
from recpack.algorithms.stopping_criterion import (
    EarlyStoppingException,
    StoppingCriterion,
)
from typing import List, Tuple, Optional
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix, coo_matrix
import torch
import torch.optim as optim
import tempfile
import time
import logging

logger = logging.getLogger(__name__)

# Implementation of the LightGCN algorithm using TorchMLAlgorithm as a base class
class LightGCNAlgorithm(TorchMLAlgorithm):
    def __init__(
        self,
        batch_size: int = 256,
        max_epochs: int = 100,
        learning_rate: float = 0.001,
        embedding_dim: int = 64,
        n_layers: int = 3,
        reg_weight: float = 1e-5,
        stopping_criterion: str = "bpr",
        stop_early: bool = True,
        max_iter_no_change: int = 5,
        min_improvement: float = 0.01,
        seed: Optional[int] = None,
        save_best_to_file: bool = False,
        keep_last: bool = False,
        predict_topK: Optional[int] = None,
        validation_sample_size: Optional[int] = None,
    ):
        """
        Initialize the LightGCNAlgorithm with various hyperparameters.

        Args:
            batch_size (int): Number of samples per batch.
            max_epochs (int): Maximum number of training epochs.
            learning_rate (float): Learning rate for the optimizer.
            embedding_dim (int): Dimension of the embedding vectors.
            n_layers (int): Number of LightGCN layers.
            reg_weight (float): Regularization weight.
            stopping_criterion (str): Criterion to stop training early.
            stop_early (bool): Whether to enable early stopping.
            max_iter_no_change (int): Maximum iterations with no improvement for early stopping.
            min_improvement (float): Minimum improvement required for early stopping.
            seed (Optional[int]): Random seed for reproducibility.
            save_best_to_file (bool): Whether to save the best model to a file.
            keep_last (bool): Whether to keep the last model.
            predict_topK (Optional[int]): Number of top-K predictions to consider.
            validation_sample_size (Optional[int]): Size of the validation sample.
        """
        self.embedding_dim = embedding_dim
        self.n_layers = n_layers
        self.reg_weight = reg_weight
        super().__init__(
            batch_size=batch_size,
            max_epochs=max_epochs,
            learning_rate=learning_rate,
            stopping_criterion=stopping_criterion,
            stop_early=stop_early,
            max_iter_no_change=max_iter_no_change,
            min_improvement=min_improvement,
            seed=seed,
            save_best_to_file=save_best_to_file,
            keep_last=keep_last,
            predict_topK=predict_topK,
            validation_sample_size=validation_sample_size,
        )
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def _init_model(self, train: InteractionMatrix) -> None:
        """
        Initialize the LightGCN model and optimizer.

        Args:
            train (InteractionMatrix): The training interaction matrix.
        """
        num_users, num_items = train.shape
        self.model_ = LightGCN(num_users, num_items, self.embedding_dim, self.n_layers).to(self.device)
        self.optimizer = optim.Adam(self.model_.parameters(), lr=self.learning_rate)

    def _create_sparse_graph(self, interaction_matrix: csr_matrix, num_users: int, num_items: int) -> SparseTensor:
        """
        Create a sparse graph from the interaction matrix.

        Args:
            interaction_matrix (csr_matrix): The interaction matrix in CSR format.
            num_users (int): Number of users.
            num_items (int): Number of items.

        Returns:
            SparseTensor: A sparse tensor representing the graph.
        """
        coo = interaction_matrix.tocoo()
        row = torch.tensor(coo.row, dtype=torch.long)
        col = torch.tensor(coo.col, dtype=torch.long)
        value = torch.tensor(coo.data, dtype=torch.float32)
        print(f"Graph - Rows: {row.shape}, Cols: {col.shape}, Values: {value.shape}")  # Debugging information
        return SparseTensor(row=row, col=col, value=value, sparse_sizes=(num_users + num_items, num_users + num_items)).to(self.device)

    def _train_epoch(self, train: csr_matrix) -> List[float]:
        """
        Train the model for one epoch.

        Args:
            train (csr_matrix): The training interaction matrix.

        Returns:
            List[float]: A list of losses for each batch.
        """
        self.model_.train()
        graph = self._create_sparse_graph(train, train.shape[0], train.shape[1]).to(self.device)
        total_loss = 0
        losses = []

        sampler = PositiveNegativeSampler(num_negatives=1, batch_size=self.batch_size)

        for user_indices, pos_item_indices, neg_item_indices in sampler.sample(train):
            user_indices = torch.tensor(user_indices).to(self.device)
            pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
            neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()

            self.optimizer.zero_grad()
            users_emb_final, _, items_emb_final, _ = self.model_(graph)  # Call model only once
            pos_scores = users_emb_final[user_indices] @ items_emb_final[pos_item_indices].t()
            neg_scores = users_emb_final[user_indices] @ items_emb_final[neg_item_indices].t()

            loss = bpr_loss(pos_scores, neg_scores)

            if torch.isnan(loss).any() or torch.isinf(loss).any():
                continue

            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model_.parameters(), max_norm=1.0)  # Gradient clipping
            self.optimizer.step()

            total_loss += loss.item()
            losses.append(loss.item())

        if len(losses) == 0:
            return [float('nan')]

        return losses

    def _batch_predict(self, X: csr_matrix, users: List[int]) -> csr_matrix:
        """
        Make batch predictions for a list of users.

        Args:
            X (csr_matrix): The interaction matrix.
            users (List[int]): List of user indices to make predictions for.

        Returns:
            csr_matrix: A sparse matrix with the prediction scores.
        """
        self.model_.eval()
        graph = self._create_sparse_graph(X, X.shape[0], X.shape[1]).to(self.device)
        user_indices = torch.tensor(users).to(self.device)
        item_indices = torch.arange(X.shape[1]).to(self.device)
        
        with torch.no_grad():
            user_emb_final, _, item_emb_final, _ = self.model_(graph)
            scores = user_emb_final[user_indices] @ item_emb_final.t()
            scores = scores.cpu().numpy()
        
        result = lil_matrix((X.shape[0], X.shape[1]))
        for i, user in enumerate(users):
            result[user] = scores[i]
        
        return result.tocsr()

    def fit(self, X: csr_matrix, validation_data: tuple):
        """
        Fit the model to the training data.

        Args:
            X (csr_matrix): The training interaction matrix.
            validation_data (tuple): Validation data used for early stopping.
        """
        super().fit(X, validation_data)

In [7]:
from recpack.datasets import Netflix, DummyDataset
from recpack.pipelines import PipelineBuilder
from recpack.scenarios import StrongGeneralization, TimedLastItemPrediction, WeakGeneralization
from recpack.pipelines import ALGORITHM_REGISTRY
import pandas as pd

In [8]:
ALGORITHM_REGISTRY.register("LightGCNAlgorithm1", LightGCNAlgorithm)

## RecPack Dataset Importing

In [12]:
from recpack.datasets import Globo
dataset = Globo(path="", filename="archive.zip")

In [13]:
dataset.fetch_dataset()

In [14]:
dataset

<recpack.datasets.globo.Globo at 0x7f685a01d410>

In [None]:
df = dataset._load_dataframe()
#df = dataset.load()

## Datasets with Timestamps sampling

In [33]:
timestamp_counts = df['click_timestamp'].value_counts().sort_index(ascending=False)
cumulative_counts = timestamp_counts.cumsum()
total_counts = cumulative_counts.max()
threshold_count = total_counts * 0.1
threshold_timestamp = cumulative_counts[cumulative_counts >= threshold_count].index[0]

In [34]:
filtered_df = df[df['click_timestamp'] >= threshold_timestamp]

In [35]:
df

Unnamed: 0,user_id,click_article_id,click_timestamp
0,0,157541,1.506827e+09
1,0,68866,1.506827e+09
2,1,235840,1.506827e+09
3,1,96663,1.506827e+09
4,2,119592,1.506827e+09
...,...,...,...
2564,10051,84911,1.508212e+09
2565,322896,30760,1.508212e+09
2566,322896,157507,1.508212e+09
2567,123718,234481,1.508212e+09


In [36]:
filtered_df

Unnamed: 0,user_id,click_article_id,click_timestamp
4289,22712,158772,1.508196e+09
4290,22712,284638,1.508633e+09
4291,22712,95633,1.508678e+09
4292,22712,95524,1.508679e+09
4293,22712,184427,1.508679e+09
...,...,...,...
2564,10051,84911,1.508212e+09
2565,322896,30760,1.508212e+09
2566,322896,157507,1.508212e+09
2567,123718,234481,1.508212e+09


## Dataset Preprocessing to Interaction Matrix

In [38]:
from recpack.matrix import InteractionMatrix
from recpack.preprocessing.preprocessors import DataFramePreprocessor

item_ix = 'click_article_id'
user_ix = 'user_id'
timestamp_ix = 'click_timestamp'

preprocessor = DataFramePreprocessor(item_ix=item_ix, user_ix=user_ix, timestamp_ix=timestamp_ix)

interaction_matrix = preprocessor.process(filtered_df)

  0%|          | 0/298819 [00:00<?, ?it/s]

  0%|          | 0/298819 [00:00<?, ?it/s]

## StrongGeneralization Scenario Splitting of Data

In [40]:
scenario = StrongGeneralization(frac_users_train=0.8, frac_interactions_in=0.8, validation=True)
scenario.split(interaction_matrix)

0it [00:00, ?it/s]

0it [00:00, ?it/s]

## Experimental RecPack Pipeline

In [41]:
pipeline_builder = PipelineBuilder()
ok = (scenario._validation_data_in, scenario._validation_data_out)
pipeline_builder.set_data_from_scenario(scenario)


# Add the baseline algorithms
#pipeline_builder.add_algorithm('ItemKNN', grid={'K': [100, 200, 400, 800]})
#pipeline_builder.add_algorithm('EASE', grid={'l2': [10, 100, 1000], 'alpha': [0, 0.1, 0.5]})

# Add our LightGCN algorithm
pipeline_builder.add_algorithm(
    'LightGCNAlgorithm1',
    grid={
        'learning_rate': [0.1, 0.01, 0.001],
        'embedding_dim': [100, 200, 400]
    },
    params={
        'max_epochs': 5,
        'batch_size': 1024,
        'n_layers': 3
    }
)

# Add NDCG, Recall, and HR metrics to be evaluated at 10, 20, and 50
pipeline_builder.add_metric('NDCGK', [10, 20, 50])
pipeline_builder.add_metric('RecallK', [10, 20, 50])
pipeline_builder.add_metric('HitK', [10, 20, 50])

# Set the optimisation metric
pipeline_builder.set_optimisation_metric('RecallK', 20)

# Construct pipeline
pipeline = pipeline_builder.build()

# Debugging: Output the shape of the training data
#print(f"Training data shape: {im.shape}")

# Run pipeline, will first do optimisation, and then evaluation
pipeline.run()



  0%|          | 0/1 [00:00<?, ?it/s]

  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:19:48,827 - base - recpack - INFO - Processed epoch 0 in 2.66 s.Batch Training Loss = 0.1877
2024-08-04 16:20:06,707 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.697225770719185, which is better than previous iterations.
2024-08-04 16:20:06,708 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:20:06,765 - base - recpack - INFO - Evaluation at end of 0 took 17.94 s.
2024-08-04 16:20:09,334 - base - recpack - INFO - Processed epoch 1 in 2.57 s.Batch Training Loss = 0.1636
2024-08-04 16:20:26,730 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.697139449436499, which is worse than previous iterations.
2024-08-04 16:20:26,731 - base - recpack - INFO - Evaluation at end of 1 took 17.40 s.
2024-08-04 16:20:29,118 - base - recpack - INFO - Processed epoch 2 in 2.39 s.Batch Training Loss = 0.1438
2024-08-04 16:20:46,729 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.698590678685544, w

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:21:51,909 - base - recpack - INFO - Processed epoch 0 in 2.56 s.Batch Training Loss = 0.2533
2024-08-04 16:22:09,245 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.693764085643032, which is better than previous iterations.
2024-08-04 16:22:09,246 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:22:09,303 - base - recpack - INFO - Evaluation at end of 0 took 17.39 s.
2024-08-04 16:22:11,922 - base - recpack - INFO - Processed epoch 1 in 2.62 s.Batch Training Loss = 0.1105
2024-08-04 16:22:29,389 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6942464392478546, which is worse than previous iterations.
2024-08-04 16:22:29,390 - base - recpack - INFO - Evaluation at end of 1 took 17.47 s.
2024-08-04 16:22:32,128 - base - recpack - INFO - Processed epoch 2 in 2.74 s.Batch Training Loss = 0.0959
2024-08-04 16:22:50,082 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6927156162379505,

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:23:56,988 - base - recpack - INFO - Processed epoch 0 in 2.63 s.Batch Training Loss = 0.6314
2024-08-04 16:24:14,397 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6929225248184446, which is better than previous iterations.
2024-08-04 16:24:14,399 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:24:14,459 - base - recpack - INFO - Evaluation at end of 0 took 17.47 s.
2024-08-04 16:24:17,111 - base - recpack - INFO - Processed epoch 1 in 2.65 s.Batch Training Loss = 0.3863
2024-08-04 16:24:34,876 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6931801658049614, which is worse than previous iterations.
2024-08-04 16:24:34,877 - base - recpack - INFO - Evaluation at end of 1 took 17.77 s.
2024-08-04 16:24:37,536 - base - recpack - INFO - Processed epoch 2 in 2.66 s.Batch Training Loss = 0.2424
2024-08-04 16:24:55,476 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6933110060715539

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:26:01,245 - base - recpack - INFO - Processed epoch 0 in 3.40 s.Batch Training Loss = 0.2354
2024-08-04 16:26:19,367 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6978755409878843, which is better than previous iterations.
2024-08-04 16:26:19,368 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:26:19,476 - base - recpack - INFO - Evaluation at end of 0 took 18.23 s.
2024-08-04 16:26:21,692 - base - recpack - INFO - Processed epoch 1 in 2.21 s.Batch Training Loss = 0.2590
2024-08-04 16:26:39,713 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6998723982685229, which is worse than previous iterations.
2024-08-04 16:26:39,714 - base - recpack - INFO - Evaluation at end of 1 took 18.02 s.
2024-08-04 16:26:41,593 - base - recpack - INFO - Processed epoch 2 in 1.88 s.Batch Training Loss = 0.2617
2024-08-04 16:26:59,149 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6999352953132717

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:28:04,410 - base - recpack - INFO - Processed epoch 0 in 3.64 s.Batch Training Loss = 0.2216
2024-08-04 16:28:21,789 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6935064364442387, which is better than previous iterations.
2024-08-04 16:28:21,790 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:28:21,898 - base - recpack - INFO - Evaluation at end of 0 took 17.49 s.
2024-08-04 16:28:25,518 - base - recpack - INFO - Processed epoch 1 in 3.62 s.Batch Training Loss = 0.1055
2024-08-04 16:28:43,382 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6941477057790343, which is worse than previous iterations.
2024-08-04 16:28:43,383 - base - recpack - INFO - Evaluation at end of 1 took 17.86 s.
2024-08-04 16:28:47,018 - base - recpack - INFO - Processed epoch 2 in 3.63 s.Batch Training Loss = 0.0887
2024-08-04 16:29:04,984 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6937701422029867

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:30:13,689 - base - recpack - INFO - Processed epoch 0 in 3.63 s.Batch Training Loss = 0.5881
2024-08-04 16:30:31,714 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6937199638921236, which is better than previous iterations.
2024-08-04 16:30:31,715 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:30:31,822 - base - recpack - INFO - Evaluation at end of 0 took 18.13 s.
2024-08-04 16:30:35,597 - base - recpack - INFO - Processed epoch 1 in 3.77 s.Batch Training Loss = 0.2919
2024-08-04 16:30:53,101 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6937378068728239, which is worse than previous iterations.
2024-08-04 16:30:53,102 - base - recpack - INFO - Evaluation at end of 1 took 17.50 s.
2024-08-04 16:30:56,832 - base - recpack - INFO - Processed epoch 2 in 3.73 s.Batch Training Loss = 0.1871
2024-08-04 16:31:14,635 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6939630386317703

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:32:24,439 - base - recpack - INFO - Processed epoch 0 in 4.39 s.Batch Training Loss = 0.3199
2024-08-04 16:32:41,572 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6986445944099671, which is better than previous iterations.
2024-08-04 16:32:41,573 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:32:41,782 - base - recpack - INFO - Evaluation at end of 0 took 17.34 s.
2024-08-04 16:32:44,406 - base - recpack - INFO - Processed epoch 1 in 2.62 s.Batch Training Loss = 0.3643
2024-08-04 16:33:02,008 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6983709090862136, which is worse than previous iterations.
2024-08-04 16:33:02,009 - base - recpack - INFO - Evaluation at end of 1 took 17.60 s.
2024-08-04 16:33:04,582 - base - recpack - INFO - Processed epoch 2 in 2.57 s.Batch Training Loss = 0.4172
2024-08-04 16:33:22,342 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.7005112108667504

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:34:31,948 - base - recpack - INFO - Processed epoch 0 in 5.87 s.Batch Training Loss = 0.1976
2024-08-04 16:34:50,067 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6945123548861797, which is better than previous iterations.
2024-08-04 16:34:50,067 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:34:50,282 - base - recpack - INFO - Evaluation at end of 0 took 18.33 s.
2024-08-04 16:34:56,147 - base - recpack - INFO - Processed epoch 1 in 5.86 s.Batch Training Loss = 0.0987
2024-08-04 16:35:13,546 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6944732077006555, which is worse than previous iterations.
2024-08-04 16:35:13,547 - base - recpack - INFO - Evaluation at end of 1 took 17.40 s.
2024-08-04 16:35:19,408 - base - recpack - INFO - Processed epoch 2 in 5.86 s.Batch Training Loss = 0.0891
2024-08-04 16:35:37,137 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6948403042226701

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:36:52,127 - base - recpack - INFO - Processed epoch 0 in 5.86 s.Batch Training Loss = 0.5290
2024-08-04 16:37:10,185 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6931482322493041, which is better than previous iterations.
2024-08-04 16:37:10,187 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:37:10,453 - base - recpack - INFO - Evaluation at end of 0 took 18.32 s.
2024-08-04 16:37:16,417 - base - recpack - INFO - Processed epoch 1 in 5.96 s.Batch Training Loss = 0.2208
2024-08-04 16:37:33,602 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6937285583372285, which is worse than previous iterations.
2024-08-04 16:37:33,603 - base - recpack - INFO - Evaluation at end of 1 took 17.19 s.
2024-08-04 16:37:39,463 - base - recpack - INFO - Processed epoch 2 in 5.86 s.Batch Training Loss = 0.1506
2024-08-04 16:37:57,275 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6941502234904738

  self.model_ = torch.load(self.best_model)
  user_indices = torch.tensor(user_indices).to(self.device)
  pos_item_indices = torch.tensor(pos_item_indices).to(self.device)
  neg_item_indices = torch.tensor(neg_item_indices).to(self.device).squeeze()


2024-08-04 16:39:08,218 - base - recpack - INFO - Processed epoch 0 in 2.65 s.Batch Training Loss = 0.2542
2024-08-04 16:39:25,808 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6939483783111025, which is better than previous iterations.
2024-08-04 16:39:25,809 - base - recpack - INFO - Model improved. Storing better model.
2024-08-04 16:39:25,864 - base - recpack - INFO - Evaluation at end of 0 took 17.64 s.
2024-08-04 16:39:28,669 - base - recpack - INFO - Processed epoch 1 in 2.80 s.Batch Training Loss = 0.1108
2024-08-04 16:39:46,435 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6942927179718827, which is worse than previous iterations.
2024-08-04 16:39:46,436 - base - recpack - INFO - Evaluation at end of 1 took 17.77 s.
2024-08-04 16:39:49,197 - base - recpack - INFO - Processed epoch 2 in 2.76 s.Batch Training Loss = 0.0956
2024-08-04 16:40:06,943 - stopping_criterion - recpack - INFO - StoppingCriterion has value 0.6948049135750978

  self.model_ = torch.load(self.best_model)


## Results

In [42]:
pipeline.get_metrics()

Unnamed: 0,NDCGK_10,NDCGK_20,NDCGK_50,RecallK_10,RecallK_20,RecallK_50,HitK_10,HitK_20,HitK_50
"LightGCNAlgorithm(batch_size=1024,embedding_dim=100,keep_last=False,learning_rate=0.01,max_epochs=5,max_iter_no_change=5,min_improvement=0.01,n_layers=3,predict_topK=None,reg_weight=1e-05,save_best_to_file=False,seed=3647745631,stop_early=True,stopping_criterion=<recpack.algorithms.stopping_criterion.StoppingCriterion object at 0x7f66c76cdd50>,validation_sample_size=None)",0.031082,0.042532,0.055994,0.062554,0.105838,0.1703,0.076439,0.129807,0.212305
