## List of General Recommendation Models in RecBole

Below is a summary of the key general recommendation models available in the RecBole framework, designed primarily for handling implicit feedback data:

- **Pop**: Simple popularity-based recommendation model.
- **ItemKNN**: Traditional item-based collaborative filtering.
- **BPR (Bayesian Personalized Ranking)**: Optimizes a pairwise ranking loss, ideal for datasets with implicit feedback.
- **NeuMF (Neural Matrix Factorization)**: Combines classical matrix factorization with deep neural networks.
- **ConvNCF (Convolutional Neural Collaborative Filtering)**: Applies a convolutional network to the outer product of user and item embeddings.
- **DMF (Deep Matrix Factorization)**: Feeds the rows and columns of the user-item matrix through deep networks to learn user and item representations.
- **FISM (Factored Item Similarity Models)**: A variant of matrix factorization focusing on item similarities.
- **NAIS (Neural Attentive Item Similarity model)**: Applies attention mechanisms to item similarities in collaborative filtering.
- **SpectralCF**: Leverages graph spectral theory for collaborative filtering.
- **GCMC (Graph Convolutional Matrix Completion)**: Applies graph convolutional networks to matrix completion tasks.
- **NGCF (Neural Graph Collaborative Filtering)**: Enhances collaborative filtering with graph neural networks.
- **LightGCN**: Simplifies Graph Convolutional Networks by removing feature transformations and nonlinear activations.
- **DGCF (Disentangled Graph Collaborative Filtering)**: Focuses on disentangling the latent factors in collaborative filtering.
- **LINE**: Designed for large-scale information network embeddings.
- **MultiVAE**, **MultiDAE**: Variational and denoising autoencoders for collaborative filtering.
- **MacridVAE**: Variational autoencoder with a focus on disentangling user preferences.
- **CDAE (Collaborative Denoising Auto-Encoder)**: Combines collaborative filtering with the denoising capabilities of autoencoders.
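- **ENMF (Efficient Neural Matrix Factorization)**: Trains neural matrix factorization on all interactions without negative sampling, using an efficient whole-data loss.
- **NNCF (Neural Collaborative Filtering with Interaction-based Neighborhood)**: Adds user and item neighborhood information to a neural collaborative filtering model.
- **RaCT**, **RecVAE**: Variational-autoencoder-based models; RaCT adds actor-critic training in which a learned critic approximates a ranking metric, and RecVAE extends MultiVAE with a composite prior and alternating encoder/decoder updates.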
- **EASE (Embarrassingly Shallow Autoencoders for Sparse Data)**: A straightforward linear autoencoder approach for collaborative filtering.
- **SLIMElastic**: Sparse linear method enhanced with elastic net regularization.
- **SGL (Self-supervised Graph Learning for recommendation)**: Integrates self-supervised learning with graph-based recommendation.
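- **ADMMSLIM**: A SLIM-style sparse linear model optimized with ADMM (alternating direction method of multipliers).
- **NCEPLRec**: A projected linear recommender (PLRec) combined with noise-contrastive estimation to reduce popularity bias.
- **SimpleX**: A simple embedding-based baseline trained with a cosine contrastive loss.
- **NCL (Neighborhood-enriched Contrastive Learning)**: Adds contrastive objectives over structural and semantic neighbors to graph collaborative filtering.
- **Random**: Recommends items at random; a sanity-check baseline.
- **DiffRec**, **LDiffRec**: Diffusion recommender models that denoise user interaction vectors; LDiffRec runs the diffusion in a compressed latent space.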
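BPR's pairwise objective can be sketched in a few lines of plain Python. This is a minimal illustration, assuming dot-product scores over hypothetical toy embeddings; it leaves out the L2 regularization and negative sampling that a full implementation would include. For each (user, positive item, negative item) triple the loss is -ln σ(x_ui - x_uj), written here as a numerically stable softplus.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def bpr_loss(triples, user_emb, item_emb):
    """Mean BPR loss over (user, positive item, negative item) triples.

    Each term is -ln(sigmoid(x_ui - x_uj)) = softplus(-(x_ui - x_uj)), where x is a
    dot-product score; minimising it ranks observed items above unobserved ones.
    """
    def score(u, i):
        return sum(a * b for a, b in zip(user_emb[u], item_emb[i]))

    total = sum(softplus(-(score(u, i) - score(u, j))) for u, i, j in triples)
    return total / len(triples)

# Hypothetical 2-d embeddings: the user points towards item 0, away from item 1
user_emb = {0: [1.0, 0.0]}
item_emb = {0: [1.0, 0.0], 1: [0.0, 1.0]}

print(bpr_loss([(0, 0, 1)], user_emb, item_emb))  # correct order: small loss
print(bpr_loss([(0, 1, 0)], user_emb, item_emb))  # reversed order: larger loss
```

Because only score differences enter the loss, BPR learns a ranking rather than calibrated scores, which is why it suits implicit feedback.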

The list ranges from non-learned baselines (Pop, Random) to neighborhood, matrix-factorization, graph, autoencoder, and diffusion models, all aimed at general (non-sequential) recommendation, typically from implicit feedback.

Source: https://recbole.io/docs/user_guide/model_intro.html#general-recommendation

## Grouped Recommendation Models in RecBole

Below is an organized summary of key recommendation models in RecBole, grouped by their implementation approach. This categorization will help in selecting models based on specific use cases and characteristics of the dataset.

### Collaborative Filtering Models
These models make recommendations based on past interactions between users and items:

- **ItemKNN**: Item-based nearest neighbors.
- **BPR (Bayesian Personalized Ranking)**: Utilizes pairwise ranking loss, ideal for implicit feedback.
- **NeuMF (Neural Matrix Factorization)**: Integrates deep learning with traditional matrix factorization.
- **FISM (Factored Item Similarity Models)**: Focuses on item similarity using matrix factorization techniques.
- **NAIS (Neural Attentive Item Similarity model)**: Applies attention mechanisms to enhance item similarity models.
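The item-based nearest-neighbour idea behind ItemKNN can be sketched as follows, assuming binary (user, item) interactions and cosine similarity. The function names and toy data are hypothetical, and RecBole's own implementation (sparse matrices, a shrinkage term) differs in detail.

```python
import math
from collections import defaultdict

def item_knn(interactions, k=20):
    """Keep the k most cosine-similar neighbours of every item.

    interactions: iterable of (user, item) pairs, treated as binary feedback.
    """
    users_of = defaultdict(set)
    for user, item in interactions:
        users_of[item].add(user)
    neighbours = {}
    for i, users_i in users_of.items():
        sims = []
        for j, users_j in users_of.items():
            overlap = len(users_i & users_j)
            if i != j and overlap:
                # Cosine similarity of two binary item columns
                sims.append((overlap / math.sqrt(len(users_i) * len(users_j)), j))
        sims.sort(key=lambda s: s[0], reverse=True)
        neighbours[i] = sims[:k]
    return neighbours

def recommend(user_items, neighbours):
    """Score unseen items by summing their similarity to the user's items."""
    scores = defaultdict(float)
    for i in user_items:
        for sim, j in neighbours.get(i, []):
            if j not in user_items:
                scores[j] += sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data
interactions = [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (1, 'c'), (2, 'b'), (2, 'c')]
neighbours = item_knn(interactions, k=20)
print(recommend({'a', 'b'}, neighbours))  # item 'c' is the only candidate
```

The `k` parameter plays the same role as the neighbourhood size `k` in the configurations below: smaller values keep only the strongest item-item links.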

### Graph-Based Models
Leveraging graph structures to represent complex relationships between items and users:

- **SpectralCF**: Employs spectral graph theory in collaborative filtering.
- **GCMC (Graph Convolutional Matrix Completion)**: Utilizes graph convolutional networks for matrix completion.
- **NGCF (Neural Graph Collaborative Filtering)**: Incorporates graph neural networks to learn from user-item interactions.
- **LightGCN**: Simplifies Graph Convolutional Networks by removing nonlinearities and feature transformation.
- **DGCF (Disentangled Graph Collaborative Filtering)**: Disentangles latent factors in collaborative filtering using graphs.
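LightGCN's propagation rule can be sketched on a toy bipartite graph: each layer replaces a node's embedding with the symmetrically normalised sum of its neighbours' embeddings, with no weight matrix and no activation, and the final embedding is the mean over all layers, including layer 0. This is a minimal pure-Python sketch with hypothetical node names; practical implementations use sparse matrix products and learn the layer-0 embeddings with a BPR loss.

```python
import math

def lightgcn_propagate(edges, emb, num_layers=2):
    """Mean of layer-0..num_layers embeddings under LightGCN propagation.

    edges: (user_node, item_node) pairs; emb: node -> initial embedding list.
    Every node must appear in emb; user and item names must not collide.
    """
    neighbours = {node: [] for node in emb}
    for u, i in edges:
        neighbours[u].append(i)
        neighbours[i].append(u)
    dim = len(next(iter(emb.values())))
    layer = {n: list(v) for n, v in emb.items()}
    total = {n: list(v) for n, v in emb.items()}
    for _ in range(num_layers):
        nxt = {}
        for n, nbrs in neighbours.items():
            acc = [0.0] * dim
            for m in nbrs:
                # Symmetric normalisation 1 / sqrt(|N(n)| * |N(m)|)
                w = 1.0 / math.sqrt(len(nbrs) * len(neighbours[m]))
                for d in range(dim):
                    acc[d] += w * layer[m][d]
            nxt[n] = acc
        layer = nxt
        total = {n: [t + x for t, x in zip(total[n], layer[n])] for n in total}
    return {n: [t / (num_layers + 1) for t in v] for n, v in total.items()}

# Hypothetical graph with one user and one item
out = lightgcn_propagate([('u0', 'i0')], {'u0': [1.0, 0.0], 'i0': [0.0, 1.0]}, num_layers=1)
print(out)  # {'u0': [0.5, 0.5], 'i0': [0.5, 0.5]}
```

A user-item score is then the dot product of the two averaged embeddings.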

### Deep Learning Models
Using neural networks to uncover patterns in user-item interactions:

- **ConvNCF (Convolutional Neural Collaborative Filtering)**: Applies a convolutional network to the outer product of user and item embeddings.
- **DMF (Deep Matrix Factorization)**: Learns user and item representations by passing rows and columns of the interaction matrix through deep networks.
- **NNCF (Neural Collaborative Filtering with Interaction-based Neighborhood)**: Adds user and item neighborhood information to a neural collaborative filtering model.
- **ENMF (Efficient Neural Matrix Factorization)**: Trains neural matrix factorization on all interactions without negative sampling.

### Autoencoder-Based Models
Utilizing autoencoders to compress and learn from user-item interactions:

- **MultiVAE**, **MultiDAE**: Variational and denoising autoencoders focused on collaborative filtering.
- **MacridVAE**: Variational autoencoder designed to disentangle user preferences.
- **RecVAE**: Extends MultiVAE with a composite prior and alternating encoder/decoder updates.
- **EASE (Embarrassingly Shallow Autoencoders for Sparse Data)**: Simple linear autoencoder approach.
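EASE has a closed-form solution, which is what makes it "embarrassingly shallow": with the Gram matrix `G = X^T X + lam * I` and `P = G^-1`, the item-item weights are `B = I - P * diagMat(1 / diag(P))`, which forces a zero diagonal. A NumPy sketch, assuming a small dense binary user-item matrix `X` and an illustrative `lam`; practical use involves sparse inputs and tuning `lam`.

```python
import numpy as np

def ease_weights(X, lam=100.0):
    """Closed-form EASE item-item weight matrix with a zero diagonal.

    X: dense binary user-item matrix of shape (n_users, n_items).
    """
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularised item-item Gram matrix
    P = np.linalg.inv(G)
    B = -P / np.diag(P)                      # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                 # constraint diag(B) = 0
    return B

# Hypothetical interactions: 4 users x 4 items
X = np.array([[1, 1, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 0, 1, 1]], dtype=float)
B = ease_weights(X, lam=0.5)
scores = X @ B  # predicted preference of every user for every item
```

Each column of `B` equals a ridge regression of one item's column on all other items, so `lam` acts as the usual ridge penalty.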

### Hybrid Models
Combining multiple techniques to utilize strengths from different areas:

- **SLIMElastic**: Sparse linear method (SLIM) trained with elastic-net regularization.
- **SGL (Self-supervised Graph Learning for recommendation)**: Integrates self-supervised learning with graph-based methods.
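- **ADMMSLIM**, **NCEPLRec**, **SimpleX**: Linear and embedding-based methods that pair collaborative filtering with specific optimization schemes (ADMM, noise-contrastive estimation, and a cosine contrastive loss, respectively).
- **Random**, **DiffRec**, **LDiffRec**: A random-recommendation baseline and two diffusion-based generative models (LDiffRec operates in a latent space).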

### Other Models
Models that are categorized based on unique characteristics or simpler approaches:

- **Pop**: Based purely on item popularity, involves no learning.
- **LINE**: Designed for large-scale information network embeddings.
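The Pop baseline needs nothing more than interaction counts. A minimal sketch, assuming (user, item) pairs and a hypothetical function name; filtering out items the user has already seen is a common convention in top-k evaluation rather than part of the popularity score itself.

```python
from collections import Counter

def pop_recommend(interactions, user, k=10):
    """Return the k most-interacted items that `user` has not interacted with yet."""
    counts = Counter(item for _, item in interactions)
    seen = {item for u, item in interactions if u == user}
    return [item for item, _ in counts.most_common() if item not in seen][:k]

# Hypothetical interactions: 'a' has 3, 'b' 2, 'c' 1
interactions = [(0, 'a'), (1, 'a'), (2, 'a'), (1, 'b'), (2, 'b'), (2, 'c')]
print(pop_recommend(interactions, user=0, k=2))  # ['b', 'c']
```

Every user receives the same ranking apart from the seen-item filter, which is why Pop is a useful lower bound for personalized models.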

This grouping offers a structured way to compare models by technical approach, which can be useful when choosing a model for academic or professional recommender-system projects. It is not exhaustive: CDAE, RaCT, and NCL from the full list above are not assigned to a group.


## ItemKNN with Explicit Feedback
This configuration runs ItemKNN on the ML-100k dataset with its original ratings (integers from 1 to 5) left in the interaction file, as an explicit-feedback baseline. Explicit feedback states directly how much a user likes or dislikes an item, whereas implicit feedback only records that an interaction took place. Note, however, that RecBole's ItemKNN builds its cosine item-item similarities from the user-item interaction matrix, which by default holds a 1 for every observed interaction; unless the configuration says otherwise, every rating, low or high, is therefore most likely counted as a positive interaction, and the reported metrics measure top-k ranking quality rather than rating-prediction accuracy.
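Whether ratings or mere presence feed the similarity can change the result. The toy comparison below (hypothetical ratings, 0 meaning "not rated") computes the cosine similarity of two item columns once on the raw ratings and once on binary indicators. User 1 rated movie A with a 1, yet in the binary version that rating counts as a full co-occurrence, which raises the similarity from about 0.60 to about 0.67.

```python
import math

def cosine(a, b):
    """Cosine similarity of two item columns (one entry per user)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical ratings of two movies by four users (0 = not rated)
movie_a = [5, 1, 4, 0]
movie_b = [5, 5, 0, 3]

explicit_sim = cosine(movie_a, movie_b)                    # ratings as weights
implicit_sim = cosine([int(r > 0) for r in movie_a],       # presence only
                      [int(r > 0) for r in movie_b])
print(round(explicit_sim, 3), round(implicit_sim, 3))      # 0.603 0.667
```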

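```python
from recbole.quick_start import run_recbole

# Configuration for running the ItemKNN model on ML-100k, evaluated with ranking metrics
config_dict_ranking = {
    'model': 'ItemKNN',
    'dataset': 'ml-100k',
    'similarity': 'cosine',
    'k': 20,                    # number of neighbours kept per item
    'use_implicit': False,      # not among RecBole's documented options, so likely ignored
    # Random order (RO) + ratio split (RS), in the eval_args form reported in the log
    'eval_args': {'split': {'RS': [0.8, 0.1, 0.1]}, 'order': 'RO'},
    'metrics': ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision'],  # ranking-based metrics
    'valid_metric': 'MRR@10',
    'topk': 10,
    'gpu_id': 0,
    'early_stop': 5,            # RecBole reads 'stopping_step' instead (10 in the log)
}

# Run the model for ranking metrics
run_recbole(model='ItemKNN', dataset='ml-100k', config_dict=config_dict_ranking)
```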


```text
14 Apr 15:49    INFO  ['/home/stef/.local/lib/python3.10/site-packages/ipykernel_launcher.py', '-f', '/home/stef/.local/share/jupyter/runtime/kernel-ce844cad-8ee2-499f-8f1d-229165b49f0f.json']
14 Apr 15:49    INFO  
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = /home/stef/.local/lib/python3.10/site-packages/recbole/config/../dataset_example/ml-100k
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 300
train_batch_size = 2048
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'RS': [0.8, 0.1, 0.1]}, 'order': 'RO', 'group_by

{'best_valid_score': 0.3553,
 'valid_score_bigger': True,
 'best_valid_result': OrderedDict([('recall@10', 0.1884),
              ('mrr@10', 0.3553),
              ('ndcg@10', 0.2063),
              ('hit@10', 0.7041),
              ('precision@10', 0.1471)]),
 'test_result': OrderedDict([('recall@10', 0.2328),
              ('mrr@10', 0.4364),
              ('ndcg@10', 0.2637),
              ('hit@10', 0.7762),
              ('precision@10', 0.1852)])}
```
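The reported metrics are averages of per-user scores over a top-10 list. The sketch below shows the standard binary-relevance definitions, assuming `ranked` is one user's recommendation list and `relevant` is that user's non-empty set of held-out items; the function name is hypothetical, and RecBole computes these in vectorised form, so edge-case handling may differ.

```python
import math

def ranking_metrics(ranked, relevant, k=10):
    """Top-k metrics for one user with binary relevance.

    ranked: recommended items, best first; relevant: non-empty set of held-out items.
    """
    hits = [1 if item in relevant else 0 for item in ranked[:k]]
    n_hits = sum(hits)
    first_hit = next((pos for pos, h in enumerate(hits, start=1) if h), None)
    dcg = sum(h / math.log2(pos + 1) for pos, h in enumerate(hits, start=1))
    ideal = sum(1 / math.log2(pos + 1) for pos in range(1, min(len(relevant), k) + 1))
    return {
        'recall': n_hits / len(relevant),
        'precision': n_hits / k,
        'hit': 1.0 if n_hits else 0.0,
        'mrr': 1.0 / first_hit if first_hit else 0.0,
        'ndcg': dcg / ideal,
    }

# Hypothetical list: hits at ranks 2 and 4, three relevant items in total
print(ranking_metrics(['a', 'b', 'c', 'd'], {'b', 'd', 'x'}, k=4))
```

Hit is the most forgiving metric (one relevant item anywhere in the top k suffices), which is consistent with hit@10 being the largest value in the results above.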

## ItemKNN Model with Implicit Feedback Conversion

This process converts the explicit ratings from the MovieLens 100K dataset into binary implicit feedback. Because ratings are integers from 1 to 5, the threshold of 3.5 marks ratings of 4 and 5 as positive interactions (the user liked the item), and all other ratings are discarded. The resulting dataset is used to train RecBole's ItemKNN model, which learns from item co-occurrence patterns in user histories rather than from rating values. The model only sees whether a positive interaction occurred, not its magnitude, which matches typical implicit-feedback settings where only user actions such as clicks or views are tracked.

### Implementation Details
- **Data Conversion**: Ratings above 3.5 become 1 and all others become 0; the 0 rows are then dropped, so the dataset contains only positive interactions.
- **Model Configuration**: The ItemKNN model is configured to handle this implicit dataset by calculating item similarities based on the presence of user interactions. This setup helps in predicting which items a user might interact with, based on similar items they have interacted with in the past.
- **Execution and Evaluation**: The model is run using RecBole's framework, evaluating its performance on metrics like Recall, MRR, and NDCG, which are crucial for assessing the effectiveness of recommendations based on implicit feedback.

This methodology leverages the strengths of the ItemKNN algorithm in a scenario typical for systems where explicit ratings are not available, making it highly relevant for applications like e-commerce and media streaming platforms.
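The conversion described above, together with the typed header that RecBole's atomic `.inter` files use, can be sketched in a single pass with only the standard library. This is an illustration under assumptions: the input is an iterable of (user, item, rating, timestamp) tuples, the function name is hypothetical, and the `field:type` header mirrors the one written later in this section.

```python
import csv
import io

def to_implicit_inter(rows, threshold=3.5):
    """Keep ratings above `threshold` as 1s and return RecBole-style .inter text.

    rows: iterable of (user_id, item_id, rating, timestamp) tuples, as in u.data.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
    # Atomic-file header: one field:type entry per column
    writer.writerow(['user_id:token', 'item_id:token', 'rating:float', 'timestamp:float'])
    for user, item, rating, ts in rows:
        if rating > threshold:  # with 1-5 integer ratings this keeps 4 and 5
            writer.writerow([user, item, 1, ts])
    return buf.getvalue()

# Hypothetical rows: only the ratings of 5 and 4 survive
print(to_implicit_inter([(1, 10, 5, 100), (1, 11, 3, 101), (2, 10, 4, 102)]), end='')
```

Writing the typed header directly avoids a second pass that rewrites column names after the file has been saved.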


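```python
import pandas as pd

# Load the raw MovieLens 100K ratings (tab-separated, no header)
data = pd.read_csv('u.data', sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])

# Ratings are integers from 1 to 5, so a threshold of 3.5 keeps ratings of 4 and 5
threshold = 3.5

# Convert ratings to binary: 1 = positive interaction, 0 = otherwise
data['rating'] = (data['rating'] > threshold).astype(int)

# Drop non-positive interactions
data = data[data['rating'] == 1]

# Save as tab-separated text (despite the .csv extension)
data.to_csv('ml100k_implicit.csv', index=False, sep='\t')

data
```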

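| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| 5 | 298 | 474 | 1 | 884182806 |
| 7 | 253 | 465 | 1 | 891628467 |
| 11 | 286 | 1014 | 1 | 879781125 |
| 12 | 200 | 222 | 1 | 876042340 |
| 16 | 122 | 387 | 1 | 879270459 |
| ... | ... | ... | ... | ... |
| 99988 | 421 | 498 | 1 | 892241344 |
| 99989 | 495 | 1091 | 1 | 888637503 |
| 99990 | 806 | 421 | 1 | 882388897 |
| 99991 | 676 | 538 | 1 | 892685437 |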


```python
import pandas as pd

# Load the implicit interactions from RecBole's dataset folder
# (<data_path>/<dataset>/<dataset>.inter)
data = pd.read_csv('dataset/implicit_ml-100k/implicit_ml-100k.inter', delimiter='\t')

# Check which columns are present (expected: user_id, item_id, rating, timestamp)
print(data.columns)

# Rewrite the file with RecBole's typed atomic-file header (field:type);
# the header list must have exactly one entry per existing column
data.to_csv('dataset/implicit_ml-100k/implicit_ml-100k.inter', sep='\t', index=False,
            header=['user_id:token', 'item_id:token', 'rating:float', 'timestamp:float'])
```


```text
Index(['user_id', 'item_id', 'rating', 'timestamp'], dtype='object')
```


```python
from recbole.quick_start import run_recbole

# Configuration for running the ItemKNN model on the implicit ML-100k dataset
config_dict_implicit = {
    'model': 'ItemKNN',
    'dataset': 'implicit_ml-100k',
    'data_path': '/home/stef/russmann/dataset/',  # Adjust this path as necessary
    'similarity': 'cosine',
    'k': 20,                                       # number of neighbours kept per item
    # Random order (RO) + ratio split (RS), in the eval_args form reported in the log
    'eval_args': {'split': {'RS': [0.8, 0.1, 0.1]}, 'order': 'RO'},
    'metrics': ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision'],
    'valid_metric': 'MRR@10',
    'topk': 10,
    'early_stop': 5,                               # RecBole reads 'stopping_step' instead (10 in the log)
    'gpu_id': 0,
}

# Run the model
run_recbole(model='ItemKNN', dataset='implicit_ml-100k', config_dict=config_dict_implicit)
```


```text
14 Apr 16:51    INFO  ['/home/stef/.local/lib/python3.10/site-packages/ipykernel_launcher.py', '-f', '/home/stef/.local/share/jupyter/runtime/kernel-ce844cad-8ee2-499f-8f1d-229165b49f0f.json']
14 Apr 16:51    INFO  
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = /home/stef/russmann/dataset/implicit_ml-100k
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 300
train_batch_size = 2048
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'RS': [0.8, 0.1, 0.1]}, 'order': 'RO', 'group_by': 'user', 'mode': {'valid': 'full', 'test':

{'best_valid_score': 0.2794,
 'valid_score_bigger': True,
 'best_valid_result': OrderedDict([('recall@10', 0.2241),
              ('mrr@10', 0.2794),
              ('ndcg@10', 0.1797),
              ('hit@10', 0.5669),
              ('precision@10', 0.0923)]),
 'test_result': OrderedDict([('recall@10', 0.221),
              ('mrr@10', 0.2908),
              ('ndcg@10', 0.1877),
              ('hit@10', 0.569),
              ('precision@10', 0.0993)])}
```
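Both runs use `'split': {'RS': [0.8, 0.1, 0.1]}, 'order': 'RO', 'group_by': 'user'`: each user's interactions are put in random order and split 80/10/10 into training, validation, and test sets. A minimal sketch of that idea, assuming (user, item) pairs and a hypothetical function name; the seed mirrors the logged value, but the shuffles will not match RecBole's, and its rounding for short histories may differ. Because the implicit run drops ratings of 3 or below before splitting, the two runs are evaluated on different held-out interactions, so their scores are not directly comparable.

```python
import random
from collections import defaultdict

def split_by_user(interactions, ratios=(0.8, 0.1, 0.1), seed=2020):
    """Randomly order each user's interactions, then cut them into train/valid/test."""
    rng = random.Random(seed)
    per_user = defaultdict(list)
    for user, item in interactions:
        per_user[user].append((user, item))
    train, valid, test = [], [], []
    for rows in per_user.values():
        rng.shuffle(rows)
        n_train = round(len(rows) * ratios[0])
        n_valid = round(len(rows) * ratios[1])
        train += rows[:n_train]
        valid += rows[n_train:n_train + n_valid]
        test += rows[n_train + n_valid:]  # remainder goes to test
    return train, valid, test

train, valid, test = split_by_user([(0, item) for item in range(10)])
print(len(train), len(valid), len(test))  # 8 1 1
```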