# Transformers4Rec
**Expected Runtime**

For MovieLens 100K (ml-100k, 100,000 interactions), training for 10 epochs with a batch size of 128 takes roughly:

*   GPU (e.g., NVIDIA Tesla T4): ~3–8 minutes
*   CPU (e.g., 8-core): ~10–20 minutes
*   Google Colab T4 GPU: ~2–5 minutes
*   Google Colab L4 GPU: ~1.5–4 minutes
*   Google Colab A100 GPU: ~1–2 minutes
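
The estimates above scale roughly with the number of optimizer steps. The hypothetical `training_steps` helper below (not part of Transformers4Rec) counts them, assuming the last partial batch of each epoch is kept. Treating each of the 100,000 interactions as one example gives 782 batches per epoch; grouping the data into one sequence per user (943 rows), as session-based models do, gives far fewer batches, although each row is then a whole sequence rather than a single rating.

```python
import math


def training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps when the final partial batch of each epoch is kept."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * epochs


# One example per interaction (100,000 ratings), batch size 128, 10 epochs
print(training_steps(100_000, 128, 10))  # 7820
# One sequence per user (943 users), same settings
print(training_steps(943, 128, 10))  # 80
```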
**Model 1: Using Transformers4Rec**

Transformers4Rec is an NVIDIA library that integrates with Hugging Face Transformers, making it possible to apply transformer architectures to sequential and session-based recommendation tasks.
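
In sequential recommendation, a user's time-ordered history is the input and the item that follows is the prediction target. As a minimal plain-Python sketch of that framing (the `next_item_pairs` name is hypothetical, not a Transformers4Rec API), a history of four items yields three prefix/target pairs. Transformers4Rec builds its training targets internally through masking strategies, so this only illustrates the idea.

```python
from typing import List, Tuple


def next_item_pairs(history: List[int]) -> List[Tuple[List[int], int]]:
    """Split a time-ordered item history into (items seen so far, next item) pairs."""
    return [(history[:i], history[i]) for i in range(1, len(history))]


for prefix, target in next_item_pairs([10, 20, 30, 40]):
    print(prefix, "->", target)
# [10] -> 20
# [10, 20] -> 30
# [10, 20, 30] -> 40
```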


The MovieLens 100K (ml-100k) dataset used here consists of:

*   100,000 ratings (1–5) from 943 users on 1,682 movies.
*   Each user has rated at least 20 movies.
*   Simple demographic information for each user (age, gender, occupation, ZIP code).
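
After `u.data` is loaded into a DataFrame (the next cell does this with `pd.read_csv`), its headline numbers can be checked against the figures listed above. The `summarize_ratings` helper is a hypothetical sketch, demonstrated here on a toy frame; on the full ml-100k file the expected values are 100,000 ratings, 943 users, 1,682 items, at least 20 ratings per user, and ratings from 1 to 5.

```python
import pandas as pd


def summarize_ratings(df: pd.DataFrame) -> dict:
    """Return headline statistics of a u.data-style frame (user_id, item_id, rating)."""
    per_user = df.groupby("user_id").size()  # number of ratings per user
    return {
        "ratings": int(len(df)),
        "users": int(df["user_id"].nunique()),
        "items": int(df["item_id"].nunique()),
        "min_ratings_per_user": int(per_user.min()),
        "rating_range": (int(df["rating"].min()), int(df["rating"].max())),
    }


toy = pd.DataFrame(
    {"user_id": [1, 1, 2], "item_id": [10, 11, 10], "rating": [5, 3, 4], "timestamp": [1, 2, 3]}
)
print(summarize_ratings(toy))
# {'ratings': 3, 'users': 2, 'items': 2, 'min_ratings_per_user': 1, 'rating_range': (3, 5)}
```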

In [None]:
# Step 1: Mount Google Drive (only needed if the dataset is stored on Drive)
from google.colab import drive
drive.mount('/content/drive')

# Step 2: Import the necessary libraries
import os

import numpy as np
import pandas as pd
import torch
# Transformers4Rec 23.x describes input features with merlin.schema
from merlin.schema import ColumnSchema, Schema, Tags
import transformers4rec.torch as tr
from transformers4rec.config.trainer import T4RecTrainingArguments

# Step 3: Set up the path for the dataset
dataset_path = "/content/drive/My Drive/ml-100k/"  # Update this path to match your Drive location

# Step 4: Load the MovieLens 100K dataset into a pandas DataFrame
# u.data is tab-separated: user_id, item_id, rating, timestamp
data_file = os.path.join(dataset_path, "u.data")
columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(data_file, sep='\t', names=columns)

# Optional: load item metadata from u.item (pipe-separated, Latin-1 encoded).
# It is not used by the model below.
item_file = os.path.join(dataset_path, "u.item")
item_columns = ['item_id', 'movie_title', 'release_date', 'video_release_date', 'IMDb_URL', 'unknown', 'Action', 'Adventure', 'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

items_df = pd.read_csv(item_file, sep='|', encoding='latin-1', names=item_columns)

# Step 5: Build one time-ordered item sequence per user and define the schema
# Transformers4Rec trains on sequences (list columns), not on individual rating rows.
MAX_SEQ_LEN = 100
sessions = (
    df.sort_values(['user_id', 'timestamp'])
      .groupby('user_id')['item_id']
      .apply(lambda items: items.tolist()[-MAX_SEQ_LEN:])  # keep the most recent MAX_SEQ_LEN items
      .reset_index(name='item_id-list')
)
train_path = "/content/ml100k_sessions.parquet"
sessions[['item_id-list']].to_parquet(train_path, index=False)

# ml-100k item ids run from 1 to 1682, so id 0 stays free for padding.
# Rating and timestamp are not model inputs here: the task is next-item prediction,
# and the timestamp is only used to order each sequence.
n_items = int(df['item_id'].max())
schema = Schema([
    ColumnSchema(
        'item_id-list',
        tags=[Tags.ITEM_ID, Tags.ITEM, Tags.CATEGORICAL, Tags.LIST],
        dtype=np.int64,
        is_list=True,
        is_ragged=True,
        properties={'domain': {'min': 0, 'max': n_items, 'name': 'item_id'}},
    ),
])

# Step 6: Input module: embeds the item sequence and applies masked-language-model masking
input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=MAX_SEQ_LEN,
    aggregation="concat",
    d_output=64,      # Projection size; must match d_model below
    masking="mlm",
)

# Step 7: Set up the transformer-based recommendation model
prediction_task = tr.NextItemPredictionTask(weight_tying=True)  # Predict the next item for the user
transformer_config = tr.XLNetConfig.build(
    d_model=64,       # Hidden (embedding) dimension
    n_head=4,         # Number of attention heads
    n_layer=2,        # Number of transformer layers
    total_seq_length=MAX_SEQ_LEN,
)
recommendation_model = transformer_config.to_torch_model(input_module, prediction_task)

# Step 8: Train the model
# The runtime estimates at the top of this notebook assume 10 epochs and a batch size of 128.
training_args = T4RecTrainingArguments(
    output_dir="./t4rec_output",
    max_sequence_length=MAX_SEQ_LEN,
    data_loader_engine="merlin",
    num_train_epochs=5,               # Number of training epochs
    per_device_train_batch_size=32,
    dataloader_drop_last=False,
    report_to=[],
    no_cuda=False,                    # Set to True to force CPU training
)
trainer = tr.Trainer(
    model=recommendation_model,
    args=training_args,
    schema=schema,
    compute_metrics=True,
)
trainer.train_dataset_or_path = train_path
trainer.train()

# Step 9: Make recommendations for an example session
# Items previously interacted with by an example user (oldest first),
# right-padded with 0 to the training sequence length.
session_items = [1, 2, 3]
padded = session_items + [0] * (MAX_SEQ_LEN - len(session_items))
device = next(recommendation_model.parameters()).device
batch = {'item_id-list': torch.tensor([padded], dtype=torch.int64, device=device)}

# Assumption: in Transformers4Rec 23.x, calling the model outside training/testing returns
# next-item scores of shape (batch, n_items + 1); verify this against your installed version.
recommendation_model.eval()
with torch.no_grad():
    scores = recommendation_model(batch)
top_scores, top_items = torch.topk(scores, k=10)
print("Top-10 recommended item ids:", top_items[0].tolist())
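
Session-based models such as those in Transformers4Rec consume one item sequence per user or session rather than individual rating rows. The plain-Python sketch below shows that conversion without pandas: sort each user's interactions by timestamp, keep the most recent `max_len` items, and right-pad with 0. The `build_sequences` name and the choice of 0 as the padding id are assumptions for illustration (ml-100k item ids start at 1, so 0 is unused).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PAD_ID = 0  # Assumption: id 0 is not a real item and marks padding


def build_sequences(
    interactions: Iterable[Tuple[int, int, int]], max_len: int
) -> Dict[int, List[int]]:
    """Group (user_id, item_id, timestamp) rows into one padded, time-ordered sequence per user."""
    by_user: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for user_id, item_id, timestamp in interactions:
        by_user[user_id].append((timestamp, item_id))
    sequences: Dict[int, List[int]] = {}
    for user_id, events in by_user.items():
        events.sort()  # Oldest first (ties broken by item id)
        items = [item for _, item in events][-max_len:]  # Keep the most recent max_len items
        sequences[user_id] = items + [PAD_ID] * (max_len - len(items))  # Right-pad
    return sequences


rows = [(1, 50, 300), (1, 10, 100), (1, 20, 200), (2, 7, 5)]
print(build_sequences(rows, max_len=2))  # {1: [20, 50], 2: [7, 0]}
```

Truncating from the left keeps the most recent behaviour, which is usually what a next-item model should condition on.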


In [3]:
!pip uninstall pandas dask cudf dask-cudf merlin transformers4rec -y

Found existing installation: pandas 1.5.3
Uninstalling pandas-1.5.3:
  Successfully uninstalled pandas-1.5.3
Found existing installation: dask 2024.1.1
Uninstalling dask-2024.1.1:
  Successfully uninstalled dask-2024.1.1
Found existing installation: transformers4rec 23.12.0
Uninstalling transformers4rec-23.12.0:
  Successfully uninstalled transformers4rec-23.12.0


In [None]:
# Install RAPIDS 23.08 including cudf, dask-cudf, cuml, and other dependencies
!pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11==23.08 dask-cudf-cu11==23.08 cuml-cu11==23.08 \
  cugraph-cu11==23.08 pylibcugraph-cu11==23.08 rapids-dask-dependency==23.08 dask-cuda==23.08

In [None]:
!pip install transformers4rec[torch-gpu]==23.8

In [3]:
!pip install pandas==1.5.3
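
Because these cells uninstall and reinstall packages with pinned versions, it can help to confirm what actually ended up in the environment (after restarting the runtime, since modules that were already imported keep their old versions). A minimal sketch using the standard library's `importlib.metadata`; the `check_pins` helper and the pins dictionary are illustrative, and the comparison is an exact string match, so pins must be written as the installed version string (e.g. `"1.5.3"`).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for each package that does not match its pin.

    A value of None means the package is not installed at all.
    """
    mismatches: Dict[str, Optional[str]] = {}
    for package, expected in pins.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = installed
    return mismatches


# Pin taken from the install cell above; prints {} when pandas 1.5.3 is installed
print(check_pins({"pandas": "1.5.3"}))
```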



In [None]:
try:
    from importlib.metadata import version
    import cudf
    import pandas as pd
    import torch
    import transformers4rec.torch as tr  # Imported only to verify the installation
    print(
        "Transformers4Rec and cuDF imported successfully.\n"
        f"transformers4rec version: {version('transformers4rec')}\n"
        f"cuDF version: {cudf.__version__}\n"
        f"pandas version: {pd.__version__}\n"
        f"CUDA available to PyTorch: {torch.cuda.is_available()}"
    )
except ImportError as e:
    print(f"Error: {e}")
