# Neural Collaborative Filtering (NCF) - Complete Step-by-Step Tutorial

## ⚠️ Important Notes Before Starting

**If you encounter multiprocessing/pickling errors:**
1. **Restart the kernel** (Kernel → Restart Kernel)
2. **Re-run all cells from the beginning** (Cell → Run All)
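3. A restart is needed because `DataLoader` worker processes (`num_workers > 0`) often cannot pickle classes defined inside notebook cells, especially on platforms that start workers with `spawn` (Windows, macOS)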
4. The notebook is configured with `num_workers=0` to avoid these errors

**Best Practice:**
- Run cells sequentially from top to bottom
- If you modify a cell, restart the kernel and re-run every cell from the top through the modified one
- This ensures all variables are properly initialized

---

## Introduction

This notebook provides a comprehensive, step-by-step walkthrough of the Neural Collaborative Filtering (NCF) implementation. NCF is a deep learning approach to collaborative filtering for recommendation systems, introduced in the paper "Neural Collaborative Filtering" by He et al. at WWW'17.

### What is Neural Collaborative Filtering?

Traditional collaborative filtering methods (like Matrix Factorization) use linear models to learn user-item interactions. NCF extends this by using neural networks to model non-linear interactions between users and items, potentially capturing more complex patterns in user preferences.

### Key Components of NCF:

1. **GMF (Generalized Matrix Factorization)**: Uses element-wise product of user and item embeddings (similar to traditional matrix factorization)
2. **MLP (Multi-Layer Perceptron)**: Uses deep neural networks to learn non-linear user-item interactions
3. **NeuMF (Neural Matrix Factorization)**: Combines both GMF and MLP to leverage the strengths of both approaches
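
To make the three components concrete, the sketch below scores one (user, item) pair in plain NumPy. It is only an illustration: the weights are random stand-ins for learned parameters, the sizes are arbitrary, and the helper names (`gmf_vector`, `mlp_vector`) are assumptions rather than the model class built later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # embedding size, chosen arbitrarily for this toy example

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmf_vector(p_u, q_i):
    # GMF: element-wise product of the user and item embeddings
    return p_u * q_i

def mlp_vector(p_u, q_i, layers):
    # MLP: concatenate the embeddings, then apply Linear + ReLU layers
    h = np.concatenate([p_u, q_i])
    for W, b in layers:
        h = np.maximum(0.0, W @ h + b)
    return h

# Random stand-ins for learned parameters (GMF and MLP use separate embeddings)
p_gmf, q_gmf = rng.normal(scale=0.1, size=dim), rng.normal(scale=0.1, size=dim)
p_mlp, q_mlp = rng.normal(scale=0.1, size=dim), rng.normal(scale=0.1, size=dim)
layers = [
    (rng.normal(scale=0.1, size=(8, 16)), np.zeros(8)),  # 16 -> 8
    (rng.normal(scale=0.1, size=(4, 8)), np.zeros(4)),   # 8 -> 4
]

gmf_out = gmf_vector(p_gmf, q_gmf)          # shape (8,)
mlp_out = mlp_vector(p_mlp, q_mlp, layers)  # shape (4,)

# NeuMF: concatenate both vectors and apply a single output layer
h_out = rng.normal(scale=0.1, size=gmf_out.size + mlp_out.size)
neumf_score = sigmoid(h_out @ np.concatenate([gmf_out, mlp_out]))
print(gmf_out.shape, mlp_out.shape, float(neumf_score))
```

The GMF-only and MLP-only models follow the same pattern, each applying the output layer to its own vector instead of the concatenation.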

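### Project Structure:

The implementation is organized into the components below. In this notebook, each one is written inline rather than imported from a separate file.
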
- `config.py`: Configuration settings (dataset paths, model type, etc.)
- `data_utils.py`: Data loading and preprocessing utilities
- `model.py`: NCF neural network architecture
- `evaluate.py`: Evaluation metrics (Hit Rate and NDCG)
- `main.py`: Main training script

---

## Step 1: Environment Setup and Imports

In this first step, we'll set up our environment by importing all necessary libraries and understanding what each one does.


In [1]:
# ============================================================================
# STEP 1: ENVIRONMENT SETUP AND IMPORTS
# ============================================================================

"""
This cell imports all the necessary libraries for our NCF implementation.
Let's understand what each library does:

1. os: Operating system interface - used for file path operations and environment variables
2. time: Time-related functions - used to measure training time
3. numpy: Numerical computing library - used for array operations and mathematical functions
4. pandas: Data manipulation library - used to load and process CSV data files
5. scipy.sparse: Sparse matrix operations - used to efficiently store user-item interaction matrices
6. torch: PyTorch deep learning framework - the core library for building and training neural networks
7. torch.nn: Neural network modules - provides layers, loss functions, and activation functions
8. torch.optim: Optimization algorithms - provides optimizers like Adam, SGD
9. torch.utils.data: Data loading utilities - provides Dataset and DataLoader classes
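10. torch.backends.cudnn: CUDA deep neural network library backend - optimizes GPU operations
11. torch.nn.functional: Functional versions of neural network operations (activation functions, etc.)
12. urllib.request: URL handling - used to download the dataset files
13. zipfile: ZIP archive support - used to extract the downloaded dataset
14. shutil: High-level file operations - used for moving/copying files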

NOTE: We will write ALL code directly in this notebook - no external module imports!
This makes it easier to understand and modify each component.
"""

import os
import time
import numpy as np
import pandas as pd
import scipy.sparse as sp
import urllib.request
import zipfile
import shutil

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import torch.backends.cudnn as cudnn
import torch.nn.functional as F

print("✓ All libraries imported successfully!")
print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ CUDA device: {torch.cuda.get_device_name(0)}")
else:
    print("⚠ CUDA not available - training will be slower on CPU")


✓ All libraries imported successfully!
✓ PyTorch version: 2.8.0
✓ CUDA available: False
⚠ CUDA not available - training will be slower on CPU


### Explanation of Step 1:

**Why these imports matter:**

1. **Standard Libraries (os, time)**: 
   - `os` helps us manage file paths and set environment variables (like CUDA device selection)
   - `time` allows us to track how long training takes, which is important for performance monitoring

2. **Data Processing Libraries (numpy, pandas, scipy.sparse, urllib.request, zipfile, shutil)**:
   - `numpy`: The foundation for numerical computing in Python. CPU tensors in PyTorch can be converted to and from numpy arrays (GPU tensors must be moved to the CPU first)
   - `pandas`: Makes it easy to load CSV files containing user-item interactions
   - `scipy.sparse`: Critical for recommendation systems! User-item interaction matrices are typically very sparse (most users haven't interacted with most items). Sparse matrices save memory and computation time
   - `urllib.request`: Used to download dataset files from the internet
   - `zipfile`: Used to extract compressed dataset files if needed
   - `shutil`: Used for file operations like moving/copying downloaded files

3. **PyTorch Core (torch, torch.nn, torch.optim, torch.utils.data, torch.nn.functional)**:
   - `torch`: The main PyTorch library - provides tensor operations, automatic differentiation, and GPU support
   - `torch.nn`: Contains pre-built neural network layers (Embedding, Linear, Dropout, etc.) and loss functions
   - `torch.optim`: Provides optimization algorithms (Adam, SGD) that update model weights during training
   - `torch.utils.data`: Provides `Dataset` and `DataLoader` classes that handle batching, shuffling, and parallel data loading
   - `torch.nn.functional`: Contains functional versions of neural network operations (activation functions, etc.)

4. **CUDA Optimization (torch.backends.cudnn)**:
   - `cudnn` is NVIDIA's library for deep neural network operations
   - Setting `cudnn.benchmark = True` lets cuDNN auto-tune its algorithms (mainly convolutions) for your specific GPU and input shapes. NCF is built from embedding and linear layers, so the speedup here is likely to be small

**Important Note:**
- **We will NOT import any custom modules** - all code (configuration, data loading, model architecture, evaluation) will be written directly in this notebook
- This approach makes it easier to understand, modify, and experiment with each component
- Everything will be self-contained and fully explained step by step

**What happens when you run this cell:**
- All libraries are loaded into memory
- We check if CUDA (GPU) is available, which will significantly speed up training
- If CUDA is available, we can use GPU acceleration; otherwise, training will run on CPU (much slower)

---

**✅ Step 1 Complete!** 

You've successfully set up the environment. In the next step, we'll define the configuration settings and hyperparameters directly in the notebook.



---

## Step 2: Configuration and Hyperparameters

In this step, we'll define all the configuration settings and hyperparameters needed for our NCF model. These settings control:
- Which dataset to use
- Which model architecture to train (MLP, GMF, or NeuMF)
- File paths for data and model saving
- Training hyperparameters (learning rate, batch size, epochs, etc.)
- Hardware settings (GPU selection)

Let's define all these settings step by step.


In [None]:
# ============================================================================
# STEP 2: CONFIGURATION AND HYPERPARAMETERS
# ============================================================================

"""
This cell defines all configuration settings for our NCF implementation.
We'll break it down into several sections for clarity.
"""

# ============================================================================
# 2.1 DATASET CONFIGURATION
# ============================================================================

# Choose which dataset to use
# Options: 'ml-1m' (MovieLens 1M) or 'pinterest-20' (Pinterest dataset)
dataset = 'ml-1m'
assert dataset in ['ml-1m', 'pinterest-20'], \
    f"Dataset must be 'ml-1m' or 'pinterest-20', got '{dataset}'"

print(f"✓ Dataset selected: {dataset}")

# ============================================================================
# 2.2 MODEL ARCHITECTURE CONFIGURATION
# ============================================================================

# Choose which model architecture to use
# Options:
#   - 'MLP': Multi-Layer Perceptron only (non-linear interactions)
#   - 'GMF': Generalized Matrix Factorization only (linear interactions)
#   - 'NeuMF-end': Neural Matrix Factorization trained from scratch (end-to-end)
#   - 'NeuMF-pre': Neural Matrix Factorization with pre-trained GMF and MLP models
model_name = 'NeuMF-end'
assert model_name in ['MLP', 'GMF', 'NeuMF-end', 'NeuMF-pre'], \
    f"Model must be 'MLP', 'GMF', 'NeuMF-end', or 'NeuMF-pre', got '{model_name}'"

print(f"✓ Model architecture: {model_name}")

# ============================================================================
# 2.3 DATA AND MODEL PATHS CONFIGURATION
# ============================================================================

# Data will be downloaded automatically during training
# We'll create a local data directory to store downloaded files

data_dir = './data/'
os.makedirs(data_dir, exist_ok=True)  # no error if the directory already exists

# Model saving directory
model_path = './models/'
os.makedirs(model_path, exist_ok=True)

GMF_model_path = os.path.join(model_path, 'GMF.pth')
MLP_model_path = os.path.join(model_path, 'MLP.pth')
NeuMF_model_path = os.path.join(model_path, 'NeuMF.pth')

print(f"✓ Directories configured")
print(f"  - Data directory: {data_dir} (will be created/used for downloaded data)")
print(f"  - Model save path: {model_path}")

# ============================================================================
# 2.4 TRAINING HYPERPARAMETERS
# ============================================================================

# Learning rate: Controls how big steps the optimizer takes during training
# Too high: training might be unstable or diverge
# Too low: training will be very slow
# Typical range: 0.0001 to 0.01
learning_rate = 0.001

# Dropout rate: Regularization technique to prevent overfitting
# Randomly sets some neurons to zero during training
# Range: 0.0 (no dropout) to 0.9 (very aggressive dropout)
# 0.0 means no dropout (all neurons active)
dropout_rate = 0.0

# Batch size: Number of training examples processed together in one iteration
# Larger batch size: more stable gradients, but requires more memory
# Smaller batch size: less memory, but noisier gradients
# Typical values: 64, 128, 256, 512
batch_size = 256

# Number of training epochs: How many times we'll iterate through the entire dataset
# More epochs: better learning, but risk of overfitting
# Too few epochs: model might not learn enough
epochs = 20

# Top-K for evaluation: When evaluating, we recommend top K items to each user
# We measure if the true item is in the top K recommendations
# Common values: 5, 10, 20
top_k = 10

# Factor number: Dimension of the embedding vectors for users and items
# Larger: more capacity to learn complex patterns, but more parameters
# Smaller: fewer parameters, faster training, but less capacity
# Common values: 8, 16, 32, 64
factor_num = 32

# Number of MLP layers: Depth of the Multi-Layer Perceptron component
# More layers: can learn more complex non-linear patterns
# Fewer layers: simpler model, faster training
# Typical range: 1 to 5 layers
num_layers = 3

# Number of negative samples for training: For each positive (user, item) pair,
# we sample this many negative items (items the user hasn't interacted with)
# More negatives: better learning signal, but slower training
# Fewer negatives: faster training, but potentially weaker learning
# Common values: 1, 4, 8
num_ng = 4

# Number of negative samples for testing: During evaluation, for each test item,
# we also provide this many negative items. The model should rank the true item higher.
# Typically 99 negatives + 1 positive = 100 items total per test case
test_num_ng = 99

# Whether to save the trained model
save_model = True

# GPU device ID: Which GPU to use (if multiple GPUs available)
# Set to "0" for first GPU, "1" for second, etc.
# Set to "-1" or use CPU if no GPU available
gpu_id = "0"

print(f"\n✓ Training hyperparameters configured:")
print(f"  - Learning rate: {learning_rate}")
print(f"  - Dropout rate: {dropout_rate}")
print(f"  - Batch size: {batch_size}")
print(f"  - Epochs: {epochs}")
print(f"  - Top-K for evaluation: {top_k}")
print(f"  - Embedding dimension (factor_num): {factor_num}")
print(f"  - MLP layers: {num_layers}")
print(f"  - Training negative samples: {num_ng}")
print(f"  - Test negative samples: {test_num_ng}")
print(f"  - Save model: {save_model}")
print(f"  - GPU ID: {gpu_id}")

# ============================================================================
# 2.5 GPU CONFIGURATION
# ============================================================================

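# Set which GPU to use (if available)
# NOTE: CUDA reads this variable when it is first initialized, so for a
# guaranteed effect it should be set before any CUDA call (ideally before
# importing torch)
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id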

# Enable cuDNN benchmarking for faster training (if using GPU)
# This allows PyTorch to optimize operations for your specific GPU
cudnn.benchmark = True

print(f"\n✓ GPU configuration set")
if torch.cuda.is_available():
    print(f"  - Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print(f"  - Using CPU (GPU not available)")


✓ Dataset selected: ml-1m
✓ Model architecture: NeuMF-end
✓ Directories configured
  - Data directory: ./data/ (will be created/used for downloaded data)
  - Model save path: ./models/

✓ Training hyperparameters configured:
  - Learning rate: 0.001
  - Dropout rate: 0.0
  - Batch size: 256
  - Epochs: 20
  - Top-K for evaluation: 10
  - Embedding dimension (factor_num): 32
  - MLP layers: 3
  - Training negative samples: 4
  - Test negative samples: 99
  - Save model: True
  - GPU ID: 0

✓ GPU configuration set
  - Using CPU (GPU not available)


### Detailed Explanation of Step 2:

#### 2.1 Dataset Configuration

**Why we need this:**
- Different datasets have different characteristics (number of users, items, sparsity)
- The code needs to know which dataset files to load
- MovieLens 1M (`ml-1m`) is a classic movie recommendation dataset
- Pinterest-20 is a larger dataset with different interaction patterns

**What happens:**
- We select which dataset to use
- The assertion ensures we only use supported datasets

---

#### 2.2 Model Architecture Configuration

**The four model types explained:**

1. **MLP (Multi-Layer Perceptron)**:
   - Uses only deep neural networks to learn user-item interactions
   - Captures non-linear patterns
   - Good for complex recommendation scenarios

2. **GMF (Generalized Matrix Factorization)**:
   - Uses only element-wise product of embeddings (like traditional matrix factorization)
   - Captures linear interactions
   - Simpler and faster than MLP

3. **NeuMF-end (Neural Matrix Factorization - End-to-End)**:
   - Combines both GMF and MLP
   - Trained from scratch (no pre-training)
   - Best of both worlds: linear + non-linear interactions
   - This is the recommended approach for most cases

4. **NeuMF-pre (Neural Matrix Factorization - Pre-trained)**:
   - Also combines GMF and MLP
   - But first trains GMF and MLP separately
   - Then initializes NeuMF with these pre-trained weights
   - Usually gives best performance but requires more training time

**Why we use 'NeuMF-end':**
- Good balance between performance and training time
- No need to pre-train separate models
- Still achieves excellent results

---

#### 2.3 Data and Model Paths Configuration

**Data Download Approach:**
- We will **download the data automatically** during training (in Step 3)
- No need to manually download or specify data paths
- Data will be stored in a local `./data/` directory
- The download function will handle fetching the dataset files

**What data files we'll download:**
- `{dataset}.train.rating`: Training data with user-item pairs
- `{dataset}.test.rating`: Test data with user-item pairs  
- `{dataset}.test.negative`: Negative samples for testing (99 negatives per test case)

**Model paths:**
- Where to save trained models
- Separate paths for GMF, MLP, and NeuMF models
- Models will be saved in `./models/` directory

---

#### 2.4 Training Hyperparameters - Deep Dive

**Learning Rate (0.001):**
- Controls step size in gradient descent
- Too high: loss might explode or oscillate
- Too low: training takes forever
- 0.001 is a safe default for Adam optimizer

**Dropout Rate (0.0):**
- Regularization to prevent overfitting
- 0.0 means no dropout (all neurons always active)
- Increase to 0.2-0.5 if you see overfitting (training loss decreases but test performance doesn't improve)

**Batch Size (256):**
- Number of examples processed together
- Larger = more stable, but needs more memory
- 256 is a good balance for most GPUs

**Epochs (20):**
- One epoch = one full pass through training data
- 20 epochs is usually enough, but depends on dataset size
- Monitor validation metrics to stop early if needed

**Top-K (10):**
- Evaluation metric: "Is the true item in top 10 recommendations?"
- Common in recommendation systems (users only see top results)
- We'll compute Hit Rate@10 and NDCG@10
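
Both metrics can be computed from the ranked candidate list of a single test case. The sketch below assumes exactly one relevant item per test case (as in the 1 positive + 99 negatives setup), in which case NDCG reduces to `1 / log2(rank + 2)` for a 0-based rank. The function names are illustrative, not the evaluation code used later.

```python
import math

def hit_at_k(ranked_items, true_item, k=10):
    """Return 1 if the held-out item appears in the top-k list, else 0."""
    return int(true_item in ranked_items[:k])

def ndcg_at_k(ranked_items, true_item, k=10):
    """NDCG with a single relevant item: 1 / log2(rank + 2), 0 if missed."""
    top_k = ranked_items[:k]
    if true_item in top_k:
        rank = top_k.index(true_item)  # 0-based position in the ranking
        return 1.0 / math.log2(rank + 2)
    return 0.0

ranked = [42, 7, 13, 99, 5]  # item IDs sorted by predicted score, best first
print(hit_at_k(ranked, 13, k=3), ndcg_at_k(ranked, 13, k=3))  # 1 0.5
print(hit_at_k(ranked, 99, k=3), ndcg_at_k(ranked, 99, k=3))  # 0 0.0
```

Hit Rate only records whether the item made the cut, while NDCG also rewards placing it closer to the top. Averaging each value over all test cases gives HR@K and NDCG@K.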

**Factor Num (32):**
- Embedding dimension for users and items
- Each user/item gets a 32-dimensional vector
- 32 is a good default (not too small, not too large)

**Num Layers (3):**
- Depth of MLP component
- 3 layers means: input → hidden1 → hidden2 → hidden3 → output
- More layers = more complex patterns, but harder to train
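
One common convention for sizing the MLP tower halves the width at every layer so that the last hidden layer has `factor_num` units. The sketch below assumes that convention; the helper name `mlp_layer_sizes` is hypothetical, and the model defined later in this notebook may size its layers differently.

```python
def mlp_layer_sizes(factor_num, num_layers):
    """Return (in_features, out_features) per layer for a halving tower."""
    sizes = []
    for i in range(num_layers):
        in_features = factor_num * 2 ** (num_layers - i)
        sizes.append((in_features, in_features // 2))
    return sizes

print(mlp_layer_sizes(factor_num=32, num_layers=3))
# [(256, 128), (128, 64), (64, 32)]
```

Under this assumption the first layer's 256 inputs come from concatenating a 128-dimensional user embedding with a 128-dimensional item embedding.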

**Num NG (4):**
- For each positive (user, item) pair, sample 4 negative items
- Each positive therefore contributes 5 training rows: itself (label 1) plus 4 negatives (label 0)
- Helps model learn to distinguish good vs bad recommendations
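
A minimal sketch of this sampling scheme follows, using only the standard library. It assumes the training positives fit in a Python set and that every user has at least one unseen item (otherwise the resampling loop would not terminate). The function name `sample_training_triples` is illustrative.

```python
import random

def sample_training_triples(positives, item_num, num_ng=4, seed=0):
    """Return (user, item, label) triples: each positive plus num_ng negatives."""
    rng = random.Random(seed)
    seen = set(positives)  # (user, item) pairs observed in training
    triples = []
    for user, item in positives:
        triples.append((user, item, 1.0))
        for _ in range(num_ng):
            neg = rng.randrange(item_num)
            while (user, neg) in seen:  # resample until the pair is unobserved
                neg = rng.randrange(item_num)
            triples.append((user, neg, 0.0))
    return triples

positives = [(0, 1), (0, 3), (1, 2)]
triples = sample_training_triples(positives, item_num=10)
print(len(triples))  # 15 = 3 positives * (1 + 4 negatives)
```

Negatives are usually redrawn at the start of every epoch so the model sees different unobserved items over time.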

**Test Num NG (99):**
- During evaluation: 1 positive item + 99 negative items = 100 total
- Model should rank the positive item in top 10
- This approximates a real-world scenario in which the model must pick out a relevant item from a larger pool of candidates

---

#### 2.5 GPU Configuration

**CUDA_VISIBLE_DEVICES:**
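- Restricts which GPUs are visible to this process; PyTorch then numbers the visible devices starting from 0
- Takes effect only if it is set before CUDA is initialized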
- Useful when you have multiple GPUs
- Set to "0" for first GPU, "1" for second, etc.

**cudnn.benchmark:**
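- Lets cuDNN benchmark several algorithms and cache the fastest one for each input shape
- Mainly benefits convolutional layers; for NCF's embedding and linear layers the gain is likely to be small
- Helps most when input shapes stay constant between iterations, which is largely true here because the batch size is fixed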
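
Because the CUDA runtime reads `CUDA_VISIBLE_DEVICES` when it initializes, the safest place to set it is at the very top of the notebook, before the first CUDA call. The sketch below shows that ordering with a hypothetical helper, `restrict_visible_gpus`; it touches only the environment and does not import PyTorch.

```python
import os

def restrict_visible_gpus(gpu_id):
    """Expose only the given GPU(s) to CUDA libraries initialized afterwards."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return os.environ["CUDA_VISIBLE_DEVICES"]

# Run this before importing torch or making any CUDA call.
print(restrict_visible_gpus(0))  # 0
```

After this call, the selected physical GPU appears to PyTorch as device index 0, regardless of its original ID.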

---

**✅ Step 2 Complete!**

All configuration settings are now defined. In the next step, we'll implement the data loading utilities to read and preprocess the dataset.



---

## Step 3: Data Downloading and Preprocessing

In this step, we'll:
1. Download the MovieLens 1M dataset automatically
2. Process the raw data into the NCF format
3. Split the data into training and testing sets
4. Generate negative samples for evaluation
5. Create the data structures needed for training

The NCF format requires:
- `train.rating`: Training user-item pairs (tab-separated: user_id, item_id)
- `test.rating`: Test user-item pairs (tab-separated: user_id, item_id)
- `test.negative`: Test data with negative samples (format: (user_id, item_id)\tneg1\tneg2\t...\tneg99)
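
The third format is the least obvious of the three, so here is a small sketch of how one such line can be read back. The example line and the function name `parse_test_negative_line` are made up for illustration.

```python
import ast

def parse_test_negative_line(line):
    """Split one line into (user, positive_item, [negative_items])."""
    fields = line.rstrip("\n").split("\t")
    user, pos_item = ast.literal_eval(fields[0])  # "(u, i)" -> (u, i)
    negatives = [int(x) for x in fields[1:]]
    return user, pos_item, negatives

line = "(0, 25)\t1064\t174\t2791\n"
print(parse_test_negative_line(line))  # (0, 25, [1064, 174, 2791])
```

`ast.literal_eval` parses the tuple literal safely, without executing arbitrary code the way `eval` would.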

Let's implement this step by step.


In [3]:
# ============================================================================
# STEP 3: DATA DOWNLOADING AND PREPROCESSING
# ============================================================================

"""
This step downloads the MovieLens 1M dataset and processes it into the NCF format.
"""

# ============================================================================
# 3.1 DOWNLOAD MOVIELENS 1M DATASET
# ============================================================================

def download_ml1m_dataset(data_dir='./data'):
    """
    Downloads the MovieLens 1M dataset from the official source.
    
    Parameters:
    - data_dir: Directory where data will be stored
    
    Returns:
    - Path to the ratings.dat file
    """
    # Create data directory if it doesn't exist
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)
    
    # Dataset URL and paths
    dataset_url = 'http://files.grouplens.org/datasets/movielens/ml-1m.zip'
    zip_path = os.path.join(data_dir, 'ml-1m.zip')
    extract_path = os.path.join(data_dir, 'ml-1m')
    
    # The zip file contains a folder 'ml-1m', so after extraction:
    # Option 1: ./data/ml-1m/ratings.dat (if zip extracts to data_dir)
    # Option 2: ./data/ml-1m/ml-1m/ratings.dat (if zip structure is nested)
    # Let's check both possibilities
    ratings_file_option1 = os.path.join(extract_path, 'ratings.dat')
    ratings_file_option2 = os.path.join(extract_path, 'ml-1m', 'ratings.dat')
    
    # Check if already downloaded and extracted
    if os.path.exists(ratings_file_option1):
        print(f"✓ Dataset already exists at {ratings_file_option1}")
        return ratings_file_option1
    elif os.path.exists(ratings_file_option2):
        print(f"✓ Dataset already exists at {ratings_file_option2}")
        return ratings_file_option2
    
    # Download the dataset
    if not os.path.exists(zip_path):
        print(f"Downloading MovieLens 1M dataset from {dataset_url}...")
        print("This may take a few minutes...")
        urllib.request.urlretrieve(dataset_url, zip_path)
        print("✓ Download complete!")
    else:
        print(f"✓ Zip file already exists at {zip_path}")
    
    # Extract the dataset
    if not os.path.exists(ratings_file_option1) and not os.path.exists(ratings_file_option2):
        print(f"Extracting {zip_path}...")
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)
        print("✓ Extraction complete!")
        
        # Clean up zip file to save space
        if os.path.exists(zip_path):
            os.remove(zip_path)
            print("✓ Removed zip file to save space")
    else:
        print(f"✓ Dataset already extracted")
    
    # Find the ratings file (check both possible locations)
    ratings_file = None
    if os.path.exists(ratings_file_option1):
        ratings_file = ratings_file_option1
    elif os.path.exists(ratings_file_option2):
        ratings_file = ratings_file_option2
    else:
        # If still not found, search for ratings.dat in the extracted directory
        for root, dirs, files in os.walk(extract_path):
            if 'ratings.dat' in files:
                ratings_file = os.path.join(root, 'ratings.dat')
                break
    
    # Verify the ratings file exists
    if not ratings_file or not os.path.exists(ratings_file):
        raise FileNotFoundError(
            f"Expected ratings file not found. Checked:\n"
            f"  - {ratings_file_option1}\n"
            f"  - {ratings_file_option2}\n"
            f"  - Searched in {extract_path}"
        )
    
    print(f"✓ Dataset ready. Ratings file at: {ratings_file}")
    return ratings_file

# Download the dataset
print("=" * 70)
print("STEP 3.1: Downloading MovieLens 1M Dataset")
print("=" * 70)
ratings_file = download_ml1m_dataset(data_dir)


STEP 3.1: Downloading MovieLens 1M Dataset
✓ Dataset already exists at ./data/ml-1m/ratings.dat


In [4]:
# ============================================================================
# 3.2 LOAD AND PREPROCESS THE DATA
# ============================================================================

def preprocess_ml1m_to_ncf_format(ratings_file, data_dir, test_ratio=0.2, test_negatives=99):
    """
    Processes the MovieLens 1M dataset into NCF format.
    
    Parameters:
    - ratings_file: Path to the ratings.dat file
    - data_dir: Directory to save processed files
    - test_ratio: Ratio of data to use for testing (default 0.2 = 20%)
    - test_negatives: Number of negative samples per test case (default 99)
    
    Returns:
    - Paths to the generated files
    """
    print("\n" + "=" * 70)
    print("STEP 3.2: Preprocessing Data into NCF Format")
    print("=" * 70)
    
    # Load ratings data
    print("Loading ratings data...")
    ratings = pd.read_csv(
        ratings_file,
        sep='::',
        engine='python',
        names=['UserID', 'MovieID', 'Rating', 'Timestamp'],
        dtype={'UserID': np.int32, 'MovieID': np.int32, 'Rating': np.float32, 'Timestamp': np.int32}
    )
    
    print(f"✓ Loaded {len(ratings)} ratings")
    print(f"  - Unique users: {ratings['UserID'].nunique()}")
    print(f"  - Unique movies: {ratings['MovieID'].nunique()}")
    
    # Filter ratings >= 4 (positive interactions)
    # In recommendation systems, we typically treat ratings >= 4 as positive
    print("\nFiltering positive interactions (ratings >= 4)...")
    positive_ratings = ratings[ratings['Rating'] >= 4].copy()
    print(f"✓ {len(positive_ratings)} positive interactions (out of {len(ratings)} total)")
    
    # Remap user and item IDs to be contiguous (0-indexed)
    print("\nRemapping user and item IDs to be contiguous...")
    unique_users = sorted(positive_ratings['UserID'].unique())
    unique_items = sorted(positive_ratings['MovieID'].unique())
    
    user_map = {old_id: new_id for new_id, old_id in enumerate(unique_users)}
    item_map = {old_id: new_id for new_id, old_id in enumerate(unique_items)}
    
    positive_ratings['user'] = positive_ratings['UserID'].map(user_map)
    positive_ratings['item'] = positive_ratings['MovieID'].map(item_map)
    
    user_num = len(unique_users)
    item_num = len(unique_items)
    print(f"✓ Remapped to {user_num} users and {item_num} items")
    
    # Create user-item pairs
    user_item_pairs = positive_ratings[['user', 'item']].values
    
    # Split into train and test sets
    print(f"\nSplitting data (train: {1-test_ratio:.0%}, test: {test_ratio:.0%})...")
    np.random.seed(42)  # For reproducibility
    n_total = len(user_item_pairs)
    n_test = int(n_total * test_ratio)
    
    # Shuffle indices
    indices = np.random.permutation(n_total)
    test_indices = indices[:n_test]
    train_indices = indices[n_test:]
    
    train_pairs = user_item_pairs[train_indices]
    test_pairs = user_item_pairs[test_indices]
    
    print(f"✓ Training pairs: {len(train_pairs)}")
    print(f"✓ Test pairs: {len(test_pairs)}")
    
    # Save training data
    train_file = os.path.join(data_dir, 'ml-1m.train.rating')
    print(f"\nSaving training data to {train_file}...")
    train_df = pd.DataFrame(train_pairs, columns=['user', 'item'])
    train_df.to_csv(train_file, sep='\t', header=False, index=False)
    print(f"✓ Saved {len(train_df)} training pairs")
    
    # Create training matrix (for negative sampling)
    print("\nCreating training interaction matrix...")
    train_mat = sp.dok_matrix((user_num, item_num), dtype=np.float32)
    for u, i in train_pairs:
        train_mat[u, i] = 1.0
    print(f"✓ Training matrix created: {train_mat.nnz} interactions")
    
    # Generate test negative samples
    print(f"\nGenerating test negative samples ({test_negatives} negatives per test case)...")
    test_negative_file = os.path.join(data_dir, 'ml-1m.test.negative')
    
    with open(test_negative_file, 'w') as f:
        for u, i in test_pairs:
            # Collect the negative items for this (user, item) test pair
            negatives = []
            attempts = 0
            max_attempts = test_negatives * 10  # Safety limit
            
            # Sample distinct negative items: never the test item itself and
            # never an item in the training set for this user
            while len(negatives) < test_negatives and attempts < max_attempts:
                neg_item = np.random.randint(item_num)
                if (neg_item != i
                        and (u, neg_item) not in train_mat
                        and neg_item not in negatives):
                    negatives.append(neg_item)
                attempts += 1
            
            # If we couldn't find enough negatives, pad with random items
            while len(negatives) < test_negatives:
                neg_item = np.random.randint(item_num)
                if neg_item != i and neg_item not in negatives:
                    negatives.append(neg_item)
            
            # Write in NCF format: (user, item)\tneg1\tneg2\t...\tneg99
            line = f"({u}, {i})" + "\t" + "\t".join(map(str, negatives)) + "\n"
            f.write(line)
    
    print(f"✓ Generated test negative samples: {len(test_pairs)} test cases")
    
    # Save test data (for reference, though NCF mainly uses test.negative)
    test_file = os.path.join(data_dir, 'ml-1m.test.rating')
    test_df = pd.DataFrame(test_pairs, columns=['user', 'item'])
    test_df.to_csv(test_file, sep='\t', header=False, index=False)
    print(f"✓ Saved {len(test_df)} test pairs")
    
    print("\n" + "=" * 70)
    print("✓ Data preprocessing complete!")
    print("=" * 70)
    print(f"Generated files:")
    print(f"  - {train_file}")
    print(f"  - {test_file}")
    print(f"  - {test_negative_file}")
    
    return train_file, test_file, test_negative_file, user_num, item_num, train_mat

# Process the data
train_rating_path, test_rating_path, test_negative_path, user_num, item_num, train_mat = \
    preprocess_ml1m_to_ncf_format(ratings_file, data_dir, test_ratio=0.2, test_negatives=99)

print(f"\n✓ Final statistics:")
print(f"  - Number of users: {user_num}")
print(f"  - Number of items: {item_num}")
print(f"  - Training interactions: {train_mat.nnz}")



STEP 3.2: Preprocessing Data into NCF Format
Loading ratings data...
✓ Loaded 1000209 ratings
  - Unique users: 6040
  - Unique movies: 3706

Filtering positive interactions (ratings >= 4)...
✓ 575281 positive interactions (out of 1000209 total)

Remapping user and item IDs to be contiguous...
✓ Remapped to 6038 users and 3533 items

Splitting data (train: 80%, test: 20%)...
✓ Training pairs: 460225
✓ Test pairs: 115056

Saving training data to ./data/ml-1m.train.rating...
✓ Saved 460225 training pairs

Creating training interaction matrix...
✓ Training matrix created: 460225 interactions

Generating test negative samples (99 negatives per test case)...
✓ Generated test negative samples: 115056 test cases
✓ Saved 115056 test pairs

✓ Data preprocessing complete!
Generated files:
  - ./data/ml-1m.train.rating
  - ./data/ml-1m.test.rating
  - ./data/ml-1m.test.negative

✓ Final statistics:
  - Number of users: 6038
  - Number of items: 3533
  - Training interactions: 460225


### Detailed Explanation of Step 3:

#### 3.1 Downloading the Dataset

**What is MovieLens 1M?**
- A benchmark dataset for recommendation systems
- Contains 1,000,209 ratings from 6,040 users on roughly 3,900 movies (3,706 of them have at least one rating)
- Ratings are on a scale of 1-5 stars
- Widely used in research and benchmarking

**Download Process:**
1. **Check if data exists**: Avoids redundant downloads
2. **Download zip file**: Fetches from the official GroupLens website
3. **Extract contents**: Unzips the dataset files
4. **Cleanup**: Removes zip file to save disk space
5. **Verification**: Ensures the ratings file exists

**Files in the dataset:**
- `ratings.dat`: User-item ratings (what we need)
- `movies.dat`: Movie information (not used in NCF)
- `users.dat`: User information (not used in NCF)

---

#### 3.2 Data Preprocessing - Deep Dive

**Step 1: Load Ratings**
- Reads the `ratings.dat` file with `::` separator
- Columns: UserID, MovieID, Rating, Timestamp
- We only need UserID, MovieID, and Rating

**Step 2: Filter Positive Interactions**
- **Why filter?**: In implicit feedback (click/no-click), we only have positive signals
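- **Threshold**: Ratings >= 4 are considered positive (user liked the item). This threshold is a choice made in this tutorial; the original NCF paper treats every rated item as a positive interaction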
- This converts explicit ratings to implicit feedback format
- Result: Binary interaction matrix (1 = liked, 0 = not interacted)
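
The thresholding can be sketched in a few lines of plain Python. The triples below are made-up data, and `POSITIVE_THRESHOLD` is a name chosen for this example.

```python
POSITIVE_THRESHOLD = 4  # ratings >= 4 count as "liked", as described above

# Hypothetical (user, item, rating) triples for illustration.
ratings = [(1, 10, 5), (1, 11, 3), (2, 10, 4), (2, 12, 1), (3, 11, 4)]

# Keep only the (user, item) pairs whose rating meets the threshold;
# the rating value itself is dropped, leaving implicit feedback.
positives = [(u, i) for u, i, r in ratings if r >= POSITIVE_THRESHOLD]

print(positives)
print(f"{len(positives)} of {len(ratings)} ratings kept")
```

In the run shown above, the same filter keeps 575,281 of 1,000,209 ratings, which is about 57.5% of the dataset.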

**Step 3: Remap IDs**
- **Why remap?**: Original IDs might not be contiguous (e.g., user IDs: 1, 5, 10, 100...)
- **Benefit**: Contiguous IDs (0, 1, 2, 3...) index directly into an embedding table, so the table needs exactly one row per user or item and no rows are wasted on unused IDs
- Creates mapping dictionaries: `{old_id: new_id}`
- Final: `user_num` users (0 to user_num-1) and `item_num` items (0 to item_num-1)
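
A minimal sketch of the remapping, assuming IDs are assigned in sorted order. The notebook's own ordering is not shown in this section, and `build_id_map` is a hypothetical helper.

```python
def build_id_map(raw_ids):
    """Map each distinct raw ID to a contiguous index 0..n-1 (sorted order)."""
    return {old_id: new_id for new_id, old_id in enumerate(sorted(set(raw_ids)))}

# Hypothetical (user, item) pairs with gaps in both ID ranges.
pairs = [(1, 500), (5, 20), (100, 500), (5, 7)]

user_map = build_id_map(u for u, _ in pairs)
item_map = build_id_map(i for _, i in pairs)

# Apply both mappings to every pair.
remapped = [(user_map[u], item_map[i]) for u, i in pairs]
user_num, item_num = len(user_map), len(item_map)

print(user_map)
print(remapped)
```

After remapping, the largest ID is always `user_num - 1` or `item_num - 1`, which is what lets the embedding tables be sized from the counts alone.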

**Step 4: Train-Test Split**
- **Ratio**: 80% training, 20% testing (a common random hold-out split; the original NCF paper instead uses leave-one-out evaluation)
- **Method**: Random shuffle with fixed seed (for reproducibility)
- **Important**: We split user-item pairs, not users or items
- This means some users appear in both train and test sets (realistic scenario)
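
The split can be sketched as a seeded shuffle over pairs. The seed value and the function name `train_test_split_pairs` are assumptions for illustration, since the notebook's actual seed is not shown in this section.

```python
import random

def train_test_split_pairs(pairs, test_ratio=0.2, seed=42):
    """Shuffle (user, item) pairs with a fixed seed and cut off a test share."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global state alone
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: 10 users x 10 items = 100 pairs.
pairs = [(u, i) for u in range(10) for i in range(10)]
train, test = train_test_split_pairs(pairs)

print(len(train), len(test))
```

Because the split is over pairs rather than users, the same user normally ends up on both sides. Applying the same rounding to the 575,281 positive pairs gives 115,056 test pairs, consistent with the output above.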

**Step 5: Create Training Matrix**
- **Sparse matrix**: Most user-item pairs have no recorded interaction, so the matrix is almost entirely zeros
- **Format**: Dictionary of Keys (DOK) matrix - efficient for sparse data
- **Purpose**: Used during training to avoid sampling negative items that user already interacted with
- **Memory efficient**: Only stores non-zero entries
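
To put a number on the sparsity, the density can be worked out from the statistics printed above (6038 users, 3533 items, 460225 training interactions). This is plain arithmetic, not project code.

```python
# Figures taken from the preprocessing output shown earlier.
users, items, train_interactions = 6038, 3533, 460225

possible_pairs = users * items            # size of the full dense matrix
density = train_interactions / possible_pairs

print(f"possible pairs: {possible_pairs:,}")
print(f"density: {density:.2%}")
```

Roughly 2% of the cells are filled, so a format that stores only the non-zero entries avoids holding about 21 million mostly empty cells.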

**Step 6: Generate Test Negative Samples**
- **Format**: For each test (user, item) pair, we provide:
  - 1 positive item (the true item)
  - 99 negative items (items user hasn't interacted with)
- **Evaluation**: Model should rank the positive item in top 10
- **Sampling strategy**: Randomly sample items not in user's training set
- **Output format**: `(user, item)\tneg1\tneg2\t...\tneg99`
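
A sketch of how one line of the test file could be produced in the format above. The function name is hypothetical, and excluding the positive item and de-duplicating the negatives are assumptions of this example; the text above only requires items outside the user's training set.

```python
import random

def make_test_negative_line(user, pos_item, seen_items, item_num,
                            num_neg=99, rng=None):
    """Build '(user, item)\\tneg1\\t...\\tnegN' for one test case."""
    rng = rng or random.Random(0)
    negatives, chosen = [], set()
    while len(negatives) < num_neg:
        j = rng.randrange(item_num)
        # Skip training items, the positive itself, and repeats (assumption).
        if j in seen_items or j == pos_item or j in chosen:
            continue
        chosen.add(j)
        negatives.append(j)
    return f"({user}, {pos_item})\t" + "\t".join(map(str, negatives))

line = make_test_negative_line(user=3, pos_item=42,
                               seen_items={1, 2, 5}, item_num=500)
fields = line.split("\t")
print(fields[0], len(fields) - 1, "negatives")
```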

**Why 99 negatives?**
- Simulates real-world scenario: recommend 1 item from 100 candidates
- Standard evaluation protocol in recommendation systems
- Makes evaluation more realistic and challenging
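
How a single 100-candidate test case turns into Hit Rate and NDCG can be sketched as follows. This follows the usual definitions for one relevant item and is not the project's `evaluate.py`. It assumes the positive's score is first in the list and that ties are resolved in the positive's favor.

```python
import math

def hit_rate_and_ndcg(scores, k=10):
    """scores[0] belongs to the positive item; the rest are negatives."""
    # 0-based rank = number of candidates scored strictly higher.
    rank = sum(1 for s in scores[1:] if s > scores[0])
    if rank >= k:
        return 0.0, 0.0  # positive fell outside the top k
    # With one relevant item, NDCG reduces to 1 / log2(rank + 2).
    return 1.0, 1.0 / math.log2(rank + 2)

# Hypothetical scores: two negatives outrank the positive, 97 do not.
scores = [0.90] + [0.95, 0.93] + [0.10] * 97
hr, ndcg = hit_rate_and_ndcg(scores)
print(hr, ndcg)
```

Hit Rate only records whether the positive made the top 10, while NDCG also rewards a higher position within it. Averaging both over all test cases gives the reported metrics.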

---

**✅ Step 3 Complete!**

The data is now ready for training. We have:
- Training data with user-item pairs
- Test data with negative samples
- Training matrix for efficient negative sampling
- User and item counts for model initialization

---

## Step 4: PyTorch Dataset Class and Data Loading

In this step, we'll create:
1. A function to load all data files into memory
2. A PyTorch Dataset class (`NCFData`) for efficient data loading
3. Data loaders for training and testing

The Dataset class handles:
- Loading training and test data
- Negative sampling during training
- Batching and shuffling
- Integration with PyTorch's DataLoader

Let's implement this step by step.


In [5]:
# ============================================================================
# STEP 4: PYTORCH DATASET CLASS AND DATA LOADING
# ============================================================================

"""
This step creates the PyTorch Dataset class and data loading functions.
"""

# ============================================================================
# 4.1 LOAD ALL DATA FILES
# ============================================================================

def load_all_data(train_rating_path, test_negative_path):
    """
    Loads all data files into memory for efficient access during training.
    
    This function loads:
    1. Training data: user-item pairs from train.rating file
    2. Test data: user-item pairs with negative samples from test.negative file
    3. Training matrix: sparse matrix for efficient negative sampling
    
    Parameters:
    - train_rating_path: Path to the training rating file
    - test_negative_path: Path to the test negative file
    
    Returns:
    - train_data: List of [user, item] pairs for training
    - test_data: List of [user, item] pairs for testing (includes negatives)
    - user_num: Total number of users
    - item_num: Total number of items
    - train_mat: Sparse matrix of training interactions
    """
    print("=" * 70)
    print("STEP 4.1: Loading Data Files")
    print("=" * 70)
    
    # Load training data
    print(f"Loading training data from {train_rating_path}...")
    train_data = pd.read_csv(
        train_rating_path,
        sep='\t',
        header=None,
        names=['user', 'item'],
        usecols=[0, 1],
        dtype={0: np.int32, 1: np.int32}
    )
    
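    # Calculate number of users and items from the training pairs.
    # This assumes the largest user and item IDs appear in the training file;
    # an ID that occurs only in the test file would fall outside the embedding tables.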
    user_num = train_data['user'].max() + 1
    item_num = train_data['item'].max() + 1
    
    print(f"✓ Loaded {len(train_data)} training pairs")
    print(f"  - Users: {user_num}")
    print(f"  - Items: {item_num}")
    
    # Convert to list of lists for easier processing
    train_data = train_data.values.tolist()
    
    # Create sparse training matrix (Dictionary of Keys format)
    # This is used to quickly check if a user-item pair exists in training data
    print("\nCreating training interaction matrix...")
    train_mat = sp.dok_matrix((user_num, item_num), dtype=np.float32)
    for u, i in train_data:
        train_mat[u, i] = 1.0
    print(f"✓ Training matrix created: {train_mat.nnz} interactions")
    
    # Load test data with negative samples
    print(f"\nLoading test data from {test_negative_path}...")
    test_data = []
    with open(test_negative_path, 'r') as fd:
        line = fd.readline()
        while line is not None and line != '':
            # Format: (user, item)\tneg1\tneg2\t...\tneg99
            arr = line.strip().split('\t')
            
            # Parse the positive pair "(user, item)" without eval(),
            # which would execute arbitrary code read from the file
            u, i = (int(v) for v in arr[0].strip('()').split(','))
            
            # Add the positive pair
            test_data.append([u, i])
            
            # Add all negative items for this user
            for neg_item in arr[1:]:
                if neg_item:  # Skip empty strings
                    test_data.append([u, int(neg_item)])
            
            line = fd.readline()
    
    print(f"✓ Loaded {len(test_data)} test pairs (including negatives)")
    
    print("\n" + "=" * 70)
    print("✓ Data loading complete!")
    print("=" * 70)
    
    return train_data, test_data, user_num, item_num, train_mat

# Load all data
train_data, test_data, user_num, item_num, train_mat = load_all_data(
    train_rating_path, test_negative_path
)

print(f"\n✓ Final data statistics:")
print(f"  - Training pairs: {len(train_data)}")
print(f"  - Test pairs: {len(test_data)}")
print(f"  - Users: {user_num}")
print(f"  - Items: {item_num}")


STEP 4.1: Loading Data Files
Loading training data from ./data/ml-1m.train.rating...
✓ Loaded 460225 training pairs
  - Users: 6038
  - Items: 3533

Creating training interaction matrix...
✓ Training matrix created: 460225 interactions

Loading test data from ./data/ml-1m.test.negative...
✓ Loaded 11505600 test pairs (including negatives)

✓ Data loading complete!

✓ Final data statistics:
  - Training pairs: 460225
  - Test pairs: 11505600
  - Users: 6038
  - Items: 3533


In [6]:
# ============================================================================
# 4.2 CREATE PYTORCH DATASET CLASS
# ============================================================================

class NCFData(data.Dataset):
    """
    PyTorch Dataset class for Neural Collaborative Filtering.
    
    This class handles:
    - Loading user-item pairs
    - Negative sampling during training
    - Providing data to PyTorch DataLoader
    
    Key concepts:
    - Positive samples: Real user-item interactions (label = 1)
    - Negative samples: Random items user hasn't interacted with (label = 0)
    - During training: We mix positives and negatives
    - During testing: We only use the provided test data
    """
    
    def __init__(self, features, num_item, train_mat=None, num_ng=0, is_training=None):
        """
        Initialize the NCF Dataset.
        
        Parameters:
        - features: List of [user, item] pairs (positive interactions)
        - num_item: Total number of items (for negative sampling)
        - train_mat: Sparse matrix of training interactions (to avoid sampling existing pairs)
        - num_ng: Number of negative samples per positive sample (for training)
        - is_training: Whether this is training data (True) or test data (False)
        """
        super(NCFData, self).__init__()
        
        # Store positive samples (user-item pairs from training/test data)
        self.features_ps = features
        
        # Store metadata
        self.num_item = num_item
        self.train_mat = train_mat  # Used to check if (user, item) exists
        self.num_ng = num_ng  # Number of negatives per positive
        self.is_training = is_training  # Training or testing mode
        
        # Initialize labels (will be filled during negative sampling)
        self.labels = [0 for _ in range(len(features))]
        
        # These will be populated by ng_sample() during training
        self.features_ng = []  # Negative samples
        self.features_fill = []  # Combined positives + negatives
        self.labels_fill = []  # Labels for combined features
    
    def ng_sample(self):
        """
        Generate negative samples for training.
        
        For each positive (user, item) pair:
        - Sample num_ng random items that the user hasn't interacted with
        - These become negative examples (label = 0)
        
        This function is called once per epoch before training starts.
        """
        assert self.is_training, 'Negative sampling only needed during training'
        
        print(f"Generating {self.num_ng} negative samples per positive pair...")
        self.features_ng = []
        
        # For each positive pair, generate num_ng negative samples
        for x in self.features_ps:
            u = x[0]  # User ID
            # Generate num_ng negative items for this user
            for t in range(self.num_ng):
                # Sample a random item
                j = np.random.randint(self.num_item)
                
                # Make sure this item is NOT in the user's training set
                # Keep sampling until we find a negative item
                while (u, j) in self.train_mat:
                    j = np.random.randint(self.num_item)
                
                # Add this negative sample
                self.features_ng.append([u, j])
        
        # Create labels: 1 for positives, 0 for negatives
        labels_ps = [1 for _ in range(len(self.features_ps))]
        labels_ng = [0 for _ in range(len(self.features_ng))]
        
        # Combine positives and negatives
        self.features_fill = self.features_ps + self.features_ng
        self.labels_fill = labels_ps + labels_ng
        
        print(f"✓ Generated {len(self.features_ng)} negative samples")
        print(f"  - Total samples (positives + negatives): {len(self.features_fill)}")
    
    def __len__(self):
        """
        Return the total number of samples.
        
        During training: (num_ng + 1) * num_positives
          - 1 positive + num_ng negatives per positive pair
        During testing: The number of provided test rows
          - each positive plus its pre-sampled negatives
        """
        if self.is_training:
            return (self.num_ng + 1) * len(self.labels)
        else:
            return len(self.features_ps)
    
    def __getitem__(self, idx):
        """
        Get a single sample (user, item, label) by index.
        
        This is called by PyTorch DataLoader to get batches of data.
        
        Parameters:
        - idx: Index of the sample to retrieve
        
        Returns:
        - user: User ID (integer)
        - item: Item ID (integer)
        - label: 1 for positive, 0 for negative during training;
          always 0 (unused placeholder) during testing
        """
        # During training: use combined features (positives + negatives)
        # During testing: use the provided test rows (positives + pre-sampled negatives)
        if self.is_training:
            features = self.features_fill
            labels = self.labels_fill
        else:
            features = self.features_ps
            labels = self.labels
        
        # Get the specific sample
        user = features[idx][0]
        item = features[idx][1]
        label = labels[idx]
        
        return user, item, label

print("=" * 70)
print("STEP 4.2: PyTorch Dataset Class Created")
print("=" * 70)
print("✓ NCFData class defined")
print("  - Handles positive and negative sampling")
print("  - Compatible with PyTorch DataLoader")


STEP 4.2: PyTorch Dataset Class Created
✓ NCFData class defined
  - Handles positive and negative sampling
  - Compatible with PyTorch DataLoader


In [7]:
# ============================================================================
# 4.3 CREATE DATA LOADERS
# ============================================================================

print("=" * 70)
print("STEP 4.3: Creating Data Loaders")
print("=" * 70)
print("⚠ IMPORTANT: If you get multiprocessing/pickling errors,")
print("   restart the kernel and re-run all cells from the beginning.")
print("=" * 70)

# Create training dataset
# num_ng: number of negative samples per positive (defined in Step 2)
train_dataset = NCFData(
    train_data,
    item_num,
    train_mat,
    num_ng=num_ng,  # From Step 2 configuration
    is_training=True
)

# Create test dataset
# num_ng=0: no negative sampling needed (negatives already in test_data)
test_dataset = NCFData(
    test_data,
    item_num,
    train_mat,
    num_ng=0,  # No negative sampling for testing
    is_training=False
)

print(f"✓ Training dataset created: {len(train_dataset)} samples")
print(f"✓ Test dataset created: {len(test_dataset)} samples")

# Create data loaders
# DataLoader handles batching, shuffling, and parallel loading
# 
# CRITICAL: keep num_workers at 0 in Jupyter notebooks!
# With num_workers > 0 the dataset is sent to worker processes. On platforms
# that start workers with "spawn" (e.g. Windows, macOS), classes defined in
# notebook cells cannot be unpickled there, which causes pickling errors.
# If running as a .py script (not notebook), you can use num_workers=4.
train_loader = data.DataLoader(
    train_dataset,
    batch_size=batch_size,  # From Step 2 configuration
    shuffle=True,  # Shuffle training data each epoch
    num_workers=0,  # MUST be 0 for Jupyter notebooks (avoids pickling errors)
    pin_memory=torch.cuda.is_available()  # Faster CPU→GPU transfer when CUDA is present
)

test_loader = data.DataLoader(
    test_dataset,
    batch_size=test_num_ng + 1,  # 1 positive + test_num_ng negatives
    shuffle=False,  # Don't shuffle test data
    num_workers=0,  # MUST be 0 for Jupyter notebooks
    pin_memory=torch.cuda.is_available()
)

# Verify num_workers is 0 (safety check)
assert train_loader.num_workers == 0, f"ERROR: train_loader.num_workers is {train_loader.num_workers}, must be 0!"
assert test_loader.num_workers == 0, f"ERROR: test_loader.num_workers is {test_loader.num_workers}, must be 0!"

print(f"\n✓ Data loaders created:")
print(f"  - Training batch size: {batch_size}")
print(f"  - Test batch size: {test_num_ng + 1} (1 positive + {test_num_ng} negatives)")
print(f"  - Training batches per epoch: {len(train_loader)}")
print(f"  - Test batches: {len(test_loader)}")
print(f"  - num_workers: {train_loader.num_workers} (verified: safe for notebooks)")

# Note: We'll call train_dataset.ng_sample() before each training epoch
# to generate fresh negative samples


STEP 4.3: Creating Data Loaders
⚠ IMPORTANT: If you get multiprocessing/pickling errors,
   restart the kernel and re-run all cells from the beginning.
✓ Training dataset created: 2301125 samples
✓ Test dataset created: 11505600 samples

✓ Data loaders created:
  - Training batch size: 256
  - Test batch size: 100 (1 positive + 99 negatives)
  - Training batches per epoch: 8989
  - Test batches: 115056
  - num_workers: 0 (verified: safe for notebooks)


### Detailed Explanation of Step 4:

#### 4.1 Loading Data Files

**Why load all data at once?**
- Training data is relatively small (fits in memory)
- Loading once is faster than reading from disk repeatedly
- Enables efficient negative sampling

**What we load:**
1. **Training data**: List of [user, item] pairs
2. **Test data**: List of [user, item] pairs (includes 1 positive + 99 negatives per test case)
3. **Training matrix**: Sparse matrix for O(1) lookup of existing interactions
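
The flattened test list keeps each test case in one contiguous run of 100 rows with the positive first, which is what lets a batch size of 100 line up with one test case. A small sketch with made-up IDs follows; the names `TEST_NUM_NG` and `GROUP` are chosen for this example.

```python
TEST_NUM_NG = 99
GROUP = TEST_NUM_NG + 1  # rows per test case

# Two hypothetical test cases: (user 0, item 7) and (user 1, item 3).
test_data = []
for user, pos in [(0, 7), (1, 3)]:
    test_data.append([user, pos])                                   # positive first
    test_data.extend([user, 1000 + n] for n in range(TEST_NUM_NG))  # then negatives

# Slicing in steps of GROUP recovers one test case per chunk.
cases = [test_data[s:s + GROUP] for s in range(0, len(test_data), GROUP)]
print(len(test_data), len(cases), cases[1][0])
```

The same arithmetic explains the output above: 115,056 test cases × 100 rows = 11,505,600 loaded test pairs.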

**Training Matrix (DOK format):**
- **DOK = Dictionary of Keys**: Efficient sparse matrix format
- Only stores non-zero entries: `{(user, item): 1.0}`
- Fast lookup: `(u, i) in train_mat` checks if user u interacted with item i
- Memory efficient: Only stores interactions, not entire matrix
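
The Dictionary-of-Keys idea can be illustrated with a plain Python dict keyed by `(user, item)`. This stands in for the SciPy matrix only for the purpose of the example; the names below are hypothetical.

```python
# Hypothetical training pairs.
train_pairs = [(0, 2), (0, 5), (1, 2), (3, 4)]

# Only observed interactions are stored, keyed by their coordinates.
train_lookup = {(u, i): 1.0 for u, i in train_pairs}

def has_interaction(user, item):
    """Average-case O(1) membership test via hashing."""
    return (user, item) in train_lookup

user_num, item_num = 4, 6
stored = len(train_lookup)          # entries actually kept
dense_cells = user_num * item_num   # entries a dense matrix would need

print(has_interaction(0, 5), has_interaction(0, 4))
print(f"{stored} stored vs {dense_cells} dense cells")
```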

---

#### 4.2 PyTorch Dataset Class - Deep Dive

**What is a PyTorch Dataset?**
- A class that implements `__len__()` and `__getitem__()`
- Allows PyTorch to load data in batches
- Handles data preprocessing and sampling

**NCFData Class Components:**

1. **Initialization (`__init__`)**:
   - Stores positive samples (user-item pairs)
   - Stores metadata (num_items, train_mat, etc.)
   - Prepares for negative sampling

2. **Negative Sampling (`ng_sample`)**:
   - **When called**: Once per epoch, before training starts
   - **What it does**: For each positive pair, samples `num_ng` negative items
   - **How it works**:
     - Randomly sample an item
     - Check if (user, item) exists in training matrix
     - If exists, sample again (avoid positive items)
     - If not exists, add as negative sample
   - **Result**: A training set with a fixed ratio of 1 positive to `num_ng` negatives (not class-balanced unless `num_ng` is 1)

3. **Length (`__len__`)**:
   - Training: `(num_ng + 1) * num_positives`
     - Example: 1000 positives × (4 negatives + 1 positive) = 5000 samples
   - Testing: The number of test rows (115,056 test cases × 100 candidates = 11,505,600 here)

4. **Get Item (`__getitem__`)**:
   - Returns (user_id, item_id, label)
   - Training: Returns from combined positives + negatives
   - Testing: Returns from test data (already has negatives)

**Why negative sampling?**
- **Problem**: We only have positive interactions (users liked items)
- **Solution**: Generate negative examples (items users haven't interacted with)
- **Benefit**: Model learns to distinguish good vs bad recommendations
- **Ratio**: Typically 1:4 (1 positive, 4 negatives) for training
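
The rejection-sampling loop described above can be sketched without PyTorch. Here a set of known positives plays the role of the training matrix, and `sample_negatives` and the seed are assumptions of this example.

```python
import random

def sample_negatives(positives, item_num, num_ng, seed=0):
    """For each positive pair, draw num_ng items the user has not interacted with."""
    rng = random.Random(seed)
    seen = set(positives)
    negatives = []
    for user, _item in positives:
        for _ in range(num_ng):
            j = rng.randrange(item_num)
            while (user, j) in seen:      # reject known positives and redraw
                j = rng.randrange(item_num)
            negatives.append((user, j))
    return negatives

positives = [(0, 1), (0, 2), (1, 0)]
negatives = sample_negatives(positives, item_num=10, num_ng=4)

# Combine in the same order as the notebook: positives first, then negatives.
features = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)
print(len(features), sum(labels))
```

Redrawing is cheap because the interaction matrix is sparse, so most random items are valid negatives. The loop would not terminate for a user who had interacted with every item, which does not arise in this dataset.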

---

#### 4.3 Data Loaders

**What is a DataLoader?**
- PyTorch utility that batches data from a Dataset
- Handles shuffling, parallel loading, and memory management

**Training DataLoader:**
- **Batch size**: 256 (processes 256 samples at once)
- **Shuffle**: True (randomize order each epoch)
- **Num workers**: 0 (data is loaded in the main process; required in notebooks, as noted in the code above)
- **Pin memory**: True if GPU available (faster CPU→GPU transfer)

**Test DataLoader:**
- **Batch size**: 100 (1 positive + 99 negatives)
- **Shuffle**: False (keep test order consistent)
- **Num workers**: 0 (simpler, no parallel loading needed)

**Why different batch sizes?**
- Training: Process many samples efficiently (256)
- Testing: Each batch = 1 test case (100 items to rank)

**Data Flow:**
1. Dataset provides individual samples
2. DataLoader groups samples into batches
3. Model processes batches during training/testing
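
The batch counts printed in Step 4.3 follow directly from the dataset sizes. The check below uses the figures from the outputs above; `num_ng = 4` is inferred from 2,301,125 / 460,225 = 5 samples per positive.

```python
import math

# Training: every positive contributes itself plus num_ng negatives.
train_samples = 460225 * (4 + 1)
# The last partial batch is kept (DataLoader default), hence the ceiling.
train_batches = math.ceil(train_samples / 256)

# Testing: one batch per test case, 1 positive + 99 negatives each.
test_samples = 115056 * (99 + 1)
test_batches = test_samples // 100

print(train_samples, train_batches)
print(test_samples, test_batches)
```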

---

**✅ Step 4 Complete!**

We now have:
- All data loaded into memory
- PyTorch Dataset class for efficient data access
- Data loaders ready for training and testing

---

## Step 5: NCF Model Architecture

In this step, we'll implement the Neural Collaborative Filtering (NCF) model. The NCF architecture combines:

1. **GMF (Generalized Matrix Factorization)**: Linear interactions via element-wise product
2. **MLP (Multi-Layer Perceptron)**: Non-linear interactions via deep neural networks
3. **NeuMF**: Combination of both GMF and MLP
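
To make the GMF idea in item 1 concrete: the score is a weighted sum over the element-wise product of the two embeddings, and with all weights equal to 1 it reduces to the dot product used in classic matrix factorization. The vectors and the `gmf_score` helper below are illustrative only.

```python
def gmf_score(user_vec, item_vec, weights, bias=0.0):
    """Weighted sum of the element-wise product of two embeddings."""
    product = [u * v for u, v in zip(user_vec, item_vec)]
    return sum(w * p for w, p in zip(weights, product)) + bias

# Hypothetical 3-dimensional embeddings.
user_vec = [0.5, -1.0, 2.0]
item_vec = [2.0, 0.5, 1.0]

dot = sum(u * v for u, v in zip(user_vec, item_vec))

# All-ones weights: identical to the plain dot product.
mf_like = gmf_score(user_vec, item_vec, weights=[1.0, 1.0, 1.0])
# Learned weights can emphasize or ignore individual latent dimensions.
reweighted = gmf_score(user_vec, item_vec, weights=[1.0, 0.0, 0.5])

print(dot, mf_like, reweighted)
```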

The model learns user and item embeddings and combines them to predict user-item interaction scores.
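
The layer sizes of the MLP path follow from two settings, `factor_num` and `num_layers`. The sketch below reproduces the size formulas used in the model code for the configured values (32 and 3) without building any layers; `mlp_layer_shapes` is a helper written for this example.

```python
def mlp_layer_shapes(factor_num, num_layers):
    """Return (in_features, out_features) for each MLP layer."""
    shapes = []
    for i in range(num_layers):
        input_size = factor_num * (2 ** (num_layers - i))
        shapes.append((input_size, input_size // 2))  # each layer halves the width
    return shapes

factor_num, num_layers = 32, 3

# Each MLP embedding is half of the first layer's input,
# because user and item embeddings are concatenated.
mlp_embed_dim = factor_num * 2 ** (num_layers - 1)
shapes = mlp_layer_shapes(factor_num, num_layers)

# NeuMF concatenates the GMF output and the MLP output before prediction.
neumf_predict_size = factor_num * 2

print(mlp_embed_dim, shapes, neumf_predict_size)
```

The MLP always ends at `factor_num` dimensions, so its output has the same width as the GMF output regardless of `num_layers`.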

Let's implement this step by step.


In [8]:
# ============================================================================
# STEP 5: NCF MODEL ARCHITECTURE
# ============================================================================

"""
This step implements the Neural Collaborative Filtering (NCF) model.

The NCF model has three variants:
1. GMF: Only Generalized Matrix Factorization (linear)
2. MLP: Only Multi-Layer Perceptron (non-linear)
3. NeuMF: Neural Matrix Factorization (combines GMF + MLP)
"""

class NCF(nn.Module):
    """
    Neural Collaborative Filtering Model
    
    This model learns user and item embeddings and combines them using
    either GMF (linear) or MLP (non-linear) or both (NeuMF).
    
    Architecture:
    1. Embedding layers: Convert user/item IDs to dense vectors
    2. GMF path: Element-wise product of embeddings (linear interaction)
    3. MLP path: Deep neural network (non-linear interaction)
    4. Prediction layer: Combines GMF and/or MLP outputs to predict score
    """
    
    def __init__(self, user_num, item_num, factor_num, num_layers,
                 dropout, model_name, GMF_model=None, MLP_model=None):
        """
        Initialize the NCF model.
        
        Parameters:
        - user_num: Total number of users
        - item_num: Total number of items
        - factor_num: Dimension of embedding vectors (e.g., 32)
        - num_layers: Number of layers in MLP component
        - dropout: Dropout rate for regularization
        - model_name: 'MLP', 'GMF', 'NeuMF-end', or 'NeuMF-pre'
        - GMF_model: Pre-trained GMF model (for NeuMF-pre)
        - MLP_model: Pre-trained MLP model (for NeuMF-pre)
        """
        super(NCF, self).__init__()
        
        # Store configuration
        self.dropout = dropout
        self.model_name = model_name
        self.GMF_model = GMF_model
        self.MLP_model = MLP_model
        
        # ====================================================================
        # EMBEDDING LAYERS
        # ====================================================================
        # Embeddings convert user/item IDs (integers) to dense vectors
        
        # GMF embeddings: factor_num dimensions
        # Used for Generalized Matrix Factorization (linear interactions)
        if model_name != 'MLP':  # MLP doesn't use GMF
            self.embed_user_GMF = nn.Embedding(user_num, factor_num)
            self.embed_item_GMF = nn.Embedding(item_num, factor_num)
        
        # MLP embeddings: Larger dimension for deeper networks
        # Dimension = factor_num * 2^(num_layers-1)
        # Example: factor_num=32, num_layers=3 → 32 * 2^2 = 128 dimensions
        if model_name != 'GMF':  # GMF doesn't use MLP
            mlp_embed_dim = factor_num * (2 ** (num_layers - 1))
            self.embed_user_MLP = nn.Embedding(user_num, mlp_embed_dim)
            self.embed_item_MLP = nn.Embedding(item_num, mlp_embed_dim)
        
        # ====================================================================
        # MLP LAYERS (Multi-Layer Perceptron)
        # ====================================================================
        # Build MLP with decreasing dimensions
        # Example with factor_num=32, num_layers=3:
        #   Input: 128*2 = 256 (concatenated user + item embeddings)
        #   Layer 1: 256 → 128
        #   Layer 2: 128 → 64
        #   Layer 3: 64 → 32
        #   Output: 32 dimensions
        
        if model_name != 'GMF':  # GMF doesn't use MLP
            MLP_modules = []
            for i in range(num_layers):
                # Calculate input size for this layer
                input_size = factor_num * (2 ** (num_layers - i))
                
                # Add dropout for regularization
                MLP_modules.append(nn.Dropout(p=self.dropout))
                
                # Add linear layer (halves the dimension)
                MLP_modules.append(nn.Linear(input_size, input_size // 2))
                
                # Add ReLU activation (non-linearity)
                MLP_modules.append(nn.ReLU())
            
            # Combine all MLP layers into a sequential module
            self.MLP_layers = nn.Sequential(*MLP_modules)
        
        # ====================================================================
        # PREDICTION LAYER
        # ====================================================================
        # Final layer that outputs the interaction score
        
        if self.model_name in ['MLP', 'GMF']:
            # Single path: just factor_num dimensions
            predict_size = factor_num
        else:
            # NeuMF: concatenate GMF (factor_num) + MLP (factor_num) = 2*factor_num
            predict_size = factor_num * 2
        
        self.predict_layer = nn.Linear(predict_size, 1)
        
        # Initialize weights
        self._init_weight_()
    
    def _init_weight_(self):
        """
        Initialize model weights.
        
        Different initialization strategies:
        - Embeddings: Small random values (normal, std=0.01)
        - MLP layers: Xavier (Glorot) uniform
        - Prediction layer: Kaiming uniform with gain 1 (nonlinearity='sigmoid')
        - Biases: Zero
        """
        if not self.model_name == 'NeuMF-pre':
            # Random initialization for training from scratch
            
            # Embedding initialization: Small random values
            # This prevents embeddings from starting too large
            if hasattr(self, 'embed_user_GMF'):
                nn.init.normal_(self.embed_user_GMF.weight, std=0.01)
            if hasattr(self, 'embed_item_GMF'):
                nn.init.normal_(self.embed_item_GMF.weight, std=0.01)
            if hasattr(self, 'embed_user_MLP'):
                nn.init.normal_(self.embed_user_MLP.weight, std=0.01)
            if hasattr(self, 'embed_item_MLP'):
                nn.init.normal_(self.embed_item_MLP.weight, std=0.01)
            
            # MLP layer initialization: Xavier (Glorot) uniform
            # (Kaiming initialization is the more common pairing with ReLU)
            if hasattr(self, 'MLP_layers'):
                for m in self.MLP_layers:
                    if isinstance(m, nn.Linear):
                        nn.init.xavier_uniform_(m.weight)
            
            # Prediction layer initialization: Kaiming uniform
            # nonlinearity='sigmoid' gives a gain of 1; the sigmoid itself is
            # not applied in forward(), which returns raw scores
            nn.init.kaiming_uniform_(self.predict_layer.weight, 
                                    a=1, nonlinearity='sigmoid')
            
            # Initialize all biases to zero
            for m in self.modules():
                if isinstance(m, nn.Linear) and m.bias is not None:
                    m.bias.data.zero_()
        else:
            # Pre-trained initialization (for NeuMF-pre)
            # Copy weights from pre-trained GMF and MLP models
            
            # Copy embedding weights
            self.embed_user_GMF.weight.data.copy_(
                self.GMF_model.embed_user_GMF.weight)
            self.embed_item_GMF.weight.data.copy_(
                self.GMF_model.embed_item_GMF.weight)
            self.embed_user_MLP.weight.data.copy_(
                self.MLP_model.embed_user_MLP.weight)
            self.embed_item_MLP.weight.data.copy_(
                self.MLP_model.embed_item_MLP.weight)
            
            # Copy MLP layer weights
            for (m1, m2) in zip(self.MLP_layers, self.MLP_model.MLP_layers):
                if isinstance(m1, nn.Linear) and isinstance(m2, nn.Linear):
                    m1.weight.data.copy_(m2.weight)
                    m1.bias.data.copy_(m2.bias)
            
            # Combine prediction layer weights from GMF and MLP
            predict_weight = torch.cat([
                self.GMF_model.predict_layer.weight, 
                self.MLP_model.predict_layer.weight], dim=1)
            predict_bias = (self.GMF_model.predict_layer.bias + 
                           self.MLP_model.predict_layer.bias) / 2
            
            self.predict_layer.weight.data.copy_(0.5 * predict_weight)
            self.predict_layer.bias.data.copy_(predict_bias)
    
    def forward(self, user, item):
        """
        Forward pass: Predict user-item interaction scores.
        
        Parameters:
        - user: Tensor of user IDs [batch_size]
        - item: Tensor of item IDs [batch_size]
        
        Returns:
        - prediction: Tensor of raw predicted scores (logits, no sigmoid applied) [batch_size]
        """
        # ====================================================================
        # GMF PATH (Generalized Matrix Factorization)
        # ====================================================================
        # Linear interaction: element-wise product of embeddings
        # Similar to traditional matrix factorization
        
        if self.model_name != 'MLP':
            # Get embeddings
            embed_user_GMF = self.embed_user_GMF(user)  # [batch_size, factor_num]
            embed_item_GMF = self.embed_item_GMF(item)  # [batch_size, factor_num]
            
            # Element-wise product (linear interaction)
            output_GMF = embed_user_GMF * embed_item_GMF  # [batch_size, factor_num]
        
        # ====================================================================
        # MLP PATH (Multi-Layer Perceptron)
        # ====================================================================
        # Non-linear interaction: deep neural network
        
        if self.model_name != 'GMF':
            # Get embeddings
            embed_user_MLP = self.embed_user_MLP(user)  # [batch_size, mlp_dim]
            embed_item_MLP = self.embed_item_MLP(item)   # [batch_size, mlp_dim]
            
            # Concatenate user and item embeddings
            interaction = torch.cat((embed_user_MLP, embed_item_MLP), -1)  # [batch_size, mlp_dim*2]
            
            # Pass through MLP layers (with dropout and ReLU)
            output_MLP = self.MLP_layers(interaction)  # [batch_size, factor_num]
        
        # ====================================================================
        # COMBINE PATHS
        # ====================================================================
        if self.model_name == 'GMF':
            # Only GMF path
            concat = output_GMF
        elif self.model_name == 'MLP':
            # Only MLP path
            concat = output_MLP
        else:
            # NeuMF: Concatenate both paths
            concat = torch.cat((output_GMF, output_MLP), -1)  # [batch_size, factor_num*2]
        
        # ====================================================================
        # PREDICTION
        # ====================================================================
        # Final linear layer outputs interaction score
        prediction = self.predict_layer(concat)  # [batch_size, 1]
        
        # Flatten to [batch_size]
        return prediction.view(-1)

print("=" * 70)
print("STEP 5: NCF Model Architecture")
print("=" * 70)
print("✓ NCF model class defined")
print(f"  - Model type: {model_name}")
print(f"  - Embedding dimension: {factor_num}")
print(f"  - MLP layers: {num_layers}")
print(f"  - Dropout rate: {dropout_rate}")


STEP 5: NCF Model Architecture
✓ NCF model class defined
  - Model type: NeuMF-end
  - Embedding dimension: 32
  - MLP layers: 3
  - Dropout rate: 0.0


In [9]:
# ============================================================================
# 5.2 CREATE AND INITIALIZE THE MODEL
# ============================================================================

print("\n" + "=" * 70)
print("STEP 5.2: Creating and Initializing Model")
print("=" * 70)

# Check if we need pre-trained models (for NeuMF-pre)
if model_name == 'NeuMF-pre':
    # For NeuMF-pre, we would load pre-trained GMF and MLP models
    # For now, we'll use NeuMF-end (training from scratch)
    print("⚠ NeuMF-pre requires pre-trained models.")
    print("  Switching to NeuMF-end (training from scratch)...")
    model_name = 'NeuMF-end'

# Create the model
ncf_model = NCF(
    user_num=user_num,
    item_num=item_num,
    factor_num=factor_num,
    num_layers=num_layers,
    dropout=dropout_rate,
    model_name=model_name,
    GMF_model=None,
    MLP_model=None
)

# Move model to GPU if available
if torch.cuda.is_available():
    ncf_model = ncf_model.cuda()
    print("✓ Model moved to GPU")
else:
    print("✓ Model on CPU")

# Count parameters
total_params = sum(p.numel() for p in ncf_model.parameters())
trainable_params = sum(p.numel() for p in ncf_model.parameters() if p.requires_grad)

print(f"\n✓ Model created successfully!")
print(f"  - Total parameters: {total_params:,}")
print(f"  - Trainable parameters: {trainable_params:,}")

# Print model architecture
print(f"\nModel Architecture:")
print(f"  - Users: {user_num:,}")
print(f"  - Items: {item_num:,}")
if model_name != 'MLP':
    print(f"  - GMF embeddings: {factor_num} dimensions")
if model_name != 'GMF':
    mlp_embed_dim = factor_num * (2 ** (num_layers - 1))
    print(f"  - MLP embeddings: {mlp_embed_dim} dimensions")
    print(f"  - MLP layers: {num_layers} (with dropout={dropout_rate})")
print(f"  - Prediction layer: {factor_num if model_name in ['MLP', 'GMF'] else factor_num * 2} → 1")

# ============================================================================
# 5.3 DETAILED MODEL ARCHITECTURE VISUALIZATION
# ============================================================================

print("\n" + "=" * 70)
print("STEP 5.3: Detailed Model Architecture")
print("=" * 70)

# Print the full model structure
print("\n" + "=" * 70)
print("COMPLETE MODEL STRUCTURE:")
print("=" * 70)
print(ncf_model)
print("=" * 70)

# Print detailed layer information
print("\n" + "=" * 70)
print("LAYER-BY-LAYER BREAKDOWN:")
print("=" * 70)

if model_name != 'MLP':
    print("\n[GMF Path - Generalized Matrix Factorization]")
    print(f"  embed_user_GMF: Embedding({user_num}, {factor_num})")
    print(f"    → Converts user IDs to {factor_num}-dimensional vectors")
    print(f"  embed_item_GMF: Embedding({item_num}, {factor_num})")
    print(f"    → Converts item IDs to {factor_num}-dimensional vectors")
    print(f"  Element-wise product: user_emb * item_emb")
    print(f"    → Output shape: [batch_size, {factor_num}]")

if model_name != 'GMF':
    mlp_embed_dim = factor_num * (2 ** (num_layers - 1))
    print("\n[MLP Path - Multi-Layer Perceptron]")
    print(f"  embed_user_MLP: Embedding({user_num}, {mlp_embed_dim})")
    print(f"    → Converts user IDs to {mlp_embed_dim}-dimensional vectors")
    print(f"  embed_item_MLP: Embedding({item_num}, {mlp_embed_dim})")
    print(f"    → Converts item IDs to {mlp_embed_dim}-dimensional vectors")
    print(f"  Concatenation: [user_emb, item_emb]")
    print(f"    → Output shape: [batch_size, {mlp_embed_dim * 2}]")
    
    print(f"\n  MLP Layers ({num_layers} layers):")
    for i in range(num_layers):
        input_size = factor_num * (2 ** (num_layers - i))
        output_size = input_size // 2
        print(f"    Layer {i+1}:")
        print(f"      Dropout(p={dropout_rate})")
        print(f"      Linear({input_size}, {output_size})")
        print(f"      ReLU()")
        print(f"      → Output shape: [batch_size, {output_size}]")

print("\n[Prediction Layer]")
if model_name == 'GMF':
    predict_input = factor_num
    print(f"  Input: GMF output [{factor_num} dimensions]")
elif model_name == 'MLP':
    predict_input = factor_num
    print(f"  Input: MLP output [{factor_num} dimensions]")
else:  # NeuMF
    predict_input = factor_num * 2
    print(f"  Input: Concatenated GMF + MLP [{factor_num * 2} dimensions]")
    print(f"    → GMF: [{factor_num} dims] + MLP: [{factor_num} dims]")

print(f"  Linear({predict_input}, 1)")
print(f"    → Output: [batch_size, 1] (interaction score)")
print(f"    → Higher score = more likely user will like item")

# Calculate and print parameter breakdown
print("\n" + "=" * 70)
print("PARAMETER BREAKDOWN:")
print("=" * 70)

total_params = 0
if model_name != 'MLP':
    gmf_user_params = user_num * factor_num
    gmf_item_params = item_num * factor_num
    gmf_total = gmf_user_params + gmf_item_params
    total_params += gmf_total
    print(f"\nGMF Embeddings:")
    print(f"  User embeddings: {user_num:,} × {factor_num} = {gmf_user_params:,} parameters")
    print(f"  Item embeddings: {item_num:,} × {factor_num} = {gmf_item_params:,} parameters")
    print(f"  GMF Total: {gmf_total:,} parameters")

if model_name != 'GMF':
    mlp_embed_dim = factor_num * (2 ** (num_layers - 1))
    mlp_user_params = user_num * mlp_embed_dim
    mlp_item_params = item_num * mlp_embed_dim
    mlp_embed_total = mlp_user_params + mlp_item_params
    total_params += mlp_embed_total
    print(f"\nMLP Embeddings:")
    print(f"  User embeddings: {user_num:,} × {mlp_embed_dim} = {mlp_user_params:,} parameters")
    print(f"  Item embeddings: {item_num:,} × {mlp_embed_dim} = {mlp_item_params:,} parameters")
    print(f"  MLP Embeddings Total: {mlp_embed_total:,} parameters")
    
    # MLP layers parameters
    mlp_layer_params = 0
    print(f"\nMLP Layers:")
    for i in range(num_layers):
        input_size = factor_num * (2 ** (num_layers - i))
        output_size = input_size // 2
        layer_params = (input_size * output_size) + output_size  # weights + bias
        mlp_layer_params += layer_params
        print(f"  Layer {i+1} (Linear({input_size}, {output_size})): {layer_params:,} parameters")
    total_params += mlp_layer_params
    print(f"  MLP Layers Total: {mlp_layer_params:,} parameters")

# Prediction layer
if model_name in ['MLP', 'GMF']:
    predict_input = factor_num
else:
    predict_input = factor_num * 2
predict_params = (predict_input * 1) + 1  # weights + bias
total_params += predict_params
print(f"\nPrediction Layer:")
print(f"  Linear({predict_input}, 1): {predict_params:,} parameters")

print("\n" + "=" * 70)
print(f"TOTAL MODEL PARAMETERS: {total_params:,}")
print(f"Model Size (float32): ~{total_params * 4 / 1024 / 1024:.2f} MB")
print("=" * 70)

# Print architecture diagram
print("\n" + "=" * 70)
print("ARCHITECTURE DIAGRAM:")
print("=" * 70)

# NOTE: the NeuMF diagram below is hard-coded for factor_num=32, num_layers=3
if model_name in ('NeuMF-end', 'NeuMF-pre'):
    print("""
    User ID ──┐
              ├─→ Embedding (GMF) ──→ [32] ─┐
              │                                ├─→ Element-wise Product ─→ [32] ─┐
    Item ID ──┤                                │                                  │
              ├─→ Embedding (GMF) ──→ [32] ─┘                                  │
              │                                                                  │
              ├─→ Embedding (MLP) ──→ [128] ─┐                                 │
              │                                 ├─→ Concatenate ─→ [256]         │
              └─→ Embedding (MLP) ──→ [128] ─┘                                 │
                                                                                 │
                                                                                 │
    [256] ─→ Dropout ─→ Linear(256→128) ─→ ReLU ─→ [128]                       │
    [128] ─→ Dropout ─→ Linear(128→64)  ─→ ReLU ─→ [64]                        │
    [64]  ─→ Dropout ─→ Linear(64→32)   ─→ ReLU ─→ [32] ────────────────────────┤
                                                                                 │
                                                                                 ├─→ Concatenate ─→ [64] ─→ Linear(64→1) ─→ Score
                                                                                 │
    """)
elif model_name == 'GMF':
    print("""
    User ID ──→ Embedding ──→ [32] ─┐
                                     ├─→ Element-wise Product ─→ [32] ─→ Linear(32→1) ─→ Score
    Item ID ──→ Embedding ──→ [32] ─┘
    """)
else:  # MLP
    mlp_embed_dim = factor_num * (2 ** (num_layers - 1))
    print(f"""
    User ID ──→ Embedding ──→ [{mlp_embed_dim}] ─┐
                                                 ├─→ Concatenate ─→ [{mlp_embed_dim * 2}]
    Item ID ──→ Embedding ──→ [{mlp_embed_dim}] ─┘
    
    [{mlp_embed_dim * 2}] ─→ Dropout ─→ Linear ─→ ReLU ─→ ... (MLP layers) ... ─→ [{factor_num}] ─→ Linear({factor_num}→1) ─→ Score
    """)

print("=" * 70)



STEP 5.2: Creating and Initializing Model
✓ Model on CPU

✓ Model created successfully!
  - Total parameters: 1,574,657
  - Trainable parameters: 1,574,657

Model Architecture:
  - Users: 6,038
  - Items: 3,533
  - GMF embeddings: 32 dimensions
  - MLP embeddings: 128 dimensions
  - MLP layers: 3 (with dropout=0.0)
  - Prediction layer: 64 → 1

STEP 5.3: Detailed Model Architecture

COMPLETE MODEL STRUCTURE:
NCF(
  (embed_user_GMF): Embedding(6038, 32)
  (embed_item_GMF): Embedding(3533, 32)
  (embed_user_MLP): Embedding(6038, 128)
  (embed_item_MLP): Embedding(3533, 128)
  (MLP_layers): Sequential(
    (0): Dropout(p=0.0, inplace=False)
    (1): Linear(in_features=256, out_features=128, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=128, out_features=64, bias=True)
    (5): ReLU()
    (6): Dropout(p=0.0, inplace=False)
    (7): Linear(in_features=64, out_features=32, bias=True)
    (8): ReLU()
  )
  (predict_layer): Linear(in_features=64

### Detailed Explanation of Step 5:

#### 5.1 Model Architecture Overview

**What is NCF?**
- Neural Collaborative Filtering combines traditional matrix factorization with deep learning
- Learns user and item embeddings (dense vector representations)
- Uses neural networks to model complex user-item interactions

**Three Model Variants:**

1. **GMF (Generalized Matrix Factorization)**:
   - **Path**: Embedding → Element-wise product → Prediction
   - **Interaction**: Linear (element-wise multiplication)
   - **Similar to**: Traditional matrix factorization
   - **Use case**: Simple, fast, good baseline

2. **MLP (Multi-Layer Perceptron)**:
   - **Path**: Embedding → Concatenate → MLP layers → Prediction
   - **Interaction**: Non-linear (deep neural network)
   - **Advantage**: Can learn complex patterns
   - **Use case**: When you need non-linear interactions

3. **NeuMF (Neural Matrix Factorization)**:
   - **Path**: Both GMF and MLP paths → Concatenate → Prediction
   - **Interaction**: Both linear and non-linear
   - **Advantage**: Best of both worlds
   - **Use case**: Best performance (recommended)

---

#### 5.2 Embedding Layers - Deep Dive

**What are Embeddings?**
- Convert discrete IDs (user/item IDs) to continuous vectors
- Each user/item gets a dense vector representation
- These vectors are learned during training

**GMF Embeddings:**
- Dimension: `factor_num` (e.g., 32)
- Purpose: Linear interactions via element-wise product
- Example: User embedding [0.1, 0.5, -0.3, ...] × Item embedding [0.2, 0.1, 0.4, ...]

**MLP Embeddings:**
- Dimension: `factor_num * 2^(num_layers-1)` (e.g., 32 * 2^2 = 128 for 3 layers)
- Purpose: Non-linear interactions via deep network
- Larger dimension allows more complex patterns

**Why Different Embedding Dimensions?**
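- GMF: The element-wise product keeps the embedding width, so the embeddings are `factor_num` wide
- MLP: Each layer halves the width, so the embeddings start at `factor_num * 2^(num_layers-1)`; after concatenation and `num_layers` halvings the tower output is again `factor_num` wide, matching the GMF output
- NeuMF: Uses both paths with separate embeddings, so each path can learn its own representation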
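
The dimension rule above can be checked with a few lines of plain Python. This is a minimal sketch; the helper name `mlp_dimensions` is hypothetical and not part of the notebook's `NCF` class, but the arithmetic mirrors the formulas printed in Step 5.3.

```python
def mlp_dimensions(factor_num, num_layers):
    """Return (embedding_dim, [(in_features, out_features), ...]) for the MLP tower."""
    # Each MLP embedding is factor_num * 2^(num_layers - 1) wide
    embed_dim = factor_num * 2 ** (num_layers - 1)
    layers = []
    for i in range(num_layers):
        # Layer i receives factor_num * 2^(num_layers - i) features and halves them
        in_size = factor_num * 2 ** (num_layers - i)
        layers.append((in_size, in_size // 2))
    return embed_dim, layers


embed_dim, layers = mlp_dimensions(factor_num=32, num_layers=3)
print(embed_dim)  # 128
print(layers)     # [(256, 128), (128, 64), (64, 32)]
```

The last layer always ends at `factor_num` features, which is why the NeuMF prediction layer sees `2 * factor_num` inputs.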

---

#### 5.3 MLP Layers Architecture

**Layer Structure:**
- Each layer: Dropout → Linear → ReLU
- Dimensions decrease by half each layer
- Example with factor_num=32, num_layers=3:
  ```
  Input: 256 (128 user + 128 item embeddings concatenated)
  Layer 1: 256 → 128 (with dropout and ReLU)
  Layer 2: 128 → 64 (with dropout and ReLU)
  Layer 3: 64 → 32 (with dropout and ReLU)
  Output: 32 dimensions
  ```

**Why Decreasing Dimensions?**
- Compresses information progressively
- Forces model to learn important features
- Reduces overfitting

**Dropout:**
- Randomly sets some neurons to zero during training
- Prevents overfitting
- Only active during training (not during evaluation)

**ReLU Activation:**
- Introduces non-linearity
- Allows model to learn complex patterns
- Formula: ReLU(x) = max(0, x)
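
To make the two building blocks concrete, here is a small pure-Python sketch of ReLU and of "inverted" dropout, the variant in which surviving values are scaled by `1 / (1 - p)` during training so that nothing needs rescaling at evaluation time. The function names are illustrative only; the notebook itself relies on `nn.ReLU` and `nn.Dropout`.

```python
import random


def relu(values):
    """ReLU(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def dropout(values, p, training, rng=random):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: values pass through unchanged
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


x = [-1.0, 0.5, 2.0, -0.2]
print(relu(x))                            # [0.0, 0.5, 2.0, 0.0]
print(dropout(x, p=0.5, training=False))  # [-1.0, 0.5, 2.0, -0.2]
print(dropout(x, p=0.5, training=True))   # random: each entry is either 0.0 or doubled
```

With `dropout_rate = 0.0`, as in the configuration printed above, the dropout layers are effectively no-ops.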

---

#### 5.4 Forward Pass - Step by Step

**For NeuMF (combines GMF + MLP):**

1. **Input**: User IDs and Item IDs (batch of integers)
   ```
   user = [1, 5, 10, ...]  # batch_size user IDs
   item = [3, 7, 2, ...]   # batch_size item IDs
   ```

2. **GMF Path**:
   ```
   embed_user_GMF = Embedding(user)  # [batch_size, 32]
   embed_item_GMF = Embedding(item)   # [batch_size, 32]
   output_GMF = embed_user_GMF * embed_item_GMF  # Element-wise product
   ```

3. **MLP Path**:
   ```
   embed_user_MLP = Embedding(user)  # [batch_size, 128]
   embed_item_MLP = Embedding(item)   # [batch_size, 128]
   interaction = concat([embed_user_MLP, embed_item_MLP])  # [batch_size, 256]
   output_MLP = MLP_layers(interaction)  # [batch_size, 32]
   ```

4. **Combine**:
   ```
   concat = concat([output_GMF, output_MLP])  # [batch_size, 64]
   ```

5. **Predict**:
   ```
   prediction = Linear(concat)  # [batch_size, 1]
   ```

6. **Output**: Interaction scores (higher = more likely user will like item)
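
The six steps can be traced by hand on a toy example. The sketch below uses plain Python lists instead of tensors, a single user-item pair instead of a batch, and made-up weights (`factor_num=2`, one MLP layer); the helper names `linear`, `relu`, and `neumf_forward` are hypothetical and only mirror the structure described above.

```python
def linear(x, weights, bias):
    """Fully connected layer: y[j] = sum_i x[i] * weights[j][i] + bias[j]."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def neumf_forward(user_gmf, item_gmf, user_mlp, item_mlp,
                  mlp_layers, predict_w, predict_b):
    # GMF path: element-wise product of the two GMF embeddings
    gmf_out = [u * i for u, i in zip(user_gmf, item_gmf)]
    # MLP path: concatenate the MLP embeddings, then Linear -> ReLU per layer
    h = user_mlp + item_mlp
    for weights, bias in mlp_layers:
        h = relu(linear(h, weights, bias))
    # Combine both paths and apply the final linear layer (one output score)
    concat = gmf_out + h
    return sum(c * w for c, w in zip(concat, predict_w)) + predict_b


score = neumf_forward(
    user_gmf=[1.0, 2.0], item_gmf=[0.5, -1.0],          # GMF embeddings (width 2)
    user_mlp=[1.0, 0.0], item_mlp=[0.0, 1.0],           # MLP embeddings (width 2)
    mlp_layers=[([[1.0, 0.0, 0.0, 1.0],
                  [-1.0, 0.0, 0.0, 0.0]], [0.0, 0.0])],  # one layer: 4 -> 2
    predict_w=[1.0, 1.0, 1.0, 1.0], predict_b=0.5,       # prediction layer: 4 -> 1
)
print(score)  # 1.0
```

Here the GMF path yields `[0.5, -2.0]`, the MLP path yields `[2.0, 0.0]` after ReLU, and the prediction layer sums the four values plus the bias.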

---

#### 5.5 Weight Initialization

**Why Initialize Weights?**
- Random initialization helps training start properly
- Prevents vanishing/exploding gradients
- Different strategies for different layer types

**Initialization Strategies:**

1. **Embeddings**: Normal distribution (std=0.01)
   - Small values prevent large initial gradients
   - Allows gradual learning

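2. **MLP Layers**: Xavier uniform
   - Keeps the variance of activations roughly constant across layers
   - Originally derived for tanh-like activations; Kaiming (He) initialization is the ReLU-specific variant

3. **Prediction Layer**: Kaiming uniform
   - The sigmoid is applied to this layer's output (inside the loss function)
   - Small initial weights keep the initial logits away from the saturated ends of the sigmoid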

4. **Biases**: Zero
   - Common practice
   - Model learns bias values during training
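
For intuition about scale, the sketch below computes the sampling bounds of the two uniform schemes using their standard formulas: Xavier uniform draws from `U(-a, a)` with `a = gain * sqrt(6 / (fan_in + fan_out))`, and Kaiming uniform uses `a = gain * sqrt(3 / fan_in)`. The `gain = 1.0` default is an assumption here; the gain actually passed in the notebook's `NCF` class is not shown in this section.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound a of U(-a, a) for Xavier/Glorot uniform initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def kaiming_uniform_bound(fan_in, gain=1.0):
    """Bound a of U(-a, a) for Kaiming/He uniform initialization (fan_in mode)."""
    return gain * math.sqrt(3.0 / fan_in)


# MLP tower for factor_num=32, num_layers=3
for fan_in, fan_out in [(256, 128), (128, 64), (64, 32)]:
    bound = xavier_uniform_bound(fan_in, fan_out)
    print(f"Linear({fan_in}, {fan_out}): weights in [-{bound:.4f}, {bound:.4f}]")

# Prediction layer Linear(64, 1)
print(f"Linear(64, 1): weights in ±{kaiming_uniform_bound(64):.4f}")
```

Both bounds are well above the `std=0.01` used for the embeddings, so the embeddings start much closer to zero than the dense layers.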

**Pre-trained Initialization (NeuMF-pre):**
- Copies weights from separately trained GMF and MLP models
- Combines their prediction layers
- Usually gives the best performance, but GMF and MLP must each be trained first, so total training time is longer

---

#### 5.6 Model Parameters

**Parameter Count Example:**
- Users: 6,000, Items: 4,000, factor_num=32, num_layers=3
- GMF embeddings: 6,000×32 + 4,000×32 = 320,000
- MLP embeddings: 6,000×128 + 4,000×128 = 1,280,000
- MLP layers (weights + biases): 32,896 + 8,256 + 2,080 = 43,232
- Prediction: 64×1 weights + 1 bias = 65
- **Total**: 1,643,297 (~1.64 million) parameters

**Memory Usage:**
- Each parameter: 4 bytes (float32)
- Model size: ~6.6 MB (about 6.3 MiB)
- Almost all of it is the embedding tables, so size grows linearly with the number of users and items
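
The same count can be scripted. The sketch below assumes the NeuMF variant and uses a hypothetical helper name, `count_neumf_parameters`; it follows the breakdown printed in Step 5.3 (embeddings, MLP weights plus biases, prediction weights plus bias).

```python
def count_neumf_parameters(user_num, item_num, factor_num, num_layers):
    """Parameter breakdown for NeuMF (GMF path + MLP path + prediction layer)."""
    gmf_embed = (user_num + item_num) * factor_num
    mlp_embed_dim = factor_num * 2 ** (num_layers - 1)
    mlp_embed = (user_num + item_num) * mlp_embed_dim
    mlp_layers = 0
    for i in range(num_layers):
        in_size = factor_num * 2 ** (num_layers - i)
        out_size = in_size // 2
        mlp_layers += in_size * out_size + out_size  # weights + biases
    predict = 2 * factor_num + 1                     # weights + bias
    total = gmf_embed + mlp_embed + mlp_layers + predict
    return {"gmf_embed": gmf_embed, "mlp_embed": mlp_embed,
            "mlp_layers": mlp_layers, "predict": predict, "total": total}


example = count_neumf_parameters(6000, 4000, factor_num=32, num_layers=3)
print(example["total"])                             # 1643297
print(f"{example['total'] * 4 / 1e6:.1f} MB")       # 6.6 MB

notebook = count_neumf_parameters(6038, 3533, factor_num=32, num_layers=3)
print(notebook["total"])                            # 1574657
```

The second call uses the user and item counts printed in Step 5.2 and reproduces the 1,574,657 parameters reported there.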

---

**✅ Step 5 Complete!**

We now have:
- Complete NCF model architecture
- Model initialized and ready for training
- Understanding of how embeddings and layers work

---

## Step 6: Evaluation Metrics

In this step, we'll implement evaluation metrics to measure how well our model performs:
1. **Hit Rate (HR@K)**: Percentage of test cases where the true item is in top-K recommendations
2. **NDCG@K (Normalized Discounted Cumulative Gain)**: Measures ranking quality, giving more weight to items ranked higher

These metrics are standard in recommendation systems and help us understand model performance.

Let's implement this step by step.


In [10]:
# ============================================================================
# STEP 6: EVALUATION METRICS
# ============================================================================

"""
This step implements evaluation metrics for recommendation systems:
- Hit Rate (HR@K): Binary metric - is the true item in top K?
- NDCG (Normalized Discounted Cumulative Gain@K): Ranking quality metric
"""

# ============================================================================
# 6.1 HIT RATE METRIC
# ============================================================================

def hit(gt_item, pred_items):
    """
    Calculate Hit Rate for a single test case.
    
    Hit Rate is 1 if the ground truth item is in the predicted top-K items,
    otherwise 0.
    
    Parameters:
    - gt_item: Ground truth item ID (the item user actually interacted with)
    - pred_items: List of top-K predicted item IDs (recommended items)
    
    Returns:
    - 1 if gt_item is in pred_items, 0 otherwise
    """
    if gt_item in pred_items:
        return 1
    return 0

print("=" * 70)
print("STEP 6.1: Hit Rate Metric")
print("=" * 70)
print("✓ Hit Rate function defined")
print("  - Returns 1 if true item is in top-K recommendations")
print("  - Returns 0 otherwise")


STEP 6.1: Hit Rate Metric
✓ Hit Rate function defined
  - Returns 1 if true item is in top-K recommendations
  - Returns 0 otherwise


In [11]:
# ============================================================================
# 6.2 NDCG METRIC
# ============================================================================

def ndcg(gt_item, pred_items):
    """
    Calculate Normalized Discounted Cumulative Gain (NDCG) for a single test case.
    
    NDCG measures ranking quality by:
    1. Giving more weight to items ranked higher (position matters)
    2. Using logarithmic discounting (relevance decreases with position)
    
    Formula: NDCG = 1 / log2(position + 2)
    - Position 0 (top): 1 / log2(2) = 1.0
    - Position 1: 1 / log2(3) ≈ 0.63
    - Position 2: 1 / log2(4) = 0.5
    - Position 9: 1 / log2(11) ≈ 0.29
    
    Parameters:
    - gt_item: Ground truth item ID (the item user actually interacted with)
    - pred_items: List of top-K predicted item IDs (recommended items)
    
    Returns:
    - NDCG score (0.0 to 1.0) if gt_item is in pred_items
    - 0.0 if gt_item is not in pred_items
    """
    if gt_item in pred_items:
        # Find the position (index) of the ground truth item
        index = pred_items.index(gt_item)
        # Calculate NDCG: 1 / log2(position + 2)
        # +2 because: position 0 should give 1/log2(2) = 1.0
        return np.reciprocal(np.log2(index + 2))
    return 0.0

print("=" * 70)
print("STEP 6.2: NDCG Metric")
print("=" * 70)
print("✓ NDCG function defined")
print("  - Measures ranking quality")
print("  - Higher score for items ranked higher")
print("  - Returns 0 if true item not in recommendations")

# Example to demonstrate NDCG
print("\nNDCG Examples:")
example_items = [10, 20, 30, 40, 50]
print(f"  Top-5 recommendations: {example_items}")
print(f"  If true item is at position 0: NDCG = {ndcg(10, example_items):.3f}")
print(f"  If true item is at position 2: NDCG = {ndcg(30, example_items):.3f}")
print(f"  If true item is at position 4: NDCG = {ndcg(50, example_items):.3f}")
print(f"  If true item not in list: NDCG = {ndcg(99, example_items):.3f}")


STEP 6.2: NDCG Metric
✓ NDCG function defined
  - Measures ranking quality
  - Higher score for items ranked higher
  - Returns 0 if true item not in recommendations

NDCG Examples:
  Top-5 recommendations: [10, 20, 30, 40, 50]
  If true item is at position 0: NDCG = 1.000
  If true item is at position 2: NDCG = 0.500
  If true item is at position 4: NDCG = 0.387
  If true item not in list: NDCG = 0.000


In [12]:
# ============================================================================
# 6.3 EVALUATION FUNCTION
# ============================================================================

def evaluate_metrics(model, test_loader, top_k, device='cuda'):
    """
    Evaluate model performance on test data.
    
    This function:
    1. For each test case (1 positive + 99 negatives):
       - Gets model predictions for all 100 items
       - Selects top-K items with highest scores
       - Checks if the true item is in top-K (Hit Rate)
       - Calculates NDCG based on true item's position
    2. Averages metrics across all test cases
    
    Parameters:
    - model: Trained NCF model
    - test_loader: DataLoader with test data
    - top_k: Number of top items to consider (e.g., 10)
    - device: 'cuda' or 'cpu'
    
    Returns:
    - mean_HR: Average Hit Rate across all test cases
    - mean_NDCG: Average NDCG across all test cases
    """
    model.eval()  # Set model to evaluation mode (disables dropout)
    
    HR_list = []  # List to store Hit Rate for each test case
    NDCG_list = []  # List to store NDCG for each test case
    
    with torch.no_grad():  # Disable gradient computation (faster, saves memory)
        for user, item, label in test_loader:
            # Move data to device (GPU or CPU)
            if device == 'cuda' and torch.cuda.is_available():
                user = user.cuda()
                item = item.cuda()
            else:
                device = 'cpu'
            
            # Get model predictions for all items in this batch
            # Batch size = test_num_ng + 1 = 100 (1 positive + 99 negatives)
            predictions = model(user, item)  # [100] tensor of scores
            
            # Get top-K items with highest prediction scores
            # torch.topk returns (values, indices)
            _, indices = torch.topk(predictions, top_k)
            
            # Get the actual item IDs for top-K recommendations
            # torch.take extracts items at given indices
            recommends = torch.take(item, indices).cpu().numpy().tolist()
            
            # The first item in the batch is always the positive (true) item
            gt_item = item[0].item()  # Ground truth item ID
            
            # Calculate metrics for this test case
            HR_list.append(hit(gt_item, recommends))
            NDCG_list.append(ndcg(gt_item, recommends))
    
    # Calculate average metrics across all test cases
    mean_HR = np.mean(HR_list)
    mean_NDCG = np.mean(NDCG_list)
    
    return mean_HR, mean_NDCG

print("=" * 70)
print("STEP 6.3: Evaluation Function")
print("=" * 70)
print("✓ Evaluation function defined")
print("  - Evaluates model on test data")
print("  - Calculates average Hit Rate and NDCG")
print("  - Works with GPU or CPU")

# Determine device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"\n✓ Evaluation ready (device: {device})")


STEP 6.3: Evaluation Function
✓ Evaluation function defined
  - Evaluates model on test data
  - Calculates average Hit Rate and NDCG
  - Works with GPU or CPU

✓ Evaluation ready (device: cpu)


### Detailed Explanation of Step 6:

#### 6.1 Hit Rate (HR@K) - Binary Metric

**What is Hit Rate?**
- Measures whether the true item appears in the top-K recommendations
- Binary metric: 1 if found, 0 if not found
- Simple and intuitive: "Did we recommend the right item?"

**Example:**
```
True item: Movie #42
Top-10 recommendations: [15, 23, 42, 7, 89, 12, 56, 3, 91, 8]
                                 ↑
                        Found at position 2 (0-indexed)
Hit Rate = 1 (item is in top-10)
```

**Why Hit Rate?**
- Users typically only see top-K recommendations
- If true item is in top-K, recommendation is successful
- Easy to interpret: "X% of test cases had correct item in top-K"

**Limitations:**
- Doesn't consider position (an item ranked first and an item ranked tenth both score 1)
- Binary: doesn't measure how good the ranking is
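
As a small illustration of how the per-case hits turn into HR@K, the sketch below averages the binary outcomes over a handful of made-up test cases. `hit` repeats the function from cell 6.1; `hit_rate_at_k` and the sample data are hypothetical.

```python
def hit(gt_item, pred_items):
    """1 if the ground-truth item is among the predicted items, else 0."""
    return 1 if gt_item in pred_items else 0


def hit_rate_at_k(test_cases, k):
    """Average hit over (ground_truth, ranked_items) pairs, using only the top k."""
    hits = [hit(gt, ranked[:k]) for gt, ranked in test_cases]
    return sum(hits) / len(hits)


cases = [
    (42, [15, 23, 42, 7, 89]),  # hit at position 2
    (7,  [7, 1, 2, 3, 4]),      # hit at position 0
    (99, [5, 6, 8, 9, 10]),     # miss
    (3,  [1, 2, 4, 5, 3]),      # hit at position 4
]
print(hit_rate_at_k(cases, k=5))  # 0.75
print(hit_rate_at_k(cases, k=3))  # 0.5
```

Shrinking K from 5 to 3 drops the last case out of the hits, which is the only way position affects this metric.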

---

#### 6.2 NDCG (Normalized Discounted Cumulative Gain) - Ranking Metric

**What is NDCG?**
- Measures ranking quality, not just presence
- Gives more weight to items ranked higher
- Uses logarithmic discounting (relevance decreases with position)

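**NDCG Formula** (each test case has exactly one relevant item, so the ideal DCG is 1 and NDCG reduces to a single term; `position` is 0-indexed):
```
NDCG = 1 / log2(position + 2)   if the true item is in the top-K
NDCG = 0                        otherwise
```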

**Position vs NDCG Score:**
| Position | NDCG Score | Meaning |
|----------|-----------|---------|
| 0 (top)  | 1.000     | Perfect! Item ranked #1 |
| 1        | 0.631     | Item ranked #2 |
| 2        | 0.500     | Item ranked #3 |
| 3        | 0.431     | Item ranked #4 |
| 4        | 0.387     | Item ranked #5 |
| 9        | 0.289     | Item ranked #10 |

**Why Logarithmic Discounting?**
- Position 0 → 1: Score drops from 1.000 to 0.631 (the top slot matters most)
- Position 8 → 9: Score drops only from 0.301 to 0.289 (both are far down the list)
- The logarithmic discount captures this diminishing importance

**Example:**
```
True item: Movie #42
Top-10 recommendations: [15, 23, 42, 7, 89, 12, 56, 3, 91, 8]
                                 ↑
                        Found at position 2 (0-indexed)
NDCG = 1 / log2(2 + 2) = 1 / log2(4) = 1 / 2 = 0.5
```

**Why NDCG?**
- Considers position: Better ranking = Higher score
- More informative than Hit Rate
- Standard metric in information retrieval and recommendation systems

**Limitations:**
- More complex than Hit Rate
- Requires understanding of logarithmic discounting
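
The one-line formula used in this notebook is a special case of the general definition NDCG = DCG / IDCG. The sketch below spells that out for binary relevance with a single relevant item; the function names `dcg` and `ndcg_single_positive` are illustrative and not part of the notebook.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel / log2(position + 2), position 0-indexed."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg_single_positive(gt_item, pred_items):
    """NDCG when exactly one item (gt_item) is relevant."""
    relevances = [1.0 if item == gt_item else 0.0 for item in pred_items]
    ideal_dcg = dcg([1.0])  # best case: the relevant item sits at position 0
    return dcg(relevances) / ideal_dcg


top10 = [15, 23, 42, 7, 89, 12, 56, 3, 91, 8]
print(ndcg_single_positive(42, top10))            # 0.5
print(round(ndcg_single_positive(8, top10), 3))   # 0.289
print(ndcg_single_positive(99, top10))            # 0.0
```

Because the ideal DCG is `1 / log2(2) = 1`, dividing by it changes nothing, which is why the notebook's `ndcg` function can return `1 / log2(index + 2)` directly.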

---

#### 6.3 Evaluation Process

**How Evaluation Works:**

1. **For each test case** (1 positive + 99 negatives):
   ```
   User: 123
   Items: [42 (positive), 1, 5, 7, 9, ... (99 negatives)]
   ```

2. **Get predictions**:
   ```
   Model scores: [0.8, 0.3, 0.2, 0.1, 0.05, ...]
   Item 42 gets score 0.8 (highest!)
   ```

3. **Select top-K** (e.g., K=10):
   ```
   Top-10 items: [42, 1, 5, 7, 9, 12, 15, 18, 20, 23]
   ```

4. **Calculate metrics**:
   - Hit Rate: Is 42 in top-10? Yes → HR = 1
   - NDCG: Position of 42? Position 0 → NDCG = 1.0

5. **Average across all test cases**:
   ```
   Mean HR = (1 + 0 + 1 + 1 + ...) / N
   Mean NDCG = (1.0 + 0.0 + 0.5 + 0.63 + ...) / N
   ```

**Test Data Structure:**
- Each batch: 100 items (1 positive + 99 negatives)
- Model should rank the positive item higher than negatives
- We measure if positive is in top-K

**Why 99 Negatives?**
- Simulates real-world scenario: recommend 1 from 100 candidates
- Standard evaluation protocol (used in research papers)
- Makes evaluation realistic and challenging
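
The per-case procedure can be sketched without PyTorch. The example below ranks a tiny candidate list by score and computes both metrics; `evaluate_case` and the sample scores are hypothetical, and the convention that the positive item comes first in the candidate list follows the notebook's `evaluate_metrics`.

```python
import math


def evaluate_case(items, scores, top_k):
    """Return (hit, ndcg) for one test case; items[0] is the positive item."""
    # Rank candidate indices by descending score and keep the top k
    order = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    recommends = [items[i] for i in order[:top_k]]
    gt_item = items[0]
    if gt_item in recommends:
        position = recommends.index(gt_item)
        return 1, 1.0 / math.log2(position + 2)
    return 0, 0.0


items = [42, 1, 5, 7, 9]               # 1 positive followed by 4 negatives
scores = [0.8, 0.3, 0.9, 0.1, 0.05]    # item 5 outscores the positive
hr, ndcg = evaluate_case(items, scores, top_k=3)
print(hr, round(ndcg, 3))  # 1 0.631
```

In this toy case the positive lands at position 1, so the hit still counts but NDCG drops from 1.0 to about 0.631.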

---

#### 6.4 Understanding the Results

**Good Performance:**
- HR@10 > 0.6: 60% of test cases have true item in top-10
- NDCG@10 > 0.4: Good ranking quality on average

**Excellent Performance:**
- HR@10 > 0.7: 70% success rate
- NDCG@10 > 0.5: Very good ranking quality

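**What to Expect** (rough guidelines for this 100-candidate protocol; actual values depend on the dataset):
- Random baseline: HR@10 ≈ 0.1 (10 of the 100 candidates land in the top 10 by chance)
- Good model: HR@10 ≈ 0.6-0.7
- State-of-the-art: HR@10 > 0.7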
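
The random baseline can be derived rather than guessed. Assuming a model that ranks the candidates uniformly at random, the positive item is equally likely to land at any of the 100 positions, so the expected metrics follow directly; the helper name `random_baseline` is hypothetical.

```python
import math


def random_baseline(num_candidates, top_k):
    """Expected HR@K and NDCG@K when the positive's rank is uniformly random."""
    # Probability that the positive falls in the top k
    expected_hr = top_k / num_candidates
    # Each top-k position is hit with probability 1/num_candidates
    expected_ndcg = sum(1.0 / math.log2(pos + 2) for pos in range(top_k)) / num_candidates
    return expected_hr, expected_ndcg


hr, ndcg = random_baseline(num_candidates=100, top_k=10)
print(f"HR@10 = {hr:.3f}, NDCG@10 = {ndcg:.3f}")  # HR@10 = 0.100, NDCG@10 = 0.045
```

A first-epoch model that scores only slightly above these numbers is barely better than chance.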

**Interpreting Results:**
- **NDCG far below HR**: Model finds the items but places them near the bottom of the top-K (under this protocol NDCG can never exceed HR)
- **NDCG close to HR**: Model ranks the true items at or near the top
- **Both low**: Model needs more training or better architecture

---

**✅ Step 6 Complete!**

We now have:
- Hit Rate metric for binary evaluation
- NDCG metric for ranking quality
- Complete evaluation function ready to use

---

## Step 7: Training Loop

In this step, we'll implement the complete training process:
1. **Loss Function**: Binary Cross-Entropy with Logits (for binary classification)
2. **Optimizer**: Adam optimizer (adaptive learning rate)
3. **Training Loop**: Iterate through epochs, train on batches, evaluate periodically
4. **Model Saving**: Save the model whenever HR@K on the test set improves (the loop below uses no separate validation split)

This is where the model learns to make good recommendations!

Let's implement this step by step.
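
Before the training code, it may help to see why the loss is computed from logits rather than from probabilities. The sketch below compares the textbook formula (sigmoid first, then cross-entropy) with the standard numerically stable rewrite `max(x, 0) - x*y + log(1 + exp(-|x|))` for a single sample. Both function names are illustrative; this shows the idea behind `nn.BCEWithLogitsLoss`, not PyTorch's actual implementation.

```python
import math


def naive_bce(logit, label):
    """Sigmoid followed by binary cross-entropy; breaks for large |logit|."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def bce_with_logits(logit, label):
    """Algebraically equivalent form that never takes log(0) or overflows exp()."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


print(round(naive_bce(2.0, 1.0), 6), round(bce_with_logits(2.0, 1.0), 6))  # same value
print(bce_with_logits(1000.0, 0.0))  # 1000.0

try:
    naive_bce(1000.0, 0.0)           # sigmoid rounds to exactly 1.0, so log(1 - p) fails
except ValueError as err:
    print("naive version failed:", err)
```

For moderate logits the two agree; for a confidently wrong prediction the naive version hits `log(0)` while the stable version returns a finite loss with a usable gradient.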


In [13]:
# ============================================================================
# STEP 7: TRAINING LOOP
# ============================================================================

"""
This step implements the complete training process for the NCF model.
"""

# ============================================================================
# 7.1 SETUP LOSS FUNCTION AND OPTIMIZER
# ============================================================================

print("=" * 70)
print("STEP 7.1: Setting Up Loss Function and Optimizer")
print("=" * 70)

# Loss Function: Binary Cross-Entropy with Logits
# This combines sigmoid activation + binary cross-entropy loss
# More numerically stable than applying sigmoid separately
# 
# Why BCEWithLogitsLoss?
# - Our task: Predict if user will like item (binary: 1 or 0)
# - Model outputs raw scores (logits), not probabilities
# - BCEWithLogitsLoss applies sigmoid internally and computes loss
# - More stable than: sigmoid(output) then BCE(sigmoid_output, label)

loss_function = nn.BCEWithLogitsLoss()
print("✓ Loss function: BCEWithLogitsLoss")
print("  - For binary classification (like/dislike)")
print("  - Combines sigmoid + cross-entropy for stability")

# Optimizer: Adam (Adaptive Moment Estimation)
# Adam is an adaptive learning rate optimizer that:
# - Adjusts learning rate per parameter
# - Uses momentum (moving average of gradients)
# - Works well for most deep learning tasks
# - Usually needs less learning-rate tuning than plain SGD

optimizer = optim.Adam(ncf_model.parameters(), lr=learning_rate)
print(f"\n✓ Optimizer: Adam")
print(f"  - Learning rate: {learning_rate}")
print(f"  - Adaptive: adjusts learning rate automatically")

# Count trainable parameters
total_params = sum(p.numel() for p in ncf_model.parameters() if p.requires_grad)
print(f"\n✓ Model ready for training")
print(f"  - Trainable parameters: {total_params:,}")


STEP 7.1: Setting Up Loss Function and Optimizer
✓ Loss function: BCEWithLogitsLoss
  - For binary classification (like/dislike)
  - Combines sigmoid + cross-entropy for stability

✓ Optimizer: Adam
  - Learning rate: 0.001
  - Adaptive: adjusts learning rate automatically

✓ Model ready for training
  - Trainable parameters: 1,574,657


In [14]:
# ============================================================================
# 7.2 TRAINING LOOP
# ============================================================================

print("\n" + "=" * 70)
print("STEP 7.2: Starting Training")
print("=" * 70)
print(f"Training for {epochs} epochs...")
print(f"Model: {model_name}")
print(f"Device: {device}")
print("=" * 70)

# Track best performance
best_hr = 0.0
best_ndcg = 0.0
best_epoch = 0
training_history = {
    'epoch': [],
    'hr': [],
    'ndcg': [],
    'time': []
}

# Training loop
for epoch in range(epochs):
    # ========================================================================
    # TRAINING PHASE
    # ========================================================================
    
    # Set model to training mode
    # This enables dropout and other training-specific behaviors
    ncf_model.train()
    
    # Start timer for this epoch
    epoch_start_time = time.time()
    
    # Generate negative samples for this epoch
    # Important: We generate fresh negatives each epoch for better learning
    print(f"\nEpoch {epoch+1}/{epochs}")
    print("-" * 70)
    train_dataset.ng_sample()
    
    # Track loss for this epoch
    epoch_loss = 0.0
    num_batches = 0
    
    # Iterate through training batches
    for batch_idx, (user, item, label) in enumerate(train_loader):
        # Move data to device (GPU or CPU); labels must be float for BCEWithLogitsLoss
        label = label.float()
        if device == 'cuda' and torch.cuda.is_available():
            user = user.cuda()
            item = item.cuda()
            label = label.cuda()
        
        # ================================================================
        # FORWARD PASS
        # ================================================================
        # Clear gradients from previous iteration
        optimizer.zero_grad()
        
        # Get model predictions (raw scores/logits)
        prediction = ncf_model(user, item)  # [batch_size]
        
        # ================================================================
        # COMPUTE LOSS
        # ================================================================
        # Compare predictions with true labels (1 for positive, 0 for negative)
        loss = loss_function(prediction, label)
        
        # ================================================================
        # BACKWARD PASS
        # ================================================================
        # Compute gradients
        loss.backward()
        
        # Update model weights
        optimizer.step()
        
        # Track loss
        epoch_loss += loss.item()
        num_batches += 1
        
        # Print progress every 100 batches
        if (batch_idx + 1) % 100 == 0:
            avg_loss = epoch_loss / num_batches
            print(f"  Batch {batch_idx+1}/{len(train_loader)} - Loss: {avg_loss:.4f}")
    
    # Calculate average loss for this epoch
    avg_loss = epoch_loss / num_batches if num_batches > 0 else 0.0
    
    # ========================================================================
    # EVALUATION PHASE
    # ========================================================================
    
    # Set model to evaluation mode
    # This disables dropout and uses deterministic behavior
    ncf_model.eval()
    
    # Evaluate on test set
    print("  Evaluating on test set...")
    HR, NDCG = evaluate_metrics(ncf_model, test_loader, top_k, device)
    
    # Calculate elapsed time
    elapsed_time = time.time() - epoch_start_time
    time_str = time.strftime("%H:%M:%S", time.gmtime(elapsed_time))
    
    # Store history
    training_history['epoch'].append(epoch + 1)
    training_history['hr'].append(HR)
    training_history['ndcg'].append(NDCG)
    training_history['time'].append(elapsed_time)
    
    # Print epoch results
    print(f"  Time: {time_str}")
    print(f"  Loss: {avg_loss:.4f}")
    print(f"  HR@{top_k}: {HR:.4f}")
    print(f"  NDCG@{top_k}: {NDCG:.4f}")
    
    # ========================================================================
    # SAVE BEST MODEL
    # ========================================================================
    
    # Check if this is the best model so far
    if HR > best_hr:
        best_hr = HR
        best_ndcg = NDCG
        best_epoch = epoch + 1
        
        print(f"  ✓ New best model! (HR@{top_k}: {HR:.4f})")
        
        # Save model if enabled
        if save_model:
            os.makedirs(model_path, exist_ok=True)
            
            model_filename = os.path.join(model_path, f'{model_name}.pth')
            torch.save(ncf_model, model_filename)
            print(f"  ✓ Model saved to {model_filename}")
    else:
        print(f"  (Best: HR@{top_k}: {best_hr:.4f} at epoch {best_epoch})")
    
    print("-" * 70)

# ========================================================================
# TRAINING COMPLETE
# ========================================================================

print("\n" + "=" * 70)
print("TRAINING COMPLETE!")
print("=" * 70)
print(f"Best model at epoch {best_epoch}:")
print(f"  HR@{top_k}: {best_hr:.4f}")
print(f"  NDCG@{top_k}: {best_ndcg:.4f}")
print("=" * 70)

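# Store the NeuMF-end model for later comparison.
# Note: ncf_model holds the weights from the final epoch; the best-HR
# checkpoint is the one written to disk (when save_model is enabled).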
ncf_model_neumf_end = ncf_model
best_hr_neumf_end = best_hr
best_ndcg_neumf_end = best_ndcg



STEP 7.2: Starting Training
Training for 20 epochs...
Model: NeuMF-end
Device: cpu

Epoch 1/20
----------------------------------------------------------------------
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
  Batch 100/8989 - Loss: 0.4856
  Batch 200/8989 - Loss: 0.4368
  Batch 300/8989 - Loss: 0.4145
  Batch 400/8989 - Loss: 0.4014
  Batch 500/8989 - Loss: 0.3922
  Batch 600/8989 - Loss: 0.3861
  Batch 700/8989 - Loss: 0.3830
  Batch 800/8989 - Loss: 0.3795
  Batch 900/8989 - Loss: 0.3767
  Batch 1000/8989 - Loss: 0.3740
  Batch 1100/8989 - Loss: 0.3719
  Batch 1200/8989 - Loss: 0.3707
  Batch 1300/8989 - Loss: 0.3699
  Batch 1400/8989 - Loss: 0.3687
  Batch 1500/8989 - Loss: 0.3673
  Batch 1600/8989 - Loss: 0.3664
  Batch 1700/8989 - Loss: 0.3652
  Batch 1800/8989 - Loss: 0.3641
  Batch 1900/8989 - Loss: 0.3631
  Batch 2000/8989 - Loss: 0.3626
  Batch 2100/8989 - Loss: 0.3622
  Batch 22

---

## Step 7.5: Train Models Separately (GMF, MLP, NeuMF-pre)

In this step, we'll train each model architecture separately:
1. **Train GMF model** - Generalized Matrix Factorization only
2. **Train MLP model** - Multi-Layer Perceptron only  
3. **Train NeuMF-pre** - Neural Matrix Factorization using pre-trained GMF and MLP weights

This approach (NeuMF-pre) typically gives the best performance as it leverages pre-trained components.


In [15]:
# ============================================================================
# STEP 7.5: TRAIN MODELS SEPARATELY (GMF, MLP, NeuMF-pre)
# ============================================================================

print("=" * 70)
print("STEP 7.5: Training Models Separately")
print("=" * 70)
print("This will train:")
print("  1. GMF model (Generalized Matrix Factorization)")
print("  2. MLP model (Multi-Layer Perceptron)")
print("  3. NeuMF-pre model (using pre-trained GMF and MLP weights)")
print("=" * 70)

# Store trained models
trained_models = {}

# ============================================================================
# 7.5.1 TRAIN GMF MODEL
# ============================================================================

print("\n" + "=" * 70)
print("STEP 7.5.1: Training GMF Model")
print("=" * 70)

# Create GMF model
gmf_model = NCF(
    user_num=user_num,
    item_num=item_num,
    factor_num=factor_num,
    num_layers=num_layers,
    dropout=dropout_rate,
    model_name='GMF',
    GMF_model=None,
    MLP_model=None
)

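# Move the model only when the batches are moved too (same condition as in the loop below)
if device == 'cuda' and torch.cuda.is_available():
    gmf_model = gmf_model.cuda()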

# Setup optimizer and loss
gmf_optimizer = optim.Adam(gmf_model.parameters(), lr=learning_rate)
gmf_loss_function = nn.BCEWithLogitsLoss()

# Training loop for GMF
print(f"Training GMF for {epochs} epochs...")
best_hr_gmf = 0.0
best_ndcg_gmf = 0.0
best_epoch_gmf = 0

for epoch in range(epochs):
    gmf_model.train()
    epoch_start_time = time.time()
    train_dataset.ng_sample()
    
    epoch_loss = 0.0
    num_batches = 0
    
    for batch_idx, (user, item, label) in enumerate(train_loader):
        # BCEWithLogitsLoss expects float labels
        label = label.float()
        if device == 'cuda' and torch.cuda.is_available():
            user = user.cuda()
            item = item.cuda()
            label = label.cuda()
        
        gmf_optimizer.zero_grad()
        prediction = gmf_model(user, item)
        loss = gmf_loss_function(prediction, label)
        loss.backward()
        gmf_optimizer.step()
        
        epoch_loss += loss.item()
        num_batches += 1
    
    avg_loss = epoch_loss / num_batches if num_batches > 0 else 0.0
    
    # Evaluate
    gmf_model.eval()
    HR, NDCG = evaluate_metrics(gmf_model, test_loader, top_k, device)
    
    elapsed_time = time.time() - epoch_start_time
    time_str = time.strftime("%H:%M:%S", time.gmtime(elapsed_time))
    
    print(f"Epoch {epoch+1}/{epochs} - Time: {time_str} - Loss: {avg_loss:.4f} - HR@{top_k}: {HR:.4f} - NDCG@{top_k}: {NDCG:.4f}")
    
    if HR > best_hr_gmf:
        best_hr_gmf = HR
        best_ndcg_gmf = NDCG
        best_epoch_gmf = epoch + 1
        if save_model:
            torch.save(gmf_model, GMF_model_path)
            print(f"  ✓ Saved best GMF model (HR@{top_k}: {HR:.4f})")

print(f"\n✓ GMF Training Complete!")
print(f"  Best epoch: {best_epoch_gmf}")
print(f"  Best HR@{top_k}: {best_hr_gmf:.4f}")
print(f"  Best NDCG@{top_k}: {best_ndcg_gmf:.4f}")

trained_models['GMF'] = {
    'model': gmf_model,
    'hr': best_hr_gmf,
    'ndcg': best_ndcg_gmf,
    'epoch': best_epoch_gmf
}


STEP 7.5: Training Models Separately
This will train:
  1. GMF model (Generalized Matrix Factorization)
  2. MLP model (Multi-Layer Perceptron)
  3. NeuMF-pre model (using pre-trained GMF and MLP weights)

STEP 7.5.1: Training GMF Model
Training GMF for 20 epochs...
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 1/20 - Time: 00:00:20 - Loss: 0.3577 - HR@10: 0.6525 - NDCG@10: 0.3838
  ✓ Saved best GMF model (HR@10: 0.6525)
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 2/20 - Time: 00:00:20 - Loss: 0.2979 - HR@10: 0.7042 - NDCG@10: 0.4259
  ✓ Saved best GMF model (HR@10: 0.7042)
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 3/20 - Time: 00:00:19 - Loss: 0.2781 - HR@10: 0.7249 - NDCG@10: 0.4432
  ✓

In [16]:
# ============================================================================
# 7.5.2 TRAIN MLP MODEL
# ============================================================================

print("\n" + "=" * 70)
print("STEP 7.5.2: Training MLP Model")
print("=" * 70)

# Create MLP model
mlp_model = NCF(
    user_num=user_num,
    item_num=item_num,
    factor_num=factor_num,
    num_layers=num_layers,
    dropout=dropout_rate,
    model_name='MLP',
    GMF_model=None,
    MLP_model=None
)

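# Move the model only when the batches are moved too (same condition as in the loop below)
if device == 'cuda' and torch.cuda.is_available():
    mlp_model = mlp_model.cuda()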

# Setup optimizer and loss
mlp_optimizer = optim.Adam(mlp_model.parameters(), lr=learning_rate)
mlp_loss_function = nn.BCEWithLogitsLoss()

# Training loop for MLP
print(f"Training MLP for {epochs} epochs...")
best_hr_mlp = 0.0
best_ndcg_mlp = 0.0
best_epoch_mlp = 0

for epoch in range(epochs):
    mlp_model.train()
    epoch_start_time = time.time()
    train_dataset.ng_sample()
    
    epoch_loss = 0.0
    num_batches = 0
    
    for batch_idx, (user, item, label) in enumerate(train_loader):
        # BCEWithLogitsLoss expects float labels
        label = label.float()
        if device == 'cuda' and torch.cuda.is_available():
            user = user.cuda()
            item = item.cuda()
            label = label.cuda()
        
        mlp_optimizer.zero_grad()
        prediction = mlp_model(user, item)
        loss = mlp_loss_function(prediction, label)
        loss.backward()
        mlp_optimizer.step()
        
        epoch_loss += loss.item()
        num_batches += 1
    
    avg_loss = epoch_loss / num_batches if num_batches > 0 else 0.0
    
    # Evaluate
    mlp_model.eval()
    HR, NDCG = evaluate_metrics(mlp_model, test_loader, top_k, device)
    
    elapsed_time = time.time() - epoch_start_time
    time_str = time.strftime("%H:%M:%S", time.gmtime(elapsed_time))
    
    print(f"Epoch {epoch+1}/{epochs} - Time: {time_str} - Loss: {avg_loss:.4f} - HR@{top_k}: {HR:.4f} - NDCG@{top_k}: {NDCG:.4f}")
    
    if HR > best_hr_mlp:
        best_hr_mlp = HR
        best_ndcg_mlp = NDCG
        best_epoch_mlp = epoch + 1
        if save_model:
            torch.save(mlp_model, MLP_model_path)
            print(f"  ✓ Saved best MLP model (HR@{top_k}: {HR:.4f})")

print(f"\n✓ MLP Training Complete!")
print(f"  Best epoch: {best_epoch_mlp}")
print(f"  Best HR@{top_k}: {best_hr_mlp:.4f}")
print(f"  Best NDCG@{top_k}: {best_ndcg_mlp:.4f}")

trained_models['MLP'] = {
    'model': mlp_model,
    'hr': best_hr_mlp,
    'ndcg': best_ndcg_mlp,
    'epoch': best_epoch_mlp
}



STEP 7.5.2: Training MLP Model
Training MLP for 20 epochs...
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 1/20 - Time: 00:00:33 - Loss: 0.3491 - HR@10: 0.5890 - NDCG@10: 0.3364
  ✓ Saved best MLP model (HR@10: 0.5890)
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 2/20 - Time: 00:00:32 - Loss: 0.3182 - HR@10: 0.6518 - NDCG@10: 0.3826
  ✓ Saved best MLP model (HR@10: 0.6518)
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 3/20 - Time: 00:00:33 - Loss: 0.2971 - HR@10: 0.6923 - NDCG@10: 0.4143
  ✓ Saved best MLP model (HR@10: 0.6923)
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 4/20 - Time: 00:00:32 

In [17]:
# ============================================================================
# 7.5.3 TRAIN NeuMF-pre MODEL (Using Pre-trained GMF and MLP)
# ============================================================================

print("\n" + "=" * 70)
print("STEP 7.5.3: Training NeuMF-pre Model")
print("=" * 70)
print("Creating NeuMF model with pre-trained GMF and MLP weights...")

# Create NeuMF-pre model using pre-trained GMF and MLP
neumf_pre_model = NCF(
    user_num=user_num,
    item_num=item_num,
    factor_num=factor_num,
    num_layers=num_layers,
    dropout=dropout_rate,
    model_name='NeuMF-pre',
    GMF_model=gmf_model,
    MLP_model=mlp_model
)

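# Move the model only when the batches are moved too (same condition as in the loop below)
if device == 'cuda' and torch.cuda.is_available():
    neumf_pre_model = neumf_pre_model.cuda()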

# Setup optimizer: plain SGD, as in the NCF paper, because Adam's moment
# estimates from pre-training are not carried over into the combined model
neumf_pre_optimizer = optim.SGD(neumf_pre_model.parameters(), lr=learning_rate)
neumf_pre_loss_function = nn.BCEWithLogitsLoss()

# Training loop for NeuMF-pre
print(f"Training NeuMF-pre for {epochs} epochs...")
best_hr_neumf_pre = 0.0
best_ndcg_neumf_pre = 0.0
best_epoch_neumf_pre = 0

for epoch in range(epochs):
    neumf_pre_model.train()
    epoch_start_time = time.time()
    train_dataset.ng_sample()
    
    epoch_loss = 0.0
    num_batches = 0
    
    for batch_idx, (user, item, label) in enumerate(train_loader):
        # BCEWithLogitsLoss expects float labels
        label = label.float()
        if device == 'cuda' and torch.cuda.is_available():
            user = user.cuda()
            item = item.cuda()
            label = label.cuda()
        
        neumf_pre_optimizer.zero_grad()
        prediction = neumf_pre_model(user, item)
        loss = neumf_pre_loss_function(prediction, label)
        loss.backward()
        neumf_pre_optimizer.step()
        
        epoch_loss += loss.item()
        num_batches += 1
    
    avg_loss = epoch_loss / num_batches if num_batches > 0 else 0.0
    
    # Evaluate
    neumf_pre_model.eval()
    HR, NDCG = evaluate_metrics(neumf_pre_model, test_loader, top_k, device)
    
    elapsed_time = time.time() - epoch_start_time
    time_str = time.strftime("%H:%M:%S", time.gmtime(elapsed_time))
    
    print(f"Epoch {epoch+1}/{epochs} - Time: {time_str} - Loss: {avg_loss:.4f} - HR@{top_k}: {HR:.4f} - NDCG@{top_k}: {NDCG:.4f}")
    
    if HR > best_hr_neumf_pre:
        best_hr_neumf_pre = HR
        best_ndcg_neumf_pre = NDCG
        best_epoch_neumf_pre = epoch + 1
        if save_model:
            torch.save(neumf_pre_model, NeuMF_model_path)
            print(f"  ✓ Saved best NeuMF-pre model (HR@{top_k}: {HR:.4f})")

print(f"\n✓ NeuMF-pre Training Complete!")
print(f"  Best epoch: {best_epoch_neumf_pre}")
print(f"  Best HR@{top_k}: {best_hr_neumf_pre:.4f}")
print(f"  Best NDCG@{top_k}: {best_ndcg_neumf_pre:.4f}")

trained_models['NeuMF-pre'] = {
    'model': neumf_pre_model,
    'hr': best_hr_neumf_pre,
    'ndcg': best_ndcg_neumf_pre,
    'epoch': best_epoch_neumf_pre
}

# ============================================================================
# COMPARISON OF ALL MODELS
# ============================================================================

print("\n" + "=" * 70)
print("MODEL COMPARISON SUMMARY")
print("=" * 70)
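# Build the headers first: a quoted '{top_k}' inside an f-string expression is not interpolated
hr_header, ndcg_header = f"HR@{top_k}", f"NDCG@{top_k}"
print(f"{'Model':<15} {hr_header:<12} {ndcg_header:<12} {'Best Epoch':<12}")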
print("-" * 70)
print(f"{'GMF':<15} {best_hr_gmf:<12.4f} {best_ndcg_gmf:<12.4f} {best_epoch_gmf:<12}")
print(f"{'MLP':<15} {best_hr_mlp:<12.4f} {best_ndcg_mlp:<12.4f} {best_epoch_mlp:<12}")
print(f"{'NeuMF-end':<15} {best_hr_neumf_end:<12.4f} {best_ndcg_neumf_end:<12.4f} {best_epoch:<12}")
print(f"{'NeuMF-pre':<15} {best_hr_neumf_pre:<12.4f} {best_ndcg_neumf_pre:<12.4f} {best_epoch_neumf_pre:<12}")
print("=" * 70)

# Find best model
all_results = [
    ('GMF', best_hr_gmf, best_ndcg_gmf),
    ('MLP', best_hr_mlp, best_ndcg_mlp),
    ('NeuMF-end', best_hr_neumf_end, best_ndcg_neumf_end),
    ('NeuMF-pre', best_hr_neumf_pre, best_ndcg_neumf_pre)
]
best_model_name, best_hr_overall, best_ndcg_overall = max(all_results, key=lambda x: x[1])

print(f"\n🏆 Best Model: {best_model_name}")
print(f"   HR@{top_k}: {best_hr_overall:.4f}")
print(f"   NDCG@{top_k}: {best_ndcg_overall:.4f}")

# Set the best model as the main model for recommendations
if best_model_name == 'GMF':
    ncf_model = gmf_model
elif best_model_name == 'MLP':
    ncf_model = mlp_model
elif best_model_name == 'NeuMF-pre':
    ncf_model = neumf_pre_model
else:
    ncf_model = ncf_model_neumf_end

print(f"\n✓ Using {best_model_name} model for recommendations")
print("=" * 70)



STEP 7.5.3: Training NeuMF-pre Model
Creating NeuMF model with pre-trained GMF and MLP weights...
Training NeuMF-pre for 20 epochs...
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 1/20 - Time: 00:00:28 - Loss: 0.1674 - HR@10: 0.7554 - NDCG@10: 0.4782
  ✓ Saved best NeuMF-pre model (HR@10: 0.7554)
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 2/20 - Time: 00:00:28 - Loss: 0.1663 - HR@10: 0.7531 - NDCG@10: 0.4764
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 3/20 - Time: 00:00:28 - Loss: 0.1648 - HR@10: 0.7517 - NDCG@10: 0.4752
Generating 4 negative samples per positive pair...
✓ Generated 1840900 negative samples
  - Total samples (positives + negatives): 2301125
Epoch 4/20 - Time: 00:00:28 - L

### Explanation of Step 7.5:

**Why Train Separately?**

1. **GMF (Generalized Matrix Factorization)**:
   - Simple linear model
   - Fast to train
   - Good baseline performance
   - Captures linear user-item interactions

2. **MLP (Multi-Layer Perceptron)**:
   - Deep non-linear model
   - Can learn complex patterns
   - Complements GMF's linear approach

3. **NeuMF-pre (Pre-trained Neural Matrix Factorization)**:
   - Combines pre-trained GMF and MLP
   - Initializes with learned embeddings from both models
   - Typically achieves best performance
   - Fine-tunes the combined model

**Training Strategy:**
- Train GMF and MLP independently
- Use their learned embeddings to initialize NeuMF
- Fine-tune NeuMF with plain SGD: the pre-trained weights come without Adam's moment estimates, so the NCF paper uses SGD for this stage
- This typically gives better performance than training NeuMF from scratch
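
One way to picture the initialization step: the NCF paper concatenates the output-layer weights of the two pre-trained models, scaled by a trade-off factor α (0.5 weights both equally). The sketch below uses plain Python lists and made-up numbers. `combine_output_weights` is a hypothetical helper for illustration; it is not taken from this project's `model.py`, which may copy the weights differently.

```python
def combine_output_weights(h_gmf, h_mlp, alpha=0.5):
    """Concatenate output-layer weights as [alpha * h_gmf ; (1 - alpha) * h_mlp]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * w for w in h_gmf] + [(1.0 - alpha) * w for w in h_mlp]


# Toy weights: 2 GMF factors and 3 MLP hidden units (made-up numbers)
h_gmf = [1.0, -2.0]
h_mlp = [4.0, 0.0, 2.0]
h_neumf = combine_output_weights(h_gmf, h_mlp)
print(h_neumf)
```

The embeddings of each branch are copied unchanged; only the final prediction layer needs this kind of merge, because it is the one place where the two branches meet.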

**Model Comparison:**
- GMF: Fast, simple, good baseline
- MLP: Complex patterns, non-linear
- NeuMF-end: Trained from scratch, good balance
- NeuMF-pre: Typically the best performance, uses pre-trained components

---

**✅ Step 7.5 Complete!**

All models have been trained separately. The best model is now available for recommendations.


### Detailed Explanation of Step 7:

#### 7.1 Loss Function and Optimizer

**Binary Cross-Entropy with Logits Loss (BCEWithLogitsLoss):**

**What it does:**
- Combines sigmoid activation + binary cross-entropy loss
- More numerically stable than applying sigmoid separately
- Formula: `loss = -[y*log(σ(x)) + (1-y)*log(1-σ(x))]`
  - Where σ(x) = sigmoid(x), y = true label (0 or 1), x = model output

**Why this loss?**
- Our task: Predict if user will like item (binary classification)
- Labels: 1 (positive interaction) or 0 (negative interaction)
- Model outputs: Raw scores (logits), not probabilities
- BCEWithLogitsLoss handles the conversion internally

**Numerical Stability:**
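- Applying sigmoid and then log as separate steps can fail: for large |x| the sigmoid saturates to exactly 0 or 1 in floating point, and log(0) is -inf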
- BCEWithLogitsLoss uses log-sum-exp trick for stability
- Prevents NaN/inf values during training
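
The stability point above can be checked with plain Python floats. The sketch compares a naive "sigmoid, then log" loss with a commonly used stable rewrite, `max(x, 0) - x*y + log(1 + exp(-|x|))`, which is algebraically the same formula. The function names are illustrative, and PyTorch's internal implementation may differ in detail.

```python
import math


def naive_bce(logit, label):
    """Sigmoid first, then cross-entropy; breaks once the sigmoid saturates."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def stable_bce_with_logits(logit, label):
    """Algebraically equivalent form that never evaluates log(0)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Moderate logit: both versions agree
print(naive_bce(2.0, 1.0), stable_bce_with_logits(2.0, 1.0))

# Extreme logit with the wrong label: the naive version fails
try:
    naive_bce(1000.0, 0.0)
    naive_failed = False
except ValueError:  # math.log(0.0) raises "math domain error"
    naive_failed = True
print("naive failed:", naive_failed, "| stable:", stable_bce_with_logits(1000.0, 0.0))
```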

**Adam Optimizer:**

**What is Adam?**
- Adaptive Moment Estimation
- Combines benefits of:
  - Momentum: Uses moving average of gradients (smoother updates)
  - RMSprop: Adapts learning rate per parameter
  - Bias correction: Accounts for initialization bias
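
The three ingredients listed above fit in a few lines. This is a minimal sketch of one Adam update for a single scalar parameter, using the standard default hyperparameters; `adam_step` is an illustrative name, and `optim.Adam` applies the same idea to every parameter tensor.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # momentum: moving average of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # RMSprop-style average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


param, m, v = 1.0, 0.0, 0.0
param, m, v = adam_step(param, grad=4.0, m=m, v=v, t=1)
print(param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of how large the gradient is.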

**Why Adam?**
- Works well out-of-the-box (default hyperparameters)
- Adapts learning rate automatically
- Often converges faster than plain SGD
- Commonly used for training recommendation models from scratch

**Learning Rate:**
- 0.001 is a safe default for Adam
- Too high: Training unstable, loss might explode
- Too low: Training very slow, might not converge
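
A toy experiment makes the three regimes concrete. The sketch runs plain gradient descent (not Adam) on `f(x) = x**2`, so the learning rates that diverge or stall here are specific to this toy function and are not recommendations for NCF.

```python
def minimize_quadratic(lr, steps=20, x0=1.0):
    """Plain gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2.0 * x
    return x


for lr in (1.1, 0.1, 0.0001):
    print(f"lr={lr}: x after 20 steps = {minimize_quadratic(lr):.4f}")
```

With `lr=1.1` each step overshoots the minimum and the iterate grows in magnitude, with `lr=0.1` it shrinks steadily toward 0, and with `lr=0.0001` it has barely moved after 20 steps.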

---

#### 7.2 Training Loop - Step by Step

**Epoch Structure:**

1. **Set Training Mode**:
   ```python
   model.train()  # Enables dropout, batch norm training mode
   ```

2. **Generate Negative Samples**:
   ```python
   train_dataset.ng_sample()  # Fresh negatives each epoch
   ```
   - Important: New negatives each epoch improves learning
   - Prevents overfitting to specific negative samples

3. **Iterate Through Batches**:
   ```python
   for user, item, label in train_loader:
       # Process batch
   ```

4. **Forward Pass**:
   ```python
   prediction = model(user, item)  # Get model predictions
   ```
   - Model outputs raw scores (logits)
   - Higher score = more likely user will like item

5. **Compute Loss**:
   ```python
   loss = loss_function(prediction, label)
   ```
   - Compares predictions with true labels
   - Measures how wrong the model is

6. **Backward Pass**:
   ```python
   loss.backward()  # Compute gradients
   optimizer.step()  # Update weights
   ```
   - Gradients tell us how to adjust weights
   - Optimizer updates weights to reduce loss

7. **Evaluation**:
   ```python
   model.eval()  # Disable dropout
   HR, NDCG = evaluate_metrics(model, test_loader, top_k, device)
   ```
   - Test model on held-out test set
   - Measure performance with Hit Rate and NDCG

8. **Save Best Model**:
   ```python
   if HR > best_hr:
       torch.save(model, 'best_model.pth')
   ```
   - Save the model whenever test-set HR@K reaches a new best
   - Prevents losing good models if training degrades
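
The negative-sampling step (step 2 above) is the least visible part of the loop, because it happens inside `train_dataset.ng_sample()`. The sketch below shows one way it could work, assuming negatives are drawn uniformly from items the user has not interacted with. `sample_negatives` and its arguments are hypothetical; the actual code in `data_utils.py` is not shown here and may differ.

```python
import random


def sample_negatives(positives, item_num, num_ng=4, seed=None):
    """Return (user, item, label) triples: each positive plus num_ng sampled negatives."""
    rng = random.Random(seed)
    seen = set(positives)
    samples = []
    for user, item in positives:
        samples.append((user, item, 1.0))
        for _ in range(num_ng):
            neg = rng.randrange(item_num)
            # Resample until the pair is not a known positive.
            # Assumes every user has at least one unseen item.
            while (user, neg) in seen:
                neg = rng.randrange(item_num)
            samples.append((user, neg, 0.0))
    return samples


positives = [(0, 1), (0, 3), (1, 2)]  # toy (user, item) interactions
epoch_samples = sample_negatives(positives, item_num=10, num_ng=4, seed=0)
print(len(epoch_samples))
```

This matches the counts in the training logs: 1,840,900 negatives out of 2,301,125 total samples is exactly 4 negatives for each of the 460,225 positives.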

---

#### 7.3 Understanding Training Progress

**What to Watch:**

1. **Loss Decreasing**:
   - Should decrease over epochs
   - If increases: learning rate too high or model unstable
   - If plateaus: model converged or needs more capacity

2. **HR and NDCG Increasing**:
   - Should improve over epochs
   - Early epochs: rapid improvement
   - Later epochs: slower improvement (diminishing returns)

3. **Best Model Tracking**:
   - Model might overfit (train loss decreases, test HR decreases)
   - We save best model based on test HR
   - This prevents using overfitted model
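
For reference, the two tracked metrics can be sketched for a single user as follows, assuming one held-out positive item per user (the usual NCF evaluation protocol). The function names are illustrative; `evaluate.py` is not shown in this section and may be organized differently.

```python
import math


def hit_at_k(gt_item, ranked_items, k=10):
    """1 if the held-out item appears in the top-k list, else 0."""
    return int(gt_item in ranked_items[:k])


def ndcg_at_k(gt_item, ranked_items, k=10):
    """1 / log2(position + 1) for a 1-based position in the top-k list, else 0."""
    top_k = ranked_items[:k]
    if gt_item not in top_k:
        return 0.0
    position = top_k.index(gt_item) + 1
    return 1.0 / math.log2(position + 1)


ranked = [42, 7, 13, 99]  # hypothetical item IDs, best score first
print(hit_at_k(7, ranked, k=3), ndcg_at_k(7, ranked, k=3))
```

HR only asks whether the held-out item made the list, while NDCG also rewards placing it near the top. The reported HR@K and NDCG@K are averages of these per-user values.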

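**Typical Training Curve (illustrative numbers, not from this run; the GMF run above already reaches HR@10 = 0.6525 after epoch 1):**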
```
Epoch 1:  HR: 0.200, NDCG: 0.100  (random)
Epoch 5:  HR: 0.500, NDCG: 0.300  (learning)
Epoch 10: HR: 0.650, NDCG: 0.420  (improving)
Epoch 15: HR: 0.680, NDCG: 0.440  (slowing)
Epoch 20: HR: 0.690, NDCG: 0.445  (converged)
```

**When to Stop:**
- If HR stops improving for several epochs
- If test HR starts decreasing (overfitting)
- After specified number of epochs
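
The first two stopping rules can be automated with a small helper. This is a sketch only: the training loops in this notebook always run for the full number of epochs, and `EarlyStopping` is a hypothetical class that is not part of the project.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without a new best metric."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def step(self, metric, epoch):
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# HR@10 values from the first three NeuMF-pre epochs shown earlier
stopper = EarlyStopping(patience=2)
for epoch, hr in enumerate([0.7554, 0.7531, 0.7517], start=1):
    if stopper.step(hr, epoch):
        print(f"Stopping at epoch {epoch}; best HR was {stopper.best} at epoch {stopper.best_epoch}")
        break
```

Inside a real loop, `stopper.step(HR, epoch + 1)` would be called right after `evaluate_metrics`, followed by `break` when it returns `True`.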

---

#### 7.4 Model Saving

**Why Save Models?**
- Training takes time (minutes to hours)
- Best model might not be the last epoch
- Allows loading and using model later
- Enables comparison of different runs

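**What Gets Saved (with `torch.save(model, path)` as used above):**
- All model parameters (weights and biases)
- The pickled model object, which references the `NCF` class; that class must be importable when the file is loaded
- Not the optimizer state; save `optimizer.state_dict()` separately if you want to resume training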

**Loading Saved Model:**
```python
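# weights_only=False is needed for a fully pickled model (PyTorch 2.6+ defaults to True)
model = torch.load('best_model.pth', weights_only=False)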
model.eval()  # Set to evaluation mode
```

---

#### 7.5 Training Tips

**If Training is Slow:**
- Increase batch size (if memory allows), so each epoch needs fewer optimizer steps
- Use GPU if available
- Reduce number of epochs (if converged early)

**If Loss Not Decreasing:**
- Check learning rate (might be too high/low)
- Check data loading (make sure data is correct)
- Check model architecture (might have bugs)

**If Overfitting (train good, test bad):**
- Increase dropout rate
- Reduce model capacity (fewer layers/embeddings)
- Get more training data
- Use early stopping

**If Underfitting (both train and test bad):**
- Increase model capacity
- Train for more epochs
- Reduce dropout
- Check if learning rate is appropriate

---

**✅ Step 7 Complete!**

We now have:
- Complete training loop implemented
- Model training and evaluation
- Best model saving
- Progress tracking

---

## Step 8: Using the Trained Model for Recommendations

Now that we have a trained model, let's learn how to use it to make recommendations! This step shows:
1. How to load a saved model
2. How to get recommendations for a user
3. How to predict user-item interaction scores
4. Practical examples

This is where the model becomes useful for real-world applications!


In [18]:
# ============================================================================
# STEP 8: USING THE TRAINED MODEL FOR RECOMMENDATIONS
# ============================================================================

"""
This step demonstrates how to use the trained NCF model to make recommendations.
"""

# ============================================================================
# 8.1 LOAD SAVED MODEL (Optional - if you want to load from disk)
# ============================================================================

print("=" * 70)
print("STEP 8: Using Trained Model for Recommendations")
print("=" * 70)

# If you saved a model and want to load it later, use this:
# model_path_to_load = os.path.join(model_path, f'{model_name}.pth')
# if os.path.exists(model_path_to_load):
#     print(f"Loading model from {model_path_to_load}...")
#     ncf_model = torch.load(model_path_to_load, weights_only=False)
#     if torch.cuda.is_available():
#         ncf_model = ncf_model.cuda()
#     ncf_model.eval()
#     print("✓ Model loaded successfully!")
# else:
#     print("Using currently trained model...")

# Load the checkpoint saved in Step 7 if it exists; otherwise fall back to the in-memory model
print("Using the trained model from Step 7...")
model_path_to_load = os.path.join(model_path, f'{model_name}.pth')
if os.path.exists(model_path_to_load):
    # weights_only=False is needed because the whole model object was pickled
    model = torch.load(model_path_to_load, weights_only=False)
else:
    model = ncf_model
model.eval()  # Important: set to evaluation mode
print("✓ Model ready for inference")


STEP 8: Using Trained Model for Recommendations
Using the trained model from Step 7...
✓ Model ready for inference


In [19]:
# ============================================================================
# 8.2 GET TOP-K RECOMMENDATIONS FOR A USER
# ============================================================================

def get_top_k_recommendations(model, user_id, item_ids, k=10, device='cpu'):
    """
    Get top-K item recommendations for a given user.
    
    Parameters:
    - model: Trained NCF model
    - user_id: ID of the user (integer)
    - item_ids: List of item IDs to consider (e.g., all items or candidate items)
    - k: Number of recommendations to return
    - device: 'cuda' or 'cpu'
    
    Returns:
    - top_k_items: List of top-K recommended item IDs
    - top_k_scores: List of corresponding prediction scores
    """
    model.eval()  # Set to evaluation mode
    
    # Convert to tensors
    user_tensor = torch.LongTensor([user_id] * len(item_ids))
    item_tensor = torch.LongTensor(item_ids)
    
    # Move to device
    if device == 'cuda' and torch.cuda.is_available():
        user_tensor = user_tensor.cuda()
        item_tensor = item_tensor.cuda()
    
    # Get predictions
    with torch.no_grad():
        scores = model(user_tensor, item_tensor)
        scores = scores.cpu().numpy()
    
    # Get top-K items
    top_k_indices = np.argsort(scores)[::-1][:k]  # Sort descending, take top K
    top_k_items = [item_ids[i] for i in top_k_indices]
    top_k_scores = scores[top_k_indices].tolist()
    
    return top_k_items, top_k_scores

print("\n" + "=" * 70)
print("STEP 8.2: Recommendation Function")
print("=" * 70)
print("✓ get_top_k_recommendations() function defined")
print("  - Takes user ID and candidate items")
print("  - Returns top-K recommendations with scores")



STEP 8.2: Recommendation Function
✓ get_top_k_recommendations() function defined
  - Takes user ID and candidate items
  - Returns top-K recommendations with scores


In [20]:
# ============================================================================
# 8.3 PREDICT USER-ITEM INTERACTION SCORE
# ============================================================================

def predict_interaction_score(model, user_id, item_id, device='cpu'):
    """
    Predict the interaction score for a specific user-item pair.
    
    Parameters:
    - model: Trained NCF model
    - user_id: ID of the user (integer)
    - item_id: ID of the item (integer)
    - device: 'cuda' or 'cpu'
    
    Returns:
    - score: Prediction score (higher = more likely user will like item)
    """
    model.eval()
    
    # Convert to tensors
    user_tensor = torch.LongTensor([user_id])
    item_tensor = torch.LongTensor([item_id])
    
    # Move to device
    if device == 'cuda' and torch.cuda.is_available():
        user_tensor = user_tensor.cuda()
        item_tensor = item_tensor.cuda()
    
    # Get prediction
    with torch.no_grad():
        score = model(user_tensor, item_tensor)
        score = score.cpu().item()
    
    return score

print("\n" + "=" * 70)
print("STEP 8.3: Prediction Function")
print("=" * 70)
print("✓ predict_interaction_score() function defined")
print("  - Predicts score for a single user-item pair")
print("  - Returns a single score value")



STEP 8.3: Prediction Function
✓ predict_interaction_score() function defined
  - Predicts score for a single user-item pair
  - Returns a single score value


In [21]:
# ============================================================================
# 8.4 EXAMPLE: GET RECOMMENDATIONS FOR A USER
# ============================================================================

print("\n" + "=" * 70)
print("STEP 8.4: Example Usage")
print("=" * 70)

# Example: Get recommendations for user 0
example_user_id = 0
print(f"\nGetting top-{top_k} recommendations for User {example_user_id}...")

# Get all item IDs. No filtering happens here, so items the user already
# interacted with can appear in the list; Step 8.5 shows how to exclude them.
all_item_ids = list(range(item_num))

# Get top-K recommendations
recommended_items, recommended_scores = get_top_k_recommendations(
    model,
    example_user_id,
    all_item_ids,
    k=top_k,
    device=device
)

print(f"\n✓ Top-{top_k} Recommendations for User {example_user_id}:")
print("-" * 70)
for i, (item_id, score) in enumerate(zip(recommended_items, recommended_scores), 1):
    print(f"  {i}. Item {item_id:5d} - Score: {score:7.4f}")

# Example: Predict score for a specific user-item pair
print(f"\n" + "=" * 70)
print("Example: Predicting interaction score")
print("=" * 70)
example_item_id = recommended_items[0]  # Use the top recommended item
score = predict_interaction_score(model, example_user_id, example_item_id, device=device)
print(f"User {example_user_id} - Item {example_item_id}: Score = {score:.4f}")
print(f"  → Higher score means user is more likely to like this item")



STEP 8.4: Example Usage

Getting top-10 recommendations for User 0...

✓ Top-10 Recommendations for User 0:
----------------------------------------------------------------------
  1. Item   565 - Score:  4.2599
  2. Item   559 - Score:  3.9050
  3. Item    33 - Score:  3.8329
  4. Item   563 - Score:  3.7590
  5. Item  1125 - Score:  3.5868
  6. Item  1113 - Score:  3.5455
  7. Item   344 - Score:  3.4968
  8. Item  1811 - Score:  3.4611
  9. Item   970 - Score:  3.4368
  10. Item   299 - Score:  3.3611

Example: Predicting interaction score
User 0 - Item 565: Score = 4.2599
  → Higher score means user is more likely to like this item


In [22]:
# ============================================================================
# 8.5 FILTER OUT ITEMS USER ALREADY INTERACTED WITH
# ============================================================================

def get_recommendations_excluding_training(user_id, model, train_mat, all_items, k=10, device='cpu'):
    """
    Get recommendations for a user, excluding items they've already interacted with.
    
    This is more realistic - we don't want to recommend items the user already knows about.
    
    Parameters:
    - user_id: ID of the user
    - model: Trained NCF model
    - train_mat: Training interaction matrix (to check what user already interacted with)
    - all_items: List of all item IDs
    - k: Number of recommendations
    - device: 'cuda' or 'cpu'
    
    Returns:
    - top_k_items: Recommended item IDs (excluding training items)
    - top_k_scores: Corresponding scores
    """
    # Filter out items user already interacted with
    candidate_items = [item_id for item_id in all_items 
                       if (user_id, item_id) not in train_mat]
    
    if len(candidate_items) < k:
        print(f"Warning: Only {len(candidate_items)} candidate items available (requested {k})")
        k = len(candidate_items)
    
    # Get recommendations from candidate items
    top_k_items, top_k_scores = get_top_k_recommendations(
        model, user_id, candidate_items, k=k, device=device
    )
    
    return top_k_items, top_k_scores

print("\n" + "=" * 70)
print("STEP 8.5: Filtered Recommendations")
print("=" * 70)

# Example with filtering
example_user_id = 10
print(f"\nGetting filtered recommendations for User {example_user_id}...")
print("(Excluding items user already interacted with in training)")

filtered_items, filtered_scores = get_recommendations_excluding_training(
    example_user_id,
    ncf_model,
    train_mat,
    list(range(item_num)),
    k=top_k,
    device=device
)

print(f"\n✓ Top-{top_k} NEW Recommendations for User {example_user_id}:")
print("-" * 70)
for i, (item_id, score) in enumerate(zip(filtered_items, filtered_scores), 1):
    print(f"  {i}. Item {item_id:5d} - Score: {score:7.4f}")

print("\n" + "=" * 70)
print("✓ Recommendation system ready!")
print("=" * 70)



STEP 8.5: Filtered Recommendations

Getting filtered recommendations for User 10...
(Excluding items user already interacted with in training)

✓ Top-10 NEW Recommendations for User 10:
----------------------------------------------------------------------
  1. Item  2526 - Score:  3.8752
  2. Item  2659 - Score:  3.4374
  3. Item   939 - Score:  3.2481
  4. Item  1325 - Score:  3.1528
  5. Item  1650 - Score:  3.1314
  6. Item   211 - Score:  2.9503
  7. Item  2093 - Score:  2.9223
  8. Item  1499 - Score:  2.9186
  9. Item   278 - Score:  2.8437
  10. Item   324 - Score:  2.7961

✓ Recommendation system ready!


### Detailed Explanation of Step 8:

#### 8.1 Loading a Saved Model

**When to Load:**
- After training, if you saved the model and want to use it later
- When deploying the model in production
- When sharing the model with others

**How to Load:**
```python
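# PyTorch 2.6+ defaults to weights_only=True; a fully pickled model needs False (trusted files only)
model = torch.load('path/to/model.pth', weights_only=False)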
model.eval()  # Important: set to evaluation mode
```

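**Why `model.eval()`?**
- Disables dropout, so every unit contributes to the prediction
- Makes repeated predictions for the same input identical, because no random dropout mask is applied
- Does not turn off gradient tracking; wrap inference in `torch.no_grad()` to save memory and time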
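
The effect of `model.eval()` can be checked on a standalone dropout layer. The sketch below is illustrative only: it does not use the notebook's NCF model, and the layer sizes and tensor shapes are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: roughly half the entries are zeroed, the rest are scaled by 1 / (1 - p)
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op, so the output equals the input
dropout.eval()
eval_out = dropout(x)

# eval() does not disable autograd; torch.no_grad() does
layer = nn.Linear(3, 1)
layer.eval()
tracked = layer(torch.ones(1, 3))
with torch.no_grad():
    untracked = layer(torch.ones(1, 3))

print("zeros in train mode:", int((train_out == 0).sum()))
print("eval output unchanged:", torch.equal(eval_out, x))
print("requires_grad with eval only:", tracked.requires_grad)
print("requires_grad inside no_grad:", untracked.requires_grad)
```

This is why inference code usually combines both calls: `model.eval()` for deterministic layers and `torch.no_grad()` to skip gradient bookkeeping.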

---

#### 8.2 Getting Top-K Recommendations

**How It Works:**
1. **Input**: User ID and list of candidate items
2. **Process**: 
   - Get model predictions for all candidate items
   - Sort items by prediction score (descending)
   - Return top-K items
3. **Output**: List of recommended item IDs and their scores
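
The sort-and-select part of these steps can be sketched independently of the model. The helper below, `top_k_from_scores`, is a hypothetical name and takes scores that have already been computed; the notebook's own `get_top_k_recommendations` is defined earlier and may be implemented differently.

```python
import heapq


def top_k_from_scores(item_ids, scores, k=10):
    """Return the k highest-scoring item IDs and their scores, best first."""
    if len(item_ids) != len(scores):
        raise ValueError("item_ids and scores must have the same length")
    # nlargest avoids sorting the full list when k is much smaller than len(item_ids)
    best = heapq.nlargest(k, zip(scores, item_ids), key=lambda pair: pair[0])
    top_items = [item_id for _, item_id in best]
    top_scores = [score for score, _ in best]
    return top_items, top_scores


# Made-up scores for four candidate items
example_items, example_scores = top_k_from_scores(
    item_ids=[10, 20, 30, 40],
    scores=[0.1, 2.5, -1.0, 1.7],
    k=2,
)
print(example_items, example_scores)
```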

**Example Use Case:**
```python
# Recommend 10 movies for user 123
recommendations, scores = get_top_k_recommendations(
    model, user_id=123, item_ids=all_movie_ids, k=10
)
```

**Performance Considerations:**
- If you have many items (millions), consider:
  - Pre-filtering candidates (e.g., by genre, popularity)
  - Using approximate nearest neighbor search
  - Caching embeddings for faster computation
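
Pre-filtering can be as simple as shrinking the candidate list before it is passed to `get_top_k_recommendations`. The sketch below assumes a hypothetical `item_genres` mapping from item ID to a list of genre names; the notebook itself does not load genre metadata, so treat the data and the helper name as placeholders.

```python
def prefilter_by_genre(item_genres, wanted_genres):
    """Keep only items that share at least one genre with wanted_genres."""
    wanted = set(wanted_genres)
    return [item_id for item_id, genres in item_genres.items()
            if wanted.intersection(genres)]


# Hypothetical metadata for four items
item_genres = {
    0: ["Comedy"],
    1: ["Drama", "Romance"],
    2: ["Action"],
    3: ["Comedy", "Action"],
}

candidates = prefilter_by_genre(item_genres, ["Action"])
print(candidates)  # only these IDs would then be scored by the model
```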

---

#### 8.3 Predicting Single Interaction Score

**When to Use:**
- Check if a specific user will like a specific item
- Rank a small set of items
- A/B testing different items

**Example:**
```python
score = predict_interaction_score(model, user_id=123, item_id=456)
if score > 0:  # Logit threshold; 0 corresponds to a probability of 0.5 (adjust based on your data)
    print("User will likely like this item")
```

**Interpreting Scores:**
- Higher score = more likely user will like item
- Scores are logits (not probabilities)
- To get probabilities: `prob = torch.sigmoid(torch.tensor(score))`
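
For a single Python float, the same conversion can be done without PyTorch. The helper name `logit_to_probability` below is an illustrative assumption; the example value is the score printed for User 0 and Item 565 earlier in this step.

```python
import math


def logit_to_probability(score):
    """Apply the sigmoid function to a raw model score (logit)."""
    # Two branches keep math.exp from overflowing for large negative scores
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)


probability = logit_to_probability(4.2599)
print(f"{probability:.3f}")
print(logit_to_probability(0.0))  # a logit of 0 maps to a probability of 0.5
```

Because the sigmoid is monotonic, ranking items by logit or by probability gives the same order; the conversion only matters when a score needs to be read as a probability or compared against a threshold.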

---

#### 8.4 Filtering Training Items

**Why Filter?**
- Users have already seen/interacted with training items
- We want to recommend NEW items
- More realistic recommendation scenario

**How It Works:**
1. Check training matrix: `(user_id, item_id) in train_mat`
2. Exclude items user already interacted with
3. Get recommendations from remaining items
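
When recommendations are needed for many users, it can be cheaper to group the training interactions by user once instead of testing every `(user_id, item_id)` pair against the matrix. The sketch below assumes the interactions are available as an iterable of `(user, item)` pairs (for a dictionary-of-keys sparse matrix, its keys); the helper names are hypothetical.

```python
from collections import defaultdict


def build_seen_items(interactions):
    """Group (user, item) pairs into a user -> set of items mapping."""
    seen = defaultdict(set)
    for user, item in interactions:
        seen[user].add(item)
    return seen


def unseen_candidates(user_id, seen, item_num):
    """Return all item IDs in range(item_num) the user has not interacted with."""
    user_seen = seen.get(user_id, set())
    return [item_id for item_id in range(item_num) if item_id not in user_seen]


# Toy training interactions: user 0 saw items 1 and 3, user 1 saw item 0
train_pairs = [(0, 1), (0, 3), (1, 0)]
seen_items = build_seen_items(train_pairs)

print(unseen_candidates(0, seen_items, item_num=5))
print(unseen_candidates(7, seen_items, item_num=5))  # unknown user: nothing to exclude
```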

**Example:**
```python
# Get new recommendations (excluding items user already knows)
# The function returns two lists: item IDs and their scores
recommendations, scores = get_recommendations_excluding_training(
    user_id, model, train_mat, all_items, k=10
)
```

---

#### 8.5 Real-World Usage Tips

**1. Pre-compute Embeddings (Optional):**
```python
# For faster recommendations, pre-compute item embeddings
with torch.no_grad():
    # Build the index tensor on the model's device to avoid a CPU/GPU mismatch
    item_embeddings = model.embed_item_GMF(torch.arange(item_num, device=device))
# Then use these for faster similarity search
```

**2. Batch Predictions:**
```python
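# Score several (user, item) pairs in one forward pass (faster than one call per pair).
# Pair i is (user_ids[i], item_ids[i]); both tensors must be on the model's device.
user_ids = torch.LongTensor([1, 2, 3, 4, 5]).to(device)
item_ids = torch.LongTensor([10, 20, 30, 40, 50]).to(device)
with torch.no_grad():
    scores = model(user_ids, item_ids)  # Batch prediction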
```

**3. Cold Start Problem:**
- New users have no interaction history
  - Solutions: use popularity-based recommendations, or ask for preferences
- New items have no interaction history
  - Solutions: use content-based features, or wait for initial interactions
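
A popularity fallback for new users can be sketched as follows. The function names and the `personalized_fn` callback are hypothetical; in this notebook the callback would wrap a call such as `get_top_k_recommendations`.

```python
from collections import Counter


def most_popular_items(interactions, k):
    """Return the k items with the most training interactions."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(k)]


def recommend_with_fallback(user_id, interactions, personalized_fn, k=10):
    """Use the model for known users and popularity for users without history."""
    known_users = {user for user, _ in interactions}
    if user_id not in known_users:
        return most_popular_items(interactions, k)
    return personalized_fn(user_id, k)


# Toy data: item 7 has three interactions, item 3 two, item 5 one
toy_interactions = [(0, 7), (1, 7), (1, 3), (2, 7), (2, 3), (2, 5)]


def fake_model(user_id, k):
    """Stand-in for a trained model; returns placeholder item IDs."""
    return [100 + user_id] * k


print(recommend_with_fallback(99, toy_interactions, fake_model, k=2))  # new user
print(recommend_with_fallback(1, toy_interactions, fake_model, k=2))   # known user
```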

**4. Evaluation in Production:**
- A/B testing: Compare different models
- Online metrics: Click-through rate, conversion rate
- Offline metrics: HR, NDCG (what we used)
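
As a reminder of what the offline metrics measure, here is a minimal sketch of HR@K and NDCG@K for one user. It assumes a single held-out test item per user, as in the leave-one-out evaluation of He et al.; the project's `evaluate.py` may differ in detail, and the function names here are illustrative.

```python
import math


def hit(gt_item, pred_items):
    """HR@K for one user: 1 if the held-out item is in the top-K list, else 0."""
    return 1 if gt_item in pred_items else 0


def ndcg(gt_item, pred_items):
    """NDCG@K for one user with a single relevant item."""
    if gt_item in pred_items:
        rank = pred_items.index(gt_item)  # 0-based position in the list
        return 1.0 / math.log2(rank + 2)
    return 0.0


top_items = [1, 5, 9]  # hypothetical top-3 list
print(hit(5, top_items), ndcg(5, top_items))  # item found at the second position
print(hit(7, top_items), ndcg(7, top_items))  # item missing from the list
```

HR only records whether the held-out item appears in the list, while NDCG also rewards placing it closer to the top.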

---

**✅ Step 8 Complete!**

We now have:
- Functions to get recommendations
- Functions to predict interaction scores
- Example usage code
- Understanding of how to use the model in practice

---

**🎉 Congratulations! You've completed the full NCF Tutorial!**

**What You've Learned:**
1. ✅ Environment setup and imports
2. ✅ Configuration and hyperparameters
3. ✅ Data downloading and preprocessing
4. ✅ PyTorch Dataset class implementation
5. ✅ NCF model architecture (with detailed visualization)
6. ✅ Evaluation metrics (Hit Rate and NDCG)
7. ✅ Training loop implementation
8. ✅ Using the model for recommendations

**The Complete Pipeline:**
```
Data → Preprocessing → Dataset → Model → Training → Evaluation → Recommendations
```

**Next Steps:**
- Experiment with different hyperparameters
- Try different model architectures (GMF, MLP, NeuMF)
- Test on different datasets
- Deploy for production use
- Explore advanced techniques (attention mechanisms, graph neural networks)

**Thank you for following along! Happy recommending! 🚀**


---

## Step 9: Loading and Using Saved Models

This final step shows how to load the saved models and use them for recommendations. All models (GMF, MLP, NeuMF-end, NeuMF-pre) have been saved and can be loaded independently.


In [23]:
# ============================================================================
# STEP 9: LOADING AND USING SAVED MODELS
# ============================================================================

print("=" * 70)
print("STEP 9: Loading and Using Saved Models")
print("=" * 70)

# ============================================================================
# 9.1 LOAD ALL SAVED MODELS
# ============================================================================

print("\n" + "=" * 70)
print("STEP 9.1: Loading Saved Models")
print("=" * 70)

loaded_models = {}

# Load GMF model
if os.path.exists(GMF_model_path):
    print(f"Loading GMF model from {GMF_model_path}...")
    gmf_loaded = torch.load(GMF_model_path, map_location=device)
    if device == 'cpu':
        gmf_loaded = gmf_loaded.cpu()
    gmf_loaded.eval()
    loaded_models['GMF'] = gmf_loaded
    print("✓ GMF model loaded")
else:
    print(f"⚠ GMF model not found at {GMF_model_path}")

# Load MLP model
if os.path.exists(MLP_model_path):
    print(f"Loading MLP model from {MLP_model_path}...")
    mlp_loaded = torch.load(MLP_model_path, map_location=device)
    if device == 'cpu':
        mlp_loaded = mlp_loaded.cpu()
    mlp_loaded.eval()
    loaded_models['MLP'] = mlp_loaded
    print("✓ MLP model loaded")
else:
    print(f"⚠ MLP model not found at {MLP_model_path}")

# Load NeuMF-end model
neumf_end_path = os.path.join(model_path, 'NeuMF-end.pth')
if os.path.exists(neumf_end_path):
    print(f"Loading NeuMF-end model from {neumf_end_path}...")
    neumf_end_loaded = torch.load(neumf_end_path, map_location=device)
    if device == 'cpu':
        neumf_end_loaded = neumf_end_loaded.cpu()
    neumf_end_loaded.eval()
    loaded_models['NeuMF-end'] = neumf_end_loaded
    print("✓ NeuMF-end model loaded")
else:
    print(f"⚠ NeuMF-end model not found at {neumf_end_path}")

# Load NeuMF-pre model
if os.path.exists(NeuMF_model_path):
    print(f"Loading NeuMF-pre model from {NeuMF_model_path}...")
    neumf_pre_loaded = torch.load(NeuMF_model_path, map_location=device)
    if device == 'cpu':
        neumf_pre_loaded = neumf_pre_loaded.cpu()
    neumf_pre_loaded.eval()
    loaded_models['NeuMF-pre'] = neumf_pre_loaded
    print("✓ NeuMF-pre model loaded")
else:
    print(f"⚠ NeuMF-pre model not found at {NeuMF_model_path}")

print(f"\n✓ Loaded {len(loaded_models)} model(s)")
print(f"  Available models: {list(loaded_models.keys())}")


STEP 9: Loading and Using Saved Models

STEP 9.1: Loading Saved Models
Loading GMF model from ./models/GMF.pth...


UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. 
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL __main__.NCF was not an allowed global by default. Please use `torch.serialization.add_safe_globals([__main__.NCF])` or the `torch.serialization.safe_globals([__main__.NCF])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
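
The `UnpicklingError` above arises because the checkpoint stores the entire pickled `NCF` object, while `torch.load` in PyTorch 2.6 and later defaults to `weights_only=True`. Two common ways around it are sketched below on a small stand-in network (not the notebook's `NCF` class): pass `weights_only=False` for a checkpoint you created yourself, or save and load only the `state_dict`. The file names and the `make_model` helper are illustrative assumptions.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    """Hypothetical stand-in architecture; replace with the real model constructor."""
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))


model = make_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    # Option 1: the file holds the whole pickled model, as written by torch.save(model, path).
    # weights_only=False can execute arbitrary code, so use it only for trusted files.
    full_path = os.path.join(tmp_dir, "full_model.pth")
    torch.save(model, full_path)
    restored_full = torch.load(full_path, map_location="cpu", weights_only=False)
    restored_full.eval()

    # Option 2: store only the tensors; loading then works with weights_only=True.
    weights_path = os.path.join(tmp_dir, "weights.pth")
    torch.save(model.state_dict(), weights_path)
    restored_weights = make_model()
    restored_weights.load_state_dict(
        torch.load(weights_path, map_location="cpu", weights_only=True)
    )
    restored_weights.eval()

# Both restored models should reproduce the original predictions
x = torch.ones(2, 4)
with torch.no_grad():
    same_full = torch.allclose(model(x), restored_full(x))
    same_weights = torch.allclose(model(x), restored_weights(x))
print(same_full, same_weights)
```

For the loading cell above, the first option amounts to adding `weights_only=False` to each `torch.load` call, which is reasonable here because the files were written by this notebook. The second option requires re-saving the models as state dictionaries and rebuilding each `NCF` instance with the same constructor arguments before loading.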

In [None]:
# ============================================================================
# 9.2 COMPARE RECOMMENDATIONS FROM ALL MODELS
# ============================================================================

print("\n" + "=" * 70)
print("STEP 9.2: Comparing Recommendations from All Models")
print("=" * 70)

# Example user
example_user_id = 0
print(f"\nGetting recommendations for User {example_user_id} using all models...")

# Get recommendations from each model
all_recommendations = {}

for model_name, model in loaded_models.items():
    print(f"\n{model_name} recommendations:")
    recommendations, scores = get_top_k_recommendations(
        model, example_user_id, list(range(item_num)), k=top_k, device=device
    )
    all_recommendations[model_name] = (recommendations, scores)

    # Print only the first five items to keep the output short
    print(f"  Top-{top_k} items:")
    for i, (item_id, score) in enumerate(zip(recommendations[:5], scores[:5]), 1):
        print(f"    {i}. Item {item_id:5d} - Score: {score:7.4f}")
    if top_k > 5:
        print(f"    ... and {top_k - 5} more")

# Compare overlap between models
print("\n" + "=" * 70)
print("Recommendation Overlap Analysis")
print("=" * 70)

if len(all_recommendations) >= 2:
    model_names = list(all_recommendations.keys())
    for i, model1 in enumerate(model_names):
        for model2 in model_names[i+1:]:
            items1 = set(all_recommendations[model1][0])
            items2 = set(all_recommendations[model2][0])
            overlap = len(items1 & items2)
            print(f"{model1} vs {model2}: {overlap}/{top_k} items overlap ({overlap/top_k*100:.1f}%)")
