# Lab B.2: Neural Collaborative Filtering

**Module:** B - Recommender Systems  
**Time:** 2 hours  
**Difficulty:** ⭐⭐⭐ (Intermediate)

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Understand the limitations of linear matrix factorization
- [ ] Implement Generalized Matrix Factorization (GMF)
- [ ] Implement Multi-Layer Perceptron (MLP) for recommendations
- [ ] Build the full NeuMF architecture combining GMF and MLP
- [ ] Train with implicit feedback and negative sampling
- [ ] Evaluate with Hit Rate @ K

---

## 📚 Prerequisites

- Completed: Lab B.1 (Collaborative Filtering Fundamentals)
- Completed: Module 2.1 (PyTorch Fundamentals)

---

## 🌍 Real-World Context

**The Problem with Clicks:** Most recommendation systems don't have explicit ratings - they only know what users clicked/watched/bought (implicit feedback). You don't know what rating a user *would* give; you only know they interacted.

**Neural Collaborative Filtering (NCF)** is a framework published by He et al. at WWW 2017. Its flagship model, **NeuMF**, showed that neural networks can learn more complex user-item interactions than simple matrix factorization.

Deep-learning recommenders in the same spirit are used in industry, for example:
- 🎵 **Spotify**: Combining collaborative signals with content features
- 📺 **YouTube**: Deep neural networks for candidate generation
- 🛒 **Alibaba**: Neural networks for e-commerce recommendations

---

## 🧒 ELI5: Why Neural Networks?

> **The Dot Product Limitation:**
>
> Remember matrix factorization? It predicts ratings using:
> $$\text{rating} = \text{user\_vector} \cdot \text{item\_vector}$$
>
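> This is just a sum of per-dimension products - a **linear** combination with fixed, equal weights, where latent dimensions never interact with each other. It can't capture complex patterns like: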
> - "User likes action movies, EXCEPT when they're also romantic"
> - "User watches comedies on weekends but documentaries on weeknights"
>
> **Neural Networks to the Rescue:**
>
> Instead of a simple dot product, we pass the user and item embeddings through 
> neural network layers that can learn **non-linear** relationships!
>
> Think of it like this:
> - **Matrix Factorization**: "Do these puzzle pieces fit?" (yes/no)
> - **Neural CF**: "Let me examine these pieces under a microscope, rotate them, 
>   and figure out exactly HOW they might fit together."

---

## Part 1: Setup and Data Preparation

In [None]:
# Imports
import sys
from pathlib import Path

module_dir = Path.cwd().parent if 'notebooks' in str(Path.cwd()) else Path.cwd()
sys.path.insert(0, str(module_dir / 'scripts'))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from data_utils import download_movielens, leave_one_out_split

# Set seeds
np.random.seed(42)
torch.manual_seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🖥️  Using device: {device}")

if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

In [None]:
# Load data
ratings_df, movies_df = download_movielens('100k')

# For implicit feedback, convert to binary (1 = interacted)
# We treat any rating as a positive signal
ratings_df['label'] = 1

num_users = ratings_df['user_id'].nunique()
num_items = ratings_df['item_id'].nunique()

print(f"📊 Dataset:")
print(f"   Users: {num_users}")
print(f"   Items: {num_items}")
print(f"   Interactions: {len(ratings_df):,}")
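
The models later in this lab pass `user_id` and `item_id` straight into `nn.Embedding(num_users, ...)` and `nn.Embedding(num_items, ...)`, which only accept indices in the range `0 .. n-1`. This lab assumes `download_movielens` already returns contiguous, 0-based IDs. If your copy keeps the raw MovieLens IDs instead, a remapping step along these lines may be needed. The helper name `remap_ids` is hypothetical and not part of `data_utils`:

```python
import pandas as pd

def remap_ids(df, user_col="user_id", item_col="item_id"):
    """Return a copy of df with user and item IDs remapped to 0..n-1."""
    out = df.copy()
    # pd.factorize assigns integer codes 0..n-1 in order of first appearance
    # and also returns the original values so the mapping can be inverted.
    out[user_col], user_index = pd.factorize(out[user_col])
    out[item_col], item_index = pd.factorize(out[item_col])
    return out, user_index, item_index

# Toy example with 1-based, non-contiguous raw IDs
raw = pd.DataFrame({"user_id": [1, 1, 7, 9], "item_id": [50, 10, 50, 300]})
remapped, user_index, item_index = remap_ids(raw)
print(remapped)
```

`user_index[code]` and `item_index[code]` recover the original IDs, which is handy when joining recommendations back to `movies_df`.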

### Leave-One-Out Evaluation Protocol

For implicit feedback, we use **leave-one-out evaluation**:
1. For each user, hold out their last interaction for testing
2. Train on all other interactions
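3. At test time, rank ALL items the user has not seen in training and check whether the held-out item appears in the top K

Note that the original NCF paper ranks the held-out item against only about 100 randomly sampled unseen items. Ranking against every item, as we do here, is a stricter test, so the hit rates from this lab are not directly comparable to the numbers reported in the paper.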

In [None]:
# Leave-one-out split
train_df, test_df = leave_one_out_split(ratings_df, by_time=True)

print(f"📊 Data Split:")
print(f"   Training interactions: {len(train_df):,}")
print(f"   Test interactions: {len(test_df):,} (one per user)")

# Build user -> positive items mapping for negative sampling
user_positive_items = train_df.groupby('user_id')['item_id'].apply(set).to_dict()
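
To make the split concrete, here is a minimal sketch of what a time-based leave-one-out split might look like. It is not the actual `leave_one_out_split` from `data_utils`. The function name `leave_one_out_sketch` and the `timestamp` column are assumptions for illustration:

```python
import pandas as pd

def leave_one_out_sketch(df, user_col="user_id", time_col="timestamp"):
    """Hold out each user's most recent interaction as the test row."""
    # Sort so that the last row for each user is the most recent interaction
    ordered = df.sort_values([user_col, time_col])
    # duplicated(keep="last") is False only for the final row of each user
    is_last = ~ordered.duplicated(subset=user_col, keep="last")
    return ordered[~is_last], ordered[is_last]

toy = pd.DataFrame({
    "user_id":   [0, 0, 0, 1, 1],
    "item_id":   [5, 3, 8, 3, 2],
    "timestamp": [10, 30, 20, 7, 9],
})
train_toy, test_toy = leave_one_out_sketch(toy)
print(test_toy)
```

A user with a single interaction would end up in the test set only, with no training data, so real pipelines usually filter such users out first.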

---

## Part 2: Negative Sampling Dataset

For implicit feedback, we only have positive examples (things users clicked). We need to generate **negative samples** - items the user has not interacted with, which we treat as (noisy) evidence of disinterest.

In [None]:
class NCFDataset(Dataset):
    """
    Dataset for Neural Collaborative Filtering with negative sampling.
    
    For each positive (user, item) pair, we sample negative items
    that the user hasn't interacted with.
    """
    
    def __init__(self, interactions_df, num_items, user_positive_items, num_negatives=4):
        """
        Args:
            interactions_df: DataFrame with user_id, item_id columns
            num_items: Total number of items
            user_positive_items: Dict mapping user_id -> set of positive item_ids
            num_negatives: Number of negative samples per positive
        """
        self.users = interactions_df['user_id'].values
        self.items = interactions_df['item_id'].values
        self.num_items = num_items
        self.user_positive_items = user_positive_items
        self.num_negatives = num_negatives
        
    def __len__(self):
        return len(self.users)
    
    def __getitem__(self, idx):
        user = self.users[idx]
        pos_item = self.items[idx]
        
        # Sample negative items
        neg_items = []
        positives = self.user_positive_items.get(user, set())
        
        while len(neg_items) < self.num_negatives:
            neg_item = np.random.randint(0, self.num_items)
            if neg_item not in positives and neg_item not in neg_items:
                neg_items.append(neg_item)
        
        # Return: users, items, labels
        # 1 positive + num_negatives negatives
        users = [user] * (1 + self.num_negatives)
        items = [pos_item] + neg_items
        labels = [1.0] + [0.0] * self.num_negatives
        
        return (
            torch.LongTensor(users),
            torch.LongTensor(items),
            torch.FloatTensor(labels)
        )


def collate_fn(batch):
    """Flatten batch into single tensors."""
    users = torch.cat([b[0] for b in batch])
    items = torch.cat([b[1] for b in batch])
    labels = torch.cat([b[2] for b in batch])
    return users, items, labels


# Create dataset and loader
num_negatives = 4  # 4 negatives per positive (common choice)

train_dataset = NCFDataset(
    train_df, 
    num_items, 
    user_positive_items, 
    num_negatives=num_negatives
)

train_loader = DataLoader(
    train_dataset, 
    batch_size=256, 
    shuffle=True,
    collate_fn=collate_fn,
    num_workers=0
)

# Quick test
users, items, labels = next(iter(train_loader))
print(f"✅ Batch shape: users={users.shape}, items={items.shape}, labels={labels.shape}")
print(f"   Positive ratio: {labels.mean():.2%} (expected: {1/(1+num_negatives):.2%})")

---

## Part 3: Generalized Matrix Factorization (GMF)

GMF is matrix factorization with a twist: instead of a fixed dot product, we learn how to combine the element-wise product.

In [None]:
class GMF(nn.Module):
    """
    Generalized Matrix Factorization.
    
    Instead of: sum(user_emb * item_emb)  [fixed dot product]
    We learn:   linear(user_emb * item_emb)  [learnable combination]
    
    This allows the model to weight different latent dimensions differently.
    """
    
    def __init__(self, num_users, num_items, embedding_dim=64):
        super().__init__()
        
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)
        self.output = nn.Linear(embedding_dim, 1)
        
        self._init_weights()
        
    def _init_weights(self):
        nn.init.normal_(self.user_embedding.weight, std=0.01)
        nn.init.normal_(self.item_embedding.weight, std=0.01)
        nn.init.xavier_uniform_(self.output.weight)
        
    def forward(self, user_ids, item_ids):
        user_emb = self.user_embedding(user_ids)
        item_emb = self.item_embedding(item_ids)
        
        # Element-wise product (Hadamard product)
        element_product = user_emb * item_emb
        
        # Learned linear combination + sigmoid for probability
        output = torch.sigmoid(self.output(element_product))
        
        # squeeze(-1) drops only the last dim, so a batch of one keeps shape (1,)
        return output.squeeze(-1)


# Test GMF
gmf_model = GMF(num_users, num_items, embedding_dim=32).to(device)
test_users = torch.LongTensor([0, 1, 2]).to(device)
test_items = torch.LongTensor([10, 20, 30]).to(device)
test_output = gmf_model(test_users, test_items)

print(f"✅ GMF output: {test_output.cpu().detach().numpy()}")
print(f"   Parameters: {sum(p.numel() for p in gmf_model.parameters()):,}")
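
To see why GMF is called a *generalization* of matrix factorization, the sketch below checks that an output layer with all weights fixed to 1 and no bias reproduces the plain dot product (before the sigmoid). The tensors are random stand-ins for learned embeddings, not values from the model above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
user_emb = torch.randn(4, 8)   # 4 (user, item) pairs, 8 latent dimensions
item_emb = torch.randn(4, 8)

# Plain matrix factorization score: the dot product
dot_scores = (user_emb * item_emb).sum(dim=1)

# GMF-style output layer with weights fixed to 1 and no bias
output = nn.Linear(8, 1, bias=False)
with torch.no_grad():
    output.weight.fill_(1.0)
    gmf_logits = output(user_emb * item_emb).squeeze(-1)

print(torch.allclose(dot_scores, gmf_logits, atol=1e-5))
```

Once the weights are trainable, the model can up-weight or down-weight individual latent dimensions, which a fixed dot product cannot do.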

---

## Part 4: Multi-Layer Perceptron (MLP) Tower

The MLP tower concatenates user and item embeddings, then passes them through fully-connected layers to learn non-linear interactions.

In [None]:
class MLP(nn.Module):
    """
    Multi-Layer Perceptron for learning non-linear user-item interactions.
    
    Concatenates user and item embeddings, then passes through MLP layers.
    The MLP can learn complex patterns that GMF cannot.
    """
    
    def __init__(self, num_users, num_items, embedding_dim=64, 
                 hidden_layers=(128, 64, 32), dropout=0.2):
        super().__init__()
        
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)
        
        # Build MLP layers
        layers = []
        input_dim = embedding_dim * 2  # Concatenated user + item
        
        for hidden_dim in hidden_layers:
            layers.extend([
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(dropout)
            ])
            input_dim = hidden_dim
            
        self.mlp = nn.Sequential(*layers)
        self.output = nn.Linear(hidden_layers[-1], 1)
        
        self._init_weights()
        
    def _init_weights(self):
        nn.init.normal_(self.user_embedding.weight, std=0.01)
        nn.init.normal_(self.item_embedding.weight, std=0.01)
        for layer in self.mlp:
            if isinstance(layer, nn.Linear):
                nn.init.xavier_uniform_(layer.weight)
        nn.init.xavier_uniform_(self.output.weight)
        
    def forward(self, user_ids, item_ids):
        user_emb = self.user_embedding(user_ids)
        item_emb = self.item_embedding(item_ids)
        
        # Concatenate user and item embeddings
        concat = torch.cat([user_emb, item_emb], dim=1)
        
        # Pass through MLP
        hidden = self.mlp(concat)
        output = torch.sigmoid(self.output(hidden))
        
        # squeeze(-1) drops only the last dim, so a batch of one keeps shape (1,)
        return output.squeeze(-1)


# Test MLP
mlp_model = MLP(num_users, num_items, embedding_dim=32).to(device)
test_output = mlp_model(test_users, test_items)

print(f"✅ MLP output: {test_output.cpu().detach().numpy()}")
print(f"   Parameters: {sum(p.numel() for p in mlp_model.parameters()):,}")

---

## Part 5: NeuMF = GMF + MLP

The key insight of NeuMF: combine the linear power of GMF with the non-linear expressiveness of MLP!

In [None]:
class NeuMF(nn.Module):
    """
    Neural Matrix Factorization (NeuMF).
    
    Combines GMF (linear interactions) with MLP (non-linear interactions)
    to get the best of both worlds.
    
    Architecture:
    1. GMF pathway: user_emb * item_emb -> linear
    2. MLP pathway: concat(user_emb, item_emb) -> MLP layers
    3. Concatenate GMF and MLP outputs -> final prediction
    
    Paper: "Neural Collaborative Filtering" (He et al., WWW 2017)
    """
    
    def __init__(self, num_users, num_items, 
                 gmf_dim=32, mlp_dim=64, 
                 mlp_layers=(128, 64, 32),
                 dropout=0.2):
        super().__init__()
        
        self.num_users = num_users
        self.num_items = num_items
        
        # GMF pathway - separate embeddings
        self.gmf_user_emb = nn.Embedding(num_users, gmf_dim)
        self.gmf_item_emb = nn.Embedding(num_items, gmf_dim)
        
        # MLP pathway - separate embeddings
        self.mlp_user_emb = nn.Embedding(num_users, mlp_dim)
        self.mlp_item_emb = nn.Embedding(num_items, mlp_dim)
        
        # MLP layers
        layers = []
        input_dim = mlp_dim * 2
        
        for hidden_dim in mlp_layers:
            layers.extend([
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(dropout)
            ])
            input_dim = hidden_dim
            
        self.mlp = nn.Sequential(*layers)
        
        # Final prediction: GMF output (gmf_dim) + MLP output (mlp_layers[-1])
        self.output = nn.Linear(gmf_dim + mlp_layers[-1], 1)
        
        self._init_weights()
        
    def _init_weights(self):
        for emb in [self.gmf_user_emb, self.gmf_item_emb, 
                    self.mlp_user_emb, self.mlp_item_emb]:
            nn.init.normal_(emb.weight, std=0.01)
        for layer in self.mlp:
            if isinstance(layer, nn.Linear):
                nn.init.xavier_uniform_(layer.weight)
        nn.init.xavier_uniform_(self.output.weight)
        
    def forward(self, user_ids, item_ids):
        # GMF pathway: element-wise product
        gmf_user = self.gmf_user_emb(user_ids)
        gmf_item = self.gmf_item_emb(item_ids)
        gmf_output = gmf_user * gmf_item  # Shape: (batch, gmf_dim)
        
        # MLP pathway: concatenate and pass through MLP
        mlp_user = self.mlp_user_emb(user_ids)
        mlp_item = self.mlp_item_emb(item_ids)
        mlp_input = torch.cat([mlp_user, mlp_item], dim=1)
        mlp_output = self.mlp(mlp_input)  # Shape: (batch, mlp_layers[-1])
        
        # Combine GMF and MLP
        combined = torch.cat([gmf_output, mlp_output], dim=1)
        prediction = torch.sigmoid(self.output(combined))
        
        # squeeze(-1) drops only the last dim, so a batch of one keeps shape (1,)
        return prediction.squeeze(-1)
    
    def predict_all_items(self, user_id):
        """Predict scores for all items for a single user."""
        self.eval()
        with torch.no_grad():
            user_ids = torch.LongTensor([user_id] * self.num_items).to(
                next(self.parameters()).device
            )
            item_ids = torch.arange(self.num_items).to(
                next(self.parameters()).device
            )
            return self(user_ids, item_ids)


# Test NeuMF
neumf_model = NeuMF(num_users, num_items).to(device)
test_output = neumf_model(test_users, test_items)

print(f"✅ NeuMF output: {test_output.cpu().detach().numpy()}")
print(f"   Total parameters: {sum(p.numel() for p in neumf_model.parameters()):,}")

### 🔍 Architecture Visualization

```
User ID ──┬── GMF User Emb ──┐
          │                  ├── Element-wise × ──┐
Item ID ──┼── GMF Item Emb ──┘                    │
          │                                       ├── Concat ── Linear ── Sigmoid ── Output
          ├── MLP User Emb ──┐                    │
          │                  ├── Concat ── MLP ───┘
          └── MLP Item Emb ──┘
```
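
As a rough sanity check on this architecture, the parameter count can be worked out by hand and compared with the "Total parameters" figure printed for the model. The helper `neumf_param_count` is a hypothetical name. The sizes 943 and 1,682 are those of MovieLens 100K and assume `num_users` and `num_items` match them:

```python
def neumf_param_count(num_users, num_items, gmf_dim=32, mlp_dim=64,
                      mlp_layers=(128, 64, 32)):
    """Count trainable parameters of the NeuMF layout used in this lab."""
    # Four embedding tables: GMF user/item and MLP user/item
    embeddings = (num_users + num_items) * (gmf_dim + mlp_dim)
    # Each Linear layer has in_dim * out_dim weights plus out_dim biases
    mlp, in_dim = 0, 2 * mlp_dim
    for out_dim in mlp_layers:
        mlp += in_dim * out_dim + out_dim
        in_dim = out_dim
    # Final Linear over concat(GMF vector, last MLP layer) -> 1 logit, plus bias
    head = (gmf_dim + mlp_layers[-1]) + 1
    return embeddings + mlp + head

print(neumf_param_count(943, 1682))
```

With the default settings this comes to 278,913 parameters, of which 252,000 sit in the embedding tables. On a dataset this small, the embeddings dominate the model, not the MLP.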

---

## Part 6: Training and Evaluation

In [None]:
def train_ncf(model, train_loader, optimizer, device, epoch):
    """Train for one epoch."""
    model.train()
    total_loss = 0
    
    pbar = tqdm(train_loader, desc=f'Epoch {epoch}')
    for users, items, labels in pbar:
        users = users.to(device)
        items = items.to(device)
        labels = labels.to(device)
        
        # Forward
        predictions = model(users, items)
        loss = F.binary_cross_entropy(predictions, labels)
        
        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
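        # loss is already the batch mean, so accumulate it once per batch
        total_loss += loss.item()
        pbar.set_postfix({'loss': loss.item()})
    
    # Average over batches. Dividing a sample-weighted sum by len(dataset)
    # would overstate the loss by a factor of (1 + num_negatives), because
    # every dataset entry expands into 1 positive + num_negatives negatives.
    return total_loss / len(train_loader)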


def evaluate_hit_rate(model, test_df, user_positive_items, k=10):
    """
    Evaluate Hit Rate @ K.
    
    For each user:
    1. Predict scores for all items
    2. Check if the held-out test item is in top K
    """
    model.eval()
    hits = 0
    
    # Group test by user
    test_users = test_df['user_id'].unique()
    
    with torch.no_grad():
        for user_id in tqdm(test_users, desc=f'Evaluating HR@{k}'):
            # Get test item for this user
            test_item = test_df[test_df['user_id'] == user_id]['item_id'].values[0]
            
            # Predict all items
            scores = model.predict_all_items(user_id)
            
            # Mask out training items (we can't recommend items already interacted with)
            train_items = list(user_positive_items.get(user_id, set()))
            scores[train_items] = -float('inf')
            
            # Get top K
            _, top_k_items = torch.topk(scores, k)
            
            # Check if test item is in top K
            if test_item in top_k_items.cpu().numpy():
                hits += 1
    
    return hits / len(test_users)
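
For comparison with published results, the sketch below shows a sampled-negative variant of the metric: the held-out item is ranked against a random sample of unseen items instead of the whole catalogue. It is an illustration, not a drop-in replacement for `evaluate_hit_rate`. The function name `hit_at_k_sampled`, the `score_fn` callback, and the default of 100 negatives are assumptions:

```python
import numpy as np

def hit_at_k_sampled(score_fn, user_id, test_item, seen_items, num_items,
                     k=10, num_negatives=100, rng=None):
    """Return 1 if test_item ranks in the top k among sampled unseen items."""
    if rng is None:
        rng = np.random.default_rng()
    # Candidate negatives: items the user never interacted with
    excluded = set(seen_items) | {test_item}
    pool = np.array([i for i in range(num_items) if i not in excluded])
    negatives = rng.choice(pool, size=min(num_negatives, len(pool)), replace=False)
    candidates = np.concatenate([[test_item], negatives])
    scores = np.asarray(score_fn(user_id, candidates), dtype=float)
    # Rank = number of negatives scoring strictly higher (ties favor the test item)
    rank = int((scores[1:] > scores[0]).sum())
    return int(rank < k)

# Toy scorer: item i always gets score i / 199, regardless of the user
toy_scores = np.linspace(0.0, 1.0, 200)
score_fn = lambda user_id, items: toy_scores[items]
rng = np.random.default_rng(0)
hit = hit_at_k_sampled(score_fn, user_id=0, test_item=199,
                       seen_items={3, 4}, num_items=200, rng=rng)
miss = hit_at_k_sampled(score_fn, user_id=0, test_item=0,
                        seen_items={3, 4}, num_items=200, rng=rng)
print(hit, miss)
```

Because the held-out item competes with far fewer candidates, hit rates under this protocol are typically much higher than under full ranking, so always state which protocol a reported number uses.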

In [None]:
# Initialize model
model = NeuMF(
    num_users=num_users,
    num_items=num_items,
    gmf_dim=32,
    mlp_dim=64,
    mlp_layers=[128, 64, 32],
    dropout=0.2
).to(device)

optimizer = optim.Adam(model.parameters(), lr=0.001)

print(f"🎯 Target: Hit Rate @ 10 > 0.65")
print(f"   Model parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
# Training loop
num_epochs = 10
train_losses = []
hit_rates = []
best_hr = 0
best_epoch = 0  # defined up front so the summary below works even if HR never improves

for epoch in range(1, num_epochs + 1):
    # Train
    train_loss = train_ncf(model, train_loader, optimizer, device, epoch)
    train_losses.append(train_loss)
    
    # Evaluate every 2 epochs (evaluation is slow)
    if epoch % 2 == 0 or epoch == num_epochs:
        hr = evaluate_hit_rate(model, test_df, user_positive_items, k=10)
        hit_rates.append((epoch, hr))
        
        if hr > best_hr:
            best_hr = hr
            best_epoch = epoch
        
        print(f"\n📊 Epoch {epoch}: Loss={train_loss:.4f}, HR@10={hr:.4f}")

print(f"\n{'='*50}")
print(f"🏆 Best HR@10: {best_hr:.4f} (Epoch {best_epoch})")
if best_hr > 0.65:
    print(f"🎉 Goal achieved! HR@10 > 0.65")
else:
    print(f"📈 Keep tuning! Try more epochs or adjust hyperparameters.")

In [None]:
# Visualize training
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Training loss
axes[0].plot(range(1, len(train_losses)+1), train_losses, 'b-', linewidth=2)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Training Loss (BCE)')
axes[0].set_title('Training Loss')
axes[0].grid(True, alpha=0.3)

# Hit Rate
epochs, hrs = zip(*hit_rates)
axes[1].plot(epochs, hrs, 'go-', linewidth=2, markersize=8)
axes[1].axhline(y=0.65, color='red', linestyle='--', label='Target: 0.65')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Hit Rate @ 10')
axes[1].set_title('Hit Rate @ 10')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## Part 7: Comparing Models

Let's compare GMF, MLP, and NeuMF to see which performs best.

In [None]:
def train_and_evaluate_model(model_class, model_kwargs, train_loader, test_df, 
                             user_positive_items, device, epochs=10):
    """
    Train a model and return its HR@10 after the final epoch.
    """
    model = model_class(**model_kwargs).to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    best_hr = 0
    for epoch in range(1, epochs + 1):
        # Train (silently)
        model.train()
        for users, items, labels in train_loader:
            users, items, labels = users.to(device), items.to(device), labels.to(device)
            predictions = model(users, items)
            loss = F.binary_cross_entropy(predictions, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        
        # Evaluate on last epoch
        if epoch == epochs:
            hr = evaluate_hit_rate(model, test_df, user_positive_items, k=10)
            best_hr = hr
    
    return best_hr, model


print("🔬 Comparing Models (this takes a few minutes)...\n")

results = {}

# GMF
print("Training GMF...")
gmf_hr, _ = train_and_evaluate_model(
    GMF, 
    {'num_users': num_users, 'num_items': num_items, 'embedding_dim': 64},
    train_loader, test_df, user_positive_items, device, epochs=10
)
results['GMF'] = gmf_hr
print(f"  GMF HR@10: {gmf_hr:.4f}")

# MLP
print("\nTraining MLP...")
mlp_hr, _ = train_and_evaluate_model(
    MLP,
    {'num_users': num_users, 'num_items': num_items, 'embedding_dim': 64},
    train_loader, test_df, user_positive_items, device, epochs=10
)
results['MLP'] = mlp_hr
print(f"  MLP HR@10: {mlp_hr:.4f}")

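# NeuMF (already trained above). Use its final-epoch HR@10 rather than the
# best checkpoint, so all three models are compared at the same point in training.
neumf_hr = hit_rates[-1][1]
results['NeuMF'] = neumf_hr
print(f"\n  NeuMF HR@10: {neumf_hr:.4f}")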

In [None]:
# Visualize comparison
plt.figure(figsize=(8, 5))

models = list(results.keys())
hrs = [results[m] for m in models]
colors = ['#3498db', '#2ecc71', '#e74c3c']

bars = plt.bar(models, hrs, color=colors, edgecolor='black', linewidth=2)
plt.axhline(y=0.65, color='gray', linestyle='--', label='Target: 0.65')

# Add value labels
for bar, hr in zip(bars, hrs):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
             f'{hr:.3f}', ha='center', va='bottom', fontsize=12, fontweight='bold')

plt.ylabel('Hit Rate @ 10')
plt.title('Model Comparison: Hit Rate @ 10')
plt.legend()
plt.ylim(0, max(hrs) + 0.1)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\n📊 Results:")
print(f"   GMF:    {results['GMF']:.4f}")
print(f"   MLP:    {results['MLP']:.4f}")
print(f"   NeuMF:  {results['NeuMF']:.4f}")
print(f"\n   NeuMF is designed to combine the strengths of both approaches.")
print(f"   Check whether it actually wins on your run!")

---

## ✋ Try It Yourself!

### Exercise 1: More Negative Samples

Try different numbers of negative samples (1, 4, 8, 16) and see how it affects training.

<details>
<summary>💡 Hint</summary>

More negatives = harder training but better discrimination.
Too few = easy training but poor generalization.
Typical values: 4-10 negatives per positive.

</details>

In [None]:
# YOUR CODE HERE



### Exercise 2: Deeper MLP

Try different MLP architectures:
- Shallow: [64, 32]
- Deep: [256, 128, 64, 32]
- Wide: [512, 256]

<details>
<summary>💡 Hint</summary>

Deeper networks can learn more complex patterns but:
- Take longer to train
- Risk overfitting on small datasets
- May need more regularization

</details>

In [None]:
# YOUR CODE HERE



---

## ⚠️ Common Mistakes

### Mistake 1: Forgetting to Mask Training Items During Evaluation

In [None]:
# ❌ Wrong: Recommend items already seen
# scores = model.predict_all_items(user_id)
# top_k = torch.topk(scores, 10)  # Might include items user already rated!

# ✅ Right: Mask out training items
# train_items = list(user_positive_items.get(user_id, set()))
# scores[train_items] = -float('inf')  # These won't be in top K
# top_k = torch.topk(scores, 10)

print("Why masking matters:")
print("  Without masking: items the user already interacted with crowd the top K,")
print("                   pushing the held-out item out and deflating HR@10")
print("  With masking: HR@10 measures how well the model ranks unseen items")

### Mistake 2: Using the Same Embeddings for GMF and MLP

In [None]:
# ❌ Wrong: Shared embeddings
# self.user_emb = nn.Embedding(num_users, dim)  # Used by both GMF and MLP

# ✅ Right: Separate embeddings for each pathway
# self.gmf_user_emb = nn.Embedding(num_users, gmf_dim)
# self.mlp_user_emb = nn.Embedding(num_users, mlp_dim)

print("Why separate embeddings?")
print("  GMF learns embeddings optimized for element-wise product")
print("  MLP learns embeddings optimized for concatenation + non-linear transform")
print("  These are different objectives - let each pathway specialize!")

### Mistake 3: Not Resampling Negatives Each Epoch

In [None]:
# ❌ Wrong: Pre-generate all negatives (same negatives every epoch)
# negatives = generate_negatives_once()  # Fixed negatives

# ✅ Right: Sample fresh negatives each epoch (as we do in NCFDataset)
# Negatives are sampled in __getitem__, so each epoch sees different negatives

print("Why fresh negatives matter:")
print("  Static negatives: Model memorizes 'item 5 is negative for user 3'")
print("  Fresh negatives: Model learns general patterns of positive vs negative")

---

## 🎉 Checkpoint

You've learned:
- ✅ Why neural networks can outperform linear matrix factorization
- ✅ How negative sampling works for implicit feedback
- ✅ GMF: learnable element-wise product
- ✅ MLP: non-linear user-item interaction learning
- ✅ NeuMF: combining GMF and MLP for best results
- ✅ Evaluating with Hit Rate @ K

---

## 🚀 Challenge (Optional)

**Implement Pre-training (15-20 min):**

The original NeuMF paper suggests pre-training GMF and MLP separately, then combining their weights to initialize NeuMF. This often improves final performance.

Steps:
1. Train GMF alone for N epochs
2. Train MLP alone for N epochs
3. Initialize NeuMF with weights from trained GMF and MLP
4. Fine-tune the combined NeuMF
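
Step 3 is the least obvious part. Embeddings and hidden layers can be copied over tensor by tensor, but the two pre-trained output layers have to be merged into one. The paper concatenates their weights with a trade-off factor α (set to 0.5 there). The sketch below is one possible starting point for that merge. The helper name `combine_output_layers` is hypothetical, and blending the biases the same way is an assumption of this sketch:

```python
import torch
import torch.nn as nn

def combine_output_layers(gmf_out, mlp_out, alpha=0.5):
    """Build a NeuMF output layer from pre-trained GMF and MLP output layers."""
    fused = nn.Linear(gmf_out.in_features + mlp_out.in_features, 1)
    with torch.no_grad():
        # Weights: [alpha * h_GMF ; (1 - alpha) * h_MLP]
        fused.weight.copy_(torch.cat(
            [alpha * gmf_out.weight, (1 - alpha) * mlp_out.weight], dim=1))
        # Assumption of this sketch: blend the two biases with the same factor
        fused.bias.copy_(alpha * gmf_out.bias + (1 - alpha) * mlp_out.bias)
    return fused

torch.manual_seed(0)
gmf_out, mlp_out = nn.Linear(32, 1), nn.Linear(32, 1)  # stand-ins for trained layers
fused = combine_output_layers(gmf_out, mlp_out, alpha=0.5)

# The fused layer's logit equals the alpha-weighted blend of the two logits
x_gmf, x_mlp = torch.randn(5, 32), torch.randn(5, 32)
with torch.no_grad():
    fused_logit = fused(torch.cat([x_gmf, x_mlp], dim=1))
    blended = 0.5 * gmf_out(x_gmf) + 0.5 * mlp_out(x_mlp)
print(torch.allclose(fused_logit, blended, atol=1e-5))
```

For this to line up with the `NeuMF` class, the GMF embedding size must equal `gmf_dim` and the standalone MLP must use the same `mlp_dim` and hidden layers as the MLP pathway.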

---

## 📖 Further Reading

- [Neural Collaborative Filtering Paper](https://arxiv.org/abs/1708.05031) - The original NCF paper
- [Deep Learning for Recommender Systems](https://dl.acm.org/doi/10.1145/3285029) - Survey paper
- [RecBole Library](https://recbole.io/) - Unified framework for recommendation research

---

## 🧹 Cleanup

In [None]:
import gc

del model, gmf_model, mlp_model, neumf_model
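# Collect Python garbage first so freed tensors can be released from the CUDA cache
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

print("✅ Memory cleared!")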

---

## ➡️ Next Steps

In the next notebook, we'll build a **Two-Tower Retrieval System** - the architecture used by Google, YouTube, and other tech giants for retrieving candidates from billions of items!

Continue to: **03-two-tower-retrieval.ipynb**