# Lab B.3: Two-Tower Retrieval System

**Module:** B - Recommender Systems  
**Time:** 2.5 hours  
**Difficulty:** ⭐⭐⭐⭐ (Advanced)

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Understand why two-tower architecture scales to billions of items
- [ ] Build query (user) and item towers with PyTorch
- [ ] Train with in-batch negative sampling
- [ ] Build a FAISS index for fast approximate nearest neighbor search
- [ ] Implement a real-time retrieval pipeline with sub-10ms latency

---

## 📚 Prerequisites

- Completed: Lab B.2 (Neural Collaborative Filtering)
- Knowledge of: Embeddings, similarity metrics

---

## 🌍 Real-World Context

**The Scale Problem:** YouTube has billions of videos, and Amazon has hundreds of millions of products. You can't score every single item for every user in real time!

**The Solution:** Two-stage recommendation:
1. **Retrieval (Two-Tower)**: Quickly find ~1000 candidates from millions
2. **Ranking**: Score those 1000 candidates with a more complex model

Two-Tower is used at:
- 📺 **YouTube**: The famous "Deep Neural Networks for YouTube Recommendations" paper
- 🔍 **Google Search**: Finding relevant documents
- 🛒 **Alibaba**: E-commerce product retrieval
- 🎵 **Spotify**: Song candidate generation

---

## 🧒 ELI5: Two-Tower Architecture

> **Imagine a huge library with 10 million books...**
>
> The librarian can't read your mind and check every book. Instead:
>
> 1. **Query Tower (You)**: You describe what you want → gets converted to a "vibe"
>    (a vector like [0.2, -0.5, 0.8, ...])
>
> 2. **Item Tower (Books)**: Every book has been pre-converted to its "vibe" 
>    (stored in a special index)
>
> 3. **Matching**: Find books with similar "vibes" using fast nearest neighbor search
>
> The magic: **Books are pre-encoded offline!** At query time, we only encode the user
> (fast!) and do a similarity search (also fast!).
>
> **Why "Two Towers"?** Because user and item processing are completely separate -
> like two towers that only meet at the top (similarity computation).

```
User Features ──→ [ Query Tower ] ──→ User Embedding ──┐
                                                       ├─→ Similarity ──→ Top K Items
Item Features ──→ [ Item Tower ]  ──→ Item Embeddings ─┘
                        ↑                    │
                   (offline)            (indexed in FAISS)
```
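
The flow in the diagram can be sketched end to end with NumPy alone. This is a minimal illustration under stated assumptions: random vectors stand in for the outputs of both towers, and `l2_normalize` and `retrieve_top_k` are hypothetical helper names that do not appear elsewhere in this lab.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Offline: pretend the item tower has already encoded 10,000 items (random stand-ins).
item_embeddings = l2_normalize(rng.standard_normal((10_000, 64)).astype(np.float32))


def retrieve_top_k(query_embedding, item_embeddings, k=5):
    """Brute-force top-k by inner product; a real system would use an ANN index."""
    scores = item_embeddings @ query_embedding      # one score per item
    top = np.argpartition(-scores, k)[:k]           # unordered top-k candidates
    top = top[np.argsort(-scores[top])]             # sort only the k winners
    return top, scores[top]


# Online: encode one user (again a random stand-in) and search.
query = l2_normalize(rng.standard_normal((1, 64)).astype(np.float32))[0]
ids, scores = retrieve_top_k(query, item_embeddings, k=5)
print(ids, scores)
```

Only the last three lines run per request; the item matrix is computed once and reused, which is the property the rest of this lab relies on.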

---

## Part 1: Setup and Feature Engineering

In [None]:
import sys
from pathlib import Path

module_dir = Path.cwd().parent if 'notebooks' in str(Path.cwd()) else Path.cwd()
sys.path.insert(0, str(module_dir / 'scripts'))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
import time
import warnings
warnings.filterwarnings('ignore')

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from data_utils import download_movielens, train_test_split_by_time

# Check for FAISS
try:
    import faiss
    FAISS_AVAILABLE = True
    print("✅ FAISS is available!")
except ImportError:
    FAISS_AVAILABLE = False
    print("⚠️  FAISS not installed. Install with: pip install faiss-cpu")
    print("   We'll use a simple numpy implementation instead.")

np.random.seed(42)
torch.manual_seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"\n🖥️  Using device: {device}")

if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

In [None]:
# Load data
ratings_df, movies_df = download_movielens('100k')

num_users = ratings_df['user_id'].nunique()
num_items = ratings_df['item_id'].nunique()

print(f"📊 Dataset: {num_users} users, {num_items} items, {len(ratings_df):,} ratings")

### Creating Item Features

Two-tower models can use rich features. Let's create item features from genres.

In [None]:
# Extract genre features (multi-hot encoding)
all_genres = set()
for genres in movies_df['genres'].dropna():
    all_genres.update(genres.split('|'))
all_genres = sorted([g for g in all_genres if g])  # Remove empty strings

print(f"Genres found: {all_genres}")

# Create genre feature matrix
def create_genre_features(movies_df, all_genres):
    """Create multi-hot genre encoding for each movie."""
    genre_to_idx = {g: i for i, g in enumerate(all_genres)}
    num_movies = movies_df['item_id'].max() + 1
    
    features = np.zeros((num_movies, len(all_genres)), dtype=np.float32)
    
    for _, row in movies_df.iterrows():
        item_id = row['item_id']
        genres = row['genres']
        if pd.notna(genres):
            for genre in genres.split('|'):
                if genre in genre_to_idx:
                    features[item_id, genre_to_idx[genre]] = 1.0
    
    return features

item_genre_features = create_genre_features(movies_df, all_genres)
print(f"\nItem genre features shape: {item_genre_features.shape}")
print(f"Example (Toy Story): {item_genre_features[0]}")
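
For intuition, the same multi-hot encoding can be reproduced on a toy table with pandas' built-in `Series.str.get_dummies`. This is a sketch on made-up data (`toy_movies` is hypothetical); unlike `create_genre_features` above, the rows here follow the frame's row order rather than being indexed by `item_id`.

```python
import pandas as pd

# Three made-up movies with pipe-separated genres, mirroring the 'genres' column format.
toy_movies = pd.DataFrame({
    "item_id": [0, 1, 2],
    "genres": ["Animation|Comedy", "Action", "Action|Comedy"],
})

# One column per genre (sorted alphabetically), 1 where the movie has that genre.
multi_hot = toy_movies["genres"].str.get_dummies(sep="|")
features = multi_hot.to_numpy(dtype="float32")

print(list(multi_hot.columns))
print(features)
```

Each row has as many ones as the movie has genres, so a movie tagged with two genres contributes to two feature dimensions.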

In [None]:
# Create user features from their rating history
def create_user_features(ratings_df, item_genre_features, num_users):
    """
    Create user features based on their watched genres.
    
    For each user, we average the genre vectors of items they've rated,
    weighted by rating (higher rated = more weight).
    """
    num_genres = item_genre_features.shape[1]
    user_features = np.zeros((num_users, num_genres), dtype=np.float32)
    user_counts = np.zeros(num_users)
    
    for _, row in ratings_df.iterrows():
        user_id = int(row['user_id'])
        item_id = int(row['item_id'])
        rating = row['rating']
        
        # Weight by rating (normalized to 0-1)
        weight = (rating - 1) / 4  # Maps 1-5 to 0-1
        user_features[user_id] += item_genre_features[item_id] * weight
        user_counts[user_id] += 1
    
    # Normalize
    user_counts = np.maximum(user_counts, 1)  # Avoid division by zero
    user_features = user_features / user_counts[:, np.newaxis]
    
    return user_features

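# Note: features are built from all ratings, including those later held out by the
# time-based split; pass only training interactions for a leak-free evaluation.
user_genre_features = create_user_features(ratings_df, item_genre_features, num_users)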
print(f"User features shape: {user_genre_features.shape}")
print(f"Example user 0 preferences: {user_genre_features[0][:5]}...")
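
A small worked example may make the weighting concrete. The sketch below uses made-up data (one user, two items, two genres) and follows the same arithmetic as `create_user_features`: each rating is mapped to a weight in [0, 1], and the weighted genre vectors are divided by the number of ratings, not by the sum of weights.

```python
import numpy as np

item_genres = np.array([[1.0, 0.0],    # item 0 belongs to genre A
                        [0.0, 1.0]],   # item 1 belongs to genre B
                       dtype=np.float32)
user_ids = np.array([0, 0])            # a single user rated both items
item_ids = np.array([0, 1])
ratings = np.array([5.0, 3.0], dtype=np.float32)

weights = (ratings - 1) / 4            # rating 5 -> 1.0, rating 3 -> 0.5

# Accumulate weighted genre vectors per user without a Python loop.
profile_sum = np.zeros((1, 2), dtype=np.float32)
np.add.at(profile_sum, user_ids, item_genres[item_ids] * weights[:, None])

counts = np.bincount(user_ids, minlength=1)
profile = profile_sum / np.maximum(counts, 1)[:, None]
print(profile)
```

The user's profile comes out as 0.5 for genre A and 0.25 for genre B: (1.0 + 0) / 2 and (0 + 0.5) / 2. Note that a rating of 1 gets weight 0, so the lowest-rated items add nothing to the profile but still count in the denominator.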

---

## Part 2: Two-Tower Model Architecture

### Key PyTorch Components for Two-Tower Models

Before building our towers, let's understand the key components we'll use:

**`F.normalize` (Vector Normalization)**
- Normalizes vectors to have unit length (L2 norm = 1)
- Essential for cosine similarity: dot product of normalized vectors = cosine similarity
- `F.normalize(x, p=2, dim=-1)`: L2 normalize along last dimension

```python
import torch.nn.functional as F

embeddings = model(inputs)  # Shape: (batch, 128)
normalized = F.normalize(embeddings, p=2, dim=-1)  # Each row has length 1
# Now: normalized @ normalized.T gives cosine similarities
```
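
The claim that a dot product of normalized vectors equals cosine similarity can be checked directly. This is a small verification sketch on random tensors, not part of the model code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
a = torch.randn(4, 128)
b = torch.randn(4, 128)

a_n = F.normalize(a, p=2, dim=-1)
b_n = F.normalize(b, p=2, dim=-1)

# Row-wise dot product of the normalized vectors...
dot_of_normalized = (a_n * b_n).sum(dim=-1)
# ...matches PyTorch's own cosine similarity on the raw vectors.
cosine = F.cosine_similarity(a, b, dim=-1)

print(a_n.norm(dim=-1))  # each row has unit length
print(torch.allclose(dot_of_normalized, cosine, atol=1e-5))
```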

**`nn.BatchNorm1d` (Batch Normalization)**
- Normalizes activations across the batch dimension
- Speeds up training and improves stability
- `nn.BatchNorm1d(num_features)`: normalizes each of the `num_features` features using its mean and variance across the batch

```python
self.bn = nn.BatchNorm1d(128)  # For 128-dimensional features
output = self.bn(hidden)  # Normalize each feature across the batch
```
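
The difference between training and evaluation mode matters for retrieval, where a single user is often encoded at a time. The following sketch, on made-up data, shows both modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(256, 4) * 5 + 3   # four features with mean ~3 and std ~5

# Training mode: statistics come from the current batch.
bn.train()
y = bn(x)
print(y.mean(dim=0), y.std(dim=0, unbiased=False))  # roughly 0 and 1 per feature

# Evaluation mode: running statistics are used, so a batch of one is fine.
bn.eval()
single = bn(x[:1])
print(single.shape)
```

In training mode a batch containing a single sample raises an error because a batch variance cannot be estimated, which is one reason to call `model.eval()` before serving single-user queries.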

**`F.cross_entropy` (Cross-Entropy Loss)**
- Multi-class classification loss (softmax + negative log likelihood)
- For in-batch negatives: "Which of these N items is the correct one?"
- `F.cross_entropy(logits, labels)`: logits are raw scores, labels are class indices

```python
logits = query @ items.T  # Shape: (batch, batch) similarity matrix
labels = torch.arange(batch_size)  # Diagonal = positive pairs
loss = F.cross_entropy(logits, labels)  # Softmax + NLL
```
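
To see what `F.cross_entropy` computes in this setting, the loss can be rebuilt by hand: take a row-wise log-softmax and average the negated diagonal entries. This sketch uses random embeddings and an illustrative temperature of 0.07.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, dim = 4, 8
query = F.normalize(torch.randn(batch_size, dim), dim=-1)
items = F.normalize(torch.randn(batch_size, dim), dim=-1)

logits = query @ items.T / 0.07          # (batch, batch) scaled similarities
labels = torch.arange(batch_size)        # row i's positive sits in column i

loss = F.cross_entropy(logits, labels)

# Manual version: log-probability of the diagonal entry in each row.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs.diagonal().mean()

print(loss.item(), manual.item())
```

Both numbers agree, which shows that the loss only rewards raising each diagonal score relative to the other scores in its row.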

In [None]:
class QueryTower(nn.Module):
    """
    Query (User) Tower.
    
    Encodes user features and/or user ID into a dense embedding.
    This tower is run at query time for each user request.
    """
    
    def __init__(self, num_users, user_feature_dim, embedding_dim=128, 
                 hidden_dims=[256, 128], use_features=True):
        super().__init__()
        
        self.use_features = use_features
        
        # User ID embedding
        self.user_embedding = nn.Embedding(num_users, hidden_dims[0])
        
        # Feature processing
        if use_features:
            self.feature_layer = nn.Linear(user_feature_dim, hidden_dims[0])
            input_dim = hidden_dims[0] * 2  # ID embedding + feature embedding
        else:
            input_dim = hidden_dims[0]
        
        # MLP layers
        layers = []
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.BatchNorm1d(hidden_dim),
            ])
            input_dim = hidden_dim
        
        layers.append(nn.Linear(hidden_dims[-1], embedding_dim))
        self.mlp = nn.Sequential(*layers)
        
        self._init_weights()
        
    def _init_weights(self):
        nn.init.normal_(self.user_embedding.weight, std=0.01)
        
    def forward(self, user_ids, user_features=None):
        """
        Encode users to embedding space.
        
        Args:
            user_ids: (batch_size,) tensor of user IDs
            user_features: (batch_size, feature_dim) tensor of user features
            
        Returns:
            L2-normalized user embeddings (batch_size, embedding_dim)
        """
        user_emb = self.user_embedding(user_ids)
        
        if self.use_features and user_features is not None:
            feature_emb = self.feature_layer(user_features)
            combined = torch.cat([user_emb, feature_emb], dim=1)
        else:
            combined = user_emb
        
        output = self.mlp(combined)
        
        # L2 normalize for cosine similarity
        return F.normalize(output, p=2, dim=-1)


class ItemTower(nn.Module):
    """
    Item Tower.
    
    Encodes item features and/or item ID into a dense embedding.
    This tower is run OFFLINE for all items, and embeddings are stored in an index.
    """
    
    def __init__(self, num_items, item_feature_dim, embedding_dim=128,
                 hidden_dims=[256, 128], use_features=True):
        super().__init__()
        
        self.use_features = use_features
        
        # Item ID embedding
        self.item_embedding = nn.Embedding(num_items, hidden_dims[0])
        
        # Feature processing
        if use_features:
            self.feature_layer = nn.Linear(item_feature_dim, hidden_dims[0])
            input_dim = hidden_dims[0] * 2
        else:
            input_dim = hidden_dims[0]
        
        # MLP layers
        layers = []
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.BatchNorm1d(hidden_dim),
            ])
            input_dim = hidden_dim
        
        layers.append(nn.Linear(hidden_dims[-1], embedding_dim))
        self.mlp = nn.Sequential(*layers)
        
        self._init_weights()
        
    def _init_weights(self):
        nn.init.normal_(self.item_embedding.weight, std=0.01)
        
    def forward(self, item_ids, item_features=None):
        """Encode items to embedding space."""
        item_emb = self.item_embedding(item_ids)
        
        if self.use_features and item_features is not None:
            feature_emb = self.feature_layer(item_features)
            combined = torch.cat([item_emb, feature_emb], dim=1)
        else:
            combined = item_emb
        
        output = self.mlp(combined)
        return F.normalize(output, p=2, dim=-1)


class TwoTowerModel(nn.Module):
    """
    Two-Tower Retrieval Model.
    
    Combines query and item towers, computing similarity for training.
    """
    
    def __init__(self, num_users, num_items, user_feature_dim, item_feature_dim,
                 embedding_dim=128, hidden_dims=[256, 128], 
                 temperature=0.07, use_features=True):
        super().__init__()
        
        self.query_tower = QueryTower(
            num_users, user_feature_dim, embedding_dim, hidden_dims, use_features
        )
        self.item_tower = ItemTower(
            num_items, item_feature_dim, embedding_dim, hidden_dims, use_features
        )
        
        # Temperature for softmax (lower = sharper distribution).
        # Stored as an nn.Parameter, so the optimizer updates it from this initial value.
        self.temperature = nn.Parameter(torch.tensor(temperature))
        
    def forward(self, user_ids, item_ids, user_features=None, item_features=None):
        """
        Compute similarity matrix for in-batch negatives.
        
        Returns:
            logits: (batch_size, batch_size) similarity matrix
            Diagonal contains positive pairs, off-diagonal are negatives.
        """
        query_emb = self.query_tower(user_ids, user_features)
        item_emb = self.item_tower(item_ids, item_features)
        
        # Compute all pairwise similarities
        # logits[i,j] = similarity between query i and item j
        # temperature is a learnable nn.Parameter; clamp keeps the divisor positive
        logits = torch.matmul(query_emb, item_emb.T) / self.temperature.clamp(min=1e-3)
        
        return logits
    
    def encode_queries(self, user_ids, user_features=None):
        """Encode users (for inference)."""
        return self.query_tower(user_ids, user_features)
    
    def encode_items(self, item_ids, item_features=None):
        """Encode items (for building index)."""
        return self.item_tower(item_ids, item_features)


# Test the model
user_feature_dim = user_genre_features.shape[1]
item_feature_dim = item_genre_features.shape[1]

model = TwoTowerModel(
    num_users=num_users,
    num_items=num_items,
    user_feature_dim=user_feature_dim,
    item_feature_dim=item_feature_dim,
    embedding_dim=128,
    use_features=True
).to(device)

print(f"✅ Two-Tower Model created!")
print(f"   Parameters: {sum(p.numel() for p in model.parameters()):,}")
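
The effect of the `temperature` argument is easiest to see on a fixed set of cosine similarities. The sketch below uses four made-up similarity values and three illustrative temperatures; `top_prob` is a hypothetical name for the probability assigned to the best-matching item.

```python
import torch
import torch.nn.functional as F

# Made-up cosine similarities between one query and four items.
similarities = torch.tensor([0.9, 0.7, 0.1, -0.3])

top_prob = {}
for temperature in (0.01, 0.07, 0.5):
    probs = F.softmax(similarities / temperature, dim=0)
    top_prob[temperature] = probs[0].item()
    print(f"T={temperature}: {probs}")
```

Because cosine similarities are confined to [-1, 1], dividing by a small temperature is what lets the softmax separate the best item from the rest; at 0.5 the top item receives only about half of the probability mass here.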

---

## Part 3: In-Batch Negative Sampling

The key training trick: use other items in the same batch as negatives!

In [None]:
class TwoTowerDataset(Dataset):
    """
    Dataset for Two-Tower model training.
    
    Returns positive (user, item) pairs with their features.
    Negatives come from other items in the batch (in-batch negatives).
    """
    
    def __init__(self, interactions_df, user_features, item_features):
        self.users = interactions_df['user_id'].values
        self.items = interactions_df['item_id'].values
        self.user_features = torch.FloatTensor(user_features)
        self.item_features = torch.FloatTensor(item_features)
        
    def __len__(self):
        return len(self.users)
    
    def __getitem__(self, idx):
        user_id = self.users[idx]
        item_id = self.items[idx]
        
        return {
            'user_id': user_id,
            'item_id': item_id,
            'user_features': self.user_features[user_id],
            'item_features': self.item_features[item_id],
        }


def in_batch_negative_loss(logits):
    """
    Compute in-batch negative loss.
    
    The positive pairs are on the diagonal (user i, item i).
    All other pairs in the batch are negatives.
    
    This is essentially a softmax classification:
    "Which item in the batch is the correct one for this user?"
    """
    batch_size = logits.shape[0]
    labels = torch.arange(batch_size, device=logits.device)
    return F.cross_entropy(logits, labels)


# Create datasets
train_df, test_df = train_test_split_by_time(ratings_df, test_ratio=0.2)

train_dataset = TwoTowerDataset(train_df, user_genre_features, item_genre_features)
train_loader = DataLoader(
    train_dataset, 
    batch_size=512,  # Larger batch = more negatives per positive
    shuffle=True,
    num_workers=0
)

print(f"✅ Dataset created: {len(train_dataset):,} training interactions")
print(f"   Batch size: 512 (511 negatives per positive!)")

### 🧒 ELI5: In-Batch Negatives

> **Why is this so clever?**
>
> Traditional negative sampling: For each positive pair, randomly sample N negatives.
> With batch size 512 and 4 negatives each: 512 × 4 = 2048 negative computations.
>
> In-batch negatives: Each batch has 512 positive pairs. 
> For user 1, items 2-512 are negatives. For user 2, items 1, 3-512 are negatives.
> We get 511 negatives per positive **for free** - just from computing the batch!
>
> This is why two-tower training can scale to billions of items efficiently.

---

## Part 4: Training

In [None]:
def train_two_tower(model, train_loader, optimizer, device, epoch):
    """Train for one epoch."""
    model.train()
    total_loss = 0
    
    pbar = tqdm(train_loader, desc=f'Epoch {epoch}')
    for batch in pbar:
        user_ids = batch['user_id'].to(device)
        item_ids = batch['item_id'].to(device)
        user_features = batch['user_features'].to(device)
        item_features = batch['item_features'].to(device)
        
        # Forward: compute similarity matrix
        logits = model(user_ids, item_ids, user_features, item_features)
        
        # Loss: cross-entropy with diagonal as targets
        loss = in_batch_negative_loss(logits)
        
        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item() * len(user_ids)
        pbar.set_postfix({'loss': loss.item()})
    
    return total_loss / len(train_loader.dataset)

In [None]:
# Initialize training
model = TwoTowerModel(
    num_users=num_users,
    num_items=num_items,
    user_feature_dim=user_feature_dim,
    item_feature_dim=item_feature_dim,
    embedding_dim=128,
    hidden_dims=[256, 128],
    temperature=0.07,
    use_features=True
).to(device)

optimizer = optim.Adam(model.parameters(), lr=0.001)

print(f"🎯 Training Two-Tower model...")
print(f"   Embedding dim: 128")
print(f"   Temperature: 0.07")

In [None]:
# Training loop
num_epochs = 15
train_losses = []

for epoch in range(1, num_epochs + 1):
    loss = train_two_tower(model, train_loader, optimizer, device, epoch)
    train_losses.append(loss)
    
    if epoch % 5 == 0:
        print(f"\n📊 Epoch {epoch}: Loss = {loss:.4f}")

print(f"\n✅ Training complete!")

In [None]:
# Plot training loss
plt.figure(figsize=(10, 4))
plt.plot(range(1, len(train_losses)+1), train_losses, 'b-', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Two-Tower Training Loss')
plt.grid(True, alpha=0.3)
plt.show()

### Understanding FAISS (Facebook AI Similarity Search)

FAISS is a library for efficient similarity search developed by Meta AI. It is widely used for large-scale vector retrieval, including billion-scale indices.

**Key FAISS Functions We'll Use:**

1. **`faiss.IndexFlatIP(dim)`** - Exact inner product search
   - Creates an index for exact nearest neighbor search using inner product (dot product)
   - For normalized vectors, inner product = cosine similarity
   - Best for small-medium datasets (< 1M vectors)

2. **`faiss.StandardGpuResources()`** - GPU memory manager
   - Allocates GPU memory for FAISS operations
   - Required for GPU-accelerated search

3. **`faiss.index_cpu_to_gpu(res, gpu_id, index)`** - Move index to GPU
   - Transfers CPU index to GPU for faster search
   - Can speed up search considerably on large indices; the gain depends on hardware and index size

4. **`index.add(embeddings)`** - Add vectors to index
   - Stores pre-computed item embeddings
   - Call once during offline indexing

5. **`index.search(query, k)`** - Find k nearest neighbors
   - Returns (distances, indices) of top-k similar items
   - Latency depends on the index type, catalog size, and hardware; approximate indices (IVF, HNSW) keep it low at millions of items

```python
import faiss

# Create index for 128-dimensional embeddings
index = faiss.IndexFlatIP(128)  # Inner Product index

# Add item embeddings (done offline)
index.add(item_embeddings.astype('float32'))

# Search for top-10 similar items (done at query time)
distances, indices = index.search(query_embedding, k=10)
```

---

## Part 5: Building the FAISS Index

Now let's pre-compute all item embeddings and build an index for fast retrieval!

In [None]:
def compute_all_item_embeddings(model, item_features, batch_size=512):
    """
    Compute embeddings for all items (offline).
    
    This is done once and the embeddings are stored in the index.
    """
    model.eval()
    num_items = len(item_features)
    
    all_embeddings = []
    
    with torch.no_grad():
        for start_idx in tqdm(range(0, num_items, batch_size), desc='Computing item embeddings'):
            end_idx = min(start_idx + batch_size, num_items)
            
            item_ids = torch.arange(start_idx, end_idx).to(device)
            item_feats = torch.FloatTensor(item_features[start_idx:end_idx]).to(device)
            
            embeddings = model.encode_items(item_ids, item_feats)
            all_embeddings.append(embeddings.cpu().numpy())
    
    return np.vstack(all_embeddings)

# Compute item embeddings
item_embeddings = compute_all_item_embeddings(model, item_genre_features)
print(f"\n✅ Item embeddings shape: {item_embeddings.shape}")

In [None]:
if FAISS_AVAILABLE:
    # Build FAISS index
    def build_faiss_index(embeddings, use_gpu=True):
        """
        Build a FAISS index for fast nearest neighbor search.
        
        For small datasets, we use exact search (IndexFlatIP).
        For larger datasets, you'd use IVF or HNSW.
        """
        dim = embeddings.shape[1]
        
        # Inner Product index (for cosine similarity with normalized vectors)
        index = faiss.IndexFlatIP(dim)
        
        if use_gpu and torch.cuda.is_available():
            try:
                res = faiss.StandardGpuResources()
                index = faiss.index_cpu_to_gpu(res, 0, index)
                print("🚀 Using GPU-accelerated FAISS!")
            except Exception as e:
                print(f"⚠️ GPU FAISS failed, using CPU: {e}")
        
        # Add embeddings to index
        index.add(embeddings.astype('float32'))
        
        return index
    
    index = build_faiss_index(item_embeddings)
    print(f"✅ FAISS index built with {index.ntotal} items")
    
else:
    # Simple numpy fallback
    class SimpleIndex:
        def __init__(self, embeddings):
            self.embeddings = embeddings
            self.ntotal = len(embeddings)
            
        def search(self, queries, k):
            # Brute force search
            similarities = queries @ self.embeddings.T
            top_k_indices = np.argsort(-similarities, axis=1)[:, :k]
            top_k_scores = np.take_along_axis(similarities, top_k_indices, axis=1)
            return top_k_scores, top_k_indices
    
    index = SimpleIndex(item_embeddings)
    print(f"✅ Simple index built with {index.ntotal} items")
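
The docstring above mentions IVF for larger catalogs. As a hedged sketch of what that could look like, the block below builds an `IndexIVFFlat` on random unit vectors and measures how much of the exact top-10 it recovers. The sizes, `nlist=16`, and `nprobe=4` are illustrative assumptions, not tuned values, and the FAISS part is skipped when the library is not installed.

```python
import numpy as np

try:
    import faiss
    HAVE_FAISS = True
except ImportError:
    HAVE_FAISS = False

rng = np.random.default_rng(0)
dim, n_items = 32, 2000
items = rng.standard_normal((n_items, dim)).astype("float32")
items /= np.linalg.norm(items, axis=1, keepdims=True)   # unit length rows
queries = items[:5].copy()

recall = None
if HAVE_FAISS:
    # Exact baseline: scans every vector.
    exact = faiss.IndexFlatIP(dim)
    exact.add(items)
    _, exact_ids = exact.search(queries, 10)

    # IVF: cluster vectors into nlist cells, then scan only nprobe cells per query.
    quantizer = faiss.IndexFlatIP(dim)
    ivf = faiss.IndexIVFFlat(quantizer, dim, 16, faiss.METRIC_INNER_PRODUCT)
    ivf.train(items)      # learns the cell centroids
    ivf.add(items)
    ivf.nprobe = 4        # more cells probed = higher recall, slower search
    _, approx_ids = ivf.search(queries, 10)

    recall = np.mean([len(set(a) & set(e)) / 10
                      for a, e in zip(approx_ids, exact_ids)])
    print(f"recall@10 vs exact search: {recall:.2f}")
else:
    print("FAISS not installed; skipping the IVF sketch.")
```

Raising `nprobe` toward `nlist` moves the result toward exact search at the cost of scanning more vectors, which is the main accuracy versus latency knob for this index type.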

---

## Part 6: Real-Time Retrieval Pipeline

In [None]:
class TwoTowerRetriever:
    """
    Production-ready retrieval pipeline.
    
    Encodes user queries and retrieves top-K candidates from the index.
    """
    
    def __init__(self, model, index, user_features, movies_df, device):
        self.model = model
        self.index = index
        self.user_features = user_features
        self.movies_df = movies_df
        self.device = device
        self.model.eval()
        
    def retrieve(self, user_id, k=10, return_timing=False):
        """
        Retrieve top-K items for a user.
        
        Args:
            user_id: User to recommend for
            k: Number of items to retrieve
            return_timing: If True, return timing breakdown
            
        Returns:
            List of (item_id, score, title) tuples
        """
        timings = {}
        
        # Step 1: Encode user query
        start = time.perf_counter()
        
        with torch.no_grad():
            user_ids = torch.LongTensor([user_id]).to(self.device)
            user_feats = torch.FloatTensor(self.user_features[user_id:user_id+1]).to(self.device)
            query_embedding = self.model.encode_queries(user_ids, user_feats)
            query_embedding = query_embedding.cpu().numpy()
        
        timings['encode_ms'] = (time.perf_counter() - start) * 1000
        
        # Step 2: Search index
        start = time.perf_counter()
        scores, indices = self.index.search(query_embedding.astype('float32'), k)
        timings['search_ms'] = (time.perf_counter() - start) * 1000
        
        # Step 3: Format results
        results = []
        for item_id, score in zip(indices[0], scores[0]):
            title = self.movies_df[self.movies_df['item_id'] == item_id]['title'].values
            title = title[0] if len(title) > 0 else f"Item {item_id}"
            results.append((item_id, float(score), title))
        
        timings['total_ms'] = timings['encode_ms'] + timings['search_ms']
        
        if return_timing:
            return results, timings
        return results


# Create retriever
retriever = TwoTowerRetriever(
    model=model,
    index=index,
    user_features=user_genre_features,
    movies_df=movies_df,
    device=device
)

print("✅ Retriever ready!")

In [None]:
# Test retrieval
user_id = 0
results, timings = retriever.retrieve(user_id, k=10, return_timing=True)

print(f"🎬 Top 10 Recommendations for User {user_id}:")
print("─" * 60)
for item_id, score, title in results:
    print(f"  {score:.3f} | {title}")

print(f"\n⏱️  Timing Breakdown:")
print(f"   Query encoding: {timings['encode_ms']:.2f} ms")
print(f"   Index search:   {timings['search_ms']:.2f} ms")
print(f"   Total:          {timings['total_ms']:.2f} ms")

if timings['total_ms'] < 10:
    print(f"\n🎉 Goal achieved! Sub-10ms latency!")

In [None]:
# Benchmark retrieval latency
def benchmark_retrieval(retriever, num_users=100, k=100):
    """Benchmark retrieval latency over many users."""
    latencies = []
    
    for user_id in tqdm(range(num_users), desc='Benchmarking'):
        _, timings = retriever.retrieve(user_id, k=k, return_timing=True)
        latencies.append(timings['total_ms'])
    
    return {
        'mean_ms': np.mean(latencies),
        'p50_ms': np.percentile(latencies, 50),
        'p95_ms': np.percentile(latencies, 95),
        'p99_ms': np.percentile(latencies, 99),
        'qps': 1000 / np.mean(latencies),
    }

benchmark = benchmark_retrieval(retriever, num_users=100, k=100)

print(f"\n📊 Retrieval Benchmark (k=100):")
print(f"   Mean latency:  {benchmark['mean_ms']:.2f} ms")
print(f"   P50 latency:   {benchmark['p50_ms']:.2f} ms")
print(f"   P95 latency:   {benchmark['p95_ms']:.2f} ms")
print(f"   P99 latency:   {benchmark['p99_ms']:.2f} ms")
print(f"   Throughput:    {benchmark['qps']:.0f} queries/second")
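
Latency numbers are easier to trust when the first few calls, which often include one-time setup costs, are excluded. The helper below is a generic sketch: `benchmark` is a hypothetical function, and the workload is a NumPy brute-force search over random vectors, not the lab's retriever.

```python
import time
import numpy as np


def benchmark(fn, warmup=10, iters=100):
    """Time fn() after discarding warm-up calls; returns latency stats in ms."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies = np.asarray(latencies)
    return {
        "mean_ms": float(latencies.mean()),
        "p50_ms": float(np.percentile(latencies, 50)),
        "p99_ms": float(np.percentile(latencies, 99)),
    }


# Stand-in workload: score 2,000 random item vectors and keep the top 100.
rng = np.random.default_rng(0)
items = rng.standard_normal((2000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

stats = benchmark(lambda: np.argsort(-(items @ query))[:100])
print(stats)
```

When timing GPU work, the measured region also needs a synchronization point; in `retrieve` above, the `.cpu()` call plays that role because it waits for the encoding to finish.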

In [None]:
# Visualize latency distribution
latencies = []
for user_id in range(200):
    _, timings = retriever.retrieve(user_id, k=100, return_timing=True)
    latencies.append(timings['total_ms'])

plt.figure(figsize=(10, 4))
plt.hist(latencies, bins=30, edgecolor='black', alpha=0.7)
plt.axvline(x=10, color='red', linestyle='--', label='Target: 10ms')
plt.axvline(x=np.mean(latencies), color='green', linestyle='--', label=f'Mean: {np.mean(latencies):.2f}ms')
plt.xlabel('Latency (ms)')
plt.ylabel('Count')
plt.title('Retrieval Latency Distribution')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

---

## Part 7: Analyzing Embeddings

**Recap: t-SNE for Embedding Visualization**

As we learned in Lab B.1, t-SNE (from `sklearn.manifold`) reduces high-dimensional embeddings to 2D for visualization while preserving neighborhood structure. Similar items cluster together in the visualization.

In [None]:
from sklearn.manifold import TSNE

# Sample embeddings for visualization
sample_size = 500
sample_indices = np.random.choice(len(item_embeddings), sample_size, replace=False)
sample_embeddings = item_embeddings[sample_indices]

# Run t-SNE
print("Running t-SNE (this takes a moment)...")
tsne = TSNE(n_components=2, random_state=42, perplexity=30)
embeddings_2d = tsne.fit_transform(sample_embeddings)

In [None]:
# Color by primary genre
def get_genre(item_id):
    genres = movies_df[movies_df['item_id'] == item_id]['genres'].values
    if len(genres) > 0 and pd.notna(genres[0]):
        return genres[0].split('|')[0]
    return 'Unknown'

genres = [get_genre(idx) for idx in sample_indices]
unique_genres = sorted(set(genres))  # sorted for a stable genre-to-color mapping
genre_to_color = {g: i for i, g in enumerate(unique_genres)}
colors = [genre_to_color[g] for g in genres]

plt.figure(figsize=(12, 8))
scatter = plt.scatter(
    embeddings_2d[:, 0],
    embeddings_2d[:, 1],
    c=colors,
    cmap='tab20',
    vmin=0, vmax=19,  # pin the scale so genre index i maps to tab20 color i
    alpha=0.6,
    s=30
)

plt.title('Item Embeddings (t-SNE Visualization)\nTwo-Tower Model')
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')

# Add legend for top genres (integer index selects the same tab20 color as the scatter)
top_genres = pd.Series(genres).value_counts().head(8).index
handles = [plt.scatter([], [], c=[plt.cm.tab20(genre_to_color[g])], label=g, s=100)
           for g in top_genres]
plt.legend(handles=handles, loc='upper right', title='Genre')

plt.tight_layout()
plt.show()

print("\n📊 Notice how similar genres cluster together!")
print("   The two-tower model learned semantic structure from interactions.")

---

## ✋ Try It Yourself!

### Exercise 1: Temperature Tuning

Try different temperature values (0.01, 0.07, 0.2, 0.5) and observe how it affects training.

<details>
<summary>💡 Hint</summary>

- Lower temperature = sharper softmax = more confident predictions
- Higher temperature = softer softmax = probability spread across more items
- Too low: training instability; too high: weak learning signal

</details>

In [None]:
# YOUR CODE HERE



### Exercise 2: Batch Size Impact

Try different batch sizes (64, 256, 1024) and measure the impact on training.

<details>
<summary>💡 Hint</summary>

Larger batch = more in-batch negatives = better gradient estimates,
but also more GPU memory needed.

On DGX Spark with 128 GB, you can use very large batches!

</details>

In [None]:
# YOUR CODE HERE



---

## ⚠️ Common Mistakes

### Mistake 1: Forgetting to Normalize Embeddings

In [None]:
# ❌ Wrong: Raw embeddings (magnitude varies)
# output = self.mlp(combined)  # Magnitudes can be 0.5, 5.0, 50.0...
# similarity = query @ item.T  # Dominated by large magnitudes!

# ✅ Right: L2 normalize (all vectors have magnitude 1)
# output = F.normalize(self.mlp(combined), p=2, dim=-1)
# similarity = query @ item.T  # Now this is cosine similarity!

print("Why normalization matters:")
print("  Without: Dot product favors vectors with large magnitude")
print("  With:    Dot product = cosine similarity (direction only)")
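
A two-item example shows how the ranking can flip. The vectors below are made up for illustration: item 0 points almost exactly along the query but is short, while item 1 is 45 degrees off but long.

```python
import torch
import torch.nn.functional as F

query = torch.tensor([[1.0, 0.0]])
items = torch.tensor([
    [0.9, 0.1],   # nearly the same direction as the query, small magnitude
    [5.0, 5.0],   # 45 degrees away from the query, large magnitude
])

raw_scores = query @ items.T
cos_scores = F.normalize(query, dim=-1) @ F.normalize(items, dim=-1).T

print(raw_scores, raw_scores.argmax(dim=1))   # raw dot product prefers item 1
print(cos_scores, cos_scores.argmax(dim=1))   # cosine similarity prefers item 0
```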

### Mistake 2: Misaligned Labels for In-Batch Negatives

In [None]:
# ❌ Potential Issue: All positive pairs are on the diagonal
# If batch = [(u1,i1), (u2,i2), ...], then positive(u1) = i1, not i2
# Make sure your loss function expects this!

# ✅ Right: Labels should be [0, 1, 2, ...] (diagonal indices)
# labels = torch.arange(batch_size, device=logits.device)
# loss = F.cross_entropy(logits, labels)

print("The diagonal pattern:")
print("  logits[0,0] = similarity(user 0, item 0) ← POSITIVE")
print("  logits[0,1] = similarity(user 0, item 1) ← negative")
print("  logits[1,0] = similarity(user 1, item 0) ← negative")
print("  logits[1,1] = similarity(user 1, item 1) ← POSITIVE")

### Mistake 3: Not Updating Item Index After Training

In [None]:
# ❌ Wrong: Use old embeddings after model update
# model = train_more_epochs(model)
# results = retriever.retrieve(user_id)  # Using stale index!

# ✅ Right: Rebuild index after training
# model = train_more_epochs(model)
# new_embeddings = compute_all_item_embeddings(model, item_features)
# new_index = build_faiss_index(new_embeddings)
# retriever = TwoTowerRetriever(model, new_index, ...)

print("In production:")
print("  - Item embeddings are recomputed periodically (hourly/daily)")
print("  - Index is rebuilt and swapped atomically")
print("  - This is why item tower runs OFFLINE!")
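
One way to picture the "swapped atomically" step is a thin wrapper that holds a reference to the live index and replaces it under a lock. This is a sketch under stated assumptions: `SwappableIndex` and `BruteForceIndex` are hypothetical classes written for this example, and production systems usually swap at the service or deployment level instead.

```python
import threading
import numpy as np


class SwappableIndex:
    """Holds the live index and lets a background job replace it in one step."""

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()

    def search(self, queries, k):
        with self._lock:
            index = self._index           # grab the current reference
        return index.search(queries, k)   # search outside the lock

    def swap(self, new_index):
        with self._lock:
            self._index = new_index       # in-flight searches keep the old one


class BruteForceIndex:
    """Tiny stand-in exposing search(queries, k) like the lab's SimpleIndex."""

    def __init__(self, embeddings):
        self.embeddings = embeddings

    def search(self, queries, k):
        scores = queries @ self.embeddings.T
        ids = np.argsort(-scores, axis=1)[:, :k]
        return np.take_along_axis(scores, ids, axis=1), ids


query = np.array([[1.0, 0.0, 0.0]], dtype=np.float32)
live = SwappableIndex(BruteForceIndex(np.eye(3, dtype=np.float32)))
_, before = live.search(query, 1)

# Build the replacement fully before swapping it in (rows reversed here).
live.swap(BruteForceIndex(np.eye(3, dtype=np.float32)[::-1].copy()))
_, after = live.search(query, 1)
print(before, after)
```

The new index is built completely before `swap` is called, so requests never see a half-populated index; they see either the old embeddings or the new ones.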

---

## 🎉 Checkpoint

You've learned:
- ✅ Why two-tower architecture scales to billions of items
- ✅ How to build query and item towers
- ✅ In-batch negative sampling for efficient training
- ✅ Building FAISS indices for fast retrieval
- ✅ Achieving sub-10ms retrieval latency

---

## 🚀 Challenge (Optional)

**Add Hard Negative Mining (20-30 min):**

In-batch negatives are often "easy" - obviously wrong items. Hard negatives are items that are similar to positives but wrong.

Implement:
1. After each epoch, find top-K most similar items to each positive
2. Add some of these as explicit hard negatives in the next epoch
3. This often improves retrieval quality, though the size of the gain depends on the dataset and on how the negatives are mined.

---

## 📖 Further Reading

- [Deep Neural Networks for YouTube Recommendations](https://research.google/pubs/pub45530/) - The foundational paper
- [Sampling-Bias-Corrected Neural Modeling](https://research.google/pubs/pub48840/) - Fixing popularity bias
- [FAISS Documentation](https://faiss.ai/) - Production-grade similarity search

---

## 🧹 Cleanup

In [None]:
import gc

del model, index, retriever
torch.cuda.empty_cache()
gc.collect()

print("✅ GPU memory cleared!")

---

## ➡️ Next Steps

In the final notebook, we'll bring everything together with **comprehensive evaluation and analysis** - comparing all our models with proper ranking metrics!

Continue to: **04-evaluation-and-analysis.ipynb**