# Lesson 18 - Collaborative Filtering via Matrix Factorization


## Objectives
- Implement matrix factorization with SGD.
- Reconstruct a sparse ratings matrix.
- Visualize training loss over epochs.


## From the notes

**Collaborative filtering**
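- Factor the ratings matrix as $R \approx U V^T$, where row $u_i$ of $U$ is user $i$'s latent vector and row $v_j$ of $V$ is item $j$'s latent vector.
- Fit $U$ and $V$ by minimizing squared error over the observed entries only.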
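One common way to write this objective is shown below. It is a sketch, not checked against the CS229 notes, and it assumes L2 regularization with strength $\lambda$ applied once per observed entry, which is the form the `reg` term in this lesson's training code corresponds to. $\Omega$ denotes the set of observed $(i, j)$ pairs.

$$
J(U, V) = \frac{1}{2} \sum_{(i,j) \in \Omega} \Big[ \big(R_{ij} - u_i^T v_j\big)^2 + \lambda \big(\lVert u_i \rVert^2 + \lVert v_j \rVert^2\big) \Big]
$$

Taking a gradient step with learning rate $\eta$ on a single term, with $e_{ij} = R_{ij} - u_i^T v_j$, gives

$$
u_i \leftarrow u_i + \eta \,\big(e_{ij}\, v_j - \lambda\, u_i\big), \qquad
v_j \leftarrow v_j + \eta \,\big(e_{ij}\, u_i - \lambda\, v_j\big),
$$

where both right-hand sides use the values of $u_i$ and $v_j$ from before the step.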

_TODO: Validate collaborative filtering equations in the CS229 main notes PDF._


## Intuition
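Matrix factorization learns a latent vector for each user and each item such that the dot product of a user's vector with an item's vector approximates that user's observed rating of the item. The same dot products then serve as predictions for the missing entries.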
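As a toy illustration (the vectors below are made up, not taken from the lesson's data), a predicted rating is just the dot product of one user vector and one item vector with $k = 3$ latent dimensions, and stacking the vectors into matrices gives every prediction at once:

```python
import numpy as np

# Hypothetical latent vectors with k = 3 factors each.
u_alice = np.array([1.0, 0.5, -0.2])   # user factors
v_movie = np.array([2.0, 1.0, 0.0])    # item factors

# A single predicted rating is the dot product of the two vectors.
pred = u_alice @ v_movie
print(pred)  # 2.5

# Stacking users into U and items into V: R_hat[i, j] == U[i] @ V[j].
U = np.stack([u_alice, np.array([0.0, 1.0, 1.0])])   # 2 users
V = np.stack([v_movie, np.array([1.0, 0.0, 1.0])])   # 2 items
R_hat = U @ V.T
print(R_hat)
```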


## Data
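We generate a synthetic 20-by-15 ratings matrix of rank 3 (the product of random user and item factors) and hide about 40% of its entries by replacing them with `NaN`.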
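A tiny version of the same encoding (a hypothetical 2-by-3 matrix, not the lesson's data) shows how `np.where` turns unobserved entries into `NaN`, and how `np.nanmean` then averages only over the observed entries. This is how the training loss in the cell below skips missing ratings:

```python
import numpy as np

R_full = np.array([[4.0, 2.0, 5.0],
                   [1.0, 3.0, 2.0]])
mask = np.array([[True, False, True],
                 [False, True, True]])   # True = rating observed

R = np.where(mask, R_full, np.nan)       # unobserved entries become NaN
print(R)

pred = np.full_like(R_full, 3.0)         # a constant guess of 3 everywhere
sq_err = (pred - R) ** 2                 # NaN wherever R is NaN
mse_observed = np.nanmean(sq_err)        # averages the 4 observed entries only
print(mse_observed)  # 1.5
```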


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

num_users, num_items, k = 20, 15, 3
true_U = np.random.randn(num_users, k)
true_V = np.random.randn(num_items, k)
R_full = true_U @ true_V.T
mask = np.random.rand(num_users, num_items) < 0.6
R = np.where(mask, R_full, np.nan)

U = 0.1 * np.random.randn(num_users, k)
V = 0.1 * np.random.randn(num_items, k)

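def sgd(R, U, V, lr=0.05, reg=0.01, epochs=200):
    """Fit U and V in place by SGD on observed entries; return per-epoch MSE."""
    errors = []
    rows, cols = np.where(~np.isnan(R))  # indices of observed ratings
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]
            u_old = U[i].copy()  # V's step must use the pre-update user vector
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
        pred = U @ V.T
        # NaN entries of R propagate and are skipped, so this is training MSE
        errors.append(np.nanmean((pred - R) ** 2))
    return errors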

losses = sgd(R, U, V)
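
A single SGD step on one observed rating, with hand-picked hypothetical numbers and `reg = 0`, makes the update rule concrete. The helper `sgd_step` is introduced here only for illustration; it computes both gradient steps from the pre-step vectors, as gradient descent on the squared error prescribes, and the error on that entry shrinks:

```python
import numpy as np

def sgd_step(r, u, v, lr=0.1, reg=0.0):
    """Hypothetical helper: one SGD update on a single observed rating r."""
    err = r - u @ v
    u_new = u + lr * (err * v - reg * u)   # uses the old v
    v_new = v + lr * (err * u - reg * v)   # uses the old u, not u_new
    return u_new, v_new

u = np.array([1.0, 0.0])
v = np.array([0.5, 0.5])
r = 2.0

err_before = r - u @ v          # 2.0 - 0.5 = 1.5
u, v = sgd_step(r, u, v)
err_after = r - u @ v           # roughly 1.264
print(err_before, err_after)
```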


## Experiments


In [None]:
losses[-1]
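
Because the data are synthetic, the true values of the hidden entries are known, so one extra experiment is to compare training MSE (observed entries) with MSE on the entries that were masked out. The sketch below is self-contained: it regenerates data the same way as the lesson but uses a seeded `np.random.default_rng` generator instead of the global seed (an assumption made for reproducibility), so its numbers will not match `losses[-1]` exactly. The helpers `make_data` and `train` are hypothetical names.

```python
import numpy as np

def make_data(rng, num_users=20, num_items=15, k=3, p_obs=0.6):
    """Synthetic rank-k ratings; each entry is observed with probability p_obs."""
    R_full = rng.standard_normal((num_users, k)) @ rng.standard_normal((num_items, k)).T
    mask = rng.random((num_users, num_items)) < p_obs
    return R_full, mask

def train(R_full, mask, rng, k=3, lr=0.05, reg=0.01, epochs=200):
    """SGD matrix factorization on observed entries; returns (U, V, losses)."""
    U = 0.1 * rng.standard_normal((R_full.shape[0], k))
    V = 0.1 * rng.standard_normal((R_full.shape[1], k))
    rows, cols = np.nonzero(mask)
    losses = []
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R_full[i, j] - U[i] @ V[j]
            u_old = U[i].copy()                       # pre-update user vector
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
        losses.append(np.mean(((U @ V.T) - R_full)[mask] ** 2))
    return U, V, losses

rng = np.random.default_rng(0)
R_full, mask = make_data(rng)
U, V, losses = train(R_full, mask, rng)

pred = U @ V.T
train_mse = np.mean((pred - R_full)[mask] ** 2)     # observed entries
heldout_mse = np.mean((pred - R_full)[~mask] ** 2)  # hidden entries
print(f"train MSE {train_mse:.4f}, held-out MSE {heldout_mse:.4f}")
```

Held-out MSE is usually higher than training MSE; a large gap suggests increasing `reg` or lowering `k`.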


## Visualizations


In [None]:
plt.figure(figsize=(6,4))
plt.plot(losses)
plt.title("Matrix factorization loss")
plt.xlabel("epoch")
plt.ylabel("training MSE (observed entries)")
plt.show()

plt.figure(figsize=(6,4))
plt.imshow(U @ V.T, aspect="auto", cmap="viridis")
plt.title("Reconstructed ratings")
plt.xlabel("item")
plt.ylabel("user")
plt.colorbar()
plt.show()


## Takeaways
- Matrix factorization learns latent factors that explain observed ratings.
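- L2 regularization on the user and item factors helps curb overfitting on sparse data; because the loss plotted above is measured only on observed (training) entries, judging generalization requires held-out ratings.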


## Explain it in an interview
- Explain how matrix factorization handles missing entries.
- Describe the role of regularization in collaborative filtering.


## Exercises
- Try different latent dimensions and compare loss.
- Hold out a validation set to tune regularization.
- Add user and item bias terms.
