# Lesson 17: Collaborative Filtering / Matrix Factorization

## Objectives

- Implement low-rank matrix factorization via SGD.
- Track reconstruction error over iterations.
- Visualize predicted ratings.

## From the notes

We model the ratings matrix \(R\) as \(R \approx U V^T\), minimizing squared error over observed entries.
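One way to write this objective out in full is shown here. It assumes an L2 penalty, chosen to match the `reg` parameter in the implementation later in this lesson. The symbols \(\Omega\) (the set of observed index pairs), \(\lambda\) (the regularization weight), and \(u_i\), \(v_j\) (row \(i\) of \(U\) and row \(j\) of \(V\)) are notation introduced for this sketch and do not come from the notes.

```latex
\min_{U,\,V} \; \sum_{(i,j) \in \Omega}
  \left[ \tfrac{1}{2} \left( R_{ij} - u_i^\top v_j \right)^2
       + \tfrac{\lambda}{2} \left( \lVert u_i \rVert^2 + \lVert v_j \rVert^2 \right) \right]
```

With this form the penalty is applied once per observed rating, so users and items with more ratings are regularized more strongly in total.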

## Intuition

Latent factors capture user preferences and item attributes. Each SGD update uses a single observed rating, so missing entries never contribute to the loss.
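To make the update concrete, here is a minimal sketch of one SGD step on a single observed rating. The variable names (`u`, `v`, `r`) and the values of `alpha` and `reg` are illustrative assumptions; they mirror the defaults used later in the lesson but are not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2
u = 0.1 * rng.standard_normal(k)  # latent factors for one user (hypothetical)
v = 0.1 * rng.standard_normal(k)  # latent factors for one item (hypothetical)
r = 5.0                           # a single observed rating
alpha, reg = 0.01, 0.02           # step size and L2 weight (assumed values)

err_before = r - u @ v

# Gradient step on the squared error plus L2 penalty for this one rating.
# Keep the old u so both updates use factors from the same point.
u_old = u.copy()
u = u + alpha * (err_before * v - reg * u)
v = v + alpha * (err_before * u_old - reg * v)

err_after = r - u @ v
print(abs(err_after) < abs(err_before))
```

The prediction moves toward the observed rating, and no other row of the factor matrices would be touched by this step.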

## Data

We use a small synthetic ratings matrix with missing values.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

In [None]:
# Ratings matrix with NaNs for missing values
R = np.array([
    [5, 3, np.nan, 1],
    [4, np.nan, np.nan, 1],
    [1, 1, np.nan, 5],
    [1, np.nan, np.nan, 4],
    [np.nan, 1, 5, 4],
])
observed = ~np.isnan(R)
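Before fitting, it can help to check how sparse the matrix is. The sketch below repeats the matrix so that it runs on its own; the names `n_obs` and `density` are introduced here for illustration.

```python
import numpy as np

R = np.array([
    [5, 3, np.nan, 1],
    [4, np.nan, np.nan, 1],
    [1, 1, np.nan, 5],
    [1, np.nan, np.nan, 4],
    [np.nan, 1, 5, 4],
])
observed = ~np.isnan(R)

n_obs = int(observed.sum())   # number of known ratings
density = n_obs / R.size      # fraction of the matrix that is filled in

print(f"{n_obs} of {R.size} entries observed ({density:.0%})")
print("ratings per user:", observed.sum(axis=1))
print("ratings per item:", observed.sum(axis=0))
```

The third item has only one rating, so its latent factors are constrained by a single data point and its predictions for other users should be read with caution.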

## Implementation: SGD matrix factorization

In [None]:
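def matrix_factorization_sgd(R, k=2, alpha=0.01, reg=0.02, num_iters=500):
    """Factor R (NaN = missing) into U @ V.T with SGD over observed entries."""
    m, n = R.shape
    # Small random initialization of user (U) and item (V) factors
    U = 0.1 * np.random.randn(m, k)
    V = 0.1 * np.random.randn(n, k)
    history = []
    obs = np.argwhere(~np.isnan(R))  # (i, j) index pairs of observed ratings
    for _ in range(num_iters):
        for i, j in obs:
            eij = R[i, j] - U[i] @ V[j]
            # Copy U[i] so the V[j] update uses the pre-update user factors
            u_i = U[i].copy()
            U[i] += alpha * (eij * V[j] - reg * U[i])
            V[j] += alpha * (eij * u_i - reg * V[j])
        preds = U @ V.T
        # NaNs in R propagate, so nanmean averages over observed entries only
        mse = np.nanmean((R - preds) ** 2)
        history.append(mse)
    return U, V, np.array(history)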

## Experiments

In [None]:
U, V, history = matrix_factorization_sgd(R)
preds = U @ V.T
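To summarize the fit with a single number, one option is the RMSE over observed entries. The helper `rmse_observed` and the demo arrays below are hypothetical and not part of the original notebook.

```python
import numpy as np

def rmse_observed(R, preds):
    """Root-mean-squared error over the non-NaN entries of R."""
    mask = ~np.isnan(R)
    return float(np.sqrt(np.mean((R[mask] - preds[mask]) ** 2)))

# Tiny hand-made example: two observed ratings, with errors of 1 and 0
R_demo = np.array([[5.0, np.nan], [np.nan, 1.0]])
preds_demo = np.array([[4.0, 2.0], [3.0, 1.0]])
print(rmse_observed(R_demo, preds_demo))
```

In the notebook this would be called as `rmse_observed(R, preds)`. The result is a training error, since the same ratings were used for fitting, so it says little about how well the missing entries are predicted.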

## Visualizations

In [None]:
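# Convergence curve: MSE on observed ratings after each pass
plt.figure(figsize=(6, 4))
plt.plot(history)
plt.xlabel("iteration")
plt.ylabel("MSE")
plt.title("Matrix factorization convergence")
plt.show()

# Heatmap of the full predicted matrix, including originally missing entries
plt.figure(figsize=(5, 4))
plt.imshow(preds, cmap="viridis")
plt.colorbar(label="predicted rating")
plt.title("Predicted ratings matrix")
plt.show()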

## Takeaways

- Matrix factorization captures latent structure with few parameters.
- Regularization helps limit overfitting when ratings are sparse.

## Explain it in an interview

- Describe the low-rank factorization objective and SGD updates.
- Mention cold-start and sparsity challenges.

## Exercises

1. Increase rank \(k\) and compare reconstruction error.
2. Try alternating least squares instead of SGD.
3. Add user/item bias terms and evaluate improvements.
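As a possible starting point for the alternating least squares exercise, the sketch below performs one ALS sweep: with the item factors held fixed it solves a small ridge regression for each user, then does the same for each item with the user factors fixed. The function name `als_sweep`, the `_demo` and `_als` variable names, and the choice of `reg` are assumptions made for illustration, and this is not the only way to set up ALS.

```python
import numpy as np

def als_sweep(R, U, V, reg=0.02):
    """One ALS pass over users, then items, using only observed ratings."""
    k = U.shape[1]
    mask = ~np.isnan(R)
    for i in range(R.shape[0]):
        cols = mask[i]                      # items rated by user i
        if cols.any():
            Vo = V[cols]
            # Ridge regression for row i of U with V held fixed
            U[i] = np.linalg.solve(Vo.T @ Vo + reg * np.eye(k), Vo.T @ R[i, cols])
    for j in range(R.shape[1]):
        rows = mask[:, j]                   # users who rated item j
        if rows.any():
            Uo = U[rows]
            # Ridge regression for row j of V with U held fixed
            V[j] = np.linalg.solve(Uo.T @ Uo + reg * np.eye(k), Uo.T @ R[rows, j])
    return U, V

R_demo = np.array([
    [5, 3, np.nan, 1],
    [4, np.nan, np.nan, 1],
    [1, 1, np.nan, 5],
    [1, np.nan, np.nan, 4],
    [np.nan, 1, 5, 4],
])
rng = np.random.default_rng(0)
U_als = 0.1 * rng.standard_normal((5, 2))
V_als = 0.1 * rng.standard_normal((4, 2))

mse_before = np.nanmean((R_demo - U_als @ V_als.T) ** 2)
for _ in range(20):
    U_als, V_als = als_sweep(R_demo, U_als, V_als)
mse_after = np.nanmean((R_demo - U_als @ V_als.T) ** 2)
print(mse_before, mse_after)
```

Unlike SGD, this version needs no learning rate, since each row update is solved in closed form. The regularization here is applied once per row rather than once per rating, so the same `reg` value does not mean the same thing in the two methods.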