Matrix factorization is a way to turn a big table of user–item ratings into two smaller tables that capture hidden patterns about users and items, so we can predict new ratings and make recommendations.



### Core idea of matrix factorization

- You start with a **rating matrix** $R$:  
  - Rows = users, columns = items, cells = ratings (with many missing).  
- Matrix factorization tries to find two smaller matrices:
  - A **user matrix** $P$ (users × factors).  
  - An **item matrix** $Q$ (factors × items).  
- When you multiply them ($P \times Q$), you get an **approximate rating matrix** $\tilde{R}$:  
  - Each entry in $\tilde{R}$ is the **predicted rating** for a user–item pair.  
- The numbers in these smaller matrices are **latent features** (hidden factors) that describe:
  - How much each user likes each factor.  
  - How strongly each item exhibits each factor.  
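The shapes can be sketched in plain Python, so the example runs without extra libraries. The factor values in `P` and `Q` are made up by hand for illustration, not learned from `R`, and `matmul` is a hypothetical helper. Note that the product fills in every cell of $\tilde{R}$, including the cells that are missing from `R`.

```python
# Toy example: M = 3 users, N = 4 items, Z = 2 latent factors.
# None marks a missing rating in R.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [None, 1, 5, 4],
]

# Hand-picked factor values for illustration only (not learned from R).
P = [  # M x Z: one row of factor weights per user
    [2.0, 0.5],
    [1.8, 0.4],
    [0.3, 2.0],
]
Q = [  # Z x N: one column of factor weights per item
    [2.0, 1.5, 0.2, 0.3],
    [0.5, 0.2, 2.4, 1.9],
]


def matmul(A, B):
    """Plain-Python matrix product: (M x Z) times (Z x N) gives (M x N)."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions must match")
    return [
        [sum(A[u][k] * B[k][i] for k in range(inner)) for i in range(len(B[0]))]
        for u in range(len(A))
    ]


R_tilde = matmul(P, Q)  # dense M x N: every cell gets a prediction, even missing ones

for u in range(len(R)):
    print("observed :", R[u])
    print("predicted:", [round(x, 2) for x in R_tilde[u]])
```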

So instead of explicitly saying “this user likes action movies,” the model learns numeric factors that implicitly capture such patterns.



### Matrix factorization as a view of collaborative filtering

- In collaborative filtering, we use only user–item ratings (no explicit genres, tags, etc.).  
- Matrix factorization is a **collaborative filtering technique** where:
  - Users are represented by latent factor vectors (rows of $P$).  
  - Items are represented by latent factor vectors (columns of $Q$, since $Q$ is factors × items here; some texts store $Q$ transposed and use its rows instead).  
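  - A predicted rating is the **dot product** between a user’s vector and an item’s vector: $\tilde{r}_{ui} = p_u \cdot q_i = \sum_{k} p_{uk}\, q_{ki}$, summing over the latent factors $k$.  
- This dot product is exactly entry $(u, i)$ of $P \times Q$, so multiplying the two matrices computes every predicted rating at once.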

This gives a compact way to summarize all the relationships in the rating data using a relatively small number of hidden factors.
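Both the dot-product prediction and the compactness claim can be checked with a little arithmetic. This sketch assumes $Z = 3$ for the two example vectors and uses illustrative sizes (100,000 users, 10,000 items, 50 factors) for the parameter count. `predict` and `parameter_counts` are hypothetical helper names.

```python
def predict(p_u, q_i):
    """Predicted rating: dot product of a user vector (row of P) and an item vector (column of Q)."""
    if len(p_u) != len(q_i):
        raise ValueError("user and item vectors must both have Z entries")
    return sum(a * b for a, b in zip(p_u, q_i))


def parameter_counts(M, N, Z):
    """Cells in the full M x N rating matrix vs. numbers stored in P (M x Z) and Q (Z x N)."""
    return M * N, M * Z + Z * N


# Hypothetical vectors with Z = 3 latent factors.
p_u = [1.2, -0.4, 0.8]
q_i = [2.0, 0.5, 1.5]
print(round(predict(p_u, q_i), 4))  # 2.4 - 0.2 + 1.2 = 3.4

cells, params = parameter_counts(M=100_000, N=10_000, Z=50)
print(cells, params)  # 1000000000 5500000
```

With these sizes the two factor matrices hold about 0.55% as many numbers as the full matrix has cells.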



### Connection to Funk SVD

- The gradient descent method you saw earlier (Funk SVD) is **one specific matrix factorization algorithm** for recommendation.  
- It:
  - Chooses a factor dimension $Z$ (for example, 2, 20, 100, etc.).  
  - Learns $P$ (size $M \times Z$, for $M$ users) and $Q$ (size $Z \times N$, for $N$ items) by minimizing prediction error with gradient descent.  
  - Uses only the observed ratings and can handle many missing entries.  
- This is often called “SVD” in recommendation contexts, but it is **not** the same as classical linear-algebra singular value decomposition.

So: matrix factorization is the general concept; Funk SVD is a practical, gradient-descent-based way to perform it on sparse rating data.
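A minimal sketch of Funk-SVD-style training with stochastic gradient descent follows. It minimizes the squared error over the set $\mathcal{K}$ of observed (user, item) pairs:

$$
\min_{P,Q} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - p_u \cdot q_i \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)
$$

For each observed rating, with error $e_{ui} = r_{ui} - p_u \cdot q_i$ and learning rate $\eta$, every factor $k$ is updated as $p_{uk} \leftarrow p_{uk} + \eta\,(e_{ui}\, q_{ki} - \lambda\, p_{uk})$ and $q_{ki} \leftarrow q_{ki} + \eta\,(e_{ui}\, p_{uk} - \lambda\, q_{ki})$, with the constant factor 2 from the derivative absorbed into $\eta$. The L2 penalty $\lambda$ (`reg`) is a common addition that the text above does not specify, so treat it as an assumption (set `reg=0` to drop it). The learning rate, initialization scale, and epoch count are likewise illustrative choices, and `train_funk_svd` and `rmse` are hypothetical names, not a library API.

```python
import random


def train_funk_svd(ratings, M, N, Z=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn P (M x Z) and Q (Z x N) from observed (user, item, rating) triples.

    Missing cells are simply absent from `ratings`, so they never enter the loss.
    """
    rng = random.Random(seed)
    # Small random initial factors; exactly zero would leave all gradients at zero.
    P = [[rng.gauss(0.0, 0.1) for _ in range(Z)] for _ in range(M)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(N)] for _ in range(Z)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # stochastic gradient descent: one pass over ratings in random order
        for u, i, r in order:
            err = r - sum(P[u][k] * Q[k][i] for k in range(Z))
            for k in range(Z):
                p_uk, q_ki = P[u][k], Q[k][i]  # old values, so both updates use the same point
                P[u][k] += lr * (err * q_ki - reg * p_uk)
                Q[k][i] += lr * (err * p_uk - reg * q_ki)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings only."""
    Z = len(Q)
    sq = [(r - sum(P[u][k] * Q[k][i] for k in range(Z))) ** 2 for u, i, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5


# Observed ratings of a 3-user x 4-item toy matrix; its 4 missing cells are absent.
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1),
           (1, 0, 4), (1, 3, 1),
           (2, 1, 1), (2, 2, 5), (2, 3, 4)]
P, Q = train_funk_svd(ratings, M=3, N=4)
print("training RMSE:", round(rmse(ratings, P, Q), 3))
# A cell that was never observed (user 1, item 2) still gets a prediction.
print("user 1, item 2:", round(sum(P[1][k] * Q[k][2] for k in range(len(Q))), 2))
```

Because the loop only iterates over `ratings`, the missing cells never contribute to the loss, yet the learned factors still produce a prediction for each of them.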



### How this differs from true SVD

Classical **singular value decomposition (SVD)** (from linear algebra):

- Decomposes a full $M \times N$ matrix into **three** matrices (often written $U \Sigma V^T$).  
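- In the full SVD, the shapes are fixed: $U$ is $M \times M$, $\Sigma$ is $M \times N$ (diagonal, holding the singular values), and $V$ is $N \times N$.  
- The columns of $U$ are **orthonormal**, as are the columns of $V$ (mutually orthogonal unit vectors, a strong constraint).  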
- Assumes the matrix is **complete** (no missing entries).

In contrast, the matrix factorization used in Funk SVD:

- Uses **two** matrices $P$ and $Q$ of sizes $M \times Z$ and $Z \times N$, where **$Z$ is chosen by you** (e.g., 50 or 100 factors).  
- The factors are learned via **gradient descent**, aiming to minimize rating prediction error.  
- Does **not** enforce orthogonality on the factor matrices.  
- Crucially, it is designed to work with **missing entries**, which are standard in real-world recommendation data.

That’s why we can’t just apply standard SVD directly to a typical user–item rating matrix with lots of missing ratings.
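To see concretely why missing entries block a direct classical decomposition, this sketch compares two error measures for the same candidate prediction. A complete-matrix method needs some value in every cell; here the gaps are filled with 0, one naive choice used only for illustration. The helpers `zero_fill`, `masked_sse`, and `full_sse` are hypothetical names.

```python
# Toy 3 x 3 rating matrix; None marks a missing rating.
R = [[5, None, 1],
     [4, 1, None],
     [None, 1, 5]]


def zero_fill(R):
    """Make R complete by writing 0 into every missing cell (one naive option)."""
    return [[0.0 if r is None else float(r) for r in row] for row in R]


def masked_sse(R, R_hat):
    """Funk-SVD-style error: sum of squared errors over observed cells only."""
    return sum((r - R_hat[u][i]) ** 2
               for u, row in enumerate(R)
               for i, r in enumerate(row) if r is not None)


def full_sse(R_full, R_hat):
    """Error a complete-matrix method sees: every cell counts, including filled-in ones."""
    return sum((r - R_hat[u][i]) ** 2
               for u, row in enumerate(R_full)
               for i, r in enumerate(row))


# A candidate prediction that matches every observed rating and
# guesses 2, 1 and 3 for the three unknown cells.
R_hat = [[5, 2, 1],
         [4, 1, 1],
         [3, 1, 5]]

print(masked_sse(R, R_hat))           # 0: perfect on the ratings we actually have
print(full_sse(zero_fill(R), R_hat))  # 14.0: penalized for not predicting 0 in unknown cells
```

The complete-matrix view treats "unknown" as "rated 0", so it pushes predictions for unrated items toward 0; the masked objective simply ignores those cells.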



### Why matrix factorization is useful in recommender systems

- It **handles sparsity**: most users rate only a tiny fraction of items, but the model can still learn useful patterns and predict missing ratings.  
- It **captures complex relationships**: users who have similar hidden factor vectors will tend to like similar items, even if they’ve never rated the exact same ones.  
- It **scales**: each gradient-descent pass touches only the observed ratings, at a cost proportional to the number of ratings times $Z$, so efficient implementations handle datasets with millions of users and items.  
- It forms the backbone of many modern collaborative filtering systems; matrix factorization variants were central to the leading Netflix Prize solutions.
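Turning factors into recommendations is then a matter of scoring the items a user has not rated and sorting them. The sketch below uses hand-picked factors standing in for learned ones, a hypothetical `rated` dictionary of items each user has already rated, and hypothetical helpers `recommend` and `cosine`. The cosine similarity illustrates the point that users with similar factor vectors receive similar predictions.

```python
import math

# Hand-picked factors standing in for learned ones: 3 users, 5 items, Z = 2.
P = [[2.0, 0.5], [1.8, 0.4], [0.3, 2.0]]        # M x Z
Q = [[2.0, 1.5, 0.2, 0.3, 1.0],
     [0.5, 0.2, 2.4, 1.9, 1.0]]                  # Z x N
rated = {0: {0, 1, 3}, 1: {0, 3}, 2: {1, 2, 3}}  # hypothetical: items each user already rated


def recommend(P, Q, rated, user, top_n=2):
    """Score the items `user` has not rated by p_u . q_i and return the top_n item indices."""
    Z, N = len(Q), len(Q[0])
    seen = rated.get(user, set())
    scores = [(sum(P[user][k] * Q[k][i] for k in range(Z)), i) for i in range(N) if i not in seen]
    scores.sort(reverse=True)  # highest predicted rating first
    return [i for _, i in scores[:top_n]]


def cosine(a, b):
    """Cosine similarity between two factor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(recommend(P, Q, rated, user=1))  # [1, 4]
print(recommend(P, Q, rated, user=2))  # [4, 0]
# Users 0 and 1 have nearly parallel factor vectors; user 2 points elsewhere.
print(round(cosine(P[0], P[1]), 3), round(cosine(P[0], P[2]), 3))
```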

In short, matrix factorization views your prediction model as factoring the rating matrix into user and item factor matrices, and Funk SVD is a popular, gradient-descent-driven way to learn those factors from incomplete data.