Funk SVD is a way to turn a big, sparse table of ratings (users × items) into two smaller tables of numbers that capture hidden “taste” patterns, and then use those to predict missing ratings.



### Big picture: what Funk SVD does

- You start with a **rating matrix** $R$: rows = users, columns = items, entries = ratings (many are missing).  
- Funk SVD learns:
  - A **user matrix $P$**: each row is a short list of numbers describing that user’s latent preferences.  
  - An **item matrix $Q$**: each row (or column, depending on convention) is a short list of numbers describing that item’s latent properties.  
- The goal is:  
  $$
  R \approx P Q^T
  $$ 
  meaning: if you multiply $P$ and $Q^T$, you get an approximate version of the rating matrix where missing entries are filled in with predictions. [freecodecamp](https://www.freecodecamp.org/news/singular-value-decomposition-vs-matrix-factorization-in-recommender-systems-b1e99bc73599/)
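
To make the shapes concrete, here is a minimal NumPy sketch of the product $P Q^T$. The sizes (4 users, 5 items, 2 latent factors) and the random values are illustrative assumptions, not learned factors, and it uses the "one row per item" convention for $Q$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: 4 users, 5 items, k = 2 latent factors.
n_users, n_items, k = 4, 5, 2

P = rng.normal(scale=0.1, size=(n_users, k))  # one row per user
Q = rng.normal(scale=0.1, size=(n_items, k))  # one row per item

# Multiplying P by the transpose of Q gives a dense users x items matrix:
# every entry, known or missing, now holds a predicted score.
R_hat = P @ Q.T
print(R_hat.shape)

# A single entry is just the dot product of one user row and one item row.
u, i = 1, 3
print(np.isclose(R_hat[u, i], P[u] @ Q[i]))
```

In practice the full matrix `R_hat` is rarely materialized for large systems; individual entries are computed on demand from the two small tables.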

Think of it as compressing users and items into a shared hidden space where similar users and similar items end up close to each other. [arxiv](https://arxiv.org/pdf/2203.11026.pdf)



### Model pieces: users, items, and biases

- **User matrix (P)**  
  - Each user $u$ is represented by a vector $p_u$ of latent features (for example, 20 or 50 numbers).  
  - These numbers say how strongly the user likes each hidden factor (e.g., “action vs. romance,” “mainstream vs. niche”), but we don’t name these factors explicitly. [riverml](https://riverml.xyz/0.8.0/examples/matrix-factorization-for-recommender-systems-part-1/)

- **Item matrix (Q)**  
  - Each item $i$ is represented by a vector $q_i$ of latent features of the same length.  
  - These numbers say how much the item has each hidden factor (e.g., “how action-heavy,” “how niche”), again without explicit labels. [arxiv](https://arxiv.org/pdf/2203.11026.pdf)

- **Bias terms (optional but common)**  
  - A **global bias**: the overall average rating across all users and items.  
  - A **user bias**: some users rate higher or lower than average.  
  - An **item bias**: some items tend to get higher or lower ratings than average.  
  - These help capture simple patterns like “this user is strict” or “this movie is broadly liked” before considering detailed factors. [datajobs](https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf)
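
Putting the pieces together, a common way to write the prediction with biases is shown below. The symbols $\mu$ (global bias), $b_u$ (user bias), and $b_i$ (item bias) are notation assumed here for illustration; the text above does not name them.

$$
\hat{r}_{u,i} = \mu + b_u + b_i + p_u \cdot q_i
$$

Without biases, the first three terms drop out and the prediction is just the dot product $p_u \cdot q_i$.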



### Objective: what Funk SVD tries to optimize

For every known rating $r_{u,i}$:

- The model predicts a rating $\hat{r}_{u,i}$ using the user vector $p_u$ and item vector $q_i$ (plus biases if used).  
- The **error** is the difference: $r_{u,i} - \hat{r}_{u,i}$.  
- The loss is usually the **squared error** over all known ratings: either summed, or averaged to give the **mean squared error (MSE)**. Missing entries contribute nothing to the loss. [robinwitte](https://robinwitte.com/wp-content/uploads/2019/10/RecommenderSystem.pdf)

To avoid overfitting (memorizing the training data), the objective also adds **regularization**:

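- This adds a penalty on large values in $p_u$ and $q_i$ (typically their squared lengths), and on the biases if they are used.  
- A regularization strength $\lambda$ controls the size of this penalty: a larger $\lambda$ pulls the vectors closer to zero. [riverml](https://riverml.xyz/0.8.0/examples/matrix-factorization-for-recommender-systems-part-1/)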

So the training goal is: choose all user and item vectors so that:

- Predicted ratings are close to the real ratings on known entries.  
- Vectors stay reasonably small (controlled by regularization).
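
One common way to write this objective, assuming squared (L2) penalties and the bias notation $\mu$, $b_u$, $b_i$ (both are assumptions for illustration; variants exist), is:

$$
\min_{P,\,Q,\,b} \sum_{(u,i) \in \mathcal{K}} \left( r_{u,i} - \hat{r}_{u,i} \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 + b_u^2 + b_i^2 \right)
$$

Here $\mathcal{K}$ is the set of (user, item) pairs with a known rating. The first term rewards accurate predictions on known entries; the second keeps the parameters small. Dividing the first term by $|\mathcal{K}|$ gives the MSE form.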



### Training with stochastic gradient descent (SGD)

Funk SVD does **not** compute a classical SVD, which needs a fully filled-in matrix and so cannot be applied directly to ratings with missing entries. Instead, it fits $P$ and $Q$ with **stochastic gradient descent** (SGD): [freecodecamp](https://www.freecodecamp.org/news/singular-value-decomposition-vs-matrix-factorization-in-recommender-systems-b1e99bc73599/)

1. **Initialize**  
   - Start with small random values for all user vectors $p_u$ and item vectors $q_i$.  
   - Initialize biases (if used) to simple values: for example, the global bias to the average of the known ratings, and the user and item biases to 0.

2. **Loop over known ratings**  
   For each observed rating $r_{u,i}$:  
   - Compute the current prediction $\hat{r}_{u,i}$ using $p_u$ and $q_i$ (and biases).  
   - Compute the error $e_{u,i} = r_{u,i} - \hat{r}_{u,i}$.  
   - Update:
     - The user vector $p_u$ a little in the direction that reduces this error.  
     - The item vector $q_i$ similarly.  
     - Biases (if present) are also nudged based on the error.  
   - The step size is controlled by a **learning rate** (a small number).

3. **Repeat**  
   - Make many passes (epochs) over all known ratings until the error stops improving significantly.
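
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `train_funk_svd` and `rmse`, the hyperparameter values, and the toy ratings are all assumptions chosen for the example, and it runs a fixed number of epochs instead of checking for convergence.

```python
import numpy as np

def train_funk_svd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
                   n_epochs=200, seed=0):
    """ratings: list of (user_index, item_index, rating), known entries only."""
    rng = np.random.default_rng(seed)
    # Step 1: small random factors, global bias = mean rating, other biases = 0.
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    mu = float(np.mean([r for _, _, r in ratings]))
    b_user = np.zeros(n_users)
    b_item = np.zeros(n_items)

    # Step 3: repeat for a fixed number of epochs (assumed stopping rule).
    for _ in range(n_epochs):
        # Step 2: visit the known ratings one at a time, in random order.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            pred = mu + b_user[u] + b_item[i] + P[u] @ Q[i]
            err = r - pred
            # Nudge each parameter to reduce the error; the reg term
            # shrinks it toward zero (L2 regularization).
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            p_old = P[u].copy()  # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, b_user, b_item, P, Q

def rmse(ratings, mu, b_user, b_item, P, Q):
    """Root mean squared error on the given (user, item, rating) triples."""
    errs = [r - (mu + b_user[u] + b_item[i] + P[u] @ Q[i])
            for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))

# Toy data (made up for illustration): 4 users, 3 items, 8 known ratings.
toy_ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
               (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]

model = train_funk_svd(toy_ratings, n_users=4, n_items=3)
print("training RMSE:", rmse(toy_ratings, *model))
```

A real setup would measure the error on held-out ratings, not on the training ratings, and stop when that held-out error stops improving.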

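Because each update touches **only one known rating** and the parameters of that one user and item, the cost of an epoch grows with the number of known ratings rather than with the full users × items table. This is why Funk SVD works well with sparse matrices and scales to large datasets. [rpubs](https://rpubs.com/Argaadya/recommender-svdf)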



### Making predictions

Once training is done:

- To predict how user $u$ will rate item $i$:  
  - Compute the **dot product** of their vectors: $p_u \cdot q_i$.  
  - Add any bias terms if the model uses them (global, user, item biases).  
- This predicted rating fills in the missing entry in the original rating matrix.  
- To recommend items to a user, you:
  - Predict ratings for items they haven’t rated.  
  - Sort by predicted score.  
  - Show the top items. [freecodecamp](https://www.freecodecamp.org/news/singular-value-decomposition-vs-matrix-factorization-in-recommender-systems-b1e99bc73599/)
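
The recommendation steps above could look like the following sketch. The function name `recommend_top_n` and the hand-picked parameter values are hypothetical; in practice the biases and factor matrices would come from training.

```python
import numpy as np

def recommend_top_n(u, rated_items, mu, b_user, b_item, P, Q, n=3):
    """Return up to n (item_index, predicted_rating) pairs for user u."""
    # Predicted rating for every item: biases plus dot product with each item row.
    scores = mu + b_user[u] + b_item + Q @ P[u]
    # Exclude items the user has already rated.
    scores[np.array(list(rated_items), dtype=int)] = -np.inf
    # Sort by predicted score, highest first, and keep the top n unrated items.
    order = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in order if np.isfinite(scores[i])]

# Hand-picked values for illustration only (2 users, 4 items, 2 factors).
mu = 3.0
b_user = np.zeros(2)
b_item = np.array([0.0, 0.5, -0.5, 0.2])
P = np.array([[1.0, 0.0], [0.0, 1.0]])
Q = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]])

recs = recommend_top_n(0, rated_items={0}, mu=mu, b_user=b_user,
                       b_item=b_item, P=P, Q=Q, n=2)
print(recs)
```

Scoring every item with one matrix-vector product is fine for small catalogs; very large catalogs usually need a candidate-selection step first.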



### Why Funk SVD is useful

- Handles **sparse data** well (lots of missing ratings). [ceur-ws](https://ceur-ws.org/Vol-3842/paper18.pdf)
- Finds **latent patterns** in user taste and item style without hand-designed features.  
- Scales to large systems (e.g., Netflix-size) when implemented efficiently with SGD or related methods. [datajobs](https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf)