Collaborative filtering is a **recommendation system technique** that predicts a user's preferences based on **other users’ behaviors**. The fundamental idea is: **"Users who agreed in the past will likely agree again in the future."**


### **Core Concept**

Collaborative filtering uses **user-item interactions** (like movie ratings, product clicks, etc.) to infer preferences. It **does not require metadata** about the items or users — only interaction data.

There are two main neighborhood-based types:
1. **User-based filtering**: Find users similar to the target user and recommend what they liked.
2. **Item-based filtering**: Find items similar to the ones the user liked.

A third approach, usually more powerful and scalable, is:
3. **Matrix factorization**: Learn latent features of users and items from their interaction matrix.
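
As a rough illustration of the neighborhood-based variants, the sketch below computes cosine similarity over co-rated entries. The ratings matrix and the helper name `cosine_on_corated` are hypothetical, and `NaN` is assumed to mark a missing rating.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, NaN = not rated.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 1.0, np.nan],
    [1.0, np.nan, 5.0, 4.0],
])

def cosine_on_corated(a, b):
    """Cosine similarity restricted to positions rated in both vectors."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0  # no overlap, so no evidence of similarity
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# User-based filtering compares rows; item-based filtering compares columns.
user_sim_01 = cosine_on_corated(R[0], R[1])
user_sim_02 = cosine_on_corated(R[0], R[2])
item_sim_01 = cosine_on_corated(R[:, 0], R[:, 1])

print(user_sim_01, user_sim_02, item_sim_01)
```

Here user 0 comes out closer to user 1 than to user 2, so a user-based recommender would weight user 1's ratings more heavily when predicting for user 0.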


### **Implementation via Matrix Factorization (SVD / ALS)**

Let:
- $ R \in \mathbb{R}^{m \times n} $ be the **user-item interaction matrix** (e.g. ratings, views).
- $ m $ = number of users, $ n $ = number of items.
- The goal: predict unknown entries in $ R $.

### **1. Factorization**

Decompose $ R \approx U \cdot V^T $

- $ U \in \mathbb{R}^{m \times k} $: user-feature matrix
- $ V \in \mathbb{R}^{n \times k} $: item-feature matrix
- $ k \ll m,n $: number of latent features

Then:  
$$
\hat{R}_{ij} = U_i \cdot V_j^T
$$

Where $ \hat{R}_{ij} $ is the predicted rating by user $ i $ for item $ j $.
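
A minimal NumPy sketch of this prediction step, assuming randomly filled factor matrices and arbitrary sizes (`m`, `n`, `k` below are illustrative, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 5, 2  # illustrative sizes: users, items, latent features

U = rng.normal(size=(m, k))  # one row of latent features per user
V = rng.normal(size=(n, k))  # one row of latent features per item

# Full matrix of predicted ratings.
R_hat = U @ V.T

# A single prediction is the dot product of one user row and one item row.
i, j = 1, 3
single = U[i] @ V[j]

print(R_hat.shape, np.isclose(R_hat[i, j], single))
```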

---

### **2. Loss Function**

We minimize:

$$
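\min_{U, V} \sum_{(i,j) \in \mathcal{K}} (R_{ij} - U_i \cdot V_j^T)^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right)
$$

Where:
- $ \mathcal{K} $: the set of index pairs $ (i, j) $ with a known rating.
- $ \lambda $: regularization parameter that penalizes large factor values to avoid overfitting.
- $ \|\cdot\|_F $: the Frobenius norm (square root of the sum of squared entries).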
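
The objective can be evaluated directly, as in the sketch below. It assumes missing ratings are stored as `NaN` and reads the squared norms as sums of squared entries; the function name `mf_loss` is hypothetical.

```python
import numpy as np

def mf_loss(R, U, V, lam):
    """Squared error over known ratings plus L2 penalty on both factor matrices."""
    known = ~np.isnan(R)              # boolean mask playing the role of K
    err = (R - U @ V.T)[known]        # residuals on known entries only
    penalty = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return float(np.sum(err ** 2) + penalty)

# Tiny hypothetical example: 2 users, 2 items, k = 2.
R = np.array([[2.0, np.nan],
              [np.nan, 3.0]])
U = np.ones((2, 2))
V = np.ones((2, 2))

print(mf_loss(R, U, V, lam=0.1))
```

With all-ones factors every prediction is 2, so only the rating of 3 contributes to the error term; the rest of the value comes from the penalty.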

---

### **3. Optimization**

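Use **stochastic gradient descent (SGD)** or **alternating least squares (ALS)** to update $ U $ and $ V $. SGD steps through the known ratings and nudges $ U_i $ and $ V_j $ along the negative gradient of the loss. ALS fixes one matrix, solves the resulting regularized least-squares problem for the other in closed form, and then alternates.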
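
For reference, a common form of the per-rating SGD step is sketched below. It rests on assumptions not stated in the text: $ \eta $ is a learning rate, $ e_{ij} $ is the prediction error on a known rating, the factor of 2 from differentiating the squared error is absorbed into $ \eta $, and the regularization term is applied once per observed rating.

$$
e_{ij} = R_{ij} - U_i \cdot V_j^T
$$

$$
U_i \leftarrow U_i + \eta \, (e_{ij} V_j - \lambda U_i), \qquad V_j \leftarrow V_j + \eta \, (e_{ij} U_i - \lambda V_j)
$$

Both updates use the values of $ U_i $ and $ V_j $ from before the step.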

---

### **Detailed Example**

Imagine a movie rating dataset:

|       | Movie A | Movie B | Movie C | Movie D |
|-------|---------|---------|---------|---------|
| User1 |   5     |   ?     |   3     |   ?     |
| User2 |   4     |   ?     |   2     |   1     |
| User3 |   1     |   3     |   ?     |   5     |
| User4 |   ?     |   4     |   ?     |   4     |

Our goal: Predict missing entries.

**Steps:**
1. Initialize $ U $ and $ V $ randomly.
2. Use gradient descent to minimize loss.
3. After convergence, estimate missing entries like:
   $$
   \hat{R}_{1B} = U_1 \cdot V_B^T
   $$
4. Recommend items with highest predicted rating.
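
The steps above can be sketched in NumPy on the same table. This is an illustrative implementation, not a tuned one: the function name `factorize` and the hyperparameters (`k`, `lr`, `lam`, `epochs`, `seed`) are assumptions, and `NaN` stands in for the `?` entries.

```python
import numpy as np

# Ratings from the table above; NaN marks the "?" entries.
R = np.array([
    [5, np.nan, 3, np.nan],
    [4, np.nan, 2, 1],
    [1, 3, np.nan, 5],
    [np.nan, 4, np.nan, 4],
], dtype=float)

def factorize(R, k=2, lr=0.01, lam=0.02, epochs=3000, seed=0):
    """Fit R ~ U @ V.T with per-rating SGD over the known entries."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    # Step 1: small random initialization.
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    known = np.argwhere(~np.isnan(R))  # (i, j) pairs with a rating
    # Step 2: repeated SGD passes over the known ratings.
    for _ in range(epochs):
        rng.shuffle(known)  # visit ratings in a new order each epoch
        for i, j in known:
            err = R[i, j] - U[i] @ V[j]
            u_old = U[i].copy()  # keep the pre-update user vector
            U[i] += lr * (err * V[j] - lam * U[i])
            V[j] += lr * (err * u_old - lam * V[j])
    return U, V

U, V = factorize(R)
R_hat = U @ V.T  # Step 3: predictions for every (user, item) pair

# Step 4: for User1 (row 0), rank only the items they have not rated.
unrated = np.isnan(R[0])
best_for_user1 = int(np.argmax(np.where(unrated, R_hat[0], -np.inf)))

print(np.round(R_hat, 2))
print("Top unrated item index for User1:", best_for_user1)
```

With so few ratings the predicted values depend heavily on the seed and hyperparameters, so the output is best read as a demonstration of the mechanics rather than a meaningful recommendation.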

### Mean Normalization and How It Helps Collaborative Filtering

In collaborative filtering, users have varying rating behaviors—some consistently rate items higher, while others rate lower. Mean normalization adjusts for these biases by subtracting the user's average rating from each of their ratings:

$$
\tilde{r}_{ui} = r_{ui} - \bar{r}_u
$$

Where:
- $ r_{ui} $ is the original rating of user $ u $ for item $ i $.
- $ \bar{r}_u $ is the average rating given by user $ u $.
- $ \tilde{r}_{ui} $ is the normalized rating.

This normalization centers each user's ratings around zero, highlighting their relative preferences.

### Why Use Mean Normalization?

1. **Adjusts for User Biases**: By centering ratings, we account for users who tend to rate items unusually high or low, ensuring that the model captures true preferences rather than rating habits.

2. **Improves Similarity Measures**: In user-based collaborative filtering, similarity computations (like cosine similarity or Pearson correlation) benefit from normalized data, leading to more accurate neighbor identification.

3. **Enhances Model Training**: For matrix factorization methods, mean normalization can lead to faster convergence and better performance. Removing each user's constant offset reduces variance in the data, so the latent factors can focus on relative preferences.
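
The effect on similarity measures can be seen in a small sketch. The three rating vectors are hypothetical and assume all users rated the same three items; in that fully co-rated case, cosine similarity on mean-centered ratings coincides with Pearson correlation.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical users: a is generous, b is harsh with the same taste as a,
# and c is generous with the opposite taste.
a = np.array([5.0, 4.0, 5.0])
b = np.array([2.0, 1.0, 2.0])
c = np.array([4.0, 5.0, 4.0])

# Raw cosine: both pairs look almost identical.
raw_ab, raw_ac = cosine(a, b), cosine(a, c)

# Cosine on mean-centered ratings: the pairs are clearly separated.
cen_ab = cosine(a - a.mean(), b - b.mean())
cen_ac = cosine(a - a.mean(), c - c.mean())

print(raw_ab, raw_ac)
print(cen_ab, cen_ac, np.corrcoef(a, b)[0, 1])
```

On raw ratings both similarities are close to 1, because all-positive rating vectors point in roughly the same direction. After centering, the pair with matching taste stays near 1 while the pair with opposite taste moves to about -1.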


### Addressing the Cold Start Problem

The cold start problem arises when a new user has no prior interactions, making it challenging to provide personalized recommendations. Mean normalization aids in this scenario by:

- **Defaulting to Item Averages**: In the absence of user data, the system can rely on item average ratings, which are more informative when user biases are removed through normalization.

- **Facilitating Hybrid Approaches**: Combining normalized collaborative filtering with content-based methods allows the system to make reasonable recommendations even for new users.


Mean normalization therefore helps with the cold start problem only to some extent: it provides a baseline for predictions when user-specific data is sparse.
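
One possible reading of the item-average fallback is sketched below. It is an assumption about how the baseline could be built, not a prescribed method: ratings are centered per user, the centered values are averaged per item, and a new user's prediction is the global mean plus that item offset. The matrix is hypothetical and `NaN` marks a missing rating.

```python
import numpy as np

# Hypothetical ratings: rows are existing users, columns are items.
R = np.array([
    [5.0, 3.0, np.nan],
    [2.0, np.nan, 1.0],
    [np.nan, 4.0, 2.0],
])

# Remove each user's rating bias, ignoring missing entries.
user_means = np.nanmean(R, axis=1, keepdims=True)
centered = R - user_means

# Average bias-free rating per item, plus the overall rating level.
item_offsets = np.nanmean(centered, axis=0)
global_mean = np.nanmean(R)

# Baseline prediction for a user with no history at all.
new_user_pred = global_mean + item_offsets
ranking = np.argsort(-new_user_pred)  # best item first

print(np.round(new_user_pred, 2), ranking)
```

Every new user receives the same ranking under this baseline, which is why it only partially addresses cold start.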


### Example

Consider the following user-item rating matrix:

|       | Item A | Item B | Item C |
|-------|--------|--------|--------|
| User1 |   5    |   3    |   4    |
| User2 |   2    |   1    |   2    |
| User3 |   4    |   5    |   5    |

**Step 1: Compute User Averages**

- User1: $ \bar{r}_1 = (5 + 3 + 4)/3 = 4.0 $
- User2: $ \bar{r}_2 = (2 + 1 + 2)/3 = 1.67 $
- User3: $ \bar{r}_3 = (4 + 5 + 5)/3 = 4.67 $

**Step 2: Normalize Ratings**

|       | Item A | Item B | Item C |
|-------|--------|--------|--------|
| User1 | 1.0    | -1.0   | 0.0    |
| User2 | 0.33   | -0.67  | 0.33   |
| User3 | -0.67  | 0.33   | 0.33   |

These normalized ratings can now be used in collaborative filtering algorithms to predict unknown ratings more accurately.
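
The two steps can be reproduced with a short NumPy sketch. It assumes a fully observed matrix as in the example; with missing entries, `np.nanmean` would be one way to compute the averages.

```python
import numpy as np

# Ratings from the example: rows are User1..User3, columns are Item A..C.
R = np.array([
    [5, 3, 4],
    [2, 1, 2],
    [4, 5, 5],
], dtype=float)

# Step 1: per-user average, kept as a column so it broadcasts across items.
user_means = R.mean(axis=1, keepdims=True)

# Step 2: subtract each user's average from their ratings.
R_norm = R - user_means

print(np.round(user_means.ravel(), 2))
print(np.round(R_norm, 2))

# A prediction made on the normalized scale is mapped back by adding the mean.
restored = R_norm + user_means
```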

# Collaborative Filtering with the Surprise library

Surprise is a Python scikit for recommender systems. It has many built-in features that help to build, train, and test them.

Let's see an example of how to use collaborative filtering with Surprise.

In [2]:
# Install dependencies as needed:
%pip install numpy==1.25.2
%pip install --upgrade matplotlib
%pip install --upgrade scikit-surprise


In [15]:
import matplotlib.pyplot as plt
import numpy as np
from surprise import accuracy, Dataset, SVD
from surprise.model_selection import train_test_split

In [12]:
# Load the movielens-100k dataset.
data = Dataset.load_builtin('ml-100k')

In [13]:
# Inspect the ratings per user: maps each inner user id to a list of (inner item id, rating)
print(data.build_full_trainset().ur)

defaultdict(<class 'list'>, {0: [(0, 3.0), (528, 4.0), (377, 4.0), (522, 3.0), (431, 5.0), (834, 5.0), (380, 4.0), (329, 4.0), (550, 5.0), (83, 4.0), (632, 2.0), (86, 4.0), (289, 5.0), (363, 3.0), (438, 5.0), (389, 5.0), (649, 4.0), (947, 4.0), (423, 3.0), (291, 3.0), (10, 2.0), (1006, 4.0), (179, 3.0), (751, 3.0), (487, 3.0), (665, 3.0), (92, 4.0), (512, 5.0), (1045, 3.0), (672, 4.0), (656, 4.0), (221, 5.0), (432, 2.0), (365, 3.0), (321, 2.0), (466, 4.0), (302, 4.0), (491, 3.0), (521, 1.0)], 1: [(1, 3.0), (476, 5.0), (305, 1.0), (577, 4.0), (627, 3.0), (746, 5.0), (800, 3.0), (151, 4.0), (114, 4.0), (433, 4.0), (370, 1.0), (970, 5.0), (516, 3.0), (51, 5.0), (527, 1.0), (280, 3.0), (204, 4.0), (364, 3.0), (349, 2.0), (368, 4.0), (77, 2.0), (1102, 4.0), (1152, 4.0), (309, 5.0), (197, 1.0), (10, 4.0), (469, 5.0), (140, 5.0), (43, 3.0), (60, 1.0), (443, 4.0), (899, 4.0), (864, 4.0), (748, 5.0), (25, 4.0), (49, 4.0), (85, 3.0), (416, 3.0), (23, 2.0), (758, 5.0), (652, 5.0), (585, 3.0), (34

In [16]:
# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=0.25)

In [17]:
# We'll use the famous SVD algorithm (matrix factorization trained with SGD).
algo = SVD()

In [18]:
# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

In [19]:
print(predictions)

[Prediction(uid='883', iid='1592', r_ui=5.0, est=3.700147335392315, details={'was_impossible': False}), Prediction(uid='416', iid='345', r_ui=5.0, est=4.018772059116901, details={'was_impossible': False}), Prediction(uid='699', iid='886', r_ui=3.0, est=2.8916483363638346, details={'was_impossible': False}), Prediction(uid='137', iid='249', r_ui=4.0, est=4.296454540216142, details={'was_impossible': False}), Prediction(uid='805', iid='417', r_ui=2.0, est=2.5885897682853853, details={'was_impossible': False}), Prediction(uid='157', iid='298', r_ui=4.0, est=4.066750957051098, details={'was_impossible': False}), Prediction(uid='473', iid='127', r_ui=5.0, est=4.0796785417485, details={'was_impossible': False}), Prediction(uid='200', iid='215', r_ui=4.0, est=4.3646908573341205, details={'was_impossible': False}), Prediction(uid='521', iid='241', r_ui=4.0, est=3.1490969255278274, details={'was_impossible': False}), Prediction(uid='889', iid='869', r_ui=3.0, est=2.974384254562699, details={'wa

In [20]:
# Then compute RMSE
accuracy.rmse(predictions)

RMSE: 0.9413


0.9412543188270663
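
For intuition about the metric, RMSE is the square root of the mean squared difference between true and estimated ratings. The sketch below computes it by hand; the helper name `rmse` is hypothetical, and the sample pairs are the first five `(r_ui, est)` values from the printed predictions, rounded to two decimals.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    pairs = list(pairs)
    squared_errors = [(r - est) ** 2 for r, est in pairs]
    return math.sqrt(sum(squared_errors) / len(pairs))

# First five (r_ui, est) pairs from the printed predictions, rounded.
sample = [(5.0, 3.70), (5.0, 4.02), (3.0, 2.89), (4.0, 4.30), (2.0, 2.59)]
print(round(rmse(sample), 4))
```

Because each `Prediction` exposes `r_ui` and `est` fields, passing `((p.r_ui, p.est) for p in predictions)` to this helper should give the same quantity that `accuracy.rmse` reports. The value on five samples will differ from the full test-set RMSE.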