# How Does SVD Help in Collaborative Filtering?

## Collaborative Filtering Context
- Collaborative filtering typically relies on a **user-item interaction matrix**, where:
  - Rows represent users.
  - Columns represent items.
  - Each entry corresponds to a user's rating or interaction with an item.
- The matrix is often sparse because most users rate or interact with only a small subset of available items.
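A common way to build such a matrix is to pivot a long table of (user, item, rating) rows. The sketch below assumes a pandas DataFrame with hypothetical column names `userId`, `movieId`, and `rating`. Unrated pairs become `NaN`, which makes the sparsity easy to measure.

```python
import pandas as pd

# Long-format ratings table (illustrative values; column names are hypothetical)
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 20, 30],
    "rating":  [5.0, 3.0, 4.0, 2.0],
})

# Rows = users, columns = items; (user, item) pairs with no rating become NaN
user_item = ratings.pivot_table(index="userId", columns="movieId", values="rating")

# Fraction of cells that hold an observed rating
density = user_item.notna().to_numpy().mean()
print(user_item)
print(f"density: {density:.2f}, sparsity: {1 - density:.2f}")
```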

---

## Role of SVD
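Singular Value Decomposition (SVD) factors the $ m \times n $ user-item matrix $ R $ ($ m $ users, $ n $ items) as $ R = U \Sigma V^T $. Keeping only the $ k $ largest singular values and their singular vectors gives a rank-$ k $ approximation: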

$$
R \approx U \Sigma V^T
$$

Where:
- $ U $: an $ m \times k $ matrix whose rows are the users' latent feature vectors.
- $ \Sigma $: a $ k \times k $ diagonal matrix of singular values, sorted in decreasing order; each one measures the importance of its latent feature.
- $ V^T $: a $ k \times n $ matrix whose columns are the items' latent feature vectors.
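A minimal NumPy sketch of the decomposition and its truncation, using a small made-up rating matrix rather than the project's data. `np.linalg.svd` returns $U$, the singular values as a 1-D array, and $V^T$; slicing keeps the top $k$ latent features.

```python
import numpy as np

# Toy 4-user x 5-item rating matrix (illustrative values, not project data)
R = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0, 4.0],
    [1.0, 0.0, 4.0, 5.0, 5.0],
])

# Thin SVD: R = U @ diag(s) @ Vt, singular values in descending order
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the top-k latent features
k = 2
U_k = U[:, :k]            # m x k: user latent features
Sigma_k = np.diag(s[:k])  # k x k: singular values on the diagonal
Vt_k = Vt[:k, :]          # k x n: item latent features

R_k = U_k @ Sigma_k @ Vt_k  # rank-k approximation of R
print(U_k.shape, Sigma_k.shape, Vt_k.shape)
print(np.round(R_k, 2))
```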

---

## Benefits in Collaborative Filtering

### 1. Dimensionality Reduction
- The original user-item matrix can have very high dimensions (number of users and items).
- By using only the top $ k $ singular values, SVD reduces the dimensionality, focusing on the most significant latent features. This reduces noise and computational complexity.
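The truncation is more than a heuristic: by the Eckart-Young theorem, the rank-$k$ SVD truncation is the best rank-$k$ approximation in the Frobenius norm, and its error equals the square root of the sum of the squared discarded singular values. The sketch below checks this on a random stand-in matrix and compares storage, $k(m + n + 1)$ numbers for the factors versus $mn$ for the full matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 10
R = rng.normal(size=(m, n))  # random stand-in for a user-item matrix

U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: Frobenius error of the truncation = energy in the discarded singular values
err = np.linalg.norm(R - R_k, "fro")
discarded = np.sqrt(np.sum(s[k:] ** 2))

full_entries = m * n                # numbers stored for the full matrix
factored_entries = k * (m + n + 1)  # U_k, V_k, and the k diagonal entries of Sigma_k
print(f"error={err:.4f}, discarded={discarded:.4f}")
print(f"storage: {full_entries} vs {factored_entries}")
```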

### 2. Latent Feature Representation
- Latent features represent abstract user preferences and item characteristics (e.g., "genre preferences" or "popularity").
- SVD captures these relationships, enabling the system to generalize and make recommendations even for unseen user-item pairs.
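The latent dimensions are unlabeled; reading one of them as "genre" is a human interpretation. What SVD does provide is a shared space in which users with similar rating patterns end up close together. A sketch on a toy matrix, comparing rows of $U_k \Sigma_k$ with cosine similarity (the similarity measure is an assumption for illustration, not part of SVD itself):

```python
import numpy as np

# Users 0 and 1 like items 0-1; users 2 and 3 like items 2-4 (illustrative values)
R = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0, 4.0],
    [1.0, 0.0, 4.0, 5.0, 5.0],
])

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
user_factors = U[:, :k] * s[:k]  # rows = users in latent space, i.e. U_k @ diag(s_k)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_0_1 = cosine(user_factors[0], user_factors[1])
sim_0_2 = cosine(user_factors[0], user_factors[2])
print(f"user0~user1: {sim_0_1:.2f}, user0~user2: {sim_0_2:.2f}")
```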

### 3. Dealing with Sparsity
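- Multiplying the truncated factors back together, $ U \Sigma V^T $, produces a dense matrix, so every user-item pair, including pairs missing from the original sparse matrix, receives a predicted rating. Classical SVD is only defined for a fully specified matrix, however, so the missing entries must first be filled in (for example with zeros or with user or item means), and that choice influences the predictions.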
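A minimal sketch of the fill-then-factorize approach, assuming `np.nan` marks unobserved ratings and using item-mean imputation (one of several possible choices):

```python
import numpy as np

# Toy ratings; np.nan marks unobserved user-item pairs (illustrative values)
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])
observed = ~np.isnan(R)

# Classical SVD needs a complete matrix: fill each gap with its item (column) mean
item_means = np.nanmean(R, axis=0)
R_filled = np.where(observed, R, item_means)

U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # dense: every pair gets a prediction

# Report predictions only for the originally missing pairs
for u, i in zip(*np.where(~observed)):
    print(f"user {u}, item {i}: predicted {R_hat[u, i]:.2f}")
```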

---

## Making Recommendations

The predicted rating for a user $ u $ and an item $ i $ is given by:

$$
\hat{r}_{ui} = U_u \Sigma V_i^T
$$

Where:
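- $ U_u $: the $ u $-th row of $ U $, i.e. the latent feature vector for user $ u $ ($ 1 \times k $).
- $ V_i^T $: the transpose of the $ i $-th row of $ V $, i.e. the latent feature vector for item $ i $ as a $ k \times 1 $ column (equivalently, the $ i $-th column of $ V^T $).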
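The formula is one entry of the rank-$k$ reconstruction. In NumPy, $U_u$ is row `u` of `U_k` and $V_i^T$ is column `i` of `Vt_k`; the hypothetical helper `predict` below computes a single prediction and matches the corresponding entry of $U_k \Sigma_k V_k^T$.

```python
import numpy as np

# Toy rating matrix (illustrative values)
R = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0, 4.0],
    [1.0, 0.0, 4.0, 5.0, 5.0],
])

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
U_k, Sigma_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

def predict(u, i):
    """r_hat_ui = U_u (1 x k) @ Sigma (k x k) @ V_i^T (k x 1)."""
    return float(U_k[u] @ Sigma_k @ Vt_k[:, i])

R_k = U_k @ Sigma_k @ Vt_k
print(round(predict(0, 2), 3), round(R_k[0, 2], 3))
```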

---

## Example
Imagine a user hasn't rated a movie, but SVD identifies latent features indicating:
- The user likes science fiction.
- The movie is strongly associated with science fiction.

SVD therefore predicts a high rating for this user-movie pair, and the system recommends the movie.
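The scenario can be reproduced on a tiny made-up dataset with three science-fiction and two romance movies. User 0 rated the first two science-fiction movies highly and the romance movies low, but has not rated `SF3`. After item-mean imputation (an assumption for this sketch) and a rank-2 truncation, the predicted score for `SF3` comes out higher than the score for a romance movie.

```python
import numpy as np

# Columns: SF1, SF2, SF3 (science fiction), Rom1, Rom2 (romance); illustrative ratings only
R = np.array([
    [5.0, 5.0, np.nan, 1.0, 1.0],  # user 0 has not rated SF3
    [5.0, 4.0, 5.0, 1.0, 2.0],
    [4.0, 5.0, 5.0, 2.0, 1.0],
    [1.0, 1.0, 1.0, 5.0, 5.0],
    [2.0, 1.0, 1.0, 4.0, 5.0],
])
observed = ~np.isnan(R)
R_filled = np.where(observed, R, np.nanmean(R, axis=0))  # item-mean imputation

U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(f"user 0, SF3 (unrated): {R_hat[0, 2]:.2f}")
print(f"user 0, Rom1 (rated 1): {R_hat[0, 3]:.2f}")
```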

---

## Limitations of SVD
- **Cold Start Problem**: It cannot handle new users or items without sufficient historical data.
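- **Computation Cost**: Computing a full SVD of a very large matrix is expensive. Truncated SVD, which computes only the top $ k $ singular triplets, reduces this cost, and alternative factorization methods such as alternating least squares (ALS), which fit the latent factors directly to the observed ratings, are also commonly used at scale.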
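For large sparse matrices, SciPy's `scipy.sparse.linalg.svds` computes only the top $k$ singular triplets without converting the matrix to a dense array. A sketch on a random sparse stand-in for a rating matrix; rather than relying on the order in which `svds` returns the singular values, it sorts them in descending order explicitly.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
dense = rng.integers(1, 6, size=(300, 200)).astype(float)  # ratings 1..5
dense[rng.random((300, 200)) > 0.05] = 0.0                  # keep ~5% of entries as "observed"
R = csr_matrix(dense)                                       # sparse storage

k = 10
u, s, vt = svds(R, k=k)  # top-k singular triplets only

# Sort singular values (and matching vectors) in descending order
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]
print(u.shape, s.shape, vt.shape)
```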

---

## Summary
SVD enhances collaborative filtering by:
- Uncovering latent patterns in the user-item matrix.
- Predicting missing values.
- Reducing dimensionality to improve recommendation accuracy and efficiency.


In [5]:
```python
import os

# Set the working directory to the project root (adjust this path to your own checkout)
os.chdir(r'c:\Users\Ramro\OneDrive\Documents\Library\SJSU Semester 1 Material\CMPE-255\project\CMPE-255-Project')

# Print the current working directory to confirm
print(os.getcwd())
```

```
c:\Users\Ramro\OneDrive\Documents\Library\SJSU Semester 1 Material\CMPE-255\project\CMPE-255-Project
```


In [12]:
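```python
import pandas as pd

# Read the scaled user-item matrix; the first CSV column is used as the row index.
# .values drops the row and column labels and returns a NumPy array
user_item_matrix_scaled = pd.read_csv('data_prep/user_item_matrix_scaled.csv', index_col=0).values

# Display the array (Jupyter prints a summarized view of large arrays)
display(user_item_matrix_scaled)
```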

```
array([[ 1.35135521, -0.45137737,  3.87730537, ..., -0.04052204,
        -0.04052204, -0.04052204],
       [-0.71333277, -0.45137737, -0.2894531 , ..., -0.04052204,
        -0.04052204, -0.04052204],
       [-0.71333277, -0.45137737, -0.2894531 , ..., -0.04052204,
        -0.04052204, -0.04052204],
       ...,
       [ 0.57709721,  1.00737864,  1.79392613, ..., -0.04052204,
        -0.04052204, -0.04052204],
       [ 0.83518321, -0.45137737, -0.2894531 , ..., -0.04052204,
        -0.04052204, -0.04052204],
       [ 1.8675272 , -0.45137737, -0.2894531 , ..., -0.04052204,
        -0.04052204, -0.04052204]])
```

In [None]:
```python
from sklearn.decomposition import TruncatedSVD
import pandas as pd

# Apply truncated SVD for matrix factorization, keeping the top 50 latent features
svd = TruncatedSVD(n_components=50, random_state=42)
# Each row is a user's latent vector scaled by the singular values (U * Sigma)
svd_matrix = svd.fit_transform(user_item_matrix_scaled)

# user_item_matrix_scaled is a NumPy array (loaded with .values), so it has no .index;
# re-read the CSV's index column to label the rows with the original row labels
user_index = pd.read_csv('data_prep/user_item_matrix_scaled.csv', index_col=0).index

# Convert the SVD results into a DataFrame for easier inspection
svd_df = pd.DataFrame(svd_matrix, index=user_index)

# Show the SVD results
display(svd_df)
```
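The factorization cell stores each user's latent vector but does not yet produce predicted scores. One way to get them is `TruncatedSVD.inverse_transform`, which maps latent vectors back to item space (`Z @ components_`). The sketch uses synthetic data and the hypothetical names `X_demo` and `svd_demo` so the notebook's own variables are not overwritten. On the project's matrix, the reconstructed values would be in the scaled units of `user_item_matrix_scaled`, and items a user has already rated would normally be excluded before ranking.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 40))  # synthetic stand-in for the scaled user-item matrix

svd_demo = TruncatedSVD(n_components=5, random_state=42)
Z = svd_demo.fit_transform(X_demo)   # user latent factors, shape (100, 5)
X_hat = svd_demo.inverse_transform(Z)  # Z @ svd_demo.components_, shape (100, 40)

# Highest-scoring items for user 0 (in practice, exclude items the user already rated)
top_items = np.argsort(X_hat[0])[::-1][:5]
print(X_hat.shape, top_items)
```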


In [8]:
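```python
from sklearn.metrics import mean_squared_error
import numpy as np

# Custom RMSE scorer with the (estimator, X, y) signature scikit-learn uses for scorers.
# Note: scikit-learn treats higher scores as better, so negate this value for model selection
def rmse_scorer(model, X, y):
    # TruncatedSVD is unsupervised: when no target is passed, compare against X itself
    target = X if y is None else y
    # Reconstruct the matrix: project X onto the latent space, then map back with the components
    predicted_ratings = model.transform(X).dot(model.components_)
    # RMSE between the target and its low-rank reconstruction
    return np.sqrt(mean_squared_error(target, predicted_ratings))
```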
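`rmse_scorer` measures reconstruction error over every cell of the matrix, including cells that were never rated. For rating prediction it is common to score only observed entries instead. A minimal sketch of such a metric, `masked_rmse` (a hypothetical helper, not part of scikit-learn); it assumes a boolean mask of observed ratings is available, which for this project would likely have to come from the unscaled data, since after scaling unrated cells are no longer 0.

```python
import numpy as np

def masked_rmse(y_true, y_pred, mask):
    """RMSE over the entries where mask is True (e.g. observed ratings only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    diff = y_true[mask] - y_pred[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

y_true = np.array([[5.0, 0.0], [3.0, 4.0]])
y_pred = np.array([[4.0, 2.5], [3.0, 2.0]])
mask = y_true > 0  # assumption for this toy example: 0 means "not rated"
print(masked_rmse(y_true, y_pred, mask))
```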


In [10]:
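```python
# Check the variable's type. When this cell ran (In [10]) the data was still a DataFrame;
# In [12] later reloaded it with .values, which makes it a NumPy array
type(user_item_matrix_scaled)
```

```
pandas.core.frame.DataFrame
```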

In [9]:

```python
from sklearn.model_selection import KFold
import numpy as np

# Positional row indexing needs a NumPy array: on a DataFrame, df[train_index]
# selects columns, which caused the original KeyError
X = np.asarray(user_item_matrix_scaled)

# 5-fold CV: fit the SVD on the training users, then measure the
# reconstruction RMSE on the held-out users (SVD is unsupervised, so the target is X_test)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    svd.fit(X_train)
    cv_scores.append(rmse_scorer(svd, X_test, X_test))
cv_scores = np.array(cv_scores)

# Show the results
print("Cross-validated RMSE scores:", cv_scores)
print("Mean RMSE:", np.mean(cv_scores))
```