<a href="https://colab.research.google.com/github/couragedike1/-Recommendation-Systems/blob/Recommendation-Systems/SVD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Movie Recommendation System using SVD (Singular Value Decomposition)

## Introduction

In this notebook, we will build a movie recommendation system using **Singular Value Decomposition (SVD)** with the **MovieLens 100k dataset**. SVD is a powerful matrix factorization technique that breaks down the user-item interaction matrix into lower-dimensional latent factors. This allows us to capture hidden relationships between users and items (movies) and make recommendations based on these latent features.

### MovieLens Dataset

We are using the **MovieLens 100k dataset**, which contains 100,000 ratings from 943 users on 1682 movies. Each user has rated at least 20 movies. This dataset is widely used for benchmarking in recommendation systems research.

### Singular Value Decomposition (SVD)

**SVD** is a technique that factors a matrix `A` into the product of three matrices, `A = U Σ Vᵀ`:
- `U` (User latent features)
- `Σ` (A diagonal matrix of singular values, which measure the strength of each latent feature)
- `Vᵀ` (Item latent features)

In this notebook, we use SVD to decompose the **user-item matrix**. By keeping only a few latent factors, we can represent both users and items in a low-dimensional space, capturing the underlying structure of the data and making predictions about unseen ratings. This allows us to recommend movies to users based on the latent factors learned through SVD.
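
To make the decomposition concrete, here is a minimal sketch using `numpy.linalg.svd` on a small made-up rating matrix. The matrix values and the choice `k = 2` are assumptions for illustration only and are not taken from MovieLens.

```python
import numpy as np

# Toy 4-user x 5-movie rating matrix (made-up values; 0 = not rated)
A = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 0],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s sorted in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k strongest latent factors (rank-k approximation of A)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("singular values:", np.round(s, 3))
print("rank-2 approximation:\n", np.round(A_k, 2))
```

The zeros in `A` get non-zero values in `A_k`; those filled-in values are what the rest of the notebook treats as predicted scores.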

### Steps

1. Load and preprocess the MovieLens 100k dataset.
2. Create a **user-item matrix** where rows are users, columns are movies, and each cell holds the user's rating (0 where the user has not rated the movie).
3. Apply **Truncated SVD** to reduce the dimensionality of the matrix.
4. Reconstruct the user-item matrix using the decomposed matrices.
5. Define a function to recommend movies based on the predicted ratings from the SVD model.
6. Test the system by generating movie recommendations for a specific user.



In [2]:
import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Load the MovieLens 100k dataset
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('u.data', sep='\t', names=column_names)

# Load movie titles
movie_titles = pd.read_csv('u.item', sep='|', encoding='ISO-8859-1', header=None, usecols=[0, 1], names=['item_id', 'title'])

# Merge the movie titles with ratings data
df = pd.merge(df, movie_titles, on='item_id')

# Create the user-item matrix
user_item_matrix = df.pivot_table(index='user_id', columns='title', values='rating').fillna(0)

df


| | user_id | item_id | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 | Kolya (1996) |
| 1 | 186 | 302 | 3 | 891717742 | L.A. Confidential (1997) |
| 2 | 22 | 377 | 1 | 878887116 | Heavyweights (1994) |
| 3 | 244 | 51 | 2 | 880606923 | Legends of the Fall (1994) |
| 4 | 166 | 346 | 1 | 886397596 | Jackie Brown (1997) |
| ... | ... | ... | ... | ... | ... |
| 99995 | 880 | 476 | 3 | 880175444 | First Wives Club, The (1996) |
| 99996 | 716 | 204 | 5 | 879795543 | Back to the Future (1985) |
| 99997 | 276 | 1090 | 1 | 874795795 | Sliver (1993) |
| 99998 | 13 | 225 | 2 | 882399156 | 101 Dalmatians (1996) |
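
As a small illustration of the pivot step in the cell above, the following sketch builds a user-item matrix from a made-up ratings table. The `toy` DataFrame and its values are hypothetical; only the `pivot_table(...).fillna(0)` call mirrors the notebook.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3],
    'title': ['Fargo (1996)', 'Kolya (1996)', 'Fargo (1996)',
              'Kolya (1996)', 'Sliver (1993)'],
    'rating': [5, 3, 4, 2, 1],
})

# Rows become users, columns become movie titles, cells hold ratings.
# pivot_table averages duplicate (user, title) pairs by default (aggfunc='mean').
# Missing ratings become NaN and are then filled with 0.
toy_matrix = toy.pivot_table(index='user_id', columns='title', values='rating').fillna(0)
print(toy_matrix)

# Fraction of cells that are 0, i.e. unrated in this toy example
sparsity = (toy_matrix == 0).to_numpy().mean()
print(f"Share of unrated cells: {sparsity:.2f}")
```

Note that filling with 0 makes "not rated" indistinguishable from a very low rating as far as the factorization is concerned, which affects the scale of the predicted scores.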


### Applying SVD to the User-Item Matrix

In this section, we use **TruncatedSVD** from `scikit-learn` to perform **Singular Value Decomposition (SVD)** on the user-item matrix. SVD helps reduce the dimensionality of the matrix by capturing the most significant latent factors that explain the relationship between users and items (movies).

1. **TruncatedSVD**:  
   We set `n_components=20`, which means the SVD will keep the top 20 latent factors. You can adjust this value to optimize the model.

2. **Matrix Reconstruction**:  
   After applying SVD, we reconstruct the user-item matrix by multiplying the user factors returned by `fit_transform` with the item factors stored in `svd.components_`. This reconstructed matrix contains a predicted score for every user-movie pair based on the latent factors.

3. **Convert to DataFrame**:  
   Finally, we convert the reconstructed matrix back into a DataFrame so that we can easily work with the predicted ratings and make recommendations for specific users.
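
The sketch below shows the shapes involved in these steps and one possible way to judge `n_components`, using `explained_variance_ratio_`. The random matrix `X`, the value `n_components=5`, and `random_state=0` are assumptions chosen for illustration, not settings from this notebook.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Made-up 30-user x 12-item matrix with integer values between 0 and 5
rng = np.random.default_rng(0)
X = rng.integers(0, 6, size=(30, 12)).astype(float)

# random_state fixes the randomized solver so repeated runs give the same result
svd = TruncatedSVD(n_components=5, random_state=0)
user_factors = svd.fit_transform(X)  # shape (30, 5): one row of latent factors per user
item_factors = svd.components_       # shape (5, 12): one column of latent factors per item

# Low-rank reconstruction: a predicted score for every user-item pair
X_hat = user_factors @ item_factors

# Cumulative share of variance explained by the first 1, 2, ..., 5 factors
print(np.round(svd.explained_variance_ratio_.cumsum(), 3))
print("relative reconstruction error:",
      round(np.linalg.norm(X - X_hat) / np.linalg.norm(X), 3))
```

Increasing `n_components` lowers the reconstruction error on the observed matrix; whether that improves recommendations would need to be checked on held-out ratings.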


In [3]:
# Use TruncatedSVD to perform SVD on the user-item matrix
svd = TruncatedSVD(n_components=20)  # n_components can be tuned
svd_matrix = svd.fit_transform(user_item_matrix)

# Reconstruct the matrix using the dot product of the decomposed matrices
reconstructed_matrix = np.dot(svd_matrix, svd.components_)

# Convert the reconstructed matrix back to a DataFrame
reconstructed_df = pd.DataFrame(reconstructed_matrix, index=user_item_matrix.index, columns=user_item_matrix.columns)




### Generating Recommendations using the SVD-Reconstructed Matrix

In this section, we define a function `recommend_items_svd` that generates personalized movie recommendations for a user based on the ratings predicted by the SVD-reconstructed matrix.

1. **Predicted Ratings**:  
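   The function retrieves the given user's row from the reconstructed matrix. Each value is the SVD-based score for one movie, including movies the user hasn't seen yet. Because missing ratings were filled with 0 before factorization, these scores are pulled toward 0 and are best read as relative rankings rather than ratings on the 1-5 scale.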

2. **Filter Unrated Movies**:  
   We filter out the movies that the user has already rated, so the recommendations are only for movies that the user has not interacted with.

3. **Sort and Recommend**:  
   The function sorts the predicted ratings for the unseen movies in descending order and selects the top `n` recommendations. These are the movies with the highest predicted ratings for the user.

4. **Example**:  
   We demonstrate this by generating movie recommendations for a specific user (in this case, `user_id=9`), showing the top 5 movies that the system recommends based on the user's predicted preferences.

This function allows us to effectively use the SVD-reconstructed matrix to recommend movies that a user might enjoy based on the latent patterns captured by SVD.
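
The filter-and-rank logic described above can be traced by hand on a tiny example. The two DataFrames below hold made-up values and hypothetical movie names; they stand in for the original and reconstructed matrices.

```python
import pandas as pd

titles = ['Movie A', 'Movie B', 'Movie C', 'Movie D']

# Hypothetical original ratings (0 = not rated) and hypothetical predicted scores
original = pd.DataFrame([[5, 0, 0, 3], [0, 4, 0, 0]], index=[1, 2], columns=titles)
predicted = pd.DataFrame([[4.8, 1.2, 2.5, 2.9], [0.7, 3.9, 0.2, 1.1]],
                         index=[1, 2], columns=titles)

user_id = 1

# Boolean mask: True for movies this user has not rated
unrated = original.loc[user_id] == 0

# Keep predicted scores for unrated movies only, highest first
top = predicted.loc[user_id][unrated].sort_values(ascending=False).head(2)
print(top)
```

For user 1, `Movie A` and `Movie D` are excluded even though their predicted scores are the highest, because the user has already rated them.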


In [5]:
# Function to recommend items using the SVD-reconstructed matrix
def recommend_items_svd(user_id, reconstructed_df, original_ratings_df, num_recommendations=5):
    # Get the user's predicted ratings from the reconstructed matrix
    user_ratings = reconstructed_df.loc[user_id]

    # Get the original ratings of the user
    original_ratings = original_ratings_df.loc[user_id]

    # Filter out the items the user has already rated
    unrated_items = original_ratings[original_ratings == 0].index

    # Sort the predicted ratings for the items the user has not rated
    recommended_items = user_ratings.loc[unrated_items].sort_values(ascending=False).head(num_recommendations)

    return recommended_items

# Example: Recommend movies for a specific user (user_id = 9)
user_id = 9
recommendations_svd = recommend_items_svd(user_id, reconstructed_df, user_item_matrix, num_recommendations=5)
print(f"Top recommendations for User {user_id} based on SVD:\n", recommendations_svd)

Top recommendations for User 9 based on SVD:
 title
Godfather, The (1972)        1.469861
Fargo (1996)                 1.455227
Return of the Jedi (1983)    1.155571
Air Force One (1997)         0.930406
Full Monty, The (1997)       0.901597
Name: 9, dtype: float64
