
| Step | What Happens               | Code Part                                   |
| ---- | -------------------------- | ------------------------------------------- |
| 1️⃣  | Load data                  | `pd.read_csv(... u.data ...)`               |
| 2️⃣  | Build user-item matrix     | `user_movie_matrix = data.pivot_table(...)` |
| 3️⃣  | Apply matrix factorization | `TruncatedSVD(...).fit_transform(...)`      |
| 4️⃣  | Predict ratings            | `np.dot(...)`                               |
| 5️⃣  | Recommend top movies       | Sort predicted ratings per user             |


In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from math import sqrt


In [3]:
ratings = pd.read_csv('u.data', sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])
ratings.drop('timestamp', axis=1, inplace=True)


In [4]:
user_item_matrix = ratings.pivot_table(index='user_id', columns='item_id', values='rating')
user_item_matrix.fillna(0, inplace=True)
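
To make the pivot step concrete, here is a minimal sketch on a hypothetical toy dataset (the `toy_*` names and values are illustrative and not part of the MovieLens data). It assumes the same long format as `u.data`: one row per (user, item, rating).

```python
import pandas as pd

# Hypothetical toy ratings in the same long format as u.data.
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "item_id": [10, 20, 10, 20, 30],
    "rating":  [5, 3, 4, 2, 1],
})

# One row per user, one column per item; unrated pairs become NaN.
toy_matrix = toy_ratings.pivot_table(index="user_id", columns="item_id", values="rating")

# Replace NaN with 0 so the matrix can be passed to TruncatedSVD.
toy_filled = toy_matrix.fillna(0)
print(toy_filled)

# Share of user-item pairs that have no rating.
sparsity = toy_matrix.isna().to_numpy().mean()
print(f"Share of unrated cells: {sparsity:.2f}")
```

Checking the share of unrated cells before filling is useful because every one of those cells is treated as a rating of 0 by the SVD step.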

In [5]:
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=20, random_state=42)
matrix_svd = svd.fit_transform(user_item_matrix)




*   The core concept of matrix factorization is to factor the user-item rating matrix into two lower-dimensional matrices.

*  This part does the magic:


      1.   user_item_matrix: a matrix where rows = users, columns = movies, values = ratings.

      2.   .fillna(0): fills missing ratings with 0 (unrated).

      3.   TruncatedSVD: this is the matrix factorization step. It reduces the large matrix into lower-dimensional latent factors (like "user taste" and "movie genre affinity").

      4.   fit_transform: decomposes the matrix and returns the latent user features (one row per user, 20 columns here).
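
The two factor matrices can be inspected directly. The sketch below uses a hypothetical 4-user by 5-movie matrix (`R`, `svd_toy`, `user_factors`, and `item_factors` are illustrative names) and 2 components instead of 20, only to show the shapes involved.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical rating matrix: rows = users, columns = movies, 0 = not rated.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 0],
], dtype=float)

svd_toy = TruncatedSVD(n_components=2, random_state=42)

# fit_transform returns the user-side factors: one row per user, one column per latent factor.
user_factors = svd_toy.fit_transform(R)

# components_ holds the item-side factors: one row per latent factor, one column per movie.
item_factors = svd_toy.components_

print(user_factors.shape, item_factors.shape)  # (4, 2) (2, 5)

# Fraction of the matrix's variance captured by the kept components.
print(svd_toy.explained_variance_ratio_.sum())
```

The number of components is a tuning choice: fewer components compress more aggressively, more components follow the original matrix more closely.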

In [6]:
predicted_ratings = np.dot(matrix_svd, svd.components_)

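This line is where matrix factorization pays off: it rebuilds the user-item rating matrix from the compressed latent features.

You started with a user-item matrix where:

Rows = Users

Columns = Movies

Values = Ratings (with missing ones filled with 0)

matrix_svd holds one row of 20 latent features per user, and svd.components_ holds one column of 20 latent features per movie. Their dot product has the same shape as the original matrix, so every user-movie pair gets a predicted score, including the pairs that were unrated.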
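
A minimal sketch of the reconstruction on a hypothetical toy matrix (`R`, `svd_toy`, and `R_hat` are illustrative names, not from the notebook). It checks that the dot product has the original shape and matches `TruncatedSVD.inverse_transform`, which performs the same multiplication.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical rating matrix: rows = users, columns = movies, 0 = not rated.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 0],
], dtype=float)

svd_toy = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd_toy.fit_transform(R)

# Rebuild a full matrix of predicted scores from the two factor matrices.
R_hat = np.dot(user_factors, svd_toy.components_)

# Same shape as the input, so every user-movie pair now has a score.
print(R_hat.shape)

# inverse_transform is the library's equivalent of the manual dot product.
print(np.allclose(R_hat, svd_toy.inverse_transform(user_factors)))

# Cells that were 0 (unrated) in R now hold predicted values.
print(np.round(R_hat, 2))
```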

In [7]:
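# Evaluate only on cells that hold a real rating (zeros mark unrated cells)
rated_mask = user_item_matrix.values.nonzero()
true_values = user_item_matrix.values[rated_mask]
predicted_values = predicted_ratings[rated_mask]

# Note: this RMSE is measured on the same ratings the SVD was fit on
rmse = sqrt(mean_squared_error(true_values, predicted_values))
print("RMSE:", rmse)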

RMSE: 2.132852123917159
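
The masked evaluation can be wrapped in a small helper. This is a sketch, assuming (as in the notebook) that 0 marks an unrated cell; `masked_rmse` is a hypothetical name, and the toy arrays are made up for illustration.

```python
import numpy as np

def masked_rmse(actual, predicted):
    """RMSE over the cells where `actual` is nonzero (that is, where a rating exists)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rated = actual != 0
    errors = actual[rated] - predicted[rated]
    return float(np.sqrt(np.mean(errors ** 2)))

# Toy example: two rated cells with errors of 1 and -2, so MSE = (1 + 4) / 2 = 2.5.
toy_actual = np.array([[5, 0], [0, 3]])
toy_predicted = np.array([[4, 2], [1, 5]])
toy_rmse = masked_rmse(toy_actual, toy_predicted)
print(f"{toy_rmse:.4f}")  # 1.5811
```

Because the SVD is fit on the zero-filled matrix, the unrated cells count as ratings of 0 during fitting, which tends to pull predictions for rated cells downward and can inflate this error.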


In [9]:
import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Load the ratings data
ratings = pd.read_csv("u.data", sep="\t", names=["user_id", "item_id", "rating", "timestamp"])
ratings.drop("timestamp", axis=1, inplace=True)

# Load movie titles
movies = pd.read_csv("u.item", sep="|", encoding="latin-1", header=None, usecols=[0, 1], names=["item_id", "title"])

# Merge ratings with movie titles
data = pd.merge(ratings, movies, on="item_id")

# Create user-item matrix
user_movie_matrix = data.pivot_table(index="user_id", columns="title", values="rating")

# Fill NaNs with 0s for SVD
matrix_filled = user_movie_matrix.fillna(0)

# Apply SVD
svd = TruncatedSVD(n_components=20, random_state=42)
matrix_svd = svd.fit_transform(matrix_filled)

# Reconstruct the ratings
predicted_ratings = np.dot(matrix_svd, svd.components_)

# Map back to movie titles
predicted_df = pd.DataFrame(predicted_ratings, index=user_movie_matrix.index, columns=user_movie_matrix.columns)

# Function to recommend top N movies
def recommend_movies(user_id, original_df, predicted_df, num_recommendations=5):
    # Movies the user already rated
    user_rated = original_df.loc[user_id].dropna().index.tolist()

    # Predicted ratings for the user
    user_predictions = predicted_df.loc[user_id]

    # Remove already rated movies
    user_predictions = user_predictions.drop(user_rated)

    # Get top recommendations
    recommended_movies = user_predictions.sort_values(ascending=False).head(num_recommendations)

    print(f"Top {num_recommendations} recommendations for User {user_id}:\n")
    for i, (title, rating) in enumerate(recommended_movies.items(), start=1):
        print(f"{i}. {title} — Predicted Rating: {round(rating, 2)}")

# Recommend for User 10
recommend_movies(user_id=10, original_df=user_movie_matrix, predicted_df=predicted_df)


Top 5 recommendations for User 10:

1. Godfather: Part II, The (1974) — Predicted Rating: 4.12
2. Annie Hall (1977) — Predicted Rating: 4.11
3. To Kill a Mockingbird (1962) — Predicted Rating: 3.87
4. Schindler's List (1993) — Predicted Rating: 3.78
5. Babe (1995) — Predicted Rating: 3.47
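
If the recommendations need to be reused rather than printed, one option is a variant that returns them. The sketch below is an assumption-labeled alternative, not the notebook's function: `top_n_unrated` is a hypothetical name, and the toy frames stand in for `user_movie_matrix` and `predicted_df`.

```python
import numpy as np
import pandas as pd

def top_n_unrated(user_id, original_df, predicted_df, n=5):
    """Return the n highest predicted ratings among titles the user has not rated."""
    unrated = original_df.loc[user_id].isna()
    return predicted_df.loc[user_id][unrated].nlargest(n)

# Hypothetical data: NaN marks an unrated title, as in user_movie_matrix.
titles = ["A", "B", "C", "D"]
toy_original = pd.DataFrame(
    [[5, np.nan, np.nan, 2], [np.nan, 4, 1, np.nan]], index=[1, 2], columns=titles
)
toy_predicted = pd.DataFrame(
    [[4.8, 3.1, 3.9, 2.2], [2.0, 4.1, 1.3, 3.5]], index=[1, 2], columns=titles
)

top = top_n_unrated(1, toy_original, toy_predicted, n=2)
print(top)
```

Returning a Series keeps the titles as the index and the predicted ratings as values, so the result can be formatted, filtered, or merged with other movie metadata later.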
