# Notebook 5: **Collaborative Filtering-Based Recommender System Using NMF**

Welcome to the fifth notebook of our project for AlgorithmArcade Inc. In this notebook, we will develop a collaborative filtering-based recommender system using Non-Negative Matrix Factorization (*NMF*). NMF is a matrix factorization technique that can uncover latent features in the user-item interaction data, helping us to make personalized recommendations.

## **Table of Contents**

1. **Introduction**
2. **Import Libraries**
3. **Load Data**
4. **Preprocessing**
   * Prepare the Data for Surprise Library
5. **Build the NMF-Based Collaborative Filtering Model**
   * Train the Model
6. **Evaluate the Model**
   * Cross-Validation
   * Calculate RMSE and MAE
7. **Generate Recommendations**
   * Predict Ratings
   * Top-N Recommendations for Users
8. **Results and Analysis**
9. **Conclusion**
10. **Thanks and Contact Information**

## 1. **Introduction**

Non-Negative Matrix Factorization (*NMF*) is a collaborative filtering technique that factorizes the user-item interaction matrix into two lower-dimensional non-negative matrices representing user and item latent features. These latent features capture underlying patterns in the data, allowing us to predict user preferences and generate recommendations.
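
In symbols, the factorization described above can be sketched as follows. The notation ($R$, $P$, $Q$, $m$, $n$, $k$) is introduced here for illustration and does not appear in the notebook's code.

$$
R \approx P Q^{\top}, \qquad P \in \mathbb{R}_{\ge 0}^{m \times k}, \quad Q \in \mathbb{R}_{\ge 0}^{n \times k}, \qquad \hat{r}_{ui} = p_u^{\top} q_i = \sum_{f=1}^{k} p_{uf}\, q_{if}
$$

Here $R$ is the $m \times n$ user-course rating matrix, $p_u$ and $q_i$ are the rows of $P$ and $Q$ for user $u$ and course $i$, and $k$ is the number of latent factors (`n_factors=15` later in this notebook).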

In this notebook, we’ll:

   * Implement an NMF-based collaborative filtering recommender system using the Surprise library.
   * Evaluate the model’s performance.
   * Generate personalized recommendations for users.

## 2. **Import Libraries**

First, let’s import the necessary Python libraries.

In [1]:
from collections import defaultdict

# Data manipulation libraries
import pandas as pd
import numpy as np

# Surprise library for collaborative filtering
from surprise import Dataset, Reader
from surprise import NMF
from surprise.model_selection import cross_validate, train_test_split
from surprise import accuracy

# For displaying progress bars
from tqdm import tqdm

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# For displaying visuals in higher resolution
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('retina')

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Set consistent color palette
sns.set_palette('Blues_d')

## 3. **Load Data**

We will load the `user_rating_info.csv` dataset.

In [2]:
# Load user rating data
user_ratings = pd.read_csv('../Data/user_rating_info.csv')
user_ratings.head()

| Unnamed: 0.1 | Unnamed: 0 | user_id | course_id | rating |
|---|---|---|---|---|
| 0 | 0 | UID0001293 | CID0001 | 5 |
| 1 | 1 | UID0000806 | CID0001 | 3 |
| 2 | 2 | UID0000238 | CID0001 | 4 |
| 3 | 3 | UID0001129 | CID0001 | 5 |
| 4 | 4 | UID0001544 | CID0001 | 3 |


In [3]:
# Load course data
course_info = pd.read_csv('../Data/course_info.csv')
course_info.head()

| Unnamed: 0 | course_id | title | description | data_analysis | data_science | data_engineering | data_visualization | business_intelligence | artificial_intelligence | cloud_computing |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CID0001 | Autonomous Vehicles: AI for Self-Driving Cars | Delve into the AI technologies powering autono... | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | CID0002 | Recommendation Systems: Personalizing User Exp... | Build systems that predict user preferences. L... | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | CID0003 | Deep Learning with TensorFlow: Building Neural... | Gain hands-on experience with TensorFlow to bu... | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | CID0004 | Natural Language Processing (NLP): Teaching Ma... | Dive into NLP and learn how to enable machines... | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | CID0005 | SQL for Data Science: Managing and Querying Da... | Master SQL to manage and query relational data... | 1 | 1 | 1 | 1 | 1 | 1 | 1 |


## 4. **Preprocessing**

The Surprise library requires data in a specific format. We’ll prepare the data accordingly.

### 4.1 Prepare the Data for Surprise Library

Define a Reader object to specify the rating scale.

In [4]:
# Define the rating scale
reader = Reader(rating_scale=(1, 5))

Load the dataset into Surprise’s Dataset format.

In [5]:
# Load the data into Surprise dataset
data = Dataset.load_from_df(user_ratings[['user_id', 'course_id', 'rating']], reader)
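
Before handing a frame to `Dataset.load_from_df`, it can help to confirm that it holds exactly the user, item, and rating columns (in that order) and that every rating lies inside the scale given to `Reader`. The sketch below is one way to do that with pandas; the helper name `check_ratings_frame` and the toy IDs are hypothetical and not part of the project.

```python
import pandas as pd

def check_ratings_frame(df, scale=(1, 5)):
    """Return the (user_id, course_id, rating) columns, or raise if a rating is off-scale.

    Hypothetical helper for illustration; column names follow this notebook.
    """
    cols = ['user_id', 'course_id', 'rating']
    ratings = df[cols]  # drops leftover index columns such as 'Unnamed: 0'
    low, high = scale
    if not ratings['rating'].between(low, high).all():
        raise ValueError(f'ratings must lie in [{low}, {high}]')
    return ratings

# Toy data with made-up IDs
toy = pd.DataFrame({
    'Unnamed: 0': [0, 1, 2],
    'user_id': ['UID_A', 'UID_B', 'UID_A'],
    'course_id': ['CID_X', 'CID_X', 'CID_Y'],
    'rating': [5, 3, 4],
})
clean = check_ratings_frame(toy)
print(clean)
```

The returned frame could then be passed to `Dataset.load_from_df(clean, reader)` in the same way as above.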

## 5. **Build the NMF-Based Collaborative Filtering Model**

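We’ll use the NMF algorithm from the Surprise library to implement model-based collaborative filtering: rather than comparing users to one another directly, NMF learns latent factors for both users and courses.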

### 5.1 Train the Model

Split the data into training and test sets.

In [6]:
# Split the data into training and test sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

Initialize the NMF algorithm.

In [7]:
# Initialize the NMF algorithm
algo = NMF(n_factors=15, n_epochs=50, random_state=42)

Train the model on the training set.

In [8]:
# Train the model
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.NMF at 0x156cb7520>
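
To give a feel for what fitting involves, here is a minimal NumPy sketch of non-negative factorization on the observed entries of a small rating matrix. It uses a generic textbook-style multiplicative update with L2 regularization and is only an illustration: Surprise's own `NMF` differs in its details, and the function names, toy matrix, and parameter values below are assumptions made for this example.

```python
import numpy as np

def fit_masked_nmf(R, mask, n_factors=2, n_epochs=200, reg=0.06, seed=42):
    """Factorize the observed entries of R into non-negative P (users) and Q (items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.uniform(0.1, 1.0, size=(n_users, n_factors))
    Q = rng.uniform(0.1, 1.0, size=(n_items, n_factors))
    R_obs = R * mask   # unobserved ratings contribute nothing
    eps = 1e-9         # avoids division by zero
    for _ in range(n_epochs):
        # Each update multiplies by a ratio of non-negative terms,
        # so P and Q stay non-negative throughout training.
        P *= (R_obs @ Q) / (((P @ Q.T) * mask) @ Q + reg * P + eps)
        Q *= (R_obs.T @ P) / (((P @ Q.T) * mask).T @ P + reg * Q + eps)
    return P, Q

def observed_rmse(R, mask, P, Q):
    """RMSE computed over observed entries only."""
    err = (R - P @ Q.T) * mask
    return float(np.sqrt((err ** 2).sum() / mask.sum()))

# Toy 4x4 matrix; 0 marks a missing rating
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = (R > 0).astype(float)

P, Q = fit_masked_nmf(R, mask)
print('RMSE on observed entries:', observed_rmse(R, mask, P, Q))
print('Predicted rating for user 0, item 2:', float(P[0] @ Q[2]))
```

Only the observed ratings drive the updates; the missing entries are then filled in by the product of the learned factors.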

## 6. **Evaluate the Model**

We will evaluate the model’s performance using cross-validation and by calculating the **RMSE** and **MAE** on the test set.

### 6.1 Cross-Validation

In [9]:
# Perform cross-validation
cv_results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3241  1.3076  1.3304  1.3377  1.3696  1.3339  0.0204  
MAE (testset)     1.0762  1.0642  1.0777  1.0871  1.1100  1.0831  0.0153  
Fit time          0.09    0.09    0.09    0.09    0.09    0.09    0.00    
Test time         0.00    0.00    0.00    0.00    0.00    0.00    0.00    


**Calculate Average RMSE and MAE**:

In [10]:
# Calculate average RMSE and MAE
mean_rmse = np.mean(cv_results['test_rmse'])
mean_mae = np.mean(cv_results['test_mae'])
print(f'Average RMSE from cross-validation: {mean_rmse:.4f}')
print(f'Average MAE from cross-validation: {mean_mae:.4f}')

Average RMSE from cross-validation: 1.3339
Average MAE from cross-validation: 1.0831
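
The `Mean` and `Std` columns in the cross-validation table are simply the mean and (population) standard deviation of the per-fold scores. The sketch below recomputes them from the rounded fold values printed above, so the last digit may differ slightly from the table; the variable names are illustrative.

```python
import numpy as np

# Per-fold scores copied from the cross-validation output above (rounded to 4 decimals)
fold_rmse = np.array([1.3241, 1.3076, 1.3304, 1.3377, 1.3696])
fold_mae = np.array([1.0762, 1.0642, 1.0777, 1.0871, 1.1100])

# np.std defaults to the population standard deviation (ddof=0)
rmse_mean, rmse_std = fold_rmse.mean(), fold_rmse.std()
mae_mean, mae_std = fold_mae.mean(), fold_mae.std()

print(f'RMSE: {rmse_mean:.4f} +/- {rmse_std:.4f}')
print(f'MAE:  {mae_mean:.4f} +/- {mae_std:.4f}')
```

A small standard deviation relative to the mean, as here, suggests the score does not depend much on which fold is held out.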


### 6.2 Calculate RMSE and MAE

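Use the trained model to make predictions on the test set. Note that `cross_validate` in Section 6.1 refits `algo` on each fold, so the model used here is likely the one left over from the last fold rather than the one fitted on `trainset` in Section 5. Because that fold’s training data overlaps `testset`, the scores below are probably optimistic; calling `algo.fit(trainset)` again before this step would give a cleaner hold-out estimate.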

In [11]:
# Make predictions on the test set
predictions = algo.test(testset)

Calculate the **RMSE** and **MAE** on the test set.

In [12]:
# Calculate RMSE and MAE on the test set (both calls print the score)
test_rmse = accuracy.rmse(predictions, verbose=True)
test_mae = accuracy.mae(predictions)

RMSE: 0.6406
MAE:  0.3544
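
For reference, both metrics are simple functions of the prediction errors. The following is a minimal NumPy sketch of the standard definitions, using made-up ratings; the function names `rmse` and `mae` are illustrative and separate from `surprise.accuracy`.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the error in rating points."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Made-up example: two of three predictions are off by one rating point
true_ratings = [4, 3, 5]
predicted = [3, 3, 4]
print(f'RMSE: {rmse(true_ratings, predicted):.4f}')
print(f'MAE:  {mae(true_ratings, predicted):.4f}')
```

RMSE is never smaller than MAE on the same predictions, which is consistent with the scores reported in this notebook.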


## 7. **Generate Recommendations**

We will generate top-N course recommendations for each user based on the predicted ratings.

### 7.1 Predict Ratings

Get a list of all user IDs and course IDs.

In [13]:
# Get all unique user IDs and course IDs
user_ids = user_ratings['user_id'].unique()
course_ids = user_ratings['course_id'].unique()

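Build a set of courses each user has already rated in the training set. Courses a user rated only in the held-out test split are not included, so they can still appear among that user’s recommendations.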

In [14]:
# Create a dictionary to store the items already rated by each user
rated_items = defaultdict(set)
for uid, iid, _ in trainset.all_ratings():
    user = trainset.to_raw_uid(uid)
    item = trainset.to_raw_iid(iid)
    rated_items[user].add(item)

### 7.2 Top-N Recommendations for Users

In [15]:
def get_top_n_recommendations(algo, user_id, n=10):
    # Get the list of items the user hasn't rated yet
    items_to_predict = [iid for iid in course_ids if iid not in rated_items.get(user_id, set())]
    
    # Predict ratings for all these items
    predictions = [algo.predict(user_id, iid) for iid in items_to_predict]
    
    # Sort the predictions by estimated rating
    predictions.sort(key=lambda x: x.est, reverse=True)
    
    # Get the top-N item IDs
    top_n_items = [pred.iid for pred in predictions[:n]]
    
    return top_n_items
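
The filter-then-rank logic of the function above can be seen in isolation with a small stand-in for the model. In this sketch a plain dictionary of predicted ratings replaces `algo.predict`; the function name `top_n_unseen`, the course IDs, and the scores are all hypothetical.

```python
def top_n_unseen(scores, already_rated, n=3):
    """Return the n unseen item IDs with the highest predicted rating, best first.

    scores: dict mapping item ID -> predicted rating (stand-in for model output).
    already_rated: set of item IDs to exclude.
    """
    candidates = [(iid, est) for iid, est in scores.items() if iid not in already_rated]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [iid for iid, _ in candidates[:n]]

# Made-up predicted ratings for one user
scores = {'CID_A': 4.2, 'CID_B': 3.1, 'CID_C': 4.8, 'CID_D': 2.5}
already_rated = {'CID_C'}  # highest score, but the user has already rated it

print(top_n_unseen(scores, already_rated, n=2))  # ['CID_A', 'CID_B']
```

The returned list is ordered by predicted rating, so its order carries the ranking.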

**Example Recommendations for a User:**

In [16]:
# Example user
user_id_example = user_ids[0]  # You can replace this with any user ID from user_ids

# Get top-N recommendations
top_n_courses = get_top_n_recommendations(algo, user_id_example, n=10)

# Get course titles (isin returns rows in course_info order, not in ranked order)
recommended_titles = course_info[course_info['course_id'].isin(top_n_courses)]['title']
print(f"Top recommendations for {user_id_example}:\n")
for title in recommended_titles.tolist():
    print(title)

Top recommendations for UID0001293:

Generative Adversarial Networks (GANs): Creating with AI
Applied Machine Learning: Real-World Projects
KPI Dashboards: Measuring Business Performance
Predictive Modeling: Forecasting with Data
Descriptive Statistics: Summarizing Data for Insights
Storytelling with Data in BI
Cloud Resource Provisioning: Automating Cloud Infrastructure
Interactive Data Visualizations with Plotly
Advanced Visualization Techniques: Beyond Basic Charts
Applied Econometrics: Data Analysis in Economics
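
Filtering with `isin` returns titles in the order they appear in `course_info`, not in order of predicted rating. If the ranking should be preserved in the printed list, one option is to index the course table by `course_id` and select with the ranked list. The sketch below shows the difference on toy data; the course IDs and titles are made up.

```python
import pandas as pd

# Toy course table with hypothetical IDs and titles
courses = pd.DataFrame({
    'course_id': ['CID_A', 'CID_B', 'CID_C'],
    'title': ['Alpha', 'Beta', 'Gamma'],
})
ranked_ids = ['CID_C', 'CID_A']  # best recommendation first

# isin keeps the table's row order, so the ranking is lost
table_order = courses[courses['course_id'].isin(ranked_ids)]['title'].tolist()

# .loc with a list of labels returns rows in the order of that list
ranked_order = courses.set_index('course_id').loc[ranked_ids, 'title'].tolist()

print(table_order)   # ['Alpha', 'Gamma']
print(ranked_order)  # ['Gamma', 'Alpha']
```

Note that `.loc` raises a `KeyError` if any ID is missing from the table, so this assumes every recommended course appears in the course data.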

## 8. **Results and Analysis**

#### **Model Performance**:

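   * Five-fold cross-validation gives an average RMSE of **1.3339** and an average MAE of **1.0831**, meaning predictions are off by roughly one rating point on the 1 to 5 scale.
   * On the held-out test set, the model scores an RMSE of **0.6406** and an MAE of **0.3544**.
   * The test-set scores are much lower than the cross-validation scores. Because `cross_validate` refits `algo` after the initial training and before `algo.test(testset)` is called, the test set probably overlaps the data the final model was trained on, so the cross-validation figures are the safer estimate of performance on unseen data.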

#### **Recommendations**:

   * The top-N recommendations for each user are based on the predicted ratings derived from latent factors.
   * NMF captures latent features that may not be directly observable, allowing for more nuanced recommendations.

#### **Advantages**:

   * NMF works well with sparse datasets common in recommender systems.
   * Captures complex patterns in user-item interactions.

#### **Limitations**:

   * The latent factors are not easily interpretable.
   * Cold Start Problem: New users or items with no interactions are difficult to handle.
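
The cold-start limitation follows directly from how predictions are formed: a user or course with no ratings has no learned factor vector, so there is nothing to multiply. Libraries such as Surprise typically fall back to a default estimate, such as the global mean rating, in that case. The plain-Python sketch below mimics that behavior; the function name, factor values, and IDs are hypothetical, and this is not the library's code.

```python
def predict_rating(user_factors, item_factors, global_mean, user_id, item_id):
    """Return (estimate, was_impossible) for a user-item pair.

    Falls back to the global mean when either side has no learned factors.
    """
    if user_id not in user_factors or item_id not in item_factors:
        return global_mean, True
    p_u, q_i = user_factors[user_id], item_factors[item_id]
    return sum(p * q for p, q in zip(p_u, q_i)), False

# Made-up factors learned for one known user and one known course
user_factors = {'UID_known': [1.0, 2.0]}
item_factors = {'CID_A': [1.5, 0.5]}
global_mean = 3.4

print(predict_rating(user_factors, item_factors, global_mean, 'UID_known', 'CID_A'))  # (2.5, False)
print(predict_rating(user_factors, item_factors, global_mean, 'UID_new', 'CID_A'))    # (3.4, True)
```

Every new user receives the same fallback estimate for every course, which is why a pure collaborative filtering model cannot personalize recommendations for them.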

## 9. **Conclusion**

In this notebook, we developed a collaborative filtering-based recommender system using Non-Negative Matrix Factorization (*NMF*) with the Surprise library. We trained the model, evaluated its performance, and generated personalized recommendations for users.

## 10. **Thanks and Contact Information**

Thank you for reviewing this project notebook. For any further questions, suggestions, or collaborations, please feel free to reach out:

   * [**Email**](mailto:leejoabraham01@gmail.com)
   * [**LinkedIn**](https://www.linkedin.com/in/leejoabraham01)
   * [**GitHub**](https://github.com/LeejoAbraham01)