# Notebook 4: **Collaborative Filtering-Based Recommender System Using KNN**

Welcome to the fourth notebook of our project for AlgorithmArcade Inc. In this notebook, we will develop a collaborative filtering-based recommender system using the K-Nearest Neighbors (*KNN*) algorithm. This approach leverages user-item interactions to predict user preferences and generate personalized recommendations.

## **Table of Contents**

1. **Introduction**
2. **Import Libraries**
3. **Load Data**
4. **Preprocessing**
   * Prepare the Data for Surprise Library
5. **Build the KNN-Based Collaborative Filtering Model**
   * Choose a Similarity Metric
   * Train the Model
6. **Evaluate the Model**
   * Cross-Validation
   * Calculate RMSE and MAE
7. **Generate Recommendations**
   * Predict Ratings
   * Top-N Recommendations for Users
8. **Results and Analysis**
9. **Conclusion**
10. **Thanks and Contact Information**

## 1. **Introduction**

Collaborative filtering (*CF*) is a popular recommendation technique that makes automatic predictions about a user’s interests by collecting preferences from many users. The assumption is that if *user A* has the same opinion as *user B* on one item, *A* is more likely to share *B’s* opinion on a different item than that of a randomly chosen user.

In this notebook, we’ll:

   * Implement a KNN-based collaborative filtering recommender system using the Surprise library.
   * Evaluate the model’s performance.
   * Generate personalized recommendations for users.

## 2. **Import Libraries**

First, let’s import the necessary Python libraries.

```python
# Data manipulation libraries
import pandas as pd
import numpy as np

# Surprise library for collaborative filtering
from surprise import Dataset, Reader
from surprise import KNNBasic
from surprise.model_selection import cross_validate, train_test_split
from surprise import accuracy

# For displaying progress bars
from tqdm import tqdm

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# For displaying visuals in higher resolution
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('retina')

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Set consistent color palette
sns.set_palette('Blues_d')
```

## 3. **Load Data**

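We load two datasets: the user-course ratings and the course catalogue. Both CSV files carry leftover index columns (`Unnamed: 0`, `Unnamed: 0.1`) from earlier notebooks; only `user_id`, `course_id`, and `rating` are used to train the model.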

```python
# Load user rating data
user_ratings = pd.read_csv('../Data/user_rating_info.csv')
user_ratings.head()
```

| Unnamed: 0.1 | Unnamed: 0 | user_id | course_id | rating |
|---|---|---|---|---|
| 0 | 0 | UID0001293 | CID0001 | 5 |
| 1 | 1 | UID0000806 | CID0001 | 3 |
| 2 | 2 | UID0000238 | CID0001 | 4 |
| 3 | 3 | UID0001129 | CID0001 | 5 |
| 4 | 4 | UID0001544 | CID0001 | 3 |


```python
# Load course data
course_info = pd.read_csv('../Data/course_info.csv')
course_info.head()
```

| Unnamed: 0 | course_id | title | description | data_analysis | data_science | data_engineering | data_visualization | business_intelligence | artificial_intelligence | cloud_computing |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CID0001 | Autonomous Vehicles: AI for Self-Driving Cars | Delve into the AI technologies powering autono... | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | CID0002 | Recommendation Systems: Personalizing User Exp... | Build systems that predict user preferences. L... | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | CID0003 | Deep Learning with TensorFlow: Building Neural... | Gain hands-on experience with TensorFlow to bu... | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | CID0004 | Natural Language Processing (NLP): Teaching Ma... | Dive into NLP and learn how to enable machines... | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | CID0005 | SQL for Data Science: Managing and Querying Da... | Master SQL to manage and query relational data... | 1 | 1 | 1 | 1 | 1 | 1 | 1 |


## 4. **Preprocessing**

Surprise trains on its own `Dataset` objects, which are built from three columns (user, item, rating) plus a `Reader` that declares the rating scale. We'll prepare the data accordingly.

### 4.1 Prepare the Data for Surprise Library

Define a Reader object to specify the rating scale.

```python
# Define the rating scale
reader = Reader(rating_scale=(1, 5))
```

Load the dataset into Surprise’s Dataset format.

```python
# Load the data into a Surprise dataset (columns must be in user, item, rating order)
data = Dataset.load_from_df(user_ratings[['user_id', 'course_id', 'rating']], reader)
```
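Before handing the ratings to Surprise, it can help to check for missing values, ratings outside the declared 1 to 5 scale, and repeated (user, course) pairs. As far as we know, Surprise does not merge duplicate pairs, so each copy would be treated as a separate rating. The helper `check_ratings` and the `toy` table below are hypothetical, a minimal pandas sketch rather than part of the project code:

```python
import pandas as pd

def check_ratings(df: pd.DataFrame, low: int = 1, high: int = 5) -> dict:
    """Hypothetical helper: count common data problems in a
    (user_id, course_id, rating) table before loading it into Surprise."""
    # Rows with a missing user, course, or rating
    missing = int(df[['user_id', 'course_id', 'rating']].isna().any(axis=1).sum())
    # Ratings outside the Reader's scale (NaN ratings are counted here too)
    out_of_range = int((~df['rating'].between(low, high)).sum())
    # Repeated (user, course) pairs, which would load as separate ratings
    duplicates = int(df.duplicated(subset=['user_id', 'course_id']).sum())
    return {'missing': missing, 'out_of_range': out_of_range, 'duplicates': duplicates}

# Hypothetical toy data, not taken from the project files
toy = pd.DataFrame({
    'user_id': ['U1', 'U1', 'U2', 'U2'],
    'course_id': ['C1', 'C1', 'C1', 'C2'],
    'rating': [5, 4, 3, 6],
})
print(check_ratings(toy))  # {'missing': 0, 'out_of_range': 1, 'duplicates': 1}
```

On the notebook's data this would be called as `check_ratings(user_ratings)`.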

## 5. **Build the KNN-Based Collaborative Filtering Model**

We’ll use the KNNBasic algorithm from the Surprise library to implement user-based collaborative filtering.
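For user-based filtering, Surprise documents the KNNBasic estimate for user u and course i as a similarity-weighted average of the ratings r_vi given to i by the (up to) k users v most similar to u who rated i. The sketch below reproduces that rule with NumPy; it assumes a dense user-by-course matrix with `np.nan` for missing ratings and a precomputed similarity matrix (the function name and inputs are hypothetical). It is an illustration of the formula, not Surprise's implementation:

```python
import numpy as np

def knn_basic_predict(ratings, sim, u, i, k=40):
    """Hypothetical sketch of a user-based KNNBasic-style estimate.

    ratings: users x items array, np.nan where a rating is missing.
    sim:     users x users similarity matrix.
    Returns None when no neighbor with positive similarity rated item i.
    """
    # Candidate neighbors: other users who rated item i
    raters = [v for v in range(ratings.shape[0])
              if v != u and not np.isnan(ratings[v, i])]
    # Keep the k most similar candidates, then drop non-positive similarities
    raters.sort(key=lambda v: sim[u, v], reverse=True)
    neighbors = [v for v in raters[:k] if sim[u, v] > 0]
    if not neighbors:
        return None  # Surprise would fall back to the global mean rating here
    weights = np.array([sim[u, v] for v in neighbors])
    values = np.array([ratings[v, i] for v in neighbors])
    # Similarity-weighted average of the neighbors' ratings
    return float(weights @ values / weights.sum())

R = np.array([
    [5.0, np.nan, 4.0],
    [4.0, 2.0, 5.0],
    [1.0, 5.0, np.nan],
])
S = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
print(knn_basic_predict(R, S, u=0, i=1))       # roughly 2.3
print(knn_basic_predict(R, S, u=0, i=1, k=1))  # 2.0, only the closest neighbor counts
```

The neighborhood size k is the main knob: a small k follows the closest users closely, while a large k averages over more users and moves estimates toward the consensus.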

### 5.1 Choose a Similarity Metric

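We'll use **cosine similarity** as the similarity metric, with `user_based` set to `True` so that similarities are computed between users rather than between courses.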

```python
# Define similarity options
sim_options = {
    'name': 'cosine',    # cosine similarity between users
    'user_based': True   # True for user-based collaborative filtering
}
```
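Surprise's documentation defines cosine similarity between two users over the items both have rated: the dot product of their ratings on those common items, divided by the product of the two rating norms taken over the same items. A minimal pure-Python sketch of that definition follows; the function name and the rating dictionaries are hypothetical:

```python
import math

def cosine_common(ru: dict, rv: dict) -> float:
    """Hypothetical sketch: cosine similarity between two users, computed only
    over the items both have rated (ru, rv map item_id -> rating).
    Returns 0.0 when the users share no items."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    norm_u = math.sqrt(sum(ru[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

# Hypothetical users
alice = {'CID0001': 5, 'CID0002': 3, 'CID0003': 4}
bob = {'CID0001': 4, 'CID0002': 3}
carol = {'CID0003': 1}

print(cosine_common(alice, bob))    # close to 1: similar ratings on two shared courses
print(cosine_common(alice, carol))  # 1.0: a single shared course always gives 1.0
print(cosine_common(bob, carol))    # 0.0: no shared courses
```

The last two lines show a property worth keeping in mind: because ratings are all positive, cosine similarity over co-rated items tends to be high, and two users with exactly one course in common score 1.0 no matter how differently they rated it.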

### 5.2 Train the Model

Hold out 20% of the ratings as a test set, fixing the random seed so the split is reproducible.

```python
# Split the data into training (80%) and test (20%) sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)
```
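`train_test_split` holds out individual ratings, not whole users, so a user (or course) whose ratings all land in the test split is unknown to `trainset`. KNNBasic cannot form a neighborhood estimate for such pairs, and Surprise falls back to the training set's global mean. The pure-Python sketch below illustrates a seeded per-rating split and counts test pairs with an unseen user or course; both helpers are hypothetical and are not Surprise's own code:

```python
import random

def split_ratings(triples, test_size=0.2, seed=42):
    """Hypothetical sketch: shuffle (user, item, rating) triples with a fixed
    seed and hold out test_size of them as the test set."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

def count_unknown_pairs(train, test):
    """Count test ratings whose user or item never appears in train."""
    train_users = {u for u, _, _ in train}
    train_items = {i for _, i, _ in train}
    return sum(1 for u, i, _ in test if u not in train_users or i not in train_items)

# Hypothetical ratings
ratings = [('U1', 'C1', 5), ('U1', 'C2', 3), ('U2', 'C1', 4),
           ('U2', 'C3', 2), ('U3', 'C2', 4)]
train, test = split_ratings(ratings, test_size=0.4)
print(len(train), len(test), count_unknown_pairs(train, test))
```

Surprise's `testset` is already a list of raw `(user, item, rating)` triples, so the same counter could be applied to it together with `[(trainset.to_raw_uid(u), trainset.to_raw_iid(i), r) for u, i, r in trainset.all_ratings()]`.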

Initialize the KNNBasic algorithm with the similarity options above, keeping its default neighborhood size (`k=40`) and minimum number of neighbors (`min_k=1`).

```python
# Initialize the KNNBasic algorithm
algo = KNNBasic(sim_options=sim_options)
```

Train the model on the training set.

```python
# Train the model
algo.fit(trainset)
```

```text
Computing the cosine similarity matrix...
Done computing similarity matrix.

<surprise.prediction_algorithms.knns.KNNBasic at 0x157ad3fd0>
```

## 6. **Evaluate the Model**

We will evaluate the model’s performance using cross-validation and by calculating the **RMSE** and **MAE** on the test set.
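Both metrics compare each true rating r with its estimate r_hat over a set of predictions: RMSE is the square root of the mean squared error, and MAE is the mean absolute error. Because RMSE squares the errors, it penalizes large misses more heavily and is always at least as large as MAE. A minimal sketch of the two formulas (the function names and example values are hypothetical; Surprise's `accuracy.rmse` and `accuracy.mae` compute the same quantities from a list of predictions):

```python
import math

def rmse(true_ratings, estimates):
    """Root mean squared error: sqrt(mean((r - r_hat)^2))."""
    if len(true_ratings) != len(estimates) or not true_ratings:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((t - e) ** 2 for t, e in zip(true_ratings, estimates))
    return math.sqrt(squared / len(true_ratings))

def mae(true_ratings, estimates):
    """Mean absolute error: mean(|r - r_hat|)."""
    if len(true_ratings) != len(estimates) or not true_ratings:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - e) for t, e in zip(true_ratings, estimates)) / len(true_ratings)

true_r = [4, 3, 5]
est_r = [3.5, 3.0, 4.0]
print(mae(true_r, est_r))             # 0.5
print(round(rmse(true_r, est_r), 4))  # 0.6455
```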

### 6.1 Cross-Validation

```python
# Perform 5-fold cross-validation (cross_validate calls algo.fit on every fold)
cv_results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```

```text
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.0035  1.0114  0.9968  1.0165  1.0389  1.0134  0.0144  
MAE (testset)     0.7858  0.7914  0.7789  0.7848  0.8053  0.7892  0.0089  
Fit time          0.08    0.07    0.07    0.06    0.09    0.07    0.01    
Test time         0.22    0.21    0.20    0.20    0.22    0.21    0.01    
```
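With `cv=5`, `cross_validate` partitions the ratings into five folds; each fold serves once as the test set while the other four train the model, and the Mean and Std columns summarize the five test scores. The sketch below shows the partitioning idea on plain indices (the helper is hypothetical, not Surprise's `KFold` code). As far as we can tell, the call above passes no random seed, so fold assignments, and therefore the per-fold numbers, can change between runs; passing a seeded `surprise.model_selection.KFold` as `cv` is one way to pin them down.

```python
import random

def k_fold_indices(n, k=5, seed=None):
    """Hypothetical sketch: yield (train_idx, test_idx) pairs for k folds over
    n ratings. Each index appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    start = 0
    for fold in range(k):
        # Spread the remainder over the first n % k folds
        size = n // k + (1 if fold < n % k else 0)
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(12, k=5, seed=0))
print([len(test) for _, test in folds])  # [3, 3, 2, 2, 2]
```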


### 6.2 Calculate RMSE and MAE

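Use the trained model to make predictions on the test set. One caveat: `cross_validate` calls `fit` on the `algo` object it receives for every fold, so at this point `algo` holds the model from the last cross-validation fold rather than the one trained on `trainset` in Section 5.2. That fold's training data can include ratings from `testset`, which may make the test-set scores below optimistic (and the same refitted model is used for the recommendations in Section 7). Calling `algo.fit(trainset)` again before `algo.test(testset)`, or passing a fresh `KNNBasic` instance to `cross_validate`, avoids this.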

```python
# Make predictions on the test set
predictions = algo.test(testset)
```

Calculate the RMSE and MAE on the test set.

```python
# Calculate RMSE and MAE on the test set
test_rmse = accuracy.rmse(predictions, verbose=True)
test_mae = accuracy.mae(predictions, verbose=True)
```

```text
RMSE: 0.8906
MAE:  0.6886
```
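Each Surprise `Prediction` carries a `details` dictionary. When KNNBasic cannot form an estimate (unknown user or course, or no neighbor with positive similarity), `details['was_impossible']` is `True` and the estimate falls back to the training set's global mean. Counting these cases shows how much of the error comes from fallbacks rather than from neighborhood averages. The helper below is a hypothetical sketch, tested on a stand-in namedtuple with the same fields as Surprise's `Prediction`:

```python
from collections import namedtuple

def fallback_share(predictions):
    """Hypothetical helper: fraction of predictions flagged was_impossible,
    i.e. where the estimate is a default value instead of a KNN average."""
    predictions = list(predictions)
    if not predictions:
        return 0.0
    flagged = sum(1 for p in predictions if p.details.get('was_impossible', False))
    return flagged / len(predictions)

# Stand-in with the same fields as Surprise's Prediction namedtuple
Pred = namedtuple('Pred', ['uid', 'iid', 'r_ui', 'est', 'details'])
demo = [
    Pred('U1', 'C1', 4.0, 3.8, {'actual_k': 5, 'was_impossible': False}),
    Pred('U2', 'C9', 3.0, 3.5, {'was_impossible': True,
                                'reason': 'User and/or item is unknown.'}),
]
print(fallback_share(demo))  # 0.5
```

On the notebook's objects this would be `fallback_share(predictions)`.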


## 7. **Generate Recommendations**

We will generate top-N course recommendations for each user based on the predicted ratings.
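Section 7.2 builds recommendations one user at a time. To cover every user in one pass, a common pattern is to score all (user, course) pairs missing from the training set, for example with `algo.test(trainset.build_anti_testset())`, and then keep the n best estimates per user. The grouping step is sketched below on plain `(user, course, estimate)` tuples; the function name and data are hypothetical, and with Surprise predictions the input would be `(p.uid, p.iid, p.est)` for each prediction `p`:

```python
import heapq
from collections import defaultdict

def top_n_per_user(scored, n=10):
    """Hypothetical sketch: scored is an iterable of (user_id, item_id, estimate).
    Returns {user_id: [item_id, ...]} with the n highest estimates per user."""
    by_user = defaultdict(list)
    for uid, iid, est in scored:
        by_user[uid].append((est, iid))
    # heapq.nlargest avoids fully sorting each user's candidate list
    return {uid: [iid for _, iid in heapq.nlargest(n, items)]
            for uid, items in by_user.items()}

scored = [('U1', 'C1', 4.2), ('U1', 'C2', 3.1), ('U1', 'C3', 4.8),
          ('U2', 'C1', 2.0), ('U2', 'C2', 3.5)]
print(top_n_per_user(scored, n=2))  # {'U1': ['C3', 'C1'], 'U2': ['C2', 'C1']}
```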

### 7.1 Predict Ratings

Get a list of all user IDs and course IDs.

```python
# Get all unique user IDs and course IDs
user_ids = user_ratings['user_id'].unique()
course_ids = user_ratings['course_id'].unique()
```

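Record which courses each user has already rated in the training set, so that those courses can be excluded from that user's recommendations.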

```python
# Raw IDs of the users and courses that appear in the training set
trainset_users = {trainset.to_raw_uid(u) for u in trainset.all_users()}
trainset_items = {trainset.to_raw_iid(i) for i in trainset.all_items()}

# Map each user to the set of courses they rated in the training set
rated_items = {}
for uid, iid, _ in trainset.all_ratings():
    user = trainset.to_raw_uid(uid)
    item = trainset.to_raw_iid(iid)
    rated_items.setdefault(user, set()).add(item)
```
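The `rated_items` dictionary covers only the training split, so a course a user rated in the 20% test split can still be recommended back to them. If the goal is recommendations for real use rather than evaluation, one option is to build the exclusion sets from the full `user_ratings` table instead. A minimal pandas sketch (the helper name and toy data are hypothetical):

```python
import pandas as pd

def rated_courses_by_user(ratings: pd.DataFrame) -> dict:
    """Hypothetical helper: map each user_id to the set of course_ids
    they rated anywhere in `ratings`."""
    return {uid: set(courses)
            for uid, courses in ratings.groupby('user_id')['course_id']}

# Hypothetical toy ratings
toy = pd.DataFrame({
    'user_id': ['U1', 'U1', 'U2'],
    'course_id': ['C1', 'C2', 'C1'],
    'rating': [5, 3, 4],
})
print(rated_courses_by_user(toy))
```

In the notebook this would be `rated_items = rated_courses_by_user(user_ratings)`.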

### 7.2 Top-N Recommendations for Users

```python
def get_top_n_recommendations(algo, user_id, n=10):
    """Return the IDs of the n courses with the highest predicted rating
    among the courses user_id has not rated in the training set."""
    # Courses the user has not rated yet (in the training split)
    already_rated = rated_items.get(user_id, set())
    items_to_predict = [iid for iid in course_ids if iid not in already_rated]

    # Predict a rating for each candidate course
    user_predictions = [algo.predict(user_id, iid) for iid in items_to_predict]

    # Sort by estimated rating, highest first
    user_predictions.sort(key=lambda x: x.est, reverse=True)

    # Keep the top-N course IDs
    return [pred.iid for pred in user_predictions[:n]]
```

**Example Recommendations for a User:**

```python
# Example user (replace with any ID from user_ids)
user_id_example = user_ids[0]

# Get top-N recommendations
top_n_courses = get_top_n_recommendations(algo, user_id_example, n=10)

# Look up the course titles
recommended_titles = course_info[course_info['course_id'].isin(top_n_courses)]['title']
print(f"Top recommendations for {user_id_example}:\n")
for title in recommended_titles:
    print(title)
```

```text
Top recommendations for UID0001293:

Serverless Architecture Patterns: Building Modern Applications
Computer Vision Techniques: Enabling Machines to See
Self-Service BI Platforms: Empowering Business Users
Data Governance Implementation: Policies and Practices
Survey Data Analysis: Extracting Insights from Questionnaires
Advanced Hypothesis Testing: Beyond the Basics
Data Stream Processing in AI: Handling Continuous Data
Data Warehouse Design: Architecting for Scalability
Data Governance Framework: Establishing BI Standards
Self-Service BI with Qlik Sense
```
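Note that `isin` only filters `course_info`: it keeps the rows in `course_info`'s own order, so the titles above are the top-10 set but are not necessarily listed from highest to lowest predicted rating. An order-preserving lookup is sketched below on a small hypothetical catalogue (the helper name is also hypothetical):

```python
import pandas as pd

def titles_in_rank_order(catalog: pd.DataFrame, ranked_ids: list) -> list:
    """Hypothetical helper: return titles in the same order as ranked_ids,
    skipping IDs that are missing from the catalogue."""
    lookup = catalog.set_index('course_id')['title']
    return [lookup[cid] for cid in ranked_ids if cid in lookup.index]

# Hypothetical catalogue
catalog = pd.DataFrame({
    'course_id': ['C1', 'C2', 'C3'],
    'title': ['Alpha', 'Beta', 'Gamma'],
})
print(titles_in_rank_order(catalog, ['C3', 'C1']))  # ['Gamma', 'Alpha']
```

In the notebook this would be `titles_in_rank_order(course_info, top_n_courses)`.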

## 8. **Results and Analysis**

#### **Model Performance**:

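   * On the held-out 20% test split, the model reaches an RMSE of **0.8906** and an MAE of **0.6886**.
   * 5-fold cross-validation gives a higher mean RMSE of **1.0134** (std 0.0144) and a mean MAE of **0.7892** (std 0.0089). Since every fold scored above 0.99 RMSE, the single-split result lies well outside the fold-to-fold spread, and the cross-validated figures are the more reliable estimate of performance on unseen data.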

#### **Recommendations**:

   * The top-N recommendations for each user are based on the predicted ratings from similar users.
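   * Because the model is user-based, each predicted rating is a similarity-weighted average of the ratings that the target user's most similar users (up to `k=40` by default) gave to that course, so the recommendations reflect what like-minded learners rated highly.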

#### **Advantages**:

   * The model considers the preferences of similar users, which can help in discovering new courses.
   * Recommendations are tailored to each user based on their similarity to others.

#### **Limitations**:

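   * Users with few ratings share few courses with anyone else, so their neighborhoods are small or unreliable. When no similar user has rated a course, Surprise falls back to the global mean rating (the cold-start problem).
   * KNNBasic stores a full user-by-user similarity matrix, so its memory use grows quadratically with the number of users, and computing it becomes expensive for large user bases.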

## 9. **Conclusion**

In this notebook, we developed a KNN-based collaborative filtering recommender system using the Surprise library. We trained the model, evaluated its performance, and generated personalized recommendations for users.

## 10. **Thanks and Contact Information**

Thank you for reviewing this project notebook. For any further questions, suggestions, or collaborations, please feel free to reach out:

   * [**Email**](mailto:leejoabraham01@gmail.com)
   * [**LinkedIn**](https://www.linkedin.com/in/leejoabraham01)
   * [**GitHub**](https://github.com/LeejoAbraham01)