# Notebook 2: **Content-Based Recommender Systems**

Welcome to the second notebook of our project for AlgorithmArcade Inc. In this notebook, we will develop two content-based recommender systems:

   * **Part A**: Content-Based Recommender System Based on User-Course Interactions.
   * **Part B**: Content-Based Recommender System Based on Course Similarity.

These systems aim to provide personalized course recommendations by leveraging course content and user interactions.

## **Table of Contents**

1. **Introduction**
2. **Import Libraries**
3. **Load Data**
4. **Preprocessing**
   * Merge Data
   * Encode Course Topics
5. **Part A: User-Course Interaction-Based Recommender**
   * Build User Profiles
   * Calculate Similarity Scores
   * Generate Recommendations
6. **Part B: Course Similarity-Based Recommender**
   * Calculate Course Similarity Matrix
   * Generate Recommendations
7. **Results and Analysis**
8. **Conclusion**
9. **Thanks and Contact Information**

## 1. **Introduction**

Content-based recommender systems suggest items similar to those a user liked in the past or match the user’s explicit preferences. They rely on item features and user profiles to generate recommendations.

In this notebook, we’ll:

   * **Part A**: Create user profiles based on their interactions and recommend courses matching their interests.
   * **Part B**: Use course similarities to recommend courses similar to those a user has liked before.

## 2. **Import Libraries**

First, let’s import the necessary Python libraries.

In [1]:
# Data manipulation libraries
import pandas as pd
import numpy as np

# Text processing
from sklearn.feature_extraction.text import TfidfVectorizer

# Similarity measurement
from sklearn.metrics.pairwise import cosine_similarity

# For progress bars
from tqdm import tqdm

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# For displaying visuals in higher resolution
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('retina')

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Set consistent color palette
sns.set_palette('Blues_d')

## 3. **Load Data**

We will load the datasets from the data folder.

In [2]:
# Load user rating data
user_ratings = pd.read_csv('../Data/user_rating_info.csv', usecols=['user_id', 'course_id', 'rating'])

# Load course information data
course_info = pd.read_csv('../Data/course_info.csv')

## 4. **Preprocessing**

Before building the recommender systems, we need to preprocess the data.

### 4.1 Merge Data

We’ll merge user_ratings and course_info to have all relevant information in one DataFrame.

In [3]:
# Merge user ratings with course info
data = pd.merge(user_ratings, course_info, on='course_id')
data.head(1)

Unnamed: 0,user_id,course_id,rating,title,description,data_analysis,data_science,data_engineering,data_visualization,business_intelligence,artificial_intelligence,cloud_computing
0,UID0001293,CID0001,5,Autonomous Vehicles: AI for Self-Driving Cars,Delve into the AI technologies powering autono...,0,0,0,0,0,1,0


### 4.2 Encode Course Topics

Ensure that topic columns are correctly interpreted as numeric binary features.

In [4]:
# List of topic columns
topic_columns = ['data_analysis', 'data_science', 'data_engineering',
                 'data_visualization', 'business_intelligence',
                 'artificial_intelligence', 'cloud_computing']

# Convert topic columns to integer type
for col in topic_columns:
    course_info[col] = course_info[col].astype(int)

## 5. **Part A: User-Course Interaction-Based Recommender**

In this section, we’ll build a recommender system that creates user profiles based on their past interactions and recommends courses that align with their interests.

### 5.1 Build User Profiles

For each user, identify the courses they rated highly (e.g., ratings of 4 or 5).

In [5]:
# Filter interactions with high ratings
high_rating_interactions = user_ratings[user_ratings['rating'] >= 4]

For each user, aggregate the topics of these highly rated courses to build a user profile.

In [6]:
# Merge with course topics
high_rating_courses = pd.merge(high_rating_interactions, course_info, on='course_id')

# Initialize an empty DataFrame for user profiles
user_profiles = pd.DataFrame(0, index=user_ratings['user_id'].unique(), columns=topic_columns)

# Build user profiles
for user in tqdm(user_profiles.index):
    # Get the topics of courses rated highly by the user
    user_data = high_rating_courses[high_rating_courses['user_id'] == user][topic_columns]
    # Sum the topics to create the profile
    user_profiles.loc[user] = user_data.sum()
    
# Normalize the user profiles
user_profiles = user_profiles.div(user_profiles.sum(axis=1), axis=0).fillna(0)

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2338/2338 [00:01<00:00, 1511.67it/s]


**Note**: Each user’s profile is a vector representing the proportion of topics in their preferred courses.

### 5.2 Calculate Similarity Scores

Represent each course as a vector of topics.

In [7]:
# Get course topic vectors
course_topics = course_info.set_index('course_id')[topic_columns]

Calculate the cosine similarity between each user profile and all course topic vectors.

In [8]:
# Function to compute similarity scores for a user
def compute_similarity(user_id):
    user_vector = user_profiles.loc[user_id].values.reshape(1, -1)
    similarities = cosine_similarity(user_vector, course_topics.values)
    return similarities.flatten()

### 5.2 Generate Recommendations

For each user, recommend courses they haven’t interacted with, ranked by similarity score.

In [9]:
# Dictionary to store recommendations
user_recommendations = {}

for user in tqdm(user_profiles.index):
    # Compute similarity scores
    similarities = compute_similarity(user)
    
    # Create a DataFrame with course IDs and similarity scores
    sim_df = pd.DataFrame({
        'course_id': course_topics.index,
        'similarity': similarities
    })
    
    # Exclude courses the user has already interacted with
    interacted_courses = user_ratings[user_ratings['user_id'] == user]['course_id'].tolist()
    sim_df = sim_df[~sim_df['course_id'].isin(interacted_courses)]
    
    # Get top N recommendations
    top_recommendations = sim_df.sort_values(by='similarity', ascending=False).head(10)
    
    # Store recommendations
    user_recommendations[user] = top_recommendations['course_id'].tolist()

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2338/2338 [00:01<00:00, 1344.47it/s]


**Example Recommendations for a User**:

In [10]:
# Example: Recommendations for a specific user
user_id_example = user_profiles.index[0]
recommended_courses = user_recommendations[user_id_example]

# Get course titles
recommended_titles = course_info[course_info['course_id'].isin(recommended_courses)]['title']
print(f"Top recommendations for {user_id_example}:\n")
[print(title) for title in recommended_titles.tolist()]

Top recommendations for UID0001293:

SQL for Data Science: Managing and Querying Databases
AI in Cybersecurity: Defending Against Digital Threats
Adversarial Machine Learning: Securing AI Models
Ethical Hacking with AI: Offense and Defense Strategies
Data Ethics and Privacy: Responsible Data Science
Web Scraping for Data Collection: Gathering Web Data
Data Integration Techniques: Combining Data from Multiple Sources
Identity and Access Management (IAM) in Cloud Platforms
Cloud Security Best Practices: Protecting Cloud Environments
Federated Learning: Collaborative AI Without Data Sharing


[None, None, None, None, None, None, None, None, None, None]

## 6. **Part B: Course Similarity-Based Recommender**

In this section, we’ll build a recommender system that recommends courses similar to those a user has liked before.

### 6.1 Calculate Course Similarity Matrix

Calculate the cosine similarity between all pairs of courses based on their topic vectors.

In [11]:
# Compute the cosine similarity matrix
course_similarity_matrix = pd.DataFrame(
    cosine_similarity(course_topics),
    index=course_topics.index,
    columns=course_topics.index
)

**Note**: The matrix contains similarity scores between courses.

## 6.2 Generate Recommendations

For each user, recommend courses similar to those they have rated highly.

In [12]:
# Dictionary to store recommendations
user_recommendations_course_sim = {}

for user in tqdm(user_profiles.index):
    # Get courses rated highly by the user
    user_high_rated_courses = high_rating_interactions[high_rating_interactions['user_id'] == user]['course_id'].tolist()
    
    # Initialize an empty Series to store similarity scores
    sim_scores = pd.Series(dtype=float)
    
    # Accumulate similarity scores from each liked course
    for course in user_high_rated_courses:
        sim_scores = sim_scores.add(course_similarity_matrix[course], fill_value=0)
    
    # Remove courses the user has already interacted with
    sim_scores = sim_scores[~sim_scores.index.isin(interacted_courses)]
    
    # Sort by similarity score
    top_recommendations = sim_scores.sort_values(ascending=False).head(10)
    
    # Store recommendations
    user_recommendations_course_sim[user] = top_recommendations.index.tolist()

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2338/2338 [00:01<00:00, 1899.02it/s]


**Example Recommendations for a User**:

In [13]:
# Example: Recommendations for the same user
recommended_courses_sim = user_recommendations_course_sim[user_id_example]

# Get course titles
recommended_titles_sim = course_info[course_info['course_id'].isin(recommended_courses_sim)]['title']
print(f"Top similar course recommendations for {user_id_example}:\n")
[print(title) for title in recommended_titles_sim.tolist()]

Top similar course recommendations for UID0001293:

Recommendation Systems: Personalizing User Experiences
Deep Learning with TensorFlow: Building Neural Networks
Applied Machine Learning: Real-World Projects
Transfer Learning in AI: Leveraging Pre-Trained Models
Model Evaluation and Validation: Ensuring Reliable Predictions
Advanced Machine Learning with Scikit-Learn
AI-Driven Chatbots: Enhancing Customer Engagement
Reinforcement Learning in Robotics: Teaching Machines to Act
Computer Vision Techniques: Enabling Machines to See
Quantum Machine Learning: The Future of AI


[None, None, None, None, None, None, None, None, None, None]

## 7. **Results and Analysis**

#### **Compare Recommendations**:

Interaction-Based Recommendations

In [14]:
print(f"User {user_id_example} - Interaction-Based Recommendations:")
display(recommended_titles.tolist())

User UID0001293 - Interaction-Based Recommendations:


['SQL for Data Science: Managing and Querying Databases',
 'AI in Cybersecurity: Defending Against Digital Threats',
 'Adversarial Machine Learning: Securing AI Models',
 'Ethical Hacking with AI: Offense and Defense Strategies',
 'Data Ethics and Privacy: Responsible Data Science',
 'Web Scraping for Data Collection: Gathering Web Data',
 'Data Integration Techniques: Combining Data from Multiple Sources',
 'Identity and Access Management (IAM) in Cloud Platforms',
 'Cloud Security Best Practices: Protecting Cloud Environments',
 'Federated Learning: Collaborative AI Without Data Sharing']

Similarity-Based Recommendations

In [15]:
print(f"\nUser {user_id_example} - Course Similarity-Based Recommendations:")
display(recommended_titles_sim.tolist())


User UID0001293 - Course Similarity-Based Recommendations:


['Recommendation Systems: Personalizing User Experiences',
 'Deep Learning with TensorFlow: Building Neural Networks',
 'Applied Machine Learning: Real-World Projects',
 'Transfer Learning in AI: Leveraging Pre-Trained Models',
 'Model Evaluation and Validation: Ensuring Reliable Predictions',
 'Advanced Machine Learning with Scikit-Learn',
 'AI-Driven Chatbots: Enhancing Customer Engagement',
 'Reinforcement Learning in Robotics: Teaching Machines to Act',
 'Computer Vision Techniques: Enabling Machines to See',
 'Quantum Machine Learning: The Future of AI']

#### **Analysis**:

   * The **Interaction-Based Recommender** suggests courses aligned with the user’s overall interests.
   * The **Course Similarity-Based Recommender** focuses on courses similar to specific courses the user liked.

#### **Advantages and Limitations**:
   * **Interaction-Based Recommender**:
      * **Advantage**: Captures the user's broader interests.
      * **Limitations**: Requires sufficient interaction history.
   * **Course Similarity-Based Recommender**:
      * **Advantage**: Helps user discover courses closely related to their favorites.
      * **Limitations**: May not promote diversity in recommendations.

## 8. **Conclusion**

**In this notebook, we developed two content-based recommender systems**:

   * **User-Course Interaction-Based Recommender**:
      * Builds user profiles based on preferred topics.
      * Recommends courses aligning with the user’s interests.
   * **Course Similarity-Based Recommender**:
      * Uses course similarity to recommend courses similar to those a user liked.

## 9. **Thanks and Contact Information**

Thank you for reviewing this project notebook. For any further questions, suggestions, or collaborations, please feel free to reach out:

   * [**Email**](mailto:leejoabraham01@gmail.com)
   * [**LinkedIn**](https://www.linkedin.com/in/leejoabraham01)
   * [**GitHub**](https://github.com/LeejoAbraham01)