# Course Recommendation System

This notebook builds a simple content-based course recommendation system from user profiles and course descriptions. Course descriptions are vectorized with TF-IDF, and courses are ranked by their cosine similarity to a text profile built from the user's features.


In [24]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


## Data Preprocessing

We first clean the course data: missing values are replaced with empty strings, and the `Description` field is lowercased so that text is treated consistently.


In [25]:
def preprocess_data(course_data):
    """
    Preprocess the course data by handling missing values and cleaning text fields.
    
    Parameters:
        course_data (pd.DataFrame): DataFrame containing course information.
        
    Returns:
        pd.DataFrame: Cleaned copy of course_data.
    """
    
    # Replace missing values with empty strings; fillna returns a new
    # DataFrame, so the caller's input is left unmodified
    course_data = course_data.fillna('')
    
    # Lowercase the descriptions so that text is normalized before vectorization
    course_data['Description'] = course_data['Description'].str.lower()
    
    return course_data
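
To make the effect of the two cleaning steps concrete, here is a minimal sketch on a hypothetical two-row table. The `toy_courses` data is invented for illustration and is not taken from `courses.csv`; only the `Description` column name comes from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the real course file
toy_courses = pd.DataFrame({
    "Course Title": ["Intro to Programming", "Game Design"],
    "Description": ["Programming And Problem Solving", None],
})

# The same two steps as preprocess_data: fill missing values, then lowercase
cleaned = toy_courses.fillna("")
cleaned["Description"] = cleaned["Description"].str.lower()

print(cleaned["Description"].tolist())
# ['programming and problem solving', '']
```

A course with a missing description ends up with an empty string, which later yields an all-zero TF-IDF vector and therefore a similarity of zero to any user profile.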


## Load and Clean Data

Load the course data from its CSV file and apply the preprocessing function.


In [26]:
# Load the course dataset
course_data = pd.read_csv('/kaggle/input/course-recs/courses.csv', encoding='ISO-8859-1')

# Preprocess data
course_data_clean = preprocess_data(course_data)


## Create Course Profiles

Generate course profiles using TF-IDF vectorization of course descriptions. This will help in establishing a similarity measure based on course content.


In [27]:
def create_course_profiles(course_data):
    """
    Create course profiles using TF-IDF vectorization of course descriptions.
    
    Parameters:
        course_data (pd.DataFrame): DataFrame containing cleaned course information.
    
    Returns:
        pd.DataFrame, TfidfVectorizer: DataFrame containing TF-IDF vectors for course descriptions and the vectorizer used.
    """
    tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_features=1000)
    tfidf_matrix = tfidf_vectorizer.fit_transform(course_data['Description'])
    course_profiles = pd.DataFrame(tfidf_matrix.toarray(), columns=tfidf_vectorizer.get_feature_names_out(), index=course_data.index)
    return course_profiles, tfidf_vectorizer

# Create course profiles
course_profiles, tfidf_vectorizer = create_course_profiles(course_data_clean)


## Recommendation Function

Define a function to recommend courses based on user features by calculating cosine similarities between user input and course profiles.


In [28]:
def recommend_courses(user_features, course_profiles, course_data, tfidf_vectorizer, top_n=5):
    """
    Recommend courses based on user features.
    
    Parameters:
        user_features (dict): Dictionary of user features.
        course_profiles (pd.DataFrame): DataFrame containing TF-IDF vectors for course descriptions.
        course_data (pd.DataFrame): DataFrame containing course information.
        tfidf_vectorizer (TfidfVectorizer): Trained TF-IDF vectorizer.
        top_n (int): Number of top recommendations to return.

    Returns:
        pd.DataFrame: DataFrame containing the top recommended courses for the user.
    """
    user_vector = preprocess_user_input(user_features, tfidf_vectorizer)
    similarity_scores = cosine_similarity(user_vector, course_profiles).flatten()
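    # argsort is ascending: take the last top_n positions and reverse them
    top_courses_indices = np.argsort(similarity_scores)[-top_n:][::-1]
    # argsort returns positions, so index by position (iloc) rather than by label (loc)
    recommended_courses = course_data.iloc[top_courses_indices]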
    return recommended_courses

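def preprocess_user_input(features, tfidf_vectorizer):
    """Join the user's feature values into one lowercase string and vectorize it."""
    # Convert each value to str so that non-string features do not break the join
    profile = ' '.join(str(value) for value in features.values()).lower()
    user_vector = tfidf_vectorizer.transform([profile])
    return user_vector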
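
The ranking step can be checked on a small scale. The following is a minimal sketch with three hypothetical course descriptions and a hypothetical user profile string; none of this data comes from the notebook's dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical course descriptions
toy_descriptions = [
    "programming and problem solving",
    "critical analysis of video games",
    "history of medieval art",
]
vectorizer = TfidfVectorizer(stop_words="english")
course_matrix = vectorizer.fit_transform(toy_descriptions)

# Hypothetical user profile; terms outside the fitted vocabulary are ignored
user_vector = vectorizer.transform(["gaming video games programming"])

# One similarity score per course, then the two best positions, best first
scores = cosine_similarity(user_vector, course_matrix).flatten()
ranked = np.argsort(scores)[-2:][::-1]
print(ranked, scores.round(2))
```

Note that "gaming" contributes nothing here, because the vectorizer does no stemming and only "games" appears in the course text. The same applies to the real data: user feature values that never occur in any course description have no influence on the ranking.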


## Generate Recommendations

Generate course recommendations for a sample user based on their features.


In [29]:
# Example user features input
user_features = {
    'Field_Of_Study': 'Computer Programming',
    'Primary_Hobby': 'Gaming',
    'Secondary_Hobby': 'Photography',
    'Gender': 'Female',
    'Desired_Career_Field': 'Gaming',
    'Country_Of_Origin': 'USA'
}

# Recommend courses based on user features
recommended_courses = recommend_courses(user_features, course_profiles, course_data_clean, tfidf_vectorizer)
print(recommended_courses)



    Unnamed: 0                                 Subject Catalog Number  \
21          22              COMPSCI - Computer Science             94   
22          23              COMPSCI - Computer Science           101L   
85          86  GAMEDSGN - Game Design and Development            530   
23          24              COMPSCI - Computer Science           102L   
34          35              COMPSCI - Computer Science            553   

                                         Course Title Course Type  \
21                    Programming and Problem Solving  FALL-SPRNG   
22                   Introduction to Computer Science  FALL-SPRNG   
85                   Critical Analysis of Video Games        FALL   
23  Interdisciplinary Introduction to Computer Sci...        FALL   
34                              Compiler Construction      SPRING   

                                          Description  \
21  programming and problem solving in a specific ...   
22  introduction practices and p