# <center><b><i><h3>Content-based recommendation system using combined course dataset</h3></i></b></center>

## <center><h4>Importing Libraries</h4></center>

In [96]:
import pandas as pd

import spacy

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import warnings
warnings.filterwarnings("ignore")

## <center><h4>Loading Dataset</h4></center>

In [109]:
data = pd.read_csv("../../Data/Cleaned_data.csv").drop(columns='Unnamed: 0')
data.head()

| | Title | Difficult | Description | Departments | Topics |
|---|---|---|---|---|---|
| 0 | Ecology I: The Earth System | Undergraduate | We will cover fundamentals of ecology, conside... | Civil and Environmental Engineering | Science, Biology, Ecology, Earth Science, Scie... |
| 1 | Ecology II: Engineering for Sustainability | Undergraduate | This course provides a review of physical, che... | Civil and Environmental Engineering | Engineering, Civil Engineering, Science, Biolo... |
| 2 | Transport Processes in the Environment | Undergraduate | This class serves as an introduction to mass t... | Civil and Environmental Engineering | Engineering, Chemical Engineering, Transport P... |
| 3 | Advanced Fluid Dynamics of the Environment | Graduate | Designed to familiarize students with theories... | Civil and Environmental Engineering | Engineering, Environmental Engineering, Hydrod... |
| 4 | Land, Water, Food, and Climate | Graduate | This reading seminar examines land, water, foo... | Civil and Environmental Engineering | Energy, Climate, Renewables, Science, Earth Sc... |


In [110]:
data.columns

Index(['Title', 'Difficult', 'Description', 'Departments', 'Topics'], dtype='object')

## Feature Engineering

- Combine Relevant Textual Fields

In [111]:
data['Tags'] = data['Description'] + data['Departments'] + data['Topics']

- Preprocess Text

In [112]:
nlp = spacy.load("en_core_web_sm")

In [113]:
str(data['Tags'][0])

'We will cover fundamentals of ecology, considering Earth as an integrated dynamic system. Topics include coevolution of the biosphere, geosphere, atmosphere and oceans; photosynthesis and respiration; the hydrologic, carbon and nitrogen cycles. We will examine the flow of energy and materials through ecosystems; regulation of the distribution and abundance of organisms; structure and function of ecosystems, including evolution and natural selection; metabolic diversity; productivity; trophic dynamics; models of population growth, competition, mutualism and predation. This course is designated as Communication-Intensive; instruction and practice in oral and written communication provided. Biology is a recommended prerequisite.Show lessCivil and Environmental EngineeringScience, Biology, Ecology, Earth Science, Science, Biology, Ecology, Earth Science'
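The output above shows the three fields running together (`Show lessCivil`, `EngineeringScience`), because plain `+` concatenation inserts no separator, so words at the field boundaries merge into single tokens. A minimal sketch of one way to avoid this is given below; the helper name `combine_fields` and the sample strings are assumptions made for illustration, not part of the original notebook.

```python
def combine_fields(*fields, sep=" "):
    """Join text fields with a separator, skipping missing values."""
    parts = []
    for field in fields:
        # Treat None and NaN as missing (NaN is the only value not equal to itself).
        if field is None or field != field:
            continue
        text = str(field).strip()
        if text:
            parts.append(text)
    return sep.join(parts)


# Illustrative values, shortened from the first row shown above.
description = "Biology is a recommended prerequisite."
department = "Civil and Environmental Engineering"
topics = "Science, Biology, Ecology"

combined = combine_fields(description, department, topics)
print(combined)
```

Applied row by row, for example with `data.apply(lambda r: combine_fields(r['Description'], r['Departments'], r['Topics']), axis=1)`, this would keep the boundary words separate. Note that changing the `Tags` column this way may change the downstream recommendations.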

- Text Preprocessor Function

In [114]:
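def text_preprocessor(text):
    """Lowercase the text and return the lemmas of its content words."""
    doc = nlp(str(text).lower())
    # Keep lemmas of nouns, adjectives, and verbs; drop stop words and punctuation.
    filtered_tokens = [
        token.lemma_ for token in doc
        if not token.is_stop and not token.is_punct
        and token.pos_ in ("NOUN", "ADJ", "VERB")
    ]
    return " ".join(filtered_tokens)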

In [115]:
data['Preprocessed_Tags'] = data['Tags'].apply(text_preprocessor)

<b><h4><center>Vectorize the Processed Text</center></h4></b>

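- TF-IDF (Term Frequency-Inverse Document Frequency) weights each term by how often it appears in a course's text and down-weights terms that appear in many courses, so each course vector is dominated by its distinctive vocabulary. It is a common baseline for content-based recommendation.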

In [116]:
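vectorizer = TfidfVectorizer(
    max_features=1500,     # keep only the 1,500 most frequent terms in the corpus
    ngram_range=(1, 2),    # use unigrams and bigrams
    stop_words='english',  # scikit-learn's built-in English stop-word list
    max_df=0.8,            # ignore terms found in more than 80% of courses
    min_df=2               # ignore terms found in fewer than 2 courses
)
tfidf_matrix = vectorizer.fit_transform(data['Preprocessed_Tags'])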
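To make the weighting concrete, here is a minimal from-scratch sketch of TF-IDF. It is not what `TfidfVectorizer` runs internally; it assumes whitespace tokenization, raw term counts, the smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization, which correspond to scikit-learn's documented defaults. The function name `tfidf_vectors` and the three toy documents are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)

    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = [counts[term] * idf[term] for term in vocab]
        # Scale each document vector to unit length.
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        vectors.append([w / norm for w in weights])
    return vocab, vectors


# Toy documents, loosely in the style of the preprocessed tags.
toy_docs = ["ecology earth system", "ecology engineering", "fluid dynamics"]
vocab, vectors = tfidf_vectors(toy_docs)

for term, weight in zip(vocab, vectors[0]):
    print(f"{term:12s} {weight:.3f}")
```

In the first toy document, `earth` and `system` receive a higher weight than `ecology`, because `ecology` also occurs in the second document.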

<b><h4><center>Calculate Similarity Matrix</center></h4></b>

In [117]:
cosine_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)
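Each entry of `cosine_matrix` is the cosine of the angle between two course vectors: 1.0 for vectors pointing in the same direction, 0.0 for vectors with no terms in common. A small sketch of the formula on plain Python lists follows; the function name `cosine` and the example vectors are illustrative, and returning 0.0 for an all-zero vector is a choice made here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A vector with no non-zero entries has no direction; report no similarity.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


print(cosine([1, 0], [1, 0]))  # identical direction
print(cosine([1, 0], [0, 1]))  # no shared terms
print(cosine([1, 1], [1, 0]))  # partial overlap
```

Because `TfidfVectorizer` L2-normalizes each row by default, the cosine similarity of two rows of `tfidf_matrix` reduces to their dot product.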

<b><h4><center>Build the Recommendation Function</center></h4></b>

In [118]:
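def get_recommendations(title, cosine_sim=cosine_matrix, data=data, top_n=5):
    """Return the top_n courses most similar to the course with the given title."""
    matches = data.index[data['Title'] == title]
    if len(matches) == 0:
        raise ValueError(f"Course title not found: {title!r}")
    # Assumes the default RangeIndex, so index labels equal row positions in cosine_sim.
    idx = matches[0]

    # Pair each course position with its similarity to the query course.
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Exclude the query course itself, then sort by similarity, highest first.
    sim_scores = [pair for pair in sim_scores if pair[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    course_indices = [i for i, _ in sim_scores[:top_n]]

    return data.iloc[course_indices][['Title', 'Description']]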


In [121]:
get_recommendations('Ecology I: The Earth System')['Title']

26            Theoretical Environmental Analysis
1     Ecology II: Engineering for Sustainability
16                Weather and Climate Laboratory
25                         Groundwater Hydrology
5                          Atmospheric Chemistry
Name: Title, dtype: object
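The ranking step can also be looked at in isolation. The sketch below picks the most similar items from a single row of similarity scores while explicitly skipping the query item, and returns the scores alongside the positions. The helper name `top_n_similar` and the toy scores are invented for illustration and are not taken from the course data.

```python
import heapq


def top_n_similar(sim_row, query_idx, top_n=5):
    """Return (position, score) pairs for the top_n most similar items, excluding the query."""
    candidates = (
        (score, i) for i, score in enumerate(sim_row) if i != query_idx
    )
    # nlargest avoids sorting the whole row when only a few results are needed.
    best = heapq.nlargest(top_n, candidates, key=lambda pair: pair[0])
    return [(i, score) for score, i in best]


# Toy similarity row for item 0 (its similarity to itself is 1.0).
toy_row = [1.0, 0.62, 0.10, 0.35, 0.48]
print(top_n_similar(toy_row, query_idx=0, top_n=3))
# [(1, 0.62), (4, 0.48), (3, 0.35)]
```

Returning the scores makes it easier to judge how close each recommendation actually is, rather than seeing only the ordering.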

In [108]:
data[data['Title'] == 'Ecology I: The Earth System']

| Unnamed: 0.1 | Unnamed: 0 | Title | Difficult | Description | Departments | Topics | Tags | Preprocessed_Tags |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Ecology I: The Earth System | Undergraduate | We will cover fundamentals of ecology, conside... | Civil and Environmental Engineering | Science, Biology, Ecology, Earth Science, Scie... | We will cover fundamentals of ecology, conside... | cover fundamental ecology consider earth integ... |
