# <center><b><i><h3>Content-based recommendation system using combined course dataset

## <center><h4> Importing Libraries

In [1]:
import pandas as pd

import spacy

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import warnings
warnings.filterwarnings("ignore")

## <center><h4> Loading Dataset

In [21]:
data = pd.read_csv("../../Data/Cleaned_data.csv").drop(columns='Unnamed: 0')
data.head()

,Title,Difficult,Description,Link,Departments,Topics,urls
0,Ecology I: The Earth System,Undergraduate,"We will cover fundamentals of ecology, conside...",https://ocw.mit.edu/courses/1-018j-ecology-i-t...,Civil and Environmental Engineering,"Science, Biology, Ecology, Earth Science, Scie...",https://ocw.mit.edu/courses/1-018j-ecology-i-t...
1,Ecology II: Engineering for Sustainability,Undergraduate,"This course provides a review of physical, che...",https://ocw.mit.edu/courses/1-020-ecology-ii-e...,Civil and Environmental Engineering,"Engineering, Civil Engineering, Science, Biolo...",https://ocw.mit.edu/courses/1-020-ecology-ii-e...
2,Transport Processes in the Environment,Undergraduate,This class serves as an introduction to mass t...,https://ocw.mit.edu/courses/1-061-transport-pr...,Civil and Environmental Engineering,"Engineering, Chemical Engineering, Transport P...",https://ocw.mit.edu/courses/1-061-transport-pr...
3,Advanced Fluid Dynamics of the Environment,Graduate,Designed to familiarize students with theories...,https://ocw.mit.edu/courses/1-63-advanced-flui...,Civil and Environmental Engineering,"Engineering, Environmental Engineering, Hydrod...",https://ocw.mit.edu/courses/1-63-advanced-flui...
4,"Land, Water, Food, and Climate",Graduate,"This reading seminar examines land, water, foo...",https://ocw.mit.edu/courses/1-74-land-water-fo...,Civil and Environmental Engineering,"Energy, Climate, Renewables, Science, Earth Sc...",https://ocw.mit.edu/courses/1-74-land-water-fo...


In [22]:
data.columns

Index(['Title', 'Difficult', 'Description', 'Link', 'Departments', 'Topics',
       'urls'],
      dtype='object')

## Feature Engineering

- Combine Relevant Textual Fields

In [23]:
data['Tags'] = data['Description'] + data['Departments'] + data['Topics']

- Preprocess Text

In [24]:
nlp = spacy.load("en_core_web_sm")

In [25]:
str(data['Tags'][0])

'We will cover fundamentals of ecology, considering Earth as an integrated dynamic system. Topics include coevolution of the biosphere, geosphere, atmosphere and oceans; photosynthesis and respiration; the hydrologic, carbon and nitrogen cycles. We will examine the flow of energy and materials through ecosystems; regulation of the distribution and abundance of organisms; structure and function of ecosystems, including evolution and natural selection; metabolic diversity; productivity; trophic dynamics; models of population growth, competition, mutualism and predation. This course is designated as Communication-Intensive; instruction and practice in oral and written communication provided. Biology is a recommended prerequisite.Show lessCivil and Environmental EngineeringScience, Biology, Ecology, Earth Science, Science, Biology, Ecology, Earth Science'
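The output above shows the three fields running together ("prerequisite.Show lessCivil and Environmental EngineeringScience"), because `+` concatenates strings without a separator. In pandas, a missing value in any of the fields also makes the whole combined string missing. A minimal sketch of one alternative follows, using a hypothetical two-row frame `toy` in place of the real dataset. Adopting it in the notebook would change the tokens seen downstream, so the recommendations shown later might differ.

```python
import pandas as pd

# Hypothetical stand-in for the course data; the values are illustrative only.
toy = pd.DataFrame({
    "Description": ["Intro to ecology.", None],
    "Departments": ["Civil and Environmental Engineering", "Physics"],
    "Topics": ["Science, Biology", "Science, Physics"],
})

text_columns = ["Description", "Departments", "Topics"]

# Replace missing values with empty strings, join the fields with a space,
# and trim the leading/trailing space left behind by an empty field.
toy["Tags"] = toy[text_columns].fillna("").agg(" ".join, axis=1).str.strip()

print(toy["Tags"].tolist())
```

The space separator keeps the last word of one field from fusing with the first word of the next, which otherwise yields tokens such as "engineeringscience" that match nothing else in the corpus.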

- Text Preprocessor Function

In [26]:
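def text_preprocessor(text):
    """Lowercase the text and keep the lemmas of its nouns, adjectives, and verbs."""
    doc = nlp(str(text).lower())
    # Drop stop words and punctuation; keep only content-word lemmas.
    filtered_tokens = [
        token.lemma_
        for token in doc
        if not token.is_stop
        and not token.is_punct
        and token.pos_ in ["NOUN", "ADJ", "VERB"]
    ]
    return " ".join(filtered_tokens)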

In [27]:
data['Preprocessed_Tags'] = data['Tags'].apply(text_preprocessor)

<b><h4><center>Vectorize the Processed Text

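 - TF-IDF (Term Frequency-Inverse Document Frequency) weights each term by how often it appears in a course's text, discounted by how many courses contain it, so terms that distinguish a course count for more than terms common to most courses. It is a widely used baseline for content-based recommendation.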

In [28]:
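vectorizer = TfidfVectorizer(
    max_features=1500,     # keep the 1,500 most frequent terms across the corpus
    ngram_range=(1, 2),    # use unigrams and bigrams
    stop_words='english',  # scikit-learn's built-in English stop-word list
    max_df=0.8,            # ignore terms that appear in more than 80% of courses
    min_df=2               # ignore terms that appear in fewer than 2 courses
)
tfidf_matrix = vectorizer.fit_transform(data['Preprocessed_Tags'])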
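To see how `min_df`, `max_df`, and `ngram_range` shape the vocabulary, the following sketch fits a vectorizer on a hypothetical four-document corpus (`toy_corpus` is illustrative and not taken from the dataset). The word "course" appears in every document and is dropped by `max_df=0.8`, while terms that occur in only one document are dropped by `min_df=2`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus; each string stands in for one preprocessed course.
toy_corpus = [
    "course ecology earth system",
    "course ecology engineering sustainability",
    "course fluid dynamics engineering",
    "course climate earth system",
]

toy_vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2, max_df=0.8)
toy_matrix = toy_vectorizer.fit_transform(toy_corpus)

# Terms that survived both document-frequency filters.
toy_vocabulary = sorted(toy_vectorizer.vocabulary_)
print(toy_vocabulary)
print(toy_matrix.shape)  # (number of documents, number of kept terms)
```

Only unigrams and bigrams found in at least two but not all documents remain, for example "earth system" and "course ecology".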

<b><h4><center>Calculate Similarity Matrix

In [29]:
cosine_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)
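`TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), so the cosine similarity between two rows equals their dot product. The sketch below checks this on a hypothetical three-document corpus by comparing `cosine_similarity` with `linear_kernel`, which skips the normalization step. Whether the switch is worthwhile for this dataset has not been measured here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical documents, used only to compare the two functions.
toy_docs = [
    "ecology earth system",
    "ecology engineering sustainability",
    "fluid dynamics engineering",
]

toy_tfidf = TfidfVectorizer().fit_transform(toy_docs)  # rows have unit L2 norm

cos_scores = cosine_similarity(toy_tfidf, toy_tfidf)
dot_scores = linear_kernel(toy_tfidf, toy_tfidf)  # plain dot products

# On unit-length rows the two matrices agree up to floating-point error.
print(np.allclose(cos_scores, dot_scores))
```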

<b><h4><center>Build the Recommendation Function

In [30]:
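def get_recommendations(title, cosine_sim=cosine_matrix, data=data, top_n=5):
    """Return the top_n courses most similar to the course with the given title."""
    matches = data.index[data['Title'] == title]
    if len(matches) == 0:
        raise ValueError(f"No course titled {title!r} in the dataset")
    idx = matches[0]  # assumes the default RangeIndex, so labels equal row positions

    # Pair every course position with its similarity to the query course.
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Drop the query course itself, then sort by similarity, highest first.
    sim_scores = [score for score in sim_scores if score[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]

    course_indices = [i[0] for i in sim_scores]
    return data.iloc[course_indices][['Title', 'Description']]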


In [36]:
recommendations = get_recommendations('Ecology I: The Earth System')

In [33]:
data[data['Title'] == 'Ecology I: The Earth System']

,Title,Difficult,Description,Link,Departments,Topics,urls,Tags,Preprocessed_Tags
0,Ecology I: The Earth System,Undergraduate,"We will cover fundamentals of ecology, conside...",https://ocw.mit.edu/courses/1-018j-ecology-i-t...,Civil and Environmental Engineering,"Science, Biology, Ecology, Earth Science, Scie...",https://ocw.mit.edu/courses/1-018j-ecology-i-t...,"We will cover fundamentals of ecology, conside...",cover fundamental ecology consider earth integ...


In [50]:
recommendations

,Title,Description
26,Theoretical Environmental Analysis,This course analyzes cooperative processes tha...
1,Ecology II: Engineering for Sustainability,"This course provides a review of physical, che..."
16,Weather and Climate Laboratory,Course 12.307 is an undergraduate course inten...
25,Groundwater Hydrology,This course covers fundamentals of subsurface ...
5,Atmospheric Chemistry,This course provides a detailed overview of th...


In [52]:
result = recommendations.merge(data[['Title', 'Link', 'urls']], on='Title', how='left')
result[['Title', 'Link', 'urls']]

,Title,Link,urls
0,Theoretical Environmental Analysis,https://ocw.mit.edu/courses/12-009j-theoretica...,https://ocw.mit.edu/courses/12-009j-theoretica...
1,Ecology II: Engineering for Sustainability,https://ocw.mit.edu/courses/1-020-ecology-ii-e...,https://ocw.mit.edu/courses/1-020-ecology-ii-e...
2,Weather and Climate Laboratory,https://ocw.mit.edu/courses/12-307-weather-and...,https://ocw.mit.edu/courses/12-307-weather-and...
3,Groundwater Hydrology,https://ocw.mit.edu/courses/1-72-groundwater-h...,https://ocw.mit.edu/courses/1-72-groundwater-h...
4,Atmospheric Chemistry,https://ocw.mit.edu/courses/1-84j-atmospheric-...,https://ocw.mit.edu/courses/1-84j-atmospheric-...
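Merging on `Title` returns one row per recommendation only if titles are unique in the course table. A minimal sketch of how pandas can enforce that assumption with `validate="many_to_one"` follows. The frames `courses` and `recs` and the example.com links are hypothetical stand-ins, not rows from the dataset.

```python
import pandas as pd

# Hypothetical course table and recommendation list.
courses = pd.DataFrame({
    "Title": ["Course A", "Course B", "Course C"],
    "Link": ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
})
recs = pd.DataFrame({"Title": ["Course C", "Course A"]})

# validate="many_to_one" requires the merge key to be unique in the right frame.
result = recs.merge(courses[["Title", "Link"]], on="Title", how="left",
                    validate="many_to_one")
print(result)

# With a duplicated title on the right, the same call raises MergeError
# instead of silently returning extra rows.
courses_dup = pd.concat([courses, courses.iloc[[0]]], ignore_index=True)
duplicate_detected = False
try:
    recs.merge(courses_dup[["Title", "Link"]], on="Title", how="left",
               validate="many_to_one")
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```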
