# <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Import Needed Libraries**</p>

pip install spacy

python -m spacy download en_core_web_sm

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

import spacy

# <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Exploratory Data Analysis (EDA)**</p>

In [2]:
# Read the dataset with name "Emotion_classify_Data.csv" and store it in a variable df
df = pd.read_csv("Emotion_classify_Data.csv")

# Print the shape of dataframe
print(df.shape)

# Print top 5 rows
df.head(5)

(5937, 2)


Unnamed: 0,Comment,Emotion
0,i seriously hate one subject to death but now ...,fear
1,im so full of life i feel appalled,anger
2,i sit here to write i start to dig out my feel...,fear
3,ive been really angry with r and i feel like a...,joy
4,i feel suspicious if there is no one outside l...,fear


In [3]:
# Check the distribution of Emotion
df['Emotion'].value_counts()

anger    2000
joy      2000
fear     1937
Name: Emotion, dtype: int64
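
The counts above look close to balanced. A quick way to confirm this is to look at relative frequencies instead of raw counts. The sketch below rebuilds a stand-in Series from the counts printed above (it is not the real `df['Emotion']` column), and `balance_ratio` is a name introduced here for illustration.

```python
import pandas as pd

# Stand-in for df['Emotion'], rebuilt from the counts printed above.
emotion = pd.Series(["anger"] * 2000 + ["joy"] * 2000 + ["fear"] * 1937, name="Emotion")

# normalize=True returns class proportions instead of raw counts.
proportions = emotion.value_counts(normalize=True)
print(proportions.round(3))

# Ratio between the rarest and the most common class (1.0 means perfectly balanced).
balance_ratio = proportions.min() / proportions.max()
print(f"balance ratio: {balance_ratio:.3f}")
```

With a ratio this close to 1, plain accuracy is a reasonable headline metric for the models trained later.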

In [4]:
# Show sample
print(f"{df['Comment'][0]} -> {df['Emotion'][0]}")

i seriously hate one subject to death but now i feel reluctant to drop it -> fear


# <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Preprocessing**</p>

In [5]:
# Load the English language model and create the nlp object from it
nlp = spacy.load("en_core_web_sm")

In [6]:
txt = df['Comment'][3]
txt

'ive been really angry with r and i feel like an idiot for trusting him in the first place'

In [7]:
# Tokenization
doc = nlp(txt)

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Sentence Tokenization**</p>
**We skip this step because each row of the dataframe already holds a single sentence.**

In [8]:
# for sentence in doc.sents:
#     print(sentence)

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Word Tokenization**</p>

In [9]:
for token in doc:
    print(token)

i
ve
been
really
angry
with
r
and
i
feel
like
an
idiot
for
trusting
him
in
the
first
place


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Lemmatization**</p>

In [10]:
for token in doc:
    print(f"Word: {token} | -> {token.lemma_}")

Word: i | -> I
Word: ve | -> ve
Word: been | -> be
Word: really | -> really
Word: angry | -> angry
Word: with | -> with
Word: r | -> r
Word: and | -> and
Word: i | -> I
Word: feel | -> feel
Word: like | -> like
Word: an | -> an
Word: idiot | -> idiot
Word: for | -> for
Word: trusting | -> trust
Word: him | -> he
Word: in | -> in
Word: the | -> the
Word: first | -> first
Word: place | -> place


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Stop Words**</p>

In [11]:
for token in doc:
    if token.is_stop or token.is_punct:
        print(token)

i
been
really
with
and
i
an
for
him
in
the
first
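
One thing to keep in mind with stop-word removal for emotion data: spaCy's English stop list also contains negation words such as "not", so "not happy" and "happy" can end up looking the same. The sketch below is a hypothetical variant, not what this notebook does. It works on plain token lists rather than spaCy `Token` objects, and `NEGATIONS` and `drop_stop_words` are names made up for illustration.

```python
from spacy.lang.en.stop_words import STOP_WORDS

# Hypothetical keep-list of negation words we may want to preserve.
NEGATIONS = {"no", "not", "never", "nor"}

def drop_stop_words(tokens, keep=NEGATIONS):
    """Drop spaCy stop words from a list of lowercase tokens, except those in `keep`."""
    return [t for t in tokens if t in keep or t not in STOP_WORDS]

tokens = "i do not feel happy".split()
print(drop_stop_words(tokens, keep=set()))  # plain stop-word removal: "not" is dropped
print(drop_stop_words(tokens))              # with the keep-list: "not" survives
```

Whether keeping negations helps on this dataset is something to measure, not assume.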


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Preprocess Function**</p>

In [12]:
# use this utility function to get the preprocessed text data
def preprocess(text):
    # remove stop words and lemmatize the text
    doc = nlp(text)
    filtered_tokens = []
    for token in doc:
        if token.is_stop or token.is_punct:
            continue
        filtered_tokens.append(token.lemma_)
    
    return " ".join(filtered_tokens) 

In [13]:
print(txt)
processed_txt = preprocess(txt)
print(processed_txt)

ive been really angry with r and i feel like an idiot for trusting him in the first place
ve angry r feel like idiot trust place


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Apply preprocess function on dataframe**</p>

In [14]:
df['preprocessed_comment'] = df['Comment'].apply(preprocess) 
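
Calling `nlp` once per row through `apply` can be slow on a few thousand comments. spaCy's `nlp.pipe` processes texts in batches and is usually faster. The sketch below is one possible batched variant. `preprocess_many` is a hypothetical helper, and `spacy.blank("en")` is used only as a stand-in so the example runs without downloading a model (a blank pipeline has no lemmatizer, hence the fallback to the raw token text).

```python
import spacy

def preprocess_many(texts, nlp, batch_size=256):
    """Remove stop words and punctuation from many texts, batching with nlp.pipe."""
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        kept = [
            # Fall back to the raw text when the pipeline provides no lemma.
            token.lemma_ or token.text
            for token in doc
            if not (token.is_stop or token.is_punct)
        ]
        results.append(" ".join(kept))
    return results

# Stand-in pipeline (tokenizer only). In the notebook you would pass the
# `nlp` object loaded from en_core_web_sm instead.
demo_nlp = spacy.blank("en")
demo = preprocess_many(["i feel angry with the cold", "he is happy!"], demo_nlp)
print(demo)
```

In the notebook this would be used as `df['preprocessed_comment'] = preprocess_many(df['Comment'], nlp)`.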

In [15]:
df

Unnamed: 0,Comment,Emotion,preprocessed_comment
0,i seriously hate one subject to death but now ...,fear,seriously hate subject death feel reluctant drop
1,im so full of life i feel appalled,anger,m life feel appalled
2,i sit here to write i start to dig out my feel...,fear,sit write start dig feeling think afraid accep...
3,ive been really angry with r and i feel like a...,joy,ve angry r feel like idiot trust place
4,i feel suspicious if there is no one outside l...,fear,feel suspicious outside like rapture happen
...,...,...,...
5932,i begun to feel distressed for you,fear,begin feel distressed
5933,i left feeling annoyed and angry thinking that...,anger,leave feel annoyed angry thinking center stupi...
5934,i were to ever get married i d have everything...,joy,marry d ready offer ve get club perfect good l...
5935,i feel reluctant in applying there because i w...,fear,feel reluctant apply want able find company kn...


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Encoding target column**</p>

In [16]:
df['Emotion_num'] = df['Emotion'].map({'joy' : 0, 'fear': 1, 'anger': 2})

df.head(5)

Unnamed: 0,Comment,Emotion,preprocessed_comment,Emotion_num
0,i seriously hate one subject to death but now ...,fear,seriously hate subject death feel reluctant drop,1
1,im so full of life i feel appalled,anger,m life feel appalled,2
2,i sit here to write i start to dig out my feel...,fear,sit write start dig feeling think afraid accep...,1
3,ive been really angry with r and i feel like a...,joy,ve angry r feel like idiot trust place,0
4,i feel suspicious if there is no one outside l...,fear,feel suspicious outside like rapture happen,1
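
`Series.map` with a dictionary silently returns `NaN` for any value that is missing from the dictionary, so it is worth checking that every label was covered. The sketch below uses toy labels ("sadness" is a made-up extra label, not one from this dataset) and also builds the inverse mapping, which is handy for decoding predictions later.

```python
import pandas as pd

label_to_id = {"joy": 0, "fear": 1, "anger": 2}

# Toy labels; "sadness" is a made-up label that the mapping does not cover.
labels = pd.Series(["joy", "fear", "anger", "sadness"])
encoded = labels.map(label_to_id)

# Unmapped labels come back as NaN, which also turns the column into floats.
unmapped = labels[encoded.isna()].unique().tolist()
print(encoded.tolist())
print("unmapped labels:", unmapped)

# Inverse mapping for turning predicted ids back into emotion names.
id_to_label = {idx: name for name, idx in label_to_id.items()}
print(id_to_label[0])
```

On the real dataframe, `df['Emotion_num'].isna().sum()` should be 0 after the mapping.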


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Split data into train and test**</p>

In [17]:
X_train, X_test, y_train, y_test = train_test_split(df['preprocessed_comment'], df['Emotion_num'], 
                                                    test_size=0.2, random_state=42, stratify=df['Emotion_num'])

In [18]:
print("Shape of X_train: ", X_train.shape)
print("Shape of X_test: ", X_test.shape)

Shape of X_train:  (4749,)
Shape of X_test:  (1188,)
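
The `stratify` argument keeps the class proportions roughly the same in both splits. The sketch below checks this on stand-in labels that only reproduce the class counts reported earlier. The features are dummy values, and the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in labels with the class counts reported earlier (joy=0, fear=1, anger=2).
y = np.array([0] * 2000 + [1] * 1937 + [2] * 2000)
X = np.arange(len(y))  # dummy features; only the labels matter here

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Per-class counts in each split.
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print("train:", train_counts, "test:", test_counts)
print("test share per class:", np.round(test_counts / np.bincount(y), 3))
```

The per-class test counts should line up with the `support` column of the classification reports further down.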


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Convert text column to numeric vector**</p>

In [19]:
v = TfidfVectorizer()

X_train_cv = v.fit_transform(X_train)
X_test_cv = v.transform(X_test)

# Vocabulary learned from the training set (term -> column index)
print(v.vocabulary_)
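
To make the vectorization step more concrete, the sketch below fits a `TfidfVectorizer` on two of the preprocessed comments shown earlier. It illustrates two default behaviours: single-character tokens such as "r" and "m" are dropped by the default `token_pattern`, and words that were not seen during `fit` are ignored by `transform`. The variable names are chosen for this example only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = [
    "ve angry r feel like idiot trust place",
    "m life feel appalled",
]
vec = TfidfVectorizer()
X = vec.fit_transform(train_docs)

# One row per document, one column per vocabulary term.
print(X.shape)
print(sorted(vec.vocabulary_))

# Single-character tokens ("r", "m") are absent: the default token_pattern
# only keeps tokens made of two or more word characters.
print("r" in vec.vocabulary_, "ve" in vec.vocabulary_)

# Words never seen during fit are ignored at transform time.
unseen = vec.transform(["zzz qqq"])
print(unseen.nnz)
```

This is also why the vectorizer is fitted on `X_train` only and then reused on `X_test`: the test set must be mapped onto the vocabulary learned from the training data.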



# <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Machine Learning Model**</p>

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">1.**Naive Bayes**</p>

In [20]:
NB_model = MultinomialNB()

# Model training
NB_model.fit(X_train_cv, y_train)
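
An alternative way to organise this step is scikit-learn's `Pipeline`, which bundles the vectorizer and the classifier into one object so that prediction works directly on preprocessed strings. This is a sketch on made-up toy data, not the setup used in this notebook, and `nb_pipeline` is a name introduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data (made up for illustration); ids follow joy=0, fear=1, anger=2.
texts = ["feel happy glad", "feel scared afraid", "feel angry furious"]
labels = [0, 1, 2]

# The pipeline fits the vectorizer and the classifier together, so predict()
# takes strings instead of a separately transformed matrix.
nb_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
nb_pipeline.fit(texts, labels)

pred = nb_pipeline.predict(["happy glad", "angry furious"])
print(pred)
```

Keeping both steps in one object also makes it harder to accidentally fit the vectorizer on test data.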

In [21]:
# Get prediction
y_pred = NB_model.predict(X_test_cv)

In [22]:
# Print accuracy score
print(accuracy_score(y_test, y_pred))

0.9031986531986532


In [23]:
# Print classification report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.90      0.89      0.89       400
           1       0.91      0.90      0.91       388
           2       0.90      0.92      0.91       400

    accuracy                           0.90      1188
   macro avg       0.90      0.90      0.90      1188
weighted avg       0.90      0.90      0.90      1188
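
The classification report summarises each class separately, but it does not show which classes get confused with each other. A confusion matrix fills that gap. The sketch below uses small made-up label lists so the result is easy to verify by hand.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, made up for illustration (joy=0, fear=1, anger=2).
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 2, 1, 1, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat, labels=[0, 1, 2])
print(cm)
```

In the notebook the same call would be `confusion_matrix(y_test, y_pred)`.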




## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">2.**Random Forest**</p>

In [24]:
RFC_model = RandomForestClassifier()

RFC_model.fit(X_train_cv, y_train)

In [25]:
# Get the predictions for X_test and store them in y_pred
y_pred = RFC_model.predict(X_test_cv)

In [26]:
# Print Accuracy
print(accuracy_score(y_test, y_pred))

0.9267676767676768


In [27]:
# Print the classification report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.92      0.95      0.93       400
           1       0.92      0.93      0.92       388
           2       0.94      0.90      0.92       400

    accuracy                           0.93      1188
   macro avg       0.93      0.93      0.93      1188
weighted avg       0.93      0.93      0.93      1188
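
A random forest involves randomness (bootstrap samples and feature subsets), and the model above is created without a `random_state`, so its scores may shift slightly from run to run. The sketch below shows, on synthetic stand-in data, that fixing the seed makes the output repeatable. `fit_predict` is a helper made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class data as a stand-in for the TF-IDF matrix.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0
)

def fit_predict(seed):
    """Train a forest with the given seed and return class probabilities on X."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(X, y).predict_proba(X)

# Two independent fits with the same seed give identical probabilities.
same_seed_match = bool((fit_predict(42) == fit_predict(42)).all())
print("same seed, identical output:", same_seed_match)
```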




# <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Test Model**</p>

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Get text**</p>

In [28]:
test_text = df['Comment'][2000]
test_text

'im looking good and feeling good other than this crappy cold im dealing with'

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Apply preprocess**</p>

In [29]:
test_text_processed = [preprocess(test_text)]
test_text_processed

['m look good feel good crappy cold m deal']

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Convert to vector**</p>

In [30]:
test_text_vc = v.transform(test_text_processed)

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Get Prediction**</p>

In [31]:
# Keep the prediction in its own variable so test_text still holds the raw comment
prediction = RFC_model.predict(test_text_vc)

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Output**</p>

In [32]:
print(f"{df['Emotion'][2000]} -> {df['Emotion_num'][2000]}")
print(prediction)

joy -> 0
[0]
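
The steps in this section (preprocess, vectorize, predict) can be wrapped in one helper that also decodes the numeric id back to an emotion name and reports class probabilities. The sketch below is a hypothetical helper, not part of the original notebook. It trains a small forest on made-up toy texts as a stand-in for the fitted `v` and `RFC_model`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Inverse of the encoding used for the Emotion_num column.
ID_TO_LABEL = {0: "joy", 1: "fear", 2: "anger"}

def predict_emotion(text, vectorizer, model, preprocess_fn=lambda s: s):
    """Return (label, {label: probability}) for one raw comment."""
    features = vectorizer.transform([preprocess_fn(text)])
    probs = model.predict_proba(features)[0]
    by_label = {ID_TO_LABEL[c]: float(p) for c, p in zip(model.classes_, probs)}
    best = max(by_label, key=by_label.get)
    return best, by_label

# Toy stand-ins for the fitted `v` and `RFC_model` (training texts are made up).
toy_texts = ["feel happy glad", "feel scared afraid", "feel angry furious"] * 5
toy_labels = [0, 1, 2] * 5
toy_vec = TfidfVectorizer()
toy_model = RandomForestClassifier(n_estimators=25, random_state=0)
toy_model.fit(toy_vec.fit_transform(toy_texts), toy_labels)

label, probs = predict_emotion("happy glad", toy_vec, toy_model)
print(label, probs)
```

In the notebook the call would look like `predict_emotion(df['Comment'][2000], v, RFC_model, preprocess)`.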
