### About Data: Emotion Detection
Credits: https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp

This data consists of two columns:

- Comment
- Emotion

Comment holds statements or messages about a particular event or situation.

Emotion labels each comment with one of six emotions: sadness, anger 😡, love, surprise, fear 😨, or joy 😂.

With six classes, this is a multi-class classification problem.

In [64]:
import pandas as pd
import spacy

nlp=spacy.load("en_core_web_sm")

In [65]:
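def read_file(file):
    # Each line holds "sentence;label" with no header row.
    return pd.read_csv(file, sep=";", names=["sentence", "label"])
# Train on train + validation combined; keep the test split separate.
traindf = pd.concat([read_file("assets/train.txt"), read_file("assets/val.txt")])
testdf = read_file("assets/test.txt")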
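
To make the file layout concrete, here is a minimal sketch that parses the same `sentence;label` format from an in-memory string rather than from `assets/train.txt`. The three rows are taken from the `head()` outputs in this notebook; the names `sample`, `df`, and `counts` are introduced only for this illustration.

```python
import io

import pandas as pd

# Stand-in for a file such as assets/train.txt: one "sentence;label" pair per line.
sample = io.StringIO(
    "i didnt feel humiliated;sadness\n"
    "i am feeling grouchy;anger\n"
    "im updating my blog because i feel shitty;sadness\n"
)

# Same call as read_file: there is no header row, so column names are given explicitly.
df = pd.read_csv(sample, sep=";", names=["sentence", "label"])

# Count the examples per emotion to see how balanced the classes are.
counts = df["label"].value_counts()
print(df.shape)
print(counts)
```

Running `value_counts()` on the real `traindf["label"]` in the same way would show how unevenly the six emotions are represented.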

In [66]:
traindf.head()

| | sentence | label |
|---|---|---|
| 0 | i didnt feel humiliated | sadness |
| 1 | i can go from feeling so hopeless to so damned... | sadness |
| 2 | im grabbing a minute to post i feel greedy wrong | anger |
| 3 | i am ever feeling nostalgic about the fireplac... | love |
| 4 | i am feeling grouchy | anger |


In [67]:
testdf.head()

| | sentence | label |
|---|---|---|
| 0 | im feeling rather rotten so im not very ambiti... | sadness |
| 1 | im updating my blog because i feel shitty | sadness |
| 2 | i never make her separate from me because i do... | sadness |
| 3 | i left with my bouquet of red and yellow tulip... | joy |
| 4 | i was feeling a little vain when i did this one | sadness |


In [68]:
traindf.shape

(18000, 2)

In [69]:
testdf.shape

(2000, 2)

In [70]:
catagorys={j:i for i,j in enumerate(traindf["label"].unique())}
catagorys

{'sadness': 0, 'anger': 1, 'love': 2, 'surprise': 3, 'fear': 4, 'joy': 5}

In [71]:
traindf["label"]=traindf["label"].map(catagorys)
testdf["label"]=testdf["label"].map(catagorys)
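
Since the labels are now integers, the model's predictions will be integers as well. A small sketch of turning them back into emotion names by inverting the mapping; the dictionary is copied from the output above, while `id_to_label` and `predicted_ids` are names made up for this illustration.

```python
# Mapping copied from the notebook output above.
catagorys = {'sadness': 0, 'anger': 1, 'love': 2, 'surprise': 3, 'fear': 4, 'joy': 5}

# Invert it so integer ids can be turned back into emotion names.
id_to_label = {idx: name for name, idx in catagorys.items()}

# Hypothetical model output, used only to show the decoding step.
predicted_ids = [0, 5, 1]
decoded = [id_to_label[i] for i in predicted_ids]
print(decoded)  # ['sadness', 'joy', 'anger']
```

Note that the mapping is built from the training labels only, so a label that appeared only in the test split would be mapped to `NaN` by `Series.map`.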

In [75]:
testdf.head()

| | sentence | label |
|---|---|---|
| 0 | im feeling rather rotten so im not very ambiti... | 0 |
| 1 | im updating my blog because i feel shitty | 0 |
| 2 | i never make her separate from me because i do... | 0 |
| 3 | i left with my bouquet of red and yellow tulip... | 5 |
| 4 | i was feeling a little vain when i did this one | 0 |


In [76]:
x_train=traindf["sentence"]
y_train=traindf["label"]
x_test=testdf["sentence"]
y_test=testdf["label"]

In [77]:
print(x_train.head())
print(y_train.head())
print(x_test.head())
print(y_test.head())

0                              i didnt feel humiliated
1    i can go from feeling so hopeless to so damned...
2     im grabbing a minute to post i feel greedy wrong
3    i am ever feeling nostalgic about the fireplac...
4                                 i am feeling grouchy
Name: sentence, dtype: object
0    0
1    0
2    1
3    2
4    1
Name: label, dtype: int64
0    im feeling rather rotten so im not very ambiti...
1            im updating my blog because i feel shitty
2    i never make her separate from me because i do...
3    i left with my bouquet of red and yellow tulip...
4      i was feeling a little vain when i did this one
Name: sentence, dtype: object
0    0
1    0
2    0
3    5
4    0
Name: label, dtype: int64


In [88]:
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer

tfidfVectorizer = TfidfVectorizer()

def preprocess(text):
    """Drop stop words, punctuation, and whitespace tokens using spaCy."""
    doc = nlp(text)
    no_stop_words = [
        token.text
        for token in doc
        if not token.is_stop and not token.is_punct and not token.is_space
    ]
    return " ".join(no_stop_words)

def get_pipeline(model):
    """Chain the TF-IDF vectorizer with the given classifier."""
    return Pipeline([
        ("vectorizer", tfidfVectorizer),
        ("model", model),
    ])

def getreport(pipe):
    """Fit on the training split and report metrics on the test split."""
    pipe.fit(x_train, y_train)
    y_pred = pipe.predict(x_test)
    return classification_report(y_test, y_pred)
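
As a rough illustration of what `get_pipeline` and `getreport` do together, the sketch below fits a TF-IDF plus Multinomial Naive Bayes pipeline on a handful of made-up sentences. The toy sentences, their labels, and the `toy_*` names are assumptions for the example, not rows from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up training data: 0 = sadness, 1 = anger, 5 = joy (ids as in the mapping above).
toy_sentences = [
    "feel hopeless and miserable",
    "feel furious and irritated",
    "feel happy and cheerful",
]
toy_labels = [0, 1, 5]

toy_pipe = Pipeline([
    ("vectorizer", TfidfVectorizer()),  # text -> sparse TF-IDF matrix
    ("model", MultinomialNB()),         # classifier over the TF-IDF features
])

# fit() runs the vectorizer's fit_transform, then trains the classifier on the result.
toy_pipe.fit(toy_sentences, toy_labels)

# predict() reuses the fitted vocabulary; words unseen during fit are ignored.
prediction = toy_pipe.predict(["so happy and cheerful today"])
print(prediction)
```

Wrapping both steps in one `Pipeline` means the vocabulary is learned from the training text only and then reused unchanged at prediction time.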

In [93]:
x_test=x_test.apply(preprocess)

In [94]:
x_train=x_train.apply(preprocess)
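
Calling `nlp` once per row through `apply` can be slow on 18,000 sentences. One possible alternative, sketched here under assumptions, is to stream the texts through `nlp.pipe`, which processes them in batches. The sketch uses `spacy.blank("en")`, a tokenizer-only English pipeline, so that it runs without a downloaded model; `preprocess_many` and `blank_nlp` are hypothetical names, not part of the notebook.

```python
import spacy

# Tokenizer-only English pipeline; is_stop, is_punct and is_space are lexical
# attributes, so they are available without a trained model.
blank_nlp = spacy.blank("en")

def preprocess_many(texts, nlp_obj, batch_size=256):
    """Hypothetical batch variant of preprocess(): same token filter, via nlp.pipe."""
    cleaned = []
    for doc in nlp_obj.pipe(texts, batch_size=batch_size):
        kept = [
            token.text
            for token in doc
            if not token.is_stop and not token.is_punct and not token.is_space
        ]
        cleaned.append(" ".join(kept))
    return cleaned

result = preprocess_many(["i am feeling grouchy", "i didnt feel humiliated"], blank_nlp)
print(result)
```

With the notebook's own objects, the same helper could be applied as `preprocess_many(x_train, nlp)` and the returned list wrapped back into a `pd.Series`; whether this is noticeably faster depends on the pipeline components that are loaded.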


In [95]:
from sklearn.naive_bayes import MultinomialNB


piplin1=get_pipeline(
    MultinomialNB()
)
piplin1

In [96]:
print(getreport(piplin1))

              precision    recall  f1-score   support

           0       0.71      0.92      0.80       581
           1       0.96      0.40      0.56       275
           2       1.00      0.08      0.14       159
           3       0.00      0.00      0.00        66
           4       0.92      0.35      0.50       224
           5       0.66      0.99      0.79       695

    accuracy                           0.71      2000
   macro avg       0.71      0.45      0.47      2000
weighted avg       0.75      0.71      0.65      2000
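
The two summary rows can be reproduced from the per-class rows. The following worked check uses the rounded recall and support values printed above, so it matches the report only up to rounding: the macro average is the plain mean over classes, while the weighted average weights each class by its support.

```python
# Per-class recall and support, copied from the report above (classes 0..5).
recall = [0.92, 0.40, 0.08, 0.00, 0.35, 0.99]
support = [581, 275, 159, 66, 224, 695]

# Macro average: every class counts equally, so rare classes pull it down.
macro_recall = sum(recall) / len(recall)

# Weighted average: each class contributes in proportion to its support.
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

print(f"macro recall:    {macro_recall:.3f}")
print(f"weighted recall: {weighted_recall:.3f}")
```

The gap between the two figures reflects that the model does well on the two largest classes (0 and 5) and poorly on the smaller ones; class 3 has a recall of 0.00, meaning none of its 66 test examples were classified correctly.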



