# Emotion Text Detection Using Emotion Dataset


#### Link to the Dataset: https://github.com/Jcharis/end2end-nlp-project/blob/main/notebooks/data/emotion_dataset_raw.csv


## Import the Necessary Libraries

In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt

# ML packages
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import GaussianNB
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier

# Vectorizer
from sklearn.feature_extraction.text import CountVectorizer
# Split the dataset
from sklearn.model_selection import train_test_split
# Metrics
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


## Data Cleaning
We first inspect the dataset and check for missing values, then clean the text so that it can be used to train the machine learning models.

In [2]:
df=pd.read_csv('archive/emotion_dataset_raw.csv')

In [59]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34792 entries, 0 to 34791
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Emotion     34792 non-null  object
 1   Text        34792 non-null  object
 2   Clean_text  34792 non-null  object
dtypes: object(3)
memory usage: 815.6+ KB


In [4]:
df.isnull().sum()

Emotion    0
Text       0
dtype: int64

In [5]:
df['Emotion'].value_counts()

joy         11045
sadness      6722
fear         5410
anger        4297
surprise     4062
neutral      2254
disgust       856
shame         146
Name: Emotion, dtype: int64
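
The classes are clearly imbalanced, which matters when reading the accuracy scores further down. As a rough point of reference, the sketch below computes the accuracy of always predicting the most frequent class, using only the counts printed above. The figure is for the full dataset; the value on a particular test split may differ slightly.

```python
# Class counts copied from the value_counts() output above
counts = {
    "joy": 11045, "sadness": 6722, "fear": 5410, "anger": 4297,
    "surprise": 4062, "neutral": 2254, "disgust": 856, "shame": 146,
}

total = sum(counts.values())                    # number of rows in the dataset
majority_label = max(counts, key=counts.get)    # most frequent emotion
baseline_accuracy = counts[majority_label] / total

print(f"{majority_label}: {baseline_accuracy:.3f}")  # joy: 0.317
```

Any model scoring close to this value is doing little better than guessing the majority class.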

In [6]:
import neattext.functions as nfx

In [7]:
dir(nfx)

['BTC_ADDRESS_REGEX',
 'CURRENCY_REGEX',
 'CURRENCY_SYMB_REGEX',
 'Counter',
 'DATE_REGEX',
 'EMAIL_REGEX',
 'EMOJI_REGEX',
 'HASTAG_REGEX',
 'MASTERCard_REGEX',
 'MD5_SHA_REGEX',
 'MOST_COMMON_PUNCT_REGEX',
 'NUMBERS_REGEX',
 'PHONE_REGEX',
 'PoBOX_REGEX',
 'SPECIAL_CHARACTERS_REGEX',
 'STOPWORDS',
 'STOPWORDS_de',
 'STOPWORDS_en',
 'STOPWORDS_es',
 'STOPWORDS_fr',
 'STOPWORDS_ru',
 'STOPWORDS_yo',
 'STREET_ADDRESS_REGEX',
 'TextFrame',
 'URL_PATTERN',
 'USER_HANDLES_REGEX',
 'VISACard_REGEX',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__generate_text',
 '__loader__',
 '__name__',
 '__numbers_dict',
 '__package__',
 '__spec__',
 '_lex_richness_herdan',
 '_lex_richness_maas_ttr',
 'clean_text',
 'defaultdict',
 'digit2words',
 'extract_btc_address',
 'extract_currencies',
 'extract_currency_symbols',
 'extract_dates',
 'extract_emails',
 'extract_emojis',
 'extract_hashtags',
 'extract_html_tags',
 'extract_mastercard_addr',
 'extract_md5sha',
 'extract_numbers',
 'extr

In [8]:
df['Clean_text']=df['Text'].apply(nfx.remove_userhandles)

In [9]:
df['Clean_text']=df['Clean_text'].apply(nfx.remove_hashtags)

In [10]:
df['Clean_text']=df['Clean_text'].apply(nfx.remove_non_ascii)

In [11]:
df['Clean_text']=df['Clean_text'].apply(nfx.remove_punctuations)

In [12]:
df['Clean_text']=df['Clean_text'].apply(nfx.remove_special_characters)

In [13]:
df['Clean_text']=df['Clean_text'].apply(nfx.remove_urls)

In [14]:
df['Clean_text']=df['Clean_text'].apply(nfx.remove_emojis)
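
The order of the cleaning steps can matter: in the cells above, punctuation and special characters are stripped before URLs, so by the time `remove_urls` runs a link has likely lost its `://` and dots and may no longer be recognized. The sketch below shows the same idea in one function with URLs removed first. It uses plain regular expressions rather than `neattext`, and the patterns and the `clean_tweet` name are illustrative assumptions, not the library's own definitions.

```python
import re

# Illustrative patterns (assumptions, not neattext's own regexes)
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")

def clean_tweet(text):
    """Remove URLs, handles and hashtags first, then punctuation/symbols."""
    text = URL_RE.sub("", text)        # must run before punctuation removal
    text = HANDLE_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    text = NON_ALNUM_RE.sub("", text)  # drops punctuation, emojis, non-ASCII
    return " ".join(text.split())      # collapse leftover whitespace

cleaned = clean_tweet("@user check https://example.com/a?b=1 now!! #wow")
print(cleaned)  # check now
```

Keeping the steps in a single function also makes it easy to apply exactly the same cleaning to any new text before prediction, for example with `df['Text'].apply(clean_tweet)`.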

In [15]:
df.head()

Unnamed: 0,Emotion,Text,Clean_text
0,neutral,Why ?,Why
1,joy,Sage Act upgrade on my to do list for tommorow.,Sage Act upgrade on my to do list for tommorow
2,sadness,ON THE WAY TO MY HOMEGIRL BABY FUNERAL!!! MAN ...,ON THE WAY TO MY HOMEGIRL BABY FUNERAL MAN I H...
3,joy,Such an eye ! The true hazel eye-and so brill...,Such an eye The true hazel eyeand so brillia...
4,joy,@Iluvmiasantos ugh babe.. hugggzzz for u .! b...,ugh babe hugggzzz for u babe naamazed nga ...


In [16]:
# drop() returns a new DataFrame; df itself still contains the 'Text' column
df.drop('Text', axis=1)

Unnamed: 0,Emotion,Clean_text
0,neutral,Why
1,joy,Sage Act upgrade on my to do list for tommorow
2,sadness,ON THE WAY TO MY HOMEGIRL BABY FUNERAL MAN I H...
3,joy,Such an eye The true hazel eyeand so brillia...
4,joy,ugh babe hugggzzz for u babe naamazed nga ...
...,...,...
34787,surprise,have you gift Hope you like it Its hand made...
34788,joy,The world didnt give it to meso the world MOST...
34789,anger,A man robbed me today
34790,fear,Youu call it JEALOUSY I call it of YOU


### Preparing Features and Labels for the Machine Learning Model

We will first train several machine learning models and compare their accuracy, then choose the most accurate one.

In [17]:
features=df['Clean_text']
labels=df['Emotion']

In [18]:
cv=CountVectorizer()
x=cv.fit_transform(features)

In [19]:
X_train, X_test, y_train, y_test= train_test_split(x, labels, test_size=0.3, random_state=42)
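
In the cells above, `CountVectorizer` is fitted on all of the text before the train/test split, so the vocabulary is learned partly from rows that later end up in the test set. One common alternative, sketched here on toy data, is to split the raw text first and wrap the vectorizer and classifier in a scikit-learn pipeline. The `texts` and `labels` lists are hypothetical stand-ins for `df['Clean_text']` and `df['Emotion']`, and `stratify` is an added option that keeps class proportions similar in both splits.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); not taken from the emotion dataset
texts = [
    "i am so happy today", "what a wonderful day",
    "this is great news", "i love this",
    "i feel terrible", "this is so sad",
    "i miss you badly", "what an awful loss",
]
labels = ["joy"] * 4 + ["sadness"] * 4

# Split the raw text first, so the test rows never influence the vocabulary
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits the vectorizer on the training text only
pipeline = make_pipeline(
    CountVectorizer(),
    LogisticRegression(solver="liblinear", random_state=0),
)
pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)  # raw strings go in directly
print(list(predictions))
```

A fitted pipeline accepts raw strings, so there is no need to call `cv.transform` separately before predicting.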

In [20]:
nv_model=MultinomialNB()
nv_model.fit(X_train, y_train)


MultinomialNB()

In [21]:
nv_model.score(X_test, y_test)

0.5602605863192183

In [22]:
nv_model.predict(X_test)

array(['fear', 'sadness', 'sadness', ..., 'sadness', 'joy', 'sadness'],
      dtype='<U8')

In [23]:
sample=['You guys are nuts WHY ARE YOU GOING THERE WHEN YOU KNOW ITS RISKY']
sample_cv=cv.transform(sample).toarray()

In [24]:
nv_model.predict(sample_cv)

array(['fear'], dtype='<U8')

In [113]:
lg=LogisticRegression(random_state=0, solver='liblinear')
lg.fit(X_train, y_train)

LogisticRegression(random_state=0, solver='liblinear')

In [114]:
lg.score(X_test,y_test)

0.6313470013412531

In [27]:
lg.predict(sample_cv)

array(['neutral'], dtype=object)

In [28]:
classifier= tree.DecisionTreeClassifier(random_state = 0)
classifier.fit(X_train, y_train)

DecisionTreeClassifier(random_state=0)

In [29]:
classifier.score(X_test, y_test)

0.5201187967043495

In [30]:
classifier.predict(sample_cv)

array(['sadness'], dtype=object)

In [31]:
RF_classifier=RandomForestClassifier(n_estimators=200, random_state=0)
RF_classifier.fit(X_train, y_train)

In [32]:
RF_classifier.score(X_test, y_test)
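
Accuracy alone can hide poor performance on rare classes such as `shame` or `disgust`. The notebook imports `classification_report` and `confusion_matrix` but does not call them; as a minimal illustration of the kind of per-class number they report, the sketch below computes recall per class in plain Python. The `y_true` and `y_pred` lists are made-up examples, not results from the models above.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical labels, for illustration only
y_true = ["joy", "joy", "joy", "shame", "fear", "fear"]
y_pred = ["joy", "joy", "fear", "joy", "fear", "fear"]

recall = per_class_recall(y_true, y_pred)
print(recall)
```

In this toy case overall accuracy is 4 out of 6, yet the single `shame` example is missed entirely, which a per-class view makes visible. On the real data, `classification_report(y_test, model.predict(X_test))` would give precision, recall and F1 for each emotion.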

In [52]:
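def emotion_detection(data, model):
    # Vectorize with the CountVectorizer fitted on the training text
    vec=cv.transform(data).toarray()
    prediction=model.predict(vec)
    # Class probabilities are computed here but not returned by this version
    prob_pred=model.predict_proba(vec)
    return prediction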

In [40]:
emotion_detection(sample, lg)

Prediction is ['neutral'], with probability 0.4799192636466322


{'anger': 0.036011441772148406,
 'disgust': 0.04082495069407158,
 'fear': 0.14599408864597282,
 'joy': 0.01312176848648796,
 'neutral': 0.4799192636466322,
 'sadness': 0.051507894813119716,
 'shame': 0.0003268747682437447,
 'surprise': 0.23229371717332353}
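
The output above includes per-class probabilities, while the `emotion_detection` function as listed returns only the predicted labels. A minimal sketch of a variant that returns both is shown below. The name `emotion_detection_with_proba` and the explicit `vectorizer` argument are assumptions made for illustration; the function relies only on the `transform`, `predict_proba` and `classes_` members that scikit-learn vectorizers and probabilistic classifiers provide.

```python
def emotion_detection_with_proba(texts, model, vectorizer):
    """Return a (label, {class: probability}) pair for each input text."""
    vec = vectorizer.transform(texts)          # same vectorizer used in training
    probabilities = model.predict_proba(vec)   # one row per text, one column per class
    results = []
    for row in probabilities:
        # Pair each class name with its probability for this text
        scores = {str(label): float(p) for label, p in zip(model.classes_, row)}
        best = max(scores, key=scores.get)     # class with the highest probability
        results.append((best, scores))
    return results
```

With the objects defined earlier, a call such as `emotion_detection_with_proba(sample, lg, cv)` would return a one-element list holding the label and its probability dictionary.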

## Using the Model to Predict Emotions in climate-pp.csv and meta_nc.csv

In [41]:
test_df=pd.read_csv('archive/climate-pp.csv')

In [42]:
test_df

Unnamed: 0,DocText,DocDate
0,400 private jets too the climate change summit...,2021-11-02 23:23:58
1,“Covid” restrictions are so much like “Climate...,2021-11-02 23:22:22
2,Thank you to everybody who made this a success...,2021-11-02 23:09:11
3,anti-climate change **** deserve love and respect,2021-11-02 22:27:22
4,Very alarming ang effects ng climate change es...,2021-11-02 22:26:23
...,...,...
361,Always love @amelia_draper on @nbcwashington p...,2021-03-16 22:42:09
362,Thank you @NYSenDems for including the $3 bill...,2021-03-16 22:20:31
363,"Thank you so much @Kevin_Fong @astro_timpeake,...",2021-03-16 20:55:31
364,Always love seeing representatives of actual p...,2021-03-15 23:46:38


In [53]:
test=test_df['DocText']

In [56]:
test_df['Prediction']=emotion_detection(test, lg)

In [58]:
test_df

Unnamed: 0,DocText,DocDate,Prediction
0,400 private jets too the climate change summit...,2021-11-02 23:23:58,fear
1,“Covid” restrictions are so much like “Climate...,2021-11-02 23:22:22,sadness
2,Thank you to everybody who made this a success...,2021-11-02 23:09:11,joy
3,anti-climate change **** deserve love and respect,2021-11-02 22:27:22,joy
4,Very alarming ang effects ng climate change es...,2021-11-02 22:26:23,joy
...,...,...,...
361,Always love @amelia_draper on @nbcwashington p...,2021-03-16 22:42:09,joy
362,Thank you @NYSenDems for including the $3 bill...,2021-03-16 22:20:31,joy
363,"Thank you so much @Kevin_Fong @astro_timpeake,...",2021-03-16 20:55:31,surprise
364,Always love seeing representatives of actual p...,2021-03-15 23:46:38,joy


In [115]:
meta_test=pd.read_csv('archive/meta_nc.csv')

In [116]:
meta_test

Unnamed: 0,DocText,DocDate
0,Facebook doesn’t have a great rep when it come...,2021-10-28 23:59:52
1,Wasn’t Meta a superhero in one of the Pixar mo...,2021-10-28 23:59:51
2,Not that it will lead anywhere but the idea be...,2021-10-28 23:59:27
3,Zuckerburg changing FB to Meta just confirms w...,2021-10-28 23:59:23
4,Everytime i open twitter now i see a headline ...,2021-10-28 23:58:55
...,...,...
2642,bad team? oh no its just anti meta you wouldnt...,2021-10-27 00:46:13
2643,no one is a must pull in this game not zhongli...,2021-10-27 00:29:15
2644,Isn't drawing a tree on paper actually meta ?,2021-10-27 00:20:24
2645,"""isn't that awesome"" -dad's gf referring to ho...",2021-10-27 00:07:42


In [117]:
meta_text=meta_test['DocText']

In [119]:
meta_test['Predictions']=emotion_detection(meta_text, lg)

In [120]:
meta_test

Unnamed: 0,DocText,DocDate,Predictions
0,Facebook doesn’t have a great rep when it come...,2021-10-28 23:59:52,joy
1,Wasn’t Meta a superhero in one of the Pixar mo...,2021-10-28 23:59:51,neutral
2,Not that it will lead anywhere but the idea be...,2021-10-28 23:59:27,anger
3,Zuckerburg changing FB to Meta just confirms w...,2021-10-28 23:59:23,surprise
4,Everytime i open twitter now i see a headline ...,2021-10-28 23:58:55,joy
...,...,...,...
2642,bad team? oh no its just anti meta you wouldnt...,2021-10-27 00:46:13,sadness
2643,no one is a must pull in this game not zhongli...,2021-10-27 00:29:15,anger
2644,Isn't drawing a tree on paper actually meta ?,2021-10-27 00:20:24,surprise
2645,"""isn't that awesome"" -dad's gf referring to ho...",2021-10-27 00:07:42,disgust
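
A quick way to summarize predictions like these is to count how often each emotion occurs. The sketch below does this with `collections.Counter` for only the nine rows visible in the truncated display above, so the shares it prints say nothing about the full file. On the complete DataFrame, `meta_test['Predictions'].value_counts(normalize=True)` would serve the same purpose.

```python
from collections import Counter

# Predictions for the nine rows visible in the truncated output above
visible_predictions = [
    "joy", "neutral", "anger", "surprise", "joy",
    "sadness", "anger", "surprise", "disgust",
]

counts = Counter(visible_predictions)
total = sum(counts.values())

# Share of each predicted emotion among the visible rows
shares = {label: n / total for label, n in counts.items()}
for label, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{label:<9}{share:.2f}")
```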
