# Introduction



*   In the previous step, we cleaned our data.
*   In this notebook, we build a baseline model that detects one or more emotions in a text, based on the GoEmotions data (multi-label text classification).
*   The score of our baseline model will be used as a reference when building more complex models.






# 1 - Importing libraries and loading data

First, let's install and import some libraries for data exploration and processing.

# Installing additional libraries for text preprocessing
!pip install emoji
!pip install contractions

In [90]:
# Data manipulation libraries
import pandas as pd
import numpy as np
import json
from pprint import pprint
import matplotlib.pyplot as plt
import seaborn as sns

# Text processing libraries
import emoji
import re
import contractions
import spacy
from spacy.lang.en.stop_words import STOP_WORDS

# Scikit-Learn packages
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import scale

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from google.colab import drive
drive.mount('/content/drive')

Now, let's import our data.

In [91]:
# Importing train, validation and test datasets with preprocessed texts and labels
train_GE = pd.read_csv("train_clean.csv")
val_GE = pd.read_csv("val_clean.csv")
test_GE = pd.read_csv("test_clean.csv")

# Shape validation
print(train_GE.shape)
print(val_GE.shape)
print(test_GE.shape)

(43410, 29)
(5426, 29)
(5427, 29)


In [92]:
# Loading emotion labels for GoEmotions taxonomy
with open("emotions.txt", "r") as file:
    GE_taxonomy = file.read().split("\n")

for emo in GE_taxonomy:
  print(emo)

admiration
amusement
anger
annoyance
approval
caring
confusion
curiosity
desire
disappointment
disapproval
disgust
embarrassment
excitement
fear
gratitude
grief
joy
love
nervousness
optimism
pride
realization
relief
remorse
sadness
surprise
neutral


# 2 - Preprocessings and transformations

Before defining and constructing a baseline model, we need to perform some additional processings such as tokenizing and lemmatizing our samples.

## 2.1 - Additional preprocessings for basic Machine Learning tasks

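First, let's remove punctuation, digits and any other character that is not a letter or an underscore. Underscores are kept, which preserves emoji names such as `smiling_face`.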

In [93]:
# Additional text preprocessing
train_GE['Clean_text'] = train_GE['Clean_text'].apply(lambda x: re.sub(r"[^A-Za-z_]+"," ", x))
test_GE['Clean_text'] = test_GE['Clean_text'].apply(lambda x: re.sub(r"[^A-Za-z_]+"," ", x))
val_GE['Clean_text'] = val_GE['Clean_text'].apply(lambda x: re.sub(r"[^A-Za-z_]+"," ", x))
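
To make the effect of this pattern concrete, here is a minimal sketch on a made-up sentence. The helper name `keep_letters_only` and the sample text are illustrative assumptions, not part of the notebook. Every run of characters other than letters and underscores collapses to a single space, so digits disappear along with punctuation.

```python
import re

def keep_letters_only(text):
    # Same pattern as in the cell above: anything that is not a letter or "_" becomes a space
    return re.sub(r"[^A-Za-z_]+", " ", text)

# Made-up sample sentence (assumption for illustration)
sample = "wow !!! 3 times better smiling_face"
cleaned = keep_letters_only(sample)
print(cleaned)
```

Note that `!` and `?` are also removed here, even though the original cleaning function keeps them.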

In [94]:
train_GE_no_neu = train_GE.drop('neutral', axis=1)
test_GE_no_neu = test_GE.drop('neutral', axis=1)
val_GE_no_neu = val_GE.drop('neutral', axis=1)

In [95]:
# Removing samples with only 0 in their labels
train_GE_no_neu = train_GE_no_neu.loc[ train_GE_no_neu.apply(lambda x: sum(x[1:]), axis=1)>0 ]
val_GE_no_neu = val_GE_no_neu.loc[ val_GE_no_neu.apply(lambda x: sum(x[1:]), axis=1)>0 ]
test_GE_no_neu = test_GE_no_neu.loc[ test_GE_no_neu.apply(lambda x: sum(x[1:]), axis=1)>0 ]

# Shape validation
print(train_GE_no_neu.shape)
print(val_GE_no_neu.shape)
print(test_GE_no_neu.shape)

(30587, 28)
(3834, 28)
(3821, 28)


Now we can tokenize our samples using spaCy, and more specifically its small English model. Once the tokens are created, we can lemmatize them and remove English stop words that may not help with the classification task.

# Download model 
!python -m spacy download en_core_web_sm -q

In [96]:
# Import English using en_core_web_sm.load()
import en_core_web_sm
nlp = en_core_web_sm.load()



In [97]:
# Creating tokenized documents
tokenized_train_GE = train_GE["Clean_text"].apply(lambda desc: nlp(desc))
tokenized_test_GE = test_GE["Clean_text"].apply(lambda desc: nlp(desc))
tokenized_val_GE = val_GE["Clean_text"].apply(lambda desc: nlp(desc))

tokenized_train_GE_no_neu = train_GE_no_neu["Clean_text"].apply(lambda desc: nlp(desc))
tokenized_test_GE_no_neu = test_GE_no_neu["Clean_text"].apply(lambda desc: nlp(desc))
tokenized_val_GE_no_neu = val_GE_no_neu["Clean_text"].apply(lambda desc: nlp(desc))

In [98]:
STOP_WORDS

{"'d",
 "'ll",
 "'m",
 "'re",
 "'s",
 "'ve",
 'a',
 'about',
 'above',
 'across',
 'after',
 'afterwards',
 'again',
 'against',
 'all',
 'almost',
 'alone',
 'along',
 'already',
 'also',
 'although',
 'always',
 'am',
 'among',
 'amongst',
 'amount',
 'an',
 'and',
 'another',
 'any',
 'anyhow',
 'anyone',
 'anything',
 'anyway',
 'anywhere',
 'are',
 'around',
 'as',
 'at',
 'back',
 'be',
 'became',
 'because',
 'become',
 'becomes',
 'becoming',
 'been',
 'before',
 'beforehand',
 'behind',
 'being',
 'below',
 'beside',
 'besides',
 'between',
 'beyond',
 'both',
 'bottom',
 'but',
 'by',
 'ca',
 'call',
 'can',
 'cannot',
 'could',
 'did',
 'do',
 'does',
 'doing',
 'done',
 'down',
 'due',
 'during',
 'each',
 'eight',
 'either',
 'eleven',
 'else',
 'elsewhere',
 'empty',
 'enough',
 'even',
 'ever',
 'every',
 'everyone',
 'everything',
 'everywhere',
 'except',
 'few',
 'fifteen',
 'fifty',
 'first',
 'five',
 'for',
 'former',
 'formerly',
 'forty',
 'four',
 'from',
 'fron

In [99]:
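# Negation-related words are kept out of the stop-word set, since they can change the meaning of a sentence
negations = ['neither','never','nevertheless','no','none','nor','not','nothing']
stop_words = {word for word in STOP_WORDS if word not in negations}
stop_words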

{"'d",
 "'ll",
 "'m",
 "'re",
 "'s",
 "'ve",
 'a',
 'about',
 'above',
 'across',
 'after',
 'afterwards',
 'again',
 'against',
 'all',
 'almost',
 'alone',
 'along',
 'already',
 'also',
 'although',
 'always',
 'am',
 'among',
 'amongst',
 'amount',
 'an',
 'and',
 'another',
 'any',
 'anyhow',
 'anyone',
 'anything',
 'anyway',
 'anywhere',
 'are',
 'around',
 'as',
 'at',
 'back',
 'be',
 'became',
 'because',
 'become',
 'becomes',
 'becoming',
 'been',
 'before',
 'beforehand',
 'behind',
 'being',
 'below',
 'beside',
 'besides',
 'between',
 'beyond',
 'both',
 'bottom',
 'but',
 'by',
 'ca',
 'call',
 'can',
 'cannot',
 'could',
 'did',
 'do',
 'does',
 'doing',
 'done',
 'down',
 'due',
 'during',
 'each',
 'eight',
 'either',
 'eleven',
 'else',
 'elsewhere',
 'empty',
 'enough',
 'even',
 'ever',
 'every',
 'everyone',
 'everything',
 'everywhere',
 'except',
 'few',
 'fifteen',
 'fifty',
 'first',
 'five',
 'for',
 'former',
 'formerly',
 'forty',
 'four',
 'from',
 'fron
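
The printed set above is cut off before the words starting with "n", so the effect of the filter is not visible in the output. The sketch below illustrates it on a toy stop-word set. The set `toy_stop_words`, the helper `remove_stop_words` and the sample sentence are assumptions made for this example; the notebook itself uses spaCy's `STOP_WORDS`.

```python
# Toy stop-word set (an assumption for illustration; the notebook uses spaCy's STOP_WORDS)
toy_stop_words = {"i", "am", "the", "not", "no", "with", "at", "all"}
negations = {"neither", "never", "nevertheless", "no", "none", "nor", "not", "nothing"}

# Keep negations out of the stop-word set, as in the cell above
filtered_stop_words = toy_stop_words - negations

def remove_stop_words(tokens, stop_set):
    # Drop every token that belongs to the given stop-word set
    return [tok for tok in tokens if tok not in stop_set]

tokens = "i am not happy with the result at all".split()
with_negation_removed = remove_stop_words(tokens, toy_stop_words)
with_negation_kept = remove_stop_words(tokens, filtered_stop_words)

print(with_negation_removed)  # the negation is lost
print(with_negation_kept)     # the negation survives
```

With the unfiltered set, "not happy" is reduced to "happy", which points to the opposite emotion.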

In [100]:
# Lemmatizing each token and removing English stop words
tokenized_train_GE = tokenized_train_GE.apply(lambda x: [token.lemma_ for token in x if token.lemma_ not in stop_words])
tokenized_test_GE = tokenized_test_GE.apply(lambda x: [token.lemma_ for token in x if token.lemma_ not in stop_words])
tokenized_val_GE = tokenized_val_GE.apply(lambda x: [token.lemma_ for token in x if token.lemma_ not in stop_words])

tokenized_train_GE_no_neu = tokenized_train_GE_no_neu.apply(lambda x: [token.lemma_ for token in x if token.lemma_ not in stop_words])
tokenized_test_GE_no_neu = tokenized_test_GE_no_neu.apply(lambda x: [token.lemma_ for token in x if token.lemma_ not in stop_words])
tokenized_val_GE_no_neu = tokenized_val_GE_no_neu.apply(lambda x: [token.lemma_ for token in x if token.lemma_ not in stop_words])

# Creating clean data in our dataframes
train_GE["Clean_token"] = [" ".join(x) for x in tokenized_train_GE]
test_GE["Clean_token"] = [" ".join(x) for x in tokenized_test_GE]
val_GE["Clean_token"] = [" ".join(x) for x in tokenized_val_GE]

train_GE_no_neu["Clean_token"] = [" ".join(x) for x in tokenized_train_GE_no_neu]
test_GE_no_neu["Clean_token"] = [" ".join(x) for x in tokenized_test_GE_no_neu]
val_GE_no_neu["Clean_token"] = [" ".join(x) for x in tokenized_val_GE_no_neu]

In [101]:
train_GE_no_neu.head(3)

Unnamed: 0,Clean_text,admiration,amusement,anger,annoyance,approval,caring,confusion,curiosity,desire,...,love,nervousness,optimism,pride,realization,relief,remorse,sadness,surprise,Clean_token
2,why the fuck is bayless isoing,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,fuck bayless isoing
3,to make her feel threatened,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,feel threaten
4,dirty southern wankers,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,dirty southern wanker


## 2.2 - Create TF-IDF matrix

Finally, we can create a TF-IDF matrix that will help us represent each sample of our corpus using the importance and frequency of each word in the sample, but also in the whole corpus.

In [102]:
# TF-IDF vectorizer with a 1000-word vocabulary
# (stop_words is passed as a list, since recent scikit-learn versions expect a list rather than a set)
vectorizer = TfidfVectorizer(stop_words=list(stop_words), max_features=1000)

# Fitting the vectorizer on the train data, then transforming the test and validation data
tfidf_train_GE = vectorizer.fit_transform(train_GE['Clean_token'])
tfidf_test_GE = vectorizer.transform(test_GE['Clean_token'])
tfidf_val_GE = vectorizer.transform(val_GE['Clean_token'])

# Refitting on the subset without neutral samples (this replaces the vocabulary learned above)
tfidf_train_GE_no_neu = vectorizer.fit_transform(train_GE_no_neu['Clean_token'])
tfidf_test_GE_no_neu = vectorizer.transform(test_GE_no_neu['Clean_token'])
tfidf_val_GE_no_neu = vectorizer.transform(val_GE_no_neu['Clean_token'])

# Converting the sparse matrices to dense arrays
tfidf_train_GE = tfidf_train_GE.toarray()
tfidf_test_GE = tfidf_test_GE.toarray()
tfidf_val_GE = tfidf_val_GE.toarray()

tfidf_train_GE_no_neu = tfidf_train_GE_no_neu.toarray()
tfidf_test_GE_no_neu = tfidf_test_GE_no_neu.toarray()
tfidf_val_GE_no_neu = tfidf_val_GE_no_neu.toarray()

# Validating the shape of train and test data
print(tfidf_train_GE.shape)
print(tfidf_test_GE.shape)
print(tfidf_val_GE.shape)
print(tfidf_train_GE_no_neu.shape)
print(tfidf_test_GE_no_neu.shape)
print(tfidf_val_GE_no_neu.shape)



(43410, 1000)
(5427, 1000)
(5426, 1000)
(30587, 1000)
(3821, 1000)
(3834, 1000)


The `max_features` argument of `TfidfVectorizer` caps the vocabulary at the 1000 terms with the highest frequency across the corpus. Each sample in the train, validation and test datasets is therefore represented by a vector of dimension `(1, 1000)`.
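
To illustrate what the weights in such a matrix mean, here is a small pure-Python sketch of TF-IDF on a made-up corpus. It assumes scikit-learn's default settings (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalization of each row). The function name `tfidf`, the toy documents and the alphabetical tie-break for the vocabulary cap are assumptions of this sketch, not the library's implementation.

```python
import math
from collections import Counter

def tfidf(docs, max_features=None):
    tokenized = [doc.split() for doc in docs]
    # Corpus-wide term counts decide which terms survive the max_features cap
    totals = Counter(tok for doc in tokenized for tok in doc)
    ranked = sorted(totals, key=lambda t: (-totals[t], t))  # ties broken alphabetically (assumption)
    vocab = sorted(ranked[:max_features])
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = {t: sum(t in doc for doc in tokenized) for t in vocab}
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        # L2-normalize each row so every document vector has unit length
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

docs = ["good good movie", "bad movie", "good day"]
vocab, matrix = tfidf(docs, max_features=3)
print(vocab)
for row in matrix:
    print([round(v, 3) for v in row])
```

With `max_features=3`, the rarest term ("day") is dropped from the vocabulary, and in the second document the rarer word "bad" receives a higher weight than the more common "movie".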




## 2.3 - Train and test variables

Let's define some explicit variables that will be used in constructing a machine learning model.

In [103]:
GE_taxonomy_no_neu = GE_taxonomy.copy()
GE_taxonomy_no_neu.remove('neutral')

In [104]:
GE_taxonomy_no_neu

['admiration',
 'amusement',
 'anger',
 'annoyance',
 'approval',
 'caring',
 'confusion',
 'curiosity',
 'desire',
 'disappointment',
 'disapproval',
 'disgust',
 'embarrassment',
 'excitement',
 'fear',
 'gratitude',
 'grief',
 'joy',
 'love',
 'nervousness',
 'optimism',
 'pride',
 'realization',
 'relief',
 'remorse',
 'sadness',
 'surprise']

In [105]:
# Defining train and test variables
X_train =  tfidf_train_GE
y_train = train_GE.loc[:,GE_taxonomy].values
X_train_no_neu =  tfidf_train_GE_no_neu
y_train_no_neu = train_GE_no_neu.loc[:,GE_taxonomy_no_neu].values

X_test =  tfidf_test_GE
y_test = test_GE.loc[:,GE_taxonomy].values
X_test_no_neu =  tfidf_test_GE_no_neu
y_test_no_neu = test_GE_no_neu.loc[:,GE_taxonomy_no_neu].values

X_val =  tfidf_val_GE
y_val = val_GE.loc[:,GE_taxonomy].values
X_val_no_neu =  tfidf_val_GE_no_neu
y_val_no_neu = val_GE_no_neu.loc[:,GE_taxonomy_no_neu].values

# Shape validation
print("The shape of X_train is : ", X_train.shape)
print("The shape of y_train is : ", y_train.shape)
print("The shape of X_train_no_neu is : ", X_train_no_neu.shape)
print("The shape of y_train_no_neu is : ", y_train_no_neu.shape)
print()
print("The shape of X_test is : ", X_test.shape)
print("The shape of y_test is : ", y_test.shape)
print("The shape of X_test_no_neu is : ", X_test_no_neu.shape)
print("The shape of y_test_no_neu is : ", y_test_no_neu.shape)
print()
print("The shape of X_val is : ", X_val.shape)
print("The shape of y_val is : ", y_val.shape)
print("The shape of X_val_no_neu is : ", X_val_no_neu.shape)
print("The shape of y_val_no_neu is : ", y_val_no_neu.shape)

The shape of X_train is :  (43410, 1000)
The shape of y_train is :  (43410, 28)
The shape of X_train_no_neu is :  (30587, 1000)
The shape of y_train_no_neu is :  (30587, 27)

The shape of X_test is :  (5427, 1000)
The shape of y_test is :  (5427, 28)
The shape of X_test_no_neu is :  (3821, 1000)
The shape of y_test_no_neu is :  (3821, 27)

The shape of X_val is :  (5426, 1000)
The shape of y_val is :  (5426, 28)
The shape of X_val_no_neu is :  (3834, 1000)
The shape of y_val_no_neu is :  (3834, 27)


### Creating additional data inputs without one-hot encoding for Ekman labels

In [106]:
df_train = pd.read_csv('https://github.com/google-research/google-research/raw/master/goemotions/data/train.tsv', sep='\t', header=None, names=['Text', 'Class', 'ID']).drop('ID', axis=1)
df_val = pd.read_csv('https://github.com/google-research/google-research/raw/master/goemotions/data/dev.tsv', sep='\t', header=None, names=['Text', 'Class', 'ID']).drop('ID', axis=1)
df_test = pd.read_csv('https://github.com/google-research/google-research/raw/master/goemotions/data/test.tsv', sep='\t', header=None, names=['Text', 'Class', 'ID']).drop('ID', axis=1)


In [107]:
df_train

Unnamed: 0,Text,Class
0,My favourite food is anything I didn't have to...,27
1,"Now if he does off himself, everyone will thin...",27
2,WHY THE FUCK IS BAYLESS ISOING,2
3,To make her feel threatened,14
4,Dirty Southern Wankers,3
...,...,...
43405,Added you mate well I’ve just got the bow and ...,18
43406,Always thought that was funny but is it a refe...,6
43407,What are you talking about? Anything bad that ...,3
43408,"More like a baptism, with sexy results!",13


In [108]:
df_train['List of classes'] = df_train['Class'].apply(lambda x: x.split(','))
df_train['Len of classes'] = df_train['List of classes'].apply(lambda x: len(x))
df_val['List of classes'] = df_val['Class'].apply(lambda x: x.split(','))
df_val['Len of classes'] = df_val['List of classes'].apply(lambda x: len(x))
df_test['List of classes'] = df_test['Class'].apply(lambda x: x.split(','))
df_test['Len of classes'] = df_test['List of classes'].apply(lambda x: len(x))

In [109]:
df_train

Unnamed: 0,Text,Class,List of classes,Len of classes
0,My favourite food is anything I didn't have to...,27,[27],1
1,"Now if he does off himself, everyone will thin...",27,[27],1
2,WHY THE FUCK IS BAYLESS ISOING,2,[2],1
3,To make her feel threatened,14,[14],1
4,Dirty Southern Wankers,3,[3],1
...,...,...,...,...
43405,Added you mate well I’ve just got the bow and ...,18,[18],1
43406,Always thought that was funny but is it a refe...,6,[6],1
43407,What are you talking about? Anything bad that ...,3,[3],1
43408,"More like a baptism, with sexy results!",13,[13],1


In [110]:
df_train.isnull().sum()

Text               0
Class              0
List of classes    0
Len of classes     0
dtype: int64

In [111]:
df_train["Class"].value_counts()

27           12823
0             2710
4             1873
15            1857
1             1652
             ...  
6,15,22          1
9,10,19          1
7,10,25          1
7,9,24,25        1
0,1,18           1
Name: Class, Length: 711, dtype: int64

In [112]:
with open('ekman_mapping.json') as file:
    ekman_mapping = json.load(file)

In [113]:
# Loading the emotion labels as a plain list (index -> emotion name)
with open("emotions.txt", "r") as emotion_file:
    emotion_list = emotion_file.read().split("\n")
print(emotion_list)

['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']


In [114]:
def idx2class(idx_list):
    arr = []
    for i in idx_list:
        arr.append(emotion_list[int(i)])
    return arr

In [115]:
df_train['Emotions'] = df_train['List of classes'].apply(idx2class)
df_val['Emotions'] = df_val['List of classes'].apply(idx2class)
df_test['Emotions'] = df_test['List of classes'].apply(idx2class)
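
The two steps above (splitting the `Class` string, then looking up each index) can be sketched on a single value as follows. The helper name `decode_labels` is an assumption for this example; the list of names is the one printed from `emotions.txt` above.

```python
# Emotion names in the order printed from emotions.txt above
emotion_names = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

def decode_labels(class_string):
    # "6,15,22" -> ["6", "15", "22"] -> one emotion name per index
    return [emotion_names[int(idx)] for idx in class_string.split(",")]

single = decode_labels("27")
multiple = decode_labels("6,15,22")
print(single)
print(multiple)
```

A sample with several comma-separated indices therefore ends up with several emotion names, which is what makes the task multi-label.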

In [116]:
ekman_mapping

{'anger': ['anger', 'annoyance', 'disapproval'],
 'disgust': ['disgust'],
 'fear': ['fear', 'nervousness'],
 'joy': ['joy',
  'amusement',
  'approval',
  'excitement',
  'gratitude',
  'love',
  'optimism',
  'relief',
  'pride',
  'admiration',
  'desire',
  'caring'],
 'sadness': ['sadness', 'disappointment', 'embarrassment', 'grief', 'remorse'],
 'surprise': ['surprise', 'realization', 'confusion', 'curiosity']}

In [117]:
def EmotionMapping(emotion_list):
    # Map each fine-grained GoEmotions label to its Ekman category ('neutral' is kept as-is)
    map_list = []

    for i in emotion_list:
        for ekman_label, fine_labels in ekman_mapping.items():
            if i in fine_labels:
                map_list.append(ekman_label)
        if i == 'neutral':
            map_list.append('neutral')

    return map_list

In [118]:
df_train['Mapped Emotions'] = df_train['Emotions'].apply(EmotionMapping)
df_val['Mapped Emotions'] = df_val['Emotions'].apply(EmotionMapping)
df_test['Mapped Emotions'] = df_test['Emotions'].apply(EmotionMapping)
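
An equivalent way to think about this mapping is as a reverse lookup table from each fine-grained label to its Ekman category. The sketch below builds such a table from the `ekman_mapping` shown above. The names `fine_to_ekman` and `map_to_ekman` are assumptions for this example; like the notebook's function, the sketch does not remove duplicates.

```python
# Ekman mapping as displayed above
ekman_mapping = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert the mapping: fine-grained label -> Ekman category
fine_to_ekman = {fine: coarse for coarse, fines in ekman_mapping.items() for fine in fines}
fine_to_ekman["neutral"] = "neutral"  # neutral is not part of the mapping file

def map_to_ekman(emotions):
    # One Ekman category per input label, in the same order
    return [fine_to_ekman[e] for e in emotions]

mapped = map_to_ekman(["annoyance", "love"])
print(mapped)
print(map_to_ekman(["anger", "annoyance"]))  # both collapse to the same category
```

The second call shows that two distinct GoEmotions labels can collapse into the same Ekman category, so the mapped list may contain repeats.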

In [119]:
df_train

Unnamed: 0,Text,Class,List of classes,Len of classes,Emotions,Mapped Emotions
0,My favourite food is anything I didn't have to...,27,[27],1,[neutral],[neutral]
1,"Now if he does off himself, everyone will thin...",27,[27],1,[neutral],[neutral]
2,WHY THE FUCK IS BAYLESS ISOING,2,[2],1,[anger],[anger]
3,To make her feel threatened,14,[14],1,[fear],[fear]
4,Dirty Southern Wankers,3,[3],1,[annoyance],[anger]
...,...,...,...,...,...,...
43405,Added you mate well I’ve just got the bow and ...,18,[18],1,[love],[joy]
43406,Always thought that was funny but is it a refe...,6,[6],1,[confusion],[surprise]
43407,What are you talking about? Anything bad that ...,3,[3],1,[annoyance],[anger]
43408,"More like a baptism, with sexy results!",13,[13],1,[excitement],[joy]


In [120]:
# Building a preprocessing function to clean text
def preprocess_corpus(x):

  # Adding a space between words and punctuation
  x = re.sub( r'([a-zA-Z\[\]])([,;.!?])', r'\1 \2', x)
  x = re.sub( r'([,;.!?])([a-zA-Z\[\]])', r'\1 \2', x)
  
  # Demojize
  x = emoji.demojize(x)
  
  # Expand contraction
  x = contractions.fix(x)
  
  # Lower
  x = x.lower()

  #correct some acronyms/typos/abbreviations  
  x = re.sub(r"lmao", "laughing my ass off", x)  
  x = re.sub(r"amirite", "am i right", x)
  x = re.sub(r"\b(tho)\b", "though", x)
  x = re.sub(r"\b(ikr)\b", "i know right", x)
  x = re.sub(r"\b(ya|u)\b", "you", x)
  x = re.sub(r"\b(eu)\b", "europe", x)
  x = re.sub(r"\b(da)\b", "the", x)
  x = re.sub(r"\b(dat)\b", "that", x)
  x = re.sub(r"\b(dats)\b", "that is", x)
  x = re.sub(r"\b(cuz)\b", "because", x)
  x = re.sub(r"\b(fkn)\b", "fucking", x)
  x = re.sub(r"\b(tbh)\b", "to be honest", x)
  x = re.sub(r"\b(tbf)\b", "to be fair", x)
  x = re.sub(r"faux pas", "mistake", x)
  x = re.sub(r"\b(btw)\b", "by the way", x)
  x = re.sub(r"\b(bs)\b", "bullshit", x)
  x = re.sub(r"\b(kinda)\b", "kind of", x)
  x = re.sub(r"\b(bruh)\b", "bro", x)
  x = re.sub(r"\b(w/e)\b", "whatever", x)
  x = re.sub(r"\b(w/o)\b", "without", x)
  # "w/" is handled after "w/e" and "w/o"; no trailing \b because "/" is not a word character
  x = re.sub(r"\bw/", "with ", x)
  x = re.sub(r"\b(doj)\b", "department of justice", x)
  
  # Replace words containing repeated letters, e.g. "coooool" becomes "cool"
  x = re.sub(r"\b(j+e{2,}z+e*)\b", "jeez", x)
  x = re.sub(r"\b(co+l+)\b", "cool", x)
  x = re.sub(r"\b(g+o+a+l+)\b", "goal", x)
  x = re.sub(r"\b(s+h+i+t+)\b", "shit", x)
  x = re.sub(r"\b(o+m+g+)\b", "omg", x)
  x = re.sub(r"\b(w+t+f+)\b", "wtf", x)
  x = re.sub(r"\b(w+h+a+t+)\b", "what", x)
  x = re.sub(r"\b(y+e+y+|y+a+y+|y+e+a+h+)\b", "yeah", x)
  x = re.sub(r"\b(w+o+w+)\b", "wow", x)
  x = re.sub(r"\b(w+h+y+)\b", "why", x)
  x = re.sub(r"\b(s+o+)\b", "so", x)
  x = re.sub(r"\b(f)\b", "fuck", x)
  x = re.sub(r"\b(w+h+o+p+s+)\b", "whoops", x)
  x = re.sub(r"\b(ofc)\b", "of course", x)
  x = re.sub(r"\b(the us)\b", "usa", x)
  x = re.sub(r"\b(gf)\b", "girlfriend", x)
  x = re.sub(r"\b(hr)\b", "human resources", x)
  x = re.sub(r"\b(mh)\b", "mental health", x)
  x = re.sub(r"\b(idk)\b", "i do not know", x)
  x = re.sub(r"\b(gotcha)\b", "i got you", x)
  x = re.sub(r"\b(y+e+p+)\b", "yes", x)
  x = re.sub(r"\b(a*ha+h[ha]*|a*ha +h[ha]*)\b", "haha", x)
  x = re.sub(r"\b(o?l+o+l+[ol]*)\b", "lol", x)
  x = re.sub(r"\b(o*ho+h[ho]*|o*ho +h[ho]*)\b", "ohoh", x)
  x = re.sub(r"\b(o+h+)\b", "oh", x)
  x = re.sub(r"\b(a+h+)\b", "ah", x)
  x = re.sub(r"\b(u+h+)\b", "uh", x)

  # Handling emojis
  x = re.sub(r"<3", " love ", x)
  x = re.sub(r"xd", " smiling_face_with_open_mouth_and_tightly_closed_eyes ", x)
  x = re.sub(r":\)", " smiling_face ", x)
  x = re.sub(r"\^_\^", " smiling_face ", x)
  x = re.sub(r"\*_\*", " star_struck ", x)
  x = re.sub(r":\(", " frowning_face ", x)
  x = re.sub(r":\^\(", " frowning_face ", x)
  x = re.sub(r";\(", " frowning_face ", x)
  x = re.sub(r":\/",  " confused_face", x)
  x = re.sub(r";\)",  " wink", x)
  x = re.sub(r">__<",  " unamused ", x)
  x = re.sub(r"\b([xo]+x*)\b", " xoxo ", x)
  x = re.sub(r"\b(n+a+h+)\b", "no", x)

  # Handling special cases of text
  x = re.sub(r"h a m b e r d e r s", "hamberders", x)
  x = re.sub(r"b e n", "ben", x)
  x = re.sub(r"s a t i r e", "satire", x)
  x = re.sub(r"y i k e s", "yikes", x)
  x = re.sub(r"s p o i l e r", "spoiler", x)
  x = re.sub(r"thankyou", "thank you", x)
  x = re.sub(r"a\^r\^o\^o\^o\^o\^o\^o\^o\^n\^d", "around", x)

  # Remove special characters and numbers replace by space + remove double space
  x = re.sub(r"\b([.]{3,})"," dots ", x)
  x = re.sub(r"[^A-Za-z!?_]+"," ", x)
  x = re.sub(r"\b([s])\b *","", x)
  x = re.sub(r" +"," ", x)
  x = x.strip()

  return x
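
To see how the elongation rules behave, here is a reduced sketch that applies only three of the patterns from the function above, followed by the final clean-up steps. The helper name `normalize_elongations` and the sample sentence are assumptions for this example; it is not a replacement for `preprocess_corpus`.

```python
import re

def normalize_elongations(text):
    text = text.lower()
    # A few of the elongation rules used in the cleaning function
    text = re.sub(r"\b(co+l+)\b", "cool", text)
    text = re.sub(r"\b(s+o+)\b", "so", text)
    text = re.sub(r"\b(w+o+w+)\b", "wow", text)
    # Keep letters, "!", "?" and "_"; everything else becomes a space
    text = re.sub(r"[^A-Za-z!?_]+", " ", text)
    # Collapse repeated spaces and trim
    text = re.sub(r" +", " ", text)
    return text.strip()

normalized = normalize_elongations("Wooow, that is sooo COOOOL!!!")
print(normalized)
```

The `\b` word boundaries keep the patterns from firing inside longer words; for instance, "is" is left untouched by the `s+o+` rule because it contains no "o".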

In [121]:
# Defining the number of samples in train, validation and test dataset
size_train = df_train.shape[0]
size_val = df_val.shape[0]
size_test = df_test.shape[0]

# Defining the total number of samples
size_all = size_train + size_val + size_test

In [122]:
size_train

43410

In [123]:
# Shape of train, validation and test datasets
print("Train dataset has {} samples and represents {:.2f}% of overall data".format(size_train, size_train/size_all*100))
print("Validation dataset has {} samples and represents {:.2f}% of overall data".format(size_val, size_val/size_all*100))
print("Test dataset has {} samples and represents {:.2f}% of overall data".format(size_test, size_test/size_all*100))
print()
print("The total number of samples is : {}".format(size_all))

Train dataset has 43410 samples and represents 80.00% of overall data
Validation dataset has 5426 samples and represents 10.00% of overall data
Test dataset has 5427 samples and represents 10.00% of overall data

The total number of samples is : 54263


In [124]:
# Concatenating the 3 datasets for labels preprocessing
df_all = pd.concat([df_train, df_val, df_test], axis=0).reset_index(drop=True)

# Preview of data
display(df_all.head(5))

print(df_all.shape)

Unnamed: 0,Text,Class,List of classes,Len of classes,Emotions,Mapped Emotions
0,My favourite food is anything I didn't have to...,27,[27],1,[neutral],[neutral]
1,"Now if he does off himself, everyone will thin...",27,[27],1,[neutral],[neutral]
2,WHY THE FUCK IS BAYLESS ISOING,2,[2],1,[anger],[anger]
3,To make her feel threatened,14,[14],1,[fear],[fear]
4,Dirty Southern Wankers,3,[3],1,[annoyance],[anger]


(54263, 6)


In [125]:
# Applying the preprocessing function on the dataset
df_all["Clean_text"] = df_all["Text"].apply(preprocess_corpus)

# Preview of data
display(df_all[['Text', 'Clean_text']].sample(5))

Unnamed: 0,Text,Clean_text
22039,I am in distance relationship so this probably...,i am in distance relationship so this probably...
4674,[NAME] was absolutely awful in 15.,name was absolutely awful in
2064,"Time to go to heaven, son.",time to go to heaven son
30695,"Ooh, political posturing! This accomplishes so...",oh political posturing ! this accomplishes so ...
5372,I got offered a couple of batches. They weren'...,i got offered a couple of batches they were no...


In [126]:
df_all

Unnamed: 0,Text,Class,List of classes,Len of classes,Emotions,Mapped Emotions,Clean_text
0,My favourite food is anything I didn't have to...,27,[27],1,[neutral],[neutral],my favourite food is anything i did not have t...
1,"Now if he does off himself, everyone will thin...",27,[27],1,[neutral],[neutral],now if he does off himself everyone will think...
2,WHY THE FUCK IS BAYLESS ISOING,2,[2],1,[anger],[anger],why the fuck is bayless isoing
3,To make her feel threatened,14,[14],1,[fear],[fear],to make her feel threatened
4,Dirty Southern Wankers,3,[3],1,[annoyance],[anger],dirty southern wankers
...,...,...,...,...,...,...,...
54258,Thanks. I was diagnosed with BP 1 after the ho...,15,[15],1,[gratitude],[joy],thanks i was diagnosed with bp after the hospi...
54259,Well that makes sense.,4,[4],1,[approval],[joy],well that makes sense
54260,Daddy issues [NAME],27,[27],1,[neutral],[neutral],daddy issues name
54261,So glad I discovered that subreddit a couple m...,0,[0],1,[admiration],[joy],so glad i discovered that subreddit a couple m...


In [127]:
# Keeping only necessary columns
df_all = df_all.drop(['Class','List of classes','Len of classes','Emotions'], axis=1)
df_all.head(3)

Unnamed: 0,Text,Mapped Emotions,Clean_text
0,My favourite food is anything I didn't have to...,[neutral],my favourite food is anything i did not have t...
1,"Now if he does off himself, everyone will thin...",[neutral],now if he does off himself everyone will think...
2,WHY THE FUCK IS BAYLESS ISOING,[anger],why the fuck is bayless isoing


In [128]:
# Dropping raw text column
df_all = df_all[ ['Clean_text','Mapped Emotions'] ]
df_all

Unnamed: 0,Clean_text,Mapped Emotions
0,my favourite food is anything i did not have t...,[neutral]
1,now if he does off himself everyone will think...,[neutral]
2,why the fuck is bayless isoing,[anger]
3,to make her feel threatened,[fear]
4,dirty southern wankers,[anger]
...,...,...
54258,thanks i was diagnosed with bp after the hospi...,[joy]
54259,well that makes sense,[joy]
54260,daddy issues name,[neutral]
54261,so glad i discovered that subreddit a couple m...,[joy]


In [129]:
emotion_dict={
"anger": 0,
"disgust": 1,
"fear": 2,
"joy": 3,
"sadness": 4,
"surprise": 5,
"neutral":6
}

In [130]:
# Defining a function that maps a list of emotion labels to a single class index
# (for multi-label samples, only the last label in the list is kept)
def class2idx(emotion_lst):
    return emotion_dict[emotion_lst[-1]]

# Applying the function
df_all['Mapped_id'] = df_all['Mapped Emotions'].apply(class2idx)

# Preview of data
display(df_all.head(3))

Unnamed: 0,Clean_text,Mapped Emotions,Mapped_id
0,my favourite food is anything i did not have t...,[neutral],6
1,now if he does off himself everyone will think...,[neutral],6
2,why the fuck is bayless isoing,[anger],0
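
Because `class2idx` returns a single index, a sample with several mapped emotions is reduced to one class. The sketch below makes that choice explicit on a made-up label list. The helper names `last_label_id` and `first_label_id` are assumptions; the second one is a hypothetical alternative shown only for contrast.

```python
emotion_dict = {"anger": 0, "disgust": 1, "fear": 2, "joy": 3,
                "sadness": 4, "surprise": 5, "neutral": 6}

def last_label_id(emotions):
    # Mirrors class2idx: the index of the last label in the list is returned
    return emotion_dict[emotions[-1]]

def first_label_id(emotions):
    # Hypothetical alternative: keep the first label instead
    return emotion_dict[emotions[0]]

labels = ["anger", "joy"]  # made-up multi-label sample
print(last_label_id(labels), first_label_id(labels))
```

Either choice discards information for multi-label samples, so the Ekman setup here is a single-label (multi-class) problem rather than a multi-label one.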


In [131]:
# Dropping Mapped Emotions column
df_all = df_all.drop(['Mapped Emotions'], axis=1)
df_all.head(3)

Unnamed: 0,Clean_text,Mapped_id
0,my favourite food is anything i did not have t...,6
1,now if he does off himself everyone will think...,6
2,why the fuck is bayless isoing,0


In [132]:
# Building a function that will divide in train, validation and test sets
def get_train_val_test(df):
    train = df.iloc[:size_train, :]
    val = df.iloc[size_train:size_train+size_val, :]
    test = df.iloc[size_train+size_val:size_train+size_val+size_test, :]
    return train, val, test

In [133]:
# Dividing back in train, validation and test datasets (GoEmotions)
train_ekman, val_ekman, test_ekman = get_train_val_test(df_all)
print(train_ekman.shape)
print(val_ekman.shape)
print(test_ekman.shape)

(43410, 2)
(5426, 2)
(5427, 2)
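
The split above works purely by position: it relies on the three datasets having been concatenated in train, validation, test order and never shuffled. A minimal sketch of the same slicing on a plain list (the function name `split_by_sizes` and the toy sizes are assumptions for this example):

```python
def split_by_sizes(rows, n_train, n_val, n_test):
    # Slice consecutive blocks in the same order the datasets were concatenated
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

rows = list(range(10))  # stand-in for the concatenated dataframe rows
train, val, test = split_by_sizes(rows, 6, 2, 2)
print(train, val, test)
```

If the concatenated data were shuffled or filtered before the split, the positional boundaries would no longer match the original datasets.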


In [134]:
train_ekman_no_neu = train_ekman.copy()
val_ekman_no_neu = val_ekman.copy()
test_ekman_no_neu = test_ekman.copy()

In [135]:
train_ekman_no_neu

Unnamed: 0,Clean_text,Mapped_id
0,my favourite food is anything i did not have t...,6
1,now if he does off himself everyone will think...,6
2,why the fuck is bayless isoing,0
3,to make her feel threatened,2
4,dirty southern wankers,0
...,...,...
43405,added you mate well i have just got the bow an...,3
43406,always thought that was funny but is it a refe...,5
43407,what are you talking about ? anything bad that...,0
43408,more like a baptism with sexy results !,3


In [136]:
train_ekman_no_neu = train_ekman_no_neu[train_ekman_no_neu['Mapped_id'] !=6]
val_ekman_no_neu = val_ekman_no_neu[val_ekman_no_neu['Mapped_id'] != 6]
test_ekman_no_neu = test_ekman_no_neu[test_ekman_no_neu['Mapped_id'] != 6]

In [137]:
class_label_names_ekman = ['anger', 'disgust', 'fear', 'joy', 'sadness', 'surprise', 'neutral']

In [138]:
class_label_names_ekman_no_neu = ['anger', 'disgust', 'fear', 'joy', 'sadness', 'surprise']

In [139]:
X_train_ekman = train_ekman[:]["Clean_text"]
y_train_ekman = train_ekman[:]["Mapped_id"]
X_train_ekman_no_neu = train_ekman_no_neu[:]["Clean_text"]
y_train_ekman_no_neu = train_ekman_no_neu[:]["Mapped_id"]

X_val_ekman = val_ekman[:]["Clean_text"]
y_val_ekman = val_ekman[:]["Mapped_id"]
X_val_ekman_no_neu = val_ekman_no_neu[:]["Clean_text"]
y_val_ekman_no_neu = val_ekman_no_neu[:]["Mapped_id"]

X_test_ekman = test_ekman[:]["Clean_text"]
y_test_ekman = test_ekman[:]["Mapped_id"]
X_test_ekman_no_neu = test_ekman_no_neu[:]["Clean_text"]
y_test_ekman_no_neu = test_ekman_no_neu[:]["Mapped_id"]

print(X_train_ekman.shape, y_train_ekman.shape, X_val_ekman.shape, y_val_ekman.shape, X_test_ekman.shape, y_test_ekman.shape)

(43410,) (43410,) (5426,) (5426,) (5427,) (5427,)


In [140]:
# TF-IDF vectorizer with a 1000-word vocabulary
# (stop_words is passed as a list, since recent scikit-learn versions expect a list rather than a set)
vectorizer2 = TfidfVectorizer(stop_words=list(stop_words), max_features=1000)

# Fitting the vectorizer and transforming train, validation and test data
tfidf_train_ekman = vectorizer2.fit_transform(train_ekman['Clean_text'])
tfidf_val_ekman = vectorizer2.transform(val_ekman['Clean_text'])
tfidf_test_ekman = vectorizer2.transform(test_ekman['Clean_text'])

# Refitting on the subset without neutral samples (this replaces the vocabulary learned above)
tfidf_train_ekman_no_neu = vectorizer2.fit_transform(train_ekman_no_neu['Clean_text'])
tfidf_val_ekman_no_neu = vectorizer2.transform(val_ekman_no_neu['Clean_text'])
tfidf_test_ekman_no_neu = vectorizer2.transform(test_ekman_no_neu['Clean_text'])


# Converting the sparse matrices to dense arrays

tfidf_train_ekman = tfidf_train_ekman.toarray()
tfidf_val_ekman = tfidf_val_ekman.toarray()
tfidf_test_ekman = tfidf_test_ekman.toarray()

tfidf_train_ekman_no_neu = tfidf_train_ekman_no_neu.toarray()
tfidf_val_ekman_no_neu = tfidf_val_ekman_no_neu.toarray()
tfidf_test_ekman_no_neu = tfidf_test_ekman_no_neu.toarray()


# Validating the shape of train and test data
print(tfidf_train_ekman.shape)
print(tfidf_val_ekman.shape)
print(tfidf_test_ekman.shape)
print()
print(tfidf_train_ekman_no_neu.shape)
print(tfidf_val_ekman_no_neu.shape)
print(tfidf_test_ekman_no_neu.shape)


(43410, 1000)
(5426, 1000)
(5427, 1000)

(29191, 1000)
(3660, 1000)
(3640, 1000)


In [141]:
# Defining train, validation and test variables
X_train_ekman = tfidf_train_ekman
X_train_ekman_no_neu = tfidf_train_ekman_no_neu

X_val_ekman = tfidf_val_ekman
X_val_ekman_no_neu = tfidf_val_ekman_no_neu

X_test_ekman = tfidf_test_ekman
X_test_ekman_no_neu = tfidf_test_ekman_no_neu

# Shape validation
print("The shape of X_train is : ", X_train_ekman.shape)
print("The shape of X_train_no_neu is : ", X_train_ekman_no_neu.shape)
print()
print("The shape of X_val is : ", X_val_ekman.shape)
print("The shape of X_val_no_neu is : ", X_val_ekman_no_neu.shape)
print()
print("The shape of X_test is : ", X_test_ekman.shape)
print("The shape of X_test_no_neu is : ", X_test_ekman_no_neu.shape)

The shape of X_train is :  (43410, 1000)
The shape of X_train_no_neu is :  (29191, 1000)

The shape of X_val is :  (5426, 1000)
The shape of X_val_no_neu is :  (3660, 1000)

The shape of X_test is :  (5427, 1000)
The shape of X_test_no_neu is :  (3640, 1000)


In [142]:
# Model evaluation function 
def model_eval(y_true, y_pred_labels, emotions):
    
    # Defining variables
    precision = []
    recall = []
    f1 = []
    
    # Per emotion evaluation      
    idx2emotion = {i: e for i, e in enumerate(emotions)}
    
    for i in range(len(emotions)):
   
        # Computing precision, recall and f1-score
        p, r, f1_score, _ = precision_recall_fscore_support(y_true[:, i], y_pred_labels[:, i], average="binary")
        
        # Append results in lists
        precision.append(round(p, 2))
        recall.append(round(r, 2))
        f1.append(round(f1_score, 2))
    
    # Macro evaluation
    macro_p, macro_r, macro_f1_score, _ = precision_recall_fscore_support(y_true, y_pred_labels, average="macro")
    
    # Append results in lists
    precision.append(round(macro_p, 2))
    recall.append(round(macro_r, 2))
    f1.append(round(macro_f1_score, 2))
    
    # Converting results to a dataframe
    df_results = pd.DataFrame({"Precision":precision, "Recall":recall, 'F1':f1})
    df_results.index = emotions+['MACRO-AVERAGE']
    
    return df_results
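
To make the two levels of evaluation concrete, here is a small sketch on made-up arrays (`toy_true`, `toy_pred` and the two emotion names are illustrative assumptions). It computes the same per-emotion and macro-averaged scores as the function above, using the same scikit-learn call.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical multi-label targets: 4 texts, 2 emotions (one column per emotion)
toy_true = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
toy_pred = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# Per-emotion scores: each column is treated as its own binary problem
for i, name in enumerate(["joy", "anger"]):
    p, r, f, _ = precision_recall_fscore_support(
        toy_true[:, i], toy_pred[:, i], average="binary"
    )
    print(name, round(p, 2), round(r, 2), round(f, 2))

# Macro average: unweighted mean of the per-emotion scores
macro_p, macro_r, macro_f, _ = precision_recall_fscore_support(
    toy_true, toy_pred, average="macro"
)
print("MACRO-AVERAGE", round(macro_p, 2), round(macro_r, 2), round(macro_f, 2))
```

Since the macro average gives every emotion the same weight, rare emotions with a recall close to zero pull it down noticeably.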

# 4 - Conventional models

In this section, we train simple classification algorithms: SGD, SVM, KNN, Decision Tree and Random Forest. `SGDClassifier` and SVM classifiers do not natively support multi-label classification, so for these we fit one model per target using the `MultiOutputClassifier`. KNN, Decision Tree and Random Forest accept multi-label targets directly in scikit-learn.
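
The one-model-per-target strategy can be sketched on synthetic data. The arrays `toy_X` and `toy_Y` below are random placeholders, not the TF-IDF features or emotion labels used in this notebook.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data: 60 samples, 5 features, 2 binary labels (illustrative only)
rng = np.random.RandomState(42)
toy_X = rng.rand(60, 5)
toy_Y = np.column_stack([
    (toy_X[:, 0] > 0.5).astype(int),
    (toy_X[:, 1] > 0.5).astype(int),
])

# The wrapper clones the base estimator and fits one copy per label column
toy_clf = MultiOutputClassifier(SGDClassifier(random_state=42))
toy_clf.fit(toy_X, toy_Y)

# One fitted SGDClassifier per label is stored in `estimators_`
print(len(toy_clf.estimators_))

# Predictions come back as one column per label
toy_pred = toy_clf.predict(toy_X[:3])
print(toy_pred.shape)
```

Each label is learned independently, so this approach ignores correlations between emotions.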

## 4.1 - Training the model and evaluation

### SGD Classifier

### SGD Model with GoEmotions labels (28) including neutral emotion

#### Model 1

"modified_huber", "perceptron", 

In [75]:
sgdc1 = SGDClassifier(random_state=42)
params = {'estimator__max_iter' :[2000], 'estimator__tol':[0.1], 'estimator__loss': ["hinge"], 'estimator__penalty': ["l1"]}
classifier1 = MultiOutputClassifier(sgdc1)
Grid_sgd1 = GridSearchCV(estimator = classifier1, cv = 3, param_grid = params, scoring  = 'accuracy',n_jobs=-1, verbose=1)
Grid_sgd1.fit(X_train, y_train)

Fitting 3 folds for each of 1 candidates, totalling 3 fits


In [76]:
Grid_sgd1.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 2000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.1}

In [77]:
Grid_sgd1.best_score_

0.36924671734623354
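
The `estimator__` prefix in the parameter grid is what routes each value to the `SGDClassifier` wrapped inside the `MultiOutputClassifier`. The sketch below shows this on synthetic data; `toy_X`, `toy_Y` and the grid values are illustrative assumptions rather than the settings used for GoEmotions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multi-label data (illustrative only)
rng = np.random.RandomState(0)
toy_X = rng.rand(90, 4)
toy_Y = np.column_stack([
    (toy_X[:, 0] > 0.5).astype(int),
    (toy_X[:, 1] > 0.5).astype(int),
])

toy_wrapper = MultiOutputClassifier(SGDClassifier(random_state=42))

# "estimator__<name>" sets <name> on the wrapped SGDClassifier
toy_params = {
    "estimator__loss": ["hinge", "modified_huber"],
    "estimator__tol": [0.1, 0.01],
}

toy_grid = GridSearchCV(toy_wrapper, param_grid=toy_params, cv=3, scoring="accuracy")
toy_grid.fit(toy_X, toy_Y)

print(toy_grid.best_params_)
print(toy_grid.best_score_)
```

When the estimator is a plain `SGDClassifier`, as in the single-label case, the parameter names are used without the prefix.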

In [70]:
Grid_sgd1.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.1}

In [71]:
Grid_sgd1.best_score_

0.36924671734623354

In [65]:
Grid_sgd1.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__tol': 0.1}

In [66]:
Grid_sgd1.best_score_

0.32833448514167246

In [64]:
Grid_sgd1.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__tol': 0.01}

In [65]:
Grid_sgd1.best_score_

0.28329877908316053

In [61]:
Grid_sgd1.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 5000,
 'estimator__tol': 0.1}

In [62]:
Grid_sgd1.best_score_

0.27756277355448056

In [58]:
Grid_sgd1.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 5000,
 'estimator__tol': 0.1}

In [59]:
Grid_sgd1.best_score_

0.2924902096291177

In [55]:
Grid_sgd1.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 5000,
 'estimator__tol': 0.01}

In [56]:
Grid_sgd1.best_score_

0.28590186592950934

In [117]:
Grid_sgd1.best_params_

{'estimator__max_iter': 5000, 'estimator__tol': 0.001}

In [118]:
Grid_sgd1.best_score_

0.29295093296475466

In [114]:
Grid_sgd1.best_params_

{'estimator__max_iter': 1000, 'estimator__tol': 0.01}

In [115]:
Grid_sgd1.best_score_

0.2945173923059203



### SGD Model -  GoEmotions with neutral
- 'estimator__loss': 'hinge',
- 'estimator__max_iter': 1000,
- 'estimator__penalty': 'l1',
- 'estimator__tol': 0.1

---

In [54]:
sgdc1 = SGDClassifier(max_iter=1000, tol=0.1, loss='hinge', penalty='l1', random_state=42)

classifier_sgdc1 = MultiOutputClassifier(sgdc1)

classifier_sgdc1.fit(X_train, y_train)


In [55]:
# Making predictions on GoEmotions taxonomy for train
classifier_preds = classifier_sgdc1.predict(X_train)

#cm = confusion_matrix(y_train, classifier_preds)
#print(cm)

# Model evaluation
model_eval(y_train, classifier_preds, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.75,0.39,0.52
amusement,0.75,0.79,0.77
anger,0.65,0.12,0.2
annoyance,0.8,0.01,0.03
approval,0.69,0.07,0.13
caring,0.63,0.02,0.04
confusion,0.84,0.04,0.07
curiosity,0.91,0.05,0.1
desire,0.61,0.31,0.41
disappointment,0.0,0.0,0.0


In [56]:
# Making predictions on GoEmotions taxonomy for test
y_test_pred = classifier_sgdc1.predict(X_test)

# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.69,0.38,0.49
amusement,0.76,0.81,0.78
anger,0.68,0.09,0.15
annoyance,0.9,0.03,0.05
approval,0.67,0.09,0.16
caring,1.0,0.02,0.04
confusion,0.89,0.05,0.1
curiosity,0.86,0.02,0.04
desire,0.57,0.2,0.3
disappointment,0.0,0.0,0.0


In [57]:
# Making predictions on GoEmotions taxonomy for validation
y_val_pred = classifier_sgdc1.predict(X_val)

# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.75,0.45,0.57
amusement,0.76,0.78,0.77
anger,0.63,0.1,0.17
annoyance,0.5,0.01,0.02
approval,0.75,0.08,0.15
caring,0.25,0.01,0.01
confusion,1.0,0.07,0.12
curiosity,0.95,0.07,0.13
desire,0.74,0.38,0.5
disappointment,0.0,0.0,0.0


In [59]:
accuracy_score(y_val, y_val_pred)

0.33892370070033173
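
With multi-label targets, `accuracy_score` counts a sample as correct only when all of its labels are predicted exactly, which partly explains the low value above. The sketch below contrasts this with a per-label accuracy on made-up arrays. `hamming_loss` is brought in here purely for illustration and is not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Hypothetical targets: 4 texts, 3 emotions
toy_true = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]])
toy_pred = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])

# Exact-match ratio: a row counts only if every label is right
subset_acc = accuracy_score(toy_true, toy_pred)

# Per-label accuracy: share of individual label decisions that are right
label_acc = 1 - hamming_loss(toy_true, toy_pred)

print(subset_acc, label_acc)
```

Two of the four rows each have a single wrong label, so the exact-match ratio is much lower than the per-label accuracy.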

### Model 2 - Final


max_iter=1000, tol=0.1, loss='hinge', penalty='none', random_state=42, n_jobs=-1

In [118]:
sgdc1 = SGDClassifier(max_iter=1000, tol=0.1, loss='hinge', penalty='none', random_state=42, n_jobs=-1)

classifier_sgdc1 = MultiOutputClassifier(sgdc1)

classifier_sgdc1.fit(X_train, y_train)


In [119]:
# Making predictions on GoEmotions taxonomy for train
classifier_preds = classifier_sgdc1.predict(X_train)

#cm = confusion_matrix(y_train, classifier_preds)
#print(cm)

# Model evaluation
model_eval(y_train, classifier_preds, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.79,0.46,0.58
amusement,0.79,0.78,0.79
anger,0.71,0.28,0.41
annoyance,0.63,0.04,0.07
approval,0.71,0.07,0.13
caring,0.62,0.09,0.16
confusion,0.84,0.04,0.07
curiosity,0.91,0.05,0.1
desire,0.78,0.32,0.46
disappointment,0.61,0.02,0.03


In [120]:
# Making predictions on GoEmotions taxonomy for test
y_test_pred = classifier_sgdc1.predict(X_test)

# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.71,0.38,0.5
amusement,0.77,0.76,0.76
anger,0.53,0.18,0.27
annoyance,0.58,0.04,0.08
approval,0.67,0.09,0.15
caring,0.73,0.08,0.15
confusion,0.89,0.05,0.1
curiosity,0.86,0.02,0.04
desire,0.57,0.19,0.29
disappointment,1.0,0.03,0.05


In [121]:
# Making predictions on GoEmotions taxonomy for validation
y_val_pred = classifier_sgdc1.predict(X_val)

# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.76,0.49,0.6
amusement,0.75,0.7,0.72
anger,0.57,0.26,0.35
annoyance,0.35,0.02,0.04
approval,0.78,0.08,0.14
caring,0.5,0.06,0.11
confusion,1.0,0.07,0.12
curiosity,0.95,0.07,0.13
desire,0.74,0.32,0.45
disappointment,0.0,0.0,0.0


In [122]:
accuracy_score(y_val, y_val_pred)

0.33966089200147437

### SGD Model 2 - GoEmotions excluding neutral emotion - Hyperparameter tuning

"modified_huber", "perceptron",  'estimator__tol':[0.1,0.01],

In [135]:
y_train_no_neu.shape

(30587, 27)

In [173]:
sgdc2 = SGDClassifier(random_state=42)
params = {'estimator__max_iter' :[1000], 
          'estimator__tol':[0.1], 
          'estimator__loss': ["hinge"], 
          'estimator__penalty': ['none'],
          'estimator__alpha' : [3e-05,2e-005]}
classifier2 = MultiOutputClassifier(sgdc2)
Grid_sgd2 = GridSearchCV(estimator = classifier2, cv = 3, param_grid = params, scoring  = 'accuracy',n_jobs=-1, verbose=1)
Grid_sgd2.fit(X_train_no_neu, y_train_no_neu)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [174]:
Grid_sgd2.best_params_

{'estimator__alpha': 3e-05,
 'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'none',
 'estimator__tol': 0.1}

In [175]:
Grid_sgd2.best_score_

0.3002579329080062

In [146]:
Grid_sgd2.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'none',
 'estimator__tol': 0.1}

In [147]:
Grid_sgd2.best_score_

0.30418171662775945

In [140]:
Grid_sgd2.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 2000,
 'estimator__penalty': 'none',
 'estimator__tol': 0.1}

In [141]:
Grid_sgd2.best_score_

0.30418171662775945

In [133]:
Grid_sgd2.best_params_

{'estimator__loss': 'modified_huber',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.1}

In [134]:
Grid_sgd2.best_score_

0.29411169650940955

In [125]:
Grid_sgd2.best_params_

{'estimator__loss': 'modified_huber',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1'}

In [126]:
Grid_sgd2.best_score_

0.285317314716885

In [117]:
Grid_sgd2.best_params_

{'estimator__loss': 'modified_huber',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.1}

In [118]:
Grid_sgd2.best_score_

0.28714766744996484

In [108]:
Grid_sgd2.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.01}

In [109]:
Grid_sgd2.best_score_

0.27684978155470097

In [105]:
Grid_sgd2.best_params_

{'estimator__loss': 'modified_huber',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.1}

In [106]:
Grid_sgd2.best_score_

0.29411169650940955

In [99]:
Grid_sgd2.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.01}

In [100]:
Grid_sgd2.best_score_

0.27684978155470097

In [87]:
Grid_sgd2.best_params_

{'estimator__loss': 'hinge',
 'estimator__max_iter': 1000,
 'estimator__penalty': 'l1',
 'estimator__tol': 0.1}

In [88]:
Grid_sgd2.best_score_

0.2762613379366509

### SGD Model -  GoEmotions excluding neutral emotion

- 'estimator__alpha': 3e-05
- 'estimator__loss': 'hinge'
- 'estimator__max_iter': 1000
- 'estimator__penalty': 'none'
- 'estimator__tol': 0.1

In [176]:
sgdc2 = SGDClassifier(max_iter=1000, tol=0.1, loss='hinge', penalty='none', random_state=42, n_jobs=-1)

classifier_sgdc2 = MultiOutputClassifier(sgdc2)

classifier_sgdc2.fit(X_train_no_neu, y_train_no_neu)


In [178]:
# Making predictions on GoEmotions taxonomy for train
classifier_preds2 = classifier_sgdc2.predict(X_train_no_neu)

#cm = confusion_matrix(y_train, classifier_preds)
#print(cm)

# Model evaluation
model_eval(y_train_no_neu, classifier_preds2, GE_taxonomy_no_neu)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.8,0.58,0.67
amusement,0.85,0.79,0.82
anger,0.72,0.41,0.52
annoyance,0.65,0.12,0.2
approval,0.72,0.14,0.24
caring,0.72,0.16,0.27
confusion,0.79,0.08,0.15
curiosity,0.98,0.05,0.1
desire,0.8,0.4,0.54
disappointment,0.73,0.06,0.11


In [180]:
# Making predictions on GoEmotions taxonomy for test
y_test_pred2 = classifier_sgdc2.predict(X_test_no_neu)

# Model evaluation
model_eval(y_test_no_neu, y_test_pred2, GE_taxonomy_no_neu)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.71,0.5,0.59
amusement,0.82,0.77,0.79
anger,0.51,0.29,0.37
annoyance,0.54,0.11,0.18
approval,0.68,0.16,0.25
caring,0.73,0.14,0.24
confusion,0.71,0.11,0.19
curiosity,1.0,0.02,0.04
desire,0.64,0.22,0.32
disappointment,0.92,0.07,0.13


In [181]:
# Making predictions on GoEmotions taxonomy for validation
y_val_pred2 = classifier_sgdc2.predict(X_val_no_neu)

# Model evaluation
model_eval(y_val_no_neu, y_val_pred2, GE_taxonomy_no_neu)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Emotion,Precision,Recall,F1
admiration,0.73,0.57,0.64
amusement,0.82,0.74,0.78
anger,0.63,0.33,0.44
annoyance,0.59,0.11,0.18
approval,0.7,0.12,0.2
caring,0.62,0.1,0.18
confusion,0.86,0.12,0.21
curiosity,1.0,0.07,0.14
desire,0.74,0.42,0.53
disappointment,0.38,0.03,0.06


In [183]:
accuracy_score(y_val_no_neu, y_val_pred2)

0.3205529473135107

### SGD Model 3 - Ekman including neutral emotion - Hyperparameter tuning

"modified_huber", "perceptron",  'estimator__tol':[0.1,0.01],, 
          'estimator__loss': ["hinge"], 
          'estimator__penalty': ['none'],
          'estimator__alpha' : [3e-05,2e-005]

In [221]:
sgdc3 = SGDClassifier(random_state=42)
params = {'max_iter' :[1000], 
          'tol':[0.0001,0.00001], 
          'loss': ["modified_huber",'hinge'], 
          'penalty': ['l1','l2'] }

Grid_sgd3 = GridSearchCV(estimator = sgdc3, cv = 3, param_grid = params, scoring  = 'accuracy', n_jobs=-1, verbose=1)
Grid_sgd3.fit(X_train_ekman, y_train_ekman)

Fitting 3 folds for each of 8 candidates, totalling 24 fits


In [222]:
Grid_sgd3.best_params_

{'loss': 'modified_huber', 'max_iter': 1000, 'penalty': 'l1', 'tol': 0.0001}

In [223]:
Grid_sgd3.best_score_

0.6117023727251786

In [213]:
Grid_sgd3.best_params_

{'loss': 'modified_huber', 'max_iter': 1000, 'penalty': 'l1', 'tol': 0.001}

In [214]:
Grid_sgd3.best_score_

0.6083160562082469

In [203]:
Grid_sgd3.best_params_

{'loss': 'modified_huber', 'max_iter': 1000, 'penalty': 'none', 'tol': 0.001}

In [204]:
Grid_sgd3.best_score_

0.5911771481225524

In [199]:
Grid_sgd3.best_params_

{'loss': 'modified_huber', 'max_iter': 1000, 'tol': 0.001}

In [200]:
Grid_sgd3.best_score_

0.6051601013591338

In [196]:
Grid_sgd3.best_params_

{'loss': 'hinge', 'max_iter': 1000, 'tol': 0.01}

In [197]:
Grid_sgd3.best_score_

0.605897258696153

In [192]:
Grid_sgd3.best_params_

{'max_iter': 1000, 'tol': 0.01}

In [193]:
Grid_sgd3.best_score_

0.605897258696153

In [186]:
Grid_sgd3.best_params_

{'max_iter': 1000, 'tol': 0.1}

In [187]:
Grid_sgd3.best_score_

0.604538124856024

### SGD Model - Ekman including neutral emotion

- 'loss': 'modified_huber'
- 'max_iter': 1000
- 'penalty': 'l1'
- 'tol': 0.0001

In [224]:
sgdc3 = SGDClassifier(random_state=42,max_iter=1000, tol = 0.0001, penalty='l1', loss = 'modified_huber', n_jobs=-1 )

sgdc3.fit(X_train_ekman, y_train_ekman)

In [227]:
# Making predictions on Ekman taxonomy for train
y_pred_train_ekman = sgdc3.predict(X_train_ekman)

#cm = confusion_matrix(y_train, classifier_preds)
#print(cm)

# Model evaluation
print(classification_report(y_train_ekman, y_pred_train_ekman, target_names=class_label_names_ekman))

              precision    recall  f1-score   support

       anger       0.56      0.26      0.36      4517
     disgust       0.58      0.31      0.40       694
        fear       0.59      0.44      0.50       642
         joy       0.77      0.74      0.75     15693
     sadness       0.70      0.43      0.53      2938
    surprise       0.67      0.19      0.30      4707
     neutral       0.53      0.82      0.64     14219

    accuracy                           0.63     43410
   macro avg       0.63      0.46      0.50     43410
weighted avg       0.65      0.63      0.60     43410



In [228]:
# Making predictions on Ekman taxonomy for test
y_test_pred_ekman = sgdc3.predict(X_test_ekman)

# Model evaluation
print(classification_report(y_test_ekman, y_test_pred_ekman, target_names=class_label_names_ekman))

              precision    recall  f1-score   support

       anger       0.57      0.24      0.34       595
     disgust       0.65      0.37      0.47       112
        fear       0.66      0.53      0.59        87
         joy       0.75      0.74      0.74      1915
     sadness       0.67      0.39      0.49       341
    surprise       0.60      0.15      0.25       590
     neutral       0.52      0.83      0.64      1787

    accuracy                           0.62      5427
   macro avg       0.63      0.46      0.50      5427
weighted avg       0.63      0.62      0.59      5427



In [229]:
# Making predictions on Ekman taxonomy for validation
y_val_pred_ekman = sgdc3.predict(X_val_ekman)

# Model evaluation
print(classification_report(y_val_ekman, y_val_pred_ekman, target_names=class_label_names_ekman))

              precision    recall  f1-score   support

       anger       0.51      0.23      0.31       582
     disgust       0.57      0.40      0.47        81
        fear       0.63      0.43      0.51        89
         joy       0.79      0.74      0.76      1997
     sadness       0.69      0.37      0.48       352
    surprise       0.63      0.18      0.28       559
     neutral       0.52      0.83      0.64      1766

    accuracy                           0.62      5426
   macro avg       0.62      0.45      0.49      5426
weighted avg       0.64      0.62      0.60      5426
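
The rows of these reports can also be read programmatically, which helps when comparing models. The sketch below uses made-up labels (`toy_true`, `toy_pred`, `toy_names`) and the `output_dict=True` option of `classification_report`. It illustrates the report structure and is not a re-computation of the results above.

```python
from sklearn.metrics import classification_report

# Hypothetical single-label targets encoded as integers
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_names = ["anger", "joy", "neutral"]

# output_dict=True returns nested dictionaries instead of a formatted string
report = classification_report(
    toy_true, toy_pred, target_names=toy_names, output_dict=True
)

# Per-class rows are keyed by class name, summary rows by their label
print(report["anger"]["recall"])
print(report["macro avg"]["f1-score"])
print(report["accuracy"])
```

The `macro avg` row weights all classes equally, whereas `weighted avg` weights each class by its support, so the two diverge when classes are imbalanced.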



### Model 2


In [135]:
sgdc3 = SGDClassifier(random_state=42,max_iter=1000, tol = 0.1, penalty='none', loss = 'hinge', n_jobs=-1 )

sgdc3.fit(X_train_ekman, y_train_ekman)

In [136]:
# Making predictions on Ekman taxonomy for train
y_pred_train_ekman = sgdc3.predict(X_train_ekman)

#cm = confusion_matrix(y_train, classifier_preds)
#print(cm)

# Model evaluation
print(classification_report(y_train_ekman, y_pred_train_ekman, target_names=class_label_names_ekman))

              precision    recall  f1-score   support

       anger       0.60      0.24      0.35      4517
     disgust       0.64      0.36      0.46       694
        fear       0.61      0.52      0.56       642
         joy       0.76      0.74      0.75     15693
     sadness       0.64      0.48      0.55      2938
    surprise       0.60      0.20      0.30      4707
     neutral       0.53      0.82      0.65     14219

    accuracy                           0.63     43410
   macro avg       0.63      0.48      0.52     43410
weighted avg       0.64      0.63      0.60     43410



In [137]:
# Making predictions on Ekman taxonomy for test
y_test_pred_ekman = sgdc3.predict(X_test_ekman)

# Model evaluation
print(classification_report(y_test_ekman, y_test_pred_ekman, target_names=class_label_names_ekman))

              precision    recall  f1-score   support

       anger       0.57      0.22      0.32       595
     disgust       0.67      0.38      0.48       112
        fear       0.64      0.55      0.59        87
         joy       0.75      0.73      0.74      1915
     sadness       0.58      0.42      0.49       341
    surprise       0.49      0.17      0.25       590
     neutral       0.53      0.81      0.64      1787

    accuracy                           0.61      5427
   macro avg       0.60      0.47      0.50      5427
weighted avg       0.61      0.61      0.58      5427



In [138]:
# Making predictions on Ekman taxonomy for validation
y_val_pred_ekman = sgdc3.predict(X_val_ekman)

# Model evaluation
print(classification_report(y_val_ekman, y_val_pred_ekman, target_names=class_label_names_ekman))

              precision    recall  f1-score   support

       anger       0.55      0.20      0.30       582
     disgust       0.53      0.41      0.46        81
        fear       0.62      0.49      0.55        89
         joy       0.77      0.74      0.76      1997
     sadness       0.61      0.41      0.49       352
    surprise       0.54      0.19      0.28       559
     neutral       0.53      0.81      0.64      1766

    accuracy                           0.62      5426
   macro avg       0.59      0.47      0.50      5426
weighted avg       0.63      0.62      0.60      5426



### SGD Model 4 - Ekman excluding neutral emotion - Hyperparameter tuning

"modified_huber", "perceptron",  'estimator__tol':[0.1,0.01],, 
          'estimator__loss': ["hinge"], 
          'estimator__penalty': ['none'],
          'estimator__alpha' : [3e-05,2e-005]

In [280]:
sgdc4 = SGDClassifier(random_state=42)
params = {'max_iter' :[1000], 
          'tol':[0.0001], 
          'loss': ["hinge"],
          'penalty': ['l2','l1'],
          'alpha' : [3e-05,4e-05]
         }
          

Grid_sgd4 = GridSearchCV(estimator = sgdc4, cv = 3, param_grid = params, scoring  = 'f1_macro', n_jobs=-1, verbose=1)
Grid_sgd4.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

Fitting 3 folds for each of 4 candidates, totalling 12 fits


In [281]:
Grid_sgd4.best_params_

{'alpha': 4e-05,
 'loss': 'hinge',
 'max_iter': 1000,
 'penalty': 'l2',
 'tol': 0.0001}

In [282]:
Grid_sgd4.best_score_

0.5438814804263884
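
This grid search is scored with `f1_macro` rather than `accuracy`. A small made-up example shows why the two can disagree on imbalanced classes. The labels below are illustrative, and `zero_division=0` is set only to silence the warning for the class that is never predicted.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels: 8 "joy" texts and 2 "fear" texts
toy_true = ["joy"] * 8 + ["fear"] * 2

# A degenerate model that always predicts the majority class
toy_pred = ["joy"] * 10

acc = accuracy_score(toy_true, toy_pred)
macro_f1 = f1_score(toy_true, toy_pred, average="macro", zero_division=0)

print(acc, macro_f1)
```

Accuracy rewards the majority-class guess, while macro F1 is halved by the class that is never predicted. This is also why `best_score_` values obtained with different `scoring` settings should not be compared directly.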

In [278]:
Grid_sgd4.best_params_

{'alpha': 4e-05,
 'loss': 'modified_huber',
 'max_iter': 1000,
 'penalty': 'l2',
 'tol': 0.0001}

In [279]:
Grid_sgd4.best_score_

0.5477244624716312

In [269]:
Grid_sgd4.best_params_

{'alpha': 4e-05, 'loss': 'modified_huber', 'max_iter': 1000, 'tol': 0.001}

In [270]:
Grid_sgd4.best_score_

0.5459002799533184

In [266]:
Grid_sgd4.best_params_

{'alpha': 4e-05,
 'loss': 'modified_huber',
 'max_iter': 1000,
 'penalty': 'l1',
 'tol': 0.001}

In [267]:
Grid_sgd4.best_score_

0.532623921310128

In [262]:
Grid_sgd4.best_params_

{'alpha': 4e-05, 'loss': 'hinge', 'max_iter': 1000, 'tol': 0.01}

In [263]:
Grid_sgd4.best_score_

0.5456841648068878

In [259]:
Grid_sgd4.best_params_

{'alpha': 4e-05, 'loss': 'hinge', 'max_iter': 1000, 'tol': 0.01}

In [260]:
Grid_sgd4.best_score_

0.6822308238234053

In [253]:
Grid_sgd4.best_params_

{'alpha': 4e-05, 'loss': 'hinge', 'max_iter': 1000, 'tol': 0.01}

In [254]:
Grid_sgd4.best_score_

0.5456841648068878

In [247]:
Grid_sgd4.best_params_

{'alpha': 3e-05, 'loss': 'hinge', 'max_iter': 1000, 'tol': 0.1}

In [248]:
Grid_sgd4.best_score_

0.5439326278849789

In [244]:
Grid_sgd4.best_params_

{'loss': 'hinge', 'max_iter': 1000, 'tol': 0.1}

In [245]:
Grid_sgd4.best_score_

0.542036750800761

In [241]:
Grid_sgd4.best_params_

{'loss': 'perceptron', 'max_iter': 1000, 'tol': 0.01}

In [242]:
Grid_sgd4.best_score_

0.4833476463634663

In [238]:
Grid_sgd4.best_params_

{'loss': 'modified_huber', 'max_iter': 1000, 'tol': 0.1}

In [239]:
Grid_sgd4.best_score_

0.5380909022264274

In [234]:
Grid_sgd4.best_params_

{'max_iter': 1000, 'tol': 0.1}

In [235]:
Grid_sgd4.best_score_

0.542036750800761

In [231]:
Grid_sgd4.best_params_

{'max_iter': 1000, 'tol': 0.01}

In [232]:
Grid_sgd4.best_score_

0.683943623027793

In [None]:
# Scratch parameter values kept for reference (this cell was never executed)
# 'max_iter': [1000],
# 'tol': [0.1, 0.01],
# 'loss': ["hinge"],
# 'alpha': [3e-5, 4e-05]

### SGD Model - Ekman excluding neutral emotion

- 'max_iter': 1000
- 'tol': 0.0001
- 'loss': 'modified_huber'
- 'alpha': 4e-05
- 'penalty': 'l2'



In [283]:
sgdc4 = SGDClassifier(random_state=42,max_iter=1000, tol = 0.0001, penalty='l2', loss = 'modified_huber', n_jobs=-1, alpha= 4e-05)

sgdc4.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [284]:
# Making predictions on Ekman taxonomy (excluding neutral) for train
y_pred_train_ekman_no_neu = sgdc4.predict(X_train_ekman_no_neu)

#cm = confusion_matrix(y_train, classifier_preds)
#print(cm)

# Model evaluation
print(classification_report(y_train_ekman_no_neu, y_pred_train_ekman_no_neu, target_names=class_label_names_ekman_no_neu))

              precision    recall  f1-score   support

       anger       0.63      0.52      0.57      4517
     disgust       0.69      0.45      0.54       694
        fear       0.71      0.51      0.59       642
         joy       0.75      0.93      0.83     15693
     sadness       0.73      0.56      0.63      2938
    surprise       0.67      0.37      0.48      4707

    accuracy                           0.72     29191
   macro avg       0.69      0.56      0.61     29191
weighted avg       0.71      0.72      0.70     29191



In [285]:
# Making predictions on Ekman taxonomy (excluding neutral) for test
y_test_pred_ekman_no_neu = sgdc4.predict(X_test_ekman_no_neu)

# Model evaluation
print(classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu, target_names=class_label_names_ekman_no_neu))

              precision    recall  f1-score   support

       anger       0.58      0.47      0.52       595
     disgust       0.65      0.42      0.51       112
        fear       0.74      0.57      0.65        87
         joy       0.72      0.91      0.80      1915
     sadness       0.63      0.49      0.55       341
    surprise       0.56      0.31      0.40       590

    accuracy                           0.68      3640
   macro avg       0.64      0.53      0.57      3640
weighted avg       0.66      0.68      0.65      3640



In [286]:
# Making predictions on Ekman taxonomy (excluding neutral) for validation
y_val_pred_ekman_no_neu = sgdc4.predict(X_val_ekman_no_neu)

# Model evaluation
print(classification_report(y_val_ekman_no_neu, y_val_pred_ekman_no_neu, target_names=class_label_names_ekman_no_neu))

              precision    recall  f1-score   support

       anger       0.59      0.47      0.52       582
     disgust       0.50      0.36      0.42        81
        fear       0.72      0.53      0.61        89
         joy       0.73      0.91      0.81      1997
     sadness       0.66      0.51      0.57       352
    surprise       0.57      0.33      0.42       559

    accuracy                           0.69      3660
   macro avg       0.63      0.52      0.56      3660
weighted avg       0.67      0.69      0.67      3660



### Model 2

In [151]:
sgdc4 = SGDClassifier(random_state=42,max_iter=1000, tol = 0.1, penalty='l1', loss = 'hinge', n_jobs=-1)

sgdc4.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [152]:
# Making predictions on Ekman taxonomy (excluding neutral) for train
y_pred_train_ekman_no_neu = sgdc4.predict(X_train_ekman_no_neu)

#cm = confusion_matrix(y_train, classifier_preds)
#print(cm)

# Model evaluation
print(classification_report(y_train_ekman_no_neu, y_pred_train_ekman_no_neu, target_names=class_label_names_ekman_no_neu))

              precision    recall  f1-score   support

       anger       0.61      0.46      0.52      4517
     disgust       0.36      0.45      0.40       694
        fear       0.72      0.40      0.51       642
         joy       0.71      0.94      0.81     15693
     sadness       0.77      0.47      0.59      2938
    surprise       0.68      0.29      0.41      4707

    accuracy                           0.69     29191
   macro avg       0.64      0.50      0.54     29191
weighted avg       0.69      0.69      0.66     29191



In [153]:
# Making predictions on Ekman taxonomy (excluding neutral) for test
y_test_pred_ekman_no_neu = sgdc4.predict(X_test_ekman_no_neu)

# Model evaluation
print(classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu, target_names=class_label_names_ekman_no_neu))

              precision    recall  f1-score   support

       anger       0.59      0.43      0.49       595
     disgust       0.50      0.50      0.50       112
        fear       0.81      0.49      0.61        87
         joy       0.69      0.93      0.79      1915
     sadness       0.68      0.43      0.52       341
    surprise       0.58      0.26      0.36       590

    accuracy                           0.67      3640
   macro avg       0.64      0.51      0.55      3640
weighted avg       0.65      0.67      0.64      3640



In [154]:
# Making predictions on Ekman taxonomy (excluding neutral) for validation
y_val_pred_ekman_no_neu = sgdc4.predict(X_val_ekman_no_neu)

# Model evaluation
print(classification_report(y_val_ekman_no_neu, y_val_pred_ekman_no_neu, target_names=class_label_names_ekman_no_neu))

              precision    recall  f1-score   support

       anger       0.63      0.46      0.53       582
     disgust       0.34      0.46      0.39        81
        fear       0.74      0.45      0.56        89
         joy       0.71      0.93      0.80      1997
     sadness       0.73      0.43      0.54       352
    surprise       0.62      0.27      0.37       559

    accuracy                           0.68      3660
   macro avg       0.63      0.50      0.53      3660
weighted avg       0.68      0.68      0.65      3660



### KNN

#### KNN model with GoEmotions labels (28) including neutral emotion

In [318]:
# Tune hyperparameters using GridSearchCV
# Define KNN
model_knn = KNeighborsClassifier(n_jobs=-1)

params = {'n_neighbors' : [3,5],
          'p':[1,2],
          'weights' : ['distance']
         }

model_knn_1 = GridSearchCV(estimator = model_knn , cv = 3, param_grid = params, scoring  = 'f1_macro', n_jobs=-1, verbose=1)
model_knn_1.fit(X_train, y_train)


Fitting 3 folds for each of 4 candidates, totalling 12 fits


In [319]:
model_knn_1.best_params_

{'n_neighbors': 3, 'p': 1, 'weights': 'distance'}

In [320]:
model_knn_1.best_score_

0.2001199406407684
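
The grid search above ranks each candidate by its mean macro F1 over 3 folds, but only the winner is printed. As a minimal sketch, the snippet below tabulates every candidate from `cv_results_`. It runs on synthetic multi-label data from `make_multilabel_classification`; the names `X_demo`, `y_demo`, `search` and `results` are illustrative assumptions, not objects from this notebook.

```python
import pandas as pd
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in (assumption) for the TF-IDF features and multi-label targets
X_demo, y_demo = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, random_state=0
)

# Same grid as in the cell above
param_grid = {"n_neighbors": [3, 5], "p": [1, 2], "weights": ["distance"]}
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid=param_grid,
    cv=3,
    scoring="f1_macro",
)
search.fit(X_demo, y_demo)

# One row per candidate, best-ranked first
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
results = (
    pd.DataFrame(search.cv_results_)[columns]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(results)
```

Looking at the whole table, rather than only `best_params_`, shows how close the candidates are to each other and whether the spread across folds is larger than the gap between them.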

In [316]:
model_knn_1.best_params_

{'n_neighbors': 3, 'weights': 'distance'}

In [317]:
model_knn_1.best_score_

0.19744407098789918

In [298]:
model_knn_1.best_params_

{'n_neighbors': 3, 'p': 1}

In [299]:
model_knn_1.best_score_

0.1983198029381603

In [295]:
model_knn_1.best_params_

{'n_neighbors': 3, 'p': 1}

In [296]:
model_knn_1.best_score_

0.24316977654918223

In [292]:
model_knn_1.best_params_

{'n_neighbors': 3}

In [293]:
model_knn_1.best_score_

0.2337019120018429

In [289]:
model_knn_1.best_params_

{'n_neighbors': 3}

In [290]:
model_knn_1.best_score_

0.19617991171342197

### KNN model with GoEmotions labels (28) including neutral emotion

- 'n_neighbors': 3
- 'p': 1

#### Model 1

In [687]:
# Building a KNN model with n_neighbors=5 and p=1 (the grid search above selected n_neighbors=3, p=1)

model_knn_GE = KNeighborsClassifier(n_jobs=-1, n_neighbors=5, p=1)
model_knn_GE.fit(X_train, y_train)

In [688]:
y_train_pred_GE = model_knn_GE.predict(X_train)
y_test_pred_GE = model_knn_GE.predict(X_test)
y_val_pred_GE = model_knn_GE.predict(X_val)



In [689]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE))

 Train set results
0.29725869615296013
 Test set results
0.21282476506357104
 Val set results
0.21654994471065242
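
Assuming `y_train`, `y_test` and `y_val` are multi-label indicator matrices, `accuracy_score` reports subset accuracy: a sample counts as correct only if every one of its labels is predicted exactly. That makes it much stricter than the per-label scores. A small sketch on toy arrays (illustrative values, not data from this notebook) compares it with Hamming loss and micro-averaged F1:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Toy multi-label targets: 4 samples, 3 labels (illustrative values)
y_true_demo = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]])
y_pred_demo = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])

# Subset accuracy: only rows 0 and 3 match on every label
subset_acc = accuracy_score(y_true_demo, y_pred_demo)

# Hamming loss: fraction of individual label slots that are wrong (2 of 12)
ham = hamming_loss(y_true_demo, y_pred_demo)

# Micro F1: pools true/false positives and negatives over all labels
micro_f1 = f1_score(y_true_demo, y_pred_demo, average="micro")

print(subset_acc, round(ham, 3), round(micro_f1, 3))
```

Here two missed labels out of twelve slots already halve the subset accuracy, which is one reason the accuracy figures in this section look low next to the per-label precision values.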


In [690]:
cr = classification_report(y_test, y_test_pred_GE)
print(cr)

              precision    recall  f1-score   support

           0       0.59      0.24      0.34       504
           1       0.81      0.27      0.41       264
           2       0.42      0.16      0.23       198
           3       0.37      0.04      0.07       320
           4       0.37      0.05      0.09       351
           5       0.61      0.13      0.21       135
           6       0.50      0.02      0.04       153
           7       0.29      0.02      0.04       284
           8       0.64      0.11      0.19        83
           9       0.33      0.03      0.06       151
          10       0.34      0.07      0.11       267
          11       0.71      0.12      0.21       123
          12       0.50      0.08      0.14        37
          13       0.42      0.11      0.17       103
          14       0.83      0.24      0.38        78
          15       0.93      0.50      0.65       352
          16       0.00      0.00      0.00         6
          17       0.53    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
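
The `_warn_prf` lines above are the tail of scikit-learn's `UndefinedMetricWarning`, which is raised when a label has no predicted (or no true) samples, so that precision or recall would divide by zero. A minimal sketch, on toy arrays with hypothetical label names, of making that choice explicit with the `zero_division` parameter:

```python
import numpy as np
from sklearn.metrics import classification_report

# Toy multi-label data in which the second label is never predicted
y_true_demo = np.array([[1, 0], [1, 0], [0, 1]])
y_pred_demo = np.array([[1, 0], [1, 0], [0, 0]])

# zero_division=0 reports undefined scores as 0.0 without emitting a warning
report = classification_report(
    y_true_demo,
    y_pred_demo,
    target_names=["joy", "grief"],  # hypothetical label names
    zero_division=0,
    output_dict=True,
)
print(report["grief"])
```

Setting `zero_division=0` does not change the scores that are reported (the default also falls back to 0), it only silences the warning, so the rows with 0.00 precision still deserve a look.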


In [691]:
cr = classification_report(y_val, y_val_pred_GE)
print(cr)

              precision    recall  f1-score   support

           0       0.67      0.29      0.40       488
           1       0.73      0.24      0.36       303
           2       0.47      0.21      0.29       195
           3       0.24      0.01      0.03       303
           4       0.42      0.05      0.09       397
           5       0.29      0.04      0.07       153
           6       0.50      0.02      0.04       152
           7       0.35      0.03      0.05       248
           8       0.82      0.18      0.30        77
           9       0.11      0.01      0.01       163
          10       0.50      0.06      0.11       292
          11       0.62      0.10      0.18        97
          12       0.33      0.03      0.05        35
          13       0.35      0.07      0.12        96
          14       0.75      0.23      0.36        90
          15       0.95      0.53      0.68       358
          16       0.00      0.00      0.00        13
          17       0.71    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [223]:
# Model evaluation
model_eval(y_test, y_test_pred_GE, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.51,0.25,0.34
amusement,0.75,0.29,0.42
anger,0.41,0.17,0.24
annoyance,0.27,0.06,0.1
approval,0.28,0.06,0.1
caring,0.43,0.15,0.22
confusion,0.16,0.04,0.06
curiosity,0.12,0.03,0.05
desire,0.52,0.13,0.21
disappointment,0.22,0.03,0.06


#### Model 2

In [692]:
# Building a KNN model with n_neighbors=7 (default p=2)

model_knn_GE_1 = KNeighborsClassifier(n_jobs=-1, n_neighbors=7)
model_knn_GE_1.fit(X_train, y_train)

In [693]:
y_train_pred_GE_1 = model_knn_GE_1.predict(X_train)
y_test_pred_GE_1 = model_knn_GE_1.predict(X_test)
y_val_pred_GE_1 = model_knn_GE_1.predict(X_val)



In [694]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE_1))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE_1))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE_1))

 Train set results
0.2942870306381018
 Test set results
0.25188870462502305
 Val set results
0.25285661629192774


In [695]:
cr = classification_report(y_test, y_test_pred_GE_1)
print(cr)

              precision    recall  f1-score   support

           0       0.66      0.21      0.32       504
           1       0.78      0.22      0.34       264
           2       0.42      0.16      0.23       198
           3       0.36      0.01      0.02       320
           4       0.38      0.03      0.05       351
           5       0.52      0.08      0.14       135
           6       0.67      0.01      0.03       153
           7       0.60      0.02      0.04       284
           8       0.75      0.11      0.19        83
           9       0.27      0.02      0.04       151
          10       0.41      0.05      0.09       267
          11       0.80      0.13      0.22       123
          12       0.75      0.08      0.15        37
          13       0.62      0.13      0.21       103
          14       0.88      0.28      0.43        78
          15       0.93      0.43      0.59       352
          16       0.00      0.00      0.00         6
          17       0.53    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [696]:
cr = classification_report(y_val, y_val_pred_GE_1)
print(cr)

              precision    recall  f1-score   support

           0       0.72      0.26      0.38       488
           1       0.74      0.19      0.31       303
           2       0.45      0.21      0.28       195
           3       0.33      0.01      0.02       303
           4       0.54      0.03      0.06       397
           5       0.27      0.03      0.05       153
           6       0.67      0.01      0.03       152
           7       0.67      0.02      0.05       248
           8       0.88      0.18      0.30        77
           9       0.00      0.00      0.00       163
          10       0.52      0.04      0.08       292
          11       0.73      0.11      0.20        97
          12       0.00      0.00      0.00        35
          13       0.48      0.12      0.20        96
          14       0.81      0.23      0.36        90
          15       0.93      0.46      0.61       358
          16       0.00      0.00      0.00        13
          17       0.71    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


#### Model 3 - Final Model

In [697]:
# Building the final KNN model with n_neighbors=7 and p=1
# Grid search candidate, kept for reference: KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)

model_knn_GE_2 = KNeighborsClassifier(n_jobs=-1, n_neighbors=7, p=1)
model_knn_GE_2.fit(X_train, y_train)

In [698]:
y_train_pred_GE_2 = model_knn_GE_2.predict(X_train)
y_test_pred_GE_2 = model_knn_GE_2.predict(X_test)
y_val_pred_GE_2 = model_knn_GE_2.predict(X_val)



In [699]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE_2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE_2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE_2))

 Train set results
0.3114950472241419
 Test set results
0.26570849456421597
 Val set results
0.26612605971249537


In [700]:
cr = classification_report(y_test, y_test_pred_GE_2, target_names=GE_taxonomy)
print(cr)

              precision    recall  f1-score   support

           0       0.66      0.24      0.35       504
           1       0.79      0.23      0.35       264
           2       0.44      0.16      0.23       198
           3       0.30      0.01      0.02       320
           4       0.50      0.04      0.08       351
           5       0.48      0.08      0.14       135
           6       0.25      0.01      0.01       153
           7       0.55      0.02      0.04       284
           8       0.67      0.10      0.17        83
           9       0.36      0.03      0.06       151
          10       0.31      0.03      0.05       267
          11       0.75      0.12      0.21       123
          12       1.00      0.03      0.05        37
          13       0.53      0.10      0.16       103
          14       0.88      0.27      0.41        78
          15       0.95      0.48      0.64       352
          16       0.00      0.00      0.00         6
          17       0.53    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [701]:
cr = classification_report(y_val, y_val_pred_GE_2)
print(cr)

              precision    recall  f1-score   support

           0       0.71      0.28      0.40       488
           1       0.69      0.23      0.34       303
           2       0.46      0.20      0.28       195
           3       0.14      0.00      0.01       303
           4       0.52      0.04      0.08       397
           5       0.33      0.04      0.07       153
           6       0.67      0.01      0.03       152
           7       0.50      0.02      0.05       248
           8       0.88      0.18      0.30        77
           9       0.00      0.00      0.00       163
          10       0.45      0.03      0.06       292
          11       0.67      0.10      0.18        97
          12       0.50      0.03      0.05        35
          13       0.53      0.09      0.16        96
          14       0.77      0.19      0.30        90
          15       0.95      0.48      0.64       358
          16       0.00      0.00      0.00        13
          17       0.70    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


#### Model 4

In [208]:
# Building a KNN model with n_neighbors=5 (default p=2)
# Grid search candidate, kept for reference: KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)

model_knn3 = KNeighborsClassifier(n_jobs=-1, n_neighbors=5)
model_knn3.fit(X_train, y_train)

n_jobs=-1, n_neighbors=5

In [209]:
y_train_pred2 = model_knn3.predict(X_train)
y_test_pred2 = model_knn3.predict(X_test)
y_val_pred2 = model_knn3.predict(X_val)



In [210]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred2))

 Train set results
0.29037088228518776
 Test set results
0.20766537681960567
 Val set results
0.2152598599336528


In [211]:
#cr = classification_report(y_test, y_test_pred2)
#print(cr)
# Model evaluation
model_eval(y_test, y_test_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.61,0.24,0.34
amusement,0.79,0.24,0.37
anger,0.44,0.16,0.24
annoyance,0.42,0.04,0.08
approval,0.3,0.03,0.05
caring,0.52,0.11,0.18
confusion,0.57,0.03,0.05
curiosity,0.2,0.02,0.03
desire,0.64,0.11,0.19
disappointment,0.25,0.02,0.04


In [212]:
#cr = classification_report(y_val, y_val_pred2)
#print(cr)
# Model evaluation
model_eval(y_val, y_val_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.64,0.26,0.37
amusement,0.74,0.22,0.34
anger,0.49,0.21,0.29
annoyance,0.33,0.02,0.04
approval,0.53,0.05,0.09
caring,0.3,0.04,0.07
confusion,0.67,0.01,0.03
curiosity,0.35,0.03,0.05
desire,0.78,0.18,0.29
disappointment,0.09,0.01,0.01


#### Model 5

In [203]:
# Building a KNN model with n_neighbors=3 (default p=2)
# Grid search candidate, kept for reference: KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)

model_knn3 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3)
model_knn3.fit(X_train, y_train)

n_jobs=-1, n_neighbors=5

In [204]:
y_train_pred2 = model_knn3.predict(X_train)
y_test_pred2 = model_knn3.predict(X_test)
y_val_pred2 = model_knn3.predict(X_val)



In [205]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred2))

 Train set results
0.3818244644091223
 Test set results
0.22259074995393405
 Val set results
0.22318466642093623


In [206]:
#cr = classification_report(y_test, y_test_pred2)
#print(cr)
# Model evaluation
model_eval(y_test, y_test_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.51,0.25,0.34
amusement,0.75,0.29,0.42
anger,0.41,0.17,0.24
annoyance,0.27,0.06,0.1
approval,0.28,0.06,0.1
caring,0.43,0.15,0.22
confusion,0.16,0.04,0.06
curiosity,0.12,0.03,0.05
desire,0.52,0.13,0.21
disappointment,0.22,0.03,0.06


In [207]:
#cr = classification_report(y_val, y_val_pred2)
#print(cr)
# Model evaluation
model_eval(y_val, y_val_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.56,0.29,0.38
amusement,0.71,0.25,0.37
anger,0.41,0.18,0.26
annoyance,0.18,0.04,0.07
approval,0.38,0.07,0.12
caring,0.2,0.07,0.1
confusion,0.24,0.04,0.07
curiosity,0.12,0.03,0.05
desire,0.55,0.21,0.3
disappointment,0.16,0.02,0.04


#### Model 6

In [193]:
# Building a KNN model with n_neighbors=9 (default p=2)
# Grid search candidate, kept for reference: KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)

model_knn3 = KNeighborsClassifier(n_jobs=-1, n_neighbors=9)
model_knn3.fit(X_train, y_train)

n_jobs=-1, n_neighbors=5

In [194]:
y_train_pred2 = model_knn3.predict(X_train)
y_test_pred2 = model_knn3.predict(X_test)
y_val_pred2 = model_knn3.predict(X_val)



In [195]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred2))

 Train set results
0.2940797051370652
 Test set results
0.2537313432835821
 Val set results
0.2532252119424991


In [196]:
#cr = classification_report(y_test, y_test_pred2)
#print(cr)
# Model evaluation
model_eval(y_test, y_test_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.65,0.22,0.33
amusement,0.78,0.22,0.34
anger,0.45,0.17,0.25
annoyance,0.43,0.01,0.02
approval,0.59,0.03,0.05
caring,0.6,0.07,0.12
confusion,0.0,0.0,0.0
curiosity,0.75,0.02,0.04
desire,0.73,0.1,0.17
disappointment,0.33,0.01,0.01


In [197]:
#cr = classification_report(y_val, y_val_pred2)
#print(cr)
# Model evaluation
model_eval(y_val, y_val_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.71,0.25,0.37
amusement,0.71,0.18,0.29
anger,0.48,0.22,0.3
annoyance,0.22,0.01,0.01
approval,0.67,0.03,0.05
caring,0.43,0.02,0.04
confusion,0.67,0.01,0.03
curiosity,0.7,0.03,0.05
desire,0.93,0.17,0.29
disappointment,0.0,0.0,0.0


#### Model 7

In [198]:
# Building a KNN model with n_neighbors=11 (default p=2)
# Grid search candidate, kept for reference: KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)

model_knn3 = KNeighborsClassifier(n_jobs=-1, n_neighbors=11)
model_knn3.fit(X_train, y_train)

n_jobs=-1, n_neighbors=5

In [199]:
y_train_pred2 = model_knn3.predict(X_train)
y_test_pred2 = model_knn3.predict(X_test)
y_val_pred2 = model_knn3.predict(X_val)



In [200]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred2))

 Train set results
0.28373646625201565
 Test set results
0.25981205085682696
 Val set results
0.2541467010689274


In [201]:
#cr = classification_report(y_test, y_test_pred2)
#print(cr)
# Model evaluation
model_eval(y_test, y_test_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.69,0.2,0.31
amusement,0.78,0.22,0.34
anger,0.49,0.18,0.26
annoyance,0.4,0.01,0.01
approval,0.6,0.03,0.06
caring,0.67,0.07,0.13
confusion,0.0,0.0,0.0
curiosity,0.83,0.02,0.03
desire,0.78,0.08,0.15
disappointment,1.0,0.01,0.01


In [202]:
#cr = classification_report(y_val, y_val_pred2)
#print(cr)
# Model evaluation
model_eval(y_val, y_val_pred2, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.75,0.23,0.35
amusement,0.65,0.17,0.27
anger,0.52,0.21,0.3
annoyance,0.14,0.0,0.01
approval,0.61,0.03,0.05
caring,0.44,0.03,0.05
confusion,0.67,0.01,0.03
curiosity,0.8,0.02,0.03
desire,0.93,0.17,0.29
disappointment,0.0,0.0,0.0
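
Models 4 to 7 repeat the same cells with a different `n_neighbors` each time. A minimal sketch of the same sweep written as a loop is shown below. It uses synthetic multi-label data and macro F1 as the comparison metric; both are assumptions made for illustration, as are the names `X_tr`, `X_va`, `scores` and `best_k`.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in (assumption) for the notebook's features and labels
X_demo, y_demo = make_multilabel_classification(
    n_samples=400, n_features=20, n_classes=5, random_state=0
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Fit one model per k and keep its validation macro F1
scores = {}
for k in (3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_tr, y_tr)
    scores[k] = f1_score(y_va, knn.predict(X_va), average="macro", zero_division=0)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Keeping every score in one dictionary avoids reusing names such as `model_knn3` and `y_test_pred2` across cells, which makes it easy to lose track of which model produced which output.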


### Hyperparameter tuning for the KNN model with GoEmotions labels (27) excluding neutral emotion


In [385]:
# Tune hyperparameters using GridSearchCV
# Define KNN
model_knn = KNeighborsClassifier(n_jobs=-1)

params = {'n_neighbors' : [3],
          'p':[1],
          'weights' : ['distance']
         }

model_knn_1 = GridSearchCV(estimator = model_knn , cv = 3, param_grid = params, scoring  = 'f1_macro', n_jobs=-1, verbose=1)
model_knn_1.fit(X_train_no_neu, y_train_no_neu)


Fitting 3 folds for each of 1 candidates, totalling 3 fits


In [386]:
model_knn_1.best_params_

{'n_neighbors': 3}

In [387]:
model_knn_1.best_score_

0.22350994670692462

In [378]:
model_knn_1.best_params_

{'n_neighbors': 3, 'p': 1, 'weights': 'distance'}

In [379]:
model_knn_1.best_score_

0.23184676334307786

In [370]:
model_knn_1.best_params_

{'n_neighbors': 3, 'p': 1}

In [371]:
model_knn_1.best_score_

0.22897653556303768

### KNN model with GoEmotions labels (27) excluding neutral emotion

- 'n_neighbors': 3
- 'p': 1

#### Model 1

In [372]:
# Building model using best parameter for KNN Model

model_knn2 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)
model_knn2.fit(X_train_no_neu, y_train_no_neu)

In [373]:
y_train_pred2 = model_knn2.predict(X_train_no_neu)
y_test_pred2 = model_knn2.predict(X_test_no_neu)
y_val_pred2 = model_knn2.predict(X_val_no_neu)



In [375]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2))

 Train set results
0.3378232582469677
 Test set results
0.18738550117770217
 Val set results
0.18988002086593636


In [376]:
cr = classification_report(y_test_no_neu, y_test_pred2)
print(cr)

              precision    recall  f1-score   support

           0       0.58      0.33      0.42       504
           1       0.78      0.35      0.48       264
           2       0.48      0.22      0.30       198
           3       0.39      0.10      0.16       320
           4       0.33      0.12      0.18       351
           5       0.42      0.19      0.26       135
           6       0.16      0.11      0.13       153
           7       0.20      0.07      0.10       284
           8       0.53      0.12      0.20        83
           9       0.26      0.04      0.07       151
          10       0.27      0.11      0.16       267
          11       0.55      0.20      0.29       123
          12       0.33      0.05      0.09        37
          13       0.28      0.15      0.19       103
          14       0.69      0.31      0.42        78
          15       0.93      0.53      0.68       352
          16       0.50      0.17      0.25         6
          17       0.44    

  _warn_prf(average, modifier, msg_start, len(result))


In [395]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2, GE_taxonomy_no_neu)

Unnamed: 0,Precision,Recall,F1
admiration,0.58,0.33,0.42
amusement,0.78,0.35,0.48
anger,0.48,0.22,0.3
annoyance,0.39,0.1,0.16
approval,0.33,0.12,0.18
caring,0.42,0.19,0.26
confusion,0.16,0.11,0.13
curiosity,0.2,0.07,0.1
desire,0.53,0.12,0.2
disappointment,0.26,0.04,0.07


#### Model 2

In [381]:
# Building model using best parameter for KNN Model

model_knn2_1 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1, weights='distance')
model_knn2_1.fit(X_train_no_neu, y_train_no_neu)

In [382]:
y_train_pred2_1 = model_knn2_1.predict(X_train_no_neu)
y_test_pred2_1 = model_knn2_1.predict(X_test_no_neu)
y_val_pred2_1 = model_knn2_1.predict(X_val_no_neu)



In [383]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2_1))

 Train set results
0.9049596233694053
 Test set results
0.19104946349123267
 Val set results
0.19222743870631195
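
With `weights='distance'`, the train accuracy jumps to about 0.90 while the test and validation scores barely move. A likely explanation (an assumption, not verified on this dataset) is that each training sample is its own nearest neighbor at distance zero, so it dominates the weighted vote and the model simply reproduces the training labels. The sketch below shows the effect on random toy data where there is nothing to learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))          # continuous features, so no duplicate rows
y_demo = rng.integers(0, 2, size=(200, 3))  # random multi-label targets

uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_demo, y_demo)
distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_demo, y_demo)

# Both models are scored on the data they were fitted on
acc_uniform = accuracy_score(y_demo, uniform.predict(X_demo))
acc_distance = accuracy_score(y_demo, distance.predict(X_demo))
print(acc_uniform, acc_distance)
```

On the toy data the distance-weighted model scores perfectly on its own training set even though the labels are pure noise. For that reason the train score of a distance-weighted KNN says little about generalization, and the test and validation scores are the ones to compare.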


In [384]:
cr = classification_report(y_test_no_neu, y_test_pred2_1)
print(cr)

              precision    recall  f1-score   support

           0       0.58      0.34      0.43       504
           1       0.78      0.36      0.50       264
           2       0.47      0.23      0.31       198
           3       0.37      0.10      0.16       320
           4       0.32      0.12      0.17       351
           5       0.40      0.19      0.26       135
           6       0.16      0.12      0.14       153
           7       0.20      0.07      0.11       284
           8       0.53      0.12      0.20        83
           9       0.27      0.05      0.08       151
          10       0.24      0.10      0.15       267
          11       0.53      0.20      0.29       123
          12       0.29      0.05      0.09        37
          13       0.28      0.15      0.19       103
          14       0.65      0.31      0.42        78
          15       0.93      0.55      0.69       352
          16       0.50      0.17      0.25         6
          17       0.42    

  _warn_prf(average, modifier, msg_start, len(result))


In [392]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2_1, GE_taxonomy_no_neu)

Unnamed: 0,Precision,Recall,F1
admiration,0.58,0.34,0.43
amusement,0.78,0.36,0.5
anger,0.47,0.23,0.31
annoyance,0.37,0.1,0.16
approval,0.32,0.12,0.17
caring,0.4,0.19,0.26
confusion,0.16,0.12,0.14
curiosity,0.2,0.07,0.11
desire,0.53,0.12,0.2
disappointment,0.27,0.05,0.08


#### Model 3

In [213]:
# Building model using best parameter for KNN Model

model_knn2_2 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3)
model_knn2_2.fit(X_train_no_neu, y_train_no_neu)

In [214]:
y_train_pred2_2 = model_knn2_2.predict(X_train_no_neu)
y_test_pred2_2 = model_knn2_2.predict(X_test_no_neu)
y_val_pred2_2 = model_knn2_2.predict(X_val_no_neu)



In [215]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2_2))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2_2))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2_2))

 Train set results
0.32327459378167195
 Test set results
0.18215126930123005
 Val set results
0.18440271257172666


In [216]:
cr = classification_report(y_test_no_neu, y_test_pred2_2)
print(cr)

              precision    recall  f1-score   support

           0       0.56      0.33      0.41       504
           1       0.75      0.31      0.44       264
           2       0.48      0.22      0.30       198
           3       0.31      0.08      0.13       320
           4       0.35      0.15      0.21       351
           5       0.43      0.19      0.27       135
           6       0.24      0.12      0.16       153
           7       0.17      0.05      0.08       284
           8       0.65      0.13      0.22        83
           9       0.30      0.05      0.09       151
          10       0.27      0.10      0.15       267
          11       0.55      0.19      0.28       123
          12       0.25      0.05      0.09        37
          13       0.31      0.17      0.22       103
          14       0.72      0.33      0.46        78
          15       0.86      0.51      0.64       352
          16       1.00      0.17      0.29         6
          17       0.46    

  _warn_prf(average, modifier, msg_start, len(result))


In [217]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2_2, GE_taxonomy_no_neu)

Unnamed: 0,Precision,Recall,F1
admiration,0.56,0.33,0.41
amusement,0.75,0.31,0.44
anger,0.48,0.22,0.3
annoyance,0.31,0.08,0.13
approval,0.35,0.15,0.21
caring,0.43,0.19,0.27
confusion,0.24,0.12,0.16
curiosity,0.17,0.05,0.08
desire,0.65,0.13,0.22
disappointment,0.3,0.05,0.09


In [218]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred2_2, GE_taxonomy_no_neu)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.56,0.35,0.43
amusement,0.68,0.28,0.39
anger,0.5,0.21,0.3
annoyance,0.28,0.08,0.13
approval,0.34,0.15,0.2
caring,0.35,0.12,0.18
confusion,0.23,0.11,0.14
curiosity,0.14,0.06,0.08
desire,0.55,0.22,0.31
disappointment,0.21,0.04,0.07


#### Model 4

In [613]:
# Building a KNN model with n_neighbors=5 (default p=2)

model_knn2_3 = KNeighborsClassifier(n_jobs=-1, n_neighbors=5)
model_knn2_3.fit(X_train_no_neu, y_train_no_neu)

In [614]:
y_train_pred2_3 = model_knn2_3.predict(X_train_no_neu)
y_test_pred2_3 = model_knn2_3.predict(X_test_no_neu)
y_val_pred2_3 = model_knn2_3.predict(X_val_no_neu)



In [615]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2_3))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2_3))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2_3))

 Train set results
0.22248013862098276
 Test set results
0.16147605338916515
 Val set results
0.1630151278038602


In [616]:
cr = classification_report(y_train_no_neu, y_train_pred2_3)
print(cr)

              precision    recall  f1-score   support

           0       0.78      0.41      0.54      4130
           1       0.88      0.35      0.51      2328
           2       0.68      0.29      0.41      1567
           3       0.64      0.13      0.21      2470
           4       0.68      0.15      0.25      2939
           5       0.67      0.16      0.26      1087
           6       0.80      0.14      0.24      1368
           7       0.61      0.12      0.19      2191
           8       0.79      0.17      0.28       641
           9       0.59      0.07      0.12      1269
          10       0.63      0.15      0.25      2022
          11       0.73      0.20      0.31       793
          12       0.88      0.16      0.27       303
          13       0.74      0.20      0.32       853
          14       0.79      0.32      0.46       596
          15       0.93      0.54      0.68      2662
          16       0.60      0.04      0.07        77
          17       0.71    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [617]:
cr = classification_report(y_test_no_neu, y_test_pred2_3)
print(cr)

              precision    recall  f1-score   support

           0       0.62      0.28      0.38       504
           1       0.83      0.28      0.41       264
           2       0.48      0.19      0.27       198
           3       0.37      0.07      0.11       320
           4       0.40      0.07      0.12       351
           5       0.49      0.15      0.23       135
           6       0.54      0.10      0.17       153
           7       0.25      0.04      0.06       284
           8       0.75      0.14      0.24        83
           9       0.35      0.05      0.08       151
          10       0.36      0.10      0.16       267
          11       0.71      0.20      0.31       123
          12       0.50      0.08      0.14        37
          13       0.40      0.12      0.18       103
          14       0.81      0.33      0.47        78
          15       0.88      0.46      0.60       352
          16       1.00      0.17      0.29         6
          17       0.52    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [618]:
cr = classification_report(y_val_no_neu, y_val_pred2_3)
print(cr)

              precision    recall  f1-score   support

           0       0.65      0.34      0.45       488
           1       0.75      0.23      0.35       303
           2       0.56      0.21      0.31       195
           3       0.36      0.05      0.09       303
           4       0.42      0.08      0.14       397
           5       0.57      0.10      0.18       153
           6       0.42      0.05      0.09       152
           7       0.27      0.04      0.07       248
           8       0.81      0.22      0.35        77
           9       0.12      0.01      0.02       163
          10       0.52      0.11      0.19       292
          11       0.48      0.14      0.22        97
          12       1.00      0.09      0.16        35
          13       0.40      0.12      0.19        96
          14       0.83      0.27      0.40        90
          15       0.91      0.49      0.64       358
          16       0.00      0.00      0.00        13
          17       0.59    

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [587]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2_3, GE_taxonomy_no_neu)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.62,0.28,0.38
amusement,0.83,0.28,0.41
anger,0.48,0.19,0.27
annoyance,0.37,0.07,0.11
approval,0.4,0.07,0.12
caring,0.49,0.15,0.23
confusion,0.54,0.1,0.17
curiosity,0.25,0.04,0.06
desire,0.75,0.14,0.24
disappointment,0.35,0.05,0.08


#### Model 5

In [603]:
# Build a KNN model with n_neighbors=7 (default Euclidean distance, p=2)

model_knn2_4 = KNeighborsClassifier(n_jobs=-1, n_neighbors=7)
model_knn2_4.fit(X_train_no_neu, y_train_no_neu)

In [604]:
y_train_pred2_4 = model_knn2_4.predict(X_train_no_neu)
y_test_pred2_4 = model_knn2_4.predict(X_test_no_neu)
y_val_pred2_4 = model_knn2_4.predict(X_val_no_neu)



In [605]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2_4))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2_4))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2_4))

 Train set results
0.1841632065910354
 Test set results
0.15362470557445695
 Val set results
0.14945226917057902
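
The accuracy values above look low next to the per-label precision in the reports. If, as the multi-label setup suggests, the targets are binary indicator matrices, `accuracy_score` computes subset accuracy: a sample only counts as correct when every one of its labels matches. Here is a minimal sketch on toy arrays (`y_true_demo` and `y_pred_demo` are made up for illustration) that contrasts this with a per-label view:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Toy multi-label targets: 4 samples, 3 labels (illustrative values only)
y_true_demo = np.array([[1, 0, 0],
                        [0, 1, 1],
                        [1, 1, 0],
                        [0, 0, 1]])
y_pred_demo = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [1, 0, 0],
                        [0, 0, 1]])

# Subset accuracy: only rows 0 and 3 match exactly, so 2 / 4 = 0.5
subset_acc = accuracy_score(y_true_demo, y_pred_demo)

# Per-label accuracy: 10 of the 12 individual label decisions are correct
label_acc = 1 - hamming_loss(y_true_demo, y_pred_demo)

print("Subset accuracy:", subset_acc)
print("Per-label accuracy:", round(label_acc, 4))
```

Subset accuracy is a strict metric, which is why the per-label precision, recall and F1 scores are usually more informative for this kind of task.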


In [606]:
cr = classification_report(y_train_no_neu, y_train_pred2_4)
print(cr)

              precision    recall  f1-score   support

           0       0.78      0.36      0.50      4130
           1       0.86      0.29      0.43      2328
           2       0.64      0.27      0.38      1567
           3       0.72      0.08      0.15      2470
           4       0.69      0.09      0.16      2939
           5       0.69      0.15      0.24      1087
           6       0.77      0.09      0.16      1368
           7       0.65      0.07      0.13      2191
           8       0.75      0.15      0.25       641
           9       0.55      0.05      0.09      1269
          10       0.64      0.10      0.18      2022
          11       0.74      0.17      0.28       793
          12       0.84      0.09      0.16       303
          13       0.77      0.18      0.29       853
          14       0.80      0.28      0.42       596
          15       0.93      0.49      0.64      2662
          16       0.67      0.03      0.05        77
          17       0.69    



In [607]:
cr = classification_report(y_test_no_neu, y_test_pred2_4)
print(cr)

              precision    recall  f1-score   support

           0       0.64      0.27      0.38       504
           1       0.83      0.26      0.40       264
           2       0.51      0.21      0.29       198
           3       0.54      0.06      0.11       320
           4       0.50      0.06      0.11       351
           5       0.55      0.13      0.21       135
           6       0.47      0.06      0.10       153
           7       0.42      0.03      0.05       284
           8       0.80      0.14      0.24        83
           9       0.43      0.04      0.07       151
          10       0.48      0.08      0.14       267
          11       0.70      0.15      0.25       123
          12       0.25      0.03      0.05        37
          13       0.47      0.14      0.21       103
          14       0.86      0.31      0.45        78
          15       0.92      0.45      0.60       352
          16       1.00      0.17      0.29         6
          17       0.55    



In [619]:
cr = classification_report(y_val_no_neu, y_val_pred2_4)
print(cr)

              precision    recall  f1-score   support

           0       0.70      0.32      0.44       488
           1       0.74      0.22      0.34       303
           2       0.56      0.22      0.32       195
           3       0.48      0.04      0.07       303
           4       0.50      0.06      0.10       397
           5       0.53      0.10      0.17       153
           6       0.42      0.05      0.09       152
           7       0.48      0.04      0.07       248
           8       0.75      0.19      0.31        77
           9       0.11      0.01      0.01       163
          10       0.57      0.09      0.15       292
          11       0.54      0.14      0.23        97
          12       1.00      0.09      0.16        35
          13       0.48      0.12      0.20        96
          14       0.85      0.24      0.38        90
          15       0.92      0.47      0.62       358
          16       0.00      0.00      0.00        13
          17       0.65    



In [587]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2_4, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.62,0.28,0.38
amusement,0.83,0.28,0.41
anger,0.48,0.19,0.27
annoyance,0.37,0.07,0.11
approval,0.4,0.07,0.12
caring,0.49,0.15,0.23
confusion,0.54,0.1,0.17
curiosity,0.25,0.04,0.06
desire,0.75,0.14,0.24
disappointment,0.35,0.05,0.08


#### Model 6

In [608]:
# Build a KNN model with n_neighbors=9 (default Euclidean distance, p=2)

model_knn2_5 = KNeighborsClassifier(n_jobs=-1, n_neighbors=9)
model_knn2_5.fit(X_train_no_neu, y_train_no_neu)

In [609]:
y_train_pred2_5 = model_knn2_5.predict(X_train_no_neu)
y_test_pred2_5 = model_knn2_5.predict(X_test_no_neu)
y_val_pred2_5 = model_knn2_5.predict(X_val_no_neu)



In [610]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2_5))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2_5))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2_5))

 Train set results
0.1645143361558832
 Test set results
0.14237110704004188
 Val set results
0.1387584767866458


In [611]:
cr = classification_report(y_train_no_neu, y_train_pred2_5)
print(cr)

              precision    recall  f1-score   support

           0       0.79      0.33      0.47      4130
           1       0.85      0.27      0.41      2328
           2       0.64      0.25      0.36      1567
           3       0.70      0.07      0.13      2470
           4       0.67      0.07      0.13      2939
           5       0.70      0.11      0.19      1087
           6       0.81      0.08      0.14      1368
           7       0.79      0.06      0.10      2191
           8       0.74      0.14      0.23       641
           9       0.73      0.03      0.06      1269
          10       0.63      0.08      0.14      2022
          11       0.76      0.16      0.26       793
          12       0.89      0.08      0.15       303
          13       0.64      0.17      0.27       853
          14       0.80      0.24      0.37       596
          15       0.94      0.45      0.61      2662
          16       0.00      0.00      0.00        77
          17       0.69    



In [612]:
cr = classification_report(y_test_no_neu, y_test_pred2_5)
print(cr)

              precision    recall  f1-score   support

           0       0.68      0.25      0.37       504
           1       0.79      0.26      0.39       264
           2       0.59      0.21      0.31       198
           3       0.59      0.05      0.10       320
           4       0.54      0.06      0.10       351
           5       0.60      0.11      0.19       135
           6       0.36      0.03      0.05       153
           7       0.41      0.02      0.05       284
           8       0.79      0.13      0.23        83
           9       0.50      0.02      0.04       151
          10       0.45      0.06      0.11       267
          11       0.77      0.16      0.27       123
          12       0.33      0.03      0.05        37
          13       0.42      0.13      0.19       103
          14       0.89      0.31      0.46        78
          15       0.94      0.43      0.59       352
          16       0.00      0.00      0.00         6
          17       0.58    



In [587]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2_5, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.62,0.28,0.38
amusement,0.83,0.28,0.41
anger,0.48,0.19,0.27
annoyance,0.37,0.07,0.11
approval,0.4,0.07,0.12
caring,0.49,0.15,0.23
confusion,0.54,0.1,0.17
curiosity,0.25,0.04,0.06
desire,0.75,0.14,0.24
disappointment,0.35,0.05,0.08


#### Model 7 - Final model

In [153]:
# Build a KNN model with n_neighbors=3 and Manhattan distance (p=1)

model_knn2_6 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)
model_knn2_6.fit(X_train_no_neu, y_train_no_neu)
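
In `KNeighborsClassifier`, the `p` parameter sets the power of the Minkowski distance: `p=1` gives the Manhattan (L1) distance and `p=2`, the default, gives the Euclidean (L2) distance. Here is a small sketch on made-up 2-D points (`X_demo`, `y_demo` and `query` are illustrative, not the notebook's features) showing how the neighbour distances change:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Three toy points and a query at the origin (illustrative values only)
X_demo = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
y_demo = np.array([0, 1, 0])
query = np.array([[0.0, 0.0]])

knn_l1 = KNeighborsClassifier(n_neighbors=3, p=1).fit(X_demo, y_demo)
knn_l2 = KNeighborsClassifier(n_neighbors=3, p=2).fit(X_demo, y_demo)

# kneighbors returns distances sorted from nearest to farthest
dist_l1, _ = knn_l1.kneighbors(query)  # Manhattan: 0, 2, 7
dist_l2, _ = knn_l2.kneighbors(query)  # Euclidean: 0, sqrt(2), 5

print("p=1:", dist_l1[0])
print("p=2:", dist_l2[0])
```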

In [154]:
y_train_pred2_6 = model_knn2_6.predict(X_train_no_neu)
y_test_pred2_6 = model_knn2_6.predict(X_test_no_neu)
y_val_pred2_6 = model_knn2_6.predict(X_val_no_neu)



In [155]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2_6))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2_6))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2_6))

 Train set results
0.3378232582469677
 Test set results
0.18738550117770217
 Val set results
0.18988002086593636


In [678]:
cr = classification_report(y_train_no_neu, y_train_pred2_6)
print(cr)

              precision    recall  f1-score   support

           0       0.79      0.43      0.55      4130
           1       0.88      0.38      0.53      2328
           2       0.67      0.31      0.42      1567
           3       0.63      0.13      0.21      2470
           4       0.70      0.18      0.29      2939
           5       0.70      0.18      0.29      1087
           6       0.81      0.14      0.24      1368
           7       0.60      0.11      0.19      2191
           8       0.74      0.18      0.29       641
           9       0.58      0.06      0.11      1269
          10       0.68      0.16      0.26      2022
          11       0.74      0.20      0.32       793
          12       0.93      0.14      0.25       303
          13       0.71      0.18      0.28       853
          14       0.86      0.34      0.49       596
          15       0.96      0.56      0.71      2662
          16       0.60      0.04      0.07        77
          17       0.71    



In [679]:
cr = classification_report(y_test_no_neu, y_test_pred2_6)
print(cr)

              precision    recall  f1-score   support

           0       0.64      0.30      0.41       504
           1       0.84      0.32      0.46       264
           2       0.51      0.21      0.30       198
           3       0.49      0.08      0.14       320
           4       0.44      0.08      0.13       351
           5       0.51      0.16      0.24       135
           6       0.38      0.10      0.16       153
           7       0.30      0.05      0.08       284
           8       0.71      0.14      0.24        83
           9       0.35      0.04      0.07       151
          10       0.30      0.08      0.12       267
          11       0.74      0.19      0.30       123
          12       1.00      0.08      0.15        37
          13       0.31      0.08      0.12       103
          14       0.80      0.31      0.44        78
          15       0.92      0.50      0.65       352
          16       1.00      0.17      0.29         6
          17       0.50    



In [680]:
cr = classification_report(y_val_no_neu, y_val_pred2_6)
print(cr)

              precision    recall  f1-score   support

           0       0.66      0.34      0.45       488
           1       0.76      0.27      0.40       303
           2       0.53      0.22      0.31       195
           3       0.33      0.04      0.08       303
           4       0.45      0.08      0.14       397
           5       0.53      0.12      0.19       153
           6       0.36      0.09      0.15       152
           7       0.29      0.04      0.07       248
           8       0.74      0.26      0.38        77
           9       0.21      0.02      0.03       163
          10       0.52      0.12      0.19       292
          11       0.44      0.11      0.18        97
          12       1.00      0.06      0.11        35
          13       0.41      0.11      0.18        96
          14       0.82      0.20      0.32        90
          15       0.94      0.54      0.69       358
          16       0.00      0.00      0.00        13
          17       0.60    



In [674]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2_6, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.58,0.33,0.42
amusement,0.78,0.35,0.48
anger,0.48,0.22,0.3
annoyance,0.39,0.1,0.16
approval,0.33,0.12,0.18
caring,0.42,0.19,0.26
confusion,0.16,0.11,0.13
curiosity,0.2,0.07,0.1
desire,0.53,0.12,0.2
disappointment,0.26,0.04,0.07


In [157]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred2_6, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.58,0.35,0.44
amusement,0.74,0.31,0.43
anger,0.46,0.23,0.31
annoyance,0.33,0.1,0.15
approval,0.34,0.13,0.19
caring,0.35,0.12,0.18
confusion,0.22,0.14,0.17
curiosity,0.15,0.06,0.09
desire,0.52,0.22,0.31
disappointment,0.19,0.04,0.06


#### Model 8 - Final Model

In [219]:
# Build a KNN model with n_neighbors=7 and Manhattan distance (p=1)

model_knn2_7 = KNeighborsClassifier(n_jobs=-1, n_neighbors=7, p=1)
model_knn2_7.fit(X_train_no_neu, y_train_no_neu)

In [220]:
y_train_pred2_7 = model_knn2_7.predict(X_train_no_neu)
y_test_pred2_7 = model_knn2_7.predict(X_test_no_neu)
y_val_pred2_7 = model_knn2_7.predict(X_val_no_neu)



In [221]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred2_7))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred2_7))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred2_7))

 Train set results
0.19279432438617713
 Test set results
0.156241821512693
 Val set results
0.162754303599374


In [222]:
cr = classification_report(y_train_no_neu, y_train_pred2_7)
print(cr)

              precision    recall  f1-score   support

           0       0.78      0.38      0.51      4130
           1       0.86      0.32      0.47      2328
           2       0.64      0.30      0.41      1567
           3       0.73      0.07      0.14      2470
           4       0.69      0.09      0.17      2939
           5       0.66      0.14      0.23      1087
           6       0.78      0.09      0.16      1368
           7       0.68      0.07      0.13      2191
           8       0.71      0.17      0.27       641
           9       0.53      0.05      0.08      1269
          10       0.65      0.11      0.19      2022
          11       0.75      0.17      0.28       793
          12       0.93      0.09      0.16       303
          13       0.68      0.15      0.25       853
          14       0.83      0.25      0.38       596
          15       0.95      0.51      0.67      2662
          16       0.00      0.00      0.00        77
          17       0.67    



In [223]:
cr = classification_report(y_test_no_neu, y_test_pred2_7)
print(cr)

              precision    recall  f1-score   support

           0       0.68      0.28      0.40       504
           1       0.82      0.28      0.42       264
           2       0.52      0.22      0.31       198
           3       0.62      0.06      0.10       320
           4       0.61      0.07      0.13       351
           5       0.56      0.15      0.23       135
           6       0.32      0.05      0.08       153
           7       0.36      0.03      0.05       284
           8       0.67      0.14      0.24        83
           9       0.38      0.03      0.06       151
          10       0.43      0.07      0.12       267
          11       0.72      0.15      0.24       123
          12       1.00      0.05      0.10        37
          13       0.32      0.07      0.11       103
          14       0.85      0.29      0.44        78
          15       0.95      0.48      0.64       352
          16       1.00      0.17      0.29         6
          17       0.56    



In [224]:
cr = classification_report(y_val_no_neu, y_val_pred2_7)
print(cr)

              precision    recall  f1-score   support

           0       0.71      0.33      0.45       488
           1       0.75      0.26      0.39       303
           2       0.54      0.23      0.32       195
           3       0.58      0.04      0.07       303
           4       0.54      0.07      0.12       397
           5       0.48      0.09      0.15       153
           6       0.50      0.07      0.13       152
           7       0.42      0.04      0.07       248
           8       0.69      0.23      0.35        77
           9       0.00      0.00      0.00       163
          10       0.46      0.08      0.13       292
          11       0.54      0.14      0.23        97
          12       1.00      0.06      0.11        35
          13       0.42      0.11      0.18        96
          14       0.80      0.18      0.29        90
          15       0.95      0.50      0.66       358
          16       0.00      0.00      0.00        13
          17       0.68    



In [225]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred2_7, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.68,0.28,0.4
amusement,0.82,0.28,0.42
anger,0.52,0.22,0.31
annoyance,0.62,0.06,0.1
approval,0.61,0.07,0.13
caring,0.56,0.15,0.23
confusion,0.32,0.05,0.08
curiosity,0.36,0.03,0.05
desire,0.67,0.14,0.24
disappointment,0.38,0.03,0.06


In [226]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred2_7, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.71,0.33,0.45
amusement,0.75,0.26,0.39
anger,0.54,0.23,0.32
annoyance,0.58,0.04,0.07
approval,0.54,0.07,0.12
caring,0.48,0.09,0.15
confusion,0.5,0.07,0.13
curiosity,0.42,0.04,0.07
desire,0.69,0.23,0.35
disappointment,0.0,0.0,0.0
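
A row of zeros, like `disappointment` above, means the label had no correct predictions. When a label is never predicted at all, its precision is undefined; scikit-learn then raises an `UndefinedMetricWarning` and reports 0. The `zero_division` argument makes that choice explicit and silences the warning. Here is a small sketch on made-up arrays (`y_true_demo` and `y_pred_demo` are illustrative only):

```python
import numpy as np
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Toy multi-label data in which label 1 is never predicted (illustrative only)
y_true_demo = np.array([[1, 0], [1, 1], [0, 1]])
y_pred_demo = np.array([[1, 0], [1, 0], [0, 0]])

# zero_division=0 reports undefined scores as 0 without emitting a warning
report = classification_report(y_true_demo, y_pred_demo, zero_division=0)
print(report)

# Same scores as arrays, one entry per label
precision, recall, f1, support = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, average=None, zero_division=0
)
print("Precision per label:", precision)
print("Recall per label:", recall)
```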


### KNN model with Ekman labels (7) including neutral emotion: hyperparameter tuning


In [419]:
# Tune hyperparameters using GridSearchCV
# Define the KNN estimator
model_knn = KNeighborsClassifier(n_jobs=-1)

# Parameter grid; this run evaluates a single candidate
params = {'n_neighbors': [3],
          'p': [1],
          'weights': ['distance']
         }

model_knn_1 = GridSearchCV(estimator=model_knn, param_grid=params, cv=3, scoring='f1_macro', n_jobs=-1, verbose=1)
model_knn_1.fit(X_train_ekman, y_train_ekman)


Fitting 3 folds for each of 1 candidates, totalling 3 fits
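
The run above evaluates one candidate at a time, so each parameter combination needs its own execution. As a minimal sketch, the same search can cover several combinations in one call. The grid values and the synthetic data (`X_demo`, `y_demo`) below are assumptions for illustration, not the settings used in this notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 3-class data standing in for the real features (illustrative only)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# 3 x 2 x 2 = 12 candidate combinations, each evaluated with 3-fold CV
param_grid = {
    "n_neighbors": [3, 7, 13],
    "p": [1, 2],
    "weights": ["uniform", "distance"],
}

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid=param_grid,
    cv=3,
    scoring="f1_macro",
)
search.fit(X_demo, y_demo)

n_candidates = len(search.cv_results_["params"])
print("Candidates evaluated:", n_candidates)
print("Best parameters:", search.best_params_)
print("Best macro F1:", round(search.best_score_, 3))
```

Searching the whole grid at once keeps all candidates in `cv_results_`, which makes them easier to compare than separate single-candidate runs.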


In [420]:
model_knn_1.best_params_

{'n_neighbors': 3, 'p': 1, 'weights': 'distance'}

In [421]:
model_knn_1.best_score_

0.3107722942849375

In [413]:
model_knn_1.best_params_

{'n_neighbors': 3, 'weights': 'distance'}

In [414]:
model_knn_1.best_score_

0.3084084922985526

In [400]:
model_knn_1.best_params_

{'n_neighbors': 3, 'p': 1}

In [401]:
model_knn_1.best_score_

0.29917999485012575

In [397]:
model_knn_1.best_params_

{'n_neighbors': 3}

In [398]:
model_knn_1.best_score_

0.298539719068027

### KNN model with Ekman labels (7) including neutral emotion

#### Model 1

In [402]:
# Build a KNN model on the Ekman labels with n_neighbors=3 (default Euclidean distance, p=2)

model_knn_ekman = KNeighborsClassifier(n_jobs=-1, n_neighbors=3)
model_knn_ekman.fit(X_train_ekman, y_train_ekman)

In [403]:
y_train_pred_ekman = model_knn_ekman.predict(X_train_ekman)
y_test_pred_ekman = model_knn_ekman.predict(X_test_ekman)
y_val_pred_ekman = model_knn_ekman.predict(X_val_ekman)



In [404]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman))

 Train set results
0.6056208246947707
 Test set results
0.39911553344389167
 Val set results
0.3964246221894582


In [427]:
cr = classification_report(y_train_ekman, y_train_pred_ekman)
print(cr)

              precision    recall  f1-score   support

           0       0.31      0.74      0.44      4517
           1       0.49      0.49      0.49       694
           2       0.43      0.54      0.48       642
           3       0.77      0.68      0.72     15693
           4       0.69      0.40      0.51      2938
           5       0.74      0.36      0.48      4707
           6       0.65      0.61      0.63     14219

    accuracy                           0.61     43410
   macro avg       0.58      0.55      0.54     43410
weighted avg       0.67      0.61      0.62     43410



In [406]:
cr = classification_report(y_test_ekman, y_test_pred_ekman)
print(cr)

              precision    recall  f1-score   support

           0       0.17      0.43      0.25       595
           1       0.28      0.22      0.25       112
           2       0.25      0.30      0.27        87
           3       0.62      0.49      0.55      1915
           4       0.42      0.24      0.30       341
           5       0.30      0.11      0.16       590
           6       0.42      0.43      0.43      1787

    accuracy                           0.40      5427
   macro avg       0.35      0.32      0.32      5427
weighted avg       0.44      0.40      0.41      5427
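
The two summary rows weight the classes differently: `macro avg` is the plain mean over classes, while `weighted avg` weights each class by its support. As a quick check, the F1 summary values above can be approximately recomputed from the rounded per-class values in the report (small differences are possible because the inputs are rounded):

```python
import numpy as np

# Per-class F1 and support, copied from the test report above (rounded values)
f1 = np.array([0.25, 0.25, 0.27, 0.55, 0.30, 0.16, 0.43])
support = np.array([595, 112, 87, 1915, 341, 590, 1787])

# Macro average: every class counts equally
macro_f1 = f1.mean()

# Weighted average: classes count in proportion to their support
weighted_f1 = np.average(f1, weights=support)

print("Macro F1:", round(macro_f1, 2))
print("Weighted F1:", round(weighted_f1, 2))
```

Because the large classes score better than the small ones here, the weighted average is higher than the macro average.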



#### Model 2

In [408]:
# Build a KNN model on the Ekman labels with n_neighbors=3 and Manhattan distance (p=1)

model_knn_ekman_1 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3, p=1)
model_knn_ekman_1.fit(X_train_ekman, y_train_ekman)

In [409]:
y_train_pred_ekman_1 = model_knn_ekman_1.predict(X_train_ekman)
y_test_pred_ekman_1 = model_knn_ekman_1.predict(X_test_ekman)
y_val_pred_ekman_1 = model_knn_ekman_1.predict(X_val_ekman)



In [410]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman_1))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman_1))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman_1))

 Train set results
0.6093757198802119
 Test set results
0.4085129906025428
 Val set results
0.40600810910431256


In [411]:
cr = classification_report(y_test_ekman, y_test_pred_ekman_1)
print(cr)

              precision    recall  f1-score   support

           0       0.17      0.46      0.25       595
           1       0.24      0.20      0.22       112
           2       0.26      0.30      0.28        87
           3       0.65      0.50      0.56      1915
           4       0.46      0.24      0.31       341
           5       0.33      0.12      0.17       590
           6       0.44      0.44      0.44      1787

    accuracy                           0.41      5427
   macro avg       0.36      0.32      0.32      5427
weighted avg       0.47      0.41      0.42      5427



#### Model 3

In [415]:
# Build a KNN model on the Ekman labels with n_neighbors=3 and distance-weighted voting

model_knn_ekman_2 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3, weights='distance')
model_knn_ekman_2.fit(X_train_ekman, y_train_ekman)

In [416]:
y_train_pred_ekman_2 = model_knn_ekman_2.predict(X_train_ekman)
y_test_pred_ekman_2 = model_knn_ekman_2.predict(X_test_ekman)
y_val_pred_ekman_2 = model_knn_ekman_2.predict(X_val_ekman)



In [417]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman_2))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman_2))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman_2))

 Train set results
0.8885970974429855
 Test set results
0.4013266998341625
 Val set results
0.4098783634353115
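
Training accuracy jumps to about 0.89 with `weights='distance'`, while test and validation accuracy stay near 0.40. One likely reason is that each training sample is its own nearest neighbour at distance zero, so with inverse-distance weighting its own label dominates the vote. The sketch below illustrates this on synthetic data (`X_demo` and `y_demo` are made-up stand-ins, not the notebook's features):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Noisy synthetic data: flip_y randomly reassigns a share of the labels
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, flip_y=0.3, random_state=0
)

uniform = KNeighborsClassifier(n_neighbors=3).fit(X_demo, y_demo)
distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_demo, y_demo)

# Score both models on the data they were trained on
train_acc_uniform = uniform.score(X_demo, y_demo)
train_acc_distance = distance.score(X_demo, y_demo)

print("Train accuracy, uniform weights:", train_acc_uniform)
print("Train accuracy, distance weights:", train_acc_distance)
```

With distance weighting, training accuracy mostly reflects memorisation, so the validation and test scores are the ones to compare. A training accuracy below 1.0, as seen above, would be consistent with identical feature vectors carrying different labels, although that was not checked here.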


In [418]:
cr = classification_report(y_test_ekman, y_test_pred_ekman_2)
print(cr)

              precision    recall  f1-score   support

           0       0.18      0.33      0.23       595
           1       0.28      0.21      0.24       112
           2       0.22      0.25      0.24        87
           3       0.63      0.49      0.55      1915
           4       0.35      0.26      0.30       341
           5       0.22      0.14      0.17       590
           6       0.41      0.47      0.44      1787

    accuracy                           0.40      5427
   macro avg       0.33      0.31      0.31      5427
weighted avg       0.43      0.40      0.41      5427



#### Model 4

In [422]:
# Build a KNN model on the Ekman labels with n_neighbors=3, distance-weighted voting and Manhattan distance (p=1)

model_knn_ekman_3 = KNeighborsClassifier(n_jobs=-1, n_neighbors=3, weights='distance', p=1)
model_knn_ekman_3.fit(X_train_ekman, y_train_ekman)

In [423]:
y_train_pred_ekman_3 = model_knn_ekman_3.predict(X_train_ekman)
y_test_pred_ekman_3 = model_knn_ekman_3.predict(X_test_ekman)
y_val_pred_ekman_3 = model_knn_ekman_3.predict(X_val_ekman)



In [424]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman_3))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman_3))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman_3))

 Train set results
0.8885970974429855
 Test set results
0.4131195872489405
 Val set results
0.4130114264651677


In [426]:
cr = classification_report(y_train_ekman, y_train_pred_ekman_3)
print(cr)

              precision    recall  f1-score   support

           0       0.62      0.95      0.75      4517
           1       0.87      0.88      0.88       694
           2       0.78      0.90      0.83       642
           3       0.95      0.94      0.95     15693
           4       0.93      0.92      0.92      2938
           5       0.96      0.83      0.89      4707
           6       0.94      0.82      0.88     14219

    accuracy                           0.89     43410
   macro avg       0.86      0.89      0.87     43410
weighted avg       0.91      0.89      0.89     43410



In [425]:
cr = classification_report(y_test_ekman, y_test_pred_ekman_3)
print(cr)

              precision    recall  f1-score   support

           0       0.17      0.34      0.23       595
           1       0.24      0.20      0.22       112
           2       0.22      0.24      0.23        87
           3       0.65      0.51      0.57      1915
           4       0.37      0.26      0.31       341
           5       0.25      0.15      0.18       590
           6       0.43      0.48      0.45      1787

    accuracy                           0.41      5427
   macro avg       0.33      0.31      0.31      5427
weighted avg       0.45      0.41      0.42      5427



#### Model 5

In [453]:
# Build a KNN model on the Ekman labels with n_neighbors=15

model_knn_ekman_4 = KNeighborsClassifier(n_jobs=-1, n_neighbors=15)
model_knn_ekman_4.fit(X_train_ekman, y_train_ekman)

In [454]:
y_train_pred_ekman_4 = model_knn_ekman_4.predict(X_train_ekman)
y_test_pred_ekman_4 = model_knn_ekman_4.predict(X_test_ekman)
y_val_pred_ekman_4 = model_knn_ekman_4.predict(X_val_ekman)



In [455]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman_4))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman_4))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman_4))

 Train set results
0.514351531905091
 Test set results
0.4750322461765248
 Val set results
0.48193881312200515


In [456]:
cr = classification_report(y_train_ekman, y_train_pred_ekman_4)
print(cr)

              precision    recall  f1-score   support

           0       0.51      0.22      0.31      4517
           1       0.59      0.26      0.36       694
           2       0.69      0.29      0.41       642
           3       0.73      0.49      0.59     15693
           4       0.66      0.20      0.31      2938
           5       0.58      0.14      0.22      4707
           6       0.43      0.85      0.57     14219

    accuracy                           0.51     43410
   macro avg       0.60      0.35      0.39     43410
weighted avg       0.58      0.51      0.49     43410



In [457]:
cr = classification_report(y_test_ekman, y_test_pred_ekman_4)
print(cr)

              precision    recall  f1-score   support

           0       0.44      0.20      0.27       595
           1       0.54      0.25      0.34       112
           2       0.76      0.32      0.45        87
           3       0.65      0.44      0.53      1915
           4       0.64      0.19      0.29       341
           5       0.42      0.11      0.17       590
           6       0.41      0.80      0.54      1787

    accuracy                           0.48      5427
   macro avg       0.55      0.33      0.37      5427
weighted avg       0.52      0.48      0.44      5427



#### Model 6 - Final

In [227]:
# Build a KNN model on the Ekman labels with n_neighbors=13

model_knn_ekman_5 = KNeighborsClassifier(n_jobs=-1, n_neighbors=13)
model_knn_ekman_5.fit(X_train_ekman, y_train_ekman)

In [228]:
y_train_pred_ekman_5 = model_knn_ekman_5.predict(X_train_ekman)
y_test_pred_ekman_5 = model_knn_ekman_5.predict(X_test_ekman)
y_val_pred_ekman_5 = model_knn_ekman_5.predict(X_val_ekman)



In [229]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman_5))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman_5))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman_5))

 Train set results
0.5252476387929048
 Test set results
0.48240280081076103
 Val set results
0.47899004791743455


In [230]:
cr = classification_report(y_train_ekman, y_train_pred_ekman_5)
print(cr)

              precision    recall  f1-score   support

           0       0.50      0.23      0.32      4517
           1       0.55      0.26      0.35       694
           2       0.68      0.31      0.43       642
           3       0.73      0.52      0.61     15693
           4       0.65      0.21      0.32      2938
           5       0.57      0.15      0.24      4707
           6       0.44      0.84      0.57     14219

    accuracy                           0.53     43410
   macro avg       0.59      0.36      0.40     43410
weighted avg       0.58      0.53      0.50     43410



In [231]:
cr = classification_report(y_test_ekman, y_test_pred_ekman_5, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.45      0.21      0.29       595
     disgust       0.57      0.27      0.36       112
        fear       0.75      0.38      0.50        87
         joy       0.66      0.45      0.54      1915
     sadness       0.66      0.20      0.30       341
    surprise       0.39      0.11      0.17       590
     neutral       0.41      0.80      0.55      1787

    accuracy                           0.48      5427
   macro avg       0.56      0.35      0.39      5427
weighted avg       0.52      0.48      0.45      5427



In [232]:
cr = classification_report(y_val_ekman, y_val_pred_ekman_5, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.37      0.18      0.24       582
     disgust       0.45      0.31      0.36        81
        fear       0.69      0.33      0.44        89
         joy       0.69      0.46      0.55      1997
     sadness       0.60      0.16      0.26       352
    surprise       0.42      0.12      0.19       559
     neutral       0.40      0.79      0.53      1766

    accuracy                           0.48      5426
   macro avg       0.52      0.34      0.37      5426
weighted avg       0.52      0.48      0.45      5426



#### Model 7

In [443]:
# Building model using best parameter for KNN Model

model_knn_ekman_6 = KNeighborsClassifier(n_jobs=-1, n_neighbors=11)
model_knn_ekman_6.fit(X_train_ekman, y_train_ekman)

In [444]:
y_train_pred_ekman_6 = model_knn_ekman_6.predict(X_train_ekman)
y_test_pred_ekman_6 = model_knn_ekman_6.predict(X_test_ekman)
y_val_pred_ekman_6 = model_knn_ekman_6.predict(X_val_ekman)



In [445]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman_6))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman_6))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman_6))

 Train set results
0.5344160331720802
 Test set results
0.47779620416436336
 Val set results
0.47880575009214893


In [446]:
cr = classification_report(y_train_ekman, y_train_pred_ekman_6)
print(cr)

              precision    recall  f1-score   support

           0       0.49      0.25      0.33      4517
           1       0.58      0.26      0.36       694
           2       0.66      0.33      0.44       642
           3       0.73      0.54      0.62     15693
           4       0.67      0.23      0.34      2938
           5       0.54      0.17      0.26      4707
           6       0.44      0.83      0.58     14219

    accuracy                           0.53     43410
   macro avg       0.59      0.37      0.42     43410
weighted avg       0.58      0.53      0.51     43410



In [447]:
cr = classification_report(y_test_ekman, y_test_pred_ekman_6)
print(cr)

              precision    recall  f1-score   support

           0       0.43      0.19      0.27       595
           1       0.54      0.22      0.32       112
           2       0.71      0.39      0.50        87
           3       0.65      0.46      0.54      1915
           4       0.62      0.19      0.29       341
           5       0.38      0.12      0.18       590
           6       0.41      0.78      0.54      1787

    accuracy                           0.48      5427
   macro avg       0.53      0.34      0.38      5427
weighted avg       0.51      0.48      0.45      5427



#### Model 8

In [233]:
# Building model using best parameter for KNN Model

model_knn_ekman_7 = KNeighborsClassifier(n_jobs=-1, n_neighbors=15)
model_knn_ekman_7.fit(X_train_ekman, y_train_ekman)

In [234]:
y_train_pred_ekman_7 = model_knn_ekman_7.predict(X_train_ekman)
y_test_pred_ekman_7 = model_knn_ekman_7.predict(X_test_ekman)
y_val_pred_ekman_7 = model_knn_ekman_7.predict(X_val_ekman)



In [235]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_ekman_7))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_ekman_7))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_ekman_7))

 Train set results
0.514351531905091
 Test set results
0.4750322461765248
 Val set results
0.48193881312200515


In [236]:
cr = classification_report(y_train_ekman, y_train_pred_ekman_7)
print(cr)

              precision    recall  f1-score   support

           0       0.51      0.22      0.31      4517
           1       0.59      0.26      0.36       694
           2       0.69      0.29      0.41       642
           3       0.73      0.49      0.59     15693
           4       0.66      0.20      0.31      2938
           5       0.58      0.14      0.22      4707
           6       0.43      0.85      0.57     14219

    accuracy                           0.51     43410
   macro avg       0.60      0.35      0.39     43410
weighted avg       0.58      0.51      0.49     43410



In [237]:
cr = classification_report(y_test_ekman, y_test_pred_ekman_7, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.44      0.20      0.27       595
     disgust       0.54      0.25      0.34       112
        fear       0.76      0.32      0.45        87
         joy       0.65      0.44      0.53      1915
     sadness       0.64      0.19      0.29       341
    surprise       0.42      0.11      0.17       590
     neutral       0.41      0.80      0.54      1787

    accuracy                           0.48      5427
   macro avg       0.55      0.33      0.37      5427
weighted avg       0.52      0.48      0.44      5427



In [238]:
cr = classification_report(y_val_ekman, y_val_pred_ekman_7, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.40      0.18      0.24       582
     disgust       0.50      0.32      0.39        81
        fear       0.70      0.31      0.43        89
         joy       0.70      0.46      0.55      1997
     sadness       0.62      0.17      0.27       352
    surprise       0.41      0.11      0.17       559
     neutral       0.40      0.81      0.54      1766

    accuracy                           0.48      5426
   macro avg       0.53      0.34      0.37      5426
weighted avg       0.53      0.48      0.45      5426
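
The KNN models above differ only in `n_neighbors`, so the same comparison can be collected in one table instead of one block of cells per model. The cell below is a minimal sketch under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (not the GoEmotions features), and `compare_k` is a hypothetical helper name, not something defined earlier in this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the vectorized texts and Ekman labels (assumption)
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=4, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=42, stratify=y)

def compare_k(k_values, X_tr, y_tr, X_te, y_te):
    """Fit one KNN model per k and collect train/test metrics in a DataFrame."""
    rows = []
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k, n_jobs=-1).fit(X_tr, y_tr)
        pred_tr = model.predict(X_tr)
        pred_te = model.predict(X_te)
        rows.append({
            "n_neighbors": k,
            "train_accuracy": accuracy_score(y_tr, pred_tr),
            "test_accuracy": accuracy_score(y_te, pred_te),
            "test_f1_macro": f1_score(y_te, pred_te, average="macro"),
        })
    return pd.DataFrame(rows).set_index("n_neighbors")

knn_summary = compare_k([11, 13, 15], X_tr, y_tr, X_te, y_te)
print(knn_summary.round(3))
```

With the real data, the same helper could be called with `X_train_ekman`, `y_train_ekman`, `X_test_ekman` and `y_test_ekman`.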



### KNN model with ekman labels (6) excluding neutral emotion -- hyperparameter tuning

In [491]:
# Tune hyperparameters using GridSearchCV
# Define the KNN estimator
model_knn = KNeighborsClassifier(n_jobs=-1)

params = {'n_neighbors': [7, 9, 5]}

model_knn_1 = GridSearchCV(estimator=model_knn, cv=3, param_grid=params, scoring='f1_macro', n_jobs=-1, verbose=1)
model_knn_1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


Fitting 3 folds for each of 3 candidates, totalling 9 fits


In [492]:
model_knn_1.best_params_

{'n_neighbors': 7}

In [493]:
model_knn_1.best_score_

0.39114942282616944

In [489]:
model_knn_1.best_params_

{'n_neighbors': 9}

In [490]:
model_knn_1.best_score_

0.38856435293618724
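
Instead of re-running the search and reading `best_params_` each time, all candidates can be compared at once through `cv_results_`. This is a small sketch on synthetic data (an assumption, the real search uses `X_train_ekman_no_neu`); the variable names `grid` and `cv_table` are only for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     n_informative=5, n_classes=3,
                                     random_state=0)

grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [5, 7, 9]},
                    cv=3, scoring="f1_macro")
grid.fit(X_demo, y_demo)

# One row per candidate, best candidate first
cv_table = (pd.DataFrame(grid.cv_results_)
            [["param_n_neighbors", "mean_test_score",
              "std_test_score", "rank_test_score"]]
            .sort_values("rank_test_score"))
print(cv_table)
```

Looking at `std_test_score` next to `mean_test_score` helps to judge whether the difference between two values of `n_neighbors` is larger than the fold-to-fold variation.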

### KNN model with ekman labels (6) excluding neutral emotion

#### Model 1

In [504]:
# Building model using best parameter for KNN Model

model_knn_ekman_no_neu = KNeighborsClassifier(n_jobs=-1, n_neighbors=7)
model_knn_ekman_no_neu.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [498]:
y_train_pred_ekman_no_neu = model_knn_ekman_no_neu.predict(X_train_ekman_no_neu)
y_test_pred_ekman_no_neu = model_knn_ekman_no_neu.predict(X_test_ekman_no_neu)
y_val_pred_ekman_no_neu = model_knn_ekman_no_neu.predict(X_val_ekman_no_neu)



In [500]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_ekman_no_neu))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_ekman_no_neu))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_ekman_no_neu))

 Train set results
0.6234113254085163
 Test set results
0.5151098901098901
 Val set results
0.5284153005464481


In [501]:
cr = classification_report(y_train_ekman_no_neu, y_train_pred_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

           0       0.34      0.68      0.46      4517
           1       0.59      0.34      0.43       694
           2       0.69      0.37      0.48       642
           3       0.78      0.77      0.77     15693
           4       0.73      0.32      0.44      2938
           5       0.61      0.36      0.45      4707

    accuracy                           0.62     29191
   macro avg       0.62      0.47      0.51     29191
weighted avg       0.67      0.62      0.62     29191



In [502]:
cr = classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

           0       0.26      0.57      0.36       595
           1       0.52      0.22      0.31       112
           2       0.74      0.39      0.51        87
           3       0.71      0.66      0.68      1915
           4       0.56      0.25      0.35       341
           5       0.41      0.23      0.29       590

    accuracy                           0.52      3640
   macro avg       0.53      0.39      0.42      3640
weighted avg       0.57      0.52      0.52      3640



#### Model 2

In [505]:
# Building model using best parameter for KNN Model

model_knn_ekman_no_neu_1 = KNeighborsClassifier(n_jobs=-1, n_neighbors=9)
model_knn_ekman_no_neu_1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [506]:
y_train_pred_ekman_no_neu_1 = model_knn_ekman_no_neu_1.predict(X_train_ekman_no_neu)
y_test_pred_ekman_no_neu_1 = model_knn_ekman_no_neu_1.predict(X_test_ekman_no_neu)
y_val_pred_ekman_no_neu_1 = model_knn_ekman_no_neu_1.predict(X_val_ekman_no_neu)



In [507]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_1))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_1))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_1))

 Train set results
0.6058716727758555
 Test set results
0.5409340659340659
 Val set results
0.5467213114754098


In [508]:
cr = classification_report(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_1)
print(cr)

              precision    recall  f1-score   support

           0       0.33      0.66      0.44      4517
           1       0.60      0.33      0.42       694
           2       0.68      0.38      0.48       642
           3       0.76      0.76      0.76     15693
           4       0.72      0.30      0.42      2938
           5       0.61      0.29      0.39      4707

    accuracy                           0.61     29191
   macro avg       0.62      0.45      0.49     29191
weighted avg       0.66      0.61      0.60     29191



In [509]:
cr = classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_1)
print(cr)

              precision    recall  f1-score   support

           0       0.28      0.55      0.37       595
           1       0.66      0.28      0.39       112
           2       0.80      0.46      0.58        87
           3       0.69      0.72      0.70      1915
           4       0.58      0.24      0.34       341
           5       0.44      0.20      0.27       590

    accuracy                           0.54      3640
   macro avg       0.58      0.41      0.44      3640
weighted avg       0.57      0.54      0.53      3640



#### Model 3 - Final

In [239]:
# Building model using best parameter for KNN Model

model_knn_ekman_no_neu_2 = KNeighborsClassifier(n_jobs=-1, n_neighbors=11)
model_knn_ekman_no_neu_2.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [240]:
y_train_pred_ekman_no_neu_2 = model_knn_ekman_no_neu_2.predict(X_train_ekman_no_neu)
y_test_pred_ekman_no_neu_2 = model_knn_ekman_no_neu_2.predict(X_test_ekman_no_neu)
y_val_pred_ekman_no_neu_2 = model_knn_ekman_no_neu_2.predict(X_val_ekman_no_neu)



In [241]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_2))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_2))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_2))

 Train set results
0.6639717721215443
 Test set results
0.5906593406593407
 Val set results
0.5983606557377049


In [242]:
cr = classification_report(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_2)
print(cr)

              precision    recall  f1-score   support

           0       0.51      0.58      0.55      4517
           1       0.62      0.32      0.42       694
           2       0.69      0.34      0.46       642
           3       0.71      0.91      0.79     15693
           4       0.72      0.27      0.40      2938
           5       0.60      0.27      0.38      4707

    accuracy                           0.66     29191
   macro avg       0.64      0.45      0.50     29191
weighted avg       0.66      0.66      0.63     29191



In [243]:
cr = classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_2, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.38      0.41      0.40       595
     disgust       0.71      0.30      0.42       112
        fear       0.80      0.45      0.57        87
         joy       0.65      0.86      0.74      1915
     sadness       0.58      0.22      0.32       341
    surprise       0.48      0.20      0.28       590

    accuracy                           0.59      3640
   macro avg       0.60      0.41      0.46      3640
weighted avg       0.58      0.59      0.56      3640



In [244]:
cr = classification_report(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_2, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.40      0.43      0.42       582
     disgust       0.52      0.32      0.40        81
        fear       0.68      0.34      0.45        89
         joy       0.66      0.85      0.74      1997
     sadness       0.60      0.20      0.30       352
    surprise       0.44      0.20      0.27       559

    accuracy                           0.60      3660
   macro avg       0.55      0.39      0.43      3660
weighted avg       0.58      0.60      0.56      3660



#### Model 4

In [245]:
# Building model using best parameter for KNN Model

model_knn_ekman_no_neu_3 = KNeighborsClassifier(n_jobs=-1, n_neighbors=13)
model_knn_ekman_no_neu_3.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [246]:
y_train_pred_ekman_no_neu_3 = model_knn_ekman_no_neu_3.predict(X_train_ekman_no_neu)
y_test_pred_ekman_no_neu_3 = model_knn_ekman_no_neu_3.predict(X_test_ekman_no_neu)
y_val_pred_ekman_no_neu_3 = model_knn_ekman_no_neu_3.predict(X_val_ekman_no_neu)



In [247]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_3))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_3))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_3))

 Train set results
0.6633551437086773
 Test set results
0.5854395604395605
 Val set results
0.5972677595628415


In [248]:
cr = classification_report(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_3)
print(cr)

              precision    recall  f1-score   support

           0       0.57      0.55      0.56      4517
           1       0.61      0.31      0.41       694
           2       0.71      0.33      0.45       642
           3       0.69      0.92      0.79     15693
           4       0.71      0.25      0.37      2938
           5       0.59      0.28      0.38      4707

    accuracy                           0.66     29191
   macro avg       0.65      0.44      0.49     29191
weighted avg       0.66      0.66      0.63     29191



In [249]:
cr = classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_3, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.41      0.39      0.40       595
     disgust       0.71      0.30      0.42       112
        fear       0.79      0.43      0.55        87
         joy       0.65      0.84      0.73      1915
     sadness       0.58      0.21      0.31       341
    surprise       0.40      0.25      0.31       590

    accuracy                           0.59      3640
   macro avg       0.59      0.40      0.46      3640
weighted avg       0.57      0.59      0.56      3640



In [250]:
cr = classification_report(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_3, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.41      0.39      0.40       582
     disgust       0.51      0.33      0.40        81
        fear       0.66      0.33      0.44        89
         joy       0.66      0.85      0.74      1997
     sadness       0.60      0.20      0.30       352
    surprise       0.42      0.26      0.32       559

    accuracy                           0.60      3660
   macro avg       0.54      0.39      0.43      3660
weighted avg       0.58      0.60      0.57      3660



#### Model 5

In [525]:
# Building model using best parameter for KNN Model

model_knn_ekman_no_neu_4 = KNeighborsClassifier(n_jobs=-1, n_neighbors=15)
model_knn_ekman_no_neu_4.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [526]:
y_train_pred_ekman_no_neu_4 = model_knn_ekman_no_neu_4.predict(X_train_ekman_no_neu)
y_test_pred_ekman_no_neu_4 = model_knn_ekman_no_neu_4.predict(X_test_ekman_no_neu)
y_val_pred_ekman_no_neu_4 = model_knn_ekman_no_neu_4.predict(X_val_ekman_no_neu)



In [527]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_4))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_4))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_4))

 Train set results
0.6461580624164982
 Test set results
0.5906593406593407
 Val set results
0.6084699453551913


In [528]:
cr = classification_report(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_4)
print(cr)

              precision    recall  f1-score   support

           0       0.57      0.41      0.48      4517
           1       0.62      0.29      0.40       694
           2       0.69      0.31      0.43       642
           3       0.66      0.93      0.77     15693
           4       0.70      0.24      0.36      2938
           5       0.56      0.29      0.39      4707

    accuracy                           0.65     29191
   macro avg       0.63      0.41      0.47     29191
weighted avg       0.64      0.65      0.61     29191



In [529]:
cr = classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_4)
print(cr)

              precision    recall  f1-score   support

           0       0.45      0.37      0.40       595
           1       0.70      0.29      0.41       112
           2       0.78      0.40      0.53        87
           3       0.63      0.88      0.73      1915
           4       0.61      0.20      0.30       341
           5       0.44      0.20      0.28       590

    accuracy                           0.59      3640
   macro avg       0.60      0.39      0.44      3640
weighted avg       0.57      0.59      0.55      3640



#### Model 6

In [530]:
# Building model using best parameter for KNN Model

model_knn_ekman_no_neu_5 = KNeighborsClassifier(n_jobs=-1, n_neighbors=17)
model_knn_ekman_no_neu_5.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [531]:
y_train_pred_ekman_no_neu_5 = model_knn_ekman_no_neu_5.predict(X_train_ekman_no_neu)
y_test_pred_ekman_no_neu_5 = model_knn_ekman_no_neu_5.predict(X_test_ekman_no_neu)
y_val_pred_ekman_no_neu_5 = model_knn_ekman_no_neu_5.predict(X_val_ekman_no_neu)



In [532]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_5))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_5))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_5))

 Train set results
0.6382104073173238
 Test set results
0.5967032967032967
 Val set results
0.6131147540983607


In [533]:
cr = classification_report(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_5)
print(cr)

              precision    recall  f1-score   support

           0       0.58      0.35      0.44      4517
           1       0.64      0.28      0.39       694
           2       0.68      0.29      0.41       642
           3       0.65      0.94      0.77     15693
           4       0.71      0.24      0.35      2938
           5       0.58      0.27      0.37      4707

    accuracy                           0.64     29191
   macro avg       0.64      0.39      0.45     29191
weighted avg       0.63      0.64      0.59     29191



In [534]:
cr = classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_5)
print(cr)

              precision    recall  f1-score   support

           0       0.48      0.34      0.39       595
           1       0.69      0.24      0.36       112
           2       0.76      0.37      0.50        87
           3       0.62      0.90      0.73      1915
           4       0.65      0.21      0.32       341
           5       0.48      0.18      0.27       590

    accuracy                           0.60      3640
   macro avg       0.61      0.37      0.43      3640
weighted avg       0.58      0.60      0.55      3640



#### Model 7

In [567]:
# Building model using best parameter for KNN Model

model_knn_ekman_no_neu_6 = KNeighborsClassifier(n_jobs=-1, n_neighbors=9, weights='distance')
model_knn_ekman_no_neu_6.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [568]:
y_train_pred_ekman_no_neu_6 = model_knn_ekman_no_neu_6.predict(X_train_ekman_no_neu)
y_test_pred_ekman_no_neu_6 = model_knn_ekman_no_neu_6.predict(X_test_ekman_no_neu)
y_val_pred_ekman_no_neu_6 = model_knn_ekman_no_neu_6.predict(X_val_ekman_no_neu)



In [569]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_6))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_6))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_ekman_no_neu_6))

 Train set results
0.9467301565551026
 Test set results
0.546978021978022
 Val set results
0.5581967213114755


In [570]:
cr = classification_report(y_train_ekman_no_neu, y_train_pred_ekman_no_neu_6)
print(cr)

              precision    recall  f1-score   support

           0       0.82      0.97      0.89      4517
           1       0.96      0.90      0.93       694
           2       0.97      0.90      0.94       642
           3       0.98      0.97      0.97     15693
           4       0.98      0.93      0.95      2938
           5       0.97      0.87      0.92      4707

    accuracy                           0.95     29191
   macro avg       0.95      0.92      0.93     29191
weighted avg       0.95      0.95      0.95     29191



In [571]:
cr = classification_report(y_test_ekman_no_neu, y_test_pred_ekman_no_neu_6)
print(cr)

              precision    recall  f1-score   support

           0       0.29      0.49      0.36       595
           1       0.57      0.28      0.37       112
           2       0.71      0.43      0.53        87
           3       0.70      0.74      0.71      1915
           4       0.54      0.26      0.35       341
           5       0.40      0.23      0.29       590

    accuracy                           0.55      3640
   macro avg       0.53      0.40      0.44      3640
weighted avg       0.56      0.55      0.54      3640
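
The gap between train and test accuracy for this model is expected with `weights='distance'`: when we predict on the training set, every sample is its own nearest neighbor at distance zero, so its own label dominates the vote. The sketch below illustrates this on synthetic data (an assumption, not the GoEmotions features). On data without duplicated points the train accuracy reaches 1.0; a value below 1.0 on real data would be consistent with identical feature vectors carrying different labels, but we have not verified that here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data with some label noise (assumption)
X_w, y_w = make_classification(n_samples=500, n_features=10, n_informative=5,
                               n_classes=3, flip_y=0.2, random_state=1)
X_w_tr, X_w_te, y_w_tr, y_w_te = train_test_split(X_w, y_w, test_size=0.3,
                                                  random_state=1)

scores = {}
for weights in ("uniform", "distance"):
    knn = KNeighborsClassifier(n_neighbors=9, weights=weights).fit(X_w_tr, y_w_tr)
    # (train accuracy, test accuracy) for each weighting scheme
    scores[weights] = (knn.score(X_w_tr, y_w_tr), knn.score(X_w_te, y_w_te))
    print(weights, "train:", round(scores[weights][0], 3),
          "test:", round(scores[weights][1], 3))
```

For this reason the train score of a distance-weighted KNN model says little about generalization, and the test and validation scores are the ones to compare.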



### SVM

In [81]:
y_train_new

0        6
1        6
2        0
3        2
4        0
        ..
43405    3
43406    5
43407    0
43408    3
43409    3
Name: Mapped_id, Length: 43410, dtype: int64

In [230]:
# Tune hyperparameters using GridSearchCV
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

svm_mdl = SVC(random_state=42)

# Every value in param_grid must be a list of candidates
params = {'kernel': ['sigmoid'],
          'C': [0.4, 0.5],
          'tol': [0.01]
          }
svm_mdl_grid = GridSearchCV(estimator=svm_mdl, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
svm_mdl_grid.fit(X_train_new, y_train_new)

Fitting 3 folds for each of 2 candidates, totalling 6 fits



### SVM model with GoEmotions labels (28) including neutral emotion - Hyperparameter tuning

In [702]:
from sklearn.svm import SVC
svm_mdl = svm.SVC(random_state=42)
params = {'estimator__kernel' :['sigmoid'],'estimator__C':[0.9,1], 'estimator__tol':[0.01]}
classifier1 = MultiOutputClassifier(svm_mdl)
Grid_svm1 = GridSearchCV(estimator = classifier1, cv = 3, param_grid = params, scoring  = 'accuracy',n_jobs=-1, verbose=1)
Grid_svm1.fit(X_train, y_train)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [703]:
Grid_svm1.best_params_

{'estimator__C': 1, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [704]:
Grid_svm1.best_score_

0.31310757889887125
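
A short note on the `estimator__` prefix used in the parameter grid: `MultiOutputClassifier` wraps the SVM, so the SVM's parameters are reached through the wrapper's `estimator` attribute. The sketch below shows this on a small synthetic multi-label problem (the data and the names `wrapped` and `search` are assumptions for illustration only).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC

# Synthetic multi-label data: three binary labels derived from the features (assumption)
rng = np.random.default_rng(0)
X_ml = rng.normal(size=(150, 6))
Y_ml = (X_ml[:, :3] > 0).astype(int)

wrapped = MultiOutputClassifier(SVC(random_state=42))

# Parameters of the inner SVC are exposed with the "estimator__" prefix
inner_names = sorted(n for n in wrapped.get_params() if n.startswith("estimator__"))
print(inner_names[:3])

search = GridSearchCV(wrapped,
                      param_grid={"estimator__C": [0.5, 1.0],
                                  "estimator__kernel": ["sigmoid", "rbf"]},
                      cv=3, scoring="accuracy")
search.fit(X_ml, Y_ml)
print(search.best_params_, round(search.best_score_, 3))
```

The same value of `C` and `kernel` is applied to every per-label SVM, so the search tunes one shared setting rather than one setting per emotion.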

In [145]:
from sklearn.svm import SVC
svm_mdl = svm.SVC(random_state=42)
params = {'estimator__kernel' :['rbf'],'estimator__C':[1], 'estimator__tol':[0.01], 'estimator__class_weight':['balanced']}
classifier1 = MultiOutputClassifier(svm_mdl)
Grid_svm1 = GridSearchCV(estimator = classifier1, cv = 3, param_grid = params, scoring  = 'accuracy',n_jobs=-1, verbose=1)
Grid_svm1.fit(X_train, y_train)

Fitting 3 folds for each of 1 candidates, totalling 3 fits


In [146]:
Grid_svm1.best_params_

{'estimator__C': 1,
 'estimator__class_weight': 'balanced',
 'estimator__kernel': 'rbf',
 'estimator__tol': 0.01}

In [147]:
Grid_svm1.best_score_

0.2985487214927436

In [142]:
from sklearn.svm import SVC
svm_mdl = svm.SVC(random_state=42)
params = {'estimator__kernel' :['sigmoid'], 'estimator__tol':[0.01]}
classifier1 = MultiOutputClassifier(svm_mdl)
Grid_svm2 = GridSearchCV(estimator = classifier1, cv = 10, param_grid = params, scoring  = 'accuracy',n_jobs=-1, verbose=1)
Grid_svm2.fit(X_train, y_train)

Fitting 10 folds for each of 1 candidates, totalling 10 fits


In [143]:
Grid_svm2.best_params_

{'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [144]:
Grid_svm2.best_score_

0.30562082469477075

In [83]:
Grid_svm1.best_params_

{'estimator__C': 0.8, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [84]:
Grid_svm1.best_score_

0.31370651923519927

In [77]:
Grid_svm1.best_params_

{'estimator__C': 0.5, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.0001}

In [78]:
Grid_svm1.best_score_

0.29907855332872607

In [67]:
Grid_svm1.best_params_

{'estimator__C': 0.5, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [68]:
Grid_svm1.best_score_

0.2997926744989634

In [62]:
# Making predictions on GoEmotions taxonomy 
classifier_preds = classifier1.predict(X_test)
classifier_preds

array([[0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [63]:
# Model evaluation
model_eval(y_test, classifier_preds, GE_taxonomy)



| Emotion | Precision | Recall | F1 |
|---|---|---|---|
| admiration | 0.74 | 0.41 | 0.53 |
| amusement | 0.77 | 0.77 | 0.77 |
| anger | 0.74 | 0.1 | 0.18 |
| annoyance | 0.9 | 0.03 | 0.05 |
| approval | 0.7 | 0.09 | 0.16 |
| caring | 1.0 | 0.07 | 0.12 |
| confusion | 0.88 | 0.05 | 0.09 |
| curiosity | 0.86 | 0.02 | 0.04 |
| desire | 0.62 | 0.18 | 0.28 |
| disappointment | 1.0 | 0.01 | 0.01 |
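
When a label is never predicted, its precision is undefined (0 / 0) and scikit-learn reports this with an `UndefinedMetricWarning`. The sketch below shows how the `zero_division` parameter of `precision_recall_fscore_support` makes that case explicit, and how to list the labels that were never predicted. It uses a tiny hand-made example; how `model_eval` computes its scores internally is not shown in this part of the notebook, so this is only an illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Two labels, four samples; label 1 is never predicted
y_true_d = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_pred_d = np.array([[1, 0], [0, 0], [1, 0], [0, 0]])

# zero_division=0 sets undefined scores to 0 without emitting a warning
precision, recall, f1, support = precision_recall_fscore_support(
    y_true_d, y_pred_d, average=None, zero_division=0)
print(precision, recall, f1, support)

# Indices of labels that the model never predicts
never_predicted = np.where(y_pred_d.sum(axis=0) == 0)[0]
print(never_predicted)  # [1]
```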


### SVM model with GoEmotions taxonomy (28 labels) including neutral emotion


#### Model 1

{'estimator__C': 0.8, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [705]:
# Modelling SVM using multioutput classifier

svm_GE = svm.SVC(random_state=42, C=0.8, kernel='sigmoid', tol=0.01, max_iter=1000)
mul_out_class = MultiOutputClassifier(svm_GE)
mul_out_class.fit(X_train, y_train)




In [708]:
# Making predictions on GoEmotions taxonomy 
y_train_pred_GE = mul_out_class.predict(X_train)
y_test_pred_GE = mul_out_class.predict(X_test)
y_val_pred_GE = mul_out_class.predict(X_val)




In [709]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE))

 Train set results
0.01971895876526146
 Test set results
0.019716233646581906
 Val set results
0.02488020641356432
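
Note that for multi-label targets `accuracy_score` computes subset accuracy: a sample only counts as correct when all of its labels are predicted exactly. With 28 labels this is a strict metric, so it helps to look at a per-label view as well. The sketch below uses a tiny hand-made example (not the GoEmotions predictions) to show the difference.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

y_true_demo = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [1, 1, 0]])
y_pred_demo = np.array([[1, 0, 0],
                        [0, 1, 1],   # one extra label
                        [0, 0, 1],
                        [1, 0, 0]])  # one missing label

# Exact-match ratio: only rows 0 and 2 match completely
subset_acc = accuracy_score(y_true_demo, y_pred_demo)

# Share of individual label decisions that are correct: 10 of 12
per_label_acc = 1 - hamming_loss(y_true_demo, y_pred_demo)

print(subset_acc)               # 0.5
print(round(per_label_acc, 4))  # 0.8333
```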


In [710]:
# Model evaluation for train
model_eval(y_train, y_train_pred_GE, GE_taxonomy)



| Emotion | Precision | Recall | F1 |
|---|---|---|---|
| admiration | 0.12 | 0.84 | 0.21 |
| amusement | 0.73 | 0.75 | 0.74 |
| anger | 0.13 | 0.39 | 0.2 |
| annoyance | 0.09 | 0.32 | 0.14 |
| approval | 0.07 | 0.57 | 0.12 |
| caring | 0.12 | 0.11 | 0.11 |
| confusion | 0.05 | 0.1 | 0.07 |
| curiosity | 0.06 | 0.57 | 0.1 |
| desire | 0.56 | 0.31 | 0.4 |
| disappointment | 0.04 | 0.21 | 0.07 |


In [711]:
# Model evaluation
model_eval(y_test, y_test_pred_GE, GE_taxonomy)



| Emotion | Precision | Recall | F1 |
|---|---|---|---|
| admiration | 0.12 | 0.85 | 0.2 |
| amusement | 0.73 | 0.75 | 0.74 |
| anger | 0.15 | 0.38 | 0.21 |
| annoyance | 0.09 | 0.3 | 0.14 |
| approval | 0.07 | 0.59 | 0.12 |
| caring | 0.13 | 0.11 | 0.12 |
| confusion | 0.04 | 0.08 | 0.06 |
| curiosity | 0.06 | 0.56 | 0.1 |
| desire | 0.49 | 0.2 | 0.29 |
| disappointment | 0.05 | 0.24 | 0.08 |


In [712]:
# Model evaluation
model_eval(y_val, y_val_pred_GE, GE_taxonomy)



| Emotion | Precision | Recall | F1 |
|---|---|---|---|
| admiration | 0.11 | 0.86 | 0.2 |
| amusement | 0.72 | 0.7 | 0.71 |
| anger | 0.13 | 0.37 | 0.19 |
| annoyance | 0.08 | 0.29 | 0.13 |
| approval | 0.08 | 0.6 | 0.14 |
| caring | 0.15 | 0.11 | 0.13 |
| confusion | 0.06 | 0.12 | 0.08 |
| curiosity | 0.06 | 0.64 | 0.1 |
| desire | 0.61 | 0.36 | 0.46 |
| disappointment | 0.05 | 0.25 | 0.09 |


#### Model 2

In [713]:
# Modelling SVM using multioutput classifier

svm_GE2 = svm.SVC(random_state=42, C=0.8, kernel='sigmoid', tol=0.01, max_iter=100000)
mul_out_class2 = MultiOutputClassifier(svm_GE2)
mul_out_class2.fit(X_train, y_train)


In [714]:
# Making predictions on GoEmotions taxonomy 
y_train_pred_GE2 = mul_out_class2.predict(X_train)
y_test_pred_GE2 = mul_out_class2.predict(X_test)
y_val_pred_GE2 = mul_out_class2.predict(X_val)

In [719]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE2))

 Train set results
0.2994240958304538
 Test set results
0.3110374055647688
 Val set results
0.32528566162919276


In [716]:
# Model evaluation for train
model_eval(y_train, y_train_pred_GE2, GE_taxonomy)



| Emotion | Precision | Recall | F1 |
|---|---|---|---|
| admiration | 0.66 | 0.4 | 0.5 |
| amusement | 0.75 | 0.7 | 0.73 |
| anger | 0.44 | 0.17 | 0.25 |
| annoyance | 0.17 | 0.03 | 0.05 |
| approval | 0.4 | 0.07 | 0.12 |
| caring | 0.29 | 0.02 | 0.04 |
| confusion | 0.16 | 0.04 | 0.06 |
| curiosity | 0.4 | 0.05 | 0.08 |
| desire | 0.55 | 0.28 | 0.37 |
| disappointment | 0.22 | 0.02 | 0.03 |


In [717]:
# Model evaluation
model_eval(y_test, y_test_pred_GE2, GE_taxonomy)



Unnamed: 0,Precision,Recall,F1
admiration,0.64,0.4,0.49
amusement,0.75,0.73,0.74
anger,0.45,0.17,0.25
annoyance,0.29,0.04,0.08
approval,0.45,0.09,0.15
caring,0.25,0.02,0.04
confusion,0.15,0.04,0.06
curiosity,0.32,0.02,0.04
desire,0.5,0.2,0.29
disappointment,0.31,0.03,0.06


In [718]:
# Model evaluation for validation
model_eval(y_val, y_val_pred_GE2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.71,0.46,0.56
amusement,0.76,0.71,0.74
anger,0.48,0.21,0.29
annoyance,0.13,0.02,0.04
approval,0.46,0.07,0.13
caring,0.17,0.01,0.01
confusion,0.19,0.05,0.08
curiosity,0.54,0.08,0.13
desire,0.6,0.35,0.44
disappointment,0.0,0.0,0.0
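
A row of zeros such as `disappointment` means the model produced no correct positive prediction for that label. If it predicted no positives at all, precision is undefined, and scikit-learn then reports 0 and raises a warning. The sketch below shows the arithmetic from raw counts; the function and the counts passed to it are hypothetical and not taken from this notebook.

```python
# Sketch of per-label precision/recall/F1 from confusion counts,
# with an explicit fallback when a denominator is zero.

def precision_recall_f1(tp, fp, fn, zero_division=0.0):
    """Return (precision, recall, f1); use zero_division when a ratio is undefined."""
    predicted = tp + fp   # how often the label was predicted
    actual = tp + fn      # how often the label truly occurs
    precision = tp / predicted if predicted else zero_division
    recall = tp / actual if actual else zero_division
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else zero_division
    return precision, recall, f1

# Hypothetical label that is never predicted: every metric falls back to 0
print(precision_recall_f1(tp=0, fp=0, fn=20))

# Hypothetical label predicted once, correctly: perfect precision, tiny recall
print(precision_recall_f1(tp=1, fp=0, fn=99))
```

The second case shows why a precision of 1.0 next to a recall of 0.01 should not be read as a strong result: it can come from a single confident prediction.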


### Model 3

In [720]:
# Modelling SVM using multioutput classifier

svm_GE3 = svm.SVC(random_state=42, C=0.8, kernel='sigmoid', tol=0.01)
mul_out_class3 = MultiOutputClassifier(svm_GE3)
mul_out_class3.fit(X_train, y_train)


In [721]:
# Making predictions on GoEmotions taxonomy 
y_train_pred_GE3 = mul_out_class3.predict(X_train)
y_test_pred_GE3 = mul_out_class3.predict(X_test)
y_val_pred_GE3 = mul_out_class3.predict(X_val)

In [722]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE3))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE3))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE3))

 Train set results
0.2994240958304538
 Test set results
0.3110374055647688
 Val set results
0.32528566162919276


In [723]:
# Model evaluation for train
model_eval(y_train, y_train_pred_GE3, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.66,0.4,0.5
amusement,0.75,0.7,0.73
anger,0.44,0.17,0.25
annoyance,0.17,0.03,0.05
approval,0.4,0.07,0.12
caring,0.29,0.02,0.04
confusion,0.16,0.04,0.06
curiosity,0.4,0.05,0.08
desire,0.55,0.28,0.37
disappointment,0.22,0.02,0.03


In [724]:
# Model evaluation for test
model_eval(y_test, y_test_pred_GE3, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.64,0.4,0.49
amusement,0.75,0.73,0.74
anger,0.45,0.17,0.25
annoyance,0.29,0.04,0.08
approval,0.45,0.09,0.15
caring,0.25,0.02,0.04
confusion,0.15,0.04,0.06
curiosity,0.32,0.02,0.04
desire,0.5,0.2,0.29
disappointment,0.31,0.03,0.06


In [725]:
# Model evaluation for validation
model_eval(y_val, y_val_pred_GE3, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.71,0.46,0.56
amusement,0.76,0.71,0.74
anger,0.48,0.21,0.29
annoyance,0.13,0.02,0.04
approval,0.46,0.07,0.13
caring,0.17,0.01,0.01
confusion,0.19,0.05,0.08
curiosity,0.54,0.08,0.13
desire,0.6,0.35,0.44
disappointment,0.0,0.0,0.0


### Model 4 - Final

In [749]:
# Modelling SVM using multioutput classifier

svm_GE4 = svm.SVC(random_state=42, C=0.9, kernel='sigmoid', tol=0.001, max_iter=10000)
mul_out_class4 = MultiOutputClassifier(svm_GE4)
mul_out_class4.fit(X_train, y_train)




In [750]:
# Making predictions on GoEmotions taxonomy 
y_train_pred_GE4 = mul_out_class4.predict(X_train)
y_test_pred_GE4 = mul_out_class4.predict(X_test)
y_val_pred_GE4 = mul_out_class4.predict(X_val)

In [751]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE4))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE4))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE4))

 Train set results
0.29949320433079935
 Test set results
0.3189607517965727
 Val set results
0.32602285293033545


In [752]:
# Model evaluation for train
model_eval(y_train, y_train_pred_GE4, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.65,0.4,0.49
amusement,0.76,0.7,0.73
anger,0.45,0.16,0.24
annoyance,0.23,0.04,0.07
approval,0.44,0.08,0.13
caring,0.21,0.02,0.04
confusion,0.17,0.04,0.06
curiosity,0.48,0.05,0.08
desire,0.55,0.28,0.37
disappointment,0.18,0.02,0.04


In [753]:
# Model evaluation for test
model_eval(y_test, y_test_pred_GE4, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.65,0.41,0.5
amusement,0.77,0.73,0.75
anger,0.46,0.15,0.22
annoyance,0.32,0.06,0.1
approval,0.51,0.11,0.19
caring,0.23,0.02,0.04
confusion,0.15,0.04,0.06
curiosity,0.36,0.02,0.03
desire,0.53,0.22,0.31
disappointment,0.25,0.03,0.06


In [754]:
# Model evaluation for validation
model_eval(y_val, y_val_pred_GE4, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.68,0.46,0.55
amusement,0.77,0.71,0.74
anger,0.4,0.16,0.23
annoyance,0.16,0.03,0.05
approval,0.47,0.08,0.14
caring,0.06,0.01,0.01
confusion,0.19,0.05,0.08
curiosity,0.64,0.07,0.13
desire,0.6,0.34,0.43
disappointment,0.08,0.01,0.01


### Model 5

In [155]:
# Modelling SVM using multioutput classifier

svm_GE5 = svm.SVC(random_state=42, C=0.9, kernel='sigmoid', tol=0.0001, max_iter=10000)
mul_out_class5 = MultiOutputClassifier(svm_GE5)
mul_out_class5.fit(X_train, y_train)




In [156]:
# Making predictions on GoEmotions taxonomy 
y_train_pred_GE5 = mul_out_class5.predict(X_train)
y_test_pred_GE5 = mul_out_class5.predict(X_test)
y_val_pred_GE5 = mul_out_class5.predict(X_val)

In [157]:
print(" Train set results")
print(accuracy_score(y_train, y_train_pred_GE5))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_GE5))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_GE5))

 Train set results
0.2981110343238885
 Test set results
0.31711811313801364
 Val set results
0.32491706597862147


In [158]:
# Model evaluation for train
model_eval(y_train, y_train_pred_GE5, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.66,0.4,0.5
amusement,0.77,0.69,0.73
anger,0.46,0.18,0.25
annoyance,0.23,0.04,0.07
approval,0.44,0.08,0.13
caring,0.25,0.02,0.05
confusion,0.16,0.04,0.06
curiosity,0.55,0.04,0.08
desire,0.57,0.27,0.36
disappointment,0.21,0.02,0.03


In [159]:
# Model evaluation for test
model_eval(y_test, y_test_pred_GE5, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.66,0.4,0.5
amusement,0.77,0.73,0.75
anger,0.47,0.16,0.23
annoyance,0.3,0.06,0.09
approval,0.51,0.11,0.19
caring,0.21,0.02,0.04
confusion,0.15,0.04,0.06
curiosity,0.44,0.01,0.03
desire,0.53,0.19,0.28
disappointment,0.29,0.03,0.06


In [160]:
# Model evaluation for validation
model_eval(y_val, y_val_pred_GE5, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.69,0.46,0.55
amusement,0.78,0.68,0.73
anger,0.46,0.21,0.29
annoyance,0.16,0.03,0.05
approval,0.47,0.08,0.14
caring,0.07,0.01,0.01
confusion,0.18,0.05,0.08
curiosity,0.64,0.06,0.12
desire,0.67,0.36,0.47
disappointment,0.0,0.0,0.0


#### SVM model with GoEmotions labels (27) excluding neutral emotion

In [761]:
# Grid search over C for a per-label sigmoid SVM (27 labels, neutral excluded)
svm_mdl2 = svm.SVC(random_state=42)
params = {'estimator__kernel': ['sigmoid'], 'estimator__C': [0.4, 0.5], 'estimator__tol': [0.01]}
classifier2 = MultiOutputClassifier(svm_mdl2)
# For multi-label targets, scoring='accuracy' is exact-match (subset) accuracy
Grid_svm2 = GridSearchCV(estimator=classifier2, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
Grid_svm2.fit(X_train_no_neu, y_train_no_neu)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [762]:
Grid_svm2.best_params_

{'estimator__C': 0.5, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [763]:
Grid_svm2.best_score_

0.26468780000914555

In [764]:
svm_mdl2 = svm.SVC(random_state=42)
params = {'estimator__kernel': ['sigmoid'], 'estimator__C': [0.4, 0.3], 'estimator__tol': [0.01]}
classifier2 = MultiOutputClassifier(svm_mdl2)
Grid_svm2 = GridSearchCV(estimator=classifier2, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
Grid_svm2.fit(X_train_no_neu, y_train_no_neu)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [765]:
Grid_svm2.best_params_

{'estimator__C': 0.4, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [766]:
Grid_svm2.best_score_

0.2606338681573062

In [767]:
svm_mdl2 = svm.SVC(random_state=42)
params = {'estimator__kernel': ['sigmoid'], 'estimator__C': [0.6, 0.5], 'estimator__tol': [0.01]}
classifier2 = MultiOutputClassifier(svm_mdl2)
Grid_svm2 = GridSearchCV(estimator=classifier2, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
Grid_svm2.fit(X_train_no_neu, y_train_no_neu)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [768]:
Grid_svm2.best_params_

{'estimator__C': 0.6, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [769]:
Grid_svm2.best_score_

0.2664859901721581

In [770]:
svm_mdl2 = svm.SVC(random_state=42)
params = {'estimator__kernel': ['sigmoid'], 'estimator__C': [0.5], 'estimator__tol': [0.01, 0.001]}
classifier2 = MultiOutputClassifier(svm_mdl2)
Grid_svm2 = GridSearchCV(estimator=classifier2, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
Grid_svm2.fit(X_train_no_neu, y_train_no_neu)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [771]:
Grid_svm2.best_params_

{'estimator__C': 0.5, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.01}

In [772]:
Grid_svm2.best_score_

0.26468780000914555

In [773]:
svm_mdl2 = svm.SVC(random_state=42)
params = {'estimator__kernel': ['sigmoid'], 'estimator__C': [0.6, 0.7], 'estimator__tol': [0.0001]}
classifier2 = MultiOutputClassifier(svm_mdl2)
Grid_svm2 = GridSearchCV(estimator=classifier2, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
Grid_svm2.fit(X_train_no_neu, y_train_no_neu)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [774]:
Grid_svm2.best_params_

{'estimator__C': 0.6, 'estimator__kernel': 'sigmoid', 'estimator__tol': 0.0001}

In [775]:
Grid_svm2.best_score_

0.26697637855976114
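
The five searches above each compare only two candidates, so the `best_score_` values from separate runs end up being compared by hand. One alternative (an assumption, not what was run here) is to merge every value tried above into a single grid and pass it to one `GridSearchCV` call. The sketch below enumerates that merged grid in plain Python to show how many fits it would cost with `cv=3`; the variable names are illustrative.

```python
from itertools import product

# Union of the values tried across the separate searches above
param_grid = {
    "estimator__kernel": ["sigmoid"],
    "estimator__C": [0.3, 0.4, 0.5, 0.6, 0.7],
    "estimator__tol": [0.01, 0.001, 0.0001],
}
cv_folds = 3  # same number of folds as the searches above

# Every combination of one value per parameter is one candidate
names = sorted(param_grid)
candidates = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

total_fits = len(candidates) * cv_folds
print(f"{len(candidates)} candidates x {cv_folds} folds = {total_fits} fits")
```

A single search costs more compute per run, but all candidates are scored on the same folds, which makes the scores directly comparable.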

In [62]:
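# Making predictions on GoEmotions taxonomy
# Note: GridSearchCV fits clones, so classifier2 must have been fitted separately (or use Grid_svm2.best_estimator_)
classifier_preds2 = classifier2.predict(X_test)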
classifier_preds2

array([[0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [63]:
# Model evaluation for test
model_eval(y_test, classifier_preds2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.74,0.41,0.53
amusement,0.77,0.77,0.77
anger,0.74,0.1,0.18
annoyance,0.9,0.03,0.05
approval,0.7,0.09,0.16
caring,1.0,0.07,0.12
confusion,0.88,0.05,0.09
curiosity,0.86,0.02,0.04
desire,0.62,0.18,0.28
disappointment,1.0,0.01,0.01


### Model 1

In [161]:
# Modelling SVM using multioutput classifier

svm_GE_no_neu = svm.SVC(random_state=42, C=0.9, kernel='sigmoid', tol=0.001, max_iter=10000)
mul_out_class4 = MultiOutputClassifier(svm_GE_no_neu)
mul_out_class4.fit(X_train_no_neu, y_train_no_neu)




In [162]:
# Making predictions on GoEmotions taxonomy 
y_train_pred_GE4 = mul_out_class4.predict(X_train_no_neu)
y_test_pred_GE4 = mul_out_class4.predict(X_test_no_neu)
y_val_pred_GE4 = mul_out_class4.predict(X_val_no_neu)

In [163]:
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_GE4))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_GE4))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_GE4))

 Train set results
0.2642625952201916
 Test set results
0.2677309604815493
 Val set results
0.2861241523213354


In [164]:
# Model evaluation for train
model_eval(y_train_no_neu, y_train_pred_GE4, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.7,0.52,0.59
amusement,0.81,0.77,0.79
anger,0.53,0.22,0.31
annoyance,0.25,0.06,0.1
approval,0.43,0.07,0.12
caring,0.35,0.04,0.07
confusion,0.3,0.06,0.1
curiosity,0.46,0.06,0.11
desire,0.66,0.3,0.41
disappointment,0.4,0.05,0.09


In [165]:
# Model evaluation for test
model_eval(y_test_no_neu, y_test_pred_GE4, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.66,0.48,0.56
amusement,0.81,0.78,0.79
anger,0.52,0.19,0.28
annoyance,0.31,0.1,0.15
approval,0.48,0.09,0.15
caring,0.29,0.04,0.07
confusion,0.26,0.07,0.11
curiosity,0.29,0.03,0.05
desire,0.64,0.22,0.32
disappointment,0.5,0.07,0.13


In [166]:
# Model evaluation for validation
model_eval(y_val_no_neu, y_val_pred_GE4, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.7,0.58,0.64
amusement,0.81,0.76,0.78
anger,0.52,0.21,0.3
annoyance,0.23,0.06,0.1
approval,0.49,0.08,0.14
caring,0.12,0.01,0.01
confusion,0.46,0.12,0.2
curiosity,0.54,0.08,0.15
desire,0.69,0.38,0.49
disappointment,0.14,0.02,0.03


#### SVM model with Ekman labels (7) including neutral emotion

In [758]:
# Tune hyperparameters using GridSearchCV
svm_3 = svm.SVC(random_state=42)

params = {'kernel': ['sigmoid'],
          'C': [0.3, 0.4],
          'tol': [0.01]
          }
Grid_svm3 = GridSearchCV(estimator=svm_3, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
Grid_svm3.fit(X_train_ekman, y_train_ekman)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [759]:
Grid_svm3.best_params_

{'C': 0.3, 'kernel': 'sigmoid', 'tol': 0.01}

In [760]:
Grid_svm3.best_score_

0.5999769638332181

In [73]:
Grid_svm3.best_params_

{'C': 0.4, 'kernel': 'sigmoid', 'tol': 0.01}

In [74]:
Grid_svm3.best_score_

0.5998848191660908

In [168]:
y_train_ekman.shape

(43410,)

### Model 1

In [169]:
# Modelling SVM directly: the Ekman target is single-label, so MultiOutputClassifier is not needed

svm_GE_ekman = svm.SVC(random_state=42, C=0.9, kernel='sigmoid', tol=0.001)
svm_GE_ekman.fit(X_train_ekman, y_train_ekman)


In [170]:
# Making predictions on Ekman taxonomy
y_train_pred_GE4 = svm_GE_ekman.predict(X_train_ekman)
y_test_pred_GE4 = svm_GE_ekman.predict(X_test_ekman)
y_val_pred_GE4 = svm_GE_ekman.predict(X_val_ekman)

In [171]:
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_pred_GE4))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_pred_GE4))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_pred_GE4))

 Train set results
0.573715733701912
 Test set results
0.5942509673852957
 Val set results
0.5930704017692591


In [173]:
# Model evaluation for train
cr = classification_report(y_train_ekman, y_train_pred_GE4, target_names=class_label_names_ekman)
print(cr)
# model_eval(y_train_ekman, y_train_pred_GE4, class_label_names_ekman)

              precision    recall  f1-score   support

       anger       0.41      0.27      0.33      4517
     disgust       0.46      0.32      0.38       694
        fear       0.48      0.49      0.48       642
         joy       0.72      0.69      0.70     15693
     sadness       0.64      0.37      0.47      2938
    surprise       0.53      0.16      0.25      4707
     neutral       0.50      0.73      0.59     14219

    accuracy                           0.57     43410
   macro avg       0.53      0.43      0.46     43410
weighted avg       0.58      0.57      0.55     43410
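
The two summary rows of the report can diverge a lot on imbalanced data. As a quick check, the sketch below recomputes the macro and weighted average recall from the per-class recall and support values printed in the train report above (inputs are the rounded values from that table, so results are approximate).

```python
# Per-class (recall, support) copied from the train-set classification report above
report = {
    "anger": (0.27, 4517),
    "disgust": (0.32, 694),
    "fear": (0.49, 642),
    "joy": (0.69, 15693),
    "sadness": (0.37, 2938),
    "surprise": (0.16, 4707),
    "neutral": (0.73, 14219),
}

# Macro average: every class counts equally, regardless of size
macro_recall = sum(recall for recall, _ in report.values()) / len(report)

# Weighted average: each class counts in proportion to its support
total_support = sum(support for _, support in report.values())
weighted_recall = sum(recall * support for recall, support in report.values()) / total_support

print(round(macro_recall, 2), round(weighted_recall, 2))  # 0.43 0.57
```

The macro figure is lower because small, poorly recalled classes such as `surprise` and `disgust` weigh as much as `joy` and `neutral`, which dominate the weighted figure.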



In [174]:
# Model evaluation for test
#model_eval(y_test_ekman, y_test_pred_GE4, class_label_names_ekman)

cr = classification_report(y_test_ekman, y_test_pred_GE4, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.46      0.28      0.35       595
     disgust       0.54      0.40      0.46       112
        fear       0.61      0.62      0.61        87
         joy       0.72      0.71      0.72      1915
     sadness       0.67      0.38      0.48       341
    surprise       0.53      0.15      0.24       590
     neutral       0.52      0.77      0.62      1787

    accuracy                           0.59      5427
   macro avg       0.58      0.47      0.50      5427
weighted avg       0.60      0.59      0.57      5427



In [175]:
# Model evaluation for validation
#model_eval(y_val_ekman, y_val_pred_GE4, class_label_names_ekman)

cr = classification_report(y_val_ekman, y_val_pred_GE4, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.42      0.28      0.34       582
     disgust       0.47      0.43      0.45        81
        fear       0.59      0.52      0.55        89
         joy       0.74      0.72      0.73      1997
     sadness       0.61      0.34      0.43       352
    surprise       0.54      0.17      0.26       559
     neutral       0.51      0.75      0.61      1766

    accuracy                           0.59      5426
   macro avg       0.55      0.46      0.48      5426
weighted avg       0.60      0.59      0.57      5426



#### SVM model with Ekman labels (6) excluding neutral emotion

In [75]:
# Tune hyperparameters using GridSearchCV
svm_model = svm.SVC(random_state=42)

params = {'kernel': ['sigmoid'],
          'C': [0.4, 0.5],
          'tol': [0.01]
          }
svm_model = GridSearchCV(estimator=svm_model, cv=3, param_grid=params, scoring='accuracy', n_jobs=-1, verbose=1)
svm_model.fit(X_train_new, y_train_new)

Fitting 3 folds for each of 2 candidates, totalling 6 fits


In [231]:
svm_model.best_params_

{'C': 0.4, 'kernel': 'sigmoid'}

In [232]:
svm_model.best_score_

0.6000691085003456

In [228]:
svm_model.best_params_

{'C': 0.5, 'kernel': 'sigmoid'}

In [229]:
svm_model.best_score_

0.5994701681640175

### Model 1

In [176]:
# Modelling SVM directly: the Ekman target is single-label, so MultiOutputClassifier is not needed

svm_GE_ekman_no_neu = svm.SVC(random_state=42, C=0.9, kernel='sigmoid', tol=0.01)
svm_GE_ekman_no_neu.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [177]:
# Making predictions on Ekman taxonomy
y_train_pred_GE4 = svm_GE_ekman_no_neu.predict(X_train_ekman_no_neu)
y_test_pred_GE4 = svm_GE_ekman_no_neu.predict(X_test_ekman_no_neu)
y_val_pred_GE4 = svm_GE_ekman_no_neu.predict(X_val_ekman_no_neu)

In [178]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_GE4))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_GE4))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_GE4))

 Train set results
0.6675002569285053
 Test set results
0.6585164835164835
 Val set results
0.6721311475409836


In [179]:
# Model evaluation for train
cr = classification_report(y_train_ekman_no_neu, y_train_pred_GE4, target_names=class_label_names_ekman_no_neu)
print(cr)
# model_eval(y_train_ekman, y_train_pred_GE4, class_label_names_ekman)

              precision    recall  f1-score   support

       anger       0.51      0.50      0.50      4517
     disgust       0.50      0.43      0.46       694
        fear       0.57      0.50      0.53       642
         joy       0.73      0.88      0.79     15693
     sadness       0.71      0.48      0.57      2938
    surprise       0.54      0.31      0.40      4707

    accuracy                           0.67     29191
   macro avg       0.59      0.52      0.54     29191
weighted avg       0.65      0.67      0.65     29191



In [180]:
# Model evaluation for test
#model_eval(y_test_ekman, y_test_pred_GE4, class_label_names_ekman)

cr = classification_report(y_test_ekman_no_neu, y_test_pred_GE4, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.52      0.50      0.51       595
     disgust       0.61      0.49      0.54       112
        fear       0.66      0.61      0.63        87
         joy       0.72      0.87      0.79      1915
     sadness       0.68      0.46      0.55       341
    surprise       0.50      0.28      0.36       590

    accuracy                           0.66      3640
   macro avg       0.61      0.54      0.56      3640
weighted avg       0.64      0.66      0.64      3640
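
Accuracy with and without the neutral class is not directly comparable, because removing a class also changes how far a trivial guess gets. The sketch below computes a majority-class baseline (always predicting `joy`) for both settings from the test-set support counts printed in the reports, and compares it with the test accuracies reported for the two Ekman SVMs; the helper name is illustrative.

```python
# Test-set support counts copied from the Ekman classification reports in this notebook
support_with_neutral = {
    "anger": 595, "disgust": 112, "fear": 87, "joy": 1915,
    "sadness": 341, "surprise": 590, "neutral": 1787,
}
support_without_neutral = {
    label: n for label, n in support_with_neutral.items() if label != "neutral"
}

# Test accuracies reported by the notebook for the two Ekman SVMs
reported_accuracy = {
    "with neutral": 0.5942509673852957,
    "without neutral": 0.6585164835164835,
}

def majority_baseline(support):
    """Accuracy of always predicting the most frequent class."""
    return max(support.values()) / sum(support.values())

baselines = {
    "with neutral": majority_baseline(support_with_neutral),
    "without neutral": majority_baseline(support_without_neutral),
}

for setting, baseline in baselines.items():
    lift = reported_accuracy[setting] - baseline
    print(f"{setting}: baseline={baseline:.3f}  "
          f"model={reported_accuracy[setting]:.3f}  lift={lift:.3f}")
```

Read this way, raw accuracy rises once neutral is dropped, but the margin over the majority-class guess shrinks, which suggests part of the gain comes from the easier label set rather than from a better model.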



In [181]:
# Model evaluation for validation
#model_eval(y_val_ekman, y_val_pred_GE4, class_label_names_ekman)

cr = classification_report(y_val_ekman_no_neu, y_val_pred_GE4, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.51      0.52       582
     disgust       0.48      0.51      0.49        81
        fear       0.68      0.55      0.61        89
         joy       0.73      0.87      0.79      1997
     sadness       0.66      0.44      0.53       352
    surprise       0.54      0.31      0.39       559

    accuracy                           0.67      3660
   macro avg       0.60      0.53      0.56      3660
weighted avg       0.66      0.67      0.65      3660



### Model 2

In [182]:
# Modelling SVM directly: the Ekman target is single-label, so MultiOutputClassifier is not needed

svm_GE_ekman_no_neu = svm.SVC(random_state=42, C=0.9, kernel='sigmoid', tol=0.1)
svm_GE_ekman_no_neu.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [183]:
# Making predictions on Ekman taxonomy
y_train_pred_GE4 = svm_GE_ekman_no_neu.predict(X_train_ekman_no_neu)
y_test_pred_GE4 = svm_GE_ekman_no_neu.predict(X_test_ekman_no_neu)
y_val_pred_GE4 = svm_GE_ekman_no_neu.predict(X_val_ekman_no_neu)

In [184]:
print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_pred_GE4))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_pred_GE4))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_pred_GE4))

 Train set results
0.6664040286389641
 Test set results
0.6620879120879121
 Val set results
0.6751366120218579


In [185]:
# Model evaluation for train
cr = classification_report(y_train_ekman_no_neu, y_train_pred_GE4, target_names=class_label_names_ekman_no_neu)
print(cr)
# model_eval(y_train_ekman, y_train_pred_GE4, class_label_names_ekman)

              precision    recall  f1-score   support

       anger       0.51      0.50      0.50      4517
     disgust       0.49      0.43      0.46       694
        fear       0.56      0.50      0.53       642
         joy       0.73      0.88      0.79     15693
     sadness       0.71      0.48      0.57      2938
    surprise       0.54      0.31      0.39      4707

    accuracy                           0.67     29191
   macro avg       0.59      0.51      0.54     29191
weighted avg       0.65      0.67      0.65     29191



In [186]:
# Model evaluation for test
#model_eval(y_test_ekman, y_test_pred_GE4, class_label_names_ekman)

cr = classification_report(y_test_ekman_no_neu, y_test_pred_GE4, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.52      0.50      0.51       595
     disgust       0.61      0.48      0.54       112
        fear       0.66      0.61      0.63        87
         joy       0.72      0.88      0.79      1915
     sadness       0.70      0.48      0.57       341
    surprise       0.51      0.27      0.36       590

    accuracy                           0.66      3640
   macro avg       0.62      0.54      0.57      3640
weighted avg       0.64      0.66      0.64      3640



In [187]:
# Model evaluation for validation
#model_eval(y_val_ekman, y_val_pred_GE4, class_label_names_ekman)

cr = classification_report(y_val_ekman_no_neu, y_val_pred_GE4, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.54      0.52      0.53       582
     disgust       0.48      0.49      0.48        81
        fear       0.68      0.55      0.61        89
         joy       0.73      0.88      0.80      1997
     sadness       0.67      0.45      0.54       352
    surprise       0.54      0.31      0.39       559

    accuracy                           0.68      3660
   macro avg       0.61      0.53      0.56      3660
weighted avg       0.66      0.68      0.66      3660



### Decision Tree

In [55]:
# performing hyperparameter tuning

dt1 = DecisionTreeClassifier(random_state=42)

params = {
    #'max_depth':[500],
    'min_samples_leaf': [55,60],
    'criterion':['gini']
}

grid_search = GridSearchCV(estimator=dt1,param_grid=params,
                           cv=5,n_jobs=-1,verbose=1, scoring='accuracy')
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 2 candidates, totalling 10 fits


In [56]:
# Getting the best score
grid_search.best_score_

0.3020732550103663

In [57]:
# Getting best set of hyperparameters
grid_search.best_estimator_

### Decision Tree model with GoEmotions labels (28) including neutral emotion


### Model 1

In [58]:
dt = DecisionTreeClassifier(min_samples_leaf=55, criterion='gini', random_state=42)
dt.fit(X_train, y_train)

In [59]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = dt.predict(X_train)
y_test_pred = dt.predict(X_test)



print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))

 Train set results
0.3332181524994241
 Test set results
0.31767090473558135


In [60]:
# Model evaluation for test
model_eval(y_test, y_test_pred, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.67,0.35,0.46
amusement,0.76,0.8,0.78
anger,0.67,0.09,0.16
annoyance,0.46,0.02,0.04
approval,0.81,0.05,0.09
caring,0.83,0.04,0.07
confusion,0.0,0.0,0.0
curiosity,1.0,0.01,0.02
desire,0.61,0.13,0.22
disappointment,0.0,0.0,0.0


### Model 2 - Final

In [776]:
dt1 = DecisionTreeClassifier(min_samples_leaf=25, criterion='gini', random_state=42)
dt1.fit(X_train, y_train)

In [778]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train)
y_test_pred_1 = dt1.predict(X_test)
y_val_pred_1 = dt1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_1))

 Train set results
0.3560239576134531
 Test set results
0.3296480560162152
 Val set results
0.3483228897899005


In [779]:
# Model evaluation for train
model_eval(y_train, y_train_pred_1, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.74,0.43,0.55
amusement,0.78,0.76,0.77
anger,0.64,0.2,0.3
annoyance,0.57,0.04,0.08
approval,0.76,0.06,0.11
caring,0.61,0.05,0.1
confusion,0.63,0.07,0.13
curiosity,0.95,0.04,0.08
desire,0.71,0.24,0.36
disappointment,0.0,0.0,0.0


In [780]:
# Model evaluation for test
model_eval(y_test, y_test_pred_1, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.68,0.37,0.48
amusement,0.76,0.75,0.76
anger,0.6,0.17,0.27
annoyance,0.5,0.03,0.06
approval,0.72,0.07,0.12
caring,0.41,0.05,0.09
confusion,0.64,0.05,0.09
curiosity,1.0,0.01,0.02
desire,0.55,0.13,0.21
disappointment,0.0,0.0,0.0


In [781]:
# Model evaluation for validation
model_eval(y_val, y_val_pred_1, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.71,0.48,0.57
amusement,0.76,0.72,0.74
anger,0.57,0.21,0.3
annoyance,0.54,0.02,0.04
approval,0.85,0.07,0.13
caring,0.47,0.05,0.08
confusion,0.38,0.07,0.11
curiosity,0.94,0.06,0.11
desire,0.72,0.27,0.4
disappointment,0.0,0.0,0.0


### Model 3

In [782]:
dt2 = DecisionTreeClassifier(min_samples_leaf=15, criterion='gini', random_state=42)
dt2.fit(X_train, y_train)

In [783]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_2 = dt2.predict(X_train)
y_test_pred_2 = dt2.predict(X_test)
y_val_pred_2 = dt2.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred_2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_2))

 Train set results
0.3701220916839438
 Test set results
0.3255942509673853
 Val set results
0.3391079985256174


In [784]:
# Model evaluation for train
model_eval(y_train, y_train_pred_2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.75,0.47,0.58
amusement,0.78,0.77,0.78
anger,0.66,0.23,0.34
annoyance,0.61,0.06,0.11
approval,0.75,0.06,0.12
caring,0.62,0.08,0.13
confusion,0.66,0.08,0.15
curiosity,0.88,0.05,0.1
desire,0.71,0.26,0.38
disappointment,0.58,0.03,0.05


In [785]:
# Model evaluation
model_eval(y_test, y_test_pred_2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.69,0.42,0.52
amusement,0.77,0.76,0.76
anger,0.6,0.19,0.29
annoyance,0.58,0.04,0.08
approval,0.55,0.05,0.09
caring,0.5,0.08,0.14
confusion,0.41,0.05,0.08
curiosity,0.67,0.01,0.03
desire,0.58,0.13,0.22
disappointment,0.67,0.03,0.05


In [786]:
# Model evaluation
model_eval(y_val, y_val_pred_2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.7,0.5,0.58
amusement,0.77,0.73,0.75
anger,0.57,0.2,0.3
annoyance,0.47,0.03,0.05
approval,0.83,0.08,0.14
caring,0.33,0.05,0.08
confusion,0.4,0.05,0.09
curiosity,0.85,0.07,0.13
desire,0.77,0.3,0.43
disappointment,0.33,0.01,0.01


#### Model 4

In [787]:
dt3 = DecisionTreeClassifier(min_samples_leaf=10, criterion='gini', random_state=42)
dt3.fit(X_train, y_train)

In [788]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_3 = dt3.predict(X_train)
y_test_pred_3 = dt3.predict(X_test)
y_val_pred_3 = dt3.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred_3))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_3))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_3))

 Train set results
0.37611149504722413
 Test set results
0.32098765432098764
 Val set results
0.3267600442314781


In [789]:
# Model evaluation
model_eval(y_train, y_train_pred_3, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.77,0.48,0.59
amusement,0.81,0.75,0.78
anger,0.7,0.24,0.36
annoyance,0.66,0.07,0.13
approval,0.7,0.1,0.17
caring,0.72,0.08,0.14
confusion,0.71,0.09,0.16
curiosity,0.91,0.05,0.1
desire,0.72,0.28,0.41
disappointment,0.63,0.02,0.04


In [790]:
# Model evaluation
model_eval(y_test, y_test_pred_3, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.66,0.42,0.51
amusement,0.78,0.75,0.77
anger,0.6,0.17,0.26
annoyance,0.52,0.05,0.09
approval,0.53,0.08,0.13
caring,0.57,0.1,0.16
confusion,0.29,0.03,0.06
curiosity,0.67,0.01,0.03
desire,0.52,0.16,0.24
disappointment,0.6,0.02,0.04


In [791]:
# Model evaluation
model_eval(y_val, y_val_pred_3, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.71,0.5,0.59
amusement,0.79,0.69,0.74
anger,0.53,0.2,0.29
annoyance,0.36,0.03,0.05
approval,0.64,0.09,0.16
caring,0.18,0.02,0.04
confusion,0.46,0.04,0.07
curiosity,0.85,0.07,0.13
desire,0.65,0.26,0.37
disappointment,0.0,0.0,0.0


#### Model 5

In [792]:
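# Note: this reuses the names dt2 and y_*_pred_2, overwriting the Model 3 tree (min_samples_leaf=15)
dt2 = DecisionTreeClassifier(min_samples_leaf=20, criterion='gini', random_state=42)
dt2.fit(X_train, y_train)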

In [793]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_2 = dt2.predict(X_train)
y_test_pred_2 = dt2.predict(X_test)
y_val_pred_2 = dt2.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred_2))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_2))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_2))

 Train set results
0.36053904630269523
 Test set results
0.32633130643080893
 Val set results
0.3405823811279027


In [794]:
# Model evaluation
model_eval(y_train, y_train_pred_2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.75,0.44,0.56
amusement,0.78,0.75,0.77
anger,0.65,0.21,0.32
annoyance,0.59,0.05,0.09
approval,0.78,0.06,0.11
caring,0.63,0.07,0.12
confusion,0.65,0.08,0.15
curiosity,0.93,0.05,0.09
desire,0.72,0.25,0.37
disappointment,0.55,0.02,0.04


In [795]:
# Model evaluation
model_eval(y_test, y_test_pred_2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.66,0.38,0.48
amusement,0.78,0.76,0.77
anger,0.54,0.19,0.28
annoyance,0.48,0.03,0.06
approval,0.6,0.05,0.09
caring,0.5,0.07,0.13
confusion,0.6,0.06,0.11
curiosity,0.8,0.01,0.03
desire,0.61,0.13,0.22
disappointment,1.0,0.03,0.05


In [796]:
# Model evaluation
model_eval(y_val, y_val_pred_2, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.73,0.46,0.57
amusement,0.78,0.7,0.74
anger,0.54,0.21,0.3
annoyance,0.47,0.02,0.04
approval,0.83,0.07,0.13
caring,0.41,0.05,0.08
confusion,0.38,0.07,0.12
curiosity,0.94,0.07,0.13
desire,0.72,0.27,0.4
disappointment,0.33,0.01,0.01


#### Model 6

In [797]:
dt6 = DecisionTreeClassifier(min_samples_leaf=5, criterion='gini', random_state=42)
dt6.fit(X_train, y_train)

In [798]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_6 = dt6.predict(X_train)
y_test_pred_6 = dt6.predict(X_test)
y_val_pred_6 = dt6.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred_6))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred_6))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred_6))

 Train set results
0.4211011287721723
 Test set results
0.31693384927215773
 Val set results
0.33210468116476227


In [799]:
# Model evaluation
model_eval(y_train, y_train_pred_6, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.8,0.53,0.64
amusement,0.84,0.76,0.8
anger,0.73,0.31,0.43
annoyance,0.69,0.17,0.28
approval,0.71,0.18,0.29
caring,0.7,0.16,0.26
confusion,0.74,0.13,0.23
curiosity,0.74,0.1,0.18
desire,0.77,0.32,0.45
disappointment,0.66,0.08,0.14


In [800]:
# Model evaluation
model_eval(y_test, y_test_pred_6, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.6,0.4,0.48
amusement,0.76,0.7,0.73
anger,0.55,0.23,0.32
annoyance,0.35,0.08,0.13
approval,0.39,0.09,0.15
caring,0.39,0.11,0.17
confusion,0.29,0.05,0.09
curiosity,0.25,0.02,0.04
desire,0.48,0.13,0.21
disappointment,0.25,0.03,0.05


In [801]:
# Model evaluation
model_eval(y_val, y_val_pred_6, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.65,0.5,0.57
amusement,0.78,0.68,0.73
anger,0.48,0.23,0.31
annoyance,0.28,0.06,0.09
approval,0.44,0.1,0.17
caring,0.27,0.05,0.08
confusion,0.38,0.05,0.09
curiosity,0.46,0.07,0.13
desire,0.67,0.29,0.4
disappointment,0.31,0.02,0.05
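
Models 3 to 6 differ only in `min_samples_leaf`, so the same comparison could be run in one loop rather than in copied cells. The sketch below is one possible way to do that. It uses synthetic data from `make_multilabel_classification` as a stand-in for the notebook's features and labels (an assumption), so its numbers say nothing about GoEmotions.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the notebook's feature matrix and multi-label targets
X, y = make_multilabel_classification(
    n_samples=600, n_features=30, n_classes=5, random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=42)

results = []
for leaf in [5, 10, 15, 20]:  # the values tried in Models 3 to 6
    tree = DecisionTreeClassifier(
        min_samples_leaf=leaf, criterion="gini", random_state=42
    )
    tree.fit(X_tr, y_tr)
    results.append({
        "min_samples_leaf": leaf,
        "train_acc": accuracy_score(y_tr, tree.predict(X_tr)),
        "val_acc": accuracy_score(y_va, tree.predict(X_va)),
    })

# One row per setting, ready to be turned into a DataFrame if desired
for row in results:
    print(row)
```

Collecting the scores in one list keeps every setting's tree and predictions separate, which avoids the variable reuse that copied cells invite.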


### Decision Tree Model with GoEmotions taxonomy (27 labels, excluding neutral)


#### Model 1

In [802]:
dt = DecisionTreeClassifier(min_samples_leaf=55, criterion='gini', random_state=42)
dt.fit(X_train_no_neu, y_train_no_neu)

In [803]:
from sklearn.metrics import confusion_matrix, accuracy_score

# Predict on each split with the tree fitted on the data without the neutral label
y_train_pred = dt.predict(X_train_no_neu)
y_test_pred = dt.predict(X_test_no_neu)
y_val_pred = dt.predict(X_val_no_neu)

# Accuracy per split (with multi-label indicator targets this is exact-match accuracy)
print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred))

 Train set results
0.26132016869912056
 Test set results
0.24810259094477885
 Val set results
0.2806468440271257


In [804]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.41,0.53
amusement,0.81,0.76,0.78
anger,0.62,0.23,0.34
annoyance,0.62,0.01,0.03
approval,0.68,0.11,0.18
caring,0.0,0.0,0.0
confusion,0.6,0.09,0.15
curiosity,1.0,0.04,0.08
desire,0.75,0.26,0.39
disappointment,0.0,0.0,0.0
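
Rows such as `caring` and `disappointment`, with 0.0 in every column, typically mean the tree never predicted that label. Precision is then undefined, which is what triggers scikit-learn's undefined-metric warnings. Assuming `model_eval` wraps `precision_recall_fscore_support` (its definition is not shown in this section), passing `zero_division=0` makes that choice explicit and avoids the warning. A toy sketch with hypothetical arrays:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Two labels, four samples (hypothetical values); label 1 is never predicted
y_true = np.array([[1, 0],
                   [0, 1],
                   [1, 1],
                   [0, 0]])
y_pred = np.array([[1, 0],
                   [0, 0],
                   [1, 0],
                   [0, 0]])

# average=None returns one value per label;
# zero_division=0 reports 0.0 where a metric is undefined
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)

print(precision, recall, f1, support)
```

The second label gets precision, recall and F1 of 0.0 even though it has two true samples, which mirrors the all-zero rows in the tables.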


In [807]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.7,0.4,0.51
amusement,0.83,0.77,0.8
anger,0.6,0.2,0.3
annoyance,0.67,0.01,0.01
approval,0.62,0.1,0.17
caring,0.0,0.0,0.0
confusion,0.38,0.07,0.12
curiosity,1.0,0.01,0.02
desire,0.64,0.11,0.19
disappointment,0.0,0.0,0.0


In [808]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.5,0.59
amusement,0.82,0.72,0.77
anger,0.65,0.23,0.33
annoyance,0.5,0.01,0.01
approval,0.7,0.08,0.15
caring,0.0,0.0,0.0
confusion,0.38,0.07,0.12
curiosity,1.0,0.06,0.11
desire,0.77,0.31,0.44
disappointment,0.0,0.0,0.0


#### Model 2

In [809]:
dt1 = DecisionTreeClassifier(min_samples_leaf=20, criterion='gini', random_state=42)
dt1.fit(X_train_no_neu, y_train_no_neu)

In [810]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train_no_neu)
y_test_pred_1 = dt1.predict(X_test_no_neu)
y_val_pred_1 = dt1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_1))

 Train set results
0.30784320136005494
 Test set results
0.2873593300183198
 Val set results
0.3124673969744392


In [811]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.75,0.48,0.59
amusement,0.82,0.79,0.81
anger,0.69,0.27,0.39
annoyance,0.62,0.07,0.12
approval,0.68,0.15,0.24
caring,0.65,0.08,0.14
confusion,0.7,0.12,0.21
curiosity,0.84,0.06,0.1
desire,0.73,0.32,0.44
disappointment,0.56,0.03,0.06


In [812]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.71,0.46,0.56
amusement,0.84,0.79,0.81
anger,0.67,0.2,0.31
annoyance,0.66,0.07,0.12
approval,0.58,0.11,0.19
caring,0.55,0.09,0.15
confusion,0.54,0.14,0.22
curiosity,0.55,0.02,0.04
desire,0.58,0.17,0.26
disappointment,0.56,0.03,0.06


In [813]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.53,0.61
amusement,0.83,0.74,0.78
anger,0.62,0.25,0.36
annoyance,0.48,0.05,0.08
approval,0.69,0.14,0.23
caring,0.52,0.07,0.13
confusion,0.47,0.1,0.16
curiosity,0.85,0.07,0.13
desire,0.7,0.39,0.5
disappointment,0.5,0.02,0.04


#### Model 3

In [814]:
dt1 = DecisionTreeClassifier(min_samples_leaf=15, criterion='gini', random_state=42)
dt1.fit(X_train_no_neu, y_train_no_neu)

In [815]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train_no_neu)
y_test_pred_1 = dt1.predict(X_test_no_neu)
y_val_pred_1 = dt1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_1))

 Train set results
0.3266747310949096
 Test set results
0.29390211986390996
 Val set results
0.3202921231090245


In [816]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.77,0.51,0.62
amusement,0.82,0.8,0.81
anger,0.68,0.33,0.44
annoyance,0.63,0.1,0.17
approval,0.69,0.16,0.26
caring,0.66,0.11,0.19
confusion,0.71,0.14,0.24
curiosity,0.83,0.07,0.13
desire,0.72,0.34,0.47
disappointment,0.62,0.04,0.07


In [817]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.71,0.46,0.56
amusement,0.83,0.8,0.81
anger,0.61,0.26,0.36
annoyance,0.62,0.08,0.14
approval,0.58,0.12,0.2
caring,0.59,0.13,0.21
confusion,0.5,0.16,0.25
curiosity,0.5,0.04,0.07
desire,0.46,0.13,0.21
disappointment,0.6,0.04,0.07


In [818]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.74,0.55,0.63
amusement,0.82,0.75,0.79
anger,0.57,0.27,0.37
annoyance,0.67,0.06,0.11
approval,0.7,0.14,0.23
caring,0.48,0.08,0.14
confusion,0.47,0.12,0.19
curiosity,0.83,0.08,0.15
desire,0.64,0.39,0.48
disappointment,0.25,0.01,0.02


#### Model 4 - Final

In [819]:
dt1 = DecisionTreeClassifier(min_samples_leaf=10, criterion='gini', random_state=42)
dt1.fit(X_train_no_neu, y_train_no_neu)

In [820]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train_no_neu)
y_test_pred_1 = dt1.predict(X_test_no_neu)
y_val_pred_1 = dt1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_1))

 Train set results
0.33726746657076534
 Test set results
0.28971473436273226
 Val set results
0.31533646322378717


In [821]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.79,0.53,0.63
amusement,0.84,0.79,0.81
anger,0.72,0.32,0.44
annoyance,0.65,0.13,0.22
approval,0.7,0.18,0.29
caring,0.71,0.13,0.23
confusion,0.72,0.16,0.26
curiosity,0.83,0.07,0.14
desire,0.76,0.34,0.47
disappointment,0.66,0.07,0.12


In [822]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.71,0.45,0.55
amusement,0.81,0.77,0.79
anger,0.63,0.25,0.36
annoyance,0.55,0.09,0.16
approval,0.57,0.13,0.21
caring,0.61,0.14,0.23
confusion,0.47,0.16,0.24
curiosity,0.5,0.04,0.07
desire,0.53,0.19,0.28
disappointment,0.47,0.05,0.1


In [823]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.55,0.63
amusement,0.82,0.72,0.77
anger,0.61,0.26,0.37
annoyance,0.53,0.08,0.14
approval,0.62,0.14,0.23
caring,0.43,0.08,0.13
confusion,0.48,0.12,0.2
curiosity,0.77,0.08,0.15
desire,0.65,0.34,0.44
disappointment,0.27,0.02,0.04
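
Per-label tables are hard to compare across models. One option, not used in the original cells, is to summarise each model with a single macro- or micro-averaged F1 score. The sketch below shows both on hypothetical toy arrays. Macro averaging weights every label equally, while micro averaging pools all label decisions, so frequent labels dominate.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label targets: 4 texts, 3 labels (hypothetical values)
y_true = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 1],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

# Macro: mean of the per-label F1 scores (2/3, 1, 2/3)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Micro: F1 computed from the pooled counts (4 TP, 0 FP, 2 FN)
micro_f1 = f1_score(y_true, y_pred, average="micro", zero_division=0)

print(round(macro_f1, 3), round(micro_f1, 3))
```

On imbalanced label sets like this one, macro F1 is usually the stricter of the two, because rare labels with near-zero recall pull it down.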


#### Model 5

In [824]:
dt1 = DecisionTreeClassifier(min_samples_split=4, criterion='gini', random_state=42)
dt1.fit(X_train_no_neu, y_train_no_neu)

In [825]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train_no_neu)
y_test_pred_1 = dt1.predict(X_test_no_neu)
y_val_pred_1 = dt1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_1))

 Train set results
0.7948474842253245
 Test set results
0.32138183721538865
 Val set results
0.33776734480959836


In [826]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.97,0.88,0.92
amusement,0.97,0.93,0.95
anger,0.95,0.8,0.87
annoyance,0.96,0.78,0.86
approval,0.96,0.8,0.87
caring,0.96,0.76,0.85
confusion,0.95,0.74,0.83
curiosity,0.96,0.72,0.83
desire,0.97,0.83,0.9
disappointment,0.94,0.76,0.84


In [827]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.56,0.52,0.54
amusement,0.79,0.78,0.79
anger,0.4,0.33,0.36
annoyance,0.26,0.19,0.22
approval,0.31,0.25,0.27
caring,0.28,0.19,0.22
confusion,0.26,0.25,0.26
curiosity,0.2,0.12,0.15
desire,0.43,0.31,0.36
disappointment,0.15,0.09,0.11


In [828]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.59,0.6,0.6
amusement,0.75,0.75,0.75
anger,0.4,0.36,0.38
annoyance,0.28,0.18,0.22
approval,0.35,0.28,0.31
caring,0.25,0.16,0.19
confusion,0.28,0.23,0.25
curiosity,0.21,0.14,0.17
desire,0.45,0.43,0.44
disappointment,0.11,0.06,0.07
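
Model 5 switches from `min_samples_leaf` to `min_samples_split`, and the two parameters constrain the tree differently. `min_samples_leaf=k` guarantees every leaf holds at least `k` samples. `min_samples_split=k` only stops nodes with fewer than `k` samples from being split, so leaves can still be very small. The sketch below checks this on synthetic data from `make_classification`; the data and the helper `leaf_and_internal_sizes` are illustrative assumptions, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic single-label data, used only to inspect tree structure
X, y = make_classification(n_samples=500, n_features=20, random_state=42)


def leaf_and_internal_sizes(tree):
    """Return sample counts of leaf nodes and of internal (split) nodes."""
    t = tree.tree_
    is_leaf = t.children_left == -1  # leaves have no children
    return t.n_node_samples[is_leaf], t.n_node_samples[~is_leaf]


by_leaf = DecisionTreeClassifier(min_samples_leaf=10, random_state=42).fit(X, y)
by_split = DecisionTreeClassifier(min_samples_split=10, random_state=42).fit(X, y)

leaf_sizes_a, _ = leaf_and_internal_sizes(by_leaf)
leaf_sizes_b, internal_sizes_b = leaf_and_internal_sizes(by_split)

# Every leaf of the first tree has at least 10 samples
print("min leaf size with min_samples_leaf=10:", leaf_sizes_a.min())
# In the second tree only split nodes are constrained; leaves may be smaller
print("min split-node size with min_samples_split=10:", internal_sizes_b.min())
print("min leaf size with min_samples_split=10:", leaf_sizes_b.min())
```

This is consistent with the jump in training accuracy seen above: a small `min_samples_split` lets the tree grow near-pure leaves and memorise the training set.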


#### Model 6

In [889]:
dt1 = DecisionTreeClassifier(min_samples_split=85, criterion='gini', random_state=42)
dt1.fit(X_train_no_neu, y_train_no_neu)

In [890]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train_no_neu)
y_test_pred_1 = dt1.predict(X_test_no_neu)
y_val_pred_1 = dt1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_1))

 Train set results
0.4830810475038415
 Test set results
0.30960481549332636
 Val set results
0.31846635367762127


In [891]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.88,0.67,0.76
amusement,0.89,0.85,0.87
anger,0.79,0.45,0.58
annoyance,0.79,0.33,0.46
approval,0.82,0.41,0.54
caring,0.87,0.28,0.42
confusion,0.81,0.35,0.49
curiosity,0.88,0.36,0.51
desire,0.83,0.56,0.67
disappointment,0.88,0.27,0.42


In [892]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.63,0.48,0.54
amusement,0.82,0.77,0.79
anger,0.5,0.25,0.33
annoyance,0.35,0.15,0.21
approval,0.43,0.21,0.28
caring,0.32,0.09,0.14
confusion,0.34,0.22,0.26
curiosity,0.23,0.08,0.12
desire,0.53,0.3,0.38
disappointment,0.19,0.06,0.09


In [893]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.64,0.57,0.6
amusement,0.79,0.74,0.76
anger,0.53,0.28,0.37
annoyance,0.37,0.13,0.19
approval,0.43,0.19,0.27
caring,0.33,0.08,0.14
confusion,0.38,0.18,0.25
curiosity,0.24,0.11,0.15
desire,0.46,0.39,0.42
disappointment,0.11,0.02,0.04


#### Model 7

In [261]:
dt1 = DecisionTreeClassifier(min_samples_split=2, criterion='gini', random_state=42)
dt1.fit(X_train_no_neu, y_train_no_neu)

In [262]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train_no_neu)
y_test_pred_1 = dt1.predict(X_test_no_neu)
y_val_pred_1 = dt1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_1))

 Train set results
0.9031287802007388
 Test set results
0.3242606647474483
 Val set results
0.3351591027647366


In [263]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.98,0.93,0.95
amusement,0.98,0.97,0.98
anger,0.98,0.88,0.93
annoyance,0.99,0.89,0.94
approval,0.99,0.89,0.94
caring,0.99,0.91,0.95
confusion,0.98,0.87,0.92
curiosity,0.99,0.83,0.9
desire,0.98,0.95,0.97
disappointment,0.99,0.91,0.95


In [264]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.53,0.51,0.52
amusement,0.77,0.8,0.78
anger,0.39,0.31,0.35
annoyance,0.25,0.19,0.22
approval,0.3,0.26,0.28
caring,0.28,0.2,0.23
confusion,0.24,0.25,0.25
curiosity,0.19,0.13,0.15
desire,0.43,0.33,0.37
disappointment,0.15,0.11,0.12


In [265]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.58,0.6,0.59
amusement,0.73,0.76,0.75
anger,0.39,0.34,0.36
annoyance,0.24,0.18,0.21
approval,0.33,0.29,0.31
caring,0.25,0.18,0.21
confusion,0.28,0.24,0.26
curiosity,0.21,0.15,0.17
desire,0.41,0.43,0.42
disappointment,0.14,0.08,0.1
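
With `min_samples_split=2` the tree is effectively unconstrained, and the gap between train accuracy (about 0.90) and test or validation accuracy (about 0.32 to 0.34) points to overfitting. A quick way to make this visible is to report tree size next to the train and validation scores. The sketch below does so on synthetic, deliberately noisy data; the data and the helper `fit_report` are hypothetical, and the scores do not reflect GoEmotions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise, so a fully grown tree must memorise noise
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.2, random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=42)


def fit_report(**params):
    """Fit a tree with the given parameters and summarise size and accuracy."""
    tree = DecisionTreeClassifier(random_state=42, **params).fit(X_tr, y_tr)
    return {
        "depth": tree.get_depth(),
        "leaves": tree.get_n_leaves(),
        "train_acc": tree.score(X_tr, y_tr),
        "val_acc": tree.score(X_va, y_va),
    }


full = fit_report(min_samples_split=2)      # unconstrained tree
limited = fit_report(min_samples_leaf=20)   # leaves must hold at least 20 samples

print("full:   ", full)
print("limited:", limited)
```

A large tree with a wide train/validation gap, as with the unconstrained settings here, suggests selecting the model on validation performance rather than on training accuracy.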


#### Model 8 - Final - Final

In [251]:
dt1 = DecisionTreeClassifier(min_samples_split=3, criterion='gini', random_state=42)
dt1.fit(X_train_no_neu, y_train_no_neu)

In [252]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred_1 = dt1.predict(X_train_no_neu)
y_test_pred_1 = dt1.predict(X_test_no_neu)
y_val_pred_1 = dt1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_pred_1))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_pred_1))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_pred_1))

 Train set results
0.8265603033968679
 Test set results
0.32556922271656635
 Val set results
0.33750652060511216


In [253]:
# Model evaluation
model_eval(y_train_no_neu, y_train_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.98,0.89,0.93
amusement,0.98,0.94,0.96
anger,0.98,0.82,0.89
annoyance,0.99,0.81,0.89
approval,0.99,0.83,0.9
caring,0.99,0.8,0.88
confusion,0.98,0.79,0.87
curiosity,0.99,0.75,0.85
desire,0.98,0.86,0.92
disappointment,0.99,0.79,0.88


In [254]:
# Model evaluation
model_eval(y_test_no_neu, y_test_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.56,0.51,0.53
amusement,0.79,0.78,0.78
anger,0.43,0.34,0.38
annoyance,0.26,0.19,0.22
approval,0.32,0.26,0.28
caring,0.31,0.2,0.24
confusion,0.26,0.24,0.25
curiosity,0.21,0.12,0.15
desire,0.5,0.34,0.4
disappointment,0.15,0.1,0.12


In [255]:
# Model evaluation
model_eval(y_val_no_neu, y_val_pred_1, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.59,0.59,0.59
amusement,0.76,0.75,0.75
anger,0.43,0.36,0.39
annoyance,0.26,0.18,0.21
approval,0.35,0.28,0.31
caring,0.26,0.16,0.2
confusion,0.28,0.22,0.25
curiosity,0.2,0.14,0.17
desire,0.44,0.47,0.46
disappointment,0.11,0.06,0.08


### Decision Tree Model with Ekman taxonomy (7 labels, including neutral)


#### Model 1

In [899]:
dt = DecisionTreeClassifier(min_samples_leaf=15, criterion='gini', random_state=42)
dt.fit(X_train_ekman, y_train_ekman)

In [900]:
from sklearn.metrics import confusion_matrix, accuracy_score

# Predict on each split with the tree fitted on the Ekman labels
y_train_ekman_pred = dt.predict(X_train_ekman)
y_test_ekman_pred = dt.predict(X_test_ekman)
y_val_ekman_pred = dt.predict(X_val_ekman)

# Accuracy on each split
print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6360055286800277
 Test set results
0.5784042749216879
 Val set results
0.5932546995945448


In [902]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.50      0.38      0.43      4517
     disgust       0.54      0.30      0.38       694
        fear       0.60      0.38      0.46       642
         joy       0.73      0.78      0.76     15693
     sadness       0.68      0.42      0.52      2938
    surprise       0.55      0.27      0.36      4707
     neutral       0.58      0.75      0.66     14219

    accuracy                           0.64     43410
   macro avg       0.60      0.47      0.51     43410
weighted avg       0.63      0.64      0.62     43410
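
The `macro avg` and `weighted avg` rows of the report differ because of class imbalance: the macro average treats all seven classes equally, while the weighted average scales each class by its support. As a check, the short sketch below recomputes both F1 averages from the per-class values printed in the training-set report above. Since the inputs are the rounded figures copied from that report, the results are approximate.

```python
# Per-class F1 and support copied from the training-set report
f1 = {
    "anger": 0.43, "disgust": 0.38, "fear": 0.46, "joy": 0.76,
    "sadness": 0.52, "surprise": 0.36, "neutral": 0.66,
}
support = {
    "anger": 4517, "disgust": 694, "fear": 642, "joy": 15693,
    "sadness": 2938, "surprise": 4707, "neutral": 14219,
}

total = sum(support.values())

# Macro average: unweighted mean over classes
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class contributes in proportion to its support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(total, round(macro_f1, 2), round(weighted_f1, 2))  # 43410 0.51 0.62
```

The weighted figure is higher because `joy` and `neutral`, the two best-scoring classes, account for most of the samples.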



In [903]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.40      0.28      0.33       595
     disgust       0.55      0.31      0.40       112
        fear       0.68      0.44      0.53        87
         joy       0.70      0.75      0.72      1915
     sadness       0.63      0.36      0.46       341
    surprise       0.43      0.20      0.27       590
     neutral       0.52      0.68      0.59      1787

    accuracy                           0.58      5427
   macro avg       0.56      0.43      0.47      5427
weighted avg       0.57      0.58      0.56      5427



In [904]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.44      0.30      0.36       582
     disgust       0.47      0.35      0.40        81
        fear       0.59      0.29      0.39        89
         joy       0.73      0.75      0.74      1997
     sadness       0.62      0.36      0.46       352
    surprise       0.41      0.21      0.28       559
     neutral       0.53      0.71      0.60      1766

    accuracy                           0.59      5426
   macro avg       0.54      0.42      0.46      5426
weighted avg       0.59      0.59      0.58      5426



#### Model 2

In [905]:
dt = DecisionTreeClassifier(min_samples_leaf=20, criterion='gini', random_state=42)
dt.fit(X_train_ekman, y_train_ekman)

In [906]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = dt.predict(X_train_ekman)
y_test_ekman_pred = dt.predict(X_test_ekman)
y_val_ekman_pred = dt.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6269983874683253
 Test set results
0.58595909342178
 Val set results
0.5954662734979728


In [908]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.49      0.34      0.40      4517
     disgust       0.54      0.30      0.38       694
        fear       0.59      0.38      0.46       642
         joy       0.74      0.76      0.75     15693
     sadness       0.66      0.42      0.51      2938
    surprise       0.55      0.24      0.34      4707
     neutral       0.56      0.77      0.65     14219

    accuracy                           0.63     43410
   macro avg       0.59      0.46      0.50     43410
weighted avg       0.62      0.63      0.61     43410



In [909]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.42      0.26      0.32       595
     disgust       0.58      0.34      0.43       112
        fear       0.64      0.41      0.50        87
         joy       0.71      0.74      0.73      1915
     sadness       0.66      0.39      0.49       341
    surprise       0.44      0.19      0.27       590
     neutral       0.52      0.72      0.60      1787

    accuracy                           0.59      5427
   macro avg       0.57      0.44      0.48      5427
weighted avg       0.58      0.59      0.57      5427



In [910]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.45      0.28      0.35       582
     disgust       0.46      0.35      0.39        81
        fear       0.65      0.34      0.44        89
         joy       0.73      0.74      0.74      1997
     sadness       0.64      0.38      0.48       352
    surprise       0.44      0.19      0.27       559
     neutral       0.52      0.73      0.61      1766

    accuracy                           0.60      5426
   macro avg       0.56      0.43      0.47      5426
weighted avg       0.59      0.60      0.58      5426



### Model 3

In [911]:
dt = DecisionTreeClassifier(min_samples_leaf=25, criterion='gini', random_state=42)
dt.fit(X_train_ekman, y_train_ekman)

In [912]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = dt.predict(X_train_ekman)
y_test_ekman_pred = dt.predict(X_test_ekman)
y_val_ekman_pred = dt.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6175996314213315
 Test set results
0.5807997051778147
 Val set results
0.5962034647991153


In [913]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.49      0.32      0.39      4517
     disgust       0.51      0.30      0.38       694
        fear       0.59      0.38      0.46       642
         joy       0.74      0.75      0.74     15693
     sadness       0.65      0.40      0.50      2938
    surprise       0.56      0.21      0.31      4707
     neutral       0.55      0.77      0.64     14219

    accuracy                           0.62     43410
   macro avg       0.58      0.45      0.49     43410
weighted avg       0.62      0.62      0.60     43410



In [914]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.42      0.25      0.32       595
     disgust       0.55      0.32      0.41       112
        fear       0.66      0.40      0.50        87
         joy       0.72      0.73      0.72      1915
     sadness       0.63      0.35      0.45       341
    surprise       0.47      0.16      0.24       590
     neutral       0.50      0.73      0.60      1787

    accuracy                           0.58      5427
   macro avg       0.57      0.42      0.46      5427
weighted avg       0.58      0.58      0.56      5427



In [915]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.45      0.27      0.34       582
     disgust       0.48      0.36      0.41        81
        fear       0.65      0.34      0.44        89
         joy       0.73      0.74      0.73      1997
     sadness       0.62      0.38      0.47       352
    surprise       0.47      0.18      0.26       559
     neutral       0.52      0.74      0.61      1766

    accuracy                           0.60      5426
   macro avg       0.56      0.43      0.47      5426
weighted avg       0.59      0.60      0.58      5426



### Model 4

In [916]:
dt = DecisionTreeClassifier(min_samples_leaf=23, criterion='gini', random_state=42)
dt.fit(X_train_ekman, y_train_ekman)

In [917]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = dt.predict(X_train_ekman)
y_test_ekman_pred = dt.predict(X_test_ekman)
y_val_ekman_pred = dt.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6213314904399908
 Test set results
0.5811682329095265
 Val set results
0.5967563582749723


In [918]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.48      0.34      0.40      4517
     disgust       0.54      0.29      0.38       694
        fear       0.59      0.38      0.46       642
         joy       0.74      0.76      0.75     15693
     sadness       0.66      0.41      0.50      2938
    surprise       0.54      0.23      0.32      4707
     neutral       0.55      0.76      0.64     14219

    accuracy                           0.62     43410
   macro avg       0.59      0.45      0.49     43410
weighted avg       0.62      0.62      0.60     43410



In [919]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.42      0.28      0.34       595
     disgust       0.57      0.30      0.40       112
        fear       0.64      0.41      0.50        87
         joy       0.71      0.74      0.72      1915
     sadness       0.65      0.37      0.47       341
    surprise       0.44      0.18      0.25       590
     neutral       0.51      0.72      0.60      1787

    accuracy                           0.58      5427
   macro avg       0.56      0.43      0.47      5427
weighted avg       0.58      0.58      0.56      5427



In [920]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.44      0.29      0.35       582
     disgust       0.48      0.35      0.40        81
        fear       0.65      0.34      0.44        89
         joy       0.74      0.74      0.74      1997
     sadness       0.62      0.38      0.47       352
    surprise       0.45      0.19      0.27       559
     neutral       0.52      0.73      0.61      1766

    accuracy                           0.60      5426
   macro avg       0.56      0.43      0.47      5426
weighted avg       0.59      0.60      0.58      5426



### Model 5

In [946]:
# Unlike the other runs, this one constrains min_samples_split instead of min_samples_leaf
dt = DecisionTreeClassifier(min_samples_split=20, criterion='gini', random_state=42)
dt.fit(X_train_ekman, y_train_ekman)

In [947]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = dt.predict(X_train_ekman)
y_test_ekman_pred = dt.predict(X_test_ekman)
y_val_ekman_pred = dt.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.8074176457037548
 Test set results
0.5317855168601437
 Val set results
0.5510504976041283


In [948]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.70      0.73      0.72      4517
     disgust       0.73      0.68      0.70       694
        fear       0.80      0.70      0.75       642
         joy       0.87      0.88      0.87     15693
     sadness       0.83      0.70      0.76      2938
    surprise       0.79      0.61      0.69      4707
     neutral       0.78      0.85      0.81     14219

    accuracy                           0.81     43410
   macro avg       0.79      0.74      0.76     43410
weighted avg       0.81      0.81      0.81     43410



In [949]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.30      0.26      0.27       595
     disgust       0.39      0.32      0.35       112
        fear       0.51      0.40      0.45        87
         joy       0.67      0.70      0.68      1915
     sadness       0.48      0.38      0.42       341
    surprise       0.32      0.22      0.26       590
     neutral       0.51      0.60      0.55      1787

    accuracy                           0.53      5427
   macro avg       0.45      0.41      0.43      5427
weighted avg       0.52      0.53      0.52      5427



In [950]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.34      0.30      0.32       582
     disgust       0.36      0.36      0.36        81
        fear       0.46      0.33      0.38        89
         joy       0.69      0.71      0.70      1997
     sadness       0.48      0.36      0.42       352
    surprise       0.35      0.24      0.28       559
     neutral       0.52      0.61      0.56      1766

    accuracy                           0.55      5426
   macro avg       0.46      0.42      0.43      5426
weighted avg       0.54      0.55      0.54      5426
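
The gap between train accuracy (0.81) and test/validation accuracy (0.53 / 0.55) in Model 5 suggests overfitting: constraining only `min_samples_split` still allows very small leaves, whereas `min_samples_leaf` bounds the leaf size directly. A minimal sketch of that comparison follows. The synthetic `make_classification` data and the helper `train_val_gap` are illustrative assumptions, not part of this notebook's pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the notebook's features and labels (assumption)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=4, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=42)


def train_val_gap(**tree_params):
    """Fit a tree and return (train accuracy, validation accuracy, fitted tree)."""
    tree = DecisionTreeClassifier(criterion='gini', random_state=42, **tree_params)
    tree.fit(X_tr, y_tr)
    train_acc = accuracy_score(y_tr, tree.predict(X_tr))
    val_acc = accuracy_score(y_va, tree.predict(X_va))
    return train_acc, val_acc, tree


# Same threshold value, applied to two different stopping rules
split_train, split_val, split_tree = train_val_gap(min_samples_split=20)
leaf_train, leaf_val, leaf_tree = train_val_gap(min_samples_leaf=20)

print(f"min_samples_split=20: train={split_train:.3f} val={split_val:.3f} "
      f"leaves={split_tree.get_n_leaves()}")
print(f"min_samples_leaf=20:  train={leaf_train:.3f} val={leaf_val:.3f} "
      f"leaves={leaf_tree.get_n_leaves()}")
```

Comparing the number of leaves alongside the two accuracies makes it easier to see whether a large train/validation gap comes from a much larger tree.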



### Model 6

In [291]:
dt = DecisionTreeClassifier(min_samples_leaf=18, criterion='gini', random_state=42)
dt.fit(X_train_ekman, y_train_ekman)

In [292]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = dt.predict(X_train_ekman)
y_test_ekman_pred = dt.predict(X_test_ekman)
y_val_ekman_pred = dt.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.630062197650311
 Test set results
0.5876174682144831
 Val set results
0.597493549576115


In [293]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.50      0.35      0.41      4517
     disgust       0.54      0.30      0.38       694
        fear       0.58      0.39      0.47       642
         joy       0.74      0.77      0.75     15693
     sadness       0.66      0.42      0.51      2938
    surprise       0.55      0.25      0.34      4707
     neutral       0.57      0.76      0.65     14219

    accuracy                           0.63     43410
   macro avg       0.59      0.46      0.50     43410
weighted avg       0.63      0.63      0.61     43410



In [294]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.40      0.26      0.31       595
     disgust       0.59      0.35      0.44       112
        fear       0.66      0.44      0.52        87
         joy       0.71      0.74      0.73      1915
     sadness       0.65      0.38      0.48       341
    surprise       0.46      0.20      0.28       590
     neutral       0.52      0.72      0.61      1787

    accuracy                           0.59      5427
   macro avg       0.57      0.44      0.48      5427
weighted avg       0.58      0.59      0.57      5427



In [295]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.44      0.28      0.35       582
     disgust       0.48      0.35      0.40        81
        fear       0.61      0.34      0.43        89
         joy       0.73      0.74      0.74      1997
     sadness       0.61      0.37      0.46       352
    surprise       0.45      0.21      0.28       559
     neutral       0.53      0.73      0.61      1766

    accuracy                           0.60      5426
   macro avg       0.55      0.43      0.47      5426
weighted avg       0.59      0.60      0.58      5426



### Model 7 - Final

In [281]:
dt = DecisionTreeClassifier(min_samples_leaf=19, criterion='gini', random_state=42)
dt.fit(X_train_ekman, y_train_ekman)

In [282]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = dt.predict(X_train_ekman)
y_test_ekman_pred = dt.predict(X_test_ekman)
y_val_ekman_pred = dt.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6289103893112186
 Test set results
0.5852220379583564
 Val set results
0.5989679321784003


In [283]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.50      0.35      0.41      4517
     disgust       0.54      0.30      0.38       694
        fear       0.58      0.39      0.47       642
         joy       0.74      0.76      0.75     15693
     sadness       0.66      0.42      0.51      2938
    surprise       0.56      0.25      0.34      4707
     neutral       0.56      0.76      0.65     14219

    accuracy                           0.63     43410
   macro avg       0.59      0.46      0.50     43410
weighted avg       0.63      0.63      0.61     43410



In [284]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)


              precision    recall  f1-score   support

       anger       0.40      0.26      0.32       595
     disgust       0.59      0.35      0.44       112
        fear       0.66      0.44      0.52        87
         joy       0.71      0.74      0.72      1915
     sadness       0.65      0.39      0.49       341
    surprise       0.46      0.19      0.27       590
     neutral       0.52      0.72      0.60      1787

    accuracy                           0.59      5427
   macro avg       0.57      0.44      0.48      5427
weighted avg       0.58      0.59      0.57      5427



In [285]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.45      0.29      0.35       582
     disgust       0.48      0.35      0.40        81
        fear       0.61      0.34      0.43        89
         joy       0.74      0.74      0.74      1997
     sadness       0.61      0.38      0.46       352
    surprise       0.45      0.20      0.28       559
     neutral       0.53      0.74      0.61      1766

    accuracy                           0.60      5426
   macro avg       0.55      0.43      0.47      5426
weighted avg       0.59      0.60      0.58      5426
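
Most of the runs above differ only in `min_samples_leaf`, changed by hand between cells. The same search could be sketched as a loop that records accuracy per candidate and selects the best one. The synthetic data and the helper name `sweep_min_samples_leaf` are assumptions for illustration, and selecting on validation accuracy (rather than test accuracy) is an assumption about the intended protocol.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def sweep_min_samples_leaf(X_train, y_train, X_val, y_val, candidates):
    """Fit one tree per candidate and return a table of train/val accuracy."""
    rows = []
    for leaf in candidates:
        tree = DecisionTreeClassifier(min_samples_leaf=leaf, criterion='gini',
                                      random_state=42)
        tree.fit(X_train, y_train)
        rows.append({
            'min_samples_leaf': leaf,
            'train_acc': accuracy_score(y_train, tree.predict(X_train)),
            'val_acc': accuracy_score(y_val, tree.predict(X_val)),
        })
    return pd.DataFrame(rows)


# Synthetic stand-in for X_train_ekman / y_train_ekman (assumption)
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           n_classes=4, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=42)

candidates = [18, 19, 23, 25]  # min_samples_leaf values tried in Models 3, 4, 6 and 7
results = sweep_min_samples_leaf(X_train, y_train, X_val, y_val, candidates)
best = results.loc[results['val_acc'].idxmax()]
print(results)
print("Best min_samples_leaf by validation accuracy:", int(best['min_samples_leaf']))
```

Keeping the test set out of the selection step leaves it available as an unbiased final check of the chosen model.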



### Decision Tree Model with Ekman taxonomy: 6 labels, excluding the neutral emotion


### Model 1

In [951]:
dt = DecisionTreeClassifier(min_samples_leaf=15, criterion='gini', random_state=42)
dt.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [952]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = dt.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = dt.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = dt.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.6936727073413038
 Test set results
0.6357142857142857
 Val set results
0.6568306010928961


In [953]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.55      0.54      0.55      4517
     disgust       0.59      0.33      0.43       694
        fear       0.70      0.37      0.48       642
         joy       0.75      0.89      0.81     15693
     sadness       0.69      0.49      0.58      2938
    surprise       0.58      0.40      0.47      4707

    accuracy                           0.69     29191
   macro avg       0.64      0.50      0.55     29191
weighted avg       0.68      0.69      0.68     29191



In [954]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.48      0.45      0.46       595
     disgust       0.63      0.34      0.44       112
        fear       0.72      0.41      0.53        87
         joy       0.71      0.86      0.78      1915
     sadness       0.58      0.43      0.50       341
    surprise       0.43      0.29      0.35       590

    accuracy                           0.64      3640
   macro avg       0.59      0.46      0.51      3640
weighted avg       0.61      0.64      0.62      3640



In [955]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.50      0.45      0.48       582
     disgust       0.51      0.38      0.44        81
        fear       0.72      0.33      0.45        89
         joy       0.73      0.87      0.79      1997
     sadness       0.61      0.45      0.52       352
    surprise       0.48      0.34      0.40       559

    accuracy                           0.66      3660
   macro avg       0.59      0.47      0.51      3660
weighted avg       0.64      0.66      0.64      3660
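
Printed reports are hard to compare across runs. One option, sketched here with toy labels, is to request a dictionary from `classification_report` (`output_dict=True`) and keep only the summary figures. The toy `y_true` / `y_pred` arrays and the `summary_row` helper are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

class_names = ['anger', 'disgust', 'fear', 'joy', 'sadness', 'surprise']

# Toy labels standing in for y_val_ekman_no_neu and its predictions (assumption)
rng = np.random.default_rng(42)
y_true = rng.integers(0, len(class_names), size=500)
# Keep roughly 60% of the true labels and replace the rest with random classes
y_pred = np.where(rng.random(500) < 0.6, y_true,
                  rng.integers(0, len(class_names), size=500))


def summary_row(name, y_true, y_pred):
    """Return accuracy, macro F1 and weighted F1 for one run as a dict."""
    report = classification_report(
        y_true, y_pred,
        labels=list(range(len(class_names))),
        target_names=class_names,
        output_dict=True,
        zero_division=0,
    )
    return {
        'run': name,
        'accuracy': report['accuracy'],
        'macro_f1': report['macro avg']['f1-score'],
        'weighted_f1': report['weighted avg']['f1-score'],
    }


row = summary_row('toy run', y_true, y_pred)
print(row)
```

Collecting one such row per model and per split (train, test, val) would give a single table in which the effect of each `min_samples_leaf` value can be read at a glance.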



### Model 2

In [956]:
dt = DecisionTreeClassifier(min_samples_leaf=20, criterion='gini', random_state=42)
dt.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [957]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = dt.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = dt.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = dt.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.6796615395156042
 Test set results
0.6255494505494505
 Val set results
0.6538251366120219


In [958]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.52      0.53      4517
     disgust       0.55      0.34      0.42       694
        fear       0.70      0.35      0.47       642
         joy       0.73      0.89      0.80     15693
     sadness       0.70      0.45      0.55      2938
    surprise       0.57      0.37      0.45      4707

    accuracy                           0.68     29191
   macro avg       0.63      0.49      0.53     29191
weighted avg       0.67      0.68      0.66     29191



In [959]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.45      0.41      0.43       595
     disgust       0.62      0.37      0.46       112
        fear       0.72      0.39      0.51        87
         joy       0.70      0.86      0.77      1915
     sadness       0.63      0.42      0.50       341
    surprise       0.42      0.27      0.33       590

    accuracy                           0.63      3640
   macro avg       0.59      0.45      0.50      3640
weighted avg       0.60      0.63      0.60      3640



In [960]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.48      0.45      0.46       582
     disgust       0.47      0.36      0.41        81
        fear       0.76      0.33      0.46        89
         joy       0.72      0.87      0.79      1997
     sadness       0.64      0.43      0.52       352
    surprise       0.48      0.32      0.38       559

    accuracy                           0.65      3660
   macro avg       0.59      0.46      0.50      3660
weighted avg       0.64      0.65      0.63      3660



### Model 3

In [961]:
dt = DecisionTreeClassifier(min_samples_leaf=25, criterion='gini', random_state=42)
dt.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [962]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = dt.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = dt.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = dt.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.6702750847864067
 Test set results
0.6282967032967033
 Val set results
0.6450819672131147


In [963]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.52      0.51      0.51      4517
     disgust       0.57      0.30      0.39       694
        fear       0.70      0.33      0.45       642
         joy       0.72      0.89      0.80     15693
     sadness       0.70      0.42      0.52      2938
    surprise       0.56      0.35      0.43      4707

    accuracy                           0.67     29191
   macro avg       0.63      0.47      0.52     29191
weighted avg       0.66      0.67      0.65     29191



In [964]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.47      0.44      0.45       595
     disgust       0.67      0.35      0.46       112
        fear       0.70      0.36      0.47        87
         joy       0.69      0.87      0.77      1915
     sadness       0.62      0.38      0.47       341
    surprise       0.45      0.27      0.34       590

    accuracy                           0.63      3640
   macro avg       0.60      0.44      0.49      3640
weighted avg       0.61      0.63      0.60      3640



In [965]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.47      0.42      0.44       582
     disgust       0.50      0.36      0.42        81
        fear       0.78      0.31      0.45        89
         joy       0.71      0.88      0.78      1997
     sadness       0.67      0.41      0.51       352
    surprise       0.46      0.30      0.36       559

    accuracy                           0.65      3660
   macro avg       0.60      0.45      0.49      3660
weighted avg       0.63      0.65      0.62      3660



### Model 4 

In [966]:
dt = DecisionTreeClassifier(min_samples_leaf=22, criterion='gini', random_state=42)
dt.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [967]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = dt.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = dt.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = dt.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.6754136548936316
 Test set results
0.6324175824175824
 Val set results
0.6502732240437158


In [968]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.52      0.52      0.52      4517
     disgust       0.60      0.28      0.39       694
        fear       0.69      0.35      0.47       642
         joy       0.73      0.89      0.80     15693
     sadness       0.70      0.44      0.54      2938
    surprise       0.57      0.36      0.44      4707

    accuracy                           0.68     29191
   macro avg       0.63      0.47      0.53     29191
weighted avg       0.66      0.68      0.66     29191



In [969]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.48      0.44      0.46       595
     disgust       0.70      0.33      0.45       112
        fear       0.72      0.39      0.51        87
         joy       0.70      0.87      0.77      1915
     sadness       0.63      0.42      0.50       341
    surprise       0.42      0.27      0.33       590

    accuracy                           0.63      3640
   macro avg       0.61      0.45      0.50      3640
weighted avg       0.61      0.63      0.61      3640



In [970]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.47      0.44      0.46       582
     disgust       0.50      0.32      0.39        81
        fear       0.76      0.33      0.46        89
         joy       0.72      0.88      0.79      1997
     sadness       0.65      0.42      0.51       352
    surprise       0.47      0.31      0.37       559

    accuracy                           0.65      3660
   macro avg       0.60      0.45      0.50      3660
weighted avg       0.63      0.65      0.63      3660



### Model 5

In [976]:
dt = DecisionTreeClassifier(min_samples_leaf=21, criterion='gini', random_state=42)
dt.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [977]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = dt.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = dt.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = dt.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.6776061114727142
 Test set results
0.6274725274725275
 Val set results
0.6491803278688525


In [978]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.52      0.52      4517
     disgust       0.57      0.30      0.39       694
        fear       0.70      0.35      0.47       642
         joy       0.73      0.89      0.80     15693
     sadness       0.68      0.45      0.54      2938
    surprise       0.57      0.36      0.44      4707

    accuracy                           0.68     29191
   macro avg       0.63      0.48      0.53     29191
weighted avg       0.66      0.68      0.66     29191



In [979]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.47      0.43      0.45       595
     disgust       0.66      0.36      0.46       112
        fear       0.72      0.39      0.51        87
         joy       0.70      0.86      0.77      1915
     sadness       0.62      0.41      0.49       341
    surprise       0.42      0.27      0.33       590

    accuracy                           0.63      3640
   macro avg       0.60      0.45      0.50      3640
weighted avg       0.61      0.63      0.60      3640



In [980]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.48      0.42      0.45       582
     disgust       0.50      0.35      0.41        81
        fear       0.76      0.33      0.46        89
         joy       0.72      0.88      0.79      1997
     sadness       0.62      0.42      0.50       352
    surprise       0.47      0.31      0.37       559

    accuracy                           0.65      3660
   macro avg       0.59      0.45      0.50      3660
weighted avg       0.63      0.65      0.63      3660



### Model 6

In [321]:
# Same setting as Model 2 (min_samples_leaf=20), re-run; the results below match Model 2
dt = DecisionTreeClassifier(min_samples_leaf=20, criterion='gini', random_state=42)
dt.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [322]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = dt.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = dt.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = dt.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.6796615395156042
 Test set results
0.6255494505494505
 Val set results
0.6538251366120219


In [323]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.52      0.53      4517
     disgust       0.55      0.34      0.42       694
        fear       0.70      0.35      0.47       642
         joy       0.73      0.89      0.80     15693
     sadness       0.70      0.45      0.55      2938
    surprise       0.57      0.37      0.45      4707

    accuracy                           0.68     29191
   macro avg       0.63      0.49      0.53     29191
weighted avg       0.67      0.68      0.66     29191



In [324]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.45      0.41      0.43       595
     disgust       0.62      0.37      0.46       112
        fear       0.72      0.39      0.51        87
         joy       0.70      0.86      0.77      1915
     sadness       0.63      0.42      0.50       341
    surprise       0.42      0.27      0.33       590

    accuracy                           0.63      3640
   macro avg       0.59      0.45      0.50      3640
weighted avg       0.60      0.63      0.60      3640



In [325]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.48      0.45      0.46       582
     disgust       0.47      0.36      0.41        81
        fear       0.76      0.33      0.46        89
         joy       0.72      0.87      0.79      1997
     sadness       0.64      0.43      0.52       352
    surprise       0.48      0.32      0.38       559

    accuracy                           0.65      3660
   macro avg       0.59      0.46      0.50      3660
weighted avg       0.64      0.65      0.63      3660



### Model 7 - Final

In [316]:
dt = DecisionTreeClassifier(min_samples_leaf=19, criterion='gini', random_state=42)
dt.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)

In [317]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = dt.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = dt.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = dt.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.682950224384228
 Test set results
0.6362637362637362
 Val set results
0.6579234972677596


In [318]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.54      0.53      0.53      4517
     disgust       0.55      0.35      0.43       694
        fear       0.69      0.37      0.48       642
         joy       0.73      0.89      0.81     15693
     sadness       0.70      0.46      0.55      2938
    surprise       0.58      0.36      0.45      4707

    accuracy                           0.68     29191
   macro avg       0.63      0.49      0.54     29191
weighted avg       0.67      0.68      0.66     29191



In [319]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.47      0.44      0.45       595
     disgust       0.62      0.38      0.47       112
        fear       0.72      0.41      0.53        87
         joy       0.70      0.87      0.78      1915
     sadness       0.63      0.43      0.51       341
    surprise       0.45      0.28      0.35       590

    accuracy                           0.64      3640
   macro avg       0.60      0.47      0.51      3640
weighted avg       0.62      0.64      0.61      3640
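
The gap between the macro and weighted averages in the report above comes from class imbalance: `joy` accounts for more than half of the test samples. As a rough check, the sketch below recomputes both averages from the rounded per-class F1 scores and supports printed in that report. Because the inputs are already rounded, the results only approximate the reported 0.51 and 0.61.

```python
# Per-class F1 and support copied from the test-set report above
f1 = {"anger": 0.45, "disgust": 0.47, "fear": 0.53,
      "joy": 0.78, "sadness": 0.51, "surprise": 0.35}
support = {"anger": 595, "disgust": 112, "fear": 87,
           "joy": 1915, "sadness": 341, "surprise": 590}

total = sum(support.values())

# Macro average: every class counts equally, regardless of size
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

joy_share = support["joy"] / total

print(f"total support : {total}")
print(f"macro F1      : {macro_f1:.3f}")
print(f"weighted F1   : {weighted_f1:.3f}")
print(f"share of joy  : {joy_share:.1%}")
```

The weighted average is pulled up by `joy`, the class the tree handles best, while the macro average exposes the weaker minority classes.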



In [320]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)


              precision    recall  f1-score   support

       anger       0.49      0.45      0.47       582
     disgust       0.46      0.38      0.42        81
        fear       0.74      0.35      0.47        89
         joy       0.72      0.88      0.79      1997
     sadness       0.66      0.43      0.52       352
    surprise       0.49      0.31      0.38       559

    accuracy                           0.66      3660
   macro avg       0.59      0.47      0.51      3660
weighted avg       0.64      0.66      0.64      3660



### Random Forest

In [748]:
# Hyperparameter tuning for the random forest with a 3-fold grid search
rf1 = RandomForestClassifier(random_state=42, bootstrap=True)

params = {
    'criterion': ['gini'],
    # 'max_depth': [4, 5],
    'min_samples_split': [4],
    'n_estimators': [400],
}

grid_search_rf = GridSearchCV(estimator=rf1, param_grid=params,
                              cv=3, n_jobs=-1, verbose=1, scoring='accuracy')
grid_search_rf.fit(X_train, y_train)

Fitting 3 folds for each of 2 candidates, totalling 6 fits
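
`GridSearchCV` evaluates every combination of the listed values, so the number of candidates is the product of the list lengths, and the number of fits is that count times the number of folds. The grid as displayed contains a single combination, while the log line reports two candidates. The logged output may therefore come from an earlier version of the cell, for example one with the `max_depth` line enabled. That is an assumption, not something the notebook states. The sketch below counts candidates for both variants with `itertools.product`. The helper `count_candidates` is hypothetical.

```python
from itertools import product


def count_candidates(param_grid: dict) -> int:
    """Number of parameter combinations in a single-dictionary grid."""
    return len(list(product(*param_grid.values())))


grid_as_shown = {
    "criterion": ["gini"],
    "min_samples_split": [4],
    "n_estimators": [400],
}
# Hypothetical variant with the commented-out max_depth line enabled
grid_with_depth = {**grid_as_shown, "max_depth": [4, 5]}

cv_folds = 3
for name, grid in [("as shown", grid_as_shown), ("with max_depth", grid_with_depth)]:
    n = count_candidates(grid)
    print(f"{name}: {n} candidate(s), {n * cv_folds} fits")
```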


In [755]:
# Getting the best score
grid_search_rf.best_score_

0.3515549412577747

In [756]:
# Getting best set of hyperparameters
grid_search_rf.best_params_

{'criterion': 'gini', 'min_samples_split': 4, 'n_estimators': 400}

In [746]:
# Getting the best score
grid_search_rf.best_score_

0.3018198571757659

In [747]:
# Getting best set of hyperparameters
grid_search_rf.best_params_

{'criterion': 'gini', 'min_samples_leaf': 4, 'n_estimators': 400}

In [743]:
# Getting the best score
grid_search_rf.best_score_

0.08818244644091222

In [744]:
# Getting best set of hyperparameters
grid_search_rf.best_params_

{'criterion': 'gini', 'min_samples_leaf': 50, 'n_estimators': 400}

In [737]:
# Getting the best score
grid_search_rf.best_score_

0.35577055977885275

In [738]:
# Getting best set of hyperparameters (bootstrap=True)
grid_search_rf.best_params_

{'criterion': 'gini', 'n_estimators': 400}

In [734]:
# Getting the best score
grid_search_rf.best_score_

0.09292789679797282

In [735]:
# Getting best set of hyperparameters
grid_search_rf.best_params_

{'criterion': 'gini', 'min_samples_leaf': 55, 'n_estimators': 400}

In [731]:
# Getting the best score
grid_search_rf.best_score_

0.0

In [732]:
# Getting best set of hyperparameters
grid_search_rf.best_params_

{'criterion': 'gini', 'max_depth': 4}

In [727]:
# Getting the best score
grid_search_rf.best_score_

0.0890808569454043

In [729]:
# Getting best set of hyperparameters
grid_search_rf.best_params_

{'criterion': 'gini', 'min_samples_leaf': 55}

### Random Forest Model with GoEmotions taxonomy (28 labels, including neutral)


### Model 1

In [992]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=300, min_samples_leaf=15)
rf_model1.fit(X_train, y_train)


In [993]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.28675420410043767
 Test set results
0.28137092316196793
 Val set results
0.28805750092148913
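
These accuracies look low next to the per-label precision values reported for the same models. One likely reason, assuming `y_train`, `y_test` and `y_val` are multi-label indicator matrices, is that `accuracy_score` then computes subset accuracy: a sample counts as correct only if its whole label vector matches. The sketch below contrasts that with a per-label score on toy data. The function names and the toy matrices are illustrative only.

```python
def subset_accuracy(y_true, y_pred):
    """Share of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Share of individual label cells predicted correctly."""
    cells = [a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(cells) / len(cells)


# Toy indicator matrices: 4 samples, 3 hypothetical labels
y_true_toy = [[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]]
y_pred_toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

print(f"subset accuracy   : {subset_accuracy(y_true_toy, y_pred_toy):.2f}")
print(f"per-label accuracy: {per_label_accuracy(y_true_toy, y_pred_toy):.2f}")
```

A single missed label makes the whole sample count as wrong under subset accuracy, which is why it is a strict metric for a 28-label problem.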


In [994]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.81,0.29,0.42
amusement,0.81,0.6,0.69
anger,0.69,0.1,0.18
annoyance,0.58,0.01,0.01
approval,0.78,0.05,0.09
caring,0.67,0.01,0.03
confusion,0.94,0.02,0.04
curiosity,0.92,0.05,0.09
desire,0.71,0.19,0.3
disappointment,0.0,0.0,0.0
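
The truncated `_warn_prf(...)` lines above appear to be the tail of scikit-learn's `UndefinedMetricWarning`, which is typically raised when a label is never predicted, so its precision has a zero denominator and is reported as 0. A row such as `disappointment,0.0,0.0,0.0` is consistent with that case. The sketch below reimplements the per-label computation in plain Python to show where the zero comes from. The function and its `zero_division` argument are a simplified stand-in, not the library's implementation.

```python
def precision_recall_f1(y_true, y_pred, zero_division=0.0):
    """Binary precision, recall and F1 with an explicit zero-division value."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision is undefined when the label is never predicted (tp + fp == 0)
    precision = tp / (tp + fp) if (tp + fp) else zero_division
    # Recall is undefined when the label never occurs (tp + fn == 0)
    recall = tp / (tp + fn) if (tp + fn) else zero_division
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else zero_division
    return precision, recall, f1


# Label present in the data but never predicted: all three metrics fall to 0
never_predicted = precision_recall_f1([1, 0, 1, 0], [0, 0, 0, 0])

# Label predicted once, correctly: perfect precision but low recall
rarely_predicted = precision_recall_f1([1, 1, 1, 0], [1, 0, 0, 0])

print("never predicted :", never_predicted)
print("rarely predicted:", rarely_predicted)
```

The second case mirrors rows such as `confusion`, where precision is very high but recall stays close to zero. In scikit-learn itself, `precision_recall_fscore_support` accepts a `zero_division` argument that controls the substituted value and silences the warning.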


In [995]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.74,0.27,0.4
amusement,0.81,0.61,0.7
anger,0.75,0.11,0.19
annoyance,0.5,0.0,0.01
approval,0.73,0.05,0.1
caring,0.6,0.02,0.04
confusion,1.0,0.03,0.05
curiosity,0.83,0.02,0.03
desire,0.47,0.08,0.14
disappointment,0.0,0.0,0.0


In [996]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.81,0.34,0.48
amusement,0.85,0.6,0.7
anger,0.58,0.08,0.14
annoyance,0.33,0.0,0.01
approval,0.84,0.05,0.1
caring,1.0,0.02,0.04
confusion,1.0,0.02,0.04
curiosity,0.94,0.07,0.13
desire,0.71,0.19,0.31
disappointment,0.0,0.0,0.0


### Model 2

In [987]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=15)
rf_model1.fit(X_train, y_train)


In [988]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.2870997466021654
 Test set results
0.28173945089367974
 Val set results
0.2895318835237744


In [989]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.81,0.29,0.42
amusement,0.81,0.6,0.69
anger,0.68,0.11,0.18
annoyance,0.58,0.01,0.01
approval,0.78,0.05,0.09
caring,0.67,0.01,0.03
confusion,0.94,0.02,0.04
curiosity,0.92,0.05,0.09
desire,0.71,0.19,0.3
disappointment,0.0,0.0,0.0


In [990]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.74,0.27,0.4
amusement,0.81,0.63,0.71
anger,0.75,0.11,0.19
annoyance,0.5,0.0,0.01
approval,0.73,0.05,0.1
caring,0.5,0.02,0.04
confusion,1.0,0.03,0.05
curiosity,0.86,0.02,0.04
desire,0.47,0.08,0.14
disappointment,0.0,0.0,0.0


In [991]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.81,0.34,0.48
amusement,0.84,0.6,0.7
anger,0.59,0.08,0.14
annoyance,0.33,0.0,0.01
approval,0.84,0.05,0.1
caring,1.0,0.02,0.04
confusion,1.0,0.02,0.04
curiosity,0.94,0.07,0.13
desire,0.7,0.18,0.29
disappointment,0.0,0.0,0.0


### Model 3

In [997]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=30)
rf_model1.fit(X_train, y_train)


In [998]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.2607924441372956
 Test set results
0.2570480928689884
 Val set results
0.268521931441209


In [999]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.82,0.26,0.39
amusement,0.82,0.57,0.67
anger,0.73,0.06,0.11
annoyance,0.0,0.0,0.0
approval,0.86,0.04,0.08
caring,0.0,0.0,0.0
confusion,0.0,0.0,0.0
curiosity,0.94,0.04,0.07
desire,0.71,0.16,0.27
disappointment,0.0,0.0,0.0


In [1000]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.75,0.25,0.38
amusement,0.84,0.59,0.69
anger,0.79,0.06,0.1
annoyance,0.0,0.0,0.0
approval,0.79,0.04,0.08
caring,0.0,0.0,0.0
confusion,0.0,0.0,0.0
curiosity,1.0,0.01,0.02
desire,0.42,0.06,0.11
disappointment,0.0,0.0,0.0


In [1001]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.82,0.32,0.46
amusement,0.84,0.56,0.67
anger,0.75,0.05,0.09
annoyance,0.0,0.0,0.0
approval,0.9,0.05,0.09
caring,0.0,0.0,0.0
confusion,0.0,0.0,0.0
curiosity,0.94,0.06,0.11
desire,0.72,0.17,0.27
disappointment,0.0,0.0,0.0


### Model 3.2

In [1002]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=15)
rf_model1.fit(X_train, y_train)


In [1003]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.6109652153881594
 Test set results
0.3617099686751428
 Val set results
0.3695171396977516


In [1004]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Unnamed: 0,Precision,Recall,F1
admiration,0.96,0.71,0.81
amusement,0.96,0.89,0.93
anger,0.93,0.55,0.69
annoyance,0.98,0.35,0.52
approval,0.99,0.37,0.54
caring,0.94,0.31,0.46
confusion,0.99,0.29,0.45
curiosity,0.98,0.24,0.38
desire,0.99,0.53,0.69
disappointment,0.99,0.26,0.41


In [1005]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.68,0.45,0.54
amusement,0.79,0.78,0.78
anger,0.52,0.23,0.32
annoyance,0.58,0.04,0.08
approval,0.59,0.08,0.14
caring,0.67,0.1,0.18
confusion,0.62,0.05,0.1
curiosity,0.47,0.02,0.05
desire,0.52,0.14,0.23
disappointment,0.75,0.02,0.04


In [1006]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.72,0.53,0.61
amusement,0.77,0.71,0.74
anger,0.47,0.23,0.31
annoyance,0.48,0.04,0.07
approval,0.7,0.09,0.16
caring,0.55,0.04,0.07
confusion,0.37,0.05,0.08
curiosity,0.68,0.07,0.12
desire,0.76,0.29,0.42
disappointment,0.33,0.01,0.01


### Model 4

In [1007]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=10)
rf_model1.fit(X_train, y_train)


In [1008]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.6608154803040774
 Test set results
0.3607886493458633
 Val set results
0.3684113527460376


In [1009]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Unnamed: 0,Precision,Recall,F1
admiration,0.97,0.76,0.85
amusement,0.96,0.91,0.94
anger,0.93,0.6,0.73
annoyance,0.98,0.44,0.61
approval,0.99,0.46,0.63
caring,0.97,0.41,0.57
confusion,0.99,0.37,0.54
curiosity,0.99,0.32,0.48
desire,0.99,0.59,0.74
disappointment,1.0,0.34,0.51


In [1010]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.68,0.45,0.54
amusement,0.78,0.78,0.78
anger,0.53,0.24,0.33
annoyance,0.6,0.05,0.09
approval,0.55,0.08,0.14
caring,0.63,0.09,0.16
confusion,0.64,0.06,0.11
curiosity,0.44,0.02,0.05
desire,0.52,0.13,0.21
disappointment,0.75,0.02,0.04


In [1011]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.71,0.52,0.6
amusement,0.77,0.7,0.73
anger,0.48,0.23,0.31
annoyance,0.5,0.04,0.08
approval,0.69,0.09,0.16
caring,0.33,0.02,0.04
confusion,0.35,0.05,0.08
curiosity,0.61,0.07,0.12
desire,0.76,0.29,0.42
disappointment,0.33,0.01,0.01


### Model 5

In [1012]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=5)
rf_model1.fit(X_train, y_train)


In [1013]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.7803501497350841
 Test set results
0.3648424543946932
 Val set results
0.3717287136011795


In [1014]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Unnamed: 0,Precision,Recall,F1
admiration,0.98,0.83,0.9
amusement,0.97,0.94,0.95
anger,0.95,0.73,0.82
annoyance,0.99,0.66,0.8
approval,0.99,0.69,0.81
caring,0.98,0.64,0.78
confusion,1.0,0.58,0.74
curiosity,0.99,0.56,0.72
desire,0.99,0.76,0.86
disappointment,1.0,0.61,0.75


In [1015]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.67,0.45,0.54
amusement,0.79,0.78,0.78
anger,0.53,0.23,0.32
annoyance,0.57,0.05,0.09
approval,0.58,0.09,0.15
caring,0.63,0.09,0.16
confusion,0.6,0.06,0.11
curiosity,0.39,0.02,0.05
desire,0.5,0.13,0.21
disappointment,0.57,0.03,0.05


In [1016]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.7,0.52,0.59
amusement,0.76,0.7,0.73
anger,0.47,0.23,0.31
annoyance,0.54,0.04,0.08
approval,0.66,0.1,0.17
caring,0.36,0.03,0.05
confusion,0.4,0.05,0.09
curiosity,0.59,0.08,0.14
desire,0.77,0.3,0.43
disappointment,0.25,0.01,0.01


### Model 6

In [1017]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=20)
rf_model1.fit(X_train, y_train)


In [1018]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.5785994010596637
 Test set results
0.3611571770775751
 Val set results
0.3695171396977516


In [1019]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Unnamed: 0,Precision,Recall,F1
admiration,0.95,0.68,0.79
amusement,0.95,0.88,0.91
anger,0.92,0.51,0.65
annoyance,0.97,0.3,0.45
approval,0.99,0.32,0.48
caring,0.91,0.25,0.39
confusion,0.99,0.25,0.4
curiosity,0.98,0.2,0.33
desire,0.98,0.5,0.66
disappointment,0.99,0.21,0.35


In [1020]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.68,0.45,0.54
amusement,0.79,0.77,0.78
anger,0.54,0.23,0.32
annoyance,0.57,0.04,0.07
approval,0.58,0.08,0.14
caring,0.68,0.1,0.17
confusion,0.69,0.06,0.11
curiosity,0.47,0.02,0.05
desire,0.5,0.13,0.21
disappointment,0.75,0.02,0.04


In [1021]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.72,0.53,0.61
amusement,0.78,0.71,0.74
anger,0.49,0.23,0.31
annoyance,0.5,0.04,0.08
approval,0.71,0.09,0.15
caring,0.62,0.05,0.1
confusion,0.41,0.05,0.08
curiosity,0.68,0.07,0.12
desire,0.76,0.29,0.42
disappointment,0.33,0.01,0.01


### Model 7

In [331]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=14)
rf_model1.fit(X_train, y_train)


In [332]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.618728403593642
 Test set results
0.3611571770775751
 Val set results
0.36970143752303725


In [333]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Unnamed: 0,Precision,Recall,F1
admiration,0.96,0.72,0.82
amusement,0.96,0.9,0.93
anger,0.93,0.56,0.7
annoyance,0.98,0.37,0.54
approval,0.99,0.38,0.55
caring,0.95,0.33,0.49
confusion,0.99,0.3,0.47
curiosity,0.99,0.25,0.4
desire,0.99,0.53,0.69
disappointment,0.99,0.28,0.43


In [334]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.68,0.45,0.54
amusement,0.79,0.77,0.78
anger,0.52,0.23,0.32
annoyance,0.6,0.05,0.09
approval,0.58,0.08,0.14
caring,0.67,0.1,0.18
confusion,0.64,0.06,0.11
curiosity,0.47,0.02,0.05
desire,0.52,0.13,0.21
disappointment,0.75,0.02,0.04


In [335]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.71,0.52,0.6
amusement,0.77,0.7,0.74
anger,0.49,0.24,0.32
annoyance,0.48,0.04,0.07
approval,0.72,0.09,0.16
caring,0.44,0.03,0.05
confusion,0.37,0.05,0.08
curiosity,0.68,0.07,0.12
desire,0.75,0.27,0.4
disappointment,0.33,0.01,0.01


### Model 8

In [336]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=13, criterion='gini')
rf_model1.fit(X_train, y_train)


In [337]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.6278507256392536
 Test set results
0.36226276027271054
 Val set results
0.3711758201253225


In [338]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Unnamed: 0,Precision,Recall,F1
admiration,0.96,0.72,0.83
amusement,0.96,0.9,0.93
anger,0.93,0.57,0.71
annoyance,0.98,0.39,0.55
approval,0.99,0.4,0.57
caring,0.96,0.33,0.49
confusion,0.99,0.31,0.47
curiosity,0.99,0.27,0.42
desire,0.98,0.54,0.7
disappointment,0.99,0.3,0.46


In [339]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.68,0.45,0.54
amusement,0.79,0.78,0.78
anger,0.5,0.23,0.31
annoyance,0.6,0.05,0.09
approval,0.58,0.08,0.14
caring,0.63,0.09,0.16
confusion,0.64,0.06,0.11
curiosity,0.5,0.02,0.05
desire,0.5,0.13,0.21
disappointment,0.75,0.02,0.04


In [340]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.71,0.52,0.6
amusement,0.77,0.71,0.74
anger,0.49,0.24,0.32
annoyance,0.5,0.04,0.07
approval,0.71,0.09,0.16
caring,0.29,0.01,0.02
confusion,0.37,0.05,0.08
curiosity,0.68,0.07,0.12
desire,0.76,0.29,0.42
disappointment,0.33,0.01,0.01


### Model 8.2 - Final

In [341]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=12, criterion='gini')
rf_model1.fit(X_train, y_train)


In [342]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.6371803731859018
 Test set results
0.36189423254099873
 Val set results
0.3704386288241799


In [343]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Unnamed: 0,Precision,Recall,F1
admiration,0.96,0.73,0.83
amusement,0.96,0.9,0.93
anger,0.93,0.58,0.71
annoyance,0.98,0.4,0.57
approval,0.99,0.42,0.59
caring,0.96,0.36,0.52
confusion,0.99,0.33,0.5
curiosity,0.99,0.28,0.44
desire,0.99,0.56,0.72
disappointment,0.99,0.31,0.47


In [344]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.68,0.45,0.54
amusement,0.79,0.78,0.78
anger,0.54,0.25,0.34
annoyance,0.59,0.05,0.09
approval,0.56,0.08,0.14
caring,0.63,0.09,0.16
confusion,0.67,0.05,0.1
curiosity,0.5,0.02,0.05
desire,0.52,0.14,0.23
disappointment,0.75,0.02,0.04


In [345]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Precision,Recall,F1
admiration,0.71,0.53,0.6
amusement,0.77,0.71,0.74
anger,0.5,0.24,0.32
annoyance,0.52,0.04,0.08
approval,0.71,0.09,0.16
caring,0.44,0.03,0.05
confusion,0.37,0.05,0.08
curiosity,0.63,0.07,0.12
desire,0.77,0.3,0.43
disappointment,0.33,0.01,0.01
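
To compare the `min_samples_split` runs side by side, the sketch below collects the train and validation accuracies printed above (rounded to four decimals) and computes the train-validation gap for each setting. The dictionary `runs` is assembled by hand from those outputs and is not produced by the notebook.

```python
# min_samples_split -> (train accuracy, validation accuracy), from the outputs above
runs = {
    5: (0.7804, 0.3717),
    10: (0.6608, 0.3684),
    12: (0.6372, 0.3704),
    13: (0.6279, 0.3712),
    14: (0.6187, 0.3697),
    15: (0.6110, 0.3695),
    20: (0.5786, 0.3695),
}

# Gap between train and validation accuracy as a rough overfitting indicator
gaps = {split: round(train - val, 4) for split, (train, val) in runs.items()}

val_scores = [val for _, val in runs.values()]
val_spread = max(val_scores) - min(val_scores)

print("min_samples_split  train   val     gap")
for split in sorted(runs):
    train, val = runs[split]
    print(f"{split:>17}  {train:.4f}  {val:.4f}  {gaps[split]:.4f}")
print(f"validation accuracy spread: {val_spread:.4f}")
```

Across these runs the validation accuracy moves by well under one percentage point, while the train accuracy varies widely. Raising `min_samples_split` mainly narrows the overfitting gap rather than improving generalization.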


### Model 9

In [1080]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=45, criterion='log_loss')
rf_model1.fit(X_train, y_train)


In [1081]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.4381248560239576
 Test set results
0.3480744426018058
 Val set results
0.3536675267231847


In [1082]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.92,0.53,0.67
amusement,0.9,0.82,0.86
anger,0.87,0.29,0.44
annoyance,0.88,0.07,0.13
approval,0.96,0.11,0.2
caring,0.84,0.05,0.09
confusion,0.96,0.06,0.11
curiosity,0.96,0.08,0.15
desire,0.93,0.32,0.48
disappointment,0.88,0.02,0.04


In [1083]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.7,0.39,0.5
amusement,0.81,0.73,0.77
anger,0.66,0.17,0.27
annoyance,0.64,0.03,0.05
approval,0.69,0.05,0.1
caring,0.67,0.04,0.08
confusion,1.0,0.03,0.05
curiosity,0.29,0.01,0.03
desire,0.6,0.11,0.18
disappointment,1.0,0.01,0.03


In [1084]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.75,0.47,0.58
amusement,0.79,0.67,0.73
anger,0.61,0.2,0.3
annoyance,0.43,0.02,0.04
approval,0.82,0.06,0.11
caring,1.0,0.01,0.03
confusion,0.5,0.02,0.04
curiosity,0.74,0.07,0.13
desire,0.75,0.27,0.4
disappointment,0.0,0.0,0.0


#### Model 10

In [1085]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=40, criterion='log_loss')
rf_model1.fit(X_train, y_train)


In [1086]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_pred = rf_model1.predict(X_train)
y_test_pred = rf_model1.predict(X_test)
y_val_pred = rf_model1.predict(X_val)


print(" Train set results")
print(accuracy_score(y_train, y_train_pred))
print(" Test set results")
print(accuracy_score(y_test, y_test_pred))
print(" Val set results")
print(accuracy_score(y_val, y_val_pred))

 Train set results
0.4478461184058973
 Test set results
0.35028560899207667
 Val set results
0.3558791006266126


In [1087]:
# Model evaluation
model_eval(y_train, y_train_pred, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.92,0.54,0.68
amusement,0.91,0.82,0.86
anger,0.88,0.31,0.45
annoyance,0.9,0.08,0.15
approval,0.97,0.12,0.21
caring,0.79,0.06,0.1
confusion,0.97,0.07,0.13
curiosity,0.96,0.08,0.16
desire,0.94,0.33,0.49
disappointment,0.89,0.03,0.05


In [1088]:
# Model evaluation
model_eval(y_test, y_test_pred, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.7,0.39,0.5
amusement,0.81,0.73,0.76
anger,0.66,0.17,0.27
annoyance,0.53,0.02,0.05
approval,0.69,0.05,0.1
caring,0.73,0.06,0.11
confusion,1.0,0.03,0.05
curiosity,0.27,0.01,0.03
desire,0.5,0.11,0.18
disappointment,1.0,0.01,0.03


In [1089]:
# Model evaluation
model_eval(y_val, y_val_pred, GE_taxonomy)

Emotion,Precision,Recall,F1
admiration,0.74,0.46,0.57
amusement,0.78,0.67,0.72
anger,0.6,0.2,0.3
annoyance,0.4,0.02,0.04
approval,0.74,0.06,0.11
caring,1.0,0.03,0.05
confusion,0.57,0.03,0.05
curiosity,0.74,0.07,0.13
desire,0.73,0.25,0.37
disappointment,0.0,0.0,0.0


### Random Forest Model with the GoEmotions taxonomy: 27 labels, excluding the neutral emotion


#### Model 1

In [1090]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=300, min_samples_leaf=15)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [1091]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.23009775394775558
 Test set results
0.22690395184506673
 Val set results
0.24700052164840897


In [1093]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.83,0.34,0.48
amusement,0.87,0.63,0.73
anger,0.72,0.16,0.27
annoyance,0.66,0.02,0.03
approval,0.84,0.06,0.1
caring,0.88,0.01,0.03
confusion,0.97,0.02,0.04
curiosity,1.0,0.04,0.08
desire,0.8,0.22,0.34
disappointment,0.0,0.0,0.0


In [1094]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.83,0.33,0.47
amusement,0.87,0.67,0.76
anger,0.81,0.15,0.26
annoyance,0.83,0.02,0.03
approval,0.81,0.05,0.09
caring,0.75,0.02,0.04
confusion,1.0,0.03,0.05
curiosity,1.0,0.01,0.03
desire,0.67,0.1,0.17
disappointment,0.0,0.0,0.0


In [1095]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.83,0.41,0.54
amusement,0.89,0.64,0.75
anger,0.68,0.13,0.22
annoyance,0.6,0.01,0.02
approval,0.96,0.06,0.12
caring,1.0,0.02,0.04
confusion,1.0,0.02,0.04
curiosity,1.0,0.06,0.11
desire,0.72,0.23,0.35
disappointment,0.0,0.0,0.0


#### Model 2

In [1096]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=45)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [1097]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.15732173799326513
 Test set results
0.156241821512693
 Val set results
0.17553468961919666


In [1098]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.88,0.2,0.33
amusement,0.9,0.5,0.64
anger,1.0,0.0,0.0
annoyance,0.0,0.0,0.0
approval,0.93,0.04,0.08
caring,0.0,0.0,0.0
confusion,0.0,0.0,0.0
curiosity,1.0,0.04,0.07
desire,0.82,0.09,0.17
disappointment,0.0,0.0,0.0


In [1099]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.87,0.19,0.32
amusement,0.9,0.49,0.64
anger,0.0,0.0,0.0
annoyance,0.0,0.0,0.0
approval,0.87,0.04,0.07
caring,0.0,0.0,0.0
confusion,0.0,0.0,0.0
curiosity,1.0,0.01,0.01
desire,0.67,0.02,0.05
disappointment,0.0,0.0,0.0


In [1100]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.9,0.25,0.39
amusement,0.91,0.47,0.62
anger,0.0,0.0,0.0
annoyance,0.0,0.0,0.0
approval,0.94,0.04,0.08
caring,0.0,0.0,0.0
confusion,0.0,0.0,0.0
curiosity,1.0,0.05,0.1
desire,0.91,0.13,0.23
disappointment,0.0,0.0,0.0
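
A row of zeros means the model produced no true positives for that label. When the label is never predicted at all, precision is undefined and scikit-learn reports 0 for it. A minimal sketch of a per-label report that makes this choice explicit through `zero_division=0`; `per_label_report` is a hypothetical helper, not the notebook's `model_eval` (whose definition is not shown here), and the arrays are toy data:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support


def per_label_report(y_true, y_pred, label_names):
    """Hypothetical helper: per-label precision, recall and F1 as a DataFrame."""
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true,
        y_pred,
        average=None,      # one score per label instead of an aggregate
        zero_division=0,   # report 0 (without a warning) when a score is undefined
    )
    return pd.DataFrame(
        {"Precision": precision, "Recall": recall, "F1": f1, "Support": support},
        index=label_names,
    ).round(2)


# Toy data: the second label is never predicted.
y_true = np.array([[1, 0], [1, 1], [0, 1], [1, 0]])
y_pred = np.array([[1, 0], [1, 0], [0, 0], [0, 0]])

report = per_label_report(y_true, y_pred, ["amusement", "annoyance"])
print(report)
```

Including the support column alongside the scores can help tell rare labels apart from labels the model simply ignores.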


#### Model 3

In [1101]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=10)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [1102]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.25592572007715697
 Test set results
0.2504579952891913
 Val set results
0.2655190401669275


In [1103]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.82,0.41,0.54
amusement,0.86,0.67,0.75
anger,0.7,0.17,0.28
annoyance,0.73,0.04,0.07
approval,0.77,0.09,0.15
caring,0.82,0.03,0.05
confusion,0.96,0.03,0.06
curiosity,0.97,0.05,0.1
desire,0.8,0.23,0.36
disappointment,0.0,0.0,0.0


In [1104]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.8,0.39,0.52
amusement,0.87,0.72,0.79
anger,0.76,0.16,0.26
annoyance,0.87,0.04,0.08
approval,0.73,0.09,0.17
caring,0.75,0.04,0.08
confusion,0.75,0.04,0.07
curiosity,1.0,0.02,0.05
desire,0.67,0.1,0.17
disappointment,0.0,0.0,0.0


In [1105]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.81,0.45,0.58
amusement,0.88,0.66,0.75
anger,0.67,0.14,0.24
annoyance,0.78,0.02,0.04
approval,0.88,0.07,0.13
caring,1.0,0.02,0.04
confusion,0.88,0.05,0.09
curiosity,1.0,0.07,0.13
desire,0.74,0.26,0.38
disappointment,0.0,0.0,0.0


#### Model 4

In [1106]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=15)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [1107]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.5803445908392454
 Test set results
0.32112012562156506
 Val set results
0.33437663015127805


In [1108]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.95,0.78,0.86
amusement,0.96,0.9,0.93
anger,0.95,0.59,0.73
annoyance,0.98,0.47,0.64
approval,0.98,0.53,0.69
caring,0.96,0.38,0.55
confusion,0.96,0.44,0.6
curiosity,0.98,0.39,0.56
desire,0.98,0.6,0.74
disappointment,0.98,0.34,0.51


In [1109]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.71,0.51,0.6
amusement,0.85,0.82,0.83
anger,0.64,0.26,0.37
annoyance,0.74,0.1,0.18
approval,0.61,0.15,0.24
caring,0.67,0.13,0.22
confusion,0.46,0.16,0.24
curiosity,0.46,0.04,0.08
desire,0.62,0.25,0.36
disappointment,0.75,0.04,0.08


In [1110]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.58,0.65
amusement,0.82,0.76,0.79
anger,0.61,0.27,0.38
annoyance,0.56,0.08,0.13
approval,0.63,0.17,0.26
caring,0.6,0.06,0.11
confusion,0.51,0.15,0.23
curiosity,0.57,0.08,0.14
desire,0.6,0.32,0.42
disappointment,0.17,0.01,0.01


#### Model 5

In [1111]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=4)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [1112]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.2865269558962958
 Test set results
0.2747971735147867
 Val set results
0.2871674491392801


In [1113]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.81,0.49,0.61
amusement,0.85,0.73,0.79
anger,0.73,0.22,0.34
annoyance,0.77,0.06,0.1
approval,0.77,0.1,0.18
caring,0.76,0.07,0.12
confusion,0.95,0.04,0.08
curiosity,0.93,0.07,0.12
desire,0.8,0.25,0.38
disappointment,0.83,0.03,0.05


In [1114]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.78,0.46,0.58
amusement,0.87,0.77,0.81
anger,0.79,0.19,0.3
annoyance,0.89,0.05,0.1
approval,0.73,0.11,0.18
caring,0.8,0.09,0.16
confusion,0.78,0.05,0.09
curiosity,0.88,0.02,0.05
desire,0.69,0.11,0.19
disappointment,0.8,0.03,0.05


In [1115]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.77,0.52,0.62
amusement,0.85,0.7,0.77
anger,0.68,0.17,0.28
annoyance,0.65,0.04,0.07
approval,0.86,0.09,0.16
caring,0.67,0.05,0.1
confusion,0.89,0.05,0.1
curiosity,0.89,0.07,0.13
desire,0.75,0.27,0.4
disappointment,0.33,0.01,0.01


#### Model 6

In [59]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=17)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [60]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.5621342400366168
 Test set results
0.3200732792462706
 Val set results
0.3369848721961398


In [61]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.95,0.77,0.85
amusement,0.96,0.9,0.93
anger,0.95,0.58,0.72
annoyance,0.97,0.44,0.61
approval,0.98,0.51,0.67
caring,0.95,0.36,0.52
confusion,0.96,0.42,0.58
curiosity,0.98,0.36,0.53
desire,0.98,0.58,0.73
disappointment,0.98,0.31,0.47


In [62]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.71,0.51,0.59
amusement,0.85,0.83,0.84
anger,0.66,0.27,0.39
annoyance,0.75,0.09,0.17
approval,0.61,0.15,0.23
caring,0.71,0.13,0.21
confusion,0.49,0.17,0.25
curiosity,0.52,0.04,0.08
desire,0.63,0.27,0.37
disappointment,0.67,0.04,0.08


In [63]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.59,0.65
amusement,0.82,0.76,0.79
anger,0.61,0.28,0.39
annoyance,0.56,0.08,0.13
approval,0.61,0.16,0.25
caring,0.62,0.07,0.12
confusion,0.51,0.15,0.23
curiosity,0.58,0.08,0.15
desire,0.6,0.32,0.42
disappointment,0.29,0.01,0.02


#### Model 6.1

In [54]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=16)
rf_model1.fit(X_train_no_neu, y_train_no_neu)

In [55]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.571942328440187
 Test set results
0.32269039518450665
 Val set results
0.3367240479916536


In [56]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.95,0.78,0.85
amusement,0.96,0.9,0.93
anger,0.95,0.59,0.73
annoyance,0.97,0.46,0.63
approval,0.98,0.52,0.68
caring,0.96,0.36,0.53
confusion,0.96,0.43,0.59
curiosity,0.98,0.38,0.55
desire,0.97,0.59,0.73
disappointment,0.98,0.33,0.49


In [57]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.71,0.52,0.6
amusement,0.85,0.83,0.84
anger,0.65,0.28,0.39
annoyance,0.79,0.09,0.17
approval,0.61,0.15,0.24
caring,0.71,0.13,0.21
confusion,0.49,0.17,0.25
curiosity,0.52,0.04,0.08
desire,0.61,0.27,0.37
disappointment,0.75,0.04,0.08


In [58]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.59,0.65
amusement,0.82,0.77,0.79
anger,0.62,0.29,0.39
annoyance,0.56,0.07,0.13
approval,0.63,0.16,0.26
caring,0.6,0.06,0.11
confusion,0.51,0.15,0.23
curiosity,0.57,0.08,0.14
desire,0.6,0.32,0.42
disappointment,0.29,0.01,0.02


#### Model 6.2

In [64]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=14)
rf_model1.fit(X_train_no_neu, y_train_no_neu)

In [65]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.5924085395756367
 Test set results
0.32190526040303585
 Val set results
0.3362023995826813


In [66]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.95,0.79,0.87
amusement,0.96,0.91,0.93
anger,0.95,0.6,0.74
annoyance,0.98,0.49,0.65
approval,0.98,0.55,0.71
caring,0.96,0.4,0.56
confusion,0.96,0.45,0.62
curiosity,0.98,0.42,0.58
desire,0.98,0.61,0.75
disappointment,0.98,0.36,0.52


In [67]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.7,0.51,0.59
amusement,0.84,0.83,0.84
anger,0.67,0.26,0.38
annoyance,0.76,0.1,0.18
approval,0.62,0.15,0.24
caring,0.68,0.13,0.21
confusion,0.47,0.16,0.24
curiosity,0.48,0.04,0.08
desire,0.63,0.27,0.37
disappointment,0.6,0.04,0.07


In [68]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.59,0.65
amusement,0.82,0.76,0.79
anger,0.61,0.28,0.38
annoyance,0.56,0.07,0.13
approval,0.62,0.16,0.26
caring,0.6,0.06,0.11
confusion,0.51,0.15,0.23
curiosity,0.57,0.08,0.14
desire,0.6,0.32,0.42
disappointment,0.29,0.01,0.02
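
Models 6, 6.1 and 6.2 differ only in `min_samples_split`, so the repeated fit/predict/score cells could be folded into one loop. A sketch under stated assumptions: the data is synthetic (from `make_multilabel_classification`) rather than the notebook's `X_train_no_neu`/`y_train_no_neu`, `evaluate_split_values` is a hypothetical helper, and `n_estimators` is reduced to keep the example quick.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the TF-IDF features and emotion labels (assumption).
X, y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=42)


def evaluate_split_values(values, n_estimators=50):
    """Fit one forest per min_samples_split value.

    Returns a dict mapping each value to (train_accuracy, val_accuracy).
    """
    results = {}
    for value in values:
        model = RandomForestClassifier(
            random_state=42,
            n_jobs=-1,
            n_estimators=n_estimators,
            min_samples_split=value,
        )
        model.fit(X_tr, y_tr)
        results[value] = (
            accuracy_score(y_tr, model.predict(X_tr)),
            accuracy_score(y_va, model.predict(X_va)),
        )
    return results


results = evaluate_split_values([14, 16, 17])

# Pick the setting by validation accuracy, keeping the test set untouched.
best_value = max(results, key=lambda v: results[v][1])
for value, (train_acc, val_acc) in results.items():
    print(f"min_samples_split={value}: train={train_acc:.3f} val={val_acc:.3f}")
```

Selecting on the validation split and scoring the test split only once at the end keeps the final test figure from being tuned against.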


#### Model 6.3 - Final

In [70]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=16, max_features=None)
rf_model1.fit(X_train_no_neu, y_train_no_neu)

In [71]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.5642593258573904
 Test set results
0.3352525516880398
 Val set results
0.35002608242044864
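
Collecting the accuracies printed so far for the 27-label models makes the comparison easier to read. The figures below are copied from the outputs above and rounded to four decimals; the `gap` column (train minus validation accuracy) is an added convenience for spotting overfitting, not something the notebook computes.

```python
import pandas as pd

# (model, train, test, val) accuracies copied from the printed outputs, rounded.
rows = [
    ("1",   0.2301, 0.2269, 0.2470),
    ("2",   0.1573, 0.1562, 0.1755),
    ("3",   0.2559, 0.2505, 0.2655),
    ("4",   0.5803, 0.3211, 0.3344),
    ("5",   0.2865, 0.2748, 0.2872),
    ("6",   0.5621, 0.3201, 0.3370),
    ("6.1", 0.5719, 0.3227, 0.3367),
    ("6.2", 0.5924, 0.3219, 0.3362),
    ("6.3", 0.5643, 0.3353, 0.3500),
]

summary = pd.DataFrame(rows, columns=["model", "train", "test", "val"]).set_index("model")

# Train/validation gap: larger values suggest more overfitting.
summary["gap"] = (summary["train"] - summary["val"]).round(4)

# Model with the highest validation accuracy.
best_by_val = summary["val"].idxmax()

print(summary.sort_values("val", ascending=False))
print("Best by validation accuracy:", best_by_val)
```

On these numbers, the `min_samples_split` models score highest on validation but also show the widest train/validation gaps.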


In [72]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.94,0.76,0.84
amusement,0.95,0.9,0.92
anger,0.94,0.56,0.7
annoyance,0.95,0.45,0.61
approval,0.97,0.51,0.67
caring,0.94,0.4,0.56
confusion,0.96,0.44,0.6
curiosity,0.97,0.39,0.55
desire,0.95,0.63,0.75
disappointment,0.94,0.33,0.49


In [73]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.68,0.53,0.6
amusement,0.82,0.84,0.83
anger,0.6,0.3,0.4
annoyance,0.54,0.13,0.21
approval,0.56,0.17,0.26
caring,0.57,0.16,0.24
confusion,0.44,0.22,0.29
curiosity,0.36,0.05,0.08
desire,0.56,0.34,0.42
disappointment,0.47,0.06,0.11


In [74]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.68,0.6,0.64
amusement,0.8,0.78,0.79
anger,0.59,0.32,0.41
annoyance,0.5,0.11,0.18
approval,0.56,0.17,0.26
caring,0.47,0.09,0.15
confusion,0.41,0.19,0.26
curiosity,0.54,0.09,0.15
desire,0.53,0.36,0.43
disappointment,0.27,0.02,0.04


#### Model 7

In [64]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=45, criterion='log_loss')
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [65]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.3600222316670481
 Test set results
0.28160167495420046
 Val set results
0.3030777256129369


In [66]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.93,0.59,0.72
amusement,0.92,0.84,0.88
anger,0.9,0.33,0.48
annoyance,0.88,0.1,0.18
approval,0.92,0.18,0.31
caring,0.85,0.09,0.16
confusion,0.9,0.15,0.26
curiosity,0.97,0.12,0.22
desire,0.92,0.38,0.54
disappointment,0.86,0.03,0.06


In [67]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.73,0.45,0.56
amusement,0.85,0.8,0.83
anger,0.75,0.21,0.33
annoyance,0.75,0.05,0.09
approval,0.72,0.11,0.19
caring,0.71,0.07,0.13
confusion,0.47,0.06,0.1
curiosity,0.41,0.02,0.05
desire,0.67,0.14,0.24
disappointment,0.75,0.02,0.04


In [68]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.74,0.52,0.61
amusement,0.83,0.73,0.78
anger,0.7,0.24,0.36
annoyance,0.68,0.04,0.08
approval,0.74,0.11,0.19
caring,0.78,0.05,0.09
confusion,0.5,0.08,0.14
curiosity,0.66,0.08,0.15
desire,0.71,0.29,0.41
disappointment,0.33,0.01,0.01


#### Model 8

In [69]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=45, max_features='log2')
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [70]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.39791414653284074
 Test set results
0.2622350170112536
 Val set results
0.27829942618675013
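
A note on reading these accuracy values: when the targets are multi-label indicator matrices (which the per-label tables suggest for the `_no_neu` data, though that is an assumption here), scikit-learn's `accuracy_score` returns subset accuracy, i.e. the share of samples whose whole label vector is predicted exactly. The toy arrays below are invented for illustration and are not GoEmotions data; they show how subset accuracy can sit well below label-level metrics such as Hamming loss or micro-averaged F1.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Toy multi-label targets: 4 samples, 3 labels (illustrative values only)
y_true = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0],
                   [0, 0, 1]])
# Predictions miss one label in the second and third samples
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Subset accuracy: only rows that match exactly count as correct
subset_acc = accuracy_score(y_true, y_pred)

# Hamming loss: fraction of individual label cells that are wrong
ham = hamming_loss(y_true, y_pred)

# Micro-averaged F1: pools true/false positives over all labels
micro_f1 = f1_score(y_true, y_pred, average="micro", zero_division=0)

print(f"subset accuracy: {subset_acc:.2f}")
print(f"hamming loss:    {ham:.2f}")
print(f"micro F1:        {micro_f1:.2f}")
```

Two of the four rows match exactly, so subset accuracy is 0.5 even though only 2 of the 12 label cells are wrong. This is one reason the accuracy figures for the 27-label setting look low next to the per-label precision values.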


In [71]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.95,0.66,0.78
amusement,0.96,0.86,0.91
anger,0.93,0.38,0.54
annoyance,0.95,0.18,0.31
approval,0.96,0.23,0.37
caring,0.91,0.11,0.19
confusion,0.97,0.18,0.3
curiosity,0.98,0.15,0.25
desire,0.96,0.39,0.55
disappointment,0.95,0.08,0.14


In [72]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.77,0.42,0.55
amusement,0.86,0.73,0.79
anger,0.8,0.19,0.3
annoyance,0.88,0.04,0.08
approval,0.84,0.07,0.14
caring,0.71,0.07,0.13
confusion,0.56,0.06,0.11
curiosity,0.53,0.03,0.06
desire,0.92,0.13,0.23
disappointment,0.75,0.02,0.04


In [73]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.78,0.5,0.61
amusement,0.86,0.67,0.75
anger,0.77,0.21,0.33
annoyance,0.68,0.04,0.08
approval,0.83,0.09,0.16
caring,0.78,0.05,0.09
confusion,0.74,0.09,0.16
curiosity,0.63,0.07,0.12
desire,0.82,0.23,0.36
disappointment,0.33,0.01,0.01


### Model 9

In [82]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=45, criterion='log_loss', max_features=None)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [83]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.3287998169156831
 Test set results
0.2912850039256739
 Val set results
0.31794470526864893


In [84]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.81,0.53,0.64
amusement,0.84,0.81,0.82
anger,0.73,0.3,0.42
annoyance,0.77,0.07,0.13
approval,0.76,0.15,0.25
caring,0.76,0.09,0.16
confusion,0.79,0.14,0.24
curiosity,0.95,0.12,0.21
desire,0.78,0.33,0.46
disappointment,0.81,0.03,0.05


In [85]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.69,0.46,0.55
amusement,0.84,0.8,0.82
anger,0.65,0.25,0.36
annoyance,0.71,0.05,0.09
approval,0.65,0.11,0.19
caring,0.72,0.1,0.17
confusion,0.42,0.09,0.15
curiosity,0.36,0.03,0.05
desire,0.65,0.16,0.25
disappointment,0.75,0.02,0.04


In [86]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.72,0.53,0.61
amusement,0.83,0.77,0.8
anger,0.66,0.26,0.38
annoyance,0.61,0.04,0.07
approval,0.74,0.11,0.19
caring,0.57,0.05,0.1
confusion,0.49,0.12,0.19
curiosity,0.64,0.09,0.16
desire,0.72,0.4,0.52
disappointment,0.0,0.0,0.0
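
The all-zero row for disappointment is what a label looks like when none of its predictions are correct, possibly because it was never predicted at all on this split. For a label with no predicted samples, precision is undefined, and `precision_recall_fscore_support` accepts a `zero_division` argument that fixes the reported value explicitly. A minimal sketch on made-up arrays follows; how `model_eval` computes its scores is not shown in this part of the notebook, so this is not a description of it.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy multi-label data: 3 samples, 2 labels (illustrative values only)
y_true = np.array([[1, 0],
                   [0, 1],
                   [1, 1]])
# The second label is never predicted, so its precision is 0/0
y_pred = np.array([[1, 0],
                   [0, 0],
                   [1, 0]])

# zero_division=0 reports 0.0 for undefined scores instead of warning
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)

for idx in range(y_true.shape[1]):
    print(f"label {idx}: precision={precision[idx]:.2f} "
          f"recall={recall[idx]:.2f} f1={f1[idx]:.2f} support={support[idx]}")
```

A score of 0.0 produced this way means "undefined, reported as zero", which is worth keeping in mind when comparing rare labels across models.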


### Model 10

In [87]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=45, max_features=None)
rf_model1.fit(X_train_no_neu, y_train_no_neu)


In [88]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_no_neu_pred = rf_model1.predict(X_train_no_neu)
y_test_no_neu_pred = rf_model1.predict(X_test_no_neu)
y_val_no_neu_pred = rf_model1.predict(X_val_no_neu)


print(" Train set results")
print(accuracy_score(y_train_no_neu, y_train_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_no_neu, y_test_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_no_neu, y_val_no_neu_pred))

 Train set results
0.4697093536469742
 Test set results
0.32269039518450665
 Val set results
0.3367240479916536


In [89]:
# Model evaluation
model_eval(y_train_no_neu, y_train_no_neu_pred, GE_taxonomy_no_neu)

Emotion,Precision,Recall,F1
admiration,0.91,0.69,0.78
amusement,0.91,0.87,0.89
anger,0.87,0.43,0.57
annoyance,0.87,0.28,0.43
approval,0.91,0.37,0.52
caring,0.88,0.24,0.38
confusion,0.91,0.31,0.46
curiosity,0.95,0.27,0.42
desire,0.93,0.54,0.68
disappointment,0.94,0.17,0.29


In [90]:
# Model evaluation
model_eval(y_test_no_neu, y_test_no_neu_pred, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.68,0.52,0.58
amusement,0.83,0.83,0.83
anger,0.63,0.29,0.4
annoyance,0.57,0.1,0.17
approval,0.6,0.17,0.26
caring,0.61,0.1,0.18
confusion,0.44,0.18,0.25
curiosity,0.42,0.05,0.08
desire,0.58,0.31,0.41
disappointment,0.6,0.04,0.07


In [91]:
# Model evaluation
model_eval(y_val_no_neu, y_val_no_neu_pred, GE_taxonomy_no_neu)



Emotion,Precision,Recall,F1
admiration,0.7,0.6,0.64
amusement,0.81,0.77,0.79
anger,0.64,0.3,0.41
annoyance,0.5,0.09,0.15
approval,0.58,0.17,0.26
caring,0.43,0.06,0.1
confusion,0.44,0.16,0.24
curiosity,0.5,0.08,0.13
desire,0.55,0.36,0.44
disappointment,0.29,0.01,0.02


### Random Forest model with Ekman taxonomy (7 labels, including the neutral emotion)


#### Model 1

In [92]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=15)
rf_model1.fit(X_train_ekman, y_train_ekman)


In [93]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6182446440912233
 Test set results
0.6029113690805233
 Val set results
0.6196092886103944


In [95]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.22      0.32      4517
     disgust       0.58      0.28      0.38       694
        fear       0.61      0.36      0.45       642
         joy       0.76      0.74      0.75     15693
     sadness       0.75      0.37      0.49      2938
    surprise       0.71      0.17      0.27      4707
     neutral       0.51      0.84      0.64     14219

    accuracy                           0.62     43410
   macro avg       0.64      0.43      0.47     43410
weighted avg       0.65      0.62      0.59     43410
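
The macro and weighted averages in this report differ because the classes are imbalanced: the macro average gives every class the same weight, while the weighted average scales each class by its support, so joy and neutral dominate it. The sketch below recomputes both F1 averages from the rounded per-class values printed above; because the inputs are already rounded, the results are only expected to agree with the report to two decimals.

```python
import numpy as np

# Per-class F1 and support copied from the train report above
# (order: anger, disgust, fear, joy, sadness, surprise, neutral)
f1_per_class = np.array([0.32, 0.38, 0.45, 0.75, 0.49, 0.27, 0.64])
support = np.array([4517, 694, 642, 15693, 2938, 4707, 14219])

# Macro average: plain mean, every class counts equally
macro_f1 = f1_per_class.mean()

# Weighted average: mean weighted by the number of true samples per class
weighted_f1 = np.average(f1_per_class, weights=support)

print(f"macro F1:    {macro_f1:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
```

When the minority emotions (anger, disgust, fear, surprise) matter, the macro average is the more informative of the two numbers to track.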



In [96]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.57      0.20      0.29       595
     disgust       0.64      0.33      0.44       112
        fear       0.63      0.41      0.50        87
         joy       0.74      0.74      0.74      1915
     sadness       0.72      0.33      0.45       341
    surprise       0.64      0.13      0.22       590
     neutral       0.51      0.83      0.63      1787

    accuracy                           0.60      5427
   macro avg       0.64      0.42      0.47      5427
weighted avg       0.63      0.60      0.57      5427



In [97]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.59      0.20      0.30       582
     disgust       0.55      0.37      0.44        81
        fear       0.71      0.39      0.51        89
         joy       0.78      0.75      0.76      1997
     sadness       0.74      0.33      0.46       352
    surprise       0.69      0.15      0.24       559
     neutral       0.51      0.84      0.64      1766

    accuracy                           0.62      5426
   macro avg       0.65      0.43      0.48      5426
weighted avg       0.65      0.62      0.59      5426



#### Model 2

In [98]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=20)
rf_model1.fit(X_train_ekman, y_train_ekman)


In [99]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6111495047224141
 Test set results
0.5962778699097107
 Val set results
0.6148175451529672


In [100]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.21      0.31      4517
     disgust       0.60      0.29      0.39       694
        fear       0.60      0.33      0.42       642
         joy       0.76      0.73      0.75     15693
     sadness       0.75      0.34      0.47      2938
    surprise       0.70      0.15      0.25      4707
     neutral       0.51      0.84      0.63     14219

    accuracy                           0.61     43410
   macro avg       0.64      0.41      0.46     43410
weighted avg       0.65      0.61      0.58     43410



In [101]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.56      0.19      0.28       595
     disgust       0.63      0.33      0.43       112
        fear       0.63      0.39      0.48        87
         joy       0.74      0.73      0.74      1915
     sadness       0.71      0.29      0.41       341
    surprise       0.65      0.11      0.19       590
     neutral       0.50      0.83      0.62      1787

    accuracy                           0.60      5427
   macro avg       0.63      0.41      0.45      5427
weighted avg       0.63      0.60      0.56      5427



In [102]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.57      0.19      0.28       582
     disgust       0.55      0.37      0.44        81
        fear       0.71      0.38      0.50        89
         joy       0.78      0.74      0.76      1997
     sadness       0.75      0.32      0.45       352
    surprise       0.69      0.14      0.23       559
     neutral       0.50      0.85      0.63      1766

    accuracy                           0.61      5426
   macro avg       0.65      0.43      0.47      5426
weighted avg       0.65      0.61      0.58      5426



#### Model 3

In [103]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=10)
rf_model1.fit(X_train_ekman, y_train_ekman)


In [104]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6247408431237043
 Test set results
0.60936060438548
 Val set results
0.622558053814965


In [105]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.24      0.34      4517
     disgust       0.58      0.29      0.39       694
        fear       0.60      0.41      0.49       642
         joy       0.76      0.75      0.76     15693
     sadness       0.74      0.39      0.51      2938
    surprise       0.70      0.17      0.28      4707
     neutral       0.52      0.83      0.64     14219

    accuracy                           0.62     43410
   macro avg       0.64      0.44      0.49     43410
weighted avg       0.65      0.62      0.60     43410



In [106]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.56      0.21      0.31       595
     disgust       0.64      0.34      0.44       112
        fear       0.64      0.47      0.54        87
         joy       0.74      0.75      0.74      1915
     sadness       0.73      0.35      0.48       341
    surprise       0.66      0.14      0.23       590
     neutral       0.51      0.82      0.63      1787

    accuracy                           0.61      5427
   macro avg       0.64      0.44      0.48      5427
weighted avg       0.63      0.61      0.58      5427



In [107]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.21      0.31       582
     disgust       0.54      0.38      0.45        81
        fear       0.64      0.40      0.50        89
         joy       0.77      0.75      0.76      1997
     sadness       0.72      0.34      0.46       352
    surprise       0.71      0.16      0.26       559
     neutral       0.52      0.84      0.64      1766

    accuracy                           0.62      5426
   macro avg       0.64      0.44      0.48      5426
weighted avg       0.65      0.62      0.59      5426



#### Model 4

In [108]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=25)
rf_model1.fit(X_train_ekman, y_train_ekman)


In [109]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.8592029486293481
 Test set results
0.5988575640316934
 Val set results
0.6111315886472539


In [110]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.89      0.75      0.81      4517
     disgust       0.92      0.71      0.80       694
        fear       0.84      0.75      0.79       642
         joy       0.92      0.91      0.91     15693
     sadness       0.91      0.78      0.84      2938
    surprise       0.93      0.68      0.79      4707
     neutral       0.78      0.93      0.85     14219

    accuracy                           0.86     43410
   macro avg       0.88      0.79      0.83     43410
weighted avg       0.87      0.86      0.86     43410



In [111]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.47      0.25      0.33       595
     disgust       0.55      0.33      0.41       112
        fear       0.65      0.48      0.55        87
         joy       0.72      0.75      0.73      1915
     sadness       0.66      0.42      0.51       341
    surprise       0.50      0.20      0.29       590
     neutral       0.52      0.75      0.61      1787

    accuracy                           0.60      5427
   macro avg       0.58      0.45      0.49      5427
weighted avg       0.60      0.60      0.58      5427



In [112]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.49      0.25      0.33       582
     disgust       0.52      0.40      0.45        81
        fear       0.61      0.42      0.49        89
         joy       0.75      0.75      0.75      1997
     sadness       0.66      0.40      0.50       352
    surprise       0.50      0.20      0.29       559
     neutral       0.52      0.76      0.62      1766

    accuracy                           0.61      5426
   macro avg       0.58      0.45      0.49      5426
weighted avg       0.61      0.61      0.59      5426



#### Model 5

In [113]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=50)
rf_model1.fit(X_train_ekman, y_train_ekman)


In [114]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.8164478230822391
 Test set results
0.599778883360973
 Val set results
0.6133431625506819


In [115]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.85      0.68      0.75      4517
     disgust       0.82      0.63      0.71       694
        fear       0.75      0.67      0.71       642
         joy       0.88      0.89      0.88     15693
     sadness       0.87      0.70      0.77      2938
    surprise       0.88      0.58      0.70      4707
     neutral       0.73      0.90      0.81     14219

    accuracy                           0.82     43410
   macro avg       0.83      0.72      0.76     43410
weighted avg       0.83      0.82      0.81     43410



In [116]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.47      0.24      0.32       595
     disgust       0.58      0.37      0.45       112
        fear       0.65      0.49      0.56        87
         joy       0.72      0.74      0.73      1915
     sadness       0.67      0.41      0.51       341
    surprise       0.51      0.20      0.28       590
     neutral       0.52      0.75      0.62      1787

    accuracy                           0.60      5427
   macro avg       0.59      0.46      0.50      5427
weighted avg       0.60      0.60      0.58      5427



In [117]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.48      0.25      0.32       582
     disgust       0.52      0.41      0.46        81
        fear       0.61      0.43      0.50        89
         joy       0.76      0.75      0.76      1997
     sadness       0.68      0.40      0.50       352
    surprise       0.53      0.20      0.29       559
     neutral       0.52      0.77      0.62      1766

    accuracy                           0.61      5426
   macro avg       0.59      0.46      0.49      5426
weighted avg       0.62      0.61      0.59      5426



#### Model 6 - Final

In [118]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=45, criterion='log_loss')
rf_model1.fit(X_train_ekman, y_train_ekman)


In [119]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.8000460723335637
 Test set results
0.603464160678091
 Val set results
0.6166605234058238


In [120]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.84      0.64      0.73      4517
     disgust       0.82      0.56      0.67       694
        fear       0.72      0.64      0.68       642
         joy       0.88      0.88      0.88     15693
     sadness       0.85      0.66      0.74      2938
    surprise       0.89      0.54      0.67      4707
     neutral       0.71      0.90      0.79     14219

    accuracy                           0.80     43410
   macro avg       0.81      0.69      0.74     43410
weighted avg       0.81      0.80      0.80     43410



In [121]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.50      0.25      0.33       595
     disgust       0.58      0.37      0.45       112
        fear       0.62      0.52      0.56        87
         joy       0.74      0.74      0.74      1915
     sadness       0.68      0.42      0.52       341
    surprise       0.56      0.19      0.28       590
     neutral       0.52      0.77      0.62      1787

    accuracy                           0.60      5427
   macro avg       0.60      0.46      0.50      5427
weighted avg       0.61      0.60      0.58      5427



In [122]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.52      0.25      0.34       582
     disgust       0.49      0.42      0.45        81
        fear       0.60      0.43      0.50        89
         joy       0.77      0.74      0.76      1997
     sadness       0.68      0.40      0.51       352
    surprise       0.54      0.19      0.29       559
     neutral       0.52      0.79      0.63      1766

    accuracy                           0.62      5426
   macro avg       0.59      0.46      0.50      5426
weighted avg       0.62      0.62      0.60      5426



#### Model 6.1

In [80]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_split=40, criterion='gini')
rf_model1.fit(X_train_ekman, y_train_ekman)


In [81]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.8304998848191661
 Test set results
0.6005159388243966
 Val set results
0.6137117582012532


In [82]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.86      0.70      0.78      4517
     disgust       0.86      0.66      0.75       694
        fear       0.78      0.70      0.74       642
         joy       0.89      0.89      0.89     15693
     sadness       0.88      0.72      0.79      2938
    surprise       0.90      0.62      0.73      4707
     neutral       0.75      0.91      0.82     14219

    accuracy                           0.83     43410
   macro avg       0.85      0.74      0.78     43410
weighted avg       0.84      0.83      0.83     43410



In [83]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.47      0.25      0.33       595
     disgust       0.57      0.35      0.43       112
        fear       0.66      0.48      0.56        87
         joy       0.73      0.75      0.74      1915
     sadness       0.66      0.41      0.51       341
    surprise       0.51      0.20      0.29       590
     neutral       0.52      0.75      0.62      1787

    accuracy                           0.60      5427
   macro avg       0.59      0.46      0.49      5427
weighted avg       0.60      0.60      0.58      5427



In [84]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.48      0.25      0.33       582
     disgust       0.52      0.40      0.45        81
        fear       0.62      0.43      0.51        89
         joy       0.76      0.75      0.76      1997
     sadness       0.67      0.40      0.50       352
    surprise       0.52      0.20      0.29       559
     neutral       0.52      0.77      0.62      1766

    accuracy                           0.61      5426
   macro avg       0.59      0.46      0.49      5426
weighted avg       0.62      0.61      0.59      5426



#### Model 7

In [128]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, max_features='log2', criterion='gini')
rf_model1.fit(X_train_ekman, y_train_ekman)


In [129]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.9264224832987791
 Test set results
0.5918555371291689
 Val set results
0.61002580169554
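
The train accuracy here is far higher than the test and validation accuracy, which points to overfitting for this configuration. One way to compare such settings side by side, rather than rerunning one cell per model, is to loop over parameter dictionaries and collect the scores in a table. The sketch below uses synthetic data from `make_classification` and a smaller forest so that it runs on its own; the data, the `configs` list and the `gap` column are illustrative assumptions, not the notebook's setup.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the TF-IDF features and Ekman labels
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Hypothetical settings to compare (unconstrained vs. regularised trees)
configs = [
    {"min_samples_leaf": 1},
    {"min_samples_leaf": 10},
    {"min_samples_split": 45},
]

rows = []
for params in configs:
    model = RandomForestClassifier(random_state=42, n_estimators=50, **params)
    model.fit(X_tr, y_tr)
    train_acc = accuracy_score(y_tr, model.predict(X_tr))
    val_acc = accuracy_score(y_va, model.predict(X_va))
    # A large train/validation gap is a sign of overfitting
    rows.append({**params, "train_acc": train_acc, "val_acc": val_acc,
                 "gap": train_acc - val_acc})

results = pd.DataFrame(rows)
print(results)
```

With the real data, the same loop could take `X_train_ekman`, `y_train_ekman` and the validation split in place of the synthetic arrays, keeping the test set aside until a configuration has been chosen.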


In [130]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.95      0.88      0.91      4517
     disgust       0.96      0.86      0.91       694
        fear       0.99      0.88      0.93       642
         joy       0.96      0.95      0.95     15693
     sadness       0.97      0.91      0.94      2938
    surprise       0.96      0.83      0.89      4707
     neutral       0.86      0.96      0.91     14219

    accuracy                           0.93     43410
   macro avg       0.95      0.90      0.92     43410
weighted avg       0.93      0.93      0.93     43410



In [131]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.46      0.22      0.30       595
     disgust       0.54      0.29      0.38       112
        fear       0.65      0.45      0.53        87
         joy       0.71      0.74      0.73      1915
     sadness       0.65      0.42      0.51       341
    surprise       0.50      0.19      0.28       590
     neutral       0.52      0.74      0.61      1787

    accuracy                           0.59      5427
   macro avg       0.58      0.44      0.48      5427
weighted avg       0.59      0.59      0.57      5427



In [132]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.48      0.23      0.31       582
     disgust       0.43      0.35      0.38        81
        fear       0.67      0.35      0.46        89
         joy       0.75      0.76      0.75      1997
     sadness       0.68      0.37      0.48       352
    surprise       0.50      0.20      0.28       559
     neutral       0.52      0.77      0.62      1766

    accuracy                           0.61      5426
   macro avg       0.58      0.43      0.47      5426
weighted avg       0.61      0.61      0.59      5426



#### Model 8

In [138]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=11)
rf_model1.fit(X_train_ekman, y_train_ekman)


In [139]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6236351071181755
 Test set results
0.6084392850562005
 Val set results
0.6229266494655363


In [140]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.59      0.24      0.34      4517
     disgust       0.58      0.30      0.39       694
        fear       0.61      0.41      0.49       642
         joy       0.76      0.75      0.76     15693
     sadness       0.72      0.39      0.50      2938
    surprise       0.70      0.17      0.28      4707
     neutral       0.52      0.83      0.64     14219

    accuracy                           0.62     43410
   macro avg       0.64      0.44      0.49     43410
weighted avg       0.65      0.62      0.60     43410



In [141]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.56      0.21      0.31       595
     disgust       0.64      0.34      0.44       112
        fear       0.63      0.46      0.53        87
         joy       0.74      0.75      0.74      1915
     sadness       0.72      0.35      0.47       341
    surprise       0.65      0.14      0.23       590
     neutral       0.51      0.82      0.63      1787

    accuracy                           0.61      5427
   macro avg       0.64      0.44      0.48      5427
weighted avg       0.63      0.61      0.58      5427



In [142]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.59      0.21      0.32       582
     disgust       0.54      0.38      0.45        81
        fear       0.68      0.40      0.51        89
         joy       0.77      0.75      0.76      1997
     sadness       0.71      0.34      0.46       352
    surprise       0.71      0.16      0.26       559
     neutral       0.51      0.84      0.64      1766

    accuracy                           0.62      5426
   macro avg       0.65      0.44      0.48      5426
weighted avg       0.65      0.62      0.59      5426



### Model 9

In [143]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=9)
rf_model1.fit(X_train_ekman, y_train_ekman)


In [144]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_pred = rf_model1.predict(X_train_ekman)
y_test_ekman_pred = rf_model1.predict(X_test_ekman)
y_val_ekman_pred = rf_model1.predict(X_val_ekman)


print(" Train set results")
print(accuracy_score(y_train_ekman, y_train_ekman_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman, y_test_ekman_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman, y_val_ekman_pred))

 Train set results
0.6266989173001613
 Test set results
0.6089920766537682
 Val set results
0.6227423516402506


In [145]:
# Model evaluation
cr = classification_report(y_train_ekman, y_train_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.24      0.34      4517
     disgust       0.59      0.30      0.39       694
        fear       0.60      0.42      0.50       642
         joy       0.76      0.76      0.76     15693
     sadness       0.74      0.39      0.51      2938
    surprise       0.70      0.18      0.29      4707
     neutral       0.53      0.83      0.64     14219

    accuracy                           0.63     43410
   macro avg       0.64      0.44      0.49     43410
weighted avg       0.65      0.63      0.60     43410



In [146]:
# Model evaluation
cr = classification_report(y_test_ekman, y_test_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.57      0.22      0.31       595
     disgust       0.63      0.34      0.44       112
        fear       0.65      0.49      0.56        87
         joy       0.74      0.75      0.74      1915
     sadness       0.72      0.35      0.47       341
    surprise       0.64      0.14      0.23       590
     neutral       0.51      0.82      0.63      1787

    accuracy                           0.61      5427
   macro avg       0.64      0.44      0.48      5427
weighted avg       0.63      0.61      0.58      5427



In [147]:
# Model evaluation
cr = classification_report(y_val_ekman, y_val_ekman_pred, target_names=class_label_names_ekman)
print(cr)

              precision    recall  f1-score   support

       anger       0.59      0.22      0.32       582
     disgust       0.54      0.38      0.45        81
        fear       0.63      0.40      0.49        89
         joy       0.77      0.75      0.76      1997
     sadness       0.70      0.34      0.46       352
    surprise       0.69      0.16      0.26       559
     neutral       0.52      0.83      0.64      1766

    accuracy                           0.62      5426
   macro avg       0.64      0.44      0.48      5426
weighted avg       0.65      0.62      0.59      5426



### Random Forest model with the Ekman taxonomy (6 labels, excluding the neutral emotion)


#### Model 1

In [149]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=15)
rf_model1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [150]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = rf_model1.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = rf_model1.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = rf_model1.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.6838066527354322
 Test set results
0.6543956043956044
 Val set results
0.669672131147541


In [151]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.57      0.47      0.52      4517
     disgust       0.56      0.34      0.42       694
        fear       0.69      0.36      0.48       642
         joy       0.70      0.94      0.80     15693
     sadness       0.75      0.44      0.56      2938
    surprise       0.70      0.29      0.41      4707

    accuracy                           0.68     29191
   macro avg       0.66      0.47      0.53     29191
weighted avg       0.68      0.68      0.65     29191



In [152]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.43      0.48       595
     disgust       0.69      0.38      0.49       112
        fear       0.70      0.40      0.51        87
         joy       0.68      0.93      0.78      1915
     sadness       0.68      0.39      0.50       341
    surprise       0.59      0.23      0.33       590

    accuracy                           0.65      3640
   macro avg       0.65      0.46      0.51      3640
weighted avg       0.64      0.65      0.62      3640



In [153]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.56      0.42      0.48       582
     disgust       0.47      0.33      0.39        81
        fear       0.76      0.38      0.51        89
         joy       0.69      0.93      0.79      1997
     sadness       0.72      0.41      0.53       352
    surprise       0.65      0.25      0.36       559

    accuracy                           0.67      3660
   macro avg       0.64      0.46      0.51      3660
weighted avg       0.66      0.67      0.63      3660



#### Model 2

In [154]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=10)
rf_model1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [155]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = rf_model1.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = rf_model1.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = rf_model1.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.7002843342125997
 Test set results
0.6664835164835164
 Val set results
0.6816939890710383


In [156]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.51      0.55      4517
     disgust       0.57      0.38      0.46       694
        fear       0.67      0.42      0.52       642
         joy       0.72      0.93      0.81     15693
     sadness       0.75      0.48      0.59      2938
    surprise       0.68      0.33      0.45      4707

    accuracy                           0.70     29191
   macro avg       0.66      0.51      0.56     29191
weighted avg       0.69      0.70      0.68     29191



In [157]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.47      0.50       595
     disgust       0.69      0.40      0.51       112
        fear       0.71      0.48      0.58        87
         joy       0.70      0.92      0.80      1915
     sadness       0.69      0.43      0.53       341
    surprise       0.57      0.26      0.35       590

    accuracy                           0.67      3640
   macro avg       0.65      0.49      0.54      3640
weighted avg       0.65      0.67      0.64      3640



In [158]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.57      0.47      0.52       582
     disgust       0.46      0.36      0.40        81
        fear       0.73      0.43      0.54        89
         joy       0.71      0.92      0.80      1997
     sadness       0.70      0.44      0.54       352
    surprise       0.62      0.29      0.39       559

    accuracy                           0.68      3660
   macro avg       0.63      0.48      0.53      3660
weighted avg       0.67      0.68      0.65      3660



#### Model 3

In [159]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=9)
rf_model1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [160]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = rf_model1.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = rf_model1.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = rf_model1.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.7045322188345723
 Test set results
0.6706043956043956
 Val set results
0.6841530054644809


In [161]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.52      0.55      4517
     disgust       0.57      0.38      0.46       694
        fear       0.67      0.43      0.52       642
         joy       0.73      0.93      0.82     15693
     sadness       0.75      0.48      0.59      2938
    surprise       0.68      0.35      0.46      4707

    accuracy                           0.70     29191
   macro avg       0.67      0.52      0.57     29191
weighted avg       0.70      0.70      0.68     29191



In [162]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.54      0.47      0.50       595
     disgust       0.70      0.42      0.53       112
        fear       0.70      0.48      0.57        87
         joy       0.71      0.92      0.80      1915
     sadness       0.69      0.43      0.53       341
    surprise       0.57      0.27      0.37       590

    accuracy                           0.67      3640
   macro avg       0.65      0.50      0.55      3640
weighted avg       0.66      0.67      0.64      3640



In [163]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.57      0.48      0.52       582
     disgust       0.47      0.37      0.41        81
        fear       0.75      0.43      0.54        89
         joy       0.71      0.92      0.80      1997
     sadness       0.70      0.44      0.54       352
    surprise       0.62      0.30      0.40       559

    accuracy                           0.68      3660
   macro avg       0.64      0.49      0.54      3660
weighted avg       0.67      0.68      0.66      3660



#### Model 4

In [164]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=8)
rf_model1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [165]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = rf_model1.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = rf_model1.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = rf_model1.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.7103901887568086
 Test set results
0.6711538461538461
 Val set results
0.6860655737704918


In [166]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.59      0.54      0.56      4517
     disgust       0.57      0.39      0.46       694
        fear       0.67      0.43      0.52       642
         joy       0.74      0.93      0.82     15693
     sadness       0.75      0.49      0.59      2938
    surprise       0.68      0.37      0.48      4707

    accuracy                           0.71     29191
   macro avg       0.67      0.52      0.57     29191
weighted avg       0.70      0.71      0.69     29191



In [167]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.48      0.51       595
     disgust       0.69      0.41      0.51       112
        fear       0.70      0.49      0.58        87
         joy       0.71      0.92      0.80      1915
     sadness       0.69      0.43      0.53       341
    surprise       0.56      0.28      0.37       590

    accuracy                           0.67      3640
   macro avg       0.65      0.50      0.55      3640
weighted avg       0.65      0.67      0.64      3640



In [168]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.49      0.53       582
     disgust       0.46      0.37      0.41        81
        fear       0.74      0.42      0.53        89
         joy       0.72      0.92      0.81      1997
     sadness       0.70      0.44      0.54       352
    surprise       0.60      0.31      0.41       559

    accuracy                           0.69      3660
   macro avg       0.63      0.49      0.54      3660
weighted avg       0.67      0.69      0.66      3660



#### Model 5 - Final

In [169]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=7)
rf_model1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [170]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = rf_model1.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = rf_model1.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = rf_model1.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.7161111301428522
 Test set results
0.673901098901099
 Val set results
0.6879781420765028


In [171]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.59      0.55      0.57      4517
     disgust       0.57      0.39      0.47       694
        fear       0.67      0.43      0.53       642
         joy       0.75      0.93      0.83     15693
     sadness       0.74      0.49      0.59      2938
    surprise       0.68      0.39      0.50      4707

    accuracy                           0.72     29191
   macro avg       0.67      0.53      0.58     29191
weighted avg       0.71      0.72      0.70     29191



In [172]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.54      0.49      0.51       595
     disgust       0.68      0.40      0.51       112
        fear       0.72      0.49      0.59        87
         joy       0.72      0.91      0.80      1915
     sadness       0.68      0.44      0.54       341
    surprise       0.55      0.29      0.38       590

    accuracy                           0.67      3640
   macro avg       0.65      0.51      0.55      3640
weighted avg       0.66      0.67      0.65      3640



In [173]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.58      0.50      0.54       582
     disgust       0.48      0.40      0.43        81
        fear       0.73      0.42      0.53        89
         joy       0.73      0.91      0.81      1997
     sadness       0.70      0.44      0.54       352
    surprise       0.59      0.33      0.43       559

    accuracy                           0.69      3660
   macro avg       0.63      0.50      0.54      3660
weighted avg       0.67      0.69      0.67      3660
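
Models 1 to 5 above differ only in `min_samples_leaf`. A minimal sketch of how the same sweep could be written as a loop, assuming any estimator with scikit-learn-style `fit`/`predict` methods. The helper names `accuracy` and `sweep_min_samples_leaf` and the `MajorityClassifier` stand-in are hypothetical and not part of the notebook:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sweep_min_samples_leaf(make_model, leaf_values, train, val):
    """Fit one model per leaf value and collect train/validation accuracy."""
    X_train, y_train = train
    X_val, y_val = val
    rows = []
    for leaf in leaf_values:
        model = make_model(leaf)
        model.fit(X_train, y_train)
        train_acc = accuracy(y_train, model.predict(X_train))
        val_acc = accuracy(y_val, model.predict(X_val))
        rows.append({
            "min_samples_leaf": leaf,
            "train_acc": train_acc,
            "val_acc": val_acc,
            "gap": train_acc - val_acc,  # rough overfitting signal
        })
    return rows


class MajorityClassifier:
    """Hypothetical stand-in: always predicts the most frequent training label."""

    def __init__(self, min_samples_leaf):
        self.min_samples_leaf = min_samples_leaf  # stored only, unused by this toy model

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)


# Toy data, only to show the call pattern
train = ([[0], [1], [2], [3]], ["joy", "joy", "joy", "anger"])
val = ([[4], [5]], ["joy", "sadness"])
rows = sweep_min_samples_leaf(MajorityClassifier, [15, 10, 9, 8, 7], train, val)
for row in rows:
    print(row)
```

In the notebook, `make_model` could be `lambda leaf: RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=leaf)`, with `train` and `val` set to the `(X_train_ekman_no_neu, y_train_ekman_no_neu)` and `(X_val_ekman_no_neu, y_val_ekman_no_neu)` pairs.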



### Model 6

In [85]:
rf_model1 = RandomForestClassifier(random_state=42, n_jobs=-1, n_estimators=400, min_samples_leaf=7, max_features=None)
rf_model1.fit(X_train_ekman_no_neu, y_train_ekman_no_neu)


In [86]:
# Predicting on train, test and validation datasets
from sklearn.metrics import confusion_matrix, accuracy_score

y_train_ekman_no_neu_pred = rf_model1.predict(X_train_ekman_no_neu)
y_test_ekman_no_neu_pred = rf_model1.predict(X_test_ekman_no_neu)
y_val_ekman_no_neu_pred = rf_model1.predict(X_val_ekman_no_neu)


print(" Train set results")
print(accuracy_score(y_train_ekman_no_neu, y_train_ekman_no_neu_pred))
print(" Test set results")
print(accuracy_score(y_test_ekman_no_neu, y_test_ekman_no_neu_pred))
print(" Val set results")
print(accuracy_score(y_val_ekman_no_neu, y_val_ekman_no_neu_pred))

 Train set results
0.7424206090918434
 Test set results
0.6574175824175824
 Val set results
0.6751366120218579


In [87]:
# Model evaluation
cr = classification_report(y_train_ekman_no_neu, y_train_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.64      0.62      0.63      4517
     disgust       0.64      0.38      0.47       694
        fear       0.72      0.40      0.52       642
         joy       0.78      0.93      0.85     15693
     sadness       0.76      0.52      0.62      2938
    surprise       0.68      0.48      0.57      4707

    accuracy                           0.74     29191
   macro avg       0.70      0.56      0.61     29191
weighted avg       0.73      0.74      0.73     29191



In [88]:
# Model evaluation
cr = classification_report(y_test_ekman_no_neu, y_test_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.50      0.47      0.48       595
     disgust       0.60      0.31      0.41       112
        fear       0.73      0.44      0.55        87
         joy       0.72      0.88      0.79      1915
     sadness       0.66      0.46      0.54       341
    surprise       0.49      0.33      0.40       590

    accuracy                           0.66      3640
   macro avg       0.62      0.48      0.53      3640
weighted avg       0.64      0.66      0.64      3640



In [89]:
# Model evaluation
cr = classification_report(y_val_ekman_no_neu, y_val_ekman_no_neu_pred, target_names=class_label_names_ekman_no_neu)
print(cr)

              precision    recall  f1-score   support

       anger       0.53      0.48      0.50       582
     disgust       0.45      0.36      0.40        81
        fear       0.71      0.33      0.45        89
         joy       0.74      0.88      0.80      1997
     sadness       0.67      0.46      0.54       352
    surprise       0.52      0.38      0.44       559

    accuracy                           0.68      3660
   macro avg       0.60      0.48      0.52      3660
weighted avg       0.66      0.68      0.66      3660
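
To compare the six no-neutral models at a glance, the accuracies printed above can be collected and ranked. This is a small sketch that uses only the values reported in this section, rounded to four decimals. The `results` list and the `gap` column are illustrative names, not part of the notebook:

```python
# (name, min_samples_leaf, max_features, train_acc, test_acc, val_acc),
# copied from the outputs above and rounded to four decimals.
# "default" means the max_features argument was not passed.
results = [
    ("Model 1", 15, "default", 0.6838, 0.6544, 0.6697),
    ("Model 2", 10, "default", 0.7003, 0.6665, 0.6817),
    ("Model 3", 9, "default", 0.7045, 0.6706, 0.6842),
    ("Model 4", 8, "default", 0.7104, 0.6712, 0.6861),
    ("Model 5", 7, "default", 0.7161, 0.6739, 0.6880),
    ("Model 6", 7, "None", 0.7424, 0.6574, 0.6751),
]

print(f"{'model':<8} {'leaf':>4} {'max_features':>12} {'train':>7} {'test':>7} {'val':>7} {'gap':>7}")
for name, leaf, max_features, train_acc, test_acc, val_acc in results:
    gap = train_acc - val_acc  # train/validation gap as a rough overfitting signal
    print(f"{name:<8} {leaf:>4} {max_features:>12} {train_acc:>7.4f} {test_acc:>7.4f} {val_acc:>7.4f} {gap:>7.4f}")

# Rank by validation accuracy and by train/validation gap
best = max(results, key=lambda r: r[5])
widest_gap = max(results, key=lambda r: r[3] - r[5])
print("Highest validation accuracy:", best[0])
print("Largest train/validation gap:", widest_gap[0])
```

On these numbers, Model 5 has the highest validation accuracy, while Model 6 (`max_features=None`) fits the training set better but scores lower on the test and validation sets, which is consistent with keeping Model 5 as the final model.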



*   In this notebook, we built a dummy model that always predicts the 'Neutral' emotion. Because this emotion is the most frequent, the "model" performs reasonably well at detecting 'Neutral' but poorly overall.
*   The baseline model we trained improved the score, but it remains very low. A likely reason is that the model weighs each word in a text sample only by its importance and ignores its context (a sample is treated as a combination of independent words).
*   In the next step, we will use an algorithm that addresses this issue through the 'attention' mechanism: the BERT model.
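
The context limitation described in the bullets above can be illustrated with a toy bag-of-words representation. This is a simplified sketch that uses plain word counts rather than TF-IDF weights; the `bag_of_words` helper and the example sentences are made up for illustration:

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count each word (word order is discarded)."""
    return Counter(text.lower().split())


# Two sentences with the same words but opposite emotional content
a = bag_of_words("I am not sad I am happy")
b = bag_of_words("I am not happy I am sad")

print(a)
print(a == b)  # prints True: both sentences get the same representation
```

Both sentences map to the same counts although they express opposite emotions, so a model that only sees these counts cannot separate them. A model that reads the tokens in order, such as BERT, can in principle tell them apart.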

In [1]:
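import torch
from GPUtil import showUtilization as gpu_usage
from numba import cuda

def free_gpu_cache():
    """Print GPU usage, release cached GPU memory, then print usage again."""
    print("Initial GPU Usage")
    gpu_usage()

    # Release memory held by PyTorch's caching allocator
    torch.cuda.empty_cache()

    # Reset the CUDA context on device 0 through Numba.
    # Note: this tears down the context, so GPU tensors created earlier
    # in the session may no longer be usable afterwards.
    cuda.select_device(0)
    cuda.close()
    cuda.select_device(0)

    print("GPU Usage after emptying the cache")
    gpu_usage()

free_gpu_cache()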

Initial GPU Usage
| ID | GPU | MEM |
------------------
|  0 |  0% | 18% |
GPU Usage after emptying the cache
| ID | GPU | MEM |
------------------
|  0 |  0% | 19% |
