<a href="https://colab.research.google.com/github/TalhaBinZahid/Text-Base-Emotion-Detections/blob/main/Copy_of_Emotions_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PROJECT**

# **Text Emotions Detection with Machine Learning**


---


# **Introduction**
In machine learning, detecting emotions in text is a content-based classification problem and a natural language processing task. Detecting a person’s emotions is hard in general, and detecting them from written text alone is harder still, because people can express the same emotion in many different ways.\
Emotions are usually categorized as joy, sadness, anger, surprise, hate, fear, and so on. Recognizing these emotions in a person’s writing plays an important role in applications such as chatbots, customer support forums, and customer reviews. In the sections below, I will walk through a machine learning project on text emotion detection in Python, building a model that classifies the emotion of a text.


In [None]:
import re
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

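def read_data(file):
    """Read (label, text) pairs from a file of '[label vector] text' lines."""
    data = []
    with open(file, 'r') as f:
        for line in f:
            line = line.strip()
            # The label is everything between the leading '[' and the first ']',
            # with runs of whitespace collapsed to single spaces
            label = ' '.join(line[1:line.find("]")].strip().split())
            # The text is everything after the first ']'
            text = line[line.find("]")+1:].strip()
            data.append([label, text])
    return data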

file = '/content/drive/MyDrive/Machine Learning/emojis.txt'
data = read_data(file)
print("Number of instances: {}".format(len(data)))

Number of instances: 7480
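
The parser assumes that every line starts with a bracketed label vector followed by the text. The file itself is not shown in this notebook, so the sample line below is a made-up assumption, inferred from the label strings printed later (for example `1. 0. 0. 0. 0. 0. 0.`). `parse_line` is a hypothetical helper that mirrors the body of `read_data` for a single line:

```python
# Hypothetical sample line; the real emojis.txt is not shown in this notebook.
sample = "[ 1.  0.  0.  0.  0.  0.  0.] On days when I feel close to my partner."

def parse_line(line):
    """Split one line into (label, text), mirroring read_data."""
    line = line.strip()
    end = line.find("]")
    # Collapse runs of whitespace inside the brackets to single spaces.
    label = " ".join(line[1:end].split())
    # Everything after the closing bracket is the text.
    text = line[end + 1:].strip()
    return label, text

label, text = parse_line(sample)
print(repr(label))
print(repr(text))
```

If a line had no `]`, `find` would return -1 and the split would silently produce a wrong label, so it may be worth validating the file before training.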


Check the data for empty or duplicate entries.

In [None]:
# Check for empty data
if len(data) == 0:
    print("Data is empty")
else:
    print("Data is not empty")

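# Check for duplicate data
# The key that is tested and the key that is stored must be the same
# (label, text) pair, otherwise the membership test can never succeed
seen = set()
duplicates = []
for label, text in data:
    if (label, text) in seen:
        duplicates.append(text)
    else:
        seen.add((label, text))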

if duplicates:
    print("Duplicate entries found:")
    for duplicate in duplicates:
        print(duplicate)
else:
    print("No duplicates found")


Data is not empty
No duplicates found


Now I will create two Python functions: one that builds n-grams from a list of tokens, and one that tokenizes an input sentence and generates its features:

In [None]:
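def ngram(token, n):
    # Slide a window of n tokens over the list and join each window with spaces
    output = []
    for i in range(n-1, len(token)):
        gram = ' '.join(token[i-n+1:i+1])
        output.append(gram)
    return output

def create_feature(text, nrange=(1, 1)):
    text_features = []
    text = text.lower()
    # Replace every character except a-z, 0-9 and '#' with a space
    text_alphanum = re.sub('[^a-z0-9#]', ' ', text)
    # Collect all n-grams for n in nrange[0]..nrange[1] (inclusive)
    for n in range(nrange[0], nrange[1]+1):
        text_features += ngram(text_alphanum.split(), n)
    # text_punc = re.sub('[a-z0-9]', ' ', text)
    # text_features += ngram(text_punc.split(), 1)
    # Map each n-gram to the number of times it occurs in the text
    return Counter(text_features)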
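
To make the feature format concrete, here is a small sketch that re-declares the two functions in compact form so it runs on its own, then prints the unigram and bigram counts for a made-up sentence. The sentence and the `nrange=(1, 2)` setting are illustrative choices only; the notebook itself uses `nrange=(1, 4)`.

```python
import re
from collections import Counter

def ngram(tokens, n):
    # Slide a window of n tokens across the list and join each window.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def create_feature(text, nrange=(1, 1)):
    # Lowercase, then replace everything except a-z, 0-9 and '#' with spaces.
    tokens = re.sub("[^a-z0-9#]", " ", text.lower()).split()
    features = []
    for n in range(nrange[0], nrange[1] + 1):
        features += ngram(tokens, n)
    return Counter(features)

features = create_feature("I feel happy, so happy!", nrange=(1, 2))
for gram, count in sorted(features.items()):
    print(f"{gram!r}: {count}")
```

Punctuation disappears, the word `happy` is counted twice, and each bigram such as `so happy` becomes its own feature.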

Now I will create a Python function that converts each label vector into an emotion name. The labels are emotions such as joy, fear, anger, and so on:

In [None]:
def convert_label(item, name):
    items = list(map(float, item.split()))
    label = ""
    for idx in range(len(items)):
        if items[idx] == 1:
            label += name[idx] + " "

    return label.strip()

emotions = ["joy", 'fear', "anger", "sadness", "disgust", "shame", "guilt"]

X_all = []
y_all = []
for label, text in data:
    y_all.append(convert_label(label, emotions))
    X_all.append(create_feature(text, nrange=(1, 4)))
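
As a quick illustration of the label conversion, the sketch below applies an equivalent, compact version of `convert_label` to two of the label strings that appear in the frequency table later in this notebook. The rewritten function body is my own restatement, not the original code.

```python
emotions = ["joy", "fear", "anger", "sadness", "disgust", "shame", "guilt"]

def convert_label(item, names):
    # Parse the space-separated flags and keep the names whose flag equals 1.
    flags = [float(x) for x in item.split()]
    return " ".join(name for flag, name in zip(flags, names) if flag == 1)

print(convert_label("1. 0. 0. 0. 0. 0. 0.", emotions))
print(convert_label("0. 0. 0. 0. 0. 0. 1.", emotions))
```

A vector with more than one flag set would yield a space-joined name such as `joy fear`, which the classifiers would treat as a separate class.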

Now I will split the data into training and test sets:

In [None]:

X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size = 0.2, random_state = 123)

def train_test(clf, X_train, X_test, y_train, y_test):
    clf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    return train_acc, test_acc

from sklearn.feature_extraction import DictVectorizer
vectorizer = DictVectorizer(sparse = True)
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)
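
The vectorizer step turns each `Counter` into one row of a sparse matrix, with one column per n-gram seen during fitting. A minimal sketch on made-up feature dictionaries (the names `train_docs`, `test_docs`, and `vec` are illustrative, not from the notebook):

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer

# Toy feature dictionaries standing in for the output of create_feature.
train_docs = [Counter({"happy": 2, "so happy": 1}), Counter({"afraid": 1})]
test_docs = [Counter({"happy": 1, "unseen": 1})]

vec = DictVectorizer(sparse=True)
X_tr = vec.fit_transform(train_docs)   # learns the column vocabulary
X_te = vec.transform(test_docs)        # reuses it; unknown keys are ignored

print(vec.feature_names_)
print(X_tr.toarray())
print(X_te.toarray())
```

Fitting only on the training split, as the notebook does, means n-grams that appear only in test sentences contribute nothing to the prediction.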

Now I’m going to train four machine learning models and compare their accuracy on the training and test sets to choose the one that works best:

In [None]:
svc = SVC()
lsvc = LinearSVC(max_iter=10000, random_state=123)
rforest = RandomForestClassifier(random_state=123)
dtree = DecisionTreeClassifier()

clfs = [svc, lsvc, rforest, dtree]

# train and test them
print("| {:25} | {} | {} |".format("Classifier", "Training Accuracy", "Test Accuracy"))
print("| {} | {} | {} |".format("-"*25, "-"*17, "-"*13))
for clf in clfs:
    clf_name = clf.__class__.__name__
    train_acc, test_acc = train_test(clf=clf, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)
    print("| {:25} | {:17.7f} | {:13.7f} |".format(clf_name, train_acc, test_acc))

| Classifier                | Training Accuracy | Test Accuracy |
| ------------------------- | ----------------- | ------------- |
| SVC                       |         0.9119318 |     0.4525401 |
| LinearSVC                 |         0.9988302 |     0.5822193 |
| RandomForestClassifier    |         0.9988302 |     0.5407754 |
| DecisionTreeClassifier    |         0.9988302 |     0.4632353 |
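
One way to read this table is to look at the gap between training and test accuracy and to pick the model with the highest test accuracy. The sketch below hard-codes the numbers printed above; the `results` dictionary is only a convenience for this illustration.

```python
# (training accuracy, test accuracy) copied from the table above.
results = {
    "SVC": (0.9119318, 0.4525401),
    "LinearSVC": (0.9988302, 0.5822193),
    "RandomForestClassifier": (0.9988302, 0.5407754),
    "DecisionTreeClassifier": (0.9988302, 0.4632353),
}

for name, (train_acc, test_acc) in results.items():
    # A large gap suggests the model memorizes the training set.
    print(f"{name:25} gap = {train_acc - test_acc:.4f}")

best = max(results, key=lambda name: results[name][1])
print("Best on the test set:", best)
```

All four models fit the training data almost perfectly but score much lower on held-out text, so test accuracy is the number to compare.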


# **Detecting Emotion**
Now I’m going to map each emotion label to an emoji, write four sample sentences plus one user-supplied sentence, and use a trained model to predict the emotion of each. First, a quick look at how many examples each label has:

In [None]:
# Count how many examples carry each raw label string
label_freq = {}
for label, _ in data:
    label_freq[label] = label_freq.get(label, 0) + 1

# print the labels and their counts, most frequent first
for raw_label in sorted(label_freq, key=label_freq.get, reverse=True):
    print("{:10}({})  {}".format(convert_label(raw_label, emotions), raw_label, label_freq[raw_label]))

joy       (1. 0. 0. 0. 0. 0. 0.)  1084
anger     (0. 0. 1. 0. 0. 0. 0.)  1080
sadness   (0. 0. 0. 1. 0. 0. 0.)  1079
fear      (0. 1. 0. 0. 0. 0. 0.)  1078
disgust   (0. 0. 0. 0. 1. 0. 0.)  1057
guilt     (0. 0. 0. 0. 0. 0. 1.)  1057
shame     (0. 0. 0. 0. 0. 1. 0.)  1045
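
Because the classes are nearly balanced, a useful sanity check is the accuracy of always predicting the most frequent label. The sketch below computes that baseline from the counts printed above. It is taken over the whole dataset rather than the test split, so treat it as an approximate reference point.

```python
# Counts copied from the output above.
label_counts = {
    "joy": 1084, "anger": 1080, "sadness": 1079, "fear": 1078,
    "disgust": 1057, "guilt": 1057, "shame": 1045,
}

total = sum(label_counts.values())
majority_label, majority_count = max(label_counts.items(), key=lambda kv: kv[1])
baseline = majority_count / total

print("Total examples:", total)
print(f"Majority class: {majority_label}, baseline accuracy: {baseline:.4f}")
```

With seven classes, any test accuracy well above this baseline indicates that the model has learned something from the n-gram features.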


In [None]:
emoji_dict = {"joy":"😂 Joy", "fear":"😱 Fear", "anger":"😠 Anger", "sadness":"😢 Sadness", "disgust":"😒 Disgust", "shame":"😳 Shame", "guilt":"😳 Guilt"}

t1 = "I feel so happy right now!"
t2 = "Watching a horror movie late at night always gives me a sense of fear."
t3 = "I feel guilty for lying to you."
t4 = "The sight of the rotten food disgusted me."
t5 = input("Enter your text for emotion detection: ")

texts = [t1, t2, t3, t4, t5]

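# Note: `clf` is the loop variable left over from the comparison cell above,
# i.e. the last classifier trained (DecisionTreeClassifier). Swap in `lsvc`
# to use the model with the highest test accuracy in the table.
for text in texts:
    features = create_feature(text, nrange=(1, 4))
    features = vectorizer.transform(features)
    prediction = clf.predict(features)[0]
    print(f"The Text for Emotion Detection is:{text} \nThe Predicted Emoji for text is:{emoji_dict[prediction]}")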

Enter your text for emotion detection: The sight of the rotten food disgusted me.
The Text for Emotion Detection is:I feel so happy right now! 
The Predicted Emoji for text is:😂 Joy
The Text for Emotion Detection is:Watching a horror movie late at night always gives me a sense of fear. 
The Predicted Emoji for text is:😱 Fear
The Text for Emotion Detection is:I feel guilty for lying to you. 
The Predicted Emoji for text is:😳 Guilt
The Text for Emotion Detection is:The sight of the rotten food disgusted me. 
The Predicted Emoji for text is:😒 Disgust
The Text for Emotion Detection is:The sight of the rotten food disgusted me. 
The Predicted Emoji for text is:😒 Disgust
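
The prediction loop can also be wrapped in a small helper that takes the model and vectorizer explicitly, which avoids depending on whichever classifier a loop variable last pointed to. The sketch below is self-contained and trains on a tiny made-up dataset with unigram features only. `featurize`, `predict_emotion`, `toy_model`, and `toy_vectorizer` are hypothetical names for illustration, and the toy data says nothing about the accuracy of the real model.

```python
import re
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def featurize(text):
    # Unigram counts only, to keep the toy example small.
    return Counter(re.sub("[^a-z0-9#]", " ", text.lower()).split())

# Tiny made-up training set, for illustration only.
train = [
    ("I am so happy today", "joy"),
    ("What a happy wonderful day", "joy"),
    ("I am afraid of the dark", "fear"),
    ("The noise made me afraid", "fear"),
]

toy_vectorizer = DictVectorizer(sparse=True)
X = toy_vectorizer.fit_transform([featurize(t) for t, _ in train])
toy_model = LinearSVC(random_state=123).fit(X, [label for _, label in train])

def predict_emotion(text, model, vectorizer):
    """Return the predicted label for a single string."""
    features = vectorizer.transform([featurize(text)])
    return model.predict(features)[0]

result = predict_emotion("such a happy morning", toy_model, toy_vectorizer)
print(result)
```

In the notebook, the same pattern would be called with `lsvc` (or any other fitted classifier), the fitted `vectorizer`, and `create_feature(text, nrange=(1, 4))` in place of `featurize`.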
