<a href="https://colab.research.google.com/github/adrien50/basicprojectmachinelearning/blob/main/text_Emotions_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

###Introduction
Emotion is a state of mind tied to feelings and thoughts, usually directed toward a specific object. It reflects the personal significance or opinion we attach to our interactions with other people or to a particular event. Humans can identify emotions in text and understand what the text is about. But can machines identify emotions from text as well?

###1. What is text2emotion?

Text2emotion is a Python package developed to find the emotions embedded in text data. It rests on the idea that when people are confident about what they are saying, they express their emotions in context, using words that match those emotions.
For example, a customer who paid a large amount for a product and did not like it may leave feedback such as “I am very angry by your product services and gonna file a complaint regarding this issue”. Reading this, you can tell that the customer is angry about the product services and that they need to be improved as soon as possible. Text2emotion works in the same manner to extract emotions from text.

In short, Text2Emotion is a Python package that helps you extract emotions from content.

It processes any textual data, recognizes the emotion embedded in it, and returns the output as a dictionary.

It covers 5 basic emotion categories: Happy, Angry, Sad, Surprise, and Fear.

In [1]:
#Install package using pip
!pip install text2emotion

Collecting text2emotion
  Downloading https://files.pythonhosted.org/packages/fe/31/b190e37c1396ca68ab1b5c8ea1a23f2f7848df532ad69133e94853120aed/text2emotion-0.0.5-py3-none-any.whl (57kB)
     |████████████████████████████████| 61kB 4.6MB/s 
Collecting emoji>=0.6.0
  Downloading https://files.pythonhosted.org/packages/24/fa/b3368f41b95a286f8d300e323449ab4e86b85334c2e0b477e94422b8ed0f/emoji-1.2.0-py3-none-any.whl (131kB)
     |████████████████████████████████| 133kB 11.0MB/s 
Installing collected packages: emoji, text2emotion
Successfully installed emoji-1.2.0 text2emotion-0.0.5


In [2]:
#Import the modules
import text2emotion as te

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


In [3]:
text = "I was asked to sign a third party contract a week out from stay. If it wasn't an 8 person group that took a lot of wrangling I would have cancelled the booking straight away. Bathrooms - there are no stand alone bathrooms. Please consider this - you have to clear out the main bedroom to use that bathroom. Other option is you walk through a different bedroom to get to its en-suite. Signs all over the apartment - there are signs everywhere - some helpful - some telling you rules. Perhaps some people like this but It negatively affected our enjoyment of the accommodation. Stairs - lots of them - some had slightly bending wood which caused a minor injury."

In [4]:
#Call to the function
te.get_emotion(text)

{'Angry': 0.12, 'Fear': 0.42, 'Happy': 0.04, 'Sad': 0.33, 'Surprise': 0.08}

The output is a dictionary that maps each emotion category to its score.
Looking at the scores, Fear is highest at 0.42 and Sad follows at 0.33. Overall, we can say that the input statement has a Fear and Sad tone.
The package can also identify emotions from emojis; the example after the two use cases below shows this.

E-Commerce Industry: Customer Engagement Endpoint

Emotion analysis can be applied to input received from customers through sources such as chatbot conversations, contact-center logs, and emails. Tracking these tone signals can help Customer Service Managers improve how their teams interact with customers.

Social Media Monitoring

In today’s digital world, brand monitoring and reputation management have become some of the most important aspects of every business unit. This is where emotion analysis comes into the picture. It helps companies track how consumers perceive them, pinpoint consumer attitudes in specific detail, find patterns and trends, and keep a close watch on what influencers present.


In [5]:
text = "Day was pretty amazing😃😃"
te.get_emotion(text)

{'Angry': 0.0, 'Fear': 0.0, 'Happy': 1.0, 'Sad': 0.0, 'Surprise': 0.0}

From the output, you can conclude that the text input belongs to the Happy emotion category: Happy scores 1.0 and every other category, including Surprise, scores 0.0.
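
Reading the dominant emotion off a score dictionary can be automated. The following is a minimal sketch that works on the two output dictionaries shown above; the helper name `dominant_emotions` and the `tolerance` parameter are assumptions made for illustration and are not part of text2emotion.

```python
def dominant_emotions(scores, tolerance=0.1):
    """Return the labels whose score is within `tolerance` of the top score.

    Hypothetical helper: not part of the text2emotion package.
    """
    top = max(scores.values())
    if top == 0:
        return []  # no emotion detected at all
    close = [label for label, score in scores.items() if top - score <= tolerance]
    # Highest score first
    return sorted(close, key=lambda label: scores[label], reverse=True)

# Scores copied from the two outputs above
review_scores = {'Angry': 0.12, 'Fear': 0.42, 'Happy': 0.04, 'Sad': 0.33, 'Surprise': 0.08}
emoji_scores = {'Angry': 0.0, 'Fear': 0.0, 'Happy': 1.0, 'Sad': 0.0, 'Surprise': 0.0}

review_top = dominant_emotions(review_scores)
emoji_top = dominant_emotions(emoji_scores)
print(review_top)
print(emoji_top)
```

With a tolerance of 0.1, the review keeps both Fear and Sad, while the emoji sentence keeps only Happy. A tighter tolerance would keep a single label per text.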

###Text Emotions Detection with Machine Learning
In machine learning, detecting emotions in text is a content-based classification problem and a natural language processing task. Detecting a person’s emotions is difficult, and detecting them from written text is even harder, because people can express emotions in many different forms.



Emotions are usually expressed as joy, sadness, anger, surprise, hate, fear, and so on. Recognizing these emotions in written text plays an important role in applications such as chatbots, customer support forums, and customer reviews. In the section below, I will take you through a machine learning project on Text Emotions Detection using Python, where I will build a model to classify the emotions of a text.

###Text Emotions Detection using Python
To detect emotions from text, I will go through a few steps, starting with preparing the data. The next step is tokenization, where the text is converted into tokens; from these tokens, we have to identify the emotional words.

These emotional words are the keywords used to classify the emotions of a text. Next, we’ll frame the task so that a text is taken as input and the emoji that represents the emotions in that text is generated as output.



In [6]:
import re 
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

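def read_data(file):
    # Each line is expected to look like "[<label vector>] <text>"
    data = []
    with open(file, 'r') as f:
        for line in f:
            line = line.strip()
            # Label: everything between the leading '[' and the first ']', whitespace-normalized
            label = ' '.join(line[1:line.find("]")].strip().split())
            # Text: everything after the first ']'
            text = line[line.find("]")+1:].strip()
            data.append([label, text])
    return data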

file = 'text.txt'
data = read_data(file)
print("Number of instances: {}".format(len(data)))

Number of instances: 7480
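
To make the expected file layout concrete, here is a minimal sketch of the same parsing logic applied to a single line. The sample line is made up for illustration and is not taken from `text.txt`; the helper name `parse_line` is also hypothetical.

```python
def parse_line(line):
    """Split one '[label vector] text' line the same way read_data does."""
    line = line.strip()
    close = line.find("]")
    # Collapse repeated spaces inside the bracketed label vector
    label = ' '.join(line[1:close].strip().split())
    text = line[close + 1:].strip()
    return label, text

# Hypothetical sample line, assuming a 7-value one-hot label followed by the sentence
sample = "[ 1.  0.  0.  0.  0.  0.  0.] I passed my final exam today."
label, text = parse_line(sample)
print(label)
print(text)
```

The label comes back as a space-separated string of numbers, which is the form the label conversion step later in the notebook expects.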


Now I will create two Python functions for tokenization and generating the features of an input sentence:

In [7]:
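def ngram(token, n): 
    # Build all n-grams (as space-joined strings) from a list of tokens
    output = []
    for i in range(n-1, len(token)): 
        gram = ' '.join(token[i-n+1:i+1])
        output.append(gram) 
    return output

def create_feature(text, nrange=(1, 1)):
    # Count word n-grams (n in nrange, inclusive) plus punctuation tokens
    text_features = [] 
    text = text.lower() 
    # Keep only lowercase letters, digits and '#'; everything else becomes a space
    text_alphanum = re.sub('[^a-z0-9#]', ' ', text)
    for n in range(nrange[0], nrange[1]+1): 
        text_features += ngram(text_alphanum.split(), n)    
    # Blank out letters and digits so that only punctuation and other symbols remain
    text_punc = re.sub('[a-z0-9]', ' ', text)
    text_features += ngram(text_punc.split(), 1)
    return Counter(text_features)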
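
To see what these two functions produce, the sketch below repeats them in compact form and applies them to one short sentence. The sentence and the `nrange=(1, 2)` setting are chosen only for illustration.

```python
import re
from collections import Counter

def ngram(tokens, n):
    # All contiguous n-grams, joined with spaces
    return [' '.join(tokens[i - n + 1:i + 1]) for i in range(n - 1, len(tokens))]

def create_feature(text, nrange=(1, 1)):
    text = text.lower()
    features = []
    # Word n-grams from letters, digits and '#'
    words = re.sub('[^a-z0-9#]', ' ', text).split()
    for n in range(nrange[0], nrange[1] + 1):
        features += ngram(words, n)
    # Punctuation tokens: whatever is left once letters and digits are blanked out
    features += ngram(re.sub('[a-z0-9]', ' ', text).split(), 1)
    return Counter(features)

features = create_feature("I am happy!", nrange=(1, 2))
print(sorted(features.items()))
```

The result contains three unigrams (`i`, `am`, `happy`), two bigrams (`i am`, `am happy`), and the punctuation token `!`, each with a count of 1.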

Now I will create a Python function that converts each label vector into an emotion name. The labels are based on emotions such as joy, fear, anger, and so on:

In [8]:
def convert_label(item, name): 
    items = list(map(float, item.split()))
    label = ""
    for idx in range(len(items)): 
        if items[idx] == 1: 
            label += name[idx] + " "
    
    return label.strip()

emotions = ["joy", 'fear', "anger", "sadness", "disgust", "shame", "guilt"]

X_all = []
y_all = []
for label, text in data:
    y_all.append(convert_label(label, emotions))
    X_all.append(create_feature(text, nrange=(1, 4)))
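
As a quick check of the label conversion, the sketch below repeats `convert_label` and applies it to two label strings. The input strings are illustrative and assume the one-hot layout follows the order of the `emotions` list.

```python
def convert_label(item, name):
    # Map each position holding 1 to the emotion name at the same index
    items = list(map(float, item.split()))
    label = ""
    for idx in range(len(items)):
        if items[idx] == 1:
            label += name[idx] + " "
    return label.strip()

emotions = ["joy", "fear", "anger", "sadness", "disgust", "shame", "guilt"]

first = convert_label("1. 0. 0. 0. 0. 0. 0.", emotions)
fourth = convert_label("0. 0. 0. 1. 0. 0. 0.", emotions)
print(first)
print(fourth)
```

If a vector had more than one position set to 1, the names would be joined with spaces into a single combined label string.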

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size = 0.2, random_state = 123)

def train_test(clf, X_train, X_test, y_train, y_test):
    clf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    return train_acc, test_acc

from sklearn.feature_extraction import DictVectorizer
vectorizer = DictVectorizer(sparse = True)
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

Now I’m going to train four machine learning models and compare their accuracy on the training and test sets; the model with the best test accuracy is the one to choose:

In [10]:
svc = SVC()
lsvc = LinearSVC(random_state=123)
rforest = RandomForestClassifier(random_state=123)
dtree = DecisionTreeClassifier()

clifs = [svc, lsvc, rforest, dtree]

# train and test them 
print("| {:25} | {} | {} |".format("Classifier", "Training Accuracy", "Test Accuracy"))
print("| {} | {} | {} |".format("-"*25, "-"*17, "-"*13))
for clf in clifs: 
    clf_name = clf.__class__.__name__
    train_acc, test_acc = train_test(clf, X_train, X_test, y_train, y_test)
    print("| {:25} | {:17.7f} | {:13.7f} |".format(clf_name, train_acc, test_acc))

| Classifier                | Training Accuracy | Test Accuracy |
| ------------------------- | ----------------- | ------------- |
| SVC                       |         0.9067513 |     0.4512032 |
| LinearSVC                 |         0.9988302 |     0.5768717 |
| RandomForestClassifier    |         0.9988302 |     0.5541444 |
| DecisionTreeClassifier    |         0.9988302 |     0.4598930 |
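
One way to read this table is to rank the models by test accuracy and look at the gap between training and test accuracy. The sketch below does that with the numbers copied from the table; the `results` dictionary and the gap calculation are added here for illustration only.

```python
# (training accuracy, test accuracy) copied from the table above
results = {
    "SVC": (0.9067513, 0.4512032),
    "LinearSVC": (0.9988302, 0.5768717),
    "RandomForestClassifier": (0.9988302, 0.5541444),
    "DecisionTreeClassifier": (0.9988302, 0.4598930),
}

# Pick the model with the highest test accuracy
best = max(results, key=lambda name: results[name][1])

# Gap between training and test accuracy for each model
gaps = {name: train - test for name, (train, test) in results.items()}

for name, gap in gaps.items():
    print("{:25} gap = {:.4f}".format(name, gap))
print("Best on the test set:", best)
```

LinearSVC has the highest test accuracy. Every model scores more than 0.4 higher on the training set than on the test set, which suggests that all four overfit the training data.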
