## Text Emotion Classification

### Introduction

**Text emotion classification** is the problem of assigning an emotion to a piece of text by understanding its context. One real-world example is a phone keyboard that recommends the most relevant emoji for the text being typed. If you want to learn how to classify the emotions of a text, this article is for you. In it, I will take you through the task of text emotion classification with Machine Learning using Python.
- **Text emotion classification is a task within natural language processing and text classification where the goal is to train a model to identify the emotion conveyed in a text.** To achieve this, we need labeled datasets that associate texts with their corresponding emotions. I discovered an excellent dataset on Kaggle that fits this requirement.
- In the following sections, **I will guide you through the process of training a text classification model for text emotion classification using Machine Learning and Python.**

In [1]:
import pandas as pd
import numpy as np
import keras
import tensorflow
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense


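# Note: if train.txt has no header row, add header=None so the first sample is not consumed as column names.
data = pd.read_csv("C:/Users/asus/OneDrive/Desktop/ML_Datasets/project/More_Projects/train.txt", sep=';')
data.columns = ["Text", "Emotions"]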
print(data.head())

                                                Text Emotions
0  i can go from feeling so hopeless to so damned...  sadness
1   im grabbing a minute to post i feel greedy wrong    anger
2  i am ever feeling nostalgic about the fireplac...     love
3                               i am feeling grouchy    anger
4  ive been feeling a little burdened lately wasn...  sadness


**from tensorflow.keras.preprocessing.text import Tokenizer:**
- The Tokenizer class converts text into sequences of integers. Each word is assigned a unique integer index based on its frequency in the dataset, with more frequent words receiving smaller indices.
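As a rough illustration of the frequency-based indexing described above, the sketch below builds a word index in plain Python. It mimics the idea only and is not the Keras implementation: `build_word_index` and `texts_to_ids` are hypothetical helper names, and a simple whitespace split stands in for the punctuation filtering that Tokenizer performs.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer; more frequent words get smaller indices."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Index 0 is left unused so it can serve as the padding value later.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_ids(texts, word_index):
    """Replace each known word by its index; unknown words are dropped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]

sample_texts = ["i feel happy", "i feel sad", "i am happy"]
sample_index = build_word_index(sample_texts)
sample_sequences = texts_to_ids(sample_texts, sample_index)
print(sample_index)      # {'i': 1, 'feel': 2, 'happy': 3, 'sad': 4, 'am': 5}
print(sample_sequences)  # [[1, 2, 3], [1, 2, 4], [1, 5, 3]]
```

Words with equal counts keep their order of first appearance, which is why "feel" comes before "happy" in this toy example.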

**from tensorflow.keras.preprocessing.sequence import pad_sequences:**
- The pad_sequences function is used to ensure that all sequences (lists of integers) have the same length. This is often necessary for training neural network models, as they require input of consistent dimensions.
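To make the padding step concrete, here is a minimal plain-Python sketch of the default behaviour of pad_sequences, which pads with zeros at the front and, when a sequence is too long, drops tokens from the front. The helper name `pad_left` is hypothetical and only illustrates the idea.

```python
def pad_left(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) every sequence to exactly `maxlen` items."""
    padded = []
    for seq in sequences:
        seq = seq[max(0, len(seq) - maxlen):]    # keep only the last `maxlen` tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

ragged = [[4, 7], [1, 2, 3, 9], [5]]
longest = max(len(seq) for seq in ragged)
print(pad_left(ragged, longest))  # [[0, 0, 4, 7], [1, 2, 3, 9], [0, 0, 0, 5]]
```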

**from keras.models import Sequential:**
- The Sequential class is a linear stack of layers in Keras. It allows you to build a model layer by layer, which is ideal for simple models with a single input and output.

**from keras.layers import Embedding, Flatten, Dense:**
- **Embedding:** This layer is used to create dense word embeddings from integer sequences. It converts positive integers (indexes) into dense vectors of fixed size.
- **Flatten:** This layer flattens its input, i.e., it converts a multi-dimensional tensor into a one-dimensional vector per sample. Here it turns the 2D output of the Embedding layer (sequence length × embedding size) into a single vector that the Dense layers can consume.
- **Dense:** This is a fully connected neural network layer. Each neuron in this layer is connected to every neuron in the previous layer.
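The shape changes performed by these layers can be sketched without Keras. The toy example below uses made-up sizes (`vocab_size`, `embed_dim`, and `seq_len` are assumptions, far smaller than in the real model) and random numbers in place of trained weights, so it shows only how an embedding lookup followed by flattening reshapes the data.

```python
import random

vocab_size, embed_dim, seq_len = 6, 4, 3    # toy sizes, chosen for illustration

rng = random.Random(0)
# Embedding: one vector of length `embed_dim` per vocabulary index.
embedding_table = [[rng.uniform(-0.05, 0.05) for _ in range(embed_dim)]
                   for _ in range(vocab_size)]

token_ids = [0, 2, 5]                       # one padded sequence of length `seq_len`
embedded = [embedding_table[i] for i in token_ids]       # shape (seq_len, embed_dim)
flattened = [x for vector in embedded for x in vector]   # shape (seq_len * embed_dim,)

print(len(embedded), len(embedded[0]))  # 3 4
print(len(flattened))                   # 12
```

A Dense layer placed after the flatten step would then receive one vector of `seq_len * embed_dim` numbers per text.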

**As this is a problem of natural language processing, I’ll start by tokenizing the data:**

In [2]:
texts = data["Text"].tolist()
labels = data["Emotions"].tolist()

# Tokenize the text data
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)

**Now we will pad the sequences so that every list of integers has the same length:**

In [3]:
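# Convert each text into a list of word indices
sequences = tokenizer.texts_to_sequences(texts)
# Pad every sequence to the length of the longest one
max_length = max(len(seq) for seq in sequences)
padded_sequences = pad_sequences(sequences, maxlen=max_length)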

**Now we will use the LabelEncoder class to convert the classes from strings to a numerical representation:**

In [4]:
# Encode the string labels to integers
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(labels)

We are now going to one-hot encode the labels. One-hot encoding transforms categorical labels into a binary representation where each label is a vector of all zeros except for a single 1 at the index of its class. This is necessary because the categorical cross-entropy loss used later compares the model's predicted probability distribution with a target vector of the same length. Here is how we can one-hot encode the labels:

In [5]:
one_hot_labels = keras.utils.to_categorical(labels)
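To see what the two encoding steps produce, the following plain-Python sketch reproduces them on a few toy labels. The helpers `encode_labels` and `one_hot` are hypothetical stand-ins for LabelEncoder and to_categorical, written only to illustrate the idea. Like LabelEncoder, the sketch numbers the classes in sorted order.

```python
def encode_labels(labels):
    """Assign each distinct label an integer, using sorted (alphabetical) order."""
    classes = sorted(set(labels))
    class_to_id = {name: i for i, name in enumerate(classes)}
    return [class_to_id[name] for name in labels], classes

def one_hot(ids, num_classes):
    """Turn each integer id into a vector of zeros with a single 1."""
    return [[1.0 if j == i else 0.0 for j in range(num_classes)] for i in ids]

toy_labels = ["sadness", "anger", "love", "anger"]
toy_ids, toy_classes = encode_labels(toy_labels)
print(toy_classes)                         # ['anger', 'love', 'sadness']
print(toy_ids)                             # [2, 0, 1, 0]
print(one_hot(toy_ids, len(toy_classes)))
```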

**Now we will split the data into training and test sets:**

In [6]:
# Split the data into training and testing sets
xtrain, xtest, ytrain, ytest = train_test_split(padded_sequences, 
                                                one_hot_labels, 
                                                test_size=0.2)

**A neural network** is a computational model inspired by the way biological neural networks in the human brain process information. It consists of interconnected units called neurons or nodes, which work together to recognize patterns, learn from data, and make decisions. **Neural networks are a fundamental part of deep learning, a subset of machine learning.**

Now let’s define a neural network architecture for our classification problem and use it to train a model to classify emotions:

In [7]:
# Define the model
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1, 
                    output_dim=128))
model.add(Flatten())
model.add(Dense(units=128, activation="relu"))
model.add(Dense(units=len(one_hot_labels[0]), activation="softmax"))

In [8]:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(xtrain, ytrain, epochs=10, batch_size=32, validation_data=(xtest, ytest))

Epoch 1/10
400/400 ━━━━━━━━━━━━━━━━━━━━ 12s 28ms/step - accuracy: 0.3841 - loss: 1.5245 - val_accuracy: 0.6878 - val_loss: 0.8970
Epoch 2/10
400/400 ━━━━━━━━━━━━━━━━━━━━ 11s 28ms/step - accuracy: 0.8529 - loss: 0.4711 - val_accuracy: 0.8259 - val_loss: 0.5399
Epoch 3/10
400/400 ━━━━━━━━━━━━━━━━━━━━ 11s 28ms/step - accuracy: 0.9816 - loss: 0.0731 - val_accuracy: 0.8178 - val_loss: 0.5847
Epoch 4/10
400/400 ━━━━━━━━━━━━━━━━━━━━ 11s 28ms/step - accuracy: 0.9939 - loss: 0.0249 - val_accuracy: 0.8206 - val_loss: 0.6200
Epoch 5/10
400/400 ━━━━━━━━━━━━━━━━━━━━ 12s 31ms/step - accuracy: 0.9965 - loss: 0.0192 - val_accuracy: 0.8288 - val_loss: 0.6260
Epoch 6/10
400/400 ━━━━━━━━━━━━━━━━━━━━ 13s 31ms/step - accuracy: 0.9977 - loss: 0.0110 - val_accuracy: 0.8253 - val_loss: 0.6707
Epoch 7/10
...

<keras.src.callbacks.history.History at 0x1c28cde52d0>

**Optimizer:**
- **adam:** Adam (short for Adaptive Moment Estimation) is an optimization algorithm that adjusts the learning rate for each parameter. It's efficient and well-suited for a wide range of problems, especially those with large datasets or many parameters.
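For readers who want to see the update rule behind this description, here is a minimal sketch of Adam applied to a single scalar parameter. The function name `adam_step` and the gradient values are made up for illustration, and the default hyperparameters shown are assumptions chosen to mirror commonly used values rather than anything stated in this article.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
for step, grad in enumerate([2.0, 1.5, 0.5], start=1):   # made-up gradients
    theta, m, v = adam_step(theta, grad, m, v, step)
    print(step, round(theta, 6))
```

Dividing by the square root of the squared-gradient average is what gives each parameter its own effective step size.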

**Loss Function:**
- **categorical_crossentropy:** This loss function is used for multi-class classification tasks where the labels are one-hot encoded. It measures the difference between the predicted probability distribution and the actual distribution (one-hot encoded labels).
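The loss for a single sample is the negative sum of `y_true * log(y_pred)` over the classes. The sketch below works through that formula in plain Python, together with the softmax function that produces the predicted probabilities. The input scores are invented for illustration, and the small `eps` clamp is an assumption added to avoid taking the logarithm of zero.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]                         # one-hot target: class 1
confident = softmax([0.1, 3.0, 0.2])             # most probability on class 1
unsure = softmax([1.0, 1.0, 1.0])                # uniform over the three classes
print(round(categorical_crossentropy(y_true, confident), 4))
print(round(categorical_crossentropy(y_true, unsure), 4))   # equals ln(3)
```

A confident, correct prediction yields a loss close to zero, while a uniform guess over three classes yields ln(3).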

**Metrics:**
- **accuracy:** This metric calculates how often the model's predictions match the true labels. Accuracy is a common metric for classification tasks.
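With one-hot labels, a prediction counts as correct when its highest-probability class matches the position of the 1 in the label. A minimal sketch of that calculation, using a hypothetical `accuracy` helper and invented values:

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class matches the label."""
    def argmax(values):
        return max(range(len(values)), key=values.__getitem__)
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2], [0.3, 0.4, 0.3]]
print(accuracy(targets, predictions))   # 0.75
```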
 
**Training Data:**

- xtrain: The training input data (padded sequences of text).
- ytrain: The training target data (one-hot encoded labels).

**Epochs:**
- **epochs=10:** The number of times the entire training dataset will be passed through the model. Each epoch consists of one complete forward and backward pass of all the training examples.

**Batch Size:**
- **batch_size=32:** The number of training examples used in one forward/backward pass. The model's weights are updated once per batch.
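The `400/400` shown in the training log is the number of batches per epoch. The arithmetic can be sketched as follows, where the training-set size of 12,800 is an assumption chosen to be consistent with that log and is not a figure stated in the article.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

n_train = 12_800          # assumed training-set size, chosen to match the log
print(steps_per_epoch(n_train, 32))        # 400
print(steps_per_epoch(n_train, 32) * 10)   # 4000 weight updates over 10 epochs
```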

**Validation Data:**
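- **validation_data=(xtest, ytest):** Data used to evaluate the model's performance after each epoch, but not used for updating the model's weights. This helps to monitor overfitting and indicates how well the model generalizes to unseen data. In the log above, training accuracy climbs past 0.99 while validation accuracy stays around 0.82 and validation loss rises after epoch 2, which is a typical sign of overfitting. Note that because the test set doubles as the validation set here, it is no longer a fully independent measure of final performance.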

**Now let’s take a sentence as an input text and see how the model performs:**

In [11]:
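# Read a sentence from the user (input() already returns a string)
input_text = input()

# Preprocess the input text the same way as the training data
input_sequence = tokenizer.texts_to_sequences([input_text])
padded_input_sequence = pad_sequences(input_sequence, maxlen=max_length)

# Predict class probabilities and map the most likely class back to its name
prediction = model.predict(padded_input_sequence)
predicted_label = label_encoder.inverse_transform([np.argmax(prediction[0])])
print(predicted_label)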

There are places i feel so good visiting
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
['joy']
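The prediction is a vector of probabilities, one per class, so it can also be useful to look at the runner-up emotions rather than only the top one. The sketch below shows one way to rank them. The class order and the probability values are hypothetical, and `top_k` is an illustrative helper. With the real model, the names would come from `label_encoder.classes_` and the probabilities from `prediction[0]`.

```python
def top_k(probabilities, class_names, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(zip(class_names, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical class order and softmax output, for illustration only.
class_names = ["anger", "fear", "joy", "love", "sadness", "surprise"]
probabilities = [0.02, 0.01, 0.85, 0.08, 0.03, 0.01]

for label, prob in top_k(probabilities, class_names):
    print(f"{label}: {prob:.2f}")
```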


This is how you can use Machine Learning for text emotion classification with Python. We tokenized and padded the text, encoded the labels, trained a small neural network, and used it to predict the emotion of a new sentence. The same approach underlies practical features such as a phone keyboard that suggests the most relevant emoji for the text being typed. I hope you found this article on Text Emotion Classification with Machine Learning using Python informative.