<a href="https://colab.research.google.com/github/affu-11/Detect-Human-Emotions/blob/main/Week_5_GNCIPL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project: Emotion Detection from Text**

1. **Objective:**
Detect human emotions (happy, sad, angry, fear, surprise, neutral, etc.) from textual input using a deep learning model.

2. **Dataset:** https://www.kaggle.com/datasets/pashupatigupta/emotion-detection-from-text?resource=download

    **Description:** Contains tweets, each labeled with one of 13 emotions: anger, boredom, empty, enthusiasm, fun, happiness, hate, love, neutral, relief, sadness, surprise, and worry.

3. **Preprocessing:**

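   Text cleaning: lowercase the text and remove punctuation and numbers (stopword removal is optional; the `clean_text` function in this notebook does not apply it).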

   Tokenization & padding.

   Convert words to vectors using Embedding layer (or pretrained embeddings like GloVe).

   Encode labels into one-hot vectors.

4. **Model Architecture:**

    Embedding Layer (input_dim = vocab_size, output_dim = 100, input_length = max_len).

    Flatten – converts the (max_len, 100) embedding output into a single vector for the dense layers.

    Dense(128) – ReLU activation.

    Dropout(0.3).

    Dense(64) – ReLU activation.

    Dense(output_classes) – Softmax activation.

5. **Training:**

   Optimizer: Adam.

   Loss: Categorical Crossentropy.

   Epochs: 15–20.

   Batch size: 32 or 64.

6. **Evaluation:**

   Accuracy, Precision, Recall, F1-score.

   Confusion Matrix to visualize per-class performance.

7. **Extensions:**

   Use LSTM/GRU for better sequence modeling.

   Build a real-time emotion detection chatbot.

   Extend to multilingual emotion detection.

   Deploy as a web app (Flask/Streamlit).

8. **Tools:**

   TensorFlow/Keras – for ANN modeling.

   NLTK/spaCy – for text preprocessing.

   scikit-learn – for evaluation metrics.

   Plotly (or Matplotlib/Seaborn) – for visualization.
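
As a sketch of the LSTM/GRU extension listed under item 7, the dense-only network could be swapped for a recurrent one along these lines. The layer sizes (64 units, dropout 0.3), the use of `Bidirectional`, and `mask_zero=True` are illustrative assumptions rather than settings taken from this notebook; the vocabulary size, sequence length, and 13 classes match the values used in the cells that follow.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000   # same vocabulary cap as the notebook
MAX_LEN = 100        # same padded sequence length as the notebook
NUM_CLASSES = 13     # number of emotion labels in the dataset

def build_lstm_model():
    """Hypothetical recurrent variant of the classifier (not the notebook's model)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        # mask_zero=True tells the LSTM to skip the zero padding added by pad_sequences
        layers.Embedding(VOCAB_SIZE, 100, mask_zero=True),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

lstm_model = build_lstm_model()
lstm_model.summary()
```

Unlike `Flatten`, the recurrent layer reads the tokens in order, so word position and context can influence the prediction. Whether that improves accuracy on this dataset would need to be checked experimentally.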

# **IMPORT LIBRARIES**

In [2]:
import pandas as pd
import numpy as np
import re
import plotly.express as px
import plotly.graph_objects as go

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, Dropout

# **LOAD THE DATASET**

In [3]:
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv("/content/drive/MyDrive/Datasets/tweet_emotions.csv")

Mounted at /content/drive


In [4]:
print(df.isnull().sum())
df = df.rename(columns={"content": "text", "sentiment": "emotion"})
print(df.head())

tweet_id     0
sentiment    0
content      0
dtype: int64
     tweet_id     emotion                                               text
0  1956967341       empty  @tiffanylue i know  i was listenin to bad habi...
1  1956967666     sadness  Layin n bed with a headache  ughhhh...waitin o...
2  1956967696     sadness                Funeral ceremony...gloomy friday...
3  1956967789  enthusiasm               wants to hang out with friends SOON!
4  1956968416     neutral  @dannycastillo We want to trade with someone w...


# **PREPROCESSING**

In [5]:
def clean_text(text):
    text = str(text).lower()
    text = re.sub(r"[^a-z\s]", "", text)
    return text

df["clean_text"] = df["text"].apply(clean_text)

# Encode labels
label_encoder = LabelEncoder()
df["label"] = label_encoder.fit_transform(df["emotion"])
num_classes = len(label_encoder.classes_)

# Tokenization & padding
VOCAB_SIZE = 10000
MAX_LEN = 100
tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<OOV>")
tokenizer.fit_on_texts(df["clean_text"])

sequences = tokenizer.texts_to_sequences(df["clean_text"])
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding="post")

X = padded
y = tf.keras.utils.to_categorical(df["label"], num_classes=num_classes)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=df["label"])
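
To make the tokenization and padding step concrete, here is a dependency-free sketch of roughly what `Tokenizer` and `pad_sequences` do in the cell above. The helper names (`build_vocab`, `encode`, `pad_post`), the toy sentences, and the tiny limits are hypothetical; Keras's own implementation differs in details such as its default character filters.

```python
from collections import Counter

def build_vocab(texts, num_words, oov_token="<OOV>"):
    """Assign ids by descending frequency; 0 is reserved for padding, 1 for OOV."""
    counts = Counter(word for text in texts for word in text.split())
    vocab = {oov_token: 1}
    for word, _ in counts.most_common():
        vocab[word] = len(vocab) + 1
    # Keep only ids below num_words, mirroring the num_words cap
    return {word: idx for word, idx in vocab.items() if idx < num_words}

def encode(text, vocab, oov_id=1):
    """Replace each word by its id, or by the OOV id if it is not in the vocabulary."""
    return [vocab.get(word, oov_id) for word in text.split()]

def pad_post(seq, max_len):
    """Keep the last max_len tokens, then right-pad with zeros (padding='post')."""
    seq = seq[-max_len:]
    return seq + [0] * (max_len - len(seq))

texts = ["i am so happy", "i am so so sad"]
vocab = build_vocab(texts, num_words=6)      # "sad" gets id 6 and is dropped
padded_example = pad_post(encode("i am so sad", vocab), max_len=6)
print(padded_example)  # [3, 4, 2, 1, 0, 0]
```

The word "sad" falls outside the capped vocabulary and is mapped to the OOV id 1, which is what happens to rare words in the notebook once the 10,000-word limit is reached.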


# **BUILD ANN MODEL**

In [6]:
# tf, Sequential, Embedding, Dense, and Dropout were imported in the import cell; only Flatten is new.
from tensorflow.keras.layers import Flatten

model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=100, input_length=MAX_LEN),
    Flatten(),  # collapse the (MAX_LEN, 100) embedding output into one vector
    Dense(128, activation="relu"),
    Dropout(0.3),
    Dense(64, activation="relu"),
    Dense(num_classes, activation="softmax")
])

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
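
It can help to count the trainable parameters by hand and compare the result with `model.summary()`. The arithmetic below assumes `VOCAB_SIZE = 10000`, `MAX_LEN = 100`, an embedding width of 100, and 13 output classes (the number of labels in the classification report); the variable and function names are only illustrative.

```python
VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 10000, 100, 100, 13

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

embedding = VOCAB_SIZE * EMBED_DIM         # 1,000,000
flat = MAX_LEN * EMBED_DIM                 # 10,000 features after Flatten
dense1 = dense_params(flat, 128)           # 1,280,128
dense2 = dense_params(128, 64)             # 8,256
out = dense_params(64, NUM_CLASSES)        # 845
total = embedding + dense1 + dense2 + out
print(f"{total:,}")  # 2,289,229
```

Roughly 2.3 million weights are being fit on about 25,600 training tweets (800 steps of 32 samples per epoch), which is consistent with the overfitting visible in the training log.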



# **TRAIN MODEL**

In [7]:
history = model.fit(X_train, y_train, epochs=15, batch_size=32, validation_split=0.2, verbose=1)

Epoch 1/15
800/800 ━━━━━━━━━━━━━━━━━━━━ 30s 34ms/step - accuracy: 0.2470 - loss: 2.1689 - val_accuracy: 0.3156 - val_loss: 1.9699
Epoch 2/15
800/800 ━━━━━━━━━━━━━━━━━━━━ 31s 38ms/step - accuracy: 0.3592 - loss: 1.8706 - val_accuracy: 0.3291 - val_loss: 1.9640
Epoch 3/15
800/800 ━━━━━━━━━━━━━━━━━━━━ 48s 47ms/step - accuracy: 0.4595 - loss: 1.5489 - val_accuracy: 0.3039 - val_loss: 2.1928
Epoch 4/15
800/800 ━━━━━━━━━━━━━━━━━━━━ 32s 35ms/step - accuracy: 0.6210 - loss: 1.1129 - val_accuracy: 0.2895 - val_loss: 2.5143
Epoch 5/15
800/800 ━━━━━━━━━━━━━━━━━━━━ 31s 39ms/step - accuracy: 0.7621 - loss: 0.7417 - val_accuracy: 0.2841 - val_loss: 3.1613
Epoch 6/15
800/800 ━━━━━━━━━━━━━━━━━━━━ 37s 34ms/step - accuracy: 0.8419 - loss: 0.5017 - val_accuracy: 0.2739 - val_loss: 3.8118
Epoch 7/15
...

In [8]:
# Step 7: Plotly training curves
# Accuracy
fig_acc = go.Figure()
fig_acc.add_trace(go.Scatter(y=history.history['accuracy'], mode='lines+markers', name='Train Accuracy'))
fig_acc.add_trace(go.Scatter(y=history.history['val_accuracy'], mode='lines+markers', name='Val Accuracy'))
fig_acc.update_layout(title="Training vs Validation Accuracy", xaxis_title="Epoch", yaxis_title="Accuracy")
fig_acc.show()

# Loss
fig_loss = go.Figure()
fig_loss.add_trace(go.Scatter(y=history.history['loss'], mode='lines+markers', name='Train Loss'))
fig_loss.add_trace(go.Scatter(y=history.history['val_loss'], mode='lines+markers', name='Val Loss'))
fig_loss.update_layout(title="Training vs Validation Loss", xaxis_title="Epoch", yaxis_title="Loss")
fig_loss.show()
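
The curves plotted above come from the training log, where validation loss bottoms out at epoch 2 while training accuracy keeps climbing, which is the usual signature of overfitting. The sketch below replays the logged `val_loss` values through a simple patience rule, the idea behind Keras's `EarlyStopping` callback. The function name and the `patience=2` setting are assumptions chosen for illustration.

```python
# val_loss for epochs 1-6, copied from the training log
val_loss = [1.9699, 1.9640, 2.1928, 2.5143, 3.1613, 3.8118]

def early_stop_epoch(losses, patience):
    """Return (best_epoch, stop_epoch), 1-indexed; stop_epoch is None if never triggered."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the patience counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None

best_epoch, stop_epoch = early_stop_epoch(val_loss, patience=2)
print(best_epoch, stop_epoch)  # 2 4
```

In Keras, a comparable effect could be obtained by passing `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)` in the `callbacks` argument of `model.fit`, so that training halts early and the weights from the best epoch are kept.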

In [9]:
# Step 8: Evaluation
y_pred_probs = model.predict(X_test)
y_pred = np.argmax(y_pred_probs, axis=1)
y_true = np.argmax(y_test, axis=1)

print("Classification Report:\n")
print(classification_report(y_true, y_pred, target_names=label_encoder.classes_))

# Confusion Matrix with Plotly
cm = confusion_matrix(y_true, y_pred)
cm_fig = px.imshow(cm, text_auto=True, x=label_encoder.classes_, y=label_encoder.classes_, color_continuous_scale="Blues")
cm_fig.update_layout(title="Confusion Matrix - Emotion Detection", xaxis_title="Predicted", yaxis_title="True")
cm_fig.show()


250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step
Classification Report:

              precision    recall  f1-score   support

       anger       0.00      0.00      0.00        22
     boredom       0.00      0.00      0.00        36
       empty       0.04      0.03      0.03       165
  enthusiasm       0.02      0.01      0.01       152
         fun       0.08      0.06      0.07       355
   happiness       0.24      0.30      0.27      1042
        hate       0.18      0.08      0.11       265
        love       0.31      0.31      0.31       768
     neutral       0.32      0.27      0.29      1728
      relief       0.06      0.03      0.04       305
     sadness       0.23      0.28      0.25      1033
    surprise       0.09      0.06      0.07       437
       worry       0.30      0.36      0.33      1692

    accuracy                           0.25      8000
   macro avg       0.14      0.14      0.14      8000
weighted avg       0.24      0.25  
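
To put the 0.25 accuracy in context, the support column of the report can be used to compute a majority-class baseline and the kind of "balanced" class weights that could be passed to `model.fit` through its `class_weight` argument. This snippet uses the test-set supports as a stand-in for the training distribution, which is reasonable here only because the split was stratified; in practice the weights should be computed from the training labels.

```python
# Per-class support copied from the classification report (alphabetical, as LabelEncoder orders them)
support = {
    "anger": 22, "boredom": 36, "empty": 165, "enthusiasm": 152, "fun": 355,
    "happiness": 1042, "hate": 265, "love": 768, "neutral": 1728, "relief": 305,
    "sadness": 1033, "surprise": 437, "worry": 1692,
}
total = sum(support.values())                     # 8000 test tweets
majority_label = max(support, key=support.get)    # most frequent class
baseline_acc = support[majority_label] / total    # accuracy of always predicting it

# "Balanced" heuristic: n_samples / (n_classes * class_count), keyed by class index
class_weight = {
    idx: total / (len(support) * count)
    for idx, count in enumerate(support.values())
}
print(f"majority baseline ({majority_label}): {baseline_acc:.3f}")
print(f"weight anger={class_weight[0]:.2f}, neutral={class_weight[8]:.2f}")
```

A model that always predicts `neutral` would score about 0.216, so the reported 0.25 is only slightly better than the trivial baseline. The rarest classes (`anger`, `boredom`) are never predicted correctly, which is why the macro average sits well below the weighted one.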

In [10]:
def predict_emotion(text):
    seq = tokenizer.texts_to_sequences([clean_text(text)])
    pad = pad_sequences(seq, maxlen=MAX_LEN, padding="post")
    pred = model.predict(pad)
    return label_encoder.classes_[np.argmax(pred)]

print(predict_emotion("I am so happy today!"))
print(predict_emotion("This is the worst day ever."))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 100ms/step
happiness
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step
sadness
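
Because `predict_emotion` returns only the arg-max label, it hides how confident the model is. A variant that returns the top few labels with their probabilities might look like the following. The function `top_k_emotions` and the example probability vector are hypothetical; in the notebook the inputs would be `model.predict(pad)[0]` and `label_encoder.classes_`.

```python
def top_k_emotions(probs, classes, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, float(p)) for label, p in ranked[:k]]

# Made-up probabilities over a reduced label set, for illustration only
classes = ["happiness", "love", "neutral", "sadness", "worry"]
probs = [0.40, 0.25, 0.20, 0.05, 0.10]
print(top_k_emotions(probs, classes))
# [('happiness', 0.4), ('love', 0.25), ('neutral', 0.2)]
```

Given the low test accuracy reported above, showing the runner-up labels makes it easier to see when a prediction is a near tie rather than a confident call.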
