***Load and Explore Dataset:***

Loads the GoEmotions dataset, a multi-label emotion classification dataset consisting of text samples and associated emotions. Performs initial data exploration to understand dimensions, view sample records, and identify any missing values or dominant emotion categories.



In [None]:
import pandas as pd

# Load the CSV file
file_path = "/kaggle/input/go-emotions/go_emotions_dataset.csv"
df = pd.read_csv(file_path)

# Show basic info about the dataset
print("Shape of dataset:", df.shape)
print("\nColumn Names:", df.columns.tolist())

# Display the first 5 rows
print("\nSample data:")
print(df.head())

# Check for missing values
print("\nMissing values per column:")
print(df.isnull().sum())

# Optional: Check unique categories (assuming 'categories' column exists)
if 'categories' in df.columns:
    print("\nUnique primary categories (top 10):")
    print(df['categories'].value_counts().head(10))


Shape of dataset: (211225, 31)

Column Names: ['id', 'text', 'example_very_unclear', 'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']

Sample data:
        id                                               text  \
0  eew5j0j                                    That game hurt.   
1  eemcysk   >sexuality shouldn’t be a grouping category I...   
2  ed2mah1     You do right, if you don't care then fuck 'em!   
3  eeibobj                                 Man I love reddit.   
4  eda6yn6  [NAME] was nowhere near them, he was by the Fa...   

   example_very_unclear  admiration  amusement  anger  annoyance  approval  \
0                 False           0          0      0          0         0   
1                  Tr
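
The categories check in the cell above never fires for this CSV, because the column list shows one 0/1 column per label and no single categories column. A minimal sketch of how the dominant labels could be counted instead is shown below on a tiny hand-made frame. The three rows and the name toy are invented for illustration; on the real data the same calls would be applied to df.

```python
import pandas as pd

# Tiny stand-in for the real DataFrame; the rows are invented for illustration.
toy = pd.DataFrame({
    "id": ["a", "b", "c"],
    "text": ["t1", "t2", "t3"],
    "example_very_unclear": [False, False, True],
    "joy": [1, 0, 0],
    "anger": [0, 1, 0],
    "neutral": [1, 1, 0],
})

# Everything after the first three columns is a 0/1 label indicator.
label_cols = toy.columns[3:]

# Column sums give the number of rows carrying each label.
label_counts = toy[label_cols].sum().sort_values(ascending=False)
print(label_counts)

# Rows that carry no label at all.
n_unlabeled = int((toy[label_cols].sum(axis=1) == 0).sum())
print("Rows without any label:", n_unlabeled)
```

Summing along axis=1 as well is a cheap way to spot rows that carry no label, which would otherwise go unnoticed in a multi-label setup.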

***Inspect Data Columns:***

Displays all column names in the dataset to confirm availability of essential fields like text, labels, and optional metadata. This helps guide preprocessing and modeling decisions.

In [None]:
print(df.columns)


Index(['id', 'text', 'example_very_unclear', 'admiration', 'amusement',
       'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity',
       'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment',
       'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love',
       'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse',
       'sadness', 'surprise', 'neutral'],
      dtype='object')


***Import Core Libraries:***

Imports required libraries for data manipulation (NumPy, Pandas), visualization (Matplotlib, Seaborn), deep learning (TensorFlow, Keras), and evaluation (Scikit-learn). These tools form the foundation for preprocessing, model building, and performance tracking.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.metrics import AUC, Precision, Recall
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score, classification_report
import time
from tqdm import tqdm

***Import Modeling Utilities:***

Loads specific TensorFlow/Keras layers and metrics used for building a custom deep learning model. Also includes roc_auc_score and f1_score to evaluate multi-label classification tasks.

In [None]:
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dropout, Dense, GlobalAveragePooling1D, Lambda
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import AUC, Precision, Recall
from sklearn.metrics import f1_score, roc_auc_score


***Custom Metrics Callback:***

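Implements a custom Keras callback that monitors performance on the validation set at the end of each epoch. It evaluates several thresholds (0.1 to 0.5 in steps of 0.1) for converting probabilities into binary predictions and keeps the threshold with the highest macro F1-score. Because the best score and threshold are carried over between epochs, the reported values are the best seen so far in the run, not necessarily those of the current epoch. Tuning the threshold is critical for multi-label settings like emotion classification. The callback also prints the micro-averaged ROC AUC for each epoch.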

In [None]:
# ================================
# Custom Metrics Callback
# ================================
class MetricsCallback(tf.keras.callbacks.Callback):
    def __init__(self, X_val, y_val, thresholds=np.arange(0.1, 0.6, 0.1)):
        super().__init__()
        self.X_val = X_val
        self.y_val = y_val
        self.thresholds = thresholds
        self.best_threshold = 0.5
        self.best_macro_f1 = 0.0

    def on_epoch_end(self, epoch, logs=None):
        y_pred_probs = self.model.predict(self.X_val, verbose=0)
        for threshold in self.thresholds:
            y_pred_bin = (y_pred_probs >= threshold).astype(int)
            macro_f1 = f1_score(self.y_val, y_pred_bin, average='macro', zero_division=0)
            if macro_f1 > self.best_macro_f1:
                self.best_macro_f1 = macro_f1
                self.best_threshold = threshold

        try:
            micro_auc = roc_auc_score(self.y_val, y_pred_probs, average='micro')
        except ValueError:
            micro_auc = float('nan')

        print(f"🔹 [Epoch {epoch+1}] Best Macro F1: {self.best_macro_f1:.4f} at threshold={self.best_threshold:.2f}")
        print(f"🔹 [Epoch {epoch+1}] Micro AUC: {micro_auc:.4f}")
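
To make the threshold search concrete, here is a pure-Python sketch of what the loop in on_epoch_end computes, run on invented probabilities. The helpers f1_for_label and macro_f1 are hand-written stand-ins meant to mirror f1_score with average='macro' and zero_division=0; they are not scikit-learn's implementation.

```python
def f1_for_label(y_true, y_pred):
    """F1 for one label; 0.0 when there are no true or predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_pred = [row[j] for row in y_pred]
        scores.append(f1_for_label(col_true, col_pred))
    return sum(scores) / n_labels


# Invented validation data: 4 samples, 2 labels.
y_val = [[1, 0], [0, 1], [1, 1], [0, 0]]
probs = [[0.45, 0.05], [0.15, 0.35], [0.55, 0.25], [0.05, 0.15]]

best_threshold, best_score = 0.5, 0.0
for threshold in [0.1, 0.2, 0.3, 0.4, 0.5]:
    y_bin = [[int(p >= threshold) for p in row] for row in probs]
    score = macro_f1(y_val, y_bin)
    if score > best_score:  # strict ">" keeps the lowest threshold on ties
        best_score, best_threshold = score, threshold

print(best_threshold, best_score)
```

On this toy data the lowest threshold over-predicts the second label and the higher ones start missing positives, so the sweep settles on an intermediate value.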

***Load & Prepare Multi-label Emotion Data:***

We load the GoEmotions dataset and prepare it for training by extracting the input texts and the corresponding multi-label emotion annotations (every column after id, text, and example_very_unclear). The rows are split into training, validation, and test sets in an 80/10/10 ratio with a fixed random seed, so the model is trained, tuned, and evaluated on disjoint sets of rows. The split is by row only: it does not check whether the same text appears in more than one row.

In [None]:
# =========================================
# Load and Preprocess GoEmotions Dataset
# =========================================
file_path = "/kaggle/input/go-emotions/go_emotions_dataset.csv"
df = pd.read_csv(file_path)

# Extract texts and multi-labels
texts = df['text'].tolist()
label_columns = df.columns[3:]  # skip id, text, example_very_unclear
labels = df[label_columns].values

# Split the dataset
X_train, X_temp, y_train, y_temp = train_test_split(texts, labels, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
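
The arithmetic behind the 80/10/10 split can be checked by hand. The sketch below assumes that a fractional test_size gives ceil(test_size * n) held-out rows with the remainder kept on the training side; that rounding rule is an assumption about train_test_split, but the resulting sizes agree with the lengths shown in the tokenization progress bars later in the notebook.

```python
import math

n_rows = 211225  # number of rows reported by df.shape

# First split: 20% held out, 80% kept for training.
n_temp = math.ceil(0.2 * n_rows)
n_train = n_rows - n_temp

# Second split: the held-out part is halved into validation and test.
n_test = math.ceil(0.5 * n_temp)
n_val = n_temp - n_test

print(n_train, n_val, n_test)
```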


***Tokenization & Sequence Padding***

Texts are converted into sequences of integers using a Tokenizer fitted on the training texts only and limited to the 20,000 most frequent words; unseen words map to the <OOV> token. Each sequence is padded or truncated to a fixed length of 100 tokens to ensure a consistent input size for the model. This step prepares raw text for the embedding and transformer layers.

In [None]:
# =========================================
# Tokenization & Padding
# =========================================
tokenizer = keras.preprocessing.text.Tokenizer(num_words=20000, oov_token='<OOV>')
tokenizer.fit_on_texts(X_train)

maxlen = 100

def encode_pad(texts):
    seqs = [tokenizer.texts_to_sequences([t])[0] for t in tqdm(texts, desc="Tokenizing")]
    return keras.preprocessing.sequence.pad_sequences(seqs, maxlen=maxlen)

X_train_pad = encode_pad(X_train)
X_val_pad = encode_pad(X_val)
X_test_pad = encode_pad(X_test)

Tokenizing: 100%|██████████| 168980/168980 [00:02<00:00, 70556.45it/s]
Tokenizing: 100%|██████████| 21122/21122 [00:00<00:00, 78746.49it/s]
Tokenizing: 100%|██████████| 21123/21123 [00:00<00:00, 78295.91it/s]
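
What the tokenizer and pad_sequences do to a single sentence can be illustrated without Keras. The sketch below is a simplified stand-in: it splits on whitespace only, whereas the Keras Tokenizer also filters punctuation. The helpers build_vocab, encode, and pad_pre are hypothetical names introduced here. It does follow the same conventions as the cell above: id 0 for padding, id 1 for out-of-vocabulary words, and zeros added on the left, which is the pad_sequences default.

```python
from collections import Counter


def build_vocab(train_texts, num_words):
    """Map the most frequent lowercase words to integer ids.

    Id 0 is reserved for padding and id 1 for out-of-vocabulary words,
    so at most num_words - 2 real words receive their own id here.
    """
    counts = Counter(w for t in train_texts for w in t.lower().split())
    vocab = {"<OOV>": 1}
    for word, _ in counts.most_common(num_words - 2):
        vocab[word] = len(vocab) + 1
    return vocab


def encode(text, vocab):
    """Replace each word by its id, falling back to the OOV id."""
    return [vocab.get(w, vocab["<OOV>"]) for w in text.lower().split()]


def pad_pre(seq, maxlen):
    """Left-pad with zeros, or keep only the last maxlen ids if too long."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


vocab = build_vocab(["I love reddit", "I love games"], num_words=5)
padded = pad_pre(encode("I love chess", vocab), maxlen=5)
print(padded)
```

Because the vocabulary is built from the training texts only, a word such as "chess" that never appeared there falls back to the OOV id at inference time.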


***Define Transformer Block***

We define a custom Transformer encoder block: multi-head self-attention followed by a two-layer feed-forward network, each wrapped in dropout, a residual connection, and layer normalization. This layer captures contextual relationships in the input sequence and is key to learning meaningful representations for each token.

In [None]:
class TransformerBlock(Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential([
            Dense(ff_dim, activation='relu'),
            Dense(embed_dim),
        ])
        self.norm1 = LayerNormalization(epsilon=1e-6)
        self.norm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, inputs, training=False, mask=None):
        attn_output = self.att(inputs, inputs, attention_mask=mask)
        out1 = self.norm1(inputs + self.dropout1(attn_output, training=training))
        ffn_output = self.ffn(out1)
        return self.norm2(out1 + self.dropout2(ffn_output, training=training))
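
Written as equations, the call method above computes the following, where x stands for inputs, h for out1, and y for the returned tensor. The symbols W_1, b_1, W_2, b_2 are notation introduced here for the weights of the two Dense layers in ffn.

$$
\begin{aligned}
h &= \mathrm{LayerNorm}\bigl(x + \mathrm{Dropout}(\mathrm{MHA}(x, x))\bigr) \\
y &= \mathrm{LayerNorm}\bigl(h + \mathrm{Dropout}(\mathrm{FFN}(h))\bigr) \\
\mathrm{FFN}(h) &= W_2\,\mathrm{ReLU}(W_1 h + b_1) + b_2
\end{aligned}
$$

Layer normalization is applied after each residual sum, so this is the post-norm arrangement of the encoder block.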

***Token + Positional Embedding Layer***

Since transformers have no inherent sense of order, we embed both tokens and their positions in the input. This custom layer ensures the model can learn not only what each word means but also where it appears in the sentence.

In [None]:
class TokenAndPositionEmbedding(Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[1], delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions
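
The addition in the last line relies on broadcasting: the position embeddings have shape (maxlen, embed_dim) and are added to every sequence in the batch. A small NumPy sketch of the same lookup-and-add follows, with random tables standing in for the learned Embedding weights and toy sizes that are not the notebook's.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, maxlen, embed_dim = 10, 4, 3  # toy sizes for illustration

token_table = rng.normal(size=(vocab_size, embed_dim))  # one row per token id
pos_table = rng.normal(size=(maxlen, embed_dim))        # one row per position

# Two already-padded sequences of token ids, shape (batch, maxlen).
x = np.array([[0, 0, 5, 7],
              [0, 2, 2, 9]])

tok = token_table[x]                     # (batch, maxlen, embed_dim)
pos = pos_table[np.arange(x.shape[1])]   # (maxlen, embed_dim)
out = tok + pos                          # broadcasts over the batch axis

print(out.shape)
```

In the second sequence the token id 2 occurs at positions 1 and 2; the two output vectors differ only by the difference of the corresponding position rows, which is how the model can tell the occurrences apart.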

***Build and Compile the Transformer Model***

This block constructs the final Transformer model: token and positional embeddings, four stacked Transformer blocks with a padding mask, global average pooling, and dense layers with dropout. It outputs multi-label predictions using a sigmoid activation across 28 labels (27 emotions plus neutral). The model is compiled with the Adam optimizer, binary cross-entropy loss, and AUC, precision, and recall metrics, all of which are important for evaluating multi-label classification.

In [None]:
# ================================
# Build and Compile Model
# ================================
vocab_size = 20000
embed_dim = 128
num_heads = 4
ff_dim = 256
num_classes = y_train.shape[1]  # 28 labels (27 emotions + neutral)

inputs = Input(shape=(maxlen,))
embedding = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)(inputs)
mask = Lambda(lambda x: tf.cast(tf.not_equal(x, 0), tf.int32)[:, tf.newaxis, tf.newaxis, :])(inputs)

x = embedding
for _ in range(4):
    x = TransformerBlock(embed_dim, num_heads, ff_dim)(x, mask=mask)

x = GlobalAveragePooling1D()(x)
x = Dropout(0.3)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.3)(x)
outputs = Dense(num_classes, activation='sigmoid')(x)  # sigmoid for multi-label

model = Model(inputs, outputs)
model.compile(
    optimizer=Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=[
        AUC(name='auc', multi_label=True),
        Precision(name='precision'),
        Recall(name='recall')
    ])
model.summary()
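
The pairing of a sigmoid output layer with binary cross-entropy treats each label as its own yes/no decision. The sketch below works through one sample by hand; the logits and targets are invented, and binary_crossentropy is a hand-written helper that averages the per-label losses, written here only to show the arithmetic.

```python
import math


def sigmoid(z):
    """Squash a real-valued logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of the per-label log losses for one sample."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


logits = [2.0, -1.0, 0.5]  # invented outputs for three labels
probs = [sigmoid(z) for z in logits]
target = [1, 0, 1]         # two labels active at the same time

print([round(p, 3) for p in probs], round(sum(probs), 3))
print(round(binary_crossentropy(target, probs), 4))
```

Unlike a softmax, the probabilities are independent and need not sum to 1, which is what allows several emotions to be predicted for the same text.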

***Define and Compile the Model***

The model architecture for multi-label emotion classification is the one defined and compiled in the cell above: learned token and positional embeddings (not BERT or other pretrained embeddings), Transformer blocks, and dense layers ending in a sigmoid activation to handle multi-label outputs. It is compiled with binary cross-entropy loss and AUC, precision, and recall metrics.

***Train the Model with Custom Callback***

We train the model on the training set with validation monitoring and three callbacks:

EarlyStopping halts training once validation loss has not improved for 3 consecutive epochs and restores the best weights.

ModelCheckpoint saves the best model, judged by validation loss, to best_goemotions_model.keras.

MetricsCallback is the custom callback defined earlier; it tracks the best macro F1 score and the threshold that produced it.

The model is trained for up to 20 epochs with a batch size of 64, and the total training time is logged. The same cell then evaluates the model on the test set using the threshold found by MetricsCallback.

In [None]:



# ================================
# Train Model with Custom Callback
# ================================
metrics_callback = MetricsCallback(X_val_pad, y_val)

callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint('best_goemotions_model.keras', save_best_only=True),
    metrics_callback
]

start_time = time.time()
history = model.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    verbose=2,
    callbacks=callbacks
)
training_time = time.time() - start_time
print("✅ Training completed in {:.2f} seconds".format(training_time))

# ================================
# Final Test Evaluation
# ================================
best_threshold = metrics_callback.best_threshold
print(f"\n🏁 Best threshold from training: {best_threshold:.2f}")

y_pred_probs = model.predict(X_test_pad, batch_size=64)
y_pred_bin = (y_pred_probs >= best_threshold).astype(int)

# Final metrics
macro_f1 = f1_score(y_test, y_pred_bin, average='macro', zero_division=0)
try:
    micro_auc = roc_auc_score(y_test, y_pred_probs, average='micro')
except ValueError:
    micro_auc = float('nan')

print(f"🎯 Final Macro F1-score on Test: {macro_f1:.4f}")
print(f"🎯 Final Micro AUC-score on Test: {micro_auc:.4f}")


Epoch 1/20
🔹 [Epoch 1] Best Macro F1: 0.1066 at threshold=0.10
🔹 [Epoch 1] Micro AUC: 0.8152
2641/2641 - 107s - 41ms/step - auc: 0.5899 - loss: 0.1551 - precision: 0.4963 - recall: 0.0281 - val_auc: 0.6937 - val_loss: 0.1397 - val_precision: 0.7310 - val_recall: 0.0638
Epoch 2/20
🔹 [Epoch 2] Best Macro F1: 0.1958 at threshold=0.10
🔹 [Epoch 2] Micro AUC: 0.8469
2641/2641 - 68s - 26ms/step - auc: 0.7109 - loss: 0.1387 - precision: 0.6389 - recall: 0.0946 - val_auc: 0.7590 - val_loss: 0.1326 - val_precision: 0.6372 - val_recall: 0.1142
Epoch 3/20
🔹 [Epoch 3] Best Macro F1: 0.1974 at threshold=0.10
🔹 [Epoch 3] Micro AUC: 0.8480
2641/2641 - 69s - 26ms/step - auc: 0.7481 - loss: 0.1348 - precision: 0.6352 - recall: 0.1147 - val_auc: 0.7588 - val_loss: 0.1323 - val_precision: 0.6296 - val_recall: 0.1196
Epoch 4/20
🔹 [Epoch 4] Best Macro F1: 0.2133 at threshold=0.10
🔹 [Epoch 4] Micro AUC: 0.8510
2641/2641 - 68s - 26ms/step - auc: 0.7665 - loss: 0.1314 - precision: 0.6485 - recall: 0.1372 - val
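
The EarlyStopping settings used in the training cell (patience=3, restore_best_weights=True) can be traced by hand. The sketch below is a simplified re-implementation of the patience rule, not Keras code; early_stopping_trace is a hypothetical helper, and the loss values are invented rather than taken from the run above.

```python
def early_stopping_trace(val_losses, patience):
    """Return (stopped_epoch, best_epoch), both 1-based.

    stopped_epoch is None if training runs through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Invented validation losses for illustration.
losses = [0.50, 0.40, 0.42, 0.41, 0.43, 0.39]
stopped, best = early_stopping_trace(losses, patience=3)
print(stopped, best)
```

In this trace training stops before the last value is ever seen, and the weights that would be restored are those from the best epoch, not the final one. A threshold tuned on a different epoch's predictions may therefore not match the restored weights exactly.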

In [None]:
# Sample test sentences (you can replace or load from CSV)
new_texts = [
    "I'm extremely happy and proud of what I achieved!",
    "This is the worst thing ever. I'm so mad.",
    "I'm feeling so lost and unsure about everything.",
    "Wow, what a surprise! Totally unexpected.",
    "I really appreciate your kindness."
]


In [None]:
# Encode and pad the new sentences with the encode_pad function defined earlier
X_new_pad = encode_pad(new_texts)


Tokenizing: 100%|██████████| 5/5 [00:00<00:00, 18493.40it/s]


***Evaluate Model on Test Set***

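The test-set evaluation runs at the end of the training cell above, using the best threshold found on the validation set. It reports:

Macro F1-score: the unweighted mean of the per-label F1 scores, so every emotion class counts equally regardless of how often it occurs.

Micro AUC: the area under the ROC curve computed over all (sample, label) pairs pooled together; unlike F1 it does not depend on the threshold.

Together these give a fuller picture of the model's multi-label classification performance. The next notebook cell applies the same threshold to the new example sentences.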
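
The "micro" in micro AUC means that the label matrix is flattened into one long binary problem before scoring. A pure-Python sketch on invented data is given below. It uses the rank interpretation of AUC (the probability that a random positive outscores a random negative); the helpers auc_from_scores and micro_auc are hand-written for illustration and are not scikit-learn's implementation.

```python
def auc_from_scores(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_auc(y_true, y_prob):
    """Flatten the (sample, label) grid into one binary problem, then score it."""
    flat_true = [t for row in y_true for t in row]
    flat_prob = [p for row in y_prob for p in row]
    return auc_from_scores(flat_true, flat_prob)


# Invented data: 3 samples, 2 labels.
y_true = [[1, 0], [0, 1], [0, 0]]
y_prob = [[0.9, 0.2], [0.4, 0.3], [0.1, 0.2]]
print(micro_auc(y_true, y_prob))
```

Because all pairs are pooled, frequent labels contribute more comparisons than rare ones, which is the opposite emphasis from macro F1.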

In [None]:
# Predict probabilities
y_pred_probs_new = model.predict(X_new_pad, batch_size=64)

# Apply threshold from training
y_pred_bin_new = (y_pred_probs_new >= best_threshold).astype(int)


1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step


***Make Predictions on New Text Inputs***

We now apply the trained model to new, unseen text examples. Probabilities are predicted for each emotion label and binarized using the best threshold; the cell below maps each 0/1 vector back to its emotion names. This step simulates real-world inference where user-generated text is input and emotions are predicted.

In [None]:
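# GoEmotions label names, in the same order as the label columns in the CSV
# (note: this rebinds `labels`, which earlier held the label matrix)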
labels = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization", "relief",
    "remorse", "sadness", "surprise", "neutral"
]

# Format predictions
for text, bin_preds in zip(new_texts, y_pred_bin_new):
    predicted_emotions = [label for label, val in zip(labels, bin_preds) if val == 1]
    print(f"\n📝 Text: {text}")
    print(f"🔍 Predicted Emotions: {predicted_emotions if predicted_emotions else ['None']}")


***Display Final Cleaned Predictions***

Before display, the predictions are cleaned: neutral is dropped whenever at least one other emotion is predicted for the same sentence. The cleaned predictions are then shown for each input sentence, illustrating how the model assigns fine-grained emotion labels instead of a simple positive/negative/neutral sentiment.

In [None]:
# If any non-neutral emotion is predicted, remove 'neutral'
def clean_emotions(preds, labels):
    cleaned = []
    for pred in preds:
        emotions = [label for i, label in enumerate(labels) if pred[i] == 1]
        if 'neutral' in emotions and len(emotions) > 1:
            emotions.remove('neutral')
        cleaned.append(emotions)
    return cleaned

# Apply to your predictions
cleaned_preds = clean_emotions(y_pred_bin_new, labels)

# Display
for text, emotions in zip(new_texts, cleaned_preds):
    print(f"\n📝 Text: {text}")
    print(f"🎯 Final Emotions: {emotions if emotions else ['None']}")



📝 Text: I'm extremely happy and proud of what I achieved!
🎯 Final Emotions: ['admiration', 'joy']

📝 Text: This is the worst thing ever. I'm so mad.
🎯 Final Emotions: ['anger', 'annoyance', 'disgust']

📝 Text: I'm feeling so lost and unsure about everything.
🎯 Final Emotions: ['disapproval']

📝 Text: Wow, what a surprise! Totally unexpected.
🎯 Final Emotions: ['approval']

📝 Text: I really appreciate your kindness.
🎯 Final Emotions: ['admiration', 'gratitude', 'joy']
