# <p style="padding:10px;background-color:#264653 ;margin:0;color:white;font-family:Arial, sans-serif;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Text Emotion Classifier : BERT&CNN  Duo💼 🗝️</p>

<div style="background-color:#08A4BD; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Arial, sans-serif; font-size:24px; color:#000000; text-align:center;">🎯 Project Goal</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">Welcome to our journey of emotion classification! 🌈 Our goal is to train a powerful model that can accurately identify emotions from text data. We'll be fine-tuning a BERT model, and trust me, we have some exciting architectural enhancements up our sleeves to take it to the next level! 💪</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">For our dataset, we'll be using the incredible Emotions dataset from Kaggle. 📂 This dataset contains two key columns: 'text' and 'label'. The 'label' column represents six different emotion classes: sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5). Get ready to dive deep into the world of human emotions! 🌊</p>
</div>
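
The label encoding described above can be kept in a small lookup table so that predictions and reports show emotion names instead of integers. This is only a minimal sketch: `EMOTION_NAMES` and `label_to_name` are illustrative names that do not appear in the notebook.

```python
# Label ids and emotion names as listed in the dataset description above.
EMOTION_NAMES = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}


def label_to_name(label):
    """Translate an integer class id (or a float such as 4.0) into its emotion name."""
    return EMOTION_NAMES[int(label)]


print(label_to_name(4))  # fear
```

The same table could be passed to a report as `target_names=[EMOTION_NAMES[i] for i in range(6)]`.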

<div style="background-color:#264653; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Arial, sans-serif; font-size:24px; color:white; text-align:center;">📥 Importing Libraries</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:white;">Before we embark on our journey, we need to gather our tools. 🛠️ In this phase, we import the essential libraries that will power our project:</p>

<ul style="font-family:Arial, sans-serif; font-size:16px; color:white;">
   <li>PyTorch: The backbone of our deep learning models 🔥</li>
   <li>Transformers: Allowing us to harness the power of pre-trained language models like BERT 🌐</li>
   <li>And many other useful libraries for data manipulation, visualization, and evaluation 📊</li>
</ul>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:white;">With these tools at our disposal, we're ready to embark on our quest for emotion classification! 🚀</p>
</div>

In [1]:
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
from sklearn.model_selection import train_test_split
from torch.nn.parallel import DataParallel
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModel


In [2]:
df = pd.read_csv("/kaggle/input/emotions/text.csv")
df.head(3)

Unnamed: 0.1,Unnamed: 0,text,label
0,0,i just feel really helpless and heavy hearted,4
1,1,ive enjoyed being able to slouch about relax a...,0
2,2,i gave up my internship with the dmrg and am f...,4


<div style="background-color:#08A4BD; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Arial, sans-serif; font-size:24px; color:#000000; text-align:center;">📊 Data Analysis</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">Now, let's dive into our dataset! 🏊‍♀️ In this phase, we'll get up close and personal with our data:</p>

<ol style="font-family:Arial, sans-serif; font-size:16px; color:#000000;">
   <li>We'll read the dataset into a cozy DataFrame 📖</li>
   <li>Next, we'll inspect the value counts of the 'label' column to identify any imbalances in our classes 📈</li>
</ol>

In [3]:
df.shape

(416809, 3)

In [4]:
df.label.value_counts()


label
1    141067
0    121187
3     57317
4     47712
2     34554
5     14972
Name: count, dtype: int64

<p  style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#333333;"><span style="font-weight:bold; color:#4B7BE5;">Key Insight:</span> Our analysis revealed a clear class imbalance: joy (1) has 141,067 rows while surprise (5) has only 14,972, a ratio of roughly 9 to 1. 🕵️‍♀️</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#333333;">Knowledge is power, and this initial exploration will guide us in preparing our data for modeling. 💡</p>
</div>
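
To put a number on the imbalance, the counts printed above can be turned into class shares and a largest-to-smallest ratio. This is a small sketch in plain Python; `label_counts` simply copies the `value_counts()` output, and `imbalance_summary` is a hypothetical helper name.

```python
# Class counts copied from the value_counts() output above.
label_counts = {1: 141067, 0: 121187, 3: 57317, 4: 47712, 2: 34554, 5: 14972}


def imbalance_summary(counts):
    """Return (total rows, share per class, largest-to-smallest class ratio)."""
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return total, shares, ratio


total, shares, ratio = imbalance_summary(label_counts)
print(f"rows: {total}")
for label in sorted(shares):
    print(f"class {label}: {shares[label]:.1%}")
print(f"largest/smallest class ratio: {ratio:.2f}")
```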

<div style="background-color:#4C212A; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Arial, sans-serif; font-size:24px; color:white; text-align:center;">🔧 Data Preparation</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:white;">With our insights from the data analysis phase, it's time to roll up our sleeves and prepare our data for modeling! 💪 In this phase, we'll:</p>

<ul style="font-family:Arial, sans-serif; font-size:16px; color:white;">
   <li>Create a custom PyTorch class to balance our dataset and control the number of rows we'll use 🎚️</li>
   <li>Perform a train/test split on our data 🔀</li>
   <li>Apply auto-tokenization and create data loaders to feed our hungry model 🍽️</li>
</ul>
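
The balancing step in the first bullet can also be written so that every class ends up with exactly the same number of rows. Below is a minimal sketch under that assumption. `balanced_sample` and its parameters are hypothetical names, not part of the notebook, and the toy frame is made up purely to show the call.

```python
import pandas as pd


def balanced_sample(df, total_rows, label_col="label", seed=42):
    """Draw the same number of rows from every class, then shuffle."""
    n_classes = df[label_col].nunique()
    per_class = total_rows // n_classes
    smallest = df[label_col].value_counts().min()
    if per_class > smallest:
        raise ValueError(
            f"Requested {per_class} rows per class, but the rarest class has {smallest}."
        )
    sampled = df.groupby(label_col).sample(n=per_class, random_state=seed)
    # Shuffle so that rows of the same class are not stored next to each other.
    return sampled.sample(frac=1, random_state=seed).reset_index(drop=True)


# Tiny made-up frame, used only to show the call.
toy = pd.DataFrame({
    "text": [f"sentence {i}" for i in range(12)],
    "label": [0] * 6 + [1] * 4 + [2] * 2,
})
subset = balanced_sample(toy, total_rows=6)
print(subset["label"].value_counts().to_dict())
```

With `total_rows=25000` and six classes, this approach would keep 4,166 rows per class (24,996 rows in total).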

In [5]:
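class TextStratifiedData(Dataset):
    """Downsample every class to the size of the rarest one, then draw `length` rows."""

    def __init__(self, df, length=None):
        if length is not None and length > df.shape[0]:
            raise ValueError("Length parameter cannot be greater than the size of the dataset.")
        self.length = length if length is not None else len(df)
        self.df = self.stratify(df)

    def stratify(self, df):
        # Keep min_count rows per class so every class is equally represented.
        min_count = df['label'].value_counts().min()
        df = df.groupby('label').apply(lambda x: x.sample(min_count)).reset_index(drop=True)
        # The final draw is a plain random sample, so class counts end up close to
        # equal rather than identical. Cap it at the size of the balanced frame.
        return df.sample(min(self.length, len(df)))

    # PyTorch's Dataset protocol looks for __len__ and __getitem__.
    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, idx):
        return self.df.iloc[idx, :]

    def get_all(self):
        return self.df


df = TextStratifiedData(df, 25000)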




In [6]:
df = df.get_all()

In [7]:
df.label.value_counts()

label
1    4251
0    4203
2    4177
4    4144
3    4114
5    4111
Name: count, dtype: int64

<p  style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#333333;"><span style="font-weight:bold; color:#4B7BE5;">Key Insight:</span> The labels are now roughly balanced, with between 4,111 and 4,251 rows per class, so our class worked. 🕵️‍♀️</p>
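
A balanced subset does not by itself guarantee balanced train and validation splits, because a plain random split can put slightly different class counts on each side. One option is the `stratify` argument of scikit-learn's `train_test_split`. The sketch below uses made-up sentences and labels purely to show the call; it is not the split used for the results reported in this notebook.

```python
from sklearn.model_selection import train_test_split

# Made-up data: 60 sentences, 10 per class.
texts = [f"sentence {i}" for i in range(60)]
labels = [i % 6 for i in range(60)]

train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts,
    labels,
    test_size=0.2,
    random_state=42,
    stratify=labels,  # keep the class proportions identical in both splits
)

print(len(train_texts), len(val_texts))
print({c: val_labels.count(c) for c in range(6)})
```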



In [8]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

train_texts, val_texts, train_labels, val_labels = train_test_split(df["text"], df["label"], test_size=0.2, random_state=42)

train_texts = train_texts.tolist()
val_texts = val_texts.tolist()
train_labels = np.array(train_labels)
val_labels = np.array(val_labels)

train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=128, return_tensors='pt')
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=128, return_tensors='pt')

# nn.CrossEntropyLoss expects integer class indices, so store the labels as long tensors.
train_dataset = torch.utils.data.TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], torch.tensor(train_labels, dtype=torch.long))
val_dataset = torch.utils.data.TensorDataset(val_encodings['input_ids'], val_encodings['attention_mask'], torch.tensor(val_labels, dtype=torch.long))

train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=32)



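<p  style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#333333;"><span style="font-weight:bold; color:#4B7BE5;">Key Insight:</span> The full dataset has 416,809 rows, so we start with a 25,000-row subset for our initial experiments. 🔍</p>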
</div>
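
The tokenizer call above uses `truncation=True, padding=True, max_length=128`. The toy function below sketches what those options do to a batch: sequences are cut to `max_length`, padded to the longest one in the batch, and paired with an attention mask that marks real tokens with 1 and padding with 0. It is a simplification with made-up token ids; the real tokenizer also keeps the closing special token when it truncates, and `pad_batch` is a hypothetical helper, not a Transformers API.

```python
def pad_batch(token_id_lists, max_length=128, pad_id=0):
    """Truncate to max_length, then right-pad every sequence to the longest one in the batch."""
    truncated = [ids[:max_length] for ids in token_id_lists]
    width = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in truncated]
    return input_ids, attention_mask


# Made-up token ids for two sentences of different lengths.
batch = [[101, 7, 8, 9, 102], [101, 7, 102]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids)       # [[101, 7, 8, 9, 102], [101, 7, 102, 0, 0]]
print(attention_mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```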


<div style="background-color:#08A4BD; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Georgia, serif; font-size:24px; color:#000000; text-align:center;">⚙️ Modeling: Approach 1</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">In our first modeling approach, we'll be building a powerful architecture fueled by the mighty BERT! 💥 Here's what we've got in store:</p>

<ol style="font-family:Arial, sans-serif; font-size:16px; color:#000000;">
   <li>We'll create a model class with BERT as the backbone, feeding BERT's pooled output into a fully connected layer 🧠</li>
   <li>To harness the power of Kaggle's dual GPU accelerators, we'll wrap the model in <code>nn.DataParallel</code> 🔥</li>
   <li>We'll use a cross-entropy criterion, which applies log-softmax to the model's raw logits internally, so the model itself needs no softmax layer 🎯</li>
   <li>Finally, we'll implement training and testing loops to optimize our model's performance 🏋️‍♀️</li>
</ol>
</div>
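
To make the third step concrete, here is what a cross-entropy criterion computes for a single example: the negative log of the softmax probability assigned to the true class. This is a plain-Python sketch with made-up logits. `nn.CrossEntropyLoss` performs the same calculation on tensors and, by default, averages it over the batch.

```python
import math


def cross_entropy_from_logits(logits, target):
    """Cross-entropy for one example: -log(softmax(logits)[target])."""
    # Subtract the max logit for numerical stability; softmax is unchanged by this shift.
    top = max(logits)
    shifted = [z - top for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    log_probs = [z - log_sum_exp for z in shifted]
    return -log_probs[target]


logits = [2.0, 0.5, -1.0, 0.0, 0.1, -0.3]  # made-up scores for the six classes
print(round(cross_entropy_from_logits(logits, target=0), 4))  # low loss: class 0 has the top score
print(round(cross_entropy_from_logits(logits, target=2), 4))  # high loss: class 2 has the lowest score
```

Because the softmax lives inside the loss, the model's `forward` should return raw logits, which is what the classifier in this notebook does.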

In [9]:
class EmotionClassifier(nn.Module):
    def __init__(self, transformer_model, num_classes):
        super(EmotionClassifier, self).__init__()
        self.transformer = transformer_model
        self.fc = nn.Linear(768, num_classes)  
        
    def forward(self, input_ids, attention_mask):
        output = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = output.pooler_output 
        logits = self.fc(pooled_output)
        return logits

    


num_classes =6  

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")



model = EmotionClassifier(model, num_classes)
model = DataParallel(model)
model = model.to(device)



optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()

num_epochs=3
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_correct = 0
    
    for batch in train_dataloader:
        input_ids, attention_mask, labels = batch
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        # CrossEntropyLoss expects integer class indices
        labels = labels.to(device).long()

        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask)  # raw logits, shape (batch, num_classes)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()
    
    train_loss = total_loss / len(train_dataloader)
    train_accuracy = total_correct / len(train_dataset)
    
    # Validation loop
    model.eval()
    total_val_loss = 0.0
    total_val_correct = 0
    val_predicted = []
    val_labels = []
    
    with torch.no_grad():
        for batch in val_dataloader:
            input_ids, attention_mask, labels = batch
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            labels = labels.to(device).long()

            outputs = model(input_ids, attention_mask)
            loss = criterion(outputs, labels)

            total_val_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total_val_correct += (predicted == labels).sum().item()
            val_predicted.extend(predicted.cpu().numpy())
            val_labels.extend(labels.cpu().numpy())
    




<div style="background-color:#264653; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Georgia, serif; font-size:24px; color:white; text-align:center;">📈 Results: Approach 1</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:white;">Our first modeling approach has yielded some promising results! 🎉 With this architecture, we achieved a macro-averaged F1 score of around 95% on the 5,000-row validation set. 🏆</p>
</div>
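
The F1 score quoted here is the macro average: F1 is computed per class from precision and recall, then averaged without weighting by class size. The sketch below spells that out in plain Python on a made-up example. `macro_f1` is a hypothetical helper written for illustration; the notebook itself relies on scikit-learn's `classification_report`.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / num_classes


# Made-up labels and predictions for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred, num_classes=3), 3))  # 0.656
```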

In [10]:
from sklearn.metrics import classification_report


val_predicted = np.array(val_predicted)
val_labels = np.array(val_labels)


report = classification_report(val_labels, val_predicted, target_names=[f'Class {i}' for i in range(num_classes)])

print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

     Class 0       0.98      0.95      0.96       903
     Class 1       0.98      0.91      0.95       815
     Class 2       0.94      0.99      0.96       843
     Class 3       0.96      0.94      0.95       818
     Class 4       0.94      0.89      0.91       795
     Class 5       0.91      1.00      0.95       826

    accuracy                           0.95      5000
   macro avg       0.95      0.95      0.95      5000
weighted avg       0.95      0.95      0.95      5000



<div style="background-color:#08A4BD; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Georgia, serif; font-size:24px; color:#000000; text-align:center;">⚙️ Modeling: Approach 2</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">For our second modeling approach, we're going to take things up a notch! 🔥 Building on the previous architecture, we insert a 1D convolutional layer with a ReLU activation and max pooling between BERT's pooled output and the final fully connected layer. The model code in this notebook does not include dropout.</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">Why, you ask? 🤔 The extra layer gives the classification head more capacity, which may help it pick up patterns that a single linear layer misses. 🚀</p>
</div>
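
One caveat about the design: the `EmotionClassifierWithConv` class in this notebook convolves over BERT's `pooler_output`, which is a single vector per sentence, so the kernel never slides across tokens. A common alternative, sketched below, applies the convolution to the per-token hidden states (`last_hidden_state`) and then takes the maximum over the sequence. This is not the model trained here. `ConvHead`, its dropout rate, and the random input tensor are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvHead(nn.Module):
    """Conv1d + max-over-time pooling on per-token hidden states (hypothetical alternative head)."""

    def __init__(self, hidden_size=768, num_filters=256, kernel_size=3, num_classes=6, dropout=0.1):
        super().__init__()
        self.conv = nn.Conv1d(hidden_size, num_filters, kernel_size, padding=kernel_size // 2)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); Conv1d expects (batch, hidden, seq_len).
        x = F.relu(self.conv(hidden_states.transpose(1, 2)))
        # Exclude padded positions from the max by setting them to -inf.
        x = x.masked_fill(attention_mask.unsqueeze(1) == 0, float("-inf"))
        pooled = x.max(dim=2).values  # (batch, num_filters)
        return self.fc(self.dropout(pooled))


# Random tensor standing in for BERT's last_hidden_state.
hidden = torch.randn(4, 16, 768)
mask = torch.ones(4, 16, dtype=torch.long)
mask[:, 10:] = 0  # pretend the last six positions are padding
logits = ConvHead()(hidden, mask)
print(logits.shape)  # torch.Size([4, 6])
```

Whether this helps on the Emotions dataset would need to be checked by training it; the scores reported in this notebook come from the pooled-output version.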

In [11]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

train_texts, val_texts, train_labels, val_labels = train_test_split(df["text"], df["label"], test_size=0.2, random_state=42)

train_texts = train_texts.tolist()
val_texts = val_texts.tolist()
train_labels = np.array(train_labels)
val_labels = np.array(val_labels)

train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=128, return_tensors='pt')
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=128, return_tensors='pt')

# nn.CrossEntropyLoss expects integer class indices, so store the labels as long tensors.
train_dataset = torch.utils.data.TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], torch.tensor(train_labels, dtype=torch.long))
val_dataset = torch.utils.data.TensorDataset(val_encodings['input_ids'], val_encodings['attention_mask'], torch.tensor(val_labels, dtype=torch.long))

train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=32)


In [12]:
class EmotionClassifierWithConv(nn.Module):
    def __init__(self, transformer_model, num_classes, kernel_size=3, num_filters=256):
        super(EmotionClassifierWithConv, self).__init__()
        self.transformer = transformer_model
        # padding=1 keeps the output length equal to the input length for kernel_size=3
        self.conv = nn.Conv1d(in_channels=768, out_channels=num_filters, kernel_size=kernel_size, padding=1)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, input_ids, attention_mask):
        output = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = output.pooler_output        # (batch, 768)
        pooled_output = pooled_output.unsqueeze(2)  # (batch, 768, 1): a sequence of length 1

        # With a length-1 input, the kernel's outer taps only see zero padding,
        # so this convolution behaves like a linear layer followed by ReLU.
        conv_out = F.relu(self.conv(pooled_output))      # (batch, num_filters, 1)
        pooled_conv_out, _ = torch.max(conv_out, dim=2)  # (batch, num_filters)
        logits = self.fc(pooled_conv_out)
        return logits


In [13]:
num_classes =6  
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device_ids = [0, 1]  


model = EmotionClassifierWithConv(model, num_classes)
model = nn.DataParallel(model, device_ids=device_ids)
model = model.to(device)


optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()


num_epochs=3
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_correct = 0
    
    for batch in train_dataloader:
        input_ids, attention_mask, labels = batch
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        # CrossEntropyLoss expects integer class indices
        labels = labels.to(device).long()

        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask)  # raw logits, shape (batch, num_classes)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()
    
    train_loss = total_loss / len(train_dataloader)
    train_accuracy = total_correct / len(train_dataset)
    
    # Validation loop
    model.eval()
    total_val_loss = 0.0
    total_val_correct = 0
    val_predicted = []
    val_labels = []
    
    with torch.no_grad():
        for batch in val_dataloader:
            input_ids, attention_mask, labels = batch
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            labels = labels.to(device).long()

            outputs = model(input_ids, attention_mask)
            loss = criterion(outputs, labels)

            total_val_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total_val_correct += (predicted == labels).sum().item()
            val_predicted.extend(predicted.cpu().numpy())
            val_labels.extend(labels.cpu().numpy())
    

In [14]:
from sklearn.metrics import classification_report


val_predicted = np.array(val_predicted)
val_labels = np.array(val_labels)


report = classification_report(val_labels, val_predicted, target_names=[f'Class {i}' for i in range(num_classes)])

print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

     Class 0       0.99      0.95      0.97       903
     Class 1       0.98      0.91      0.94       815
     Class 2       0.93      1.00      0.96       843
     Class 3       0.94      0.97      0.95       818
     Class 4       0.95      0.87      0.91       795
     Class 5       0.91      1.00      0.95       826

    accuracy                           0.95      5000
   macro avg       0.95      0.95      0.95      5000
weighted avg       0.95      0.95      0.95      5000



<div style="background-color:#4C212A; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Georgia, serif; font-size:24px; color:white; text-align:center;">📈 Results: Approach 2</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:white;">Drumroll, please! 🥁 Our second modeling approach performs on par with the first. 🏔️</p>

<ul style="font-family:Arial, sans-serif; font-size:16px; color:white;">
   <li>The macro-averaged F1 score is ~95%, the same as in Approach 1. 🏆</li>
   <li>Per-class precision ranges from 0.91 to 0.99 and recall from 0.87 to 1.00 across the six emotion classes. 💯</li>
</ul>
<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:white;">Now let's save our model.</p>
</div>
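
One detail matters when the weights are loaded again later. If a checkpoint is saved from the `nn.DataParallel` wrapper itself, every key in its state dict starts with `module.`, and a plain, unwrapped model will not accept those keys. The helper below is a small sketch of one way to handle that. `strip_module_prefix` is a hypothetical name, and a plain dict stands in for a real state dict.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Plain dict standing in for a real state_dict (the values would normally be tensors).
saved = {"module.fc.weight": "W", "module.fc.bias": "b"}
print(strip_module_prefix(saved))  # {'fc.weight': 'W', 'fc.bias': 'b'}
```

The cleaned dictionary could then be passed to `load_state_dict` on an unwrapped model with the same architecture.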

In [15]:
# Save the underlying model's weights; model.module unwraps nn.DataParallel,
# so the saved keys carry no "module." prefix.
torch.save(model.module.state_dict(), 'model_weights.pth')
# The notebook already runs in /kaggle/working, so no extra copy is needed.


<div style="background-color:#08A4BD; padding:20px; border-radius:10px; box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2);">
<p style="font-family:Georgia, serif; font-size:24px; color:#000000; text-align:center;">🎉 Conclusion</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">Congratulations! We've made it to the end of our emotion classification journey! 🏆 Through this project, we've explored the power of fine-tuning BERT models and enhancing their architectures with additional layers.</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">Our efforts have paid off: both approaches reached a macro-averaged F1 score of ~95% on the validation set, with per-class precision and recall of 0.87 or higher across all six emotion classes. 🚀</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">But our journey doesn't end here. There's always room for improvement and new challenges to tackle. Feel free to experiment with different architectures, try out other pre-trained language models, or even explore different datasets. The possibilities are endless! 🌍</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">If you found this notebook helpful and insightful, we'd greatly appreciate it if you could upvote it on Kaggle. Your support means a lot and encourages us to continue sharing our knowledge and experiences with the community. 👏</p>

<p style="font-family:Arial, sans-serif; font-size:16px; line-height:1.5; color:#000000;">Thank you for joining us on this incredible journey, and happy coding! 🙌</p>
</div>