<a href="https://colab.research.google.com/github/chasekenyon/GoEmotion/blob/master/GoEmotion%20Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Repo available at https://github.com/chasekenyon/GoEmotion

## Installation of Required Libraries

This section contains the pip command to install the essential libraries for the project:

- `optuna`: A hyperparameter optimization framework.
- `transformers`: A library by Hugging Face that provides pre-trained models for Natural Language Processing (NLP), including RoBERTa.
- `datasets`: A library by Hugging Face for loading and processing datasets, used here to load the GoEmotions dataset.
- `torchmetrics`: A library that integrates with PyTorch to provide various evaluation metrics.

The `-q` flag ensures that the installation is done quietly without extensive logs.


In [None]:
# Installing the required libraries such as Optuna, Transformers, Datasets, and Torchmetrics, uncomment and run if needed
# pip install optuna transformers datasets torchmetrics -q

## Importing Libraries and Loading the Dataset

In this section, the necessary libraries are imported, including:

- PyTorch (`torch`): A deep learning framework.
- NumPy (`np`): A library for numerical operations.
- Transformers (`RobertaTokenizer`, `RobertaConfig`, `RobertaForSequenceClassification`, etc.): Classes and functions related to the RoBERTa model.
- Dataloader and other PyTorch utilities for handling data.
- Loss function (`BCEWithLogitsLoss`) and metrics (`classification_report`).

Additionally, the GoEmotions dataset is loaded using the `load_dataset` function from the `datasets` library. This dataset contains text and corresponding emotion labels and is available in a simplified version.


In [None]:
# Importing required libraries for handling tensors, numerical operations, and loading datasets.
import torch
import numpy as np
from datasets import load_dataset
from torch.utils.data import Dataset
from transformers import RobertaTokenizer, RobertaConfig, RobertaForSequenceClassification, get_linear_schedule_with_warmup
from torch.utils.data import DataLoader
from torch.optim import AdamW
from tqdm import tqdm
from torch.nn import BCEWithLogitsLoss
from sklearn.metrics import classification_report
from collections import Counter

In [None]:
# Loading the GoEmotions dataset in its simplified version.
dataset = load_dataset("go_emotions", "simplified")

Generating train split:   0%|          | 0/43410 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/5426 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/5427 [00:00<?, ? examples/s]

## Tokenization and Preprocessing

This section focuses on the tokenization and preprocessing of the GoEmotions dataset:

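- **Tokenizer Initialization**: A pre-trained RoBERTa tokenizer is initialized from the `roberta-base` checkpoint. RoBERTa's byte-level BPE vocabulary is case-sensitive, so the original casing of the text is preserved. The `do_lower_case=False` parameter makes that intent explicit, although this BERT-style option is not expected to change the RoBERTa tokenizer's behavior.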

In [None]:
# Initializing the Roberta tokenizer with base configuration.
tokenizer = RobertaTokenizer.from_pretrained('roberta-base', do_lower_case=False)

- **Processing Function**: The `process_example` function is defined to handle individual examples in the dataset. Inside this function:
  - The text is tokenized using the RoBERTa tokenizer with additional parameters such as truncation, padding, and a maximum length of 200.
  - Labels are converted to a binary vector of length 28, representing the possible emotion categories. A value of 1 is set at the indices corresponding to the labels present in the example.
- **Applying the Processing Function**: The `process_example` function is mapped over the train and test splits with `map`, converting them into a format suitable for training and evaluating the model. The validation split is not used in this notebook.

In [None]:
# Defining a function to process individual examples in the dataset. This includes tokenization, padding, and truncation.
def process_example(example):
    # Tokenize the text
    tokenized = tokenizer(example['text'], add_special_tokens=True, truncation=True, padding='max_length', max_length=200, return_tensors="pt")

    # Convert labels to a binary vector
    label_vector = torch.zeros(28)
    for label in example['labels']:
        label_vector[label] = 1

    return {
        'input_ids': tokenized['input_ids'].squeeze(),
        'attention_mask': tokenized['attention_mask'].squeeze(),
        'labels': label_vector,
    }

train_data = dataset['train'].map(process_example)
test_data = dataset['test'].map(process_example)

Map:   0%|          | 0/43410 [00:00<?, ? examples/s]

Map:   0%|          | 0/5427 [00:00<?, ? examples/s]
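The label encoding described above can be illustrated without PyTorch. The snippet below is a minimal sketch: `to_multi_hot` and `NUM_LABELS` are hypothetical names introduced only for this illustration, and the helper mirrors the loop inside `process_example` using plain Python lists.

```python
NUM_LABELS = 28  # number of emotion categories used throughout this notebook

def to_multi_hot(label_ids, num_labels=NUM_LABELS):
    """Return a list of 0.0/1.0 flags with 1.0 at every index in label_ids."""
    vector = [0.0] * num_labels
    for label_id in label_ids:
        vector[label_id] = 1.0
    return vector

# An example annotated with both "anger" (2) and "annoyance" (3).
example_vector = to_multi_hot([2, 3])
print(example_vector[:5])   # [0.0, 0.0, 1.0, 1.0, 0.0]
print(sum(example_vector))  # 2.0
```

Because an example can carry several labels, the vector may contain more than one 1, which is why it is better described as multi-hot than one-hot.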

## Label Mapping

A dictionary `id2label` is created to map the numerical label IDs to their corresponding emotion categories. This mapping is essential for interpreting the output predictions of the model and associating them with human-readable emotion labels.

In [None]:
# Mapping the label IDs to their corresponding emotion names.
id2label = {0:"admiration",
            1:"amusement",
            2:"anger",
            3:"annoyance",
            4:"approval",
            5:"caring",
            6:"confusion",
            7:"curiosity",
            8:"desire",
            9:"disappointment",
            10:"disapproval",
            11:"disgust",
            12:"embarrassment",
            13:"excitement",
            14:"fear",
            15:"gratitude",
            16:"grief",
            17:"joy",
            18:"love",
            19:"nervousness",
            20:"optimism",
            21:"pride",
            22:"realization",
            23:"relief",
            24:"remorse",
            25:"sadness",
            26:"surprise",
            27:"neutral"}
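As a small illustration of how this mapping turns model output back into readable labels, the sketch below decodes a multi-hot prediction vector. Both `decode_prediction` and the four-entry `id2label_excerpt` are hypothetical and exist only to keep the example self-contained; the notebook itself uses the full 28-entry `id2label`.

```python
# Hypothetical excerpt of the id2label mapping (the notebook defines all 28 entries).
id2label_excerpt = {0: "admiration", 1: "amusement", 2: "anger", 3: "annoyance"}

def decode_prediction(pred_vector, mapping):
    """Return the names of all labels whose flag is set in a multi-hot vector."""
    return [mapping[i] for i, flag in enumerate(pred_vector) if flag == 1]

print(decode_prediction([0, 1, 0, 1], id2label_excerpt))  # ['amusement', 'annoyance']
print(decode_prediction([0, 0, 0, 0], id2label_excerpt))  # []
```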

## Setting Dataset Format

The format of the training and test datasets is set to `'torch'`, specifying that the data should be represented as PyTorch tensors. Additionally, the relevant columns `'input_ids'`, `'attention_mask'`, and `'labels'` are selected. This ensures compatibility with PyTorch's DataLoader and the subsequent training process.

In [None]:
# Setting the format of the training and test data to torch tensors and specifying the required columns.
train_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
test_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])

## Hyperparameters and Training Arguments

A dictionary `args` is defined to contain various hyperparameters and training arguments:

- `weight_decay`: Regularization parameter to prevent overfitting (set to 0.0 here).
- `learning_rate`: The learning rate for the optimizer (5e-5 in this case).
- `epochs`: The number of training epochs.
- `gradient_accumulation_steps`: The number of steps for gradient accumulation, allowing for larger effective batch sizes.
- `batch_size`: The batch size used for training.

In [None]:
# Defining hyperparameters such as weight decay and learning rate for training the model.
args = {
    'weight_decay': 0.0,
    'learning_rate': 5e-5,
    'epochs': 5,
    'gradient_accumulation_steps':1,
    'batch_size': 32
}

Earlier attempts used a different model architecture and explored both manual hyperparameter tuning and tuning with the Optuna framework. Because of issues with that architecture, the time and resources spent did not produce usable results, but the approach should be revisited when more time and compute are available. An unformatted Colab notebook with the code from this original attempt is linked below.  
https://github.com/chasekenyon/GoEmotion/blob/master/GoEmotion_Analysis_and_Training.ipynb

## Model Configuration and Initialization

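The RoBERTa model configuration is defined using `RobertaConfig.from_pretrained('roberta-base', num_labels=28)`, which takes the `roberta-base` architecture settings and sets the number of output labels to 28. The model is then constructed from this configuration with `RobertaForSequenceClassification(config=roberta_config)` and moved to the GPU using `.cuda()`. Note that constructing the model from a configuration alone creates the architecture with randomly initialized weights. The pre-trained `roberta-base` weights would only be loaded by calling `RobertaForSequenceClassification.from_pretrained('roberta-base', config=roberta_config)` instead.

In [None]:
# Configuring the RoBERTa model with 28 output labels and initializing the model instance.
roberta_config = RobertaConfig.from_pretrained('roberta-base', num_labels=28)
# NOTE: building the model from the config alone starts from randomly initialized weights.
# To fine-tune the pre-trained checkpoint, use
# RobertaForSequenceClassification.from_pretrained('roberta-base', config=roberta_config) instead.
model = RobertaForSequenceClassification(config=roberta_config).cuda()  # Move the model to the GPU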

## Creating Data Loaders

A DataLoader for the training data is created using PyTorch's `DataLoader` class. It takes the preprocessed training data and the specified batch size from the `args` dictionary. The data is shuffled to ensure randomness during training.

In [None]:
# Creating a DataLoader for the training data, with shuffling and specified batch size.
train_loader = DataLoader(train_data, batch_size=args['batch_size'], shuffle=True)

## Optimizer and Scheduler Configuration

This section configures the optimizer and learning rate scheduler for training:

- `t_total`: The total number of training steps, calculated based on the number of batches, gradient accumulation steps, and epochs.
- `warmup_steps`: The number of warmup steps for the scheduler, set as 10% of `t_total`.
- `no_decay`: Parameters like biases and layer normalization weights that should not be decayed.
- `optimizer_grouped_parameters`: Grouping model parameters into two groups, one with weight decay and one without.
- `optimizer`: The AdamW optimizer is initialized with the learning rate and grouped parameters.
- `scheduler`: A linear learning rate scheduler with warmup. It raises the learning rate linearly from 0 to the configured value over `warmup_steps`, then decays it linearly to 0 at `t_total`.

In [None]:
# Calculating the total number of training steps required for the entire training process.
# This considers the number of batches in the training loader, gradient accumulation steps, and the total number of epochs.
args['t_total'] = len(train_loader) // args['gradient_accumulation_steps'] * args['epochs']
args['warmup_steps'] = int(0.10*args['t_total'])

# Grouping model parameters into two groups:
# 1. Parameters that require weight decay (excluding those in the no_decay list).
# 2. Parameters that do not require weight decay (those in the no_decay list).
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
        {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
         'weight_decay': args['weight_decay']},
        {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
    ]

# Initializing the AdamW optimizer with the grouped parameters and specified learning rate.
optimizer = AdamW(optimizer_grouped_parameters, lr=args['learning_rate'])

# Creating a learning rate scheduler that increases the learning rate linearly during warmup and then decreases it linearly to zero.
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args['warmup_steps'], num_training_steps=args['t_total'])
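To make the step arithmetic concrete, the sketch below recomputes `t_total` and `warmup_steps` from the training split size reported when the dataset was loaded, and reproduces the linear warmup and decay rule in plain Python. The function `lr_multiplier` is a hypothetical stand-in written for illustration; it reflects how the linear schedule with warmup is understood to behave and does not call the `transformers` library.

```python
import math

num_examples = 43410     # size of the train split
batch_size = 32
accumulation_steps = 1
epochs = 5
base_lr = 5e-5

# With the default drop_last=False, a DataLoader yields ceil(n / batch_size) batches.
batches_per_epoch = math.ceil(num_examples / batch_size)
t_total = batches_per_epoch // accumulation_steps * epochs
warmup_steps = int(0.10 * t_total)

def lr_multiplier(step, warmup, total):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup:
        return step / max(1, warmup)                           # linear ramp from 0 to 1
    return max(0.0, (total - step) / max(1, total - warmup))   # linear decay from 1 to 0

print(batches_per_epoch, t_total, warmup_steps)  # 1357 6785 678
for step in (0, warmup_steps, t_total):
    print(step, base_lr * lr_multiplier(step, warmup_steps, t_total))
```

The 1357 batches per epoch match the progress bars printed by the training loop, so the learning rate peaks at step 678 and reaches zero at step 6785.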

## Loss Function Definition

The loss function used for training is the Binary Cross-Entropy with Logits Loss (`BCEWithLogitsLoss`). This loss function is suitable for multi-label classification problems like the GoEmotions task, where each example may belong to multiple emotion categories.

In [None]:
# Defining the loss function as Binary Cross-Entropy with Logits Loss.
loss_function = BCEWithLogitsLoss()
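For intuition about what this loss measures, here is a minimal pure-Python sketch of binary cross-entropy on logits, averaged over every (example, label) pair. The helper `bce_with_logits` is a hypothetical name for this illustration. It applies the textbook formula directly, whereas the PyTorch implementation uses a numerically more stable formulation, so treat this as an explanation and not a replacement.

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (example, label) pairs."""
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            p = sigmoid(z)
            # Each label is scored as its own yes/no decision.
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count

# One example with three labels: a confident hit, a confident rejection, and an undecided positive.
logits = [[2.0, -1.0, 0.0]]
targets = [[1.0, 0.0, 1.0]]
print(bce_with_logits(logits, targets))
```

Since every label contributes an independent term, an example can be penalized for missing one emotion while being rewarded for detecting another, which is what a multi-label task requires.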

## Training Loop

This section contains the main training loop for the model:

- The model is set to training mode using `model.train()`.
- The device (GPU or CPU) is determined based on availability.
- For each epoch and batch:
  - The input data is moved to the device.
  - A forward pass through the model is performed to compute the logits.
  - The loss is calculated using the predefined loss function.
  - A backward pass is performed to calculate gradients.
  - The optimizer and scheduler are stepped to update the model weights.
  - Gradients are zeroed for the next iteration.

In [None]:
# Defining the training loop for the model, including forward and backward passes.
model.train()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
for epoch in range(args['epochs']):
    for batch in tqdm(train_loader):

        # Move batch data to the device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Forward pass
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        loss = loss_function(outputs.logits, labels.float())

        # Backward pass
        loss.backward()
        optimizer.step()
        scheduler.step()
        model.zero_grad()

100%|██████████| 1357/1357 [07:02<00:00,  3.21it/s]
100%|██████████| 1357/1357 [07:02<00:00,  3.21it/s]
100%|██████████| 1357/1357 [07:02<00:00,  3.21it/s]
100%|██████████| 1357/1357 [07:02<00:00,  3.21it/s]
100%|██████████| 1357/1357 [07:02<00:00,  3.21it/s]
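The loop above updates the weights after every batch, which corresponds to `gradient_accumulation_steps` being 1. As a hedged sketch of the stepping schedule that a larger value would imply, the hypothetical helper `count_optimizer_steps` below simulates the loop without any model and only counts the updates. In a real loop the loss would typically also be divided by the number of accumulation steps before calling `backward()`.

```python
def count_optimizer_steps(num_batches, accumulation_steps, epochs):
    """Count weight updates when stepping once every `accumulation_steps` batches."""
    updates = 0
    for _ in range(epochs):
        for batch_index in range(num_batches):
            # loss.backward() would run here on every batch, adding to the stored gradients.
            if (batch_index + 1) % accumulation_steps == 0:
                # optimizer.step(), scheduler.step() and model.zero_grad() would run here.
                updates += 1
    return updates

print(count_optimizer_steps(1357, 1, 5))  # 6785
print(count_optimizer_steps(1357, 4, 5))  # 1695
```

Both counts agree with the `t_total` formula `len(train_loader) // gradient_accumulation_steps * epochs`, so the scheduler would still reach zero on the final update.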


## Test Data Loader

A DataLoader for the test data is created using PyTorch's `DataLoader` class. It takes the preprocessed test data and the specified batch size from the `args` dictionary. This DataLoader will be used during the evaluation phase.

In [None]:
# Creating a DataLoader for the test data, with the specified batch size.
test_loader = DataLoader(test_data, batch_size=args['batch_size'])

## Evaluation Function

The `evaluate` function is defined to evaluate the model on a given dataset (e.g., test data):

- The model is set to evaluation mode using `model.eval()`.
- A dictionary `dict_result` is initialized to collect actual labels and predictions.
- For each batch in the evaluation data:
  - The input data is moved to the device.
  - A forward pass through the model is performed to compute the logits.
  - The sigmoid function is applied to the logits to obtain probabilities.
  - A threshold of 0.5 is applied to the probabilities to obtain binary predictions.
- The actual labels and predictions are collected and returned as the evaluation result.

In [None]:
# Defining a function to evaluate a model on a given data loader. It collects the predictions and the actual labels.
def evaluate(model_, eval_loader):
    model_.eval()  # Use the model passed in as an argument, not the global one
    dict_result = {'actual':[], 'preds':[]}
    with torch.no_grad():
        for batch in tqdm(eval_loader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            outputs = model_(input_ids=input_ids, attention_mask=attention_mask)
            logits = outputs.logits
            probs = torch.sigmoid(logits)  # Apply sigmoid to get probabilities
            preds = (probs > 0.5).int()    # Apply threshold

            dict_result['actual'] += batch['labels'].cpu().numpy().tolist()
            dict_result['preds'] += preds.cpu().numpy().tolist()
    return dict_result
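The sigmoid and threshold steps can be traced by hand on a single row of logits. The following is a minimal sketch in plain Python; `predict_labels` is a hypothetical helper that is not part of the notebook.

```python
import math

def predict_labels(logits, threshold=0.5):
    """Turn one row of raw logits into 0/1 flags via sigmoid and a threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [int(p > threshold) for p in probs]

print(predict_labels([2.0, -1.0, 0.0, 0.3]))  # [1, 0, 0, 1]
print(predict_labels([-3.0, -2.0, -0.5]))     # [0, 0, 0]
```

A logit of exactly 0 gives a probability of 0.5, which does not pass the strict `>` comparison. The second call shows that a row can end up with no predicted label at all when every probability stays below the threshold.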

## Performance Evaluation Function

The `get_performance` function is defined to evaluate the model's performance and print a detailed classification report:

- **Input Parameters**:
  - `actual_og`: The actual labels as multi-hot vectors (one 0/1 flag per emotion).
  - `preds_og`: The predicted labels as multi-hot vectors.
  - `dict_mapping`: A dictionary mapping label IDs to their corresponding emotion categories.
- **Processing**:
  - Each vector is reduced to a single label index using the `argmax` function. Because a vector can contain several 1s or none at all, this keeps only the first flagged label and maps an all-zero prediction to index 0 (`admiration`), so the report describes a single-label view of the multi-label task.
  - The emotion names are looked up in `dict_mapping` and passed to the report as `target_names`.
  - A classification report is generated using `classification_report` from scikit-learn, including per-class precision, recall, F1-score, and support.
  - The distribution of actual and predicted labels is printed using the `Counter` class.
- **Output**: The function returns the classification report as a string.

In [None]:
# Defining a function to calculate performance metrics such as accuracy, precision, recall, and F1-score.
def get_performance(actual_og, preds_og, dict_mapping):
    # Reduce each multi-hot vector to a single label index (argmax keeps the first maximum)
    actual_ = [np.argmax(item) for item in actual_og]
    preds_ = [np.argmax(item) for item in preds_og]

    # Convert to label names
    target_names = [dict_mapping[i] for i in range(len(dict_mapping))]

    # Print the classification report
    report = classification_report(actual_, preds_, target_names=target_names)

    print(report)

    print('Actual counter:', Counter(actual_))
    print('Prediction counter:', Counter(preds_))

    return report
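Since the vectors are multi-hot, the metrics can also be computed per label without reducing each example to one class. The sketch below does this in plain Python on a toy three-label problem. The helper `per_label_f1` and the toy data are hypothetical and are not the notebook's evaluation code. scikit-learn's `classification_report` also accepts label indicator arrays directly, which may be a simpler route in practice.

```python
def per_label_f1(actual, preds, num_labels):
    """Compute (precision, recall, F1) for each label from multi-hot rows."""
    scores = []
    for label in range(num_labels):
        tp = sum(1 for a, p in zip(actual, preds) if a[label] == 1 and p[label] == 1)
        fp = sum(1 for a, p in zip(actual, preds) if a[label] == 0 and p[label] == 1)
        fn = sum(1 for a, p in zip(actual, preds) if a[label] == 1 and p[label] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores

# Toy data: four examples, three labels.
actual = [[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
preds  = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]

scores = per_label_f1(actual, preds, num_labels=3)
macro_f1 = sum(f1 for _, _, f1 in scores) / len(scores)
print(scores)    # [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]
print(macro_f1)  # 0.6666666666666666

# The argmax pitfall: an all-zero prediction row is reported as label 0.
empty_row = preds[2]
print(empty_row.index(max(empty_row)))  # 0
```

The last line shows why an `argmax` reduction can inflate the count of predictions for the first label: a row with no predicted emotion is indistinguishable from a row that predicts label 0.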

## Model Evaluation and Reporting

This final section is dedicated to evaluating the trained model on the test data and reporting the results:

- The `evaluate` function is called with the trained model and test data loader, resulting in a dictionary containing the actual labels and predictions.
- The `get_performance` function is then called with the test results and the label mapping dictionary (`id2label`). This prints the detailed classification report, including the distribution of actual and predicted labels, and returns the report as a string.
- The evaluation provides insights into the model's performance on individual emotion categories and helps identify areas for improvement or further fine-tuning.

In [None]:
# Evaluating the trained model on the test data and printing the classification report (returned as a string, not a DataFrame).
dict_test_results = evaluate(model_=model, eval_loader=test_loader)

df_test = get_performance(actual_og=dict_test_results['actual'],
                          preds_og=dict_test_results['preds'],
                          dict_mapping=id2label)

100%|██████████| 170/170 [00:17<00:00,  9.88it/s]


                precision    recall  f1-score   support

    admiration       0.33      0.59      0.42       504
     amusement       0.71      0.77      0.74       252
         anger       0.44      0.32      0.37       197
     annoyance       0.25      0.17      0.21       286
      approval       0.26      0.24      0.25       318
        caring       0.28      0.21      0.24       114
     confusion       0.36      0.28      0.31       139
     curiosity       0.43      0.36      0.39       233
        desire       0.39      0.28      0.33        74
disappointment       0.24      0.13      0.16       127
   disapproval       0.22      0.14      0.17       220
       disgust       0.52      0.38      0.44        84
 embarrassment       0.60      0.20      0.30        30
    excitement       0.32      0.32      0.32        84
          fear       0.60      0.61      0.60        74
     gratitude       0.79      0.85      0.82       288
         grief       0.00      0.00      0.00  


## Results Comparison

Below is a comparative analysis of the results obtained from the current implementation using RoBERTa and the results from Google's research paper using BERT, focusing on the macro-average scores for precision, recall, and F1-score:

### Macro-Average Scores
- **Our Model**:
  - Precision: 0.37
  - Recall: 0.34
  - F1-Score: 0.34
- **Google's Model**:
  - Precision: 0.40
  - Recall: 0.63
  - F1-Score: 0.46
  - Standard Deviation: 0.19

### Observations
- **Precision**: Google's model has a slightly higher macro-average precision than our implementation (0.40 vs. 0.37).
- **Recall**: The gap in recall is much larger (0.63 vs. 0.34), meaning Google's model finds many more of the positive instances.
- **F1-Score**: The macro-average F1-score, the harmonic mean of precision and recall, is also higher for Google's model (0.46 vs. 0.34).
- **Standard Deviation**: Google's results include a standard deviation of 0.19, reflecting the variability across different emotions.

### Conclusion
The comparison indicates that Google's BERT-based approach achieves higher macro-average scores than our RoBERTa-based implementation. The difference in precision is small (0.03), while recall and F1-score differ by 0.29 and 0.12 respectively. Possible reasons include differences in model architecture, training strategy, hyperparameter tuning, or preprocessing. Two aspects of this notebook are also likely to contribute: the model is constructed from a configuration rather than loaded with pre-trained weights, and `get_performance` reduces each multi-label vector to a single label with `argmax` before scoring, so the scores may not be directly comparable to the paper's. Further experimentation and fine-tuning may help close the gap.