<a href="https://colab.research.google.com/github/JK88-1337/Emotion-Classification-using-Tweets/blob/DylanXiao/Copy_of_RoBERTa_Fine_Tuning_Emotion_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. Introduction**

Emotions play a crucial role in human communication, influencing decisions, behaviors, and social interactions. In today's digital age, people frequently express their feelings through short textual posts on social media platforms like Twitter. However, interpreting these emotions is a challenging task for machines due to the brevity, informal language, sarcasm, and contextual dependencies often found in such texts.

# **2. Problem Statement**


The central problem is that emotions are expressed in nuanced and subjective ways—highly dependent on cultural context, individual experience, and linguistic variation. Traditional machine learning approaches struggle to generalize well across these complexities.

Hence, the objective of this project is to develop deep learning-based solutions that can robustly and accurately classify emotions from tweets. This involves leveraging advanced Natural Language Processing (NLP) models capable of understanding contextual and semantic subtleties in text, such as transformer-based architectures. The project aims to push the boundaries of emotion recognition by integrating class imbalance handling, attention mechanisms, and performance evaluation using balanced metrics like F1 score.

# **3. Data Sources**

# **4. Models and/or Methods**

##4.1 Dataset Download & Tokenizer Setup

In [5]:
!pip install huggingface_hub pytorch_lightning torchmetrics
from huggingface_hub import hf_hub_download

# Download the Parquet files from Hugging Face
hf_hub_download(repo_id="dair-ai/emotion", filename="split/train-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/test-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/validation-00000-of-00001.parquet", repo_type="dataset", local_dir=".")

# Rename the files for consistency with the rest of the code
import os
os.rename("split/train-00000-of-00001.parquet", "train.parquet")
os.rename("split/test-00000-of-00001.parquet", "test.parquet")
os.rename("split/validation-00000-of-00001.parquet", "validation.parquet")

# Create the tokenizer directory
!mkdir -p tokenizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
tokenizer.save_pretrained("tokenizer")



train-00000-of-00001.parquet:   0%|          | 0.00/1.03M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/129k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/127k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/480 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

('tokenizer/tokenizer_config.json',
 'tokenizer/special_tokens_map.json',
 'tokenizer/vocab.json',
 'tokenizer/merges.txt',
 'tokenizer/added_tokens.json',
 'tokenizer/tokenizer.json')

In [None]:
pip install pytorch_lightning

##4.2 import Libraries
Import all necessary Python libraries used in the notebook.


In [1]:
# === Standard Library ===
from typing import List
from functools import lru_cache
from argparse import Namespace

# === Third-party Libraries ===
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import AdamW
from torch.utils.data import DataLoader, Dataset
import torchmetrics
import pytorch_lightning as pl
from sklearn.metrics import classification_report

# === Hugging Face Transformers ===
from transformers import (
    AutoTokenizer,
    AutoModel,
    AutoModelForMaskedLM,
    get_linear_schedule_with_warmup
)

# === Hugging Face Tokenizers (Optional, if needed) ===
from tokenizers import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing

# Set random seed
def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    pl.seed_everything(seed, workers=True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)


ModuleNotFoundError: No module named 'pytorch_lightning'

##4.3 Define Emotion Labels and Data Paths
Set up the emotion categories and specify file paths to the dataset splits.

In [None]:
# Define labels
label2int = {
  "sadness": 0,
  "joy": 1,
  "love": 2,
  "anger": 3,
  "fear": 4,
  "surprise": 5
}
emotions = list(label2int.keys())

# Define paths
train_path = "train.parquet"
test_path = "test.parquet"
val_path = "validation.parquet"

##4.4 Load and Save Tokenizer
We use the pretrained tokenizer from `distilroberta-base` and save it locally for consistent use across data loading and model input encoding.


In [None]:
# Save tokenizer files
tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
tokenizer.save_pretrained("tokenizer")

## 4.5 Define Custom Activation Function: Mish

We define and register the **Mish** activation function using `@torch.jit.script`. Mish is a smooth, non-monotonic activation function defined as:

\[
\text{Mish}(x) = x \cdot \tanh(\text{softplus}(x))
\]


It is known to yield improved performance in some deep learning tasks compared to ReLU or GELU.


In [None]:
@torch.jit.script
def mish(input):
    return input * torch.tanh(F.softplus(input))

class Mish(nn.Module):
    def forward(self, input):
        return mish(input)


## 4.6 Model Architecture: EmoModel


### 4.6.1 Baseline
This class defines the emotion classification model based on a pretrained Transformer (e.g., DistilRoBERTa). The architecture includes:

- A base Transformer model for feature extraction
- A classification head with dropout, Mish activation, and linear layers
- Custom weight initialization
- CLS token pooling for final prediction

Note: We initially experimented with a simpler version of EmoModel, where the [CLS] token representation was passed directly into a classifier. However, this approach lacked flexibility in capturing important contextual information across the entire sequence.
As a result, we adopted a revised architecture with attention pooling over all token representations, allowing the model to focus more effectively on informative tokens. The earlier model was deprecated in favor of this final version.


In [None]:

class EmoModel(nn.Module):
    def __init__(self, base_model, n_classes, base_model_output_size=768, dropout=0.05):
        super().__init__()
        self.base_model = base_model

        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(base_model_output_size, base_model_output_size),
            Mish(),
            nn.Dropout(dropout),
            nn.Linear(base_model_output_size, n_classes)
        )

        for layer in self.classifier:
            if isinstance(layer, nn.Linear):
                layer.weight.data.normal_(mean=0.0, std=0.02)
                if layer.bias is not None:
                    layer.bias.data.zero_()

    def forward(self, input_, *args):
        X, attention_mask = input_
        hidden_states = self.base_model(X, attention_mask=attention_mask)
        return self.classifier(hidden_states[0][:, 0, :])

### 4.6.2 Tong's Model


In [None]:

class EmoModel(nn.Module):
    def __init__(self, n_classes, base_model_output_size=768, dropout=0.05):
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")

        self.attention = nn.Sequential(
            nn.Linear(base_model_output_size, base_model_output_size),
            nn.Tanh(),
            nn.Linear(base_model_output_size, 1, bias=False),
            nn.Softmax(dim=1)
        )

        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(base_model_output_size, base_model_output_size),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(base_model_output_size, n_classes)
        )

        for layer in self.classifier:
            if isinstance(layer, nn.Linear):
                layer.weight.data.normal_(mean=0.0, std=0.02)
                if layer.bias is not None:
                    layer.bias.data.zero_()

    def forward(self, input_, *args):
        X, attention_mask = input_
        hidden_states = self.base_model(X, attention_mask=attention_mask)[0]
        weights = self.attention(hidden_states)
        context_vector = torch.sum(weights * hidden_states, dim=1)
        return self.classifier(context_vector)


dylan

In [None]:
# ✅ DeBERTa + Attention Pooling + Classifier
class EmoModel(nn.Module):
    def __init__(self, n_classes, hidden_size=768, dropout=0.3):
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1), nn.Softmax(dim=1)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, n_classes)
        )

    def forward(self, inputs):
        input_ids, attn_mask = inputs
        hidden_states = self.base_model(input_ids, attention_mask=attn_mask).last_hidden_state
        weights = self.attention(hidden_states)
        pooled = torch.sum(weights * hidden_states, dim=1)
        return self.classifier(pooled)


In [None]:
# ✅ DeBERTa + Attention Pooling + Classifier
class EmoModel(nn.Module):
    def __init__(self, n_classes, hidden_size=768, dropout=0.3):
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1), nn.Softmax(dim=1)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, n_classes)
        )
    def forward(self, inputs):
        input_ids, attn_mask = inputs
        hidden_states = self.base_model(input_ids, attention_mask=attn_mask).last_hidden_state
        weights = self.attention(hidden_states)
        pooled = torch.sum(weights * hidden_states, dim=1)
        return self.classifier(pooled)

## 4.7 TokenizersCollateFn

###4.7.1 Baseline

A helper class for preprocessing batches of text using a Byte-Level BPE tokenizer.

- **Initializes** with a tokenizer (`vocab.json` + `merges.txt`), applies BERT-style special tokens (`<s>`, `</s>`), and sets max token length with padding/truncation.
- **`__call__` method** takes a batch of `(text, label)` pairs, returns:
  - Padded token ID tensor
  - Attention mask tensor
  - Label tensor

Useful for preparing inputs for transformer-based models.


In [None]:
class TokenizersCollateFn:
    def __init__(self, max_tokens=512):
        t = ByteLevelBPETokenizer(
            "tokenizer/vocab.json",
            "tokenizer/merges.txt"
        )
        t._tokenizer.post_processor = BertProcessing(
            ("</s>", t.token_to_id("</s>")),
            ("<s>", t.token_to_id("<s>")),
        )
        t.enable_truncation(max_tokens)
        t.enable_padding(length=max_tokens, pad_id=t.token_to_id("<pad>"))
        self.tokenizer = t

    def __call__(self, batch):
        batch = [x for x in batch if x is not None]
        if not batch:
            return None
        encoded = self.tokenizer.encode_batch([x[0] for x in batch])
        sequences_padded = torch.tensor([enc.ids for enc in encoded])
        attention_masks_padded = torch.tensor([enc.attention_mask for enc in encoded])
        labels = torch.tensor([x[1] for x in batch])
        return (sequences_padded, attention_masks_padded), labels

### 4.7.2 Tong's Tokenizer


In [None]:

class TokenizersCollateFn:
    def __init__(self, max_tokens=512):
        self.tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")

    def __call__(self, batch):
        batch = [x for x in batch if x is not None]
        if not batch:
            return None
        encoded = self.tokenizer(
            [x[0] for x in batch],
            padding=True,
            truncation=True,
            max_length=512,
            return_attention_mask=True,
            return_tensors="pt"
        )
        sequences_padded = encoded["input_ids"]
        attention_masks_padded = encoded["attention_mask"]
        labels = torch.tensor([x[1] for x in batch])
        return (sequences_padded, attention_masks_padded), labels



dylan

In [None]:
# ✅ Tokenization wrapper
class TokenizersCollateFn:
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")

    def __call__(self, batch):
        texts, labels = zip(*batch)
        enc = self.tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt", max_length=512)
        return (enc["input_ids"], enc["attention_mask"]), torch.tensor(labels)


## 4.8 Baseline EmoDataset

A custom PyTorch `Dataset` for loading emotion classification data from a `.parquet` file.

- Loads text and label columns (`"text"` and `"label"`).
- `__getitem__`: Returns `(text, label)` for a given index, or `None` if invalid.
- `__len__`: Returns dataset size.

Used for feeding data into DataLoader during training.


In [None]:

class EmoDataset(Dataset):
    def __init__(self, path):
        super().__init__()
        self.data_column = "text"
        self.class_column = "label"
        self.data = pd.read_parquet(path)

    def __getitem__(self, idx):
        label = self.data.loc[idx, self.class_column]
        if label is None:
            return None
        try:
            return self.data.loc[idx, self.data_column], label
        except KeyError:
            return None

    def __len__(self):
        return self.data.shape[0]

Dylan's DataSet

In [None]:
class EmoDataset(Dataset):
    def __init__(self, path):
        self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
        self.data = self.data[self.data["label"].isin(label2int.values())]

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        return row["text"], row["label"]

    def __len__(self): return len(self.data)

using over/undersampling

In [None]:
# ✅ Label mapping
label2int = {"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}
emotions = list(label2int.keys())

# ✅ Balanced (Under/Oversampled) Dataset
class EmoDataset(Dataset):
    def __init__(self, path, max_samples_per_class=1000):
        self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
        self.data = self.data[self.data["label"].isin(label2int.values())]

        grouped = []
        for label in self.data["label"].unique():
            class_data = self.data[self.data["label"] == label]

            if len(class_data) > max_samples_per_class:
                sampled = class_data.sample(max_samples_per_class, random_state=42)  # Undersampling
            else:
                sampled = class_data.sample(max_samples_per_class, replace=True, random_state=42)  # Oversampling

            grouped.append(sampled)

        self.data = pd.concat(grouped).sample(frac=1, random_state=42).reset_index(drop=True)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        return row["text"], row["label"]

    def __len__(self): return len(self.data)


## 4.9 Define TrainingModule Base & Initialization

Baseline

In [None]:
class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.model = EmoModel(
            AutoModelForMaskedLM.from_pretrained("distilroberta-base").roberta,
            len(emotions)
        )
        self.loss = nn.CrossEntropyLoss()
        self.save_hyperparameters(hparams)
        self.accuracy = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))


Tong

In [None]:
class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.model = EmoModel(len(emotions))
        self.loss = nn.CrossEntropyLoss()
        self.save_hyperparameters(hparams)
        self.accuracy = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))


Dylan

In [None]:
class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.model = EmoModel(n_classes=len(emotions))

        # ✅ Added macro-averaged F1 score for imbalanced emotion classes
        self.f1 = torchmetrics.F1Score(task="multiclass", num_classes=len(emotions), average="macro")
        self.acc = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))

        # ✅ Added weighted loss to penalize underrepresented classes
        self.weight = self.compute_class_weight(hparams.train_path)
        self.loss = nn.CrossEntropyLoss(weight=self.weight)


In [None]:
class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.model = EmoModel(n_classes=len(emotions))
        self.f1 = torchmetrics.F1Score(task="multiclass", num_classes=len(emotions), average="macro")
        self.acc = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))
        self.loss = nn.CrossEntropyLoss()

## 4.10 Forward + Shared Step Function

Baseline

In [None]:
    def forward(self, X, *args):
        return self.model(X, *args)

    def step(self, batch, step_name="train"):
        if batch is None:
            return None
        X, y = batch
        y_hat = self.forward(X)
        loss = self.loss(y_hat, y)
        self.log(f"{step_name}_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)

        acc = self.accuracy(y_hat, y)
        self.log(f"{step_name}_acc", acc, on_step=True, on_epoch=True, prog_bar=True, logger=True)

        return loss


Dylan

In [None]:
    def forward(self, x): return self.model(x)

    def step(self, batch, name):
        x, y = batch
        y_hat = self(x)
        loss = self.loss(y_hat, y)
        self.log(f"{name}_loss", loss, prog_bar=True)
        self.log(f"{name}_acc", self.acc(y_hat, y), prog_bar=True)
        self.log(f"{name}_f1", self.f1(y_hat, y), prog_bar=True)  # ✅ Log F1 score
        return loss

not used

In [None]:
    def compute_class_weight(self, path):
        df = pd.read_parquet(path)
        counts = torch.bincount(torch.tensor(df["label"].values))
        weights = 1.0 / counts.float()
        return weights / weights.sum() * len(emotions)

    # ✅ Added weightsampling with class-balanced WeightedRandomSampler
    def get_sampler(self, path):
        df = pd.read_parquet(path)
        counts = torch.bincount(torch.tensor(df["label"].values))
        weights = 1.0 / counts.float()
        sample_weights = [weights[l] for l in df["label"].values]
        return WeightedRandomSampler(sample_weights, len(sample_weights), replacement=True)


## 4.11 Training/Validation/Test Steps

In [None]:
    def training_step(self, batch, batch_idx):
        return self.step(batch, "train")

    def validation_step(self, batch, batch_idx):
        return self.step(batch, "val")

    def test_step(self, batch, batch_idx):
        self.step(batch, "test")


## 4.12 Data Loaders

Baseline

In [None]:
    def train_dataloader(self):
        return self.create_data_loader(self.hparams.train_path, shuffle=True)

    def val_dataloader(self):
        return self.create_data_loader(self.hparams.val_path)

    def test_dataloader(self):
        return self.create_data_loader(self.hparams.test_path)

    def create_data_loader(self, ds_path: str, shuffle=False):
        return DataLoader(
            EmoDataset(ds_path),
            batch_size=self.hparams.batch_size,
            shuffle=shuffle,
            collate_fn=TokenizersCollateFn()
        )


Dylan

In [None]:
    def train_dataloader(self): return self.create_loader(self.hparams.train_path, True)
    def val_dataloader(self): return self.create_loader(self.hparams.val_path, False)
    def test_dataloader(self): return self.create_loader(self.hparams.test_path, False)
    def create_loader(self, path, train=True):
            return DataLoader(
                EmoDataset(path),
                batch_size=self.hparams.batch_size,
                sampler=self.get_sampler(path) if train else None,  # ✅ Use oversampling only during training
                shuffle=not train,
                collate_fn=TokenizersCollateFn()
            )

In [None]:
    def train_dataloader(self): return self.create_loader(self.hparams.train_path, True)
    def val_dataloader(self): return self.create_loader(self.hparams.val_path, False)
    def test_dataloader(self): return self.create_loader(self.hparams.test_path, False)
    def create_loader(self, path, train=True):
        return DataLoader(
            EmoDataset(path),
            batch_size=self.hparams.batch_size,
            shuffle=train,
            collate_fn=TokenizersCollateFn()
        )

## 4.13 Optimizer and Schedulerpython


In [None]:
    def configure_optimizers(self):
        optimizer = AdamW(self.model.parameters(), lr=self.hparams.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.hparams.warmup_steps,
            num_training_steps=self.trainer.estimated_stepping_batches,
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]


In [None]:
dylan

In [None]:
    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.hparams.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer, self.hparams.warmup_steps, self.trainer.estimated_stepping_batches
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]


## 4.14 Model Training and Evaluation

This section initializes hyperparameters, sets up the PyTorch Lightning training module, and starts training and evaluation using GPU acceleration.


In [None]:
# Training
hparams = Namespace(
    train_path=train_path,
    val_path=val_path,
    test_path=test_path,
    batch_size=32,  # Reduced batch size from 64 to prevent OutOfMemoryError
    warmup_steps=100,
    epochs=10,
    lr=1e-4,
)
module = TrainingModule(hparams)

# Enable GPU and start training
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=hparams.epochs,
    callbacks=[pl.callbacks.TQDMProgressBar(refresh_rate=10)],
)
trainer.fit(module)

# Evaluation
trainer.test(module)

Dylan

In [None]:
# Training config
from argparse import Namespace
from pytorch_lightning.callbacks import EarlyStopping, TQDMProgressBar

hparams = Namespace(
    train_path="train.parquet",
    val_path="validation.parquet",
    test_path="test.parquet",
    batch_size=32,
    lr=1e-4,
    epochs=20,
    warmup_steps=100
)
module = TrainingModule(hparams)

# ✅ Early stopping callback added to halt training when val_f1 stops improving
early_stop_callback = EarlyStopping(
    monitor="val_f1",
    patience=3,
    mode="max",
    verbose=True
)

# Train & Evaluate
trainer = pl.Trainer(
    accelerator="gpu", devices=1,
    max_epochs=hparams.epochs,
    callbacks=[TQDMProgressBar(refresh_rate=10), early_stop_callback]  # ✅ Plug in early stopping
)
trainer.fit(module)
trainer.test(module)

In [None]:
!ls lightning_logs
%load_ext tensorboard
%tensorboard --logdir lightning_logs/

Tong's Improvement of model. Used DeBERTa-v3-small output size is 768

In [8]:
!pip install huggingface_hub pytorch_lightning torchmetrics
from huggingface_hub import hf_hub_download

# Download the Parquet files from Hugging Face
hf_hub_download(repo_id="dair-ai/emotion", filename="split/train-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/test-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/validation-00000-of-00001.parquet", repo_type="dataset", local_dir=".")

# Rename the files for consistency with the rest of the code
import os
os.rename("split/train-00000-of-00001.parquet", "train.parquet")
os.rename("split/test-00000-of-00001.parquet", "test.parquet")
os.rename("split/validation-00000-of-00001.parquet", "validation.parquet")

# Create the tokenizer directory
!mkdir -p tokenizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')
tokenizer.save_pretrained("tokenizer")


import torch
from torch import nn
from typing import List
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM, get_linear_schedule_with_warmup, AutoModel
from torch.optim import AdamW
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset
import pandas as pd
from argparse import Namespace
from sklearn.metrics import classification_report
from tokenizers import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing
from functools import lru_cache


# Define labels
label2int = {
  "sadness": 0,
  "joy": 1,
  "love": 2,
  "anger": 3,
  "fear": 4,
  "surprise": 5
}
emotions = list(label2int.keys())

# Define paths
train_path = "train.parquet"
test_path = "test.parquet"
val_path = "validation.parquet"

# Save tokenizer files
tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')
tokenizer.save_pretrained("tokenizer")


class EmoModel(nn.Module):
    def __init__(self, n_classes, base_model_output_size=768, dropout=0.05): # DeBERTa-v3-small output size is 768
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")

        self.attention = nn.Sequential(
            nn.Linear(base_model_output_size, base_model_output_size),
            nn.Tanh(),
            nn.Linear(base_model_output_size, 1, bias=False),
            nn.Softmax(dim=1)
        )

        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(base_model_output_size, base_model_output_size),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(base_model_output_size, n_classes)
        )

        for layer in self.classifier:
            if isinstance(layer, nn.Linear):
                layer.weight.data.normal_(mean=0.0, std=0.02)
                if layer.bias is not None:
                    layer.bias.data.zero_()

    def forward(self, input_, *args):
        X, attention_mask = input_
        hidden_states = self.base_model(X, attention_mask=attention_mask)[0]

        # Attention pooling
        weights = self.attention(hidden_states)
        context_vector = torch.sum(weights * hidden_states, dim=1)

        return self.classifier(context_vector)

class TokenizersCollateFn:
    def __init__(self, max_tokens=512):
        t = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
        self.tokenizer = t

    def __call__(self, batch):
        batch = [x for x in batch if x is not None]
        if not batch:
            return None
        encoded = self.tokenizer(
            [x[0] for x in batch],
            padding=True,
            truncation=True,
            max_length=512,
            return_attention_mask=True,
            return_tensors="pt"
        )
        sequences_padded = encoded["input_ids"]
        attention_masks_padded = encoded["attention_mask"]
        labels = torch.tensor([x[1] for x in batch])
        return (sequences_padded, attention_masks_padded), labels

class EmoDataset(Dataset):
    def __init__(self, path):
        super().__init__()
        self.data_column = "text"
        self.class_column = "label"
        self.data = pd.read_parquet(path)

    def __getitem__(self, idx):
        label = self.data.loc[idx, self.class_column]
        if label is None:
            return None
        try:
            return self.data.loc[idx, self.data_column], label
        except KeyError:
            return None

    def __len__(self):
        return self.data.shape[0]


import torchmetrics

class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.model = EmoModel(len(emotions))  # No need to pass the base model here
        self.loss = nn.CrossEntropyLoss()
        self.save_hyperparameters(hparams)
        self.accuracy = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))


    def step(self, batch, step_name="train"):
        if batch is None:
            return None
        X, y = batch
        y_hat = self.forward(X)
        loss = self.loss(y_hat, y)
        loss_key = f"{step_name}_loss"
        self.log(loss_key, loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)

        acc = self.accuracy(y_hat, y)
        acc_key = f"{step_name}_acc"
        self.log(acc_key, acc, on_step=True, on_epoch=True, prog_bar=True, logger=True)

        return loss

    def forward(self, X, *args):
        return self.model(X, *args)

    def training_step(self, batch, batch_idx):
        return self.step(batch, "train")

    def validation_step(self, batch, batch_idx):
        return self.step(batch, "val")

    def test_step(self, batch, batch_idx):
        self.step(batch, "test")


    def train_dataloader(self):
        return self.create_data_loader(self.hparams.train_path, shuffle=True)

    def val_dataloader(self):
        return self.create_data_loader(self.hparams.val_path)

    def test_dataloader(self):
        return self.create_data_loader(self.hparams.test_path)

    def create_data_loader(self, ds_path: str, shuffle=False):
        return DataLoader(
                    EmoDataset(ds_path),
                    batch_size=self.hparams.batch_size,
                    shuffle=shuffle,
                    collate_fn=TokenizersCollateFn()
        )


    def configure_optimizers(self):
        optimizer = AdamW(self.model.parameters(), lr=self.hparams.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.hparams.warmup_steps,
            num_training_steps=self.trainer.estimated_stepping_batches,
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]



# Trainng

hparams = Namespace(
    train_path=train_path,
    val_path=val_path,
    test_path=test_path,
    batch_size=32,  # Reduced batch size to prevent OutOfMemoryError
    warmup_steps=100,
    epochs=10,
    lr=1e-4,
)
module = TrainingModule(hparams)

# Enable GPU and start training
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=hparams.epochs,
    callbacks=[pl.callbacks.TQDMProgressBar(refresh_rate=10)],
)
trainer.fit(module)

# Evaluation
trainer.test(module)

!ls lightning_logs



train-00000-of-00001.parquet:   0%|          | 0.00/1.03M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/129k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/127k [00:00<?, ?B/s]

INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loading `train_dataloader` to estimate number of stepping batches.
INFO:pytorch_lightning.callbacks.model_summary:
  | Name     | Type               | Params | Mode 
--------------------------------------------------------
0 | model    | EmoModel           | 142 M  | train
1 | loss     | CrossEntropyLoss   | 0      | train
2 | accuracy | MulticlassAccuracy | 0      | train
-----

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:
Detected KeyboardInterrupt, attempting graceful shutdown ...


NameError: name 'exit' is not defined

Dylan's improvement. Using CrossEntropyLoss(weight=...) helps handle class imbalance by assigning higher loss to underrepresented classes and lower loss to frequent ones. and also using WeightedRandomSampler to o	Ensures each batch has a more balanced class distribution.

In [11]:
# Install required libraries
!pip install -q huggingface_hub pytorch_lightning torchmetrics

# Download the dataset from Hugging Face
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="dair-ai/emotion", filename="split/train-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/test-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/validation-00000-of-00001.parquet", repo_type="dataset", local_dir=".")

# Rename the downloaded files to match internal references
import os
os.rename("split/train-00000-of-00001.parquet", "train.parquet")
os.rename("split/test-00000-of-00001.parquet", "test.parquet")
os.rename("split/validation-00000-of-00001.parquet", "validation.parquet")

# Load tokenizer
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
tokenizer.save_pretrained("tokenizer")

# Core imports
import torch
import torch.nn as nn
import pandas as pd
import pytorch_lightning as pl
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torch.optim import AdamW
import torchmetrics  # ✅ Added to support F1 Score & Accuracy metrics

import random
import numpy as np

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    pl.seed_everything(seed, workers=True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)

# Label mapping
label2int = {"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}
emotions = list(label2int.keys())

# Dataset class
class EmoDataset(Dataset):
    def __init__(self, path):
        self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
        self.data = self.data[self.data["label"].isin(label2int.values())]
    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        return row["text"], row["label"]
    def __len__(self): return len(self.data)

# Tokenization wrapper for batch processing
class TokenizersCollateFn:
    def __init__(self): self.tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
    def __call__(self, batch):
        texts, labels = zip(*batch)
        enc = self.tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt", max_length=512)
        return (enc["input_ids"], enc["attention_mask"]), torch.tensor(labels)

# Emotion classification model using DeBERTa
class EmoModel(nn.Module):
    def __init__(self, n_classes, hidden_size=768, dropout=0.3):
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")
        self.attention = nn.Sequential(  # Attention pooling to enhance contextual representation
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1), nn.Softmax(dim=1)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, n_classes)
        )
    def forward(self, inputs):
        input_ids, attn_mask = inputs
        hidden_states = self.base_model(input_ids, attention_mask=attn_mask).last_hidden_state
        weights = self.attention(hidden_states)
        pooled = torch.sum(weights * hidden_states, dim=1)
        return self.classifier(pooled)

# PyTorch Lightning training module
class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.model = EmoModel(n_classes=len(emotions))

        # ✅ Added macro-averaged F1 score for imbalanced emotion classes
        self.f1 = torchmetrics.F1Score(task="multiclass", num_classes=len(emotions), average="macro")
        self.acc = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))

        # ✅ Added weighted loss to penalize underrepresented classes
        self.weight = self.compute_class_weight(hparams.train_path)
        self.loss = nn.CrossEntropyLoss(weight=self.weight)

    def compute_class_weight(self, path):
        df = pd.read_parquet(path)
        counts = torch.bincount(torch.tensor(df["label"].values))
        weights = 1.0 / counts.float()
        return weights / weights.sum() * len(emotions)

    def forward(self, x): return self.model(x)

    def step(self, batch, name):
        x, y = batch
        y_hat = self(x)
        loss = self.loss(y_hat, y)
        self.log(f"{name}_loss", loss, prog_bar=True)
        self.log(f"{name}_acc", self.acc(y_hat, y), prog_bar=True)
        self.log(f"{name}_f1", self.f1(y_hat, y), prog_bar=True)  # ✅ Log F1 score
        return loss

    def training_step(self, batch, _): return self.step(batch, "train")
    def validation_step(self, batch, _): return self.step(batch, "val")
    def test_step(self, batch, _): return self.step(batch, "test")

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.hparams.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer, self.hparams.warmup_steps, self.trainer.estimated_stepping_batches
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]

    # ✅ Added oversampling with class-balanced WeightedRandomSampler
    def get_sampler(self, path):
        df = pd.read_parquet(path)
        counts = torch.bincount(torch.tensor(df["label"].values))
        weights = 1.0 / counts.float()
        sample_weights = [weights[l] for l in df["label"].values]
        return WeightedRandomSampler(sample_weights, len(sample_weights), replacement=True)

    def create_loader(self, path, train=True):
        return DataLoader(
            EmoDataset(path),
            batch_size=self.hparams.batch_size,
            sampler=self.get_sampler(path) if train else None,  # ✅ Use oversampling only during training
            shuffle=not train,
            collate_fn=TokenizersCollateFn()
        )
    def train_dataloader(self): return self.create_loader(self.hparams.train_path, True)
    def val_dataloader(self): return self.create_loader(self.hparams.val_path, False)
    def test_dataloader(self): return self.create_loader(self.hparams.test_path, False)

# Training config
from argparse import Namespace
from pytorch_lightning.callbacks import EarlyStopping, TQDMProgressBar

hparams = Namespace(
    train_path="train.parquet",
    val_path="validation.parquet",
    test_path="test.parquet",
    batch_size=32,
    lr=1e-4,
    epochs=20,
    warmup_steps=100
)
module = TrainingModule(hparams)

# ✅ Early stopping callback added to halt training when val_f1 stops improving
early_stop_callback = EarlyStopping(
    monitor="val_f1",
    patience=3,
    mode="max",
    verbose=True
)

# Train & Evaluate
trainer = pl.Trainer(
    accelerator="gpu", devices=1,
    max_epochs=hparams.epochs,
    callbacks=[TQDMProgressBar(refresh_rate=10), early_stop_callback]  # ✅ Plug in early stopping
)
trainer.fit(module)
trainer.test(module)


train-00000-of-00001.parquet:   0%|          | 0.00/1.03M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/129k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/127k [00:00<?, ?B/s]

INFO:lightning_fabric.utilities.seed:Seed set to 42
INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loading `train_dataloader` to estimate number of stepping batches.
INFO:pytorch_lightning.callbacks.model_summary:
  | Name  | Type               | Params | Mode 
-----------------------------------------------------
0 | model | EmoModel           | 142 M  | train
1 | f1    | MulticlassF1Score  | 0      | train
2 | acc   | Mult

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:476: Your `val_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test dataloaders.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved. New best score: 0.850


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.011 >= min_delta = 0.0. New best score: 0.860


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.023 >= min_delta = 0.0. New best score: 0.883


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.004 >= min_delta = 0.0. New best score: 0.887


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.001 >= min_delta = 0.0. New best score: 0.888


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.000 >= min_delta = 0.0. New best score: 0.888


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.011 >= min_delta = 0.0. New best score: 0.899


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Monitored metric val_f1 did not improve in the last 3 records. Best score: 0.899. Signaling Trainer to stop.
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:476: Your `test_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test dataloaders.


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.4056320786476135,
  'test_acc': 0.9325000047683716,
  'test_f1': 0.872313916683197}]

oversampling and undersampling mixed

In [17]:
# ✅ Install required libraries
!pip install -q huggingface_hub pytorch_lightning torchmetrics

# ✅ Download the dataset from Hugging Face
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="dair-ai/emotion", filename="split/train-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/test-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/validation-00000-of-00001.parquet", repo_type="dataset", local_dir=".")

# ✅ Rename to simpler file names
import os
os.rename("split/train-00000-of-00001.parquet", "train.parquet")
os.rename("split/test-00000-of-00001.parquet", "test.parquet")
os.rename("split/validation-00000-of-00001.parquet", "validation.parquet")

# ✅ Tokenizer
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
tokenizer.save_pretrained("tokenizer")

# ✅ Core Libraries
import torch
import torch.nn as nn
import pandas as pd
import pytorch_lightning as pl
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
import torchmetrics

import random
import numpy as np

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    pl.seed_everything(seed, workers=True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)

label2int = {"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}
emotions = list(label2int.keys())

# # ✅ Oversampling Dataset
# class EmoDataset(Dataset):
#     def __init__(self, path):
#         self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
#         self.data = self.data[self.data["label"].isin(label2int.values())]

#         # Oversample: duplicate minority classes to match max count
#         max_count = self.data["label"].value_counts().max()
#         oversampled = []
#         for label in self.data["label"].unique():
#             class_data = self.data[self.data["label"] == label]
#             repeated = class_data.sample(max_count, replace=True, random_state=42)
#             oversampled.append(repeated)
#         self.data = pd.concat(oversampled).sample(frac=1).reset_index(drop=True)  # Shuffle

#     def __getitem__(self, idx):
#         row = self.data.iloc[idx]
#         return row["text"], row["label"]

#     def __len__(self): return len(self.data)
class EmoDataset(Dataset):
    def __init__(self, path, max_samples_per_class=1000):
        self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
        self.data = self.data[self.data["label"].isin(label2int.values())]

        grouped = []
        for label in self.data["label"].unique():
            class_data = self.data[self.data["label"] == label]

            if len(class_data) > max_samples_per_class:
                # ✅ Undersampling majority class
                sampled = class_data.sample(max_samples_per_class, random_state=42)
            else:
                # ✅ Oversampling minority class
                sampled = class_data.sample(max_samples_per_class, replace=True, random_state=42)

            grouped.append(sampled)

        self.data = pd.concat(grouped).sample(frac=1, random_state=42).reset_index(drop=True)  # Shuffle

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        return row["text"], row["label"]

    def __len__(self): return len(self.data)


# ✅ Tokenization wrapper
class TokenizersCollateFn:
    def __init__(self): self.tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
    def __call__(self, batch):
        texts, labels = zip(*batch)
        enc = self.tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt", max_length=512)
        return (enc["input_ids"], enc["attention_mask"]), torch.tensor(labels)

# ✅ DeBERTa + Attention Pooling + Classifier
class EmoModel(nn.Module):
    def __init__(self, n_classes, hidden_size=768, dropout=0.3):
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1), nn.Softmax(dim=1)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, n_classes)
        )
    def forward(self, inputs):
        input_ids, attn_mask = inputs
        hidden_states = self.base_model(input_ids, attention_mask=attn_mask).last_hidden_state
        weights = self.attention(hidden_states)
        pooled = torch.sum(weights * hidden_states, dim=1)
        return self.classifier(pooled)

# ✅ Lightning training module
class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.model = EmoModel(n_classes=len(emotions))
        self.f1 = torchmetrics.F1Score(task="multiclass", num_classes=len(emotions), average="macro")
        self.acc = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))
        self.loss = nn.CrossEntropyLoss()  # Optionally add weight if needed

    def forward(self, x): return self.model(x)

    def step(self, batch, name):
        x, y = batch
        y_hat = self(x)
        loss = self.loss(y_hat, y)
        self.log(f"{name}_loss", loss, prog_bar=True)
        self.log(f"{name}_acc", self.acc(y_hat, y), prog_bar=True)
        self.log(f"{name}_f1", self.f1(y_hat, y), prog_bar=True)
        return loss

    def training_step(self, batch, _): return self.step(batch, "train")
    def validation_step(self, batch, _): return self.step(batch, "val")
    def test_step(self, batch, _): return self.step(batch, "test")

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.hparams.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer, self.hparams.warmup_steps, self.trainer.estimated_stepping_batches
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]

    def create_loader(self, path, train=True):
        return DataLoader(
            EmoDataset(path),
            batch_size=self.hparams.batch_size,
            shuffle=train,  # No sampler now, use shuffle
            collate_fn=TokenizersCollateFn()
        )

    def train_dataloader(self): return self.create_loader(self.hparams.train_path, True)
    def val_dataloader(self): return self.create_loader(self.hparams.val_path, False)
    def test_dataloader(self): return self.create_loader(self.hparams.test_path, False)

# ✅ Training configuration
from argparse import Namespace
from pytorch_lightning.callbacks import EarlyStopping, TQDMProgressBar

hparams = Namespace(
    train_path="train.parquet",
    val_path="validation.parquet",
    test_path="test.parquet",
    batch_size=32,
    lr=1e-4,
    epochs=20,
    warmup_steps=100
)
module = TrainingModule(hparams)

# ✅ EarlyStopping callback
early_stop_callback = EarlyStopping(
    monitor="val_f1", patience=3, mode="max", verbose=True
)

# ✅ Trainer setup
trainer = pl.Trainer(
    accelerator="gpu", devices=1,
    max_epochs=hparams.epochs,
    callbacks=[TQDMProgressBar(refresh_rate=10), early_stop_callback]
)

# ✅ Train and evaluate
trainer.fit(module)
trainer.test(module)

train-00000-of-00001.parquet:   0%|          | 0.00/1.03M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/129k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/127k [00:00<?, ?B/s]

INFO:lightning_fabric.utilities.seed:Seed set to 42
INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loading `train_dataloader` to estimate number of stepping batches.
INFO:pytorch_lightning.callbacks.model_summary:
  | Name  | Type               | Params | Mode 
-----------------------------------------------------
0 | model | EmoModel           | 142 M  | train
1 | f1    | MulticlassF1Score  | 0      | train
2 | acc   | Mult

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved. New best score: 0.888


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.013 >= min_delta = 0.0. New best score: 0.901


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.002 >= min_delta = 0.0. New best score: 0.903


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.001 >= min_delta = 0.0. New best score: 0.904


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.009 >= min_delta = 0.0. New best score: 0.913


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.003 >= min_delta = 0.0. New best score: 0.916


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Monitored metric val_f1 did not improve in the last 3 records. Best score: 0.916. Signaling Trainer to stop.
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.3841523826122284,
  'test_acc': 0.9128333330154419,
  'test_f1': 0.9016481637954712}]

In [18]:
# ✅ Install required libraries
!pip install -q huggingface_hub pytorch_lightning torchmetrics

# ✅ Download the dataset from Hugging Face
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="dair-ai/emotion", filename="split/train-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/test-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/validation-00000-of-00001.parquet", repo_type="dataset", local_dir=".")

# ✅ Rename to simpler file names
import os
os.rename("split/train-00000-of-00001.parquet", "train.parquet")
os.rename("split/test-00000-of-00001.parquet", "test.parquet")
os.rename("split/validation-00000-of-00001.parquet", "validation.parquet")

# ✅ Tokenizer
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
tokenizer.save_pretrained("tokenizer")

# ✅ Core Libraries
import torch
import torch.nn as nn
import pandas as pd
import pytorch_lightning as pl
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
import torchmetrics

import random
import numpy as np

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    pl.seed_everything(seed, workers=True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)

label2int = {"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}
emotions = list(label2int.keys())

# # ✅ Oversampling Dataset
# class EmoDataset(Dataset):
#     def __init__(self, path):
#         self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
#         self.data = self.data[self.data["label"].isin(label2int.values())]

#         # Oversample: duplicate minority classes to match max count
#         max_count = self.data["label"].value_counts().max()
#         oversampled = []
#         for label in self.data["label"].unique():
#             class_data = self.data[self.data["label"] == label]
#             repeated = class_data.sample(max_count, replace=True, random_state=42)
#             oversampled.append(repeated)
#         self.data = pd.concat(oversampled).sample(frac=1).reset_index(drop=True)  # Shuffle

#     def __getitem__(self, idx):
#         row = self.data.iloc[idx]
#         return row["text"], row["label"]

#     def __len__(self): return len(self.data)
class EmoDataset(Dataset):
    def __init__(self, path, max_samples_per_class=1000):
        self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
        self.data = self.data[self.data["label"].isin(label2int.values())]

        grouped = []
        for label in self.data["label"].unique():
            class_data = self.data[self.data["label"] == label]

            if len(class_data) > max_samples_per_class:
                # ✅ Undersampling majority class
                sampled = class_data.sample(max_samples_per_class, random_state=42)
            else:
                # ✅ Oversampling minority class
                sampled = class_data.sample(max_samples_per_class, replace=True, random_state=42)

            grouped.append(sampled)

        self.data = pd.concat(grouped).sample(frac=1, random_state=42).reset_index(drop=True)  # Shuffle

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        return row["text"], row["label"]

    def __len__(self): return len(self.data)


# ✅ Tokenization wrapper
class TokenizersCollateFn:
    def __init__(self): self.tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
    def __call__(self, batch):
        texts, labels = zip(*batch)
        enc = self.tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt", max_length=512)
        return (enc["input_ids"], enc["attention_mask"]), torch.tensor(labels)

# ✅ DeBERTa + Attention Pooling + Classifier
class EmoModel(nn.Module):
    def __init__(self, n_classes, hidden_size=768, dropout=0.3):
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1), nn.Softmax(dim=1)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, n_classes)
        )
    def forward(self, inputs):
        input_ids, attn_mask = inputs
        hidden_states = self.base_model(input_ids, attention_mask=attn_mask).last_hidden_state
        weights = self.attention(hidden_states)
        pooled = torch.sum(weights * hidden_states, dim=1)
        return self.classifier(pooled)

# ✅ Lightning training module
class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.model = EmoModel(n_classes=len(emotions))
        self.f1 = torchmetrics.F1Score(task="multiclass", num_classes=len(emotions), average="macro")
        self.acc = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))
        self.loss = nn.CrossEntropyLoss()  # Optionally add weight if needed

        # ✅ 类别权重损失：使用训练集标签频率
        self.weight = self.compute_class_weight(hparams.train_path)
        self.loss = nn.CrossEntropyLoss(weight=self.weight)

    def compute_class_weight(self, path):
        df = pd.read_parquet(path)
        counts = torch.bincount(torch.tensor(df["label"].values))
        weights = 1.0 / counts.float()
        return weights / weights.sum() * len(emotions)  # Normalize total weight to num_classes


    def forward(self, x): return self.model(x)

    def step(self, batch, name):
        x, y = batch
        y_hat = self(x)
        loss = self.loss(y_hat, y)
        self.log(f"{name}_loss", loss, prog_bar=True)
        self.log(f"{name}_acc", self.acc(y_hat, y), prog_bar=True)
        self.log(f"{name}_f1", self.f1(y_hat, y), prog_bar=True)
        return loss

    def training_step(self, batch, _): return self.step(batch, "train")
    def validation_step(self, batch, _): return self.step(batch, "val")
    def test_step(self, batch, _): return self.step(batch, "test")

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.hparams.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer, self.hparams.warmup_steps, self.trainer.estimated_stepping_batches
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]

    def create_loader(self, path, train=True):
        return DataLoader(
            EmoDataset(path),
            batch_size=self.hparams.batch_size,
            shuffle=train,
            collate_fn=TokenizersCollateFn()
        )
    def train_dataloader(self): return self.create_loader(self.hparams.train_path, True)
    def val_dataloader(self): return self.create_loader(self.hparams.val_path, False)
    def test_dataloader(self): return self.create_loader(self.hparams.test_path, False)

# ✅ Training configuration
from argparse import Namespace
from pytorch_lightning.callbacks import EarlyStopping, TQDMProgressBar

hparams = Namespace(
    train_path="train.parquet",
    val_path="validation.parquet",
    test_path="test.parquet",
    batch_size=32,
    lr=1e-4,
    epochs=20,
    warmup_steps=100
)
module = TrainingModule(hparams)

# ✅ EarlyStopping callback
early_stop_callback = EarlyStopping(
    monitor="val_f1", patience=3, mode="max", verbose=True
)

# ✅ Trainer setup
trainer = pl.Trainer(
    accelerator="gpu", devices=1,
    max_epochs=hparams.epochs,
    callbacks=[TQDMProgressBar(refresh_rate=10), early_stop_callback]
)

# ✅ Train and evaluate
trainer.fit(module)
trainer.test(module)

train-00000-of-00001.parquet:   0%|          | 0.00/1.03M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/129k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/127k [00:00<?, ?B/s]

INFO:lightning_fabric.utilities.seed:Seed set to 42
INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loading `train_dataloader` to estimate number of stepping batches.
INFO:pytorch_lightning.callbacks.model_summary:
  | Name  | Type               | Params | Mode 
-----------------------------------------------------
0 | model | EmoModel           | 142 M  | train
1 | f1    | MulticlassF1Score  | 0      | train
2 | acc   | Mult

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved. New best score: 0.846


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.015 >= min_delta = 0.0. New best score: 0.861


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.042 >= min_delta = 0.0. New best score: 0.903


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.012 >= min_delta = 0.0. New best score: 0.916


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.000 >= min_delta = 0.0. New best score: 0.916


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.007 >= min_delta = 0.0. New best score: 0.923


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Monitored metric val_f1 did not improve in the last 3 records. Best score: 0.923. Signaling Trainer to stop.
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.3123108446598053,
  'test_acc': 0.8993333578109741,
  'test_f1': 0.8842878341674805}]

Tong's improvement with f1 score and early stop

In [10]:
!pip install -q huggingface_hub pytorch_lightning torchmetrics

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="dair-ai/emotion", filename="split/train-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/test-00000-of-00001.parquet", repo_type="dataset", local_dir=".")
hf_hub_download(repo_id="dair-ai/emotion", filename="split/validation-00000-of-00001.parquet", repo_type="dataset", local_dir=".")

import os
os.rename("split/train-00000-of-00001.parquet", "train.parquet")
os.rename("split/test-00000-of-00001.parquet", "test.parquet")
os.rename("split/validation-00000-of-00001.parquet", "validation.parquet")

from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
tokenizer.save_pretrained("tokenizer")

import torch
import torch.nn as nn
import pandas as pd
import pytorch_lightning as pl
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
import torchmetrics

import random
import numpy as np

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    pl.seed_everything(seed, workers=True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)


label2int = {"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}
emotions = list(label2int.keys())

class EmoDataset(Dataset):
    def __init__(self, path):
        self.data = pd.read_parquet(path).dropna(subset=["text", "label"])
        self.data = self.data[self.data["label"].isin(label2int.values())]
    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        return row["text"], row["label"]
    def __len__(self): return len(self.data)

class TokenizersCollateFn:
    def __init__(self): self.tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
    def __call__(self, batch):
        texts, labels = zip(*batch)
        enc = self.tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt", max_length=512)
        return (enc["input_ids"], enc["attention_mask"]), torch.tensor(labels)

class EmoModel(nn.Module):
    def __init__(self, n_classes, hidden_size=768, dropout=0.3):
        super().__init__()
        self.base_model = AutoModel.from_pretrained("microsoft/deberta-v3-small")
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, 1), nn.Softmax(dim=1)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, n_classes)
        )
    def forward(self, inputs):
        input_ids, attn_mask = inputs
        hidden_states = self.base_model(input_ids, attention_mask=attn_mask).last_hidden_state
        weights = self.attention(hidden_states)
        pooled = torch.sum(weights * hidden_states, dim=1)
        return self.classifier(pooled)

class TrainingModule(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)
        self.model = EmoModel(n_classes=len(emotions))
        self.f1 = torchmetrics.F1Score(task="multiclass", num_classes=len(emotions), average="macro")
        self.acc = torchmetrics.Accuracy(task="multiclass", num_classes=len(emotions))
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x): return self.model(x)

    def step(self, batch, name):
        x, y = batch
        y_hat = self(x)
        loss = self.loss(y_hat, y)
        self.log(f"{name}_loss", loss, prog_bar=True)
        self.log(f"{name}_acc", self.acc(y_hat, y), prog_bar=True)
        self.log(f"{name}_f1", self.f1(y_hat, y), prog_bar=True)
        return loss

    def training_step(self, batch, _): return self.step(batch, "train")
    def validation_step(self, batch, _): return self.step(batch, "val")
    def test_step(self, batch, _): return self.step(batch, "test")

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.hparams.lr)
        scheduler = get_linear_schedule_with_warmup(optimizer, self.hparams.warmup_steps, self.trainer.estimated_stepping_batches)
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]

    def create_loader(self, path, train=True):
        return DataLoader(
            EmoDataset(path),
            batch_size=self.hparams.batch_size,
            shuffle=train,
            collate_fn=TokenizersCollateFn()
        )
    def train_dataloader(self): return self.create_loader(self.hparams.train_path, True)
    def val_dataloader(self): return self.create_loader(self.hparams.val_path, False)
    def test_dataloader(self): return self.create_loader(self.hparams.test_path, False)

# Setup
from argparse import Namespace
from pytorch_lightning.callbacks import EarlyStopping, TQDMProgressBar
hparams = Namespace(
    train_path="train.parquet",
    val_path="validation.parquet",
    test_path="test.parquet",
    batch_size=32,
    lr=1e-4,
    epochs=20,
    warmup_steps=100
)
module = TrainingModule(hparams)

early_stop_callback = EarlyStopping(
    monitor="val_f1",
    patience=3,
    mode="max",
    verbose=True
)

trainer = pl.Trainer(
    accelerator="gpu", devices=1,
    max_epochs=hparams.epochs,
    callbacks=[TQDMProgressBar(refresh_rate=10), early_stop_callback]
)
trainer.fit(module)
trainer.test(module)


train-00000-of-00001.parquet:   0%|          | 0.00/1.03M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/129k [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/127k [00:00<?, ?B/s]

INFO:lightning_fabric.utilities.seed:Seed set to 42
INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.utilities.rank_zero:Loading `train_dataloader` to estimate number of stepping batches.
INFO:pytorch_lightning.callbacks.model_summary:
  | Name  | Type               | Params | Mode 
-----------------------------------------------------
0 | model | EmoModel           | 142 M  | train
1 | f1    | MulticlassF1Score  | 0      | train
2 | acc   | Mult

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved. New best score: 0.880


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.010 >= min_delta = 0.0. New best score: 0.890


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.000 >= min_delta = 0.0. New best score: 0.890


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_f1 improved by 0.013 >= min_delta = 0.0. New best score: 0.904


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Monitored metric val_f1 did not improve in the last 3 records. Best score: 0.904. Signaling Trainer to stop.
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.16681009531021118,
  'test_acc': 0.9330000281333923,
  'test_f1': 0.8772312998771667}]