Vitor Domingos Baldoino dos Santos<br>
Universidade Presbiteriana Mackenzie<br>
Faculdade de Computação e Informática<br>
[vdbaldoino@gmail.com](mailto:vdbaldoino@gmail.com)<br>

Dataset: [Portuguese Tweets for Sentiment Analysis](https://www.kaggle.com/datasets/augustop/portuguese-tweets-for-sentiment-analysis)

Resources for the problems I ran into:

- I could not implement everything I wanted using only the HuggingFace library, so I considered doing the fine-tuning with plain PyTorch.
    - For a PyTorch-only approach, I found: [BERT Fine-Tuning Tutorial with PyTorch · Chris McCormick](https://mccormickml.com/2019/07/22/BERT-fine-tuning/)
    - [Training with PyTorch](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html)
    - [Hyperparameter tuning with Ray Tune](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html)
    - [Saving and loading a general checkpoint in PyTorch](https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html)
    - For a hybrid approach (HuggingFace + a PyTorch training loop), I found this documentation: [Fine-tune a pretrained model](https://huggingface.co/docs/transformers/training)
    - Example post: [análise de sentimentos em português utilizando Pytorch e Python](https://medium.com/data-hackers/an%C3%A1lise-de-sentimentos-em-portugu%C3%AAs-utilizando-pytorch-e-python-91a232165ec0)
- For the hyperparameter search I found the documentation below, but I could not use it because of a bug in the integration with `Ray Tune`.
    - [Hyperparameter Search with Transformers and Ray Tune](https://huggingface.co/blog/ray-tune)
- The first link below is an example notebook for text classification using only HuggingFace tools. Almost everything in this notebook was taken from it. The second link is a tutorial on monitoring training with TensorBoard.
    - [Text Classification on GLUE using `Trainer`](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb#scrollTo=8sgjdLKcIrJm)
    - [BERT Finetuning with Hugging Face and Training Visualizations with TensorBoard](https://medium.com/nlplanet/bert-finetuning-with-hugging-face-and-training-visualizations-with-tensorboard-46368a57fc97)
- At some point I noticed that the HuggingFace library does not compute the model's performance metrics on the training set, which makes it hard to detect possible overfitting. To deal with this, I found the links below:
    - [How to tweak `Trainer` to monitor other metrics on the training set](https://discuss.huggingface.co/t/metrics-for-training-set-in-trainer/2461/3)
    - [Batch and Epoch training metrics for transformers `Trainer`](https://stackoverflow.com/questions/78311534/batch-and-epoch-training-metrics-for-transformers-trainer/78311535#78311535)

- [Performance tips for training](https://huggingface.co/docs/transformers/v4.18.0/en/performance)

## Setup

In [1]:
%%shell
pip install -q transformers==4.39.3
pip install -q datasets==2.18.0
pip install -q torch==2.2.1
pip install -q "ray[tune]==2.12.0"
pip install -q scikit-learn==1.4.2
# pip install -q optuna
# pip install -q hyperopt



In [2]:
import os
import torch
import logging
import numpy as np
import pandas as pd

from copy import deepcopy

from datasets import load_from_disk
from datasets import DatasetDict

from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import accuracy_score

from torch.optim import AdamW
from torch.utils.data import DataLoader
from torch.utils.data import Dataset

from tqdm.auto import tqdm

from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding
from transformers import Trainer
from transformers import TrainingArguments
from transformers import get_scheduler

In [3]:
from google.colab import drive

drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/sentiment-analysis/')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
SEED = 42
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 128
NUM_LABELS = 3
MAX_LENGTH = 128
TASK = "sentiment-analysis"
MODEL_NAME = "bertimbau"

ID2LABEL = {0: "Neutro", 1: "Positivo", 2: "Negativo"}
LABEL2ID = {"Neutro": 0, "Positivo": 1, "Negativo": 2}
MODEL_CHECKPOINT = "neuralmind/bert-base-portuguese-cased"

OUTPUT_DIR = f"models/{MODEL_NAME}-finetuned-{TASK}"

TOKENIZER = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT, use_fast=True)
MODEL = AutoModelForSequenceClassification.from_pretrained(
    MODEL_CHECKPOINT, num_labels=NUM_LABELS, id2label=ID2LABEL, label2id=LABEL2ID
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at neuralmind/bert-base-portuguese-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
def tokenize_function(examples: dict):
    # With batched=True, `map` passes a dict-like batch of columns (lists), not a DatasetDict.
    return TOKENIZER(
        examples["text"], padding="max_length", max_length=MAX_LENGTH, truncation=True
    )
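
The tokenizer call above pads every tweet to `MAX_LENGTH` (128 tokens) and truncates anything longer. The following is a minimal plain-Python sketch of that behavior, not the tokenizer's actual implementation: `pad_or_truncate`, `padding_cost`, `PAD_ID`, and the toy token ids are hypothetical, and a real tokenizer also keeps the special tokens in place when truncating. The second helper counts how many padding positions a batch would need if it were padded only to its longest sequence, which is the default behavior of the `DataCollatorWithPadding` class imported above.

```python
PAD_ID = 0  # hypothetical padding id, for illustration only


def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]              # mimics truncation=True
    n_pad = max_length - len(kept)             # mimics padding="max_length"
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


def padding_cost(lengths, max_length, dynamic):
    """Count padding positions in one batch of sequences with these lengths."""
    clipped = [min(n, max_length) for n in lengths]
    target = max(clipped) if dynamic else max_length
    return sum(target - n for n in clipped)


MAX_LEN = 8  # small stand-in for MAX_LENGTH so the output stays readable

# A short sequence is padded, a long one is cut.
ids, mask = pad_or_truncate([101, 7, 8, 102], MAX_LEN)
long_ids, long_mask = pad_or_truncate(list(range(1, 11)), MAX_LEN)
print(ids, mask)
print(long_ids, long_mask)

# Padding needed for a toy batch with sequence lengths 4, 3 and 6.
lengths = [4, 3, 6]
fixed_pad = padding_cost(lengths, MAX_LEN, dynamic=False)
dynamic_pad = padding_cost(lengths, MAX_LEN, dynamic=True)
print(f"fixed: {fixed_pad} pad tokens, dynamic: {dynamic_pad} pad tokens")
```

Since tweets are usually much shorter than 128 tokens, per-batch padding may reduce the amount of computation spent on padding, at the cost of batches with varying shapes.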

In [6]:
def compute_metrics(predictions, references):
    """Return accuracy and macro-averaged precision, recall and F1 for one batch."""

    # Move the tensors to the CPU and convert them to plain Python lists.
    predictions = predictions.detach().cpu().tolist()
    references = references.detach().cpu().tolist()

    accuracy = accuracy_score(references, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        references, predictions, average="macro", zero_division=0
    )

    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
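
Macro averaging, as used in `compute_metrics`, gives every class the same weight no matter how many examples it has. The sketch below reimplements the idea in plain Python for illustration only (the notebook itself relies on scikit-learn); `macro_prf` and the toy batch are hypothetical, and the sketch averages over an explicit label list. It shows how a model that predicts a single class for every example can reach a reasonable accuracy while macro recall stays at 1/3, a pattern consistent with the recall of 0.333333 in the first steps of the training log.

```python
def macro_prf(references, predictions, labels):
    """Macro-averaged precision, recall and F1; undefined ratios count as 0."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(r == label and p == label for r, p in zip(references, predictions))
        predicted = sum(p == label for p in predictions)
        actual = sum(r == label for r in references)
        precision = tp / predicted if predicted else 0.0  # like zero_division=0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy batch of 10 examples: 6 of class 0, 2 of class 1, 2 of class 2.
references = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
# A collapsed model that always predicts class 0.
predictions = [0] * 10

accuracy = sum(r == p for r, p in zip(references, predictions)) / len(references)
precision, recall, f1 = macro_prf(references, predictions, labels=[0, 1, 2])
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Only class 0 contributes a non-zero recall (1.0), so the macro recall is (1 + 0 + 0) / 3.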


In [7]:
def training_step(train_dataloader: DataLoader, optimizer, lr_scheduler, progress_bar, epoch: int) -> dict:
    """Run one training epoch and return the per-batch loss and metrics."""

    prefix = "train"
    train_metrics = {
        "mode": [],
        "step": [],
        "epoch": [],
        "loss": [],
        "accuracy": [],
        "f1": [],
        "precision": [],
        "recall": [],
    }

    for idx, batch in enumerate(train_dataloader):

        batch = {k: v.to(DEVICE) for k, v in batch.items()}
        outputs = MODEL(**batch)
        loss = outputs.loss
        loss.backward()

        # "Logging" training metrics
        logits = outputs.logits
        predictions = torch.argmax(logits, dim=-1)
        references = batch["labels"]

        metrics = compute_metrics(predictions, references)

        train_metrics["mode"].append(prefix)
        train_metrics["step"].append(idx)
        train_metrics["epoch"].append(epoch)
        train_metrics["loss"].append(loss.item())
        train_metrics["accuracy"].append(metrics["accuracy"])
        train_metrics["f1"].append(metrics["f1"])
        train_metrics["precision"].append(metrics["precision"])
        train_metrics["recall"].append(metrics["recall"])

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

    return train_metrics

In [8]:
def evaluation_step(eval_dataloader: DataLoader, epoch: int) -> dict:
    """Evaluate MODEL on the dev set and return the per-batch loss and metrics."""

    prefix = "eval"
    eval_metrics = {
        "mode": [],
        "step": [],
        "epoch": [],
        "loss": [],
        "accuracy": [],
        "f1": [],
        "precision": [],
        "recall": [],
    }

    for idx, batch in enumerate(eval_dataloader):
        batch = {k: v.to(DEVICE) for k, v in batch.items()}
        with torch.no_grad():
            outputs = MODEL(**batch)

        loss = outputs.loss
        logits = outputs.logits
        predictions = torch.argmax(logits, dim=-1)
        references = batch["labels"]

        # "Logging" evaluation metrics
        metrics = compute_metrics(predictions, references)

        eval_metrics["mode"].append(prefix)
        eval_metrics["step"].append(idx)
        eval_metrics["epoch"].append(epoch)
        eval_metrics["loss"].append(loss.item())
        eval_metrics["accuracy"].append(metrics["accuracy"])
        eval_metrics["f1"].append(metrics["f1"])
        eval_metrics["precision"].append(metrics["precision"])
        eval_metrics["recall"].append(metrics["recall"])

    return eval_metrics
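
`evaluation_step` stores one metric row per batch, and `train` later averages the per-batch F1 values to pick the best epoch. That average is generally not the same as the F1 computed once over all dev predictions, because F1 is not linear in the underlying counts and the classes present vary from batch to batch. Below is a hedged sketch of the difference with toy data; `macro_f1` is a hypothetical helper that averages over a fixed label list, whereas scikit-learn, when `labels` is not passed, averages over the labels found in the references or predictions.

```python
def macro_f1(references, predictions, labels):
    """Macro F1 over a fixed label list; a class with no counts scores 0."""
    scores = []
    for label in labels:
        pairs = list(zip(references, predictions))
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


LABELS = [0, 1, 2]

# Two toy batches of (references, predictions).
batches = [
    ([0, 0, 1, 2], [0, 0, 1, 2]),  # every class present, all correct
    ([0, 0, 0, 1], [0, 0, 0, 0]),  # class 2 absent, class 1 missed
]

# Option 1: score each batch, then average the scores (what the loop does).
per_batch = [macro_f1(refs, preds, LABELS) for refs, preds in batches]
mean_of_batches = sum(per_batch) / len(per_batch)

# Option 2: pool all predictions first, then score once.
all_refs = [r for refs, _ in batches for r in refs]
all_preds = [p for _, preds in batches for p in preds]
pooled = macro_f1(all_refs, all_preds, LABELS)

print(f"mean of per-batch F1: {mean_of_batches:.4f}")
print(f"F1 over pooled predictions: {pooled:.4f}")
```

The per-batch average is still a usable signal for comparing epochs, but pooled metrics are the ones that are comparable with numbers reported on the full dev set.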

In [9]:
def load_data(data_path):

    ds = load_from_disk(data_path)
    ds = ds.map(tokenize_function, batched=True)
    ds = ds.remove_columns(["text"])
    ds.set_format("torch")

    train_dataloader = DataLoader(ds["train"], shuffle=True, batch_size=BATCH_SIZE)
    eval_dataloader = DataLoader(ds["dev"], shuffle=False, batch_size=BATCH_SIZE)

    return train_dataloader, eval_dataloader


In [10]:
def train(lr, wd, num_epochs, data_path):
    """Fine-tune MODEL, log per-batch metrics to CSV and checkpoint the best epoch."""

    training_set, evaluate_set = load_data(data_path)
    optimizer = AdamW(MODEL.parameters(), lr=lr, weight_decay=wd)
    num_training_steps = num_epochs * len(training_set)
    lr_scheduler = get_scheduler(
        name="linear",
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=num_training_steps,
    )

    progress_bar = tqdm(range(num_training_steps))
    best_model_metric = 0
    logging_df = None

    # Make sure the output directory exists before writing logs and checkpoints.
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    MODEL.to(DEVICE)
    for epoch in range(num_epochs):
        MODEL.train()
        train_metrics = training_step(training_set, optimizer, lr_scheduler, progress_bar, epoch)
        train_metrics = pd.DataFrame(train_metrics)
        print(train_metrics)

        MODEL.eval()
        eval_metrics = evaluation_step(evaluate_set, epoch)
        eval_metrics = pd.DataFrame(eval_metrics)
        print(eval_metrics)

        # Append this epoch's rows and rewrite the full log file.
        logging_df = pd.concat(
            [logging_df, train_metrics, eval_metrics], axis=0, ignore_index=True
        )
        logging_df.to_csv(f"{OUTPUT_DIR}/training-log.csv", index=False)

        # Model selection: mean of the per-batch macro F1 on the dev set.
        eval_f1_score = eval_metrics["f1"].mean()
        if eval_f1_score > best_model_metric:
            best_model_metric = eval_f1_score

            MODEL.save_pretrained(f"{OUTPUT_DIR}/hugging-face-save")
            torch.save(
                {
                    "epoch": epoch,
                    "model_state_dict": MODEL.state_dict(),
                    "optimizer_state_dict": optimizer.state_dict(),
                    # Plain list, so the checkpoint does not carry a pandas Series.
                    "loss": train_metrics["loss"].tolist(),
                },
                f"{OUTPUT_DIR}/pytorch-save",
            )
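
With `name="linear"` and `num_warmup_steps=0`, the scheduler decays the learning rate linearly from `lr` down to 0 over `num_training_steps`. Here is a minimal sketch of that multiplier in plain Python, written to mirror the documented behavior rather than call the library; `linear_lr` is a hypothetical helper, and the step counts assume the 5 epochs of 4926 batches each (24630 steps in total) reported by the run in this notebook.

```python
def linear_lr(step, base_lr, num_training_steps, num_warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    decay_steps = max(1, num_training_steps - num_warmup_steps)
    return base_lr * max(0.0, remaining / decay_steps)


BASE_LR = 5e-5
NUM_EPOCHS = 5
STEPS_PER_EPOCH = 4926
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 24630

# Learning rate at the start of each epoch and at the very end of training.
for epoch in range(NUM_EPOCHS + 1):
    step = epoch * STEPS_PER_EPOCH
    print(f"step {step:>5}: lr = {linear_lr(step, BASE_LR, TOTAL_STEPS):.2e}")
```

Because the scheduler is stepped once per batch, `num_training_steps` has to count batches, not epochs, which is why `train` multiplies `num_epochs` by `len(training_set)`.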


In [11]:
train(lr=5e-5,
      wd=0.01,
      num_epochs=5,
      data_path="/content/drive/MyDrive/sentiment-analysis/data/intermediate/without-emoticons")


  0%|          | 0/24630 [00:00<?, ?it/s]

       mode  step  epoch      loss  accuracy        f1  precision    recall
0     train     0      0  1.124521  0.242188  0.223854   0.413375  0.353726
1     train     1      0  0.992461  0.593750  0.249589   0.199475  0.333333
2     train     2      0  0.910455  0.585938  0.246305   0.195312  0.333333
3     train     3      0  0.845725  0.625000  0.256410   0.208333  0.333333
4     train     4      0  0.847026  0.609375  0.252427   0.203125  0.333333
...     ...   ...    ...       ...       ...       ...        ...       ...
4921  train  4921      0  0.470691  0.750000  0.796500   0.826031  0.782517
4922  train  4922      0  0.421554  0.828125  0.838869   0.867305  0.816760
4923  train  4923      0  0.414202  0.843750  0.832423   0.854637  0.825785
4924  train  4924      0  0.434105  0.757812  0.830323   0.830787  0.829927
4925  train  4925      0  0.309498  0.839506  0.857955   0.867284  0.851378

[4926 rows x 8 columns]
      mode  step  epoch      loss  accuracy        f1  precisio