## RoberTuito

This notebook contains our experiments with the [RoberTuito model](https://arxiv.org/abs/2111.09453), a RoBERTa-based Transformer language model pre-trained on Spanish tweets.
The results fell short of what we expected, which is why we proposed HateStack.

In [1]:
import os
import numpy as np
import lightning as L

from box import Box

from robertuito import evaluate, predict, train_model
from robertuito import create_folds, import_data
from pytictoc import TicToc

L.seed_everything(42, workers=True)

# Disable tokenizers parallelism to avoid the fork-related warning (and possible deadlocks) with DataLoader workers
os.environ["TOKENIZERS_PARALLELISM"] = "false"


Global seed set to 42


### Model Definition

In [2]:
LABELS = [
    "Odio",
    "Mujeres",
    "Comunidad LGBTQ+",
    "Comunidades Migrantes",
    "Pueblos Originarios",
]

TRAIN_PATH = "public_data/tweets_train.csv"
TEST_PATH = "public_data/tweets_test.csv"
train, test = import_data(TRAIN_PATH, TEST_PATH)
train = create_folds(train, LABELS)
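
`create_folds` comes from the project's own `robertuito` package and its implementation is not shown in this notebook. As a rough illustration only, fold assignment for K-fold cross-validation can be sketched as below. `assign_folds` is a hypothetical helper, and the real function may well stratify on the label columns, which this sketch does not do.

```python
import random


def assign_folds(n_rows, n_folds=5, seed=42):
    """Return a list mapping each row index to a fold id in [0, n_folds)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        folds[row] = position % n_folds  # deal rows out round-robin
    return folds


# Hypothetical usage: 12 rows split into 5 folds.
folds = assign_folds(n_rows=12, n_folds=5)
for k in range(5):
    val_rows = [i for i, f in enumerate(folds) if f == k]
    print(f"fold {k}: validation rows {val_rows}")
```

Each row belongs to exactly one fold, so every row is used for validation once and for training in the remaining runs.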


In [3]:
dm_config = Box(
    dict(
        train=train,
        test=test,
        labels=LABELS,
        batch_size=32,
        tokenizer="pysentimiento/robertuito-hate-speech",
    )
)

model_config = Box(
    dict(
        model_name="pysentimiento/robertuito-hate-speech",
        dropout=0.2,
        hidden_size=768,
        n_labels=len(LABELS),
        train_size=56,
        batch_size=dm_config.batch_size,
        warmup=0.2,
        w_decay=0.001,
        lr=3e-4,
    )
)

training_config = Box(
    dict(max_epochs=10, patience=10, fast_dev_run=False, overfit_batches=0)
)
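
To make the scheduler-related numbers concrete: with `train_size=56` steps per epoch (the training log also advances 56 global steps per epoch) and `max_epochs=10`, training runs for 560 optimizer steps. The sketch below assumes that `warmup=0.2` is a fraction of the total steps and that the schedule is a linear warmup followed by a linear decay. Neither assumption is confirmed by this notebook, and `lr_at_step` is a hypothetical helper.

```python
def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


steps_per_epoch = 56   # model_config.train_size
max_epochs = 10        # training_config.max_epochs
base_lr = 3e-4         # model_config.lr
warmup_fraction = 0.2  # model_config.warmup, assumed to be a fraction of total steps

total_steps = steps_per_epoch * max_epochs
warmup_steps = round(warmup_fraction * total_steps)

# Learning rate at a few points of the run under the assumed schedule.
for step in (0, 56, warmup_steps, 280, total_steps):
    print(step, lr_at_step(step, base_lr, warmup_steps, total_steps))
```

Under these assumptions the warmup ends at step 112, that is, at the end of the second epoch.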


### Model Training


In [4]:
# Per-fold validation metrics and test-set predictions.
score = []
precision_score = []
recall_score = []
preds = []
t = TicToc()
t.tic()
for fold in range(5):
    # `[fold]` selects the fold for this run (presumably the one held out for validation).
    trainer, model, dm = train_model([fold], dm_config, model_config, training_config)
    f1, precision, recall = evaluate(trainer, model, dm, threshold=0.5, custom=True)
    # validation=False: predictions for the test set, blended later in In [6].
    preds.append(predict(trainer, model, dm, validation=False))
    print(f"Fold {fold} F1: {f1}")
    print(f"Fold {fold} Precision: {precision}")
    print(f"Fold {fold} Recall: {recall}")
    score.append(f1)
    precision_score.append(precision)
    recall_score.append(recall)

t.toc("RoberTuito 5 folds training time: ")


Some weights of the model checkpoint at pysentimiento/robertuito-hate-speech were not used when initializing RobertaModel: ['classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at pysentimiento/robertuito-hate-speech and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inf

Epoch 0, global step 56: 'val_loss' reached 0.25795 (best 0.25795), saving model to '/home/alfonso/Documents/hate_speech/checkpoints/epoch=0-step=56-v51.ckpt' as top 1
Epoch 1, global step 112: 'val_loss' was not in top 1
Epoch 2, global step 168: 'val_loss' was not in top 1
Epoch 3, global step 224: 'val_loss' was not in top 1
Epoch 4, global step 280: 'val_loss' was not in top 1
Epoch 5, global step 336: 'val_loss' was not in top 1
Epoch 6, global step 392: 'val_loss' was not in top 1
Epoch 7, global step 448: 'val_loss' was not in top 1
Epoch 8, global step 504: 'val_loss' was not in top 1
Epoch 9, global step 560: 'val_loss' was not in top 1
`Trainer.fit` stopped: `max_epochs=10` reached.
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  _warn_prf(average, modifier, msg_start, len(result))

Fold 0 F1: 0.37222222222222223
Fold 0 Precision: 0.11858407079646019
Fold 0 Recall: 0.2


Epoch 0, global step 56: 'val_loss' reached 0.27473 (best 0.27473), saving model to '/home/alfonso/Documents/hate_speech/checkpoints/epoch=0-step=56-v52.ckpt' as top 1
Epoch 1, global step 112: 'val_loss' was not in top 1
Epoch 2, global step 168: 'val_loss' was not in top 1
Epoch 3, global step 224: 'val_loss' was not in top 1
Epoch 4, global step 280: 'val_loss' was not in top 1
Epoch 5, global step 336: 'val_loss' was not in top 1
Epoch 6, global step 392: 'val_loss' was not in top 1
Epoch 7, global step 448: 'val_loss' was not in top 1
Epoch 8, global step 504: 'val_loss' was not in top 1
Epoch 9, global step 560: 'val_loss' was not in top 1
`Trainer.fit` stopped: `max_epochs=10` reached.

Fold 1 F1: 0.3529411764705882
Fold 1 Precision: 0.10909090909090909
Fold 1 Recall: 0.2


Epoch 0, global step 56: 'val_loss' reached 0.22580 (best 0.22580), saving model to '/home/alfonso/Documents/hate_speech/checkpoints/epoch=0-step=56-v53.ckpt' as top 1
Epoch 1, global step 112: 'val_loss' was not in top 1
Epoch 2, global step 168: 'val_loss' was not in top 1
Epoch 3, global step 224: 'val_loss' was not in top 1
Epoch 4, global step 280: 'val_loss' was not in top 1
Epoch 5, global step 336: 'val_loss' was not in top 1
Epoch 6, global step 392: 'val_loss' was not in top 1
Epoch 7, global step 448: 'val_loss' was not in top 1
Epoch 8, global step 504: 'val_loss' was not in top 1
Epoch 9, global step 560: 'val_loss' was not in top 1
`Trainer.fit` stopped: `max_epochs=10` reached.

Fold 2 F1: 0.3510791366906475
Fold 2 Precision: 0.1082039911308204
Fold 2 Recall: 0.2


Epoch 0, global step 56: 'val_loss' reached 0.24127 (best 0.24127), saving model to '/home/alfonso/Documents/hate_speech/checkpoints/epoch=0-step=56-v54.ckpt' as top 1
Epoch 1, global step 112: 'val_loss' was not in top 1
Epoch 2, global step 168: 'val_loss' was not in top 1
Epoch 3, global step 224: 'val_loss' was not in top 1
Epoch 4, global step 280: 'val_loss' was not in top 1
Epoch 5, global step 336: 'val_loss' was not in top 1
Epoch 6, global step 392: 'val_loss' was not in top 1
Epoch 7, global step 448: 'val_loss' was not in top 1
Epoch 8, global step 504: 'val_loss' was not in top 1
Epoch 9, global step 560: 'val_loss' was not in top 1
`Trainer.fit` stopped: `max_epochs=10` reached.

Fold 3 F1: 0.3888888888888889
Fold 3 Precision: 0.12727272727272726
Fold 3 Recall: 0.2


Epoch 0, global step 56: 'val_loss' reached 0.28408 (best 0.28408), saving model to '/home/alfonso/Documents/hate_speech/checkpoints/epoch=0-step=56-v55.ckpt' as top 1
Epoch 1, global step 112: 'val_loss' reached 0.24491 (best 0.24491), saving model to '/home/alfonso/Documents/hate_speech/checkpoints/epoch=1-step=112-v10.ckpt' as top 1
Epoch 2, global step 168: 'val_loss' was not in top 1
Epoch 3, global step 224: 'val_loss' was not in top 1
Epoch 4, global step 280: 'val_loss' was not in top 1
Epoch 5, global step 336: 'val_loss' was not in top 1
Epoch 6, global step 392: 'val_loss' was not in top 1
Epoch 7, global step 448: 'val_loss' was not in top 1
Epoch 8, global step 504: 'val_loss' was not in top 1
Epoch 9, global step 560: 'val_loss' was not in top 1
`Trainer.fit` stopped: `max_epochs=10` reached.

Fold 4 F1: 0.0
Fold 4 Precision: 0.0
Fold 4 Recall: 0.0
RoberTuito 5 folds training time:  6482.216791 seconds.


### Model Evaluation

We evaluate with 5-fold cross-validation and then blend the five fold models by averaging their predictions on the held-out test set.

In [5]:
print(f"Mean 5 Fold CV Score: {np.mean(score)}")
print(f"Std 5 Fold CV Score: {np.std(score)}")
print(f"Mean 5 Fold CV Precision Score: {np.mean(precision_score)}")
print(f"Std 5 Fold CV Precision Score: {np.std(precision_score)}")
print(f"Mean 5 Fold CV Recall Score: {np.mean(recall_score)}")
print(f"Std 5 Fold CV Recall Score: {np.std(recall_score)}")


Mean 5 Fold CV Score: 0.29302628485446935
Std 5 Fold CV Score: 0.1471638317027075
Mean 5 Fold CV Precision Score: 0.09263033965818339
Std 5 Fold CV Precision Score: 0.04683494415725385
Mean 5 Fold CV Recall Score: 0.16
Std 5 Fold CV Recall Score: 0.08
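
The summary above is pulled down by fold 4, where every metric is 0. The sketch below reproduces the F1 figures from the per-fold values printed in the training output, using only the standard library. Note that `np.std` defaults to the population standard deviation (`ddof=0`), which is what `statistics.pstdev` computes.

```python
from statistics import mean, pstdev

# Per-fold F1 values copied from the training output.
fold_f1 = [
    0.37222222222222223,
    0.3529411764705882,
    0.3510791366906475,
    0.3888888888888889,
    0.0,
]

print(f"mean over 5 folds:   {mean(fold_f1):.4f}")
print(f"pstdev over 5 folds: {pstdev(fold_f1):.4f}")

# For comparison only: the same statistics without the collapsed fold.
print(f"mean over folds 0-3:   {mean(fold_f1[:4]):.4f}")
print(f"pstdev over folds 0-3: {pstdev(fold_f1[:4]):.4f}")
```

Comparing the two pairs of numbers shows how much of the reported spread comes from a single fold rather than from fold-to-fold variation.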


In [6]:
# Aliased imports so the `precision_score` / `recall_score` lists from In [4] are not overwritten.
from sklearn.metrics import precision_score as sk_precision_score
from sklearn.metrics import recall_score as sk_recall_score
from robertuito.utils import f1_custom

# Blend the five fold models by averaging their test-set predictions.
data = np.zeros((len(test), len(LABELS)))
for fold_preds in preds:
    data += fold_preds

data = data / len(preds)
y_pred = np.where(data > 0.5, 1, 0)
y_true = test[LABELS].values
f1 = f1_custom(y_true, y_pred)
pr = sk_precision_score(y_true, y_pred, average="macro", zero_division=0)
rc = sk_recall_score(y_true, y_pred, average="macro", zero_division=0)
print(f"Test F1: {f1}")
print(f"Test Precision: {pr}")
print(f"Test Recall: {rc}")


Test F1: 0.3195723195723196
Test Precision: 0.09393278044522044
Test Recall: 0.2
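
A macro recall of exactly 0.2 over five labels is what one would get, for example, from a classifier that predicts a single label for every tweet and never predicts the other four. The sketch below illustrates this on made-up toy data (the rows are not from the dataset) and then checks the reported test numbers against that reading. The reading is an inference from the printed metrics, not something this notebook verifies directly. It also assumes that `f1_custom` averages the F1 of that one label (plausibly `Odio`) with the macro F1 of the remaining four; its implementation is not shown here.

```python
def macro_precision_recall(y_true, y_pred):
    """Macro-averaged precision and recall; undefined ratios count as 0."""
    n_labels = len(y_true[0])
    precisions, recalls = [], []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / n_labels, sum(recalls) / n_labels


# Toy data: 10 rows, 5 labels; the first label is positive in 6 rows.
y_true = [[1, 1, 0, 0, 0], [1, 0, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 0, 1],
          [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
y_pred = [[1, 0, 0, 0, 0] for _ in y_true]  # always predict only the first label

toy_precision, toy_recall = macro_precision_recall(y_true, y_pred)
print(toy_precision, toy_recall)

# The same reading applied to the reported test metrics.
reported_precision = 0.09393278044522044
reported_f1 = 0.3195723195723196
single_label_precision = 5 * reported_precision  # the other four labels contribute 0
single_label_f1 = 2 * single_label_precision / (1 + single_label_precision)  # recall = 1
print(single_label_f1 / 2, reported_f1)
```

The last two printed values agree closely, and the validation metrics for folds 0 to 3 fit the same arithmetic, so the reported scores are consistent with a model that has collapsed to predicting one label for every input.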
