# Fine-tuning RoBERTa for emotion classification

[Source](https://github.com/bentoml/gallery/blob/main/transformers/roberta_text_classification).
Try it out on [Colab](https://colab.research.google.com/github/bentoml/gallery/blob/main/transformers/roberta_text_classification/fine_tune_roberta.ipynb).

Install required dependencies:

In [None]:
!pip install -r requirements.txt

## Fine-tune a RoBERTa model

First, let's load the [emotion dataset](https://huggingface.co/datasets/dair-ai/emotion) using [huggingface/datasets](https://github.com/huggingface/datasets):

In [26]:
import bentoml
import transformers

from config import MODEL_NAME, MODEL
from datasets import load_dataset
from datasets import ClassLabel, Value

transformers.set_seed(420)

In [29]:
emotion = load_dataset('emotion')
emotion


DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})

In [7]:
model, tokenizer = bentoml.transformers.load(MODEL_NAME)

In [45]:
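NUM_LABELS = 6      # sadness, joy, love, anger, fear, surprise
NUM_EPOCHS = 1      # not referenced in the cells below
NUM_EXAMPLES = 400  # not referenced in the cells below
BATCH_SIZE = 64     # per-device batch size for training and evaluation

NUM_TRAIN_EPOCHS = 1
LR = 2e-5           # learning rate
WDECAY = 0.01       # weight decay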

In [30]:
def preprocess_function(examples):
    return tokenizer(examples['text'], truncation=True, padding=True)

In [32]:
tokenized_emotion = emotion.map(preprocess_function, batched=True, batch_size=None)


In [34]:
collator = transformers.DataCollatorWithPadding(tokenizer=tokenizer)
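
`DataCollatorWithPadding` pads each batch to the length of its longest sequence at collation time. The sketch below imitates that idea in plain Python; the helper `pad_batch`, the toy token ids, and the pad id of 1 are assumptions for illustration, not the library's implementation. Because the `tokenizer(...)` call above already uses `padding=True` over a whole split, the sequences reaching the collator would already share one length, so dynamic padding only helps if that earlier padding is dropped.

```python
def pad_batch(batch, pad_id):
    """Pad every sequence in `batch` to the longest sequence in that batch."""
    width = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


PAD_ID = 1  # assumption: pad id used for illustration
toy_batch = [[0, 100, 2], [0, 100, 200, 300, 2]]  # hypothetical token ids

padded = pad_batch(toy_batch, PAD_ID)
print(padded["input_ids"])
print(padded["attention_mask"])
```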

In [35]:
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='macro')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
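
To make `average='macro'` concrete, here is a minimal pure-Python sketch that computes per-class precision, recall, and F1 and then takes their unweighted mean. The function name `macro_precision_recall_f1` and the toy labels are assumptions for illustration; in the notebook itself, scikit-learn does this work.

```python
from collections import Counter


def macro_precision_recall_f1(labels, preds):
    """Unweighted mean of per-class precision, recall, and F1."""
    classes = sorted(set(labels) | set(preds))
    tp, fp, fn = Counter(), Counter(), Counter()
    for y, p in zip(labels, preds):
        if y == p:
            tp[y] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[y] += 1  # true class y was missed
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


toy_labels = [0, 0, 1, 1, 2, 2]  # hypothetical ground truth
toy_preds = [0, 1, 1, 1, 2, 0]   # hypothetical predictions
precision, recall, f1 = macro_precision_recall_f1(toy_labels, toy_preds)
print(precision, recall, f1)
```

Macro averaging weights every class equally, so rare classes such as `surprise` count as much as frequent ones. That is why the macro F1 reported later can sit below the accuracy.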

In [36]:
updated_model_head = transformers.AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=NUM_LABELS, ignore_mismatched_sizes=True)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at j-hartmann/emotion-english-distilroberta-base and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([7, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated
- classifier.out_proj.bias: found shape torch.Size([7]) in the checkpoint and torch.Size([6]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
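
The warning above reports that the checkpoint's classification head has 7 outputs while the new model has 6, so those two tensors are re-initialized. The sketch below shows the underlying shape comparison in plain Python. The helper `mismatched_keys` is hypothetical, and the `classifier.dense.weight` entry is an illustrative addition; only the two `out_proj` shapes come from the log.

```python
def mismatched_keys(checkpoint_shapes, model_shapes):
    """Return parameter names present in both mappings whose shapes differ."""
    return sorted(
        name
        for name, shape in model_shapes.items()
        if name in checkpoint_shapes and checkpoint_shapes[name] != shape
    )


# Shapes of the 7-class checkpoint head (out_proj values taken from the warning).
checkpoint_shapes = {
    "classifier.dense.weight": (768, 768),  # illustrative, shape matches
    "classifier.out_proj.weight": (7, 768),
    "classifier.out_proj.bias": (7,),
}
# Shapes requested for the 6-class model.
model_shapes = {
    "classifier.dense.weight": (768, 768),
    "classifier.out_proj.weight": (6, 768),
    "classifier.out_proj.bias": (6,),
}

reinitialized = mismatched_keys(checkpoint_shapes, model_shapes)
print(reinitialized)
```

Only the final projection layer starts from random weights; the rest of the network keeps its pretrained parameters, which is what makes the short fine-tuning run worthwhile.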


In [37]:
tokenized_emotion['train'].features

{'text': Value(dtype='string', id=None),
 'label': ClassLabel(num_classes=6, names=['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'], names_file=None, id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)}
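
The `ClassLabel` feature above lists the label names in class-id order, which is all that is needed to turn a predicted class id back into a readable emotion. A minimal sketch, assuming a hypothetical helper `predict_label` and made-up logits:

```python
# Label names copied from the ClassLabel feature above; list index == class id.
LABEL_NAMES = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

id2label = dict(enumerate(LABEL_NAMES))
label2id = {name: idx for idx, name in id2label.items()}


def predict_label(logits):
    """Return the label name whose logit is largest (argmax over classes)."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return id2label[best]


toy_logits = [0.1, 2.3, -0.4, 0.0, 0.7, -1.2]  # hypothetical model output
print(predict_label(toy_logits))  # joy
```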

In [42]:
tokenized_emotion.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
tokenized_emotion['train'].features

{'text': Value(dtype='string', id=None),
 'label': ClassLabel(num_classes=6, names=['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'], names_file=None, id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)}

In [53]:
# Log the training loss once per epoch: one epoch is len(train) // BATCH_SIZE steps.
logging_steps = len(tokenized_emotion['train']) // BATCH_SIZE
training_args = transformers.TrainingArguments(
    output_dir='results',
    num_train_epochs=NUM_TRAIN_EPOCHS,
    learning_rate=LR,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    load_best_model_at_end=True,
    metric_for_best_model='f1',
    weight_decay=WDECAY,
    save_strategy='epoch',
    evaluation_strategy='epoch',
    logging_steps=logging_steps,  # previously computed but never passed in
    disable_tqdm=False,
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [48]:
trainer = transformers.Trainer(
    model=updated_model_head,
    args=training_args,
    data_collator=collator,  # use the padding collator defined earlier
    compute_metrics=compute_metrics,
    train_dataset=tokenized_emotion['train'],
    eval_dataset=tokenized_emotion['validation'],
)
trainer.train()

The following columns in the training set  don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 16000
  Num Epochs = 5
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 1250


| Step | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 125 | No log | 0.280883 | 0.9045 | 0.869116 | 0.862759 | 0.879081 |


KeyboardInterrupt: 
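
The step counts in the training log above follow from simple arithmetic. The sketch below reproduces them, assuming a single device and no gradient accumulation, as the log reports; the helper name `optimization_steps` is hypothetical. Note that the log shows 5 epochs while the configuration cell sets `NUM_TRAIN_EPOCHS = 1`, which suggests the cells were executed out of order (the cell numbers `In [53]` and `In [48]` point the same way).

```python
import math


def optimization_steps(num_examples, batch_size, num_epochs):
    """Optimizer updates for a run on one device with no gradient accumulation."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * num_epochs


steps_logged_run = optimization_steps(16000, 64, 5)  # values from the log
steps_one_epoch = optimization_steps(16000, 64, 1)   # with NUM_TRAIN_EPOCHS = 1
print(steps_logged_run, steps_one_epoch)  # 1250 250
```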

In [49]:
results = trainer.evaluate()
results

The following columns in the evaluation set  don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 2000
  Batch size = 64


{'eval_loss': 0.28088268637657166,
 'eval_accuracy': 0.9045,
 'eval_f1': 0.8691157729199627,
 'eval_precision': 0.8627589064134121,
 'eval_recall': 0.879081087747385}

In [55]:
preds_output = trainer.predict(tokenized_emotion["validation"])
preds_output.metrics

The following columns in the test set  don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 2000
  Batch size = 64


{'test_loss': 0.28088268637657166,
 'test_accuracy': 0.9045,
 'test_f1': 0.8691157729199627,
 'test_precision': 0.8627589064134121,
 'test_recall': 0.879081087747385,
 'test_runtime': 76.4113,
 'test_samples_per_second': 26.174,
 'test_steps_per_second': 0.419}

In [57]:
# dict.update() returns None, so build the metadata as a new merged dict instead.
metadata = {**results, "transfer-learning": True}
tag = bentoml.transformers.save(MODEL_NAME, updated_model_head, tokenizer=tokenizer, metadata=metadata)

Configuration saved in /var/folders/b1/90qqtv1n53v9tdl15l0phky40000gn/T/tmp0x4wbh0nbentoml_model_roberta_text_classification/config.json
Model weights saved in /var/folders/b1/90qqtv1n53v9tdl15l0phky40000gn/T/tmp0x4wbh0nbentoml_model_roberta_text_classification/pytorch_model.bin
tokenizer config file saved in /var/folders/b1/90qqtv1n53v9tdl15l0phky40000gn/T/tmp0x4wbh0nbentoml_model_roberta_text_classification/tokenizer_config.json
Special tokens file saved in /var/folders/b1/90qqtv1n53v9tdl15l0phky40000gn/T/tmp0x4wbh0nbentoml_model_roberta_text_classification/special_tokens_map.json
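
When attaching evaluation results as model metadata, keep in mind that `dict.update` mutates the dictionary in place and returns `None`. Building a merged copy is the safer way to produce the value passed as `metadata`. A small sketch with a toy results dictionary (the accuracy value is taken from the evaluation output above; the variable names are illustrative):

```python
toy_results = {"eval_accuracy": 0.9045}

# update() changes the dict it is called on and returns None.
returned = dict(toy_results).update({"transfer-learning": True})

# Unpacking into a new dict returns the merged mapping and leaves the input alone.
merged = {**toy_results, "transfer-learning": True}

print(returned)  # None
print(merged)
```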


In [63]:
f"{tag.name}:{tag.version}"

'roberta_text_classification:fa3yeten7oi4zgxi'