# Fine-tuning a Sequence Classification Model

Hugging Face Transformers provides a `Trainer` class to help you fine-tune any of its pretrained models on your own dataset. Once the data preprocessing from the last section is done, only a few steps remain before you can define the `Trainer`. The hardest part is likely to be preparing the environment to run `Trainer.train()`.

In [1]:
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [2]:
def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

In [3]:
samples = [tokenized_datasets["train"][i] for i in range(2)]
for sample in samples:  # remove the fields that cannot be collated
    _ = sample.pop("sentence1")
    _ = sample.pop("sentence2")

for chunk in data_collator(samples)["input_ids"]:
    print(f"\n>>> {tokenizer.decode(chunk)}")
    print(f"Number of tokens: {len(chunk)}")


>>> [CLS] amrozi accused his brother, whom he called " the witness ", of deliberately distorting his evidence. [SEP] referring to him as only " the witness ", amrozi accused his brother of deliberately distorting his evidence. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
Number of tokens: 59

>>> [CLS] yucaipa owned dominick's before selling the chain to safeway in 1998 for $ 2. 5 billion. [SEP] yucaipa bought dominick's in 1995 for $ 693 million and sold it to safeway for $ 1. 8 billion in 1998. [SEP]
Number of tokens: 59
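
The `[PAD]` tokens in the first sample come from `DataCollatorWithPadding`, which pads each batch only up to the length of its longest sequence. The following is a minimal pure-Python sketch of that idea, not the library's implementation; `pad_batch`, the token ids, and `pad_id=0` are assumptions chosen for illustration.

```python
# Minimal sketch of dynamic padding: every sequence in a batch is padded
# to the length of the longest one. This is NOT the library implementation.

def pad_batch(batch, pad_id=0):
    """Pad lists of token ids to the batch maximum and build attention masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids, used only for illustration
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
padded = pad_batch(batch)
print(padded["input_ids"])
print(padded["attention_mask"])
```

Padding per batch rather than to a global maximum keeps batches of short sentences small, which is why both samples above end up with the same length.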


In [4]:
sample = tokenized_datasets["train"][0]
print("The model will input the sentence1 and sentence2 strings tokenized together")
print(f"Sentence1: {sample['sentence1']}")
print(f"Sentence2: {sample['sentence2']}")
print(f"And expects in the batch a 'label'. In this case: {sample['label']}")

The model will input the sentence1 and sentence2 strings tokenized together
Sentence1: Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .
Sentence2: Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .
And expects in the batch a 'label'. In this case: 1


# Evaluation

Let’s see how we can build a useful `compute_metrics()` function and use it when we train. The function must take an `EvalPrediction` object (which is a named tuple with a `predictions` field and a `label_ids` field) and must **return a dictionary mapping strings to floats** (the strings being the names of the metrics returned, and the floats their values).

In [5]:
mockup_predictions = [1, 0, 0]
mockup_labels = [1, 0, 1]

In [6]:
import evaluate

metric = evaluate.load("glue", "mrpc")
metric.compute(
    predictions=mockup_predictions,
    references=mockup_labels
)

{'accuracy': 0.6666666666666666, 'f1': 0.6666666666666666}
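
To see where these two numbers come from, the sketch below recomputes accuracy and F1 by hand for the same mock predictions. It assumes label `1` is the positive class and uses a hypothetical helper, `accuracy_and_f1`; it is an illustration of the definitions, not the code the `evaluate` library runs.

```python
# Hand computation of accuracy and F1 for binary labels.
# Assumption: label 1 is the positive class.

def accuracy_and_f1(predictions, references, positive=1):
    pairs = list(zip(predictions, references))
    correct = sum(p == r for p, r in pairs)
    tp = sum(p == positive and r == positive for p, r in pairs)  # true positives
    fp = sum(p == positive and r != positive for p, r in pairs)  # false positives
    fn = sum(p != positive and r == positive for p, r in pairs)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": correct / len(pairs), "f1": f1}


result = accuracy_and_f1(predictions=[1, 0, 0], references=[1, 0, 1])
print(result)
```

Here two of the three predictions are correct (accuracy 2/3), precision is 1 and recall is 1/2, so F1 is also 2/3.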

In [7]:
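def compute_metrics(eval_preds):
    # Load the MRPC metric from the GLUE benchmark (accuracy and F1)
    metric = evaluate.load("glue", "mrpc")
    # eval_preds unpacks into the model's logits and the reference labels
    logits, labels = eval_preds
    # The predicted class is the index of the largest logit for each example
    predictions = logits.argmax(axis=1)
    return metric.compute(predictions=predictions, references=labels)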
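
The unpacking and argmax steps in `compute_metrics()` can be tried without a model. The sketch below uses a hypothetical stand-in, `MockEvalPrediction`, with made-up logits and labels, and a pure-Python `argmax_rows` helper that mirrors `argmax(axis=1)` on a 2D array. None of these names come from the library.

```python
from typing import List, NamedTuple


class MockEvalPrediction(NamedTuple):
    """Hypothetical stand-in with the two fields described above."""
    predictions: List[List[float]]
    label_ids: List[int]


def argmax_rows(logits):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Made-up logits for three examples and two classes
eval_preds = MockEvalPrediction(
    predictions=[[2.1, -0.3], [-1.0, 0.4], [0.2, 0.9]],
    label_ids=[0, 1, 0],
)

logits, labels = eval_preds  # unpacks like a tuple
predicted = argmax_rows(logits)
accuracy = sum(p == l for p, l in zip(predicted, labels)) / len(labels)
print(predicted, accuracy)
```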

# Training

The first step before we can define our `Trainer` is to instantiate a `TrainingArguments` object that will contain all the hyperparameters the `Trainer` will use for training and evaluation. The only argument you have to provide is a directory where the trained model, and the checkpoints along the way, will be saved. For everything else, you can keep the defaults, which should work pretty well for a basic fine-tuning.

In [11]:
from transformers import TrainingArguments

output_dir = f"tmp/seq_classification-{checkpoint}"
training_args = TrainingArguments(
    output_dir,
    eval_strategy="epoch",
    num_train_epochs=3,
)
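
With the remaining defaults left in place, the number of optimizer steps follows from the dataset size and the batch size. The arithmetic below is a sketch under stated assumptions: 3,668 examples in the MRPC training split, the default per-device batch size of 8, a single device, and no gradient accumulation.

```python
import math

num_train_examples = 3668   # assumed size of the MRPC training split
per_device_batch_size = 8   # assumed: library default, one device
num_train_epochs = 3        # set in TrainingArguments above

# The last, partially filled batch still counts as a step
steps_per_epoch = math.ceil(num_train_examples / per_device_batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(steps_per_epoch, total_steps)
```

Under these assumptions the total comes out to 1377, which agrees with the `global_step=1377` reported at the end of training.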

The second step is to define our model. We will use the `AutoModelForSequenceClassification` class, with two labels:

In [None]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

You get a warning after instantiating this pretrained model. This is because BERT has not been pretrained on classifying pairs of sentences, so **the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added** instead. The warning indicates that some weights were not used (the ones corresponding to the dropped pretraining head) and that others were randomly initialized (the ones for the new head). It concludes by encouraging you to train the model, which is exactly what we are going to do now.

Once we have our model, we can define a `Trainer` by passing it all the objects constructed up to now: the model, the `training_args`, the training and validation datasets, our `data_collator`, our `tokenizer`, and our `compute_metrics` function:

In [12]:
from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

To fine-tune the model on our dataset, we just have to call the `train()` method of our `Trainer`:

In [13]:
trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: marioparreno. Use `wandb login --relogin` to force relogin


{'eval_loss': 0.620880126953125, 'eval_accuracy': 0.7058823529411765, 'eval_f1': 0.8214285714285714, 'eval_runtime': 1.6375, 'eval_samples_per_second': 249.16, 'eval_steps_per_second': 31.145, 'epoch': 1.0}
{'loss': 0.617, 'grad_norm': 2.1467463970184326, 'learning_rate': 3.184458968772695e-05, 'epoch': 1.09}


{'eval_loss': 0.5247003436088562, 'eval_accuracy': 0.7426470588235294, 'eval_f1': 0.8372093023255814, 'eval_runtime': 1.493, 'eval_samples_per_second': 273.267, 'eval_steps_per_second': 34.158, 'epoch': 2.0}
{'loss': 0.561, 'grad_norm': 7.290564060211182, 'learning_rate': 1.3689179375453886e-05, 'epoch': 2.18}


{'eval_loss': 0.5071743130683899, 'eval_accuracy': 0.7818627450980392, 'eval_f1': 0.846286701208981, 'eval_runtime': 1.5385, 'eval_samples_per_second': 265.201, 'eval_steps_per_second': 33.15, 'epoch': 3.0}
{'train_runtime': 112.2748, 'train_samples_per_second': 98.01, 'train_steps_per_second': 12.265, 'train_loss': 0.5668545835850978, 'epoch': 3.0}


TrainOutput(global_step=1377, training_loss=0.5668545835850978, metrics={'train_runtime': 112.2748, 'train_samples_per_second': 98.01, 'train_steps_per_second': 12.265, 'total_flos': 405114969714960.0, 'train_loss': 0.5668545835850978, 'epoch': 3.0})