Modified version of the Hugging Face tutorial.  
https://huggingface.co/learn/nlp-course/en/chapter3/3

# Fine-tuning a model with the Trainer API or Keras

## Recipe
1. Data processing: loading and tokenization
2. Load pre-trained model
3. Set up the Trainer
4. Train
5. Evaluate

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [None]:
!pip install datasets evaluate transformers[sentencepiece]

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)


def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
# DataCollatorWithPadding dynamically pads each batch to the length of its longest sequence.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
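
To make the idea of dynamic padding concrete, here is a minimal pure-Python sketch of what padding a batch to its longest sequence looks like. It is not how `DataCollatorWithPadding` is implemented; `pad_batch` is a hypothetical helper, the token ids are made up, and `pad_id=0` is an assumption for illustration.

```python
def pad_batch(sequences, pad_id=0):
    """Pad lists of token ids to the longest sequence in this batch only."""
    max_len = max(len(seq) for seq in sequences)
    # Append pad tokens on the right until every sequence has max_len ids.
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up sequences of different lengths (hypothetical token ids).
batch = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because padding is decided per batch, a batch of short sentences stays short instead of being padded to the longest sentence in the whole dataset.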

In [None]:
from transformers import TrainingArguments
# The only required argument is the directory where the trained model and
# the intermediate checkpoints are saved. The defaults for everything else
# should work well for a basic fine-tuning run.
training_args = TrainingArguments("test-trainer")

In [None]:
from transformers import AutoModelForSequenceClassification
# Load uncased BERT wrapped with a sequence-classification head.
# num_labels=2 makes this a binary classification task.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

In [None]:
from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
'''
This Trainer will not evaluate during training, because evaluation_strategy
was not set to "steps" (evaluate every eval_steps) or "epoch" (evaluate at
the end of each epoch).

It also has no compute_metrics() function, so any evaluation would report
only the loss, which is not a very intuitive number.
'''

In [None]:
trainer.train()

In [None]:
predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)

(408, 2) (408,)

In [None]:
import numpy as np

preds = np.argmax(predictions.predictions, axis=-1)
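
As a small illustration of this step, the sketch below applies the same `np.argmax` call to a made-up logits array. The `toy_logits` values are invented for the example and are not model output.

```python
import numpy as np

# One row per example, one column per class (made-up logits).
toy_logits = np.array([[2.0, -1.0], [0.3, 0.9], [-0.5, -0.2]])

# axis=-1 picks, for each row, the index of the largest logit,
# which is the predicted class id.
toy_preds = np.argmax(toy_logits, axis=-1)
print(toy_preds)  # [0 1 1]
```

The result has one entry per example, which is why the predictions can be compared directly with `label_ids`.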

In [None]:
import evaluate
# Load the metric specific to this dataset (GLUE MRPC) and evaluate
# the fine-tuned model as a separate step after training.
metric = evaluate.load("glue", "mrpc")
metric.compute(predictions=preds, references=predictions.label_ids)

{'accuracy': 0.8578431372549019, 'f1': 0.8996539792387542}
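
To show what the two reported numbers mean, here is a hedged NumPy sketch that computes accuracy and binary F1 by hand on toy data. `accuracy_and_f1` is a hypothetical helper written for this note, not the code behind `evaluate.load("glue", "mrpc")`, and it assumes label 1 is the positive class.

```python
import numpy as np


def accuracy_and_f1(preds, labels):
    """Accuracy and binary F1, treating label 1 as the positive class."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    accuracy = float((preds == labels).mean())
    tp = int(((preds == 1) & (labels == 1)).sum())  # true positives
    fp = int(((preds == 1) & (labels == 0)).sum())  # false positives
    fn = int(((preds == 0) & (labels == 1)).sum())  # false negatives
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives.
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Made-up predictions and labels: 3 of 4 correct, one false positive.
toy_scores = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
print(toy_scores)
```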

# Evaluation metrics
Let's see how to build a useful compute_metrics() function and use it the next time we train. The function must take an EvalPrediction object, which is a named tuple with a predictions field and a label_ids field. It must return a dictionary mapping strings to floats, where the strings are the names of the metrics and the floats are their values.

In [None]:
# Compute the metrics. The return value follows the contract described above:
# a dictionary mapping metric names to float values.
def compute_metrics(eval_preds):
    metric = evaluate.load("glue", "mrpc")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
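
Before passing a function like this to the Trainer, it can help to check the input and output contract on toy data. The sketch below is an offline stand-in, not the function above: `toy_compute_metrics` is a hypothetical name, it reports only accuracy computed with NumPy, and a plain tuple stands in for the EvalPrediction object.

```python
import numpy as np


def toy_compute_metrics(eval_preds):
    # Unpack (logits, labels) the same way the real function does.
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    # Return a dict of metric name -> float, as the Trainer expects.
    return {"accuracy": float((predictions == labels).mean())}


# Made-up logits and labels; a plain tuple is assumed to be enough here.
fake_eval_preds = (
    np.array([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]),
    np.array([1, 0, 0]),
)
result = toy_compute_metrics(fake_eval_preds)
print(result)
```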

In [None]:
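# Evaluate at the end of every epoch.
# Note: recent Transformers releases rename evaluation_strategy to eval_strategy.
training_args = TrainingArguments("test-trainer", evaluation_strategy="epoch")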
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)


In [None]:
trainer.train()