# Fine-Tuning Transformers Models with HuggingFace Trainer
In this example we fine-tune [BERT](https://huggingface.co/google-bert/bert-base-cased) on the [IMDb dataset](https://huggingface.co/datasets/imdb) for a text classification use case, using the [Trainer class](https://huggingface.co/docs/transformers/v4.37.2/en/main_classes/trainer#transformers.Trainer).

- Setup: <b>conda_python3 kernel</b> on an <b>ml.g4dn.12xlarge</b> SageMaker Classic Notebook Instance

## Setup

In [5]:
import datasets
import evaluate
import transformers
from datasets import load_dataset

In [6]:
train_dataset = load_dataset("imdb", split="train")
test_dataset = load_dataset("imdb", split="test")
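# Evaluate on only the first 100 test examples to keep evaluation fast.
# Caveat (assumption): if the split is stored in label order, this slice may be
# dominated by one class; shuffling before select() would avoid that.
test_subset = test_dataset.select(range(100))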

In [7]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

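# Tokenize the review text. padding="max_length" pads every example to the
# tokenizer's maximum length, and truncation=True cuts longer reviews down to it.
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)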

tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_test = test_subset.map(tokenize_function, batched=True)

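To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of the idea. The helper `pad_or_truncate` and the toy token ids are hypothetical and are not part of the `transformers` API; the real tokenizer also handles special tokens during truncation, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_length, pad_token_id=0):
    """Return (input_ids, attention_mask), each exactly max_length long.

    Hypothetical helper for illustration only; not the tokenizer's implementation.
    """
    ids = token_ids[:max_length]                    # truncation: drop tokens past max_length
    n_pad = max_length - len(ids)                   # how many pad tokens are needed
    attention_mask = [1] * len(ids) + [0] * n_pad   # 1 = real token, 0 = padding
    input_ids = ids + [pad_token_id] * n_pad        # padding: fill up to max_length
    return input_ids, attention_mask


# Toy token ids (placeholders, not real vocabulary lookups).
short_ids, short_mask = pad_or_truncate([101, 7, 8, 102], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 10)), max_length=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every example ends up with the same length, which is what lets the `Trainer` stack examples into fixed-size batches without a custom collator.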

## Fine-Tuning

In [8]:
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

In [9]:
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch", num_train_epochs=1)

In [10]:
import numpy as np
import evaluate
metric = evaluate.load("accuracy")

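# Metric function for the Trainer: pick the highest-scoring class per example
# and compare it with the reference labels.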
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
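The two steps inside `compute_metrics` (argmax over the class axis, then accuracy) can be traced by hand. The sketch below uses plain Python instead of NumPy and `evaluate` so that each step is explicit; the logits and labels are made-up values, and `argmax_rows` and `accuracy` are hypothetical helper names.

```python
# Hypothetical logits for four reviews; each row is [score_class_0, score_class_1].
toy_logits = [
    [2.2, -2.8],
    [-1.9, 3.0],
    [1.3, -1.4],
    [-0.4, 1.1],
]
toy_labels = [0, 1, 1, 1]  # hypothetical ground-truth labels


def argmax_rows(rows):
    """Index of the largest score in each row (what np.argmax(..., axis=-1) returns)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def accuracy(predictions, references):
    """Fraction of predictions that equal the reference label."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


toy_predictions = argmax_rows(toy_logits)
toy_accuracy = accuracy(toy_predictions, toy_labels)
print(toy_predictions, toy_accuracy)  # [0, 1, 0, 1] 0.75
```

The third toy review is misclassified, so three of four predictions match and the accuracy is 0.75.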

In [11]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,  # the 100-example test subset doubles as the evaluation set
    compute_metrics=compute_metrics,
    tokenizer=tokenizer
)

In [12]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 25000
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 782
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.2764 | 0.29052 | 0.9 |


Saving model checkpoint to test_trainer/checkpoint-500
Configuration saved in test_trainer/checkpoint-500/config.json
Model weights saved in test_trainer/checkpoint-500/pytorch_model.bin
tokenizer config file saved in test_trainer/checkpoint-500/tokenizer_config.json
Special tokens file saved in test_trainer/checkpoint-500/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 100
  Batch size = 32


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=782, training_loss=0.25128233829117796, metrics={'train_runtime': 683.6955, 'train_samples_per_second': 36.566, 'train_steps_per_second': 1.144, 'total_flos': 6577776384000000.0, 'train_loss': 0.25128233829117796, 'epoch': 1.0})
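The step count and throughput in the log above follow from simple arithmetic. The sketch below reproduces them; note that `num_devices = 4` is an assumption inferred from the logged total batch size (32) divided by the per-device batch size (8), which would match a four-GPU instance.

```python
import math

num_examples = 25000        # "Num examples" in the training log
per_device_batch = 8        # "Instantaneous batch size per device"
num_devices = 4             # assumption: inferred from 32 / 8, not printed in the log
grad_accum_steps = 1        # "Gradient Accumulation steps"
num_epochs = 1              # "Num Epochs"
train_runtime_s = 683.6955  # 'train_runtime' from TrainOutput

# Effective batch size per optimizer step.
total_batch = per_device_batch * num_devices * grad_accum_steps

# The last, partially filled batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / total_batch)
total_steps = steps_per_epoch * num_epochs

samples_per_second = num_examples * num_epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

print(total_batch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

Under these assumptions the computed values agree with the logged `Total optimization steps = 782` and with the reported samples and steps per second.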

In [13]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 100
  Batch size = 32


{'eval_loss': 0.2905201315879822,
 'eval_accuracy': 0.9,
 'eval_runtime': 1.0816,
 'eval_samples_per_second': 92.453,
 'eval_steps_per_second': 3.698,
 'epoch': 1.0}

In [14]:
trainer.predict(tokenized_test)

The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 100
  Batch size = 32


PredictionOutput(predictions=array([[ 2.233611  , -2.838757  ],
       [ 1.3145416 , -1.3600825 ],
       [ 2.0577595 , -2.4862347 ],
       [ 2.1538134 , -2.6712563 ],
       [-1.8632524 ,  3.0206013 ],
       [ 1.8456823 , -2.0057776 ],
       [ 1.4395349 , -1.5435355 ],
       [ 2.3080866 , -3.0660431 ],
       [ 2.2837262 , -2.9438133 ],
       [ 2.412186  , -3.2898452 ],
       [ 2.4439065 , -3.2905052 ],
       [ 1.1079466 , -0.9139905 ],
       [ 2.3250206 , -2.900616  ],
       [ 1.8195829 , -2.035905  ],
       [ 1.7146747 , -1.9527353 ],
       [ 1.2669995 , -1.2126276 ],
       [ 1.7040373 , -1.9370786 ],
       [ 2.0469959 , -2.266483  ],
       [-0.40541586,  1.1330844 ],
       [ 0.9924181 , -0.885446  ],
       [-0.864274  ,  1.7464645 ],
       [ 1.578781  , -1.5840921 ],
       [ 1.9062662 , -2.0838084 ],
       [ 2.270737  , -2.8005166 ],
       [ 1.518781  , -1.5225335 ],
       [ 1.7050827 , -1.8103762 ],
       [ 1.5410125 , -1.675297  ],
       [ 0.55967784, -0.17
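The `predictions` array holds raw logits, one row per review and one column per class. A softmax turns each row into probabilities. The sketch below applies a hand-written softmax to three rows copied from the output above (the first, second, and fifth); the `softmax` helper is written here for illustration and is not part of the `Trainer` API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


# Rows 1, 2 and 5 of the logits printed above.
sample_logits = [
    [2.233611, -2.838757],
    [1.3145416, -1.3600825],
    [-1.8632524, 3.0206013],
]

sample_probs = [softmax(row) for row in sample_logits]
sample_classes = [probs.index(max(probs)) for probs in sample_probs]

for probs, cls in zip(sample_probs, sample_classes):
    print([round(p, 4) for p in probs], "-> class", cls)
```

Because softmax is monotonic, the predicted class is the same as taking the argmax of the logits directly; the probabilities only add a rough confidence estimate.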

In [15]:
trainer.save_model("./custom_model")

Saving model checkpoint to ./custom_model
Configuration saved in ./custom_model/config.json
Model weights saved in ./custom_model/pytorch_model.bin
tokenizer config file saved in ./custom_model/tokenizer_config.json
Special tokens file saved in ./custom_model/special_tokens_map.json


In [16]:
loaded_model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path="custom_model/")

loading configuration file custom_model/config.json
Model config BertConfig {
  "_name_or_path": "custom_model/",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.22.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

loading weights file custom_model/pytorch_model.bin
All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequenceClassification were initi

In [21]:
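# Tokenize one sentence and return PyTorch tensors.
encoding = tokenizer("I am super delighted", return_tensors="pt")
# Forward pass through the reloaded model; res.logits has shape (1, 2).
res = loaded_model(**encoding)
# The index of the largest logit is the predicted class id.
predicted_label_classes = res.logits.argmax(-1)
predicted_label_classes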

tensor([1])
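The model returns a class id rather than a readable label. A small lookup makes the result easier to interpret. The sketch below assumes the IMDb convention that label 0 is a negative review and label 1 is a positive one; the `id2label` dictionary and the plain list standing in for the tensor are illustrative.

```python
# Assumption: IMDb labels follow 0 = negative, 1 = positive.
id2label = {0: "negative", 1: "positive"}

# Stand-in for predicted_label_classes.tolist(), using the value printed above.
predicted_ids = [1]

predicted_names = [id2label[i] for i in predicted_ids]
print(predicted_names)  # ['positive']
```

With the real tensor, `predicted_label_classes.tolist()` would supply `predicted_ids`.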