## Getting started with PyTorch 2.0 and Hugging Face Transformers

On December 2, 2022, the PyTorch Team announced [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/) at the PyTorch Conference. The release focuses on better performance while being more Pythonic and staying as dynamic as before.

This blog post explains how to get started with PyTorch 2.0 and Hugging Face Transformers today. It will cover how to fine-tune a BERT model for Text Classification using the newest PyTorch 2.0 features. 

You will learn how to:

1. Set up environment & install PyTorch 2.0
2. Load and prepare the dataset
3. Fine-tune BERT model with the Hugging Face `Trainer`
4. Evaluate and test model

Before we can start, make sure you have a **[Hugging Face Account](https://huggingface.co/join)** to save artifacts and experiments.

## Quick intro: PyTorch 2.0

PyTorch 2.0 (or, more accurately, 1.14) is entirely backward compatible. It does not require any modification to existing PyTorch code, but it can optimize your code with a single added line: `model = torch.compile(model)`.
You might ask why there is a new major version if there are no breaking changes. The PyTorch team answered this question in their [FAQ](https://pytorch.org/get-started/pytorch-2.0/#faqs): *“We were releasing substantial new features that we believe change how you meaningfully use PyTorch, so we are calling it 2.0 instead.”*

Those new features include top-level support for TorchDynamo, AOTAutograd, PrimTorch, and TorchInductor. 

This allows PyTorch 2.0 to achieve 1.3x-2x training speedups across the [46 model architectures](https://github.com/pytorch/torchdynamo/issues/681) from [Hugging Face Transformers](https://github.com/huggingface/transformers) that are supported today.

If you want to learn more about PyTorch 2.0, check out the official [“GET STARTED”](https://pytorch.org/get-started/pytorch-2.0/) page. The PyTorch team expects to ship the first stable 2.0 release in early March 2023.

---

Now that we know how PyTorch 2.0 works, let's get started. 🚀

*Note: This tutorial was created and run on a p3.2xlarge AWS EC2 Instance including an NVIDIA V100 GPU.*

## 1. Set up environment & install PyTorch 2.0

Our first step is to install PyTorch 2.0 and the Hugging Face Libraries, including `transformers` and `datasets`. At the time of writing this, PyTorch 2.0 has no official release, but we can install it from the nightly version. The current expectation is a public release of PyTorch 2.0 in March 2023.

In [None]:
# Install PyTorch 2.0
!pip install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

Additionally, we are installing the latest version of `transformers` from the `main` git branch, which includes the native integration of PyTorch 2.0 into the `Trainer`.

In [None]:
# Install transformers and datasets
!pip install 'transformers @ git+https://github.com/huggingface/transformers@main'
!pip install datasets evaluate tensorboard scikit-learn
# Install git-lfs for pushing model and logs to the Hugging Face Hub
!sudo apt-get install git-lfs --yes

This example will use the [Hugging Face Hub](https://huggingface.co/models) as a remote model versioning service. To push our model to the Hub, you must register on [Hugging Face](https://huggingface.co/join). If you already have an account, you can skip this step. Once you have an account, we will use the `notebook_login` util from the `huggingface_hub` package to log in and store our token (access key) on disk.

In [None]:
from huggingface_hub import notebook_login

notebook_login()


## 2. Load and prepare the dataset

To keep the example straightforward, we are training a Text Classification model on the [BANKING77](https://huggingface.co/datasets/banking77) dataset. The BANKING77 dataset provides a fine-grained set of intents (classes) in a banking/finance domain. It comprises 13,083 customer service queries labeled with 77 intents. It focuses on fine-grained single-domain intent detection.

We will use the `load_dataset()` method from the [🤗 Datasets](https://huggingface.co/docs/datasets/index) library to load the `banking77` dataset.

In [1]:
from datasets import load_dataset

# Dataset id from huggingface.co/datasets
dataset_id = "banking77" 

# Load raw dataset
raw_dataset = load_dataset(dataset_id)

print(f"Train dataset size: {len(raw_dataset['train'])}")
print(f"Test dataset size: {len(raw_dataset['test'])}")

Found cached dataset banking77 (/home/ubuntu/.cache/huggingface/datasets/banking77/default/1.1.0/ff44c4421d7e70aa810b0fa79d36908a38b87aff8125d002cd44f7fcd31f493c)


Train dataset size: 10003
Test dataset size: 3080


Let’s check out an example of the dataset.

In [2]:
from random import randrange

random_id = randrange(len(raw_dataset['train']))
raw_dataset['train'][random_id]

{'text': 'What are the disposable cards for?', 'label': 37}

To train our model, we need to convert our natural-language text to token IDs. This is done by a Tokenizer, which tokenizes the inputs (including converting the tokens to their corresponding IDs in the pre-trained vocabulary). If you want to learn more about this, check out **[chapter 6](https://huggingface.co/course/chapter6/1?fw=pt)** of the [Hugging Face Course](https://huggingface.co/course/chapter1/1).

In [3]:
from transformers import AutoTokenizer

# Model id to load the tokenizer
model_id = "bert-base-uncased"
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize helper function
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True, return_tensors="pt")

# Tokenize dataset
raw_dataset = raw_dataset.rename_column("label", "labels")  # to match Trainer
tokenized_dataset = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])

print(tokenized_dataset["train"].features.keys())
# dict_keys(['labels', 'input_ids', 'token_type_ids', 'attention_mask'])

Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/banking77/default/1.1.0/ff44c4421d7e70aa810b0fa79d36908a38b87aff8125d002cd44f7fcd31f493c/cache-985953af8344a8ee.arrow
Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/banking77/default/1.1.0/ff44c4421d7e70aa810b0fa79d36908a38b87aff8125d002cd44f7fcd31f493c/cache-41e18fd05fa6d07c.arrow


dict_keys(['labels', 'input_ids', 'token_type_ids', 'attention_mask'])
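
To make the roles of `input_ids`, `attention_mask`, padding, and truncation more concrete, here is a toy sketch in plain Python. It is only an illustration under stated assumptions: the vocabulary `TOY_VOCAB` and the helper `toy_encode` are made up for this example, and splitting on whitespace is a simplification. The real BERT tokenizer uses WordPiece sub-word tokenization with a much larger vocabulary and also returns `token_type_ids`.

```python
# Toy illustration only: the vocabulary and helper below are hypothetical
# and do NOT reproduce BERT's WordPiece tokenizer.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "what": 4, "are": 5, "the": 6, "disposable": 7,
    "cards": 8, "for": 9, "?": 10,
}


def toy_encode(text, max_length=12):
    """Lower-case, split on whitespace, map to ids, then truncate and pad."""
    words = text.lower().replace("?", " ?").split()
    ids = [TOY_VOCAB["[CLS]"]]
    ids += [TOY_VOCAB.get(word, TOY_VOCAB["[UNK]"]) for word in words]
    # Truncation: keep room for the closing [SEP] token
    ids = ids[: max_length - 1] + [TOY_VOCAB["[SEP]"]]
    # Real tokens get mask 1, padding gets mask 0
    attention_mask = [1] * len(ids)
    pad = max_length - len(ids)
    return {
        "input_ids": ids + [TOY_VOCAB["[PAD]"]] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }


encoded = toy_encode("What are the disposable cards for?")
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

With `padding='max_length'`, every example is padded to the same length, which is why the attention mask is needed to tell the model which positions hold real tokens.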


## 3. Fine-tune BERT model with the Hugging Face `Trainer`

After we have processed our dataset, we can start training our model. We will use the `bert-base-uncased` model. The first step is to load our model with the `AutoModelForSequenceClassification` class from the [Hugging Face Hub](https://huggingface.co/bert-base-uncased). This will initialize the pre-trained BERT weights with a classification head on top. Here we pass the number of classes (77) from our dataset and the label names to have readable outputs for inference.

In [4]:
from transformers import AutoModelForSequenceClassification

# Model id to load the model
model_id = "bert-base-uncased"

# Prepare model labels - useful for inference
labels = tokenized_dataset["train"].features["labels"].names
num_labels = len(labels)
# Map label names to integer ids and back
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for i, label in enumerate(labels)}

# Download the model from huggingface.co/models
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at
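
The label mappings passed to the model are what turn a raw prediction into a readable intent name at inference time. The following is a minimal sketch of that idea in plain Python, not the model's actual inference code: the three label names are an illustrative subset whose ids do not match the real dataset, and the `decode` helper and the logits are hypothetical.

```python
import math

# Illustrative subset of intent names; the ids here are assumptions for the
# example and do not match the real BANKING77 label ids.
labels = ["activate_my_card", "age_limit", "card_arrival"]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}


def decode(logits):
    """Apply a softmax to raw scores and map the arg-max index to its label."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical classification-head output for one query
label, score = decode([0.2, 3.1, -1.0])
print(label, round(score, 3))
```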

The `Trainer` can evaluate the model by calling a `compute_metrics` method that we provide. We use the `evaluate` library to calculate the [f1 metric](https://huggingface.co/spaces/evaluate-metric/f1) on our test split. Since the training arguments in this notebook set `evaluation_strategy="no"`, the metric is computed when we call `trainer.evaluate()` after training rather than during training.

In [5]:
import evaluate
import numpy as np

# Metric Id
metric = evaluate.load("f1")

# Metric helper method
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels, average="weighted")
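
To show what `average="weighted"` means, here is a small plain-Python sketch of a support-weighted F1 score. It is an illustration of the definition, not the code the `evaluate` library runs: the `weighted_f1` helper and the example labels are hypothetical.

```python
from collections import Counter


def weighted_f1(predictions, references):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(references)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for p, r in zip(predictions, references) if p == cls and r == cls)
        n_pred = sum(1 for p in predictions if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        total += f1 * n_true  # weight each class by its number of true examples
    return total / len(references)


# Hypothetical labels for three intents
references = [0, 0, 1, 1, 2, 2]
predictions = [0, 0, 1, 2, 2, 2]
print(round(weighted_f1(predictions, references), 4))  # 0.8222
```

Weighting by support keeps frequent intents from being under-represented in the score, which matters when the 77 classes are not perfectly balanced.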

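The last step is to define the hyperparameters (`TrainingArguments`) we use for our training. Here we can also add the features introduced in PyTorch 2.0 for faster training. To use them, we only need to pass the `torchdynamo` option (for example `torchdynamo="inductor"`) in the `TrainingArguments`. It is commented out in the cell below and needs to be uncommented to enable compilation. Newer `transformers` releases expose the same feature through `torch_compile=True`.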

We also leverage the [Hugging Face Hub](https://huggingface.co/models) integration of the `Trainer` to push our checkpoints, logs, and metrics during training into a repository.

In [6]:
from huggingface_hub import HfFolder
from transformers import Trainer, TrainingArguments

# Id for remote repository
repository_id = "bert-base-banking77-pt2"

# Define training args
training_args = TrainingArguments(
    output_dir=repository_id,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    fp16=True,
    learning_rate=5e-5,
    num_train_epochs=3,
    # PyTorch 2.0
    # torchdynamo="inductor",
    # logging & evaluation strategies
    logging_dir=f"{repository_id}/logs",
    logging_strategy="steps",
    logging_steps=200,
    evaluation_strategy="no",
    save_strategy="epoch",
    save_total_limit=2,
    # push to hub parameters
    report_to="tensorboard",
    push_to_hub=True,
    hub_strategy="every_save",
    hub_model_id=repository_id,
    hub_token=HfFolder.get_token(),

)

# Create a Trainer instance
trainer = Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics,
)

/home/ubuntu/deep-learning-pytorch-huggingface/training/bert-base-banking77-pt2 is already a clone of https://huggingface.co/philschmid/bert-base-banking77-pt2. Make sure you pull the latest changes with `repo.git_pull()`.
Using cuda_amp half precision backend


We can start our training by using the **`train`** method of the `Trainer`.

In [7]:
# Start training
trainer.train()

***** Running training *****
  Num examples = 10003
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 1878
  Number of trainable parameters = 109541453
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 200  | 3.6427 |
| 400  | 1.9121 |
| 600  | 1.0675 |
| 800  | 0.6222 |
| 1000 | 0.4612 |
| 1200 | 0.3779 |
| 1400 | 0.2519 |
| 1600 | 0.1797 |
| 1800 | 0.1797 |


Saving model checkpoint to bert-base-banking77-pt2/checkpoint-626
Configuration saved in bert-base-banking77-pt2/checkpoint-626/config.json
Model weights saved in bert-base-banking77-pt2/checkpoint-626/pytorch_model.bin
tokenizer config file saved in bert-base-banking77-pt2/checkpoint-626/tokenizer_config.json
Special tokens file saved in bert-base-banking77-pt2/checkpoint-626/special_tokens_map.json
tokenizer config file saved in bert-base-banking77-pt2/tokenizer_config.json
Special tokens file saved in bert-base-banking77-pt2/special_tokens_map.json
Deleting older checkpoint [bert-base-banking77-pt2/checkpoint-2504] due to args.save_total_limit
Saving model checkpoint to bert-base-banking77-pt2/checkpoint-1252
Configuration saved in bert-base-banking77-pt2/checkpoint-1252/config.json
Model weights saved in bert-base-banking77-pt2/checkpoint-1252/pytorch_model.bin
tokenizer config file saved in bert-base-banking77-pt2/checkpoint-1252/tokenizer_config.json
Special tokens file saved in 

TrainOutput(global_step=1878, training_loss=0.933771866198165, metrics={'train_runtime': 501.4391, 'train_samples_per_second': 59.846, 'train_steps_per_second': 3.745, 'total_flos': 7901016582896640.0, 'train_loss': 0.933771866198165, 'epoch': 3.0})
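
The numbers in the training log can be reproduced with a little arithmetic. The sketch below uses only values printed above (number of examples, batch size, epochs, runtime) and assumes a single GPU, as in this notebook. The variable names are chosen for the example.

```python
import math

num_examples = 10003      # "Num examples" in the training log
batch_size = 16           # per_device_train_batch_size on a single GPU
grad_accum = 1            # "Gradient Accumulation steps"
num_epochs = 3
train_runtime = 501.4391  # seconds, from TrainOutput

# The last batch of each epoch is smaller, hence the ceiling
steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
total_steps = steps_per_epoch * num_epochs

# save_strategy="epoch" writes one checkpoint at the end of every epoch
checkpoints = [steps_per_epoch * epoch for epoch in range(1, num_epochs + 1)]

samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)   # 626 1878
print(checkpoints)                    # [626, 1252, 1878]
print(round(samples_per_second, 3))   # 59.846
print(round(steps_per_second, 3))     # 3.745
```

This matches the `checkpoint-626` and `checkpoint-1252` directories in the log and the `train_samples_per_second` and `train_steps_per_second` values in `TrainOutput`.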

inductor: 1 epoch in 96 seconds

{'train_runtime': 905.2204, 'train_samples_per_second': 55.252, 'train_steps_per_second': 3.458}
{'train_runtime': 830.1859, 'train_samples_per_second': 60.246, 'train_steps_per_second': 3.77 }

## 4. Evaluate and test model

The last step is to evaluate the model on the test split with the `evaluate` method of the `Trainer`.

In [8]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 3080
  Batch size = 8


{'eval_loss': 0.3035624623298645,
 'eval_f1': 0.9281943896278403,
 'eval_runtime': 15.048,
 'eval_samples_per_second': 204.678,
 'eval_steps_per_second': 25.585,
 'epoch': 3.0}

In [None]:
# Save processor and create model card
tokenizer.save_pretrained(repository_id)
trainer.create_model_card()
trainer.push_to_hub()