Copyright (c) 2023 Habana Labs, Ltd. an Intel Company.
Licensed under the Apache License, Version 2.0 (the "License");
You may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# PyTorch BERT FineTuning Example on Habana Gaudi

This Jupyter Notebook example demonstrates how to finetune BERT on a Habana Gaudi device with the PyTorch framework. The pretrained model is downloaded from HuggingFace and finetuned on the SQuAD dataset.

In [1]:
%cd /root

/root


## Setup

Let's clone the Habana `Model-References` repository to this image and add it to PYTHONPATH.

In [2]:
!git clone https://github.com/habanaai/Model-References

Cloning into 'Model-References'...
remote: Enumerating objects: 15297, done.
remote: Counting objects: 100% (15296/15296), done.
remote: Compressing objects: 100% (6684/6684), done.
remote: Total 15297 (delta 8265), reused 15161 (delta 8161), pack-reused 1
Receiving objects: 100% (15297/15297), 101.61 MiB | 54.47 MiB/s, done.
Resolving deltas: 100% (8265/8265), done.


In [3]:
!export PYTHONPATH=/root/Model-References:$PYTHONPATH
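
One caveat with the cell above: each `!` command runs in its own subshell, so an `export` issued that way does not carry over to later cells. A minimal sketch of an alternative, assuming the repository was cloned to `/root/Model-References` as shown earlier, sets the variable through `os.environ` so that processes later started by the notebook kernel inherit it:

```python
import os

# Path where the repository was cloned earlier in this notebook.
repo_root = "/root/Model-References"

# Prepend the repository to PYTHONPATH, keeping any existing entries.
existing = os.environ.get("PYTHONPATH", "")
entries = [p for p in existing.split(os.pathsep) if p]
if repo_root not in entries:
    entries.insert(0, repo_root)
os.environ["PYTHONPATH"] = os.pathsep.join(entries)

print(os.environ["PYTHONPATH"])
```

Because the change is made in the kernel's own environment, subsequent `!python3 ...` cells see the updated value.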

In [4]:
%cd /root/Model-References/PyTorch/nlp/finetuning/huggingface/bert

/root/Model-References/PyTorch/nlp/finetuning/huggingface/bert


Next, we need to install all the Python packages that BERT depends on, including HuggingFace Transformers.

In [5]:
!pip install "dill>=0.3.6" --quiet

[notice] A new release of pip available: 22.3 -> 23.0.1
[notice] To update, run: python3 -m pip install --upgrade pip


In [6]:
!pip install -r ./transformers/examples/pytorch/question-answering/requirements.txt



In [7]:
!pip install transformers/.

Processing ./transformers
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: transformers
  Building wheel for transformers (setup.py) ... done
  Created wheel for transformers: filename=transformers-4.20.1-py3-none-any.whl size=4248799 sha256=e8830417d7d3cd1893b5d574d13bce05863da662f94e82ef3a9aed301611a5b0
  Stored in directory: /tmp/pip-ephem-wheel-cache-0ythz61l/wheels/26/3d/59/fdd991f9963e428015334930a898b7b10f9c4405d8fff2f52e
Successfully built transformers
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.20.1
    Uninstalling transformers-4.20.1:
      Successfully uninstalled transformers-4.20.1
Successfully installed transformers-4.20.1

## Training on 1 HPU

After all the dependent Python packages are installed, let's launch BERT base finetuning with the SQuAD dataset on a single HPU in the BF16 data type. When the run completes, the original BERT model has been finetuned on the SQuAD dataset. The command is:

```
python3 transformers/examples/pytorch/question-answering/run_qa.py \
  --hmp --hmp_bf16=./ops_bf16_bert.txt --hmp_fp32=./ops_fp32_bert.txt \
  --doc_stride=128 --use_lazy_mode \
  --per_device_train_batch_size=12 --per_device_eval_batch_size=8 \
  --dataset_name=squad --use_fused_adam --use_fused_clip_norm --use_hpu \
  --max_seq_length=384 --learning_rate=3e-05 \
  --num_train_epochs=1 --max_steps 1000 \
  --output_dir=./output --logging_steps=20 --overwrite_output_dir \
  --do_train --save_steps=8000 --model_name_or_path=bert-base-uncased
```
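
Since the command is long, it can help to assemble it from structured data before launching it. The sketch below is one possible way to do that, not part of the original example: the helper `build_command` is hypothetical, and the flag names and values are copied from the executed cell in this notebook.

```python
import shlex

script = "transformers/examples/pytorch/question-answering/run_qa.py"

# Boolean switches taken from the command in this notebook.
flags = [
    "--hmp", "--use_lazy_mode", "--use_fused_adam", "--use_fused_clip_norm",
    "--use_hpu", "--overwrite_output_dir", "--do_train",
]

# Key/value options taken from the command in this notebook.
options = {
    "hmp_bf16": "./ops_bf16_bert.txt",
    "hmp_fp32": "./ops_fp32_bert.txt",
    "doc_stride": 128,
    "per_device_train_batch_size": 12,
    "per_device_eval_batch_size": 8,
    "dataset_name": "squad",
    "max_seq_length": 384,
    "learning_rate": "3e-05",
    "num_train_epochs": 1,
    "max_steps": 1000,
    "output_dir": "./output",
    "logging_steps": 20,  # value used in the executed cell
    "save_steps": 8000,
    "model_name_or_path": "bert-base-uncased",
}


def build_command(script, flags, options):
    """Hypothetical helper: return the argument list for the run."""
    cmd = ["python3", script] + list(flags)
    cmd += [f"--{key}={value}" for key, value in options.items()]
    return cmd


command = build_command(script, flags, options)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` from the working directory used above; changing a single entry in `options` (for example `max_steps`) then does not require editing the whole command line.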

In [8]:
!python3 transformers/examples/pytorch/question-answering/run_qa.py --hmp --hmp_bf16=./ops_bf16_bert.txt --hmp_fp32=./ops_fp32_bert.txt --doc_stride=128 --use_lazy_mode --per_device_train_batch_size=12 --per_device_eval_batch_size=8 --dataset_name=squad --use_fused_adam --use_fused_clip_norm --use_hpu --max_seq_length=384 --learning_rate=3e-05 --num_train_epochs=1 --max_steps 1000 --output_dir=./output --logging_steps=20 --overwrite_output_dir --do_train --save_steps=8000 --model_name_or_path=bert-base-uncased

03/07/2023 21:06:51 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hmp=True,
hmp_bf16=./ops_bf16_bert.txt,
hmp_fp32=./ops_fp32_bert.txt,
hmp_opt_level=O1,
hmp_verbose=False,
hub_model_id=None,
hub_private_repo=False,
h

**From the logs above, we can see the finetuning throughput for BERT base on 1 HPU is over 100 samples/second.**
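
To put that throughput figure in context, the run settings give the number of training samples processed, and the quoted lower bound of 100 samples/second gives an upper bound on the time spent in training steps. This is a rough back-of-the-envelope sketch; it assumes the throughput stays at or above the quoted floor and ignores startup, graph compilation, and dataset preprocessing:

```python
# Values taken from the command line and the logged TrainingArguments.
max_steps = 1000
per_device_train_batch_size = 12
gradient_accumulation_steps = 1
num_devices = 1  # single HPU

# Lower bound on throughput quoted in the text (samples/second).
throughput_floor = 100.0

total_samples = (
    max_steps
    * per_device_train_batch_size
    * gradient_accumulation_steps
    * num_devices
)
max_train_seconds = total_samples / throughput_floor

print(total_samples)      # 12000
print(max_train_seconds)  # 120.0
```

In other words, the 1000 training steps cover 12,000 samples, which at 100 samples/second or more corresponds to at most about two minutes of pure training time.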