Copyright (c) 2024 Habana Labs, Ltd. an Intel Company.

##### Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Intel® Gaudi® Accelerator Quick Start Guide


This document provides instructions for setting up an Intel Gaudi 2 AI accelerator instance on the Intel® Tiber™ Developer Cloud or on any on-premises Intel Gaudi node. You will run models from the Intel Gaudi software Model References and the Hugging Face Optimum Habana library.

Follow along with the [video](https://developer.habana.ai/intel-developer-cloud/) on our Developer Page to walk through the steps below. This guide assumes that you have set up the latest Intel Gaudi PyTorch Docker image.

To set up a multi-node instance with two or more Gaudi nodes, refer to Setting up Multiple Gaudi Nodes in the [Quick Start Guide Documentation](https://docs.habana.ai/en/latest/Intel_DevCloud_Quick_Start/Intel_DevCloud_Quick_Start.html#setting-up-multiple-gaudi-nodes).

The first step is to clone the Model-References repository from GitHub and run the "hello-world" model from the examples library.

In [2]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials
!git clone -b 1.15.1 https://github.com/HabanaAI/Model-References.git

/root/Gaudi-tutorials/PyTorch/Single_card_tutorials
fatal: destination path 'Model-References' already exists and is not an empty directory.
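
The `fatal: destination path ... already exists` message above appears when the cell is re-run after the repository has already been cloned. The following is a minimal sketch of one way to skip the clone in that case. The helper name `clone_command` is hypothetical and not part of the tutorial; it only builds the command and does not run it.

```python
from pathlib import Path


def clone_command(repo_url, branch, dest):
    """Return the git clone command as a list, or None if dest is a non-empty directory.

    Hypothetical helper for illustration; it does not execute git.
    """
    dest = Path(dest)
    if dest.is_dir() and any(dest.iterdir()):
        # git would fail here with "already exists and is not an empty directory"
        return None
    return ["git", "clone", "-b", branch, repo_url, str(dest)]


cmd = clone_command(
    "https://github.com/HabanaAI/Model-References.git",
    "1.15.1",
    "Model-References",
)
print("Repository already present, skipping clone" if cmd is None else " ".join(cmd))
```

If the helper returns a command, it can be passed to `subprocess.run(cmd, check=True)` to perform the clone.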


In [3]:
%cd Model-References/PyTorch/examples/computer_vision/hello_world/

/root/Gaudi-tutorials/PyTorch/Single_card_tutorials/Model-References/PyTorch/examples/computer_vision/hello_world


We now run the simple example with the MNIST dataset on one Intel Gaudi card:

In [5]:
%run mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu --autocast

Not using distributed mode
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
100%|███████████████████████████| 9912422/9912422 [00:00<00:00, 96733142.65it/s]
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|███████████████████████████████| 28881/28881 [00:00<00:00, 64536864.05it/s]
Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|███████████████████████████| 1648877/1648877 [00:00<00:00, 49498574.97it/s]
E
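
To make the `--lr` and `--gamma` flags in the command above more concrete, here is a small worked example. It assumes that `mnist.py` follows the upstream PyTorch MNIST example, where `--gamma` is a multiplicative learning-rate decay applied once per epoch; this has not been verified against the script itself, and the function name is hypothetical.

```python
def decayed_learning_rates(initial_lr, gamma, epochs):
    """Learning rate used in each epoch under a per-epoch multiplicative decay.

    Assumption: the rate is multiplied by gamma after every epoch.
    """
    return [initial_lr * gamma ** epoch for epoch in range(epochs)]


# Values from the command above (--lr=1.0 --gamma=0.7), shown for three epochs
for epoch, lr in enumerate(decayed_learning_rates(1.0, 0.7, 3), start=1):
    print(f"epoch {epoch}: lr = {lr:.3f}")
```

With `--epochs=1`, as in the command above, only the initial rate of 1.0 would be used under this assumption.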

### Fine-tuning with Hugging Face Optimum Habana Library
The Optimum Habana library is the interface between the Hugging Face Transformers and Diffusers libraries and the Intel Gaudi 2 card. It provides tools for model loading, training, and inference in single-card and multi-card settings for different downstream tasks. The following example uses the text-classification task to fine-tune a BERT-Large model on the MRPC (Microsoft Research Paraphrase Corpus) dataset and then run inference.

Follow the steps below to install the stable release of the Optimum Habana examples and library:

1. Clone the Optimum-Habana project and check out the latest stable release. This repository gives access to the examples that are optimized for Intel Gaudi:

In [7]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials
!git clone -b v1.11.1 https://github.com/huggingface/optimum-habana.git


/root/Gaudi-tutorials/PyTorch/Single_card_tutorials
fatal: destination path 'optimum-habana' already exists and is not an empty directory.
/root/Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana
HEAD is now at 1dfbc02 Release: v1.10.4
/root


2. Install the Optimum-Habana library. The command pins the release to 1.11.1, matching the tag cloned in step 1:

In [10]:
!pip install optimum-habana==1.11.1
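
Because the examples come from the cloned repository and the library comes from pip, it can help to confirm that the two versions agree. The sketch below uses the standard-library `importlib.metadata` to read the installed version; the helper name `matches_tag` is hypothetical, and the tag `v1.11.1` is the one used in the clone command above.

```python
from importlib.metadata import PackageNotFoundError, version


def matches_tag(installed_version, git_tag):
    """Return True if a pip version such as '1.11.1' matches a git tag such as 'v1.11.1'."""
    tag_version = git_tag[1:] if git_tag.startswith("v") else git_tag
    return installed_version == tag_version


try:
    installed = version("optimum-habana")
except PackageNotFoundError:
    installed = None  # the package is not installed in this environment

if installed is None:
    print("optimum-habana is not installed")
else:
    print(f"installed {installed}, matches v1.11.1: {matches_tag(installed, 'v1.11.1')}")
```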


The following example is based on the Optimum-Habana Text Classification task example. Change to the text-classification directory and install the additional software requirements for this specific example:

In [14]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/text-classification/
!pip install --quiet -r requirements.txt

/root/Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/text-classification

### Execute Single-Card Training
The following command fine-tunes the BERT-Large model on one Intel Gaudi card:

In [18]:
%run run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--gaudi_config_name Habana/bert-large-uncased-whole-word-masking  \
--task_name mrpc   \
--do_train   \
--do_eval   \
--per_device_train_batch_size 32 \
--learning_rate 3e-5  \
--num_train_epochs 3   \
--max_seq_length 128   \
--output_dir ./output/mrpc/  \
--use_habana  \
--use_lazy_mode   \
--bf16   \
--use_hpu_graphs_for_inference \
--report_to none \
--overwrite_output_dir \
--throughput_warmup_steps 3

03/21/2024 15:48:42 - INFO - __main__ - Training/evaluation parameters GaudiTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-06,
adjust_throughput=False,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=hccl,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=230,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tensor_cache_hpu_graphs=False,
disable_tqdm=False,
dispatch_batches=None,
distribution_strategy=ddp,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gaudi_config_name=Habana/bert-large-unca
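
As a rough guide to how long the run above lasts, the number of optimizer steps follows from the batch size and epoch count in the command. The sketch below is a worked calculation only. It assumes the GLUE MRPC training split has 3,668 examples, no gradient accumulation, and that the last partial batch is kept (the log shows `dataloader_drop_last=False`); the function name is hypothetical.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, num_epochs, num_devices=1):
    """Return (steps per epoch, total steps), keeping the last partial batch."""
    global_batch = per_device_batch_size * num_devices
    steps_per_epoch = math.ceil(num_examples / global_batch)
    return steps_per_epoch, steps_per_epoch * num_epochs


MRPC_TRAIN_EXAMPLES = 3668  # assumption: size of the GLUE MRPC training split

# --per_device_train_batch_size 32, --num_train_epochs 3, one card
per_epoch, total = optimizer_steps(MRPC_TRAIN_EXAMPLES, 32, 3)
print(f"{per_epoch} steps per epoch, {total} steps in total")
```

Under these assumptions the run takes 115 steps per epoch and 345 steps overall.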

### Inference Example Run
Running inference computes the same evaluation metrics (accuracy, F1 score) as shown above, which indicate how well the model performs:

In [19]:
%run run_glue.py --model_name_or_path bert-large-uncased-whole-word-masking \
--gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
--task_name mrpc \
--do_eval \
--max_seq_length 128 \
--output_dir ./output/mrpc/ \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--report_to none \
--overwrite_output_dir 

03/21/2024 15:50:17 - INFO - __main__ - Training/evaluation parameters GaudiTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-06,
adjust_throughput=False,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=hccl,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=230,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tensor_cache_hpu_graphs=False,
disable_tqdm=False,
dispatch_batches=None,
distribution_strategy=ddp,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gaudi_config_name=Habana/bert-large-un
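
For readers unfamiliar with the two metrics mentioned above, the following sketch shows how accuracy and F1 score are defined for a binary task such as MRPC. It is a plain-Python illustration, not the implementation used by `run_glue.py`, and the predictions and labels are made-up values.

```python
def accuracy_and_f1(predictions, labels, positive=1):
    """Compute accuracy and F1 for binary labels; F1 treats `positive` as the target class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    pairs = list(zip(predictions, labels))
    true_pos = sum(p == positive and y == positive for p, y in pairs)
    false_pos = sum(p == positive and y != positive for p, y in pairs)
    false_neg = sum(p != positive and y == positive for p, y in pairs)
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    denominator = 2 * true_pos + false_pos + false_neg
    f1 = 2 * true_pos / denominator if denominator else 0.0
    return accuracy, f1


# Hypothetical example: 1 = paraphrase, 0 = not a paraphrase
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]
accuracy, f1 = accuracy_and_f1(predictions, labels)
print(f"accuracy = {accuracy:.3f}, F1 = {f1:.3f}")
```

Accuracy counts every correct prediction equally, while F1 balances precision and recall on the positive class, which is why both are reported for MRPC.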

## Next Steps
You now have access to all the models in the Model-References and Optimum-Habana repositories and can start exploring other models. All the models in these repositories are documented, so they are easy to use.
* To explore more models from the Model References, start [here](https://github.com/HabanaAI/Model-References).  
* To run more examples using Hugging Face go [here](https://github.com/huggingface/optimum-habana?tab=readme-ov-file#validated-models).  
* To migrate other models to Intel Gaudi 2, refer to PyTorch Model Porting in the [documentation](https://docs.habana.ai/en/latest/PyTorch/PyTorch_Model_Porting/GPU_Migration_Toolkit/GPU_Migration_Toolkit.html).

In [None]:
exit()