# Hugging Face Optimum Habana Large Language Model Example

### Summarization with T5-3B model
We will use the Hugging Face summarization example to fine-tune the T5-3B model on the cnn_dailymail dataset.

`run_summarization.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets library.

#### Initial Setup
We start with a Habana PyTorch Docker image and run this notebook inside it.

#### Install Habana's DeepSpeed Fork
Habana's DeepSpeed fork contains implementations written specifically for Gaudi and must be used.

In [1]:
!pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.10.0  

Collecting git+https://github.com/HabanaAI/DeepSpeed.git@1.10.0
  Cloning https://github.com/HabanaAI/DeepSpeed.git (to revision 1.10.0) to /tmp/pip-req-build-nxc0q7t7
  Running command git clone --filter=blob:none --quiet https://github.com/HabanaAI/DeepSpeed.git /tmp/pip-req-build-nxc0q7t7
  Running command git checkout -b 1.10.0 --track origin/1.10.0
  Switched to a new branch '1.10.0'
  Branch '1.10.0' set up to track remote branch '1.10.0' from 'origin'.
  Resolved https://github.com/HabanaAI/DeepSpeed.git to commit ebed48dcdec0b20602af5097182f68c60bdbddaf
  Preparing metadata (setup.py) ... done

#### Install the Optimum Habana Library

In [2]:
!python -m pip install optimum[habana]


#### Clone the Hugging Face Model Repository

In [3]:
!git clone  https://github.com/huggingface/optimum-habana

fatal: destination path 'optimum-habana' already exists and is not an empty directory.


#### Go to the summarization example and install the requirements

In [4]:
%cd optimum-habana/examples/summarization

/root/work/hf_examples/optimum-habana/examples/summarization


In [5]:
!pip install -r requirements.txt
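
After the installs, it can help to confirm that the packages are visible to the notebook kernel before moving on. The following is a minimal sketch; the module names are taken from imports and paths that appear elsewhere in this notebook, and the helper names `is_importable` and `check_environment` are hypothetical.

```python
import importlib.util

# Module names taken from imports and paths used elsewhere in this notebook.
REQUIRED_MODULES = ["deepspeed", "optimum.habana", "habana_frameworks.torch"]


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located by the import system.

    For a dotted name, the parent package is imported as a side effect.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is missing.
        return False


def check_environment(modules=REQUIRED_MODULES) -> dict:
    """Map each module name to whether it can be found."""
    return {name: is_importable(name) for name in modules}


for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry usually means the install ran in a different Python environment than the one the kernel uses.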


### Setup for DeepSpeed
Since we are using DeepSpeed, we have to confirm that the model has been configured properly.  We look for the following:

* `deepspeed.initialize(model, ...)`, called as `model, optimizer, ... = deepspeed.initialize(args=args, model=model, optimizer=optimizer, ...)`
* `deepspeed.init_distributed(dist_backend="hccl", init_method=init_method)`
* A `ds_config.json` file that sets the DeepSpeed training parameters
  
  


#### DeepSpeed Initialization
The call to `deepspeed.initialize` is in `optimum/habana/transformers/deepspeed.py`:

In [8]:
%%sh
cd ../../optimum/habana/transformers
cat -n deepspeed.py | head -n 106 | tail -n 6
cat -n deepspeed.py | head -n 160 | tail -n 11

   101	    import deepspeed
   102	    from deepspeed.utils import logger as ds_logger
   103	
   104	    model = trainer.model
   105	    args = trainer.args
   106	
   150	    kwargs = {
   151	        "args": habana_args,
   152	        "model": model,
   153	        "model_parameters": model_parameters,
   154	        "config_params": config,
   155	        "optimizer": optimizer,
   156	        "lr_scheduler": lr_scheduler,
   157	    }
   158	
   159	    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
   160	


#### DeepSpeed Distributed
The call to `deepspeed.init_distributed` is in `optimum/habana/transformers/training_args.py`:

In [9]:
%%sh
cd ../../optimum/habana/transformers
cat -n training_args.py | head -n 530 | tail -n 1
cat -n training_args.py | head -n 532 | tail -n 1
cat -n training_args.py | head -n 543 | tail -n 2
cat -n training_args.py | head -n 550 | tail -n 3

   530	            from habana_frameworks.torch.distributed.hccl import initialize_distributed_hpu
   532	            world_size, rank, self.local_rank = initialize_distributed_hpu()
   542	                    )
   543	                import deepspeed
   548	
   549	                deepspeed.init_distributed(dist_backend="hccl", timeout=timedelta(seconds=self.ddp_timeout))
   550	                logger.info("DeepSpeed is enabled.")
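
The `head`/`tail` commands above depend on exact line numbers, which shift between releases of the library. As an alternative, here is a minimal sketch that searches source text for the two calls by name. The helper names `find_calls` and `scan_file` and the `sample` string are hypothetical and only illustrate the idea.

```python
import re
from pathlib import Path

# The two call names we expect to find in a DeepSpeed-enabled code base.
PATTERNS = {
    "initialize": re.compile(r"\bdeepspeed\.initialize\s*\("),
    "init_distributed": re.compile(r"\bdeepspeed\.init_distributed\s*\("),
}


def find_calls(source: str) -> dict:
    """Map each pattern name to the 1-based line numbers where it occurs."""
    hits = {name: [] for name in PATTERNS}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits


def scan_file(path) -> dict:
    """Run find_calls on the contents of a source file."""
    return find_calls(Path(path).read_text(encoding="utf-8"))


# Small in-memory sample so the sketch runs without the repository.
sample = '''import deepspeed
deepspeed.init_distributed(dist_backend="hccl")
engine, optimizer, _, scheduler = deepspeed.initialize(**kwargs)
'''
print(find_calls(sample))  # {'initialize': [3], 'init_distributed': [2]}
```

In the cloned repository, `scan_file("deepspeed.py")` and `scan_file("training_args.py")` could be run from `optimum/habana/transformers`. The reported line numbers will vary with the library version.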


#### Create DeepSpeed Config file with ZeRO preferences
The `ds_config.json` file configures the parameters used to run DeepSpeed.

In this case, we will run the ZeRO stage 3 optimizer with BF16 mixed precision.

In [10]:
%%sh
tee ./ds_config.json <<EOF
{
    "steps_per_print": 64,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false
    }
}
EOF

{
    "steps_per_print": 64,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false
    }
}
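
The same configuration can also be built in Python, which makes it easy to validate the JSON and to reason about the `"auto"` batch-size fields. The sketch below assumes the standard DeepSpeed relationship `train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size`. The helper names are hypothetical, and the gradient accumulation value of 1 is an assumption, since this notebook never sets it explicitly.

```python
import json
from pathlib import Path


def build_ds_config(zero_stage: int = 3, bf16: bool = True) -> dict:
    """Return the same settings as the ds_config.json written above."""
    return {
        "steps_per_print": 64,
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "bf16": {"enabled": bf16},
        "gradient_clipping": 1.0,
        "zero_optimization": {
            "stage": zero_stage,
            "overlap_comm": False,
            "reduce_scatter": False,
            "contiguous_gradients": False,
        },
    }


def write_ds_config(path, config: dict) -> None:
    """Serialize the config as indented JSON."""
    Path(path).write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")


def effective_batch_size(micro_batch: int, grad_accum: int, world_size: int) -> int:
    """Global batch size implied by the per-device settings."""
    return micro_batch * grad_accum * world_size


config = build_ds_config()
# Round-trip through JSON to confirm the structure is serializable.
assert json.loads(json.dumps(config)) == config

# Per-device batch size 4 on 8 devices, assuming 1 accumulation step.
print(effective_batch_size(micro_batch=4, grad_accum=1, world_size=8))  # 32
```

Calling `write_ds_config("./ds_config.json", config)` would produce a file equivalent to the one created with `tee`.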


#### Fine Tuning T5-3b with the cnn_dailymail dataset
The T5-3b model is a large language model that was originally trained on the C4 dataset. Here it is fine-tuned on the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset, an English-language dataset containing just over 300k unique news articles written by journalists at CNN and the Daily Mail.

Training is launched by `gaudi_spawn.py`, a simple launcher script that collects arguments and sends them to `distributed_runner.py` for training on multiple HPUs, which then calls the `run_summarization.py` script.

Notice the Habana-specific arguments used here:

* `--use_habana`: allows training to run on Habana Gaudi
* `--use_hpu_graphs`: reduces recompilation by replaying the graph
* `--gaudi_config_name Habana/t5`: selects the `Habana/t5` Gaudi configuration, which maps to the Hugging Face T5 model



In [None]:
%cd ../summarization

In [None]:
%%sh

mkdir -p ft-summarization
python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_summarization.py \
--model_name_or_path t5-3b \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
--dataset_config '"3.0.0"' \
--source_prefix '"summarize: "' \
--output_dir ./ft-summarization \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--overwrite_output_dir \
--predict_with_generate \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs \
--gaudi_config_name Habana/t5 \
--ignore_pad_token_for_loss False \
--pad_to_max_length \
--save_strategy epoch \
--throughput_warmup_steps 3 \
--deepspeed ./ds_config.json
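
For runs that need to vary a few arguments, the same command can be assembled in Python. This is a minimal sketch using only the flags shown above; the `build_command` helper is hypothetical, and flags without a value are represented as `None`. The inner double quotes on `dataset_config` and `source_prefix` are kept exactly as they appear in the shell command.

```python
import shlex

launcher = "../gaudi_spawn.py"
launcher_args = ["--world_size", "8", "--use_deepspeed"]
script = "run_summarization.py"

# Flag name -> value; None marks a flag that takes no value.
script_args = {
    "model_name_or_path": "t5-3b",
    "do_train": None,
    "do_eval": None,
    "dataset_name": "cnn_dailymail",
    "dataset_config": '"3.0.0"',
    "source_prefix": '"summarize: "',
    "output_dir": "./ft-summarization",
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "overwrite_output_dir": None,
    "predict_with_generate": None,
    "use_habana": None,
    "use_lazy_mode": None,
    "use_hpu_graphs": None,
    "gaudi_config_name": "Habana/t5",
    "ignore_pad_token_for_loss": "False",
    "pad_to_max_length": None,
    "save_strategy": "epoch",
    "throughput_warmup_steps": 3,
    "deepspeed": "./ds_config.json",
}


def build_command(launcher, launcher_args, script, script_args):
    """Return the argv list for the launcher followed by the script flags."""
    argv = ["python", launcher, *launcher_args, script]
    for flag, value in script_args.items():
        argv.append(f"--{flag}")
        if value is not None:
            argv.append(str(value))
    return argv


argv = build_command(launcher, launcher_args, script, script_args)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run(argv, check=True)` from the `examples/summarization` directory. Printing it first makes it easy to compare against the shell version.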



### After fine tuning, let's look at the results
Fine-tuning has created a new `pytorch_model.bin`, and the `global_step26919` folder contains the checkpoints that will be used for inference in the next section.


In [None]:
%cd /ft-summarization

In [13]:
%ls -al

total 37712840
drwxr-xr-x 5 root root        4096 Jun  7 21:16 ./
drwxrwxr-x 7 1052 1052        4096 Jun  7 22:06 ../
-rw-r--r-- 1 1052 1052 38372376576 Jun  7 05:20 checkpoint-26919.tar.gz
-rw-r--r-- 1 root root        1473 Jun  7 21:35 config.json
-rw-r--r-- 1 root root         142 Jun  7 21:35 generation_config.json
drwxr-xr-x 2 root root        4096 Jun  5 08:06 global_step26919/
drwxr-x--- 2 root root        4096 Jun  7 21:16 .graph_dumps/
-rw-r--r-- 1 root root          16 Jun  5 08:06 latest
-rw-r--r-- 1 root root   242069785 Jun  7 21:35 pytorch_model.bin
-rw-r--r-- 1 root root       18871 Jun  5 08:06 rng_state_0.pth
-rw-r--r-- 1 root root       18871 Jun  5 08:06 rng_state_1.pth
-rw-r--r-- 1 root root       18871 Jun  5 08:06 rng_state_3.pth
-rw-r--r-- 1 root root       18871 Jun  5 08:06 rng_state_4.pth
-rw-r--r-- 1 root root       18871 Jun  5 08:06 rng_state_5.pth
-rw-r--r-- 1 root root       18871 Jun  5 08:0
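
The listing includes a small `latest` file next to the `global_step26919` folder. The sketch below assumes that `latest` holds the name of the most recent checkpoint folder as plain text. That matches the usual DeepSpeed checkpoint layout but is not confirmed by this notebook. The helper name `resolve_checkpoint_dir` is hypothetical, and the demonstration runs on a temporary directory rather than the real output folder.

```python
import tempfile
from pathlib import Path


def resolve_checkpoint_dir(output_dir) -> Path:
    """Return the checkpoint folder named by the `latest` file in output_dir."""
    output_dir = Path(output_dir)
    # Assumption: `latest` contains only the checkpoint folder name.
    tag = (output_dir / "latest").read_text(encoding="utf-8").strip()
    checkpoint_dir = output_dir / tag
    if not checkpoint_dir.is_dir():
        raise FileNotFoundError(f"checkpoint folder not found: {checkpoint_dir}")
    return checkpoint_dir


# Demonstration on a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "global_step26919").mkdir()
    (root / "latest").write_text("global_step26919", encoding="utf-8")
    print(resolve_checkpoint_dir(root).name)  # global_step26919
```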

#### Summarization using the Pipeline
Now we can run summarization with the Hugging Face `pipeline` call, pointing it at the model that we fine-tuned.

In [14]:
import torch
import habana_frameworks.torch

from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer

# Load the base t5-3b model and its tokenizer from the Hugging Face Hub
# (the pipeline below loads its own copies from the local directory)
model_to_finetune = "t5-3b"
model = AutoModelForSeq2SeqLM.from_pretrained(model_to_finetune)
tokenizer = AutoTokenizer.from_pretrained(model_to_finetune)

# Local directory that holds the fine-tuned model
path_to_local_model = "./ft-summarization"

# Instantiate the pipeline from the local directory
pipe = pipeline(task="summarization", model=path_to_local_model, device="hpu", torch_dtype=torch.bfloat16)

text_to_summarize = "summarize: Introduction: The Strategic Arms Limitation Talks II (SALT II) treaty, signed on June 18, 1979, between the United States and the Soviet Union, marked a significant milestone in nuclear arms control efforts during the Cold War era. Building upon its predecessor, SALT I, the treaty aimed to curb the arms race and reduce the risk of nuclear conflict between the superpowers. Key Provisions: SALT II encompassed several crucial provisions. It placed limits on strategic offensive arms, including intercontinental ballistic missiles (ICBMs), submarine-launched ballistic missiles (SLBMs), and heavy bombers. The agreement specified the maximum number of deployed warheads and launchers each party could possess. Verification and Compliance: To ensure compliance, the treaty established comprehensive verification measures. This involved regular exchanges of data, on-site inspections, and monitoring activities by both nations. These measures sought to enhance transparency, foster trust, and prevent either side from gaining a significant advantage in terms of strategic nuclear capabilities. Ratification and Challenges: Although both the United States and the Soviet Union signed the treaty, its ratification faced considerable challenges. The political landscape changed when the Soviet Union invaded Afghanistan in 1979, leading to a deterioration of U.S.-Soviet relations. As a result, the United States never ratified the treaty formally, rendering it non-binding. However, both nations pledged to adhere to its principles, effectively implementing its provisions on a voluntary basis. Legacy and Impact: Despite the treaty's non-ratification, SALT II's legacy and impact were significant. It set the stage for subsequent arms control negotiations, providing a framework for future agreements such as the Intermediate-Range Nuclear Forces (INF) Treaty and the Strategic Arms Reduction Treaty (START). SALT II demonstrated the potential for cooperation between the superpowers and laid the groundwork for continued dialogue aimed at reducing the nuclear threat globally."
print("------------------------------------------------------------")
print("Input:", text_to_summarize)
print()

result = pipe(text_to_summarize)
print("------------------------------------------------------------")
print("Result:", result)



  from .autonotebook import tqdm as notebook_tqdm
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-3b automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
fopen of /sys/class/accel/accel6 (deleted)/pci_addr failed
 PT_HPU_LAZY_MODE = 1
 PT_HPU_LAZY_EAGER_OPTIM_CACHE = 1
 PT_HPU_ENABLE_COMPILE_THREAD = 0
 PT_HPU_ENABLE_EXECUTION_THREAD = 1
 PT_HPU_ENABLE_LAZY_EAGER_EXECUTION_THREAD = 1
 PT_ENABLE_INTER_HOST_CACHING = 0
 PT_ENABLE_INFERENCE_MODE = 1
 PT_ENABLE_HABANA_CACHING = 1
 PT_HPU_MAX_RECIPE_SUBMISSION_LIMIT = 0
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_MAX_COMPOUND_OP_SIZE_SS = 10
 PT_HPU_ENABLE_STAGE_SUBMISSION = 1
 PT_HPU_STAGE_SUBMISSION_MODE = 2
 PT_HPU_PGM_ENABLE_CACHE 

------------------------------------------------------------
Input: summarize: Introduction: The Strategic Arms Limitation Talks II (SALT II) treaty, signed on June 18, 1979, between the United States and the Soviet Union, marked a significant milestone in nuclear arms control efforts during the Cold War era. Building upon its predecessor, SALT I, the treaty aimed to curb the arms race and reduce the risk of nuclear conflict between the superpowers. Key Provisions: SALT II encompassed several crucial provisions. It placed limits on strategic offensive arms, including intercontinental ballistic missiles (ICBMs), submarine-launched ballistic missiles (SLBMs), and heavy bombers. The agreement specified the maximum number of deployed warheads and launchers each party could possess. Verification and Compliance: To ensure compliance, the treaty established comprehensive verification measures. This involved regular exchanges of data, on-site inspections, and monitoring activities by both nati



------------------------------------------------------------
Result: [{'summary_text': 'The Strategic Arms Limitation Talks II (SALT II) treaty was signed on June 18, 1979 . It aimed to curb the arms race and reduce the risk of nuclear conflict . The United States never ratified the treaty formally, rendering it non-binding .'}]
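
The pipeline returns a list with one dictionary per input, each holding a `summary_text` key, as the printed result shows. A minimal sketch of post-processing that structure follows; it reuses the output above as literal data, and the helper names `extract_summaries` and `tidy` are hypothetical.

```python
import re

# The pipeline output printed above, copied as literal data.
result = [{'summary_text': 'The Strategic Arms Limitation Talks II (SALT II) treaty was signed on June 18, 1979 . It aimed to curb the arms race and reduce the risk of nuclear conflict . The United States never ratified the treaty formally, rendering it non-binding .'}]


def extract_summaries(pipeline_output):
    """Return the summary strings from a summarization pipeline result."""
    return [item["summary_text"] for item in pipeline_output]


def tidy(text: str) -> str:
    """Remove the whitespace that appears before punctuation in the output."""
    return re.sub(r"\s+([.,!?])", r"\1", text).strip()


for summary in extract_summaries(result):
    print(tidy(summary))
```

The spacing before each period appears in the generated summaries shown above, so the cleanup step is cosmetic and can be dropped if the raw text is preferred.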
