Copyright (c) 2023 Habana Labs, Ltd. an Intel Company.

# Fine Tuning and Inference using Hugging Face and the Optimum Habana Library

### Summarization with the T5-3B model on the Intel&reg; Gaudi&reg; 2 AI accelerator
We will use the Hugging Face summarization example with the T5-3B model and fine-tune it on the CNN/DailyMail dataset.

`run_summarization.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets library and then fine-tune a sequence-to-sequence model on it.

#### Initial Setup
We start from an Intel Gaudi PyTorch Docker image and run this notebook inside the container.

#### Install the Intel Gaudi DeepSpeed Fork
The Intel Gaudi fork of DeepSpeed contains Gaudi-specific implementations and must be used instead of upstream DeepSpeed.

In [1]:
!pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.16.2  

Collecting git+https://github.com/HabanaAI/DeepSpeed.git@1.16.2
  Cloning https://github.com/HabanaAI/DeepSpeed.git (to revision 1.16.2) to /tmp/pip-req-build-by3bxdk1
  Running command git clone --filter=blob:none --quiet https://github.com/HabanaAI/DeepSpeed.git /tmp/pip-req-build-by3bxdk1
  Running command git checkout -b 1.16.2 --track origin/1.16.2
  Switched to a new branch '1.16.2'
  Branch '1.16.2' set up to track remote branch '1.16.2' from 'origin'.
  Resolved https://github.com/HabanaAI/DeepSpeed.git to commit d0420c5fd6b21fcd403538bde078e695a62ddba5
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done

#### Install the Optimum Habana Library

In [2]:
!pip install optimum-habana==v1.12.0
!pip install ipywidgets

Collecting ipywidgets
  Downloading ipywidgets-8.1.3-py3-none-any.whl.metadata (2.4 kB)
Collecting widgetsnbextension~=4.0.11 (from ipywidgets)
  Downloading widgetsnbextension-4.0.11-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab-widgets~=3.0.11 (from ipywidgets)
  Downloading jupyterlab_widgets-3.0.11-py3-none-any.whl.metadata (4.1 kB)
Downloading ipywidgets-8.1.3-py3-none-any.whl (139 kB)
Downloading jupyterlab_widgets-3.0.11-py3-none-any.whl (214 kB)
Downloading widgetsnbextension-4.0.11-py3-none-any.whl (2.3 MB)
Installing collected packages: widgetsnbextension, jupyterlab-widgets

#### Clone the Hugging Face Model Repository

In [3]:
!git clone -b v1.12.0 https://github.com/huggingface/optimum-habana.git

Cloning into 'optimum-habana'...
remote: Enumerating objects: 15245, done.
remote: Counting objects: 100% (5011/5011), done.
remote: Compressing objects: 100% (807/807), done.
remote: Total 15245 (delta 4653), reused 4321 (delta 4180), pack-reused 10234
Receiving objects: 100% (15245/15245), 9.11 MiB | 21.70 MiB/s, done.
Resolving deltas: 100% (10588/10588), done.
Note: switching to '6adad1651566ffb761ce47f8d671b73a3bbb0ec2'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false



#### Go to the summarization example and install the requirements

In [4]:
%cd optimum-habana/examples/summarization

/root/Gaudi2-Workshop/LLM-Training/optimum-habana/examples/summarization




In [5]:
!pip install -q -r requirements.txt


### Setup for DeepSpeed
Since we are using DeepSpeed, we have to confirm that the training code is set up for it. We look for the following:

* `model, optimizer, ... = deepspeed.initialize(args=args, model=model, optimizer=optimizer, ...)`
* `deepspeed.init_distributed(dist_backend="hccl", init_method=init_method)`
* A `ds_config.json` file that sets the DeepSpeed training parameters.
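The `ds_config.json` used in this notebook sets its batch fields to `"auto"`. In the Hugging Face Transformers DeepSpeed integration, `"auto"` values are filled in from the training arguments, and we assume Optimum Habana behaves the same way. DeepSpeed expects the global `train_batch_size` to equal the micro-batch size times the gradient accumulation steps times the number of devices. The sketch below uses a hypothetical `resolve_auto_batch_sizes` helper to show that arithmetic for this run (`--per_device_train_batch_size 4`, the default of 1 accumulation step, 8 HPUs); it is not the library's own code.

```python
import copy

# A subset of the ds_config.json fields used in this notebook
ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}


def resolve_auto_batch_sizes(config: dict, per_device_batch: int,
                             grad_accum: int, world_size: int) -> dict:
    """Hypothetical helper: replace "auto" batch fields with concrete values."""
    resolved = copy.deepcopy(config)  # leave the caller's dict untouched
    if resolved.get("train_micro_batch_size_per_gpu") == "auto":
        resolved["train_micro_batch_size_per_gpu"] = per_device_batch
    if resolved.get("gradient_accumulation_steps") == "auto":
        resolved["gradient_accumulation_steps"] = grad_accum
    if resolved.get("train_batch_size") == "auto":
        # Global batch = micro batch * accumulation steps * number of devices
        resolved["train_batch_size"] = (
            resolved["train_micro_batch_size_per_gpu"]
            * resolved["gradient_accumulation_steps"]
            * world_size
        )
    return resolved


resolved = resolve_auto_batch_sizes(ds_config, per_device_batch=4, grad_accum=1, world_size=8)
print(resolved["train_batch_size"])
```

With these settings the global batch works out to 32 samples per optimizer step.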

#### DeepSpeed Initialization
Looking in `deepspeed.py`, we can see the model being passed to the DeepSpeed engine:

```
    import deepspeed
    from deepspeed.utils import logger as ds_logger

    model = trainer.model
    args = trainer.args
    ...

    kwargs = {
        "args": habana_args,
        "model": model,
        "model_parameters": model_parameters,
        "config_params": config,
        "optimizer": optimizer,
        "lr_scheduler": lr_scheduler,
    }

    deepspeedengine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)

```

#### DeepSpeed Distributed
Looking in `training_args.py`, we can see the DeepSpeed distributed initialization:

```
    from habana_frameworks.torch.distributed.hccl import initialize_distributed_hpu
    world_size, rank, self.local_rank = initialize_distributed_hpu()

    import deepspeed
    deepspeed.init_distributed(dist_backend="hccl", timeout=timedelta(seconds=self.ddp_timeout))
    logger.info("DeepSpeed is enabled.")
```
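`initialize_distributed_hpu()` returns the world size, the global rank and the local rank of the current process. Launchers such as `deepspeed` start one process per device and export this information through the `WORLD_SIZE`, `RANK` and `LOCAL_RANK` environment variables. The sketch below, built around a hypothetical `read_launcher_env` helper, only illustrates how those values map to such a tuple; it is an assumption-based example, not the implementation inside `habana_frameworks`.

```python
import os
from typing import Mapping, Optional, Tuple


def read_launcher_env(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int, int]:
    """Return (world_size, rank, local_rank) from launcher environment variables.

    Hypothetical helper for illustration only; it falls back to a
    single-process run when the variables are not set.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return world_size, rank, local_rank


# Simulate what the process with global rank 3 would see in an 8-HPU run
print(read_launcher_env({"WORLD_SIZE": "8", "RANK": "3", "LOCAL_RANK": "3"}))
```

On a single node, the global rank and the local rank coincide; they differ only when training spans several nodes.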

#### Create DeepSpeed Config file with ZeRO preferences
The `ds_config.json` file sets the parameters DeepSpeed runs with.

In this case, we use ZeRO stage 2, which partitions the optimizer states and gradients across devices, together with BF16 mixed precision.
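The memory benefit of ZeRO stage 2 can be estimated with the accounting from the ZeRO paper for mixed-precision Adam: per parameter, 2 bytes of 16-bit weights, 2 bytes of 16-bit gradients and 12 bytes of FP32 optimizer state (master weights, momentum and variance). Stage 2 divides the gradients and optimizer state across N devices while every device keeps a full copy of the 16-bit weights. The hypothetical `zero_model_state_bytes` helper below is a rough sketch: it ignores activations and temporary buffers, and 3e9 is a nominal parameter count for T5-3B.

```python
def zero_model_state_bytes(num_params: float, num_devices: int, stage: int) -> float:
    """Approximate per-device bytes for weights, gradients and Adam states.

    2 bytes/param for 16-bit weights, 2 for 16-bit gradients and
    12 for FP32 optimizer states. Activations and buffers are ignored.
    """
    params, grads, optim = 2 * num_params, 2 * num_params, 12 * num_params
    if stage == 0:  # no partitioning
        return params + grads + optim
    if stage == 1:  # optimizer states partitioned
        return params + grads + optim / num_devices
    if stage == 2:  # optimizer states and gradients partitioned
        return params + (grads + optim) / num_devices
    if stage == 3:  # weights partitioned as well
        return (params + grads + optim) / num_devices
    raise ValueError("stage must be 0, 1, 2 or 3")


for stage in (0, 2):
    gb = zero_model_state_bytes(3e9, num_devices=8, stage=stage) / 1e9
    print(f"stage {stage}: {gb:.2f} GB per device")
```

Under these assumptions, the per-device model state drops from about 48 GB without partitioning to about 11.25 GB with stage 2 on 8 devices.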

In [6]:
%pwd

'/root/Gaudi2-Workshop/LLM-Training/optimum-habana/examples/summarization'

In [7]:
%%sh
tee ./ds_config.json <<EOF
{
    "steps_per_print": 64,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false
    }
}
EOF



#### Fine Tuning T5-3b with the cnn_dailymail dataset
The T5-3B model is a large language model that was originally pretrained on the C4 dataset. Here it will be fine-tuned on the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset, an English-language dataset of just over 300k unique news articles written by journalists at CNN and the Daily Mail.
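Before tokenizing, `run_summarization.py` pairs each article with its reference summary and, for T5, prepends the task prefix passed through `--source_prefix`. In `cnn_dailymail` 3.0.0 these columns are `article` and `highlights`. The sketch below shows only the prefixing step on a made-up record, using the batched dictionary format that `datasets.Dataset.map(batched=True)` hands to a function. The `add_source_prefix` helper and its `inputs`/`targets` keys are hypothetical, and tokenization is omitted.

```python
from typing import Dict, List


def add_source_prefix(batch: Dict[str, List[str]],
                      prefix: str = "summarize: ") -> Dict[str, List[str]]:
    """Hypothetical helper: prepend the T5 task prefix to each article."""
    inputs = [prefix + article for article in batch["article"]]
    targets = list(batch["highlights"])  # reference summaries are left as-is
    return {"inputs": inputs, "targets": targets}


# A made-up record shaped like a cnn_dailymail row
batch = {
    "article": ["The city council approved a new budget on Monday."],
    "highlights": ["Council approves budget."],
}
out = add_source_prefix(batch)
print(out["inputs"][0])
```

The same `summarize: ` prefix appears at the start of the text passed to the inference pipeline later in this notebook.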

To use this example on first-generation Intel Gaudi, change the model to `t5-large`.

Training is launched by `gaudi_spawn.py`, a simple launcher script that collects the arguments and passes them to `distributed_runner.py` for training on multiple HPUs, which in turn runs the `run_summarization.py` script.
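When the training cell runs, `DistributedRunner` logs the `deepspeed` command it builds from the `gaudi_spawn.py` arguments. The sketch below reproduces the shape of that logged command for a single node: `--world_size 8` becomes `--num_gpus 8`, and the script arguments are appended unchanged. `build_deepspeed_command` is a hypothetical helper written from the log output, not the code in `distributed_runner.py`.

```python
from typing import List


def build_deepspeed_command(script: str, script_args: List[str],
                            world_size: int, master_port: int = 29500) -> str:
    """Hypothetical reconstruction of the single-node command shown in the log."""
    launcher = [
        "deepspeed",
        "--num_nodes", "1",
        "--num_gpus", str(world_size),
        "--no_local_rank",
        "--master_port", str(master_port),
    ]
    return " ".join(launcher + [script] + script_args)


cmd = build_deepspeed_command(
    "run_summarization.py",
    ["--model_name_or_path", "t5-small", "--do_train", "--deepspeed", "./ds_config.json"],
    world_size=8,
)
print(cmd)
```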

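Notice the Habana-specific arguments used here:

* `--use_habana`: runs training on Intel Gaudi devices
* `--use_lazy_mode`: uses lazy execution mode
* `--use_hpu_graphs_for_training`: captures the training graph as an HPU Graph and replays it, reducing host overhead
* `--gaudi_config_name Habana/t5`: loads the Gaudi configuration for T5 from the Hugging Face Hub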

**Although the multi-billion-parameter T5-3B model can be fine-tuned, this still takes many hours to complete.  
Users who want to run the fine-tuning example should set `model_name_or_path` to `t5-small`, as in the command below, which takes about 30 minutes to complete.**


In [9]:
!mkdir ft-summarization

In [10]:
!python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_summarization.py \
--model_name_or_path t5-small \
--do_train \
--dataset_name cnn_dailymail \
--dataset_config '"3.0.0"' \
--source_prefix '"summarize: "' \
--output_dir ./ft-summarization \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--overwrite_output_dir \
--predict_with_generate \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_training \
--gaudi_config_name Habana/t5 \
--ignore_pad_token_for_loss False \
--pad_to_max_length \
--save_strategy epoch \
--report_to none \
--throughput_warmup_steps 3 \
--deepspeed ./ds_config.json

DistributedRunner run(): command = deepspeed --num_nodes 1 --num_gpus 8 --no_local_rank --master_port 29500 run_summarization.py --model_name_or_path t5-small --do_train --dataset_name cnn_dailymail --dataset_config "3.0.0" --source_prefix "summarize: " --output_dir ./ft-summarization --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --overwrite_output_dir --predict_with_generate --use_habana --use_lazy_mode --use_hpu_graphs_for_training --gaudi_config_name Habana/t5 --ignore_pad_token_for_loss False --pad_to_max_length --save_strategy epoch --report_to none --throughput_warmup_steps 3 --deepspeed ./ds_config.json
[2024-07-17 06:22:58,627] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to hpu (auto detect)
[2024-07-17 06:22:59,900] [INFO] [runner.py:583:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --no_local_rank --enable_ea

### After fine tuning, let's look at the results
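Fine-tuning saved the new model weights (`model.safetensors`) together with the config and tokenizer files in `ft-summarization`, and the `checkpoint-*` folders contain the checkpoints saved at the end of each epoch. The top-level folder is what we use for inference in the next section.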
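The checkpoint folder names are optimizer step counts. The sketch below assumes the `cnn_dailymail` 3.0.0 training split has 287,113 articles, that training uses the default of 3 epochs with no dropped last batch, and that a distributed sampler gives each of the 8 HPUs an equal, padded share. Under those assumptions, the per-epoch step count computed by the hypothetical `steps_per_epoch` helper matches the `checkpoint-8973`, `checkpoint-17946` and `checkpoint-26919` folders in the `ls -al` output of this section.

```python
import math


def steps_per_epoch(num_examples: int, per_device_batch: int, world_size: int) -> int:
    """Optimizer steps per epoch with gradient accumulation of 1 and no drop_last.

    Each rank sees ceil(N / world_size) samples (shards padded to equal
    size) and splits them into ceil(samples / batch) batches.
    """
    samples_per_rank = math.ceil(num_examples / world_size)
    return math.ceil(samples_per_rank / per_device_batch)


steps = steps_per_epoch(287_113, per_device_batch=4, world_size=8)
checkpoints = [steps * epoch for epoch in range(1, 4)]  # one checkpoint per epoch
print(steps, checkpoints)
```

This prints `8973 [8973, 17946, 26919]`.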


In [11]:
%cd ./ft-summarization

/root/Gaudi2-Workshop/LLM-Training/optimum-habana/examples/summarization/ft-summarization




In [12]:
%ls -al

total 217824
drwxr-xr-x 5 root root      4096 Jul 17 06:49 ./
drwxr-xr-x 4 root root      4096 Jul 17 06:23 ../
-rw-r--r-- 1 root root      1182 Jul 17 06:49 README.md
-rw-r--r-- 1 root root       355 Jul 17 06:49 all_results.json
drwxr-xr-x 3 root root      4096 Jul 17 06:42 checkpoint-17946/
drwxr-xr-x 3 root root      4096 Jul 17 06:49 checkpoint-26919/
drwxr-xr-x 3 root root      4096 Jul 17 06:34 checkpoint-8973/
-rw-r--r-- 1 root root      1503 Jul 17 06:49 config.json
-rw-r--r-- 1 root root       247 Jul 17 06:49 gaudi_config.json
-rw-r--r-- 1 root root       588 Jul 17 06:49 generation_config.json
-rw-r--r-- 1 root root 219726224 Jul 17 06:49 model.safetensors
-rw-r--r-- 1 root root      2543 Jul 17 06:49 special_tokens_map.json
-rw-r--r-- 1 root root    791656 Jul 17 06:49 spiece.model
-rw-r--r-- 1 root root   2422434 Jul 17 06:49 tokenizer.json
-rw-r--r-- 1 root root     20746 Jul 17 06:49 tokenizer_config.json

## Inference Summarization using the Pipeline
Now we can run summarization using the Hugging Face `pipeline` call with the fine-tuned model. Here we point the pipeline at the folder containing the model we fine-tuned. If you used `t5-small` for fine-tuning, be sure to set `model_to_finetune` to `"t5-small"` as well.
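The inference cell switches between the local fine-tuned folder and the base model by hand. A small sketch of automating that choice is shown below: the hypothetical `choose_model_source` helper returns the local folder when it contains a `config.json` (one of the files written alongside the fine-tuned weights, as the listing above shows) and otherwise falls back to a Hub model id. The helper and its fallback behavior are assumptions for illustration.

```python
import os


def choose_model_source(local_dir: str, fallback: str = "t5-small") -> str:
    """Hypothetical helper: use the fine-tuned folder if it holds a saved config,
    otherwise fall back to a Hugging Face Hub model id."""
    if os.path.isfile(os.path.join(local_dir, "config.json")):
        return local_dir
    return fallback


model_source = choose_model_source("./ft-summarization")
print(model_source)
```

Passing `model=model_source` to `pipeline(...)` would then work whether or not the fine-tuning cell was run.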

In [16]:
import torch
import habana_frameworks.torch

from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer

# Load the base (not yet fine-tuned) model and its tokenizer
model_to_finetune = "t5-small"
model = AutoModelForSeq2SeqLM.from_pretrained(model_to_finetune)
tokenizer = AutoTokenizer.from_pretrained(model_to_finetune)

# Point to the ft-summarization folder with the fine-tuned model
path_to_local_model = "/root/Gaudi2-Workshop/LLM-Training/optimum-habana/examples/summarization/ft-summarization"

# Instantiate the pipeline from the local folder. If you did not run the fine-tuning step above, use model=model_to_finetune instead
summarization_pipeline = pipeline(task="summarization", model=path_to_local_model, device="hpu", torch_dtype=torch.bfloat16, min_length=50, max_length=100)


#text_to_summarize = "summarize: Photosynthesis involves a series of complex reactions that take place within specialized organelles called chloroplasts in plant cells. It can be broadly divided into two stages: the light-dependent reactions and the light-independent reactions, also known as the Calvin cycle.  Light-Dependent Reactions: During the light-dependent reactions, chlorophyll pigments within the thylakoid membranes of the chloroplasts absorb light energy. This energy is utilized to split water molecules into oxygen, protons (H+), and electrons. Oxygen is released as a byproduct, while protons and electrons are transported through an electron transport chain, generating ATP (adenosine triphosphate) and NADPH (nicotinamide adenine dinucleotide phosphate).  Light-Independent Reactions (Calvin Cycle):  The ATP and NADPH produced in the light-dependent reactions are utilized in the Calvin cycle, which takes place in the stroma of the chloroplasts. In this cycle, carbon dioxide from the atmosphere combines with the stored energy in the form of ATP and NADPH to produce glucose. This glucose serves as a building block for other carbohydrates and organic compounds. Photosynthesis is a complex process that enables plants, algae, and some bacteria to convert light energy into chemical energy, facilitating the sustenance of life on Earth. It involves the interplay of light-dependent reactions, which generate ATP and NADPH, and the light-independent reactions or the Calvin cycle, which utilize the produced energy to fix carbon dioxide and produce glucose. Enhancing our understanding of photosynthesis and its underlying mechanisms holds the key to various applications, including improving crop yields, developing sustainable bioenergy sources, and addressing environmental challenges."
text_to_summarize = (
    "summarize: Introduction: The Strategic Arms Limitation Talks II (SALT II) treaty, signed on June 18, 1979, between the United States and the Soviet Union, marked a significant milestone in nuclear arms control efforts during the Cold War era. Building upon its predecessor, SALT I, the treaty aimed to curb the arms race and reduce the risk of nuclear conflict between the superpowers. Key Provisions: SALT II encompassed several crucial provisions. It placed limits on strategic offensive arms, including intercontinental ballistic missiles (ICBMs), submarine-launched ballistic missiles (SLBMs), and heavy bombers. The agreement specified the maximum number of deployed warheads and launchers each party could possess. Verification and Compliance: To ensure compliance, the treaty established comprehensive verification measures. This involved regular exchanges of data, on-site inspections, and monitoring activities by both nations. These measures sought to enhance transparency, foster trust, and prevent either side from gaining a significant advantage in terms of strategic nuclear capabilities. Ratification and Challenges: Although both the United States and the Soviet Union signed the treaty, its ratification faced considerable challenges. The political landscape changed when the Soviet Union invaded Afghanistan in 1979, leading to a deterioration of U.S.-Soviet relations. As a result, the United States never ratified the treaty formally, rendering it non-binding. However, both nations pledged to adhere to its principles, effectively implementing its provisions on a voluntary basis. Legacy and Impact: Despite the treaty's non-ratification, SALT II's legacy and impact were significant. It set the stage for subsequent arms control negotiations, providing a framework for future agreements such as the Intermediate-Range Nuclear Forces (INF) Treaty and the Strategic Arms Reduction Treaty (START). "
    "SALT II demonstrated the potential for cooperation between the superpowers and laid the groundwork for continued dialogue aimed at reducing the nuclear threat globally."
)
print("------------------------------------------------------------")
print("Input:", text_to_summarize)
print()

# Now we call the pipeline
result = summarization_pipeline(text_to_summarize)
print("------------------------------------------------------------")
print("Result:", result)



------------------------------------------------------------
Input: summarize: Introduction: The Strategic Arms Limitation Talks II (SALT II) treaty, signed on June 18, 1979, between the United States and the Soviet Union, marked a significant milestone in nuclear arms control efforts during the Cold War era. Building upon its predecessor, SALT I, the treaty aimed to curb the arms race and reduce the risk of nuclear conflict between the superpowers. Key Provisions: SALT II encompassed several crucial provisions. It placed limits on strategic offensive arms, including intercontinental ballistic missiles (ICBMs), submarine-launched ballistic missiles (SLBMs), and heavy bombers. The agreement specified the maximum number of deployed warheads and launchers each party could possess. Verification and Compliance: To ensure compliance, the treaty established comprehensive verification measures. This involved regular exchanges of data, on-site inspections, and monitoring activities by both nati

In [2]:
# To run additional inference examples, the Jupyter kernel must be restarted. This `exit()` command restarts the kernel so another inference run can be made.
exit()