Copyright (c) 2023 Habana Labs, Ltd. an Intel Company.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Running GPT model on DeepSpeed Lightning

The objective is to take the original DeepSpeed example from [Lightning-GPT](https://github.com/Lightning-Universe/lightning-GPT) and reproduce the same functionality on Gaudi2. This tutorial shows how to run the model and describes the changes made to the original code so that it runs on the Gaudi2 HPU.

## Setup

Please follow the instructions provided in the [Gaudi Installation Guide](https://docs.habana.ai/en/latest/Installation_Guide/GAUDI_Installation_Guide.html) to set up the environment, including the `$PYTHON` environment variable. The guide walks you through setting up your system to run the model on Gaudi. **You will need to run the latest Habana PyTorch Docker image, which includes PyTorch Lightning support for Gaudi.**

### Clone the Gaudi-tutorials Repository

In the Docker container, clone the Gaudi-tutorials repository and switch to the branch that matches your SynapseAI version. You can run the [`hl-smi`](https://docs.habana.ai/en/latest/Management_and_Monitoring/System_Management_Tools_Guide/System_Management_Tools.html#hl-smi-utility-options) utility to determine the SynapseAI version.


In [1]:
%cd /root/Gaudi-tutorials/Lightning/DeepSpeed_Lightning

/root/Gaudi-tutorials/Lightning/DeepSpeed_Lightning


### Install Dependencies

Please use the following commands to install dependent packages.

In [None]:
!pip install -r requirements.txt
!pip install -e .

### Install Habana DeepSpeed

Please follow the instructions provided in the [Gaudi DeepSpeed User Guide](https://docs.habana.ai/en/latest/PyTorch/DeepSpeed/DeepSpeed_User_Guide.html) to install DeepSpeed on Gaudi.

In [None]:
!pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.21.0

## Run the model

The command below runs the simple minGPT DeepSpeed model on eight Gaudi2 devices. By default, the model uses DeepSpeed ZeRO stage 2.

### Run command

In [None]:
%cd /root/Gaudi-tutorials/Lightning/DeepSpeed_Lightning
!python train.py --implementation mingpt --strategy deepspeed --model_type gpt2-xl

### Sample output
```
Training: 0it [00:00, ?it/s]
Training:   0%|          | 0/70 [00:00<?, ?it/s]
Epoch 0:   0%|          | 0/70 [00:00<?, ?it/s] 
Epoch 0:   1%|▏         | 1/70 [00:05<06:07,  5.32s/it]
Epoch 0:   1%|▏         | 1/70 [00:05<06:07,  5.32s/it, v_num=0]
Epoch 0:   3%|▎         | 2/70 [00:05<03:04,  2.71s/it, v_num=0]
Epoch 0:   3%|▎         | 2/70 [00:05<03:04,  2.71s/it, v_num=0]
```




## Changes made to minGPT with DeepSpeed on Gaudi2
Five areas of this PyTorch Lightning model were changed to support DeepSpeed on Gaudi2. For more information on basic model adoption, refer to the [Getting Started](https://lightning.ai/docs/pytorch/latest/integrations/hpu/basic.html#run-on-gaudi) section in the PyTorch Lightning documentation.

**1.** Change `lightning-GPT/train.py` as follows: add the Habana customized optimizers `DeepSpeedCPUAdam` and `FusedAdam` from `deepspeed.ops.adam`, the scheduler `WarmupLR` from `deepspeed.runtime.lr_schedules`, and other necessary modules from either `lightning` or `pytorch_lightning`.

Note that `lightning` has a higher priority than `pytorch_lightning`. We check whether `lightning` is available. If it is available, the code will import `lightning`; otherwise, it will import `pytorch_lightning`.

```
    --- a/train.py
    +++ b/train.py
    @@ -1,13 +1,75 @@
    from argparse import ArgumentParser
    from urllib.request import urlopen

    -import lightning as L
    +from deepspeed.ops.adam import DeepSpeedCPUAdam, FusedAdam
    +from deepspeed.runtime.lr_schedules import WarmupLR
    +
    +from lightning_utilities import module_available
    +if module_available("lightning"):
    +    import lightning.pytorch as L
    +    from lightning.pytorch.callbacks import Callback, LearningRateMonitor, ModelCheckpoint
    +    from lightning.pytorch.loggers import WandbLogger
    +    from lightning.pytorch.plugins import DeepSpeedPrecisionPlugin
    +    from lightning.pytorch.profilers.pytorch import PyTorchProfiler
    +    from lightning.pytorch.strategies import StrategyRegistry
    +    from lightning.pytorch.utilities.types import STEP_OUTPUT
    +elif module_available("pytorch_lightning"):
    +    import pytorch_lightning as L
    +    from pytorch_lightning.callbacks import Callback, LearningRateMonitor, ModelCheckpoint
    +    from pytorch_lightning.loggers import WandbLogger
    +    from pytorch_lightning.plugins import DeepSpeedPrecisionPlugin
    +    from pytorch_lightning.profilers.pytorch import PyTorchProfiler
    +    from pytorch_lightning.strategies import StrategyRegistry
    +    from pytorch_lightning.utilities.types import STEP_OUTPUT
```
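The priority rule described above can be illustrated with a small standard-library sketch. The helper name `first_available` is hypothetical and is not part of lightning-GPT. It uses `importlib.util.find_spec` in place of `lightning_utilities.module_available` so that it runs without extra packages. Unlike the diff above, it raises an explicit `ImportError` when none of the candidates is installed.

```python
import importlib
import importlib.util


def first_available(candidates):
    """Return the first top-level module name in `candidates` that is installed.

    Hypothetical helper for illustration only.
    """
    for name in candidates:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    raise ImportError(f"none of {list(candidates)} is installed")


# Demonstration with standard-library names so the sketch runs anywhere.
chosen = first_available(["package_that_does_not_exist_xyz", "json"])
module = importlib.import_module(chosen)
print(chosen)
```

In `train.py`, the equivalent candidate order would be `lightning` first and `pytorch_lightning` second.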

**2.** Similarly, change other files, such as `lightning_gpt/bench.py`, `lightning_gpt/callbacks.py`, `lightning_gpt/models.py`, to import necessary modules from either `lightning` or `pytorch_lightning`.
    
This is an example showing how to change `lightning_gpt/bench.py`.

```
    --- a/lightning_gpt/bench.py
    +++ b/lightning_gpt/bench.py
    @@ -3,7 +3,12 @@ import time
    from typing import Any, Callable, Dict, Iterable, List, Optional, Type, Union
    
    import torch
    -from lightning import CloudCompute, LightningFlow, LightningWork
    +from lightning_utilities import module_available
    +if module_available("lightning"):
    +    import lightning.pytorch as L
    +elif module_available("pytorch_lightning"):
    +    import pytorch_lightning as L
    +from lightning.app import CloudCompute, LightningFlow, LightningWork
    from lightning.app.components import LightningTrainerMultiNode
```

**3.** We also need to import `habana_frameworks.torch.core` and `habana_frameworks.torch.hpu` to load the necessary Habana libraries. In `lightning-GPT/train.py`, add the following:
 
```
    --- a/train.py
    +++ b/train.py
    @@ -1,13 +1,75 @@

    +try:
    +    import habana_frameworks.torch.core as htcore
    +    import habana_frameworks.torch.hpu as hthpu
    +except:
    +    print('INFO: no habana framework package installed')
```
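As a variation on the guard above, the following sketch catches only `ImportError`, so that unrelated failures during import are not silently hidden. The helper `optional_import` and the flag `HPU_LIBS_LOADED` are hypothetical names used for illustration; they do not appear in the original code.

```python
import importlib


def optional_import(name):
    """Import `name` if possible; return None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Only a failed import is handled here; other exceptions still propagate.
        print(f"INFO: {name} is not installed")
        return None


# On a machine without the Habana software stack this evaluates to None.
htcore = optional_import("habana_frameworks.torch.core")
HPU_LIBS_LOADED = htcore is not None
```

Later code can check `HPU_LIBS_LOADED` before taking an HPU-specific path.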

**4.** To run the model on HPU, we need to use the Habana customized classes `HPUAccelerator`, `HPUDeepSpeedStrategy`, `HPUParallelStrategy`, and `DeepSpeedPrecisionPlugin`.

4.1. Note that we have already imported `DeepSpeedPrecisionPlugin` in `lightning-GPT/train.py` in Step 1. We just need to import `HPUAccelerator`, `HPUDeepSpeedStrategy`, and `HPUParallelStrategy` as follows in `lightning-GPT/train.py`:

```
    --- a/train.py
    +++ b/train.py
    @@ -1,13 +1,75 @@

    +from lightning_habana.pytorch.accelerator import HPUAccelerator
    +from lightning_habana.pytorch.strategies import HPUDeepSpeedStrategy, HPUParallelStrategy
```

4.2. Change `lightning.Trainer` in `lightning-GPT/train.py` as follows:

```
    --- a/train.py
    +++ b/train.py

    @@ -80,21 +142,61 @@ def main(args):
            torch.set_float32_matmul_precision("high")
            callback_list.append(callbacks.CUDAMetricsCallback())
    
    -    trainer = L.Trainer.from_argparse_args(
    -        args,
    -        max_epochs=10,
    -        gradient_clip_val=1.0,
    -        callbacks=callback_list,
    -        accelerator="auto",
    +    trainer = L.Trainer(#.from_argparse_args(
    +        accelerator=HPUAccelerator(),#"auto",
            devices="auto",
    -        precision=16,
    +        strategy = HPUDeepSpeedStrategy(
    +            stage=2, #cfg.deepspeed_stage,
    +            logging_batch_size_per_gpu=1, #cfg.batch_size,
    +            cpu_checkpointing=True,
    +            allgather_bucket_size=5e8,
    +            reduce_bucket_size=5e8,
    +            pin_memory=True,
    +            contiguous_memory_optimization=False,
    +            process_group_backend="hccl"
    +            # add the option to load a config from json file with more deepspeed options
    +            # note that if supplied all defaults are ignored - model settings defaults this arg to None
    +            # config=cfg.deepspeed_cfg_file
    +        ) if args.strategy == "deepspeed" else  HPUParallelStrategy(
    +            bucket_cap_mb=125,
    +            gradient_as_bucket_view=True,
    +            static_graph=True
    +        ),
    +        callbacks=callback_list,
    +        accumulate_grad_batches=1,
    +        precision="bf16-mixed" if args.strategy == "deepspeed" else "16-mixed",#16,
    +        max_epochs=100,
    +        num_nodes=1,
    +        check_val_every_n_epoch=5000,
    +        val_check_interval=50,
    +        log_every_n_steps=10,
    +        limit_val_batches=10,
    +        max_steps=100,
    +        gradient_clip_val=1.0,
    +        plugins=[DeepSpeedPrecisionPlugin(precision="bf16-mixed")] if args.strategy == "deepspeed" else None,
        )
```
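The diff above selects the strategy, the precision, and the precision plugin with three separate conditional expressions on `args.strategy`. The sketch below gathers that decision into one place as plain data, so it runs without Lightning or Habana packages. The function name `select_trainer_options`, the dictionary keys, and the example value `"ddp"` are assumptions made for illustration; only the class names and argument values come from the diff.

```python
def select_trainer_options(strategy_name):
    """Mirror the conditional expressions in the Trainer call as plain data."""
    if strategy_name == "deepspeed":
        return {
            "strategy_class": "HPUDeepSpeedStrategy",
            "strategy_kwargs": {"stage": 2, "process_group_backend": "hccl"},
            "precision": "bf16-mixed",
            "use_deepspeed_precision_plugin": True,
        }
    # Any other value falls back to the parallel strategy, as in the diff.
    return {
        "strategy_class": "HPUParallelStrategy",
        "strategy_kwargs": {
            "bucket_cap_mb": 125,
            "gradient_as_bucket_view": True,
            "static_graph": True,
        },
        "precision": "16-mixed",
        "use_deepspeed_precision_plugin": False,
    }


for name in ("deepspeed", "ddp"):
    options = select_trainer_options(name)
    print(name, options["strategy_class"], options["precision"])
```

Keeping the branch in a single function makes it harder for the strategy, precision, and plugin settings to drift apart when one of them is edited.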

**5.** Use the Habana customized optimizer `FusedAdamW` in `lightning_gpt/models.py` as follows:

```
    --- a/lightning_gpt/models.py
    +++ b/lightning_gpt/models.py
    import mingpt.model
    @@ -323,9 +329,12 @@ def _get_deepspeed_optimizer(
            return DeepSpeedCPUAdam(optim_groups, lr=learning_rate, betas=betas)
    
        elif fused_adam and _DEEPSPEED_AVAILABLE:
    -        from deepspeed.ops.adam import FusedAdam
    -        return FusedAdam(optim_groups, lr=learning_rate, betas=betas)
    +        from habana_frameworks.torch.hpex.optimizers import FusedAdamW
    +        return FusedAdamW(optim_groups, lr=learning_rate)
```

## Changelog
### 1.11.0
 - Import Lightning/Lightning-habana, Habana customized DeepSpeed and other necessary packages.
 - Add new arguments following DeepSpeed's requirements.
 - Import habana_frameworks.torch.core to load necessary Habana libraries.
 - Import the Habana customized classes `HPUAccelerator`, `HPUDeepSpeedStrategy`, `HPUParallelStrategy`, and `DeepSpeedPrecisionPlugin`, and add them to `lightning.Trainer`.
 - Add the Habana customized optimizer `FusedAdamW` in `lightning_gpt/models.py`.
 - Enable generation by changing the functions `from_tokens` and `to_tokens` in `lightning_gpt/data.py`.