Copyright (c) 2023 Habana Labs, Ltd. an Intel Company.  
Copyright (c) 2017, Pytorch contributors All rights reserved.
SPDX-License-Identifier: BSD-3-Clause


# Model Migration from GPU to the Intel® Gaudi® 2 AI Accelerator

The GPU Migration toolkit simplifies migrating PyTorch models that run on GPU-based architecture to run on the Intel® Gaudi® AI accelerator. Rather than manually replacing Python API calls that have dependencies on GPU libraries with Gaudi-specific API calls, the toolkit automates this process so you can run your model with fewer modifications. 

The GPU Migration toolkit maps specific API calls from the Python libraries and modules listed below to the appropriate equivalents in the Intel Gaudi software:

* torch.cuda  
* Torch APIs with GPU-related parameters, for example `torch.randn(device="cuda")`  

The toolkit does not optimize the performance of the model, so further modifications may be required. For more details, refer to the Model Performance Optimization Guide.

In this notebook, we demonstrate how to use the GPU Migration toolkit on a ResNet50 model that is based on an open source implementation of ResNet50.  

Refer to the [GPU Migration Toolkit](https://docs.habana.ai/en/latest/PyTorch/PyTorch_Model_Porting/GPU_Migration_Toolkit/GPU_Migration_Toolkit.html) for more information.  

In addition to this ResNet50 migration, there is a GPU Migration example on the Intel Gaudi GitHub page [here](https://github.com/HabanaAI/Model-References/tree/master/PyTorch/examples/gpu_migration).

### Summary: Enabling the GPU Migration Toolkit

#### Set the library
Import the habana_frameworks.torch.gpu_migration package at the beginning of the primary Python script (main.py, train.py, etc.):

`import habana_frameworks.torch.gpu_migration`

Alternatively, set the PT_HPU_GPU_MIGRATION=1 environment variable when running the primary Python script: `PT_HPU_GPU_MIGRATION=1 $PYTHON main.py`

#### Set the Mark Step
Add mark_step(). In Lazy mode, mark_step() must be added in all training scripts right after loss.backward() and optimizer.step().

`htcore.mark_step()`, where `htcore` is the `habana_frameworks.torch.core` module.

Note that if your model uses torch.compile, this step is not needed.
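
The placement described above can be sketched as follows. This is a minimal illustration, not the code from train.py: `train_step` and its parameters are hypothetical names, and the no-op fallback for `mark_step` is an assumption added only so that the sketch also runs on a machine without the Intel Gaudi software installed.

```python
# Sketch only: `train_step` and its arguments are illustrative names,
# not part of train.py.
try:
    # On a Gaudi system, mark_step is provided by the Intel Gaudi PyTorch bridge.
    import habana_frameworks.torch.core as htcore
    mark_step = htcore.mark_step
except ImportError:
    # Assumption for portability: a no-op stand-in when the bridge is absent.
    def mark_step():
        pass


def train_step(model, criterion, optimizer, inputs, targets, mark_step=mark_step):
    """Run one Lazy-mode training step, calling mark_step() after backward and step."""
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    loss.backward()
    mark_step()  # first call: right after the backward pass

    optimizer.step()
    mark_step()  # second call: right after the optimizer update

    return loss
```

Passing `mark_step` as a parameter is a design choice of this sketch; it keeps the step function usable with torch.compile, where the call can be replaced by a no-op.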

#### Running Migrated code and logging changes 
Make sure that any device selection argument passed to the script is configured as if the script is running on a GPU. For example, add --cuda or --device gpu in the runtime command of your model. This will guarantee that the GPU Migration toolkit accurately detects and migrates instructions.

You can enable the logging feature included in the GPU Migration toolkit by setting the `GPU_MIGRATION_LOG_LEVEL` environment variable, for example:   
`GPU_MIGRATION_LOG_LEVEL=3 PT_HPU_GPU_MIGRATION=1 $PYTHON main.py`
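
The same two variables can also be set from Python when the script is launched as a subprocess. The sketch below only builds the environment and the command. `migration_env` is a hypothetical helper name, and `main.py` stands in for your primary script.

```python
import os
import sys


def migration_env(log_level=None, base_env=None):
    """Return a copy of the environment with the GPU Migration toolkit enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["PT_HPU_GPU_MIGRATION"] = "1"
    if log_level is not None:
        # Environment values must be strings.
        env["GPU_MIGRATION_LOG_LEVEL"] = str(log_level)
    return env


env = migration_env(log_level=3)
command = [sys.executable, "main.py"]  # placeholder script name
# To launch: subprocess.run(command, env=env, check=True)
```

Copying the environment instead of mutating `os.environ` keeps the settings local to the launched process.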

#### For More Information
For more information regarding the use and configuration of the GPU Migration Toolkit and its limitations, please refer to the [documentation](https://docs.habana.ai/en/latest/PyTorch/PyTorch_Model_Porting/GPU_Migration_Toolkit/GPU_Migration_Toolkit.html).

In [None]:
# Start with the `exit()` command to restart the Python kernel. This ensures that no other
# process is holding the Intel Gaudi accelerator when you start to run this notebook.
# You will see a warning that the kernel has died; this is expected.
exit()

In [None]:
import os

#Enable PT_HPU_LAZY_MODE=1
os.environ['PT_HPU_LAZY_MODE'] = '1'

#### Running the ResNet50 Example
The remainder of the notebook shows how the tool works with the [ResNet50 example](https://github.com/HabanaAI/Model-References/tree/master/PyTorch/examples/gpu_migration) from the GPU Migration examples in Model-References.

In [None]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials
!git clone -b 1.21.0 https://github.com/habanaai/Model-References

#### Navigate to the model example to begin the run

In [None]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials/Model-References/PyTorch/examples/gpu_migration/computer_vision/classification/torchvision

#### Download dataset - OPTIONAL
To run this example end to end, you can download the Tiny ImageNet dataset. It needs to be organized according to PyTorch requirements, as specified in the scripts of [imagenet-multiGPU.torch](https://github.com/soumith/imagenet-multiGPU.torch). You do NOT need to have the dataset loaded to see the migration steps and logging.

Please be patient, it takes a few minutes to unzip the dataset.

In [None]:
!wget --progress=bar:force http://cs231n.stanford.edu/tiny-imagenet-200.zip
!chmod 600 ./tiny-imagenet-200.zip
import os
os.makedirs("./datasets/", exist_ok=True)
!unzip -q ./tiny-imagenet-200.zip  -x "tiny-imagenet-200/test*" -d ./datasets/
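
Regarding the organization requirement mentioned above: class-per-folder loaders expect one subdirectory per class. The sketch below moves each validation image into a folder named after its class. It rests on assumptions you should verify against the extracted files: that all validation images sit in a single `val/images` folder, and that `val/val_annotations.txt` is tab-separated with the file name and the class ID in its first two columns. `reorganize_val` is a hypothetical helper, not part of the example scripts.

```python
import shutil
from pathlib import Path


def reorganize_val(val_dir):
    """Move val/images/<file> to val/<class_id>/<file> using val_annotations.txt."""
    val_dir = Path(val_dir)
    images_dir = val_dir / "images"
    annotations = val_dir / "val_annotations.txt"
    if not annotations.is_file() or not images_dir.is_dir():
        return 0  # nothing to do: already reorganized, or dataset not downloaded

    moved = 0
    for line in annotations.read_text().splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        file_name, class_id = fields[0], fields[1]
        source = images_dir / file_name
        if source.is_file():
            target_dir = val_dir / class_id
            target_dir.mkdir(exist_ok=True)
            shutil.move(str(source), str(target_dir / file_name))
            moved += 1

    # Remove the images folder once empty so it is not mistaken for a class.
    if not any(images_dir.iterdir()):
        images_dir.rmdir()
    return moved


# Prints the number of files moved (0 if the dataset is not present).
print(reorganize_val("./datasets/tiny-imagenet-200/val"))
```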

#### Import GPU Migration Toolkit package and Habana Torch Library
Look at the top of train.py: the script imports the `gpu_migration` package, which is already included in the Intel Gaudi software:

In [None]:
%%sh
cat -n train.py | head -n 21 | tail -n 18

#### Placing mark_step()
Place the mark_step() call right after loss.backward() and again right after optimizer.step():

In [None]:
%%sh
cat -n train.py | head -n 51 | tail -n 14

#### Run the following command to start training on a single HPU.
We're now ready to run the training. The run command is prefixed with the logging setting `GPU_MIGRATION_LOG_LEVEL=1` to show the output; no other changes to the run command are needed. Once the training run starts, the log output shows exactly where calls are changed from GPU to Intel Gaudi, including the file name and location.

Look for the [context] and [hpu_match] in the log file to see where the code is changed.
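
To pick out those entries programmatically, a small filter such as the following can be used. It is a sketch under stated assumptions: `find_migration_records` and `print_migration_records` are hypothetical helpers, they match only the two literal markers named above, and the location of the log file depends on your setup, so the path is left for you to supply.

```python
MARKERS = ("[context]", "[hpu_match]")


def find_migration_records(lines, markers=MARKERS):
    """Yield (line_number, text) for every line containing one of the markers."""
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            yield number, line.rstrip("\n")


def print_migration_records(log_path):
    """Print the matching lines of a log file; log_path is supplied by the caller."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, text in find_migration_records(handle):
            print(f"{number}: {text}")
```

Because `find_migration_records` accepts any iterable of lines, it works equally on an open file or on output captured from the training command.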

Remember that if you do not download the dataset, the training will not run to completion. You will still see the GPU Migration changes in the log output, which is the most important part.

```bash
GPU_MIGRATION_LOG_LEVEL=1 torchrun --nproc_per_node 1 train.py --batch-size=256 --model=resnet50 --device=cuda --data-path="./datasets/tiny-imagenet-200/" --workers=8 --epochs=1 --opt=sgd --amp
```

In [None]:
!GPU_MIGRATION_LOG_LEVEL=1 torchrun --nproc_per_node 1 train.py --batch-size=256 --model=resnet50 --device=cuda --data-path="./datasets/tiny-imagenet-200/" --workers=8 --epochs=1 --opt=sgd --amp

In [None]:
# Please be sure to run this exit command to ensure that the resources running on Intel Gaudi are released 
exit()