# ResNet50 training - PyTorch [Beta PyTorch 2.1]
This notebook shows how to fine-tune a pretrained ResNet50 PyTorch model on AWS Trainium (trn1 instances) using the Neuron SDK.
The original model implementation is provided by torchvision.

The example has two stages:
1. Compile the model with the `neuron_parallel_compile` utility so that it can run on the AWS Trainium device.
1. Run the fine-tuning script to train the model on an image classification task. The training job uses 32 data-parallel workers to speed up training.

It has been tested on a trn1.32xlarge instance.

**Reference:** 

https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html

## 1) Install dependencies

Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/torch-neuronx.html#setup-torch-neuronx). You can select the kernel from the 'Kernel -> Change Kernel' option at the top of this Jupyter notebook page.
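
One quick way to confirm that the selected kernel sees the expected environment is to list the installed package versions. The sketch below is illustrative only: the helper `installed_versions` is hypothetical, and the distribution names (`torch-neuronx`, `neuronx-cc`) are assumed to match what the installation guide sets up.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed distribution names; adjust them to match your installation.
expected = ["torch", "torch-neuronx", "neuronx-cc", "torchvision"]
for name, version in installed_versions(expected).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any entry prints `NOT INSTALLED`, the notebook is probably attached to a different kernel than the one prepared for Neuron.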

In [None]:
# Install the additional Python packages used by this notebook
%pip install -U "tensorboard" "timm" torchvision==0.16.*
# Add --force-reinstall if the modules fail to load
# Restart the kernel after the installation finishes

## 2) Set the parameters

In [None]:
# Use --model-type=cnn-training for the best performance
env_var_options = "NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=2  " + \
    "NEURON_CC_FLAGS=\'--cache_dir=./compiler_cache --model-type=cnn-training\'"
num_workers = 32
learning_rate = 0.001
dataloader_num_workers = 2
device_prefetch_size = 2
host_to_device_transfer_threads = 4
num_epochs = 10

In [None]:
model_name = "resnet50"
batch_size = 16
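
Because training is data parallel, `batch_size` is the per-worker batch size. A rough sketch of the resulting global batch size and steps per epoch follows, assuming the script trains on the standard 50,000-image CIFAR-10 training split and shards it evenly across workers (the script's actual sharding is not shown here).

```python
# Values copied from the parameter cells above.
num_workers = 32
batch_size = 16

# Assumption: the standard CIFAR-10 training split, sharded evenly across workers.
CIFAR10_TRAIN_IMAGES = 50_000

# Each worker processes its own batch, so one step consumes this many images.
global_batch_size = num_workers * batch_size

# With --drop_last, the incomplete final batch is discarded.
steps_per_epoch = CIFAR10_TRAIN_IMAGES // global_batch_size

print(f"global batch size: {global_batch_size}")  # 512
print(f"steps per epoch:   {steps_per_epoch}")    # 97
```

Changing `num_workers` or `batch_size` changes the global batch size, so the learning rate may need to be retuned accordingly.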

## 3) Download the CIFAR10 dataset

In [None]:
!wget -N https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
!tar xfvz cifar-10-python.tar.gz
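
To sanity-check the download, one of the extracted batch files can be opened directly. This is a minimal sketch, assuming the archive extracts to `cifar-10-batches-py/` and that each batch file is a pickled dictionary with `b"data"` and `b"labels"` keys; `load_cifar_batch` is a hypothetical helper, not part of the training script.

```python
import pickle
from pathlib import Path


def load_cifar_batch(path):
    """Return (data, labels) from one CIFAR-10 'python version' batch file."""
    with open(path, "rb") as f:
        # The files were pickled under Python 2, so keys are loaded as bytes.
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


# Assumed extraction directory of cifar-10-python.tar.gz.
batch_file = Path("cifar-10-batches-py") / "data_batch_1"
if batch_file.exists():
    data, labels = load_cifar_batch(batch_file)
    print(f"{len(labels)} images, {len(data[0])} values per image")
else:
    print(f"{batch_file} not found; check the download step")
```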

## 4) Compile the model with neuron_parallel_compile

In [None]:
%%time
import subprocess

print("Compile model")
# set epochs to 2 to reduce the time for tracing training graphs
COMPILE_CMD = f"""
   {env_var_options} neuron_parallel_compile torchrun --nproc_per_node={num_workers}
   run_image_classification.py
      --model {model_name}
      --platform torchvision
      --pretrained
      --num_epochs 2
      --batch_size {batch_size}
      --lr {learning_rate}
      --drop_last
   """.replace('\n', '')

print(f'Running command: \n{COMPILE_CMD}')
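# check_call raises CalledProcessError on a non-zero exit code, so handle it explicitly
try:
   subprocess.check_call(COMPILE_CMD, shell=True)
   print("Compilation Success!!!")
except subprocess.CalledProcessError as err:
   print(f"There was an error with the compilation command (exit code {err.returncode})")
   raise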

## 5) Compile and Fine-tune the model

In [None]:
%%time
print("Train model")
RUN_CMD = f"""
   {env_var_options} torchrun --nproc_per_node={num_workers}
   run_image_classification.py
      --model {model_name}
      --platform torchvision
      --pretrained
      --num_epochs {num_epochs}
      --batch_size {batch_size}
      --lr {learning_rate}
      --do_eval
      --drop_last
   """.replace('\n', '')

print(f'Running command: \n{RUN_CMD}')
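# check_call raises CalledProcessError on a non-zero exit code, so handle it explicitly
try:
   subprocess.check_call(RUN_CMD, shell=True)
   print("Fine-tune Successful!!!")
except subprocess.CalledProcessError as err:
   print(f"There was an error with the fine-tune command (exit code {err.returncode})")
   raise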
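
The compile and training commands share most of their flags, so one option is to generate both from a single flag dictionary to keep them in sync. The sketch below is only an illustration: `build_command` is a hypothetical helper, the flag values are copied from the cells above, and the environment variables from `env_var_options` are omitted for brevity.

```python
import shlex


def build_command(launcher, script, flags):
    """Join a launcher prefix, a script, and a flag dict into one shell command.

    A value of True emits a bare switch (e.g. --drop_last); any other value
    is emitted as "--name value".
    """
    parts = [launcher, script]
    for name, value in flags.items():
        parts.append(f"--{name}")
        if value is not True:
            parts.append(shlex.quote(str(value)))
    return " ".join(parts)


# Flags shared by the compile and training runs (values from the cells above).
common = {"model": "resnet50", "platform": "torchvision", "pretrained": True,
          "batch_size": 16, "lr": 0.001, "drop_last": True}

launcher = "torchrun --nproc_per_node=32"
script = "run_image_classification.py"

# The compile run traces only 2 epochs; the training run adds evaluation.
compile_cmd = build_command(f"neuron_parallel_compile {launcher}", script,
                            {**common, "num_epochs": 2})
train_cmd = build_command(launcher, script,
                          {**common, "num_epochs": 10, "do_eval": True})

print(compile_cmd)
print(train_cmd)
```

Keeping the shared flags in one place reduces the chance that the compile step traces a graph (for example, with a different batch size) that the training step cannot reuse from the cache.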