# T5-Large - PyTorch
This notebook shows how to fine-tune a "T5-Large" PyTorch model on AWS Trainium (trn1 instances) using the Neuron SDK. The original implementation is provided by Hugging Face.

The example has 2 stages:
1. Compile the model for the AWS Trainium device using the `neuron_parallel_compile` utility.
1. Run the fine-tuning script to train the model on the associated task (here, summarization on the `cnn_dailymail` dataset). The training job uses 32 data-parallel workers to speed up training.

It has been tested on a trn1.32xlarge instance.

**Reference:** https://huggingface.co/t5-large

Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/torch-neuronx.html#setup-torch-neuronx). You can select the kernel from the 'Kernel -> Change Kernel' menu at the top of this Jupyter notebook page.
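
One way to confirm the active kernel is to print the interpreter it runs and check whether the Neuron packages can be found. The sketch below assumes the installation guide provides the modules `torch`, `torch_xla`, and `torch_neuronx`; the list and the helper name `missing_modules` are illustrative, so adjust them if your setup differs.

```python
import importlib.util
import sys

# Module names assumed to come from the Neuron PyTorch setup; adjust as needed.
EXPECTED_MODULES = ["torch", "torch_xla", "torch_neuronx"]

def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Kernel interpreter:", sys.executable)
print("Missing modules:", missing_modules(EXPECTED_MODULES) or "none")
```

If any module is reported missing, the notebook is probably attached to a different kernel than the one created during setup.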

## 1) Install dependencies

In [None]:
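# Quote specifiers containing ">" or "!" so the shell does not treat them as redirection or history expansion.
%pip install -U optimum-neuron==0.0.15 accelerate==0.23.0 "datasets>=1.8.0" "sentencepiece!=0.1.92" rouge-score nltk py7zr evaluate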
# now restart the kernel
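
After restarting the kernel, it can help to confirm that the exact pins from the install command took effect. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `check_pins` is hypothetical, and only the two packages pinned with `==` above are checked.

```python
from importlib import metadata

# Exact pins copied from the install command above.
PINNED = {"optimum-neuron": "0.0.15", "accelerate": "0.23.0"}

def check_pins(pins):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report

for name, (installed, ok) in check_pins(PINNED).items():
    print(f"{name}: installed={installed} pinned={PINNED[name]} ok={ok}")
```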

## 2) Set the parameters

In [None]:
model_name = "t5-large"
num_workers = 32
batch_size = 2
grad_accum = 1
max_source_length = 768
max_target_length = 200
learning_rate = 0.0001
dataset_name = "cnn_dailymail"
dataset_config_name = "3.0.0"
num_train_epochs = 1
model_base_name = model_name
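
These parameters together determine how many samples contribute to each optimizer step. As a rough sketch, assuming plain data parallelism where every worker processes its own per-device batch, the global batch size is the product of the worker count, the per-device batch size, and the gradient accumulation steps. The helper names and the 128-sample cap (taken from the `--max_train_samples` flag used in the commands in this notebook) are illustrative.

```python
import math

# Values copied from the parameter cell above.
num_workers = 32
batch_size = 2
grad_accum = 1

def global_batch_size(workers, per_device_batch, accum_steps):
    """Samples contributing to one optimizer step under data parallelism."""
    return workers * per_device_batch * accum_steps

def steps_per_epoch(num_samples, global_batch):
    """Optimizer steps needed to see num_samples once, rounding up."""
    return math.ceil(num_samples / global_batch)

gbs = global_batch_size(num_workers, batch_size, grad_accum)
print(gbs)                        # 64
print(steps_per_epoch(128, gbs))  # 2
```

With the values above, a 128-sample run therefore covers one epoch in about two optimizer steps, which is why these commands finish quickly.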

## 3) Compile the model with neuron_parallel_compile

In [None]:
import subprocess
COMPILE_CMD = f"""neuron_parallel_compile torchrun --nproc_per_node {num_workers} \
./run_summarization.py \
--model_name_or_path {model_name} \
--num_train_epochs {num_train_epochs} \
--max_steps 10 \
--max_train_samples 128 \
--do_train \
--learning_rate {learning_rate} \
--per_device_train_batch_size {batch_size} \
--gradient_accumulation_steps {grad_accum} \
--report_to none \
--logging_steps 1 \
--save_total_limit 1 \
--bf16 \
--dataset_name {dataset_name} \
--dataset_config_name {dataset_config_name} \
--max_source_length {max_source_length} \
--max_target_length {max_target_length} \
--pad_to_max_length \
--predict_with_generate true \
--source_prefix 'summarize: ' \
--overwrite_output_dir \
--output_dir ./out/t5-large"""
print(f'Running command: \n{COMPILE_CMD}')
# subprocess.call returns the exit code; check_call would raise on failure instead.
if subprocess.call(COMPILE_CMD, shell=True):
    print("There was an error with the compilation command")
else:
    print("Compilation Success!!!")

## 4) Fine-tune the model

In [None]:
RUN_CMD = f"""torchrun --nproc_per_node {num_workers} \
./run_summarization.py \
--model_name_or_path {model_name} \
--num_train_epochs {num_train_epochs} \
--max_train_samples 128 \
--max_eval_samples 128 \
--do_train \
--do_eval \
--learning_rate {learning_rate} \
--per_device_train_batch_size {batch_size} \
--per_device_eval_batch_size {batch_size} \
--gradient_accumulation_steps {grad_accum} \
--report_to none \
--logging_steps 1 \
--save_total_limit 1 \
--bf16 \
--dataset_name {dataset_name} \
--dataset_config_name {dataset_config_name} \
--max_source_length {max_source_length} \
--max_target_length {max_target_length} \
--pad_to_max_length \
--predict_with_generate true \
--source_prefix 'summarize: ' \
--overwrite_output_dir \
--output_dir ./out/run-t5-large"""
print(f'Running command: \n{RUN_CMD}')
# subprocess.call returns the exit code; check_call would raise on failure instead.
if subprocess.call(RUN_CMD, shell=True):
    print("There was an error with the fine-tune command")
else:
    print("Fine-tune Successful!!!")
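
The compile and fine-tune commands share most of their flags, so one option is to build both from a single dictionary and keep them in sync. The sketch below is not part of the original notebook: `build_command` is a hypothetical helper, only a few of the flags are shown, and `shlex.join` (Python 3.8+) handles quoting of values such as the source prefix.

```python
import shlex

def build_command(script, nproc, args, launcher_prefix=()):
    """Join a torchrun invocation into one shell-safe command string."""
    parts = list(launcher_prefix) + ["torchrun", "--nproc_per_node", str(nproc), script]
    for flag, value in args.items():
        parts.append(f"--{flag}")
        if value is not None:  # None marks a boolean switch such as --do_train
            parts.append(str(value))
    return shlex.join(parts)

# A subset of the flags shared by both stages (illustrative, not the full list).
shared = {
    "model_name_or_path": "t5-large",
    "do_train": None,
    "bf16": None,
    "source_prefix": "summarize: ",
}

compile_cmd = build_command(
    "./run_summarization.py", 32, {**shared, "max_steps": 10},
    launcher_prefix=["neuron_parallel_compile"],
)
run_cmd = build_command("./run_summarization.py", 32, {**shared, "do_eval": None})

print(compile_cmd)
print(run_cmd)
```

Keeping the shared flags in one place reduces the chance that the compiled graphs and the fine-tuning run diverge, for example through a mismatched batch size or sequence length.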