# GPT2 - PyTorch
This notebook shows how to run training jobs for the "gpt2" PyTorch model on AWS Trainium (trn1 instances) using the Neuron SDK. The original model implementation is provided by Hugging Face.

The example has 2 stages:
1. Compile the model for the AWS Trainium device using the `neuron_parallel_compile` utility.
1. Run the script to train the model with the causal language modeling (CLM) loss. The training job uses 2 workers with data parallelism to speed up training. On a larger instance (trn1.32xlarge) you can increase the worker count to 8 or 32.

It has been tested on a trn1.2xlarge instance.
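As a rough sanity check before launching, the worker count can be compared against the number of NeuronCores on the instance. The sketch below is illustrative only: the `NEURON_CORES` table and the `check_num_workers` helper are hypothetical names, and the core counts (2 for trn1.2xlarge, 32 for trn1.32xlarge) are taken from the published trn1 specifications rather than from this notebook.

```python
# Hypothetical lookup table: NeuronCores per instance type
# (assumed from the published trn1 instance specifications).
NEURON_CORES = {"trn1.2xlarge": 2, "trn1.32xlarge": 32}


def check_num_workers(instance_type: str, num_workers: int) -> bool:
    """Return True if num_workers fits on the given instance type."""
    cores = NEURON_CORES.get(instance_type)
    if cores is None:
        raise ValueError(f"Unknown instance type: {instance_type}")
    return 1 <= num_workers <= cores


print(check_num_workers("trn1.2xlarge", 2))    # the setup used in this notebook
print(check_num_workers("trn1.2xlarge", 8))    # more workers than cores
print(check_num_workers("trn1.32xlarge", 32))  # the larger instance
```

The check only guards against requesting more workers than cores; it does not verify that a given count is supported by the runtime.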

**Reference:** https://huggingface.co/gpt2

## 1) Install dependencies

In [None]:
# Point pip at the Neuron repository
%pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
# now restart the kernel

In [None]:
# Install the Neuron compiler and Neuron/XLA packages
%pip install -U torch-neuronx=="1.11.0.1.*" "numpy<=1.20.0" "protobuf<4" "transformers==4.16.2" datasets scikit-learn
# use --force-reinstall if you run into issues while loading the modules
# now restart the kernel again

## 2) Set the parameters

In [None]:
# Parameters
model_name = "gpt2"
extra_pip_packages = ""
extra_yum_packages = ""
env_var_options = "XLA_USE_BF16=1 NEURON_CC_FLAGS=\"--cache_dir=./compiler_cache --model-type=transformer\""
num_workers = 2
task_name = "clm"
dataset_name = "wikitext"
dataset_config_name = "wikitext-2-raw-v1"
work_dir = "/home/ec2-user/language_modeling"
transformers_version = "4.16.2"
model_base_name = "gpt2"
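The `env_var_options` string is prepended to the shell command, so the shell treats each `NAME=value` token as an environment variable for the launched process. To see what the shell will actually set, the string can be split with `shlex`. This is a minimal sketch; `parse_env_options` is a hypothetical helper, not part of the notebook or the Neuron SDK.

```python
import shlex

env_var_options = "XLA_USE_BF16=1 NEURON_CC_FLAGS=\"--cache_dir=./compiler_cache --model-type=transformer\""


def parse_env_options(options: str) -> dict:
    """Split a 'NAME=value NAME2="a b"' string into a dict, honoring shell quoting."""
    env = {}
    for token in shlex.split(options):
        # Split on the first '=' only: values may contain '=' themselves.
        name, _, value = token.partition("=")
        env[name] = value
    return env


parsed = parse_env_options(env_var_options)
for name, value in parsed.items():
    print(f"{name} = {value}")
```

The quoted `NEURON_CC_FLAGS` value stays together as one variable even though it contains a space and further `=` signs.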

## 3) Compile the model with neuron_parallel_compile

In [None]:
import subprocess
print("Compile model")
COMPILE_CMD = f"""{env_var_options} neuron_parallel_compile torchrun --nproc_per_node={num_workers} ./run_clm.py \
    --model_name_or_path {model_name} \
    --dataset_name {dataset_name} \
    --dataset_config_name {dataset_config_name} \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --overwrite_output_dir \
    --output_dir {model_base_name}-{task_name}"""

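print(f'Running command: \n{COMPILE_CMD}')
# subprocess.call returns the exit code; check_call would raise on failure,
# so the error branch below would never run.
if subprocess.call(COMPILE_CMD, shell=True):
    print("There was an error with the compilation command")
else:
    print("Compilation Success!!!")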
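The compile command above and the training command in the next section pass the same arguments to `run_clm.py`. They differ only in the `neuron_parallel_compile` prefix and the `--do_eval` flag. One way to keep the two in sync is to build both from a single function. The sketch below assumes the parameter values from section 2, and `build_command` is a hypothetical helper.

```python
def build_command(stage, env_var_options, num_workers, model_name,
                  dataset_name, dataset_config_name, output_dir):
    """Build the shell command for the 'compile' or 'train' stage."""
    if stage not in ("compile", "train"):
        raise ValueError(f"Unknown stage: {stage}")
    # Only the compile stage is wrapped in neuron_parallel_compile.
    launcher = "neuron_parallel_compile " if stage == "compile" else ""
    args = [
        f"--model_name_or_path {model_name}",
        f"--dataset_name {dataset_name}",
        f"--dataset_config_name {dataset_config_name}",
        "--per_device_train_batch_size 4",
        "--per_device_eval_batch_size 4",
        "--do_train",
    ]
    # Evaluation is only requested in the training stage.
    if stage == "train":
        args.append("--do_eval")
    args += ["--overwrite_output_dir", f"--output_dir {output_dir}"]
    return (f"{env_var_options} {launcher}torchrun "
            f"--nproc_per_node={num_workers} run_clm.py " + " ".join(args))


params = dict(env_var_options="XLA_USE_BF16=1", num_workers=2, model_name="gpt2",
              dataset_name="wikitext", dataset_config_name="wikitext-2-raw-v1",
              output_dir="gpt2-clm")
compile_cmd = build_command("compile", **params)
train_cmd = build_command("train", **params)
print(compile_cmd)
print(train_cmd)
```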


## 4) Train the model

In [None]:
print("Train model")
RUN_CMD = f"""{env_var_options} torchrun --nproc_per_node={num_workers} run_clm.py \
    --model_name_or_path {model_name} \
    --dataset_name {dataset_name} \
    --dataset_config_name {dataset_config_name} \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --overwrite_output_dir \
    --output_dir {model_base_name}-{task_name}"""

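print(f'Running command: \n{RUN_CMD}')
# subprocess.call returns the exit code; check_call would raise on failure,
# so the error branch below would never run.
if subprocess.call(RUN_CMD, shell=True):
    print("There was an error with the fine-tune command")
else:
    print("Fine-tune Successful!!!")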
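Because the run includes `--do_eval`, the evaluation CLM loss can be turned into perplexity, which is the exponential of the mean cross-entropy loss. The sketch below assumes the script writes its evaluation metrics to an `eval_results.json` file in the output directory with an `eval_loss` key. Both names are assumptions about the script's output, not something this notebook confirms, so check the output directory first. `read_perplexity` is a hypothetical helper.

```python
import json
import math
from pathlib import Path


def perplexity_from_loss(eval_loss: float) -> float:
    """Perplexity is exp of the mean cross-entropy (CLM) loss."""
    return math.exp(eval_loss)


def read_perplexity(output_dir, filename="eval_results.json"):
    """Return perplexity from a metrics file, or None if it is unavailable.

    The file name and the 'eval_loss' key are assumptions.
    """
    path = Path(output_dir) / filename
    if not path.exists():
        return None
    metrics = json.loads(path.read_text())
    if "eval_loss" not in metrics:
        return None
    return perplexity_from_loss(metrics["eval_loss"])


# Output directory as built above: {model_base_name}-{task_name}
print(read_perplexity("gpt2-clm"))
```

If the file is missing, the function returns `None` instead of raising, so it is safe to run before training has finished.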