# AWS Machine Learning Purpose-built Accelerators Tutorial
## Learn how to use [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/) and [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/) with [Amazon SageMaker](https://aws.amazon.com/sagemaker/), to optimize your ML workload
## Part 2/3 - Fine-tuning a BERT model with SageMaker + [Hugging Face Optimum Neuron](https://huggingface.co/docs/optimum-neuron/index) on a Trainium instance

**SageMaker Studio kernel: PyTorch 1.13 Python 3.9 CPU - ml.t3.medium**

In this tutorial, you'll learn how to kick off a fine-tuning job on SageMaker with [HF Optimum Neuron](https://huggingface.co/docs/optimum-neuron/index) on a [trn1 instance](https://aws.amazon.com/ec2/instance-types/trn1/). HF Optimum Neuron is a framework that simplifies the training script and helps ML developers create portable code that can be reused in many scenarios, for instance with different models, different tasks, or distributed training (data parallel, tensor parallel, etc.). Optimum Neuron also helps you compile your model and deploy it to AWS Inferentia (learn more in Part 3 of this tutorial).

In section 2, you'll see how to extract metadata from the Optimum Neuron API and render a table with the currently tested/supported models (similar models not listed there may also be compatible, but you need to verify that yourself). This table shows which models can be selected and fine-tuned with minimal effort. However, before selecting a model for training, check the similar table in the **Part 3** notebook to see which models can be deployed to AWS Inferentia using HF Optimum Neuron. That way you can plan your end-to-end solution and start implementing it right away.

## 1) Install some required packages

In [None]:
%pip install -r requirements.txt

## 2) Supported models/tasks

Models with **[TP]** after the name support Tensor Parallelism (TP).

In [None]:
from IPython.display import Markdown, display

display(Markdown(filename="../docs/optimum_neuron_models.md"))
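
For models marked **[TP]**, tensor parallelism splits each layer across several NeuronCores, and the remaining cores are used for data parallelism. The sketch below shows that arithmetic under stated assumptions: it assumes `torch_distributed` starts one worker per NeuronCore and uses the published core counts for trn1 (2 on ml.trn1.2xlarge, 32 on ml.trn1.32xlarge). The helper `parallel_layout` is hypothetical and not part of Optimum Neuron.

```python
from typing import Dict

# Assumption: one worker process per NeuronCore (1 Trainium chip = 2 NeuronCores)
NEURON_CORES: Dict[str, int] = {
    "ml.trn1.2xlarge": 2,
    "ml.trn1.32xlarge": 32,
}


def parallel_layout(instance_type: str, instance_count: int,
                    tensor_parallel_size: int = 1) -> Dict[str, int]:
    """Hypothetical helper: split the available NeuronCores into TP x DP."""
    world_size = NEURON_CORES[instance_type] * instance_count
    if tensor_parallel_size < 1 or world_size % tensor_parallel_size != 0:
        raise ValueError(
            f"tensor_parallel_size={tensor_parallel_size} must divide "
            f"the {world_size} NeuronCores available"
        )
    return {
        "world_size": world_size,
        "tensor_parallel_size": tensor_parallel_size,
        "data_parallel_size": world_size // tensor_parallel_size,
    }


print(parallel_layout("ml.trn1.2xlarge", 1))      # TP disabled: 2 data-parallel workers
print(parallel_layout("ml.trn1.32xlarge", 1, 8))  # 8-way TP, 4-way data parallelism
```

Under these assumptions, `tensor_parallel_size=8` only fits on an instance with at least 8 NeuronCores, such as ml.trn1.32xlarge.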

## 3) Fine-tuning a model using SageMaker and HF Optimum Neuron
We're training a BERT model as a text classifier that predicts whether an input email is spam or not. To adapt it to your own scenario, change the following variables, **MODEL** and **TASK**, using the table above as a reference:
  - MODEL: name of a model available on the Hugging Face Hub. Click the desired "model name" in the table above to list all the options for that model.
  - TASK: copy the desired task (column name) from the table above. Make sure the model you selected supports that task; otherwise, choose a different model.
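
To check a MODEL/TASK pair programmatically, you could parse the table file. The sketch below assumes a layout that may not match `../docs/optimum_neuron_models.md` exactly: the first column holds the model family as a Markdown link, optionally followed by **[TP]**, each remaining column is a task, and a non-empty cell means "supported". `parse_supported_models`, `is_supported`, and the sample table are hypothetical. Note that the table lists model families (e.g. `bert`), while MODEL holds a checkpoint name (e.g. `bert-base-uncased`).

```python
import re
from typing import Dict, List, Set

LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")  # [text](url) -> text


def _cells(row: str) -> List[str]:
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def parse_supported_models(markdown: str) -> Dict[str, Set[str]]:
    """Map each model family to the set of tasks marked as supported (assumed layout)."""
    rows = [line for line in markdown.splitlines() if line.strip().startswith("|")]
    tasks = _cells(rows[0])[1:]
    table: Dict[str, Set[str]] = {}
    for row in rows[2:]:  # skip the header row and the |---| separator
        cells = _cells(row)
        name = LINK.sub(r"\1", cells[0]).replace("**", "").replace("[TP]", "").strip()
        table[name] = {task for task, cell in zip(tasks, cells[1:]) if cell}
    return table


def is_supported(table: Dict[str, Set[str]], family: str, task: str) -> bool:
    return task in table.get(family, set())


# Hypothetical sample in the assumed layout
SAMPLE = """
| Model | SequenceClassification | TokenClassification |
|---|---|---|
| [bert](https://huggingface.co/models?search=bert) **[TP]** | ✅ | ✅ |
| [gpt2](https://huggingface.co/models?search=gpt2) | | |
"""
table = parse_supported_models(SAMPLE)
print(is_supported(table, "bert", "SequenceClassification"))  # True
print(is_supported(table, "gpt2", "SequenceClassification"))  # False
```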

**You need Hugging Face credentials and a custom repo** to run this sample. This configuration is required to store the cache files of your model. Go to [huggingface.co](https://huggingface.co/) and create an account, if needed. You also need to generate an **access token** and create a new model repository.

Set **HF_CACHE_REPO** to the model repo you created for this training job, for instance: **user-name/model-name** (the notebook passes it to the training job as the `CUSTOM_CACHE_REPO` environment variable). If you don't have a cache repo yet, [follow the instructions on this page](https://huggingface.co/docs/optimum-neuron/guides/cache_system) to create one. Set **HF_TOKEN** to a valid Hugging Face access token generated in your account.

If you don't set **HF_CACHE_REPO** and **HF_TOKEN**, your model will be recompiled every time you launch the training job, which takes time. It is **HIGHLY** recommended to use the cache mechanism to avoid repeating this step.
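
Because the cache only helps when both values are present, one option is to add the cache settings to the job only when they are defined, and to read the token from an environment variable instead of hard-coding it in the notebook. This is a minimal sketch: `neuron_cache_settings` is a hypothetical helper, and the keys `CUSTOM_CACHE_REPO` (environment variable) and `hf_token` (hyperparameter) are the ones this notebook's estimator uses.

```python
import os
from typing import Dict, Optional, Tuple


def neuron_cache_settings(
    cache_repo: Optional[str] = None, hf_token: Optional[str] = None
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical helper: return (environment, hyperparameters) for the cache.

    Each entry is added only when a value is provided, so a missing repo or
    token never reaches the training job as None.
    """
    environment: Dict[str, str] = {}
    hyperparameters: Dict[str, str] = {}
    if cache_repo:
        environment["CUSTOM_CACHE_REPO"] = cache_repo
    if hf_token:
        hyperparameters["hf_token"] = hf_token
    return environment, hyperparameters


# Read the token from the environment rather than typing it into the notebook
env, hps = neuron_cache_settings("user-name/model-name", os.environ.get("HF_TOKEN"))
print(env, list(hps))  # shows only the hyperparameter names, never the token value
```

The results can be merged into the estimator's dicts, e.g. `environment={**base_env, **env}`. Keep in mind that hyperparameters are stored in the training job description (visible in the SageMaker console), so anyone with access to the job can read a token passed this way.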

In [None]:
# Click on the "model name" in the table above to see which models you can fine-tune
# e.g.: if you click on bert, bert-base-uncased is one of the available options
MODEL = "bert-base-uncased"
TASK = "SequenceClassification"
HF_CACHE_REPO = "aws-neuron/optimum-neuron-cache"  # replace with your own cache repo, e.g. user-name/model-name
HF_TOKEN = None  # set to a valid Hugging Face access token (required to push files to your cache repo)
assert len(MODEL) > 0, "Please use the table above to define a valid model name"
assert len(TASK) > 0, "Please use the table above to define a valid model task"

In [None]:
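import os
import boto3
import sagemaker
from packaging.version import Version

print(sagemaker.__version__)
# Parse the versions: a plain string comparison would rank "2.99.0" above "2.146.0"
if Version(sagemaker.__version__) < Version("2.146.0"):
    print("You need to upgrade the SageMaker SDK, or restart the kernel if you already upgraded")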

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()
region = sess.boto_region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

os.makedirs('src', exist_ok=True)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {region}")
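
Comparing version strings directly with `>=` compares them character by character, so `"2.99.0"` ranks above `"2.146.0"` because `"9" > "1"`. Parsing each component into an integer gives the intended ordering. `packaging.version.Version` handles full PEP 440 versions; the stdlib-only sketch below, with the hypothetical helper `version_tuple`, covers plain numeric versions such as the SageMaker SDK's.

```python
import re


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: '2.146.0' -> (2, 146, 0), keeping leading digits per component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # e.g. "0rc1" -> 0
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


print("2.99.0" >= "2.146.0")                                # True (wrong: string comparison)
print(version_tuple("2.99.0") >= version_tuple("2.146.0"))  # False (correct)
```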

### 3.1) Training script that will be invoked by SageMaker

This training script uses the HF Optimum Neuron API to simplify the process. [You can learn more here](https://huggingface.co/docs/optimum-neuron/quickstart). The script is intended to show how to prepare a training job and quickly fine-tune a model; depending on your needs, you may have to adjust or modify it.

In [None]:
!pygmentize src/train.py

In [None]:
!pygmentize src/requirements.txt
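
SageMaker hands the estimator's hyperparameters to the entry point as command-line strings (e.g. `--bf16 True`) and exposes each input channel and the model directory through `SM_CHANNEL_<NAME>` and `SM_MODEL_DIR` environment variables. The sketch below shows one way a training script could read them with `argparse`; it is hypothetical and not necessarily how `src/train.py` does it. The argument names simply mirror the hyperparameter keys used in the estimator. The main pitfall it illustrates: `type=bool` would turn the string `"False"` into `True`, hence the `str2bool` converter.

```python
import argparse
import os


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' strings; argparse's type=bool would turn 'False' into True."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument parser")
    # Names mirror the keys of the estimator's hyperparameters dict
    parser.add_argument("--model_id", type=str, required=True)
    parser.add_argument("--task", type=str, required=True)
    parser.add_argument("--bf16", type=str2bool, default=False)
    parser.add_argument("--zero_1", type=str2bool, default=False)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--train_batch_size", type=int, default=4)
    parser.add_argument("--eval_batch_size", type=int, default=4)
    parser.add_argument("--max_sen_len", type=int, default=256)
    parser.add_argument("--hf_token", type=str, default=None)
    parser.add_argument("--tensor_parallel_size", type=int, default=1)
    # SageMaker exposes each fit() channel as SM_CHANNEL_<NAME> and the model dir as SM_MODEL_DIR
    parser.add_argument("--train_dir", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--eval_dir", default=os.environ.get("SM_CHANNEL_EVAL", "/opt/ml/input/data/eval"))
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser.parse_args(argv)


args = parse_args(["--model_id", "bert-base-uncased", "--task", "SequenceClassification",
                   "--bf16", "True", "--zero_1", "False"])
print(args.bf16, args.zero_1, args.train_dir)
```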

### 3.2) Defining a SageMaker Estimator
This object configures the training job and sets the required hyperparameters and other settings.

In [None]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py", # training script inside source_dir
    source_dir="src",
    role=role,
    sagemaker_session=sess,
    instance_count=1,
    instance_type='ml.trn1.2xlarge',
    disable_profiler=True,
    output_path=f"s3://{bucket}/output",
    image_uri=f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.18.0-ubuntu20.04",

    # Parameters required to enable checkpointing
    # This is necessary to cache the compiled XLA HLO graphs and reduce training time on subsequent runs
    checkpoint_s3_uri=f"s3://{bucket}/checkpoints/{MODEL}",
    volume_size=512,
    distribution={
        "torch_distributed": {
            "enabled": True
        }
    },
    environment={
        # Uncomment the following line to precompile the cache files
        # "RUN_NEURON_PARALLEL_COMPILE": "1",
        "OMP_NUM_THREADS": "1",
        "FI_EFA_FORK_SAFE": "1",
        "NEURON_RT_STOCHASTIC_ROUNDING_EN": "1",
        "MALLOC_ARENA_MAX": "80", # required to avoid OOM

        # Set the HF Hub cache repo only if one was defined above
        **({"CUSTOM_CACHE_REPO": HF_CACHE_REPO} if HF_CACHE_REPO else {}),
    },
    hyperparameters={
        "model_id": MODEL,
        "task": TASK,
        "bf16": True,
        "zero_1": True,

        "learning_rate": 5e-5,
        "epochs": 1,
        "train_batch_size": 4,
        "eval_batch_size": 4,
        "max_sen_len": 256, # must match the max sequence length used in the data preparation

        # Pass the HF token only if one was defined above
        **({"hf_token": HF_TOKEN} if HF_TOKEN else {}),

        # Uncomment and configure the following line to enable TP.
        # tensor_parallel_size must divide the number of NeuronCores
        # (2 on ml.trn1.2xlarge, 32 on ml.trn1.32xlarge)
        #"tensor_parallel_size": 8,
    },
    metric_definitions=[
        # Raw strings keep "\S" from being treated as a string escape
        {"Name": "eval_loss", "Regex": r".eval_loss.:\S*(.*?),"},
        {"Name": "train_loss", "Regex": r"'loss.:\S*(.*?),"},
        {"Name": "it_per_sec", "Regex": r",\S*(.*?)it.s."},
    ]
)

estimator.framework_version = '1.13.1' # workaround when using image_uri
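
SageMaker applies each `Regex` in `metric_definitions` to the training job's log output and records the value captured by the first group as the metric. The sketch below runs the same patterns locally with Python's `re` module, which is a convenient way to check a pattern before launching a job. `extract_metrics` and the sample log lines are hypothetical: the lines imitate the dictionaries printed by the Hugging Face Trainer and a tqdm progress bar, and SageMaker's regex engine may differ from Python's in edge cases.

```python
import re
from typing import Dict

# Same patterns as the estimator's metric_definitions
METRIC_DEFINITIONS = [
    {"Name": "eval_loss", "Regex": r".eval_loss.:\S*(.*?),"},
    {"Name": "train_loss", "Regex": r"'loss.:\S*(.*?),"},
    {"Name": "it_per_sec", "Regex": r",\S*(.*?)it.s."},
]


def extract_metrics(log_line: str) -> Dict[str, float]:
    """Apply each regex to one log line and parse the first capture group as a float."""
    metrics: Dict[str, float] = {}
    for definition in METRIC_DEFINITIONS:
        match = re.search(definition["Regex"], log_line)
        if match:
            try:
                metrics[definition["Name"]] = float(match.group(1))
            except ValueError:
                pass  # the captured text was not a number
    return metrics


# Hypothetical log lines
print(extract_metrics("{'loss': 0.4123, 'learning_rate': 4.5e-05, 'epoch': 0.1}"))
print(extract_metrics("{'eval_loss': 0.2871, 'eval_runtime': 3.2, 'epoch': 1.0}"))
print(extract_metrics(" 50%|#####     | 100/200 [00:51<00:51,  1.95it/s]"))
```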

In [None]:
train_uri=f"s3://{bucket}/datasets/spam/train"
eval_uri=f"s3://{bucket}/datasets/spam/eval"
print(f"{train_uri}\n{eval_uri}")
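
The data at these URIs must be tokenized to the same length as the `max_sen_len` hyperparameter. On Neuron, the model is compiled for each distinct input shape, so a fixed sequence length lets every batch reuse the same compiled graph instead of triggering a recompilation. With Hugging Face tokenizers this is typically done with `tokenizer(texts, padding="max_length", truncation=True, max_length=256)`; the pure-Python sketch below only illustrates what that padding produces. `pad_or_truncate`, the token ids, and the pad id 0 (the `[PAD]` id of bert-base-uncased) are illustrative assumptions.

```python
from typing import List, Tuple

MAX_SEN_LEN = 256  # must equal the "max_sen_len" hyperparameter passed to the estimator
PAD_ID = 0         # assumption: [PAD] id of the bert-base-uncased vocabulary


def pad_or_truncate(token_ids: List[int], max_len: int = MAX_SEN_LEN,
                    pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_len long.

    Simplification: truncation just cuts the tail, whereas a real tokenizer
    would keep the closing [SEP] token.
    """
    ids = token_ids[:max_len]
    n_pad = max_len - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad
    return ids + [pad_id] * n_pad, attention_mask


ids, mask = pad_or_truncate([101, 7592, 102], max_len=8)
print(ids)   # [101, 7592, 102, 0, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 0, 0, 0, 0, 0]
```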

In [None]:
estimator.fit({"train": train_uri, "eval": eval_uri})

In [None]:
# Save the job name so the deployment notebooks can locate the trained model
with open("training_job_name.txt", "w") as f:
    f.write(estimator.latest_training_job.name)
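
The saved job name is enough to locate the trained model later, because SageMaker writes the artifact to `<output_path>/<job_name>/output/model.tar.gz`. Within this notebook, `estimator.model_data` returns the same S3 URI directly. The helpers `read_job_name` and `model_artifact_uri`, and the bucket and job names in the example, are hypothetical; the deployment notebooks may locate the model differently.

```python
from pathlib import Path


def read_job_name(path: str = "training_job_name.txt") -> str:
    """Read the job name saved by this notebook."""
    return Path(path).read_text().strip()


def model_artifact_uri(output_path: str, job_name: str) -> str:
    """SageMaker stores the trained model at <output_path>/<job_name>/output/model.tar.gz."""
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


# In a later notebook (assumes the same bucket and output_path as above):
# uri = model_artifact_uri(f"s3://{bucket}/output", read_job_name())
print(model_artifact_uri("s3://my-bucket/output", "my-training-job"))
# s3://my-bucket/output/my-training-job/output/model.tar.gz
```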

## 4) Now it is time to deploy our model

[Open Deployment/Inference on Inf2 Notebook](03_ModelInference.ipynb)  
[Open Deployment/Inference on Inf1 Notebook](03_ModelInferenceInf1.ipynb)