# AWS Machine Learning Purpose-built Accelerators Tutorial
## Learn how to use [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/) and [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/) with [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to optimize your ML workload
## Part 2/3 - Fine-tuning a BERT model with SageMaker + [Hugging Face Optimum Neuron](https://huggingface.co/docs/optimum-neuron/index) on a Trainium instance

**SageMaker studio Kernel: PyTorch 1.13 Python 3.9 CPU - ml.t3.medium** 

In this tutorial, you'll learn how to kick off a fine-tuning job on SageMaker with [HF Optimum Neuron](https://huggingface.co/docs/optimum-neuron/index) on a [trn1 instance](https://aws.amazon.com/ec2/instance-types/trn1/). HF Optimum Neuron is a framework that simplifies the training script and helps ML developers write portable code that can be reused in many scenarios, for instance: different models, different tasks, and distributed training (data parallel, tensor parallel, etc.). Optimum Neuron also helps you compile your model and deploy it to AWS Inferentia (learn more in the 3rd part of this tutorial).

In section 2, you'll see how to extract metadata from the Optimum Neuron API and render a table with the currently tested/supported models (similar models not listed there may also be compatible, but you need to verify that yourself). This table shows which models can be selected and fine-tuned in a simple way. However, before selecting a model for training, check the similar table in the notebook **Part 3** to see which models can be deployed to AWS Inferentia using HF Optimum Neuron. That way you can plan your end-to-end solution and start implementing it right away.

## 1) Install some required packages

In [None]:
%pip install -U optimum-neuron==0.0.8 "onnx>=1.14.0"

## 2) Supported models/tasks for Training

In [None]:
import re
import pandas as pd
from IPython.display import Markdown
from optimum.exporters.tasks import TasksManager
from optimum.exporters.neuron.model_configs import *
from optimum.neuron.distributed.parallelizers_manager import ParallelizersManager
from optimum.neuron.utils.training_utils import (
    _SUPPORTED_MODEL_NAMES,
    _SUPPORTED_MODEL_TYPES,
    _generate_supported_model_class_names
)

In [None]:
# retrieve supported models for Tensor Parallelism
tp_support = list(ParallelizersManager._MODEL_TYPE_TO_PARALLEL_MODEL_CLASS.keys())

# build compatibility table for training
data_training = {'Model': []}
for m in _SUPPORTED_MODEL_TYPES:
    if type(m) != str: m = m[0]
    if m=='gpt-2': m='gpt2' # fix the name
    model_id = len(data_training['Model'])
    model_link = f'<a target="_new" href="https://huggingface.co/models?sort=trending&search={m}">{m}</a>'
    data_training['Model'].append(f"{model_link} <font style='color: red;'><b>[TP]</b></font>" if m in tp_support else model_link)
    tasks = [re.sub(r'.+For(.+)', r'\1', t) for t in set(_generate_supported_model_class_names(m)) if not t.endswith('Model')]
    for t in tasks:
        if data_training.get(t) is None: data_training[t] = [''] * len(_SUPPORTED_MODEL_TYPES)
        data_training[t][model_id] = f'<a target="_new" href="https://huggingface.co/docs/transformers/model_doc/{m}#transformers.{m.title()}For{t}">api</a>'        
df_training = pd.DataFrame.from_dict(data_training).set_index('Model')


Each new release of HF Optimum Neuron adds models to this list, so expect the following table to change when you upgrade the library.  

Models with **[TP]** after the name support Tensor Parallelism.

In [None]:
Markdown(df_training.to_markdown())

## 3) Fine-tuning a model, using SageMaker and HF Optimum Neuron
We're training a BERT model as a text classifier to predict whether an input email is spam or not. To adapt it to your own scenario, just change the following variables: **MODEL** and **TASK**, using the table above as a reference.  
  - MODEL: name of a model available on the HF portal. Click on the desired "model name" in the table above to list all the options for that particular model.
  - TASK: copy the desired task (column name) from the table above. Make sure the model you selected supports that particular task; otherwise, you need to choose a different model.
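
The task names in the table are the part of each `transformers` class name that follows `For`, and the training script in section 3.1 turns the chosen task back into an `AutoModelFor<TASK>` class name. The sketch below illustrates that round trip with plain strings; `task_from_class_name` and `auto_class_name` are hypothetical helper names used only for this illustration, and nothing here imports `transformers`.

```python
import re


def task_from_class_name(class_name: str) -> str:
    """Return the suffix after 'For', e.g. the task of 'BertForSequenceClassification'."""
    return re.sub(r".+For(.+)", r"\1", class_name)


def auto_class_name(task: str) -> str:
    """Build the Auto class name for a task; an empty task maps to plain 'AutoModel'."""
    return f"AutoModel{'For' + task if task else ''}"


task = task_from_class_name("BertForSequenceClassification")
print(task)                   # the value to assign to TASK
print(auto_class_name(task))  # the class name the training script resolves
```

If the resulting class name does not exist in `transformers` for your model, pick another task column from the table.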

**You need Hugging Face credentials and a custom repo** to run this sample. The repo is required to store the cache files of your model. Go to [huggingface.co](https://huggingface.co/) and create an account, if needed. You also need to generate an **access token** and create a new model repository.

Set **HF_CACHE_REPO** to the model repo you created for this training job, for instance: **user-name/model-name**. If you don't have a cache repo yet, just [follow the instructions on this page](https://huggingface.co/docs/optimum-neuron/guides/cache_system) and create one. Set **HF_TOKEN** to a valid Hugging Face access token generated in your account.

If you don't set **HF_CACHE_REPO** and **HF_TOKEN**, your model is recompiled every time you invoke the training job, which takes some time. It is **HIGHLY** recommended to use the cache mechanism to avoid this recompilation.
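
If you prefer not to paste the token into the notebook, one option is to read both values from environment variables. This is only a sketch: it assumes you exported variables named `HF_CACHE_REPO` and `HF_TOKEN` before starting the kernel, and `read_hf_settings` is a hypothetical helper, not part of any library.

```python
import os


def read_hf_settings(env=os.environ):
    """Return (cache_repo, token) from the given mapping, or empty strings if unset."""
    repo = env.get("HF_CACHE_REPO", "").strip()
    token = env.get("HF_TOKEN", "").strip()
    return repo, token


# Assumption: both variables were exported in the environment that runs this notebook
HF_CACHE_REPO, HF_TOKEN = read_hf_settings()
print("cache repo set:", bool(HF_CACHE_REPO), "| token set:", bool(HF_TOKEN))
```

Printing only whether the values are set keeps the token out of the notebook output.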

In [None]:
# Click on the "model name" in the table above to see which models you can fine-tune
# e.g.: if you click on bert, bert-base-uncased is one of the available options
MODEL="bert-base-uncased"
TASK="SequenceClassification"
HF_CACHE_REPO=""
HF_TOKEN=""
assert len(MODEL)>0, "Please, use the table above to define a valid model name"
assert TASK in df_training.columns, "Please, use the table above to define a valid task name"
assert len(HF_CACHE_REPO)>0, "Please, set a valid Hugging Face CACHE REPO"
assert len(HF_TOKEN)>0, "Please, set a valid HF access token"

In [None]:
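import os
import boto3
import sagemaker
from packaging.version import Version

print(sagemaker.__version__)
# Compare parsed versions: a plain string comparison would rank "2.99.0" above "2.146.0"
if Version(sagemaker.__version__) < Version("2.146.0"): print("You need to upgrade the sagemaker package, or restart the kernel if you already upgraded")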

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()
region = sess.boto_region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

os.makedirs('src', exist_ok=True)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {region}")

### 3.1) Training script that will be invoked by SageMaker

This training script uses the HF Optimum Neuron API to simplify the process. [You can learn more here](https://huggingface.co/docs/optimum-neuron/quickstart). The script is intended to show how to prepare a training job and quickly fine-tune a model. Depending on your needs, you may have to adjust or modify it.
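
SageMaker passes each hyperparameter to the training script as a command-line string, so boolean options need some care with `argparse`: `bool("False")` is `True`, because any non-empty string is truthy. The sketch below contrasts `type=bool` with an explicit converter; `str2bool` and the flag names `--naive` and `--parsed` are hypothetical and exist only for this illustration.

```python
import argparse


def str2bool(value: str) -> bool:
    """Interpret common textual spellings of a boolean flag."""
    return str(value).strip().lower() in ("true", "1", "yes")


parser = argparse.ArgumentParser()
# type=bool calls bool() on the raw string, so any non-empty value becomes True
parser.add_argument("--naive", type=bool, default=False)
# an explicit converter inspects the text instead
parser.add_argument("--parsed", type=str2bool, default=False)

args = parser.parse_args(["--naive", "False", "--parsed", "False"])
print(args.naive, args.parsed)
```

The same converter can be used for any boolean hyperparameter you add to the script.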

In [None]:
%%writefile src/train.py
import os
import sys
import torch
import random
import logging
import argparse
import evaluate
import importlib
import traceback
import transformers

from huggingface_hub import login
from datasets import load_from_disk
from transformers import AutoTokenizer, is_torch_tpu_available

if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train_batch_size", type=int, default=32)
    parser.add_argument("--eval_batch_size", type=int, default=64)
    parser.add_argument("--warmup_steps", type=int, default=500)
    parser.add_argument("--tensor_parallel_size", type=int, default=1)
    parser.add_argument("--model_id", type=str, required=True)
    # type=bool would turn any non-empty string (including "False") into True, so parse the text explicitly
    parser.add_argument("--zero_1", type=lambda v: str(v).lower() in ("true", "1"), default=False)
    parser.add_argument("--task", type=str, default="")
    parser.add_argument("--collator", type=str, default="DefaultDataCollator")
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--weight_decay", type=float, default=0.01)
    parser.add_argument("--bf16", type=lambda v: str(v).lower() in ("true", "1"), default=True)

    # hugging face hub
    parser.add_argument("--hf_token", type=str, default='')

    # Data, model, and output directories
    parser.add_argument("--output_data_dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
    parser.add_argument("--model_dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--n_neurons", type=str, default=os.environ["SM_NUM_NEURONS"])
    parser.add_argument("--training_dir", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--eval_dir", type=str, default=os.environ.get("SM_CHANNEL_EVAL", None))

    parser.add_argument('--checkpoints-path', type=str, help="Path where we'll save the cache", default='/opt/ml/checkpoints')
    try:
        args, _ = parser.parse_known_args()

        cache_dir = os.path.join(args.checkpoints_path, args.model_id)
        os.makedirs(cache_dir, exist_ok=True)

        if len(args.hf_token) > 0:
            print("HF token defined. Logging in...")
            login(token=args.hf_token)

        # Set up logging
        logger = logging.getLogger(__name__)

        logging.basicConfig(
            level=logging.getLevelName("INFO"),
            handlers=[logging.StreamHandler(sys.stdout)],
            format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        )
        os.environ['TOKENIZERS_PARALLELISM'] = 'false'
        os.environ['NEURON_CC_FLAGS']=f"--cache_dir={cache_dir} --retry_failed_compilation"

        from optimum.neuron import NeuronTrainer as Trainer
        from optimum.neuron import NeuronTrainingArguments as TrainingArguments

        # Resolve the collator and model classes by name with getattr instead of eval()
        Collator = getattr(transformers, args.collator)
        AutoModel = getattr(transformers, f"AutoModel{'For' + args.task if len(args.task) > 0 else ''}")

        train_dataset=load_from_disk(args.training_dir)
        eval_dataset=load_from_disk(args.eval_dir) if args.eval_dir is not None else None

        tokenizer = AutoTokenizer.from_pretrained(args.model_id)
        # Fall back to the EOS token only when the tokenizer defines no padding token
        if tokenizer.pad_token is None: tokenizer.pad_token = tokenizer.eos_token

        data_collator = Collator(return_tensors="pt")
        model = AutoModel.from_pretrained(args.model_id, trust_remote_code=True) # TODO: add a hyperparameter with model params
        # model.config.output_attentions keeps its default value; assign True only if you need the attention weights in the outputs

        def preprocess_logits_for_metrics(logits, labels):
            if isinstance(logits, tuple):
                # Depending on the model and config, logits may contain extra tensors,
                # like past_key_values, but logits always come first
                logits = logits[0]
            return logits.argmax(dim=-1)

        metric = evaluate.load("accuracy", module_type="metric")
        def compute_metrics(eval_pred):
            preds,labels = eval_pred
            if len(preds.shape) == 1: preds = torch.IntTensor(preds).reshape(1,-1)
            if len(labels.shape) == 1: labels = torch.IntTensor(labels).reshape(1,-1)    
            for pred,label in zip(preds,labels):
                metric.add_batch(predictions=pred, references=label)
            return metric.compute()

        training_args = TrainingArguments(
            evaluation_strategy="epoch",
            learning_rate=args.learning_rate,
            weight_decay=args.weight_decay,
            bf16=args.bf16,
            num_train_epochs=args.epochs,
            output_dir=args.model_dir,
            overwrite_output_dir=True,
            tensor_parallel_size=args.tensor_parallel_size,
            zero_1=args.zero_1,

            per_device_train_batch_size=args.train_batch_size,
            per_device_eval_batch_size=args.eval_batch_size,
            logging_dir=f"{args.output_data_dir}/logs",
            logging_strategy="steps",
            logging_steps=500,
            save_strategy="epoch",
            save_total_limit=2,
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=train_dataset,
            compute_metrics=compute_metrics,
            preprocess_logits_for_metrics=preprocess_logits_for_metrics,
            eval_dataset=eval_dataset,
            data_collator=data_collator,
        )
        trainer.train()

        if args.eval_dir is not None:
            eval_results = trainer.evaluate()

            # writes eval result to file which can be accessed later in s3 output
            with open(os.path.join(args.output_data_dir, "eval_results.txt"), "w") as writer:
                print("***** Eval results *****")
                for key, value in sorted(eval_results.items()):
                    writer.write(f"{key} = {value}\n")

        # Saves the model to s3
        trainer.save_model()

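    except Exception as e:
        print(traceback.format_exc())
        # A non-zero exit code lets SageMaker mark the training job as failed
        sys.exit(1)
    finally:
        # No sys.exit() here: exiting inside finally would replace the exit code set above
        print("Done! ", sys.exc_info())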

In [None]:
%%writefile src/requirements.txt
datasets
evaluate
accelerate
torchvision
scikit-learn
neuronx-distributed
## 4.30 or higher is required
transformers==4.30.0
optimum-neuron==0.0.8

### 3.2) Defining a SageMaker Estimator
This object helps you configure the training job and set the required hyperparameters and other configuration settings.
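
The estimator's `metric_definitions` are regular expressions that SageMaker applies to the training log to extract metric values. The sketch below emulates that matching locally so you can check a pattern before launching a job. The log lines are hypothetical examples in the dictionary style printed by the Hugging Face Trainer, `extract_metrics` is a helper written for this illustration, and converting the captured group with `float()` is an assumption about how the value is interpreted.

```python
import re

# Same patterns as in the estimator, written as raw strings
metric_definitions = [
    {"Name": "eval_loss", "Regex": r".eval_loss.:\S*(.*?),"},
    {"Name": "train_loss", "Regex": r".train_loss.:\S*(.*?),"},
]

# Hypothetical log lines, used only to exercise the patterns
eval_line = "{'eval_loss': 0.1234, 'eval_accuracy': 0.97, 'epoch': 1.0}"
train_line = "{'train_runtime': 10.0, 'train_loss': 0.5, 'epoch': 3.0}"


def extract_metrics(line, definitions):
    """Return {metric name: value} for every definition whose regex matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            # float() tolerates the leading space captured by the lazy group
            found[definition["Name"]] = float(match.group(1))
    return found


print(extract_metrics(eval_line, metric_definitions))
print(extract_metrics(train_line, metric_definitions))
```

If a metric never shows up in the SageMaker console, testing its pattern against a real log line from your job in this way is a quick first check.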

In [None]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py", # Specify your train script
    source_dir="src",
    role=role,
    sagemaker_session=sess,
    instance_count=1,
    instance_type='ml.trn1.2xlarge',
    disable_profiler=True,
    output_path=f"s3://{bucket}/output",
    image_uri=f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.12.0-ubuntu20.04",
    
    # Parameters required to enable checkpointing
    # This is necessary to cache the XLA HLO files and reduce the training time of subsequent runs
    checkpoint_s3_uri=f"s3://{bucket}/checkpoints",
    volume_size = 512,
    distribution={
        "torch_distributed": {
            "enabled": True
        }
    },
    environment={
        "XLA_USE_BF16": "1",
        "OMP_NUM_THREADS": "2",
        "FI_EFA_FORK_SAFE": "1",        
        "NEURON_RT_STOCHASTIC_ROUNDING_EN": "1",
        
        "CUSTOM_CACHE_REPO": HF_CACHE_REPO
    },
    hyperparameters={
        "model_id": MODEL,
        "task": TASK,        
        "bf16": True,
        
        "learning_rate": 5e-5,
        "epochs": 25,
        "train_batch_size": 4,
        "eval_batch_size": 4,
        
        "hf_token": HF_TOKEN,
        
        #"collator": "DataCollatorForLanguageModeling",
        #"tensor_parallel_size": 8,        
    },
    metric_definitions=[        
        {"Name": "eval_loss", "Regex": r".eval_loss.:\S*(.*?),"},
        {"Name": "train_loss", "Regex": r".train_loss.:\S*(.*?),"},
        {"Name": "it_per_sec", "Regex": r",\S*(.*?)it.s."},
    ]
)
estimator.framework_version = '1.13.1' # workaround when using image_uri

In [None]:
train_uri=f"s3://{bucket}/datasets/spam/train"
eval_uri=f"s3://{bucket}/datasets/spam/eval"
print(f"{train_uri}\n{eval_uri}")

In [None]:
estimator.fit({"train": train_uri, "eval": eval_uri})

## 4) Now it is time to deploy our model

[Open Deployment/Inference Notebook](03_ModelInference.ipynb)