# Training and Deploying Hugging Face Models on SageMaker

For this tutorial we recommend using an instance with 1 GPU to speed things up; this notebook was run on an ml.p3.2xlarge instance.

This tutorial focuses on the Hugging Face Hub, a repository where users share and download machine learning models, datasets, and demos. AWS has partnered with Hugging Face so that you can use these resources without manually creating a Hugging Face account or access token, and SageMaker provides its Hugging Face integration through the `sagemaker.huggingface` API.

In this tutorial we will load a model and a dataset from Hugging Face, fine-tune and test the model, and then deploy it on SageMaker. The model is Flan T5 and the dataset is [ccdv/pubmed-summarization](https://huggingface.co/datasets/ccdv/pubmed-summarization).

In [None]:
# Enable GPU persistence mode (keeps the NVIDIA driver initialized even when no process is using the GPU)
!nvidia-smi -pm 1

### Install Tools

Hugging Face **transformers** is an open-source library that provides APIs and tools to download pretrained models, set hyperparameters, tokenize datasets, and fine-tune models to suit your needs. Here we upgrade the SageMaker Python SDK and install **transformers** and **datasets**, which gives us access to the Hugging Face datasets. The `[s3]` extra adds S3 support, which helps when a dataset is already stored in an S3 bucket.

In [None]:
!pip install "sagemaker" "transformers" "datasets[s3]" --upgrade

Next we install **sentencepiece**, an unsupervised text tokenizer that the T5 tokenizer is built on, and **accelerate**, another Hugging Face library that lets PyTorch training code run on multiple GPUs.

In [None]:
!pip install "sentencepiece" "accelerate" --upgrade
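Because the SageMaker training job later pins specific library versions (transformers 4.26.0), it can help to check which versions ended up installed in the notebook. A minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is our own and not part of any SDK:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# print the versions of the packages installed above
for pkg, ver in installed_versions(
    ["sagemaker", "transformers", "datasets", "sentencepiece", "accelerate"]
).items():
    print(f"{pkg}: {ver or 'not installed'}")
```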

### Set up your SageMaker Session

The following cell creates a SageMaker session and gets its default S3 bucket (SageMaker creates the bucket if it does not exist yet), which will store our training and test datasets. It also retrieves our IAM execution role and the session's region; both are used later when we launch the training job.
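`sess.default_bucket()` returns a bucket whose name is built from the session's region and your AWS account ID, in the form `sagemaker-{region}-{account_id}`. The sketch below only reproduces that naming pattern as pure functions (both helper names are hypothetical) so it is clear which bucket to look for in the S3 console; it does not create or check any bucket. The account ID is taken from the role ARN printed above.

```python
def account_id_from_role_arn(role_arn: str) -> str:
    """Extract the account ID from an ARN of the form arn:aws:iam::<account-id>:role/<name>."""
    parts = role_arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {role_arn!r}")
    return parts[4]


def expected_default_bucket(region: str, account_id: str) -> str:
    """Mirror SageMaker's default bucket naming pattern: sagemaker-{region}-{account_id}."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit strings")
    return f"sagemaker-{region}-{account_id}"


# made-up role ARN; in the notebook you could pass `role` and `sess.boto_region_name` instead
example_role = "arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole"
print(expected_default_bucket("us-east-1", account_id_from_role_arn(example_role)))
# prints sagemaker-us-east-1-123456789012
```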

In [None]:
import sagemaker

sess = sagemaker.Session()
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

### Download your dataset from Hugging Face

We will download the Hugging Face dataset 'ccdv/pubmed-summarization', which pairs PubMed articles (the `article` column) with their abstracts (the `abstract` column); these pairs will teach our model to summarize scientific articles. The dataset comes with predefined splits, so we load the train and test splits directly.

In [None]:
from sagemaker.huggingface import HuggingFaceModel

In [None]:
from datasets import load_dataset

# load dataset
train_dataset, test_dataset = load_dataset("ccdv/pubmed-summarization", split=["train", "test"])


## Fine-tuning our Model Locally

Now that we have our datasets, we can load our model: the small version of Flan T5 (`google/flan-t5-small`).

**Flan T5** is a text-to-text generation model that builds on the original T5 model by fine-tuning it on a large collection of tasks phrased as instructions, and it can run on both CPUs and GPUs. **Text-to-text** means that every task is framed as using a neural network to generate output text from input text. Thanks to this framing, Flan T5 handles many familiar NLP tasks, such as text classification, summarization, translation, and question answering, often zero-shot, and it can be fine-tuned further for a specific task, as we do here. In the Hugging Face ecosystem, this kind of model is served by the `text2text-generation` pipeline, which covers sequence-to-sequence models such as T5.
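To make the text-to-text idea concrete: every task becomes a pair of an input string and a target string. The sketch below builds such pairs with task prefixes in the style of the original T5 paper (for example `summarize: `, which also appears commented out as `source_prefix` in the SageMaker hyperparameters later on). The helper and the prefix table are illustrative assumptions, not an API of the transformers library.

```python
from typing import Dict, Tuple

# Illustrative task prefixes in the style of the original T5 paper (an assumption for this
# sketch; Flan T5 also understands free-form instructions)
TASK_PREFIXES: Dict[str, str] = {
    "summarization": "summarize: ",
    "translation_en_de": "translate English to German: ",
}


def to_text_to_text(task: str, source: str, target: str) -> Tuple[str, str]:
    """Frame one example as an (input text, target text) pair for a text-to-text model."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"unknown task: {task!r}")
    return TASK_PREFIXES[task] + source, target


# made-up example pair: both the model input and the expected output are plain strings
model_input, model_target = to_text_to_text(
    "summarization",
    "SARS-CoV-2 emerged in late 2019 and has caused a pandemic of acute respiratory disease.",
    "SARS-CoV-2 caused the COVID-19 pandemic.",
)
print(model_input)
print(model_target)
```

Because inputs and outputs are both text, the same model and loss can be reused across tasks; only the data changes.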

Because Flan T5 is a sequence-to-sequence model, we use the transformers class **AutoModelForSeq2SeqLM** to find and load the pretrained model architecture and weights. We then load the matching **AutoTokenizer**, which converts the text of our inputs (the train and test datasets) into sequences of token IDs.

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model_name="google/flan-t5-small"

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

With the model and tokenizer loaded, we can write a tokenization function to process our datasets.

The function below tokenizes the 'article' column of each row as the model input and the 'abstract' column as the target, i.e. the labels the model learns to generate. We apply it to our datasets with the **map** function, and finally set the dataset format so that it returns PyTorch tensors. The formatted datasets expose the following columns:
- **input_ids:** The ID of each token. Each text is split into tokens (words or subwords), and every token is mapped to an ID from the tokenizer's vocabulary.
- **attention_mask:** Marks which tokens the model should attend to: 1 for real tokens and 0 for padding. Padding is needed because sequences of different lengths cannot be stacked into the same tensor, so shorter sequences are padded to a common length.
- **labels:** The token IDs of the tokenized abstract, which the model is trained to generate. Padding positions are set to -100 so that the loss ignores them.
- **abstracts:** The original abstract text, renamed from `abstract`; we keep it because the SageMaker training script later reads the summaries from this column.

In [None]:
# create tokenization function: articles are the model inputs, abstracts are the targets
def tokenize(batch):
    model_inputs = tokenizer(batch["article"], max_length=512, padding="max_length", truncation=True)
    targets = tokenizer(text_target=batch["abstract"], max_length=128, padding="max_length", truncation=True)
    # replace padding token IDs in the labels with -100 so they are ignored by the loss
    model_inputs["labels"] = [
        [token if token != tokenizer.pad_token_id else -100 for token in ids]
        for ids in targets["input_ids"]
    ]
    return model_inputs


# tokenize train and test datasets
train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset = test_dataset.map(tokenize, batched=True)

# rename the abstract column and set the dataset format for PyTorch
train_dataset = train_dataset.rename_column("abstract", "abstracts")
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels", "abstracts"])
test_dataset = test_dataset.rename_column("abstract", "abstracts")
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels", "abstracts"])
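The padding, attention mask, and label masking described above can be illustrated without a tokenizer. This plain-Python sketch pads made-up token IDs to a fixed length. It assumes a pad token ID of 0 (what the T5 tokenizer uses) and -100 as the ignored label value (the default `ignore_index` of PyTorch's cross-entropy loss); the helper `pad_example` is hypothetical and simplifies truncation.

```python
from typing import Dict, List

PAD_ID = 0           # assumption: T5's pad token ID
IGNORE_INDEX = -100  # label value ignored by the loss


def pad_example(input_ids: List[int], label_ids: List[int],
                max_len: int, max_label_len: int) -> Dict[str, List[int]]:
    """Truncate and pad one example, roughly what padding="max_length", truncation=True does.

    Simplified: the real tokenizer also keeps the end-of-sequence token when truncating.
    """
    input_ids = input_ids[:max_len]
    label_ids = label_ids[:max_label_len]
    n_pad = max_len - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * n_pad,
        # 1 = real token the model attends to, 0 = padding
        "attention_mask": [1] * len(input_ids) + [0] * n_pad,
        # padded label positions become -100 so they do not contribute to the loss
        "labels": label_ids + [IGNORE_INDEX] * (max_label_len - len(label_ids)),
    }


print(pad_example([37, 1220, 19, 1], [2751, 1], max_len=6, max_label_len=4))
# {'input_ids': [37, 1220, 19, 1, 0, 0], 'attention_mask': [1, 1, 1, 1, 0, 0], 'labels': [2751, 1, -100, -100]}
```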

Besides preparing the datasets, training requires a set of **hyperparameters**. For local training, the Hugging Face **TrainingArguments** class holds them; here we start with only an output directory and keep the library defaults for everything else (learning rate, batch size, number of epochs, and so on).

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="test_trainer")

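Next, we set up a metric to evaluate the model's accuracy. For a text-generation model, accuracy compares the predicted token IDs with the label token IDs.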

In [None]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")
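For a text-generation model, accuracy of this kind is computed position by position over token IDs. A plain-Python sketch of that computation, assuming padded label positions carry the value -100 and are skipped; the helper name `token_accuracy` is our own:

```python
from typing import Sequence

IGNORE_INDEX = -100


def token_accuracy(predictions: Sequence[Sequence[int]], labels: Sequence[Sequence[int]]) -> float:
    """Share of non-padding label positions where the predicted token ID matches the label."""
    correct = total = 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label == IGNORE_INDEX:
                continue  # padding: not part of the target text
            total += 1
            correct += int(pred == label)
    return correct / total if total else 0.0


# two made-up sequences: 4 of the 5 real label tokens are predicted correctly
print(token_accuracy([[5, 8, 1, 0], [7, 2, 0, 0]], [[5, 8, 1, -100], [7, 3, -100, -100]]))  # 0.8
```

Note that this measures next-token predictions given the reference prefix; it is a cheap training signal, not a measure of summary quality.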

In [None]:
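def compute_metrics(eval_pred):
    # predictions are predicted token IDs (see preprocess_logits_for_metrics below)
    predictions, labels = eval_pred
    mask = labels != -100  # ignore padded label positions
    return metric.compute(predictions=predictions[mask], references=labels[mask])

In [None]:
from transformers import TrainingArguments, Trainer

# evaluate on the test set at the end of every epoch (newer transformers releases name this argument eval_strategy)
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

Finally we can train our model!

In [None]:
# reduce logits to predicted token IDs on the GPU instead of keeping full-vocabulary logits in memory
def preprocess_logits_for_metrics(logits, labels):
    if isinstance(logits, tuple):  # seq2seq outputs: (logits, encoder hidden states, ...)
        logits = logits[0]
    return logits.argmax(dim=-1)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)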

In [None]:
trainer.train()

### Setting up our Datasets for Training on SageMaker

For training on SageMaker we use the training script `run_summarization.py` from the Hugging Face transformers examples, which will fine-tune our Flan T5 model to summarize the PubMed articles. To pass our data to this script, we first convert our datasets to CSV format and then upload them to our S3 bucket with the help of the boto3 package.
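`run_summarization.py` picks the input and target texts from the CSV columns named by the `text_column` and `summary_column` hyperparameters ('article' and 'abstracts' in this notebook). A stdlib-only sketch that writes a tiny CSV with those column names and reads it back, to show the expected layout and that commas inside the text are quoted safely; the rows and both helper names are made up for illustration:

```python
import csv
import io

rows = [
    {"article": "Full text of the first article.", "abstracts": "Abstract of the first article."},
    {"article": "Full text, with a comma, of the second article.", "abstracts": "Second abstract."},
]


def write_summarization_csv(rows, fieldnames=("article", "abstracts")) -> str:
    """Serialize rows to CSV text with a header line, quoting fields when needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_pairs(csv_text: str, text_column: str, summary_column: str):
    """Read (input text, summary) pairs back from CSV text by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[text_column], row[summary_column]) for row in reader]


csv_text = write_summarization_csv(rows)
print(csv_text)
print(read_pairs(csv_text, "article", "abstracts"))
```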

In [None]:
from io import BytesIO
import boto3

# convert the train dataset to CSV and upload it to the S3 bucket
csv_buffer = BytesIO()
train_dataset.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(sess.default_bucket(), 'train.csv').put(Body=csv_buffer.getvalue())

In [None]:
# convert the test dataset to CSV and upload it to the S3 bucket
csv_buffer = BytesIO()
test_dataset.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(sess.default_bucket(), 'test.csv').put(Body=csv_buffer.getvalue())

Next we save the S3 locations of our datasets and group them in a dictionary called **data**, which we pass to the estimator when we start training. Its keys (`train` and `test`) become the names of the input channels inside the training container.

In [None]:
# S3 location of the train dataset uploaded above
training_input_path = f's3://{sess.default_bucket()}/train.csv'

# S3 location of the test dataset uploaded above
test_input_path = f's3://{sess.default_bucket()}/test.csv'

# Group the training and testing data since this will be the input to our hugging face estimator
data = {
    'train': training_input_path,
    'test': test_input_path
}

### Fine-tuning our Model with a SageMaker Training Job

As with local training, the first step is to set our **hyperparameters**. The hyperparameters depend on the training script; for `run_summarization.py` we need to specify the model, the locations of our train and test files, the text and summary columns, and whether to train, evaluate, and predict, among others. The other available hyperparameters are documented in the Hugging Face transformers GitHub repository [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).

Our train and test file locations are '/opt/ml/input/data/train' and '/opt/ml/input/data/test' because SageMaker runs the training in a Docker container, downloads our files from the S3 bucket, and stores them in one directory per input channel (`train` and `test`). After training, everything the script writes to '/opt/ml/model' is packaged and uploaded back to our S3 bucket as a model.tar.gz file.
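The mapping from the `data` dictionary to file paths inside the container follows a fixed convention: each key is a channel name, and its files land in `/opt/ml/input/data/<channel>/`. SageMaker also exposes each channel directory through an `SM_CHANNEL_<NAME>` environment variable. The sketch below uses hypothetical helpers and makes no AWS calls; it assumes each S3 URI points to a single object, as in this notebook, and derives the paths that the `train_file` and `test_file` hyperparameters should point to.

```python
import posixpath
from typing import Dict

CONTAINER_DATA_ROOT = "/opt/ml/input/data"


def container_paths(s3_inputs: Dict[str, str]) -> Dict[str, str]:
    """Map each channel's S3 object URI to where the file appears inside the training container."""
    paths = {}
    for channel, s3_uri in s3_inputs.items():
        filename = posixpath.basename(s3_uri)  # e.g. 'train.csv'
        paths[channel] = posixpath.join(CONTAINER_DATA_ROOT, channel, filename)
    return paths


def channel_env_vars(s3_inputs: Dict[str, str]) -> Dict[str, str]:
    """Environment variables pointing at each channel directory (SM_CHANNEL_TRAIN, ...)."""
    return {f"SM_CHANNEL_{channel.upper()}": posixpath.join(CONTAINER_DATA_ROOT, channel)
            for channel in s3_inputs}


# made-up bucket name standing in for sess.default_bucket()
example_data = {
    "train": "s3://sagemaker-us-east-1-123456789012/train.csv",
    "test": "s3://sagemaker-us-east-1-123456789012/test.csv",
}
print(container_paths(example_data))
print(channel_env_vars(example_data))
```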

In [None]:
# location of the run_summarization.py training script; SageMaker clones this repository automatically
# (the v4.26.0 tag matches the transformers_version used by the estimator below)
git_config = {'repo': 'https://github.com/huggingface/transformers.git', 'branch': 'v4.26.0'}

In [None]:
# hyperparameters, which are passed into the training job
hyperparameters={'per_device_train_batch_size': 2,
                 'per_device_eval_batch_size': 4,
                 'model_name_or_path': 'google/flan-t5-small',
                 'train_file': '/opt/ml/input/data/train/train.csv',
                 'test_file':'/opt/ml/input/data/test/test.csv',
                 'text_column':'article',
                 'summary_column':'abstracts',
                 #'source_prefix': "summarize: ",
                 'do_train': True,
                 'do_eval': False,
                 'do_predict': True,
                 'predict_with_generate': True,
                 'output_dir': '/opt/ml/model',
                 'num_train_epochs': 3,
                 'learning_rate': 5e-5,
                 'seed': 7,
                 'fp16': True,
                 }
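Inside the container, SageMaker hands these hyperparameters to `run_summarization.py` as command-line arguments of the form `--name value`, which the script parses with transformers' `HfArgumentParser`. The sketch below approximates that conversion in plain Python so you can see what the script receives; the exact serialization is done by SageMaker's training toolkit, so treat this helper (a hypothetical name) and its output format as an approximation.

```python
from typing import Dict, List, Union

HyperValue = Union[str, int, float, bool]


def to_cli_args(hyperparameters: Dict[str, HyperValue]) -> List[str]:
    """Approximate how hyperparameters become '--name value' command-line arguments."""
    args: List[str] = []
    for name, value in hyperparameters.items():
        args.extend([f"--{name}", str(value)])
    return args


print(to_cli_args({"per_device_train_batch_size": 2, "do_train": True, "learning_rate": 5e-5}))
# ['--per_device_train_batch_size', '2', '--do_train', 'True', '--learning_rate', '5e-05']
```

This is also why the values in the dictionary should be plain strings, numbers, or booleans.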

To speed things up, we enable SageMaker's distributed data parallel library (`smdistributed`), which splits each training batch across the GPUs of all training instances and processes the shards in parallel. It needs multiple GPUs to pay off (here we use two instances) and is only supported on a limited set of instance types: `ml.p3.16xlarge`, `ml.p3dn.24xlarge`, `ml.p4d.24xlarge`, and `ml.p4de.24xlarge`.
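With data parallelism every GPU processes its own `per_device_train_batch_size` examples per step, so the effective (global) batch size grows with the number of GPUs. Worked out for the configuration used below, assuming 8 GPUs per `ml.p4d.24xlarge` instance and no gradient accumulation (both helper names are our own):

```python
import math


def global_batch_size(per_device: int, gpus_per_instance: int, instance_count: int,
                      grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all data-parallel workers."""
    return per_device * gpus_per_instance * instance_count * grad_accum_steps


def steps_per_epoch(num_examples: int, global_batch: int) -> int:
    """Optimizer steps needed to see every example once (last batch may be partial)."""
    return math.ceil(num_examples / global_batch)


# per_device_train_batch_size=2, 8 GPUs per ml.p4d.24xlarge (assumed), instance_count=2
batch = global_batch_size(2, 8, 2)
print(batch)                         # 32
print(steps_per_epoch(1000, batch))  # 32 steps for a made-up set of 1000 examples
```

A larger global batch means fewer optimizer steps per epoch, which is one reason the learning rate sometimes needs retuning when the GPU count changes.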

**Note:** If you choose not to use distributed data parallel, comment out the `distribution = {...}` line and the `distribution = distribution` argument in the Hugging Face estimator.

Now we can set up our Hugging Face estimator, which defines the SageMaker-managed execution environment, i.e. the Docker container mentioned above. The container runs our training script on the instance type we choose and passes our hyperparameters to the script. When the job finishes, the estimator stores the output in a directory of the default bucket named after the training job (starting with 'huggingface-pytorch-training'), including a model.tar.gz file that we can deploy to an endpoint.

In [None]:
# configuration for running training on smdistributed Data Parallel
distribution = {'smdistributed':{'dataparallel':{ 'enabled': True }}}

from sagemaker.huggingface import HuggingFace

# create the Estimator
huggingface_estimator = HuggingFace(
      entry_point='run_summarization.py', # script
      source_dir='./examples/pytorch/summarization', # relative path to example
      git_config=git_config,
      instance_type='ml.p4d.24xlarge',
      checkpoint_s3_uri=f's3://{sess.default_bucket()}/checkpoints1',
      #checkpoint_local_path='/opt/ml/checkpoints',
      instance_count=2,
      transformers_version='4.26.0',
      pytorch_version='1.13.1',
      py_version='py39',
      role=role,
      hyperparameters = hyperparameters,
      distribution = distribution
)

Start the training job. You can also monitor it and view its logs in the console under `SageMaker > Training > Training jobs`.

**Warning:** If you receive a **ResourceLimitExceeded** error, your AWS account has reached its service quota for this instance type in the current region. To solve it, either request a quota increase in the Service Quotas console or choose another instance type. If the job instead fails because AWS does not have enough capacity for the instance type at the moment, try running the training again later.

In [None]:
# starting the train job
huggingface_estimator.fit(data)

### Deploy the Model

Here we create an endpoint and deploy our model to it. The next step is to send the model some inputs and check that it produces an accurate and concise summary.

We deploy our endpoint on an instance with 1 GPU (`ml.g4dn.xlarge`); deployment can take about 20 minutes. Feel free to try other instance types with more GPUs.
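Once deployed, the endpoint accepts JSON with an `inputs` field and an optional `parameters` field, which is the request format the payload later in this notebook follows. A stdlib-only sketch that builds and checks such a request body before sending it; the helper name and its validation rules (temperature must be positive, top_p must lie in (0, 1]) are our own assumptions:

```python
import json
from typing import Any, Dict


def build_request_body(text: str, **parameters: Any) -> str:
    """Serialize a summarization request in the {"inputs": ..., "parameters": ...} format."""
    if "temperature" in parameters and parameters["temperature"] <= 0:
        raise ValueError("temperature must be positive")
    if "top_p" in parameters and not 0 < parameters["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    body: Dict[str, Any] = {"inputs": text}
    if parameters:
        body["parameters"] = parameters
    return json.dumps(body)


print(build_request_body("Text to summarize.", max_length=200, do_sample=True, top_p=0.95))
```

A body like this could be sent with boto3's `sagemaker-runtime` client via `invoke_endpoint(EndpointName=predictor.endpoint_name, ContentType="application/json", Body=body)`; `predictor.predict` performs the equivalent serialization for a Python dict.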

In [None]:
predictor = huggingface_estimator.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")

**Optional** 

If your model takes a long time to train, you can come back to your notebook later: the training job keeps running on SageMaker independently of the notebook. In that case, the following code is another way to get the model.tar.gz file from your bucket and deploy it. Remember that you can monitor your training in the console under `SageMaker > Training > Training jobs`.

You may need to search your default bucket for the model.tar.gz file: it will be in the `output` subdirectory of one of the directories whose name starts with 'huggingface-pytorch-training'.
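Instead of browsing the bucket by hand, you can filter a listing of its object keys. The sketch below is a pure function over key strings (its name is hypothetical) that assumes the `<job-name>/output/model.tar.gz` layout used in the `model_data` path. In the notebook, the keys could come from boto3's `list_objects_v2` paginator; that call is only shown as a comment and is not executed here.

```python
from typing import Iterable, List


def find_model_artifacts(keys: Iterable[str],
                         job_prefix: str = "huggingface-pytorch-training") -> List[str]:
    """Return training-job directories that contain output/model.tar.gz, sorted by name.

    Job names end with a timestamp, so the last entry is usually the most recent job.
    """
    dirs = []
    for key in keys:
        parts = key.split("/")
        if len(parts) == 3 and parts[0].startswith(job_prefix) and parts[1:] == ["output", "model.tar.gz"]:
            dirs.append(parts[0])
    return sorted(dirs)


# keys could come from boto3 (not executed here), e.g.:
# paginator = boto3.client("s3").get_paginator("list_objects_v2")
# keys = [obj["Key"] for page in paginator.paginate(Bucket=sess.default_bucket())
#         for obj in page.get("Contents", [])]
example_keys = [  # made-up keys
    "train.csv",
    "huggingface-pytorch-training-2023-03-01-10-00-00-000/output/model.tar.gz",
    "huggingface-pytorch-training-2023-03-02-09-00-00-000/output/model.tar.gz",
    "checkpoints1/checkpoint-500/pytorch_model.bin",
]
print(find_model_artifacts(example_keys))
```

The last directory printed is the one to paste into `huggingface_directory`.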

In [None]:
from sagemaker.huggingface.model import HuggingFaceModel
#huggingface directory that holds your model.tar.gz file
huggingface_directory= "<enter in the directory name>"

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=f's3://{sess.default_bucket()}/{huggingface_directory}/output/model.tar.gz',  # path to your trained SageMaker model
   role=role,                                            # IAM role with permissions to create an endpoint
   transformers_version="4.26",                           # Transformers version used
   pytorch_version="1.13",                                # PyTorch version used
   py_version='py39',                                    # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.g4dn.xlarge"
)

### Submit Inputs and Parameters to the Model

Now we can pass in some text for our model to summarize. Below, our prompt is a paragraph about SARS-CoV-2, and we also specify some generation parameters that control how the model produces its output, so that we get a concise summary of what the prompt is about.

- **max_length:** Maximum number of tokens to generate.
- **num_return_sequences:** Number of different outputs to generate. For our example we want a single sequence.
- **temperature:** Controls randomness by rescaling the token probabilities. Values below 1 make the distribution sharper (more predictable output), values above 1 make it flatter (more diverse output). Must be a positive number.
- **top_p (nucleus sampling):** The cumulative probability cutoff for token selection: the model samples only from the smallest set of most likely tokens whose probabilities add up to top_p. Lower values mean sampling from a smaller, more top-weighted nucleus. Must be a number from 0 to 1.
- **top_k:** Sample only from the k most likely next tokens at each step. Lower values restrict sampling to fewer, more probable tokens, which makes the output more focused but less varied.
- **do_sample:** Enables sampling. Without it, the model picks the most likely tokens (greedy or beam search) and temperature, top_k, and top_p have no effect.
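To see how temperature, top_k, and top_p interact, the following plain-Python sketch applies them in that order to a made-up list of next-token scores (logits) and returns the renormalized distribution that sampling would draw from. It follows the same idea as the Hugging Face generation logic but is a simplified illustration with hypothetical helper names, not the library's implementation.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sampling_distribution(logits: List[float], temperature: float = 1.0,
                          top_k: int = 0, top_p: float = 1.0) -> Dict[int, float]:
    """Return {token index: probability} left after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    probs = softmax([score / temperature for score in logits])
    # rank token indices from most to least likely
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    # renormalize over the survivors before applying the nucleus (top-p) cutoff
    mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break  # smallest set whose probability reaches top_p
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


scores = [2.0, 1.0, 0.5, -1.0]  # made-up logits for four candidate tokens
print(sampling_distribution(scores, temperature=1.0, top_k=3, top_p=0.8))  # keeps tokens 0 and 1
print(sampling_distribution(scores, temperature=0.5))  # sharper: token 0 gets more probability
```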

In [None]:
prompt =  """Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a
highly transmissible and pathogenic coronavirus that emerged in late 2019 and has 
caused a pandemic of acute respiratory disease, named ‘coronavirus disease 2019’ (COVID-19), 
which threatens human health and public safety. In this Review, we describe the basic virology of 
SARS-CoV-2, including genomic characteristics and receptor use, highlighting its key difference 
from previously known coronaviruses. We summarize current knowledge of clinical, epidemiological and 
pathological features of COVID-19, as well as recent progress in animal models and antiviral treatment 
approaches for SARS-CoV-2 infection. We also discuss the potential wildlife hosts and zoonotic origin 
of this emerging virus in detail."""

payload = {
    "inputs": prompt,
    "parameters": {
        "max_length": 2000,
        "num_return_sequences": 1,
        "temperature": 0.6,
        "top_k": 50,
        "top_p": 0.95,
        "do_sample": True,
    },
}
predictor.predict(payload)

**Warning:** Once you are done, don't forget to delete your endpoint, model, and buckets, and to shut down or delete your SageMaker notebook to avoid additional charges!

In [None]:
predictor.delete_model()
predictor.delete_endpoint()