### Fine tuning & deploying Flan-T5-Large to Amazon Bedrock using Custom Model Import

This notebook covers the step-by-step process of fine tuning a [FLAN-T5 large](https://huggingface.co/google/flan-t5-large) model and deploying it using Bedrock's [Custom Model Import](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html) feature.
The fine tuning data we will be using is based on medical terminology and can be found on HuggingFace [here](https://huggingface.co/datasets/gamino/wiki_medical_terms). Many disciplines use industry-specific jargon, and an LLM may not correctly understand the words, context, or abbreviations involved. Fine tuning a model on medical terminology, as in this case, gives the LLM the ability to understand that jargon and answer the questions a user might have.

This notebook will use the HuggingFace Transformers library to fine tune FLAN-T5-Large. Because the fine tuning process is "local", you can run this notebook anywhere as long as the proper compute is available. This notebook was originally created in SageMaker Studio using an "ml.g5.16xlarge" instance.

The resulting files are imported into Amazon Bedrock via Custom Model Import.

WARNING: This method of Custom Model Import will only work with "FLAN-T5-Large". "FLAN-T5-Small" is incompatible with Bedrock Custom Model Import in its current state. The number of attention heads in the model needs to be a multiple of 4 so that the model can be sharded accordingly on the GPU, and FLAN-T5-Small has 6 heads. You can check this in the model's config.json file under the "num_heads" parameter.
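The head-count rule in the warning above can be checked before any training starts. The sketch below is illustrative only: `heads_are_shardable` and `check_config_file` are hypothetical helpers, and the multiple of 4 comes from the warning rather than from a published Bedrock constant.

```python
import json


def heads_are_shardable(config: dict, multiple: int = 4) -> bool:
    """Return True when config["num_heads"] is divisible by `multiple`."""
    return config["num_heads"] % multiple == 0


def check_config_file(path: str) -> bool:
    """Load a local config.json (the path is supplied by the caller) and apply the check."""
    with open(path, "r", encoding="utf-8") as f:
        return heads_are_shardable(json.load(f))


# FLAN-T5-Small reports 6 heads (see the warning above), so it fails the check.
print(heads_are_shardable({"num_heads": 6}))   # False
# 16 is used here only as an example of a head count that is divisible by 4.
print(heads_are_shardable({"num_heads": 16}))  # True
```

To check a downloaded model, pass the path of its config.json to `check_config_file`.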

Below is an overview of the architecture we will cover in this notebook:

![Notebook Architecture](./images/notebook-architecture.jpg "Notebook Architecture")

### Installs & Imports 

We will be using the HuggingFace Transformers library to pull a pretrained model from the Hub and fine tune it. The dataset we will be fine tuning on will also be pulled from HuggingFace.

In [3]:
%%capture
!pip install datasets --quiet
!pip install transformers[torch] --quiet
!pip install tokenizers --quiet
!pip install sentencepiece --quiet
!pip install huggingface_hub --quiet

In [4]:
import numpy as np
from datasets import load_dataset
from transformers import T5Tokenizer, DataCollatorForSeq2Seq
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer

### Pulling a pre-trained model & a dataset from HuggingFace 

As mentioned at the beginning of the notebook, we will be fine tuning Google's FLAN-T5-Large from HuggingFace. This model is free to pull, and no HuggingFace account is required.

The dataset we are pulling can be viewed [here](https://huggingface.co/datasets/gamino/wiki_medical_terms). It contains more than 6,000 medical terms and their wiki definitions.

In [5]:
#Load model
model_name = "google/flan-t5-large"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [6]:
#Load data from huggingface
ds = load_dataset("gamino/wiki_medical_terms")
ds

DatasetDict({
    train: Dataset({
        features: ['page_title', 'page_text', '__index_level_0__'],
        num_rows: 6861
    })
})

### Processing the dataset 

First, we will split the dataset into a training set and a test set. The test size in the cell below determines the fraction of the dataset held out for testing, while the rest is used for training (for example, a test size of 0.1 means 10% is used for testing and 90% is used for training). Feel free to change this number.

In [7]:
#Using 70% of the dataset to train
ds = ds["train"].train_test_split(test_size=0.3)
ds

DatasetDict({
    train: Dataset({
        features: ['page_title', 'page_text', '__index_level_0__'],
        num_rows: 4802
    })
    test: Dataset({
        features: ['page_title', 'page_text', '__index_level_0__'],
        num_rows: 2059
    })
})
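The row counts in the output above follow directly from the 0.3 test size. The sketch below reproduces them; `split_sizes` is a hypothetical helper, and the rounding rule (test split rounded up, train split takes the remainder) is an assumption that happens to agree with the counts shown.

```python
import math


def split_sizes(num_rows: int, test_size: float) -> tuple[int, int]:
    """Return (train_rows, test_rows) for a fractional test size."""
    # Assumption: the test split is rounded up and the train split takes the rest.
    n_test = math.ceil(num_rows * test_size)
    return num_rows - n_test, n_test


print(split_sizes(6861, 0.3))  # (4802, 2059), matching the DatasetDict above
```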

We want to train our model in a Q & A format. This will allow the model to answer questions about specific medical terms.

There are two columns of interest to us: "page_title" (medical term) & "page_text" (definition/explanation)

In the cell below, we will add "What is " as a prefix to the medical term to turn the title into a question, and we will keep "page_text" as is to act as the answer.

In [8]:
#Process data to fit a Q & A format 
prefix = "What is "

# Define the preprocessing function
def preprocess_data(raw_data):
   # Transform page title into a question:
   inputs = [prefix + term for term in raw_data["page_title"]]
   model_inputs = tokenizer(inputs, max_length=128, truncation=True)
   # keep explanations as is:
   definitions = tokenizer(text_target=raw_data["page_text"], 
                      max_length=512,         
                      truncation=True)
   model_inputs["labels"] = definitions["input_ids"]
   return model_inputs

In [9]:
# Map the preprocessing function across our dataset
ds = ds.map(preprocess_data, batched=True)

Map:   0%|          | 0/4802 [00:00<?, ? examples/s]

Map:   0%|          | 0/2059 [00:00<?, ? examples/s]
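Before tokenization, the preprocessing step amounts to pairing each prefixed title with its unchanged definition. The sketch below shows that pairing on plain strings; `build_question` and `build_examples` are hypothetical helpers, and the sample row is a placeholder rather than a row taken from the dataset.

```python
PREFIX = "What is "  # same prefix string as in the preprocessing cell


def build_question(term: str, prefix: str = PREFIX) -> str:
    """Prepend the question prefix to a medical term."""
    return prefix + term


def build_examples(batch: dict) -> list[tuple[str, str]]:
    """Pair each generated question with its untokenized answer text."""
    questions = [build_question(term) for term in batch["page_title"]]
    return list(zip(questions, batch["page_text"]))


# Placeholder row for illustration only (not copied from the dataset).
batch = {"page_title": ["Paracetamol"], "page_text": ["(placeholder definition text)"]}
print(build_examples(batch))
```

Reusing the same helper when building prompts at inference time keeps the prompt format, including capitalization, identical to what the model saw during training.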

### Setting up the training job 

For this training job we are using HuggingFace's Trainer class. We are using the Seq2Seq trainer since T5 is an encoder-decoder architecture.

There are many hyperparameters that can be set when submitting a training job with the Trainer class. Many of them have default values, and the optimal values can vary based on which CPUs, GPUs, and dataset are being used to train the model.

HuggingFace has published guidance on optimizations that can be made when training on a single GPU [here](https://huggingface.co/docs/transformers/en/perf_train_gpu_one#gradient-accumulation)

In [10]:
# Set up training arguments
training_args = Seq2SeqTrainingArguments(
   output_dir="./results",
   eval_strategy="steps",
   learning_rate=4e-4,
   per_device_train_batch_size=2,
   per_device_eval_batch_size=2,
   weight_decay=0.01,
   save_total_limit=3,
   num_train_epochs=1, #can increase this to improve fine tuning results (will increase training time)
   predict_with_generate=True,
   push_to_hub=False,
   gradient_accumulation_steps=2,
   eval_accumulation_steps=2
)

In [11]:
#Setup Trainer 
trainer = Seq2SeqTrainer(
   model=model,
   args=training_args,
   train_dataset=ds["train"],
   eval_dataset=ds["test"],
   tokenizer=tokenizer,
   data_collator=data_collator
)

In [12]:
#Empty pytorch cache
import torch
torch.cuda.empty_cache()

### Submitting the training job 

On this GPU with 1 epoch set, training should take roughly 30 minutes. This is a deliberately fast fine tuning job so that we can get to the main idea of this notebook, which is to showcase Bedrock Custom Model Import. If you need higher performance from your fine tuned model, increase the number of epochs and adjust the other hyperparameters as needed.

In [13]:
trainer.train()

Step,Training Loss,Validation Loss
500,2.611,2.358274
1000,2.504,2.300562


TrainOutput(global_step=1200, training_loss=2.541583786010742, metrics={'train_runtime': 1523.1069, 'train_samples_per_second': 3.153, 'train_steps_per_second': 0.788, 'total_flos': 235554486976512.0, 'train_loss': 2.541583786010742, 'epoch': 0.9995835068721366})
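The `global_step=1200` and the epoch value just under 1.0 in the output above can be derived from the training arguments. The sketch below is a rough reconstruction, assuming a single GPU and assuming that a trailing partial group of accumulated batches is dropped; `optimizer_steps_per_epoch` is a hypothetical helper, not part of the Trainer API.

```python
import math


def optimizer_steps_per_epoch(num_examples: int,
                              per_device_batch_size: int,
                              grad_accum_steps: int,
                              num_devices: int = 1) -> int:
    """Estimate how many optimizer updates one epoch performs."""
    # Number of mini-batches the dataloader yields in one epoch.
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # Assumption: one update per full group of accumulated batches.
    return max(batches // grad_accum_steps, 1)


steps = optimizer_steps_per_epoch(4802, per_device_batch_size=2, grad_accum_steps=2)
print(steps)  # 1200

# Each update consumes 2 * 2 = 4 examples, so 1200 updates cover 4800 of 4802 rows.
print(steps * 2 * 2 / 4802)  # roughly 0.99958
```

The effective batch size here is 4 (2 examples per device times 2 accumulation steps), which is the value to keep in mind when adjusting the learning rate.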

### Model Inference 

We will now take our latest checkpoint and generate text with it

In [26]:
#Model Inference 
last_checkpoint = "./results/checkpoint-1200" #Load checkpoint that you want to test 

finetuned_model = T5ForConditionalGeneration.from_pretrained(last_checkpoint)
tokenizer = T5Tokenizer.from_pretrained(last_checkpoint)

med_term = "what is Dexamethasone suppression test" 

query = tokenizer(med_term, return_tensors="pt")
# no_repeat_ngram_size expects an integer n-gram size; 1 is what `True` evaluated to
output = finetuned_model.generate(**query, max_length=128, no_repeat_ngram_size=1)
answer = tokenizer.decode(output[0])

print(answer)

<pad>Dexamethasone suppression test (DST) is based on the results of an in vitro study. It was developed by Biogen, Inc and approved for medical use under UDA number NCT0377362. The FDA granted approval to bioGenesis Laboratories Ltd as well; however it has not been used clinically or commercialized since its introduction into medicine at least two years ago with no indication that this product should be available anywhere else except within Europe where there are restrictions regarding marketing authorization due either directly from Pharmaceuticals International Limited ("PharmA") nor through any other company
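The generated text above starts with a literal `<pad>` token because the output ids are decoded as is. A minimal sketch of a wrapper that strips special tokens is shown below; `answer_question` is a hypothetical helper that accepts any model and tokenizer exposing the `generate` and `decode` methods used in the cell above.

```python
def answer_question(model, tokenizer, question: str,
                    max_length: int = 128, **generate_kwargs) -> str:
    """Generate an answer and decode it without special tokens."""
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length, **generate_kwargs)
    # skip_special_tokens=True drops markers such as <pad> and </s> from the text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage with the objects loaded in the cell above (not executed here):
# print(answer_question(finetuned_model, tokenizer,
#                       "What is Dexamethasone suppression test"))
```

Extra decoding options such as `no_repeat_ngram_size` can be passed through `generate_kwargs`.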


### Model Upload 

With our model now generating text related to medical terminology, we will upload it to S3 so that it is ready for Bedrock Custom Model Import.

In [None]:
#Upload model to S3 Bucket
import boto3
import os
# Set up S3 client
s3 = boto3.client('s3')

# Specify your S3 bucket name and the prefix (folder) where you want to upload the files
bucket_name = 'your-bucket-here' # YOUR BUCKET HERE
model_name = "results/checkpoint-#" # YOUR LATEST CHECKPOINT HERE (found in the "results" folder in your notebook directory; replace the "#" with the latest checkpoint number)
prefix = 'flan-t5-large-medical/' + model_name

# Upload files to S3
def upload_directory_to_s3(local_directory, bucket, s3_prefix):
    for root, dirs, files in os.walk(local_directory):
        for file in files:
            local_path = os.path.join(root, file)
            relative_path = os.path.relpath(local_path, local_directory)
            # S3 keys always use forward slashes, regardless of the local OS
            s3_path = os.path.join(s3_prefix, relative_path).replace(os.sep, "/")
            
            print(f'Uploading {local_path} to s3://{bucket}/{s3_path}')
            s3.upload_file(local_path, bucket, s3_path)

# Call the function to upload the downloaded model files to S3
upload_directory_to_s3(model_name, bucket_name, prefix)
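A Trainer checkpoint folder also holds training state, which can make the upload considerably larger than the model itself. The sketch below plans the upload without touching S3 so the key layout can be reviewed first. `plan_uploads` is a hypothetical helper, and the `TRAINING_ONLY` file names are an assumption about what the checkpoint contains; whether Bedrock needs or ignores those files is not stated in this notebook, so verify before skipping them.

```python
import os
import posixpath

# Assumption: training-state files written into a checkpoint that are not
# required to load the model for inference.
TRAINING_ONLY = frozenset({
    "optimizer.pt", "scheduler.pt", "rng_state.pth",
    "trainer_state.json", "training_args.bin",
})


def plan_uploads(local_directory: str, s3_prefix: str, skip=TRAINING_ONLY):
    """Return (local_path, s3_key) pairs for every file that is not skipped."""
    plan = []
    for root, _dirs, files in os.walk(local_directory):
        for name in sorted(files):
            if name in skip:
                continue
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_directory)
            # Build the key with forward slashes regardless of the local OS.
            key = posixpath.join(s3_prefix, *relative.split(os.sep))
            plan.append((local_path, key))
    return plan


# Usage with the variables from the cell above (not executed here):
# for local_path, key in plan_uploads(model_name, prefix):
#     s3.upload_file(local_path, bucket_name, key)
```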

### Importing Model to Amazon Bedrock

Now that our model artifacts are uploaded to an S3 bucket, we can import them into Amazon Bedrock.

In the AWS console, go to the Amazon Bedrock page. On the left side, under "Foundation models", click "Imported models".

![Step 1](./images/step1.png "Step 1")


You can now click on "Import model"

![Step 2](./images/step2.png "Step 2")

In this next step you will have to configure:

1. Model Name 
2. Import Job Name 
3. Model Import Settings 
    a. Select Amazon S3 bucket 
    b. Select your bucket location (uploaded in the previous section)
4. Create an IAM role, or use an existing one (not shown in image)
5. Click Import (not shown in image)

![Step 3](./images/step3.png "Step 3")
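The same import can also be started programmatically instead of through the console. The sketch below is one possible approach using the `create_model_import_job` operation of the boto3 `bedrock` client. `build_import_job_request` and `start_import_job` are hypothetical helpers, and every name, ARN, and S3 URI shown is a placeholder to replace with your own values.

```python
def build_import_job_request(job_name: str, model_name: str,
                             role_arn: str, s3_uri: str) -> dict:
    """Assemble the keyword arguments for create_model_import_job."""
    return {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }


def start_import_job(bedrock_client, request: dict) -> str:
    """Submit the job; bedrock_client is expected to be boto3.client("bedrock")."""
    response = bedrock_client.create_model_import_job(**request)
    return response["jobArn"]


# Placeholder values for illustration only.
request = build_import_job_request(
    job_name="flan-t5-large-medical-import",
    model_name="flan-t5-large-medical",
    role_arn="arn:aws:iam::123456789012:role/your-import-role",
    s3_uri="s3://your-bucket-here/flan-t5-large-medical/results/checkpoint-#/",
)
print(request["modelDataSource"])

# To submit (not executed here):
# import boto3
# job_arn = start_import_job(boto3.client("bedrock"), request)
```

The role passed as `roleArn` needs read access to the S3 location, mirroring step 4 in the console flow.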


You will now be taken to the page below. Your model may take up to an hour to import. 

![Step 4](./images/step4.png "Step 4")

Once the import finishes, you can test your model in the playground or through the API!

![Playground](./images/playground.gif "Playground")
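For the API route, a minimal sketch using the `invoke_model` operation of the boto3 `bedrock-runtime` client is shown below. `invoke_imported_model` is a hypothetical helper, and the request fields `prompt` and `max_tokens` are assumptions: the payload schema for an imported model is not covered in this notebook, so check it against the Bedrock documentation for your model before relying on it.

```python
import json


def invoke_imported_model(runtime_client, model_arn: str, prompt: str,
                          max_tokens: int = 128) -> dict:
    """Send a prompt to an imported model and return the parsed JSON response.

    runtime_client is expected to be boto3.client("bedrock-runtime").
    """
    # Assumed request schema; adjust the field names to match your model.
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens})
    response = runtime_client.invoke_model(
        modelId=model_arn,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream, so read it before parsing.
    return json.loads(response["body"].read())


# Usage (not executed here); the ARN is shown on the imported model's page:
# import boto3
# result = invoke_imported_model(boto3.client("bedrock-runtime"),
#                                "your-imported-model-arn",
#                                "What is Dexamethasone suppression test")
```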

END OF NOTEBOOK