# Fine Tune Amazon Titan Text Large model with Bedrock

This notebook walks through the steps required to fine-tune a model. With Amazon Bedrock you can customize a model to improve its performance on specific tasks by providing a labeled training dataset. You upload the training and validation datasets to S3 and pass their S3 URIs to the fine-tuning job. You then set the hyperparameters for the job, start it, and monitor its status. Once the job is complete, you can invoke the custom model.

 
(This notebook was tested on SageMaker Studio ml.m5.2xlarge instance with Datascience 3.0 kernel)

There are multiple options for customizing a model to meet your requirements. In-context learning is the fastest approach because it uses the pre-trained model as-is. Optionally, retrieval-augmented generation (RAG) can inject additional context from a knowledge base into the prompt.

When the output does not meet your requirements, or when you have domain-specific data, consider fine-tuning. Note that fine-tuning requires additional effort and compute.

<img src ="images/llm_customization_options.png" width="600"/>

## Prerequisites

In [None]:
# Check that the Python version is 3.8 or later (required by LangChain, if you plan to use it)
import sys
print(sys.version)
sys.version_info >= (3, 8)

## Install the SDK

NOTE: This notebook requires the Bedrock Python SDK. Install the Bedrock SDK if you haven't done so yet. Refer to the 00_bedrock_onboarding.ipynb notebook for the steps to install it and to uninstall any previous version.

In [None]:
!pip install datasets

In [None]:
!pip install jinja2

## Create IAM Roles and Assign Permissions

You need two IAM roles:

Role A: Notebook execution
--
This notebook needs permission to call the Bedrock service. Attach a policy similar to the following to the notebook's execution role:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Bedrock",
            "Effect": "Allow",
            "Action": "bedrock:*",
            "Resource": "*"
        }
    ]
}
```
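If you prefer to attach this policy from code instead of the IAM console, the following is a minimal sketch using boto3's IAM `put_role_policy` call. The role name and the inline policy name `BedrockAccess` are hypothetical placeholders, not values defined in this notebook, and the caller needs IAM permissions to modify the role.

```python
import json


def bedrock_access_policy() -> dict:
    """Return the Role A policy shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Bedrock",
                "Effect": "Allow",
                "Action": "bedrock:*",
                "Resource": "*",
            }
        ],
    }


def attach_inline_policy(role_name: str, policy_name: str, policy: dict) -> None:
    """Attach `policy` to `role_name` as an inline policy."""
    import boto3  # imported lazily so the policy builder works without AWS credentials

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


# Example (hypothetical role and policy names):
# attach_inline_policy("<notebook-execution-role>", "BedrockAccess", bedrock_access_policy())
```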

Role B: Data access
--
To start the fine-tuning job, Role A passes a second IAM role (Role B) to Bedrock, and Bedrock assumes Role B to access the job's data. Create Role B with the following trust relationship:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "<account>"
        },
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:bedrock:<region>:<account>:model-customization-job/*"
        }
      }
    }
  ]
}
```
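The trust policy above can also be generated with the `<account>` and `<region>` placeholders filled in, and Role B created with boto3's IAM `create_role` call. This is a minimal sketch; the function names and the role name you pass in are hypothetical, not part of the notebook.

```python
import json


def bedrock_trust_policy(account_id: str, region: str) -> dict:
    """Build the Role B trust policy shown above with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnEquals": {
                        "aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-customization-job/*"
                    },
                },
            }
        ],
    }


def create_fine_tuning_role(role_name: str, account_id: str, region: str) -> str:
    """Create Role B with the trust policy above and return its ARN."""
    import boto3  # lazy import: only needed when actually calling IAM

    iam = boto3.client("iam")
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(bedrock_trust_policy(account_id, region)),
    )
    return response["Role"]["Arn"]
```

The account ID can be looked up with `boto3.client("sts").get_caller_identity()["Account"]`.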
Role B also needs permissions on the S3 buckets that hold the training and validation datasets, and on the S3 location where the fine-tuning job writes its output:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket",
                "s3:ListObjects"
            ],
            "Resource": [
                "arn:aws:s3:::<train_set_bucket>",
                "arn:aws:s3:::<train_set_bucket>/*",
                "arn:aws:s3:::<test_set_bucket>",
                "arn:aws:s3:::<test_set_bucket>/*",
                "arn:aws:s3:::<job_output_bucket>",
                "arn:aws:s3:::<job_output_bucket>/*"
            ]
        }
    ]
}

```
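In this notebook the training data, validation data, and job output all end up in the same bucket (the SageMaker default bucket), so the resource list above collapses to one bucket. A minimal sketch, using a hypothetical helper name, that builds the same policy for any set of buckets and drops duplicates:

```python
def s3_data_access_policy(*bucket_names: str) -> dict:
    """Build the Role B S3 policy above for the given buckets.

    The same bucket may be passed more than once; duplicates are removed
    while preserving the order in which buckets first appear.
    """
    resources = []
    for name in dict.fromkeys(bucket_names):  # de-duplicate, keep first-seen order
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself (list operations)
        resources.append(f"arn:aws:s3:::{name}/*")  # objects in the bucket (get/put)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Same action list as the policy above
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:ListObjects"],
                "Resource": resources,
            }
        ],
    }
```

For example, once `bucket = sagemaker_session.default_bucket()` has been set later in the notebook, `s3_data_access_policy(bucket, bucket, bucket)` yields a policy covering just that bucket.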


Also make sure Role A is allowed to pass Role B to Bedrock. Define this as an IAM policy similar to the following:

```
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": [
				"iam:GetRole",
				"iam:PassRole"
			],
			"Resource": "arn:aws:iam::<account>:role/*"
		}
	]
}

```
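The policy above lets Role A pass any role in the account. A tighter variant, sketched below under the assumption that you know Role B's ARN, limits the resource to Role B and uses the `iam:PassedToService` condition key so the role can only be passed to Bedrock. `iam:GetRole` sits in its own statement because `iam:PassedToService` is only present on `PassRole` requests. The function name is hypothetical.

```python
def pass_role_policy(fine_tuning_role_arn: str) -> dict:
    """Allow Role A to read and pass only Role B, and to pass it only to Bedrock."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:GetRole",
                "Resource": fine_tuning_role_arn,
            },
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": fine_tuning_role_arn,
                # Only allow the role to be handed to Amazon Bedrock
                "Condition": {"StringEquals": {"iam:PassedToService": "bedrock.amazonaws.com"}},
            },
        ],
    }
```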


## Restart Kernel

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)  

In [2]:
import sagemaker
import boto3
session = boto3.Session()
sagemaker_session = sagemaker.Session()
studio_region = sagemaker_session.boto_region_name 
#sagemaker_session.get_caller_identity_arn()

In [81]:
#Check if basic commands work with bedrock client
bedrock = boto3.client('bedrock' , 'us-east-1', endpoint_url='https://bedrock.us-east-1.amazonaws.com')
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': 'c02947cb-ef0f-4aa3-ab8c-23b4fee9c746',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Wed, 23 Aug 2023 18:59:02 GMT',
   'content-type': 'application/json',
   'content-length': '1166',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'c02947cb-ef0f-4aa3-ab8c-23b4fee9c746'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-e1t-medium',
   'modelId': 'amazon.titan-e1t-medium'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/stability.stable-diffusion-xl',
   'modelId': 'stability.stable-diffusion-xl'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-grande-instruct',
   'modelId': 'ai21.j2-grande-instruct'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-jumbo-instruct',
   'modelId': 'ai21.j2-jumbo-i
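The raw response is verbose. A small helper (hypothetical name) pulls out just the model IDs from the `modelSummaries` list shown above, which makes it easy to confirm that the base model used later, `amazon.titan-tg1-large`, is available:

```python
def model_ids(response: dict) -> list:
    """Return the model IDs from a list_foundation_models response."""
    return [summary["modelId"] for summary in response.get("modelSummaries", [])]


# Usage with the bedrock client created above:
# ids = model_ids(bedrock.list_foundation_models())
# "amazon.titan-tg1-large" in ids
```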

## Download datasets
In this step, we use the [SQuAD dataset](https://arxiv.org/abs/1606.05250) for fine-tuning. Each question in the dataset belongs to a category, given by the `title` field. We filter for a single category and use the filtered set for fine-tuning.

In [82]:
from datasets import load_dataset
raw_datasets = load_dataset("squad")

In [6]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})

In [7]:
category = 'Solar_energy'
raw_train_set = raw_datasets['train']
raw_test_set = raw_datasets['validation']
train_set = raw_train_set.filter(lambda x: x['title'] == category)
test_set = raw_test_set.filter(lambda x: x['title'] == category)

if len(test_set) == 0: #If there is no test data set available, split the train set for this
    split_set = train_set.train_test_split(test_size=0.1)
    train_set,test_set = split_set['train'],split_set['test']

train_set, test_set

(Dataset({
     features: ['id', 'title', 'context', 'question', 'answers'],
     num_rows: 225
 }),
 Dataset({
     features: ['id', 'title', 'context', 'question', 'answers'],
     num_rows: 25
 }))
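`Solar_energy` yields 225 training rows here. To pick a different category with enough examples, you can count questions per title. A minimal sketch using `collections.Counter` (the helper name is hypothetical):

```python
from collections import Counter


def top_categories(titles, n=5):
    """Return the n most common titles as (title, count) pairs."""
    return Counter(titles).most_common(n)


# Usage with the SQuAD split loaded above:
# top_categories(raw_datasets['train']['title'])
```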

## Prepare Training Dataset
In this step, we convert the downloaded dataset into JSONL files and upload them to an S3 bucket. The Bedrock customization job expects the training and validation data in JSONL format (one JSON record per line), and the file should not end with a newline character.
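The notebook renders each record through a Jinja template and strips double quotes so the output stays valid JSON. An alternative, sketched below, is to let `json.dumps` handle escaping of quotes, backslashes, and newlines. The key names `prompt` and `completion` are assumptions; match them to the record format your `templates/fine_tuning.txt` produces.

```python
import json


def write_jsonl(records, file_name, prompt_key="prompt", completion_key="completion"):
    """Write (prompt, completion) pairs as JSONL with no trailing newline.

    Key names are assumptions; adjust them to the format your job expects.
    Returns the number of records written.
    """
    lines = [
        json.dumps({prompt_key: prompt, completion_key: completion})  # escapes quotes/newlines
        for prompt, completion in records
    ]
    with open(file_name, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))  # newline between records, none after the last
    return len(lines)
```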

In [8]:
import pathlib , os, json
fine_tune_dataset_path = 'data/finetuned'
pathlib.Path(fine_tune_dataset_path).mkdir(parents=True, exist_ok=True)

In [90]:
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader('templates'))
squad_template = env.get_template('fine_tuning.txt')

def create_data_file(data, file_name):
    """Render each SQuAD item with the Jinja template and write one record per line."""
    lines = []
    for item in data:
        c, q, answers = item['context'], item['question'], item['answers']['text']
        if len(answers) == 0:
            continue  # skip questions that have no answer
        a = answers[0]
        # Remove double quotes from context, question and answer so the rendered JSON stays valid
        c, q, a = c.replace('"', ''), q.replace('"', ''), a.replace('"', '')
        lines.append(squad_template.render(context=c, question=q, answer=a))
    with open(file_name, 'w') as f:
        # Newline between records only, so the file does not end with a newline
        f.write('\n'.join(lines))
    print(f'File {file_name} created with {len(lines)} rows')

In [116]:
from io import StringIO
from urllib.parse import urlparse
import boto3
import pandas as pd

def split_s3_path(s3_uri):
    """Split 's3://bucket/key' into ('bucket', 'key')."""
    parse_result = urlparse(s3_uri, allow_fragments=False)
    return parse_result.netloc, parse_result.path.lstrip('/')

def s3_csv_to_df(s3_uri):
    """Read a CSV object from S3 into a pandas DataFrame."""
    s3 = boto3.client('s3')
    bucket, object_key = split_s3_path(s3_uri)
    csv = s3.get_object(Bucket=bucket, Key=object_key)
    csvs = csv['Body'].read().decode('utf-8')
    df = pd.read_csv(StringIO(csvs), sep=',')
    return df


In [10]:
train_file_name = f'{fine_tune_dataset_path}/train_data.jsonl'
test_file_name = f'{fine_tune_dataset_path}/test_data.jsonl'

In [None]:
create_data_file(train_set,train_file_name)
create_data_file(test_set,test_file_name)
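Before uploading, it can help to check the generated files locally. A minimal sketch (hypothetical helper) that verifies every line parses as JSON, that each record has the expected keys, and that the file does not end with a newline. The default key names are assumptions; pass the keys your template actually emits.

```python
import json


def validate_jsonl(file_name, required_keys=("prompt", "completion")):
    """Validate a JSONL file and return its number of records.

    Raises ValueError if the file ends with a newline or a record lacks a key;
    json.JSONDecodeError (a ValueError subclass) if a line is not valid JSON.
    """
    with open(file_name, encoding="utf-8") as f:
        text = f.read()
    if text.endswith("\n"):
        raise ValueError(f"{file_name} ends with a newline character")
    count = 0
    for line_no, line in enumerate(text.split("\n"), start=1):
        record = json.loads(line)
        missing = [k for k in required_keys if k not in record]
        if missing:
            raise ValueError(f"line {line_no} is missing keys {missing}")
        count += 1
    return count


# Usage (adjust required_keys to your template):
# validate_jsonl(train_file_name), validate_jsonl(test_file_name)
```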

### Upload the locally created files to S3 and get their URIs

In [None]:
bucket = sagemaker_session.default_bucket()
fine_tuning_prefix = 'bedrock_fine_tuning'
train_data_s3_path = sagemaker_session.upload_data(train_file_name, bucket=bucket, key_prefix=fine_tuning_prefix)
test_data_s3_path = sagemaker_session.upload_data(test_file_name, bucket=bucket, key_prefix=fine_tuning_prefix)

train_data_s3_path,test_data_s3_path

## Start the Fine tuning job
We prepare the inputs to the fine-tuning job (also called a model customization job) and start the job.
Make sure you have set up the IAM permissions explained in the previous steps before proceeding.

Besides the hyperparameters, we define the names (job name, custom model name), tags attached to the job and to the custom model for tracking, the locations of the training and test sets, and the output path where the fine-tuning job saves the training and validation metrics.

Optionally, you can supply a VPC configuration so that the fine-tuning job's calls to fetch training and validation data are routed through your VPC and PrivateLink to S3. Security groups define access controls for the fine-tuning job.

The fine-tuning job supports the following hyperparameters:

- Epochs
- Batch size
- Learning Rate
- Learning rate warmup steps
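The cell below passes these hyperparameters as strings, under the key names `epochCount`, `batchSize`, `learningRate`, and `learningRateWarmupSteps`. A minimal sketch of a hypothetical helper that builds the same dict from numeric values, with basic sanity checks:

```python
def build_hyper_params(epochs=1, batch_size=1, learning_rate=0.00005, warmup_steps=0):
    """Build the hyperParameters dict used by create_model_customization_job below.

    Values are converted to strings, as in the notebook's own cell.
    """
    if epochs < 1 or batch_size < 1:
        raise ValueError("epochs and batch_size must be at least 1")
    if not learning_rate > 0:
        raise ValueError("learning_rate must be positive")
    if warmup_steps < 0:
        raise ValueError("warmup_steps must be non-negative")
    # Fixed-point formatting so 0.00005 becomes "0.00005" rather than "5e-05"
    lr = f"{learning_rate:.10f}".rstrip("0").rstrip(".")
    return {
        "epochCount": str(epochs),
        "batchSize": str(batch_size),
        "learningRate": lr,
        "learningRateWarmupSteps": str(warmup_steps),
    }
```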

In [121]:
from uuid import uuid4
from datetime import datetime

#Define Hyperparameters
hyper_params = {"epochCount" : "1","batchSize":"1", "learningRate": "0.00005","learningRateWarmupSteps":"0"}

#Define names (Job, Model)
base_model = "amazon.titan-tg1-large"
client_token = str(uuid4())
custom_model_name= category.lower() + '_model'
job_name = f"bedrock-titan-{datetime.now().strftime('%Y%m-%d%H-%M%S')}"

#Define tags for tracking purposes
model_tags = [{"key": "custom_model_type","value": category.lower()}]
job_tags = [{"key": "base_model_type","value": base_model.replace(".","-")}]


output_s3_path = f's3://{bucket}/{fine_tuning_prefix}/output/'

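# Set this to the ARN of Role B (the data access role created above).
# sagemaker_session.get_caller_identity_arn() would return the notebook's execution role (Role A) instead.
fine_tuning_role = '<ADD FINE TUNING ROLE>'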

#Optional Setup VPC configuration
# vpc_config = {
#     "securityGroupIds":["sg-1","sg-2"],
#     "subnetIds":["subnet-a", "subnet-b"]
# }

client_token, custom_model_name, job_name

('08327152-a75d-4cf6-9907-8fa0b56cbe0c',
 'solar_energy_model',
 'bedrock-titan-202308-2321-3904')

### Start the job using create_model_customization_job API
NOTE: The job might take up to 30 minutes to complete, depending on the size of the training and validation datasets and on hyperparameters such as the number of epochs.

In [None]:
bedrock.create_model_customization_job(    
    baseModelIdentifier = base_model,
    clientRequestToken = client_token,
    customModelName = custom_model_name,
    customModelTags = model_tags,
    jobTags=job_tags, 
    hyperParameters = hyper_params,
    jobName=job_name,
    outputDataConfig = {"s3Uri": output_s3_path},
    trainingDataConfig = {"s3Uri" : train_data_s3_path},
    validationDataConfig =  {"validators": [ {"s3Uri": test_data_s3_path}]},
    roleArn = fine_tuning_role,
    #vpcConfig = vpc_config
)

## Monitor the Fine tuning job
Once the job is submitted, the response contains the job ARN. You can monitor the job with the list_model_customization_jobs and get_model_customization_job APIs.
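Instead of re-running the status cell by hand, you can poll until the job reaches a final state. A minimal sketch of a generic polling helper (hypothetical name); the terminal status values other than `Completed` are assumptions, so check them against the statuses your SDK version reports.

```python
import time


def wait_for_status(get_status, terminal_states=("Completed", "Failed", "Stopped"),
                    poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Call get_status() until it returns one of terminal_states or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Usage with the bedrock client and job_name defined above:
# wait_for_status(lambda: bedrock.get_model_customization_job(jobIdentifier=job_name)['status'])
```

The `sleep` parameter is injectable so the loop can be exercised without waiting.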

In [None]:
bedrock.list_model_customization_jobs(nameContains=job_name)

In [106]:
# To check a previously submitted job instead, uncomment and set its name:
# job_name = 'bedrock-titan-202308-2316-5550'
job_detail = bedrock.get_model_customization_job(jobIdentifier=job_name)

In [None]:
job_detail

In [None]:
job_detail['status'], job_detail['jobArn']

## Stop Model customization job (Optional)
You can stop a model customization job that is in progress.

In [128]:
# Optional: uncomment to stop the job

#bedrock.stop_model_customization_job(jobIdentifier=job_detail['jobArn'])

## View Training and Validation Metrics
Wait until the status of the fine-tuning job changes to "Completed" before proceeding.
In this step, we view the training and validation metrics.

In [None]:
job_metrics_out_s3_prefix = f"{job_detail['outputDataConfig']['s3Uri']}model-customization-job-{job_detail['jobArn'].split('/')[-1]}"
job_metrics_out_s3_prefix

### Print step-wise training metrics

In [117]:

training_metrics_csv = f'{job_metrics_out_s3_prefix}/training_artifacts/step_wise_training_metrics.csv'
train_metrics_df = s3_csv_to_df(training_metrics_csv)
train_metrics_df

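| Unnamed: 0 | step_number | epoch_number | training_loss |
|---|---|---|---|
| 0 | 0 | 1 | 1.773438 |
| 1 | 1 | 1 | 0.617188 |
| 2 | 2 | 1 | 0.128906 |
| 3 | 3 | 1 | 0.433594 |
| 4 | 4 | 1 | 0.122559 |
| 5 | 5 | 1 | 1.546875 |
| 6 | 6 | 1 | 0.675781 |
| 7 | 7 | 1 | 0.699219 |
| 8 | 8 | 1 | 1.257812 |
| 9 | 9 | 1 | 1.734375 |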
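With a batch size of 1, the per-step loss is noisy (it swings between about 0.12 and 1.77 above), so an average per epoch can be an easier number to compare across runs. A minimal sketch (hypothetical helper) that works on rows such as `train_metrics_df.to_dict('records')`:

```python
from collections import defaultdict
from statistics import mean


def mean_loss_per_epoch(rows, loss_key="training_loss"):
    """Average the per-step loss for each epoch.

    `rows` is an iterable of dicts with an 'epoch_number' key and a loss column.
    """
    losses = defaultdict(list)
    for row in rows:
        losses[int(row["epoch_number"])].append(float(row[loss_key]))
    return {epoch: mean(values) for epoch, values in sorted(losses.items())}


# Usage with the DataFrame loaded above:
# mean_loss_per_epoch(train_metrics_df.to_dict('records'))
```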


### Print Validation metrics

In [118]:
validation_metrics_csv = f'{job_metrics_out_s3_prefix}/validation_artifacts/post_fine_tuning_validation/validation/validation_metrics.csv'
validation_metrics_df = s3_csv_to_df(validation_metrics_csv)
validation_metrics_df

| Unnamed: 0 | step_number | epoch_number | validation_loss |
|---|---|---|---|
| 0 | 10 | 1 | 0.738204 |


## List Custom Models

In [None]:
bedrock.list_custom_models()

## Query custom model activation status
In this step, we check the activation status of the model to see whether it is ready to serve real-time inference requests.

In [None]:
custom_model = bedrock.get_custom_model(modelIdentifier=custom_model_name)
custom_model['realTimeInferenceStatus'], custom_model['modelArn']

## Invoke Custom Model
In this step, we invoke the custom model trained on the domain-specific data, using a randomly selected item from the test set.
NOTE: Wait until the model's real-time inference status is "ACTIVE". It might take up to 10 minutes for the model to become active.

In [103]:
import random
item_no = random.randint(0,len(test_set) - 1)
test_item = test_set[item_no]
test_item

{'id': '56cff5ff234ae51400d9c178',
 'title': 'Solar_energy',
 'context': 'Solar cookers use sunlight for cooking, drying and pasteurization. They can be grouped into three broad categories: box cookers, panel cookers and reflector cookers. The simplest solar cooker is the box cooker first built by Horace de Saussure in 1767. A basic box cooker consists of an insulated container with a transparent lid. It can be used effectively with partially overcast skies and will typically reach temperatures of 90–150 °C (194–302 °F). Panel cookers use a reflective panel to direct sunlight onto an insulated container and reach temperatures comparable to box cookers. Reflector cookers use various concentrating geometries (dish, trough, Fresnel mirrors) to focus light on a cooking container. These cookers reach temperatures of 315 °C (599 °F) and above but require direct light to function properly and must be repositioned to track the Sun.',
 'question': 'What is the typical temperature range for a bo
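The next cell sends only `inputText`, so the model uses its default generation settings. The sketch below (hypothetical helper names) builds the same prompt but also sets Titan Text's `textGenerationConfig` fields and extracts `outputText` from the response. Whether the preview SDK's custom models accept these settings is an assumption to verify; a temperature of 0 is chosen here because the task is extractive.

```python
import json

PROMPT_TEMPLATE = (
    "Given the context below, answer the specified question. The answer should be "
    "extracted directly from the context verbatim. CONTEXT: {context} QUESTION: {question}"
)


def build_titan_body(context, question, max_tokens=64, temperature=0.0, top_p=1.0):
    """Serialize a Titan Text request body with explicit generation settings."""
    return json.dumps({
        "inputText": PROMPT_TEMPLATE.format(context=context, question=question),
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,  # low temperature favors a verbatim, extractive answer
            "topP": top_p,
            "stopSequences": [],
        },
    })


def parse_titan_output(response_body) -> str:
    """Extract the generated text from a Titan Text response body (str or bytes)."""
    return json.loads(response_body)["results"][0]["outputText"]


# Usage with the objects defined in this notebook:
# response = bedrock.invoke_model(body=build_titan_body(test_item['context'], test_item['question']),
#                                 modelId=custom_model_name, accept="application/json",
#                                 contentType="application/json")
# parse_titan_output(response["body"].read())
```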

In [119]:
import json
prompt_template = "Given the context below, answer the specified question. The answer should be extracted directly from the context verbatim. CONTEXT: {context} QUESTION: {question}"

prompt = prompt_template.format(context=test_item['context'],question=test_item['question'])

body = json.dumps({"inputText": prompt})
modelId = custom_model_name
accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

print(response_body.get("results")[0].get("outputText"))

90–150
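To judge the answer, compare it with the reference answers in `test_item['answers']['text']`. A common choice for SQuAD is exact match after normalization; the sketch below follows the normalization of the official SQuAD v1.1 evaluation script (lowercase, strip ASCII punctuation and the articles a/an/the, collapse whitespace). The helper names are hypothetical.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    return any(normalize_answer(prediction) == normalize_answer(g) for g in gold_answers)


# Usage with the objects defined above:
# prediction = response_body.get("results")[0].get("outputText")
# exact_match(prediction, test_item['answers']['text'])
```

Exact match is strict: a prediction that omits a unit or a word present in the reference counts as a miss, so consider also reading the answers side by side.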


## Delete Custom Model (Optional)
We can delete the custom model created above 

In [131]:
#Optional. Uncomment below line to remove the custom model created
#bedrock.delete_custom_model(modelIdentifier = custom_model['modelArn'])