# Fine-Tune the Claude 3 Haiku Model in Amazon Bedrock: End-to-End

This notebook demonstrates the end-to-end process of fine-tuning the Anthropic Claude 3 Haiku model in Amazon Bedrock. It covers selecting the base model, configuring hyperparameters, creating and monitoring the fine-tuning job, deploying the fine-tuned model with Provisioned Throughput, and evaluating the fine-tuned model's performance.

You can also perform these steps in the Amazon Bedrock console.

## Prerequisites

 - Make sure you have executed `00_Setup&DataPrep_Haiku.ipynb` notebook.
 - Make sure you are using the same kernel and instance as `00_Setup&DataPrep_Haiku.ipynb` notebook.

<div class="alert alert-block alert-warning">
<b>Warning:</b> This notebook creates Provisioned Throughput for testing the fine-tuned model. Make sure to delete it as described in the last section of the notebook; otherwise you will continue to be charged for it, even if you are not using it.
</div>

In [1]:
!pip install -qU bert_score

In [2]:
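# Restart the kernel so the newly installed packages take effect.
# Note: this JavaScript restart works only in the classic Jupyter Notebook UI;
# in JupyterLab-based environments, restart the kernel from the menu instead.
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")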

In [3]:
# Fetch the variables stored by the `00_Setup&DataPrep_Haiku.ipynb` notebook.
%store -r role_arn
%store -r s3_train_uri
%store -r s3_validation_uri
%store -r s3_test_uri
%store -r bucket_name

In [4]:
import pprint
pprint.pp(role_arn)
pprint.pp(s3_train_uri)
pprint.pp(s3_validation_uri)
pprint.pp(s3_test_uri)
pprint.pp(bucket_name)

'arn:aws:iam::525407566630:role/BedrockRole-7c1b52df-6947-40c6-812a-96e036b1f254'
's3://bedrock-haiku-customization-us-west-2-525407566630/haiku-fine-tuning-datasets/train/train-samsum-1K.jsonl'
's3://bedrock-haiku-customization-us-west-2-525407566630/haiku-fine-tuning-datasets/validation/validation-samsum-100.jsonl'
's3://bedrock-haiku-customization-us-west-2-525407566630/haiku-fine-tuning-datasets/test/test-samsum-10.jsonl'
'bedrock-haiku-customization-us-west-2-525407566630'


## Setup

In [5]:
import warnings
warnings.filterwarnings('ignore')
import json
import os
import sys
import boto3
import time

In [6]:
session = boto3.session.Session()
region = session.region_name
sts_client = boto3.client('sts')
s3_client = boto3.client('s3')
aws_account_id = sts_client.get_caller_identity()["Account"]
bedrock = boto3.client(service_name="bedrock")
bedrock_runtime = boto3.client(service_name="bedrock-runtime")

In [7]:
test_file_name = "test-samsum-10.jsonl"
data_folder = "haiku-fine-tuning-datasets-samsum"

## Select the model you would like to fine-tune
You need to provide the `base_model_id` of the model you plan to fine-tune. You can look it up with the `list_foundation_models` API as follows:
```
for model in bedrock.list_foundation_models(
    byCustomizationType="FINE_TUNING")["modelSummaries"]:
    for key, value in model.items():
        print(key, ":", value)
    print("-----\n")
```
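If you prefer to narrow that list programmatically, the sketch below filters the `modelSummaries` entries returned by `list_foundation_models`. It assumes each entry carries the `modelId`, `providerName`, and `customizationsSupported` fields; `fine_tunable_model_ids` is a hypothetical helper, not part of boto3.

```python
def fine_tunable_model_ids(model_summaries, provider_name=None):
    """Return the IDs of models whose summary lists FINE_TUNING as a supported customization.

    model_summaries: the list under "modelSummaries" in a list_foundation_models response.
    provider_name:   optional filter such as "Anthropic".
    """
    ids = []
    for summary in model_summaries:
        # Skip models from other providers when a provider filter is given
        if provider_name and summary.get("providerName") != provider_name:
            continue
        if "FINE_TUNING" in summary.get("customizationsSupported", []):
            ids.append(summary["modelId"])
    return ids


# Usage against the live API (requires AWS credentials):
# summaries = bedrock.list_foundation_models()["modelSummaries"]
# print(fine_tunable_model_ids(summaries, provider_name="Anthropic"))
```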

In [8]:
base_model_id = "anthropic.claude-3-haiku-20240307-v1:0:200k"

Next, provide the `customization_job_name`, `custom_model_name`, and `customization_role`, which are used to create the fine-tuning job.

In [9]:
from datetime import datetime
ts = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

customization_job_name = f"model-finetune-job-{ts}"
custom_model_name = f"finetuned-model-{ts}"
customization_role = role_arn

## Create fine-tuning job

<div class="alert alert-block alert-info">
<b>Note:</b> The fine-tuning job can take several hours to complete.</div>

Anthropic Claude 3 Haiku fine-tuning in Amazon Bedrock allows customers to define various hyperparameters that can significantly impact the fine-tuning process and the resulting model’s performance. 


| **Parameter name** | **Description** | **Type** | **Default** | **Value range** |
| ------- | ------------- | ------ | --------- | ----------- |
| epochCount | The maximum number of iterations through the entire training dataset | integer | 2 | 1 - 10 |
| batchSize | The number of samples processed before updating model parameters | integer | 32 | 4 - 256 |
| learningRateMultiplier | Multiplier that influences the learning rate at which model parameters are updated after each batch | float | 1 | 0.1 - 2 |
| earlyStoppingThreshold | The minimum improvement in validation loss required to prevent premature termination of the training process | float | 0.001 | 0 - 0.1 |
| earlyStoppingPatience | The tolerance for stagnation in the validation loss metric before stopping the training process | integer | 2 | 1 - 10 |
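Catching an out-of-range value locally saves a failed `create_model_customization_job` call. The sketch below encodes the ranges from the table above and checks a `hyper_parameters` dictionary against them. `HAIKU_HYPERPARAMETER_RANGES` and `validate_hyperparameters` are hypothetical names, and the ranges are copied from this table only, so re-check them against the current Amazon Bedrock documentation. Because the API takes hyperparameter values as strings, each value is parsed before its bounds are checked.

```python
# Ranges copied from the hyperparameter table above: name -> (type, min, max)
HAIKU_HYPERPARAMETER_RANGES = {
    "epochCount": (int, 1, 10),
    "batchSize": (int, 4, 256),
    "learningRateMultiplier": (float, 0.1, 2.0),
    "earlyStoppingThreshold": (float, 0.0, 0.1),
    "earlyStoppingPatience": (int, 1, 10),
}


def validate_hyperparameters(hyper_parameters, ranges=HAIKU_HYPERPARAMETER_RANGES):
    """Return a list of problems; an empty list means every value looks valid."""
    problems = []
    for name, raw_value in hyper_parameters.items():
        if name not in ranges:
            problems.append(f"{name}: unknown hyperparameter")
            continue
        value_type, low, high = ranges[name]
        # Bedrock expects string values, e.g. "5" rather than 5
        if not isinstance(raw_value, str):
            problems.append(f"{name}: value must be a string, got {type(raw_value).__name__}")
            continue
        try:
            value = value_type(raw_value)
        except ValueError:
            problems.append(f"{name}: {raw_value!r} is not a valid {value_type.__name__}")
            continue
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside [{low}, {high}]")
    return problems
```

Once `hyper_parameters` is defined, `validate_hyperparameters(hyper_parameters)` should return an empty list before you submit the job.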



In [10]:
# Customization type: Amazon Bedrock supports "FINE_TUNING" and "CONTINUED_PRE_TRAINING",
# but not every base model supports both. This notebook uses fine-tuning.
customization_type = "FINE_TUNING"

In [11]:
# Define the hyperparameters for fine-tuning Claude-3 Haiku model
hyper_parameters = {
        "epochCount": "5",
        "batchSize": "32",
        "learningRateMultiplier": "1",
        "earlyStoppingThreshold": "0.001",
        "earlyStoppingPatience": "2"
    }


# S3 location where the fine-tuning job writes its outputs
s3_bucket_config = f's3://{bucket_name}/outputs/output-{custom_model_name}'

# Specify the data paths for training, validation (optional), and output
training_data_config = {"s3Uri": s3_train_uri}

validation_data_config = {
    "validators": [{
        "s3Uri": s3_validation_uri
    }]
}

output_data_config = {"s3Uri": s3_bucket_config}

# Create the model customization job for fine-tuning the Claude model in Amazon Bedrock.
# This call also starts running the fine-tuning job in Amazon Bedrock.
training_job_response = bedrock.create_model_customization_job(
    customizationType=customization_type,
    jobName=customization_job_name,
    customModelName=custom_model_name,
    roleArn=customization_role,
    baseModelIdentifier=base_model_id,
    hyperParameters=hyper_parameters,
    trainingDataConfig=training_data_config,
    validationDataConfig=validation_data_config,
    outputDataConfig=output_data_config
)
training_job_response

{'ResponseMetadata': {'RequestId': 'a06aeb66-c3fd-4975-b580-25e59852a830',
  'HTTPStatusCode': 201,
  'HTTPHeaders': {'date': 'Tue, 08 Apr 2025 17:20:25 GMT',
   'content-type': 'application/json',
   'content-length': '132',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'a06aeb66-c3fd-4975-b580-25e59852a830'},
  'RetryAttempts': 0},
 'jobArn': 'arn:aws:bedrock:us-west-2:525407566630:model-customization-job/anthropic.claude-3-haiku-20240307-v1:0:200k/1npdg9tmj2m9'}

## Check fine-tuning job status

You can check the status of the fine-tuning job with the API, or in the Amazon Bedrock console under Foundation Models > Custom Models > Jobs.

In [None]:
# Poll the job status every 60 seconds until it is no longer "InProgress"
job_status = bedrock.get_model_customization_job(jobIdentifier=customization_job_name)["status"]
print(job_status)

while job_status == "InProgress":
    time.sleep(60)
    job_status = bedrock.get_model_customization_job(jobIdentifier=customization_job_name)["status"]
    print(job_status)

InProgress
InProgress
InProgress
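The loop above polls until the status leaves `InProgress`, but it has no upper bound on how long it waits. A minimal sketch of a reusable poller with a timeout follows; `wait_for_status` is a hypothetical helper and the six-hour default timeout is an assumption to tune for your job. Because the status getter is passed in as a function, the same helper can also wait on the provisioned throughput while it is `Creating`.

```python
import time


def wait_for_status(get_status, pending_statuses, poll_seconds=60,
                    timeout_seconds=6 * 60 * 60, sleep=time.sleep):
    """Poll get_status() until it returns a status not in pending_statuses.

    Returns the final status (for example "Completed" or "Failed").
    Raises TimeoutError if the status is still pending after about timeout_seconds.
    """
    waited = 0
    status = get_status()
    print(status)
    while status in pending_statuses:
        if waited >= timeout_seconds:
            raise TimeoutError(f"Still {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
        print(status)
    return status


# Usage with the fine-tuning job (requires AWS credentials):
# final_status = wait_for_status(
#     lambda: bedrock.get_model_customization_job(jobIdentifier=customization_job_name)["status"],
#     pending_statuses={"InProgress"},
# )
```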


In [None]:
fine_tune_job = bedrock.get_model_customization_job(jobIdentifier=customization_job_name)

In [None]:
pprint.pp(fine_tune_job)

In [None]:
output_job_name = "model-customization-job-"+fine_tune_job['jobArn'].split('/')[-1]
output_job_name
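`output_job_name` appears to be the folder that Amazon Bedrock creates for this job under the output S3 location set in `output_data_config`. Assuming that layout (`outputs/output-{custom_model_name}/model-customization-job-<job id>/`), the sketch below builds the prefix from the job ARN and lists whatever files the job wrote there. `job_output_prefix` and `list_s3_keys` are hypothetical helpers; verify the actual folder layout in your bucket.

```python
def job_output_prefix(custom_model_name, job_arn):
    """Build the assumed S3 key prefix for a customization job's outputs."""
    job_id = job_arn.split("/")[-1]
    return f"outputs/output-{custom_model_name}/model-customization-job-{job_id}/"


def list_s3_keys(s3_client, bucket, prefix):
    """Return all object keys under prefix, following list_objects_v2 pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" key
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage (requires AWS credentials):
# prefix = job_output_prefix(custom_model_name, fine_tune_job["jobArn"])
# for key in list_s3_keys(s3_client, bucket_name, prefix):
#     print(key)
```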

Before you can run inference on the fine-tuned model, you need to create Provisioned Throughput for it.

### Overview of Provisioned throughput
You specify Provisioned Throughput in model units (MUs). A model unit delivers a specific throughput level for the specified model. For a text model, the throughput level of an MU specifies the following:

- The total number of input tokens per minute – The number of input tokens that an MU can process across all requests within a span of one minute.

- The total number of output tokens per minute – The number of output tokens that an MU can generate across all requests within a span of one minute.

Model unit quotas depend on the level of commitment you specify for the Provisioned Throughput.

- For custom models with no commitment, a quota of one model unit is available for each Provisioned Throughput. You can create up to two Provisioned Throughputs per account.

- For base or custom models with commitment, there is a default quota of 0 model units. To request an increase, use the [limit increase form](https://support.console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase).
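When sizing a commitment-based Provisioned Throughput, you can estimate how many model units cover your expected peak load: each MU contributes a fixed number of input and output tokens per minute, so you need enough MUs to satisfy whichever limit is reached first. `model_units_needed` is a hypothetical helper, and the per-MU capacities in the usage line are placeholders, not published values; look up the actual throughput of an MU for your model.

```python
import math


def model_units_needed(peak_input_tpm, peak_output_tpm, mu_input_tpm, mu_output_tpm):
    """Smallest number of model units that covers both tokens-per-minute limits.

    peak_*_tpm: expected input/output tokens per minute across all requests.
    mu_*_tpm:   tokens per minute that one MU supports for your model.
    """
    units_for_input = math.ceil(peak_input_tpm / mu_input_tpm)
    units_for_output = math.ceil(peak_output_tpm / mu_output_tpm)
    # The binding constraint sets the count; at least one MU is always needed.
    return max(1, units_for_input, units_for_output)


# Placeholder per-MU capacities, for illustration only.
print(model_units_needed(peak_input_tpm=250_000, peak_output_tpm=30_000,
                         mu_input_tpm=100_000, mu_output_tpm=20_000))  # prints 3
```

Here input tokens need 3 MUs and output tokens need 2, so input throughput is the binding constraint.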

## Retrieve Custom Model
Once the customization job is finished, you can check your existing custom model(s) and retrieve the modelArn of your fine-tuned model.

In [None]:
# List your custom models
bedrock.list_custom_models()

In [None]:
model_id = bedrock.get_custom_model(modelIdentifier=custom_model_name)['modelArn']
model_id

## Create Provisioned Throughput
<div class="alert alert-block alert-info">
<b>Note:</b> Creating Provisioned Throughput takes around 20-30 minutes.</div>

You need Provisioned Throughput to evaluate the fine-tuned model's performance. You can create it in the [console](https://docs.aws.amazon.com/bedrock/latest/userguide/prov-cap-console.html) or with the following API call:

In [None]:
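# Create Provisioned Throughput (one model unit, no commitment) for the fine-tuned model.
# The returned provisioned model ARN is used as the modelId for inference.
provisioned_model_id = bedrock.create_provisioned_model_throughput(
    modelUnits=1,
    provisionedModelName='test-haiku-ft-model',
    modelId=model_id
)['provisionedModelArn']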

In [None]:
status_provisioning = bedrock.get_provisioned_model_throughput(provisionedModelId = provisioned_model_id)['status']

In [None]:
# Poll every 60 seconds until the provisioned model is no longer "Creating"
while status_provisioning == 'Creating':
    time.sleep(60)
    status_provisioning = bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_id)['status']
    print(status_provisioning)

## Invoke the Custom Model

Before invoking the model, let's get a sample prompt from the test data.

In [None]:
# Read the local copy of the test data (one JSON record per line)
test_file_path = f'{data_folder}/{test_file_name}'
with open(test_file_path) as f:
    lines = f.read().splitlines()

In [None]:
# Use the fourth test record as the sample prompt
test_record = json.loads(lines[3])
test_system_prompt = test_record['system']
test_user_prompt = test_record['messages'][0]['content']
reference_summary = test_record['messages'][1]['content']
pprint.pp(test_system_prompt)
pprint.pp(test_user_prompt)
pprint.pp(reference_summary)
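The cell above takes the `system` field and the first two `messages` (the user prompt and the reference assistant reply) from one JSONL record. A small parser that does the same for any line, and fails loudly when a record does not follow that user/assistant layout, might look like this. `parse_test_record` is a hypothetical helper that assumes the same record format as the test file.

```python
import json


def parse_test_record(line):
    """Split one JSONL record into (system_prompt, user_prompt, reference_reply).

    Assumes a "system" string and a "messages" list whose first two entries
    are a user turn followed by an assistant turn.
    """
    record = json.loads(line)
    messages = record["messages"]
    if len(messages) < 2 or messages[0]["role"] != "user" or messages[1]["role"] != "assistant":
        raise ValueError("expected a user message followed by an assistant message")
    return record.get("system", ""), messages[0]["content"], messages[1]["content"]


# Usage: parse every test example instead of only lines[3]
# examples = [parse_test_record(line) for line in lines if line.strip()]
```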

In [None]:
message = [
        {
            "role": "user",
            "content": test_user_prompt
        }
    ]

In [None]:
base_model_arn = f'arn:aws:bedrock:{region}::foundation-model/anthropic.claude-3-haiku-20240307-v1:0'

Make sure to construct the model input in the format required by the [Anthropic Claude Messages API](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html).
Pay particular attention to the "Model invocation request body field" section, which describes the fields of the `body` variable that we pass as the payload to the fine-tuned model.

Alternatively, you can use the [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) in Amazon Bedrock, which lets you invoke models with one consistent request format regardless of the input format each model requires.
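A sketch of the same request through the Converse API follows. It assumes a boto3 version whose `bedrock-runtime` client has the `converse` operation, and that your provisioned model ARN is accepted as `modelId`; check both for your setup. `build_converse_request` and `converse_text` are hypothetical helpers that map the prompts and sampling settings used in this notebook onto Converse's `system`, `messages`, and `inferenceConfig` fields.

```python
def build_converse_request(model_id, system_prompt, user_prompt,
                           max_tokens=2048, temperature=0.1, top_p=0.9):
    """Build keyword arguments for bedrock_runtime.converse().

    Uses the same prompts and max_tokens / temperature / top_p values
    as the invoke_model body in this notebook.
    """
    return {
        "modelId": model_id,
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature, "topP": top_p},
    }


def converse_text(response):
    """Concatenate the text blocks of the assistant message in a Converse response."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)


# Usage (requires AWS credentials and a model that supports Converse):
# response = bedrock_runtime.converse(
#     **build_converse_request(provisioned_model_id, test_system_prompt, test_user_prompt))
# print(converse_text(response))
```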

In [None]:
body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": message,
        "temperature": 0.1,
        "top_p": 0.9,
        "system": test_system_prompt
    }  
)  

fine_tuned_response = bedrock_runtime.invoke_model(body=body, 
                                        modelId=provisioned_model_id)

base_model_response = bedrock_runtime.invoke_model(body=body, 
                                        modelId=base_model_arn)

fine_tuned_response_body = json.loads(fine_tuned_response.get('body').read())
base_model_response_body = json.loads(base_model_response.get('body').read())

print("Base model response: ", base_model_response_body['content'][0]['text'] + '\n')
print("Fine tuned model response:", fine_tuned_response_body['content'][0]['text']+'\n')
print("Reference summary from test data: " , reference_summary)
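The cell above repeats the same invoke-then-parse steps for each model. Factoring them into one function makes it easier to compare several models or loop over more test records. `invoke_claude_text` is a hypothetical helper; it assumes the Messages API response shape used above, where `content` is a list of blocks and text blocks have `"type": "text"`.

```python
import json


def invoke_claude_text(runtime_client, model_id, body):
    """Call invoke_model with a Messages API body and return the reply text.

    Joins all text content blocks in case the reply is split across several.
    """
    response = runtime_client.invoke_model(body=body, modelId=model_id)
    payload = json.loads(response["body"].read())
    return "".join(block["text"] for block in payload["content"] if block.get("type") == "text")


# Usage (requires AWS credentials):
# for label, model in [("Base", base_model_arn), ("Fine-tuned", provisioned_model_id)]:
#     print(f"{label} model response:", invoke_claude_text(bedrock_runtime, model, body))
```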



## Evaluate the performance of the model 
In this section, we use the `BERTScore` metric to compare the fine-tuned model with the base model and check whether fine-tuning has improved the results.

- `BERTScore`: calculates the similarity between a generated summary and a reference text using contextual token embeddings from BERT (Bidirectional Encoder Representations from Transformers), and reports precision, recall, and F1. [Medium article link](https://haticeozbolat17.medium.com/bertscore-and-rouge-two-metrics-for-evaluating-text-summarization-systems-6337b1d98917)
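To make the metric concrete: BERTScore embeds every token of the candidate and the reference, greedily matches each token to its most similar token (by cosine similarity) in the other text, and averages those best-match similarities. Averaging over candidate tokens gives precision, averaging over reference tokens gives recall, and F1 is their harmonic mean. The pure-Python sketch below reproduces only this matching step on toy embedding vectors; it omits the optional IDF weighting and baseline rescaling that the `bert_score` package supports, and `bertscore_from_embeddings` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bertscore_from_embeddings(candidate_vecs, reference_vecs):
    """Greedy-matching BERTScore without IDF weighting or baseline rescaling.

    candidate_vecs / reference_vecs: one embedding vector per token.
    Returns (precision, recall, f1).
    """
    # Precision: match each candidate token to its most similar reference token
    precision = sum(max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs) / len(candidate_vecs)
    # Recall: match each reference token to its most similar candidate token
    recall = sum(max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs) / len(reference_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 2-D "embeddings": the candidate covers only one of the two reference tokens.
p, r, f = bertscore_from_embeddings([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(round(p, 3), round(r, 3), round(f, 3))  # 1.0 0.5 0.667
```

The toy example shows why a short candidate can have high precision but low recall: everything it says matches the reference, but it misses half of the reference content.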

In [None]:
base_model_generated_response = [base_model_response_body['content'][0]['text']]
fine_tuned_generated_response = [fine_tuned_response_body['content'][0]['text']]

In [None]:
from bert_score import score

# bert_score expects equal-length lists of candidates and references.
# Use a new name so re-running this cell does not wrap reference_summary again.
references = [reference_summary]
fine_tuned_P, fine_tuned_R, fine_tuned_F1 = score(fine_tuned_generated_response, references, lang="en")
base_model_P, base_model_R, base_model_F1 = score(base_model_generated_response, references, lang="en")
print(f"F1 score, base model:       {base_model_F1.item():.4f}")
print(f"F1 score, fine-tuned model: {fine_tuned_F1.item():.4f}")

## Conclusion
The scores above, together with a side-by-side reading of the base model summary, the fine-tuned model summary, and the reference summary, indicate that fine-tuning tends to improve results on the task the model is trained on. We used only 1K training records, 100 validation records, and an `epochCount` of 5 (with early stopping enabled). Because this comparison is based on a single test example, score more test records before drawing firm conclusions.

<div class="alert alert-block alert-info">
<b>Tip:</b> 
    Refer to the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-guidelines.html" style="color: #3372FF">guidelines</a> for fine-tuning the model for your task. </div>

## Delete provisioned throughput
<div class="alert alert-block alert-warning">
<b>Warning:</b> Make sure to delete the Provisioned Throughput; it continues to incur costs as long as it exists, even if you are not using it.
</div>

In [None]:
bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)
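As an extra safeguard against leftover charges, for example from a Provisioned Throughput created in an earlier run, the sketch below lists every Provisioned Throughput that still exists in the current region. It assumes the `list_provisioned_model_throughputs` operation of the `bedrock` client and its `provisionedModelSummaries` / `nextToken` response fields; `remaining_provisioned_throughputs` is a hypothetical helper.

```python
def remaining_provisioned_throughputs(bedrock_client):
    """Return (name, arn, status) for every Provisioned Throughput in the region.

    Follows nextToken pagination of list_provisioned_model_throughputs.
    """
    found = []
    kwargs = {}
    while True:
        page = bedrock_client.list_provisioned_model_throughputs(**kwargs)
        for summary in page.get("provisionedModelSummaries", []):
            found.append((summary["provisionedModelName"],
                          summary["provisionedModelArn"],
                          summary["status"]))
        token = page.get("nextToken")
        if not token:
            return found
        kwargs = {"nextToken": token}


# Usage (requires AWS credentials); an empty list means nothing is left to delete:
# for name, arn, status in remaining_provisioned_throughputs(bedrock):
#     print(name, status, arn)
```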