# Fine-Tuning Amazon Nova Canvas Model

> ☝️ This notebook has been tested with the **`SageMaker Data Science 3.0`** kernel in Amazon SageMaker Studio.

---

In this notebook, we show how to fine-tune the [Amazon Nova Canvas model](https://docs.aws.amazon.com/nova/latest/userguide/content-generation.html) on [Amazon Bedrock](https://aws.amazon.com/bedrock/).

We will teach the model to generate images of two new subjects:

**Ron the dog**

<img src="data/ron_01.jpg" width="25%" height="25%" style="float: left"/>
<img src="data/ron_06.jpg" width="25%" height="25%" style="float: left" />
<img src="data/ron_13.jpg" width="25%" height="25%" style="float: left" />
<img src="data/ron_20.jpg" width="25%" height="25%" style="float: left" />

and  **Smila the cat**

<img src="data/smila_02.jpg" width="25%" height="25%" style="float: left"/>
<img src="data/smila_06.jpg" width="25%" height="25%" style="float: left" />
<img src="data/smila_15.jpg" width="25%" height="25%" style="float: left" />
<img src="data/smila_24.jpg" width="25%" height="25%" style="float: left" />

In [None]:
!pip install --upgrade --force-reinstall --no-cache boto3
!pip install --upgrade --force-reinstall --no-cache botocore
!pip install --upgrade --force-reinstall --no-cache awscli

## Pre-requisites

Import the required libraries and instantiate the boto3 clients.

<div style="background-color: #FFFFCC; color: #856404; padding: 15px; border-left: 6px solid #FFD700; margin-bottom: 15px;">
<h3 style="margin-top: 0; color: #856404;">⚠️ Region Availability Warning</h3>
<p>Nova Canvas fine-tuning is currently available in the us-east-1 region.</p>
</div>

In [None]:
#Libraries
import json
import boto3
import datetime
import time

# Boto3 clients
s3_client = boto3.client('s3')
iam_client = boto3.client('iam')
sts_client = boto3.client('sts')
bedrock_client = boto3.client('bedrock')
bedrock_runtime_client = boto3.client('bedrock-runtime')
# Account and region info
session = boto3.session.Session()
region = session.region_name
account_id = sts_client.get_caller_identity()["Account"]


### Create an Amazon S3 bucket
Create a bucket where your training data will be stored

In [None]:
bucket_name = "amazonbedrockft-imagegen-ronsmila-{}-{}".format(account_id, region)
role_name = "AmazonBedrockFineTuning-imagegen-ronsmila"
s3_bedrock_ft_access_policy="AmazonBedrockFT-ImageGen-S3-ronsmila"
customization_role = f"arn:aws:iam::{account_id}:role/{role_name}"

try:
    if region != 'us-east-1':
        s3_client.create_bucket(
            Bucket=bucket_name,     
            CreateBucketConfiguration={
                'LocationConstraint': region
            },
        )
    else:
        s3_client.create_bucket(Bucket=bucket_name)
    print("AWS Bucket: {}".format(bucket_name))
except Exception as err:
    print("ERROR: {}".format(err))

s3_bucket_path = "s3://{}".format(bucket_name)
print("S3 bucket path: {}".format(s3_bucket_path))

## Data preparation
- To fine-tune a text-to-image or image-to-embedding model, prepare a training dataset by creating a JSONL file with one JSON object per line.
- Validation datasets are not supported.
- Each JSON line is a sample containing an `image-ref` (the Amazon S3 URI of an image) and a `caption` (text that could serve as a prompt for that image).

The images must be in JPEG or PNG format.

    {"image-ref": "s3://bucket/path/to/image001.png", "caption": "<prompt text>"}
    {"image-ref": "s3://bucket/path/to/image002.png", "caption": "<prompt text>"}
    {"image-ref": "s3://bucket/path/to/image003.png", "caption": "<prompt text>"}    

The following is an example item:

    {"image-ref": "s3://my-bucket/my-pets/cat.png", "caption": "an orange cat with white spots"}
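
Before uploading, it can help to sanity-check each line against the format described above. The following is a minimal sketch, not an official validator: the function name `validate_jsonl_line` is hypothetical, and judging the image format by its file extension is an assumption made for illustration.

```python
import json
from typing import List

# Assumption: the file extension reflects the actual image format.
ALLOWED_EXTENSIONS = (".png", ".jpg", ".jpeg")


def validate_jsonl_line(line: str) -> List[str]:
    """Return the problems found in one JSONL line (empty list if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]

    problems = []
    image_ref = record.get("image-ref")
    if not isinstance(image_ref, str) or not image_ref.startswith("s3://"):
        problems.append("image-ref must be an S3 URI")
    elif not image_ref.lower().endswith(ALLOWED_EXTENSIONS):
        problems.append("image must be JPEG or PNG")

    caption = record.get("caption")
    if not isinstance(caption, str) or not caption.strip():
        problems.append("caption must be a non-empty string")
    return problems
```

Running this over every line of the dataset file before starting a job surfaces malformed entries early, which is cheaper than discovering them after the customization job fails.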

#### Locate your sample JSON file
We are going to use a JSON file that maps each image file name to its caption, in the following format:

    {
        "imagefile":"caption",
        "imagefile":"caption",
        "imagefile":"caption"
    }

In [None]:
raw_data_file = "prompts/captions.json"

with open(raw_data_file, 'r') as file:
    raw_data = json.load(file)

print(raw_data)

### Create the dataset file and upload the images to Amazon S3
Create the `jsonl` file by pairing each caption with the S3 path of its image, then upload the images and the dataset file to the bucket.

In [None]:
images_dir = 'data'
output_file = 'prompts/output.jsonl'


jsonl_lines = []
for filename, caption in raw_data.items():
    image_path = "{}/{}".format(images_dir, filename)
    s3_image_path = "{}/{}".format(s3_bucket_path, image_path)
    jsonl_entry = {
        "image-ref": s3_image_path,
        "caption": caption
    }
    jsonl_lines.append(json.dumps(jsonl_entry, ensure_ascii=False))
    # Upload the image under the same relative path used in the image-ref
    s3_client.upload_file(image_path, bucket_name, image_path)

# Join the entries with newlines so the file has no trailing newline
with open(output_file, "w", encoding="utf-8") as jsonl_file:
    jsonl_file.write("\n".join(jsonl_lines))

s3_client.upload_file(output_file, bucket_name, output_file)

## Fine-tuning job preparation - Creating the required role and policies

We will now prepare the necessary role for the fine-tune job. That includes creating the policies required to run customization jobs with Amazon Bedrock.

### Create Trust relationship
This JSON object defines the trust relationship that allows the Bedrock service to assume a role that lets it talk to other required AWS services. The conditions restrict role assumption to a specific account ID and to a specific component of the Bedrock service (model customization jobs).
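
The trust policy in the cell below is written as an f-string template, which requires doubling every literal brace. As a minimal sketch of an alternative, the same document can be built as a Python dict and serialized with `json.dumps`. The helper name `build_trust_policy` and the account ID used in the usage line are hypothetical placeholders.

```python
import json


def build_trust_policy(account_id: str, region: str) -> str:
    """Serialize a trust policy equivalent to the f-string template."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # Only this account may trigger the role assumption
                    "StringEquals": {"aws:SourceAccount": account_id},
                    # Only model customization jobs in this region and account
                    "ArnEquals": {
                        "aws:SourceArn": (
                            f"arn:aws:bedrock:{region}:{account_id}"
                            ":model-customization-job/*"
                        )
                    },
                },
            }
        ],
    }
    return json.dumps(policy, indent=4)


# Placeholder account ID for illustration only
print(build_trust_policy("123456789012", "us-east-1"))
```

Building the policy as a dict avoids brace-escaping mistakes and guarantees the result is valid JSON.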

In [None]:
# Trust policy: lets Bedrock model customization jobs in this account assume the role
ROLE_DOC = f"""{{
    "Version": "2012-10-17",
    "Statement": [
        {{
            "Effect": "Allow",
            "Principal": {{
                "Service": "bedrock.amazonaws.com"
            }},
            "Action": "sts:AssumeRole",
            "Condition": {{
                "StringEquals": {{
                    "aws:SourceAccount": "{account_id}"
                }},
                "ArnEquals": {{
                    "aws:SourceArn": "arn:aws:bedrock:{region}:{account_id}:model-customization-job/*"
                }}
            }}
        }}
    ]
}}
"""

### Create S3 access policy

This JSON object defines the permissions of the role that Bedrock will assume. It grants access to the S3 bucket we created for the fine-tuning dataset and allows the bucket and object operations listed below.


In [None]:
ACCESS_POLICY_DOC = f"""{{
    "Version": "2012-10-17",
    "Statement": [
        {{
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetBucketAcl",
                "s3:GetBucketNotification",
                "s3:ListBucket",
                "s3:PutBucketNotification"
            ],
            "Resource": [
                "arn:aws:s3:::{bucket_name}",
                "arn:aws:s3:::{bucket_name}/*"
            ]
        }}
    ]
}}"""

### Create IAM role and attach policies

Let's now create the IAM role with the trust policy defined above and attach the S3 access policy to it.

In [None]:
response = iam_client.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=ROLE_DOC,
    Description="Role for Bedrock to access S3 for finetuning",
)

In [None]:
role_arn = response["Role"]["Arn"]
response = iam_client.create_policy(
    PolicyName=s3_bedrock_ft_access_policy,
    PolicyDocument=ACCESS_POLICY_DOC,
)
policy_arn = response["Policy"]["Arn"]
iam_client.attach_role_policy(
    RoleName=role_name,
    PolicyArn=policy_arn,
)
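
Re-running the cells above fails once the role already exists. As a hedged sketch, the role creation can be made re-runnable by falling back to `get_role` when IAM reports that the role is already there. The helper name `get_or_create_role_arn` is hypothetical, and it assumes the existing role already carries a suitable trust policy.

```python
def get_or_create_role_arn(iam_client, role_name, trust_policy_json):
    """Create the role, or return the ARN of an existing role with that name.

    Assumption: if the role already exists, its trust policy is acceptable
    as-is; this sketch does not update it.
    """
    try:
        response = iam_client.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=trust_policy_json,
            Description="Role for Bedrock to access S3 for fine-tuning",
        )
    except iam_client.exceptions.EntityAlreadyExistsException:
        # The role was created by an earlier run; look it up instead
        response = iam_client.get_role(RoleName=role_name)
    return response["Role"]["Arn"]
```

In this notebook it could be called as `role_arn = get_or_create_role_arn(iam_client, role_name, ROLE_DOC)`. The same pattern would apply to `create_policy`, which also raises an error when the policy name is taken.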

## Create Fine-tuning job

Now that we have all the requirements in place, let's create the fine-tuning job with the Nova Canvas model.

To do so, we need to set the model **hyperparameters** `stepCount`, `batchSize` and `learningRate`, and provide the path to the training data.

In [None]:
ts = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

# Select the foundation model you want to customize, identified by its "modelId"
base_model_id = "amazon.nova-canvas-v1:0"

# Select the customization type from "FINE_TUNING" or "CONTINUED_PRE_TRAINING". 
customization_type = "FINE_TUNING"

# Specify the roleArn for your customization job
customization_role = role_arn

# Create a customization job name
customization_job_name = f"image-gen-ft-{ts}"

# Create a name for your fine-tuned Nova Canvas model
custom_model_name = f"image-gen-ft-{ts}"

# Define the hyperparameters for fine-tuning the Nova Canvas model
hyper_parameters = {
    "stepCount": "8000",
    "batchSize": "8",
    "learningRate": "0.00001",
}

# Specify the S3 paths for the training data and the job output (validation data is not supported)
s3_train_uri = s3_bucket_path + "/" + output_file
training_data_config = {"s3Uri": s3_train_uri}


output_data_config = {"s3Uri": f's3://{bucket_name}/outputs/output-{custom_model_name}'}

# Create the customization job
bedrock_client.create_model_customization_job(
    customizationType=customization_type,
    jobName=customization_job_name,
    customModelName=custom_model_name,
    roleArn=customization_role,
    baseModelIdentifier=base_model_id,
    hyperParameters=hyper_parameters,
    trainingDataConfig=training_data_config,
    outputDataConfig=output_data_config
)
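
Note that the hyperparameter values in the cell above are passed as strings rather than numbers. As a hedged sketch, a small check like the one below can catch a value accidentally left as an `int` or `float` before the job is submitted. The helper name `check_hyper_parameters` is hypothetical, and treating every value as numeric is an assumption based on the three hyperparameters used here.

```python
def check_hyper_parameters(hyper_parameters):
    """Return a list of problems with the hyperparameter dict (empty if none)."""
    problems = []
    for name, value in hyper_parameters.items():
        if not isinstance(value, str):
            problems.append(
                f"{name}: value must be a string, got {type(value).__name__}"
            )
            continue
        try:
            # Assumption: all hyperparameters used here are numeric
            float(value)
        except ValueError:
            problems.append(f"{name}: {value!r} is not numeric")
    return problems


print(check_hyper_parameters(
    {"stepCount": "8000", "batchSize": "8", "learningRate": "0.00001"}
))
```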


### Waiting until customization job is completed
Once the customization job is finished, you can check your existing custom model(s) and retrieve the modelArn of your fine-tuned model.

<div class="alert alert-block alert-warning">
    <b>Warning:</b> The model customization job can take hours to run. With 5000 steps, a learning rate of 0.000001, a batch size of 64, and 60 images, it takes around 4 hours to complete.
</div>


In [None]:

# check model customization status
status = bedrock_client.list_model_customization_jobs(
    nameContains=customization_job_name
)["modelCustomizationJobSummaries"][0]["status"]
while status == 'InProgress':
    time.sleep(50)
    status = bedrock_client.list_model_customization_jobs(
        nameContains=customization_job_name
    )["modelCustomizationJobSummaries"][0]["status"]
    print(status)
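
The loop above polls until the status is no longer `InProgress`. A slightly more defensive variant, sketched below, stops only on a terminal status and gives up after a timeout. The function name `wait_for_job`, its parameters, and the default intervals are hypothetical; the status fetch is passed in as a callable so the helper does not depend on a specific client call.

```python
import time

# Statuses after which the job will not change state any more
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(get_status, poll_seconds=60, timeout_seconds=8 * 3600,
                 sleep=time.sleep):
    """Poll get_status() until it returns a terminal status, then return it.

    Raises TimeoutError if no terminal status is seen within timeout_seconds.
    """
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"job still '{status}' after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

In this notebook, `get_status` could be a lambda that reuses the `list_model_customization_jobs` lookup from the cell above, for example `lambda: bedrock_client.list_model_customization_jobs(nameContains=customization_job_name)["modelCustomizationJobSummaries"][0]["status"]`. Checking the returned status afterwards distinguishes a `Completed` job from a `Failed` or `Stopped` one.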

## Next Steps

Once your training job is completed, you can run the `2-Canvas-provisioned-throughput-inference` notebook to invoke the model.

In [None]:
%store customization_role