# Fine-tuning Llama 3.2 11b Vision Model for Bedrock with SageMaker

## Introduction

In this notebook, we will demonstrate how to fine-tune the Llama 3.2 11b vision model using Amazon SageMaker and deploy it to Amazon Bedrock. This process involves preparing the dataset, setting up the training environment, fine-tuning the model, and finally deploying it for inference.

## Objectives

1. Load and prepare the dataset for fine-tuning
2. Set up the SageMaker training environment
3. Fine-tune the Llama 3.2 11b vision model
4. Deploy the fine-tuned model to Amazon Bedrock
5. Test the deployed model with sample queries

## Setup and Imports

First, we'll import the necessary libraries and set up our SageMaker session.

In [None]:
# Core libraries
import sagemaker
import boto3
import os
import json
import random
import time
import sys
from datetime import datetime
from botocore.config import Config
from botocore.exceptions import ClientError

# Data processing and visualization
from tqdm import tqdm
from datasets import load_dataset
from PIL import Image
from io import BytesIO
from urllib.request import urlopen
from IPython.display import display  # do not import IPython's Image here: it would shadow PIL's Image

# SageMaker-specific imports
from sagemaker.jumpstart.types import JumpStartSerializablePayload
from sagemaker.s3 import S3Uploader
from sagemaker import hyperparameters
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Custom modules
import importlib.util
spec = importlib.util.spec_from_file_location("iam_role_helper", "iam_role_helper.py")
iam_role_manager = importlib.util.module_from_spec(spec)
sys.modules["iam_role_manager"] = iam_role_manager
spec.loader.exec_module(iam_role_manager)

spec = importlib.util.spec_from_file_location("utils", "utils.py")
utils = importlib.util.module_from_spec(spec)
sys.modules["utils"] = utils
spec.loader.exec_module(utils)

from utils import download_artifacts, remove_field_from_json, upload_artifacts, cleanup_local_files, wait_for_model_availability, test_image_processing
from iam_role_helper import create_or_update_role


Next, we set up the SageMaker session and retrieve the IAM role used for execution.

In [None]:
sess = sagemaker.Session()
# SageMaker session bucket -> used for uploading data, models and logs
# SageMaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
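    # Running outside SageMaker (for example, locally): look up an existing role by name.
    # Replace the placeholder below with the name of a role that has the required privileges,
    # e.g. role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']
    role = iam.get_role(RoleName='role with required privileges')['Role']['Arn']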

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
sm_client = boto3.client('sagemaker', region_name=sess.boto_region_name)
region = sess.boto_region_name
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {region}")

## Data Preparation

### Dataset Description: HuggingFaceM4/DocumentVQA
The HuggingFaceM4/DocumentVQA dataset is a comprehensive collection designed for document visual question answering tasks. It contains a diverse set of document images along with corresponding questions and answers. This dataset is particularly useful for training models to understand and extract information from various types of documents, including forms, receipts, and other structured text documents.

Key features of the dataset:

- Document images in various formats and layouts
- Question-answer pairs related to the content of each document
- Diverse document types, including forms, invoices, and printed text
- Suitable for training models in tasks such as information extraction, document understanding, and visual question answering

By using this dataset, we aim to fine-tune our Llama 3.2 11b vision model to excel in document-based visual question answering tasks.

Now, let's load and prepare our dataset for fine-tuning.

In [None]:
# Load the dataset
dataset_name = "HuggingFaceM4/DocumentVQA"
data = load_dataset(
    dataset_name, cache_dir="./"
)

# Function to process data
def process_data(data, output_dir, num_ex):
    local_data_file = f"{output_dir}/metadata.jsonl"
    with open(local_data_file, "w") as f:
        for i in tqdm(range(num_ex)):
            each = data[i]
            q = each["question"]
            each_img = each["image"]
            a = each["answers"][0]

            example = {"file_name": f"images/img_{i}.jpg", "prompt": q, "completion": a}
            json.dump(example, f)
            f.write("\n")

            each_img.save(f"{output_dir}/images/img_{i}.jpg")

# Process train and validation data
for split, num in [("train", 1000), ("validation", 20)]:
    os.makedirs(f"docvqa/{split}", exist_ok=True)
    os.makedirs(f"docvqa/{split}/images", exist_ok=True)
    process_data(data=data[split], output_dir=f"./docvqa/{split}", num_ex=num)            


This code loads the DocumentVQA dataset and processes it into a format suitable for fine-tuning.
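
Before uploading, it can be worth a quick sanity check that every record in `metadata.jsonl` has the expected keys and points at an image that exists on disk. The sketch below is a hypothetical helper (`validate_metadata` is not part of the notebook, `utils.py`, or SageMaker); it assumes the `file_name` / `prompt` / `completion` layout written by `process_data` above.

```python
import json
import os

REQUIRED_KEYS = {"file_name", "prompt", "completion"}


def validate_metadata(split_dir):
    """Return a list of problems found in <split_dir>/metadata.jsonl (hypothetical helper)."""
    problems = []
    metadata_path = os.path.join(split_dir, "metadata.jsonl")
    with open(metadata_path) as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                problems.append(f"line {line_no}: missing keys {sorted(missing)}")
                continue
            # file_name is stored relative to the split directory
            image_path = os.path.join(split_dir, record["file_name"])
            if not os.path.isfile(image_path):
                problems.append(f"line {line_no}: image not found: {record['file_name']}")
    return problems
```

Calling `validate_metadata("./docvqa/train")` should return an empty list when the metadata and images are consistent.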

### Upload the Dataset to S3

Because the dataset contains images, the upload can take a while, depending on how many examples you processed.

In [None]:
local_data_dir = "./docvqa/train/"
train_data_location = f"s3://{sagemaker_session_bucket}/docvqa-1000-20"
S3Uploader.upload(local_data_dir, train_data_location)
print(f"Training data: {train_data_location}")

This step uploads the processed dataset to the specified S3 bucket.

## Model and Hyperparameter Configuration
Now we'll set up our model configuration and hyperparameters.

In [None]:

# Set model ID and version
model_id, model_version = "meta-vlm-llama-3-2-11b-vision-instruct", "*"

# Retrieve default hyperparameters
my_hyperparameters = hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)

# Set number of epochs
my_hyperparameters["epoch"] = "1"

# Validate hyperparameters
hyperparameters.validate(
    model_id=model_id, model_version=model_version, hyperparameters=my_hyperparameters
)


This code sets up the model configuration and hyperparameters for fine-tuning.
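
Hyperparameter values are passed as strings (the cell above sets `"epoch"` to `"1"`, not `1`). When changing several values at once, a small helper can convert them to strings and reject names that are not in the defaults. This is only a sketch: `apply_overrides` is a hypothetical helper, and the `learning_rate` key and the values in the stand-in dictionary are illustrative assumptions, so check the dictionary returned by `hyperparameters.retrieve_default` for the names this model actually accepts.

```python
def apply_overrides(defaults, overrides):
    """Return a copy of defaults with overrides applied as strings (hypothetical helper)."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"Unknown hyperparameters: {unknown}")
    merged = dict(defaults)  # leave the caller's dictionary untouched
    for name, value in overrides.items():
        merged[name] = str(value)
    return merged


# Stand-in for the dictionary returned by hyperparameters.retrieve_default();
# the key names and values here are illustrative, not the model's real defaults.
example_defaults = {"epoch": "5", "learning_rate": "0.0001"}
example = apply_overrides(example_defaults, {"epoch": 1})
print(example)
```

In the notebook, the result of `apply_overrides(my_hyperparameters, {...})` could then be passed to `hyperparameters.validate` as before.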

### Fine-tuning the Model
With our data and configuration ready, we can now start the fine-tuning process.

In [None]:
# Create SageMaker estimator
estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},  # "true" means you accept the model's EULA; review it before running
    disable_output_compression=True,
    instance_type="ml.p4de.24xlarge",
    role=role,
    hyperparameters=my_hyperparameters,
)
# Start fine-tuning
estimator.fit({"training": train_data_location})

This code initiates the fine-tuning process using the specified model and hyperparameters.

## Import SageMaker Model in Amazon Bedrock

### Model Preparation
After fine-tuning our model using SageMaker, we need to prepare it for import into Amazon Bedrock. This process involves downloading the model artifacts, modifying the tokenizer configuration, and uploading the modified files to an S3 bucket.

In [None]:
# Get the training job name and model URI
training_job_name = estimator._current_job_name
model_uri = estimator.model_data['S3DataSource']['S3Uri']

# Download the model artifacts
local_path = download_artifacts(
    training_job_name=training_job_name,
    model_uri=model_uri
)

# Remove the 'processor_class' field from the tokenizer config
file_path = f"{local_path}/tokenizer_config.json"
field_to_remove = "processor_class"
remove_field_from_json(file_path, field_to_remove)

# Upload the modified artifacts to S3
s3_uri = upload_artifacts(
    local_dir=local_path,
    sagemaker_session=sess,
    training_job_name=training_job_name,
    prefix='llama3-multi-model-artifacts'
)

print(f"Artifacts uploaded to: {s3_uri}")

# Clean up temporary files
cleanup_local_files('tmp_artifacts')

In this section, we perform the following steps:

- Retrieve the training job name and model URI from the SageMaker estimator.
- Download the model artifacts to a local directory.
- Modify the tokenizer_config.json file by removing the 'processor_class' field; this removal is needed for the custom model import into Bedrock.
- Upload the modified artifacts to an S3 bucket, which will be used later in the Bedrock import process.
- Clean up temporary local files to free up space.
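
The `remove_field_from_json` helper lives in `utils.py`, which is not shown in this notebook. As a rough illustration of the tokenizer-config step above, here is a minimal sketch of what such a helper might look like, assuming the field sits at the top level of the JSON file. The function is given a different name (`drop_top_level_field`) to make clear that it is an illustration and not the notebook's actual implementation.

```python
import json


def drop_top_level_field(file_path, field):
    """Remove a top-level key from a JSON file in place; return True if it was present."""
    with open(file_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if field not in data:
        return False  # nothing to change, leave the file as it is
    del data[field]
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return True
```

For example, `drop_top_level_field(f"{local_path}/tokenizer_config.json", "processor_class")` would mirror the call made in the cell above.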

### Create IAM Role for Bedrock Import
To import our custom model into Bedrock, we need to create an IAM role with the necessary permissions.

In [None]:
# Set up variables
account_id = boto3.client('sts').get_caller_identity()['Account']
training_bucket = sagemaker_session_bucket
role_name = "Sagemaker_Bedrock_import_role"

# Define policies
trust_relationship = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"aws:SourceAccount": account_id},
                "ArnEquals": {"aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"}
            }
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

permission_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{training_bucket}", f"arn:aws:s3:::{training_bucket}/*"],
            "Condition": {"StringEquals": {"aws:ResourceAccount": account_id}}
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}

# Create or update the role
bedrock_role_arn = create_or_update_role(
    role_name=role_name,
    trust_relationship=trust_relationship,
    permission_policy=permission_policy
)

print(f"Role ARN: {bedrock_role_arn}")

This code creates or updates an IAM role that will be used by Bedrock to access the S3 bucket containing our model artifacts.
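
`create_or_update_role` comes from `iam_role_helper.py`, which is not shown here. For readers without that file, the sketch below shows one way the same idea could be written with standard boto3 IAM calls (`create_role`, `update_assume_role_policy`, `get_role`, `put_role_policy`). The function name `ensure_role`, the inline policy name, and the choice of an inline rather than a managed policy are all assumptions made for illustration.

```python
import json


def ensure_role(iam_client, role_name, trust_policy, permission_policy,
                policy_name="bedrock-import-inline-policy"):
    """Create the role, or refresh its policies if it already exists; return the role ARN."""
    try:
        response = iam_client.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=json.dumps(trust_policy),
        )
        role_arn = response["Role"]["Arn"]
    except iam_client.exceptions.EntityAlreadyExistsException:
        # The role exists: overwrite its trust policy and look up its ARN
        iam_client.update_assume_role_policy(
            RoleName=role_name,
            PolicyDocument=json.dumps(trust_policy),
        )
        role_arn = iam_client.get_role(RoleName=role_name)["Role"]["Arn"]
    # put_role_policy creates the inline policy or replaces one with the same name
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permission_policy),
    )
    return role_arn
```

With the variables defined above, a call would look like `ensure_role(boto3.client("iam"), role_name, trust_relationship, permission_policy)`. Newly created roles can take a few seconds to propagate before other services can assume them.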

### Import Model into Bedrock
Now that we have prepared our model and created the necessary IAM role, we can import the model into Bedrock.

In [None]:

bedrock = boto3.client(service_name='bedrock',
                       region_name=region)
# Generate a unique job name and model name
timestamp = int(time.time())
random_number = random.randint(1000, 9999)
JOB_NAME = f"meta3-import-model-{timestamp}-{random_number}"

ROLE_ARN = bedrock_role_arn
IMPORTED_MODEL_NAME = f"llama32_multimodal_{timestamp}-{random_number}"
S3_URI = s3_uri

# createModelImportJob API
create_job_response = bedrock.create_model_import_job(
    jobName=JOB_NAME,
    importedModelName=IMPORTED_MODEL_NAME,
    roleArn=ROLE_ARN,
    modelDataSource={
        "s3DataSource": {
            "s3Uri": s3_uri
        }
    },
)
job_arn = create_job_response.get("jobArn")
print(f"Model import job created with ARN: {job_arn}")

This code initiates the process of importing our custom model into Bedrock. The import job is created with a unique name, and we specify the S3 location of our model artifacts and the IAM role to be used.

### Monitor Import Job Progress
After initiating the import job, we need to monitor its progress to ensure successful completion.

In [None]:
model_name_filter = IMPORTED_MODEL_NAME
model_info = wait_for_model_availability(model_name_filter, max_attempts=30, delay=60)

if model_info:
    model_arn = model_info["modelArn"]
    print("Model is now available in Bedrock.")
else:
    print("Failed to find the model in Bedrock within the specified attempts.")

This function periodically checks the status of our imported model in Bedrock, waiting for it to become available for use.
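
`wait_for_model_availability` is defined in `utils.py` and looks the model up by name. An alternative is to poll the import job itself, which also surfaces a failure message if the job fails. The sketch below uses the Bedrock `get_model_import_job` call; the function name `wait_for_import_job` and its defaults are assumptions for illustration, not the notebook's helper.

```python
import time


def wait_for_import_job(bedrock_client, job_arn, delay=60, max_attempts=30, sleep=time.sleep):
    """Poll a Bedrock model import job until it completes; return the imported model ARN."""
    for attempt in range(1, max_attempts + 1):
        job = bedrock_client.get_model_import_job(jobIdentifier=job_arn)
        status = job["status"]  # one of "InProgress", "Completed", "Failed"
        print(f"Attempt {attempt}/{max_attempts}: import job status is {status}")
        if status == "Completed":
            return job["importedModelArn"]
        if status == "Failed":
            raise RuntimeError(f"Import job failed: {job.get('failureMessage', 'no message')}")
        sleep(delay)
    raise TimeoutError(f"Import job did not finish after {max_attempts} attempts")
```

In this notebook the call would be `model_arn = wait_for_import_job(bedrock, job_arn)`, using the `bedrock` client and `job_arn` created in the import step.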

## Testing the Deployed Model in Bedrock
Now that we have imported our custom model into Bedrock, we can test it by making a simple query and verifying that the model is available for inference.

In [None]:

MODEL_ID= model_arn

config = Config(
    retries={
        'total_max_attempts': 100, 
        'mode': 'standard'
    }
)
message = "Hello, what is the weather in Seattle?"


session = boto3.session.Session()
br_runtime = session.client(service_name = 'bedrock-runtime', 
                                 region_name=region, 
                                 config=config)
    
try:
    invoke_response = br_runtime.invoke_model(modelId=MODEL_ID, 
                                            body=json.dumps({'prompt': message}), 
                                            accept="application/json", 
                                            contentType="application/json")
    invoke_response["body"] = json.loads(invoke_response["body"].read().decode("utf-8"))
    print(json.dumps(invoke_response, indent=4))
except Exception as e:
    print(e)
    print(e.__repr__())


In this section, we perform the following steps:

- Assign the model_arn obtained earlier to the MODEL_ID variable. The client reuses the region variable set during session setup.

- Configure the Bedrock Runtime client (using the Config class) to make up to 100 total attempts per API call in standard retry mode, which covers transient issues such as network errors or temporary service unavailability.

- Define a short sample text message to query the model.

- Initialize the Bedrock Runtime client using the configured Config object and the specified region.

- Invoke the model by calling the invoke_model method on the Bedrock Runtime client, passing the MODEL_ID, the sample message as the request body, and the expected content type and response format.

- Parse the response by decoding the body field of the response and print the result as a formatted JSON.

This code demonstrates how to make a simple query to the deployed Bedrock model and handle any exceptions that may occur during the process. The generous retry budget gives a newly imported model time to become ready, but it does not guarantee a successful response, so check the printed output before proceeding with further testing or integration.
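
The request above sends the message as a raw `prompt`. Llama 3 instruct models are trained on a chat template built from special header tokens, so wrapping the message in that template may produce cleaner answers. Whether the imported model needs this for `invoke_model` is an assumption worth testing rather than something this notebook establishes; `format_llama3_prompt` is a hypothetical helper.

```python
def format_llama3_prompt(user_message, system_message=None):
    """Wrap a message in the Llama 3 instruct chat template (hypothetical helper)."""
    parts = ["<|begin_of_text|>"]
    if system_message:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>")
    # End with an open assistant header so the model generates the reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

To try it, replace the request body with `json.dumps({'prompt': format_llama3_prompt(message)})` and compare the output with the raw-prompt response.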

## Evaluating Visual Multimodal Capabilities of Llama 3.2 in Bedrock

In this section, we'll demonstrate the visual multimodal capabilities of our fine-tuned Llama 3.2 model in Bedrock. We'll use an image from our validation set and ask the model to describe what it sees.

### Display the Test Image

First, let's display the image we'll be using for our test:

In [None]:
from PIL import Image

image_path = "./docvqa/validation/images/img_2.jpg"
# Open the image and render it inline in the notebook
img = Image.open(image_path)
display(img)

This code will display the image in the notebook, allowing us to visually inspect what we're asking the model to analyze.

### Prepare and Send the Request to Bedrock
Now, we'll prepare our request and send it to the Bedrock model:

In [None]:
from PIL import Image
import io

# Path to the image we want to analyze
image_path = "docvqa/validation/images/img_2.jpg"

MODEL_ID = model_arn

# Print current working directory and absolute path
print(f"Current working directory: {os.getcwd()}")
print(f"Absolute image path: {os.path.abspath(image_path)}")

# Test the image processing (assuming test_image_processing function is defined)
#encoded_image = test_image_processing(image_path)
# Load JPEG image
jpeg_image = Image.open(image_path)
png_buffer = io.BytesIO()
jpeg_image.save(png_buffer, format="PNG")
png_bytes = png_buffer.getvalue()

#Use png_bytes in your messages structure
messages = [
    {
        "role": "user",
        "content": [
            {"text": "What can you see in this image?"},
            {
                "image": {
                    "format": "png",
                    "source": {"bytes": png_bytes},
                }
            },
        ],
    }
]

response = br_runtime.converse(
        modelId=MODEL_ID,
        messages=messages,
    )
response_text = response["output"]["message"]["content"][0]["text"]
print("###Response Output###")
print(response_text)
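
The request and response handling above can be factored into two small functions, which makes it easier to ask several questions about several images. This is a sketch: `build_image_message` and `extract_text` are hypothetical helpers, and they assume the same Converse message and response structure used in the cell above.

```python
def build_image_message(question, image_bytes, image_format="png"):
    """Build a single-turn Converse message list with one text block and one image block."""
    return [
        {
            "role": "user",
            "content": [
                {"text": question},
                {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
            ],
        }
    ]


def extract_text(converse_response):
    """Join all text blocks in a Converse response, skipping any non-text blocks."""
    blocks = converse_response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)
```

With these, the call becomes `extract_text(br_runtime.converse(modelId=MODEL_ID, messages=build_image_message("What can you see in this image?", png_bytes)))`.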



## Clean Up Resources

After completing your experiments and evaluations, it's crucial to clean up the resources you've created to avoid ongoing charges. This section deletes the imported Bedrock model. Keep in mind that the S3 artifacts and the IAM role created earlier also persist until you remove them.

> ⚠️ **Warning:** The following steps will permanently delete resources. Make sure you've saved any important data or model artifacts before proceeding.

### Delete the Bedrock Custom Model

Let's remove the custom model from Amazon Bedrock:

In [None]:
def delete_bedrock_custom_model(model_name):
    bedrock_client = boto3.client('bedrock', region_name=region)
    try:
        bedrock_client.delete_imported_model(modelIdentifier=model_name)
        print(f"Successfully deleted Bedrock custom model: {model_name}")
    except ClientError as error:  # ClientError is imported from botocore.exceptions in the setup cell
        error_code = error.response['Error']['Code']
        if error_code == 'ValidationException':
            print(f"Error deleting Bedrock custom model: The provided model name is invalid. Model Name: {model_name}")
        elif error_code == 'ResourceNotFoundException':
            print(f"Error: The model '{model_name}' was not found in Bedrock.")
        elif error_code == 'AccessDeniedException':
            print("Error: You do not have permission to delete this model.")
        elif error_code == 'ConflictException':
            print("Error: The model is currently in use or in a state that doesn't allow deletion.")
        else:
            print(f"Error deleting Bedrock custom model: {error}")

delete_bedrock_custom_model(IMPORTED_MODEL_NAME)
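
The IAM role created for the import job is not removed by the cell above. IAM only deletes a role once its managed policies are detached and its inline policies are deleted, so a cleanup sketch could look like the following. `delete_role_with_policies` is a hypothetical helper; it handles both policy types because `iam_role_helper.py` is not shown, and it ignores pagination for brevity.

```python
def delete_role_with_policies(iam_client, role_name):
    """Detach managed policies, delete inline policies, then delete the role."""
    attached = iam_client.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]
    for policy in attached:
        iam_client.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])

    inline_names = iam_client.list_role_policies(RoleName=role_name)["PolicyNames"]
    for policy_name in inline_names:
        iam_client.delete_role_policy(RoleName=role_name, PolicyName=policy_name)

    # The role can only be deleted once no policies remain on it
    iam_client.delete_role(RoleName=role_name)
```

For this notebook the call would be `delete_role_with_policies(boto3.client("iam"), role_name)`. Run it only if nothing else depends on the role.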

## Conclusion

In this notebook, we've demonstrated the process of fine-tuning the Llama 3.2 11b vision model using Amazon SageMaker and importing it into Amazon Bedrock as a custom model. We've covered the following key objectives:

1. Loaded and prepared the HuggingFaceM4/DocumentVQA dataset for fine-tuning
2. Set up the SageMaker training environment and configured hyperparameters
3. Fine-tuned the Llama 3.2 11b vision model using SageMaker
4. Prepared and imported the fine-tuned model into Amazon Bedrock
5. Tested the deployed model with both text and image-based queries

This workflow showcases the power of combining SageMaker's training capabilities with Bedrock's inference API, allowing for the creation of specialized, multi-modal AI models that can be easily integrated into various applications.

Key takeaways from this notebook:

- The HuggingFaceM4/DocumentVQA dataset provides a rich source of document images and related questions, enabling the model to learn document understanding tasks.
- Fine-tuning a large vision-language model like Llama 3.2 11b can be run as a managed SageMaker JumpStart training job on a multi-GPU instance (here, ml.p4de.24xlarge).
- The process of importing a custom model into Bedrock involves several steps, including modifying the model artifacts and setting up the necessary IAM roles.
- The imported model in Bedrock can handle both text-only and image-text queries, demonstrating its multi-modal capabilities.

By following this notebook, you've learned how to create a powerful, custom vision-language model that can be used for a wide range of document understanding and visual question-answering tasks. This model can be further integrated into your applications using the Bedrock API, opening up possibilities for advanced document processing, information extraction, and more.

Next steps:
- Experiment with different datasets or combine multiple datasets to enhance the model's capabilities.
- Explore advanced fine-tuning techniques such as parameter-efficient fine-tuning (PEFT) methods.
- Integrate the custom Bedrock model into your applications and evaluate its performance on real-world tasks.
- Consider optimizing the model for specific use cases by adjusting hyperparameters or using domain-specific data.

Remember to clean up any resources you've created during this notebook to avoid unnecessary charges, and refer to the AWS documentation for best practices in managing and securing your AI models.

    
