# Fine-tuning Llama 3.2 with Vision Capabilities - Model Training and Inference

## Introduction

In this notebook, we'll use the data prepared in the previous notebook to fine-tune a Llama 3.2 multi-modal model using Amazon Bedrock. After fine-tuning, we'll test the model's performance using the test dataset.

## Setup

First, let's install and import the necessary libraries:

In [None]:
# Install required libraries
%pip install --upgrade pip
%pip install boto3 pillow tqdm matplotlib pandas --upgrade --quiet

In [None]:
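# Restart the kernel so the updated packages take effect
# (this JavaScript works only in the classic Jupyter Notebook; in JupyterLab use Kernel > Restart Kernel)
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")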

In [None]:
import boto3
import os
import json
import time
import base64
import io
import matplotlib.pyplot as plt
from PIL import Image
from tqdm.notebook import tqdm
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Set AWS region
region = "us-west-2"  # At the time of writing, Llama 3.2 fine-tuning on Bedrock is only available in us-west-2

# Create AWS clients
session = boto3.session.Session(region_name=region)
s3_client = session.client('s3')
bedrock = session.client(service_name="bedrock", region_name=region)
bedrock_runtime = session.client(service_name="bedrock-runtime", region_name=region)

In [None]:
# Retrieve stored variables from previous notebook
%store -r bucket_name
%store -r train_data_uri
%store -r validation_data_uri
%store -r test_data_uri
%store -r role_arn
%store -r role_name
%store -r policy_arn

print(f"Bucket name: {bucket_name}")
print(f"Training data URI: {train_data_uri}")
print(f"Validation data URI: {validation_data_uri}")
print(f"Role ARN: {role_arn}")

## Create Fine-tuning Job

Now, we'll create a fine-tuning job for the Llama 3.2 multi-modal model:

In [None]:
# Generate a timestamp for unique naming
timestamp = time.strftime("%Y-%m-%d-%H-%M-%S")

# Define job parameters
job_name = f"llama32-multimodal-ft-{timestamp}"
custom_model_name = f"llama32-multimodal-{timestamp}"
base_model_id = "meta.llama3-2-90b-instruct-v1:0:128k"  # Llama 3.2 vision model ID

# Define hyperparameters
hyperparameters = {
    "epochCount": "2",       # Number of training epochs
    "batchSize": "1",        # Batch size for training
    "learningRate": "0.00001"  # Learning rate
}

# Define output location
output_s3_uri = f"s3://{bucket_name}/output/"

# Create validation data config
validation_data_config = {
    "validators": [{
        "s3Uri": validation_data_uri
    }]
}
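Bedrock's `hyperParameters` argument is a map of strings to strings, which is why every value above is quoted. If you want to try several settings, a small helper can build the dict from numbers. The sketch below is hypothetical (not part of the Bedrock SDK) and reuses the key names from this notebook. It writes floats in positional notation because Python's `str(0.00001)` is `'1e-05'`, and this notebook doesn't establish whether Bedrock accepts scientific notation.

```python
from decimal import Decimal


def to_hyperparameters(epochs: int, batch_size: int, learning_rate: float) -> dict:
    """Build a hyperParameters dict with every value as a string.

    Hypothetical helper; key names mirror the dict used in this notebook.
    """
    if epochs < 1 or batch_size < 1:
        raise ValueError("epochs and batch_size must be positive integers")
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    return {
        "epochCount": str(int(epochs)),
        "batchSize": str(int(batch_size)),
        # Decimal(repr(x)) keeps the shortest float repr; format "f" avoids "1e-05"
        "learningRate": format(Decimal(repr(learning_rate)), "f"),
    }


# Builds the same values as the hyperparameters dict defined above
print(to_hyperparameters(epochs=2, batch_size=1, learning_rate=0.00001))
```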

In [None]:
# Create fine-tuning job
try:
    response = bedrock.create_model_customization_job(
        customizationType="FINE_TUNING",
        jobName=job_name,
        customModelName=custom_model_name,
        roleArn=role_arn,
        baseModelIdentifier=base_model_id,
        hyperParameters=hyperparameters,
        trainingDataConfig={"s3Uri": train_data_uri},
        validationDataConfig=validation_data_config,
        outputDataConfig={"s3Uri": output_s3_uri}
    )
    
    # Get job identifier
    job_arn = response["jobArn"]
    print(f"Fine-tuning job created: {job_arn}")
    
except Exception as e:
    print(f"Error creating fine-tuning job: {e}")
    raise  # Later cells depend on job_arn, so stop here

## Monitor Job Status

Let's monitor the status of our fine-tuning job:

<div style="
    background-color: #fcf8e3; 
    color: #8a6d3b;
    padding: 15px;
    margin-bottom: 20px;
    border: 1px solid #faebcc;
    border-radius: 4px;
    font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">
    <span style="font-weight:bold;">⚠️ Warning:</span> 
    <p>Fine-tuning jobs for Llama 3.2 multi-modal models may take <b>several hours to complete</b>. 
    The exact duration depends on your dataset size, hyperparameters such as the number of epochs, and the availability of training capacity.</p>
</div>

In [None]:
# Function to check job status
def check_job_status(job_arn):
    response = bedrock.get_model_customization_job(jobIdentifier=job_arn)
    return response["status"]

# Get current job status
current_status = check_job_status(job_arn)
print(f"Current job status: {current_status}")

# If job completed successfully, get the model details
if current_status == "Completed":
    model_details = bedrock.get_model_customization_job(jobIdentifier=job_arn)
    custom_model_arn = model_details["outputModelArn"]
    print(f"Fine-tuned model ARN: {custom_model_arn}")

<div style="
    background-color: #fcf8e3; 
    color: #8a6d3b;
    padding: 15px;
    margin-bottom: 20px;
    border: 1px solid #faebcc;
    border-radius: 4px;
    font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">
    <span style="font-weight:bold;">⚠️ Warning:</span> 
    <p>Please ensure the status is <b>"Completed"</b> before proceeding with the following cells. 
    You can re-run the status check cell above to update this status.</p>
</div>
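Instead of re-running the status cell by hand, you can poll until the job reaches a terminal state. The sketch below is generic: it takes any zero-argument function that returns a status string, so it can wrap `check_job_status`. The helper name and the set of terminal statuses (`Completed`, `Failed`, `Stopped`) are assumptions; check the `GetModelCustomizationJob` reference for the statuses your SDK version reports.

```python
import time
from typing import Callable

# Assumed terminal statuses for a Bedrock model customization job;
# anything else (e.g. "InProgress", "Stopping") is treated as still running.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_status(
    get_status: Callable[[], str],
    poll_seconds: int = 300,
    timeout_seconds: int = 12 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status, then return it.

    Hypothetical helper. Raises TimeoutError if the job is still running
    after timeout_seconds. `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, `final_status = wait_for_status(lambda: check_job_status(job_arn))` blocks until the job finishes; if it returns `"Completed"`, re-run the status check cell to read `outputModelArn`.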

## Visualize Training Metrics

Let's download and visualize the training metrics:

In [None]:
# Download training metrics from S3
def download_metrics():
    # Get the job ID from the ARN
    job_id = job_arn.split('/')[-1]
    
    # Define file paths
    train_metrics_s3_key = f"output/model-customization-job-{job_id}/training_artifacts/step_wise_training_metrics.csv"
    
    local_train_metrics = "train_metrics.csv"
    
    # Download files
    try:
        s3_client.download_file(bucket_name, train_metrics_s3_key, local_train_metrics)
        
        print("Metrics downloaded successfully")
        return local_train_metrics
        
    except Exception as e:
        print(f"Error downloading metrics: {e}")
        return None

In [None]:
# Download metrics
train_metrics_file = download_metrics()

# Plot the training loss if the metrics file is available
if train_metrics_file:
    import pandas as pd
    
    # Load metrics
    train_data = pd.read_csv(train_metrics_file)

    # Average the training loss for each step
    train_metrics_step = train_data.groupby('step_number').mean(numeric_only=True)
    
    # Plot
    plt.figure(figsize=(10, 6))
    plt.plot(train_metrics_step.index, train_metrics_step.training_loss, label='Training')
    plt.title('Training Loss')
    plt.ylabel('Loss')
    plt.xlabel('Step')
    plt.legend()
    plt.grid(True)
    plt.show()
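With `batchSize` set to 1, each step's loss comes from a single example, so the curve above can be noisy. A trailing moving average makes the trend easier to read. This is a generic smoothing sketch, not something Bedrock computes; the helper name and the default window of 20 steps are arbitrary choices.

```python
from collections import deque


def moving_average(values, window=20):
    """Trailing moving average over a sequence of loss values.

    Hypothetical helper. The first window-1 points average over however many
    values exist so far, so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    running_total = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            running_total -= recent[0]  # this value is about to leave the window
        recent.append(value)
        running_total += value
        smoothed.append(running_total / len(recent))
    return smoothed
```

To use it, pass the per-step training loss values (a pandas Series converts with `.tolist()`) and plot the result against the same step index as the raw curve.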

## Create Provisioned Throughput

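To run inference with the fine-tuned model, we create provisioned throughput for it. Provisioned throughput is billed for as long as it exists, so be sure to delete it in the clean-up step: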

In [None]:
# Generate a unique name for provisioned throughput
provisioned_model_name = f"llama32-multimodal-prov-{timestamp}"

# Create provisioned throughput
try:
    response = bedrock.create_provisioned_model_throughput(
        modelId=custom_model_arn,
        provisionedModelName=provisioned_model_name,
        modelUnits=1  
    )
    
    provisioned_model_arn = response["provisionedModelArn"]
    print(f"Provisioned model created: {provisioned_model_arn}")
    
    # Monitor provisioning status
    status = bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_arn)["status"]
    print(f"Initial provisioning status: {status}")
    
    progress_bar = tqdm(desc="Provisioning model", bar_format="{desc}: {bar}")
    
    # Poll provisioning status
    while status == "Creating":
        time.sleep(60)  # Check every minute
        status = bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_arn)["status"]
        progress_bar.update(1)
        progress_bar.set_description(f"Provisioning model (Status: {status})")
    
    progress_bar.close()
    
    # Final status
    final_status = bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_arn)["status"]
    print(f"Final provisioning status: {final_status}")
    
except Exception as e:
    print(f"Error creating provisioned throughput: {e}")



## Test with Inference

Now, let's test our fine-tuned model using the test dataset:
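Each record in `test.jsonl` uses the same conversation format as the training data, and the inference loop in this section reads the question, image URI, and reference answer at fixed positions (`content[0]`, `content[1]`). The hypothetical helper below looks the blocks up by key instead, so it still works if a record lists the image before the text. It assumes, as the loop does, that the first message is the user turn and the second is the assistant's answer.

```python
def _first_block(content: list, key: str) -> dict:
    """Return the first content block containing `key` (e.g. "text" or "image")."""
    for block in content:
        if key in block:
            return block
    raise ValueError(f"No {key!r} block in message content")


def parse_sample(sample: dict) -> tuple:
    """Return (question, image_s3_uri, expected_answer) for one test record.

    Hypothetical helper; assumes messages[0] is the user turn (text + image)
    and messages[1] is the assistant turn holding the reference answer.
    """
    user_msg, assistant_msg = sample["messages"][0], sample["messages"][1]
    question = _first_block(user_msg["content"], "text")["text"]
    image_block = _first_block(user_msg["content"], "image")["image"]
    image_uri = image_block["source"]["s3Location"]["uri"]
    answer = _first_block(assistant_msg["content"], "text")["text"]
    return question, image_uri, answer
```

Inside the loop, `question, image_s3_uri, expected_answer = parse_sample(sample)` would replace the three indexing lines.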

In [None]:
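# Function to map an S3 image URI to its local copy (converted to PNG if needed)
def s3_uri_to_local_path(s3_uri):
    """Return the local path of the test image referenced by s3_uri.

    Expects the test images under llava_images/test/. Non-PNG images are
    converted to PNG so that run_inference can send them with format "png".
    """
    filename = s3_uri.split('/')[-1]
    docname, ext = os.path.splitext(filename)

    if ext.lower() != ".png":
        png_path = f"llava_images/test/{docname}.png"
        img = Image.open(f"llava_images/test/{filename}")
        # PNG can't store modes such as CMYK, so convert those to RGB first
        if img.mode not in ("RGB", "RGBA", "L"):
            img = img.convert("RGB")
        img.save(png_path, optimize=True, compress_level=9)
        return png_path
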
    return f"llava_images/test/{filename}"
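`s3_uri_to_local_path` relies on a local copy of the test images. If that copy is missing, you could fetch the image from S3 instead. The hypothetical helper below splits an `s3://bucket/key` URI into the two parts that `s3_client.download_file(bucket, key, filename)` expects.

```python
def parse_s3_uri(s3_uri: str) -> tuple:
    """Split "s3://bucket/key" into (bucket, key).

    Hypothetical helper; raises ValueError for anything that isn't an S3 URI
    with both a bucket and a key.
    """
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {s3_uri!r}")
    return bucket, key
```

For example, `bucket, key = parse_s3_uri(image_s3_uri)` followed by `s3_client.download_file(bucket, key, os.path.join("llava_images/test", os.path.basename(key)))` restores a missing image before calling `s3_uri_to_local_path`.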

In [None]:
# Function to run inference using Bedrock converse API
def run_inference(image_path, question, model_id):
    """Run inference using the converse API with local image"""
    
    # Read image as binary data directly
    with open(image_path, "rb") as f:
        image_bytes = f.read()

    # Build a single user message containing the image followed by the question
    message = {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",  # s3_uri_to_local_path converts test images to PNG
                    "source": {
                        "bytes": image_bytes  # Raw bytes, no base64 encoding
                    }
                }
            },
            {
                "text": question
            }
        ]
    }

    inference_config = {"temperature": 0.01}
    
    # Call the converse API
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[message],
        inferenceConfig=inference_config
    )
    
    # The reply text is in the first content block of the output message
    response_text = response["output"]["message"]["content"][0]["text"]
    return response_text
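`run_inference` always sends `"format": "png"`, which is correct only because the test images are converted to PNG first. The Converse API image block also accepts `jpeg`, `gif`, and `webp`, so another option is to send the original bytes and derive the format from the file signature. A minimal sketch (the helper name is hypothetical):

```python
def detect_image_format(data: bytes) -> str:
    """Return the Converse image format name for PNG, JPEG, GIF, or WebP bytes.

    Hypothetical helper; checks the standard magic numbers at the start of the file.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP files are RIFF containers with "WEBP" at bytes 8-11
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    raise ValueError("Unsupported or unrecognized image format")
```

Inside `run_inference`, `"format": detect_image_format(image_bytes)` would then replace the hard-coded `"png"`.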


In [None]:
# Load test samples from local file
with open('test.jsonl', 'r') as f:
    # Parse non-empty lines and keep the first 5 samples
    test_samples = [json.loads(line) for line in f if line.strip()][:5]

# Run inference on test samples
for i, sample in enumerate(test_samples):
    # Extract information
    question = sample["messages"][0]["content"][0]["text"]
    image_s3_uri = sample["messages"][0]["content"][1]["image"]["source"]["s3Location"]["uri"]
    expected_answer = sample["messages"][1]["content"][0]["text"]
    
    # Convert S3 URI to local file path
    local_image_path = s3_uri_to_local_path(image_s3_uri)
    
    print(f"\n=== Test Sample {i+1} ===")
    print(f"Question: {question}")
    print(f"Expected Answer: {expected_answer}")
    
    try:
        # Run inference against the provisioned throughput of the fine-tuned model
        model_response = run_inference(local_image_path, question, provisioned_model_arn)
        print(f"Model Response: {model_response}")
    except Exception as e:
        print(f"Inference error: {e}")
    
    # Display the image
    image = Image.open(local_image_path)
    plt.figure(figsize=(6, 6))
    plt.imshow(image)
    plt.axis('off')
    plt.title(f"Test Image {i+1}")
    plt.show()
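Printing expected and actual answers side by side works for five samples; for a larger test set, a numeric score is easier to compare across runs. The sketch below swaps in SQuAD-style normalized exact match and token-level F1, which are common for short-answer QA. They are not part of Bedrock or this notebook's original workflow, and they can under-score correct answers phrased differently from the reference.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True if both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty scores zero
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Inside the loop, you could append `token_f1(model_response, expected_answer)` to a list and report its mean after the last sample.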

## Clean Up Resources

Finally, let's clean up the resources we created:

In [None]:
# Function to clean up resources
def clean_up():
    print("Cleaning up resources...")
    
    # Delete provisioned model throughput
    try:
        print("Deleting provisioned model throughput...")
        bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_arn)
        print("Provisioned model throughput deleted")
    except Exception as e:
        print(f"Error deleting provisioned model throughput: {e}")
    
    # Clean up IAM resources
    iam = session.client('iam')
    try:
        print("Detaching policy from role...")
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
        
        print("Deleting policy...")
        iam.delete_policy(PolicyArn=policy_arn)
        
        print("Deleting role...")
        iam.delete_role(RoleName=role_name)
        
        print("IAM resources cleaned up")
    except Exception as e:
        print(f"Error cleaning up IAM resources: {e}")
    
    # The S3 bucket (training data and job output) and the custom model are not deleted here, in case you want to keep them
    
    print("Cleanup completed")

In [None]:
clean_up()
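`clean_up` intentionally leaves the S3 bucket in place. If you later want to remove only the fine-tuning job's output, which this notebook writes under the `output/` prefix, a sketch using the S3 `list_objects_v2` paginator and `delete_objects` could look like this. The helper name is hypothetical and it expects a boto3 S3 client such as `s3_client`. On a versioned bucket, this only adds delete markers rather than removing old versions.

```python
def delete_s3_prefix(s3_client, bucket: str, prefix: str, batch_size: int = 1000) -> int:
    """Delete every object under `prefix` in `bucket`; return how many were deleted.

    Hypothetical helper. DeleteObjects accepts at most 1000 keys per request,
    hence the batching.
    """
    if not prefix:
        raise ValueError("Refusing to delete with an empty prefix (whole bucket)")
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")

    deleted = 0
    batch = []

    def flush():
        # Send the current batch and fail loudly if S3 reports per-key errors
        nonlocal deleted, batch
        response = s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
        if response.get("Errors"):
            raise RuntimeError(f"Some deletions failed: {response['Errors'][:3]}")
        deleted += len(batch)
        batch = []

    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            batch.append({"Key": obj["Key"]})
            if len(batch) == batch_size:
                flush()
    if batch:
        flush()
    return deleted
```

For example, `delete_s3_prefix(s3_client, bucket_name, "output/")` removes the job output while keeping the training data.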

## Conclusion

In this notebook, we successfully fine-tuned a Llama 3.2 multi-modal model using Amazon Bedrock. We:

- Set up and launched a fine-tuning job with our prepared dataset
- Monitored the job progress and visualized training metrics
- Created provisioned throughput for the fine-tuned model
- Tested the model's performance with inference on test samples
- Cleaned up resources we no longer needed

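The fine-tuned model has learned to answer questions about images based on patterns in our training data. The custom model remains in your account, so you can create a new provisioned throughput whenever you want to run inference with it again. For real-world applications, consider a larger and more diverse dataset tailored to your specific use case.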