Objective:
- Generate captions for images using Claude 3 Haiku on Amazon Bedrock

What are we using?
- Python
- AWS
-- Amazon Bedrock
-- IAM
-- Boto3 (AWS SDK for Python)
- Claude 3 Haiku

What do you need to install?
- python-dotenv
-- to load our credentials from environment variables instead of hardcoding them in our code (security best practice)
- boto3


In [1]:
import os
import dotenv
import boto3
import json
import base64

In [11]:
# load environment variables
# we use override=True so that values edited in the external .env file are refreshed on re-run;
# the Jupyter extension for VS Code seems to keep the old values even if you close and reopen the notebook
dotenv.load_dotenv(".env", override=True)

True

In [24]:
# set our credentials from the environment values loaded from the .env file
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
AWS_REGION = os.environ.get('AWS_REGION')
AWS_SESSION_TOKEN = os.environ.get('AWS_SESSION_TOKEN')
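Because `os.environ.get` returns `None` for a variable that is not set, a typo in the `.env` file would only surface later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper written for illustration, not part of boto3 or python-dotenv.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for every name, or raise if any is missing or empty.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

It could be called as `require_env(["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"])`. `AWS_SESSION_TOKEN` is only present when you use temporary credentials, so whether to require it depends on your setup.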

In [25]:
# sanity check: confirm the values were loaded WITHOUT printing the secrets themselves
# (printed credentials get saved in the notebook output; any key that has been printed should be rotated)
for name in ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN'):
    print(name, 'is set' if os.environ.get(name) else 'is MISSING')
print(AWS_REGION)

AWS_ACCESS_KEY_ID is set
AWS_SECRET_ACCESS_KEY is set
AWS_SESSION_TOKEN is set
us-west-2


In [26]:
# instantiate a bedrock client using boto3 (AWS' official Python SDK)
bedrock_runtime_client = boto3.client(
    'bedrock-runtime',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
    aws_session_token=AWS_SESSION_TOKEN
)

We now need to find the right model ID so that we can send prompts to Claude 3 Haiku.
Bedrock is a serverless portal to many Foundation Models, and you distinguish between them by their unique model IDs.

You can find these in two ways:
1/ Via the Bedrock Console:
+ navigate to the AWS Console
+ navigate to Amazon Bedrock
+ find the menu where it lists the Foundation Models
+ each model's details include an API request sample from which you can copy the model ID

2/ Via the AWS CLI:
+ type 

    **aws bedrock list-foundation-models**

+ scroll till you find the one you want
+ copy the model id
+ BONUS TIP: you can filter results ahead of time by using the --by-provider option. In our case, since we want to find out the model id for Anthropic's Claude 3 Haiku model, we could type the following instead to make our lives easier:

    **aws bedrock list-foundation-models --by-provider Anthropic**
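The same lookup can be done from Python. The sketch below assumes a boto3 client for the `bedrock` service (the control-plane API, as opposed to the `bedrock-runtime` client used for invoking models); `anthropic_model_ids` is a hypothetical helper name.

```python
def anthropic_model_ids(bedrock_client):
    """Return the model IDs of the Anthropic models visible to this client.

    `bedrock_client` is assumed to be boto3.client("bedrock", ...),
    not the "bedrock-runtime" client used to invoke models.
    """
    response = bedrock_client.list_foundation_models(byProvider="Anthropic")
    return [summary["modelId"] for summary in response["modelSummaries"]]
```

With credentials configured, something like `anthropic_model_ids(boto3.client("bedrock", region_name=AWS_REGION))` should return a list that includes the Claude 3 Haiku ID, provided the model is offered in that region.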

In [27]:
# select the model id
model_id = "anthropic.claude-3-haiku-20240307-v1:0"

In [28]:
# read our image as binary data first
with open('data/aws-serverless-api-architecture-diagram.png', 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode()
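The media type declared in the payload has to match the actual image format, so it can help to derive it from the file name instead of typing it by hand. A minimal sketch, assuming a hypothetical helper `image_block` and relying on the standard library's `mimetypes` module (which guesses from the extension only, not from the file contents):

```python
import base64
import mimetypes

# Image formats Claude 3 accepts
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(path):
    """Build an image content item for the Messages payload (hypothetical helper)."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported or unknown image type for {path!r}: {media_type}")
    with open(path, "rb") as image_file:
        data = base64.b64encode(image_file.read()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

For the `.png` diagram used in this notebook, the helper would declare `image/png`.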

In [29]:
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",  # must match the actual file format (our diagram is a PNG)
                        "data": encoded_image
                    }
                },
                {
                    "type": "text",
                    "text": "Explain this AWS architecture diagram."
                }
            ]
        }
    ],
    "max_tokens": 4096,  # Claude 3 Haiku generates at most 4096 output tokens
    "anthropic_version": "bedrock-2023-05-31"
}

One of the differences between Foundation Models is how you interact with them. Each expects its input in its own format, so you need to look up the correct payload structure for the model you're using.

For Claude 3 models, the template is the following:
payload = {
    "messages": [
        {
            "role": "",
            "content": []
        }
    ],
    "anthropic_version": ""
}

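messages is an array of JSON objects and must contain at least one item. Two other top-level fields are required alongside it: anthropic_version and max_tokens (the cap on generated tokens), which the template above omits. Each message must strictly follow the schema and declare:
- role: this can be either user or assistant. A system prompt is not a message role; it goes in a separate top-level system field.
- content: this is also an array, as you can send multiple content items (for example an image and a text prompt) in one API request to Claude. At minimum you will have one.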
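The schema rules can be checked before a request is sent. The sketch below is one possible validation, assuming a hypothetical helper `build_payload` and an arbitrary default of 1024 for `max_tokens`. In Anthropic's Messages API a message role is `user` or `assistant`, and a system prompt is passed as a separate top-level `system` field.

```python
ALLOWED_ROLES = {"user", "assistant"}


def build_payload(messages, max_tokens=1024, system=None):
    """Assemble a Claude 3 request body for Bedrock (hypothetical helper)."""
    if not messages:
        raise ValueError("messages must contain at least one item")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"invalid role: {message.get('role')!r}")
        if not message.get("content"):
            raise ValueError("each message needs at least one content item")
    payload = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": messages,
    }
    if system is not None:
        # The system prompt is a top-level field, not a message
        payload["system"] = system
    return payload
```

The returned dictionary can be serialized with `json.dumps` and passed as the `body` of `invoke_model`.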

https://community.aws/content/2dfToY7frDS4y8LsTkntgBzORju/hands-on?lang=en


Note that the image has to be loaded and converted to base64 before it can go into the payload, which is what cell In [28] above does.

In [30]:
# we're ready to invoke the model!
response = bedrock_runtime_client.invoke_model(
    modelId=model_id,
    contentType="application/json",
    body=json.dumps(payload)
)

In [31]:
# now we need to read the response. The body comes back as a stream of bytes, so to display it in one go we read the full stream first,
# then parse it as JSON into a dictionary so we can access the field containing the text without all the metadata noise
output_binary = response["body"].read()
output_json = json.loads(output_binary)
output = output_json["content"][0]["text"]
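Indexing `["content"][0]["text"]` assumes the first content item is a text block. A slightly more defensive sketch, using a hypothetical helper `read_text`, joins every text block and also returns the `stop_reason` field of the response, which reads `max_tokens` when the answer was cut off by the token limit:

```python
import json


def read_text(response):
    """Return (text, stop_reason) from an invoke_model response (hypothetical helper)."""
    body = json.loads(response["body"].read())
    text = "".join(
        block["text"]
        for block in body.get("content", [])
        if block.get("type") == "text"
    )
    return text, body.get("stop_reason")
```

For example, `output, stop_reason = read_text(response)` followed by a check on `stop_reason == "max_tokens"` would flag a truncated answer.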


In [32]:
print(output)

This AWS architecture diagram illustrates a typical setup involving various AWS services and components. Let's go through the different elements:

1. Application: This represents the application or workload that is being deployed on AWS.

2. API Gateway: The API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. It acts as the entry point for the application to interact with the backend services.

3. AWS Lambda: AWS Lambda is a serverless computing service that allows you to run code without provisioning or managing servers. In this diagram, Lambda functions are depicted as part of the backend processing.

4. DynamoDB: DynamoDB is a fully managed NoSQL database service provided by AWS. It is shown as the data store for the application.

5. API Gateway Event Trigger: This represents the integration between the API Gateway and the Lambda functions, allowing the API calls to trigger the execution of the