# Interacting with Claude 3 Sonnet using images

## Context

Claude 3 accepts images alongside text, so you can ask the model questions about an image. Claude 3 requires the new Messages API body format. The following is an example of a multimodal request in the Messages API format.

Please see [Claude Vision](https://docs.anthropic.com/claude/docs/vision) for more details on Claude 3 multimodal capabilities and [Amazon Bedrock Claude Messages API](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html) for working with the new Messages API on Bedrock.


In [None]:
{
  "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
  "contentType": "application/json",
  "accept": "application/json",
  "body": {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1500,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "source": {
              "type": "base64",
              "media_type": "image/jpeg",
              "data": "iVBORw..."
            }
          },
          {
            "type": "text",
            "text": "What's in this image?"
          }
        ]
      }
    ]
  }
}

In [None]:
%pip install --upgrade pip
# Quote the version specifiers so the shell does not treat ">" as output redirection.
%pip install "boto3>=1.33.2" --force-reinstall --quiet
%pip install "botocore>=1.33.2" --force-reinstall --quiet


### Restart the kernel so that the updated packages installed above are loaded

In [None]:
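# Restart the kernel. This JavaScript call only works in the classic Jupyter Notebook UI;
# in other front ends, restart the kernel from the menu instead.
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")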

### Follow the steps below to set up necessary packages
1. Import the libraries needed to create the __bedrock-runtime__ client that invokes foundation models, to format our JSON bodies, and to convert our images to base64 encoding.

In [None]:
import base64
import json
import os
import shutil
import time
import xml.etree.ElementTree as ET
from datetime import datetime
from os import listdir, makedirs
from os.path import isfile, join

import boto3
from IPython.display import HTML, JSON, display, Image as IImage
from PIL import Image, ImageDraw, ImageFont


# Bedrock runtime client used to invoke the model.
bedrock_client = boto3.client('bedrock-runtime', region_name='ap-south-1')


### Define helper function to pass our models, messages, and inference parameters

In [None]:
def generate_message(bedrock_runtime, model_id, messages, max_tokens, top_p, temp):
    """Invoke a Claude 3 model with the Messages API and return the parsed response body."""
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": messages,
            "temperature": temp,
            "top_p": top_p,
        }
    )

    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    # The body is returned as a stream; read it and decode the JSON payload.
    response_body = json.loads(response.get('body').read())

    return response_body
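
The dictionary returned by the helper above holds the model output as a list of content blocks. As a minimal sketch, the hypothetical helper `extract_text` below (not part of the original notebook) joins the text blocks into one string. The `sample` dictionary is made-up data that only mimics the shape of a Messages API response, so the cell runs without calling Bedrock.

```python
def extract_text(response_body):
    """Join the text of every text block in a Messages API response body."""
    blocks = response_body.get("content", [])
    return "".join(
        block.get("text", "") for block in blocks if block.get("type") == "text"
    )


# Made-up response, for illustration only (not real model output).
sample = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Hello"},
        {"type": "text", "text": " world"},
    ],
    "stop_reason": "end_turn",
}

print(extract_text(sample))
# A stop_reason of "max_tokens" would suggest the reply was cut off.
print(sample["stop_reason"])
```

Checking `stop_reason` before using the text can help catch replies that were truncated by a small `max_tokens` value.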

In [None]:
bucketName = "anagh-sample"
imageName = "pan_sample.JPG"
s3 = boto3.client('s3')

# Preview the image through a presigned URL, then download it locally for encoding.
presigned_url = s3.generate_presigned_url('get_object', Params={'Bucket': bucketName, 'Key': imageName})
display(IImage(url=presigned_url))
s3.download_file(bucketName, imageName, imageName)

### Process the JPEG image

Here we encode the JPEG image as base64. The result is used as the image component of the message given to Claude 3. For further details on processing images for use in an API call, please see [Claude Vision](https://docs.anthropic.com/claude/docs/vision).

In [None]:
# Read the reference image from file and encode it as a base64 string.
with open(imageName, "rb") as image_file:
    content_image = base64.b64encode(image_file.read()).decode('utf8')
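
The `media_type` sent with the image has to match the actual file contents, and a file extension such as `.JPG` is not a guarantee. As a rough sketch, the hypothetical helper `guess_media_type` below (an addition, not part of the original notebook) inspects the leading bytes of the file for the JPEG, PNG, GIF, and WebP signatures. It assumes these four formats are the ones of interest and raises an error for anything else.

```python
def guess_media_type(image_bytes):
    """Guess an image media type from the file's leading signature bytes."""
    signatures = [
        (b"\xff\xd8\xff", "image/jpeg"),
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"GIF87a", "image/gif"),
        (b"GIF89a", "image/gif"),
    ]
    for magic, media_type in signatures:
        if image_bytes.startswith(magic):
            return media_type
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("Unrecognized image format")


# Usage with in-memory bytes; in the notebook you would pass the bytes read from imageName.
print(guess_media_type(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
```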

### Create message payload that incorporates text and image input

Here we create the multimodal content message for our input to Claude 3, with separate JSON objects for the image component and the text component.

In [None]:
message_mm = [
    {
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": content_image}},
            {"type": "text", "text": "Extract text from image. List only the text found as a json document having structure with each word inside a text element and all text elements of a line in a line element"}
        ]
    }
]
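
Since the same message shape is reused with different prompts, it may be convenient to wrap it in a function. The sketch below is one possible way to do that; `build_image_message` is a hypothetical name, and the default `media_type` of `image/jpeg` is an assumption that matches the image used in this notebook.

```python
import base64


def build_image_message(image_bytes, prompt, media_type="image/jpeg"):
    """Build a single-turn Messages API payload with one image and one text block."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": media_type, "data": encoded},
                },
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Placeholder bytes stand in for a real image file here.
messages = build_image_message(b"abc", "What's in this image?")
print(messages[0]["content"][1]["text"])
```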


In [None]:
response = generate_message(
    bedrock_client,
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=message_mm,
    max_tokens=512,
    temp=0.5,
    top_p=0.9,
)

In [None]:
response

In [None]:
response['content'][0]['text']
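
The prompt asks for a JSON document, but the reply arrives as a plain string and may wrap the JSON in extra prose. As a hedged sketch, the hypothetical helper `parse_json_reply` below scans the reply for the first position where a JSON object or array can be decoded, using `json.JSONDecoder.raw_decode` from the standard library. The example string is made up and is not real model output.

```python
import json


def parse_json_reply(text):
    """Return the first JSON object or array that can be decoded from a reply string."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                # Not valid JSON at this position; keep scanning.
                continue
    raise ValueError("No JSON found in reply")


# Made-up reply, for illustration only.
reply = 'Here is the extracted text: {"lines": [{"text": ["INCOME", "TAX"]}]}'
print(parse_json_reply(reply))
```

If the reply was truncated because `max_tokens` was too small, the JSON will be incomplete and this helper raises `ValueError`.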

In [None]:
prompt = """The given document is an Indian PAN card where
- The person-name is in the first line (key is person-name)
- The parent-name is in the second line (key is parent-name)
- The Date-Of-Birth (key is DOB)
- PAN number (key is PAN)

Extract all the above fields as key-value pairs and respond as JSON
"""

In [None]:
message_mm = [
    {
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": content_image}},
            {"type": "text", "text": prompt}
        ]
    }
]

response = generate_message(
    bedrock_client,
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=message_mm,
    max_tokens=512,
    temp=0.5,
    top_p=0.9,
)

In [None]:
response['content'][0]['text']
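
Model output can be wrong, so it may be worth sanity-checking the extracted fields before using them. The sketch below assumes the reply has already been parsed into a dictionary with the four keys named in the prompt. `check_pan_fields` is a hypothetical helper, the PAN check uses the standard pattern of five letters, four digits, and one letter, and the `DD/MM/YYYY` date format is an assumption about how the date appears on the card. The sample values are made up.

```python
import re
from datetime import datetime

# A PAN is ten characters: five letters, four digits, one letter.
PAN_PATTERN = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]$")
EXPECTED_KEYS = ("person-name", "parent-name", "DOB", "PAN")


def check_pan_fields(fields, dob_format="%d/%m/%Y"):
    """Return a list of problems found in the extracted fields (empty if none)."""
    problems = []
    for key in EXPECTED_KEYS:
        if not str(fields.get(key, "")).strip():
            problems.append(f"missing {key}")

    pan = str(fields.get("PAN", "")).strip()
    if pan and not PAN_PATTERN.match(pan):
        problems.append("PAN does not match the pattern AAAAA9999A")

    dob = str(fields.get("DOB", "")).strip()
    if dob:
        try:
            # dob_format is an assumption; adjust it to the format on your documents.
            datetime.strptime(dob, dob_format)
        except ValueError:
            problems.append("DOB is not a valid date in the expected format")
    return problems


# Made-up values, for illustration only.
sample_fields = {
    "person-name": "SAMPLE NAME",
    "parent-name": "SAMPLE PARENT",
    "DOB": "01/01/1990",
    "PAN": "ABCDE1234F",
}
print(check_pan_fields(sample_fields))
```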