### Extract Information from Uploaded ID Cards using Claude 3 Sonnet + Bedrock Converse API

In this notebook we use Amazon Bedrock's Converse API with Tool Use to extract structured data from images of ID cards.

### 1. Bedrock Setup

In [2]:
%%capture
#Install dependencies 
!pip3 install -qU boto3

In [3]:
#Imports
import boto3
import json, sys
from datetime import datetime

print('Running boto3 version:', boto3.__version__)

Running boto3 version: 1.34.144


In [4]:
#Setting Up Bedrock Client
region = 'us-east-1' #AWS Region
print('Using region: ', region)

bedrock = boto3.client( #Bedrock Client 
    service_name = 'bedrock-runtime',
    region_name = region,
    )
#Model Id (Claude 3 Sonnet)
modelId = "anthropic.claude-3-sonnet-20240229-v1:0"

Using region:  us-east-1


In [5]:
#Open the image and read it into bytes (real license)
with open("DriverLicense1.png", "rb") as image_file:
    image_byte = bytearray(image_file.read())
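
The Converse API takes the image format as a separate field next to the raw bytes, so both can be derived from the file path in one place. The sketch below is only an illustration: `load_image_block` and `SUPPORTED_FORMATS` are hypothetical names, not part of boto3, and the helper assumes the file extension matches the actual image encoding.

```python
from pathlib import Path

# Extension -> format value used in a Converse image block (png, jpeg, gif, webp)
SUPPORTED_FORMATS = {"png": "png", "jpg": "jpeg", "jpeg": "jpeg", "gif": "gif", "webp": "webp"}

def load_image_block(path):
    """Hypothetical helper: build a Converse image content block from a file path."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported image extension: {ext!r}")
    return {
        "image": {
            "format": SUPPORTED_FORMATS[ext],
            "source": {"bytes": Path(path).read_bytes()},
        }
    }
```

The returned dictionary can be placed directly in the `content` list of a user message, ahead of the text block.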

### 2. Bedrock Converse API + Tool Use 

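In the cell below, we give the model two tools. "extract_info" asks for the textual fields on the license (document type, ID number, name, address, and dates), and "extract_visual" asks for visual details such as colors, symbols, the signature, and the photo. Neither tool runs any code: each tool spec is a JSON schema, and when the model chooses a tool it returns the requested fields as a JSON object that follows that schema.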

In [6]:
#Tool Spec Outline
tools = [
    {
        "toolSpec": {
            "name": "extract_info",
            "description": "Extract the following details from the Driver's License provided",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "entities": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "DocumentType": {"type": "string", "description": "The document type of the submitted image"},
                                    "IDNumber": {"type": "string", "description": "The ID Number of the document"},
                                    "FirstName": {"type": "string", "description": "The first name or given name of the individual"},
                                    "MiddleName": {"type": "string", "description": "The middle name or initial of the individual. If there is none, use 'N/A'."},
                                    "LastName": {"type": "string", "description": "The last name or family name of the individual"},
                                    "Address": {"type": "string", "description": "The Address of the individual"},
                                    "DOB": {"type": "string", "description": "The date of birth of the individual"},
                                    "IssueDate": {"type": "string", "description": "The issue date of the document"},
                                    "ExpiryDate": {"type": "string", "description": "The expiry date of the document"}
                                },
                                "required": [
                                    "DocumentType", 
                                    "IDNumber", 
                                    "FirstName",
                                    "MiddleName",
                                    "LastName",
                                    "Address",
                                    "DOB",
                                    "IssueDate",
                                    "ExpiryDate"
                                ]
                            }
                        }
                    },
                    "required": ["entities"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "extract_visual",
            "description": "Extract the following visual details from the Driver's License provided",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "entities": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "StateTextColor": {"type": "string", "description": "The color of the text that the State is written in"},
                                    "SymbolTopRight": {"type": "string", "description": "The description of the symbol on the top right of the document"},
                                    "SymbolBottomRight": {"type": "string", "description": "The description of the symbol on the bottom right of the document"},
                                    "Signature": {"type": "string", "description": "color & style of the signature on the document"},
                                    "Picture": {"type": "string", "description": "Description of the picture of the individual, and any obvious alterations present"},
                                    "Blemishes": {"type": "string", "description": "Any visual blemishes present on the document"}
                                },
                                "required": [
                                    "StateTextColor", 
                                    "SymbolTopRight", 
                                    "SymbolBottomRight", 
                                    "Signature",
                                    "Picture",
                                    "Blemishes"
                                ]
                            }
                        }
                    },
                    "required": ["entities"]
                }
            }
        }
    }
]
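
A misspelled key under `properties` is easy to miss, because the request is still accepted and the model tends to follow the `required` list. The sketch below checks that every required field has a matching property. `find_schema_mismatches` is a hypothetical helper, and it assumes each tool uses the same `entities` array-of-objects layout as the specs above; the stand-in spec contains a deliberate misspelling.

```python
def find_schema_mismatches(tool_list):
    """Return (tool name, field) pairs where a required field has no matching property."""
    mismatches = []
    for tool in tool_list:
        spec = tool["toolSpec"]
        # Assumes the 'entities' -> 'items' layout used in this notebook
        item_schema = spec["inputSchema"]["json"]["properties"]["entities"]["items"]
        defined = set(item_schema["properties"])
        for field in item_schema.get("required", []):
            if field not in defined:
                mismatches.append((spec["name"], field))
    return mismatches

# Small stand-in spec with a deliberately misspelled property key ("LastNme")
example_tools = [{
    "toolSpec": {
        "name": "example_tool",
        "description": "Stand-in tool for illustration",
        "inputSchema": {"json": {
            "type": "object",
            "properties": {"entities": {"type": "array", "items": {
                "type": "object",
                "properties": {
                    "FirstName": {"type": "string"},
                    "LastNme": {"type": "string"},
                },
                "required": ["FirstName", "LastName"],
            }}},
            "required": ["entities"],
        }},
    }
}]

print(find_schema_mismatches(example_tools))  # [('example_tool', 'LastName')]
```

Running the same check on `tools` before calling the API gives an empty list when every required field is defined.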


In the cell below, we define a helper function that calls the Bedrock Converse API with the image, the prompt, and the tool specs.

In [7]:
#Set up a function to call the Bedrock Converse API
def invoke_bedrock_model(client, model_id, prompt, image_byte, max_tokens=2000, temperature=0, top_p=0.9):
    try:
        response = client.converse(
            modelId=model_id, #Model to invoke
            toolConfig={"tools": tools}, #Tool specs defined in the cell above
            messages=[ #Messages is where the payload goes
                {
                    "role": "user", #Indicating user input
                    "content": [ #Content of the payload as a list of blocks
                        {
                            "image": { #Passing an image, we put this before the text as best practice
                                "format": "png", #Image format
                                "source": {"bytes": image_byte} #Passing image bytes
                            }
                        },
                        {
                            "text": prompt #Text block, 'prompt' variable we will create in the cell below
                        }
                    ]
                }
            ],
            inferenceConfig={ #Inference parameters
                "temperature": temperature,
                "maxTokens": max_tokens,
                "topP": top_p
            }
        )
    except Exception as e: #Catch model invocation errors
        print(e)
        return "Model invocation error"
    try:
        #Plain-text replies: return the text plus latency and token usage
        result = response['output']['message']['content'][0]['text'] \
        + '\n--- Latency: ' + str(response['metrics']['latencyMs']) \
        + 'ms - Input tokens:' + str(response['usage']['inputTokens']) \
        + ' - Output tokens:' + str(response['usage']['outputTokens']) + ' ---\n'
        return result
    except Exception as e: #Tool-use replies have no 'text' key, so the KeyError is printed here
        print(e)
    #Return the full response so the toolUse block can be parsed in a later cell
    return response
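
The function passes both tools and leaves the choice between them to the model, steered only by the prompt. The Converse API's `toolConfig` also accepts a `toolChoice` field that can request a specific tool by name. The sketch below builds such a config; `build_tool_config` is a hypothetical helper, and whether a forced tool choice is honored depends on the model, so treat it as an assumption to verify against the Bedrock documentation for the model in use.

```python
def build_tool_config(tool_list, forced_tool=None):
    """Hypothetical helper: build a Converse toolConfig, optionally forcing one tool."""
    config = {"tools": tool_list}
    if forced_tool is not None:
        # Guard against asking for a tool that was never defined
        names = {tool["toolSpec"]["name"] for tool in tool_list}
        if forced_tool not in names:
            raise ValueError(f"Unknown tool: {forced_tool!r}")
        config["toolChoice"] = {"tool": {"name": forced_tool}}
    return config
```

The result would replace the `{"tools": tools}` argument, for example `toolConfig=build_tool_config(tools, "extract_info")`.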

### 3. Inference 

In [8]:
%%time
#User Input
prompt = '''
Use the extract_info tool based on the image given

'''


#Calling the Converse API
response = invoke_bedrock_model(bedrock, modelId, prompt, image_byte)
print(response)

'text'
{'ResponseMetadata': {'RequestId': '2b7b54aa-2ef0-4dba-8f98-4e0467d678d5', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 16 Jul 2024 01:46:45 GMT', 'content-type': 'application/json', 'content-length': '524', 'connection': 'keep-alive', 'x-amzn-requestid': '2b7b54aa-2ef0-4dba-8f98-4e0467d678d5'}, 'RetryAttempts': 0}, 'output': {'message': {'role': 'assistant', 'content': [{'toolUse': {'toolUseId': 'tooluse_h_1j-ffJSQykWg7PIa3uZQ', 'name': 'extract_info', 'input': {'entities': [{'DocumentType': 'Driver License', 'IDNumber': '123 456 789', 'FirstName': 'Michelle', 'MiddleName': 'Marie', 'LastName': 'Motorist', 'Address': '2345 ANYWHERE STREET ALBANY, NY 12222', 'DOB': '10/31/1990', 'IssueDate': '03/07/2022', 'ExpiryDate': '10/31/2029'}]}}}]}}, 'stopReason': 'tool_use', 'usage': {'inputTokens': 1550, 'outputTokens': 173, 'totalTokens': 1723}, 'metrics': {'latencyMs': 4736}}
CPU times: user 28.8 ms, sys: 7.42 ms, total: 36.2 ms
Wall time: 4.8 s
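
Because a tool-use reply carries no `text` block, the latency and token summary built inside `invoke_bedrock_model` is never produced for these calls. The same figures are still available in the `metrics` and `usage` fields of the response. The sketch below reads them directly; `summarize_usage` is a hypothetical helper, and the sample dictionary copies the values from the output above.

```python
def summarize_usage(response):
    """Hypothetical helper: format latency and token counts from a Converse response."""
    usage = response.get("usage", {})
    metrics = response.get("metrics", {})
    return (f"Latency: {metrics.get('latencyMs')}ms - "
            f"Input tokens: {usage.get('inputTokens')} - "
            f"Output tokens: {usage.get('outputTokens')}")

# Values copied from the response printed above
sample = {
    "usage": {"inputTokens": 1550, "outputTokens": 173, "totalTokens": 1723},
    "metrics": {"latencyMs": 4736},
}
print(summarize_usage(sample))  # Latency: 4736ms - Input tokens: 1550 - Output tokens: 173
```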


In [9]:
#Extract only the JSON Object 
json_entities = None
for content in response['output']['message']['content']:
    if "toolUse" in content and content['toolUse']['name'] == "extract_info":
        json_entities = content['toolUse']['input']
        break

if json_entities:
    print("Extracted Entities (JSON):")
    print(json.dumps(json_entities, indent=2))
else:
    print("No entities found in the response.")

Extracted Entities (JSON):
{
  "entities": [
    {
      "DocumentType": "Driver License",
      "IDNumber": "123 456 789",
      "FirstName": "Michelle",
      "MiddleName": "Marie",
      "LastName": "Motorist",
      "Address": "2345 ANYWHERE STREET ALBANY, NY 12222",
      "DOB": "10/31/1990",
      "IssueDate": "03/07/2022",
      "ExpiryDate": "10/31/2029"
    }
  ]
}
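
The dates come back as plain strings, so any downstream check (for example, whether the license has expired) needs them parsed first. The sketch below assumes the `MM/DD/YYYY` layout seen in this sample output; other documents may use a different layout, and `parse_us_date` and `is_expired` are hypothetical helpers.

```python
from datetime import date, datetime

def parse_us_date(value):
    """Parse an MM/DD/YYYY string (assumed layout) into a date object."""
    return datetime.strptime(value, "%m/%d/%Y").date()

def is_expired(entity, today=None):
    """Return True when the entity's ExpiryDate is before 'today'."""
    today = today or date.today()
    return parse_us_date(entity["ExpiryDate"]) < today

# Date fields copied from the extracted entity above
entity = {"DOB": "10/31/1990", "IssueDate": "03/07/2022", "ExpiryDate": "10/31/2029"}
print(parse_us_date(entity["DOB"]).isoformat())       # 1990-10-31
print(is_expired(entity, today=date(2024, 7, 16)))    # False
```

`strptime` raises `ValueError` on a string in any other layout, which is a useful signal that the model returned a date in an unexpected form.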


In [10]:
%%time
#User input: this time we ask the model to use the "extract_visual" tool
prompt = '''
Use the extract_visual tool based on the image given
'''


#Calling the Converse API
response = invoke_bedrock_model(bedrock, modelId, prompt, image_byte)
print(response)

'text'
{'ResponseMetadata': {'RequestId': '8d825848-35b4-4f76-9cc2-b1ee645da525', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 16 Jul 2024 01:46:53 GMT', 'content-type': 'application/json', 'content-length': '852', 'connection': 'keep-alive', 'x-amzn-requestid': '8d825848-35b4-4f76-9cc2-b1ee645da525'}, 'RetryAttempts': 0}, 'output': {'message': {'role': 'assistant', 'content': [{'toolUse': {'toolUseId': 'tooluse_DjupD22USterjScvMLpGDQ', 'name': 'extract_visual', 'input': {'entities': [{'StateTextColor': "The text for 'New York State' is written in teal/blue color", 'SymbolTopRight': 'There is a black star symbol in the top right corner', 'SymbolBottomRight': 'There are symbols of a bear, maple leaf, and paw print in the bottom right corner', 'Signature': 'The signature appears to be printed in black cursive font', 'Picture': 'The main photo shows a smiling person. The smaller photo appears to be the same individual with a neutral expression. No obvious alterations are visible.'

In [11]:
#Extract only the JSON Object 
json_entities = None
for content in response['output']['message']['content']:
    if "toolUse" in content and content['toolUse']['name'] == "extract_visual":
        json_entities = content['toolUse']['input']
        break

if json_entities:
    print("Extracted Entities (JSON):")
    print(json.dumps(json_entities, indent=2))
else:
    print("No entities found in the response.")

Extracted Entities (JSON):
{
  "entities": [
    {
      "StateTextColor": "The text for 'New York State' is written in teal/blue color",
      "SymbolTopRight": "There is a black star symbol in the top right corner",
      "SymbolBottomRight": "There are symbols of a bear, maple leaf, and paw print in the bottom right corner",
      "Signature": "The signature appears to be printed in black cursive font",
      "Picture": "The main photo shows a smiling person. The smaller photo appears to be the same individual with a neutral expression. No obvious alterations are visible.",
      "Blemishes": "No significant blemishes or damage is visible on the document"
    }
  ]
}
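
The two extraction cells repeat the same loop with a different tool name, so the lookup can be factored into one function. The sketch below is one way to do that; `get_tool_input` is a hypothetical helper, and the mock response is a trimmed stand-in with the same shape as the responses printed above.

```python
def get_tool_input(response, tool_name):
    """Return the input of the first toolUse block for tool_name, or None if absent."""
    content_blocks = response.get("output", {}).get("message", {}).get("content", [])
    for block in content_blocks:
        tool_use = block.get("toolUse")
        if tool_use and tool_use.get("name") == tool_name:
            return tool_use.get("input")
    return None

# Trimmed stand-in for a Converse response containing a toolUse block
mock_response = {
    "output": {"message": {"role": "assistant", "content": [
        {"toolUse": {
            "toolUseId": "example-id",
            "name": "extract_info",
            "input": {"entities": [{"FirstName": "Michelle"}]},
        }}
    ]}},
    "stopReason": "tool_use",
}

print(get_tool_input(mock_response, "extract_info"))    # {'entities': [{'FirstName': 'Michelle'}]}
print(get_tool_input(mock_response, "extract_visual"))  # None
```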
