## Use FM tooling via the Bedrock Converse API to achieve Intelligent Document Processing

Document processing is a common challenge for organizations across many industries. It often involves handling large volumes of diverse documents, extracting relevant information, and making informed decisions based on that data. In this notebook, we address this real-world problem using the mortgage application process as an example. We use Anthropic's multimodal models, Claude 3 Haiku and Claude 3 Sonnet, via the Bedrock Converse API. We also demonstrate the use of tooling to create an intelligent agent loop.


An agent loop enables a foundation model (FM) to tackle complex, multi-stage problems. In this design pattern, the model works through a series of iterations: it requests function (tool) calls, receives and analyzes their results, and decides what to do next. The loop continues until the desired end result is reached, which lets the model work through problems that cannot be solved in a single request.
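
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the `BedrockUtils.run_loop` implementation used later in this notebook: `converse` is an injected callable standing in for a Bedrock Converse call, `run_tool` is a hypothetical tool dispatcher, and `fake_converse` / `fake_tool` are stubs so the sketch runs without AWS credentials. The `toolUse`, `toolResult`, and `stopReason` fields follow the Converse API message format.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def agent_loop(
    converse: Callable[[List[Message]], Dict[str, Any]],
    run_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    prompt: str,
    max_turns: int = 10,
) -> List[Message]:
    """Call the model repeatedly, running requested tools, until it stops asking for them."""
    messages: List[Message] = [{"role": "user", "content": [{"text": prompt}]}]
    for _ in range(max_turns):
        response = converse(messages)
        assistant_message = response["output"]["message"]
        messages.append(assistant_message)
        if response.get("stopReason") != "tool_use":
            break  # the model produced a final answer
        # Run every tool the model requested and return all results in one user turn.
        results = []
        for block in assistant_message["content"]:
            if "toolUse" in block:
                call = block["toolUse"]
                output = run_tool(call["name"], call["input"])
                results.append({
                    "toolResult": {
                        "toolUseId": call["toolUseId"],
                        "content": [{"json": output}],
                    }
                })
        messages.append({"role": "user", "content": results})
    return messages


# Stubbed model (assumption, for illustration): requests one tool, then finishes.
def fake_converse(messages: List[Message]) -> Dict[str, Any]:
    if len(messages) == 1:
        return {
            "stopReason": "tool_use",
            "output": {"message": {"role": "assistant", "content": [
                {"toolUse": {"toolUseId": "t1", "name": "classify_documents",
                             "input": {"files": ["a.png"]}}}
            ]}},
        }
    return {
        "stopReason": "end_turn",
        "output": {"message": {"role": "assistant", "content": [{"text": "done"}]}},
    }


# Stubbed tool dispatcher (assumption): echoes what it was asked to do.
def fake_tool(name: str, tool_input: Dict[str, Any]) -> Dict[str, Any]:
    return {"tool": name, "received": tool_input}


history = agent_loop(fake_converse, fake_tool, "Process the loan application")
```

The `max_turns` cap is a design choice in this sketch: it keeps a model that keeps requesting tools from looping indefinitely.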

In [1]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import boto3, json
from constants import ModelIDs, Temperature, ToolConfig
from utils import FileUtility
from bedrock_util import BedrockUtils
from tools import IDPTools

message_list = []

sonnet_model_id = ModelIDs.anthropic_claude_3_sonnet
haiku_model_id = ModelIDs.anthropic_claude_3_haiku

temp_focused = Temperature.FOCUSED
temp_balanced = Temperature.BALANCED

sonnet_bedrock_utils = BedrockUtils(model_id=sonnet_model_id)
haiku_bedrock_utils = BedrockUtils(model_id=haiku_model_id)

tool = IDPTools()
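
For orientation, a tool configuration in the Converse API format looks roughly like the sketch below. The actual `ToolConfig.COT` is defined in `constants.py` and is not shown in this notebook, so everything here is an assumption for illustration: the tool name is taken from the log output of the runs below, while the description and input schema are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical tool definition; the real ToolConfig.COT may differ.
example_tool_config: Dict[str, Any] = {
    "tools": [
        {
            "toolSpec": {
                "name": "download_application_package",
                "description": "Download a loan application file from S3 for processing.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "source_bucket": {"type": "string"},
                            "source_key": {"type": "string"},
                        },
                        "required": ["source_bucket", "source_key"],
                    }
                },
            }
        }
    ],
    # Let the model decide whether to call a tool.
    "toolChoice": {"auto": {}},
}


def tool_names(config: Dict[str, Any]) -> List[str]:
    """List the tool names declared in a Converse-style tool configuration."""
    return [entry["toolSpec"]["name"] for entry in config["tools"]]


names = tool_names(example_tool_config)
```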

In [3]:
def limited_print(data, limit=1000):
    # Print data unchanged when short; otherwise print its first `limit` characters plus "...".
    text = str(data)
    print(text[:limit] + "..." if len(text) > limit else data)

In [4]:
source_folder = "forms"
target_folder = "images/_extracted"
source_bucket = "bedrock-tool-use-789068066945"
# source_key = "forms/urls_filled.pdf"
# source_key = "forms/urla_1.png"
# source_key = "new-jersey-drivers-license.png"
# source_key = "loan-applications/loan-package.zip"
source_key = "loan-applications/combined_application.pdf"

In [27]:
messages = sonnet_bedrock_utils.run_loop(
    f"Start document processing of loan application file {source_key} in s3 bucket {source_bucket}",
    ToolConfig.COT,
    tool.get_tool_result,
)

print("\nMESSAGES:\n")
print(json.dumps(messages, indent=4))

Invoking Bedrock model anthropic.claude-3-sonnet-20240229-v1:0...
Input Tokens: 1781
Output Tokens: 144
Using tool download_application_package
Downloaded file is not a zip file: ./tmp/combined_application.pdf
Invoking Bedrock model anthropic.claude-3-sonnet-20240229-v1:0...
Input Tokens: 1949
Output Tokens: 91
Using tool classify_documents
Invoking Bedrock model anthropic.claude-3-sonnet-20240229-v1:0...
Input Tokens: 12102
Output Tokens: 190
Invoking Bedrock model anthropic.claude-3-sonnet-20240229-v1:0...
Input Tokens: 2257
Output Tokens: 295
Using tool check_required_documents
doc list is docs
Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 102
Output Tokens: 70
Invoking Bedrock model anthropic.claude-3-sonnet-20240229-v1:0...
Input Tokens: 2660
Output Tokens: 93
Using tool reject_incomplete_application
Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 49
Output Tokens: 186
Invoking Bedrock model anthropic.claude-3-sonnet-

In [28]:
messages = haiku_bedrock_utils.run_loop(
    f"Start document processing of loan application file {source_key} in s3 bucket {source_bucket}",
    ToolConfig.COT,
    tool.get_tool_result,
)

print("\nMESSAGES:\n")
print(json.dumps(messages, indent=4))

Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 1886
Output Tokens: 130
Using tool download_application_package
Downloaded file is not a zip file: ./combined_application.pdf
Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 2038
Output Tokens: 114
Using tool classify_documents
Invoking Bedrock model anthropic.claude-3-sonnet-20240229-v1:0...
Input Tokens: 12102
Output Tokens: 190
Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 2369
Output Tokens: 205
Using tool check_required_documents
Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 2596
Output Tokens: 158
Using tool check_required_documents
doc list is URLA, DRIVERS_LICENSE
Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 109
Output Tokens: 8
Invoking Bedrock model anthropic.claude-3-haiku-20240307-v1:0...
Input Tokens: 2791
Output Tokens: 85
Using tool reject_incomplete_application
In

TypeError: string indices must be integers

In [21]:
import ast

# json.load() expects a file object, and this single-quoted string is a Python dict
# literal rather than valid JSON, so parse it with ast.literal_eval instead.
classified_documents = ast.literal_eval("{'URLA': ['image1.png', 'image2.png', 'image3.png', 'image4.png', 'image5.png', 'image6.png', 'image7.png', 'image8.png', 'image9.png'], 'DRIVERS_LICENSE': ['image10.png']}")

keys_list = list(classified_documents.keys())
doc_list = ', '.join(keys_list)

In [5]:
# file_path and file_util are assumed to be defined in an earlier cell (not shown here).
binary_data = None
if file_path.endswith('.pdf'):
    # Each PDF page is rendered to PNG bytes, so the media type must be "png".
    binary_data = file_util.pdf_to_png_bytes(file_path)
    media_type = "png"
    print(f"Number of pages converted: {len(binary_data)}")
    print(f"First page base64 (truncated): {binary_data[0][:50]}...")
elif file_path.endswith(('.jpeg', '.jpg', '.png')):
    binary_data, media_type = file_util.image_to_base64(file_path)
    print(f"Image converted to base64 with media type: {media_type}")
    print(f"Base64 (truncated): {binary_data[0][:50]}...")
else:
    print(f"Unsupported file type: {file_path}")
    binary_data = None
    

Image converted to base64 with media type: png
Base64 (truncated): b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x03\x1b\x00\x00\x01\xfe\x08\x02\x00\x00\x00{\xaa\x84\x86\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00x'...


In [6]:
def categorize_document(binary_data, media_type):
    # Ask Claude 3 Haiku to classify the document, sending only its first page image.
    message_list = [
        {
            "role": 'user',
            "content": [
                {"image": {"format": media_type, "source": {"bytes": binary_data[0]}}},
                # [{"image": {"format": media_type, "source": {"bytes": data}}} for data in binary_data],
                {"text": "What type of document is this?"}
            ]
        }
    ]
    system_message = [
        {"text": "<task>Categorize the attached images.</task> <output_choices>URLA, W2, PS, DL, or UNK</output_choices> In this case PS means pay stub, DL means driver's license, and UNK means unknown. <important>Include only the answer and nothing else</important>"}
    ]

    response = haiku_bedrock_utils.invoke_bedrock(message_list=message_list, system_message=system_message)

    response_message = response['output']['message']
    return response_message
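
The commented-out line in `categorize_document` would place a nested list inside `content`, whereas the message content is a flat list of blocks. One way to send every page is sketched below. `build_image_message` is a hypothetical helper, not part of the notebook's utilities, and the page bytes in the usage line are placeholders rather than real images.

```python
from typing import Any, Dict, List


def build_image_message(pages: List[bytes], media_type: str, question: str) -> Dict[str, Any]:
    """Build one user message with an image block per page, followed by the question."""
    content: List[Dict[str, Any]] = [
        {"image": {"format": media_type, "source": {"bytes": page}}}
        for page in pages
    ]
    content.append({"text": question})
    return {"role": "user", "content": content}


# Placeholder bytes stand in for real PNG data.
message = build_image_message([b"page-1", b"page-2"], "png", "What type of document is this?")
```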
    

In [7]:
resp = categorize_document(binary_data, media_type)

Invoking Bedrock model...


In [28]:
response = tool.categorize_document(["./combined_application.pdf"])

[
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
         

In [8]:
print(resp)

{'role': 'assistant', 'content': [{'text': 'DL'}]}
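
Because the system prompt restricts the answer to a fixed set of labels, the reply can be checked before it is used downstream. The sketch below is one possible approach: `extract_label` is a hypothetical helper, and falling back to `UNK` for unexpected replies is an assumption of this sketch rather than behavior of the notebook's tools.

```python
from typing import Any, Dict

# Labels listed in the system prompt of categorize_document.
ALLOWED_LABELS = {"URLA", "W2", "PS", "DL", "UNK"}


def extract_label(message: Dict[str, Any]) -> str:
    """Return the category from an assistant message, or "UNK" if it is not a known label."""
    text = "".join(block.get("text", "") for block in message.get("content", []))
    label = text.strip().upper()
    return label if label in ALLOWED_LABELS else "UNK"


label = extract_label({'role': 'assistant', 'content': [{'text': 'DL'}]})
```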
