## Use FM tooling via the Bedrock Converse API to achieve Intelligent Document Processing

Document processing is a common challenge for organizations across many industries. It often involves handling large volumes of diverse documents, extracting relevant information, and making informed decisions based on that data. In this notebook, we address this real-world problem using the mortgage application process as an example. We will use Anthropic's multimodal models, such as Claude 3 Haiku and Claude 3 Sonnet, via the Bedrock Converse API. We will also demonstrate the use of tooling to create an intelligent agent loop.


An agent loop enables foundation models (FMs) to tackle complex, multi-stage problems. In this design pattern, the model works through a series of iterative steps: it requests a tool call, the application executes the corresponding function, and the result is fed back to the model for analysis. The cycle repeats until the desired end result is reached or a loop limit is hit, which lets the model work through intricate problem-solving scenarios one step at a time.
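
The loop described above can be sketched in a few lines. This is a minimal illustration, not the `BedrockUtils.run_loop` implementation used later in this notebook. The names `run_agent_loop`, `converse`, and `handle_tool` are hypothetical; `converse` is assumed to be any callable that accepts `messages` and `toolConfig` and returns a Converse-style response containing `stopReason` and `output.message`.

```python
from typing import Any, Callable, Dict, List


def run_agent_loop(
    converse: Callable[..., Dict[str, Any]],
    prompt: str,
    tool_config: List[Dict[str, Any]],
    handle_tool: Callable[[str, Dict[str, Any]], Any],
    max_turns: int = 6,
) -> List[Dict[str, Any]]:
    """Call the model until it stops requesting tools or max_turns is reached."""
    messages = [{"role": "user", "content": [{"text": prompt}]}]
    for _ in range(max_turns):
        response = converse(messages=messages, toolConfig={"tools": tool_config})
        assistant_message = response["output"]["message"]
        messages.append(assistant_message)

        # Any stop reason other than "tool_use" ends the loop.
        if response.get("stopReason") != "tool_use":
            break

        # Run every requested tool and return the results in one user turn.
        tool_results = []
        for block in assistant_message["content"]:
            if "toolUse" not in block:
                continue
            request = block["toolUse"]
            result = handle_tool(request["name"], request["input"])
            tool_results.append(
                {
                    "toolResult": {
                        "toolUseId": request["toolUseId"],
                        "content": [{"json": {"result": result}}],
                    }
                }
            )
        messages.append({"role": "user", "content": tool_results})
    return messages
```

With boto3, `converse` could plausibly be `functools.partial(client.converse, modelId=...)`, where `client = boto3.client("bedrock-runtime")`. Injecting the callable keeps the loop testable without network access.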

In [1]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import boto3, json
from constants import ModelIDs, Temperature
from utils import FileUtility
from bedrock_util import BedrockUtils
from tools import IDPTools

message_list = []

sonnet_model_id = ModelIDs.anthropic_claude_3_sonnet
haiku_model_id = ModelIDs.anthropic_claude_3_haiku

temp_focused = Temperature.FOCUSED
temp_balanced = Temperature.BALANCED

sonnet_bedrock_utils = BedrockUtils(model_id=sonnet_model_id)
haiku_bedrock_utils = BedrockUtils(model_id=haiku_model_id)

tool = IDPTools()

In [3]:
def limited_print(data, limit=1000):
    print(str(data)[:limit] + "..." if len(str(data)) > limit else data)

In [13]:
source_folder = "forms"
target_folder = "images/_extracted"
source_bucket = "bedrock-tool-use-789068066945"
# source_key = "forms/urls_filled.pdf"
# source_key = "forms/urla_1.png"
# source_key = "new-jersey-drivers-license.png"
source_key = "loan-applications/loan-package.zip"
# source_key = "loan-applications/combined_application.pdf"

In [14]:
cot_tools_config = [
    {
        "toolSpec": {
            "name": "download_application_package",
            "description": "Downloads the file containing loan application documents from S3, extracts its content and returns a list of paths of extracted files, or None if extraction failed.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "source_bucket": {
                            "type": "string",
                            "description": "The S3 bucket name where the zip file is stored."
                        },
                        "source_key": {
                            "type": "string",
                            "description": "The S3 key (path) of the zip file."
                        },
                        "target_folder": {
                            "type": "string",
                            "description": "Local folder where the zip file should be saved."
                        }
                    },
                    "required": ["source_bucket", "source_key", "target_folder"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "classify_documents",
            "description": "Classify the documents in the loan application package.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "document_paths": {
                            "type": "array",
                            "items": {
                                "type": "string"
                            },
                            "description": "List of paths to the extracted documents."
                        }
                    },
                    "required": ["document_paths"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "check_required_documents",
            # "description": "Check if all required document types (URLA, Drivers License, W2, Paystub) are present.",
            "description": "Check if all required document types (URLA, Drivers License) are present.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "classified_documents": {
                            "type": "object",
                            "description": "Dictionary of document types and their corresponding file paths."
                        }
                    },
                    "required": ["classified_documents"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "reject_incomplete_application",
            "description": "Reject the loan application if required documents are missing.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "missing_documents": {
                            "type": "array",
                            "items": {
                                "type": "string"
                            },
                            "description": "List of missing document types."
                        }
                    },
                    "required": ["missing_documents"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "extract_urla_information",
            "description": "Extract information from the Uniform Residential Loan Application (URLA).",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "urla_path": {
                            "type": "string",
                            "description": "Path to the URLA document."
                        }
                    },
                    "required": ["urla_path"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "extract_drivers_license_information",
            "description": "Extract information from the driver's license.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "license_path": {
                            "type": "string",
                            "description": "Path to the driver's license document."
                        }
                    },
                    "required": ["license_path"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "extract_w2_information",
            "description": "Extract information from the W2 form.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "w2_path": {
                            "type": "string",
                            "description": "Path to the W2 document."
                        }
                    },
                    "required": ["w2_path"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "stop_tool",
            "description": "Call this tool if the previous tool use is the logical end of the loop or the loop needs to be aborted.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "stop_reason": {
                            "type": "string",
                            "description": "Reason for why the loop was aborted or ended"
                        }
                    },
                    "required": ["stop_reason"]
                }
            }
        }
    }
]
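
Because JSON Schema property names are case-sensitive, a `required` entry only matches a key in `properties` when the spelling and capitalization are identical. A quick consistency check over a tool configuration like the one above could look like the following sketch. The helper name `find_schema_mismatches` is hypothetical and is not part of this notebook's utilities.

```python
from typing import Any, Dict, List


def find_schema_mismatches(tools: List[Dict[str, Any]]) -> List[str]:
    """Report required fields that are not declared under properties."""
    problems = []
    for entry in tools:
        spec = entry["toolSpec"]
        schema = spec["inputSchema"]["json"]
        properties = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in properties:
                problems.append(
                    f"{spec['name']}: required field '{name}' is not defined in properties"
                )
    return problems
```

Running `find_schema_mismatches(cot_tools_config)` before starting the loop returns an empty list when every required field has a matching property.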

In [16]:
messages = haiku_bedrock_utils.run_loop(
            f"Start document processing of loan application file {source_key} in s3 bucket {source_bucket}", 
            cot_tools_config,
            tool.get_tool_result
        )

print("\nMESSAGES:\n")
print(json.dumps(messages, indent=4))

Invoking Bedrock model...
Input Tokens: 1050
Output Tokens: 130
Using tool download_application_package
Deleted local zip file: ./extracted_loan-package/__MACOSX/._urls_filled.pdf
Invoking Bedrock model...
Input Tokens: 1267
Output Tokens: 106
Using tool classify_documents
Invoking Bedrock model...
Input Tokens: 1832
Output Tokens: 62
[
    {
        "role": "assistant",
        "content": [
            {
                "text": "<output_format>{'docs': [{'type': ['DRIVERS_LICENSE'], 'file_path': 'image1.jpg'}, {'type': ['URLA'], 'file_path': 'image2.jpg'}]}</output_format>"
            }
        ]
    }
]
Invoking Bedrock model...
Input Tokens: 1462
Output Tokens: 115
Using tool check_required_documents
{
    "DRIVERS_LICENSE": "./extracted_loan-package/new-jersey-drivers-license.png",
    "URLA": "./extracted_loan-package/urls_filled.pdf"
}
Invoking Bedrock model...
Input Tokens: 110
Output Tokens: 5
[
    {
        "role": "assistant",
        "content": [
            {
            

In [5]:
messages = haiku_bedrock_utils.run_loop(
            f"Start document processing of loan application file {source_key} in s3 bucket {source_bucket}", 
            cot_tools_config, 
            tool.get_tool_result
        )

print("\nMESSAGES:\n")
print(json.dumps(messages, indent=4))

Invoking Bedrock model...
Input Tokens: 1046
Output Tokens: 130
Using tool download_application_package
Deleted local zip file: ./loan-package.zip
Invoking Bedrock model...
Input Tokens: 1263
Output Tokens: 119
Using tool classify_documents
Invoking Bedrock model...
Input Tokens: 1773
Output Tokens: 38
Invoking Bedrock model...
Input Tokens: 1447
Output Tokens: 139
Using tool check_required_documents
Invoking Bedrock model...
Input Tokens: 1611
Output Tokens: 114
Using tool classify_documents
Invoking Bedrock model...
Input Tokens: 1773
Output Tokens: 38
Invoking Bedrock model...
Input Tokens: 1790
Output Tokens: 135
Using tool reject_incomplete_application
Invoking Bedrock model...
Input Tokens: 1950
Output Tokens: 89
Hit loop limit: 6

MESSAGES:

[
    {
        "role": "user",
        "content": [
            {
                "text": "Start document processing of loan application file loan-applications/loan-package.zip in s3 bucket bedrock-tool-use-789068066945"
            }
     

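In [5]:
# file_path and file_util are assumed to be defined in an earlier cell.
binary_data = None
if file_path.endswith('.pdf'):
    binary_data = file_util.pdf_to_png_bytes(file_path)
    media_type = "png"  # pages are rendered as PNG, so the declared format must match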
    print(f"Number of pages converted: {len(binary_data)}")
    print(f"First page base64 (truncated): {binary_data[0][:50]}...")
elif file_path.endswith(('.jpeg', '.jpg', '.png')):
    binary_data, media_type = file_util.image_to_base64(file_path)
    print(f"Image converted to base64 with media type: {media_type}")
    print(f"Base64 (truncated): {binary_data[0][:50]}...")
else:
    print(f"Unsupported file type: {file_path}")
    binary_data = None
    

Image converted to base64 with media type: png
Base64 (truncated): b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x03\x1b\x00\x00\x01\xfe\x08\x02\x00\x00\x00{\xaa\x84\x86\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00x'...


In [6]:
def categorize_document(binary_data, media_type):
    message_list = [
        {
            "role": 'user',
            "content": [
                {"image": {"format": media_type, "source": {"bytes": binary_data[0]}}},
                # [{"image": {"format": media_type, "source": {"bytes": data}}} for data in binary_data],
                {"text": "What type of document is this?"}
            ]
        }
    ]
    system_message = [
        {"text": "<task>Categorize the attached images.</task> <output_choices>URLA, W2, PS, DL, or UNK</output_choices> In this case PS means pay stub, DL means driver's license, and UNK means unknown. <important>Include only the answer and nothing else</important>"}
    ]

    response = haiku_bedrock_utils.invoke_bedrock(message_list=message_list, system_message=system_message)

    response_message = response['output']['message']
    return response_message
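
`categorize_document` sends only the first page (`binary_data[0]`). To send several pages, each page needs its own image block in the same flat `content` list, followed by the text block. The sketch below shows one way to build that list. The helper name `build_image_content` is hypothetical, and the default cap of 20 images is an assumption about the per-request limit that should be checked against the current Bedrock documentation.

```python
from typing import Any, Dict, List


def build_image_content(
    pages: List[bytes],
    media_type: str,
    question: str,
    max_images: int = 20,  # assumed per-request image limit; verify before relying on it
) -> List[Dict[str, Any]]:
    """Build a flat Converse content list: one image block per page, then the text."""
    content: List[Dict[str, Any]] = [
        {"image": {"format": media_type, "source": {"bytes": page}}}
        for page in pages[:max_images]
    ]
    content.append({"text": question})
    return content
```

The result can be used directly as the `"content"` value of the user message.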
    

In [7]:
resp = categorize_document(binary_data, media_type)

Invoking Bedrock model...


In [28]:
response = tool.categorize_document(["./combined_application.pdf"])

[
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": "b'\\x8"
                    }
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {
         

In [8]:
print(resp)

{'role': 'assistant', 'content': [{'text': 'DL'}]}
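
The response message is a dictionary, so the label still has to be pulled out of the first text block. Since the system message restricts the answer to five choices, it is also reasonable to validate the text against them. The following is a minimal sketch under those assumptions. `extract_label` and `VALID_LABELS` are hypothetical names, and falling back to `UNK` for unexpected text is a design choice rather than something the notebook prescribes.

```python
from typing import Any, Dict

# Labels listed in the system message of categorize_document.
VALID_LABELS = {"URLA", "W2", "PS", "DL", "UNK"}


def extract_label(response_message: Dict[str, Any]) -> str:
    """Return the first text block as a normalized label, or UNK if it is not recognized."""
    for block in response_message.get("content", []):
        text = block.get("text")
        if text is not None:
            label = text.strip().upper()
            return label if label in VALID_LABELS else "UNK"
    return "UNK"
```

For the response printed above, `extract_label(resp)` returns `'DL'`.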
