# Guidance for Document Processing Using Amazon Bedrock Data Automation

Amazon Bedrock Data Automation (BDA) is a fully managed capability of Amazon Bedrock that streamlines the generation of valuable insights from unstructured, multimodal content such as documents, images, audio, and videos. With Amazon Bedrock Data Automation, you can build automated intelligent document processing (IDP), media analysis, and Retrieval-Augmented Generation (RAG) workflows quickly and cost-effectively.

This workbook focuses on using BDA to extract insights from unstructured documents. The use case is processing a loan application. We will process a packet of documents relevant to loans: ID Cards, Bank Statements, W2 Tax forms, Pay Stubs, and Checks.


The diagram below shows an architecture for an intelligent document processing workflow. This diagram is from the solution 'Guidance for Multimodal Data Processing Using Amazon Bedrock Data Automation', published [here](https://aws.amazon.com/solutions/guidance/multimodal-data-processing-using-amazon-bedrock-data-automation/).


![Arch](./images/a_lending_flow_architecture.png)


1. Document Upload: The data science team uploads sample documents to an Amazon S3 bucket.

2. Blueprint Configuration: The data science team uses provided blueprints and creates new custom blueprints for each document class: W2, Pay Slip, Drivers License, 1099, and Bank Statement. Each sample is processed and the fields are extracted with Generative AI prompts (e.g. First Name, Last Name, Gross Pay, SS Number, License Number, Capital Gains, Closing Balance). The blueprints are managed and stored in the Amazon Bedrock Data Automation feature.

3. Test and Refine Blueprints: The blueprints are tested and refined. Key normalizations, key transformations, and key validations are added. 

4. Blueprint Published: The blueprints are managed and stored in the Amazon Bedrock Data Automation feature.

5. Amazon EventBridge triggers an AWS Lambda function when documents are uploaded to Amazon S3, using an "Object Created" event. This Lambda function then utilizes Amazon Bedrock's Data Automation feature to process the uploaded documents. 

6. The processing workflow in the Amazon Bedrock Data Automation feature includes document splitting based on logical boundaries, with each split containing up to 20 pages. Each page is classified into a specific document type and matched to the appropriate blueprint. The corresponding blueprint is then invoked for each page, executing key normalizations, transformations, and validations. This entire process runs asynchronously, allowing efficient handling of multiple documents and large data volumes.

7. BDA stores the results in an Amazon S3 bucket for later processing and triggers Amazon EventBridge.

8. An AWS Lambda function is triggered by Amazon EventBridge to process the JSON results of Amazon Bedrock Data Automation. The processed results are sent to downstream processing systems.

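Steps 5 and 8 above describe Lambda functions wired to EventBridge. The following is a minimal sketch of the step 5 function, assuming EventBridge notifications are enabled on the bucket and that the event follows the S3 "Object Created" shape (`detail.bucket.name`, `detail.object.key`). The environment variable names `BDA_OUTPUT_URI` and `BDA_PROJECT_ARN` are hypothetical, and the request uses the preview-era `invoke_data_automation_async` parameters that appear later in this workbook, so check them against your SDK version. The runtime client can be injected, which lets the event-to-request logic be tested without AWS access.

```python
import os


def build_bda_request(event, output_uri, project_arn=None):
    """Translate an EventBridge S3 'Object Created' event into keyword
    arguments for invoke_data_automation_async (parameter names as used
    in this workbook)."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    request = {
        "inputConfiguration": {"s3Uri": f"s3://{bucket}/{key}"},
        "outputConfiguration": {"s3Uri": output_uri},
    }
    if project_arn:
        # Route the document through a project's blueprints, as in Step 6.
        request["dataAutomationConfiguration"] = {
            "dataAutomationArn": project_arn,
            "stage": "LIVE",
        }
    return request


def lambda_handler(event, context, run_client=None):
    """Hypothetical handler: start an asynchronous BDA job for a new object."""
    if run_client is None:
        import boto3  # imported lazily so build_bda_request can be tested offline
        run_client = boto3.client("bedrock-data-automation-runtime")
    request = build_bda_request(
        event,
        output_uri=os.environ["BDA_OUTPUT_URI"],        # hypothetical env var
        project_arn=os.environ.get("BDA_PROJECT_ARN"),  # hypothetical env var
    )
    response = run_client.invoke_data_automation_async(**request)
    return {"invocationArn": response["invocationArn"]}
```

The step 8 function would follow the same pattern in reverse: read the output location from its event and load the JSON results from S3.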
In this workbook, we will explore various aspects of this workflow, such as creating blueprints, processing sample documents, and page classification. We will process these documents:

1. ID Card
2. Bank Statements
3. W2 Tax forms
4. Pay Stubs
5. Check
6. Homeowner Insurance Application

We will then process a single PDF document with a 'loan application package', i.e. all 6 documents in one. 

This workbook follows these steps:

1. Step 1: Set up the notebook and instantiate boto3 clients
2. Step 2: Process a simple PDF file using the standard output
3. Step 3: Create a custom blueprint for a Homeowner Insurance Form
4. Step 4: Use our custom blueprint to process a Homeowner Insurance Form
5. Step 5: Create an automation project and add blueprints
6. Step 6: Document Splitting - Process a Multi-Page Document Package
7. Step 7: Display the results
8. Step 8: Cleanup

## Prerequisite

Before starting the workshop, you will need to create an Amazon SageMaker Studio notebook instance (see https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html). For the IAM role, choose either an existing IAM role in your account or create a new role. The role must have the necessary permissions to invoke the BDA, SageMaker, and S3 APIs.

These IAM policies can be assigned to the role: AmazonBedrockFullAccess, AmazonS3FullAccess, AmazonSageMakerFullAccess, IAMReadOnlyAccess

Note: The AdministratorAccess IAM policy can be used, if allowed by security policies at your organization. 

## Note

It is important to run the cells below in order. If you need to restart the workbook and have not successfully run Step 8 to clean up resources, you will need to log in to the AWS Console and delete the project and blueprints created in this workbook.

If you run cells out of order and get unexpected results, you can use 'Restart Kernel' from the SageMaker Studio Kernel menu.

# Step 1: Setup notebook and boto3 clients

In this step, we will import some necessary libraries that will be used throughout this notebook. 
To use Amazon Bedrock Data Automation (BDA) with boto3, you need a recent version of the AWS SDK for Python (boto3) installed: version 1.35.96 or later is required.

Note: At the time of the public preview launch, BDA is available in us-west-2 only.

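Comparing version strings as plain text can give the wrong answer (as strings, "1.100.0" sorts before "1.35.96"), so a small numeric check is safer. This is a minimal sketch using only the standard library; `packaging.version.Version` is a more complete alternative if that package is installed. The helper names are hypothetical.

```python
MIN_BOTO3 = "1.35.96"  # minimum version stated above


def version_tuple(version):
    """Convert a dotted version such as '1.35.99' to (1, 35, 99).

    Only the first three components are used; trailing non-digit text in a
    component (e.g. 'rc1') is ignored, and missing components count as 0.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def boto3_is_recent_enough(installed, minimum=MIN_BOTO3):
    """True if the installed version is at least the minimum, compared numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    import boto3
    print(boto3.__version__, boto3_is_recent_enough(boto3.__version__))
except ImportError:
    print("boto3 is not installed")
```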
In [1]:
!pip install --upgrade boto3 pypdfium2



In [2]:
import boto3, json
from time import sleep
from IPython.display import JSON, IFrame, display
import sagemaker

print(boto3.__version__)

region_name = "us-west-2"  # can be removed once BDA is GA and available in other regions

s3 = boto3.client('s3', region_name=region_name)
client = boto3.client('bedrock-data-automation', region_name=region_name)
run_client = boto3.client('bedrock-data-automation-runtime', region_name=region_name)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
1.35.99


We will give unique names to our project and blueprint, and use the SageMaker default S3 bucket for input and output files.

In [3]:
project_name = 'my-bda-lending-workbook-v1'
blueprint_name = 'my-insurance-blueprint-v1'
bucket_name = sagemaker.Session().default_bucket()
print(f"Bucket_name: {bucket_name}")


Bucket_name: sagemaker-us-west-2-762233765926


# Step 2: Process a simple PDF file using the standard output



In this step, we will process a Homeowner Insurance Application form using BDA Standard Output. Standard output is the default way of interacting with Amazon Bedrock Data Automation (BDA). If you pass a document to the BDA API with no established blueprint or project, it returns the default standard output for that file type.

https://docs.aws.amazon.com/bedrock/latest/userguide/bda-standard-output.html

Standard Output has three levels of granularity. We will use the default. 

1. Element level granularity (default): Provides the text of the document in the output format of your choice, separated into different elements such as figures, tables, or paragraphs. Elements are returned in logical reading order based on the structure of the document.

2. Page level granularity: Also enabled by default. Provides each page of the document in the text output format of your choice.

3. Word level granularity: Provides each individual word and its location on the page, without using broader context analysis.



In [4]:
# Upload the Homeowner Insurance Application form to S3

file_name = 'documents/homeowner_insurance_application_sample.pdf'
object_name = f'data_automation/input/{file_name}'
output_name = 'data_automation/output'
s3.upload_file(file_name, bucket_name, object_name)

IFrame(file_name, width=1000, height=500)

We will now invoke the BDA API to process the document.

In [5]:
response = run_client.invoke_data_automation_async(
    inputConfiguration={'s3Uri':  f"s3://{bucket_name}/{object_name}"},
    outputConfiguration={'s3Uri': f"s3://{bucket_name}/{output_name}"},)
response

invoke_arn = response['invocationArn']

The BDA call is asynchronous. We will poll until the operation is complete.

In [6]:
# Poll until the job leaves the 'Created'/'InProgress' states
while True:
    progress = run_client.get_data_automation_status(invocationArn=invoke_arn)
    if progress['status'] in ('Created', 'InProgress'):
        print(progress['status'])
        sleep(10)
    else:
        break

print(progress['status'])

InProgress
InProgress
Success

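The polling loop above is repeated three times in this workbook and has no upper bound on waiting time. A reusable helper with a timeout might look like the following sketch. It assumes the status values `Created`, `InProgress`, `Success`, `ServiceError`, and `ClientError` reported by `get_data_automation_status` at the time of writing; treat that list as an assumption and verify it against your SDK version. `wait_for_invocation` is a hypothetical helper, and `sleep` is injectable so the logic can be tested quickly.

```python
import time

# Assumed terminal status values for get_data_automation_status.
TERMINAL_STATES = {"Success", "ServiceError", "ClientError"}


def wait_for_invocation(run_client, invocation_arn, poll_seconds=10,
                        timeout_seconds=900, sleep=time.sleep):
    """Poll get_data_automation_status until the job reaches a terminal state.

    Raises TimeoutError if the job is still running after timeout_seconds,
    and RuntimeError if it finishes with anything other than 'Success'.
    """
    waited = 0
    while True:
        status = run_client.get_data_automation_status(invocationArn=invocation_arn)
        state = status["status"]
        if state in TERMINAL_STATES:
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"{invocation_arn} still {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
    if state != "Success":
        raise RuntimeError(f"BDA job ended with status {state}: {status}")
    return status
```

With the helper, a polling cell reduces to `progress = wait_for_invocation(run_client, invoke_arn)`, and a failed job raises an error instead of falling through to the download step.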

Once the status is 'Success', we will retrieve the results.

In [7]:
out_loc = progress['outputConfiguration']['s3Uri'].split("/job_metadata.json", 1)[0].split(bucket_name+"/")[1]
out_loc += "/0/standard_output/0/result.json"
s3.download_file(bucket_name, out_loc, 'result.json')

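The cell above derives the result location by string-splitting the `job_metadata.json` URI. A slightly more defensive sketch uses `urllib.parse` and reads the JSON directly from S3 with `get_object`, without a temporary file. The `<job prefix>/0/<output kind>/<segment>/result.json` layout is taken from the paths used in this workbook; the helper names are hypothetical and are not the workbook's own `utils.helpers.get_s3_to_dict`.

```python
import json
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def result_key(job_metadata_uri, output_kind="standard_output", segment=0):
    """Key of a result.json next to job_metadata.json.

    Mirrors the '<prefix>/0/<output_kind>/<segment>/result.json' layout used
    in this workbook's cells (the first '0' is kept fixed, as there).
    """
    _, key = split_s3_uri(job_metadata_uri)
    suffix = "/job_metadata.json"
    prefix = key[: -len(suffix)] if key.endswith(suffix) else key.rstrip("/")
    return f"{prefix}/0/{output_kind}/{segment}/result.json"


def read_s3_json(s3_client, uri):
    """Download an S3 object and parse it as JSON."""
    bucket, key = split_s3_uri(uri)
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)
```

For example, `read_s3_json(s3, f"s3://{bucket_name}/{result_key(progress['outputConfiguration']['s3Uri'])}")` should return the same dictionary that the next cells load from `result.json`.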
We will display the JSON of the Standard Output.\
Note the document layout elements (pages and text), along with sub-types such as paragraphs and footers.

In [8]:
with open('result.json') as f:
    data = json.load(f)
JSON(data, expanded=True)


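To get a quick overview of the layout elements in this JSON, you can count them by type. This sketch assumes each entry in `data['elements']` carries a `type` key and, optionally, a `sub_type` key; those key names are assumptions about the standard output schema, so missing keys are reported as `None` rather than raising. `summarize_elements` and `page_count` are hypothetical helpers.

```python
from collections import Counter


def summarize_elements(standard_output):
    """Count layout elements by (type, sub_type) in a BDA standard output dict.

    Assumes elements live under standard_output['elements'] and carry 'type'
    and (optionally) 'sub_type' keys; missing keys are counted as None.
    """
    counts = Counter()
    for element in standard_output.get("elements", []):
        counts[(element.get("type"), element.get("sub_type"))] += 1
    return counts


def page_count(standard_output):
    """Number of entries in the 'pages' list that this workbook reads."""
    return len(standard_output.get("pages", []))
```

`summarize_elements(data).most_common()` lists the most frequent element kinds first.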
Let's view the Markdown representation of the first page.

In [9]:
from IPython.display import Markdown, display

pages_md = [page["representation"]["markdown"] for page in data['pages']]
display(Markdown(pages_md[0]))

# Homeowners Insurance Application

Named Insured(s) and Mailing Address Ziggy Starpixel, 42 Rainbow Sparkle Boulevard Unicornville, NV 12345

Insurance Company Fake Insurance Co 650 Davis Street San Francisco, CA 94111

Primary Email: rainbow.unicorn.987654@fakeemail.nowhere Primary Phone #: 555 555 1212 Alternate Phone #: 555 555 1213

Insured Property 42 Rainbow Sparkle Boulevard Unicornville, NV 12345

NOTICE OF INSURANCE INFORMATION PRACTICES

In some insurance transactions, we may not be able to get all the information we need directly from you. In that case, we may obtain information from outside sources at our own expense. We would also like to inform you that without prior authorization, we may as permitted by law, provide information about you contained in our records and files to certain persons or organizations.

NOTICE: As part of Esurance's underwriting/qualification procedure and subject to applicable laws and regulations, we may obtain information regarding you and other individuals who may be covered by the insurance you are applying for, including: (i) driving record, based on state motor vehicle reports and loss information reports; (ii) your prior insurance record, if any, which will be obtained from your current or prior carrier(s); (iii) credit reports; and (iv) claim history, based on loss information reports.

Primary Applicant Information

Name

Ziggy Starpixel

Date of Birth

Gender

Marital Status

Education Level

2/20/2000

M

S

Length of Time with Current Auto Carrier

Length of Time with Prior Auto Carrier

1 Year

2 years

Years with Prior Property Company

Type of Current Property Policy

1 Year

Home

Co-Applicant Information

Name

Luna Starlight-Glitterdust

Length of Time with Current Auto Carrier

Length of Time with Prior Auto Carrier

1 year

6 months

| Existing Esurance Policy | Drivers License Number | DL State | Currently Insured - Auto |
| --- | --- | --- | --- |
| 123456 | 1234567A | NV | Fake Auto Ins Co |

| Date of Birth | Gender | Marital Status | Education Level |
| --- | --- | --- | --- |
| 2/29/2000 | F | S | Graduate |

| Relationship to Primary Applicant | Drivers License Number | DL State | Currently Insured- Auto |
| --- | --- | --- | --- |
| Domestic Partner | 987654A | NV | Fake Auto Ins Co. |

Policy Number	Purchase Date and Time	Effective Date	Expiration Date
() *	Ui%da		

	Total Auto Claims, Accidents, and Violations for all Applicants			
Number of Auto Accidents		Number of Violations		Number of Comp Claims Page 1
At-Fault H150100 NV 02 16	Not-at-Fault	Major	Minor	


# Step 3: Create custom blueprint for Homeowner Insurance Form

Amazon Bedrock Data Automation (BDA) includes several sample blueprints to help you get started with custom output for documents and images. 

Next, we will create our own blueprint for the Homeowners Insurance document. This is a common document in a residential loan application. We need just 4 fields from this document to process the loan application:

1. The insured's name
2. The insurance company name
3. The address of the insured property
4. The primary email address

In [10]:
# Upload the form to S3 and display it

file_name = 'documents/homeowner_insurance_application_sample.pdf'
object_name = f'data_automation/input/{file_name}'
output_name = 'data_automation/output'
s3.upload_file(file_name, bucket_name, object_name)

IFrame("documents/homeowner_insurance_application_sample.pdf", width=1000, height=500)

In [11]:
# delete project if it already exists
projects_existing = [project for project in client.list_data_automation_projects()["projects"] if project["projectName"] == project_name]
if len(projects_existing) >0:
    print(f"Deleting existing project: {projects_existing[0]}")
    client.delete_data_automation_project(projectArn=projects_existing[0]["projectArn"])

# delete blueprint if it already exists
blueprints_existing = [blueprint for blueprint in client.list_blueprints()["blueprints"] if blueprint["blueprintName"] == blueprint_name]
if len(blueprints_existing) >0:
    print(f"Deleting existing blueprint: {blueprints_existing[0]}")
    client.delete_blueprint(blueprintArn=blueprints_existing[0]["blueprintArn"])

Deleting existing project: {'projectArn': 'arn:aws:bedrock:us-west-2:762233765926:data-automation-project/5f46f812e5a7', 'projectStage': 'LIVE', 'projectName': 'my-bda-lending-workbook-v1', 'creationTime': datetime.datetime(2025, 1, 15, 13, 49, 13, 944000, tzinfo=tzlocal())}
Deleting existing blueprint: {'blueprintArn': 'arn:aws:bedrock:us-west-2:762233765926:blueprint/253d40da7415', 'blueprintStage': 'LIVE', 'blueprintName': 'my-insurance-blueprint-v1', 'creationTime': datetime.datetime(2025, 1, 15, 13, 48, 47, 183000, tzinfo=tzlocal()), 'lastModifiedTime': datetime.datetime(2025, 1, 15, 13, 48, 47, 183000, tzinfo=tzlocal())}


In [12]:
response = client.create_blueprint(
    blueprintName=blueprint_name,
    type='DOCUMENT',
    blueprintStage='LIVE',
    schema=json.dumps({
        "$schema": "http://json-schema.org/draft-07/schema#",
        "description": "This blueprint will process a homeowners insurance application form",
        "documentClass": "default",
        "type": "object",
        "properties": {
            "Insured Name": {
                "type": "string",
                "inferenceType": "extractive",
                "description": "Insured's Name",
            },
            "Insurance Company": {
                "type": "string",
                "inferenceType": "extractive",
                "description": "insurance company name",
            },
            "Insured Address": {
                "type": "string",
                "inferenceType": "extractive",
                "description": "the address of the insured property",
            },
            "Email Address": {
                "type": "string",
                "inferenceType": "extractive",
                "description": "the primary email address",
            }
        }
    })
)
JSON(response, expanded=True)


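The schema passed to `create_blueprint` above repeats the same structure for every field. When a document needs more fields, it can be convenient to generate that schema from a field-to-description map. This is a sketch under the assumption that every field is an extractive string, exactly as in the cell above; `build_blueprint_schema` is a hypothetical helper, not part of the BDA SDK.

```python
import json


def build_blueprint_schema(description, fields, document_class="default"):
    """Return a blueprint schema string shaped like the one in the cell above.

    `fields` maps each output field name to its natural-language description;
    every field is typed as an extractive string.
    """
    properties = {
        name: {
            "type": "string",
            "inferenceType": "extractive",
            "description": field_description,
        }
        for name, field_description in fields.items()
    }
    return json.dumps({
        "$schema": "http://json-schema.org/draft-07/schema#",
        "description": description,
        "documentClass": document_class,
        "type": "object",
        "properties": properties,
    })


insurance_fields = {
    "Insured Name": "Insured's Name",
    "Insurance Company": "insurance company name",
    "Insured Address": "the address of the insured property",
    "Email Address": "the primary email address",
}
schema = build_blueprint_schema(
    "This blueprint will process a homeowners insurance application form",
    insurance_fields,
)
```

The resulting string could be passed as `schema=schema` to `client.create_blueprint`.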
In [13]:
blueprint_arn = response['blueprint']['blueprintArn']
blueprint_arn

'arn:aws:bedrock:us-west-2:762233765926:blueprint/ccf05918ff0a'

# Step 4: Use our custom blueprint to process a Homeowner Insurance Form

In [14]:
# Upload the Form

file_name = 'documents/homeowner_insurance_application_sample.pdf'
object_name = f'data_automation/input/{file_name}'
output_name = 'data_automation/output'
s3.upload_file(file_name, bucket_name, object_name)

IFrame("documents/homeowner_insurance_application_sample.pdf", width=1000, height=500)

In [15]:
response = run_client.invoke_data_automation_async(
    inputConfiguration={'s3Uri':  f"s3://{bucket_name}/{object_name}"},
    outputConfiguration={'s3Uri': f"s3://{bucket_name}/{output_name}"},
    blueprints=[{'blueprintArn': blueprint_arn, 'stage': 'LIVE'}])
response

invoke_arn = response['invocationArn']
invoke_arn

'arn:aws:bedrock:us-west-2:762233765926:data-automation-invocation/31e43131-ca63-4040-af89-41cab91d64d3'

In [16]:
# Poll until the job leaves the 'Created'/'InProgress' states
while True:
    progress = run_client.get_data_automation_status(invocationArn=invoke_arn)
    if progress['status'] in ('Created', 'InProgress'):
        print(progress['status'])
        sleep(5)
    else:
        break

print(progress['status'])

InProgress
InProgress
InProgress
InProgress
InProgress
Success


### Get Results

Note that the four fields we requested in the blueprint have been returned.

In [17]:
out_loc = progress['outputConfiguration']['s3Uri'].split("/job_metadata.json", 1)[0].split(bucket_name+"/")[1]
out_loc += "/0/custom_output/0/result.json"
out_loc

'data_automation/output/31e43131-ca63-4040-af89-41cab91d64d3/0/custom_output/0/result.json'

In [18]:
s3.download_file(bucket_name, out_loc, 'result.json')

In [19]:
data = json.load(open('result.json'))
print(json.dumps(data['inference_result'], indent=2))

#print(json.dumps(data, indent=2))

{
  "Insured Address": "42 Rainbow Sparkle Boulevard Unicornville, NV 12345",
  "Insurance Company": "Fake Insurance Co",
  "Insured Name": "Ziggy Starpixel",
  "Email Address": "rainbow.unicorn.987654@fakeemail.nowhere"
}

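Before passing extracted values to downstream systems (step 8 of the architecture), it can help to confirm that every requested field actually came back with a value. A minimal sketch, assuming `inference_result` is a flat dictionary keyed by the blueprint field names, as in the output above; `missing_fields` is a hypothetical helper.

```python
def missing_fields(inference_result, expected_fields):
    """Return the requested fields that are absent, None, or blank strings."""
    missing = []
    for field in expected_fields:
        value = inference_result.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


# Field names defined in the Step 3 blueprint.
expected = ["Insured Name", "Insurance Company", "Insured Address", "Email Address"]
```

`missing_fields(data['inference_result'], expected)` returns an empty list for the output shown above.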

# Step 5: Create automation project and add blueprints

### Create automation project

To process a lending package, we need to support multiple document types.
We add our custom blueprint and several existing standard blueprints:

1. Homeowner Insurance Application (custom)
2. Drivers License ID Card
3. Bank Statements
4. W2 Tax form
5. Pay Stubs
6. A Check


Let's define the format of the standard output using the BDA standard output configuration.

Document Reference: https://docs.aws.amazon.com/bedrock/latest/userguide/bda-how-it-works.html

In [20]:
output_config =  {
  "document": {
    "extraction": {
      "granularity": {
        "types": [
          "PAGE",
          "ELEMENT"
        ]
      },
      "boundingBox": {
        "state": "ENABLED"
      }
    },
    "generativeField": {
      "state": "ENABLED"               
    },
    "outputFormat": {
      "textFormat": {
        "types": ['PLAIN_TEXT','MARKDOWN','HTML','CSV']
      },
      "additionalFileFormat": {
        "state": "DISABLED"
      }
    }
  },
  "image": {
    "extraction": {
      "category": {
        "state": "ENABLED",
        "types": [
          "TEXT_DETECTION"
        ]
      },
      "boundingBox": {
        "state": "ENABLED"
      }
    },
    "generativeField": {
      "state": "ENABLED",
      "types": [
        "IMAGE_SUMMARY"
      ]
    }
  },
  "video": {
    "extraction": {
      "category": {
        "state": "ENABLED",
        "types": [
          "TEXT_DETECTION"
        ]
      },
      "boundingBox": {
        "state": "ENABLED"
      }
    },
    "generativeField": {
      "state": "ENABLED",
      "types": [
        "VIDEO_SUMMARY",
        "SCENE_SUMMARY"
      ]
    }
  },
  "audio": {
    "extraction": {
      "category": {
        "state": "ENABLED",
        "types": [
          "TRANSCRIPT"
        ]
      }
    },
    "generativeField": {
      "state": "ENABLED",
      "types": ["IAB"]
    }
  }
}

JSON(output_config)


In [21]:
response = client.create_data_automation_project(
    projectName=project_name,
    projectDescription="Workbook to process Lending Applications",
    projectStage='LIVE',
    standardOutputConfiguration=output_config,
)

print(response)

project_arn = response['projectArn']
JSON(response, expanded=True)

{'ResponseMetadata': {'RequestId': 'b99436ec-21a9-4680-81f2-e90ac81ae5f6', 'HTTPStatusCode': 201, 'HTTPHeaders': {'date': 'Wed, 15 Jan 2025 14:28:39 GMT', 'content-type': 'application/json', 'content-length': '137', 'connection': 'keep-alive', 'x-amzn-requestid': 'b99436ec-21a9-4680-81f2-e90ac81ae5f6'}, 'RetryAttempts': 0}, 'projectArn': 'arn:aws:bedrock:us-west-2:762233765926:data-automation-project/cb7d1d57400b', 'projectStage': 'LIVE', 'status': 'IN_PROGRESS'}



### Add blueprints to the automation project

Our project needs the blueprints required to process a loan application. We will add the Homeowner Insurance Application blueprint we created in Step 3, as well as standard blueprints for these documents:

1. Drivers License ID Card
2. Bank Statements
3. Pay Stubs
4. A Check

The standard W2 Tax Form blueprint is added to the project in Step 6.


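The blueprint list is spelled out in full here and again in Step 6. A small helper can build it from a list of ARNs instead. The ARN pattern `arn:aws:bedrock:<region>:aws:blueprint/<name>` is copied from the public blueprint ARNs used in this workbook; whether the same names exist in other regions is an assumption to verify. `public_blueprint_arn`, `custom_output_configuration`, and `PUBLIC_BLUEPRINTS` are hypothetical names.

```python
def public_blueprint_arn(name, region="us-west-2"):
    """ARN of an AWS-provided BDA blueprint, following the pattern in this workbook."""
    return f"arn:aws:bedrock:{region}:aws:blueprint/{name}"


def custom_output_configuration(blueprint_arns, stage="LIVE"):
    """Build the customOutputConfiguration argument from a list of blueprint ARNs."""
    return {
        "blueprints": [
            {"blueprintArn": arn, "blueprintStage": stage} for arn in blueprint_arns
        ]
    }


# Public blueprint names used in Step 6 of this workbook.
PUBLIC_BLUEPRINTS = [
    "bedrock-data-automation-public-w2-form",
    "bedrock-data-automation-public-us-driver-license",
    "bedrock-data-automation-public-us-bank-check",
    "bedrock-data-automation-public-payslip",
    "bedrock-data-automation-public-bank-statement",
]
```

For example, `custom_output_configuration([blueprint_arn] + [public_blueprint_arn(n, region_name) for n in PUBLIC_BLUEPRINTS])` yields the same six-blueprint list used in Step 6.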
In [22]:
print(blueprint_arn)

arn:aws:bedrock:us-west-2:762233765926:blueprint/ccf05918ff0a


In [23]:
update_response = client.update_data_automation_project(
    projectArn=project_arn,
    standardOutputConfiguration=output_config,
    customOutputConfiguration={
        'blueprints': [
            {
                'blueprintArn': blueprint_arn,  # our custom Homeowner Insurance blueprint
                'blueprintStage': 'LIVE'
            },
            {
                'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-us-driver-license',
                'blueprintStage': 'LIVE'
            },
            {
                'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-us-bank-check',
                'blueprintStage': 'LIVE'
            },
            {
                'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-payslip',
                'blueprintStage': 'LIVE'
            },
            {
                'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-bank-statement',
                'blueprintStage': 'LIVE'
            },
        ]
    },
)

# The update response carries the (unchanged) project ARN
project_arn = update_response['projectArn']


### List the blueprints for the automation project

In [24]:
JSON(client.list_blueprints(projectFilter={'projectArn': project_arn}), expanded=True)


In [25]:
JSON(client.get_data_automation_project(projectArn=project_arn), expanded=False)


# Step 6: Document Splitting - Process a Multi-Page Document Package

In [26]:


print(f"Activating document splitting for project: {project_name}, {project_arn}")


project = client.get_data_automation_project(projectArn=project_arn)["project"]

# Update project configuration
update_response = client.update_data_automation_project(
    projectArn=project_arn,
    standardOutputConfiguration=project["standardOutputConfiguration"],
    customOutputConfiguration={
        'blueprints': [
            {
                'blueprintArn': blueprint_arn,
                'blueprintStage': 'LIVE'
            },
            {
                'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-w2-form',
                'blueprintStage': 'LIVE'
            },
             {
                 'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-us-driver-license',
                  'blueprintStage': 'LIVE'
             },
             {
                 'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-us-bank-check',
                  'blueprintStage': 'LIVE'
             },
             {
                 'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-payslip',
                 'blueprintStage': 'LIVE'
             },
             {
                 'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-bank-statement',
                  'blueprintStage': 'LIVE'
             },
        ]
    },
    overrideConfiguration={'document': {'splitter': {'state': 'ENABLED'}}})

# Get updated project configuration
updated_project = client.get_data_automation_project(projectArn=project_arn)

print("\nUpdated override configuration of project:")
JSON(updated_project)


Activating document splitting for project: my-bda-lending-workbook-v1, arn:aws:bedrock:us-west-2:762233765926:data-automation-project/cb7d1d57400b

Updated override configuration of project:



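The `create_data_automation_project` response in Step 5 reported `'status': 'IN_PROGRESS'`, which suggests that project changes are applied asynchronously, so waiting for the project to settle before invoking it is a cautious choice. This sketch assumes the project record returned by `get_data_automation_project` exposes a `status` field with values such as `IN_PROGRESS`, `COMPLETED`, and `FAILED`; verify the exact values for your SDK version. `wait_for_project` is a hypothetical helper.

```python
import time


def wait_for_project(client, project_arn, poll_seconds=5, timeout_seconds=300,
                     sleep=time.sleep):
    """Poll get_data_automation_project until status is no longer 'IN_PROGRESS'.

    Assumed status values: IN_PROGRESS, COMPLETED, FAILED.
    Raises TimeoutError on timeout and RuntimeError if the project FAILED.
    """
    waited = 0
    while True:
        project = client.get_data_automation_project(projectArn=project_arn)["project"]
        status = project.get("status")
        if status != "IN_PROGRESS":
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"Project {project_arn} still IN_PROGRESS after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
    if status == "FAILED":
        raise RuntimeError(f"Project update failed: {project}")
    return project
```

`wait_for_project(client, project_arn)` could run just before the document package is uploaded and processed.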
In [27]:
##
## Upload a package of documents to Amazon S3
##
file_name = 'documents/lending_package.pdf'
object_name = f'data_automation/input/{file_name}'
output_name = 'data_automation/output'
s3.upload_file(file_name, bucket_name, object_name)

IFrame("documents/lending_package.pdf", width=1000, height=500)

In [28]:
# Process the document package
response = run_client.invoke_data_automation_async(
    dataAutomationConfiguration = { "dataAutomationArn" : project_arn,"stage" : 'LIVE'},
    inputConfiguration={'s3Uri':  f"s3://{bucket_name}/{object_name}"},
    outputConfiguration={'s3Uri': f"s3://{bucket_name}/{output_name}"},
)

response


invoke_arn = response['invocationArn']
invoke_arn


'arn:aws:bedrock:us-west-2:762233765926:data-automation-invocation/46c101e9-d867-4946-82fe-4eaf347734fc'

In [29]:
# Poll until the job leaves the 'Created'/'InProgress' states
while True:
    progress = run_client.get_data_automation_status(invocationArn=invoke_arn)
    if progress['status'] in ('Created', 'InProgress'):
        print(progress['status'])
        sleep(10)
    else:
        break

print(progress['status'])

InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
Success


# Step 7: Display the results

BDA will automatically split the package into individual documents based on logical boundaries and, for each split, return the matched blueprint along with the requested structured output.
Let's visualize these results by showing the first page of each split next to its matched blueprint and inference result.

In [31]:
#import os
#import sys
#sys.path.append(os.path.abspath('..'))
import pypdfium2 as pdfium
import ipywidgets as widgets
from utils.helpers import get_s3_to_dict, display_image_jsons

doc = pdfium.PdfDocument(file_name)
pages_pil = [page.render(scale=1.53).to_pil() for page in doc]

# get the job_metadata
job_json_obj = get_s3_to_dict(s3,progress['outputConfiguration']['s3Uri'])
results_meta = job_json_obj["output_metadata"][0]["segment_metadata"]

# put the results together and show with first page side by side
results_all = []
for result in results_meta:
    standard_output_obj = get_s3_to_dict(s3,result["standard_output_path"])
    custom_output_obj = get_s3_to_dict(s3,result["custom_output_path"])
    pages = custom_output_obj["split_document"]["page_indices"]
    w = display_image_jsons(pages_pil[pages[0]], [custom_output_obj['matched_blueprint'],custom_output_obj['inference_result']],["Matched Blueprint", "Inference Result"])
    results_all.append(w)    

widgets.VBox(results_all)



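The widget output above is visual only. For a compact text summary of the split, you can tabulate the matched blueprint and pages for each segment. `split_document.page_indices`, `matched_blueprint`, and `inference_result` are read the same way the cell above reads them (the page indices are zero-based there, since they index `pages_pil`); the `name` and `confidence` keys inside `matched_blueprint` are assumptions, so they are read with `.get()`. `summarize_split` is a hypothetical helper.

```python
def summarize_split(custom_outputs):
    """Tabulate which blueprint matched which pages, one row per split.

    `custom_outputs` is a list of parsed custom_output result.json dicts.
    Page numbers are converted from zero-based indices to 1-based for display.
    """
    rows = []
    for output in custom_outputs:
        blueprint = output.get("matched_blueprint") or {}
        pages = (output.get("split_document") or {}).get("page_indices", [])
        rows.append({
            "blueprint": blueprint.get("name", "<unknown>"),  # assumed key
            "confidence": blueprint.get("confidence"),        # assumed key
            "pages": [index + 1 for index in pages],
            "fields": len(output.get("inference_result") or {}),
        })
    return rows
```

The inputs could be collected with `custom_outputs = [get_s3_to_dict(s3, r["custom_output_path"]) for r in results_meta]`, after which each row can be printed as a line of a summary table.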
## Conclusion

We learned how to use BDA to extract structured outputs from complex documents by:
* creating a custom blueprint with a JSON schema and matching it against a specific document.
* creating a project with multiple blueprints that automatically splits a document package, classifies each part, and extracts the requested information using the matched blueprints.


# Step 8: Cleanup

This step is needed before we run through the workbook a second time. 

In [None]:
# Delete the project
response = client.delete_data_automation_project(projectArn=project_arn)

# Delete the blueprint
response = client.delete_blueprint(blueprintArn=blueprint_arn)
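The cleanup cell removes the project and blueprint, but the uploaded documents and BDA outputs under `data_automation/` remain in the SageMaker default bucket. The sketch below removes them, assuming you want to delete everything under that prefix; double-check the prefix first, since deletion is irreversible. It lists keys with the `list_objects_v2` paginator and deletes them in batches, because `delete_objects` accepts at most 1,000 keys per request. `delete_prefix` is a hypothetical helper.

```python
def delete_prefix(s3_client, bucket, prefix, batch_size=1000):
    """Delete every object under `prefix` in `bucket`.

    Keys are listed first and then deleted in batches of at most
    `batch_size`. Returns the number of keys deleted; raises RuntimeError
    if S3 reports per-key errors.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    ]
    errors = []
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in chunk]},
        )
        errors.extend(response.get("Errors", []))
    if errors:
        raise RuntimeError(f"Failed to delete {len(errors)} objects: {errors[:3]}")
    return len(keys)
```

In this workbook the call would be `delete_prefix(s3, bucket_name, 'data_automation/')`.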