This workflow demonstrates how to analyze an image of an expense (for example, a receipt) stored in Amazon S3 using the AnalyzeExpense API of Amazon Textract. The extracted expense data is persisted in DynamoDB.
Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the AWS Pricing page for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.
- Create an AWS account if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
- AWS CLI installed and configured
- Git installed
- Terraform installed
- Create a new directory, navigate to that directory in a terminal, and clone the GitHub repository:
git clone https://github.com/aws-samples/step-functions-workflows-collection
- Change directory to the pattern directory:
cd step-functions-workflows-collection/textract-analyze-expense-tf
- From the command line, initialize Terraform to download and install the providers defined in the configuration:
terraform init
- From the command line, apply the configuration in the main.tf file:
terraform apply
- During the prompts:
- Enter yes
- Note the outputs from the Terraform deployment process. These contain the resource names and/or ARNs which are used for testing.
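If you prefer to collect the deployment outputs programmatically rather than copying them from the terminal, the following is a minimal sketch that reads them with `terraform output -json`. The function names are hypothetical, and the actual output names depend on what this pattern's Terraform configuration declares, so the sketch simply returns whatever it finds.

```python
import json
import subprocess


def parse_terraform_outputs(raw_json: str) -> dict:
    """Flatten the JSON printed by `terraform output -json` into {name: value}.

    Terraform wraps every output in an object with "sensitive", "type"
    and "value" keys; only the value is kept here.
    """
    outputs = json.loads(raw_json)
    return {name: entry["value"] for name, entry in outputs.items()}


def read_terraform_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in workdir and return the flattened outputs."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise if Terraform exits with a non-zero status
    )
    return parse_terraform_outputs(result.stdout)
```

Calling `read_terraform_outputs()` from the pattern directory after `terraform apply` should return a dictionary of the resource names and ARNs needed for testing.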
This workflow demonstrates how to analyze an image of an expense stored in Amazon S3, extract data from the expense, and persist it in DynamoDB.
- The workflow can be triggered either by uploading an image of a receipt to the provided S3 bucket or by manually starting the Step Functions state machine with the payload described in the Testing section.
- The image is processed using the DetectLabels API of Amazon Rekognition to detect whether the image contains a receipt. The sole purpose of this step is to reduce cost by avoiding calls to Textract for images that are not expenses.
- If the list of detected labels contains no Receipt label, or the confidence of that label is lower than 80, the workflow returns an error saying that the image is not a receipt.
- Otherwise, it calls the AnalyzeExpense API of Amazon Textract to extract the expense data from the image.
- The workflow saves the S3 object key, paid amount, invoice date, and vendor name in the DynamoDB table created via Terraform.
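The decision and extraction steps above can be sketched in plain Python operating on the response shapes returned by Rekognition DetectLabels and Textract AnalyzeExpense. This is an illustration only: the deployed workflow implements these steps inside the state machine, not with this code. The function names and the DynamoDB attribute names are hypothetical, and the mapping from the Textract field types AMOUNT_PAID, INVOICE_RECEIPT_DATE, and VENDOR_NAME to the saved values is an assumption.

```python
RECEIPT_LABEL = "Receipt"
MIN_CONFIDENCE = 80.0  # threshold described above, on Rekognition's 0-100 scale

# Assumption: these Textract summary field types correspond to the values the
# workflow stores. The attribute names on the right are hypothetical.
SUMMARY_FIELD_TYPES = {
    "AMOUNT_PAID": "paid_amount",
    "INVOICE_RECEIPT_DATE": "invoice_date",
    "VENDOR_NAME": "vendor_name",
}


def is_receipt(detect_labels_response: dict) -> bool:
    """Return True if a Receipt label with confidence of at least 80 is present."""
    for label in detect_labels_response.get("Labels", []):
        if (
            label.get("Name") == RECEIPT_LABEL
            and label.get("Confidence", 0.0) >= MIN_CONFIDENCE
        ):
            return True
    return False


def extract_expense_item(object_key: str, analyze_expense_response: dict) -> dict:
    """Collect the S3 object key plus the mapped summary fields into one item."""
    item = {"object_key": object_key}
    for document in analyze_expense_response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            attribute = SUMMARY_FIELD_TYPES.get(field_type)
            value = field.get("ValueDetection", {}).get("Text")
            # Keep only the first detected value for each mapped field type.
            if attribute and value and attribute not in item:
                item[attribute] = value
    return item
```

Running the label check first means the more expensive AnalyzeExpense call is made only for images that Rekognition already considers receipts.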
There are two ways to test the Step Functions Workflow:
- Upload a JPG image of a receipt to the S3 bucket deployed via Terraform, for example using the AWS CLI:
aws s3 cp my_receipt.jpg s3://textract-analyze-expense-tf-${AWS_ACCOUNT_ID}
- Start the workflow with a test event like the example below. You must replace my_receipt.jpg with the S3 object key of an image already uploaded to the S3 bucket textract-analyze-expense-tf-${AWS_ACCOUNT_ID}:
{
  "detail": {
    "bucket": {
      "name": "textract-analyze-expense-tf-${AWS_ACCOUNT_ID}"
    },
    "object": {
      "key": "my_receipt.jpg"
    }
  }
}
Replace ${AWS_ACCOUNT_ID} with the ID of the AWS account used to deploy the Terraform configuration.
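The second testing option can also be scripted. The sketch below builds the test event shown above and starts an execution with the boto3 Step Functions client. The function names are hypothetical, and the state machine ARN is not stated in this document, so it must be taken from the Terraform outputs. The sketch assumes boto3 is installed and AWS credentials are configured.

```python
import json


def build_test_event(account_id: str, object_key: str) -> dict:
    """Build the test payload for an image already uploaded to the pattern's bucket."""
    return {
        "detail": {
            "bucket": {"name": f"textract-analyze-expense-tf-{account_id}"},
            "object": {"key": object_key},
        }
    }


def start_workflow(state_machine_arn: str, account_id: str, object_key: str) -> str:
    """Start one execution of the state machine and return its execution ARN."""
    # Imported here so that building the event does not require the AWS SDK.
    import boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(build_test_event(account_id, object_key)),
    )
    return response["executionArn"]
```

For example, `start_workflow(arn, "123456789012", "my_receipt.jpg")` would start the workflow for that object, where `arn` and the account ID are placeholders for your own values.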
- Change directory to the pattern directory:
cd step-functions-workflows-collection/textract-analyze-expense-tf
- Delete all created resources:
terraform destroy
- During the prompts:
- Enter yes
- Confirm that all created resources have been deleted:
terraform show
Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0