aws-samples/sample-stepfunctions-json-array-processor

Optimizing nested JSON array processing using AWS Step Functions Distributed Map

This sample application demonstrates processing JSON arrays using AWS Step Functions Distributed Map. The Step Functions workflow reads a JSON file from an Amazon S3 bucket and iterates over the array to process each element.

Workflow

The following diagram shows the Step Functions workflow.

AWS Step Functions workflow diagram

  • The state machine reads the product-updates.json file from an input Amazon S3 bucket. The file contains a JSON array.
  • The Distributed Map state iterates over the JSON array and, for each item in the array, invokes an AWS Lambda function for data enrichment (see the sketch after this list). The Lambda function adds product stock and price information to the product data.
  • The state machine saves the updated product data in an Amazon DynamoDB table.
  • Finally, the state machine uploads the execution metadata to an output S3 bucket.
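
For illustration, the enrichment step can be pictured as a small Lambda handler such as the sketch below. This is a minimal sketch and not the repository's actual code: it assumes the Distributed Map passes one product object per invocation, and the field names and randomized values are made up for the example.

# Minimal sketch of the enrichment Lambda (illustrative; not the repository's actual handler)
import random

def lambda_handler(event, context):
    # The Distributed Map state is assumed to pass one item from the JSON array per invocation
    product = dict(event)

    # Add price and stock information (values are randomized purely for illustration)
    product["price"] = round(random.uniform(5.0, 500.0), 2)
    product["stock"] = random.randint(0, 1000)

    # The state machine receives the enriched item and writes it to DynamoDB
    return product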

Prerequisites

To deploy and test this sample application, you need:

  • An AWS account with permissions to create the resources in this application
  • AWS Command Line Interface (AWS CLI) installed and configured
  • AWS Serverless Application Model Command Line Interface (AWS SAM CLI) installed
  • Python 3 installed
  • Git installed

Quick Start

1. Clone the repository and navigate to the project root (all commands run from here)

Clone the GitHub repository and navigate to the project root folder:

git clone https://github.com/aws-samples/sample-stepfunctions-json-array-processor.git
cd sample-stepfunctions-json-array-processor

2. Deploy the application

Run the following command to deploy the application:

sam deploy --guided

Enter the following details:

  • Stack name: The CloudFormation stack name (for example, stepfunctions-json-array-processor)
  • AWS Region: A supported AWS Region (for example, us-east-1)
  • Keep the rest of the options at their default values.

The outputs from sam deploy will be used in the subsequent steps.
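
If you want to look the outputs up again later, you can read them from CloudFormation. The following boto3 sketch is only an example; the stack name is whatever you entered during sam deploy --guided.

# Sketch: print the outputs of the deployed CloudFormation stack
import boto3

cloudformation = boto3.client("cloudformation")
stacks = cloudformation.describe_stacks(StackName="stepfunctions-json-array-processor")

for output in stacks["Stacks"][0]["Outputs"]:
    print(f"{output['OutputKey']}: {output['OutputValue']}")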

3. Generate the test data and upload it to the input S3 bucket

Run the following command to generate sample test data and upload it to the input S3 bucket. Replace <InputBucketName> with the value from the sam deploy output.

python3 scripts/generate_sample_data.py <InputBucketName>
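
Conceptually, the script generates a JSON array of products and uploads it as product-updates.json to the input bucket. The sketch below is an illustrative approximation, not the script itself; the product fields are assumptions.

# Illustrative approximation of scripts/generate_sample_data.py (field names are assumptions)
import json
import sys

import boto3

bucket = sys.argv[1]  # InputBucketName from the sam deploy output
products = [{"productId": f"PROD-{i:04d}", "name": f"Product {i}"} for i in range(100)]

s3 = boto3.client("s3")
s3.put_object(
    Bucket=bucket,
    Key="product-updates.json",
    Body=json.dumps(products).encode("utf-8"),
    ContentType="application/json",
)
print(f"Uploaded {len(products)} products to s3://{bucket}/product-updates.json")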

4. Test the Step Functions workflow

Run the following command to start an execution of the state machine. Replace <StateMachineArn> with the value from the sam deploy output.

aws stepfunctions start-execution \
  --state-machine-arn <StateMachineArn> \
  --input '{}'

The state machine parses the input JSON array file and, for each product in the array, invokes the Lambda function. The Lambda function updates the price and stock information and returns control to the state machine. The state machine stores the updated product data in the Amazon DynamoDB table and uploads the execution metadata to the result S3 bucket (ResultBucketName in the sam deploy output).
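
If you prefer boto3 over the AWS CLI, the equivalent call looks roughly like this sketch (replace the ARN placeholder with the real value):

# Sketch: start the state machine execution with boto3 instead of the AWS CLI
import json

import boto3

sfn = boto3.client("stepfunctions")
response = sfn.start_execution(
    stateMachineArn="<StateMachineArn>",  # from the sam deploy output
    input=json.dumps({}),                 # matches the empty input used in the CLI example above
)
print(response["executionArn"])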

5. Monitor the state machine execution

Run the following command to get the details of the execution. Replace <executionArn> with the value returned by the previous command.

aws stepfunctions describe-execution --execution-arn <executionArn>

The output should show the status SUCCEEDED.
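
To wait for the terminal status programmatically, a small polling loop such as the following sketch works (replace the ARN placeholder with the real value):

# Sketch: poll the execution until it leaves the RUNNING state
import time

import boto3

sfn = boto3.client("stepfunctions")
execution_arn = "<executionArn>"  # returned by start-execution

while True:
    status = sfn.describe_execution(executionArn=execution_arn)["status"]
    if status != "RUNNING":
        break
    time.sleep(5)

print(f"Execution finished with status {status}")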

6. Verify Results

Run the following commands to validate the processed output in the ProductCatalogTableName DynamoDB table and the generated manifest file in the result S3 bucket. Replace <ProductCatalogTableName> and <ResultBucketName> with the values from the sam deploy output.

aws dynamodb scan --table-name <ProductCatalogTableName>

aws s3 ls s3://<ResultBucketName>/results/ --recursive

Check that the DynamoDB table contains the updated product information.
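
The same verification can be done with boto3; the sketch below counts the items in the table and lists the result objects (replace the placeholders with the sam deploy outputs):

# Sketch: verify the DynamoDB items and the execution metadata objects in S3
import boto3

dynamodb = boto3.client("dynamodb")
s3 = boto3.client("s3")

# Count the enriched products written by the state machine
scan = dynamodb.scan(TableName="<ProductCatalogTableName>", Select="COUNT")
print(f"Products in table: {scan['Count']}")

# List the execution metadata written under results/ in the result bucket
listing = s3.list_objects_v2(Bucket="<ResultBucketName>", Prefix="results/")
for obj in listing.get("Contents", []):
    print(obj["Key"])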

Cleanup

Run the following commands to delete the resources deployed by this sample application.

# Delete S3 bucket contents

aws s3 rm s3://<InputBucketName> --recursive
aws s3 rm s3://<ResultBucketName> --recursive

# Delete SAM stack
sam delete

License

This library is licensed under the MIT-0 License. See the LICENSE file.
