aws-samples/sample-stepfunctions-json-array-processor

Optimizing nested JSON array processing using AWS Step Functions Distributed Map

This sample application demonstrates processing JSON arrays using AWS Step Functions Distributed Map. The Step Functions workflow reads a JSON file from an Amazon S3 bucket and iterates over the array to process each element.

Workflow

The following diagram shows the Step Functions workflow.

AWS Step Functions workflow diagram

  • The state machine reads the product-updates.json file from an input Amazon S3 bucket. The file contains a JSON array.
  • The Distributed Map state iterates over the JSON array and, for each item in the array, invokes an AWS Lambda function for data enrichment (see the sketch after this list). The Lambda function adds product stock and price information to the product data.
  • The state machine saves the updated product data in an Amazon DynamoDB table.
  • Finally, the state machine uploads the execution metadata to an output S3 bucket.
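
For illustration, the enrichment step can be pictured as a small Lambda handler such as the sketch below. This is a minimal sketch and not the repository's actual code: it assumes the Distributed Map passes one product object per invocation, and the field names and randomized values are made up for the example.

# Minimal sketch of the enrichment Lambda (illustrative; not the repository's actual handler)
import random

def lambda_handler(event, context):
    # The Distributed Map state is assumed to pass one item from the JSON array per invocation
    product = dict(event)

    # Add price and stock information (values are randomized purely for illustration)
    product["price"] = round(random.uniform(5.0, 500.0), 2)
    product["stock"] = random.randint(0, 1000)

    # The state machine receives the enriched item and writes it to DynamoDB
    return product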

Prerequisites

To deploy and test this sample application, you need:

  • An AWS account with permissions to create the resources in this application
  • AWS Command Line Interface (AWS CLI) installed and configured
  • AWS Serverless Application Model Command Line Interface (AWS SAM CLI) installed
  • Python 3 installed
  • Git installed

Quick Start

1. Clone the repository and navigate to the project root (all commands run from here)

Clone the GitHub repository and navigate to the project root folder:

git clone https://github.com/aws-samples/sample-stepfunctions-json-array-processor.git
cd sample-stepfunctions-json-array-processor

2. Deploy the application

Run the following command to deploy the application:

sam deploy --guided

Enter the following details:

  • Stack name: The CloudFormation stack name (for example, stepfunctions-json-array-processor)
  • AWS Region: A supported AWS Region (for example, us-east-1)
  • Keep the rest of the options at their default values.

The outputs from sam deploy will be used in the subsequent steps.
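
If you want to look the outputs up again later, you can read them from CloudFormation. The following boto3 sketch is only an example; the stack name is whatever you entered during sam deploy --guided.

# Sketch: print the outputs of the deployed CloudFormation stack
import boto3

cloudformation = boto3.client("cloudformation")
stacks = cloudformation.describe_stacks(StackName="stepfunctions-json-array-processor")

for output in stacks["Stacks"][0]["Outputs"]:
    print(f"{output['OutputKey']}: {output['OutputValue']}")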

3. Generate the test data and upload it to the input S3 bucket

Run the following command to generate sample test data and upload it to the input S3 bucket. Replace <InputBucketName> with the value from the sam deploy output.

python3 scripts/generate_sample_data.py <InputBucketName>
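
Conceptually, the script generates a JSON array of products and uploads it as product-updates.json to the input bucket. The sketch below is an illustrative approximation, not the script itself; the product fields are assumptions.

# Illustrative approximation of scripts/generate_sample_data.py (field names are assumptions)
import json
import sys

import boto3

bucket = sys.argv[1]  # InputBucketName from the sam deploy output
products = [{"productId": f"PROD-{i:04d}", "name": f"Product {i}"} for i in range(100)]

s3 = boto3.client("s3")
s3.put_object(
    Bucket=bucket,
    Key="product-updates.json",
    Body=json.dumps(products).encode("utf-8"),
    ContentType="application/json",
)
print(f"Uploaded {len(products)} products to s3://{bucket}/product-updates.json")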

4. Test the Step Functions workflow

Run the following command to start an execution of the state machine. Replace <StateMachineArn> with the value from the sam deploy output.

aws stepfunctions start-execution \
  --state-machine-arn <StateMachineArn> \
  --input '{}'

The state machine parses the input JSON array file and, for each product in the array, invokes the Lambda function. The Lambda function updates the price and stock information and returns control to the state machine. The state machine stores the updated product data in the Amazon DynamoDB table and uploads the execution metadata to the result S3 bucket (ResultBucketName in the sam deploy output).
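
If you prefer boto3 over the AWS CLI, the equivalent call looks roughly like this sketch (replace the ARN placeholder with the real value):

# Sketch: start the state machine execution with boto3 instead of the AWS CLI
import json

import boto3

sfn = boto3.client("stepfunctions")
response = sfn.start_execution(
    stateMachineArn="<StateMachineArn>",  # from the sam deploy output
    input=json.dumps({}),                 # matches the empty input used in the CLI example above
)
print(response["executionArn"])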

5. Monitor the state machine execution

Run the following command to get the details of the execution. Replace <executionArn> with the value returned by the previous command.

aws stepfunctions describe-execution --execution-arn <executionArn>

The output should show the status SUCCEEDED.
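
To wait for the terminal status programmatically, a small polling loop such as the following sketch works (replace the ARN placeholder with the real value):

# Sketch: poll the execution until it leaves the RUNNING state
import time

import boto3

sfn = boto3.client("stepfunctions")
execution_arn = "<executionArn>"  # returned by start-execution

while True:
    status = sfn.describe_execution(executionArn=execution_arn)["status"]
    if status != "RUNNING":
        break
    time.sleep(5)

print(f"Execution finished with status {status}")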

6. Verify Results

Run the following commands to validate the processed output in the ProductCatalogTableName DynamoDB table and the generated manifest file in the result S3 bucket. Replace <ProductCatalogTableName> and <ResultBucketName> with the values from the sam deploy output.

aws dynamodb scan --table-name <ProductCatalogTableName>

aws s3 ls s3://<ResultBucketName>/results/ --recursive

Check that the DynamoDB table contains the updated product information.
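
The same verification can be done with boto3; the sketch below counts the items in the table and lists the result objects (replace the placeholders with the sam deploy outputs):

# Sketch: verify the DynamoDB items and the execution metadata objects in S3
import boto3

dynamodb = boto3.client("dynamodb")
s3 = boto3.client("s3")

# Count the enriched products written by the state machine
scan = dynamodb.scan(TableName="<ProductCatalogTableName>", Select="COUNT")
print(f"Products in table: {scan['Count']}")

# List the execution metadata written under results/ in the result bucket
listing = s3.list_objects_v2(Bucket="<ResultBucketName>", Prefix="results/")
for obj in listing.get("Contents", []):
    print(obj["Key"])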

Cleanup

Run the following commands to delete the resources deployed by this sample application.

# Delete S3 bucket contents

aws s3 rm s3://<InputBucketName> --recursive
aws s3 rm s3://<ResultBucketName> --recursive

# Delete SAM stack
sam delete

License

This library is licensed under the MIT-0 License. See the LICENSE file.
