S3 High Throughput Distributed Map

This is an AWS CDK template for exploring the Step Functions Distributed Map capabilities and how to address the need for increased parallelism.

Amazon S3 is widely used as a durable, highly available service for storing large amounts of data for audit, compliance, application, or datafication purposes. The challenge in using this data begins with the read and parallelization steps. These can be handled by optimized custom code running on AWS services such as AWS Lambda or AWS Batch, but in simple scenarios, where most of the work is reading objects and processing them in parallel, the challenge can be handled with a Step Functions Distributed Map.

This example follows the official AWS documentation and the considerations and concepts explored in the AWS News Blog.

Learn more about this workflow at Step Functions workflows collection: https://serverlessland.com/workflows/s3-high-throughput-distributed-map

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the AWS Pricing page for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

Requirements

Create an AWS account if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
AWS CLI installed and configured
Git installed
AWS CDK installed

Deployment Instructions

If this is your first time using AWS CDK, bootstrap your environment.
```
cdk bootstrap aws://{your-aws-account-number}/{your-aws-region}
```
Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
```
git clone https://github.com/aws-samples/step-functions-workflows-collection
```
Change directory to the pattern directory:
```
cd step-functions-workflows-collection/s3-bucket-nested-distributed-map
```
From the command line, use npm to install dependencies and run the build process for the Lambda functions.
```
npm install
npm run build
```
From the command line, use CDK to deploy the AWS resources for the workflow:
```
npm run cdk:deploy
```

During the prompts:

Do you wish to deploy these changes (y/n)? Y

How it works

This sample deploys the JSON files from the resources/assets folder into the S3 bucket. To run the example at higher throughput, add more JSON files and redeploy the sample (this can lead to additional costs).

Run this command to create a large number of files before deployment:

    cd resources/assets && for i in {2..100000}; do cp example1.json "example$i.json"; done

To upload the generated files to the bucket directly, run:

    aws s3 sync . s3://<S3-Bucket-Name>
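
To get a feel for how the number of objects translates into parallel work, the fan-out of a nested Distributed Map can be estimated with simple arithmetic. The sketch below is only an illustration: the function name `estimateFanOut` and both batch sizes are hypothetical and are not taken from this sample's state machine definition.

```typescript
// Rough fan-out estimate for a nested Distributed Map.
// Assumption: both batch sizes are illustrative values, not the ones
// configured in this sample.
interface FanOutEstimate {
  parentBatches: number; // batches produced by the parent map
  childBatches: number;  // batches produced by all child maps together
}

function estimateFanOut(
  objectCount: number,
  parentBatchSize: number,
  childBatchSize: number
): FanOutEstimate {
  if (objectCount < 0 || parentBatchSize < 1 || childBatchSize < 1) {
    throw new RangeError("objectCount must be >= 0 and batch sizes >= 1");
  }
  const parentBatches = Math.ceil(objectCount / parentBatchSize);
  // Every full parent batch is split again; the last one may be smaller.
  const fullParentBatches = Math.floor(objectCount / parentBatchSize);
  const remainder = objectCount % parentBatchSize;
  const childBatches =
    fullParentBatches * Math.ceil(parentBatchSize / childBatchSize) +
    Math.ceil(remainder / childBatchSize);
  return { parentBatches, childBatches };
}

// 100,000 objects, as created by the shell loop above.
const estimate = estimateFanOut(100000, 1000, 100);
console.log(estimate);
```

With these assumed sizes, the parent map produces 100 batches and the child maps produce 1,000 batches in total. The actual numbers depend on the batch settings in the deployed state machine.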

This state machine demonstrates optimized processing of S3 objects with native service integrations:

Reading all S3 bucket objects using a parent DISTRIBUTED Map.
Batching the objects and sending each batch to a parallel map branch.
Parallelizing each batch using a child DISTRIBUTED Map.
Rebatching and parallelizing the items using an INLINE Map.
Getting the object body from S3 for each key.
Transforming the body by adding a new attribute to the JSON body.
Publishing the transformed results to an Amazon SNS topic.
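
The steps above can be modeled in plain TypeScript to make the data flow easier to follow. This is an in-memory sketch, not the state machine definition: it makes no AWS calls and runs sequentially, whereas the real workflow processes the batches in parallel. The batch sizes, the `processed` attribute, and the `bucket` and `published` stand-ins are all hypothetical.

```typescript
// In-memory model of the workflow steps; no AWS calls are made.
// Assumptions: batch sizes, the "processed" attribute, and the fake
// bucket and topic are hypothetical stand-ins for the real resources.

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in for the S3 bucket: object key -> JSON body.
const bucket: Record<string, string> = {};
for (let i = 1; i <= 10; i++) {
  bucket[`example${i}.json`] = JSON.stringify({ id: i });
}

// Stand-in for the SNS topic: collects the published messages.
const published: string[] = [];

function processKey(key: string): void {
  const body = bucket[key]; // get the object body for a single key
  if (body === undefined) throw new Error(`missing object: ${key}`);
  const transformed = { ...JSON.parse(body), processed: true }; // add an attribute
  published.push(JSON.stringify(transformed)); // publish the result
}

const keys = Object.keys(bucket); // the parent map lists the objects
for (const parentBatch of chunk(keys, 4)) { // the parent map batches the keys
  for (const childBatch of chunk(parentBatch, 2)) { // the child map rebatches
    childBatch.forEach(processKey); // the inline map handles each key
  }
}
console.log(published.length);
```

Each level only narrows the list of keys it receives, so the per-object work (get, transform, publish) stays the same no matter how the batches are sized.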

Testing

The state machine can be triggered without worrying about the input payload.
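
One way to start an execution programmatically is the Step Functions `StartExecution` API. The sketch below only builds the request parameters and does not call AWS. The helper name `buildStartExecutionParams` is hypothetical, the state machine ARN is a placeholder, and sending an empty JSON object as input is an assumption based on the note above.

```typescript
// Builds request parameters for the Step Functions StartExecution API.
// Assumptions: the ARN used below is a placeholder, and an empty JSON
// object is sent as input because the payload is not expected to matter.
interface StartExecutionParams {
  stateMachineArn: string;
  name?: string; // optional execution name
  input: string; // the API expects the input as a JSON string
}

function buildStartExecutionParams(
  stateMachineArn: string,
  name?: string
): StartExecutionParams {
  const result: StartExecutionParams = {
    stateMachineArn,
    input: JSON.stringify({}),
  };
  if (name !== undefined) {
    result.name = name;
  }
  return result;
}

const params = buildStartExecutionParams(
  "arn:aws:states:us-east-1:123456789012:stateMachine:ExampleStateMachine"
);
console.log(params);
```

With the AWS SDK for JavaScript v3, parameters of this shape can be passed to `StartExecutionCommand` from `@aws-sdk/client-sfn`. Replace the placeholder ARN with the one from your deployed stack.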
