Data Processing & Storage Pattern CDK

Requirements

  * An AWS account with permissions to create the required resources
  * AWS CLI installed and configured with credentials
  * AWS CDK CLI installed (requires Node.js)
  * Git installed
  * Python 3 installed (used by the image upload script)

Deployment Instructions

  1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:

    git clone https://github.com/aws-samples/step-functions-workflows-collection
    
  2. Change directory to the pattern directory:

    cd step-functions-workflows-collection/data-processing/cdk
    
  3. From the command line, deploy the stack with CDK and accept the changes. If you would like to see the CloudFormation output before deploying, you can run 'cdk synth'.

    cdk deploy
    

    While the stack is deploying, you can check out the "How it works" section or grab some coffee or tea.

  4. Navigate to the shared directory and run the following Python script to load the test images into S3. The script obtains the S3 bucket name, creates a new input JSON file with the correct bucket name for the objects, and uploads the images under shared/images to the S3 bucket (a rough sketch of this logic appears below).

    cd ../shared/
    
    python scripts/uploadImagesToS3.py
    

    You can upload your own test images, but you will need to modify the state machine input JSON in the shared/output folder in order for them to be processed. This is covered in the Testing section.
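
The exact logic lives in shared/scripts/uploadImagesToS3.py. The sketch below is only a rough approximation of what the script does, based on the description above; the JSON shape and output file name are illustrative, not the script's actual output.

    # Rough, illustrative approximation of shared/scripts/uploadImagesToS3.py:
    # find the pattern's bucket, upload the sample images, and write a state
    # machine input file that references them.
    import json
    from pathlib import Path

    import boto3

    s3 = boto3.client("s3")

    # The stack's bucket name starts with this prefix (see the Cleanup section).
    bucket = next(
        b["Name"] for b in s3.list_buckets()["Buckets"]
        if b["Name"].startswith("data-workflow-pattern-")
    )

    keys = []
    for image in Path("images").iterdir():    # the images under shared/images
        s3.upload_file(str(image), bucket, image.name)
        keys.append(image.name)

    # Write an input file listing the uploaded objects (illustrative shape only;
    # use the file the real script generates under shared/output/).
    Path("output").mkdir(exist_ok=True)
    with open("output/example-input.json", "w") as f:
        json.dump({"Bucket": bucket, "Keys": keys}, f, indent=2)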

How it works

  1. Iterates over the list of S3 objects provided as input, using the Map state
  2. Retrieves object metadata and uses Amazon Rekognition to obtain image labels in parallel, using the Parallel state
  3. Merges the data from the parallel branches and stores it as a single DynamoDB item (a boto3 sketch of this flow follows the list)
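
The workflow uses direct Step Functions service integrations rather than custom code, but the per-image data flow is roughly equivalent to the boto3 sketch below. The DynamoDB attribute names are assumptions for illustration; the table name is the one used in the Testing section.

    # Illustrative boto3 equivalent of the per-image data flow; the actual
    # workflow performs these calls through Step Functions service integrations.
    import boto3

    s3 = boto3.client("s3")
    rekognition = boto3.client("rekognition")
    table = boto3.resource("dynamodb").Table("images-data-workflow-pattern-sl")

    def process_image(bucket: str, key: str) -> None:
        # One Parallel branch: fetch the object's metadata.
        metadata = s3.head_object(Bucket=bucket, Key=key)

        # The other Parallel branch: detect labels with Rekognition.
        labels = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}}
        )

        # Merge both branches' results into a single DynamoDB item
        # (attribute names here are assumptions).
        table.put_item(Item={
            "id": key,
            "contentLength": metadata["ContentLength"],
            "labels": [label["Name"] for label in labels["Labels"]],
        })

    # The Map state applies this to every object in the input list.
    for key in ["example-1.jpg", "example-2.jpg"]:
        process_image("example-bucket", key)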


Testing

  1. After completing the Deployment Instructions, navigate to Step Functions in the AWS console and select the workflow whose name starts with ProcessingImageDataPatternStateMachine. If you don't see it, make sure you are in the correct region.
  2. Select 'Start Execution', copy the contents of shared/output/data-workflow-pattern-*.json, and replace the existing comment in the input text area with it, then select 'Start Execution'. If you uploaded your own custom images, you will need to modify the input accordingly (a boto3 alternative is sketched below).
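
If you prefer to start the execution from a script rather than the console, the boto3 sketch below is roughly equivalent to steps 1 and 2; the relative path to the generated input file assumes you run it from the cdk directory.

    # Illustrative alternative to steps 1-2: start the execution with boto3.
    import glob
    import boto3

    sfn = boto3.client("stepfunctions")

    # Find the state machine whose name starts with the prefix mentioned above.
    state_machine = next(
        sm for sm in sfn.list_state_machines()["stateMachines"]
        if sm["name"].startswith("ProcessingImageDataPatternStateMachine")
    )

    # Use the input file generated by the upload script.
    input_path = glob.glob("../shared/output/data-workflow-pattern-*.json")[0]
    with open(input_path) as f:
        execution = sfn.start_execution(
            stateMachineArn=state_machine["stateMachineArn"],
            input=f.read(),
        )
    print("Started:", execution["executionArn"])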


  3. Observe the state machine workflow execution. It may take several seconds for the workflow to complete.
  4. Navigate to DynamoDB in the AWS console, select Tables, select the images-data-workflow-pattern-sl table, choose "Explore table items", and run a scan by clicking the Run button. You should see several records with metadata and labels from the Rekognition service (a boto3 alternative is sketched after this list).
  5. Navigate back to your state machine execution in the AWS console. View the input and output of each state to see what data is passed and/or altered from one state to the next.
  6. Select the Edit state machine button, then the Workflow Studio button, to view the state machine graphically. Click on each state to understand its configuration and its input and output processing. View further documentation on input and output processing.
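
The table scan in step 4 can also be done from a script; a minimal boto3 sketch:

    # Illustrative alternative to the console scan in step 4.
    import boto3

    table = boto3.resource("dynamodb").Table("images-data-workflow-pattern-sl")

    # Print every item written by the workflow (metadata plus Rekognition labels).
    for item in table.scan()["Items"]:
        print(item)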

Cleanup

To delete the resources created by this template, run the commands below. When prompted after the second command, enter y to confirm deletion.

You can find your bucket name in the output of the cdk deploy command run earlier, or by navigating to S3 in the console and finding a bucket name that starts with data-workflow-pattern-

cd ../cdk
cdk destroy

The second command also deletes all existing objects in the bucket, which ensures the CDK is able to delete the bucket successfully when the stack is destroyed.
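
If cdk destroy ever fails because the bucket still contains objects, you can empty the bucket yourself and run cdk destroy again; a minimal boto3 sketch, using the bucket-name prefix mentioned above:

    # Empty the pattern's bucket so the stack (and bucket) can be deleted.
    import boto3

    s3 = boto3.resource("s3")

    # Find the bucket created by the stack by its name prefix.
    bucket = next(
        b for b in s3.buckets.all()
        if b.name.startswith("data-workflow-pattern-")
    )
    bucket.objects.all().delete()    # delete every object in the bucket
    print("Emptied", bucket.name)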


Copyright 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0