This sample application demonstrates iterating over Amazon S3 objects under a specified prefix using the S3:ListObjectsV2 item reader and processing them with AWS Step Functions Distributed Map.
The following diagram shows the Step Functions workflow.
- The Step Functions state machine reads all the log files from the given S3 prefix using a distributed map. For each log file entry, the state machine puts a metric into Amazon CloudWatch.
- The state machine then stores hourly metric counts in an Amazon DynamoDB table.
- The state machine then invokes an AWS Lambda function to perform metrics aggregation.
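The hourly bucketing the workflow performs can be sketched locally. This is an illustrative sketch only: the log line format and the `hourly_counts` helper are assumptions for demonstration, not taken from the repository.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(log_lines):
    """Bucket log entries into per-hour counts, as the state machine
    conceptually does before emitting metrics (assumed log format)."""
    counts = Counter()
    for line in log_lines:
        # Assumed format: "<ISO-8601 timestamp> <level> <message>"
        timestamp = line.split(" ", 1)[0]
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%dT%H:00")
        counts[hour] += 1
    return dict(counts)

sample = [
    "2024-05-01T10:15:00 INFO request served",
    "2024-05-01T10:45:12 ERROR timeout",
    "2024-05-01T11:02:03 INFO request served",
]
print(hourly_counts(sample))  # {'2024-05-01T10:00': 2, '2024-05-01T11:00': 1}
```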
- Create an AWS account if you do not already have one and log in.
- Have access to an AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI). The AWS Identity and Access Management (IAM) user that you use must have permissions to make the necessary AWS service calls and manage the AWS resources mentioned in this sample. When granting permissions to the IAM user, follow the principle of least privilege.
- AWS CLI installed and configured
- Git installed
- AWS Serverless Application Model (AWS SAM) installed
- Python 3.13+ installed
Clone the GitHub repository into a new folder and navigate to the project root folder.
```bash
git clone https://github.com/aws-samples/sample-stepfunctions-s3-prefix-processor.git
cd sample-stepfunctions-s3-prefix-processor
```
Run the following command to deploy the application.
```bash
sam deploy --guided
```
Enter the following details:
- Stack name: The CloudFormation stack name (for example, stepfunctions-s3-prefix-processor)
- AWS Region: A supported AWS Region (for example, us-east-1)
- Keep the rest of the options at their default values.
The outputs from `sam deploy` are used in the subsequent steps.
Run the following command to generate sample test data and upload it to the input S3 bucket.
```bash
python3 scripts/generate_logs.py
```
Run the following command to upload the log files to the S3 bucket under the logs/ prefix. Replace <LogAnalyticsBucketName> with the value from the sam deploy output.
```bash
aws s3 sync logs/ s3://<LogAnalyticsBucketName>/logs/ --exclude '*' --include '*.log'
```
Run the following command to start an execution of the Step Functions state machine. Replace <StateMachineArn> with the value from the sam deploy output.
```bash
aws stepfunctions start-execution \
  --state-machine-arn <StateMachineArn> \
  --input '{}'
```
The Step Functions state machine iterates over all the log files under the /logs/daily prefix and processes them in parallel. It publishes the metrics to CloudWatch, then stores the hourly metric counts in the DynamoDB table, and finally invokes an AWS Lambda function to perform metrics aggregation.
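Conceptually, the Distributed Map fan-out resembles the following local sketch using a thread pool. This is an illustrative analogue only: the real iteration is driven by Step Functions over S3:ListObjectsV2 results, and `process_log_file` stands in for the per-file work (emitting metrics and updating DynamoDB).

```python
from concurrent.futures import ThreadPoolExecutor

def process_log_file(key):
    # Placeholder for the per-file work done by the Distributed Map
    # iterations (emit CloudWatch metrics, update the DynamoDB table).
    return (key, "processed")

# Hypothetical object keys under the logs/daily prefix.
keys = [f"logs/daily/app-{i}.log" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_log_file, keys))
print(results)
```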
Run the following command to get the details of the execution. Replace <executionArn> with the execution ARN returned by the previous command.
```bash
aws stepfunctions describe-execution --execution-arn <executionArn>
```
Wait until the status shows SUCCEEDED.
Run the following command to check the processed output in the DynamoDB summary table. Replace <LogAnalyticsSummaryTableName> with the value from the sam deploy output.
```bash
aws dynamodb scan --table-name <LogAnalyticsSummaryTableName>
```
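The scanned items can be aggregated locally if needed. The item shape below (an hour key and a `count` attribute) is an assumption about the summary table, not taken from the repository; note that DynamoDB returns numbers as strings under the `N` type key.

```python
def total_from_scan(items):
    """Sum the hourly counts returned by a DynamoDB scan
    (assumed attribute name 'count')."""
    return sum(int(item["count"]["N"]) for item in items)

# Hypothetical scan result items in DynamoDB's typed JSON format.
scan_items = [
    {"hour": {"S": "2024-05-01T10:00"}, "count": {"N": "2"}},
    {"hour": {"S": "2024-05-01T11:00"}, "count": {"N": "1"}},
]
print(total_from_scan(scan_items))  # 3
```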
Run the following command to check the output of the Step Functions state machine execution.
```bash
aws stepfunctions describe-execution --execution-arn <executionArn> --query 'output' --output text
```
The output of the Step Functions state machine shows the daily summary insights of the log files created by the Lambda function.
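The execution output is a JSON string that can be parsed for further use. The field names below are a hypothetical shape for illustration; the actual summary fields produced by the Lambda function may differ.

```python
import json

# Hypothetical execution output returned by describe-execution.
raw_output = '{"totalFiles": 4, "totalEntries": 120, "errorCount": 7}'
summary = json.loads(raw_output)
print(f"{summary['totalFiles']} files, "
      f"{summary['errorCount']} errors out of {summary['totalEntries']} entries")
```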
Run the following commands to delete the resources deployed in this sample application.
```bash
# Empty the S3 bucket before deletion (replace with the actual bucket name)
aws s3 rm s3://<LogAnalyticsBucketName> --recursive

# Delete the SAM stack
sam delete

# Clean up local log files
rm -rf logs/
```
This library is licensed under the MIT-0 License. See the LICENSE file.
