Serverless Data Processing Pipeline on AWS with Terraform

This project demonstrates a serverless data processing pipeline on Amazon Web Services (AWS) using Terraform to manage the infrastructure as code (IaC).

The pipeline automatically processes incoming CSV files containing order data. It aggregates this data and stores the results in a DynamoDB table, showcasing a common and practical cloud architecture.

Architecture

The pipeline is event-driven: processing is triggered whenever a new .csv file whose key begins with a specific prefix is uploaded to an S3 bucket.

[S3 Bucket] --(CSV upload with "orders" prefix)--> [AWS Lambda] --(Process & Aggregate)--> [DynamoDB Table]
  1. Amazon S3 Bucket: Serves as the ingestion point for raw data. The bucket is configured to trigger a Lambda function upon object creation.
  2. AWS Lambda (Python): The core compute engine of the pipeline. This Python function reads the CSV file, processes the orders, aggregates revenue by product category, and writes the result to the DynamoDB table (a sketch of this logic follows the list).
  3. Amazon DynamoDB: A NoSQL database used to store the processed, aggregated data. The table's partition key is the product category.
  4. IAM Roles & Policies: Defines the security permissions that allow AWS services to interact with each other (e.g., granting the Lambda function access to read from S3 and write to DynamoDB).
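
To make the processing step concrete, below is a minimal sketch of the kind of handler the Lambda function runs. It is illustrative only: the CSV column names (category, price, quantity), the TABLE_NAME environment variable, and the total_revenue attribute are assumptions, not the exact identifiers used in this repository.

    # Illustrative sketch only: column names, the TABLE_NAME environment variable,
    # and the output attribute names are assumptions, not this repository's exact identifiers.
    import csv
    import io
    import os
    from collections import defaultdict
    from decimal import Decimal

    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])

    def handler(event, context):
        # The S3 event notification carries the bucket and object key that triggered the run.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            # Read the uploaded CSV and aggregate revenue per product category.
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            revenue = defaultdict(Decimal)
            for row in csv.DictReader(io.StringIO(body)):
                revenue[row["category"]] += Decimal(row["price"]) * int(row["quantity"])

            # Write one item per category; the category is the table's partition key.
            for category, total in revenue.items():
                table.put_item(Item={"category": category, "total_revenue": total})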

Key Features

  • Infrastructure as Code (IaC): All AWS resources are defined in Terraform, enabling fast, consistent, and repeatable deployments.
  • Event-Driven & Serverless: No idle servers to manage. Costs are based on actual usage, and the pipeline scales automatically with the volume of incoming data.

Prerequisites

To run this project, you will need the following tools installed:

  • Terraform (v1.0.0+)
  • AWS CLI
  • Configured AWS credentials (e.g., via the aws configure command).

Deployment and Usage

  1. Clone the repository:

    git clone https://github.com/TechWizard27/serverless-data-pipeline.git
    cd serverless-data-pipeline
  2. Initialize Terraform: This command downloads the necessary providers (AWS and Random).

    terraform init
  3. Deploy the infrastructure: This command creates a plan and prompts you to approve the creation of the AWS resources. Enter yes to proceed.

    terraform apply

    After a successful deployment, Terraform will print the names of the created S3 bucket and DynamoDB table.

How to Test the Pipeline

  1. Get the S3 bucket name from the output of the terraform apply command, or run:

    terraform output s3_bucket_name
  2. Upload a test file whose object key begins with the "orders" prefix. You can use the test_orders.csv file included in this project.

    aws s3 cp test_orders.csv s3://<your-bucket-name>/orders_test.csv

    (Replace <your-bucket-name> with the actual bucket name from the output.)

  3. Check the results in DynamoDB: After a few moments, the Lambda function will be triggered. You can verify the results in the DynamoDB table using the AWS Management Console or the AWS CLI. The table should contain items with aggregated revenue for each product category.
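
If you prefer to script this check, a short boto3 scan works too. Replace the table name with the one printed by terraform apply; the attribute names shown here (category, total_revenue) are assumptions about the table's schema.

    # Quick scan of the results table with boto3 (attribute names are assumptions).
    import boto3

    table = boto3.resource("dynamodb").Table("<your-table-name>")
    for item in table.scan()["Items"]:
        print(item.get("category"), item.get("total_revenue"))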

Detailed Dependency Graph

Cleanup

To tear down all the resources created by this project, run the following command:

terraform destroy

License

This project is licensed under the MIT License. See the LICENSE file for details.
