An automated, event-driven data processing pipeline built on AWS. This project uses a serverless architecture to automatically convert CSV files uploaded to an S3 bucket into JSON format.
The entire cloud infrastructure is defined using Infrastructure as Code (IaC) with the AWS Serverless Application Model (SAM) framework.
This application follows a simple, powerful, and common serverless pattern:
- A user uploads a `.csv` file (e.g., `data.csv`) to a designated "source" S3 bucket.
- The S3 `ObjectCreated` event automatically triggers an AWS Lambda function.
- The Python-based Lambda function reads the CSV file, parses its contents, and converts the data into a JSON array (a sketch of such a handler follows this list).
- The function then saves the new JSON file (e.g., `data.json`) to a separate "destination" S3 bucket.
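The function's source isn't reproduced here, but a minimal sketch of such a handler might look like the following. The `DEST_BUCKET` environment variable and the handler name are illustrative assumptions, not necessarily this project's actual code:

```python
import csv
import io
import json
import os
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

# Assumption: the SAM template passes the destination bucket name
# to the function as an environment variable.
DEST_BUCKET = os.environ["DEST_BUCKET"]


def lambda_handler(event, context):
    # Each S3 ObjectCreated notification can carry one or more records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event payloads.
        key = unquote_plus(record["s3"]["object"]["key"])

        # Read and parse the uploaded CSV into a list of row dicts.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = list(csv.DictReader(io.StringIO(body)))

        # Write the JSON array under the original name with the
        # extension swapped (e.g., data.csv -> data.json).
        dest_key = key.rsplit(".", 1)[0] + ".json"
        s3.put_object(
            Bucket=DEST_BUCKET,
            Key=dest_key,
            Body=json.dumps(rows, indent=2),
            ContentType="application/json",
        )
```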
- Fully Serverless: No servers to provision or manage. The application scales automatically with demand.
- Event-Driven: The processing pipeline is triggered by real-time events, making it efficient and responsive.
- Infrastructure as Code (IaC): The entire stack (S3 buckets, Lambda function, and IAM permissions) is defined in the `template.yaml` file (see the excerpt after this list), allowing for repeatable and automated deployments.
- Decoupled: The source and destination buckets are separate, following best practices for data processing pipelines.
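For illustration, the wiring between the source bucket and the function in a SAM template generally looks like the excerpt below. The resource and handler names here are assumptions; refer to the project's actual `template.yaml` for the real definitions:

```yaml
Resources:
  SourceBucket:
    Type: AWS::S3::Bucket

  CsvToJsonFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.11
      Handler: app.lambda_handler  # assumed module/function names
      Events:
        CsvUpload:
          Type: S3
          Properties:
            Bucket: !Ref SourceBucket  # must reference a bucket defined in this template
            Events: s3:ObjectCreated:*
```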
This project showcases proficiency in modern cloud-native development:
- Cloud: Amazon Web Services (AWS)
- Serverless: AWS Lambda, AWS SAM
- Storage: Amazon S3
- Programming: Python 3.11, Boto3 (AWS SDK)
- Infrastructure as Code (IaC): AWS CloudFormation, YAML
- Concepts: Event-Driven Architecture, Serverless Patterns, IAM Roles & Policies
Before you begin, ensure you have the following tools installed and configured (a quick way to verify them is shown after this list):
- AWS CLI: Configured with your AWS credentials (`aws configure`).
- AWS SAM CLI: The framework used to build and deploy.
- Docker: Required by SAM CLI to build the Lambda deployment package locally.
- Python 3.11
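To confirm each tool is available before deploying, every CLI can report its version:

```bash
aws --version
sam --version
docker --version
python3 --version
```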
You can deploy this entire application to your own AWS account in two commands:
1. Build the Application
This command packages the Lambda function and prepares it for deployment.
```bash
sam build
```

2. Deploy the Application
This command will guide you through the deployment process, prompting for a "Stack Name" (e.g., csv-processor) and other parameters. It will automatically create the S3 buckets and Lambda function for you.
```bash
sam deploy --guided
```

After the deployment succeeds, the SAM CLI will output the names of the two new S3 buckets.
To test the pipeline end to end:

1. Create a sample CSV file named `test.csv`:

   ```csv
   id,name,role
   1,Alice,Engineer
   2,Bob,Manager
   3,Charlie,Analyst
   ```

2. Find your source bucket name. You can find this in the output of the `sam deploy` command or in the AWS CloudFormation console's "Outputs" tab for your stack (or query it from the command line; see the example after this list).

3. Upload the file to the source S3 bucket:

   ```bash
   aws s3 cp test.csv s3://<your-source-bucket-name>/
   ```

4. Check the destination bucket. Within a few seconds, a new file named `test.json` should appear in your destination S3 bucket. You can check this in the AWS S3 console or by running:

   ```bash
   aws s3 ls s3://<your-destination-bucket-name>/
   ```

   The contents of `test.json` will be:

   ```json
   [
     {
       "id": "1",
       "name": "Alice",
       "role": "Engineer"
     },
     {
       "id": "2",
       "name": "Bob",
       "role": "Manager"
     },
     {
       "id": "3",
       "name": "Charlie",
       "role": "Analyst"
     }
   ]
   ```
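If you prefer the command line to the console, the stack outputs (including both bucket names) can be listed with the AWS CLI. The stack name below is whatever you chose during `sam deploy --guided`:

```bash
aws cloudformation describe-stacks \
  --stack-name csv-processor \
  --query "Stacks[0].Outputs" \
  --output table
```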
To completely remove all resources created by this project, run the `sam delete` command. This will delete the CloudFormation stack, both S3 buckets, the Lambda function, and all associated IAM roles. Note that CloudFormation cannot delete a bucket that still contains objects, so empty both buckets first.

```bash
sam delete
```