This project demonstrates the design and implementation of a serverless Data Lake Automation Solution using AWS cloud services. The solution enables automated data ingestion, transformation, storage, and analytics using a scalable event-driven architecture.
The project was developed as part of the Advanced Certification in Cloud Computing – Capstone Project II.
# Architecture
Client
↓
API Gateway
↓
Lambda (Ingestion)
↓
S3 Raw Bucket
↓
S3 Event Trigger
↓
Lambda (Transformation)
↓
S3 Processed Bucket
↓
Amazon Athena
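
The first hops of the flow above (API Gateway, ingestion Lambda, raw bucket) could be sketched roughly as follows. This is a minimal illustration and not the project's actual handler: the bucket name `example-raw-bucket`, the date-partitioned key layout, the helper names `build_raw_key` and `ingest`, and the response shape are all assumptions. It also assumes API Gateway uses the Lambda proxy integration, so the request payload arrives as a string in `event["body"]`.

```python
import json
import uuid
from datetime import datetime, timezone

RAW_BUCKET = "example-raw-bucket"  # hypothetical bucket name


def build_raw_key(now, record_id):
    """Partition raw objects by ingestion date (assumed layout)."""
    return f"raw/{now:%Y/%m/%d}/{record_id}.json"


def ingest(event, s3_client, bucket=RAW_BUCKET):
    """Validate an API Gateway proxy event and store its JSON body in S3."""
    try:
        payload = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        # Reject missing or malformed bodies before anything is written.
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "body must be valid JSON"}),
        }

    key = build_raw_key(datetime.now(timezone.utc), uuid.uuid4().hex)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )
    return {"statusCode": 201, "body": json.dumps({"key": key})}


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above can be tested without it;
    # it is provided by the AWS Lambda Python runtime.
    import boto3

    return ingest(event, boto3.client("s3"))
```

Passing the S3 client in as an argument keeps `ingest` testable with a stub object, while `lambda_handler` stays a thin wrapper for the Lambda runtime.
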
# AWS Services Used
- Amazon S3
- AWS Lambda
- Amazon API Gateway
- Amazon DynamoDB
- Amazon Athena
- Amazon CloudWatch
- AWS IAM
# Features
- Serverless architecture
- Automated data ingestion
- Event-driven data processing
- Metadata management using DynamoDB
- Data transformation pipeline
- SQL-based analytics using Athena
- Scalable and cost-efficient design
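
As an illustration of the metadata feature, the sketch below records one item per stored object in DynamoDB. The README does not specify the table schema or when metadata is written, so the table name `example-datalake-metadata`, every attribute name, and the helpers `build_metadata_item` and `record_metadata` are hypothetical. The `table` argument is assumed to be a boto3 DynamoDB `Table` resource.

```python
from datetime import datetime, timezone

METADATA_TABLE = "example-datalake-metadata"  # hypothetical table name


def build_metadata_item(bucket, key, size_bytes, status, now=None):
    """Describe one S3 object; all attribute names are assumptions."""
    now = now or datetime.now(timezone.utc)
    return {
        "object_key": key,  # assumed partition key
        "bucket": bucket,
        "size_bytes": size_bytes,
        "status": status,  # e.g. "RAW" or "PROCESSED"
        "updated_at": now.isoformat(),
    }


def record_metadata(table, item):
    """Write the item using the Table resource's put_item call."""
    table.put_item(Item=item)
    return item


def get_table(table_name=METADATA_TABLE):
    # boto3 is imported lazily so the helpers above can be tested without it.
    import boto3

    return boto3.resource("dynamodb").Table(table_name)
```

A Lambda function could call `record_metadata(get_table(), build_metadata_item(...))` after each successful write to S3.
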
# Project Workflow
1. Data Ingestion
Incoming data is sent through API Gateway and handled by the ingestion Lambda function.
2. Raw Data Storage
The ingestion Lambda function stores the raw JSON data in the raw S3 bucket.
3. Event Trigger
An S3 object-creation event on the raw bucket triggers the transformation Lambda function.
4. Data Transformation
The transformation Lambda function processes the raw data and stores the transformed output in the processed S3 bucket.
5. Analytics
Amazon Athena queries the processed data directly in S3 using SQL.
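
Steps 3 to 5 above could be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the bucket name `example-processed-bucket`, the `processed/` key prefix, and the cleanup rule in `transform_record` are hypothetical, since the README does not describe the real transformation. The output is written as newline-delimited JSON because Athena's JSON SerDes generally expect one record per line.

```python
import json
from urllib.parse import unquote_plus

PROCESSED_BUCKET = "example-processed-bucket"  # hypothetical bucket name


def transform_record(record):
    """Hypothetical cleanup: lower-case field names and drop null values."""
    return {k.strip().lower(): v for k, v in record.items() if v is not None}


def to_json_lines(payload):
    """Serialize one object or a list of objects as newline-delimited JSON."""
    records = payload if isinstance(payload, list) else [payload]
    return "".join(json.dumps(transform_record(r)) + "\n" for r in records)


def process_s3_event(event, s3_client, out_bucket=PROCESSED_BUCKET):
    """Transform every object named in an S3 notification event."""
    written = []
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(rec["s3"]["object"]["key"])
        raw = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        out_key = f"processed/{key}"
        s3_client.put_object(
            Bucket=out_bucket,
            Key=out_key,
            Body=to_json_lines(json.loads(raw)).encode("utf-8"),
        )
        written.append(out_key)
    return written


def start_athena_query(athena_client, sql, database, output_location):
    """Submit a SQL query to Athena and return its execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above can be tested without it;
    # it is provided by the AWS Lambda Python runtime.
    import boto3

    return {"written": process_s3_event(event, boto3.client("s3"))}
```

`start_athena_query` only submits the query. Athena runs it asynchronously and writes the results to the given S3 output location, so a caller would still need to poll for completion before reading them.
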
# Conclusion
This project demonstrates a scalable, AWS-based data lake that automates data ingestion, transformation, and analytics using serverless services.