SovendeSkov/Serverless-Data-Processor


Serverless CSV to JSON Processor

An automated, event-driven data processing pipeline built on AWS. This project uses a serverless architecture to automatically convert CSV files uploaded to an S3 bucket into JSON format.

The entire cloud infrastructure is defined using Infrastructure as Code (IaC) with the AWS Serverless Application Model (SAM) framework.

How It Works

This application follows a common serverless pattern, in which an S3 upload triggers a Lambda function:

  1. A user uploads a .csv file (e.g., data.csv) to a designated "source" S3 bucket.
  2. The S3 ObjectCreated event automatically triggers an AWS Lambda function.
  3. The Python-based Lambda function reads the CSV file, parses its contents, and converts the data into a JSON array.
  4. The function then saves the new JSON file (e.g., data.json) to a separate "destination" S3 bucket.
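The four steps above can be sketched as a single Lambda handler. This is a minimal illustration, not the repository's actual function: the names `csv_to_records`, `json_key_for`, and the `DESTINATION_BUCKET` environment variable are assumptions made for this example, and the optional `s3` argument exists only so the logic can be exercised without AWS access.

```python
import csv
import io
import json
import os
from urllib.parse import unquote_plus


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def json_key_for(csv_key):
    """Map an object key such as data.csv to data.json (hypothetical helper)."""
    base, _ext = os.path.splitext(csv_key)
    return base + ".json"


def lambda_handler(event, context, s3=None):
    if s3 is None:
        # Imported lazily so the helpers above can be tested without boto3.
        import boto3
        s3 = boto3.client("s3")

    # Assumed environment variable name; the real template may differ.
    dest_bucket = os.environ["DESTINATION_BUCKET"]

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])

        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # utf-8-sig also accepts plain UTF-8 and strips a leading BOM if present.
        rows = csv_to_records(raw.decode("utf-8-sig"))

        s3.put_object(
            Bucket=dest_bucket,
            Key=json_key_for(key),
            Body=json.dumps(rows, indent=4).encode("utf-8"),
            ContentType="application/json",
        )

    return {"processed": len(event["Records"])}
```

Keeping the parsing in a separate pure function makes it possible to unit test the conversion without touching S3.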

Key Features

  • Fully Serverless: No servers to provision or manage. The application scales automatically with demand.
  • Event-Driven: The processing pipeline is triggered by real-time events, making it efficient and responsive.
  • Infrastructure as Code (IaC): The entire stack (S3 buckets, Lambda function, and IAM permissions) is defined in the template.yaml file, allowing for repeatable and automated deployments.
  • Decoupled: The source and destination buckets are separate, so the JSON output written by the function cannot re-trigger the function on its own output.

Skills & Technologies Demonstrated

This project showcases proficiency in modern cloud-native development:

  • Cloud: Amazon Web Services (AWS)
  • Serverless: AWS Lambda, AWS SAM
  • Storage: Amazon S3
  • Programming: Python 3.11, Boto3 (AWS SDK)
  • Infrastructure as Code (IaC): AWS CloudFormation, YAML
  • Concepts: Event-Driven Architecture, Serverless Patterns, IAM Roles & Policies

Prerequisites

Before you begin, ensure you have the following tools installed and configured:

  1. AWS CLI: Configured with your AWS credentials (aws configure).
  2. AWS SAM CLI: The framework used to build and deploy.
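  3. Docker: Needed only if you build inside a container (sam build --use-container) or invoke the function locally with sam local. A plain sam build uses the Python installed on your machine.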
  4. Python 3.11

How to Deploy

You can deploy this entire application to your own AWS account in two commands:

1. Build the Application

This command packages the Lambda function and prepares it for deployment.

sam build

2. Deploy the Application

This command will guide you through the deployment process, prompting for a "Stack Name" (e.g., csv-processor) and other parameters. It will automatically create the S3 buckets and Lambda function for you.

sam deploy --guided

After the deployment succeeds, the SAM CLI will output the names of the two new S3 buckets.
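If you prefer to read the bucket names programmatically rather than from the terminal, the stack outputs can be fetched with Boto3. The sketch below assumes the stack was named csv-processor as in the example above; the output key shown in the usage comment is hypothetical, since the real key names depend on template.yaml.

```python
def outputs_from_response(response):
    """Turn a CloudFormation describe_stacks response into {OutputKey: OutputValue}."""
    stacks = response.get("Stacks", [])
    if not stacks:
        raise LookupError("no stack found in response")
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stacks[0].get("Outputs", [])
    }


def get_stack_outputs(stack_name, cloudformation=None):
    if cloudformation is None:
        # Imported lazily so outputs_from_response can be used without boto3.
        import boto3
        cloudformation = boto3.client("cloudformation")
    return outputs_from_response(cloudformation.describe_stacks(StackName=stack_name))


# Usage (requires AWS credentials; "SourceBucketName" is a hypothetical key):
# outputs = get_stack_outputs("csv-processor")
# print(outputs.get("SourceBucketName"))
```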

How to Test the Pipeline

  1. Create a sample CSV file named test.csv:

    id,name,role
    1,Alice,Engineer
    2,Bob,Manager
    3,Charlie,Analyst
    
  2. Find your source bucket name. You can find this in the output of the sam deploy command or in the AWS CloudFormation console's "Outputs" tab for your stack.

  3. Upload the file to the source S3 bucket.

    aws s3 cp test.csv s3://<your-source-bucket-name>/

  4. Check the destination bucket. Within a few seconds, a new file named test.json should appear in your destination S3 bucket. You can check this in the AWS S3 console or by running:

    aws s3 ls s3://<your-destination-bucket-name>/

    The contents of test.json will be:

    [
        {
            "id": "1",
            "name": "Alice",
            "role": "Engineer"
        },
        {
            "id": "2",
            "name": "Bob",
            "role": "Manager"
        },
        {
            "id": "3",
            "name": "Charlie",
            "role": "Analyst"
        }
    ]
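Note that every value in the output, including id, is a JSON string. That is what you get when the conversion relies on Python's csv module, which reads every field as text. The snippet below reproduces this locally with the sample file; it assumes the function uses csv.DictReader, which the output above is consistent with but does not prove. The `typed_rows` line shows one possible way to get numeric ids and is not something the project does.

```python
import csv
import io
import json

SAMPLE_CSV = "id,name,role\n1,Alice,Engineer\n2,Bob,Manager\n3,Charlie,Analyst\n"

# csv.DictReader yields one dict per data row; all values are strings.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Optional post-processing step (an illustration only): cast id to int.
typed_rows = [{**row, "id": int(row["id"])} for row in rows]

print(json.dumps(rows, indent=4))
```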

Cleaning Up

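To remove all resources created by this project, run the sam delete command. This deletes the CloudFormation stack, the Lambda function, and the associated IAM roles. CloudFormation can only delete S3 buckets that are empty, so empty both buckets first (for example, aws s3 rm s3://<your-source-bucket-name> --recursive), or the deletion is likely to fail on them.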

sam delete
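CloudFormation generally refuses to delete a bucket that still holds objects, so it can help to empty both buckets before running the command. A minimal Boto3 sketch follows; the helper names are made up for this example, and the bucket names in the usage comment are placeholders.

```python
def empty_bucket(bucket):
    """Delete all objects, and all object versions, from a boto3 Bucket resource."""
    bucket.objects.all().delete()
    # Covers buckets with versioning enabled; harmless on unversioned ones.
    bucket.object_versions.all().delete()


def empty_buckets_by_name(bucket_names):
    # Imported lazily so empty_bucket can be tested without boto3 installed.
    import boto3
    s3 = boto3.resource("s3")
    for name in bucket_names:
        empty_bucket(s3.Bucket(name))


# Usage (destructive; requires AWS credentials):
# empty_buckets_by_name(["<your-source-bucket-name>", "<your-destination-bucket-name>"])
```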
