Data Engineering Portfolio

This portfolio is a compilation of code I have created for data engineering work, projects, and examples.

Data Engineering 101:

  • Solid Waste Management Data Warehouse Project

    This project focuses on designing and implementing a data warehouse for a solid waste management company operating in major cities across Brazil. The goal is to enable comprehensive reporting on waste collection metrics, including total waste collected by dimensions such as year, month, truck type, and city.
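
    The snippet below is a minimal sketch of a star schema for this kind of reporting; the table and column names are illustrative assumptions, not the project's actual DDL.

    ```python
    # Hypothetical star schema DDL executed through psycopg2; adjust names,
    # types, and the connection string to match the real warehouse.
    import psycopg2

    DDL = """
    CREATE TABLE IF NOT EXISTS dim_date (
        date_id INT PRIMARY KEY,
        date    DATE NOT NULL,
        year    INT  NOT NULL,
        month   INT  NOT NULL
    );

    CREATE TABLE IF NOT EXISTS dim_truck (
        truck_id   INT PRIMARY KEY,
        truck_type VARCHAR(50) NOT NULL
    );

    CREATE TABLE IF NOT EXISTS dim_station (
        station_id INT PRIMARY KEY,
        city       VARCHAR(50) NOT NULL
    );

    CREATE TABLE IF NOT EXISTS fact_trips (
        trip_id         INT PRIMARY KEY,
        date_id         INT REFERENCES dim_date(date_id),
        station_id      INT REFERENCES dim_station(station_id),
        truck_id        INT REFERENCES dim_truck(truck_id),
        waste_collected DECIMAL(10, 2)  -- tons collected on the trip
    );
    """

    conn = psycopg2.connect("dbname=waste_dw user=postgres")  # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute(DDL)  # psycopg2 runs the semicolon-separated statements in one call
    conn.close()
    ```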

  • Denormalizing Tables in Data Warehouse: Creating Materialized Views

    This project demonstrates how to denormalize tables in a data warehouse by creating materialized views to improve query performance for customer analytics. We use a star schema model with fact and dimension tables and build materialized views for optimized data retrieval. This approach showcases core data engineering skills like schema design, denormalization, and the creation of materialized views for high-performance reporting.
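
    A minimal sketch of the materialized-view step, assuming a hypothetical fact_sales table joined to customer and date dimensions; the project's actual view definitions may differ.

    ```python
    # Create and refresh a materialized view through psycopg2 (illustrative names).
    import psycopg2

    SQL = """
    CREATE MATERIALIZED VIEW IF NOT EXISTS mv_customer_monthly_sales AS
    SELECT
        c.customer_name,
        d.year,
        d.month,
        SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_id = c.customer_id
    JOIN dim_date d     ON f.date_id = d.date_id
    GROUP BY c.customer_name, d.year, d.month;

    -- Refresh periodically (e.g. from a scheduled job) to pick up new facts.
    REFRESH MATERIALIZED VIEW mv_customer_monthly_sales;
    """

    conn = psycopg2.connect("dbname=sales_dw user=postgres")  # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute(SQL)
    conn.close()
    ```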

  • Data Quality Validation for Data Warehousing

    This project focuses on conducting comprehensive data quality checks within a data warehousing environment. It demonstrates the use of a Python-based framework integrated with PostgreSQL to validate data integrity, ensuring accuracy and consistency.
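
    A minimal sketch of the kind of checks such a framework runs against PostgreSQL; the table, column, and check names are assumptions for illustration.

    ```python
    # Run simple data quality checks; a check passes when it finds zero offending rows.
    import psycopg2

    CHECKS = {
        "no_null_customer_ids": "SELECT COUNT(*) FROM fact_sales WHERE customer_id IS NULL",
        "no_negative_amounts":  "SELECT COUNT(*) FROM fact_sales WHERE amount < 0",
        "no_duplicate_keys": """
            SELECT COUNT(*) FROM (
                SELECT sale_id FROM fact_sales GROUP BY sale_id HAVING COUNT(*) > 1
            ) dup
        """,
    }

    def run_checks(conn):
        with conn.cursor() as cur:
            for name, query in CHECKS.items():
                cur.execute(query)
                bad_rows = cur.fetchone()[0]
                status = "PASS" if bad_rows == 0 else f"FAIL ({bad_rows} rows)"
                print(f"{name}: {status}")

    if __name__ == "__main__":
        connection = psycopg2.connect("dbname=sales_dw user=postgres")  # placeholder DSN
        run_checks(connection)
        connection.close()
    ```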

  • Billing Data Warehouse

    This project focuses on designing a data warehouse for a cloud service provider using their historical billing data. The goal is to organize this data into a star schema that can efficiently support queries related to billing trends, customer insights, and more.
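
    A hedged sketch of the kind of roll-up query the star schema is meant to support; the fact and dimension names below are illustrative assumptions.

    ```python
    # Billing trend by year and country against an assumed billing star schema.
    import psycopg2

    QUERY = """
    SELECT d.year, c.country, SUM(f.billed_amount) AS total_billed
    FROM fact_billing f
    JOIN dim_date d     ON f.date_id = d.date_id
    JOIN dim_customer c ON f.customer_id = c.customer_id
    GROUP BY d.year, c.country
    ORDER BY d.year, total_billed DESC;
    """

    conn = psycopg2.connect("dbname=billing_dw user=postgres")  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for year, country, total in cur.fetchall():
            print(year, country, total)
    conn.close()
    ```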

  • Airflow Simple ETL pipeline

    This project demonstrates how to build an ETL pipeline using Apache Airflow. It processes road traffic data from various toll plazas, consolidating files in different formats into a single, transformed CSV file.
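
    A minimal Airflow DAG sketch in the spirit of that pipeline; the task bodies, file formats, and schedule are placeholders, and the project's actual DAG may use different operators.

    ```python
    # Toll-data ETL skeleton: extract -> transform -> load, run daily.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("read CSV / TSV / fixed-width toll data from the staging area")

    def transform():
        print("normalize column names, units, and vehicle types")

    def load():
        print("write the consolidated, transformed CSV")

    with DAG(
        dag_id="toll_data_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task
    ```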

  • Flask Continuous Deployment with GCP

    An example of deploying a Flask web application with continuous deployment on a PaaS, Google App Engine.
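
    A minimal Flask app of the kind such a deployment serves; the route and message are placeholders, and an app.yaml (not shown) would point App Engine at the app object.

    ```python
    # Smallest possible Flask app; App Engine serves `app` via a WSGI server in production.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "Hello from App Engine!"

    if __name__ == "__main__":
        # Local development server only.
        app.run(host="0.0.0.0", port=8080, debug=True)
    ```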

  • Demo Multi Cloud Continuous Integration Github Actions

    A simple demo of continuous integration using GitHub Actions to run tests against AWS, Azure, or GCP.

  • Testing Techniques

    A simple example of testing Python code.
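
    For example, a function and its pytest test of the sort this repo exercises (illustrative names, runnable with `pytest`):

    ```python
    # A trivial unit under test plus its test case.
    def add(a: float, b: float) -> float:
        """Return the sum of two numbers."""
        return a + b

    def test_add():
        assert add(2, 3) == 5
        assert add(-1, 1) == 0
    ```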

AWS:

  • AWS End-to-End YouTube Analysis Project

    AWS Services used: (Amazon S3, AWS IAM, QuickSight, AWS Glue, AWS Lambda, AWS Athena)
    Project Goals:

    • Data Ingestion — build a mechanism to ingest data from different sources (a minimal sketch follows this list)
    • ETL System — transform the incoming raw data into the proper format
    • Data Lake — centralize data arriving from multiple sources in a single repository
    • Scalability — ensure the system scales as the size of the data increases
    • Cloud — vast amounts of data cannot be processed on a local machine, so processing runs in the cloud, in this case AWS
    • Reporting — build a dashboard to answer the project's analytical questions
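
    As referenced above, a minimal sketch of the ingestion step: copying a raw file into the S3 bucket that backs the data lake. The bucket, key, and file names are placeholders.

    ```python
    # Upload a locally staged raw file into the data lake's landing zone on S3.
    import boto3

    s3 = boto3.client("s3")

    def ingest_raw_file(local_path: str, bucket: str, key: str) -> None:
        s3.upload_file(local_path, bucket, key)
        print(f"uploaded {local_path} to s3://{bucket}/{key}")

    if __name__ == "__main__":
        ingest_raw_file("region_stats.csv", "my-youtube-raw-data", "raw/region_stats.csv")
    ```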

  • AWS Serverless AI Data Engineering Pipeline

    This is an example of a Serverless AI Data Engineering Pipeline.
    AWS Services used: (DynamoDB, Lambda, SQS, CloudWatch, Comprehend, S3)
    A CloudWatch timer invokes an AWS Lambda function (the producer) that reads a DynamoDB table and sends each item as a message to an SQS queue. Each message arriving in the queue triggers another AWS Lambda function (the consumer), which queries the Wikipedia API for the first result and the first two lines of its content, runs sentiment analysis with AWS Comprehend, and stores the results in an AWS S3 bucket.
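
    A hedged sketch of the consumer Lambda: it reads the search term from each SQS message, fetches a short extract from the Wikipedia API, runs Comprehend sentiment analysis, and writes the result to S3. The bucket name and message fields are assumptions.

    ```python
    # Consumer Lambda sketch: SQS message -> Wikipedia extract -> Comprehend -> S3.
    import json
    from urllib.parse import quote
    from urllib.request import urlopen

    import boto3

    comprehend = boto3.client("comprehend")
    s3 = boto3.client("s3")
    RESULTS_BUCKET = "my-sentiment-results"  # placeholder bucket name

    def fetch_extract(term: str) -> str:
        """Return the opening plain-text extract of the Wikipedia page for `term`."""
        url = ("https://en.wikipedia.org/w/api.php?action=query&prop=extracts"
               "&exintro&explaintext&format=json&titles=" + quote(term))
        with urlopen(url) as resp:
            pages = json.load(resp)["query"]["pages"]
        return next(iter(pages.values())).get("extract", "")[:500]

    def handler(event, context):
        for record in event["Records"]:      # one record per SQS message
            term = record["body"]
            text = fetch_extract(term)
            if not text:
                continue
            sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
            s3.put_object(
                Bucket=RESULTS_BUCKET,
                Key=f"results/{term}.json",
                Body=json.dumps({"term": term, "sentiment": sentiment["Sentiment"]}),
            )
    ```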

DevOps:

  • CI Setting Up Workflow Jobs With Github Actions

    This is a Python for DevOps repo: an example of CI using good practices. I used an EC2 instance in an AWS Cloud9 environment, Git, and GitHub Actions to check the integrity of the code, with a push as the trigger.
    Libraries used: (click, pytest, pylint, black, pytest-cov)
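
    A small, illustrative click CLI of the kind such a workflow lints, formats, and tests; the command and its option are placeholders. On each push the workflow would typically run black, pylint, and pytest with coverage against code like this.

    ```python
    # Tiny click CLI kept trivial so the CI checks (pylint, black, pytest) stay fast.
    import click

    @click.command()
    @click.option("--name", default="world", help="Who to greet.")
    def greet(name: str) -> None:
        """Print a greeting."""
        click.echo(f"Hello, {name}!")

    if __name__ == "__main__":
        greet()
    ```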
