Data Engineering Project

Project Overview

This project documents my exploration of data engineering. The primary objective is to generate synthetic data and move it through a modern stack of tools such as Snowflake, dbt, Airflow, and AWS services (EC2, S3, Lambda, SNS, and more), demonstrating hands-on proficiency with each of these technologies.

Goals

  1. Demonstrate how a company could build a data strategy around an ELT approach.
  2. Develop an automated data pipeline on a modern data stack, incorporating basic DevOps practices.
  3. Showcase data engineering skills.

Project Structure

The repository is organized as follows:

  • airflow/: Airflow DAG and related files.
  • aws/: Code for the Lambda function (a hedged sketch follows this list).
  • dbt/: dbt models and configurations.
  • samples/: Airflow logs for each task, AWS Lambda output, and dbt lineage.
  • datagen/: Python scripts for generating the fake data, plus a CSV sample of the output.
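
For context on the aws/ folder, here is a minimal sketch of what an S3-triggered Lambda that publishes an SNS notification could look like. The event parsing, the SNS topic ARN environment variable, and the trigger itself are assumptions; the actual handler in aws/ may work differently.

```python
# Minimal sketch of an S3-triggered Lambda handler that notifies via SNS.
# The topic ARN env var and the "new file upload" trigger are assumptions,
# not the project's confirmed design.
import json
import os

import boto3

sns = boto3.client("sns")


def lambda_handler(event, context):
    # An S3 put event carries the bucket and object key of the uploaded file.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Publish a notification so downstream consumers know a new file has landed.
    sns.publish(
        TopicArn=os.environ["SNS_TOPIC_ARN"],  # hypothetical environment variable
        Subject="New file uploaded",
        Message=json.dumps({"bucket": bucket, "key": key}),
    )
    return {"statusCode": 200, "body": f"Notified for s3://{bucket}/{key}"}
```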

Pipeline Diagram

(Pipeline architecture diagram image)

Contents

The Generated Data

Data Components

  • Denormalized data: a single CSV file containing all the necessary fields, generated by a Python script.
  • Product data: the product dataset, created by a Python script and stored as a CSV file.
  • Vet data: the vet dataset, created by a Python script and stored as a CSV file (a hedged sketch of such a generation script follows this list).
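
To illustrate the approach, here is a minimal sketch of a generator script. It assumes the Faker library and uses hypothetical column names and row counts; the actual scripts in datagen/ define the real schema.

```python
# Minimal sketch of a synthetic-data generator (assumed approach: Faker + pandas).
# Column names and row counts are illustrative, not the project's actual schema.
import pandas as pd
from faker import Faker

fake = Faker()


def generate_denormalized(rows: int = 1_000) -> pd.DataFrame:
    """Build a denormalized table with one record per (hypothetical) order."""
    records = [
        {
            "order_id": i,
            "customer_name": fake.name(),
            "customer_email": fake.email(),
            "order_date": fake.date_between(start_date="-1y", end_date="today"),
            "product_name": fake.word(),
            "vet_name": fake.name(),
            "city": fake.city(),
        }
        for i in range(1, rows + 1)
    ]
    return pd.DataFrame(records)


if __name__ == "__main__":
    # Stored as CSV, mirroring the sample files kept alongside the scripts.
    generate_denormalized().to_csv("denormalized_data.csv", index=False)
```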

Objectives

  1. Implement multiple CTEs for the data mart.
  2. Utilize a snowflake schema (the dimensional-modeling pattern) for effective organization.
  3. Serve transformed data for insights.
  4. Automate the whole process with Airflow.
  5. Create a dashboard with Tableau.

Used Tools

Motivation

The motivation is to simulate a modern data stack and showcase familiarity with the technologies involved.

  • Stack

    • Docker and Docker Compose.
    • AWS services: EC2, S3, IAM, Lambda, SNS, Secrets Manager.
    • Snowflake as the data warehouse.
    • Python, pandas, and database fundamentals.
    • Airflow.
    • dbt Core.
    • Data modeling: snowflake schema design and normalization.
  • More specifically, the following technologies were utilized throughout the project:

    • Docker Compose: Used to spin up Apache Airflow for scheduling and orchestration.
    • Airflow: Orchestrates the pipeline, running Python, Snowflake, and Bash operators for the different tasks; a hedged DAG sketch follows this list.
    • dbt: Used for the intermediate and data mart layers, implementing CTEs, normalization, and a snowflake schema.
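
As a rough illustration of that orchestration, here is a minimal DAG sketch. It assumes Airflow 2.x, the Snowflake provider's SnowflakeOperator, and hypothetical task names, paths, bucket names, and connection IDs; the actual DAG in airflow/ may be structured differently.

```python
# Minimal sketch of the orchestration idea (assumed Airflow 2.x APIs).
# Task names, script paths, S3 bucket, and connection IDs are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator


def generate_data():
    """Placeholder for the data-generation step (e.g. calling the scripts in datagen/)."""
    ...


with DAG(
    dag_id="end2end_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    generate = PythonOperator(task_id="generate_data", python_callable=generate_data)

    upload_to_s3 = BashOperator(
        task_id="upload_to_s3",
        bash_command="aws s3 cp /tmp/denormalized_data.csv s3://my-bucket/raw/",  # hypothetical bucket
    )

    load_raw = SnowflakeOperator(
        task_id="load_raw_layer",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO raw.orders FROM @raw_stage;",  # hypothetical stage and table
    )

    run_dbt = BashOperator(task_id="run_dbt", bash_command="cd /opt/dbt && dbt run")

    generate >> upload_to_s3 >> load_raw >> run_dbt
```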

(Data model diagram image)

Conclusion

This project presented both learning opportunities and challenges, testing the fundamental technical skills expected of a data engineer.

Key Highlights:

  1. Integration of different technologies.
  2. Database building and modeling challenges.

Remaining Tasks

  1. Add more data sources, such as APIs.
  2. Dive deeper into additional AWS services such as API Gateway and CloudWatch.
  3. Implement a CI/CD pipeline.
  4. Adopt Infrastructure as Code with a tool like Terraform.
  5. Integrate Tableau for dashboard visualization.