List of projects completed to maintain knowledge and continuously learn end-to-end data engineering tools and processes
NYC Taxi:
ETL Pipeline - Mage-ai
Storage - Google Cloud Storage
Instance - Google Compute Engine
Data Warehouse - Google BigQuery
Data Visualization - Google Looker Studio
Crypto Batch Processing:
API - CoinMarketCap
Airflow:
ETL Pipeline - Airflow
Storage - AWS S3
Instance - AWS EC2
Kafka Processing:
Stream Processing - Apache Kafka
Instance - AWS EC2
Storage - AWS S3
Data Integration - AWS Glue Crawler, AWS Glue Data Catalog
Data Warehouse - AWS Athena
PGA TOUR Stats:
RDBMS - Postgres
Basic Framework:
- Choose a data set
- Build data model in fact and dimension format
- Write transformation code in Python
- Provision an instance on a cloud service (AWS, Azure, GCP)
- Install an open-source modern data pipeline tool on the instance
- Load data into Data Warehouse
- Create final dashboard
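
The data modeling and transformation steps of the framework above can be sketched in plain Python. This is a minimal illustration, not the code used in any of the listed projects: the sample records, the field names (`pickup_zone`, `payment_type`, `fare`), and the helper functions `build_dimension` and `build_fact` are all assumptions made for the example.

```python
# Hypothetical raw records; the field names are illustrative only.
RAW_TRIPS = [
    {"pickup_zone": "Midtown", "payment_type": "card", "fare": 12.5},
    {"pickup_zone": "Harlem", "payment_type": "cash", "fare": 8.0},
    {"pickup_zone": "Midtown", "payment_type": "cash", "fare": 15.25},
]


def build_dimension(rows, column):
    """Map each distinct value of `column` to a surrogate key (first-seen order)."""
    dim = {}
    for row in rows:
        # setdefault only inserts when the value has not been seen before.
        dim.setdefault(row[column], len(dim) + 1)
    return dim


def build_fact(rows, zone_dim, payment_dim):
    """Replace descriptive values with dimension keys and keep the measure (fare)."""
    return [
        {
            "trip_id": trip_id,
            "zone_id": zone_dim[row["pickup_zone"]],
            "payment_id": payment_dim[row["payment_type"]],
            "fare": row["fare"],
        }
        for trip_id, row in enumerate(rows, start=1)
    ]


zone_dim = build_dimension(RAW_TRIPS, "pickup_zone")
payment_dim = build_dimension(RAW_TRIPS, "payment_type")
fact_trips = build_fact(RAW_TRIPS, zone_dim, payment_dim)
```

In this sketch the fact table holds only keys and the numeric measure, while the descriptive text lives once in each dimension. That is the shape a warehouse load step would then write out as separate tables.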