cykim329/Data_Projects

A list of completed projects, kept to retain knowledge and to continuously learn end-to-end data engineering tools and processes.

NYC Taxi:

 ETL Pipeline - Mage-ai
 Storage - Google Cloud Storage
 Instance - Google Compute Engine
 Data Warehouse - Google BigQuery
 Data Visualization - Google Looker Studio

Crypto Batch Processing:

 API - CoinMarketCap

Airflow:

 ETL Pipeline - Airflow
 Storage - AWS S3
 Instance - AWS EC2

Kafka Processing:

 Stream Processing - Apache Kafka
 Instance - AWS EC2
 Storage - AWS S3
 Data Integration - AWS Glue Crawler, AWS Glue Data Catalog
 Data Warehouse - AWS Athena

PGA TOUR Stats:

 RDBMS - Postgres

Basic Framework:

  1. Pick a data set
  2. Build a data model in fact and dimension format
  3. Write the transformation code in Python
  4. Provision an instance on a cloud service (AWS, Azure, GCP)
  5. Install and deploy an open-source modern data pipeline tool on the instance
  6. Load the data into the data warehouse
  7. Create the final dashboard
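
Steps 1 to 3 of the framework can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from any of the projects above: the rows in `RAW_TRIPS`, the column names, and the helpers `build_dimension` and `build_star_schema` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical raw rows (step 1). The column names are illustrative only
# and are not taken from the real data sets used in these projects.
RAW_TRIPS: List[dict] = [
    {"trip_id": 1, "payment_type": "card", "pickup_zone": "Midtown", "fare": 12.5},
    {"trip_id": 2, "payment_type": "cash", "pickup_zone": "Harlem", "fare": 8.0},
    {"trip_id": 3, "payment_type": "card", "pickup_zone": "Midtown", "fare": 20.0},
]


def build_dimension(rows: List[dict], column: str) -> Dict[str, int]:
    """Map each distinct value of `column` to a surrogate key (first-seen order)."""
    dim: Dict[str, int] = {}
    for row in rows:
        value = row[column]
        if value not in dim:
            dim[value] = len(dim) + 1
    return dim


def build_star_schema(rows: List[dict]) -> Tuple[List[dict], Dict[str, Dict[str, int]]]:
    """Split flat rows into one fact table and two dimension tables (steps 2 and 3)."""
    payment_dim = build_dimension(rows, "payment_type")
    zone_dim = build_dimension(rows, "pickup_zone")
    # The fact table keeps the measures and replaces descriptive values with keys.
    fact = [
        {
            "trip_id": r["trip_id"],
            "payment_type_id": payment_dim[r["payment_type"]],
            "pickup_zone_id": zone_dim[r["pickup_zone"]],
            "fare": r["fare"],
        }
        for r in rows
    ]
    return fact, {"payment_type": payment_dim, "pickup_zone": zone_dim}


if __name__ == "__main__":
    fact_rows, dimensions = build_star_schema(RAW_TRIPS)
    print(dimensions)
    print(fact_rows)
```

In a real pipeline the same split would typically run inside the orchestration tool (for example as a transformation block or task) before the tables are loaded into the warehouse.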
