
NYC Transportation ETL Pipeline

Building an ETL (Extract, Transform, Load) pipeline in Python, orchestrated with Apache Airflow, tested locally with Docker, and finally deployed on the Google Cloud Platform (GCP).

Acknowledgements

Roadmap

  • Extract data from diverse sources: weather datasets, Citi Bike records, yellow taxi trip data, and geographic zone details.

  • Transform data in Python using various processing techniques, such as computing trip durations and improving data quality by mapping geographic coordinates to the corresponding zone IDs.

  • Load data into Snowflake by connecting to the Snowflake data warehouse from Python and transferring the processed data efficiently. The connection setup and data-loading procedures are central to this step.

  • Construct a data mart with a star schema within the Snowflake environment.
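The transform step above can be sketched in plain Python. This is a minimal illustration, not the repository's actual code: the record fields, zone IDs, and bounding boxes are hypothetical, and a simple bounding-box lookup stands in for whatever geographic mapping the real pipeline uses for NYC zones.

```python
from datetime import datetime
from typing import Optional

# Hypothetical zone lookup table: zone_id -> (min_lat, min_lon, max_lat, max_lon).
# The real pipeline maps coordinates to NYC zone geometries; bounding boxes
# are a stand-in here.
ZONES = {
    1: (40.70, -74.02, 40.75, -73.97),
    2: (40.75, -73.99, 40.80, -73.93),
}

def trip_duration_minutes(start_ts: str, end_ts: str) -> float:
    """Compute trip duration in minutes from two ISO-8601 timestamps."""
    start = datetime.fromisoformat(start_ts)
    end = datetime.fromisoformat(end_ts)
    return (end - start).total_seconds() / 60.0

def map_to_zone(lat: float, lon: float) -> Optional[int]:
    """Return the first zone whose bounding box contains the point, else None."""
    for zone_id, (min_lat, min_lon, max_lat, max_lon) in ZONES.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return zone_id
    return None

# Enrich a single (hypothetical) trip record.
record = {
    "started_at": "2023-06-01T08:00:00",
    "ended_at": "2023-06-01T08:23:30",
    "start_lat": 40.72,
    "start_lon": -74.00,
}
record["duration_min"] = trip_duration_minutes(record["started_at"], record["ended_at"])
record["start_zone_id"] = map_to_zone(record["start_lat"], record["start_lon"])
```

In the pipeline such logic would typically run over a whole DataFrame inside an Airflow task, with the enriched rows then written to Snowflake.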

Data Management

(Diagrams: data_warehouse, datamart)

Screenshots

(Screenshots: mapping, DAG_Graph, Orchestrating_with_Apache_Airflow, Deploy_on_GCP)
