Build an ETL (Extract, Transform, Load) pipeline in Python, orchestrated with Apache Airflow, tested locally with Docker, and finally deployed on the Google Cloud Platform (GCP).
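The pipeline's task ordering can be sketched in plain Python. In the real project each step is an Airflow task; here the flow is shown as a simple dependency map (the task names are illustrative, not taken from the actual DAG) so the ordering can be checked without an Airflow install:

```python
# Sketch of the pipeline's task dependencies. In the project these would be
# Airflow tasks (e.g. PythonOperator); the names below are assumptions.
DEPENDENCIES = {
    "extract": [],                # pull weather, Citi Bike, taxi, and zone data
    "transform": ["extract"],     # compute durations, map coordinates to zones
    "load": ["transform"],        # push the cleaned data into Snowflake
    "build_data_mart": ["load"],  # create the star-schema data mart
}

def execution_order(deps):
    """Return tasks in a valid run order (simple topological sort)."""
    order, done = [], set()
    while len(order) < len(deps):
        for task, upstream in deps.items():
            if task not in done and all(u in done for u in upstream):
                order.append(task)
                done.add(task)
    return order

print(execution_order(DEPENDENCIES))
# ['extract', 'transform', 'load', 'build_data_mart']
```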
- Code accompanying Chapter 14 of the book Data Pipelines with Apache Airflow
- Data Pipelines with Apache Airflow
- TLC Trip Record Data
- Citi Bike Trip Histories
- NOAA - NYC Weather History
- CityBikes API
- Extract data from diverse sources, such as weather datasets, Citi Bike records, yellow taxi trips, and geographic zone details.
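A minimal sketch of the extract step for one source: each dataset would be read from a file or API into a DataFrame. The inline CSV below stands in for a downloaded taxi-trip file, and its column names are assumptions, not the actual TLC schema:

```python
import io
import pandas as pd

# Placeholder for a downloaded trip file; columns are illustrative.
RAW_TRIPS = io.StringIO(
    "pickup_datetime,dropoff_datetime,pickup_lat,pickup_lon\n"
    "2023-01-01 08:00:00,2023-01-01 08:25:00,40.75,-73.99\n"
    "2023-01-01 09:10:00,2023-01-01 09:30:00,40.71,-74.01\n"
)

def extract_trips(source) -> pd.DataFrame:
    """Read raw trip records, parsing timestamp columns up front."""
    return pd.read_csv(source, parse_dates=["pickup_datetime", "dropoff_datetime"])

trips = extract_trips(RAW_TRIPS)
print(len(trips))  # 2
```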
- Transform data in Python, computing trip durations and improving data quality by mapping geographic coordinates to their zone IDs.
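Two of those transformations can be sketched as follows. The zone centroids are made-up placeholders, not real NYC taxi-zone geometry, and the nearest-centroid match is a simplification of a true point-in-polygon lookup:

```python
import pandas as pd

# Placeholder zone centroids (zone_id -> (lat, lon)); illustrative only.
ZONE_CENTROIDS = {1: (40.75, -73.99), 2: (40.70, -74.01)}

def nearest_zone(lat: float, lon: float) -> int:
    """Return the zone whose centroid is closest (squared-distance match)."""
    return min(
        ZONE_CENTROIDS,
        key=lambda z: (ZONE_CENTROIDS[z][0] - lat) ** 2
                      + (ZONE_CENTROIDS[z][1] - lon) ** 2,
    )

def transform_trips(trips: pd.DataFrame) -> pd.DataFrame:
    """Add trip duration in minutes and a pickup zone ID."""
    out = trips.copy()
    out["duration_min"] = (
        out["dropoff_datetime"] - out["pickup_datetime"]
    ).dt.total_seconds() / 60
    out["pickup_zone_id"] = [
        nearest_zone(lat, lon)
        for lat, lon in zip(out["pickup_lat"], out["pickup_lon"])
    ]
    return out

trips = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(["2023-01-01 08:00:00"]),
    "dropoff_datetime": pd.to_datetime(["2023-01-01 08:25:00"]),
    "pickup_lat": [40.76],
    "pickup_lon": [-73.98],
})
result = transform_trips(trips)
print(result["duration_min"].iloc[0], result["pickup_zone_id"].iloc[0])  # 25.0 1
```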
- Load data into Snowflake by connecting to the data warehouse from Python and transferring the processed data efficiently; the connection setup and loading procedure are the core of this step.
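A sketch of that load step, assuming credentials come from environment variables and using the `snowflake-connector-python` package's `write_pandas` helper. The environment-variable, warehouse, database, schema, and table names are all assumptions for illustration:

```python
import os

def connection_params() -> dict:
    """Collect Snowflake connection settings (names are assumptions)."""
    return {
        "user": os.environ.get("SNOWFLAKE_USER"),
        "password": os.environ.get("SNOWFLAKE_PASSWORD"),
        "account": os.environ.get("SNOWFLAKE_ACCOUNT"),
        "warehouse": "ETL_WH",
        "database": "NYC_TRANSPORT",
        "schema": "STAGING",
    }

def qualified_name(params: dict, table: str) -> str:
    """Fully qualified target table, e.g. NYC_TRANSPORT.STAGING.TRIPS."""
    return f"{params['database']}.{params['schema']}.{table}"

def load_trips(df) -> int:
    """Write a DataFrame to a Snowflake table; returns rows loaded."""
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    conn = snowflake.connector.connect(**connection_params())
    try:
        _success, _chunks, nrows, _ = write_pandas(conn, df, "TRIPS")
        return nrows
    finally:
        conn.close()

print(qualified_name(connection_params(), "TRIPS"))  # NYC_TRANSPORT.STAGING.TRIPS
```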
- Construct a data mart with a star schema within the Snowflake environment.
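The star schema's shape can be sketched as DDL strings to be run through a Snowflake cursor (`cursor.execute(ddl)`). The table and column names below are assumptions; the point is the structure: one fact table keyed to dimension tables:

```python
# Illustrative star schema; table/column names are assumptions.
STAR_SCHEMA_DDL = {
    "dim_zone": """
        CREATE TABLE IF NOT EXISTS DIM_ZONE (
            zone_id INTEGER PRIMARY KEY,
            zone_name VARCHAR,
            borough VARCHAR
        )""",
    "dim_date": """
        CREATE TABLE IF NOT EXISTS DIM_DATE (
            date_id INTEGER PRIMARY KEY,
            full_date DATE,
            day_of_week VARCHAR
        )""",
    "fact_trip": """
        CREATE TABLE IF NOT EXISTS FACT_TRIP (
            trip_id INTEGER PRIMARY KEY,
            pickup_zone_id INTEGER REFERENCES DIM_ZONE(zone_id),
            date_id INTEGER REFERENCES DIM_DATE(date_id),
            duration_min FLOAT,
            trip_distance FLOAT
        )""",
}

# Dimensions are created first so the fact table's foreign keys resolve.
EXECUTION_ORDER = ["dim_zone", "dim_date", "fact_trip"]
print(EXECUTION_ORDER)
```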






