
ELT pipeline built with Snowflake, Airflow, dbt, Astronomer Cosmos, and Docker


KhaledCodes/Snowflow


❄️ Snowflow

Snowflow is a modern ELT pipeline project demonstrating how to use Snowflake, dbt, and Apache Airflow together in a scalable and modular way.


🚀 Project Overview

Snowflow ingests TPCH sample data from Snowflake, applies modular SQL transformations using dbt, validates data with tests, and schedules daily runs using Airflow and Docker.


🎯 Objective

Build a production-grade ELT pipeline showcasing:

  • Cloud-based warehousing with Snowflake
  • Modular transformation with dbt
  • Workflow orchestration with Airflow
  • Data quality assurance with dbt tests

📌 Context

This project was developed to learn and showcase best practices in modern data engineering and analytics engineering workflows.


⏱️ Duration

~1 week (design → implementation → orchestration)


👤 Role

End-to-end owner
🛠 Data Engineering, Analytics Engineering, DevOps

  • Snowflake setup
  • dbt modeling and testing
  • Airflow orchestration and deployment

🧰 Tools & Technologies

  • Warehouse: Snowflake
  • Transformations: dbt
  • Orchestration: Apache Airflow (via Astronomer + Cosmos)
  • Environment: Docker
  • Version Control: Git

Methodologies

  • Staging → Intermediate → Mart layer modeling
  • DRY with dbt macros
  • Generic & custom tests for data quality
  • Scheduled DAGs for automation

🧪 The Pipeline

1. Snowflake Setup

  • Created isolated role, schema, and warehouse.
  • Granted scoped access to the dbt role.
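The setup above can be sketched in Snowflake SQL. The role, warehouse, and database names here are illustrative assumptions, not the project's actual identifiers:

```sql
-- Illustrative Snowflake setup (object names are assumptions)
use role accountadmin;

create warehouse if not exists dbt_wh with warehouse_size = 'x-small';
create database if not exists dbt_db;
create role if not exists dbt_role;

-- Scope access so the dbt role can only touch its own objects
grant usage on warehouse dbt_wh to role dbt_role;
grant all on database dbt_db to role dbt_role;
-- grant role dbt_role to user <your_user>;

use role dbt_role;
create schema if not exists dbt_db.dbt_schema;
```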

2. dbt Models

  • Sources: TPCH sample data (orders, lineitem)
  • Staging models with renamed, cleaned columns
  • Intermediate models to aggregate and enrich
  • Mart models (fact tables) for final output
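A staging model in this layered approach might look like the following. The source columns are standard TPCH `orders` columns, but the file name and source definition are assumptions:

```sql
-- models/staging/stg_tpch_orders.sql (illustrative)
-- Rename and clean raw TPCH columns into consistent, readable names
select
    o_orderkey    as order_key,
    o_custkey     as customer_key,
    o_orderstatus as status_code,
    o_totalprice  as total_price,
    o_orderdate   as order_date
from {{ source('tpch', 'orders') }}
```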

3. Macros

Reusable SQL logic to avoid duplication, such as:

```sql
{% macro discounted_amount(extended_price, discount_percentage, scale=2) %}
    (-1 * {{ extended_price }} * {{ discount_percentage }})::decimal(16, {{ scale }})
{% endmacro %}
```
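Once defined, the macro can be called from any model. This usage against TPCH `lineitem` columns is a sketch; the `ref` target name is an assumption:

```sql
-- e.g. in an intermediate model (illustrative)
select
    l_orderkey      as order_key,
    l_extendedprice as extended_price,
    {{ discounted_amount('l_extendedprice', 'l_discount') }} as discounted_amount
from {{ ref('stg_tpch_line_items') }}
```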

4. Data Testing

  • Generic Tests: unique, not_null, relationships, accepted_values
  • Singular Tests: Business logic checks (e.g. future dates, discount amounts)
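A singular test is a standalone SQL file that fails if it returns any rows. The model and column names below are assumptions for illustration:

```sql
-- tests/fct_orders_date_valid.sql (illustrative singular test)
-- dbt treats any returned row as a failure, so select the "bad" rows
select *
from {{ ref('fct_orders') }}
where order_date > current_date
```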

5. Airflow Deployment

  • Dockerized Airflow setup using Astronomer CLI
  • dbt DAG created with Cosmos:

    ```python
    dbt_snowflake_dag = DbtDag(
        project_config=ProjectConfig("/usr/local/airflow/dags/dbt/data_pipeline"),
        ...
    )
    ```
  • Daily schedule using @daily
  • Snowflake connection configured via Airflow UI
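The Cosmos DAG can be fleshed out along these lines. This is a sketch, not the project's exact file: the connection ID, dbt executable path, and profile names are assumptions, and the Snowflake connection itself is created in the Airflow UI as described above.

```python
# dags/dbt_dag.py -- illustrative sketch (names and paths are assumptions)
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ProfileConfig, ProjectConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

# Map an Airflow connection (configured via the UI) to a dbt profile
profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id="snowflake_conn",  # assumed Airflow connection ID
        profile_args={"database": "dbt_db", "schema": "dbt_schema"},
    ),
)

# Cosmos renders the dbt project into an Airflow task per model/test
dbt_snowflake_dag = DbtDag(
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/data_pipeline"),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt",  # assumed path
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    dag_id="dbt_snowflake_dag",
)
```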

⚠️ Challenges & Solutions

| Challenge | Solution |
| --- | --- |
| Avoiding repetitive SQL logic | Created reusable dbt macros |
| Integrating dbt with Airflow | Used Cosmos to auto-convert dbt projects into DAGs |
| Testing data quality at scale | Implemented a combination of generic and custom tests |

✅ Outcome

  • Modular ELT pipeline across staging, intermediate, and marts
  • dbt models with lineage and test coverage
  • Fully scheduled and repeatable pipeline with Airflow
  • Easy-to-read SQL transformations and test logic

📸 Preview

Project Screenshot
