Snowflow is a modern ELT pipeline project demonstrating how to use Snowflake, dbt, and Apache Airflow together in a scalable and modular way.
Snowflow ingests TPCH sample data from Snowflake's shared sample database, applies modular SQL transformations with dbt, validates the results with dbt tests, and schedules daily runs using Airflow in Docker.
Build a production-grade ELT pipeline showcasing:
- Cloud-based warehousing with Snowflake
- Modular transformation with dbt
- Workflow orchestration with Airflow
- Data quality assurance with dbt tests
This project was developed to learn and showcase best practices in modern data engineering and analytics engineering workflows.
- Timeline: ~1 week (design → implementation → orchestration)
- Role: End-to-end owner
- 🛠 Skills: Data Engineering, Analytics Engineering, DevOps
- Snowflake setup
- dbt modeling and testing
- Airflow orchestration and deployment
- Warehouse: Snowflake
- Transformations: dbt
- Orchestration: Apache Airflow (via Astronomer + Cosmos)
- Environment: Docker
- Version Control: Git
- Staging → Intermediate → Mart layer modeling
- DRY with dbt macros
- Generic & custom tests for data quality
- Scheduled DAGs for automation
- Created an isolated role, schema, and warehouse
- Granted scoped access to the dbt role
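A minimal sketch of this setup in Snowflake SQL (the warehouse, database, role, and schema names here are illustrative assumptions, not the project's actual names):

```sql
-- Illustrative Snowflake setup; all object names are assumptions
use role accountadmin;

create warehouse if not exists dbt_wh with warehouse_size = 'x-small';
create database if not exists dbt_db;
create role if not exists dbt_role;

-- Scope the dbt role to just this warehouse and database
grant usage on warehouse dbt_wh to role dbt_role;
grant all on database dbt_db to role dbt_role;
grant role dbt_role to user my_dbt_user;  -- hypothetical user

use role dbt_role;
create schema if not exists dbt_db.dbt_schema;
```

Keeping dbt on its own role and warehouse isolates its compute cost and limits the blast radius of its grants.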
- Sources: TPCH sample data (`orders`, `lineitem`)
- Staging models with renamed, cleaned columns
- Intermediate models to aggregate and enrich
- Mart models (fact tables) for final output
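A staging model in this pattern might look like the following sketch (the model name and column selection are illustrative, based on the standard TPCH `orders` schema):

```sql
-- models/staging/stg_tpch_orders.sql (hypothetical file name)
-- Rename TPCH's terse column names into readable ones
select
    o_orderkey as order_key,
    o_custkey as customer_key,
    o_orderstatus as status_code,
    o_totalprice as total_price,
    o_orderdate as order_date
from {{ source('tpch', 'orders') }}
```

Downstream intermediate and mart models then reference this via `{{ ref('stg_tpch_orders') }}`, which is what gives dbt its lineage graph.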
Reusable SQL logic to avoid duplication, such as:

```sql
{% macro discounted_amount(extended_price, discount_percentage, scale=2) %}
    (-1 * {{ extended_price }} * {{ discount_percentage }})::decimal(16, {{ scale }})
{% endmacro %}
```
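The macro is invoked like any other Jinja macro inside a model; a sketch of one possible call site (the surrounding model is illustrative, using standard TPCH `lineitem` columns):

```sql
select
    l_orderkey as order_key,
    -- Expands to: (-1 * l_extendedprice * l_discount)::decimal(16, 2)
    {{ discounted_amount('l_extendedprice', 'l_discount') }} as discounted_amount
from {{ source('tpch', 'lineitem') }}
```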
- Generic tests: `unique`, `not_null`, `relationships`, `accepted_values`
- Singular tests: business logic checks (e.g. future dates, discount amounts)
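A singular test in dbt is just a SQL file under `tests/` that must return zero rows to pass. A future-date check in that style might look like this (the file name, model reference, and cutoff are assumptions):

```sql
-- tests/fct_orders_date_valid.sql (hypothetical name)
-- Fails if any order date is in the future; each returned row is a violation
select *
from {{ ref('fct_orders') }}
where order_date > current_date
```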
- Dockerized Airflow setup using Astronomer CLI
- dbt DAG created with Cosmos:

```python
dbt_snowflake_dag = DbtDag(
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/data_pipeline"),
    ...
)
```
- Daily schedule using `@daily`
- Snowflake connection configured via the Airflow UI
| Challenge | Solution |
|---|---|
| Avoiding repetitive SQL logic | Created reusable dbt macros |
| Integrating dbt with Airflow | Used Cosmos to auto-convert dbt projects into DAGs |
| Testing data quality at scale | Implemented a combination of generic and custom tests |
- Modular ELT pipeline across staging, intermediate, and marts
- dbt models with lineage and test coverage
- Fully scheduled and repeatable pipeline with Airflow
- Easy-to-read SQL transformations and test logic
