This project demonstrates the implementation of an end-to-end cloud-based Analytics Engineering platform using Snowflake, dbt, and AWS.
The solution follows the Medallion Architecture pattern (Bronze β Silver β Gold) to transform raw Airbnb operational data into trusted, analytics-ready datasets. The project incorporates modern data engineering best practices including incremental data loading, Slowly Changing Dimensions (SCD Type 2), data quality testing, reusable macros, and dimensional modeling.
The primary goal is to simulate a real-world analytics environment where raw business data is transformed into reliable datasets for reporting, dashboarding, and decision-making.
Airbnb generates large volumes of operational data related to:
- Property Listings
- Hosts
- Bookings
Raw source data is not immediately suitable for analytical consumption because:
- Data quality issues may exist
- Historical changes are not tracked
- Business metrics are not readily available
- Data resides across multiple datasets
This project addresses these challenges by building a scalable ELT pipeline that standardizes, validates, enriches, and models Airbnb data for downstream analytics.
βββββββββββββββββββ
β Airbnb CSV Data β
ββββββββββ¬βββββββββ
β
βΌ
βββββββββββββββββββ
β AWS S3 β
β Landing Zone β
ββββββββββ¬βββββββββ
β
βΌ
βββββββββββββββββββ
β Snowflake β
β Staging Layer β
ββββββββββ¬βββββββββ
β
βββββββββββββββββββββΌββββββββββββββββββββ
βΌ βΌ βΌ
Bronze Layer Silver Layer Gold Layer
Raw Data Cleaned Data Analytics Data
Incremental Standardized Fact Tables
Processing Enriched OBT
Reporting
β
βΌ
BI / Reporting / Analytics
| Category | Technology |
|---|---|
| Cloud Data Warehouse | Snowflake |
| Transformation Layer | dbt |
| Cloud Storage | AWS S3 |
| Programming Language | Python |
| Version Control | Git |
| SQL Templating | Jinja |
| Data Modeling | Star Schema |
| Historical Tracking | SCD Type 2 |
| Architecture | Medallion Architecture |
Purpose:
Store raw source data with minimal transformations.
Models:
- bronze_bookings
- bronze_hosts
- bronze_listings
Responsibilities:
- Preserve source data
- Enable incremental ingestion
- Maintain auditability
- Serve as source of truth
Purpose:
Clean, validate, and standardize raw datasets.
Transformations:
- Data type standardization
- Null handling
- Data quality validation
- Attribute enrichment
- Price categorization
Models:
- silver_bookings
- silver_hosts
- silver_listings
Purpose:
Provide analytics-ready datasets for business users.
Models:
- fact
- obt (One Big Table)
- ephemeral models
Business Use Cases:
- Revenue Analysis
- Booking Trend Analysis
- Host Performance Tracking
- Listing Performance Analysis
- Occupancy Reporting
The project uses dbt Snapshots to maintain historical changes.
Snapshot Models:
- dim_bookings
- dim_hosts
- dim_listings
Benefits:
- Historical reporting
- Point-in-time analysis
- Change tracking
- Auditability
AWS_DBT_Snowflake/
β
βββ README.md
βββ pyproject.toml
βββ main.py
β
βββ SourceData/
β βββ bookings.csv
β βββ hosts.csv
β βββ listings.csv
β
βββ DDL/
β βββ ddl.sql
β βββ resources.sql
β
βββ aws_dbt_snowflake_project/
β
βββ dbt_project.yml
βββ ExampleProfiles.yml
β
βββ models/
β βββ sources/
β βββ bronze/
β βββ silver/
β βββ gold/
β
βββ macros/
βββ snapshots/
βββ analyses/
βββ tests/
βββ seeds/
Before running this project, ensure the following are available:
- Snowflake Account
- Database Access
- Warehouse Access
- Python 3.12+
- pip
- AWS Account
- S3 Bucket (optional)
git clone <repository-url>
cd AWS_DBT_Snowflakepython -m venv .venv
.venv\Scripts\Activate.ps1python -m venv .venv
source .venv/bin/activatepip install -r requirements.txtor
pip install -e .dbt-core>=1.11
dbt-snowflake>=1.11
sqlfmt
Create:
~/.dbt/profiles.yml
aws_dbt_snowflake_project:
outputs:
dev:
account: <account_identifier>
database: AIRBNB
password: <password>
role: ACCOUNTADMIN
schema: dbt_schema
threads: 4
type: snowflake
user: <username>
warehouse: COMPUTE_WH
target: devExecute DDL scripts located inside:
DDL/
This creates:
- Staging Tables
- Required Database Objects
Load source CSV files into Snowflake staging schema.
| File | Target Table |
|---|---|
| bookings.csv | AIRBNB.STAGING.BOOKINGS |
| hosts.csv | AIRBNB.STAGING.HOSTS |
| listings.csv | AIRBNB.STAGING.LISTINGS |
dbt debugdbt depsdbt rundbt run --select bronze.*dbt run --select silver.*dbt run --select gold.*dbt testdbt snapshotdbt builddbt docs generate
dbt docs serveThe Bronze and Silver layers use incremental materialization to process only newly arrived records.
Example:
{{ config(materialized='incremental') }}
{% if is_incremental() %}
WHERE CREATED_AT >
(
SELECT COALESCE(MAX(CREATED_AT),'1900-01-01')
FROM {{ this }}
)
{% endif %}Benefits:
- Reduced warehouse costs
- Faster execution
- Scalable architecture
Reusable business logic is implemented using dbt macros.
Example:
{{ tag('CAST(PRICE_PER_NIGHT AS INT)') }}Output:
LOW
MEDIUM
HIGH
The OBT model leverages Jinja loops to dynamically generate SQL.
Example:
{% set configs = [...] %}
SELECT
{% for config in configs %}
...
{% endfor %}Benefits:
- Less repetitive code
- Improved maintainability
- Easier model expansion
Historical records are tracked using dbt snapshots.
Features:
- Valid From Date
- Valid To Date
- Current Record Indicator
- Historical State Tracking
Custom schema generation automatically routes models into dedicated schemas.
Example:
AIRBNB.BRONZE
AIRBNB.SILVER
AIRBNB.GOLD
Implemented Data Quality Checks:
- Unique Key Validation
- Not Null Validation
- Source Integrity Checks
- Business Rule Validation
- Referential Integrity Testing
Example:
dbt testdbt automatically generates lineage graphs showing:
- Source Dependencies
- Upstream Relationships
- Downstream Impacts
- Model Dependencies
Generate lineage using:
dbt docs generate
dbt docs serveThe Gold Layer supports the following analytical use cases:
Identify top-performing properties and revenue trends.
Evaluate host effectiveness and booking performance.
Analyze booking behavior over time.
Compare listing performance across categories.
Leverage SCD Type 2 snapshots for point-in-time analysis.
This project demonstrates hands-on experience in:
- Snowflake Data Warehousing
- Analytics Engineering
- dbt Development
- ELT Pipeline Design
- Medallion Architecture
- Incremental Loading
- SCD Type 2 Snapshots
- Data Modeling
- Fact & Dimension Design
- Data Quality Engineering
- SQL Development
- Jinja Templating
- Git Version Control
- AWS S3 Integration
- Credentials excluded from version control
- Role-Based Access Control (RBAC)
- Environment-specific configurations
- Schema-level separation
- Principle of least privilege
dbt debugVerify:
- Username
- Password
- Account Identifier
- Warehouse Name
Verify:
- dbt_project.yml
- Jinja Syntax
- Source Definitions
- Model Dependencies
Run full refresh:
dbt run --full-refresh- Apache Airflow Orchestration
- CI/CD using GitHub Actions
- Data Observability
- Automated Monitoring
- Power BI Integration
- Tableau Integration
- Data Masking for PII
- Real-Time Data Ingestion
- Cost Optimization Dashboards
Project: Airbnb Analytics Engineering Platform
Tech Stack: Snowflake | dbt | AWS | SQL | Python | Git
This project was developed as a hands-on implementation of modern Data Engineering and Analytics Engineering practices using the Modern Data Stack.
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the chat on Slack for live discussions and support
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices