
SOUJU07/AIRBNB


🏠 Airbnb Analytics Engineering Platform using Snowflake, dbt & AWS

📌 Project Overview

This project implements an end-to-end, cloud-based analytics engineering platform using Snowflake, dbt, and AWS.

The solution follows the Medallion Architecture pattern (Bronze → Silver → Gold) to transform raw Airbnb operational data into trusted, analytics-ready datasets. It applies modern data engineering practices, including incremental data loading, Slowly Changing Dimensions (SCD Type 2), data quality testing, reusable macros, and dimensional modeling.

The primary goal is to simulate a real-world analytics environment where raw business data is transformed into reliable datasets for reporting, dashboarding, and decision-making.


🎯 Business Problem

Airbnb generates large volumes of operational data related to:

  • Property Listings
  • Hosts
  • Bookings

Raw source data is not immediately suitable for analytical consumption because:

  • Data quality issues may exist
  • Historical changes are not tracked
  • Business metrics are not readily available
  • Data resides across multiple datasets

This project addresses these challenges by building a scalable ELT pipeline that standardizes, validates, enriches, and models Airbnb data for downstream analytics.


🏗️ Solution Architecture

High-Level Architecture

                 ┌─────────────────┐
                 │ Airbnb CSV Data │
                 └────────┬────────┘
                          │
                          ▼
                 ┌─────────────────┐
                 │     AWS S3      │
                 │  Landing Zone   │
                 └────────┬────────┘
                          │
                          ▼
                 ┌─────────────────┐
                 │    Snowflake    │
                 │  Staging Layer  │
                 └────────┬────────┘
                          │
                          ▼
                 ┌─────────────────┐
                 │  Bronze Layer   │  Raw data, incremental processing
                 └────────┬────────┘
                          │
                          ▼
                 ┌─────────────────┐
                 │  Silver Layer   │  Cleaned, standardized, enriched data
                 └────────┬────────┘
                          │
                          ▼
                 ┌─────────────────┐
                 │   Gold Layer    │  Analytics data: fact tables, OBT, reporting
                 └────────┬────────┘
                          │
                          ▼
             BI / Reporting / Analytics

🛠️ Technology Stack

Category               Technology
Cloud Data Warehouse   Snowflake
Transformation Layer   dbt
Cloud Storage          AWS S3
Programming Language   Python
Version Control        Git
SQL Templating         Jinja
Data Modeling          Star Schema
Historical Tracking    SCD Type 2
Architecture           Medallion Architecture

📊 Data Architecture

🥉 Bronze Layer

Purpose:

Store raw source data with minimal transformations.

Models:

  • bronze_bookings
  • bronze_hosts
  • bronze_listings

Responsibilities:

  • Preserve source data
  • Enable incremental ingestion
  • Maintain auditability
  • Serve as the source of truth

🥈 Silver Layer

Purpose:

Clean, validate, and standardize raw datasets.

Transformations:

  • Data type standardization
  • Null handling
  • Data quality validation
  • Attribute enrichment
  • Price categorization

Models:

  • silver_bookings
  • silver_hosts
  • silver_listings
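The Silver models are written as dbt SQL. The Python sketch below only mirrors the listed ideas (type standardization and null handling) for a single booking row. It assumes the column names BOOKING_ID, BOOKING_AMOUNT, BOOKING_DATE, and BOOKING_STATUS, and an "UNKNOWN" default for a missing status. All of these names and the default are hypothetical.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Any, Optional


def to_decimal(value: Any) -> Optional[Decimal]:
    """Standardize a raw value to Decimal; blank or unparseable values become None."""
    if value is None or str(value).strip() == "":
        return None
    try:
        return Decimal(str(value).strip())
    except InvalidOperation:
        return None


def to_date(value: Any) -> Optional[date]:
    """Parse the YYYY-MM-DD prefix of a value; anything else becomes None."""
    if value is None or str(value).strip() == "":
        return None
    try:
        return datetime.strptime(str(value).strip()[:10], "%Y-%m-%d").date()
    except ValueError:
        return None


def clean_booking(raw: dict) -> dict:
    """Hypothetical Silver-style cleaning of one booking row (column names are assumptions)."""
    status = (raw.get("BOOKING_STATUS") or "").strip().upper()
    return {
        "BOOKING_ID": str(raw["BOOKING_ID"]).strip(),
        "BOOKING_AMOUNT": to_decimal(raw.get("BOOKING_AMOUNT")),
        "BOOKING_DATE": to_date(raw.get("BOOKING_DATE")),
        # Null handling: a missing status gets a hypothetical default value.
        "BOOKING_STATUS": status or "UNKNOWN",
    }
```

In SQL the same steps would typically be TRY_CAST/TRY_TO_DATE and COALESCE expressions inside the Silver model.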

🥇 Gold Layer

Purpose:

Provide analytics-ready datasets for business users.

Models:

  • fact
  • obt (One Big Table)
  • ephemeral models

Business Use Cases:

  • Revenue Analysis
  • Booking Trend Analysis
  • Host Performance Tracking
  • Listing Performance Analysis
  • Occupancy Reporting

📚 Slowly Changing Dimensions (SCD Type 2)

The project uses dbt snapshots to track the history of changes to source records.

Snapshot Models:

  • dim_bookings
  • dim_hosts
  • dim_listings

Benefits:

  • Historical reporting
  • Point-in-time analysis
  • Change tracking
  • Auditability
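Each version captured by a dbt snapshot carries dbt_valid_from and dbt_valid_to columns. By default dbt_valid_to is NULL for the current version. A point-in-time query keeps the version whose validity interval contains the requested timestamp. The Python sketch below applies that rule to in-memory rows. The HOST_ID and RESPONSE_RATE columns and the sample history are hypothetical.

```python
from datetime import datetime
from typing import Optional


def version_as_of(history: list[dict], key: str, key_value, as_of: datetime) -> Optional[dict]:
    """Return the snapshot row for key_value that was valid at `as_of`.

    A version is valid when dbt_valid_from <= as_of < dbt_valid_to,
    where a NULL (None) dbt_valid_to means the version is still current.
    """
    for row in history:
        if row[key] != key_value:
            continue
        valid_to = row["dbt_valid_to"]
        if row["dbt_valid_from"] <= as_of and (valid_to is None or as_of < valid_to):
            return row
    return None


# Hypothetical history for one host whose response rate changed on 2024-06-01.
history = [
    {"HOST_ID": 7, "RESPONSE_RATE": 80,
     "dbt_valid_from": datetime(2024, 1, 1), "dbt_valid_to": datetime(2024, 6, 1)},
    {"HOST_ID": 7, "RESPONSE_RATE": 95,
     "dbt_valid_from": datetime(2024, 6, 1), "dbt_valid_to": None},
]
```

Against a snapshot table in Snowflake, the equivalent filter is dbt_valid_from <= <as_of> AND (dbt_valid_to IS NULL OR dbt_valid_to > <as_of>).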

📁 Project Structure

AWS_DBT_Snowflake/
│
├── README.md
├── pyproject.toml
├── main.py
│
├── SourceData/
│   ├── bookings.csv
│   ├── hosts.csv
│   └── listings.csv
│
├── DDL/
│   ├── ddl.sql
│   └── resources.sql
│
└── aws_dbt_snowflake_project/
    │
    ├── dbt_project.yml
    ├── ExampleProfiles.yml
    │
    ├── models/
    │   ├── sources/
    │   ├── bronze/
    │   ├── silver/
    │   └── gold/
    │
    ├── macros/
    ├── snapshots/
    ├── analyses/
    ├── tests/
    └── seeds/

🚀 Getting Started

Prerequisites

Before running this project, ensure the following are available:

Snowflake

  • Snowflake Account
  • Database Access
  • Warehouse Access

Python

  • Python 3.12+
  • pip

AWS

  • AWS Account
  • S3 Bucket (optional)

⚙️ Installation

Clone Repository

git clone <repository-url>
cd AWS_DBT_Snowflake

Create Virtual Environment

Windows (PowerShell)

python -m venv .venv
.venv\Scripts\Activate.ps1

Linux / Mac

python -m venv .venv
source .venv/bin/activate

Install Dependencies

pip install -r requirements.txt

or, to install from pyproject.toml in editable mode:

pip install -e .

Core Dependencies

dbt-core>=1.11
dbt-snowflake>=1.11
sqlfmt

🔑 Snowflake Configuration

Create ~/.dbt/profiles.yml with the following content, replacing the <placeholder> values with your own:

aws_dbt_snowflake_project:
  outputs:
    dev:
      account: <account_identifier>
      database: AIRBNB
      password: <password>
      role: ACCOUNTADMIN
      schema: dbt_schema
      threads: 4
      type: snowflake
      user: <username>
      warehouse: COMPUTE_WH
  target: dev

🏗️ Database Setup

Run the DDL scripts in the DDL/ directory (ddl.sql and resources.sql) in Snowflake.

These create:

  • Staging Tables
  • Required Database Objects

📥 Source Data Loading

Load the CSV files from SourceData/ into the Snowflake STAGING schema:

File           Target Table
bookings.csv   AIRBNB.STAGING.BOOKINGS
hosts.csv      AIRBNB.STAGING.HOSTS
listings.csv   AIRBNB.STAGING.LISTINGS
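One way to perform this load is to upload each file to a stage with Snowflake's PUT command and then load it with COPY INTO. The Python sketch below builds those statements. It makes several assumptions:

  • each target table's own table stage (@<namespace>.%<table>) is used;
  • each CSV has a single header row;
  • an open connection from the snowflake-connector-python package is passed to run_load.

The repository may load data differently, for example through an S3 stage defined in DDL/.

```python
from pathlib import Path

# File-to-table mapping from the loading table above.
TARGETS = {
    "bookings.csv": "AIRBNB.STAGING.BOOKINGS",
    "hosts.csv": "AIRBNB.STAGING.HOSTS",
    "listings.csv": "AIRBNB.STAGING.LISTINGS",
}


def table_stage(table: str) -> str:
    """Table stage reference for a (possibly qualified) table, e.g. @AIRBNB.STAGING.%BOOKINGS."""
    namespace, _, name = table.rpartition(".")
    return f"@{namespace}.%{name}" if namespace else f"@%{name}"


def load_statements(csv_path: Path, table: str) -> list[str]:
    """PUT the local CSV into the table stage, then COPY it into the table."""
    stage = table_stage(table)
    return [
        # PUT uploads the local file (gzip-compressed by default) into the stage.
        f"PUT 'file://{csv_path.resolve().as_posix()}' {stage} OVERWRITE = TRUE",
        # COPY INTO parses the staged CSV; one header row is assumed.
        f"COPY INTO {table} FROM {stage} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '\"')",
    ]


def run_load(conn, source_dir: Path) -> None:
    """Execute the statements for every file using an open Snowflake connection (assumed)."""
    cur = conn.cursor()
    try:
        for file_name, table in TARGETS.items():
            for sql in load_statements(source_dir / file_name, table):
                cur.execute(sql)
    finally:
        cur.close()
```

With a connection from snowflake.connector.connect(account=..., user=..., password=..., warehouse="COMPUTE_WH"), calling run_load(conn, Path("SourceData")) loads all three files.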

▢️ Running dbt

Verify Connection

dbt debug

Install Packages

dbt deps

Run All Models

dbt run

Run Bronze Layer

dbt run --select "bronze.*"

Run Silver Layer

dbt run --select "silver.*"

Run Gold Layer

dbt run --select "gold.*"

Execute Tests

dbt test

Run Snapshots

dbt snapshot

Build Entire Project (seeds, models, snapshots, and tests in dependency order)

dbt build

Generate Documentation

dbt docs generate
dbt docs serve
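The same commands can also be run from Python through dbtRunner, dbt's programmatic entry point (dbt-core 1.5 and later). The sketch below runs dbt deps and then dbt build, stopping at the first failure. It is an illustration only, not necessarily what main.py does. It assumes it is started from the aws_dbt_snowflake_project/ directory so that dbt finds dbt_project.yml. The runner parameter is a hypothetical hook that lets the function be exercised with a stand-in object.

```python
from typing import Any, Optional

# dbt commands to run in order; `dbt build` itself orders seeds, models,
# snapshots and tests according to the DAG.
PIPELINE = [["deps"], ["build"]]


def run_pipeline(commands: list[list[str]] = PIPELINE, runner: Optional[Any] = None) -> list[str]:
    """Invoke each dbt command in order and stop at the first failure.

    Returns the commands that succeeded. When no runner is given,
    dbt's dbtRunner (dbt-core >= 1.5) is created.
    """
    if runner is None:
        from dbt.cli.main import dbtRunner  # lazy import: only needed for real runs

        runner = dbtRunner()
    completed = []
    for args in commands:
        result = runner.invoke(args)  # dbtRunnerResult with .success and .exception
        if not result.success:
            detail = f": {result.exception}" if result.exception else ""
            raise RuntimeError(f"dbt {' '.join(args)} failed{detail}")
        completed.append(" ".join(args))
    return completed
```

Calling run_pipeline() with no arguments performs a real run. Test failures make dbt build report success=False without an exception, so the error message then names only the failing command.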

⚑ Key Features

Incremental Processing

The Bronze and Silver layers use incremental materialization to process only newly arrived records.

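Example (the source() names are illustrative):

{{ config(materialized='incremental') }}

{# 'staging' / 'bookings' are illustrative source names #}
SELECT *
FROM {{ source('staging', 'bookings') }}

{% if is_incremental() %}

-- On incremental runs, only select rows newer than the latest CREATED_AT already loaded
WHERE CREATED_AT > (
    SELECT COALESCE(MAX(CREATED_AT), '1900-01-01')
    FROM {{ this }}
)

{% endif %}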

Benefits:

  • Reduced warehouse costs
  • Faster execution
  • Scalable architecture
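The is_incremental() filter implements a high-watermark pattern. On each run, only source rows newer than the latest CREATED_AT already in the target are selected, and 1900-01-01 is the fallback for an empty target. The Python sketch below mirrors that logic on in-memory rows for illustration; in the project, dbt executes the filter inside Snowflake.

```python
from datetime import datetime

# Mirrors COALESCE(MAX(CREATED_AT), '1900-01-01') when the target is empty.
FALLBACK = datetime(1900, 1, 1)


def incremental_batch(source: list[dict], target: list[dict]) -> list[dict]:
    """Return source rows whose CREATED_AT is strictly greater than the target's maximum."""
    watermark = max((row["CREATED_AT"] for row in target), default=FALLBACK)
    return [row for row in source if row["CREATED_AT"] > watermark]


def run_incremental(source: list[dict], target: list[dict]) -> int:
    """Append the new batch to the target, as an incremental model without a unique_key does."""
    batch = incremental_batch(source, target)
    target.extend(batch)
    return len(batch)
```

Because the comparison is strict, a late-arriving row whose CREATED_AT equals the current maximum is skipped. A unique_key combined with >= is a common way to avoid that at the cost of re-merging boundary rows.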

Custom Macros

Reusable business logic is implemented using dbt macros.

Example:

{{ tag('CAST(PRICE_PER_NIGHT AS INT)') }}

The macro returns one of the following price categories:

LOW
MEDIUM
HIGH
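The thresholds used inside the tag macro are not shown in this README. The Python sketch below illustrates the kind of bucketing such a macro typically renders as a SQL CASE expression. It uses hypothetical cut-offs: LOW below 100, MEDIUM below 200, HIGH otherwise. The real values are in the macro definition.

```python
# Hypothetical thresholds; the project's tag macro defines the real cut-offs.
LOW_MAX = 100
MEDIUM_MAX = 200


def price_tag(price_per_night: int) -> str:
    """Bucket a nightly price the same way a CASE WHEN expression would."""
    if price_per_night < LOW_MAX:
        return "LOW"
    if price_per_night < MEDIUM_MAX:
        return "MEDIUM"
    return "HIGH"


def tag_sql(column_expr: str) -> str:
    """Render an equivalent SQL CASE expression for a column expression (illustrative)."""
    return (
        f"CASE WHEN {column_expr} < {LOW_MAX} THEN 'LOW' "
        f"WHEN {column_expr} < {MEDIUM_MAX} THEN 'MEDIUM' "
        "ELSE 'HIGH' END"
    )
```

Passing the expression as a string, as in tag('CAST(PRICE_PER_NIGHT AS INT)'), lets the same macro wrap any column or cast.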

Dynamic SQL using Jinja

The OBT model leverages Jinja loops to dynamically generate SQL.

Example:

{% set configs = [...] %}

SELECT

{% for config in configs %}

...

{% endfor %}

Benefits:

  • Less repetitive code
  • Improved maintainability
  • Easier model expansion
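The contents of configs are elided above. As an illustration only, the Python sketch below assumes that each entry names a table, an alias, a column list, and a join condition. It builds the SELECT ... LEFT JOIN statement that such a loop would render. The table names, columns, and join keys are hypothetical; in dbt the tables would be referenced through ref().

```python
# Hypothetical config entries; the project's actual configs list is not shown.
CONFIGS = [
    {"table": "silver_bookings", "alias": "b",
     "columns": ["BOOKING_ID", "BOOKING_AMOUNT"], "join_on": None},
    {"table": "silver_listings", "alias": "l",
     "columns": ["LISTING_ID", "PRICE_PER_NIGHT"], "join_on": "b.LISTING_ID = l.LISTING_ID"},
    {"table": "silver_hosts", "alias": "h",
     "columns": ["HOST_ID", "HOST_NAME"], "join_on": "l.HOST_ID = h.HOST_ID"},
]


def render_obt(configs: list[dict]) -> str:
    """Generate the OBT SELECT: columns from every entry, one LEFT JOIN per extra entry."""
    select_cols = ",\n    ".join(
        f"{cfg['alias']}.{col}" for cfg in configs for col in cfg["columns"]
    )
    base = configs[0]
    sql = f"SELECT\n    {select_cols}\nFROM {base['table']} AS {base['alias']}"
    for cfg in configs[1:]:
        sql += f"\nLEFT JOIN {cfg['table']} AS {cfg['alias']} ON {cfg['join_on']}"
    return sql
```

Adding a table to the OBT then means appending one entry to the list rather than editing the SELECT and JOIN clauses by hand.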

SCD Type 2 Snapshots

Historical records are tracked using dbt snapshots.

Features:

  • Valid From Date
  • Valid To Date
  • Current Record Indicator
  • Historical State Tracking
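Under dbt's timestamp snapshot strategy, a change is detected when a source row's updated_at is newer than the one recorded for its current version. dbt then closes that version by setting dbt_valid_to and inserts a new version whose dbt_valid_from is the new updated_at. The Python sketch below reproduces that rule on in-memory rows. The README does not state whether this project uses the timestamp or the check strategy. The HOST_ID and HOST_NAME columns are hypothetical, and hard deletes are not handled.

```python
from datetime import datetime


def apply_snapshot(history: list[dict], source: list[dict], key: str) -> None:
    """Apply one snapshot run (timestamp strategy) to `history` in place."""
    current = {row[key]: row for row in history if row["dbt_valid_to"] is None}
    for src in source:
        existing = current.get(src[key])
        if existing is not None and src["updated_at"] <= existing["dbt_updated_at"]:
            continue  # unchanged since the last snapshot
        if existing is not None:
            existing["dbt_valid_to"] = src["updated_at"]  # close the old version
        new_row = {
            **src,
            "dbt_updated_at": src["updated_at"],
            "dbt_valid_from": src["updated_at"],
            "dbt_valid_to": None,  # NULL marks the current version
        }
        history.append(new_row)
        current[src[key]] = new_row


def is_current(row: dict) -> bool:
    """Current-record indicator derived from an open-ended validity interval."""
    return row["dbt_valid_to"] is None
```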

Schema Management

A custom generate_schema_name macro routes each layer's models into a dedicated schema, instead of dbt's default <target_schema>_<custom_schema> naming.

Example:

AIRBNB.BRONZE
AIRBNB.SILVER
AIRBNB.GOLD
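dbt's built-in generate_schema_name macro appends a model's custom schema to the target schema. With the dev profile above, a model configured with schema='bronze' would therefore be built in DBT_SCHEMA_BRONZE, since Snowflake upper-cases unquoted identifiers. Landing in AIRBNB.BRONZE requires an override that uses the custom schema as-is. The Python sketch below contrasts the two rules. The schema='bronze' setting is an assumption, and the override shown is a common pattern rather than necessarily this project's exact macro.

```python
from typing import Optional


def default_schema_name(custom_schema: Optional[str], target_schema: str) -> str:
    """dbt's default rule: <target_schema>_<custom_schema>, or the target schema if none is set."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


def override_schema_name(custom_schema: Optional[str], target_schema: str) -> str:
    """Common override: use the configured custom schema verbatim."""
    if custom_schema is None:
        return target_schema
    return custom_schema.strip()
```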

✅ Data Quality Framework

Implemented Data Quality Checks:

  • Unique Key Validation
  • Not Null Validation
  • Source Integrity Checks
  • Business Rule Validation
  • Referential Integrity Testing

Example:

dbt test
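Each of dbt's built-in generic tests (not_null, unique, relationships) compiles to a query that returns the failing rows. A test passes when that query returns nothing. The Python sketch below expresses the same failure conditions over in-memory rows; the column names are hypothetical.

```python
from collections import Counter


def not_null_failures(rows: list[dict], column: str) -> list[dict]:
    """Rows where the column is NULL; dbt's not_null test fails if any exist."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows: list[dict], column: str) -> list:
    """Non-null values that appear more than once; dbt's unique test fails if any exist."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def relationship_failures(child: list[dict], fk: str, parent: list[dict], pk: str) -> list[dict]:
    """Child rows whose non-null foreign key has no matching parent key (relationships test)."""
    parent_keys = {row[pk] for row in parent}
    return [row for row in child if row.get(fk) is not None and row[fk] not in parent_keys]
```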

🔍 Data Lineage

dbt automatically generates lineage graphs showing:

  • Source Dependencies
  • Upstream Relationships
  • Downstream Impacts
  • Model Dependencies

Generate lineage using:

dbt docs generate
dbt docs serve
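dbt derives its lineage graph from the ref() and source() calls in each model and runs models in dependency order. The Python sketch below shows the idea with the standard-library graphlib module. The edges are hypothetical, inferred from the layer and model names in this README rather than taken from the project's manifest.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each model maps to the models/sources it reads from.
DEPENDENCIES = {
    "bronze_bookings": {"staging.bookings"},
    "bronze_hosts": {"staging.hosts"},
    "bronze_listings": {"staging.listings"},
    "silver_bookings": {"bronze_bookings"},
    "silver_hosts": {"bronze_hosts"},
    "silver_listings": {"bronze_listings"},
    "obt": {"silver_bookings", "silver_hosts", "silver_listings"},
}


def upstream(model: str, graph: dict[str, set[str]]) -> set[str]:
    """All transitive parents of a model (its upstream lineage)."""
    seen: set[str] = set()
    stack = list(graph.get(model, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def downstream(model: str, graph: dict[str, set[str]]) -> set[str]:
    """All models that transitively depend on `model` (its downstream impact)."""
    return {node for node in graph if model in upstream(node, graph)}


# A valid execution order: every node appears after all of its parents.
build_order = list(TopologicalSorter(DEPENDENCIES).static_order())
```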

📈 Analytics Use Cases

The Gold Layer supports the following analytical use cases:

Revenue Analysis

Identify top-performing properties and revenue trends.

Host Performance

Evaluate host effectiveness and booking performance.

Booking Trends

Analyze booking behavior over time.

Property Analysis

Compare listing performance across categories.

Historical Reporting

Leverage SCD Type 2 snapshots for point-in-time analysis.


💡 Skills Demonstrated

This project demonstrates hands-on experience in:

  • Snowflake Data Warehousing
  • Analytics Engineering
  • dbt Development
  • ELT Pipeline Design
  • Medallion Architecture
  • Incremental Loading
  • SCD Type 2 Snapshots
  • Data Modeling
  • Fact & Dimension Design
  • Data Quality Engineering
  • SQL Development
  • Jinja Templating
  • Git Version Control
  • AWS S3 Integration

🔒 Security Best Practices

  • Credentials excluded from version control
  • Role-Based Access Control (RBAC)
  • Environment-specific configurations
  • Schema-level separation
  • Principle of least privilege

🐛 Troubleshooting

Snowflake Connection Errors

dbt debug

Verify:

  • Username
  • Password
  • Account Identifier
  • Warehouse Name

Compilation Errors

Verify:

  • dbt_project.yml
  • Jinja Syntax
  • Source Definitions
  • Model Dependencies

Incremental Model Issues

Rebuild incremental models from scratch with a full refresh:

dbt run --full-refresh

📊 Future Enhancements

  • Apache Airflow Orchestration
  • CI/CD using GitHub Actions
  • Data Observability
  • Automated Monitoring
  • Power BI Integration
  • Tableau Integration
  • Data Masking for PII
  • Real-Time Data Ingestion
  • Cost Optimization Dashboards

👨‍💻 Author

Project: Airbnb Analytics Engineering Platform

Tech Stack: Snowflake | dbt | AWS | SQL | Python | Git

This project was developed as a hands-on implementation of modern Data Engineering and Analytics Engineering practices using the Modern Data Stack.

