Skip to content

JanZednicek/dbt-snowflake-public

Repository files navigation

Sample Data warehouse | Datamart (Data Engineering project)

  • This repository contains dbt project only
  • Author: Jan Zednicek - If you find any inconsistencies, please let me know - jsem@janzednicek.cz
  • Feel free to use this code without any limitations for learning (non-commercial) use.

1 Project description

The goal of this project is to create the structure of a data warehouse or a data mart, incorporating staging and increment data layers. We will utilize sample sales data from the AdventureWorks database on SQL Server. Our database structures will be implemented in Snowflake. Extraction from the source system will be conducted using Fivetran, and transformations will be carried out using dbt.

We aim to achieve:

  • Extract data using fivetran, snapshoting/transformations using dbt
  • Implementation of Slowly Changing Dimension Type 2.
  • Create data warehouse model based on star schema
  • Create Orchestration engine using dbt references
  • Using clear and consistent naming conventions.
  • Development (dev) and (prod) profiles in dbt for testing purposes.

Recap:

2 Source data (Fictional CRM - AdventureWorks)

Below you can find Source data ER diagram. Our data consist of 7 tables. These tables are not heavy (couple of thousands of rows)

  • Address – list of addresses (shipment/billing)
  • Customer – list of customers
  • Product – list of products together with prices and standard costs
  • Productmodel – a list of product models (each product is associated with a specific model).
  • Productcategory – list of product categories (self referrenced parent-child structure). Each product is associated with a specific category tree.
  • Salesorderheader – sales orders ordered by our customers
  • Salesorderdetail – sales order line items (each order contains items -some products in it)
Source Data ER Diagram

3 Destination data (Snowflake) – Data warehouse structure

We will transform our data using dbt to the star schema below. Source Data ER Diagram

4 Steplist

  1. Create Fivetran account (trial and 500 000 rows for free)
  2. Create Snowflake account (30 days trial)
    • Copy AdventureWorks data using Fivetran OR manually to Snowflake database (give it a name “ADVENTUREWORKS”, schema “SQL_SERVER_SALESLT”)
    • Create database “DBT_DEV” db environment
      • create schema “STAGING”
      • create schema “SALES_MART”,
      • create user DBT_DEV_USER and give him a ACCOUNTADMIN role
    • Create database for “DBT_PROD” db environment
      • create schema “STAGING”
      • create schema “SALES_MART”
      • create user DBT_PROD_USER and give him a ACCOUNTADMIN role
    • Create warehouse for query processing
  3. Clone this repository to you local folder
  4. Install python
  5. Open command line/terminal and create new environment
  6. Activate new environment and install dbt with snowflake adapter using “pip install dbt-snowflake
  7. Open your dbt location (step 3) and open file named profiles.yml and fill your snowflake credentials

  8. image
  9. In command line (terminal), go to your dbt location and run “dbt debug --target dev” and then “dbt debug --target prod”. It tests your connection.
  10. Run “dbt run –target dev” if successful you have your data in star schema model
  11. Create some cool reports

5 Naming conventions - to keep all organized:

A) Tables

  • Hist_Table – contain all historical versions – slowly changing dimensions 2
  • Curr_Table – contain only last known current version
  • D_ - Dimension table
  • F_ - Fact table

B) Columns

  • SK_TABLE – Surrogate key – primary key
  • SK_TABLE_MASTER – Surrogate key – business key (consistent over all historical versions of the same business key)
  • ID_BUSINESS_KEY – original business key
  • CODE_STATUS (C,D,E) – Current (C), Expired (E), Deleted (D)
  • TEXT_COLUMN – nvarchar type of information
  • NUM_COLUMN – numerical value
  • AMT_COLUMN – amount (money, quantity)

C) dbt file organization

1) snapshot
  • source_system1 (folder)
    • source_system1_rawtable1
    • source_system1_rawtable2
  • source_system2 (folder)
  • source_system3 (folder)
2) model
  • marts (semantic layer)
    • sales
      • curr_d_table1
      • hist_d_table1
    • marketing
    • finance
  • staging (raw data)
    • source_system1 (folder)
      • curr_source_system_rawtable1
      • curr_source_system_rawtable2
      • snap_source_system_rawtable1
      • snap_source_system_rawtable2
    • source_system2 (folder)
    • source_system3 (folder)
dbt_file_structure

6 Code examples

7 Power BI report

Comming soon

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published