Skip to content

Data93/project-3-dbt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

About

This project was created as part of an individual assignment in the "Fast-Track Data Engineer Scholarship" program organized by Digital Skola. This project focuses on dbt, a data transformation tool that enables data analysts and data engineers to transform data in a cloud-based analytics warehouse.

Getting Started

Prerequisites

Ensure you have the following installed:

  1. docker
  2. python3.7
  3. dbt
  4. postgres (PgAdmin or dbeaver)

Preparing your tools and data

  1. Start Postgres Database on Docker
  2. Sample Data Here: https://www.postgresqltutorial.com/postgresql-getting-started/load-postgresql-sample-database/
  3. Downloads dvdrental.zip
  4. Extract dvdrental.zip as directory

For you that run Postgres in Docker

  1. If you run Postgres with docker then run this command to mount path to container docker run --name postgres-test -e PGDATA=/var/lib/postgresql/data -e POSTGRES_PASSWORD= -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -p 5050:5050 -v ./dvdrental:/dvdrental -v dbt-postgres:/var/lib/postgresql/data -d postgres-test:12
  2. Run this to restore backup file docker exec -it bash create database data_warehouse quit database \q Run in terminal pg_restore -U postgres -d data_warehouse /dvdrental
  3. psql -U postgres

DBT

  1. Create new Python Virtual Environment: python3 -m venv env

  2. Activate Virtual Environment: source env/bin/activate

  3. Install DBT libraries:

    1. pip3 install dbt-core
    2. pip3 install dbt-postgres
  4. Initiate DBT project

    dbt init

  5. Create profiles.yml data_warehouse: outputs: dev: dbname: <your_db_name> host: localhost pass: postgres port: <your_port> schema: dbt_dev threads: 1 type: postgres user: postgres prod: dbname: <your_db_name> host: localhost pass: <your_password> port: <your_port> schema: dbt threads: 1 type: postgres user: postgres target: dev

  6. Run debug, if all connections passed then move to next step dbt debug

Data Modelling

  1. Create Schema medallion architecture :
    1. raw: Raw data
    2. intermediete: Fact and dim table
    3. gold: Mart
  2. Login to Postgres
  • \c data_warehouse
  • create schema dbt_dev_raw;
  • create schema dbt_dev_intermediate;
  • create schema dbt_dev_mart;
  1. Create raw model and write it to raw_dev schema using DBT
    • payment
    • rental
    • staff
    • customer
    • address
    • inventory
    • film
    • film_actor
    • actor
  2. Create sources.yml version: 2 sources:
  • name: public database: <your_db_name> schema: public tables:
    • name: payment
  1. Create intermediete model and write it to intermediete_dev schema using DBT
    • fact_payment
    • dim_rental
    • dim_staff
    • dim_customer
    • dim_address
    • dim_inventory
    • dim_film
    • dim_film_actor
    • dim_actor
  2. Create mart model and write it to mart_dev schema using DBT
    • How many monthly total revenue ? dbt-dag (2)
    • What is the best selling film ? dbt-dag
    • Who is the actor who plays the most roles in films ? dbt-dag (1)

DBT UI

  1. Generate docs dbt docs generate
  2. Run UI dbt docs serve

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published