This project was created as part of an individual assignment in the "Fast-Track Data Engineer Scholarship" program organized by Digital Skola. This project focuses on dbt, a data transformation tool that enables data analysts and data engineers to transform data in a cloud-based analytics warehouse.
Ensure you have the following installed:
- Docker
- Python 3.7
- dbt
- A Postgres client (pgAdmin or DBeaver)
- Start Postgres Database on Docker
- Sample Data Here: https://www.postgresqltutorial.com/postgresql-getting-started/load-postgresql-sample-database/
- Download dvdrental.zip
- Extract dvdrental.zip into a directory named dvdrental
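- If you run Postgres with Docker, run this command to start the container and mount the extracted directory into it:

      docker run --name postgres-test -e PGDATA=/var/lib/postgresql/data -e POSTGRES_PASSWORD=<your_password> -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -p 5050:5432 -v "$(pwd)/dvdrental":/dvdrental -v dbt-postgres:/var/lib/postgresql/data -d postgres:12

  Postgres listens on port 5432 inside the container, so host port 5050 is the value to use as <your_port> in profiles.yml.
- Restore the backup file. Open a shell in the container (the container name is postgres-test if you used the command above), create the database from psql, quit psql with \q, then run pg_restore from the container shell:

      docker exec -it <container_name> bash
      psql -U postgres
      create database data_warehouse;
      \q
      pg_restore -U postgres -d data_warehouse /dvdrental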
- Create a new Python virtual environment:

      python3 -m venv env

- Activate the virtual environment:

      source env/bin/activate

- Install the dbt libraries:

      pip3 install dbt-core
      pip3 install dbt-postgres

- Initiate the dbt project:

      dbt init

- Create profiles.yml:

      data_warehouse:
        outputs:
          dev:
            dbname: <your_db_name>
            host: localhost
            pass: postgres
            port: <your_port>
            schema: dbt_dev
            threads: 1
            type: postgres
            user: postgres
          prod:
            dbname: <your_db_name>
            host: localhost
            pass: <your_password>
            port: <your_port>
            schema: dbt
            threads: 1
            type: postgres
            user: postgres
        target: dev

- Run debug. If all connection checks pass, move on to the next step:

      dbt debug

- Create the schemas for the medallion architecture:
  - raw: raw data
  - intermediate: fact and dimension tables
  - mart: data marts (gold layer)
- Log in to Postgres (psql -U postgres), then run:

      \c data_warehouse
      create schema dbt_dev_raw;
      create schema dbt_dev_intermediate;
      create schema dbt_dev_mart;
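
The schema names above follow the pattern dbt uses by default for custom schemas: the target schema from profiles.yml (dbt_dev) joined by an underscore to the custom schema set on a model. The following is a minimal Python sketch of that naming rule, assuming the project keeps dbt's default `generate_schema_name` behavior; the function name `dbt_schema_name` and the `LAYERS` list are hypothetical helpers for illustration, not part of dbt.

```python
# Hypothetical helper: mirrors dbt's default custom-schema naming rule.
# Assumption: the project does not override the generate_schema_name macro.
LAYERS = ["raw", "intermediate", "mart"]


def dbt_schema_name(target_schema, custom_schema=None):
    """Return the schema dbt would write to under its default naming rule."""
    if custom_schema is None:
        # No custom schema on the model: dbt uses the target schema as-is.
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


# Build one CREATE SCHEMA statement per layer for the dev target.
statements = [
    f"create schema if not exists {dbt_schema_name('dbt_dev', layer)};"
    for layer in LAYERS
]

for statement in statements:
    print(statement)
```

The printed statements can be pasted into psql after connecting to data_warehouse. Using `if not exists` makes the script safe to rerun.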
- Create the raw models and write them to the dbt_dev_raw schema using dbt, one model per source table:
  - payment
  - rental
  - staff
  - customer
  - address
  - inventory
  - film
  - film_actor
  - actor
- Create sources.yml. Add one entry under tables for each source table you want to reference:

      version: 2
      sources:
        - name: public
          database: <your_db_name>
          schema: public
          tables:
            - name: payment
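
Writing the source entries and the pass-through raw models by hand for all nine tables is repetitive, so they can be generated. The sketch below is one possible way to do it in Python. It assumes the database is named data_warehouse, that every raw model is a plain `select *` from its source table, and that the models use a custom schema called raw; `render_sources` and `render_raw_model` are hypothetical helper names.

```python
# The nine source tables listed in the raw model step.
RAW_TABLES = [
    "payment", "rental", "staff", "customer", "address",
    "inventory", "film", "film_actor", "actor",
]


def render_sources(db_name, tables, source_name="public", schema="public"):
    """Return the text of a sources.yml that declares every table."""
    lines = [
        "version: 2",
        "sources:",
        f"  - name: {source_name}",
        f"    database: {db_name}",
        f"    schema: {schema}",
        "    tables:",
    ]
    lines += [f"      - name: {table}" for table in tables]
    return "\n".join(lines) + "\n"


def render_raw_model(table, source_name="public"):
    """Return the SQL for a pass-through raw model (an assumed shape)."""
    return (
        "{{ config(schema='raw') }}\n\n"
        "select * from {{ source('" + source_name + "', '" + table + "') }}\n"
    )


# Assumption: the database created during the restore step.
sources_yml = render_sources("data_warehouse", RAW_TABLES)
print(sources_yml)
print(render_raw_model("payment"))
```

Each rendered model would be saved as a .sql file named after its table inside the project's models directory, and the sources text saved next to them as sources.yml.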
- Create the intermediate models and write them to the dbt_dev_intermediate schema using dbt:
  - fact_payment
  - dim_rental
  - dim_staff
  - dim_customer
  - dim_address
  - dim_inventory
  - dim_film
  - dim_film_actor
  - dim_actor
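
Each intermediate model builds on the raw model of the same table, which dbt expresses with `ref()` so that it can order the runs correctly. The Python sketch below generates a starting skeleton for each model in the list above. It assumes the raw models are named after their tables, that payment is the only fact table, and that the models use a custom schema called intermediate; the `select *` body is a placeholder to be replaced with the real column selection and joins. `intermediate_name` and `render_intermediate_model` are hypothetical helper names.

```python
# Raw models that the intermediate layer builds on (assumed names).
RAW_MODELS = [
    "payment", "rental", "staff", "customer", "address",
    "inventory", "film", "film_actor", "actor",
]
# Assumption: payment is the only fact table, everything else is a dimension.
FACT_MODELS = {"payment"}


def intermediate_name(raw_model):
    """Map a raw model name to its fact_/dim_ intermediate model name."""
    prefix = "fact" if raw_model in FACT_MODELS else "dim"
    return f"{prefix}_{raw_model}"


def render_intermediate_model(raw_model):
    """Return placeholder SQL that selects everything from the raw model."""
    return (
        "{{ config(schema='intermediate', materialized='table') }}\n\n"
        "select *\n"
        "from {{ ref('" + raw_model + "') }}\n"
    )


# File name -> SQL text for every intermediate model.
models = {
    intermediate_name(name) + ".sql": render_intermediate_model(name)
    for name in RAW_MODELS
}

for file_name in models:
    print(file_name)
```

Because the skeletons reference the raw models through `ref()`, running dbt builds the raw layer before the intermediate layer without any extra ordering configuration.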
- Create the mart model and write it to the dbt_dev_mart schema using dbt
- Generate the docs:

      dbt docs generate

- Serve the docs UI:

      dbt docs serve


