Skip to content

This project is about building a dimensional data warehouse in BigQuery by transforming an OLTP system to an OLAP system, using dbt as our data transformation tool.

Notifications You must be signed in to change notification settings

Chisomnwa/Building-OLAP-Dimensional-Model-using-BigQuery-and-DBT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 

Repository files navigation

Building OLAP Dimensional Model in BigQuery, using dbt as a Data Transformation Tool.

This project is about building a dimensional data warehouse in BigQuery by transforming an OLTP system to an OLAP system, using dbt as our data transformation tool.

The OLTP system contains the sales data of Northwind; a specialty food export-import company.

Aim

To bring Northwind's data reporting up to speed by embracing Dimensional Modeling.

Introducing OLAP for Northwind OLTP Database

What's the current setup or architecture?

  • Northwind traders are companies that buy and sell special foods worldwide.
  • This is a practice database made by Microsoft to showcase its product features and for learning purposes.
  • The current setup combines on-site and older systems.
  • They use MySQL for their main daily sales transactions.
  • MySQL is also used for creating and running reports, but it's not efficient because analytical queries slow down the transaction system.

Why do we need a new setup or architecture?

  • To scale up more easily.
  • To make reports faster.
  • To ease the burden on day-to-day operations.
  • To enhance data security with better access control.

How do we set up a new system or architecture?

  • Northwind traders can shift their current database to GCP.
  • The on-site MySQL can be swapped with a fully managed cloud SQL.
  • To handle reports, we'll create an OLAP data warehouse on GCP using BigQuery.
  • We'll construct a Dimensional Data Warehouse on BigQuery following Kimball's method, incorporating dim and fact tables.

Identifying Business Requirements

Throughout the interview process with the business and stakeholders, the following business processses were identified:

  • Sales Overview: Overall sales reports to understand better, what is being sold to our customers, what sells the most, where and what sells the least, the goal is to have a general overview of how the business is going.
  • Product Inventory: Understand the current inventory levels and how to improve stock management, what suppliers we have, and how much is being purchased. This will allow Northwind to understand stock management and potentially land better deals with suppliers
  • Customer Reporting: Allow customers to understand their purchase orders, how much and when they are buying, empowering them to make data-driven decisions while Northwind utilizes this data in combination with its sales data.

This means the business is looking forward to getting insights on sales overview, product inventory, and customer reporting.

Identifying required tables from ERD

From the above ERD diagram of the OLTP transactional system, we identify the following required tables that will enable us to meet the business requirements:
  • Customers - Customers who buy items from Northwind
  • Employees - Those who work for Northwind
  • Orders - Sales Order transactions taking place between the customers & Northwind
  • Order Details - Order Details for the Orders placed by customer
  • Inventory Transaction - Transaction details of each inventory
  • Products - Current Northwind products that customers can purchase
  • Shippers - Shipped orders from Northwind to customers
  • Suppliers - Supplies Northwind with required items
  • Invoices - Invoice created for each order
  • Proposed Data Modelling Concepts

    After a series of interviews with the business stakeholders and understanding their business requirements, we proposed the following data modeling concepts:

    • Conceptual Data Model



    • Logical Data Model



    • Physical Data Model



    Proposed Kimball Data Warehouse Architecture

    Below is the proposed data warehouse architecture, which focuses on how the data flows from the old MySQL OLTP database to the new modernized dimensional data warehouse in BigQuery during the migration.



    With the insights from the data modeling and the data warehouse architecture design, we go ahead to create the three layers (datasets) in Bigquery using dbt. These (staging, warehouse, and analytics_obt or reporting) layers are identified by the "dbt prefix".

    The above three data layers help to achieve the business requirements of Northwind and Sales Overview, Product Inventory, and Customer Reporting processes can now be carried out effectively to draw out insights.

    Results

    • The new Data Warehouse uses Bigquery for analytics and Business Intelligence which is more efficient than the previous MySQL system.
    • The Reporting is derived from One Big Table denormalised from Dimensional models
    • Sales Overview, Product Inventory, and Customer Reporting processes can now be carried out effectively to draw out insights

    Getting started on dbt project

    • Commands to install dbt and connect to bigquery here
    • Commands to create tables and insert data here
    • Commands to create Dim and Fact tables in different layers can be found here
    • If you are not able to enable billing for Bigquery on your account, insert data manually by uploading csv files located here

    Resources:

    • Learn more about dbt in the docs
    • Check out Discourse for commonly asked questions and answers
    • Join the chat on Slack for live discussions and support
    • Find dbt events near you
    • Check out the blog for the latest news on dbt's development and best practices

    Find the Medium article I wrote about this project here .


    And finally, click the analytics_engineering folder to see all the folders and files created during this project. 🌟

    About

    This project is about building a dimensional data warehouse in BigQuery by transforming an OLTP system to an OLAP system, using dbt as our data transformation tool.

    Resources

    Stars

    Watchers

    Forks

    Releases

    No releases published

    Packages

    No packages published

    Languages