Skip to content

Our project for Datawarehouse Course taken during fall 2024 semester

Notifications You must be signed in to change notification settings

Salma-Mamdoh/Datawarehouse_Project

Repository files navigation

Datawarehouse_Project ✨📊

Table of Contents

Northwind Database Schema

The Northwind database is a classic and fictional relational database that simulates the operations of a small trading company called Northwind Traders. The database is structured into several tables representing entities such as products, customers, orders, employees, suppliers, and categories. Each table contains attributes relevant to its respective entity, facilitating the storage and management of data related to the company's transactions, inventory, personnel, and interactions with customers and suppliers.

Main Tables

  • Products: Stores information about the products offered by Northwind Traders.
  • Customers: Maintains the details of the customers who place orders.
  • Orders: Captures the order details, including the customer, order date, and shipping information.
  • Order Details: Stores the individual line items for each order, including the product, quantity, and unit price.
  • Employees: Holds the data about the company's employees, such as their names, job titles, and reporting structure.
  • Suppliers: Keeps track of the suppliers who provide products to Northwind Traders.
  • Categories: Organizes the products into different categories.

Database Schema

db schems

KPIs

The following table outlines a set of essential KPIs relevant to our project

KPI Definition
Total Orders Handled The aggregate count of orders managed and fulfilled by shippers, indicating operational throughput and workload distribution efficiency.
Average Shipment Time The mean duration taken by each shipper to complete order deliveries, reflecting the speed and reliability of shipping services provided.
Average Shipment Cost The average expense incurred per shipment by individual shippers, serving as a key metric for evaluating shipping cost effectiveness and operational expenditure management.
Total Units Sold The cumulative quantity of units sold across all product offerings, indicating sales volume and market demand for each product.
Total Revenue The aggregated revenue generated from the sale of all products, serving as a primary indicator of sales performance and revenue generation capability.
Revenue per Unit The average revenue generated per unit sold for each product, providing insights into pricing strategies, product profitability, and customer purchasing behavior.
Total Number of Orders The overall count of orders placed by customers, representing sales activity levels and customer engagement with the business.
Total Sales Revenue The total monetary value generated from all sales transactions, serving as a key financial metric for assessing business performance and revenue growth.
Average Order Value The mean monetary value of individual orders placed by customers, providing insights into customer spending patterns and purchase behavior, and informing marketing and sales strategies.

Data Warehouse Design

Data Staging Area

The data staging area plays a pivotal role in the construction of the data warehouse. It serves as an intermediary step where data is collected, transformed, and prepared for integration into the warehouse.

  • Data Extraction: Relevant data is extracted from source systems, such as the Northwind database.
  • Data Transformation: Data undergoes transformations to standardize formats, handle null values, and perform other necessary modifications.
  • Data Loading: Transformed data is loaded into staging tables within the data warehouse, ensuring cleanliness and consistency for downstream processing.

Data Warehouse Construction

Facts
Fact Table Description
F_Shipper Captures shipping-related metrics such as the number of orders handled, average shipment time, and average shipment cost.
F_Product Stores data on product sales, including the number of sold items, total revenue, and supplier information.
F_Sales Tracks sales transactions made by employees, including the number of orders served and total revenue generated.
Dimensions
Dimension Table Description
D_Date Contains date-related attributes for time-based analysis.
D_Location Stores location information for customers, suppliers, and shipping destinations.
D_Customer Holds details about customers, such as name, location, and segment.
D_Employee Stores information about employees, including their names and job titles.
D_Supplier Contains details about suppliers, such as name and location.
D_Product Stores information about products, including category, supplier, and unit price.
D_Shipper Contains details about shipping companies, such as name and location.

Datawarehouse Schema

dwh schema

ETL Process

In this project, I utilized SQL Server Integration Services (SSIS) to design and implement a robust ETL process for populating a data warehouse, seamlessly integrating diverse data sources. To optimize the ETL, I implemented an incremental load strategy, enabling real-time data integration and analysis. The data warehouse schema supported comprehensive analysis, with fact tables capturing key business metrics complemented by dimension tables providing essential context. By leveraging SSIS and incremental loading, I delivered a solution that empowers stakeholders with timely and accurate insights to drive informed decisions.

This is a screenshot of the ETL Process for Product Fact Table loading

ssis1 ssis2 ssis3

To see other packages download folder SSIS and enjoy :)

Deploying and Scheduling

In the deployment and scheduling phase, the SSIS catalog facilitated the organization and deployment of SSIS projects, packages, and environments. SQL Server Agent jobs were utilized to automate the execution of SSIS packages, enabling efficient ETL workflows. Each job executed specific SSIS packages responsible for loading dimension and fact tables. By defining job schedules and dependencies, ETL processes ran reliably according to business requirements. The centralized approach streamlined management and integration with other SQL Server components. This framework provided a robust foundation for automating and managing ETL processes, facilitating informed decision-making and driving business outcomes.

Analytical Queries

  • Query 1: Identifies the fastest shipper for each country based on average shipment time during the specified period.
  • Query 2: Finds the top supplier in terms of revenue for each country during the specified period.
  • Query 3: Calculates the total number of units sold by each supplier during the specified period.
  • Query 4: Calculates the average shipment cost for each country.
  • Query 5: Calculates the total number of orders handled by each employee during the specified period.
  • Query 6: Calculates the total number of orders processed by each shipper during the specified period.
  • Query 7: Calculates the total revenue generated by each country during the specified period.

Interactive Dashboard Using Power BI

  • Category Sales Analysis: This section displays the total sales revenue for different product categories, providing insights into the best-selling and underperforming product lines.
  • Customer Order Distribution: Here, the distribution of orders among major customers is visualized through a pie chart, highlighting the top contributors to the company's revenue.
  • Global Revenue Overview: This section presents the total revenue generated from different countries, offering insights into the company's global reach and performance.
  • Employee Performance: Employee productivity and performance are showcased here, listing the top employees by the number of orders processed.

dwh schema

About

Our project for Datawarehouse Course taken during fall 2024 semester

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages