
---

## 🧱 What is Dimensional Modeling?

> Dimensional Modeling is a data modeling technique optimized for data warehouse and business intelligence systems. It focuses on **measurable business processes** (facts) and **contextual attributes** (dimensions) to make analytical queries fast and user-friendly.

---

## 🔵 FACT TABLES

Fact tables hold **quantitative data** — numbers you want to analyze.

### 🔹 Step 1: Select the Business

Let’s say your business is **an e-commerce platform** like Amazon.

#### ➤ What does your business do?

* Allows customers to purchase products online.
* Ships orders via third-party logistics.
* Supports payments via different modes.

#### ➤ What measures do you want to analyze?

These are **facts**:

* Number of orders
* Order amount
* Shipping cost
* Payment success rate (a ratio derived from successful and attempted payments)
* Discount applied

#### ➤ How does your operational data look?

* A transactional database (OLTP) with tables like:

  * `orders`, `order_items`, `customers`, `products`, `payments`

#### ➤ What would be the granularity?

Granularity (the grain) defines **what a single row in the fact table represents**.

**Example:**
One row per **order item**, meaning:

> Order #123, Item: iPhone 13, Quantity: 2, Amount: \$2000, Discount: \$200

This allows slicing data by:

* Customer
* Product
* Region
* Time
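
A minimal sketch of this grain, using Python's built-in `sqlite3` module so it runs without a warehouse. The table `fact_order_item`, its columns, and the sample rows are hypothetical and only illustrate "one row per order item":

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_order_item (
    order_id     INTEGER,  -- order number kept on the fact row
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER,
    date_key     INTEGER,  -- e.g. 20240115
    quantity     INTEGER,
    amount       REAL,
    discount     REAL
);
INSERT INTO dim_product VALUES (1, 'iPhone 13'), (2, 'USB-C Cable');
INSERT INTO fact_order_item VALUES
    (123, 1, 10, 20240115, 2, 2000.0, 200.0),
    (123, 2, 10, 20240115, 1,   15.0,   0.0),
    (124, 2, 11, 20240116, 3,   45.0,   5.0);
""")

# Slice by product: measures roll up from the order-item grain.
sales_by_product = con.execute("""
    SELECT p.product_name, SUM(f.quantity), SUM(f.amount)
    FROM fact_order_item f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

# "Number of orders" needs a distinct count, because one order spans several rows.
order_count = con.execute(
    "SELECT COUNT(DISTINCT order_id) FROM fact_order_item"
).fetchone()[0]
```

Because the grain is the order item, the same table can answer questions per product, per customer, or per day without reloading data.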

#### ➤ End goal of the business:

* Understand what sells, where, to whom, and when
* Improve customer retention and supply chain decisions

---

### 🔹 Types of Fact Tables

| Type                           | Description                                         | Example                                                   |
| ------------------------------ | --------------------------------------------------- | --------------------------------------------------------- |
| **Transactional Fact**         | Record of events at the lowest level of granularity | Every purchase, click, return, etc.                       |
| **Periodic Snapshot Fact**     | Data snapshot at regular intervals                  | Daily stock levels, end-of-month balances                 |
| **Accumulating Snapshot Fact** | One row per process instance, updated per milestone | Order lifecycle: ordered → shipped → delivered → returned |
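
The accumulating snapshot is the least intuitive of the three, so here is a small sketch of how it might behave. The table `fact_order_lifecycle` and its dates are assumptions made for this example; the point is that a single row per order is revisited as milestones are reached:

```python
import sqlite3

# Hypothetical table and dates, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE fact_order_lifecycle (
        order_id       INTEGER PRIMARY KEY,  -- one row per order
        ordered_date   TEXT NOT NULL,
        shipped_date   TEXT,                 -- NULL until the milestone is reached
        delivered_date TEXT
    )
""")
con.execute(
    "INSERT INTO fact_order_lifecycle (order_id, ordered_date) VALUES (123, '2024-01-15')"
)

# The same row is updated as the order moves through its lifecycle.
con.execute("UPDATE fact_order_lifecycle SET shipped_date = '2024-01-16' WHERE order_id = 123")
con.execute("UPDATE fact_order_lifecycle SET delivered_date = '2024-01-19' WHERE order_id = 123")

# The lag between milestones is a typical measure on this kind of table.
days_to_deliver = con.execute("""
    SELECT CAST(julianday(delivered_date) - julianday(ordered_date) AS INTEGER)
    FROM fact_order_lifecycle
    WHERE order_id = 123
""").fetchone()[0]

row_count = con.execute("SELECT COUNT(*) FROM fact_order_lifecycle").fetchone()[0]
```

A transactional fact table would instead insert a new row for each of the three events.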

---

### 🔹 Derived Fact

* **Definition**: Facts calculated using other facts.
* **Example**:

  * Profit = Revenue - Cost
  * Conversion Rate = Orders / Visits
  * Avg. Discount = Total Discount / Orders

These are often calculated in **views** or the **BI layer** and are stored only when heavily reused.
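
As a rough sketch of that idea, the view below derives profit from stored revenue and cost. The table `fact_sales`, the view `vw_sales_derived`, and the sample numbers are hypothetical:

```python
import sqlite3

# Hypothetical base facts, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (order_id INTEGER, revenue REAL, cost REAL, discount REAL);
INSERT INTO fact_sales VALUES
    (1, 100.0,  60.0, 10.0),
    (2, 250.0, 200.0,  0.0),
    (2,  50.0,  30.0,  5.0);

-- Derived fact: computed from stored facts, not stored itself.
CREATE VIEW vw_sales_derived AS
SELECT order_id, revenue, cost, revenue - cost AS profit
FROM fact_sales;
""")

total_profit = con.execute("SELECT SUM(profit) FROM vw_sales_derived").fetchone()[0]

# Ratios are computed from summed components, not by averaging row-level ratios.
avg_discount_per_order = con.execute(
    "SELECT SUM(discount) / COUNT(DISTINCT order_id) FROM fact_sales"
).fetchone()[0]
```

Keeping the formula in one view means every report that reads `profit` uses the same definition.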

---

### 🔹 Factless Fact Table

* **Definition**: Fact table with **no numeric measures** — just foreign keys.
* **Use Case**: To track events or conditions.

| Event Type              | Fact Table Example                               |
| ----------------------- | ------------------------------------------------ |
| Student attendance      | Date, Student, Course                            |
| Product promotion       | Date, Product, Store (whether promotion applied) |
| Healthcare appointments | Date, Doctor, Patient (no measures)              |

These help in answering:

> Did something happen?
> Was it attended? Was it promoted?
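
A small sketch of the attendance case, assuming hypothetical tables `fact_attendance` and `dim_student`. With no measures to sum, the analysis is done by counting rows or by finding rows that are missing:

```python
import sqlite3

# Hypothetical tables and keys, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, student_name TEXT);
INSERT INTO dim_student VALUES (1, 'Ada'), (2, 'Bob');

-- Factless fact: only foreign keys, one row per attendance event.
CREATE TABLE fact_attendance (date_key INTEGER, student_key INTEGER, course_key INTEGER);
INSERT INTO fact_attendance VALUES
    (20240101, 1, 100),
    (20240101, 2, 100),
    (20240102, 1, 100);
""")

# "Did something happen?" is answered by counting rows.
attendance_per_day = con.execute("""
    SELECT date_key, COUNT(*)
    FROM fact_attendance
    GROUP BY date_key
    ORDER BY date_key
""").fetchall()

# "Who did not attend?" is answered by comparing against the dimension.
absent_on_jan_2 = con.execute("""
    SELECT student_key
    FROM dim_student
    WHERE student_key NOT IN (
        SELECT student_key FROM fact_attendance WHERE date_key = 20240102
    )
""").fetchall()
```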

---

## 🟢 DIMENSION TABLES

Dimension tables provide **context** to facts. They answer **who, what, where, when, why, how**.

### 🔹 How to identify dimensions

Ask these questions:

| Question                        | Helps you find                      |
| ------------------------------- | ----------------------------------- |
| **Who** placed the order?       | `Customer` dimension                |
| **Which** product was sold?     | `Product` dimension                 |
| **Where** was it delivered?     | `Location` or `Region` dimension    |
| **When** was it ordered?        | `Date` or `Time` dimension          |
| **How** was it paid?            | `Payment Method` dimension          |
| **Why** was a discount applied? | `Promotion` or `Campaign` dimension |

### 🔹 Dimension Table Design Guidelines

| Rule             | Description                                                           |
| ---------------- | --------------------------------------------------------------------- |
| Surrogate Key    | Use an integer surrogate key as the primary key, not the business key |
| Hierarchies      | Useful for drill-down (e.g., Region → City → Store)                   |
| Type 1 or Type 2 | Handle changes (overwrite vs. history tracking)                       |
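
The first two guidelines can be sketched as follows. The `dim_store` layout, the `store_code` business key, and the sample stores are assumptions made for this example:

```python
import sqlite3

# Hypothetical dimension and fact rows, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,  -- surrogate key, used for joins
    store_code TEXT,                 -- business key, kept as an attribute
    store_name TEXT,
    city       TEXT,                 -- hierarchy: region -> city -> store
    region     TEXT
);
INSERT INTO dim_store VALUES
    (1, 'S-001', 'Downtown', 'Austin', 'South'),
    (2, 'S-002', 'Uptown',   'Dallas', 'South'),
    (3, 'S-003', 'Harbor',   'Boston', 'East');

CREATE TABLE fact_sales (store_key INTEGER REFERENCES dim_store(store_key), amount REAL);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 70.0), (1, 30.0);
""")

# Top of the hierarchy: sales by region.
by_region = con.execute("""
    SELECT s.region, SUM(f.amount)
    FROM fact_sales f JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.region ORDER BY s.region
""").fetchall()

# Drill down one level: cities within the South region.
by_city_in_south = con.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_sales f JOIN dim_store s ON s.store_key = f.store_key
    WHERE s.region = 'South'
    GROUP BY s.city ORDER BY s.city
""").fetchall()
```

The fact table never references `store_code`, so a renumbering in the source system would not break existing fact rows.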

---

## 👁️ VIEWS in Dimensional Modeling

### 🔹 Why and When are Views Needed?

* For **Derived Facts**: create calculated columns like margin or rank.
* For **Data Privacy**: mask sensitive data.
* For **Aggregation**: expose daily revenue summaries (a plain view is computed at query time; use a materialized view if the result must be pre-calculated).
* For **Modeling Layer**: convert raw tables to dimensional format.

### 🔹 Purpose:

* Abstract **complex joins**
* Reuse logic in one place
* Support **BI tools** with consistent logic

### 🔹 What Problem Does It Solve?

Let’s say:

* Users want product performance **by region and by category**.
* You join `fact_sales` with `dim_product`, `dim_location`, `dim_time`
* Repeating this join in every dashboard duplicates logic and invites inconsistent results.

👉 Instead, create a **view** like `vw_sales_by_category_region`:

```sql
-- Daily sales totals by product category and region
CREATE VIEW vw_sales_by_category_region AS
SELECT
  p.category,
  l.region,
  t.date,
  SUM(f.amount) AS total_sales
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_location l ON f.location_key = l.location_key
JOIN dim_time t ON f.date_key = t.date_key
GROUP BY p.category, l.region, t.date;
```

---

## 📘 Important Questions (Must-Know)

1. **What are fact and dimension tables? Give examples.**
2. **How do you decide the granularity of a fact table?**
3. **What is a factless fact table and when do you use it?**
4. **What are the different types of fact tables?**
5. **How do you handle slowly changing dimensions (SCD)?**
6. **What’s the benefit of views in a dimensional model?**
7. **Can you give a real-world example of a derived fact?**
8. **What are surrogate keys and why do we use them in dimensions?**

---

## 🧠 Scenario Summary

For an **e-commerce business**, a dimensional model might look like:

**Fact Table:**

* `fact_sales`: Measures like revenue, quantity, discount.

**Dimension Tables:**

* `dim_customer`
* `dim_product`
* `dim_time`
* `dim_location`
* `dim_payment_type`

This allows answering:

* What are the top 5 selling products this month?
* How do sales differ by region?
* What time of day do we sell the most?

---



## QUESTIONS WITH ANSWERS: Dimensional Modeling

---

### 1. **What is dimensional modeling and why is it important in a data warehouse?**

**Answer:**
Dimensional modeling is a design technique optimized for data retrieval in analytical systems like data warehouses. It structures data into **fact tables** (measurable events) and **dimension tables** (descriptive context), often forming a **star or snowflake schema**.

**Importance:**

* Makes querying faster and simpler for BI tools and analysts.
* Improves **understandability** for business users.
* Supports **performance-optimized** aggregations and drill-downs.

---

### 2. **What is a fact table? What types of fact tables exist?**

**Answer:**
A fact table is the **central table** in a dimensional model. It contains:

* **Measures (facts)**: numerical data like revenue, units sold.
* **Foreign keys** to dimension tables.

**Types:**

* **Transactional fact** – Each row is a business event (e.g., a sale).
* **Periodic snapshot** – Captures metrics at regular intervals (e.g., daily account balance).
* **Accumulating snapshot** – Shows progress of a process (e.g., order → shipped → delivered).

---

### 3. **What is a derived fact table?**

**Answer:**
A **derived fact** table contains **calculated measures** derived from base facts. Example:

* From a `SALES_FACT` table, you derive a `PROFIT_FACT` table by calculating `REVENUE - COST`.

Used when:

* Complex calculations are needed frequently.
* Performance is critical.

---

### 4. **What is a factless fact table? Can you give an example?**

**Answer:**
A **factless fact table** records the **occurrence of an event** but has **no numeric fact**.

**Example:**
A student attendance tracking table:

```
FACT_ATTENDANCE(student_id, class_id, date_key)
```

It has no numeric facts, but each row shows **who attended what and when**.

---

### 5. **What is a dimension table? How do you identify them?**

**Answer:**
Dimension tables provide **descriptive context** to facts: "Who", "What", "When", "Where", "How".

**Example dimensions**:

* Customer
* Product
* Time
* Location

**Identification approach:**

* Ask business-driven questions like:

  * **Who** is involved? (customer, employee)
  * **What** is the product?
  * **Where** is the transaction happening?
  * **How** was the item delivered?

---

### 6. **What is granularity in a fact table and how do you define it?**

**Answer:**
**Granularity** is the **level of detail** stored in a fact table.

**Example:**

* Low granularity (coarse grain): Daily sales per store.
* High granularity (fine grain): Each item sold in each transaction.

You define it by:

* Business goals (e.g., do we need line-level insights?)
* Operational data structure
* Storage and performance needs

---

### 7. **Why and when are views used in a data warehouse?**

**Answer:**
**Views** are logical tables built on top of physical tables using SQL queries. They’re used to:

* **Simplify complex logic** for business users.
* Apply **business rules** without modifying base tables.
* Create **security layers** to mask sensitive columns.
* Support **virtual data marts** without duplicating data.

**Example:**
Instead of querying a complex join across 5 tables, you expose a view `vw_SalesSummary` that returns the pre-joined, cleaned data.
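
A minimal sketch of the security use case, assuming a hypothetical `dim_customer` table and a view named `vw_customer_masked`. The masking rule (first character plus domain) is only one possible choice:

```python
import sqlite3

# Hypothetical customer data, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    email        TEXT,
    region       TEXT
);
INSERT INTO dim_customer VALUES (1, 'Ada Lopez', 'ada@example.com', 'East');

-- The view hides the name and exposes only a masked email.
CREATE VIEW vw_customer_masked AS
SELECT
    customer_key,
    region,
    substr(email, 1, 1) || '***' || substr(email, instr(email, '@')) AS email_masked
FROM dim_customer;
""")

masked_rows = con.execute("SELECT * FROM vw_customer_masked").fetchall()
```

SQLite has no `GRANT` statement, so this sketch only shows the masking itself. In a warehouse with access control, analysts would typically be given access to the view and not to the base table.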

---

### 8. **What is the purpose of dimensional modeling when moving data from the staging layer to the access layer?**

**Answer:**
Dimensional modeling converts **raw, normalized data** into **user-friendly, analytical structures**.

* **Staging layer** holds raw data from sources.
* **Transformation process** applies business logic, joins, lookups, and deduplication.
* **Access layer** stores the final star/snowflake schema for reporting.

Without dimensional modeling:

* Users would deal with complex schemas.
* Query performance would degrade.
* Business logic would be inconsistently applied.
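
The three layers can be sketched in a few statements. Everything here is hypothetical (the `stg_orders` layout, the email and SKU business keys, the sample rows), and deduplicating with `DISTINCT` is a simplification of what a real pipeline would do:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Staging layer: raw rows from the source, including an accidental duplicate.
CREATE TABLE stg_orders (order_id INTEGER, customer_email TEXT, product_sku TEXT, amount REAL);
INSERT INTO stg_orders VALUES
    (1, 'ada@example.com', 'SKU-1', 20.0),
    (2, 'ada@example.com', 'SKU-2', 35.0),
    (3, 'bob@example.com', 'SKU-1', 20.0),
    (3, 'bob@example.com', 'SKU-1', 20.0);

-- Access layer: dimensions with surrogate keys, plus a fact table.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY AUTOINCREMENT, customer_email TEXT UNIQUE);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY AUTOINCREMENT, product_sku TEXT UNIQUE);
CREATE TABLE fact_sales   (order_id INTEGER, customer_key INTEGER, product_key INTEGER, amount REAL);

-- Transformation: load dimensions first, then look up their keys for the fact rows.
INSERT INTO dim_customer (customer_email) SELECT DISTINCT customer_email FROM stg_orders;
INSERT INTO dim_product  (product_sku)    SELECT DISTINCT product_sku    FROM stg_orders;

INSERT INTO fact_sales
SELECT DISTINCT s.order_id, c.customer_key, p.product_key, s.amount
FROM stg_orders s
JOIN dim_customer c ON c.customer_email = s.customer_email
JOIN dim_product  p ON p.product_sku    = s.product_sku;
""")

customer_count = con.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
fact_count = con.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
total_amount = con.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
```

Loading dimensions before facts matters: the fact insert can only resolve surrogate keys that already exist.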

---

### 9. **What is the difference between star schema and snowflake schema?**

| Feature     | Star Schema                               | Snowflake Schema                              |
| ----------- | ----------------------------------------- | --------------------------------------------- |
| Structure   | Dimension tables directly join fact table | Dimensions are normalized into sub-dimensions |
| Query Speed | Faster (fewer joins)                      | Slightly slower (more joins)                  |
| Storage     | Redundant data in dimensions              | Less redundant                                |
| Simplicity  | Easy to understand                        | Complex to navigate                           |
| Use Case    | When performance matters                  | When saving storage is more important         |
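
To make the join difference concrete, the sketch below answers the same question (sales by category) against both layouts. All table names and rows are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 10.0), (3, 5.0);

-- Star: category is stored directly on the product dimension (repeated per product).
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
INSERT INTO dim_product_star VALUES
    (1, 'iPhone 13', 'Phones'), (2, 'Case', 'Accessories'), (3, 'Cable', 'Accessories');

-- Snowflake: category is normalized into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
INSERT INTO dim_category VALUES (1, 'Phones'), (2, 'Accessories');
INSERT INTO dim_product_snow VALUES (1, 'iPhone 13', 1), (2, 'Case', 2), (3, 'Cable', 2);
""")

# Star schema: one join from fact to dimension.
star_result = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake schema: an extra join to reach the category name.
snowflake_result = con.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category
""").fetchall()
```

Both queries return the same totals; the trade-off is one fewer join in the star layout against the repeated category text it stores.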

---

### 10. **What are slowly changing dimensions (SCD)? Why are they important?**

**Answer:**
SCDs manage changes in dimension data over time.

**Types:**

* **Type 1**: Overwrite (no history kept)
* **Type 2**: Add new row (with version/date — full history)
* **Type 3**: Add new column for change (limited history)

**Why important:**
They help **track historical changes** in business attributes like customer location or product category.
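
A minimal sketch of a Type 2 change, assuming a hypothetical `dim_customer` with `valid_from`, `valid_to`, and `is_current` columns. The helper `apply_scd2_change` is written for this example and is not part of any library:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, new per version
        customer_id  TEXT,                               -- business key, shared by versions
        city         TEXT,
        valid_from   TEXT,
        valid_to     TEXT,                               -- NULL while the row is current
        is_current   INTEGER
    )
""")
con.execute(
    "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
    "VALUES ('C-42', 'Austin', '2023-01-01', NULL, 1)"
)

def apply_scd2_change(con, customer_id, new_city, change_date):
    """Type 2: expire the current row, then insert a new version."""
    con.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    con.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )

apply_scd2_change(con, "C-42", "Boston", "2024-06-01")

history = con.execute(
    "SELECT city, valid_from, valid_to, is_current FROM dim_customer ORDER BY customer_key"
).fetchall()
```

A Type 1 change would instead be a single `UPDATE` of `city` on the existing row, which loses the fact that the customer ever lived in Austin.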

---

### ✅ BONUS: Real-Life Business Scenario

**Use Case:**
A retail company wants to analyze **monthly sales** across regions and product categories.

* **Fact Table**: `FACT_SALES` (transaction-level sales with `product_id`, `date_id`, `store_id`)
* **Dimensions**:

  * `DIM_PRODUCT`: what was sold
  * `DIM_DATE`: when
  * `DIM_STORE`: where
* Analysts create views like `vw_MonthlySalesByRegion`.

This schema helps management:

* See trends
* Forecast demand
* Identify top-performing products

---