# Fact and Dimension Tables

## The Four Main Types of Data Warehousing Fact Tables

There are four primary types of fact tables in dimensional modeling:

1. **Transaction Fact Tables**
2. **Periodic Snapshot Fact Tables**
3. **Accumulating Snapshot Fact Tables**
4. **Factless Fact Tables**

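Transaction, periodic snapshot, and accumulating snapshot tables differ in grain and in how rows are loaded and updated. Factless fact tables record events or coverage without numeric measures.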

## The Role of Transaction Fact Tables

**Transaction Fact Tables** record business events at the moment they occur.

### Characteristics:
- One row per transaction/event
- Records discrete business events
- Most common type of fact table
- Captures "what happened" at a point in time

### Examples:
- Sales transactions
- Order line items
- Invoice line items
- Payment transactions
- Shipment events

### Structure:
```sql
CREATE TABLE sales_transaction_fact (
    transaction_date_key INT,
    product_key INT,
    customer_key INT,
    store_key INT,
    salesperson_key INT,
    transaction_number VARCHAR(50),  -- Degenerate dimension
    sales_amount DECIMAL(10,2),
    quantity_sold INT,
    discount_amount DECIMAL(10,2),
    cost_amount DECIMAL(10,2),
    profit_amount DECIMAL(10,2)
);
```

### Grain:
- One row = one transaction line item
- Lowest level of detail
- Supports detailed analysis

### Use Cases:
- Point-in-time analysis
- Detailed transaction reporting
- Audit trails
- Operational reporting
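
As a rough illustration of how a line-item grain supports both detailed and summarized reporting, the sketch below uses a reduced copy of the table above. The table name `sales_transaction_fact_demo` and all rows are invented for the example, and the multi-row `VALUES` syntax assumes a dialect that supports it (PostgreSQL, MySQL, SQLite, SQL Server).

```sql
-- Reduced, illustrative copy of sales_transaction_fact; rows are made up
CREATE TABLE sales_transaction_fact_demo (
    transaction_date_key INT NOT NULL,
    product_key INT NOT NULL,
    store_key INT NOT NULL,
    transaction_number VARCHAR(50) NOT NULL,  -- Degenerate dimension
    sales_amount DECIMAL(10,2),
    quantity_sold INT
);

INSERT INTO sales_transaction_fact_demo VALUES
    (20240101, 10, 1, 'T-1001', 20.00, 2),
    (20240101, 11, 1, 'T-1001', 15.00, 1),
    (20240101, 10, 2, 'T-1002', 10.00, 1);

-- Detail: every line item of one transaction (drill-down via the degenerate dimension)
SELECT *
FROM sales_transaction_fact_demo
WHERE transaction_number = 'T-1001';

-- Roll-up: additive facts summed to one row per transaction
-- T-1001: 2 line items, total 35.00, 3 units
-- T-1002: 1 line item,  total 10.00, 1 unit
SELECT 
    transaction_number,
    COUNT(*) as line_items,
    SUM(sales_amount) as transaction_total,
    SUM(quantity_sold) as units
FROM sales_transaction_fact_demo
GROUP BY transaction_number
ORDER BY transaction_number;
```

Because the rows are stored at the lowest grain, the same table answers both questions without a separate summary table.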

## The Rules Governing Facts and Transaction Fact Tables

### Fact Table Rules:

1. **Grain Definition**
   - Clearly define the grain (level of detail)
   - One row = one measurement event
   - Document grain explicitly

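2. **Additivity**
   - Prefer fully additive facts, which can be summed across every dimension
   - Semi-additive facts (e.g., balances) can be summed across some dimensions but not others, typically not time
   - Non-additive facts (e.g., ratios, unit prices) cannot be summed meaningfully; store their additive components and compute the ratio at query time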

3. **Foreign Keys**
   - All foreign keys must have corresponding dimension rows
   - Use surrogate keys, not natural keys
   - Maintain referential integrity

4. **Degenerate Dimensions**
   - Transaction numbers, invoice numbers
   - Stored in fact table (not a separate dimension)
   - Used for drill-down to transaction detail

5. **Null Handling**
   - Avoid null foreign keys (use "Unknown" dimension rows)
   - NULL facts are acceptable when a measurement is genuinely missing: SUM and AVG skip NULLs, whereas a stored 0 is counted by AVG and lowers the result
   - Document null handling strategy

6. **Fact Types**
   - **Measures**: Numeric facts (amounts, quantities)
   - **Text Facts**: Rare, usually in dimensions
   - **Counts**: Can be facts or derived

### Transaction Fact Table Specific Rules:

- **Time Stamps**: Include transaction timestamp
- **Event-Based**: One event = one row
- **No Aggregation**: Store at transaction level
- **Historical**: Preserve all historical transactions
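
One way to check the "one event = one row" rule before loading is to look for duplicates at the declared grain. The sketch below assumes a hypothetical staging table, `sales_staging_demo`, keyed by transaction number and line item number; the rows are invented.

```sql
-- Hypothetical staging table holding rows about to be loaded
CREATE TABLE sales_staging_demo (
    transaction_number VARCHAR(50) NOT NULL,
    line_item_number INT NOT NULL,
    sales_amount DECIMAL(10,2)
);

INSERT INTO sales_staging_demo VALUES
    ('T-1001', 1, 20.00),
    ('T-1001', 2, 15.00),
    ('T-1001', 2, 15.00);  -- accidental duplicate

-- Any row returned here violates the declared grain
-- With the sample data: transaction T-1001, line item 2, row_count 2
SELECT 
    transaction_number,
    line_item_number,
    COUNT(*) as row_count
FROM sales_staging_demo
GROUP BY transaction_number, line_item_number
HAVING COUNT(*) > 1;
```

An empty result suggests the batch matches the grain. A non-empty result usually points to a double load or an incorrectly stated grain.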

## Primary and Foreign Keys for Fact Tables

### Primary Keys:

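**Composite Primary Keys:**
- Made up of multiple columns
- Usually a subset of the dimension foreign keys, chosen to match the grain
- Often include degenerate dimensions (e.g., transaction number plus line item number)
- Ensure uniqueness at the declared grain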

**Example:**
```sql
CREATE TABLE sales_transaction_fact (
    transaction_date_key INT,
    product_key INT,
    customer_key INT,
    store_key INT,
    transaction_number VARCHAR(50),
    line_item_number INT,
    PRIMARY KEY (transaction_date_key, product_key, customer_key, 
                 store_key, transaction_number, line_item_number)
);
```

### Foreign Keys:

**Characteristics:**
- Link to dimension tables
- Use surrogate keys (integers)
- All foreign keys should have corresponding dimension rows
- Index all foreign keys for performance

**Example:**
```sql
CREATE TABLE sales_transaction_fact (
    transaction_date_key INT REFERENCES date_dimension(date_key),
    product_key INT REFERENCES product_dimension(product_key),
    customer_key INT REFERENCES customer_dimension(customer_key),
    store_key INT REFERENCES store_dimension(store_key),
    sales_amount DECIMAL(10,2),
    quantity_sold INT
);
```

### Key Best Practices:

- **Surrogate Keys**: Always use surrogate keys in fact tables
- **Indexing**: Index all foreign key columns
- **Referential Integrity**: Enforce or validate referential integrity
- **Unknown Keys**: Use -1 or 0 for unknown/not applicable
- **Composite Keys**: Use when single key doesn't ensure uniqueness
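
The "Unknown Keys" practice can be sketched as follows. This is a minimal example under stated assumptions: the tables `customer_dimension_demo`, `orders_staging_demo`, and `sales_fact_unknown_demo` are hypothetical, the rows are invented, and -1 is used as the unknown member key.

```sql
-- Dimension with a reserved "Unknown" row at key -1
CREATE TABLE customer_dimension_demo (
    customer_key INT PRIMARY KEY,
    customer_id VARCHAR(50),
    customer_name VARCHAR(100)
);

INSERT INTO customer_dimension_demo VALUES
    (-1, 'N/A', 'Unknown'),
    (1, 'C-100', 'Alice');

-- Source rows carry the natural key, which may be missing or unmatched
CREATE TABLE orders_staging_demo (
    customer_id VARCHAR(50),
    sales_amount DECIMAL(10,2)
);

INSERT INTO orders_staging_demo VALUES
    ('C-100', 50.00),
    (NULL, 12.50),     -- missing customer
    ('C-999', 8.00);   -- customer not yet in the dimension

CREATE TABLE sales_fact_unknown_demo (
    customer_key INT NOT NULL REFERENCES customer_dimension_demo(customer_key),
    sales_amount DECIMAL(10,2)
);

-- Look up the surrogate key; fall back to -1 when no dimension row matches
INSERT INTO sales_fact_unknown_demo (customer_key, sales_amount)
SELECT COALESCE(d.customer_key, -1), s.sales_amount
FROM orders_staging_demo s
LEFT JOIN customer_dimension_demo d ON d.customer_id = s.customer_id;

-- customer_key -1 has 2 rows, customer_key 1 has 1 row
SELECT customer_key, COUNT(*) as row_count
FROM sales_fact_unknown_demo
GROUP BY customer_key
ORDER BY customer_key;
```

Every fact row still joins to a dimension row, so inner joins do not silently drop sales with missing customers.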

## The Role of Periodic Snapshot Fact Tables

**Periodic Snapshot Fact Tables** capture the state of a process at regular intervals.

### Characteristics:
- One row per period per entity
- Regular time intervals (daily, weekly, monthly)
- Captures "what is" at end of period
- Shows status at specific points in time

### Examples:
- Daily account balances
- Monthly inventory levels
- Weekly sales summaries
- Quarterly financial statements

### Structure:
```sql
CREATE TABLE account_balance_snapshot_fact (
    snapshot_date_key INT,
    account_key INT,
    customer_key INT,
    account_type_key INT,
    account_balance DECIMAL(15,2),
    number_of_transactions INT,
    average_transaction_amount DECIMAL(10,2)
);
```

### Grain:
- One row = one entity per time period
- Example: One account per day

### Use Cases:
- Trend analysis over time
- Period-over-period comparisons
- Status reporting
- Performance monitoring
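
A period-over-period comparison on a snapshot table might look like the sketch below. It assumes a dialect with window functions (e.g., PostgreSQL, MySQL 8, SQLite 3.25 or later); the table `account_balance_snapshot_demo` and its month-end rows are invented.

```sql
-- Reduced month-end snapshot table; one row per account per period
CREATE TABLE account_balance_snapshot_demo (
    snapshot_date_key INT NOT NULL,
    account_key INT NOT NULL,
    account_balance DECIMAL(15,2),
    PRIMARY KEY (snapshot_date_key, account_key)
);

INSERT INTO account_balance_snapshot_demo VALUES
    (20240131, 1, 1000.00),
    (20240229, 1, 1250.00),
    (20240331, 1, 1100.00);

-- Change versus the previous snapshot of the same account
-- January: NULL (no prior period), February: 250, March: -150
SELECT 
    snapshot_date_key,
    account_key,
    account_balance,
    account_balance - LAG(account_balance) OVER (
        PARTITION BY account_key
        ORDER BY snapshot_date_key
    ) as change_from_prior
FROM account_balance_snapshot_demo
ORDER BY account_key, snapshot_date_key;
```

Because the snapshot guarantees one row per account per period, the previous row in date order is the previous period.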

## Periodic Snapshots and Semi-Additive Facts

### Semi-Additive Facts:

**Definition**: Facts that can be summed across some dimensions but not others.

**Common Pattern**: Can sum across all dimensions EXCEPT time.

### Examples:

1. **Account Balances**
   - Can sum across accounts: Total balance for all accounts
   - Cannot sum across time: Sum of daily balances is meaningless
   - Use: Average, maximum, or last value for time aggregation

2. **Inventory Levels**
   - Can sum across products: Total inventory
   - Cannot sum across time: Sum of daily inventory is meaningless
   - Use: Average or snapshot value

3. **Headcount**
   - Can sum across departments: Total employees
   - Cannot sum across time: Sum of daily headcount is meaningless
   - Use: Average or last value

### SQL Examples:

```sql
-- CORRECT: Sum across accounts (not time)
-- Grouped by account_type_key, the column the fact table actually has;
-- join to the account type dimension to show a descriptive name
SELECT 
    account_type_key,
    SUM(account_balance) as total_balance
FROM account_balance_snapshot_fact
WHERE snapshot_date_key = 20240101
GROUP BY account_type_key;

-- CORRECT: Average across time
-- Averages individual account balances over all daily snapshots in the month
SELECT 
    account_type_key,
    AVG(account_balance) as avg_balance,
    MAX(account_balance) as max_balance
FROM account_balance_snapshot_fact
WHERE snapshot_date_key BETWEEN 20240101 AND 20240131
GROUP BY account_type_key;

-- INCORRECT: Summing across time
SELECT 
    account_type_key,
    SUM(account_balance) as total_balance  -- WRONG! Counts each balance once per day
FROM account_balance_snapshot_fact
WHERE snapshot_date_key BETWEEN 20240101 AND 20240131
GROUP BY account_type_key;

### Handling Semi-Additive Facts:

- **Document**: Clearly document which dimensions can be summed
- **BI Tools**: Configure tools to handle semi-additive facts
- **Calculations**: Use appropriate aggregation functions
- **Education**: Train users on proper aggregation
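
The "last value" option for time aggregation can be sketched as a closing balance: pick the last snapshot in the period, then sum across accounts. The table `balance_snapshot_demo` and its rows are invented, and the query assumes every account has a row on the last snapshot date.

```sql
CREATE TABLE balance_snapshot_demo (
    snapshot_date_key INT NOT NULL,
    account_key INT NOT NULL,
    account_type_key INT NOT NULL,
    account_balance DECIMAL(15,2),
    PRIMARY KEY (snapshot_date_key, account_key)
);

INSERT INTO balance_snapshot_demo VALUES
    (20240130, 1, 1, 100.00),
    (20240131, 1, 1, 120.00),
    (20240130, 2, 1, 300.00),
    (20240131, 2, 1, 280.00);

-- Closing balance for January: restrict to the last snapshot date,
-- then sum across accounts (additive within a single date)
-- With the sample data: account_type_key 1, closing_balance 400 (120 + 280)
SELECT 
    account_type_key,
    SUM(account_balance) as closing_balance
FROM balance_snapshot_demo
WHERE snapshot_date_key = (
    SELECT MAX(snapshot_date_key)
    FROM balance_snapshot_demo
    WHERE snapshot_date_key BETWEEN 20240101 AND 20240131
)
GROUP BY account_type_key;
```

Summing all four rows instead would give 800, which double counts both accounts.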

## The Role of Accumulating Snapshot Fact Tables

**Accumulating Snapshot Fact Tables** track a process through multiple milestones until completion.

### Characteristics:
- One row per process instance (e.g., order, claim)
- Multiple date columns for milestones
- Facts accumulate as process progresses
- Updated as process moves through stages

### Examples:
- Order fulfillment (order date, ship date, delivery date)
- Insurance claims (claim date, investigation date, approval date, payment date)
- Manufacturing (start date, assembly date, quality check date, ship date)
- Loan processing (application date, approval date, funding date, closing date)

### Structure:
```sql
CREATE TABLE order_fulfillment_fact (
    order_key INT PRIMARY KEY,
    order_date_key INT,
    requested_ship_date_key INT,
    actual_ship_date_key INT,
    delivery_date_key INT,
    customer_key INT,
    product_key INT,
    order_quantity INT,
    shipped_quantity INT,
    delivered_quantity INT,
    days_to_ship INT,
    days_to_deliver INT,
    order_amount DECIMAL(10,2),
    shipping_cost DECIMAL(10,2)
);
```

### Grain:
- One row = one process instance (e.g., one order)
- Row is updated as process progresses

### Key Features:

1. **Multiple Dates**: Track multiple milestones
2. **Accumulating Facts**: Facts accumulate (e.g., quantity shipped, then delivered)
3. **Lag Calculations**: Calculate time between milestones
4. **Status Tracking**: Track current status of process

### Use Cases:
- Process performance analysis
- Bottleneck identification
- Cycle time analysis
- Workflow monitoring
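
A cycle time query over an accumulating snapshot could look like the sketch below. The table `order_fulfillment_demo`, its rows, and the use of -1 as a "not yet shipped" placeholder date key are assumptions made for illustration.

```sql
CREATE TABLE order_fulfillment_demo (
    order_key INT PRIMARY KEY,
    order_date_key INT NOT NULL,
    actual_ship_date_key INT NOT NULL,  -- -1 = not yet shipped (assumed placeholder)
    days_to_ship INT                    -- NULL until the order ships
);

INSERT INTO order_fulfillment_demo VALUES
    (1, 20240101, 20240103, 2),
    (2, 20240101, 20240106, 5),
    (3, 20240102, -1, NULL);

-- Open orders are counted via the placeholder key;
-- AVG skips the NULL lag of the unshipped order.
-- Multiplying by 1.0 avoids integer averaging in dialects that would truncate.
-- With the sample data: 3 orders, 1 open, average 3.5 days to ship
SELECT 
    COUNT(*) as total_orders,
    SUM(CASE WHEN actual_ship_date_key = -1 THEN 1 ELSE 0 END) as open_orders,
    AVG(days_to_ship * 1.0) as avg_days_to_ship
FROM order_fulfillment_demo;
```

Leaving the lag NULL for unfinished steps keeps in-flight orders from distorting the average.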

## Accumulating Snapshot Fact Table Example

### Order Fulfillment Process:

**Milestones:**
1. Order placed
2. Order confirmed
3. Order picked
4. Order shipped
5. Order delivered

**Fact Table:**
```sql
CREATE TABLE order_fulfillment_fact (
    order_key INT PRIMARY KEY,
    order_number VARCHAR(50),
    
    -- Date Keys
    order_date_key INT,
    confirmation_date_key INT,
    pick_date_key INT,
    ship_date_key INT,
    delivery_date_key INT,
    
    -- Dimension Keys
    customer_key INT,
    product_key INT,
    warehouse_key INT,
    carrier_key INT,
    
    -- Accumulating Facts
    ordered_quantity INT,
    confirmed_quantity INT,
    picked_quantity INT,
    shipped_quantity INT,
    delivered_quantity INT,
    
    -- Lag Calculations
    days_to_confirm INT,
    days_to_pick INT,
    days_to_ship INT,
    days_to_deliver INT,
    total_days_to_deliver INT,
    
    -- Amounts
    order_amount DECIMAL(10,2),
    shipping_cost DECIMAL(10,2),
    
    -- Status
    current_status VARCHAR(50)
);
```

### Update Pattern:

```sql
-- Initial insert when order is placed
INSERT INTO order_fulfillment_fact VALUES (...);

-- Update when order is confirmed
UPDATE order_fulfillment_fact
SET confirmation_date_key = ...,
    confirmed_quantity = ...,
    days_to_confirm = ...
WHERE order_key = ...;

-- Update when order is shipped
UPDATE order_fulfillment_fact
SET ship_date_key = ...,
    shipped_quantity = ...,
    days_to_ship = ...
WHERE order_key = ...;
```
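
A concrete version of this pattern, with invented values, might look like the following. The table `order_fulfillment_update_demo` is a hypothetical reduced copy of the fact table, and -1 is assumed as the placeholder key for a milestone that has not happened yet.

```sql
CREATE TABLE order_fulfillment_update_demo (
    order_key INT PRIMARY KEY,
    order_date_key INT NOT NULL,
    ship_date_key INT NOT NULL,   -- -1 = not yet shipped (assumed placeholder)
    ordered_quantity INT,
    shipped_quantity INT,
    days_to_ship INT,
    current_status VARCHAR(50)
);

-- Order placed on 2024-01-01: later milestones are still empty
INSERT INTO order_fulfillment_update_demo
    (order_key, order_date_key, ship_date_key, ordered_quantity,
     shipped_quantity, days_to_ship, current_status)
VALUES (1, 20240101, -1, 10, NULL, NULL, 'Placed');

-- Order shipped on 2024-01-04: the same row is revisited
UPDATE order_fulfillment_update_demo
SET ship_date_key = 20240104,
    shipped_quantity = 10,
    days_to_ship = 3,             -- computed from the real dates during the load
    current_status = 'Shipped'
WHERE order_key = 1;

-- Still exactly one row for order 1, now with the ship milestone filled in
SELECT * FROM order_fulfillment_update_demo;
```

The lag is computed from the actual dates rather than by subtracting date keys: with `YYYYMMDD` keys, 20240201 minus 20240131 is 70, not 1 day.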

## Why a Factless Fact Table isn't a Contradiction in Terms

**Factless Fact Tables** contain no numeric facts, only foreign keys to dimensions.

### Why They Exist:

Even without numeric measures, these tables record important business events that need to be analyzed.

### Types of Factless Fact Tables:

1. **Event Tracking**
   - Record that an event occurred
   - No numeric measurement
   - Example: Student attendance

2. **Coverage Tables**
   - Record every combination that could have produced an event (e.g., each product on promotion in each store on each day)
   - Compared against an event fact table to find what didn't happen
   - Example: Promoted products that did not sell

### Examples:

**Event Tracking - Student Attendance:**
```sql
CREATE TABLE student_attendance_fact (
    date_key INT,
    student_key INT,
    class_key INT,
    attendance_status_key INT  -- Present, Absent, Excused
    -- No numeric facts, just the event
);
```

**Coverage - Product Promotions:**
```sql
CREATE TABLE product_promotion_fact (
    date_key INT,
    product_key INT,
    promotion_key INT,
    store_key INT
    -- Records that promotion was active, no sales amount
);
```

**Coverage - Product/Store Assortment:**
```sql
CREATE TABLE product_store_coverage_fact (
    date_key INT,
    product_key INT,
    store_key INT
    -- Records which products should be in which stores
    -- Helps identify missing products
);
```

### Use Cases:

- **Attendance Tracking**: Who attended events
- **Coverage Analysis**: What should exist but doesn't
- **Event Analysis**: When events occurred
- **Gap Analysis**: Identify missing combinations

### Analysis Examples:

```sql
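-- Count events (the "fact" is the count)
-- attendance_status_key is an integer, so filter on the dimension's description;
-- attendance_status_dimension and status_name are assumed names
SELECT 
    f.class_key,
    COUNT(*) as attendance_count
FROM student_attendance_fact f
JOIN attendance_status_dimension s
    ON f.attendance_status_key = s.attendance_status_key
WHERE s.status_name = 'Present'
GROUP BY f.class_key;

-- Coverage gaps: days on which an expected product/store combination had no sales
SELECT 
    c.product_key,
    c.store_key,
    COUNT(*) as days_missing
FROM product_store_coverage_fact c
WHERE NOT EXISTS (
    SELECT 1 FROM sales_fact s
    WHERE s.date_key = c.date_key
    AND s.product_key = c.product_key
    AND s.store_key = c.store_key
)
GROUP BY c.product_key, c.store_key;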
```
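
The same idea applies to promotions: comparing a coverage table with a sales table shows which promoted products did not sell. The sketch below is self-contained, and the tables `promotion_coverage_demo` and `sales_demo` and their rows are invented for the example.

```sql
-- Coverage: which products were on promotion, where and when
CREATE TABLE promotion_coverage_demo (
    date_key INT NOT NULL,
    product_key INT NOT NULL,
    store_key INT NOT NULL,
    promotion_key INT NOT NULL
);

-- Events: what actually sold
CREATE TABLE sales_demo (
    date_key INT NOT NULL,
    product_key INT NOT NULL,
    store_key INT NOT NULL,
    sales_amount DECIMAL(10,2)
);

INSERT INTO promotion_coverage_demo VALUES
    (20240101, 10, 1, 5),
    (20240101, 11, 1, 5);

INSERT INTO sales_demo VALUES
    (20240101, 10, 1, 20.00);

-- Anti-join: promoted combinations with no matching sale
-- With the sample data: product 11 in store 1 on 20240101 (promotion 5)
SELECT 
    c.date_key,
    c.product_key,
    c.store_key,
    c.promotion_key
FROM promotion_coverage_demo c
LEFT JOIN sales_demo s
    ON s.date_key = c.date_key
    AND s.product_key = c.product_key
    AND s.store_key = c.store_key
WHERE s.product_key IS NULL;
```

The sales table alone cannot answer this question, because a product that did not sell leaves no row there.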

## Compare the Structure of Fact Tables in Star Schemas vs. Snowflake Schemas

### Star Schema Fact Tables:

**Structure:**
- Direct foreign keys to all dimensions
- No intermediate dimension tables
- Simpler structure
- Fewer joins required

**Example:**
```sql
CREATE TABLE sales_fact (
    date_key INT,
    product_key INT,      -- Direct to product dimension
    customer_key INT,     -- Direct to customer dimension
    store_key INT,        -- Direct to store dimension
    sales_amount DECIMAL(10,2),
    quantity INT
);
```

**Characteristics:**
- All dimensions at same level
- Denormalized dimensions
- Typically faster queries (fewer joins)
- More storage in dimensions

### Snowflake Schema Fact Tables:

**Structure:**
- Foreign keys to normalized dimensions
- Dimensions may reference other dimensions
- More normalized structure
- More joins required

**Example:**
```sql
CREATE TABLE sales_fact (
    date_key INT,
    product_key INT,      -- References product dimension
    customer_key INT,     -- References customer dimension
    store_key INT,        -- References store dimension
    sales_amount DECIMAL(10,2),
    quantity INT
);

-- Product dimension references category
CREATE TABLE product_dimension (
    product_key INT PRIMARY KEY,
    category_key INT,     -- References category dimension
    ...
);

-- Category dimension
CREATE TABLE category_dimension (
    category_key INT PRIMARY KEY,
    ...
);
```

**Characteristics:**
- Hierarchical dimensions
- Normalized dimensions
- More complex queries
- Less storage in dimensions

### Comparison:

| Aspect | Star Schema | Snowflake Schema |
|--------|------------|------------------|
| **Fact Table Structure** | Same | Same |
| **Dimension Access** | Direct | Through hierarchy |
| **Joins Required** | Fewer | More |
| **Query Performance** | Typically faster | Typically slower |
| **Storage** | More in dimensions | Less in dimensions |

**Key Point**: The fact table structure is the same in both schemas. The difference is in how dimensions are structured.
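
To make the join difference concrete, the sketch below runs the same category roll-up against a star-style and a snowflake-style product dimension. All table names carry a `_demo` suffix and are hypothetical, as are the rows.

```sql
-- Star: category is a plain column of the product dimension
CREATE TABLE product_dim_star_demo (
    product_key INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50)
);

-- Snowflake: category is normalized into its own table
CREATE TABLE category_dim_demo (
    category_key INT PRIMARY KEY,
    category_name VARCHAR(50)
);
CREATE TABLE product_dim_snow_demo (
    product_key INT PRIMARY KEY,
    product_name VARCHAR(100),
    category_key INT REFERENCES category_dim_demo(category_key)
);

-- The fact table is identical for both designs
CREATE TABLE sales_fact_schema_demo (
    product_key INT NOT NULL,
    sales_amount DECIMAL(10,2)
);

INSERT INTO product_dim_star_demo VALUES (1, 'Pen', 'Office'), (2, 'Desk', 'Furniture');
INSERT INTO category_dim_demo VALUES (1, 'Office'), (2, 'Furniture');
INSERT INTO product_dim_snow_demo VALUES (1, 'Pen', 1), (2, 'Desk', 2);
INSERT INTO sales_fact_schema_demo VALUES (1, 5.00), (1, 7.00), (2, 150.00);

-- Star: one join reaches the category
SELECT p.category, SUM(f.sales_amount) as total_sales
FROM sales_fact_schema_demo f
JOIN product_dim_star_demo p ON f.product_key = p.product_key
GROUP BY p.category;

-- Snowflake: a second join is needed to reach the category
SELECT c.category_name as category, SUM(f.sales_amount) as total_sales
FROM sales_fact_schema_demo f
JOIN product_dim_snow_demo p ON f.product_key = p.product_key
JOIN category_dim_demo c ON p.category_key = c.category_key
GROUP BY c.category_name;
```

Both queries return the same totals (Office 12, Furniture 150). Only the path through the dimensions differs.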

## SQL for Dimension and Fact Tables

### Creating Dimension Tables:

```sql
-- Product Dimension (Star Schema)
CREATE TABLE product_dimension (
    product_key INT PRIMARY KEY,
    product_id VARCHAR(50),
    product_name VARCHAR(100),
    category VARCHAR(50),
    subcategory VARCHAR(50),
    brand VARCHAR(50),
    price DECIMAL(10,2),
    effective_date DATE,
    expiry_date DATE,
    is_current BOOLEAN
);

-- Create indexes
CREATE INDEX idx_product_id ON product_dimension(product_id);
CREATE INDEX idx_product_category ON product_dimension(category);
```

### Creating Fact Tables:

```sql
-- Sales Fact Table
CREATE TABLE sales_fact (
    date_key INT,
    product_key INT,
    customer_key INT,
    store_key INT,
    sales_amount DECIMAL(10,2),
    quantity INT,
    cost_amount DECIMAL(10,2),
    profit_amount DECIMAL(10,2),
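    -- Assumes a grain of one row per date, product, customer, and store;
    -- a line-item grain would also need a degenerate key such as transaction_number
    PRIMARY KEY (date_key, product_key, customer_key, store_key),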
    FOREIGN KEY (date_key) REFERENCES date_dimension(date_key),
    FOREIGN KEY (product_key) REFERENCES product_dimension(product_key),
    FOREIGN KEY (customer_key) REFERENCES customer_dimension(customer_key),
    FOREIGN KEY (store_key) REFERENCES store_dimension(store_key)
);

-- Create indexes (one per foreign key)
CREATE INDEX idx_sales_date ON sales_fact(date_key);
CREATE INDEX idx_sales_product ON sales_fact(product_key);
CREATE INDEX idx_sales_customer ON sales_fact(customer_key);
CREATE INDEX idx_sales_store ON sales_fact(store_key);
```

### Querying Fact and Dimension Tables:

```sql
-- Star Schema Query
SELECT 
    d.category,
    d.brand,
    SUM(f.sales_amount) as total_sales,
    SUM(f.quantity) as total_quantity,
    COUNT(*) as transaction_count
FROM sales_fact f
JOIN product_dimension d ON f.product_key = d.product_key
JOIN date_dimension dt ON f.date_key = dt.date_key
WHERE dt.year = 2024
    AND dt.quarter = 'Q1'
GROUP BY d.category, d.brand
ORDER BY total_sales DESC;
```

### Best Practices:

- **Surrogate Keys**: Always use in fact tables
- **Indexes**: Index all foreign keys
- **Partitioning**: Partition fact tables by date
- **Constraints**: Use foreign key constraints or validate
- **Null Handling**: Use "Unknown" dimension rows, not NULLs
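
Partitioning syntax varies widely between databases. As one possible sketch, the example below uses PostgreSQL declarative partitioning (version 10 or later) on a hypothetical table, `sales_fact_partitioned_demo`, with quarterly ranges over a `YYYYMMDD` date key. Other systems use different syntax.

```sql
-- PostgreSQL-specific: parent table partitioned by ranges of date_key
CREATE TABLE sales_fact_partitioned_demo (
    date_key INT NOT NULL,
    product_key INT NOT NULL,
    sales_amount DECIMAL(10,2)
) PARTITION BY RANGE (date_key);

-- One partition per quarter; the upper bound is exclusive
CREATE TABLE sales_fact_2024_q1 PARTITION OF sales_fact_partitioned_demo
    FOR VALUES FROM (20240101) TO (20240401);
CREATE TABLE sales_fact_2024_q2 PARTITION OF sales_fact_partitioned_demo
    FOR VALUES FROM (20240401) TO (20240701);

-- Rows are routed to the matching partition automatically
INSERT INTO sales_fact_partitioned_demo VALUES
    (20240215, 10, 20.00),
    (20240510, 10, 30.00);

-- A date-range filter lets the planner skip partitions outside the range
-- With the sample data: q1_sales is 20.00
SELECT SUM(sales_amount) as q1_sales
FROM sales_fact_partitioned_demo
WHERE date_key BETWEEN 20240101 AND 20240331;
```

Partitioning by date also makes it cheaper to drop or archive old periods, since a whole partition can be detached instead of deleting rows.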

## Summarize Fact and Dimension Tables

### Key Takeaways:

1. **Four Fact Table Types**:
   - Transaction: Event-based, one row per event
   - Periodic Snapshot: State at regular intervals
   - Accumulating Snapshot: Process through milestones
   - Factless: Events without numeric measures

2. **Dimension Tables**:
   - Descriptive attributes
   - Surrogate keys
   - Support filtering and grouping
   - Star (denormalized) vs. Snowflake (normalized)

3. **Fact Tables**:
   - Numeric measurements
   - Foreign keys to dimensions
   - Additive facts (with exceptions)
   - Clear grain definition

4. **Design Principles**:
   - Business process focus
   - Clear grain
   - Surrogate keys
   - Proper indexing
   - Referential integrity

### Next Steps:
- Learn about Slowly Changing Dimensions
- Understand ETL design patterns
