# Data Warehousing Architecture

## Introduction to Data Warehousing Architecture

Data warehousing architecture defines how data flows from source systems, through integration and storage layers, to end users. Understanding the main architectural patterns and their trade-offs helps you design solutions that meet your organization's needs.

## Build a Centralized Data Warehouse

A **Centralized Data Warehouse** is a single, unified repository that:
- Stores all enterprise data in one location
- Provides a single source of truth
- Supports enterprise-wide analytics
- Requires significant infrastructure and coordination
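
As a minimal sketch of what "single source of truth" means in practice, the following Python merges customer records from two operational systems into one record per customer. The source names (`crm_rows`, `billing_rows`), the fields, and the choice of a normalized email as the matching key are all assumptions made for illustration; real integrations usually need more robust matching rules.

```python
# Hypothetical extracts from two operational systems describing the same customers.
crm_rows = [{"email": "Ada@Example.com", "name": "Ada Lovelace"}]
billing_rows = [
    {"email": "ada@example.com", "balance": 42.0},
    {"email": "bob@example.com", "balance": 7.5},
]


def integrate(crm, billing):
    """Merge both sources into one record per customer, keyed by normalized email."""
    customers = {}
    for row in crm:
        key = row["email"].strip().lower()
        customers.setdefault(key, {"email": key})["name"] = row["name"]
    for row in billing:
        key = row["email"].strip().lower()
        customers.setdefault(key, {"email": key})["balance"] = row["balance"]
    return customers


warehouse_customers = integrate(crm_rows, billing_rows)
print(warehouse_customers)
```

Both spellings of Ada's email resolve to the same key, so the warehouse holds one integrated record for her instead of two partial ones.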

### Advantages:
- Single source of truth
- Consistent data across the organization
- Centralized management
- Reduced data redundancy

### Disadvantages:
- High initial cost
- Complex implementation
- Longer development cycles
- Potential performance bottlenecks

## Compare a Data Warehouse to a Data Mart

| Aspect | Data Warehouse | Data Mart |
|--------|---------------|-----------|
| **Scope** | Enterprise-wide | Department/subject-specific |
| **Data Volume** | Large (terabytes to petabytes) | Smaller (gigabytes to terabytes) |
| **Users** | Organization-wide | Specific department/users |
| **Implementation Time** | Months to years | Weeks to months |
| **Cost** | High | Lower |
| **Data Granularity** | Detailed and summarized | Usually summarized |
| **Source** | Multiple operational systems | Data warehouse or specific sources |

### Data Mart Types:
1. **Dependent Data Mart**: Sourced from data warehouse
2. **Independent Data Mart**: Built directly from source systems
3. **Hybrid Data Mart**: Combination of warehouse and source data
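
One way to picture a dependent data mart is as a filtered, summarized subset defined over warehouse tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table `warehouse_sales`, the view `sales_mart_emea`, and the sample rows are hypothetical, and a real data mart is often a separately loaded set of tables rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding detailed, enterprise-wide sales rows.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("EMEA", "widget", 100.0),
        ("EMEA", "widget", 50.0),
        ("EMEA", "gadget", 75.0),
        ("APAC", "widget", 200.0),
    ],
)

# Dependent data mart: a summarized, region-specific subset of the warehouse.
conn.execute(
    """
    CREATE VIEW sales_mart_emea AS
    SELECT product, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE region = 'EMEA'
    GROUP BY product
    """
)

mart_rows = conn.execute(
    "SELECT product, total_amount FROM sales_mart_emea ORDER BY product"
).fetchall()
print(mart_rows)
```

An independent data mart would differ only in where its rows come from: it would be loaded straight from a source system rather than defined over `warehouse_sales`.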

## Decide Which Component-Based Architecture is Your Best Fit

### 1. **Inmon Approach (Top-Down)**
- Start with enterprise data warehouse
- Create data marts from warehouse
- Normalized data model (typically third normal form) in the warehouse
- Single source of truth
- Best for: Large enterprises, long-term strategy

### 2. **Kimball Approach (Bottom-Up)**
- Start with data marts built around individual business processes
- Integrate the marts into a data warehouse through shared (conformed) dimensions
- Dimensional data model (star schema)
- Faster time to value
- Best for: Quick wins, departmental needs
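
To make the dimensional model concrete, here is a minimal star schema sketch using Python's built-in `sqlite3` module: one fact table surrounded by two dimension tables. The table names (`fact_sales`, `dim_date`, `dim_product`), columns, and rows are hypothetical examples, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the context of each fact.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- The fact table stores measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL
    );

    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
    INSERT INTO fact_sales VALUES
        (20240101, 1, 2, 20.0),
        (20240101, 2, 1, 5.0),
        (20240201, 1, 3, 30.0);
    """
)

# A typical analytical query: join the fact to its dimensions and aggregate.
sales_by_category = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
print(sales_by_category)
```

Every query follows the same shape (fact joined to dimensions, then grouped), which is a large part of why star schemas are considered approachable for departmental reporting.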

### 3. **Hybrid Approach**
- Combine both methodologies
- Use data warehouse for enterprise data
- Use data marts for specific needs
- Flexible and scalable
- Best for: Organizations needing both approaches

### Decision Factors:
- **Business Requirements**: Enterprise-wide vs. departmental
- **Timeline**: Long-term vs. quick wins
- **Resources**: Budget and team size
- **Data Complexity**: Simple vs. complex integration needs

## Include Cubes in Your Data Warehousing Environment

**OLAP Cubes** (Online Analytical Processing) are multidimensional data structures that:
- Hold pre-aggregated data for fast queries
- Are organized by dimensions (such as time, product, or region) and measures (such as sales amount)
- Are optimized for analytical queries
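
The idea of pre-aggregation can be sketched in a few lines of plain Python. The example below sums one measure for every combination of dimensions ahead of time, so that later queries are lookups rather than scans. The fact rows, the dimension names, and the `build_cube` helper are hypothetical; real OLAP engines use far more compact storage and do not necessarily materialize every combination.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: two dimensions (region, product) and one measure (amount).
facts = [
    {"region": "EMEA", "product": "widget", "amount": 100.0},
    {"region": "EMEA", "product": "gadget", "amount": 75.0},
    {"region": "APAC", "product": "widget", "amount": 200.0},
]
dimensions = ("region", "product")


def build_cube(rows, dims, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(float)
    for row in rows:
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                # A cell is identified by the dimension values it fixes.
                cell = tuple((dim, row[dim]) for dim in subset)
                cube[cell] += row[measure]
    return dict(cube)


cube = build_cube(facts, dimensions, "amount")

# Queries become dictionary lookups instead of scans over the fact rows.
grand_total = cube[()]
emea_total = cube[(("region", "EMEA"),)]
emea_widget = cube[(("region", "EMEA"), ("product", "widget"))]
print(grand_total, emea_total, emea_widget)
```

The trade-off is visible even at this scale: three fact rows produce eight stored cells, which is why cubes exchange storage and build time for query speed.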

### Cube Benefits:
- Fast query performance
- Pre-calculated aggregations
- Support for complex analytical queries
- User-friendly for business users

### Cube Types:
1. **MOLAP** (Multidimensional OLAP): Stores data in multidimensional arrays
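2. **ROLAP** (Relational OLAP): Keeps data in a relational database and computes aggregations with SQL at query time
3. **HOLAP** (Hybrid OLAP): Combines MOLAP and ROLAP, typically keeping aggregations in multidimensional storage and detailed data in relational tables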

### When to Use Cubes:
- Complex analytical queries
- Need for fast query performance
- Pre-defined analytical requirements
- Business user self-service analytics

## Include Operational Data Stores in Your Data Warehousing Environment

An **Operational Data Store (ODS)** is a database designed for operational reporting. It:
- Contains current, integrated operational data
- Sits between operational systems and the data warehouse
- Supports near real-time operational queries

### ODS Characteristics:
- **Current Data**: Contains up-to-date operational data
- **Integrated**: Data from multiple sources
- **Subject-Oriented**: Organized by business subjects
- **Volatile**: Data is updated in place, so earlier values are overwritten rather than kept as history

### ODS vs. Data Warehouse:

| Aspect | ODS | Data Warehouse |
|--------|-----|---------------|
| **Purpose** | Operational reporting | Analytical reporting |
| **Data Currency** | Current/real-time | Historical |
| **Update Frequency** | Frequent updates | Periodic loads |
| **Data Volume** | Moderate | Large |
| **Query Type** | Simple, operational | Complex, analytical |
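
The difference in data currency can be sketched with Python's built-in `sqlite3` module: the ODS keeps one current row per order and overwrites it, while the warehouse appends every observed state. The table names (`ods_orders`, `dw_order_history`) and the `record_status` helper are hypothetical, and writing to both tables in one call is a simplification, since a warehouse is usually loaded periodically rather than on each change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- ODS: one current row per order, overwritten on change.
    CREATE TABLE ods_orders (order_id INTEGER PRIMARY KEY, status TEXT);
    -- Warehouse: every observed state is appended with a load timestamp.
    CREATE TABLE dw_order_history (order_id INTEGER, status TEXT, loaded_at TEXT);
    """
)


def record_status(order_id, status, loaded_at):
    # INSERT OR REPLACE overwrites the existing ODS row for this key.
    conn.execute(
        "INSERT OR REPLACE INTO ods_orders VALUES (?, ?)", (order_id, status)
    )
    # The warehouse table only ever grows.
    conn.execute(
        "INSERT INTO dw_order_history VALUES (?, ?, ?)",
        (order_id, status, loaded_at),
    )


record_status(1, "placed", "2024-01-01T09:00")
record_status(1, "shipped", "2024-01-02T14:00")

ods_rows = conn.execute("SELECT order_id, status FROM ods_orders").fetchall()
history_rows = conn.execute(
    "SELECT status FROM dw_order_history ORDER BY loaded_at"
).fetchall()
print(ods_rows)
print(history_rows)
```

After both updates the ODS answers "what is the order's status now?", while the warehouse can also answer "what was its status yesterday?".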

### When to Use ODS:
- Need for current operational data
- Real-time or near real-time reporting
- Integration layer before data warehouse
- Operational decision support

## Explore the Role of the Staging Layer Inside a Data Warehouse

The **Staging Layer** is a temporary storage area for data in transit. It:
- Is used during ETL (extract, transform, load) processes
- Acts as a buffer between source and target systems
- Supports data transformation and validation

### Staging Layer Functions:
1. **Data Extraction**: Receive data from source systems
2. **Data Validation**: Check data quality and integrity
3. **Data Transformation**: Prepare data for loading
4. **Error Handling**: Manage data quality issues
5. **Audit Trail**: Track data lineage and changes
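
The five functions above can be sketched as a small Python pipeline. Everything here is illustrative: the record layout, the validation rules, the `run_staging` helper, and the source name `crm_export` are assumptions, and a production staging layer would typically live in database tables or files rather than in memory.

```python
from datetime import datetime, timezone

# Hypothetical raw records as they might arrive from a source system.
source_rows = [
    {"id": "1", "email": " Ada@Example.com ", "amount": "10.50"},
    {"id": "2", "email": "", "amount": "3.00"},
    {"id": "3", "email": "bob@example.com", "amount": "not-a-number"},
]


def validate(row):
    """Return a list of data quality problems found in one staged row."""
    problems = []
    if not row["email"].strip():
        problems.append("missing email")
    try:
        float(row["amount"])
    except ValueError:
        problems.append("amount is not numeric")
    return problems


def transform(row):
    """Normalize types and formatting so the row is ready to load."""
    return {
        "id": int(row["id"]),
        "email": row["email"].strip().lower(),
        "amount": float(row["amount"]),
    }


def run_staging(rows, source_name):
    staged = list(rows)  # 1. extraction: land the raw rows
    clean, rejected = [], []
    for row in staged:
        problems = validate(row)  # 2. validation
        if problems:
            # 4. error handling: keep the bad row and the reason it failed
            rejected.append({"row": row, "problems": problems})
        else:
            clean.append(transform(row))  # 3. transformation
    audit = {  # 5. audit trail: what arrived, from where, and what happened to it
        "source": source_name,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "rows_received": len(staged),
        "rows_loaded": len(clean),
        "rows_rejected": len(rejected),
    }
    return clean, rejected, audit


clean, rejected, audit = run_staging(source_rows, "crm_export")
print(clean)
print(rejected)
print(audit)
```

Rejected rows are kept alongside the reason they failed instead of being dropped silently, which is what lets the staging layer support both error handling and auditing.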

### Benefits:
- Isolates source systems from warehouse
- Enables data quality checks
- Supports incremental processing
- Provides audit capabilities
- Allows for data recovery
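
Incremental processing is often implemented with a watermark: each run extracts only the rows changed since the previous run. The sketch below assumes every source row carries an `updated_at` value in ISO 8601 format (which sorts correctly as a string); the field name and the `extract_incremental` helper are hypothetical.

```python
def extract_incremental(source_rows, last_watermark):
    """Return rows changed after the watermark, plus the new watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > last_watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


source = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00"},
    {"id": 3, "updated_at": "2024-01-03T00:00:00"},
]

first_batch, watermark = extract_incremental(source, "2024-01-01T12:00:00")
second_batch, watermark_after = extract_incremental(source, watermark)
print([r["id"] for r in first_batch], watermark)
print(second_batch, watermark_after)
```

The first run picks up rows 2 and 3. The second run, started from the new watermark, finds nothing to stage because the source has not changed.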

## Compare the Two Types of Staging Layers

### 1. **Persistent Staging Area (PSA)**
- Data is retained permanently after processing
- Maintains a historical record of all loads, providing a full audit trail
- Supports data recovery and reprocessing
- Requires more storage space

**Use Cases:**
- Regulatory compliance
- Data lineage requirements
- Need for reprocessing
- Audit requirements

### 2. **Transient Staging Area (TSA)**
- Data is deleted after successful processing, so storage is temporary only
- Minimal storage footprint
- Faster ETL processing, because no load history is kept

**Use Cases:**
- Limited storage resources
- Simple ETL processes
- No audit requirements
- High-volume, high-frequency loads

### Comparison:

| Aspect | Persistent Staging | Transient Staging |
|--------|-------------------|-------------------|
| **Data Retention** | Permanent | Temporary |
| **Storage** | Higher | Lower |
| **Audit Trail** | Complete | Limited |
| **Recovery** | Full recovery | Limited recovery |
| **Performance** | Slower (more data) | Faster |
| **Cost** | Higher | Lower |
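
The retention and recovery rows of this comparison can be sketched with Python's built-in `sqlite3` module by running the same two loads through both kinds of staging table. The table names (`stg_persistent`, `stg_transient`, `dw_customer`), the `load_id` column, and the `load_batch` helper are hypothetical. A real design would choose one staging style rather than maintain both side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_persistent (load_id INTEGER, customer_id INTEGER, name TEXT);
    CREATE TABLE stg_transient (customer_id INTEGER, name TEXT);
    CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    """
)


def load_batch(load_id, rows):
    # Persistent staging: append rows tagged with the load that delivered them.
    conn.executemany(
        "INSERT INTO stg_persistent VALUES (?, ?, ?)",
        [(load_id, cid, name) for cid, name in rows],
    )
    # Transient staging: hold rows only until the warehouse load completes.
    conn.executemany("INSERT INTO stg_transient VALUES (?, ?)", rows)
    conn.execute(
        "INSERT OR REPLACE INTO dw_customer "
        "SELECT customer_id, name FROM stg_transient"
    )
    conn.execute("DELETE FROM stg_transient")  # cleared after a successful load


load_batch(1, [(1, "Ada"), (2, "Bob")])
load_batch(2, [(2, "Robert")])

persistent_count = conn.execute("SELECT COUNT(*) FROM stg_persistent").fetchone()[0]
transient_count = conn.execute("SELECT COUNT(*) FROM stg_transient").fetchone()[0]

# Only the persistent area can replay what load 1 delivered.
replay_load_1 = conn.execute(
    "SELECT customer_id, name FROM stg_persistent "
    "WHERE load_id = 1 ORDER BY customer_id"
).fetchall()
print(persistent_count, transient_count, replay_load_1)
```

After two loads the persistent table still holds all three delivered rows, including Bob's original name, while the transient table is empty. That is the storage cost and the recovery benefit from the table in miniature.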

## Summarize Data Warehousing Architecture

### Key Components:
1. **Source Systems**: Operational databases and applications
2. **Staging Layer**: Landing area for ETL processing (transient or persistent)
3. **Data Warehouse**: Central repository for integrated data
4. **Data Marts**: Department-specific data subsets
5. **OLAP Cubes**: Multidimensional analytical structures
6. **ODS**: Operational data store for current data
7. **BI Tools**: Reporting and analytics layer

### Architectural Patterns:
- **Centralized**: Single enterprise warehouse
- **Distributed**: Multiple warehouses/data marts
- **Hybrid**: Combination of centralized and distributed

### Design Considerations:
- Business requirements and use cases
- Data volume and velocity
- Performance requirements
- Budget and resources
- Compliance and audit needs
