# Why should you use surrogate keys in dimension and fact tables?

Surrogate keys are fundamental in dimensional modeling, particularly in data warehouse design.

1. Uniqueness & simplicity:
* What: Surrogate keys are artificial, unique identifiers that represent a record in a table.
* Why important: Unlike natural keys (like a customer's name or email), surrogate keys are simple and stay unique even if the underlying data changes. This makes them well suited to maintaining referential integrity between dimension and fact tables.

2. Decoupling from business logic:
* What: Surrogate keys are independent of the business meaning of the data.
* Why important: Business rules change over time (e.g., changes in product codes or customer IDs), but surrogate keys remain stable, so the structure of the data warehouse is unaffected by these changes.

3. Consistency across systems:
* What: Surrogate keys help to unify data from multiple systems.
* Why important: If you integrate data from different systems, the same customer might have different IDs in different systems. Surrogate keys provide a uniform identifier across these systems, simplifying joins and analyses.
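
As a minimal sketch of this idea, assume two hypothetical source systems (`crm` and `billing`) whose records for the same customer can be matched on email address. The function name `assign_customer_keys`, the sample IDs and the email-matching rule are all illustrative assumptions; real integrations usually need more careful matching.

```python
from itertools import count


def assign_customer_keys(records):
    """Map each (source_system, source_id) pair to a warehouse surrogate key.

    Assumption: records from different systems describe the same customer
    when their lower-cased email addresses match.
    """
    next_key = count(start=1)   # simple integer sequence for surrogate keys
    key_by_email = {}           # email -> surrogate key
    key_map = {}                # (source_system, source_id) -> surrogate key

    for system, source_id, email in records:
        email = email.strip().lower()
        if email not in key_by_email:
            key_by_email[email] = next(next_key)
        key_map[(system, source_id)] = key_by_email[email]
    return key_map


# Hypothetical extracts: the same person has a different ID in each system.
records = [
    ("crm", "C-1001", "ana@example.com"),
    ("billing", "98765", "Ana@Example.com"),
    ("crm", "C-1002", "bo@example.com"),
]

key_map = assign_customer_keys(records)
print(key_map[("crm", "C-1001")] == key_map[("billing", "98765")])  # True
```

Both source IDs resolve to the same surrogate key, so fact rows loaded from either system join to a single customer record.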

4. Handling slowly changing dimensions (SCD):
* What: Surrogate keys are critical when dealing with slowly changing dimensions, especially Type 2 SCD.
* Why important: Surrogate keys allow you to track historical changes. For example, if a customer's address changes, a new row with a new surrogate key can represent the new version, so the warehouse stores both the old and new addresses while maintaining referential integrity.
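
The address-change example can be sketched roughly as follows. This is an in-memory illustration only: the column names (`valid_from`, `valid_to`, `is_current`), the helper `apply_scd2` and the sample customer are assumptions, and a real warehouse would do this in its load process.

```python
from datetime import date

# Hypothetical in-memory customer dimension; each row is one version of a customer.
dim_customer = []


def apply_scd2(customer_id, address, change_date):
    """Close the current row for customer_id (if any) and add a new version."""
    for row in dim_customer:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["address"] == address:
                return row["customer_key"]      # nothing changed, reuse the row
            row["valid_to"] = change_date       # expire the old version
            row["is_current"] = False
    new_row = {
        "customer_key": len(dim_customer) + 1,  # new surrogate key per version
        "customer_id": customer_id,             # natural key stays the same
        "address": address,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    }
    dim_customer.append(new_row)
    return new_row["customer_key"]


old_key = apply_scd2("CUST-42", "1 Old Street", date(2023, 1, 1))
new_key = apply_scd2("CUST-42", "9 New Avenue", date(2024, 6, 1))

# Facts loaded before the move keep old_key, so they still resolve to the old address.
for row in dim_customer:
    print(row["customer_key"], row["address"], row["is_current"])
```

The natural key `CUST-42` appears on both rows, which is why it cannot serve as the primary key of the dimension; the surrogate key is what distinguishes the two versions.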

5. Efficient indexing & performance:
* What: Surrogate keys are compact, single-column values that are cheap to index.
* Why important: An index on a small surrogate key is generally smaller and faster to search than an index on a long, complex natural key (e.g., text fields or composite keys).

6. Join efficiency:
* What: Surrogate keys streamline joins between fact and dimension tables.
* Why important: Since surrogate keys are often smaller, single-field keys (usually integers), they improve the speed and efficiency of joins in large data warehouses, reducing query execution time.

* Summary: In both dimension and fact tables, surrogate keys provide simplicity, stability and efficiency, especially in scenarios involving historical data, complex joins and integration across systems. They also help manage evolving business data without disrupting the data warehouse.

# How do you model to associate a fact table with its dimensions?
1. Fact table contains foreign keys:
* What: The fact table holds foreign keys to link to the corresponding dimension tables.
* Why important: Each foreign key in the fact table references the surrogate key in a dimension table, which enables easy lookups and joins. This is the core structure that links dimensions (e.g., products, customers, time) to the metrics in the fact table (e.g., sales, revenue).
Example: In a sales fact table:
fact_sales:
    customer_key (foreign key to customer dimension)
    product_key (foreign key to product dimension)
    date_key (foreign key to date dimension)
2. Granularity of the fact table:
* What: The granularity of a fact table defines the level of detail it records.
* Why important: The grain must align with the level of each dimension. If a fact table tracks sales at the daily, product, and customer level, it must have a foreign key for each of those dimensions. This ensures that metrics (like sales) match the appropriate level of detail (e.g., daily sales per product per customer).
3. Conformed dimensions:
* What: Conformed dimensions are dimensions that can be shared across different fact tables.
* Why important: These standardized dimensions ensure consistency across the data warehouse. For example, if both sales and inventory facts use the same product dimension, they'll share a common product_key, allowing easy cross-table reporting.
* Example: The date dimension may be shared across multiple fact tables, such as sales and returns, ensuring that dates are consistently represented across all analyses.
4. Fact table measures and aggregation:
* What: The measures in a fact table are typically numeric values that can be aggregated (e.g. sum, count, average).
* Why important: Dimensions provide the context for these measures. When querying the data, the foreign key relationships between fact and dimension tables allow you to roll up measures by any dimension (e.g., total sales by customer, product or date).

Example in fact_sales table:
* Measures: sales_amount, quantity_sold
* Foreign keys: product_key, customer_key, date_key
* You can sum sales_amount by any dimension (e.g. total sales by product, by customer or by date).
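
A roll-up like this can be tried out with Python's built-in `sqlite3` module. The table and column names follow the example above; the sample rows and the `product_name` column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (
    product_key   INTEGER REFERENCES dim_product (product_key),
    customer_key  INTEGER,
    date_key      INTEGER,
    sales_amount  REAL,
    quantity_sold INTEGER
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES
    (1, 10, 20240101, 100.0, 2),
    (1, 11, 20240101,  50.0, 1),
    (2, 10, 20240102,  75.0, 3);
""")

# Roll sales_amount up to the product level by joining on the surrogate key.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(sales_by_product)  # [('Gadget', 75.0), ('Widget', 150.0)]
```

Grouping by `customer_key` or `date_key` instead gives the same measure rolled up by a different dimension, without changing the fact table.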

5. Star schema design:
* What: The star schema is the simplest and most common data warehouse schema, with the fact table at the center and dimension tables around it.
* Why important: This design is intuitive and efficient for querying. The relationships between fact and dimension tables create a "star" shape, which is easy to understand and allows for fast query performance.

Example in a sales star schema:
* Fact table: fact_sales
* Dimension tables: dim_customer, dim_product, dim_date
* Queries can easily retrieve facts (e.g. sales) linked to dimensions (e.g. products, customers, dates).

6. Foreign key integrity:
* What: Foreign key relationships between fact and dimension tables enforce referential integrity.
* Why important: Every foreign key in the fact table must match a primary key in the dimension table. This ensures that all facts are linked to valid dimension records, avoiding orphaned or mismatched data.
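
To see the rule in action, here is a small sketch using Python's built-in `sqlite3` module. SQLite is used only because it is easy to run; the trimmed-down tables and the key value `999` are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT)")
conn.execute("""
    CREATE TABLE fact_sales (
        product_key  INTEGER NOT NULL REFERENCES dim_product (product_key),
        sales_amount REAL
    )
""")
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")

# A fact row that points at an existing dimension row is accepted.
conn.execute("INSERT INTO fact_sales VALUES (1, 100.0)")

# A fact row with no matching dimension row is rejected as an orphan.
try:
    conn.execute("INSERT INTO fact_sales VALUES (999, 50.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```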

* Summary: To model a fact table with its dimensions, focus on the following key concepts:

* Fact tables contain foreign keys pointing to dimension tables.
* The granularity of the fact table must align with the dimensions.
* Use conformed dimensions across fact tables for consistency.
* Organize data in a star schema for simplicity and performance.

# Why should you avoid pre-slicing and pre-dicing data?

1. Business requirements change:
* What: Business needs evolve over time.
* Why important: If you predefine how data should be sliced and diced, you risk building a structure that doesn't adapt easily when new questions arise. The data warehouse should be flexible enough to accommodate unforeseen queries, metrics and dimensions.
* Example: If you pre-slice sales data by region and product type, you might later find the business needs a breakdown by customer segment, which wasn't anticipated.

2. Reduces flexibility:
* What: Pre-slicing data locks it into specific categories or groupings.
* Why important: This lack of flexibility limits users from performing ad hoc analysis. The goal of a data warehouse is to empower users to explore data dynamically without being constrained by predefined structures.
* Example: If sales data is stored only as totals by region and product type, an analyst cannot run an ad hoc breakdown by customer segment without going back and rebuilding the tables.

3. Data explosion:
* What: Pre-slicing and pre-dicing can result in data explosion, where the warehouse holds a massive number of unnecessary aggregated views.
* Why important: Storing pre-aggregated data at many different levels increases storage needs and maintenance complexity, without guaranteeing that the aggregates will be useful in the future. It's better to store raw, detailed data and let users aggregate as needed.
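
A quick way to see how fast the number of aggregates grows is to count the possible groupings. The four dimension names below are hypothetical; the point is only that every non-empty combination of dimensions is a candidate pre-aggregated table.

```python
from itertools import combinations

# Hypothetical dimensions that reports might be grouped by.
dimensions = ["region", "product_type", "customer_segment", "month"]

# Every non-empty subset of dimensions is a distinct pre-aggregated table.
groupings = [
    combo
    for size in range(1, len(dimensions) + 1)
    for combo in combinations(dimensions, size)
]
print(len(groupings))  # 15
```

With n dimensions there are 2^n - 1 such groupings, so each added dimension roughly doubles the number of aggregates that would have to be built and maintained.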

4. Loss of granularity:
* What: Pre-aggregating data leads to a loss of detail (granularity).
* Why important: When the data is pre-aggregated, you lose access to the most detailed level of information. This can be problematic when deeper analysis is needed. Keeping data at its most granular level allows maximum flexibility in analysis.
* Example: If sales data is pre-aggregated by month, you lose the ability to analyze daily trends without reverting to the raw data.
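
The asymmetry can be shown with a few made-up rows: daily data can always be rolled up to months, but monthly totals cannot be split back into days. The sample dates, amounts and the helper `roll_up_to_month` are illustrative only.

```python
from collections import defaultdict

# Hypothetical daily sales rows: (ISO date, sales_amount).
daily_sales = [
    ("2024-01-05", 100.0),
    ("2024-01-20", 300.0),
    ("2024-02-03", 50.0),
]


def roll_up_to_month(rows):
    """Aggregate daily rows to monthly totals keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day[:7]] += amount
    return dict(totals)


monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)  # {'2024-01': 400.0, '2024-02': 50.0}

# Going the other way is not possible: the January total of 400.0 says nothing
# about which days the sales happened on, or how they were split between them.
```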

5. Modern tools and performance:
* What: Modern data processing tools and databases (like columnar storage, in-memory processing and OLAP cubes) are optimized for on-demand slicing and dicing.
* Why important: The need for pre-slicing data has diminished because modern tools can efficiently handle dynamic queries and aggregations. It's better to let users define their own slices based on their current needs rather than trying to anticipate every potential query.
* Example: A modern OLAP system can handle slicing sales data by any dimension (e.g. product, region, customer) on the fly, without needing pre-aggregated tables.

* Summary: Pre-slicing and pre-dicing data reduces flexibility, risks data explosion, and leads to a loss of granularity, while failing to adapt to changing business requirements. Modern tools are optimized for on-demand data slicing, making it more efficient to keep detailed, raw data and let users aggregate based on their needs.