# Gold Layer: Fact Sales Transformation
This notebook is the final stage of the **Medallion Architecture**.
It builds the **Fact Sales** table by joining the Silver sales data with the Gold dimensions.

### Key Objectives:
1. **Surrogate Key Mapping**: Replace natural IDs with Gold surrogate keys.
2. **Business Logic**: Calculate `profit_amount` (Sales - Total Cost).
3. **Data Integrity**: Ensure all sales records are linked to valid customers and products.

Output: Delta table `workspace.gold.fact_sales`.

In [0]:
%python
import pyspark.sql.functions as F

# Define source and target tables (three-part names: catalog.schema.table)
SILVER_SALES = "workspace.silver.crm_sales"
GOLD_CUST    = "workspace.gold.dim_customers"
GOLD_PROD    = "workspace.gold.dim_products"
GOLD_TARGET  = "workspace.gold.fact_sales"

### Transformation Logic
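We use `LEFT JOIN`s to enrich the sales data, so every sales row is kept even when it has no match in a dimension.
If a product or customer is missing from the dimensions, its surrogate key is `NULL` in the fact table, and the integrity check at the end of the notebook flags it.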
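
The `LEFT JOIN` behaviour described above can be sketched outside Spark. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for Spark SQL (the join semantics shown are standard SQL). The sample rows, including the unmatched product number `P-999`, are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for Spark SQL here; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_sales (order_number TEXT, product_number TEXT, customer_id INTEGER);
CREATE TABLE dim_products (product_key INTEGER, product_number TEXT);
CREATE TABLE dim_customers (customer_key INTEGER, customer_id INTEGER);

-- 'P-999' has no row in dim_products, so it cannot be mapped to a surrogate key.
INSERT INTO crm_sales VALUES ('SO1', 'P-100', 1), ('SO2', 'P-999', 2);
INSERT INTO dim_products VALUES (10, 'P-100');
INSERT INTO dim_customers VALUES (501, 1), (502, 2);
""")

# Same join shape as the notebook query, reduced to the key columns.
rows = conn.execute("""
SELECT sd.order_number, pr.product_key, cu.customer_key
FROM crm_sales sd
LEFT JOIN dim_products pr ON sd.product_number = pr.product_number
LEFT JOIN dim_customers cu ON sd.customer_id = cu.customer_id
ORDER BY sd.order_number
""").fetchall()

for row in rows:
    print(row)  # the unmatched order keeps its row, with product_key = None
```

An `INNER JOIN` would silently drop order `SO2`. With the `LEFT JOIN` the row survives with a `NULL` product key, which is what makes the later audit possible.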

In [0]:
%python
# Transformation Logic: Using LEFT JOIN to enrich sales with Surrogate Keys
# We map sales to dimensions using business keys (product_number and customer_id)
query = f"""
SELECT
    sd.order_number,
    pr.product_key,   
    cu.customer_key,  
    sd.order_date,
    sd.ship_date,
    sd.due_date,
    sd.sales_amount,
    sd.quantity,
    sd.price
FROM {SILVER_SALES} sd
LEFT JOIN {GOLD_PROD} pr
    ON sd.product_number = pr.product_number  
LEFT JOIN {GOLD_CUST} cu
    ON sd.customer_id = cu.customer_id        
"""

# Execute and add metadata
df_fact = spark.sql(query).withColumn("gold_ingestion_ts", F.current_timestamp())

# Write to Gold Layer
df_fact.write.mode("overwrite").format("delta").saveAsTable(GOLD_TARGET)

print("✅ Fact Sales created successfully. Mapped via 'product_number' and 'customer_id'.")

## Data Quality & Referential Integrity

These queries verify that every sales transaction is linked to a valid product and customer by counting fact rows whose surrogate key is `NULL`. Any non-zero `issue_count` indicates a "Late Arriving Dimension" or a missing record in the source system.

In [0]:

%sql
-- Comprehensive Integrity Audit
SELECT 
    'Product Keys Missing' AS check_type, 
    COUNT(*) AS issue_count 
FROM workspace.gold.fact_sales 
WHERE product_key IS NULL

UNION ALL

SELECT 
    'Customer Keys Missing' AS check_type, 
    COUNT(*) AS issue_count 
FROM workspace.gold.fact_sales 
WHERE customer_key IS NULL;
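
If the audit should stop a pipeline run rather than only display counts, the result rows can be checked in Python. The sketch below is one possible approach, not part of the original notebook: the helper names `find_integrity_failures` and `assert_integrity` are hypothetical, and the sample counts are invented to show the failure path.

```python
def find_integrity_failures(audit_rows):
    """Return only the checks whose issue_count is non-zero.

    audit_rows is an iterable of (check_type, issue_count) pairs,
    shaped like the output of the audit query above.
    """
    return {check: count for check, count in audit_rows if count != 0}


def assert_integrity(audit_rows):
    """Raise ValueError if any check reports unmapped fact rows."""
    failures = find_integrity_failures(audit_rows)
    if failures:
        raise ValueError(f"Referential integrity check failed: {failures}")


# Hypothetical sample shaped like the audit output (counts are made up).
sample = [("Product Keys Missing", 0), ("Customer Keys Missing", 3)]
failures = find_integrity_failures(sample)
print(failures)

# In the notebook, assuming the audit SQL is held in a string named audit_sql,
# the pairs could be built from the Spark result:
#   rows = [(r["check_type"], r["issue_count"]) for r in spark.sql(audit_sql).collect()]
#   assert_integrity(rows)
```

Raising an exception makes a scheduled job fail visibly when a dimension record is missing, instead of relying on someone reading the audit output.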