## Test 1: Referential Integrity Check

Description: Checking that every sale has both a product and a customer assigned, i.e. that neither product_key nor customer_key is NULL in the fact table.

Expected Result: Both missing_products and missing_customers should be 0.

%sql
SELECT 
    'Fact Sales' AS table_name,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN product_key IS NULL THEN 1 ELSE 0 END) AS missing_products,
    SUM(CASE WHEN customer_key IS NULL THEN 1 ELSE 0 END) AS missing_customers
FROM workspace.gold.fact_sales;
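A NULL check confirms that the keys are populated, but a key can be non-NULL and still point at a product or customer that does not exist in the dimension table. The sketch below shows a complementary orphan-key check using an anti-join (`LEFT JOIN ... WHERE p.product_key IS NULL`). It runs against an in-memory SQLite database filled with hypothetical toy rows that only mirror the column names used above; that the same SQL pattern carries over unchanged to the Databricks Gold tables is an assumption.

```python
import sqlite3

# Hypothetical toy data mirroring the Gold column names (not real pipeline data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_products (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (order_number TEXT, product_key INTEGER,
                         customer_key INTEGER, sales_amount REAL);
INSERT INTO dim_products VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO fact_sales VALUES
    ('SO1', 1, 10, 100.0),
    ('SO2', 2, 11, 25.0),
    ('SO3', 99, 12, 40.0);  -- product_key 99 has no dimension row
""")

# Test 1 style check: only counts NULL keys.
null_products = conn.execute(
    "SELECT SUM(CASE WHEN product_key IS NULL THEN 1 ELSE 0 END) FROM fact_sales"
).fetchone()[0]

# Anti-join: non-NULL fact keys with no matching dimension row (orphans).
orphan_products = conn.execute("""
    SELECT COUNT(*)
    FROM fact_sales f
    LEFT JOIN dim_products p ON f.product_key = p.product_key
    WHERE f.product_key IS NOT NULL AND p.product_key IS NULL
""").fetchone()[0]

print(null_products, orphan_products)  # 0 1
```

The toy fact row with `product_key` 99 passes the NULL check but is caught by the anti-join. On the real tables both counts would be expected to be 0, and the same pattern applies to `customer_key` against `dim_customers`.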

## Test 2: Join Coverage Analysis

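Description: Confirming that the number of sales with a product key and the number with a customer key both match the total number of sales exactly. Because COUNT(column) counts only non-NULL values, this query verifies that the keys are populated; it does not join to the dimension tables, so it cannot detect keys that have no matching dimension row.

Expected Result: total_sales, linked_products, and linked_customers must be identical, confirming that no sale is missing either key.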

%sql
SELECT 
    'Join Analysis' AS check_type,
    COUNT(*) AS total_sales,
    COUNT(product_key) AS linked_products,
    COUNT(customer_key) AS linked_customers
FROM workspace.gold.fact_sales;
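To measure join coverage directly, one option is to LEFT JOIN the fact table to a dimension and count how many fact rows actually found a match, alongside the populated-key count used above. This is a minimal sketch on hypothetical toy rows in an in-memory SQLite database; the `coverage_pct` column is an illustrative addition, not part of the original test.

```python
import sqlite3

# Hypothetical toy data: one sale references a customer_key missing from the dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customers (customer_key INTEGER PRIMARY KEY);
CREATE TABLE fact_sales (order_number TEXT, customer_key INTEGER);
INSERT INTO dim_customers VALUES (10), (11), (12);
INSERT INTO fact_sales VALUES ('SO1', 10), ('SO2', 11), ('SO3', 12), ('SO4', 77);
""")

total, populated, matched, pct = conn.execute("""
    SELECT
        COUNT(*)                                           AS total_sales,
        COUNT(f.customer_key)                              AS populated_keys,
        COUNT(c.customer_key)                              AS matched_keys,
        ROUND(100.0 * COUNT(c.customer_key) / COUNT(*), 2) AS coverage_pct
    FROM fact_sales f
    LEFT JOIN dim_customers c ON f.customer_key = c.customer_key
""").fetchone()

print(total, populated, matched, pct)  # 4 4 3 75.0
```

Every toy sale has a populated key (4 of 4), yet only 3 of 4 join to the dimension. The sketch assumes `customer_key` is unique in the dimension; duplicate dimension keys would multiply fact rows in the join and inflate the counts.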

## Test 3: Revenue Summary by Category

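Description: Showing total revenue and quantity by category and subcategory to check whether the numbers make sense from a business perspective. Because the query uses an inner JOIN, sales whose product_key has no match in dim_products are excluded from these totals.

Expected Result: A list of category/subcategory combinations ranked by revenue, with no NULL categories and plausible financial totals.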

%sql
SELECT 
    p.category,
    p.subcategory,
    ROUND(SUM(f.sales_amount), 2) AS total_revenue,
    SUM(f.quantity) AS total_quantity
FROM workspace.gold.fact_sales f
JOIN workspace.gold.dim_products p ON f.product_key = p.product_key
GROUP BY 1, 2
ORDER BY total_revenue DESC;
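Since the inner JOIN silently drops unmatched sales, the ranked list can look clean even when some revenue is missing from it. A hedged variant uses a LEFT JOIN, labels unmatched rows with a placeholder bucket, and checks that the grouped totals add back up to the fact-table total. The `'UNMATCHED'` label and the toy rows are hypothetical; SQLite stands in for Databricks SQL here.

```python
import sqlite3

# Hypothetical toy data: product_key 99 has no row in dim_products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_products (product_key INTEGER PRIMARY KEY, category TEXT, subcategory TEXT);
CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL, quantity INTEGER);
INSERT INTO dim_products VALUES (1, 'Bikes', 'Road Bikes'), (2, 'Accessories', 'Helmets');
INSERT INTO fact_sales VALUES (1, 1000.0, 1), (1, 500.0, 1), (2, 35.0, 1), (99, 20.0, 2);
""")

# LEFT JOIN keeps unmatched sales and groups them under a visible placeholder.
rows = conn.execute("""
    SELECT
        COALESCE(p.category, 'UNMATCHED')    AS category,
        COALESCE(p.subcategory, 'UNMATCHED') AS subcategory,
        ROUND(SUM(f.sales_amount), 2)        AS total_revenue,
        SUM(f.quantity)                      AS total_quantity
    FROM fact_sales f
    LEFT JOIN dim_products p ON f.product_key = p.product_key
    GROUP BY COALESCE(p.category, 'UNMATCHED'), COALESCE(p.subcategory, 'UNMATCHED')
    ORDER BY total_revenue DESC
""").fetchall()

# The grouped revenue should add back up to the fact-table total.
grand_total = conn.execute(
    "SELECT ROUND(SUM(sales_amount), 2) FROM fact_sales"
).fetchone()[0]
reconciled = abs(sum(r[2] for r in rows) - grand_total) < 0.005

for row in rows:
    print(row)
print(reconciled)  # True
```

With an inner JOIN, the toy 20.0 of unmatched revenue would simply disappear from the ranked list; here it surfaces as its own row.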

## Test 4: Sales Uniqueness Check (Fact Table)

Description: Checking whether the same product appears more than once within the same order, to ensure that each transaction is unique and has not been accidentally duplicated.

Expected Result: No rows should be returned (an empty result means there are no duplicates).

%sql
SELECT 
    order_number, 
    product_key, 
    COUNT(*) AS record_count
FROM workspace.gold.fact_sales
GROUP BY order_number, product_key
HAVING COUNT(*) > 1;

## Test 5: Source Data Duplicate Audit (Bronze Layer)

Description: Identifying duplicate records in the raw source data. Any rows returned show that the source system contains redundant order/product combinations, which justifies the deduplication logic applied in the subsequent layers.

Expected Result: A list of orders and products that appear more than once in the Bronze layer.

%sql
SELECT 
    sls_ord_num AS order_number, 
    sls_prd_key AS product_id, 
    COUNT(*) AS occurrence_count
FROM bronze.crm_sales_details
GROUP BY sls_ord_num, sls_prd_key
HAVING COUNT(*) > 1;
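The notebook does not show the deduplication logic itself, so the following is only a sketch of one common approach, not the author's method: keep one row per (order, product) with `ROW_NUMBER()`, then rerun the Test 4 style check on the result. The toy rows, the `sales_dedup` table name, and the `ORDER BY sls_sales DESC` tie-breaker are all hypothetical (a real pipeline would more likely order by a load timestamp). Window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical toy rows mirroring the Bronze column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_sales_details (sls_ord_num TEXT, sls_prd_key TEXT, sls_sales REAL);
INSERT INTO crm_sales_details VALUES
    ('SO1', 'P1', 100.0),
    ('SO1', 'P1', 100.0),  -- exact duplicate
    ('SO2', 'P2', 35.0);
""")

DUPLICATE_CHECK = """
    SELECT sls_ord_num, sls_prd_key, COUNT(*)
    FROM {table}
    GROUP BY sls_ord_num, sls_prd_key
    HAVING COUNT(*) > 1
"""

# Test 5 style audit on the raw rows.
dupes = conn.execute(DUPLICATE_CHECK.format(table="crm_sales_details")).fetchall()

# Hypothetical dedup rule: keep the first row per (order, product).
conn.execute("""
    CREATE TABLE sales_dedup AS
    SELECT sls_ord_num, sls_prd_key, sls_sales
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY sls_ord_num, sls_prd_key
                   ORDER BY sls_sales DESC  -- placeholder tie-breaker
               ) AS rn
        FROM crm_sales_details
    )
    WHERE rn = 1
""")

# Test 4 style check on the deduplicated rows: should return nothing.
remaining = conn.execute(DUPLICATE_CHECK.format(table="sales_dedup")).fetchall()

print(dupes, remaining)  # [('SO1', 'P1', 2)] []
```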

## Test 6: Product Dimension Uniqueness

Description: Verifying that each product_number appears only once in the Product Dimension table. An empty result confirms that the deduplication logic produced a clean list of unique products.

Expected Result: No rows should be returned (each product must be unique).

%sql
SELECT 
    product_number, 
    COUNT(*) AS record_count
FROM workspace.gold.dim_products 
GROUP BY product_number 
HAVING COUNT(*) > 1;

## Test 7: Customer Dimension Uniqueness

Description: Verifying that each customer_id appears only once in the Customer Dimension table. This confirms that the deduplication process correctly handled any redundant customer records from the source systems.

Expected Result: No rows should be returned.

%sql
SELECT 
    customer_id, 
    COUNT(*) AS record_count
FROM workspace.gold.dim_customers 
GROUP BY customer_id 
HAVING COUNT(*) > 1;
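Tests 4, 6, and 7 all use the same `GROUP BY ... HAVING COUNT(*) > 1` pattern with different key columns. A small helper can run them from one table-to-keys mapping. `duplicate_keys` is a hypothetical name, not part of the original notebook, and the toy rows are illustrative only. Table and column names are interpolated into the SQL, so they must come from trusted code.

```python
import sqlite3


def duplicate_keys(conn, table, key_columns):
    """Return (key..., record_count) rows for key combinations seen more than once.

    table and key_columns are formatted directly into the SQL string, so they
    must be trusted identifiers, never user input.
    """
    cols = ", ".join(key_columns)
    sql = (f"SELECT {cols}, COUNT(*) AS record_count FROM {table} "
           f"GROUP BY {cols} HAVING COUNT(*) > 1")
    return conn.execute(sql).fetchall()


# Hypothetical toy data: customer_id 11001 appears twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (order_number TEXT, product_key INTEGER);
CREATE TABLE dim_products (product_key INTEGER, product_number TEXT);
CREATE TABLE dim_customers (customer_key INTEGER, customer_id INTEGER);
INSERT INTO fact_sales VALUES ('SO1', 1), ('SO1', 2), ('SO2', 1);
INSERT INTO dim_products VALUES (1, 'P-001'), (2, 'P-002');
INSERT INTO dim_customers VALUES (10, 11000), (11, 11001), (12, 11001);
""")

# Table -> key columns, matching Tests 4, 6, and 7.
checks = {
    "fact_sales": ["order_number", "product_key"],
    "dim_products": ["product_number"],
    "dim_customers": ["customer_id"],
}
results = {table: duplicate_keys(conn, table, cols) for table, cols in checks.items()}
print(results)
# {'fact_sales': [], 'dim_products': [], 'dim_customers': [(11001, 2)]}
```

Every entry is expected to be an empty list on clean data; any non-empty list names the offending keys.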

## Test 8: Final Financial Reconciliation

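Description: This is the final end-to-end check of the ETL pipeline. It compares total revenue in the raw source (Bronze) with total revenue in the final analytical layer (Gold). A matching total indicates that the deduplication and transformation steps preserved the financial totals, with no revenue lost or inflated. Note that if Gold drops the Bronze duplicates identified in Test 5, the raw Bronze sum will exceed the Gold sum by the value of the dropped rows; in that case the comparison should be made against the deduplicated Bronze rows.

Expected Result: Both layers show the same total_amount, after accounting for any duplicate rows removed between Bronze and Gold.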

%sql
SELECT 
    'Bronze Layer (Source)' AS layer,
    ROUND(SUM(sls_sales), 2) AS total_amount
FROM bronze.crm_sales_details

UNION ALL

SELECT 
    'Gold Layer (Target)' AS layer,
    ROUND(SUM(sales_amount), 2) AS total_amount
FROM workspace.gold.fact_sales;
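When Bronze contains duplicates that Gold removes, reporting the raw Bronze total, a deduplicated Bronze total, and the Gold total side by side makes the reconciliation explicit. This sketch uses `SELECT DISTINCT` as a hypothetical stand-in for whatever deduplication rule the pipeline actually applies (`DISTINCT` only removes rows identical in every selected column). The toy rows and layer labels are illustrative, and SQLite stands in for Databricks SQL.

```python
import sqlite3

# Hypothetical toy data: Bronze holds one exact duplicate that Gold has removed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_sales_details (sls_ord_num TEXT, sls_prd_key TEXT, sls_sales REAL);
CREATE TABLE fact_sales (order_number TEXT, product_key TEXT, sales_amount REAL);
INSERT INTO crm_sales_details VALUES
    ('SO1', 'P1', 100.0), ('SO1', 'P1', 100.0), ('SO2', 'P2', 35.0);
INSERT INTO fact_sales VALUES ('SO1', 'P1', 100.0), ('SO2', 'P2', 35.0);
""")

rows = conn.execute("""
    SELECT 'Bronze (raw)' AS layer, ROUND(SUM(sls_sales), 2) AS total_amount
    FROM crm_sales_details
    UNION ALL
    SELECT 'Bronze (distinct rows)', ROUND(SUM(sls_sales), 2)
    FROM (SELECT DISTINCT sls_ord_num, sls_prd_key, sls_sales FROM crm_sales_details)
    UNION ALL
    SELECT 'Gold', ROUND(SUM(sales_amount), 2)
    FROM fact_sales
""").fetchall()

# Key by layer name so the check does not depend on UNION ALL row order.
totals = dict(rows)
print(totals["Bronze (raw)"], totals["Bronze (distinct rows)"], totals["Gold"])
# 235.0 135.0 135.0
```

Gold matches the deduplicated Bronze total, while the raw Bronze total is higher by exactly the value of the duplicate row.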