# Workshop: Lakeflow Declarative Pipelines

**Training Objective:** Build a complete ETL pipeline using Lakeflow Declarative Pipelines (formerly Delta Live Tables) with Bronze-Silver-Gold medallion architecture.

**Topics covered:**
- Lakeflow pipeline creation and configuration
- STREAMING TABLE and MATERIALIZED VIEW concepts
- Auto Loader for streaming ingestion
- Data quality expectations (EXPECT constraints)
- SCD Type 2 implementation
- Star schema in Gold layer

**Duration:** 45 minutes

## Context and Requirements

- **Training Day**: Day 1 - Lakeflow Pipelines
- **Notebook Type**: Workshop
- **Technical Requirements**:
  - Databricks Runtime 13.0+ (recommended: 14.3 LTS)
  - Unity Catalog enabled
  - Permissions: CREATE TABLE, CREATE SCHEMA, SELECT, MODIFY, Workflows access
  - Cluster: Standard with minimum 2 workers

## Theoretical Introduction

**Section Objective:** Understanding Lakeflow Declarative Pipelines fundamentals

**Key Concepts:**
- **Lakeflow Pipeline**: Declarative ETL framework that automates orchestration, error handling, and data quality
- **STREAMING TABLE**: Append-only table for incremental ingestion with exactly-once semantics
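- **MATERIALIZED VIEW**: Precomputed query result that the pipeline keeps up to date, refreshing incrementally where possible instead of recomputing from scratch
- **Auto Loader**: Incremental file ingestion that picks up only new files, invoked in SQL as `STREAM read_files(...)`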
- **SCD Type 2**: Slowly Changing Dimension pattern for tracking historical changes
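
The idea behind incremental file ingestion can be sketched in plain Python. This is a toy model under stated assumptions, not how Auto Loader is implemented: the class name `IncrementalFileReader` and the file names are hypothetical, and a simple in-memory set stands in for the streaming checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalFileReader:
    """Toy stand-in for checkpointed file discovery (hypothetical, not a Databricks API)."""

    # Paths already ingested; plays the role of a streaming checkpoint.
    processed: set = field(default_factory=set)

    def read_new(self, listing):
        """Return only the files not seen in earlier runs, in sorted order."""
        new_files = sorted(set(listing) - self.processed)
        self.processed.update(new_files)
        return new_files


reader = IncrementalFileReader()
first_run = reader.read_new(["orders_001.json", "orders_002.json"])
# The second listing repeats the old files and adds one new file.
second_run = reader.read_new(["orders_001.json", "orders_002.json", "orders_003.json"])
print(first_run)
print(second_run)
```

Because the second run returns only the file it has not seen before, each file is ingested once even though the directory listing keeps growing.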

**Why Lakeflow?**
Traditional ETL requires managing dependencies, retries, and checkpoints manually. Lakeflow handles this automatically - you declare WHAT you want, not HOW to build it.

## Environment Initialization

In [None]:
%run ../00_setup

In [None]:
# Display paths for Lakeflow pipeline configuration
print("="*60)
print("PATHS FOR LAKEFLOW PIPELINE CONFIGURATION")
print("="*60)
print(f"\nCustomers CSV:  {DATASET_BASE_PATH}/customers/")
print(f"Orders JSON:    {DATASET_BASE_PATH}/orders/stream/")
print(f"Products CSV:   {DATASET_BASE_PATH}/products/csv/")
print(f"\nCatalog:        {CATALOG}")
print(f"Schema Bronze:  {BRONZE_SCHEMA}")
print(f"Schema Silver:  {SILVER_SCHEMA}")
print(f"Schema Gold:    {GOLD_SCHEMA}")
print("\n" + "="*60)

---

## Part 1: Creating Lakeflow Pipeline

### Task 1.1: Create Pipeline in Databricks UI

**Objective:** Create a new Lakeflow pipeline in Databricks workspace.

**Instructions:**
1. Navigate to **Workflows → Lakeflow Pipelines**
2. Click **Create Pipeline**
3. Configure:
   - **Name:** `ecommerce_workshop_pipeline`
   - **Product edition:** Advanced
   - **Pipeline mode:** Triggered
   - **Target catalog:** Use your catalog from setup
   - **Target schema:** `lakeflow_workshop`

**Hints:**
- Use "Triggered" mode for the workshop (not Continuous)
- Development mode reuses the cluster between runs and disables automatic retries, so errors surface quickly while testing
- You can add SQL files later from the source code section

---

### Task 1.2: Configure Pipeline Parameters

**Objective:** Add configuration parameters for source data paths.

**Instructions:**
1. In the pipeline settings, find the **Configuration** section
2. Add the following key-value pairs:

| Key | Value |
|-----|-------|
| `customers_path` | `/Volumes/.../customers/` |
| `orders_path` | `/Volumes/.../orders/stream/` |
| `products_path` | `/Volumes/.../products/csv/` |

**Hints:**
- Use paths displayed in the setup cell above
- Parameters are referenced in SQL as `${parameter_name}`
- You can also set parameters in JSON pipeline definition
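
The `${parameter_name}` substitution can be pictured with Python's `string.Template`, which happens to use the same placeholder syntax. This is only an analogy: the pipeline performs its own substitution, and the `pipeline_config` dictionary below is a hypothetical stand-in for the Configuration section (the elided path is kept exactly as shown in the table).

```python
from string import Template

# Hypothetical stand-in for the pipeline's Configuration key-value pairs.
# Replace the elided path with the one printed by the setup cell.
pipeline_config = {"customers_path": "/Volumes/.../customers/"}

# The SQL text references the parameter by name, as in the pipeline source files.
sql_template = Template(
    "SELECT * FROM STREAM read_files('${customers_path}', format => 'csv')"
)

# substitute() raises KeyError when a referenced parameter is not configured.
rendered = sql_template.substitute(pipeline_config)
print(rendered)
```

Keeping paths in configuration rather than in the SQL means the same source files can run against different environments by changing only the key-value pairs.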

---

## Part 2: Bronze Layer - Streaming Ingestion

### Task 2.1: Create Bronze Customers Table

**Objective:** Create a STREAMING TABLE to ingest customer CSV files using Auto Loader.

**Instructions:**
1. Create a new SQL file in the pipeline
2. Use `CREATE OR REFRESH STREAMING TABLE`
3. Use `STREAM read_files()` for Auto Loader
4. Add metadata columns for lineage tracking

**Hints:**
- `read_files()` options: `format => 'csv'`, `header => true`
- `_metadata.file_path` gives source file path
- `_metadata.file_modification_time` gives file timestamp
- `current_timestamp()` for ingestion time

**TODO:** Complete the SQL below:

```sql
-- Bronze Customers - Auto Loader ingestion
CREATE OR REFRESH STREAMING TABLE bronze_customers
COMMENT 'Raw customer data from CSV files'
AS
SELECT
    customer_id,
    first_name,
    last_name,
    email,
    phone,
    city,
    country,
    CAST(registration_date AS DATE) AS registration_date,
    customer_segment,
    -- TODO: Add metadata columns
    _metadata.___ AS _source_file,
    _metadata.___ AS _file_modified_at,
    ___() AS _ingested_at
FROM STREAM read_files(
    '${customers_path}',
    format => '___',
    header => ___,
    inferColumnTypes => true
);
```

### Task 2.2: Create Bronze Orders Table

**Objective:** Create a STREAMING TABLE for JSON order files.

**Instructions:**
1. Ingest JSON files from orders path
2. Cast order_date to TIMESTAMP
3. Add metadata columns

**Hints:**
- JSON format: `format => 'json'`
- No header option needed for JSON
- Cast types explicitly for consistency

**TODO:** Complete the SQL:

```sql
-- Bronze Orders - JSON streaming ingestion
CREATE OR REFRESH STREAMING TABLE bronze_orders
COMMENT 'Raw order data from JSON stream'
AS
SELECT
    order_id,
    customer_id,
    product_id,
    CAST(order_date AS ___) AS order_date,  -- TODO: What type?
    quantity,
    unit_price,
    total_amount,
    status,
    payment_method,
    _metadata.file_path AS _source_file,
    current_timestamp() AS _ingested_at
FROM STREAM read_files(
    '${___}',  -- TODO: Which parameter?
    format => '___'  -- TODO: What format?
);
```

### Task 2.3: Create Bronze Products Table

**Objective:** Create a STREAMING TABLE for product CSV files.

**Instructions:**
1. Ingest CSV files from products path
2. Cast price to DECIMAL and stock_quantity to INT
3. Add metadata columns

**Hints:**
- `CAST(price AS DECIMAL(10,2))` for price precision
- `CAST(stock_quantity AS INT)` for integer quantity

**TODO:** Write the complete SQL for bronze_products table.

---

## Part 3: Silver Layer - Data Quality & SCD Type 2

### Task 3.1: Create Silver Customers with SCD Type 2

**Objective:** Implement SCD Type 2 for customer dimension to track history of changes.

**Instructions:**
1. Define the target STREAMING TABLE with SCD2 columns
2. Create an AUTO CDC FLOW with KEYS and SEQUENCE BY
3. Use `STORED AS SCD TYPE 2`

**Hints:**
- `__START_AT` and `__END_AT` are auto-managed by Lakeflow
- `KEYS` defines the business key for matching records
- `SEQUENCE BY` determines which record is newer (conflict resolution)
- Active records have `__END_AT IS NULL`
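
The hints above can be made concrete with a small pure-Python simulation of SCD Type 2 bookkeeping. It is a simplified sketch, not the pipeline's implementation: the function `apply_scd2` and the sample records are hypothetical, changes are assumed to arrive in sequence order, and the sequencing column is left out of the change comparison for brevity.

```python
from datetime import datetime


def apply_scd2(history, change, key="customer_id", sequence_by="_ingested_at"):
    """Apply one change record to an SCD Type 2 history list (toy model)."""
    tracked = {k: v for k, v in change.items() if k != sequence_by}
    for row in history:
        if row[key] == change[key] and row["__END_AT"] is None:
            current = {
                k: v for k, v in row.items() if k not in ("__START_AT", "__END_AT")
            }
            if current == tracked:
                return history  # nothing changed, keep the open version
            row["__END_AT"] = change[sequence_by]  # close the old version
    # Open a new version that is active until a later change closes it.
    history.append({**tracked, "__START_AT": change[sequence_by], "__END_AT": None})
    return history


history = []
apply_scd2(history, {"customer_id": "C1", "city": "Warsaw",
                     "_ingested_at": datetime(2024, 1, 1)})
apply_scd2(history, {"customer_id": "C1", "city": "Krakow",
                     "_ingested_at": datetime(2024, 2, 1)})

# Same filter as `WHERE __END_AT IS NULL`.
active = [row for row in history if row["__END_AT"] is None]
print(len(history), active[0]["city"])
```

After the second change the history holds two rows for `C1`: the Warsaw version closed at the second timestamp, and the Krakow version still open.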

**TODO:** Complete the SQL:

```sql
-- Silver Customers - SCD Type 2 with history tracking
CREATE OR REFRESH STREAMING TABLE silver_customers (
    customer_id STRING,
    first_name STRING,
    last_name STRING,
    email STRING,
    phone STRING,
    city STRING,
    country STRING,
    registration_date DATE,
    customer_segment STRING,
    _source_file STRING,
    _ingested_at TIMESTAMP,
    -- SCD2 columns (auto-managed)
    __START_AT TIMESTAMP,
    __END_AT TIMESTAMP
)
COMMENT 'Silver customers with SCD Type 2 history';

-- AUTO CDC Flow for SCD Type 2
CREATE FLOW silver_customers_scd2
AS AUTO CDC INTO silver_customers
FROM STREAM ___  -- TODO: Source table?
KEYS (___)  -- TODO: Business key?
SEQUENCE BY ___  -- TODO: Ordering column?
-- The source's _file_modified_at is not declared in the target schema, so leave it out
COLUMNS * EXCEPT (_file_modified_at)
STORED AS SCD TYPE ___; -- TODO: Which SCD type?
```

### Task 3.2: Create Silver Orders with Data Quality

**Objective:** Apply data quality expectations to validate orders.

**Instructions:**
1. Add CONSTRAINT ... EXPECT for validation rules
2. Choose violation action: DROP ROW or FAIL UPDATE
3. Add calculated fields (gross_amount, discount_amount)
4. Standardize text fields (UPPER for status, payment_method)

**Hints:**
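- `ON VIOLATION DROP ROW` - drops invalid records and lets the update continue
- `ON VIOLATION FAIL UPDATE` - stops the pipeline update on the first violation
- Omitting `ON VIOLATION` keeps invalid records and only records the violation
- Counts of passed and failed records are recorded in the Event Log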
- Use business-meaningful constraint names
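
The difference between the two violation actions can be sketched in plain Python. This is an illustrative model only: `apply_expectations`, `ExpectationFailed`, and the sample orders are hypothetical names, not part of any Databricks API.

```python
class ExpectationFailed(Exception):
    """Raised when a FAIL UPDATE style rule is violated (hypothetical name)."""


def apply_expectations(rows, rules):
    """Filter rows by rules given as (name, predicate, action) tuples.

    action is 'DROP ROW' or 'FAIL UPDATE'.
    """
    kept = []
    dropped = {name: 0 for name, _, _ in rules}
    for row in rows:
        keep = True
        for name, predicate, action in rules:
            if predicate(row):
                continue
            if action == "FAIL UPDATE":
                # Abort the whole run instead of skipping the record.
                raise ExpectationFailed(f"{name} violated by {row}")
            dropped[name] += 1  # count the violation per constraint name
            keep = False
        if keep:
            kept.append(row)
    return kept, dropped


rules = [
    ("valid_quantity", lambda r: r["quantity"] > 0, "DROP ROW"),
    ("valid_date", lambda r: r["order_date"] is not None, "FAIL UPDATE"),
]
orders = [
    {"order_id": 1, "quantity": 2, "order_date": "2024-01-01"},
    {"order_id": 2, "quantity": 0, "order_date": "2024-01-02"},
]
kept, dropped = apply_expectations(orders, rules)
print(kept, dropped)
```

The per-constraint counters are why meaningful constraint names matter: they are the labels you later see when inspecting data quality results.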

**TODO:** Complete the SQL:

```sql
-- Silver Orders - with Data Quality expectations
CREATE OR REFRESH STREAMING TABLE silver_orders (
    -- Data Quality Constraints
    CONSTRAINT valid_order_id EXPECT (order_id IS NOT ___) ON VIOLATION ___ ROW,
    CONSTRAINT valid_amount EXPECT (total_amount ___ 0) ON VIOLATION DROP ROW,
    CONSTRAINT valid_quantity EXPECT (quantity > 0) ON VIOLATION DROP ROW,
    CONSTRAINT valid_date EXPECT (order_date IS NOT NULL) ON VIOLATION ___ UPDATE
)
COMMENT 'Validated orders with data quality checks'
AS
SELECT
    order_id,
    customer_id,
    product_id,
    order_date,
    DATE(order_date) AS order_date_key,
    quantity,
    unit_price,
    total_amount,
    -- TODO: Calculate gross and discount amounts
    (unit_price * ___) AS gross_amount,
    -- Negative when a discount was applied (total_amount below gross_amount)
    (total_amount - (unit_price * quantity)) AS discount_amount,
    -- TODO: Standardize text fields
    ___(status) AS status,
    UPPER(payment_method) AS payment_method,
    _source_file,
    _ingested_at,
    current_timestamp() AS _processed_at
FROM STREAM bronze_orders;
```

### Task 3.3: Create Silver Products with Validation

**Objective:** Create validated product table with price tier classification.

**Instructions:**
1. Add expectations for product_id, price, category
2. Apply INITCAP to product_name, UPPER to category
3. Calculate price_tier based on price ranges

**Hints:**
- `INITCAP()` capitalizes first letter of each word
- Use CASE WHEN for price_tier: PREMIUM (≥1000), STANDARD (≥100), BUDGET (<100)
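
The tier boundaries behave like a ladder of checks where the first match wins, so the order of the branches matters. A minimal Python sketch of that logic, using a hypothetical helper `price_tier` and `Decimal` values to match the `DECIMAL(10,2)` price column:

```python
from decimal import Decimal


def price_tier(price):
    """Mirror of a CASE WHEN ladder: the first matching branch wins."""
    if price >= 1000:
        return "PREMIUM"
    if price >= 100:
        return "STANDARD"
    return "BUDGET"


# Check the values on both sides of each boundary.
for value in ("1000.00", "999.99", "100.00", "99.99"):
    print(value, price_tier(Decimal(value)))
```

Testing the values directly on and just below each threshold is a quick way to confirm that `>=` and `<` are placed as intended.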

**TODO:** Write the complete SQL for silver_products table.

---

## Part 4: Gold Layer - Star Schema

### Task 4.1: Create dim_customer Dimension

**Objective:** Create customer dimension from SCD2 table (current records only).

**Instructions:**
1. Use MATERIALIZED VIEW (batch refresh, cached)
2. Filter only current/active records from SCD2
3. Add calculated column: days_since_registration

**Hints:**
- `WHERE __END_AT IS NULL` filters only active records
- `DATEDIFF(current_date(), registration_date)` for days calculation
- `CONCAT(first_name, ' ', last_name)` for full_name

**TODO:** Complete the SQL:

```sql
-- Dimension: Customer (current version from SCD2)
CREATE OR REFRESH MATERIALIZED VIEW dim_customer
COMMENT 'Customer dimension - current active records only'
AS
SELECT
    customer_id,
    first_name,
    last_name,
    CONCAT(first_name, ' ', last_name) AS full_name,
    email,
    phone,
    city,
    country,
    registration_date,
    customer_segment,
    ___(current_date(), registration_date) AS days_since_registration
FROM silver_customers
WHERE ___; -- TODO: Filter only active records
```

### Task 4.2: Create dim_product Dimension

**Objective:** Create product dimension with stock status.

**Instructions:**
1. Use MATERIALIZED VIEW
2. Add stock_status: OUT_OF_STOCK (0), LOW_STOCK (<10), IN_STOCK (≥10)

**TODO:** Complete the SQL:

```sql
-- Dimension: Product with stock status
CREATE OR REFRESH MATERIALIZED VIEW dim_product
COMMENT 'Product dimension with price tiers and stock status'
AS
SELECT
    product_id,
    product_name,
    category,
    price,
    stock_quantity,
    price_tier,
    CASE 
        WHEN stock_quantity = 0 THEN '___'
        WHEN stock_quantity < 10 THEN '___'
        ELSE 'IN_STOCK'
    END AS stock_status
FROM silver_products;
```

### Task 4.3: Create dim_date Dimension

**Objective:** Generate a date dimension with calendar attributes.

**Instructions:**
1. Use SEQUENCE to generate date range (2020-2025)
2. Add: year, quarter, month, month_name, week_of_year, day_of_week, day_name, is_weekend

**Hints:**
- `EXPLODE(SEQUENCE(DATE('2020-01-01'), DATE('2025-12-31'), INTERVAL 1 DAY))` generates dates
- `DATE_FORMAT(date, 'yyyyMMdd')` for date key
- `DAYOFWEEK() IN (1, 7)` for weekend check (1=Sunday, 7=Saturday)
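
To sanity-check the calendar attributes before writing the SQL, the same rows can be generated in plain Python. This is a sketch under assumptions: `date_dimension` is a hypothetical helper, month and day names depend on the Python locale (English by default), and the day-of-week value is converted to the 1=Sunday through 7=Saturday convention described in the hint.

```python
from datetime import date, timedelta


def date_dimension(start, end):
    """Build one dictionary per calendar day between start and end, inclusive."""
    rows = []
    current = start
    while current <= end:
        # isoweekday() is 1=Monday..7=Sunday; shift to 1=Sunday..7=Saturday.
        dow = current.isoweekday() % 7 + 1
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),
            "full_date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "month_name": current.strftime("%B"),
            "week_of_year": current.isocalendar()[1],  # ISO week number
            "day_of_week": dow,
            "day_name": current.strftime("%A"),
            "is_weekend": dow in (1, 7),
        })
        current += timedelta(days=1)
    return rows


dim_date = date_dimension(date(2020, 1, 1), date(2025, 12, 31))
print(len(dim_date), dim_date[0]["date_key"], dim_date[0]["day_of_week"])
```

Comparing a few of these rows with the materialized view output is an easy check that the weekend flag and date key are computed as expected.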

**TODO:** Write the complete SQL for dim_date.

### Task 4.4: Create fact_sales Fact Table

**Objective:** Create streaming fact table for real-time sales data.

**Instructions:**
1. Use STREAMING TABLE for low-latency updates
2. Include dimension keys: customer_id, product_id, order_date_key
3. Include measures: quantity, unit_price, total_amount, gross_amount, discount_amount

**Hints:**
- `CAST(DATE_FORMAT(order_date, 'yyyyMMdd') AS INT)` for date key
- Keep order_timestamp for detailed analysis
- Include _processed_at for lineage

**TODO:** Complete the SQL:

```sql
-- Fact: Sales (streaming for low-latency)
CREATE OR REFRESH STREAMING TABLE fact_sales
COMMENT 'Sales fact table - real-time updates'
AS
SELECT
    -- Keys
    order_id,
    customer_id,
    product_id,
    CAST(DATE_FORMAT(order_date, '___') AS INT) AS order_date_key,
    
    -- Measures
    quantity,
    unit_price,
    total_amount,
    gross_amount,
    discount_amount,
    
    -- Attributes
    status,
    payment_method,
    order_date AS order_timestamp,
    
    -- Lineage
    _processed_at
FROM STREAM ___;  -- TODO: Source table?
```

---

## Part 5: Running and Monitoring Pipeline

### Task 5.1: Start Pipeline

**Objective:** Run the pipeline and observe the execution.

**Instructions:**
1. Click **Start** in the pipeline UI
2. Observe the DAG building automatically
3. Watch tables being processed in order
4. Note the parallel execution of independent tables

**Questions to Consider:**
- What is the execution order?
- Which tables run in parallel?
- How long does each table take?
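
One way to reason about the first two questions is to write down which table reads from which and sort that graph topologically. The sketch below uses Python's standard `graphlib` module (Python 3.9+) with the dependencies of this workshop's tables. It only illustrates the concept: the actual scheduling is decided by the pipeline engine and also depends on available compute.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it reads from.
dependencies = {
    "bronze_customers": set(),
    "bronze_orders": set(),
    "bronze_products": set(),
    "silver_customers": {"bronze_customers"},
    "silver_orders": {"bronze_orders"},
    "silver_products": {"bronze_products"},
    "dim_customer": {"silver_customers"},
    "dim_product": {"silver_products"},
    "dim_date": set(),  # generated calendar, no upstream table
    "fact_sales": {"silver_orders"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Tables returned together by get_ready() have no unmet dependencies,
# so they are candidates for parallel execution.
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Note that `dim_date` lands in the first wave even though it is a Gold table, because it depends on no other table in the pipeline.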

---

### Task 5.2: Explore Event Log

**Objective:** Query the Event Log for pipeline metrics and data quality results.

**Instructions:**
1. After pipeline completes, query the Event Log
2. Find data quality metrics
3. Check processing statistics

**Hints:**
- Event Log is a Delta table accessible via SQL
- Use `event_log('pipeline_id')` function
- Filter by `event_type` for specific events (expectation metrics are reported inside `flow_progress` events)

In [None]:
# TODO: Query Event Log after pipeline runs
# Replace 'your_pipeline_id' with actual pipeline ID

# Uncomment after running pipeline:

# event_log_query = """
# SELECT 
#     timestamp,
#     event_type,
#     message,
#     details
# FROM event_log('your_pipeline_id')
# WHERE event_type = 'flow_progress'  -- expectation metrics are nested in these events
# ORDER BY timestamp DESC
# LIMIT 20
# """
# spark.sql(event_log_query).display()

In [None]:
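# TODO: Query Data Quality metrics
# Expectation results are nested inside 'flow_progress' events as a JSON array,
# so the array is parsed with from_json and exploded into one row per expectation.

# Uncomment after running pipeline:

# dq_query = """
# WITH expectations AS (
#     SELECT explode(from_json(
#         details:flow_progress.data_quality.expectations,
#         'array<struct<name: string, dataset: string, passed_records: int, failed_records: int>>'
#     )) AS e
#     FROM event_log('your_pipeline_id')
#     WHERE event_type = 'flow_progress'
# )
# SELECT
#     e.dataset AS table_name,
#     e.name AS expectation,
#     SUM(e.passed_records) AS passed,
#     SUM(e.failed_records) AS failed,
#     ROUND(SUM(e.passed_records) /
#           NULLIF(SUM(e.passed_records) + SUM(e.failed_records), 0) * 100, 2) AS pass_rate_pct
# FROM expectations
# GROUP BY e.dataset, e.name
# ORDER BY table_name, expectation
# """
# spark.sql(dq_query).display()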
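
The pass-rate arithmetic in the query above can be checked in isolation. In this minimal sketch the hypothetical helper `pass_rate_pct` returns `None` where the SQL version yields NULL, which is the purpose of `NULLIF(..., 0)`: it avoids dividing by zero when an expectation has seen no records.

```python
def pass_rate_pct(passed, failed):
    """Percentage of records that passed, or None when there were no records."""
    total = passed + failed
    if total == 0:
        return None  # mirrors NULLIF(total, 0) turning the division into NULL
    return round(passed / total * 100, 2)


print(pass_rate_pct(98, 2))
print(pass_rate_pct(1, 2))
print(pass_rate_pct(0, 0))
```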

---

## Workshop Summary

### Implemented Components

| Layer | Table | Type | Key Feature |
|-------|-------|------|-------------|
| Bronze | bronze_customers | STREAMING TABLE | Auto Loader (CSV) |
| Bronze | bronze_orders | STREAMING TABLE | Auto Loader (JSON) |
| Bronze | bronze_products | STREAMING TABLE | Auto Loader (CSV) |
| Silver | silver_customers | STREAMING TABLE | SCD Type 2 |
| Silver | silver_orders | STREAMING TABLE | Data Quality |
| Silver | silver_products | STREAMING TABLE | Validation + Enrichment |
| Gold | dim_customer | MATERIALIZED VIEW | Current SCD2 snapshot |
| Gold | dim_product | MATERIALIZED VIEW | Stock status |
| Gold | dim_date | MATERIALIZED VIEW | Generated calendar |
| Gold | fact_sales | STREAMING TABLE | Real-time fact |

---

### Key Takeaways

1. **Declarative = Less Code** - Focus on WHAT, not HOW
2. **Automatic Orchestration** - No manual dependency management
3. **Built-in Quality** - EXPECT constraints out-of-the-box
4. **Incremental Processing** - Only process new/changed data
5. **Observable** - Event Log, lineage, metrics included

---

### Best Practices

| Practice | Description |
|----------|-------------|
| **Use STREAMING TABLE for ingest** | Append-only, exactly-once semantics |
| **Use MATERIALIZED VIEW for aggregations** | Cached, incremental refresh |
| **Add metadata columns** | Track source file, ingestion time |
| **Use meaningful constraint names** | Easy debugging in Event Log |
| **Separate Bronze/Silver/Gold** | Clear data quality progression |

---

## Solutions

Below are the complete solutions for all workshop tasks.

In [None]:
# =============================================================================
# SOLUTIONS - Complete SQL for Lakeflow Pipeline
# =============================================================================
# These are the complete SQL statements. In practice, create separate .sql files
# for each table and add them to the Lakeflow pipeline.

bronze_customers_sql = """
-- Task 2.1: Bronze Customers
CREATE OR REFRESH STREAMING TABLE bronze_customers
COMMENT 'Raw customer data from CSV files'
AS
SELECT
    customer_id,
    first_name,
    last_name,
    email,
    phone,
    city,
    country,
    CAST(registration_date AS DATE) AS registration_date,
    customer_segment,
    _metadata.file_path AS _source_file,
    _metadata.file_modification_time AS _file_modified_at,
    current_timestamp() AS _ingested_at
FROM STREAM read_files(
    '${customers_path}',
    format => 'csv',
    header => true,
    inferColumnTypes => true
);
"""

bronze_orders_sql = """
-- Task 2.2: Bronze Orders
CREATE OR REFRESH STREAMING TABLE bronze_orders
COMMENT 'Raw order data from JSON stream'
AS
SELECT
    order_id,
    customer_id,
    product_id,
    CAST(order_date AS TIMESTAMP) AS order_date,
    quantity,
    unit_price,
    total_amount,
    status,
    payment_method,
    _metadata.file_path AS _source_file,
    current_timestamp() AS _ingested_at
FROM STREAM read_files(
    '${orders_path}',
    format => 'json'
);
"""

bronze_products_sql = """
-- Task 2.3: Bronze Products
CREATE OR REFRESH STREAMING TABLE bronze_products
COMMENT 'Raw product catalog from CSV'
AS
SELECT
    product_id,
    product_name,
    category,
    CAST(price AS DECIMAL(10,2)) AS price,
    CAST(stock_quantity AS INT) AS stock_quantity,
    _metadata.file_path AS _source_file,
    current_timestamp() AS _ingested_at
FROM STREAM read_files(
    '${products_path}',
    format => 'csv',
    header => true,
    inferColumnTypes => true
);
"""

print("Bronze layer SQL solutions loaded!")

In [None]:
# Silver layer solutions

silver_customers_sql = """
-- Task 3.1: Silver Customers with SCD Type 2
CREATE OR REFRESH STREAMING TABLE silver_customers (
    customer_id STRING,
    first_name STRING,
    last_name STRING,
    email STRING,
    phone STRING,
    city STRING,
    country STRING,
    registration_date DATE,
    customer_segment STRING,
    _source_file STRING,
    _ingested_at TIMESTAMP,
    __START_AT TIMESTAMP,
    __END_AT TIMESTAMP
)
COMMENT 'Silver customers with SCD Type 2 history';

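CREATE FLOW silver_customers_scd2
AS AUTO CDC INTO silver_customers
FROM STREAM bronze_customers
KEYS (customer_id)
SEQUENCE BY _ingested_at
-- _file_modified_at is not declared in the target schema, so leave it out
COLUMNS * EXCEPT (_file_modified_at)
STORED AS SCD TYPE 2;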
"""

silver_orders_sql = """
-- Task 3.2: Silver Orders with Data Quality
CREATE OR REFRESH STREAMING TABLE silver_orders (
    CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
    CONSTRAINT valid_amount EXPECT (total_amount > 0) ON VIOLATION DROP ROW,
    CONSTRAINT valid_quantity EXPECT (quantity > 0) ON VIOLATION DROP ROW,
    CONSTRAINT valid_date EXPECT (order_date IS NOT NULL) ON VIOLATION FAIL UPDATE
)
COMMENT 'Validated orders with data quality checks'
AS
SELECT
    order_id,
    customer_id,
    product_id,
    order_date,
    DATE(order_date) AS order_date_key,
    quantity,
    unit_price,
    total_amount,
    (unit_price * quantity) AS gross_amount,
    -- Negative when a discount was applied (total_amount below gross_amount)
    (total_amount - (unit_price * quantity)) AS discount_amount,
    UPPER(status) AS status,
    UPPER(payment_method) AS payment_method,
    _source_file,
    _ingested_at,
    current_timestamp() AS _processed_at
FROM STREAM bronze_orders;
"""

silver_products_sql = """
-- Task 3.3: Silver Products
CREATE OR REFRESH STREAMING TABLE silver_products (
    CONSTRAINT valid_product_id EXPECT (product_id IS NOT NULL) ON VIOLATION DROP ROW,
    CONSTRAINT valid_price EXPECT (price > 0) ON VIOLATION DROP ROW,
    CONSTRAINT valid_category EXPECT (category IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT 'Validated product catalog'
AS
SELECT
    product_id,
    INITCAP(product_name) AS product_name,
    UPPER(category) AS category,
    price,
    stock_quantity,
    CASE 
        WHEN price >= 1000 THEN 'PREMIUM'
        WHEN price >= 100 THEN 'STANDARD'
        ELSE 'BUDGET'
    END AS price_tier,
    _source_file,
    _ingested_at
FROM STREAM bronze_products;
"""

print("Silver layer SQL solutions loaded!")

In [None]:
# Gold layer solutions

dim_customer_sql = """
-- Task 4.1: dim_customer
CREATE OR REFRESH MATERIALIZED VIEW dim_customer
COMMENT 'Customer dimension - current active records only'
AS
SELECT
    customer_id,
    first_name,
    last_name,
    CONCAT(first_name, ' ', last_name) AS full_name,
    email,
    phone,
    city,
    country,
    registration_date,
    customer_segment,
    DATEDIFF(current_date(), registration_date) AS days_since_registration
FROM silver_customers
WHERE __END_AT IS NULL;
"""

dim_product_sql = """
-- Task 4.2: dim_product
CREATE OR REFRESH MATERIALIZED VIEW dim_product
COMMENT 'Product dimension with price tiers and stock status'
AS
SELECT
    product_id,
    product_name,
    category,
    price,
    stock_quantity,
    price_tier,
    CASE 
        WHEN stock_quantity = 0 THEN 'OUT_OF_STOCK'
        WHEN stock_quantity < 10 THEN 'LOW_STOCK'
        ELSE 'IN_STOCK'
    END AS stock_status
FROM silver_products;
"""

dim_date_sql = """
-- Task 4.3: dim_date
CREATE OR REFRESH MATERIALIZED VIEW dim_date
COMMENT 'Date dimension with calendar attributes'
AS
WITH date_range AS (
    SELECT EXPLODE(SEQUENCE(
        DATE('2020-01-01'), 
        DATE('2025-12-31'), 
        INTERVAL 1 DAY
    )) AS date_value
)
SELECT
    CAST(DATE_FORMAT(date_value, 'yyyyMMdd') AS INT) AS date_key,
    date_value AS full_date,
    YEAR(date_value) AS year,
    QUARTER(date_value) AS quarter,
    MONTH(date_value) AS month,
    DATE_FORMAT(date_value, 'MMMM') AS month_name,
    WEEKOFYEAR(date_value) AS week_of_year,
    DAYOFWEEK(date_value) AS day_of_week,
    DATE_FORMAT(date_value, 'EEEE') AS day_name,
    CASE WHEN DAYOFWEEK(date_value) IN (1, 7) THEN TRUE ELSE FALSE END AS is_weekend
FROM date_range;
"""

fact_sales_sql = """
-- Task 4.4: fact_sales
CREATE OR REFRESH STREAMING TABLE fact_sales
COMMENT 'Sales fact table - real-time updates'
AS
SELECT
    order_id,
    customer_id,
    product_id,
    CAST(DATE_FORMAT(order_date, 'yyyyMMdd') AS INT) AS order_date_key,
    quantity,
    unit_price,
    total_amount,
    gross_amount,
    discount_amount,
    status,
    payment_method,
    order_date AS order_timestamp,
    _processed_at
FROM STREAM silver_orders;
"""

print("Gold layer SQL solutions loaded!")
print("\n" + "="*60)
print("All SQL solutions are available in variables:")
print("- bronze_customers_sql, bronze_orders_sql, bronze_products_sql")
print("- silver_customers_sql, silver_orders_sql, silver_products_sql")
print("- dim_customer_sql, dim_product_sql, dim_date_sql, fact_sales_sql")
print("="*60)

---

## Resource Cleanup (optional)

In [None]:
# WARNING: Run only if you want to delete the pipeline and all created tables

# To clean up:
# 1. Go to Workflows → Lakeflow Pipelines
# 2. Find your pipeline
# 3. Click Delete (this will remove all tables created by the pipeline)

# Or manually drop tables:
# spark.sql(f"DROP SCHEMA IF EXISTS {CATALOG}.lakeflow_workshop CASCADE")

print("Resource cleanup instructions above. Uncomment to execute.")