# Create Bronze Layer Shortcuts (cusn schema)

This notebook creates the Bronze layer (cusn schema) with shortcuts to both:
- **Batch Historical Data**: 24 parquet tables in ADLSv2 (6 dimensions + 18 facts)
- **Streaming Real-Time Data**: 18 event tables in Eventhouse

The Bronze layer serves as the unified source for the Silver layer transformation.

## Prerequisites
- ADLSv2 storage account with parquet files (exported from datagen)
- Eventhouse with streaming event tables
- Lakehouse workspace permissions

## Architecture
```
ADLSv2 Parquet (24 tables) ──shortcut──> cusn.dim_*, cusn.fact_*
Eventhouse Events (18 tables) ──shortcut──> cusn.receipt_created, etc.
```
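
The ADLSv2 half of the diagram can be sketched as a pure function that maps a table name to its source folder and its Bronze name. This is a minimal illustration, not the notebook's own code: `adls_table_location` and `bronze_table_name` are hypothetical helpers, and the one-folder-per-table layout is assumed from the configuration used later in the notebook.

```python
def adls_table_location(account: str, container: str, base_path: str, table_name: str) -> str:
    """Build the abfss:// folder URL for one parquet table (hypothetical helper)."""
    # Strip slashes so "", "raw", and "/raw/" all produce a well-formed path.
    prefix = base_path.strip("/")
    relative = f"{prefix}/{table_name}" if prefix else table_name
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative}/"


def bronze_table_name(schema: str, table_name: str) -> str:
    """Qualified Bronze table name, e.g. cusn.dim_stores (hypothetical helper)."""
    return f"{schema}.{table_name}"


location = adls_table_location("stdretail", "supermarket", "", "dim_stores")
print(bronze_table_name("cusn", "dim_stores"), "->", location)
```

Normalizing the base path up front means a value such as `raw` (without a trailing slash) cannot be glued directly onto the table name.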

In [None]:
# Configuration Parameters
# Replace these with your actual Azure resource values

# ADLSv2 Configuration
ADLS_ACCOUNT = "stdretail"  # Storage account name
ADLS_CONTAINER = "supermarket"  # Container name
ADLS_BASE_PATH = ""  # Base path within container (empty for root)

# Eventhouse Configuration  
EVENTHOUSE_URI = "<replace-with-eventhouse-uri>"  # e.g., https://xyz.kusto.windows.net
EVENTHOUSE_DATABASE = "kql_retail_db"  # KQL database name

# Schema Names
BRONZE_SCHEMA = "cusn"

print(f"Configuration loaded:")
print(f"  ADLSv2: abfss://{ADLS_CONTAINER}@{ADLS_ACCOUNT}.dfs.core.windows.net/{ADLS_BASE_PATH}")
print(f"  Eventhouse: {EVENTHOUSE_URI}/{EVENTHOUSE_DATABASE}")
print(f"  Bronze Schema: {BRONZE_SCHEMA}")

## Step 1: Create Bronze Schema

In [None]:
# Create cusn schema
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {BRONZE_SCHEMA}")
print(f"✓ Schema '{BRONZE_SCHEMA}' created or already exists")

# Verify schema creation
schemas = spark.sql("SHOW SCHEMAS").collect()
if any(row.namespace == BRONZE_SCHEMA for row in schemas):
    print(f"✓ Verified: Schema '{BRONZE_SCHEMA}' exists")
else:
    raise Exception(f"Failed to create schema '{BRONZE_SCHEMA}'")

## Step 2: Create Shortcuts to ADLSv2 Parquet Tables (Batch Historical Data)

Creates 24 shortcuts to parquet files in ADLSv2:
- 6 Dimension Tables
- 18 Fact Tables
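
The helper cell in this step registers external tables rather than true OneLake shortcuts, and its comments name the Fabric REST API as an alternative. As a rough sketch of that alternative, the function below only builds a request body and sends nothing. The field names (`path`, `name`, `target.adlsGen2.location`, `subpath`, `connectionId`) and the `Tables/<schema>` path are assumptions about the OneLake shortcuts API and should be checked against the current Fabric documentation. `adls_shortcut_payload` and the connection ID value are hypothetical.

```python
import json


def adls_shortcut_payload(table_name: str, account: str, container: str,
                          base_path: str, connection_id: str,
                          schema: str = "cusn") -> dict:
    """Build an assumed request body for creating one ADLS Gen2 shortcut."""
    prefix = base_path.strip("/")
    # Subpath is assumed to be "/<container>/<optional base path>/<table>".
    subpath = "/" + "/".join(part for part in (container, prefix, table_name) if part)
    return {
        "path": f"Tables/{schema}",   # assumed location inside the Lakehouse
        "name": table_name,
        "target": {
            "adlsGen2": {
                "location": f"https://{account}.dfs.core.windows.net",
                "subpath": subpath,
                "connectionId": connection_id,  # hypothetical placeholder value
            }
        },
    }


payload = adls_shortcut_payload("dim_stores", "stdretail", "supermarket", "", "<connection-id>")
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the HTTP call makes it possible to inspect the body for all 24 tables before any request is sent.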

In [None]:
# Define dimension tables
dimension_tables = [
    "dim_geographies",
    "dim_stores",
    "dim_distribution_centers",
    "dim_trucks",
    "dim_customers",
    "dim_products"
]

# Define fact tables (all 18)
fact_tables = [
    "fact_receipts",
    "fact_receipt_lines",
    "fact_store_inventory_txn",
    "fact_dc_inventory_txn",
    "fact_truck_moves",
    "fact_truck_inventory",
    "fact_foot_traffic",
    "fact_ble_pings",
    "fact_customer_zone_changes",
    "fact_marketing",
    "fact_online_order_headers",
    "fact_online_order_lines",
    "fact_payments",
    "fact_store_ops",
    "fact_stockouts",
    "fact_promotions",
    "fact_promo_lines",
    "fact_reorders"
]

all_parquet_tables = dimension_tables + fact_tables
print(f"Total parquet tables to create shortcuts for: {len(all_parquet_tables)}")
print(f"  Dimensions: {len(dimension_tables)}")
print(f"  Facts: {len(fact_tables)}")

In [None]:
from notebookutils import mssparkutils

# Helper function to create ADLSv2 shortcut
def create_adls_shortcut(table_name: str, schema: str = BRONZE_SCHEMA) -> bool:
    """
    Register an external table over an ADLSv2 parquet folder.

    This stands in for a true OneLake shortcut, which may need to be created
    through the Fabric REST API or portal instead.
    
    Args:
        table_name: Name of the table (e.g., 'dim_stores')
        schema: Target schema name (default: cusn)
        
    Returns:
        True if successful, False otherwise
    """
    try:
        # Construct the ADLS path. Normalize the base path so that a value
        # without a trailing slash (e.g. "raw") still yields ".../raw/<table>/"
        base = ADLS_BASE_PATH.strip("/")
        relative_path = f"{base}/{table_name}" if base else table_name
        adls_path = f"abfss://{ADLS_CONTAINER}@{ADLS_ACCOUNT}.dfs.core.windows.net/{relative_path}/"
        
        # No shortcut API is called here: the table is registered as an
        # external parquet table whose LOCATION points at the ADLS folder.
        spark.sql(f"""
            CREATE TABLE IF NOT EXISTS {schema}.{table_name}
            USING PARQUET
            LOCATION '{adls_path}'
        """)
        
        print(f"  ✓ {schema}.{table_name} -> {adls_path}")
        return True
        
    except Exception as e:
        print(f"  ✗ Failed to create {schema}.{table_name}: {e}")
        return False

# Create shortcuts for all parquet tables
print(f"\nCreating shortcuts to ADLSv2 parquet tables...\n")
success_count = 0
failed_tables = []

for table in all_parquet_tables:
    if create_adls_shortcut(table):
        success_count += 1
    else:
        failed_tables.append(table)

print(f"\nADLSv2 Shortcut Creation Summary:")
print(f"  Success: {success_count}/{len(all_parquet_tables)}")
if failed_tables:
    print(f"  Failed tables: {', '.join(failed_tables)}")

## Step 3: Create Shortcuts to Eventhouse Event Tables (Streaming Real-Time Data)

Creates 18 shortcuts to streaming event tables in Eventhouse.
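
Because the configuration cell ships `EVENTHOUSE_URI` as a placeholder, a guard along the following lines could skip the 18 attempts until a real URI is supplied. This is a sketch only: `eventhouse_uri_is_configured` is a hypothetical helper, and requiring an `https` scheme is an assumption based on the example URI in the configuration comment.

```python
from urllib.parse import urlparse


def eventhouse_uri_is_configured(uri: str) -> bool:
    """Return True when the URI looks like a real https endpoint, not a placeholder."""
    # Angle brackets mark the unedited "<replace-with-...>" template value.
    if "<" in uri or ">" in uri:
        return False
    parsed = urlparse(uri)
    return parsed.scheme == "https" and bool(parsed.netloc)


print(eventhouse_uri_is_configured("<replace-with-eventhouse-uri>"))
print(eventhouse_uri_is_configured("https://xyz.kusto.windows.net"))
```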

In [None]:
# Define streaming event tables (matches EventType enum)
event_tables = [
    # Transaction Events
    "receipt_created",
    "receipt_line_added",
    "payment_processed",
    
    # Inventory Events
    "inventory_updated",
    "stockout_detected",
    "reorder_triggered",
    
    # Customer Events
    "customer_entered",
    "customer_zone_changed",
    "ble_ping_detected",
    
    # Operational Events
    "truck_arrived",
    "truck_departed",
    "store_opened",
    "store_closed",
    
    # Marketing Events
    "ad_impression",
    "promotion_applied",
    
    # Omnichannel Events
    "online_order_created",
    "online_order_picked",
    "online_order_shipped"
]

print(f"Total event tables to create shortcuts for: {len(event_tables)}")

In [None]:
def create_eventhouse_shortcut(table_name: str, schema: str = BRONZE_SCHEMA) -> bool:
    """
    Create a shortcut to an Eventhouse KQL table.
    
    Args:
        table_name: Name of the event table (e.g., 'receipt_created')
        schema: Target schema name (default: cusn)
        
    Returns:
        True if successful, False otherwise
    """
    try:
        # Construct Eventhouse URI
        kql_uri = f"{EVENTHOUSE_URI}/{EVENTHOUSE_DATABASE}"
        
        # Attempt to register the Eventhouse table through Spark SQL.
        # This is a placeholder: the data source name and options below are
        # not a confirmed Fabric API, so the statement may fail. In that case,
        # create the shortcut via the Fabric UI or REST API instead.
        spark.sql(f"""
            CREATE TABLE IF NOT EXISTS {schema}.{table_name}
            USING org.apache.spark.sql.eventhouse
            OPTIONS (
                'eventhouse.uri' = '{kql_uri}',
                'eventhouse.table' = '{table_name}'
            )
        """)
        
        print(f"  ✓ {schema}.{table_name} -> {kql_uri}/{table_name}")
        return True
        
    except Exception as e:
        # Eventhouse shortcuts may need to be created via UI or REST API
        print(f"  ⚠ {schema}.{table_name}: Shortcut may need manual creation in Fabric UI")
        print(f"    Source: {EVENTHOUSE_URI}/{EVENTHOUSE_DATABASE}.{table_name}")
        print(f"    Error: {e}")
        return False

# Create shortcuts for all event tables
print(f"\nCreating shortcuts to Eventhouse event tables...\n")
print(f"Note: Eventhouse shortcuts may require manual creation via Fabric UI")
print(f"      if the programmatic API is not available.\n")

event_success_count = 0
event_failed_tables = []

for table in event_tables:
    if create_eventhouse_shortcut(table):
        event_success_count += 1
    else:
        event_failed_tables.append(table)

print(f"\nEventhouse Shortcut Creation Summary:")
print(f"  Success: {event_success_count}/{len(event_tables)}")
if event_failed_tables:
    print(f"  Manual creation needed for: {', '.join(event_failed_tables)}")

## Step 4: Verify Bronze Layer Shortcuts
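
The verification cell in this step compares table counts against the 42-table target, which shows that something is missing but not what. A small set comparison, sketched below with a hypothetical helper `find_missing_tables`, would name the absent tables. The case-insensitive match is an assumption about how the catalog reports table names.

```python
def find_missing_tables(expected, actual):
    """Return the expected table names that are absent from `actual`, sorted."""
    # Compare case-insensitively in case the catalog lowercases identifiers.
    actual_lower = {name.lower() for name in actual}
    return sorted(name for name in expected if name.lower() not in actual_lower)


expected = ["dim_stores", "fact_receipts", "receipt_created"]
actual = ["dim_stores", "fact_receipts"]
print(find_missing_tables(expected, actual))
```

In this notebook, `expected` could be `all_parquet_tables + event_tables` and `actual` could be `[t.tableName for t in bronze_tables]`.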

In [None]:
# List all tables in Bronze schema
bronze_tables = spark.sql(f"SHOW TABLES IN {BRONZE_SCHEMA}").collect()

print(f"\nBronze Schema ({BRONZE_SCHEMA}) Tables:")
print(f"  Total: {len(bronze_tables)} tables\n")

# Categorize tables
dim_count = sum(1 for t in bronze_tables if t.tableName.startswith('dim_'))
fact_count = sum(1 for t in bronze_tables if t.tableName.startswith('fact_'))
event_count = len(bronze_tables) - dim_count - fact_count

print(f"  Dimensions: {dim_count} (expected: 6)")
print(f"  Facts: {fact_count} (expected: 18)")
print(f"  Events: {event_count} (expected: 18)")
print(f"\n  Target: 42 tables (6 dims + 18 facts + 18 events)")

if len(bronze_tables) == 42:
    print(f"\n✓ Bronze layer complete with all 42 shortcuts!")
else:
    print(f"\n⚠ Bronze layer has {len(bronze_tables)}/42 tables")
    print(f"  Check for missing tables and create shortcuts manually if needed")

## Step 5: Test Bronze Layer Access

In [None]:
# Test reading from Bronze shortcuts
print("\nTesting Bronze layer shortcuts...\n")

# Test dimension shortcut
try:
    df_dim = spark.table(f"{BRONZE_SCHEMA}.dim_stores")
    print(f"✓ dim_stores: {df_dim.count()} rows")
    df_dim.printSchema()
except Exception as e:
    print(f"✗ dim_stores failed: {e}")

print("\n" + "="*60 + "\n")

# Test fact shortcut
try:
    df_fact = spark.table(f"{BRONZE_SCHEMA}.fact_receipts")
    print(f"✓ fact_receipts: {df_fact.count()} rows")
    df_fact.printSchema()
except Exception as e:
    print(f"✗ fact_receipts failed: {e}")

print("\n" + "="*60 + "\n")

# Test event shortcut (if available)
try:
    df_event = spark.table(f"{BRONZE_SCHEMA}.receipt_created")
    print(f"✓ receipt_created: {df_event.count()} rows")
    df_event.printSchema()
except Exception as e:
    print(f"⚠ receipt_created: {e}")
    print(f"  (Event shortcuts may need manual creation)")

print("\n" + "="*60)
print("\nBronze layer testing complete!")

## Summary

This notebook has created the Bronze layer (cusn schema) with shortcuts to:

### ADLSv2 Parquet Tables (24 tables)
- 6 Dimension tables: dim_*
- 18 Fact tables: fact_*

### Eventhouse Event Tables (18 tables)
- Transaction events: receipt_created, receipt_line_added, payment_processed
- Inventory events: inventory_updated, stockout_detected, reorder_triggered
- Customer events: customer_entered, customer_zone_changed, ble_ping_detected
- Operational events: truck_arrived, truck_departed, store_opened, store_closed
- Marketing events: ad_impression, promotion_applied
- Omnichannel events: online_order_created, online_order_picked, online_order_shipped

### Next Steps
1. If any shortcuts failed, create them manually via Fabric UI
2. Verify all 42 shortcuts are accessible
3. Run the Silver transformation notebook (02-onelake-to-silver.ipynb)
4. Silver layer will combine batch + streaming data into unified ag.* tables
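
The first two steps above can be partly automated by probing every Bronze table and collecting the failures. The sketch below is an illustration under assumptions: `check_table_access` is a hypothetical helper, and the row-count function is injected so the logic can be exercised without a Spark session.

```python
from typing import Callable, Dict, Iterable


def check_table_access(tables: Iterable[str],
                       count_rows: Callable[[str], int]) -> Dict[str, str]:
    """Try to read each table; return {table_name: error_message} for failures."""
    failures = {}
    for name in tables:
        try:
            count_rows(name)
        except Exception as exc:  # report any read error instead of stopping
            failures[name] = str(exc)
    return failures


def fake_count(name: str) -> int:
    """Stand-in reader for demonstration: one table is made to fail."""
    if name == "receipt_created":
        raise RuntimeError("table not found")
    return 10


print(check_table_access(["dim_stores", "receipt_created"], fake_count))
```

Inside the notebook, the injected function could be `lambda name: spark.table(f"{BRONZE_SCHEMA}.{name}").limit(1).count()`, which avoids a full scan of large fact tables.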

### Manual Shortcut Creation (if needed)
For Eventhouse shortcuts that couldn't be created programmatically:
1. Navigate to the Lakehouse in the Fabric workspace
2. Right-click on Tables
3. Select "New shortcut" → "Eventhouse"
4. Enter the Eventhouse URI and table name
5. Set the target schema to cusn