# Silver Layer Transformation - Databricks

This notebook creates cleaned, enriched Silver tables from Bronze Delta tables.

**Prerequisites:**
- Bronze layer ingestion completed
- Tables exist in `/FileStore/instacart/bronze/`

**Output:**
- Enriched tables in `/FileStore/instacart/silver/`

In [None]:
# Configuration
BRONZE_PATH = "/FileStore/instacart/bronze"
SILVER_PATH = "/FileStore/instacart/silver"

print(f"Bronze input path: {BRONZE_PATH}")
print(f"Silver output path: {SILVER_PATH}")
print(f"Spark version: {spark.version}")

## Step 1: Create Product Master Table

Left-join products with aisles and departments to create a complete product catalog. Products without a matching aisle or department are kept, with null names.

In [None]:
from pyspark.sql.functions import col, current_timestamp

# Read Bronze tables
products_df = spark.read.format("delta").load(f"{BRONZE_PATH}/products")
aisles_df = spark.read.format("delta").load(f"{BRONZE_PATH}/aisles")
departments_df = spark.read.format("delta").load(f"{BRONZE_PATH}/departments")

# Join to create product master
product_master = products_df \
    .join(aisles_df, "aisle_id", "left") \
    .join(departments_df, "department_id", "left") \
    .select(
        col("product_id"),
        col("product_name"),
        col("aisle_id"),
        col("aisle"),
        col("department_id"),
        col("department")
    ) \
    .withColumn("processing_timestamp", current_timestamp())

# Write to Silver
product_master.write \
    .format("delta") \
    .mode("overwrite") \
    .save(f"{SILVER_PATH}/product_master")

count = product_master.count()
print(f"✓ Created product_master with {count:,} products")

In [None]:
# Preview product master
print("Product Master - Sample Data:")
display(product_master.limit(20))

## Step 2: Create Enriched Order Products (Prior)

Inner-join `order_products_prior` with `orders` and the Silver `product_master` table, then write the result partitioned by `department_id`.

In [None]:
# Read Bronze tables
order_products_prior_df = spark.read.format("delta").load(f"{BRONZE_PATH}/order_products_prior")
orders_df = spark.read.format("delta").load(f"{BRONZE_PATH}/orders")
product_master_df = spark.read.format("delta").load(f"{SILVER_PATH}/product_master")

# Join to create enriched table
order_products_prior_enriched = order_products_prior_df \
    .join(orders_df, "order_id", "inner") \
    .join(product_master_df, "product_id", "inner") \
    .select(
        col("order_id"),
        col("user_id"),
        col("order_number"),
        col("order_dow"),
        col("order_hour_of_day"),
        col("days_since_prior_order"),
        col("product_id"),
        col("product_name"),
        col("aisle_id"),
        col("aisle"),
        col("department_id"),
        col("department"),
        col("add_to_cart_order"),
        col("reordered")
    ) \
    .withColumn("processing_timestamp", current_timestamp())

# Write to Silver with partitioning
order_products_prior_enriched.write \
    .format("delta") \
    .mode("overwrite") \
    .partitionBy("department_id") \
    .save(f"{SILVER_PATH}/order_products_prior_enriched")

count = order_products_prior_enriched.count()
print(f"✓ Created order_products_prior_enriched with {count:,} records")

In [None]:
# Preview enriched prior orders
print("Order Products Prior (Enriched) - Sample Data:")
display(order_products_prior_enriched.limit(20))

## Step 3: Create Enriched Order Products (Train)

Apply the same enrichment to the training set, reusing `orders_df` and `product_master_df` from Step 2.

In [None]:
# Read Bronze table
order_products_train_df = spark.read.format("delta").load(f"{BRONZE_PATH}/order_products_train")

# Join to create enriched table
order_products_train_enriched = order_products_train_df \
    .join(orders_df, "order_id", "inner") \
    .join(product_master_df, "product_id", "inner") \
    .select(
        col("order_id"),
        col("user_id"),
        col("order_number"),
        col("order_dow"),
        col("order_hour_of_day"),
        col("days_since_prior_order"),
        col("product_id"),
        col("product_name"),
        col("aisle_id"),
        col("aisle"),
        col("department_id"),
        col("department"),
        col("add_to_cart_order"),
        col("reordered")
    ) \
    .withColumn("processing_timestamp", current_timestamp())

# Write to Silver with partitioning
order_products_train_enriched.write \
    .format("delta") \
    .mode("overwrite") \
    .partitionBy("department_id") \
    .save(f"{SILVER_PATH}/order_products_train_enriched")

count = order_products_train_enriched.count()
print(f"✓ Created order_products_train_enriched with {count:,} records")

## Step 4: Create User Order Summary

Aggregate user-level statistics from orders.

In [None]:
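# Alias count so it does not collide with the `count` variable used for row counts
from pyspark.sql.functions import count as spark_count, avg, max as spark_max

# Aggregate user statistics
user_order_summary = orders_df \
    .groupBy("user_id") \
    .agg(
        spark_count("order_id").alias("total_orders"),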
        spark_max("order_number").alias("max_order_number"),
        avg("order_dow").alias("avg_order_dow"),
        avg("order_hour_of_day").alias("avg_order_hour"),
        avg("days_since_prior_order").alias("avg_days_between_orders")
    ) \
    .withColumn("processing_timestamp", current_timestamp())

# Write to Silver
user_order_summary.write \
    .format("delta") \
    .mode("overwrite") \
    .save(f"{SILVER_PATH}/user_order_summary")

count = user_order_summary.count()
print(f"✓ Created user_order_summary with {count:,} users")

In [None]:
# Preview user summary
print("User Order Summary - Sample Data:")
display(user_order_summary.limit(20))

## Verify Silver Tables

List every entry under the Silver path to confirm the tables were written.

In [None]:
# List Silver tables
silver_tables = dbutils.fs.ls(SILVER_PATH)

print("=" * 80)
print("SILVER LAYER TRANSFORMATION COMPLETE")
print("=" * 80)
print("\nSilver tables created:")
for table in silver_tables:
    print(f"  ✓ {table.name}")
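
The listing above shows whatever exists under the Silver path but does not confirm that all four expected tables are present. A minimal sketch of such a check follows. The helper name `missing_tables` and the example listing are assumptions. Trailing slashes are stripped because directory names in a listing may carry them.

```python
# Tables this notebook is expected to write.
EXPECTED_SILVER_TABLES = {
    "product_master",
    "order_products_prior_enriched",
    "order_products_train_enriched",
    "user_order_summary",
}

def missing_tables(listed_names, expected=EXPECTED_SILVER_TABLES):
    """Return the expected table names that are absent from a directory listing."""
    found = {name.rstrip("/") for name in listed_names}
    return sorted(expected - found)

# Hypothetical listing in which two tables were never written.
example_listing = ["product_master/", "user_order_summary/"]
print(missing_tables(example_listing))
```

In the notebook, the call would be `missing_tables(t.name for t in silver_tables)`, and an empty list would mean every expected table is present.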

## Summary

✅ **Silver layer transformation complete!**

**Tables created:**
- `product_master` - Complete product catalog with hierarchy
- `order_products_prior_enriched` - Historical purchases with full product info
- `order_products_train_enriched` - Training-set purchases with full product info
- `user_order_summary` - User-level aggregates

**Next steps:**
1. Run `03_gold_aggregation_databricks` to create business metrics
2. Explore Silver tables in Data Explorer