# Silver to Gold Layer Aggregation

**Purpose:** Create business-ready daily revenue reports  
**Input:** Silver parquet files (cleaned purchases)  
**Output:** Gold parquet files (daily aggregations)

## Step 1: Import Libraries

In [None]:
# Note: this import shadows Python's built-in sum() for the rest of the notebook.
from pyspark.sql.functions import sum, count

## Step 2: Read Silver Layer Data

In [None]:
# Configuration: set STORAGE_ACCOUNT to your own storage account name
STORAGE_ACCOUNT = "stgretail1763452716"
SILVER_PATH = f"abfss://retail@{STORAGE_ACCOUNT}.dfs.core.windows.net/silver/cleaned_transactions/*.parquet"

df_silver = spark.read.parquet(SILVER_PATH)

print(f"Silver records loaded: {df_silver.count()}")
df_silver.show(5)
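
The storage path is assembled by hand here and again in Step 4, so a small helper can keep the two consistent. The following is a minimal sketch, not part of the original notebook: `abfss_path` is a hypothetical helper name, and the container name `retail` is taken from the paths used in this notebook.

```python
def abfss_path(account: str, container: str, relative_path: str) -> str:
    """Build an abfss:// URI for Azure Data Lake Storage Gen2.

    Hypothetical helper for illustration; it only formats a string and
    does not check that the account, container, or path exist.
    """
    # Drop any leading slash so the URI does not contain "//" after the host.
    relative_path = relative_path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path}"


# Same values as the configuration cell above.
STORAGE_ACCOUNT = "stgretail1763452716"
SILVER_PATH = abfss_path(STORAGE_ACCOUNT, "retail", "silver/cleaned_transactions/*.parquet")
GOLD_PATH = abfss_path(STORAGE_ACCOUNT, "retail", "gold/daily_revenue")
```

With this in place, changing `STORAGE_ACCOUNT` or the container name in one spot updates both the Silver and Gold paths.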

## Step 3: Daily Revenue Aggregation

Calculate key metrics:
- Total revenue per day (sum of amounts)
- Total purchases per day (count of transactions)

In [None]:
df_daily_revenue = (
    df_silver
    .groupBy("event_date")
    .agg(
        sum("amount").alias("daily_revenue"),
        count("*").alias("total_purchases")
    )
    .orderBy("event_date")
)

print("Daily Revenue Report:")
df_daily_revenue.show()
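
To make the aggregation concrete, here is a plain-Python sketch of what the `groupBy("event_date")` with `sum("amount")` and `count("*")` computes. The sample rows are hypothetical and stand in for `df_silver`; `aggregate_daily` is an illustrative name, not a Spark function.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for df_silver (not real data).
sample_rows = [
    {"event_date": "2024-01-01", "amount": 10.0},
    {"event_date": "2024-01-01", "amount": 5.5},
    {"event_date": "2024-01-02", "amount": 20.0},
]


def aggregate_daily(rows):
    """Group rows by event_date, summing amounts and counting rows."""
    totals = defaultdict(lambda: {"daily_revenue": 0.0, "total_purchases": 0})
    for row in rows:
        day = totals[row["event_date"]]
        # Like count("*"): every row counts, even if its amount is null.
        day["total_purchases"] += 1
        # Like sum("amount"): null amounts are skipped.
        # Simplification: Spark returns null, not 0.0, when every amount
        # in a group is null.
        if row["amount"] is not None:
            day["daily_revenue"] += row["amount"]
    # Like orderBy("event_date"); assumes no null dates in the input.
    return [{"event_date": d, **totals[d]} for d in sorted(totals)]


daily = aggregate_daily(sample_rows)
for line in daily:
    print(line)
```

One consequence worth noting: because `count("*")` counts rows rather than non-null amounts, a day with missing amounts still contributes to `total_purchases` while adding nothing to `daily_revenue`.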

## Step 4: Write to Gold Layer

In [None]:
GOLD_PATH = f"abfss://retail@{STORAGE_ACCOUNT}.dfs.core.windows.net/gold/daily_revenue"

# "overwrite" replaces any existing output at GOLD_PATH on each run.
df_daily_revenue.write.mode("overwrite").parquet(GOLD_PATH)

print(f"✅ Gold layer written to: {GOLD_PATH}")

## Step 5: Summary Statistics (Optional)

In [None]:
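# sum is re-imported so this cell also runs on its own.
# Note: max and min shadow Python's built-ins of the same name.
from pyspark.sql.functions import avg, max, min, sum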

summary = df_daily_revenue.agg(
    sum("daily_revenue").alias("total_revenue"),
    sum("total_purchases").alias("total_purchases"),
    avg("daily_revenue").alias("avg_daily_revenue"),
    max("daily_revenue").alias("max_daily_revenue"),
    min("daily_revenue").alias("min_daily_revenue")
)

print("Overall Summary:")
summary.show()
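
Since `total_purchases` is a row count per day, its sum across all days should equal the number of Silver rows. A quick reconciliation check can catch an aggregation that silently dropped or duplicated rows. The sketch below is an assumption about how such a check might look; `reconcile_counts` is a hypothetical helper, and the commented lines show one way the inputs could be obtained from the DataFrames above.

```python
def reconcile_counts(silver_row_count: int, gold_total_purchases: int) -> bool:
    """Return True when the Gold purchase total matches the Silver row count.

    Raises ValueError on a mismatch so a scheduled run fails loudly.
    """
    if silver_row_count != gold_total_purchases:
        raise ValueError(
            f"Row count mismatch: silver={silver_row_count}, "
            f"gold total_purchases={gold_total_purchases}"
        )
    return True


# In the notebook, the inputs could come from the existing DataFrames:
#   silver_rows = df_silver.count()
#   gold_total = summary.first()["total_purchases"]
#   reconcile_counts(silver_rows, gold_total)
```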