# Gold Aggregation - Daily Sales Summary

## Summary
- Purpose: Aggregate transactional facts into gold-level daily sales summaries for analytics and reporting.
- Inputs: `capstone.gold.daily_sales_fact` (fact table)
- Outputs: `capstone.gold.daily_sales_summary` (aggregated metrics by date and region)
- Audit: Calls `audit_log(spark, table_name, log_path)` after write to record operation metadata.

## Key Transformations
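- Compute total revenue, total units sold, average order value (revenue divided by distinct orders, rounded to 2 decimals), and distinct order count per order date and region.
- Label groups whose `region` is null as `Unknown`.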
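
As a rough illustration of the metrics above, the following plain-Python sketch computes the same four values on a few made-up rows. The sample data and the `summarize` helper are hypothetical and exist only for this example; the notebook itself performs the aggregation with PySpark.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_date, region, order_id, quantity, line_total)
rows = [
    ("2024-01-01", "East", "o1", 2, 20.0),
    ("2024-01-01", "East", "o1", 1, 5.0),
    ("2024-01-01", "East", "o2", 3, 45.0),
    ("2024-01-01", None, "o3", 1, 10.0),
]

def summarize(rows):
    """Aggregate line items into per-(date, region) metrics (illustrative helper)."""
    groups = defaultdict(lambda: {"revenue": 0.0, "units": 0, "orders": set()})
    for order_date, region, order_id, quantity, line_total in rows:
        # A null region is reported as "Unknown"
        key = (order_date, region if region is not None else "Unknown")
        groups[key]["revenue"] += line_total
        groups[key]["units"] += quantity
        groups[key]["orders"].add(order_id)  # set membership gives a distinct count
    return {
        key: {
            "total_revenue": g["revenue"],
            "total_units_sold": g["units"],
            "avg_order_value": round(g["revenue"] / len(g["orders"]), 2),
            "orders_count": len(g["orders"]),
        }
        for key, g in groups.items()
    }

summary = summarize(rows)
```

Order `o1` spans two line items but counts as a single order, so the East group has two orders and an average order value of 70.0 / 2 = 35.0.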

## Usage
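- Run after populating the gold fact table.
- Set the `catalog` widget (default `capstone`) to choose the catalog that is read from and written to.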


In [None]:
dbutils.widgets.text("catalog", "capstone", "Enter the Catalog: ")

In [None]:
from capstone_pipeline.main import audit_log

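# Read the widget once. Reusing double quotes inside a double-quoted f-string
# is a SyntaxError before Python 3.12, so avoid nesting the call in the f-string.
catalog = dbutils.widgets.get("catalog")
table_name = f"{catalog}.gold.daily_sales_summary"
log_path = f"/Volumes/{catalog}/meta/history"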

In [None]:
from pyspark.sql import functions as F

# Single quotes inside the f-string keep this valid on Python versions before 3.12.
dfgold = (
    spark.table(f"{dbutils.widgets.get('catalog')}.gold.daily_sales_fact")
    .groupBy("order_date", "region")
    .agg(
        F.sum("line_total").alias("total_revenue"),
        F.sum("quantity").alias("total_units_sold"),
        # Average order value = revenue / distinct orders, rounded to 2 decimals
        F.round(F.sum("line_total") / F.countDistinct("order_id"), 2).alias("avg_order_value"),
        F.countDistinct("order_id").alias("orders_count"),
    )
    # Report a null region as "Unknown"
    .withColumn("region", F.coalesce(F.col("region"), F.lit("Unknown")))
    .orderBy(F.col("order_date").desc(), F.col("region"))
)



In [None]:
(dfgold
    .write
    .mode("overwrite")
    .format("delta")
    # Reuse table_name so the write and the audit log refer to the same table
    .saveAsTable(table_name))

In [None]:
audit_log(spark, table_name, log_path)