# Gold Aggregation - Daily Sales Summary

## Summary
- Purpose: Aggregate transactional facts into gold-level daily sales summaries for analytics and reporting.
- Inputs: `capstone.gold.daily_sales_fact` (line-level fact table with `order_date`, `region`, `order_id`, `line_total`, and `quantity` columns)
- Outputs: `capstone.gold.daily_sales_summary` (aggregated metrics by date and region)
- Audit: Calls `audit_log(spark, table_name, log_path)` after the write completes to record operation metadata.
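The internals of `audit_log` are not shown in this notebook. As a purely hypothetical illustration of what "operation metadata" might contain, the sketch below builds one record per write and appends it to a JSON Lines file. The helper names (`build_audit_record`, `append_audit_record`), the field names, and the JSON Lines format are all assumptions, not the actual `capstone_pipeline.main` implementation.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_audit_record(table_name: str, operation: str = "overwrite",
                       row_count: Optional[int] = None) -> Dict[str, Any]:
    """Assemble one audit entry. HYPOTHETICAL schema, not the real audit_log output."""
    return {
        "table_name": table_name,
        "operation": operation,
        "row_count": row_count,
        # UTC timestamp in ISO 8601 so entries sort and compare consistently
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


def append_audit_record(record: Dict[str, Any], log_file: str) -> None:
    """Append the record as a single JSON line (HYPOTHETICAL log format)."""
    with open(log_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

In a notebook, `row_count` could be filled from `dfgold.count()`, at the cost of running an extra Spark job.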

## Key Transformations
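- Group the fact rows by `order_date` and `region`.
- `total_revenue`: sum of `line_total`.
- `total_units_sold`: sum of `quantity`.
- `orders_count`: count of distinct `order_id` values.
- `avg_order_value`: `total_revenue / orders_count`, rounded to 2 decimals; NULL when `orders_count` is 0.
- Results are ordered by `order_date` descending, then `region` ascending.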
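The metric definitions can be checked without a cluster. The following is a minimal pure-Python sketch, assuming each fact row is a dict with `order_date`, `region`, `order_id`, `line_total`, and `quantity` keys and non-null numeric amounts; `summarize_daily_sales` and `sample_rows` are hypothetical names, not part of the pipeline. It mirrors the Spark logic, including the `nullif` guard that yields a null average when a group has no orders. One caveat: Python's `round` rounds half to even, whereas Spark's `round` rounds half up, so results can differ on ties.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def summarize_daily_sales(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group fact rows by (order_date, region) and compute the summary metrics."""
    groups: Dict[Tuple[Any, Any], Dict[str, Any]] = defaultdict(
        lambda: {"total_revenue": 0.0, "total_units_sold": 0, "order_ids": set()}
    )
    for row in rows:
        g = groups[(row["order_date"], row["region"])]
        g["total_revenue"] += row["line_total"]
        g["total_units_sold"] += row["quantity"]
        # countDistinct ignores nulls, so only non-null order ids are counted
        if row["order_id"] is not None:
            g["order_ids"].add(row["order_id"])

    summary = []
    for (order_date, region), g in groups.items():
        orders = len(g["order_ids"])
        # Equivalent of NULLIF(orders, 0): no orders -> average is None (SQL NULL)
        avg = round(g["total_revenue"] / orders, 2) if orders else None
        summary.append({
            "order_date": order_date,
            "region": region,
            "total_revenue": g["total_revenue"],
            "total_units_sold": g["total_units_sold"],
            "avg_order_value": avg,
            "orders_count": orders,
        })

    # Match orderBy(order_date desc, region asc): stable sort by the secondary key first
    summary.sort(key=lambda s: s["region"])
    summary.sort(key=lambda s: s["order_date"], reverse=True)
    return summary


# Usage with made-up sample rows (order 1 spans two line items)
sample_rows = [
    {"order_date": "2024-01-01", "region": "East", "order_id": 1, "line_total": 10.0, "quantity": 2},
    {"order_date": "2024-01-01", "region": "East", "order_id": 1, "line_total": 5.0, "quantity": 1},
    {"order_date": "2024-01-01", "region": "East", "order_id": 2, "line_total": 20.0, "quantity": 4},
    {"order_date": "2024-01-02", "region": "West", "order_id": 3, "line_total": 7.5, "quantity": 3},
]
for s in summarize_daily_sales(sample_rows):
    print(s)
```

Counting distinct `order_id` values rather than rows is what keeps multi-line orders (order 1 above) from inflating `orders_count` and deflating the average.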

## Usage
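- Run after `capstone.gold.daily_sales_fact` has been populated.
- The write uses `overwrite` mode, so each run fully replaces the contents of `capstone.gold.daily_sales_summary`.
- Requires Spark 3.5 or later, where `pyspark.sql.functions.nullif` is available.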


In [None]:
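# `sum` and `round` must be the Spark column functions, not the Python builtins,
# for the aggregation below to work. Importing them here shadows the builtins in this notebook.
# `nullif` requires Spark 3.5+.
from pyspark.sql.functions import col, countDistinct, lit, nullif, round, sum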
from capstone_pipeline.main import audit_log

table_name = "capstone.gold.daily_sales_summary"
log_path = "/Volumes/capstone/bronze/history"

In [None]:
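# Aggregate line-level facts to one row per (order_date, region)
dfgold = (
    spark.table("capstone.gold.daily_sales_fact")
    .groupBy("order_date", "region")
    .agg(
        sum("line_total").alias("total_revenue"),
        sum("quantity").alias("total_units_sold"),
        # NULLIF turns a zero distinct-order count into NULL, so the division
        # returns NULL instead of dividing by zero (which errors under ANSI mode)
        round(sum("line_total") / nullif(countDistinct("order_id"), lit(0)), 2).alias("avg_order_value"),
        countDistinct("order_id").alias("orders_count"),
    )
    # Sorting aids interactive inspection only; the saved table's read order is not guaranteed
    .orderBy(col("order_date").desc(), col("region"))
)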



In [None]:
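# Full refresh: overwrite replaces the summary table's contents on every run.
# Reuse `table_name` so the write target and the audit entry cannot drift apart.
(dfgold
    .write
    .mode("overwrite")
    .format("delta")
    .saveAsTable(table_name))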

In [None]:
# Record metadata for the write above
audit_log(spark, table_name, log_path)