### 🛠️ Day 11 Tasks:

1. Calculate statistical summaries
2. Test hypotheses (weekday vs weekend)
3. Identify correlations
4. Engineer features for ML

## Task 1: Calculate Statistical Summaries

Basic Statistics (Revenue)

In [0]:
%sql
SELECT
  COUNT(*)                      AS days_count,
  MIN(total_revenue)            AS min_revenue,
  MAX(total_revenue)            AS max_revenue,
  AVG(total_revenue)            AS avg_revenue,
  STDDEV(total_revenue)         AS stddev_revenue
FROM ecommerce_catalog.gold.daily_sales_partitioned;


days_count,min_revenue,max_revenue,avg_revenue,stddev_revenue
61,339481302.6499579,1998441656.940825,525663331.1660861,334894430.33665925
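
To gauge how spread out daily revenue is, the summary values above can be turned into two quick ratios: the coefficient of variation, and how many standard deviations the maximum sits above the mean. A minimal Python sketch, using only the numbers reported in the output (the variable names are illustrative):

```python
# Summary values copied from the query output above.
avg_revenue = 525663331.1660861
stddev_revenue = 334894430.33665925
max_revenue = 1998441656.940825

# Coefficient of variation: spread relative to the mean.
coeff_of_variation = stddev_revenue / avg_revenue

# Distance of the best day from the mean, in standard deviations.
max_z_score = (max_revenue - avg_revenue) / stddev_revenue

print(f"coefficient of variation: {coeff_of_variation:.2f}")
print(f"max revenue z-score:      {max_z_score:.2f}")
```

A coefficient of variation around 0.64 and a maximum more than four standard deviations above the mean suggest a right-skewed series with at least one unusually large day, which is worth keeping in mind when comparing group averages.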


Distribution Check (Orders)

In [0]:
%sql
SELECT
  MIN(total_orders)  AS min_orders,
  MAX(total_orders)  AS max_orders,
  AVG(total_orders)  AS avg_orders
FROM ecommerce_catalog.gold.daily_sales_partitioned;


min_orders,max_orders,avg_orders
1125950,6460123,1798262.0


## Task 2: Hypothesis Testing (Weekday vs Weekend)

Create Weekday / Weekend Label

In [0]:
%sql
SELECT
  -- Spark SQL dayofweek(): 1 = Sunday, 7 = Saturday
  CASE
    WHEN dayofweek(order_date) IN (1, 7) THEN 'Weekend'
    ELSE 'Weekday'
  END AS day_type,
  AVG(total_revenue) AS avg_revenue
FROM ecommerce_catalog.gold.daily_sales_partitioned
GROUP BY day_type;


day_type,avg_revenue
Weekday,487541148.0416197
Weekend,624332511.0176462


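Average revenue on weekends (~624M) is higher than on weekdays (~488M), a difference of roughly 28%.

This is consistent with the hypothesis that customers spend more on weekends, but a comparison of two means over 61 days is not a significance test. The gap should be checked against the day-to-day variance (stddev ≈ 335M) before the hypothesis is treated as confirmed.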
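
One way to test a difference in group means is Welch's t-test, which does not assume equal variances or equal group sizes. The sketch below is an illustration rather than the notebook's own method: `welch_t` is a hypothetical helper, and the two sample lists are made-up numbers, not values from `daily_sales_partitioned`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical daily revenues (in millions); NOT taken from the table.
weekend = [610.0, 655.0, 580.0, 640.0, 630.0]
weekday = [470.0, 505.0, 490.0, 480.0, 495.0, 500.0]

t_stat, dof = welch_t(weekend, weekday)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

In practice the two lists would hold the per-day `total_revenue` values for each `day_type`, and the resulting t statistic would be compared against a t distribution with the returned degrees of freedom.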

## Task 3: Identify Correlations

Correlation Calculation

In [0]:
%sql
SELECT
  CORR(total_orders, total_revenue) AS orders_revenue_corr
FROM ecommerce_catalog.gold.daily_sales_partitioned;


orders_revenue_corr
0.9984075739818076


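The correlation between total orders and total revenue is extremely high (~0.998), so daily revenue moves almost in lockstep with order volume. Part of this is expected by construction: if `avg_order_value` is revenue per order, then revenue is orders multiplied by average order value, and a coefficient this close to 1 suggests that average order value varies little from day to day. Correlation alone does not show that increasing customer conversions would raise revenue, although the result is consistent with order volume being the main source of revenue variation.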
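
For reference, `CORR` returns the Pearson correlation coefficient. The sketch below shows that calculation in plain Python; `pearson_r` is a hypothetical helper and the sample lists are made-up values, not rows from the table. The last lines square the coefficient reported above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily values; NOT taken from the table.
orders = [1_200_000, 1_500_000, 1_800_000, 2_400_000, 6_000_000]
revenue = [350e6, 440e6, 530e6, 700e6, 1_800e6]

r = pearson_r(orders, revenue)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")

# r^2 for the coefficient reported by the query above.
reported_r = 0.9984075739818076
print(f"reported r^2 = {reported_r ** 2:.4f}")
```

Squaring the reported coefficient gives about 0.997, meaning a simple linear fit on order volume would account for nearly all of the day-to-day variance in revenue.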

## Task 4: Feature Engineering for ML

Create Feature Table

In [0]:
%sql
CREATE OR REPLACE TABLE ecommerce_catalog.gold.daily_sales_features AS
SELECT
  order_date,
  dayofweek(order_date)                AS day_of_week,
  CASE 
    WHEN dayofweek(order_date) IN (1,7) THEN 1 ELSE 0 
  END                                  AS is_weekend,
  total_orders,
  total_revenue,
  avg_order_value,
  -- NULL for the earliest order_date, which has no previous row
  LAG(total_revenue, 1) OVER (ORDER BY order_date) AS prev_day_revenue
FROM ecommerce_catalog.gold.daily_sales_partitioned;


num_affected_rows,num_inserted_rows
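
To make the semantics of the engineered columns concrete, here is a small Python sketch of the same logic on three made-up rows. The helpers `spark_dayofweek` and `lag` are hypothetical stand-ins written for illustration; they mimic the SQL functions used above but are not part of any library.

```python
from datetime import date


def spark_dayofweek(d):
    """Mimic Spark SQL dayofweek(): 1 = Sunday, ..., 7 = Saturday."""
    # isoweekday() is 1 = Monday, ..., 7 = Sunday, so rotate by one day.
    return d.isoweekday() % 7 + 1


def lag(values, offset=1):
    """Shift a date-ordered list back by `offset`, padding with None (NULL)."""
    return [values[i - offset] if i >= offset else None
            for i in range(len(values))]


# Hypothetical rows (order_date, total_revenue), already sorted by date.
rows = [
    (date(2000, 1, 1), 100.0),  # Saturday
    (date(2000, 1, 2), 120.0),  # Sunday
    (date(2000, 1, 3), 90.0),   # Monday
]

prev_revenue = lag([revenue for _, revenue in rows])
features = [
    {
        "order_date": d,
        "day_of_week": spark_dayofweek(d),
        "is_weekend": 1 if spark_dayofweek(d) in (1, 7) else 0,
        "total_revenue": revenue,
        "prev_day_revenue": prev,
    }
    for (d, revenue), prev in zip(rows, prev_revenue)
]

for row in features:
    print(row)
```

The first row has no previous day, so `prev_day_revenue` is empty there. A downstream model would need to drop or impute that row.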
