<div style="background-color:#f4f8ff; padding:16px; border-left:6px solid #1f4fd8; border-radius:6px; color:#000;">

<h2 style="margin-top:0; color:#000;">Cumulative Analysis</h2>

<h4 style="color:#000;">Purpose</h4>
<ul>
  <li>Calculate running totals or moving averages for key metrics.</li>
  <li>Track cumulative performance over time.</li>
  <li>Identify long-term trends and growth patterns.</li>
</ul>

<h4 style="color:#000;">SQL Functions Used</h4>
<ul>
  <li><b>SUM() OVER()</b></li>
  <li><b>AVG() OVER()</b></li>
</ul>

</div>
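
As a rough illustration of what these two window functions compute when combined with an ORDER BY, the sketch below reproduces a running total and a running average in plain Python. The three input values are the first monthly totals from the results further down in this notebook. The helper names `running_total` and `running_average` are hypothetical and exist only for this example.

```python
from itertools import accumulate


def running_total(values):
    """Mimic SUM(x) OVER (ORDER BY ...): sum of all rows up to the current one."""
    return list(accumulate(values))


def running_average(values):
    """Mimic AVG(x) OVER (ORDER BY ...): mean of all rows up to the current one."""
    totals = running_total(values)
    return [total / (i + 1) for i, total in enumerate(totals)]


monthly_sales = [43419, 469795, 466307]  # assumed to be sorted by month already
print(running_total(monthly_sales))    # [43419, 513214, 979521]
print(running_average(monthly_sales))  # [43419.0, 256607.0, 326507.0]
```

Each output row depends only on the rows at or before it in the sort order, which is why the ORDER BY inside OVER() matters.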


<div style="background-color:#f4f8ff; padding:10px; border-left:6px solid #1f4fd8; border-radius:6px; color:#000; font-size:20px;">
  <b>1. Calculate the total sales per month and the running total of sales over time</b><br>
  <span style="font-size:16px;">Cumulative Analysis</span>
</div>


In [2]:
query = """
SELECT TOP 10
    order_month,
    total_sales,
    -- With ORDER BY and no explicit frame, each window spans the first row through the current row.
    SUM(total_sales) OVER (ORDER BY order_month) AS running_total_sales,
    -- Running average of the monthly average price (not of sales).
    AVG(avg_price) OVER (ORDER BY order_month) AS moving_average_price
FROM
(
    -- Aggregate the fact table to one row per month.
    SELECT
        DATETRUNC(month, order_date) AS order_month,
        SUM(sales_amount) AS total_sales,
        AVG(price) AS avg_price
    FROM gold.fact_sales
    WHERE order_date IS NOT NULL
    GROUP BY DATETRUNC(month, order_date)
) t
ORDER BY order_month;
"""

df = pd.read_sql(query, engine)
display(HTML(df.to_html(index=False)))

order_month,total_sales,running_total_sales,moving_average_price
2010-12-01,43419,43419,3101
2011-01-01,469795,513214,3181
2011-02-01,466307,979521,3200
2011-03-01,485165,1464686,3208
2011-04-01,502042,1966728,3206
2011-05-01,561647,2528375,3209
2011-06-01,737793,3266168,3209
2011-07-01,596710,3862878,3204
2011-08-01,614516,4477394,3202
2011-09-01,603047,5080441,3208
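
Both window functions in the query above rely on the default frame, so the averaged column is cumulative: each row averages every month up to and including itself. A fixed-width moving average would need an explicit frame such as `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`. The sketch below contrasts the two. It is only a stand-in under stated assumptions: it runs on an in-memory SQLite database (version 3.25 or later, for window-function support) rather than SQL Server, the table name `monthly_sales` is hypothetical, and the four rows are the first four monthly totals from the output above.

```python
import sqlite3

# Hypothetical stand-in table; SQLite is used only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (order_month TEXT, total_sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [
        ("2010-12-01", 43419),
        ("2011-01-01", 469795),
        ("2011-02-01", 466307),
        ("2011-03-01", 485165),
    ],
)

rows = conn.execute(
    """
    SELECT
        order_month,
        -- Default frame: first row through current row (cumulative average).
        AVG(total_sales) OVER (ORDER BY order_month) AS cumulative_avg,
        -- Explicit frame: current row plus the two before it (3-month moving average).
        AVG(total_sales) OVER (
            ORDER BY order_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS moving_avg_3m
    FROM monthly_sales
    ORDER BY order_month
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The two columns agree for the first three rows. From the fourth row on, the 3-month version drops the oldest month while the cumulative version keeps it.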


<div style="background-color:#f4f8ff; padding:10px; border-left:6px solid #1f4fd8; border-radius:6px; color:#000; font-size:20px;">
  <b>2. Calculate the total sales per year, the running total of sales, and the moving average price over time</b><br>
  <span style="font-size:16px;">Cumulative Analysis</span>
</div>

In [3]:
query = """
SELECT
    order_date,
    total_sales,
    SUM(total_sales) OVER (ORDER BY order_date) AS running_total_sales,
    AVG(avg_price) OVER (ORDER BY order_date) AS moving_average_price
FROM
(
    -- Aggregate the fact table to one row per year.
    SELECT
        DATETRUNC(year, order_date) AS order_date,
        SUM(sales_amount) AS total_sales,
        AVG(price) AS avg_price
    FROM gold.fact_sales
    WHERE order_date IS NOT NULL
    GROUP BY DATETRUNC(year, order_date)
) t
-- Without an outer ORDER BY the row order of the result is not guaranteed.
ORDER BY order_date;
"""

df = pd.read_sql(query, engine)
display(HTML(df.to_html(index=False)))

order_date,total_sales,running_total_sales,moving_average_price
2010-01-01,43419,43419,3101
2011-01-01,7075088,7118507,3146
2012-01-01,5842231,12960738,2670
2013-01-01,16344878,29305616,2080
2014-01-01,45642,29351258,1668
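
One thing worth keeping in mind when reading the last column: `AVG(avg_price) OVER (...)` averages the per-period averages, so each period counts equally no matter how many sales rows it contains. The sketch below uses made-up prices (not taken from `gold.fact_sales`) to show how that can differ from a running average over the underlying rows. `yearly_prices` and both helper functions are hypothetical names introduced only for this illustration.

```python
from statistics import mean

# Made-up prices per year, purely for illustration.
yearly_prices = {
    2010: [3000, 3200],          # few, expensive sales
    2011: [100, 100, 100, 100],  # many, cheap sales
}


def running_avg_of_period_avgs(prices_by_period):
    """Running mean of each period's mean, as AVG(avg_price) OVER (ORDER BY ...) does."""
    period_avgs = [mean(prices) for _, prices in sorted(prices_by_period.items())]
    return [mean(period_avgs[: i + 1]) for i in range(len(period_avgs))]


def running_avg_of_rows(prices_by_period):
    """Running mean over every individual row seen up to each period."""
    seen, result = [], []
    for _, prices in sorted(prices_by_period.items()):
        seen.extend(prices)
        result.append(mean(seen))
    return result


print(running_avg_of_period_avgs(yearly_prices))
print(running_avg_of_rows(yearly_prices))
```

With these made-up numbers the second-year value is 1600 under the first approach and 1100 under the second, because the four cheap sales outweigh the two expensive ones only when rows are counted individually. Which of the two is appropriate depends on the question being asked.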
