## Aggregating DataFrames

#### Efficient summaries

Cumulative statistics can also be helpful in tracking summary statistics over time. In this exercise, you'll calculate the cumulative sum and cumulative max of a department's weekly sales, which will allow you to identify what the total sales were so far as well as what the highest weekly sales were so far.

A DataFrame called `sales_1_1` has been created for you, which contains the sales data for department 1 of store 1. `pandas` is loaded as `pd`.

### Instructions

* Sort of rows of `sales_1_1` by the `date` column in ascending order.
* Get the cumulative sum of `weekly_sales` and add it as a new column of `sales_1_1` called `cum_weekly_sales.
* Get the cumulative maximum of `weekly_sales`, and add it as a column called `cum_max_sales`.
* Print the `date`, `weekly_sales`, `cum_weekly_sales`, and `cum_max_sales` columns.

In [1]:
# importing pandas
import pandas as pd

# importing sales dataset
sales = pd.read_csv("../datasets/sales_subset.csv")
sales.head()

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.5,False,5.727778,0.679451,8.106
1,1,1,A,1,2010-03-05,21827.9,False,8.055556,0.693452,8.106
2,2,1,A,1,2010-04-02,57258.43,False,16.816667,0.718284,7.808
3,3,1,A,1,2010-05-07,17413.94,False,22.527778,0.748928,7.808
4,4,1,A,1,2010-06-04,17558.09,False,27.05,0.714586,7.808


In [3]:
# Sort sales_1_1 by date
sales_1_1 = sales.sort_values("date")
sales_1_1

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.50,False,5.727778,0.679451,8.106
6437,6437,19,A,13,2010-02-05,38597.52,False,-6.133333,0.780365,8.350
1249,1249,2,A,31,2010-02-05,3840.21,False,4.550000,0.679451,8.324
6449,6449,19,A,14,2010-02-05,17590.59,False,-6.133333,0.780365,8.350
6461,6461,19,A,16,2010-02-05,4929.87,False,-6.133333,0.780365,8.350
...,...,...,...,...,...,...,...,...,...,...
3592,3592,6,A,99,2012-10-05,440.00,False,21.577778,0.955511,5.329
8108,8108,20,A,99,2012-10-05,660.00,False,15.983333,1.052726,7.293
10773,10773,39,A,99,2012-10-05,915.00,False,22.250000,0.955511,6.228
6257,6257,14,A,96,2012-10-12,3.00,False,12.483333,1.056689,8.667


In [4]:
# Get the cumulative sum of weekly_sales, add as cum_weekly_sales col
sales_1_1["cum_weekly_sales"] = sales['weekly_sales'].cumsum()
sales_1_1["cum_weekly_sales"]

0        2.492450e+04
6437     1.629610e+08
1249     2.668539e+07
6449     1.633879e+08
6461     1.635954e+08
             ...     
3592     8.543040e+07
8108     2.028157e+08
10773    2.568947e+08
6257     1.583335e+08
3384     7.879133e+07
Name: cum_weekly_sales, Length: 10774, dtype: float64

In [5]:
# Get the cumulative max of weekly_sales, add as cum_max_sales col
sales_1_1["cum_max_sales"] = sales_1_1["weekly_sales"].cummax()
sales_1_1["cum_max_sales"]

0         24924.50
6437      38597.52
1249      38597.52
6449      38597.52
6461      38597.52
           ...    
3592     293966.05
8108     293966.05
10773    293966.05
6257     293966.05
3384     293966.05
Name: cum_max_sales, Length: 10774, dtype: float64

In [6]:
# See the columns you calculated
print(sales_1_1[["date", "weekly_sales", "cum_weekly_sales", "cum_max_sales"]])

             date  weekly_sales  cum_weekly_sales  cum_max_sales
0      2010-02-05      24924.50      2.492450e+04       24924.50
6437   2010-02-05      38597.52      1.629610e+08       38597.52
1249   2010-02-05       3840.21      2.668539e+07       38597.52
6449   2010-02-05      17590.59      1.633879e+08       38597.52
6461   2010-02-05       4929.87      1.635954e+08       38597.52
...           ...           ...               ...            ...
3592   2012-10-05        440.00      8.543040e+07      293966.05
8108   2012-10-05        660.00      2.028157e+08      293966.05
10773  2012-10-05        915.00      2.568947e+08      293966.05
6257   2012-10-12          3.00      1.583335e+08      293966.05
3384   2012-10-26        -21.63      7.879133e+07      293966.05

[10774 rows x 4 columns]
