### AGGREGATION METHOD AND NAMED AGGREGATIONS

+ The .agg() method lets you perform multiple aggregations on a groupby object.
+ We can perform several aggregations at once by passing the aggregation functions in a list.
+ We can also perform specific aggregations per column by passing a dictionary with column names as keys and a list of aggregation functions as values.
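
The difference between the list form and the dictionary form is easiest to see on a tiny table. The sketch below uses a made-up `toy` DataFrame (not the retail data) and only illustrates the shape of the result; both calls return two-level column labels.

```python
import pandas as pd

# Made-up toy data standing in for the retail table (an assumption for illustration).
toy = pd.DataFrame({
    "family": ["A", "A", "B", "B"],
    "sales": [1.0, 3.0, 10.0, 20.0],
    "onpromotion": [0, 2, 1, 1],
})

grouped = toy.groupby("family")

# A list applies every listed function to every selected column.
by_list = grouped[["sales", "onpromotion"]].agg(["sum", "mean"])

# A dictionary chooses the functions column by column.
by_dict = grouped.agg({"sales": ["sum", "mean"], "onpromotion": ["max"]})

# Both results carry (column, function) pairs as column labels.
print(by_list.columns.tolist())
print(by_dict.columns.tolist())
```

With the list form every column gets every function; with the dictionary form only the listed combinations appear.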

In [1]:
import pandas as pd
import numpy as np

In [2]:
retail = pd.read_csv("retail_2016_2017.csv")
retail

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion
0,1945944,2016-01-01,1,AUTOMOTIVE,0.000,0
1,1945945,2016-01-01,1,BABY CARE,0.000,0
2,1945946,2016-01-01,1,BEAUTY,0.000,0
3,1945947,2016-01-01,1,BEVERAGES,0.000,0
4,1945948,2016-01-01,1,BOOKS,0.000,0
...,...,...,...,...,...,...
1054939,3000883,2017-08-15,9,POULTRY,438.133,0
1054940,3000884,2017-08-15,9,PREPARED FOODS,154.553,1
1054941,3000885,2017-08-15,9,PRODUCE,2419.729,148
1054942,3000886,2017-08-15,9,SCHOOL AND OFFICE SUPPLIES,121.000,8


In [5]:
## sum and mean of sales for each family / store_nbr pair
retail.groupby(["family", "store_nbr"]).agg({"sales" : ["sum","mean"]})

,,sales,sales
,,sum,mean
family,store_nbr,,
AUTOMOTIVE,1,2524.000000,4.263514
AUTOMOTIVE,2,3918.000000,6.618243
AUTOMOTIVE,3,6790.000000,11.469595
AUTOMOTIVE,4,2565.000000,4.332770
AUTOMOTIVE,5,3667.000000,6.194257
...,...,...,...
SEAFOOD,50,12773.966999,21.577647
SEAFOOD,51,34250.948976,57.856333
SEAFOOD,52,1219.475999,2.059926
SEAFOOD,53,3745.180001,6.326318


In [6]:
retail.groupby(["family", "store_nbr"]).agg({"sales" : ["sum","mean"], "onpromotion" : ["sum"]})

,,sales,sales,onpromotion
,,sum,mean,sum
family,store_nbr,,,
AUTOMOTIVE,1,2524.000000,4.263514,14
AUTOMOTIVE,2,3918.000000,6.618243,12
AUTOMOTIVE,3,6790.000000,11.469595,12
AUTOMOTIVE,4,2565.000000,4.332770,9
AUTOMOTIVE,5,3667.000000,6.194257,17
...,...,...,...,...
SEAFOOD,50,12773.966999,21.577647,716
SEAFOOD,51,34250.948976,57.856333,859
SEAFOOD,52,1219.475999,2.059926,78
SEAFOOD,53,3745.180001,6.326318,456


### NAMED AGGREGATIONS

+ We can name aggregated columns upon creation to avoid multi-index columns.
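
As a minimal sketch of the idea, the block below runs a named aggregation on a made-up `toy` DataFrame (hypothetical data, not the retail file). Each keyword becomes an output column name, and its value is a `(column, function)` pair; `pd.NamedAgg` is the more explicit spelling of the same pair.

```python
import pandas as pd

# Made-up toy data (an assumption for illustration only).
toy = pd.DataFrame({
    "family": ["A", "A", "B", "B"],
    "sales": [1.0, 3.0, 10.0, 20.0],
    "onpromotion": [0, 2, 1, 1],
})

named = toy.groupby("family").agg(
    sales_sum=("sales", "sum"),                                  # tuple form
    sales_average=pd.NamedAgg(column="sales", aggfunc="mean"),   # explicit form
    on_promotion_max=("onpromotion", "max"),
)

# The columns are plain strings, so there is a single header level.
print(named.columns.tolist())
print(named.columns.nlevels)
```

Because the result has flat column names, later steps such as sorting or merging can refer to `sales_sum` directly instead of a `("sales", "sum")` tuple.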

In [9]:
retail.groupby(["family", "store_nbr"]).agg(
    sales_sum=("sales", "sum"),
    sales_average=("sales", "mean"),
    on_promotion_max=("onpromotion", "max"),
)

family,store_nbr,sales_sum,sales_average,on_promotion_max
AUTOMOTIVE,1,2524.000000,4.263514,1
AUTOMOTIVE,2,3918.000000,6.618243,1
AUTOMOTIVE,3,6790.000000,11.469595,1
AUTOMOTIVE,4,2565.000000,4.332770,1
AUTOMOTIVE,5,3667.000000,6.194257,2
...,...,...,...,...
SEAFOOD,50,12773.966999,21.577647,7
SEAFOOD,51,34250.948976,57.856333,7
SEAFOOD,52,1219.475999,2.059926,5
SEAFOOD,53,3745.180001,6.326318,5


In [16]:
## read the transaction data
transactions = pd.read_csv("transactions.csv")
transactions

## add target columns: share of the 2500-transaction target reached,
## whether the target was met, and a bonus of 100 when it was met
transactions = transactions.assign(
    target_pct = transactions["transactions"] / 2500,
    met_target = (transactions["transactions"] / 2500) >= 1,
    bonus_payable = ((transactions["transactions"] / 2500) >= 1) * 100,
)
transactions

## aggregate the new columns per target_pct value and sort by bonus, highest first
(
    transactions.groupby("target_pct")
    .agg({"met_target" : "mean", "bonus_payable" : "mean"})
    .sort_values(by = "bonus_payable", ascending = False)
)

target_pct,met_target,bonus_payable
1.1448,True,100
1.4768,True,100
1.4796,True,100
1.4792,True,100
1.4788,True,100
...,...,...
0.7184,False,0
0.7188,False,0
0.7192,False,0
0.7196,False,0
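
The dictionary form used above maps each column to a single function. As a rough sketch on made-up numbers (the `store` key and the counts are hypothetical; only the 2500 target comes from the cell above), the mean of a boolean flag is the share of rows where the flag is true, and the mean of a 0/100 bonus column is the average bonus per row.

```python
import pandas as pd

# Made-up transaction counts; "store" is a hypothetical grouping key.
toy = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "transactions": [3000, 2000, 2600, 2700],
})

toy = toy.assign(
    met_target=toy["transactions"] / 2500 >= 1,
    bonus_payable=(toy["transactions"] / 2500 >= 1) * 100,
)

# One function per column: the dictionary keeps the original column names.
summary = toy.groupby("store").agg({"met_target": "mean", "bonus_payable": "mean"})
print(summary)
```

Here store 1 met the target on one of its two rows, so its `met_target` mean is 0.5 and its average bonus is 50.0.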
