In [1]:
import pandas as pd

### Pandas Group By
We use 3 main methods to do group bys. Each has its own advantages and disadvantages. Check out each one and see which works best for your case.

We will run through 4 examples:
1. Method 1 - Quick and simple group by
2. Method 1 - Quick and simple group by with multiple columns
3. Method 2 - Different columns with different aggregate functions
4. Method 3 - Different columns with different aggregate functions and new column names

But first, let's create our DataFrame:

In [17]:
# One row per visit record: restaurant name, customer count, average bill
df = pd.DataFrame(
    [('Liho Liho', 200, 45.32),
     ('Chambers', 350, 65.33),
     ('The Square', 15, 12.45),
     ('Tosca Cafe', 35, 180.34),
     ('Liho Liho', 98, 145.42),
     ('Chambers', 205, 25.35)],
    columns=['name', 'Customers', 'AvgBill'],
)
df

| | name | Customers | AvgBill |
|---|---|---|---|
| 0 | Liho Liho | 200 | 45.32 |
| 1 | Chambers | 350 | 65.33 |
| 2 | The Square | 15 | 12.45 |
| 3 | Tosca Cafe | 35 | 180.34 |
| 4 | Liho Liho | 98 | 145.42 |
| 5 | Chambers | 205 | 25.35 |


### 1. Method 1 - Quick and simple group by

The simplest group by takes a single 'group by' column, a single column to aggregate, and an aggregate function. The result is a Series indexed by the 'group by' column.

* Group By Column = 'name'
* Column To Aggregate = 'AvgBill'
* Aggregate function = .sum()

In [18]:
df.groupby('name')['AvgBill'].sum()

name
Chambers       90.68
Liho Liho     190.74
The Square     12.45
Tosca Cafe    180.34
Name: AvgBill, dtype: float64
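
As a quick check on the claim that a Series comes back, here is a minimal, self-contained sketch that rebuilds the same DataFrame and inspects the result. The `reset_index()` step is an optional extra rather than part of the method above; it is one common way to turn the Series back into a regular DataFrame. The variable names `total_bill` and `total_bill_df` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [('Liho Liho', 200, 45.32),
     ('Chambers', 350, 65.33),
     ('The Square', 15, 12.45),
     ('Tosca Cafe', 35, 180.34),
     ('Liho Liho', 98, 145.42),
     ('Chambers', 205, 25.35)],
    columns=['name', 'Customers', 'AvgBill'],
)

# Selecting a single column with a plain string yields a Series
total_bill = df.groupby('name')['AvgBill'].sum()
print(type(total_bill).__name__)  # Series

# Optional: move the group labels out of the index into a normal column
total_bill_df = total_bill.reset_index()
print(total_bill_df.columns.tolist())  # ['name', 'AvgBill']
```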

### 2. Method 1 - Quick and simple group by with multiple columns

To apply an aggregate function to multiple columns, pass a list of columns as your 'columns to aggregate.' The result is a DataFrame with one column per aggregated column.

* Group By Column = 'name'
* Column To Aggregate = ['Customers', 'AvgBill']
* Aggregate function = .sum()

In [20]:
df.groupby('name')[['Customers', 'AvgBill']].sum()

| name | Customers | AvgBill |
|---|---|---|
| Chambers | 555 | 90.68 |
| Liho Liho | 298 | 190.74 |
| The Square | 15 | 12.45 |
| Tosca Cafe | 35 | 180.34 |
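
The double brackets are what make the difference here. The sketch below, which assumes the same sample data as above, compares a bare column name with a list of columns; even a one-item list gives back a DataFrame rather than a Series. The names `as_series`, `as_frame`, and `both` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [('Liho Liho', 200, 45.32),
     ('Chambers', 350, 65.33),
     ('The Square', 15, 12.45),
     ('Tosca Cafe', 35, 180.34),
     ('Liho Liho', 98, 145.42),
     ('Chambers', 205, 25.35)],
    columns=['name', 'Customers', 'AvgBill'],
)

# Bare column name: the result is a Series
as_series = df.groupby('name')['Customers'].sum()

# One-item list: the result is a DataFrame with a single column
as_frame = df.groupby('name')[['Customers']].sum()

# Two-item list: the result is a DataFrame with two columns
both = df.groupby('name')[['Customers', 'AvgBill']].sum()

print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
print(both.columns.tolist())  # ['Customers', 'AvgBill']
```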


### 3. Method 2 - Different columns with different aggregate functions

To apply different aggregate functions to different columns, use the .agg() method. It lets you specify each column and the function you'd like to apply to it.

Pass a *dictionary* to .agg(): the keys are the column names you'd like to aggregate, and the values are the aggregate functions.

Here I'm taking the max of the Customers column and the mean of the AvgBill column.

In [26]:
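# String names are the idiomatic way to refer to built-in aggregations;
# recent pandas versions warn when the Python builtin max is passed instead
df.groupby('name').agg({
    'Customers': 'max',
    'AvgBill': 'mean',
})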

| name | Customers | AvgBill |
|---|---|---|
| Chambers | 350 | 45.34 |
| Liho Liho | 200 | 95.37 |
| The Square | 15 | 12.45 |
| Tosca Cafe | 35 | 180.34 |
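
A dictionary value can also be a list of functions, which is a small extension of the example above rather than something it relies on. The sketch below assumes the same sample data and shows that the output columns then become two-level (column, function) pairs, which is one reason you may want control over the output names. The name `result` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [('Liho Liho', 200, 45.32),
     ('Chambers', 350, 65.33),
     ('The Square', 15, 12.45),
     ('Tosca Cafe', 35, 180.34),
     ('Liho Liho', 98, 145.42),
     ('Chambers', 205, 25.35)],
    columns=['name', 'Customers', 'AvgBill'],
)

# Two functions for Customers, one for AvgBill
result = df.groupby('name').agg({
    'Customers': ['min', 'max'],
    'AvgBill': 'mean',
})

# The columns are now (column, function) tuples
print(result.columns.tolist())
```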


### 4. Method 3 - Different columns with different aggregate functions and new column names

This final method, although long, is nice because you can rename the output columns.

Example for the first column:

* **New column name** = max_customers (Note: I agree it's weird that the name is given as a keyword argument rather than a string)
* **column** = the column to aggregate: Customers
* **aggfunc** = the aggregate function you'd like to apply: max

In [30]:
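# Each keyword becomes an output column name; string function names
# avoid the warning recent pandas versions raise for the builtin max
df.groupby('name').agg(
    max_customers=pd.NamedAgg(column='Customers', aggfunc='max'),
    mean_avg_bill=pd.NamedAgg(column='AvgBill', aggfunc='mean'),
)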

| name | max_customers | mean_avg_bill |
|---|---|---|
| Chambers | 350 | 45.34 |
| Liho Liho | 200 | 95.37 |
| The Square | 15 | 12.45 |
| Tosca Cafe | 35 | 180.34 |
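
If the `pd.NamedAgg` call feels verbose, pandas also accepts a plain `(column, aggfunc)` tuple in its place. The sketch below assumes the same sample data and checks that both spellings give the same result; the names `named` and `short` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [('Liho Liho', 200, 45.32),
     ('Chambers', 350, 65.33),
     ('The Square', 15, 12.45),
     ('Tosca Cafe', 35, 180.34),
     ('Liho Liho', 98, 145.42),
     ('Chambers', 205, 25.35)],
    columns=['name', 'Customers', 'AvgBill'],
)

# Explicit NamedAgg objects
named = df.groupby('name').agg(
    max_customers=pd.NamedAgg(column='Customers', aggfunc='max'),
    mean_avg_bill=pd.NamedAgg(column='AvgBill', aggfunc='mean'),
)

# Shorthand: a (column, aggfunc) tuple per output column
short = df.groupby('name').agg(
    max_customers=('Customers', 'max'),
    mean_avg_bill=('AvgBill', 'mean'),
)

print(named.equals(short))  # True
```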
