In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.DataFrame([
    {'product_id': 23, 'name': 'computer',
     'wholesale_price': 500,
     'retail_price': 1000, 'sales': 100,
     'department': 'electronics'},
    {'product_id': 96, 'name': 'Python Workout',
     'wholesale_price': 35,
     'retail_price': 75, 'sales': 1000,
     'department': 'books'},
    {'product_id': 97, 'name': 'Pandas Workout',
     'wholesale_price': 35,
     'retail_price': 75, 'sales': 500,
     'department': 'books'},
    {'product_id': 15, 'name': 'banana',
     'wholesale_price': 0.5,
     'retail_price': 1, 'sales': 200,
     'department': 'food'},
    {'product_id': 87, 'name': 'sandwich',
     'wholesale_price': 3,
     'retail_price': 5, 'sales': 300,
     'department': 'food'},
])

In [3]:
df

   product_id            name  wholesale_price  retail_price  sales   department
0          23        computer            500.0          1000    100  electronics
1          96  Python Workout             35.0            75   1000        books
2          97  Pandas Workout             35.0            75    500        books
3          15          banana              0.5             1    200         food
4          87        sandwich              3.0             5    300         food


In [4]:
df.count()

product_id         5
name               5
wholesale_price    5
retail_price       5
sales              5
department         5
dtype: int64

In [5]:
df.groupby('department') # Returns DataFrameGroupBy object

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001C6F6F73230>

In [6]:
df.groupby('department').count()

             product_id  name  wholesale_price  retail_price  sales
department
books                 2     2                2             2      2
electronics           1     1                1             1      1
food                  2     2                2             2      2


In [7]:
df.groupby('department')['sales'].count() # The result is a Series whose index holds the distinct values of department

department
books          2
electronics    1
food           2
Name: sales, dtype: int64
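One detail worth keeping in mind is that count tallies only non-missing values, whereas size counts every row in the group. The sketch below illustrates the difference on a small made-up frame (the `toy` data and its missing value are assumptions for illustration, not part of the store data above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one food item has no recorded sales figure
toy = pd.DataFrame({
    'department': ['books', 'books', 'food', 'food'],
    'sales': [1000, 500, 200, np.nan],
})

grouped = toy.groupby('department')['sales']

non_null_counts = grouped.count()  # counts only non-missing values per group
group_sizes = grouped.size()       # counts all rows per group, missing or not

print(non_null_counts)
print(group_sizes)
```

In our store data no values are missing, so count and size would agree there.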

Although we’ve used count in the examples here, we can use any aggregation method when grouping, such as mean, std, min, max, or sum. For example, we can get the average retail price per department in our store as follows:

In [8]:
df.groupby('department')['retail_price'].mean()

department
books            75.0
electronics    1000.0
food              3.0
Name: retail_price, dtype: float64
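The same pattern works with the other aggregation methods. As a minimal sketch, the block below rebuilds a trimmed copy of the store data (the name `store` is just for this example) and applies sum and min instead of mean:

```python
import pandas as pd

# Trimmed copy of the store data: only the columns this example needs
store = pd.DataFrame({
    'department': ['electronics', 'books', 'books', 'food', 'food'],
    'retail_price': [1000, 75, 75, 1, 5],
    'sales': [100, 1000, 500, 200, 300],
})

# Sum of the sales column within each department
total_sales = store.groupby('department')['sales'].sum()

# Lowest retail price within each department
min_price = store.groupby('department')['retail_price'].min()

print(total_sales)
print(min_price)
```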

What if we want to know both the mean and the standard deviation of prices in our store, grouped by department? We can do that by altering the syntax somewhat: instead of calling an aggregation method directly, we call the agg method on the grouped column and pass it a list of aggregation names.

In [9]:
df.groupby('department')['retail_price'].agg(['mean', 'std'])

               mean       std
department
books          75.0  0.000000
electronics  1000.0       NaN
food            3.0  2.828427
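The standard deviation for electronics is missing (pandas reports it as NaN) because that department has a single product, and std computes the sample standard deviation (ddof=1) by default, which is undefined for one observation. The following sketch, using a trimmed copy of the price data, compares the default with the population standard deviation obtained by passing ddof=0:

```python
import pandas as pd

# Trimmed copy of the store data: department and retail price only
prices = pd.DataFrame({
    'department': ['electronics', 'books', 'books', 'food', 'food'],
    'retail_price': [1000, 75, 75, 1, 5],
})

grouped = prices.groupby('department')['retail_price']

sample_std = grouped.std()            # default ddof=1: NaN for a one-row group
population_std = grouped.std(ddof=0)  # ddof=0: defined even for a one-row group

print(sample_std)
print(population_std)
```

Which of the two is appropriate depends on whether the rows are treated as a sample or as the whole population; the default is kept in the rest of this section.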


What if we want to run **multiple aggregations** on separate columns? In that case, we don’t need to select a column via square brackets. Instead, we call agg directly on the DataFrameGroupBy object and pass it multiple keyword arguments. Each keyword names a column in the result, and its value is a tuple containing the source column and the aggregation to apply.

For example, we can get the mean and standard deviation of retail_price per department as well as find the max sales for each department:

In [10]:
df.groupby('department').agg(
    mean_price = ('retail_price', 'mean'),
    std_price = ('retail_price', 'std'),
    max_sales = ('sales', 'max')
)

             mean_price  std_price  max_sales
department
books              75.0   0.000000       1000
electronics      1000.0        NaN        100
food                3.0   2.828427        300
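The aggregation in each tuple does not have to be a string. As a hedged sketch, the block below passes a custom function and also uses pd.NamedAgg, which is an explicit spelling of the same (column, aggregation) pair. The output column names `price_range` and `total_sales` are made up for this example:

```python
import pandas as pd

# Trimmed copy of the store data: only the columns this example needs
store = pd.DataFrame({
    'department': ['electronics', 'books', 'books', 'food', 'food'],
    'retail_price': [1000, 75, 75, 1, 5],
    'sales': [100, 1000, 500, 200, 300],
})

summary = store.groupby('department').agg(
    # Custom aggregation: spread between highest and lowest price
    price_range=('retail_price', lambda s: s.max() - s.min()),
    # pd.NamedAgg is equivalent to the plain tuple ('sales', 'sum')
    total_sales=pd.NamedAgg(column='sales', aggfunc='sum'),
)

print(summary)
```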


#### Unsorting group keys
Normally, groupby sorts the group keys. If you don’t want this ordering, or if you are concerned that the sorting is making your query too slow, you can pass sort=False to groupby. The groups then appear in the order in which their keys first occur in the data:

In [11]:
df.groupby('department', sort=False).agg(
    mean_price = ('retail_price', 'mean'),
    std_price = ('retail_price', 'std'),
    max_sales = ('sales', 'max')
)

             mean_price  std_price  max_sales
department
electronics      1000.0        NaN        100
books              75.0   0.000000       1000
food                3.0   2.828427        300
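To make the effect on ordering explicit, the sketch below compares the group keys with and without sorting, again on a trimmed copy of the store data. It also shows that a sorted result can still be recovered afterwards with sort_index:

```python
import pandas as pd

# Trimmed copy of the store data, in the original row order
store = pd.DataFrame({
    'department': ['electronics', 'books', 'books', 'food', 'food'],
    'sales': [100, 1000, 500, 200, 300],
})

# Default behavior: group keys are sorted
sorted_keys = list(store.groupby('department')['sales'].max().index)

# sort=False: group keys keep their order of first appearance
unsorted_result = store.groupby('department', sort=False)['sales'].max()
unsorted_keys = list(unsorted_result.index)

# Sorting after the fact gives the same key order as the default
resorted_keys = list(unsorted_result.sort_index().index)

print(sorted_keys)
print(unsorted_keys)
print(resorted_keys)
```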
