# Groupby

The `groupby` method lets you group rows that share a value in one or more columns and then apply an aggregate function (such as mean, min, or count) to each group.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('5_sales.csv')

In [None]:
# If your data is already in memory (for example a dict or a list of records),
# you can build the DataFrame directly instead of reading a CSV:
# df = pd.DataFrame(data)
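
As a minimal sketch of that alternative, the cell below builds a small frame with the same column names as the sales file. Amy's three rows copy values shown in the `df.head()` output; Bob's rows and the name `sample_df` are invented here purely for illustration and are not taken from the CSV.

```python
import pandas as pd

# Column names match the sales file; Bob's figures are made up for this sketch.
data = {
    "SalesRep": ["Amy", "Amy", "Amy", "Bob", "Bob"],
    "Region": ["North", "North", "North", "North", "North"],
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb"],
    "Sales": [23040, 24131, 24646, 21000, 22500],
    "Units Sold": [239, 79, 71, 100, 120],
}

# Each dictionary key becomes a column; each list supplies that column's values.
sample_df = pd.DataFrame(data)
print(sample_df.shape)
```

All lists must have the same length, otherwise `pd.DataFrame` raises an error.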

In [3]:
df.head()

,SalesRep,Region,Month,Sales,Units Sold
0,Amy,North,Jan,23040,239
1,Amy,North,Feb,24131,79
2,Amy,North,Mar,24646,71
3,Amy,North,Apr,22047,71
4,Amy,North,May,24971,80


**Now you can use the `.groupby()` method to group rows together based on a column name. For instance, let's group by SalesRep. This creates a DataFrameGroupBy object:**

In [4]:
df.groupby('SalesRep')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000024BF2C76EE0>
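
The printed object shows no data because nothing has been computed yet; it only records how the rows are split. A small sketch of how the groups can be inspected, using a hypothetical four-row frame named `toy` rather than the sales file:

```python
import pandas as pd

# Hypothetical data, only to show how a grouped object can be inspected.
toy = pd.DataFrame({
    "SalesRep": ["Amy", "Bob", "Amy", "Bob"],
    "Sales": [100, 200, 300, 400],
})
grouped = toy.groupby("SalesRep")

print(grouped.ngroups)        # number of distinct groups
print(list(grouped.groups))   # the group labels

# Pull out the rows belonging to a single group.
amy_rows = grouped.get_group("Amy")

# Iterating yields (label, sub-DataFrame) pairs.
for rep, rows in grouped:
    print(rep, len(rows))
```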

You can save this object as a new variable:

In [5]:
by_comp = df.groupby("SalesRep")

And then call aggregate methods off the object:

In [6]:
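# numeric_only=True skips the text columns (Region, Month);
# pandas 2.0 and later raise an error without it
by_comp.mean(numeric_only=True)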

SalesRep,Sales,Units Sold
Amy,24575.5,202.666667
Bob,23476.25,169.583333
Chuck,21453.0,250.333333
Doug,27372.0,193.333333
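
The result contains only Sales and Units Sold because a mean is not defined for text columns such as Region and Month. One way to make that choice explicit is to select the columns before aggregating. The sketch below uses a hypothetical frame named `toy`, not the sales file:

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns.
toy = pd.DataFrame({
    "SalesRep": ["Amy", "Bob", "Amy", "Bob"],
    "Region": ["North", "North", "North", "North"],
    "Sales": [100, 200, 300, 400],
    "Units Sold": [10, 20, 30, 40],
})

# A single column name gives a Series of per-group means.
sales_mean = toy.groupby("SalesRep")["Sales"].mean()

# A list of column names gives a DataFrame.
both_mean = toy.groupby("SalesRep")[["Sales", "Units Sold"]].mean()

# Alternatively, keep every numeric column and skip the text ones.
numeric_mean = toy.groupby("SalesRep").mean(numeric_only=True)

print(sales_mean)
print(both_mean)
```

Selecting columns first also avoids computing statistics that will not be used.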


In [7]:
# Same result in one chained call, without the intermediate variable
df.groupby('SalesRep').mean(numeric_only=True)

SalesRep,Sales,Units Sold
Amy,24575.5,202.666667
Bob,23476.25,169.583333
Chuck,21453.0,250.333333
Doug,27372.0,193.333333


More examples of aggregate methods:

In [8]:
by_comp.std(numeric_only=True)

SalesRep,Sales,Units Sold
Amy,1142.101452,209.702531
Bob,1706.800203,117.599829
Chuck,1479.552635,261.063675
Doug,1876.092021,215.187586


In [9]:
by_comp.min()

SalesRep,Region,Month,Sales,Units Sold
Amy,North,Apr,22047,71
Bob,North,Apr,20024,68
Chuck,South,Apr,19625,70
Doug,South,Apr,25041,81


In [10]:
by_comp.max()

SalesRep,Region,Month,Sales,Units Sold
Amy,North,Sep,25899,706
Bob,North,Sep,25999,465
Chuck,South,Sep,23965,769
Doug,South,Sep,29953,852
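
Note that `min()` and `max()` work on each column independently, and text columns are compared alphabetically. That is why Month shows "Apr" as the minimum and "Sep" as the maximum rather than the first and last calendar month. A minimal sketch of the effect and one possible workaround, using a hypothetical frame `toy` and a helper column `MonthNum` introduced only for this example:

```python
import pandas as pd

# Hypothetical data to show alphabetical versus calendar ordering.
toy = pd.DataFrame({
    "SalesRep": ["Amy", "Amy", "Amy", "Bob", "Bob", "Bob"],
    "Month": ["Jan", "Feb", "Apr", "Jan", "Sep", "Dec"],
})

# Strings are compared alphabetically, so "Apr" sorts before "Jan".
alpha_min = toy.groupby("SalesRep")["Month"].min()

# Workaround: map month names to numbers, aggregate, then map back.
month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
month_to_num = {name: i + 1 for i, name in enumerate(month_order)}
toy["MonthNum"] = toy["Month"].map(month_to_num)

first_num = toy.groupby("SalesRep")["MonthNum"].min()
first_month = first_num.map(lambda n: month_order[n - 1])

print(alpha_min)
print(first_month)
```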


In [11]:
by_comp.count()

SalesRep,Region,Month,Sales,Units Sold
Amy,12,12,12,12
Bob,12,12,12,12
Chuck,12,12,12,12
Doug,12,12,12,12
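
Every cell shows 12 here because the data has no missing values. `count()` reports the number of non-missing entries per column, whereas `size()` reports the number of rows per group regardless of missing values. A small sketch of the difference, on a hypothetical frame `toy` with one missing Sales value:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Sales value for Amy.
toy = pd.DataFrame({
    "SalesRep": ["Amy", "Amy", "Bob", "Bob"],
    "Sales": [100.0, np.nan, 200.0, 400.0],
})
grouped = toy.groupby("SalesRep")

# count() ignores NaN, so Amy has a single counted Sales value.
non_missing = grouped.count()

# size() counts rows, so Amy still has two.
group_sizes = grouped.size()

print(non_missing)
print(group_sizes)
```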


In [12]:
by_comp.describe()

,Sales,Sales,Sales,Sales,Sales,Sales,Sales,Sales,Units Sold,Units Sold,Units Sold,Units Sold,Units Sold,Units Sold,Units Sold,Units Sold
SalesRep,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Amy,12.0,24575.5,1142.101452,22047.0,24196.25,24662.0,25450.0,25899.0,12.0,202.666667,209.702531,71.0,79.75,93.5,194.75,706.0
Bob,12.0,23476.25,1706.800203,20024.0,22788.25,23500.5,24763.25,25999.0,12.0,169.583333,117.599829,68.0,93.0,111.5,238.25,465.0
Chuck,12.0,21453.0,1479.552635,19625.0,20181.5,21428.5,22301.5,23965.0,12.0,250.333333,261.063675,70.0,92.0,131.0,246.75,769.0
Doug,12.0,27372.0,1876.092021,25041.0,25752.25,27242.0,29130.25,29953.0,12.0,193.333333,215.187586,81.0,90.5,111.5,204.25,852.0
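
`describe()` computes a fixed set of statistics for every numeric column. When only a few specific statistics are needed, `agg()` with named aggregations is one option. The sketch below uses a hypothetical frame `toy`; the output column names `total_sales`, `avg_units`, and `best_sales` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical data, only to illustrate named aggregation.
toy = pd.DataFrame({
    "SalesRep": ["Amy", "Bob", "Amy", "Bob"],
    "Sales": [100, 200, 300, 400],
    "Units Sold": [10, 20, 30, 40],
})

# Each keyword is an output column: (input column, aggregate function).
summary = toy.groupby("SalesRep").agg(
    total_sales=("Sales", "sum"),
    avg_units=("Units Sold", "mean"),
    best_sales=("Sales", "max"),
)
print(summary)
```

The result has flat column names, which can be easier to work with than the two-level header produced by `describe()`.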


In [13]:
by_comp.describe().transpose()

,SalesRep,Amy,Bob,Chuck,Doug
Sales,count,12.0,12.0,12.0,12.0
Sales,mean,24575.5,23476.25,21453.0,27372.0
Sales,std,1142.101452,1706.800203,1479.552635,1876.092021
Sales,min,22047.0,20024.0,19625.0,25041.0
Sales,25%,24196.25,22788.25,20181.5,25752.25
Sales,50%,24662.0,23500.5,21428.5,27242.0
Sales,75%,25450.0,24763.25,22301.5,29130.25
Sales,max,25899.0,25999.0,23965.0,29953.0
Units Sold,count,12.0,12.0,12.0,12.0
Units Sold,mean,202.666667,169.583333,250.333333,193.333333


In [14]:
by_comp.describe().transpose()['Amy']

Sales       count       12.000000
            mean     24575.500000
            std       1142.101452
            min      22047.000000
            25%      24196.250000
            50%      24662.000000
            75%      25450.000000
            max      25899.000000
Units Sold  count       12.000000
            mean       202.666667
            std        209.702531
            min         71.000000
            25%         79.750000
            50%         93.500000
            75%        194.750000
            max        706.000000
Name: Amy, dtype: float64
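
Transposing is not the only way to reach one sales rep's statistics. Because the `describe()` result is indexed by the grouping column, `.loc` can select the same row directly. A sketch on a hypothetical frame `toy`:

```python
import pandas as pd

# Hypothetical data, only to compare the two ways of selecting one group.
toy = pd.DataFrame({
    "SalesRep": ["Amy", "Bob", "Amy", "Bob"],
    "Sales": [100, 200, 300, 400],
})
stats = toy.groupby("SalesRep").describe()

# Column selection after transposing, as in the cell above.
amy_via_transpose = stats.transpose()["Amy"]

# Row selection by label gives the same Series without transposing.
amy_via_loc = stats.loc["Amy"]

# A single statistic: row label plus a (column, statistic) tuple.
amy_mean_sales = stats.loc["Amy", ("Sales", "mean")]

print(amy_via_loc)
print(amy_mean_sales)
```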