# Using pandas.groupby

The `groupby` method of a pandas DataFrame splits the data into groups so that an operation such as `count`, `sum`, `min`, `max`, `mean`, or `std` can be applied to each group separately.
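As a minimal sketch of the idea, the toy frame below is split by one column and several of the aggregations named above are computed per group in a single `agg` call. The `team` and `score` columns are made up for illustration and are not part of this notebook's data.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate split-then-aggregate
toy = pd.DataFrame({
    'team': ['x', 'x', 'y', 'y', 'y'],
    'score': [1, 3, 2, 4, 6],
})

# One row per group, one column per aggregation
summary = toy.groupby('team')['score'].agg(['count', 'sum', 'min', 'max', 'mean'])
print(summary)
```

Selecting the column before aggregating keeps the result limited to the values of interest.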

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
from platform import python_version
python_version()

'3.6.6'

## Some data

Say you have a time series of data.

In [3]:
date_range = pd.date_range('2018-01-01', '2018-06-05', freq='5D')
n = len(date_range)
df = pd.DataFrame({
    'Date': date_range,
    'A': np.random.randn(n),
    'B': np.random.randint(0, 100, size=n)
})
df.head()

| | Date | A | B |
|---|---|---|---|
| 0 | 2018-01-01 | 0.67662 | 55 |
| 1 | 2018-01-06 | 0.217035 | 61 |
| 2 | 2018-01-11 | 0.471222 | 90 |
| 3 | 2018-01-16 | -1.321683 | 82 |
| 4 | 2018-01-21 | 0.161252 | 68 |


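Now suppose you want to summarize it by quarter. Converting each date to a quarterly period with `dt.to_period('Q')` gives a column to group on.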

In [4]:
df['Quarter'] = df['Date'].dt.to_period('Q')
df.head()

| | Date | A | B | Quarter |
|---|---|---|---|---|
| 0 | 2018-01-01 | 0.67662 | 55 | 2018Q1 |
| 1 | 2018-01-06 | 0.217035 | 61 | 2018Q1 |
| 2 | 2018-01-11 | 0.471222 | 90 | 2018Q1 |
| 3 | 2018-01-16 | -1.321683 | 82 | 2018Q1 |
| 4 | 2018-01-21 | 0.161252 | 68 | 2018Q1 |


## First, define a groupby object

In [5]:
groupby_quarter = df.groupby('Quarter')

groupby_quarter.count()

| Quarter | Date | A | B |
|---|---|---|---|
| 2018Q1 | 18 | 18 | 18 |
| 2018Q2 | 14 | 14 | 14 |


In [6]:
# Select the numeric columns explicitly so 'Date' is left out of the mean
groupby_quarter[['A', 'B']].mean()

| Quarter | A | B |
|---|---|---|
| 2018Q1 | -0.135362 | 56.388889 |
| 2018Q2 | 0.100012 | 33.857143 |
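Totals by quarter follow the same pattern with `sum`. The sketch below rebuilds a small frame with the same date range but deterministic, made-up values for `B` (the name `demo` is hypothetical), so it runs on its own and the result can be checked by hand.

```python
import pandas as pd

# Same date range as above, but with predictable values 0, 1, 2, ...
dates = pd.date_range('2018-01-01', '2018-06-05', freq='5D')
demo = pd.DataFrame({'Date': dates, 'B': range(len(dates))})
demo['Quarter'] = demo['Date'].dt.to_period('Q')

# Sum of B within each quarter
totals = demo.groupby('Quarter')['B'].sum()
print(totals)
```

With 18 dates in the first quarter and 14 in the second, the sums are 0 + 1 + ... + 17 and 18 + 19 + ... + 31.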


## Grouping by index values

`groupby` also accepts the name of an index level as the key, so it still works when `Quarter` is part of the index rather than a column.

In [14]:
df.set_index(['Quarter', 'Date']).groupby('Quarter').max()

| Quarter | A | B |
|---|---|---|
| 2018Q1 | 1.533886 | 97 |
| 2018Q2 | 1.574606 | 68 |
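If a name could refer to either a column or an index level, the `level` argument states the intent explicitly. A small sketch on made-up data (the frame `demo` and its values are hypothetical) suggests both spellings give the same result:

```python
import pandas as pd

# Four dates straddling the Q1/Q2 boundary: Mar 22, Mar 27, Apr 1, Apr 6
dates = pd.date_range('2018-03-22', periods=4, freq='5D')
demo = pd.DataFrame({'Date': dates, 'B': [10, 40, 30, 20]})
demo['Quarter'] = demo['Date'].dt.to_period('Q')
indexed = demo.set_index(['Quarter', 'Date'])

# Group by the index level, first by name and then via level=
by_name = indexed.groupby('Quarter').max()
by_level = indexed.groupby(level='Quarter').max()
print(by_level)
```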


## Iterate through the groups

In [8]:
for group in groupby_quarter:
    # Each item is a (key, DataFrame) tuple; stop after the first one
    print(type(group), len(group))
    break

<class 'tuple'> 2


In [9]:
group[0]

Period('2018Q1', 'Q-DEC')

In [10]:
group[1]

| | Date | A | B | Quarter |
|---|---|---|---|---|
| 0 | 2018-01-01 | 0.67662 | 55 | 2018Q1 |
| 1 | 2018-01-06 | 0.217035 | 61 | 2018Q1 |
| 2 | 2018-01-11 | 0.471222 | 90 | 2018Q1 |
| 3 | 2018-01-16 | -1.321683 | 82 | 2018Q1 |
| 4 | 2018-01-21 | 0.161252 | 68 | 2018Q1 |
| 5 | 2018-01-26 | -2.100279 | 38 | 2018Q1 |
| 6 | 2018-01-31 | 0.776922 | 88 | 2018Q1 |
| 7 | 2018-02-05 | 1.005232 | 28 | 2018Q1 |
| 8 | 2018-02-10 | -0.95678 | 45 | 2018Q1 |
| 9 | 2018-02-15 | 1.153897 | 97 | 2018Q1 |
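Because each item is a `(key, DataFrame)` pair, it is usually convenient to unpack it in the loop header. To fetch one group without looping, `get_group` can be used with the group key. The sketch below uses a small made-up frame (`demo` and `sizes` are illustrative names):

```python
import pandas as pd

# Four dates straddling the Q1/Q2 boundary: Mar 22, Mar 27, Apr 1, Apr 6
dates = pd.date_range('2018-03-22', periods=4, freq='5D')
demo = pd.DataFrame({'Date': dates, 'B': [10, 40, 30, 20]})
demo['Quarter'] = demo['Date'].dt.to_period('Q')
grouped = demo.groupby('Quarter')

# Unpack the key and the sub-frame directly in the loop header
sizes = {}
for quarter, group_data in grouped:
    sizes[str(quarter)] = len(group_data)

# Pull out a single group by its key (a Period, matching the Quarter column)
q2 = grouped.get_group(pd.Period('2018Q2'))
print(sizes)
print(q2)
```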


In [11]:
# Unpack the (key, DataFrame) tuple left over from the loop above
quarter, group_data = group
group_data['B']

In [None]:
for quarter, group_data in groupby_quarter:
    # One bar chart of column B per quarter
    group_data['B'].plot.bar()
    plt.title(str(quarter))  # the key is a Period, so convert it for the title
    plt.grid()
    plt.show()
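As an alternative to one figure per group, the groups can be drawn side by side on shared axes. This is only a sketch on made-up data (`demo` is a hypothetical frame). The `matplotlib.use('Agg')` line is there so the example also runs outside a notebook and can be dropped when using `%matplotlib inline`.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Four dates straddling the Q1/Q2 boundary
dates = pd.date_range('2018-03-22', periods=4, freq='5D')
demo = pd.DataFrame({'Date': dates, 'B': [10, 40, 30, 20]})
demo['Quarter'] = demo['Date'].dt.to_period('Q')
grouped = demo.groupby('Quarter')

# One subplot per group, sharing the y-axis so bar heights are comparable
fig, axes = plt.subplots(1, grouped.ngroups, sharey=True, squeeze=False)
for ax, (quarter, group_data) in zip(axes[0], grouped):
    group_data['B'].plot.bar(ax=ax)
    ax.set_title(str(quarter))
    ax.grid(True)
```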