# Iterating Over Groups

The GroupBy object supports iteration, yielding a sequence of 2-tuples, each containing the group name and the corresponding chunk of data. Consider the following small example data set:

In [1]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [2]:
df = DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
                'key2' : ['one', 'two', 'one', 'two', 'one'],
                'data1' : np.arange(5),
                'data2' : np.random.randn(5),
                'data3' : ['aa', 'bb', 'bb', 'aa', 'bb']})

In [55]:
for name, group in df.groupby('key1'):
    print(name)
    print(group)

a
  key1 key2  data1     data2 data3
0    a  one      0  0.813068    aa
1    a  two      1  0.296137    bb
4    a  one      4 -1.684529    bb
b
  key1 key2  data1     data2 data3
2    b  one      2  0.225491    bb
3    b  two      3 -1.916575    aa


In [64]:
data = DataFrame({'aa' : ['a', 'a', 'b', 'c', 'b'],
                  'bb' : [1, 2, 3, 4, 5]})

In [67]:
for name, group in data.groupby('aa'):
    print(name)
    print(group)

a
  aa  bb
0  a   1
1  a   2
b
  aa  bb
2  b   3
4  b   5
c
  aa  bb
3  c   4
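
Because each iteration hands you the group name and its sub-DataFrame, you can run any per-group computation inside the loop. The following is a minimal sketch that rebuilds the small `data` frame from above and collects the sum of `bb` for every group; the name `totals` is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'aa': ['a', 'a', 'b', 'c', 'b'],
                     'bb': [1, 2, 3, 4, 5]})

totals = {}
for name, group in data.groupby('aa'):
    # name is the group key; group is the sub-DataFrame for that key
    totals[name] = int(group['bb'].sum())

print(totals)  # {'a': 3, 'b': 8, 'c': 4}
```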


In the case of multiple keys, the first element in the tuple will be a tuple of key values:

In [87]:
for (k1, k2), group in df.groupby(['data3', 'key1']):
    print(k1, '\n', k2, '\n', group)

aa 
 a 
   key1 key2  data1     data2 data3
0    a  one      0  0.813068    aa
aa 
 b 
   key1 key2  data1     data2 data3
3    b  two      3 -1.916575    aa
bb 
 a 
   key1 key2  data1     data2 data3
1    a  two      1  0.296137    bb
4    a  one      4 -1.684529    bb
bb 
 b 
   key1 key2  data1     data2 data3
2    b  one      2  0.225491    bb
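
If you do not unpack the key in the `for` statement, it arrives as a single tuple, which is handy as a dict key or as the argument to `get_group`. The sketch below assumes a reduced copy of the example `df` (the random `data2` column is left out so the result is reproducible); `sizes` and `subset` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.arange(5),
                   'data3': ['aa', 'bb', 'bb', 'aa', 'bb']})

grouped = df.groupby(['data3', 'key1'])

sizes = {}
for key, group in grouped:
    # key is a tuple such as ('aa', 'a'), one entry per grouping column
    sizes[key] = len(group)

print(sizes)

# The same tuple selects a single group directly.
subset = grouped.get_group(('bb', 'a'))
print(subset)
```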


Of course, you can choose to do whatever you want with the pieces of data. A recipe you may find useful is computing a dict of the data pieces as a one-liner:

In [103]:
pieces = dict(list(df.groupby('data3')))

In [104]:
pieces

{'aa':   key1 key2  data1     data2 data3
 0    a  one      0  0.813068    aa
 3    b  two      3 -1.916575    aa,
 'bb':   key1 key2  data1     data2 data3
 1    a  two      1  0.296137    bb
 2    b  one      2  0.225491    bb
 4    a  one      4 -1.684529    bb}
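
Once the pieces are in a dict, each one is an ordinary DataFrame that you can look up by group name. As a small sketch (again using a reduced copy of `df` without the random column), the dict comprehension below builds the same mapping as the one-liner; `same_pieces` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'data1': np.arange(5),
                   'data3': ['aa', 'bb', 'bb', 'aa', 'bb']})

pieces = dict(list(df.groupby('data3')))

# Look up one piece by its group name.
print(pieces['bb'])

# Equivalent construction with an explicit comprehension.
same_pieces = {name: group for name, group in df.groupby('data3')}
```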

By default groupby groups on axis=0 (the rows), but you can also group on axis=1 (the columns). For example, we could group the columns of our example df by dtype like so:

In [107]:
df.dtypes

key1      object
key2      object
data1      int32
data2    float64
data3     object
dtype: object

In [108]:
grouped = df.groupby(df.dtypes, axis=1)

In [112]:
for dtype, group in grouped:
    print(dtype, group)

int32    data1
0      0
1      1
2      2
3      3
4      4
float64       data2
0  0.813068
1  0.296137
2  0.225491
3 -1.916575
4 -1.684529
object   key1 key2 data3
0    a  one    aa
1    a  two    bb
2    b  one    bb
3    b  two    aa
4    a  one    bb


In [113]:
dict(list(grouped))

{dtype('int32'):    data1
 0      0
 1      1
 2      2
 3      3
 4      4,
 dtype('float64'):       data2
 0  0.813068
 1  0.296137
 2  0.225491
 3 -1.916575
 4 -1.684529,
 dtype('O'):   key1 key2 data3
 0    a  one    aa
 1    a  two    bb
 2    b  one    bb
 3    b  two    aa
 4    a  one    bb}
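
Recent pandas releases deprecate the `axis` argument of `groupby` (a FutureWarning from pandas 2.1 onward, as far as we are aware), so the `axis=1` call above may warn or fail on a newer install. One possible alternative, sketched here under that assumption, is to group the `df.dtypes` Series by its own values to find the column labels that share a dtype, then select those columns. The names `dtype_names` and `pieces_by_dtype` are illustrative, the keys are dtype names as strings rather than dtype objects, and `data2` uses fixed values so the result is reproducible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.arange(5),
                   'data2': [0.5, -1.0, 2.0, 0.0, 1.5],
                   'data3': ['aa', 'bb', 'bb', 'aa', 'bb']})

# One dtype name per column, indexed by column label.
dtype_names = df.dtypes.map(str)

pieces_by_dtype = {}
for dtype_name, cols in dtype_names.groupby(dtype_names):
    # cols.index holds the labels of the columns sharing this dtype.
    pieces_by_dtype[dtype_name] = df[cols.index]

for dtype_name, piece in pieces_by_dtype.items():
    print(dtype_name, list(piece.columns))
```

The exact dtype names depend on the platform and pandas version (for example `int32` versus `int64`), so the printed keys may differ from the output shown earlier.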