### Grouping

Sometimes we want to loop over an iterable, but group its elements as we iterate through them.

Suppose we have an iterable containing tuples, and we want to group them based on the first element of each tuple:

(1, 10, 100)  
(1, 11, 101)  
(1, 12, 102)  
(2, 20, 200)  
(2, 21, 201)   
(3, 30, 300)  
(3, 31, 301)  
(3, 32, 302)  

The first 3 tuples would be group 1, the middle 2 would be group 2, and the last 3 would be group 3.

We would like to iterate using this kind of approach:

for key, group in groups:  
    print(key)  
    for item in group:  
        print(item)  

key -> 1  
(1, 10, 100)  
(1, 11, 101)  
(1, 12, 102)  
  
key -> 2  
(2, 20, 200)  
(2, 21, 201)  
  
key -> 3  
(3, 30, 300)  
(3, 31, 301)  
(3, 32, 302) 

itertools.*groupby(data, [keyfunc])* -> lazy iterator

The groupby function allows us to do precisely that  
-> we normally specify a keyfunc, which calculates the key we want to use for grouping

iterable  
(1, 10, 100)  
(1, 11, 101)  
(1, 12, 102)  
(2, 20, 200)  
(2, 21, 201)   
(3, 30, 300)  
(3, 31, 301)  
(3, 32, 302)  

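Notice how the sequence is sorted by the grouping key! groupby only groups consecutive elements that share a key, so unsorted data would produce the same key more than once.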
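Why the ordering matters can be seen in a small sketch. The data values and the helper name `first` are made up for illustration; the point is only that groupby starts a new group every time the key changes.

```python
import itertools

# Hypothetical unsorted data: the key 1 appears in two separate runs
unsorted_data = [(1, 10), (2, 20), (1, 11), (2, 21)]


def first(x):
    # Grouping key: the first element of each tuple
    return x[0]


# On unsorted data, a new group starts whenever the key changes
unsorted_keys = [key for key, _ in itertools.groupby(unsorted_data, key=first)]
print(unsorted_keys)  # [1, 2, 1, 2]

# Sorting by the same key first brings equal keys together
sorted_data = sorted(unsorted_data, key=first)
sorted_keys = [key for key, _ in itertools.groupby(sorted_data, key=first)]
print(sorted_keys)  # [1, 2]
```

Sorting has its own cost and requires the whole iterable in memory, so it is only worth doing when the data does not already arrive ordered by the key.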

Here, we want to group based on the 1st element of each tuple

-> grouping key ->  lambda x: x[0]

groupby(iterable, lambda x: x[0]) -> iterator -> of tuples (key, sub_iterator)  
i.e.  
1, sub_iterator -> (1, 10, 100), (1, 11, 101), (1, 12, 102)  
2, sub_iterator -> (2, 20, 200), (2, 21, 201)  
3, sub_iterator -> (3, 30, 300), (3, 31, 301), (3, 32, 302)
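The description above can be sketched as runnable code using the same tuples. The variable name `result` is just for this illustration. Each sub-iterator is converted to a list as soon as it is produced.

```python
import itertools

iterable = [
    (1, 10, 100), (1, 11, 101), (1, 12, 102),
    (2, 20, 200), (2, 21, 201),
    (3, 30, 300), (3, 31, 301), (3, 32, 302),
]

# groupby returns a lazy iterator of (key, sub_iterator) tuples
groups = itertools.groupby(iterable, lambda x: x[0])

# Materialize each sub-iterator while it is still the current group
result = [(key, list(sub_iterator)) for key, sub_iterator in groups]

for key, items in result:
    print(key, items)
```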

**IMPORTANT NOTE**

The elements produced by the "sub-iterators" all come from the same underlying iterator! See this in the code below, or refer to the video.

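So if groups = groupby(iterable, lambda x: x[0]),  
next(groups) actually iterates through all the remaining elements of the current 'sub-iterator' before proceeding to the next group. Any elements of that sub-iterator we have not consumed yet are no longer available afterwards.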
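A minimal sketch of this behavior, using made-up data and assuming Python 3.7 or later: we consume only one element of the first group, advance to the second group, and then find that the first sub-iterator has nothing left to give.

```python
import itertools

data = [(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd')]
groups = itertools.groupby(data, key=lambda x: x[0])

key1, sub1 = next(groups)
first_item = next(sub1)    # consume only one element of group 1

key2, sub2 = next(groups)  # advancing skips the rest of group 1, i.e. (1, 'b')

leftover = list(sub1)      # the first sub-iterator is now exhausted
second_group = list(sub2)  # the current group is still fully available

print(first_item)    # (1, 'a')
print(leftover)      # []
print(second_group)  # [(2, 'c'), (2, 'd')]
```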

#### Code Examples

In [1]:
import itertools

with open('cars_2014.csv') as f:
    for row in itertools.islice(f, 0, 20):
        print(row, end='')

make,model
ACURA,ILX
ACURA,MDX
ACURA,RDX
ACURA,RLX
ACURA,TL
ACURA,TSX
ALFA ROMEO,4C
ALFA ROMEO,GIULIETTA
APRILIA,CAPONORD 1200
APRILIA,RSV4 FACTORY APRC ABS
APRILIA,RSV4 R APRC ABS
APRILIA,SHIVER 750
ARCTIC CAT,1000 XT
ARCTIC CAT,500 XT
ARCTIC CAT,550 XT
ARCTIC CAT,700 LTD
ARCTIC CAT,700 SUPER DUTY DIESEL
ARCTIC CAT,700 XT
ARCTIC CAT,90 2X4 4-STROKE


In [3]:
from collections import defaultdict

makes = defaultdict(int)

In [4]:
makes['sdasdad']

0

In [6]:
makes['BMW'] += 1

In [7]:
makes['BMW']

1

In [8]:
makes['BMW'] += 1

In [9]:
makes['BMW']

2

In [11]:
makes = defaultdict(int)

with open('cars_2014.csv') as f:
    next(f)
    for row in f:
        make, _ = row.strip('\n').split(',')
        makes[make] += 1
for key, value in makes.items():
    print(f'{key}: {value}')

ACURA: 6
ALFA ROMEO: 2
APRILIA: 4
ARCTIC CAT: 96
ARGO: 4
ASTON MARTIN: 5
AUDI: 27
BENTLEY: 2
BLUE BIRD: 1
BMW: 86
BUGATTI: 1
BUICK: 5
CADILLAC: 7
CAN-AM: 61
CHEVROLET: 33
CHRYSLER: 2
DODGE: 7
DUCATI: 4
FERRARI: 6
FIAT: 2
FORD: 34
FREIGHTLINER: 7
GMC: 12
HARLEY DAVIDSON: 29
HINO: 7
HONDA: 91
HUSABERG: 4
HUSQVARNA: 9
HYUNDAI: 13
INDIAN: 3
INFINITI: 8
JAGUAR: 9
JEEP: 5
JOHN DEERE: 19
KAWASAKI: 59
KENWORTH: 11
KIA: 10
KTM: 13
KUBOTA: 4
KYMCO: 28
LAMBORGHINI: 2
LAND ROVER: 6
LEXUS: 14
LINCOLN: 6
LOTUS: 1
MACK: 9
MASERATI: 3
MAZDA: 5
MCLAREN: 2
MERCEDES-BENZ: 60
MINI: 3
MITSUBISHI: 8
NISSAN: 24
PEUGEOT: 3
POLARIS: 101
PORSCHE: 4
RAM: 6
RENAULT: 4
ROLLS ROYCE: 3
SCION: 5
SEAT: 3
SKI-DOO: 67
SMART: 1
SRT: 1
SUBARU: 10
SUZUKI: 48
TESLA: 2
TOYOTA: 19
TRIUMPH: 10
VESPA: 4
VICTORY: 14
VOLKSWAGEN: 16
VOLVO: 8
YAMAHA: 110


In [18]:
data = (1, 2, 2, 2, 3)

In [17]:
list(itertools.groupby(data))

[(1, <itertools._grouper at 0x1c89489e7c8>),
 (2, <itertools._grouper at 0x1c89489e808>),
 (3, <itertools._grouper at 0x1c89489e848>)]

In [19]:
it = itertools.groupby(data)
for group in it:
    print(group[0], list(group[1]))

1 [1]
2 [2, 2, 2]
3 [3]


In [20]:
it = itertools.groupby(data)
for group_key, sub_iter in it:
    print(group_key, list(sub_iter))

1 [1]
2 [2, 2, 2]
3 [3]


In [21]:
data = (
    (1, 'abc'),
    (1, 'bcs'),
    
    (2, 'pyt'),
    (2, 'yth'),
    (2, 'tho'),
    
    (3, 'hon')
)

In [22]:
data

((1, 'abc'), (1, 'bcs'), (2, 'pyt'), (2, 'yth'), (2, 'tho'), (3, 'hon'))

In [24]:
groups = itertools.groupby(data, key=lambda x: x[0])

In [25]:
list(groups)

[(1, <itertools._grouper at 0x1c8948972c8>),
 (2, <itertools._grouper at 0x1c8948971c8>),
 (3, <itertools._grouper at 0x1c894897408>)]

In [26]:
list(groups)

[]
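Calling list(groups) not only exhausts groups, it also leaves every collected sub-iterator empty, because advancing the groupby iterator moves past each group's elements. A minimal sketch of one way to keep the grouped elements, assuming Python 3.7 or later and that the groups fit in memory (the names `exhausted` and `grouped` are just for this illustration):

```python
import itertools

data = (
    (1, 'abc'), (1, 'bcs'),
    (2, 'pyt'), (2, 'yth'), (2, 'tho'),
    (3, 'hon'),
)

# list() runs groupby to the end first, so the sub-iterators it collected are already exhausted
collected = list(itertools.groupby(data, key=lambda x: x[0]))
exhausted = [list(sub_iter) for _, sub_iter in collected]
print(exhausted)  # [[], [], []]

# Converting each sub-iterator to a list while it is current keeps its elements
grouped = {key: list(sub_iter)
           for key, sub_iter in itertools.groupby(data, key=lambda x: x[0])}
print(grouped)
```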

In [27]:
groups = itertools.groupby(data, key=lambda x: x[0])
for group_key, sub_iter in groups:
    print(group_key, list(sub_iter))

1 [(1, 'abc'), (1, 'bcs')]
2 [(2, 'pyt'), (2, 'yth'), (2, 'tho')]
3 [(3, 'hon')]


In [29]:
def gen_groups():
    for key in range(1, 4):
        for i in range(3):
            yield (key, i)

In [30]:
g = gen_groups()

In [31]:
list(g)

[(1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2)]

In [32]:
list(g)

[]

In [38]:
g = gen_groups()

In [39]:
groups = itertools.groupby(g, key=lambda x: x[0])

In [41]:
for group in groups:
    print(group[0], list(group[1]))

1 [(1, 0), (1, 1), (1, 2)]
2 [(2, 0), (2, 1), (2, 2)]
3 [(3, 0), (3, 1), (3, 2)]


In [42]:
list(g)

[]

In [43]:
list(groups)

[]

In [46]:
with open('cars_2014.csv') as f:
    make_groups = itertools.groupby(f, key=lambda x: x.split(',')[0])

In [47]:
list(make_groups)

ValueError: I/O operation on closed file.

This occurs because make_groups is a lazy iterator, and f has already been closed by the time we try to iterate through make_groups.
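One way around this, sketched under the assumption that the results fit in memory, is to consume the lazy iterator while the file is still open. The temporary file below is a hypothetical three-row stand-in for cars_2014.csv so that the example is self-contained, and `models_by_make` is a name chosen for this illustration.

```python
import itertools
import os
import tempfile

# Hypothetical stand-in for cars_2014.csv, written to a temporary file
rows = "make,model\nACURA,ILX\nACURA,MDX\nALFA ROMEO,4C\n"
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(rows)
    path = tmp.name

try:
    with open(path) as f:
        next(f)  # skip the header row
        make_groups = itertools.groupby(f, key=lambda x: x.split(',')[0])
        # Consume the lazy iterator here, before the with block closes the file
        models_by_make = [
            (make, [line.strip().split(',')[1] for line in lines])
            for make, lines in make_groups
        ]
finally:
    os.remove(path)  # clean up the temporary file

# The materialized result remains usable after the file is closed
print(models_by_make)  # [('ACURA', ['ILX', 'MDX']), ('ALFA ROMEO', ['4C'])]
```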

In [49]:
with open('cars_2014.csv') as f:
    # Skip first row
    next(f)
    make_groups = itertools.groupby(f, key=lambda x: x.split(',')[0])
    print(list(itertools.islice(make_groups, 5)))

[('ACURA', <itertools._grouper object at 0x000001C8948B2D08>), ('ALFA ROMEO', <itertools._grouper object at 0x000001C8948B06C8>), ('APRILIA', <itertools._grouper object at 0x000001C8948B0808>), ('ARCTIC CAT', <itertools._grouper object at 0x000001C8948B0588>), ('ARGO', <itertools._grouper object at 0x000001C8948B0448>)]


In [51]:
with open('cars_2014.csv') as f:
    # Skip first row
    next(f)
    make_groups = itertools.groupby(f, key=lambda x: x.split(',')[0])
    make_counts = ((key, len(models)) for key, models in make_groups)
    print(list(make_counts))

TypeError: object of type 'itertools._grouper' has no len()

In [62]:
def squares(n):
    for i in range(n):
        yield i ** 2

In [63]:
sq = squares(5)

In [64]:
len(sq)

TypeError: object of type 'generator' has no len()

In [65]:
i = 0
for item in sq:
    i += 1
    
print(i)

5


In [66]:
def len_iterable(iterable):
    i = 0
    for item in iterable:
        i += 1
    return i

In [67]:
len_iterable(squares(8))

8

In [68]:
'a', 'b', 'c'

('a', 'b', 'c')

In [70]:
sum((1, 1, 1))

3

In [73]:
sum(1 for i in squares(8))

8

In [75]:
with open('cars_2014.csv') as f:
    # Skip first row
    next(f)
    make_groups = itertools.groupby(f, key=lambda x: x.split(',')[0])
    make_counts = ((key, sum(1 for model in models))
                   for key, models in make_groups)
    print(list(make_counts))

[('ACURA', 6), ('ALFA ROMEO', 2), ('APRILIA', 4), ('ARCTIC CAT', 96), ('ARGO', 4), ('ASTON MARTIN', 5), ('AUDI', 27), ('BENTLEY', 2), ('BLUE BIRD', 1), ('BMW', 86), ('BUGATTI', 1), ('BUICK', 5), ('CADILLAC', 7), ('CAN-AM', 61), ('CHEVROLET', 33), ('CHRYSLER', 2), ('DODGE', 7), ('DUCATI', 4), ('FERRARI', 6), ('FIAT', 2), ('FORD', 34), ('FREIGHTLINER', 7), ('GMC', 12), ('HARLEY DAVIDSON', 29), ('HINO', 7), ('HONDA', 91), ('HUSABERG', 4), ('HUSQVARNA', 9), ('HYUNDAI', 13), ('INDIAN', 3), ('INFINITI', 8), ('JAGUAR', 9), ('JEEP', 5), ('JOHN DEERE', 19), ('KAWASAKI', 59), ('KENWORTH', 11), ('KIA', 10), ('KTM', 13), ('KUBOTA', 4), ('KYMCO', 28), ('LAMBORGHINI', 2), ('LAND ROVER', 6), ('LEXUS', 14), ('LINCOLN', 6), ('LOTUS', 1), ('MACK', 9), ('MASERATI', 3), ('MAZDA', 5), ('MCLAREN', 2), ('MERCEDES-BENZ', 60), ('MINI', 3), ('MITSUBISHI', 8), ('NISSAN', 24), ('PEUGEOT', 3), ('POLARIS', 101), ('PORSCHE', 4), ('RAM', 6), ('RENAULT', 4), ('ROLLS ROYCE', 3), ('SCION', 5), ('SEAT', 3), ('SKI-DOO', 6