# itertools.groupby()

Parameter descriptions:

In [1]:
import pandas as pd 

In [2]:
table_descriptions = pd.DataFrame({
    'Parameter': [
        'iterable', 
        'key',
    ], 
    'Details': [
        'Any python iterable', 
        'Function(criteria) on which to group the iterable', 
    ], 
})

In [3]:
table_descriptions

| | Parameter | Details |
|---|---|---|
| 0 | iterable | Any python iterable |
| 1 | key | Function(criteria) on which to group the iterable |


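In Python, the `itertools.groupby()` function groups consecutive elements of an iterable that share the same key value.
It returns an iterator of `(key, group)` pairs, where each `group` is itself an iterator over the matching elements.
Because only adjacent elements are grouped, the input usually needs to be sorted by the same key first.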
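
A minimal sketch of this behavior, using only the standard library: a new group starts each time the key value changes, so equal values that are not adjacent land in separate groups. The sample string and the name `runs` are illustrative and not part of the original notebook.

```python
from itertools import groupby

# groupby starts a new group whenever the key value changes,
# so it only merges *consecutive* equal elements.
runs = [(key, list(group)) for key, group in groupby("aabbbca")]
print(runs)
# [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c']), ('a', ['a'])]
```

Note that `'a'` appears twice as a key, because the two runs of `'a'` are separated by other characters.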

## 1) Example 1: 

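This example groups a list of tuples by their first element, then repeats the exercise with a list of lists. Because `groupby()` only groups consecutive items, the data is sorted by the same key before grouping.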

In [4]:
things = [
    ("animal", "bear"),
    ("animal", "duck"),
    ("plant", "cactus"),
    ("vehicle", "harley"),
    ("vehicle", "speed boat"),
    ("vehicle", "school bus"),
]

In [5]:
dic = {} 

In [6]:
f = lambda x:x[0] 

In [7]:
from itertools import groupby

In [10]:
for key, group in groupby(sorted(things, key=f), f): 
    dic[key] = list(group) 

In [11]:
dic

{'animal': [('animal', 'bear'), ('animal', 'duck')],
 'plant': [('plant', 'cactus')],
 'vehicle': [('vehicle', 'harley'),
  ('vehicle', 'speed boat'),
  ('vehicle', 'school bus')]}

The example below is essentially the same as the one above. The only differences are that the tuples are now lists
and the input is no longer in key order; sorting by the key before grouping yields the same result.

In [12]:
things = [
    ["animal", "bear"],
    ["animal", "duck"],
    ["vehicle", "harley"],
    ["plant", "cactus"],
    ["vehicle", "speed boat"],
    ["vehicle", "school bus"],
]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
dic

{'animal': [['animal', 'bear'], ['animal', 'duck']],
 'plant': [['plant', 'cactus']],
 'vehicle': [['vehicle', 'harley'],
  ['vehicle', 'speed boat'],
  ['vehicle', 'school bus']]}

## 2) Example 2: 

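This example illustrates which key is used when none is specified: `key` defaults to the identity function, so each element serves as its own key and only consecutive equal elements are grouped together.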

In [13]:
c = groupby(['goat', 'dog', 'cow', 1,1,2,3,11,18, ('persons', 'man', 'woman')])

In [14]:
dic = {} 

In [15]:
for k,v in c: 
    dic[k] = list(v) 

Results in

In [16]:
dic

{'goat': ['goat'],
 'dog': ['dog'],
 'cow': ['cow'],
 1: [1, 1],
 2: [2],
 3: [3],
 11: [11],
 18: [18],
 ('persons', 'man', 'woman'): [('persons', 'man', 'woman')]}

Notice that the tuple as a whole counts as a single key in this list.
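
As a quick check of the default, the sketch below compares the keys produced with no `key` argument against those produced with an explicit identity function. The names `data`, `default_keys` and `identity_keys` are illustrative.

```python
from itertools import groupby

data = ['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 18, ('persons', 'man', 'woman')]

# With no key argument, each element is used as its own key.
default_keys = [k for k, _ in groupby(data)]

# Passing an explicit identity function gives the same keys.
identity_keys = [k for k, _ in groupby(data, key=lambda x: x)]

print(default_keys == identity_keys)  # True
```

The two consecutive `1` values collapse into a single key, and the tuple appears once as a key in its own right.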

## 3) Example 3: 

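Notice that in this example mulato, cow and cat do not show up in the result. `groupby()` starts a new group every time
the key changes, so an unsorted list yields several groups with the same key, and each later group overwrites the earlier
one in the dictionary: the final `c` group (`['camel']`) wipes out `['cow', 'cat']`, and the final `m` group wipes out
`['mulato']`. Compare this with the sorted version further down, where the data is first sorted on the same key.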

In [17]:
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
c = groupby(list_things, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)


Results in 

In [18]:
dic

{'g': ['goat'],
 'd': ['dog', 'donkey'],
 'm': ['mongoose', 'malloo'],
 'c': ['camel'],
 'persons': [('persons', 'man', 'woman')],
 'w': ['wombat']}
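
To see where the missing entries went, the groups can be collected into a list of pairs instead of a dictionary, so that repeated keys are preserved. This is a small sketch; the name `pairs` is illustrative.

```python
from itertools import groupby

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']

# Keep every (key, members) pair in order; a list does not overwrite repeated keys.
pairs = [(k, list(v)) for k, v in groupby(list_things, key=lambda x: x[0])]

for k, members in pairs:
    print(k, members)
```

The keys `'m'` and `'c'` each appear twice in `pairs`, which is why the dictionary version keeps only the last group for each of them.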

Sorted Version

In [19]:
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
sorted_list = sorted(list_things, key = lambda x: x[0])
print(sorted_list)
print()
c = groupby(sorted_list, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)

['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat']



Results in 

In [20]:
dic

{'c': ['cow', 'cat', 'camel'],
 'd': ['dog', 'donkey'],
 'g': ['goat'],
 'm': ['mulato', 'mongoose', 'malloo'],
 'persons': [('persons', 'man', 'woman')],
 'w': ['wombat']}
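
The sort-then-group pattern used throughout these examples can be wrapped in a small helper. The function `group_by_key` below is a hypothetical convenience, not part of `itertools`, and the sample list is shortened for illustration.

```python
from itertools import groupby

def group_by_key(items, key):
    """Sort items by key, then collect each groupby run into a dict."""
    # Sorting first guarantees that equal keys are adjacent,
    # so each key appears in exactly one group.
    return {k: list(g) for k, g in groupby(sorted(items, key=key), key=key)}

animals = ['goat', 'dog', 'donkey', 'cow', 'cat', 'camel']
grouped = group_by_key(animals, key=lambda x: x[0])
print(grouped)
# {'c': ['cow', 'cat', 'camel'], 'd': ['dog', 'donkey'], 'g': ['goat']}
```

Because `sorted()` is stable, items within each group keep their original relative order.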