# Grouping with Dicts and Series

Grouping information may exist in a form other than an array. Let’s consider another example DataFrame:

In [54]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [55]:
people = DataFrame(np.arange(25).reshape(5, 5),
                   columns=['a', 'b', 'c', 'd', 'e'],
                   index=['one', 'two', 'three', 'four', 'five'])

In [56]:
people['a'] = people['a'] - 2

people['e'] = people['a'] - 6

In [57]:
people

| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | -2 | 1 | 2 | 3 | -8 |
| two | 3 | 6 | 7 | 8 | -3 |
| three | 8 | 11 | 12 | 13 | 2 |
| four | 13 | 16 | 17 | 18 | 7 |
| five | 18 | 21 | 22 | 23 | 12 |


In [58]:
people[2:3][['b', 'c']] = np.nan

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  people[2:3][['b', 'c']] = np.nan


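Chained indexing such as `people[2:3][['b', 'c']]` assigns into an intermediate object, which is why pandas emits the warning above. A single `.iloc` call sets the values on `people` directly:

In [59]:
people.iloc[2:3, 1:3] = np.nan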

In [60]:
people

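| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | -2 | 1.0 | 2.0 | 3 | -8 |
| two | 3 | 6.0 | 7.0 | 8 | -3 |
| three | 8 | NaN | NaN | 13 | 2 |
| four | 13 | 16.0 | 17.0 | 18 | 7 |
| five | 18 | 21.0 | 22.0 | 23 | 12 |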
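
The warning shown earlier recommends `.loc`. As a minimal sketch, the same cells can be set by label instead of by position. The frame is rebuilt here so the snippet runs on its own, and the cast to float is an assumption added for this sketch so that assigning NaN never needs to change a column's dtype.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame. The float cast is an assumption of this sketch,
# so that NaN can be stored without changing any column's dtype.
people = pd.DataFrame(np.arange(25).reshape(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['one', 'two', 'three', 'four', 'five']).astype(float)
people['a'] = people['a'] - 2
people['e'] = people['a'] - 6

# One indexing call with a row label and a list of column labels, so the
# assignment targets `people` itself rather than an intermediate slice.
people.loc['three', ['b', 'c']] = np.nan

print(people.loc['three'])
```

Label-based selection also keeps working if the rows or columns are later reordered, which positional `.iloc` slices do not guarantee.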


Now suppose I have a mapping from each column to a group and want to sum the columns within each group:

In [61]:
mapping = {'a': 'red',
           'b': 'red',
           'c': 'blue',
           'd': 'blue',
           'e': 'red',
           'f': 'orange'}

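You could construct an array of group labels from this dict and pass it to groupby, but it is simpler to pass the dict directly. Note that the key 'f' matches no column in `people`; unused grouping keys are ignored: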

In [67]:
# axis=1 groups the columns; this argument is deprecated since pandas 2.1
by_columns = people.groupby(mapping, axis=1).sum()

In [68]:
by_columns

| | blue | red |
|---|---|---|
| one | 5.0 | -9.0 |
| two | 15.0 | 6.0 |
| three | 13.0 | 10.0 |
| four | 35.0 | 36.0 |
| five | 45.0 | 51.0 |
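
To make the "array from this dict" remark concrete, here is a minimal sketch that builds the label array explicitly and compares it with passing the dict. It groups the transposed frame instead of using `axis=1`, which is a substitution made here because the `axis` argument of `groupby` is deprecated in recent pandas. The names `keys`, `by_array`, and `by_dict` are illustrative, and the float cast is an assumption of the sketch.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (cast to float so NaN can be stored as-is).
people = pd.DataFrame(np.arange(25).reshape(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['one', 'two', 'three', 'four', 'five']).astype(float)
people['a'] = people['a'] - 2
people['e'] = people['a'] - 6
people.iloc[2:3, 1:3] = np.nan

mapping = {'a': 'red', 'b': 'red', 'c': 'blue',
           'd': 'blue', 'e': 'red', 'f': 'orange'}

# Explicit array of group labels, one entry per column of `people`.
keys = np.array([mapping[col] for col in people.columns])

# Transpose so the columns become rows, group the rows, then transpose back.
by_array = people.T.groupby(keys).sum().T

# Passing the dict gives the same groups; the unused key 'f' is ignored.
by_dict = people.T.groupby(mapping).sum().T

print(by_dict)
```

Both results contain only the `blue` and `red` columns, and the missing values in row `three` are skipped by `sum`.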


The same functionality holds for Series, which can be viewed as a fixed-size mapping. When I used Series as group keys in the earlier examples, pandas did, in fact, inspect each Series to ensure that its index was aligned with the axis it was grouping:

In [70]:
map_series = Series(mapping)

map_series

a       red
b       red
c      blue
d      blue
e       red
f    orange
dtype: object

In [71]:
people.groupby(map_series, axis=1).count()

| | blue | red |
|---|---|---|
| one | 2 | 3 |
| two | 2 | 3 |
| three | 1 | 2 |
| four | 2 | 3 |
| five | 2 | 3 |
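
The alignment behaviour can be checked with a small sketch: a Series whose entries are in a different order than the columns should still produce the same groups, because matching is done by index label. As above, the transposed frame is grouped instead of using `axis=1`, and `reversed_series`, `counts`, and `counts_reversed` are names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (cast to float so NaN can be stored as-is).
people = pd.DataFrame(np.arange(25).reshape(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['one', 'two', 'three', 'four', 'five']).astype(float)
people['a'] = people['a'] - 2
people['e'] = people['a'] - 6
people.iloc[2:3, 1:3] = np.nan

mapping = {'a': 'red', 'b': 'red', 'c': 'blue',
           'd': 'blue', 'e': 'red', 'f': 'orange'}
map_series = pd.Series(mapping)

# Reverse the Series so its order no longer matches the column order.
reversed_series = map_series.iloc[::-1]

# Group the transposed frame by each Series and count non-missing values.
counts = people.T.groupby(map_series).count().T
counts_reversed = people.T.groupby(reversed_series).count().T

print(counts)
```

Both calls give the same counts, since the Series is matched to the column labels rather than used positionally.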
