# Group-wise Operations and Transformations

Aggregation is only one kind of group operation. It is a special case of the more general class of data transformations: it accepts only functions that reduce a one-dimensional array to a scalar value. In this section, I will introduce the transform and apply methods, which let you perform many other kinds of group operations.
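The practical difference is the shape of the result. Below is a minimal sketch on a small made-up frame (the column names key and value are hypothetical). The aggregation returns one row per group. The same reduction passed to transform returns one row per original row, aligned with the input index.

```python
import pandas as pd

# Small deterministic frame (hypothetical data, chosen so results are easy to check)
df = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'a'],
                   'value': [1.0, 2.0, 3.0, 4.0, 6.0]})

grouped = df.groupby('key')['value']

# Aggregation: each group is reduced to a single scalar -> one row per group
agg_result = grouped.sum()
print(agg_result)

# Transformation: each group's scalar is broadcast back -> one row per input row
transformed = grouped.transform('sum')
print(transformed)
```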


Suppose we wanted to add columns to a DataFrame that hold a group-level summary (here, the per-group sums) for each row. One way to do this is to aggregate, then merge:

In [1]:
```python
import pandas as pd
import numpy as np
from pandas import DataFrame, Series
```

In [2]:
```python
df = DataFrame({'data1': np.random.randn(5),
                'data2': np.arange(5),
                'key1': list('aabba'),
                'key2': ['one', 'two', 'one', 'two', 'one']})
```

In [3]:
```python
df
```

|   | data1 | data2 | key1 | key2 |
|---|---|---|---|---|
| 0 | 1.290252 | 0 | a | one |
| 1 | -0.920904 | 1 | a | two |
| 2 | 0.609093 | 2 | b | one |
| 3 | -1.064501 | 3 | b | two |
| 4 | 0.664424 | 4 | a | one |


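In [4]:
```python
# Per-group sums of the numeric columns, renamed to sum_data1 and sum_data2.
# Selecting the columns explicitly keeps the string column key2 out of the sum.
k1_sums = df.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')
```

In [5]:
```python
k1_sums
```

| key1 | sum_data1 | sum_data2 |
|---|---|---|
| a | 1.033772 | 5 |
| b | -0.455408 | 5 |


In [12]:
```python
pd.merge(df, k1_sums, left_on='key1', right_index=True)
```

|   | data1 | data2 | key1 | key2 | sum_data1 | sum_data2 |
|---|---|---|---|---|---|---|
| 0 | 1.290252 | 0 | a | one | 1.033772 | 5 |
| 1 | -0.920904 | 1 | a | two | 1.033772 | 5 |
| 4 | 0.664424 | 4 | a | one | 1.033772 | 5 |
| 2 | 0.609093 | 2 | b | one | -0.455408 | 5 |
| 3 | -1.064501 | 3 | b | two | -0.455408 | 5 |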


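This works, but it is somewhat inflexible: it needs an intermediate table plus a merge, and it only handles functions that reduce each group to a single value. You can think of the operation as transforming the two data columns with a group-wise function (np.sum here). Let's recreate the people DataFrame from earlier in the chapter and use the transform method on GroupBy: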

In [26]:
```python
key = ['one', 'two', 'one', 'two', 'one']
```

In [27]:
```python
people = DataFrame(np.random.randn(5, 5),
                   columns=['a', 'b', 'c', 'd', 'e'],
                   index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
```

In [29]:
```python
people.groupby(key).sum()
```

|   | a | b | c | d | e |
|---|---|---|---|---|---|
| one | -0.591574 | -0.337127 | 0.270753 | -1.795446 | -0.05078 |
| two | -1.586022 | 1.683099 | -0.830099 | 0.203077 | -2.337448 |


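In [32]:
```python
# Broadcast each group's mean back to every row of that group
people.groupby(key).transform('mean')
```

|   | a | b | c | d | e |
|---|---|---|---|---|---|
| Joe | -0.197191 | -0.112376 | 0.090251 | -0.598482 | -0.016927 |
| Steve | -0.793011 | 0.84155 | -0.415049 | 0.101538 | -1.168724 |
| Wes | -0.197191 | -0.112376 | 0.090251 | -0.598482 | -0.016927 |
| Jim | -0.793011 | 0.84155 | -0.415049 | 0.101538 | -1.168724 |
| Travis | -0.197191 | -0.112376 | 0.090251 | -0.598482 | -0.016927 |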
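The same idea can replace the aggregate-then-merge step used on df earlier. Below is a sketch using a deterministic stand-in for that random df (the numbers are made up). It assumes transform accepts the name of a reduction such as 'sum'. Because transform returns a result indexed like the input, a plain column-wise join is enough and no merge keys are needed.

```python
import pandas as pd

# Deterministic stand-in for the random `df` used earlier (values are hypothetical)
df = pd.DataFrame({'data1': [1.0, -1.0, 0.5, -2.0, 0.5],
                   'data2': list(range(5)),
                   'key1': list('aabba'),
                   'key2': ['one', 'two', 'one', 'two', 'one']})

# Group sums broadcast back to every row; the index stays aligned with df
sums = df.groupby('key1')[['data1', 'data2']].transform('sum').add_prefix('sum_')

# Join on the shared index: no merge keys, and df's row order is kept
result = df.join(sums)
print(result)
```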


As you may guess, transform applies a function to each group, then places the results in the appropriate locations. If each group produces a scalar value, that value is propagated (broadcast) to every row of the group. Suppose instead you wanted to subtract each group's mean from its members. To do this, create a demeaning function and pass it to transform:

In [33]:
```python
def demean(arr):
    return arr - arr.mean()
```

In [35]:
```python
demeaned = people.groupby(key).transform(demean)
```

In [36]:
```python
demeaned
```

|   | a | b | c | d | e |
|---|---|---|---|---|---|
| Joe | 0.297739 | 0.149603 | 0.390231 | -1.119151 | 1.510933 |
| Steve | 0.851034 | 1.06223 | 0.263462 | 0.16301 | 0.127698 |
| Wes | -1.678822 | 0.490048 | -0.712612 | 2.062036 | -1.973128 |
| Jim | -0.851034 | -1.06223 | -0.263462 | -0.16301 | -0.127698 |
| Travis | 1.381083 | -0.639652 | 0.322381 | -0.942885 | 0.462195 |
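Demeaning can also be written without a custom function, by subtracting the broadcast group means from the original frame. The sketch below uses a small made-up people frame with two columns so the numbers are easy to verify. It checks that the two formulations agree. The vectorized form avoids calling a Python function per group, which is usually the reason to prefer it. That is a general expectation, not something measured here.

```python
import pandas as pd

# Small hypothetical frame indexed by name, grouped by an aligned list of keys
people = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0, 5.0],
                       'b': [10.0, 20.0, 30.0, 40.0, 50.0]},
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

def demean(arr):
    # Subtract the group's mean from each of its values
    return arr - arr.mean()

# Version 1: pass a custom function to transform
via_function = people.groupby(key).transform(demean)

# Version 2: subtract the broadcast group means directly (vectorized)
via_means = people - people.groupby(key).transform('mean')

print(via_function)
```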


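As a check, the demeaned values within each group should sum to zero:

In [38]:
```python
demeaned.groupby(key).sum()
```

|   | a | b | c | d | e |
|---|---|---|---|---|---|
| one | 0.0 | 0.0 | 0.0 | -3.330669e-16 | 2.220446e-16 |
| two | 0.0 | -2.220446e-16 | 0.0 | 0.0 | 0.0 |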
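Entries such as -3.330669e-16 are floating-point rounding error, not true nonzero sums. A programmatic check should therefore compare against zero with a tolerance instead of testing for exact equality. The sketch below makes two assumptions of its own. It uses a seeded generator (np.random.default_rng, arbitrary seed 0) instead of the unseeded np.random.randn used above, and it relies on np.allclose's default tolerances.

```python
import numpy as np
import pandas as pd

# Random data, seeded so the run is repeatable (seed value is arbitrary)
rng = np.random.default_rng(0)
people = pd.DataFrame(rng.standard_normal((5, 5)),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

# Demean within each group, then sum each group's demeaned values
demeaned = people.groupby(key).transform(lambda arr: arr - arr.mean())
group_sums = demeaned.groupby(key).sum()

# Exact equality with 0.0 can fail because of rounding; compare with a tolerance
print(np.allclose(group_sums, 0.0))
```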
