# Pandas tip #3: Transform .groupby() result back into original DataFrame
Sometimes you need a statistic computed per subgroup of a dataset, and you need that statistic available on every row of the original dataset. Previously I did this in multiple steps, but it can also be achieved with a lesser-known method in Pandas: `.transform()`.

The `.transform()` method acts very similarly to the `.apply()` method and is especially powerful after a `.groupby()`. It applies a function to each group and then broadcasts the result back so that it has the same length and index as the original DataFrame. Let's have a look at some artificial data:

In [15]:
import pandas as pd
import random

N = 100  # number of samples
max_amount = 1000  # maximum amount spent
groups = ['A', 'B', 'C', 'D']  # Groups
random.seed(42)

df = pd.DataFrame([
    {
        'id': ix, 
        'group': random.choice(groups), 
        'spend_money': round(random.random() * max_amount, 2),
    } for ix in range(N)
])

Using `.value_counts()` we can quickly see how often each group occurs.

In [17]:
df['group'].value_counts()

A    29
B    25
D    24
C    22
Name: group, dtype: int64

Now, as an example, let's say we want to see the average `spend_money` for each group. This can easily be done using a `.groupby()`:

In [22]:
(df
    .groupby('group')['spend_money']
    .mean()
)

group
A    504.031724
B    506.783600
C    530.030909
D    448.651250
Name: spend_money, dtype: float64

To "transform" these results back to the original DataFrame, so that every row carries the mean of its own group, we can make use of the `.transform()` method:

In [25]:
df['group_mean'] = (df
    .groupby('group')['spend_money']
    .transform('mean')
)
df

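| | id | group | spend_money | group_mean |
|---|---|---|---|---|
| 0 | 0 | A | 25.01 | 504.031724 |
| 1 | 1 | C | 244.89 | 530.030909 |
| 2 | 2 | B | 736.47 | 506.783600 |
| 3 | 3 | A | 590.49 | 504.031724 |
| 4 | 4 | A | 29.80 | 504.031724 |
| ... | ... | ... | ... | ... |
| 95 | 95 | C | 800.59 | 530.030909 |
| 96 | 96 | A | 248.66 | 504.031724 |
| 97 | 97 | B | 536.29 | 506.783600 |
| 98 | 98 | B | 421.88 | 506.783600 |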
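For comparison, here is a minimal sketch of the kind of multi-step approach that `.transform()` replaces: first aggregate per group, then map the aggregated values back onto the rows. The variable names (`group_means`, `mean_via_map`, `mean_via_transform`) are my own and only serve the illustration; the data is generated the same way as above.

```python
import random

import pandas as pd

# Rebuild the same kind of artificial data as above.
random.seed(42)
groups = ['A', 'B', 'C', 'D']
df = pd.DataFrame([
    {
        'id': ix,
        'group': random.choice(groups),
        'spend_money': round(random.random() * 1000, 2),
    } for ix in range(100)
])

# Multi-step approach: aggregate first (one value per group) ...
group_means = df.groupby('group')['spend_money'].mean()
# ... then look up each row's group in the aggregated result.
mean_via_map = df['group'].map(group_means)

# Single step: .transform() returns one value per original row.
mean_via_transform = df.groupby('group')['spend_money'].transform('mean')

print(len(group_means))         # one entry per group
print(len(mean_via_transform))  # one entry per row of df
print((mean_via_map - mean_via_transform).abs().max())
```

Both routes should give the same values per row; `.transform()` simply skips the intermediate lookup.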


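`.transform()` is very similar to `.apply()`, and we can pass it any function, as long as that function returns either a single value per group (which is then repeated for every row of the group) or a result with the same length as the group. For example, say we want to select the second smallest value from each group. For a single group, one way to do this is the following: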

In [29]:
(df
    .loc[df['group']=='A', 'spend_money']
    .sort_values()
    .iloc[1]
)

29.8

Using `.transform()` we can also pass our own function, which is then applied to every group:

In [31]:
def second_from_group(group):
    """Return the second smallest value of a group, or None if the group has fewer than two values."""
    if len(group) >= 2:
        return group.sort_values().iloc[1]
    else:
        return None

df['2nd_place'] = (df
    .groupby('group')['spend_money']
    .transform(second_from_group)
)
df

| | id | group | spend_money | group_mean | 2nd_place |
|---|---|---|---|---|---|
| 0 | 0 | A | 25.01 | 504.031724 | 29.80 |
| 1 | 1 | C | 244.89 | 530.030909 | 111.55 |
| 2 | 2 | B | 736.47 | 506.783600 | 94.33 |
| 3 | 3 | A | 590.49 | 504.031724 | 29.80 |
| 4 | 4 | A | 29.80 | 504.031724 | 29.80 |
| ... | ... | ... | ... | ... | ... |
| 95 | 95 | C | 800.59 | 530.030909 | 111.55 |
| 96 | 96 | A | 248.66 | 504.031724 | 29.80 |
| 97 | 97 | B | 536.29 | 506.783600 | 94.33 |
| 98 | 98 | B | 421.88 | 506.783600 | 94.33 |
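The function does not have to return a single value per group. As a small sketch on a toy DataFrame, the hypothetical helper `deviation_from_mean` below (not part of the example above) returns one value per row: how far each amount lies from the mean of its own group.

```python
import pandas as pd

# Tiny hand-made dataset so the result is easy to check by eye.
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'spend_money': [10.0, 30.0, 5.0, 15.0, 40.0],
})

def deviation_from_mean(values):
    # `values` is the spend_money Series of one group;
    # the result has the same length as that group.
    return values - values.mean()

df['diff_from_mean'] = (df
    .groupby('group')['spend_money']
    .transform(deviation_from_mean)
)
print(df)
```

Because the result keeps the original index, it can be assigned straight to a new column, just like the per-group mean earlier.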


If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis).