# The transform function
While aggregation must return a reduced version of the data, a transformation returns a transformed version of the full data, which can then be recombined with the original. For such a transformation, the output has the same shape as the input. A common example is to center the data by subtracting the group-wise mean.
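As a minimal sketch of the centering example just mentioned, the snippet below uses a small made-up DataFrame (the `group` and `value` columns are hypothetical, not part of the sales data) and subtracts each group's mean from its rows.

```python
import pandas as pd

# Hypothetical toy data: two groups with different means.
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0],
})

# transform("mean") broadcasts each group's mean back to every row of that group.
group_mean = toy.groupby("group")["value"].transform("mean")

# Subtracting it centers each group around zero; there is still one value per input row.
toy["centered"] = toy["value"] - group_mean
print(toy)
```

The result keeps all five rows, which is what distinguishes it from an aggregation that would return one row per group.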

Say we have sales data with 3 different orders (10001, 10005 and 10006) and that each order contains multiple products (aka skus).

The question we would like to answer is: “What percentage of the order total does each sku represent?”

In [1]:
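%autosave 30
import pandas as pd

# Load the sales transactions (one row per line item)
df = pd.read_excel("sales_transactions.xlsx")

# Sum "ext price" within each order and broadcast that total back to every row
df.groupby('order')["ext price"].transform('sum')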

Autosaving every 30 seconds


0      576.12
1      576.12
2      576.12
3     8185.49
4     8185.49
5     8185.49
6     8185.49
7     8185.49
8     3724.49
9     3724.49
10    3724.49
11    3724.49
Name: ext price, dtype: float64
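For contrast with a plain aggregation, here is a small self-contained sketch. It assumes a stand-in DataFrame named `sales`: order 10001 reuses the three "ext price" values shown in this section, while order 99999 and its prices are made up for illustration.

```python
import pandas as pd

# Order 10001 uses the "ext price" values from this section;
# order 99999 is a hypothetical second order added for contrast.
sales = pd.DataFrame({
    "order": [10001, 10001, 10001, 99999, 99999],
    "ext price": [235.83, 232.32, 107.97, 50.00, 25.00],
})

grouped = sales.groupby("order")["ext price"]

# Aggregation: one row per order, indexed by the order number.
totals_agg = grouped.sum()

# Transformation: one row per line item, indexed like the original frame.
totals_transform = grouped.transform("sum")

print(len(totals_agg), len(totals_transform))
```

Because the transformed Series shares the original index, it can be assigned straight back to the DataFrame as a new column.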

Instead of only showing the totals for the 3 orders, we retain the same number of rows as the original data set. That is the unique feature of using transform. Now we can add those order totals as a new column, along with each row's share of its order total (stored as a fraction between 0 and 1):

In [3]:
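# Order total repeated on every line item of that order
df["Order_Total"] = df.groupby('order')["ext price"].transform('sum')
# Each line item's share of its order total (a fraction, not multiplied by 100)
df["Percent_of_Order"] = df["ext price"] / df["Order_Total"]
df.head()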

|  | account | name | order | sku | quantity | unit price | ext price | Order_Total | Percent_of_Order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 383080 | Will LLC | 10001 | B1-20000 | 7 | 33.69 | 235.83 | 576.12 | 0.409342 |
| 1 | 383080 | Will LLC | 10001 | S1-27722 | 11 | 21.12 | 232.32 | 576.12 | 0.403249 |
| 2 | 383080 | Will LLC | 10001 | B1-86481 | 3 | 35.99 | 107.97 | 576.12 | 0.187409 |
| 3 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 48 | 55.82 | 2679.36 | 8185.49 | 0.32733 |
| 4 | 412290 | Jerde-Hilpert | 10005 | S1-82801 | 21 | 13.62 | 286.02 | 8185.49 | 0.034942 |
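To see what transform saves you, the same order-total column can also be built in two steps: aggregate, then merge the totals back onto the line items. The sketch below is only an illustration of that equivalence. It uses a stand-in `sales` frame in which order 10001 carries the values shown above and order 99999 is hypothetical.

```python
import pandas as pd

# Stand-in data: order 10001 from this section, order 99999 made up.
sales = pd.DataFrame({
    "order": [10001, 10001, 10001, 99999, 99999],
    "ext price": [235.83, 232.32, 107.97, 50.00, 25.00],
})

# Step 1: aggregate to one total per order.
totals = (
    sales.groupby("order")["ext price"]
    .sum()
    .rename("Order_Total")
    .reset_index()
)

# Step 2: merge the totals back onto the individual line items.
merged = sales.merge(totals, on="order", how="left")

# The one-step transform produces the same column.
via_transform = sales.groupby("order")["ext price"].transform("sum")
max_diff = (merged["Order_Total"] - via_transform).abs().max()
print(max_diff)
```

The transform version avoids the intermediate table and the join key bookkeeping, which is why it is the more convenient choice here.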


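As an added bonus, you can combine both steps into one statement if you do not need a separate order-total column (Order_Total still appears in the output below only because the earlier cell already added it to df):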

In [4]:
df["Percent_of_Order"] = df["ext price"] / df.groupby('order')["ext price"].transform('sum')

In [5]:
df.head()

|  | account | name | order | sku | quantity | unit price | ext price | Order_Total | Percent_of_Order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 383080 | Will LLC | 10001 | B1-20000 | 7 | 33.69 | 235.83 | 576.12 | 0.409342 |
| 1 | 383080 | Will LLC | 10001 | S1-27722 | 11 | 21.12 | 232.32 | 576.12 | 0.403249 |
| 2 | 383080 | Will LLC | 10001 | B1-86481 | 3 | 35.99 | 107.97 | 576.12 | 0.187409 |
| 3 | 412290 | Jerde-Hilpert | 10005 | S1-06532 | 48 | 55.82 | 2679.36 | 8185.49 | 0.32733 |
| 4 | 412290 | Jerde-Hilpert | 10005 | S1-82801 | 21 | 13.62 | 286.02 | 8185.49 | 0.034942 |
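A quick sanity check on a column like Percent_of_Order is that the shares within each order add up to 1. The sketch below assumes a stand-in `sales` frame (order 10001 from this section, order 99999 hypothetical) and also shows one possible way to turn the fraction into a rounded percentage; the `Percent_display` column name is made up for illustration.

```python
import pandas as pd

# Stand-in data: order 10001 from this section, order 99999 made up.
sales = pd.DataFrame({
    "order": [10001, 10001, 10001, 99999, 99999],
    "ext price": [235.83, 232.32, 107.97, 50.00, 25.00],
})

# One-statement version of the share calculation.
sales["Percent_of_Order"] = (
    sales["ext price"] / sales.groupby("order")["ext price"].transform("sum")
)

# Within each order the shares should sum to 1, up to floating-point rounding.
share_sums = sales.groupby("order")["Percent_of_Order"].sum()
print(share_sums)

# Multiply by 100 and round for a human-readable percentage.
sales["Percent_display"] = (sales["Percent_of_Order"] * 100).round(1)
print(sales)
```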
