# Pandas Transform: Perform operations on groups and concatenate the results

## 1) Simple transform 

### First, let's create a dummy DataFrame

We assume that a customer can have n orders, an order can have m items, and the same item can be ordered multiple times.

In [1]:
import pandas as pd 

In [2]:
orders_df = pd.DataFrame()
orders_df["customer_id"] = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
orders_df["order_id"] = [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6]
orders_df["item"] = [
    "apples",
    "chocolate",
    "chocolate",
    "coffee",
    "coffee",
    "apples",
    "bananas",
    "coffee",
    "milkshake",
    "chocolate",
    "strawberry",
    "strawberry",
]

In [3]:
print(orders_df)

    customer_id  order_id        item
0             1         1      apples
1             1         1   chocolate
2             1         1   chocolate
3             1         2      coffee
4             1         2      coffee
5             2         3      apples
6             2         3     bananas
7             3         4      coffee
8             3         5   milkshake
9             3         6   chocolate
10            3         6  strawberry
11            3         6  strawberry


In [4]:
# Count the distinct order IDs within one customer's group.
count_number_of_orders = lambda x: len(x.unique())

In [5]:
orders_df["number_of_orders_per_cient"] = (orders_df.groupby(['customer_id'])['order_id'].transform(count_number_of_orders))

In [6]:
orders_df 

    customer_id  order_id        item  number_of_orders_per_cient
0             1         1      apples                           2
1             1         1   chocolate                           2
2             1         1   chocolate                           2
3             1         2      coffee                           2
4             1         2      coffee                           2
5             2         3      apples                           1
6             2         3     bananas                           1
7             3         4      coffee                           3
8             3         5   milkshake                           3
9             3         6   chocolate                           3
10            3         6  strawberry                           3
11            3         6  strawberry                           3
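
The lambda in cell 4 counts the distinct `order_id` values of each customer, and `transform` broadcasts that count back to every row of the group. As a minimal sketch, the same column can be obtained by passing the built-in aggregation name `"nunique"` to `transform`, or by the explicit aggregate-then-merge pattern that `transform` replaces. The small frame `df` and the column name `n_orders` are illustrative assumptions and not part of the original notebook.

```python
import pandas as pd

# Illustrative data with the same key columns as orders_df (reduced rows).
df = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "order_id": [1, 1, 2, 3, 3, 4, 5, 6],
    }
)

# Option 1: transform with a built-in aggregation name.
# The per-customer count is repeated on every row of that customer.
via_transform = df.groupby("customer_id")["order_id"].transform("nunique")

# Option 2: aggregate to one row per customer, then merge the result back.
counts = (
    df.groupby("customer_id")["order_id"]
    .nunique()
    .rename("n_orders")  # hypothetical column name for this sketch
    .reset_index()
)
via_merge = df.merge(counts, on="customer_id", how="left")["n_orders"]

print(via_transform.tolist())
print(via_merge.tolist())
```

Both options should yield the same values; `transform` simply saves the intermediate table and the merge.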


## 2) Multiple results per group 

### Using transform functions that return sub-calculations per group

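In the previous example, the function returned a single value per customer, which `transform` repeated on every row of the group. A function can also return a different value for each row of the group, as long as its result has the same length as the group.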

In [7]:
def multiple_items_per_order(_items):
    # True for every item that occurs more than once within the order.
    return _items.duplicated(keep=False)

orders_df["item_duplictaed_per_order"] = (
    orders_df.groupby("order_id")["item"].transform(multiple_items_per_order)
)
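
As a cross-check, the same flag can be computed without `groupby`: `DataFrame.duplicated` accepts a `subset` of columns, so marking the rows whose `(order_id, item)` pair repeats should give the same result as the grouped `transform`. The following is a sketch on a reduced copy of the data; the names `df`, `via_transform`, and `via_duplicated` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Reduced copy of the orders data (first three orders only).
df = pd.DataFrame(
    {
        "order_id": [1, 1, 1, 2, 2, 3, 3],
        "item": [
            "apples", "chocolate", "chocolate",
            "coffee", "coffee",
            "apples", "bananas",
        ],
    }
)

# Grouped version: one boolean per row, computed within each order.
via_transform = (
    df.groupby("order_id")["item"]
    .transform(lambda items: items.duplicated(keep=False))
    .astype(bool)  # normalize the dtype for comparison
)

# Vectorized version: a row is flagged when its (order_id, item) pair repeats.
via_duplicated = df.duplicated(subset=["order_id", "item"], keep=False)

print(via_transform.tolist())
print(via_duplicated.tolist())
```

The vectorized form avoids calling a Python function once per group, which may matter on larger frames; the grouped form generalizes more easily to other per-row calculations.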

In [9]:
print(orders_df)

    customer_id  order_id        item  number_of_orders_per_cient  \
0             1         1      apples                           2   
1             1         1   chocolate                           2   
2             1         1   chocolate                           2   
3             1         2      coffee                           2   
4             1         2      coffee                           2   
5             2         3      apples                           1   
6             2         3     bananas                           1   
7             3         4      coffee                           3   
8             3         5   milkshake                           3   
9             3         6   chocolate                           3   
10            3         6  strawberry                           3   
11            3         6  strawberry                           3   

    item_duplictaed_per_order  
0                       False  
1                        True  
2     