🟦 1. Import Required Libraries

In [16]:
# 🟦 Import
import pandas as pd
import numpy as np

# 🟦 Example DataFrame (transit / score style)
df = pd.DataFrame({
    "route_id":    [10, 10, 10, 20, 20, 30, 30, 30],
    "direction":   [0, 0, 1, 0, 1, 0, 0, 1],
    "delay_min":   [3, 5, 0, 7, 3, 1, 4, 2],
    "trip_duration":[25, 30, 28, 45, 50, 20, 35, 22],
    "score":       [0.8, 0.6, 0.9, 0.5, 0.7, 0.95, 0.4, 0.85]
})

df


,route_id,direction,delay_min,trip_duration,score
0,10,0,3,25,0.8
1,10,0,5,30,0.6
2,10,1,0,28,0.9
3,20,0,7,45,0.5
4,20,1,3,50,0.7
5,30,0,1,20,0.95
6,30,0,4,35,0.4
7,30,1,2,22,0.85


🟦 2. Define and Apply Your Own Aggregation Functions

In [17]:
# Example: custom function that returns the range (max - min)
def range_value(x):
    return x.max() - x.min()

# Apply custom aggregator to a grouped column
df.groupby("route_id")["delay_min"].agg(range_value).reset_index(name="delay_range")


,route_id,delay_range
0,10,5
1,20,4
2,30,3
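
As a quick sanity check, the custom range can be compared with the same quantity built from the group-wise `max()` and `min()` reductions. This is a minimal sketch on the delay column of the example data; the variable names `custom` and `builtin` are illustrative only.

```python
import pandas as pd

# Delay column from the example DataFrame above
df = pd.DataFrame({
    "route_id":  [10, 10, 10, 20, 20, 30, 30, 30],
    "delay_min": [3, 5, 0, 7, 3, 1, 4, 2],
})

def range_value(x):
    # Range of a group: largest value minus smallest value
    return x.max() - x.min()

grouped = df.groupby("route_id")["delay_min"]

custom = grouped.agg(range_value)         # custom aggregator
builtin = grouped.max() - grouped.min()   # same quantity from built-in reductions

# Both approaches should agree for every route
print((custom == builtin).all())
```

For a calculation this simple either form works; a named function is mainly convenient when the same logic is reused across several `.agg()` calls.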


🟦 3. Using Lambda Functions with .agg()

In [19]:
# Lambda to compute midpoint between min and max
df.groupby("route_id")["delay_min"].agg(lambda x: (x.max() + x.min()) / 2).reset_index(name="delay_midpoint")


,route_id,delay_midpoint
0,10,2.5
1,20,5.0
2,30,2.5
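
A lambda has no descriptive name of its own, which is why the example above names the result through `reset_index(name=...)`. One alternative, sketched below, is to pass the lambda to `.agg()` as a keyword argument so that the keyword becomes the output column name. The variable name `midpoints` is illustrative only.

```python
import pandas as pd

# Delay column from the example DataFrame above
df = pd.DataFrame({
    "route_id":  [10, 10, 10, 20, 20, 30, 30, 30],
    "delay_min": [3, 5, 0, 7, 3, 1, 4, 2],
})

# Named aggregation on a single grouped column:
# the keyword (delay_midpoint) is used as the result column name
midpoints = (
    df.groupby("route_id")["delay_min"]
      .agg(delay_midpoint=lambda x: (x.max() + x.min()) / 2)
      .reset_index()
)

print(midpoints)
```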


🟦 4. Combine Built-In and Custom Functions (Named Aggregation)

In [20]:
# Define another custom function
def iqr(x):
    return x.quantile(0.75) - x.quantile(0.25)

# Use named-aggregation style to produce clear column names
summary = df.groupby(["route_id", "direction"]).agg(
    avg_delay=("delay_min", "mean"),
    max_delay=("delay_min", "max"),
    delay_range=("delay_min", range_value),
    delay_iqr=("delay_min", iqr),
    total_duration=("trip_duration", "sum"),
    score_midpoint=("score", lambda s: (s.max() + s.min()) / 2)
).reset_index()

summary


,route_id,direction,avg_delay,max_delay,delay_range,delay_iqr,total_duration,score_midpoint
0,10,0,4.0,5,2,1.0,55,0.7
1,10,1,0.0,0,0,0.0,28,0.9
2,20,0,7.0,7,0,0.0,45,0.5
3,20,1,3.0,3,0,0.0,50,0.7
4,30,0,2.5,4,3,1.5,55,0.675
5,30,1,2.0,2,0,0.0,22,0.85
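
The `(column, function)` tuples used above can also be written with `pd.NamedAgg`, which spells out the two fields by name. The sketch below repeats two of the metrics in that form; `summary_named` is an illustrative variable name.

```python
import pandas as pd

# Columns needed from the example DataFrame above
df = pd.DataFrame({
    "route_id":  [10, 10, 10, 20, 20, 30, 30, 30],
    "direction": [0, 0, 1, 0, 1, 0, 0, 1],
    "delay_min": [3, 5, 0, 7, 3, 1, 4, 2],
})

def range_value(x):
    # Range of a group: largest value minus smallest value
    return x.max() - x.min()

# pd.NamedAgg(column=..., aggfunc=...) is the explicit form of the tuple syntax
summary_named = df.groupby(["route_id", "direction"]).agg(
    avg_delay=pd.NamedAgg(column="delay_min", aggfunc="mean"),
    delay_range=pd.NamedAgg(column="delay_min", aggfunc=range_value),
).reset_index()

print(summary_named)
```

The tuple form is shorter; `pd.NamedAgg` can be easier to read when the aggregation specification is built up programmatically.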


🟦 5. Using apply() for Group-wise Complex Logic

In [21]:
# Example: return multiple custom metrics per group as a DataFrame row
def custom_metrics(group):
    return pd.Series({
        "n_trips": len(group),
        "pct_long_trips": (group["trip_duration"] > 30).mean(),
        "median_score": group["score"].median()
    })

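# Select only the columns custom_metrics needs, so the grouping key is not
# passed into the function (recent pandas versions warn when it is)
df.groupby("route_id")[["trip_duration", "score"]].apply(custom_metrics).reset_index()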



,route_id,n_trips,pct_long_trips,median_score
0,10,3.0,0.0,0.8
1,20,2.0,1.0,0.6
2,30,3.0,0.333333,0.85
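
Note that `n_trips` is shown as `3.0` rather than `3`: packing an integer count and float metrics into one `pd.Series` upcasts everything to float. When each metric depends on a single column, the same numbers can be produced with named aggregation, which keeps a separate dtype per output column. This is a sketch of that alternative, not a replacement for `apply()` in cases that need several columns at once; `metrics` is an illustrative variable name.

```python
import pandas as pd

# Columns needed from the example DataFrame above
df = pd.DataFrame({
    "route_id":      [10, 10, 10, 20, 20, 30, 30, 30],
    "trip_duration": [25, 30, 28, 45, 50, 20, 35, 22],
    "score":         [0.8, 0.6, 0.9, 0.5, 0.7, 0.95, 0.4, 0.85],
})

metrics = df.groupby("route_id").agg(
    # Number of rows in each group (stays an integer)
    n_trips=("trip_duration", "size"),
    # Share of trips longer than 30 minutes
    pct_long_trips=("trip_duration", lambda s: (s > 30).mean()),
    # Median score per route
    median_score=("score", "median"),
).reset_index()

print(metrics)
print(metrics.dtypes)
```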


## 🟦 Summary

In this subsection, you learned how to:

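Define your own aggregation functions and pass them to .agg()

Use lambda functions for short, one-off aggregations

Combine built-in and custom functions with named aggregation to get clear column names

Use apply() to return several custom metrics per group

Flatten the grouped index with reset_index() for easier use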