🟦 1. Import Libraries and Dataset

In [1]:
import pandas as pd

data = {
    "route_id": [10, 10, 10, 20, 20, 30, 30],
    "direction": [0, 0, 1, 0, 1, 0, 1],
    "delay_min": [2, 5, 0, 7, 3, 1, 4],
    "trip_id": [101, 102, 103, 104, 105, 106, 107]
}

df = pd.DataFrame(data)
df

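|   | route_id | direction | delay_min | trip_id |
| - | -------- | --------- | --------- | ------- |
| 0 | 10       | 0         | 2         | 101     |
| 1 | 10       | 0         | 5         | 102     |
| 2 | 10       | 1         | 0         | 103     |
| 3 | 20       | 0         | 7         | 104     |
| 4 | 20       | 1         | 3         | 105     |
| 5 | 30       | 0         | 1         | 106     |
| 6 | 30       | 1         | 4         | 107     |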


🟦 2. Grouping by a Single Column

In [4]:
df.groupby("route_id")

# No calculation happens yet: groupby() only returns a lazy DataFrameGroupBy object.

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10467e540>
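Because the grouped object is lazy, it can help to look inside it before aggregating. The sketch below is one way to do that, using a trimmed copy of the same data; the variable names (`grouped`, `n_groups`, `sizes`, `delays_by_route`) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Trimmed copy of the notebook's data (only the columns needed here)
df = pd.DataFrame({
    "route_id": [10, 10, 10, 20, 20, 30, 30],
    "delay_min": [2, 5, 0, 7, 3, 1, 4],
})

grouped = df.groupby("route_id")

# How many groups there are, and how many rows each one holds
n_groups = grouped.ngroups
sizes = grouped.size()

# Iterating over the grouped object yields (key, sub-DataFrame) pairs
delays_by_route = {
    int(key): group["delay_min"].tolist() for key, group in grouped
}

print(n_groups)
print(delays_by_route)
```

Nothing is aggregated here; the loop only splits the rows by `route_id`, which is the work the later `.mean()` or `.sum()` calls build on.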

🟦 3. Basic Aggregations

In [6]:
df.groupby("route_id")["delay_min"].mean()

route_id
10    2.333333
20    5.000000
30    2.500000
Name: delay_min, dtype: float64

In [7]:
df.groupby("route_id")["delay_min"].sum()

route_id
10     7
20    10
30     5
Name: delay_min, dtype: int64

In [8]:
df.groupby("route_id")["delay_min"].count()

route_id
10    3
20    2
30    2
Name: delay_min, dtype: int64

In [9]:
df.groupby("route_id")["delay_min"].min()

route_id
10    0
20    3
30    1
Name: delay_min, dtype: int64

In [10]:
df.groupby("route_id")["delay_min"].max()

route_id
10    5
20    7
30    4
Name: delay_min, dtype: int64
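One detail about `.count()` is worth a quick check: it counts non-missing values, while `.size()` counts rows. The notebook's data has no missing values, so the sketch below uses a small hypothetical frame (`df_missing`) with one missing delay to show the difference.

```python
import pandas as pd

# Hypothetical data: route 10 has one trip with a missing delay
df_missing = pd.DataFrame({
    "route_id": [10, 10, 20],
    "delay_min": [2.0, None, 7.0],
})

grouped = df_missing.groupby("route_id")["delay_min"]

non_null = grouped.count()  # non-missing values per group
rows = grouped.size()       # rows per group, missing values included

print(non_null.to_dict())
print(rows.to_dict())
```

If the goal is "number of trips per route", `.size()` or a count on a column that is never missing (such as `trip_id`) is usually the safer choice.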

🟦 4. Multiple Aggregations Using .agg()

In [11]:
df.groupby("route_id")["delay_min"].agg(
    ["mean", "sum", "count", "min", "max"]
)

| route_id | mean     | sum | count | min | max |
| -------- | -------- | --- | ----- | --- | --- |
| 10       | 2.333333 | 7   | 3     | 0   | 5   |
| 20       | 5.0      | 10  | 2     | 3   | 7   |
| 30       | 2.5      | 5   | 2     | 1   | 4   |


🟦 5. Direct Aggregation vs .agg()

5.1 Direct aggregation

In [12]:
df.groupby("route_id")["delay_min"].mean()

route_id
10    2.333333
20    5.000000
30    2.500000
Name: delay_min, dtype: float64

5.2 Using .agg()

In [13]:
df.groupby("route_id").agg({
    "delay_min": "mean"
})

| route_id | delay_min |
| -------- | --------- |
| 10       | 2.333333  |
| 20       | 5.0       |
| 30       | 2.5       |
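The two outputs in this section hold the same numbers but have different shapes: selecting a column and calling `.mean()` returns a Series, while `.agg()` with a dict returns a DataFrame. A minimal check, with illustrative variable names (`direct`, `via_agg`, `same_values`):

```python
import pandas as pd

df = pd.DataFrame({
    "route_id": [10, 10, 10, 20, 20, 30, 30],
    "delay_min": [2, 5, 0, 7, 3, 1, 4],
})

direct = df.groupby("route_id")["delay_min"].mean()          # Series
via_agg = df.groupby("route_id").agg({"delay_min": "mean"})  # DataFrame

print(type(direct).__name__)
print(type(via_agg).__name__)

# The values agree; only the container differs
same_values = bool((direct == via_agg["delay_min"]).all())
print(same_values)
```

The direct form is shorter for a single statistic; the `.agg()` form is easier to extend once more columns or more statistics are needed.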


🟦 6. Resetting Index

In [14]:
result = df.groupby("route_id")["delay_min"].mean().reset_index()
result

|   | route_id | delay_min |
| - | -------- | --------- |
| 0 | 10       | 2.333333  |
| 1 | 20       | 5.0       |
| 2 | 30       | 2.5       |
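An alternative to calling `.reset_index()` afterwards is to pass `as_index=False` to `groupby()`, which keeps the group key as a regular column from the start. This is a sketch of that variant, not something the notebook itself uses; `result_alt` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "route_id": [10, 10, 10, 20, 20, 30, 30],
    "delay_min": [2, 5, 0, 7, 3, 1, 4],
})

# Approach from the notebook: aggregate, then move the index into a column
result = df.groupby("route_id")["delay_min"].mean().reset_index()

# Variant: keep route_id as a column while grouping
result_alt = df.groupby("route_id", as_index=False)["delay_min"].mean()

print(result_alt)
```

Both forms give a flat table with `route_id` and `delay_min` columns, so the choice is mostly a matter of style.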


🟦 7. Group By Multiple Columns

In [15]:
df.groupby(["route_id", "direction"])["trip_id"].count().reset_index(name="trip_count")

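|   | route_id | direction | trip_count |
| - | -------- | --------- | ---------- |
| 0 | 10       | 0         | 2          |
| 1 | 10       | 1         | 1          |
| 2 | 20       | 0         | 1          |
| 3 | 20       | 1         | 1          |
| 4 | 30       | 0         | 1          |
| 5 | 30       | 1         | 1          |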
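When grouping by two keys, the counts can also be laid out as a grid with one row per route and one column per direction. The sketch below does this with `.size()` and `.unstack()`; it is an optional presentation step, and `wide` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "route_id": [10, 10, 10, 20, 20, 30, 30],
    "direction": [0, 0, 1, 0, 1, 0, 1],
    "trip_id": [101, 102, 103, 104, 105, 106, 107],
})

# Count rows per (route_id, direction) pair, then pivot direction into columns.
# fill_value=0 covers combinations that never occur in the data.
wide = (
    df.groupby(["route_id", "direction"])
      .size()
      .unstack(fill_value=0)
)

print(wide)
```

Each cell of `wide` is the trip count for that route and direction, for example `wide.loc[10, 0]` for route 10 in direction 0.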


🟦 8. Full Example

In [17]:
summary = (
    df.groupby("route_id")
      .agg(
          avg_delay=("delay_min", "mean"),
          total_delay=("delay_min", "sum"),
          trip_count=("trip_id", "count")
      )
      .reset_index()
)

summary

|   | route_id | avg_delay | total_delay | trip_count |
| - | -------- | --------- | ----------- | ---------- |
| 0 | 10       | 2.333333  | 7           | 3          |
| 1 | 20       | 5.0       | 10          | 2          |
| 2 | 30       | 2.5       | 5           | 2          |
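Named aggregation also accepts a callable in place of a function name, which is handy when no built-in statistic fits. As a hedged extension of the full example, the sketch below adds a hypothetical `delay_range` column (maximum delay minus minimum delay); that column is an illustration and not part of the original summary.

```python
import pandas as pd

df = pd.DataFrame({
    "route_id": [10, 10, 10, 20, 20, 30, 30],
    "delay_min": [2, 5, 0, 7, 3, 1, 4],
    "trip_id": [101, 102, 103, 104, 105, 106, 107],
})

summary = (
    df.groupby("route_id")
      .agg(
          avg_delay=("delay_min", "mean"),
          trip_count=("trip_id", "count"),
          # Hypothetical extra column: spread between worst and best delay
          delay_range=("delay_min", lambda s: s.max() - s.min()),
      )
      .reset_index()
)

print(summary)
```

Callables are more flexible than the string names but typically slower on large data, so the built-in names are preferable where they exist.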


## ✅ Summary

| Concept                   | Example                                 |
| ------------------------- | --------------------------------------- |
| Group by a single column  | `df.groupby("route_id")`                |
| Group by multiple columns | `df.groupby(["route_id", "direction"])` |
| Basic aggregation         | `.mean()`, `.sum()`                     |
| Multiple aggregations     | `.agg(["mean", "max"])`                 |
| Rename aggregated columns | `.agg(avg=("col", "mean"))`             |
| Reset index               | `.reset_index()`                        |

