---
title: "Group By Operations in Pandas"
author: "Mohammed Adil Siraju"
date: "2025-09-21"
categories: [pandas, dataframe, group-by, aggregation]
description: "Master Pandas groupby operations: splitting data by categories, applying functions, and combining results. Learn aggregation, transformation, and filtering techniques for data analysis."
---

GroupBy operations are one of the most powerful features in Pandas for data analysis. They allow you to:

- **Split** data into groups based on criteria
- **Apply** functions to each group independently
- **Combine** the results back into a single Series or DataFrame

This notebook covers the essential groupby techniques: the basic aggregation functions and applying several aggregations at once with `agg()`.
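
To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and then with the built-in aggregation. The `toy` DataFrame and its `key` and `val` columns are hypothetical names chosen only for this illustration.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the three steps.
toy = pd.DataFrame({"key": ["x", "y", "x"], "val": [1, 2, 3]})

# Split: iterating over a GroupBy yields (group_label, sub-DataFrame) pairs.
# Apply: sum the "val" column of each piece.
pieces = {label: group["val"].sum() for label, group in toy.groupby("key")}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(pieces, name="val").rename_axis("key")

# The built-in aggregation performs all three steps in one call.
builtin = toy.groupby("key")["val"].sum()

print(manual)
print(builtin)
```

Both Series hold the same per-group totals; in practice the built-in call is the one to use, and the manual loop is shown only to expose what happens under the hood.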

## 1. Setting Up Sample Data

Let's create a small sample dataset to demonstrate groupby operations. It has a categorical `Category` column to group on and a numeric `Value` column to aggregate.

In [4]:
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)
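
Before aggregating, it can help to check how the rows fall into groups. The sketch below rebuilds the same sample data so it runs on its own; the variable names `grouped`, `sizes`, and `group_a` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 15, 20, 25, 30],
})

grouped = df.groupby("Category")

# Number of rows in each group.
sizes = grouped.size()
print(sizes)

# All rows that belong to a single group.
group_a = grouped.get_group("A")
print(group_a)
```

Category `A` has three rows and `B` has two, which is worth keeping in mind when reading the statistics that follow.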

## 2. Basic Aggregation Functions

Groupby operations allow you to calculate summary statistics for each group. Here are the most common aggregation functions:

### Sum Aggregation

Calculate the total sum of values for each category:

In [8]:
df.groupby('Category').sum()

| Category | Value |
|----------|-------|
| A        | 60    |
| B        | 40    |
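
The call above returns a DataFrame indexed by `Category`. Two common variations are sketched here: selecting a single column first, which yields a Series, and passing `as_index=False`, which keeps the group key as an ordinary column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 15, 20, 25, 30],
})

# Select the column before aggregating: the result is a Series
# indexed by Category.
totals_series = df.groupby("Category")["Value"].sum()
print(totals_series)

# as_index=False keeps Category as a regular column instead of the index,
# which is often more convenient for merging or exporting.
totals_flat = df.groupby("Category", as_index=False).sum()
print(totals_flat)
```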


### Mean Aggregation

Calculate the average value for each category:

In [9]:
df.groupby('Category').mean()

| Category | Value |
|----------|-------|
| A        | 20.0  |
| B        | 20.0  |


### Median Aggregation

Calculate the median (middle) value for each category:

In [10]:
df.groupby('Category').median()

| Category | Value |
|----------|-------|
| A        | 20.0  |
| B        | 20.0  |
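
In the sample data the mean and median happen to coincide, so the difference between them is not visible. The sketch below uses a hypothetical `skewed` dataset with one deliberately extreme value in category `A` to show how the two measures can diverge.

```python
import pandas as pd

# Hypothetical data: category A contains one extreme value (300).
skewed = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Value": [10, 20, 300, 15, 20, 25],
})

# Compute both measures side by side for each group.
stats = skewed.groupby("Category")["Value"].agg(["mean", "median"])
print(stats)
```

The outlier pulls the mean of `A` up to 110.0, while its median stays at 20.0. This is why the median is often the safer summary for skewed data.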


### Maximum Values

Find the highest value in each category:

In [11]:
df.groupby('Category').max()

| Category | Value |
|----------|-------|
| A        | 30    |
| B        | 25    |


### Minimum Values

Find the lowest value in each category:

In [12]:
df.groupby('Category').min()

| Category | Value |
|----------|-------|
| A        | 10    |
| B        | 15    |
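
The maximum and minimum can be combined or traced back to their source rows. As a sketch, the example below computes the range of each group and uses `idxmax()` to retrieve the full row that holds each group's largest value; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 15, 20, 25, 30],
})

values = df.groupby("Category")["Value"]

# Range per group: the difference between the largest and smallest value.
value_range = values.max() - values.min()
print(value_range)

# idxmax() returns the row label of the maximum in each group,
# which .loc then uses to pull the complete rows.
top_rows = df.loc[values.idxmax()]
print(top_rows)
```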


### Standard Deviation

Measure the spread of values within each category. By default, Pandas computes the sample standard deviation (`ddof=1`):

In [13]:
df.groupby('Category').std()

| Category | Value    |
|----------|----------|
| A        | 10.0     |
| B        | 7.071068 |


### Variance

Calculate the variance (squared standard deviation) for each category:

In [14]:
df.groupby('Category').var()

| Category | Value |
|----------|-------|
| A        | 100.0 |
| B        | 50.0  |
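
The relationship between the two measures can be checked directly. This sketch confirms that the variance equals the squared standard deviation, and shows the `ddof` parameter, which switches between the sample statistic (the default, `ddof=1`) and the population statistic (`ddof=0`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 15, 20, 25, 30],
})

values = df.groupby("Category")["Value"]

sample_std = values.std()        # default ddof=1 (sample statistic)
sample_var = values.var()        # default ddof=1
pop_std = values.std(ddof=0)     # population statistic

# The variance is the square of the standard deviation.
print(np.allclose(sample_var, sample_std ** 2))

# With ddof=0 the divisor is n instead of n - 1, so the result is smaller.
print(pop_std)
```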


## 3. Multiple Aggregations

You can apply several aggregation functions at once by passing a list of function names to the `agg()` method. The result has one column per function, nested under each aggregated column.

### Applying Multiple Functions

Calculate sum, mean, and maximum for each category in one operation:

In [15]:
df.groupby('Category').agg(['sum', 'mean', 'max'])

| Category | Value: sum | Value: mean | Value: max |
|----------|------------|-------------|------------|
| A        | 60         | 20.0        | 30         |
| B        | 40         | 20.0        | 25         |
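
Passing a list of functions produces two-level column labels, which can be awkward to work with. One alternative is named aggregation, sketched below, where each keyword argument names an output column and maps it to a `(column, function)` pair. The output names `total`, `average`, and `highest` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 15, 20, 25, 30],
})

# Named aggregation: output_name=(input_column, function).
summary = df.groupby("Category").agg(
    total=("Value", "sum"),
    average=("Value", "mean"),
    highest=("Value", "max"),
)
print(summary)
```

The result has flat, descriptive column names, which makes later selection such as `summary["total"]` straightforward.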


## Summary

GroupBy operations are essential for data analysis in Pandas. In this notebook, you learned:

### 🔢 **Basic Aggregation Functions**
- `sum()`: Total values per group
- `mean()`: Average values per group  
- `median()`: Middle value per group
- `max()` / `min()`: Highest/lowest values per group
- `std()` / `var()`: Measure spread within groups

### 📊 **Advanced Operations**
- `agg()`: Apply multiple functions simultaneously
- Combine statistics for comprehensive group analysis

### 💡 **Key Concepts**
1. **Split-Apply-Combine**: The three-step process of groupby operations
2. **Aggregation**: Reducing groups to single values (sum, mean, etc.)
3. **Multiple Functions**: Use `agg()` for comprehensive summaries

### 🚀 **Best Practices**
- Choose appropriate aggregation functions for your data type
- Use multiple aggregations to get complete group insights
- Consider data distribution when selecting measures (mean vs median)

### 📈 **Next Steps**
- Explore groupby with multiple columns
- Learn filtering and transformation operations
- Practice with real datasets for business insights
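
As a starting point for the transformation and filtering topics listed above, here is a brief sketch on the same sample data. Unlike an aggregation, `transform()` returns one value per original row, and `filter()` keeps or drops whole groups. The threshold of three rows is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 15, 20, 25, 30],
})

# transform: broadcast each group's mean back onto its rows,
# then measure how far each value sits from its group mean.
group_mean = df.groupby("Category")["Value"].transform("mean")
deviation = df["Value"] - group_mean
print(deviation)

# filter: keep only the groups that have at least three rows.
large_groups = df.groupby("Category").filter(lambda g: len(g) >= 3)
print(large_groups)
```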

Mastering groupby operations will significantly enhance your data analysis capabilities! 🎯