---
title: "Advanced Data Aggregation with Pandas Functions"
author: "Mohammed Adil Siraju"
date: "2025-09-21"
categories: [pandas, dataframe, aggregation, functions]
description: "Master advanced data aggregation techniques in Pandas using built-in functions, custom functions, and lambda expressions. Learn to create powerful summary statistics and custom aggregations."
---

Data aggregation is a fundamental operation in data analysis that allows you to summarize and analyze data by groups. This notebook covers:

- **Built-in Aggregation Functions**: Using Pandas' built-in functions such as `sum`, `mean`, and `max`
- **Custom Aggregation Functions**: Creating your own aggregation logic with lambda functions and custom functions
- **Multiple Aggregations**: Applying several functions simultaneously
- **Dictionary-based Aggregation**: Specifying different functions for different columns

Mastering these techniques will give you powerful tools for data summarization and analysis.

## 1. Setting Up Sample Data

Let's create a sample dataset to demonstrate various aggregation techniques. We'll work with categorical data and numerical values.

In [1]:
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)
df

|   | Category | Value |
|---|----------|-------|
| 0 | A        | 10    |
| 1 | B        | 15    |
| 2 | A        | 20    |
| 3 | B        | 25    |
| 4 | A        | 30    |


## 2. Built-in Aggregation Functions

Pandas provides many built-in aggregation functions that you can pass by name to the `agg()` method. This section covers three of the most common summary statistics: `sum`, `mean`, and `max`.

### Sum Aggregation

Calculate the total sum of values for each category:

In [2]:
df.groupby('Category').agg({'Value': 'sum'})

| Category | Value |
|----------|-------|
| A        | 60    |
| B        | 40    |
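
The dictionary form above is one of several equivalent spellings. As a minimal sketch (it rebuilds the same sample frame so it runs on its own, and the names `via_agg` and `via_method` are illustrative only), the following compares it with selecting the column first and calling `sum()` directly. The main difference is the return type: a DataFrame in one case, a Series in the other.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Dictionary form: returns a DataFrame with one column per dictionary key
via_agg = df.groupby('Category').agg({'Value': 'sum'})

# Column selection plus the method of the same name: returns a Series
via_method = df.groupby('Category')['Value'].sum()

print(type(via_agg).__name__)     # DataFrame
print(type(via_method).__name__)  # Series

# The numbers are the same either way
print(via_agg['Value'].equals(via_method))
```

Which form to use is mostly a matter of what the next step expects: the Series is convenient for a single statistic, while the DataFrame form extends naturally to several columns.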


### Mean Aggregation

Calculate the average value for each category:

In [3]:
df.groupby('Category').agg({'Value': 'mean'})

| Category | Value |
|----------|-------|
| A        | 20.0  |
| B        | 20.0  |


### Maximum Value Aggregation

Find the highest value in each category:

In [4]:
df.groupby('Category').agg({'Value': 'max'})

| Category | Value |
|----------|-------|
| A        | 30    |
| B        | 25    |


## 3. Custom Aggregation Functions

Sometimes built-in functions aren't enough. Pandas allows you to create custom aggregation functions using lambda expressions or named functions.

### Lambda Functions for Custom Aggregation

Create a lambda function to calculate the range (max - min) for each category:

In [5]:
# Range of a group: largest value minus smallest value
custom_agg = lambda x: x.max() - x.min()

In [6]:
df

|   | Category | Value |
|---|----------|-------|
| 0 | A        | 10    |
| 1 | B        | 15    |
| 2 | A        | 20    |
| 3 | B        | 25    |
| 4 | A        | 30    |


In [7]:
# Apply the function to every non-grouping column
df.groupby('Category').agg(custom_agg)
# or restrict it to the 'Value' column explicitly
df.groupby('Category').agg({'Value': custom_agg})

| Category | Value |
|----------|-------|
| A        | 20    |
| B        | 10    |
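
In the output above the result column is still called `Value`, which hides the fact that it now holds a range. One way to label it, sketched here with pandas' named aggregation syntax, is to pass a keyword whose value is a `(column, function)` pair. The output name `value_range` is an illustrative choice, not something from the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Named aggregation: the keyword becomes the output column name,
# the tuple gives (source column, aggregation function)
ranges = df.groupby('Category').agg(
    value_range=('Value', lambda s: s.max() - s.min())
)

print(ranges)
```

The result has a single column, `value_range`, holding 20 for category A and 10 for category B.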


## 4. Multiple Aggregations

You can apply multiple aggregation functions at once to get comprehensive statistics for each group.

### Applying Multiple Built-in Functions

Calculate count, sum, min, max, and mean for each category:

In [8]:
df.groupby('Category')['Value'].agg(['count', 'sum', 'min', 'max', 'mean'])

| Category | count | sum | min | max | mean |
|----------|-------|-----|-----|-----|------|
| A        | 3     | 60  | 10  | 30  | 20.0 |
| B        | 2     | 40  | 15  | 25  | 20.0 |
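
The list form applies every function to the same column. When different columns need different statistics, a dictionary maps each column to its own function. The sample data has only one numeric column, so this sketch adds a hypothetical `Quantity` column purely for illustration.

```python
import pandas as pd

df_multi = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
    'Quantity': [1, 2, 3, 4, 5],  # hypothetical column, not in the original data
})

# Each key is a column name, each value is the aggregation for that column
summary = df_multi.groupby('Category').agg({
    'Value': 'mean',     # average Value per category
    'Quantity': 'sum',   # total Quantity per category
})

print(summary)
```

Here category A gets a mean `Value` of 20.0 and a total `Quantity` of 9, and category B gets 20.0 and 6.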


## 5. Named Custom Functions

For more complex logic, you can define named functions and use them in aggregations.

### Creating a Custom Mean Function

Define a function that computes the mean by hand, to show how a named function plugs into `agg()`:

In [13]:
def custom_mean(values):
    return sum(values) / len(values)

df.groupby('Category')['Value'].agg(custom_mean)

Category
A    20.0
B    20.0
Name: Value, dtype: float64
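
A named function earns its keep when the logic goes beyond what a built-in offers. As a sketch, the hypothetical `relative_range` below divides each group's range by its mean and guards against a zero mean. It is passed in a list next to the built-in `'mean'`, and pandas uses the function's name as the column label.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

def relative_range(values):
    """Range of the group divided by its mean (NaN if the mean is zero)."""
    mean = values.mean()
    if mean == 0:
        return float('nan')
    return (values.max() - values.min()) / mean

# Built-in names and custom functions can be mixed in one list
stats = df.groupby('Category')['Value'].agg(['mean', relative_range])

print(stats)
```

For the sample data, category A has a relative range of 1.0 (20 / 20.0) and category B has 0.5 (10 / 20.0).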

## Summary

Data aggregation is a powerful tool for summarizing and analyzing grouped data. In this notebook, you learned:

### 🔧 **Built-in Functions**
- `sum`, `mean`, `max`: Standard statistical aggregations
- Dictionary syntax: `agg({'column': 'function'})`
- Multiple functions: `agg(['func1', 'func2'])`

### 🎯 **Custom Functions**
- **Lambda functions**: Quick, inline custom logic
- **Named functions**: Complex logic with reusable functions
- **Flexible application**: Apply to specific columns or entire groups

### 💡 **Key Concepts**
1. **Dictionary Aggregation**: Specify different functions for different columns
2. **List Aggregation**: Apply multiple functions to the same column
3. **Custom Logic**: Create domain-specific aggregations

### 🚀 **Best Practices**
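- Prefer built-in functions, passed by name (e.g. `'sum'`), when possible: pandas runs them through optimized code paths, so they are usually faster than equivalent Python functions
- Use lambda functions for simple custom logic
- Use named functions for complex, reusable operations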
- Choose appropriate aggregations based on your data and analysis goals

### 📊 **Next Steps**
- Explore groupby with multiple columns
- Learn about transformation and filtering operations
- Practice with real datasets to create meaningful aggregations
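
As a small preview of the transformation and filtering item above, the sketch below uses the same sample data. Unlike `agg()`, `transform()` returns one value per original row, and `filter()` keeps or drops whole groups. The `GroupMean` column name and the three-row threshold are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# transform: broadcast each group's mean back onto its rows
df['GroupMean'] = df.groupby('Category')['Value'].transform('mean')

# filter: keep only the rows of groups with at least three members
large_groups = df.groupby('Category').filter(lambda g: len(g) >= 3)

print(df)
print(large_groups)
```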

Mastering aggregation functions will significantly enhance your data analysis capabilities! 🎯📈