1. Introduction to GroupBy and Aggregation in Pandas

Grouping and aggregating are among the most common operations in data analysis. Pandas supports them through the groupby() method, which follows a split-apply-combine pattern. It splits the data into groups based on some criterion, applies a function to each group independently, and combines the results into a new Series or DataFrame.

    GroupBy: How to group data based on one or more columns.

    Aggregation: How to apply aggregate functions (such as sum and mean) to the grouped data.

    Transformation: How to apply transformations to grouped data.

    Filtering: How to filter data based on group properties.
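To make the split-apply-combine pattern concrete, the sketch below performs the three steps by hand on the Category and Value1 columns of the example data introduced next. It then checks the result against a single groupby() call. The dictionary-based split is purely illustrative and is not how pandas implements grouping internally.

```python
import pandas as pd

# The Category and Value1 columns of the example DataFrame used in this notebook
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'],
    'Value1': [10, 20, 30, 40, 50, 60, 70, 80],
})

# Split: one sub-DataFrame per distinct Category value
pieces = {key: df[df['Category'] == key] for key in df['Category'].unique()}

# Apply: reduce each piece to a single number
sums = {key: piece['Value1'].sum() for key, piece in pieces.items()}

# Combine: collect the per-group results into one Series, sorted by key
manual = pd.Series(sums).sort_index()

# groupby() performs all three steps in a single call
builtin = df.groupby('Category')['Value1'].sum()

print(manual.to_dict() == builtin.to_dict())  # True
```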



In [2]:
import pandas as pd

data = {
    'Category': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'],
    'Value1': [10, 20, 30, 40, 50, 60, 70, 80],
    'Value2': [15, 25, 35, 45, 55, 65, 75, 85]
}

df = pd.DataFrame(data)
df

   Category  Value1  Value2
0         A      10      15
1         A      20      25
2         B      30      35
3         B      40      45
4         C      50      55
5         C      60      65
6         A      70      75
7         B      80      85


2. Grouping Data

We can use the DataFrame's groupby() method to group rows by the values in one or more columns. The result is a GroupBy object that represents one subset of rows per unique key; operations applied to it run on each subset separately.

In [3]:
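# groupby() returns a DataFrameGroupBy object; nothing is aggregated until we call a method on it
grouped = df.groupby('Category')
# size() returns the number of rows in each group
grouped.size()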


Category
A    3
B    3
C    2
dtype: int64
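Because groupby() splits the DataFrame into subsets, we can also inspect those subsets directly. The sketch below iterates over the groups, pulls one out with get_group(), and groups by two columns at once. The Region column is hypothetical, added here only so there is a second key to group by.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'],
    # Hypothetical second key, used only to illustrate multi-column grouping
    'Region': ['North', 'South', 'North', 'North', 'South', 'South', 'North', 'South'],
    'Value1': [10, 20, 30, 40, 50, 60, 70, 80],
})

grouped = df.groupby('Category')

# Iterating yields (group key, sub-DataFrame) pairs, one per group
for name, group in grouped:
    print(name, group['Value1'].tolist())

# get_group() returns the rows belonging to a single key
group_b = grouped.get_group('B')

# A list of columns groups by every observed (Category, Region) combination
by_two = df.groupby(['Category', 'Region']).size()
print(by_two)
```

With two keys, the result is indexed by a MultiIndex, and only combinations that actually occur in the data appear.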

3. Aggregating Data

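Once the data is grouped, we can aggregate it, reducing each group to a single value per column. Pandas provides built-in aggregation methods such as sum(), mean(), count(), min() and max(). Selecting columns with double brackets, as in grouped[['Value1', 'Value2']], restricts the aggregation to those columns.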

In [4]:
print(grouped[['Value1', 'Value2']].sum())
print(grouped[['Value1', 'Value2']].mean())

          Value1  Value2
Category                
A            100     115
B            150     165
C            110     120
             Value1     Value2
Category                      
A         33.333333  38.333333
B         50.000000  55.000000
C         55.000000  60.000000
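The other built-in reductions work the same way. A minimal sketch applying count, min and max to Value1 and collecting the results side by side (the summary DataFrame and its column labels are our own choice, not a pandas convention):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'],
    'Value1': [10, 20, 30, 40, 50, 60, 70, 80],
    'Value2': [15, 25, 35, 45, 55, 65, 75, 85],
})
grouped = df.groupby('Category')

# Each built-in reduction returns one value per group
counts = grouped['Value1'].count()  # number of non-null values in each group
lows = grouped['Value1'].min()
highs = grouped['Value1'].max()

# Combine the three Series into one table indexed by Category
summary = pd.DataFrame({'count': counts, 'min': lows, 'max': highs})
print(summary)
```

Note that count() skips missing values, whereas size() counts every row in the group.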


4. Custom Aggregations

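The agg() method generalizes this. It accepts a function, the name of a built-in aggregation such as 'sum', or a list of these. Passing a list applies every function to each selected column, so the result has a two-level column index of (column, function).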

In [5]:
grouped[['Value1', 'Value2']].agg(['sum', 'mean'])


          Value1             Value2
             sum       mean     sum       mean
Category
A            100  33.333333     115  38.333333
B            150  50.000000     165  55.000000
C            110  55.000000     120  60.000000
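agg() also accepts user-defined functions. In the sketch below, value_range is a hypothetical helper (not part of pandas) that returns the spread of a group; agg() calls it once per group and column, passing that group's values as a Series. Custom callables can also be mixed with built-in names in a list, in which case the resulting column is labelled with the function's name.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'],
    'Value1': [10, 20, 30, 40, 50, 60, 70, 80],
    'Value2': [15, 25, 35, 45, 55, 65, 75, 85],
})
grouped = df.groupby('Category')

# Hypothetical custom reduction: receives one group's column as a Series, returns a scalar
def value_range(s):
    return s.max() - s.min()

# Apply the custom function to each selected column of each group
ranges = grouped[['Value1', 'Value2']].agg(value_range)

# Mix a built-in aggregation name with the custom callable
mixed = grouped['Value1'].agg(['sum', value_range])

print(ranges)
print(mixed)
```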


5. Multiple Aggregations

If we want to apply different aggregation functions to different columns, we can pass a dictionary to the agg() method. For example, we can compute the sum of Value1 and the mean of Value2 for each group:

In [6]:
grouped.agg({
    'Value1': 'sum',
    'Value2': 'mean'
})


          Value1     Value2
Category
A            100  38.333333
B            150  55.000000
C            110  60.000000
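pandas (0.25 and later) also supports named aggregation, where each keyword argument to agg() names an output column and maps it to a (column, function) pair. It produces the same numbers as the dictionary form but lets us choose flat, descriptive column names. The names total_value1 and mean_value2 below are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'],
    'Value1': [10, 20, 30, 40, 50, 60, 70, 80],
    'Value2': [15, 25, 35, 45, 55, 65, 75, 85],
})
grouped = df.groupby('Category')

# Named aggregation: output column name = (input column, aggregation)
result = grouped.agg(
    total_value1=('Value1', 'sum'),
    mean_value2=('Value2', 'mean'),
)
print(result)
```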
