# Pandas Group Operations
Next, let's go over grouped operations in pandas. This part of the library has less feature bloat than others, which is nice, and the community is converging on a handful of operations that are core to grouped work. We'll go over these four, with particular emphasis on groupby and agg:

- groupby
- agg
- filter
- transform

## Groupby

A grouped operation starts by specifying which groups of data we want to operate over. There are many ways of making groups, but the tool that pandas provides for this is groupby.




Groupby works by giving pandas one or more columns. Pandas looks through your data and finds every unique combination of values in the columns you specify. Each unique combination is a group. For example, grouping a dataset by a sex column and a smoker column would give four groups: male smoker, female smoker, male non-smoker, female non-smoker. The sample below groups by the single Category column, so it has two groups: A and B.

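The groupby object by itself is not very interesting. It only records how the rows are split into groups; the useful results come from the operations you apply to it, such as agg, filter, or transform.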

In [1]:
import pandas as pd

# Create a sample DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'A'],
        'Value': [10, 20, 15, 25, 30, 35]}
df = pd.DataFrame(data)

# Group by 'Category'
grouped = df.groupby('Category')

# Print the groups
for name, group in grouped:
    print(f"Group: {name}")
    print(group)
    print()

Group: A
  Category  Value
0        A     10
2        A     15
4        A     30
5        A     35

Group: B
  Category  Value
1        B     20
3        B     25
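
Grouping by more than one column works the same way. The following is a minimal sketch of the four-group case described earlier; the `tips` DataFrame, its `sex`, `smoker`, and `tip` columns, and all of its values are made up here purely for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this example only
tips = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'smoker': ['Yes', 'Yes', 'No', 'No', 'Yes', 'No'],
    'tip': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Passing a list of columns makes one group per unique combination
by_sex_smoker = tips.groupby(['sex', 'smoker'])

# Number of distinct (sex, smoker) combinations found in the data
print(by_sex_smoker.ngroups)

# Number of rows that fall into each group
group_sizes = by_sex_smoker.size()
print(group_sizes)
```

Only combinations that actually occur in the data become groups, so a dataset with no female smokers would yield three groups rather than four.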



## Agg

The aggregate operation reduces all the data in each group to a single value per statistic. You use a dictionary to specify which column to aggregate and which statistics you'd like. In the example below, we ask for the sum, mean, and max of the Value column for each group:

In [2]:
# Using the agg() function to perform aggregation operations
result_agg = grouped.agg({'Value': ['sum', 'mean', 'max']})
print("Aggregation Result:")
print(result_agg)

Aggregation Result:
         Value          
           sum  mean max
Category                
A           90  22.5  35
B           45  22.5  25
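
The dictionary form produces two-level column labels. If flat column names are easier to work with, named aggregation is one alternative. The sketch below rebuilds the same sample DataFrame so it runs on its own; the output names `total`, `average`, and `largest` are arbitrary choices made for this example.

```python
import pandas as pd

# Same sample data as above, recreated so this block is self-contained
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'A'],
                   'Value': [10, 20, 15, 25, 30, 35]})

# Named aggregation: output_name=(input_column, aggregation)
summary = df.groupby('Category').agg(
    total=('Value', 'sum'),
    average=('Value', 'mean'),
    largest=('Value', 'max'),
)
print(summary)
```

The numbers are the same as in the dictionary version; only the column labels differ.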


## Filter

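The next common group operation is filter. This one is pretty simple: we pass a function that returns True or False for each group, and pandas drops every row that belongs to a group that doesn't meet our criteria. Note that in the example below both groups have a mean Value of 22.5, which is above 20, so every row is kept: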



In [3]:
# Using the filter() function to filter groups based on a condition
filtered_groups = grouped.filter(lambda x: x['Value'].mean() > 20)
print("Filtered Groups:")
print(filtered_groups)

Filtered Groups:
  Category  Value
0        A     10
1        B     20
2        A     15
3        B     25
4        A     30
5        A     35
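
To see filter actually remove a group, the condition has to fail for at least one of them. The sketch below uses a different, arbitrarily chosen criterion (a group sum above 50) on the same sample data, recreated here so the block runs on its own.

```python
import pandas as pd

# Same sample data as above, recreated so this block is self-contained
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'A'],
                   'Value': [10, 20, 15, 25, 30, 35]})

# Group A sums to 90 and group B sums to 45,
# so only the rows of group A pass the condition
kept = df.groupby('Category').filter(lambda g: g['Value'].sum() > 50)
print(kept)
```

The result keeps the original row labels, so the surviving rows can still be matched back to the original DataFrame.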


## Transform

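The final group operation is transform. It computes a value for each group and broadcasts it back to every row of that group, so the result lines up with the original data. In the example below both groups happen to have a mean of 22.5, so every row gets the same Group_Mean: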

In [4]:
# Using the transform() function to broadcast group-specific statistics to the original DataFrame
df['Group_Mean'] = grouped['Value'].transform('mean')
print("Transformed DataFrame:")
print(df)


Transformed DataFrame:
  Category  Value  Group_Mean
0        A     10        22.5
1        B     20        22.5
2        A     15        22.5
3        B     25        22.5
4        A     30        22.5
5        A     35        22.5
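
Because both groups share the same mean, the output above does not show that transform works per group. The sketch below uses the group maximum instead, which differs between A and B, and also shows a common use of transform: comparing each row with its own group's statistic. The column names `Group_Max` and `Diff_From_Mean` are made up for this example.

```python
import pandas as pd

# Same sample data as above, recreated so this block is self-contained
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'A'],
                   'Value': [10, 20, 15, 25, 30, 35]})

values_by_category = df.groupby('Category')['Value']

# Each row receives the maximum of its own group (35 for A, 25 for B)
df['Group_Max'] = values_by_category.transform('max')

# Each row's distance from its own group's mean
df['Diff_From_Mean'] = df['Value'] - values_by_category.transform('mean')
print(df)
```

Since transform returns one value per original row, its result can be assigned straight back as a new column without any merge step.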
