## 6 The agg() aggregation method

In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class": ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"],
    "name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace", "Hannah", "Ian", "Jack"],
    "math_score": [85, 90, 78, 88, 92, 80, 87, 91, 76, 89],
    "project_score": [88, 92, 80, 85, 95, 78, 90, 93, 82, 87],
    "passed": [True, True, False, True, True, False, True, True, False, True]
})
df

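|   | class | name | math_score | project_score | passed |
|---|---|---|---|---|---|
| 0 | A | Alice | 85 | 88 | True |
| 1 | A | Bob | 90 | 92 | True |
| 2 | B | Charlie | 78 | 80 | False |
| 3 | B | David | 88 | 85 | True |
| 4 | C | Eva | 92 | 95 | True |
| 5 | C | Frank | 80 | 78 | False |
| 6 | A | Grace | 87 | 90 | True |
| 7 | B | Hannah | 91 | 93 | True |
| 8 | C | Ian | 76 | 82 | False |
| 9 | A | Jack | 89 | 87 | True |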


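#### 1. Plain (ungrouped) aggregation

##### 1.1 Single column, single function
The result is a scalar; its type depends on the aggregation function (for example, a float for "mean" and an integer for "max" on an integer column).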

In [2]:
df["math_score"].agg("mean")  # mean math score

np.float64(85.6)

In [3]:
df["project_score"].agg("max")  # highest project score

95

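##### 1.2 Multiple columns, single function
The result is a Series: the index holds the column names and each value is the aggregate of the corresponding column.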

In [4]:
df[["math_score", "project_score"]].agg("min")  # lowest math score and lowest project score

math_score       76
project_score    78
dtype: int64

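##### 1.3 Multiple columns, multiple functions
The result is a DataFrame: the row index holds the function names and the columns are the original column names.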

In [5]:
df[["math_score", "project_score"]].agg(["max", "min", "mean"])

|   | math_score | project_score |
|---|---|---|
| max | 92.0 | 95.0 |
| min | 76.0 | 78.0 |
| mean | 85.6 | 87.0 |


In [13]:
df.agg({
    "math_score": ["max", "min", "mean"],
    "project_score": ["max", "min", "mean"]
})

|   | math_score | project_score |
|---|---|---|
| max | 92.0 | 95.0 |
| min | 76.0 | 78.0 |
| mean | 85.6 | 87.0 |
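
The dictionary form above passes the same list of functions for both columns, so it gives the same result as the list form. Where it differs is that each column can get its own list. The following is a minimal sketch, assuming a reduced copy of the example data named `scores` (a hypothetical name); cells for functions that were not requested for a column come back as NaN.

```python
import pandas as pd

# Reduced copy of the example data (only the two score columns)
scores = pd.DataFrame({
    "math_score": [85, 90, 78, 88, 92, 80, 87, 91, 76, 89],
    "project_score": [88, 92, 80, 85, 95, 78, 90, 93, 82, 87],
})

# Different functions per column: max/min for math, mean for project
mixed = scores.agg({
    "math_score": ["max", "min"],
    "project_score": ["mean"],
})

# Combinations that were not requested (e.g. mean of math_score) are NaN
print(mixed)
```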


#### 2. Grouped aggregation: .groupby() + .agg()

##### 2.1 Basic aggregation

In [9]:
df.groupby("class")["math_score"].mean()  # group by class and compute the mean math score

class
A    87.750000
B    85.666667
C    82.666667
Name: math_score, dtype: float64

##### 2.2 Single column, multiple functions

In [10]:
df.groupby("class")["math_score"].agg(["max", "min", "mean"])

| class | max | min | mean |
|---|---|---|---|
| A | 90 | 85 | 87.75 |
| B | 91 | 78 | 85.666667 |
| C | 92 | 76 | 82.666667 |


##### 2.3 Multiple columns, multiple functions

In [11]:
df.groupby("class")[["math_score", "project_score"]].agg(["max", "min", "mean"])

| class | (math_score, max) | (math_score, min) | (math_score, mean) | (project_score, max) | (project_score, min) | (project_score, mean) |
|---|---|---|---|---|---|---|
| A | 90 | 85 | 87.75 | 92 | 87 | 89.25 |
| B | 91 | 78 | 85.666667 | 93 | 80 | 86.0 |
| C | 92 | 76 | 82.666667 | 95 | 78 | 85.0 |


In [12]:
df.groupby("class").agg({
    "math_score": ["max", "min", "mean"],
    "project_score": ["max", "min", "mean"]
})

| class | (math_score, max) | (math_score, min) | (math_score, mean) | (project_score, max) | (project_score, min) | (project_score, mean) |
|---|---|---|---|---|---|---|
| A | 90 | 85 | 87.75 | 92 | 87 | 89.25 |
| B | 91 | 78 | 85.666667 | 93 | 80 | 86.0 |
| C | 92 | 76 | 82.666667 | 95 | 78 | 85.0 |


#### 3. The as_index parameter
By default, groupby makes the grouping column(s) the index of the result. To keep the grouping column as a regular column instead, pass as_index=False.
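
To make the difference concrete, here is a minimal sketch that runs the same aggregation with and without `as_index=False`. The frame `scores` and the result names `indexed` and `flat` are hypothetical; the data is the `class` and `math_score` columns from the example above.

```python
import pandas as pd

# Reduced copy of the example data
scores = pd.DataFrame({
    "class": ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"],
    "math_score": [85, 90, 78, 88, 92, 80, 87, 91, 76, 89],
})

# Default (as_index=True): "class" becomes the index of the result
indexed = scores.groupby("class").agg({"math_score": "mean"})

# as_index=False: "class" stays a regular column, rows get a default 0..n-1 index
flat = scores.groupby("class", as_index=False).agg({"math_score": "mean"})

print(indexed)
print(flat)
```

Calling `indexed.reset_index()` afterwards should give the same layout as `flat`, so `as_index=False` is mostly a convenience when the result is going to be merged or exported as a flat table.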

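#### 4. The .transform() method
After grouping, .transform() computes the aggregate for each group and broadcasts it back to every row of that group. The result has the same index (and number of rows) as the original DataFrame, so it can be assigned directly as a new column.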
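
The contrast with `.agg()` is easiest to see side by side. The sketch below assumes a reduced copy of the example data named `scores`; the column names `class_mean` and `diff_from_class_mean` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Reduced copy of the example data
scores = pd.DataFrame({
    "class": ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"],
    "math_score": [85, 90, 78, 88, 92, 80, 87, 91, 76, 89],
})

# agg() collapses each group to a single value: 3 classes -> 3 rows
per_class = scores.groupby("class")["math_score"].agg("mean")

# transform() returns one value per original row, aligned to the original index
scores["class_mean"] = scores.groupby("class")["math_score"].transform("mean")

# Typical use: compare each student with the average of their own class
scores["diff_from_class_mean"] = scores["math_score"] - scores["class_mean"]

print(per_class)
print(scores)
```

Because the transformed Series lines up with the original rows, no merge or join is needed to attach the group statistic to each student.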