In [2]:
print("""
@File         : apply.ipynb
@Author(s)    : Stephen CUI
@LastEditor(s): Stephen CUI
@CreatedTime  : 2024-12-28 10:48:16
@Email        : cuixuanstephen@gmail.com
@Description  : Apply
""")





In [3]:
import pandas as pd
import numpy as np

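`.apply` is a commonly used method, and I would argue it is overused. The `.agg`, `.transform`, and `.map` methods seen so far have relatively clear semantics (`.agg` reduces, `.transform` preserves shape, `.map` applies a function element-wise), but `.apply` can mirror any of them. That flexibility may look appealing at first, but because `.apply` leaves it to pandas to do the right thing, it is usually better to pick the most explicit method and avoid surprises.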
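
As a rough illustration of that flexibility, the sketch below shows `.apply` standing in for each of the three explicit methods. The frame `demo` and the lambdas are hypothetical examples, not part of the original notebook.

```python
import pandas as pd

# Hypothetical demo data, chosen only for illustration
demo = pd.DataFrame({"a": [0, 2, 4], "b": [1, 3, 5]})

# The function returns a scalar per column, so .apply reduces like .agg
reduced = demo.apply(lambda col: col.sum())
explicit_reduced = demo.agg("sum")

# The function returns a same-length Series per column,
# so .apply preserves the shape like .transform
same_shape = demo.apply(lambda col: col - col.min())
explicit_same_shape = demo.transform(lambda col: col - col.min())

# On a Series the function is called per element, like .map
elementwise = demo["a"].apply(lambda x: x * 10)
explicit_elementwise = demo["a"].map(lambda x: x * 10)

print(reduced.equals(explicit_reduced))
print(same_shape.equals(explicit_same_shape))
print(elementwise.equals(explicit_elementwise))
```

Which behaviour you get depends only on what the function happens to return, and that is the part the explicit methods spell out for the reader.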

In [4]:
def debug_apply(value):
    # Print whatever pandas passes in; the implicit return value is None
    print(f'Apply was called with value:\n{value}')

In [5]:
ser = pd.Series(range(3), dtype=pd.Int64Dtype())
ser.apply(debug_apply)

Apply was called with value:
0
Apply was called with value:
1
Apply was called with value:
2


0    None
1    None
2    None
dtype: object

In [6]:
ser.map(debug_apply)

Apply was called with value:
0
Apply was called with value:
1
Apply was called with value:
2


0    None
1    None
2    None
dtype: object

`pd.Series.apply` works much like a Python loop: it calls the function once per element and collects the returned values into a new `pd.Series`.
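
To make the loop analogy concrete, here is a minimal sketch comparing `pd.Series.apply` with a hand-written loop. The helper `square` and the name `values` are made up for this example, and the comparison covers values only, since the dtype pandas infers for the result may differ between the two approaches.

```python
import pandas as pd


def square(value):
    # Hypothetical per-element function used only for this comparison
    return value ** 2


values = pd.Series(range(3), dtype=pd.Int64Dtype())

# Hand-written equivalent: call the function per element, collect the results
looped = pd.Series([square(v) for v in values], index=values.index)

# Let pandas drive the same per-element calls
applied = values.apply(square)

print(looped.tolist())
print(applied.tolist())
```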

In [7]:
df = pd.DataFrame(
    np.arange(6).reshape(3, -1),
    columns=list('ab')
).convert_dtypes(dtype_backend='numpy_nullable')
df

   a  b
0  0  1
1  2  3
2  4  5


In [8]:
df.apply(debug_apply)

Apply was called with value:
0    0
1    2
2    4
Name: a, dtype: Int32
Apply was called with value:
0    1
1    3
2    5
Name: b, dtype: Int32


a    None
b    None
dtype: object

> `pd.Series.apply` is element-wise, whereas `pd.DataFrame.apply` is column-wise.
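
One way to check this difference without reading printed output is to record what the function receives. The sketch below assumes a small hypothetical frame `demo`; the two recorder functions are illustrative helpers, not pandas API.

```python
import pandas as pd

# Hypothetical demo data, chosen only for illustration
demo = pd.DataFrame({"a": [0, 2, 4], "b": [1, 3, 5]})

series_got_series = []  # one entry per call made by pd.Series.apply
frame_got_series = []   # one entry per call made by pd.DataFrame.apply
frame_names = []        # name of each object handed to the function


def record_from_series(value):
    # Series.apply passes individual elements, not Series objects
    series_got_series.append(isinstance(value, pd.Series))


def record_from_frame(value):
    # DataFrame.apply (default axis=0) passes each column as a Series
    frame_got_series.append(isinstance(value, pd.Series))
    frame_names.append(value.name)


demo["a"].apply(record_from_series)
demo.apply(record_from_frame)

print(series_got_series)
print(frame_got_series, frame_names)
```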

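Trusting pandas to do the right thing with `.apply` can be a risky proposition; users are strongly advised to exhaust every option offered by `.agg`, `.transform`, or `.map` before reaching for `.apply`.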