Row or Column-wise Function Application
====

Arbitrary functions can be applied along the axes of a DataFrame or Panel using the [`apply()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply) method, which, like the descriptive statistics methods, takes an optional `axis` argument. By default (`axis=0`) the function is applied to each column; with `axis=1` it is applied to each row:

In [22]:
import numpy as np
import pandas as pd

In [23]:
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

df

        one       two     three
a  0.118589 -0.994057       NaN
b  1.492557 -0.048810  0.055497
c -0.284845 -0.352431 -0.268867
d       NaN  0.034665  0.485592


In [24]:
df.apply(np.mean)

one      0.442100
two     -0.340158
three    0.090741
dtype: float64

In [25]:
df.apply(np.mean, axis=1)

a   -0.437734
b    0.499748
c   -0.302048
d    0.260129
dtype: float64
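
The effect of `axis` is easier to verify on fixed data than on random numbers. The following is a minimal sketch; the frame `frame` and its values are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Small deterministic frame (hypothetical data) so results can be checked by hand.
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]},
                     index=["a", "b", "c"])

# axis=0 (the default): the function receives each column as a Series,
# so the result is indexed by column label.
per_column = frame.apply(lambda s: s.sum())

# axis=1: the function receives each row as a Series,
# so the result is indexed by row label.
per_row = frame.apply(lambda s: s.sum(), axis=1)

print(per_column)
print(per_row)
```

The index of the result shows which axis was iterated: column labels for `axis=0`, row labels for `axis=1`.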

In [26]:
df.apply(lambda x: x.max() - x.min())

one      1.777403
two      1.028723
three    0.754459
dtype: float64

In [27]:
df.apply(np.cumsum)

        one       two     three
a  0.118589 -0.994057       NaN
b  1.611146 -1.042868  0.055497
c  1.326301 -1.395299 -0.213370
d       NaN -1.360633  0.272222


In [28]:
df.apply(np.exp)

        one       two     three
a  1.125907  0.370072       NaN
b  4.448458  0.952362  1.057066
c  0.752131  0.702977  0.764245
d       NaN  1.035273  1.625137


`.apply()` will also dispatch on a string method name.

In [29]:
df.apply('mean')

one      0.442100
two     -0.340158
three    0.090741
dtype: float64

In [30]:
df.apply('mean', axis=1)

a   -0.437734
b    0.499748
c   -0.302048
d    0.260129
dtype: float64
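
As a quick check of the string dispatch, the sketch below compares `apply('mean')` with calling the method directly. The frame and its values are hypothetical and chosen only so the means are easy to compute by hand.

```python
import pandas as pd

# Hypothetical data with easily checked means.
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 60.0]})

# Passing the method name as a string...
by_name = frame.apply("mean")
# ...gives the same result as calling the method itself.
direct = frame.mean()

# The axis argument is honoured in the string form as well.
by_name_rows = frame.apply("mean", axis=1)

print(by_name.equals(direct))
print(by_name_rows)
```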

Depending on the return type of the function passed to [`apply()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply), the result will either be of lower dimension or the same dimension.
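
A minimal sketch of the two cases, using a small made-up frame: a function that reduces each column to a scalar yields a Series, while a function that returns a Series of the same length yields a DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration.
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

# Reduction: one scalar per column, so the result drops a dimension (Series).
reduced = frame.apply(lambda s: s.sum())

# Transformation: one Series per column, so the result keeps
# the original shape (DataFrame).
transformed = frame.apply(lambda s: s.cumsum())

print(type(reduced).__name__, reduced.shape)
print(type(transformed).__name__, transformed.shape)
```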

[`apply()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply) combined with some cleverness can be used to answer many questions about a data set. For example, suppose we wanted to find the date on which each column reached its maximum value:

In [31]:
tsdf = pd.DataFrame(np.random.randn(1000, 3), columns=['A', 'B', 'C'],
                    index=pd.date_range('1/1/2000', periods=1000))

tsdf.head()

                   A         B         C
2000-01-01  1.942722  0.153037  0.417023
2000-01-02  0.929966 -0.875455 -0.300077
2000-01-03  0.725709 -0.272283  0.274643
2000-01-04 -0.829117 -1.347229  1.645610
2000-01-05  1.554522 -0.902455 -0.282156


In [32]:
tsdf.apply(lambda x: x.idxmax())

A   2002-07-02
B   2000-08-26
C   2001-07-29
dtype: datetime64[ns]
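
Because `tsdf` is random, the dates above differ on every run. The sketch below repeats the same idea on a tiny, made-up time series (the name `ts` and its values are assumptions for illustration) so the answer can be read off directly.

```python
import pandas as pd

# Hypothetical four-day series with an obvious maximum in each column.
dates = pd.date_range("2000-01-01", periods=4)
ts = pd.DataFrame({"A": [1.0, 5.0, 2.0, 0.0],
                   "B": [3.0, 1.0, 0.0, 7.0]}, index=dates)

# idxmax() returns the index label (here a date) of the largest value
# in each column.
peak_dates = ts.apply(lambda col: col.idxmax())

print(peak_dates)
```

For this particular question, `DataFrame.idxmax()` computes the same per-column result without going through `apply()`.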

You may also pass additional positional arguments (as a tuple via `args`) and keyword arguments to the [`apply()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply) method, which forwards them to the applied function. For instance, consider the following function you would like to apply:

In [33]:
def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide

You may then apply this function as follows:

In [34]:
df.apply(subtract_and_divide, args=(5,), divide=3)

        one       two     three
a -1.627137 -1.998019       NaN
b -1.169148 -1.682937 -1.648168
c -1.761615 -1.784144 -1.756289
d       NaN -1.655112 -1.504803
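
To make the argument forwarding explicit, here is a minimal sketch on made-up data. The values in `args` are passed as extra positional arguments after the Series, and extra keyword arguments such as `divide` are passed through by name, so the call is equivalent to writing the arithmetic out directly.

```python
import pandas as pd

def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide

# Hypothetical data chosen so the results are whole numbers.
frame = pd.DataFrame({"x": [5.0, 8.0, 11.0], "y": [2.0, 5.0, 14.0]})

# Each column is passed as x; args supplies sub=5, the keyword supplies divide=3.
via_apply = frame.apply(subtract_and_divide, args=(5,), divide=3)

# The same computation written out by hand.
direct = (frame - 5) / 3

print(via_apply)
print(via_apply.equals(direct))
```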


Another useful feature is the ability to pass Series methods to carry out some Series operation on each column or row:

In [35]:
tsdf.head()

                   A         B         C
2000-01-01  1.942722  0.153037  0.417023
2000-01-02  0.929966 -0.875455 -0.300077
2000-01-03  0.725709 -0.272283  0.274643
2000-01-04 -0.829117 -1.347229  1.645610
2000-01-05  1.554522 -0.902455 -0.282156


In [36]:
tsdf.apply(pd.Series.interpolate).head()

                   A         B         C
2000-01-01  1.942722  0.153037  0.417023
2000-01-02  0.929966 -0.875455 -0.300077
2000-01-03  0.725709 -0.272283  0.274643
2000-01-04 -0.829117 -1.347229  1.645610
2000-01-05  1.554522 -0.902455 -0.282156
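
The `tsdf` shown above contains no missing values, so the interpolated output is identical to the input. The following sketch uses a small made-up frame with gaps (the name `gappy` and its values are assumptions for illustration) to show the method actually filling something in.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in the middle of each column.
gappy = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                      "B": [10.0, np.nan, 30.0]},
                     index=pd.date_range("2000-01-01", periods=3))

# pd.Series.interpolate is called once per column; with its default
# settings it fills each gap linearly between the neighbouring values.
filled = gappy.apply(pd.Series.interpolate)

print(filled)
```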


Finally, [`apply()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply) takes an argument `raw`, which is False by default. With this default, each row or column is converted into a Series before the function is applied. When `raw` is set to True, the passed function instead receives an ndarray object, which improves performance if you do not need the indexing functionality.
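
The difference can be observed by recording the type of the object the function receives. This is a minimal sketch on made-up data; the helper `make_range_fn` and the log lists are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

def make_range_fn(log):
    """Return a max-minus-min function that records the type of its input."""
    def value_range(values):
        log.append(type(values).__name__)
        # .max() and .min() exist on both Series and ndarray.
        return values.max() - values.min()
    return value_range

series_log, array_log = [], []

# Default (raw=False): the function is handed a Series per column.
as_series = frame.apply(make_range_fn(series_log))

# raw=True: the function is handed a plain NumPy ndarray per column.
as_array = frame.apply(make_range_fn(array_log), raw=True)

print(set(series_log), set(array_log))
print(as_array)
```

The function here only uses operations that both types support. A function that relies on index labels (for example `idxmax()`) would not work with `raw=True`.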
