Aggregation API
===

New in version 0.20.0.

The aggregation API lets you express one or more aggregation operations in a single, concise call. The API is similar across pandas objects; see the [groupby API](http://pandas.pydata.org/pandas-docs/version/0.20.3/groupby.html#groupby-aggregate), the [window functions API](http://pandas.pydata.org/pandas-docs/version/0.20.3/computation.html#stats-aggregate), and the [resample API](http://pandas.pydata.org/pandas-docs/version/0.20.3/timeseries.html#timeseries-aggregate). The entry point for aggregation is the method [`aggregate()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.aggregate.html#pandas.DataFrame.aggregate), or its alias [`agg()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.agg.html#pandas.DataFrame.agg).
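
To make the "similar across pandas objects" point concrete, here is a minimal sketch on hypothetical toy data (the frame `df` and its column names are assumptions for illustration, not part of the original text). It passes the same list of function names to a plain frame, a groupby, a rolling window, and a resampler:

```python
import pandas as pd

# Hypothetical toy data, chosen so the results are easy to check by hand.
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b"], "value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2000-01-01", periods=4),
)

# DataFrame: aggregate() and agg() are interchangeable.
frame_result = df[["value"]].agg(["sum", "mean"])
same_result = df[["value"]].aggregate(["sum", "mean"])

# groupby: one row per group, one column per function.
grouped_result = df.groupby("key")["value"].agg(["sum", "mean"])

# rolling window: one row per timestamp, one column per function.
rolling_result = df["value"].rolling(window=2).agg(["sum", "mean"])

# resample: one row per 2-day bin, one column per function.
resampled_result = df["value"].resample("2D").agg(["sum", "mean"])
```

In each case the function names label the output, which is what keeps the calling convention uniform across the four kinds of object.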

We will use a starting frame similar to the one above:

In [1]:
import numpy as np
import pandas as pd

tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                    index=pd.date_range('1/1/2000', periods=10))

In [2]:
tsdf.iloc[3:7] = np.nan
tsdf

                   A         B         C
2000-01-01  0.144303 -2.160585 -0.910892
2000-01-02  0.351515 -0.816961 -0.051952
2000-01-03  0.606200  0.966552 -0.455652
2000-01-04       NaN       NaN       NaN
2000-01-05       NaN       NaN       NaN
2000-01-06       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN
2000-01-08 -0.233216  0.677415  0.424877
2000-01-09 -0.029646  1.540799  0.881013
2000-01-10 -0.878820  0.897679  1.321407


Using a single function is equivalent to [`apply()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply). You can also pass named methods as strings. Either form returns a `Series` of the aggregated output:

In [3]:
tsdf.agg(np.sum)

A   -0.039665
B    1.104899
C    1.208802
dtype: float64

In [4]:
tsdf.agg('sum')

A   -0.039665
B    1.104899
C    1.208802
dtype: float64

In [5]:
# these are equivalent to a ``.sum()`` because we are aggregating on a single function
tsdf.sum()

A   -0.039665
B    1.104899
C    1.208802
dtype: float64
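
The equivalence with `apply()` is stated above but not shown. A minimal sketch on a small hypothetical frame (the data and variable names are assumptions for illustration) compares the three spellings directly:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value; sum() skips NaN by default.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

via_agg = df.agg("sum")                      # named method passed as a string
via_apply = df.apply(lambda col: col.sum())  # the same reduction through apply()
via_method = df.sum()                        # the direct method call

# All three are Series indexed by column name with identical values.
assert via_agg.equals(via_apply)
assert via_agg.equals(via_method)
```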

A single aggregation on a `Series` returns a scalar value:

In [6]:
tsdf.A.agg('sum')

-0.03966525769883711

**Aggregating with multiple functions**

You can pass multiple aggregation arguments as a list. The result of each passed function becomes a row in the resulting `DataFrame`, and each row is named after its aggregation function.

In [7]:
tsdf.agg(['sum'])

            A         B         C
sum -0.039665  1.104899  1.208802


Multiple functions yield multiple rows:

In [8]:
tsdf.agg(['sum', 'mean'])

             A         B         C
sum  -0.039665  1.104899  1.208802
mean -0.006611  0.184150  0.201467


On a `Series`, multiple functions return a `Series`, indexed by the function names:

In [9]:
tsdf.A.agg(['sum', 'mean'])

sum    -0.039665
mean   -0.006611
Name: A, dtype: float64

Passing a `lambda` function will yield a `<lambda>` named row:

In [10]:
tsdf.A.agg(['sum', lambda x: x.mean()])

sum        -0.039665
<lambda>   -0.006611
Name: A, dtype: float64

Passing a named function will yield that name for the row:

In [11]:
def mymean(x):
    return x.mean()

In [12]:
tsdf.A.agg(['sum', mymean])

sum      -0.039665
mymean   -0.006611
Name: A, dtype: float64

**Aggregating with a dict**

Passing `DataFrame.agg` a dictionary that maps column names to a scalar or a list of scalars lets you customize which functions are applied to which columns. Note that the results are not in any particular order; use an `OrderedDict` instead to guarantee ordering.

In [13]:
tsdf.agg({'A': 'mean', 'B': 'sum'})

A   -0.006611
B    1.104899
dtype: float64
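
The `OrderedDict` suggestion above can be sketched as follows. The frame and the name `spec` are hypothetical, chosen so the values are easy to verify by hand:

```python
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# An OrderedDict records the order in which the column -> function pairs are given.
spec = OrderedDict([("B", "sum"), ("A", "mean")])
result = df.agg(spec)

# result is a Series with one entry per key in spec.
print(result)
```

Listing the pairs explicitly keeps the intended order visible in the code itself, whatever the mapping type.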

Passing a list-like for any column generates a `DataFrame` output, a matrix-like layout of all the aggregators. The output has one row for each unique function; entries for functions that were not requested for a particular column are `NaN`:

In [14]:
tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})

             A         B
mean -0.006611       NaN
min  -0.878820       NaN
sum        NaN  1.104899
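
Because `tsdf` holds random values, the `NaN` pattern is easier to verify on fixed data. The sketch below repeats the same call on a small hypothetical frame (the data and the name `requested` are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

result = df.agg({"A": ["mean", "min"], "B": "sum"})

# Rows are the union of all requested functions; columns are the dict keys.
# Cells for (function, column) pairs that were never requested hold NaN.
requested = result.notna()
print(result)
print(requested)
```

The boolean frame `requested` marks which combinations were actually computed, which is useful when the result is post-processed.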


**Mixed Dtypes**

When presented with mixed dtypes that cannot all be aggregated, `.agg` will only take the valid aggregations. This is similar to how groupby `.agg` works.

In [None]:
mdf = pd.DataFrame({'A': [1, 2, 3],
                    'B': [1., 2., 3.],
                    'C': ['foo', 'bar', 'baz'],
                    'D': pd.date_range('20130101', periods=3)})

In [None]:
mdf.dtypes

In [15]:
mdf.agg(['min', 'sum'])

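
How `.agg` treats columns that cannot be aggregated may differ between pandas releases. One defensive option is to select the aggregable columns explicitly before aggregating. This is an assumption made here for illustration, not something the text above prescribes. A minimal sketch, with `numeric` as a hypothetical helper name:

```python
import pandas as pd

mdf = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [1.0, 2.0, 3.0],
    "C": ["foo", "bar", "baz"],
    "D": pd.date_range("20130101", periods=3),
})

# Keep only the numeric columns, so every requested function is valid
# for every remaining column.
numeric = mdf.select_dtypes(include="number")
result = numeric.agg(["min", "sum"])
print(result)
```

The string column `C` and the datetime column `D` are left out up front, so the outcome does not rely on how invalid aggregations are handled.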

**Custom describe**

With `.agg()` it is possible to easily create a custom describe function, similar to the built-in [describe function](http://pandas.pydata.org/pandas-docs/version/0.20.3/basics.html#basics-describe).

In [None]:
from functools import partial

In [None]:
q_25 = partial(pd.Series.quantile, q=0.25)

q_25.__name__ = '25%'

q_75 = partial(pd.Series.quantile, q=0.75)

q_75.__name__ = '75%'
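
One way the two partials might be combined into a describe-like summary is sketched below. The frame `df` is hypothetical data with easily checked quartiles, and the particular list of functions is an assumption modeled on what `describe()` reports:

```python
from functools import partial

import pandas as pd

# Hypothetical data with easily checked quartiles.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]})

q_25 = partial(pd.Series.quantile, q=0.25)
q_25.__name__ = "25%"  # the row label is taken from __name__
q_75 = partial(pd.Series.quantile, q=0.75)
q_75.__name__ = "75%"

# Mix named methods (strings) with the two named partials.
summary = df.agg(["count", "mean", "std", "min", q_25, "median", q_75, "max"])
print(summary)
```

Setting `__name__` matters here because a `partial` object has no name of its own by default, so the assigned name is what appears as the row label.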

# End