Sorting
===

**Warning**

The sorting API changed substantially in 0.17.0; see [the API changes](http://pandas.pydata.org/pandas-docs/version/0.20.3/whatsnew.html#whatsnew-0170-api-breaking-sorting) for details. In particular, all sorting methods now return a new object by default and **do not** operate in place (unless you pass `inplace=True`).
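
A minimal sketch of this behaviour follows. The Series and the variable names are illustrative assumptions, not part of the original example set.

```python
import pandas as pd

# Illustrative data; values and names are assumptions for this sketch.
s = pd.Series([3, 1, 2], index=['x', 'y', 'z'])

# sort_values() returns a new, sorted Series and leaves `s` untouched.
sorted_s = s.sort_values()
original_order = s.tolist()
sorted_order = sorted_s.tolist()

# With inplace=True the Series itself is reordered and the call returns None.
result = s.sort_values(inplace=True)
inplace_order = s.tolist()
```

Assigning the return value of an in-place call is a common mistake, since the result is `None` rather than the sorted object.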

There are two obvious kinds of sorting that you may be interested in: sorting by label and sorting by actual values.

In [3]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
                  index=['a', 'b', 'c'])

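# Neither 'd' nor the columns 'three', 'two', 'one' exist in df, so reindex
# fills every cell with NaN; only the label order matters in the examples below.
unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
                         columns=['three', 'two', 'one'])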

unsorted_df

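| | three | two | one |
| --- | --- | --- | --- |
| a | NaN | NaN | NaN |
| d | NaN | NaN | NaN |
| c | NaN | NaN | NaN |
| b | NaN | NaN | NaN |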


In [4]:
unsorted_df.sort_index(ascending=False)

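| | three | two | one |
| --- | --- | --- | --- |
| d | NaN | NaN | NaN |
| c | NaN | NaN | NaN |
| b | NaN | NaN | NaN |
| a | NaN | NaN | NaN |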


In [5]:
unsorted_df.sort_index(axis=1)

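| | one | three | two |
| --- | --- | --- | --- |
| a | NaN | NaN | NaN |
| d | NaN | NaN | NaN |
| c | NaN | NaN | NaN |
| b | NaN | NaN | NaN |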


In [6]:
unsorted_df['three'].sort_index()

a   NaN
b   NaN
c   NaN
d   NaN
Name: three, dtype: float64
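
Because the frame above holds only NaN, the effect of label sorting on the data is hard to see. The following sketch repeats the same calls on a small frame with real values; the frame and its labels are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative frame with unsorted row and column labels (assumed data).
frame = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['b', 'c', 'a'],
    columns=['two', 'three', 'one'],
)

by_rows = frame.sort_index()                       # rows ordered a, b, c
by_rows_desc = frame.sort_index(ascending=False)   # rows ordered c, b, a
by_cols = frame.sort_index(axis=1)                 # columns ordered one, three, two
```

Each row or column keeps its own values; only the order of the labels changes.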

**By Values**

[`Series.sort_values()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.sort_values.html#pandas.Series.sort_values) and [`DataFrame.sort_values()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values) are the entry points for **value** sorting (that is, sorting by the values in a column or row). [`DataFrame.sort_values()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values) takes a `by` argument which, for `axis=0`, names the column of the DataFrame whose values determine the sort order:

In [7]:
df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})

In [8]:
df1.sort_values(by='two')

| | one | two | three |
| --- | --- | --- | --- |
| 0 | 2 | 1 | 5 |
| 2 | 1 | 2 | 3 |
| 1 | 1 | 3 | 4 |
| 3 | 1 | 4 | 2 |


The `by` argument can take a list of column names, e.g.:

In [11]:
df1[['one', 'two', 'three']].sort_values(by=['one','two'])

| | one | two | three |
| --- | --- | --- | --- |
| 2 | 1 | 2 | 3 |
| 1 | 1 | 3 | 4 |
| 3 | 1 | 4 | 2 |
| 0 | 2 | 1 | 5 |
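
When sorting by several columns, the direction can also be set per column. The sketch below assumes the `ascending` parameter of `sort_values()`, which the text above does not cover, and passes it a list with one flag per key.

```python
import pandas as pd

# Same data as df1 above; column order is fixed explicitly.
df1 = pd.DataFrame({'one': [2, 1, 1, 1], 'two': [1, 3, 2, 4], 'three': [5, 4, 3, 2]},
                   columns=['one', 'two', 'three'])

# Sort by 'one' ascending, breaking ties by 'two' descending.
mixed = df1.sort_values(by=['one', 'two'], ascending=[True, False])
row_order = list(mixed.index)
```

The list passed to `ascending` must have the same length as the list passed to `by`.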


These methods have special treatment of NA values via the `na_position` argument:

In [13]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

s[2] = np.nan

In [14]:
s.sort_values()

0       A
3    Aaba
1       B
4    Baca
6    CABA
8     cat
7     dog
2     NaN
5     NaN
dtype: object

In [15]:
s.sort_values(na_position='first')

2     NaN
5     NaN
0       A
3    Aaba
1       B
4    Baca
6    CABA
8     cat
7     dog
dtype: object
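
The same argument applies to `DataFrame.sort_values()`. A small sketch, using an assumed two-column frame:

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing key (assumed data).
frame = pd.DataFrame({'key': [2.0, np.nan, 1.0], 'label': ['x', 'y', 'z']})

na_last = frame.sort_values(by='key')                        # default: NaN rows last
na_first = frame.sort_values(by='key', na_position='first')  # NaN rows first
descending = frame.sort_values(by='key', ascending=False)    # NaN rows still last
```

Note that `na_position` is independent of `ascending`: reversing the sort direction does not move the NaN rows to the front.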

**searchsorted**

Series has the [`searchsorted()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.searchsorted.html#pandas.Series.searchsorted) method, which works similarly to [`numpy.ndarray.searchsorted()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.searchsorted.html#numpy.ndarray.searchsorted).

In [16]:
ser = pd.Series([1, 2, 3])

In [17]:
ser.searchsorted([0, 3])

array([0, 2], dtype=int64)

In [18]:
ser.searchsorted([0, 4])

array([0, 3], dtype=int64)

In [19]:
ser.searchsorted([1, 3], side='right')

array([1, 3], dtype=int64)

In [20]:
ser.searchsorted([1, 3], side='left')

array([0, 2], dtype=int64)

In [21]:
ser = pd.Series([3, 1, 2])

In [22]:
ser.searchsorted([0, 3], sorter=np.argsort(ser))

array([0, 2], dtype=int64)
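
One way to read these results is as insertion positions: the index at which each new value could be inserted while keeping the Series sorted. A sketch with assumed data:

```python
import numpy as np
import pandas as pd

# The Series must already be sorted in ascending order (assumed data).
ser = pd.Series([10, 20, 30, 40])
new_values = [5, 25, 40]

# side='left' (the default) inserts before any equal entries,
# side='right' inserts after them.
left = np.asarray(ser.searchsorted(new_values))
right = np.asarray(ser.searchsorted(new_values, side='right'))
```

The two sides differ only for values already present in the Series, such as `40` here.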

**smallest / largest values**

New in version 0.14.0.

`Series` has the [`nsmallest()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.nsmallest.html#pandas.Series.nsmallest) and [`nlargest()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.nlargest.html#pandas.Series.nlargest) methods, which return the smallest or largest `n` values. For a large `Series` this can be much faster than sorting the entire Series and calling `head(n)` on the result.

In [24]:
s = pd.Series(np.random.permutation(10))
s

0    3
1    5
2    2
3    9
4    0
5    4
6    8
7    7
8    1
9    6
dtype: int32

In [25]:
s.sort_values()

4    0
8    1
2    2
0    3
5    4
1    5
9    6
7    7
6    8
3    9
dtype: int32

In [26]:
s.nsmallest(3)

4    0
8    1
2    2
dtype: int32

In [27]:
s.nlargest(3)

3    9
6    8
7    7
dtype: int32
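
The equivalence with "sort, then `head(n)`" can be checked directly. This sketch uses a fixed seed and unique values (both assumptions) so that ties do not affect the comparison.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)         # fixed seed, assumed for reproducibility
s = pd.Series(rng.permutation(1000))   # unique values, so there are no ties

# Smallest three values, computed both ways.
via_sort = s.sort_values().head(3)
via_nsmallest = s.nsmallest(3)
same_smallest = via_sort.equals(via_nsmallest)

# Largest three values, computed both ways.
via_sort_desc = s.sort_values(ascending=False).head(3)
via_nlargest = s.nlargest(3)
same_largest = via_sort_desc.equals(via_nlargest)
```

Both routes return the same values with the same index labels; the difference is only in how much work is done to get there.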

New in version 0.17.0.

`DataFrame` also has the `nlargest()` and `nsmallest()` methods.

In [28]:
df = pd.DataFrame({'a': [-2, -1, 1, 10, 8, 11, -1],
                   'b': list('abdceff'),
                   'c': [1.0, 2.0, 4.0, 3.2, np.nan, 3.0, 4.0]})

In [30]:
df.nlargest(3, 'a')

| | a | b | c |
| --- | --- | --- | --- |
| 5 | 11 | f | 3.0 |
| 3 | 10 | c | 3.2 |
| 4 | 8 | e | NaN |


In [29]:
df.nlargest(5, ['a', 'c'])

| | a | b | c |
| --- | --- | --- | --- |
| 6 | -1 | f | 4.0 |
| 5 | 11 | f | 3.0 |
| 3 | 10 | c | 3.2 |
| 4 | 8 | e | NaN |
| 2 | 1 | d | 4.0 |


In [31]:
df.nsmallest(3, 'a')

| | a | b | c |
| --- | --- | --- | --- |
| 0 | -2 | a | 1.0 |
| 1 | -1 | b | 2.0 |
| 6 | -1 | f | 4.0 |


In [32]:
df.nsmallest(5, ['a', 'c'])

| | a | b | c |
| --- | --- | --- | --- |
| 0 | -2 | a | 1.0 |
| 2 | 1 | d | 4.0 |
| 4 | 8 | e | NaN |
| 1 | -1 | b | 2.0 |
| 6 | -1 | f | 4.0 |
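
When the sort column contains duplicates, the `keep` parameter decides which of the tied rows is returned. The text above does not discuss `keep`, so treat this as a sketch of the `'first'` and `'last'` options on an assumed subset of the frame:

```python
import pandas as pd

# Column 'a' holds -1 twice, at index labels 1 and 6.
df = pd.DataFrame({'a': [-2, -1, 1, 10, 8, 11, -1],
                   'b': list('abdceff')})

first = df.nsmallest(2, 'a')               # keep='first' is the default
last = df.nsmallest(2, 'a', keep='last')   # prefer the last occurrence of a tie

first_labels = list(first.index)
last_labels = list(last.index)
```

Both results contain the values `-2` and `-1`; they differ only in which of the two `-1` rows is selected.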


**Sorting by a multi-index column**

When the columns are a multi-index, you must be explicit about which column to sort on and pass `by` a key that fully specifies all levels.

In [33]:
df1.columns = pd.MultiIndex.from_tuples([('a','one'),('a','two'),('b','three')])

In [34]:
df1.sort_values(by=('a','two'))

| | a | a | b |
| --- | --- | --- | --- |
| | one | two | three |
| 0 | 2 | 1 | 5 |
| 2 | 1 | 2 | 3 |
| 1 | 1 | 3 | 4 |
| 3 | 1 | 4 | 2 |
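
Sorting by more than one multi-index column follows the same rule: each key in the list is a complete tuple. A minimal sketch that rebuilds `df1` from the earlier example so it runs on its own:

```python
import pandas as pd

# Rebuild df1 with an explicit column order, then attach the multi-index.
df1 = pd.DataFrame({'one': [2, 1, 1, 1], 'two': [1, 3, 2, 4], 'three': [5, 4, 3, 2]},
                   columns=['one', 'two', 'three'])
df1.columns = pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'three')])

# Each sort key names every level of the column it refers to.
ordered = df1.sort_values(by=[('a', 'one'), ('a', 'two')])
row_order = list(ordered.index)
```

This gives the same row order as the earlier `by=['one', 'two']` example on the flat columns.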
