Reindexing and altering labels
====

[`reindex()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.reindex.html#pandas.Series.reindex) is the fundamental data alignment method in pandas. It is used to implement nearly all other features relying on label-alignment functionality. To *reindex* means to conform the data to match a given set of labels along a particular axis. This accomplishes several things:

> - Reorders the existing data to match a new set of labels
> - Inserts missing value (NA) markers in label locations where no data for that label existed
> - If specified, **fill** data for missing labels using logic (highly relevant to working with time series data)

Here is a simple example:

In [1]:
import numpy as np
import pandas as pd

In [2]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

s

a   -0.318634
b    0.127459
c    1.425267
d    2.209594
e    0.979798
dtype: float64

In [3]:
s.reindex(['e', 'b', 'f', 'd'])

e    0.979798
b    0.127459
f         NaN
d    2.209594
dtype: float64

Here, the `f` label was not contained in the Series and hence appears as `NaN` in the result.

With a DataFrame, you can simultaneously reindex the index and columns:

In [4]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                    'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                    'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
    
df

|   | one | two | three |
| --- | --- | --- | --- |
| a | 1.296161 | 0.425753 | NaN |
| b | 0.8157 | 0.893428 | 0.021893 |
| c | 0.876341 | 0.674101 | -0.748436 |
| d | NaN | 0.441937 | -0.268829 |


In [5]:
df.reindex(index=['c', 'f', 'b'], columns=['three', 'two', 'one'])

|   | three | two | one |
| --- | --- | --- | --- |
| c | -0.748436 | 0.674101 | 0.876341 |
| f | NaN | NaN | NaN |
| b | 0.021893 | 0.893428 | 0.8157 |


For convenience, you may utilize the [`reindex_axis()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.reindex_axis.html#pandas.Series.reindex_axis) method, which takes the labels and a keyword `axis` parameter.
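
As a hedged aside for readers on newer releases: `reindex_axis()` was deprecated in pandas 0.21, where `reindex()` itself gained an `axis` keyword. The sketch below assumes pandas 0.21 or later and uses a small made-up frame (not the `df` from this page) to show that the `axis` form and the `columns=` form give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame, used only for illustration.
frame = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                     index=['a', 'b', 'c', 'd'],
                     columns=['one', 'two', 'three'])

# Reindex only the columns, naming the axis with a keyword.
by_axis = frame.reindex(['three', 'two', 'one'], axis='columns')

# The same operation using the explicit `columns=` keyword.
by_keyword = frame.reindex(columns=['three', 'two', 'one'])

print(by_axis.equals(by_keyword))
```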

Note that the `Index` objects containing the actual axis labels can be **shared** between objects. So if we have a Series and a DataFrame, the following can be done:

In [6]:
rs = s.reindex(df.index)

rs

a   -0.318634
b    0.127459
c    1.425267
d    2.209594
dtype: float64

In [7]:
rs.index is df.index

True

This means that the reindexed Series’s index is the same Python object as the DataFrame’s index.

**Note**

When writing performance-sensitive code, there is a good reason to spend some time becoming a reindexing ninja: **many operations are faster on pre-aligned data**. Adding two unaligned DataFrames internally triggers a reindexing step. For exploratory analysis you will hardly notice the difference (because `reindex` has been heavily optimized), but when CPU cycles matter, sprinkling a few explicit `reindex` calls here and there can have an impact.
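
A minimal sketch of what "pre-aligning" can look like, using two made-up Series (`left` and `right` are illustrative names, not objects from this page). It only checks that explicit alignment gives the same answer as the implicit one. Any speed-up depends on how often the aligned objects are reused, and is not measured here.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Implicit alignment: pandas aligns on the union of labels inside the addition.
implicit = left + right

# Explicit alignment, done once up front so later operations share one index.
labels = left.index.union(right.index)
left_aligned = left.reindex(labels)
right_aligned = right.reindex(labels)
explicit = left_aligned + right_aligned

print(implicit.equals(explicit))
```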

**Reindexing to align with another object**

You may wish to take an object and reindex its axes to be labeled the same as another object. While the syntax for this is straightforward, albeit verbose, it is a common enough operation that the [`reindex_like()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.reindex_like.html#pandas.DataFrame.reindex_like) method is available to make this simpler:

In [9]:
df2 = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   })

df2

|   | one | two |
| --- | --- | --- |
| a | -0.547411 | 0.631706 |
| b | -0.882442 | -0.730313 |
| c | -0.513099 | 0.451223 |


In [8]:
df3 = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   })

df3

|   | one | two |
| --- | --- | --- |
| a | -1.143812 | -3.659409 |
| b | -0.944 | 0.460221 |
| c | -1.527602 | -0.776251 |


In [10]:
df.reindex_like(df2)

|   | one | two |
| --- | --- | --- |
| a | 1.296161 | 0.425753 |
| b | 0.8157 | 0.893428 |
| c | 0.876341 | 0.674101 |
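
To make the "straightforward albeit verbose" alternative concrete, here is a small sketch comparing it with `reindex_like()`. The frames `source` and `template` are made up for the illustration. The verbose form passes the other object's index and columns to `reindex()` by hand.

```python
import pandas as pd

# Hypothetical frames: `source` has an extra row and an extra column.
source = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                       'two': [5.0, 6.0, 7.0, 8.0],
                       'three': [9.0, 10.0, 11.0, 12.0]},
                      index=['a', 'b', 'c', 'd'])
template = pd.DataFrame({'one': [0.0, 0.0, 0.0],
                         'two': [0.0, 0.0, 0.0]},
                        index=['a', 'b', 'c'])

# Verbose spelling: pass the other object's labels explicitly.
verbose = source.reindex(index=template.index, columns=template.columns)

# Concise spelling.
concise = source.reindex_like(template)

print(verbose.equals(concise))
```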


**Aligning objects with each other with `align`**

The [`align()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.align.html#pandas.Series.align) method is the fastest way to simultaneously align two objects. It supports a `join` argument (related to [joining and merging](http://pandas.pydata.org/pandas-docs/version/0.20.3/merging.html#merging)):

> - `join='outer'`: take the union of the indexes (default)
> - `join='left'`: use the calling object’s index
> - `join='right'`: use the passed object’s index
> - `join='inner'`: intersect the indexes

It returns a tuple with both of the reindexed Series:

In [11]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a    1.194754
b   -1.660928
c   -0.873620
d   -0.939589
e    1.095333
dtype: float64

In [12]:
s1 = s[:4]
s1

a    1.194754
b   -1.660928
c   -0.873620
d   -0.939589
dtype: float64

In [13]:
s2 = s[1:]
s2

b   -1.660928
c   -0.873620
d   -0.939589
e    1.095333
dtype: float64

In [14]:
s1.align(s2)

(a    1.194754
 b   -1.660928
 c   -0.873620
 d   -0.939589
 e         NaN
 dtype: float64, a         NaN
 b   -1.660928
 c   -0.873620
 d   -0.939589
 e    1.095333
 dtype: float64)

In [15]:
s1.align(s2, join='inner')

(b   -1.660928
 c   -0.873620
 d   -0.939589
 dtype: float64, b   -1.660928
 c   -0.873620
 d   -0.939589
 dtype: float64)

In [17]:
s1.align(s2, join='left')

(a    1.194754
 b   -1.660928
 c   -0.873620
 d   -0.939589
 dtype: float64, a         NaN
 b   -1.660928
 c   -0.873620
 d   -0.939589
 dtype: float64)
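
Because `align()` returns a tuple, the two results are usually unpacked straight into names. The sketch below uses fixed values rather than the random Series above, and loops over the four `join` options to show which labels each one keeps.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=['a', 'b', 'c', 'd', 'e'])
s1 = s.iloc[:4]   # labels a..d
s2 = s.iloc[1:]   # labels b..e

labels_by_join = {}
for how in ['outer', 'inner', 'left', 'right']:
    left, right = s1.align(s2, join=how)
    # After alignment both results carry the same index.
    assert left.index.equals(right.index)
    labels_by_join[how] = list(left.index)
    print(how, labels_by_join[how])
```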

For DataFrames, the join method will be applied to both the index and the columns by default:

In [16]:
df.align(df2, join='inner')

(        one       two
 a  1.296161  0.425753
 b  0.815700  0.893428
 c  0.876341  0.674101,         one       two
 a -0.547411  0.631706
 b -0.882442 -0.730313
 c -0.513099  0.451223)

You can also pass an `axis` option to only align on the specified axis:

In [18]:
df.align(df2, join='inner', axis=0)

(        one       two     three
 a  1.296161  0.425753       NaN
 b  0.815700  0.893428  0.021893
 c  0.876341  0.674101 -0.748436,         one       two
 a -0.547411  0.631706
 b -0.882442 -0.730313
 c -0.513099  0.451223)

If you pass a Series to [`DataFrame.align()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.align.html#pandas.DataFrame.align), you can choose to align both objects either on the DataFrame’s index or columns using the `axis` argument:

In [19]:
df.align(df2.iloc[0], axis=1)

(        one     three       two
 a  1.296161       NaN  0.425753
 b  0.815700  0.021893  0.893428
 c  0.876341 -0.748436  0.674101
 d       NaN -0.268829  0.441937, one     -0.547411
 three         NaN
 two      0.631706
 Name: a, dtype: float64)
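
The example above aligns on the columns (`axis=1`). For completeness, here is a sketch of the other choice, `axis=0`, which aligns the Series against the DataFrame's index instead. The objects `frame` and `extra` are made up for the illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                     index=['a', 'b', 'c'],
                     columns=['one', 'two'])
extra = pd.Series([10.0, 20.0], index=['b', 'd'], name='extra')

# Align the Series with the DataFrame's row labels (default join is 'outer').
aligned_frame, aligned_series = frame.align(extra, axis=0)

print(list(aligned_frame.index))    # union of row labels
print(list(aligned_frame.columns))  # columns are left untouched
print(aligned_series)
```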

**Filling while reindexing**

[`reindex()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.reindex.html#pandas.Series.reindex) takes an optional parameter `method` which is a filling method chosen from the following table:

| Method           | Action                            |
| ---------------- | --------------------------------- |
| pad / ffill      | Fill values forward               |
| bfill / backfill | Fill values backward              |
| nearest          | Fill from the nearest index value |

We illustrate these fill methods on a simple Series:

In [21]:
rng = pd.date_range('1/3/2000', periods=8)

In [22]:
ts = pd.Series(np.random.randn(8), index=rng)

In [23]:
ts2 = ts.iloc[[0, 3, 6]]  # select by position explicitly

In [24]:
ts

2000-01-03   -0.337940
2000-01-04    0.293493
2000-01-05    1.613010
2000-01-06    1.347765
2000-01-07   -0.498570
2000-01-08   -0.673621
2000-01-09    0.392016
2000-01-10    0.720507
Freq: D, dtype: float64

In [25]:
ts2

2000-01-03   -0.337940
2000-01-06    1.347765
2000-01-09    0.392016
dtype: float64

In [26]:
ts2.reindex(ts.index)

2000-01-03   -0.337940
2000-01-04         NaN
2000-01-05         NaN
2000-01-06    1.347765
2000-01-07         NaN
2000-01-08         NaN
2000-01-09    0.392016
2000-01-10         NaN
Freq: D, dtype: float64

In [27]:
ts2.reindex(ts.index, method='ffill')

2000-01-03   -0.337940
2000-01-04   -0.337940
2000-01-05   -0.337940
2000-01-06    1.347765
2000-01-07    1.347765
2000-01-08    1.347765
2000-01-09    0.392016
2000-01-10    0.392016
Freq: D, dtype: float64

In [28]:
ts2.reindex(ts.index, method='bfill')

2000-01-03   -0.337940
2000-01-04    1.347765
2000-01-05    1.347765
2000-01-06    1.347765
2000-01-07    0.392016
2000-01-08    0.392016
2000-01-09    0.392016
2000-01-10         NaN
Freq: D, dtype: float64

In [29]:
ts2.reindex(ts.index, method='nearest')

2000-01-03   -0.337940
2000-01-04   -0.337940
2000-01-05    1.347765
2000-01-06    1.347765
2000-01-07    1.347765
2000-01-08    0.392016
2000-01-09    0.392016
2000-01-10    0.392016
Freq: D, dtype: float64

These methods require that the indexes are **ordered** increasing or decreasing.

Note that the same result could have been achieved using [fillna](http://pandas.pydata.org/pandas-docs/version/0.20.3/missing_data.html#missing-data-fillna) (except for `method='nearest'`) or [interpolate](http://pandas.pydata.org/pandas-docs/version/0.20.3/missing_data.html#missing-data-interpolate):

In [30]:
ts2.reindex(ts.index).fillna(method='ffill')

2000-01-03   -0.337940
2000-01-04   -0.337940
2000-01-05   -0.337940
2000-01-06    1.347765
2000-01-07    1.347765
2000-01-08    1.347765
2000-01-09    0.392016
2000-01-10    0.392016
Freq: D, dtype: float64

[`reindex()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.reindex.html#pandas.Series.reindex) will raise a ValueError if the index is not monotonically increasing or decreasing. [`fillna()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.fillna.html#pandas.Series.fillna) and [`interpolate()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.interpolate.html#pandas.Series.interpolate) will not perform any checks on the order of the index.
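
A small sketch of the ordering requirement, using a made-up Series (`unordered`) whose integer index is neither increasing nor decreasing. Reindexing it with a fill method is expected to raise `ValueError`, and sorting the index first avoids the error.

```python
import pandas as pd

# Index 3, 1, 2 is neither increasing nor decreasing.
unordered = pd.Series([1.0, 2.0, 3.0], index=[3, 1, 2])

try:
    unordered.reindex([1, 2, 3, 4], method='ffill')
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# Sorting the index first makes forward filling well defined.
filled = unordered.sort_index().reindex([1, 2, 3, 4], method='ffill')
print(filled)
```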

**Limits on filling while reindexing**

The `limit` and `tolerance` arguments provide additional control over filling while reindexing. Limit specifies the maximum count of consecutive matches:

In [31]:
ts2.reindex(ts.index, method='ffill', limit=1)

2000-01-03   -0.337940
2000-01-04   -0.337940
2000-01-05         NaN
2000-01-06    1.347765
2000-01-07    1.347765
2000-01-08         NaN
2000-01-09    0.392016
2000-01-10    0.392016
Freq: D, dtype: float64

In contrast, tolerance specifies the maximum distance between the index and indexer values:

In [32]:
ts2.reindex(ts.index, method='ffill', tolerance='1 day')

2000-01-03   -0.337940
2000-01-04   -0.337940
2000-01-05         NaN
2000-01-06    1.347765
2000-01-07    1.347765
2000-01-08         NaN
2000-01-09    0.392016
2000-01-10    0.392016
Freq: D, dtype: float64

Notice that when used on a `DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`, `tolerance` will be coerced into a `Timedelta` if possible. This allows you to specify tolerance with appropriate strings.
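
To illustrate the coercion, the sketch below passes the tolerance once as a string and once as an explicit `pd.Timedelta`, then checks that both give the same result. It uses fixed values in place of the random data above.

```python
import pandas as pd

rng = pd.date_range('2000-01-03', periods=8)
ts = pd.Series(range(8), index=rng, dtype='float64')
ts2 = ts.iloc[[0, 3, 6]]

# Tolerance given as a string, which pandas converts to a Timedelta.
from_string = ts2.reindex(ts.index, method='ffill', tolerance='1 day')

# The same tolerance given as an explicit Timedelta object.
from_timedelta = ts2.reindex(ts.index, method='ffill',
                             tolerance=pd.Timedelta(days=1))

print(from_string.equals(from_timedelta))
print(from_string)
```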

**Dropping labels from an axis**

A method closely related to `reindex` is the [`drop()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.drop.html#pandas.DataFrame.drop) function. It removes a set of labels from an axis:

In [33]:
df

|   | one | two | three |
| --- | --- | --- | --- |
| a | 1.296161 | 0.425753 | NaN |
| b | 0.8157 | 0.893428 | 0.021893 |
| c | 0.876341 | 0.674101 | -0.748436 |
| d | NaN | 0.441937 | -0.268829 |


In [34]:
df.drop(['a', 'd'], axis=0)

|   | one | two | three |
| --- | --- | --- | --- |
| b | 0.8157 | 0.893428 | 0.021893 |
| c | 0.876341 | 0.674101 | -0.748436 |


In [36]:
df.drop(['one'], axis=1)

|   | two | three |
| --- | --- | --- |
| a | 0.425753 | NaN |
| b | 0.893428 | 0.021893 |
| c | 0.674101 | -0.748436 |
| d | 0.441937 | -0.268829 |


Note that the following also works, but is a bit less obvious / clean:

In [37]:
df.reindex(df.index.difference(['a', 'd']))

|   | one | two | three |
| --- | --- | --- | --- |
| b | 0.8157 | 0.893428 | 0.021893 |
| c | 0.876341 | 0.674101 | -0.748436 |
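
The two spellings can be compared directly. The sketch below uses a made-up frame and also shows one practical difference: `drop()` raises `KeyError` for a label that is not present, while `Index.difference()` silently ignores it. Note too that `difference()` sorts its result where possible, so on an unsorted index the row order of the two approaches may differ.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                     index=['a', 'b', 'c', 'd'],
                     columns=['one', 'two', 'three'])

dropped = frame.drop(['a', 'd'], axis=0)
reindexed = frame.reindex(frame.index.difference(['a', 'd']))
print(dropped.equals(reindexed))

# A label that does not exist: drop() complains, difference() does not.
try:
    frame.drop(['z'], axis=0)
    drop_raised = False
except KeyError:
    drop_raised = True

kept = frame.reindex(frame.index.difference(['z']))
print(drop_raised, list(kept.index))
```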


**Renaming / mapping labels**

The [`rename()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.rename.html#pandas.DataFrame.rename) method allows you to relabel an axis based on some mapping (a dict or Series) or an arbitrary function.

In [38]:
s

a    1.194754
b   -1.660928
c   -0.873620
d   -0.939589
e    1.095333
dtype: float64

In [39]:
s.rename(str.upper)

A    1.194754
B   -1.660928
C   -0.873620
D   -0.939589
E    1.095333
dtype: float64

If you pass a function, it must return a value when called with any of the labels (and must produce a set of unique values). A dict or Series can also be used:

In [40]:
df.rename(columns={'one' : 'foo', 'two' : 'bar'},
          index={'a' : 'apple', 'b' : 'banana', 'd' : 'durian'})

|   | foo | bar | three |
| --- | --- | --- | --- |
| apple | 1.296161 | 0.425753 | NaN |
| banana | 0.8157 | 0.893428 | 0.021893 |
| c | 0.876341 | 0.674101 | -0.748436 |
| durian | NaN | 0.441937 | -0.268829 |


If the mapping doesn’t include a column/index label, it isn’t renamed. Also, extra labels in the mapping don’t throw an error.
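
A short sketch of both behaviours, using a made-up frame: the mapping leaves `'two'` untouched because it is not mentioned, and the extra key `'missing'` (a label that does not exist in the frame) is ignored without an error.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2], 'two': [3, 4]}, index=['a', 'b'])

# 'two' is not in the mapping, and 'missing' is not in the frame.
renamed = frame.rename(columns={'one': 'foo', 'missing': 'ignored'})

print(list(renamed.columns))
```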

The [`rename()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.rename.html#pandas.DataFrame.rename) method also provides an `inplace` named parameter that is by default `False` and copies the underlying data. Pass `inplace=True` to rename the data in place.
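
The difference between the two modes can be sketched as follows, using a made-up frame. With the default `inplace=False`, the original object is left unchanged and a new one is returned. With `inplace=True`, the original is modified and the call returns `None`.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2], 'two': [3, 4]}, index=['a', 'b'])

# Default: a renamed copy is returned, `frame` keeps its labels.
copied = frame.rename(columns={'one': 'foo'})
columns_after_copy = list(frame.columns)

# In place: `frame` itself is relabeled and nothing is returned.
result = frame.rename(columns={'one': 'foo'}, inplace=True)

print(columns_after_copy, list(copied.columns))
print(result, list(frame.columns))
```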

New in version 0.18.0.

Finally, [`rename()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.Series.rename.html#pandas.Series.rename) also accepts a scalar or list-like for altering the `Series.name` attribute.

In [41]:
s.rename("scalar-name")

a    1.194754
b   -1.660928
c   -0.873620
d   -0.939589
e    1.095333
Name: scalar-name, dtype: float64