# 8.3.1 Reindexing

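The `reindex` method in pandas conforms a Series or DataFrame to a new set of index labels, adding or removing labels as needed. It returns a new object and leaves the original unchanged.
If a newly added label has no corresponding value, its value defaults to NaN.
Reindexing to fewer labels keeps only those labels, which works like a slice.
Here is an example of `reindex` on a Series: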

```python
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

s1 = Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
print(s1)
```

```
A    1
B    2
C    3
D    4
dtype: int64
```
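A minimal sketch of the behaviour described at the start of this section, reusing the `s1` data. `reindex` returns a new object and leaves the original untouched. A new label gets NaN, and any label you omit is dropped. Because NaN is a float, the integer Series is upcast to `float64` once a missing label is introduced.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

# Keep A and C, drop B and D, and add the new label E
s_new = s1.reindex(['A', 'C', 'E'])

print(s_new)                # A 1.0, C 3.0, E NaN
print(s_new.dtype)          # float64: the NaN for 'E' forces a float dtype
print(s1.index.tolist())    # ['A', 'B', 'C', 'D']: s1 itself is unchanged
```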


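Reindex `s1` with a new index. Labels that are not in the original index would normally get NaN, but the `fill_value` argument supplies a different value for them: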

```python
s1.reindex(index=['A', 'B', 'C', 'D', 'E'], fill_value=10)
```

```
A     1
B     2
C     3
D     4
E    10
dtype: int64
```
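A short sketch of two properties of `fill_value`, assuming the same `s1` data plus a small hypothetical Series `s3`. Supplying an integer `fill_value` lets the result keep its `int64` dtype, as the output above shows. The fill applies only to labels that `reindex` adds, so a NaN already stored under an existing label is left alone.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
new_labels = ['A', 'B', 'C', 'D', 'E']

no_fill = s1.reindex(new_labels)                   # E -> NaN, dtype float64
with_fill = s1.reindex(new_labels, fill_value=10)  # E -> 10, dtype stays int64
print(no_fill.dtype, with_fill.dtype)

# fill_value only fills labels introduced by reindex;
# the NaN already stored under the existing label 'B' stays NaN
s3 = pd.Series([1.0, np.nan], index=['A', 'B'])
print(s3.reindex(['A', 'B', 'C'], fill_value=0))
```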

Extend the index of `s2` to the 15 labels 0 through 14. Labels that did not exist before have no value, so they default to NaN:

```python
s2 = Series(['A', 'B', 'C'], index=[1, 5, 10])
print(s2.reindex(index=range(15)))
```

```
0     NaN
1       A
2     NaN
3     NaN
4     NaN
5       B
6     NaN
7     NaN
8     NaN
9     NaN
10      C
11    NaN
12    NaN
13    NaN
14    NaN
dtype: object
```
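A sketch for inspecting the result, using the same `s2`. `isna` counts how many labels were newly added, and `dropna` recovers the labels that carry data. The last line shows a common pitfall: labels are matched by value *and* type, so string labels never match an integer index.

```python
import pandas as pd

s2 = pd.Series(['A', 'B', 'C'], index=[1, 5, 10])
r = s2.reindex(range(15))

print(r.isna().sum())             # 12 of the 15 labels had no value
print(r.dropna().index.tolist())  # [1, 5, 10]: only the original labels hold data

# The strings '1' and '5' do not match the integer labels 1 and 5,
# so every entry in this result is NaN
print(s2.reindex(['1', '5']))
```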


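`method='ffill'` means forward fill: each new label takes the value of the nearest preceding label in the original index.
Label 0 stays NaN because no original label comes before it. Fill methods require the original index to be monotonically increasing or decreasing.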

```python
print(s2.reindex(index=range(15), method='ffill'))
```

```
0     NaN
1       A
2       A
3       A
4       A
5       B
6       B
7       B
8       B
9       B
10      C
11      C
12      C
13      C
14      C
dtype: object
```
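For comparison, a sketch of the opposite direction, `method='bfill'` (pandas also accepts `'backfill'`), on the same `s2`. A second small hypothetical Series `t` shows that the fill source is the neighbouring label in the original index, not the last non-NaN value.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['A', 'B', 'C'], index=[1, 5, 10])

# Backward fill: each new label takes the value of the next original label.
# Labels after 10 have no following label, so they stay NaN.
b = s2.reindex(index=range(15), method='bfill')
print(b)

# Label 3 forward-fills from the original label 2, whose stored value is NaN,
# so it becomes NaN rather than 1.0
t = pd.Series([1.0, np.nan], index=[0, 2])
print(t.reindex(range(4), method='ffill'))
```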


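Passing only some of the existing labels removes the others, which works like selecting a subset of the Series:

```python
s1.reindex(['A', 'B'])
```

```
A    1
B    2
dtype: int64
```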
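A sketch comparing this subset selection with label-based `.loc`, using the same `s1`. When every requested label exists, both give the same result. When a label is missing, `reindex` inserts NaN, while `.loc` with a list raises a `KeyError` (in current pandas versions).

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

# With labels that all exist, reindex and .loc select the same data
print(s1.reindex(['A', 'B']).equals(s1.loc[['A', 'B']]))   # True

# With a label that does not exist, reindex inserts NaN
print(s1.reindex(['A', 'E']))

# whereas .loc raises a KeyError for the missing label
try:
    s1.loc[['A', 'E']]
except KeyError as err:
    print('KeyError:', err)
```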

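Reindexing a DataFrame works on both rows and columns. Start with a 5-by-5 DataFrame whose row labels skip `C`: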

```python
# Values come from np.random.rand, so your numbers will differ
df1 = DataFrame(np.random.rand(25).reshape([5, 5]),
                index=['A', 'B', 'D', 'E', 'F'],
                columns=['c1', 'c2', 'c3', 'c4', 'c5'])
print(df1)
```

```
         c1        c2        c3        c4        c5
A  0.523932  0.927819  0.393898  0.008008  0.795789
B  0.096543  0.598717  0.026209  0.810584  0.094924
D  0.214517  0.119733  0.829098  0.702556  0.968878
E  0.969731  0.473640  0.081297  0.673761  0.211721
F  0.631516  0.533387  0.543797  0.935570  0.251229
```


Add the missing row label `C` to the DataFrame.
Because `C` has no data, its row is filled with NaN:

```python
print(df1.reindex(index=['A', 'B', 'C', 'D', 'E', 'F']))
```

```
         c1        c2        c3        c4        c5
A  0.523932  0.927819  0.393898  0.008008  0.795789
B  0.096543  0.598717  0.026209  0.810584  0.094924
C       NaN       NaN       NaN       NaN       NaN
D  0.214517  0.119733  0.829098  0.702556  0.968878
E  0.969731  0.473640  0.081297  0.673761  0.211721
F  0.631516  0.533387  0.543797  0.935570  0.251229
```
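The `fill_value` and `method` arguments work on DataFrame rows as they do on a Series. A sketch using a small hypothetical DataFrame `df` with fixed values in place of the random `df1`. Its row labels A, B, D are sorted, so forward fill is allowed.

```python
import pandas as pd

# Hypothetical stand-in for df1 with deterministic values
df = pd.DataFrame({'c1': [1, 2, 4], 'c2': [10, 20, 40]}, index=['A', 'B', 'D'])

new_rows = ['A', 'B', 'C', 'D']
print(df.reindex(index=new_rows, fill_value=0))    # row C filled with 0
print(df.reindex(index=new_rows, method='ffill'))  # row C copies row B
```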


Extend the columns by adding a new column `c6`, which is filled with NaN:

```python
df1.reindex(columns=['c1', 'c2', 'c3', 'c4', 'c5', 'c6'])
```

```
         c1        c2        c3        c4        c5   c6
A  0.523932  0.927819  0.393898  0.008008  0.795789  NaN
B  0.096543  0.598717  0.026209  0.810584  0.094924  NaN
D  0.214517  0.119733  0.829098  0.702556  0.968878  NaN
E  0.969731  0.473640  0.081297  0.673761  0.211721  NaN
F  0.631516  0.533387  0.543797  0.935570  0.251229  NaN
```
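`reindex(columns=...)` can also reorder existing columns, not just add new ones. A sketch with a small hypothetical DataFrame `df`. It also shows the equivalent call that passes the labels positionally and names the axis with `axis='columns'`.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4]}, index=['A', 'B'])

# columns= reorders c1/c2 and adds c3, which is filled with NaN
print(df.reindex(columns=['c2', 'c1', 'c3']))

# Equivalent form: positional labels plus an explicit axis
print(df.reindex(['c2', 'c1', 'c3'], axis='columns'))
```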


Reduce the row index to a subset of labels; rows that are not listed are dropped:

```python
df1.reindex(index=['A', 'B'])
```

```
         c1        c2        c3        c4        c5
A  0.523932  0.927819  0.393898  0.008008  0.795789
B  0.096543  0.598717  0.026209  0.810584  0.094924
```
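Rows and columns can be reindexed in a single call by passing both `index` and `columns`. A sketch with a small hypothetical DataFrame `df`. When every requested label exists, the result matches label-based selection with `.loc`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['A', 'B', 'D'], columns=['c1', 'c2', 'c3'])

# Keep rows A, B and columns c1, c2 in one call
sub = df.reindex(index=['A', 'B'], columns=['c1', 'c2'])
print(sub)

# With existing labels only, this matches .loc selection
print(sub.equals(df.loc[['A', 'B'], ['c1', 'c2']]))   # True
```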
