# Reindexing 

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.

Reindexing can accomplish several operations, such as −

- Reorder the existing data to match a new set of labels.
- Insert missing value (NA) markers in label locations where no data for the label existed.
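Both operations can be seen on a small Series. This is a minimal sketch with made-up data; the names `s` and `reordered` are illustrative only.

```python
import pandas as pd

# A small Series with labels 'a', 'b', 'c' (illustrative data)
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Reorder the existing labels and request a label ('d') that has no data
reordered = s.reindex(['c', 'a', 'd'])

# 'c' and 'a' keep their values in the new order; 'd' becomes NaN.
# reindex() returns a new object, so s is left untouched.
print(reordered)
```

Because NaN is a floating-point value, the integer data is upcast to float in the result.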

In [1]:
import numpy as np
import pandas as pd

In [2]:
N=20

df = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()
})
# np.random.normal(loc, scale, size): column D uses mean 100 and standard deviation 10
df.head(10)  # show the first 10 of the N rows

,A,x,y,C,D
0,2016-01-01,0.0,0.123499,Low,89.984903
1,2016-01-02,1.0,0.588643,Medium,95.310866
2,2016-01-03,2.0,0.665878,Low,99.04054
3,2016-01-04,3.0,0.490203,Low,100.830349
4,2016-01-05,4.0,0.97264,Low,83.715477
5,2016-01-06,5.0,0.874173,Medium,96.613757
6,2016-01-07,6.0,0.052214,Low,101.413881
7,2016-01-08,7.0,0.118664,Medium,115.68169
8,2016-01-09,8.0,0.421724,Low,103.53396
9,2016-01-10,9.0,0.792591,Medium,104.606151


Reindexing the DataFrame: rows 0, 2 and 4 are kept, and column B, which does not exist in df, is added and filled with NaN.

In [3]:
df_reindexed = df.reindex(index=[0,2,4],columns=['A','B','C'])
df_reindexed

,A,B,C
0,2016-01-01,,Low
2,2016-01-03,,Low
4,2016-01-05,,Low


## Reindex to Align with Other Object - reindex_like() 
You may wish to take an object and reindex its axes so that they carry the same labels as another object. The following example reindexes df1 to match the labels of df2.

In [4]:
df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])

df1

,col1,col2,col3
0,-0.055848,0.53331,-0.732679
1,0.923831,-1.041791,0.585203
2,0.491017,-1.210538,-1.057872
3,-0.762001,-0.901715,1.28471
4,-0.699709,-0.73889,-0.895473
5,-0.063648,-0.445943,-0.145567
6,-1.4714,-0.865216,-1.219163
7,-0.17452,0.429434,2.050769
8,-1.130269,2.026493,-0.596864
9,3.039914,-1.355127,0.87256


In [5]:
df2

,col1,col2,col3
0,0.383865,-0.781043,0.95977
1,-1.72827,-2.169801,0.640694
2,1.838005,-1.805634,1.233053
3,2.353332,0.758111,0.145755
4,-1.206832,0.146936,-2.362933
5,1.307987,-1.668927,-1.316135
6,-0.65614,-0.499717,-1.691303


In [6]:
df1_ri = df1.reindex_like(df2)
df1_ri

# df1_ri has the same row and column labels as df2; df1 itself is unchanged

,col1,col2,col3
0,-0.055848,0.53331,-0.732679
1,0.923831,-1.041791,0.585203
2,0.491017,-1.210538,-1.057872
3,-0.762001,-0.901715,1.28471
4,-0.699709,-0.73889,-0.895473
5,-0.063648,-0.445943,-0.145567
6,-1.4714,-0.865216,-1.219163


**Note** − reindex_like() returns a new DataFrame labeled like df2; df1 itself is not modified. The column names of the two objects should match; any column label of df2 that is missing from df1 is added and filled entirely with NaN.
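The effect of mismatched column names can be sketched as follows. The frames `left` and `right` and the column name `other` are hypothetical, chosen only to illustrate the point.

```python
import numpy as np
import pandas as pd

# Source frame: 4 rows, columns col1..col3 (illustrative data)
left = pd.DataFrame(np.arange(12).reshape(4, 3),
                    columns=['col1', 'col2', 'col3'])

# Template frame: 2 rows, and a column name ('other') that left does not have
right = pd.DataFrame(np.zeros((2, 3)),
                     columns=['col1', 'col2', 'other'])

# Conform left to the row and column labels of right
aligned = left.reindex_like(right)

# col1 and col2 keep left's values for rows 0 and 1;
# 'other' has no counterpart in left, so the whole column is NaN.
print(aligned)

# left is not modified: it still has 4 rows and its original columns
print(left.shape)
```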

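## Filling while Reindexing
reindex() takes an optional parameter, method, that specifies how to fill values for labels that have no data. Filling requires the index to be monotonically increasing or decreasing. The accepted values are as follows −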

- pad/ffill − Fill values forward

- bfill/backfill − Fill values backward

- nearest − Fill from the nearest index values
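The three methods can be compared on a small Series with gaps between its labels. This is a sketch with made-up data; `s` and `new_index` are illustrative names.

```python
import pandas as pd

# Values exist only at labels 0, 5 and 10 (illustrative data)
s = pd.Series([1.0, 2.0, 3.0], index=[0, 5, 10])
new_index = [0, 2, 4, 6, 10, 12]

# ffill: each new label takes the value of the last label at or before it
forward = s.reindex(new_index, method='ffill')

# bfill: each new label takes the value of the next label at or after it;
# label 12 has nothing after it, so it stays NaN
backward = s.reindex(new_index, method='bfill')

# nearest: each new label takes the value of the closest existing label
nearest = s.reindex(new_index, method='nearest')

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest}))
```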

In [7]:
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

In [8]:
# Labels missing from df2 are padded with NaN
df2.reindex_like(df1)

,col1,col2,col3
0,-1.152842,-0.908572,0.750445
1,-0.045374,0.812833,0.291029
2,,,
3,,,
4,,,
5,,,


In [9]:
# Now fill the NaNs with the preceding values
df2.reindex_like(df1,method='ffill')

,col1,col2,col3
0,-1.152842,-0.908572,0.750445
1,-0.045374,0.812833,0.291029
2,-0.045374,0.812833,0.291029
3,-0.045374,0.812833,0.291029
4,-0.045374,0.812833,0.291029
5,-0.045374,0.812833,0.291029


In [10]:
# Backward fill
df2.reindex_like(df1,method='bfill')

# Rows 2-5 remain NaN because df2 has no later row to fill backward from

,col1,col2,col3
0,-1.152842,-0.908572,0.750445
1,-0.045374,0.812833,0.291029
2,,,
3,,,
4,,,
5,,,


In [11]:
df2.reindex_like(df1,method='nearest') 

,col1,col2,col3
0,-1.152842,-0.908572,0.750445
1,-0.045374,0.812833,0.291029
2,-0.045374,0.812833,0.291029
3,-0.045374,0.812833,0.291029
4,-0.045374,0.812833,0.291029
5,-0.045374,0.812833,0.291029


## Limits on Filling while Reindexing
The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive missing labels to fill; labels beyond that count are left as NaN. Consider the following example −

In [12]:
df2.reindex_like(df1)

,col1,col2,col3
0,-1.152842,-0.908572,0.750445
1,-0.045374,0.812833,0.291029
2,,,
3,,,
4,,,
5,,,


In [13]:
df2.reindex_like(df1,method='ffill',limit = 1)

,col1,col2,col3
0,-1.152842,-0.908572,0.750445
1,-0.045374,0.812833,0.291029
2,-0.045374,0.812833,0.291029
3,,,
4,,,
5,,,
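The effect of limit can be checked on deterministic data. This is a minimal sketch that calls reindex() with an explicit list of labels instead of reindex_like(); the names `small` and `target_index` are illustrative only.

```python
import pandas as pd

# Two rows of data, labeled 0 and 1 (illustrative data)
small = pd.DataFrame({'col1': [1.0, 2.0]})

# Ask for six row labels; labels 2..5 have no data
target_index = list(range(6))

# Forward fill, but fill at most one consecutive missing label
limited = small.reindex(target_index, method='ffill', limit=1)

# Label 2 is filled from label 1; labels 3, 4 and 5 stay NaN
print(limited)
```

Raising limit to 2 would also fill label 3, and omitting it fills every missing label after the last valid one.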
