# Reindexing
Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.

Reindexing can accomplish several operations, such as:

Reorder the existing data to match a new set of labels.

Insert missing value (NA) markers in label locations where no data for the label existed.

# Reindexing the Rows
You can reorder, drop, or add rows by passing the desired row labels to the reindex() method. Any label in the new index that is not present in the DataFrame gets a row of NaN values by default.

In [40]:
# import numpy and pandas module 
import pandas as pd 
import numpy as np 
  
column=['a','b','c','d','e'] 
index=['A','B','C','D','E'] 
  
# create a dataframe of random values of array 
df1 = pd.DataFrame(np.random.rand(5,5),  
            columns=column, index=index) 
  
print(df1) 
  
print('\n\nDataframe after reindexing rows: \n', df1.reindex(['B', 'D', 'A', 'C', 'E']))

          a         b         c         d         e
A  0.443144  0.863117  0.967090  0.572074  0.373850
B  0.082602  0.309696  0.336453  0.501487  0.736750
C  0.195776  0.630415  0.680766  0.283206  0.032514
D  0.034231  0.982406  0.444418  0.614612  0.644616
E  0.090357  0.325015  0.066323  0.938246  0.621118


Dataframe after reindexing rows: 
           a         b         c         d         e
B  0.082602  0.309696  0.336453  0.501487  0.736750
D  0.034231  0.982406  0.444418  0.614612  0.644616
A  0.443144  0.863117  0.967090  0.572074  0.373850
C  0.195776  0.630415  0.680766  0.283206  0.032514
E  0.090357  0.325015  0.066323  0.938246  0.621118


In [10]:
# import numpy and pandas module 
import pandas as pd 
import numpy as np 
  
column = ['a', 'b', 'c', 'd', 'e'] 
index = ['A', 'B', 'C', 'D', 'E'] 
   
# create a dataframe of random values of array  
df1 = pd.DataFrame(np.random.rand(5, 5),  
        columns = column, index = index) 
  
# create the new index for rows 
new_index =['U', 'A', 'B', 'C', 'Z'] 
  
print(df1.reindex(new_index))

          a         b         c         d         e
U       NaN       NaN       NaN       NaN       NaN
A  0.307343  0.136659  0.086458  0.888116  0.365017
B  0.021879  0.463088  0.032796  0.906684  0.771560
C  0.054575  0.149609  0.508707  0.524589  0.590389
Z       NaN       NaN       NaN       NaN       NaN
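
Because the examples above use random values, the effect of reindexing can be hard to follow. The sketch below uses a small fixed frame (hypothetical data, chosen only for readability) to show that reindex() returns a new object, leaves the original untouched, and upcasts integer columns to float when NaN rows are introduced.

```python
import pandas as pd

# Small deterministic frame (hypothetical data, not from the cells above)
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["A", "B", "C"])

# 'Z' is a new label, 'B' is left out, and the order changes
result = df.reindex(["C", "A", "Z"])

print(result)
print(df.index.tolist())          # the original frame keeps its labels
print(result.isna().any(axis=1))  # marks rows that hold NaN after reindexing
print(result.dtypes)              # integer columns become float to hold NaN
```

To keep the reindexed data, assign the result to a variable, since the call does not modify df in place.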


# Reindexing the Columns Using the axis Keyword
You can reorder, drop, or add columns by passing the desired column labels to the reindex() method and specifying axis='columns'. Any label in the new column index that is not present in the DataFrame gets a column of NaN values by default.

In [13]:
# import numpy and pandas module 
import pandas as pd 
import numpy as np 
  
column=['a','b','c','d','e'] 
index=['A','B','C','D','E'] 
  
#create a dataframe of random values of array 
df1 = pd.DataFrame(np.random.rand(5,5),  
           columns=column, index=index) 
  
# create the new order for columns 
new_columns = ['e', 'a', 'b', 'c', 'd'] 
   
print(df1.reindex(new_columns, axis='columns'))

          e         a         b         c         d
A  0.541003  0.569069  0.287900  0.166292  0.228272
B  0.932306  0.859487  0.956609  0.342564  0.941535
C  0.438604  0.392831  0.215062  0.853722  0.858698
D  0.264160  0.746961  0.383011  0.475865  0.610636
E  0.703779  0.983521  0.496582  0.895621  0.458165


In [14]:
# import numpy and pandas module 
import pandas as pd 
import numpy as np 
  
column =['a', 'b', 'c', 'd', 'e'] 
index =['A', 'B', 'C', 'D', 'E'] 
   
# create a dataframe of random values of array 
df1 = pd.DataFrame(np.random.rand(5, 5),  
        columns = column, index = index) 
  
# create the new index for columns 
new_columns = ['a', 'b', 'c', 'g', 'h'] 
  
print(df1.reindex(new_columns, axis='columns')) 

          a         b         c   g   h
A  0.846496  0.996774  0.438960 NaN NaN
B  0.634472  0.139707  0.854421 NaN NaN
C  0.497452  0.843615  0.137774 NaN NaN
D  0.611333  0.608320  0.166228 NaN NaN
E  0.506571  0.529835  0.525538 NaN NaN
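
As an alternative to the axis keyword, reindex() also accepts index= and columns= keywords. The following is a minimal sketch on a fixed frame (hypothetical data) showing that both spellings give the same result, and that rows and columns can be reindexed in one call.

```python
import pandas as pd

# Fixed values so the result is reproducible (hypothetical data)
df = pd.DataFrame(
    {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]},
    index=["A", "B"],
)

# Two equivalent ways to reindex the columns
by_axis = df.reindex(["c", "a", "g"], axis="columns")
by_keyword = df.reindex(columns=["c", "a", "g"])
print(by_axis.equals(by_keyword))

# Rows and columns conformed in a single call
both = df.reindex(index=["B", "A", "Z"], columns=["c", "a", "g"])
print(both)
```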


# Replacing the missing values
Labels introduced by reindexing can be given a value other than NaN by passing that value to the fill_value keyword. It applies only to the newly added labels; NaN values already present in the data are left unchanged.

In [15]:
# import numpy and pandas module 
import pandas as pd 
import numpy as np 
  
column =['a', 'b', 'c', 'd', 'e'] 
index =['A', 'B', 'C', 'D', 'E'] 
   
# create a dataframe of random values of array 
df1 = pd.DataFrame(np.random.rand(5, 5),  
        columns = column, index = index) 
  
# create the new index for columns 
new_columns = ['a', 'b', 'c', 'g', 'h'] 
  
print(df1.reindex(new_columns, axis='columns', fill_value=1.5))

          a         b         c    g    h
A  0.239672  0.076196  0.131670  1.5  1.5
B  0.007688  0.955590  0.411585  1.5  1.5
C  0.699835  0.334325  0.333599  1.5  1.5
D  0.111823  0.710649  0.038048  1.5  1.5
E  0.293232  0.093021  0.559703  1.5  1.5


The fill value can also be a string.

In [17]:
# import numpy and pandas module 
import pandas as pd 
import numpy as np 
  
column =['a', 'b', 'c', 'd', 'e'] 
index =['A', 'B', 'C', 'D', 'E'] 
   
# create a dataframe of random values of array 
df1 = pd.DataFrame(np.random.rand(5, 5),  
       columns = column, index = index) 
  
# create the new index for columns 
new_columns = ['a', 'b', 'c', 'g', 'h'] 
  
print(df1.reindex(new_columns, axis='columns', fill_value='data missing'))

          a         b         c             g             h
A  0.907035  0.760745  0.576351  data missing  data missing
B  0.655364  0.751028  0.550398  data missing  data missing
C  0.574628  0.752276  0.455496  data missing  data missing
D  0.405090  0.149018  0.101732  data missing  data missing
E  0.055434  0.411849  0.259453  data missing  data missing
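
One point that is easy to miss is that fill_value only affects labels created by the reindex. The sketch below, on a small frame with one pre-existing NaN (hypothetical data), illustrates the difference and uses fillna() for the case where every NaN should be replaced.

```python
import numpy as np
import pandas as pd

# Column 'a' already contains a NaN before reindexing (hypothetical data)
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]}, index=["A", "B"])

# 'g' is a new column, so it receives the fill value
filled = df.reindex(columns=["a", "b", "g"], fill_value=0.0)
print(filled)  # the NaN at row 'B', column 'a' is still there

# fillna() replaces every remaining NaN, including pre-existing ones
print(filled.fillna(0.0))
```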


# Reindex to Align with Other Objects
You may wish to take an object and reindex its axes to be labeled the same as another object. The reindex_like() method does this, as the following example shows.

In [21]:
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
print(df1)

df1 = df1.reindex_like(df2)
print(df1)

# Note: reindex_like() returns a new DataFrame; df1 changes here only because the result is assigned back to it.
# The column names should match, or else any column label that df1 lacks is filled entirely with NaN.

       col1      col2      col3
0 -1.580296  0.656225  0.652748
1  1.051219  0.208885 -0.885359
2 -2.017438  0.482372 -0.656926
3 -0.033910  0.775881 -0.906268
4  1.055259 -0.503661  1.015371
5 -0.805985 -0.183944  1.272977
6  1.320010 -0.520094 -1.866639
7  0.060139  1.046127 -0.208677
8  1.211727  1.099565 -0.922496
9  0.886015  1.774377 -0.390498
       col1      col2      col3
0 -1.580296  0.656225  0.652748
1  1.051219  0.208885 -0.885359
2 -2.017438  0.482372 -0.656926
3 -0.033910  0.775881 -0.906268
4  1.055259 -0.503661  1.015371
5 -0.805985 -0.183944  1.272977
6  1.320010 -0.520094 -1.866639
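
The note about mismatched column names can be checked directly. This is a minimal sketch, assuming a second frame named other with a hypothetical column colX that df1 does not have; it also shows that reindex_like() behaves like reindex() called with the other object's index and columns.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones (hypothetical data)
df1 = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=["col1", "col2", "col3"])
# 'other' has fewer rows and a column label ('colX') that df1 lacks
other = pd.DataFrame(np.zeros((2, 3)), columns=["col1", "col2", "colX"])

aligned = df1.reindex_like(other)
explicit = df1.reindex(index=other.index, columns=other.columns)

print(aligned)                   # 'colX' is all NaN, 'col3' is dropped
print(aligned.equals(explicit))  # both calls conform df1 the same way
```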


# Filling while Reindexing
reindex() takes an optional parameter method, which selects how values for the new labels are filled. It accepts the following values:

pad/ffill: fill values forward from the previous existing label

bfill/backfill: fill values backward from the next existing label

nearest: fill from the nearest existing index label

In [28]:
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Without a fill method, the new rows are NaN
print(df2.reindex_like(df1))

# Now fill the new rows with the preceding values
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1,method='ffill'))

# Note: the last four rows are padded.

       col1      col2      col3
0 -1.137404  0.388312 -0.080256
1  0.794907 -1.093496  0.363636
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill:
       col1      col2      col3
0 -1.137404  0.388312 -0.080256
1  0.794907 -1.093496  0.363636
2  0.794907 -1.093496  0.363636
3  0.794907 -1.093496  0.363636
4  0.794907 -1.093496  0.363636
5  0.794907 -1.093496  0.363636


In [25]:
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Without a fill method, the new rows are NaN
print(df2.reindex_like(df1))

# Now fill the new rows from the nearest existing index label
print("Data Frame with Nearest Fill:")
print(df2.reindex_like(df1,method='nearest'))

# Note: nearest gives the same result as ffill here, because label 1 is the closest existing label for rows 2 to 5.

       col1      col2      col3
0 -0.554222 -0.911926  1.081381
1  0.131455  1.617852  1.448053
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Nearest Fill:
       col1      col2      col3
0 -0.554222 -0.911926  1.081381
1  0.131455  1.617852  1.448053
2  0.131455  1.617852  1.448053
3  0.131455  1.617852  1.448053
4  0.131455  1.617852  1.448053
5  0.131455  1.617852  1.448053
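
In the examples above, ffill and nearest happen to produce identical output. The sketch below uses a fixed frame with a gap between its two labels (hypothetical data and names) so that the three methods give different results. It calls reindex() with an explicit target index rather than reindex_like(), and relies on the index being monotonic, which the method option requires.

```python
import pandas as pd

# Existing labels 0 and 5 leave a gap to fill (hypothetical data)
df = pd.DataFrame({"value": [10.0, 20.0]}, index=[0, 5])
new_index = [0, 1, 2, 3, 4, 5, 6]

# One column per fill method, for side-by-side comparison
comparison = pd.DataFrame({
    "ffill": df.reindex(new_index, method="ffill")["value"],
    "bfill": df.reindex(new_index, method="bfill")["value"],
    "nearest": df.reindex(new_index, method="nearest")["value"],
})
print(comparison)
```

Labels 1 to 4 take the value from label 0 under ffill and from label 5 under bfill, while nearest switches between the two depending on distance. Label 6 has no later label to fill from, so bfill leaves it as NaN.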


# Limits on Filling while Reindexing
The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive missing labels to fill. Consider the following example:

In [33]:
import pandas as pd
import numpy as np
 
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Without a fill method, the new rows are NaN
print(df2.reindex_like(df1))

# Now forward-fill, but at most one consecutive row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1,method='ffill',limit=1))

# Note: only row 2 is filled from the preceding row 1. The remaining rows are left as NaN.

       col1      col2      col3
0  0.238450  0.125482  0.048011
1  1.118609  1.182929  0.006071
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  0.238450  0.125482  0.048011
1  1.118609  1.182929  0.006071
2  1.118609  1.182929  0.006071
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
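
To see how the result changes with different limits, the sketch below repeats the forward fill on a fixed frame (hypothetical data) for limit values of 1, 2, and None. It uses reindex() with an explicit target index; the names target and results are illustrative only.

```python
import pandas as pd

# Two existing rows, reindexed to six labels (hypothetical data)
df = pd.DataFrame({"value": [1.0, 2.0]}, index=[0, 1])
target = list(range(6))

results = {}
for limit in (1, 2, None):
    # limit caps how many consecutive new labels are forward-filled
    filled = df.reindex(target, method="ffill", limit=limit)
    results[limit] = filled["value"]
    print(f"limit={limit}: {filled['value'].tolist()}")
```

With limit=None, which is the default, every new label after row 1 is filled.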


# Renaming
The rename() method allows you to relabel an axis based on some mapping (a dict or Series) or an arbitrary function. Labels that do not appear in the mapping are left unchanged.

Consider the following example:

In [39]:
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(df1)

print("After renaming the rows and columns:")
print(df1.rename(columns={'col1': 'c1', 'col2': 'c2'}, index={0: 'zero', 1: 'one', 2: 'two'}))

       col1      col2      col3
0  0.346793  0.542074  2.426515
1  1.268244  0.397823  0.398001
2 -1.819425  1.796556 -0.347490
3 -0.731021  1.839596  0.191207
4  0.599060  1.685261 -0.146123
5  1.908652 -0.210549 -0.447093
After renaming the rows and columns:
            c1        c2      col3
zero  0.346793  0.542074  2.426515
one   1.268244  0.397823  0.398001
two  -1.819425  1.796556 -0.347490
3    -0.731021  1.839596  0.191207
4     0.599060  1.685261 -0.146123
5     1.908652 -0.210549 -0.447093
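
The example above uses dict mappings. As a complement, here is a minimal sketch of the other two forms mentioned, a function and a Series, on a small fixed frame (hypothetical data and names). Unlike reindex(), rename() only changes labels and never introduces NaN.

```python
import pandas as pd

# Small fixed frame (hypothetical data)
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["x", "y"])

# A function is applied to every label on the chosen axis
upper = df.rename(columns=str.upper)
prefixed = df.rename(index=lambda label: f"row_{label}")

# A Series works as a mapping: its index holds old labels, its values new ones
mapping = pd.Series({"col1": "c1"})
partial = df.rename(columns=mapping)

print(upper)
print(prefixed)
print(partial)  # 'col2' is not in the mapping, so it keeps its name
```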
