# Reindexing 

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.

Reindexing can accomplish several operations, such as:

- Reorder the existing data to match a new set of labels.
- Insert missing value (NA) markers in label locations where no data for the label existed.
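
Both operations can be seen on a small Series. This is a minimal sketch; the labels and values are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy data: three values labelled 'a', 'b', 'c'.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Reorder the existing labels and request a label ('z') that has no data.
# 'c' and 'a' keep their values; 'z' gets a missing-value marker (NaN).
reordered = s.reindex(['c', 'a', 'z'])

print(reordered)
```

Note that reindex() returns a new object, so s itself keeps its original labels.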

In [1]:
import numpy as np
import pandas as pd

In [2]:
N=20

df = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()
})
# Reference usage of np.random.normal:
# mu, sigma = 0, 0.1  # mean and standard deviation
# s = np.random.normal(mu, sigma, 1000)
df

Unnamed: 0,A,x,y,C,D
0,2016-01-01,0.0,0.084295,High,98.975067
1,2016-01-02,1.0,0.633611,Medium,104.284452
2,2016-01-03,2.0,0.458543,Low,105.012884
3,2016-01-04,3.0,0.807476,Medium,95.80397
4,2016-01-05,4.0,0.1118,High,104.356997
5,2016-01-06,5.0,0.639625,Low,104.656145
6,2016-01-07,6.0,0.421727,High,99.349359
7,2016-01-08,7.0,0.727934,Low,95.614671
8,2016-01-09,8.0,0.601443,High,105.825422
9,2016-01-10,9.0,0.205072,Medium,93.59311


## Reindexing the DataFrame - reindex()

In [3]:
df_reindexed = df.reindex(index=[0,2,4],columns=['A','B','D'])
df_reindexed

Unnamed: 0,A,B,D
0,2016-01-01,,98.975067
2,2016-01-03,,105.012884
4,2016-01-05,,104.356997


In [4]:
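# Selecting the same rows with .loc returns the same data for column 'A'
df.loc[[0,2,4]]['A']
# However, df.loc[[0,2,4]]['A','B'] raises a KeyError:
# ['A','B'] is read as the single tuple key ('A','B'), and column 'B' does not exist anyway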

0   2016-01-01
2   2016-01-03
4   2016-01-05
Name: A, dtype: datetime64[ns]
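
The difference between .loc and reindex() for missing labels can be sketched as follows. The small frame here is a hypothetical stand-in for df, and the KeyError behaviour is what recent pandas versions do when a list passed to .loc contains a label that does not exist.

```python
import pandas as pd

# Hypothetical stand-in for df with only the columns needed here.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'D': [1.5, 2.5, 3.5, 4.5, 5.5]})

# Select rows and columns in a single .loc call: rows first, then columns.
subset = df.loc[[0, 2, 4], ['A', 'D']]

# .loc refuses a column label that does not exist ...
try:
    df.loc[[0, 2, 4], ['A', 'B', 'D']]
    loc_failed = False
except KeyError:
    loc_failed = True

# ... whereas reindex() creates the missing column and fills it with NaN.
with_b = df.reindex(index=[0, 2, 4], columns=['A', 'B', 'D'])

print(loc_failed)
print(with_b)
```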

## Reindex to Align with Other Object - reindex_like() 
You may wish to take an object and reindex its axes so that they carry the same labels as another object. Consider the following example.

In [5]:
df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])

df1

Unnamed: 0,col1,col2,col3
0,0.579212,-1.64155,0.99051
1,-0.031901,-0.575555,-0.764441
2,0.731548,-0.0221,-0.798959
3,1.23135,-0.455113,-0.076959
4,0.316983,-0.158051,-1.265
5,1.636887,0.464422,-0.816437
6,0.806206,0.607134,-0.86832
7,-1.416288,-1.4069,0.69538
8,-0.227899,0.195989,1.343528
9,-1.911784,-0.168505,0.159811


In [6]:
df2

Unnamed: 0,col1,col2,col3
0,-1.094861,-1.039001,-0.193096
1,-1.264464,-1.450976,1.358388
2,-1.491934,0.175083,1.007504
3,0.659265,-0.463624,-0.727667
4,-0.295408,0.80029,-1.047608
5,1.390072,1.365739,0.085062
6,-0.797986,-1.078639,-1.234535


In [7]:
df1_ri = df1.reindex_like(df2)
df1_ri

# The result has the same row and column labels as df2; rows 7-9 of df1, which df2 lacks, are discarded

Unnamed: 0,col1,col2,col3
0,0.579212,-1.64155,0.99051
1,-0.031901,-0.575555,-0.764441
2,0.731548,-0.0221,-0.798959
3,1.23135,-0.455113,-0.076959
4,0.316983,-0.158051,-1.265
5,1.636887,0.464422,-0.816437
6,0.806206,0.607134,-0.86832


In [8]:
df1.tail(1) # Original df is not modified

Unnamed: 0,col1,col2,col3
9,-1.911784,-0.168505,0.159811


**Note**: reindex_like() returns a new DataFrame labeled like df2; df1 itself is not altered. The column names should match, otherwise the entire column for an unmatched label is filled with NaN.
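
The effect of a mismatched column name can be sketched with two small frames. The names left, right and other are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Hypothetical frames: they share 'col1', but only 'right' has 'other'.
left = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
right = pd.DataFrame({'col1': [0, 0], 'other': [0, 0]})

# The result takes its row and column labels from 'right'.
# 'col1' keeps the values from 'left'; 'other' has no source data, so it is all NaN.
aligned = left.reindex_like(right)

print(aligned)
print(left.shape)  # 'left' keeps its original rows and columns
```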

## Filling while Reindexing - using 'method' in reindex_like()
reindex() and reindex_like() take an optional parameter method, which selects how missing labels are filled. It accepts the following values:

- pad/ffill: Fill values forward
- bfill/backfill: Fill values backward
- nearest: Fill from the nearest index values
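
The three methods are easier to tell apart on an index with gaps. The sketch below uses a hypothetical Series on a sorted integer index (the fill methods require a monotonic index) and calls reindex() directly.

```python
import pandas as pd

# Hypothetical data on a sorted, non-contiguous integer index.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 10, 20])
target = [0, 4, 6, 10, 16, 20, 25]

# ffill: each new label takes the value of the last existing label before it.
padded = s.reindex(target, method='ffill')

# bfill: each new label takes the value of the next existing label after it;
# label 25 has nothing after it, so it stays NaN.
backfilled = s.reindex(target, method='bfill')

# nearest: each new label takes the value of the closest existing label.
nearest = s.reindex(target, method='nearest')

print(pd.DataFrame({'ffill': padded, 'bfill': backfilled, 'nearest': nearest}))
```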

In [9]:
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

In [10]:
# Without a fill method, labels missing from df2 are filled with NaN
df2.reindex_like(df1)

Unnamed: 0,col1,col2,col3
0,1.755579,-0.306185,-1.673146
1,-1.055469,0.185424,-1.219721
2,,,
3,,,
4,,,
5,,,


In [11]:
# Now fill the NaNs with the preceding values
df2.reindex_like(df1,method='ffill')

Unnamed: 0,col1,col2,col3
0,1.755579,-0.306185,-1.673146
1,-1.055469,0.185424,-1.219721
2,-1.055469,0.185424,-1.219721
3,-1.055469,0.185424,-1.219721
4,-1.055469,0.185424,-1.219721
5,-1.055469,0.185424,-1.219721


In [12]:
# Back fill
df2.reindex_like(df1,method='bfill')
# Rows 2-5 stay NaN because df2 has no later row to fill from

Unnamed: 0,col1,col2,col3
0,1.755579,-0.306185,-1.673146
1,-1.055469,0.185424,-1.219721
2,,,
3,,,
4,,,
5,,,


In [13]:
df2.reindex_like(df1,method='nearest') 

Unnamed: 0,col1,col2,col3
0,1.755579,-0.306185,-1.673146
1,-1.055469,0.185424,-1.219721
2,-1.055469,0.185424,-1.219721
3,-1.055469,0.185424,-1.219721
4,-1.055469,0.185424,-1.219721
5,-1.055469,0.185424,-1.219721


## Limits on Filling while Reindexing
The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive missing labels that may be filled. Consider the following example.

In [14]:
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

In [15]:
df2.reindex_like(df1)

Unnamed: 0,col1,col2,col3
0,-0.649026,-1.999373,2.257708
1,1.333717,0.3513,0.537432
2,,,
3,,,
4,,,
5,,,


In [20]:
df2.reindex_like(df1,method='ffill',limit=1)
# Expected with limit=1: only row 2 is forward-filled and rows 3-5 stay NaN.
# The output below fills every row, so the limit was not applied in this run.
# This may depend on the pandas version; re-run on a current release to check.

Unnamed: 0,col1,col2,col3
0,-0.649026,-1.999373,2.257708
1,1.333717,0.3513,0.537432
2,1.333717,0.3513,0.537432
3,1.333717,0.3513,0.537432
4,1.333717,0.3513,0.537432
5,1.333717,0.3513,0.537432
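
For comparison, here is a minimal sketch of what limit is meant to do. It uses a hypothetical two-row frame with an explicit integer index and calls reindex() directly. Both choices are assumptions made for this illustration, not a diagnosis of the output above.

```python
import pandas as pd

# Hypothetical two-row frame with an explicit integer index.
small = pd.DataFrame({'col1': [1.0, 2.0], 'col2': [3.0, 4.0]}, index=[0, 1])
target = [0, 1, 2, 3, 4, 5]

# Forward-fill at most one consecutive missing label:
# row 2 is filled from row 1, rows 3-5 remain NaN.
filled = small.reindex(target, method='ffill', limit=1)

print(filled)
```

Raising limit to 2 would also fill row 3, and so on.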


## Renaming
The rename() method allows you to relabel an axis based on some mapping (a dict or Series) or an arbitrary function.

Let us consider the following example to understand this:

In [31]:
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df1

Unnamed: 0,col1,col2,col3
0,0.627133,0.362693,-0.592934
1,-0.079474,0.827624,1.113567
2,1.073613,-0.007781,0.43866
3,-0.110881,-0.167553,1.473521
4,0.831487,1.091505,0.474626
5,-1.284327,-0.91265,-1.114718


In [33]:
df1.rename(index={0: 'car', 1: 'Bus'},
           columns={'col1': 'Milage', 'col2': 'Speed'})

Unnamed: 0,Milage,Speed,col3
car,0.627133,0.362693,-0.592934
Bus,-0.079474,0.827624,1.113567
2,1.073613,-0.007781,0.43866
3,-0.110881,-0.167553,1.473521
4,0.831487,1.091505,0.474626
5,-1.284327,-0.91265,-1.114718


#### Note 
The rename() method provides an inplace named parameter, which is False by default, so rename() returns a renamed copy and leaves the original object unchanged. Pass inplace=True to rename the data in place.

In [35]:
df1.head(2)

Unnamed: 0,col1,col2,col3
0,0.627133,0.362693,-0.592934
1,-0.079474,0.827624,1.113567
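
The function form of the mapping and the inplace parameter can be sketched together. The frame below is a hypothetical example, separate from df1.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
frame = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function is applied to every label on the chosen axis.
upper = frame.rename(columns=str.upper)
columns_after_copy = list(frame.columns)  # the original labels are unchanged

# With inplace=True the frame itself is modified and the call returns None.
result = frame.rename(index={0: 'first'}, inplace=True)

print(upper.columns.tolist())
print(frame.index.tolist())
```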

