## Reindexing

In [1]:
import pandas as pd
import numpy as np

In [2]:
candy_bars = pd.DataFrame({
    'name': ['Snickers','Dairy Milk','KitKat','Oh Henry!','Ferrero Rocher'],
    'company':['Mars','Cadbury','Nestle','Nestle','Ferrero SpA' ],
    'cover' : ['brown','purple', 'red','yellow','gold']
})

In [3]:
candy_bars

Unnamed: 0,name,company,cover
0,Snickers,Mars,brown
1,Dairy Milk,Cadbury,purple
2,KitKat,Nestle,red
3,Oh Henry!,Nestle,yellow
4,Ferrero Rocher,Ferrero SpA,gold


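`reindex` conforms a DataFrame to a new set of row and column labels. Labels that already exist keep their data; labels that do not exist are added and filled with NaN.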

In [4]:
candy_reindexed = candy_bars.reindex(index = [0,1,3], 
                                     columns = ['name', 'company'])
candy_reindexed

Unnamed: 0,name,company
0,Snickers,Mars
1,Dairy Milk,Cadbury
3,Oh Henry!,Nestle


In [5]:
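# Row labels 5 and 6 and the column 'rating' do not exist in candy_bars,
# so they are created and filled with NaN.
candy_reindexed = candy_bars.reindex(index = [0,1,3,5,6],
                                     columns = ['name','company','rating'])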
candy_reindexed

Unnamed: 0,name,company,rating
0,Snickers,Mars,
1,Dairy Milk,Cadbury,
3,Oh Henry!,Nestle,
5,,,
6,,,
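
If NaN is not a useful placeholder, `reindex` also accepts a `fill_value` argument. The following is a minimal sketch under the assumption that a string marker is acceptable for every new cell; the marker `'unknown'` and the name `candy_filled` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

candy_bars = pd.DataFrame({
    'name': ['Snickers', 'Dairy Milk', 'KitKat', 'Oh Henry!', 'Ferrero Rocher'],
    'company': ['Mars', 'Cadbury', 'Nestle', 'Nestle', 'Ferrero SpA'],
    'cover': ['brown', 'purple', 'red', 'yellow', 'gold'],
})

# Row labels 5 and 6 and the column 'rating' are new, so reindex creates them.
# fill_value replaces the default NaN in every newly created cell.
# 'unknown' is an arbitrary placeholder chosen for this example.
candy_filled = candy_bars.reindex(index=[0, 1, 3, 5, 6],
                                  columns=['name', 'company', 'rating'],
                                  fill_value='unknown')

print(candy_filled)
```

Existing cells are left untouched; only cells introduced by the new labels receive the placeholder.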


In [6]:
df = pd.DataFrame({
    'name':['name1','name2','name3','name4'],
    'company':['A','B','C','D']
})

df

Unnamed: 0,name,company
0,name1,A
1,name2,B
2,name3,C
3,name4,D


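Reindexing can also be used to align two DataFrames, which is especially useful when working with large datasets from different sources.

The important thing to note here is that the column names should be the same in both DataFrames; any column of the target that is missing from the source comes back filled with NaN.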

In [7]:
candy_bars_reindex_like = candy_bars.reindex_like(df)

candy_bars_reindex_like

Unnamed: 0,name,company
0,Snickers,Mars
1,Dairy Milk,Cadbury
2,KitKat,Nestle
3,Oh Henry!,Nestle


Because `df` has only 4 rows and 2 columns, `candy_bars` has been cut down to the same row and column labels: row 4 and the `cover` column are dropped.
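
To see why matching column names matter, here is a small sketch in which the target frame has a column that `candy_bars` lacks. The frame `template` and its `rating` column are hypothetical and exist only for this illustration.

```python
import pandas as pd

candy_bars = pd.DataFrame({
    'name': ['Snickers', 'Dairy Milk', 'KitKat', 'Oh Henry!', 'Ferrero Rocher'],
    'company': ['Mars', 'Cadbury', 'Nestle', 'Nestle', 'Ferrero SpA'],
    'cover': ['brown', 'purple', 'red', 'yellow', 'gold'],
})

# Hypothetical target: two rows, and a 'rating' column that candy_bars lacks.
template = pd.DataFrame({'name': ['n1', 'n2'], 'rating': [0.0, 0.0]})

# Only the row and column labels of template are used, not its values.
aligned = candy_bars.reindex_like(template)

print(aligned)
```

The `name` column keeps the data from `candy_bars`, while `rating` contains only NaN because there is nothing in the source to copy.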

In [8]:
df = pd.DataFrame({'name':['name1','name2','name3','name4', 
                           'name5','name6','name7','name8'],
                   'company':['A','B','C','D','E','F','G','H']
                  },
                  index=[-2,-1,0,1,2,3,4,5])

df

Unnamed: 0,name,company
-2,name1,A
-1,name2,B
0,name3,C
1,name4,D
2,name5,E
3,name6,F
4,name7,G
5,name8,H


In [9]:
candy_bars_reindexed = candy_bars.reindex_like(df)

candy_bars_reindexed

Unnamed: 0,name,company
-2,,
-1,,
0,Snickers,Mars
1,Dairy Milk,Cadbury
2,KitKat,Nestle
3,Oh Henry!,Nestle
4,Ferrero Rocher,Ferrero SpA
5,,


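The extra rows, whose labels do not exist in `candy_bars`, are filled with NaN.

To fill them with the last valid value instead, use forward fill (`ffill`). You can also set a `limit` on the number of consecutive rows that get filled.

Backward fill (`bfill`) fills missing rows with the next valid value, so the leading empty rows receive the first row of data.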

In [10]:
candy_bars_reindexed.ffill()

Unnamed: 0,name,company
-2,,
-1,,
0,Snickers,Mars
1,Dairy Milk,Cadbury
2,KitKat,Nestle
3,Oh Henry!,Nestle
4,Ferrero Rocher,Ferrero SpA
5,Ferrero Rocher,Ferrero SpA


In [11]:
candy_bars_reindexed.bfill()

Unnamed: 0,name,company
-2,Snickers,Mars
-1,Snickers,Mars
0,Snickers,Mars
1,Dairy Milk,Cadbury
2,KitKat,Nestle
3,Oh Henry!,Nestle
4,Ferrero Rocher,Ferrero SpA
5,,


In [12]:
candy_bars_reindexed.bfill(limit = 1)

Unnamed: 0,name,company
-2,,
-1,Snickers,Mars
0,Snickers,Mars
1,Dairy Milk,Cadbury
2,KitKat,Nestle
3,Oh Henry!,Nestle
4,Ferrero Rocher,Ferrero SpA
5,,
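
The same `limit` argument works for forward fill, and the fill can also be requested during reindexing through the `method` argument of `reindex`. The sketch below assumes a slightly longer index than the one used above (an extra label `6`) so that the effect of the limit is visible; the names `new_index`, `limited`, and `one_step` are illustrative.

```python
import pandas as pd

candy_bars = pd.DataFrame({
    'name': ['Snickers', 'Dairy Milk', 'KitKat', 'Oh Henry!', 'Ferrero Rocher'],
    'company': ['Mars', 'Cadbury', 'Nestle', 'Nestle', 'Ferrero SpA'],
})

# Labels -2, -1, 5 and 6 do not exist in candy_bars.
new_index = [-2, -1, 0, 1, 2, 3, 4, 5, 6]
reindexed = candy_bars.reindex(new_index)

# Forward fill at most one consecutive missing row:
# row 5 is filled from row 4, row 6 stays NaN.
# Rows -2 and -1 have no earlier value to copy, so they also stay NaN.
limited = reindexed.ffill(limit=1)

# Reindex and forward fill in one step. This requires the original
# index to be monotonically increasing or decreasing.
one_step = candy_bars.reindex(new_index, method='ffill')

print(limited)
print(one_step)
```

Note that `method='ffill'` in `reindex` only fills rows introduced by the new labels; NaN values that were already present in the original data are left alone, whereas calling `.ffill()` afterwards fills both.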
