In [1]:
import numpy as np
import pandas as pd

### reindex()

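reindex() conforms the data to a new set of labels: labels that already exist keep their values, labels that are missing from the original get NaN, and labels you leave out are dropped. In that sense it works a lot like **filtering**.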

In [2]:
s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c' ,'d', 'e'])
s2 = pd.Series([6, 7, 8, 9, 0], index=['a', 'b', 'c' ,'d', 'e'])

In [3]:
s1.reindex(['a' ,'b', 'q'])

a    1.0
b    2.0
q    NaN
dtype: float64

In [4]:
s1.reindex(s2.index)

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [5]:
s1.reindex(['a', 'a', 'b'])

a    1
a    1
b    2
dtype: int64
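
The `q` example above turned the integers into floats because NaN needs a float dtype. As a minimal sketch (the variable names `with_nan` and `filled` are only illustrative), the `fill_value` argument of reindex() can supply a default for missing labels instead:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Without fill_value the missing label 'q' becomes NaN, which forces float64.
with_nan = s1.reindex(['a', 'b', 'q'])

# With fill_value the missing label gets 0 and the integer dtype is kept.
filled = s1.reindex(['a', 'b', 'q'], fill_value=0)

print(with_nan.dtype, filled.dtype)
print(filled)
```

This is handy when a sensible default exists; when it does not, NaN is usually the more honest marker for missing data.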

<br> You can do this with DataFrames too:

In [6]:
df = pd.DataFrame(
    {
        "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
        "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
        "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
    }
)

df

Unnamed: 0,one,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684


In [7]:
df.reindex(index=['a', 'b', 'c', 'z'])

Unnamed: 0,one,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
z,,,


In [8]:
df.reindex(index=['a', 'c', 'z'], columns=['one', 'four'])

Unnamed: 0,one,four
a,0.729924,
c,-0.077453,
z,,


Unlike a Series, a DataFrame also accepts a **columns** option, which reindexes (and therefore filters) the columns too!
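
The same reindexing can also be written in axis style, where the labels are passed positionally together with an `axis` argument. The sketch below uses a small hand-made frame (not the random `df` from above) so the result is reproducible; `by_keyword` and `by_axis` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

# Keyword style: name each axis explicitly in a single call.
by_keyword = df.reindex(index=["a", "c", "z"], columns=["one", "four"])

# Axis style: pass the labels and say which axis they belong to.
by_axis = (
    df.reindex(["a", "c", "z"], axis="index")
      .reindex(["one", "four"], axis="columns")
)

print(by_keyword)
```

Both spellings should give the same frame; the keyword style is shorter when both axes change at once.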

Note that the Index objects containing the actual axis labels can be shared between objects. So if we have a Series and a DataFrame, the following can be done:



In [9]:
tmp = df.reindex(index=s1.index)
tmp

Unnamed: 0,one,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684
e,,,


In [13]:
# Is tmp's index the very same object as s1's? Here it is not, so changing s1's index afterwards does not affect tmp.
s1.index is tmp.index

False

In [11]:
s1.index = ['z', *s1.index[1:]]
s1

z    1
b    2
c    3
d    4
e    5
dtype: int64

In [14]:
tmp

Unnamed: 0,one,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684
e,,,


When writing performance-sensitive code, there is a good reason to spend some time becoming a reindexing ninja:<br>
many operations are faster on **pre-aligned data**. Adding two unaligned DataFrames internally triggers a reindexing step. For exploratory analysis you will hardly notice the difference (because reindex has been heavily optimized), but when CPU cycles matter, sprinkling a few explicit reindex calls here and there can have an impact.
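
A minimal sketch of the idea, assuming two small frames with partly overlapping labels (`left`, `right`, and the other names are made up for illustration): align once up front, then reuse the aligned objects for several operations. Whether this is measurably faster depends on the data size and the pandas version, so treat it as a pattern to benchmark rather than a guaranteed win.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
left = pd.DataFrame(rng.standard_normal((4, 2)), index=list("abcd"), columns=["x", "y"])
right = pd.DataFrame(rng.standard_normal((4, 2)), index=list("cdef"), columns=["y", "z"])

# Align once; both results now share the union of row and column labels.
left_al, right_al = left.align(right, join="outer")

# Later arithmetic works on identical labels, so nothing has to be realigned.
total = left_al + right_al
product = left_al * right_al

print(total)
```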

### reindex_like()

You may wish to take an object and reindex its axes to be labeled the same as another object. While the syntax for this is straightforward albeit verbose, it is a common enough operation that the reindex_like() method is available to make this simpler:

In [15]:
df

Unnamed: 0,one,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684


In [19]:
df2 = df.reindex(['c', 'b', 'a', 'f'], columns=['one'])
df2

Unnamed: 0,one
c,-0.077453
b,0.163467
a,0.729924
f,


In [20]:
df.reindex_like(df2)

Unnamed: 0,one
c,-0.077453
b,0.163467
a,0.729924
f,
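
To make the "straightforward albeit verbose" alternative concrete, here is a small sketch comparing reindex_like() with the equivalent reindex() call. The frames are hand-made stand-ins (`template` plays the role of `df2`), so the values differ from the outputs above:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)
template = pd.DataFrame({"one": [0.0, 0.0]}, index=["c", "f"])

# Short form: copy both the row and the column labels from template.
short = df.reindex_like(template)

# Verbose form: spell out each axis by hand.
verbose = df.reindex(index=template.index, columns=template.columns)

print(short)
```

Only the labels of `template` matter; its values are never used.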


### Aligning objects with each other with align()


The align() method is the fastest way to simultaneously align two objects. It supports a join argument (related to joining and merging which will be covered later):

- join='outer': take the union of the indexes (default)
- join='left': use the calling object’s index
- join='right': use the passed object’s index
- join='inner': intersect the indexes

In [24]:
s = pd.Series(np.random.randn(5), ['a', 'b', 'c', 'd', 'e'])
s

a    1.084174
b   -1.945504
c    1.731469
d   -0.344387
e   -1.063203
dtype: float64

In [25]:
s1 = s[1:]
s1

b   -1.945504
c    1.731469
d   -0.344387
e   -1.063203
dtype: float64

In [26]:
s2 = s[:-1]
s2

a    1.084174
b   -1.945504
c    1.731469
d   -0.344387
dtype: float64

In [29]:
s1.align(s2, join='inner')

(b   -1.945504
 c    1.731469
 d   -0.344387
 dtype: float64,
 b   -1.945504
 c    1.731469
 d   -0.344387
 dtype: float64)

In [30]:
s1.align(s2, join='outer')

(a         NaN
 b   -1.945504
 c    1.731469
 d   -0.344387
 e   -1.063203
 dtype: float64,
 a    1.084174
 b   -1.945504
 c    1.731469
 d   -0.344387
 e         NaN
 dtype: float64)

In [31]:
s1.align(s2, join='left')

(b   -1.945504
 c    1.731469
 d   -0.344387
 e   -1.063203
 dtype: float64,
 b   -1.945504
 c    1.731469
 d   -0.344387
 e         NaN
 dtype: float64)

In [33]:
s1.align(s2, join='right')

(a         NaN
 b   -1.945504
 c    1.731469
 d   -0.344387
 dtype: float64,
 a    1.084174
 b   -1.945504
 c    1.731469
 d   -0.344387
 dtype: float64)
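
The four join modes can be summarised in one loop. This is a sketch with fixed values instead of random ones (the names `labels`, `left_filled`, and `right_filled` are illustrative); it also shows the `fill_value` argument of align(), which replaces the NaN placeholders:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("abcde"))
s1, s2 = s.iloc[1:], s.iloc[:-1]

labels = {}
for how in ("outer", "inner", "left", "right"):
    left, right = s1.align(s2, join=how)
    # Whatever the join mode, both results end up with the same index.
    assert left.index.equals(right.index)
    labels[how] = list(left.index)
    print(how, labels[how])

# fill_value substitutes a default for labels missing on either side.
left_filled, right_filled = s1.align(s2, join="outer", fill_value=0.0)
```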

<br>
Again, you can use align with DataFrames:

In [34]:
df

Unnamed: 0,one,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684


In [35]:
df2

Unnamed: 0,one
c,-0.077453
b,0.163467
a,0.729924
f,


In [41]:
df2.align(df, join='inner')

(        one
 c -0.077453
 b  0.163467
 a  0.729924,
         one
 c -0.077453
 b  0.163467
 a  0.729924)

By default, align on DataFrames aligns both the rows and the columns. You can pass **axis** to align along that axis only:

In [40]:
df2.align(df, join='inner', axis=0)

(        one
 c -0.077453
 b  0.163467
 a  0.729924,
         one       two     three
 c -0.077453 -0.787528  0.682092
 b  0.163467 -1.100254 -1.462609
 a  0.729924  1.037791       NaN)
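
For comparison, here is a sketch of both axis values on two small hand-made frames (`wide` and `narrow` are illustrative names, not objects from this notebook):

```python
import pandas as pd

wide = pd.DataFrame({"one": [1.0, 2.0], "two": [3.0, 4.0]}, index=["a", "b"])
narrow = pd.DataFrame({"one": [5.0, 6.0]}, index=["b", "c"])

# axis=0 aligns the rows only; each frame keeps its own columns.
w_rows, n_rows = wide.align(narrow, join="outer", axis=0)

# axis=1 aligns the columns only; each frame keeps its own rows.
w_cols, n_cols = wide.align(narrow, join="outer", axis=1)

print(n_rows)
print(n_cols)
```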

### Filling while reindexing


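reindex() takes an optional parameter method, which is a filling method chosen from the following options:

- pad / ffill: fill values forward
- bfill / backfill: fill values backward
- nearest: fill from the nearest index value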



In [43]:
rng = pd.date_range("1/3/2000", periods=8)
rng

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
               '2000-01-07', '2000-01-08', '2000-01-09', '2000-01-10'],
              dtype='datetime64[ns]', freq='D')

In [45]:
ts = pd.Series(np.random.randn(8), index=rng)
ts

2000-01-03   -0.649459
2000-01-04    1.590622
2000-01-05    0.415590
2000-01-06    0.765835
2000-01-07   -0.817032
2000-01-08   -1.561473
2000-01-09   -2.139507
2000-01-10   -0.072055
Freq: D, dtype: float64

In [47]:
ts2 = ts.iloc[[0, 3, 6]]  # select by position; ts[[0, 3, 6]] is deprecated on a non-integer index
ts2

2000-01-03   -0.649459
2000-01-06    0.765835
2000-01-09   -2.139507
Freq: 3D, dtype: float64

In [50]:
ts2.reindex(ts.index)

2000-01-03   -0.649459
2000-01-04         NaN
2000-01-05         NaN
2000-01-06    0.765835
2000-01-07         NaN
2000-01-08         NaN
2000-01-09   -2.139507
2000-01-10         NaN
Freq: D, dtype: float64

As we can see, the reindexed Series contains some NaN values. Here is how we can fill them:

In [51]:
ts2.reindex(ts.index, method='ffill')

2000-01-03   -0.649459
2000-01-04   -0.649459
2000-01-05   -0.649459
2000-01-06    0.765835
2000-01-07    0.765835
2000-01-08    0.765835
2000-01-09   -2.139507
2000-01-10   -2.139507
Freq: D, dtype: float64
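
The other two fill methods behave analogously. The sketch below uses fixed values (1.0, 2.0, 3.0) rather than the random `ts2`, so the effect is easy to follow; `sparse`, `backward`, and `nearest` are illustrative names:

```python
import pandas as pd

idx = pd.date_range("2000-01-03", periods=8)
# Values exist only on 2000-01-03, 2000-01-06 and 2000-01-09.
sparse = pd.Series([1.0, 2.0, 3.0], index=idx[[0, 3, 6]])

# bfill pulls the next known value backwards; the last day has nothing after it.
backward = sparse.reindex(idx, method="bfill")

# nearest takes the value from whichever known date is closest.
nearest = sparse.reindex(idx, method="nearest")

print(pd.DataFrame({"bfill": backward, "nearest": nearest}))
```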

**Important** <br>

These methods require that the indexes are **ordered**, either increasing or decreasing.
Note that the same result could have been achieved using ffill (except for method='nearest') or interpolate:


In [56]:
ts2.reindex(ts.index).ffill()  # same result as above; fillna(method='ffill') is deprecated in recent pandas

2000-01-03   -0.649459
2000-01-04   -0.649459
2000-01-05   -0.649459
2000-01-06    0.765835
2000-01-07    0.765835
2000-01-08    0.765835
2000-01-09   -2.139507
2000-01-10   -2.139507
Freq: D, dtype: float64
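
To illustrate the ordering requirement, here is a sketch with a deliberately shuffled integer index (`unordered`, `raised`, and `ordered` are illustrative names). Filling on the unsorted index fails, while sorting first works:

```python
import pandas as pd

unordered = pd.Series([1.0, 2.0, 3.0], index=[3, 1, 2])

# The index is neither increasing nor decreasing, so a fill method is rejected.
try:
    unordered.reindex([1, 2, 3, 4], method="ffill")
    raised = False
except ValueError as exc:
    raised = True
    print("reindex failed:", exc)

# Sorting the index first makes forward filling well defined.
ordered = unordered.sort_index().reindex([1, 2, 3, 4], method="ffill")
print(ordered)
```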

### Limits on filling while reindexing


In [58]:
# Limit specifies the maximum count of consecutive matches
ts2.reindex(ts.index, method="ffill", limit=1)

2000-01-03   -0.649459
2000-01-04   -0.649459
2000-01-05         NaN
2000-01-06    0.765835
2000-01-07    0.765835
2000-01-08         NaN
2000-01-09   -2.139507
2000-01-10   -2.139507
Freq: D, dtype: float64

In [59]:
# In contrast, tolerance specifies the maximum distance between the index and indexer values:
ts2.reindex(ts.index, method="ffill", tolerance="1 day")

2000-01-03   -0.649459
2000-01-04   -0.649459
2000-01-05         NaN
2000-01-06    0.765835
2000-01-07    0.765835
2000-01-08         NaN
2000-01-09   -2.139507
2000-01-10   -2.139507
Freq: D, dtype: float64
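
With daily data the two options happen to give the same result above. A sketch with a half-daily target index shows where they differ: `limit` counts filled rows, while `tolerance` measures elapsed time. The data and names (`sparse`, `target`, `by_limit`, `by_tolerance`) are made up for illustration:

```python
import pandas as pd

sparse = pd.Series(
    [1.0, 2.0],
    index=pd.to_datetime(["2000-01-03", "2000-01-06"]),
)
# Target timestamps every 12 hours from 2000-01-03 to 2000-01-05.
target = pd.to_datetime([
    "2000-01-03 00:00", "2000-01-03 12:00",
    "2000-01-04 00:00", "2000-01-04 12:00",
    "2000-01-05 00:00",
])

# limit=1: only the first row after a known value is filled.
by_limit = sparse.reindex(target, method="ffill", limit=1)

# tolerance="1 day": every row within one day of a known value is filled.
by_tolerance = sparse.reindex(target, method="ffill", tolerance="1 day")

print(pd.DataFrame({"limit=1": by_limit, "tolerance=1 day": by_tolerance}))
```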

### Dropping labels from an axis


In [60]:
df

Unnamed: 0,one,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684


In [63]:
df.drop('a')

Unnamed: 0,one,two,three
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684


In [64]:
df.drop(['a', 'c', 'd'])

Unnamed: 0,one,two,three
b,0.163467,-1.100254,-1.462609


In [65]:
df.drop(index=['a', 'b'], columns=['one', 'two'])

Unnamed: 0,three
c,0.682092
d,-0.246684
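
Two details are worth a quick sketch, again on a small hand-made frame: drop() returns a new object and leaves the original untouched, and a label that does not exist raises a KeyError unless `errors="ignore"` is passed. The names `trimmed`, `missing_raised`, and `lenient` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]}, index=["a", "b", "c"])

# drop() returns a new frame; df itself still has all rows and columns.
trimmed = df.drop(index="a", columns="two")

# A label that is not present raises KeyError by default...
try:
    df.drop(index="zzz")
    missing_raised = False
except KeyError:
    missing_raised = True

# ...unless errors="ignore" asks pandas to skip unknown labels.
lenient = df.drop(index=["a", "zzz"], errors="ignore")

print(trimmed)
print(lenient)
```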


### Renaming / mapping labels

In [66]:
s

a    1.084174
b   -1.945504
c    1.731469
d   -0.344387
e   -1.063203
dtype: float64

In [71]:
s.rename('my_serie')  # a scalar changes the Series' name

a    1.084174
b   -1.945504
c    1.731469
d   -0.344387
e   -1.063203
Name: my_serie, dtype: float64

In [73]:
s.rename(lambda x: x.upper())  # a function is applied to each index label

A    1.084174
B   -1.945504
C    1.731469
D   -0.344387
E   -1.063203
dtype: float64

In [79]:
s.rename(
    index={"a": "z", "b": "x", "d": "y"},
)

z    1.084174
x   -1.945504
c    1.731469
y   -0.344387
e   -1.063203
dtype: float64

In [85]:
df.rename(
    index={"a": "z", "b": "x", "d": "y"},
    columns={'one': 'X'}
)

Unnamed: 0,X,two,three
z,0.729924,1.037791,
x,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
y,,-2.190634,-0.246684


In [87]:
df.rename({"one": "X"}, axis="columns")  # as usual, you can pass axis instead of the index/columns keywords

Unnamed: 0,X,two,three
a,0.729924,1.037791,
b,0.163467,-1.100254,-1.462609
c,-0.077453,-0.787528,0.682092
d,,-2.190634,-0.246684
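
As with the outputs above, rename() returns a new object rather than changing the original, and labels that do not appear in the mapping are left alone. A minimal sketch with fixed values (the key `"not_there"` is a made-up label that is deliberately absent from the index):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# Only 'a' is present in the index; the unknown key is silently skipped.
renamed = s.rename(index={"a": "z", "not_there": "ignored"})

print(list(renamed.index))
print(list(s.index))  # the original Series keeps its labels
```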
