# <center><div style="width: 370px;"> ![Panel Data](pictures/Panel_Data.jpg)

# <center> Reindexing and Altering Labels

In [1]:
import pandas as pd
import numpy as np

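`reindex()` is the fundamental data alignment method in pandas. It is used to implement nearly all other features that rely on label alignment. To *reindex* means to conform the data to match a given set of labels along a particular axis. This accomplishes several things:

* Reorders the existing data to match a new set of labels
* Inserts missing value (NA) markers in label locations where no data for that label existed
* If specified, fills data for missing labels using logic (highly relevant to working with time series data)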

Here is a simple example:

In [3]:
s = pd.Series(np.random.randn(5), index=list('abcde'))
s

a    0.602508
b   -2.022697
c   -0.965342
d   -0.875651
e   -1.415863
dtype: float64

In [6]:
s.reindex(list('ebfd'))

e   -1.415863
b   -2.022697
f         NaN
d   -0.875651
dtype: float64

Here, the `f` label was not contained in the Series and hence appears as `NaN` in the result.
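If `NaN` is not the placeholder you want, `reindex()` also accepts a `fill_value` argument that is used for labels that were not present in the original object. A minimal sketch with made-up, deterministic values:

```python
import pandas as pd

# Deterministic illustrative data (the tutorial above uses random numbers)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("abcde"))

# "f" is not in s.index; fill_value is used instead of NaN for that label
filled = s.reindex(list("ebfd"), fill_value=0.0)
print(filled)
```

Labels that already exist keep their values; only the new label `f` receives `0.0`.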

With a DataFrame, you can simultaneously reindex the index and columns:

In [8]:
df = pd.DataFrame(
    {
        "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
        "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
        "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
    }
)

df

|   | one | two | three |
|---|---|---|---|
| a | -1.121518 | -0.295393 | NaN |
| b | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| d | NaN | -1.158674 | -0.105505 |


In [9]:
df.reindex(index=["c", "f", "b"], columns=["three", "two", "one"])

|   | three | two | one |
|---|---|---|---|
| c | -1.545136 | -0.924086 | 0.229646 |
| f | NaN | NaN | NaN |
| b | -0.002534 | 1.647826 | -2.490563 |


You may also use `reindex` with an `axis` keyword:

In [10]:
df.reindex(["c", "f", "b"], axis="index")

|   | one | two | three |
|---|---|---|---|
| c | 0.229646 | -0.924086 | -1.545136 |
| f | NaN | NaN | NaN |
| b | -2.490563 | 1.647826 | -0.002534 |


> Note that the `Index` objects containing the actual axis labels can be ***shared*** between objects. So if we have a Series and a DataFrame, the following can be done:

In [11]:
rs = s.reindex(df.index)

In [12]:
rs

a    0.602508
b   -2.022697
c   -0.965342
d   -0.875651
dtype: float64

In [13]:
rs.index is df.index

True

This means that the reindexed Series’s index is the same Python object as the DataFrame’s index.
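Sharing an `Index` between objects is safe because `Index` objects are immutable: pandas rejects in-place modification of a label with a `TypeError`. A quick check on a made-up index:

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])

# Item assignment on an Index is not allowed
try:
    idx[0] = "z"
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

To change labels you create a new index (for example with `rename()`, shown later) rather than mutating the shared one.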

`DataFrame.reindex()` also supports an “axis-style” calling convention, where you specify a single `labels` argument and the `axis` it applies to.

In [15]:
df.reindex(["three", "two", "one"], axis="columns")

|   | three | two | one |
|---|---|---|---|
| a | NaN | -0.295393 | -1.121518 |
| b | -0.002534 | 1.647826 | -2.490563 |
| c | -1.545136 | -0.924086 | 0.229646 |
| d | -0.105505 | -1.158674 | NaN |


> See also:
>
> [MultiIndex / Advanced Indexing](https://pandas.pydata.org/docs/user_guide/advanced.html#advanced) is an even more concise way of doing reindexing.

> Note
>
> When writing performance-sensitive code, there is a good reason to spend
> some time becoming a reindexing ninja: **many operations are faster on
> pre-aligned data**. Adding two unaligned DataFrames internally triggers a
> reindexing step. For exploratory analysis you will hardly notice the
> difference (because `reindex` has been heavily optimized), but when CPU
> cycles matter, sprinkling a few explicit `reindex` calls here and there can
> have an impact.
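A minimal sketch of what "pre-aligned" means, using made-up Series: arithmetic between unaligned objects aligns them implicitly, while reindexing both to a common set of labels once gives the same result and leaves objects whose indexes already match for any later operations.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
b = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Implicit: the + operator reindexes both operands to the union of labels
implicit = a + b

# Explicit: align once up front, then operate on pre-aligned data
labels = a.index.union(b.index)
a_aligned = a.reindex(labels)
b_aligned = b.reindex(labels)
explicit = a_aligned + b_aligned

print(explicit.equals(implicit))  # True
```

Whether the explicit version is measurably faster depends on how many operations reuse the aligned objects; for a single addition there is nothing to gain.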

## Reindexing to align with another object

You may wish to take an object and reindex its axes to be labeled the same as
another object. While the syntax for this is straightforward albeit verbose, it
is a common enough operation that the `reindex_like` method is
available to make this simpler:

In [16]:
df

|   | one | two | three |
|---|---|---|---|
| a | -1.121518 | -0.295393 | NaN |
| b | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| d | NaN | -1.158674 | -0.105505 |


In [17]:
df2 = df.reindex(index=['c', 'b', 'e', 'f'], columns=['one'])

In [18]:
df2

|   | one |
|---|---|
| c | 0.229646 |
| b | -2.490563 |
| e | NaN |
| f | NaN |


In [19]:
df.reindex_like(df2)

|   | one |
|---|---|
| c | 0.229646 |
| b | -2.490563 |
| e | NaN |
| f | NaN |
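`reindex_like(other)` behaves like the verbose form it replaces: reindexing to `other`'s index and columns. A small sketch with made-up frames that checks the two spellings agree:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]}, index=["a", "b", "c"]
)
other = pd.DataFrame({"one": [0.0, 0.0]}, index=["c", "e"])

# Shorthand
via_like = df.reindex_like(other)

# Verbose equivalent: conform to other's row and column labels explicitly
via_reindex = df.reindex(index=other.index, columns=other.columns)

print(via_like.equals(via_reindex))  # True
```

Only the labels of `other` are used; its values are ignored.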


## Aligning objects with each other with `align`

The `align()` method is the fastest way to simultaneously align two objects. It supports a `join` argument (related to joining and merging, which will be covered later):

# <center><div style="width: 370px;"> ![Types of joins](pictures/types-of-joins.png)

> * `join='outer'`: take the union of the indexes (default)
> * `join='left'`: use the calling object’s index
> * `join='right'`: use the passed object’s index
> * `join='inner'`: intersect the indexes

It returns a tuple with both of the reindexed Series:

Using the Series `s` created at the beginning of this section:

In [20]:
s1 = s[:4]
s1

a    0.602508
b   -2.022697
c   -0.965342
d   -0.875651
dtype: float64

In [21]:
s2 = s[1:]
s2

b   -2.022697
c   -0.965342
d   -0.875651
e   -1.415863
dtype: float64

In [22]:
s1.align(s2)

(a    0.602508
 b   -2.022697
 c   -0.965342
 d   -0.875651
 e         NaN
 dtype: float64,
 a         NaN
 b   -2.022697
 c   -0.965342
 d   -0.875651
 e   -1.415863
 dtype: float64)

In [23]:
s1.align(s2, join="inner")

(b   -2.022697
 c   -0.965342
 d   -0.875651
 dtype: float64,
 b   -2.022697
 c   -0.965342
 d   -0.875651
 dtype: float64)

In [24]:
s1.align(s2, join="left")

(a    0.602508
 b   -2.022697
 c   -0.965342
 d   -0.875651
 dtype: float64,
 a         NaN
 b   -2.022697
 c   -0.965342
 d   -0.875651
 dtype: float64)
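Like `reindex()`, `align()` accepts a `fill_value` argument, so the placeholders introduced for missing labels need not be `NaN`. A minimal sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([20.0, 30.0, 40.0], index=["b", "c", "d"])

# Default join="outer": both results are labeled a, b, c, d;
# fill_value replaces the NaN that would mark labels absent from each side
left, right = s1.align(s2, fill_value=0.0)
print(left)
print(right)
```

Note that `fill_value` applies to every missing entry in the aligned results, so any `NaN` already present in the inputs would be filled as well.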

For DataFrames, the join method will be applied to both the index and the
columns by default:

In [25]:
df

|   | one | two | three |
|---|---|---|---|
| a | -1.121518 | -0.295393 | NaN |
| b | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| d | NaN | -1.158674 | -0.105505 |


In [26]:
df2

|   | one |
|---|---|
| c | 0.229646 |
| b | -2.490563 |
| e | NaN |
| f | NaN |


In [27]:
df.align(df2, join="inner")

(        one
 b -2.490563
 c  0.229646,
         one
 b -2.490563
 c  0.229646)

You can also pass an `axis` option to only align on the specified axis:

In [28]:
df.align(df2, join="inner", axis=0)

(        one       two     three
 b -2.490563  1.647826 -0.002534
 c  0.229646 -0.924086 -1.545136,
         one
 b -2.490563
 c  0.229646)

If you pass a Series to `DataFrame.align()`, you can choose to align both
objects either on the DataFrame’s index or columns using the `axis` argument:

In [29]:
df

|   | one | two | three |
|---|---|---|---|
| a | -1.121518 | -0.295393 | NaN |
| b | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| d | NaN | -1.158674 | -0.105505 |


In [30]:
df2

|   | one |
|---|---|
| c | 0.229646 |
| b | -2.490563 |
| e | NaN |
| f | NaN |


In [31]:
df.align(df2.iloc[0], axis=1)

(        one     three       two
 a -1.121518       NaN -0.295393
 b -2.490563 -0.002534  1.647826
 c  0.229646 -1.545136 -0.924086
 d       NaN -0.105505 -1.158674,
 one      0.229646
 three         NaN
 two           NaN
 Name: c, dtype: float64)
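The example above aligns a row-like Series with the DataFrame's columns (`axis=1`). With `axis=0` the Series is instead aligned with the DataFrame's row index. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "two": [3.0, 4.0]}, index=["a", "b"])
ser = pd.Series([10.0, 20.0], index=["b", "c"])  # labeled like df's rows

# Outer join on the row labels: both results use index a, b, c
left, right = df.align(ser, axis=0)
print(left)
print(right)
```

Row `c` of `left` is all `NaN` because `df` had no such row, and `right["a"]` is `NaN` because `ser` had no label `a`.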

## Filling while reindexing

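`reindex()` takes an optional parameter `method` which is a
filling method chosen from the following table:

| Method | Action |
|---|---|
| `pad` / `ffill` | Fill values forward |
| `bfill` / `backfill` | Fill values backward |
| `nearest` | Fill from the nearest index value |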

We illustrate these fill methods on a simple Series:

In [34]:
rng = pd.date_range("1/3/2000", periods=8)

In [35]:
ts = pd.Series(np.random.randn(8), index=rng)

In [42]:
ts2 = ts.iloc[[0, 3, 6]]

In [43]:
ts

2000-01-03   -0.935042
2000-01-04    0.087726
2000-01-05   -0.799389
2000-01-06   -0.441159
2000-01-07   -0.358319
2000-01-08    0.334735
2000-01-09    0.472108
2000-01-10    1.686189
Freq: D, dtype: float64

In [44]:
ts2

2000-01-03   -0.935042
2000-01-06   -0.441159
2000-01-09    0.472108
Freq: 3D, dtype: float64

In [45]:
ts2.reindex(ts.index)

2000-01-03   -0.935042
2000-01-04         NaN
2000-01-05         NaN
2000-01-06   -0.441159
2000-01-07         NaN
2000-01-08         NaN
2000-01-09    0.472108
2000-01-10         NaN
Freq: D, dtype: float64

In [46]:
ts2.reindex(ts.index, method="ffill")

2000-01-03   -0.935042
2000-01-04   -0.935042
2000-01-05   -0.935042
2000-01-06   -0.441159
2000-01-07   -0.441159
2000-01-08   -0.441159
2000-01-09    0.472108
2000-01-10    0.472108
Freq: D, dtype: float64

In [47]:
ts2.reindex(ts.index, method="bfill")

2000-01-03   -0.935042
2000-01-04   -0.441159
2000-01-05   -0.441159
2000-01-06   -0.441159
2000-01-07    0.472108
2000-01-08    0.472108
2000-01-09    0.472108
2000-01-10         NaN
Freq: D, dtype: float64

In [48]:
ts2.reindex(ts.index, method="nearest")

2000-01-03   -0.935042
2000-01-04   -0.935042
2000-01-05   -0.441159
2000-01-06   -0.441159
2000-01-07   -0.441159
2000-01-08    0.472108
2000-01-09    0.472108
2000-01-10    0.472108
Freq: D, dtype: float64

These methods require that the indexes be **ordered**, that is, monotonically
increasing or decreasing.

Note that the same result could have been achieved using
`fillna()` / `ffill()` (except for `method='nearest'`) or
`interpolate()`:

In [None]:
# fillna(method="ffill") is deprecated since pandas 2.1; ffill() is the replacement
ts2.reindex(ts.index).ffill()

`reindex()` will raise a ValueError if the index is not monotonically
increasing or decreasing. `fillna()` and `interpolate()`
will not perform any checks on the order of the index.
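A small sketch of that difference, using a deliberately unordered integer index with made-up values: `reindex(..., method="ffill")` refuses to fill, while reindexing without a method and then calling `ffill()` succeeds, simply propagating values in the order of the new index.

```python
import pandas as pd

# Labels deliberately out of order (neither increasing nor decreasing)
unordered = pd.Series([1.0, 2.0, 3.0], index=[2, 0, 1])
target = [0, 1, 2, 3]

try:
    unordered.reindex(target, method="ffill")
    raised = False
except ValueError:
    raised = True  # the source index is not monotonic

# No order check here: reindex inserts NaN for label 3, then ffill
# copies the previous value positionally
filled = unordered.reindex(target).ffill()
print(raised, filled.tolist())
```

The second result is well defined but may not be what you want on unsorted data, which is exactly the situation the `reindex()` check guards against.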

## Limits on filling while reindexing

The `limit` and `tolerance` arguments provide additional control over
filling while reindexing. `limit` specifies the maximum number of consecutive
missing labels that will be filled:

In [49]:
ts2.reindex(ts.index, method="ffill", limit=1)

2000-01-03   -0.935042
2000-01-04   -0.935042
2000-01-05         NaN
2000-01-06   -0.441159
2000-01-07   -0.441159
2000-01-08         NaN
2000-01-09    0.472108
2000-01-10    0.472108
Freq: D, dtype: float64

In contrast, `tolerance` specifies the maximum distance between the index and
indexer values:

In [50]:
ts2.reindex(ts.index, method="ffill", tolerance="1 day")

2000-01-03   -0.935042
2000-01-04   -0.935042
2000-01-05         NaN
2000-01-06   -0.441159
2000-01-07   -0.441159
2000-01-08         NaN
2000-01-09    0.472108
2000-01-10    0.472108
Freq: D, dtype: float64

Notice that when used on a `DatetimeIndex`, `TimedeltaIndex` or
`PeriodIndex`, `tolerance` will be coerced into a `Timedelta` if possible.
This allows you to specify tolerance with appropriate strings.
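On the regular daily series above, `limit=1` and `tolerance="1 day"` happen to give the same result. They differ once the target labels are irregularly spaced: `limit` counts filled positions, while `tolerance` measures distance in time. A sketch with made-up dates and values:

```python
import pandas as pd

src = pd.Series([1.0, 2.0], index=pd.to_datetime(["2000-01-01", "2000-01-10"]))
target = pd.to_datetime(["2000-01-01", "2000-01-03", "2000-01-04", "2000-01-10"])

# limit=1: fill at most one consecutive missing label from each source value,
# however far away it is (2000-01-03 is filled, 2000-01-04 is not)
by_limit = src.reindex(target, method="ffill", limit=1)

# tolerance="1 day": fill only if the source label is at most 1 day earlier;
# 2000-01-03 is 2 days after 2000-01-01, so it stays NaN
by_tolerance = src.reindex(target, method="ffill", tolerance="1 day")

print(by_limit)
print(by_tolerance)
```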

## Dropping labels from an axis

A method closely related to `reindex` is the `drop()` method.
It removes a set of labels from an axis:

In [51]:
df

|   | one | two | three |
|---|---|---|---|
| a | -1.121518 | -0.295393 | NaN |
| b | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| d | NaN | -1.158674 | -0.105505 |


In [52]:
df.drop(["a", "d"], axis=0)

|   | one | two | three |
|---|---|---|---|
| b | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |


In [53]:
df.drop(["one"], axis=1)

|   | two | three |
|---|---|---|
| a | -0.295393 | NaN |
| b | 1.647826 | -0.002534 |
| c | -0.924086 | -1.545136 |
| d | -1.158674 | -0.105505 |
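Instead of `axis=`, `drop()` also accepts `index=` and `columns=` keywords. By default, dropping a label that does not exist raises a `KeyError`; `errors="ignore"` skips missing labels instead. A sketch with a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0], "two": [3.0, 4.0], "three": [5.0, 6.0]},
    index=["a", "b"],
)

# Keyword forms, equivalent to axis=1 / axis=0
no_one = df.drop(columns=["one"])
no_a = df.drop(index=["a"])

# "missing" is not a column: with errors="ignore" it is silently skipped
lenient = df.drop(columns=["one", "missing"], errors="ignore")

try:
    df.drop(columns=["missing"])  # default errors="raise"
    strict_raised = False
except KeyError:
    strict_raised = True
```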


## Renaming / mapping labels

The `rename()` method allows you to relabel an axis based on some
mapping (a dict or Series) or an arbitrary function.

In [54]:
s

a    0.602508
b   -2.022697
c   -0.965342
d   -0.875651
e   -1.415863
dtype: float64

In [55]:
s.rename(str.upper)

A    0.602508
B   -2.022697
C   -0.965342
D   -0.875651
E   -1.415863
dtype: float64

If you pass a function, it must return a value when called with any of the
labels (and must produce a set of unique values). A dict or
Series can also be used:

In [57]:
df.rename(
    columns={"one": "foo", "two": "bar"},
    index={"a": "apple", "b": "banana", "d": "durian"},
)

|   | foo | bar | three |
|---|---|---|---|
| apple | -1.121518 | -0.295393 | NaN |
| banana | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| durian | NaN | -1.158674 | -0.105505 |


If the mapping doesn’t include a column/index label, it isn’t renamed. Note that
extra labels in the mapping don’t throw an error.
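If you would rather be told about mapping keys that match nothing (for example, a misspelled column name), `rename()` takes `errors="raise"`, which turns such a key into a `KeyError`. A minimal sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0], "two": [2.0]})

# Default: the unknown key "zzz" is ignored
renamed = df.rename(columns={"one": "foo", "zzz": "bar"})

# errors="raise": the same mapping is rejected
try:
    df.rename(columns={"one": "foo", "zzz": "bar"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(renamed.columns), raised)
```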

`DataFrame.rename()` also supports an “axis-style” calling convention, where
you specify a single `mapper` and the `axis` to apply that mapping to.

In [59]:
df.rename({"one": "foo", "two": "bar"}, axis="columns")

|   | foo | bar | three |
|---|---|---|---|
| a | -1.121518 | -0.295393 | NaN |
| b | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| d | NaN | -1.158674 | -0.105505 |


In [60]:
df.rename({"a": "apple", "b": "banana", "d": "durian"}, axis="index")

|   | one | two | three |
|---|---|---|---|
| apple | -1.121518 | -0.295393 | NaN |
| banana | -2.490563 | 1.647826 | -0.002534 |
| c | 0.229646 | -0.924086 | -1.545136 |
| durian | NaN | -1.158674 | -0.105505 |


The `rename()` method also provides an `inplace` named
parameter that is by default `False` and copies the underlying data. Pass
`inplace=True` to rename the data in place.
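With `inplace=True` the method modifies the object it is called on and returns `None`, so it cannot be chained or assigned. A short sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0], "two": [2.0]})

# The frame itself is relabeled; the return value is None
result = df.rename(columns={"one": "foo"}, inplace=True)

print(result)             # None
print(list(df.columns))   # ['foo', 'two']
```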

Finally, `rename()` also accepts a scalar or list-like
for altering the `Series.name` attribute.

In [61]:
s.rename("scalar-name")

a    0.602508
b   -2.022697
c   -0.965342
d   -0.875651
e   -1.415863
Name: scalar-name, dtype: float64

The methods `DataFrame.rename_axis()` and `Series.rename_axis()`
allow specific names of a `MultiIndex` to be changed (as opposed to the
labels).

In [62]:
df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6], "y": [10, 20, 30, 40, 50, 60]},
    index=pd.MultiIndex.from_product(
        [["a", "b", "c"], [1, 2]], names=["let", "num"]
    ),
)

In [63]:
df

| let | num | x | y |
|---|---|---|---|
| a | 1 | 1 | 10 |
| a | 2 | 2 | 20 |
| b | 1 | 3 | 30 |
| b | 2 | 4 | 40 |
| c | 1 | 5 | 50 |
| c | 2 | 6 | 60 |


In [64]:
df.rename_axis(index={"let": "abc"})

| abc | num | x | y |
|---|---|---|---|
| a | 1 | 1 | 10 |
| a | 2 | 2 | 20 |
| b | 1 | 3 | 30 |
| b | 2 | 4 | 40 |
| c | 1 | 5 | 50 |
| c | 2 | 6 | 60 |


In [65]:
df.rename_axis(index=str.upper)

| LET | NUM | x | y |
|---|---|---|---|
| a | 1 | 1 | 10 |
| a | 2 | 2 | 20 |
| b | 1 | 3 | 30 |
| b | 2 | 4 | 40 |
| c | 1 | 5 | 50 |
| c | 2 | 6 | 60 |
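Besides a dict or a function, `rename_axis()` also accepts a list-like (one name per level) to replace all level names at once, and `columns=` names the column axis itself. A sketch reusing the same MultiIndex frame; the new names `letter`, `number` and `variable` are arbitrary:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6], "y": [10, 20, 30, 40, 50, 60]},
    index=pd.MultiIndex.from_product(
        [["a", "b", "c"], [1, 2]], names=["let", "num"]
    ),
)

# List-like: sets the names of both index levels in order
out = df.rename_axis(index=["letter", "number"])

# Scalar for columns=: sets the name of the column Index
out = out.rename_axis(columns="variable")

print(out.index.names, out.columns.name)
```

The original `df` keeps its level names `let` and `num`, since `rename_axis()` returns a new object unless `inplace=True` is passed.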
