# Index Objects and Reindexing

In [2]:
import numpy as np
import pandas as pd

### Index Objects

In [8]:
# create a series
series1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
series1

A    1
B    2
C    3
D    4
dtype: int64

In [7]:
# We can grab the index
series1_index = series1.index
series1_index

Index(['A', 'B', 'C', 'D'], dtype='object')

In [9]:
# We can also index into it to grab individual labels
series1_index[2]

'C'

In [10]:
series1_index[2:]

Index(['C', 'D'], dtype='object')

In [11]:
# let's try to change an index label
series1_index[0] = 'Z'

TypeError: Indexes does not support mutable operations

This raises a TypeError because Index objects are immutable: their labels cannot be changed in place.
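
Since individual labels cannot be assigned, relabeling means producing a new index. One way to do that, sketched here under the assumption that a relabeled copy is acceptable, is `rename` with a mapping. The names `renamed` and `assignment_failed` are only for illustration.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

# Assigning to a single position of the Index raises TypeError.
try:
    series1.index[0] = 'Z'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# rename returns a relabeled copy and leaves series1 untouched.
renamed = series1.rename(index={'A': 'Z'})

print(assignment_failed)
print(list(series1.index))
print(list(renamed.index))
```

The original Series keeps its labels, so the result has to be bound to a name to be used later.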

### Reindexing

In [12]:
# print series1
print(series1)

A    1
B    2
C    3
D    4
dtype: int64


Let's say we wanted to reindex series1. How would we do this?

In [14]:
# create series2 with the values from series1, but with a new index
series2 = series1.reindex(['A', 'B', 'C', 'D', 'E', 'F'])
series2

A     1
B     2
C     3
D     4
E   NaN
F   NaN
dtype: float64

Pandas handles missing values gracefully. The labels E and F do not exist in series1, so reindexing has no values to assign to them and they show up as NaN (missing) in series2. Because NaN is a floating-point value, the dtype also changes from int64 to float64.
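
To check which labels came back empty after a reindex, `isna()` can be used as a mask. This is a small sketch that rebuilds series1 and series2 so it runs on its own; `missing_labels` is a name chosen here for illustration.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
series2 = series1.reindex(['A', 'B', 'C', 'D', 'E', 'F'])

# NaN is a float, so the integer values are upcast to float64.
print(series1.dtype, '->', series2.dtype)

# isna() flags the labels that had no counterpart in series1.
missing_labels = series2[series2.isna()].index.tolist()
print(missing_labels)
```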

Now we can reindex series2 again, this time telling pandas to fill any newly introduced labels with 0 via fill_value. E and F stay NaN, however: they already exist in series2 as missing values, so fill_value does not touch them.

In [16]:
# reindex series2, with a default value of 0
series2.reindex(['A', 'B', 'C', 'D', 'E', 'F', 'G'], fill_value = 0)

A     1
B     2
C     3
D     4
E   NaN
F   NaN
G     0
dtype: float64
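
If the goal is to replace the NaN values that already exist as well, `fillna` is the usual follow-up. A minimal sketch, with `series_g` and `series_filled` as illustrative names:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
series2 = series1.reindex(['A', 'B', 'C', 'D', 'E', 'F'])

# fill_value only applies to labels introduced by this reindex call (G).
series_g = series2.reindex(['A', 'B', 'C', 'D', 'E', 'F', 'G'], fill_value=0)

# fillna replaces every remaining NaN, including the existing E and F.
series_filled = series_g.fillna(0)
print(series_filled)
```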

In [20]:
# create a new series
series3 = pd.Series(['USA', 'Mexico', 'Canada'], index = [0, 5, 10])
series3

0        USA
5     Mexico
10    Canada
dtype: object

Let's reindex series3 over the range 0 to 14. Instead of leaving the new labels as NaN, we pass method = 'ffill' (forward fill), which carries each existing value forward over the new labels that follow it, until the next existing label is reached.

In [21]:
# reindex series3 from 0 to 14
series3.reindex(range(15), method = 'ffill')

0        USA
1        USA
2        USA
3        USA
4        USA
5     Mexico
6     Mexico
7     Mexico
8     Mexico
9     Mexico
10    Canada
11    Canada
12    Canada
13    Canada
14    Canada
dtype: object
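
Two variations on the fill are worth a quick look: `method = 'bfill'` fills backward from the next existing label, and `limit` caps how many consecutive new labels get filled. Both are parameters of `reindex`; the names `back_filled` and `limited` are illustrative. Note that filling by method requires the index to be sorted (monotonic).

```python
import pandas as pd

series3 = pd.Series(['USA', 'Mexico', 'Canada'], index=[0, 5, 10])

# Backward fill: each new label takes the next existing value instead.
# Labels after 10 have nothing to pull from, so they stay NaN.
back_filled = series3.reindex(range(15), method='bfill')

# limit=2 carries each value forward over at most two new labels.
limited = series3.reindex(range(15), method='ffill', limit=2)

print(back_filled)
print(limited)
```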

In [23]:
# create a DataFrame of random normally distributed numbers
frame1 = pd.DataFrame(np.random.randn(25).reshape(5, 5),
                      index = ['A', 'B', 'D', 'E', 'F'],
                      columns = ['col1', 'col2', 'col3', 'col4', 'col5'])
frame1

       col1      col2      col3      col4      col5
A -0.380715 -0.363434  0.257713 -0.357725  0.029428
B -0.503176 -0.479468  2.347637  0.091316 -0.077980
D  1.094181 -0.660691 -0.255178 -1.048621  0.714585
E  0.795444  0.668781  0.931876 -1.228426  0.260081
F  0.428537  0.651846 -1.257655 -0.793718  0.960576


In [26]:
# create a new reindexed frame
frame2 = frame1.reindex(['A', 'B', 'C', 'D', 'E'])
frame2

       col1      col2      col3      col4      col5
A -0.380715 -0.363434  0.257713 -0.357725  0.029428
B -0.503176 -0.479468  2.347637  0.091316 -0.077980
C       NaN       NaN       NaN       NaN       NaN
D  1.094181 -0.660691 -0.255178 -1.048621  0.714585
E  0.795444  0.668781  0.931876 -1.228426  0.260081


In [27]:
# create a list of new columns
new_columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']

In [28]:
# reindex columns
frame2.reindex(columns = new_columns)

       col1      col2      col3      col4      col5  col6
A -0.380715 -0.363434  0.257713 -0.357725  0.029428   NaN
B -0.503176 -0.479468  2.347637  0.091316 -0.077980   NaN
C       NaN       NaN       NaN       NaN       NaN   NaN
D  1.094181 -0.660691 -0.255178 -1.048621  0.714585   NaN
E  0.795444  0.668781  0.931876 -1.228426  0.260081   NaN


In [29]:
# show original DataFrame
frame1

       col1      col2      col3      col4      col5
A -0.380715 -0.363434  0.257713 -0.357725  0.029428
B -0.503176 -0.479468  2.347637  0.091316 -0.077980
D  1.094181 -0.660691 -0.255178 -1.048621  0.714585
E  0.795444  0.668781  0.931876 -1.228426  0.260081
F  0.428537  0.651846 -1.257655 -0.793718  0.960576
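
The original frame is unchanged because `reindex` returns a new object rather than modifying its input. A small sketch of that behavior, using a deterministic stand-in frame (the frame and its values are an assumption made for illustration, not the random data above):

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for frame1; values are illustrative only.
frame = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                     index=['A', 'B', 'D'],
                     columns=['col1', 'col2'])

# reindex builds a new DataFrame with a NaN row for the new label C.
reindexed = frame.reindex(['A', 'B', 'C', 'D'])

print(list(frame.index))      # labels of the untouched original
print(list(reindexed.index))  # labels of the new object
```

To keep a reindexed result, bind it to a name, as was done with frame2 earlier.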


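Rows and columns can be reindexed in a single call by passing both index and columns to reindex. (Older pandas versions offered the .ix indexer as a shortcut for this, taking the row labels first and the column labels second, but .ix was deprecated in pandas 0.20 and removed in 1.0.)

In [31]:
# reindex rows and columns in one call
frame1.reindex(index = ['A', 'B', 'C', 'D', 'E', 'F'], columns = new_columns)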

       col1      col2      col3      col4      col5  col6
A -0.380715 -0.363434  0.257713 -0.357725  0.029428   NaN
B -0.503176 -0.479468  2.347637  0.091316 -0.077980   NaN
C       NaN       NaN       NaN       NaN       NaN   NaN
D  1.094181 -0.660691 -0.255178 -1.048621  0.714585   NaN
E  0.795444  0.668781  0.931876 -1.228426  0.260081   NaN
F  0.428537  0.651846 -1.257655 -0.793718  0.960576   NaN
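
For comparison, label-based selection with `.loc` is not a substitute for reindexing: in pandas 1.0 and later it raises a KeyError when any requested label is missing, whereas `reindex` inserts NaN. A small sketch with a deterministic stand-in frame (the frame, `rows`, and `cols` are illustrative, not from the session above):

```python
import numpy as np
import pandas as pd

# Deterministic stand-in frame; values are illustrative only.
frame = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                     index=['A', 'B', 'D'],
                     columns=['col1', 'col2'])

rows = ['A', 'B', 'C', 'D']
cols = ['col1', 'col2', 'col3']

# .loc selects existing labels; 'C' and 'col3' are missing here.
try:
    frame.loc[rows, cols]
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex conforms the frame to the requested labels and pads with NaN.
conformed = frame.reindex(index=rows, columns=cols)

print(loc_raised)
print(conformed)
```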
