# Hierarchical Indexing
Hierarchical indexing is an important feature of pandas that lets you have multiple (two or more) index levels on an axis. Somewhat abstractly, it provides a way to work with higher-dimensional data in a lower-dimensional form. Let’s start with a simple example: create a Series with a list of lists (or arrays) as the index:

In [6]:
from pandas import Series, DataFrame
import numpy as np
import pandas as pd

In [13]:
data = Series(np.random.randn(10),
              index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                     [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

data

a  1    0.041373
   2    1.540400
   3    0.799172
b  1   -0.001361
   2    1.570515
   3    0.806432
c  1   -0.060385
   2    1.129928
d  2   -2.041302
   3   -1.303466
dtype: float64

What you’re seeing is a prettified view of a Series with a MultiIndex as its index. The “gaps” in the index display mean “use the label directly above”:

In [15]:
data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 2),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
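To make the structure behind that display concrete, here is a minimal sketch that inspects the index directly. It rebuilds `data` from the values printed above (fixed rather than random, so the result is reproducible); the variable names are only illustrative.

```python
import pandas as pd

# Fixed values copied from the printed output, instead of np.random.randn.
data = pd.Series(
    [0.041373, 1.540400, 0.799172, -0.001361, 1.570515,
     0.806432, -0.060385, 1.129928, -2.041302, -1.303466],
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

n_levels = data.index.nlevels               # number of index levels
outer = data.index.get_level_values(0)      # outer label for every row, gaps filled in
inner = data.index.get_level_values(1)      # inner label for every row
unique_outer = list(data.index.levels[0])   # distinct labels stored for the outer level
```

`get_level_values(0)` returns the outer label for every row, including the rows where the console display leaves a gap.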

With a hierarchically indexed object, so-called partial indexing is possible, enabling you to concisely select subsets of the data:

In [18]:
data['b']

1   -0.001361
2    1.570515
3    0.806432
dtype: float64

In [19]:
data['b':'c']

b  1   -0.001361
   2    1.570515
   3    0.806432
c  1   -0.060385
   2    1.129928
dtype: float64

In [21]:
data.loc[['b', 'd']]

b  1   -0.001361
   2    1.570515
   3    0.806432
d  2   -2.041302
   3   -1.303466
dtype: float64
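The same three selections can also be written explicitly with `.loc`, which some readers may find clearer about label-based intent. This is a small sketch using fixed values copied from the output above; the result names are made up for illustration.

```python
import pandas as pd

data = pd.Series(
    [0.041373, 1.540400, 0.799172, -0.001361, 1.570515,
     0.806432, -0.060385, 1.129928, -2.041302, -1.303466],
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

by_label = data.loc['b']          # single outer label: the outer level is dropped
by_slice = data.loc['b':'c']      # label slice: both endpoints are included
by_list = data.loc[['b', 'd']]    # list of outer labels: both levels are kept
```

Note that selecting a single outer label returns a Series indexed only by the inner level, while the slice and the list keep the full two-level index.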

Selection is even possible in some cases from an “inner” level:

In [22]:
data[:, 2]

a    1.540400
b    1.570515
c    1.129928
d   -2.041302
dtype: float64
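For comparison, the inner-level selection can be sketched in two other ways: with `.loc` and a full slice on the outer level, or with the `xs` (cross-section) method. The data below uses fixed values copied from the earlier output, and the variable names are illustrative.

```python
import pandas as pd

data = pd.Series(
    [0.041373, 1.540400, 0.799172, -0.001361, 1.570515,
     0.806432, -0.060385, 1.129928, -2.041302, -1.303466],
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

# Every outer label, inner label equal to 2.
inner_loc = data.loc[:, 2]

# Cross-section on the second level (position 1); that level is dropped.
inner_xs = data.xs(2, level=1)
```

Both results are indexed by the outer labels only.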

Hierarchical indexing plays a critical role in reshaping data and group-based operations like forming a pivot table. For example, this data could be rearranged into a DataFrame using its unstack method:

In [24]:
data.unstack()

          1         2         3
a  0.041373  1.540400  0.799172
b -0.001361  1.570515  0.806432
c -0.060385  1.129928       NaN
d       NaN -2.041302 -1.303466
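A short sketch of what `unstack` does with label combinations that are absent from the Series: they show up as NaN in the resulting DataFrame. It also shows the `level` argument, which picks the index level that moves to the columns. The data uses fixed values copied from the earlier output.

```python
import pandas as pd

data = pd.Series(
    [0.041373, 1.540400, 0.799172, -0.001361, 1.570515,
     0.806432, -0.060385, 1.129928, -2.041302, -1.303466],
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

wide = data.unstack()                # default: innermost level becomes the columns
wide_outer = data.unstack(level=0)   # move the outer level to the columns instead

# ('c', 3) and ('d', 1) do not exist in data, so those cells are NaN.
missing_c3 = pd.isna(wide.loc['c', 3])
missing_d1 = pd.isna(wide.loc['d', 1])
```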


The inverse operation of unstack is stack:

In [25]:
data.unstack().stack()

a  1    0.041373
   2    1.540400
   3    0.799172
b  1   -0.001361
   2    1.570515
   3    0.806432
c  1   -0.060385
   2    1.129928
d  2   -2.041302
   3   -1.303466
dtype: float64
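The round trip can be checked programmatically. The sketch below is written defensively: how `stack` treats the NaN cells introduced by `unstack` has differed between pandas versions, so it calls `dropna()` explicitly rather than relying on the default. The data uses fixed values copied from the earlier output.

```python
import pandas as pd

data = pd.Series(
    [0.041373, 1.540400, 0.799172, -0.001361, 1.570515,
     0.806432, -0.060385, 1.129928, -2.041302, -1.303466],
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

# unstack introduces NaN for ('c', 3) and ('d', 1); dropna() removes them
# again in case this pandas version keeps them when stacking.
roundtrip = data.unstack().stack().dropna()

# Compare after sorting so the check does not depend on row order.
same_values = roundtrip.sort_index().tolist() == data.sort_index().tolist()
```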

With a DataFrame, either axis can have a hierarchical index:

In [27]:
frame = DataFrame(np.arange(12).reshape((4, 3)),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=[['Ohio', 'Ohio', 'Colorado'],
                           ['Green', 'Red', 'Green']])


frame

     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11


The hierarchical levels can have names (as strings or any Python objects). If so, these will show up in the console output (don’t confuse the index names with the axis labels!):

In [32]:
frame.index.names = ['key1', 'key2']

frame

           Ohio     Colorado
          Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11


In [34]:
frame.columns.names = ['state', 'color']

frame

state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
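One practical benefit of naming the levels is that methods accepting a `level` argument can refer to them by name instead of by position. The following is a minimal sketch of that idea, rebuilding the same frame; the result names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Look up a row level by name rather than by integer position.
key1_labels = frame.index.get_level_values('key1')

# Cross-section of the columns: keep only color == 'Green'.
# The 'color' level is dropped, leaving 'state' as the column index.
green = frame.xs('Green', axis=1, level='color')
```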


With partial column indexing you can similarly select groups of columns:

In [35]:
frame['Ohio']

color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
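As a sketch of how partial and full column keys differ: a single outer label returns a DataFrame containing the remaining inner-level columns, while a full (outer, inner) pair identifies exactly one column and returns a Series. The frame is rebuilt here so the example is self-contained.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

ohio = frame['Ohio']                         # partial key: DataFrame with Green, Red
ohio_red = frame['Ohio', 'Red']              # full key: a single column as a Series
ohio_red_loc = frame.loc[:, ('Ohio', 'Red')] # the same column, spelled with .loc
```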


A MultiIndex can be created by itself and then reused; the columns in the above DataFrame with level names could be created like this:

In [37]:
from pandas import MultiIndex

In [38]:
MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                        ['Green', 'Red', 'Green']],
                       names=['state', 'color'])

MultiIndex([(    'Ohio', 'Green'),
            (    'Ohio',   'Red'),
            ('Colorado', 'Green')],
           names=['state', 'color'])
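To illustrate the "reused" part, the sketch below stores the MultiIndex in a variable (the name `cols` is arbitrary) and passes it as `columns` when constructing the DataFrame. It also builds the same index with `MultiIndex.from_tuples`, an alternative constructor that takes one tuple per column.

```python
import numpy as np
import pandas as pd

# Build the column index once, with level names attached.
cols = pd.MultiIndex.from_arrays(
    [['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
    names=['state', 'color'],
)

# Reuse it when constructing the frame.
frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=cols)

# Equivalent index built from (state, color) tuples.
cols_from_tuples = pd.MultiIndex.from_tuples(
    [('Ohio', 'Green'), ('Ohio', 'Red'), ('Colorado', 'Green')],
    names=['state', 'color'],
)
```

Constructing the index up front avoids a separate `frame.columns.names = ...` assignment afterwards.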