# Hierarchical Indexing

Hierarchical indexing is an important feature of pandas that lets you have multiple
(two or more) index levels on an axis. Somewhat abstractly, it provides a way to work
with higher-dimensional data in a lower-dimensional form. Let’s start with a simple
example: create a Series with a list of lists (or arrays) as the index:

In [1]:
import pandas as pd
from pandas import DataFrame, Series
import numpy as np

In [2]:
data = Series(np.random.randn(10),
              index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                     [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

In [3]:
data

a  1    0.451883
   2   -0.066545
   3   -0.353470
b  1    0.681551
   2    1.528489
   3    0.115527
c  1    1.673042
   2    0.196543
d  2    0.850413
   3   -0.032468
dtype: float64

What you’re seeing is a prettified view of a Series with a MultiIndex as its index. The
“gaps” in the index display mean “use the label directly above”:

In [4]:
data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 2),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
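
The structure of a MultiIndex can also be inspected programmatically. The following is a minimal sketch, assuming a deterministic stand-in for the random Series above (`np.arange(10.0)` replaces `np.random.randn(10)` so the values are reproducible); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random Series used in the text (assumption).
data = pd.Series(
    np.arange(10.0),
    index=[
        ["a", "a", "a", "b", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3],
    ],
)

n_levels = data.index.nlevels                     # number of index levels
outer = list(data.index.get_level_values(0))      # outer label for every row
inner = list(data.index.get_level_values(1))      # inner label for every row
outer_unique = list(data.index.levels[0])         # distinct outer labels
inner_unique = list(data.index.levels[1])         # distinct inner labels

print(n_levels, outer_unique, inner_unique)
```

The "gaps" in the printed Series are purely cosmetic: `get_level_values(0)` returns one outer label per row, including the repeated ones.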

With a hierarchically-indexed object, so-called partial indexing is possible, enabling
you to concisely select subsets of the data:

In [5]:
data['b']

1    0.681551
2    1.528489
3    0.115527
dtype: float64

In [22]:
# use .iloc for positional access; newer pandas treats integer keys in data[...] as labels
data.iloc[2], data['a']

(-0.3534696863877639, 1    0.451883
 2   -0.066545
 3   -0.353470
 dtype: float64)

In [7]:
data['b':'c']

b  1    0.681551
   2    1.528489
   3    0.115527
c  1    1.673042
   2    0.196543
dtype: float64

In [8]:
data.loc[['b', 'd']]

b  1    0.681551
   2    1.528489
   3    0.115527
d  2    0.850413
   3   -0.032468
dtype: float64
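
The partial-indexing forms shown so far can be written explicitly with `.loc`. This is a sketch under the assumption of a deterministic stand-in Series (`np.arange(10.0)` instead of random values); the result names are hypothetical.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random Series used in the text (assumption).
data = pd.Series(
    np.arange(10.0),
    index=[
        ["a", "a", "a", "b", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3],
    ],
)

b_rows = data.loc["b"]           # one outer label: the outer level is dropped
b_to_c = data.loc["b":"c"]       # label slice on the outer level, both ends inclusive
b_and_d = data.loc[["b", "d"]]   # list of outer labels: both levels are kept
single = data.loc[("b", 2)]      # full key (outer, inner): a scalar

print(b_rows, b_to_c, b_and_d, single, sep="\n\n")
```

Label slicing such as `"b":"c"` relies on the index being sorted, which holds for the index used here.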

Selection is even possible in some cases from an “inner” level:

In [9]:
data[:, 2]

a   -0.066545
b    1.528489
c    0.196543
d    0.850413
dtype: float64
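
Inner-level selection can also be spelled with `.loc` or with the `xs` (cross-section) method, which names the level explicitly. A minimal sketch, again assuming a deterministic stand-in for the random Series:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random Series used in the text (assumption).
data = pd.Series(
    np.arange(10.0),
    index=[
        ["a", "a", "a", "b", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3],
    ],
)

by_loc = data.loc[:, 2]        # every outer label, inner label 2
by_xs = data.xs(2, level=1)    # same selection, naming the level explicitly

print(by_loc)
print(by_xs)
```

Both forms return a Series indexed by the outer labels only; `xs` may read more clearly when the index has more than two levels.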

Hierarchical indexing plays a critical role in reshaping data and group-based operations
like forming a pivot table. For example, this data could be rearranged into a DataFrame
using its unstack method:

In [10]:
data.unstack()

          1         2         3
a  0.451883 -0.066545 -0.353470
b  0.681551  1.528489  0.115527
c  1.673042  0.196543       NaN
d       NaN  0.850413 -0.032468


The inverse operation of unstack is stack:

In [25]:
data.unstack().stack()

a  1    0.451883
   2   -0.066545
   3   -0.353470
b  1    0.681551
   2    1.528489
   3    0.115527
c  1    1.673042
   2    0.196543
d  2    0.850413
   3   -0.032468
dtype: float64
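
The round trip can be checked directly. The sketch below assumes a deterministic stand-in Series and adds an explicit `dropna()` after `stack()`, since whether `stack` itself discards the missing combinations (`('c', 3)` and `('d', 1)` here) depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random Series used in the text (assumption).
data = pd.Series(
    np.arange(10.0),
    index=[
        ["a", "a", "a", "b", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3],
    ],
)

wide = data.unstack()                       # inner level becomes the columns
n_missing = int(wide.isna().sum().sum())    # combinations absent from the Series

# dropna() makes the result independent of how stack() treats missing values.
restored = wide.stack().dropna()

print(wide)
print(restored)
```

After the explicit `dropna()`, `restored` holds the same labels and values as `data`.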

With a DataFrame, either axis can have a hierarchical index:

In [33]:
frame = DataFrame(np.arange(12).reshape((4, 3)),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=[['Ohio', 'Ohio', 'Colorado'],
                           ['Green', 'Red', 'Green']])

In [34]:
frame

     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11


The hierarchical levels can have names (as strings or any Python objects). If so, these
will show up in the console output (don’t confuse the index names with the axis labels!):

In [14]:
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame

state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
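
Once levels are named, the names can be used in place of level numbers. The sketch below rebuilds the same frame and refers to levels by name; the `groupby(level=...)` call is just one illustrative group-based operation, not something the text above prescribes.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Look up a level's labels by name rather than by position.
colors = list(frame.columns.get_level_values("color"))
key1_labels = list(frame.index.get_level_values("key1"))

# Sum the rows that share the same 'key2' label.
totals = frame.groupby(level="key2").sum()

print(colors, key1_labels)
print(totals)
```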


With partial column indexing you can similarly select groups of columns:

In [15]:
frame['Ohio']

color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10


In [16]:
frame['Colorado']

color      Green
key1 key2
a    1         2
     2         5
b    1         8
     2        11
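
Beyond selecting a whole outer group, a single column can be addressed with a full `(state, color)` tuple, and an inner column level can be selected with `xs`. A minimal sketch on the same frame (variable names are illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

ohio = frame["Ohio"]                  # partial key: DataFrame with the 'color' level left
ohio_red = frame[("Ohio", "Red")]     # full key: a single column as a Series

# Inner column level: every 'Green' column, keyed by state.
green = frame.xs("Green", level="color", axis=1)

print(ohio)
print(ohio_red)
print(green)
```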


A MultiIndex can be created by itself and then reused; the columns in the above
DataFrame with level names could be created like this:

In [17]:
pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                           ['Green', 'Red', 'Green']],
                          names=['state', 'color'])

MultiIndex([(    'Ohio', 'Green'),
            (    'Ohio',   'Red'),
            ('Colorado', 'Green')],
           names=['state', 'color'])
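
To illustrate the reuse, the sketch below builds the index once and passes it as `columns` to two different DataFrames. The second frame and the `from_tuples` variant are assumptions added for illustration; they are not part of the example above.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
    names=["state", "color"],
)

# The same MultiIndex object labels the columns of two separate frames.
frame1 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=columns)
frame2 = pd.DataFrame(np.zeros((2, 3)), columns=columns)
shared = frame1.columns.equals(frame2.columns)

# An equivalent index built from (state, color) pairs instead of parallel arrays.
from_tuples = pd.MultiIndex.from_tuples(
    [("Ohio", "Green"), ("Ohio", "Red"), ("Colorado", "Green")],
    names=["state", "color"],
)
equivalent = columns.equals(from_tuples)

print(shared, equivalent)
```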