# Hierarchical Indexing
- Hierarchical indexing is an important feature of pandas that lets you have two or more index levels on an axis.
- It provides a way to work with higher-dimensional data in a lower-dimensional form, such as a `Series` or `DataFrame`.

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [2]:
np.random.uniform(size=9)

array([0.6227876 , 0.16233994, 0.10613757, 0.15429082, 0.57367741,
       0.2716733 , 0.79279439, 0.568418  , 0.75192583])

In [4]:
data = pd.Series(np.random.uniform(size=9),
                index=[list('aaabbccdd'), [1,2,3,1,3,1,2,2,3]])
data

a  1    0.006064
   2    0.684851
   3    0.137637
b  1    0.548987
   3    0.348890
c  1    0.918848
   2    0.249424
d  2    0.114382
   3    0.126006
dtype: float64

In [5]:
data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )

In [6]:
data['b']

1    0.548987
3    0.348890
dtype: float64

In [7]:
data['b':'c']

b  1    0.548987
   3    0.348890
c  1    0.918848
   2    0.249424
dtype: float64

In [8]:
data.loc[['b', 'd']]

b  1    0.548987
   3    0.348890
d  2    0.114382
   3    0.126006
dtype: float64

Selection from an inner level is also possible. For example, select all values whose second-level index label is 2:

In [9]:
data.loc[:, 2]

a    0.684851
c    0.249424
d    0.114382
dtype: float64
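The same inner-level selection can also be written with `Series.xs`, which names the level explicitly. The following is a minimal sketch; the Series `s` and its values are made up for illustration, and only its index layout mirrors `data` above.

```python
import numpy as np
import pandas as pd

# Hypothetical values 0.0 .. 8.0; the two-level index matches `data` above.
s = pd.Series(np.arange(9.0),
              index=[list('aaabbccdd'), [1, 2, 3, 1, 3, 1, 2, 2, 3]])

# Cross-section on the inner level (level=1): keep entries labelled 2.
# The selected level is dropped, so the result is indexed by the outer level.
inner_two = s.xs(2, level=1)

# Same selection with .loc: every outer label, inner label 2.
same = s.loc[:, 2]

print(inner_two)
```

Naming the level in `xs` tends to read more clearly than positional slicing once an index has more than two levels.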

In [10]:
data.unstack()

          1         2         3
a  0.006064  0.684851  0.137637
b  0.548987       NaN  0.348890
c  0.918848  0.249424       NaN
d       NaN  0.114382  0.126006


In [11]:
data.unstack().stack()

a  1    0.006064
   2    0.684851
   3    0.137637
b  1    0.548987
   3    0.348890
c  1    0.918848
   2    0.249424
d  2    0.114382
   3    0.126006
dtype: float64
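`unstack` pivots the innermost level into columns by default, and combinations missing from the index (such as `('b', 2)` above) come back as `NaN`. The sketch below shows two variations, assuming a made-up Series `s` with the same index layout as `data`: filling the gaps with `fill_value`, and pivoting the outer level instead.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the index layout mirrors `data` above.
s = pd.Series(np.arange(9.0),
              index=[list('aaabbccdd'), [1, 2, 3, 1, 3, 1, 2, 2, 3]])

# Pivot the inner level into columns, filling missing combinations with 0.0
# instead of NaN.
wide = s.unstack(fill_value=0.0)

# Pivot the outer level (level=0) instead: rows are 1, 2, 3 and
# columns are 'a' .. 'd'. Missing combinations stay NaN here.
by_outer = s.unstack(level=0)

print(wide)
print(by_outer)
```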

With a DataFrame, either axis (rows, columns, or both) can have a hierarchical index.

In [12]:
frame = pd.DataFrame(np.arange(12).reshape((4,3)),
                    index=[['a', 'a', 'b', 'b'], [1,2,1,2]],
                    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
frame

     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11


In [16]:
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame

state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11


In [17]:
frame.index.nlevels

2

In [18]:
frame['Ohio']

color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
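Partial indexing with `frame['Ohio']` selects a group of columns by the outer level. Tuples address both levels at once, and `xs` can select on an inner column level by name. The sketch below rebuilds the same `frame` so that it runs on its own; the names `ohio_red`, `cell` and `greens` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from above, including the level names.
frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# A full column key is a tuple with one label per level.
ohio_red = frame[('Ohio', 'Red')]

# Tuples on both axes pick out a single cell.
cell = frame.loc[('b', 1), ('Colorado', 'Green')]

# Select on the inner column level by name; the 'color' level is dropped.
greens = frame.xs('Green', axis=1, level='color')

print(ohio_red)
print(cell)
print(greens)
```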


A `MultiIndex` can also be created on its own and then reused. For example:

In [19]:
pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
                            names=['state', 'color'])

MultiIndex([(    'Ohio', 'Green'),
            (    'Ohio',   'Red'),
            ('Colorado', 'Green')],
           names=['state', 'color'])
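Besides `from_arrays`, pandas offers other constructors such as `MultiIndex.from_tuples` and `MultiIndex.from_product`. The sketch below builds the same index from tuples, builds a full cross product of labels, and reuses the index as the columns of a new DataFrame. The variable names and the DataFrame contents are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Same index as above, built from (state, color) tuples.
cols = pd.MultiIndex.from_tuples(
    [('Ohio', 'Green'), ('Ohio', 'Red'), ('Colorado', 'Green')],
    names=['state', 'color'])

# The from_arrays form used above, for comparison.
from_arrays = pd.MultiIndex.from_arrays(
    [['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
    names=['state', 'color'])

# Every combination of the given labels (2 states x 2 colors = 4 entries).
full = pd.MultiIndex.from_product(
    [['Ohio', 'Colorado'], ['Green', 'Red']],
    names=['state', 'color'])

# Reuse the standalone index as the columns of a new DataFrame.
frame = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=cols)

print(cols.equals(from_arrays))
print(full)
print(frame)
```

`from_product` is convenient when every combination of labels is present; `from_tuples` or `from_arrays` fit irregular combinations like the one shown here.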