# Hierarchical Indexing and Leveling

Hierarchical indexing is an important feature of pandas: it lets you have multiple index levels on a single axis. In effect, it gives you a way to work with higher-dimensional data while staying in a two-dimensional structure.

In [2]:
import numpy as np
import pandas as pd

In [3]:
mser = pd.Series(np.random.rand(8), 
                index=[['white','white','white','blue','blue','red','red','red'],
                        ['up','down','right','up','down','up','down','left']])

In [4]:
mser

white  up       0.786982
       down     0.846764
       right    0.570307
blue   up       0.757272
       down     0.483206
red    up       0.308831
       down     0.418737
       left     0.880981
dtype: float64

In [5]:
mser.index

MultiIndex(levels=[['blue', 'red', 'white'], ['down', 'left', 'right', 'up']],
           codes=[[2, 2, 2, 0, 0, 1, 1, 1], [3, 0, 2, 3, 0, 3, 0, 1]])
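
The `levels` and `codes` shown above fit together as follows: each level stores its unique labels once, and `codes` holds, for every row, the position of that row's label inside the level. A minimal sketch, assuming a smaller hypothetical index (`idx` is not part of the original notebook):

```python
import pandas as pd

# Hypothetical two-level index, smaller than the one used for mser
idx = pd.MultiIndex.from_arrays(
    [['white', 'white', 'blue', 'red'],
     ['up', 'down', 'up', 'down']]
)

# Unique labels of the outer level and the per-row positions into them
outer_levels = list(idx.levels[0])
outer_codes = list(idx.codes[0])

# Decoding: the label of row i is levels[codes[i]]
decoded = [outer_levels[c] for c in outer_codes]
print(outer_levels, outer_codes, decoded)
```

Decoding the codes gives back the outer labels in their original row order, which is why the same label can repeat many times without being stored more than once.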

In [7]:
mi = pd.MultiIndex(levels=[['sekar', 'saskia', 'arifa'], [0, 1, 2]], 
                  codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])

In [8]:
mi

MultiIndex(levels=[['sekar', 'saskia', 'arifa'], [0, 1, 2]],
           codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
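
When every combination of labels is present, as in `mi`, writing the codes by hand is not strictly necessary. The sketch below uses `pd.MultiIndex.from_product` to build an index with the same row labels; the names `mi_manual`, `mi_product`, and `same_labels` are introduced here for illustration only.

```python
import pandas as pd

# Explicit construction, as in the cell above
mi_manual = pd.MultiIndex(
    levels=[['sekar', 'saskia', 'arifa'], [0, 1, 2]],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
)

# Cartesian product of the two label lists
mi_product = pd.MultiIndex.from_product([['sekar', 'saskia', 'arifa'], [0, 1, 2]])

# Compare the (outer, inner) tuples row by row
same_labels = list(mi_manual) == list(mi_product)
print(same_labels)
```

The comparison is done on the label tuples rather than on `levels` and `codes`, because pandas may store the levels in a different internal order for the two constructions.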

In [17]:
s = pd.Series(np.random.randint(100, 200, 9), index=mi)

In [18]:
s

sekar   0    190
        1    176
        2    118
saskia  0    175
        1    181
        2    161
arifa   0    107
        1    165
        2    101
dtype: int32

In [19]:
s['sekar']

0    190
1    176
2    118
dtype: int32

In [20]:
s[:, 1]

sekar     176
saskia    181
arifa     165
dtype: int32
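
The two selections above can also be written with `.loc` and `.xs`, which some readers find more explicit. This is a sketch on a hypothetical Series with deterministic values (`np.arange` instead of random integers), so it does not reproduce the numbers shown in the outputs above.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([['sekar', 'saskia', 'arifa'], [0, 1, 2]])
s = pd.Series(np.arange(100, 109), index=mi)  # deterministic stand-in values

outer = s.loc['sekar']          # all inner labels for the outer label 'sekar'
inner = s.xs(1, level=1)        # inner label 1 for every outer label, like s[:, 1]
single = s.loc[('saskia', 2)]   # one scalar, addressed by a full (outer, inner) key
print(outer, inner, single, sep='\n')
```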

In [21]:
# pivots the inner index level into columns, turning the Series into a DataFrame
s.unstack()

          0    1    2
sekar   190  176  118
saskia  175  181  161
arifa   107  165  101


In [23]:
s.unstack().stack()

sekar   0    190
        1    176
        2    118
saskia  0    175
        1    181
        2    161
arifa   0    107
        1    165
        2    101
dtype: int32

In [24]:
s.unstack().T

   sekar  saskia  arifa
0    190     175    107
1    176     181    165
2    118     161    101
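
Transposing after `unstack()` is one way to put the outer level in the columns. Another is to tell `unstack` which level to pivot. The sketch below uses a hypothetical Series with labels `'a'`, `'b'`, `'c'` and checks that both routes give the same table.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([['a', 'b', 'c'], [0, 1, 2]])
s = pd.Series(np.arange(9), index=mi)

inner_as_columns = s.unstack()          # default level=-1: rows a/b/c, columns 0/1/2
outer_as_columns = s.unstack(level=0)   # rows 0/1/2, columns a/b/c

# Same values as transposing the default result
matches_transpose = np.array_equal(outer_as_columns.values,
                                   inner_as_columns.T.values)
print(outer_as_columns)
```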


In [26]:
frame = pd.DataFrame(np.arange(16).reshape((4,4)),
                     index=['red','blue','yellow','white'],
                     columns=['ball','pen','pencil','paper'])
frame

        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15


In [27]:
frame.stack()

red     ball       0
        pen        1
        pencil     2
        paper      3
blue    ball       4
        pen        5
        pencil     6
        paper      7
yellow  ball       8
        pen        9
        pencil    10
        paper     11
white   ball      12
        pen       13
        pencil    14
        paper     15
dtype: int32

In [28]:
frame.T.stack()

ball    red        0
        blue       4
        yellow     8
        white     12
pen     red        1
        blue       5
        yellow     9
        white     13
pencil  red        2
        blue       6
        yellow    10
        white     14
paper   red        3
        blue       7
        yellow    11
        white     15
dtype: int32

In [29]:
mframe = pd.DataFrame(np.random.randn(16).reshape(4,4),
                        index=[['white','white','red','red'], ['up','down','up','down']],
                        columns=[['pen','pen','paper','paper'],[1,2,1,2]])
mframe

                 pen               paper
                   1         2         1         2
white up    1.879602  0.301486 -0.647833  0.673755
      down -0.440464 -1.819184  0.628374 -0.137523
red   up   -0.836754  1.137609  0.100188  0.992703
      down   1.18901   0.88298  1.300226  0.524605


---

## Reordering and Sorting Levels

In [30]:
mframe.columns.names = ['objects', 'id']
mframe.index.names = ['colors', 'status']

In [31]:
mframe

objects             pen               paper
id                    1         2         1         2
colors status
white  up      1.879602  0.301486 -0.647833  0.673755
       down   -0.440464 -1.819184  0.628374 -0.137523
red    up     -0.836754  1.137609  0.100188  0.992703
       down     1.18901   0.88298  1.300226  0.524605


In [32]:
mframe.swaplevel('colors', 'status')

objects             pen               paper
id                    1         2         1         2
status colors
up     white   1.879602  0.301486 -0.647833  0.673755
down   white  -0.440464 -1.819184  0.628374 -0.137523
up     red    -0.836754  1.137609  0.100188  0.992703
down   red      1.18901   0.88298  1.300226  0.524605
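
`swaplevel` only reorders the levels; it leaves the rows where they were. To sort the rows by the labels of one level, `sort_index(level=...)` can be used, optionally after swapping. The sketch below assumes a small hypothetical frame (`demo`) with the same index labels as `mframe` but integer values and single-level columns.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    np.arange(8).reshape(4, 2),
    index=pd.MultiIndex.from_arrays(
        [['white', 'white', 'red', 'red'], ['up', 'down', 'up', 'down']],
        names=['colors', 'status']),
    columns=['pen', 'paper'],
)

# Sort rows by 'colors' first; remaining levels break ties
by_colors = demo.sort_index(level='colors')

# Swap the levels, then sort by the new outer level 'status'
by_status = demo.swaplevel('colors', 'status').sort_index(level='status')
print(by_colors, by_status, sep='\n\n')
```

Sorting after the swap groups all `down` rows before all `up` rows, which is usually what makes the swapped layout readable.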


---

## Summary Statistics by Level

In [35]:
# sum(level=...) was removed in pandas 2.0; group by the index level instead
mframe.groupby(level='colors', sort=False).sum()

objects       pen               paper
id              1         2         1         2
colors
white    1.439138 -1.517698 -0.019459  0.536232
red      0.352256  2.020589  1.400414  1.517308


In [36]:
mframe.groupby(level='colors', sort=False).mean()

objects       pen               paper
id              1         2         1         2
colors
white    0.719569 -0.758849  -0.00973  0.268116
red      0.176128  1.010295  0.700207  0.758654


In [37]:
mframe.groupby(level='status', sort=False).mean()

objects       pen               paper
id              1         2         1         2
status
up       0.521424  0.719548 -0.273822  0.833229
down     0.374273 -0.468102    0.9643  0.193541


In [38]:
# column levels: transpose, group by the level, aggregate, transpose back
mframe.T.groupby(level='objects', sort=False).mean().T

objects             pen     paper
colors status
white  up      1.090544  0.012961
       down   -1.129824  0.245425
red    up      0.150427  0.546446
       down    1.035995  0.912415


In [39]:
mframe.T.groupby(level='id', sort=False).mean().T

id                    1         2
colors status
white  up      0.615885  0.487621
       down    0.093955 -0.978354
red    up     -0.368283  1.065156
       down    1.244618  0.703793


---

# Conclusions

In this chapter, the pandas library has been introduced. You have learned how to install it and have seen a general overview of its characteristics. In more detail, you saw the two basic data structures, Series and DataFrame, along with their operations and main characteristics. In particular, you discovered the importance of indexing within these structures and how best to perform some operations on them. Finally, you looked at how to extend these structures with hierarchies of indexes, distributing the data they contain across different sub-levels. In the next chapter, you will see how to read data from external sources such as files and, conversely, how to write the results of your analysis to them.

In [41]:
mframe

objects             pen               paper
id                    1         2         1         2
colors status
white  up      1.879602  0.301486 -0.647833  0.673755
       down   -0.440464 -1.819184  0.628374 -0.137523
red    up     -0.836754  1.137609  0.100188  0.992703
       down     1.18901   0.88298  1.300226  0.524605


In [42]:
mframe.stack()

objects              paper       pen
colors status id
white  up     1  -0.647833  1.879602
              2   0.673755  0.301486
       down   1   0.628374 -0.440464
              2  -0.137523 -1.819184
red    up     1   0.100188 -0.836754
              2   0.992703  1.137609
       down   1   1.300226   1.18901
              2   0.524605   0.88298
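
By default `stack()` moves the innermost column level (`id` here) into the index. Passing `level=` selects a different column level. The sketch below assumes a hypothetical frame (`demo`) with the same labels and level names as `mframe` but integer values, and stacks the `objects` level instead.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_arrays(
    [['pen', 'pen', 'paper', 'paper'], [1, 2, 1, 2]], names=['objects', 'id'])
rows = pd.MultiIndex.from_arrays(
    [['white', 'white', 'red', 'red'], ['up', 'down', 'up', 'down']],
    names=['colors', 'status'])
demo = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# Move the outer column level 'objects' into the row index; 'id' stays as columns
stacked_objects = demo.stack(level='objects')
print(stacked_objects)
```

Some pandas 2.x releases may emit a `FutureWarning` about the stacking implementation here; the result should still have `id` as the only column level and `objects` as the innermost index level.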


---

# Important Points

- Series with hierarchical indexing:
    - can be created by passing a list of lists as the index (index=[['name1', 'name1', 'name2', 'name2'], [0, 1, 0, 1]])
    - or by creating a MultiIndex object first, specifying levels and codes
- DataFrame with hierarchical indexing:
    - same as Series, for both index and columns
- unstack(): pivots an index level into columns (turns a Series with a hierarchical index into a DataFrame)
- stack(): pivots a column level into the index (turns a DataFrame into a Series, or into a DataFrame with one fewer column level)
- object.index.names = ['name for first level', 'name for second level'] and so on
- object.columns.names = ['name for first level', 'name for second level'] and so on
- summary and descriptive statistics:
    - object.groupby(level=...).mean() (older pandas versions also accepted object.mean(level=..., axis=...))