# Reshaping with Hierarchical Indexing

Hierarchical indexing provides a consistent way to rearrange data in a DataFrame.

There are two primary actions:
* stack: this “rotates” or pivots from the columns in the data to the rows
* unstack: this pivots from the rows into the columns

I’ll illustrate these operations through a series of examples. Consider a small DataFrame with string arrays as row and column indexes:

In [100]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [104]:
data = DataFrame(np.arange(6).reshape((2, 3)),
                 index=pd.Index(['Ohio', 'Colorado'], name='state'),
                 columns=pd.Index(['one', 'two', 'three'], name='number'))
data

number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

Using the stack method on this data pivots the columns into the rows, producing a Series, which we store as result for the examples that follow:

In [106]:
result = data.stack()
result

state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

From a hierarchically indexed Series, you can rearrange the data back into a DataFrame with unstack:

In [107]:
result.unstack()

number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5
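
To make the round trip concrete, here is a minimal, self-contained sketch that rebuilds the same small frame, stacks it, and checks that unstack restores it. The names result and restored are only illustrative; stack, unstack and equals are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with named row and column indexes
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# stack: the column labels become the inner level of a two-level row index
result = data.stack()
print(list(result.index.names))     # ['state', 'number']
print(result.loc[('Ohio', 'two')])  # 1

# unstack: the inner level moves back out into the columns
restored = result.unstack()
print(restored.equals(data))        # True
```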


By default, the innermost level is unstacked (the same is true of stack). You can unstack a different level by passing a level number or name:

In [108]:
result.unstack(0)

state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5


In [109]:
result.unstack('state')

state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
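
The two calls above are interchangeable, and unstacking the outer level gives the transpose of the default (inner-level) unstack. A small sketch checking both claims; by_number and by_name are hypothetical variable names chosen for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()

# Level position 0 and level name 'state' select the same (outer) level
by_number = result.unstack(0)
by_name = result.unstack('state')
print(by_number.equals(by_name))              # True

# Moving the outer level to the columns is the transpose of the default unstack
print(by_number.equals(result.unstack().T))   # True
print(by_name.loc['two', 'Colorado'])         # 4
```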


Unstacking may introduce missing data if not all of the values in the level are found in each subgroup:

In [110]:
s1 = Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

s2 = Series([4, 5, 6], index=['c', 'd', 'e'])

s1, s2

(a    0
 b    1
 c    2
 d    3
 dtype: int64,
 c    4
 d    5
 e    6
 dtype: int64)

In [111]:
data2 = pd.concat([s1, s2], keys=['one', 'two'])

data2

one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64
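
The keys argument to concat is what gives data2 its two-level index: each key labels one input Series as a new outer level. A brief sketch of that, reusing the s1 and s2 values from above.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])

# keys= labels each input, adding an outer level to the index
data2 = pd.concat([s1, s2], keys=['one', 'two'])
print(data2.index.nlevels)          # 2

# Selecting an outer key returns the corresponding original piece
print(data2.loc['two'].equals(s2))  # True
```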

In [112]:
data2.unstack()

       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
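
If NaN placeholders are unwanted, unstack also accepts a fill_value argument. This sketch counts the missing cells produced by the default call and then fills them with 0 instead; wide and filled are illustrative names.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

# Default: labels absent from a subgroup become NaN, so the values turn float64
wide = data2.unstack()
print(int(wide.isna().sum().sum()))  # 3  ('one' lacks e; 'two' lacks a and b)

# fill_value substitutes a value of your choice for the missing cells
filled = data2.unstack(fill_value=0)
print(filled.loc['two', 'a'])        # 0
print(filled.loc['one', 'd'])        # 3
```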


Stacking filters out missing data by default, so the operation is easily inverted; the round trip returns the original entries, now as float64 because unstacking introduced NaN:

In [113]:
data2.unstack().stack()

one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64

Pass dropna=False to keep the missing entries instead:

In [114]:
data2.unstack().stack(dropna=False)

one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
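
A sketch of the round trip that does not depend on stack's default NaN handling: newer pandas releases change how stack treats NaN (the future_stack option introduced in pandas 2.1), so this version drops missing values explicitly with Series.dropna. Apart from the dtype, the result matches data2.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()          # 2 x 5 frame with three NaN cells
# Explicit dropna: same result whether or not stack drops NaN on its own
long = wide.stack().dropna()

# Same labels and values as data2; only the dtype changed (int64 -> float64)
print(long.dtype)                          # float64
print(long.astype('int64').equals(data2))  # True
```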

When you unstack a DataFrame, the unstacked level becomes the lowest level of the result's columns:

In [116]:
df = DataFrame({'left': result, 'right': result + 5},
               columns=pd.Index(['left', 'right'], name='side'))
df

side             left  right
state    number
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10


In [118]:
df.unstack('state')

side   left          right
state  Ohio Colorado  Ohio Colorado
number
one       0        3     5        8
two       1        4     6        9
three     2        5     7       10
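
A quick way to confirm where the unstacked level lands is to inspect the names of the resulting column MultiIndex. This sketch rebuilds df from scratch; wide is an illustrative name.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

wide = df.unstack('state')
# The existing column level 'side' stays on top; 'state' is added beneath it
print(list(wide.columns.names))                  # ['side', 'state']
print(wide.loc['three', ('right', 'Colorado')])  # 10
```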


In [119]:
df.unstack('state').stack('side')

state         Colorado  Ohio
number side
one    left          3     0
       right         8     5
two    left          4     1
       right         9     6
three  left          5     2
       right        10     7
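
As with unstack, stack accepts a level name, here moving 'side' from the columns back into the rows. The sketch below checks the resulting index and column names with label-based lookups, so it does not rely on the Colorado/Ohio column order, which may differ between pandas versions; long is an illustrative name.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

# Move 'state' into the columns, then push 'side' down into the rows
long = df.unstack('state').stack('side')
print(list(long.index.names))              # ['number', 'side']
print(long.columns.name)                   # state
print(long.loc[('two', 'right'), 'Ohio'])  # 6
```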
