# Hierarchical Indexing

In [1]:
import numpy as np
import pandas as pd

**A Multiply Indexed Series**

In [18]:
index = [('California', 2000), ('California', 2010), ('New York', 2000), ('New York', 2010), ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]
pop = pd.Series(populations, index=index)

index = pd.MultiIndex.from_tuples(index)
index

MultiIndex(levels=[['California', 'New York', 'Texas'], [2000, 2010]],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

In [None]:
pop = pop.reindex(index)
pop[:, 2010]

In [None]:
pop_df = pop.unstack()
pop_df.stack()

In [20]:
pop_df = pd.DataFrame({'total': pop, 'under18': [9267089, 9284094, 4687374, 4318033, 5906301, 6879014]})
pop_df

| | | total | under18 |
|---|---|---|---|
| California | 2000 | 33871648 | 9267089 |
| California | 2010 | 37253956 | 9284094 |
| New York | 2000 | 18976457 | 4687374 |
| New York | 2010 | 19378102 | 4318033 |
| Texas | 2000 | 20851820 | 5906301 |
| Texas | 2010 | 25145561 | 6879014 |


In [41]:
f_u18 = pop_df['under18'] / pop_df['total']
f_u18.unstack()

| | 2000 | 2010 |
|---|---|---|
| California | 0.273594 | 0.249211 |
| New York | 0.24701 | 0.222831 |
| Texas | 0.283251 | 0.273568 |


**Methods of MultiIndex Creation**

In [None]:
df = pd.DataFrame(np.random.rand(4, 2), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=['data1', 'data2'])

data = {('California', 2000): 33871648, ('California', 2010): 37253956, ('Texas', 2000): 20851820, ('Texas', 2010): 25145561, ('New York', 2000): 18976457, ('New York', 2010): 19378102}
pd.Series(data)

**Explicit MultiIndex constructors**

In [None]:
pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]], names=['kobe', 'jordan'])
pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])
pd.MultiIndex.from_product([['a', 'b'], [1, 2]])    # From a Cartesian product of single indices
# The labels= keyword was renamed to codes= in pandas 0.24
pd.MultiIndex(levels=[['a', 'b'], [1, 2]], codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

Any of these objects can be passed as the index argument when creating a Series or DataFrame, or be passed to the reindex method of an existing Series or DataFrame.
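As a minimal sketch of both uses, the example below builds a small MultiIndex, passes it as the index of a new Series, and then passes a second MultiIndex to reindex. The level names (`letter`, `number`), the values, and the variable names are made up for illustration and do not come from the cells above.

```python
import pandas as pd

# Build the index once; the level names are illustrative assumptions.
mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2]], names=['letter', 'number'])

# 1) Pass it as the index argument when creating a Series.
s = pd.Series([10, 20, 30, 40], index=mi)

# 2) Pass another MultiIndex to reindex on an existing Series.
#    ('c', 1) is absent from s, so it is filled with NaN.
wanted = pd.MultiIndex.from_tuples(
    [('b', 2), ('a', 1), ('c', 1)], names=['letter', 'number']
)
s2 = s.reindex(wanted)
print(s2)
```

Note that reindex reorders the existing entries to follow the new index and inserts missing values for keys that were not present.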

**Indexing and Slicing a MultiIndex**

......

In [None]:
health_data.loc[:, ('Bob', 'HR')]

In [56]:
idx = pd.IndexSlice
health_data.loc[idx[:, 1], idx[:, 'HR']]

**Rearranging Multi-Indices**

**Sorted and unsorted indices**  
Many MultiIndex slicing operations fail if the index is not sorted. Partial slices and similar operations require the levels in the MultiIndex to be in sorted (i.e., lexicographical) order.

In [None]:
data.sort_index()
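The following sketch illustrates the point with a deliberately unsorted index. The Series `unsorted` and its values are assumptions made for this example, not data from the notebook. The partial slice is wrapped in a try/except because the exact error raised on an unsorted index can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# First level is deliberately out of order: 'a', 'c', 'b'.
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
unsorted = pd.Series(np.arange(6.0), index=index)

try:
    unsorted['a':'b']            # partial slice on an unsorted index
    slice_failed = False
except KeyError as err:          # pandas' UnsortedIndexError subclasses KeyError
    slice_failed = True
    print('slice failed:', err)

# After sorting, the same partial slice works.
sorted_data = unsorted.sort_index()
print(sorted_data['a':'b'])
```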

**Stacking and unstacking indices**

In [None]:
pop.unstack(level=0)
pop.unstack(level=1)
pop.unstack().stack()

**Index setting and resetting**  
Another way to rearrange hierarchical data is to turn the index labels into columns; this can be accomplished with the reset_index method. For clarity, we can optionally specify the name of the data for the column representation:

In [None]:
pop_flat = pop.reset_index(name='population')

Conversely, it is often useful to build a MultiIndex from column values, which the set_index method does:

In [None]:
pop_flat.set_index(['state', 'year'])
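A self-contained round trip might look like the sketch below. It assumes the index levels are named `state` and `year`; without such names, reset_index produces generic column names and `set_index(['state', 'year'])` would not find the columns. Only a subset of the population figures is used, to keep the example short.

```python
import pandas as pd

# Level names are an assumption for this sketch.
index = pd.MultiIndex.from_product(
    [['California', 'New York'], [2000, 2010]],
    names=['state', 'year'],
)
pop = pd.Series([33871648, 37253956, 18976457, 19378102], index=index)

# Index labels become ordinary columns; the data column is named 'population'.
pop_flat = pop.reset_index(name='population')
print(pop_flat)

# Rebuild the MultiIndex from the 'state' and 'year' columns.
restored = pop_flat.set_index(['state', 'year'])['population']
print(restored)
```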

**Data Aggregations on Multi-Indices**

In [None]:
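# mean(level=...) was removed in pandas 2.0; group by the index level instead
data_mean = health_data.groupby(level='year').mean()
# Average over the 'type' column level by grouping the transposed frame
data_mean.T.groupby(level='type').mean().T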

In [5]:
t = pd.DataFrame(np.random.rand(9).reshape((3,3)), columns=list('kob'))

In [6]:
t

| | k | o | b |
|---|---|---|---|
| 0 | 0.421528 | 0.364321 | 0.723384 |
| 1 | 0.172087 | 0.647029 | 0.02696 |
| 2 | 0.447667 | 0.839699 | 0.698248 |


In [12]:
a = t.set_index('k')

In [16]:
a.reindex(list('kkp'))

| k | o | b |
|---|---|---|
| k | NaN | NaN |
| k | NaN | NaN |
| p | NaN | NaN |
