# Creating MultiIndex (Hierarchical Index) Objects - Forming a DataFrame with a MultiIndex - Accessing Data from a MultiIndexed DataFrame

The MultiIndex object is the hierarchical analogue of the standard Index object which typically stores the axis labels in pandas objects. 

You can think of MultiIndex as an array of tuples where each tuple is unique. 

A MultiIndex can be created from a list of arrays (using **MultiIndex.from_arrays()**), an array of tuples (using **MultiIndex.from_tuples()**), a crossed set of iterables (using **MultiIndex.from_product()**), or a DataFrame (using **MultiIndex.from_frame()**).

The Index constructor will attempt to return a MultiIndex when it is passed a list of tuples. 
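The promotion performed by the Index constructor can be checked directly. The sketch below uses a small, made-up list of tuples (`pairs` is an illustrative name) and also shows the `tupleize_cols=False` argument of `pd.Index`, which keeps the tuples as ordinary labels instead.

```python
import pandas as pd

# Illustrative data, not taken from the cells in this notebook.
pairs = [('bar', 'one'), ('bar', 'two'), ('baz', 'one')]

# Default behaviour: a list of tuples is promoted to a MultiIndex.
promoted = pd.Index(pairs)
print(type(promoted).__name__)   # MultiIndex
print(promoted.nlevels)          # 2

# tupleize_cols=False keeps each tuple as a single, atomic label.
flat = pd.Index(pairs, tupleize_cols=False)
print(isinstance(flat, pd.MultiIndex))   # False
print(flat.nlevels)                      # 1
```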

The following examples demonstrate different ways to initialize MultiIndexes.

In [1]:
import pandas as pd
import numpy as np

In [2]:
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

In [3]:
tuples = list(zip(*arrays))

In [4]:
tuples

[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

### From Tuples

In [5]:
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [6]:
s = pd.Series(np.random.randn(8), index=index)
s

first  second
bar    one       0.833750
       two       0.097797
baz    one      -0.340482
       two       0.924362
foo    one      -0.290514
       two      -1.910361
qux    one      -0.788704
       two       0.511412
dtype: float64
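To connect the "array of tuples" description with what pandas stores, the following sketch inspects a small MultiIndex through a few of its public attributes. The four-tuple index is a shortened, illustrative version of the one built above.

```python
import pandas as pd

tuples = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

print(index.nlevels)             # 2
print(index.is_unique)           # True: no tuple is repeated
print(list(index.levels[0]))     # distinct labels of level 0: ['bar', 'baz']
print(index.codes[0].tolist())   # positions into levels[0]: [0, 0, 1, 1]

# All labels of one level, aligned with the rows of the index.
print(list(index.get_level_values('second')))   # ['one', 'two', 'one', 'two']
```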

### From Product

When you want every pairing of the elements in two iterables, it can be easier to use the MultiIndex.from_product() method:

In [7]:
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
mindex

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [8]:
df = pd.DataFrame(np.random.randn(8), index=mindex, columns=['Value'])
df

                 Value
first second
bar   one    -0.168288
      two     1.235525
baz   one     0.106039
      two     1.039472
foo   one     0.570929
      two     0.499319
qux   one    -0.745725
      two     0.668846


### From DataFrame

You can also construct a MultiIndex from a DataFrame directly, using the method MultiIndex.from_frame(). This is a complementary method to MultiIndex.to_frame().

In [9]:
df = pd.DataFrame([['bar', 'one'], ['bar', 'two'],
                   ['foo', 'one'], ['foo', 'two']],
                  columns=['first', 'second'])


pd.MultiIndex.from_frame(df)

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('foo', 'one'),
            ('foo', 'two')],
           names=['first', 'second'])
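Because from_frame() and to_frame() are complementary, a round trip should give back an equal index. The sketch below checks this on a small example index; `mi`, `frame`, and `rebuilt` are illustrative names.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                names=['first', 'second'])

# index=False returns the levels as ordinary columns with a default index.
frame = mi.to_frame(index=False)
print(list(frame.columns))    # ['first', 'second']

# Rebuilding from that frame recovers an equal MultiIndex, names included.
rebuilt = pd.MultiIndex.from_frame(frame)
print(rebuilt.equals(mi))     # True
print(list(rebuilt.names))    # ['first', 'second']
```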

### From Arrays

As a convenience, you can pass a list of arrays directly into Series or DataFrame to construct a MultiIndex automatically:

In [10]:
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

s = pd.Series(np.random.randn(8), index=arrays)
s

bar  one   -2.584272
     two   -1.903928
baz  one   -0.237914
     two    0.371016
foo  one    1.141270
     two    0.585330
qux  one    0.166543
     two    0.269481
dtype: float64

In [11]:
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df

                0         1         2         3
bar one -0.831120  0.699244  0.829856 -1.476933
    two  0.083280  0.649442 -1.018775  0.451348
baz one  0.047405 -0.607770 -0.726212  0.326543
    two -0.380054 -0.405311 -0.598826  0.667852
foo one -1.286423  0.436432  1.825972  0.441394
    two -0.614868 -0.459946  1.101575  0.925332
qux one -0.679105  0.489705 -1.064863  1.414558
    two -1.539423  0.259661  0.879745 -1.279806
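The shortcut above builds the index implicitly. For comparison, here is a minimal sketch of the explicit MultiIndex.from_arrays() call mentioned at the start of this notebook, using a shortened, illustrative pair of arrays. The explicit call is the one that lets you pass names up front.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]

# Explicit construction, with level names supplied at creation time.
explicit = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Implicit construction: pandas builds an unnamed MultiIndex for the Series.
implicit = pd.Series(np.arange(4), index=arrays).index

print(list(explicit.names))        # ['first', 'second']
print(list(implicit.names))        # [None, None]
print(implicit.equals(explicit))   # True: equals() compares labels, not names
```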


## Setting Names on MultiIndex Levels

All of the MultiIndex constructors accept a names argument which stores string names for the levels themselves. If no names are provided, None will be assigned:

In [12]:
df.index.names

FrozenList([None, None])

In [13]:
df.index.names = ["First", "Second"]
df

                     0         1         2         3
First Second
bar   one    -0.831120  0.699244  0.829856 -1.476933
      two     0.083280  0.649442 -1.018775  0.451348
baz   one     0.047405 -0.607770 -0.726212  0.326543
      two    -0.380054 -0.405311 -0.598826  0.667852
foo   one    -1.286423  0.436432  1.825972  0.441394
      two    -0.614868 -0.459946  1.101575  0.925332
qux   one    -0.679105  0.489705 -1.064863  1.414558
      two    -1.539423  0.259661  0.879745 -1.279806
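Assigning to index.names changes the DataFrame in place. If you would rather leave the original object untouched, one possible alternative is sketched below with Index.set_names() and DataFrame.rename_axis(), both of which return new objects. The small frame and its column names `x` and `y` are made up for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=idx, columns=['x', 'y'])

# set_names returns a new index; df.index itself is left unchanged.
named = df.index.set_names(['First', 'Second'])
print(list(df.index.names))        # [None, None]
print(list(named.names))           # ['First', 'Second']

# rename_axis does the same at the DataFrame level and returns a new frame.
renamed = df.rename_axis(index=['First', 'Second'])
print(list(renamed.index.names))   # ['First', 'Second']
```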


## MultiIndex Columns

A MultiIndex can back any axis of a pandas object, and the number of levels of the index is up to you:

In [14]:
index

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [15]:
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)

In [16]:
df

first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
A       1.956486  2.151983  0.499499 -0.670543  0.218849  1.846146  1.522105 -0.130854
B       0.054652 -0.663455 -0.938688  0.417795  0.368269 -1.252605 -0.616281 -1.365875
C      -0.712757  1.145038  0.629326 -0.369060  1.191872 -1.643647  1.639173 -0.048317
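The example above uses two levels, but nothing restricts you to two. As a small sketch, the following builds a three-level column index with from_product(); the third level and its labels `x` and `y` are invented for illustration.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [['bar', 'baz'], ['one', 'two'], ['x', 'y']],
    names=['first', 'second', 'third'],
)
# One column per combination: 2 * 2 * 2 = 8 columns.
df = pd.DataFrame(np.zeros((2, len(cols))), index=['A', 'B'], columns=cols)

print(df.columns.nlevels)   # 3
print(df.shape)             # (2, 8)
```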


It’s worth keeping in mind that there’s nothing preventing you from using tuples as atomic labels on an axis:

In [17]:
pd.Series(np.random.randn(8), index=tuples)

(bar, one)    1.737675
(bar, two)   -1.188568
(baz, one)    0.267118
(baz, two)   -0.944693
(foo, one)   -1.862257
(foo, two)    0.511478
(qux, one)    0.807032
(qux, two)    0.220280
dtype: float64

The MultiIndex matters because it lets you perform grouping, selection, and reshaping operations, as described below and in later sections of this notebook.

When loading data from a file, you may wish to generate your own MultiIndex while preparing the data set.
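As a rough sketch of what that can look like, the example below reads a small CSV and builds the MultiIndex in two ways: through the `index_col` argument of `pd.read_csv`, and through `set_index` after loading. The file contents and column names are hypothetical, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical file contents; io.StringIO stands in for a real path.
csv_text = """first,second,value
bar,one,1.0
bar,two,2.0
baz,one,3.0
baz,two,4.0
"""

# Option 1: let read_csv build the MultiIndex from the named columns.
df1 = pd.read_csv(io.StringIO(csv_text), index_col=['first', 'second'])

# Option 2: load a flat table, then promote two columns to index levels.
df2 = pd.read_csv(io.StringIO(csv_text)).set_index(['first', 'second'])

print(df1.index.nlevels)                # 2
print(list(df1.index.names))            # ['first', 'second']
print(df1.index.equals(df2.index))      # True
print(df1.loc[('baz', 'one'), 'value']) # 3.0
```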

## Accessing Data from a MultiIndexed DataFrame

### Access Data from MultiIndexed Rows

In [30]:
tuples = [('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

mindex = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
mindex

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [32]:
df = pd.DataFrame(np.random.randn(8, 3), columns=['A', 'B', 'C'], index=mindex)
df

                     A         B         C
first second
bar   one     0.924473 -0.347282 -0.388588
      two     1.206788 -0.364663 -1.173634
baz   one     1.308784 -0.287066  0.208028
      two     0.342473 -1.272328  0.298117
foo   one     1.150158 -0.224495 -0.526445
      two     0.135815 -0.731925  2.015287
qux   one     1.511933 -0.461144  0.593067
      two     1.200000  0.389731 -0.765996


In [33]:
df.loc[('foo','one'),:]

A    1.150158
B   -0.224495
C   -0.526445
Name: (foo, one), dtype: float64

In [34]:
df.loc[('foo',slice(None)),:]

                     A         B         C
first second
foo   one     1.150158 -0.224495 -0.526445
      two     0.135815 -0.731925  2.015287


In [38]:
df.loc[('foo','two'),'C']

2.015286743798879

In [39]:
df.loc[(['foo','baz'],'one'),:]

                     A         B         C
first second
baz   one     1.308784 -0.287066  0.208028
foo   one     1.150158 -0.224495 -0.526445


In [40]:
df.loc[(['foo','baz'],slice(None)),:]

                     A         B         C
first second
baz   one     1.308784 -0.287066  0.208028
      two     0.342473 -1.272328  0.298117
foo   one     1.150158 -0.224495 -0.526445
      two     0.135815 -0.731925  2.015287
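The slice(None) placeholders used above can get hard to read. One possible alternative, sketched here on a small deterministic frame rather than the random one above, is `pd.IndexSlice`, which allows the usual `:` syntax inside `.loc`. `DataFrame.xs()` is shown as well for selecting on the inner level.

```python
import numpy as np
import pandas as pd

mindex = pd.MultiIndex.from_product([['bar', 'baz', 'foo'], ['one', 'two']],
                                    names=['first', 'second'])
df = pd.DataFrame(np.arange(18).reshape(6, 3), index=mindex,
                  columns=['A', 'B', 'C'])

idx = pd.IndexSlice

# Same rows as df.loc[('foo', slice(None)), :]
a = df.loc[idx['foo', :], :]
b = df.loc[('foo', slice(None)), :]
print(a.equals(b))   # True

# Select on the inner level only: every 'one' row, whatever the outer label.
inner = df.loc[idx[:, 'one'], :]
print(inner.shape)   # (3, 3)

# xs takes the same cross-section but drops the selected level by default.
cross = df.xs('one', level='second')
print(list(cross.index))   # ['bar', 'baz', 'foo']
```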


### Access Data from MultiIndexed Columns

In [18]:
tuples = [('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

mindex = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
mindex


MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [20]:
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=mindex)
df

first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
A      -0.509925  0.855016 -0.082400  1.105906 -1.372405 -0.015421  0.537703  0.417029
B      -0.509291 -0.577665  0.916687  0.815328  1.971836  0.887451  1.819334  0.475621
C       0.642399  0.046598  0.536567 -0.556607  0.397714 -1.011728  1.111485 -0.455608


In [23]:
df.loc['A',:]

first  second
bar    one      -0.509925
       two       0.855016
baz    one      -0.082400
       two       1.105906
foo    one      -1.372405
       two      -0.015421
qux    one       0.537703
       two       0.417029
Name: A, dtype: float64

In [25]:
df.loc['A',('bar', 'one')]

-0.5099250053323966

In [27]:
df.loc['A',('bar', slice(None))]

first  second
bar    one      -0.509925
       two       0.855016
Name: A, dtype: float64

In [28]:
df.loc['A',(['bar','foo'], slice(None))]

first  second
bar    one      -0.509925
       two       0.855016
foo    one      -1.372405
       two      -0.015421
Name: A, dtype: float64

In [29]:
df.loc[['A','B'],(['bar','foo'], slice(None))]

first        bar                 foo
second       one       two       one       two
A      -0.509925  0.855016 -1.372405 -0.015421
B      -0.509291 -0.577665  1.971836  0.887451
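Besides `.loc`, plain `[]` indexing and `xs()` also work on MultiIndexed columns. The following sketch uses a small deterministic frame (not the random one above) so the selected values are easy to follow.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                  names=['first', 'second'])
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=['A', 'B', 'C'],
                  columns=cols)

# [] with an outer label returns the sub-frame and drops that level.
sub = df['bar']
print(list(sub.columns))   # ['one', 'two']

# A full tuple picks out a single column as a Series.
col = df[('foo', 'two')]
print(col.tolist())        # [3, 7, 11]

# xs with axis=1 selects on the inner column level.
ones = df.xs('one', axis=1, level='second')
print(list(ones.columns))  # ['bar', 'foo']
```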
