## Multi-Indexing (Hierarchical Indexing)

### Importing `pandas`

In [1]:
import pandas as pd

For this example, we create a simple table holding the population of three US states for two different years (2010 and 2011).

In [2]:
# defining a table (DataFrame); note that the 'Country' column holds US state names
df = pd.DataFrame({'Country': ['California', 'California', 'New York', 'New York', 'Texas', 'Texas'],
                   'Year': [2010, 2011, 2010, 2011, 2010, 2011],
                   'Population': [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]})

In [3]:
df

| | Country | Year | Population |
|---|---|---|---|
| 0 | California | 2010 | 33871648 |
| 1 | California | 2011 | 37253956 |
| 2 | New York | 2010 | 18976457 |
| 3 | New York | 2011 | 19378102 |
| 4 | Texas | 2010 | 20851820 |
| 5 | Texas | 2011 | 25145561 |


To create a multi-index DataFrame, we pass a list of column names to the `set_index()` method.

In [10]:
# setting multiple columns as the index of the DataFrame
df_2 = df.set_index(['Country','Year'])

In [11]:
df_2

| Country | Year | Population |
|---|---|---|
| California | 2010 | 33871648 |
| California | 2011 | 37253956 |
| New York | 2010 | 18976457 |
| New York | 2011 | 19378102 |
| Texas | 2010 | 20851820 |
| Texas | 2011 | 25145561 |


Given a `csv` file, we can read it into `pandas` with the `pd.read_csv()` function and pass the list of columns that we want to use as the index to the `index_col` parameter, as shown [here](https://stackoverflow.com/questions/19103624/load-csv-to-pandas-multiindex-dataframe).
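As a minimal sketch of the `index_col` approach, the snippet below reads CSV text from an in-memory buffer instead of a real file. The `csv_text` contents and the `df_csv` name are made up for illustration; with a file on disk you would pass its path to `pd.read_csv()` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Country,Year,Population
California,2010,33871648
California,2011,37253956
Texas,2010,20851820
Texas,2011,25145561
"""

# index_col takes a list of column names (or positions) and builds a MultiIndex from them
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=['Country', 'Year'])

print(df_csv.index.names)
print(df_csv.loc[('Texas', 2011), 'Population'])
```

The result has the same shape as a frame built with `set_index(['Country', 'Year'])`, so the two approaches should be interchangeable for this kind of data.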

### Accessing individual data from a `MultiIndex` table

In [12]:
#accessing an element from the multiindex table
df_2.loc['Texas',2010]

Population    20851820
Name: (Texas, 2010), dtype: int64
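Beyond a full `(state, year)` key, a `MultiIndex` also supports partial selection. The sketch below rebuilds the same table so that it runs on its own; the variable names `texas`, `year_2010` and `value` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['California', 'California', 'New York', 'New York', 'Texas', 'Texas'],
                   'Year': [2010, 2011, 2010, 2011, 2010, 2011],
                   'Population': [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]})
df_2 = df.set_index(['Country', 'Year'])

# all rows for one outer-level label; the outer level is dropped from the result
texas = df_2.loc['Texas']

# all rows for one inner-level label, selected with xs() on the named level
year_2010 = df_2.xs(2010, level='Year')

# a single scalar: the row key as a tuple, followed by the column name
value = df_2.loc[('Texas', 2010), 'Population']

print(texas)
print(year_2010)
print(value)
```

Writing the row key as an explicit tuple, as in the last selection, keeps it visually separate from the column label.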

In [38]:
# a plain table to which the explicitly created indexes below are attached
table = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6],
                      'col2': [6, 7, 8, 9, 10, 11]})

### Explicitly creating a multi-index table

In [39]:
#explicitly creating a multiindex using .from_product()
list1 = ['A','B','C']
list2 = ['one', 'two']

index1 = pd.MultiIndex.from_product([list1,list2])

In [40]:
index1

MultiIndex([('A', 'one'),
            ('A', 'two'),
            ('B', 'one'),
            ('B', 'two'),
            ('C', 'one'),
            ('C', 'two')],
           )

In [41]:
table1 = table.set_index(index1)
display(table1)

| | | col1 | col2 |
|---|---|---|---|
| A | one | 1 | 6 |
| A | two | 2 | 7 |
| B | one | 3 | 8 |
| B | two | 4 | 9 |
| C | one | 5 | 10 |
| C | two | 6 | 11 |
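The levels of `index1` are unnamed, which is why the two index columns in the output above have no meaningful header. `from_product()` accepts a `names` argument to label them. A small sketch follows, where the level names `'letter'` and `'number'` are chosen purely for illustration:

```python
import pandas as pd

table = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6],
                      'col2': [6, 7, 8, 9, 10, 11]})

# 'letter' and 'number' are illustrative level names, not part of the original example
named_index = pd.MultiIndex.from_product([['A', 'B', 'C'], ['one', 'two']],
                                         names=['letter', 'number'])
named_table = table.set_index(named_index)

# select every row whose inner level equals 'one', addressing the level by name
ones = named_table.xs('one', level='number')
print(ones)
```

Named levels make calls such as `xs()` easier to read than positional level numbers.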


In [42]:
#explicitly creating a multiindex using from_tuples()
tup = [('A','one'),('A','two'),('B','one'),('B','two'),('C','one'),('C','two')]
index2 = pd.MultiIndex.from_tuples(tup)
index2

MultiIndex([('A', 'one'),
            ('A', 'two'),
            ('B', 'one'),
            ('B', 'two'),
            ('C', 'one'),
            ('C', 'two')],
           )

In [44]:
table2 = table.set_index(index2)
table2

| | | col1 | col2 |
|---|---|---|---|
| A | one | 1 | 6 |
| A | two | 2 | 7 |
| B | one | 3 | 8 |
| B | two | 4 | 9 |
| C | one | 5 | 10 |
| C | two | 6 | 11 |


In [47]:
#explicitly creating a multiindex using from_arrays()
import numpy as np  # needed for np.array; only pandas was imported so far

array1 = np.array(['A','A','B','B','C','C'])
array2 = np.array(['one','two','one','two','one','two'])
array = [array1, array2]

In [49]:
index3 = pd.MultiIndex.from_arrays(array)
index3

MultiIndex([('A', 'one'),
            ('A', 'two'),
            ('B', 'one'),
            ('B', 'two'),
            ('C', 'one'),
            ('C', 'two')],
           )

In [50]:
table3 = table.set_index(index3)
table3

| | | col1 | col2 |
|---|---|---|---|
| A | one | 1 | 6 |
| A | two | 2 | 7 |
| B | one | 3 | 8 |
| B | two | 4 | 9 |
| C | one | 5 | 10 |
| C | two | 6 | 11 |
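The three constructors above produce the same index, which can be checked with `Index.equals()` rather than by comparing the printed output. The sketch below uses plain Python lists for all three inputs, and the variable names are illustrative:

```python
import pandas as pd

# Cartesian product of the two label lists
from_product = pd.MultiIndex.from_product([['A', 'B', 'C'], ['one', 'two']])

# one tuple per row
from_tuples = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'),
                                         ('B', 'one'), ('B', 'two'),
                                         ('C', 'one'), ('C', 'two')])

# one array per level, each as long as the table
from_arrays = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B', 'C', 'C'],
                                         ['one', 'two', 'one', 'two', 'one', 'two']])

print(from_product.equals(from_tuples))
print(from_product.equals(from_arrays))
```

Which constructor to use is mostly a matter of how the labels are already stored: `from_product()` is the shortest when every combination of labels occurs, while `from_tuples()` and `from_arrays()` also cover irregular combinations.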
