# Multi Index

A multi-level index (a `MultiIndex` in pandas) is a hierarchical index made up of several levels. It lets you group and select data in a DataFrame by combinations of keys, such as state and year, rather than by a single label.

Each level of the index represents a different dimension of the data. By convention, the first level is the most general and the last level is the most specific. A multi-level index can be created by passing a list of column names to the `set_index` method.
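As a minimal sketch of the idea, the cell below builds a two-level index from a small made-up table. The column names mirror the dataset used later in this notebook, but the `Tax` values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical data: the values are made up for illustration only.
df = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Florida", "Florida"],
    "Year": [2018, 2019, 2018, 2019],
    "Tax": [10.0, 11.0, 20.0, 22.0],
})

# Most general level first (State), most specific level last (Year).
multi = df.set_index(["State", "Year"])

print(multi.index.names)    # names of the index levels
print(multi.index.nlevels)  # number of levels in the index
```

The order of the list passed to `set_index` determines the order of the levels.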

To select data from a multi-level index, use the `.loc` indexer and give an index value for each level. You can also use the `xs` (cross-section) method to select data by a value on one specific level of the index.
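The two selection styles can be compared on a small made-up table. This is only a sketch: the data is hypothetical, and the variable names (`row`, `florida`, `year_2019`) are chosen here for illustration.

```python
import pandas as pd

# Hypothetical data: the values are made up for illustration only.
multi = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Florida", "Florida"],
    "Year": [2018, 2019, 2018, 2019],
    "Tax": [10.0, 11.0, 20.0, 22.0],
}).set_index(["State", "Year"])

# .loc with one value per level (a tuple) returns a single row as a Series.
row = multi.loc[("Florida", 2019), :]

# .loc with only the first level returns all rows for that state,
# indexed by the remaining level (Year).
florida = multi.loc["Florida"]

# .xs can select on any level by name, including an inner one.
year_2019 = multi.xs(2019, level="Year")

print(row)
print(florida)
print(year_2019)
```

`.loc` is most convenient when selecting from the outermost level inward, while `xs` with `level=` is a simple way to pick a value on an inner level.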

Multi-level indices can be used on both the rows and the columns of a pandas DataFrame to organize and analyze complex data. (Multi-level columns are useful when you want to group related columns together.)
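To illustrate multi-level columns, the sketch below groups three columns under two top-level labels. The group names (`Economy`, `Revenue`), the measure names, and all values are assumptions invented for this example and do not come from the dataset used later.

```python
import pandas as pd

# Hypothetical column groups; the first level is the group, the second the measure.
columns = pd.MultiIndex.from_tuples(
    [("Economy", "GSP"), ("Revenue", "Fees"), ("Revenue", "Tax")],
    names=["Group", "Measure"],
)

# Hypothetical values, one row per state.
wide = pd.DataFrame(
    [[100.0, 1.0, 10.0], [200.0, 2.0, 20.0]],
    index=["Alabama", "Florida"],
    columns=columns,
)

# Selecting a top-level label returns every column grouped under it.
revenue = wide["Revenue"]

# A full tuple selects a single column as a Series.
gsp = wide[("Economy", "GSP")]

print(revenue)
print(gsp)
```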


## Example

In [None]:
# get the original DataFrame's index labels
import pandas as pd
df = pd.read_excel('../datasets/State Tax and GSP.xlsx')
print( 'index:', df.index )
print( 'index.names:', df.index.names )

To create a multi-level index from our original DataFrame, we call the `.set_index()` method with a list of column names. The first column is the most general (State) and the second is more specific (Year).

In [None]:
multi = df.set_index(['State', 'Year'])
print(multi)

### Multi-level indices: names, levels, and values

In [None]:
# index names, levels and values
print( 'index names:', multi.index.names )
print( 'index levels:', multi.index.levels )

In [None]:
# displays each index value (an array of tuples, one per row)
multi.index.values

In [None]:
# this is also the number of records (since each state-year is unique)
len(multi.index.values)

### `loc` function

In [None]:
# use .loc to get all rows for one value of the first level (State);
# note that ('Florida') is just the string 'Florida', not a tuple
multi.loc['Florida', :].head()

### `xs` function

In [None]:
# .xs() takes the value and the name of the level to select on
multi.xs('Florida', level='State').head()

In [None]:
# several levels at once: give one value per level as a tuple
multi.xs( ( 'Florida', 2019) )

### Selecting multiple rows

In [None]:
# several rows at once: pass a list of index tuples to .loc
multi.loc[[ ( 'Florida', 2019), ( 'Alabama', 2019) ] ]

### Aggregate functions

In [None]:
# sum each column over all of Florida's rows
multi.xs('Florida', level='State').sum()
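The same kind of aggregate can be computed for every state at once by grouping on an index level. The sketch below uses `groupby(level=...)` on made-up data; the values are hypothetical and `totals` is a name chosen only for this example.

```python
import pandas as pd

# Hypothetical data: the values are made up for illustration only.
multi = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Florida", "Florida"],
    "Year": [2018, 2019, 2018, 2019],
    "Tax": [10.0, 11.0, 20.0, 22.0],
}).set_index(["State", "Year"])

# Group rows by the State level of the index and sum within each group.
totals = multi.groupby(level="State").sum()

print(totals)
```

The result is indexed by State alone, because the Year level has been summed over.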

### Further reading

Documentation, indexing (in general): https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html 
        
More on multi-level indices: https://towardsdatascience.com/confused-by-multi-index-in-pandas-9-essential-operations-to-know-e6aec29ee6d8
        
Slicing: https://kanoki.org/2022/07/25/pandas-select-slice-rows-columns-multiindex-dataframe/
    