In [2]:
import pandas as pd
import numpy as np

# Hierarchical indexing (MultiIndex)

A MultiIndex, also known as a multi-level or hierarchical index, lets us store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures such as Series (1D) and DataFrame (2D).

It lets several columns act together as the row identifier, with each index level related to the next through a parent/child relationship.

Reference: [https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html)
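As a minimal sketch of the idea, the toy Series below stores two-dimensional data (film by character) in a one-dimensional structure. The film names, character names, and word counts are made up for illustration and are not taken from the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: word counts keyed by (film, character).
index = pd.MultiIndex.from_tuples(
    [
        ("Film A", "Frodo"),
        ("Film A", "Sam"),
        ("Film B", "Frodo"),
        ("Film B", "Gandalf"),
    ],
    names=["Film", "Character"],
)
words = pd.Series([120, 80, 95, 200], index=index, name="Words")

# Selecting on the outer level returns a Series indexed by the inner level.
film_a = words.loc["Film A"]

# unstack() pivots the inner level into columns, giving a 2-D view of the same data.
table = words.unstack("Character")

print(film_a)
print(table)
```

Combinations that do not occur in the data (here, Gandalf in "Film A") show up as NaN in the unstacked table.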



## Load and explore data

In [3]:
# load datasets
df = pd.read_csv('../../datasets/LordOfTheRings/WordsByCharacter.csv')

In [4]:
df.head(3)

|   | Film | Chapter | Character | Race | Words |
|---|---|---|---|---|---|
| 0 | The Fellowship Of The Ring | 01: Prologue | Bilbo | Hobbit | 4 |
| 1 | The Fellowship Of The Ring | 01: Prologue | Elrond | Elf | 5 |
| 2 | The Fellowship Of The Ring | 01: Prologue | Galadriel | Elf | 460 |


In [5]:
# get index labels
print(df.index)
print(df.index.values[:20])
# print(df.index.name)
print(df.index.names)
# print(df.index.value_counts())

RangeIndex(start=0, stop=731, step=1)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[None]


## Create MultiIndex





### From columns with [set_index()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html)


In [6]:
print(df.columns)

Index(['Film', 'Chapter', 'Character', 'Race', 'Words'], dtype='object')


In [7]:
df_multi = df.set_index(['Film', 'Chapter', 'Race', 'Character'])

print(df_multi.head())

print(df_multi.index.names)
print(df_multi.index.values[:5])

                                                                    Words
Film                       Chapter                Race   Character       
The Fellowship Of The Ring 01: Prologue           Hobbit Bilbo          4
                                                  Elf    Elrond         5
                                                         Galadriel    460
                                                  Gollum Gollum        20
                           02: Concerning Hobbits Hobbit Bilbo        214
['Film', 'Chapter', 'Race', 'Character']
[('The Fellowship Of The Ring', '01: Prologue', 'Hobbit', 'Bilbo')
 ('The Fellowship Of The Ring', '01: Prologue', 'Elf', 'Elrond')
 ('The Fellowship Of The Ring', '01: Prologue', 'Elf', 'Galadriel')
 ('The Fellowship Of The Ring', '01: Prologue', 'Gollum', 'Gollum')
 ('The Fellowship Of The Ring', '02: Concerning Hobbits', 'Hobbit', 'Bilbo')]


Note that `set_index()` does not sort the index. To sort it, use the [sort_index()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html) method.

## Remove MultiIndex

### With [reset_index()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html)

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

In [8]:
# remove all index levels
tmp = df_multi.reset_index()
tmp.head(3)


|   | Film | Chapter | Race | Character | Words |
|---|---|---|---|---|---|
| 0 | The Fellowship Of The Ring | 01: Prologue | Hobbit | Bilbo | 4 |
| 1 | The Fellowship Of The Ring | 01: Prologue | Elf | Elrond | 5 |
| 2 | The Fellowship Of The Ring | 01: Prologue | Elf | Galadriel | 460 |


In [9]:
# reset only a subset of the index levels; Film and Character remain in the index
tmp = df_multi.reset_index(level=['Chapter','Race'])
tmp.head(10)
# df_multi.head(3)

| Film | Character | Chapter | Race | Words |
|---|---|---|---|---|
| The Fellowship Of The Ring | Bilbo | 01: Prologue | Hobbit | 4 |
| The Fellowship Of The Ring | Elrond | 01: Prologue | Elf | 5 |
| The Fellowship Of The Ring | Galadriel | 01: Prologue | Elf | 460 |
| The Fellowship Of The Ring | Gollum | 01: Prologue | Gollum | 20 |
| The Fellowship Of The Ring | Bilbo | 02: Concerning Hobbits | Hobbit | 214 |
| The Fellowship Of The Ring | Bilbo | 03: The Shire | Hobbit | 70 |
| The Fellowship Of The Ring | Frodo | 03: The Shire | Hobbit | 128 |
| The Fellowship Of The Ring | Gandalf | 03: The Shire | Ainur | 197 |
| The Fellowship Of The Ring | Hobbit Kids | 03: The Shire | Hobbit | 10 |
| The Fellowship Of The Ring | Hobbits | 03: The Shire | Hobbit | 12 |


## Sorting a MultiIndex

For MultiIndex-ed objects to be indexed and sliced effectively, they need to be sorted. As with any index, you can use [sort_index()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html).

Indexing will work even if the data are not sorted, but will be rather inefficient (and show a PerformanceWarning). It will also return a copy of the data rather than a view.

In [None]:
df_multi_sorted = df.set_index(['Film', 'Chapter', 'Race', 'Character']).sort_index()
df_multi_sorted
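One way to check whether an index is already sorted is the `is_monotonic_increasing` property. The sketch below uses a small, made-up frame (not the notebook's dataset) whose rows are deliberately out of order, and compares the property before and after `sort_index()`.

```python
import pandas as pd

# Hypothetical toy frame whose rows are deliberately out of lexical order.
toy = pd.DataFrame(
    {
        "Film": ["B", "A", "B", "A"],
        "Race": ["Elf", "Hobbit", "Hobbit", "Elf"],
        "Words": [10, 20, 30, 40],
    }
).set_index(["Film", "Race"])

# set_index() keeps the original row order, so the index is not sorted yet.
sorted_before = toy.index.is_monotonic_increasing

# sort_index() orders the rows lexicographically, level by level.
toy_sorted = toy.sort_index()
sorted_after = toy_sorted.index.is_monotonic_increasing

print(sorted_before, sorted_after)
print(toy_sorted)
```

The same check could be applied to `df_multi` and `df_multi_sorted` from the cells above.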

## Index DataFrame with MultiIndex

We can pass `.loc` a tuple of values, one per index level, starting from the outermost level.

In [40]:
### TASK: which characters speak in the first chapter of “The Fellowship of the Ring”?
# select all rows for which Film='The Fellowship Of The Ring' and Chapter='01: Prologue'
df_multi_sorted.loc[('The Fellowship Of The Ring','01: Prologue'),:]

| Race | Character | Words |
|---|---|---|
| Elf | Elrond | 5 |
| Elf | Galadriel | 460 |
| Gollum | Gollum | 20 |
| Hobbit | Bilbo | 4 |


### Skip a level in MultiIndex

TASK: find the first five elves which speak in "The Fellowship Of The Ring"

#### Using slice(None) to select all values in sub-index

In [39]:
### TASK: find the first five elves which speak in "The Fellowship Of The Ring"
df_multi_sorted.loc[('The Fellowship Of The Ring',slice(None),'Elf'),:].head(5)

| Film | Chapter | Race | Character | Words |
|---|---|---|---|---|
| The Fellowship Of The Ring | 01: Prologue | Elf | Elrond | 5 |
| The Fellowship Of The Ring | 01: Prologue | Elf | Galadriel | 460 |
| The Fellowship Of The Ring | 21: Flight To The Ford | Elf | Arwen | 131 |
| The Fellowship Of The Ring | 22: Rivendell | Elf | Elrond | 7 |
| The Fellowship Of The Ring | 23: Many Meetings | Elf | Elrond | 5 |


#### Using pd.IndexSlice to select all values in sub-index

In [42]:
### TASK: find the first five elves which speak in "The Fellowship Of The Ring"
idx = pd.IndexSlice
df_multi_sorted.loc[idx['The Fellowship Of The Ring',:,'Elf'],:].head(5)
# df_multi_sorted.loc[('The Fellowship Of The Ring',slice(None),'Elf')].head(5)



| Film | Chapter | Race | Character | Words |
|---|---|---|---|---|
| The Fellowship Of The Ring | 01: Prologue | Elf | Elrond | 5 |
| The Fellowship Of The Ring | 01: Prologue | Elf | Galadriel | 460 |
| The Fellowship Of The Ring | 21: Flight To The Ford | Elf | Arwen | 131 |
| The Fellowship Of The Ring | 22: Rivendell | Elf | Elrond | 7 |
| The Fellowship Of The Ring | 23: Many Meetings | Elf | Elrond | 5 |
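The two spellings above should select the same rows, since `pd.IndexSlice` is a convenience for building the same tuple of slices. A small self-contained sketch on made-up data (the single-letter film names and word counts are hypothetical) makes the equivalence easy to check:

```python
import pandas as pd

# Hypothetical toy frame with a sorted three-level index.
toy = (
    pd.DataFrame(
        {
            "Film": ["A", "A", "A", "B"],
            "Chapter": ["01", "01", "02", "01"],
            "Race": ["Elf", "Hobbit", "Elf", "Elf"],
            "Words": [5, 4, 7, 9],
        }
    )
    .set_index(["Film", "Chapter", "Race"])
    .sort_index()
)

# Spelling 1: slice(None) stands for "every value" of the Chapter level.
with_slice = toy.loc[("A", slice(None), "Elf"), :]

# Spelling 2: pd.IndexSlice lets us write the same selection with a bare colon.
idx = pd.IndexSlice
with_indexslice = toy.loc[idx["A", :, "Elf"], :]

# Both selections return the Elf rows of film "A" across all chapters.
same = with_slice.equals(with_indexslice)
print(same)
print(with_slice)
```

Because a slice appears in the key, all index levels are kept in the result, which matches the outputs shown above.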


#### TASK: 