Data wrangling refers to combining, transforming, and/or rearranging data to make it suitable for further analysis. We'll use Pandas for all data wrangling operations.

In [2]:
import pandas as pd

## Hierarchical indexing

Until now we have seen only a single level of indexing in the rows and columns of a Pandas DataFrame. Hierarchical indexing refers to having multiple index levels on an axis (row / column) of a Pandas DataFrame. It lets us work with higher-dimensional data in a lower-dimensional form.

Let us define a Pandas Series as we did in Chapter 5:

In [43]:
#Defining a Pandas Series
series_example = pd.Series(['these','are','english','words','estas','son','palabras','en','español',
                            'ce','sont','des','françai','mots'])
series_example

0        these
1          are
2      english
3        words
4        estas
5          son
6     palabras
7           en
8      español
9           ce
10        sont
11         des
12     françai
13        mots
dtype: object

Let us use the attribute `nlevels` to find the number of levels of the row indices of this Series:

In [5]:
series_example.index.nlevels

1

The Series `series_example` has only one level of row indices.

Let us introduce another level of row indices while defining the Series:

In [45]:
#Defining a Pandas Series with multiple levels of row indices
series_example = pd.Series(['these','are','english','words','estas','son','palabras','en','español',
                           'ce','sont','des','françai','mots'], 
                          index=[['English']*4+['Spanish']*5+['French']*5,list(range(1,5))+list(range(1,6))*2])
series_example

English  1       these
         2         are
         3     english
         4       words
Spanish  1       estas
         2         son
         3    palabras
         4          en
         5     español
French   1          ce
         2        sont
         3         des
         4     françai
         5        mots
dtype: object

In the above Series, there are two levels of row indices:

In [46]:
series_example.index.nlevels

2
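To see what the two levels actually contain, we can inspect the index object itself. The following is a minimal sketch on a smaller, made-up Series (the name `words` and its values are illustrative, not part of the example above); it uses the `levels` attribute and the `get_level_values()` method of the Pandas `MultiIndex` class:

```python
import pandas as pd

# Illustrative Series with two levels of row indices (hypothetical data)
words = pd.Series(
    ['these', 'are', 'estas', 'son', 'palabras'],
    index=[['English'] * 2 + ['Spanish'] * 3, [1, 2, 1, 2, 3]],
)

# A hierarchical index is stored as a MultiIndex of (outer, inner) tuples
index_type = type(words.index).__name__
index_tuples = words.index.tolist()
print(index_type)    # MultiIndex
print(index_tuples)

# Unique labels available at each level
outer_labels = words.index.levels[0].tolist()
inner_labels = words.index.levels[1].tolist()
print(outer_labels)  # ['English', 'Spanish']
print(inner_labels)  # [1, 2, 3]

# Labels of the outer level, repeated once per row
outer_per_row = words.index.get_level_values(0).tolist()
print(outer_per_row)
```

Each row is identified by the combination of its outer and inner labels, which is why the same inner label (such as `1`) can appear under several outer labels.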

The first four observations of the Series correspond to the outer row index `English`, the next five to `Spanish`, and the last five to `French`. We can use the indices at the outer level to concisely subset the Series. For example, let us subset all the observations corresponding to the outer row index `English`:

In [47]:
#Subsetting data by row-index
series_example['English']

1      these
2        are
3    english
4      words
dtype: object

Just like in the case of single-level indices, if we wish to subset corresponding to multiple outer-level indices, we put the indices within an additional pair of square brackets `[]`. For example, let us subset all the observations corresponding to the row-indices `English` and `French`:

In [48]:
#Subsetting data by multiple row-indices
series_example[['English','French']]

English  1      these
         2        are
         3    english
         4      words
French   1         ce
         2       sont
         3        des
         4    françai
         5       mots
dtype: object

We can also subset data using the inner row index:

In [49]:
#Subsetting data by inner row-index
series_example.loc[:,2]

English     are
Spanish     son
French     sont
dtype: object

In [50]:
#Subsetting data by multiple inner row-indices
series_example.loc[:,[1,2]]

English  1    these
Spanish  1    estas
French   1       ce
English  2      are
Spanish  2      son
French   2     sont
dtype: object
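There are other ways to select on the inner level that some readers may find more explicit. The sketch below uses a smaller, made-up Series (the name `words` is illustrative) and shows three options: the `xs()` cross-section method, a tuple of labels passed to `loc`, and a Boolean mask built with `get_level_values()`. It is one possible approach, not the only one:

```python
import pandas as pd

# Illustrative Series with two levels of row indices (hypothetical data)
words = pd.Series(
    ['these', 'are', 'estas', 'son', 'palabras'],
    index=[['English'] * 2 + ['Spanish'] * 3, [1, 2, 1, 2, 3]],
)

# Cross-section: all rows whose inner (level 1) label equals 2;
# the selected level is dropped from the result
second_words = words.xs(2, level=1)
print(second_words)

# A single element: pass the outer and inner labels together as a tuple
spanish_third = words.loc[('Spanish', 3)]
print(spanish_third)  # palabras

# Boolean mask on the inner level; rows stay in their original order
inner = words.index.get_level_values(1)
first_two = words[inner.isin([1, 2])]
print(first_two)
```

The mask-based selection keeps both index levels and preserves the row order of the original Series, which can be convenient when the result is to be compared with the source data.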

Apart from ease in subsetting data, hierarchical indexing also plays a role in reshaping data. For example, the Pandas Series [`unstack()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.unstack.html) method pivots the inner-most level of row indices to columns, thereby creating a DataFrame:

In [52]:
#Pivoting the innermost Series row index to column labels
series_example_unstack = series_example.unstack()
series_example_unstack

             1     2         3        4        5
English  these   are   english    words      NaN
French      ce  sont       des  françai     mots
Spanish  estas   son  palabras       en  español


Also, check out the [`unstack()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.unstack.html) method of the Pandas DataFrame class.
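As a rough illustration of the DataFrame version, the sketch below builds a small, made-up DataFrame (the name `df` and the columns `word` and `length` are hypothetical) with two levels of row indices. Unstacking it moves the inner row level into the columns, so the columns themselves become hierarchical. The `level` argument selects which row level to pivot:

```python
import pandas as pd

# Illustrative DataFrame with two levels of row indices (hypothetical data)
df = pd.DataFrame(
    {'word': ['these', 'are', 'estas', 'son'], 'length': [5, 3, 5, 3]},
    index=[['English', 'English', 'Spanish', 'Spanish'], [1, 2, 1, 2]],
)

# Pivot the inner row level to columns; the columns now have two levels
wide = df.unstack()
print(wide)
print(wide.columns.nlevels)  # 2

# A cell is addressed by a row label and an (outer, inner) column tuple
cell = wide.loc['Spanish', ('word', 2)]
print(cell)  # son

# Pivot the outer row level instead by passing level=0
by_language = df['word'].unstack(level=0)
print(by_language)
```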

The inverse of `unstack()` is the [`stack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html) method, which pivots the column labels of the prescribed level into a new inner-most level of row indices. Note that in this case the column labels have only one level, so we don't need to specify a level. Also note that the rows of the result follow the order of the unstacked DataFrame (`English`, `French`, `Spanish`) rather than the order of the original Series.

In [66]:
series_example_unstack.stack()

English  1       these
         2         are
         3     english
         4       words
French   1          ce
         2        sont
         3         des
         4     françai
         5        mots
Spanish  1       estas
         2         son
         3    palabras
         4          en
         5     español
dtype: object
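The round trip can be checked programmatically. The following is a small sketch on made-up data (the name `words` is illustrative). Whether `stack()` keeps or drops cells that were missing after `unstack()` depends on the Pandas version and its arguments, so the sketch calls `dropna()` explicitly, on the assumption that the original Series has no missing values of its own:

```python
import pandas as pd

# Illustrative Series with two levels of row indices (hypothetical data)
words = pd.Series(
    ['these', 'are', 'estas', 'son', 'palabras'],
    index=[['English'] * 2 + ['Spanish'] * 3, [1, 2, 1, 2, 3]],
)

# ('English', 3) does not exist, so unstacking leaves one missing cell
wide = words.unstack()
n_missing = int(wide.isna().sum().sum())
print(n_missing)  # 1

# Stack back; dropna() removes the filler cell if stack() kept it
restored = wide.stack().dropna().sort_index()
print(restored)

# Same (outer, inner) -> value mapping as the original Series
same_content = restored.to_dict() == words.to_dict()
print(same_content)  # True
```

Comparing the dictionaries checks the content only; the row order of `restored` may differ from that of the original Series when the original is not sorted by its index.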