## Using multilevel indexes in Pandas

1. Creating a MultiIndex
2. Indexing DataFrames that have a MultiIndex
3. Reshaping with the stack() and unstack() methods

In [14]:
import pandas as pd
import seaborn


In [15]:
# Although seaborn is an extension of the plotting package matplotlib,
# it also ships with some example datasets. load_dataset downloads them
# from an online repository, so the first call needs an internet connection.

flights = seaborn.load_dataset('flights')


# The flights dataset contains monthly totals of airline passengers from 1949 to 1960.

In [16]:
flights.head()    # head() returns only the first rows of the dataframe (5 by default)


| | year | month | passengers |
|---|---|---|---|
| 0 | 1949 | January | 112 |
| 1 | 1949 | February | 118 |
| 2 | 1949 | March | 132 |
| 3 | 1949 | April | 129 |
| 4 | 1949 | May | 121 |


In [17]:
# Each row is identified by a (year, month) pair, so it is convenient to
# index the flights dataframe with a MultiIndex built from both columns.

flights_indexed = flights.set_index(['year','month'])
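Besides `set_index`, a MultiIndex can be built directly from the level values with `pd.MultiIndex.from_product`. The sketch below assumes a small hand-made frame in place of the seaborn download (a few 1949 and 1950 values copied from this notebook) so that it runs offline, and checks that both routes give the same frame.

```python
import pandas as pd

# Hand-made offline stand-in for seaborn's flights data (values from this notebook)
flights = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
})

# 1) Promote existing columns to index levels, as in the cell above
by_columns = flights.set_index(["year", "month"])

# 2) Build the same index from the Cartesian product of the level values
index = pd.MultiIndex.from_product(
    [[1949, 1950], ["January", "February", "March"]],
    names=["year", "month"],
)
by_product = pd.DataFrame({"passengers": [112, 118, 132, 115, 126, 141]}, index=index)

print(by_columns.index.nlevels)        # number of index levels
print(by_columns.equals(by_product))   # same labels and same values?
```

`from_product` is handy when every combination of the level values is present, as it is for a complete year-by-month grid.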

In [18]:
flights_indexed.head()

| year | month | passengers |
|---|---|---|
| 1949 | January | 112 |
| 1949 | February | 118 |
| 1949 | March | 132 |
| 1949 | April | 129 |
| 1949 | May | 121 |


In [19]:
# Select a single year with .loc, since we are selecting by index label (not by position).

flights_indexed.loc[1949]

| month | passengers |
|---|---|
| January | 112 |
| February | 118 |
| March | 132 |
| April | 129 |
| May | 121 |
| June | 135 |
| July | 148 |
| August | 148 |
| September | 136 |
| October | 119 |
| November | 104 |
| December | 118 |
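Selecting a label on the outer level, as `.loc[1949]` does above, drops that level from the result. To select on the inner level instead, `DataFrame.xs` accepts a `level` argument. A minimal sketch, assuming a small hand-made stand-in for `flights_indexed` (values copied from this notebook):

```python
import pandas as pd

# Hand-made stand-in for flights_indexed (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
}).set_index(["year", "month"])

# Outer level: the 'year' level is dropped, leaving an index named 'month'
one_year = indexed.loc[1949]

# Inner level: every January, now indexed by 'year'
januaries = indexed.xs("January", level="month")

print(one_year)
print(januaries)
```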


In [20]:
flights_indexed.loc[1949:1950]

# .loc slices by label, so the end year 1950 is included (unlike positional slicing)

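| year | month | passengers |
|---|---|---|
| 1949 | January | 112 |
| 1949 | February | 118 |
| 1949 | March | 132 |
| 1949 | April | 129 |
| 1949 | May | 121 |
| 1949 | June | 135 |
| 1949 | July | 148 |
| 1949 | August | 148 |
| 1949 | September | 136 |
| 1949 | October | 119 |
| 1949 | November | 104 |
| 1949 | December | 118 |
| 1950 | January | 115 |
| 1950 | February | 126 |
| 1950 | March | 141 |
| 1950 | April | 135 |
| 1950 | May | 125 |
| 1950 | June | 149 |
| 1950 | July | 170 |
| 1950 | August | 170 |
| 1950 | September | 158 |
| 1950 | October | 133 |
| 1950 | November | 114 |
| 1950 | December | 140 |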
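Because `.loc` slices by label, both endpoints are included, whereas positional slicing with `.iloc` excludes the end. Label slicing on a MultiIndex level also expects that level to be sorted (call `sort_index()` first if it is not); the years here are already in order. A sketch assuming a hand-made three-year stand-in frame with values copied from this notebook:

```python
import pandas as pd

# Hand-made stand-in: three years, first three months each (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3 + [1951] * 3,
    "month": ["January", "February", "March"] * 3,
    "passengers": [112, 118, 132, 115, 126, 141, 145, 150, 178],
}).set_index(["year", "month"])

# Label slice on the outer level: 1949 through 1950 inclusive, 1951 left out
two_years = indexed.loc[1949:1950]

# Positional slice: rows 0 and 1 only, the end position is excluded
first_two_rows = indexed.iloc[0:2]

print(two_years)
print(first_two_rows)
```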


In [21]:
#We could also select a specific year and month
flights_indexed.loc[1949,'January']

passengers    112
Name: (1949, January), dtype: int64
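A full `(year, month)` key selects one row and returns it as a Series named after the key, as the output above shows; adding a column label returns a single value instead. A minimal sketch on an assumed hand-made stand-in frame:

```python
import pandas as pd

# Hand-made stand-in for flights_indexed (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["January", "February"] * 2,
    "passengers": [112, 118, 115, 126],
}).set_index(["year", "month"])

# Full key as a tuple: one row, returned as a Series
row = indexed.loc[(1949, "January")]

# Full key plus a column label: a single scalar
value = indexed.loc[(1949, "January"), "passengers"]

print(row)
print(value)
```

Writing the row key as an explicit tuple keeps it unambiguous once a column label is added as the second argument.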

In [22]:
# We could also select a range of months.
# First select the entire year and then use slicing

flights_indexed.loc[1949].loc['January':'June']

| month | passengers |
|---|---|
| January | 112 |
| February | 118 |
| March | 132 |
| April | 129 |
| May | 121 |
| June | 135 |
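Chaining `.loc` works for a single year. To pick the same months across every year, `pd.IndexSlice` builds a per-level key in one step: `:` keeps all years and a list selects specific months. In the hand-made stand-in frame assumed below the months are plain strings, which sort alphabetically, so a list of labels is used rather than a range such as `'January':'June'` (range slicing on an inner level also requires that level to be sorted).

```python
import pandas as pd

# Hand-made stand-in for flights_indexed (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
}).set_index(["year", "month"])

idx = pd.IndexSlice

# All years (':'), but only January and February
jan_feb = indexed.loc[idx[:, ["January", "February"]], :]

print(jan_feb)
```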


The `unstack` method lets us move a level of the row MultiIndex into the column labels; `stack` does the reverse.

In [23]:
flights_unstacked = flights_indexed.unstack()

flights_unstacked

Columns: outer level `passengers`, inner level `month`.

| year | January | February | March | April | May | June | July | August | September | October | November | December |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1949 | 112 | 118 | 132 | 129 | 121 | 135 | 148 | 148 | 136 | 119 | 104 | 118 |
| 1950 | 115 | 126 | 141 | 135 | 125 | 149 | 170 | 170 | 158 | 133 | 114 | 140 |
| 1951 | 145 | 150 | 178 | 163 | 172 | 178 | 199 | 199 | 184 | 162 | 146 | 166 |
| 1952 | 171 | 180 | 193 | 181 | 183 | 218 | 230 | 242 | 209 | 191 | 172 | 194 |
| 1953 | 196 | 196 | 236 | 235 | 229 | 243 | 264 | 272 | 237 | 211 | 180 | 201 |
| 1954 | 204 | 188 | 235 | 227 | 234 | 264 | 302 | 293 | 259 | 229 | 203 | 229 |
| 1955 | 242 | 233 | 267 | 269 | 270 | 315 | 364 | 347 | 312 | 274 | 237 | 278 |
| 1956 | 284 | 277 | 317 | 313 | 318 | 374 | 413 | 405 | 355 | 306 | 271 | 306 |
| 1957 | 315 | 301 | 356 | 348 | 355 | 422 | 465 | 467 | 404 | 347 | 305 | 336 |
| 1958 | 340 | 318 | 362 | 348 | 363 | 435 | 491 | 505 | 404 | 359 | 310 | 337 |
| 1959 | 360 | 342 | 406 | 396 | 420 | 472 | 548 | 559 | 463 | 407 | 362 | 405 |
| 1960 | 417 | 391 | 419 | 461 | 472 | 535 | 622 | 606 | 508 | 461 | 390 | 432 |


By default, `unstack()` moves the innermost index level (`month`) into the columns, so each year becomes a single row.
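`unstack()` also takes a `level` argument (a position or a level name), so the outer level can be pivoted instead. A sketch on an assumed hand-made stand-in frame; with plain string months the unstacked columns come out in sorted (alphabetical) order, so values are looked up by label rather than by position.

```python
import pandas as pd

# Hand-made stand-in for flights_indexed (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
}).set_index(["year", "month"])

# Default: the innermost level ('month') becomes the inner column level
by_month = indexed.unstack()

# Named level: 'year' becomes the inner column level, months stay as rows
by_year = indexed.unstack(level="year")

print(by_month)
print(by_year)
```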

In [24]:
# axis=1 sums across the month columns, giving one total per year

flights_unstacked.sum(axis=1)

year
1949    1520
1950    1676
1951    2042
1952    2364
1953    2700
1954    2867
1955    3408
1956    3939
1957    4421
1958    4572
1959    5140
1960    5714
dtype: int64
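With `axis=1`, `sum` adds up the values across each row of the unstacked frame, giving one total per year; the same totals can be computed on the long frame, without unstacking, by grouping on the `year` index level. A sketch on an assumed hand-made stand-in frame:

```python
import pandas as pd

# Hand-made stand-in for flights_indexed (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
}).set_index(["year", "month"])

# Per-year totals from the wide frame: sum across the month columns
row_totals = indexed.unstack().sum(axis=1)

# Same totals from the long frame: group rows by the 'year' index level
group_totals = indexed.groupby(level="year")["passengers"].sum()

# Per-month totals across all years: sum down the rows instead (axis=0)
month_totals = indexed.unstack().sum(axis=0)

print(row_totals)
print(group_totals)
print(month_totals)
```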

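In [26]:
# seaborn loads 'month' as a Categorical column, so the 'month' column level is a
# CategoricalIndex that only accepts existing categories. Inserting 'total' directly raises
# "TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category".
# Converting that level to plain strings first avoids the error.
flights_unstacked.columns = flights_unstacked.columns.set_levels(
    flights_unstacked.columns.levels[1].astype(str), level=1)

flights_unstacked['passengers', 'total'] = flights_unstacked.sum(axis=1)

flights_unstacked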

In [None]:
flights_restacked = flights_unstacked.stack()

In [None]:
flights_restacked
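`stack()` is the inverse of `unstack()`: it moves the innermost column level back into the row index. A round-trip sketch on an assumed hand-made stand-in frame; rows may come back in a different order, so both sides are sorted before comparing. Depending on the pandas version, `stack()` may emit a FutureWarning about its newer implementation, which does not change the result here.

```python
import pandas as pd

# Hand-made stand-in for flights_indexed (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
}).set_index(["year", "month"])

wide = indexed.unstack()     # 'month' level -> columns
restacked = wide.stack()     # 'month' level -> back into the row index

# Sort both so the comparison does not depend on row order
a = restacked.sort_index()
b = indexed.sort_index()
print(a.index.equals(b.index))
print(a["passengers"].tolist() == b["passengers"].tolist())
```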

In [None]:
# All years (':'), but only the 'total' entry of the month level
flights_restacked.loc[pd.IndexSlice[:, 'total'], 'passengers']
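Once the month level contains a `'total'` label, `xs` offers another way to pull out the yearly totals, dropping the `month` level from the result. A sketch on an assumed hand-made stand-in frame; plain string month labels are assumed here (in the seaborn data the month labels are Categorical, which only accept existing categories), so adding the `'total'` column works directly.

```python
import pandas as pd

# Hand-made stand-in for flights_indexed with plain string months (values from this notebook)
indexed = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
}).set_index(["year", "month"])

wide = indexed.unstack()
wide["passengers", "total"] = wide.sum(axis=1)   # new inner column label 'total'
restacked = wide.stack()

# Cross-section on the 'month' level: one total per year, indexed by year only
totals = restacked.xs("total", level="month")["passengers"]

# Equivalent selection with IndexSlice (keeps both index levels)
totals_slice = restacked.loc[pd.IndexSlice[:, "total"], "passengers"]

print(totals)
print(totals_slice)
```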