### MULTI INDEX DATAFRAMES

+ Multi Index DataFrames are typically created by aggregation operations, such as a `groupby` on more than one column.
+ The index behaves like a list of tuples, with one item for each level of the index, e.g. `("AUTOMOTIVE", 1)`.
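
To make the two points above concrete, here is a minimal sketch on a small made-up table. The `toy` DataFrame and its values are assumptions for illustration only and are not taken from `retail_2016_2017.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the retail dataset.
toy = pd.DataFrame({
    "family": ["AUTOMOTIVE", "AUTOMOTIVE", "SEAFOOD", "SEAFOOD"],
    "store_nbr": [1, 2, 1, 1],
    "sales": [10.0, 20.0, 5.0, 7.0],
})

# Grouping by two columns produces a two-level row index.
toy_sum = toy.groupby(["family", "store_nbr"])[["sales"]].sum()

print(type(toy_sum.index).__name__)  # class of the row index
print(list(toy_sum.index.names))     # names of the index levels
print(toy_sum.index.tolist())        # one tuple per row, one item per level
```

The two SEAFOOD rows for store 1 collapse into a single group, so the result has three rows, each labelled by a `(family, store_nbr)` tuple.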

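The `.loc` accessor lets you select rows from a Multi Index DataFrame in two ways.

1. Access rows via the outer index only, e.g. `sales_sum.loc["AUTOMOTIVE"]`.
2. Access rows via the outer and inner indices, passed as a tuple, e.g. `sales_sum.loc[("AUTOMOTIVE", 12), :]`.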
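
The two access patterns can be sketched on a small made-up table. The `toy` data below is hypothetical and only mirrors the column names of the retail dataset.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
toy = pd.DataFrame({
    "family": ["AUTOMOTIVE", "AUTOMOTIVE", "SEAFOOD", "SEAFOOD"],
    "store_nbr": [1, 2, 1, 2],
    "sales": [10.0, 20.0, 5.0, 7.0],
})
toy_sum = toy.groupby(["family", "store_nbr"])[["sales"]].sum()

# 1. Outer index only: returns a DataFrame indexed by the remaining level.
outer_only = toy_sum.loc["AUTOMOTIVE"]
print(outer_only)

# 2. Outer and inner indices as a tuple: returns that single row as a Series.
one_row = toy_sum.loc[("AUTOMOTIVE", 2), :]
print(one_row)

# Adding a column label narrows the result to a single value.
one_value = toy_sum.loc[("AUTOMOTIVE", 2), "sales"]
print(one_value)
```

Selecting on the outer level drops it from the result, which is why `outer_only` is indexed by `store_nbr` alone.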

In [1]:
import pandas as pd
import numpy as np

In [32]:
retail = pd.read_csv("retail_2016_2017.csv")
print(retail.loc[:5])
print("="*80)
## create the sales_sum df by grouping on family and store_nbr, then summing sales.
sales_sum = retail.groupby(["family","store_nbr"])[["sales"]].sum()
print(sales_sum)
print("="*80)
## here the outer index level is family, with entries such as "AUTOMOTIVE".
## Access the df using the outer index only.
print(sales_sum.loc["AUTOMOTIVE"].head())
print("="*80)
print(sales_sum.loc["SEAFOOD"].head())

print("="*80)
## Accessing the rows using outer and inner indices
sales_sum.loc[("AUTOMOTIVE",12), :]

        id        date  store_nbr        family  sales  onpromotion
0  1945944  2016-01-01          1    AUTOMOTIVE    0.0            0
1  1945945  2016-01-01          1     BABY CARE    0.0            0
2  1945946  2016-01-01          1        BEAUTY    0.0            0
3  1945947  2016-01-01          1     BEVERAGES    0.0            0
4  1945948  2016-01-01          1         BOOKS    0.0            0
5  1945949  2016-01-01          1  BREAD/BAKERY    0.0            0
                             sales
family     store_nbr              
AUTOMOTIVE 1           2524.000000
           2           3918.000000
           3           6790.000000
           4           2565.000000
           5           3667.000000
...                            ...
SEAFOOD    50         12773.966999
           51         34250.948976
           52          1219.475999
           53          3745.180001
           54          1082.000000

[1782 rows x 1 columns]
            sales
store_nbr        
1       

sales    3507.0
Name: (AUTOMOTIVE, 12), dtype: float64

In [36]:
## using the aggregate function
## to get the sum and mean of sales for each family and store number.

sales_agg = retail.groupby(["family","store_nbr"]).agg({"sales":["sum","mean"]})
sales_agg

                             sales
                               sum       mean
family     store_nbr
AUTOMOTIVE 1           2524.000000   4.263514
           2           3918.000000   6.618243
           3           6790.000000  11.469595
           4           2565.000000   4.332770
           5           3667.000000   6.194257
...                            ...        ...
SEAFOOD    50         12773.966999  21.577647
           51         34250.948976  57.856333
           52          1219.475999   2.059926
           53          3745.180001   6.326318


### MODIFYING THE MULTI INDEX DATAFRAMES

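There are a few methods for modifying a Multi Index DataFrame. By default, each returns a new DataFrame and leaves the original unchanged.

1. Reset Index `.reset_index()` : Moves the index levels back into regular DataFrame columns.
2. Swap Index Levels `.swaplevel()` : Swaps the order of two index levels (by default the two innermost), changing the hierarchy. The rows are not re-sorted.
3. Drop an Index Level `.droplevel("level_name")` : Removes the named index level and discards its labels, rather than moving them to a column.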
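
The three methods listed above can be compared side by side in a minimal sketch. The `toy` data is hypothetical, and the extra `sort_index()` call is an optional addition that the notebook itself does not use.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
toy = pd.DataFrame({
    "family": ["AUTOMOTIVE", "AUTOMOTIVE", "SEAFOOD", "SEAFOOD"],
    "store_nbr": [1, 2, 1, 2],
    "sales": [10.0, 20.0, 5.0, 7.0],
})
toy_sum = toy.groupby(["family", "store_nbr"])[["sales"]].sum()

# reset_index: both index levels become ordinary columns.
flat = toy_sum.reset_index()
print(flat.columns.tolist())

# swaplevel: store_nbr becomes the outer level; the row order is unchanged.
swapped = toy_sum.swaplevel()
print(swapped.index.tolist())

# Optional: sort_index regroups the rows under the new outer level.
swapped_sorted = swapped.sort_index()
print(swapped_sorted.index.tolist())

# droplevel: the family labels are discarded, not moved to a column.
dropped = toy_sum.droplevel("family")
print(dropped.index.tolist())

# None of the calls above modified the original DataFrame.
print(list(toy_sum.index.names))
```

After `droplevel("family")` the index contains repeated store numbers, so rows can no longer be told apart by family. If that information is still needed, `reset_index()` is usually the safer choice.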


In [45]:
sales_agg = retail.groupby(["family","store_nbr"]).agg({"sales":["sum","mean"]})
sales_agg

                             sales
                               sum       mean
family     store_nbr
AUTOMOTIVE 1           2524.000000   4.263514
           2           3918.000000   6.618243
           3           6790.000000  11.469595
           4           2565.000000   4.332770
           5           3667.000000   6.194257
...                            ...        ...
SEAFOOD    50         12773.966999  21.577647
           51         34250.948976  57.856333
           52          1219.475999   2.059926
           53          3745.180001   6.326318


In [46]:
## reset index method
sales_agg.reset_index()

          family  store_nbr         sales
                                      sum       mean
0     AUTOMOTIVE          1   2524.000000   4.263514
1     AUTOMOTIVE          2   3918.000000   6.618243
2     AUTOMOTIVE          3   6790.000000  11.469595
3     AUTOMOTIVE          4   2565.000000   4.332770
4     AUTOMOTIVE          5   3667.000000   6.194257
...          ...        ...           ...        ...
1777     SEAFOOD         50  12773.966999  21.577647
1778     SEAFOOD         51  34250.948976  57.856333
1779     SEAFOOD         52   1219.475999   2.059926
1780     SEAFOOD         53   3745.180001   6.326318


In [47]:
## swap the index level
sales_agg.swaplevel()

                             sales
                               sum       mean
store_nbr family
1         AUTOMOTIVE   2524.000000   4.263514
2         AUTOMOTIVE   3918.000000   6.618243
3         AUTOMOTIVE   6790.000000  11.469595
4         AUTOMOTIVE   2565.000000   4.332770
5         AUTOMOTIVE   3667.000000   6.194257
...                            ...        ...
50        SEAFOOD     12773.966999  21.577647
51        SEAFOOD     34250.948976  57.856333
52        SEAFOOD      1219.475999   2.059926
53        SEAFOOD      3745.180001   6.326318


In [48]:
## drop the index level from the dataframe completely
sales_agg.droplevel("family")

                  sales
                    sum       mean
store_nbr
1           2524.000000   4.263514
2           3918.000000   6.618243
3           6790.000000  11.469595
4           2565.000000   4.332770
5           3667.000000   6.194257
...                 ...        ...
50         12773.966999  21.577647
51         34250.948976  57.856333
52          1219.475999   2.059926
53          3745.180001   6.326318
