# Pandas MultiIndex Tutorial

In [1]:
from typing import Any

import numpy as np
import pandas as pd

# What is a MultiIndex DataFrame?

Pandas' MultiIndex \[DataFrame\] enables you to effectively store and manipulate arbitrarily high-dimensional data in a 2-dimensional tabular structure (DataFrame).

While the displayed version of a MultiIndex df doesn't appear to be much more than a prettily-organized regular df, it's actually a pretty powerful structure if the data warrants its use.

# When should you use one?

1. When a single column’s value isn’t enough to uniquely identify a row (e.g. multiple records on the same date means date alone isn’t a good index).
2. When data is logically hierarchical - meaning that it has multiple dimensions or “levels.”

Besides structure, MultiIndexes offer us two benefits:
- Relatively easy retrieval of complex data.
- Possibly improved efficiency when lookups and merges are frequent (NEED TO EXPLORE THIS).
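
To make the first benefit concrete, here is a minimal sketch of retrieval from a two-level index. The frame is a small, made-up stand-in that borrows column names from the demo data below; it is not the demo data itself.

```python
import pandas as pd

# Toy sales data; the values are made up for illustration.
sales = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2018-07-10", "2018-07-10", "2018-07-11", "2018-07-11"]),
        "Store": ["Store 1", "Store 2", "Store 1", "Store 2"],
        "Dollars": [90, 92, 47, 47],
    }
).set_index(["Date", "Store"]).sort_index()

# Select one row by giving a value for every index level.
one_row = sales.loc[(pd.Timestamp("2018-07-10"), "Store 2")]

# Select every row for one store, regardless of date (a cross-section).
store_1 = sales.xs("Store 1", level="Store")

print(one_row["Dollars"])
print(store_1)
```

With a flat index, both lookups would need boolean masks on one or more columns; here the index levels do that work.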

# First, some quick groundwork

- 2-minute anatomy of a dataframe
- What’s an index in pandas?
  - The index of a DataFrame is a set that consists of a label for each row. To be helpful, those labels should be meaningful and unique.
- Example:
  - Start w/ range index - unique, but not super useful
  - Date
  - But what about data with multiple transactions per date?
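
The progression in the outline above can be sketched as follows. The three records are made up for illustration, with column names borrowed from the demo data.

```python
import pandas as pd

records = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2018-07-10", "2018-07-10", "2018-07-11"]),
        "Store": ["Store 1", "Store 2", "Store 1"],
        "Dollars": [90, 92, 47],
    }
)

# Default RangeIndex: unique, but the labels 0, 1, 2 carry no meaning.
range_is_unique = records.index.is_unique

# Date alone: meaningful, but two records share 2018-07-10.
date_is_unique = records.set_index("Date").index.is_unique

# Date + Store together: meaningful and unique.
date_store_is_unique = records.set_index(["Date", "Store"]).index.is_unique

print(range_is_unique, date_is_unique, date_store_is_unique)
```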

# Realistic Demo Data

xxx Description of the data xxx

In [2]:
df = pd.read_csv('data.csv', parse_dates=['Date'])
df

,Date,Store,Category,Subcategory,UPC EAN,Description,Dollars,Units
0,2018-07-10,Store 2,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,92,9
1,2018-07-10,Store 1,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2,2018-07-11,Store 1,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47,6
3,2018-07-11,Store 2,Beer,Stouts,737000000000.0,Brand2 - RandomName2 - 6 Pack,47,6
4,2018-07-12,Store 1,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,104,9
5,2018-07-12,Store 3,Beer,Malts,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
6,2018-07-10,Store 3,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
7,2018-07-13,Store 2,Wine,White,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
8,2018-07-13,Store 3,Wine,Rose,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
9,2018-07-12,Store 1,Alcohol,Liqour,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9


# Setting and Manipulating MultiIndexes



xxx reference the data. Explain the format we want. xxxxxxx Let's take a look at how we can create our multiindex from our regular ol' DataFrame. We'll walk through the basics of setting, reordering, and resetting indexes, along with some useful tips/tricks.

In [3]:
# Set just like the index for a DataFrame...
# ...except we give a list of column names instead of a single string column name
df.set_index(['Date', 'Store', 'Category', 'Subcategory', 'Description'], inplace=True)
df

Date,Store,Category,Subcategory,Description,UPC EAN,Dollars,Units
2018-07-10,Store 2,Beer,Ales,Goose Island - Honkers Ale - 6 Pack,737000000000.0,92,9
2018-07-10,Store 1,Beer,Ales,Goose Island - Honkers Ale - 6 Pack,737000000000.0,90,9
2018-07-11,Store 1,Beer,Lagers,Brand2 - RandomName1 - 6 Pack,737000000000.0,47,6
2018-07-11,Store 2,Beer,Stouts,Brand2 - RandomName2 - 6 Pack,737000000000.0,47,6
2018-07-12,Store 1,Beer,Ales,Goose Island - Honkers Ale - 6 Pack,737000000000.0,104,9
2018-07-12,Store 3,Beer,Malts,Goose Island - Honkers Ale - 6 Pack,737000000000.0,90,9
2018-07-10,Store 3,Wine,Red,Goose Island - Honkers Ale - 6 Pack,737000000000.0,90,9
2018-07-13,Store 2,Wine,White,Goose Island - Honkers Ale - 6 Pack,737000000000.0,90,9
2018-07-13,Store 3,Wine,Rose,Goose Island - Honkers Ale - 6 Pack,737000000000.0,90,9
2018-07-12,Store 1,Alcohol,Liqour,Goose Island - Honkers Ale - 6 Pack,9740000000000.0,90,9


Uh oh - it looks like we forgot to add the 'UPC EAN' column to our index, but don't worry - pandas has us covered with extra set_index parameters for MultiIndexes:

In [4]:
# We can append a column to our existing index
df.set_index('UPC EAN', append=True, inplace=True)
df.head(3)

Date,Store,Category,Subcategory,Description,UPC EAN,Dollars,Units
2018-07-10,Store 2,Beer,Ales,Goose Island - Honkers Ale - 6 Pack,737000000000.0,92,9
2018-07-10,Store 1,Beer,Ales,Goose Island - Honkers Ale - 6 Pack,737000000000.0,90,9
2018-07-11,Store 1,Beer,Lagers,Brand2 - RandomName1 - 6 Pack,737000000000.0,47,6


That's almost right, but we'd actually like 'Description' to show up after 'UPC EAN'. We have a couple of options to get things in the right order:

In [5]:
# Option 1 is the generalized solution to reorder the index levels
# Note: We're not making an inplace change in this cell,
#       but it's worth noting that this method doesn't have an inplace parameter.
df.reorder_levels(order=['Date', 'Store', 'Category', 'Subcategory', 'UPC EAN', 'Description']).head(3)

Date,Store,Category,Subcategory,UPC EAN,Description,Dollars,Units
2018-07-10,Store 2,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,92,9
2018-07-10,Store 1,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-11,Store 1,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47,6


reorder_levels() is useful, but it was a pain to have to type all six levels just to switch two. In cases like this we have a second, less verbose option:

In [6]:
# Option 2 just switches two index levels (a more common need than you'd think)
# Note: This time we're keeping the change by reassigning df, because this method has no inplace parameter either.
df = df.swaplevel('Description', 'UPC EAN')
df.head(3)

Date,Store,Category,Subcategory,UPC EAN,Description,Dollars,Units
2018-07-10,Store 2,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,92,9
2018-07-10,Store 1,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-11,Store 1,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47,6


Just when we thought we were done, it turns out we forgot to add the highest level of the product hierarchy - the Department - not just to our index, but to our DataFrame altogether. Luckily all of our records belong in the same Department, so here's a neat trick to add a new column with all the same values as a level in an existing index:

In [7]:
# A handy function to keep around for projects
def add_constant_index_level(df: pd.DataFrame, value: Any, level_name: str):
    """Add a new level to an existing index where every row has the same, given value.
    
    Args:
        df: Any existing pd.DataFrame.
        value: Value to be placed in every row of the new index level.
        level_name: Title of the new index level.
    
    Returns:
        df with an additional, prepended index level.
    """
    return pd.concat([df], keys=[value], names=[level_name])

df = add_constant_index_level(df, "Booooze", "Department")
df = df.reorder_levels(order=['Date', 'Store', 'Department', 'Category', 'Subcategory', 'UPC EAN', 'Description'])
df.head(3)

Date,Store,Department,Category,Subcategory,UPC EAN,Description,Dollars,Units
2018-07-10,Store 2,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,92,9
2018-07-10,Store 1,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-11,Store 1,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47,6
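
The introduction to this section also mentions resetting indexes. As a minimal sketch on a small, made-up frame (not the demo data), `reset_index` can move some or all levels back into the columns:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2018-07-10", "2018-07-10", "2018-07-11"]),
        "Store": ["Store 1", "Store 2", "Store 1"],
        "Units": [9, 9, 6],
    }
).set_index(["Date", "Store"])

# Move only the 'Store' level back into the columns; 'Date' stays as the index.
store_as_column = toy.reset_index(level="Store")

# Move every level back into the columns, restoring a default RangeIndex.
flat = toy.reset_index()

# Discard a level entirely instead of keeping it as a column.
# (An alternative here would be toy.droplevel("Store").)
date_only = toy.reset_index(level="Store", drop=True)
```

Like `reorder_levels` and `swaplevel`, each call returns a new DataFrame, so the result has to be assigned to keep it.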


In [8]:
# # Other ways to interact with your index levels
# rename levels
# fill values

# Replace np.nans in index values in MultiIndex
# https://stackoverflow.com/questions/41515877/how-to-set-index-values-in-a-multiindex-pandas-dataframe
df.rename(index={np.nan: "''"}, inplace=True)

# checking out their unique values, for a single level 
df.index.get_level_values('Subcategory').unique()
# checking out their unique values, for combinations of multiple levels

Index(['Ales', 'Lagers', 'Stouts', 'Malts', 'Red', 'White', 'Rose', 'Liqour',
       'Liquor'],
      dtype='object', name='Subcategory')
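
The cell above lists renaming levels and checking unique combinations of several levels without showing them. A possible sketch of both follows, using a small, made-up frame; the new level name "Style" is purely illustrative.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Category": ["Beer", "Beer", "Beer", "Wine"],
        "Subcategory": ["Ales", "Ales", "Lagers", "Red"],
        "Units": [9, 9, 6, 9],
    }
).set_index(["Category", "Subcategory"])

# Rename an index level (the level's title, not its values).
renamed = toy.rename_axis(index={"Subcategory": "Style"})

# Unique values of a single level.
unique_categories = toy.index.get_level_values("Category").unique()

# Unique combinations across several levels: a MultiIndex behaves like a
# sequence of tuples, so unique() de-duplicates whole tuples.
unique_pairs = toy.index.unique()
```

For an index with more levels than you care about, one option is to drop the unwanted levels with `index.droplevel(...)` before calling `unique()`.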

# Understanding the MultiIndex Object

Why is this section all the way down here? Because the MultiIndex object is scary looking if you're new to using them. Many guides to hierarchical data analysis using multiindex DataFrames start with DataFrame creation and manipulation using MultiIndex objects, which I think both hinders adoption and is not reflective of how a lot of DataFrames get created in practice. As a result, my explanation of MultiIndex objects is very basic, because there are lots of other great resources out there if you want to learn more. Here are my top two:
 * [Official guide](https://pandas.pydata.org/pandas-docs/stable/advanced.html?highlight=indexslice#hierarchical-indexing-multiindex)
 * [Python Data Science Handbook by Jake Vanderplas](https://jakevdp.github.io/PythonDataScienceHandbook/03.05-hierarchical-indexing.html#Methods-of-MultiIndex-Creation)

In [9]:
df.index

MultiIndex(levels=[[2018-07-10 00:00:00, 2018-07-11 00:00:00, 2018-07-12 00:00:00, 2018-07-13 00:00:00], ['Store 1', 'Store 2', 'Store 3'], ['Booooze'], ['Alcohol', 'Beer', 'Wine'], ['Ales', 'Lagers', 'Liqour', 'Liquor', 'Malts', 'Red', 'Rose', 'Stouts', 'White'], [737000000000.0, 9740000000000.0], ['Brand2 - RandomName1 - 6 Pack', 'Brand2 - RandomName2 - 6 Pack', 'Goose Island - Honkers Ale - 6 Pack']],
           labels=[[0, 0, 1, 1, 2, 2, 0, 3, 3, 2, 2, 2], [1, 0, 0, 1, 0, 2, 2, 1, 2, 0, 1, 2], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0], [0, 0, 1, 7, 0, 4, 5, 8, 6, 2, 3, 3], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], [2, 2, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2]],
           names=['Date', 'Store', 'Department', 'Category', 'Subcategory', 'UPC EAN', 'Description'])

Well that's gross looking...but don't be scared - it's actually not that hard to understand.

**'levels'** is a list of lists, where each sublist represents all possible values in that index level. In other words, the 'levels' parameter reflects all possible unique values by level. For example, our first index level ('Date') has the possible values \['2018-07-10', '2018-07-11', '2018-07-12', '2018-07-13'\].

* **Important Note:** When talking about a multiindex DataFrame (not the parameter for the MultiIndex object), we talk about the "levels" as the index "columns." For example, the 'levels' of our df in a more general sense are 'Date', 'Store', 'Department', etc. Levels in this sense (and elsewhere in code) can also be referenced by number (e.g. 'Date' = 0 \[read as 'level 0'\], 'Store' = 1, 'Department' = 2, etc.).

**'labels'** is also a list of lists, but here each sublist holds one entry per row of the DataFrame. In other words, each sublist in our labels is the same length as the entire DataFrame, and each entry is the position of that row's value within the associated level (above). Looking again at our first index level ('Date'), we see \[0, 0, 1, 1, 2, 2, 0, 3, 3, 2, 2, 2\]. These are just an enumerated representation of the options defined in our level, so 0 = '2018-07-10', 1 = '2018-07-11', 2 = '2018-07-12', and 3 = '2018-07-13'. (In pandas 0.24 and later, this parameter is named 'codes' rather than 'labels'.)

**'names'** is a list of the actual titles of each index level, in order of appearance from left to right.
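
To see the three pieces fit together, here is a small sketch that builds a two-level index from made-up tuples and reconstructs each row from its levels and integer positions. It assumes pandas 0.24 or later, where the positions are exposed as `codes` (the output above, from an older version, calls them `labels`).

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2018-07-10", "Store 2"), ("2018-07-10", "Store 1"), ("2018-07-11", "Store 1")],
    names=["Date", "Store"],
)

# The sorted unique values of each level.
levels = [list(level) for level in index.levels]

# For each level, one integer per row pointing into that level's values.
codes = [code.tolist() for code in index.codes]

# The titles of the levels, left to right.
names = list(index.names)

# Rebuild every row by looking up each code in its level.
rebuilt = [tuple(level[c] for level, c in zip(levels, row)) for row in zip(*codes)]

print(levels)
print(codes)
print(names)
```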

With that fresh understanding of the 'anatomy' of a MultiIndex, we can look at...

# Other Methods of Multiindex DataFrame Creation

For the most part, the two references listed in the section above cover this topic well; however, a common use case that isn't covered in those guides is creating a multiindex DataFrame while reading from a csv:

In [10]:
# We can set a MultiIndex while reading a csv by referencing columns to be used in the index by number
display(pd.read_csv("data.csv", index_col=[0, 1, 2, 3, 4, 5], skipinitialspace=True, parse_dates=['Date']))

Date,Store,Category,Subcategory,UPC EAN,Description,Dollars,Units
2018-07-10,Store 2,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,92,9
2018-07-10,Store 1,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-11,Store 1,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47,6
2018-07-11,Store 2,Beer,Stouts,737000000000.0,Brand2 - RandomName2 - 6 Pack,47,6
2018-07-12,Store 1,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,104,9
2018-07-12,Store 3,Beer,Malts,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-10,Store 3,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-13,Store 2,Wine,White,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-13,Store 3,Wine,Rose,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
2018-07-12,Store 1,Alcohol,Liqour,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,90,9
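
As a self-contained variation on the cell above, the sketch below reads from an in-memory string instead of a file and refers to the index columns by name rather than by position. The CSV text is a hypothetical stand-in for `data.csv`. It also shows `MultiIndex.from_product`, one of the direct creation methods covered in the references.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = """Date,Store,Dollars,Units
2018-07-10,Store 2,92,9
2018-07-10,Store 1,90,9
2018-07-11,Store 1,47,6
"""

# index_col accepts column names as well as positions.
from_csv = pd.read_csv(io.StringIO(csv_text), index_col=["Date", "Store"], parse_dates=["Date"])

# Build a MultiIndex directly from every combination of the given values...
grid = pd.MultiIndex.from_product(
    [["2018-07-10", "2018-07-11"], ["Store 1", "Store 2"]], names=["Date", "Store"]
)
# ...and use it to lay out an empty frame to be filled in later.
empty = pd.DataFrame(index=grid, columns=["Dollars", "Units"])
```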


We'll review more advanced importing/exporting methods below.

# MultiIndex Columns (Multiple Column Levels)

For a different view we can also create hierarchical column levels. For example, let's say we want to more easily compare sales of a product by store by day:

In [11]:
multi_col_lvl_df = df.unstack('Store', fill_value="")
multi_col_lvl_df.head(3)

,,,,,,Dollars,Dollars,Dollars,Units,Units,Units
,,,,,Store,Store 1,Store 2,Store 3,Store 1,Store 2,Store 3
Date,Department,Category,Subcategory,UPC EAN,Description,,,,,,
2018-07-10,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,92.0,,9.0,9.0,
2018-07-10,Booooze,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-11,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47.0,,,6.0,,
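
For a closer look at what `unstack` produces, here is a minimal sketch on a made-up three-row frame. It shows that each resulting column label is a tuple and how to select by one or both column levels.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Date": ["2018-07-10", "2018-07-10", "2018-07-11"],
        "Store": ["Store 1", "Store 2", "Store 1"],
        "Dollars": [90, 92, 47],
    }
).set_index(["Date", "Store"])

# Pivot the 'Store' index level into a second column level.
wide = toy.unstack("Store")

# Each column label is now a (value name, store) tuple.
column_labels = wide.columns.tolist()

# Selecting on the outer column level returns a frame with one column per store.
dollars_by_store = wide["Dollars"]

# A full tuple selects a single column; combinations that never occurred are NaN.
store_2 = wide[("Dollars", "Store 2")]
```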


The new view makes our comparison easier, but now it's a bit cluttered. Internally our multi-level columns are stored as tuples of the name values for each level, so we can easily fix the clutter by flattening the columns into a single level:

In [12]:
def flatten_cols(df: pd.DataFrame, delim: str = ""):
    """Flatten multiple column levels of the DataFrame into a single column level.

    Args:
        df: A pd.DataFrame with multiple column levels.
        delim: The delimiter placed between the column level values.

    Returns:
        A copy of the DataFrame with the new, flattened column names.

    """
    new_cols = [delim.join((col_lev for col_lev in tup if col_lev))
                for tup in df.columns.values]
    ndf = df.copy()
    ndf.columns = new_cols

    return ndf

flatten_cols(multi_col_lvl_df, " - ").head(3)

Date,Department,Category,Subcategory,UPC EAN,Description,Dollars - Store 1,Dollars - Store 2,Dollars - Store 3,Units - Store 1,Units - Store 2,Units - Store 3
2018-07-10,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,92.0,,9.0,9.0,
2018-07-10,Booooze,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-11,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47.0,,,6.0,,


# Importing/Exporting MultiIndex DataFrames

It's worth noting that writing this DataFrame to a csv as it is complicates our read_csv() parameters just a bit. To reread a multiindex DataFrame that has both multiple index levels and multiple column levels after it's been written to a csv, we need to add the header parameter:

In [13]:
# Write our multi-column-level df
multi_col_lvl_df.to_csv('multi_col_lvl_output.csv')

# Reading it back in requires the header parameter
bad_dtypes_df = pd.read_csv('multi_col_lvl_output.csv', header=[0, 1], index_col=[0, 1, 2, 3, 4, 5],
                            skipinitialspace=True).head(3)

bad_dtypes_df

,,,,,,Dollars,Dollars,Dollars,Units,Units,Units
,,,,,Store,Store 1,Store 2,Store 3,Store 1,Store 2,Store 3
Date,Department,Category,Subcategory,UPC EAN,Description,,,,,,
2018-07-10,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,92.0,,9.0,9.0,
2018-07-10,Booooze,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-11,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47.0,,,6.0,,


This import looks good, but as you've probably guessed from the name, it has dtype problems. Note that one thing was missing from this read_csv(): we didn't parse the date (parse_dates=\['Date'\]), because read_csv doesn't understand that 'Date' is our index header since it's not in the first row with values. So instead, we're stuck with Pandas' guess that our 'Date' column is of dtype 'object':

In [15]:
# A function to check our index level dtypes to aid this example
def index_level_dtypes(df):
    return [f"{name}: {df.index.get_level_values(name).dtype}"
            for name in df.index.names]

index_level_dtypes(bad_dtypes_df)

['Date: object',
 'Department: object',
 'Category: object',
 'Subcategory: object',
 'UPC EAN: float64',
 'Description: object']

Updating the dtypes of our index columns isn't so simple, though, because our MultiIndex levels are immutable. To make any changes to the levels, we actually have to recreate the levels:

In [16]:
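# set_levels returns a new MultiIndex, so we assign it back to the DataFrame
# (recent pandas versions no longer accept an inplace parameter here).
bad_dtypes_df.index = bad_dtypes_df.index.set_levels(
    [pd.to_datetime(bad_dtypes_df.index.levels[0]), bad_dtypes_df.index.levels[1],
     bad_dtypes_df.index.levels[2], bad_dtypes_df.index.levels[3],
     bad_dtypes_df.index.levels[4], bad_dtypes_df.index.levels[5]])
index_level_dtypes(bad_dtypes_df)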

['Date: datetime64[ns]',
 'Department: object',
 'Category: object',
 'Subcategory: object',
 'UPC EAN: float64',
 'Description: object']
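
When only one level needs a new dtype, `set_levels` also accepts a `level` argument, which avoids listing every level. A minimal sketch on a made-up two-level index (not the DataFrame above):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2018-07-10", "Store 2"), ("2018-07-10", "Store 1"), ("2018-07-11", "Store 1")],
    names=["Date", "Store"],
)

# set_levels returns a new MultiIndex; the original is left untouched.
converted = index.set_levels(pd.to_datetime(index.levels[0]), level="Date")

# dtype.kind is "M" for datetime64 data.
date_kind_before = index.get_level_values("Date").dtype.kind
date_kind_after = converted.get_level_values("Date").dtype.kind

print(date_kind_before, date_kind_after)
```

To apply the result to a DataFrame, assign it back with `df.index = converted`.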

That's an awful lot of work, and reading files that were written directly from multiindex DataFrames can be similarly problematic for other parameters. Since this is a realistic type of issue to come across (albeit not an everyday one), how should we actually deal with it? By changing our workflow whenever an import would need more complex parameterization.

Instead, we will opt to write more rereadable csvs in the first place (when we control the data at least): 
1. save a copy of our MultiIndex object,
2. reset our index, and
3. write our csv. 

That way, we can read our csv as we normally would and apply our MultiIndex object to the DataFrame.
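
One possible reading of this workflow is sketched below for the simpler case of a single column level. As a simplification, it saves only the index level names rather than the whole MultiIndex object, and it uses an in-memory buffer as a stand-in for a file on disk. The data is made up.

```python
import io

import pandas as pd

original = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2018-07-10", "2018-07-10", "2018-07-11"]),
        "Store": ["Store 1", "Store 2", "Store 1"],
        "Dollars": [90, 92, 47],
    }
).set_index(["Date", "Store"])

# 1. Remember how the index was built (here: just the level names).
index_names = list(original.index.names)

# 2. Reset the index so every level becomes an ordinary column.
flat = original.reset_index()

# 3. Write a plain CSV without the (now meaningless) RangeIndex.
buffer = io.StringIO()  # stands in for a file on disk
flat.to_csv(buffer, index=False)

# Later: read it back the usual way, then rebuild the MultiIndex.
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["Date"]).set_index(index_names)
```

Because the dates are ordinary columns at read time, `parse_dates` works as usual and no `header` or `index_col` gymnastics are needed.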

# In Progress Below This Point

In [57]:
rereadable_multi_df_index = multi_col_lvl_df.index
# .columns.names
rereadable_multi_df = multi_col_lvl_df.rename(columns={'Store': 'T'}).reset_index(col_level=0, col_fill='genus')
display(multi_col_lvl_df)
display(rereadable_multi_df)
rereadable_multi_df.to_csv('readable_output.csv')

read_df = pd.read_csv('readable_output.csv', header=[0, 1], skipinitialspace=True)
#read_df.columns=pd.MultiIndex.from_tuples(read_df.columns)
read_df

,,,,,,Dollars,Dollars,Dollars,Units,Units,Units
,,,,,Store,Store 1,Store 2,Store 3,Store 1,Store 2,Store 3
Date,Department,Category,Subcategory,UPC EAN,Description,,,,,,
2018-07-10,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,92.0,,9.0,9.0,
2018-07-10,Booooze,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-11,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47.0,,,6.0,,
2018-07-11,Booooze,Beer,Stouts,737000000000.0,Brand2 - RandomName2 - 6 Pack,,47.0,,,6.0,
2018-07-12,Booooze,Alcohol,Liqour,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,,,9.0,,
2018-07-12,Booooze,Alcohol,Liquor,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,90.0,,9.0,9.0
2018-07-12,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,104.0,,,9.0,,
2018-07-12,Booooze,Beer,Malts,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-13,Booooze,Wine,Rose,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-13,Booooze,Wine,White,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,,,9.0,


,Date,Department,Category,Subcategory,UPC EAN,Description,Dollars,Dollars,Dollars,Units,Units,Units
Store,genus,genus,genus,genus,genus,genus,Store 1,Store 2,Store 3,Store 1,Store 2,Store 3
0,2018-07-10,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,92.0,,9.0,9.0,
1,2018-07-10,Booooze,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2,2018-07-11,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47.0,,,6.0,,
3,2018-07-11,Booooze,Beer,Stouts,737000000000.0,Brand2 - RandomName2 - 6 Pack,,47.0,,,6.0,
4,2018-07-12,Booooze,Alcohol,Liqour,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,,,9.0,,
5,2018-07-12,Booooze,Alcohol,Liquor,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,90.0,,9.0,9.0
6,2018-07-12,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,104.0,,,9.0,,
7,2018-07-12,Booooze,Beer,Malts,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
8,2018-07-13,Booooze,Wine,Rose,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
9,2018-07-13,Booooze,Wine,White,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,,,9.0,


,Unnamed: 0_level_0,Date,Department,Category,Subcategory,UPC EAN,Description,Dollars,Dollars,Dollars,Units,Units,Units
,Store,genus,genus,genus,genus,genus,genus,Store 1,Store 2,Store 3,Store 1,Store 2,Store 3
0,0,2018-07-10,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,92.0,,9.0,9.0,
1,1,2018-07-10,Booooze,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2,2,2018-07-11,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47.0,,,6.0,,
3,3,2018-07-11,Booooze,Beer,Stouts,737000000000.0,Brand2 - RandomName2 - 6 Pack,,47.0,,,6.0,
4,4,2018-07-12,Booooze,Alcohol,Liqour,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,,,9.0,,
5,5,2018-07-12,Booooze,Alcohol,Liquor,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,90.0,,9.0,9.0
6,6,2018-07-12,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,104.0,,,9.0,,
7,7,2018-07-12,Booooze,Beer,Malts,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
8,8,2018-07-13,Booooze,Wine,Rose,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
9,9,2018-07-13,Booooze,Wine,White,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,,,9.0,


In [32]:
multi_col_lvl_df.to_csv('output2.csv')
df3 = pd.read_csv('output2.csv', index_col=[0, 1, 2, 3, 4, 5], header=[0, 1], skipinitialspace=True)
df3
# df3.columns = pd.MultiIndex.from_tuples(df3.columns)
# df3.set_index(['Date', 'Store', 'Category', 'Subcategory', 'Description'], inplace=True)

,,,,,,Dollars,Dollars,Dollars,Units,Units,Units
,,,,,Store,Store 1,Store 2,Store 3,Store 1,Store 2,Store 3
Date,Department,Category,Subcategory,UPC EAN,Description,,,,,,
2018-07-10,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,92.0,,9.0,9.0,
2018-07-10,Booooze,Wine,Red,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-11,Booooze,Beer,Lagers,737000000000.0,Brand2 - RandomName1 - 6 Pack,47.0,,,6.0,,
2018-07-11,Booooze,Beer,Stouts,737000000000.0,Brand2 - RandomName2 - 6 Pack,,47.0,,,6.0,
2018-07-12,Booooze,Alcohol,Liqour,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,90.0,,,9.0,,
2018-07-12,Booooze,Alcohol,Liquor,9740000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,90.0,,9.0,9.0
2018-07-12,Booooze,Beer,Ales,737000000000.0,Goose Island - Honkers Ale - 6 Pack,104.0,,,9.0,,
2018-07-12,Booooze,Beer,Malts,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-13,Booooze,Wine,Rose,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,,90.0,,,9.0
2018-07-13,Booooze,Wine,White,737000000000.0,Goose Island - Honkers Ale - 6 Pack,,90.0,,,9.0,
