# MultiIndex

In [None]:
import pandas as pd

## This Module's Dataset

In [None]:
bigmac = pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d')
bigmac

In [None]:
bigmac.info()

In [None]:
bigmac.dtypes

In [None]:
bigmac.nunique()

## Create a MultiIndex
- A **MultiIndex** is an index with multiple levels or layers.
- Pass the `set_index` method a list of column names to create a **DataFrame** with a **MultiIndex**.
- The order of the names in the list determines the order of the levels, from outermost to innermost.
- Alternatively, we can pass the `read_csv` function's `index_col` parameter a list of columns.
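The two approaches above can be compared side by side. This is a minimal sketch that assumes a small, made-up stand-in for `bigmac.csv` (the column names mirror the notebook, the prices are placeholders) read through `io.StringIO`. It checks that the list order sets the level order and that `index_col` yields the same index as `set_index`.

```python
import io

import pandas as pd

# Hypothetical stand-in for bigmac.csv; the prices are placeholders.
csv_text = """Date,Country,Price in US Dollars
2000-04-01,Argentina,1.0
2000-04-01,Brazil,2.0
2001-04-01,Argentina,3.0
2001-04-01,Brazil,4.0
"""

# Approach 1: read normally, then pass set_index a list of columns.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
by_date = df.set_index(keys=["Date", "Country"])
by_country = df.set_index(keys=["Country", "Date"]).sort_index()

# Approach 2: build the MultiIndex while reading, via index_col.
from_csv = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["Date"], index_col=["Date", "Country"]
)

# The list order determines the level order.
print(list(by_date.index.names))
print(list(by_country.index.names))
print(from_csv.index.equals(by_date.index))
```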

In [None]:
bigmac.set_index(keys=['Date', 'Country'])

In [None]:
bigmac.set_index(keys=['Country', 'Date']).sort_index()

In [None]:
bigmac = pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac

## Extract Index Level Values
- The `get_level_values` method extracts an **Index** with the values from one level of the **MultiIndex**, one entry per row (duplicates included).
- Invoke the `get_level_values` method on the **MultiIndex** (the `index` attribute), not on the **DataFrame** itself.
- The method expects either the level's index position or its name.
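A small sketch of the points above, using a hypothetical three-row **DataFrame** built with `pd.MultiIndex.from_tuples` instead of the CSV. It shows that the level name and the level position return the same **Index**, that the result repeats values once per row, and that the method lives on the index rather than on the **DataFrame**.

```python
import pandas as pd

# Hypothetical two-level index built directly from tuples.
index = pd.MultiIndex.from_tuples(
    [("2000-04-01", "Argentina"), ("2000-04-01", "Brazil"), ("2001-04-01", "Argentina")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price": [1.0, 2.0, 3.0]}, index=index)

# Call get_level_values on df.index; the DataFrame has no such method.
dates_by_name = df.index.get_level_values("Date")
dates_by_position = df.index.get_level_values(0)
countries = df.index.get_level_values("Country")

print(dates_by_name.equals(dates_by_position))
print(list(countries))  # one entry per row, duplicates included
print(hasattr(df, "get_level_values"))
```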

In [None]:
bigmac = pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac

In [None]:
bigmac.index

In [None]:
bigmac.index.get_level_values('Date')

In [None]:
bigmac.index.get_level_values(0)

In [None]:
bigmac.index.get_level_values('Country')

In [None]:
bigmac.index.get_level_values(1)

## Rename Index Levels
- Invoke the `set_names` method on the **MultiIndex** to change one or more level names.
- Use the `names` and `level` parameters together to rename a single level.
- Alternatively, pass `names` a list of strings to overwrite *all* level names.
- The `set_names` method returns a copy, so replace the original index to alter the **DataFrame**.
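The copy-then-assign pattern described above can be sketched on a hypothetical two-row **DataFrame** (the level names match the notebook; the data does not come from `bigmac.csv`). Until the new index is assigned back, the **DataFrame** keeps its original level names.

```python
import pandas as pd

# Hypothetical two-level index for illustration.
index = pd.MultiIndex.from_tuples(
    [("2000-04-01", "Argentina"), ("2000-04-01", "Brazil")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price": [1.0, 2.0]}, index=index)

# Rename one level: pair `names` with `level` (a level name or position).
one_level = df.index.set_names(names="Time", level="Date")

# Rename every level: pass a list with one name per level.
all_levels = df.index.set_names(names=["Time", "Location"])

# set_names returned new Index objects, so df still has its old names.
names_before = list(df.index.names)

# Assign the result back to the DataFrame to make the change stick.
df.index = all_levels
names_after = list(df.index.names)

print(list(one_level.names))
print(names_before, names_after)
```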

In [None]:
bigmac = pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac

In [None]:
bigmac.index.set_names(names='Time', level='Date')

In [None]:
bigmac

In [None]:
bigmac.index.set_names(names=['Time', 'Location'])

In [None]:
bigmac

In [None]:
bigmac.index = bigmac.index.set_names(names=['Time', 'Location'])

In [None]:
bigmac

## The sort_index Method on a MultiIndex DataFrame
- Using the `sort_index` method, we can sort by all levels of the **MultiIndex** or target specific levels with the `level` parameter.
- To apply a different sort order to different levels, pass the `ascending` parameter a list of Booleans, one per level.
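A minimal sketch of per-level sorting, assuming a hypothetical four-row **DataFrame** with string years in place of parsed dates. With `level="Country"`, pandas sorts by that level first and, by default, breaks ties with the remaining levels.

```python
import pandas as pd

# Hypothetical, deliberately unsorted two-level index.
index = pd.MultiIndex.from_tuples(
    [("2001", "Brazil"), ("2000", "Argentina"), ("2001", "Argentina"), ("2000", "Brazil")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price": [1.0, 2.0, 3.0, 4.0]}, index=index)

# All levels ascending (the default).
asc = df.sort_index()

# Date descending, Country ascending: one Boolean per level.
mixed = df.sort_index(ascending=[False, True])

# Sort by the Country level; the Date level still breaks ties.
by_country = df.sort_index(level="Country")

print(list(mixed.index))
print(list(by_country.index))
```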

In [None]:
bigmac = pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country'])
bigmac

In [None]:
bigmac.sort_index()

In [None]:
bigmac.sort_index(ascending=False)

In [None]:
bigmac.sort_index(ascending=[False, True])

In [None]:
bigmac.sort_index(ascending=[True, False])

## Extract Rows from a MultiIndex DataFrame
- A **tuple** is an immutable sequence: like a list, but it cannot be modified after creation.
- Create a tuple by placing a comma between elements. The community convention is to wrap the elements in parentheses.
- The `iloc` and `loc` accessors extract rows by index position and by label, respectively.
- For the `loc` accessor, pass a tuple holding one label per index level, from outermost to innermost.
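The tuple and `loc` rules above can be checked on a hypothetical four-row **DataFrame** (string years instead of parsed dates, placeholder prices). The index is sorted first because label slicing on a **MultiIndex** expects sorted levels; note that `loc` slices include both endpoints.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2000", "Argentina"), ("2000", "Brazil"), ("2001", "Argentina"), ("2001", "Brazil")],
    names=["Date", "Country"],
)
# Sort so that label slicing on the MultiIndex is well defined.
df = pd.DataFrame({"Price": [1.0, 2.0, 3.0, 4.0]}, index=index).sort_index()

# The comma creates the tuple; the parentheses are a convention.
key = ("2000", "Brazil")
also_a_tuple = "2000", "Brazil"

# Tuples cannot be modified after creation.
immutable = False
try:
    key[0] = "2001"
except TypeError:
    immutable = True

# The same row by position and by a tuple of labels.
by_position = df.iloc[1]
by_label = df.loc[key]

# Slicing between two tuples includes both endpoints.
subset = df.loc[("2000", "Brazil"):("2001", "Argentina")]

# Add a column label after the row slice to narrow to one column.
prices = df.loc[("2000", "Brazil"):("2001", "Argentina"), "Price"]

print(subset)
```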

In [None]:
bigmac = pd.read_csv('bigmac.csv', parse_dates=['Date'], index_col=['Date', 'Country']).sort_index()
bigmac

In [None]:
bigmac.iloc[0]

In [None]:
bigmac.iloc[1]

In [None]:
bigmac.iloc[69]

In [None]:
bigmac.loc['2000-04-01', 'Argentina']

In [None]:
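# Python passes the same tuple to loc in both lines, so they select the same row.
# Only the last expression in a cell is displayed.
bigmac.loc[('2000-04-01', 'Brazil')]
bigmac.loc['2000-04-01', 'Brazil']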

In [None]:
bigmac.loc[('2000-04-01', 'Argentina'): ('2000-04-01', 'Brazil')]

In [None]:
bigmac.loc[('2000-04-01', 'Argentina'): ('2000-04-01', 'Brazil'), 'Price in US Dollars']

In [None]:
bigmac.loc[: ('2000-04-01', 'Brazil')]

In [None]:
bigmac.loc[('2020-04-01', 'Brazil'): ]

In [None]:
start = ('2000-04-01', 'Argentina')
end = ('2000-04-01', 'Brazil')
bigmac.loc[start:end]

## The transpose Method
- The `transpose` method swaps the row and column axes of the **DataFrame**: the row labels become the column labels, and vice versa.
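A short sketch of what transposing does to a **MultiIndex**, assuming a hypothetical two-row **DataFrame** with made-up `Price` and `GDP` columns. After transposing, the row **MultiIndex** labels the columns, and the `T` attribute gives the same result as `transpose()`.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2000", "Argentina"), ("2000", "Brazil")], names=["Date", "Country"]
)
df = pd.DataFrame({"Price": [1.0, 2.0], "GDP": [10.0, 20.0]}, index=index)

flipped = df.transpose()

# The MultiIndex now labels the columns; the old column names label the rows.
print(flipped.columns.equals(df.index))
print(list(flipped.index))

# Columns are now selected with a tuple of labels.
print(flipped.loc["GDP", ("2000", "Brazil")])

# The T attribute is shorthand for transpose().
print(flipped.equals(df.T))
```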

In [None]:
bigmac

In [None]:
start = ('2000-04-01', 'Argentina')
end = ('2002-04-01', 'Brazil')
bigmac.loc[start:end]

In [None]:
bigmac.loc[start:end].transpose()

## The stack Method
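- The `stack` method moves the column labels into the row index, where they become the new innermost level.
- When the columns have a single level, pandas returns a **Series** with a **MultiIndex**.
- Think of it like "stacking" an extra level onto the row **MultiIndex**.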
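A minimal sketch of the behavior described above, assuming a hypothetical three-row stand-in for `worldstats.csv` (the `Population` and `GDP` column names are assumptions, not taken from the real file). Stacking turns the two columns into a third row level and returns a **Series**.

```python
import pandas as pd

# Hypothetical stand-in for worldstats.csv; column names are assumed.
index = pd.MultiIndex.from_tuples(
    [(2000, "Brazil"), (2000, "Chile"), (2001, "Brazil")], names=["year", "country"]
)
world = pd.DataFrame(
    {"Population": [100.0, 200.0, 300.0], "GDP": [1.0, 2.0, 3.0]}, index=index
)

stacked = world.stack()

# The former column labels are now the innermost row level.
print(type(stacked).__name__)
print(stacked.index.nlevels)
print(stacked.loc[(2000, "Chile", "GDP")])
```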

In [None]:
world = pd.read_csv('worldstats.csv', index_col=['year', 'country']).sort_index()
world

In [None]:
world.stack()

## The unstack Method
- The `unstack` method moves a row index level to the column index (the inverse of the `stack` method).
- By default, the `unstack` method moves the innermost level.
- We can customize the moved index with the `level` parameter.
- The `level` parameter accepts the level's index position or its name. It can also accept a list of positions/names.
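The `level` options above can be compared on a hypothetical four-entry **Series** with `year` and `country` levels (placeholder values). The default moves `country` to the columns, while `level=0` and `level="year"` both move `year`.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(2000, "Brazil"), (2000, "Chile"), (2001, "Brazil"), (2001, "Chile")],
    names=["year", "country"],
)
population = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Default: the innermost level ("country") becomes the columns.
by_country = population.unstack()

# A position or a name both target the "year" level.
by_year_pos = population.unstack(level=0)
by_year_name = population.unstack(level="year")

print(by_country)
print(by_year_name)
print(by_year_pos.equals(by_year_name))
```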

In [None]:
world = pd.read_csv('worldstats.csv', index_col=['year', 'country']).sort_index().stack()
world.head(15)

In [None]:
world.unstack(level=0)

In [None]:
world.unstack(level='year')

In [None]:
world.unstack()

## The pivot Method
- The `pivot` method reshapes data from a tall format to a wide format.
- Ask yourself which direction the data will expand in if you add more entries.
- A tall/long format expands down. A wide format expands out.
- The `index` parameter sets the row index of the pivoted **DataFrame**.
- The `columns` parameter names the column whose unique values become the column labels of the pivoted **DataFrame**.
- The `values` parameter sets the column that populates the pivoted **DataFrame**. Pandas places each value at the intersection of its index and column labels.
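A minimal sketch of the tall-to-wide reshape, assuming a hypothetical four-row stand-in for `salesmen.csv` (the names and revenues are placeholders). It also shows a constraint of `pivot`: each index/column pair must be unique, otherwise pandas raises a `ValueError`.

```python
import pandas as pd

# Hypothetical tall data standing in for salesmen.csv.
salesmen = pd.DataFrame(
    {
        "Date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
        "Salesman": ["Bob", "Dave", "Bob", "Dave"],
        "Revenue": [100, 200, 300, 400],
    }
)

wide = salesmen.pivot(index="Date", columns="Salesman", values="Revenue")
print(wide)

# Repeating a Date/Salesman pair makes the reshape ambiguous.
duplicated = pd.concat([salesmen, salesmen.head(1)])
raised = False
try:
    duplicated.pivot(index="Date", columns="Salesman", values="Revenue")
except ValueError:
    raised = True
print("duplicate pair rejected:", raised)
```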

In [None]:
salesmen = pd.read_csv('salesmen.csv')
salesmen

In [None]:
salesmen.pivot(index='Date', columns='Salesman', values='Revenue')

## The melt Method
- The `melt` method is the inverse of the `pivot` method.
- It takes a 'wide' dataset and converts it to a 'tall' dataset.
- The `melt` method is ideal when you have multiple columns storing the *same* data point.
- Ask yourself whether the column's values are a *type* of the column header. If they're not, the data is likely stored in a wide format.
- The `id_vars` parameter accepts the column whose values are repeated for every melted column.
- The `var_name` parameter sets the name of the new column holding the former column names.
- The `value_name` parameter sets the name of the new column holding the values from the original **DataFrame**.
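A small sketch of the wide-to-tall reshape, assuming a hypothetical two-salesman stand-in for `quarters.csv` with placeholder revenues. The `Q1` and `Q2` headers are values of a "quarter" variable rather than types of revenue, which is the sign of wide data; `melt` turns them into rows.

```python
import pandas as pd

# Hypothetical wide data standing in for quarters.csv.
quarters = pd.DataFrame(
    {"Salesman": ["Boris", "Ivan"], "Q1": [10, 30], "Q2": [20, 40]}
)

tall = quarters.melt(id_vars="Salesman", var_name="Quarter", value_name="Revenue")
print(tall)

# pivot reverses the reshape.
back = tall.pivot(index="Salesman", columns="Quarter", values="Revenue")
print(back)
```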

In [None]:
quarters = pd.read_csv('quarters.csv')
quarters

In [None]:
quarters.melt(id_vars='Salesman', var_name='Quarter', value_name='Revenue')

## The pivot_table Method
- The `pivot_table` method operates similarly to the Pivot Table feature in Excel.
- A pivot table is a table whose values are aggregations of groups of values from another table.
- The `values` parameter accepts the numeric column whose values will be aggregated.
- The `aggfunc` parameter declares the aggregation function (the default is mean/average).
- The `index` parameter sets the index labels of the pivot table. Pass a list of columns to create a **MultiIndex**.
- The `columns` parameter sets the column labels of the pivot table. Pass a list of columns to create a **MultiIndex** on the columns.
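Unlike `pivot`, `pivot_table` aggregates repeated index/column pairs. This sketch assumes a hypothetical five-row stand-in for `foods.csv` (only the `Item`, `Gender`, and `Spend` columns, with placeholder amounts). Omitting `aggfunc` averages each group, and a combination with no rows shows up as `NaN`.

```python
import pandas as pd

# Hypothetical rows standing in for foods.csv; Item/Gender pairs repeat.
foods = pd.DataFrame(
    {
        "Item": ["Burger", "Burger", "Sushi", "Sushi", "Burger"],
        "Gender": ["Female", "Male", "Female", "Female", "Female"],
        "Spend": [10.0, 20.0, 30.0, 50.0, 30.0],
    }
)

# Without aggfunc, each group is averaged.
mean_by_item = foods.pivot_table(values="Spend", index="Item")

# Sum per Item (rows) and Gender (columns); Sushi has no Male rows, so NaN.
totals = foods.pivot_table(values="Spend", index="Item", columns="Gender", aggfunc="sum")

print(mean_by_item)
print(totals)
```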

In [None]:
foods = pd.read_csv('foods.csv')
foods

| | First Name | Gender | City | Frequency | Item | Spend |
|---|---|---|---|---|---|---|
| 0 | Wanda | Female | Stamford | Weekly | Burger | 15.66 |
| 1 | Eric | Male | Stamford | Daily | Chalupa | 10.56 |
| 2 | Charles | Male | New York | Never | Sushi | 42.14 |
| 3 | Anna | Female | Philadelphia | Once | Ice Cream | 11.01 |
| 4 | Deborah | Female | Philadelphia | Daily | Chalupa | 23.49 |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | Donna | Female | New York | Monthly | Sushi | 83.53 |
| 996 | Albert | Male | Philadelphia | Daily | Sushi | 72.88 |
| 997 | Jean | Female | Stamford | Weekly | Donut | 5.85 |
| 998 | Jessica | Female | New York | Daily | Chalupa | 43.19 |


In [None]:
foods.pivot_table(values='Spend', index='Item', aggfunc='sum')

| Item | Spend |
|---|---|
| Burger | 7765.73 |
| Burrito | 8270.44 |
| Chalupa | 7644.52 |
| Donut | 8758.76 |
| Ice Cream | 8886.99 |
| Sushi | 8742.93 |


In [None]:
foods.pivot_table(values='Spend', index='Item', aggfunc='mean')

| Item | Spend |
|---|---|
| Burger | 49.780321 |
| Burrito | 49.22881 |
| Chalupa | 52.003537 |
| Donut | 46.838289 |
| Ice Cream | 50.494261 |
| Sushi | 52.668253 |


In [None]:
foods.pivot_table(values='Spend', index='Gender', aggfunc='mean')

| Gender | Spend |
|---|---|
| Female | 50.709629 |
| Male | 49.397623 |


In [None]:
foods.pivot_table(values='Spend', index='Gender', aggfunc='max')

| Gender | Spend |
|---|---|
| Female | 99.51 |
| Male | 99.87 |


In [None]:
foods.pivot_table(values='Spend', index='Gender', aggfunc='min')

| Gender | Spend |
|---|---|
| Female | 1.02 |
| Male | 1.26 |


In [None]:
foods.pivot_table(values='Spend', index=['Item', 'City', 'Gender'], aggfunc='sum')

| Item | City | Gender | Spend |
|---|---|---|---|
| Burger | New York | Female | 1239.04 |
| Burger | New York | Male | 1294.09 |
| Burger | Philadelphia | Female | 1639.24 |
| Burger | Philadelphia | Male | 938.18 |
| Burger | Stamford | Female | 1216.02 |
| Burger | Stamford | Male | 1439.16 |
| Burrito | New York | Female | 978.95 |
| Burrito | New York | Male | 1399.4 |
| Burrito | Philadelphia | Female | 1458.76 |
| Burrito | Philadelphia | Male | 1312.93 |
| ... | ... | ... | ... |


In [None]:
foods.pivot_table(values='Spend', index='Item', columns=['Gender', 'City'], aggfunc='sum')

| Item | (Female, New York) | (Female, Philadelphia) | (Female, Stamford) | (Male, New York) | (Male, Philadelphia) | (Male, Stamford) |
|---|---|---|---|---|---|---|
| Burger | 1239.04 | 1639.24 | 1216.02 | 1294.09 | 938.18 | 1439.16 |
| Burrito | 978.95 | 1458.76 | 1820.11 | 1399.4 | 1312.93 | 1300.29 |
| Chalupa | 876.58 | 1673.33 | 1602.35 | 1227.77 | 1114.23 | 1150.26 |
| Donut | 1446.78 | 1639.26 | 1656.96 | 1345.27 | 1249.36 | 1421.13 |
| Ice Cream | 1521.62 | 1479.22 | 1032.03 | 1603.63 | 2191.27 | 1059.22 |
| Sushi | 1480.29 | 1742.88 | 1459.91 | 1396.15 | 1395.88 | 1267.82 |
