# MultiIndex

In [105]:
import pandas as pd

## This Module's Dataset

In [106]:
bigmac: pd.DataFrame = pd.read_csv("bigmac.csv", parse_dates=["Date"], date_format="%Y-%m-%d")

bigmac["Country"] = bigmac["Country"].astype("category")

bigmac2: pd.DataFrame = pd.read_csv("bigmac.csv", index_col=["Country"], parse_dates=["Date"], date_format="%Y-%m-%d")


In [111]:
brazil_appearances = bigmac["Country"] == "Brazil"
brazil = bigmac[brazil_appearances].copy()

brazil.set_index(["Country","Date"])

Country,Date,Price in US Dollars
Brazil,2000-04-01,1.648045
Brazil,2001-04-01,1.643836
Brazil,2002-04-01,1.538462
Brazil,2003-04-01,1.482085
Brazil,2004-05-01,1.698113
Brazil,2005-06-01,2.393703
Brazil,2006-01-01,2.741543
Brazil,2006-05-01,2.777175
Brazil,2007-01-01,2.999766
Brazil,2007-06-01,3.6069


## Create a MultiIndex
- A **MultiIndex** is an index with multiple levels or layers.
- Pass the `set_index` method a list of column names to create a multi-index **DataFrame**.
- The order of the list's values will determine the order of the levels.
- Alternatively, we can pass the `read_csv` function's `index_col` parameter a list of columns.
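
As a minimal sketch of the points above, the snippet below builds a small in-memory **DataFrame** (a few rows modeled on the Big Mac data, with rounded prices, so no CSV file is needed) and shows how the order of the list passed to `set_index` decides the level order. The variable names `prices`, `by_country`, and `by_date` are illustrative only.

```python
import pandas as pd

# Sample rows shaped like bigmac.csv (prices are rounded, for illustration only)
prices = pd.DataFrame({
    "Country": ["Brazil", "Brazil", "Canada", "Canada"],
    "Date": pd.to_datetime(["2000-04-01", "2001-04-01", "2000-04-01", "2001-04-01"]),
    "Price in US Dollars": [1.65, 1.64, 1.94, 2.14],
})

# The list order sets the level order: Country becomes level 0, Date level 1
by_country = prices.set_index(["Country", "Date"])

# Reversing the list reverses the levels
by_date = prices.set_index(["Date", "Country"])

print(type(by_country.index).__name__)
print(list(by_country.index.names))
print(list(by_date.index.names))
```

With a real file, passing the same list to `read_csv` through `index_col` should produce an equivalent index in one step.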

In [40]:
bigmac = bigmac.set_index(keys=["Country", "Date"])

In [41]:
bigmac.loc["Brazil"]

Date,Price in US Dollars
2000-04-01,1.648045
2001-04-01,1.643836
2002-04-01,1.538462
2003-04-01,1.482085
2004-05-01,1.698113
2005-06-01,2.393703
2006-01-01,2.741543
2006-05-01,2.777175
2007-01-01,2.999766
2007-06-01,3.6069


## Extract Index Level Values
- The `get_level_values` method extracts an **Index** with the values from one level in the **MultiIndex**.
- Invoke the `get_level_values` method on the **MultiIndex** (for example, `bigmac.index`), not on the **DataFrame** itself.
- The method expects either the level's index position or its name.
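
A short sketch of `get_level_values`, using a small hand-built **MultiIndex** instead of the full dataset. The sample rows and the variable names are assumptions made for illustration.

```python
import pandas as pd

# A small two-level index modeled on the Big Mac data (rounded prices)
index = pd.MultiIndex.from_tuples(
    [("Brazil", "2000-04-01"), ("Brazil", "2001-04-01"), ("Canada", "2000-04-01")],
    names=["Country", "Date"],
)
prices = pd.DataFrame({"Price in US Dollars": [1.65, 1.64, 1.94]}, index=index)

# Call the method on the index, targeting a level by name...
countries = prices.index.get_level_values("Country")

# ...or by position (0 is the outermost level)
dates = prices.index.get_level_values(1)

print(list(countries))
print(list(dates))
```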

## Rename Index Levels
- Invoke the `set_names` method on the **MultiIndex** to change one or more level names.
- Use the `names` and `level` parameters to rename a single level: `names` holds the new name and `level` identifies the level by position or name.
- Alternatively, pass `names` a list of strings to overwrite *all* level names.
- The `set_names` method returns a copy, so replace the original index to alter the **DataFrame**.
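
The sketch below illustrates both ways of calling `set_names` and the need to assign the result back. The new level names (`"Day"`, `"Nation"`) and the sample rows are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Brazil", "2000-04-01"), ("Canada", "2000-04-01")],
    names=["Country", "Date"],
)
prices = pd.DataFrame({"Price in US Dollars": [1.65, 1.94]}, index=index)

# Rename one level: `level` picks the level, the first argument is the new name
renamed_one = prices.index.set_names("Day", level="Date")

# Rename every level at once by passing a list
renamed_all = prices.index.set_names(["Nation", "Day"])

# set_names returned copies, so the DataFrame's own index is unchanged so far
names_before = list(prices.index.names)

# Assign the new index back to alter the DataFrame
prices.index = renamed_all
print(names_before, list(prices.index.names))
```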

In [42]:
bigmac

Country,Date,Price in US Dollars
Argentina,2000-04-01,2.500000
Australia,2000-04-01,1.541667
Brazil,2000-04-01,1.648045
Canada,2000-04-01,1.938776
Switzerland,2000-04-01,3.470588
...,...,...
Ukraine,2020-07-01,2.174714
Uruguay,2020-07-01,4.327418
United States,2020-07-01,5.710000
Vietnam,2020-07-01,2.847282


## The sort_index Method on a MultiIndex DataFrame
- Using the `sort_index` method, we can target all levels or specific levels of the **MultiIndex**.
- To apply a different sort order to different levels, pass the `ascending` parameter a list of Booleans, one per level.
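
A minimal sketch of the sorting options, assuming a small unsorted sample in place of the full dataset. The row values are illustrative.

```python
import pandas as pd

# Deliberately unsorted sample rows
index = pd.MultiIndex.from_tuples(
    [
        ("Canada", "2001-04-01"),
        ("Brazil", "2000-04-01"),
        ("Canada", "2000-04-01"),
        ("Brazil", "2001-04-01"),
    ],
    names=["Country", "Date"],
)
prices = pd.DataFrame({"Price in US Dollars": [2.14, 1.65, 1.94, 1.64]}, index=index)

# Sort by every level, outermost first, all ascending
all_levels = prices.sort_index()

# One Boolean per level: Country ascending, Date descending
mixed = prices.sort_index(ascending=[True, False])

# Sort primarily by a specific level (remaining levels break ties by default)
date_first = prices.sort_index(level="Date")

print(list(mixed.index))
```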

In [43]:
bigmac.loc[("Brazil")]  # ("Brazil") is just the string "Brazil"; a one-element tuple would be ("Brazil",)

Date,Price in US Dollars
2000-04-01,1.648045
2001-04-01,1.643836
2002-04-01,1.538462
2003-04-01,1.482085
2004-05-01,1.698113
2005-06-01,2.393703
2006-01-01,2.741543
2006-05-01,2.777175
2007-01-01,2.999766
2007-06-01,3.6069


## Extract Rows from a MultiIndex DataFrame
- A **tuple** is an immutable sequence, similar to a list that cannot be modified after creation.
- Create a tuple with a comma between elements. The community convention is to wrap the elements in parentheses.
- The `iloc` and `loc` accessors are available to extract rows by index position or label.
- For the `loc` accessor, pass a tuple to hold the labels from the index levels.
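
The following sketch puts the bullets above together: it creates a tuple, then uses it with `loc`, alongside `iloc` for position-based access. The sample rows and variable names are assumptions for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Brazil", "2000-04-01"), ("Brazil", "2001-04-01"), ("Canada", "2000-04-01")],
    names=["Country", "Date"],
)
prices = pd.DataFrame({"Price in US Dollars": [1.65, 1.64, 1.94]}, index=index)

# The comma creates the tuple; the parentheses are convention
key = "Brazil", "2001-04-01"
same_key = ("Brazil", "2001-04-01")
print(key == same_key, type(key).__name__)

# A tuple with one label per level selects a single row (returned as a Series)
row = prices.loc[same_key]

# Tuple for the row labels, then a column label, returns a single value
price = prices.loc[("Brazil", "2001-04-01"), "Price in US Dollars"]

# A lone outer-level label returns all rows under it, indexed by the inner level
brazil_rows = prices.loc["Brazil"]

# iloc ignores labels and works by position
first_row = prices.iloc[0]
```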

In [44]:
bigmac

Country,Date,Price in US Dollars
Argentina,2000-04-01,2.500000
Australia,2000-04-01,1.541667
Brazil,2000-04-01,1.648045
Canada,2000-04-01,1.938776
Switzerland,2000-04-01,3.470588
...,...,...
Ukraine,2020-07-01,2.174714
Uruguay,2020-07-01,4.327418
United States,2020-07-01,5.710000
Vietnam,2020-07-01,2.847282


In [45]:
#bigmac.loc[("Brazil","2000-04-01"):("Brazil","2000-10-01") ]

## The transpose Method
- The `transpose` method swaps the row and column axes of the **DataFrame**: index labels become column labels and vice versa.
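
A small sketch of `transpose` on a **MultiIndex DataFrame**, assuming three sample rows. After the flip, the two row levels become two column levels.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Brazil", "2000-04-01"), ("Brazil", "2001-04-01"), ("Canada", "2000-04-01")],
    names=["Country", "Date"],
)
prices = pd.DataFrame({"Price in US Dollars": [1.65, 1.64, 1.94]}, index=index)

# 3 rows x 1 column becomes 1 row x 3 columns
flipped = prices.transpose()

print(prices.shape, flipped.shape)
print(list(flipped.columns.names))
```

The `T` attribute (`prices.T`) is a shorthand for the same operation.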

In [46]:
bigmac = bigmac.sort_index()

bigmac.loc["Argentina":"Brazil"].transpose()

Country,Argentina,Argentina,Argentina,Argentina,Argentina,Argentina,Argentina,Argentina,Argentina,Argentina,...,Brazil,Brazil,Brazil,Brazil,Brazil,Brazil,Brazil,Brazil,Brazil,Brazil
Date,2000-04-01,2001-04-01,2002-04-01,2003-04-01,2004-05-01,2005-06-01,2006-01-01,2006-05-01,2007-01-01,2007-06-01,...,2016-01-01,2016-07-01,2017-01-01,2017-07-01,2018-01-01,2018-07-01,2019-01-01,2019-07-09,2020-01-14,2020-07-01
Price in US Dollars,2.5,2.5,0.798722,1.423611,1.477966,1.639627,1.550362,2.290201,2.670983,2.668607,...,3.354204,4.781811,5.117945,5.101568,5.111683,4.402934,4.545516,4.596433,4.804558,3.913528


In [66]:
test = pd.read_csv("worldstats.csv", index_col=["country"]).sort_index()
test

country,year,Population,GDP
Afghanistan,2003,22507368.0,4.583649e+09
Afghanistan,1964,9728645.0,8.000000e+08
Afghanistan,1965,9935358.0,1.006667e+09
Afghanistan,1966,10148841.0,1.400000e+09
Afghanistan,1967,10368600.0,1.673333e+09
...,...,...,...
Zimbabwe,1993,11256512.0,6.563813e+09
Zimbabwe,1992,11019717.0,6.751472e+09
Zimbabwe,1991,10763036.0,8.641482e+09
Zimbabwe,1989,10184966.0,8.286323e+09


## The stack Method
- The `stack` method moves the column index to the row index.
- For a **DataFrame** with a single level of columns, pandas will return a **MultiIndex Series**.
- Think of it like "stacking" index levels for a **MultiIndex**.
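
As a rough illustration, the sketch below stacks a two-row frame built from the first two Afghanistan rows of the world stats data. The former column names (`Population`, `GDP`) become a third, innermost index level.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Afghanistan", 1960), ("Afghanistan", 1961)],
    names=["country", "year"],
)
stats = pd.DataFrame(
    {"Population": [8994793.0, 9164945.0], "GDP": [5.377778e08, 5.488889e08]},
    index=index,
)

# Two columns x two rows become four values in one Series
stacked = stats.stack()

print(type(stacked).__name__, stacked.index.nlevels)
print(stacked.loc[("Afghanistan", 1960, "GDP")])
```

Depending on the installed version, pandas may print a `FutureWarning` about a newer `stack` implementation; for a frame without missing values like this one, the result should be the same either way.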

In [47]:
w_stats = pd.read_csv("worldstats.csv", index_col=["country","year"]).sort_index()
w_stats

country,year,Population,GDP
Afghanistan,1960,8994793.0,5.377778e+08
Afghanistan,1961,9164945.0,5.488889e+08
Afghanistan,1962,9343772.0,5.466667e+08
Afghanistan,1963,9531555.0,7.511112e+08
Afghanistan,1964,9728645.0,8.000000e+08
...,...,...,...
Zimbabwe,2011,14255592.0,1.095623e+10
Zimbabwe,2012,14565482.0,1.239272e+10
Zimbabwe,2013,14898092.0,1.349023e+10
Zimbabwe,2014,15245855.0,1.419691e+10


In [48]:
stacked = w_stats.stack().to_frame()
stacked.loc["Afghanistan",1960,"Population"] # = 8.994793e+06
stacked.loc["Afghanistan",1960,"GDP"] # = 5.377778e+08
stacked

country,year,,0
Afghanistan,1960,Population,8.994793e+06
Afghanistan,1960,GDP,5.377778e+08
Afghanistan,1961,Population,9.164945e+06
Afghanistan,1961,GDP,5.488889e+08
Afghanistan,1962,Population,9.343772e+06
...,...,...,...
Zimbabwe,2013,GDP,1.349023e+10
Zimbabwe,2014,Population,1.524586e+07
Zimbabwe,2014,GDP,1.419691e+10
Zimbabwe,2015,Population,1.560275e+07


In [57]:
stacked.unstack(level=2)



,,0,0
country,year,Population,GDP
Afghanistan,1960,8994793.0,5.377778e+08
Afghanistan,1961,9164945.0,5.488889e+08
Afghanistan,1962,9343772.0,5.466667e+08
Afghanistan,1963,9531555.0,7.511112e+08
Afghanistan,1964,9728645.0,8.000000e+08
...,...,...,...
Zimbabwe,2011,14255592.0,1.095623e+10
Zimbabwe,2012,14565482.0,1.239272e+10
Zimbabwe,2013,14898092.0,1.349023e+10
Zimbabwe,2014,15245855.0,1.419691e+10


In [62]:
w_stats.unstack(level=1)

,Population,Population,Population,Population,Population,Population,Population,Population,Population,Population,...,GDP,GDP,GDP,GDP,GDP,GDP,GDP,GDP,GDP,GDP
year,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
country,,,,,,,,,,,,,,,,,,,,,
Afghanistan,8.994793e+06,9.164945e+06,9.343772e+06,9.531555e+06,9.728645e+06,9.935358e+06,1.014884e+07,1.036860e+07,1.059979e+07,1.084951e+07,...,7.057598e+09,9.843842e+09,1.019053e+10,1.248694e+10,1.593680e+10,1.793024e+10,2.053654e+10,2.004633e+10,2.005019e+10,1.919944e+10
Albania,,,,,,,,,,,...,8.992642e+09,1.070101e+10,1.288135e+10,1.204421e+10,1.192695e+10,1.289087e+10,1.231978e+10,1.278103e+10,1.327796e+10,1.145560e+10
Algeria,1.112489e+07,1.140486e+07,1.169015e+07,1.198513e+07,1.229597e+07,1.262695e+07,1.298027e+07,1.335420e+07,1.374438e+07,1.414444e+07,...,1.170273e+11,1.349771e+11,1.710007e+11,1.372110e+11,1.612073e+11,2.000131e+11,2.090474e+11,2.097035e+11,2.135185e+11,1.668386e+11
Andorra,,,,,,,,,,,...,3.536452e+09,4.010785e+09,4.001349e+09,3.649863e+09,3.346317e+09,3.427236e+09,3.146178e+09,3.249101e+09,,
Angola,,,,,,,,,,,...,4.178948e+10,6.044892e+10,8.417803e+10,7.549238e+10,8.247091e+10,1.041159e+11,1.153984e+11,1.249121e+11,1.267751e+11,1.026431e+11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
West Bank and Gaza,,,,,,,,,,,...,4.910100e+09,5.505800e+09,6.673500e+09,7.268200e+09,8.913100e+09,1.045985e+10,1.127940e+10,1.247600e+10,1.271560e+10,1.267740e+10
World,3.035056e+09,3.076121e+09,3.129064e+09,3.193947e+09,3.259355e+09,3.326054e+09,3.395866e+09,3.465297e+09,3.535512e+09,3.609910e+09,...,5.107451e+13,5.758343e+13,6.312856e+13,5.983553e+13,6.564782e+13,7.284314e+13,7.442836e+13,7.643132e+13,7.810634e+13,7.343364e+13
"Yemen, Rep.",,,,,,,,,,,...,1.908173e+10,2.563367e+10,3.039720e+10,2.845950e+10,3.090675e+10,3.107886e+10,3.207477e+10,3.595450e+10,,
Zambia,3.049586e+06,3.142848e+06,3.240664e+06,3.342894e+06,3.449266e+06,3.559687e+06,3.674088e+06,3.792864e+06,3.916928e+06,4.047479e+06,...,1.275686e+10,1.405696e+10,1.791086e+10,1.532834e+10,2.026555e+10,2.345952e+10,2.550306e+10,2.804552e+10,2.713464e+10,2.120156e+10


## The unstack Method
- The `unstack` method moves a row index to the column index (the inverse of the `stack` method).
- By default, the `unstack` method will move the innermost index.
- We can customize the moved index with the `level` parameter.
- The `level` parameter accepts the level's index position or its name. It can also accept a list of positions/names.
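
A minimal sketch of `unstack` with and without the `level` parameter. The figures here are made-up numbers, not real statistics; only the shape mirrors the world stats data.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Afghanistan", 1960), ("Afghanistan", 1961), ("Zambia", 1960), ("Zambia", 1961)],
    names=["country", "year"],
)
# Made-up values, for illustration only
stats = pd.DataFrame(
    {"Population": [100.0, 110.0, 200.0, 220.0], "GDP": [1.0, 1.1, 2.0, 2.2]},
    index=index,
)

# Default: the innermost row level (year) moves to the columns
wide_by_year = stats.unstack()

# Target a level by name...
wide_by_country = stats.unstack(level="country")

# ...or by position (0 is the outermost level, country)
wide_by_position = stats.unstack(level=0)

print(wide_by_year.shape, wide_by_country.shape)
print(list(wide_by_year.columns.names))
```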

In [80]:
quarters = pd.read_csv("quarters.csv")
quarters.melt()  # with no arguments, melt collapses every column into variable/value pairs
quarters

,Salesman,Q1,Q2,Q3,Q4
0,Boris,602908,233879,354479,32704
1,Piers,43790,514863,297151,544493
2,Tommy,392668,113579,430882,247231
3,Travis,834663,266785,749238,570524
4,Cindy,580935,411379,110390,651572
5,Rob,656644,70803,375948,321388
6,Mike,486141,600753,742716,404995
7,Stacy,479662,742806,770712,2501
8,Alexandra,992673,879183,37945,293710


In [86]:
quarters2 = quarters.melt(id_vars="Salesman", var_name="Quarter", value_name="Revenue")
quarters2

,Salesman,Quarter,Revenue
0,Boris,Q1,602908
1,Piers,Q1,43790
2,Tommy,Q1,392668
3,Travis,Q1,834663
4,Cindy,Q1,580935
5,Rob,Q1,656644
6,Mike,Q1,486141
7,Stacy,Q1,479662
8,Alexandra,Q1,992673
9,Boris,Q2,233879


## The pivot Method
- The `pivot` method reshapes data from a tall format to a wide format.
- Ask yourself which direction the data will expand in if you add more entries.
- A tall/long format expands down. A wide format expands out.
- The `index` parameter sets the column whose values will become the row labels (the index) of the pivoted **DataFrame**.
- The `columns` parameter sets the column whose values will be the columns in the pivoted **DataFrame**.
- The `values` parameter sets the column whose values will fill the pivoted **DataFrame**. Pandas will place each value at the intersection of its index and column labels.
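
The sketch below pivots a four-row tall table into a wide one. The rows are illustrative revenue figures in the same shape as the salesmen data, not a faithful excerpt of it.

```python
import pandas as pd

# Tall format: one row per (Date, Salesman) pair; adding entries grows it downward
tall = pd.DataFrame({
    "Date": ["1/1/2025", "1/1/2025", "1/2/2025", "1/2/2025"],
    "Salesman": ["Sharon", "Oscar", "Sharon", "Oscar"],
    "Revenue": [7172, 5250, 6362, 4100],  # illustrative values
})

# Wide format: one row per Date, one column per Salesman
wide = tall.pivot(index="Date", columns="Salesman", values="Revenue")

print(wide.shape)
print(list(wide.columns))
```

Note that `pivot` expects each index/column pair to appear only once; when pairs repeat and need aggregating, `pivot_table` is the usual alternative.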

In [92]:
salesman = pd.read_csv("salesmen.csv")
salesman


,Date,Salesman,Revenue
0,1/1/2025,Sharon,7172
1,1/2/2025,Sharon,6362
2,1/3/2025,Sharon,5982
3,1/4/2025,Sharon,7917
4,1/5/2025,Sharon,7837
...,...,...,...
1820,12/27/2025,Oscar,835
1821,12/28/2025,Oscar,3073
1822,12/29/2025,Oscar,6424
1823,12/30/2025,Oscar,7088


In [87]:
salesman2 = salesman.pivot(index="Date", columns="Salesman", values="Revenue")
salesman2

Salesman,Alexander,Dave,Oscar,Ronald,Sharon
Date,,,,,
1/1/2025,4430,1864,5250,2639,7172
1/10/2025,301,7105,7663,8267,7543
1/11/2025,9489,6851,8888,1340,1053
1/12/2025,8719,7147,3092,279,4362
1/13/2025,2349,6160,6139,7540,6812
...,...,...,...,...,...
9/5/2025,2439,211,7743,4252,992
9/6/2025,7585,7293,5072,1112,556
9/7/2025,6669,9774,5230,3608,6499
9/8/2025,3058,8194,7755,5762,9621


## The melt Method
- The `melt` method is the inverse of the `pivot` method.
- It takes a 'wide' dataset and converts it to a 'tall' dataset.
- The `melt` method is ideal when you have multiple columns storing the *same* data point.
- Ask yourself whether the column's values are a *type* of the column header. If they're not, the data is likely stored in a wide format.
- The `id_vars` parameter accepts the identifier column(s) to keep as-is; their values are repeated for every melted column.
- The `var_name` parameter sets the name of the new column for the varying values (the former column names).
- The `value_name` parameter sets the name of the new values column (holding the values from the original **DataFrame**).
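
A short sketch of `melt` on a two-salesman, two-quarter slice of the quarters data, followed by a `pivot` call to show the two methods undoing each other. Variable names are illustrative.

```python
import pandas as pd

# Wide format: Q1 and Q2 both store the same kind of data point (revenue)
wide = pd.DataFrame({
    "Salesman": ["Boris", "Piers"],
    "Q1": [602908, 43790],
    "Q2": [233879, 514863],
})

# Keep Salesman as the identifier; former column names go into "Quarter"
tall = wide.melt(id_vars="Salesman", var_name="Quarter", value_name="Revenue")
print(tall)

# pivot reverses the reshaping
back_to_wide = tall.pivot(index="Salesman", columns="Quarter", values="Revenue")
print(back_to_wide.loc["Boris", "Q2"])
```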

## The pivot_table Method
- The `pivot_table` method operates similarly to the Pivot Table feature in Excel.
- A pivot table is a table whose values are aggregations of groups of values from another table.
- The `values` parameter accepts the numeric column whose values will be aggregated.
- The `aggfunc` parameter declares the aggregation function (the default is mean/average).
- The `index` parameter sets the index labels of the pivot table. MultiIndexes are permitted.
- The `columns` parameter sets the column labels of the pivot table. MultiIndexes are permitted.
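
The sketch below shows the default mean aggregation and a custom `aggfunc` with a **MultiIndex** on the rows. The five orders are invented for illustration and only borrow the column names from the foods data.

```python
import pandas as pd

# Invented orders, chosen so the aggregates are easy to check by hand
orders = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Female"],
    "City": ["Stamford", "Stamford", "Stamford", "New York", "New York"],
    "Item": ["Burger", "Burger", "Burger", "Sushi", "Sushi"],
    "Spend": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Default aggregation is the mean of Spend within each Item/Gender group
mean_spend = orders.pivot_table(values="Spend", index="Item", columns="Gender")

# A list for `index` produces a MultiIndex; aggfunc switches to a sum
total_spend = orders.pivot_table(
    values="Spend", index=["Item", "City"], columns="Gender", aggfunc="sum"
)

print(mean_spend)
print(total_spend)
```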

In [100]:
rest = pd.read_csv("foods.csv")
rest


,First Name,Gender,City,Frequency,Item,Spend
0,Wanda,Female,Stamford,Weekly,Burger,15.66
1,Eric,Male,Stamford,Daily,Chalupa,10.56
2,Charles,Male,New York,Never,Sushi,42.14
3,Anna,Female,Philadelphia,Once,Ice Cream,11.01
4,Deborah,Female,Philadelphia,Daily,Chalupa,23.49
...,...,...,...,...,...,...
995,Donna,Female,New York,Monthly,Sushi,83.53
996,Albert,Male,Philadelphia,Daily,Sushi,72.88
997,Jean,Female,Stamford,Weekly,Donut,5.85
998,Jessica,Female,New York,Daily,Chalupa,43.19


In [103]:
rest.pivot_table(values="Spend", index=["Item", "City"], columns="Gender")

,Gender,Female,Male
Item,City,,
Burger,New York,51.626667,58.822273
Burger,Philadelphia,52.87871,44.675238
Burger,Stamford,45.037778,46.424516
Burrito,New York,42.563043,55.976
Burrito,Philadelphia,52.098571,43.764333
Burrito,Stamford,53.532647,46.438929
Chalupa,New York,46.135789,49.1108
Chalupa,Philadelphia,52.291562,48.444783
Chalupa,Stamford,64.094,50.011304
Donut,New York,46.670323,44.842333
