# MultiIndex and Advanced Data Structures in Pandas

A **MultiIndex** in Pandas gives a DataFrame or Series multiple levels of row or column labels, so higher-dimensional data can be represented in a 2D DataFrame.

Advantages:
- Organize complex datasets without converting them into multiple separate DataFrames.
- Aggregate by any index level, for example with `groupby(level=...)`.
- Access subsets of data using multiple keys.

Key concepts:
- Creating a MultiIndex from arrays or tuples (`pd.MultiIndex.from_arrays`, `pd.MultiIndex.from_tuples`).
- Using `set_index()` to convert columns into a MultiIndex.
- Accessing data with `.loc[]` and `.xs()` (cross-section).
- Stacking/unstacking to reshape data.

A MultiIndex is useful in scenarios such as time series with multiple categories, sales data per region and product, or experimental results from multiple trials.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Example sales data for different cities and products
arrays = [
    ["New York", "New York", "Los Angeles", "Los Angeles", "Chicago", "Chicago"],
    ["Apples", "Oranges", "Apples", "Oranges", "Apples", "Oranges"]
]

# Create a MultiIndex from arrays
multi_index = pd.MultiIndex.from_arrays(arrays, names=("City", "Product"))

# Create a DataFrame with the MultiIndex.
# The values are random and unseeded, so the numbers differ between runs.
sales_data = pd.DataFrame({
    "Sales": np.random.randint(100, 500, size=6),
    "Profit": np.random.randint(10, 100, size=6)
}, index=multi_index)

print("MultiIndex DataFrame:\n", sales_data)

MultiIndex DataFrame:
                      Sales  Profit
City        Product               
New York    Apples     293      42
            Oranges    281      85
Los Angeles Apples     116      20
            Oranges    287      17
Chicago     Apples     125      22
            Oranges    231      23
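
The Key concepts list also mentions tuples and `set_index()`, which the cell above does not exercise. The following is a minimal sketch of both routes. The `flat` table and the other variable names are illustrative, and the numbers are copied from the sample output above rather than drawn at random.

```python
import pandas as pd

# Flat table holding the values printed above (illustrative, not random).
flat = pd.DataFrame({
    "City": ["New York", "New York", "Los Angeles", "Los Angeles", "Chicago", "Chicago"],
    "Product": ["Apples", "Oranges", "Apples", "Oranges", "Apples", "Oranges"],
    "Sales": [293, 281, 116, 287, 125, 231],
    "Profit": [42, 85, 20, 17, 22, 23],
})

# Route 1: promote two existing columns to index levels.
by_columns = flat.set_index(["City", "Product"])

# Route 2: build the index explicitly from (city, product) tuples.
tuples = list(zip(flat["City"], flat["Product"]))
index_from_tuples = pd.MultiIndex.from_tuples(tuples, names=["City", "Product"])

# Both routes describe the same six (City, Product) pairs.
print(by_columns.index.equals(index_from_tuples))
print(by_columns.index.names)
```

`set_index()` is usually the more convenient route when the labels already live in columns, for example after reading a CSV file.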


In [3]:
# Access all sales in New York
print("\nSales in New York:\n", sales_data.loc["New York"])


Sales in New York:
          Sales  Profit
Product               
Apples     293      42
Oranges    281      85
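
Passing a single label to `.loc[]` selects on the first level only. As a sketch of selecting with several keys at once, the block below rebuilds the frame with the values from the sample output (an assumption made for reproducibility) and uses a tuple key. `pd.MultiIndex.from_product` is used here only as a compact way to produce the same six index pairs.

```python
import pandas as pd

# Same six (City, Product) pairs as above, built as a Cartesian product.
index = pd.MultiIndex.from_product(
    [["New York", "Los Angeles", "Chicago"], ["Apples", "Oranges"]],
    names=["City", "Product"],
)
sales_data = pd.DataFrame(
    {"Sales": [293, 281, 116, 287, 125, 231], "Profit": [42, 85, 20, 17, 22, 23]},
    index=index,
)

# A full (City, Product) tuple selects one row, returned as a Series.
ny_apples = sales_data.loc[("New York", "Apples"), :]

# A tuple key plus a column label returns a single value.
la_orange_sales = sales_data.loc[("Los Angeles", "Oranges"), "Sales"]

# A list of first-level labels selects several cities at once.
two_cities = sales_data.loc[["New York", "Chicago"]]

print(ny_apples)
print(la_orange_sales)
print(two_cities)
```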


In [4]:
# Access only 'Apples' sales across all cities
print("\nApples Sales Across Cities:\n", sales_data.xs("Apples", level="Product"))


Apples Sales Across Cities:
              Sales  Profit
City                      
New York       293      42
Los Angeles    116      20
Chicago        125      22
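
Where `.xs()` picks out one label at a level, `groupby(level=...)` aggregates over a level. The sketch below illustrates the "grouped operations" advantage listed at the top. It assumes the values from the sample output instead of random numbers, and the variable names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["New York", "Los Angeles", "Chicago"], ["Apples", "Oranges"]],
    names=["City", "Product"],
)
sales_data = pd.DataFrame(
    {"Sales": [293, 281, 116, 287, 125, 231], "Profit": [42, 85, 20, 17, 22, 23]},
    index=index,
)

# Sum Sales and Profit per product, across all cities.
totals_by_product = sales_data.groupby(level="Product").sum()

# Sum only the Sales column per city, across all products.
sales_by_city = sales_data.groupby(level="City")["Sales"].sum()

print(totals_by_product)
print(sales_by_city)
```

Grouping by a level name avoids calling `reset_index()` first just to get a column to group on.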


In [5]:
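# stack() moves the column labels (Sales, Profit) into a new innermost index level,
# turning the DataFrame into a Series with a three-level index
stacked = sales_data.stack()
print("\nStacked Data:\n", stacked)

# unstack() moves the innermost index level (Product) into the columns;
# the remaining City index comes back sorted alphabetically
unstacked = sales_data.unstack()
print("\nUnstacked Data:\n", unstacked)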


Stacked Data:
 City         Product        
New York     Apples   Sales     293
                      Profit     42
             Oranges  Sales     281
                      Profit     85
Los Angeles  Apples   Sales     116
                      Profit     20
             Oranges  Sales     287
                      Profit     17
Chicago      Apples   Sales     125
                      Profit     22
             Oranges  Sales     231
                      Profit     23
dtype: int32

Unstacked Data:
              Sales         Profit        
Product     Apples Oranges Apples Oranges
City                                     
Chicago        125     231     22      23
Los Angeles    116     287     20      17
New York       293     281     42      85
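
By default `unstack()` pivots the innermost level, but a level can also be chosen by name. The following sketch, which assumes the values from the sample output, pivots a single column each way and then stacks the wide table back into long form.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["New York", "Los Angeles", "Chicago"], ["Apples", "Oranges"]],
    names=["City", "Product"],
)
sales_data = pd.DataFrame(
    {"Sales": [293, 281, 116, 287, 125, 231], "Profit": [42, 85, 20, 17, 22, 23]},
    index=index,
)

# Cities as rows, products as columns.
wide_by_product = sales_data["Sales"].unstack(level="Product")

# Products as rows, cities as columns.
wide_by_city = sales_data["Sales"].unstack(level="City")

# stack() reverses the pivot: the column labels become the innermost index level again.
long_again = wide_by_product.stack()

print(wide_by_product)
print(wide_by_city)
print(long_again)
```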


In [6]:
# Reset index to turn MultiIndex back into columns
reset_df = sales_data.reset_index()
print("\nDataFrame with Reset Index:\n", reset_df)


DataFrame with Reset Index:
           City  Product  Sales  Profit
0     New York   Apples    293      42
1     New York  Oranges    281      85
2  Los Angeles   Apples    116      20
3  Los Angeles  Oranges    287      17
4      Chicago   Apples    125      22
5      Chicago  Oranges    231      23
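
`reset_index()` can also flatten just one level, and `set_index()` reverses the operation. A short sketch, again assuming the values from the sample output:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["New York", "Los Angeles", "Chicago"], ["Apples", "Oranges"]],
    names=["City", "Product"],
)
sales_data = pd.DataFrame(
    {"Sales": [293, 281, 116, 287, 125, 231], "Profit": [42, 85, 20, 17, 22, 23]},
    index=index,
)

# Move only the Product level into a column; City stays as the index.
partial = sales_data.reset_index(level="Product")

# Flatten everything, then rebuild the original two-level index.
restored = sales_data.reset_index().set_index(["City", "Product"])

print(partial)
print(restored.equals(sales_data))
```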


# Real-World Analogy: Library Catalog System

Think of a MultiIndex as the way a library organizes its books:
- **Level 1 (like `City`)**: The branch of the library (New York, Los Angeles, Chicago).
- **Level 2 (like `Product`)**: The category of the books (Fiction, Non-fiction).

This two-level labeling lets you quickly find:
- All books in a specific branch.
- All books in a certain category across all branches.
