# Pandas-6: MultiIndexing & Pivot Tables in Pandas


Both Hierarchical Indexing (MultiIndex) and Pivot Tables in pandas allow you to work with multi-dimensional data in tabular form, but they serve different purposes.

### 1. MultiIndexing (Hierarchical Indexing) in Pandas

* Hierarchical indexing lets a DataFrame or Series carry multiple levels of labels on its rows or columns.
* This lets you represent higher-dimensional data in a 2D table: each extra index level plays the role of an extra dimension, as the DataFrame built below illustrates.

**Why two names?**
- Hierarchical Indexing: emphasizes the tree-like structure of the index (levels within levels).
- MultiIndexing: refers to the actual pandas object (`pandas.MultiIndex`) that implements this feature.

**So, Hierarchical Indexing is the concept, while MultiIndex is the implementation in pandas.**

In [1]:
# import
import numpy as np
import pandas as pd

In [2]:
arrays = [
    ['North', 'North', 'South', 'South'],
    [2024, 2025, 2024, 2025]
]

# Create the MultiIndex from parallel arrays (one array per level)
index = pd.MultiIndex.from_arrays(arrays, names=('Region', 'Year'))
index

MultiIndex([('North', 2024),
            ('North', 2025),
            ('South', 2024),
            ('South', 2025)],
           names=['Region', 'Year'])
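
`from_arrays` is only one of the constructors. As a minimal sketch (the variable names here are illustrative, not from the notebook), the same index can also be built from a list of tuples or from the Cartesian product of the level values:

```python
import pandas as pd

regions = ['North', 'South']
years = [2024, 2025]

# One array per level, aligned position by position
idx_arrays = pd.MultiIndex.from_arrays(
    [['North', 'North', 'South', 'South'], [2024, 2025, 2024, 2025]],
    names=['Region', 'Year'],
)

# One tuple per row label
idx_tuples = pd.MultiIndex.from_tuples(
    [('North', 2024), ('North', 2025), ('South', 2024), ('South', 2025)],
    names=['Region', 'Year'],
)

# Every combination of the level values (Cartesian product)
idx_product = pd.MultiIndex.from_product([regions, years], names=['Region', 'Year'])

print(idx_arrays.equals(idx_tuples))
print(idx_arrays.equals(idx_product))
print(idx_product.nlevels, list(idx_product.names))
```

`from_product` is convenient when every combination of level values exists; `from_tuples` or `from_arrays` fits better when only some combinations are present.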

In [3]:
print("\nIndex hierarchy type\n",'-'*25, sep='')
print(type(index))


Index hierarchy type
-------------------------
<class 'pandas.core.indexes.multi.MultiIndex'>


In [4]:
# Create a DataFrame of random sales figures (no seed is set, so the values differ on every run)
df = pd.DataFrame(
    np.random.randint(100, 500, size=(4, 2)),
    index=index,
    columns=['Product_A', 'Product_B']
)
print(df)

             Product_A  Product_B
Region Year                      
North  2024        367        377
       2025        472        499
South  2024        281        230
       2025        355        239


**Observe that the DataFrame above is a 2D table, but its index structure encodes three dimensions:**
- Dimension 1 (axis 0, index level 0): Region
- Dimension 2 (axis 0, index level 1): Year
- Dimension 3 (axis 1, columns): Product

**Key Points**
- Each extra index level acts like an extra dimension.
- Instead of creating a true 3D array (as you would in NumPy), pandas flattens the extra dimensions into multi-level row or column labels.
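
To make the "flattened dimensions" idea concrete, here is a small sketch. It assumes a fixed copy of the sales table (named `sales` here, with the values printed above hard-coded so the example is reproducible). It moves the `Year` level from the rows to the columns with `unstack`, and reshapes the same numbers into a genuine 3D NumPy array.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['North', 'South'], [2024, 2025]], names=['Region', 'Year']
)
sales = pd.DataFrame(
    [[367, 377], [472, 499], [281, 230], [355, 239]],
    index=index,
    columns=['Product_A', 'Product_B'],
)

# Move the Year level from the row index to the columns:
# rows are now Region only, columns are (Product, Year) pairs.
wide = sales.unstack('Year')
print(wide)

# The same data as a true 3D array with axes (Region, Year, Product).
# This reshape is only valid because the index is sorted and complete
# (every Region has every Year).
cube = sales.to_numpy().reshape(2, 2, 2)
print(cube.shape)
print(cube[0, 1, 0])  # North, 2025, Product_A
```

The table and the cube hold the same eight numbers; the MultiIndex simply stores two of the three axes as labels on the rows.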

In [7]:
df.index

MultiIndex([('North', 2024),
            ('North', 2025),
            ('South', 2024),
            ('South', 2025)],
           names=['Region', 'Year'])

In [8]:
df.columns

Index(['Product_A', 'Product_B'], dtype='object')

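**`df.xs()`**: cross-section selection, which picks out the rows (or columns) that match a label at a given index level. Each cell below pairs it with the equivalent `df.loc` call.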

In [6]:
# Slice by region
print(df.xs('North', level='Region'))
print("="*50)
print(df.loc['North'])

      Product_A  Product_B
Year                      
2024        367        377
2025        472        499
==================================================
      Product_A  Product_B
Year                      
2024        367        377
2025        472        499


In [27]:
# All regions, year 2024
print(df.xs(2024, level='Year'))
print("="*50)
print(df.loc[pd.IndexSlice[:, 2024], :])

        Product_A  Product_B
Region                      
North         367        377
South         281        230
==================================================
             Product_A  Product_B
Region Year                      
North  2024        367        377
South  2024        281        230


In [25]:
# slice by product
print(df.xs('Product_A', axis=1))
print("="*50)
print(df.loc[:,'Product_A'])

Region  Year
North   2024    367
        2025    472
South   2024    281
        2025    355
Name: Product_A, dtype: int32
==================================================
Region  Year
North   2024    367
        2025    472
South   2024    281
        2025    355
Name: Product_A, dtype: int32


In [9]:
# Access a specific year in a region
print(df.loc[('South', 2025)])
print("="*50)
print(df.xs(('South', 2025)))

Product_A    355
Product_B    239
Name: (South, 2025), dtype: int32
==================================================
Product_A    355
Product_B    239
Name: (South, 2025), dtype: int32
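
One visible difference in the cells above is that `xs` drops the level it selects on, while `loc` with `pd.IndexSlice` keeps it. The sketch below, which assumes a fixed copy of the table called `sales`, shows that passing `drop_level=False` to `xs` keeps the level so the two results line up.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['North', 'South'], [2024, 2025]], names=['Region', 'Year']
)
sales = pd.DataFrame(
    [[367, 377], [472, 499], [281, 230], [355, 239]],
    index=index,
    columns=['Product_A', 'Product_B'],
)

# Default: the selected level (Year) is dropped from the result
by_xs = sales.xs(2024, level='Year')
print(by_xs.index.names)

# drop_level=False keeps Year in the index
by_xs_keep = sales.xs(2024, level='Year', drop_level=False)
print(by_xs_keep.index.names)

# loc with IndexSlice always keeps both levels
by_loc = sales.loc[pd.IndexSlice[:, 2024], :]
print(by_xs_keep.equals(by_loc))
```

Keeping the level is useful when the selection is later concatenated with slices for other years, since the rows stay fully labeled.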


**Aggregation Using MultiIndex**

In [10]:
df

             Product_A  Product_B
Region Year                      
North  2024        367        377
       2025        472        499
South  2024        281        230
       2025        355        239


In [11]:
df.index

MultiIndex([('North', 2024),
            ('North', 2025),
            ('South', 2024),
            ('South', 2025)],
           names=['Region', 'Year'])

In [12]:
# Sum of products per region
df.groupby(level='Region').sum()

        Product_A  Product_B
Region                      
North         839        876
South         636        469


In [13]:
# Sum of products per year
df.groupby(level='Year').sum()

      Product_A  Product_B
Year                      
2024        648        607
2025        827        738
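
Grouping by a level is not limited to `sum()`. As a hedged sketch on a fixed copy of the table (named `sales` here), `agg` can compute several statistics per level at once; the result then has two-level column labels of the form (product, statistic).

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['North', 'South'], [2024, 2025]], names=['Region', 'Year']
)
sales = pd.DataFrame(
    [[367, 377], [472, 499], [281, 230], [355, 239]],
    index=index,
    columns=['Product_A', 'Product_B'],
)

# Sum and mean of each product per region, computed in one pass
per_region = sales.groupby(level='Region').agg(['sum', 'mean'])
print(per_region)

# Individual cells are addressed with a (product, statistic) tuple
north_a_total = per_region.loc['North', ('Product_A', 'sum')]
north_a_mean = per_region.loc['North', ('Product_A', 'mean')]
print(north_a_total, north_a_mean)
```

Note that the aggregated result itself uses a MultiIndex, this time on the columns.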


**Resetting the MultiIndex**
- Use `reset_index()` to turn the index levels back into ordinary columns and return to a default integer index.

In [14]:
df_reset = df.reset_index()
print(df_reset)

  Region  Year  Product_A  Product_B
0  North  2024        367        377
1  North  2025        472        499
2  South  2024        281        230
3  South  2025        355        239
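
Resetting is reversible. The sketch below assumes a fixed copy of the table called `sales`. It shows the round trip through `set_index`, and also a partial reset that moves only one level back into the columns.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['North', 'South'], [2024, 2025]], names=['Region', 'Year']
)
sales = pd.DataFrame(
    [[367, 377], [472, 499], [281, 230], [355, 239]],
    index=index,
    columns=['Product_A', 'Product_B'],
)

# Flatten: Region and Year become ordinary columns
flat = sales.reset_index()
print(list(flat.columns))

# Rebuild the hierarchical index from those columns
restored = flat.set_index(['Region', 'Year'])
print(list(restored.index.names))

# Partial reset: only Year becomes a column, Region stays as the index
year_as_column = sales.reset_index(level='Year')
print(year_as_column)
```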


### 2. Pivot Tables

- A pivot table is a data summarization tool (similar to Excel pivot tables).
- It reshapes a DataFrame by turning the unique values of one column into the new columns and those of another into the index, then aggregates a values column within each cell.
- Use cases: summarizing large datasets, creating cross-tabulations, and aggregating with functions such as mean, sum, or count.

In [15]:
# Sample data
data = {
    'Region': ['North', 'North', 'South', 'South'],
    'Year': [2024, 2025, 2024, 2025],
    'Product': ['Product_A', 'Product_B', 'Product_A', 'Product_B'],
    'Sales': [232, 192, 412, 156]
}

# Create DataFrame
df = pd.DataFrame(data)
df

  Region  Year    Product  Sales
0  North  2024  Product_A    232
1  North  2025  Product_B    192
2  South  2024  Product_A    412
3  South  2025  Product_B    156


In [16]:
# Create a pivot table: total sales per Region per Product
pivot = pd.pivot_table(df, 
                       values='Sales', 
                       index='Region', 
                       columns='Product', 
                       aggfunc='sum')

print(pivot)

Product  Product_A  Product_B
Region                       
North          232        192
South          412        156
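
In the sample above each Region/Product pair occurs only once, so `aggfunc='sum'` has nothing to add up. The sketch below uses a hypothetical `orders` table (made-up numbers, with repeated pairs) to show the aggregation actually happening. It also shows that a pivot table is closely related to hierarchical indexing: grouping by two keys yields a MultiIndex, and unstacking one level gives the same table. The `margins=True` option adds row and column totals.

```python
import pandas as pd

# Hypothetical data: several orders per Region/Product pair
orders = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South', 'South'],
    'Product': ['Product_A', 'Product_A', 'Product_B',
                'Product_A', 'Product_B', 'Product_B'],
    'Sales': [100, 50, 70, 30, 20, 10],
})

# Pivot table: total sales per Region (rows) and Product (columns)
pivot = pd.pivot_table(orders, values='Sales', index='Region',
                       columns='Product', aggfunc='sum')
print(pivot)

# Same table via groupby (MultiIndex result) followed by unstack
via_groupby = orders.groupby(['Region', 'Product'])['Sales'].sum().unstack('Product')
print(via_groupby)

# margins=True appends an 'All' row and column holding the totals
with_totals = pd.pivot_table(orders, values='Sales', index='Region',
                             columns='Product', aggfunc='sum', margins=True)
print(with_totals)
```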


**Cross-tabulation**: a statistical tool mainly used to analyze the relationship between two or more categorical variables.
- Default output: counts (frequencies) of each combination of categories

In [19]:
import pandas as pd

# Sample data
data = {
    'Gender': ['Male','Male','Female','Female','Male','Female','Male','Female'],
    'Preference': ['Tea','Coffee','Coffee','Tea','Tea','Coffee','Coffee','Tea']
}
df = pd.DataFrame(data)
df.head()

   Gender Preference
0    Male        Tea
1    Male     Coffee
2  Female     Coffee
3  Female        Tea
4    Male        Tea


In [21]:
# Create cross tab using pivot_table
pivot = pd.pivot_table(df, index='Gender', 
                       columns='Preference', 
                       aggfunc='size')
pivot

Preference  Coffee  Tea
Gender                 
Female           2    2
Male             2    2


In [20]:
# Create crosstab
crosstab = pd.crosstab(df['Gender'], df['Preference'])
print(crosstab)

Preference  Coffee  Tea
Gender                 
Female           2    2
Male             2    2
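
Beyond raw counts, `pd.crosstab` can add totals and convert counts to proportions. The sketch below uses a hypothetical, deliberately unbalanced `survey` table (not the data above, where every cell equals 2) so that the effect of `margins` and `normalize` is visible.

```python
import pandas as pd

# Hypothetical survey responses, unbalanced on purpose
survey = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Male', 'Female',
               'Female', 'Female', 'Female', 'Male'],
    'Preference': ['Tea', 'Tea', 'Coffee', 'Coffee',
                   'Coffee', 'Tea', 'Coffee', 'Tea'],
})

# Counts with row and column totals in an 'All' row/column
counts = pd.crosstab(survey['Gender'], survey['Preference'], margins=True)
print(counts)

# Row-wise proportions: each row sums to 1
shares = pd.crosstab(survey['Gender'], survey['Preference'], normalize='index')
print(shares)
```

With `normalize='index'` each row reads as "within this gender, what fraction prefers each drink"; `normalize='columns'` and `normalize='all'` normalize by column and by the grand total respectively.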


### Summary: Hierarchical Indexing vs Pivot Tables

- **Hierarchical Indexing (MultiIndex)**  
  Use this when you want to **organize and access data with multiple keys**.  
  Example: Accessing sales data by `Region` and `Year`.

- **Pivot Tables**  
  Use this when you want to **summarize or aggregate data into a cross-tab format** (similar to Excel Pivot).  
  Example: Calculating total sales per `Region` per `Product`.


---

Happy Learning! Team DecodeAiML