# Level 7: Indexing & MultiIndex (Advanced)

This level covers advanced indexing in Pandas, centered on the MultiIndex (hierarchical index). A MultiIndex places two or more levels of labels on an axis, so you can select and reshape data by any combination of those levels.

In [1]:
import pandas as pd
import numpy as np

## 7.1 MultiIndex (Hierarchical Indexing)

### Creating a MultiIndex

In [2]:
data = {
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 50, 300, 250],
    'Quantity': [10, 12, 25, 8, 30, 22]
}
df = pd.DataFrame(data)

# Set 'Region' and 'Product' as the index
df_multi = df.set_index(['Region', 'Product'])
df_multi

| Region | Product | Sales | Quantity |
|---|---|---|---|
| North | A | 100 | 10 |
| North | B | 150 | 12 |
| South | A | 200 | 25 |
| South | B | 50 | 8 |
| West | A | 300 | 30 |
| West | B | 250 | 22 |
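
A MultiIndex can also be built directly instead of through `.set_index()`. The sketch below assumes the same labels and values as the example above and uses `pd.MultiIndex.from_product`, which forms every combination of the given label lists. The names `regions`, `products`, and `df_direct` are illustrative only.

```python
import pandas as pd

# Illustrative label lists mirroring the example data
regions = ['North', 'South', 'West']
products = ['A', 'B']

# Cartesian product of the two lists, with named levels
index = pd.MultiIndex.from_product(
    [regions, products], names=['Region', 'Product']
)

df_direct = pd.DataFrame(
    {
        'Sales': [100, 150, 200, 50, 300, 250],
        'Quantity': [10, 12, 25, 8, 30, 22],
    },
    index=index,
)

print(df_direct.index.names)    # level names
print(df_direct.index.nlevels)  # number of levels
```

`from_product` suits data where every outer label pairs with every inner label. For irregular combinations, `pd.MultiIndex.from_tuples` takes the explicit pairs instead.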


### Accessing Data with a MultiIndex

In [3]:
# Accessing the outer level ('Region')
df_multi.loc['North']

| Product | Sales | Quantity |
|---|---|---|
| A | 100 | 10 |
| B | 150 | 12 |


In [4]:
# Selecting a single row by both levels requires a tuple of labels, outer level first
df_multi.loc[('North', 'A')]

Sales       100
Quantity     10
Name: (North, A), dtype: int64

In [5]:
# Get a specific value
df_multi.loc[('North', 'A'), 'Sales']

np.int64(100)
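
`.loc` with a single label selects on the outer level only. To select on the inner level alone, one option is the cross-section method `.xs()` with its `level` argument. The sketch below rebuilds the example frame so that it runs on its own; `product_a` and `product_a_full` are illustrative names.

```python
import pandas as pd

df_multi = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 50, 300, 250],
    'Quantity': [10, 12, 25, 8, 30, 22],
}).set_index(['Region', 'Product'])

# All rows whose inner level 'Product' equals 'A';
# by default the selected level is dropped from the result
product_a = df_multi.xs('A', level='Product')

# drop_level=False keeps both index levels in the result
product_a_full = df_multi.xs('A', level='Product', drop_level=False)

print(product_a)
print(product_a_full)
```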

### Swapping Levels (`.swaplevel()`)

In [6]:
# Swap the order of the index levels
df_multi.swaplevel()

| Product | Region | Sales | Quantity |
|---|---|---|---|
| A | North | 100 | 10 |
| B | North | 150 | 12 |
| A | South | 200 | 25 |
| B | South | 50 | 8 |
| A | West | 300 | 30 |
| B | West | 250 | 22 |
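
Swapping levels changes the labels but not the row order, so the swapped index is no longer sorted. A common follow-up, sketched here on a rebuilt copy of the example frame, is to call `.sort_index()` so that rows sharing the new outer label sit together. The name `swapped` is illustrative.

```python
import pandas as pd

df_multi = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 50, 300, 250],
    'Quantity': [10, 12, 25, 8, 30, 22],
}).set_index(['Region', 'Product'])

# Swap the two levels, then sort so that 'Product' groups are contiguous
swapped = df_multi.swaplevel().sort_index()

print(swapped)

# With 'Product' as the outer level, a single label now selects by product
print(swapped.loc['A'])
```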


### Unstacking and Stacking
- **`.unstack()`**: Pivots a level of the index labels into column headers. By default it moves the innermost index level.
- **`.stack()`**: Pivots a level of the column labels into index labels. By default it moves the innermost column level.

In [7]:
# Unstack the inner level ('Product')
df_unstacked = df_multi.unstack()
df_unstacked

| Region | (Sales, A) | (Sales, B) | (Quantity, A) | (Quantity, B) |
|---|---|---|---|---|
| North | 100 | 150 | 10 | 12 |
| South | 200 | 50 | 25 | 8 |
| West | 300 | 250 | 30 | 22 |


In [8]:
# Stack it back
df_unstacked.stack()


| Region | Product | Sales | Quantity |
|---|---|---|---|
| North | A | 100 | 10 |
| North | B | 150 | 12 |
| South | A | 200 | 25 |
| South | B | 50 | 8 |
| West | A | 300 | 30 |
| West | B | 250 | 22 |
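
`.unstack()` accepts a `level` argument to choose which index level to pivot, and a `fill_value` argument for combinations that do not exist in the data. The sketch below rebuilds the example frame and removes its last row purely to create a missing combination; `by_region`, `incomplete`, `with_nan`, and `with_zero` are illustrative names.

```python
import pandas as pd

df_multi = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 50, 300, 250],
    'Quantity': [10, 12, 25, 8, 30, 22],
}).set_index(['Region', 'Product'])

# Pivot the outer level by name instead of the default innermost level
by_region = df_multi.unstack(level='Region')

# Drop the last row (West, B) so that one combination is missing
incomplete = df_multi.iloc[:-1]

with_nan = incomplete.unstack()               # missing cell becomes NaN
with_zero = incomplete.unstack(fill_value=0)  # missing cell becomes 0

print(by_region)
print(with_nan)
print(with_zero)
```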


## 7.2 Advanced Indexing

### Index Alignment in Operations
When you perform an operation on two Series or DataFrames, Pandas automatically aligns them by their index labels before computing the result.

In [9]:
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are added where indices match. Where they don't, the result is NaN.
s1 + s2

a     NaN
b    12.0
c    23.0
d     NaN
dtype: float64
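
If NaN for unmatched labels is not what you want, the method form of the operator takes a `fill_value`. The sketch below uses `Series.add` with `fill_value=0`, which treats a label missing from one side as 0 for the purpose of the addition. The name `total` is illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels present in only one Series are paired with 0 instead of producing NaN
total = s1.add(s2, fill_value=0)

print(total)
```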

### Reindexing (`.reindex()`)
Conform a Series or DataFrame to a new index. Labels absent from the original become missing values unless you specify how to fill them.

In [10]:
s = pd.Series(['A', 'B', 'C'], index=[0, 2, 4])
s

0    A
2    B
4    C
dtype: object

In [11]:
# Reindex to a new index, introducing missing values
s.reindex(range(6))

0      A
1    NaN
2      B
3    NaN
4      C
5    NaN
dtype: object

In [12]:
# Reindex and fill missing values using forward-fill
s.reindex(range(6), method='ffill')

0    A
1    A
2    B
3    B
4    C
5    C
dtype: object
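
Besides `method='ffill'`, `.reindex()` accepts a constant `fill_value` for the new labels. The sketch below reuses the Series from above; the placeholder string `'missing'` and the name `filled` are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C'], index=[0, 2, 4])

# New labels 1, 3 and 5 receive the constant instead of NaN
filled = s.reindex(range(6), fill_value='missing')

print(filled)
```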

### Index Slicing with `pd.IndexSlice`
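For a MultiIndex, `pd.IndexSlice` lets you write a slice for each index level inside `.loc`. Range slices on a MultiIndex require a sorted index, which is why the example first calls `.sort_index()`.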

In [13]:
df_multi_sorted = df_multi.sort_index()
df_multi_sorted

| Region | Product | Sales | Quantity |
|---|---|---|---|
| North | A | 100 | 10 |
| North | B | 150 | 12 |
| South | A | 200 | 25 |
| South | B | 50 | 8 |
| West | A | 300 | 30 |
| West | B | 250 | 22 |


In [14]:
idx = pd.IndexSlice

# Select all rows for 'Product' B
# The first ':' means all of the first level ('Region')
df_multi_sorted.loc[idx[:, 'B'], :]

| Region | Product | Sales | Quantity |
|---|---|---|---|
| North | B | 150 | 12 |
| South | B | 50 | 8 |
| West | B | 250 | 22 |


In [15]:
# Select all products for regions from 'North' to 'South' (label slices include both endpoints)
df_multi_sorted.loc[idx['North':'South', :], :]

| Region | Product | Sales | Quantity |
|---|---|---|---|
| North | A | 100 | 10 |
| North | B | 150 | 12 |
| South | A | 200 | 25 |
| South | B | 50 | 8 |


In [16]:
# Select Sales for Product A in all Regions
df_multi_sorted.loc[idx[:, 'A'], 'Sales']

Region  Product
North   A          100
South   A          200
West    A          300
Name: Sales, dtype: int64
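
`pd.IndexSlice` also accepts lists of labels, which helps when the labels you want are not contiguous. The sketch below rebuilds and sorts the example frame, then picks two non-adjacent regions for Product A. The names `df_sorted` and `subset` are illustrative.

```python
import pandas as pd

df_sorted = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 50, 300, 250],
    'Quantity': [10, 12, 25, 8, 30, 22],
}).set_index(['Region', 'Product']).sort_index()

idx = pd.IndexSlice

# A list on the first level selects specific regions;
# passing ['Sales'] as a list keeps the result a DataFrame
subset = df_sorted.loc[idx[['North', 'West'], 'A'], ['Sales']]

print(subset)
```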