# Part 17: MultiIndex and Advanced Indexing in Pandas

In this notebook, we'll explore:
- Set operations on Index objects
- Handling missing values in Index
- Working with MultiIndex
- Reordering MultiIndex levels
- Renaming levels and names
- Sorting a MultiIndex

## Setup
First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np
import random

## 1. Set Operations on Index Objects

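The two main set operations are union and intersection, available as the `Index.union()` and `Index.intersection()` methods. Difference is provided by `Index.difference()`. Older pandas versions also overloaded the `|` and `&` operators for union and intersection, but in pandas 2.0 and later these operators act element-wise on an Index, so use the methods instead.

In [None]:
a = pd.Index(['c', 'b', 'a'])
b = pd.Index(['c', 'e', 'd'])

# Union
a.union(b)

In [None]:
# Intersection
a.intersection(b)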

In [None]:
# Difference
a.difference(b)
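As a rough cross-check of what these methods compute, the sketch below compares them with Python's built-in `set` operations. It relies on the documented `sort` defaults: `union()` and `difference()` default to `sort=None`, which sorts the result when the values are comparable, while `intersection()` defaults to `sort=False`. Passing `sort=False` to `union()` skips sorting, so only membership is compared in that case.

```python
import pandas as pd

a = pd.Index(['c', 'b', 'a'])
b = pd.Index(['c', 'e', 'd'])

# union() and difference() default to sort=None (sorted when comparable);
# intersection() defaults to sort=False, which is moot here: the result is ['c']
union = a.union(b)
inter = a.intersection(b)
diff = a.difference(b)

# The same results from plain Python sets, sorted for a like-for-like comparison
assert list(union) == sorted(set(a) | set(b))
assert list(inter) == sorted(set(a) & set(b))
assert list(diff) == sorted(set(a) - set(b))

# sort=False skips sorting, so compare membership only
unsorted_union = a.union(b, sort=False)
assert set(unsorted_union) == set(union)

print(list(union))  # ['a', 'b', 'c', 'd', 'e']
print(list(diff))   # ['a', 'b']
```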

Also available is `symmetric_difference()`, which returns the elements that appear in either `idx1` or `idx2`, but not in both. The `^` operator was once an alias for it; in pandas 2.0 and later it acts element-wise on an Index instead.

In [None]:
idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([2, 3, 4, 5])

# Symmetric difference using method
idx1.symmetric_difference(idx2)

In [None]:
# The operation is symmetric: swapping the arguments gives the same result
idx2.symmetric_difference(idx1)
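Symmetric difference can be read as "union minus intersection". A minimal check of that identity, built only from `union()`, `intersection()` and `difference()`, is sketched below. It also shows the optional `result_name` argument, which sets the `name` of the returned Index; the name `"only_one_side"` is just an illustrative choice.

```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([2, 3, 4, 5])

sym = idx1.symmetric_difference(idx2)

# Elements in either index, minus the elements present in both
manual = idx1.union(idx2).difference(idx1.intersection(idx2))
assert sym.equals(manual)

# result_name sets the .name of the returned Index
named = idx1.symmetric_difference(idx2, result_name="only_one_side")
print(named)
```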

When `Index.union()` is applied to indexes with different dtypes, they must be cast to a common dtype. Typically, though not always, this is object dtype. The exception is a union between integer and float data, in which case the integer values are converted to float.

In [None]:
idx1 = pd.Index([0, 1, 2])
idx2 = pd.Index([0.5, 1.5])

# Union of integer and float indexes: the result has float64 dtype
idx1.union(idx2)
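To see which common dtype pandas picks, a short sketch can inspect `.dtype` on the results. The integer/float case is upcast to `float64`, while integers paired with strings have no common numeric dtype, so the result falls back to `object`. In the mixed case `sort=False` is passed only to avoid trying to order integers against strings, which cannot be compared.

```python
import pandas as pd

ints = pd.Index([0, 1, 2])
floats = pd.Index([0.5, 1.5])
strings = pd.Index(['x', 'y'])

# int + float: the integers are converted to float64
num_union = ints.union(floats)
print(num_union.dtype)  # float64
print(list(num_union))  # [0.0, 0.5, 1.0, 1.5, 2.0]

# int + str: no common numeric dtype, so the result is object dtype.
# sort=False avoids comparing ints with strings during sorting.
mixed_union = ints.union(strings, sort=False)
print(mixed_union.dtype)  # object
```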

## 2. Missing Values in Index

Although an Index can hold missing values (NaN), they are best avoided if you want to prevent unexpected results. For example, some operations exclude missing values implicitly.

`Index.fillna` fills missing values with a specified scalar value.

In [None]:
idx1 = pd.Index([1, np.nan, 3, 4])
idx1

In [None]:
# Fill NaN values with 2
idx1.fillna(2)

In [None]:
# DatetimeIndex with NaT
idx2 = pd.DatetimeIndex([pd.Timestamp('2011-01-01'),
                         pd.NaT,
                         pd.Timestamp('2011-01-03')])
idx2

In [None]:
# Fill NaT values with a timestamp
idx2.fillna(pd.Timestamp('2011-01-02'))
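The point that some operations skip missing values implicitly can be made concrete. The sketch below uses `hasnans`, `isna()` and `dropna()` to detect and remove NaN, and `nunique()`, which ignores NaN unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, np.nan, 3, 4])

print(idx.hasnans)   # True
print(idx.isna())    # boolean mask marking the NaN position
print(idx.dropna())  # the same Index with the NaN removed

# nunique() silently skips NaN unless dropna=False is passed
print(idx.nunique())              # 3
print(idx.nunique(dropna=False))  # 4
```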

## 3. Working with MultiIndex

A MultiIndex is a hierarchical index: each entry is a tuple with one label per level, so the index forms an ordered, tree-like structure in which the path from the outermost to the innermost level identifies a position in the data. This makes it possible to store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures such as Series (1d) and DataFrame (2d).

In [None]:
# Create a MultiIndex from product
index = pd.MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second'])
index
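One way to see the tree-like structure is to look at how a MultiIndex is stored: each level keeps its unique labels in `levels`, and `codes` holds, for every row, the integer position of its label within each level. The sketch below also checks that `from_product` yields the same index as building the Cartesian product with `itertools.product` and passing it to `from_tuples`.

```python
import itertools
import pandas as pd

index = pd.MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second'])

# from_product is the Cartesian product of its inputs
same = pd.MultiIndex.from_tuples(list(itertools.product(range(3), ['one', 'two'])),
                                 names=['first', 'second'])
assert index.equals(same)

# Unique labels per level, and per-row integer codes into those levels
print(index.levels)
print(index.codes)

# get_level_values expands one level back to full length
print(index.get_level_values('second'))
```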

In [None]:
# Access levels of a MultiIndex
index.levels[1]

In [None]:
# Replace the labels of level 1 (returns a new MultiIndex; `index` is unchanged)
index.set_levels(["a", "b"], level=1)
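`set_levels` replaces the unique labels of a level but keeps the codes, so the new labels are matched by position in `index.levels[1]`: every row that pointed at `'one'` now points at `'a'`, and every `'two'` becomes `'b'`. This sketch checks that behaviour and that the original index is left untouched.

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second'])

# Replace the labels of level 1; codes are unchanged, so position 0 ('one') -> 'a'
relabelled = index.set_levels(["a", "b"], level=1)

print(list(index.get_level_values('second')))       # ['one', 'two', 'one', 'two', 'one', 'two']
print(list(relabelled.get_level_values('second')))  # ['a', 'b', 'a', 'b', 'a', 'b']
```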

## 4. Reordering Levels with reorder_levels

The `reorder_levels()` method generalizes the `swaplevel` method, allowing you to permute the hierarchical index levels in one step.

In [None]:
# Create a DataFrame with MultiIndex
arrays = [['one', 'one', 'zero', 'zero'], ['y', 'x', 'y', 'x']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples)
df = pd.DataFrame(np.random.randn(4, 2), index=index)
df

In [None]:
# Reorder levels
df.reorder_levels([1, 0], axis=0)
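With only two levels, `reorder_levels` is just a swap. The sketch below uses a three-level index (the level names `L0`, `L1`, `L2` are illustrative) to show a full permutation given by level name, and then checks that on a two-level index `reorder_levels([1, 0])` gives the same index as `swaplevel(0, 1)`. The data values keep their order; only the level order of each key changes.

```python
import numpy as np
import pandas as pd

# A three-level index so the permutation is more than a simple swap
index = pd.MultiIndex.from_product([['a', 'b'], ['x', 'y'], [1, 2]],
                                   names=['L0', 'L1', 'L2'])
s = pd.Series(np.arange(8), index=index)

# reorder_levels takes the complete new order, by position or by name
reordered = s.reorder_levels(['L2', 'L0', 'L1'])
print(list(reordered.index.names))  # ['L2', 'L0', 'L1']
print(reordered.index[0])           # first key, now ordered (L2, L0, L1)

# For two levels, reorder_levels([1, 0]) matches swaplevel(0, 1)
two = pd.Series([1, 2], index=pd.MultiIndex.from_tuples([('one', 'y'), ('zero', 'x')]))
assert two.reorder_levels([1, 0]).index.equals(two.swaplevel(0, 1).index)
```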

## 5. Renaming Names of an Index or MultiIndex

The `rename()` method renames the labels of an Index or MultiIndex. It is most often used to rename the columns of a DataFrame, where the `columns` argument takes a dictionary listing only the columns you want to rename.

In [None]:
# Rename columns
df.rename(columns={0: "col0", 1: "col1"})

In [None]:
# Rename labels of the row MultiIndex (the mapping is applied to every level)
df.rename(index={"one": "two", "y": "z"})
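Because a dictionary passed to `rename(index=...)` is applied to the labels of every level, a label that appears in more than one level is renamed everywhere. The `level` argument restricts the mapping to a single level. The small frame below is a hypothetical example chosen so that `'a'` occurs in both levels.

```python
import pandas as pd

df = pd.DataFrame({'v': [1, 2, 3]},
                  index=pd.MultiIndex.from_tuples([('a', 'a'), ('a', 'b'), ('b', 'a')]))

# Without level=, the mapping is applied to labels in every level
everywhere = df.rename(index={'a': 'z'})
print(everywhere.index.tolist())  # [('z', 'z'), ('z', 'b'), ('b', 'z')]

# level= restricts the mapping to one level
inner_only = df.rename(index={'a': 'z'}, level=1)
print(inner_only.index.tolist())  # [('a', 'z'), ('a', 'b'), ('b', 'z')]
```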

The `rename_axis()` method renames an Index or MultiIndex itself (its `name`) rather than its labels. For a MultiIndex, the names of the individual levels can be given as a list.

In [None]:
# Rename axis names
df.rename_axis(index=['abc', 'def'])

In [None]:
# Rename column index name
df.rename_axis(columns="Cols").columns
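`rename_axis` also accepts a dictionary, which renames only the levels whose current names it mentions. The sketch below rebuilds a frame with the same row index as above (a deterministic `range` replaces the random data) and shows that only the names change, not the labels.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays([['one', 'one', 'zero', 'zero'], ['y', 'x', 'y', 'x']])
df = pd.DataFrame({'v': range(4)}, index=index)
print(list(df.index.names))  # [None, None]

# A list sets the name of every level
named = df.rename_axis(index=['abc', 'def'])
# A dict renames only the levels it mentions
partly = named.rename_axis(index={'abc': 'outer'})

print(list(named.index.names))   # ['abc', 'def']
print(list(partly.index.names))  # ['outer', 'def']

# Only the names changed; the labels are identical
assert named.index.equals(df.index)
```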

When working with an Index object directly, rather than via a DataFrame, `Index.set_names()` can be used to change the names. On a MultiIndex, `rename()` accepts the same arguments, including `level`.

In [None]:
mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
mi.names

In [None]:
# Rename a specific level
mi2 = mi.rename("new name", level=0)
mi2
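`set_names` can target one level by position, one or more levels by their current name through a dictionary, or every level at once through a list. The sketch below shows all three, assuming pandas 1.3 or later for the dictionary form. Each call returns a new MultiIndex, so `mi` keeps its original names.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])

# One level, chosen by position
by_pos = mi.set_names('new name', level=0)
# One level, chosen by its current name (dict form)
by_name = mi.set_names({'y': 'letter'})
# Every level at once, with a list
both = mi.set_names(['num', 'letter'])

print(list(by_pos.names))   # ['new name', 'y']
print(list(by_name.names))  # ['x', 'letter']
print(list(both.names))     # ['num', 'letter']
print(list(mi.names))       # ['x', 'y'] (original unchanged)
```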

You cannot set the name of a MultiIndex level directly through `levels`; doing so raises a `RuntimeError`:

In [None]:
try:
    mi.levels[0].name = "name via level"  # raises RuntimeError
except RuntimeError as err:
    print(err)

## 6. Sorting a MultiIndex

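For MultiIndex-ed objects to be indexed and sliced effectively, they need to be sorted: slicing an unsorted MultiIndex by a label range can raise an `UnsortedIndexError`, and lookups on it may emit a `PerformanceWarning`. As with any index, you can use `sort_index()`.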

In [None]:
# Create a list of tuples for MultiIndex
tuples = [('foo', 'one'), ('foo', 'two'), ('bar', 'one'), ('bar', 'two'), ('qux', 'one'), ('qux', 'two')]

# Shuffle the tuples
random.shuffle(tuples)

# Create a Series with MultiIndex
s = pd.Series(np.random.randn(6), index=pd.MultiIndex.from_tuples(tuples))
s

In [None]:
# Sort the index
s = s.sort_index()
s
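To see why sorting matters, the sketch below slices a deliberately unsorted MultiIndex by a label range, catches the `pandas.errors.UnsortedIndexError` that pandas raises, and repeats the slice after `sort_index()`. A fixed shuffled order replaces `random.shuffle` so the run is repeatable, and `is_monotonic_increasing` serves as a quick check of whether the index is in sorted order.

```python
import numpy as np
import pandas as pd

# A fixed, deliberately unsorted order (instead of random.shuffle) for repeatability
tuples = [('qux', 'one'), ('foo', 'two'), ('bar', 'one'),
          ('foo', 'one'), ('bar', 'two'), ('qux', 'two')]
s = pd.Series(np.arange(6), index=pd.MultiIndex.from_tuples(tuples))
print(s.index.is_monotonic_increasing)  # False

# Label-range slicing needs a sorted index
try:
    s.loc[('bar', 'one'):('foo', 'two')]
    slice_failed = False
except pd.errors.UnsortedIndexError as err:
    slice_failed = True
    print('UnsortedIndexError:', err)

sorted_s = s.sort_index()
print(sorted_s.index.is_monotonic_increasing)  # True

# After sorting, the same slice works: bar/one, bar/two, foo/one, foo/two
window = sorted_s.loc[('bar', 'one'):('foo', 'two')]
print(window)
```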