# Working with Multi-Index DataFrames

A Multi-Index, also known as a hierarchical index, gives a DataFrame more than one level of row or column labels. This is particularly useful for representing grouped or hierarchical data, such as sales broken down by region and then by product.

## Key Topics


### Understanding Multi-Index Structures

A Multi-Index DataFrame has multiple levels of indexing for rows, columns, or both. Each label is a tuple with one entry per level, for example `('Region1', 'ProductA')`, so data can be selected, grouped, or aggregated at any level of the hierarchy.

**How to Create a Multi-Index DataFrame:**
- Use `pd.MultiIndex.from_arrays()` or `pd.MultiIndex.from_tuples()` to define the hierarchical index explicitly.
- Group data using `groupby()` with multiple keys and aggregate; the grouping keys become the levels of the result's index.
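The two routes listed above can be sketched as follows. This is a minimal illustration, not the only way to do it: the `orders` table and its values are made up for the example, and `from_tuples()` is used here because the notebook's own example covers `from_arrays()`.

```python
import pandas as pd

# Route 1: build the index explicitly from (Region, Product) tuples
index = pd.MultiIndex.from_tuples(
    [('Region1', 'ProductA'), ('Region1', 'ProductB'), ('Region2', 'ProductA')],
    names=['Region', 'Product'],
)
explicit = pd.DataFrame({'Sales': [100, 150, 200]}, index=index)

# Route 2: group a flat (hypothetical) table by two keys and aggregate;
# the grouping keys become the levels of the resulting index
orders = pd.DataFrame({
    'Region': ['Region1', 'Region1', 'Region1', 'Region2'],
    'Product': ['ProductA', 'ProductA', 'ProductB', 'ProductA'],
    'Sales': [60, 40, 150, 200],
})
grouped = orders.groupby(['Region', 'Product'])[['Sales']].sum()

print(explicit)
print(grouped)
```

Both frames end up indexed by `Region` and `Product`; the `groupby()` route is usually the more convenient one when the data starts out as a flat table.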

#### Example: Creating a Multi-Index DataFrame


In [ ]:
import pandas as pd

# Create a Multi-Index DataFrame
multi_index = pd.MultiIndex.from_arrays([
    ['Region1', 'Region1', 'Region2', 'Region2'],
    ['ProductA', 'ProductB', 'ProductA', 'ProductB']
], names=['Region', 'Product'])

sales_data = pd.DataFrame({
    'Sales': [100, 150, 200, 250]
}, index=multi_index)

print(sales_data)
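Once the index has levels, rows can be selected or aggregated at any of them. The sketch below rebuilds the same `sales_data` frame so that it runs on its own, then shows a few common access patterns; the variable names other than `sales_data` are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['Region1', 'Region1', 'Region2', 'Region2'],
     ['ProductA', 'ProductB', 'ProductA', 'ProductB']],
    names=['Region', 'Product'],
)
sales_data = pd.DataFrame({'Sales': [100, 150, 200, 250]}, index=index)

# Outer-level label: all rows for Region1 (the Region level is dropped)
region1 = sales_data.loc['Region1']

# Full tuple plus column name: a single value
one_value = sales_data.loc[('Region2', 'ProductA'), 'Sales']

# Inner-level label: xs() selects ProductB across every region
product_b = sales_data.xs('ProductB', level='Product')

# Aggregate over one level of the index
region_totals = sales_data.groupby(level='Region').sum()

print(region1)
print(one_value)
print(product_b)
print(region_totals)
```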

### Merging and Concatenating with Multi-Index DataFrames

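Combining datasets with Multi-Index structures works best when the inputs share the same level names, level order, and labels, because pandas matches rows on the full index tuple.

**Key Methods:**
- Use `merge()` with `left_index=True` and `right_index=True` to join on the full index, or pass level names to `on` (or `left_on`/`right_on`) to join on specific levels.
- Use `concat()` to combine Multi-Index DataFrames along rows (`axis=0`) or columns (`axis=1`); the `keys` argument adds an outer level that labels each input.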

#### Example: Merging Multi-Index DataFrames


In [ ]:
# Sample datasets with Multi-Index
region_sales = pd.DataFrame({
    'Revenue': [1000, 1500, 2000, 2500]
}, index=multi_index)

product_details = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing']
}, index=multi_index)

# Merging datasets
merged_data = pd.merge(region_sales, product_details, left_index=True, right_index=True)
print(merged_data)
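When the second table is keyed by only one of the levels, one cautious approach is to move the index levels into columns, merge on the shared column, and then restore the index. The sketch below assumes a hypothetical `region_managers` lookup table (its names and values are invented for illustration) and rebuilds `region_sales` so that it runs on its own.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['Region1', 'Region1', 'Region2', 'Region2'],
     ['ProductA', 'ProductB', 'ProductA', 'ProductB']],
    names=['Region', 'Product'],
)
region_sales = pd.DataFrame({'Revenue': [1000, 1500, 2000, 2500]}, index=index)

# Hypothetical lookup table keyed by region only
region_managers = pd.DataFrame({
    'Region': ['Region1', 'Region2'],
    'Manager': ['Dana', 'Eli'],
})

with_manager = (
    region_sales.reset_index()                        # levels -> ordinary columns
    .merge(region_managers, on='Region', how='left')  # join on the shared level
    .set_index(['Region', 'Product'])                 # restore the Multi-Index
)
print(with_manager)
```

Resetting and restoring the index is more verbose than merging on a level name directly, but it keeps both levels in the result regardless of how the merge treats unused index levels.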

### Aligning Multi-Level Indices

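pandas matches rows of Multi-Index DataFrames by their full index tuple, here `(Region, Product)`. When the inputs share the same level names and order, concatenation and arithmetic give consistent, meaningful results at every level.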

#### Example: Align and Concatenate Multi-Index DataFrames


In [ ]:
# Additional sales data for another year
sales_2023 = pd.DataFrame({
    'Sales': [110, 160, 210, 260]
}, index=multi_index)

# Stack both years along rows; keys adds an outer 'Year' level to the index
aligned_concat = pd.concat(
    [sales_data, sales_2023],
    keys=['2022', '2023'],
    names=['Year', 'Region', 'Product'],
    axis=0,
)
print(aligned_concat)
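Alignment matters most when the two frames do not cover the same rows. The following sketch uses two small, made-up frames (`sales_a` and `sales_b` are illustrative names) whose indices only partly overlap, and shows `DataFrame.align()` and arithmetic with `fill_value` as two possible ways to handle the gap.

```python
import pandas as pd

idx_a = pd.MultiIndex.from_tuples(
    [('Region1', 'ProductA'), ('Region1', 'ProductB'), ('Region2', 'ProductA')],
    names=['Region', 'Product'],
)
idx_b = pd.MultiIndex.from_tuples(
    [('Region1', 'ProductA'), ('Region2', 'ProductA'), ('Region2', 'ProductB')],
    names=['Region', 'Product'],
)
sales_a = pd.DataFrame({'Sales': [100, 150, 200]}, index=idx_a)
sales_b = pd.DataFrame({'Sales': [110, 210, 260]}, index=idx_b)

# Reindex both frames to the union of their indices; gaps become NaN
aligned_a, aligned_b = sales_a.align(sales_b, join='outer')

# Arithmetic aligns automatically; fill_value treats a missing row as 0
change = sales_b.sub(sales_a, fill_value=0)

print(aligned_a)
print(aligned_b)
print(change)
```

Whether a missing row should count as zero or stay `NaN` depends on the analysis, so `fill_value=0` here is an assumption rather than a general recommendation.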

# Practical Example: Combining Datasets for Complex Analysis

In this section, we apply the concepts above to a small sales scenario, combining multiple datasets for detailed analysis.

## Objective
Combine sales data with customer and product information to create a comprehensive sales report.

## Scenario
We have the following datasets:
- `sales`: Contains sales transactions.
- `customers`: Contains customer details.
- `products`: Contains product details.

### Tasks
1. Merge sales with customer details.
2. Concatenate sales data from multiple months.
3. Handle missing values and mismatched indices.



In [ ]:
# Sample datasets
sales = pd.DataFrame({
    'OrderID': [1, 2, 3],
    'CustomerID': [101, 102, 103],
    'ProductID': [201, 202, 203],
    'Amount': [250, 450, 300]
})

customers = pd.DataFrame({
    'CustomerID': [101, 102, 104],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Location': ['NY', 'CA', 'TX']
})

products = pd.DataFrame({
    'ProductID': [201, 202, 203],
    'Category': ['Electronics', 'Clothing', 'Furniture']
})

# Left merge keeps every sale; CustomerID 103 has no match in customers,
# so its Name and Location come back as NaN
sales_customers = pd.merge(sales, customers, on='CustomerID', how='left')
print('Sales with Customer Details:')
print(sales_customers)
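In this sample, CustomerID 103 appears in `sales` but not in `customers`, so the left merge leaves its `Name` and `Location` empty. One way to surface such rows is the `indicator` argument of `pd.merge()`, sketched below with the same data. The `validate` check and the `'Unknown'` placeholder are optional choices made for this illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'OrderID': [1, 2, 3],
    'CustomerID': [101, 102, 103],
    'ProductID': [201, 202, 203],
    'Amount': [250, 450, 300],
})
customers = pd.DataFrame({
    'CustomerID': [101, 102, 104],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Location': ['NY', 'CA', 'TX'],
})

# indicator=True adds a '_merge' column recording where each row came from;
# validate raises if CustomerID is not unique in customers
checked = pd.merge(
    sales, customers,
    on='CustomerID', how='left',
    indicator=True, validate='many_to_one',
)

# Sales whose customer is missing from the customers table
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['OrderID', 'CustomerID']])

# Placeholder for the missing customer details (an illustrative choice)
checked = checked.fillna({'Name': 'Unknown', 'Location': 'Unknown'})
print(checked)
```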

### Concatenate Sales Data from Multiple Months

Combine monthly sales data into a single dataset for analysis.


In [ ]:
# Monthly sales data
sales_jan = pd.DataFrame({
    'OrderID': [1, 2],
    'Amount': [250, 450]
})

sales_feb = pd.DataFrame({
    'OrderID': [3, 4],
    'Amount': [300, 500]
})

# Concatenate monthly sales; keys adds an outer index level naming each month
combined_sales = pd.concat([sales_jan, sales_feb], keys=['January', 'February'], axis=0)
print('Combined Monthly Sales:')
print(combined_sales)
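Because `keys` turns the month into an index level, the combined frame can be sliced or aggregated per month. The sketch below repeats the monthly data so that it runs on its own; the level names `Month` and `Row` are assumptions added for readability, not part of the original example.

```python
import pandas as pd

sales_jan = pd.DataFrame({'OrderID': [1, 2], 'Amount': [250, 450]})
sales_feb = pd.DataFrame({'OrderID': [3, 4], 'Amount': [300, 500]})

# names labels the new outer level and the original row index
combined = pd.concat(
    [sales_jan, sales_feb],
    keys=['January', 'February'],
    names=['Month', 'Row'],
)

# Select one month through the outer level
february = combined.loc['February']

# Total per month, keeping the months in their original order
monthly_totals = combined.groupby(level='Month', sort=False)['Amount'].sum()

# Flatten back to a single index with Month as an ordinary column
flat = combined.reset_index(level='Month').reset_index(drop=True)

print(february)
print(monthly_totals)
print(flat)
```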

### Handle Missing Values and Mismatched Indices

Merges and concatenations can leave missing values wherever keys or indices do not match. Fill those gaps explicitly after combining the data.


In [ ]:
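# Outer merge keeps rows from both tables, even without a matching ProductID
sales_products = pd.merge(sales, products, on='ProductID', how='outer')
# Every ProductID matches in this sample, so this fill acts as a safeguard
sales_products = sales_products.fillna({'Category': 'Unknown'})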
print('Sales with Product Details (Handled Missing Values):')
print(sales_products)
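In the sample data every ProductID appears in both tables, so the outer merge produces no gaps. To see the fill take effect, the sketch below uses deliberately mismatched, hypothetical tables (`sales_with_gap` and `catalog` are invented names and values): one sale refers to a product missing from the catalog, and one catalog entry has no sales.

```python
import pandas as pd

sales_with_gap = pd.DataFrame({
    'OrderID': [1, 2, 3],
    'ProductID': [201, 202, 204],   # 204 is not in the catalog
    'Amount': [250, 450, 300],
})
catalog = pd.DataFrame({
    'ProductID': [201, 202, 203],   # 203 has no sales
    'Category': ['Electronics', 'Clothing', 'Furniture'],
})

# Outer merge keeps unmatched rows from both sides; '_merge' shows the source
report = pd.merge(sales_with_gap, catalog, on='ProductID', how='outer', indicator=True)

# Fill only the columns where a default makes sense; OrderID stays NaN
report = report.fillna({'Category': 'Unknown', 'Amount': 0})
print(report)
```

Leaving `OrderID` unfilled is intentional here: a product without sales has no order, and inventing an identifier would hide that fact.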