# Week 10 Wednesday - Part 2: Data Reshaping with Pandas

**Duration:** 45 minutes  
**Topic:** Transforming Data Structure with `pivot_table()`, `melt()`, `stack()`, and `unstack()`  
**Business Context:** Lagos E-Commerce Inventory Reporting and Analysis  

---

## Learning Objectives

By the end of this session, you will be able to:

1. Transform data from long format to wide format using `pivot()` and `pivot_table()`
2. Convert wide format data to long format using `melt()`
3. Understand when to use each reshaping technique
4. Create cross-tabulation reports for business analysis
5. Apply stack/unstack for multi-level index manipulation

---

## Introduction: Why Reshaping Matters

Different analyses and visualizations require different data structures:

- **Long Format (Tidy Data):** Good for analysis, filtering, and database storage
  - One observation per row
  - Multiple rows per entity
  - Easy to filter and aggregate

- **Wide Format:** Good for reporting, visualization, and human readability
  - One entity per row
  - Multiple columns for different measurements
  - Excel-friendly

**Real-World Scenarios:**
- Create a monthly sales report with categories as rows and months as columns
- Transform product inventory across warehouses into a matrix
- Convert a wide budget spreadsheet into an analyzable long format

---

## Setup: Import Libraries and Load Data

In [None]:
import pandas as pd
import numpy as np

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

In [None]:
# Load datasets
products = pd.read_csv('../datasets/products.csv')
inventory = pd.read_csv('../datasets/inventory.csv')
orders = pd.read_csv('../datasets/orders.csv')
order_items = pd.read_csv('../datasets/order_items.csv')
warehouses = pd.read_csv('../datasets/warehouses.csv')

# Create merged dataset from Part 1
inventory_data = pd.merge(inventory, products, on='product_id', how='left')
inventory_data = pd.merge(inventory_data, warehouses, on='warehouse_id', how='left')

print("✓ Datasets loaded and merged")
print(f"Working with {len(inventory_data)} inventory records")

---

## Section 1: Understanding Long vs Wide Format (10 minutes)

### Current Data Structure (Long Format)

In [None]:
# Our inventory data is in "long format" - one row per product-warehouse combination
print("Long Format Example:")
print(inventory_data[['product_id', 'category', 'city', 'stock_level']].head(10))
print(f"\nShape: {inventory_data.shape}")
print("\nRows per category (top 5):")
print(inventory_data['category'].value_counts().head())

### What is Wide Format?

Wide format has:
- **Rows:** One row per primary entity (e.g., one row per category)
- **Columns:** A separate column for each measurement/dimension (e.g., Lagos stock, Abuja stock, etc.)

**Example Transformation:**
```
Long Format:
category     | city   | stock
Electronics  | Lagos  | 100
Electronics  | Abuja  | 50
Furniture    | Lagos  | 75

↓ pivot_table ↓

Wide Format:
category     | Lagos | Abuja
Electronics  | 100   | 50
Furniture    | 75    | NaN
```
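
The diagram above can be reproduced in a few lines. This is a minimal sketch on a toy DataFrame whose values are copied from the diagram; the names `long_df` and `wide_df` are illustrative and not part of the course datasets. Note that pandas sorts the new column labels, so `Abuja` appears before `Lagos`.

```python
import pandas as pd

# Toy data mirroring the diagram (illustrative values only)
long_df = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture"],
    "city": ["Lagos", "Abuja", "Lagos"],
    "stock": [100, 50, 75],
})

# One row per category, one column per city
wide_df = long_df.pivot_table(
    values="stock", index="category", columns="city", aggfunc="sum"
)
print(wide_df)

# Furniture has no Abuja row in the long data, so that cell is NaN
print(pd.isna(wide_df.loc["Furniture", "Abuja"]))
```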

---

## Section 2: Pivot Tables - Long to Wide (12 minutes)

### Example 1: Stock Levels by Category and Warehouse

In [None]:
# Create a pivot table: categories as rows, warehouses as columns
stock_matrix = pd.pivot_table(
    inventory_data,
    values='stock_level',        # What to aggregate
    index='category',             # Rows
    columns='city',               # Columns
    aggfunc='sum',                # How to aggregate (sum, mean, count, etc.)
    fill_value=0                  # Replace NaN with 0
)

print("Stock Levels by Category and Warehouse:")
print(stock_matrix)
print(f"\nShape: {stock_matrix.shape}")

**💡 Key Insight:** This pivot table shows at a glance how much stock each category holds in each warehouse city, which makes it well suited to management reports.
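
The learning objectives also mention `pivot()`. As a rough sketch of how it differs from `pivot_table()`: `pivot()` only reshapes and raises an error when an index/column pair occurs more than once, while `pivot_table()` aggregates the duplicates. The toy DataFrame below is an assumption made for illustration, not the course data.

```python
import pandas as pd

# Toy data with a duplicate (category, city) pair: two Electronics rows for Lagos
df = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture"],
    "city": ["Lagos", "Lagos", "Abuja"],
    "stock_level": [100, 40, 75],
})

# pivot() only reshapes; it cannot decide what to do with duplicates
try:
    df.pivot(index="category", columns="city", values="stock_level")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot() raised: {exc}")

# pivot_table() aggregates the duplicates instead
summed = df.pivot_table(
    index="category", columns="city", values="stock_level",
    aggfunc="sum", fill_value=0,
)
print(summed)
```

Because inventory data usually has several products per category and city, `pivot_table()` is the safer default for the examples in this session.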

### Example 2: Multiple Aggregation Functions

In [None]:
# Calculate both total stock and average stock per product
multi_agg_pivot = pd.pivot_table(
    inventory_data,
    values='stock_level',
    index='category',
    columns='city',
    aggfunc=['sum', 'mean'],
    fill_value=0
)

print("Stock Analysis (Total and Average):")
print(multi_agg_pivot)
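
Passing a list to `aggfunc` produces two-level column labels of the form (aggregation, city). The sketch below, on assumed toy data, shows one way to select a single block and one way to flatten the labels for export; the `sum_Lagos` naming scheme is just a suggestion.

```python
import pandas as pd

# Toy data (illustrative values only)
df = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture", "Furniture"],
    "city": ["Lagos", "Lagos", "Lagos", "Abuja"],
    "stock_level": [100, 40, 75, 25],
})

multi = pd.pivot_table(
    df, values="stock_level", index="category", columns="city",
    aggfunc=["sum", "mean"], fill_value=0,
)

# Select one block of columns by its top-level label
print(multi["sum"])

# Flatten (aggfunc, city) tuples into single names such as "sum_Lagos"
flat = multi.copy()
flat.columns = [f"{agg}_{city}" for agg, city in flat.columns]
print(flat.columns.tolist())
```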

### Example 3: Adding Row and Column Totals

In [None]:
# Create pivot with margins (totals)
stock_with_totals = pd.pivot_table(
    inventory_data,
    values='stock_level',
    index='category',
    columns='city',
    aggfunc='sum',
    fill_value=0,
    margins=True,              # Add row and column totals
    margins_name='Total'       # Name for the total row/column
)

print("Stock Levels with Totals:")
print(stock_with_totals)

# Which warehouse has the most stock?
print("\nTotal Stock by Warehouse:")
print(stock_with_totals.loc['Total'])
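
Once the margins exist, they can feed further calculations. As a small sketch on assumed toy data, the share of total stock held in each city can be derived from the `Total` row:

```python
import pandas as pd

# Toy data (illustrative values only)
df = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture", "Furniture"],
    "city": ["Lagos", "Abuja", "Lagos", "Abuja"],
    "stock_level": [100, 50, 75, 25],
})

totals = pd.pivot_table(
    df, values="stock_level", index="category", columns="city",
    aggfunc="sum", fill_value=0, margins=True, margins_name="Total",
)

# The bottom-right cell is the grand total
grand_total = totals.loc["Total", "Total"]

# Share of all stock held in each city (drop the "Total" column itself)
city_share = totals.loc["Total"].drop("Total") / grand_total * 100
print(city_share.round(1))
```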

### Example 4: Multiple Index Levels

In [None]:
# Pivot by both category AND status
detailed_pivot = pd.pivot_table(
    inventory_data,
    values='stock_level',
    index=['category', 'status'],  # Multiple row levels
    columns='city',
    aggfunc='sum',
    fill_value=0
)

print("Stock by Category, Status, and Warehouse:")
print(detailed_pivot.head(15))
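
With two row levels, it helps to know how to pull out a slice. The following sketch uses assumed toy data (the `In Stock` label is a guess at what the `status` column contains) to select by the outer level with `.loc` and by the inner level with `.xs()`:

```python
import pandas as pd

# Toy data; the status labels are assumptions for illustration
df = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture", "Furniture"],
    "status": ["In Stock", "Low Stock", "In Stock", "Low Stock"],
    "city": ["Lagos", "Lagos", "Abuja", "Lagos"],
    "stock_level": [100, 5, 75, 8],
})

detailed = pd.pivot_table(
    df, values="stock_level", index=["category", "status"],
    columns="city", aggfunc="sum", fill_value=0,
)

# Outer level: every status row for one category
electronics = detailed.loc["Electronics"]
print(electronics)

# Inner level: one status across all categories
low = detailed.xs("Low Stock", level="status")
print(low)
```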

---

## Section 3: Melt - Wide to Long (10 minutes)

**Business Scenario:** You receive a wide-format monthly sales report from Excel. You need to convert it to long format for analysis.

### Creating Sample Wide Data

In [None]:
# Simulate a wide-format monthly sales report
monthly_sales_wide = pd.DataFrame({
    'category': ['Electronics', 'Furniture', 'Home Appliances', 'Sports', 'Toys'],
    'January': [150000, 80000, 120000, 45000, 60000],
    'February': [165000, 75000, 135000, 50000, 65000],
    'March': [180000, 90000, 145000, 55000, 70000]
})

print("Wide Format Monthly Sales (₦):")
print(monthly_sales_wide)
print(f"\nShape: {monthly_sales_wide.shape} (5 categories, 4 columns)")

### Example 5: Basic Melt Operation

In [None]:
# Convert wide format to long format
monthly_sales_long = pd.melt(
    monthly_sales_wide,
    id_vars=['category'],           # Columns to keep as identifiers
    value_vars=['January', 'February', 'March'],  # Columns to unpivot
    var_name='month',               # Name for the new "variable" column
    value_name='sales'              # Name for the new "value" column
)

print("Long Format Monthly Sales:")
print(monthly_sales_long)
print(f"\nShape: {monthly_sales_long.shape} (15 rows, 3 columns)")

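**💡 Key Insight:** `melt()` reverses the reshaping that `pivot_table()` performs: it turns wide data into long format, which is easier to filter, group, and plot. It cannot undo aggregation, so melting a pivot table returns the summarized values, not the original rows.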
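
Two details of `melt()` are worth a quick sketch: `id_vars` can list more than one identifier column, and when `value_vars` is omitted every remaining column is unpivoted. The `region` column below is hypothetical and added only for illustration.

```python
import pandas as pd

wide = pd.DataFrame({
    "category": ["Electronics", "Furniture"],
    "region": ["South West", "South West"],  # hypothetical extra identifier
    "January": [150000, 80000],
    "February": [165000, 75000],
})

# value_vars omitted: every column not listed in id_vars is unpivoted
long = wide.melt(id_vars=["category", "region"], var_name="month", value_name="sales")
print(long)
```

Omitting `value_vars` is convenient when new month columns appear in each report, but it also unpivots any stray column, so check the result.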

### Example 6: Why Melt is Useful

In [None]:
# Now we can easily perform operations that were difficult in wide format

# 1. Group by month to see total sales
print("Total Sales by Month:")
print(monthly_sales_long.groupby('month')['sales'].sum())

# 2. Filter for specific categories
print("\nElectronics Sales Over Time:")
electronics = monthly_sales_long[monthly_sales_long['category'] == 'Electronics']
print(electronics)

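# 3. Calculate growth rate
# Month names sort alphabetically by default (February, January, March),
# so make 'month' an ordered categorical before sorting chronologically
print("\nMonth-over-Month Growth by Category:")
month_order = ['January', 'February', 'March']
monthly_sales_long['month'] = pd.Categorical(monthly_sales_long['month'], categories=month_order, ordered=True)
monthly_sales_long = monthly_sales_long.sort_values(['category', 'month'])
monthly_sales_long['growth'] = monthly_sales_long.groupby('category')['sales'].pct_change() * 100
print(monthly_sales_long[['category', 'month', 'sales', 'growth']].head(9))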

---

## Section 4: Pivot vs Melt - Round Trip (8 minutes)

### Demonstration: Full Cycle

In [None]:
# Start with long format inventory data
print("STEP 1: Original Long Format")
sample = inventory_data[['category', 'city', 'stock_level']].head(12)
print(sample)
print(f"Shape: {sample.shape}")

# Pivot to wide format
print("\nSTEP 2: Pivot to Wide Format")
wide = pd.pivot_table(sample, values='stock_level', index='category', columns='city', aggfunc='sum', fill_value=0)
print(wide)
print(f"Shape: {wide.shape}")

# Melt back to long format
print("\nSTEP 3: Melt Back to Long Format")
wide_reset = wide.reset_index()  # Convert index to column
long_again = pd.melt(wide_reset, id_vars=['category'], var_name='city', value_name='stock_level')
print(long_again)
print(f"Shape: {long_again.shape}")

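**💡 Key Insight:** You can move data back and forth between formats, so choose the one that makes your current analysis easiest. The round trip is not lossless, though: `pivot_table()` sums any duplicate category-city rows, and `fill_value=0` adds zero rows for combinations that were absent from the original sample.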
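
To see what a round trip preserves and what it changes, here is a small sketch on assumed toy data with one duplicate pair and one missing combination per category:

```python
import pandas as pd

# Toy data: Electronics/Lagos appears twice; some combinations are missing
original = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture"],
    "city": ["Lagos", "Lagos", "Abuja"],
    "stock_level": [100, 40, 75],
})

wide = original.pivot_table(
    values="stock_level", index="category", columns="city",
    aggfunc="sum", fill_value=0,
)
long_again = wide.reset_index().melt(
    id_vars="category", var_name="city", value_name="stock_level"
)

print(f"Original rows: {len(original)}, after round trip: {len(long_again)}")
print(long_again)
```

The totals match, but the row count does not: the two Lagos rows were merged into one, and zero-valued rows were created for the combinations that had no data.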

---

## Section 5: Stack and Unstack (5 minutes)

`stack()` and `unstack()` are alternative methods for reshaping, especially useful with multi-index DataFrames.

### Example 7: Unstack for Quick Pivoting

In [None]:
# Create a grouped summary
grouped = inventory_data.groupby(['category', 'city'])['stock_level'].sum()
print("Multi-Index Series:")
print(grouped.head(10))

# Unstack to convert inner index level to columns
print("\nAfter unstack():")
unstacked = grouped.unstack(fill_value=0)
print(unstacked)
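
By default `unstack()` moves the innermost index level, but a level can also be named explicitly. A minimal sketch on assumed toy data:

```python
import pandas as pd

# Toy data (illustrative values only)
df = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture"],
    "city": ["Lagos", "Abuja", "Lagos"],
    "stock_level": [100, 50, 75],
})
grouped = df.groupby(["category", "city"])["stock_level"].sum()

# Default: the innermost level (city) becomes the columns
by_category = grouped.unstack(fill_value=0)
print(by_category)

# Naming a level moves that one instead; here categories become the columns
by_city = grouped.unstack(level="category", fill_value=0)
print(by_city)
```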

### Example 8: Stack to Convert Columns to Rows

In [None]:
# Stack converts columns back to index
print("After stack():")
stacked = unstacked.stack()
print(stacked.head(10))
print(f"\nType: {type(stacked)}")

**Comparison:**
- `pivot_table()` / `melt()`: More flexible, better for complex transformations
- `unstack()` / `stack()`: More concise for simple reshaping of data that is already grouped or multi-indexed
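
For a simple sum, the two approaches give the same table. The sketch below checks this on assumed toy data; it compares values only, since dtypes may differ between pandas versions.

```python
import pandas as pd

# Toy data (illustrative values only)
df = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Furniture"],
    "city": ["Lagos", "Abuja", "Lagos"],
    "stock_level": [100, 50, 75],
})

via_pivot = df.pivot_table(
    values="stock_level", index="category", columns="city",
    aggfunc="sum", fill_value=0,
)
via_unstack = (
    df.groupby(["category", "city"])["stock_level"].sum().unstack(fill_value=0)
)

# Same labels and same values in every cell
same = (via_pivot == via_unstack).all().all()
print(same)
```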

---

## Section 6: Real-World Business Report (5 minutes)

**Complete Example:** Create a comprehensive inventory status report

### Multi-Dimensional Pivot Report

In [None]:
# Create executive summary: Stock status by category and warehouse
status_report = pd.pivot_table(
    inventory_data,
    values='product_id',
    index=['status', 'category'],
    columns='city',
    aggfunc='count',
    fill_value=0,
    margins=True
)

print("Inventory Status Report (Product Count by Warehouse):")
print(status_report)

# Highlight problem areas
print("\n⚠️ Low Stock Alert:")
low_stock = status_report.loc['Low Stock']
print(low_stock)
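
When the report only needs counts, `pd.crosstab()` is a shorter route to the same kind of cross-tabulation, because it counts rows directly and needs no `values` column. This is a sketch on assumed toy data; the `In Stock` label is a guess at the contents of `status`.

```python
import pandas as pd

# Toy data; the status labels are assumptions for illustration
df = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "status": ["Low Stock", "In Stock", "Low Stock", "In Stock"],
    "city": ["Lagos", "Lagos", "Abuja", "Lagos"],
})

# Count rows per (status, city) pair and add totals
counts = pd.crosstab(df["status"], df["city"], margins=True, margins_name="Total")
print(counts)
```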

### Creating a Trend Analysis Format

In [None]:
# Simulate time-series data for trend analysis
# Derive a 'restock_month' column (YYYY-MM) from the 'last_restocked' date
inventory_data['restock_month'] = pd.to_datetime(inventory_data['last_restocked']).dt.strftime('%Y-%m')

# Pivot to show restocking activity by category over time
restock_trend = pd.pivot_table(
    inventory_data,
    values='product_id',
    index='category',
    columns='restock_month',
    aggfunc='count',
    fill_value=0
)

print("\nRestocking Activity by Month:")
print(restock_trend)

---

## Section 7: Best Practices and Common Pitfalls (3 minutes)

### When to Use Each Method:

| Use Case | Method |
|----------|--------|
| Create summary reports | `pivot_table()` |
| Prepare data for plotting | `pivot_table()` or `unstack()` |
| Reshape imported Excel data for analysis | `melt()` |
| Time-series analysis | `melt()` (wide to long) |
| Quick multi-index reshaping | `stack()` / `unstack()` |
| Export to Excel report | `pivot_table()` |

### Common Pitfalls:

In [None]:
# Pitfall 1: Forgetting to handle NaN values
print("Without fill_value:")
bad_pivot = pd.pivot_table(inventory_data, values='stock_level', index='category', columns='city', aggfunc='sum')
print(bad_pivot)
print("\nWith fill_value=0:")
good_pivot = pd.pivot_table(inventory_data, values='stock_level', index='category', columns='city', aggfunc='sum', fill_value=0)
print(good_pivot)

# Pitfall 2: Wrong aggregation function
print("\n❌ Using 'mean' when you want 'sum':")
print(pd.pivot_table(inventory_data, values='stock_level', index='category', columns='city', aggfunc='mean', fill_value=0))

print("\n✓ Correct - using 'sum':")
print(pd.pivot_table(inventory_data, values='stock_level', index='category', columns='city', aggfunc='sum', fill_value=0))
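
A related pitfall is leaving `aggfunc` out altogether: `pivot_table()` then defaults to the mean, which is easy to mistake for a total. A short sketch on assumed toy data:

```python
import pandas as pd

# Toy data: two Electronics rows for Lagos
df = pd.DataFrame({
    "category": ["Electronics", "Electronics"],
    "city": ["Lagos", "Lagos"],
    "stock_level": [100, 40],
})

# No aggfunc given, so pandas averages the two rows
default_agg = df.pivot_table(values="stock_level", index="category", columns="city")

# Explicit sum gives the total stock
explicit_sum = df.pivot_table(
    values="stock_level", index="category", columns="city", aggfunc="sum"
)

print(default_agg.loc["Electronics", "Lagos"])   # average of the two rows
print(explicit_sum.loc["Electronics", "Lagos"])  # total of the two rows
```

Stating `aggfunc` explicitly, even when the default would do, makes the intent of a report clear to the next reader.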

---

## Summary and Key Takeaways

### What We Learned Today:

1. **Long vs Wide Format:**
   - Long: Analysis-friendly (filtering, grouping)
   - Wide: Report-friendly (Excel, presentations)

2. **Pivot Table (`pivot_table()`):**
   - Transforms long → wide
   - Parameters: `values`, `index`, `columns`, `aggfunc`
   - Use `fill_value=0` to handle NaN
   - Use `margins=True` for totals

3. **Melt (`melt()`):**
   - Transforms wide → long
   - Parameters: `id_vars`, `value_vars`, `var_name`, `value_name`
   - Well suited to reshaping reports imported from Excel

4. **Stack/Unstack:**
   - Quick reshaping for grouped data
   - `unstack()`: Index level → columns
   - `stack()`: Columns → index level

### Quick Reference:

```python
# Long to Wide (Pivot)
pd.pivot_table(df, values='metric', index='row', columns='col', aggfunc='sum')

# Wide to Long (Melt)
pd.melt(df, id_vars=['id'], value_vars=['col1', 'col2'], var_name='variable', value_name='value')

# Multi-Index Reshaping
df.unstack()  # Index → Columns
df.stack()    # Columns → Index
```

---

## Practice Exercise (5 minutes)

**Challenge:** Using the order_items and products data:
1. Merge the datasets
2. Create a pivot table showing total revenue (price × quantity) by category and order month
3. Convert the result to long format

### Your Task:

In [None]:
# Step 1: Merge order_items with products
# Step 2: Extract month from order dates
# Step 3: Create pivot table (category × month)
# Step 4: Convert back to long format using melt

# Your code here:


### Solution (Reveal After Attempting)

In [None]:
# Solution:
# Step 1: Merge
sales_data = pd.merge(order_items, orders[['order_id', 'order_date']], on='order_id')
sales_data = pd.merge(sales_data, products[['product_id', 'category']], on='product_id')

# Step 2: Extract month
sales_data['month'] = pd.to_datetime(sales_data['order_date']).dt.strftime('%Y-%m')

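# Step 3: Compute revenue per line item, then pivot
# (assumes order_items has 'price' and 'quantity' columns, as the challenge implies)
sales_data['revenue'] = sales_data['price'] * sales_data['quantity']
revenue_pivot = pd.pivot_table(
    sales_data,
    values='revenue',
    index='category',
    columns='month',
    aggfunc='sum',
    fill_value=0
)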
print("Revenue by Category and Month:")
print(revenue_pivot)

# Step 4: Melt
revenue_long = pd.melt(
    revenue_pivot.reset_index(),
    id_vars=['category'],
    var_name='month',
    value_name='revenue'
)
print("\nLong Format:")
print(revenue_long.head(10))

---

## Next Session Preview

**Part 3: Data Concatenation (Wednesday, 30 minutes)**
- Combining DataFrames vertically (`pd.concat()` with axis=0)
- Combining DataFrames horizontally (`pd.concat()` with axis=1)
- Handling index alignment and duplicates
- Real-world use case: Combining monthly data files

---

## Resources

- [Pandas pivot_table() documentation](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html)
- [Pandas melt() documentation](https://pandas.pydata.org/docs/reference/api/pandas.melt.html)
- [Reshaping and pivot tables guide](https://pandas.pydata.org/docs/user_guide/reshaping.html)
- Week 10 SQL content: Normalization and denormalization concepts