# Polars Pivoting & Reshaping - Comprehensive Guide

Master data reshaping operations in Polars.

## Topics:
- Pivot (wide format)
- Unpivot/Melt (long format)
- Explode (lists to rows)
- Transpose
- Stack/concat (combining DataFrames)
- Real-world reshaping examples

In [None]:
import polars as pl

## Part 1: Pivot (Long to Wide)

In [None]:
# Long format data
long_df = pl.DataFrame({
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
    'product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'sales': [100, 150, 110, 160, 105, 155]
})

print("Long format (original):")
print(long_df)

### Basic pivot

In [None]:
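# Pivot: each distinct product becomes its own column
# (Polars >= 1.0 names this parameter `on`; `columns` is the deprecated older name)
wide_df = long_df.pivot(
    on='product',
    index='date',
    values='sales'
)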

print("\nWide format (pivoted):")
print(wide_df)

### Pivot with aggregation

In [None]:
# Data with duplicates
dup_df = pl.DataFrame({
    'region': ['North', 'North', 'North', 'South', 'South', 'South'],
    'product': ['A', 'A', 'B', 'A', 'B', 'B'],
    'sales': [100, 110, 150, 120, 140, 145]
})

print("Data with duplicates:")
print(dup_df)

# Pivot with aggregation: duplicate (region, product) pairs are summed
pivoted = dup_df.pivot(
    on='product',
    index='region',
    values='sales',
    aggregate_function='sum'
)

print("\nPivoted with sum:")
print(pivoted)

### Choosing a different aggregation

In [None]:
# Same pivot, but duplicates are averaged instead of summed
pivot_mean = dup_df.pivot(
    on='product',
    index='region',
    values='sales',
    aggregate_function='mean'
)

print("Pivoted with mean:")
print(pivot_mean)

## Part 2: Unpivot/Melt (Wide to Long)

In [None]:
# Wide format data
wide_sales = pl.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03'],
    'product_A': [100, 110, 105],
    'product_B': [150, 160, 155],
    'product_C': [200, 210, 205]
})

print("Wide format (original):")
print(wide_sales)

### Basic unpivot

In [None]:
# Unpivot to long format
long_sales = wide_sales.unpivot(
    index='date',
    on=['product_A', 'product_B', 'product_C']
)

print("\nLong format (unpivoted):")
print(long_sales)

### Custom column names

In [None]:
# Unpivot with custom names
long_custom = wide_sales.unpivot(
    index='date',
    on=['product_A', 'product_B', 'product_C'],
    variable_name='product',
    value_name='sales'
)

print("With custom column names:")
print(long_custom)

### Clean up after unpivot

In [None]:
# Clean product names
cleaned = long_custom.with_columns([
    pl.col('product').str.replace('product_', '').alias('product')
])

print("Cleaned:")
print(cleaned)

## Part 3: Explode (Lists to Rows)

In [None]:
# Data with lists
list_df = pl.DataFrame({
    'customer': ['Alice', 'Bob', 'Charlie'],
    'orders': [[101, 102, 103], [201, 202], [301]],
    'amounts': [[100, 200, 150], [300, 250], [175]]
})

print("Data with lists:")
print(list_df)

### Explode single column

In [None]:
# Explode orders
exploded = list_df.explode('orders')

print("\nExploded orders:")
print(exploded)

### Explode multiple columns

In [None]:
# Explode both orders and amounts together
exploded_both = list_df.explode(['orders', 'amounts'])

print("Exploded orders and amounts:")
print(exploded_both)

## Part 4: Transpose

In [None]:
# Sample data
metrics_df = pl.DataFrame({
    'metric': ['Revenue', 'Cost', 'Profit'],
    'Q1': [1000, 600, 400],
    'Q2': [1200, 650, 550],
    'Q3': [1100, 620, 480],
    'Q4': [1300, 680, 620]
})

print("Original:")
print(metrics_df)

In [None]:
# Transpose
transposed = metrics_df.transpose(
    include_header=True,
    header_name='quarter',
    column_names='metric'
)

print("\nTransposed:")
print(transposed)

## Part 5: Stack/Concat Operations

### Vertical stacking (concat rows)

In [None]:
df1 = pl.DataFrame({
    'id': [1, 2, 3],
    'value': [10, 20, 30]
})

df2 = pl.DataFrame({
    'id': [4, 5, 6],
    'value': [40, 50, 60]
})

# Stack vertically
stacked = pl.concat([df1, df2], how='vertical')

print("Vertical stack:")
print(stacked)

### Horizontal concatenation (concat columns)

In [None]:
df_left = pl.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

df_right = pl.DataFrame({
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
})

# Concat horizontally
combined = pl.concat([df_left, df_right], how='horizontal')

print("Horizontal concat:")
print(combined)

## Part 6: Real-World Examples

### Example 1: Sales report transformation

In [None]:
# Wide format sales report
sales_report = pl.DataFrame({
    'region': ['North', 'South', 'East', 'West'],
    'Jan': [1000, 1200, 1100, 1300],
    'Feb': [1100, 1250, 1150, 1320],
    'Mar': [1050, 1180, 1120, 1280]
})

print("Sales report (wide):")
print(sales_report)

# Transform to long for analysis
sales_long = (
    sales_report
    .unpivot(index='region', variable_name='month', value_name='sales')
    .with_columns([
        pl.when(pl.col('month') == 'Jan').then(1)
          .when(pl.col('month') == 'Feb').then(2)
          .when(pl.col('month') == 'Mar').then(3)
          .alias('month_num')
    ])
    .sort(['region', 'month_num'])
)

print("\nSales report (long):")
print(sales_long)

### Example 2: Survey data reshaping

In [None]:
# Survey responses in wide format
survey = pl.DataFrame({
    'respondent_id': [1, 2, 3],
    'age': [25, 30, 35],
    'q1_satisfaction': [5, 4, 5],
    'q2_recommend': [5, 5, 4],
    'q3_support': [4, 3, 5]
})

print("Survey (wide):")
print(survey)

# Reshape to long for analysis
survey_long = (
    survey
    .unpivot(
        index=['respondent_id', 'age'],
        on=['q1_satisfaction', 'q2_recommend', 'q3_support'],
        variable_name='question',
        value_name='rating'
    )
    .with_columns([
        # Raw string so that \d reaches the regex engine unchanged
        pl.col('question').str.replace(r'^q\d+_', '').alias('question')
    ])
)

print("\nSurvey (long):")
print(survey_long)

### Example 3: Multi-level data flattening

In [None]:
# Nested data structure
nested = pl.DataFrame({
    'order_id': [1, 2],
    'customer': ['Alice', 'Bob'],
    'items': [['Laptop', 'Mouse'], ['Keyboard', 'Monitor', 'Cable']],
    'prices': [[1200, 25], [75, 350, 15]]
})

print("Nested structure:")
print(nested)

# Flatten: one row per item, then give the columns singular names
flattened = (
    nested
    .explode(['items', 'prices'])
    .rename({'items': 'product', 'prices': 'price'})
)

print("\nFlattened:")
print(flattened)

### Example 4: Creating cross-tabulation

In [None]:
# Transaction data
transactions = pl.DataFrame({
    'customer': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie', 'Alice', 'Bob'],
    'product': ['A', 'B', 'A', 'C', 'B', 'C', 'A', 'B'],
    'amount': [100, 150, 120, 200, 160, 180, 110, 130]
})

print("Transactions:")
print(transactions)

# Create cross-tab (customer vs product)
crosstab = (
    transactions
    .group_by(['customer', 'product'])
    .agg(pl.col('amount').sum())
    .pivot(
        on='product',
        index='customer',
        values='amount'
    )
)

print("\nCross-tabulation:")
print(crosstab)

## Part 7: Complex Reshaping Pipeline

In [None]:
# Complex dataset
complex_df = pl.DataFrame({
    'date': ['2023-01', '2023-02', '2023-03'],
    'region': ['North', 'North', 'North'],
    'sales_A': [1000, 1100, 1050],
    'sales_B': [1500, 1600, 1550],
    'units_A': [50, 55, 52],
    'units_B': [75, 80, 77]
})

print("Complex wide format:")
print(complex_df)

# Reshape to long format with separate sales and units
reshaped = (
    complex_df
    # Unpivot the sales and units columns together
    .unpivot(
        index=['date', 'region'],
        on=['sales_A', 'sales_B', 'units_A', 'units_B'],
        variable_name='metric',
        value_name='value'
    )
    # Extract product and measure type
    .with_columns([
        pl.col('metric').str.extract(r'_(\w)$', 1).alias('product'),
        pl.col('metric').str.extract(r'^(\w+)_', 1).alias('measure')
    ])
    # Pivot to get sales and units as separate columns
    .pivot(
        on='measure',
        index=['date', 'region', 'product'],
        values='value'
    )
)

print("\nReshaped to normalized format:")
print(reshaped)

## Summary

### Key Operations:
- **pivot()**: Long to wide (columns from values)
- **unpivot()**: Wide to long (values from columns); `melt()` is the older, deprecated name
- **explode()**: Lists to separate rows
- **transpose()**: Swap rows and columns
- **concat()**: Stack DataFrames vertically or horizontally

### When to Use:
- **Pivot**: Create summary tables, cross-tabs, reports
- **Unpivot**: Prepare data for analysis, normalize structure
- **Explode**: Flatten nested/list columns
- **Transpose**: Switch perspective (rows ↔ columns)

### Best Practices:
- Understand your target format before reshaping
- Use unpivot → process → pivot for transformations
- Clean column names after unpivot
- On large datasets, filter and aggregate before pivoting, since `pivot()` has to scan the data to discover its output columns