# Assignment 6: Data Wrangling - Join, Combine, and Reshape

**Student Name:** [Your Name Here]

**Instructions:** Complete all three questions below. Save your output files to the `output/` directory.

## Setup

In [1]:
import pandas as pd
import numpy as np
import os

# Create output directory
os.makedirs('output', exist_ok=True)
print("Setup complete!")

Setup complete!


## Question 1: Merging DataFrames (40 points)

**Objectives:**
- Load customer, order, and product datasets
- Perform inner, left, and outer joins
- Merge on multiple columns
- Handle duplicate keys and validate results
- Save merged output

### Step 1.1: Load the datasets

In [2]:
# Load the three main datasets
customers = pd.read_csv('data/customers.csv')
orders = pd.read_csv('data/orders.csv')
products = pd.read_csv('data/products.csv')

print(f"Customers: {len(customers)} rows")
print(f"Orders: {len(orders)} rows")
print(f"Products: {len(products)} rows")

# Display first few rows
print("\nCustomers:")
print(customers.head())
print("\nOrders:")
print(orders.head())
print("\nProducts:")
print(products.head())

Customers: 500 rows
Orders: 2000 rows
Products: 100 rows

Customers:
  customer_id               name                email         city state  \
0       C0001      Robert Martin  customer1@email.com      Houston    FL   
1       C0002  Charles Hernandez  customer2@email.com     Portland    TX   
2       C0003           Lisa Lee  customer3@email.com  Los Angeles    TX   
3       C0004     Robert Sanchez  customer4@email.com      Seattle    FL   
4       C0005    Thomas Thompson  customer5@email.com       Austin    TX   

    join_date  
0  2022-08-25  
1  2023-06-14  
2  2023-07-06  
3  2022-01-21  
4  2023-06-04  

Orders:
   order_id customer_id product_id  quantity  order_date  order_total
0  ORD00001       C0512      P0024         9  2023-06-13      3006.63
1  ORD00002       C0072      P0047         8  2023-07-29      1473.12
2  ORD00003       C0468      P0002         7  2024-01-22       679.77
3  ORD00004       C0283      P0040         6  2023-08-08       502.98
4  ORD00005       C

### Step 1.2: Inner Join - Orders with Customers

Perform an inner join to keep only orders that have matching customer records.

In [3]:
# Inner join: only matching records
inner_merged = pd.merge(orders, customers, on='customer_id', how='inner')

print(f"Inner join result: {len(inner_merged)} rows")
print(f"Original orders: {len(orders)} rows")
print(f"Rows lost: {len(orders) - len(inner_merged)}")
print("\nSample of merged data:")
print(inner_merged.head())

Inner join result: 1905 rows
Original orders: 2000 rows
Rows lost: 95

Sample of merged data:
   order_id customer_id product_id  quantity  order_date  order_total  \
0  ORD00002       C0072      P0047         8  2023-07-29      1473.12   
1  ORD00003       C0468      P0002         7  2024-01-22       679.77   
2  ORD00004       C0283      P0040         6  2023-08-08       502.98   
3  ORD00005       C0008      P0048         5  2023-09-17        62.80   
4  ORD00006       C0366      P0032         9  2024-06-07      1731.24   

            name                  email       city state   join_date  
0  John Williams   customer72@email.com     Austin    NY  2022-01-20  
1     Lisa Brown  customer468@email.com    Seattle    WA  2023-04-27  
2  William Jones  customer283@email.com    Houston    CA  2022-06-16  
3    Emily Davis    customer8@email.com    Orlando    WA  2023-08-22  
4  Joseph Harris  customer366@email.com  San Diego    CA  2023-07-02  
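
A join returns one row per matching key pair, so a key that is duplicated in the lookup table multiplies rows. This is the "duplicate keys" problem listed in the objectives. `pd.merge` has a `validate` argument that raises `MergeError` when the key relationship is not what you expect. The sketch below uses small hypothetical frames (`toy_orders`, `toy_customers`), not the assignment data.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy data: customer C2 appears twice in the lookup table
toy_orders = pd.DataFrame({"order_id": ["O1", "O2", "O3"],
                           "customer_id": ["C1", "C2", "C3"]})
toy_customers = pd.DataFrame({"customer_id": ["C1", "C2", "C2"],
                              "name": ["Ann", "Bob", "Bob (dup)"]})

# Inner join: C3 has no match and is dropped, C2 matches twice
inner = pd.merge(toy_orders, toy_customers, on="customer_id", how="inner")
print(len(inner))  # O1 once, O2 twice

# validate="many_to_one" asserts that the right-hand keys are unique
try:
    pd.merge(toy_orders, toy_customers, on="customer_id",
             how="inner", validate="many_to_one")
    duplicate_detected = False
except MergeError as err:
    duplicate_detected = True
    print(f"MergeError: {err}")

# One possible fix: keep the first row for each key, then merge again
deduped = toy_customers.drop_duplicates(subset="customer_id")
clean = pd.merge(toy_orders, deduped, on="customer_id",
                 how="inner", validate="many_to_one")
print(len(clean))
```

Whether dropping duplicates is the right fix depends on the data. Sometimes the duplicate rows carry different information and need to be reconciled first.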


### Step 1.3: Left Join - Keep All Orders

Use a left join to keep all orders, even those without customer information.

In [4]:
# Left join: all orders, even without customer data
left_merged = pd.merge(orders, customers, on='customer_id', how='left')

print(f"Left join result: {len(left_merged)} rows")
print(f"Original orders: {len(orders)} rows")
print(f"Orders without customer data: {left_merged['name'].isna().sum()}")
print("\nOrders without customer info:")
print(left_merged[left_merged['name'].isna()].head())

Left join result: 2000 rows
Original orders: 2000 rows
Orders without customer data: 95

Orders without customer info:
    order_id customer_id product_id  quantity  order_date  order_total name  \
0   ORD00001       C0512      P0024         9  2023-06-13      3006.63  NaN   
22  ORD00023       C0514      P0025         8  2023-09-27       465.68  NaN   
67  ORD00068       C0504      P0046         6  2023-12-29      1691.94  NaN   
71  ORD00072       C0517      P0022         4  2023-03-10        63.72  NaN   
90  ORD00091       C0508      P0140         7  2024-08-01        77.95  NaN   

   email city state join_date  
0    NaN  NaN   NaN       NaN  
22   NaN  NaN   NaN       NaN  
67   NaN  NaN   NaN       NaN  
71   NaN  NaN   NaN       NaN  
90   NaN  NaN   NaN       NaN  
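
The orders with missing customer fields are the ones whose `customer_id` has no match in `customers`. You can also list them directly as an anti-join with `Series.isin`, without building the full merged table. The frames below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: C9 and C8 are not in the customer table
toy_orders = pd.DataFrame({"order_id": ["O1", "O2", "O3", "O4"],
                           "customer_id": ["C1", "C9", "C2", "C8"]})
toy_customers = pd.DataFrame({"customer_id": ["C1", "C2"],
                              "name": ["Ann", "Bob"]})

# Anti-join: keep orders whose customer_id is NOT present in the customer table
orphans = toy_orders[~toy_orders["customer_id"].isin(toy_customers["customer_id"])]
print(orphans)

# Same rows via a left join. This relies on 'name' never being missing
# for a real customer, which is the same assumption the cell above makes.
left = pd.merge(toy_orders, toy_customers, on="customer_id", how="left")
orphans_via_merge = left.loc[left["name"].isna(), ["order_id", "customer_id"]]
```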


### Step 1.4: Outer Join - Orders and Products

Perform an outer join to see all orders and all products, even unmatched ones.

In [5]:
# Outer join: everything from both sides
outer_merged = pd.merge(orders, products, on='product_id', how='outer')

print(f"Outer join result: {len(outer_merged)} rows")
print(f"Orders: {len(orders)} rows")
print(f"Products: {len(products)} rows")
print(f"Products never ordered: {outer_merged['order_id'].isna().sum()}")
print(f"Orders for unknown products: {outer_merged['product_name'].isna().sum()}")
print("\nProducts never ordered:")
print(outer_merged[outer_merged['order_id'].isna()][['product_id', 'product_name', 'category']].head())

Outer join result: 2000 rows
Orders: 2000 rows
Products: 100 rows
Products never ordered: 0
Orders for unknown products: 82

Products never ordered:
Empty DataFrame
Columns: [product_id, product_name, category]
Index: []
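
Counting NaNs in `order_id` or `product_name` works here, but it assumes those columns are never missing in the source tables. The `indicator=True` option of `pd.merge` labels each row by where its key came from (`left_only`, `right_only`, or `both`), regardless of the other columns. A sketch with hypothetical toy frames:

```python
import pandas as pd

# Hypothetical toy data: P3 is never ordered, P9 is not in the catalogue
toy_orders = pd.DataFrame({"order_id": ["O1", "O2", "O3"],
                           "product_id": ["P1", "P2", "P9"]})
toy_products = pd.DataFrame({"product_id": ["P1", "P2", "P3"],
                             "product_name": ["Lamp", "Desk", "Chair"]})

# indicator=True adds a categorical '_merge' column describing each row's origin
outer = pd.merge(toy_orders, toy_products, on="product_id",
                 how="outer", indicator=True)
counts = outer["_merge"].value_counts()
print(counts)

never_ordered = outer.loc[outer["_merge"] == "right_only", "product_id"].tolist()
unknown_products = outer.loc[outer["_merge"] == "left_only", "product_id"].tolist()
```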


### Step 1.5: Complete Merge - Orders + Customers + Products

Create a complete dataset with order, customer, and product information.

In [6]:
# First merge orders with customers
orders_customers = pd.merge(orders, customers, on='customer_id', how='left')

# Then merge with products
complete_data = pd.merge(orders_customers, products, on='product_id', how='left')

print(f"Complete merged data: {len(complete_data)} rows")
print(f"Columns: {len(complete_data.columns)}")
print("\nColumn names:")
print(complete_data.columns.tolist())
print("\nSample:")
print(complete_data[['order_id', 'name', 'product_name', 'category', 'quantity', 'order_total']].head())

Complete merged data: 2000 rows
Columns: 15

Column names:
['order_id', 'customer_id', 'product_id', 'quantity', 'order_date', 'order_total', 'name', 'email', 'city', 'state', 'join_date', 'product_name', 'category', 'price', 'stock']

Sample:
   order_id           name         product_name       category  quantity  \
0  ORD00001            NaN         Guide Deluxe          Books         9   
1  ORD00002  John Williams           Table Plus  Home & Garden         8   
2  ORD00003     Lisa Brown    Headphones Deluxe    Electronics         7   
3  ORD00004  William Jones         Chair Deluxe  Home & Garden         6   
4  ORD00005    Emily Davis  Headphones Standard    Electronics         5   

   order_total  
0      3006.63  
1      1473.12  
2       679.77  
3       502.98  
4        62.80  
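
The Question 1 objectives also list merging on multiple columns, which the cells above do not show. Pass a list to `on`: a row matches only when every listed key matches. The `sales` and `targets` frames below are hypothetical.

```python
import pandas as pd

# Hypothetical data keyed by (product_id, year)
sales = pd.DataFrame({"product_id": ["P1", "P1", "P2"],
                      "year": [2023, 2024, 2023],
                      "sales": [100.0, 120.0, 80.0]})
targets = pd.DataFrame({"product_id": ["P1", "P1", "P2"],
                        "year": [2023, 2024, 2024],
                        "target": [90.0, 130.0, 70.0]})

# Both keys must match: P2/2023 has no target, so its 'target' is NaN
merged = pd.merge(sales, targets, on=["product_id", "year"],
                  how="left", validate="one_to_one")

# Comparisons with NaN evaluate to False, so P2/2023 is marked as not met
merged["met_target"] = merged["sales"] >= merged["target"]
print(merged)
```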


### Step 1.6: Save Q1 Output

In [7]:
# Save the complete merged dataset
complete_data.to_csv('output/q1_merged_data.csv', index=False)
print("✓ Saved output/q1_merged_data.csv")

# Create validation file
with open('output/q1_validation.txt', 'w') as f:
    f.write("Question 1: Merging DataFrames - Validation Report\n")
    f.write("="*60 + "\n\n")
    f.write(f"Total merged rows: {len(complete_data)}\n")
    f.write(f"Total columns: {len(complete_data.columns)}\n")
    f.write(f"Orders with customer data: {complete_data['name'].notna().sum()}\n")
    f.write(f"Orders with product data: {complete_data['product_name'].notna().sum()}\n")
    f.write(f"Complete records (all data): {complete_data[['name', 'product_name']].notna().all(axis=1).sum()}\n")

print("✓ Saved output/q1_validation.txt")

✓ Saved output/q1_merged_data.csv
✓ Saved output/q1_validation.txt


## Question 2: Concatenation & Index Management (30 points)

**Objectives:**
- Load 2023 and 2024 monthly sales data
- Concatenate vertically and horizontally
- Manage indexes properly
- Handle misaligned data

### Step 2.1: Load Monthly Sales Data

In [8]:
# Load sales data
sales_2023 = pd.read_csv('data/monthly_sales_2023.csv')
sales_2024 = pd.read_csv('data/monthly_sales_2024.csv')

print(f"2023 Sales: {len(sales_2023)} products")
print(f"2024 Sales: {len(sales_2024)} products")
print(f"\n2023 columns: {sales_2023.columns.tolist()}")
print(f"\n2024 columns: {sales_2024.columns.tolist()}")
print("\n2023 Sample:")
print(sales_2023.head())

2023 Sales: 50 products
2024 Sales: 50 products

2023 columns: ['product_id', 'product_name', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

2024 columns: ['product_id', 'product_name', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

2023 Sample:
  product_id            product_name      Jan      Feb      Mar      Apr  \
0      P0084               Vase Plus  5832.11  8349.18  7808.01  7367.09   
1      P0054  Tennis Racket Standard  6049.48  8816.56  4906.59  5315.33   
2      P0071     Tennis Racket Basic  4455.45  6754.82  6026.26  6261.48   
3      P0046              Jacket Pro  3418.55  5027.58  8348.08  9898.08   
4      P0045       Headphones Deluxe  6868.54  6959.56  2997.96  6417.89   

       May      Jun      Jul      Aug      Sep      Oct      Nov      Dec  
0  7496.78  2234.93  8390.28  2874.94  9391.65  9518.52  6592.06  3178.80  
1  8955.03  6302.05  3022.88  2128.43  7363.86  3218.69  5693.40  493

### Step 2.2: Vertical Concatenation

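Stack the 2023 and 2024 data vertically (add rows). A `year` column is added first so the two years stay distinguishable, and `ignore_index=True` builds a fresh 0 to n-1 index instead of repeating the labels 0 to 49 from each file.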

In [9]:
# Add year column before concatenating
sales_2023_labeled = sales_2023.copy()
sales_2023_labeled['year'] = 2023

sales_2024_labeled = sales_2024.copy()
sales_2024_labeled['year'] = 2024

# Vertical concatenation
combined_vertical = pd.concat([sales_2023_labeled, sales_2024_labeled], ignore_index=True)

print(f"Combined data: {len(combined_vertical)} rows")
print(f"Expected: {len(sales_2023) + len(sales_2024)} rows")
print("\nIndex after concat:")
print(combined_vertical.index[:5])
print("...")
print(combined_vertical.index[-5:])
print("\nSample:")
print(combined_vertical[['product_id', 'product_name', 'year', 'Jan', 'Feb']].head())

Combined data: 100 rows
Expected: 100 rows

Index after concat:
RangeIndex(start=0, stop=5, step=1)
...
RangeIndex(start=95, stop=100, step=1)

Sample:
  product_id            product_name  year      Jan      Feb
0      P0084               Vase Plus  2023  5832.11  8349.18
1      P0054  Tennis Racket Standard  2023  6049.48  8816.56
2      P0071     Tennis Racket Basic  2023  4455.45  6754.82
3      P0046              Jacket Pro  2023  3418.55  5027.58
4      P0045       Headphones Deluxe  2023  6868.54  6959.56
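
Instead of adding the `year` column by hand, you can have `pd.concat` label each input through its `keys` argument. This creates a MultiIndex, which `reset_index` can turn back into a column. A sketch on hypothetical miniature frames (`s23`, `s24`):

```python
import pandas as pd

# Hypothetical miniature versions of the two yearly tables
s23 = pd.DataFrame({"product_id": ["P1", "P2"], "Jan": [10.0, 20.0]})
s24 = pd.DataFrame({"product_id": ["P1", "P2"], "Jan": [15.0, 25.0]})

# keys= labels each input frame; names= names the resulting index levels
stacked = pd.concat([s23, s24], keys=[2023, 2024], names=["year", "row"])

# Move the 'year' level into a column, then discard the old per-file row labels
stacked = stacked.reset_index(level="year").reset_index(drop=True)
print(stacked)
```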


### Step 2.3: Index Management

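Practice using `set_index()` and `reset_index()`. Each `product_id` appears twice in `combined_vertical` (once for 2023 and once for 2024), so the resulting index is not unique.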

In [10]:
# Set product_id as index
indexed = combined_vertical.set_index('product_id')
print("After set_index:")
print(indexed.head())
print(f"\nIndex name: {indexed.index.name}")

# Reset index back to default
reset = indexed.reset_index()
print("\nAfter reset_index:")
print(reset.head())
print(f"\nIs 'product_id' a column again? {('product_id' in reset.columns)}")

After set_index:
                      product_name      Jan      Feb      Mar      Apr  \
product_id                                                               
P0084                    Vase Plus  5832.11  8349.18  7808.01  7367.09   
P0054       Tennis Racket Standard  6049.48  8816.56  4906.59  5315.33   
P0071          Tennis Racket Basic  4455.45  6754.82  6026.26  6261.48   
P0046                   Jacket Pro  3418.55  5027.58  8348.08  9898.08   
P0045            Headphones Deluxe  6868.54  6959.56  2997.96  6417.89   

                May      Jun      Jul      Aug      Sep      Oct      Nov  \
product_id                                                                  
P0084       7496.78  2234.93  8390.28  2874.94  9391.65  9518.52  6592.06   
P0054       8955.03  6302.05  3022.88  2128.43  7363.86  3218.69  5693.40   
P0071       6541.99  9083.06  8867.64  2706.36  9060.25  8227.85  8757.50   
P0046       2172.96  8137.65  3370.98  1286.10  3128.26  6508.45  5425.41   
P0
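
Because `product_id` repeats across years, a label lookup on that index returns several rows. Indexing on both `product_id` and `year` identifies each row uniquely, and `verify_integrity=True` makes `set_index` raise if it does not. The toy frame below is hypothetical.

```python
import pandas as pd

# Hypothetical long table: each product appears once per year
combined = pd.DataFrame({"product_id": ["P1", "P2", "P1", "P2"],
                         "year": [2023, 2023, 2024, 2024],
                         "Jan": [10.0, 20.0, 15.0, 25.0]})

by_product = combined.set_index("product_id")
print(by_product.index.is_unique)   # P1 and P2 each occur twice
print(by_product.loc["P1"])         # a DataFrame with both years, not one row

# A (product_id, year) MultiIndex; raises ValueError if any pair repeats
by_product_year = combined.set_index(["product_id", "year"], verify_integrity=True)
jan_p1_2024 = by_product_year.loc[("P1", 2024), "Jan"]
print(jan_p1_2024)
```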

### Step 2.4: Horizontal Concatenation (Careful!)

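Place the 2023 and 2024 figures side by side (add columns). `pd.concat(..., axis=1)` aligns rows by index label, not by position, so both frames are indexed by `product_id` first. A product present in only one year would get NaN in the other year's columns.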

In [11]:
# Create subset for horizontal concat demo
# Use product_id as index for proper alignment
sales_2023_indexed = sales_2023.set_index('product_id')[['product_name', 'Jan', 'Feb', 'Mar']]
sales_2024_indexed = sales_2024.set_index('product_id')[['Jan', 'Feb', 'Mar']]

# Rename 2024 columns to avoid duplicates
sales_2024_indexed.columns = [f'{col}_2024' for col in sales_2024_indexed.columns]

# Horizontal concatenation
combined_horizontal = pd.concat([sales_2023_indexed, sales_2024_indexed], axis=1)

print(f"Combined data: {combined_horizontal.shape[0]} rows x {combined_horizontal.shape[1]} columns")
print("\nColumns:")
print(combined_horizontal.columns.tolist())
print("\nSample (comparing 2023 vs 2024):")
print(combined_horizontal[['product_name', 'Jan', 'Jan_2024', 'Feb', 'Feb_2024']].head())

Combined data: 50 rows x 7 columns

Columns:
['product_name', 'Jan', 'Feb', 'Mar', 'Jan_2024', 'Feb_2024', 'Mar_2024']

Sample (comparing 2023 vs 2024):
                      product_name      Jan  Jan_2024      Feb  Feb_2024
product_id                                                              
P0084                    Vase Plus  5832.11   9822.02  8349.18   3669.66
P0054       Tennis Racket Standard  6049.48   9600.56  8816.56   4399.43
P0071          Tennis Racket Basic  4455.45   4875.53  6754.82   1962.67
P0046                   Jacket Pro  3418.55   5727.60  5027.58   6074.34
P0045            Headphones Deluxe  6868.54   1321.41  6959.56   2651.49
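
The result above has 50 rows, so the two years cover the same product IDs and alignment is clean. When the indexes only partly overlap, the default `join='outer'` keeps the union of labels and fills gaps with NaN, while `join='inner'` keeps only labels present in every frame. A sketch with hypothetical frames `a` and `b`:

```python
import pandas as pd

# Hypothetical frames whose product sets only partly overlap
a = pd.DataFrame({"Jan": [10.0, 20.0]},
                 index=pd.Index(["P1", "P2"], name="product_id"))
b = pd.DataFrame({"Jan_2024": [15.0, 35.0]},
                 index=pd.Index(["P1", "P3"], name="product_id"))

# Default join='outer': union of index labels, missing cells become NaN
outer = pd.concat([a, b], axis=1)
print(outer)

# join='inner': only labels present in both frames survive
inner = pd.concat([a, b], axis=1, join="inner")
print(inner)
```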


### Step 2.5: Save Q2 Output

In [12]:
# Save the vertically concatenated data: one row per product-year is easier to filter and group than the side-by-side layout
combined_vertical.to_csv('output/q2_combined_data.csv', index=False)
print("✓ Saved output/q2_combined_data.csv")

# Verify the file
verify = pd.read_csv('output/q2_combined_data.csv')
print(f"\nVerification: {len(verify)} rows, {len(verify.columns)} columns")

✓ Saved output/q2_combined_data.csv

Verification: 100 rows, 15 columns


## Question 3: Reshaping & Analysis (30 points)

**Objectives:**
- Transform wide format to long format using `melt()`
- Transform long format to wide using `pivot()`
- Create pivot tables for aggregation
- Combine reshape with merge operations

### Step 3.1: Wide to Long Format (melt)

In [13]:
# Melt 2023 sales from wide to long format
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

long_data_2023 = pd.melt(sales_2023,
                          id_vars=['product_id', 'product_name'],
                          value_vars=months,
                          var_name='month',
                          value_name='sales')

print(f"Original (wide): {sales_2023.shape[0]} rows x {sales_2023.shape[1]} columns")
print(f"After melt (long): {long_data_2023.shape[0]} rows x {long_data_2023.shape[1]} columns")
print("\nLong format sample:")
print(long_data_2023.head(15))  # Show first 15 to see multiple months

Original (wide): 50 rows x 14 columns
After melt (long): 600 rows x 4 columns

Long format sample:
   product_id            product_name month    sales
0       P0084               Vase Plus   Jan  5832.11
1       P0054  Tennis Racket Standard   Jan  6049.48
2       P0071     Tennis Racket Basic   Jan  4455.45
3       P0046              Jacket Pro   Jan  3418.55
4       P0045       Headphones Deluxe   Jan  6868.54
5       P0040            Chair Deluxe   Jan  2211.93
6       P0023              Comic Plus   Jan  5667.54
7       P0081         Jacket Standard   Jan  7912.67
8       P0011              Vase Basic   Jan  2478.68
9       P0001          Speaker Deluxe   Jan  7779.34
10      P0019             Speaker Pro   Jan  7847.15
11      P0031         Cookbook Deluxe   Jan  1539.34
12      P0074            Textbook Pro   Jan  1169.06
13      P0034          Comic Standard   Jan  9765.42
14      P0091       Tennis Racket Pro   Jan  2740.44
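
`melt` turns each month column into rows, so the long table has (number of products) × (number of months) rows: 50 × 12 = 600 above. If `value_vars` is omitted, every column not listed in `id_vars` is melted. A small sketch on a hypothetical wide table:

```python
import pandas as pd

# Hypothetical wide table: 2 products x 3 months
wide = pd.DataFrame({"product_id": ["P1", "P2"],
                     "Jan": [10.0, 20.0],
                     "Feb": [11.0, 21.0],
                     "Mar": [12.0, 22.0]})

# No value_vars given, so Jan, Feb and Mar are all melted
long = wide.melt(id_vars="product_id", var_name="month", value_name="sales")
print(long)

# Each (product, month) cell becomes one row: 2 x 3 = 6 rows
assert len(long) == len(wide) * 3
```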


### Step 3.2: Long to Wide Format (pivot)

In [14]:
# Pivot back to wide format
wide_again = long_data_2023.pivot(index='product_id',
                                   columns='month',
                                   values='sales')

print(f"Pivoted data: {wide_again.shape[0]} rows x {wide_again.shape[1]} columns")
print("\nWide format (recreated):")
print(wide_again.head())

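# pivot() sorts the new columns alphabetically (Apr, Aug, Dec, ...),
# so restore calendar order before comparing with the original layout
wide_again = wide_again[months]
print(f"\nColumns match months? {list(wide_again.columns) == months}")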

Pivoted data: 50 rows x 12 columns

Wide format (recreated):
month           Apr      Aug      Dec      Feb      Jan      Jul      Jun  \
product_id                                                                  
P0001       5089.69  1614.39  2508.60  8186.45  7779.34  7348.91  7112.78   
P0005       8978.01  6957.55  8339.97  1382.31  8399.25  1236.31  1254.92   
P0006       5325.59  4059.06  5465.10  1611.33  9032.88  9940.48  9877.02   
P0008       4618.08  4881.21  5216.84  9062.69  6247.90  7539.47  1764.73   
P0010       6397.34  7709.07  8991.12  4807.33  5127.95  2771.11  1306.86   

month           Mar      May      Nov      Oct      Sep  
product_id                                               
P0001       4871.67  7176.25  2520.75  1676.43  3697.57  
P0005       3180.12  3911.77  5199.59  4527.61  6213.41  
P0006       1275.12  1885.47  1740.64  8624.95  9617.75  
P0008       9121.11  8599.29  1070.26  9778.94  8229.33  
P0010       1879.33  9243.02  1352.35  9155.01  190
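
`pivot` only reshapes. It raises `ValueError` if an (index, column) pair occurs more than once, whereas `pivot_table` aggregates such duplicates. Both sort the new columns alphabetically, which is why the output above starts with Apr and Aug, and selecting the columns in calendar order fixes that. The data below is hypothetical, with January recorded twice.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar"]
# Hypothetical long data: (P1, Jan) appears twice
long = pd.DataFrame({"product_id": ["P1", "P1", "P1", "P1"],
                     "month": ["Jan", "Feb", "Mar", "Jan"],
                     "sales": [10.0, 11.0, 12.0, 5.0]})

# pivot() refuses duplicate (index, column) pairs
try:
    long.pivot(index="product_id", columns="month", values="sales")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() sums the duplicates; [months] restores calendar column order
wide = long.pivot_table(index="product_id", columns="month",
                        values="sales", aggfunc="sum")[months]
print(wide)
```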

### Step 3.3: Merge Product Category with Sales

Add product category to analyze sales by category.

In [15]:
# Merge long-format sales with product info to get categories
sales_with_category = pd.merge(long_data_2023,
                               products[['product_id', 'category']],
                               on='product_id',
                               how='left')

print(f"Sales with category: {len(sales_with_category)} rows")
print("\nSample:")
print(sales_with_category.head())

Sales with category: 600 rows

Sample:
  product_id            product_name month    sales       category
0      P0084               Vase Plus   Jan  5832.11  Home & Garden
1      P0054  Tennis Racket Standard   Jan  6049.48         Sports
2      P0071     Tennis Racket Basic   Jan  4455.45         Sports
3      P0046              Jacket Pro   Jan  3418.55       Clothing
4      P0045       Headphones Deluxe   Jan  6868.54    Electronics


### Step 3.4: Pivot Table for Aggregation

Create a pivot table showing total sales by category and month.

In [16]:
# Create pivot table: category by month
category_sales_pivot = pd.pivot_table(sales_with_category,
                                      values='sales',
                                      index='category',
                                      columns='month',
                                      aggfunc='sum')

print("Sales by Category and Month:")
print(category_sales_pivot)

# Add a 'Total' column: each category's sales summed across the 12 months
category_sales_pivot['Total'] = category_sales_pivot.sum(axis=1)
print("\nWith totals:")
print(category_sales_pivot)

Sales by Category and Month:
month               Apr       Aug       Dec        Feb        Jan       Jul  \
category                                                                      
Books          45129.56  54917.00  46637.30   39734.84   48669.81  43222.42   
Clothing       47281.06  44945.81  50935.21   45917.39   43427.96  49811.72   
Electronics    83828.24  61527.11  93053.08  109514.50  103891.05  83000.44   
Home & Garden  37129.10  23605.84  23422.46   37989.80   20240.69  26901.18   
Sports         72116.64  76411.47  81590.71   73587.57   69525.07  69170.36   

month               Jun       Mar       May       Nov       Oct       Sep  
category                                                                   
Books          46761.45  51552.15  42358.91  41362.17  54745.24  52236.59  
Clothing       48379.55  51075.45  47205.70  38898.80  56004.38  39565.58  
Electronics    88442.64  96196.48  57029.38  71513.86  90947.48  80794.55  
Home & Garden  21724.15  31683.29  33
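
Two possible refinements to this pivot table. First, `margins=True` adds both row and column totals in one call, instead of the manual `sum(axis=1)`. Second, selecting the columns in calendar order undoes the alphabetical sort. A sketch on hypothetical miniature data:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar"]
# Hypothetical long-format data with a category and month per record
df = pd.DataFrame({"category": ["Books", "Books", "Sports", "Sports"],
                   "month": ["Feb", "Jan", "Mar", "Jan"],
                   "sales": [1.0, 2.0, 3.0, 4.0]})

# margins=True appends a total row and a total column named by margins_name
table = pd.pivot_table(df, values="sales", index="category", columns="month",
                       aggfunc="sum", margins=True, margins_name="Total")

# Reorder: months in calendar order, then the totals column
table = table[months + ["Total"]]
print(table)
```

Category/month combinations with no records show up as NaN rather than 0.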

### Step 3.5: Group and Summarize

In [17]:
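# Group by category. Each row is one product-month, so 'avg_monthly_sales' is the
# mean sales per product per month, and 'count' is (products in category) x 12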
category_summary = sales_with_category.groupby('category')['sales'].agg([
    ('total_sales', 'sum'),
    ('avg_monthly_sales', 'mean'),
    ('min_sales', 'min'),
    ('max_sales', 'max'),
    ('count', 'count')
])

print("Category Summary Statistics:")
print(category_summary)

# Sort by total sales
print("\nTop categories by total sales:")
print(category_summary.sort_values('total_sales', ascending=False))

Category Summary Statistics:
               total_sales  avg_monthly_sales  min_sales  max_sales  count
category                                                                  
Books            567327.44        5253.031852    1154.37    9912.14    108
Clothing         563448.61        5869.256354    1155.93    9991.82     96
Electronics     1019738.81        5665.215611    1081.99    9974.84    180
Home & Garden    343838.33        5730.638833    1012.31    9778.94     60
Sports           909133.86        5827.781154    1054.75    9940.48    156

Top categories by total sales:
               total_sales  avg_monthly_sales  min_sales  max_sales  count
category                                                                  
Electronics     1019738.81        5665.215611    1081.99    9974.84    180
Sports           909133.86        5827.781154    1054.75    9940.48    156
Books            567327.44        5253.031852    1154.37    9912.14    108
Clothing         563448.61        5869.
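
Since pandas 0.25, `agg` also accepts named aggregation: each keyword names an output column and maps to an `(input_column, function)` pair. This form also works when you aggregate several different source columns. An equivalent sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical long-format records
df = pd.DataFrame({"category": ["Books", "Books", "Sports"],
                   "sales": [100.0, 300.0, 50.0]})

# Named aggregation: output_name=(input_column, aggregation)
summary = df.groupby("category").agg(
    total_sales=("sales", "sum"),
    avg_sales=("sales", "mean"),
    n_records=("sales", "count"),
).sort_values("total_sales", ascending=False)
print(summary)
```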

### Step 3.6: Save Q3 Output

In [18]:
# Save the long-format data with categories
sales_with_category.to_csv('output/q3_reshaped_data.csv', index=False)
print("✓ Saved output/q3_reshaped_data.csv")

# Also save the category pivot table
category_sales_pivot.to_csv('output/q3_category_sales_wide.csv')
print("✓ Saved output/q3_category_sales_wide.csv")

# Save analysis report
with open('output/q3_analysis_report.txt', 'w') as f:
    f.write("Question 3: Reshaping & Analysis - Report\n")
    f.write("="*60 + "\n\n")
    f.write(f"Total sales records: {len(sales_with_category)}\n")
    f.write(f"Products analyzed: {sales_with_category['product_id'].nunique()}\n")
    f.write(f"Months covered: {sales_with_category['month'].nunique()}\n")
    f.write(f"Categories: {sales_with_category['category'].nunique()}\n\n")
    f.write("Category Summary:\n")
    f.write(category_summary.to_string())

print("✓ Saved output/q3_analysis_report.txt")

✓ Saved output/q3_reshaped_data.csv
✓ Saved output/q3_category_sales_wide.csv
✓ Saved output/q3_analysis_report.txt


## Verification

Check that all required output files exist.

In [19]:
import os

required_outputs = [
    'output/q1_merged_data.csv',
    'output/q1_validation.txt',
    'output/q2_combined_data.csv',
    'output/q3_reshaped_data.csv',
    'output/q3_category_sales_wide.csv',
    'output/q3_analysis_report.txt'
]

print("Checking output files:")
print("="*60)
all_exist = True
for file in required_outputs:
    exists = os.path.exists(file)
    size = os.path.getsize(file) if exists else 0
    status = "✓" if exists else "✗"
    print(f"{status} {file} ({size:,} bytes)")
    if not exists:
        all_exist = False

print("="*60)
if all_exist:
    print("\n✓ All required output files created successfully!")
else:
    print("\n✗ Some output files are missing. Please check for errors.")

Checking output files:
✓ output/q1_merged_data.csv (264,871 bytes)
✓ output/q1_validation.txt (252 bytes)
✓ output/q2_combined_data.csv (12,218 bytes)
✓ output/q3_reshaped_data.csv (25,418 bytes)
✓ output/q3_category_sales_wide.csv (807 bytes)
✓ output/q3_analysis_report.txt (727 bytes)

✓ All required output files created successfully!


## Summary

This assignment covered:

1. **Merging DataFrames**: Inner, left, outer joins and multi-table merges
2. **Concatenation**: Vertical and horizontal combining with proper index management
3. **Reshaping**: Wide ↔ long format conversion and pivot tables

These operations are fundamental to data wrangling and will be used extensively in real-world data analysis.