# Assignment 6: Data Wrangling with Merge, Concat, and Reshape

**Deliverable:** Completed notebook with output files in `output/`

---

## Setup

First, make sure you've generated the data by running `data_generator.ipynb`.

In [53]:
import pandas as pd
import numpy as np
import os

# Verify data files exist
required_files = ['data/customers.csv', 'data/products.csv', 'data/purchases.csv']
for file in required_files:
    if not os.path.exists(file):
        raise FileNotFoundError(f"{file} not found. Run data_generator.ipynb first!")

print("✓ All data files found")

✓ All data files found


---

## Dataset Column Reference

Use this reference when writing merge operations and selecting columns. Each dataset's columns are listed below with a short description.

**`customers.csv` columns:**
- `customer_id` - Unique ID (C001, C002, ...)
- `name` - Customer full name
- `city` - Customer city
- `signup_date` - Registration date

**`products.csv` columns:**
- `product_id` - Unique ID (P001, P002, ...)
- `product_name` - Product name
- `category` - Product category (Electronics, Clothing, Home & Garden, Books, Sports)
- `price` - Product price in dollars

**`purchases.csv` columns:**
- `purchase_id` - Unique ID (T0001, T0002, ...)
- `customer_id` - Links to customers
- `product_id` - Links to products
- `quantity` - Number of items purchased
- `purchase_date` - Purchase date
- `store` - Store location (Store A, B, or C)
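
Before merging, it can help to confirm that the key columns behave the way this reference describes. The following is a minimal sketch on small stand-in frames; the `_demo` names and their rows are made up for illustration and are not taken from the data files.

```python
import pandas as pd

# Tiny stand-in frames that reuse column names from the reference above.
# The rows are illustrative only.
customers_demo = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003"],
    "city": ["Portland", "Sacramento", "Seattle"],
})
purchases_demo = pd.DataFrame({
    "purchase_id": ["T0001", "T0002", "T0003"],
    "customer_id": ["C002", "C002", "C003"],
})

# A lookup table's key should be unique; otherwise a merge duplicates rows.
key_is_unique = customers_demo["customer_id"].is_unique

# Every customer_id used in purchases should also exist in customers.
orphans = purchases_demo.loc[
    ~purchases_demo["customer_id"].isin(customers_demo["customer_id"])
]

print(key_is_unique)
print(len(orphans))
```

The same two checks can be run on the real `customers`, `products`, and `purchases` frames once they are loaded.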

---

## Question 1: Merging Datasets

### Part A: Basic Merge Operations

Load the datasets and perform merge operations.

In [54]:
# TODO: Load the three datasets
customers = pd.read_csv('data/customers.csv')  # Load data/customers.csv
products = pd.read_csv('data/products.csv')   # Load data/products.csv
purchases = pd.read_csv('data/purchases.csv')  # Load data/purchases.csv

# Display first few rows of each
print("Customers:")
display(customers.head())
print("\nProducts:")
display(products.head())
print("\nPurchases:")
display(purchases.head())

Customers:


,customer_id,name,city,signup_date
0,C001,George Chen,Portland,2023-01-01
1,C002,Teresa Miller,Sacramento,2023-01-04
2,C003,Diana Rodriguez,Los Angeles,2023-01-07
3,C004,Eric Lee,San Francisco,2023-01-10
4,C005,George Miller,Seattle,2023-01-13



Products:


,product_id,product_name,category,price
0,P001,Laptop,Electronics,1097.56
1,P002,Mouse,Electronics,457.12
2,P003,Keyboard,Electronics,85.26
3,P004,Monitor,Electronics,985.93
4,P005,Tablet,Electronics,306.81



Purchases:


,purchase_id,customer_id,product_id,quantity,purchase_date,store
0,T0001,C011,P042,1,2023-01-01 00:00:00,Store C
1,T0002,C070,P001,2,2023-01-01 04:00:00,Store A
2,T0003,C075,P029,1,2023-01-01 08:00:00,Store A
3,T0004,C053,P045,1,2023-01-01 12:00:00,Store C
4,T0005,C079,P049,3,2023-01-01 16:00:00,Store B


In [55]:
# TODO: Merge purchases with customers (left join)
# Keep all purchases, add customer information
purchase_customer = pd.merge(purchases, customers, on='customer_id', how='left')

display(purchase_customer.head(10))

,purchase_id,customer_id,product_id,quantity,purchase_date,store,name,city,signup_date
0,T0001,C011,P042,1,2023-01-01 00:00:00,Store C,Michael Rodriguez,Seattle,2023-01-31
1,T0002,C070,P001,2,2023-01-01 04:00:00,Store A,Eric Brown,Sacramento,2023-07-27
2,T0003,C075,P029,1,2023-01-01 08:00:00,Store A,Alice Wilson,Los Angeles,2023-08-11
3,T0004,C053,P045,1,2023-01-01 12:00:00,Store C,Diana Kim,Sacramento,2023-06-06
4,T0005,C079,P049,3,2023-01-01 16:00:00,Store B,Hannah Patel,Portland,2023-08-23
5,T0006,C010,P035,1,2023-01-01 20:00:00,Store C,Bob Jackson,Los Angeles,2023-01-28
6,T0007,C011,P040,3,2023-01-02 00:00:00,Store C,Michael Rodriguez,Seattle,2023-01-31
7,T0008,C092,P014,1,2023-01-02 04:00:00,Store C,Steve Garcia,Los Angeles,2023-10-01
8,T0009,C060,P035,1,2023-01-02 08:00:00,Store B,Alice Brown,Portland,2023-06-27
9,T0010,C030,P048,1,2023-01-02 12:00:00,Store B,Teresa Davis,Seattle,2023-03-29
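
A left join like the one above should return exactly one row per purchase. The sketch below shows one way to guard that expectation with the `validate` and `indicator` parameters of `pd.merge`. The `_demo` frames are hypothetical, and `C999` is a deliberately unknown customer added to show what an unmatched key looks like.

```python
import pandas as pd

# Illustrative frames only; C999 has no matching customer on purpose.
purchases_demo = pd.DataFrame({
    "purchase_id": ["T0001", "T0002", "T0003"],
    "customer_id": ["C002", "C002", "C999"],
})
customers_demo = pd.DataFrame({
    "customer_id": ["C001", "C002"],
    "name": ["Ann", "Ben"],
})

merged_demo = pd.merge(
    purchases_demo,
    customers_demo,
    on="customer_id",
    how="left",
    validate="many_to_one",  # raises MergeError if customer_id repeats on the right
    indicator=True,          # adds a _merge column: both / left_only / right_only
)

# A left join against a unique key keeps the left row count unchanged.
print(len(merged_demo) == len(purchases_demo))
print(merged_demo["_merge"].value_counts())
```

Rows marked `left_only` are purchases whose `customer_id` was not found, which is also where `name` comes back as NaN.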


In [56]:
# TODO: Merge the result with products to add product information
# Use left join to keep all purchases
full_data = pd.merge(purchase_customer, products, on='product_id', how='left')

display(full_data.head(10))

,purchase_id,customer_id,product_id,quantity,purchase_date,store,name,city,signup_date,product_name,category,price
0,T0001,C011,P042,1,2023-01-01 00:00:00,Store C,Michael Rodriguez,Seattle,2023-01-31,Dumbbells,Sports,402.17
1,T0002,C070,P001,2,2023-01-01 04:00:00,Store A,Eric Brown,Sacramento,2023-07-27,Laptop,Electronics,1097.56
2,T0003,C075,P029,1,2023-01-01 08:00:00,Store A,Alice Wilson,Los Angeles,2023-08-11,Candle,Home & Garden,162.72
3,T0004,C053,P045,1,2023-01-01 12:00:00,Store C,Diana Kim,Sacramento,2023-06-06,Running Shoes,Sports,400.96
4,T0005,C079,P049,3,2023-01-01 16:00:00,Store B,Hannah Patel,Portland,2023-08-23,Bicycle,Sports,60.58
5,T0006,C010,P035,1,2023-01-01 20:00:00,Store C,Bob Jackson,Los Angeles,2023-01-28,Magazine,Books,21.74
6,T0007,C011,P040,3,2023-01-02 00:00:00,Store C,Michael Rodriguez,Seattle,2023-01-31,Reference Book,Books,30.45
7,T0008,C092,P014,1,2023-01-02 04:00:00,Store C,Steve Garcia,Los Angeles,2023-10-01,Jacket,Clothing,145.27
8,T0009,C060,P035,1,2023-01-02 08:00:00,Store B,Alice Brown,Portland,2023-06-27,Magazine,Books,21.74
9,T0010,C030,P048,1,2023-01-02 12:00:00,Store B,Teresa Davis,Seattle,2023-03-29,Jump Rope,Sports,197.16


In [57]:
# TODO: Calculate total_price for each purchase
# Multiply quantity by price to get the total cost
# Round to 2 decimal places
# Hint: full_data['total_price'] = (full_data['quantity'] * full_data['price']).round(2)

full_data['total_price'] = (full_data['quantity'] * full_data['price']).round(2)

display(full_data.head(10))

,purchase_id,customer_id,product_id,quantity,purchase_date,store,name,city,signup_date,product_name,category,price,total_price
0,T0001,C011,P042,1,2023-01-01 00:00:00,Store C,Michael Rodriguez,Seattle,2023-01-31,Dumbbells,Sports,402.17,402.17
1,T0002,C070,P001,2,2023-01-01 04:00:00,Store A,Eric Brown,Sacramento,2023-07-27,Laptop,Electronics,1097.56,2195.12
2,T0003,C075,P029,1,2023-01-01 08:00:00,Store A,Alice Wilson,Los Angeles,2023-08-11,Candle,Home & Garden,162.72,162.72
3,T0004,C053,P045,1,2023-01-01 12:00:00,Store C,Diana Kim,Sacramento,2023-06-06,Running Shoes,Sports,400.96,400.96
4,T0005,C079,P049,3,2023-01-01 16:00:00,Store B,Hannah Patel,Portland,2023-08-23,Bicycle,Sports,60.58,181.74
5,T0006,C010,P035,1,2023-01-01 20:00:00,Store C,Bob Jackson,Los Angeles,2023-01-28,Magazine,Books,21.74,21.74
6,T0007,C011,P040,3,2023-01-02 00:00:00,Store C,Michael Rodriguez,Seattle,2023-01-31,Reference Book,Books,30.45,91.35
7,T0008,C092,P014,1,2023-01-02 04:00:00,Store C,Steve Garcia,Los Angeles,2023-10-01,Jacket,Clothing,145.27,145.27
8,T0009,C060,P035,1,2023-01-02 08:00:00,Store B,Alice Brown,Portland,2023-06-27,Magazine,Books,21.74,21.74
9,T0010,C030,P048,1,2023-01-02 12:00:00,Store B,Teresa Davis,Seattle,2023-03-29,Jump Rope,Sports,197.16,197.16


### Part B: Join Type Analysis

Compare different join types to understand data relationships.

In [58]:
# TODO: Inner join - only customers who made purchases
inner_result = pd.merge(customers, purchases, on='customer_id', how='inner')

print(f"Inner join result: {len(inner_result)} rows")
display(inner_result.head())

Inner join result: 2000 rows


,customer_id,name,city,signup_date,purchase_id,product_id,quantity,purchase_date,store
0,C002,Teresa Miller,Sacramento,2023-01-04,T0537,P010,1,2023-03-31 08:00:00,Store B
1,C002,Teresa Miller,Sacramento,2023-01-04,T1142,P014,1,2023-07-10 04:00:00,Store B
2,C002,Teresa Miller,Sacramento,2023-01-04,T1199,P024,3,2023-07-19 16:00:00,Store A
3,C002,Teresa Miller,Sacramento,2023-01-04,T1617,P035,4,2023-09-27 08:00:00,Store C
4,C002,Teresa Miller,Sacramento,2023-01-04,T1637,P037,3,2023-09-30 16:00:00,Store C


In [59]:
# TODO: Left join - all customers (including those with no purchases)
left_result = pd.merge(customers, purchases, on='customer_id', how='left')

print(f"Left join result: {len(left_result)} rows")
display(left_result.head())

Left join result: 2001 rows


,customer_id,name,city,signup_date,purchase_id,product_id,quantity,purchase_date,store
0,C001,George Chen,Portland,2023-01-01,,,,,
1,C002,Teresa Miller,Sacramento,2023-01-04,T0537,P010,1.0,2023-03-31 08:00:00,Store B
2,C002,Teresa Miller,Sacramento,2023-01-04,T1142,P014,1.0,2023-07-10 04:00:00,Store B
3,C002,Teresa Miller,Sacramento,2023-01-04,T1199,P024,3.0,2023-07-19 16:00:00,Store A
4,C002,Teresa Miller,Sacramento,2023-01-04,T1617,P035,4.0,2023-09-27 08:00:00,Store C


In [60]:
# TODO: Find customers who haven't made any purchases
# Hint: Use left join result and check where purchase_id is NaN
# Use .isna() to find NaN values: left_result[left_result['purchase_id'].isna()]
no_purchases = left_result[left_result['purchase_id'].isna()]

print(f"Customers with no purchases: {len(no_purchases)}")
display(no_purchases.head())

Customers with no purchases: 1


,customer_id,name,city,signup_date,purchase_id,product_id,quantity,purchase_date,store
0,C001,George Chen,Portland,2023-01-01,,,,,
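
Finding rows on one side with no match on the other is often called an anti-join. Besides filtering the left join result on NaN, two other approaches are sketched here on made-up `_demo` frames: a membership test with `isin`, and a merge with `indicator=True`.

```python
import pandas as pd

# Illustrative frames only.
customers_demo = pd.DataFrame({"customer_id": ["C001", "C002", "C003"]})
purchases_demo = pd.DataFrame({
    "purchase_id": ["T0001", "T0002"],
    "customer_id": ["C002", "C002"],
})

# Option 1: membership test, no merge needed.
no_purchase_isin = customers_demo[
    ~customers_demo["customer_id"].isin(purchases_demo["customer_id"])
]

# Option 2: left merge against the distinct purchasing customers,
# then keep only the rows flagged as left_only.
flagged = customers_demo.merge(
    purchases_demo[["customer_id"]].drop_duplicates(),
    on="customer_id",
    how="left",
    indicator=True,
)
no_purchase_merge = flagged.loc[flagged["_merge"] == "left_only", ["customer_id"]]

print(no_purchase_isin["customer_id"].tolist())
print(no_purchase_merge["customer_id"].tolist())
```

Both options return one row per customer, whereas filtering a full left join works only because unmatched customers appear exactly once in it.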


### Part C: Multi-Column Merge

Merge on multiple columns when no single column identifies a match on its own. Here a discount depends on both the product and the store.

In [61]:
# Create store-specific product pricing
# (Different stores may have different prices for same product)
store_pricing = pd.DataFrame({
    'product_id': ['P001', 'P001', 'P002', 'P002', 'P003', 'P003'],
    'store': ['Store A', 'Store B', 'Store A', 'Store B', 'Store A', 'Store B'],
    'discount_pct': [5, 10, 8, 5, 0, 15]
})

# TODO: Merge purchases with store_pricing on BOTH product_id AND store
# Hint: Use on=['product_id', 'store']
# Inner join keeps only purchases that have a matching (product_id, store) row in store_pricing
purchases_with_discount = pd.merge(purchases, store_pricing, on=['product_id', 'store'], how='inner')

display(purchases_with_discount.head(10))

,purchase_id,customer_id,product_id,quantity,purchase_date,store,discount_pct
0,T0002,C070,P001,2,2023-01-01 04:00:00,Store A,5
1,T0079,C076,P002,1,2023-01-14 00:00:00,Store A,8
2,T0097,C085,P002,1,2023-01-17 00:00:00,Store B,5
3,T0101,C091,P002,1,2023-01-17 16:00:00,Store A,8
4,T0114,C088,P002,3,2023-01-19 20:00:00,Store A,8
5,T0121,C096,P002,2,2023-01-21 00:00:00,Store B,5
6,T0152,C020,P003,1,2023-01-26 04:00:00,Store A,0
7,T0173,C070,P001,3,2023-01-29 16:00:00,Store A,5
8,T0206,C069,P003,1,2023-02-04 04:00:00,Store A,0
9,T0250,C089,P003,1,2023-02-11 12:00:00,Store A,0
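
The inner join above drops every purchase whose product and store combination has no pricing row. If the goal were to keep all purchases instead, a left join is one option. The sketch below uses made-up `_demo` frames and assumes, purely for illustration, that a missing discount means 0 percent.

```python
import pandas as pd

# Illustrative frames only; (P001, Store C) has no pricing row on purpose.
purchases_demo = pd.DataFrame({
    "purchase_id": ["T0001", "T0002", "T0003"],
    "product_id": ["P001", "P001", "P002"],
    "store": ["Store A", "Store C", "Store B"],
})
store_pricing_demo = pd.DataFrame({
    "product_id": ["P001", "P001", "P002"],
    "store": ["Store A", "Store B", "Store B"],
    "discount_pct": [5, 10, 5],
})

keys = ["product_id", "store"]

# Inner join: only purchases with a matching pricing row survive.
inner_demo = purchases_demo.merge(store_pricing_demo, on=keys, how="inner")

# Left join: every purchase survives; unmatched rows get NaN.
left_demo = purchases_demo.merge(store_pricing_demo, on=keys, how="left")

# Assumption for this sketch: no pricing row means no discount.
left_demo["discount_pct"] = left_demo["discount_pct"].fillna(0)

print(len(inner_demo))
print(left_demo["discount_pct"].tolist())
```

Note that matching on `product_id` alone would pair each purchase with every store's discount for that product, which is why both keys are needed.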


### Part D: Save Results

In [62]:
# Create output directory
os.makedirs('output', exist_ok=True)

# TODO: Save full_data to output/q1_merged_data.csv
# Hint: Use .to_csv() with index=False
full_data.to_csv('output/q1_merged_data.csv', index=False)

print("✓ Saved output/q1_merged_data.csv")

✓ Saved output/q1_merged_data.csv
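
The `index=False` hint matters when the file is read back later. This sketch writes a made-up two-row frame to in-memory buffers instead of disk and compares the columns that `pd.read_csv` recovers in each case.

```python
import io

import pandas as pd

# Illustrative frame only.
demo = pd.DataFrame({
    "purchase_id": ["T0001", "T0002"],
    "total_price": [402.17, 2195.12],
})

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'purchase_id', 'total_price']
print(cols_without_index)  # ['purchase_id', 'total_price']
```

A meaningless `RangeIndex` is usually worth dropping, while a meaningful index such as `customer_id` is worth keeping.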


In [63]:
# Create a validation report
validation_report = f"""
Question 1 Validation Report
============================

Dataset Sizes:
  - Customers: {len(customers)} rows
  - Products: {len(products)} rows
  - Purchases: {len(purchases)} rows

Merge Results:
  - Full merged data: {len(full_data)} rows
  - Inner join: {len(inner_result)} rows
  - Left join: {len(left_result)} rows
  - Customers with no purchases: {len(no_purchases)}

Data Quality:
  - Missing customer names: {full_data['name'].isna().sum()}
  - Missing product names: {full_data['product_name'].isna().sum()}
"""

# TODO: Save validation_report to output/q1_validation.txt
# Hint: Use open() with 'w' mode

with open('output/q1_validation.txt', 'w') as file:
    file.write(validation_report)

print("✓ Saved output/q1_validation.txt")

✓ Saved output/q1_validation.txt


---

## Question 2: Concatenating DataFrames

### Part A: Vertical Concatenation

Combine multiple DataFrames by stacking rows.

In [64]:
# Split purchases into quarterly datasets
# (purchase_date is still a string here; ISO-formatted dates compare correctly as text)
q1_purchases = purchases[purchases['purchase_date'] < '2023-04-01']
q2_purchases = purchases[(purchases['purchase_date'] >= '2023-04-01') &
                          (purchases['purchase_date'] < '2023-07-01')]
q3_purchases = purchases[(purchases['purchase_date'] >= '2023-07-01') &
                          (purchases['purchase_date'] < '2023-10-01')]
q4_purchases = purchases[purchases['purchase_date'] >= '2023-10-01']

print(f"Q1: {len(q1_purchases)} purchases")
print(f"Q2: {len(q2_purchases)} purchases")
print(f"Q3: {len(q3_purchases)} purchases")
print(f"Q4: {len(q4_purchases)} purchases")

Q1: 540 purchases
Q2: 546 purchases
Q3: 552 purchases
Q4: 362 purchases
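
The split above compares date strings directly, which works because ISO-formatted timestamps sort the same way as text and as dates. An alternative sketch, on made-up rows, parses the column with `pd.to_datetime` and groups on `dt.quarter`. It assumes all dates fall in a single calendar year, since `dt.quarter` ignores the year.

```python
import pandas as pd

# Illustrative rows only, chosen to sit on the quarter boundaries.
purchases_demo = pd.DataFrame({
    "purchase_id": ["T0001", "T0002", "T0003", "T0004"],
    "purchase_date": [
        "2023-01-15 08:00:00",
        "2023-04-01 00:00:00",
        "2023-09-30 20:00:00",
        "2023-10-01 00:00:00",
    ],
})

# Parse once, then derive the calendar quarter (1 to 4) for each row.
dates = pd.to_datetime(purchases_demo["purchase_date"])
quarter = dates.dt.quarter

# One DataFrame per quarter, keyed as "Q1" ... "Q4".
quarter_frames = {
    f"Q{q}": frame for q, frame in purchases_demo.groupby(quarter)
}

print({name: len(frame) for name, frame in quarter_frames.items()})
```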


In [65]:
# TODO: Concatenate all quarters back together
# Use ignore_index=True for clean sequential indexing
# Hint: pd.concat([df1, df2, df3, df4], ignore_index=True)
all_purchases = pd.concat([q1_purchases, q2_purchases, q3_purchases, q4_purchases], ignore_index=True)

print(f"Total after concat: {len(all_purchases)} purchases")
display(all_purchases.head())
print(f"\nVerify total rows: {len(q1_purchases)} + {len(q2_purchases)} + {len(q3_purchases)} + {len(q4_purchases)} = {len(all_purchases)}")

Total after concat: 2000 purchases


,purchase_id,customer_id,product_id,quantity,purchase_date,store
0,T0001,C011,P042,1,2023-01-01 00:00:00,Store C
1,T0002,C070,P001,2,2023-01-01 04:00:00,Store A
2,T0003,C075,P029,1,2023-01-01 08:00:00,Store A
3,T0004,C053,P045,1,2023-01-01 12:00:00,Store C
4,T0005,C079,P049,3,2023-01-01 16:00:00,Store B



Verify total rows: 540 + 546 + 552 + 362 = 2000
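
With `ignore_index=True`, the result no longer records which quarter each row came from. If that label is useful, `pd.concat` can attach it through its `keys` parameter. A minimal sketch on made-up frames:

```python
import pandas as pd

# Illustrative frames only.
q1_demo = pd.DataFrame({"purchase_id": ["T0001", "T0002"], "quantity": [1, 2]})
q2_demo = pd.DataFrame({"purchase_id": ["T0003"], "quantity": [3]})

# keys= labels each input frame; names= names the two resulting index levels.
tagged = pd.concat(
    [q1_demo, q2_demo],
    keys=["Q1", "Q2"],
    names=["quarter", "row"],
)

# Move the quarter label into an ordinary column and renumber the rows.
tagged = tagged.reset_index(level="quarter").reset_index(drop=True)

print(tagged)
```

The result has a `quarter` column followed by the original columns, with a clean sequential index.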


### Part B: Horizontal Concatenation

Combine DataFrames side by side as new columns, aligning rows on the index.

In [66]:
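# Create customer satisfaction scores (subset of customers)
# Note: the sampled IDs are fixed by random_state, but np.random is not seeded, so scores vary between runs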
satisfaction = pd.DataFrame({
    'customer_id': customers['customer_id'].sample(50, random_state=42),
    'satisfaction_score': np.random.randint(1, 11, size=50),
    'survey_date': pd.date_range('2023-12-01', periods=50, freq='D')
})

# Create customer loyalty tier (different subset)
loyalty = pd.DataFrame({
    'customer_id': customers['customer_id'].sample(60, random_state=123),
    'tier': np.random.choice(['Bronze', 'Silver', 'Gold', 'Platinum'], size=60),
    'points': np.random.randint(100, 10000, size=60)
})

# Set customer_id as index for both
satisfaction = satisfaction.set_index('customer_id')
loyalty = loyalty.set_index('customer_id')

print("Satisfaction scores:")
display(satisfaction.head())
print("\nLoyalty tiers:")
display(loyalty.head())

Satisfaction scores:


customer_id,satisfaction_score,survey_date
C084,1,2023-12-01
C054,2,2023-12-02
C071,9,2023-12-03
C046,6,2023-12-04
C045,7,2023-12-05



Loyalty tiers:


customer_id,tier,points
C009,Platinum,3169
C071,Platinum,7037
C083,Platinum,8695
C029,Gold,1921
C064,Silver,608


In [67]:
# TODO: Horizontal concat to combine satisfaction and loyalty
# Use outer join to keep all customers from both datasets
# Hint: pd.concat([df1, df2], axis=1, join='outer')
customer_metrics = pd.concat([satisfaction, loyalty], axis=1, join='outer')

display(customer_metrics.head(10))

customer_id,satisfaction_score,survey_date,tier,points
C084,1.0,2023-12-01,,
C054,2.0,2023-12-02,Bronze,3990.0
C071,9.0,2023-12-03,Platinum,7037.0
C046,6.0,2023-12-04,Bronze,7803.0
C045,7.0,2023-12-05,Silver,5163.0
C040,2.0,2023-12-06,,
C023,5.0,2023-12-07,,
C081,9.0,2023-12-08,,
C011,4.0,2023-12-09,,
C001,8.0,2023-12-10,Platinum,436.0


In [68]:
# Count the NaN values introduced by the misaligned indexes
print(f"Missing satisfaction scores: {customer_metrics['satisfaction_score'].isna().sum()}")
print(f"Missing loyalty tiers: {customer_metrics['tier'].isna().sum()}")

Missing satisfaction scores: 30
Missing loyalty tiers: 20
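
The two counts above follow from how the indexes overlap: a customer who appears only in `loyalty` has no satisfaction score, and a customer who appears only in `satisfaction` has no tier. With 50 surveyed customers and 60 loyalty members, 30 and 20 missing values imply that 30 customers appear in both. The sketch below reproduces the same arithmetic on made-up `_demo` frames and contrasts `join='outer'` with `join='inner'`.

```python
import pandas as pd

# Illustrative frames only; C003 is the single customer present in both.
satisfaction_demo = pd.DataFrame(
    {"satisfaction_score": [7, 3, 9]},
    index=pd.Index(["C001", "C002", "C003"], name="customer_id"),
)
loyalty_demo = pd.DataFrame(
    {"tier": ["Gold", "Silver"]},
    index=pd.Index(["C003", "C004"], name="customer_id"),
)

# Outer keeps the union of index labels; inner keeps the intersection.
outer_demo = pd.concat([satisfaction_demo, loyalty_demo], axis=1, join="outer")
inner_demo = pd.concat([satisfaction_demo, loyalty_demo], axis=1, join="inner")

missing_scores = int(outer_demo["satisfaction_score"].isna().sum())
missing_tiers = int(outer_demo["tier"].isna().sum())

print(len(outer_demo), len(inner_demo))
print(missing_scores, missing_tiers)
```

Because NaN cannot be stored in an integer column, integer columns such as `points` are converted to float when the outer join introduces missing values.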


In [69]:
# TODO: Save customer_metrics to output/q2_combined_data.csv
# Hint: Use .to_csv() and keep the default index=True so customer_id is written as the first column

customer_metrics.to_csv('output/q2_combined_data.csv')
print("✓ Saved output/q2_combined_data.csv")

✓ Saved output/q2_combined_data.csv


---

## Question 3: Reshaping and Analysis

### Part A: Pivot Table Analysis

Pivot the merged purchases into a month-by-category table of total sales.

In [70]:
# TODO: Load the merged data from Question 1
# This already has purchases merged with customers and products (and total_price calculated)
# Hint: pd.read_csv('output/q1_merged_data.csv')
full_data = pd.read_csv('output/q1_merged_data.csv')

# Add month column for grouping (YYYY-MM format like "2023-01")
full_data['month'] = pd.to_datetime(full_data['purchase_date']).dt.strftime('%Y-%m')

display(full_data.head())

,purchase_id,customer_id,product_id,quantity,purchase_date,store,name,city,signup_date,product_name,category,price,total_price,month
0,T0001,C011,P042,1,2023-01-01 00:00:00,Store C,Michael Rodriguez,Seattle,2023-01-31,Dumbbells,Sports,402.17,402.17,2023-01
1,T0002,C070,P001,2,2023-01-01 04:00:00,Store A,Eric Brown,Sacramento,2023-07-27,Laptop,Electronics,1097.56,2195.12,2023-01
2,T0003,C075,P029,1,2023-01-01 08:00:00,Store A,Alice Wilson,Los Angeles,2023-08-11,Candle,Home & Garden,162.72,162.72,2023-01
3,T0004,C053,P045,1,2023-01-01 12:00:00,Store C,Diana Kim,Sacramento,2023-06-06,Running Shoes,Sports,400.96,400.96,2023-01
4,T0005,C079,P049,3,2023-01-01 16:00:00,Store B,Hannah Patel,Portland,2023-08-23,Bicycle,Sports,60.58,181.74,2023-01


In [71]:
# TODO: Create pivot table - sales by category and month
# Use pivot_table to handle duplicate entries (aggregate with sum)
# Hint: pd.pivot_table(df, values='total_price', index='month', columns='category', aggfunc='sum')
sales_pivot = pd.pivot_table(full_data, values='total_price', index='month', columns='category', aggfunc='sum')

display(sales_pivot)

month,Books,Clothing,Electronics,Home & Garden,Sports
2023-01,2780.36,6512.46,42889.88,9442.48,24394.96
2023-02,2311.67,7236.13,56340.97,9535.97,20441.05
2023-03,2828.88,7368.45,58519.75,15252.15,11629.01
2023-04,2628.37,6371.61,43720.79,11304.66,17711.1
2023-05,2246.64,6398.13,49026.15,16001.21,23634.81
2023-06,2616.71,5103.35,67951.56,11604.87,16156.68
2023-07,1905.62,6636.07,68428.6,13898.04,10097.13
2023-08,2614.03,5875.96,39960.77,14272.98,21717.36
2023-09,2162.12,8016.28,44362.79,12473.02,20888.18
2023-10,2240.53,7953.12,36672.7,15580.44,16054.8
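
A pivot table with `aggfunc='sum'` is a grouped sum laid out as a grid. The sketch below, on made-up rows, builds the same table two ways, with `pd.pivot_table` and with `groupby` followed by `unstack`. It passes `fill_value=0` so that a month with no sales in a category shows 0 instead of NaN. Whether 0 is the right placeholder depends on the analysis, so treat that choice as an assumption.

```python
import pandas as pd

# Illustrative rows only; there are no Sports sales in 2023-02 on purpose.
sales_demo = pd.DataFrame({
    "month": ["2023-01", "2023-01", "2023-01", "2023-02"],
    "category": ["Books", "Books", "Sports", "Books"],
    "total_price": [10.0, 5.0, 20.0, 7.5],
})

# Route 1: pivot_table aggregates duplicate month/category pairs.
pivot_demo = pd.pivot_table(
    sales_demo,
    values="total_price",
    index="month",
    columns="category",
    aggfunc="sum",
    fill_value=0,
)

# Route 2: group, sum, then spread the category level into columns.
grouped_demo = (
    sales_demo.groupby(["month", "category"])["total_price"]
    .sum()
    .unstack(fill_value=0)
)

print(pivot_demo)
print((pivot_demo.values == grouped_demo.values).all())
```

Plain `DataFrame.pivot` would raise an error on this data because the (2023-01, Books) pair occurs twice, which is why the TODO above asks for `pivot_table`.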


In [72]:
# TODO: Save sales_pivot to output/q3_category_sales_wide.csv
# Hint: Use .to_csv()

sales_pivot.to_csv('output/q3_category_sales_wide.csv')
print("✓ Saved output/q3_category_sales_wide.csv")

✓ Saved output/q3_category_sales_wide.csv


### Part B: Melt and Long Format

Convert the wide table back to long format, with one row per month and category, for grouped analysis.

In [73]:
# Reset index to make month a column
sales_wide = sales_pivot.reset_index()

# TODO: Melt to convert category columns back to rows
# Hint: pd.melt(df, id_vars=['month'], var_name='category', value_name='sales')
sales_long = pd.melt(sales_wide, id_vars=['month'], var_name='category', value_name='sales')

display(sales_long.head(15))

,month,category,sales
0,2023-01,Books,2780.36
1,2023-02,Books,2311.67
2,2023-03,Books,2828.88
3,2023-04,Books,2628.37
4,2023-05,Books,2246.64
5,2023-06,Books,2616.71
6,2023-07,Books,1905.62
7,2023-08,Books,2614.03
8,2023-09,Books,2162.12
9,2023-10,Books,2240.53
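
Melting and pivoting are inverse operations when each month and category pair occurs once. A small round trip on a made-up wide table illustrates this: melt to long, then pivot back to wide.

```python
import pandas as pd

# Illustrative wide table only.
wide_demo = pd.DataFrame({
    "month": ["2023-01", "2023-02"],
    "Books": [15.0, 7.5],
    "Sports": [20.0, 12.0],
})

# Wide -> long: one row per (month, category) pair.
long_demo = pd.melt(
    wide_demo, id_vars=["month"], var_name="category", value_name="sales"
)

# Long -> wide: plain pivot is enough because each pair occurs exactly once.
round_trip = (
    long_demo.pivot(index="month", columns="category", values="sales")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "category" label on the columns
)

print(long_demo.shape)
print(round_trip)
```

The long table has rows equal to the number of months times the number of categories, which is a quick sanity check after any melt.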


In [74]:
# TODO: Calculate summary statistics using the long format
# Group by category and calculate total sales, average monthly sales
# Hint: Use .groupby('category')['sales'].agg(['sum', 'mean']) and sort by sum descending
category_summary = sales_long.groupby('category')['sales'].agg(['sum', 'mean']).sort_values('sum', ascending=False)

display(category_summary)

category,sum,mean
Electronics,531620.58,48329.143636
Sports,204377.02,18579.729091
Home & Garden,144286.2,13116.927273
Clothing,75040.71,6821.882727
Books,26924.63,2447.693636
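
In the summary above, `sum` divided by `mean` is 11 for every category, so each mean is averaged over 11 monthly values. Named aggregation can make this explicit by giving each statistic a descriptive column name and adding a count. The sketch below runs on made-up rows, and the output column names (`total_sales`, `avg_monthly_sales`, `months`) are hypothetical choices for illustration.

```python
import pandas as pd

# Illustrative long-format rows only.
sales_long_demo = pd.DataFrame({
    "month": ["2023-01", "2023-02", "2023-01", "2023-02"],
    "category": ["Books", "Books", "Sports", "Sports"],
    "sales": [15.0, 7.5, 20.0, 12.0],
})

# Named aggregation: keyword = output column, value = aggregation function.
summary_demo = (
    sales_long_demo.groupby("category")["sales"]
    .agg(total_sales="sum", avg_monthly_sales="mean", months="count")
    .sort_values("total_sales", ascending=False)
)

print(summary_demo)
```

Note that `mean` and `count` skip NaN values, so a category with no sales in some month would be averaged over fewer months unless those gaps are filled first.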


In [75]:
# Create final analysis report
analysis_report = f"""
Question 3 Analysis Report
==========================

Sales by Category (Total):
{category_summary.to_string()}

Time Period:
  - Start: {full_data['purchase_date'].min()}
  - End: {full_data['purchase_date'].max()}
  - Months: {full_data['month'].nunique()}

Top Category: {category_summary.index[0]}
Bottom Category: {category_summary.index[-1]}
"""

# TODO: Save analysis_report to output/q3_analysis_report.txt
# Hint: Use open() with 'w' mode

with open('output/q3_analysis_report.txt', 'w') as file:
    file.write(analysis_report)
    
print("✓ Saved output/q3_analysis_report.txt")

✓ Saved output/q3_analysis_report.txt
