# Demo 3: Time Series Concatenation and Index Management

## Learning Objectives

- Concatenate DataFrames vertically (stacking rows) and horizontally (adding columns)
- Understand when to use `ignore_index=True` vs preserving indexes
- Master `set_index()` and `reset_index()` for index manipulation
- Handle misaligned indexes during concatenation
- Combine `concat()` and `merge()` in practical workflows
- Work with time-based indexes for temporal data

## Setup

In [1]:
```python
import pandas as pd
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)
```

## Create Sample Data: Quarterly Sales Reports

We'll simulate monthly sales data that arrives in separate quarterly files.

In [2]:
```python
# Q1 Sales (Jan-Mar 2023)
q1_sales = pd.DataFrame({
    'month': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-03-01']),
    'revenue': [125000, 132000, 145000],
    'units_sold': [1250, 1320, 1450],
    'returns': [50, 45, 60]
})

# Q2 Sales (Apr-Jun 2023)
q2_sales = pd.DataFrame({
    'month': pd.to_datetime(['2023-04-01', '2023-05-01', '2023-06-01']),
    'revenue': [158000, 165000, 178000],
    'units_sold': [1580, 1650, 1780],
    'returns': [55, 70, 65]
})

# Q3 Sales (Jul-Sep 2023)
q3_sales = pd.DataFrame({
    'month': pd.to_datetime(['2023-07-01', '2023-08-01', '2023-09-01']),
    'revenue': [185000, 192000, 175000],
    'units_sold': [1850, 1920, 1750],
    'returns': [80, 75, 68]
})

print("Q1 Sales:")
display(q1_sales)
print("\nQ2 Sales:")
display(q2_sales)
print("\nQ3 Sales:")
display(q3_sales)
```

Q1 Sales:

| | month | revenue | units_sold | returns |
|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 |
| 1 | 2023-02-01 | 132000 | 1320 | 45 |
| 2 | 2023-03-01 | 145000 | 1450 | 60 |

Q2 Sales:

| | month | revenue | units_sold | returns |
|---|---|---|---|---|
| 0 | 2023-04-01 | 158000 | 1580 | 55 |
| 1 | 2023-05-01 | 165000 | 1650 | 70 |
| 2 | 2023-06-01 | 178000 | 1780 | 65 |

Q3 Sales:

| | month | revenue | units_sold | returns |
|---|---|---|---|---|
| 0 | 2023-07-01 | 185000 | 1850 | 80 |
| 1 | 2023-08-01 | 192000 | 1920 | 75 |
| 2 | 2023-09-01 | 175000 | 1750 | 68 |


**Scenario:** You receive quarterly sales files and need to combine them into a single dataset for annual analysis.
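In practice the quarterly frames would be read from files before being combined. A minimal sketch, assuming hypothetical CSV files named `sales_2023_q1.csv`, `sales_2023_q2.csv`, and so on, each with a `month` column (the block writes two tiny stand-in files to a temporary directory so it runs anywhere):

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical file names and contents, for illustration only
quarters = {
    "sales_2023_q1.csv": ["2023-01-01", "2023-02-01", "2023-03-01"],
    "sales_2023_q2.csv": ["2023-04-01", "2023-05-01", "2023-06-01"],
}

with tempfile.TemporaryDirectory() as tmp:
    # Write small stand-in files so the sketch is self-contained
    for name, months in quarters.items():
        pd.DataFrame({"month": months, "revenue": [100, 200, 300]}).to_csv(
            os.path.join(tmp, name), index=False
        )

    # Read every matching file (sorted so quarters stay in order) and stack them
    paths = sorted(glob.glob(os.path.join(tmp, "sales_2023_q*.csv")))
    year_sales_from_files = pd.concat(
        [pd.read_csv(p, parse_dates=["month"]) for p in paths],
        ignore_index=True,
    )

print(year_sales_from_files)
```

Sorting the paths matters because `glob` makes no ordering guarantee.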

## Vertical Concatenation: Stacking Rows

Use `pd.concat()` to stack DataFrames vertically (add more rows).

In [3]:
```python
# Basic vertical concatenation
year_sales = pd.concat([q1_sales, q2_sales, q3_sales])

print("Combined Sales (Default):")
year_sales
```

Combined Sales (Default):

| | month | revenue | units_sold | returns |
|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 |
| 1 | 2023-02-01 | 132000 | 1320 | 45 |
| 2 | 2023-03-01 | 145000 | 1450 | 60 |
| 0 | 2023-04-01 | 158000 | 1580 | 55 |
| 1 | 2023-05-01 | 165000 | 1650 | 70 |
| 2 | 2023-06-01 | 178000 | 1780 | 65 |
| 0 | 2023-07-01 | 185000 | 1850 | 80 |
| 1 | 2023-08-01 | 192000 | 1920 | 75 |
| 2 | 2023-09-01 | 175000 | 1750 | 68 |


**Problem:** Notice the index: it repeats (0, 1, 2, 0, 1, 2, 0, 1, 2).

**Why:** Each DataFrame has its own 0-2 index, and concat preserved them.
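Duplicate labels are more than cosmetic: label-based lookups return several rows at once. A small sketch on two toy frames (not the sales data above) shows the effect, plus `pd.concat`'s `verify_integrity` option, which refuses to build an index with duplicates:

```python
import pandas as pd

# Two small frames, each with the default RangeIndex 0..n-1
a = pd.DataFrame({"revenue": [125000, 132000]})
b = pd.DataFrame({"revenue": [158000, 165000]})

stacked = pd.concat([a, b])
print(stacked.index.is_unique)  # False: labels 0 and 1 each appear twice
print(stacked.loc[0])           # a 2-row DataFrame, not a single row

# verify_integrity=True makes concat raise instead of accepting duplicate labels
integrity_error = None
try:
    pd.concat([a, b], verify_integrity=True)
except ValueError as err:
    integrity_error = err
print("verify_integrity raised:", integrity_error)
```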

**Two solutions:**
1. Use `ignore_index=True` to create a new sequential index
2. Use `set_index()` to make `month` the index

In [4]:
```python
# Solution 1: ignore_index=True for clean sequential index
year_sales_clean = pd.concat([q1_sales, q2_sales, q3_sales], ignore_index=True)

print("Combined Sales (Clean Index):")
year_sales_clean
```

Combined Sales (Clean Index):

| | month | revenue | units_sold | returns |
|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 |
| 1 | 2023-02-01 | 132000 | 1320 | 45 |
| 2 | 2023-03-01 | 145000 | 1450 | 60 |
| 3 | 2023-04-01 | 158000 | 1580 | 55 |
| 4 | 2023-05-01 | 165000 | 1650 | 70 |
| 5 | 2023-06-01 | 178000 | 1780 | 65 |
| 6 | 2023-07-01 | 185000 | 1850 | 80 |
| 7 | 2023-08-01 | 192000 | 1920 | 75 |
| 8 | 2023-09-01 | 175000 | 1750 | 68 |


**Much better!** Now we have a clean 0-8 index.

**When to use `ignore_index=True`:**
- When original indexes don't matter (default numeric indexes)
- When you want clean sequential numbering
- When combining similar datasets from different sources
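The flip side is that `ignore_index=True` throws away whatever the original index held. A minimal sketch, assuming frames that are already indexed by month (unlike `q1_sales` above):

```python
import pandas as pd

# Frames whose index already carries meaning (the month)
jan_feb = pd.DataFrame(
    {"revenue": [125000, 132000]},
    index=pd.to_datetime(["2023-01-01", "2023-02-01"]),
)
mar = pd.DataFrame({"revenue": [145000]}, index=pd.to_datetime(["2023-03-01"]))

kept = pd.concat([jan_feb, mar])                        # dates preserved
dropped = pd.concat([jan_feb, mar], ignore_index=True)  # dates replaced by 0, 1, 2

print(kept.index)
print(dropped.index)
```

So reserve `ignore_index=True` for frames whose index is just the default row numbering.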

## Using set_index() for Meaningful Row Labels

For time series data, the date is usually the most useful index.

In [5]:
```python
# Solution 2: Use month as index (better for time series!)
year_sales_indexed = pd.concat([q1_sales, q2_sales, q3_sales], ignore_index=True)
year_sales_indexed = year_sales_indexed.set_index('month')

print("Combined Sales (Month as Index):")
year_sales_indexed
```

Combined Sales (Month as Index):

| month | revenue | units_sold | returns |
|---|---|---|---|
| 2023-01-01 | 125000 | 1250 | 50 |
| 2023-02-01 | 132000 | 1320 | 45 |
| 2023-03-01 | 145000 | 1450 | 60 |
| 2023-04-01 | 158000 | 1580 | 55 |
| 2023-05-01 | 165000 | 1650 | 70 |
| 2023-06-01 | 178000 | 1780 | 65 |
| 2023-07-01 | 185000 | 1850 | 80 |
| 2023-08-01 | 192000 | 1920 | 75 |
| 2023-09-01 | 175000 | 1750 | 68 |


**Advantages of datetime index:**
- Can select by date: `year_sales_indexed.loc['2023-06']`
- Easy time-based filtering and resampling
- More meaningful than numeric index

In [6]:
```python
# Example: Select Q2 data using datetime index
q2_data = year_sales_indexed.loc['2023-04':'2023-06']
print("Q2 Data (using datetime index):")
q2_data
```

Q2 Data (using datetime index):

| month | revenue | units_sold | returns |
|---|---|---|---|
| 2023-04-01 | 158000 | 1580 | 55 |
| 2023-05-01 | 165000 | 1650 | 70 |
| 2023-06-01 | 178000 | 1780 | 65 |


In [7]:
```python
# Example: Calculate quarterly totals
# 'QE' = quarter-end frequency (pandas >= 2.2; older versions use 'Q')
quarterly_totals = year_sales_indexed.resample('QE').sum()
print("\nQuarterly Totals (resample magic!):")
quarterly_totals
```

Quarterly Totals (resample magic!):

| month | revenue | units_sold | returns |
|---|---|---|---|
| 2023-03-31 | 402000 | 4020 | 155 |
| 2023-06-30 | 501000 | 5010 | 190 |
| 2023-09-30 | 552000 | 5520 | 223 |


**This is why datetime indexes are powerful for time series!**
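The `'QE'` alias only exists in pandas 2.2 and later, where the older `'Q'` spelling is deprecated for resampling. If a notebook must run on both, one option is to group by quarter periods instead of resampling. A sketch on a toy frame covering the first half of the year:

```python
import pandas as pd

monthly = pd.DataFrame(
    {"revenue": [125000, 132000, 145000, 158000, 165000, 178000]},
    index=pd.date_range("2023-01-01", periods=6, freq="MS"),  # month starts
)

# Convert each timestamp to its calendar quarter (a Period) and sum per quarter
by_quarter = monthly.groupby(monthly.index.to_period("Q")).sum()
print(by_quarter)
```

The result is labelled `2023Q1`, `2023Q2` rather than by quarter-end dates, which is often easier to read in reports.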

## Horizontal Concatenation: Adding Columns

Use `axis=1` to concatenate side-by-side (adding more columns).

In [8]:
```python
# Create additional metrics in separate DataFrames
# Marketing spend data
marketing = pd.DataFrame({
    'month': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-03-01',
                             '2023-04-01', '2023-05-01', '2023-06-01']),
    'ad_spend': [12000, 15000, 18000, 20000, 22000, 25000],
    'impressions': [500000, 600000, 700000, 800000, 850000, 900000]
}).set_index('month')

# Customer satisfaction scores
satisfaction = pd.DataFrame({
    'month': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-03-01',
                             '2023-04-01', '2023-05-01', '2023-06-01']),
    'nps_score': [45, 48, 52, 55, 58, 60],
    'survey_responses': [120, 135, 150, 165, 180, 195]
}).set_index('month')

print("Marketing Data:")
display(marketing.head())
print("\nSatisfaction Data:")
display(satisfaction.head())
```

Marketing Data:

| month | ad_spend | impressions |
|---|---|---|
| 2023-01-01 | 12000 | 500000 |
| 2023-02-01 | 15000 | 600000 |
| 2023-03-01 | 18000 | 700000 |
| 2023-04-01 | 20000 | 800000 |
| 2023-05-01 | 22000 | 850000 |

Satisfaction Data:

| month | nps_score | survey_responses |
|---|---|---|
| 2023-01-01 | 45 | 120 |
| 2023-02-01 | 48 | 135 |
| 2023-03-01 | 52 | 150 |
| 2023-04-01 | 55 | 165 |
| 2023-05-01 | 58 | 180 |


In [9]:
```python
# Get first 6 months of sales for this example
sales_h1 = year_sales_indexed.loc['2023-01':'2023-06']

# Horizontal concatenation (add columns)
combined_metrics = pd.concat([sales_h1, marketing, satisfaction], axis=1)

print("Combined Metrics (Horizontal Concat):")
combined_metrics
```

Combined Metrics (Horizontal Concat):

| month | revenue | units_sold | returns | ad_spend | impressions | nps_score | survey_responses |
|---|---|---|---|---|---|---|---|
| 2023-01-01 | 125000 | 1250 | 50 | 12000 | 500000 | 45 | 120 |
| 2023-02-01 | 132000 | 1320 | 45 | 15000 | 600000 | 48 | 135 |
| 2023-03-01 | 145000 | 1450 | 60 | 18000 | 700000 | 52 | 150 |
| 2023-04-01 | 158000 | 1580 | 55 | 20000 | 800000 | 55 | 165 |
| 2023-05-01 | 165000 | 1650 | 70 | 22000 | 850000 | 58 | 180 |
| 2023-06-01 | 178000 | 1780 | 65 | 25000 | 900000 | 60 | 195 |


**What happened:**
- All DataFrames aligned by their **month index**
- Columns from each DataFrame added side-by-side
- Index values matched up automatically

**Key insight:** Horizontal concat uses index for alignment!
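The alignment is by index label, not by a `month` column. A minimal sketch (toy frames, not the ones above) of what goes wrong when both frames still carry the default RangeIndex, and how setting the month as the index fixes it:

```python
import pandas as pd

# Same kind of data, but both frames still use the default RangeIndex (0, 1, 2, ...)
sales = pd.DataFrame({
    "month": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
    "revenue": [125000, 132000, 145000],
})
ads = pd.DataFrame({
    "month": pd.to_datetime(["2023-02-01", "2023-03-01"]),  # no January row
    "ad_spend": [15000, 18000],
})

# Aligns on row labels 0 and 1, so February's ad spend lands next to January's sales
wrong = pd.concat([sales, ads], axis=1)

# With month as the index on both sides, concat aligns on dates instead
right = pd.concat([sales.set_index("month"), ads.set_index("month")], axis=1)
print(wrong)
print(right)
```

The `wrong` result also ends up with two `month` columns, an easy visual hint that the frames were not aligned on dates.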

## Handling Misaligned Indexes

What happens when indexes don't match perfectly?

In [10]:
```python
# Create data with missing/extra months
partial_data = pd.DataFrame({
    'month': pd.to_datetime(['2023-02-01', '2023-03-01', '2023-04-01', 
                             '2023-07-01']),  # No Jan, May, Jun; Jul is outside sales_h1
    'social_engagement': [5000, 5500, 6000, 7000]
}).set_index('month')

print("Partial Data (Missing Some Months):")
display(partial_data)

# Concatenate with misaligned indexes
combined_misaligned = pd.concat([sales_h1, partial_data], axis=1)
print("\nCombined with Misaligned Indexes:")
combined_misaligned
```

Partial Data (Missing Some Months):

| month | social_engagement |
|---|---|
| 2023-02-01 | 5000 |
| 2023-03-01 | 5500 |
| 2023-04-01 | 6000 |
| 2023-07-01 | 7000 |

Combined with Misaligned Indexes:

| month | revenue | units_sold | returns | social_engagement |
|---|---|---|---|---|
| 2023-01-01 | 125000.0 | 1250.0 | 50.0 | NaN |
| 2023-02-01 | 132000.0 | 1320.0 | 45.0 | 5000.0 |
| 2023-03-01 | 145000.0 | 1450.0 | 60.0 | 5500.0 |
| 2023-04-01 | 158000.0 | 1580.0 | 55.0 | 6000.0 |
| 2023-05-01 | 165000.0 | 1650.0 | 70.0 | NaN |
| 2023-06-01 | 178000.0 | 1780.0 | 65.0 | NaN |
| 2023-07-01 | NaN | NaN | NaN | 7000.0 |


**Result:** NaN values appear where indexes don't match!
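A side effect is visible in the table: every column that gained a NaN now prints as a float (125000.0). NumPy integer columns cannot hold NaN, so pandas upcasts them to `float64`. One option, sketched here on toy frames, is pandas' nullable `Int64` dtype:

```python
import pandas as pd

sales = pd.DataFrame(
    {"revenue": [125000, 132000]},
    index=pd.to_datetime(["2023-01-01", "2023-02-01"]),
)
social = pd.DataFrame(
    {"social_engagement": [5000]},
    index=pd.to_datetime(["2023-02-01"]),
)

raw = pd.concat([sales, social], axis=1)
print(raw.dtypes)  # social_engagement: float64, because January is NaN

# Nullable integer dtype keeps whole numbers and shows the gap as <NA>
clean = raw.astype({"social_engagement": "Int64"})
print(clean)
```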

**Default behavior:** `join='outer'` keeps all index values from both DataFrames.

**Alternative:** Use `join='inner'` to keep only matching indexes.

In [11]:
```python
# Inner join - only keep matching months
combined_inner = pd.concat([sales_h1, partial_data], axis=1, join='inner')

print("Combined with Inner Join (Only Matching Months):")
combined_inner
```

Combined with Inner Join (Only Matching Months):

| month | revenue | units_sold | returns | social_engagement |
|---|---|---|---|---|
| 2023-02-01 | 132000 | 1320 | 45 | 5000 |
| 2023-03-01 | 145000 | 1450 | 60 | 5500 |
| 2023-04-01 | 158000 | 1580 | 55 | 6000 |


**Now only months present in BOTH DataFrames appear!**

**Common pitfall:** Using horizontal concat when you should use merge. If indexes don't align well, consider `pd.merge()` instead!
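When both frames are indexed by month, the merge-based equivalent of `join='inner'` aligns on the indexes via `left_index`/`right_index`; `DataFrame.join` is a shorthand for the same thing. A sketch on toy frames:

```python
import pandas as pd

sales = pd.DataFrame(
    {"revenue": [125000, 132000, 145000]},
    index=pd.DatetimeIndex(["2023-01-01", "2023-02-01", "2023-03-01"], name="month"),
)
social = pd.DataFrame(
    {"social_engagement": [5000, 5500, 7000]},
    index=pd.DatetimeIndex(["2023-02-01", "2023-03-01", "2023-07-01"], name="month"),
)

# Index-on-index merge: same rows as concat(axis=1, join="inner") here
via_merge = pd.merge(sales, social, left_index=True, right_index=True, how="inner")

# DataFrame.join does the same alignment with less typing (its default how is "left")
via_join = sales.join(social, how="inner")

print(via_merge)
print(via_merge.equals(via_join))
```

Unlike `concat`, `merge` can also match on ordinary columns and check key relationships, which is why it is the better tool once indexes stop lining up cleanly.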

## reset_index(): Moving Index Back to Columns

Sometimes you need to convert the index back to a regular column.

In [12]:
```python
# Current state: month is the index
print("Before reset_index():")
display(combined_metrics.head())
print(f"Index name: {combined_metrics.index.name}")
print(f"Columns: {list(combined_metrics.columns)}")
```

Before reset_index():

| month | revenue | units_sold | returns | ad_spend | impressions | nps_score | survey_responses |
|---|---|---|---|---|---|---|---|
| 2023-01-01 | 125000 | 1250 | 50 | 12000 | 500000 | 45 | 120 |
| 2023-02-01 | 132000 | 1320 | 45 | 15000 | 600000 | 48 | 135 |
| 2023-03-01 | 145000 | 1450 | 60 | 18000 | 700000 | 52 | 150 |
| 2023-04-01 | 158000 | 1580 | 55 | 20000 | 800000 | 55 | 165 |
| 2023-05-01 | 165000 | 1650 | 70 | 22000 | 850000 | 58 | 180 |

Index name: month

Columns: ['revenue', 'units_sold', 'returns', 'ad_spend', 'impressions', 'nps_score', 'survey_responses']


In [13]:
```python
# Reset index to make month a regular column
combined_reset = combined_metrics.reset_index()

print("\nAfter reset_index():")
display(combined_reset.head())
print(f"Index: {list(combined_reset.index)}")
print(f"Columns: {list(combined_reset.columns)}")
```

After reset_index():

| | month | revenue | units_sold | returns | ad_spend | impressions | nps_score | survey_responses |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 | 12000 | 500000 | 45 | 120 |
| 1 | 2023-02-01 | 132000 | 1320 | 45 | 15000 | 600000 | 48 | 135 |
| 2 | 2023-03-01 | 145000 | 1450 | 60 | 18000 | 700000 | 52 | 150 |
| 3 | 2023-04-01 | 158000 | 1580 | 55 | 20000 | 800000 | 55 | 165 |
| 4 | 2023-05-01 | 165000 | 1650 | 70 | 22000 | 850000 | 58 | 180 |

Index: [0, 1, 2, 3, 4, 5]

Columns: ['month', 'revenue', 'units_sold', 'returns', 'ad_spend', 'impressions', 'nps_score', 'survey_responses']


**What happened:**
- `month` moved from index to a regular column
- New default numeric index (0, 1, 2, ...) created

**When to use reset_index():**
- After groupby operations (groups become index)
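- Before saving to CSV: `to_csv()` writes the index, but `read_csv()` loads it back as an ordinary column (named `Unnamed: 0` if the index had no name) unless you pass `index_col`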
- When you need the index as a column for analysis
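The groupby case from the list above, sketched on a toy frame: `groupby` moves the grouping key into the index, `reset_index()` moves it back, and `as_index=False` produces the same flat shape in one step:

```python
import pandas as pd

sales = pd.DataFrame({
    "top_category": ["Electronics", "Clothing", "Electronics"],
    "revenue": [125000, 145000, 132000],
})

# groupby turns the grouping column into the index...
totals = sales.groupby("top_category")["revenue"].sum()
print(totals.index.name)  # top_category

# ...and reset_index() moves it back into an ordinary column
totals_flat = totals.reset_index()

# as_index=False gives the same flat result directly
totals_flat_alt = sales.groupby("top_category", as_index=False)["revenue"].sum()
print(totals_flat)
print(totals_flat.equals(totals_flat_alt))
```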

In [14]:
```python
# Alternative: drop the index instead of converting to column
combined_dropped = combined_metrics.reset_index(drop=True)

print("reset_index(drop=True) - Index Discarded:")
combined_dropped.head()
```

reset_index(drop=True) - Index Discarded:

| | revenue | units_sold | returns | ad_spend | impressions | nps_score | survey_responses |
|---|---|---|---|---|---|---|---|
| 0 | 125000 | 1250 | 50 | 12000 | 500000 | 45 | 120 |
| 1 | 132000 | 1320 | 45 | 15000 | 600000 | 48 | 135 |
| 2 | 145000 | 1450 | 60 | 18000 | 700000 | 52 | 150 |
| 3 | 158000 | 1580 | 55 | 20000 | 800000 | 55 | 165 |
| 4 | 165000 | 1650 | 70 | 22000 | 850000 | 58 | 180 |


**Use `drop=True` when:** The index contains no useful information.
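A common case is renumbering rows after sorting or filtering, where the leftover labels only record the old positions. A sketch on a toy frame:

```python
import pandas as pd

sales = pd.DataFrame({
    "month": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
    "revenue": [125000, 132000, 145000],
})

# Sorting keeps the old row labels, so the index comes out as 2, 1, 0
top = sales.sort_values("revenue", ascending=False)
print(top.index.tolist())

# The old positions carry no information, so discard them and renumber
top = top.reset_index(drop=True)
print(top.index.tolist())  # [0, 1, 2]
```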

## Combining concat() and merge() in Workflows

Real-world scenarios often require both operations.

In [15]:
```python
# Step 1: Concatenate quarterly sales files
all_sales = pd.concat([q1_sales, q2_sales, q3_sales], ignore_index=True)
print("Step 1: Concatenated Sales Data")
display(all_sales.head())
```

Step 1: Concatenated Sales Data

| | month | revenue | units_sold | returns |
|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 |
| 1 | 2023-02-01 | 132000 | 1320 | 45 |
| 2 | 2023-03-01 | 145000 | 1450 | 60 |
| 3 | 2023-04-01 | 158000 | 1580 | 55 |
| 4 | 2023-05-01 | 165000 | 1650 | 70 |


In [16]:
```python
# Step 2: Create product category data
# (This would come from a separate database table in reality)
products = pd.DataFrame({
    'month': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-03-01',
                             '2023-04-01', '2023-05-01', '2023-06-01',
                             '2023-07-01', '2023-08-01', '2023-09-01']),
    'top_category': ['Electronics', 'Electronics', 'Clothing',
                     'Clothing', 'Electronics', 'Home Goods',
                     'Home Goods', 'Electronics', 'Clothing'],
    'new_customers': [120, 135, 150, 165, 180, 195, 210, 225, 240]
})

print("\nStep 2: Product Category Data")
display(products.head())
```

Step 2: Product Category Data

| | month | top_category | new_customers |
|---|---|---|---|
| 0 | 2023-01-01 | Electronics | 120 |
| 1 | 2023-02-01 | Electronics | 135 |
| 2 | 2023-03-01 | Clothing | 150 |
| 3 | 2023-04-01 | Clothing | 165 |
| 4 | 2023-05-01 | Electronics | 180 |


In [17]:
```python
# Step 3: Merge sales with product data
sales_enriched = pd.merge(all_sales, products, on='month', how='left')

print("\nStep 3: Merged Sales + Product Data")
display(sales_enriched.head())
```

Step 3: Merged Sales + Product Data

| | month | revenue | units_sold | returns | top_category | new_customers |
|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 | Electronics | 120 |
| 1 | 2023-02-01 | 132000 | 1320 | 45 | Electronics | 135 |
| 2 | 2023-03-01 | 145000 | 1450 | 60 | Clothing | 150 |
| 3 | 2023-04-01 | 158000 | 1580 | 55 | Clothing | 165 |
| 4 | 2023-05-01 | 165000 | 1650 | 70 | Electronics | 180 |


In [18]:
```python
# Step 4: Calculate metrics and analyze
sales_enriched['return_rate'] = (sales_enriched['returns'] / 
                                 sales_enriched['units_sold'] * 100).round(2)
sales_enriched['revenue_per_unit'] = (sales_enriched['revenue'] / 
                                      sales_enriched['units_sold']).round(2)

print("\nStep 4: Final Analysis Dataset")
display(sales_enriched)
```

Step 4: Final Analysis Dataset

| | month | revenue | units_sold | returns | top_category | new_customers | return_rate | revenue_per_unit |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 | Electronics | 120 | 4.0 | 100.0 |
| 1 | 2023-02-01 | 132000 | 1320 | 45 | Electronics | 135 | 3.41 | 100.0 |
| 2 | 2023-03-01 | 145000 | 1450 | 60 | Clothing | 150 | 4.14 | 100.0 |
| 3 | 2023-04-01 | 158000 | 1580 | 55 | Clothing | 165 | 3.48 | 100.0 |
| 4 | 2023-05-01 | 165000 | 1650 | 70 | Electronics | 180 | 4.24 | 100.0 |
| 5 | 2023-06-01 | 178000 | 1780 | 65 | Home Goods | 195 | 3.65 | 100.0 |
| 6 | 2023-07-01 | 185000 | 1850 | 80 | Home Goods | 210 | 4.32 | 100.0 |
| 7 | 2023-08-01 | 192000 | 1920 | 75 | Electronics | 225 | 3.91 | 100.0 |
| 8 | 2023-09-01 | 175000 | 1750 | 68 | Clothing | 240 | 3.89 | 100.0 |


In [19]:
```python
# Step 5: Analyze by product category
category_summary = sales_enriched.groupby('top_category').agg({
    'revenue': 'sum',
    'units_sold': 'sum',
    'new_customers': 'sum',
    'return_rate': 'mean'
}).round(2)

category_summary['avg_revenue_per_unit'] = (
    category_summary['revenue'] / category_summary['units_sold']
).round(2)

print("\nStep 5: Category Summary")
category_summary.sort_values('revenue', ascending=False)
```

Step 5: Category Summary

| top_category | revenue | units_sold | new_customers | return_rate | avg_revenue_per_unit |
|---|---|---|---|---|---|
| Electronics | 614000 | 6140 | 660 | 3.89 | 100.0 |
| Clothing | 478000 | 4780 | 555 | 3.84 | 100.0 |
| Home Goods | 363000 | 3630 | 405 | 3.99 | 100.0 |
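Note that `'return_rate': 'mean'` averages the monthly percentages, so a low-volume month counts as much as a high-volume one. If you want the category's overall rate, divide total returns by total units instead. A sketch using the four Electronics months (January, February, May, August) from the table above:

```python
import pandas as pd

# Electronics months copied from the Step 4 table
df = pd.DataFrame({
    "top_category": ["Electronics"] * 4,
    "units_sold": [1250, 1320, 1650, 1920],
    "returns": [50, 45, 70, 75],
})
df["return_rate"] = df["returns"] / df["units_sold"] * 100

summary = df.groupby("top_category").agg(
    mean_monthly_rate=("return_rate", "mean"),  # every month weighs the same
    total_returns=("returns", "sum"),
    total_units=("units_sold", "sum"),
)
# Volume-weighted rate: all returns over all units sold in the category
summary["weighted_return_rate"] = summary["total_returns"] / summary["total_units"] * 100
print(summary.round(2))
```

For Electronics this gives 3.91% weighted versus 3.89% for the simple mean; the gap is small here because monthly volumes are similar, but it grows when volumes vary a lot.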


**Complete workflow:**
1. **concat()** - Combine quarterly files (same structure)
2. **merge()** - Add related data from other sources (different structure)
3. **groupby()** - Analyze the enriched dataset

**Key insight:** concat for stacking, merge for joining!
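When merge enriches concatenated data, two `pd.merge` parameters help catch silent problems: `validate` checks the expected key relationship (raising `MergeError` if, say, a month is duplicated) and `indicator=True` adds a `_merge` column recording where each row was found. A sketch with a deliberately incomplete toy `products` table:

```python
import pandas as pd

all_sales = pd.DataFrame({
    "month": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
    "revenue": [125000, 132000, 145000],
})
products = pd.DataFrame({
    "month": pd.to_datetime(["2023-01-01", "2023-02-01"]),  # March missing on purpose
    "top_category": ["Electronics", "Electronics"],
})

# validate= fails loudly if 'month' is not unique on both sides;
# indicator=True labels each row 'both', 'left_only' or 'right_only'
enriched = pd.merge(
    all_sales, products, on="month", how="left",
    validate="one_to_one", indicator=True,
)
unmatched = enriched[enriched["_merge"] == "left_only"]
print(unmatched)
```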

## Tracking Data Sources with keys Parameter

Use `keys` to label where data came from during concatenation.

In [20]:
```python
# Concatenate with source labels
labeled_sales = pd.concat(
    [q1_sales, q2_sales, q3_sales],
    keys=['Q1', 'Q2', 'Q3'],
    names=['quarter', 'month_index']
)

print("Sales with Quarter Labels (MultiIndex):")
labeled_sales
```

Sales with Quarter Labels (MultiIndex):

| quarter | month_index | month | revenue | units_sold | returns |
|---|---|---|---|---|---|
| Q1 | 0 | 2023-01-01 | 125000 | 1250 | 50 |
| Q1 | 1 | 2023-02-01 | 132000 | 1320 | 45 |
| Q1 | 2 | 2023-03-01 | 145000 | 1450 | 60 |
| Q2 | 0 | 2023-04-01 | 158000 | 1580 | 55 |
| Q2 | 1 | 2023-05-01 | 165000 | 1650 | 70 |
| Q2 | 2 | 2023-06-01 | 178000 | 1780 | 65 |
| Q3 | 0 | 2023-07-01 | 185000 | 1850 | 80 |
| Q3 | 1 | 2023-08-01 | 192000 | 1920 | 75 |
| Q3 | 2 | 2023-09-01 | 175000 | 1750 | 68 |


**Created a MultiIndex!**
- Outer level: quarter (Q1, Q2, Q3)
- Inner level: month_index (0, 1, 2)

**Use case:** Track data provenance when combining multiple sources.
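Two related options, sketched on toy frames: passing a dict to `pd.concat` uses its keys as the outer level (equivalent to `keys=['Q1', 'Q2']` here), and `xs()` selects on an inner level, for example the first month of every quarter:

```python
import pandas as pd

q1 = pd.DataFrame({"revenue": [125000, 132000, 145000]})
q2 = pd.DataFrame({"revenue": [158000, 165000, 178000]})

# Dict keys become the outer index level
labeled = pd.concat({"Q1": q1, "Q2": q2}, names=["quarter", "month_index"])

# Cross-section on the INNER level: row 0 of each quarter
first_months = labeled.xs(0, level="month_index")
print(first_months)
```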

In [21]:
```python
# Select all Q2 data using the outer index level
q2_only = labeled_sales.loc['Q2']
print("Q2 Data Only:")
q2_only
```

Q2 Data Only:

| month_index | month | revenue | units_sold | returns |
|---|---|---|---|---|
| 0 | 2023-04-01 | 158000 | 1580 | 55 |
| 1 | 2023-05-01 | 165000 | 1650 | 70 |
| 2 | 2023-06-01 | 178000 | 1780 | 65 |


In [22]:
```python
# Flatten the MultiIndex with reset_index
labeled_flat = labeled_sales.reset_index()
print("\nFlattened with Quarter Column:")
labeled_flat
```

Flattened with Quarter Column:

| | quarter | month_index | month | revenue | units_sold | returns |
|---|---|---|---|---|---|---|
| 0 | Q1 | 0 | 2023-01-01 | 125000 | 1250 | 50 |
| 1 | Q1 | 1 | 2023-02-01 | 132000 | 1320 | 45 |
| 2 | Q1 | 2 | 2023-03-01 | 145000 | 1450 | 60 |
| 3 | Q2 | 0 | 2023-04-01 | 158000 | 1580 | 55 |
| 4 | Q2 | 1 | 2023-05-01 | 165000 | 1650 | 70 |
| 5 | Q2 | 2 | 2023-06-01 | 178000 | 1780 | 65 |
| 6 | Q3 | 0 | 2023-07-01 | 185000 | 1850 | 80 |
| 7 | Q3 | 1 | 2023-08-01 | 192000 | 1920 | 75 |
| 8 | Q3 | 2 | 2023-09-01 | 175000 | 1750 | 68 |


**Perfect!** Now we have a `quarter` column showing data source.
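The `month_index` column only repeats each file's original 0, 1, 2 positions. If you want just the `quarter` label, one option is to drop that level before flattening, sketched on toy frames:

```python
import pandas as pd

labeled = pd.concat(
    [pd.DataFrame({"revenue": [125000, 132000]}), pd.DataFrame({"revenue": [158000]})],
    keys=["Q1", "Q2"],
    names=["quarter", "month_index"],
)

# Remove the uninformative inner level, then turn 'quarter' into a column
tidy = labeled.droplevel("month_index").reset_index()
print(tidy)
```

`labeled.reset_index(level="month_index", drop=True).reset_index()` reaches the same result in two `reset_index` calls.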

## Real-World Application: Year-Over-Year Analysis

Combining techniques to compare 2023 vs 2024 performance.

In [23]:
```python
# Create 2024 Q1 data for comparison
q1_2024 = pd.DataFrame({
    'month': pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01']),
    'revenue': [145000, 152000, 168000],
    'units_sold': [1450, 1520, 1680],
    'returns': [48, 52, 58]
})

# Prepare both years with year label
q1_2023_labeled = q1_sales.copy()
q1_2023_labeled['year'] = 2023

q1_2024_labeled = q1_2024.copy()
q1_2024_labeled['year'] = 2024

# Concatenate both years
yoy_data = pd.concat([q1_2023_labeled, q1_2024_labeled], ignore_index=True)

# Add month name for grouping
yoy_data['month_name'] = yoy_data['month'].dt.strftime('%B')

print("Year-Over-Year Q1 Data:")
yoy_data
```

Year-Over-Year Q1 Data:

| | month | revenue | units_sold | returns | year | month_name |
|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | 125000 | 1250 | 50 | 2023 | January |
| 1 | 2023-02-01 | 132000 | 1320 | 45 | 2023 | February |
| 2 | 2023-03-01 | 145000 | 1450 | 60 | 2023 | March |
| 3 | 2024-01-01 | 145000 | 1450 | 48 | 2024 | January |
| 4 | 2024-02-01 | 152000 | 1520 | 52 | 2024 | February |
| 5 | 2024-03-01 | 168000 | 1680 | 58 | 2024 | March |


In [24]:
```python
# Pivot to compare 2023 vs 2024 side-by-side
yoy_comparison = yoy_data.pivot_table(
    index='month_name',
    columns='year',
    values=['revenue', 'units_sold']
)

print("\nYear-Over-Year Comparison (Pivoted):")
yoy_comparison
```

Year-Over-Year Comparison (Pivoted):

| month_name | revenue, 2023 | revenue, 2024 | units_sold, 2023 | units_sold, 2024 |
|---|---|---|---|---|
| February | 132000.0 | 152000.0 | 1320.0 | 1520.0 |
| January | 125000.0 | 145000.0 | 1250.0 | 1450.0 |
| March | 145000.0 | 168000.0 | 1450.0 | 1680.0 |
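The rows come out as February, January, March because `pivot_table` sorts the `month_name` strings alphabetically. One way to restore calendar order, assuming English month names (what `strftime('%B')` produces in the default C locale), is to reindex using the standard-library `calendar` module. A sketch on a toy pivot:

```python
import calendar

import pandas as pd

yoy = pd.DataFrame({
    "month_name": ["January", "February", "March"] * 2,
    "year": [2023] * 3 + [2024] * 3,
    "revenue": [125000, 132000, 145000, 145000, 152000, 168000],
})
pivot = yoy.pivot_table(index="month_name", columns="year", values="revenue")
print(pivot.index.tolist())  # ['February', 'January', 'March']

# calendar.month_name[1:] is January..December in the current locale;
# keep only the months present, in calendar order
month_order = [m for m in calendar.month_name[1:] if m in pivot.index]
pivot = pivot.reindex(month_order)
print(pivot)
```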


In [25]:
```python
# Calculate growth rates
# Flatten column names for easier access
yoy_flat = yoy_comparison.copy()
yoy_flat.columns = ['_'.join(map(str, col)) for col in yoy_flat.columns]

yoy_flat['revenue_growth_%'] = (
    (yoy_flat['revenue_2024'] - yoy_flat['revenue_2023']) / 
    yoy_flat['revenue_2023'] * 100
).round(1)

yoy_flat['units_growth_%'] = (
    (yoy_flat['units_sold_2024'] - yoy_flat['units_sold_2023']) / 
    yoy_flat['units_sold_2023'] * 100
).round(1)

print("\nYear-Over-Year Growth Analysis:")
yoy_flat[['revenue_2023', 'revenue_2024', 'revenue_growth_%',
          'units_sold_2023', 'units_sold_2024', 'units_growth_%']]
```

Year-Over-Year Growth Analysis:

| month_name | revenue_2023 | revenue_2024 | revenue_growth_% | units_sold_2023 | units_sold_2024 | units_growth_% |
|---|---|---|---|---|---|---|
| February | 132000.0 | 152000.0 | 15.2 | 1320.0 | 1520.0 | 15.2 |
| January | 125000.0 | 145000.0 | 16.0 | 1250.0 | 1450.0 | 16.0 |
| March | 145000.0 | 168000.0 | 15.9 | 1450.0 | 1680.0 | 15.9 |


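**Business insights:**
- January 2024 shows the strongest growth: **16.0%** in both revenue and units
- February 2024 revenue is up **15.2%** vs 2023, the smallest gain of the three months
- Growth is consistent (15-16%) across all months; units grew at the same rate as revenue because revenue per unit is 100 in both years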

**Workflow used:**
1. **concat()** - Stack 2023 and 2024 data
2. **pivot_table()** - Create side-by-side comparison
3. Calculate derived metrics (growth rates)
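Flattening the column names is optional: `(metric, year)` tuples select columns directly from the pivot's MultiIndex. A sketch that rebuilds the pivot from toy data matching the table above:

```python
import pandas as pd

yoy = pd.DataFrame({
    "month_name": ["January", "February", "March"] * 2,
    "year": [2023] * 3 + [2024] * 3,
    "revenue": [125000, 132000, 145000, 145000, 152000, 168000],
    "units_sold": [1250, 1320, 1450, 1450, 1520, 1680],
})
pivot = yoy.pivot_table(index="month_name", columns="year",
                        values=["revenue", "units_sold"])

# Select (metric, year) pairs straight from the MultiIndex columns
revenue_growth = ((pivot[("revenue", 2024)] / pivot[("revenue", 2023)] - 1) * 100).round(1)
print(revenue_growth)
```

Flattening still pays off when the result feeds tools that expect plain string column names, such as CSV exports.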

## Key Takeaways

1. **concat() for stacking similar DataFrames:**
   - Vertical (`axis=0`): Add more rows (default)
   - Horizontal (`axis=1`): Add more columns
   - Use `ignore_index=True` for clean sequential indexing

2. **set_index() makes columns into indexes:**
   - Essential for time series (use dates as index)
   - Enables powerful time-based operations
   - Makes selection more intuitive

3. **reset_index() moves indexes back to columns:**
   - After groupby operations
   - When saving to files
   - Use `drop=True` to discard index

4. **Index alignment in horizontal concat:**
   - Default: `join='outer'` (keep all indexes)
   - Alternative: `join='inner'` (only matching)
   - `join='outer'` leaves NaN wherever a label is missing from one DataFrame

5. **Common workflow patterns:**
   - **concat → set_index:** Stack files then create meaningful index
   - **concat → merge:** Stack similar data, then join with related data
   - **concat with keys:** Track data sources with MultiIndex

6. **When to use concat vs merge:**
   - **concat:** Same structure, different time periods/sources
   - **merge:** Different structures, need to join by keys

7. **Index management best practices:**
   - Use datetime indexes for time series
   - Use meaningful indexes (not just 0, 1, 2)
   - Reset the index after groupby to turn group keys back into columns
   - Set index for better selection

**Practice tip:** Think of concat as "stacking LEGO bricks" - vertically or horizontally. Merge is like "connecting LEGO pieces by their studs" (keys)!