# 📓 Lesson 8: Merging and Combining DataFrames
📘 What you will learn:
1. How to combine multiple datasets using:
   - concat() for stacking DataFrames
   - merge() for database-style joins
   - join() for index-based merging
2. How to handle overlapping columns and indexes

## Step 1: Create Two Simple DataFrames
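Let’s create two small datasets to understand merging better. Customer 3 has no order, and the order for CustomerID 4 has no matching customer, so the join types in the later steps give visibly different results: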

In [None]:
import pandas as pd

# First DataFrame – basic customer info
df_customers = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Name': ['Ali', 'Sara', 'Reza']
})

# Second DataFrame – customer orders
df_orders = pd.DataFrame({
    'CustomerID': [1, 2, 4],
    'OrderTotal': [100, 200, 300]
})

print("Customers:\n", df_customers)
print("\nOrders:\n", df_orders)


## Step 2: Merge with merge()
Merge both datasets on CustomerID:

In [None]:
# Inner join: keep only CustomerIDs present in both tables (1 and 2)
merged_df = pd.merge(df_customers, df_orders, on='CustomerID', how='inner')
print(merged_df)

💡 Use merge() when:

- You want to combine two tables based on a common key (like SQL joins)
- You have relational-style data (e.g., Customer → Orders, Employee → Department)

📌 Best for:

- Combining related datasets using shared columns like UserID, ProductID, etc.
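
When both tables share a non-key column name, merge() keeps both copies and renames them. The sketch below illustrates this with two hypothetical tables (`df_billing` and `df_shipping` are made up for this example and are not part of the lesson data), using the `suffixes` parameter to choose readable names:

```python
import pandas as pd

# Hypothetical tables: both have the key 'CustomerID' and a non-key column 'City'
df_billing = pd.DataFrame({
    'CustomerID': [1, 2],
    'City': ['Tehran', 'Shiraz']
})
df_shipping = pd.DataFrame({
    'CustomerID': [1, 2],
    'City': ['Tehran', 'Tabriz']
})

# By default the overlapping columns would be named City_x and City_y;
# suffixes= replaces those defaults with more descriptive names
merged_cities = pd.merge(
    df_billing, df_shipping,
    on='CustomerID',
    suffixes=('_billing', '_shipping')
)
print(merged_cities)
```

The key column appears once, while each overlapping column gets its own suffixed name, so neither value is silently overwritten.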

## Step 3: Types of Joins
Try different join types:

In [None]:
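# Left join: keep all customers (Reza has no order, so OrderTotal is NaN)
left_join = pd.merge(df_customers, df_orders, on='CustomerID', how='left')

# Right join: keep all orders (CustomerID 4 has no customer, so Name is NaN)
right_join = pd.merge(df_customers, df_orders, on='CustomerID', how='right')

# Outer join: keep every row from both tables
outer_join = pd.merge(df_customers, df_orders, on='CustomerID', how='outer')

print("Left:\n", left_join)
print("\nRight:\n", right_join)
print("\nOuter:\n", outer_join)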
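
To see which table each row of a join came from, one option is merge()'s `indicator` parameter. This is a small sketch that rebuilds the Step 1 data so it runs on its own; the variable name `audit` is just an illustrative choice:

```python
import pandas as pd

# Same data as in Step 1, repeated so this cell is self-contained
df_customers = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Name': ['Ali', 'Sara', 'Reza']
})
df_orders = pd.DataFrame({
    'CustomerID': [1, 2, 4],
    'OrderTotal': [100, 200, 300]
})

# indicator=True adds a '_merge' column with the values
# 'both', 'left_only', or 'right_only' for each row
audit = pd.merge(df_customers, df_orders, on='CustomerID',
                 how='outer', indicator=True)
print(audit)
```

Rows marked `left_only` are customers without orders, and rows marked `right_only` are orders without a matching customer.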

## Step 4: Concatenating DataFrames
Use concat() to stack DataFrames row-wise or column-wise:

In [None]:
# Vertical stacking (row-wise)
df1 = pd.DataFrame({'A': ['a', 'b'], 'B': [1, 2]})
df2 = pd.DataFrame({'A': ['c', 'd'], 'B': [3, 4]})

vertical = pd.concat([df1, df2], ignore_index=True)

# Horizontal stacking (column-wise): rows are aligned on the index,
# and the result has duplicate column names (A, B, A, B)
horizontal = pd.concat([df1, df2], axis=1)

print("Vertical:\n", vertical)
print("Horizontal:\n", horizontal)


💡 Use concat() when:

- You have multiple datasets with the same structure (same columns)
- You want to stack them vertically (row-wise) or combine them side-by-side (column-wise)

📌 Best for:

- Appending monthly reports
- Adding the same type of data from multiple sources (e.g., df_jan, df_feb)
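
When stacking reports from several sources, it often helps to record where each row came from. The sketch below assumes two small monthly tables (the contents of `df_jan` and `df_feb` are invented for illustration) and uses the `keys` parameter of concat() to label them:

```python
import pandas as pd

# Hypothetical monthly reports with the same columns
df_jan = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [10, 20]})
df_feb = pd.DataFrame({'Product': ['A', 'C'], 'Sales': [15, 5]})

# keys= adds an outer index level that labels each source DataFrame
labeled = pd.concat([df_jan, df_feb], keys=['jan', 'feb'])
print(labeled)

# Select the rows that came from one source using the outer label
print(labeled.loc['feb'])
```

This keeps the original row labels under each key. If source labels are not needed, `ignore_index=True` (as in the cell above) gives a plain 0..n-1 index instead.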

## Step 5: Joining by Index
Create two DataFrames with explicit index labels, then combine them on the index:

In [None]:
df_left = pd.DataFrame({'A': ['Ali', 'Sara']}, index=[1, 2])
df_right = pd.DataFrame({'B': [100, 200]}, index=[2, 3])

# Join by index; how='outer' keeps every index label (1, 2, and 3)
joined = df_left.join(df_right, how='outer')
print(joined)


💡 Use join() when:

- Your DataFrames use the ID as their index rather than as a regular column
- You want to combine data based on their index values

📌 Best for:

- Data that is already aligned by index, such as time series or labeled data
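
If the ID is stored in a regular column, one way to use join() is to move it into the index first. The sketch below rebuilds the Step 1 data and uses `set_index()`; the names `customers_by_id` and `orders_by_id` are illustrative only:

```python
import pandas as pd

# Same data as in Step 1, repeated so this cell is self-contained
df_customers = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Name': ['Ali', 'Sara', 'Reza']
})
df_orders = pd.DataFrame({
    'CustomerID': [1, 2, 4],
    'OrderTotal': [100, 200, 300]
})

# Move the key column into the index of each DataFrame
customers_by_id = df_customers.set_index('CustomerID')
orders_by_id = df_orders.set_index('CustomerID')

# join() matches rows on the index and performs a left join by default
joined_by_id = customers_by_id.join(orders_by_id)
print(joined_by_id)
```

Because the default is a left join, every customer is kept and customer 3 gets NaN for OrderTotal. Pass `how='inner'` or `how='outer'` to change this.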



## Practice Exercises
1. Create two DataFrames with a common column (e.g. EmployeeID)
2. Merge them using an inner join and an outer join
3. Use concat() to stack sample data vertically and horizontally
4. Use .join() to combine two DataFrames on their index

In [None]:
# Merge example
df_a = pd.DataFrame({'ID': [1, 2], 'Name': ['Ali', 'Sara']})
df_b = pd.DataFrame({'ID': [2, 3], 'City': ['Tehran', 'Shiraz']})

print(pd.merge(df_a, df_b, on='ID', how='outer'))

# Concat example
d1 = pd.DataFrame({'X': [1, 2]})
d2 = pd.DataFrame({'X': [3, 4]})
print(pd.concat([d1, d2], ignore_index=True))

# Join example
df_indexed_1 = pd.DataFrame({'A': ['Yes', 'No']}, index=[1, 2])
df_indexed_2 = pd.DataFrame({'B': ['Win', 'Lose']}, index=[2, 3])
print(df_indexed_1.join(df_indexed_2, how='outer'))

## Summary
In this lesson, you learned:
- How to combine DataFrames using merge(), concat(), and join()
- The difference between inner, left, right, and outer joins
- How to concatenate data vertically or horizontally
- How to join datasets based on index values

👉 In the next lesson, you will learn how to create pivot tables and work with MultiIndex DataFrames.