In this subsection, you will learn how to perform different types of merges in pandas, including merging on multiple keys, merging when column names differ, using suffixes, and handling many-to-one or many-to-many merges.

🟦 1. Import Library

In [1]:
import pandas as pd

🟦 2. Create Sample DataFrames

In [4]:
df_employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3]
})

df_departments = pd.DataFrame({
    "dept_id": [1, 2, 3],
    "department_name": ["HR", "Finance", "Engineering"]
})

df_employees, df_departments

(   emp_id     name  dept_id
 0     101    Alice        1
 1     102      Bob        2
 2     103  Charlie        1
 3     104    Diana        3,
    dept_id department_name
 0        1              HR
 1        2         Finance
 2        3     Engineering)

🟦 2. Merge on a Single Column

In [5]:
merged = pd.merge(df_employees, df_departments, on="dept_id", how="inner")
merged

   emp_id     name  dept_id department_name
0     101    Alice        1              HR
1     102      Bob        2         Finance
2     103  Charlie        1              HR
3     104    Diana        3     Engineering


🟦 3. Merge on Multiple Columns

In [6]:
df_locations = pd.DataFrame({
    "dept_id": [1, 1, 2, 3],
    "location": ["Toronto", "Montreal", "Toronto", "Vancouver"]
})

df_employees2 = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3],
    "location": ["Toronto", "Toronto", "Montreal", "Vancouver"]
})

multi_key_merge = pd.merge(df_employees2, df_locations, on=["dept_id", "location"])
multi_key_merge

      name  dept_id   location
0    Alice        1    Toronto
1      Bob        2    Toronto
2  Charlie        1   Montreal
3    Diana        3  Vancouver
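
To see why the second key matters, it can help to compare the result above with a merge on `dept_id` alone. The sketch below rebuilds the same two DataFrames; the names `single_key` and `both_keys` are introduced here only for illustration.

```python
import pandas as pd

df_locations = pd.DataFrame({
    "dept_id": [1, 1, 2, 3],
    "location": ["Toronto", "Montreal", "Toronto", "Vancouver"],
})

df_employees2 = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3],
    "location": ["Toronto", "Toronto", "Montreal", "Vancouver"],
})

# Merging on dept_id only: each dept 1 employee is paired with BOTH dept 1
# locations, and the two "location" columns are kept apart with default suffixes.
single_key = pd.merge(df_employees2, df_locations, on="dept_id")

# Merging on both keys: a row matches only when dept_id AND location agree.
both_keys = pd.merge(df_employees2, df_locations, on=["dept_id", "location"])

print(len(single_key), list(single_key.columns))
print(len(both_keys), list(both_keys.columns))
```

With a single key the merge returns more rows than there are employees, which is usually a sign that the join condition is too loose.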


🟦 4. Merging When Column Names Differ

In [7]:
df_salary = pd.DataFrame({
    "id": [101, 102, 104],
    "salary": [70000, 80000, 95000]
})

merge_diff_names = pd.merge(
    df_employees,
    df_salary,
    left_on="emp_id",
    right_on="id",
    how="left"
)

merge_diff_names

   emp_id     name  dept_id     id   salary
0     101    Alice        1  101.0  70000.0
1     102      Bob        2  102.0  80000.0
2     103  Charlie        1    NaN      NaN
3     104    Diana        3  104.0  95000.0
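
When you merge with `left_on` and `right_on`, both key columns are kept, so `id` duplicates `emp_id` for every matched row. The `id` and `salary` values also show as floats because the unmatched row holds `NaN`. A minimal follow-up sketch, assuming you only need one key column (the names `tidy` and `missing_salary` are illustrative):

```python
import pandas as pd

df_employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3],
})

df_salary = pd.DataFrame({
    "id": [101, 102, 104],
    "salary": [70000, 80000, 95000],
})

merged = pd.merge(
    df_employees,
    df_salary,
    left_on="emp_id",
    right_on="id",
    how="left",
)

# Drop the redundant right-hand key; emp_id already identifies each row.
tidy = merged.drop(columns="id")

# Rows with no match in df_salary have a missing salary.
missing_salary = tidy[tidy["salary"].isna()]

print(tidy)
print(missing_salary["name"].tolist())
```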


🟦 5. Handling Overlapping Column Names with suffixes

In [8]:
df_manager = pd.DataFrame({
    "emp_id": [101, 102],
    "name": ["Alice", "Bob"],
    "level": ["Manager", "Senior Manager"]
})

suffix_merge = pd.merge(df_employees, df_manager, on="emp_id", how="left", suffixes=("_employee", "_manager"))
suffix_merge

   emp_id name_employee  dept_id name_manager           level
0     101         Alice        1        Alice         Manager
1     102           Bob        2          Bob  Senior Manager
2     103       Charlie        1          NaN             NaN
3     104         Diana        3          NaN             NaN
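
For comparison, here is a short sketch of the same merge without the `suffixes` argument. pandas then falls back to its default suffixes, `_x` for the left DataFrame and `_y` for the right one, which say less about where each column came from. The variable name `default_suffix_merge` is illustrative.

```python
import pandas as pd

df_employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3],
})

df_manager = pd.DataFrame({
    "emp_id": [101, 102],
    "name": ["Alice", "Bob"],
    "level": ["Manager", "Senior Manager"],
})

# No suffixes given: the overlapping "name" column becomes name_x / name_y.
default_suffix_merge = pd.merge(df_employees, df_manager, on="emp_id", how="left")

# Only overlapping non-key columns are renamed; emp_id, dept_id and level keep their names.
print(list(default_suffix_merge.columns))
```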


🟦 6. Many-to-One Merge

In [9]:
many_to_one = pd.merge(df_employees, df_departments, on="dept_id", how="left")
many_to_one

   emp_id     name  dept_id department_name
0     101    Alice        1              HR
1     102      Bob        2         Finance
2     103  Charlie        1              HR
3     104    Diana        3     Engineering
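
A many-to-one merge relies on each key appearing only once in the right DataFrame. If you want pandas to check that assumption for you, `pd.merge` accepts a `validate` argument. The sketch below uses `validate="many_to_one"`; the DataFrame `df_departments_dup` is a hypothetical lookup table with a duplicated key, added here only to trigger the check.

```python
import pandas as pd

df_employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3],
})

df_departments = pd.DataFrame({
    "dept_id": [1, 2, 3],
    "department_name": ["HR", "Finance", "Engineering"],
})

# Passes: every dept_id is unique in df_departments.
checked = pd.merge(
    df_employees, df_departments, on="dept_id", how="left", validate="many_to_one"
)

# Hypothetical lookup table where dept_id 1 appears twice.
df_departments_dup = pd.DataFrame({
    "dept_id": [1, 1, 2, 3],
    "department_name": ["HR", "People Ops", "Finance", "Engineering"],
})

try:
    pd.merge(
        df_employees, df_departments_dup, on="dept_id", how="left", validate="many_to_one"
    )
    duplicate_detected = False
except pd.errors.MergeError as exc:
    # Raised because the right-hand keys are not unique.
    duplicate_detected = True
    print(f"Merge rejected: {exc}")
```

Without `validate`, the second merge would run without complaint and quietly duplicate the dept 1 employees.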


🟦 7. Many-to-Many Merge

In [11]:
# Departments with multiple employees AND multiple projects → every combination is created.
df_projects = pd.DataFrame({
    "dept_id": [1, 1, 2],
    "project": ["A", "B", "C"]
})

many_to_many = pd.merge(df_employees, df_projects, on="dept_id", how="inner")
many_to_many

   emp_id     name  dept_id project
0     101    Alice        1       A
1     101    Alice        1       B
2     102      Bob        2       C
3     103  Charlie        1       A
4     103  Charlie        1       B
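
You can predict the size of a many-to-many inner merge before running it: for each key, the number of output rows is the count on the left multiplied by the count on the right. A minimal sketch of that arithmetic (the names `left_counts`, `right_counts`, and `expected_rows` are illustrative):

```python
import pandas as pd

df_employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3],
})

df_projects = pd.DataFrame({
    "dept_id": [1, 1, 2],
    "project": ["A", "B", "C"],
})

# How often each key appears on each side.
left_counts = df_employees["dept_id"].value_counts()
right_counts = df_projects["dept_id"].value_counts()

# Per-key product; a key missing on one side contributes 0 rows to an inner merge.
rows_per_key = left_counts.mul(right_counts, fill_value=0)
expected_rows = int(rows_per_key.sum())

many_to_many = pd.merge(df_employees, df_projects, on="dept_id", how="inner")

print(expected_rows, len(many_to_many))
```

Here dept 1 contributes 2 × 2 rows, dept 2 contributes 1 × 1, and dept 3 has no projects, so it drops out of the inner merge.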


🟦 Summary

In this subsection you learned how to:

✔️ Merge on single or multiple columns

✔️ Merge DataFrames with different column names

✔️ Add suffixes to manage overlapping column names

✔️ Understand many-to-one and many-to-many merges