In [1]:
import pandas as pd

# Merging, Joining & Concatenating DataFrames

In [2]:
# Sample DataFrame
employees = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "name": ["Onkar", "Amit", "Sara"],
    "dept_id": [1, 2, 1]
})

departments = pd.DataFrame({
    "dept_id": [1, 2],
    "dept_name": ["IT", "HR"]
})

employees, departments

(   emp_id   name  dept_id
 0     101  Onkar        1
 1     102   Amit        2
 2     103   Sara        1,
    dept_id dept_name
 0        1        IT
 1        2        HR)

## PART 1 — pd.merge() (SQL-style joins)

These joins work the same way as SQL joins.  
`pd.merge()` combines two DataFrames/tables by matching the values in a common key column (here `dept_id`).  
Use `pd.merge()` when you want to combine tables on columns.
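If you leave out `on`, `pd.merge()` uses every column name the two DataFrames share as the key. This short sketch, rebuilding the sample data from above, checks that the implicit and explicit forms give the same result. Passing `on` explicitly is still the safer habit, because it avoids accidental keys when both tables later gain another column with the same name.

```python
import pandas as pd

employees = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "name": ["Onkar", "Amit", "Sara"],
    "dept_id": [1, 2, 1],
})
departments = pd.DataFrame({
    "dept_id": [1, 2],
    "dept_name": ["IT", "HR"],
})

# With no `on`, merge joins on all shared column names (only "dept_id" here).
implicit = pd.merge(employees, departments)
explicit = pd.merge(employees, departments, on="dept_id")

print(implicit.equals(explicit))
```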

### 1. Inner join (default)

Keeps only the rows whose `dept_id` appears in both DataFrames.  
Same as a SQL `INNER JOIN`.

In [4]:
pd.merge(employees, departments, on="dept_id")

|   | emp_id | name | dept_id | dept_name |
|---|---|---|---|---|
| 0 | 101 | Onkar | 1 | IT |
| 1 | 102 | Amit | 2 | HR |
| 2 | 103 | Sara | 1 | IT |
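In the sample data every employee has a matching department, so the inner join drops nothing. A minimal sketch with hypothetical extra data (employee 104, "Ravi", in a `dept_id` 3 that `departments` does not contain) shows the unmatched row disappearing:

```python
import pandas as pd

# Hypothetical data: dept_id 3 has no row in `departments`.
employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Onkar", "Amit", "Sara", "Ravi"],
    "dept_id": [1, 2, 1, 3],
})
departments = pd.DataFrame({"dept_id": [1, 2], "dept_name": ["IT", "HR"]})

# how="inner" is the default, so employee 104 is dropped.
inner = pd.merge(employees, departments, on="dept_id")
print(inner)
```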


### 2. Left Join

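Keeps every row of the left DataFrame (`employees`) and adds the matching columns from the right DataFrame (`departments`).  
Where a left row has no match, the right-hand columns are filled with `NaN`. Here every `dept_id` has a match, so the result is the same as the inner join.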

In [7]:
pd.merge(employees, departments, on="dept_id", how="left")

|   | emp_id | name | dept_id | dept_name |
|---|---|---|---|---|
| 0 | 101 | Onkar | 1 | IT |
| 1 | 102 | Amit | 2 | HR |
| 2 | 103 | Sara | 1 | IT |
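To see the `NaN` filling, here is a sketch using hypothetical data in which employee 104 belongs to a department (`dept_id` 3) that does not exist. The left join keeps that employee and leaves `dept_name` empty:

```python
import pandas as pd

# Hypothetical data: dept_id 3 is missing from `departments`.
employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Onkar", "Amit", "Sara", "Ravi"],
    "dept_id": [1, 2, 1, 3],
})
departments = pd.DataFrame({"dept_id": [1, 2], "dept_name": ["IT", "HR"]})

left = pd.merge(employees, departments, on="dept_id", how="left")
# Every employee is kept, in the original order; Ravi's dept_name is NaN.
print(left)
```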


### 3. Right Join

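The mirror image of a left join: keeps every row of the right DataFrame (`departments`) and fills the left-hand columns with `NaN` where no employee matches. Rows follow the key order of the right DataFrame, which is why Sara (dept 1) now comes before Amit (dept 2).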

In [8]:
pd.merge(employees, departments, on="dept_id", how="right")

|   | emp_id | name | dept_id | dept_name |
|---|---|---|---|---|
| 0 | 101 | Onkar | 1 | IT |
| 1 | 103 | Sara | 1 | IT |
| 2 | 102 | Amit | 2 | HR |
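A sketch with a hypothetical extra department ("Finance", `dept_id` 3) that has no employees shows the right join keeping it. Because `emp_id` then contains a missing value, pandas upcasts that column from `int64` to `float64`:

```python
import pandas as pd

employees = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "name": ["Onkar", "Amit", "Sara"],
    "dept_id": [1, 2, 1],
})
# Hypothetical department with no employees.
departments = pd.DataFrame({
    "dept_id": [1, 2, 3],
    "dept_name": ["IT", "HR", "Finance"],
})

right = pd.merge(employees, departments, on="dept_id", how="right")
# Finance is kept with NaN in emp_id and name.
print(right)
print(right["emp_id"].dtype)
```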


### 4. Outer Join

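Keeps all rows from both tables, filling `NaN` wherever a row has no match on the other side. The result is sorted by the join key, so both dept 1 rows come before the dept 2 row.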

In [9]:
pd.merge(employees, departments, on="dept_id", how="outer")

|   | emp_id | name | dept_id | dept_name |
|---|---|---|---|---|
| 0 | 101 | Onkar | 1 | IT |
| 1 | 103 | Sara | 1 | IT |
| 2 | 102 | Amit | 2 | HR |
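With the sample data the outer join looks just like the others. The sketch below uses hypothetical data with unmatched keys on both sides, plus `indicator=True`, which adds a `_merge` column recording whether each row came from `both` tables, `left_only`, or `right_only`:

```python
import pandas as pd

# Hypothetical data: employee 104 is in dept 3 (no such department),
# and dept 4 ("Sales") has no employees.
employees = pd.DataFrame({
    "emp_id": [101, 102, 103, 104],
    "name": ["Onkar", "Amit", "Sara", "Ravi"],
    "dept_id": [1, 2, 1, 3],
})
departments = pd.DataFrame({
    "dept_id": [1, 2, 4],
    "dept_name": ["IT", "HR", "Sales"],
})

outer = pd.merge(employees, departments, on="dept_id", how="outer", indicator=True)
print(outer[["emp_id", "dept_id", "dept_name", "_merge"]])
```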


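### 5. Merge with different column names

`left_on` names the key column in the left DataFrame and `right_on` names the key column in the right one. In this example both happen to be called `dept_id`, so the result is the same as using `on="dept_id"`.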

In [10]:
pd.merge(
    employees,
    departments,
    left_on="dept_id",
    right_on="dept_id"
)

|   | emp_id | name | dept_id | dept_name |
|---|---|---|---|---|
| 0 | 101 | Onkar | 1 | IT |
| 1 | 102 | Amit | 2 | HR |
| 2 | 103 | Sara | 1 | IT |
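When the key columns really do have different names, both columns survive the merge. A minimal sketch, assuming for illustration that the department key is called `id`, merges on `dept_id`/`id` and then drops the redundant copy:

```python
import pandas as pd

employees = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "name": ["Onkar", "Amit", "Sara"],
    "dept_id": [1, 2, 1],
})
# Assumption for illustration: the key column is named "id" in this table.
departments = pd.DataFrame({"id": [1, 2], "dept_name": ["IT", "HR"]})

merged = pd.merge(employees, departments, left_on="dept_id", right_on="id")
# Both dept_id and id are kept; they hold the same values, so drop one.
merged = merged.drop(columns="id")
print(merged)
```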


## PART 2 — .join() (Index-based joining)

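Use `.join()` when you want to join tables on an index. By default it matches the caller's index against the other DataFrame's index and performs a left join (`how="left"`).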

In [12]:
# Move dept_id from a column into the index of departments
departments = departments.set_index("dept_id")

In [13]:
employees.join(departments, on="dept_id")

|   | emp_id | name | dept_id | dept_name |
|---|---|---|---|---|
| 0 | 101 | Onkar | 1 | IT |
| 1 | 102 | Amit | 2 | HR |
| 2 | 103 | Sara | 1 | IT |


The `departments` table is now indexed by `dept_id`, while `employees` is unchanged.  
In `employees.join(departments, on="dept_id")`, `on="dept_id"` refers to a column of the calling DataFrame (`employees`).  
Its values are matched against the index of `departments`.
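Without `on`, `.join()` aligns index with index. The sketch below uses hypothetical salary figures, with both DataFrames indexed by employee id; since the default is a left join, Amit (102) is kept even though he has no salary row:

```python
import pandas as pd

# Hypothetical data: both DataFrames are indexed by emp_id.
names = pd.DataFrame({"name": ["Onkar", "Amit", "Sara"]}, index=[101, 102, 103])
salaries = pd.DataFrame({"salary": [50000, 60000]}, index=[101, 103])

# Index-to-index left join; 102 has no salary, so it gets NaN.
result = names.join(salaries)
print(result)
```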

## PART 3 — pd.concat() (Stacking DataFrames)

### 1. Row-wise concatenation (default)

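Stacks the DataFrames vertically (`axis=0`). Columns are aligned by name; where a column is missing from one DataFrame, that DataFrame's rows get `NaN` in it. The original index labels are kept, so they can repeat (0, 1, 0, 1).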

In [25]:
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

In [26]:
pd.concat([df1, df2])

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 3 | NaN |
| 1 | 2.0 | 4 | NaN |
| 0 | NaN | 5 | 7.0 |
| 1 | NaN | 6 | 8.0 |


In [27]:
# reset index after concat
pd.concat([df1, df2], ignore_index=True)

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 3 | NaN |
| 1 | 2.0 | 4 | NaN |
| 2 | NaN | 5 | 7.0 |
| 3 | NaN | 6 | 8.0 |
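If you would rather drop the non-shared columns than fill them with `NaN`, `pd.concat()` accepts `join="inner"`. A short sketch reusing `df1` and `df2`:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# join="inner" keeps only the columns present in every input (here just "B"),
# so no NaN values are introduced.
common = pd.concat([df1, df2], join="inner", ignore_index=True)
print(common)
```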


### 2. Column-wise concatenation

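Places the DataFrames side by side (`axis=1`). Rows are aligned by index label; where a label is missing from one DataFrame, that DataFrame's columns get `NaN` in that row. Column names are not deduplicated, so a shared name such as `B` appears twice.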

In [17]:
pd.concat([df1, df2], axis=1)

|   | A | B | B | C |
|---|---|---|---|---|
| 0 | 1 | 3 | 5 | 7 |
| 1 | 2 | 4 | 6 | 8 |


In [28]:
df3 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["A", "B"])
df4 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=["B", "C"])
pd.concat([df3, df4], axis=1)

|   | A | B | A | B |
|---|---|---|---|---|
| A | 1.0 | 3.0 | NaN | NaN |
| B | 2.0 | 4.0 | 5.0 | 7.0 |
| C | NaN | NaN | 6.0 | 8.0 |
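The duplicated `A`/`B` column names above make it awkward to select one input's columns. One option is `keys=`, which adds an outer column level naming the source of each column. A sketch reusing `df3` and `df4`:

```python
import pandas as pd

df3 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["A", "B"])
df4 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=["B", "C"])

# keys= creates two-level column labels such as ("df3", "A") and ("df4", "A"),
# so the columns no longer collide.
labelled = pd.concat([df3, df4], axis=1, keys=["df3", "df4"])
print(labelled)

# Select one source's columns by its key.
print(labelled["df4"])
```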


---
# Summary
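1. `pd.merge()` -> SQL-style join on key columns
2. `DataFrame.join()` -> join on the index of the other DataFrame
3. `pd.concat()` -> stack DataFrames along rows or columns (what `DataFrame.append()` used to do before it was removed in pandas 2.0)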