### Pandas DataFrame.join() Method

The join() method in Pandas combines two DataFrames on their index, or on a key column of the left DataFrame matched against the index of the right one. It is mainly used to add columns from one DataFrame to another, similar to a SQL JOIN.

### Why is join() used?

You use join() when:

• You want to merge DataFrames using index values

• You want simpler syntax than merge() for index-based joins

• You are adding related columns from another DataFrame

• You need SQL-style joins: left, right, inner, outer

### Basic Syntax:

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

### Parameters Explained (With Examples)

### 1. other (Required)

The DataFrame (or named Series) you want to join with.

e.g.

df1.join(df2)


In [1]:
# Example

import pandas as pd

# First DataFrame
df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [25, 30, 28]
}, index=[1, 2, 3])

# Second DataFrame
df2 = pd.DataFrame({
    'City': ['Lahore', 'Karachi', 'Islamabad']
}, index=[1, 2, 3])

# Join df2 to df1
result = df1.join(df2)

print(result)

   Name  Age       City
1   Ali   25     Lahore
2  Sara   30    Karachi
3  John   28  Islamabad
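
Since other can also be a Series, here is a minimal sketch of that case. It assumes the Series is given a name, which join() needs in order to label the new column. The variable name city is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Name': ['Ali', 'Sara', 'John'], 'Age': [25, 30, 28]},
    index=[1, 2, 3],
)

# The Series name ('City') becomes the label of the joined column.
city = pd.Series(['Lahore', 'Karachi', 'Islamabad'], index=[1, 2, 3], name='City')

# Rows are matched on index labels, exactly as with a DataFrame.
result = df1.join(city)

print(result)
```

An unnamed Series has no column label to contribute, so naming it first is the safer habit.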


### 2. on

Column in the left DataFrame to join on instead of its index.

e.g.

df1.join(df2, on='id')

When on is used, the values in that column are matched against the index of df2.

In [2]:
# Example

import pandas as pd

# First DataFrame
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'Name': ['Ali', 'Sara', 'John']
})

# Second DataFrame (notice the id values are stored in the index here)
df2 = pd.DataFrame({
    'City': ['Lahore', 'Karachi', 'Islamabad']
}, index=[1, 2, 3])

# Join df2 to df1 using 'id' column
result = df1.join(df2, on='id')

print(result)

   id  Name       City
0   1   Ali     Lahore
1   2  Sara    Karachi
2   3  John  Islamabad
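
When the key in the right DataFrame is an ordinary column rather than its index, one common approach is to move it into the index first with set_index(). The sketch below assumes a hypothetical cities DataFrame that carries its own 'id' column.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'Name': ['Ali', 'Sara', 'John']})

# Hypothetical right-hand table: the key is a regular column, not the index.
cities = pd.DataFrame({
    'id': [1, 2, 3],
    'City': ['Lahore', 'Karachi', 'Islamabad']
})

# Move 'id' into the index so join() can match it against df1['id'].
result = df1.join(cities.set_index('id'), on='id')

print(result)
```

Because 'id' is moved out of the columns of cities, it does not clash with df1's 'id' column and no suffix is needed.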


### 3. how

Type of join to perform.

| Value | Meaning                                             |
|-------|-----------------------------------------------------|
| left  | Keep all rows from the left DataFrame (default)     |
| right | Keep all rows from the right DataFrame              |
| inner | Keep only rows whose key appears in both DataFrames |
| outer | Keep all rows from both DataFrames                  |

Rows without a match on the other side are filled with NaN.

e.g.

df1.join(df2, how='inner')

In [3]:
# Example

import pandas as pd

# First DataFrame
df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John']
}, index=[1, 2, 3])

# Second DataFrame
df2 = pd.DataFrame({
    'City': ['Lahore', 'Karachi']
}, index=[2, 3])

# Inner join
result = df1.join(df2, how='inner')

print(result)

   Name     City
2  Sara   Lahore
3  John  Karachi
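
To compare the four how values side by side, the sketch below runs each join on the same pair of DataFrames and prints which index labels survive. The data is illustrative: 'Multan' and index 4 are added only so the right side has a row with no match on the left.

```python
import pandas as pd

# Index 1 exists only on the left; index 4 exists only on the right.
left = pd.DataFrame({'Name': ['Ali', 'Sara', 'John']}, index=[1, 2, 3])
right = pd.DataFrame({'City': ['Lahore', 'Karachi', 'Multan']}, index=[2, 3, 4])

results = {}
for how in ['left', 'right', 'inner', 'outer']:
    results[how] = left.join(right, how=how)
    kept = sorted(results[how].index.tolist())
    print(f"how='{how}' keeps index labels: {kept}")

# Unmatched rows are kept by left/right/outer joins with NaN in the missing columns.
print(results['outer'])
```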


### 4. lsuffix

Suffix appended to the left DataFrame's column names that also appear in the right DataFrame.

e.g.

df1.join(df2, lsuffix='_left')

In [7]:
# Example

import pandas as pd

# First DataFrame
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'Name': ['Ali', 'Sara', 'John']
})

# Second DataFrame with overlapping column 'Name'
df2 = pd.DataFrame({
    'Name': ['Lahore', 'Karachi', 'Islamabad'],
    'Population': [12, 15, 10]
}, index=[1, 2, 3])

# Join with lsuffix
result = df1.join(df2, on='id', lsuffix='_left')

print(result)

   id Name_left       Name  Population
0   1       Ali     Lahore          12
1   2      Sara    Karachi          15
2   3      John  Islamabad          10


### 5. rsuffix

Suffix appended to the right DataFrame's column names that also appear in the left DataFrame.

e.g.

df1.join(df2, rsuffix='_right')


If columns overlap, at least one of lsuffix or rsuffix must be given; otherwise join() raises a ValueError.

In [8]:
# Example

import pandas as pd

# First DataFrame
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'Name': ['Ali', 'Sara', 'John']
})

# Second DataFrame with overlapping column 'Name'
df2 = pd.DataFrame({
    'Name': ['Lahore', 'Karachi', 'Islamabad'],
    'Population': [12, 15, 10]
}, index=[1, 2, 3])

# Join with rsuffix
result = df1.join(df2, on='id', rsuffix='_right')

print(result)

   id  Name Name_right  Population
0   1   Ali     Lahore          12
1   2  Sara    Karachi          15
2   3  John  Islamabad          10
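
To see what happens when overlapping columns are joined without any suffix, the sketch below catches the resulting ValueError and then repeats the join with both suffixes. The suffix names '_person' and '_city' are illustrative choices, not anything pandas requires.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'Name': ['Ali', 'Sara', 'John']})
df2 = pd.DataFrame({'Name': ['Lahore', 'Karachi', 'Islamabad']}, index=[1, 2, 3])

# 'Name' exists on both sides and no suffix is given, so the join fails.
error_message = None
try:
    df1.join(df2, on='id')
except ValueError as exc:
    error_message = str(exc)
    print('join failed:', error_message)

# Supplying both suffixes labels each side explicitly.
result = df1.join(df2, on='id', lsuffix='_person', rsuffix='_city')

print(result.columns.tolist())
```

Giving both suffixes makes it obvious which side each column came from, which can be easier to read than renaming only one of them.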


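### 6. sort

Sort the resulting DataFrame by the join key. If False, the row order depends on the join type (for a left join, the left DataFrame's order is kept).

e.g.

df1.join(df2, sort=True)


Default: False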

In [9]:
# Example

import pandas as pd

# First DataFrame
df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John']
}, index=[3, 1, 2])   # Notice shuffled index

# Second DataFrame
df2 = pd.DataFrame({
    'City': ['Lahore', 'Karachi', 'Islamabad']
}, index=[1, 2, 3])

# Join without sort
result1 = df1.join(df2)

# Join with sort=True
result2 = df1.join(df2, sort=True)

print("Without sort:\n", result1)
print("\nWith sort=True:\n", result2)

Without sort:
    Name       City
3   Ali  Islamabad
1  Sara     Lahore
2  John    Karachi

With sort=True:
    Name       City
1  Sara     Lahore
2  John    Karachi
3   Ali  Islamabad


In [10]:
# Join using a column (on)

import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Ali', 'Sara', 'John']
})

df2 = pd.DataFrame({
    'marks': [85, 90, 78]
}, index=[1, 2, 3])

df1.join(df2, on='id')

   id  name  marks
0   1   Ali     85
1   2  Sara     90
2   3  John     78


### join() vs merge()

| Feature | `join()` | `merge()` |
|---------|----------|-----------|
| Uses index | Yes (by default) | Optional |
| Syntax | Simple | More verbose |
| Column-based join | Limited (left column to right index via `on`) | Full (any columns on either side) |
| SQL-like joins | Yes | Yes |


Use join() when index-based joining is enough.

Use merge() for complex column-to-column joins.
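
To make the comparison concrete, the sketch below writes the same left join both ways and checks that the results match. It is one way to express the equivalence, under the assumption that the right DataFrame keeps its key in the index.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ali', 'Sara', 'John']})
df2 = pd.DataFrame({'marks': [85, 90, 78]}, index=[1, 2, 3])

# join(): left column 'id' matched against the index of df2.
joined = df1.join(df2, on='id')

# merge(): the same join, with the key on each side spelled out.
merged = df1.merge(df2, left_on='id', right_index=True, how='left')

print(joined.equals(merged))
```

The merge() call needs three extra arguments to say what join() assumes by default, which is the main reason join() reads more cleanly for index-based joins.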

### Summary

• join() combines DataFrames by index (or by a left-side column via on)

• Best for adding columns

• Cleaner than merge() for index-based joins

• Supports left, right, inner, and outer joins