### Pandas concat()

### What is concat()?

concat() is used to combine multiple DataFrames or Series along rows or columns.

Think of it as stacking data: objects are lined up by their index or column labels, whereas merge() joins rows by matching key values.
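Since concat() accepts Series as well as DataFrames, a minimal sketch with two Series may help. The names s1, s2, stacked and side are made up for illustration.

```python
import pandas as pd

# Two small Series that share the name 'x' (illustrative data)
s1 = pd.Series([1, 2], name='x')
s2 = pd.Series([3, 4], name='x')

# Row-wise: the result is one longer Series
stacked = pd.concat([s1, s2], ignore_index=True)

# Column-wise: each Series becomes a column, so the result is a DataFrame
side = pd.concat([s1, s2.rename('y')], axis=1)

print(stacked)
print(side)
```

Stacking along rows keeps the Series type, while axis=1 turns the inputs into DataFrame columns named after each Series.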

##### Basic Syntax

pd.concat(
    objs,
    axis=0,
    join='outer',
    ignore_index=False,
    keys=None,
    levels=None,
    names=None,
    verify_integrity=False,
    sort=False,
    copy=True
)

### Parameters Explained One by One

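### 1. objs

A list (or other sequence) or dictionary of DataFrames/Series to combine. If a dictionary is passed, its keys are used as the keys argument.

e.g.

pd.concat([df1, df2])

This is the only required parameter.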

In [1]:
# Example

import pandas as pd

# First DataFrame
df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT']
})

# Second DataFrame
df2 = pd.DataFrame({
    'Name': ['John', 'Ayesha'],
    'Department': ['Finance', 'Marketing']
})

# Concatenate
result = pd.concat([df1, df2])

print(result)

     Name Department
0     Ali         HR
1    Sara         IT
0    John    Finance
1  Ayesha  Marketing
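The dictionary form of objs is not shown above, so here is a small sketch of it. The frames df_hr and df_it and the labels 'HR' and 'IT' are hypothetical.

```python
import pandas as pd

# Hypothetical per-department frames
df_hr = pd.DataFrame({'Name': ['Ali', 'Sara']})
df_it = pd.DataFrame({'Name': ['John']})

# Passing a dict: its keys become the outer level of the row index
result = pd.concat({'HR': df_hr, 'IT': df_it})

print(result)
print(result.index.tolist())
```

This behaves like passing a list together with keys=['HR', 'IT'].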


### 2. axis

Defines the direction of concatenation.

axis=0  # Row-wise (default)

axis=1  # Column-wise

e.g.

pd.concat([df1, df2], axis=0)  # stack rows

pd.concat([df1, df2], axis=1)  # add columns

##### pd.concat([df1, df2], axis=0) → Stack Rows (Vertical Concatenation)

• axis=0 means concatenate row-wise (default).

• It places df2 below df1, creating a longer DataFrame.

In [2]:
# Example
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT']
})

df2 = pd.DataFrame({
    'Name': ['John', 'Ayesha'],
    'Department': ['Finance', 'Marketing']
})

result = pd.concat([df1, df2], axis=0)
print(result)

     Name Department
0     Ali         HR
1    Sara         IT
0    John    Finance
1  Ayesha  Marketing


##### pd.concat([df1, df2], axis=1) → Add Columns (Horizontal Concatenation)

• axis=1 means concatenate column-wise.

• It places df2 next to df1, creating a wider DataFrame.

In [3]:
result = pd.concat([df1, df2], axis=1)
print(result)

   Name Department    Name Department
0   Ali         HR    John    Finance
1  Sara         IT  Ayesha  Marketing
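The example above works cleanly because both frames share the index 0 and 1. Column-wise concatenation aligns rows by index label, not by position, which the following sketch tries to show. The frames left and right are made up for illustration.

```python
import pandas as pd

# Illustrative frames whose indexes only partly overlap
left = pd.DataFrame({'Name': ['Ali', 'Sara']}, index=[0, 1])
right = pd.DataFrame({'Salary': [40000, 50000]}, index=[1, 2])

# axis=1 with the default join='outer': rows are matched on index labels,
# and labels missing from one side are filled with NaN
wide = pd.concat([left, right], axis=1)

print(wide)
```

Only index label 1 exists in both frames, so it is the only row with both a Name and a Salary.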


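### 3. join

How to handle labels on the other axis (the columns, when axis=0) that do not appear in every object.

join='outer'  # default (union: keep all columns)

join='inner'  # intersection: keep only shared columns

e.g.

pd.concat([df1, df2], join='inner')

Important for datasets with different columns.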

In [4]:
# Example
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT'],
    'Salary': [40000, 50000]
})

df2 = pd.DataFrame({
    'Name': ['John', 'Ayesha'],
    'Department': ['Finance', 'Marketing'],
    'Age': [30, 28]
})

# Concatenate with inner join
result = pd.concat([df1, df2], join='inner', axis=0)

print(result)

     Name Department
0     Ali         HR
1    Sara         IT
0    John    Finance
1  Ayesha  Marketing


In [5]:
# If You Use join='outer'
pd.concat([df1, df2], join='outer', axis=0)
#  Keeps all columns (Name, Department, Salary, Age) and fills missing values with NaN.

     Name Department   Salary   Age
0     Ali         HR  40000.0   NaN
1    Sara         IT  50000.0   NaN
0    John    Finance      NaN  30.0
1  Ayesha  Marketing      NaN  28.0
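The join parameter applies to whichever axis is not being concatenated. As a rough sketch, with axis=1 it controls which index labels (rows) survive. The frames left and right below are hypothetical.

```python
import pandas as pd

# Illustrative frames: 'right' has no row labelled 'a'
left = pd.DataFrame({'Name': ['Ali', 'Sara', 'John']}, index=['a', 'b', 'c'])
right = pd.DataFrame({'Age': [30, 28]}, index=['b', 'c'])

# With axis=1, join='inner' keeps only index labels present in both frames
inner = pd.concat([left, right], axis=1, join='inner')

print(inner)
```

Row 'a' is dropped because it exists only in left, so no NaN values are introduced.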


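### 4. ignore_index

Discards the original index labels and numbers the result 0, 1, ..., n-1.

ignore_index=True

e.g.

pd.concat([df1, df2], ignore_index=True)

Recommended when the original index carries no meaning, e.g. for clean datasets and exports.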

In [6]:
# Example

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT']
})

df2 = pd.DataFrame({
    'Name': ['John', 'Ayesha'],
    'Department': ['Finance', 'Marketing']
})

# Concatenate with ignore_index
result = pd.concat([df1, df2], ignore_index=True)

print(result)

     Name Department
0     Ali         HR
1    Sara         IT
2    John    Finance
3  Ayesha  Marketing


### 5. keys

Adds an outer index level (a MultiIndex) that identifies which source each row came from.

pd.concat([df1, df2], keys=['Jan', 'Feb'])

Useful for time-based or multi-source projects.

In [7]:
# Example
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT']
})

df2 = pd.DataFrame({
    'Name': ['John', 'Ayesha'],
    'Department': ['Finance', 'Marketing']
})

# Concatenate with keys
result = pd.concat([df1, df2], keys=['Jan', 'Feb'])

print(result)

         Name Department
Jan 0     Ali         HR
    1    Sara         IT
Feb 0    John    Finance
    1  Ayesha  Marketing
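One practical use of keys is pulling a single source back out of the combined result. A minimal sketch, reusing the same data as above (the variable name feb is an assumption):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ali', 'Sara'], 'Department': ['HR', 'IT']})
df2 = pd.DataFrame({'Name': ['John', 'Ayesha'], 'Department': ['Finance', 'Marketing']})

result = pd.concat([df1, df2], keys=['Jan', 'Feb'])

# Select every row that came from the second frame via its key
feb = result.loc['Feb']

print(feb)
```

Selecting by the outer key returns the rows with their original inner index (0 and 1 here).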


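### 6. levels

Specifies the unique values to use for the MultiIndex levels (advanced). If omitted, they are inferred from keys, so levels is only meaningful together with keys.

pd.concat([df1, df2], keys=['A', 'B'], levels=[['A', 'B']])

Rarely used in beginner projects.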

In [8]:
# Example
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT']
})

df2 = pd.DataFrame({
    'Name': ['John', 'Ayesha'],
    'Department': ['Finance', 'Marketing']
})

# Concatenate with keys and levels
result = pd.concat([df1, df2], keys=['A', 'B'], levels=[['A', 'B']])

print(result)

       Name Department
A 0     Ali         HR
  1    Sara         IT
B 0    John    Finance
  1  Ayesha  Marketing


### 7. names

Names for the levels of the resulting MultiIndex.

pd.concat([df1, df2], keys=['2024', '2025'], names=['Year'])

Improves readability.

In [9]:
# Example
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT']
})

df2 = pd.DataFrame({
    'Name': ['John', 'Ayesha'],
    'Department': ['Finance', 'Marketing']
})

# Concatenate with keys and names
result = pd.concat([df1, df2], keys=['2024', '2025'], names=['Year'])

print(result)

          Name Department
Year                     
2024 0     Ali         HR
     1    Sara         IT
2025 0    John    Finance
     1  Ayesha  Marketing
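A named level is also convenient to turn back into an ordinary column. The following sketch uses reset_index for that; the variable name flat is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ali', 'Sara'], 'Department': ['HR', 'IT']})
df2 = pd.DataFrame({'Name': ['John', 'Ayesha'], 'Department': ['Finance', 'Marketing']})

result = pd.concat([df1, df2], keys=['2024', '2025'], names=['Year'])

# Move the named 'Year' level out of the index and into a regular column
flat = result.reset_index(level='Year')

print(flat)
```

Because the level has a name, it can be referred to as 'Year' instead of by position.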


### 8. verify_integrity

Checks whether the concatenated axis contains duplicate index labels. The check can be expensive on large data.

verify_integrity=True

Raises a ValueError if duplicate index labels exist.

In [10]:
# Example 1: No Duplicate Index (Works Fine)
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ali', 'Sara']}, index=[0, 1])
df2 = pd.DataFrame({'Name': ['John', 'Ayesha']}, index=[2, 3])

result = pd.concat([df1, df2], verify_integrity=True)
print(result)

     Name
0     Ali
1    Sara
2    John
3  Ayesha


In [18]:
# Example 2: Duplicate Index (Error Raised)
df1 = pd.DataFrame({'Name': ['Ali', 'Sara']}, index=[0, 1])
df2 = pd.DataFrame({'Name': ['John', 'Ayesha']}, index=[0, 1])

# This will raise an error
result = pd.concat([df1, df2], verify_integrity=True)
result

ValueError: Indexes have overlapping values: Index([0, 1], dtype='int64')
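In a script, the error can be caught and handled instead of stopping execution. One possible pattern, sketched here under the assumption that renumbering the rows is an acceptable fallback:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ali', 'Sara']}, index=[0, 1])
df2 = pd.DataFrame({'Name': ['John', 'Ayesha']}, index=[0, 1])

message = None
try:
    combined = pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    # Duplicate index labels were found: record the error and
    # fall back to a fresh 0..n-1 index (an assumed recovery strategy)
    message = str(err)
    combined = pd.concat([df1, df2], ignore_index=True)

print(message)
print(combined)
```

Whether to renumber, rename, or stop on duplicates depends on whether the index labels carry meaning in your data.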

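### 9. sort

Sorts the non-concatenation axis (the columns, when axis=0) when the objects' labels are not already aligned.

sort=False  # default: keep columns in order of appearance
sort=True   # sort column names alphabetically

Keep False to preserve the column order and avoid the extra sorting work.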

In [19]:
# Example

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Ali', 'Sara'],
    'Department': ['HR', 'IT']
})

df2 = pd.DataFrame({
    'Age': [25, 30],
    'Name': ['John', 'Ayesha']
})

# Concatenate with sort=False (default)
result1 = pd.concat([df1, df2], sort=False)

# Concatenate with sort=True
result2 = pd.concat([df1, df2], sort=True)

print("sort=False:\n", result1, "\n")
print("sort=True:\n", result2)

sort=False:
      Name Department   Age
0     Ali         HR   NaN
1    Sara         IT   NaN
0    John        NaN  25.0
1  Ayesha        NaN  30.0 

sort=True:
     Age Department    Name
0   NaN         HR     Ali
1   NaN         IT    Sara
0  25.0        NaN    John
1  30.0        NaN  Ayesha


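### 10. copy

Controls whether the underlying data is copied into the result.

copy=True  # default in the pandas version used here

Usually left unchanged. Newer pandas releases may list a different default, and with Copy-on-Write enabled the setting has little practical effect, so check the documentation for your version.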

In [20]:
# Example 1: With copy=True (Safe Copy)
# Note: this illustrates copying with the DataFrame constructor, not pd.concat itself
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})

# Create a copy
df2 = pd.DataFrame(df1, copy=True)

# Modify df2
df2.loc[0, 'A'] = 100

print("df1:\n", df1)
print("df2:\n", df2)

df1:
    A
0  1
1  2
2  3
df2:
      A
0  100
1    2
2    3


In [21]:
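# Example 2: With copy=False (Shared Reference), again using the DataFrame constructor
# Note: the output below is from pandas without Copy-on-Write;
# with Copy-on-Write enabled, df1 stays unchanged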
df1 = pd.DataFrame({'A': [1, 2, 3]})

# Try to avoid copying
df2 = pd.DataFrame(df1, copy=False)

# Modify df2
df2.loc[0, 'A'] = 100

print("df1:\n", df1)
print("df2:\n", df2)

df1:
      A
0  100
1    2
2    3
df2:
      A
0  100
1    2
2    3
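The two examples above use the DataFrame constructor. For pd.concat itself with default settings, a minimal sketch suggests the result is independent of its inputs (the data here is made up):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'A': [4, 5, 6]})

# Default settings: the result holds its own data
result = pd.concat([df1, df2], ignore_index=True)

# Modify the concatenated result only
result.loc[0, 'A'] = 100

print("df1:\n", df1)
print("result:\n", result)
```

Here df1 keeps its original values after the result is modified.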


In [3]:
# Row-Wise Concatenation
# pd.concat([df1, df2], ignore_index=True)
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ali', 'Sara']})
df2 = pd.DataFrame({'Name': ['John', 'Ayesha']})

result = pd.concat([df1, df2], ignore_index=True)
print(result)

     Name
0     Ali
1    Sara
2    John
3  Ayesha


In [4]:
# Column-Wise Concatenation
pd.concat([df1, df2], axis=1)

   Name    Name
0   Ali    John
1  Sara  Ayesha


In [6]:
# Inner Join Concatenation
pd.concat([df1, df2], join='inner')

     Name
0     Ali
1    Sara
0    John
1  Ayesha


In [5]:
# Identify Source Using Keys
pd.concat(
    [df1, df2],
    keys=['Source1', 'Source2'],
    names=['DataSource']
)

                Name
DataSource          
Source1    0     Ali
           1    Sara
Source2    0    John
           1  Ayesha
