Threshold-based dropping removes rows or columns from a DataFrame based on how many non-missing (non-NaN) values they contain.

✅ Syntax

df.dropna(thresh=N, axis=0)

    thresh: Minimum number of non-NA values a row/column must contain to be kept. In recent pandas versions it cannot be combined with how.

    axis=0: Drop rows (default)

    axis=1: Drop columns

In [1]:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, np.nan, 3],
    'C': [1, 2, 3, np.nan],
    'D': [np.nan, np.nan, np.nan, 4]
})

print(df)


     A    B    C    D
0  1.0  NaN  1.0  NaN
1  2.0  2.0  2.0  NaN
2  NaN  NaN  3.0  NaN
3  4.0  3.0  NaN  4.0


In [2]:
# 1. Drop rows with fewer than 3 non-NA values
df.dropna(thresh=3)

     A    B    C    D
1  2.0  2.0  2.0  NaN
3  4.0  3.0  NaN  4.0
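
To make the row filter less of a black box, here is a minimal sketch that reproduces `thresh=3` by counting non-NA cells per row with `DataFrame.count`. The names `non_na_per_row`, `manual`, and `via_dropna` are only illustrative and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the cells above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, np.nan, 3],
    'C': [1, 2, 3, np.nan],
    'D': [np.nan, np.nan, np.nan, 4]
})

# count(axis=1) returns the number of non-NA cells in each row.
non_na_per_row = df.count(axis=1)

# Keeping rows whose count reaches the threshold mirrors thresh=3.
manual = df[non_na_per_row >= 3]
via_dropna = df.dropna(thresh=3)

print(non_na_per_row.tolist())
print(manual.equals(via_dropna))
```

Rows 0 and 2 hold only 2 and 1 non-NA values respectively, which is why they fall below the threshold and are dropped.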


In [3]:
# 2. Drop columns with fewer than 2 non-NA values
df.dropna(thresh=2, axis=1)

     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  NaN  NaN  3.0
3  4.0  3.0  NaN


In [4]:
# 3. Custom threshold (e.g., keep columns that are at least 50% complete)
import math
threshold = math.ceil(len(df) * 0.5)  # thresh is documented as an int, so round up
df.dropna(thresh=threshold, axis=1)


     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  NaN  NaN  3.0
3  4.0  3.0  NaN
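
The percentage idea can be generalized to either axis. The sketch below assumes a hypothetical helper, `drop_sparse`, which is not a pandas function. The point it illustrates is that the fraction has to be taken over the length of the *other* axis: the number of rows when dropping columns, and the number of columns when dropping rows.

```python
import math

import numpy as np
import pandas as pd


def drop_sparse(frame, min_fraction, axis=0):
    """Hypothetical helper: keep rows/columns that are at least
    `min_fraction` complete (0.0 to 1.0)."""
    # Dropping rows (axis=0) counts cells across the columns, and
    # dropping columns (axis=1) counts cells down the rows.
    other_axis_len = frame.shape[1] if axis == 0 else frame.shape[0]
    # Round up so the result is an int and never under-shoots the fraction.
    min_count = math.ceil(other_axis_len * min_fraction)
    return frame.dropna(thresh=min_count, axis=axis)


# Same sample frame as in the cells above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, np.nan, 3],
    'C': [1, 2, 3, np.nan],
    'D': [np.nan, np.nan, np.nan, 4]
})

cols_kept = drop_sparse(df, 0.75, axis=1)  # needs 3 of 4 rows filled
rows_kept = drop_sparse(df, 0.75, axis=0)  # needs 3 of 4 columns filled

print(list(cols_kept.columns))
print(list(rows_kept.index))
```

With a 75% requirement, only columns A and C and only rows 1 and 3 survive. Note that for some fractions floating-point rounding can push `math.ceil` one step higher than expected, so it may be worth checking the computed count on real data.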


In [5]:
# 4. Apply the threshold to a subset of columns only
df.dropna(thresh=2, subset=['A', 'B'])

     A    B    C    D
1  2.0  2.0  2.0  NaN
3  4.0  3.0  NaN  4.0
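
With `subset`, only the listed columns are counted, but every column is still returned. A minimal sketch of the same filter written by hand is shown below; `subset_cols` and `counts_in_subset` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the cells above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, np.nan, 3],
    'C': [1, 2, 3, np.nan],
    'D': [np.nan, np.nan, np.nan, 4]
})

subset_cols = ['A', 'B']

# Count non-NA cells per row, looking at columns A and B only.
counts_in_subset = df[subset_cols].notna().sum(axis=1)

# Filter the full frame with that count; C and D are ignored by the
# test but kept in the result.
manual = df[counts_in_subset >= 2]
via_dropna = df.dropna(thresh=2, subset=subset_cols)

print(counts_in_subset.tolist())
print(manual.equals(via_dropna))
```

Row 0 is dropped even though it has two non-NA values overall, because only one of them falls inside the subset.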
