### Interquartile Range (IQR) Approach
- Q1 (first quartile): The value below which 25% of the data falls (75% falls above it).
- Q2 (second quartile): The median of the data, i.e. the middle value when the data is ordered.
- Q3 (third quartile): The value below which 75% of the data falls (25% falls above it).

Interquartile Range (IQR): The difference between the third quartile (Q3) and the first quartile (Q1):

IQR = Q3 − Q1

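In layman's terms, the IQR tells us how spread out the middle 50% of the data is. Values more than 1.5 × IQR below Q1 or above Q3 are conventionally treated as outliers, which is the rule the function below applies.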
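As a quick illustration of the quartile arithmetic, here is a minimal sketch on a small hypothetical series (the values are made up for the example and are not from the dataset used in this notebook). It assumes pandas' default linear interpolation for `quantile`.

```python
import pandas as pd

# Hypothetical toy data: eight ordinary values and one extreme value.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

q1 = values.quantile(0.25)   # 3.0
q3 = values.quantile(0.75)   # 7.0
iqr = q3 - q1                # 4.0

# Fences at 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr       # -3.0
upper = q3 + 1.5 * iqr       # 13.0

# Anything outside [lower, upper] is flagged as an outlier.
outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())     # [100]
```

Note that the single extreme value barely moves Q1 and Q3, which is why the IQR fences are less sensitive to outliers than a rule based on the mean and standard deviation.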

In [None]:
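def remove_outliers_iqr(df, columns):
    """Drop rows with a value outside the 1.5 * IQR fences in any given column."""
    df_clean = df.copy()
    for col in columns:
        # Fences are computed from the original data, so the result
        # does not depend on the order of the columns.
        Q1 = df[col].quantile(0.25)
        Q3 = df[col].quantile(0.75)
        IQR = Q3 - Q1
        lower = Q1 - 1.5 * IQR
        upper = Q3 + 1.5 * IQR
        # Keep rows inside [lower, upper]; rows with NaN in this column are dropped too.
        df_clean = df_clean[(df_clean[col] >= lower) & (df_clean[col] <= upper)]
    return df_clean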

numeric_columns = df.select_dtypes(include='number').columns.tolist()
df_cleaned = remove_outliers_iqr(df, numeric_columns)

### The z-score Method
In statistics, the z-score (also known as the standard score) tells you how many standard deviations a data point is from the mean of the distribution. The interpretation of the z-score is summarized in the following table:

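| Z-score | Meaning |
|---|---|
| 0 | Data point is exactly at the mean |
| +1 | 1 standard deviation above the mean |
| -1 | 1 standard deviation below the mean |
| > +3 or < -3 | Potential outlier (extreme deviation) |

#### z-Score and the Empirical Rule
The Empirical Rule (also called the 68–95–99.7 Rule) for a normal distribution is the foundation for the z-score approach to outlier detection: roughly 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. A value with |z| > 3 is therefore expected only about 0.3% of the time if the data is approximately normal.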
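The 68–95–99.7 percentages can be checked numerically. The following is a minimal sketch on simulated data (the sample, its mean of 50, and its standard deviation of 10 are arbitrary assumptions for illustration): it computes z-scores by hand as (x − mean) / std and measures the share of points within 1, 2, and 3 standard deviations.

```python
import numpy as np

# Simulated, normally distributed sample (hypothetical parameters).
rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=10.0, size=100_000)

# z-score: distance from the mean in units of standard deviation.
z = (sample - sample.mean()) / sample.std()

# Share of points with |z| below 1, 2, and 3.
shares = {k: float(np.mean(np.abs(z) < k)) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"|z| < {k}: {share:.3f}")
```

The printed shares should land close to 0.68, 0.95, and 0.997. For data that is strongly skewed or heavy-tailed these percentages no longer hold, so the |z| > 3 threshold is best read as a rule of thumb.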

In [None]:
from scipy.stats import zscore

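# Standardize each numeric column; NaNs propagate by default, so handle missing values first.
z_scores = df[numeric_columns].apply(zscore)
# Keep only rows where every column is within 3 standard deviations of its mean.
df_cleaned = df[(z_scores.abs() < 3).all(axis=1)]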

### Python Code - Isolation Forest
For example, scikit-learn's IsolationForest estimator can be used to remove outliers as follows:

In [None]:
from sklearn.ensemble import IsolationForest

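# contamination is the assumed share of outliers in the data (here 5%).
iso = IsolationForest(contamination=0.05, random_state=42)
# fit_predict returns 1 for inliers and -1 for outliers.
outliers = iso.fit_predict(df[numeric_columns])
df_cleaned = df[outliers == 1]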

Consider keeping outliers if:

- They represent rare but valid cases (e.g., high-income clients).

- The model is robust to them (e.g., tree-based methods).

Alternatively, consider capping extreme values at a chosen threshold (a.k.a. winsorizing) instead of removing the data points.
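Capping can be sketched as follows. This is a hedged illustration, not a prescribed method: the helper name `cap_outliers_iqr`, the parameter `k`, and the toy `income` column are all hypothetical, and the 1.5 × IQR fences are reused as the caps (winsorizing is often done at fixed percentiles instead).

```python
import pandas as pd

def cap_outliers_iqr(df, columns, k=1.5):
    """Clip values in the given columns to the k * IQR fences (hypothetical helper)."""
    df_capped = df.copy()
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        # Values beyond the fences are replaced by the fence value; no rows are dropped.
        df_capped[col] = df[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return df_capped

# Hypothetical toy data with one extreme value.
toy = pd.DataFrame({"income": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]})
capped = cap_outliers_iqr(toy, ["income"])
print(capped["income"].max())  # 13.0
```

Unlike removal, this keeps every row, so other columns of the capped observations remain available to the model.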