An outlier is a data point that differs markedly from the other observations in a dataset. It lies far outside the range where most of the data falls.

| Type                     | Description                                                                                 |
| ------------------------ | ------------------------------------------------------------------------------------------- |
| **Univariate Outlier**   | Extreme value in a single variable.                                                         |
| **Multivariate Outlier** | Unusual combination of values across variables, though each may look normal alone.          |
| **Global Outlier**       | Deviates from the dataset as a whole.                                                       |
| **Contextual Outlier**   | Normal in one context, outlier in another (e.g., 30°C normal in summer, outlier in winter). |
| **Collective Outlier**   | A group of points that is anomalous together, even if no single point is.                   |
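
The contextual case from the table can be illustrated with a small sketch. It assumes made-up temperature readings and a hypothetical helper `iqr_flags` (neither comes from the original notes), and it uses the 1.5 × IQR rule as one possible test. The idea is that a 30°C winter reading can pass a check over the whole dataset yet stand out once each season is judged on its own.

```python
import pandas as pd

# Hypothetical temperature readings; the values are made up for illustration.
temps = pd.DataFrame({
    "season": ["summer"] * 6 + ["winter"] * 6,
    "temp_c": [29, 31, 30, 32, 28, 30, 2, 4, 3, 30, 1, 5],
})


def iqr_flags(values):
    """Return True where a value falls outside the 1.5 * IQR fences."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)


# Global check: fences computed over all readings together.
temps["global_outlier"] = iqr_flags(temps["temp_c"])

# Contextual check: fences computed separately within each season.
temps["contextual_outlier"] = (
    temps.groupby("season")["temp_c"].transform(iqr_flags)
)
```

With this data, `global_outlier` is False for every row, while `contextual_outlier` is True only for the 30°C winter reading.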


| Plot Type        | Use                                  |
| ---------------- | ------------------------------------ |
| **Box Plot**     | Quick view of outliers using IQR.    |
| **Histogram**    | Spot extreme skew or gaps.           |
| **Scatter Plot** | Identify outliers in bivariate data. |
| **Z-score Plot** | Highlight extreme standard scores.   |


## 🔧 How to Handle Outliers


| Method                  | Description                                                                  |
| ----------------------- | ---------------------------------------------------------------------------- |
| **Remove**              | Drop points that are clear errors or not useful (e.g., data entry mistakes). |
| **Cap (Winsorization)** | Clip extreme values to lower/upper percentiles (e.g., 5th and 95th).         |
| **Transform**           | Apply a log, square-root, or Box-Cox transform to compress extreme values.   |
| **Impute**              | Replace with the mean, median, or a model-based prediction.                  |
| **Treat Separately**    | Keep the values but flag them in a new column for separate analysis.         |
| **Use Robust Models**   | Use tree-based models like Random Forests, or median-based estimators.       |
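
Several of these options take only a line or two of pandas. The sketch below is illustrative only: the ages are made up, the 5th/95th percentile bounds are an assumed choice, and the new column names (`Age_capped`, `Age_log`, `Age_is_extreme`, `Age_imputed`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; 120 stands in for an extreme value.
df = pd.DataFrame({"Age": [22, 25, 27, 30, 31, 33, 35, 38, 40, 120]})

# Cap (winsorize): clip values to the 5th and 95th percentiles.
low, high = df["Age"].quantile([0.05, 0.95])
df["Age_capped"] = df["Age"].clip(lower=low, upper=high)

# Transform: log1p compresses large values (needs non-negative data).
df["Age_log"] = np.log1p(df["Age"])

# Treat separately: keep the raw value but flag it in a new column.
df["Age_is_extreme"] = (df["Age"] < low) | (df["Age"] > high)

# Impute: replace flagged values with the median of the unflagged ones.
median_age = df.loc[~df["Age_is_extreme"], "Age"].median()
df["Age_imputed"] = df["Age"].where(~df["Age_is_extreme"], median_age)
```

Note that percentile bounds always touch the most extreme values in both tails, whether or not they are genuine outliers. Here the smallest age (22) is capped and flagged along with 120.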


## IQR Method (Interquartile Range)

In [None]:
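import pandas as pd

# 'x' is a placeholder path; point it at a CSV that has an 'Age' column.
df = pd.read_csv('x')

# First and third quartiles of Age.
Q1 = df['Age'].quantile(0.25)
Q3 = df['Age'].quantile(0.75)

# Interquartile range: the spread of the middle 50% of the values.
IQR = Q3 - Q1

# Fences: anything more than 1.5 * IQR beyond the quartiles is flagged.
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

# Rows whose Age falls outside the fences.
outliers = df[(df['Age'] < lower) | (df['Age'] > upper)]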
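
To make the fences concrete, here is the same rule applied to a small in-memory dataset, so it runs without a CSV file. The ages are made up, and the helper `iqr_bounds` and the names `people` and `cleaned` are hypothetical, introduced only for this sketch.

```python
import pandas as pd


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up ages for illustration; 120 is the planted extreme value.
people = pd.DataFrame({"Age": [22, 25, 27, 30, 31, 33, 35, 38, 40, 120]})

lower, upper = iqr_bounds(people["Age"])
is_outlier = (people["Age"] < lower) | (people["Age"] > upper)

outliers = people[is_outlier]   # rows outside the fences
cleaned = people[~is_outlier]   # IQR-based filtering keeps the rest
```

Here Q1 is 27.75 and Q3 is 37.25, so the fences come out at 13.5 and 51.5 and only the age of 120 is flagged.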


## 4 main techniques:
* Z-score treatment
* IQR-based filtering
* Percentile method
* Winsorization
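
The z-score and percentile items can be sketched as follows. The data is made up, and the cutoff of 3 standard deviations and the 1st/99th percentile bounds are common conventions assumed here, not values given in these notes.

```python
import pandas as pd

# Made-up ages: 21 through 39, plus one planted extreme value.
ages = pd.Series(list(range(21, 40)) + [120], name="Age")

# Z-score treatment: distance from the mean in standard deviations.
# pandas uses the sample standard deviation (ddof=1) by default.
z = (ages - ages.mean()) / ages.std()
z_outliers = ages[z.abs() > 3]

# Percentile method: keep only values between the chosen percentiles.
low, high = ages.quantile(0.01), ages.quantile(0.99)
trimmed = ages[(ages >= low) & (ages <= high)]
```

On very small samples a z-score can never reach 3 (with 10 points the largest possible sample z-score is about 2.85), so the threshold may need lowering or another method may fit better. The percentile method, by construction, drops the extremes of both tails: here it removes 21 as well as 120.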