# 🔍 What is an Outlier?

An outlier is a data point that lies unusually far from the rest of the data, for example a value of 95 in a column where every other value is between 10 and 13.

# 🧠 How to Detect Outliers?

# 1. 📊 Using Descriptive Statistics (describe())
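```python
df.describe()
```

Compare `min` and `max` with the mean and the quartiles (`25%`, `50%`, `75%`). If `max` is far above the `75%` value (or `min` far below the `25%` value) compared with the spread between the quartiles, the column probably contains outliers. This is only a quick first check.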
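A minimal sketch on a small made-up column shows what a suspicious `describe()` output looks like. The column name `value` and the numbers are hypothetical, chosen so that one value clearly stands out:

```python
import pandas as pd

# Hypothetical data: 19 values between 10 and 13, plus one suspicious 95
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
          12, 10, 11, 13, 12, 11, 10, 12, 11, 95]
df = pd.DataFrame({"value": values})

summary = df["value"].describe()
print(summary)

# The 25%-75% range is only 1 wide, yet max sits far above the 75% quartile
print("max - 75%:", summary["max"] - summary["75%"])
print("75% - 25%:", summary["75%"] - summary["25%"])
```

Here the mean (15.6) is also pulled above all 19 typical values by the single 95, which is another hint that something is off.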

# 2. 📦 Using IQR (Interquartile Range)
📌 Steps:
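```python
# 1st and 3rd quartiles of the column
Q1 = df['column'].quantile(0.25)
Q3 = df['column'].quantile(0.75)
IQR = Q3 - Q1

# Tukey's fences: 1.5 × IQR beyond each quartile
lower_limit = Q1 - 1.5 * IQR
upper_limit = Q3 + 1.5 * IQR

# Rows whose value falls outside the fences
outliers = df[(df['column'] < lower_limit) | (df['column'] > upper_limit)]
```

Any value below `lower_limit` or above `upper_limit` is treated as an outlier.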
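The same steps can be wrapped in a small reusable helper. The function name `iqr_outliers`, its `k` parameter and the sample values are hypothetical; `k=1.5` is the usual multiplier, and a larger value (for example 3) flags only more extreme points.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of `s` outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return s[(s < lower_limit) | (s > upper_limit)]


# Hypothetical data: Q1 = 11, Q3 = 12, so the fences are 9.5 and 13.5
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
          12, 10, 11, 13, 12, 11, 10, 12, 11, 95]
s = pd.Series(values, name="value")

print(iqr_outliers(s))  # only the 95 at index 19
```

Returning the matching slice of the Series (rather than a boolean mask) keeps the original index, so you can look the rows up in the full DataFrame afterwards.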

# 3. 🧪 Using Z-score
Z-score shows how many standard deviations a value is from the mean.

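```python
from scipy.stats import zscore

# Standardize: (value - mean) / std; scipy uses the population std (ddof=0) by default.
# With the default nan_policy='propagate', a single NaN makes every z-score NaN,
# so drop or fill missing values first.
df['zscore'] = zscore(df['column'])

# Rows more than 3 standard deviations from the mean
df[df['zscore'].abs() > 3]
```

If Z > 3 or Z < -3 (that is, |Z| > 3), treat the value as an outlier. This rule of thumb works best on roughly normally distributed data.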
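The Z-score can also be computed directly in pandas without scipy. A minimal sketch with hypothetical values; `ddof=0` is passed so the result matches `scipy.stats.zscore`'s default:

```python
import pandas as pd

# Hypothetical data: 19 typical values plus one extreme 95
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
          12, 10, 11, 13, 12, 11, 10, 12, 11, 95]
s = pd.Series(values, name="value")

# Population standard deviation (ddof=0), same as scipy's default
z = (s - s.mean()) / s.std(ddof=0)

outliers = s[z.abs() > 3]
print(outliers)
```

One caveat: with `ddof=0`, no single point can have |Z| larger than √(n − 1). On very small samples (n ≤ 10) the |Z| > 3 rule therefore cannot flag anything; with the 20 values above, the 95 scores about 4.35.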

# 4. 📉 Using Boxplot
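```python
import seaborn as sns
import matplotlib.pyplot as plt

# Horizontal box plot of one column
sns.boxplot(x=df['column'])
plt.show()
```

Outliers are the individual dots beyond the whiskers (the lines extending from the box). By default the whiskers reach at most 1.5 × IQR past the box, so this is the same rule as the IQR method above.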
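To list exactly which values would become dots without reading them off a chart, matplotlib exposes the statistics behind a box plot through `matplotlib.cbook.boxplot_stats`. This sketch assumes seaborn's box plot uses the same default whisker rule (`whis=1.5`), which is worth confirming for your installed version; the data are hypothetical.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Hypothetical data: 19 typical values plus one extreme 95
values = np.array([10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
                   12, 10, 11, 13, 12, 11, 10, 12, 11, 95])

# One dict of box-plot statistics per dataset; we pass a single 1-D array
stats = boxplot_stats(values, whis=1.5)[0]

print("box (Q1, Q3):", stats["q1"], stats["q3"])
print("whiskers:", stats["whislo"], stats["whishi"])
print("dots (fliers):", stats["fliers"])
```

The whiskers stop at the most extreme data points still inside the fences (10 and 13 here), and everything beyond them is reported in `fliers`.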

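# 5. 📈 Using Histplot (or the older Distplot)

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Histogram with a KDE curve on top; sns.distplot is deprecated, histplot replaces it
sns.histplot(df['column'], kde=True)
plt.show()
```

Look for a long tail on one side, or isolated bars separated from the main body of the data by empty gaps. Both point to possible outliers.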
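For a numeric check to go with the picture, `numpy.histogram` returns the bin counts a histogram is drawn from. This sketch (hypothetical data, 10 equal-width bins chosen arbitrarily) looks for empty bins between occupied ones, which is what a gap in the plot corresponds to:

```python
import numpy as np

# Hypothetical data: 19 typical values plus one extreme 95
values = np.array([10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
                   12, 10, 11, 13, 12, 11, 10, 12, 11, 95])

# Bin the data like a histogram: 10 equal-width bins from min to max
counts, edges = np.histogram(values, bins=10)
print(counts)

# Empty bins between the first and last occupied bin are the visible "gaps"
occupied = np.nonzero(counts)[0]
gap_bins = [i for i in range(occupied[0], occupied[-1] + 1) if counts[i] == 0]
print("empty bins inside the data range:", gap_bins)
```

The result depends on the bin count: with too many bins, ordinary data also shows empty bins, so treat this as a hint rather than a test.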

# 🧹 What to Do With Outliers?
| Method | When to Use |
|---|---|
| ❌ Remove it | If it's a data entry error or not useful |
| ⛔ Cap or floor it | If you want to reduce its effect but keep the data |
| 🔄 Transform it | If the data has a long tail: log, sqrt, etc. shrink large values and reduce their impact |
| ✅ Keep it | If it's important (e.g., fraud, rare disease) |
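Each row of the table maps to a line or two of pandas. A minimal sketch, assuming the IQR fences from section 2 as the cut-off (any other rule plugs in the same way) and hypothetical sample values; the `is_outlier` column name is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 19 typical values plus one extreme 95
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
          12, 10, 11, 13, 12, 11, 10, 12, 11, 95]
s = pd.Series(values, name="value")

# IQR fences (same rule as section 2): here 9.5 and 13.5
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower_limit, upper_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# ❌ Remove: keep only values inside the fences
removed = s[(s >= lower_limit) & (s <= upper_limit)]

# ⛔ Cap / floor: pull extreme values back to the nearest fence
capped = s.clip(lower=lower_limit, upper=upper_limit)

# 🔄 Transform: log1p computes log(1 + x), compressing large values (needs x > -1)
logged = np.log1p(s)

# ✅ Keep: leave the values unchanged but flag outliers for separate analysis
flagged = s.to_frame().assign(is_outlier=(s < lower_limit) | (s > upper_limit))

print(len(removed), capped.max(), round(logged.max(), 3))
print(flagged.tail(3))
```

Note that capping keeps all 20 rows but replaces 95 with 13.5, while removal drops the row entirely; which one is right depends on the reasons in the table above.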

