**An outlier** is a data point that lies unusually far from the other points in a dataset, i.e. its value is unusually high or low compared with the rest of the data.

**Why Remove Outliers?**

- **Distortion:** Outliers can distort statistical analyses and model training by skewing summary statistics such as the mean and variance.
- **Model Accuracy:** They can reduce the accuracy of machine learning models, leading to poor predictions or overfitting.
- **Irrelevance:** Outliers may represent errors or rare, irrelevant occurrences that do not contribute to meaningful insights.
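
A quick illustration of the distortion point, using the same sample values as the examples below: a single extreme value shifts the mean much more than the median. With the outlier the mean is 25.1; without it, about 16.78. The median only moves from 17 to 16.

```python
import statistics

# Same sample values used in the examples below; 100 is the extreme point
values = [10, 12, 14, 15, 100, 16, 18, 20, 22, 24]
without_outlier = [v for v in values if v != 100]

# The mean is pulled strongly toward the outlier; the median barely moves
print("mean with outlier:     ", statistics.mean(values))
print("mean without outlier:  ", round(statistics.mean(without_outlier), 2))
print("median with outlier:   ", statistics.median(values))
print("median without outlier:", statistics.median(without_outlier))
```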

**1. Z-Score Method:**
The Z-Score measures how many standard deviations a data point lies from the mean. If the absolute Z-Score exceeds a chosen threshold (typically 2 or 3), the point is considered an outlier.
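
In formula form, with mean μ and standard deviation σ of the column (scipy's `stats.zscore` uses the population standard deviation, `ddof=0`, by default), and the numbers for the value 100 in the sample data used in the example below:

```latex
z_i = \frac{x_i - \mu}{\sigma}, \qquad \text{outlier if } |z_i| > t \quad (t = 2 \text{ or } 3)

\mu = 25.1,\quad \sigma \approx 25.31 \;\Rightarrow\; z_{100} = \frac{100 - 25.1}{25.31} \approx 2.96
```

So for this small sample, 100 is flagged with t = 2 but not with t = 3.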

import numpy as np
import pandas as pd
from scipy import stats

# Sample DataFrame with one obvious outlier (100)
data = {'Value': [10, 12, 14, 15, 100, 16, 18, 20, 22, 24]}
df = pd.DataFrame(data)

# Absolute Z-Score of each value (scipy uses the population std, ddof=0, by default)
z_scores = np.abs(stats.zscore(df['Value']))

# Remove outliers (|Z-Score| > 2)
df_no_outliers = df[z_scores <= 2]

print("Data after removing outliers using Z-Score:")
print(df_no_outliers)


Data after removing outliers using Z-Score:
   Value
0     10
1     12
2     14
3     15
5     16
6     18
7     20
8     22
9     24


**Explanation:**

**Z-Score Calculation:** This step computes how far each data point is from the mean in terms of standard deviations.
**Thresholding:** Rows whose absolute Z-Score is greater than 2 are treated as outliers and removed.
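
To see what `stats.zscore` computes, here is a minimal sketch that derives the same scores by hand with NumPy and compares thresholds 2 and 3. The helper `zscore_filter` is a hypothetical name used only for illustration; it assumes the population standard deviation (`ddof=0`), matching scipy's default.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'Value': [10, 12, 14, 15, 100, 16, 18, 20, 22, 24]})


def zscore_filter(frame: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose |z| in `column` is <= threshold."""
    values = frame[column].to_numpy(dtype=float)
    # Population standard deviation (ddof=0), as scipy.stats.zscore uses by default
    z = (values - values.mean()) / values.std(ddof=0)
    return frame[np.abs(z) <= threshold]


for t in (2, 3):
    kept = zscore_filter(df, 'Value', t)
    # Values of the rows that were dropped at this threshold
    removed = df.loc[df.index.difference(kept.index), 'Value'].tolist()
    print(f"threshold {t}: removed {removed}")
```

Only 100 is removed at threshold 2, and nothing is removed at threshold 3. With n = 10 points and `ddof=0`, no |z| can exceed sqrt(n - 1) = 3 (Samuelson's inequality), so a threshold of 3 is practically unreachable for very small samples.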

**2. IQR (Interquartile Range) Method:**
The IQR method uses the first (Q1) and third (Q3) quartiles of the data to define a range of acceptable values. With IQR = Q3 - Q1, any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is considered an outlier.
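
Applied to the sample data (pandas' `quantile` uses linear interpolation by default), the bounds work out as follows:

```latex
Q_1 = 14.25,\quad Q_3 = 21.5,\quad \mathrm{IQR} = Q_3 - Q_1 = 7.25

\text{lower} = Q_1 - 1.5 \cdot \mathrm{IQR} = 14.25 - 10.875 = 3.375

\text{upper} = Q_3 + 1.5 \cdot \mathrm{IQR} = 21.5 + 10.875 = 32.375
```

Every value except 100 falls inside [3.375, 32.375].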

# Calculate IQR
Q1 = df['Value'].quantile(0.25)
Q3 = df['Value'].quantile(0.75)
IQR = Q3 - Q1

# Determine the outlier boundaries
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Remove outliers (values outside of IQR range)
df_no_outliers_iqr = df[(df['Value'] >= lower_bound) & (df['Value'] <= upper_bound)]

print("Data after removing outliers using IQR:")
print(df_no_outliers_iqr)


Data after removing outliers using IQR:
   Value
0     10
1     12
2     14
3     15
5     16
6     18
7     20
8     22
9     24


**Explanation:**

**IQR Calculation:** The IQR is the difference between the third and first quartiles (Q3 - Q1), i.e. the spread of the middle 50% of the data.
**Thresholding:** Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are considered outliers and removed.
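
The same steps can be wrapped in a reusable function. This is a minimal sketch: `remove_outliers_iqr` and its `k` parameter (the 1.5 multiplier) are hypothetical names, not part of pandas. It uses `Series.between`, which is inclusive on both ends by default, matching the `>=`/`<=` comparison above.

```python
import pandas as pd


def remove_outliers_iqr(frame: pd.DataFrame, column: str, k: float = 1.5):
    """Hypothetical helper: drop rows whose `column` lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Returns the filtered DataFrame and the (lower, upper) bounds used.
    """
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() keeps values equal to either bound (inclusive by default)
    mask = frame[column].between(lower, upper)
    return frame[mask], (lower, upper)


df = pd.DataFrame({'Value': [10, 12, 14, 15, 100, 16, 18, 20, 22, 24]})
filtered, (lower, upper) = remove_outliers_iqr(df, 'Value')
print(f"bounds: [{lower}, {upper}]")
print(filtered)
```

A larger `k` widens the bounds, so only more extreme values are flagged.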