# Outliers

---

## Sample Data
We will use the following raw data for demonstration:  

- Data: [10, 12, 11, 13, 12, 95, 14, 13, 12]  

Here, **95** looks suspicious compared to the rest of the values.

---

## Definition
An **Outlier** is a data point that lies far away from the majority of other observations in a dataset.  
It may indicate variability in the data, experimental error, or a novel finding.  

Outliers can:  
- Skew the results of statistical tests.  
- Affect the mean and standard deviation.  
- Reduce the accuracy of machine learning models.  
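
To make the effect on the mean and standard deviation concrete, the sketch below compares summary statistics for the sample data with and without the value 95. It is a minimal illustration using Python's standard `statistics` module. The helper name `summarize` is hypothetical, treating 95 as the suspect value is an assumption taken from the Sample Data section, and the median is included only as a point of comparison.

```python
import statistics

# Sample data from the section above; 95 is the suspect value (assumption).
data = [10, 12, 11, 13, 12, 95, 14, 13, 12]
trimmed = [x for x in data if x != 95]

def summarize(values):
    """Return the mean, median, and population standard deviation."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "pstdev": statistics.pstdev(values),
    }

with_outlier = summarize(data)
without_outlier = summarize(trimmed)

for label, summary in (("with 95", with_outlier), ("without 95", without_outlier)):
    print(label, {name: round(value, 3) for name, value in summary.items()})
```

On this data the single value 95 moves the mean from about 12.1 to about 21.3 and raises the standard deviation from roughly 1.2 to roughly 26, while the median stays at 12.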

---

## Methods to Detect Outliers

### 1. Z-Score Method
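The Z-score measures how many standard deviations a data point \(X\) lies from the mean:

$$
Z = \frac{X - \bar{X}}{\sigma}
$$

where \(\bar{X}\) is the mean and \(\sigma\) is the standard deviation of the data.

- If \(|Z| > 3\), the point is usually considered an outlier.  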
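
The rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `zscore_outliers` and its `threshold` parameter are hypothetical, and the sketch assumes the population standard deviation (`ddof=0`).

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return (z_scores, outliers), assuming the population standard deviation."""
    values = np.asarray(values, dtype=float)
    sigma = values.std(ddof=0)
    if sigma == 0:
        # All values are identical, so no point can be an outlier.
        return np.zeros_like(values), values[:0]
    z = (values - values.mean()) / sigma
    return z, values[np.abs(z) > threshold]

data = [10, 12, 11, 13, 12, 95, 14, 13, 12]
z, outliers = zscore_outliers(data)
print(np.round(z, 2))
print(outliers)
```

With the population standard deviation, no \(|Z|\) in a sample of \(n\) points can exceed \(\sqrt{n-1}\). For nine values that limit is about 2.83, so a threshold of 3 can never be reached. For small samples, one option is a lower threshold, passed here through the hypothetical `threshold` argument.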

---

### 2. IQR (Interquartile Range) Method
Steps:  
1. Find \(Q1\) (25th percentile) and \(Q3\) (75th percentile).  
2. Compute IQR:  

$$
IQR = Q3 - Q1
$$  

3. Define bounds:  

$$
\text{Lower Bound} = Q1 - 1.5 \times IQR
$$  

$$
\text{Upper Bound} = Q3 + 1.5 \times IQR
$$  

4. Any value outside these bounds is considered an outlier.  
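
The four steps above might be sketched as follows. This is an illustrative helper rather than a canonical implementation: the name `iqr_bounds` and the multiplier parameter `k` are hypothetical. The quartiles come from `np.percentile` with its default linear interpolation, which is only one of several percentile conventions, and other conventions can give slightly different bounds.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) bounds as Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])  # step 1: quartiles
    iqr = q3 - q1                             # step 2: interquartile range
    return q1 - k * iqr, q3 + k * iqr         # step 3: bounds

data = np.array([10, 12, 11, 13, 12, 95, 14, 13, 12])
lower, upper = iqr_bounds(data)

# Step 4: keep the values that fall outside the bounds.
outliers = data[(data < lower) | (data > upper)]
print(lower, upper)
print(outliers)
```

Because the middle half of the sample data is tightly packed, the bounds are narrow, and a value only slightly below the bulk of the data can be flagged alongside the obvious extreme.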

---

## Usage
1. To identify abnormal observations in datasets.

2. To improve the accuracy of statistical models.

3. To clean data before applying machine learning algorithms.

## Application
1. Finance → Detect fraudulent transactions.

2. Healthcare → Identify unusual patient records (e.g., abnormal blood pressure readings).

3. Manufacturing → Detect defective products in quality control.

4. Marketing → Spot unusual customer behavior.

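## Computerized Formula (Programming Perspective)

In Python, detecting outliers with **Z-Score** and **IQR**:

```python
import numpy as np
from scipy import stats

# Sample data
data = np.array([10, 12, 11, 13, 12, 95, 14, 13, 12])

# --- Z-Score Method ---
# stats.zscore uses the population standard deviation (ddof=0) by default
z_scores = np.abs(stats.zscore(data))
outliers_z = data[z_scores > 3]

# --- IQR Method ---
Q1, Q3 = np.percentile(data, [25, 75])
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers_iqr = data[(data < lower_bound) | (data > upper_bound)]

print("Outliers using Z-Score:", outliers_z)
print("Outliers using IQR:", outliers_iqr)
```

Output:

```
Outliers using Z-Score: []
Outliers using IQR: [10 95]
```

The Z-score method flags nothing here because 95 itself inflates the mean and standard deviation, leaving its \(|Z|\) at about 2.83, just under the threshold of 3. The IQR method flags 95 and also 10, because \(Q1 = 12\) and \(Q3 = 13\) give an IQR of 1 and bounds of 10.5 and 14.5.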
