## Anomaly Detection
**Anomaly Detection** is the process of identifying data points, patterns, or events that do not fit the expected behavior or normal patterns of a dataset. These unusual points are called `anomalies`, `outliers`, or `novelties`.

---
# 🎯 What is Z-Score?

👉 A **Z-Score** tells **how far** (and in which direction) a data point is from the **mean** (average) **in units of standard deviation**.

## 📈 Formula:
$$
Z = \frac{x - \mu}{\sigma}
$$

Where:
- $x$ = data point
- $\mu$ = mean of the data
- $\sigma$ = standard deviation of the data

---

## 💡 Intuitive Meaning:

| Z-Score | Meaning |
|:-------:|:--------|
| 0       | Exactly at the mean |
| +1      | 1 standard deviation above the mean |
| -1      | 1 standard deviation below the mean |
| +2      | 2 standard deviations above the mean |
| -2      | 2 standard deviations below the mean |

---

✅ So, **Z-Score measures how "unusual" or "normal" a value is** in the context of the whole dataset.
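
As a quick illustration of the formula, here is a minimal sketch. The helper name `z_score` and the numbers passed to it are hypothetical and chosen only to reproduce two rows of the table above.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical values, used only for illustration
print(z_score(70, mu=60, sigma=5))  # 2.0  -> 2 standard deviations above the mean
print(z_score(55, mu=60, sigma=5))  # -1.0 -> 1 standard deviation below the mean
```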


In [1]:
# Import required Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import zscore

In [2]:
# Read the inputs
input_data = pd.DataFrame([10, 12, 14, 15, 16, 18, 20, 24, 30, 100], columns=['input'])

In [3]:
# View the inputs
input_data

|   | input |
|:-:|:-----:|
| 0 | 10    |
| 1 | 12    |
| 2 | 14    |
| 3 | 15    |
| 4 | 16    |
| 5 | 18    |
| 6 | 20    |
| 7 | 24    |
| 8 | 30    |
| 9 | 100   |


# Manual Calculation for Detecting Outliers using Z-Score
## Step 1: Calculate Mean (μ)

$$
\text{Mean} = \frac{10 + 12 + 14 + 15 + 16 + 18 + 20 + 24 + 30 + 100}{10} = \frac{259}{10} = 25.9
$$

✅ **Mean (μ) = 25.9**

---

## Step 2: Calculate Standard Deviation (σ)

### Formula:

$$
\sigma = \sqrt{\frac{1}{N} \sum (x_i - \mu)^2}
$$

First, let's find $(x_i - \mu)^2$ for each value:

| Value (x) | $x - 25.9$ | $(x - 25.9)^2$ |
|:---------:|:------------:|:----------------:|
| 10        | -15.9        | 252.81            |
| 12        | -13.9        | 193.21            |
| 14        | -11.9        | 141.61            |
| 15        | -10.9        | 118.81            |
| 16        | -9.9         | 98.01             |
| 18        | -7.9         | 62.41             |
| 20        | -5.9         | 34.81             |
| 24        | -1.9         | 3.61              |
| 30        | 4.1          | 16.81             |
| 100       | 74.1         | 5490.81           |

Sum of squares = **6412.9**

Now,

$$
\sigma = \sqrt{\frac{6412.9}{10}} = \sqrt{641.29} \approx 25.3
$$

✅ **Standard Deviation (σ) ≈ 25.3**
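
The hand calculation can be checked with NumPy. This is a minimal sketch; the variable names are illustrative. It also shows one detail worth watching: the formula above divides by $N$ (population standard deviation, `ddof=0`), while dividing by $N - 1$ (`ddof=1`, which is the default of pandas' `Series.std()`) gives a slightly larger value.

```python
import numpy as np

values = np.array([10, 12, 14, 15, 16, 18, 20, 24, 30, 100], dtype=float)

mu = values.mean()                                     # Step 1: mean
squared_dev = (values - mu) ** 2                       # one squared deviation per value
sigma_pop = np.sqrt(squared_dev.sum() / len(values))   # Step 2: divide by N

# np.std uses ddof=0 (divide by N) by default; ddof=1 divides by N - 1
sigma_np = values.std()
sigma_sample = values.std(ddof=1)

print(round(mu, 1), round(sigma_pop, 1))   # 25.9 25.3
print(np.isclose(sigma_pop, sigma_np))     # True
print(sigma_sample > sigma_pop)            # True
```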

---

## Step 3: Calculate Z-Score for Each Value

### Formula:

$$
Z = \frac{x - \mu}{\sigma}
$$

Now calculating:

| Value (x) | Z-Score |
|:---------:|:-------:|
| 10        | (10 - 25.9)/25.3 ≈ -0.628 |
| 12        | (12 - 25.9)/25.3 ≈ -0.549 |
| 14        | (14 - 25.9)/25.3 ≈ -0.470 |
| 15        | (15 - 25.9)/25.3 ≈ -0.431 |
| 16        | (16 - 25.9)/25.3 ≈ -0.391 |
| 18        | (18 - 25.9)/25.3 ≈ -0.312 |
| 20        | (20 - 25.9)/25.3 ≈ -0.233 |
| 24        | (24 - 25.9)/25.3 ≈ -0.075 |
| 30        | (30 - 25.9)/25.3 ≈ 0.162 |
| 100       | (100 - 25.9)/25.3 ≈ 2.929 |

These values use the rounded σ = 25.3, so they can differ from a full-precision calculation in the third decimal place.
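
The same z-scores can be computed in one vectorized step. The sketch below (variable names are illustrative) compares a manual NumPy calculation with `scipy.stats.zscore`, which also uses the population standard deviation (`ddof=0`) by default.

```python
import numpy as np
from scipy.stats import zscore

values = np.array([10, 12, 14, 15, 16, 18, 20, 24, 30, 100], dtype=float)

# Manual z-scores: subtract the mean, divide by the population standard deviation
z_manual = (values - values.mean()) / values.std()

# Library z-scores for comparison
z_scipy = zscore(values)

print(np.allclose(z_manual, z_scipy))  # True
print(np.round(z_manual, 3))           # full-precision sigma, rounded for display
```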

---


# 🔹 Finding Lower and Upper Bounds using Z-score

There are two common situations:

## 1. If you are given a mean and standard deviation, and a z-score:

You can find the **raw value** (actual lower and upper bounds) using:

$$
x = \mu + (z \times \sigma)
$$

where:
- $x$ = raw score (bound)
- $\mu$ = mean
- $\sigma$ = standard deviation
- $z$ = z-score

### Let's assume we use the common z-scores for a 95% confidence level:

- Lower bound z-score: $z = -1.96$
- Upper bound z-score: $z = 1.96$

#### Lower Bound:
$$
x_{\text{lower}} = 25.9 + (-1.96 \times 25.3) = 25.9 - 49.6 = -23.7
$$

#### Upper Bound:
$$
x_{\text{upper}} = 25.9 + (1.96 \times 25.3) = 25.9 + 49.6 = 75.5
$$
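
The bound calculation can be wrapped in a small helper. This is a sketch only: `z_bounds` is a hypothetical name, and it takes the rounded values μ = 25.9 and σ = 25.3 from the manual steps as inputs.

```python
def z_bounds(mu, sigma, z=1.96):
    """Return (lower, upper) raw-value bounds at -z and +z standard deviations."""
    half_width = z * sigma
    return mu - half_width, mu + half_width

lower, upper = z_bounds(25.9, 25.3)
print(round(lower, 1), round(upper, 1))  # -23.7 75.5
```

Any value below `lower` or above `upper` has $|Z| > 1.96$.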

---
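## Step 4: Detect Outliers

👉 **Threshold**: a common rule of thumb is $|Z| > 3$, but here we use the 95% bounds from above, i.e. $|Z| > 1.96$.

- 100 has a Z-score ≈ 2.93. That is just under 3, so the stricter $|Z| > 3$ rule would not flag it.
- It does exceed 1.96, which is the same as saying 100 is greater than the upper bound of 75.5.

✅ **So, 100 is an outlier at the 95% level!**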
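
Whether a point is flagged depends on the threshold, so it helps to check the threshold against the data. The sketch below uses a hypothetical helper, `flag_outliers`, that returns the values whose absolute z-score exceeds a given threshold.

```python
import numpy as np

def flag_outliers(values, threshold):
    """Return the values whose |z-score| is strictly greater than threshold."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()  # population std (ddof=0)
    return values[np.abs(z) > threshold]

data = [10, 12, 14, 15, 16, 18, 20, 24, 30, 100]

print(flag_outliers(data, threshold=3))     # []      (2.93 does not exceed 3)
print(flag_outliers(data, threshold=1.96))  # [100.]
```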

In [4]:
# Calculate Z-scores using scipy
z_scores = zscore(input_data)

In [5]:
z_scores

array([[-0.62787023],
       [-0.54889284],
       [-0.46991545],
       [-0.43042676],
       [-0.39093807],
       [-0.31196068],
       [-0.23298329],
       [-0.07502852],
       [ 0.16190364],
       [ 2.92611219]])

In [6]:
# Detect outliers (|Z| > 1.96)
# Use the z-score that bounds the central 95% of a normal distribution
# np.ravel flattens the (10, 1) z-score array into a 1-D row mask
outliers = input_data[np.ravel(np.abs(z_scores) > 1.96)]

In [7]:
outliers

|   | input |
|:-:|:-----:|
| 9 | 100   |
