# Modified Z-Score

The modified Z-score is a robust alternative to the standard Z-score. It replaces the mean and standard deviation with the median and the median absolute deviation (MAD), which makes it resistant to the influence of outliers.

### Why We Need It

- The standard Z-score fails when outliers inflate the mean and standard deviation.
- The modified Z-score uses the median and the median absolute deviation (MAD), which are not skewed by extreme values.
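To make the first point concrete, the sketch below compares the mean and standard deviation with the median and MAD on a small dataset, with and without one extreme value. It uses only Python's standard `statistics` module; the helper name `mad` is an assumption made for this illustration.

```python
import statistics

def mad(values):
    """Median absolute deviation: median of |x - median(values)|."""
    center = statistics.median(values)
    return statistics.median([abs(x - center) for x in values])

clean = [10, 12, 12, 13, 14, 15, 16]
with_outlier = clean + [120]

for label, values in [("without 120", clean), ("with 120", with_outlier)]:
    print(
        f"{label:>12}: mean={statistics.mean(values):.2f} "
        f"std={statistics.pstdev(values):.2f} "   # population standard deviation
        f"median={statistics.median(values)} "
        f"MAD={mad(values)}"
    )
```

Adding the single value 120 moves the mean from about 13.1 to 26.5, while the median only moves from 13 to 13.5 and the MAD from 1 to 1.5.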

### The Formulas

MAD (Median Absolute Deviation):

```text
MAD = median(|x_i - median(X)|)
```

Modified Z-Score:

```text
Modified Z = 0.6745 * (x_i - median(X)) / MAD
```

Outlier Detection Rule:

- If |Modified Z| > 3.5, the point is considered an outlier.
- The constant 0.6745 rescales the MAD so that, for normally distributed data, it is consistent with the standard deviation.
### Step-by-Step Example

Let's use the same problematic dataset where the standard Z-score failed:

```text
Dataset: [10, 12, 12, 13, 14, 15, 16, 120]
```

Step 1: Calculate Median

```text
Sorted data: [10, 12, 12, 13, 14, 15, 16, 120]
Median = (13 + 14) / 2 = 13.5
```

Step 2: Calculate Absolute Deviations from Median

```text
|10 - 13.5| = 3.5
|12 - 13.5| = 1.5
|12 - 13.5| = 1.5
|13 - 13.5| = 0.5
|14 - 13.5| = 0.5
|15 - 13.5| = 1.5
|16 - 13.5| = 2.5
|120 - 13.5| = 106.5

Absolute deviations: [3.5, 1.5, 1.5, 0.5, 0.5, 1.5, 2.5, 106.5]
```

Step 3: Calculate MAD (Median of Absolute Deviations)

```text
Sort absolute deviations: [0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 3.5, 106.5]
MAD = median([0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 3.5, 106.5]) = (1.5 + 1.5) / 2 = 1.5
```

Step 4: Calculate Modified Z-scores

```text
Constant = 0.6745

Modified Z(10)  = 0.6745 * (10 - 13.5) / 1.5  = 0.6745 * (-3.5) / 1.5  = -1.57
Modified Z(12)  = 0.6745 * (12 - 13.5) / 1.5  = 0.6745 * (-1.5) / 1.5  = -0.67
Modified Z(12)  = 0.6745 * (12 - 13.5) / 1.5  = 0.6745 * (-1.5) / 1.5  = -0.67
Modified Z(13)  = 0.6745 * (13 - 13.5) / 1.5  = 0.6745 * (-0.5) / 1.5  = -0.22
Modified Z(14)  = 0.6745 * (14 - 13.5) / 1.5  = 0.6745 * (0.5) / 1.5   =  0.22
Modified Z(15)  = 0.6745 * (15 - 13.5) / 1.5  = 0.6745 * (1.5) / 1.5   =  0.67
Modified Z(16)  = 0.6745 * (16 - 13.5) / 1.5  = 0.6745 * (2.5) / 1.5   =  1.12
Modified Z(120) = 0.6745 * (120 - 13.5) / 1.5 = 0.6745 * (106.5) / 1.5 = 47.9  ← HUGE!
```

Step 5: Identify Outliers

```text
Threshold: |Modified Z| > 3.5

120 has |Modified Z| = 47.9 → Definitely an outlier!

All other values have |Modified Z| < 3.5 → Not outliers
```
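The five steps above can be reproduced in a few lines of Python. This is a minimal sketch that mirrors the hand calculation one step at a time, using only the standard `statistics` module; the variable names are illustrative.

```python
import statistics

data = [10, 12, 12, 13, 14, 15, 16, 120]

# Step 1: median of the data
median = statistics.median(data)

# Step 2: absolute deviations from the median
abs_dev = [abs(x - median) for x in data]

# Step 3: MAD is the median of those deviations
mad = statistics.median(abs_dev)

# Step 4: modified Z-score for every point
scores = [0.6745 * (x - median) / mad for x in data]

# Step 5: keep the points whose |modified Z| exceeds 3.5
outliers = [x for x, z in zip(data, scores) if abs(z) > 3.5]

for x, z in zip(data, scores):
    print(f"{x:>4} -> {z:6.2f}")
print("Outliers:", outliers)
```

Running it should give the same scores as the table in Step 4, with 120 as the only flagged point.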

### When to Use the MAD Method

Excellent for:

- Small to medium datasets
- Data with multiple or extreme outliers
- Situations where the standard Z-score fails
- Any real-world data that might contain anomalies

Considerations:

- Slightly more computationally intensive than the standard Z-score, because medians require sorting or selection rather than a single pass
- The 3.5 threshold is a guideline and can be adjusted based on domain knowledge
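To see what adjusting the threshold does in practice, the sketch below flags outliers in a small dataset at three different cut-offs. The function name `flag_outliers` and the alternative thresholds 2.0 and 1.5 are assumptions chosen for illustration, not recommended values.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return the values whose |modified Z-score| exceeds `threshold`."""
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    # Assumes mad > 0; real code needs a guard for the mad == 0 case.
    return [x for x in values if abs(0.6745 * (x - median) / mad) > threshold]

data = [10, 12, 12, 13, 14, 15, 16, 120]
for threshold in (3.5, 2.0, 1.5):
    print(threshold, flag_outliers(data, threshold))
```

With this data, 120 is flagged at every cut-off, while 10 (modified Z of about -1.57) is only flagged once the threshold drops to 1.5. A stricter threshold catches more points but also starts flagging values that look ordinary.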

### Mathematical Intuition

Why the constant 0.6745?

- For a normal distribution, MAD ≈ 0.6745 × σ (0.6745 is the 75th percentile of the standard normal distribution).
- So: Modified Z = 0.6745 × (x - median) / MAD ≈ (x - median) / σ
- This makes modified Z-scores comparable to standard Z-scores for normal data.
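The constant can be checked numerically. For a distribution that is symmetric about its median, half of the data lies within one MAD of the median, so the median plus one MAD is the 75th percentile. For a standard normal distribution that percentile is about 0.6745. A short check, assuming Python 3.8 or later for `statistics.NormalDist`:

```python
from statistics import NormalDist

# 75th percentile of the standard normal distribution (mean 0, sigma 1).
q75 = NormalDist(mu=0, sigma=1).inv_cdf(0.75)

print(round(q75, 4))      # 0.6745
print(round(1 / q75, 4))  # 1.4826, the factor that turns a MAD into a sigma estimate
```

Multiplying by 0.6745 and dividing by the MAD is therefore the same as dividing by 1.4826 × MAD, which is a common way of writing the same scaling.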



```python
import numpy as np

def detect_outliers_mad(data, threshold=3.5):
    """
    Detect outliers using the modified Z-score method.

    Returns a list of (index, value, modified_z) tuples.
    """
    data = np.asarray(data)
    median = np.median(data)

    # MAD: median of the absolute deviations from the median
    deviations = np.abs(data - median)
    mad = np.median(deviations)

    # Avoid division by zero (MAD is 0 when more than half the values are identical).
    # Note: with this tiny stand-in, nearly any value that differs from the median is flagged.
    if mad == 0:
        mad = 1e-6

    # Calculate modified Z-scores
    modified_z_scores = 0.6745 * (data - median) / mad

    # Collect every point whose |modified Z| exceeds the threshold
    return [
        (i, data[i], z)
        for i, z in enumerate(modified_z_scores)
        if abs(z) > threshold
    ]

# Example usage
data = [10, 12, 12, 13, 14, 15, 16, 120]
outliers = detect_outliers_mad(data)
print(f"MAD Outliers detected: {outliers}")

# Compare with standard Z-score
def detect_outliers_zscore(data, threshold=3):
    data = np.asarray(data)  # accept plain lists as well as arrays
    mean = np.mean(data)
    std = np.std(data)  # population standard deviation (ddof=0)
    z_scores = (data - mean) / std
    return [(i, data[i], z) for i, z in enumerate(z_scores) if abs(z) > threshold]

z_outliers = detect_outliers_zscore(data)
print(f"Z-score Outliers detected: {z_outliers}")
```

```text
MAD Outliers detected: [(7, np.int64(120), np.float64(47.8895))]
Z-score Outliers detected: []
```

The standard Z-score misses 120 because the outlier itself inflates the mean to 26.5 and the standard deviation to about 35.4, which gives 120 a Z-score of only about 2.64.
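One case worth checking by hand is MAD = 0, which happens, for example, when more than half of the values are identical. The `1e-6` stand-in used in the function above then turns almost any value that differs from the median into an enormous score. The sketch below shows one possible alternative: fall back to the mean absolute deviation about the median, scaled by sqrt(π/2) ≈ 1.2533 so that it also estimates σ for normal data. The function name `modified_z_scores` and the choice of fallback are assumptions made for illustration, not part of the code above.

```python
import math
import statistics

def modified_z_scores(values):
    """Modified Z-scores with a fallback for MAD == 0 (illustrative sketch)."""
    median = statistics.median(values)
    abs_dev = [abs(x - median) for x in values]
    mad = statistics.median(abs_dev)
    if mad > 0:
        return [0.6745 * (x - median) / mad for x in values]

    # Fallback (assumption): use the mean absolute deviation instead.
    # For normal data, sigma ~= sqrt(pi / 2) * mean absolute deviation.
    mean_ad = statistics.mean(abs_dev)
    if mean_ad == 0:
        return [0.0 for _ in values]  # all values identical: nothing to flag
    scale = math.sqrt(math.pi / 2) * mean_ad
    return [(x - median) / scale for x in values]

data = [5, 5, 5, 5, 5, 5, 5, 5, 5, 50]
scores = modified_z_scores(data)
print([round(z, 2) for z in scores])
```

Here the nine repeated values score 0 and the value 50 scores roughly 8, so it is still flagged at the 3.5 threshold without producing a score in the millions. Whether this fallback suits a given dataset is a judgment call.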
