# Mahalanobis Distance
Mahalanobis Distance measures the distance between a point and a distribution, taking into account the correlations between variables and their different scales.

Unlike Euclidean distance, which treats all directions equally, Mahalanobis distance accounts for correlated variables and unequal variances.

## Mathematical Definition
For a single point x and a distribution with mean μ and covariance matrix Σ:


``` D² = (x - μ)ᵀ Σ⁻¹ (x - μ)```
Where:

- x = data point (vector)
- μ = mean vector of the distribution
- Σ = covariance matrix of the distribution
- Σ⁻¹ = inverse of the covariance matrix
- D² = squared Mahalanobis distance (the distance itself is D = √D²)
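To make the role of correlation explicit, the definition can be expanded for the two-feature case. This is a sketch under assumed notation that does not appear in the original text: σ₁ and σ₂ are the standard deviations of the two features, ρ is their correlation, and zᵢ is the standardized deviation of feature i.

```latex
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},
\qquad z_i = \frac{x_i - \mu_i}{\sigma_i}

D^2 = \frac{z_1^2 - 2\rho\, z_1 z_2 + z_2^2}{1 - \rho^2}
```

With ρ = 0 this reduces to z₁² + z₂², which is Euclidean distance on standardized features. As |ρ| approaches 1 the denominator shrinks, so points that deviate against the correlation direction become very distant.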

## Geometric Interpretation
Think of the data as an ellipsoid (like a stretched sphere):

- Euclidean distance: Measures distance as if the data were a perfect sphere
- Mahalanobis distance: Measures distance accounting for the actual shape and orientation of the ellipsoid
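The ellipsoid picture can be checked with a small sketch. It assumes a hypothetical two-feature distribution with unit variances and correlation 0.8 (these numbers are invented for illustration), and compares two points that are equally far from the mean in Euclidean terms: one along the long axis of the ellipse, one across it.

```python
import math

# Hypothetical distribution: zero mean, unit variances, correlation 0.8.
mean = (0.0, 0.0)
cov = ((1.0, 0.8), (0.8, 1.0))


def mahalanobis(point, mean, cov):
    """Mahalanobis distance for two features, using the closed-form 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # (dx, dy) Σ⁻¹ (dx, dy)ᵀ with Σ⁻¹ = (1/det) * [[d, -b], [-b, a]]
    return math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)


along = (1.0, 1.0)     # follows the correlation (long axis of the ellipse)
against = (1.0, -1.0)  # breaks the correlation (short axis of the ellipse)

for label, point in (("along", along), ("against", against)):
    print(label, round(math.dist(point, mean), 3), round(mahalanobis(point, mean, cov), 3))
```

Both points have the same Euclidean distance, but the point that runs against the correlation is roughly three times farther in Mahalanobis terms.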

## Step-by-Step Example
Let's use height-weight data to understand the intuition:
```text
Dataset:


Person A: [Height=170cm, Weight=65kg]
Person B: [Height=175cm, Weight=70kg]
Person C: [Height=180cm, Weight=75kg]
Person D: [Height=190cm, Weight=40kg]  ← Suspicious!

Step 1: Calculate Mean Vector (μ)

Height mean = (170+175+180+190)/4 = 178.75
Weight mean = (65+70+75+40)/4 = 62.5

μ = [178.75, 62.5]

Step 2: Calculate Covariance Matrix (Σ)

Covariance measures how variables change together:

Σ = [[Var(Height),    Cov(Height, Weight)],
     [Cov(Weight, Height), Var(Weight)]]

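After calculation (sample covariance, dividing by n - 1 = 3):
Σ = [[ 72.92, -95.83],
     [-95.83, 241.67]]

The covariance is negative because Person D (tall but very light) is part of
the sample and outweighs the positive trend of A, B, and C.

Step 3: Calculate Inverse Covariance Matrix (Σ⁻¹)

Σ⁻¹ = [[0.028642, 0.011358],
       [0.011358, 0.008642]]

Step 4: Calculate Mahalanobis Distance for Person D

x - μ = [190-178.75, 40-62.5] = [11.25, -22.5]

D² = [11.25, -22.5] × Σ⁻¹ × [11.25, -22.5]ᵀ
   = [11.25, -22.5] × [[0.028642, 0.011358], [0.011358, 0.008642]] × [11.25, -22.5]ᵀ

Step 4a: [11.25×0.028642 + (-22.5)×0.011358, 11.25×0.011358 + (-22.5)×0.008642]
       = [0.3222 - 0.2556, 0.1278 - 0.1944] ≈ [0.0667, -0.0667]

Step 4b: [0.0667, -0.0667] × [11.25, -22.5]ᵀ
       = 0.0667×11.25 + (-0.0667)×(-22.5)
       ≈ 0.75 + 1.50 = 2.25

D = √2.25 = 1.50

Comparison with Euclidean Distance

Let's compare both distances for Person D:

Euclidean distance from mean:
√((190-178.75)² + (40-62.5)²) = √(126.56 + 506.25) = √632.81 = 25.16

Mahalanobis distance: 1.50

Why is the Mahalanobis value so much smaller? The two numbers are not on the same scale:

Euclidean distance is in raw units (cm and kg mixed together), so its size depends on the units chosen

Mahalanobis distance is unitless and behaves like a number of standard deviations, measured along the directions in which the data actually varies

Why is it only 1.50 for such an odd point? Person D is part of the sample used to estimate μ and Σ, so it inflates the variances and flips the sign of the covariance. With the sample covariance, no point in a sample of n = 4 can exceed D = (n-1)/√n = 1.5. Person D sits exactly at that bound, barely above A and C (D ≈ 1.32 each).

When μ and Σ come from clean reference data in which height and weight are positively correlated (tall people tend to weigh more), a tall, lightweight person breaks that correlation and receives a much larger Mahalanobis distance.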

The Real Power: Understanding Correlations
Consider this more dramatic example:

Scenario: Tech company employees


Features: [Years_Experience, Salary]
Normal pattern: Experience ↑ → Salary ↑

Normal employees:
[2, 60000], [5, 75000], [8, 90000], [12, 120000]

Suspicious employee:
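[1, 200000]  ← 1 year experience, $200K salary

Raw Euclidean distance is dominated by the salary axis because of its much larger scale, so it cannot tell whether a high salary fits the experience trend or breaks it. Mahalanobis distance, with μ and Σ estimated from the normal employees, flags this point because it breaks the expected correlation pattern.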

Statistical Properties

Chi-Square Distribution:

For multivariate normal data, squared Mahalanobis distances follow a Chi-Square distribution (exactly when μ and Σ are known, approximately when they are estimated from a large sample)

Degrees of freedom = number of features (p)

Outlier threshold: D² > χ²(p, q), where q is the quantile (e.g., 0.95, 0.99), which corresponds to a significance level α = 1 - q

Critical Values:

For p=2 features, q=0.95: χ²(2, 0.95) = 5.991

For p=2 features, q=0.99: χ²(2, 0.99) = 9.210
```
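The worked example can be recomputed from the raw data so that each intermediate quantity is checkable. The sketch below is dependency-free and limited to two features. The helper names (`mean_and_cov`, `mahalanobis_sq`, `chi2_quantile_2dof`) are made up for this illustration, and the covariance uses the sample (n - 1) denominator, which is an assumption about the intended convention.

```python
import math

people = {
    "A": (170.0, 65.0),
    "B": (175.0, 70.0),
    "C": (180.0, 75.0),
    "D": (190.0, 40.0),
}


def mean_and_cov(points):
    """Return the mean vector and the 2x2 sample covariance (n - 1 denominator)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance for two features (closed-form 2x2 inverse)."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def chi2_quantile_2dof(q):
    """Chi-square quantile for 2 degrees of freedom.

    For p = 2 the CDF is 1 - exp(-x / 2), so it inverts in closed form.
    Other values of p need a statistics library.
    """
    return -2.0 * math.log(1.0 - q)


mean, cov = mean_and_cov(list(people.values()))
threshold = chi2_quantile_2dof(0.95)

print("mean:", mean)
print("covariance:", cov)
for name, point in people.items():
    d2 = mahalanobis_sq(point, mean, cov)
    print(
        f"{name}: D² = {d2:.2f}, D = {math.sqrt(d2):.2f}, "
        f"Euclidean = {math.dist(point, mean):.2f}, flagged = {d2 > threshold}"
    )
```

Because all four people, including the suspicious one, feed into μ and Σ, this run also shows how a tiny contaminated sample can hide its own outlier.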

## When to Use Mahalanobis Distance
- Excellent for:
  - Multivariate outlier detection
  - Correlated features
  - Different scales and units
  - Quality control processes
  - Anomaly detection in complex systems

- Requirements:
  - More observations than features (n > p)
  - No perfect multicollinearity
  - Roughly multivariate normal distribution

- Limitations:
  - Sensitive to outliers in mean/covariance estimation
  - Computationally expensive for high dimensions
  - Requires matrix inversion (can be unstable)
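The multicollinearity requirement is easy to trip over. In the sketch below, three height-weight points lie exactly on a line, so the covariance matrix has determinant zero and cannot be inverted. The function names and the tolerance value are illustrative assumptions, not part of any library.

```python
def sample_cov_2d(points):
    """2x2 sample covariance (n - 1 denominator) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))


def inverse_2x2(cov, tol=1e-12):
    """Invert a symmetric 2x2 matrix, refusing singular or near-singular input."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    # Compare the determinant with the scale of the matrix, not with an absolute number.
    if abs(det) <= tol * max(a * d, 1.0):
        raise ValueError("covariance matrix is singular or nearly singular")
    return ((d / det, -b / det), (-b / det, a / det))


# Weight = height - 105 for every point: perfect multicollinearity.
collinear = [(170.0, 65.0), (175.0, 70.0), (180.0, 75.0)]

try:
    inverse_2x2(sample_cov_2d(collinear))
    invertible = True
except ValueError:
    invertible = False

print("invertible:", invertible)
```

The same failure occurs whenever there are too few observations for the number of features, which is why n > p is listed as a requirement.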

## Robust Mahalanobis Distance
Problem: The classical estimate computes μ and Σ from all observations, so outliers distort the very quantities used to detect them.

- Solution: Use robust estimators:
  - Minimum Covariance Determinant (MCD)
  - Median instead of mean for the location estimate
  - Other robust covariance estimators
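The idea behind MCD can be sketched by brute force on a tiny dataset: try every subset of h points, keep the one whose covariance matrix has the smallest determinant, and measure distances against that subset's mean and covariance. This is only an illustration. The data are made up, the choice h = 6 is an assumption, and practical implementations use an iterative search with additional correction factors instead of enumerating subsets.

```python
from itertools import combinations

# Made-up height/weight data: seven points on a trend plus one outlier.
points = [
    (160.0, 55.0), (165.0, 61.0), (170.0, 64.0), (172.0, 68.0),
    (175.0, 70.0), (180.0, 76.0), (185.0, 79.0),
    (190.0, 40.0),  # outlier
]


def mean_and_cov(pts):
    """Mean vector and 2x2 sample covariance (n - 1 denominator)."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def det2(cov):
    """Determinant of a symmetric 2x2 matrix."""
    (a, b), (_, d) = cov
    return a * d - b * b


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance for two features (closed-form 2x2 inverse)."""
    (a, b), (_, d) = cov
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det2(cov)


def brute_force_mcd(pts, h):
    """Mean and covariance of the h-point subset with the smallest determinant."""
    best = min(combinations(pts, h), key=lambda s: det2(mean_and_cov(s)[1]))
    return mean_and_cov(best)


outlier = points[-1]
classical = mahalanobis_sq(outlier, *mean_and_cov(points))
robust = mahalanobis_sq(outlier, *brute_force_mcd(points, h=6))

print(f"classical D² = {classical:.2f}")
print(f"robust D²    = {robust:.2f}")
```

The robust estimate ignores the outlier when fitting, so the outlier's distance is no longer suppressed by its own influence on μ and Σ.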

