# Local Correlation Integral
LOCI is a density-based outlier detection algorithm that goes beyond simple distance or density measures by providing a multi-granular approach to outlier detection. The key innovation is that it doesn't require a fixed neighborhood size - it evaluates points at all scales simultaneously.


Most methods (like LOF) use a fixed neighborhood size (k).

LOCI evaluates each point across a whole range of neighborhood radii.

Its outlier cut-off is data-driven (a multiple of the local standard deviation), so no dataset-specific threshold has to be tuned by hand.

## Mathematical Foundation
### 1. Core Definitions
r-Neighborhood:

``` N(x, r) = {y ∈ D | dist(x, y) ≤ r} ```
Points of the dataset D within distance r of x (x itself included)

Neighborhood Count:

``` n(x, r) = |N(x, r)| ```
Number of points in the r-neighborhood
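The two definitions above can be sketched in a few lines of Python. This is only an illustration: it assumes 1D data with absolute difference as `dist`, and the helper names `neighborhood` and `count` are made up for this example.

```python
def neighborhood(data, x, radius):
    """N(x, radius): all points of `data` within `radius` of x (x included if present)."""
    return [y for y in data if abs(x - y) <= radius]


def count(data, x, radius):
    """n(x, radius): size of the neighborhood."""
    return len(neighborhood(data, x, radius))


data = [1, 1.5, 2, 2.5, 3, 3.5, 4, 10]
print(neighborhood(data, 2, 0.5))  # [1.5, 2, 2.5]
print(count(data, 10, 2))          # 1
```

Because the boundary uses `<=`, points lying exactly at the radius are counted.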

### 2. Multi-Granular Deviation (MGD)
The basic comparison in LOCI is between the point counts at two nested radii around x:


``` MGD(x, r, α) = n(x, α·r) / n(x, r) ```
Where:

r = base radius

α = scaling factor (typically α = 1/√2 or α = 0.5)

n(x, r) = points within radius r of x

n(x, α·r) = points within radius α·r of x
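As a rough illustration of this ratio, the sketch below evaluates it for one cluster point and one isolated point. It assumes 1D data, absolute difference as the distance, and that x is itself a member of the data (so the denominator is never zero); `count` and `mgd` are hypothetical helper names.

```python
def count(data, x, radius):
    """n(x, radius): number of points within `radius` of x."""
    return sum(1 for y in data if abs(x - y) <= radius)


def mgd(data, x, r, alpha=0.5):
    """Ratio n(x, alpha*r) / n(x, r); assumes x is in `data`."""
    return count(data, x, alpha * r) / count(data, x, r)


data = [1, 1.5, 2, 2.5, 3, 3.5, 4, 10]
print(f"{mgd(data, 2.5, 2):.3f}")  # 0.714  (5 of 7 points)
print(f"{mgd(data, 10, 2):.3f}")   # 1.000  (both radii contain only the point itself)
```

Note that the isolated point gets a ratio of 1.0, so this ratio by itself does not expose isolation; the comparison against the counts of neighboring points is what does.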

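### 3. Local Correlation Integral (LOCI) Score

``` MDEF(x, r, α) = 1 - (n(x, α·r) / avg_n(α·r)) ```

``` σ_MDEF(x, r, α) = σ_n(α·r) / avg_n(α·r) ```
Where:

avg_n(α·r) = average of n(y, α·r) for all y in N(x, r)

σ_n(α·r) = standard deviation of n(y, α·r) for all y in N(x, r)

MDEF (multi-granularity deviation factor) is close to 0 when x has about as many close neighbors as the points around it, and approaches 1 when x has far fewer.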
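A minimal sketch of these quantities follows. It takes σ_MDEF to be the standard deviation of the neighbors' counts divided by their average, uses the population standard deviation (`statistics.pstdev`), and assumes 1D data with absolute difference as the distance. These are choices made for this example, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def count(data, x, radius):
    """n(x, radius): number of points within `radius` of x."""
    return sum(1 for y in data if abs(x - y) <= radius)


def mdef_and_sigma(data, x, r, alpha=0.5):
    """Return (MDEF, sigma_MDEF) for x at sampling radius r."""
    sampling = [y for y in data if abs(x - y) <= r]          # N(x, r)
    counts = [count(data, y, alpha * r) for y in sampling]   # n(y, alpha*r) for each neighbor
    avg_n = mean(counts)
    mdef = 1 - count(data, x, alpha * r) / avg_n
    sigma_mdef = pstdev(counts) / avg_n
    return mdef, sigma_mdef


data = [1, 1.5, 2, 2.5, 3, 3.5, 4, 10]
print(mdef_and_sigma(data, 10, 2))  # (0.0, 0.0): the only point in N(10, 2) is 10 itself
mdef, sigma = mdef_and_sigma(data, 3, 2)
print(f"{mdef:.3f}")                # -0.207: point 3 has more close neighbors than average
```

A negative MDEF simply means the point sits in a denser spot than its neighbors do on average.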

Outlier Detection Rule:


``` If MDEF(x, r, α) > k·σ_MDEF(x, r, α) for some r   then x is an outlier ```
Where:

k = typically 3 (like 3-sigma rule)

The algorithm tests all r values up to some maximum
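The rule can be sketched as a scan over candidate radii. The code below is an illustrative brute-force version, not an optimized or reference implementation: the list of radii, the synthetic dataset, and the name `is_loci_outlier` are assumptions of this example, as is the use of the population standard deviation divided by the average count for σ_MDEF.

```python
from statistics import mean, pstdev


def is_loci_outlier(data, x, radii, alpha=0.5, k=3.0):
    """True if MDEF > k * sigma_MDEF at any radius in `radii` (1D data)."""
    for r in radii:
        sampling = [y for y in data if abs(x - y) <= r]
        # n(y, alpha*r) for every point y in the sampling neighborhood
        counts = [sum(1 for z in data if abs(y - z) <= alpha * r) for y in sampling]
        avg_n = mean(counts)
        own = sum(1 for z in data if abs(x - z) <= alpha * r)
        mdef = 1 - own / avg_n
        sigma_mdef = pstdev(counts) / avg_n
        if mdef > k * sigma_mdef:
            return True
    return False


# Synthetic data: 20 evenly spaced cluster points plus one distant point.
data = list(range(20)) + [100]
radii = [10, 20, 50, 100]
print(is_loci_outlier(data, 100, radii))  # True
print(is_loci_outlier(data, 10, radii))   # False
```

The distant point is flagged only at r = 100, the first radius whose sampling neighborhood reaches the cluster; at smaller radii it is compared only with itself.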

## Step-by-Step Example
Let's use a simple 1D example to see how the counts behave.

Dataset: [1, 1.5, 2, 2.5, 3, 3.5, 4, 10]

Outlier: 10

Parameters: α = 0.5, k = 3

```text
Step 1: Counts for point 10 at r = 2 (so α·r = 0.5 × 2 = 1)

N(10, 2) = points within distance 2 of 10 = {10} only
n(10, 2) = 1

N(10, 1) = points within distance 1 of 10 = {10} only
n(10, 1) = 1

Step 2: Counts for a cluster point, x = 3, at the same r = 2

N(3, 2) = points within distance 2 = {1, 1.5, 2, 2.5, 3, 3.5, 4} = 7 points
N(3, 1) = points within distance 1 = {2, 2.5, 3, 3.5, 4} = 5 points

Step 3: MDEF for point 10 at r = 2

n(10, 1) = 1
avg_n(1) over N(10, 2) = n(10, 1) = 1   (10 is the only point in N(10, 2))
MDEF(10, 2, 0.5) = 1 - 1/1 = 0
```

The key insight: every neighborhood contains the point itself, so at r = 2 point 10 is compared only with itself and its MDEF is 0. An isolated point looks unremarkable at small radii; its deviation shows up only once r is large enough for the sampling neighborhood N(10, r) to reach the cluster (here r ≥ 6, the distance from 10 to 4). This is why LOCI scans a range of radii instead of fixing a single one.
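To see how point 10 behaves as the radius grows, the example can be continued numerically. The sketch below is an illustration under stated assumptions: σ_MDEF is taken as the population standard deviation of the neighbors' counts divided by their average, the radii 2, 6 and 8 are picked by hand, and the helper names are hypothetical.

```python
from statistics import mean, pstdev

data = [1, 1.5, 2, 2.5, 3, 3.5, 4, 10]


def n(x, radius):
    """Number of points within `radius` of x."""
    return sum(1 for y in data if abs(x - y) <= radius)


def mdef_stats(x, r, alpha=0.5):
    """Return (MDEF, sigma_MDEF) for x at sampling radius r."""
    counts = [n(y, alpha * r) for y in data if abs(x - y) <= r]
    avg_n = mean(counts)
    return 1 - n(x, alpha * r) / avg_n, pstdev(counts) / avg_n


for r in (2, 6, 8):
    mdef, sigma = mdef_stats(10, r)
    print(f"r={r}: MDEF={mdef:.3f}, sigma_MDEF={sigma:.3f}")
# r=2: MDEF=0.000, sigma_MDEF=0.000
# r=6: MDEF=0.750, sigma_MDEF=0.750
# r=8: MDEF=0.833, sigma_MDEF=0.373
```

Once the sampling neighborhood reaches the cluster, MDEF for point 10 jumps close to 1. With so few points, though, σ_MDEF is also large: at r = 8 the k = 3 cut-off is about 3 × 0.373 ≈ 1.12, which MDEF = 0.833 does not exceed. On a dataset this small the deviation is clearly visible but the 3-sigma test remains conservative; a larger cluster in the sampling neighborhood shrinks σ_MDEF relative to MDEF.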