# Local Outlier Factor (LOF)
Uzair Ahmad

## Explained with a Numeric Example

---

## 1. Concept

**Local Outlier Factor (LOF)** measures how *isolated* a point is compared to its **local neighborhood**.  

It compares the **local density** around a point with the densities of its **neighbors**.

 🔹 If a point’s region is *less dense* than its neighbors → it’s an **outlier**.  
 🔹 If density is similar → it’s a **normal** point.

---

## 2. Key Terms

| Term | Definition |
|------|-------------|
| **k-distance(A)** | Distance from A to its *k-th nearest neighbor* |
| **Reachability distance** | `reach-distₖ(A,B) = max(k-distance(B), distance(A,B))` |
| **Local Reachability Density (LRD)** | `LRDₖ(A) = 1 / average(reach-distₖ(A,B))`, averaged over the k nearest neighbors B of A |
| **Local Outlier Factor (LOF)** | `LOFₖ(A) = average(LRD of A's neighbors) / LRD(A)` |

**Interpretation:**
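- LOF ≈ 1 → density similar to its neighbors (normal)  
- LOF > 1 → lower density than its neighbors (possible outlier)  
- LOF ≫ 1 (e.g., > 2) → much lower density than its neighbors (strong outlier)  
- LOF < 1 → higher density than its neighbors (inlier)  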
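
The interpretation rules can be sketched as a small helper. This is only an illustration: the function name `interpret_lof` and the cutoffs `tol` and `strong` are assumptions made for this example, not part of the LOF definition, and sensible thresholds depend on the data set.

```python
def interpret_lof(score, tol=0.1, strong=2.0):
    """Map a LOF score to a rough label.

    `tol` and `strong` are hypothetical cutoffs chosen for illustration.
    """
    if score > strong:
        return "strong outlier"
    if score > 1.0 + tol:
        return "possible outlier"
    # Scores near or below 1 mean the point is at least as dense as its neighbors.
    return "normal"


for s in (0.9, 1.0, 1.5, 25.0):
    print(f"LOF = {s:>5.1f} -> {interpret_lof(s)}")
```

In practice the cutoff is usually tuned per data set rather than fixed in advance.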

---

## 📊 3. Example Data (1D for simplicity)

| Point | Value |
|-------|--------|
| A | 1.0 |
| B | 1.2 |
| C | 1.1 |
| D | 5.0 |
| E | 1.3 |

Let’s choose **k = 2**.

---

### Step 1: Pairwise Distances

| Pair | Distance |
|------|-----------|
| A–B | 0.2 |
| A–C | 0.1 |
| A–E | 0.3 |
| A–D | 4.0 |
| B–C | 0.1 |
| B–E | 0.1 |
| B–D | 3.8 |
| C–E | 0.2 |
| C–D | 3.9 |
| E–D | 3.7 |
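
As a quick check, the distance table can be reproduced in plain Python. This is a minimal sketch; the names `points` and `pair_dist` are illustrative and are not taken from the notebook code.

```python
from itertools import combinations

# Example data from the table in section 3 (1D values).
points = {"A": 1.0, "B": 1.2, "C": 1.1, "D": 5.0, "E": 1.3}

# In 1D the Euclidean distance is just the absolute difference.
pair_dist = {
    (p, q): abs(points[p] - points[q])
    for p, q in combinations(points, 2)
}

for (p, q), d in pair_dist.items():
    print(f"{p}-{q}: {d:.1f}")
```

Five points give 10 unordered pairs, which matches the number of rows in the table.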

---

### Step 2: k-Distance and Nearest Neighbors

| Point | 2 Nearest Neighbors | k-distance |
|--------|---------------------|-------------|
| A | C (0.1), B (0.2) | 0.2 |
| B | C (0.1), E (0.1) | 0.1 |
| C | B (0.1), A (0.1) | 0.1 |
| E | B (0.1), C (0.2) | 0.2 |
| D | E (3.7), B (3.8) | 3.8 |
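
The neighbor table can be derived by sorting each point's distances and keeping the first k. The sketch below is one plausible way to do this in plain Python. It assumes exactly k neighbors are kept per point, whereas the formal LOF definition also includes any further points tied at the k-distance. The helper name `k_nearest` is hypothetical.

```python
points = {"A": 1.0, "B": 1.2, "C": 1.1, "D": 5.0, "E": 1.3}
k = 2


def k_nearest(name):
    """Return the k nearest neighbors of `name` as (neighbor, distance) pairs."""
    others = [(q, abs(points[name] - points[q])) for q in points if q != name]
    # Sort by distance and keep the k closest points.
    return sorted(others, key=lambda pair: pair[1])[:k]


neighbors = {p: k_nearest(p) for p in points}
# The k-distance is the distance to the last (k-th) neighbor in the sorted list.
k_distance = {p: neighbors[p][-1][1] for p in points}

for p in points:
    names = ", ".join(f"{q} ({d:.1f})" for q, d in neighbors[p])
    print(f"{p}: {names} | k-distance = {k_distance[p]:.1f}")
```

Neighbors at equal distance, such as C and E for point B, may be listed in either order. This does not affect the k-distance.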

---

### Step 3: Reachability Distances and Local Reachability Density (LRD)

Example for **A**:

`reach-dist(A,C) = max(k-distance(C), distance(A,C)) = max(0.1, 0.1) = 0.1`

`reach-dist(A,B) = max(k-distance(B), distance(A,B)) = max(0.1, 0.2) = 0.2`

`average reach-dist = (0.1 + 0.2) / 2 = 0.15`

`LRD(A) = 1 / 0.15 ≈ 6.67`


| Point | Neighbors | Avg Reach-Dist | LRD |
|--------|------------|----------------|------|
| A | B, C | 0.15 | 6.67 |
| B | C, E | 0.15 | 6.67 |
| C | A, B | 0.15 | 6.67 |
| E | B, C | 0.15 | 6.67 |
| D | E, B | 3.75 | 0.27 |
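
The same calculation for every point can be sketched in a few lines of plain Python. The helper names (`dist`, `knn`, `k_distance`, `reach_dist`, `lrd`) are assumptions made for this illustration. It also shows that the reachability distance is not symmetric, because it uses the k-distance of its second argument.

```python
points = {"A": 1.0, "B": 1.2, "C": 1.1, "D": 5.0, "E": 1.3}
k = 2


def dist(p, q):
    return abs(points[p] - points[q])


def knn(p):
    """Names of the k nearest neighbors of p, closest first."""
    return sorted((q for q in points if q != p), key=lambda q: dist(p, q))[:k]


def k_distance(p):
    return dist(p, knn(p)[-1])


def reach_dist(p, o):
    """reach-dist_k(p, o) = max(k-distance(o), distance(p, o))."""
    return max(k_distance(o), dist(p, o))


def lrd(p):
    """Inverse of the mean reachability distance from p to its k neighbors."""
    reach = [reach_dist(p, o) for o in knn(p)]
    return 1.0 / (sum(reach) / len(reach))


for p in points:
    print(f"LRD({p}) = {lrd(p):.2f}")

# Swapping the arguments changes which k-distance is used.
print(f"reach-dist(A, C) = {reach_dist('A', 'C'):.1f}")
print(f"reach-dist(C, A) = {reach_dist('C', 'A'):.1f}")
```

Here `reach-dist(C, A)` is larger than `reach-dist(A, C)` because A's k-distance (0.2) exceeds the actual distance between the two points (0.1).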

---

### Step 4: Compute LOF

Formula:

`LOF(A) = average(LRD of A's neighbors) / LRD(A)`

| Point | Neighbors | Avg Neighbor LRD | LOF |
|--------|------------|------------------|------|
| A | B, C | 6.67 | 1.0 |
| B | C, E | 6.67 | 1.0 |
| C | A, B | 6.67 | 1.0 |
| E | B, C | 6.67 | 1.0 |
| D | E, B | 6.67 | **25.0** |
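
The LOF column can be recomputed directly from the Step 3 values. The sketch below copies the average reachability distances and neighbor lists from that table; the dictionary and function names are illustrative. It also shows how much rounding the intermediate LRDs affects the result for D.

```python
# Average reachability distances and neighbor lists, copied from the Step 3 table.
avg_reach = {"A": 0.15, "B": 0.15, "C": 0.15, "E": 0.15, "D": 3.75}
neighbors = {
    "A": ["B", "C"],
    "B": ["C", "E"],
    "C": ["A", "B"],
    "E": ["B", "C"],
    "D": ["E", "B"],
}

# LRD is the inverse of the average reachability distance.
lrd = {p: 1.0 / r for p, r in avg_reach.items()}


def lof(p):
    """Mean LRD of p's neighbors divided by p's own LRD."""
    mean_neighbor_lrd = sum(lrd[o] for o in neighbors[p]) / len(neighbors[p])
    return mean_neighbor_lrd / lrd[p]


for p in avg_reach:
    print(f"LOF({p}) = {lof(p):.2f}")

# Dividing the already-rounded LRDs gives a slightly different number for D.
print(f"rounded inputs: 6.67 / 0.27 = {6.67 / 0.27:.1f}")
```

With unrounded LRDs, the score for D is 3.75 / 0.15 = 25.0. Dividing the rounded values 6.67 and 0.27 gives about 24.7 instead.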

---

## ✅ 4. Interpretation

| Point | LOF | Meaning |
|--------|------|----------|
| A, B, C, E | ≈ 1 | Normal points |
| D | 25.0 | **Strong outlier** |

---

## Intuitive Summary

- LOF checks how *dense* your neighborhood is compared to your neighbors’.  
- If you live in a sparse area while your neighbors are close together → **you’re the outlier!**


```python
import numpy as np
from scipy.spatial.distance import cdist

class LocalOutlierFactor:
    """
    Custom implementation of the Local Outlier Factor (LOF) algorithm.
    """
    def __init__(self, n_neighbors=2):
        self.n_neighbors = n_neighbors
        self.X = None
        self.lrd_ = None  # local reachability densities
        self.lof_ = None  # LOF scores

    def _k_distance(self, distances):
        """
        Find the k-distance (distance to the k-th nearest neighbor) and
        the indices of the k-nearest neighbors for each point.
        """
        # Mask self-distances so a point is never picked as its own neighbor,
        # even when the data contains exact duplicates
        masked = distances.copy()
        np.fill_diagonal(masked, np.inf)
        # Sort distances for each point; a stable sort keeps ties in index order
        sorted_indices = np.argsort(masked, axis=1, kind="stable")
        # Indices of the k nearest neighbors (self is excluded by the mask)
        neighbors = sorted_indices[:, :self.n_neighbors]
        # The k-distance is the distance to the last of those k neighbors
        k_distances = masked[np.arange(len(masked)), neighbors[:, -1]]
        return k_distances, neighbors

    def _reachability_distance(self, k_distances, distances, neighbors):
        """
        Compute the reachability distance between each point and its neighbors.
        reach_dist(p, o) = max{k-distance(o), distance(p, o)}
        """
        n_samples = len(distances)
        reach_dists = np.zeros((n_samples, self.n_neighbors))
        # Loop over all points
        for i in range(n_samples):
            for j, o in enumerate(neighbors[i]):
                # Reachability distance considers both the true distance and k-distance of neighbor
                reach_dists[i, j] = max(k_distances[o], distances[i, o])
        return reach_dists

    def _local_reachability_density(self, reach_dists):
        """
        Compute local reachability density (LRD) for each point.
        LRD(p) = 1 / (average reachability distance from p to its k neighbors)
        """
        # Mean reachability distance for each point
        mean_reach = np.mean(reach_dists, axis=1)
        # Avoid division by zero
        lrd = 1.0 / np.maximum(mean_reach, 1e-10)
        return lrd

    def _local_outlier_factor(self, lrd, neighbors):
        """
        Compute LOF for each point.
        LOF(p) = average over neighbors o of [LRD(o) / LRD(p)]
        """
        n_samples = len(lrd)
        lof_scores = np.zeros(n_samples)
        for i in range(n_samples):
            # Ratio of neighbors' LRDs to the point's LRD
            ratios = lrd[neighbors[i]] / lrd[i]
            # LOF is the mean of these ratios
            lof_scores[i] = np.mean(ratios)
        return lof_scores

    def fit(self, X):
        """
        Fit the LOF model to data and compute LOF scores.
        X must be 2D with shape (n_samples, n_features).
        """
        # Convert input to numpy array
        self.X = np.array(X, dtype=float)

        # Each point needs k neighbors other than itself
        if not 1 <= self.n_neighbors < len(self.X):
            raise ValueError("n_neighbors must be between 1 and n_samples - 1.")

        # Compute pairwise distances between all points
        distances = cdist(self.X, self.X)

        # Step 1: Compute k-distance and neighbors
        k_distances, neighbors = self._k_distance(distances)

        # Step 2: Compute reachability distances
        reach_dists = self._reachability_distance(k_distances, distances, neighbors)

        # Step 3: Compute local reachability densities
        self.lrd_ = self._local_reachability_density(reach_dists)

        # Step 4: Compute LOF scores
        self.lof_ = self._local_outlier_factor(self.lrd_, neighbors)

        # Print intermediate details for clarity
        print("=== Intermediate Details ===")
        for i, (lr, lof) in enumerate(zip(self.lrd_, self.lof_)):
            print(f"Point {i}: LRD = {lr:.4f}, LOF = {lof:.4f}")
        print("=============================\n")

        return self

    def score_samples(self):
        """
        Return the LOF scores (higher → more likely to be outlier).
        """
        if self.lof_ is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        return self.lof_


# =============================
# Example Usage
# =============================

# 1D example (same as our earlier explanation)
data_1d = np.array([[1.0], [1.2], [1.1], [5.0], [1.3]])

lof = LocalOutlierFactor(n_neighbors=2)
lof.fit(data_1d)

# Display LOF scores
for i, score in enumerate(lof.score_samples()):
    label = "OUTLIER" if score > 1.5 else "normal"
    print(f"Point {i} (x={data_1d[i][0]:.1f}): LOF = {score:.3f} → {label}")

# Try with 2D data as well
# data_2d = np.array([[1,1], [1.2,1.1], [1.1,1.3], [5,5], [1.3,1.2]])
# lof.fit(data_2d)
# print(lof.score_samples())
```


```text
=== Intermediate Details ===
Point 0: LRD = 6.6667, LOF = 1.0000
Point 1: LRD = 6.6667, LOF = 1.0000
Point 2: LRD = 6.6667, LOF = 1.0000
Point 3: LRD = 0.2667, LOF = 25.0000
Point 4: LRD = 6.6667, LOF = 1.0000
=============================

Point 0 (x=1.0): LOF = 1.000 → normal
Point 1 (x=1.2): LOF = 1.000 → normal
Point 2 (x=1.1): LOF = 1.000 → normal
Point 3 (x=5.0): LOF = 25.000 → OUTLIER
Point 4 (x=1.3): LOF = 1.000 → normal
```
