# Local Outlier Factor Outlier Detection Hello World

An example application of Local Outlier Factor (LOF) Outlier Detection.

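LOF is an **unsupervised** algorithm: you only need to choose the number of neighbors **(K)**. For each point, the algorithm compares the local density around that point with the local density around its K nearest neighbors, and returns a score describing how likely the point is to be an outlier. In scikit-learn, scoring new, previously unseen points additionally requires creating the model with `novelty=True`.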
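Scoring a new point can be sketched as follows. This is a minimal example, not part of the original notebook: the fixed seed, the names `lof_novelty` and `X_new`, and the two query points are assumptions chosen for illustration. It relies on scikit-learn's `novelty=True` mode, in which `predict` and `score_samples` may be called on unseen data.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)  # fixed seed (an assumption, for reproducibility)

# Training data: two tight clusters around (2, 2) and (-2, -2)
base = 0.3 * rng.randn(100, 2)
X_train = np.r_[base + 2, base - 2]

# novelty=True enables predict / score_samples on points not seen during fit
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof_novelty.fit(X_train)

# Two hypothetical query points: one at a cluster center, one far from both clusters
X_new = np.array([[2.0, 2.0], [10.0, 10.0]])

labels = lof_novelty.predict(X_new)             # 1 = inlier, -1 = outlier
lof_scores = -lof_novelty.score_samples(X_new)  # score_samples returns the negated LOF

print("Labels:", labels)
print("LOF scores:", lof_scores)
```

The point at the cluster center should be labeled an inlier and the distant point an outlier, with a much larger LOF for the latter.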

The score at which a point counts as an outlier depends on how cleanly the data is separated. For example, if the data is well organized and clean, a point with an LOF of 1.03 can already be an outlier. In contrast, if the data is messy (i.e. the boundaries of the clusters are not clear), a point with an LOF of 1.1 can still be an inlier. As a rule of thumb, an LOF close to 1 means the point is about as dense as its neighbors, and the larger the LOF, the more likely the point is an outlier.
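To see what the raw scores look like, the sketch below fits LOF on data shaped like the data used later in this notebook and compares the median LOF of the two groups. The seed and the variable names (`X_in`, `X_out`, `lof_values`) are assumptions made for this example. scikit-learn stores the negated LOF of the training points in `negative_outlier_factor_`, so the sign is flipped to recover the LOF itself.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)  # fixed seed (an assumption, for reproducibility)

# 200 inliers in two clusters, plus 20 uniformly spread outliers
base = 0.3 * rng.randn(100, 2)
X_in = np.r_[base + 2, base - 2]
X_out = rng.uniform(low=-4, high=4, size=(20, 2))
X_all = np.r_[X_in, X_out]

lof_model = LocalOutlierFactor(n_neighbors=20)
lof_model.fit(X_all)

# negative_outlier_factor_ holds -LOF for each training point
lof_values = -lof_model.negative_outlier_factor_

median_inlier_lof = np.median(lof_values[:len(X_in)])
median_outlier_lof = np.median(lof_values[len(X_in):])

print("Median LOF of inliers:  %.3f" % median_inlier_lof)
print("Median LOF of outliers: %.3f" % median_outlier_lof)
```

The inlier median should sit near 1, while the outlier median should be clearly larger. Where exactly to draw the line between the two is what the `contamination` parameter controls.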

**Reference:**
- https://scikit-learn.org/stable/auto_examples/neighbors/plot_lof_outlier_detection.html
- https://cat.chriz.hk/2020/11/knndbscan-lof.html


In [1]:
## Inlier Generation
import numpy as np

# Generate a 100x2 array of standard-normal samples and scale it by 0.3
X_inliers = 0.3 * np.random.randn(100, 2)

# Stack two shifted copies to form two clusters, centered at (2, 2) and (-2, -2).
# X_inliers now has shape (200, 2).
X_inliers = np.r_[X_inliers + 2, X_inliers - 2]


In [2]:
## Outlier Generation

# Generate 2D array of outliers ([20][2]) spreading uniformly from (-4,-4) to (4,4):
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))


In [3]:
## Data Concatenation

# Concatenate the two arrays (200 inliers + 20 outliers, i.e. shape (220, 2))
X = np.r_[X_inliers, X_outliers]

# Generate labels (inlier = 1 / outlier = -1)
n_outliers = len(X_outliers)
ground_truth = np.ones(len(X), dtype=int)
ground_truth[-n_outliers:] = -1


In [4]:
## Model Execution

# Import LOF from the neighbors module
from sklearn.neighbors import LocalOutlierFactor

# Create LOF with K=20. contamination is the expected fraction of outliers;
# 0.2 is above the true fraction here (20/220, about 0.09), so some inliers get flagged.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.2)

# Fit the model and label each point in X (1 = inlier, -1 = outlier)
y_pred = lof.fit_predict(X)


In [5]:
## Accuracy Evaluation
# Count the points whose predicted label differs from the ground truth
n_errors = (y_pred != ground_truth).sum()
print("Prediction errors: %d" % (n_errors))
print("Accuracy: %f" % (1-(n_errors/len(X))))


Prediction errors: 24
Accuracy: 0.890909
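
The reported accuracy of 0.890909 equals 1 - 24/220, which matches a dataset of 220 points (200 inliers and 20 outliers). With `contamination=0.2`, roughly 20% of the 220 points are flagged, which is more than the 20 true outliers, so a number of inliers are necessarily counted as errors. The sketch below compares that setting with a contamination equal to the true outlier fraction. The seed, the helper `count_errors`, and the variable names are assumptions made for this example, and the exact counts will differ from the unseeded run above.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)  # fixed seed (an assumption, for reproducibility)

# Same data layout as the notebook: 200 inliers, 20 outliers
base = 0.3 * rng.randn(100, 2)
X_in = np.r_[base + 2, base - 2]
X_out = rng.uniform(low=-4, high=4, size=(20, 2))
X_eval = np.r_[X_in, X_out]

truth = np.ones(len(X_eval), dtype=int)
truth[-len(X_out):] = -1


def count_errors(contamination):
    """Hypothetical helper: return (points flagged as outliers, prediction errors)."""
    model = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    pred = model.fit_predict(X_eval)
    n_flagged = int((pred == -1).sum())
    n_wrong = int((pred != truth).sum())
    return n_flagged, n_wrong


true_fraction = len(X_out) / len(X_eval)  # 20 / 220

flagged_high, errors_high = count_errors(0.2)
flagged_true, errors_true = count_errors(true_fraction)

print("contamination=0.200: flagged=%d, errors=%d" % (flagged_high, errors_high))
print("contamination=%.3f: flagged=%d, errors=%d" % (true_fraction, flagged_true, errors_true))
```

In practice the true outlier fraction is usually unknown, so `contamination` is an estimate; this comparison only illustrates how strongly that estimate drives the error count.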
