# CBLOF (Cluster-based Local Outlier Factor) Hello World

This project uses the PyOD library to compute CBLOF scores for anomaly detection.

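CBLOF borrows the "local outlier" idea from LOF but works at the cluster level: the data is first clustered, the clusters are split into large and small ones, and each point is scored by the size of its cluster and its distance to the nearest large cluster. For more details, please refer to the [original paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.4242&rep=rep1&type=pdf). I found it quite hard to follow because it builds on the Squeezer clustering algorithm and introduces many new concepts. That's alright, though: you can simply treat the model as a black box and feed your data into it.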
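
To make the idea a little less of a black box, here is a minimal NumPy sketch of the scoring step. It is an illustrative simplification, not PyOD's implementation: the helper names `split_large_small` and `cblof_scores` are hypothetical, the clustering is assumed to be done already (every cluster non-empty), the first boundary that meets either the `alpha` or the `beta` condition is used, and the score is a plain Euclidean distance without cluster-size weighting.

```python
import numpy as np


def split_large_small(sizes, alpha=0.9, beta=5.0):
    """Return the ids of the 'large' clusters, given an array of cluster sizes."""
    order = np.argsort(-sizes)  # cluster ids, biggest cluster first
    n = sizes.sum()
    for b in range(1, len(order)):
        covered = sizes[order[:b]].sum()                # points in the b biggest clusters
        ratio = sizes[order[b - 1]] / sizes[order[b]]   # size drop to the next cluster
        if covered >= alpha * n or ratio >= beta:
            return order[:b]
    return order  # no boundary found: treat every cluster as large


def cblof_scores(X, labels, centroids, alpha=0.9, beta=5.0):
    """Unweighted CBLOF-style score: higher means more anomalous."""
    sizes = np.bincount(labels, minlength=len(centroids))
    large = split_large_small(sizes, alpha, beta)
    # Distance from every point to every centroid, shape (n_points, n_clusters)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    own = dists[np.arange(len(X)), labels]       # distance to own centroid
    nearest_large = dists[:, large].min(axis=1)  # distance to closest large centroid
    in_large = np.isin(labels, large)
    return np.where(in_large, own, nearest_large)


# Toy data: six tightly packed points (cluster 0) and one far-away point (cluster 1)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [0.05, 0.05], [0.0, 0.2], [3.0, 3.0]])
labels = np.array([0, 0, 0, 0, 0, 0, 1])
centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

scores = cblof_scores(X, labels, centroids)
print(scores.round(3))
```

In this toy example cluster 1 counts as small (cluster 0 is six times its size), so the lone point is scored by its distance to cluster 0's centroid and ends up with by far the highest score.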

**Reference:**
- https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/
- https://pyod.readthedocs.io/en/latest/index.html


In [1]:
# Install PyOD (Python Outlier Detection), written by Yue Zhao
#%pip install pyod
#%pip install --upgrade pyod  # to make sure that the latest version is installed!


In [2]:
# Import Libraries
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.font_manager

from pyod.models.abod import ABOD
from pyod.models.knn import KNN
from pyod.models.cblof import CBLOF 


In [3]:
## Inlier Generation

# Generate a (100, 2) array of standard-normal samples and scale it by 0.3
X_inliers = 0.3 * np.random.randn(100, 2)

# Stack two shifted copies, one centered at (2, 2) and one at (-2, -2),
# which gives 200 inliers in total
X_inliers = np.r_[X_inliers + 2, X_inliers - 2]
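
A quick shape check makes the size of the inlier set explicit, since `np.r_` stacks the two shifted copies row-wise. The sketch below mirrors the cell above; the seeded generator and the name `base` are assumptions added only so the check is reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
base = 0.3 * rng.standard_normal((100, 2))

# Row-wise stacking of two (100, 2) arrays gives a (200, 2) array
X_inliers = np.r_[base + 2, base - 2]
print(X_inliers.shape)  # (200, 2)
```

Note that both clusters reuse the same noise, so the second cluster is an exact shifted copy of the first.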


In [4]:
## Outlier Generation

# Generate a (20, 2) array of outliers drawn uniformly from [-4, 4) in each dimension:
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))


In [5]:
## Data Concatenation

# Concatenate the two arrays (200 inliers + 20 outliers, i.e. shape (220, 2))
X = np.r_[X_inliers, X_outliers]

# Generate labels (inlier = 0 / outlier = 1)
n_outliers = len(X_outliers)
ground_truth = np.zeros(len(X), dtype=int)
ground_truth[-n_outliers:] = 1
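
The labels also tell us the true share of outliers in the dataset, which is worth knowing before picking a `contamination` value. A minimal sketch, rebuilding the label vector from the counts used in this notebook (200 inliers, 20 outliers); `counts` and `true_contamination` are names introduced here for illustration:

```python
import numpy as np

n_inliers, n_outliers = 200, 20
ground_truth = np.zeros(n_inliers + n_outliers, dtype=int)
ground_truth[-n_outliers:] = 1  # the last 20 rows are the outliers

counts = np.bincount(ground_truth)        # [number of inliers, number of outliers]
true_contamination = ground_truth.mean()  # fraction of points labelled 1
print(counts, round(float(true_contamination), 4))
```

The true outlier ratio is 20/220, roughly 0.09.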


In [6]:
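# Create the CBLOF model
# contamination = expected proportion of outliers in the data.
# The true proportion here is 20/220 (about 0.09), so 0.2 overestimates it.
cblof = CBLOF(contamination=0.2)

# Fit the model to the dataset
cblof.fit(X)

# Predict a label for each point (0 = inlier, 1 = outlier)
y_pred = cblof.predict(X)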
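
The `contamination` value decides how many points end up flagged. As far as I understand PyOD, binary labels come from thresholding the outlier scores at the `100 * (1 - contamination)` percentile. The sketch below illustrates that idea with random stand-in scores; it is not the library code, and `scores` here is made-up data rather than real CBLOF output.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(220)  # stand-in outlier scores; higher = more anomalous
contamination = 0.2

# Everything above the 80th percentile of the scores is labelled an outlier
threshold = np.percentile(scores, 100 * (1 - contamination))
labels = (scores > threshold).astype(int)
print(labels.sum())  # 44
```

With 220 points and `contamination=0.2`, 44 points are flagged, which is more than the 20 outliers that actually exist, so some inliers are necessarily misclassified.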


In [7]:
## Accuracy Evaluation
n_errors = (y_pred != ground_truth).sum()
print("Prediction errors: %d" % (n_errors))
print("Accuracy: %f" % (1-(n_errors/len(X))))


Prediction errors: 24
Accuracy: 0.890909
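
Accuracy alone can hide what kind of errors were made, especially when outliers are rare. A small sketch of a confusion-count breakdown follows. The helper `confusion_counts` is hypothetical, and the predictions are a made-up scenario (all 20 outliers caught, 24 inliers wrongly flagged) that happens to be consistent with 24 errors on 220 points; it is not the model's actual output.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (outlier) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn


# Hypothetical scenario: 200 inliers, 20 outliers, 24 inliers flagged by mistake
y_true = np.r_[np.zeros(200, dtype=int), np.ones(20, dtype=int)]
y_pred = y_true.copy()
y_pred[:24] = 1

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)          # share of flagged points that are real outliers
recall = tp / (tp + fn)             # share of real outliers that were flagged
accuracy = (tp + tn) / len(y_true)
print(tp, fp, fn, tn)
print("precision=%.3f recall=%.3f accuracy=%.3f" % (precision, recall, accuracy))
```

In the notebook itself, the same helper could be called as `confusion_counts(ground_truth, y_pred)` to see whether the errors are false alarms or missed outliers.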
