## What is Anomaly detection?

Source: https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1

Anomaly detection is the process of identifying unexpected items or events in data sets that differ from the norm. It is often applied to unlabeled data, in which case it is known as unsupervised anomaly detection. Anomaly detection rests on two basic assumptions:

- Anomalies only occur very rarely in the data.
- Their features differ from the normal instances significantly.

Source for below information: https://medium.com/analytics-vidhya/algorithm-selection-for-anomaly-detection-ef193fd0d6d1

<b>Supervised</b> learning algorithms can be used for anomaly detection when anomalies are already known and labelled data is available. These methods are particularly expensive when the labeling has to be done manually. Classification algorithms suited to imbalanced classes, such as Support Vector Machines (SVM) or Artificial Neural Networks (ANN), can be used for supervised anomaly detection.

<b>Semi-supervised</b> anomaly detection uses labelled data consisting only of normal instances, without any anomalies. The basic idea is that a model of the normal class is learned, and any deviation from that model is treated as an anomaly. Popular algorithms include Auto-Encoders, Gaussian Mixture Models and Kernel Density Estimation.
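To make the semi-supervised idea concrete, here is a minimal sketch using Kernel Density Estimation written directly in NumPy: a density model is fitted on normal data only, and test points whose log density falls below a threshold are flagged. The bandwidth of 0.5, the 1st-percentile threshold, the synthetic data and the helper name `kde_log_density` are all assumptions chosen for illustration, not tuned or prescribed values.

```python
import numpy as np

def kde_log_density(train, query, bandwidth):
    """Log density of each query point under a Gaussian KDE fitted on `train`."""
    d = train.shape[1]
    # Squared Euclidean distance from every query point to every training point
    sq = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    # Log of an isotropic Gaussian kernel with standard deviation `bandwidth`
    log_k = -sq / (2 * bandwidth**2) - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    # Average the kernels in log space (log-mean-exp) for numerical stability
    m = log_k.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 2))   # training set: normal instances only
X_test = np.array([[0.1, -0.2], [0.5, 0.3], [6.0, 6.0]])

bandwidth = 0.5  # assumed, not tuned
# Threshold at the 1st percentile of the training scores (each training point
# also contributes to its own density, which slightly inflates these scores)
threshold = np.percentile(kde_log_density(X_normal, X_normal, bandwidth), 1)

is_anomaly = kde_log_density(X_normal, X_test, bandwidth) < threshold
print(is_anomaly)
```

Only the far-away point `[6.0, 6.0]` should fall below the threshold. Any of the other listed models (an auto-encoder's reconstruction error, a GMM's likelihood) could replace the KDE as the "model of the normal class".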

<b>Unsupervised</b> learning methods are most commonly used to detect anomalies:
<img src="https://miro.medium.com/max/2498/1*7yJ2KiW3RFHP2nJppTP4mw.png" width=5000>

## What is PyOD?

The Python Outlier Detection (PyOD) module makes anomaly detection modeling easy. It collects a wide range of techniques, from supervised to unsupervised learning, behind a common interface.

## K-Nearest Neighbor (kNN): 

Source:https://towardsdatascience.com/anomaly-detection-with-pyod-b523fc47db9

The k-NN algorithm is a non-parametric method that finds the k closest training examples to a given point. Data points that are isolated, i.e. far from their neighbors, can potentially be classified as outliers.

Once a k-NN model from the Python Outlier Detection (PyOD) module has been trained, it can be applied to a test dataset to predict outliers. The method “decision_function()” returns the anomaly score of each point based on the trained model.

How is the “anomaly score” defined? Recall that the k-NN model measures closeness with the Euclidean distance. An outlier is a point that is far from its neighboring points, so its outlier score is defined by its distance to those neighbors. Every point gets an outlier score, and our job is to find the points with high scores. A histogram of the scores helps to choose a cut-off that separates these points from the rest.

<img src="https://miro.medium.com/max/572/1*W3anCSUzDwHzTM1BOAd5cA.png">
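The distance-based score can be sketched in a few lines of NumPy. This assumes the score is the distance to the k-th nearest training point, which is, to our understanding, what PyOD's `KNN` does with its default settings; the helper `knn_scores`, the synthetic data and the 95th-percentile cut-off are illustrative assumptions.

```python
import numpy as np

def knn_scores(X_train, X_query, k=5, exclude_self=False):
    """Anomaly score = Euclidean distance to the k-th nearest training point."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    dists.sort(axis=1)
    # When X_query is X_train, column 0 holds each point's zero distance to
    # itself, so shift by one to skip it
    return dists[:, k - 1 + int(exclude_self)]

rng = np.random.default_rng(42)
X_train = rng.normal(0, 1, size=(200, 2))
X_test = np.vstack([rng.normal(0, 1, size=(5, 2)), [[8.0, 8.0]]])

train_scores = knn_scores(X_train, X_train, k=5, exclude_self=True)
test_scores = knn_scores(X_train, X_test, k=5)

# A histogram of the training scores shows where the long right tail starts
counts, edges = np.histogram(train_scores, bins=10)
print(counts)

threshold = np.percentile(train_scores, 95)   # assumed contamination of 5%
print(test_scores > threshold)
```

The brute-force distance matrix is fine for small data; for large data sets a tree-based neighbor search would be the usual choice.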

## Local Outlier Factor (LOF):

Source: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html

The Local Outlier Factor (LOF) algorithm is an unsupervised anomaly detection method which computes the local density deviation of a given data point with respect to its neighbors. It considers as outliers the samples that have a substantially lower density than their neighbors. 

The anomaly score of each sample is called Local Outlier Factor. It measures the local deviation of density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by k-nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers.

<img src="https://scikit-learn.org/stable/_images/sphx_glr_plot_lof_outlier_detection_001.png">
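The density comparison described above can be sketched from scratch in NumPy to show each quantity: the k-distance, the reachability distance, the local reachability density (lrd) and the final ratio. This is a simplified illustration that assumes no duplicate points and ignores distance ties; in practice scikit-learn's `LocalOutlierFactor` (linked above) would be used, and it reports the negated scores in `negative_outlier_factor_`. The helper name `lof_scores` and the data are hypothetical.

```python
import numpy as np

def lof_scores(X, k=10):
    """Local Outlier Factor of every row of X (assumes no duplicate points)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbors
    k_dist = d[np.arange(n), nbrs[:, -1]]        # k-distance: distance to the k-th neighbor
    # reach-dist(p, o) = max(k-distance(o), d(p, o)) for each neighbor o of p
    reach = np.maximum(k_dist[nbrs], d[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)               # local reachability density
    # LOF(p): mean density of p's neighbors divided by p's own density
    return lrd[nbrs].mean(axis=1) / lrd

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)), [[4.0, 4.0]]])
scores = lof_scores(X, k=10)
print(scores.argmax(), scores[-1])
```

Points inside the cluster get scores close to 1, while the isolated point at index 100 gets a score well above 1.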

## K-Means:

K-means clustering is a popular clustering algorithm that groups data points into k clusters by their feature values. The score of each data point is its distance to the centroid of its cluster, and data points that are far from the centroid of their cluster are labeled as anomalies.

<img src="https://www.researchgate.net/profile/Hazrat_Ali3/publication/326619835/figure/fig4/AS:654171346849792@1532978011669/Anomaly-detection-through-k-means-clustering-first-dataset.png">
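A minimal sketch of this scoring rule, assuming plain Lloyd's k-means implemented in NumPy (the `kmeans` helper, the two synthetic clusters and the extra point at `[2.5, -3.0]` are all illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns centroids and cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=-1).argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment with the returned centroids
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=-1).argmin(axis=1)
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, size=(100, 2)),
               rng.normal([5, 5], 0.5, size=(100, 2)),
               [[2.5, -3.0]]])
centroids, labels = kmeans(X, k=2)
# Anomaly score: distance of each point to the centroid of its own cluster
scores = np.linalg.norm(X - centroids[labels], axis=1)
print(scores.argmax())
```

Note that outliers also pull the centroids slightly toward themselves, which is one reason this score works best when anomalies are rare.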

## Robust Principal Component Analysis (RPCA):

Standard PCA is susceptible to outliers: a few corrupted points can pull the principal components away from the true low-dimensional structure. RPCA recovers this low-dimensional space more accurately, which is why it is preferred over standard PCA for anomaly detection.
RPCA recovers a low-rank matrix L and a sparse matrix S from the corrupted measurements M = L + S, where the nonzero entries of S correspond to the anomalies of the data set.

<img src="https://d3i71xaburhd42.cloudfront.net/bd32166541a656a4420c5c1ffc5701d09eedf2aa/4-Figure1-1.png">
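One common way to compute the decomposition M = L + S is Principal Component Pursuit solved with an augmented Lagrangian loop, alternating singular-value thresholding (for L) and elementwise soft thresholding (for S). The sketch below assumes that approach with the frequently used defaults λ = 1/√max(n1, n2) and μ = n1·n2 / (4·‖M‖₁); the helper names, the synthetic rank-2 data and the row-level score are illustrative.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, lam=None, tol=1e-7, max_iter=1000):
    """Principal Component Pursuit: split M into low-rank L plus sparse S."""
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                          # Lagrange multipliers
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) / norm_M < tol:
            break
    return L, S

rng = np.random.default_rng(0)
L_true = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 40))   # rank-2 "normal" data
M = L_true.copy()
M[7, :5] += 10.0       # sparse gross corruption in row 7
M[42, 10:15] -= 10.0   # and in row 42
L, S = rpca(M)
row_scores = np.abs(S).sum(axis=1)   # rows with large sparse parts are anomalous
print(np.argsort(row_scores)[-2:])
```

Summing |S| per row turns the entrywise anomalies into a per-sample score; other aggregations (max, norm) are equally plausible choices.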

## One Class SVM:

A standard SVM works by separating two classes using a hyperplane with the largest possible margin.

One-Class SVM is similar, but instead of using a hyperplane to separate two classes of instances, it uses a hypersphere to encompass all of the instances. Here the "margin" refers to the outside of the hypersphere, so "the largest possible margin" means "the smallest possible hypersphere".

<img src="https://upload.wikimedia.org/wikipedia/en/f/f4/One-class_data_description_TAX.png">

The image above shows the hypersphere containing the target data, with center a and radius R. Objects on the boundary are support vectors, and the two objects lying outside the boundary have slack greater than 0.
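As a usage sketch, scikit-learn's `OneClassSVM` can be used. Note that it implements the hyperplane formulation of Schölkopf et al. rather than the hypersphere (Tax and Duin) view pictured above; with an RBF kernel the two formulations are known to be equivalent. The parameter values `gamma=0.5` and `nu=0.05` and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(300, 2))
X_test = np.array([[0.0, 0.0], [0.5, -0.5], [5.0, 5.0]])

# nu upper-bounds the fraction of training points allowed outside the
# boundary (slack > 0) and lower-bounds the fraction of support vectors
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)

# predict returns +1 for points inside the boundary and -1 for outliers
print(clf.predict(X_test))
```

The `nu` parameter plays the role of the slack trade-off in the picture: a larger value lets more training points fall outside, giving a tighter boundary.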

## Isolation Forest

Isolation Forest is an outlier detection algorithm that returns an anomaly score for each sample. It is based on the fact that anomalies are data points that are few and different, which makes them easier to isolate than normal points. Isolation Forest is a tree-based model: in its trees, partitions are created by first randomly selecting a feature and then selecting a random split value between the minimum and maximum values of the selected feature.

An <b>advantage</b> of this algorithm is that it works with huge data sets that have many dimensions, the dimensions being the different features of the data set. The isolation of a single point proceeds as follows:

1. Select the point to isolate.
2. For each feature, set the initial range to the interval between that feature's minimum and maximum.
3. Choose a feature randomly.
4. Pick a value that’s in the range, again randomly:
    - If the point's value on that feature is above the chosen value, set the minimum of the feature's range to the chosen value.
    - If the point's value on that feature is below the chosen value, set the maximum of the feature's range to the chosen value.
5. Repeat steps 3 & 4 until the point is isolated, that is, until it is the only point inside the range on all features.
6. Count how many times you’ve had to repeat steps 3 & 4. We call this quantity the isolation number. Because anomalies are few and different, they are usually isolated after only a few splits, so a small isolation number (averaged over many random runs) indicates an outlier.

<img src="https://image.slidesharecdn.com/anomalydetection-150918115852-lva1-app6892/95/l14-anomaly-detection-19-638.jpg?cb=1442577598">
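The six steps can be transcribed almost literally into NumPy. This is a sketch of the isolation-number procedure only, assuming distinct points (duplicates could never be separated); the full Isolation Forest instead builds random trees on subsamples and normalizes path lengths, as scikit-learn's `IsolationForest` does. The helpers `isolation_number` and `mean_isolation_number` are hypothetical names.

```python
import numpy as np

def isolation_number(X, idx, rng):
    """One run of the isolation procedure for X[idx] (assumes distinct points)."""
    x = X[idx]
    lo = X.min(axis=0).astype(float)     # step 2: start from each feature's full range
    hi = X.max(axis=0).astype(float)
    count = 0
    # Step 5: stop once x is the only point inside the range on every feature
    while np.all((X >= lo) & (X <= hi), axis=1).sum() > 1:
        f = rng.integers(X.shape[1])     # step 3: random feature
        v = rng.uniform(lo[f], hi[f])    # step 4: random value inside the current range
        if x[f] > v:
            lo[f] = v                    # point lies above v: raise the minimum
        elif x[f] < v:
            hi[f] = v                    # point lies below v: lower the maximum
        count += 1                       # step 6: count the repetitions
    return count

def mean_isolation_number(X, idx, n_runs=100, seed=0):
    """Average isolation number over several random runs."""
    rng = np.random.default_rng(seed)
    return np.mean([isolation_number(X, idx, rng) for _ in range(n_runs)])

rng = np.random.default_rng(0)
X = np.vstack([[[0.0, 0.0]], rng.normal(0, 1, size=(200, 2)), [[6.0, 6.0]]])
center_score = mean_isolation_number(X, 0)      # point in the dense center
outlier_score = mean_isolation_number(X, 201)   # isolated point
print(center_score, outlier_score)
```

The point at `[6.0, 6.0]` needs far fewer splits on average than the point in the middle of the cloud.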



## Angle-Based Outlier Detection (ABOD):

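Angle-Based Outlier Detection (ABOD) is designed for high-dimensional spaces. It uses the variance of the angles between the vectors from a data point to all other points as the anomaly score: a point inside a cluster sees the other points in many different directions (high variance), while an outlier sees most points in roughly the same direction (low variance). ABOD therefore provides a good alternative for identifying outliers in high-dimensional spaces, where distance-based measures become less discriminative.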

<img src="https://images.slideplayer.com/24/7032797/slides/slide_59.jpg">
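A brute-force sketch of the angle-based factor, assuming a simplified, unweighted variance over all pairs (B, C) of the distance-weighted cosine ⟨AB, AC⟩ / (‖AB‖²·‖AC‖²). It is O(n³) and meant only to show the idea; the helper name `abod_scores` and the data are illustrative, and it assumes no duplicate points.

```python
import numpy as np

def abod_scores(X):
    """Simplified angle-based outlier factor; lower values indicate outliers."""
    n = len(X)
    scores = np.empty(n)
    iu = np.triu_indices(n - 1, k=1)              # each unordered pair (B, C) once
    for a in range(n):
        diffs = np.delete(X, a, axis=0) - X[a]    # vectors AB for every other point B
        sq_norms = (diffs ** 2).sum(axis=1)
        # <AB, AC> / (|AB|^2 |AC|^2) for every pair of other points
        terms = (diffs @ diffs.T) / np.outer(sq_norms, sq_norms)
        scores[a] = terms[iu].var()
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)), [[5.0, 5.0]]])
scores = abod_scores(X)
print(scores.argmin())
```

Dividing by the squared norms down-weights far-away pairs, so the variance is driven mostly by a point's neighborhood; the isolated point at index 100 has the smallest variance.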

## Gaussian Mixture Models:

A Gaussian Mixture Model (GMM) fits a given number of Gaussian distributions to a dataset and is trained with the Expectation-Maximization (EM) algorithm. It detects anomalous samples by evaluating their probability density under the fitted model: if the density of a sample is below a threshold, it is an anomaly.

![image.png](attachment:image.png)
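A short usage sketch with scikit-learn's `GaussianMixture`, whose `score_samples` returns the log density of each sample under the fitted mixture. The two-component setting, the 1st-percentile threshold and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal([0, 0], 0.5, size=(200, 2)),
                     rng.normal([4, 4], 0.5, size=(200, 2))])
X_test = np.array([[0.2, -0.1], [3.9, 4.2], [2.0, -3.0]])

# Fit two Gaussian components with EM
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X_train)

# score_samples gives the log of the mixture density at each sample
threshold = np.percentile(gmm.score_samples(X_train), 1)   # assumed 1% cut-off
is_anomaly = gmm.score_samples(X_test) < threshold
print(is_anomaly)
```

Working with log densities avoids numerical underflow for points far from every component.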

## Which anomaly detection model to use?

Source: https://federation.edu.au/__data/assets/pdf_file/0011/443666/ICDM2018-Tutorial-Final.pdf

- Isolation-based methods: different isolation mechanisms
- Nearest-neighbour-based methods: well-explored methods
- Kernel methods: have untapped potential

## How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?

Source: 

- https://arxiv.org/abs/1607.01152 : Paper
- https://github.com/ngoix/EMMV_benchmarks/blob/master/em.py : Code

When sufficient labeled data are available, classical criteria based on Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be used to compare the performance of unsupervised anomaly detection algorithms. However, in many situations few or no data are labeled, which calls for alternative criteria that can be computed on unlabeled data. In the paper above, two criteria that do not require labels are empirically shown to discriminate accurately between algorithms (with respect to ROC- or PR-based criteria). These criteria are based on the existing Excess-Mass (EM) and Mass-Volume (MV) curves.
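The Mass-Volume idea can be sketched as follows: for each mass level α, take the score threshold that keeps a fraction α of the data, estimate the volume of that level set by Monte Carlo sampling of the data's bounding box, and integrate volume over α. A better scoring function reaches the same mass with a smaller volume, so a lower area is better. This is a simplified sketch, not the authors' `em.py`; the α range [0.9, 0.999], the sample size and the two toy scoring functions are assumptions.

```python
import numpy as np

def mass_volume_auc(score_fn, X, alphas=None, n_mc=20_000, seed=0):
    """Approximate area under the Mass-Volume curve (lower is better).
    `score_fn` must return higher values for more 'normal' points."""
    if alphas is None:
        alphas = np.linspace(0.9, 0.999, 50)    # assumed mass range
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    box_volume = np.prod(hi - lo)
    # Monte Carlo sample of the bounding box, used to estimate volumes
    U = rng.uniform(lo, hi, size=(n_mc, X.shape[1]))
    s_data, s_unif = score_fn(X), score_fn(U)
    volumes = []
    for alpha in alphas:
        # Level u such that a fraction alpha of the data scores >= u
        u = np.quantile(s_data, 1 - alpha)
        # Volume of the level set {x : score(x) >= u} inside the box
        volumes.append(box_volume * np.mean(s_unif >= u))
    v = np.array(volumes)
    return float(np.sum((v[1:] + v[:-1]) / 2 * np.diff(alphas)))   # trapezoidal rule

def good_score(Z):
    return -np.linalg.norm(Z, axis=1)                 # centered on the data

def bad_score(Z):
    return -np.linalg.norm(Z - [2.0, 2.0], axis=1)    # centered off the data

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(1000, 2))
print(mass_volume_auc(good_score, X), mass_volume_auc(bad_score, X))
```

No labels are used: the better-centered scoring function simply wraps the same mass in less volume. Monte Carlo volume estimates degrade quickly as dimension grows, so this plain version is only practical in low dimensions.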