# Anomaly Detection

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These nonconforming patterns are referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants, depending on the application domain.

![](../img/fish.png)

The main idea is to define a "region" representing normal behavior and declare any observation that does not belong to this normal region an anomaly. In practice, several factors make this difficult:

- Defining a normal region that encompasses every possible normal behavior is difficult.
- The boundary between normal and anomalous behavior is often imprecise.
- When anomalies are the result of malicious actions, adversaries often adapt their behavior to make anomalous observations appear normal, which makes defining normal behavior even more difficult.
- In many domains normal behavior keeps evolving, so the current notion of normal behavior might not be sufficiently representative in the future.
- The exact notion of an anomaly differs across application domains (e.g., medical biometrics vs. the stock market).
- Labeled data for training and validating anomaly detection models is usually scarce, which is a major issue.
- The data often contains noise that is similar to the actual anomalies and is therefore difficult to distinguish and remove.

Anomalies are commonly grouped into three types:

- Point anomalies
- Contextual anomalies
- Collective anomalies

Source: https://www.anodot.com/blog/quick-guide-different-types-outliers/

# Point Anomalies  
<img src="../img/pointan.png" width="400">

Also known as global outliers.  
They occur when records are anomalous with respect to all other records in the dataset.

<img src="../img/point1d.png" width="400">
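A minimal sketch of detecting point anomalies in one-dimensional data, assuming the z-score rule (flag values more than `threshold` standard deviations away from the mean). The function name `zscore_outliers`, the threshold, and the sample data are illustrative and not taken from the source. With small samples the outlier itself inflates the standard deviation, so robust statistics (median and MAD) are often preferred in practice.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask marking values whose |z-score| exceeds threshold."""
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std()  # population standard deviation
    return np.abs(z) > threshold

# Hypothetical readings: all values sit around 10 except one global outlier.
data = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10, 50]
mask = zscore_outliers(data)
print(np.flatnonzero(mask))  # → [10]
```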

# Contextual Anomalies

Also known as conditional outliers.  
Their values are not outside the normal global range, but are abnormal compared to the seasonal pattern.

<img src="../img/contextual.png" width="400">

A data point is considered a contextual outlier if its value significantly deviates from the rest of the data points in the same context. Note that this means the same value may not be considered an outlier if it occurred in a different context.

<img src="../img/contextual2.png" width="400">
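To make the role of context concrete, here is a hedged sketch that scores each value only against the other values sharing its context label. It assumes the calendar month is the context and uses the modified z-score, `0.6745 * (x - median) / MAD`, where MAD is the median absolute deviation. The data, the function name `contextual_outliers`, and the 3.5 threshold are hypothetical choices for illustration. The injected January reading of 24 lies inside the global range of the data, yet it is flagged because it is far from the other January values.

```python
import numpy as np

def contextual_outliers(values, contexts, threshold=3.5):
    """Flag values that are unusual within their own context group.

    Each value is compared only with values sharing its context label,
    using the modified z-score 0.6745 * |x - median| / MAD.
    """
    values = np.asarray(values, dtype=float)
    contexts = np.asarray(contexts)
    mask = np.zeros(len(values), dtype=bool)
    for c in np.unique(contexts):
        idx = np.flatnonzero(contexts == c)
        group = values[idx]
        median = np.median(group)
        mad = np.median(np.abs(group - median))
        if mad == 0:
            continue  # no spread in this context; skip instead of dividing by zero
        score = 0.6745 * np.abs(group - median) / mad
        mask[idx] = score > threshold
    return mask

# Hypothetical monthly temperatures (Jan..Dec) for four years.
seasonal = np.array([0, 2, 8, 14, 19, 24, 26, 25, 20, 13, 6, 1], dtype=float)
year_offsets = [0.0, 1.0, -1.0, 0.5]
values = np.concatenate([seasonal + o for o in year_offsets])
months = np.tile(np.arange(12), len(year_offsets))
values[36] = 24.0  # a summer-like reading in January of the fourth year

flags = contextual_outliers(values, months)
print(np.flatnonzero(flags))                        # → [36]
print(values.min() <= values[36] <= values.max())   # True: inside the global range
```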

# Collective Anomalies

Also known as outlier sequences or movements.  
They occur when a record is anomalous only when considered together with adjacent records.

A subset of data points within a dataset is considered anomalous if those values as a collection deviate significantly from the entire dataset, even though the individual data points are not themselves anomalous in either a contextual or global sense. In time series data, this can manifest as normal peaks and valleys occurring outside the time frame in which that seasonal sequence is normal, or as a group of time series that are in an outlier state together.

![](../img/collective3.jpg)
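A minimal sketch of one way to catch a collective anomaly, assuming a periodic signal whose normal behavior is to oscillate: every sliding window is summarized by its standard deviation, and all samples inside windows whose spread collapses are flagged. The function `collective_outliers`, the window length, and the `min_std` cut-off are hypothetical; `sliding_window_view` requires NumPy 1.20 or later. Each value in the injected flat segment (0.0) is well inside the normal range [-1, 1], so only the sequence as a whole is anomalous.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def collective_outliers(x, window, min_std):
    """Flag samples covered by any sliding window whose spread collapses.

    Windows of `window` consecutive samples with a standard deviation below
    `min_std` are treated as anomalous as a group.
    """
    x = np.asarray(x, dtype=float)
    stds = sliding_window_view(x, window).std(axis=1)
    mask = np.zeros(len(x), dtype=bool)
    for start in np.flatnonzero(stds < min_std):
        mask[start:start + window] = True
    return mask

t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)  # normal behavior: period of 20 samples
signal[105:135] = 0.0                # flat segment; each value is individually normal

mask = collective_outliers(signal, window=20, min_std=0.05)
flagged = np.flatnonzero(mask)
print(flagged.min(), flagged.max())  # → 105 134
```

Choosing the window equal to one period keeps the standard deviation of every fully normal window constant, which makes the collapse easy to separate with a fixed cut-off.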

Analogy  
A fist-sized meteorite impacting a house in your neighborhood is a global outlier because it’s a truly rare event that meteorites hit buildings. Your neighborhood getting buried in two feet of snow would be a contextual outlier if the snowfall happened in the middle of summer and you normally don’t get any snow outside of winter. Every one of your neighbors moving out of the neighborhood on the same day is a collective outlier because although it’s definitely not rare that people move from one residence to the next, it is very unusual that an entire neighborhood relocates at the same time.

## Nature of Input Data

- The nature of attributes determines the applicability of anomaly detection techniques.
- Identify the minimum aggregation level at which anomalies are defined (transaction, record, measure, etc.).
- Instead of raw features, the data might be provided only as pairwise distances between instances, in the form of a distance or similarity matrix.
- Always take into consideration the scale and data type of every feature.
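The effect of feature scale on distance-based techniques shows up in a small sketch, assuming hypothetical records with an age in years and an annual income in dollars. On raw features the income column dominates the Euclidean distance, so record 0's nearest neighbour is someone 45 years older; after standardizing each column (subtract the mean, divide by the standard deviation) the nearest neighbour becomes the record with a similar age and a similar income. The helper `pairwise_euclidean` is illustrative and also shows how a distance matrix can be built from raw features.

```python
import numpy as np

def pairwise_euclidean(X):
    """Return the n x n matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical records: [age in years, annual income in dollars]
X = np.array([
    [25, 50_000],
    [70, 50_500],
    [26, 52_000],
    [45, 90_000],
], dtype=float)

raw = pairwise_euclidean(X)
scaled = pairwise_euclidean((X - X.mean(axis=0)) / X.std(axis=0))

# Index 0 of each sorted row is the record itself (distance 0).
print(raw[0].argsort()[1])     # → 1  (income dominates)
print(scaled[0].argsort()[1])  # → 2  (both features contribute)
```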

# Techniques for Anomaly Detection

![](../img/techniques.png)
###### Baddar, Sherenaz & Merlo, Alessio & Migliardi, Mauro. (2014). Anomaly Detection in Computer Networks: A State-of-the-Art Review. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA). 5. 29-64. 

- Unsupervised Learning
- Statistical Filtering

# Unsupervised Learning

$$
\renewcommand{\like}{{\cal L}}
\renewcommand{\loglike}{{\ell}}
\renewcommand{\err}{{\cal E}}
\renewcommand{\dat}{{\cal D}}
\renewcommand{\hyp}{{\cal H}}
\renewcommand{\Ex}[2]{E_{#1}[#2]}
\renewcommand{\x}{{\mathbf x}}
\renewcommand{\v}[1]{{\mathbf #1}}
$$

Clustering refers to the task of grouping data so that points in the same cluster are highly similar to each other, while points in different clusters are dissimilar. It is a form of unsupervised learning because there is no target variable indicating which group each training point belongs to.
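A minimal sketch of clustering with Lloyd's k-means algorithm in NumPy, assuming Euclidean distance and a simple deterministic farthest-point initialization (library implementations such as scikit-learn's `KMeans` default to k-means++ initialization instead). The function name `kmeans` and the synthetic blobs are illustrative. No target variable is passed in: the group labels are discovered from `X` alone.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal Lloyd's k-means. Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialization: start from the first row, then repeatedly
    # add the row farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two synthetic, well-separated groups; no labels are given to the algorithm.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),
               rng.normal([5, 5], 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print(np.round(centroids, 1))
```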

![](../img/Fruits.jpg)

Unlike supervised learning, unsupervised learning is used with datasets that have no labeled historical outcomes. An unsupervised learning algorithm explores the data to find the internal structure that exists in it. Mathematically, we do not have any $y$ or **label**; rather, we treat the whole training data as a feature table $\x$. This kind of learning works well for transactional data; for instance, it can help identify customer segments and clusters with certain attributes, which is often used in content personalization.
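As an illustration of working with only a feature table $\mathbf{x}$ and no $y$, the sketch below uses the k-nearest-neighbour distance as an unsupervised anomaly score. This is a common technique swapped in here for illustration, not one described in this section: each record is scored by the distance to its k-th nearest neighbour, so records in sparse regions of the feature space get high scores. The function `knn_outlier_scores`, `k=5`, and the synthetic data are assumptions.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each row of X by the distance to its k-th nearest neighbour.

    No labels are used: the score depends only on the feature table X.
    Larger scores mean the point sits in a sparser region of the data.
    """
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dists.sort(axis=1)  # column 0 holds each point's zero distance to itself
    return dists[:, k]

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),  # normal behavior
               [[8.0, 8.0]]])                    # one isolated record
scores = knn_outlier_scores(X, k=5)
print(scores.argmax())  # → 100
```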


![](../img/Machine_learning_3.jpg)

![](../img/recommender-systems.jpg)

![](../img/Outliers.jpeg) 

![](../img/anomaly.png) 

![](../img/topic.png) 