# Anomaly Detection

<img src="../img/fish.png" width="500">

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These nonconforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants, depending on the application domain.

The main idea is to define a "region" representing normal behavior and to declare any observation that does not belong to this normal region an anomaly. In practice, this is harder than it sounds:

- Defining a normal region is difficult.
- The boundary between normal and anomalous behavior is not precise.
- When anomalies are the result of malicious actions, adversaries often adapt to make their anomalous behavior appear normal.
- In many domains, normal behavior keeps evolving.
- The exact notion of an anomaly differs across application domains.
- Availability of labeled data for training/validation is usually a major issue.
- Often the data contains noise that tends to be similar to the actual anomalies and hence is difficult to distinguish and remove.
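
To make the idea of a "normal region" concrete, here is a minimal one-dimensional sketch that learns the region from reference data using Tukey's fences: anything more than 1.5 interquartile ranges below the first quartile or above the third quartile is flagged. The fence rule, the function names `fit_normal_region` and `is_anomaly`, and the data are illustrative assumptions, not a method prescribed by these notes; a real system would still face every difficulty listed above (fuzzy boundaries, drifting behavior, noise).

```python
import numpy as np

def fit_normal_region(reference, k=1.5):
    """Learn a 1-D 'normal region' [low, high] from reference data (Tukey's fences)."""
    q1, q3 = np.percentile(reference, [25, 75])
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)

def is_anomaly(values, region):
    """True for every value that falls outside the learned normal region."""
    low, high = region
    x = np.asarray(values, dtype=float)
    return (x < low) | (x > high)

# Hypothetical reference data assumed to reflect normal behavior: 1, 2, ..., 100
reference = np.arange(1, 101, dtype=float)
region = fit_normal_region(reference)
print(region)                                    # (-48.5, 149.5)
print(is_anomaly([50.0, -60.0, 170.0], region))  # [False  True  True]
```

The multiplier `k` controls how wide the normal region is: a larger value produces fewer, more extreme anomalies.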

## - Point Anomalies  
## - Contextual Anomalies  
## - Collective Anomalies  

# Point Anomalies  
<img src="../img/pointan.png" width="600">

Also known as global outliers.  
They occur when individual records are anomalous with respect to all other records in the dataset.

<img src="../img/point1d.png" width="800">
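
A minimal sketch of one common way to flag point anomalies in a single numeric feature, assuming the values are roughly normally distributed: compute each value's z-score against the whole dataset and flag those beyond a threshold. The function name `zscore_outliers`, the threshold of 3, and the synthetic data are illustrative choices, not part of the original material.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose |z-score| exceeds `threshold`.

    Mean and standard deviation are computed over the whole dataset,
    so this flags global (point) anomalies only.
    """
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.flatnonzero(np.abs(z) > threshold)

# Synthetic example: 200 "normal" readings plus two injected extremes.
rng = np.random.default_rng(0)
data = rng.normal(loc=50.0, scale=5.0, size=200)
data[[10, 120]] = [95.0, 2.0]

print(zscore_outliers(data))  # should include indices 10 and 120
```

Because the extremes themselves inflate the mean and standard deviation, a very large outlier can mask smaller ones; robust variants replace the mean and standard deviation with the median and the median absolute deviation.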

# Contextual Anomalies
<img src="../img/contextual.png" width="800">

Also known as conditional outliers.  
Their values are not necessarily outside the normal global range.  
A data point is considered a contextual outlier if its value deviates significantly from the rest of the data points in the same context. This means that the same value may not be considered an outlier if it occurs in a different context; for example, a temperature reading that is normal in summer can be anomalous in winter.

<img src="../img/contextual2.png" width="900">
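
One simple way to make the context explicit is to compute z-scores within each context group instead of over the whole dataset. The sketch below assumes pandas is available and uses a hypothetical `season` column as the context and `temp_c` as the value; the function name `contextual_outliers`, the threshold, and the data are all illustrative.

```python
import numpy as np
import pandas as pd

def contextual_outliers(df, context_col, value_col, threshold=3.0):
    """Flag rows whose value deviates strongly from the mean of their own context."""
    grouped = df.groupby(context_col)[value_col]
    # z-score relative to the row's own context (pandas std uses ddof=1)
    z = (df[value_col] - grouped.transform("mean")) / grouped.transform("std")
    return df[z.abs() > threshold]

# Synthetic readings: 50 per season, evenly spread +/- 2 degrees around the seasonal mean.
offsets = np.linspace(-2.0, 2.0, 50)
df = pd.DataFrame({
    "season": ["summer"] * 50 + ["winter"] * 50,
    "temp_c": np.concatenate([30.0 + offsets, 0.0 + offsets]),
})
# 28 °C is ordinary in summer, but unusual if recorded in winter.
df.loc[len(df)] = ["winter", 28.0]

global_z = (28.0 - df["temp_c"].mean()) / df["temp_c"].std()
print(round(global_z, 2))                           # small: not a global outlier
print(contextual_outliers(df, "season", "temp_c"))  # flags only the winter 28 °C row
```

The same 28 °C value passes a global check but fails the per-context one, which is exactly the distinction between point and contextual anomalies.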

# Collective Anomalies
<img src="../img/collective3.jpg" width="700">

A collection of related data points is considered anomalous if, as a collection (or locally), the points deviate significantly from the rest of the data set, even though the individual values are not anomalous on their own. In time series data, one way this can manifest is as normal peaks and valleys occurring outside of the time frame in which that seasonal pattern is expected.
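
A sketch of one way to catch a collective anomaly in a time series, assuming the normal signal oscillates: slide a fixed-length window over the series and flag windows whose variability is far below that of a typical window, even though every individual value stays inside the normal range. The function name, window length, `ratio` threshold, and the synthetic sine signal with a flat segment are illustrative assumptions; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def low_variability_windows(signal, window, ratio=0.2):
    """Return start indices of windows whose std is below `ratio` x the median window std."""
    x = np.asarray(signal, dtype=float)
    stds = sliding_window_view(x, window).std(axis=1)
    return np.flatnonzero(stds < ratio * np.median(stds))

# Synthetic signal: a sine wave with period 25 that goes flat for 40 samples.
t = np.arange(300)
signal = np.sin(2 * np.pi * t / 25)
signal[150:190] = 0.0  # each value is within [-1, 1], but the flat run is abnormal

starts = low_variability_windows(signal, window=25)
print(starts.min(), starts.max())  # flagged windows overlap the flat segment
```

Matching the window length to the signal's period keeps the standard deviation of normal windows stable, so the flat run stands out clearly.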

## Nature of Input Data

- The nature of attributes determines the applicability of anomaly detection techniques.
- Identify the minimum level of aggregation at which an anomaly is defined (transaction, record, measure, etc.).
- Try computing pairwise distances between features.
- Always take into consideration the scale and data type of every feature.
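
Distance-based techniques compare observations across all features at once, so a feature on a large numeric scale can dominate the result. The sketch below uses a hypothetical two-feature table (age in years, income in currency units) and computes pairwise Euclidean distances before and after standardizing each feature to zero mean and unit variance; the helper names and data are illustrative.

```python
import numpy as np

def pairwise_euclidean(X):
    """Dense matrix of Euclidean distances between all pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def nearest_neighbor(X, i):
    """Index of the row closest to row i (excluding i itself)."""
    d = pairwise_euclidean(X)
    np.fill_diagonal(d, np.inf)
    return int(d[i].argmin())

# Hypothetical records; columns: age (years), income (currency units)
X = np.array([[25, 50_000], [26, 50_600], [60, 50_200], [61, 50_800]], dtype=float)

# Standardize each column to zero mean and unit variance (z-scores)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(nearest_neighbor(X, 0))         # 2: raw income differences dominate
print(nearest_neighbor(X_scaled, 0))  # 1: both features now contribute
```

On raw values, the 25-year-old's nearest neighbor is the 60-year-old simply because their incomes differ by fewer units; after scaling, the 26-year-old becomes the nearest neighbor.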

# Techniques for Anomaly Detection

<img src="../img/techniques.png" width="400">
###### Baddar, Sherenaz & Merlo, Alessio & Migliardi, Mauro. (2014). Anomaly Detection in Computer Networks: A State-of-the-Art Review. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA). 5. 29-64. 

- Unsupervised Learning
- Statistical Filtering

# Unsupervised Learning

Unsupervised learning refers to data-mining techniques that learn from unlabeled data; clustering is one of the most common of them.  

Clustering refers to the task of grouping data so that points in the same cluster are highly similar to each other, while points in different clusters are dissimilar. It is a form of unsupervised learning because there is no target variable indicating which group each training example belongs to.

<img src="../img/Fruits.jpg" width="900">

An unsupervised learning algorithm explores the data to find the internal structure that already exists in it. Mathematically, there is no target $y$ or **label**; instead, the whole training set is treated as a feature matrix $X$. Transactional data is a typical example: clustering it can help identify customer segments that share certain attributes, which is often used for content personalization.
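
Clustering connects to anomaly detection in a simple way: points that lie far from every cluster center are candidate outliers. A minimal sketch, assuming scikit-learn is available, fits `KMeans` and uses each point's distance to its nearest centroid as an anomaly score. The function name `kmeans_anomaly_scores`, the number of clusters, and the synthetic data are illustrative assumptions, and choosing a score threshold is left to the application.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_anomaly_scores(X, n_clusters, random_state=0):
    """Distance from each point to its nearest k-means centroid (higher = more anomalous)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    # transform() returns the distance from every point to every centroid
    return km.transform(X).min(axis=1)

# Synthetic data: two tight groups (e.g. customer segments) plus one stray point.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
    [[2.5, 10.0]],  # index 200: far from both groups
])

scores = kmeans_anomaly_scores(X, n_clusters=2)
print(np.argsort(scores)[::-1][:3])  # indices of the three most anomalous points
```

Note that k-means assumes roughly spherical clusters of similar size and requires choosing `n_clusters` up front, so the scores are only as meaningful as that clustering.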


<img src="../img/Machine_learning_3.jpg" width="900">

<img src="../img/Outliers.jpeg" width="900">

<img src="../img/anomaly.png" width="700">

<img src="../img/topic.png" width="900">