## What is an Anomaly?
An anomaly is an observation that is unusual, does not fit the expected pattern, or stands out as different within a specific category or context.

## What is Anomaly Detection?
Anomaly detection is the process of identifying data points that deviate from the expected behavior or norm. Uncovering these anomalies can help you mitigate errors in systems, catch potential incidents, segment events or users that deviate from the norm, or just identify abnormal events that occur in your data.

Anomaly detection relies on **historical data** or established knowledge to determine what falls within the usual range.

## Anomaly Types

### 1. Point Anomaly
A point anomaly is a single datapoint that stands out from the expected pattern, range, or norm. In other words, the datapoint on its own is unexpected.

<center><img src="img/anomaly_1.png" alt="Point Anomaly" width="500" height="477" /></center>
<p style="text-align: center; font-size: small;"><i><b>Figure 24.</b> Example of point anomalies</i></p>


### 2. Collective Anomaly
A collective anomaly occurs when individual datapoints appear normal in isolation but, taken together as a group, reveal an unexpected pattern, behaviour, or result.

<center><img src="img/anomaly_2.png" alt="Collective Anomaly" width="500" height="397" /></center>
<p style="text-align: center; font-size: small;"><i><b>Figure 25.</b> An irregular heart beat is an example of a collective anomaly.</i></p>

### 3. Contextual Anomaly
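A contextual anomaly is a datapoint whose value would be normal in one setting but is unexpected in the setting where it actually occurs. Instead of judging the value alone, an algorithm looking for contextual anomalies compares it with what is expected for that context, such as the time of day, the season, or the location.

The crucial element here is context: is the result out of place for the circumstances in which it was observed?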


<center><img src="img/anomaly_3.png" alt="Contextual Anomaly" width="500" height="318" /></center>
<p style="text-align: center; font-size: small;"><i><b>Figure 26.</b> Example of contextual anomaly detection in network traffic</i></p>

A good example in this instance is a network intrusion attempt. An algorithm looking for contextual anomalies will have a baseline of activity that provides it with normal parameters. This could, for example, show the expected levels of traffic accessing the network at various times of the day.

Traffic might be at its lowest in the early hours of the morning. Therefore, a spike in traffic at 3 a.m. is a contextual anomaly that warrants further action and/or investigation as it could indicate a network intrusion attempt.
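
The network traffic example can be sketched as a per-hour baseline check. This is a minimal illustration, not a production intrusion detector: the `history` values, the `is_contextual_anomaly` helper, and the threshold of 3 standard deviations are all assumptions made up for this example.

```python
from statistics import mean, stdev

# Hypothetical history: requests per hour seen on previous days,
# keyed by hour of day (only two hours shown for brevity).
history = {
    3: [12, 15, 11, 14, 13, 12, 16],          # 3 a.m.: normally quiet
    14: [480, 510, 495, 530, 505, 490, 520],  # 2 p.m.: normally busy
}

def is_contextual_anomaly(hour, value, history, threshold=3.0):
    """Flag a value that is unusual for its own hour of day."""
    baseline = history[hour]
    mu = mean(baseline)
    sigma = stdev(baseline)
    return abs(value - mu) / sigma > threshold

# The same value (500 requests) is judged against two different contexts.
spike_at_night = is_contextual_anomaly(3, 500, history)
same_at_midday = is_contextual_anomaly(14, 500, history)
print(spike_at_night, same_at_midday)
```

The value 500 is flagged at 3 a.m. but not at 2 p.m., which is exactly what separates a contextual anomaly from a point anomaly.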

## Techniques for Anomaly Detection
#### 1. Statistical Methods
Statistical approaches like Z-scores, mean-variance analysis, and quartiles are commonly used for simple anomaly detection in univariate data.

* **Z-Score/Standard Score**: This method measures how many standard deviations a data point is away from the mean. Points whose absolute z-score exceeds a chosen threshold (3 is a common choice) are considered anomalies.
* **Percentiles**: Identifying anomalies based on percentiles or quantiles, where values below or above a certain threshold are considered outliers.
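
Both statistical rules can be sketched with Python's standard library. The sample `data`, the helper names, the z-score threshold of 3, and the 1.5 × IQR fences are illustrative assumptions rather than fixed parts of either method.

```python
from statistics import mean, stdev, quantiles

# Hypothetical univariate sample: 20 ordinary values and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 11, 10, 13, 12, 11, 12, 10, 13, 11, 95]

def zscore_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [x for x in values if abs(x - mu) / sigma > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside the quartile-based fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

print(zscore_outliers(data))
print(iqr_outliers(data))
```

One design note: an extreme value inflates both the mean and the standard deviation, so the z-score rule can miss outliers in small samples. The quartile-based rule is less sensitive to that effect.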

#### 2. Machine Learning Algorithms
Advanced techniques, such as clustering, classification, and regression, can be employed for anomaly detection in multidimensional and complex datasets. Popular algorithms include Isolation Forest, One-Class SVM, and Autoencoders.

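* **Isolation Forest**: An ensemble method that builds many random trees by repeatedly splitting the data on randomly chosen features and values. Anomalies tend to be isolated after fewer splits than normal points, so a short average path length indicates an anomaly.
* **One-Class SVM**: A support vector machine (SVM) variant that is trained on data assumed to be mostly normal and learns a boundary around it. Points that fall outside the boundary are treated as outliers.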
* **K-Nearest Neighbors (KNN)**: Assigns an anomaly score based on the distance to the K-nearest neighbors, with distant points being potential anomalies.
* **Autoencoders**: Neural networks designed to learn a compressed representation of data, where reconstruction error can be used to identify anomalies.
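
As a small illustration of the distance-based scoring described for KNN, the sketch below uses only the standard library. The `points` data, the `knn_scores` helper, and the choice of `k=2` are assumptions for this example. A library implementation would normally be used on real data.

```python
import math

# Hypothetical 2-D points: a tight cluster plus one distant point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (1.1, 1.2), (0.8, 0.9), (6.0, 6.0)]

def knn_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

scores = knn_scores(points)
# The point with the largest score is the strongest anomaly candidate.
most_anomalous = points[scores.index(max(scores))]
print(most_anomalous)
```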

#### 3. Clustering methods
* **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**: Clusters data points based on their density, with points that do not belong to any cluster considered outliers.
* **K-Means Clustering**: Every point is assigned to its nearest cluster centroid, so points that lie unusually far from their centroid, or that end up in very small clusters, may be considered anomalies.
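
The DBSCAN noise rule can be sketched without running the full clustering step. In the sketch below, a point counts as noise when it is not a core point and is not within `eps` of any core point. The `points` data, the `dbscan_noise` helper, and the parameter values are assumptions for illustration, and the brute-force neighbor search is only suitable for tiny datasets.

```python
import math

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

def dbscan_noise(points, eps, min_samples):
    """Return the points DBSCAN would label as noise (cluster labels are not computed)."""
    def neighbor_count(p):
        # Counts p itself, following the usual min_samples convention.
        return sum(1 for q in points if math.dist(p, q) <= eps)

    core = [p for p in points if neighbor_count(p) >= min_samples]
    return [p for p in points
            if p not in core
            and not any(math.dist(p, c) <= eps for c in core)]

noise = dbscan_noise(points, eps=1.5, min_samples=3)
print(noise)
```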

#### 4. Time Series Analysis
For time-dependent data, methods like STL (Seasonal and Trend decomposition using Loess) and Prophet can help identify anomalies over time.

* **Moving Averages**: Identifying anomalies based on deviations from the moving average or exponential moving average.
* **Seasonal Decomposition**: Decomposing a time series into its trend, seasonal, and residual components, with anomalies often detected in the residual component.
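
A moving-average check can be sketched as follows. Each value is compared with the mean and standard deviation of the window that precedes it. The `series` values, the `rolling_anomalies` helper, the window size of 5, and the threshold of 3 are assumptions chosen for this example.

```python
from statistics import mean, stdev

# Hypothetical series with a spike at index 8.
series = [20, 21, 19, 20, 22, 21, 20, 19, 45, 20, 21, 20]

def rolling_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]   # the `window` values before index i
        mu = mean(past)
        sigma = stdev(past)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

print(rolling_anomalies(series))
```

Using only past values keeps the check usable on streaming data. The trade-off is that a spike stays in the window for the next few steps and temporarily widens the tolerance.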

#### 5. Proximity-based approaches
Proximity-based approaches assume that normal points have close neighbors, while anomalies lie far from other points.

* **Mahalanobis Distance**: Measures the distance of data points from the center of the data distribution, considering correlations between features.
* **Local Outlier Factor (LOF)**: Compares the local density around a data point with the local densities around its neighbors. Points whose density is substantially lower than that of their neighbors are flagged as outliers.
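
To show why accounting for correlations matters, here is a minimal two-dimensional Mahalanobis distance sketch. The `data` values, the two test points, and the helper names `fit_2d` and `mahalanobis_2d` are hypothetical. The 2×2 covariance matrix is inverted by hand, so this does not generalize beyond two features.

```python
import math
from statistics import mean

# Hypothetical 2-D data in which y is roughly twice x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

def fit_2d(data):
    """Return the mean vector and the sample covariance entries of 2-D data."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    mx, my = mean(xs), mean(ys)
    n = len(data) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance of `point` from `center` for a 2x2 covariance."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - center[0], point[1] - center[1]
    det = sxx * syy - sxy ** 2  # determinant of the covariance matrix
    # d^2 = v^T * inverse(cov) * v, written out for the 2x2 case
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

center, cov = fit_2d(data)
on_trend = (5.0, 10.0)   # far from the center, but follows y ≈ 2x
off_trend = (3.0, 9.0)   # closer to the center, but breaks the trend
d_on = mahalanobis_2d(on_trend, center, cov)
d_off = mahalanobis_2d(off_trend, center, cov)
print(d_on, d_off)
```

Here the off-trend point is closer to the center in plain Euclidean terms, yet its Mahalanobis distance is much larger because it violates the correlation between the two features.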

#### 6. Deep Learning
Deep learning models, like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are used in image and sequential data anomaly detection.

Next, we will implement an **anomaly detection** model based on an **autoencoder**.