# Outliers

### Main Take-Aways

- There are different types of Outliers
    - Global
    - Collective
    - Contextual Outliers
- Detecting Outliers
    - Visually (Box plots, scatter plots)
    - Mathematically (z-score, IQR score)
- How to handle Outliers

---


## Potential Questions

#### What's the difference between _Noise_ and _Outliers_?
- Noise = incorrect values, caused by human error, people lying in surveys, imprecise instruments, etc.
- Outliers = values that fall outside the norm. They may or may not be incorrect. Example: the wealth of Bill Gates compared to the rest of the world.

Noise is always bad and should be removed, but it is often hard to tell whether a suspicious value really is noise. Some values that look like noise might actually be genuine outliers, so they need to be investigated first.
Not all outliers are noise, and an outlier that is not noise should not be removed automatically.

---

## The different types of Outliers

#### Global Outliers (or point outliers)
If an individual data point can be considered anomalous with respect to the rest of the data, then the datum is termed a point outlier. Point-outlier detection is used, for example, for intrusion detection in computer networks.

#### Collective Outliers
If a collection of data points is anomalous with respect to the entire data set, it is termed a collective outlier. Collective outliers are much harder to identify than global outliers and are often not noise. They can be detected with, e.g., _Anomaly Detection Algorithms_.
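
One simple way to illustrate the idea is a sliding-window check: compare the mean of each window with the overall mean, scaled by the standard error of a window mean. This is only a minimal sketch, not a full anomaly detection algorithm. The function name `collective_outlier_windows`, the window size, the threshold, and the sample data are all assumptions made for this example, and the standard-error scaling assumes roughly independent points.

```python
import math
import statistics


def collective_outlier_windows(values, window=5, threshold=3.0):
    """Return start indices of windows whose mean is unusually far from the overall mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []
    # Standard error of the mean of `window` points (assumes independent points)
    std_err = std / math.sqrt(window)
    starts = []
    for start in range(len(values) - window + 1):
        window_mean = statistics.fmean(values[start:start + window])
        if abs(window_mean - mean) / std_err > threshold:
            starts.append(start)
    return starts


# Hypothetical signal: alternating +1/-1 with a short run of 2s in the middle.
# No single 2 is extreme on its own, but five in a row are unusual together.
signal = [1, -1] * 10 + [2] * 5 + [-1, 1] * 10
print(collective_outlier_windows(signal))
```

In this made-up signal, a per-point z-score check with a threshold of 3 flags nothing, while the window check flags the run of 2s.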

#### Contextual Outliers
If an individual data instance is anomalous in a specific context or condition (but not otherwise), then it is termed a contextual outlier. The attributes of the data objects are divided into two groups:
- Contextual attributes: define the context, e.g. time and location
- Behavioural attributes: characteristics of the object that are used in the outlier evaluation, e.g. temperature
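
The split into contextual and behavioural attributes can be sketched as follows: group the records by the contextual attribute, then evaluate the behavioural attribute only against the other values in the same group. This is a minimal illustration, not a standard algorithm. The function name `contextual_outliers`, the threshold of 2.5, and the temperature readings are hypothetical.

```python
from collections import defaultdict
import statistics


def contextual_outliers(records, threshold=2.5):
    """Flag (context, value) pairs whose value is unusual within its own context."""
    # Group the behavioural values by their contextual attribute
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)

    flagged = []
    for context, values in groups.items():
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population standard deviation
        if std == 0:
            continue  # all values identical, nothing stands out
        for value in values:
            if abs(value - mean) / std > threshold:
                flagged.append((context, value))
    return flagged


# Hypothetical temperature readings in degrees Celsius
winter = [-2, 0, 1, -1, 0, 2, 1, -1, 0, 25]
summer = [24, 26, 25, 27, 25, 26, 24, 25]
records = [("winter", t) for t in winter] + [("summer", t) for t in summer]

print(contextual_outliers(records))  # [('winter', 25)]
```

A reading of 25 is ordinary in summer and unremarkable across the whole data set, but it stands out among the winter readings.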

---

### How to Identify Outliers?
**Mathematically**  
- Z-Score: Points that are more than 3 standard deviations away from the mean
- IQR-Score (Interquartile Range) = Distance between the 25th percentile (Q1) and the 75th percentile (Q3) => Points that lie more than 1.5*IQR below Q1 or above Q3
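
Both rules can be sketched with Python's standard library. The function names, the use of the population standard deviation, and the `"inclusive"` quantile method are assumptions made for this example, since other conventions exist and give slightly different cut-offs. The sample data is made up.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, no outliers
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one extreme value
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 10, 11, 13, 12, 11, 10, 12, 11, 13, 100]

print(zscore_outliers(data))  # [100]
print(iqr_outliers(data))     # [100]
```

Note that the z-score rule needs a reasonably large sample: with only a handful of points, a single extreme value inflates the standard deviation so much that it can never reach a z-score of 3.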

**Visually**  
- Single Variable => Boxplot
- Two Variables => Scatter Plot


### How to deal with outliers
- If the Outlier is Noise => Remove Outlier
- If the Outlier is not Noise => _It depends_
![img/noise.jpg](img/noise.jpg)

### Examples
#### Global Outlier
![img/global-outlier.png](img/global-outlier.png)
#### Collective Outlier
![img/collective-outliers.png](img/collective-outliers.png)
#### Contextual Outlier
![img/contextual-outlier.png](img/contextual-outlier.png)




