# What is Anomaly Detection? Methods, Examples, and Applications

## Outline

1. Introduction
2. What is Anomaly Detection?
    1. Definition and context
    2. Importance in data science
3. Real-world Applications of Anomaly Detection
4. Types of Anomalies
5. Anomaly detection methods and when to use each one
6. Building an anomaly detection model
7. Challenges and Limitations in Anomaly Detection
8. Conclusion

### Introduction

Everyone loves to stand out, to be different. But that's not the quality you want in your data points if you are a data scientist. Divergent data points, or _anomalies_, are among the most dangerous data quality issues, and they plague almost all data science projects.

This last sentence may surprise you if you have only worked on polished open-source datasets, which often come without outliers. But real-world datasets almost always contain some samples that differ from the norm. It is your job to detect them and deal with them appropriately.

In this article, you will learn the fundamental ideas of this process, often called anomaly detection:

1. The detrimental effect anomalies have on your project.
2. The importance of detecting anomalies.
3. Real-world applications of anomaly detection.
4. The difference between anomalies, outliers and novelties.
5. Types of anomalies and anomaly detection methods.
6. How to build anomaly detection algorithms in Python.
7. How to deal with the challenges of anomaly detection. 

By the end, you will have all it takes to go deep into the world of anomalies.

### What is Anomaly Detection?

Anomaly detection, often referred to as outlier detection, is a process of finding patterns or instances in a dataset that deviate significantly from the expected or "normal behavior". 

What counts as "normal behavior", and therefore what counts as a divergent instance (outlier), varies with the context. Here are a few examples:

##### 1. Financial transactions
__Normal__: Routine purchases and consistent spending by an individual in London.

__Outlier__: A massive withdrawal from Ireland from the same account, hinting at potential fraud.

##### 2. Network traffic in cybersecurity
__Normal__: Regular communication, steady data transfer, and adherence to protocol.

__Outlier__: Abrupt increase in data transfer or use of unknown protocols signaling a potential breach or malware.

##### 3. Patient vital signs monitoring
__Normal__: Stable heart rate and consistent blood pressure.

__Outlier__: Sudden increase in heart rate and decrease in blood pressure, indicating a potential emergency or equipment failure. 

Anomaly detection includes many types of unsupervised methods to identify divergent samples. Data specialists choose among them based on the anomaly type, the context, and the structure and characteristics of the dataset at hand. We'll cover them in the coming sections.
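
To make "deviating significantly from normal behavior" concrete, here is a minimal sketch built on the financial transactions example. It is not one of the methods covered later, just an illustration: it assumes a hypothetical list of spending amounts (`amounts`) and uses the modified z-score (based on the median and the median absolute deviation) with the commonly cited cut-off of 3.5. Both the data and the threshold are assumptions for illustration.

```python
from statistics import median

def modified_z_scores(values):
    """Return the modified z-score of each value, based on median and MAD."""
    med = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Simplification: with zero spread, treat every point as normal.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical card spending amounts; the last value mimics a massive withdrawal.
amounts = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 44.1, 2500.0]

THRESHOLD = 3.5  # assumed cut-off; tune it for your own data
scores = modified_z_scores(amounts)
flagged = [a for a, s in zip(amounts, scores) if abs(s) > THRESHOLD]

print(flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely moves them, so the outlier cannot hide itself by inflating the estimate of "normal".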

### Real-world Applications of Anomaly Detection

Even though we saw some examples above, let's look at a real-life story of how anomaly detection works in finance.

Shaquille "Shaq" O'Neal, a four-time NBA champion, gets traded from the Miami Heat to the Phoenix Suns. When Shaq arrives at the apartment provided by the Phoenix Suns, he finds that it is empty. Without waiting for his belongings to arrive from Miami, he goes to Walmart and makes the biggest purchase in Walmart history, $70,000, to furnish his apartment. Or at least he tries, because his card gets declined twice.

As he wonders at 2 am what the problem could possibly be (he _can't_ be broke!), American Express security calls him, all out of breath, and tells him that his card has been stolen because somebody is trying to make a $70,000 purchase at Walmart in Phoenix (watch [here](https://www.youtube.com/watch?v=1W3A2hQhdg4&ab_channel=TheLateLateShowwithJamesCorden)).

There are many other real-world applications of anomaly detection beyond finance and fraud detection:

- Cybersecurity
- Healthcare
- Industrial equipment monitoring
- Network intrusion detection
- Energy grid monitoring
- E-commerce and user behavior analysis
- Quality control in manufacturing

and so on. 

### Types of Anomalies

Anomaly detection encompasses two broad practices: outlier detection and novelty detection. 

Outliers are abnormal or extreme data points that exist in the training data itself. In contrast, novelties are new or previously unseen instances that were not present in the original (training) data.

For example, consider a dataset of daily temperatures in a city. Most days, the temperatures range between 20°C and 30°C. However, one day, there's a spike to 40°C. This extreme temperature is an __outlier__, as it significantly deviates from the usual daily temperature range.

Now, imagine that the city installs a new, more accurate weather monitoring station. As a result, the dataset starts consistently recording slightly higher temperatures, ranging from 25°C to 35°C. This sustained increase in temperatures is a __novelty__, representing a new pattern introduced by the improved monitoring system.

Anomaly, on the other hand, is a broad term that covers both outliers and novelties. It can be used to describe any abnormal instance in any context.
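
The distinction can be sketched in a few lines of standard-library Python. This is a simplified illustration, not a full method: the temperature values are hypothetical, the outlier step uses the 1.5 × IQR rule, and the novelty step simply flags new readings that fall more than three standard deviations from the mean of the cleaned training data. Note that it checks new readings one by one; detecting a sustained shift like the one described above would require looking at many readings together.

```python
from statistics import mean, quantiles, stdev

# Hypothetical daily temperatures in °C; 40.0 is the spike from the example above.
training = [22.0, 25.5, 27.0, 24.0, 29.0, 21.5, 26.0, 40.0, 23.5, 28.0]

# Outlier detection: look for extreme points inside the training data itself,
# here with the 1.5 * IQR rule.
q1, _, q3 = quantiles(training, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [t for t in training if t < low or t > high]
clean = [t for t in training if low <= t <= high]

# Novelty detection: learn "normal" from the cleaned training data, then
# check observations that arrive later and were never part of it.
mu, sigma = mean(clean), stdev(clean)
new_readings = [27.0, 30.5, 34.0, 35.0]  # hypothetical readings from the new station
novelties = [t for t in new_readings if abs(t - mu) > 3 * sigma]

print("outliers:", outliers)
print("novelties:", novelties)
```

The key difference is where the suspicious points live: `outliers` are found inside `training`, while `novelties` are found in `new_readings`, which the model of "normal" never saw.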

### Anomaly Detection Methods And When to Use Each One

### Building an anomaly detection model in Python


### Challenges in Anomaly Detection

### Conclusion