# Anomaly detection in time series using autoencoders
Project by Kasper Bågmark, Michele di Sabato, Erik Jansson, Peng Kuang and Selma Tabakovic
The goal of this project is to take a scalable approach to anomaly detection in time series. 
More specifically, we consider electrocardiogram (ECG) data, i.e., _time series_ recording the electrical activity of the heart over successive heartbeats. The challenge is to reliably detect whether such a time series is _anomalous_, in other words, whether it deviates from healthy heartbeat patterns.

TODO: Add more on scalability when this is clear

## Background on time series

A time series is a sequence of real data points indexed by time, i.e., 
$$
(x_t,t \in \mathbb{T}), 
$$
where $\mathbb{T}$ is an index set, for instance $\mathbb{T} = \{1,2,3,\ldots\}$.
Familiar examples include daily temperature measurements and the daily closing price of a stock.
Time series modeling is an important application of the theory of stochastic processes.
After fitting a stochastic process, i.e., a sequence of random variables indexed by $\mathbb{T}$, to the data, the model can be used to answer several questions.
For instance, one may extract trend lines or seasonality (relevant in, e.g., financial or climate modeling) and, perhaps most importantly, forecast the future.
Compared to deep models, time series models are usually simple and easy to fit, and the fitting techniques are theoretically well understood.
In this project, we consider the problem of _anomaly detection_: given a previous sample (i.e., training data) of time series, detect whether a new series is anomalous in some way (TODO: check this definition with the group).
An anomaly can take various forms: the series may contain outlier points, a subsequence of it may not fit what one would expect, or, as in our case, the series as a whole may differ in some sense from what is expected.
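
To make the classical workflow (fit a stochastic process, then forecast) concrete, here is a minimal sketch using an autoregressive AR($p$) model, $x_t = c + a_1 x_{t-1} + \dots + a_p x_{t-p} + \varepsilon_t$, fitted by ordinary least squares. The AR model, the helper names `fit_ar` and `forecast_next`, and the simulated data are illustrative assumptions, not part of this project's pipeline.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p} + e_t.
    Returns the coefficient vector [c, a_1, ..., a_p]."""
    n = len(x)
    # Row for time t (t = p, ..., n-1) is [1, x_{t-1}, ..., x_{t-p}].
    lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef

def forecast_next(x, coef):
    """One-step-ahead forecast of the next value from the last p observations."""
    p = len(coef) - 1
    recent = x[::-1][:p]  # x_{n-1}, x_{n-2}, ..., x_{n-p}
    return float(coef[0] + coef[1:] @ recent)

# Simulate an AR(1) series with a_1 = 0.8 and recover the coefficient.
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.normal()
coef = fit_ar(x, p=1)
print(coef, forecast_next(x, coef))  # coef is close to [0, 0.8]
```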

TODO? Picture in some way. 

One viable approach would be to fit a time series model to the training data and then use a statistical test to determine whether the new series differs from what the model predicts.
In this project, however, we avoid the modeling step and instead take a fully data-driven approach using a deep learning technique known as _autoencoders_.
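
For comparison, the model-based alternative mentioned above, which this project does not pursue, can be sketched as follows: fit a zero-mean AR(1) model to the normal training series, then score a new series by its mean squared one-step prediction error, relative to the fitted noise variance. The AR(1) choice, the 0.99 quantile threshold, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def fit_ar1(train):
    """Pooled least-squares fit of a zero-mean AR(1) model x_t = a x_{t-1} + e_t
    to training series of shape (N, T). Returns a and the residual variance."""
    prev, curr = train[:, :-1].ravel(), train[:, 1:].ravel()
    a = float(prev @ curr / (prev @ prev))
    return a, float(np.var(curr - a * prev))

def ar1_score(z, a, s2):
    """Mean squared one-step prediction error of series z, in units of s2."""
    return float(np.mean((z[1:] - a * z[:-1]) ** 2) / s2)

def simulate_ar1(rng, a, T):
    """Simulate an AR(1) series of length T with unit-variance Gaussian noise."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = a * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(0)
train = np.stack([simulate_ar1(rng, 0.8, 100) for _ in range(200)])
a, s2 = fit_ar1(train)
# Flag a new series if it is predicted worse than 99% of the training series.
threshold = np.quantile([ar1_score(x, a, s2) for x in train], 0.99)
z = rng.normal(scale=3.0, size=100)  # a series that does not follow the model
print(ar1_score(z, a, s2) > threshold)  # True: flagged as anomalous
```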
 

## Autoencoders

Autoencoders are neural networks that are used to learn embeddings of unlabeled data. 
An autoencoder consists of two networks, the _encoder_ and the _decoder_. 
The encoder learns to map the input data to a representation in a latent space, and the decoder learns to reconstruct the input data from these representations.
Formally, an autoencoder is a $4$-tuple $(\mathcal D,\mathcal E,\varphi_E,\varphi_D)$.
Here $\mathcal D$ is the data space (in our case, the space of time series of a certain length), and $\mathcal E$ is the latent space of representations, in our case chosen to be a Euclidean space of dimension $n$.
Further, $\varphi_E\colon \mathcal D \to \mathcal E$ is the _encoder_ and $\varphi_D\colon \mathcal E \to \mathcal D$ is the _decoder_. 

The mappings $\varphi_E$ and $\varphi_D$ are parametrized by neural networks. 
In our case, we use TODO WHEN DECIDED. MLP or SOMETHING ELSE?
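
Since the architecture is not yet fixed, the following is only a hypothetical sketch of one candidate, an MLP, with $\mathcal D = \mathbb R^m$ (series of length $m$) and $\mathcal E = \mathbb R^n$. The helpers `init_mlp` and `mlp`, the tanh activation, and the sizes $m = 128$, $n = 8$ and hidden width 64 are illustrative assumptions; the sketch shows only the forward maps $\varphi_E$ and $\varphi_D$, not training.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (weight, bias) pairs for an MLP with the given layer sizes."""
    return [(rng.normal(scale=1 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: tanh on hidden layers, linear output layer. x has shape (batch, in)."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W.T + b)
    W, b = params[-1]
    return x @ W.T + b

m, n, hidden = 128, 8, 64               # series length, latent dim, hidden width (illustrative)
rng = np.random.default_rng(0)
phi_E = init_mlp(rng, [m, hidden, n])   # encoder parameters: R^m -> R^n
phi_D = init_mlp(rng, [n, hidden, m])   # decoder parameters: R^n -> R^m
x = rng.normal(size=(4, m))             # a batch of 4 (random) series
z = mlp(phi_E, x)                       # latent representations
x_hat = mlp(phi_D, z)                   # reconstructions
print(z.shape, x_hat.shape)             # (4, 8) (4, 128)
```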


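To train the autoencoder, we assign it a task to solve, namely reconstructing its own input.
In practice, this means selecting a data fidelity measure on $\mathcal{D}$, i.e., a function $d\colon \mathcal{D} \times \mathcal{D} \to \mathbb R$ that measures how far a reconstruction is from the original, such as the squared Euclidean distance.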
Then, given the parametrized coders $\varphi_E^{\theta_E}$ and $\varphi_D^{\theta_D}$, where $\theta_E$ and $\theta_D$ are the parameters of the functions indicated by their subscript, the training problem is to solve
$$
\min_{\theta_E,\theta_D} \sum_{i=1}^N d\big(x_i, (\varphi_D^{\theta_D}\circ \varphi_E^{\theta_E})(x_i)\big),
$$
where $\{x_i, i = 1,\ldots, N\} \subset \mathcal D$ are the $N$ training samples, i.e., the $N$ time series used to learn what a normal time series should look like.
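
As a minimal sketch of this training problem, assume linear coders $\varphi_E(x) = W_E x + b_E$ and $\varphi_D(h) = W_D h + b_D$, take $d(x,y) = \lVert x - y\rVert^2/m$, and minimize by full-batch gradient descent with hand-derived gradients. Averaging instead of summing over the $N$ samples does not change the minimizer. The linear parametrization, learning rate, step count, and synthetic sine-wave data are all assumptions chosen so the example runs with NumPy alone; a real model would more likely use a deep-learning framework with automatic differentiation.

```python
import numpy as np

def train_linear_autoencoder(X, n, lr=0.1, steps=2000, seed=0):
    """Gradient descent on mean_i ||x_i - W_D (W_E x_i + b_E) - b_D||^2 / m."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    We, be = rng.normal(scale=0.1, size=(n, m)), np.zeros(n)  # encoder parameters
    Wd, bd = rng.normal(scale=0.1, size=(m, n)), np.zeros(m)  # decoder parameters
    losses = []
    for _ in range(steps):
        H = X @ We.T + be            # latent codes, shape (N, n)
        R = H @ Wd.T + bd - X        # reconstruction residuals, shape (N, m)
        losses.append(float(np.mean(R ** 2)))
        G = 2 * R / (N * m)          # gradient of the loss w.r.t. the reconstructions
        GH = G @ Wd                  # gradient w.r.t. the latent codes (uses old Wd)
        Wd -= lr * G.T @ H
        bd -= lr * G.sum(axis=0)
        We -= lr * GH.T @ X
        be -= lr * GH.sum(axis=0)
    return (We, be, Wd, bd), losses

# Synthetic "normal" series: sine waves with random phase plus small noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 32, endpoint=False)
phases = rng.uniform(0, 2 * np.pi, size=(256, 1))
X = np.sin(2 * np.pi * t + phases) + 0.05 * rng.normal(size=(256, 32))
params, losses = train_linear_autoencoder(X, n=2)
print(losses[0], losses[-1])  # the loss should drop by orders of magnitude
```

A linear autoencoder trained with squared error can at best recover the principal subspace of the data, so this sketch cannot capture the nonlinear features a neural network could.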


## Time series anomaly detection using autoencoders

Given an autoencoder $(\mathcal D,\mathcal E,\varphi_E,\varphi_D)$ trained on normal, non-anomalous time series, the idea is that it has learned the most important features of a normal time series. If it is then given an anomalous time series $z \in \mathcal D$ that lacks some or all of these features, the reconstruction performance should drop significantly, i.e., the reconstruction error $d(z,\varphi_D\circ \varphi_E(z))$ should be large.
Moreover, if we train several autoencoders, each one contributes its own reconstruction error, which yields a distribution of reconstruction scores from which we can quantify uncertainty.
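
The scoring and ensemble idea can be sketched as follows. To keep the example self-contained, `fit_pca_autoencoder` is a hypothetical stand-in for a trained autoencoder: a rank-$n$ PCA projection, which is an optimal linear autoencoder under squared error. Each ensemble member is fitted on a bootstrap resample of the training set and flags a series when its error exceeds the 0.99 quantile of that member's training errors. The bootstrap scheme, the quantile, and the synthetic sine and square-wave series are assumptions for illustration only.

```python
import numpy as np

def fit_pca_autoencoder(X_train, n):
    """Hypothetical stand-in for a trained autoencoder: returns x -> phi_D(phi_E(x))."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    V = Vt[:n].T                                 # top-n principal directions, shape (m, n)
    return lambda X: (X - mu) @ V @ V.T + mu

def scores(X, reconstruct):
    """Anomaly score d(x, phi_D(phi_E(x))) with d = mean squared error, one per row."""
    return np.mean((X - reconstruct(X)) ** 2, axis=1)

def ensemble_scores(X_train, X_new, n, members=10, q=0.99, seed=0):
    """Per-member scores of X_new divided by each member's q-quantile training score.
    Shape (members, len(X_new)); a value above 1 means that member flags the series."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(members):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap resample
        rec = fit_pca_autoencoder(X_train[idx], n)
        thr = np.quantile(scores(X_train[idx], rec), q)
        out.append(scores(X_new, rec) / thr)
    return np.array(out)

t = np.linspace(0, 1, 32, endpoint=False)

def noisy_sines(rng, k):
    """k unit-amplitude sine waves with random phase plus small Gaussian noise."""
    phases = rng.uniform(0, 2 * np.pi, size=(k, 1))
    return np.sin(2 * np.pi * t + phases) + 0.05 * rng.normal(size=(k, t.size))

rng = np.random.default_rng(1)
X_train = noisy_sines(rng, 256)
X_new = np.vstack([noisy_sines(rng, 1), np.sign(noisy_sines(rng, 1))])  # normal, square wave
S = ensemble_scores(X_train, X_new, n=2)
print(S.mean(axis=0), S.std(axis=0))  # mean and spread of normalised scores per series
```

The spread across members gives a rough uncertainty for each decision; how the members should differ in practice (resampling, random initialization, architecture) is left open here.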

## ECG Data
An electrocardiogram (ECG) is a recording of the electrical activity of the heart over several cardiac cycles. ECGs are used as a diagnostic tool to detect several cardiac conditions, such as heart attacks, arrhythmias, or thickening of the heart muscle. These conditions appear as deviations from a normal ECG time series.
We use ECG data from SOURCE for several reasons:
* It is a suitable test case for anomaly detection using autoencoders, since the relevant anomalies differ markedly from a normal ECG.
* It is suitable for scalability scenarios. More than 300 million such tests (SOURCE) are performed worldwide each year, and new data is produced almost continuously.
* TODO fill in if there are more things. 

## Making it scalable

TODO: What exactly is the goal? Describe when we know more.

## Discussion

TODO: PLENTY OF RESEARCH SUGGESTS AUTOENCODERS IS NOT SUITABLE FOR THE APPLICATION: SHOULD WE DISCUSS THIS?

Neural networks in general, and autoencoders in particular, can generalize surprisingly well out of sample: even if an autoencoder has never seen a given type of anomalous time series, it may still reconstruct that series accurately. The reconstruction error is then low, and the anomaly goes undetected. Given this, one could ask whether it is reasonable to use autoencoders for anomaly detection in medical applications. On the one hand, a human interpreting ECG time series can also make mistakes; on the other hand, the objectivity that some attribute to deep-learning-based methods could mean that life-threatening diseases go undiagnosed and the patient is sent home, because "the computer is objective and cannot lie".

$$