# Outlier Detection Workshop

## Agenda (as far as we get):
1. Concept of stationary timeseries
2. Kullback-Leibler and Jensen-Shannon Divergences
3. Seasonal autoregressive integrated moving average (SARIMA)
4. Singular Spectrum Analysis
5. Curve fitting
6. Facebook Prophet
7. Further methods
    - Low-pass filter (z-scoring)
    - Isolation Forest
    - Seasonal extreme studentized deviate (S-ESD)
8. Bayesian Networks
    - A brief (micro) introduction: coin flips
    - Example implementation with a Gaussian process and a latent variable
9. Deep Learning
    - [(LSTM) AutoEncoder](https://colab.research.google.com/drive/1yDVG6C9-R9NTlXqPOMPKzXAOMdPsDhQN#scrollTo=saamYyUsHdw0)
    - [Dual-Stage Attention-Based Recurrent Neural Network](https://arxiv.org/pdf/1704.02971.pdf)
    
    
## Preparations
The workshop uses several frameworks and libraries. The complete list is provided in the file "OD_Workshop_Env.yml". Assuming a _conda_ distribution is available, the environment can be created via:

```conda env create -f OD_Workshop_Env.yml```
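
Once the environment has been created and activated, a quick import check can confirm that the core libraries are visible to Python. This is only a minimal sketch: the import names in `packages` are assumed from the libraries mentioned in this README (the authoritative list is the one in "OD_Workshop_Env.yml"), and `missing_packages` is a hypothetical helper, not part of the workshop material.

```python
import importlib.util

# Import names assumed from the libraries named in this README;
# adjust the list to match OD_Workshop_Env.yml.
packages = ["numpy", "pandas", "sklearn", "matplotlib", "statsmodels"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(packages)
print("missing packages:", missing if missing else "none")
```

An empty result suggests the environment is active; any listed name points to a package that still needs to be installed.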

## Frameworks and remarks (Python)
We start with a list ("Basics") of very standard libraries and then group further packages by purpose.

### Basics

#### NumPy and Pandas
Both are part of the [NumFOCUS stack](https://numfocus.org/) and form the very core (especially NumPy) of almost all analytical frameworks for Python out there. _NumPy_ is unrivaled in its capabilities, performance and API for n-dimensional array computations. Pandas builds on top of NumPy and provides DataFrames and everything needed around them. It is very hard to get around both frameworks within the Python environment :]
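
To give a feeling for how the two libraries work together, here is a minimal sketch of global z-scoring on a Pandas series. The data is synthetic and the threshold of 3 standard deviations is an assumption chosen for illustration, not a recommendation from the workshop.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative data only)
rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=100, freq="D")
values = rng.normal(loc=10.0, scale=1.0, size=100)
values[42] = 25.0  # inject one obvious outlier
series = pd.Series(values, index=index, name="value")

# z-score: distance from the mean in units of standard deviation
z = (series - series.mean()) / series.std()

# Flag points whose absolute z-score exceeds the (assumed) threshold
outliers = series[np.abs(z) > 3.0]
print(outliers)
```

NumPy does the numerical work (random numbers, `np.abs`), while Pandas carries the time index along so that flagged values keep their timestamps.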

#### scikit-learn
Also part of the [NumFOCUS stack](https://numfocus.org/), providing a very extensive list of implemented machine learning algorithms. The standardized API of _sklearn_ is worth mentioning: many libraries, including external ones, follow the API patterns introduced by sklearn, which makes them directly interface-compatible. Sklearn leaves Bayesian networks and deep learning to other, more specialized frameworks.
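
The API pattern mentioned above boils down to constructing an estimator and calling `fit` / `predict` (or `fit_predict`) on it. As a small sketch, here it is with `IsolationForest`, one of the methods on the agenda. The data is synthetic and the `contamination` value is an assumption made for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic one-dimensional data with a single injected outlier
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
X[0, 0] = 25.0

# Construct the estimator, then fit and predict in one step
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(X)        # +1 for inliers, -1 for outliers
scores = model.decision_function(X)  # lower score = more anomalous

print("flagged rows:", np.where(labels == -1)[0])
```

Other sklearn-style estimators can usually be swapped in by changing only the line that constructs `model`.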

#### Matplotlib 
The standard visualization library in analytical environments. Very powerful in general, yet lacking interactivity. It takes a lot of documentation reading as soon as one wants to stray from the defaults... but it's usually possible!

#### statsmodels
This library implements many sophisticated models and statistical tools for working with timeseries. It does not provide proper stochastic or deep learning capabilities.

### Sorted by Purpose

### "Mechanical" basics for analytical workflows
- NumPy
- Pandas

### Data Visualization
- Matplotlib
- Seaborn
- Bokeh
- Plotly
- Altair
- ArviZ (for working with PyMC3)

### Statistical Toolsets (incl. machine learning, excluding BN and DL)
- SciPy (academic toolsets with a strong focus on statistics)
- statsmodels (timeseries specialist)
- sklearn (general-purpose machine learning of all kinds)
- PyOD (general-purpose outlier detection)
- sesd (seasonal extreme studentized deviate)
- Facebook Prophet (specialist for timeseries with daily frequencies)

### Bayesian Networks
- PyMC3

### Deep Learning
- TensorFlow
- PyTorch
- Keras