# Autoencoders for anomaly detection in time series

To assess the quality of an autoencoder, we can analyse its performance on a classification task, using the UCR benchmark dataset for time series.

## References

2016 - *Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series* (Cited by 4)

2014 - *Semi-supervised Learning with Deep Generative Models* (Cited by 276)

**Related articles:**

2013 - *SUCCESS: A New Approach for Semi-Supervised Classification of Time-Series* (Cited by 14)

2013 - *Semi-Supervised Time Series Classification* (Cited by 193)

In [34]:
import numpy as np
from keras.utils import np_utils

def data():
    def readucr(filename):
        data = np.loadtxt(filename, delimiter=',')
        Y = data[:, 0]
        X = data[:, 1:]
        return X, Y

    def to_categorical(y, nb_classes):
        return np_utils.to_categorical((y - y.min()) / (y.max() - y.min()) * (nb_classes - 1), nb_classes)

    fdir = "../../../ucr/"  # Path to the UCR Time Series Data directory
    fname = "ChlorineConcentration"
    print("Dataset : " + fname)

    x_train, y_train = readucr(fdir + fname + '/' + fname + '_TRAIN')
    x_test, y_test = readucr(fdir + fname + '/' + fname + '_TEST')
    nb_classes = len(np.unique(y_test))
    Y_train = to_categorical(y_train, nb_classes)
    Y_test = to_categorical(y_test, nb_classes)
    return x_train, x_test, Y_train, Y_test

x_train, x_test, Y_train, Y_test = data()
print("\nx_train shape :",x_train.shape)
print("x_test shape :",x_test.shape)
print("\nY_train shape :",Y_train.shape)
print("Y_test shape :",Y_test.shape)

Dataset : ChlorineConcentration

x_train shape : (467, 166)
x_test shape : (3840, 166)

Y_train shape : (467, 3)
Y_test shape : (3840, 3)
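
The `to_categorical` helper in the cell above first rescales the raw labels linearly onto the range 0 to `nb_classes - 1`, and only then one-hot encodes them. The sketch below reproduces that mapping with NumPy alone. It assumes the labels are evenly spaced (for example 1, 2, 3); the name `one_hot_from_labels` and the explicit rounding step are illustrative additions, not part of the notebook.

```python
import numpy as np

def one_hot_from_labels(y, nb_classes):
    """Rescale labels to 0..nb_classes-1, then one-hot encode them."""
    y = np.asarray(y, dtype=float)
    scaled = (y - y.min()) / (y.max() - y.min()) * (nb_classes - 1)
    # Rounding guards against floating-point results such as 1.9999999
    # (an addition of this sketch, not of the notebook's helper).
    idx = np.rint(scaled).astype(int)
    return np.eye(nb_classes)[idx]

labels = np.array([1, 2, 3, 2])
encoded = one_hot_from_labels(labels, 3)
print(encoded.argmax(axis=1))
```

The same rescaling also handles label sets such as -1 and 1, which become class indices 0 and 1.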


In [35]:
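def crop_input(x_train, x_test):
    # Keep only the first crop_length time steps of every series
    # (defined here but not called in the cells shown)
    crop_length = 160

    x_train = x_train[:, :crop_length]
    x_test = x_test[:, :crop_length]
    return x_train, x_test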
    
def normalize_input(x_train, x_test):
    # Rescale train and test jointly to [-1, 1] using the global min and max
    x_concat = np.concatenate([x_train, x_test])
    x_concat = (x_concat - x_concat.min()) / (x_concat.max() - x_concat.min()) * 2 - 1.
    x_train = x_concat[:x_train.shape[0]]
    x_test = x_concat[x_train.shape[0]:]

    # Add a trailing channel axis: (n_samples, n_steps) -> (n_samples, n_steps, 1)
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))
    x_concat = np.concatenate([x_train, x_test], axis=0)
    return x_train, x_test, x_concat

x_train, x_test, x_concat = normalize_input(x_train, x_test)
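
To make the effect of `normalize_input` concrete, here is a small sketch on toy arrays. It applies the same global min-max rescaling to [-1, 1] and adds the trailing channel axis; the function name `scale_to_unit_range` and the toy values are assumptions made for illustration. Note that, as in the notebook, the minimum and maximum are computed over train and test together.

```python
import numpy as np

def scale_to_unit_range(x_train, x_test):
    """Jointly rescale two 2-D arrays to [-1, 1] and add a channel axis."""
    x_all = np.concatenate([x_train, x_test])
    lo, hi = x_all.min(), x_all.max()
    x_all = (x_all - lo) / (hi - lo) * 2 - 1.0
    n_train = len(x_train)
    # None adds the trailing axis: (n_samples, n_steps) -> (n_samples, n_steps, 1)
    return x_all[:n_train, :, None], x_all[n_train:, :, None]

toy_train = np.array([[0.0, 5.0], [10.0, 2.5]])
toy_test = np.array([[7.5, 0.0]])
tr, te = scale_to_unit_range(toy_train, toy_test)
print(tr.shape, te.shape)
```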

## Performance of the network

In [36]:
from keras.layers import Input, Flatten, Dense, Reshape
from keras.models import Model

x = Input(shape=(x_train.shape[1],))

h = Dense(512, activation='elu')(x)
h = Dense(64, use_bias=False)(h)
h = Dense(512, activation='elu')(h)

x_recons = Dense(x_train.shape[1])(h)

y = Dense(Y_train.shape[1], activation='softmax')(h)

mlt = Model(x, [x_recons, y])
mlt.compile('adam', ['mse', 'categorical_crossentropy'], metrics=['acc'])

epochs = 3000
batch_size = 256
mlt.fit(x_concat.squeeze(), [x_concat.squeeze(), np.concatenate([Y_train, Y_test])],
                   sample_weight=[np.ones(len(x_concat)),
                                  np.concatenate([np.ones(len(Y_train)),
                                                  np.zeros(len(Y_test))])],
                   shuffle=True,
                   epochs=epochs,
                   batch_size=batch_size,
                   verbose=0
        )

<keras.callbacks.History at 0x7f79fc18cb38>
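
The `fit` call above trains on the concatenation of train and test series, but the `sample_weight` of the classification output is zero for every test row. Test labels therefore never contribute to the classification loss, while every series contributes to the reconstruction loss. The sketch below illustrates that masking with NumPy on made-up probabilities. It uses a plain weighted mean, which is an assumption and may differ from the exact reduction Keras applies.

```python
import numpy as np

def weighted_crossentropy(y_true, y_prob, weights):
    """Weighted mean of per-sample categorical cross-entropy."""
    per_sample = -np.sum(y_true * np.log(y_prob), axis=1)
    return np.sum(per_sample * weights) / np.sum(weights)

# Three samples: the first two play the role of labelled training rows,
# the last one the role of a test row whose label must be ignored.
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
weights = np.array([1.0, 1.0, 0.0])

y_true = np.eye(3)[[0, 1, 2]]
y_true_changed = np.eye(3)[[0, 1, 0]]  # different label for the masked row

loss = weighted_crossentropy(y_true, y_prob, weights)
loss_changed = weighted_crossentropy(y_true_changed, y_prob, weights)
print(loss, loss_changed)
```

Both losses are identical, which shows that the label of a zero-weighted row has no influence on the classification objective.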

### Assessing the performance

The quality of the autoencoder is measured here by the accuracy of the classifier trained on the same hidden representation. For this specific dataset, we achieve state-of-the-art performance for time series classification.

In [37]:
acc = (np.argmax(mlt.predict(x_test.squeeze())[1], axis=1) == np.argmax(Y_test, axis=-1)).mean()
print('ACC :', acc)

ACC : 0.882291666667


**Conclusion:** 

We have therefore found a good method for dimensionality reduction: a network that extracts a small number of features (64 in the bottleneck layer) that are representative of the data.

Afterwards, the reconstruction error of a given sample is a good indicator of the *normality* of the sample.

If the error is high, the sample is probably an outlier.
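
A minimal sketch of that idea follows. It computes a per-sample mean squared reconstruction error and flags samples whose error exceeds a high percentile of the errors seen on reference data. The 99th-percentile threshold, the function names, and the synthetic arrays are all assumptions made for illustration; the notebook does not specify a thresholding rule.

```python
import numpy as np

def reconstruction_errors(x, x_recons):
    """Mean squared error between each sample and its reconstruction."""
    return np.mean((x - x_recons) ** 2, axis=1)

def flag_outliers(reference_errors, new_errors, percentile=99.0):
    """Flag new samples whose error exceeds a percentile of reference errors."""
    threshold = np.percentile(reference_errors, percentile)
    return new_errors > threshold, threshold

rng = np.random.default_rng(0)

# Synthetic stand-ins for real series and their reconstructions.
x_ref = rng.normal(size=(200, 16))
recons_ref = x_ref + rng.normal(scale=0.05, size=x_ref.shape)

x_new = rng.normal(size=(3, 16))
recons_new = x_new.copy()
recons_new[2] += 2.0  # the third sample is reconstructed badly

ref_err = reconstruction_errors(x_ref, recons_ref)
new_err = reconstruction_errors(x_new, recons_new)
flags, threshold = flag_outliers(ref_err, new_err)
print(flags)
```

With the network trained above, the reconstructions would come from the first output of `mlt.predict(...)`, since the model returns `[x_recons, y]`.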

## Improving the network

An interesting task would be to do **cross validation** between **different neural network implementations** on this dataset and on the others in the UCR benchmark, to find a **generally well-suited autoencoder** for time series.

We could afterwards apply this analysis to our own data and use the best method to identify anomalies in an unsupervised fashion.

Another interesting challenge would be to **minimize the number of parameters** used in the neural network, for two reasons:

- First, it reduces the risk of overfitting
- Second, the network trains (and therefore identifies anomalies) faster

Taking a look at our last (very simple) network, we observe that we have a total of 238,249 parameters to train:

In [38]:
mlt.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_18 (InputLayer)            (None, 166)           0                                            
____________________________________________________________________________________________________
dense_74 (Dense)                 (None, 512)           85504       input_18[0][0]                   
____________________________________________________________________________________________________
dense_75 (Dense)                 (None, 64)            32768       dense_74[0][0]                   
____________________________________________________________________________________________________
dense_76 (Dense)                 (None, 512)           33280       dense_75[0][0]                   
___________________________________________________________________________________________
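
The printed summary is cut off, but the total of 238,249 can be checked by hand from the layer definitions of the dense network: a `Dense` layer has `n_in * n_out` weights plus `n_out` biases (no biases when `use_bias=False`). The helper name `dense_params` below is an illustrative assumption.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Number of trainable parameters of a fully connected layer."""
    return n_in * n_out + (n_out if use_bias else 0)

layers = {
    "dense 166 -> 512": dense_params(166, 512),
    "dense 512 -> 64 (no bias)": dense_params(512, 64, use_bias=False),
    "dense 64 -> 512": dense_params(64, 512),
    "reconstruction 512 -> 166": dense_params(512, 166),
    "softmax 512 -> 3": dense_params(512, 3),
}

for name, count in layers.items():
    print(f"{name}: {count}")

total = sum(layers.values())
print("total:", total)
```

The first three counts match the rows visible in the summary (85504, 32768, 33280); the two output heads account for the remainder.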

Trying out **convolutional networks**, which take advantage of the sequential nature of time series, we can come up with a more complex network with **far fewer trainable parameters** (64,842) and yet nearly the same classification accuracy (0.87).

In [56]:
from keras.layers import Conv1D, Conv2DTranspose, Flatten, Reshape, BatchNormalization, Activation, Dropout

def convolutional_network():
    x = Input(shape=x_train.shape[1:])
    depth = 32
    kernel_size = 4
    squeeze = 2

    h = Conv1D(depth, kernel_size, strides=4, padding='same')(x)
    h = BatchNormalization()(h)
    h = Activation('relu')(h)
    h = Conv1D(depth*2, kernel_size*2, strides=4, padding='same')(h)
    h = BatchNormalization()(h)
    h = Activation('relu')(h)

    hidden = Flatten()(h)

    x_recons = Reshape((-1,1,depth * squeeze))(hidden)
    x_recons = Conv2DTranspose(filters=depth, kernel_size=(kernel_size*2, 1),
                               strides=(4, 1), padding='same', activation='relu')(x_recons)
    x_recons = Conv2DTranspose(filters=1, kernel_size=(kernel_size, 1),
                               strides=(4, 1), padding='same', activation='relu')(x_recons)
    x_recons = Flatten()(x_recons)
    x_recons = Dense(x_train.shape[1])(x_recons)
    x_recons = Reshape(x_train.shape[1:])(x_recons)

    y = Dropout(0.2)(hidden)
    y = Dense(Y_train.shape[1], activation='softmax')(y)

    mlt = Model(x, [x_recons, y])
    mlt.compile('adadelta', ['mse', 'categorical_crossentropy'], metrics=['acc'])
    return mlt

convolutional_network().summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_26 (InputLayer)            (None, 166, 1)        0                                            
____________________________________________________________________________________________________
conv1d_11 (Conv1D)               (None, 42, 32)        160         input_26[0][0]                   
____________________________________________________________________________________________________
batch_normalization_10 (BatchNor (None, 42, 32)        128         conv1d_11[0][0]                  
____________________________________________________________________________________________________
activation_9 (Activation)        (None, 42, 32)        0           batch_normalization_10[0][0]     
___________________________________________________________________________________________
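
Since this summary is also cut off, the sketch below derives the output lengths and the trainable parameter count directly from the layer definitions in `convolutional_network()`. It relies on two facts: with `padding='same'`, a strided convolution outputs `ceil(length / stride)` steps and a transposed convolution outputs `length * stride` steps; and `BatchNormalization` trains only two parameters per channel (its moving mean and variance are not trainable). The helper names are illustrative assumptions.

```python
import math

def same_padding_length(n, stride):
    """Output length of a strided convolution with padding='same'."""
    return math.ceil(n / stride)

def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a 1-D (or k x 1) convolution."""
    return kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

depth, kernel_size, n_steps, n_classes = 32, 4, 166, 3

len1 = same_padding_length(n_steps, 4)   # after the first Conv1D
len2 = same_padding_length(len1, 4)      # after the second Conv1D
flat = len2 * depth * 2                  # size of the flattened hidden code
up_len = len2 * 4 * 4                    # after two stride-4 transposed convs

trainable = (
    conv_params(kernel_size, 1, depth) + 2 * depth                   # Conv1D + BN
    + conv_params(kernel_size * 2, depth, depth * 2) + 2 * depth * 2  # Conv1D + BN
    + conv_params(kernel_size * 2, depth * 2, depth)                  # Conv2DTranspose
    + conv_params(kernel_size, depth, 1)                              # Conv2DTranspose
    + dense_params(up_len, n_steps)                                   # reconstruction
    + dense_params(flat, n_classes)                                   # softmax head
)
print(len1, len2, flat, trainable)
```

The first length (42) and the first convolution's 160 parameters agree with the visible rows of the summary, and the total agrees with the 64,842 trainable parameters quoted above.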

# Cross validation

A principled way to choose hyperparameters (number of layers, type of layers, number of neurons, activation type, loss to optimize, ...) is to use cross validation.

Hyperopt offers a very convenient way to search through a **space of possibilities**, using the Tree-structured Parzen Estimator (TPE) algorithm.

Our **space of possibilities** looks like this:

In [58]:
from hyperopt import hp

space = {
    'n_layers': hp.choice('n_layers',
                          [{'n':1},
                           {'n':2,'squeeze':hp.choice('squeeze',[1,2])}]),
    'after_conv': hp.choice('after_conv',
                            [{'type':'max_pooling',
                              'activation':hp.choice('activation',['relu','elu',None]),
                              'pooling_size':hp.choice('pooling_size',[2,4])},
                             {'type':'batch_norm',
                              'activation': 'relu',
                              'pooling_size':None}]),
    'middle_layer': hp.choice('middle_layer',
                              [{'type':'gaussian',
                               'epsilon':hp.choice('epsilon',[0.1,0.5,1]),
                               'correct_factor': hp.choice('correct_factor',[True, False]),
                               'gaussian_regul':hp.choice('gaussian_regul',[0.1,0.5,5,10])},
                               {'type':'regular',
                                'epsilon':None,
                                'correct_factor':None,
                                'gaussian_regul':None}
                              ]),
    'depth': hp.choice('depth',[16,32,64]),
    'kernel_size': hp.choice('kernel_size',[2,4,8,16]),
    'intermediate_dim': hp.choice('intermediate_dim',[16,32,64]),
    'dropout': hp.uniform('dropout',.0,.5),
    'recons_regul': hp.uniform('recons_regul',1.,10.),
    'optimizer': hp.choice('optimizer',['adadelta','adam','rmsprop']),
}

## Interpretation of the results

The results can be exported to JSON, and the best autoencoder can then be chosen from them.
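
Because Hyperopt minimizes its objective, the losses stored with each trial appear to be negated accuracies, so the best configuration is the one with the smallest loss. The sketch below shows that selection on plain dictionaries that mimic the `trial['result']['loss']` structure read by the notebook. The two loss values are the first two printed in the trial listing; the `label` field and the function name `best_trial` are hypothetical additions for readability.

```python
# Hypothetical records mimicking the structure of the stored trials:
# each trial exposes its loss under trial['result']['loss'].
trials = [
    {"result": {"loss": -0.8591145833333333},
     "label": "max_pooling, 2 layers, adam"},
    {"result": {"loss": -0.87578125},
     "label": "batch_norm, 1 layer, adadelta"},
]

def best_trial(trials):
    """Return the trial with the smallest loss."""
    return min(trials, key=lambda t: t["result"]["loss"])

best = best_trial(trials)
# Assumption: the loss is the negated accuracy, so flip the sign back.
best_accuracy = -best["result"]["loss"]
print(best["label"], best_accuracy)
```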

In [61]:
import pickle
from hyperas.utils import eval_hyperopt_space
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

with open("../../../nn-tsc/logs/opt/trials.p", "rb") as f:
    trials = pickle.load(f)  # Trials changed
with open("../../../nn-tsc/logs/opt/space.p", "rb") as f:
    space = pickle.load(f)

for trial in trials:
    vals = trial.get('misc').get('vals')
    print(eval_hyperopt_space(space,vals))
    print(trial.get('result').get('loss'))

{'after_conv': {'activation': 'elu', 'pooling_size': 4, 'type': 'max_pooling'}, 'depth': 32, 'dropout': 0.2, 'intermediate_dim': 32, 'kernel_size': 8, 'middle_layer': {'correct_factor': None, 'epsilon': None, 'gaussian_regul': None, 'type': 'regular'}, 'n_layers': {'n': 2, 'squeeze': 1}, 'optimizer': 'adam', 'recons_regul': 7.179121423546559}
-0.8591145833333333
{'after_conv': {'activation': 'relu', 'pooling_size': None, 'type': 'batch_norm'}, 'depth': 16, 'dropout': 0.2, 'intermediate_dim': 32, 'kernel_size': 8, 'middle_layer': {'correct_factor': None, 'epsilon': None, 'gaussian_regul': None, 'type': 'regular'}, 'n_layers': {'n': 1}, 'optimizer': 'adadelta', 'recons_regul': 1.5845342151446347}
-0.87578125
{'after_conv': {'activation': 'relu', 'pooling_size': None, 'type': 'batch_norm'}, 'depth': 64, 'dropout': 0.2, 'intermediate_dim': 32, 'kernel_size': 8, 'middle_layer': {'correct_factor': None, 'epsilon': None, 'gaussian_regul': None, 'type': 'regular'}, 'n_layers': {'n': 1}, 'optim