# Timeseries anomaly detection using an Autoencoder

**Author:** [pavithrasv](https://github.com/pavithrasv)<br>
**Date created:** 2020/05/31<br>
**Last modified:** 2020/05/31<br>
**Description:** Detect anomalies in a timeseries using an Autoencoder.

https://github.com/keras-team/keras-io/blob/master/examples/timeseries/ipynb/timeseries_anomaly_detection.ipynb

## Introduction

This script demonstrates how you can use a reconstruction convolutional
autoencoder model to detect anomalies in timeseries data.

## Setup

In [1]:
import numpy as np
import pandas as pd
import keras
from keras import layers, models
from matplotlib import pyplot as plt


In [2]:
from joblib import delayed

In [4]:
import random

In [5]:
import tensorflow as tf
from tensorflow.keras.callbacks import Callback

In [6]:
import dask.dataframe as dd
import os
import shutil

In [7]:
batchSize = 1024

## Load the data

We will use the [Numenta Anomaly Benchmark (NAB)](
https://www.kaggle.com/boltzmannbrain/nab) dataset. It provides artificial
timeseries data containing labeled anomalous periods of behavior. Data are
ordered, timestamped, single-valued metrics.

We will use the `art_daily_small_noise.csv` file for training and the
`art_daily_jumpsup.csv` file for testing. The simplicity of this dataset
allows us to demonstrate anomaly detection effectively.
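
The notebook does not show the loading step for these files. A minimal sketch, assuming each NAB file is a CSV with a `timestamp` column and a single `value` column; the helper names `load_series` and `normalize` are hypothetical, and the inline CSV text (made-up values) stands in for a real file path:

```python
import io

import pandas as pd


def load_series(csv_source):
    """Read a timestamp-indexed, single-metric CSV into a DataFrame."""
    return pd.read_csv(csv_source, parse_dates=True, index_col="timestamp")


def normalize(df, mean=None, std=None):
    """Standardize values; pass the training mean/std when normalizing test data."""
    mean = df.mean() if mean is None else mean
    std = df.std() if std is None else std
    return (df - mean) / std, mean, std


# Tiny in-memory stand-in for a file such as art_daily_small_noise.csv.
sample_csv = io.StringIO(
    "timestamp,value\n"
    "2014-04-01 00:00:00,10.0\n"
    "2014-04-01 00:05:00,20.0\n"
    "2014-04-01 00:10:00,30.0\n"
)
df_train = load_series(sample_csv)
df_train_value, training_mean, training_std = normalize(df_train)
print(df_train_value["value"].round(3).tolist())
```

Reusing `training_mean` and `training_std` for the test file keeps both series on the same scale, which is what the test-data cell later in this notebook expects.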

## Build a model

We will build a convolutional reconstruction autoencoder model. The model takes
input of shape `(batch_size, sequence_length, num_features)`. In this notebook,
`sequence_length` is 1000 (`TIME_STEPS`) and `num_features` is 3 (`Dims`). The
final layer emits one value per time step, so the output has shape
`(batch_size, sequence_length, 1)`.
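
To see how the sequence length changes through the network, here is a rough sketch of the length arithmetic. It assumes three stride-2 `Conv1D` layers followed by three stride-2 `Conv1DTranspose` layers, all with `padding="same"`, as in the model defined later in this notebook, and a sequence length of 1000. The helper names are hypothetical.

```python
import math


def conv_same_length(length, stride):
    """Output length of a strided convolution with padding="same"."""
    return math.ceil(length / stride)


def conv_transpose_same_length(length, stride):
    """Output length of a strided transposed convolution with padding="same"."""
    return length * stride


length = 1000  # assumed sequence length (TIME_STEPS in this notebook)
trace = [length]
for _ in range(3):  # three stride-2 encoder layers
    length = conv_same_length(length, stride=2)
    trace.append(length)
for _ in range(3):  # three stride-2 decoder layers
    length = conv_transpose_same_length(length, stride=2)
    trace.append(length)

print(trace)  # [1000, 500, 250, 125, 250, 500, 1000]
```

The round trip only returns the original length when it is divisible by 8. A length of 1001, for example, would come back as 1008 and no longer match the target.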

### Get data sequences

Create sequences combining `TIME_STEPS` contiguous data values from the
training data.
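
The windowing helper itself is not shown in this notebook, although `create_sequences` is called later on the test data. A minimal sketch of what such a helper might look like, assuming a NumPy array of values ordered in time (the example data is made up):

```python
import numpy as np


def create_sequences(values, time_steps):
    """Stack every window of `time_steps` contiguous values into one array."""
    windows = [
        values[start : start + time_steps]
        for start in range(len(values) - time_steps + 1)
    ]
    return np.stack(windows)


values = np.arange(10).reshape(-1, 1)  # ten made-up readings, one feature
x = create_sequences(values, time_steps=3)
print(x.shape)  # (8, 3, 1)
```

Ten values with a window of 3 give 8 overlapping samples, the first being `0, 1, 2` and the last `7, 8, 9`.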

In [8]:
%%time
TIME_STEPS = 1000
Dims = 3

# Candidate data locations; only the last assignment takes effect.
Folder = '/scratch/1000Sm/'
Folder = '/scratch/1000Input/'
Folder = '/lclscr/1000Inputs/'

CPU times: user 3 μs, sys: 1e+03 ns, total: 4 μs
Wall time: 5.96 μs


In [9]:
%%time 
Olines = [
    os.path.join(Folder,file)
    for file in os.listdir(Folder) if file.endswith('Outs.csv') #and file.startswith('2')
]

CPU times: user 5.29 s, sys: 2.16 s, total: 7.45 s
Wall time: 7.47 s


In [10]:
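# Map each label file ('...Outs.csv') to its matching input file ('...Data.csv').
lines = [sub.replace('Outs', 'Data') for sub in Olines]

# Rebuild the list from a saved file listing; this overrides the list above.
with open('FileListAsOf0812-b.txt', 'r') as file:
    Alines = [line.rstrip('\n') for line in file]

# Drop the first 19 characters of each saved path and re-root it under Folder.
lines = [Folder + line[19:] for line in Alines]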

In [11]:
len(lines)

6754297

dataset = tf.data.Dataset.list_files(file_list)

In [12]:
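def parse_csv(file_path):
    """Load one sample: features from '...Data.csv', labels from '...Outs.csv'."""
    # Read the three feature columns; fall back to zeros if the file is unreadable.
    try:
        df = pd.read_csv(file_path, header=None)
        df.columns = ['rx','ry','rz']
        features = df[['rx','ry','rz']].values.tolist()
    except Exception:
        print(file_path)  # log the file that failed to parse
        features = np.zeros((TIME_STEPS, Dims)).tolist()

    # The label file shares the path prefix: swap the 8-character 'Data.csv' suffix.
    label_path = file_path[:-8] + 'Outs.csv'
    try:
        label = pd.read_csv(label_path, header=None)
        label.columns = ['sx']
        labels = np.asarray(label.sx)
    except Exception:
        print(label_path)  # log the label file that failed to parse
        labels = np.zeros(TIME_STEPS)

    return features, labels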

In [13]:
class CustomDataset(tf.keras.utils.Sequence):
    def __init__(self, files, batch_size=batchSize, shuffle=True, **kwargs):
        super().__init__(**kwargs)  # Call the parent class constructor
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.files = files
        self.on_epoch_end()
        self.OneTime = 0
        
    def __len__(self):
        return int(np.floor(len(self.files) / self.batch_size))

    def __getitem__(self, index):
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        Feat, Labe = [], []
        
        for idx in indexes:
            f = parse_csv(self.files[idx])
            Feat.append(f[0])
            Labe.append(f[1])

        if self.OneTime == 0:
            print(np.shape(Feat), np.shape(Labe))
            self.OneTime = 1
        
        Feat = tf.convert_to_tensor(Feat, dtype=tf.float32)
        Labe = tf.convert_to_tensor(Labe, dtype=tf.float32)
        return Feat, Labe

    def on_epoch_end(self):
        self.indexes = np.arange(len(self.files))
        if self.shuffle:
            np.random.shuffle(self.indexes)


In [14]:
training_generator = CustomDataset(lines, batch_size=batchSize, shuffle = True)

## Train the model

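Please note that in a pure reconstruction setup, `x_train` serves as both the
input and the target. In this notebook, `training_generator` yields
`(features, labels)` batches instead, with the targets read from the matching
`Outs.csv` files.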

In [15]:
model = models.Sequential(
    [
        # Input layer
        layers.Input(shape=(1000, 3)),
        
        # Increased number of filters and added layers
        layers.Conv1D(filters=128, kernel_size=7, padding="same", strides=2, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        layers.Conv1D(filters=64, kernel_size=7, padding="same", strides=2, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        layers.Conv1D(filters=32, kernel_size=7, padding="same", strides=2, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        
        # Dropout layer for regularization
        layers.Dropout(rate=0.3),
        
        # Transpose layers with increased filters
        layers.Conv1DTranspose(filters=32, kernel_size=7, padding="same", strides=2, activation="relu"),
        layers.Conv1DTranspose(filters=64, kernel_size=7, padding="same", strides=2, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        layers.Conv1DTranspose(filters=128, kernel_size=7, padding="same", strides=2, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        
        # Dropout layer for regularization
        layers.Dropout(rate=0.3),
        
        # Output layer
        layers.Conv1DTranspose(filters=1, kernel_size=7, padding="same"),
        
        # Added dense layers with more neurons
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(1)
    ]
)


In [16]:
class CustomModelCheckpoint(Callback):
    """Save the full model every `save_freq` epochs."""

    def __init__(self, filepath, save_freq):
        super().__init__()
        self.filepath = filepath
        self.save_freq = save_freq

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.save_freq == 0:
            # The file extension in `filepath` (.keras or .h5) selects the format.
            self.model.save(self.filepath.format(epoch=epoch + 1))


In [17]:
checkpoint_callback = CustomModelCheckpoint(
    filepath='/scratch/models/TAD_0823_checkpoint_a_{epoch:02d}.keras',
    save_freq=1  
)

In [18]:
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
model.summary()

In [19]:
checkpoint_callback = CustomModelCheckpoint(
    filepath='/scratch/models/TAD_0822_checkpoint_a_{epoch:02d}.h5',
    save_freq=1  
)

tb_callback = tf.keras.callbacks.TensorBoard(log_dir='/scratch/models/profiles/0823',
                                            profile_batch='01, 125')

es_callback = keras.callbacks.EarlyStopping(monitor="loss", patience=5, mode="min")


In [None]:
history = model.fit(
    x=training_generator,  # already yields batches of `batchSize` samples
    epochs=10,
    #validation_split=0.1,
    callbacks=[checkpoint_callback, es_callback, tb_callback],
)

(1024, 1000, 3) (1024, 1000)
Epoch 1/10
/lclscr/1000Inputs/230117 recording200038863Outs.csv
/lclscr/1000Inputs/221014 recording100092303Data.csv
6595/6595 ━━━━━━━━━━━━━━━━━━━━ 40995s 6s/step - loss: 1.0535
Epoch 2/10
/lclscr/1000Inputs/221014 recording100092303Data.csv
/lclscr/1000Inputs/230117 recording200038863Outs.csv
6595/6595 ━━━━━━━━━━━━━━━━━━━━ 48608s 7s/step - loss: 0.9973
Epoch 3/10
/lclscr/1000Inputs/230117 recording200038863Outs.csv
/lclscr/1000Inputs/221014 recording100092303Data.csv
6595/6595 ━━━━━━━━━━━━━━━━━━━━ 42499s 6s/step - loss: 1.0010
Epoch 4/10
/lclscr/1000Inputs/230117 recording200038863Outs.csv
/lclscr/1000Inputs/221014 recording100092303Data.csv
4468/6595 ━━━━━━━━━━━━━ 3:41:40 6s/step - loss: 0.9987

Let's plot the training loss to see how the training went. No validation split
is used in this run, so the validation curve is commented out.

In [None]:
plt.plot(history.history["loss"], label="Training Loss")
#plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
plt.show()

## Detecting anomalies

We will detect anomalies by determining how well our model can reconstruct
the input data.


1. Find MAE loss on training samples.
2. Find max MAE loss value. This is the worst our model has performed trying
   to reconstruct a sample. We will make this the `threshold` for anomaly
   detection.
3. If the reconstruction loss for a sample is greater than this `threshold`
   value then we can infer that the model is seeing a pattern that it isn't
   familiar with. We will label this sample as an `anomaly`.
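
The three steps can be tried end to end on synthetic data. The sketch below uses made-up arrays in place of real model predictions (the noise levels and shapes are arbitrary assumptions), so it only illustrates the thresholding logic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-ins: 100 training windows of 8 steps, plus noisy "reconstructions".
x_train = rng.normal(size=(100, 8, 1))
x_train_pred = x_train + rng.normal(scale=0.1, size=x_train.shape)

# Step 1: one MAE value per window (average over the time axis).
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=1)

# Step 2: the worst training reconstruction becomes the threshold.
threshold = np.max(train_mae_loss)

# Step 3: a badly reconstructed window exceeds the threshold.
x_new = rng.normal(size=(1, 8, 1))
x_new_pred = x_new + 5.0  # deliberately poor reconstruction
new_mae_loss = np.mean(np.abs(x_new_pred - x_new), axis=1)
is_anomaly = new_mae_loss > threshold
print(is_anomaly.ravel())
```

Because the threshold is the maximum training error, no training window is ever flagged. A lower threshold (for example a high percentile) would trade more detections for more false alarms.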


In [None]:
# Get train MAE loss.
x_train_pred = model.predict(x_train)
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=1)

plt.hist(train_mae_loss, bins=50)
plt.xlabel("Train MAE loss")
plt.ylabel("No of samples")
plt.show()

# Get reconstruction loss threshold.
threshold = np.max(train_mae_loss)
print("Reconstruction error threshold: ", threshold)

### Compare reconstruction

Just for fun, let's see how our model has reconstructed the first sample.
This is the 288 timesteps from day 1 of our training dataset.

In [None]:
# Checking how the first sequence is learnt
plt.plot(x_train[0])
plt.plot(x_train_pred[0])
plt.show()

### Prepare test data

In [None]:

df_test_value = (df_daily_jumpsup - training_mean) / training_std
fig, ax = plt.subplots()
df_test_value.plot(legend=False, ax=ax)
plt.show()

# Create sequences from test values.
x_test = create_sequences(df_test_value.values)
print("Test input shape: ", x_test.shape)

# Get test MAE loss.
x_test_pred = model.predict(x_test)
test_mae_loss = np.mean(np.abs(x_test_pred - x_test), axis=1)
test_mae_loss = test_mae_loss.reshape((-1))

plt.hist(test_mae_loss, bins=50)
plt.xlabel("test MAE loss")
plt.ylabel("No of samples")
plt.show()

# Detect all the samples which are anomalies.
anomalies = test_mae_loss > threshold
print("Number of anomaly samples: ", np.sum(anomalies))
print("Indices of anomaly samples: ", np.where(anomalies))

## Plot anomalies

We now know the samples of the data which are anomalies. With this, we will
find the corresponding `timestamps` from the original test data. We will be
using the following method to do that:

Let's say time_steps = 3 and we have 10 training values. Our `x_train` will
look like this:

- 0, 1, 2
- 1, 2, 3
- 2, 3, 4
- 3, 4, 5
- 4, 5, 6
- 5, 6, 7
- 6, 7, 8
- 7, 8, 9

Every data value except the first and the last `time_steps - 1` values appears
in exactly `time_steps` samples. So, if we know that the samples
[(3, 4, 5), (4, 5, 6), (5, 6, 7)] are anomalies, we can say that the data point
5 is an anomaly.
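
The worked example above (time_steps = 3, ten values) can be checked with a small pure-Python sketch. The function name `anomalous_points` is hypothetical, and the flags are made up to match the three anomalous samples:

```python
def anomalous_points(sample_is_anomaly, time_steps, num_values):
    """Return data indices whose covering windows are all flagged.

    Window j covers data indices j .. j + time_steps - 1, so an interior data
    point i is covered by windows i - time_steps + 1 .. i (inclusive).
    """
    flagged = []
    for i in range(time_steps - 1, num_values - time_steps + 1):
        covering = sample_is_anomaly[i - time_steps + 1 : i + 1]
        if all(covering):
            flagged.append(i)
    return flagged


time_steps, num_values = 3, 10
# Eight windows; windows 3, 4 and 5 are (3,4,5), (4,5,6) and (5,6,7).
sample_is_anomaly = [False, False, False, True, True, True, False, False]
print(anomalous_points(sample_is_anomaly, time_steps, num_values))  # [5]
```

Only data point 5 is flagged, because it is the only value that lies in all three anomalous samples.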

In [None]:
# Data point i is an anomaly if samples (i - TIME_STEPS + 1) through i are all anomalies.
anomalous_data_indices = []
for data_idx in range(TIME_STEPS - 1, len(df_test_value) - TIME_STEPS + 1):
    # The slice end is exclusive, so use data_idx + 1 to include sample data_idx.
    if np.all(anomalies[data_idx - TIME_STEPS + 1 : data_idx + 1]):
        anomalous_data_indices.append(data_idx)

Let's overlay the anomalies on the original test data plot.

In [None]:
df_subset = df_daily_jumpsup.iloc[anomalous_data_indices]
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
df_subset.plot(legend=False, ax=ax, color="r")
plt.show()