## Project Title: Time-Series Anomaly Detection
***
### Description: 
On the Keras website, there is an example of time-series anomaly detection. Re-create this example in a notebook of your own, explaining the concepts. Clearly explain each Keras function used, referring to the documentation. Include an introduction to your notebook, setting the context and describing what the reader can expect as they read down through the notebook. Include a conclusion section where you suggest improvements you could make to the analysis in the notebook.

### Introduction:
Time-series anomaly detection is the task of finding unexpected deviations from the expected behaviour of a metric over a period of time, which may be measured in hours, days or some other interval. The normal behaviour of the metric must be known before anomalies can be accurately detected. A line graph of a time series often shows a repeating pattern, such as a smooth wave or a more jagged output. A flat line would mean there is no change in the metric over the time span.

In [1]:
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from matplotlib import pyplot as plt

Load the data:

We will use the Numenta Anomaly Benchmark (NAB) dataset. It provides artificial time-series data containing labeled anomalous periods of behavior. The data are ordered, timestamped, single-valued metrics.

We will use the art_daily_small_noise.csv file for training and the art_daily_jumpsup.csv file for testing. The simplicity of this dataset allows us to demonstrate anomaly detection effectively.

In [2]:
master_url_root = "https://raw.githubusercontent.com/numenta/NAB/master/data/"

df_small_noise_url_suffix = "artificialNoAnomaly/art_daily_small_noise.csv"
df_small_noise_url = master_url_root + df_small_noise_url_suffix
# Read the CSV file with Python's pandas package. 'index_col' sets the named column as the index, and
# 'parse_dates=True' tells pandas to try to parse that index as datetime values:
df_small_noise = pd.read_csv(
    df_small_noise_url, parse_dates=True, index_col="timestamp"
)

df_daily_jumpsup_url_suffix = "artificialWithAnomaly/art_daily_jumpsup.csv"
df_daily_jumpsup_url = master_url_root + df_daily_jumpsup_url_suffix
df_daily_jumpsup = pd.read_csv(
    df_daily_jumpsup_url, parse_dates=True, index_col="timestamp"
)
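
To make the effect of `parse_dates=True` and `index_col` concrete, here is a minimal sketch on a small in-memory CSV. The three rows and the name `df_demo` are made up for illustration and only mimic the timestamp/value layout of the NAB files.

```python
import io

import pandas as pd

# Hypothetical CSV text with the same two columns as the NAB files.
csv_text = """timestamp,value
2014-04-01 00:00:00,18.3
2014-04-01 00:05:00,21.9
2014-04-01 00:10:00,18.6
"""

df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="timestamp")

# The index is parsed into datetimes, so time arithmetic works directly.
print(type(df_demo.index).__name__)
step = df_demo.index[1] - df_demo.index[0]
print("Sampling interval:", step)
```

With a datetime index, the spacing between rows (five minutes in this toy data) can be read straight off the index.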

In [3]:
print(df_small_noise.head())

print(df_daily_jumpsup.head())

                         value
timestamp                     
2014-04-01 00:00:00  18.324919
2014-04-01 00:05:00  21.970327
2014-04-01 00:10:00  18.624806
2014-04-01 00:15:00  21.953684
2014-04-01 00:20:00  21.909120
                         value
timestamp                     
2014-04-01 00:00:00  19.761252
2014-04-01 00:05:00  20.500833
2014-04-01 00:10:00  19.961641
2014-04-01 00:15:00  21.490266
2014-04-01 00:20:00  20.187739


Visualize the data:
Timeseries data without anomalies. We will use the following data for training.

In [None]:
fig, ax = plt.subplots()
df_small_noise.plot(legend=False, ax=ax)
plt.show()

Timeseries data with anomalies:

In [None]:
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
plt.show()

Prepare training data:

In [None]:
# Normalize the data and save the mean and standard deviation, so the same values can be used to normalize the test data.
# Normalization ensures that variables are on a similar scale.
training_mean = df_small_noise.mean()
training_std = df_small_noise.std()
# 'z-score' normalization is used here. After it, the training values have a mean of zero and a standard deviation of one:
df_training_value = (df_small_noise - training_mean) / training_std
print("Number of training samples:", len(df_training_value))
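
As a small worked example of the z-score step, the sketch below normalizes a made-up training array and then reuses the training mean and standard deviation on made-up test values. The arrays are hypothetical; `ddof=1` is used because that is the default for the pandas `.std()` method.

```python
import numpy as np

# Hypothetical values standing in for the training and test series.
train = np.array([18.0, 22.0, 20.0, 24.0, 16.0])
test = np.array([20.0, 40.0])

mean = train.mean()
std = train.std(ddof=1)  # sample standard deviation, as pandas computes by default

train_z = (train - mean) / std
test_z = (test - mean) / std  # test data reuses the training statistics

print("train mean/std after scaling:", train_z.mean(), train_z.std(ddof=1))
print("scaled test values:", test_z)
```

A test value far from the training mean ends up many standard deviations from zero, which is what makes it stand out later.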

Creating sequences:

In [None]:
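# The data is sampled every 5 minutes, so 288 time steps cover one day.
TIME_STEPS = 288

# Generate training sequences for use in the model. Each sequence is a sliding window of 'time_steps' consecutive values.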
def create_sequences(values, time_steps=TIME_STEPS):
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)


x_train = create_sequences(df_training_value.values)
print("Training input shape: ", x_train.shape)
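
The sliding-window idea is easier to see on a tiny input. The sketch below uses the same logic on six made-up values with a window of three; `create_sequences_demo` is a hypothetical name for this illustration.

```python
import numpy as np


def create_sequences_demo(values, time_steps):
    # One window starts at every position that leaves room for a full window.
    return np.stack(
        [values[i : i + time_steps] for i in range(len(values) - time_steps + 1)]
    )


values = np.arange(6, dtype=float).reshape(-1, 1)  # shape (6, 1): six samples, one feature
windows = create_sequences_demo(values, time_steps=3)

print(windows.shape)  # (4, 3, 1)
print(windows[:, :, 0])
```

An input of length `n` yields `n - time_steps + 1` overlapping windows, each shifted by one step from the previous one.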

Build a model:

Convolutional Neural Networks:
The convolution layer is used to build a convolutional neural network (CNN). The operation known as a "convolution" is a linear operation that involves the multiplication of a set of weights with the input. "The multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel."

"The filter is smaller than the input data and the type of multiplication applied between a filter-sized patch of the input and the filter is a dot product. A dot product is the element-wise multiplication between the filter-sized patch of the input and filter, which is then summed, always resulting in a single value. Because it results in a single value, the operation is often referred to as the “scalar product“."
"Using a filter smaller than the input is intentional as it allows the same filter (set of weights) to be multiplied by the input array multiple times at different points on the input. Specifically, the filter is applied systematically to each overlapping part or filter-sized patch of the input data, left to right, top to bottom. This systematic application of the same filter across an image is a powerful idea. If the filter is designed to detect a specific type of feature in the input, then the application of that filter systematically across the entire input image allows the filter an opportunity to discover that feature anywhere in the image. This capability is commonly referred to as translation invariance, e.g. the general interest in whether the feature is present rather than where it was present."
Source: https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
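
The quoted description is about images, but the same idea applies in one dimension, which is the case relevant to this notebook. The following is a minimal sketch with a made-up signal and a made-up three-weight filter: the filter slides along the signal and a dot product is taken at each position.

```python
import numpy as np

# Hypothetical signal with a step up and a step down.
signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
# Hypothetical filter that responds to rising (+) and falling (-) edges.
kernel = np.array([-1.0, 0.0, 1.0])

# Dot product between the filter and each filter-sized patch (no padding, stride 1).
activations = np.array(
    [
        np.dot(signal[i : i + len(kernel)], kernel)
        for i in range(len(signal) - len(kernel) + 1)
    ]
)

print(activations)  # [ 1.  1.  0. -1. -1.]
```

The same weights detect the edge wherever it occurs in the signal. A real convolution layer learns its filter weights during training instead of having them chosen by hand.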

The Sequential class in Keras groups a linear stack of layers into a Keras model:

In [None]:
# Build a model:
model = keras.Sequential(
    [
        # The input shape is optional and is only given at the start of the model, here through 'layers.Input'. If the shape is
        # specified, the model is built incrementally as layers are added. The shape here is (time steps, features):
        layers.Input(shape=(x_train.shape[1], x_train.shape[2])),
        # 1-dimensional convolution layer. It creates a convolution kernel that is convolved with the layer input over one
        # dimension (here, time), producing a tensor of outputs. Convolution layers have different properties to other layers.
        # "Convolutional layers are used in convolutional neural networks. A convolution is the application of a filter to output
        # what is called an 'activation'." Applying the same filter across an input results in a map of activations.
        # https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
        # 'filters' is the number of dimensions of the output space and 'kernel_size' is the length of the convolution window.
        # 'padding' can be either "valid" or "same": "valid" means no padding, while "same" pads the input so that, with
        # strides=1, the output has the same length as the input. With strides=2 the output length is halved.
        # 'activation' is the activation function to be used; here "relu" is the rectified linear unit activation function,
        # which returns the maximum of 0 and the input tensor. The relu function can take the parameters: x, the input variable;
        # alpha, a floating point number to set the slope for values lower than the threshold; max_value, a float that sets the
        # largest value the function can return; threshold, a float below which values are set to zero. It returns a Tensor of
        # the same shape and dtype as the input, x.
        layers.Conv1D(
            filters=32, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        # 'Dropout' randomly sets inputs to zero during training, at the frequency given by 'rate'. "Inputs not set to 0 are
        # scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged." Its other arguments are noise_shape and
        # seed. 'noise_shape' is "the shape of the binary dropout mask that will be multiplied with the input." and 'seed' can be
        # an integer used as a random seed. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout
        layers.Dropout(rate=0.2),
        layers.Conv1D(filters=16, kernel_size=7, padding="same", strides=2, activation="relu"),
        # 'Conv1DTranspose' is a transposed convolution layer, used here to upsample back towards the original sequence length.
        # It returns a Tensor of rank 3. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1DTranspose
        layers.Conv1DTranspose(filters=16, kernel_size=7, padding="same", strides=2, activation="relu"),
        layers.Dropout(rate=0.2),
        layers.Conv1DTranspose(filters=32, kernel_size=7, padding="same", strides=2, activation="relu"),
        layers.Conv1DTranspose(filters=1, kernel_size=7, padding="same"),
    ]
)
# Configure the model using '.compile()'. 'loss' is the loss function, which may be a string or a `tf.keras.losses.Loss` instance. A loss
# function is of the form fn(y_true, y_pred), where y_true are the actual values and y_pred are the model's predictions. Here the loss
# is "mse", the mean squared error, and the optimizer is the Adam algorithm.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
model.summary()
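
To follow how the sequence length changes through the layers above, the sketch below does the shape arithmetic in plain Python. It assumes the usual Keras behaviour for `padding="same"`: a `Conv1D` layer outputs `ceil(length / strides)` steps and a `Conv1DTranspose` layer (with no `output_padding`) outputs `length * strides` steps. The helper names are hypothetical.

```python
import math


def conv1d_same_len(length, strides):
    # Assumed rule for Conv1D with padding="same".
    return math.ceil(length / strides)


def conv1d_transpose_same_len(length, strides):
    # Assumed rule for Conv1DTranspose with padding="same" and default output_padding.
    return length * strides


# (layer kind, strides) for the convolutional layers of the model, in order.
layer_specs = [
    ("conv", 2),
    ("conv", 2),
    ("transpose", 2),
    ("transpose", 2),
    ("transpose", 1),
]

length = 288  # TIME_STEPS
trace = [length]
for kind, strides in layer_specs:
    if kind == "conv":
        length = conv1d_same_len(length, strides)
    else:
        length = conv1d_transpose_same_len(length, strides)
    trace.append(length)

print(trace)  # [288, 144, 72, 144, 288, 288]
```

The encoder compresses each 288-step window down to 72 steps and the decoder expands it back to 288, which is what allows the output to be compared with the input step by step. The output of `model.summary()` can be used to check these lengths.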

Train the model, using the training data (x_train) as both the input and the target, since this is a reconstruction model:

In [None]:
history = model.fit(
    x_train,
    x_train,
    epochs=50,
    batch_size=128,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, mode="min")
    ],
)
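
The `patience` argument of `EarlyStopping` can be illustrated with a simplified sketch in plain Python. This is not the Keras implementation: it ignores options such as `min_delta` and `restore_best_weights`, and the function name and loss values are made up.

```python
def early_stop_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: improvement stops after the third epoch.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38]
stopped = early_stop_epoch(losses, patience=3)
print("Stopped at epoch index:", stopped)  # 5
```

With `patience=5`, as in the cell above, the same loss curve would not trigger a stop within these seven epochs, so a larger patience tolerates longer plateaus.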

In [None]:
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
plt.show()

Detecting anomalies:

In [None]:
# Get train MAE loss.
x_train_pred = model.predict(x_train)
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=1)

plt.hist(train_mae_loss, bins=50)
plt.xlabel("Train MAE loss")
plt.ylabel("No of samples")
plt.show()

# Get reconstruction loss threshold.
threshold = np.max(train_mae_loss)
print("Reconstruction error threshold: ", threshold)
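
The per-window MAE and the threshold rule can be checked on a toy example. In the sketch below, the three windows, the "predictions" and the new error values are all made up; only the calculation mirrors the cell above.

```python
import numpy as np

# Hypothetical batch: 3 windows, 4 time steps, 1 feature.
x = np.array(
    [
        [[0.0], [1.0], [2.0], [3.0]],
        [[1.0], [1.0], [1.0], [1.0]],
        [[2.0], [0.0], [2.0], [0.0]],
    ]
)
# Pretend reconstructions that are off by a constant amount per window.
x_pred = x + np.array([0.1, -0.2, 0.5]).reshape(3, 1, 1)

# Mean absolute error over the time axis gives one value per window.
mae = np.mean(np.abs(x_pred - x), axis=1)  # shape (3, 1)
threshold = mae.max()

# Hypothetical errors for unseen windows: only those above the threshold are flagged.
new_errors = np.array([0.3, 0.8])
flags = new_errors > threshold

print(mae.ravel(), threshold, flags)
```

Using the maximum training error as the threshold means that no training window is flagged, so anything flagged later is reconstructed worse than every window the model was trained on.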

Compare reconstruction:

In [None]:
# Checking how the first sequence is learnt
plt.plot(x_train[0])
plt.plot(x_train_pred[0])
plt.show()

Prepare test data:

In [None]:
df_test_value = (df_daily_jumpsup - training_mean) / training_std
fig, ax = plt.subplots()
df_test_value.plot(legend=False, ax=ax)
plt.show()

# Create sequences from test values.
x_test = create_sequences(df_test_value.values)
print("Test input shape: ", x_test.shape)

# Get test MAE loss.
x_test_pred = model.predict(x_test)
test_mae_loss = np.mean(np.abs(x_test_pred - x_test), axis=1)
test_mae_loss = test_mae_loss.reshape((-1))

plt.hist(test_mae_loss, bins=50)
plt.xlabel("Test MAE loss")
plt.ylabel("No of samples")
plt.show()

# Detect all the samples which are anomalies.
anomalies = test_mae_loss > threshold
print("Number of anomaly samples: ", np.sum(anomalies))
print("Indices of anomaly samples: ", np.where(anomalies))

Plot anomalies:

In [None]:
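# Data point i is an anomaly if every window that contains it, i.e. samples (i - TIME_STEPS + 1) to i inclusive, is an anomaly.
anomalous_data_indices = []
for data_idx in range(TIME_STEPS - 1, len(df_test_value) - TIME_STEPS + 1):
    # The slice end is exclusive, so 'data_idx + 1' is needed to include window i itself.
    if np.all(anomalies[data_idx - TIME_STEPS + 1 : data_idx + 1]):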
        anomalous_data_indices.append(data_idx)
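
The step from flagged windows to flagged data points can be traced by hand on a small case. The sketch below assumes ten data points, a window of three and a made-up set of window flags, and marks a data point only when all windows from `i - time_steps + 1` to `i` inclusive are flagged.

```python
import numpy as np

time_steps = 3
n_points = 10

# One hypothetical flag per window; window w covers data points w .. w + time_steps - 1.
window_flags = np.array([False, False, True, True, True, True, False, False])

anomalous_points = []
for i in range(time_steps - 1, n_points - time_steps + 1):
    # Windows i - time_steps + 1 .. i are exactly the windows that contain point i.
    if np.all(window_flags[i - time_steps + 1 : i + 1]):
        anomalous_points.append(i)

print(anomalous_points)  # [4, 5]
```

Windows 2 to 5 are flagged, but only points 4 and 5 lie inside flagged windows exclusively. This conservative rule trims the edges of the anomalous region.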

In [None]:
df_subset = df_daily_jumpsup.iloc[anomalous_data_indices]
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
df_subset.plot(legend=False, ax=ax, color="r")
plt.show()