# Exercise - AEs

Use the $\texttt{ECG5000}$ dataset$^1$ to perform autoencoding and anomaly detection of ECGs. Specifically, split the data into normal and anomalous examples, and into a training set and a test set.

1. Use a linear and a nonlinear autoencoder to perform reconstructions of the **normal** ECGs. Measure performance both by MSE and MAE. Which model is best on the respective measures (measured on the test data, having used the train data to train the models)? The bottleneck layer should contain 8 neurons.
1. Use one or both of the models from above to perform anomaly detection (you decide which metric, i.e. MSE or MAE, to use for this purpose). That is, find the losses on the training data, and then decide on a threshold above which you classify data as anomalous.
1. Use a supervised model to perform anomaly detection (i.e. use both the normal and anomalous data for training). Is this model better than the approach above? Is this still the case if you restrict the number of anomalies in the training data to a small number (such as 10)?

$^1$http://www.timeseriesclassification.com/description.php?Dataset=ECG5000.

**Hint**: Consider looking at https://www.tensorflow.org/tutorials/generative/autoencoder, as they go through some of the same steps.

**See slides for more details!**

# Setup

The code below prepares the data. It largely follows https://www.tensorflow.org/tutorials/generative/autoencoder.

In [None]:
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from matplotlib import pyplot as plt
from sklearn.metrics import accuracy_score, precision_score, recall_score

dataframe = pd.read_csv('http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv', header=None)
raw_data = dataframe.values

In [None]:
# The last column contains the labels
labels = raw_data[:, -1]

# The remaining columns are the electrocardiogram data
data = raw_data[:, 0:-1]

train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.2, random_state=42
)

In [None]:
scaler = StandardScaler()
scaler.fit(train_data)

x_train = scaler.transform(train_data)
x_test = scaler.transform(test_data)

In [None]:
x_train_normal = x_train[train_labels == 1]
x_train_anomalous = x_train[train_labels == 0]

x_test_normal = x_test[test_labels == 1]
x_test_anomalous = x_test[test_labels == 0]

In [None]:
print(x_train_normal.shape, x_train_anomalous.shape, x_test_normal.shape, x_test_anomalous.shape, train_labels.shape, test_labels.shape)

In [None]:
plt.grid()
plt.plot(train_data[train_labels == 1][0])
plt.title("A Normal ECG")
plt.show()

plt.grid()
plt.plot(train_data[train_labels == 0][0])
plt.title("An Anomalous ECG")
plt.show()

# Exercise 1

Use a linear and a nonlinear autoencoder to perform reconstructions of the **normal** ECGs. Measure performance both by MSE and MAE. Which model is best on the respective measures (measured on the test data, having used the train data to train the models)? The bottleneck layer should contain 8 neurons.

Let us define the linear autoencoder below.

In [None]:
encoder_linear = tf.keras.models.Sequential([
    ??
], name='encoder')

decoder_linear = tf.keras.models.Sequential([
    ??
], name='decoder')

autoencoder_linear = tf.keras.models.Sequential([
    ??
], name='autoencoder')

autoencoder_linear.compile(loss=??, optimizer=??, metrics=[??])

In [None]:
encoder_linear.summary()
print('\n')
decoder_linear.summary()
print('\n')
autoencoder_linear.summary()
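
One possible sanity check for the linear autoencoder: with an MSE loss, a purely linear encoder and decoder can at best match, on the training data, the reconstruction error of PCA with the same number of components. The sketch below computes that PCA reference in NumPy. It is not the Keras model the exercise asks for; the function `pca_reconstruct`, the synthetic `x_demo` data and its feature count are assumptions made only for illustration.

```python
import numpy as np

def pca_reconstruct(x_fit, x_eval, n_components=8):
    """Project x_eval onto the top principal directions of x_fit and map back."""
    mean = x_fit.mean(axis=0)
    # Rows of vt are the principal directions of the centred fitting data.
    _, _, vt = np.linalg.svd(x_fit - mean, full_matrices=False)
    components = vt[:n_components]            # shape (n_components, n_features)
    codes = (x_eval - mean) @ components.T    # plays the role of the bottleneck
    return codes @ components + mean

# Synthetic stand-in for the scaled data: 8 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 8))
mixing = rng.normal(size=(8, 140))
x_demo = latent @ mixing + 0.1 * rng.normal(size=(500, 140))

# Fit on the first 400 rows, evaluate on the remaining 100.
x_demo_hat = pca_reconstruct(x_demo[:400], x_demo[400:], n_components=8)
pca_mse = np.mean((x_demo[400:] - x_demo_hat) ** 2)
print(f"PCA reference MSE with 8 components: {pca_mse:.4f}")
```

With the real data, one would call `pca_reconstruct(x_train_normal, x_test_normal)` and compare the result with the test MSE of `autoencoder_linear`. A trained linear autoencoder that is much worse than this reference has probably not converged.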

Now for the nonlinear autoencoder.

In [None]:
encoder_nonlinear = tf.keras.models.Sequential([
    ??
], name='encoder')

decoder_nonlinear = tf.keras.models.Sequential([
    ??
], name='decoder')

autoencoder_nonlinear = tf.keras.models.Sequential([
    ??
], name='autoencoder')

autoencoder_nonlinear.compile(loss=??, optimizer=??, metrics=[??])

In [None]:
encoder_nonlinear.summary()
print('\n')
decoder_nonlinear.summary()
print('\n')
autoencoder_nonlinear.summary()

Let us train them.

In [None]:
hist_linear = autoencoder_linear.fit(
    ??
)

hist_nonlinear = autoencoder_nonlinear.fit(
    ??
)

In [None]:
plt.plot(hist_linear.history["loss"], label="Training MSE (linear)")
plt.plot(hist_linear.history["val_loss"], label="Test MSE (linear)")
plt.plot(hist_nonlinear.history["loss"], label="Training MSE (nonlinear)")
plt.plot(hist_nonlinear.history["val_loss"], label="Test MSE (nonlinear)")
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.plot(hist_linear.history["mae"], label="Training MAE (linear)")
plt.plot(hist_linear.history["val_mae"], label="Test MAE (linear)")
plt.plot(hist_nonlinear.history["mae"], label="Training MAE (nonlinear)")
plt.plot(hist_nonlinear.history["val_mae"], label="Test MAE (nonlinear)")
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('MAE')
plt.show()
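
When comparing the two models, note that MSE and MAE need not agree on which one is best: MSE penalises a few large errors more heavily, while MAE weighs all errors in proportion to their size. The toy sketch below illustrates this with two made-up reconstructions. The helper `reconstruction_errors` and the arrays are hypothetical and have nothing to do with the ECG data.

```python
import numpy as np

def reconstruction_errors(x, x_hat):
    """Return per-example MSE and MAE, averaged over the feature axis."""
    diff = np.asarray(x) - np.asarray(x_hat)
    return np.mean(diff ** 2, axis=-1), np.mean(np.abs(diff), axis=-1)

# Three toy "signals" with four features each.
x = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 1.0, 1.0, 1.0],
              [2.0, 0.0, 2.0, 0.0]])

x_hat_a = x + 0.5        # model A: a moderate error on every feature
x_hat_b = x.copy()
x_hat_b[:, 0] += 1.2     # model B: one large error per example, rest exact

results = {}
for name, x_hat in [("A", x_hat_a), ("B", x_hat_b)]:
    mse, mae = reconstruction_errors(x, x_hat)
    results[name] = (mse.mean(), mae.mean())
    print(f"model {name}: MSE={mse.mean():.3f}  MAE={mae.mean():.3f}")
```

Here model A has the lower MSE while model B has the lower MAE, which is why the exercise asks for the best model on each measure separately.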

# Exercise 2

Use one or both of the models from above to perform anomaly detection (you decide which metric, i.e. MSE or MAE, to use for this purpose). That is, find the losses on the training data, and then decide on a threshold above which you classify data as anomalous.

Let us find the reconstruction losses on the normal training and test data for each network.

In [None]:
x_train_normal_reconstructed_linear = autoencoder_linear.predict(x_train_normal)
x_train_normal_reconstructed_nonlinear = autoencoder_nonlinear.predict(x_train_normal)

mse_x_train_normal_linear = tf.keras.losses.mse(x_train_normal_reconstructed_linear, x_train_normal)
mae_x_train_normal_linear = tf.keras.losses.mae(x_train_normal_reconstructed_linear, x_train_normal)
mse_x_train_normal_nonlinear = tf.keras.losses.mse(x_train_normal_reconstructed_nonlinear, x_train_normal)
mae_x_train_normal_nonlinear = tf.keras.losses.mae(x_train_normal_reconstructed_nonlinear, x_train_normal)

In [None]:
x_test_normal_reconstructed_linear = autoencoder_linear.predict(x_test_normal)
x_test_normal_reconstructed_nonlinear = autoencoder_nonlinear.predict(x_test_normal)

mse_x_test_normal_linear = tf.keras.losses.mse(x_test_normal_reconstructed_linear, x_test_normal)
mae_x_test_normal_linear = tf.keras.losses.mae(x_test_normal_reconstructed_linear, x_test_normal)
mse_x_test_normal_nonlinear = tf.keras.losses.mse(x_test_normal_reconstructed_nonlinear, x_test_normal)
mae_x_test_normal_nonlinear = tf.keras.losses.mae(x_test_normal_reconstructed_nonlinear, x_test_normal)

Now, let us collect all the anomalous data (**note**: we can use the anomalous examples from both the training *and* the test split as test data here, as we never used the anomalous training data to train the autoencoders).

In [None]:
x_anomalous = np.concatenate([x_train_anomalous, x_test_anomalous])

In [None]:
x_anomalous_reconstructed_linear = autoencoder_linear.predict(x_anomalous)
x_anomalous_reconstructed_nonlinear = autoencoder_nonlinear.predict(x_anomalous)

mse_x_anomalous_linear = tf.keras.losses.mse(x_anomalous_reconstructed_linear, x_anomalous)
mae_x_anomalous_linear = tf.keras.losses.mae(x_anomalous_reconstructed_linear, x_anomalous)
mse_x_anomalous_nonlinear = tf.keras.losses.mse(x_anomalous_reconstructed_nonlinear, x_anomalous)
mae_x_anomalous_nonlinear = tf.keras.losses.mae(x_anomalous_reconstructed_nonlinear, x_anomalous)

We now need to determine a "cutoff" or "threshold" value, above which we classify an observation as an outlier.

In [None]:
# CODE HERE
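
One common heuristic, used here only as an assumption and not as the required answer, is to set the threshold to the mean of the normal training losses plus one standard deviation, and to flag everything above it. The sketch below applies this to synthetic losses and scores the result with anomalies as the positive class. The helper names and the gamma-distributed losses are hypothetical. With the notebook's data, `is_anomaly_true` would correspond to `labels == 0`, since the dataset marks normal ECGs with 1.

```python
import numpy as np

def fit_threshold(train_normal_losses, n_std=1.0):
    """Threshold = mean + n_std * std of the losses on normal training data."""
    losses = np.asarray(train_normal_losses)
    return losses.mean() + n_std * losses.std()

def flag_anomalies(losses, threshold):
    """True where the reconstruction loss exceeds the threshold."""
    return np.asarray(losses) > threshold

def detection_scores(is_anomaly_true, is_anomaly_pred):
    """Accuracy, precision and recall with 'anomaly' as the positive class."""
    t = np.asarray(is_anomaly_true, dtype=bool)
    p = np.asarray(is_anomaly_pred, dtype=bool)
    tp, fp, fn = np.sum(t & p), np.sum(~t & p), np.sum(t & ~p)
    accuracy = np.mean(t == p)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return accuracy, precision, recall

# Synthetic per-example losses: small for normal data, larger for anomalies.
rng = np.random.default_rng(0)
train_normal = rng.gamma(shape=2.0, scale=0.05, size=1000)
test_normal = rng.gamma(shape=2.0, scale=0.05, size=200)
test_anomalous = 0.5 + rng.gamma(shape=2.0, scale=0.1, size=200)

threshold = fit_threshold(train_normal)
losses = np.concatenate([test_normal, test_anomalous])
truth = np.concatenate([np.zeros(200, dtype=bool), np.ones(200, dtype=bool)])
pred = flag_anomalies(losses, threshold)
acc, prec, rec = detection_scores(truth, pred)
print(f"threshold={threshold:.3f} accuracy={acc:.3f} "
      f"precision={prec:.3f} recall={rec:.3f}")
```

Raising `n_std` trades recall for precision. Plotting histograms of the normal and anomalous losses next to the threshold is a quick way to judge whether a chosen value is sensible.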

# Exercise 3

Use a supervised model to perform anomaly detection (i.e. use both the normal and anomalous data for training). Is this model better than the approach above? Is this still the case if you restrict the number of anomalies in the training data to a small number (such as 10)?

In [None]:
supervised_anomaly_detecter = tf.keras.models.Sequential([
    ??
])

supervised_anomaly_detecter.compile(??)

supervised_anomaly_detecter.summary()

In [None]:
hist_supervised = supervised_anomaly_detecter.fit(??)

In [None]:
plt.plot(hist_supervised.history["loss"], label="Training loss")
plt.plot(hist_supervised.history["val_loss"], label="Test loss")
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.plot(hist_supervised.history["accuracy"], label="Training accuracy")
plt.plot(hist_supervised.history["val_accuracy"], label="Test accuracy")
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
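
To make the supervised setup concrete without giving away the Keras solution, the sketch below trains a plain logistic regression with gradient descent in NumPy. This is a deliberately simpler stand-in: it has the same form as a single sigmoid output unit trained with binary cross-entropy, but it is not the network the exercise asks for. The data are synthetic, the function names are hypothetical, and the labels follow the notebook's convention (1 = normal, 0 = anomalous).

```python
import numpy as np

def train_logistic(x, y, epochs=500, lr=0.1):
    """Fit logistic-regression weights and bias by full-batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(label == 1)
        grad = p - y                            # d(cross-entropy)/d(logit)
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_label(x, w, b):
    """Predict 1 (normal) where the logit is positive, else 0 (anomalous)."""
    return (x @ w + b > 0).astype(int)

rng = np.random.default_rng(0)

def make_split(n_per_class):
    """Synthetic two-class data: normal around -1, anomalous around +1."""
    normal = rng.normal(loc=-1.0, size=(n_per_class, 5))
    anomalous = rng.normal(loc=1.0, size=(n_per_class, 5))
    x = np.concatenate([normal, anomalous])
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return x, y

x_tr, y_tr = make_split(300)
x_te, y_te = make_split(100)

w, b = train_logistic(x_tr, y_tr)
test_accuracy = np.mean(predict_label(x_te, w, b) == y_te)
print(f"test accuracy: {test_accuracy:.3f}")
```

To compare fairly with the threshold approach of Exercise 2, both detectors should be scored on the same test examples and with the same metrics.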

We will use the same model, but with a lower number of anomalies in the training data.

The test data stays the same.

In [None]:
supervised_anomaly_detecter_2 = tf.keras.models.Sequential([
    ??
])

supervised_anomaly_detecter_2.compile(??)

supervised_anomaly_detecter_2.summary()

In [None]:
hist_supervised_2 = supervised_anomaly_detecter_2.fit(
    x=np.concatenate([x_train_normal, x_train_anomalous[:10]]), # 10 examples
    y=np.concatenate([train_labels[train_labels == 1], train_labels[train_labels == 0][:10]]), 
    validation_data=(x_test, test_labels), epochs=20)
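
The cell above keeps the first 10 anomalous training examples. A possible variation, sketched below, draws the retained anomalies at random with a seed, so that the experiment can be repeated with different subsets. The helper `restrict_anomalies` and the toy arrays are hypothetical; the default `anomaly_label=0` follows the notebook's labelling.

```python
import numpy as np

def restrict_anomalies(x, labels, n_anomalies, anomaly_label=0, seed=0):
    """Keep all normal rows plus a random subset of n_anomalies anomalous rows."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    normal_idx = np.flatnonzero(labels != anomaly_label)
    anomaly_idx = np.flatnonzero(labels == anomaly_label)
    n_keep = min(n_anomalies, len(anomaly_idx))
    kept = rng.choice(anomaly_idx, size=n_keep, replace=False)
    # Shuffle so that the retained anomalies are not all at the end.
    idx = rng.permutation(np.concatenate([normal_idx, kept]))
    return x[idx], labels[idx]

# Toy data: 12 normal rows (label 1) followed by 8 anomalous rows (label 0).
x_demo = np.arange(40, dtype=float).reshape(20, 2)
labels_demo = np.array([1] * 12 + [0] * 8)

x_small, y_small = restrict_anomalies(x_demo, labels_demo, n_anomalies=3)
print(x_small.shape, int((y_small == 0).sum()), int((y_small == 1).sum()))
```

Repeating the training for a few different seeds gives a rough idea of how much the result depends on which anomalies happen to be kept.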

In [None]:
plt.plot(hist_supervised_2.history["loss"], label="Training loss")
plt.plot(hist_supervised_2.history["val_loss"], label="Test loss")
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.plot(hist_supervised_2.history["accuracy"], label="Training accuracy")
plt.plot(hist_supervised_2.history["val_accuracy"], label="Test accuracy")
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()