# Sequence classification with Neural Networks
## Split-window RNN model

Now we're going to try a split-window RNN model. Feeding whole sequences of your data to the model might be impractical for a number of reasons:

* one sample of high-frequency data (like acceleration) might not even fit into GPU memory, and the training will crash
* the model might not be able to learn properly on long sequences
* whole sequences need padding/masking, which we avoid since we will have equal-size windows
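
The splitting idea itself can be sketched with plain NumPy. This is only an illustration, not the implementation behind `to_split_window_tfds` in `tmdprimer`: the helper name `split_into_windows` and the decision to drop the trailing partial window are assumptions made for this example.

```python
import numpy as np


def split_into_windows(sequence: np.ndarray, window_size: int) -> np.ndarray:
    """Cut a (time_steps, n_features) array into equal-size, non-overlapping windows.

    Hypothetical helper: trailing steps that do not fill a whole window are dropped.
    """
    n_windows = len(sequence) // window_size
    trimmed = sequence[: n_windows * window_size]
    return trimmed.reshape(n_windows, window_size, sequence.shape[1])


# 120 time steps with 1 feature -> 2 windows of 50 steps, the last 20 steps are dropped
sequence = np.arange(120, dtype=np.float32).reshape(-1, 1)
windows = split_into_windows(sequence, window_size=50)
print(windows.shape)  # (2, 50, 1)
```

Every window has the same shape, so the windows can be batched directly without padding or masking.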

In [1]:
# Load the TensorBoard notebook extension
%load_ext tensorboard
import altair as alt

import numpy as np
import pandas as pd

import os
import sys
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from tmdprimer.datagen import generate_sample, Dataset, Sample

We're going to use the same network as for the per-sample RNN:

In [2]:
import tensorflow as tf

def get_rnn_model():
    rnn_model = tf.keras.Sequential(
        [
            tf.keras.layers.GRU(8, return_sequences=True),
            tf.keras.layers.Dense(1, activation="sigmoid")
        ]
    )
    rnn_model.compile(
        loss="binary_crossentropy",
        optimizer=tf.keras.optimizers.Nadam(),
        metrics=[tf.keras.metrics.BinaryAccuracy()]
    )
    return rnn_model

In [3]:
data_rnn = []
for outlier_prob in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(outlier_prob)
    dataset = Dataset.generate(train_outlier_prob=outlier_prob, n_samples=100)    
    model = get_rnn_model()

    model.fit(
        x=dataset.to_split_window_tfds(window_size=50).batch(32),
        epochs=10,
        verbose=0
    )
    test_dataset = Dataset.generate(train_outlier_prob=outlier_prob, n_samples=20)
    res = model.evaluate(test_dataset.to_split_window_tfds(window_size=50).batch(32), verbose=0)
    data_rnn.append({'outlier_prob': outlier_prob, 'accuracy': res[1]})
    
df_rnn = pd.DataFrame(data_rnn)

0.01
0.05
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0


In [4]:
alt.Chart(df_rnn).mark_line().encode(x='outlier_prob', y='accuracy')

It looks quite similar to the per-sample RNN, but the accuracy at low levels of noise doesn't reach 0.99 for some reason. I don't know why this is happening, so let me know in the comments if you have an idea.

Let's see again what the TensorBoard graphs look like for this RNN:

In [3]:
from datetime import datetime
# Clear any logs from previous runs
!rm -rf ./logs/
log_dir = "logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

dataset = Dataset.generate(train_outlier_prob=0, n_samples=200)
model = get_rnn_model()

model.fit(
    x=dataset.to_split_window_tfds(window_size=50).batch(20),
    epochs=10,
    callbacks=[tensorboard_callback]
)

#%tensorboard --logdir logs/fit

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1514c6d90>

It's interesting that this model doesn't reach 99% accuracy as easily as the others. Let's look at its predictions in detail.

In [23]:
test_dataset = Dataset.generate(train_outlier_prob=0, n_samples=20)

In [24]:
df = pd.DataFrame(
    data=(
        {"time step": i, "speed": lf.features[0] / 100, "label": lf.label}
        for i, lf in enumerate(test_dataset.samples[0].features)
    )
)
base = alt.Chart(df).encode(x="time step")
x, _ = test_dataset.samples[0].to_numpy_split_windows(window_size=50, scaler=dataset.std_scaler)
pred_y = model.predict(x)
df.loc[:, "pred_label"] = pd.Series(pred_y.flatten())
# Fill any missing values (e.g. time steps without a prediction) with 1
df.fillna(1, inplace=True)

In [25]:
alt.layer(
    base.mark_line(color="cornflowerblue").encode(y="speed"),
    base.mark_line(color="orange").encode(y="label"),
    base.mark_line(color="red").encode(y="pred_label"),
)

Somehow the model is less confident in the first prediction it makes for each window. The effect is amplified when outliers are introduced. Write me a message if you know how to explain this.
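
One way to check that the dips really line up with window starts is to average the prediction confidence by position inside the window. The sketch below assumes the predictions have shape `(n_windows, window_size, 1)`, which is what a Keras model with a `return_sequences=True` recurrent layer followed by `Dense(1)` returns. The function name `mean_confidence_by_position`, the use of the distance from 0.5 as a confidence measure, and the synthetic data are all made up for illustration.

```python
import numpy as np


def mean_confidence_by_position(pred_y: np.ndarray) -> np.ndarray:
    """Average distance of the sigmoid output from 0.5, per position in the window."""
    # (n_windows, window_size, 1) -> (n_windows, window_size)
    confidence = np.abs(pred_y.squeeze(axis=-1) - 0.5)
    # Average over windows -> one value per position, shape (window_size,)
    return confidence.mean(axis=0)


# Synthetic stand-in for model.predict(x): confident everywhere except at position 0
fake_pred = np.full((4, 50, 1), 0.95)
fake_pred[:, 0, 0] = 0.6
by_position = mean_confidence_by_position(fake_pred)
print(by_position.argmin())  # 0
```

With the real model, the array returned by `model.predict(x)` can be passed in instead of `fake_pred`, provided it has the shape assumed above.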

Take a look at the stateful split-window RNN to see how to improve the results.