# Sequence classification with Neural Networks
## Split-window stateful RNN model

We saw the problem with the split-window RNN model: at the edges of each window, the predicted probability spikes because the model becomes uncertain. The reason is that we split one sequence into chunks and the model's state is reset between those chunks.

Now we will try to mitigate this with stateful RNNs. Luckily, Keras already implements them behind a single flag (`stateful=True`), but we need to do some additional bookkeeping, because we now have to reset the model's state manually whenever a new sample starts.

**In this model, the window size mainly affects training speed and not much else, because the model's state is no longer reset between windows of the same sample.**
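
To make this claim concrete, here is a minimal sketch using a hypothetical one-unit tanh RNN written in plain Python (not the Keras GRU used below; the weights `w` and `u` are arbitrary illustrative values). It shows that when the hidden state is carried across windows, the outputs match a single pass over the whole sequence for any window size, whereas resetting the state at every window changes the outputs after the first window.

```python
import math


def run_rnn(xs, h0=0.0, w=0.5, u=0.9):
    """Run a toy one-unit RNN, h_t = tanh(w * x_t + u * h_{t-1}).

    Returns the list of hidden states and the final state.
    The weights w and u are arbitrary values chosen for illustration.
    """
    h = h0
    outputs = []
    for x in xs:
        h = math.tanh(w * x + u * h)
        outputs.append(h)
    return outputs, h


def run_in_windows(xs, window_size, carry_state):
    """Process xs window by window, optionally carrying the state over."""
    h = 0.0
    outputs = []
    for start in range(0, len(xs), window_size):
        if not carry_state:
            h = 0.0  # stateless behaviour: forget everything at each window edge
        window_out, h = run_rnn(xs[start:start + window_size], h0=h)
        outputs.extend(window_out)
    return outputs


sequence = [1.0] * 12
full_pass, _ = run_rnn(sequence)
stateful_3 = run_in_windows(sequence, window_size=3, carry_state=True)
stateful_4 = run_in_windows(sequence, window_size=4, carry_state=True)
stateless_4 = run_in_windows(sequence, window_size=4, carry_state=False)
```

With the state carried over, `stateful_3` and `stateful_4` are identical to `full_pass`. In `stateless_4`, the first window matches, but the value at index 4 (the first step of the second window) differs, which is the toy analogue of the spikes at window edges.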

Here are some extra links about stateful RNNs:

* http://philipperemy.github.io/keras-stateful-lstm/ (read first)
* https://fairyonice.github.io/Stateful-LSTM-model-training-in-Keras.html
* https://github.com/vsmolyakov/experiments_with_python/blob/master/chp01/keras_lstm_seqclass.ipynb (full example)

In [1]:
# Load the TensorBoard notebook extension
%load_ext tensorboard
import altair as alt

import numpy as np
import pandas as pd

import os
import sys
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from tmdprimer.datagen import generate_sample, Dataset, Sample

Two things are important here:

* We need to specify the full `batch_input_shape` for a stateful RNN, including the batch size
* We write a custom callback to manage the model's state manually, resetting it after each sample

In [2]:
import tensorflow as tf

def get_rnn_model(batch_size, window_size):
    rnn_model = tf.keras.Sequential(
        [
            tf.keras.layers.GRU(8, return_sequences=True, stateful=True,
                                batch_input_shape=(batch_size, window_size, 1)),
            tf.keras.layers.Dense(1, activation="sigmoid")
        ]
    )
    rnn_model.compile(
        loss="binary_crossentropy",
        optimizer=tf.keras.optimizers.Nadam(),
        metrics=[tf.keras.metrics.BinaryAccuracy()]
    )
    return rnn_model


class StateResetCallback(tf.keras.callbacks.Callback):
    """
    Callback to reset the model's state when one sequence is exhausted
    """
    def __init__(self, samples, window_size):
        super().__init__()
        self.samples = samples
        self.window_size = window_size
        # Index of the sample currently being fed, and our offset inside it
        self.cur_sample_ix = 0
        self.cur_ix_inside_sample = 0

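    def on_batch_begin(self, batch, logs=None):
        # Assumes batch_size == 1 and that the trailing partial window of each
        # sample is dropped: if the next window does not fit into the current
        # sample, this batch is the first window of the next sample.
        sample_len = len(self.samples[self.cur_sample_ix])
        if self.cur_ix_inside_sample + self.window_size > sample_len:
            self.cur_sample_ix += 1
            self.cur_ix_inside_sample = 0
            self.model.reset_states()
        self.cur_ix_inside_sample += self.window_size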

    def on_epoch_begin(self, epoch, logs=None):
        self.model.reset_states()
        self.cur_sample_ix = 0
        self.cur_ix_inside_sample = 0

batch_size = 1
window_size = 50
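
The bookkeeping done by the callback can be sanity-checked without training. The sketch below is a standalone helper (`reset_batches` is a hypothetical name, not part of Keras or `tmdprimer`) that lists the batch indices at which a new sample starts. It assumes a batch size of 1 and that each sample contributes `len(sample) // window_size` full windows, with any remainder dropped.

```python
def reset_batches(sample_lengths, window_size):
    """Return batch indices at which the state should be reset.

    Assumes batch_size == 1 and that a sample of length n yields
    n // window_size windows (the trailing partial window is dropped).
    """
    resets = []
    batch_ix = 0
    for length in sample_lengths:
        n_windows = length // window_size
        if n_windows == 0:
            continue  # sample too short to produce any window
        resets.append(batch_ix)  # first window of this sample
        batch_ix += n_windows
    return resets


# Three hypothetical samples of 120, 100 and 75 time steps, window of 50:
# they yield 2, 2 and 1 windows, so new samples start at batches 0, 2 and 4.
schedule = reset_batches([120, 100, 75], window_size=50)
```

Batch 0 corresponds to the reset in `on_epoch_begin`. Comparing this schedule with the batches at which the callback actually calls `reset_states()` is a quick way to catch off-by-one errors at sample boundaries.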

In [3]:
data_rnn = []
for outlier_prob in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(outlier_prob)
    dataset = Dataset.generate(train_outlier_prob=outlier_prob, n_samples=100)    
    model = get_rnn_model(batch_size, window_size)

    model.fit(x=dataset.to_split_window_tfds(window_size=window_size).batch(batch_size),
              epochs=10,
              shuffle=False,
              verbose=0,
              callbacks=[StateResetCallback(dataset.samples, window_size)])
    test_dataset = Dataset.generate(train_outlier_prob=outlier_prob, n_samples=20)
    res = model.evaluate(test_dataset.to_split_window_tfds(window_size=window_size).batch(batch_size), verbose=0)
    data_rnn.append({'outlier_prob': outlier_prob, 'accuracy': res[1]})
    
df_rnn = pd.DataFrame(data_rnn)

0.01
0.05
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0


In [4]:
alt.Chart(df_rnn).mark_line().encode(x='outlier_prob', y='accuracy')

In [5]:
dataset = Dataset.generate(train_outlier_prob=0, n_samples=200)
model = get_rnn_model(batch_size, window_size)


model.fit(x=dataset.to_split_window_tfds(window_size=window_size).batch(batch_size),
          epochs=5, shuffle=False, callbacks=[StateResetCallback(dataset.samples, window_size)])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x15c84eb50>

In [6]:
test_dataset = Dataset.generate(train_outlier_prob=0, n_samples=20)
pred_model = get_rnn_model(1, window_size)
pred_model.set_weights(model.get_weights())

In [7]:
df = pd.DataFrame(data=({"time step": i, "speed": lf.features[0]/100, "label": lf.label} for i, lf in enumerate(test_dataset.samples[0].features)))
base = alt.Chart(df).encode(x="time step")
x, _ = test_dataset.samples[0].to_numpy_split_windows(window_size=50, scaler=dataset.std_scaler)
pred_y = pred_model.predict(x, batch_size=1)
df.loc[:, "pred_label"] = pd.Series(pred_y.flatten())

In [8]:
alt.layer(
    base.mark_line(color="cornflowerblue").encode(y="speed"),
    base.mark_line(color="orange").encode(y="label"),
    base.mark_line(color="red").encode(y="pred_label"),
)

Notice that the small part at the end is not predicted, because I don't pad sequences whose length is not a multiple of `window_size`.
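
One way to cover that tail would be to pad each sequence up to a multiple of `window_size` before splitting, and to keep a mask so that padded steps can be dropped from the predictions afterwards. The helper below is a minimal sketch with hypothetical names (`pad_to_windows` is not part of `tmdprimer`), and zero is an assumed padding value; the model would also need to tolerate or mask the padded steps.

```python
def pad_to_windows(values, window_size, pad_value=0.0):
    """Split values into full windows, padding the last one if needed.

    Returns (windows, mask_windows), where the mask marks real (True)
    versus padded (False) time steps with the same shape as windows.
    """
    remainder = len(values) % window_size
    n_pad = 0 if remainder == 0 else window_size - remainder
    padded = list(values) + [pad_value] * n_pad
    mask = [True] * len(values) + [False] * n_pad
    windows = [padded[i:i + window_size] for i in range(0, len(padded), window_size)]
    mask_windows = [mask[i:i + window_size] for i in range(0, len(mask), window_size)]
    return windows, mask_windows


# Five time steps with a window of 2: the last window gets one padded step.
windows, mask = pad_to_windows([1.0, 2.0, 3.0, 4.0, 5.0], window_size=2)
```

After prediction, the mask can be flattened and used to discard the outputs for padded steps, so that the predicted series has the same length as the original sample.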