# Sequence classification with Neural Networks
## Part 3: Basic RNN model

Now we're going to try an RNN model (a GRU) on our time series data.
The difference here is that we can feed the network the whole sequence at once, so it can learn the patterns and hopefully perform better in the presence of outliers.

That should be relatively easy for our data. Basically, the model could learn that:
* a speed of 5 can only happen at the beginning
* or after the train segment speed has reached 0.

If a speed of 5 (km/h) appears abruptly after any other speed value, it still belongs to a train segment. That means the network should be able to demonstrate high performance even with 50% or more outliers in the data.
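
To make that rule concrete, here is a minimal hand-written sketch of the logic we hope the network discovers on its own. It is not part of `tmdprimer`: the function name `label_by_rule`, the label encoding (0 = walk, 1 = train) and the assumption that every sequence starts with walking are ours, chosen only for illustration.

```python
WALK_SPEED = 5  # km/h, the walking speed mentioned in the text


def label_by_rule(speeds):
    """Label each reading as 0 (walk) or 1 (train) -- hypothetical encoding.

    A reading of WALK_SPEED counts as walking only at the start of the
    sequence or right after a moving train has slowed down to 0.
    """
    labels = []
    in_train = False  # assumption: sequences start with a walk segment
    moved = False     # has the train shown a real (non-0, non-5) speed yet?
    prev = None
    for speed in speeds:
        if in_train:
            if speed == WALK_SPEED and prev == 0 and moved:
                # the train has stopped, so this 5 is a genuine walk reading
                in_train, moved = False, False
            elif speed not in (0, WALK_SPEED):
                moved = True
        elif speed != WALK_SPEED:
            # anything other than walking speed starts a train segment
            in_train = True
            moved = speed != 0
        labels.append(1 if in_train else 0)
        prev = speed
    return labels


# The 5s in the middle are outliers: they follow non-zero train speeds.
print(label_by_rule([5, 5, 0, 20, 5, 60, 5, 30, 0, 5, 5]))
```

Note that the rule needs the previous readings to classify the current one, which is exactly the context a recurrent layer has access to.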

In [1]:
# Load the TensorBoard notebook extension
%load_ext tensorboard
import altair as alt

import numpy as np
import pandas as pd

import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from tmdprimer.datagen import generate_sample, Dataset, Sample

We're going to create a shallow RNN architecture with just one recurrent layer and one dense output unit. That should be enough for our case, given the simplicity of our data.

The learning rate is adjusted with a schedule for faster convergence.
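
To get a feel for how quickly the rate shrinks, the sketch below reproduces the exponential decay formula in plain Python, using the same values as the schedule in the next cell (0.01, 100, 0.7). It assumes the default continuous (non-staircase) form, `initial_lr * decay_rate ** (step / decay_steps)`; the helper name `exponential_decay` is ours.

```python
def exponential_decay(step, initial_lr=0.01, decay_steps=100, decay_rate=0.7):
    """Learning rate after `step` optimizer updates (continuous form)."""
    return initial_lr * decay_rate ** (step / decay_steps)


# The rate is multiplied by 0.7 every 100 optimizer steps.
for step in (0, 100, 200, 500):
    print(step, exponential_decay(step))
```

Here a step is one optimizer update (one batch), so how far the rate decays over 10 epochs depends on how many batches `to_tfds()` yields per epoch.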

In [2]:
import tensorflow as tf

# Exponentially decaying learning rate, to converge faster
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=100,
    decay_rate=0.7,
)

def get_rnn_model():
    rnn_model = tf.keras.Sequential(
        [
            tf.keras.layers.GRU(8, return_sequences=True),
            tf.keras.layers.Dense(1, activation="sigmoid")
        ]
    )
    rnn_model.compile(
        loss="binary_crossentropy",
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr_schedule),
        metrics=[tf.keras.metrics.BinaryAccuracy()]
    )
    return rnn_model
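
Because the GRU is created with `return_sequences=True`, the dense unit is applied at every time step, so the model emits one probability per reading rather than one per sequence. The NumPy sketch below illustrates that data flow with the textbook GRU equations and random weights. It is only an illustration: Keras's internal GRU variant differs in some details (for example bias handling), and the single input feature, the helper names and the sequence length of 20 are our assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_features, n_units, rng):
    """Random weights for one GRU layer plus a single dense output unit."""
    p = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        p["W" + gate] = rng.normal(scale=0.1, size=(n_features, n_units))
        p["U" + gate] = rng.normal(scale=0.1, size=(n_units, n_units))
        p["b" + gate] = np.zeros(n_units)
    p["w_out"] = rng.normal(scale=0.1, size=(n_units, 1))
    p["b_out"] = np.zeros(1)
    return p


def forward(x, p):
    """x: (timesteps, features) -> per-step probabilities (timesteps, 1)."""
    h = np.zeros(p["bz"].shape[0])  # initial hidden state
    outputs = []
    for x_t in x:
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])
        h_cand = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = z * h + (1.0 - z) * h_cand  # blend old state and candidate
        # the dense sigmoid unit is applied to the hidden state at every step
        outputs.append(sigmoid(h @ p["w_out"] + p["b_out"]))
    return np.stack(outputs)


rng = np.random.default_rng(0)
params = init_params(n_features=1, n_units=8, rng=rng)
speeds = rng.uniform(0, 100, size=(20, 1))  # made-up speed readings
probs = forward(speeds, params)
print(probs.shape)
```

The hidden state `h` is what lets the prediction for the current reading depend on everything seen earlier in the sequence.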

In [10]:
data_rnn = []
for outlier_prob in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(outlier_prob)
    dataset = Dataset.generate(train_outlier_prob=outlier_prob, n_samples=120)
    # truncate samples to a common length since we don't use masking or padding here
    min_sample_size = min(len(s) for s in dataset.samples)
    # hold out the last 20 samples for evaluation and train on the first 100
    test_samples = [Sample(s.features[:min_sample_size]) for s in dataset.samples[100:]]
    dataset.samples = [Sample(s.features[:min_sample_size]) for s in dataset.samples[:100]]
    
    model = get_rnn_model()

    model.fit(
        x=dataset.to_tfds(),
        epochs=10,
        verbose=0
    )
    dataset.samples = test_samples
    res = model.evaluate(dataset.to_tfds(), verbose=0)
    data_rnn.append({'outlier_prob': outlier_prob, 'accuracy': res[1]})
    
df_rnn = pd.DataFrame(data_rnn)

0.01
0.05
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0


In [11]:
alt.Chart(df_rnn).mark_line().encode(x='outlier_prob', y='accuracy')

Right in line with our predictions, the model easily learns the patterns in the data and can yield over 90% accuracy even with a whopping 80% of outliers.

As expected, at the 100% outlier level, when features become indistinguishable, the network falls to a random 50% accuracy.

Let's now see what the TensorBoard graphs look like for the RNN. You can use these graphs as a reference when comparing against more complex models in production.

In [3]:
from datetime import datetime

# Clear any logs from previous runs
!rm -rf ./logs/
# Write each run to its own timestamped directory
log_dir = "logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

dataset = Dataset.generate(train_outlier_prob=0.10, n_samples=200)
# truncate samples since we don't use masking and padding here
min_sample_size = min([len(s) for s in dataset.samples])
dataset.samples = [Sample(s.features[:min_sample_size]) for s in dataset.samples[:100]]

get_rnn_model().fit(
    x=dataset.to_tfds(),
    epochs=10,
    callbacks=[tensorboard_callback]
)

#%tensorboard --logdir logs/fit

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1537aaa60>