# RNN Primer
## Part 2: Padding and masking

In the real world, time series samples rarely have the same length. For example, one user might track their movement for 30 minutes, another for 1 hour, and another for 10 minutes.

In the previous notebook, we truncated samples to the same length. To accommodate samples of different lengths, we need two techniques called **padding** and **masking**. Let's see how it's done.

For details, refer to the TensorFlow guide: https://www.tensorflow.org/guide/keras/masking_and_padding

In [1]:
# Load the TensorBoard notebook extension
%load_ext tensorboard
import altair as alt

import numpy as np
import pandas as pd

import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from rnnprimer.datagen import generate_sample, Dataset

The methodology we use to generate data already produces samples of variable length.

For now, we also keep the train/walk split at 50%.

In [2]:
sample = generate_sample()
fig = sample.get_figure()
fig.properties(title="Sample without outliers", width=800)

In [22]:
dataset = Dataset.generate(train_outlier_prob=0, n_samples=100)
sample_size_df = pd.DataFrame([len(s) for s in dataset.samples], columns=['# of timesteps'])
alt.Chart(sample_size_df).mark_bar().encode(
    alt.X("# of timesteps:Q", bin=True),
    y='count()',
)

One train sub-segment is 20-100 timesteps long (average 60), and a sample has 4-10 train sub-segments (average 7). That makes around 7 × 60 = 420 train timesteps on average. Since we keep the walk at 50%, that makes 420 + 420 = 840 total timesteps on average. Because the total is a sum of several uniform random variables, the number of timesteps in our dataset follows a roughly bell-shaped distribution, similar to the Irwin-Hall distribution (https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution).
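The estimate above can be checked with a quick simulation. This is only a sketch under stated assumptions: sub-segment counts and lengths are drawn as uniform integers, and the walk part is taken to be exactly as long as the train part. The function `simulate_total_timesteps` is hypothetical and does not come from `rnnprimer.datagen`, whose actual sampling may differ.

```python
import numpy as np

def simulate_total_timesteps(n_samples=10_000, seed=0):
    """Simulate total sample lengths under the assumptions described above.

    Assumptions (not taken from rnnprimer.datagen): 4-10 train sub-segments,
    each 20-100 timesteps long, both uniform, and a walk part that is exactly
    as long as the train part.
    """
    rng = np.random.default_rng(seed)
    totals = np.empty(n_samples, dtype=np.int64)
    for i in range(n_samples):
        n_segments = rng.integers(4, 11)  # upper bound is exclusive: 4..10
        train_steps = rng.integers(20, 101, size=n_segments).sum()
        totals[i] = 2 * train_steps  # train plus an equally long walk
    return totals

totals = simulate_total_timesteps()
print(f"mean={totals.mean():.0f}, min={totals.min()}, max={totals.max()}")
```

Plotting `totals` as a histogram gives a picture that can be compared with the chart above.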

If samples are of different lengths, our `to_tfds()` function generates padded batches with a pad value of (-1, 0): -1 for the feature and 0 for the label. Here is a sample batch in detail:

In [23]:
for batch in dataset.to_tfds():
    features, labels = batch
    break

In [27]:
print(features[0])
print(labels[0])

tf.Tensor(
[[ 0.05]
 [ 0.05]
 [ 0.05]
 ...
 [-1.  ]
 [-1.  ]
 [-1.  ]], shape=(1220, 1), dtype=float32)
tf.Tensor(
[[1]
 [1]
 [1]
 ...
 [0]
 [0]
 [0]], shape=(1220, 1), dtype=int32)


The last elements of the first feature vector were set to -1, and the total length is 1220 timesteps. This means that the longest sample in this batch has 1220 unpadded timesteps, and every shorter sample was padded up to that length.
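To make the padding step concrete, here is a minimal NumPy sketch of right-padding a batch to its longest sample. The helper `pad_batch` is hypothetical and is not the code behind `to_tfds()`; in TensorFlow the same effect is usually obtained with `tf.data.Dataset.padded_batch` and its `padding_values` argument, though whether `to_tfds()` does exactly that is an assumption.

```python
import numpy as np

def pad_batch(sequences, pad_value):
    """Right-pad arrays of shape (timesteps, features) to a common length."""
    max_len = max(len(seq) for seq in sequences)
    n_features = sequences[0].shape[1]
    # Start from an array filled with the pad value, then copy real data in.
    batch = np.full((len(sequences), max_len, n_features), pad_value,
                    dtype=sequences[0].dtype)
    for i, seq in enumerate(sequences):
        batch[i, :len(seq)] = seq
    return batch

# Three toy samples with 3, 5 and 2 timesteps and one feature each.
lengths = (3, 5, 2)
features = [np.full((n, 1), 0.05, dtype=np.float32) for n in lengths]
labels = [np.ones((n, 1), dtype=np.int32) for n in lengths]

padded_features = pad_batch(features, pad_value=-1.0)  # features padded with -1
padded_labels = pad_batch(labels, pad_value=0)         # labels padded with 0

print(padded_features.shape)  # (3, 5, 1)
print(padded_features[2, :, 0])
print(padded_labels[2, :, 0])
```

The batch length is set by the longest sample (5 timesteps here), just as the 1220-step batch above is set by its longest sample.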

Let's now train the same model we had before to see how it works:

In [28]:
import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        0.1,
        decay_steps=100,
        decay_rate=0.1)

rnn_model = tf.keras.Sequential(
    [
        tf.keras.layers.Masking(mask_value=np.array([-1])),
        tf.keras.layers.GRU(8, return_sequences=True),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ]
)
rnn_model.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr_schedule),
    metrics=[tf.keras.metrics.BinaryAccuracy()]
)
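The `Masking` layer in the model above marks a timestep as padding when all of its features equal `mask_value`, and downstream layers such as `GRU` then skip those timesteps. The NumPy sketch below mimics that rule; `compute_mask` here is an illustrative stand-in, not the Keras implementation itself.

```python
import numpy as np

def compute_mask(inputs, mask_value=-1.0):
    """Return True for real timesteps and False for padded ones.

    A timestep counts as padding only if every feature equals mask_value.
    """
    return np.any(inputs != mask_value, axis=-1)

# Batch of 2 samples, 5 timesteps, 1 feature; the first sample is padded.
batch = np.array([
    [[0.05], [0.05], [0.30], [-1.0], [-1.0]],
    [[0.05], [0.30], [0.30], [0.30], [0.05]],
], dtype=np.float32)

mask = compute_mask(batch)
print(mask)
```

One consequence of this rule is that a genuine feature value of exactly -1 would also be treated as padding, so the pad value should lie outside the range of real data.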

In [29]:
rnn_model.fit(
    x=dataset.to_tfds(),
    epochs=10,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x147dc2b50>

Something is obviously wrong here, as our model quickly reaches ~80% accuracy and then saturates, when it should be close to 100%.

It turns out the problem is not in the model, but in the way we specified the metrics. We used -1 padding with a 0 (train) label. Padded features are ignored during training, and predictions on them are also irrelevant. But the metrics do not ignore those predictions.

The documentation of each TensorFlow metric states:
> If sample_weight is None, weights default to 1. Use sample_weight of 0 to mask values.

For example https://www.tensorflow.org/api_docs/python/tf/keras/metrics/BinaryAccuracy

That also means that in `model.compile()` we need to use the `weighted_metrics` argument instead of just `metrics`, because only metrics listed under `weighted_metrics` are weighted by `sample_weight`.
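A small NumPy sketch shows how padded timesteps distort an unweighted accuracy and how a weight of 0 removes them. The `binary_accuracy` function below is an illustrative reimplementation (threshold at 0.5, weighted mean of matches), and the toy numbers are made up for the example.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, sample_weight=None, threshold=0.5):
    """Weighted fraction of timesteps where the thresholded prediction matches."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    matches = ((y_pred > threshold).astype(int) == y_true).astype(float)
    if sample_weight is None:
        sample_weight = np.ones_like(matches)  # every timestep counts equally
    sample_weight = np.asarray(sample_weight, dtype=float)
    return float((matches * sample_weight).sum() / sample_weight.sum())

# First 4 timesteps are real and predicted correctly.
# Last 4 are padding: label 0, while the model's output there is arbitrary.
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
y_pred = [0.9, 0.8, 0.1, 0.7, 0.9, 0.9, 0.9, 0.9]
weights = [1, 1, 1, 1, 0, 0, 0, 0]  # weight 0 masks the padded timesteps

unweighted = binary_accuracy(y_true, y_pred)
weighted = binary_accuracy(y_true, y_pred, sample_weight=weights)
print(unweighted, weighted)  # 0.5 1.0
```

The model is perfect on the real data in this toy case, yet the unweighted metric reports 50% because the padded positions are scored too.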

In [53]:
rnn_model = tf.keras.Sequential(
    [
        tf.keras.layers.Masking(mask_value=np.array([-1])),
        tf.keras.layers.GRU(8, return_sequences=True),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ]
)
rnn_model.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr_schedule),
    # CHANGED
    weighted_metrics=[tf.keras.metrics.BinaryAccuracy()]
)

I have prepared a `dataset.to_weighted_tfds()` method, which accepts class weights and assigns the weights accordingly:

In [54]:
rnn_model.fit(
    x=dataset.to_weighted_tfds({1: 0.5, 0: 0.5}),
    epochs=10,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x14824bfd0>

This looks much better now: accuracy is close to 100%, as it should be.
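For readers curious how per-timestep weights could be built, here is a minimal NumPy sketch. It is not the implementation of `to_weighted_tfds()`; the helper `make_sample_weights` and its arguments are hypothetical. Note that the padded label 0 is the same as the train label 0, so the weights cannot be derived from the labels alone; this sketch uses the true sample lengths to find the padding.

```python
import numpy as np

def make_sample_weights(labels, lengths, class_weights):
    """Build per-timestep weights: class weight for real steps, 0 for padding.

    labels: int array of shape (batch, max_len), already padded.
    lengths: true (unpadded) length of each sample.
    class_weights: dict mapping class label to weight.
    """
    labels = np.asarray(labels)
    weights = np.zeros(labels.shape, dtype=np.float32)
    for cls, weight in class_weights.items():
        weights[labels == cls] = weight
    # Timestep t of sample i is real only if t < lengths[i].
    positions = np.arange(labels.shape[1])
    is_real = positions[None, :] < np.asarray(lengths)[:, None]
    return weights * is_real

padded_labels = [
    [1, 0, 1, 0, 0],  # true length 3, last two steps are padding
    [0, 1, 1, 1, 0],  # true length 5, no padding
]
sample_weights = make_sample_weights(padded_labels, lengths=[3, 5],
                                     class_weights={1: 0.5, 0: 0.5})
print(sample_weights)
```

Keras `fit()` accepts datasets that yield `(inputs, targets, sample_weights)` tuples, so weights like these can travel alongside each padded batch.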