# RNN to predict more than one value
The previous exercise trained on a series and predicted the next value.
Now, predict a sequence of future values.

The book points out three ways to look ahead 10 steps.
* Offset the training pairs, i.e. given X(1), predict y(11).
* In a loop, predict y, append the value to X, predict again.
* Train the RNN to predict 10 values at a time. See below.
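
The second option in the list (predict, append, predict again) can be sketched as follows. This is a minimal illustration, not the book's code: `forecast_iteratively`, `predict_next`, and `repeat_last` are hypothetical names, and the stand-in predictor simply repeats the last value so the loop can run without a trained model.

```python
import numpy as np

def forecast_iteratively(predict_next, X, n_ahead=10):
    """Predict n_ahead steps by feeding each prediction back in as input.

    predict_next: hypothetical callable mapping an array of shape
        (batch, steps, 1) to next-step predictions of shape (batch, 1).
    X: array of shape (batch, steps, 1) holding the known history.
    Returns an array of shape (batch, n_ahead).
    """
    X = np.asarray(X, dtype=np.float32)
    for _ in range(n_ahead):
        y_next = predict_next(X)  # shape (batch, 1)
        # Append the prediction as a new time step.
        X = np.concatenate([X, y_next[:, np.newaxis, :]], axis=1)
    return X[:, -n_ahead:, 0]

# Stand-in predictor (an assumption for the demo): repeat the last value.
def repeat_last(X):
    return X[:, -1, :]

history = np.arange(5, dtype=np.float32).reshape(1, 5, 1)
forecast = forecast_iteratively(repeat_last, history, n_ahead=3)
print(forecast.shape)
```

A model trained to predict a single next value could presumably be passed in as `predict_next` (for example its `predict` method), provided it returns shape (batch, 1). Errors tend to accumulate with this approach because each prediction becomes an input to the next one.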

In [1]:
import sys
import sklearn
import numpy as np
import tensorflow as tf
from tensorflow import keras
import os
from pathlib import Path
np.random.seed(42)
tf.random.set_seed(42)

%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Data generator and baseline for comparison.
# Combination of 2 sine waves plus noise
def generate_time_series(batch_size, n_steps):
    freq1, freq2, offset1, offset2 = np.random.rand(4, batch_size, 1)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offset1) * (freq1 * 10 + 10))
    series += 0.2 * np.sin((time - offset2) * (freq2 * 20 + 20))
    series += 0.1 * (np.random.rand(batch_size, n_steps) - 0.5)
    return series[..., np.newaxis].astype(np.float32)
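
The comment above mentions a baseline, but none appears in this cell. One possible baseline for the multi-step targets used later is naive forecasting: predict every future step with the last observed value. The sketch below is an assumption about what such a baseline could look like; `naive_forecast_mse` is a hypothetical helper, and the toy arrays are made up for illustration.

```python
import numpy as np

def naive_forecast_mse(X, y):
    """MSE of forecasting every future step with the last observed value.

    X: shape (batch, steps, 1), the observed history.
    y: shape (batch, horizon), the true future values.
    """
    last_value = X[:, -1, 0][:, np.newaxis]  # shape (batch, 1)
    y_naive = np.repeat(last_value, y.shape[1], axis=1)
    return float(np.mean((y - y_naive) ** 2))

# Toy data: the history ends at 1.0 and the future values are 1.0, 2.0.
X_toy = np.array([[[0.0], [1.0]]])
y_toy = np.array([[1.0, 2.0]])
baseline = naive_forecast_mse(X_toy, y_toy)
print(baseline)
```

Called with validation arrays shaped like the ones built in the next section, this would give a number to compare against the RNN's validation loss.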

## Predict future 10 
The first attempt predicts the 10 future values directly.
Training is based only on predicting the 10 values that follow the input window
(as opposed to predicting the next 10 values at every time step).

Alter the model to have 10 output nodes.
Generate each series with 10 extra time steps.
Set y_train to contain only the last 10 values of each instance.
Thus, every backprop step uses the error of predicting those 10 future values.
This seems simplistic, but it works quite well.

In [11]:
n_steps = 50
series = generate_time_series(10000, n_steps + 10)
# X = 7000 different series, each with random variation
#     * time steps 0..49
# y = 7000 targets of 10 values each
#     * time steps 50..59
X_train, y_train = series[:7000, :n_steps], series[:7000, -10:, 0]
X_valid, y_valid = series[7000:9000, :n_steps], series[7000:9000, -10:, 0]
X_test, y_test = series[9000:, :n_steps], series[9000:, -10:, 0]
y_train.shape
# Every y is a vector of 10

(7000, 10)

In [12]:
rnn1 = keras.models.Sequential([
    keras.layers.SimpleRNN(20,return_sequences=True,input_shape=[None,1]),
    keras.layers.SimpleRNN(20),
    keras.layers.Dense(10)
])
rnn1.compile(loss="mse", optimizer="adam")
history = rnn1.fit(X_train, y_train, epochs=20,
                    validation_data=(X_valid, y_valid))  

# loss: Value of cost function on training data.
# val_loss: Value of cost function on validation data.

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
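# My validation loss is much worse than the book's,
# and my loss jumps up and down between epochs.
# I first blamed small memory and small batches, because the book's
# progress log shows 7000 per epoch and mine shows 219.
# But 7000 counts samples, while 219 counts batches:
# 7000 / 32 (the Keras default batch size), rounded up.
# So the batch size is probably the same and does not explain the gap.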
y_pred = rnn1.predict(X_train)
y_pred.shape

## Predict rolling 10
Now do it the right way with a rolling definition.
At every time step, predict the next 10 values
and measure the loss on those 10 values
(as opposed to only predicting the final 10 after the last step).

Modify the targets so that y at every time step contains the next 10 values of the series.
Avoid look-ahead in the inputs so the model remains causal,
i.e. at each step it only uses the past to predict the future.
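
To make the rolling targets concrete, here is a toy version using the same indexing as the notebook's target loop, applied to a short integer series so the result can be read by eye. `rolling_targets` is a hypothetical helper name, and the toy sizes (4 input steps, horizon 3) are chosen only for illustration.

```python
import numpy as np

def rolling_targets(series, n_steps, horizon):
    """Build targets where Y[:, t, k] is the series value at step t + k + 1.

    series: shape (batch, n_steps + horizon, 1).
    Returns shape (batch, n_steps, horizon).
    """
    Y = np.empty((series.shape[0], n_steps, horizon), dtype=series.dtype)
    for step_ahead in range(1, horizon + 1):
        Y[:, :, step_ahead - 1] = series[:, step_ahead:step_ahead + n_steps, 0]
    return Y

# Toy series 0, 1, ..., 6 with 4 input steps and a horizon of 3.
toy = np.arange(7, dtype=np.float32).reshape(1, 7, 1)
Y_toy = rolling_targets(toy, n_steps=4, horizon=3)
print(Y_toy[0])
# Row t holds the 3 values that follow input step t:
# [[1. 2. 3.]
#  [2. 3. 4.]
#  [3. 4. 5.]
#  [4. 5. 6.]]
```

The last row equals the "future" values used in the first approach, so the rolling targets contain the earlier targets as a special case.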

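Use return_sequences=True in every recurrent layer.
Thus, the last layer must be applied at every time step, producing 10 values per step.
Conceptually, this involves reshaping the (batch, steps, features) tensor and back.
Use the TimeDistributed wrapper to handle this: it applies the wrapped layer to each time step
(although the Dense class is actually smart enough to do it on its own).
TimeDistributed is used when the output is a sequence, not a single vector.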
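
The reason a dense layer can be applied to a sequence without an explicit loop is visible in plain NumPy. The sketch below is an illustration of the arithmetic only, not of Keras internals: `H`, `W`, and `b` are random stand-ins for the RNN outputs and the dense layer's weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, units, outputs = 2, 5, 20, 10
H = rng.normal(size=(batch, steps, units))  # stand-in for RNN outputs
W = rng.normal(size=(units, outputs))       # stand-in dense kernel
b = rng.normal(size=outputs)                # stand-in dense bias

# Explicit "time distributed" version: apply the layer step by step.
per_step = np.stack([H[:, t, :] @ W + b for t in range(steps)], axis=1)

# Broadcast version: one matrix product over the last axis.
broadcast = H @ W + b

print(per_step.shape, np.allclose(per_step, broadcast))
```

Both versions produce one vector of 10 outputs per time step, which is why wrapping Dense in TimeDistributed mostly serves to make the intent explicit.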

In [14]:
# 10K instances, 50 time steps each, with 10 predictions per step.
Y = np.empty((10000,n_steps,10))
for step_ahead in range(1,10+1):
    Y[:,:,step_ahead-1] = series[:,step_ahead:step_ahead+n_steps,0]
y_train = Y[:7000]
y_valid = Y[7000:9000]
y_test  = Y[9000:]

rnn2 = keras.models.Sequential([
    keras.layers.SimpleRNN(20,return_sequences=True,input_shape=[None,1]),
    keras.layers.SimpleRNN(20,return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])

In [17]:
# Training uses the rolling window of the next 10 values at every step.
# But we only care about the loss on the final 10 (predicted at the last time step).
# Use a custom metric function to report that loss.
# This adds extra numbers to the training log below.
# The last-step loss is lower than the loss averaged over all time steps.
def last_time_step_mse(y_true, y_pred):
    return keras.metrics.mean_squared_error(y_true[:, -1], y_pred[:, -1])
optimizer = keras.optimizers.Adam(learning_rate=0.005)
rnn2.compile(loss="mse", optimizer=optimizer, metrics=[last_time_step_mse])
history = rnn2.fit(X_train, y_train, epochs=20,
                    validation_data=(X_valid, y_valid))  


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
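
The difference between the training loss and the custom metric can be sketched in NumPy. This is an illustrative re-implementation under the assumption that both quantities are plain means of squared errors; `last_time_step_mse_np` and `full_mse_np` are hypothetical names, and the toy arrays are made up.

```python
import numpy as np

def last_time_step_mse_np(y_true, y_pred):
    """MSE restricted to the final time step; inputs are (batch, steps, horizon)."""
    return float(np.mean((y_true[:, -1] - y_pred[:, -1]) ** 2))

def full_mse_np(y_true, y_pred):
    """MSE averaged over every time step and every horizon."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.zeros((1, 3, 2))
y_pred = np.zeros((1, 3, 2))
y_pred[0, 0] = 3.0  # large error at the first time step only

print(full_mse_np(y_true, y_pred), last_time_step_mse_np(y_true, y_pred))
```

In this toy case the full loss is penalized by the early time step, where the model has seen little history, while the last-step metric is not. That is one plausible reason for the last-step number being lower than the overall loss.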
