# Weather forecasting with RNNs

This notebook is based on one of the Keras examples, where we will try to predict the weather using data recorded at the Weather Station of the Max Planck Institute for Biogeochemistry in Jena, Germany.

Updated versions of the data, along with various additional data, are available at www.bgc-jena.mpg.de/wetter. The particular data set we are looking at here was recorded from 2009 to 2016.

In [None]:
import keras
import tensorflow as tf
import os
import numpy as np
from matplotlib import pyplot as plt

## Download the data

In [None]:
!wget https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip
!unzip jena_climate_2009_2016.csv.zip

Have a look at the contents. There are 14 features in total.

In [None]:

fname = os.path.join("jena_climate_2009_2016.csv")
with open(fname) as f:
    data = f.read()
    lines = data.split("\n")
    header = lines[0].split(",")
    lines = lines[1:]
    print(header)
    print(len(lines))

Convert the data to numpy arrays:

In [None]:
temperature = np.zeros((len(lines),))
raw_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
    values = [float(x) for x in line.split(",")[1:]]
    temperature[i] = values[1]
    raw_data[i, :] = values[:]

What does the temperature look like?

In [None]:
plt.plot(range(len(temperature)), temperature)

There are a lot of data points here, so let's make a plot that focusses on the first 10 days. There is a measurement every 10 minutes, so we get 24 × 6 = 144 data points per day.

In [None]:
plt.plot(range(1440), temperature[:1440])

For our experiments, we use the first 50% of the data for training, the following 25% for validation, and the last 25% for testing.

In [None]:
num_train_samples = int(0.5 * len(raw_data))
num_val_samples = int(0.25 * len(raw_data))
num_test_samples = len(raw_data) - num_train_samples - num_val_samples
print("num_train_samples:", num_train_samples)
print("num_val_samples:", num_val_samples)
print("num_test_samples:", num_test_samples)

To prepare the data, we normalise it by subtracting the mean and dividing by the standard deviation.

In [None]:
mean = raw_data[:num_train_samples].mean(axis=0)
raw_data -= mean
std = raw_data[:num_train_samples].std(axis=0)
raw_data /= std
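As a minimal sketch of the same idea on a toy array (the values and the `n_train` split below are made up for illustration, not taken from the Jena data), the mean and standard deviation are computed on the training rows only and then applied to every row:

```python
import numpy as np

# Toy data: 6 time steps, 2 features (made-up values, not the Jena data).
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0],
                [10.0, 100.0],
                [12.0, 120.0]])
n_train = 4  # hypothetical split: the first 4 rows play the role of training data

# The statistics come from the training rows only ...
toy_mean = toy[:n_train].mean(axis=0)
toy_std = toy[:n_train].std(axis=0)

# ... but are applied to every row, including the later "validation/test" rows.
toy_norm = (toy - toy_mean) / toy_std

print(toy_norm[:n_train].mean(axis=0))  # close to 0 for each feature
print(toy_norm[:n_train].std(axis=0))   # close to 1 for each feature
```

Only the training part ends up with zero mean and unit variance; the later rows are scaled with the same statistics, so nothing from them leaks into the preprocessing.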

## Create the datasets to use

We’ll use `timeseries_dataset_from_array()` to instantiate three datasets: one for training, one for validation, and one for testing.
We’ll use the following parameter values:

 - `sampling_rate = 6`: Observations will be sampled at one data point per hour: we will only keep one data point out of 6.
 - `sequence_length = 120`: Observations will go back 5 days (120 hours).
 - `delay = sampling_rate * (sequence_length + 24 - 1)`: The target for a sequence will be the temperature 24 hours after the end of the sequence.
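The index arithmetic behind these three parameters can be checked by hand. The sketch below assumes a window that starts at raw index `start = 0` (a hypothetical choice) and works out which raw readings go into the window and which reading is the target:

```python
sampling_rate = 6      # keep one reading per hour (readings are 10 minutes apart)
sequence_length = 120  # 120 hourly steps = 5 days of history
delay = sampling_rate * (sequence_length + 24 - 1)

start = 0  # hypothetical raw index of the first reading in the window
input_indices = [start + k * sampling_rate for k in range(sequence_length)]
target_index = start + delay

last_input = input_indices[-1]
gap_in_hours = (target_index - last_input) / sampling_rate

print("first/last input index:", input_indices[0], last_input)
print("target index:", target_index)
print("hours between last input and target:", gap_in_hours)
```

The last reading in the window sits at raw index 714 and the target at raw index 858, which is 144 readings, or 24 hours, later.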

In [None]:
sampling_rate = 6
sequence_length = 120
delay = sampling_rate * (sequence_length + 24 - 1)
batch_size = 256

train_dataset = keras.utils.timeseries_dataset_from_array(
    raw_data[:-delay],
    targets=temperature[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=0,
    end_index=num_train_samples - (num_train_samples % batch_size)
)

val_dataset = keras.utils.timeseries_dataset_from_array(
    raw_data[:-delay],
    targets=temperature[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=num_train_samples,
    end_index=num_train_samples + num_val_samples
)

test_dataset = keras.utils.timeseries_dataset_from_array(
    raw_data[:-delay],
    targets=temperature[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=num_train_samples + num_val_samples)

Try out the datasets:

In [None]:
for samples, targets in train_dataset:
    print("samples shape:", samples.shape)
    print("targets shape:", targets.shape)
    break

## A simplistic model

First, the simplest approach -- 100% autocorrelation: we predict that the temperature 24 hours from now will be the same as the temperature right now.


In [None]:
def evaluate_naive_method(dataset):
    total_abs_err = 0.
    samples_seen = 0
    for samples, targets in dataset:
        # Last temperature in the window (feature 1), converted back to degrees Celsius
        preds = samples[:, -1, 1] * std[1] + mean[1]
        total_abs_err += np.sum(np.abs(preds - targets))
        samples_seen += samples.shape[0]

    # Return only after all batches have been processed
    return total_abs_err / samples_seen


print(f"Validation MAE: {evaluate_naive_method(val_dataset):.2f}")
print(f"Test MAE: {evaluate_naive_method(test_dataset):.2f}")
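To illustrate what this baseline measures, here is a minimal sketch on synthetic data (the series below is made up, not the Jena measurements). When a signal has a strong daily cycle, "same as now" is already a reasonable 24-hour forecast, and the remaining error comes from whatever changes from one day to the next:

```python
import numpy as np

# Synthetic hourly "temperature": a daily cycle plus a slow drift (made-up data).
hours = np.arange(24 * 30)
series = 10.0 + 5.0 * np.sin(2 * np.pi * hours / 24) + 0.01 * hours

horizon = 24  # predict 24 hours ahead

# Naive forecast: the value in 24 hours equals the value now.
preds = series[:-horizon]
targets = series[horizon:]

naive_mae = np.mean(np.abs(preds - targets))
print(f"Naive MAE on the toy series: {naive_mae:.2f}")
```

In this toy series the daily cycle cancels out, so the error is just the drift accumulated over 24 hours.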

The most basic machine learning model -- a single-layer dense model.

In [None]:
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
print('inputs.shape:', inputs.shape)
# Flatten each (sequence_length, features) window into a single vector,
# so that the model outputs one value per sample
x = keras.layers.Flatten()(inputs)
x = keras.layers.Dense(16, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)

callbacks = [
    keras.callbacks.ModelCheckpoint("jena_dense.keras",
    save_best_only=True)
]

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])

history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=val_dataset,
    callbacks=callbacks
)

model = keras.models.load_model("jena_dense.keras")

print(f"Test MAE: {model.evaluate(test_dataset)[1]:.2f}")

Define a utility function for plotting the learning curve:

In [None]:
def plot_loss_curve(history):
    loss = history.history["mae"]
    val_loss = history.history["val_mae"]
    epochs = range(1, len(loss) + 1)
    plt.figure()
    plt.plot(epochs, loss, "bo", label="Training MAE")
    plt.plot(epochs, val_loss, "b", label="Validation MAE")
    plt.title("Training and validation MAE")
    plt.legend()
    plt.show()

plot_loss_curve(history)

## A convolutional model

Let's see how a straight-up convolutional model performs.

### <span style="color: red; font-weight: bold;">Exercise:</span>

Build a model with two convolutional layers, each followed by a max-pooling layer. The convolutions can have 24 filters with kernel size 8.
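As a hint for sizing the layers, the sketch below tracks how the sequence length shrinks through such a stack. It assumes stride-1 convolutions with `"valid"` padding and a pool size of 2 (the pool size is an assumption, the exercise does not fix it); the two helper functions are hypothetical names used only for this illustration:

```python
def conv1d_output_length(length, kernel_size):
    """Output length of a stride-1 convolution with "valid" padding."""
    return length - kernel_size + 1


def max_pool_output_length(length, pool_size):
    """Output length of non-overlapping pooling with "valid" padding."""
    return length // pool_size


length = 120  # sequence_length used in this notebook
for block in (1, 2):
    length = conv1d_output_length(length, kernel_size=8)
    print(f"after conv {block}: {length}")
    length = max_pool_output_length(length, pool_size=2)  # pool size 2 is an assumption
    print(f"after pool {block}: {length}")
```

With these settings the 120 input steps are reduced to 24 steps before the global average pooling layer.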

In [None]:
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = ... # TODO
x = keras.layers.GlobalAveragePooling1D()(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)

Now train it, and plot the learning curves:

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint("jena_conv.keras",
    save_best_only=True)
]

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=val_dataset,
    callbacks=callbacks
)
model = keras.models.load_model("jena_conv.keras")

print(f"Test MAE: {model.evaluate(test_dataset)[1]:.2f}")

In [None]:
plot_loss_curve(history)

### <span style="color: red; font-weight: bold;">Exercise:</span>

Implement a simple [WaveNet](https://deepmind.google/discover/blog/wavenet-a-generative-model-for-raw-audio/) model -- that is, a fully convolutional network with dilation rates larger than one, and padding set to `"causal"`, so that the layers can only look backwards in time.

You can use the textbook to get a simple solution -- but also try to make a more complicated version.

Then train and evaluate it.
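To build some intuition first, here is a small NumPy sketch of a causal convolution with a dilation rate, which is the building block that WaveNet-style models stack (in Keras this corresponds to `Conv1D` with `padding="causal"` and a `dilation_rate`). The function `causal_dilated_conv1d` is a hypothetical helper written for this illustration, not part of any library:

```python
import numpy as np


def causal_dilated_conv1d(x, kernel, dilation):
    """1D convolution where output[t] only uses x[t], x[t - d], x[t - 2d], ..."""
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # left-pad only: no look-ahead
    out = np.zeros(len(x))
    for t in range(len(x)):
        for j in range(k):
            # kernel[-1] multiplies the current step, kernel[0] the oldest one
            out[t] += kernel[j] * padded[t + j * dilation]
    return out


x = np.zeros(32)
x[5] = 1.0  # a single impulse at time step 5

# Stack four layers with doubling dilation rates, as in a simple WaveNet.
h = x
for dilation in (1, 2, 4, 8):
    h = causal_dilated_conv1d(h, kernel=np.ones(2), dilation=dilation)

affected = np.nonzero(h)[0]
print("first affected step:", affected[0])
print("last affected step:", affected[-1])
print("number of affected steps:", len(affected))
```

The impulse never influences any step before time 5, which is the causal property. With kernel size 2 and dilation rates 1, 2, 4 and 8, it reaches 16 consecutive steps, so the receptive field doubles with every added layer.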

In [None]:
# model goes here

wavenet = keras.Model(inputs, outputs)

In [None]:
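# Use a separate checkpoint file (the name is chosen here), so that the
# checkpoint of the previous convolutional model is not overwritten
callbacks = [
    keras.callbacks.ModelCheckpoint("jena_wavenet.keras",
    save_best_only=True)
]

wavenet.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = wavenet.fit(
    train_dataset,
    epochs=10,
    validation_data=val_dataset,
    callbacks=callbacks
)

print(f"Test MAE: {wavenet.evaluate(test_dataset)[1]:.2f}")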

## Compare to a recurrent model: LSTM

In [None]:
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = keras.layers.LSTM(16)(inputs)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [
    keras.callbacks.ModelCheckpoint("jena_lstm.keras",
    save_best_only=True)
]

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=val_dataset,
    callbacks=callbacks
)

model = keras.models.load_model("jena_lstm.keras")
print(f"Test MAE: {model.evaluate(test_dataset)[1]:.2f}")

Plot learning curves:

In [None]:
plot_loss_curve(history)

Maybe we need to regularise it. We add `recurrent_dropout`.

In [None]:
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = keras.layers.LSTM(32, recurrent_dropout=0.25)(inputs)
x = keras.layers.Dropout(0.5)(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [
    keras.callbacks.ModelCheckpoint("jena_lstm_dropout.keras",
    save_best_only=True)
]

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(
    train_dataset,
    epochs=50,
    validation_data=val_dataset,
    callbacks=callbacks
)

## Stacked GRU

In [None]:
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = keras.layers.GRU(32, recurrent_dropout=0.5, return_sequences=True)(inputs)
x = keras.layers.GRU(32, recurrent_dropout=0.5)(x)
x = keras.layers.Dropout(0.5)(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [
    keras.callbacks.ModelCheckpoint("jena_stacked_gru_dropout.keras",
    save_best_only=True)
]

# Note: a recurrent layer can be unrolled, which can speed up training on short sequences,
# e.g. keras.layers.LSTM(32, recurrent_dropout=0.2, unroll=True). This is not used here.

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(train_dataset,
    epochs=50,
    validation_data=val_dataset,
    callbacks=callbacks)
model = keras.models.load_model("jena_stacked_gru_dropout.keras")
print(f"Test MAE: {model.evaluate(test_dataset)[1]:.2f}")

Results:

In [None]:
plot_loss_curve(history)

## Try a bidirectional LSTM

A bidirectional layer processes the input sequences twice: once in chronological order, and once in reverse order.

Does it help in predicting our weather data?
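The idea can be sketched in NumPy with a toy recurrent cell. The code below uses a plain tanh RNN with random weights in place of an LSTM (a simplification for illustration; all names, sizes, and weights are made up). It runs one pass over the sequence in chronological order and one over the reversed sequence, then concatenates the two final states, which is what the Keras `Bidirectional` wrapper does by default:

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, n_features, units = 5, 3, 4
sequence = rng.normal(size=(timesteps, n_features))


def simple_rnn_last_state(seq, w_in, w_rec):
    """Run a plain tanh RNN over seq and return the final hidden state."""
    state = np.zeros(w_rec.shape[0])
    for x_t in seq:
        state = np.tanh(x_t @ w_in + state @ w_rec)
    return state


# Each direction has its own, independent weights.
w_in_fwd = rng.normal(size=(n_features, units))
w_rec_fwd = rng.normal(size=(units, units))
w_in_bwd = rng.normal(size=(n_features, units))
w_rec_bwd = rng.normal(size=(units, units))

forward = simple_rnn_last_state(sequence, w_in_fwd, w_rec_fwd)
backward = simple_rnn_last_state(sequence[::-1], w_in_bwd, w_rec_bwd)  # reversed in time

# Concatenating the two final states doubles the output size.
bidirectional_output = np.concatenate([forward, backward])
print(bidirectional_output.shape)  # (8,)
```

Note that the backward pass ends on the oldest time step, so its final state is dominated by the least recent observations.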

In [None]:
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = keras.layers.Bidirectional(keras.layers.LSTM(16))(inputs)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)


model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=val_dataset
)

### <span style="color: red; font-weight: bold;">Open exercises:</span>

- Adjust the number of units in each recurrent layer in the stacked setup, as well as the amount of dropout. (The current choices are largely arbitrary and probably suboptimal.)

- Adjust the learning rate used by the RMSprop optimizer, or try a different optimizer.

- Try using a stack of Dense layers as the regressor on top of the recurrent layer, instead of a single Dense layer.

- Improve the input to the model: try using longer or shorter sequences or a different sampling rate,