# Table of Contents

* 1 Set up data
* 2 RNNs with only temperature data
    * 2.1 RNN predicting only the last target
    * 2.2 Sequence RNN
    * 2.3 Longer sequence
    * 2.4 Reference experiment with longer training set.
    * 2.5 Sequence model with longer training set
* 3 Predict only one value
* 4 Get additional variables

# Recurrent neural networks

In this notebook we will try out RNNs for our post-processing. The idea is that data from previous time steps might carry some extra information.

RNNs take quite a long time to train, so I am using a GPU here.

In [61]:
# Imports
from importlib import reload
import utils; reload(utils)
from utils import *
import crps_loss; reload(crps_loss)
from crps_loss import crps_cost_function, crps_cost_function_seq
import matplotlib.pyplot as plt
%matplotlib inline

import keras
from keras.layers import Input, Dense, merge, Embedding, Flatten, Dropout, \
    SimpleRNN, LSTM, TimeDistributed, GRU, Masking
from keras.layers.merge import Concatenate
from keras.models import Model, Sequential
import keras.backend as K
from keras.callbacks import EarlyStopping
from keras.optimizers import SGD, Adam

Anaconda environment: py36_gpu
Linux 4.4.0-96-generic


In [2]:
# Use this if you want to limit the GPU RAM usage
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5
set_session(tf.Session(config=config))

In [3]:
# Basic setup
# DATA_DIR = '/Volumes/STICK/data/ppnn_data/'  # Mac
DATA_DIR = '/project/meteo/w2w/C7/ppnn_data/'   # LMU
results_dir = '../results/'
window_size = 25   # Days in rolling window
fclt = 48   # Forecast lead time in hours

## Set up data

This is now also done inside the `get_train_test_sets` function. `seq_len` is the number of timesteps (including the one to predict). We will start out with a moderate length of 5 days, training for 2015, predicting for 2016.
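
How `get_train_test_sets` builds the sequences is not shown in this notebook. As a rough sketch of the idea, assume each sample is a window of the last `seq_len` days ending on the day to predict, and that days before the start of the record are padded with `fill_value`. A single station's series could then be windowed as follows. `make_sequences` is a hypothetical helper, not part of `utils`, and the padding rule is an assumption.

```python
def make_sequences(series, seq_len, fill_value=-999.0):
    """Return one window of length seq_len per time step.

    Hypothetical helper: windows that would reach before the start
    of the series are left-padded with fill_value (an assumption).
    """
    windows = []
    for t in range(len(series)):
        start = t - seq_len + 1
        # Indices below zero lie before the record and are padded.
        window = [series[i] if i >= 0 else fill_value
                  for i in range(start, t + 1)]
        windows.append(window)
    return windows


example = make_sequences([10.0, 11.0, 12.0], seq_len=2)
print(example)  # [[-999.0, 10.0], [10.0, 11.0], [11.0, 12.0]]
```

The last entry of each window is the time step to predict, which is why the single-output models below train on `targets[:,-1]`.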

In [4]:
seq_len=5

In [62]:
train_dates = ['2015-01-01', '2016-01-01']
test_dates =  ['2016-01-01', '2017-01-01']
train_set, test_set, valid_set = get_train_test_sets(DATA_DIR, train_dates, test_dates, 
                                          seq_len=seq_len, fill_value=-999., valid_size=0.2)

train set contains 365 days
test set contains 366 days


In [63]:
train_set.features.shape, test_set.features.shape, valid_set.features.shape

((144680, 5, 2), (182218, 5, 2), (36169, 5, 2))

The arrays have dimensions `[sample, time step, feature]`.

## RNNs with only temperature data

For comparison, our simple networks got a train/test loss of around 1.07/1.01.

I am using a Gated Recurrent Unit (GRU) as my recurrent layer. LSTM is probably the more common one, but GRU is slightly cheaper and for our simple applications provides similar results. 

### RNN predicting only the last target

In [64]:
batch_size = 1024
hidden_nodes = 100   # Number of hidden nodes inside RNN cell

In [65]:
inp = Input(shape=(seq_len, 2, )) # time step, feature
x = GRU(hidden_nodes)(inp)
x = Dense(2, activation='linear')(x)
rnn_model = Model(inputs=inp, outputs=x)

In [66]:
rnn_model.compile(optimizer=Adam(0.01), loss=crps_cost_function)
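
`crps_cost_function` is defined in `crps_loss` and is not shown in this notebook. Assuming the two network outputs are interpreted as the mean and standard deviation of a Gaussian predictive distribution, the CRPS has a well-known closed form. The sketch below evaluates it for a single forecast in plain Python; `crps_gaussian` is an illustrative name and the real loss may differ in detail.

```python
import math


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a Gaussian N(mu, sigma^2) for observation y.

    Illustrative sketch; assumes sigma is non-zero.
    """
    sigma = abs(sigma)  # a standard deviation must be positive
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# A perfectly centred forecast with unit spread still has a non-zero score.
print(round(crps_gaussian(mu=0.0, sigma=1.0, y=0.0), 4))  # 0.2337
```

Lower is better. The score is in the units of the forecast variable, so the losses in this notebook would be in degrees if the targets are temperatures.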

In [67]:
rnn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 5, 2)              0         
_________________________________________________________________
gru_2 (GRU)                  (None, 100)               30900     
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 202       
Total params: 31,102
Trainable params: 31,102
Non-trainable params: 0
_________________________________________________________________
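
The parameter counts in this summary can be reproduced by hand, which also shows why a GRU is cheaper than an LSTM: it has three weight blocks (two gates plus the candidate state) where the LSTM has four. The sketch below uses the standard formulas for these cells. The helper names are illustrative, and `reset_after` refers to the extra recurrent bias that newer Keras versions add by default.

```python
def gru_params(n_inputs, n_hidden, reset_after=False):
    """Trainable parameters of a GRU layer (3 weight blocks)."""
    bias = 2 * n_hidden if reset_after else n_hidden
    return 3 * (n_hidden * (n_inputs + n_hidden) + bias)


def lstm_params(n_inputs, n_hidden):
    """Trainable parameters of an LSTM layer (4 weight blocks)."""
    return 4 * (n_hidden * (n_inputs + n_hidden) + n_hidden)


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


print(gru_params(2, 100))     # 30900, as in the summary above
print(dense_params(100, 2))   # 202
print(lstm_params(2, 100))    # 41200, about a third more than the GRU
```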


In [68]:
rnn_model.fit(train_set.features, train_set.targets[:,-1], epochs=10, batch_size=batch_size,
              validation_data=(valid_set.features, valid_set.targets[:,-1]))

Train on 144680 samples, validate on 36169 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f8c7e66aef0>

In [70]:
rnn_model.evaluate(test_set.features, test_set.targets[:,-1], batch_size=4096)



1.0353038025571735

So we get a better train score but a worse test score (about 1.035 versus around 1.01 for the simple networks). This indicates overfitting.

### Sequence RNN

In [34]:
inp = Input(shape=(seq_len, 2, )) # time step, feature
x = GRU(hidden_nodes, return_sequences=True)(inp)
x = TimeDistributed(Dense(2, activation='linear'))(x)
seq_rnn_model = Model(inputs=inp, outputs=x)

In [36]:
seq_rnn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 5, 2)              0         
_________________________________________________________________
gru_4 (GRU)                  (None, 5, 100)            30900     
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 2)              202       
Total params: 31,102
Trainable params: 31,102
Non-trainable params: 0
_________________________________________________________________


In [39]:
seq_rnn_model.compile(optimizer=Adam(0.01), loss=crps_cost_function_seq, 
                      sample_weight_mode="temporal")

In [58]:
def train_and_valid(model, train_set, test_set, epochs, batch_size):
    """Write our own function to train and validate, 
    because the keras fit function cannot handle sample weights for training
    and validation at the same time.
    """
    for i in range(epochs):
        print('Epoch:', i+1)
        h = model.fit(train_set.features, train_set.targets, epochs=1, batch_size=batch_size, 
                      sample_weight=train_set.sample_weights, verbose=0)
        print('Train:', h.history['loss'])
        print('Valid', model.evaluate(test_set.features, test_set.targets, batch_size=4096, 
                       sample_weight=test_set.sample_weights, verbose=0))
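
`sample_weight_mode="temporal"` lets Keras weight every time step of every sample separately. Presumably `sample_weights` is zero wherever the target is the fill value, so that padded or missing time steps do not contribute to the loss. The following is a minimal sketch of that idea in plain Python, not Keras's actual implementation. `masked_mean_loss` is a hypothetical name and the 0/1 weights are an assumption.

```python
def masked_mean_loss(losses, weights):
    """Average per-time-step losses, ignoring steps with weight 0.

    losses, weights: nested lists indexed [sample][time step].
    """
    total = 0.0
    norm = 0.0
    for sample_losses, sample_weights in zip(losses, weights):
        for loss, weight in zip(sample_losses, sample_weights):
            total += loss * weight
            norm += weight
    # Avoid dividing by zero if every step is masked.
    return total / norm if norm > 0 else 0.0


losses = [[1.0, 2.0, 100.0],   # last step belongs to a missing target
          [3.0, 2.0, 1.0]]
weights = [[1, 1, 0],
           [1, 1, 1]]
print(masked_mean_loss(losses, weights))  # 1.8
```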

In [41]:
train_and_valid(seq_rnn_model, train_set, test_set, 10, batch_size)

Epoch: 0
Epoch 1/1
Valid 1.02194318586
Epoch: 1
Epoch 1/1
Valid 1.01296814224
Epoch: 2
Epoch 1/1
Valid 1.01225032061
Epoch: 3
Epoch 1/1
Valid 1.01777627614
Epoch: 4
Epoch 1/1
Valid 1.01947451983
Epoch: 5
Epoch 1/1
Valid 1.0184237752
Epoch: 6
Epoch 1/1
Valid 1.01643501133
Epoch: 7
Epoch 1/1
Valid 1.02314476758
Epoch: 8
Epoch 1/1
Valid 1.01951744772
Epoch: 9
Epoch 1/1
Valid 1.01807210738


As with the first RNN above, we seem to overfit to the dataset, but maybe not as strongly. Let's now try a more complex model with a longer sequence length.

### Longer sequence

In [42]:
seq_len = 20
train_dates = ['2015-01-01', '2016-01-01']
test_dates =  ['2016-01-01', '2017-01-01']
train_set, test_set = get_train_test_sets(DATA_DIR, train_dates, test_dates, 
                                          seq_len=seq_len, fill_value=-999.)

train set contains 365 days
test set contains 366 days


In [43]:
hidden_nodes = 200

In [44]:
inp = Input(shape=(seq_len, 2, )) # time step, feature
x = GRU(hidden_nodes, return_sequences=True)(inp)
x = TimeDistributed(Dense(2, activation='linear'))(x)
seq_rnn_model = Model(inputs=inp, outputs=x)

In [45]:
seq_rnn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         (None, 20, 2)             0         
_________________________________________________________________
gru_5 (GRU)                  (None, 20, 200)           121800    
_________________________________________________________________
time_distributed_3 (TimeDist (None, 20, 2)             402       
Total params: 122,202
Trainable params: 122,202
Non-trainable params: 0
_________________________________________________________________


In [46]:
seq_rnn_model.compile(optimizer=Adam(0.01), loss=crps_cost_function_seq, 
                      sample_weight_mode="temporal")

In [47]:
train_and_valid(seq_rnn_model, train_set, test_set, 10, batch_size)

Epoch: 0
Epoch 1/1
Valid 1.09413727563
Epoch: 1
Epoch 1/1
Valid 1.01743481942
Epoch: 2
Epoch 1/1
Valid 1.04572273414
Epoch: 3
Epoch 1/1
Valid 1.0348635667
Epoch: 4
Epoch 1/1
Valid 1.01168127525
Epoch: 5
Epoch 1/1
Valid 1.01859074728
Epoch: 6
Epoch 1/1
Valid 1.02702659271
Epoch: 7
Epoch 1/1
Valid 1.01975782837
Epoch: 8
Epoch 1/1
Valid 1.04456050902
Epoch: 9
Epoch 1/1
Valid 1.07489963545


So again we are overfitting, but maybe there is something to be learned. Let's try a much longer training dataset.

### Reference experiment with longer training set.

In [48]:
train_dates = ['2008-01-01', '2016-01-01']
test_dates =  ['2016-01-01', '2017-01-01']
train_set, test_set = get_train_test_sets(DATA_DIR, train_dates, test_dates)

train set contains 2922 days
test set contains 366 days


In [49]:
# Copied from fc_network notebook
def build_fc_model():
    inp = Input(shape=(2,))
    x = Dense(2, activation='linear')(inp)
    return Model(inputs=inp, outputs=x)

In [50]:
fc_model = build_fc_model()
fc_model.compile(optimizer=Adam(0.1), loss=crps_cost_function)

In [51]:
fc_model.fit(train_set.features, train_set.targets, epochs=10, batch_size=1024,
             validation_data=[test_set.features, test_set.targets])

Train on 1456977 samples, validate on 182218 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fdc74391eb8>

Maybe a small improvement. Now let's test our sequence model with a longer training period.

### Sequence model with longer training set

In [52]:
seq_len = 20
train_dates = ['2008-01-01', '2016-01-01']
test_dates =  ['2016-01-01', '2017-01-01']
train_set, test_set = get_train_test_sets(DATA_DIR, train_dates, test_dates, 
                                          seq_len=seq_len, fill_value=-999.)

train set contains 2922 days
test set contains 366 days


In [59]:
inp = Input(shape=(seq_len, 2, )) # time step, feature
x = GRU(hidden_nodes, return_sequences=True)(inp)
x = TimeDistributed(Dense(2, activation='linear'))(x)
seq_rnn_model = Model(inputs=inp, outputs=x)
seq_rnn_model.compile(optimizer=Adam(0.01), loss=crps_cost_function_seq, 
                      sample_weight_mode="temporal")

In [60]:
# This takes a while!
train_and_valid(seq_rnn_model, train_set, test_set, 5, batch_size)

Epoch: 1
Train: [1.1365550085535305]
Valid 0.998565860164
Epoch: 2
Train: [1.0129750551209091]
Valid 1.02674336182
Epoch: 3
Train: [0.97901781529687926]
Valid 1.03270412821
Epoch: 4
Train: [0.95060623054262205]
Valid 1.03952660352
Epoch: 5
Train: [0.93341366950854698]
Valid 1.04783327321


After the first epoch the validation score improves, but then we start to overfit again, so some regularization is probably needed.
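
One simple guard against this kind of overfitting is to keep the epoch with the best validation score and stop once the score has not improved for a few epochs. The sketch below applies that rule to the validation scores printed above. `best_epoch` and `patience` are illustrative names, and this is not what `train_and_valid` does.

```python
def best_epoch(valid_scores, patience=2):
    """Return (best epoch, best score, epochs run) under early stopping.

    Epochs are 1-indexed. Training stops once the score has failed to
    improve for `patience` consecutive epochs.
    """
    best_score = float('inf')
    best_idx = 0
    waited = 0
    epochs_run = 0
    for epoch, score in enumerate(valid_scores, start=1):
        epochs_run = epoch
        if score < best_score:
            best_score, best_idx, waited = score, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx, best_score, epochs_run


scores = [0.998565860164, 1.02674336182, 1.03270412821,
          1.03952660352, 1.04783327321]
print(best_epoch(scores))  # (1, 0.998565860164, 3)
```

Note that the "validation" scores in this notebook are computed on the 2016 test set. Choosing the stopping epoch on that set would bias the reported score, so a separate validation split is the safer choice.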

I am not quite sure how to regularize GRUs properly. Using the `dropout` parameter gives me NaNs. Using `recurrent_dropout` does not, so let's try that.

In [70]:
inp = Input(shape=(seq_len, 2, )) # time step, feature
x = GRU(hidden_nodes, return_sequences=True, recurrent_dropout=0.5)(inp)
x = TimeDistributed(Dense(2, activation='linear'))(x)
seq_rnn_model = Model(inputs=inp, outputs=x)
seq_rnn_model.compile(optimizer=Adam(0.001), loss=crps_cost_function_seq, 
                      sample_weight_mode="temporal")

In [71]:
train_and_valid(seq_rnn_model, train_set, test_set, 5, batch_size)

Epoch: 1
Train: [1.6217677070568353]
Valid 1.00813483157
Epoch: 2
Train: [1.0490734272999829]
Valid 1.00000224211
Epoch: 3
Train: [1.0414916153958527]
Valid 0.99807775358
Epoch: 4
Train: [1.0388923035127486]
Valid 0.995825972757
Epoch: 5
Train: [1.0369893335706539]
Valid 0.994749346544


## Predict only one value

In [169]:
inp = Input(shape=(seq_len, 2, )) # time step, feature
x = GRU(hidden_nodes)(inp)
x = Dense(2, activation='linear')(x)
rnn_model2 = Model(inputs=inp, outputs=x)

In [170]:
rnn_model2.compile(optimizer=Adam(0.001), loss=crps_cost_function)

In [171]:
rnn_model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_14 (InputLayer)        (None, 5, 2)              0         
_________________________________________________________________
gru_14 (GRU)                 (None, 20)                1380      
_________________________________________________________________
dense_14 (Dense)             (None, 2)                 42        
Total params: 1,422
Trainable params: 1,422
Non-trainable params: 0
_________________________________________________________________


In [173]:
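# Note: x_seq_train, y_seq_train, x_seq_test and y_seq_test are not defined
# anywhere above; this cell appears to be left over from a different session.
rnn_model2.fit(x_seq_train, y_seq_train[:,-1], epochs=5, batch_size=1024,
              validation_data=(x_seq_test, y_seq_test[:,-1]))
#rnn_model2.fit(x_seq_train, y_seq_train[:,-1], epochs=10, batch_size=1024)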

Train on 180849 samples, validate on 182218 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fcb2d435e80>

## Get additional variables

In [212]:
from collections import OrderedDict
aux_dict = OrderedDict()
aux_dict['data_aux_geo_interpolated.nc'] = ['orog', 
                                            'station_alt', 
                                            'station_lat', 
                                            'station_lon']
aux_dict['data_aux_pl500_interpolated_00UTC.nc'] = ['u_pl500_fc',
                                                    'v_pl500_fc',
                                                    'gh_pl500_fc']
aux_dict['data_aux_pl850_interpolated_00UTC.nc'] = ['u_pl850_fc',
                                                    'v_pl850_fc',
                                                    'q_pl850_fc']
aux_dict['data_aux_surface_interpolated_00UTC.nc'] = ['cape_fc',
                                                      'sp_fc',
                                                      'tcc_fc']

In [213]:
train_set, test_set = get_train_test_sets(DATA_DIR, train_dates, test_dates, 
                                          seq_len=5, fill_value=-999., aux_dict=aux_dict)

train set contains 365 days
test set contains 366 days


In [214]:
n_features = train_set.features.shape[-1]
n_features

24

In [233]:
inp = Input(shape=(seq_len, n_features, )) # time step, feature
x = GRU(20, return_sequences=True)(inp)
# x = Dropout(0.5)(x)
# x = TimeDistributed(Dense(2, activation='linear'))(x)
x = TimeDistributed(Dense(2))(x)
rnn_model = Model(inputs=inp, outputs=x)

In [234]:
rnn_model.compile(optimizer=Adam(0.01), loss=crps_cost_function_seq, sample_weight_mode="temporal")

In [235]:
for i in range(10):
    rnn_model.fit(train_set.features, train_set.targets, epochs=1, batch_size=1024, 
                  sample_weight=train_set.sample_weights, verbose=0)
    # Note: the line labelled 'Test' is the score on the training set;
    # 'Valid' is the score on the 2016 test set.
    print('Test', rnn_model.evaluate(train_set.features, train_set.targets, batch_size=4096, 
                   sample_weight=train_set.sample_weights, verbose=0))
    print('Valid', rnn_model.evaluate(test_set.features, test_set.targets, batch_size=4096, 
                   sample_weight=test_set.sample_weights, verbose=0))

Test 1.47494211695
Valid 1.50654933895
Test 0.977439547093
Valid 0.976263109129
Test 0.944071006104
Valid 0.952683184602
Test 0.927436171778
Valid 0.947129741995
Test 0.921549836288
Valid 0.941121052018
Test 0.910908860636
Valid 0.946302759566
Test 0.907522408847
Valid 0.943646217648
Test 0.899797803507
Valid 0.943910612833
Test 0.894670393442
Valid 0.942409784089
Test 0.891969689318
Valid 0.9460832431


In [236]:
inp = Input(shape=(seq_len, n_features, )) # time step, feature
x = GRU(20)(inp)
x = Dense(2, activation='linear')(x)
rnn_model2 = Model(inputs=inp, outputs=x)

In [237]:
rnn_model2.compile(optimizer=Adam(0.01), loss=crps_cost_function)

In [239]:
rnn_model2.fit(train_set.features, train_set.targets[:,-1], epochs=5, batch_size=1024,
              validation_data=(test_set.features, test_set.targets[:,-1]))

Train on 180849 samples, validate on 182218 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fcb07e897b8>