# Global Average Land Temperature Time Series forecasting with a simple Gated Recurrent Units (GRU) Neural Network architecture, using TensorFlow, Keras and Talos.


<br>
<div style="text-align: justify">
Time series forecasting is one of the classic Machine Learning challenges, especially for weather parameters, which are influenced by multiple processes that can make these sequences of values harder to learn. In this piece of work we illustrate the power of Deep Learning on a typical problem, viz., forecasting the global average land temperature with a 2-layer GRU. The dataset contains monthly temperature records ranging from 1750, for the longest time series, to 2015.<br>

Just like performing a Grid-/Random-Search with Scikit-Learn, a set of hyperparameters will be optimized using Talos.
</div>


In [1]:
##################################################################################################################
##*********                    Moukouba Moutoumounkata, July 2020              ***************                  ##
##                Global Land Average Temperature Time Series Forecasting                                       ##
##                *******************************************************                                       ##
##  NOTE: please, copy and paste the notebook link into http://nbviewer.jupyter.org/ to show the graphics       ##       
##################################################################################################################



#we will try to build a reproducible experiment, although without 
#guarantee, due to the inherently stochastic nature of Neural Network training
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)

import numpy as np
import pandas as pd
from operator import itemgetter
import matplotlib.pyplot as plt
import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objs as go
import seaborn as sns

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Importing the Keras libraries and packages from TensorFlow
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Dense, Dropout, LSTM, Flatten, 
                                     GRU, Bidirectional, TimeDistributed)
from tensorflow.keras.activations import relu, linear
from tensorflow.keras.optimizers import SGD, Adam, RMSprop
from tensorflow.keras.models import model_from_json
from tensorflow.keras import backend as K
import tensorflow as tf

import talos
from talos import scan, Evaluate, Reporting 
from talos.utils import early_stopper, lr_normalizer
from talos.utils.gpu_utils import parallel_gpu_jobs
from talos.utils.recover_best_model import recover_best_model

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

df = pd.read_csv("/home/moukouba/data_science/python/datasets/globaltemperatures.csv")

Using TensorFlow backend.


In [2]:
df.head()

Unnamed: 0,dt,landaveragetemperature,landaveragetemperatureuncertainty,landmaxtemperature,landmaxtemperatureuncertainty,landmintemperature,landmintemperatureuncertainty,landandoceanaveragetemperature,landandoceanaveragetemperatureuncertainty
0,1750-01-01,3.034,3.574,,,,,,
1,1750-02-01,3.083,3.702,,,,,,
2,1750-03-01,5.626,3.076,,,,,,
3,1750-04-01,8.49,2.451,,,,,,
4,1750-05-01,11.573,2.072,,,,,,


In [3]:
df.shape

(3192, 9)

In [4]:
df["dt"] = df.dt.apply(pd.to_datetime, errors='coerce')

df1 = df[["dt", "landaveragetemperature", "landaveragetemperatureuncertainty"]].set_index("dt")
df2 = df[["dt", "landmintemperature", "landmintemperatureuncertainty"]].set_index("dt")

print(df1.isnull().sum())
print(df2.isnull().sum())
df1.head(15)

landaveragetemperature               12
landaveragetemperatureuncertainty    12
dtype: int64
landmintemperature               1200
landmintemperatureuncertainty    1200
dtype: int64


dt,landaveragetemperature,landaveragetemperatureuncertainty
1750-01-01,3.034,3.574
1750-02-01,3.083,3.702
1750-03-01,5.626,3.076
1750-04-01,8.49,2.451
1750-05-01,11.573,2.072
1750-06-01,12.937,1.724
1750-07-01,15.868,1.911
1750-08-01,14.75,2.231
1750-09-01,11.413,2.637
1750-10-01,6.367,2.668


In [5]:
df1.shape

(3192, 2)

In [6]:
df2 = df2.dropna()
df2.head()

dt,landmintemperature,landmintemperatureuncertainty
1850-01-01,-3.206,2.822
1850-02-01,-2.291,1.623
1850-03-01,-1.905,1.41
1850-04-01,1.018,1.329
1850-05-01,3.811,1.347


<div style="text-align: justify">
As can be seen, the sequence has a few missing values (12). These will be imputed using the Long Term Mean (LTM) of the same calendar day (in effect the same month, since the records are monthly) rather than the LTM of the whole series, so that the imputed values stay within the range that this time of year can take.
</div>
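The idea can be illustrated on a small synthetic series before applying it to the real data. The sketch below is not the notebook's `imputer`; it uses a pandas `groupby`/`transform` shortcut on a made-up DataFrame (`toy`, with a hypothetical `temp` column) to fill one missing March value with the mean of the other March values.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: three identical "years" with values 1..12 (illustrative only)
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
toy = pd.DataFrame({"temp": np.tile(np.arange(1.0, 13.0), 3)}, index=idx)

# Knock out one value (March 2001)
toy.loc[pd.Timestamp("2001-03-01"), "temp"] = np.nan

# Long-term mean of each calendar month, broadcast back onto every row
monthly_mean = toy.groupby(toy.index.month)["temp"].transform("mean")

# Fill only the missing entries with the mean of their own month
filled = toy["temp"].fillna(monthly_mean)
print(filled.loc[pd.Timestamp("2001-03-01")])
```

The missing March entry is replaced by the March mean only, so the seasonal range of the series is preserved.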

In [7]:
#We define a function that replaces the missing entries with the long term mean of that specific day of the year
def imputer(df):
    #Work on a copy so that the caller's DataFrame is left untouched
    df = df.copy()
    
    #The actual dates with missing values within all the dataset
    dates_nan = df.index[df.isnull().any(axis=1)]
    
    #The month-day key ('%m-%d') of every row; comparing keys exactly avoids
    #substring matches such as '11-01' inside '2011-01-01'
    day_keys = df.index.strftime('%m-%d')
    
    #Now, we can replace the missings with the mean of all rows sharing the same key
    for missing in dates_nan:
        same_day = day_keys == missing.strftime('%m-%d')
        df.loc[missing] = df.loc[same_day].mean()
    return df

In [8]:
df1 = imputer(df1)

df1.head(15)

dt,landaveragetemperature,landaveragetemperatureuncertainty
1750-01-01,3.034,3.574
1750-02-01,3.083,3.702
1750-03-01,5.626,3.076
1750-04-01,8.49,2.451
1750-05-01,11.573,2.072
1750-06-01,12.937,1.724
1750-07-01,15.868,1.911
1750-08-01,14.75,2.231
1750-09-01,11.413,2.637
1750-10-01,6.367,2.668


<div style="text-align: justify">
Although the main focus of this work is to forecast the Land Average Temperature (LAT) only, it is worthwhile to examine how global temperatures (average and minimum) have varied over the years, especially since 1900, in the context of Climate Change. A succinct graphical analysis of the Land Minimum Temperature (LMT) is therefore included, based on line plots of the series.
</div>

In [31]:
#For the ease of readability, the series will be broken into three portions
#x1, x2, x3 = itemgetter(0, 1, 2)(np.array_split(X, 3))
x = np.array_split(df1, 3)

for data in x:
    fig = px.line(data["landaveragetemperature"])
    fig.show()

In [32]:
fig = px.line(df1["landaveragetemperatureuncertainty"])
fig.show()

In [11]:
#z1, z2, z3 = itemgetter(0, 1, 2)(np.array_split(Z, 3))

z = np.array_split(df2, 3)

for data in z:
    fig = px.line(data["landmintemperature"])
    fig.show()

In [12]:
#Subsetting one month, say July, and plot the series' curve
df3 = df1[df1.index.month == 7]
df4 = df2[df2.index.month == 7]

fig = px.line(df3["landaveragetemperature"])
fig.show()

In [13]:
fig = px.line(df4["landmintemperature"])
fig.show()

<div style="text-align: justify">
Looking carefully at the above graphics, we can say that both the LAT and the LMT are indeed rising. Up to 1930, the LAT fluctuated around $14°C$; from 1930 on, it has gone well above this value, exceeding $15°C$ from 1995 onwards.
The LMT is also on the rise: it stayed below $-3°C$ up to 1954, and has since risen above $-3°C$ and generally above $-2°C$.
<br><br>
Finally, the last two graphics above, which show the temperature variations for the month of July, make it clear that global warming is a reality, as temperatures, especially the LMT, have been rising continuously since 1900.
<br><br>
It is worth noting that, since the world did not have a dense network of observation stations until the last century, the older values have mostly been estimated through reanalysis. The uncertainty curve shows that the farther we go back in time, the larger the estimation uncertainty, which approaches zero only in recent decades. Values up to the end of the 1800s should therefore be taken with a grain of salt, as they tend to be inconsistent with the above observations.
</div>

## Data Preparation

<div style="text-align: justify">
GRUs are Recurrent Neural Network models and expect three-dimensional input with the shape [samples, timesteps, features]. The data will thus be reshaped accordingly. But first, we check the consistency of the data and then put it through a number of transformations before feeding it to the model.
</div>
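As a small illustration of the expected shape, the sketch below builds sliding windows from a toy series and reshapes them in two ways. The names `X_seq` and `X_flat` are illustrative and do not appear in the notebook: `X_seq` treats every lagged value as its own time step, while `X_flat` packs all lags into a single time step, which is the layout the reshape cell of this notebook produces.

```python
import numpy as np

# Toy series 0..9 and a look-back window of 3 values (illustrative numbers)
series = np.arange(10, dtype=float)
n_steps = 3

# Each row of X holds n_steps consecutive values; y holds the value that follows
X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
y = series[n_steps:]

# Layout 1: [samples, timesteps, features] with one feature per time step
X_seq = X.reshape(X.shape[0], n_steps, 1)

# Layout 2: a single time step carrying n_steps features
X_flat = X.reshape(X.shape[0], 1, n_steps)

print(X.shape, y.shape, X_seq.shape, X_flat.shape)
```

With the first layout the recurrent layer steps through the lags one by one; with the second it sees all lags at once, so the choice affects how the recurrence is used.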

In [14]:
#Plotting boxplots to check for outliers

trace0 = go.Box(y=df2["landmintemperature"], name="Min_temp")
trace1 = go.Box(y=df1["landaveragetemperature"], name="Avg_temp")

sequences = [trace0, trace1]
fig = go.Figure(sequences)
fig.show()

<div style="text-align: justify">
The above boxplots may be misleading, as they suggest that the data is free from outliers. Temperature, however, varies strongly with the months/seasons, so each month should be considered individually; for example, the month of July is not exempt from outliers, as shown in the graphic below. But, as stated above, we want to investigate the power of Deep Learning, so we will leave the data as is and let the algorithm learn and make forecasts.
</div>
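The same reasoning can be checked numerically with the usual 1.5 × IQR boxplot rule. The sketch below uses a purely synthetic seasonal series (not the notebook's data) with one anomalous July value planted in it; `iqr_outliers` is a hypothetical helper. Pooled over all months, the seasonal spread hides the anomaly, whereas a month-by-month check flags it.

```python
import numpy as np
import pandas as pd

def iqr_outliers(s):
    """Boolean mask of values outside the 1.5 * IQR boxplot fences."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

# Synthetic 30-year monthly series: seasonal cycle between 3 and 15 plus a small wiggle
idx = pd.date_range("1980-01-01", periods=360, freq="MS")
month = idx.month.to_numpy()
year = idx.year.to_numpy()
base = 9 + 6 * np.sin(2 * np.pi * (month - 4) / 12)
wiggle = 0.1 * ((year % 5) - 2)
temps = pd.Series(base + wiggle, index=idx)

# Plant one unusually warm July
temps.loc[pd.Timestamp("1990-07-01")] = 16.5

# Pooled check: all months together
pooled_flags = iqr_outliers(temps)

# Month-by-month check
monthly_flags = pd.Series(False, index=temps.index)
for m, group in temps.groupby(temps.index.month):
    monthly_flags.loc[group.index] = iqr_outliers(group)

print(int(pooled_flags.sum()), int(monthly_flags.sum()))
```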

In [15]:
#Boxplots for the specific month of July
trace0 = go.Box(y=df4["landmintemperature"], name="Min_temp")
trace1 = go.Box(y=df3["landaveragetemperature"], name="Avg_temp")

sequences = [trace0, trace1]
fig = go.Figure(sequences)
fig.show()

In [16]:
#We define a function that transforms the data to the required shape
def convert2matrix(data_arr, n_steps):
    X, y = [], []
    for i in range(len(data_arr) - n_steps):
        d = i + n_steps  
        X.append(data_arr[i:d, ])
        y.append(data_arr[d, ])
    return np.array(X), np.array(y)

#We define a function that splits the data into two consecutive parts (e.g. train and test)
def data_splitter(df, train_fraction):
    train_size = int(len(df)*train_fraction)
    train, test = df[0:train_size, ], df[train_size:len(df), ]
    return train, test

In [17]:
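#Split the data into train, validation and test series. Both calls slice the
#full series, so train is the first 75%, val the last 25% and test the last 15%
#(the test portion is therefore contained in val)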
series = df1["landaveragetemperature"].values 

train, test = data_splitter(series, 0.85)
train, val = data_splitter(series, 0.75)
print(len(train), len(val), len(test))

2394 798 479
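
The three counts printed above add up to more than the 3192 records because `val` and `test` are both tails of the full series. If disjoint sets are wanted, a chronological three-way split can be sketched as follows; `chrono_split` and the 70/15/15 fractions are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def chrono_split(values, train_frac=0.70, val_frac=0.15):
    """Split a 1-D array into consecutive, non-overlapping train/val/test parts."""
    n = len(values)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return values[:train_end], values[train_end:val_end], values[val_end:]

# Stand-in for the temperature series: same length as the dataset (3192 months)
demo = np.arange(3192)
train_part, val_part, test_part = chrono_split(demo)
print(len(train_part), len(val_part), len(test_part))
```

Keeping the three parts in time order and disjoint means that the test months are never seen during training, early stopping or model selection.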


In [18]:
#Convert the dataset into the right shape to turn the problem into a supervised... 
#... learning one. We choose n_steps = 36, that is, we look back over... 
#... 3 years (36 consecutive months) to forecast the next month

n_steps = 36

X_train, y_train = convert2matrix(train, n_steps)
X_val, y_val = convert2matrix(val, n_steps)
X_test, y_test = convert2matrix(test, n_steps)

#Scale the data
b_scaled = X_train.copy()
b_scaled_val = X_val.copy()
b_scaled_test = X_test.copy()

scaler = MinMaxScaler(feature_range=(0, 1))
x_train = scaler.fit_transform(b_scaled)
x_val = scaler.transform(b_scaled_val)
x_test = scaler.transform(b_scaled_test)

# reshape input to [samples, time steps, features]; here each sample is a single
# time step whose 36 lagged values act as the features
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
x_val = np.reshape(x_val, (x_val.shape[0], 1, x_val.shape[1]))
x_test = np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))

In [19]:
b_scaled_test.shape

(443, 36)

In [20]:
for i in range(5):
    print(b_scaled_test[i], y_test[i])

[ 3.031  4.517  8.294 10.942 13.086 14.155 13.511 11.895  8.511  5.66
  3.681  2.492  3.471  5.702  8.85  11.78  13.876 14.631 14.09  11.862
  9.156  6.544  3.749  2.705  3.456  5.607  8.791 11.414 13.22  14.364
 13.297 12.03   9.339  6.35   3.74   2.679] 2.841
[ 4.517  8.294 10.942 13.086 14.155 13.511 11.895  8.511  5.66   3.681
  2.492  3.471  5.702  8.85  11.78  13.876 14.631 14.09  11.862  9.156
  6.544  3.749  2.705  3.456  5.607  8.791 11.414 13.22  14.364 13.297
 12.03   9.339  6.35   3.74   2.679  2.841] 5.474
[ 8.294 10.942 13.086 14.155 13.511 11.895  8.511  5.66   3.681  2.492
  3.471  5.702  8.85  11.78  13.876 14.631 14.09  11.862  9.156  6.544
  3.749  2.705  3.456  5.607  8.791 11.414 13.22  14.364 13.297 12.03
  9.339  6.35   3.74   2.679  2.841  5.474] 8.455
[10.942 13.086 14.155 13.511 11.895  8.511  5.66   3.681  2.492  3.471
  5.702  8.85  11.78  13.876 14.631 14.09  11.862  9.156  6.544  3.749
  2.705  3.456  5.607  8.791 11.414 13.22  14.364 13.297 12.03   9.339


<div style="text-align: justify">
Now, we build a basic GRU neural network with two hidden layers and one dropout-regularization layer. Two metrics, the Mean Absolute Error and the Coefficient of Determination (r_squared), will be used.
</div>
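For reference, both metrics can be worked out by hand on a few made-up numbers. The NumPy sketch below mirrors the formula used by the custom `r_squared` function (residual sum of squares over total sum of squares); the values of `y_true` and `y_pred` are invented for the example.

```python
import numpy as np

# Invented targets and predictions, for illustration only
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# Mean Absolute Error: average size of the errors
mae = np.mean(np.abs(y_true - y_pred))

# Coefficient of determination: 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE = {mae:.3f}, R^2 = {r2:.3f}")
```

Note that Keras evaluates a metric function batch by batch and averages the results, so the r_squared reported during training may differ somewhat from a value computed once over the whole set, for instance with scikit-learn's `r2_score`.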

In [21]:
# Setting the dictionary of the hyperparameters to be included in the optimisation process 
p = {'epochs': (20, 150, 5),
     'neurons1': [ 32, 64, 128, 256],
     'neurons2': [32, 64, 128, 256],
     'dropout': (0.1, 0.6, 5),
     'loss': ['mse', 'mae'],
     'activation1':[relu, None,],
     'activation2':[linear, None,],
     'batch_size': [8, 16, 32, 64, 128],
     'optimizer': ['Adam', 'SGD', 'RMSprop']
     }

In [22]:
#Define custom coefficient of determination metric
def r_squared(y_true, y_pred):
    SS_res =  K.sum(K.square(y_true-y_pred)) 
    SS_tot = K.sum(K.square(y_true - K.mean(y_true))) 
    return (1 - SS_res/(SS_tot + K.epsilon()))


#We define a model builder function. We wrap the first hidden layer into a Bidirectional layer 
def model_builder(x_train, y_train, x_val, y_val, params, n_steps = 36, n_features = 1):
    
    tf.keras.backend.clear_session()
    
    model = Sequential([
        Bidirectional(GRU(params['neurons1'], return_sequences=True, 
                          activation=params['activation1'], input_shape=(n_features, n_steps))),
        GRU(params['neurons2'], activation=params['activation1']),
        Dropout(params['dropout']),
        
        Dense(1, activation=params['activation2'])        
    ])
    
    model.compile(optimizer=params['optimizer'], loss=params['loss'], metrics=['mae', r_squared])
    
    history = model.fit(x_train, y_train, epochs=params['epochs'], batch_size=params['batch_size'],  verbose=0, 
                        validation_data=(x_val, y_val), callbacks=[early_stopper(epochs=params['epochs'], 
                                                                                 mode='moderate', monitor='val_loss')])
    
    return history, model

<div style="text-align: justify">
Next, we run the experiments by creating a Scan object (scrutinizer), splitting the GPU memory in two for two parallel jobs. To limit the computational burden, only 1% of the hyperparameter space, randomly downsampled, was used, that is, 480 rounds. The process took about 1 hour and 40 minutes to execute on a modest GTX-1050 GPU.
</div>
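The figure of 480 rounds follows from the size of the grid defined in the parameter dictionary. The short sketch below redoes the arithmetic, assuming that each `(min, max, steps)` tuple expands to 5 values, which is consistent with the 480 rounds shown by the progress bar; `n_values` is a hand-written tally, not something Talos exposes.

```python
# Number of candidate values per hyperparameter, read off the dictionary p
n_values = {
    'epochs': 5,        # tuple (20, 150, 5): assumed to expand to 5 values
    'neurons1': 4,
    'neurons2': 4,
    'dropout': 5,       # tuple (0.1, 0.6, 5): assumed to expand to 5 values
    'loss': 2,
    'activation1': 2,
    'activation2': 2,
    'batch_size': 5,
    'optimizer': 3,
}

# Full grid size is the product of the individual counts
total = 1
for count in n_values.values():
    total *= count

# fraction_limit=0.01 keeps 1% of the full grid
rounds = round(total * 0.01)
print(total, rounds)
```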

In [24]:
parallel_gpu_jobs(0.5)

scrutinizer = talos.Scan(x=x_train, y=y_train, x_val=x_test, y_val=y_test, seed=42, 
                         model=model_builder, experiment_name='time_series__gru_hpo_1', 
                         params=p,fraction_limit=0.01, reduction_metric='val_loss')


  0%|          | 0/480 [00:00<?, ?it/s]
  0%|          | 1/480 [00:05<42:02,  5.27s/it]
...
 98%|█████████▊| 469/480 [1:37:09<02:33, 13.92s/it]

In [25]:
scrutinizer.details

experiment_name        time_series__gru_hpo_1
random_method                uniform_mersenne
reduction_method                         None
reduction_interval                         50
reduction_window                           20
reduction_threshold                       0.2
reduction_metric                     val_loss
complete_time                  07/25/20/08:12
x_shape                         (2358, 1, 36)
y_shape                               (2358,)
dtype: object

In [26]:
analyze_object = talos.Analyze(scrutinizer)
analyze_object.data

Unnamed: 0,start,end,duration,round_epochs,loss,mean_absolute_error,r_squared,val_loss,val_mean_absolute_error,val_r_squared,activation1,activation2,batch_size,dropout,epochs,loss.1,neurons1,neurons2,optimizer
0,07/25/20-063215,07/25/20-063220,5.035407,7,1.439386,1.439386,0.802741,0.330142,0.330142,0.989781,<function relu at 0x7fd0526c55f0>,<function linear at 0x7fd0526c58c0>,32,0.5,20,mae,128,64,RMSprop
1,07/25/20-063220,07/25/20-063224,3.906805,15,0.904662,0.904662,0.925316,0.468399,0.468399,0.979842,,<function linear at 0x7fd0526c58c0>,128,0.1,20,mae,128,32,SGD
2,07/25/20-063225,07/25/20-063239,13.953959,27,0.856514,0.856514,0.930507,0.392612,0.392612,0.985738,<function relu at 0x7fd0526c55f0>,<function linear at 0x7fd0526c58c0>,32,0.4,72,mae,64,256,Adam
3,07/25/20-063239,07/25/20-063253,14.190145,28,1.083846,0.754136,0.939367,0.316230,0.476759,0.979417,,,16,0.4,98,mse,64,128,SGD
4,07/25/20-063253,07/25/20-063303,10.165519,10,0.908455,0.908455,0.923679,0.352476,0.352476,0.988389,<function relu at 0x7fd0526c55f0>,,32,0.1,98,mae,256,64,RMSprop
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
475,07/25/20-081134,07/25/20-081149,15.243603,16,0.669704,0.669704,0.941218,0.403760,0.403760,0.975214,,,8,0.1,98,mae,32,128,SGD
476,07/25/20-081149,07/25/20-081203,13.670615,14,1.699057,1.699058,0.643340,0.778763,0.778763,0.942424,<function relu at 0x7fd0526c55f0>,,8,0.5,98,mae,64,32,RMSprop
477,07/25/20-081203,07/25/20-081211,7.495608,21,1.286148,0.843235,0.929853,1.239018,1.048762,0.923064,,,32,0.2,72,mse,128,32,SGD
478,07/25/20-081211,07/25/20-081221,9.789307,37,1.161870,1.161870,0.876745,0.715827,0.715827,0.960211,,<function linear at 0x7fd0526c58c0>,128,0.5,124,mae,256,64,SGD


In [27]:
# The highest coefficient of determination (r_squared) achieved on validation set 
best_r_squared = analyze_object.high('val_r_squared')
# The lowest mean absolute error achieved on validation set 
best_mae = analyze_object.low('val_mean_absolute_error')

print("The scores for r_squared and mae, on validation set are %.5f and %.5f, respectively."%(best_r_squared, best_mae))

The scores for r_squared and mae, on validation set are 0.99253 and 0.27230, respectively.


In [28]:
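# The best models based on respective metrics. Per the Talos documentation, asc
# should be True for metrics that are minimized; with asc=False, the call on
# val_mean_absolute_error below returns the round with the highest validation MAE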
best1 = scrutinizer.best_model(metric='val_r_squared', asc=False)
best2 = scrutinizer.best_model(metric='val_mean_absolute_error', asc=False)

# Predicting the Test set results
y_pred_rsq = best1.predict(x_test)
y_pred_mae = best2.predict(x_test)
r2_1 = r2_score(y_test, y_pred_rsq)
r2_2 = r2_score(y_test, y_pred_mae)
print('R-squared for the mae-based metric, on test set, is %.5f .'%(r2_2))

# Collect the actual and predicted values in a single table
y0 = y_test.flatten()
y1 = y_pred_rsq.flatten()
y2 = y_pred_mae.flatten()
results = pd.DataFrame({"y_test":y0, "y_pred_rsq":y1, "y_pred_mae":y2})

results.tail(30)

R-squared for the mae-based metric, on test set, is 0.66884 .


Unnamed: 0,y_test,y_pred_rsq,y_pred_mae
413,15.003,15.610468,19.072659
414,14.742,14.941566,18.447414
415,13.154,12.978126,16.05217
416,10.256,10.125749,12.58973
417,7.424,6.861506,8.842847
418,4.724,4.369382,5.923769
419,3.732,3.272722,4.593326
420,3.5,3.966535,5.218691
421,6.378,6.004427,7.486385
422,9.589,9.167122,11.012334


In [29]:
xpoints = df.iloc[-443:, 0]

trace0 = go.Scatter(x=xpoints, y=y0, name='Actual Values')
trace1 = go.Scatter(x=xpoints, y=y1, name='Predicted (RSQ)')
trace2 = go.Scatter(x=xpoints, y=y2, name='Predicted (MAE)')

data = [trace0, trace1, trace2]

layout=go.Layout(title="Actual and Predicted Temperatures", xaxis={'title':'Year'}, yaxis={'title':'Temperature'})
fig = go.Figure(data=data, layout=layout)
fig.show()

In [30]:
# We can now save the best model for further use or deployment

# Get the best model index based on the highest validation r_squared 
model_id = analyze_object.data[['val_r_squared']].idxmax()[0]

# Clear any previous TensorFlow session.
tf.keras.backend.clear_session()

# Load the model parameters from the scanner.
model = model_from_json(scrutinizer.saved_models[model_id])
model.set_weights(scrutinizer.saved_weights[model_id])
model.summary()
model.save('./avg_land_temp_best_model.h5')

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional (Bidirectional multiple                  38784     
_________________________________________________________________
gru_1 (GRU)                  multiple                  295680    
_________________________________________________________________
dropout (Dropout)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  257       
Total params: 334,721
Trainable params: 334,721
Non-trainable params: 0
_________________________________________________________________


## Conclusion:
<br>
<div style="text-align: justify">
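As can be seen, the best GRU reaches a Coefficient of Determination (r_squared) of about 0.99 on the held-out series. The curve of the predicted values (red line) closely follows the curve of the actual values (blue line), almost completely covering it. Note, however, that this held-out series was also passed to the Scan as validation data (`x_val=x_test`), so it guided early stopping and model selection and is not strictly unseen data. The r_squared-based model also tracks the test series far better than the one labelled MAE-based (test R-squared of 0.66884); since `best_model` was called with `asc=False` for the MAE, that model is most likely the round with the highest, not the lowest, validation MAE, so the gap says little about the two metrics themselves.<br><br>
To implement a GRU on the above time series, the number of steps was chosen by trial and error, and 36 proved to yield good performance. Unfortunately, Talos does not provide a mechanism for using an sklearn data processing pipeline, which would allow n_steps to be included in the hyperparameter grid and its best value to be searched automatically.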
</div>
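One possible workaround is an outer loop over candidate window lengths, rebuilding the windows on each pass and keeping the length with the lowest validation error. The sketch below shows only the shape of such a loop: it runs on a synthetic seasonal series and uses an ordinary least-squares autoregression as a stand-in for the GRU and the Talos scan, so `make_windows`, the candidate values and the data are all assumptions made for illustration.

```python
import numpy as np

def make_windows(series, n_steps):
    """Turn a 1-D series into (windows, next value) pairs."""
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    return X, y

# Synthetic monthly-like series: 12-step seasonal cycle plus noise (not the real data)
rng = np.random.default_rng(0)
t = np.arange(600)
series = 9 + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)

split = 480  # first 480 points for training, the rest for validation
scores = {}
for n_steps in (6, 12, 24, 36):
    X_tr, y_tr = make_windows(series[:split], n_steps)
    # Start n_steps earlier so every candidate is scored on the same validation targets
    X_va, y_va = make_windows(series[split - n_steps:], n_steps)

    # Stand-in model: linear autoregression with intercept, fitted by least squares
    A_tr = np.c_[X_tr, np.ones(len(X_tr))]
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    pred = np.c_[X_va, np.ones(len(X_va))] @ coef

    scores[n_steps] = float(np.mean(np.abs(y_va - pred)))

best_n_steps = min(scores, key=scores.get)
print(scores, best_n_steps)
```

In the notebook's setting, the body of the loop would instead rebuild `x_train`/`x_val` for the given `n_steps` and launch one scan per candidate, at the cost of multiplying the total run time by the number of candidates.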