### SHRED applied to SST dataset

This IPython notebook gives an introductory walkthrough of SHRED models. The dataset we consider is the weekly mean sea-surface temperature (SST) from the NOAA Optimum Interpolation SST V2 dataset (https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.html).

SHRED (SHallow REcurrent Decoder) is a network architecture that merges a recurrent layer (LSTM) with a shallow decoder network (SDN) to reconstruct high-dimensional spatio-temporal fields from a trajectory of sensor measurements of the field. More formally, the SHRED architecture can be written as
$$ \mathcal{H} \left( \{ y_i \}_{i=t-k}^t \right) = \mathcal{F} \left( \mathcal{G} \left( \{ y_i \}_{i=t-k}^t ; W_{RN} \right) ; W_{SD} \right) $$
where $\mathcal F$ is a feedforward network parameterized by weights $W_{SD}$, $\mathcal G$ is an LSTM network parameterized by weights $W_{RN}$, and $\{ y_i \} _{i=t-k}^t$ is a trajectory of sensor measurements of a high-dimensional spatio-temporal field $\{ x_i \} _{i=t-k}^t$.
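
To make the composition $\mathcal F(\mathcal G(\cdot))$ concrete, here is a minimal PyTorch sketch of an LSTM followed by a shallow decoder. It is not the `models.SHRED` implementation: the class name `TinySHRED`, the ReLU activations, and the placement of dropout are assumptions made for illustration. The argument names simply mirror the `models.SHRED` call used later in this notebook.

```python
import torch
import torch.nn as nn


class TinySHRED(nn.Module):
    """Illustrative LSTM + shallow decoder (hypothetical, not models.SHRED)."""

    def __init__(self, num_sensors, state_dim, hidden_size=64, hidden_layers=2,
                 l1=350, l2=400, dropout=0.1):
        super().__init__()
        # G: recurrent network, weights W_RN
        self.lstm = nn.LSTM(input_size=num_sensors, hidden_size=hidden_size,
                            num_layers=hidden_layers, batch_first=True)
        # F: shallow feedforward decoder, weights W_SD (layer layout is assumed)
        self.decoder = nn.Sequential(
            nn.Linear(hidden_size, l1), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(l1, l2), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(l2, state_dim),
        )

    def forward(self, y):
        # y has shape (batch_size, lags, num_sensors)
        _, (h_n, _) = self.lstm(y)
        # h_n has shape (num_layers, batch_size, hidden_size);
        # decode the final hidden state of the last LSTM layer
        return self.decoder(h_n[-1])


model = TinySHRED(num_sensors=3, state_dim=500)
x_hat = model(torch.randn(8, 52, 3))  # 8 trajectories, 52 lags, 3 sensors
print(x_hat.shape)
```

Each trajectory of sensor readings is mapped to one full-state estimate, so the output has shape `(batch_size, state_dim)`.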

We first randomly select 3 sensor locations and set the trajectory length (lags) to 52, corresponding to one year of measurements.

In [1]:
import numpy as np
from processdata import load_data
from processdata import TimeSeriesDataset
import models
import torch
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

num_sensors = 3 
lags = 52
load_X = load_data('SST')
n = load_X.shape[0]
m = load_X.shape[1]
sensor_locations = np.random.choice(m, size=num_sensors, replace=False)

We now select indices to divide the data into training, validation, and test sets.

In [2]:
train_indices = np.random.choice(n - lags, size=1000, replace=False)
mask = np.ones(n - lags)
mask[train_indices] = 0
valid_test_indices = np.arange(0, n - lags)[np.where(mask!=0)[0]]
valid_indices = valid_test_indices[::2]
test_indices = valid_test_indices[1::2]

In [24]:
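# FRTP: contiguous (chronological) split that replaces the random split above
n_train = 1000
n_valid = 174
n_test = 174

train_indices = np.arange(0, n_train)
# Named valid_indices (not valid_test_indices) because the cells below use valid_indices
valid_indices = np.arange(n_train, n_train + n_valid)
test_indices = np.arange(n_train + n_valid, n_train + n_valid + n_test)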


In [25]:
# FRTP: Question: With the random split, don't the test and training data overlap? Only the start index of each window differs, so hasn't most of each lagged test sequence already been seen during training?
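
The overlap question can be checked numerically. The sketch below counts, for each of the two splits above, the fraction of test windows that share at least one time step with some training window. The helper names are hypothetical, and `n = 1400` is an assumption chosen so that `n - lags = 1348 = n_train + n_valid + n_test`.

```python
import numpy as np


def covered_rows(start_indices, lags, n_rows):
    """Boolean mask of the rows touched by any window [s, s + lags)."""
    mask = np.zeros(n_rows, dtype=bool)
    for s in start_indices:
        mask[s:s + lags] = True
    return mask


def fraction_overlapping(train_idx, test_idx, lags, n_rows):
    """Fraction of test windows sharing at least one row with a training window."""
    train_rows = covered_rows(train_idx, lags, n_rows)
    hits = [train_rows[s:s + lags].any() for s in test_idx]
    return float(np.mean(hits))


n, lags = 1400, 52  # n is assumed, see the text above
rng = np.random.default_rng(0)

# Random split: training start indices drawn at random, test = every other leftover
train_random = rng.choice(n - lags, size=1000, replace=False)
leftover = np.setdiff1d(np.arange(n - lags), train_random)
test_random = leftover[1::2]

# Contiguous split: train, then validation, then test in chronological order
train_contig = np.arange(0, 1000)
test_contig = np.arange(1174, 1348)

frac_random = fraction_overlapping(train_random, test_random, lags, n)
frac_contig = fraction_overlapping(train_contig, test_contig, lags, n)
print(frac_random, frac_contig)
```

With the contiguous split the training windows end at row 1050 and the test windows start at row 1174, so no test window overlaps the training data. With the random split, a test window starting at index `j` typically sits next to training windows starting at `j - 1` or `j + 1`, which share all but one time step with it.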

We preprocess the data with sklearn's MinMaxScaler, fitted only on the rows indexed by `train_indices`, and then generate input/output pairs for the training, validation, and test sets.

In [26]:
sc = MinMaxScaler()
sc = sc.fit(load_X[train_indices])
transformed_X = sc.transform(load_X)
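
As a rough illustration of what this scaling step does, the NumPy sketch below applies per-column min-max scaling with statistics taken from the training rows only, assuming the default target range of [0, 1]. The array `X` and the index array `train_rows` are toy stand-ins, not the SST data, and constant columns (zero range) would need special handling that this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))   # toy stand-in for load_X: (time, space)
train_rows = np.arange(0, 6)   # toy stand-in for train_indices

# "fit": per-column statistics from the training rows only
col_min = X[train_rows].min(axis=0)
col_max = X[train_rows].max(axis=0)
scale = col_max - col_min      # non-zero here because the toy data is random

# "transform": applied to every row, including held-out ones
X_scaled = (X - col_min) / scale

# "inverse_transform": recovers the original units
X_back = X_scaled * scale + col_min
```

Training rows land in [0, 1] by construction, while held-out rows can fall slightly outside that range because their extremes were not used to compute the statistics.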

We now organize the data so that the inputs have shape (batch_size, lags, num_sensors) and the corresponding outputs have shape (batch_size, state_dimension).

In [27]:
### Generate input sequences to a SHRED model
all_data_in = np.zeros((n - lags, lags, num_sensors))
for i in range(len(all_data_in)):
    all_data_in[i] = transformed_X[i:i+lags, sensor_locations]

### Generate training, validation, and test datasets, both for reconstruction of states and forecasting sensors
device = 'cuda' if torch.cuda.is_available() else 'cpu'

train_data_in = torch.tensor(all_data_in[train_indices], dtype=torch.float32).to(device)
valid_data_in = torch.tensor(all_data_in[valid_indices], dtype=torch.float32).to(device)
test_data_in = torch.tensor(all_data_in[test_indices], dtype=torch.float32).to(device)

### -1 to have output be at the same time as final sensor measurements
train_data_out = torch.tensor(transformed_X[train_indices + lags - 1], dtype=torch.float32).to(device)
valid_data_out = torch.tensor(transformed_X[valid_indices + lags - 1], dtype=torch.float32).to(device)
test_data_out = torch.tensor(transformed_X[test_indices + lags - 1], dtype=torch.float32).to(device)

train_dataset = TimeSeriesDataset(train_data_in, train_data_out)
valid_dataset = TimeSeriesDataset(valid_data_in, valid_data_out)
test_dataset = TimeSeriesDataset(test_data_in, test_data_out)
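
The windowing loop above can also be written without an explicit Python loop. The sketch below uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) on toy data and checks that it produces the same windows as the loop. The sizes and the random array are placeholders, not the SST data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n, m, lags = 30, 7, 5                       # toy sizes
rng = np.random.default_rng(0)
transformed_X = rng.random((n, m))          # toy stand-in for the scaled data
sensor_locations = np.array([1, 4, 6])

# Loop version, as in the cell above
loop_windows = np.zeros((n - lags, lags, len(sensor_locations)))
for i in range(n - lags):
    loop_windows[i] = transformed_X[i:i + lags, sensor_locations]

# Vectorized version
sensors = transformed_X[:, sensor_locations]        # (n, num_sensors)
view = sliding_window_view(sensors, lags, axis=0)   # (n - lags + 1, num_sensors, lags)
fast_windows = view[: n - lags].transpose(0, 2, 1)  # (n - lags, lags, num_sensors)

# Targets: the full state at the time of the last measurement in each window
targets = transformed_X[np.arange(n - lags) + lags - 1]
```

The last time step of every window lines up with its target, which is what the `+ lags - 1` offset in the cell above achieves.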

We train the model using the training and validation datasets.

In [28]:
shred = models.SHRED(num_sensors, m, hidden_size=64, hidden_layers=2, l1=350, l2=400, dropout=0.1).to(device)
validation_errors = models.fit(shred, train_dataset, valid_dataset, batch_size=64, num_epochs=1000, lr=1e-3, verbose=True, patience=5)

Training epoch 1
Error tensor(0.4682)
Training epoch 20
Error tensor(0.2172)
Training epoch 40
Error tensor(0.2107)
Training epoch 60
Error tensor(0.2098)
Training epoch 80
Error tensor(0.2094)
Training epoch 100
Error tensor(0.2089)
Training epoch 120
Error tensor(0.2087)
Training epoch 140
Error tensor(0.2084)
Training epoch 160
Error tensor(0.2065)
Training epoch 180
Error tensor(0.2019)
Training epoch 200
Error tensor(0.1981)
Training epoch 220
Error tensor(0.1963)
Training epoch 240
Error tensor(0.1890)
Training epoch 260
Error tensor(0.1858)
Training epoch 280
Error tensor(0.1791)
Training epoch 300
Error tensor(0.1789)
Training epoch 320
Error tensor(0.1741)
Training epoch 340
Error tensor(0.1739)
Training epoch 360
Error tensor(0.1705)
Training epoch 380
Error tensor(0.1674)
Training epoch 400
Error tensor(0.1661)
Training epoch 420
Error tensor(0.1648)
Training epoch 440
Error tensor(0.1663)
Training epoch 460
Error tensor(0.1654)
Training epoch 480
Error tensor(0.1638)
Traini
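
The log above reports a validation error every 20 epochs, and `models.fit` is called with `patience=5`. How `models.fit` uses that argument is not shown in this notebook. A common pattern, sketched here purely as an assumption, is to stop once the validation error has failed to improve for `patience` consecutive checks. The function name and the error values below are hypothetical.

```python
def stop_with_patience(val_errors, patience):
    """Return (index of the check where training stops, best error seen)."""
    best = float("inf")
    bad_checks = 0
    for step, err in enumerate(val_errors):
        if err < best:
            best = err        # new best: reset the counter
            bad_checks = 0
        else:
            bad_checks += 1   # no improvement at this check
        if bad_checks == patience:
            return step, best
    return len(val_errors) - 1, best


errors = [0.47, 0.22, 0.21, 0.215, 0.212, 0.22]  # made-up validation errors
stop, best = stop_with_patience(errors, patience=3)
print(stop, best)  # 5 0.21
```

In a real training loop the model parameters from the best check would typically be saved and restored when training stops.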

Finally, we generate reconstructions from the test set and print the relative reconstruction error: the norm of the difference between reconstruction and ground truth, divided by the norm of the ground truth.

In [29]:
test_recons = sc.inverse_transform(shred(test_dataset.X).detach().cpu().numpy())
test_ground_truth = sc.inverse_transform(test_dataset.Y.detach().cpu().numpy())
print('Test Reconstruction Error: ')
print(np.linalg.norm(test_recons - test_ground_truth) / np.linalg.norm(test_ground_truth))

Test Reconstruction Error: 
0.04866559
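
The number printed by this cell is a relative error in the Frobenius norm, which differs from a mean squared error in both scale and units. A small sketch on made-up arrays shows the difference. The function names are illustrative.

```python
import numpy as np


def relative_error(recon, truth):
    """||recon - truth||_F / ||truth||_F, the quantity printed in the cell above."""
    return np.linalg.norm(recon - truth) / np.linalg.norm(truth)


def mean_squared_error(recon, truth):
    """Average of the squared entry-wise differences."""
    return np.mean((recon - truth) ** 2)


truth = np.array([[3.0, 4.0], [0.0, 0.0]])   # Frobenius norm 5
recon = np.array([[3.0, 4.0], [0.5, 0.0]])   # one entry off by 0.5

rel = relative_error(recon, truth)       # 0.5 / 5 = 0.1
mse = mean_squared_error(recon, truth)   # 0.25 / 4 = 0.0625
print(rel, mse)
```

The relative error is dimensionless and independent of the number of grid points, which makes it easier to compare across datasets than a raw mean squared error.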


In [None]:
# Default from the paper (random split, overlapping windows):
# Test Reconstruction Error:
# 0.0193155

# When train and test are clearly separated (contiguous split):
# Test Reconstruction Error:
# 0.04866559