<center>
    
    Vector Output of LSTMs
    
    Author: Daniel Coble
    
    Status: Work in Progress
</center>

An LSTM's complexity is determined by its `units`: the size of the cell state and the hidden state. An LSTM cell with more units is expected to recognize more complex patterns. But because the output of the LSTM is its hidden state, the output also has size `units`, so it often has a higher dimension than we need. In fact, we often want only a scalar from the LSTM. The common solution is to add a dense layer on top of the model. This works fine, but I suspect a better way to extract a scalar from the vector output of an LSTM would be to simply take the first element of its output.

At first this may seem like it wouldn't work, but remember that the first element of the output is related to every other element through time: through the recurrent weights, every element of the hidden state at one step feeds into every element at the next step. This would seem to let the LSTM find features that are relevant but have a nonlinear relationship with the prediction, whereas with a dense top each feature must have a linear relationship with the prediction.


In this notebook, I'll train LSTM models on the same dataset used in the "Training an LSTM" notebook: a time series sine wave. I'll train the models to predict the frequency and also the next element. I'll train three models, which vary only in the top layer. The first model uses the usual dense top, the second takes only the first element of the output vector, and the third takes the norm of the output vector.
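
To make the three tops concrete, here is a minimal NumPy sketch of what each one computes from a block of hidden states. The array `h` is random data standing in for LSTM output, and the dense weights `w` and `b` are random placeholders rather than trained values, so only the shapes and value ranges are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for LSTM hidden states with shape (batch, time, units).
# These are random numbers, not the output of a real LSTM.
batch, time_steps, units = 4, 10, 15
h = rng.normal(size=(batch, time_steps, units))

# 1) Dense top: a linear combination of all units (placeholder weights).
w = rng.normal(size=(units, 1))
b = np.zeros(1)
dense_out = h @ w + b                                  # (batch, time, 1)

# 2) First-element top: keep only unit 0 of the hidden state.
first_out = h[..., :1]                                 # (batch, time, 1)

# 3) Norm top: Euclidean norm of each hidden state vector.
norm_out = np.linalg.norm(h, axis=-1, keepdims=True)   # (batch, time, 1)

print(dense_out.shape, first_out.shape, norm_out.shape)
```

One consequence worth keeping in mind: with the default `tanh` and `sigmoid` activations, every element of a real LSTM hidden state lies strictly between -1 and 1, so the first-element top is confined to that range and the norm top is non-negative and below `sqrt(units)`. Targets outside those ranges, such as sine values with amplitude up to 10, can only be reached if the output is rescaled.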

**TensorFlow 2.5.0 \
Numpy 1.19.5**

In [5]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.keras as keras
from sklearn.metrics import mean_squared_error

def generate_time_series(batch_size, n_steps, y_type = 'period'):
    T = np.random.rand(1, batch_size, 1) * 8 + 2
    phase = np.random.rand(1, batch_size, 1)*2*np.pi
    A = np.random.rand(1, batch_size, 1)*9.8 + .2
    time = np.linspace(0, n_steps, n_steps)
    series = A * np.sin((time - phase)*2*np.pi/T)
    series += 0.1 * (np.random.rand(1, batch_size, n_steps) - .5)
    rtrn = np.expand_dims(np.squeeze(series.astype(np.float32)), axis=2)
    if(y_type == 'amplitude'):
        return rtrn, A.flatten()
    if(y_type == 'frequency'):
        return rtrn, 1/T.flatten()
    if(y_type == 'next_element'):
        # rtrn has shape (batch, time, 1), so split along the time axis (axis 1)
        return rtrn[:, :-1], rtrn[:, -1]
    return rtrn, T.flatten()
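
The shape handling in the generator can be sketched on its own. This is an illustrative variant, not the function above: it assumes parameters shaped `(batch, 1)` instead of `(1, batch, 1)` followed by a squeeze, and it uses a local `rng` instead of the global NumPy seed. The value ranges are copied from the function.

```python
import numpy as np

rng = np.random.default_rng(42)
batch_size, n_steps = 3, 76

# One period, phase, and amplitude per series, shaped (batch, 1)
# so that they broadcast across the time axis.
T = rng.random((batch_size, 1)) * 8 + 2             # period in [2, 10)
phase = rng.random((batch_size, 1)) * 2 * np.pi     # time shift in [0, 2*pi)
A = rng.random((batch_size, 1)) * 9.8 + 0.2         # amplitude in [0.2, 10)

time = np.linspace(0, n_steps, n_steps)             # (n_steps,)
series = A * np.sin((time - phase) * 2 * np.pi / T)  # (batch, n_steps)
series += 0.1 * (rng.random((batch_size, n_steps)) - 0.5)  # noise in [-0.05, 0.05)

# Add the trailing feature axis that Keras recurrent layers expect.
X = series.astype(np.float32)[..., np.newaxis]      # (batch, n_steps, 1)

# Next-element split: every step but the last is input, the last is the target.
X_in, y_next = X[:, :-1], X[:, -1]
print(X.shape, X_in.shape, y_next.shape)  # (3, 76, 1) (3, 75, 1) (3, 1)
```

Building the arrays as `(batch, 1)` avoids the `np.squeeze` call, which would also remove the batch axis when `batch_size` is 1.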

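Next, make custom `keras` layers `TakeFirstElement` and `NormVector`. Each layer declares a learnable `scaler` weight intended to scale its output, although the scaling line is currently commented out in `call`, so for now the layers return the unscaled value.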

In [22]:
class TakeFirstElement(keras.layers.Layer):
    
    def __init__(self):
        super(TakeFirstElement, self).__init__()
        self.scaler = self.add_weight(
            shape=(1,),
            name='scaler',
            trainable=True
        )
    
    def compute_output_shape(self, input_shape):
        return [1,1]
        
    
    def call(self, x):
        # return tf.matmul(self.scaler, x[:,0])
        return x[:,0]
        
class NormVector(keras.layers.Layer):
    
    def __init__(self):
        super(NormVector, self).__init__()
        self.scaler = self.add_weight(
            shape=(1,1),
            name='scaler',
            trainable=True
        )
    
    def compute_output_shape(self, input_shape):
        return [1,1]
    
    def call(self, x):
        # return tf.matmul(self.scaler, tf.norm(x, axis=1))
        return tf.norm(x, axis=1)

Now we can train the models for predicting frequency.

In [23]:
dense_top = keras.layers.TimeDistributed(keras.layers.Dense(1))
element_top = keras.layers.TimeDistributed(TakeFirstElement())
norm_top = keras.layers.TimeDistributed(NormVector())
tops = [dense_top, norm_top, element_top]

freq_rmse = [0,0,0] # RMSE for each model, in the order of `tops`: dense, norm, first element
np.random.seed(42)
n_steps = 75
X, y = generate_time_series(10000, n_steps + 1, y_type='frequency')
X_train = X[:7000]; y_train = y[:7000]
X_test = X[7000:]; y_test = y[7000:]

for i in range(3):
    top = tops[i]
    model = keras.Sequential((
        keras.layers.LSTM(15, return_sequences=True, input_shape=[None, 1]),
        keras.layers.LSTM(15, return_sequences=True),
        top
    ))
    model.compile(
        loss="mse",
        optimizer="adam",
    )
    model.fit(X_train, y_train, epochs=3)
    pred = model.predict(X_test)[:,-1].flatten()
    rmse = mean_squared_error(y_test, pred, squared=False)
    freq_rmse[i] = rmse
    

Epoch 1/3
Epoch 2/3
Epoch 3/3
Epoch 1/3


InvalidArgumentError:  Input to reshape is a tensor with 76 values, but the requested shape has 2432
	 [[node gradient_tape/sequential_20/time_distributed_27/norm_vector_8/norm/Reshape (defined at \AppData\Local\Temp\ipykernel_16208\579004451.py:24) ]] [Op:__inference_train_function_107161]

Function call stack:
train_function
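
The dense model finishes its three epochs and the failure occurs in the first epoch of the second model, which is the `NormVector` top given the order of `tops`. The number in the message is suggestive: 2432 = 32 × 76, which is the default Keras batch size of 32 times the 76 time steps in `X`. The exact failing op has not been traced here, so the following is a hypothesis and not a confirmed diagnosis: `call` returns a tensor with the feature axis removed, while `compute_output_shape` reports a fixed `[1, 1]`, so the shapes that the wrapper and the gradient code expect may not match the shapes actually produced. The NumPy sketch below only illustrates that shape difference; the fold and unfold steps are a simplification and not the actual `TimeDistributed` code.

```python
import numpy as np

batch, time_steps, units = 32, 76, 15
h = np.ones((batch, time_steps, units), dtype=np.float32)

# Think of a per-time-step wrapper as folding time into the batch axis,
# applying the inner layer, and unfolding again (a simplification).
flat = h.reshape(-1, units)                               # (2432, 15)

# Reducing over axis 1 without keepdims drops the feature axis entirely.
norm_dropped = np.linalg.norm(flat, axis=1)               # (2432,)

# Keeping the reduced axis leaves one feature per time step.
norm_kept = np.linalg.norm(flat, axis=1, keepdims=True)   # (2432, 1)
first_kept = flat[:, :1]                                  # (2432, 1)

# Unfold back to (batch, time, 1), the shape a Dense(1) top would produce.
out = norm_kept.reshape(batch, time_steps, 1)

print(flat.shape[0], norm_dropped.shape, norm_kept.shape, out.shape)
# 2432 (2432,) (2432, 1) (32, 76, 1)
```

In the Keras layers, the analogous changes would be returning `x[:, :1]` and `tf.norm(x, axis=1, keepdims=True)`, and having `compute_output_shape` return `(input_shape[0], 1)`. Whether that resolves this particular error is untested.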
