## Future Work

For the best results, the LSTM's hyperparameters should be optimized: the number of neurons, the number of layers, the learning rate, the batch size, and the number of epochs. However, even randomized-search hyperparameter tuning is too computationally taxing and time-intensive for my computer, so in the future I will perform full hyperparameter tuning to better optimize the model.

To reach robust performance with non-parametric models, their hyperparameters must be optimized. Default hyperparameter settings cannot guarantee optimal performance of machine-learning techniques, and additional attention should be directed to this critical step (Schratz et al., 2019). To do this, one should design the search to objectively evaluate different values for the model's hyperparameters and choose the subset that yields the best-performing model on a given dataset (Brownlee 2020). Two common implementations of hyperparameter tuning are "grid search" and "randomized search." Grid search evaluates every combination in a predefined grid, so it is guaranteed to find the best combination within that grid, whereas randomized search evaluates only a fixed number of randomly sampled combinations and is therefore far less demanding in time and processing power (Brownlee 2020). For an LSTM, there are several hyperparameters to optimize.
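To make the cost difference concrete, the sketch below counts candidate settings with scikit-learn's `ParameterGrid` and `ParameterSampler`, the helpers used internally by grid search and randomized search, respectively. The search space (three values for each of four hyperparameters), `n_iter=10`, and 3-fold cross-validation are illustrative assumptions; each "model fit" stands for one full LSTM training run.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space: three candidate values for each of four hyperparameters
space = {
    'neurons': [32, 64, 128],
    'layers': [1, 2, 3],
    'learning_rate': [0.001, 0.01, 0.1],
    'batch_size': [16, 32, 64],
}
cv_folds = 3  # every candidate is trained once per cross-validation fold

# Grid search: every combination in the grid is evaluated
grid_candidates = list(ParameterGrid(space))
# Randomized search: only n_iter combinations are sampled
# (without replacement, because every value is given as a list)
random_candidates = list(ParameterSampler(space, n_iter=10, random_state=1))

print(f"grid search:       {len(grid_candidates)} candidates, {len(grid_candidates) * cv_folds} model fits")
print(f"randomized search: {len(random_candidates)} candidates, {len(random_candidates) * cv_folds} model fits")
# grid search:       81 candidates, 243 model fits
# randomized search: 10 candidates, 30 model fits
```

Even with only three values per hyperparameter, grid search trains eight times as many models as this randomized search, which is why the randomized variant is the cheaper starting point.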

**Number of LSTM Neurons:** Increasing the number of neurons (units) per hidden layer increases the model's capacity, that is, its number of trainable weights. This can improve accuracy, provided the model does not overfit the data.
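As a rough sketch of how quickly capacity grows with the number of neurons, the function below applies the standard parameter-count formula for an LSTM layer with biases: four gates, each with an input kernel, a recurrent kernel, and a bias vector. The single input feature per timestep is a hypothetical value; substitute the real feature count of the padded sequences.

```python
def lstm_param_count(units, input_dim):
    # 4 gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias vector (units)
    return 4 * (units * (input_dim + units) + units)

input_dim = 1  # hypothetical: one feature per timestep
for units in (32, 64, 128):
    print(f"{units} neurons -> {lstm_param_count(units, input_dim)} trainable weights")
# 32 neurons -> 4352 trainable weights
# 64 neurons -> 16896 trainable weights
# 128 neurons -> 66560 trainable weights
```

Because the recurrent kernel is `units x units`, doubling the neurons roughly quadruples the weights, which is why larger layers overfit more easily on small datasets.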

**Number of Layers:** Increasing the number of hidden layers makes the model "deeper." Beyond a few hidden layers, the model becomes more of a "black box," and it becomes difficult to draw meaningful conclusions about the relationships it has learned. As with neurons, additional layers can increase accuracy, provided the model does not overfit the data.
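Stacking layers adds weights even faster, because every LSTM after the first takes the previous layer's hidden state (of size `neurons`) as its input rather than the raw features. The sketch below assumes 64 neurons per layer, a hypothetical single input feature, and a `Dense(1)` output layer, and uses the standard LSTM parameter-count formula.

```python
def stacked_lstm_param_count(units, layers, input_dim):
    total = 0
    layer_input = input_dim
    for _ in range(layers):
        # 4 gates x (input kernel + recurrent kernel + bias) for this layer
        total += 4 * (units * (layer_input + units) + units)
        # Every layer after the first reads the previous layer's hidden states
        layer_input = units
    # Dense(1) output layer: one weight per unit plus a bias
    return total + units + 1

for layers in (1, 2, 3):
    print(f"{layers} layer(s): {stacked_lstm_param_count(64, layers, input_dim=1)} trainable weights")
# 1 layer(s): 16961 trainable weights
# 2 layer(s): 49985 trainable weights
# 3 layer(s): 83009 trainable weights
```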

**Learning Rate (alpha):** The learning rate is a tuning parameter of gradient-based optimization algorithms such as gradient descent and the Adam optimizer used here. It determines the step size taken at each iteration while moving toward a minimum of the loss function. A higher learning rate can speed up training, but if it is too high, the updates overshoot the minimum and the loss oscillates or diverges instead of converging; if it is too low, convergence is slow.
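The effect of the step size can be seen on a toy problem. This sketch runs plain gradient descent (not Adam) on the hypothetical loss f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0. Each update multiplies w by (1 - 2 * alpha), so the iterates converge only when 0 < alpha < 1.

```python
def gradient_descent(learning_rate, w0=5.0, steps=50):
    # Minimize the toy loss f(w) = w**2 starting from w0
    w = w0
    for _ in range(steps):
        gradient = 2 * w               # derivative of w**2
        w -= learning_rate * gradient  # step against the gradient
    return w

for lr in (0.01, 0.1, 0.45, 1.1):
    print(f"learning_rate={lr}: w after 50 steps = {gradient_descent(lr):.4g}")
```

With `0.01` the weight is still far from the minimum after 50 steps, `0.1` and `0.45` converge, and `1.1` overshoots on every step, so the weight grows without bound.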

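**Batch Size:** The batch size is the number of training samples processed before the model's weights are updated, so one epoch consists of roughly (number of training samples divided by batch size) weight updates. Smaller batches yield more updates per epoch and can converge faster, but their gradient estimates are noisier, which makes training less stable. Larger batches give smoother, more stable updates but require more memory. Experimenting with different batch sizes helps find the optimal balance.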
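The sketch below illustrates both sides of this trade-off for a hypothetical training set of 1,000 samples: the number of weight updates per epoch (the last batch may be smaller than the rest), and how much the mean gradient of a random mini-batch fluctuates. The per-sample gradients are simulated from a normal distribution, not taken from this model.

```python
import math

import numpy as np

n_train = 1000  # hypothetical number of training samples
rng = np.random.default_rng(0)
# Simulated per-sample gradients for a single weight: true mean 1.0, noise std 1.0
per_sample_grads = rng.normal(loc=1.0, scale=1.0, size=n_train)

results = {}
for batch_size in (16, 32, 64):
    updates_per_epoch = math.ceil(n_train / batch_size)
    # Draw many random mini-batches and measure how much their mean gradient varies
    batch_means = [rng.choice(per_sample_grads, size=batch_size, replace=False).mean()
                   for _ in range(2000)]
    results[batch_size] = (updates_per_epoch, float(np.std(batch_means)))
    print(f"batch_size={batch_size}: {updates_per_epoch} updates/epoch, "
          f"gradient std ~ {results[batch_size][1]:.3f}")
```

Smaller batches give more (but noisier) updates per epoch; the spread of the batch gradient shrinks roughly with the square root of the batch size.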

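**Number of Epochs (Not Tuned):** An epoch is one complete pass through the training data, during which every batch goes through one forward and one backward propagation. Too few epochs lead to underfitting; too many lead to overfitting. Rather than being tuned, the number of epochs is only set as an upper bound. Ideally, training stops on its own once the model reaches its minimum validation MSE: early stopping halts training after the validation loss has failed to improve for a set number of epochs (the patience) and can restore the weights from the best epoch.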
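The stopping rule can be sketched in a few lines. This is a simplified re-implementation of patience-based early stopping in the spirit of Keras's `EarlyStopping` (with no minimum improvement threshold), not Keras's own code, and the validation losses are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience, max_epochs):
    """Return (epoch training stops at, epoch with the lowest validation loss), 1-indexed."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # No early stop: training ran until the epoch upper bound (or the losses ran out)
    return min(len(val_losses), max_epochs), best_epoch

val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.59]  # hypothetical validation MSE per epoch
stopped_at, best_epoch = early_stopping_epoch(val_losses, patience=3, max_epochs=100)
print(f"stopped after epoch {stopped_at}; best weights from epoch {best_epoch}")
# stopped after epoch 7; best weights from epoch 4
```

Here the upper bound of 100 epochs is never reached: training stops three epochs after the best validation loss, and with `restore_best_weights=True` Keras would roll the weights back to those from epoch 4.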

In [None]:
# Split the data into training (80%) and testing (20%) sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_dict['upstream_padding'], y, test_size=0.2, random_state=1, shuffle=True)
train_test['upstream_padding_tuned'] = {'X_train': X_train, 'X_test': X_test, 'y_train': y_train, 'y_test': y_test}

In [None]:
# Define the random search space; epochs is not tuned (it is only an upper bound)
param_distributions = {
    'neurons': [32, 64, 128],
    'layers': [1, 2, 3],
    'learning_rate': [0.001, 0.01, 0.1],
    'batch_size': [16, 32, 64],
}

# Upper bound on training epochs; EarlyStopping can end training sooner
epochs = 100

In [None]:
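from keras.callbacks import Callback, EarlyStopping
from keras.layers import LSTM, Dense, Input
from keras.models import Sequential
from keras.optimizers import Adam
from scikeras.wrappers import KerasRegressor
from sklearn.model_selection import RandomizedSearchCV

class LossHistory(Callback):
    """Print the training loss after each epoch and track the lowest loss seen so far."""

    def __init__(self):
        super().__init__()
        self.lowest_loss = float('inf')

    def on_epoch_end(self, epoch, logs=None):
        current_loss = (logs or {}).get('loss')
        if current_loss is not None and current_loss < self.lowest_loss:
            self.lowest_loss = current_loss
        print(f"Epoch {epoch+1}: Current Loss = {current_loss}, Lowest Loss = {self.lowest_loss}")

# Shape of one sample (timesteps, features), taken from the tuned training split
input_shape = train_test['upstream_padding_tuned']['X_train'].shape[1:]

# Function to create model
def create_model(neurons=64, layers=1, learning_rate=0.001):
    model = Sequential()
    model.add(Input(shape=input_shape))
    for i in range(layers):
        # All but the last LSTM return full sequences so the next LSTM receives 3D input;
        # the last LSTM returns only its final hidden state for the Dense output layer
        model.add(LSTM(neurons, return_sequences=(i < layers - 1)))
    model.add(Dense(1, activation='linear'))
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss='mean_squared_error')
    return model

# EarlyStopping monitors 'val_loss', which only exists when validation data is provided (validation_split below)
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
loss_history = LossHistory()

# Create KerasRegressor for RandomizedSearchCV: neurons, layers, and learning_rate are routed to
# create_model, while epochs (upper bound), batch_size, validation_split, and callbacks go to fit
models['upstream_padding_tuned'] = KerasRegressor(
    model=create_model, neurons=32, layers=1, learning_rate=0.001,
    epochs=epochs, batch_size=32, validation_split=0.2,
    callbacks=[early_stopping, loss_history], verbose=0)

# Randomized search: evaluate n_iter sampled combinations with 3-fold cross-validation
grid = RandomizedSearchCV(estimator=models['upstream_padding_tuned'], param_distributions=param_distributions,
                          n_iter=10, cv=3, n_jobs=-1, random_state=1)
grid_result = grid.fit(train_test['upstream_padding_tuned']['X_train'], train_test['upstream_padding_tuned']['y_train'])

# Summarize results (KerasRegressor scores with R^2, so best_score_ is the best mean cross-validated R^2)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))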