<a id='ExperimentalDataTransferLearningReplacingOutputLayerEnsembleModelsTop'></a>
# Fine-Tune Ensemble of LSTM Networks on Experimental Data

This notebook investigates whether transfer learning improves performance on the experimental dataset. It starts from a model pretrained on simulated data, removes that model's output layer, and adds 4 new layers that are trained on the experimental data.

- Architecture of the 4 added layers, informed by limited [hyperparameter tuning](ExperimentalDataTransferLearningReplacingOutputLayerModelTuner.ipynb#ExperimentalDataTransferLearningReplacingOutputLayerModelTunerTop) with Keras Tuner, is:
    - input layer - 500
    - hidden layer - 5
    - hidden layer - 200
    - output layer - 2 (softplus activation, to ensure variance predictions are positive)
- batch size = 16
- Number of base learners in ensemble = 10
- 3:1:1 train:validation:test dataset split
- 1500 epochs
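
The 3:1:1 ratio follows from applying the two splits in sequence: 20% of the data is held out for testing, then 25% of the remainder is held out for validation. A minimal sketch of that arithmetic (the helper name `split_fractions` is hypothetical and not part of `LSTMutils`):

```python
from fractions import Fraction

def split_fractions(test_split, validation_split):
    """Return (train, validation, test) fractions of the full dataset
    for a two-stage split: test first, then validation from the remainder."""
    test = test_split
    remainder = 1 - test_split
    validation = remainder * validation_split
    train = remainder - validation
    return train, validation, test

# test_split = 0.2 and validation_split = 0.25, written as exact fractions
train, val, test = split_fractions(Fraction(1, 5), Fraction(1, 4))
print(train, val, test)  # 3/5 1/5 1/5
```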

For each base learner, the model from the epoch with the lowest validation loss is saved and later used to make predictions on the previously unseen experimental test set.
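
Once all base learners are saved, their individual mean and variance predictions need to be merged into one prediction per sample. The sketch below shows the Gaussian-mixture combination commonly used for deep ensembles: the ensemble mean is the average of the learners' means, and the ensemble variance adds the spread of those means to the average predicted variance. Whether this project combines predictions in exactly this way is an assumption, and `combine_ensemble` is a hypothetical helper, not a function from `LSTMutils`.

```python
from statistics import fmean

def combine_ensemble(means, variances):
    """Combine per-learner Gaussian predictions for one target value
    into a single mean and variance (uniformly weighted mixture)."""
    mixture_mean = fmean(means)
    # E[variance + mean^2] - (ensemble mean)^2
    second_moment = fmean(v + m * m for m, v in zip(means, variances))
    mixture_variance = second_moment - mixture_mean ** 2
    return mixture_mean, mixture_variance

# two learners that agree on the variance but disagree on the mean
mean, variance = combine_ensemble([1.0, 3.0], [0.5, 0.5])
print(mean, variance)  # 2.0 1.5
```

The disagreement between the learners (means of 1.0 and 3.0) contributes 1.0 to the combined variance on top of the 0.5 each learner predicted.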

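The minimum achievable loss is ln(minimum_variance)/2 = -6.91 (for a minimum variance chosen to be 1e-6). It is reached when the predicted mean equals the target and the predicted variance sits at its minimum.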
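
The value of -6.91 can be checked with a per-sample Gaussian negative log-likelihood. This is a sketch that assumes the loss has the usual form with the constant term dropped and the variance clamped at its minimum. The actual `MeanVarianceLogLikelyhoodLoss` lives in `LSTMutils` and may apply the minimum variance differently.

```python
import math

MIN_VARIANCE = 1e-6  # variance floor quoted in the text

def gaussian_nll(y_true, mean, variance):
    """Negative log-likelihood of y_true under N(mean, variance),
    with the constant 0.5*ln(2*pi) term dropped (assumed form)."""
    variance = max(variance, MIN_VARIANCE)  # assumed clamping of the variance
    return 0.5 * math.log(variance) + 0.5 * (y_true - mean) ** 2 / variance

# perfect mean prediction with the smallest allowed variance
floor = gaussian_nll(0.3, 0.3, MIN_VARIANCE)
print(round(floor, 2))  # -6.91
```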

In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from LSTMutils import MeanVarianceLogLikelyhoodLoss
from sklearn.model_selection import train_test_split
import LSTMutils

# input parameters
NumEnsemble = 10
SequenceLength = 250
validation_split = 0.25
batch_size = 16
NumEpochs = 1500
test_split = 0.2

# set random seeds
np.random.seed(42)
tf.random.set_seed(42)

reconstructed_model = keras.models.load_model("../Models/SimulatedDataPretrainedModel",custom_objects={"MeanVarianceLogLikelyhoodLoss": MeanVarianceLogLikelyhoodLoss})
reconstructed_model.trainable = False

# read experimental dataset
ExperimentalData = LSTMutils.ExperimentalData(SequenceLength=SequenceLength)
unused, concentrations, df_data, unused = ExperimentalData.ReadData()

# split data into stratified train and test sets, size defined by the test_split variable
# the split is reproducible provided the data is in the same order, the same random_state is used,
# and the labels used for stratification always have the same type (str is used here)
df_train, df_test = train_test_split(df_data, test_size=test_split, train_size=1-test_split, random_state=42, shuffle=True, stratify=concentrations)

# split data into stratified train and validation sets, size defined by the validation_split variable
train_concentrations = df_train.iloc[:,0]
df_train, df_val = train_test_split(df_train, test_size=validation_split, train_size=1-validation_split, random_state=42, shuffle=True, stratify=train_concentrations)

# normalise time series data
df_norm_train, df_norm_test, df_norm_val = ExperimentalData.NormalizeData(df_train,df_test,df_val)

# train NumEnsemble base learners, minimizing the negative log-likelihood loss for mean and variance predictions
# implementation follows this work: doi.org/10.48550/arXiv.1612.01474
for i in range(NumEnsemble):
    
    # randomly shuffle data to achieve sufficient base learner diversity 
    df_norm_train = df_norm_train.sample(frac=1)
    df_norm_val = df_norm_val.sample(frac=1)
    
    # Define y as the last element in X, and ensure X and y are the correct shape
    X_train, y_train = ExperimentalData.Shape(df_norm_train)
    X_val, y_val = ExperimentalData.Shape(df_norm_val)
    
    print("model number " + str(i+1))
    model = keras.models.Sequential()
    
    # define network architecture, prepending the pretrained model without the output layer
    for layer in reconstructed_model.layers[:-1]:
        model.add(layer)
    # freeze the pretrained layers so that only the layers added below are trained
    for layer in model.layers:
        layer.trainable = False
    model.add(keras.layers.LSTM(500, return_sequences=True,stateful=False, name = "first_lstm_layer"))
    model.add(keras.layers.LSTM(5, return_sequences=True,stateful=False, name = "second_lstm_layer"))
    model.add(keras.layers.LSTM(200, return_sequences=True,stateful=False, name = "third_lstm_layer"))
    model.add(keras.layers.LSTM(2, activation='softplus',return_sequences=True,stateful=False, name = "output_layer"))
    
    # save the model at the epoch which gives the lowest loss on the validation dataset
    checkpoint_filepath = r"../Models/ExperimentalDataTransferLearningEnsembleModels/ExperimentalDataTransferLearningEnsembleModel" + str(i+1)
    model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_filepath,
        monitor='val_loss',
        mode='min',
        save_best_only=True)
        
    model.compile(optimizer="adam",loss = MeanVarianceLogLikelyhoodLoss)

    history = model.fit(X_train, y_train, batch_size=batch_size, validation_data=(X_val,y_val), epochs=NumEpochs, callbacks=[model_checkpoint_callback])

    # plot loss vs epochs
    Evaluation = LSTMutils.ModelTrainingEvaluation()
    Evaluation.PlotLossHistory(history)
    
    # load the best model (lowest validation loss) and evaluate it on the training set
    bestModel = keras.models.load_model(checkpoint_filepath, custom_objects={"MeanVarianceLogLikelyhoodLoss": MeanVarianceLogLikelyhoodLoss})
    bestModel.evaluate(X_train, y_train, batch_size=batch_size)