# **Block-Rocket**

The benchmark workbook testing CNN-ResBiGRU found that there is almost no performance improvement when using more than 1 ResBiGRU block. In fact, when using 3 ResBiGRU blocks, the performance on the validation data actually worsened. Therefore, in the experiments below, we use 1 ResBiGRU block unless stated otherwise.

## **Initialisation**

In [1]:
pip install sktime==0.31.0

Collecting sktime==0.31.0
  Downloading sktime-0.31.0-py3-none-any.whl.metadata (31 kB)
Collecting scikit-base<0.9.0,>=0.6.1 (from sktime==0.31.0)
  Downloading scikit_base-0.8.3-py3-none-any.whl.metadata (8.5 kB)
Downloading sktime-0.31.0-py3-none-any.whl (24.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.0/24.0 MB[0m [31m70.2 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading scikit_base-0.8.3-py3-none-any.whl (136 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m136.2/136.2 kB[0m [31m10.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: scikit-base, sktime
Successfully installed scikit-base-0.8.3 sktime-0.31.0


In [2]:
import os
import numpy as np
import pickle
import itertools
from tqdm import tqdm
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.decomposition import PCA
from sktime.transformations.panel.rocket import MiniRocketMultivariate
from keras.models import Model
from keras.losses import CategoricalCrossentropy
from keras.metrics import Accuracy
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.models import load_model
from tensorflow.keras.regularizers import l2
from keras.layers import Input, Layer, Concatenate, Lambda, Conv1D, MaxPool1D, ReLU, BatchNormalization, LayerNormalization, Dropout, Add, Dense, GlobalMaxPooling1D, Bidirectional, GRU

np.random.seed(123)

Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.



In [3]:
#You MUST run this command before reading in any data from Google Drive
from google.colab import files
from google.colab import drive
import pandas as pd
drive.mount('/content/drive', force_remount=True)
os.chdir('/content/drive/My Drive/Colab Notebooks/Thesis/experiments')

%run ../sys_configs.ipynb
%run ../plots.ipynb

Mounted at /content/drive


In [4]:
with open('../data/train.npy', 'rb') as f:
    x_train = np.load(f)
    y_train = np.load(f).astype(np.int64)
sz, dim = x_train.shape[1:]

with open('../data/val.npy', 'rb') as f:
    x_val = np.load(f)
    y_val = np.load(f).astype(np.int64)

with open('../data/test.npy', 'rb') as f:
    x_test = np.load(f)
    y_test = np.load(f).astype(np.int64)

classes = np.unique(y_train)

N_train = len(y_train)
N_val = len(y_val)
N_test = len(y_test)

In [5]:
# Convert the labels to tensors
train_labels_tf = tf.one_hot(y_train, 31, dtype=tf.float32)
val_labels_tf = tf.one_hot(y_val, 31, dtype=tf.float32)
test_labels_tf = tf.one_hot(y_test, 31, dtype=tf.float32)

In [6]:
C = len(set(y_train)) # Number of classes

#### **Prepare train & validation datasets**

We first produce the raw and first order differenced datasets.

In [7]:
# Convert the raw dataset to tensors
train_raw_tf = tf.convert_to_tensor(x_train[:, :-1, :], dtype=tf.float32)
val_raw_tf = tf.convert_to_tensor(x_val[:, :-1, :], dtype=tf.float32)
test_raw_tf = tf.convert_to_tensor(x_test[:, :-1, :], dtype=tf.float32)

In [8]:
# Compute the first order differenced dataset
x_train_diff = np.diff(x_train, axis = 1)
x_val_diff = np.diff(x_val, axis = 1)
x_test_diff = np.diff(x_test, axis = 1)

In [9]:
# Convert the first order differenced dataset to tensors
train_diff_tf = tf.convert_to_tensor(x_train_diff, dtype=tf.float32)
val_diff_tf = tf.convert_to_tensor(x_val_diff, dtype=tf.float32)
test_diff_tf = tf.convert_to_tensor(x_test_diff, dtype=tf.float32)

#### **Prepare Rocket transformation train & validation datasets**

We use the *sktime* MiniRocket implementation, which is the same implementation introduced in the MiniRocket paper.

In [10]:
# Transpose the train and validation data as the format needs to be N x D x (T - 1)
x_train_ = x_train[:, :-1, :].transpose((0, 2, 1))
x_val_ = x_val[:, :-1, :].transpose((0, 2, 1))
x_test_ = x_test[:, :-1, :].transpose((0, 2, 1))

x_train_diff_ = x_train_diff.transpose((0, 2, 1))
x_val_diff_ = x_val_diff.transpose((0, 2, 1))
x_test_diff_ = x_test_diff.transpose((0, 2, 1))

In [11]:
# Compute the MiniRocket transform and transform to tensors
minirocket_multi = MiniRocketMultivariate(num_kernels = 10000, max_dilations_per_kernel = 32)
minirocket_multi.fit(x_train_)

train_rocket_np = minirocket_multi.transform(x_train_)
val_rocket_np = minirocket_multi.transform(x_val_)
test_rocket_np = minirocket_multi.transform(x_test_)

train_rocket_tf = tf.convert_to_tensor(train_rocket_np, dtype = tf.float32)
val_rocket_tf = tf.convert_to_tensor(val_rocket_np, dtype = tf.float32)
test_rocket_tf = tf.convert_to_tensor(test_rocket_np, dtype = tf.float32)

train_rocket_diff_np = minirocket_multi.transform(x_train_diff_)
val_rocket_diff_np = minirocket_multi.transform(x_val_diff_)
test_rocket_diff_np = minirocket_multi.transform(x_test_diff_)

train_rocket_diff_tf = tf.convert_to_tensor(train_rocket_diff_np, dtype = tf.float32)
val_rocket_diff_tf = tf.convert_to_tensor(val_rocket_diff_np, dtype = tf.float32)
test_rocket_diff_tf = tf.convert_to_tensor(test_rocket_diff_np, dtype = tf.float32)

#### **Prepare PCA transformation train & validation datasets**

In [12]:
pca_rocket = PCA(n_components=128)
pca_rocket.fit(train_rocket_np)

pca_diff_rocket = PCA(n_components=128)
pca_diff_rocket.fit(train_rocket_diff_np)

In [13]:
train_pca_np = pca_rocket.transform(train_rocket_np)
val_pca_np = pca_rocket.transform(val_rocket_np)
test_pca_np = pca_rocket.transform(test_rocket_np)

train_pca_diff_np = pca_diff_rocket.transform(train_rocket_diff_np)
val_pca_diff_np = pca_diff_rocket.transform(val_rocket_diff_np)
test_pca_diff_np = pca_diff_rocket.transform(test_rocket_diff_np)

In [14]:
train_pca_tf = tf.convert_to_tensor(train_pca_np, dtype = tf.float32)
val_pca_tf = tf.convert_to_tensor(val_pca_np, dtype = tf.float32)
test_pca_tf = tf.convert_to_tensor(test_pca_np, dtype = tf.float32)

train_pca_diff_tf = tf.convert_to_tensor(train_pca_diff_np, dtype = tf.float32)
val_pca_diff_tf = tf.convert_to_tensor(val_pca_diff_np, dtype = tf.float32)
test_pca_diff_tf = tf.convert_to_tensor(test_pca_diff_np, dtype = tf.float32)

#### **Combine input datasets**

We now prepare different Tensorflow datasets for the experiments in this workbook.

**Dataset 1: Raw Time Series**

In [15]:
train_raw_ds = tf.data.Dataset.from_tensor_slices((train_raw_tf, train_labels_tf))
val_raw_ds = tf.data.Dataset.from_tensor_slices((val_raw_tf, val_labels_tf))
test_raw_ds = tf.data.Dataset.from_tensor_slices((test_raw_tf, test_labels_tf))

train_raw_ds = train_raw_ds.shuffle(500)

train_raw_ds = train_raw_ds.padded_batch(64)
val_raw_ds = val_raw_ds.padded_batch(64)
test_raw_ds = test_raw_ds.padded_batch(64)

**Dataset 2: Differenced time series**

In [16]:
train_diff_ds = tf.data.Dataset.from_tensor_slices((train_diff_tf, train_labels_tf))
val_diff_ds = tf.data.Dataset.from_tensor_slices((val_diff_tf, val_labels_tf))
test_diff_ds = tf.data.Dataset.from_tensor_slices((test_diff_tf, test_labels_tf))

train_diff_ds = train_diff_ds.shuffle(500)

train_diff_ds = train_diff_ds.padded_batch(64)
val_diff_ds = val_diff_ds.padded_batch(64)
test_diff_ds = test_diff_ds.padded_batch(64)

**Dataset 3: Raw + Differenced Time Series**

In [17]:
train_rd_ds = tf.data.Dataset.from_tensor_slices((train_raw_tf, train_diff_tf, train_labels_tf))
val_rd_ds = tf.data.Dataset.from_tensor_slices((val_raw_tf, val_diff_tf, val_labels_tf))
test_rd_ds = tf.data.Dataset.from_tensor_slices((test_raw_tf, test_diff_tf, test_labels_tf))

# Map function to process the dataset elements
def map_rd(raw, diff, label):
    return {"raw": raw, "diff": diff}, label

# Apply the mapping function
train_rd_ds = train_rd_ds.map(map_rd)
val_rd_ds = val_rd_ds.map(map_rd)
test_rd_ds = test_rd_ds.map(map_rd)

train_rd_ds = train_rd_ds.shuffle(500)

train_rd_ds = train_rd_ds.padded_batch(64)
val_rd_ds = val_rd_ds.padded_batch(64)
test_rd_ds = test_rd_ds.padded_batch(64)

**Dataset 4: Raw + Rocket PCA Transformation Time Series**

In [18]:
train_rr_ds = tf.data.Dataset.from_tensor_slices((train_raw_tf, train_pca_tf, train_labels_tf))
val_rr_ds = tf.data.Dataset.from_tensor_slices((val_raw_tf, val_pca_tf, val_labels_tf))
test_rr_ds = tf.data.Dataset.from_tensor_slices((test_raw_tf, test_pca_tf, test_labels_tf))

# Map function to process the dataset elements
def map_rr(raw, pca, label):
    return {"raw": raw, "pca": pca}, label

# Apply the mapping function
train_rr_ds = train_rr_ds.map(map_rr)
val_rr_ds = val_rr_ds.map(map_rr)
test_rr_ds = test_rr_ds.map(map_rr)

train_rr_ds = train_rr_ds.shuffle(500)

train_rr_ds = train_rr_ds.padded_batch(64)
val_rr_ds = val_rr_ds.padded_batch(64)
test_rr_ds = test_rr_ds.padded_batch(64)

**Dataset 5: Raw + Differenced + Rocket PCA Transformation Time Series**

In [19]:
train_all_ds = tf.data.Dataset.from_tensor_slices((train_raw_tf, train_diff_tf, train_pca_tf, train_pca_diff_tf, train_labels_tf))
val_all_ds = tf.data.Dataset.from_tensor_slices((val_raw_tf, val_diff_tf, val_pca_tf, val_pca_diff_tf, val_labels_tf))
test_all_ds = tf.data.Dataset.from_tensor_slices((test_raw_tf, test_diff_tf, test_pca_tf, test_pca_diff_tf, test_labels_tf))

# Map function to process the dataset elements
def map_all(raw, diff, pca, pca_diff, label):
    return {"raw": raw, "diff": diff, "pca": pca, "pca_diff": pca_diff}, label

# Apply the mapping function
train_all_ds = train_all_ds.map(map_all)
val_all_ds = val_all_ds.map(map_all)
test_all_ds = test_all_ds.map(map_all)

train_all_ds = train_all_ds.shuffle(500)

train_all_ds = train_all_ds.padded_batch(64)
val_all_ds = val_all_ds.padded_batch(64)
test_all_ds = test_all_ds.padded_batch(64)

## **Set up standard CNN-ResBiGRU blocks**

In [20]:
class ConvBlock(Layer):

    def __init__(self, num_filters, **kwargs):
        super().__init__(**kwargs)
        self.num_filters = num_filters

    def build(self, input_shape): # Tensorflow calls this method automatically when the object is defined
        self.conv = Conv1D(self.num_filters, kernel_size=10, strides = 1, padding="same")
        self.batch_norm = BatchNormalization()
        self.max_pool = MaxPool1D(pool_size=3, strides=1, padding="same")
        self.dropout = Dropout(0.25)

    def call(self, input):
        x = self.conv(input)
        x = self.batch_norm(x)
        x = self.max_pool(x)
        output = self.dropout(x)
        return output

In [21]:
class ResBiGRU(Layer):

    def __init__(self, h1_units, h2_units, **kwargs):
        super().__init__(**kwargs)
        self.h1_units = h1_units
        self.h2_units = h2_units

    def build(self, input_shape): # Tensorflow calls this method automatically when the object is defined
        self.gru_1 =  Bidirectional(GRU(self.h1_units, activation = None, return_sequences=True), merge_mode=None)
        self.gru_2a = GRU(self.h2_units, activation = None, return_sequences=True)
        self.gru_2b = GRU(self.h2_units, activation = None, return_sequences=True)
        self.layer_norm = LayerNormalization()

    def call(self, input):
        # In the first (hidden) RNN layer, apply the forward and backward GRU layers concurrently
        z_forward, z_backward = self.gru_1(input)

        # In the second (hidden) RNN layer, apply the forward and backward GRU layers separately
        z2_forward = self.gru_2a(z_forward)
        z2_backward = self.gru_2b(z_backward)

        # Add the output of the first RNN layer to the output of the second RNN layer
        z_forward = Add()([z_forward, z2_forward])
        z_backward = Add()([z_backward, z2_backward])

        z_forward = self.layer_norm(z_forward)
        z_backward = self.layer_norm(z_backward)

        output = tf.concat([z_forward, z_backward], axis = 2)
        return output

In [22]:
# We use early stopping as CNNResBiGRU is a high parameter network, so requires many epochs to train.
earlystopping = EarlyStopping(monitor='val_accuracy', patience=15)

# The previous benchmark paper also made use of ReduceLROnPlateau which reduces the lesrning rate when the loss plateaus.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=10e-5)

## **Experiment 1: ResBiGRU (raw vs differenced vs raw + differenced time series)**

The purpose of this experiment is to determine whether the first order differenced time series improves the validation performance of the ResBiGRU model.

To combine the two channels, which run parallel, concatenate them before applying a Dense layer at the end.

#### **Experiment 1.1: Raw time series**

This is the same model per the benchmark workbook on CNN-ResBiGRU. Also, we already applied it to raw time series in the benchmark workbook, but do so here again for easy comparison.

In [31]:
def CNNResBiGRU(shape):
    block1_input_layer = Input(shape=shape)

    layer = ConvBlock(num_filters = 32, name = "ConvBlock")(block1_input_layer)

    # The architecture extensively factors in the gradient vanishing problem.
    layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU1")(layer)

    layer = GlobalMaxPooling1D()(layer)
    output_layer = Dense(C, activation="softmax")(layer)
    return Model(inputs=block1_input_layer, outputs=output_layer)

In [33]:
CNNResBiGRU_model = CNNResBiGRU(shape = (sz - 1, dim))
CNNResBiGRU_model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, clipvalue = 1.0), loss='categorical_crossentropy', metrics=['accuracy'])
history = CNNResBiGRU_model.fit(train_raw_ds, validation_data=val_raw_ds, epochs=25, verbose = 1, callbacks = [earlystopping, reduce_lr])

Epoch 1/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m73s[0m 236ms/step - accuracy: 0.1727 - loss: 3.2107 - val_accuracy: 0.4411 - val_loss: 1.9323 - learning_rate: 0.0010
Epoch 2/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 124ms/step - accuracy: 0.5040 - loss: 1.7296 - val_accuracy: 0.5668 - val_loss: 1.3701 - learning_rate: 0.0010
Epoch 3/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 126ms/step - accuracy: 0.6261 - loss: 1.2656 - val_accuracy: 0.6609 - val_loss: 1.0744 - learning_rate: 0.0010
Epoch 4/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 130ms/step - accuracy: 0.7267 - loss: 0.9660 - val_accuracy: 0.6774 - val_loss: 0.9315 - learning_rate: 0.0010
Epoch 5/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 136ms/step - accuracy: 0.7844 - loss: 0.7610 - val_accuracy: 0.7011 - val_loss: 0.8597 - learning_rate: 0.0010
Epoch 6/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1

In [36]:
CNNResBiGRU_results1_val = CNNResBiGRU_model.evaluate(val_raw_ds, batch_size=128)
CNNResBiGRU_results1_test = CNNResBiGRU_model.evaluate(test_raw_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results1_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*CNNResBiGRU_results1_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 58ms/step - accuracy: 0.7286 - loss: 0.7559
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 32ms/step - accuracy: 0.6912 - loss: 1.0092
Validation Loss: 0.7010752558708191
Validation Accuracy: 0.7349137663841248
Test Loss: 1.0123274326324463
Test Accuracy: 0.6983824968338013


In [37]:
predictions1_val = CNNResBiGRU_model.predict(val_raw_tf, batch_size=128)
predictions1_test = CNNResBiGRU_model.predict(test_raw_tf, batch_size=128)

predictions1_val =tf.argmax(predictions1_val, axis = 1)
predictions1_test = tf.argmax(predictions1_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 191ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 136ms/step


In [38]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions1_val), sum(np.equal(predictions1_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions1_test), sum(np.equal(predictions1_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1023
Test Data - Total predictions made: 1051. Number of correct predictions: 734


In [None]:
CNNResBiGRU_model.save('../models/BlockRocketExperiments/Experiment_1_1.keras')
with open('../models/BlockRocketExperiments/train_history.pkl', 'wb') as f:
    pickle.dump(history.history, f)

#### **Experiment 1.2: Differenced time series**

**Experiment 1.2.1: One ResBiGRU block**

In [39]:
CNNResBiGRU_model2 = CNNResBiGRU(shape = (sz - 1, dim)) # (sz - 1) as the data is differenced
CNNResBiGRU_model2.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history2 = CNNResBiGRU_model2.fit(train_diff_ds, validation_data=val_diff_ds, epochs=25, verbose = 1, callbacks = [earlystopping, reduce_lr])

Epoch 1/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m26s[0m 220ms/step - accuracy: 0.0493 - loss: 3.8918 - val_accuracy: 0.0395 - val_loss: 3.3224 - learning_rate: 0.0010
Epoch 2/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 125ms/step - accuracy: 0.1635 - loss: 2.9758 - val_accuracy: 0.0790 - val_loss: 3.3825 - learning_rate: 0.0010
Epoch 3/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 119ms/step - accuracy: 0.3082 - loss: 2.5351 - val_accuracy: 0.1228 - val_loss: 3.0560 - learning_rate: 0.0010
Epoch 4/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 126ms/step - accuracy: 0.4515 - loss: 2.0351 - val_accuracy: 0.2421 - val_loss: 2.3863 - learning_rate: 0.0010
Epoch 5/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 122ms/step - accuracy: 0.5626 - loss: 1.5935 - val_accuracy: 0.3333 - val_loss: 2.1512 - learning_rate: 0.0010
Epoch 6/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m

In [40]:
CNNResBiGRU_results2_val = CNNResBiGRU_model2.evaluate(val_diff_ds, batch_size=128)
CNNResBiGRU_results2_test = CNNResBiGRU_model2.evaluate(test_diff_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results2_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*CNNResBiGRU_results2_test))


[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 31ms/step - accuracy: 0.5996 - loss: 1.1732
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 91ms/step - accuracy: 0.5711 - loss: 1.3135
Validation Loss: 1.128129005432129
Validation Accuracy: 0.618534505367279
Test Loss: 1.315980315208435
Test Accuracy: 0.5842055082321167


In [41]:
predictions2_val = CNNResBiGRU_model2.predict(val_diff_tf, batch_size=128)
predictions2_test = CNNResBiGRU_model2.predict(test_diff_tf, batch_size=128)

predictions2_val =tf.argmax(predictions2_val, axis = 1)
predictions2_test = tf.argmax(predictions2_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 159ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 95ms/step


In [42]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions2_val), sum(np.equal(predictions2_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions2_test), sum(np.equal(predictions2_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 861
Test Data - Total predictions made: 1051. Number of correct predictions: 614


In [None]:
CNNResBiGRU_model2.save('../models/BlockRocketExperiments/Experiment_1_2_1.keras')
with open('../models/BlockRocketExperiments/train_history2.pkl', 'wb') as f:
    pickle.dump(history2.history, f)

The model manages to pick up on a signal within the differenced time series, but the performance falls well short of that from the same model applied on the raw time series. Perhaps more ResBiGRU blocks are needed to extract information from the differenced time series.

**Experiment 1.2.2: Two ResBiGRU blocks**

This is identical to experiment 1.2.1 except for the addition of a second ResBiGRU block.

In [None]:
def CNNResBiGRU2(shape):
    block1_input_layer = Input(shape=shape)

    layer = ConvBlock(num_filters = 32, name = "ConvBlock")(block1_input_layer)

    # The architecture extensively factors in the gradient vanishing problem.
    layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU1")(layer)
    layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU2")(layer)

    layer = GlobalMaxPooling1D()(layer)
    output_layer = Dense(C, activation="softmax")(layer)
    return Model(inputs=block1_input_layer, outputs=output_layer)

In [None]:
CNNResBiGRU_model3 = CNNResBiGRU2(shape = (sz - 1, dim)) # (sz - 1) as the data is differenced
CNNResBiGRU_model3.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history3 = CNNResBiGRU_model3.fit(train_diff_ds, validation_data=val_diff_ds, epochs=25, verbose = 1)

Epoch 1/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m48s[0m 436ms/step - accuracy: 0.0595 - loss: 3.6771 - val_accuracy: 0.0316 - val_loss: 4.1204
Epoch 2/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 244ms/step - accuracy: 0.2151 - loss: 2.8445 - val_accuracy: 0.1149 - val_loss: 3.0403
Epoch 3/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 251ms/step - accuracy: 0.4704 - loss: 1.9549 - val_accuracy: 0.3161 - val_loss: 2.1713
Epoch 4/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 245ms/step - accuracy: 0.6684 - loss: 1.2453 - val_accuracy: 0.4612 - val_loss: 1.6708
Epoch 5/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 245ms/step - accuracy: 0.7932 - loss: 0.8285 - val_accuracy: 0.5395 - val_loss: 1.3715
Epoch 6/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 250ms/step - accuracy: 0.8569 - loss: 0.6015 - val_accuracy: 0.5496 - val_loss: 1.3895
Epoch 7/25
[1m68/68[

In [None]:
CNNResBiGRU_results3_val = CNNResBiGRU_model3.evaluate(val_diff_ds, batch_size=128)
CNNResBiGRU_results3_test = CNNResBiGRU_model3.evaluate(test_diff_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results3_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*CNNResBiGRU_results3_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 63ms/step - accuracy: 0.6916 - loss: 1.3214
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 134ms/step - accuracy: 0.6592 - loss: 1.4286
Validation Loss: 1.2244813442230225
Validation Accuracy: 0.6975574493408203
Test Loss: 1.3822401762008667
Test Accuracy: 0.6698382496833801


In [None]:
predictions3_val = CNNResBiGRU_model3.predict(val_diff_tf, batch_size=128)
predictions3_test = CNNResBiGRU_model3.predict(test_diff_tf, batch_size=128)

predictions3_val =tf.argmax(predictions3_val, axis = 1)
predictions3_test = tf.argmax(predictions3_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 329ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 198ms/step


In [None]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions3_val), sum(np.equal(predictions3_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions3_test), sum(np.equal(predictions3_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 971
Test Data - Total predictions made: 1051. Number of correct predictions: 704


The accuracy is very close to the accuracy on the raw (not differenced) time series, but some information was lost during the differencing process, resulting in an overall slightly poorer performance.

In [None]:
CNNResBiGRU_model3.save('../models/BlockRocketExperiments/Experiment_1_2_2.keras')
with open('../models/BlockRocketExperiments/train_history3.pkl', 'wb') as f:
    pickle.dump(history3.history, f)

#### **Experiment 1.3: Raw + differenced time series**

These experiments are intended to test whether using both the raw and first order differenced time series improves the performance of the resulting model. In experiment 1.2.2, it was observed that a second ResBiGRU block  the differenced time series

**Experiment 1.3.1: Combining after first Convolutional block**

In [43]:
def CNNResBiGRU3(shape_raw, shape_diff):
    block1_input_layer = Input(shape=shape_raw, name="raw")
    block2_input_layer = Input(shape=shape_diff, name="diff")

    layer_raw = ConvBlock(num_filters = 32, name = "ConvBlockRaw")(block1_input_layer)
    layer_diff = ConvBlock(num_filters = 32, name = "ConvBlockDiff")(block2_input_layer)

    # Concatenate the raw and differenced channels
    layer = Concatenate(axis = 2)([layer_raw, layer_diff])

    # The ResBIGRU blocks is repeated several times; the architecture extensively factors in the gradient vanishing problem.
    layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU1")(layer)
    layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU2")(layer)

    layer = GlobalMaxPooling1D()(layer)
    output_layer = Dense(C, activation="softmax")(layer)
    return Model(inputs=[block1_input_layer, block2_input_layer], outputs=output_layer)

In [44]:
CNNResBiGRU_model3 = CNNResBiGRU3(shape_raw = (sz - 1, dim), shape_diff = (sz - 1, dim)) # (sz - 1) as the data is differenced
CNNResBiGRU_model3.summary()

In [45]:
CNNResBiGRU_model4 = CNNResBiGRU3(shape_raw = (sz - 1, dim), shape_diff = (sz - 1, dim))
CNNResBiGRU_model4.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history4 = CNNResBiGRU_model4.fit(train_rd_ds, validation_data=val_rd_ds, epochs=25, verbose = 1, callbacks = [earlystopping, reduce_lr])

Epoch 1/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m50s[0m 426ms/step - accuracy: 0.1941 - loss: 3.2244 - val_accuracy: 0.4382 - val_loss: 1.8027 - learning_rate: 0.0010
Epoch 2/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 236ms/step - accuracy: 0.6309 - loss: 1.4091 - val_accuracy: 0.5740 - val_loss: 1.3011 - learning_rate: 0.0010
Epoch 3/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 237ms/step - accuracy: 0.7539 - loss: 0.9187 - val_accuracy: 0.6774 - val_loss: 1.0542 - learning_rate: 0.0010
Epoch 4/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 237ms/step - accuracy: 0.8439 - loss: 0.6243 - val_accuracy: 0.6983 - val_loss: 0.9013 - learning_rate: 0.0010
Epoch 5/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 236ms/step - accuracy: 0.9059 - loss: 0.4232 - val_accuracy: 0.7076 - val_loss: 0.8448 - learning_rate: 0.0010
Epoch 6/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m 

In [46]:
CNNResBiGRU_results4 = CNNResBiGRU_model4.evaluate(val_rd_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results4))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 48ms/step - accuracy: 0.7546 - loss: 0.8251
Validation Loss: 0.7880142331123352
Validation Accuracy: 0.7622126340866089


In [47]:
CNNResBiGRU_results4_val = CNNResBiGRU_model4.evaluate(val_rd_ds, batch_size=128)
CNNResBiGRU_results4_test = CNNResBiGRU_model4.evaluate(test_rd_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results4_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*CNNResBiGRU_results4_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 58ms/step - accuracy: 0.7546 - loss: 0.8251
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 139ms/step - accuracy: 0.7234 - loss: 1.0950
Validation Loss: 0.7880142331123352
Validation Accuracy: 0.7622126340866089
Test Loss: 1.0641939640045166
Test Accuracy: 0.7250238060951233


In [50]:
predictions4_val = CNNResBiGRU_model4.predict((val_diff_tf, val_raw_tf), batch_size=128)
predictions4_test = CNNResBiGRU_model4.predict((test_diff_tf, test_raw_tf), batch_size=128)

predictions4_val =tf.argmax(predictions4_val, axis = 1)
predictions4_test = tf.argmax(predictions4_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 56ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 43ms/step


In [51]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions4_val), sum(np.equal(predictions4_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions4_test), sum(np.equal(predictions4_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1061
Test Data - Total predictions made: 1051. Number of correct predictions: 762


The closer a classifier gets towards a perfect classification rate, the harder to incrementally improve the classifier. This approach beats using the raw time series alone, indicating there is complementary information in the differenced time series that does not exist in the raw time series.

In [52]:
CNNResBiGRU_model4.save('../models/BlockRocketExperiments/Experiment_1_3_1.keras')
with open('../models/BlockRocketExperiments/train_history4.pkl', 'wb') as f:
    pickle.dump(history4.history, f)

**Experiment 1.3.2: Combining after ResBiGRU block**

In [53]:
def CNNResBiGRU4(shape_raw, shape_diff):
    T_raw, dim_raw = shape_raw

    block1_input_layer = Input(shape=shape_raw, name="raw")
    block2_input_layer = Input(shape=shape_diff, name="diff")

    layer_raw = ConvBlock(num_filters = 32, name = "ConvBlockRaw")(block1_input_layer)
    layer_diff = ConvBlock(num_filters = 32, name = "ConvBlockDiff")(block2_input_layer)

    # The ResBiGRU block is applied once to the raw time series. This was observed to be optimal in the CNN-ResBiGRU workbook.
    layer_raw = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU1")(layer_raw)

    # The ResBIGRU block is applied twice to the differenced time series. This was observed to be optimal during experiments 1.2.1 and 1.2.2
    layer_diff = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU2")(layer_diff)
    layer_diff = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU3")(layer_diff)

    # Concatenate the raw and differenced channels
    layer = Concatenate(axis = 2)([layer_raw, layer_diff])

    layer = GlobalMaxPooling1D()(layer)
    output_layer = Dense(C, activation="softmax")(layer)
    return Model(inputs=[block1_input_layer, block2_input_layer], outputs=output_layer)

In [54]:
CNNResBiGRU_model5 = CNNResBiGRU4(shape_raw = (sz - 1, dim), shape_diff = (sz - 1, dim))
CNNResBiGRU_model5.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history5 = CNNResBiGRU_model5.fit(train_rd_ds, validation_data=val_rd_ds, epochs=25, verbose = 1)

Epoch 1/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m81s[0m 737ms/step - accuracy: 0.1915 - loss: 3.5210 - val_accuracy: 0.4483 - val_loss: 1.8381
Epoch 2/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m45s[0m 387ms/step - accuracy: 0.5585 - loss: 1.4810 - val_accuracy: 0.5769 - val_loss: 1.3425
Epoch 3/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 357ms/step - accuracy: 0.7304 - loss: 0.9589 - val_accuracy: 0.6674 - val_loss: 1.0245
Epoch 4/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m24s[0m 353ms/step - accuracy: 0.8244 - loss: 0.6521 - val_accuracy: 0.6976 - val_loss: 0.8164
Epoch 5/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 371ms/step - accuracy: 0.9055 - loss: 0.4289 - val_accuracy: 0.6968 - val_loss: 0.8025
Epoch 6/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 357ms/step - accuracy: 0.9375 - loss: 0.3107 - val_accuracy: 0.7205 - val_loss: 0.6888
Epoch 7/25
[1m68/68[

In [55]:
CNNResBiGRU_results5_val = CNNResBiGRU_model5.evaluate(val_rd_ds, batch_size=128)
CNNResBiGRU_results5_test = CNNResBiGRU_model5.evaluate(test_rd_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results5_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*CNNResBiGRU_results5_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 68ms/step - accuracy: 0.7108 - loss: 0.8613
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 210ms/step - accuracy: 0.7240 - loss: 0.9968
Validation Loss: 0.8295275568962097
Validation Accuracy: 0.7313218116760254
Test Loss: 0.9738443493843079
Test Accuracy: 0.7250238060951233


In [56]:
predictions5_val = CNNResBiGRU_model5.predict((val_diff_tf, val_raw_tf), batch_size=128)
predictions5_test = CNNResBiGRU_model5.predict((test_diff_tf, test_raw_tf), batch_size=128)

predictions5_val =tf.argmax(predictions5_val, axis = 1)
predictions5_test = tf.argmax(predictions5_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 438ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 286ms/step


In [57]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions5_val), sum(np.equal(predictions5_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions5_test), sum(np.equal(predictions5_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1018
Test Data - Total predictions made: 1051. Number of correct predictions: 762


This has the same performance as the model which combined the two streams after the first convolutional block. This may be because this current model has a greater number of parameters (an extra ResBiGRU block) and is unable to generalise to unseen validation data as effectively.

In [None]:
CNNResBiGRU_model5.save('../models/BlockRocketExperiments/Experiment_1_3_2.keras')
with open('../models/BlockRocketExperiments/train_history5.pkl', 'wb') as f:
    pickle.dump(history5.history, f)

## **Experiment 2: ResBiGRU (zero vs two preceding convolutional blocks)**

In this set of experiments, we do not need to test the case of exactly 1 preceding convolutional block as this was tested in experiment 1.1. We are interested in determining the effect changing the number of preceding convolutional blocks has on the generalisation ability of the model.

#### **Experiment 2.1: 0 preceding convolutional blocks**

In [59]:
def CNNResBiGRU5(shape):
    block1_input_layer = Input(shape=shape)

    # The architecture extensively factors in the gradient vanishing problem.
    layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU1")(block1_input_layer)

    layer = GlobalMaxPooling1D()(layer)
    output_layer = Dense(C, activation="softmax")(layer)
    return Model(inputs=block1_input_layer, outputs=output_layer)

In [61]:
CNNResBiGRU_model6 = CNNResBiGRU5(shape = (sz - 1, dim))
CNNResBiGRU_model6.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history6 = CNNResBiGRU_model6.fit(train_raw_ds, validation_data=val_raw_ds, epochs=25, verbose = 1, callbacks = [earlystopping, reduce_lr])

Epoch 1/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m23s[0m 199ms/step - accuracy: 0.1270 - loss: 3.3563 - val_accuracy: 0.4282 - val_loss: 1.9373 - learning_rate: 0.0010
Epoch 2/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 115ms/step - accuracy: 0.4899 - loss: 1.8049 - val_accuracy: 0.5812 - val_loss: 1.4188 - learning_rate: 0.0010
Epoch 3/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 107ms/step - accuracy: 0.6132 - loss: 1.3542 - val_accuracy: 0.6164 - val_loss: 1.1829 - learning_rate: 0.0010
Epoch 4/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 116ms/step - accuracy: 0.6879 - loss: 1.0885 - val_accuracy: 0.6444 - val_loss: 1.0282 - learning_rate: 0.0010
Epoch 5/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 113ms/step - accuracy: 0.7732 - loss: 0.8636 - val_accuracy: 0.6997 - val_loss: 0.9005 - learning_rate: 0.0010
Epoch 6/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1

In [62]:
CNNResBiGRU_results6 = CNNResBiGRU_model6.evaluate(val_raw_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results6))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 27ms/step - accuracy: 0.6919 - loss: 0.7848
Validation Loss: 0.7436817288398743
Validation Accuracy: 0.7140804529190063


The validation accuracy is lower than with a single convolutional block, indicating that the convolutional block was necessary for feature extraction for the ResBiGRU block.

#### **Experiment 2.2: 2 preceding convolutional blocks**

In [63]:
def CNNResBiGRU6(shape):
    block1_input_layer = Input(shape=shape)

    layer = ConvBlock(num_filters = 32, name = "ConvBlock")(block1_input_layer)
    layer = ConvBlock(num_filters = 32, name = "ConvBlock2")(layer)

    # The architecture extensively factors in the gradient vanishing problem.
    layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU1")(layer)

    layer = GlobalMaxPooling1D()(layer)
    output_layer = Dense(C, activation="softmax")(layer)
    return Model(inputs=block1_input_layer, outputs=output_layer)

In [64]:
CNNResBiGRU_model7 = CNNResBiGRU6(shape = (sz - 1, dim))
CNNResBiGRU_model7.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history7 = CNNResBiGRU_model7.fit(train_raw_ds, validation_data=val_raw_ds, epochs=25, verbose = 1)

Epoch 1/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m31s[0m 260ms/step - accuracy: 0.1737 - loss: 3.3602 - val_accuracy: 0.4267 - val_loss: 1.8814
Epoch 2/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 127ms/step - accuracy: 0.4715 - loss: 1.7292 - val_accuracy: 0.6207 - val_loss: 1.3576
Epoch 3/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 119ms/step - accuracy: 0.6030 - loss: 1.2930 - val_accuracy: 0.5876 - val_loss: 1.1713
Epoch 4/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 130ms/step - accuracy: 0.6723 - loss: 1.0553 - val_accuracy: 0.6602 - val_loss: 1.0314
Epoch 5/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 131ms/step - accuracy: 0.7555 - loss: 0.8387 - val_accuracy: 0.6638 - val_loss: 0.9517
Epoch 6/25
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 126ms/step - accuracy: 0.7988 - loss: 0.6940 - val_accuracy: 0.7033 - val_loss: 0.8520
Epoch 7/25
[1m68/68[0m

In [67]:
CNNResBiGRU_results7_val = CNNResBiGRU_model7.evaluate(val_raw_ds, batch_size=128)
CNNResBiGRU_results7_test = CNNResBiGRU_model7.evaluate(test_raw_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*CNNResBiGRU_results7_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*CNNResBiGRU_results7_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 33ms/step - accuracy: 0.7586 - loss: 0.7825
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 85ms/step - accuracy: 0.6812 - loss: 1.1293
Validation Loss: 0.7063380479812622
Validation Accuracy: 0.7794540524482727
Test Loss: 1.1214402914047241
Test Accuracy: 0.6850618720054626


In [68]:
predictions7_val = CNNResBiGRU_model7.predict(val_raw_tf, batch_size=128)
predictions7_test = CNNResBiGRU_model7.predict(test_raw_tf, batch_size=128)

predictions7_val =tf.argmax(predictions7_val, axis = 1)
predictions7_test = tf.argmax(predictions7_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 225ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 245ms/step


In [69]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions7_val), sum(np.equal(predictions7_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions7_test), sum(np.equal(predictions7_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1085
Test Data - Total predictions made: 1051. Number of correct predictions: 720


We can conclude that 2 convolutional blocks beats a single convolutional block.

## **Experiment 3: ResBiGRU (raw + differenced time series weighted by Rocket inputs)**

So far in experiments 1 and 2, we have shown incorporating information from both the raw time series and the differenced time series can improve the classification performance. In this experiment, we want to explore whether the extracted Rocket features might further improve classification performance, and if so, how best to integrate them into the model. We will consider the case where we intgrate both the raw time series and the differenced time series and we will consider the case where we only integrate the raw time series.

#### **Experiment 3.1: Integrating after GlobalMaxPool layer (raw time series only)**

An obvious place to integrate the extracted Rocket features, which do not have a time dimension, is after the time dimension is removed from the CNN-ResBiGRU network. The integration can be directly in a concatenation or in a two step process, where the most important Rocket features are extracted in the first step, then merged with the main network in the second step.

In this experiment, we apply batch normalisation to both the component acting on the raw time series and the component acting on the Rocket.

In a since discarded experiment, we consider concatenating the two components and adding an additional fully connected layer on the combined fields.

In the experiment shown below, we follow up on the Rocket batch normalisation with a fully connected layer of 64 dimensions, in order to condense the 9,996 Rocket features into a more compact representation. This replaced the additional fully connected layer on the combined fields.

In [72]:
def RocketCNNResBiGRU(ts_raw_shape, pca_raw_shape):
    ts_raw_input = Input(shape=ts_raw_shape, name = "raw")
    pca_raw_input = Input(shape=pca_raw_shape, name = "pca")

    ts_layer = ConvBlock(num_filters = 32, name = "ConvBlock")(ts_raw_input)
    ts_layer = ConvBlock(num_filters = 32, name = "ConvBlock2")(ts_layer)
    ts_layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ResBiGRU1")(ts_layer)
    ts_layer = GlobalMaxPooling1D()(ts_layer)

    pca_layer = BatchNormalization()(pca_raw_input)
    pca_layer = Dense(64, activation="sigmoid")(pca_layer) # Compress the information into 32 dimensions

    combined_layer = Concatenate(axis = 1)([pca_layer, ts_layer])

    output_layer = Dense(C, activation="softmax")(combined_layer)
    return Model(inputs=[pca_raw_input, ts_raw_input], outputs=output_layer)

In [73]:
RocketCNNResBiGRU_model = RocketCNNResBiGRU(ts_raw_shape = (sz - 1, dim), pca_raw_shape = (128,))
RocketCNNResBiGRU_model.summary()

In [75]:
RocketCNNResBiGRU_model = RocketCNNResBiGRU(ts_raw_shape = (sz - 1, dim), pca_raw_shape = (128,))
RocketCNNResBiGRU_model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.90, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
historyR1 = RocketCNNResBiGRU_model.fit(train_rr_ds, validation_data=val_rr_ds, epochs=50, verbose = 1)

Epoch 1/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 260ms/step - accuracy: 0.2106 - loss: 3.1160 - val_accuracy: 0.5172 - val_loss: 1.7609
Epoch 2/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 127ms/step - accuracy: 0.6123 - loss: 1.5321 - val_accuracy: 0.6329 - val_loss: 1.2299
Epoch 3/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 130ms/step - accuracy: 0.7333 - loss: 1.0348 - val_accuracy: 0.6782 - val_loss: 0.9471
Epoch 4/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 122ms/step - accuracy: 0.8143 - loss: 0.7341 - val_accuracy: 0.7112 - val_loss: 0.8359
Epoch 5/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 121ms/step - accuracy: 0.8795 - loss: 0.5417 - val_accuracy: 0.7292 - val_loss: 0.7242
Epoch 6/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 129ms/step - accuracy: 0.9189 - loss: 0.4026 - val_accuracy: 0.7428 - val_loss: 0.6768
Epoch 7/50
[1m68/68[0

In [77]:
RocketCNNResBiGRU_results_val = RocketCNNResBiGRU_model.evaluate(val_rr_ds, batch_size=128)
RocketCNNResBiGRU_results_test = RocketCNNResBiGRU_model.evaluate(test_rr_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*RocketCNNResBiGRU_results_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*RocketCNNResBiGRU_results_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 34ms/step - accuracy: 0.7759 - loss: 0.7295
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 84ms/step - accuracy: 0.7722 - loss: 0.8387
Validation Loss: 0.6660051345825195
Validation Accuracy: 0.790229856967926
Test Loss: 0.8363613486289978
Test Accuracy: 0.7621312737464905


In [85]:
predictions_combined_val = RocketCNNResBiGRU_model.predict((val_pca_tf, val_raw_tf), batch_size=128)
predictions_combined_test = RocketCNNResBiGRU_model.predict((test_pca_tf, test_raw_tf), batch_size=128)

predictions_combined_val =tf.argmax(predictions_combined_val, axis = 1)
predictions_combined_test = tf.argmax(predictions_combined_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 179ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 119ms/step


In [86]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_combined_val), sum(np.equal(predictions_combined_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_combined_test), sum(np.equal(predictions_combined_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1100
Test Data - Total predictions made: 1051. Number of correct predictions: 801


This experiment ended in success. The validation accuracy beat the accuracy of the best performing model so far by 1%.

#### **Experiment 3.2: Breaking experiment 3.1 into a component-wise training process**

Given experiment 3.1 failed to train, and it was postulated that this was due to the components possibly being in a feedback loop, where improvements in one component resulted in declines in the performance of the second component, we attempt to alternately train the Rocket and CNN-ResBiGRU component, while holding the weights of the alternate component constant, in each epoch.

It was later observed this approach was not effective for training the weights, so the Rocket and CNN-ResBiGRU models were pre-trained separately and the weights copied in to the combined model.

**Pretrain Rocket PCA model component independently**

In [49]:
# Note the time series raw input is not used.
ts_raw_input  = Input(shape=(sz - 1, dim), name = "raw")
pca_raw_input = Input(shape=(128,), name = "pca")

layer = BatchNormalization(name = "pca_BatchNorm")(pca_raw_input)
layer = Dense(64, activation="sigmoid", name = "pca_Dense")(layer)
output_layer = Dense(C, activation="softmax")(layer)
pretrained_rocket_model = Model(inputs={"raw": ts_raw_input, "pca": pca_raw_input}, outputs=output_layer)

In [50]:
pretrained_rocket_model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history_pretrained_rocket_model = pretrained_rocket_model.fit(train_rr_ds,  validation_data = val_rr_ds, epochs=50, verbose = 1, callbacks = [earlystopping, reduce_lr])

Epoch 1/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 34ms/step - accuracy: 0.0602 - loss: 3.4493 - val_accuracy: 0.1473 - val_loss: 3.1214 - learning_rate: 0.0010
Epoch 2/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.2371 - loss: 2.9642 - val_accuracy: 0.3628 - val_loss: 2.7862 - learning_rate: 0.0010
Epoch 3/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.5329 - loss: 2.5565 - val_accuracy: 0.6243 - val_loss: 2.4038 - learning_rate: 0.0010
Epoch 4/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.7419 - loss: 2.1398 - val_accuracy: 0.6947 - val_loss: 2.0105 - learning_rate: 0.0010
Epoch 5/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.7990 - loss: 1.7225 - val_accuracy: 0.7249 - val_loss: 1.6418 - learning_rate: 0.0010
Epoch 6/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/

In [51]:
pretrained_rocket_model.save('../models/BlockRocketExperiments/pretrained_rocket_model.keras')
with open('../models/BlockRocketExperiments/history_pretrained_rocket_model.pkl', 'wb') as f:
    pickle.dump(history_pretrained_rocket_model.history, f)

In [52]:
# Load the model
pretrained_rocket_model = load_model('../models/BlockRocketExperiments/pretrained_rocket_model.keras')

# Load the training history
with open('../models/BlockRocketExperiments/history_pretrained_rocket_model.pkl', 'rb') as f:
    history_pretrained_rocket_model = pickle.load(f)

In [53]:
pretrained_rocket_results_val = pretrained_rocket_model.evaluate(val_rr_ds, batch_size=128)
pretrained_rocket_results_test = pretrained_rocket_model.evaluate(test_rr_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*pretrained_rocket_results_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*pretrained_rocket_results_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 14ms/step - accuracy: 0.7521 - loss: 0.6616
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step - accuracy: 0.7294 - loss: 0.9097
Validation Loss: 0.6328442692756653
Validation Accuracy: 0.7600574493408203
Test Loss: 0.885633111000061
Test Accuracy: 0.7288296818733215


In [54]:
predictions_pretrained_rocket_val = pretrained_rocket_model.predict((val_pca_tf, val_raw_tf), batch_size=128)
predictions_pretrained_rocket_test = pretrained_rocket_model.predict((test_pca_tf, test_raw_tf), batch_size=128)

predictions_pretrained_rocket_val =tf.argmax(predictions_pretrained_rocket_val, axis = 1)
predictions_pretrained_rocket_test = tf.argmax(predictions_pretrained_rocket_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step


In [55]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_pretrained_rocket_val), sum(np.equal(predictions_pretrained_rocket_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_pretrained_rocket_test), sum(np.equal(predictions_pretrained_rocket_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1058
Test Data - Total predictions made: 1051. Number of correct predictions: 766


**Pretrain CNN-ResBiGRU model component independently**

In [92]:
# Note the PCA raw input is not used.
ts_raw_input  = Input(shape=(sz - 1, dim), name = "raw")

layer = ConvBlock(num_filters = 32, name = "ts_ConvBlock")(ts_raw_input)
layer = ConvBlock(num_filters = 32, name = "ts_ConvBlock2")(layer)
layer = ResBiGRU(h1_units =32, h2_units = 32, name = "ts_ResBiGRU1")(layer)
layer = GlobalMaxPooling1D(name = "ts_GlobalMaxPooling")(layer)
layer = BatchNormalization(name = "ts_BatchNorm")(layer)
output_layer = Dense(C, activation="softmax")(layer)
pretrained_cnnresbigru_model = Model(inputs=ts_raw_input, outputs = output_layer)

In [94]:
pretrained_cnnresbigru_model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08), loss='categorical_crossentropy', metrics=['accuracy'])
history_cnnresbigru_model = pretrained_cnnresbigru_model.fit(train_raw_ds,  validation_data = val_raw_ds, epochs=50, verbose = 1, callbacks = [earlystopping, reduce_lr])

Epoch 1/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m32s[0m 269ms/step - accuracy: 0.2318 - loss: 2.6390 - val_accuracy: 0.4375 - val_loss: 1.8288 - learning_rate: 0.0010
Epoch 2/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 132ms/step - accuracy: 0.5558 - loss: 1.5121 - val_accuracy: 0.5575 - val_loss: 1.2772 - learning_rate: 0.0010
Epoch 3/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 125ms/step - accuracy: 0.6747 - loss: 1.1204 - val_accuracy: 0.6415 - val_loss: 1.0672 - learning_rate: 0.0010
Epoch 4/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 121ms/step - accuracy: 0.7580 - loss: 0.8714 - val_accuracy: 0.6825 - val_loss: 0.9069 - learning_rate: 0.0010
Epoch 5/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 132ms/step - accuracy: 0.8202 - loss: 0.6894 - val_accuracy: 0.6911 - val_loss: 0.8284 - learning_rate: 0.0010
Epoch 6/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1

In [101]:
pretrained_cnnresbigru_model.save('../models/BlockRocketExperiments/pretrained_cnnresbigru_model.keras')
with open('../models/BlockRocketExperiments/history_cnnresbigru_model.pkl', 'wb') as f:
    pickle.dump(history_cnnresbigru_model.history, f)

In [59]:
# Load the model
custom_objects = {'ConvBlock': ConvBlock, 'ResBiGRU': ResBiGRU}
pretrained_cnnresbigru_model = load_model('../models/BlockRocketExperiments/pretrained_cnnresbigru_model.keras', custom_objects=custom_objects)

# Load the training history
with open('../models/BlockRocketExperiments/history_cnnresbigru_model.pkl', 'rb') as f:
    history_cnnresbigru_model = pickle.load(f)

In [60]:
pretrained_cnnresbigru_results_val = pretrained_cnnresbigru_model.evaluate(val_raw_ds, batch_size=128)
pretrained_cnnresbigru_results_test = pretrained_cnnresbigru_model.evaluate(test_raw_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*pretrained_cnnresbigru_results_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*pretrained_cnnresbigru_results_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 66ms/step - accuracy: 0.7338 - loss: 0.7156
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 76ms/step - accuracy: 0.7282 - loss: 1.0129
Validation Loss: 0.6878427863121033
Validation Accuracy: 0.7428160905838013
Test Loss: 0.9957500696182251
Test Accuracy: 0.7269267439842224


In [61]:
predictions_pretrained_cnnresbigru_val = pretrained_cnnresbigru_model.predict(val_raw_tf, batch_size=128)
predictions_pretrained_cnnresbigru_test = pretrained_cnnresbigru_model.predict(test_raw_tf, batch_size=128)

predictions_pretrained_cnnresbigru_val =tf.argmax(predictions_pretrained_cnnresbigru_val, axis = 1)
predictions_pretrained_cnnresbigru_test = tf.argmax(predictions_pretrained_cnnresbigru_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 167ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 114ms/step


In [62]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_pretrained_cnnresbigru_val), sum(np.equal(predictions_pretrained_cnnresbigru_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_pretrained_cnnresbigru_test), sum(np.equal(predictions_pretrained_cnnresbigru_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1034
Test Data - Total predictions made: 1051. Number of correct predictions: 764


**Define combined model**

In [26]:
ts_raw_input = Input(shape=(sz - 1, dim), name = "raw")
pca_raw_input = Input(shape=(128,), name = "pca")

# Define the layers of the time series model
ts_layer = ConvBlock(num_filters = 32, name = "ts_ConvBlock1")(ts_raw_input)
ts_layer = ConvBlock(num_filters = 32, name = "ts_ConvBlock2")(ts_layer)
ts_layer = ResBiGRU(h1_units = 32, h2_units = 32, name = "ts_ResBiGRU1")(ts_layer)
ts_layer = GlobalMaxPooling1D(name = "ts_GlobalMaxPooling")(ts_layer)
ts_layer = BatchNormalization(momentum=0.99, center=True, scale=True, name = "ts_BatchNorm")(ts_layer)
output_ts = ts_layer

# Define the layers of the PCA model based on the Rocket features
pca_layer = BatchNormalization(name = "pca_BatchNorm")(pca_raw_input)
pca_layer = Dense(64, activation="sigmoid", name = "pca_Dense")(pca_layer)
output_pca = pca_layer

# Define the two component models
ts_model = Model(inputs=ts_raw_input, outputs=output_ts)
pca_model = Model(inputs=pca_raw_input, outputs=output_pca)

# Define the layers of the combined model
combined_layer = Concatenate(axis = 1)([ts_model.output, pca_model.output])
#combined_layer = Dense(32, activation="sigmoid", name = "comb_Dense1")(combined_layer) # Compress the information into 32 dimensions
final_output  = Dense(C, activation="softmax", name = "comb_Dense2")(combined_layer)

# Define the combined model
combined_model = Model(inputs= {"raw": ts_model.input, "pca": pca_model.input}, outputs = final_output)
combined_model.summary()

In [27]:
def copy_model_weights(source, target, mapping):
    # Copy weights from source model to target model based on layer names
    for source_layer_name, target_layer_name in mapping.items():
        try:
            target.get_layer(name=target_layer_name).set_weights(
                source.get_layer(name=source_layer_name).get_weights()
            )
            print(f"Copied weights from {source_layer_name} to {target_layer_name}")
        except Exception as e:
            print(f"Could not copy weights from {source_layer_name} to {target_layer_name}: {e}")
    return target

In [28]:
rocket_layer_mapping = {
    'pca_Dense': 'pca_Dense',
    'pca_BatchNorm': 'pca_BatchNorm'
}

ts_layer_mapping = {
    'ts_ConvBlock1': 'ts_ConvBlock1',
    'ts_ConvBlock2': 'ts_ConvBlock2',
    'ts_ResBiGRU1': 'ts_ResBiGRU1',
    'ts_GlobalMaxPooling': 'ts_GlobalMaxPooling',
    'ts_BatchNorm': 'ts_BatchNorm'
}


#### **3.2.1: Using custom train steps**

In [111]:
combined_model_custom = Model(inputs= {"raw": ts_model.input, "pca": pca_model.input}, outputs = final_output)

combined_model_custom = copy_model_weights(source = pretrained_rocket_model, target = combined_model_custom, mapping = rocket_layer_mapping)
combined_model_custom = copy_model_weights(source = pretrained_cnnresbigru_model, target = combined_model_custom, mapping = ts_layer_mapping)

Copied weights from pca_Dense to pca_Dense
Copied weights from pca_BatchNorm to pca_BatchNorm
Could not copy weights from ts_ConvBlock1 to ts_ConvBlock1: No such layer: ts_ConvBlock1. Existing layers are: ['raw', 'ts_ConvBlock', 'ts_ConvBlock2', 'ts_ResBiGRU1', 'ts_GlobalMaxPooling', 'ts_BatchNorm', 'dense_22'].
Copied weights from ts_ConvBlock2 to ts_ConvBlock2
Copied weights from ts_ResBiGRU1 to ts_ResBiGRU1
Copied weights from ts_GlobalMaxPooling to ts_GlobalMaxPooling
Copied weights from ts_BatchNorm to ts_BatchNorm


Define the train step for the PCA component. Notice the CNN-ResBiGRU component is held constant.

In [117]:
@tf.function
def train_step_pca(loss_fn, opt, train_batch):
    """
    This function performs the train step for the PCA component.
    """
    inputs, y_true = train_batch
    input_ResBiGRU, input_pca = inputs["raw"], inputs["pca"]

    with tf.GradientTape() as tape:
        output_pca = pca_model(input_pca, training = True)
        output_resbigru = ts_model(input_ResBiGRU, training = False)
        y_pred = combined_model({"raw": input_ResBiGRU, "pca": input_pca}, training=True)
        loss = loss_fn(y_true, y_pred)

    grads = tape.gradient(loss, pca_model.trainable_weights)
    opt.apply_gradients(zip(grads, pca_model.trainable_weights))

    return loss, y_true, y_pred

Define the train step for the CNN-ResBiGRU component. Notice the PCA component is held constant.

In [118]:
@tf.function
def train_step_CNNResBiGRU(loss_fn, opt, train_batch):
    """
    This function performs the train step for the CNN-ResBiGRU component.
    """
    inputs, y_true = train_batch
    input_ResBiGRU, input_pca = inputs["raw"], inputs["pca"]

    with tf.GradientTape() as tape:
        output_pca = pca_model(input_pca, training = False)
        output_resbigru = ts_model(input_ResBiGRU, training = True)
        y_pred = combined_model({"raw": input_ResBiGRU, "pca": input_pca}, training=True)
        loss = loss_fn(y_true, y_pred)

    grads = tape.gradient(loss, pca_model.trainable_weights)
    opt.apply_gradients(zip(grads, pca_model.trainable_weights))

    return loss, y_true, y_pred

In [120]:
# Define the loss and accuracy metrics and optimizer

loss_metric = tf.keras.metrics.Mean()
accuracy_metric = tf.keras.metrics.CategoricalAccuracy()

In [119]:
def train_model(loss_fn, training_dataset, epochs, accuracy_metric=accuracy_metric):
    """
    This function should run the custom training loop as described above.
    The function should return a tuple of two lists with the loss and accuracy scores.
    """
    epoch_losses = []
    epoch_acc = []

    pca_opt = Adam(learning_rate=1.0, beta_1=0.99, beta_2=0.999, epsilon=1e-08)
    resbigru_opt = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08)

    for epoch in range(epochs):
        loss_metric.reset_state()
        accuracy_metric.reset_state()

        for train_batch in training_dataset:
            # Losses, targets and predicted values after running PCA component
            loss1, y_true1, y_pred1 = train_step_pca(loss_fn, pca_opt, train_batch)

            # Losses, targets and predicted values after running CNN-ResBiGRU component
            loss2, y_true2, y_pred2 = train_step_CNNResBiGRU(loss_fn, resbigru_opt, train_batch)

            # The accuracy and loss metrics are updated after both training steps have been executed
            loss_metric.update_state(loss2)
            accuracy_metric.update_state(y_true2, y_pred2)

        avg_epoch_loss = float(loss_metric.result().numpy())
        avg_epoch_acc = float(accuracy_metric.result().numpy())
        epoch_losses.append(avg_epoch_loss)
        epoch_acc.append(avg_epoch_acc)
        print(f"Epoch {epoch}: loss - {avg_epoch_loss:.4f}, accuracy = {avg_epoch_acc:.4f}")

    return epoch_losses, epoch_acc

In [None]:
# Run the custom training loop
epoch_losses, epoch_acc = train_model(loss_fn=CategoricalCrossentropy(from_logits = False),
                                      training_dataset=train_rr_ds,
                                      epochs=50)

This approach was too slow.

#### **3.2.2: Using Keras (2 x Alternate Training)**

In [29]:
combined_model = Model(inputs= {"raw": ts_model.input, "pca": pca_model.input}, outputs = final_output)

combined_model = copy_model_weights(source = pretrained_rocket_model, target = combined_model, mapping = rocket_layer_mapping)
combined_model = copy_model_weights(source = pretrained_cnnresbigru_model, target = combined_model, mapping = ts_layer_mapping)

optimiser = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08, clipnorm=1.0)
combined_model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['accuracy'])

Copied weights from pca_Dense to pca_Dense
Copied weights from pca_BatchNorm to pca_BatchNorm
Could not copy weights from ts_ConvBlock1 to ts_ConvBlock1: No such layer: ts_ConvBlock1. Existing layers are: ['raw', 'ts_ConvBlock', 'ts_ConvBlock2', 'ts_ResBiGRU1', 'ts_GlobalMaxPooling', 'ts_BatchNorm', 'dense_22'].
Copied weights from ts_ConvBlock2 to ts_ConvBlock2
Copied weights from ts_ResBiGRU1 to ts_ResBiGRU1
Copied weights from ts_GlobalMaxPooling to ts_GlobalMaxPooling
Copied weights from ts_BatchNorm to ts_BatchNorm


In [30]:
pca_layers = [layer.name for layer in pca_model.layers]
ts_layers = [layer.name for layer in ts_model.layers]
combined_layers = [layer.name for layer in combined_model.layers]

# The layers which are shared between both components of the model.
shared_layers = list(set(combined_layers) - set(pca_layers) - set(ts_layers))

pca_component_layers = pca_layers + shared_layers
ts_component_layers = ts_layers + shared_layers

In [31]:
# Custom training loop
epochs = 10 # 5 sub-epochs per parent epoch = 50 epochs

# Keras optimisers are tied to the weights they are created with
ts_optimiser = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08)
pca_optimiser = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08)

for epoch in range(epochs):
    print(f'Epoch {epoch+1}/{epochs}')

    # Freeze the PCA component and train the CNN-ResBiGRU component
    for layer in combined_model.layers:  # Adjust indices based on your model's architecture
        if layer.name in ts_component_layers:
            layer.trainable = True
        else:
            layer.trainable = False

    combined_model.compile(optimizer= ts_optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    combined_model.fit(train_rr_ds, validation_data = val_rr_ds, epochs=5, verbose=1)

    # Freeze the CNN-ResBiGRU component and train the PCA component
    for layer in combined_model.layers:  # Adjust indices based on your model's architecture
        if layer.name in pca_component_layers:
            layer.trainable = True
        else:
            layer.trainable = False

    combined_model.compile(optimizer= pca_optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    combined_model.fit(train_rr_ds, validation_data = val_rr_ds, epochs=5, verbose=1)

Epoch 1/10
Epoch 1/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 266ms/step - accuracy: 0.2060 - loss: 2.9261 - val_accuracy: 0.3470 - val_loss: 1.8583
Epoch 2/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 124ms/step - accuracy: 0.5175 - loss: 1.4477 - val_accuracy: 0.5374 - val_loss: 1.2445
Epoch 3/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 125ms/step - accuracy: 0.6585 - loss: 1.0346 - val_accuracy: 0.6329 - val_loss: 0.9865
Epoch 4/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 124ms/step - accuracy: 0.7548 - loss: 0.8013 - val_accuracy: 0.6861 - val_loss: 0.8761
Epoch 5/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 124ms/step - accuracy: 0.8206 - loss: 0.6316 - val_accuracy: 0.6875 - val_loss: 0.8498
Epoch 1/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 102ms/step - accuracy: 0.8752 - loss: 0.4775 - val_accuracy: 0.7011 - val_loss: 0.7894
Epoch 2/5
[1m68/68[

In [37]:
combined_results_val = combined_model.evaluate(val_rr_ds, batch_size=128)
combined_results_test = combined_model.evaluate(test_rr_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*combined_results_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*combined_results_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 23ms/step - accuracy: 0.7532 - loss: 1.2090
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 86ms/step - accuracy: 0.7555 - loss: 1.1266
Validation Loss: 1.1175472736358643
Validation Accuracy: 0.7629310488700867
Test Loss: 1.1137651205062866
Test Accuracy: 0.7526165843009949


In [38]:
predictions_combined_val = combined_model.predict((val_pca_tf, val_raw_tf), batch_size=128)
predictions_combined_test = combined_model.predict((test_pca_tf, test_raw_tf), batch_size=128)

predictions_combined_val =tf.argmax(predictions_combined_val, axis = 1)
predictions_combined_test = tf.argmax(predictions_combined_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 182ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 118ms/step


In [39]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_combined_val), sum(np.equal(predictions_combined_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_combined_test), sum(np.equal(predictions_combined_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1062
Test Data - Total predictions made: 1051. Number of correct predictions: 791


In [40]:
pretrained_cnnresbigru_model.save('../models/BlockRocketExperiments/combined_model1.keras')

In [41]:
# Load the model
custom_objects = {'ConvBlock': ConvBlock, 'ResBiGRU': ResBiGRU}
pretrained_cnnresbigru_model = load_model('../models/BlockRocketExperiments/combined_model1.keras', custom_objects=custom_objects)


Alternately training each component does not appear to be the right approach. Can they both be trained concurrently after the pretraining?

**3.2.3: Keras (3 x alternate pretraining)**

In [63]:
combined_model2 = Model(inputs= {"raw": ts_model.input, "pca": pca_model.input}, outputs = final_output)

combined_model2 = copy_model_weights(source = pretrained_rocket_model, target = combined_model2, mapping = rocket_layer_mapping)
combined_model2 = copy_model_weights(source = pretrained_cnnresbigru_model, target = combined_model2, mapping = ts_layer_mapping)

Copied weights from pca_Dense to pca_Dense
Copied weights from pca_BatchNorm to pca_BatchNorm
Could not copy weights from ts_ConvBlock1 to ts_ConvBlock1: No such layer: ts_ConvBlock1. Existing layers are: ['raw', 'ts_ConvBlock', 'ts_ConvBlock2', 'ts_ResBiGRU1', 'ts_GlobalMaxPooling', 'ts_BatchNorm', 'dense_22'].
Copied weights from ts_ConvBlock2 to ts_ConvBlock2
Copied weights from ts_ResBiGRU1 to ts_ResBiGRU1
Copied weights from ts_GlobalMaxPooling to ts_GlobalMaxPooling
Copied weights from ts_BatchNorm to ts_BatchNorm


In [64]:
optimiser2 = Adam(learning_rate=1.0, beta_1=0.99, beta_2=0.999, epsilon=1e-08, clipnorm=1.0)
combined_model2.compile(optimizer=optimiser2, loss='categorical_crossentropy', metrics=['accuracy'])

In [65]:
pca_layers = [layer.name for layer in pca_model.layers]
ts_layers = [layer.name for layer in ts_model.layers]
combined_layers = [layer.name for layer in combined_model2.layers]

# The layers which are shared between both components of the model.
shared_layers = list(set(combined_layers) - set(pca_layers) - set(ts_layers))

pca_component_layers = pca_layers + shared_layers
ts_component_layers = ts_layers + shared_layers

In [66]:
# Custom training loop
epochs = 25

# Keras optimisers are tied to the weights they are created with
comb_optimiser = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08)
ts_optimiser = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08)
pca_optimiser = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08)

for epoch in range(epochs):
    print(f'Epoch {epoch+1}/{epochs}')

    # Freeze the PCA component and CNN-ResBiGRU component and train the combination component
    for layer in combined_model2.layers:  # Adjust indices based on your model's architecture
        if layer.name in shared_layers:
            layer.trainable = True
        else:
            layer.trainable = False
    # Keras optimisers are tied to the weights they are created with
    combined_model2.compile(optimizer= comb_optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    combined_model2.fit(train_rr_ds, validation_data = val_rr_ds, epochs=5, verbose=1, callbacks = [reduce_lr])

    # Freeze the PCA component and combination component and train the CNN-ResBiGRU component
    for layer in combined_model2.layers:  # Adjust indices based on your model's architecture
        if layer.name in ts_layers:
            layer.trainable = True
        else:
            layer.trainable = False

    combined_model2.compile(optimizer= ts_optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    combined_model2.fit(train_rr_ds, validation_data = val_rr_ds, epochs=3, verbose=1, callbacks = [reduce_lr])

    # Freeze the CNN-ResBiGRU component and combination component and train the PCA component
    for layer in combined_model2.layers:  # Adjust indices based on your model's architecture
        if layer.name in pca_layers:
            layer.trainable = True
        else:
            layer.trainable = False

    combined_model2.compile(optimizer= pca_optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    combined_model2.fit(train_rr_ds, validation_data = val_rr_ds, epochs=2, verbose=1, callbacks = [reduce_lr])

Epoch 1/25
Epoch 1/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 94ms/step - accuracy: 0.6878 - loss: 1.0188 - val_accuracy: 0.5948 - val_loss: 1.7074 - learning_rate: 0.0010
Epoch 2/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 31ms/step - accuracy: 0.7963 - loss: 0.6542 - val_accuracy: 0.6078 - val_loss: 1.5460 - learning_rate: 0.0010
Epoch 3/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 30ms/step - accuracy: 0.8200 - loss: 0.5092 - val_accuracy: 0.6286 - val_loss: 1.3567 - learning_rate: 0.0010
Epoch 4/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 32ms/step - accuracy: 0.8592 - loss: 0.4023 - val_accuracy: 0.6458 - val_loss: 1.3126 - learning_rate: 0.0010
Epoch 5/5
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 31ms/step - accuracy: 0.8840 - loss: 0.3533 - val_accuracy: 0.6552 - val_loss: 1.3140 - learning_rate: 0.0010
Epoch 1/3
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3

In [67]:
combined_model2.save('../models/BlockRocketExperiments/combined_model2.keras')

In [68]:
# Load the model
custom_objects = {'ConvBlock': ConvBlock, 'ResBiGRU': ResBiGRU}
combined_model2 = load_model('../models/BlockRocketExperiments/combined_model2.keras', custom_objects=custom_objects)


  saveable.load_own_variables(weights_store.get(inner_path))


In [70]:
combined2_results_val = combined_model2.evaluate(val_rr_ds, batch_size=128)
combined2_results_test = combined_model2.evaluate(test_rr_ds, batch_size=128)
print("Validation Loss: {}\nValidation Accuracy: {}".format(*combined2_results_val))
print("Test Loss: {}\nTest Accuracy: {}".format(*combined2_results_test))

[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 65ms/step - accuracy: 0.7353 - loss: 1.0765
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 74ms/step - accuracy: 0.6769 - loss: 1.6901
Validation Loss: 0.9906644225120544
Validation Accuracy: 0.7586206793785095
Test Loss: 1.6279574632644653
Test Accuracy: 0.6860133409500122


In [71]:
predictions_combined2_val = combined_model2.predict((val_pca_tf, val_raw_tf), batch_size=128)
predictions_combined2_test = combined_model2.predict((test_pca_tf, test_raw_tf), batch_size=128)

predictions_combined2_val =tf.argmax(predictions_combined2_val, axis = 1)
predictions_combined2_test = tf.argmax(predictions_combined2_test, axis = 1)

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 160ms/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 100ms/step


In [72]:
# Compare predictions against the targets
print("Validation Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_combined2_val), sum(np.equal(predictions_combined2_val, y_val))))
print("Test Data - Total predictions made: %s. Number of correct predictions: %s" % (len(predictions_combined2_test), sum(np.equal(predictions_combined2_test, y_test))))

Validation Data - Total predictions made: 1392. Number of correct predictions: 1056
Test Data - Total predictions made: 1051. Number of correct predictions: 721


#### **3.2.3: Using Keras (Simultaneous Training)**

In [79]:
combined_model3 = Model(inputs= {"raw": ts_model.input, "pca": pca_model.input}, outputs = final_output)
combined_model3.summary()

In [80]:
combined_model3 = copy_model_weights(source = pretrained_rocket_model, target = combined_model3, mapping = rocket_layer_mapping)
combined_model3 = copy_model_weights(source = pretrained_cnnresbigru_model, target = combined_model3, mapping = ts_layer_mapping)

Copied weights from pca_Dense to pca_Dense
Copied weights from pca_BatchNorm to pca_BatchNorm
Could not copy weights from ts_ConvBlock1 to ts_ConvBlock1: No such layer: ts_ConvBlock1. Existing layers are: ['raw', 'ts_ConvBlock', 'ts_ConvBlock2', 'ts_ResBiGRU1', 'ts_GlobalMaxPooling', 'ts_BatchNorm', 'dense_22'].
Copied weights from ts_ConvBlock2 to ts_ConvBlock2
Copied weights from ts_ResBiGRU1 to ts_ResBiGRU1
Copied weights from ts_GlobalMaxPooling to ts_GlobalMaxPooling
Copied weights from ts_BatchNorm to ts_BatchNorm


In [81]:
optimiser3 = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08, clipnorm=1.0)
combined_model3.compile(optimizer=optimiser3, loss='categorical_crossentropy', metrics=['accuracy'])
history_2 = combined_model3.fit(train_rr_ds, validation_data=val_rr_ds, epochs=50, verbose = 1, callbacks = [reduce_lr])

Epoch 1/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 99ms/step - accuracy: 0.9678 - loss: 0.1024 - val_accuracy: 0.7299 - val_loss: 1.1635 - learning_rate: 0.0010
Epoch 2/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 31ms/step - accuracy: 0.9816 - loss: 0.0633 - val_accuracy: 0.7306 - val_loss: 1.1492 - learning_rate: 0.0010
Epoch 3/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 31ms/step - accuracy: 0.9892 - loss: 0.0481 - val_accuracy: 0.7270 - val_loss: 1.1500 - learning_rate: 0.0010
Epoch 4/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 31ms/step - accuracy: 0.9872 - loss: 0.0470 - val_accuracy: 0.7284 - val_loss: 1.1571 - learning_rate: 0.0010
Epoch 5/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 30ms/step - accuracy: 0.9909 - loss: 0.0410 - val_accuracy: 0.7299 - val_loss: 1.1628 - learning_rate: 0.0010
Epoch 6/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m

#### **3.2.4: Using Keras (Simultaneous Training) with no pre-training**

In [82]:
combined_model5 = Model(inputs= {"raw": ts_model.input, "pca": pca_model.input}, outputs = final_output)
combined_model5.summary()

In [83]:
optimiser3 = Adam(learning_rate=0.001, beta_1=0.99, beta_2=0.999, epsilon=1e-08, clipnorm=1.0)
combined_model5.compile(optimizer=optimiser3, loss='categorical_crossentropy', metrics=['accuracy'])
history_4 = combined_model5.fit(train_rr_ds, validation_data=val_rr_ds, epochs=50, verbose = 1, callbacks = [reduce_lr])

Epoch 1/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m19s[0m 100ms/step - accuracy: 0.9979 - loss: 0.0212 - val_accuracy: 0.7241 - val_loss: 1.1780 - learning_rate: 0.0010
Epoch 2/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 30ms/step - accuracy: 0.9983 - loss: 0.0202 - val_accuracy: 0.7220 - val_loss: 1.1757 - learning_rate: 0.0010
Epoch 3/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 31ms/step - accuracy: 0.9960 - loss: 0.0205 - val_accuracy: 0.7249 - val_loss: 1.1737 - learning_rate: 0.0010
Epoch 4/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 30ms/step - accuracy: 0.9967 - loss: 0.0200 - val_accuracy: 0.7256 - val_loss: 1.1712 - learning_rate: 0.0010
Epoch 5/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 31ms/step - accuracy: 0.9988 - loss: 0.0167 - val_accuracy: 0.7241 - val_loss: 1.1698 - learning_rate: 0.0010
Epoch 6/50
[1m68/68[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0

The model appears to be caught in a local minima. This was also the case for the pre-trained model.