# Testing each Model:
Now that we have built each model, let's test them on unseen (test) data to gauge their true performance.

### Importing Libraries

In [31]:
#importing all libraries
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import pandas as pd
import tensorflow as tf
import numpy as np
import hashlib
import cv2
import seaborn as sns
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report)

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
# Change directory to 'DATASCI207_FinalProject'
os.chdir('/content/drive/My Drive/DATASCI207_FinalProject')
tf.keras.backend.clear_session()

### Load the Training and Testing Data:

In [6]:
X_train = np.load('X_train.npy')
Y_train = np.load('Y_train.npy')

In [7]:
X_test = np.load('X_test.npy')
Y_test = np.load('Y_test.npy')

### Load each Model:

In [8]:
# Load the baseline model:
baseline_model = tf.keras.models.load_model("baseline_model_fit.keras")

In [9]:
# Load the 2-layer CNN model:
model_2_layers = tf.keras.models.load_model("model_2_layers.keras")

In [10]:
# Load the 3-layer CNN model:
model_3_layers = tf.keras.models.load_model("model_3_layers.keras")

In [11]:
# Load the hybrid CNN-Transformer model:
hybrid_transformer_cnn_model = tf.keras.models.load_model("hybrid_transformer_cnn_model.keras")

### Evaluate Each Model
Use the testing data to evaluate the results of each model:

In [12]:
baseline_model.evaluate(X_test, Y_test)



[0.4505707919597626, 0.880497932434082]

In [13]:
model_2_layers.evaluate(X_test, Y_test)



[0.32811668515205383, 0.9153527021408081]

In [14]:
model_3_layers.evaluate(X_test, Y_test)



[0.36401471495628357, 0.8979253172874451]

In [15]:
hybrid_transformer_cnn_model.evaluate(X_test, Y_test)



[1.5343784093856812, 0.9634854793548584]
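The four `evaluate` calls above can also be gathered into one ranked summary. The following is a minimal sketch, not part of the original notebook: `collect_test_metrics` and `format_results` are hypothetical helper names, and the code assumes each model was compiled with exactly one metric (accuracy), so that `evaluate` returns `[loss, accuracy]` as in the outputs above.

```python
def collect_test_metrics(models, X, y):
    """Evaluate each model and return {name: (loss, accuracy)}.

    Assumption: every model was compiled with a single metric (accuracy),
    so evaluate() returns a two-element list [loss, accuracy].
    """
    results = {}
    for name, model in models.items():
        loss, accuracy = model.evaluate(X, y, verbose=0)
        results[name] = (float(loss), float(accuracy))
    return results


def format_results(results):
    """Return one formatted line per model, sorted by accuracy (best first)."""
    ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
    return [f"{name:<30} loss={loss:.4f}  accuracy={acc:.4f}"
            for name, (loss, acc) in ranked]


# Possible usage in this notebook (names taken from the cells above):
# models = {"baseline": baseline_model,
#           "cnn_2_layers": model_2_layers,
#           "cnn_3_layers": model_3_layers,
#           "hybrid": hybrid_transformer_cnn_model}
# print("\n".join(format_results(collect_test_metrics(models, X_test, Y_test))))
```

Keeping loss and accuracy side by side makes it easier to spot models whose ranking differs between the two.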

All four models perform well on the test data. The hybrid (CNN + Transformer) model reaches 96.3% test accuracy, a clear improvement over the baseline's 88.0%, which suggests that it has learned meaningful features from the dataset. Note, however, that its test loss (1.53) is the highest of the four models, so the predictions it gets wrong are likely made with high confidence.
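The hybrid model has both the highest test accuracy and the highest test loss (1.53). These two numbers can diverge. The sketch below uses made-up probabilities, not the project's data, and assumes the loss is a cross-entropy over integer class labels, which the notebook does not state. It shows how a model that is confidently wrong on a few samples can score higher accuracy and a worse loss than a more cautious model.

```python
import numpy as np


def sparse_cross_entropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_probs)))


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == y_true))


# Hypothetical 3-class example with 4 samples (illustrative numbers only).
y_true = np.array([0, 1, 2, 0])

# Cautious model: right on 2 of 4 samples, never very sure of itself.
cautious = np.array([[0.50, 0.30, 0.20],
                     [0.30, 0.50, 0.20],
                     [0.45, 0.15, 0.40],
                     [0.40, 0.45, 0.15]])

# Overconfident model: right on 3 of 4 samples, but almost certain
# of the wrong class on the last one.
overconfident = np.array([[0.98, 0.01, 0.01],
                          [0.01, 0.98, 0.01],
                          [0.01, 0.01, 0.98],
                          [1e-6, 1 - 2e-6, 1e-6]])

for name, probs in [("cautious", cautious), ("overconfident", overconfident)]:
    print(name, accuracy(y_true, probs), sparse_cross_entropy(y_true, probs))
```

In this toy case the overconfident model wins on accuracy and loses on cross-entropy, because one near-zero probability on the true class dominates the average.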

Next, we try one more step: an ensemble model, to see whether it gives any additional performance boost.

# Build an Ensemble Model:
Construct an ensemble model that combines the two best-performing models by averaging their predicted probabilities into one final prediction:

In [36]:
def build_ensemble_model(model1, model2):
  """Average two models' class probabilities and report test accuracy.

  Uses the notebook-level X_test and Y_test arrays loaded above.
  """
  # Get probability predictions from both models
  model1_predictions = model1.predict(X_test)
  model2_predictions = model2.predict(X_test)

  # Average the two probability arrays (soft voting with equal weights)
  ensemble_predictions = (model1_predictions + model2_predictions) / 2

  # Pick the class with the highest average probability
  final_predictions = np.argmax(ensemble_predictions, axis=1)

  ensemble_accuracy = accuracy_score(Y_test, final_predictions)

  return f'Ensemble model accuracy = {ensemble_accuracy}'

In [37]:
build_ensemble_model(model1=hybrid_transformer_cnn_model, model2=model_2_layers)



'Ensemble model accuracy = 0.9676348547717842'
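To see why averaging probabilities can change a prediction, here is a toy illustration with made-up numbers. `soft_vote` is a hypothetical helper written for this example and is not part of the notebook. It generalizes the two-model average above to any number of probability arrays.

```python
import numpy as np


def soft_vote(*prob_arrays):
    """Average class-probability arrays of identical shape.

    Returns (averaged_probabilities, predicted_labels).
    """
    stacked = np.stack(prob_arrays, axis=0)   # (n_models, n_samples, n_classes)
    averaged = stacked.mean(axis=0)           # (n_samples, n_classes)
    return averaged, np.argmax(averaged, axis=1)


# Hypothetical probabilities for 2 samples and 3 classes.
model_a = np.array([[0.50, 0.45, 0.05],   # narrowly prefers class 0
                    [0.10, 0.80, 0.10]])
model_b = np.array([[0.10, 0.85, 0.05],   # strongly prefers class 1
                    [0.20, 0.70, 0.10]])

averaged, labels = soft_vote(model_a, model_b)
print(averaged)
print(labels)
```

For the first sample, model A alone would predict class 0, but the averaged probabilities are `[0.30, 0.65, 0.05]`, so the ensemble predicts class 1. The more confident model outweighs the less confident one.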

The ensemble reaches 96.8% test accuracy, only about 0.4 percentage points above the hybrid model's 96.3%. Because the gain is marginal and a single model is easier to explain than an ensemble, we select the hybrid model as our best-performing model overall.
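A small accuracy gap between two classifiers may come down to a handful of test samples. One way to check is to count where the two sets of predictions disagree. The sketch below is an assumption about how that could be done, not something the notebook runs. `compare_correctness` is a hypothetical helper that expects integer class labels.

```python
import numpy as np


def compare_correctness(y_true, pred_a, pred_b):
    """Count samples by which of two classifiers labeled them correctly."""
    y_true, pred_a, pred_b = (np.asarray(v) for v in (y_true, pred_a, pred_b))
    a_ok = pred_a == y_true
    b_ok = pred_b == y_true
    return {
        "both_correct": int(np.sum(a_ok & b_ok)),
        "only_a_correct": int(np.sum(a_ok & ~b_ok)),
        "only_b_correct": int(np.sum(~a_ok & b_ok)),
        "both_wrong": int(np.sum(~a_ok & ~b_ok)),
    }


# Possible usage in this notebook (names taken from the cells above):
# hybrid_pred = np.argmax(hybrid_transformer_cnn_model.predict(X_test), axis=1)
# cnn_pred = np.argmax(model_2_layers.predict(X_test), axis=1)
# print(compare_correctness(Y_test, hybrid_pred, cnn_pred))
```

If `only_a_correct` and `only_b_correct` are both small and similar in size, the difference between the two classifiers rests on very few images and should be read with caution.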