<a href="https://colab.research.google.com/github/Jieoi/traffic_sign_recognition/blob/main/9_Fusion_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>9. Development and evaluation of the final model</h1>

As in the previous notebooks, Google Drive is first mounted in Colab.

In [None]:
from google.colab import drive

# Mount Google Drive
drive.mount('drive', force_remount=True)

Mounted at drive


Next, the required libraries are imported (a similar set to the previous notebooks).

In [None]:
import os
import zipfile
from shutil import copyfile

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Metrics for evaluation
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score, precision_score, f1_score, recall_score, confusion_matrix, matthews_corrcoef

from tensorflow.keras.models import load_model
import numpy as np

from scipy.stats import mode

## 9.1 Test data preparation

In [None]:
# Define a function to extract the class label from the image file name
def extract_label(filename):
    parts = filename.split('_')
    if len(parts) >= 3:
        return parts[-2]  # Extract the label number
    else:
        return None
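As an illustration, the snippet below applies `extract_label` to a few made-up filenames. These names are hypothetical; the actual naming scheme of the enhanced test images is not shown in this notebook. The function is repeated so the snippet runs on its own. The label is the second-to-last underscore-separated field, returned as a string, and names with fewer than three fields yield `None` (and are skipped later).

```python
def extract_label(filename):
    # Same logic as above: the label is the second-to-last '_'-separated field
    parts = filename.split('_')
    if len(parts) >= 3:
        return parts[-2]
    return None

# Hypothetical filenames, for illustration only
for name in ["img_00042_14_enhanced.png", "00001_3_x.png", "no_label.png"]:
    print(name, "->", extract_label(name))
```

Because the label is kept as a string, it is used directly as the name of the class subdirectory in the next step.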

The test images are extracted from the ZIP archive stored in Google Drive.

In [None]:
# Define the paths
test_zip_path = 'drive/MyDrive/final/test_data/test_images_enhanced_PIL_RRDB.zip'
test_extracted_dir = 'extracted_test_images'
test_dir = 'test_images'

# Create the directory for extracted test images if it doesn't exist
os.makedirs(test_extracted_dir, exist_ok=True)

# Open the ZIP file and extract its contents
with zipfile.ZipFile(test_zip_path, 'r') as zip_ref:
    zip_ref.extractall(test_extracted_dir)

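The extracted images are then copied into a local `test_images` directory, with one subdirectory per class label, which is the layout that `flow_from_directory` expects.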

In [None]:
# List all files in the extracted directory
test_image_files = os.listdir(test_extracted_dir)

# Create one subdirectory per class label and copy each image into it
for filename in test_image_files:
    if filename.endswith('.png'):
        label = extract_label(filename)
        if label is not None:
            class_dir = os.path.join(test_dir, label)
            os.makedirs(class_dir, exist_ok=True)
            src_path = os.path.join(test_extracted_dir, filename)
            dst_path = os.path.join(class_dir, filename)
            copyfile(src_path, dst_path) # shutil
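A quick sanity check after copying is to count the `.png` files in each class subdirectory. The classification reports below show a support of 293 images for each of the 43 classes (43 x 293 = 12599, matching the generator output), so every directory should hold 293 files. The helper `count_images_per_class` is a hypothetical addition, not part of the original notebook.

```python
import os

def count_images_per_class(root):
    """Return {class_dir_name: number of .png files} for each subdirectory of root."""
    counts = {}
    for class_name in sorted(os.listdir(root)):
        class_path = os.path.join(root, class_name)
        if os.path.isdir(class_path):  # ignore stray files at the top level
            counts[class_name] = sum(
                1 for f in os.listdir(class_path) if f.endswith('.png')
            )
    return counts

# Possible usage in this notebook (assuming test_dir = 'test_images' as above):
# counts = count_images_per_class('test_images')
# print(len(counts), sum(counts.values()))
```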

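The organised images are then fed to a data generator, which rescales pixel values to the range [0, 1] and resizes every image to 128 x 128 pixels.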

In [None]:
# Create a data generator for the test data
test_datagen = ImageDataGenerator(rescale=1.0/255)

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(128,128),
    batch_size=32,
    class_mode='categorical',
    shuffle=False  # Set shuffle to False for testing data
)

Found 12599 images belonging to 43 classes.


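The true labels are taken from the generator. Because `shuffle=False`, they follow the same file order as the model predictions made below.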

In [None]:
# Get the true labels from the test generator
true_labels = test_generator.classes
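Note that `test_generator.classes` holds class *indices*, not folder names. Keras' `flow_from_directory` numbers the class subdirectories in alphanumeric order of their names, so with folders named `0` to `42` the folder `10` receives index 2, not 10. The sketch below reproduces that ordering with `sorted()`, assuming the extracted labels are plain integers without leading zeros (an assumption; the filenames are not shown). In the notebook itself the same mapping is available as `test_generator.class_indices`.

```python
# Assumed folder names created by the copy step above (one per label, 0..42)
folder_names = [str(i) for i in range(43)]

# flow_from_directory sorts class subdirectories alphanumerically and
# numbers them in that order (this mirrors test_generator.class_indices)
class_indices = {name: idx for idx, name in enumerate(sorted(folder_names))}
index_to_label = {idx: int(name) for name, idx in class_indices.items()}

print(class_indices['10'])  # 2
print(index_to_label[2])    # 10
```

This does not affect the metrics as long as the models were trained with the same directory-based indexing, but it matters when reading the per-class rows of the classification reports.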

## 9.2 Preparing the source models

**<h2>Source model 1 - ResNet50 model with RRDB-enhanced data</h2>**

The best-performing ResNet50 model is loaded and used to make predictions on the test data:

In [None]:
# Load the trained model
model_path_RR = '/content/drive/My Drive/final/training_models/resnet50/RRDB/final_resnet_model_RRDB.keras'
model_RR = load_model(model_path_RR)

# Make predictions on the test data
predictions_RR = model_RR.predict(test_generator)

# Convert predictions to class labels
predicted_labels_RR = np.argmax(predictions_RR, axis=1)



The evaluation metrics are then calculated.

In [None]:
# Calculate accuracy
accuracy_RR = accuracy_score(true_labels, predicted_labels_RR)
print("Accuracy:", accuracy_RR)

# Calculate F1-score and recall
f1_RR = f1_score(true_labels, predicted_labels_RR, average='weighted')
recall_RR = recall_score(true_labels, predicted_labels_RR, average='weighted')

print("F1-score:", f1_RR)
print("Recall:", recall_RR)

report_RR = classification_report(true_labels, predicted_labels_RR)
print(report_RR)

Accuracy: 0.9847606952932773
F1-score: 0.9846397264412138
Recall: 0.9847606952932773
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       293
           1       0.99      1.00      0.99       293
           2       1.00      1.00      1.00       293
           3       0.96      1.00      0.98       293
           4       1.00      1.00      1.00       293
           5       1.00      1.00      1.00       293
           6       1.00      1.00      1.00       293
           7       1.00      0.99      1.00       293
           8       1.00      1.00      1.00       293
           9       1.00      1.00      1.00       293
          10       1.00      0.99      0.99       293
          11       1.00      1.00      1.00       293
          12       0.99      0.99      0.99       293
          13       0.99      1.00      0.99       293
          14       0.94      0.99      0.97       293
          15       0.96      0.97      0.96       
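The weighted recall printed above is identical to the accuracy (0.98476...). This always holds for single-label multiclass classification: weighting each class recall TP_k / n_k by its support share n_k / N gives the sum of TP_k over N, which is exactly the accuracy. A small check on toy labels (invented for illustration, deliberately imbalanced):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels, invented for illustration (class supports 3, 2 and 5)
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 2, 2])

acc = accuracy_score(y_true, y_pred)
weighted_recall = recall_score(y_true, y_pred, average='weighted')
print(acc, weighted_recall)  # equal up to floating-point rounding
```

Weighted recall therefore adds no information beyond accuracy; weighted F1 and precision can still differ.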

In [None]:
mcc_rr = matthews_corrcoef(true_labels, predicted_labels_RR)
print("Matthews Correlation Coefficient (RR Model):", mcc_rr)

Matthews Correlation Coefficient (RR Model): 0.9844173536175891
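For more than two classes, `matthews_corrcoef` uses the multiclass generalisation of MCC, computed from the confusion matrix C: with c the number of correct predictions (the trace of C), s the number of samples, t_k the number of samples whose true class is k and p_k the number predicted as k, MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2)). The sketch below implements that formula with `confusion_matrix` (already imported in this notebook) and compares it with scikit-learn on toy labels; `multiclass_mcc` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC from the confusion matrix (rows = true, columns = predicted)."""
    C = confusion_matrix(y_true, y_pred).astype(np.float64)
    t = C.sum(axis=1)  # t_k: samples whose true class is k
    p = C.sum(axis=0)  # p_k: samples predicted as class k
    c = np.trace(C)    # correctly classified samples
    s = C.sum()        # total number of samples
    numerator = c * s - np.dot(p, t)
    denominator = np.sqrt((s**2 - np.dot(p, p)) * (s**2 - np.dot(t, t)))
    return 0.0 if denominator == 0 else numerator / denominator

# Toy labels, invented for illustration
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
print(multiclass_mcc(y_true, y_pred), matthews_corrcoef(y_true, y_pred))
```

Unlike accuracy, MCC accounts for all cells of the confusion matrix, which is why it is reported alongside it for each model.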


**<h2>Source model 2 - CNN model with RRDB-enhanced data</h2>**

Because the CNN model was trained on greyscale images, a new data generator with `color_mode='grayscale'` is created before the model is loaded.

In [None]:
test_datagen = ImageDataGenerator(rescale=1.0/255)

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(128,128),
    batch_size=32,
    class_mode='categorical',
    color_mode='grayscale',
    shuffle=False  # Set shuffle to False for testing data
)

Found 12599 images belonging to 43 classes.
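With `color_mode='grayscale'`, each image is loaded with a single channel, so the CNN receives arrays of shape (128, 128, 1) instead of (128, 128, 3). Keras' image loader performs this by converting to PIL's `'L'` mode, which PIL documents as the ITU-R 601-2 luma transform L = 0.299 R + 0.587 G + 0.114 B. The sketch below approximates that conversion in NumPy on a random image; it is an illustration only and does not reproduce PIL's exact integer rounding.

```python
import numpy as np

# Random RGB image standing in for one 128x128 test image (illustration only)
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(128, 128, 3)).astype(np.float32)

# ITU-R 601-2 luma weights, as documented for PIL's convert('L')
weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
gray = rgb @ weights                   # shape (128, 128)
gray = gray[..., np.newaxis] / 255.0   # add channel axis, then apply rescale=1.0/255

print(rgb.shape, gray.shape)
```

Since the new generator reads the same directory with `shuffle=False`, its file order, and therefore `true_labels`, is unchanged.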


The CNN model is then used to make predictions on the test data.

In [None]:
model_path_CNN = '/content/drive/My Drive/final/training_models/CNN/RRDB/final_CNN_model_RRDB.keras'
model_CNN = load_model(model_path_CNN)

# Make predictions on the test data using the second model
predictions_CNN = model_CNN.predict(test_generator)

# Convert predictions to class labels
predicted_labels_CNN = np.argmax(predictions_CNN, axis=1)



The evaluation metrics are then calculated.

In [None]:
# Calculate accuracy for the second model
accuracy_CNN = accuracy_score(true_labels, predicted_labels_CNN)
print("Accuracy for CNN model:", accuracy_CNN)

# Calculate F1-score and recall for the second model
f1_CNN = f1_score(true_labels, predicted_labels_CNN, average='weighted')
recall_CNN = recall_score(true_labels, predicted_labels_CNN, average='weighted')

print("F1-score for CNN model:", f1_CNN)
print("Recall for CNN model:", recall_CNN)

report_CNN = classification_report(true_labels, predicted_labels_CNN)
print("Classification Report for CNN model:\n", report_CNN)

Accuracy for CNN model: 0.927057702992301
F1-score for CNN model: 0.9272350610183079
Recall for CNN model: 0.927057702992301
Classification Report for CNN model:
               precision    recall  f1-score   support

           0       0.98      0.95      0.96       293
           1       0.98      0.97      0.97       293
           2       0.98      1.00      0.99       293
           3       0.95      0.99      0.97       293
           4       1.00      0.96      0.98       293
           5       1.00      1.00      1.00       293
           6       0.99      0.98      0.99       293
           7       0.96      0.99      0.98       293
           8       0.94      0.98      0.96       293
           9       1.00      0.96      0.98       293
          10       0.98      0.94      0.96       293
          11       0.70      0.96      0.81       293
          12       0.97      0.99      0.98       293
          13       0.81      0.89      0.85       293
          14       0.84   

In [None]:
mcc_cnn = matthews_corrcoef(true_labels, predicted_labels_CNN)
print("Matthews Correlation Coefficient (CNN Model):", mcc_cnn)

Matthews Correlation Coefficient (CNN Model): 0.9253989832259155


## 9.3 Ensemble

The predictions of the two models are combined by majority (hard) voting.

In [None]:
# Combine predictions from both models using majority voting
combined_predictions = np.vstack((predicted_labels_RR, predicted_labels_CNN)).T
ensemble_predictions = mode(combined_predictions, axis=1).mode.flatten()
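With only two models, majority voting has a catch: whenever the two predictions differ, each label receives one vote, and `scipy.stats.mode` resolves the tie by returning the smallest value. The toy example below (predictions invented for illustration) shows that the ensemble then follows whichever model predicted the lower class index, not the stronger model.

```python
import numpy as np
from scipy.stats import mode

# Toy predictions, invented for illustration
labels_strong = np.array([0, 1, 2, 3, 4])  # e.g. the stronger model
labels_weak   = np.array([0, 2, 2, 1, 5])  # e.g. the weaker model

combined = np.vstack((labels_strong, labels_weak)).T
# np.ravel handles both the keepdims=True and keepdims=False result shapes
ensemble = np.ravel(mode(combined, axis=1).mode)
print(ensemble)  # [0 1 2 1 4]
```

In the fourth sample the weaker model's label (1) wins only because it is smaller than 3.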

The evaluation metrics are then calculated.

In [None]:
accuracy = accuracy_score(true_labels, ensemble_predictions)
print("Accuracy:", accuracy)

Accuracy: 0.9516628303833637


In [None]:
report = classification_report(true_labels, ensemble_predictions)
print("Classification Report:\n", report)

Classification Report:
               precision    recall  f1-score   support

           0       0.97      1.00      0.99       293
           1       0.98      1.00      0.99       293
           2       0.98      1.00      0.99       293
           3       0.94      1.00      0.97       293
           4       0.99      0.99      0.99       293
           5       1.00      1.00      1.00       293
           6       0.99      1.00      0.99       293
           7       0.99      1.00      0.99       293
           8       0.94      1.00      0.97       293
           9       1.00      0.99      0.99       293
          10       0.98      0.99      0.99       293
          11       0.72      1.00      0.84       293
          12       0.97      1.00      0.98       293
          13       0.82      0.97      0.89       293
          14       0.85      0.94      0.89       293
          15       0.91      0.96      0.94       293
          16       0.85      0.77      0.81       293
   

In [None]:
macro_precision = precision_score(true_labels, ensemble_predictions, average='macro')
macro_recall = recall_score(true_labels, ensemble_predictions, average='macro')
macro_f1 = f1_score(true_labels, ensemble_predictions, average='macro')

micro_precision = precision_score(true_labels, ensemble_predictions, average='micro')
micro_recall = recall_score(true_labels, ensemble_predictions, average='micro')
micro_f1 = f1_score(true_labels, ensemble_predictions, average='micro')


In [None]:
print("Macro Precision:", macro_precision)
print("Macro Recall:", macro_recall)
print("Macro F1-Score:", macro_f1)

print("Micro Precision:", micro_precision)
print("Micro Recall:", micro_recall)
print("Micro F1-Score:", micro_f1)

Macro Precision: 0.9552085583707138
Macro Recall: 0.9516628303833637
Macro F1-Score: 0.9517023161113864
Micro Precision: 0.9516628303833637
Micro Recall: 0.9516628303833637
Micro F1-Score: 0.9516628303833637
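All three micro-averaged scores equal the accuracy (0.95166...). In single-label multiclass classification each misclassification counts as exactly one false positive (for the predicted class) and one false negative (for the true class), so micro precision, micro recall and micro F1 all reduce to accuracy. Macro recall is also identical here because every class has the same support (293 images), so the unweighted mean of the per-class recalls equals the support-weighted one. A toy check with balanced labels (invented for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Balanced toy labels (3 samples per class), invented for illustration
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0])

acc = accuracy_score(y_true, y_pred)
micro = [score(y_true, y_pred, average='micro')
         for score in (precision_score, recall_score, f1_score)]
macro_recall = recall_score(y_true, y_pred, average='macro')
print(acc, micro, macro_recall)
```

Macro precision and macro F1 are the only averaged scores above that carry information beyond accuracy for this balanced test set.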


In [None]:
mcc = matthews_corrcoef(true_labels, ensemble_predictions)
print("Matthews Correlation Coefficient:", mcc)

Matthews Correlation Coefficient: 0.9506084499589846


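The combined result (accuracy 0.952, MCC 0.951) is not as good as that of the best individual model (ResNet50: accuracy 0.985, MCC 0.984). This is expected, as the **ResNet model is significantly better than the CNN model** (accuracy 0.927): with only two voters, every disagreement is a tie, so the vote cannot systematically side with the stronger model.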
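A common alternative to hard voting, not used in this notebook, is soft voting: average the predicted class probabilities, optionally giving the stronger model a larger weight, and take the argmax. Disagreements are then decided by the models' confidences rather than by tie-breaking. The sketch below uses invented probability arrays; in this notebook the inputs would be `predictions_RR` and `predictions_CNN` (whose rows align because both generators use `shuffle=False`). The helper `soft_vote` and the weights are hypothetical, and any weights would need to be chosen on validation data rather than on the test set.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Weighted average of per-model class-probability arrays, then argmax."""
    probs = np.stack(prob_list)  # shape (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    # Contract the model axis: (n_models,) x (n_models, n_samples, n_classes)
    avg = np.tensordot(weights / weights.sum(), probs, axes=1)
    return np.argmax(avg, axis=1)

# Invented softmax outputs for 2 samples and 3 classes
probs_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
probs_b = np.array([[0.1, 0.8, 0.1], [0.2, 0.5, 0.3]])
print(soft_vote([probs_a, probs_b]))                     # [1 2]
print(soft_vote([probs_a, probs_b], weights=[0.7, 0.3]))  # [0 2]
```

Weighting the first model more heavily flips the first sample to its prediction, which hard voting with two equal votes cannot do.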