### Framework Testing Introduction

**Welcome to the Framework introduction**
In this notebook we follow a few steps to create a testing environment for an image-recognition approach called ensemble learning, which we use to filter a large dataset down to a small set of likely individuals.

To initialize this Framework, make sure you have the Framework folder and its contents:
* Framework validation folder.
* Models folder.
* Full Class Model folder.

#### Framework validation
This folder contains one picture for each individual: 2000 pictures in total, one sample per class. These pictures were not used in training and serve to validate the models and this framework.

#### Models Folder
This folder contains the trained models for the classes we will be working with: 100 models in total, each trained with the same parameters for 10 epochs. It also contains a dictionary of dictionaries that maps the outputs of each trained model to class names.

#### Full Class Model Folder
This folder contains a single model trained on all 2000 classes for 100 epochs. It is used as a baseline to compare against the results of our ensemble learning approach.


In [1]:
import os
import tkinter as tk
from tkinter import filedialog
import numpy as np



### Cell 1

We start by loading only the libraries needed at this stage. The remaining libraries are loaded as they are needed, so that the thought process behind this framework is easier to follow.

* os - Used to handle file system paths, so that paths are built correctly on each computer that runs this notebook.

* tkinter - Used to create dialog windows so the user can locate the framework on their machine. This sets up the paths to all files used in this demonstration.

* numpy - Used to load our dictionaries and to work with images as arrays for further processing.

##### Please note that the dialog box may not appear on top of other windows, depending on your operating system. You can find it with "alt + tab" or any similar way of browsing through the running applications.

In [2]:
# Create the root window and hide it
root = tk.Tk()
root.withdraw()

# Create a hidden helper window that stays on top, so the dialog opens in front of other windows
dialog = tk.Toplevel(root)
dialog.lift()
dialog.attributes('-topmost', True)
dialog.withdraw()
# Ask the user to select the Models folder
models_folder = filedialog.askdirectory(parent=dialog, title="Select Models Folder")

# Repeat for the validation image: this dialog asks for a file instead of a folder
dialog = tk.Toplevel(root)
dialog.lift()
dialog.attributes('-topmost', True)
dialog.withdraw()
image_path = filedialog.askopenfilename(parent=dialog)

# Normalize the path, since other paths are built relative to it
models_folder = os.path.normpath(models_folder)
# Save the image name (without extension) for later comparison
image_name = os.path.basename(image_path)
image_name = os.path.splitext(image_name)[0]
# Path of the dictionary file, relative to the Models folder
labels_file = os.path.join(models_folder, "labels.npy")

# Display the results: the name of the image, the path of the Models folder and the path of the dictionary file
print("INDIVIDUAL LOADED is " + image_name)
print("MODELS FOLDER LOADED IN  " + models_folder)
print("Dictionary name updated as "+ labels_file)

INDIVIDUAL LOADED is 222
MODELS FOLDER LOADED IN  C:\Users\aliss\OneDrive\Área de Trabalho\last Semester\Industry Project\dataset\Digiface\FRAMEWORK\Models
Dictionary name updated as C:\Users\aliss\OneDrive\Área de Trabalho\last Semester\Industry Project\dataset\Digiface\FRAMEWORK\Models\labels.npy


### Cell 2

Running this cell stores the paths to everything we will be working with.
First a window opens for the user to select the Models folder, then a second one to select a picture from the Framework validation folder.

The cell saves the name of the selected image, the path of the Models folder, and the path of the labels file needed to interpret the results of every model.

The output shows these values as they appear on your system.
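
The path handling in this cell can be tried without opening any dialog. The following is a minimal sketch, assuming hypothetical paths (`FRAMEWORK/Models/` and a `222.jpg` file) in place of what the dialogs return. The `require_selection` helper is not part of the notebook; it is an illustration of how a cancelled dialog, which returns an empty value in tkinter, could be caught early.

```python
import os


def require_selection(path, what):
    """Raise a clear error if a dialog was cancelled (it then returns an empty value)."""
    if not path:
        raise ValueError(f"No {what} selected")
    return path


# Hypothetical paths, standing in for what the two dialogs would return.
models_folder = os.path.normpath(require_selection("FRAMEWORK/Models/", "Models folder"))
image_path = require_selection(os.path.join("FRAMEWORK", "validation", "222.jpg"), "image")

# The file name without its extension identifies the individual.
image_name = os.path.splitext(os.path.basename(image_path))[0]

# The label dictionary is expected to sit inside the Models folder.
labels_file = os.path.join(models_folder, "labels.npy")

print(image_name)
print(labels_file)
```

`os.path.normpath` removes the trailing separator and uses the separator of the current operating system, which is why the joined path looks right on both Windows and Linux.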

In [4]:
from PIL import Image


# Resize the image to match the input size expected by the models
image = Image.open(image_path)
image = image.resize((112, 112))

# Convert the image to RGB mode if it's not already
if image.mode != "RGB":
    image = image.convert("RGB")

# Convert the image to an array and normalize the pixel values
image_array = np.array(image) / 255.0

# Add an extra dimension to the image array to match the expected input shape of (None, 112, 112, 3)
image_array = np.expand_dims(image_array, axis=0)

print(image_array.shape)  # Verify the shape of the image array



(1, 112, 112, 3)


### Cell 2

We load one more library to read the image.

* PIL - Loads images and performs some of the operations needed for image preprocessing.

After running this cell, the image is loaded, resized to 112x112, converted to RGB if necessary, and turned into a normalized array with the shape the trained models expect.

The output shows the shape of the image array, `(1, 112, 112, 3)`, which is the format accepted by the models.
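
The normalization and the extra batch dimension can be checked on their own. The sketch below assumes a synthetic 112x112 RGB array of random noise instead of a real picture, and the helper name `to_model_input` is hypothetical. It also casts to `float32`, which is an assumption made here and not something the notebook does.

```python
import numpy as np


def to_model_input(pixels):
    """Scale 8-bit pixel values to [0, 1] and add a leading batch axis."""
    # float32 is an assumption of this sketch; the notebook divides without casting.
    scaled = np.asarray(pixels, dtype=np.float32) / 255.0
    return np.expand_dims(scaled, axis=0)


# Synthetic stand-in for a 112x112 RGB image (random noise, not real data).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(112, 112, 3), dtype=np.uint8)

batch = to_model_input(fake_image)
print(batch.shape)
```

The leading axis of size 1 is the batch dimension: the models take a batch of images, so a single image is passed as a batch of one.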

In [5]:
# Load the label dictionary
label_dict = np.load(labels_file, allow_pickle=True).item()

# Get the total number of keys in the dictionary (one per model)
total_keys = len(label_dict)

print(total_keys)


100


### Cell 3

The labels dictionary is loaded from the path saved earlier.
To show that it loaded properly, we print the total number of keys.
We expect 100 inner dictionaries, one for each trained model.
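
For readers who want to see how a dictionary of dictionaries ends up in a `.npy` file, here is a small round trip. It is a sketch with made-up contents: the structure (model name, then class index, then individual name) is inferred from how `label_dict` is used later in the notebook, and the example names are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical label dictionary: model name -> {class index -> individual name}.
example_labels = {
    "batch_1": {0: "1001", 1: "1002"},
    "batch_2": {0: "1015", 1: "1016"},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.npy")
    # np.save wraps the dict in a 0-d object array, which is pickled on disk.
    np.save(path, example_labels)
    # allow_pickle=True is required to read it back; .item() unwraps the dict.
    loaded = np.load(path, allow_pickle=True).item()

print(len(loaded), loaded["batch_1"][0])
```

Because the file is pickled, `allow_pickle=True` should only be used on files from a trusted source.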

In [7]:
import sys

# Calculate the memory size of the label_dict (our dictionary of dictionaries.)
label_dict_size = sys.getsizeof(label_dict)

# Get the file size in bytes
file_size = os.path.getsize(labels_file)

# Display the size in memory
print("Memory Size of label_dict:", label_dict_size, "bytes")

# Display the file size in disk
print("Disc Size of label_dict.npy:", file_size, "bytes")


Memory Size of label_dict: 4696 bytes
Disc Size of label_dict.npy: 33041 bytes


### Cell 4

Here we show the size of the dictionary in memory and on disk. The point is that this way of mapping individuals uses very little memory: the reported size stays at four digits (in bytes) even if the file on disk were much larger. Note, however, that `sys.getsizeof` counts only the outer dictionary object, not the inner dictionaries or the values they hold, so the figure understates the total memory used by the labels.

* sys - Provides `getsizeof`, which reports the size of an object in memory. The size of the file on disk comes from `os.path.getsize`.
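
To compare the two ways of measuring, the sketch below adds a recursive size function next to the plain `sys.getsizeof` call. It is an approximation under stated assumptions: `deep_getsizeof` is a hypothetical helper, it only follows dicts, lists, tuples and sets, and the nested dictionary is synthetic (100 models with 20 classes each, which is a guess derived from 2000 classes split over 100 models).

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Approximate total size of obj, following nested dicts, lists, tuples and sets."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # do not count a shared object twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


# Synthetic stand-in for label_dict: 100 models, 20 classes each (assumed split).
nested = {f"batch_{i}": {j: str(i * 20 + j) for j in range(20)} for i in range(1, 101)}

shallow = sys.getsizeof(nested)
deep = deep_getsizeof(nested)
print(shallow, deep)
```

The shallow figure depends only on the number of keys in the outer dictionary, while the deep figure grows with the contents.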

In [None]:
import win32api
from keras.models import load_model

# List all model files ('.h5') and create a dictionary named models
model_files = sorted([file for file in os.listdir(models_folder) if file.endswith(".h5")])
models = {}

# Iterate through the files, loading each model and storing it under its file name
for model_file in model_files:
    model_path = os.path.join(models_folder, model_file)
    # The short (8.3) path avoids problems with non-ASCII characters in the path
    short_path = win32api.GetShortPathName(model_path)
    model = load_model(short_path, compile=False)
    model_name = os.path.splitext(model_file)[0]  # File name without the extension is our key
    models[model_name] = model


### Cell 5

This cell loads all our models into a dictionary that maps each model name to the loaded model.

* win32api - Used to convert each path to its short (8.3) form before loading. This works around character-encoding problems (for example, accented characters in the path) that could prevent the models from loading on some systems. It is available on Windows only.
* keras.models - Reads the model files and returns model objects that we can use with the rest of Keras in the following steps.
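
Since `win32api` exists only on Windows, the cell above fails on other systems at the import. One possible way around this, shown as a sketch and not as part of the framework, is a small helper that uses the short path when it can and otherwise returns the path unchanged. The name `loadable_path` is hypothetical.

```python
import os
import sys
import tempfile


def loadable_path(path):
    """Return a path to hand to the model loader.

    On Windows, try the short (8.3) form via win32api, which usually avoids
    non-ASCII characters. Elsewhere, or if the call fails, return the path unchanged.
    """
    if sys.platform == "win32":
        try:
            import win32api  # provided by the pywin32 package

            return win32api.GetShortPathName(path)
        except Exception:
            pass  # pywin32 missing or path not found: fall back to the original path
    return path


# Try it on a temporary file with the same extension as the models.
with tempfile.NamedTemporaryFile(suffix=".h5") as handle:
    resolved = loadable_path(handle.name)
    print(os.path.exists(resolved))
```

With such a helper, the loop would call `load_model(loadable_path(model_path), compile=False)` and the top-level `import win32api` would no longer be needed.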

In [19]:
# Create a dictionary of best results
best_results = {}

# Iterate through all models, predicting the selected image with each one.
# Take the label that corresponds to the prediction and save or update it under the model name.
for model_name, model in models.items():
    
    prediction = model.predict(image_array)
    predicted_label = np.argmax(prediction)
    accuracy = prediction[0][predicted_label]
    label = label_dict[model_name][predicted_label]
    
    # Check if model is already present in best results
    if model_name not in best_results:
        best_results[model_name] = (accuracy, label)
    else:
        # Update best result if current accuracy is higher
        best_accuracy, _ = best_results[model_name]
        if accuracy > best_accuracy:
            best_results[model_name] = (accuracy, label)

# Display the best results
for model_name, (accuracy, label) in best_results.items():
    print(f"Model name: {model_name}, Accuracy: {accuracy}, Label: {label}")



Model name: batch_1, Accuracy: 0.9956525564193726, Label: 1001
Model name: batch_10, Accuracy: 0.9959709048271179, Label: 116
Model name: batch_100, Accuracy: 0.6309670805931091, Label: 988
Model name: batch_11, Accuracy: 0.9981791973114014, Label: 1180
Model name: batch_12, Accuracy: 0.9987255930900574, Label: 1196
Model name: batch_13, Accuracy: 0.7034862041473389, Label: 1230
Model name: batch_14, Accuracy: 0.8203438520431519, Label: 1242
Model name: batch_15, Accuracy: 0.9537598490715027, Label: 1262
Model name: batch_16, Accuracy: 0.7583553791046143, Label: 128
Model name: batch_17, Accuracy: 0.9359960556030273, Label: 1294
Model name: batch_18, Accuracy: 0.5355360507965088, Label: 1314
Model name: batch_19, Accuracy: 0.8710424304008484, Label: 1330
Model name: batch_2, Accuracy: 0.9038170576095581, Label: 1015
Model name: batch_20, Accuracy: 0.5058407187461853, Label: 1350
Model name: batch_21, Accuracy: 0.8590993881225586, Label: 1374
Model name: batch_22, Accuracy: 0.9875842928

### Cell 6

In this cell we run the prediction for each model. For every model we store its top prediction in a dictionary: the score the model gives to its top class (called accuracy in this notebook) and the label that score corresponds to. At the end we display the best prediction of every model.
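
The step from a raw prediction to a label can be followed on a tiny example. The sketch below uses made-up output vectors in place of `model.predict(image_array)` and a made-up label dictionary, so no model files are needed. All names and numbers in it are hypothetical.

```python
import numpy as np

# Hypothetical model outputs, shaped (1, n_classes) like model.predict(image_array).
fake_predictions = {
    "batch_1": np.array([[0.10, 0.85, 0.05]]),
    "batch_2": np.array([[0.40, 0.35, 0.25]]),
}
# Hypothetical label dictionary: model name -> {class index -> individual name}.
fake_labels = {
    "batch_1": {0: "1001", 1: "1002", 2: "1003"},
    "batch_2": {0: "1015", 1: "1016", 2: "1017"},
}

top_per_model = {}
for model_name, prediction in fake_predictions.items():
    index = int(np.argmax(prediction[0]))     # class with the highest score
    confidence = float(prediction[0][index])  # the value the notebook calls "accuracy"
    top_per_model[model_name] = (confidence, fake_labels[model_name][index])

print(top_per_model)
```

Each model always returns some top class, even for an individual it was never trained on, which is why every model contributes one candidate.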



In [20]:
# Select the best accuracy number
best_model = max(best_results, key=lambda x: best_results[x][0])
best_accuracy, best_label = best_results[best_model]

# Display the best accuracy number, label, model name, and image name
print("Best Model Name:", best_model)
print("Best Accuracy:", best_accuracy)
print("Label:", best_label)
print("Image Name:", image_name)


Best Model Name: batch_86
Best Accuracy: 0.9999958
Label: 746
Image Name: 200


### Cell 7

This cell shows the first findings.
The model with the highest accuracy will most likely not be the one that corresponds to the chosen individual. This shows that the approach is not a precise identifier of a unique individual among 2000 different individuals.

The output displays the model with the best accuracy, the value of that accuracy, and the label it corresponds to.
It also displays the name of the image tested against all models, so we can tell whether the result is correct.

In [21]:
# Find the model corresponding to the image name

model_name_for_image = None

for model_name, (accuracy, label) in best_results.items():
    if label == image_name:
        model_name_for_image = model_name
        break

# Check if the model exists and display the accuracy
if model_name_for_image:
    accuracy_for_image, label_for_image = best_results[model_name_for_image]
    print(f"Model name: {model_name_for_image}, Accuracy: {accuracy_for_image}, Label: {label_for_image}")
else:
    print(f"No model found for image: {image_name}")

Model name: batch_56, Accuracy: 0.9959650039672852, Label: 200


### Cell 8

In this cell we check whether the model that was actually trained on the selected individual appears in our results at all.
If so, the cell retrieves the model name, its accuracy, and the label, and shows them as output.
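
The difference between the overall winner and the model that proposes the correct individual is easy to see on a small made-up `best_results`. This is a sketch with hypothetical values. The `str(label)` conversion is an assumption added here so that the comparison also works if the labels were stored as integers.

```python
# Hypothetical best_results: model name -> (confidence, predicted label).
best_results = {
    "batch_1": (0.91, "1002"),
    "batch_2": (0.99, "1015"),
    "batch_3": (0.97, "200"),
}
image_name = "200"

# Overall winner across all models (what Cell 7 reports).
best_model = max(best_results, key=lambda name: best_results[name][0])

# Model whose top prediction is the individual in the image (what Cell 8 reports).
matching = next(
    (name for name, (_, label) in best_results.items() if str(label) == image_name),
    None,
)
print(best_model, matching)
```

In this example the two differ: the correct individual is proposed with a high score, but another model is even more confident about a wrong one.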


In [12]:
# Compare the maximum, second-best, and minimum accuracies
accuracies = [accuracy for accuracy, _ in best_results.values()]
best_accuracy_all_models = max(accuracies)
second_best_accuracy_all_models = sorted(accuracies, reverse=True)[1]
worst_accuracy_all_models = min(accuracies)

# Calculate the percentage differences
accuracy_for_image = best_results.get(model_name_for_image)
if accuracy_for_image:
    max_difference_percentage = (best_accuracy_all_models - accuracy_for_image[0]) / best_accuracy_all_models * 100
    second_max_difference_percentage = (second_best_accuracy_all_models - accuracy_for_image[0]) / second_best_accuracy_all_models * 100
    min_difference_percentage = (accuracy_for_image[0] - worst_accuracy_all_models) / worst_accuracy_all_models * 100

    print(f"Maximum accuracy difference: {max_difference_percentage:.5f}%")
    print(f"Second-best accuracy difference: {second_max_difference_percentage:.5f}%")
    print(f"Minimum accuracy difference: {min_difference_percentage:.5f}%")
else:
    print(f"No accuracy found for model: {model_name_for_image}")



Maximum accuracy difference: 0.08328%
Second-best accuracy difference: 0.08270%
Minimum accuracy difference: 256.64675%


### Cell 9

In this cell we compare the accuracy obtained for the actual individual with the best, second-best, and worst accuracies found.
The output displays each difference as a percentage.

The difference to the best is expected to be much smaller than the difference to the worst.
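
The percentages are relative differences, each divided by the value it is compared against. The sketch below restates the two formulas as hypothetical helper functions and applies them to round, made-up numbers.

```python
def percent_below(target, reference):
    """How far target lies below reference, as a percentage of reference."""
    return (reference - target) / reference * 100


def percent_above(target, reference):
    """How far target lies above reference, as a percentage of reference."""
    return (target - reference) / reference * 100


# Made-up scores: the true individual, the best result and the worst result.
target, best, worst = 0.90, 1.00, 0.30

gap_to_best = percent_below(target, best)
gap_to_worst = percent_above(target, worst)
print(f"{gap_to_best:.2f}% below the best, {gap_to_worst:.2f}% above the worst")
```

Because the last figure is divided by the worst score, it can exceed 100% whenever the target is more than twice the worst value, as in the output above.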


In [13]:
# Count the number of results with accuracy equal to or greater than the tested image
count_high_accuracy = sum(accuracy >= accuracy_for_image[0] for accuracy in accuracies)
print(f"Number of results with accuracy >= {accuracy_for_image[0]}: {count_high_accuracy}")




Number of results with accuracy >= 0.999163031578064: 8


### Cell 10

More comparisons: here we check which position the correct result would occupy when all results are ranked by accuracy.
This means the framework narrowed the search from 2000 different individuals down to this number of candidates, showing that it works as a filter, and sometimes even as a precise image filter.
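
The same idea can be written as an explicit ranking that also returns the shortlist of candidates. This is a sketch on made-up values and not the notebook's own code. Unlike the `>=` count in the cell above, it uses the position in the sorted list, so results with exactly equal scores may be handled differently.

```python
# Hypothetical best_results: model name -> (confidence, predicted label).
best_results = {
    "batch_1": (0.91, "1002"),
    "batch_2": (0.99, "1015"),
    "batch_3": (0.97, "200"),
    "batch_4": (0.98, "1350"),
}
image_name = "200"

# Rank candidates from most to least confident.
ranked = sorted(best_results.values(), key=lambda pair: pair[0], reverse=True)
labels_in_order = [label for _, label in ranked]

# 1-based position of the true individual; None if no model proposed it.
rank = labels_in_order.index(image_name) + 1 if image_name in labels_in_order else None

# Everything ranked at or above the true individual forms the shortlist.
shortlist = labels_in_order[:rank] if rank else []
print(rank, shortlist)
```

A shortlist of this kind is what a later, more precise identification step would work on.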

In [29]:
# Create a separate dialog window
dialog = tk.Toplevel(root)
dialog.lift()
dialog.attributes('-topmost', True)
dialog.withdraw()
# Open the "Select the Single model file" dialog and saves the path to it
single_model = filedialog.askopenfilename(parent=dialog)


### Cell 11

Extra comparison: this cell asks you to select the file of the model that was trained with 2000 classes for 100 epochs.
A window opens so you can find the file, located in the Full Class Model folder.

In [30]:
# save paths and load the model
singleModel_path = win32api.GetShortPathName(single_model)
single_model = load_model(singleModel_path, compile=False)

# Perform prediction on the image using the model
predictions = single_model.predict(image_array)

# Get the top 10 results
top_results = np.argsort(predictions[0])[-10:][::-1]  # Get indices of top 10 predictions

# Display the top 10 results
print(f"Accuracy of results for '{image_name}':")
for idx in top_results:
    accuracy = predictions[0][idx]
    print(f"Result: {accuracy}")

Accuracy of results for '38':
Result: 0.00054666877258569
Result: 0.000546460272744298
Result: 0.0005463571287691593
Result: 0.0005447774892672896
Result: 0.000544664217159152
Result: 0.0005443018162623048
Result: 0.0005439831875264645
Result: 0.0005438109510578215
Result: 0.0005432708421722054
Result: 0.0005423048860393465


### Cell 12

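This compares the image with the model that was trained on all 2000 classes.
As expected, the values are much smaller, and the differences between the top probabilities are small as well: each is close to 1/2000 = 0.0005, which is what a model would return if it gave every class the same probability. This suggests that the difficulty of singling out one individual, seen above as many models returning high values for the same image, is also present in a model trained on all classes.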
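
A useful reference point for these numbers is the uniform baseline, 1 divided by the number of classes. The sketch below shows the top-k selection used in the cell together with that baseline. The probabilities are synthetic (near-uniform values with a little noise), and `top_k` is a hypothetical helper name.

```python
import numpy as np


def top_k(probabilities, k=10):
    """Return (index, probability) pairs for the k largest entries, highest first."""
    order = np.argsort(probabilities)[-k:][::-1]
    return [(int(i), float(probabilities[i])) for i in order]


n_classes = 2000
uniform = 1.0 / n_classes  # probability of each class if the model knew nothing

# Synthetic output: near-uniform probabilities with tiny noise (not real model output).
rng = np.random.default_rng(0)
probs = rng.normal(loc=uniform, scale=1e-7, size=n_classes)
probs = probs / probs.sum()

best_index, best_prob = top_k(probs, k=1)[0]
print(uniform, best_prob / uniform)
```

In the output above, the highest probability (about 0.000547) is only about 1.09 times the uniform baseline of 0.0005.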