## Implementation of Final Solution

- model scores ~95% accuracy on the mixed test data, which includes MRI scans from three different datasets.
- missed-positive (false negative) rate is 0.81%; the ensemble is deliberately biased toward caution.
- false positive rate is ~10%.
- code can be copied from here to recreate the model.
- all model files should be located within the application under the same names used here.
- to run the model, all you need to install is Python 3.x, TensorFlow and NumPy.
- a prediction takes about a quarter of a second, and the input image data must have these dimensions:
    >image_height = 128
    >image_width = 128
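
As a small illustration of the input requirement above, the sketch below checks a batch shape before it is handed to the model. The helper name `check_batch_shape` is hypothetical, and the three-channel value is an assumption based on `tf.keras.utils.load_img` loading images as RGB by default.

```python
IMAGE_HEIGHT = 128
IMAGE_WIDTH = 128
CHANNELS = 3  # assumption: RGB, the default colour mode of tf.keras.utils.load_img


def check_batch_shape(shape):
    """Return True if `shape` is (batch, 128, 128, 3) with at least one image."""
    if len(shape) != 4:
        return False
    batch, height, width, channels = shape
    return (
        batch >= 1
        and height == IMAGE_HEIGHT
        and width == IMAGE_WIDTH
        and channels == CHANNELS
    )


print(check_batch_shape((1, 128, 128, 3)))  # single image with a batch axis
print(check_batch_shape((128, 128, 3)))     # batch axis missing
```

With a real tensor, the same check could be called as `check_batch_shape(tuple(img_array.shape))`.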

In [28]:
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import matplotlib.pyplot as plt

from time import time
from tensorflow.keras.models import Sequential
from collections import defaultdict
from tensorflow.keras.layers import LeakyReLU

- models trained on the two compatible datasets
- Model 1D is not required, but can be added for a strong bias against false negatives

In [4]:
model1A = keras.models.load_model('./model_1A') #no augmentation or dropout - accuracy 95.41%
model1B = keras.models.load_model('./model_1B') # high number of neurons, epochs and augmentation - accuracy 94.6%
model1C = keras.models.load_model('./model_1C') # no dropout and limited augmentation - accuracy 96.13%
# model1D = keras.models.load_model('./model_1D') # 76.15% - bias to reduce false negatives
model1E = keras.models.load_model('./model_1E') # leaky relu model - accuracy 96.64%



- models for the dataset that includes the diagnosis (tumor type)

In [5]:
model2A = keras.models.load_model('./model_2A/') # with augmentation low dropout - accuracy 98.4%
model2B = keras.models.load_model('./model_2B/') # high number of neurons and layers with augmentation low dropout - accuracy 97.56%
model2C = keras.models.load_model('./model_2C/') #leaky relu low dropout with data augmentation - accuracy 98.86%
model2D = keras.models.load_model('./model_2D/') # leaky relu higher dropout with augmentation - accuracy 97.33%



- separation model: predicts which group of models an image should be routed to

In [14]:
seperation_model = keras.models.load_model('./Model3_Seperation/') # Accuracy: 78.14% - low epochs low dropout

In [7]:
import statistics
from statistics import mode

In [11]:
YN_class_names = ['no', 'yes']
WD_class_names = ['glioma', 'meningioma', 'no', 'pituitary']
model_names = ['model_A_lrod', 'model_B_wd']

In [23]:
def Ensemble_Model_1(img_array):
    predictions_model1A = model1A.predict(img_array, verbose=None)
    CLASS_model1A = YN_class_names[np.argmax(predictions_model1A)]
    
    predictions_model1B = model1B.predict(img_array, verbose=None)
    CLASS_model1B = YN_class_names[np.argmax(predictions_model1B)]
    
    predictions_model1C = model1C.predict(img_array, verbose=None)
    CLASS_model1C = YN_class_names[np.argmax(predictions_model1C)]
    
    predictions_model1E = model1E.predict(img_array, verbose=None)
    CLASS_model1E = YN_class_names[np.argmax(predictions_model1E)]
    
    #predictions_model1D = model1D.predict(img_array, verbose=None)
    #CLASS_model1D = YN_class_names[np.argmax(predictions_model1D)]    
    
    if "yes" in [CLASS_model1A, CLASS_model1B, CLASS_model1C, CLASS_model1E]: #CLASS_model1D CAN BE added if preferred
        return "yes"
    else:
        return "no"
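
The voting rule in `Ensemble_Model_1` can be looked at on its own: a single "yes" from any member is enough for a positive result. The sketch below uses plain label lists instead of real model outputs; the helper name `any_positive` is hypothetical.

```python
def any_positive(votes, positive="yes", negative="no"):
    """Return `positive` if at least one vote is positive, otherwise `negative`."""
    return positive if positive in votes else negative


print(any_positive(["no", "no", "yes", "no"]))  # yes
print(any_positive(["no", "no", "no", "no"]))   # no
```

This rule trades false positives for fewer false negatives: a scan is reported as "no" only when every member agrees, which fits the low missed-positive rate and the higher false positive rate reported at the top of this section.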

In [24]:
def Ensemble_Model_2(img_array):
    predictions_model2A = model2A.predict(img_array, verbose=None)
    CLASS_model2A = WD_class_names[np.argmax(predictions_model2A)]
    
    predictions_model2B = model2B.predict(img_array, verbose=None)
    CLASS_model2B = WD_class_names[np.argmax(predictions_model2B)]
    
    predictions_model2C = model2C.predict(img_array, verbose=None)
    CLASS_model2C = WD_class_names[np.argmax(predictions_model2C)]
    
    predictions_model2D = model2D.predict(img_array, verbose=None)
    CLASS_model2D = WD_class_names[np.argmax(predictions_model2D)]

    
    if "glioma" in [CLASS_model2A, CLASS_model2B, CLASS_model2C, CLASS_model2D]:
        return "yes"
    elif "meningioma" in [CLASS_model2A, CLASS_model2B, CLASS_model2C, CLASS_model2D]:
        return "yes"
    elif "pituitary" in [CLASS_model2A, CLASS_model2B, CLASS_model2C, CLASS_model2D]:
        return "yes"
    #elif "yes" in [CLASS_model2A, CLASS_model2B, CLASS_model2C, CLASS_model2D]:
    #    return "yes" - CAN ADD IN FOR FINAL TEST
    else:
        return "no"
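
The three tumor branches in `Ensemble_Model_2` all lead to the same answer, so the decision can be sketched as a set-membership test. This is a simplified illustration on label lists, not a replacement for the function above; `TUMOR_CLASSES` and `collapse_to_binary` are hypothetical names.

```python
# Tumor types taken from WD_class_names; "no" is the only non-tumor class.
TUMOR_CLASSES = {"glioma", "meningioma", "pituitary"}


def collapse_to_binary(votes):
    """Return "yes" if any vote names a tumor type, otherwise "no"."""
    return "yes" if any(vote in TUMOR_CLASSES for vote in votes) else "no"


print(collapse_to_binary(["no", "glioma", "no", "no"]))  # yes
print(collapse_to_binary(["no", "no", "no", "no"]))      # no
```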

In [25]:
def final_model(img_array):
    prediction_models = seperation_model.predict(img_array, verbose=None)
    predicted_model = model_names[np.argmax(prediction_models)]
    if predicted_model == "model_A_lrod":
        return Ensemble_Model_1(img_array)
    else:
        return Ensemble_Model_2(img_array)
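
The routing step in `final_model` amounts to an argmax over the separation model's scores. The sketch below shows that step in isolation, with hand-written score lists standing in for real predictions; `pick_ensemble` is a hypothetical helper.

```python
MODEL_NAMES = ["model_A_lrod", "model_B_wd"]  # same order as model_names above


def pick_ensemble(separation_scores):
    """Return the name of the ensemble with the highest separation score."""
    best_index = max(range(len(separation_scores)), key=separation_scores.__getitem__)
    return MODEL_NAMES[best_index]


print(pick_ensemble([0.8, 0.2]))  # model_A_lrod
print(pick_ensemble([0.3, 0.7]))  # model_B_wd
```

Both ensembles return the same "yes"/"no" labels, so a routing mistake only changes the final answer when the other ensemble also misjudges the image. That may be why a separation model with about 78% accuracy does not cap the accuracy of the whole pipeline.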

In [17]:
batch_size = 64
image_height = 128
image_width = 128

In [26]:
%%time

image_path = "./Testing_YesNo/no/no50.jpg"

img = tf.keras.utils.load_img(
    image_path, target_size=(image_height, image_width)
)

img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch

prediction = final_model(img_array)

print(f"Prediction: \t {prediction}")

Prediction: 	 no
CPU times: total: 78.1 ms
Wall time: 255 ms


- wall time is well under a second (255 ms) and the prediction is correct

## Testing model on mixed data

In [27]:
%%time


# Create a dictionary to store false prediction counts for each class
false_predictions = defaultdict(int)

# Define the class names based on the folder names
class_names = sorted(os.listdir('./Testing_YesNo'))  # Replace with the path to your dataset folder

# Create a list to store true labels and predicted labels
true_labels = []
predicted_labels = []

# Specify the path to the test dataset folder
test_data_folder = './Testing_YesNo/'  # Replace with the path to your test dataset folder

# Iterate through the test dataset
for class_name in class_names:
    class_folder = os.path.join(test_data_folder, class_name)
    if not os.path.isdir(class_folder):
        continue

    for image_filename in os.listdir(class_folder):
        image_path = os.path.join(class_folder, image_filename)

        # Load and preprocess the image
        img = tf.keras.utils.load_img(image_path, target_size=(image_height, image_width))
        img_array = tf.keras.utils.img_to_array(img)
        img_array = tf.expand_dims(img_array, 0)

        # Make predictions
        predicted_class = final_model(img_array)

        # Append true and predicted labels to the lists
        true_labels.append(class_name)
        predicted_labels.append(predicted_class)
        
        # Check if the prediction is correct
        if predicted_class != class_name:
            false_predictions[class_name] += 1

# Calculate accuracy
correct_predictions = [true == pred for true, pred in zip(true_labels, predicted_labels)]
accuracy = sum(correct_predictions) / len(correct_predictions)


# Print accuracy
print(f"Accuracy: {accuracy * 100:.2f}%")

def getpercent(numerator, denominator):
    return (numerator / denominator) * 100

for class_name, wrong_count in false_predictions.items():
    class_folder = os.path.join(test_data_folder, class_name)
    files = os.listdir(class_folder)
    total_class_count = len(files)
    print(f"{class_name}: {wrong_count} / {total_class_count}")
    print("rate of failure: \t", getpercent(wrong_count, total_class_count))




Accuracy: 95.57%
no: 87 / 855
rate of failure: 	 10.175438596491228
yes: 11 / 1356
rate of failure: 	 0.8112094395280236
CPU times: total: 3min 23s
Wall time: 11min 11s
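
As a cross-check, the headline figures can be recomputed from the per-class counts in the output above. This is a worked calculation on the printed numbers only, not a rerun of the test.

```python
# Counts copied from the output above.
no_total, no_wrong = 855, 87      # "no" images, and how many were predicted "yes"
yes_total, yes_wrong = 1356, 11   # "yes" images, and how many were predicted "no"

total = no_total + yes_total
correct = total - no_wrong - yes_wrong

accuracy = correct / total
false_positive_rate = no_wrong / no_total
false_negative_rate = yes_wrong / yes_total

print(f"accuracy: {accuracy:.2%}")                        # 95.57%
print(f"false positive rate: {false_positive_rate:.2%}")  # 10.18%
print(f"false negative rate: {false_negative_rate:.2%}")  # 0.81%
```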


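- very good score considering the separation model had below 80% accuracy
- false negatives are below 1% (11 of 1356 positive scans missed)
- in reality most scans will probably not be positive, whereas about 61% of this test set is positive, so the overall score is slightly flattering; the model is still effective at catching the tumors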
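
To make the last point concrete, the sketch below estimates overall accuracy for different shares of positive scans, assuming the per-class rates measured on this test set carry over unchanged. The prevalence values of 10% and 1% are hypothetical examples, not measured figures.

```python
# Per-class rates taken from the test output (1356 "yes" images, 855 "no" images).
SENSITIVITY = (1356 - 11) / 1356   # share of "yes" scans predicted "yes"
SPECIFICITY = (855 - 87) / 855     # share of "no" scans predicted "no"


def expected_accuracy(prevalence):
    """Overall accuracy if `prevalence` is the fraction of positive scans."""
    return prevalence * SENSITIVITY + (1 - prevalence) * SPECIFICITY


test_set_prevalence = 1356 / (1356 + 855)
for prevalence in (test_set_prevalence, 0.10, 0.01):  # last two are hypothetical
    print(f"prevalence {prevalence:.2%}: accuracy {expected_accuracy(prevalence):.2%}")
```

As positives become rarer, the overall accuracy moves toward the specificity (about 90%), because most errors then come from false positives.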