
# Analysis of the prediction results

The aim of this notebook is to identify which samples in the training data were wrongly classified.
For a subset of the misclassified samples, it shows the inputs fed into the model and the model's output certainty,
so that the process can be visualised and the cause of each error potentially pinpointed.

## Retrieval of misclassified samples

In [None]:
import os
import numpy as np
import pandas as pd
from src.data_processing.MNIST import transform_to_trio_MNIST, prepare_for_model_training
from src.models.max_mnist_predictor import MaxMNISTPredictor
from src.models.mnist_predictor import get_model
from src.evaluate_MNIST_models import train_model as train_model_MNIST
from src.evaluate_trio_mnist import train_model as train_model_trio_MNIST
from src.util.fileio import load_pkl_file, load_training_labels, show_image, load_model
from src.config import data_path, training_images_file, training_labels_file_name, retrain_models, models_path, \
    MNIST_PIXEL

analysis_model = "CNN"
analysis_dataset = "MNIST"

# Load a saved model if one exists; train it if loading fails or retrain_models is set
model_path = os.path.join(models_path, analysis_model + "_" + analysis_dataset + ".h5")
if not retrain_models:
    try:
        model = get_model(analysis_model, (MNIST_PIXEL, 3 * MNIST_PIXEL, 1) if analysis_dataset == "TRIO" else (MNIST_PIXEL, MNIST_PIXEL, 1))
        load_model(model_path, model)
        model.summary()
    except Exception:
        # model_path is defined before the try block, so this message is always safe to build
        print("\tThe model file cannot be found at " + model_path + " so it will be retrained.")
        if analysis_dataset == "TRIO":
            model = train_model_trio_MNIST(analysis_model, generate_results=False)
        else:
            model = train_model_MNIST(analysis_model, analysis_dataset)
else:
    if analysis_dataset == "TRIO":
        model = train_model_trio_MNIST(analysis_model, generate_results=False)
    else:
        model = train_model_MNIST(analysis_model, analysis_dataset)

# Once a model is available, load the training data to predict on
training_images_file_path = os.path.join(data_path, training_images_file)
training_labels_file_path = os.path.join(data_path, training_labels_file_name)
x_train = load_pkl_file(training_images_file_path)
y_train = load_training_labels(training_labels_file_path)

# Predict output
print("\tPredicting data to model: " + analysis_model)
if analysis_dataset == "TRIO":
    x_train_trio = transform_to_trio_MNIST(x_train)
    x_train_trio = prepare_for_model_training(x_train_trio)
    y_predicted = model.predict(x_train_trio).argmax(axis=1)
else:
    y_predicted = MaxMNISTPredictor(model).predict_max_num(x_train)

# Collect and print the wrongly classified samples
df = pd.DataFrame()
df["actual"] = y_train
df["predicted"] = y_predicted
incorrect = df[df["actual"] != df["predicted"]]
# shape[0] is the number of rows (samples); shape[1] would only count the two columns
print("There is a total of " + str(incorrect.shape[0]) + " incorrect predictions")
print(incorrect)

Using TensorFlow backend.

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

	The model file cannot be found at ../models\CNN_MNIST.h5 so it will be retrained.
	Training model CNN with dataset MNIST
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
batch_normalization_8 (Batch (None, 26, 26, 32)        128       
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 24, 24, 32)        9248      
_________________________________________________________________
batch_normalization_9 (Batch (None, 24, 24, 32)        128       
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 12, 12, 32)        25632     
______________________________

EXPLAIN WHAT WE SEE
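
One way to make the list of misclassified samples easier to interpret is to count how often each (actual, predicted) pair occurs. The block below is a minimal sketch of that idea, not part of the original notebook: `confusion_pairs` is a hypothetical helper, and it assumes the labels are one-dimensional sequences of equal length, like the `actual` and `predicted` columns built above.

```python
import pandas as pd


def confusion_pairs(actual, predicted):
    """Count each (actual, predicted) combination among the wrong predictions.

    Hypothetical helper for illustration; assumes two 1-D label sequences
    of the same length.
    """
    df = pd.DataFrame({"actual": actual, "predicted": predicted})
    incorrect = df[df["actual"] != df["predicted"]]
    return (
        incorrect.groupby(["actual", "predicted"])
        .size()                        # number of samples per pair
        .sort_values(ascending=False)  # most frequent confusion first
        .rename("count")
        .reset_index()
    )
```

Calling `confusion_pairs(y_train, y_predicted)` would then list the most frequent confusions first, which may help decide which samples to inspect in the sections that follow.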

## Subset input analysis

Now, from the gathered predictions, display the original images of the first N wrongly classified samples.
Also, depending on which dataset was chosen, display the processed image that was fed to the model.
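
The selection and display steps could be sketched as follows. This is only an illustration under stated assumptions: `first_misclassified` and `plot_samples` are hypothetical helpers, each image is assumed to be a 2-D grayscale array, and matplotlib is used for drawing instead of the project's own `show_image`, whose signature is not shown here.

```python
import numpy as np


def first_misclassified(actual, predicted, n=5):
    """Return the indices of the first n samples whose prediction is wrong."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    wrong = np.flatnonzero(actual != predicted)
    return wrong[:n]


def plot_samples(images, indices, actual, predicted):
    """Draw the selected images side by side, titled with both labels.

    Assumes images[idx] is a 2-D grayscale array (an assumption about the data).
    """
    import matplotlib.pyplot as plt  # imported lazily: only needed for drawing

    if len(indices) == 0:
        return None  # nothing to draw
    fig, axes = plt.subplots(
        1, len(indices), figsize=(3 * len(indices), 3), squeeze=False
    )
    for ax, idx in zip(axes[0], indices):
        ax.imshow(images[idx], cmap="gray")
        ax.set_title(f"actual={actual[idx]}, predicted={predicted[idx]}")
        ax.axis("off")
    return fig
```

In this notebook the call would look something like `plot_samples(x_train, first_misclassified(y_train, y_predicted, n=5), y_train, y_predicted)`, with the processed TRIO images passed instead of `x_train` when that dataset is selected.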

In [None]:
if analysis_dataset == "MNIST":
    # TODO: display the 3 unique images
    pass
elif analysis_dataset == "TRIO":
    # TODO: display the combined images
    pass

EXPLAIN WHAT WE SEE


## Subset output analysis

From the gathered predictions, show the model's certainty for every class for the first N wrongly classified samples.
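
A minimal sketch of such a table is given below. It assumes the model's raw output is available as an array of shape `(n_samples, n_classes)` holding one score per class, which is what `model.predict(...)` returns before `argmax(axis=1)` is applied in the TRIO branch above; how to obtain the equivalent from `MaxMNISTPredictor` is not shown in this notebook. `certainty_table` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def certainty_table(probabilities, actual, n=5):
    """Per-class scores for the first n misclassified samples.

    Assumes `probabilities` has shape (n_samples, n_classes) and that the
    predicted class is the one with the highest score.
    """
    probabilities = np.asarray(probabilities)
    actual = np.asarray(actual)
    predicted = probabilities.argmax(axis=1)
    wrong = np.flatnonzero(predicted != actual)[:n]

    table = pd.DataFrame(
        probabilities[wrong],
        index=wrong,  # keep the original sample positions as the row index
        columns=[f"p({c})" for c in range(probabilities.shape[1])],
    )
    table.insert(0, "actual", actual[wrong])
    table.insert(1, "predicted", predicted[wrong])
    return table
```

Reading a row left to right shows whether the model was confidently wrong or whether the true class was a close second, which is the distinction this section aims to surface.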

In [None]:
# .predict vs .score

EXPLAIN