# Evaluation of the model
The model is evaluated with the mean squared error (MSE) between the predicted and ground-truth images. Further metrics can be added in the training file if you want to track them as well.
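
As a reminder of what the metric measures, here is a minimal NumPy sketch of the MSE, averaged over every pixel of every image. The helper name `mse` and the toy arrays are assumptions made for illustration; they are not part of the project code and the values are not model output.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error averaged over every pixel of every image."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical toy batch: 2 images of shape (32, 64), not real predictions
y_true = np.zeros((2, 32, 64))
y_pred = np.full((2, 32, 64), 0.1)
print(mse(y_true, y_pred))
```

Every pixel in the toy prediction is off by 0.1, so the result is approximately 0.1 squared.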

Since MSE is the only metric considered, the cells below plot a single chart, followed by a visual comparison of the predicted images against the real ones.

In [None]:
from BIS_model import build_model
from train import execute_training
from utils import plot_history
from data import create_generators

from tensorflow.keras.optimizers import Adam
import numpy as np
import matplotlib.pyplot as plt

First, I initialize the model and either train it or load pre-trained weights. The training history is plotted only if the model is retrained, since no history exists when pre-trained weights are loaded.

In [None]:
model = build_model(input_shape=(32, 32, 1), n_ch=128, blocks=3, conv_per_b=2)
pre_trained_model = True
pre_trained_weights = '../weights/best_weights.h5'

if pre_trained_model:
    model.load_weights(pre_trained_weights)
    model.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['mean_squared_error'])
else:
    history, metrics = execute_training(model, 'new_train', start_weights=None, weights_cp_path='../weights/cp_weights.h5')

In [None]:
if not pre_trained_model:
    plot_history(history, metrics, save_images=True)

## Metric evaluation

Here I evaluate the model with the `evaluate` function on the test generator and compute the average MSE over 10 runs, each on about 20 thousand randomly generated samples, as suggested in the problem statement.  
I also compute the standard deviation to see how close the different runs are to each other.

In [None]:
# I need only the test generator
_, _, test_generator = create_generators(1, 1, 1024)

In [11]:
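def compute_mse(model, test_generator, n_runs=10, steps=20):
    """Evaluate the model n_runs times and return the MSE of each run."""
    res = []
    for _ in range(n_runs):
        # Each call draws `steps` fresh batches from the generator
        metrics = model.evaluate(test_generator, steps=steps)
        # Index 0 is the loss; index 1 is the first compiled metric (mean_squared_error here)
        res.append(metrics[1])
    return res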

pred_res = compute_mse(model, test_generator)
print(f"The average value of MSE is: {np.mean(pred_res)}")
print(f"The standard deviation is: {np.std(pred_res)}")
print(f"The min value of MSE is: {np.min(pred_res)}")

The model reached a ***minimum*** MSE score of ***0.000296*** on a test set of 20 thousand samples.

The average MSE over ten runs, each on 20 thousand samples randomly created by the test generator, is **0.000299994**.

The standard deviation across the ten runs is **2.39e-06**.
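
Note that `np.std` uses `ddof=0` by default, i.e. the population standard deviation. With only a handful of runs, the Bessel-corrected estimate (`ddof=1`) and the standard error of the mean may also be worth reporting. The sketch below shows the difference on hypothetical per-run values, chosen only for illustration; they are not the results of this notebook.

```python
import numpy as np

# Hypothetical per-run MSE values, for illustration only
runs = np.array([3.0e-4, 2.9e-4, 3.1e-4, 3.0e-4])

mean = runs.mean()
pop_std = runs.std()            # ddof=0, the np.std default
sample_std = runs.std(ddof=1)   # Bessel-corrected estimate for few runs
std_err = sample_std / np.sqrt(len(runs))  # standard error of the mean

print(f"mean={mean:.3e}  std(ddof=0)={pop_std:.3e}  "
      f"std(ddof=1)={sample_std:.3e}  sem={std_err:.3e}")
```

The corrected estimate is always slightly larger than the default one, and the gap shrinks as the number of runs grows.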

## Visual evaluation

In [None]:
def visual_test(generator, model, n_images=1):
    x_test, y_test = next(generator)

    for i in range(n_images):
        print(f'\n --------- IMAGE {i+1} --------- \n')
        # Create the grid
        fig = plt.figure(figsize=(12, 8))
        grid = plt.GridSpec(2, 4, wspace=0.2, hspace=0.1)

        # Mixed image
        mixed_fig = fig.add_subplot(grid[0, 1])
        mixed_fig.axis('off')
        mixed_fig.set_title('Mixed image')
        mixed_fig.imshow(x_test[i], cmap='gray', interpolation='nearest')

        # Reconstructed image
        recons_fig = fig.add_subplot(grid[1, :2])
        recons_fig.axis('off')
        recons_fig.set_title('Reconstructed image')
        # Transform the test image into the right shape
        x, y = np.expand_dims(np.reshape(x_test[i], (32,32,1)), 0), y_test[i]   
        res = model.predict(x)
        recons_fig.imshow(np.reshape(res, (32,64)), cmap='gray', interpolation='nearest')
        
        # Groundtruth image
        orig_fig = fig.add_subplot(grid[1, 2:])
        orig_fig.axis('off')
        orig_fig.set_title('Groundtruth image')
        orig_fig.imshow(y, cmap='gray', interpolation='nearest')
        
        plt.show()

In [None]:
visual_test(test_generator, model, n_images=2)

The visual evaluation is not very informative on its own: the model visibly separates the mixed images even when the MSE is much higher than the value reached here.
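
One possible way to make the visual check more informative is to print a per-image MSE next to each plot, so that differences the eye cannot see are still quantified. The helper `per_image_mse` below is a hypothetical sketch, not part of the project code, and it is demonstrated on toy arrays rather than real predictions.

```python
import numpy as np

def per_image_mse(y_true, y_pred):
    """Return one MSE value per image in the batch."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Align the prediction shape with the ground truth (e.g. drop a channel axis)
    y_pred = np.asarray(y_pred, dtype=np.float64).reshape(y_true.shape)
    # Average over every axis except the batch axis
    return np.mean((y_true - y_pred) ** 2, axis=tuple(range(1, y_true.ndim)))

# Toy batch of 3 images of shape (32, 64) with increasing error
y_true = np.zeros((3, 32, 64))
y_pred = np.stack([np.full((32, 64), v) for v in (0.0, 0.1, 0.2)])
scores = per_image_mse(y_true, y_pred)
print(scores)
```

Inside `visual_test`, the same helper could be applied to `y_test[i]` and the corresponding prediction, assuming both can be reshaped to the same shape, and the score added to the subplot title.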