# Results

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# import functions used in the different notebooks of the course project
%run Tools.ipynb

----
## Important

Because of GitHub's file size limit, I was not able to commit the random forest model, which is larger than 100 MB. As a consequence, it is not possible to execute this notebook from the repository alone. The test accuracies and predictions shown here were collected from the saved models.

----

In this notebook, we will:

1. Compute the accuracy and predictions of each previously built and saved model on the same test set used across all the notebooks.

2. Select the best and/or my favorite model and predict the labels of the final test set, for which we don't know the true target values.

3. Out of curiosity, build a basic meta model (most frequent predicted class) from the predictions of all the models and check whether using them together (or some of them) improves the predictions.

### Load overfeat and pixels data

In [2]:
# load the datasets
overfeat, pixels, labels, names = load_data()

# check shapes
print('Overfeat shape:', overfeat.shape)
print('Pixels shape:', pixels.shape)
print('Labels shape:', labels.shape)

Overfeat shape: (5000, 4096)
Pixels shape: (5000, 3072)
Labels shape: (5000,)


### Extract the same test set as in the previous notebooks

In [3]:
# split the train/test sets (4000/1000 stratified split)
X_overfeat_train, X_overfeat_test, y_train, y_test = split_data_stratified(overfeat, labels)
X_pixels_train, X_pixels_test, _, _ = split_data_stratified(pixels, labels)

print('Overfeat test shape:', X_overfeat_test.shape, y_test.shape)
print('Pixels test shape:', X_pixels_test.shape, y_test.shape)

Overfeat test shape: (1000, 4096) (1000,)
Pixels test shape: (1000, 3072) (1000,)


### Rescale data for the neural networks

Neural networks need some data pre-processing:

In [4]:
# rescale data for the fully-connected neural network
scaler = StandardScaler()
scaler.fit(X_overfeat_train)
X_overfeat_test_rescaled = scaler.transform(X_overfeat_test)

# rescale data for the convolutional neural network
X_pixels_test_rescaled = (X_pixels_test - 128) / 255
X_pixels_test_rescaled = X_pixels_test_rescaled.reshape(-1, 32, 32, 3)

### Get test accuracy and predictions for each model

In [5]:
# model filenames on disk
filenames = {
    'k-NN': 'saved_best_models/knn_model.npy',
    'decision tree': 'saved_best_models/decision_tree_model.npy',
    'random forest': 'saved_best_models/random_forest_model.npy',
    'svm linear': 'saved_best_models/svm_linear_model.npy',
    'svm rbf': 'saved_best_models/svm_rbf_model.npy',
    'logistic': 'saved_best_models/logistic_regression_model.npy',
    'fc nn': 'saved_best_models/fcnn/fcnn_model.ckpt',
    'cnn': 'saved_best_models/cnn/cnn_model.ckpt'
}

test_accuracies = {}
features = []
predictions = []
for key, filename in filenames.items():
    
    features.append(key)
    
    # accuracy evaluation on test data
    if key == 'fc nn':
        # delete the current graph
        tf.reset_default_graph()

        # import the graph from the file
        graph = tf.train.import_meta_graph(filename + '.meta')

        # Load the network from file
        with tf.Session() as sess:
        
            # load trained variables
            graph.restore(sess, filename)
    
            # evaluate test accuracy
            test_accuracy = sess.run('Accuracy:0', feed_dict={
                'X:0': X_overfeat_test_rescaled,
                'Y:0': y_test,
                'Training:0': False
            })
            
            # get predictions
            model_predictions = sess.run('Predictions:0', feed_dict={
                'X:0': X_overfeat_test_rescaled,
                'Training:0': False
            })
            
    elif key == 'cnn':
        # delete the current graph
        tf.reset_default_graph()

        # import the graph from the file
        graph = tf.train.import_meta_graph(filename + '.meta')

        # Load the network from file
        with tf.Session() as sess:
        
            # load trained variables
            graph.restore(sess, filename)
    
            # evaluate test accuracy
            test_accuracy = sess.run('Accuracy:0', feed_dict={
                'X:0': X_pixels_test_rescaled,
                'Y:0': y_test,
                'Training:0': False
            })
            
            # get predictions
            model_predictions = sess.run('Predictions:0', feed_dict={
                'X:0': X_pixels_test_rescaled,
                'Training:0': False
            })
    else:
        # load model
        model = np.load(filename, allow_pickle=True).item(0)
        model_predictions = model.predict(X_overfeat_test)
        test_accuracy = model.score(X_overfeat_test, y_test)
    
    # store test accuracy
    test_accuracies[key] = test_accuracy
    
    # store model predictions
    predictions.append(model_predictions.reshape(-1, 1))

# Build a dataframe with the predictions from each model
predictions = np.concatenate(predictions, axis=1)
df_predictions = pd.DataFrame(predictions, columns=features)

# Collect test accuracies in a dataframe
df_test_accuracy = pd.DataFrame.from_dict({'test accuracy': test_accuracies}).sort_values(by='test accuracy', ascending=False)

INFO:tensorflow:Restoring parameters from saved_best_models/fcnn/fcnn_model.ckpt
INFO:tensorflow:Restoring parameters from saved_best_models/cnn/cnn_model.ckpt


In [6]:
# show test accuracy of each model
df_test_accuracy

| model | test accuracy |
|---|---|
| svm rbf | 0.842 |
| fc nn | 0.838 |
| cnn | 0.834 |
| svm linear | 0.83 |
| logistic | 0.828 |
| random forest | 0.781 |
| k-NN | 0.78 |
| decision tree | 0.666 |


These are of course exactly the same test accuracies as those presented in the different notebooks, but it's great to have the models usable as a piece of software if we want. Which one should we choose?

First, our image classification use case clearly doesn't force us to choose a model that is good at explaining why and how things work. Decision trees are good for understanding the logic behind the results, but that is not required here and their results are far behind. k-NN and random forest return good results (about 78% test accuracy), but there is no real reason to select them as our best model if prediction accuracy is the goal.

Note that if the use case were to find similar images, k-NN would be a good and intuitive model to work with, and it would be my choice because accuracy would not be the most important factor. Proposing enough similar images to a user would suffice to create a good app feature.
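As a rough illustration of the "similar images" idea, here is a minimal sketch of a nearest-neighbour lookup in feature space. It is not the saved k-NN classifier: the `most_similar` helper, the Euclidean distance and the toy feature matrix are all assumptions made for the example.

```python
import numpy as np

def most_similar(features, query_index, k=5):
    """Return the indices of the k rows closest (Euclidean) to the query row."""
    diffs = features - features[query_index]
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    distances[query_index] = np.inf  # never return the query image itself
    return np.argsort(distances)[:k]

# hypothetical stand-in for the OverFeat feature matrix (20 images, 8 features)
rng = np.random.RandomState(0)
toy_features = rng.normal(size=(20, 8))
toy_features[7] = toy_features[3] + 0.01  # make image 7 a near-duplicate of image 3

neighbours = most_similar(toy_features, query_index=3, k=3)
print('Images most similar to image 3:', neighbours)
```

With the real data, `toy_features` would be replaced by the OverFeat matrix and the returned indices used to look up the corresponding images.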

That leaves the 80%+ models, which have similar accuracy and, for some of them, debatable overfitting, even though cross-validation was used as a safety belt and helped us tune the different hyperparameters. Note that the convolutional neural network is very different from the other models because it is the only one that uses the pixels data as its data source, and that is a great feature. We can find images on Google, rescale them to 32x32 pixels and use our model with the hope of getting good predictions. None of the other models can do that directly because they rely on features extracted by the OverFeat convolutional neural network. At the very least, some development would be required to link the trained OverFeat CNN model with our own models to get predictions on new images.
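To make the "new image" scenario more concrete, the sketch below prepares an arbitrary RGB image for the CNN. The `preprocess_for_cnn` helper and the crude nearest-neighbour resizing are assumptions made for illustration (a real application would more likely use an image library to resize); only the `(x - 128) / 255` rescaling and the `(-1, 32, 32, 3)` shape come from this notebook.

```python
import numpy as np

def preprocess_for_cnn(image, size=32):
    """Resize an (H, W, 3) uint8 image to (size, size, 3) by nearest-neighbour
    sampling, then apply the same rescaling as the test pixels above."""
    h, w, _ = image.shape
    rows = np.arange(size) * h // size  # source row for each target row
    cols = np.arange(size) * w // size  # source column for each target column
    resized = image[rows][:, cols].astype(np.float32)
    rescaled = (resized - 128) / 255
    return rescaled.reshape(-1, size, size, 3)

# hypothetical 120x200 RGB image standing in for a downloaded picture
fake_image = np.random.RandomState(0).randint(0, 256, size=(120, 200, 3), dtype=np.uint8)
batch = preprocess_for_cnn(fake_image)
print('Batch shape:', batch.shape)
```

The resulting `batch` has the layout fed to `'X:0'` in the cells above, assuming the new image uses the same channel order as the training pixels.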

Among the models trained on the OverFeat data, the svm rbf model gave us the best results, with a test accuracy identical to the validation accuracy estimated by cross-validation. It was a time-consuming model to tune and there is more than 9% difference between the training and validation accuracy. Our two SVM models are strong and would be good choices.

One of my favorite models is the logistic regression model because we obtained a good test accuracy, identical to the validation accuracy estimated by cross-validation, but with less overfitting: less than 3% difference with the training accuracy. Moreover, logistic regression models are probabilistic classifiers, so for each image we can get the probability of belonging to each class. It's a nice feature because it can be used to tune and understand the model, and it offers nice opportunities for app functionality. Depending on the use case and the client who needs it, I would choose this model because it's possible to explain, demonstrate and try to solve misclassifications.
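As a small sketch of how class probabilities could be used to investigate misclassifications, the snippet below flags low-confidence predictions. The probability matrix is made up for the example (it only mimics the one-row-per-image layout returned by scikit-learn's `predict_proba`), and the 0.5 threshold is an arbitrary assumption.

```python
import numpy as np

# hypothetical class probabilities for 4 images and 4 classes (rows sum to 1)
probabilities = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.35, 0.15, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.30, 0.28, 0.22, 0.20],
])

predicted = probabilities.argmax(axis=1)   # most probable class per image
confidence = probabilities.max(axis=1)     # probability of that class

# margin between the best and the second-best class
sorted_probs = np.sort(probabilities, axis=1)
margin = sorted_probs[:, -1] - sorted_probs[:, -2]

threshold = 0.5  # arbitrary cut-off chosen for this example
uncertain = np.where(confidence < threshold)[0]
print('Images to review:', uncertain)
```

Images with a low confidence or a small margin are natural candidates for manual review, or for a "not sure" answer in an app.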

### Compute predictions on the final cifar_test data

In [7]:
# load cifar_test data
with np.load('cifar4-test.npz', allow_pickle=False) as npz:            
    overfeat_final = npz['overfeat'].astype(np.float32)
    pixels_final   = npz['pixels'].astype(np.float32)
    
    # rescale pixels data for the CNN model
    pixels_final_rescaled = (pixels_final - 128) / 255
    pixels_final_rescaled = pixels_final_rescaled.reshape(-1, 32, 32, 3)

In [8]:
# Get CNN predictions
tf.reset_default_graph()
graph = tf.train.import_meta_graph('saved_best_models/cnn/cnn_model.ckpt.meta')
with tf.Session() as sess:
    # use the CNN checkpoint explicitly instead of the leftover loop variable
    graph.restore(sess, filenames['cnn'])
    final_predictions = sess.run('Predictions:0', feed_dict={
        'X:0': pixels_final_rescaled,
        'Training:0': False
    })
    
# Save CNN predictions
np.save('test-predictions.npy', final_predictions)

INFO:tensorflow:Restoring parameters from saved_best_models/cnn/cnn_model.ckpt


In [9]:
# Get logistic regression predictions
model = np.load(filenames['logistic'], allow_pickle=True).item(0)
final_predictions = model.predict(overfeat_final)

# Save logistic regression predictions
np.save('test-predictions-logistic.npy', final_predictions)

## Optional - Most-frequent predicted class meta model

In [10]:
# show some predictions
df_predictions.head(10)

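| | k-NN | decision tree | random forest | svm linear | svm rbf | logistic | fc nn | cnn |
|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 2 | 2 | 3 | 3 | 2 | 2 |
| 1 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 |
| 2 | 2 | 3 | 2 | 2 | 1 | 2 | 2 | 1 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 2 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 2 |
| 9 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |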


It is interesting to see that it does not seem rare to get 3 different predicted classes when we ask all the models.
We will drop the decision tree model from our meta model because its accuracy is far behind and trees are already represented by the random forest model.

In [11]:
# Get the most frequent predicted class for each image (decision tree excluded)
votes = df_predictions.drop(['decision tree'], axis=1).values
most_frequent_predictions = np.zeros_like(y_test)
for i in range(y_test.shape[0]):
    # on ties, np.argmax returns the smallest class label
    most_frequent_predictions[i] = np.argmax(np.bincount(votes[i]))

In [12]:
# Get accuracy
accuracy = (most_frequent_predictions == y_test).sum() / y_test.shape[0]
print('Most-frequent predicted class model accuracy:', accuracy)

Most-frequent predicted class model accuracy: 0.85


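Combining the predictions made by different models looks like an interesting way to improve accuracy: the vote reaches 0.85 on the test set, compared with 0.842 for the best single model (svm rbf). This approach is rough and naive, but it is clear that there are opportunities to improve accuracy with multiple models.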
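To make the voting rule concrete, including what happens when models disagree evenly, here is a minimal sketch on a toy vote matrix. The `majority_vote` helper and the toy votes are assumptions made for illustration. The sketch reproduces the tie behaviour of `np.argmax(np.bincount(...))`, which picks the smallest class label among tied classes.

```python
import numpy as np

def majority_vote(votes, n_classes):
    """votes: (n_samples, n_models) integer array of predicted classes.
    Returns the winning class per sample and a flag marking tied votes."""
    # counts[i, c] = number of models that predicted class c for sample i
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    winners = counts.argmax(axis=1)  # ties go to the smallest class label
    top = counts.max(axis=1)
    tied = (counts == top[:, None]).sum(axis=1) > 1
    return winners, tied

# hypothetical votes from 7 models on 3 images
toy_votes = np.array([
    [2, 2, 2, 3, 3, 2, 2],  # clear majority for class 2
    [0, 0, 1, 1, 2, 3, 3],  # three-way tie between classes 0, 1 and 3
    [3, 3, 3, 1, 1, 0, 2],  # class 3 wins with 3 votes
])

winners, tied = majority_vote(toy_votes, n_classes=4)
print('Winners:', winners)
print('Tied votes:', tied)
```

Counting how often `tied` is true on the real predictions would show how much of the meta model's output depends on this arbitrary tie-breaking rule.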