# Machine Learning 2 - Neural Networks

In this lab, we will use simple neural networks to classify images from the simplified CIFAR-10 dataset. We will then compare our results with those obtained with decision trees and random forests.

Lab objectives
----
* Classification with neural networks
* Influence of hidden layers and of the selected features on the classifier results

In [2]:
from lab_tools import CIFAR10, evaluate_classifier, get_hog_image
        
dataset = CIFAR10('../../extern_data/CIFAR10/')


Pre-loading training data
Pre-loading test data


We will use the *[Multi-Layer Perceptron](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)* implementation from scikit-learn, which has been available since version 0.18. You can check which version of scikit-learn is installed by running:

In [3]:
import sklearn
print(sklearn.__version__)

1.2.2


If you have version 0.17 or older, please update your scikit-learn installation (for instance, by running `pip install scikit-learn==0.19.1` in the terminal or Anaconda prompt).
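The version check can also be done programmatically. The following is a minimal sketch, assuming that only the leading major and minor numbers matter; `version_tuple` is a hypothetical helper written for this illustration and is not part of scikit-learn or `lab_tools`.

```python
import re

import sklearn


def version_tuple(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


# MLPClassifier is available from scikit-learn 0.18 onwards.
if version_tuple(sklearn.__version__) < (0, 18):
    print("scikit-learn is too old for MLPClassifier, please upgrade.")
else:
    print("scikit-learn", sklearn.__version__, "provides MLPClassifier.")
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where `"0.9"` would sort after `"0.18"`.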

## Build a simple neural network

* Using the [MLPClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html) from scikit-learn, create a neural network with a single hidden layer.
* Train this network on the CIFAR dataset.
* Using cross-validation, try to find the best possible parameters.

In [7]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

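# Baseline: MLPClassifier with its default parameters
clf = MLPClassifier()
# Train on the HOG features of the training set
clf.fit(dataset.train['hog'], dataset.train['labels'])
# Predict on the test set and report the accuracy
pred = clf.predict(dataset.test['hog'])
accuracy = accuracy_score(dataset.test['labels'], pred)
print(accuracy)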



0.8163333333333334


In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

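# Candidate sizes for the single hidden layer. An integer is interpreted as
# one hidden layer of that size; (50) without a trailing comma is just 50.
param_grid = {
    "hidden_layer_sizes": [50, 100, 150, 200]
}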

clf = MLPClassifier()

grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(dataset.train['hog'], dataset.train['labels'])

best_params = grid_search.best_params_
best_score = grid_search.best_score_

print(best_params)
print(best_score)



{'hidden_layer_sizes': 200}
0.7939333333333334




In [5]:
print(grid_search.cv_results_)

{'mean_fit_time': array([16.00017409, 23.08753862, 30.40776048, 36.89085026]), 'std_fit_time': array([0.40875333, 0.57447755, 0.18316765, 0.46095789]), 'mean_score_time': array([0.00587821, 0.00911751, 0.00949354, 0.01020584]), 'std_score_time': array([0.00317304, 0.00262466, 0.00335694, 0.00317811]), 'param_hidden_layer_sizes': masked_array(data=[50, 100, 150, 200],
             mask=[False, False, False, False],
       fill_value='?',
            dtype=object), 'params': [{'hidden_layer_sizes': 50}, {'hidden_layer_sizes': 100}, {'hidden_layer_sizes': 150}, {'hidden_layer_sizes': 200}], 'split0_test_score': array([0.77533333, 0.785     , 0.79533333, 0.79366667]), 'split1_test_score': array([0.777     , 0.79633333, 0.79533333, 0.79566667]), 'split2_test_score': array([0.76633333, 0.78033333, 0.784     , 0.78566667]), 'split3_test_score': array([0.766     , 0.779     , 0.788     , 0.79333333]), 'split4_test_score': array([0.772     , 0.78766667, 0.79733333, 0.80133333]), 'mean_test_scor
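The raw `cv_results_` dictionary is hard to read when printed directly. A minimal sketch of a more readable summary follows. It runs on synthetic data from `make_classification` as a stand-in (an assumption made so that the example is self-contained; the lab itself uses `dataset.train['hog']`), and `summarize` is a hypothetical helper, not a scikit-learn function. The small layer sizes and `max_iter` value are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption): replace with the lab's HOG features.
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

search = GridSearchCV(
    estimator=MLPClassifier(max_iter=300, random_state=0),
    param_grid={"hidden_layer_sizes": [(10,), (20,)]},
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)


def summarize(cv_results):
    """Return (rank, params, mean, std) rows sorted from best to worst."""
    rows = zip(cv_results["rank_test_score"], cv_results["params"],
               cv_results["mean_test_score"], cv_results["std_test_score"])
    return sorted(rows, key=lambda row: row[0])


summary = summarize(search.cv_results_)
for rank, params, mean, std in summary:
    print(f"{rank}. {params}: {mean:.4f} +/- {std:.4f}")
```

Printing the standard deviation next to the mean makes it easier to judge whether the gap between two candidates is larger than the variation across folds.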

## Add hidden layers to the network

Try to change the structure of the network by adding hidden layers. Using cross-validation, try to find the best architecture for your network.
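Rather than listing architectures by hand, the candidates can be generated. The sketch below enumerates every combination of layer widths up to a given depth; `candidate_architectures` is a hypothetical helper, and the widths and depth are illustrative choices rather than recommended values.

```python
from itertools import product


def candidate_architectures(widths, max_depth):
    """List every tuple of layer widths with 1..max_depth hidden layers."""
    candidates = []
    for depth in range(1, max_depth + 1):
        candidates.extend(product(widths, repeat=depth))
    return candidates


# Hypothetical search space: two possible widths, at most two hidden layers.
param_grid = {"hidden_layer_sizes": candidate_architectures([50, 100], 2)}
print(param_grid["hidden_layer_sizes"])
```

The number of candidates grows quickly with the depth, and each candidate is trained once per cross-validation fold, so it is worth keeping the search space small.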

In [6]:

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {
    "hidden_layer_sizes": [(50, 50, 50), (100, 100), (200,)]
}

clf = MLPClassifier()

grid_search_multiple_layers = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search_multiple_layers.fit(dataset.train['hog'], dataset.train['labels'])

best_params = grid_search_multiple_layers.best_params_
best_score = grid_search_multiple_layers.best_score_

print(best_params)
print(best_score)

print(grid_search_multiple_layers.cv_results_)



{'hidden_layer_sizes': (200,)}
0.7968
{'mean_fit_time': array([27.57118535, 34.63277979, 37.3298131 ]), 'std_fit_time': array([0.12998891, 0.49652325, 0.29712745]), 'mean_score_time': array([0.00574555, 0.00835958, 0.00952802]), 'std_score_time': array([0.00297056, 0.0055931 , 0.00504929]), 'param_hidden_layer_sizes': masked_array(data=[(50, 50, 50), (100, 100), (200,)],
             mask=[False, False, False],
       fill_value='?',
            dtype=object), 'params': [{'hidden_layer_sizes': (50, 50, 50)}, {'hidden_layer_sizes': (100, 100)}, {'hidden_layer_sizes': (200,)}], 'split0_test_score': array([0.76766667, 0.76933333, 0.80066667]), 'split1_test_score': array([0.78033333, 0.77466667, 0.79566667]), 'split2_test_score': array([0.75533333, 0.76566667, 0.79166667]), 'split3_test_score': array([0.742     , 0.77666667, 0.791     ]), 'split4_test_score': array([0.76433333, 0.78033333, 0.805     ]), 'mean_test_score': array([0.76193333, 0.77333333, 0.7968    ]), 'std_test_score': array

