# Pollinating Insects classification

## Context - aim
The aim is to build an image classification model that predicts the species of an insect from its picture.

The dataset consists of 20348 labeled pictures of insects gathered from the SPIPOLL project. There are 18 classes, each corresponding to a different insect species. Each picture is a 64x64 color image, which gives a total of 64×64×3 = 12288 features per picture.
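The following sketch illustrates the feature count. It flattens a small batch of synthetic 64x64x3 arrays and then reshapes it back. The arrays are random NumPy data standing in for the SPIPOLL pictures, which are not reproduced here. The loading and `preprocess` steps of this notebook perform the same round trip.

```python
import numpy as np

# Synthetic stand-in for the dataset: 5 random "images" (the real array holds 20348 of them)
rng = np.random.default_rng(0)
images = rng.random((5, 64, 64, 3))

# Flatten each image into a single feature vector of length 64*64*3
flat = images.reshape((images.shape[0], -1))
print(flat.shape)  # (5, 12288)

# Reshaping back recovers the images exactly: reshape only reinterprets the memory layout
restored = flat.reshape((flat.shape[0], 64, 64, 3))
print(np.array_equal(images, restored))  # True
```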

*Reference* : http://www.spipoll.org/

## Image recognition and neural networks

Over the past few years, neural nets have proven to be very effective at image classification. For example, "basic" multilayer perceptrons can reach very low classification errors on the well-known [MNIST dataset](http://yann.lecun.com/exdb/mnist/) of handwritten digits (ref: [D. C. Ciresan et al. (2010)](https://arxiv.org/pdf/1003.0358.pdf)).

*To get familiar with neural networks (with a nice tutorial on the handwritten digits classification problem):* http://neuralnetworksanddeeplearning.com/chap1.html


Object (or insect...) recognition from "real life" images can be tricky, for many reasons. A few are listed below:
- The separation between the object and its background is not necessarily obvious.
- Several pictures of the same object can look quite different from one another. For example, the object's location in the image or the illumination can vary. The classification model therefore needs to be invariant under certain transformations (translations, for example).
- Efficient computer vision requires models that are able to exploit the 2D topology of pixels, as well as locality.


Because of their particular properties, convolutional neural networks (CNNs) are able to address the issues listed above.


### Convolutional neural networks

By construction, CNNs are well suited for image classification:
- each unit of a convolutional layer (CL) is connected only to a small local patch of the previous layer, which allows local processing of subsets of pixels
- parameter sharing within a given CL (the same filter is applied at every position of the image) contributes to the translational invariance of the model
- in practice, these two constraints drastically reduce the number of model parameters to be learned, which makes it possible to train quite complex models in a reasonable time.
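A back-of-the-envelope comparison shows how large the parameter savings are. The sketch compares a dense layer with 32 units against a convolutional layer with 32 filters of size 3x3, which matches the default `nb_filters=32` and `filter_size=3` used later. Each unit or filter also has one bias. The two helper functions are hypothetical and are not part of Keras.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    # Each filter only sees a kernel_h x kernel_w x in_channels patch, and the same
    # weights are reused at every position of the image; plus one bias per filter
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

n_inputs = 64 * 64 * 3  # flattened 64x64 RGB picture

print(dense_params(n_inputs, 32))  # 393248
print(conv_params(3, 3, 3, 32))    # 896
```

The dense layer needs roughly 440 times more parameters than the convolutional layer. This holds even though the convolutional layer outputs a full 64x64 map per filter, while each dense unit outputs a single value.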

*A useful reference on CNNs:*
http://cs231n.github.io/convolutional-networks/


A basic CNN consists of successions of convolutional layers (CL) and pooling layers (PL). The pooling layers downsample the feature maps, which reduces the number of parameters and computations in the subsequent layers. Together, these successions of CLs and PLs perform feature extraction. For image classification, the output layer is a fully connected layer with as many units as there are classes. Its activation is a softmax, so that the activation of the i$^{th}$ output unit can be interpreted as the probability that the image belongs to class i.
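The following NumPy sketch shows why the softmax outputs can be read as class probabilities. It uses three hypothetical raw scores instead of the 18 produced by the real output layer. The outputs are positive, sum to 1, and preserve the ordering of the scores.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum avoids overflow in exp() and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores of the output layer for one image (3 classes instead of 18)
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

print(probs)                  # positive values, in the same order as the logits
print(probs.sum())            # 1.0, up to floating-point rounding
print(int(np.argmax(probs)))  # 0: the predicted class
```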

In CNNs, CLs and PLs are also commonly combined with rectification (non-linearity) and normalization layers, which can drastically improve classification accuracy ([Jarrett et al. (2009)](http://cs.nyu.edu/~koray/publis/jarrett-iccv-09.pdf)).



## Building CNNs with Keras

The cell below loads some useful libraries for building, training and evaluating neural nets.
- [Keras](https://keras.io/) is a Python library that runs on either [Tensorflow](https://www.tensorflow.org/) or [Theano](http://deeplearning.net/software/theano/). The code below is written for a Tensorflow backend and uses the Keras 1.x API (`Convolution2D`, `border_mode`, `nb_epoch`, `output_dim`). [Here are some instructions to install Tensorflow](https://github.com/tensorflow/tensorflow#download-and-setup). Training neural nets can be computationally costly, so installing the GPU version of Tensorflow is recommended (provided you have a dedicated GPU!).


- In what follows we also use some methods implemented in the well-known machine learning library [scikit-learn](http://scikit-learn.org/stable/). In particular, `cross_val_score()` computes cross-validated scores of candidate model architectures, and `GridSearchCV()` performs grid searches over model hyperparameters. For these functions to be called on Keras models, the models are wrapped into classes that follow scikit-learn's estimator interface. [Here is a useful tutorial on building scikit-learn wrappers for estimators](http://danielhnyk.cz/creating-your-own-estimator-scikit-learn/).
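The wrappers below store each constructor argument under an attribute with exactly the same name, and scikit-learn relies on this convention. It discovers an estimator's hyperparameters from the signature of `__init__` and reads their values back with `getattr`. `GridSearchCV` then builds a fresh copy of the estimator for every candidate using `clone()` and `set_params()`. The code below is a simplified, hypothetical mimic of that mechanism in plain Python. `MiniBaseEstimator`, `ToyCNNWrapper` and `clone` are illustrative names, not scikit-learn's actual code.

```python
import inspect


class MiniBaseEstimator:
    """Hypothetical, simplified stand-in for sklearn.base.BaseEstimator."""

    def get_params(self):
        # Parameter names come from the signature of __init__ ...
        names = [n for n in inspect.signature(type(self).__init__).parameters if n != "self"]
        # ... and their values from the attributes with the same names
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class ToyCNNWrapper(MiniBaseEstimator):
    # __init__ only stores its arguments, under exactly the same names
    def __init__(self, nb_filters=32, filter_size=3, pool_size=2):
        self.nb_filters = nb_filters
        self.filter_size = filter_size
        self.pool_size = pool_size


def clone(estimator):
    # Fresh, unfitted copy built from the current parameters
    return type(estimator)(**estimator.get_params())


est = ToyCNNWrapper(nb_filters=64)
print(est.get_params())        # {'nb_filters': 64, 'filter_size': 3, 'pool_size': 2}

candidate = clone(est).set_params(pool_size=4)
print(candidate.get_params())  # {'nb_filters': 64, 'filter_size': 3, 'pool_size': 4}
```

If `__init__` stored a value under another name, or transformed it (for example, by building the Keras model there), the copies made during the search would silently lose or misreport that hyperparameter. This is why the network is only built in `fit()`.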

In [None]:
from __future__ import print_function
import time

import numpy as np

from sklearn.base import BaseEstimator
from sklearn.preprocessing import LabelEncoder

# NB: sklearn.cross_validation was deprecated in scikit-learn 0.18 and removed in 0.20;
# its StratifiedShuffleSplit takes the labels as first argument (see unit_test below).
from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.cross_validation import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

# Keras 1.x API (Convolution2D, border_mode, nb_epoch, output_dim), Tensorflow backend
from keras.models import Sequential
from keras.layers import Dense, Convolution2D, MaxPooling2D, AveragePooling2D, Flatten, Activation
from keras.callbacks import EarlyStopping
from keras.utils import np_utils


## Loading dataset
Remark: 10% of the examples are held out as a test set. This set is used to evaluate the classification accuracy of the model selected by hyperparameter tuning.

In [None]:
f = np.load('train_64x64.npz')

X = f['X']
print('X shape : ', X.shape)
X_flat = X.reshape((X.shape[0], -1))  # one row of 64*64*3 = 12288 features per picture
print('X_flat shape : ', X_flat.shape)

y = f['y']
print('raw y shape : ', y.shape)
# encode class labels as integers 0..17
encoder = LabelEncoder()
encoded_y = encoder.fit_transform(y)
print('y shape : ', encoded_y.shape)

X_train, X_test, y_train, y_test = train_test_split(X_flat, encoded_y, test_size=0.1, random_state=0, stratify=encoded_y)
print('Train set size : ', X_train.shape[0])
print('Test set size : ', X_test.shape[0])

## Pipeline
The pipeline used for model selection is inspired by [Jarrett et al. (2009)](http://cs.nyu.edu/~koray/publis/jarrett-iccv-09.pdf).

### Model architecture

This paper investigates the impact of the following model properties on object classification accuracy:
- number of convolutional layers (CL) needed to perform feature extraction
- type of pooling layer (PL) used (average pooling vs. max pooling)
- role of rectification layers (RL)
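The following NumPy sketch makes the options compared below concrete. It applies 2x2 max and average pooling, and the relu and sigmoid non-linearities, to an arbitrary toy 4x4 feature map. `pool2d` is a hypothetical helper. It assumes non-overlapping windows (stride equal to the pool size), which is the default of Keras' pooling layers.

```python
import numpy as np

def pool2d(x, size, mode="max"):
    # Non-overlapping pooling (stride == size); rows/columns that do not fill a window are dropped
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[: h * size, : w * size].reshape(h, size, w, size)
    return windows.max(axis=(1, 3)) if mode == "max" else windows.mean(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Arbitrary 4x4 single-channel feature map
fmap = np.array([[ 1., -2.,  0.,  3.],
                 [ 4.,  0., -1., -1.],
                 [-3., -3.,  2.,  2.],
                 [ 0., -1.,  2.,  6.]])

print(pool2d(fmap, 2, "max"))    # strongest response per 2x2 window
print(pool2d(fmap, 2, "avg"))    # mean response per 2x2 window
print(relu(fmap).min())          # 0.0: negative responses are zeroed
print(sigmoid(np.array([0.0])))  # [0.5]: sigmoid squashes values into (0, 1)
```

Max pooling keeps the strongest response in each window (4, 3, 0, 6). Average pooling instead blends it with weaker and negative responses (0.75, 0.25, -1.75, 3).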


In the following, these criteria are tested to find the model "architecture" best suited to our classification problem. We train different models (with different numbers of CLs and different types of PL / RL) and compare their cross-validated classification accuracy.


### Hyperparameter tuning

Once the model architecture is determined, its hyperparameters are tuned by grid search. The hyperparameters concerned are:
- number of feature maps (filters) in CLs
- dimensions of the filters in CLs
- dimensions of the pooling windows in PLs.


NB: Ideally, the model "architecture" would also be tuned by grid search, together with the hyperparameters listed above. To keep the search space tractable, however, the grid search was restricted to two groups of hyperparameters: the number and dimensions of the filters in the convolutional layers, and the dimensions of the pooling windows.
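To give an order of magnitude, the snippet below counts the candidates in the grid searched later in this notebook, which has two values for each of six hyperparameters. It uses `itertools.product`, which enumerates the same combinations as scikit-learn's `ParameterGrid`. The number of folds assumes the default `cv=3` of `hyperparameter_optim()`.

```python
from itertools import product

# Same grid as the one searched later: 2 candidate values for each of 6 hyperparameters
grid = {
    'nb_filters_1': [32, 64], 'filter_size_1': [3, 6], 'pool_size_1': [2, 4],
    'nb_filters_2': [32, 64], 'filter_size_2': [3, 6], 'pool_size_2': [2, 4],
}
n_candidates = len(list(product(*grid.values())))
n_folds = 3  # default cv of hyperparameter_optim()

print(n_candidates)            # 64 = 2**6
print(n_candidates * n_folds)  # 192 CNN trainings, plus one final refit of the best candidate
```

Adding a single third value to each hyperparameter would already give 3**6 = 729 candidates, or 2187 trainings.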

### Define a unit_test function to extract cross-validated score for model architecture selection

In [None]:
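def unit_test(classifier, nb_iter=3):
    """Cross-validated accuracy of `classifier` (default hyperparameters) on the training set.

    The held-out test set (X_test, y_test) is deliberately left out, so that it is not used
    for model selection. Uses the old sklearn.cross_validation API imported above.
    """
    test_size = 0.2
    random_state = 15
    cv = StratifiedShuffleSplit(y_train, nb_iter, test_size=test_size, random_state=random_state)
    clf = classifier()
    scores = cross_val_score(clf, X=X_train, y=y_train, scoring='accuracy', cv=cv)
    return scores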

### Define a hyperparameter_optim function to perform grid search 

In [None]:
def hyperparameter_optim(classifier, params, cv=3):
    """Grid search over `params` on the training set; returns the fitted GridSearchCV object."""
    clf = GridSearchCV(classifier(), params, cv=cv, scoring='accuracy')
    clf.fit(X_train, y_train)

    print("Best parameters set found:")
    print(clf.best_params_)
    print()
    print("Grid scores (mean accuracy +/- 2 std over the CV folds):")
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    # `candidate` avoids shadowing the `params` argument
    for mean, std, candidate in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
                % (mean, std * 2, candidate))
    print()

    return clf

## Determining best model architecture

### "Basic" model with only one convolutional layer

#### Building the corresponding classifier (inheriting from sklearn.BaseEstimator)
##### Default hyperparameters
- nb_filters = 32, filter_size = (3,3) in CL
- pool_size = (2,2) in PL
- nb_epochs = 10

##### Early stopping
- An early stopping condition is used to avoid overfitting and to reduce training time. It monitors the validation loss (`val_loss`, computed on 10% of the training data) and stops training once that loss no longer improves (`patience=1`).
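The patience rule can be sketched in a few lines of plain Python. This is a simplified model of `EarlyStopping(monitor='val_loss', patience=1)`. It assumes training stops as soon as the validation loss has failed to improve for `patience` consecutive epochs. The exact off-by-one behaviour differs between Keras versions, and `epochs_trained` is a hypothetical helper.

```python
def epochs_trained(val_losses, patience=1, max_epochs=10):
    """Hypothetical helper: number of epochs run under a simplified early-stopping rule.

    val_losses[k] is the validation loss measured at the end of epoch k + 1.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:  # improvement: remember it and reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # training stops after this epoch
    return min(len(val_losses), max_epochs)


# Hypothetical validation-loss curve that rises at epoch 3 and improves again at epoch 5
curve = [1.0, 0.8, 0.85, 0.9, 0.7]
print(epochs_trained(curve, patience=1))  # 3
print(epochs_trained(curve, patience=2))  # 4
```

With such a small patience, the late improvement at epoch 5 is never reached. This is the price paid for shorter training.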




In [None]:
class Classifier(BaseEstimator):  

    def __init__(self, nb_filters=32, filter_size=3, pool_size=2):
        self.nb_filters = nb_filters
        self.filter_size = filter_size
        self.pool_size = pool_size
        
    def preprocess(self, X):
        X = X.reshape((X.shape[0],64,64,3))
        X = (X / 255.)
        X = X.astype(np.float32)
        return X
    
    def preprocess_y(self, y):
        return np_utils.to_categorical(y)
    
    def fit(self, X, y):
        X = self.preprocess(X)
        y = self.preprocess_y(y)
        
        hyper_parameters = dict(
        nb_filters = self.nb_filters,
        filter_size = self.filter_size,
        pool_size = self.pool_size 
        )
        
        print("FIT PARAMS : ")
        print(hyper_parameters)
        
        self.model = build_model(hyper_parameters)
        
        earlyStopping = EarlyStopping(monitor='val_loss', patience=1, verbose=1, mode='auto')
        self.model.fit(X, y, nb_epoch=10, verbose=1, callbacks=[earlyStopping], validation_split=0.1, 
                       validation_data=None, shuffle=True)
        return self

    def predict(self, X):
        X = self.preprocess(X)
        return self.model.predict_classes(X)

    def predict_proba(self, X):
        X = self.preprocess(X)
        return self.model.predict(X)
    
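    def score(self, X, y):
        # Fixed: the estimator itself was passed to evaluate() and the inputs were not preprocessed.
        # evaluate() returns [loss, accuracy]; scikit-learn expects one number, so return the accuracy.
        loss, acc = self.model.evaluate(self.preprocess(X), np_utils.to_categorical(y, 18),
                                        batch_size=32, verbose=0)
        print(loss, acc)
        return acc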

#### One CL combined to one PL (average pooling / no rectification)

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters'], hp['filter_size'], hp['filter_size'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(AveragePooling2D(pool_size=(hp['pool_size'],hp['pool_size'])))
    net.add(Flatten())
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

print(unit_test(Classifier,nb_iter=3))

#### One CL combined to one PL (max pooling / no rectification)

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters'], hp['filter_size'], hp['filter_size'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(MaxPooling2D(pool_size=(hp['pool_size'],hp['pool_size'])))
    net.add(Flatten())
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

print(unit_test(Classifier,nb_iter=3))

The cross-validated scores above show that max pooling performs better than average pooling, as suggested in [Jarrett et al. (2009)](http://cs.nyu.edu/~koray/publis/jarrett-iccv-09.pdf).

This very basic first model consists of one CL followed by one PL (max) for feature extraction, and a single fully-connected layer with softmax activation for the classification step. Its cross-validated accuracy is $\sim$36%.

In the following, max pooling will systematically be used.

#### One CL combined to one PL (with sigmoid non-linearity)

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters'], hp['filter_size'], hp['filter_size'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(Activation("sigmoid"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size'],hp['pool_size'])))
    net.add(Flatten())
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

print(unit_test(Classifier,nb_iter=3))

#### One CL combined to one PL (with relu non-linearity)

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters'], hp['filter_size'], hp['filter_size'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size'],hp['pool_size'])))
    net.add(Flatten())
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

print(unit_test(Classifier,nb_iter=3))

The scores above show that adding a sigmoid non-linearity degrades performance, whereas relu rectification improves classification accuracy.

In what follows, relu activations will be used as rectification layers.

### Model with two convolutional layers

#### Building the corresponding classifier (inheriting from sklearn.BaseEstimator)
##### Default hyperparameters
- nb_filters_1 = 32, filter_size_1 = (3,3) in 1st CL
- pool_size_1 = (2,2) in 1st PL
- nb_filters_2 = 32, filter_size_2 = (3,3) in 2nd CL
- pool_size_2 = (2,2) in 2nd PL
- nb_epochs = 10

##### Early stopping
- An early stopping condition is used to avoid overfitting and to reduce training time. It monitors the validation loss (`val_loss`, computed on 10% of the training data) and stops training once that loss no longer improves (`patience=1`).




In [None]:
class Classifier(BaseEstimator):  

    def __init__(self, nb_filters_1=32, filter_size_1=3, pool_size_1=2,
                 nb_filters_2=32, filter_size_2=3, pool_size_2=2):
        self.nb_filters_1 = nb_filters_1
        self.filter_size_1 = filter_size_1
        self.pool_size_1 = pool_size_1
        self.nb_filters_2 = nb_filters_2
        self.filter_size_2 = filter_size_2
        self.pool_size_2 = pool_size_2
        
    def preprocess(self, X):
        X = X.reshape((X.shape[0],64,64,3))
        X = (X / 255.)
        X = X.astype(np.float32)
        return X
    
    def preprocess_y(self, y):
        return np_utils.to_categorical(y)
    
    def fit(self, X, y):
        X = self.preprocess(X)
        y = self.preprocess_y(y)
        
        hyper_parameters = dict(
        nb_filters_1 = self.nb_filters_1,
        filter_size_1 = self.filter_size_1,
        pool_size_1 = self.pool_size_1,
        nb_filters_2 = self.nb_filters_2,
        filter_size_2 = self.filter_size_2,
        pool_size_2 = self.pool_size_2
        )
        
        print("FIT PARAMS : ")
        print(hyper_parameters)
        
        self.model = build_model(hyper_parameters)
        
        earlyStopping = EarlyStopping(monitor='val_loss', patience=1, verbose=1, mode='auto')
        self.model.fit(X, y, nb_epoch=10, verbose=1, callbacks=[earlyStopping], validation_split=0.1, 
                       validation_data=None, shuffle=True)
        return self

    def predict(self, X):
        print("PREDICT")
        X = self.preprocess(X)
        return self.model.predict_classes(X)

    def predict_proba(self, X):
        X = self.preprocess(X)
        return self.model.predict(X)
    
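    def score(self, X, y):
        print("SCORE")
        # Fixed: the estimator itself was passed to evaluate() and the inputs were not preprocessed.
        # evaluate() returns [loss, accuracy]; scikit-learn expects one number, so return the accuracy.
        loss, acc = self.model.evaluate(self.preprocess(X), np_utils.to_categorical(y, 18),
                                        batch_size=32, verbose=0)
        print(loss, acc)
        return acc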
    

#### Two CLs/PLs (no rectification layer)

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters_1'], hp['filter_size_1'], hp['filter_size_1'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_1'],hp['pool_size_1'])))
    net.add(Convolution2D(hp['nb_filters_2'], hp['filter_size_2'], hp['filter_size_2'], border_mode='same'))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_2'],hp['pool_size_2'])))
    net.add(Flatten())
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

print(unit_test(Classifier,nb_iter=3))

#### Two CLs/PLs (with relu layers)

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters_1'], hp['filter_size_1'], hp['filter_size_1'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_1'],hp['pool_size_1'])))
    net.add(Convolution2D(hp['nb_filters_2'], hp['filter_size_2'], hp['filter_size_2'], border_mode='same'))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_2'],hp['pool_size_2'])))
    net.add(Flatten())
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

print(unit_test(Classifier,nb_iter=3))

### Model architecture choice
- Based on the results of the tests above, we retain the following architecture for feature extraction: CL/relu/PL(max)/CL/relu/PL(max).
- With this architecture, grid search is used to tune the number of filters in the CLs and their sizes, as well as the sizes of the pooling windows.

## Hyperparameter optimization

### Using grid search to tune the number of filters, and the size of the filters / pooling matrices

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters_1'], hp['filter_size_1'], hp['filter_size_1'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_1'],hp['pool_size_1'])))
    net.add(Convolution2D(hp['nb_filters_2'], hp['filter_size_2'], hp['filter_size_2'], border_mode='same'))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_2'],hp['pool_size_2'])))
    net.add(Flatten())
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

class Classifier(BaseEstimator):  

    def __init__(self, nb_filters_1=32, filter_size_1=3, pool_size_1=2,
                 nb_filters_2=32, filter_size_2=3, pool_size_2=2):
        self.nb_filters_1 = nb_filters_1
        self.filter_size_1 = filter_size_1
        self.pool_size_1 = pool_size_1
        self.nb_filters_2 = nb_filters_2
        self.filter_size_2 = filter_size_2
        self.pool_size_2 = pool_size_2
        
    def preprocess(self, X):
        X = X.reshape((X.shape[0],64,64,3))
        X = (X / 255.)
        X = X.astype(np.float32)
        return X
    
    def preprocess_y(self, y):
        return np_utils.to_categorical(y)
    
    def fit(self, X, y):
        X = self.preprocess(X)
        y = self.preprocess_y(y)
        
        hyper_parameters = dict(
        nb_filters_1 = self.nb_filters_1,
        filter_size_1 = self.filter_size_1,
        pool_size_1 = self.pool_size_1,
        nb_filters_2 = self.nb_filters_2,
        filter_size_2 = self.filter_size_2,
        pool_size_2 = self.pool_size_2
        )
        
        print("FIT PARAMS : ")
        print(hyper_parameters)
        
        self.model = build_model(hyper_parameters)
        
        earlyStopping = EarlyStopping(monitor='val_loss', patience=1, verbose=1, mode='auto')
        self.model.fit(X, y, nb_epoch=20, verbose=2, callbacks=[earlyStopping], validation_split=0.1, 
                       validation_data=None, shuffle=True)
        time.sleep(0.1)
        return self

    def predict(self, X):
        print("PREDICT")
        X = self.preprocess(X)
        return self.model.predict_classes(X)

    def predict_proba(self, X):
        X = self.preprocess(X)
        return self.model.predict(X)
    
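    def score(self, X, y):
        print("SCORE")
        # Fixed: the estimator itself was passed to evaluate() and the inputs were not preprocessed.
        # evaluate() returns [loss, accuracy]; scikit-learn expects one number, so return the accuracy.
        loss, acc = self.model.evaluate(self.preprocess(X), np_utils.to_categorical(y, 18),
                                        batch_size=32, verbose=0)
        print(loss, acc)
        return acc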
    


params = {
    'nb_filters_1': [32,64],
    'filter_size_1': [3,6],
    'pool_size_1': [2,4],
    'nb_filters_2': [32,64],
    'filter_size_2': [3,6],
    'pool_size_2': [2,4]
}
clf = hyperparameter_optim(Classifier,params)


### Evaluating the best model's performance on the test set

In [None]:
print("Best parameters set found:")
print(clf.best_params_)
print()
    
print("Detailed classification report:")
print()
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))

The average classification accuracy on the test set is 48%. For the most represented class, precision reaches 70%. Precision drops to 0% for some classes, but these are the classes that are under-represented in the dataset. This issue might be addressed by artificially increasing the number of instances of the under-represented classes, which would give a more balanced dataset. New instances can be generated by applying transformations to the available images, such as translations, rotations or changes in luminosity.
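The sketch below implements such oversampling with NumPy only. It assumes images stored as float arrays of shape (n, 64, 64, 3) with values in [0, 1], that is, after `preprocess` rather than as the flattened `X_train`. `augment_minority_classes` is a hypothetical helper. It applies horizontal flips, small circular translations and brightness scaling. The translations use `np.roll`, which wraps pixels around the border. Rotations are left out for brevity. Augmentation should only be applied to the training data, never to the test set.

```python
import numpy as np

def augment_minority_classes(X, y, target_count, seed=0):
    """Hypothetical helper: top up every class to `target_count` images with transformed copies.

    X: float array (n, 64, 64, 3) with values in [0, 1]; y: integer labels of shape (n,).
    """
    rng = np.random.default_rng(seed)
    new_X, new_y = [X], [y]
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        n_missing = target_count - idx.size
        if n_missing <= 0:
            continue  # class already has enough examples
        for i in rng.choice(idx, size=n_missing, replace=True):
            img = X[i]
            if rng.random() < 0.5:
                img = img[:, ::-1, :]  # horizontal flip
            dy, dx = rng.integers(-4, 5, size=2)
            img = np.roll(img, shift=(dy, dx), axis=(0, 1))  # small (circular) translation
            img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness change
            new_X.append(img[np.newaxis])
            new_y.append(np.array([label]))
    return np.concatenate(new_X), np.concatenate(new_y)


# Toy example: 10 images of class 0 and only 3 of class 1
rng = np.random.default_rng(1)
X_toy = rng.random((13, 64, 64, 3))
y_toy = np.array([0] * 10 + [1] * 3)
X_aug, y_aug = augment_minority_classes(X_toy, y_toy, target_count=10)
print(X_aug.shape)         # (20, 64, 64, 3)
print(np.bincount(y_aug))  # [10 10]
```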

## In a nutshell: best model architecture and hyperparameters

The retained model architecture consists of two successive steps of feature extraction, each one being composed of :
- a convolutional layer 
- a rectification layer with relu activation
- a pooling layer (max pooling)

The hyperparameters that gave the best performances are listed below : 
- nb_filters_1 = 64
- filter_size_1 = 3
- pool_size_1 = 2
- nb_filters_2 = 64
- filter_size_2 = 6
- pool_size_2 = 4

Interestingly, the filter and pooling window sizes are larger at the second stage. Feature extraction therefore proceeds through a "fine" step followed by a "coarse" step.

In a way, pooling "loses" information. That is why, intuitively, the opposite order (going from "coarse" to "fine" feature extraction) seems less useful: a fine-grained step makes little sense after some information has already been thrown away!
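The fine-then-coarse structure becomes explicit if we follow the spatial size through the retained feature extractor. The sketch assumes Keras' behaviour for the layers used above. `border_mode='same'` convolutions with stride 1 keep the size unchanged, and pooling uses non-overlapping windows (stride equal to the pool size, no padding). The helper functions are hypothetical.

```python
def conv_same(side):
    # border_mode='same' with stride 1: output has the same spatial size as the input
    return side

def max_pool(side, pool):
    # Non-overlapping pooling windows (stride == pool size), no padding
    return (side - pool) // pool + 1

def conv_params(kernel, in_channels, n_filters):
    # kernel x kernel x in_channels weights per filter, plus one bias per filter
    return kernel * kernel * in_channels * n_filters + n_filters

side = 64
side = max_pool(conv_same(side), 2)  # stage 1: 3x3 conv (64 filters), 2x2 max pooling
stage1_side = side
side = max_pool(conv_same(side), 4)  # stage 2: 6x6 conv (64 filters), 4x4 max pooling
n_features = side * side * 64        # flattened input of the classification layer

print(stage1_side, side, n_features)                  # 32 8 4096
print(conv_params(3, 3, 64), conv_params(6, 64, 64))  # 1792 147520

# Swapping the two pooling sizes gives the same final resolution
print(max_pool(max_pool(64, 4), 2))                   # 8
```

Both orders end at 8x8. What differs is where the coarse downsampling happens: the 6x6 filters of the second CL work on a 32x32 map, which has been pooled only once by a factor of 2.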


## Refining the classification step

Until now, we have mainly focused on the feature extraction step to improve the model's performance. Above, the classification step simply consists of one fully-connected layer with softmax activation. This amounts to a linear separation of the classes in the space of the convolutional features.
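The linearity claim can be checked numerically. Softmax is monotonic, so the predicted class is the argmax of the affine scores `features @ W + b`. The boundary between any two classes i and j is therefore the hyperplane $(w_i - w_j)\cdot x + b_i - b_j = 0$. The sketch uses random stand-ins for the 4096 flattened features (the 8x8x64 output of the retained extractor) and for the weights of the `Dense(18)` layer. Nothing is trained.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax, shifted by the row maximum for numerical stability
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.random((5, 4096))             # stand-in for flattened CNN features of 5 images
W = rng.normal(scale=0.01, size=(4096, 18))  # stand-in weights of the Dense(18) layer
b = rng.normal(scale=0.01, size=18)          # stand-in biases

scores = features @ W + b                    # one affine score per class
probs = softmax(scores)

# softmax preserves the ordering within each row, so the predicted class is unchanged
print(np.array_equal(probs.argmax(axis=1), scores.argmax(axis=1)))  # True
```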

In the following, we refine the classification step. We keep the best feature-extraction stage described above and replace the single softmax layer with a more elaborate classifier: fully-connected hidden layers followed by the softmax output, trained end to end.

In [None]:
class Classifier(BaseEstimator):  

    def __init__(self, nb_filters_1=64, filter_size_1=3, pool_size_1=2,
                 nb_filters_2=64, filter_size_2=6, pool_size_2=4):
        self.nb_filters_1 = nb_filters_1
        self.filter_size_1 = filter_size_1
        self.pool_size_1 = pool_size_1
        self.nb_filters_2 = nb_filters_2
        self.filter_size_2 = filter_size_2
        self.pool_size_2 = pool_size_2
        
    def preprocess(self, X):
        X = X.reshape((X.shape[0],64,64,3))
        X = (X / 255.)
        X = X.astype(np.float32)
        return X
    
    def preprocess_y(self, y):
        return np_utils.to_categorical(y)
    
    def fit(self, X, y):
        X = self.preprocess(X)
        y = self.preprocess_y(y)
        
        hyper_parameters = dict(
        nb_filters_1 = self.nb_filters_1,
        filter_size_1 = self.filter_size_1,
        pool_size_1 = self.pool_size_1,
        nb_filters_2 = self.nb_filters_2,
        filter_size_2 = self.filter_size_2,
        pool_size_2 = self.pool_size_2
        )
        
        print("FIT PARAMS : ")
        print(hyper_parameters)
        
        self.model = build_model(hyper_parameters)
        
        earlyStopping = EarlyStopping(monitor='val_loss', patience=1, verbose=1, mode='auto')
        self.model.fit(X, y, nb_epoch=20, verbose=1, callbacks=[earlyStopping], validation_split=0.1, 
                       validation_data=None, shuffle=True)
        time.sleep(0.1)
        return self

    def predict(self, X):
        print("PREDICT")
        X = self.preprocess(X)
        return self.model.predict_classes(X)

    def predict_proba(self, X):
        X = self.preprocess(X)
        return self.model.predict(X)
    
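    def score(self, X, y):
        print("SCORE")
        # Fixed: the estimator itself was passed to evaluate() and the inputs were not preprocessed.
        # evaluate() returns [loss, accuracy]; scikit-learn expects one number, so return the accuracy.
        loss, acc = self.model.evaluate(self.preprocess(X), np_utils.to_categorical(y, 18),
                                        batch_size=32, verbose=0)
        print(loss, acc)
        return acc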
   

### Add one hidden layer + relu non-linearity 

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters_1'], hp['filter_size_1'], hp['filter_size_1'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_1'],hp['pool_size_1'])))
    net.add(Convolution2D(hp['nb_filters_2'], hp['filter_size_2'], hp['filter_size_2'], border_mode='same'))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_2'],hp['pool_size_2'])))
    net.add(Flatten())
    net.add(Dense(output_dim=200))
    net.add(Activation("relu"))
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net
    
print(unit_test(Classifier,nb_iter=3))

### Do we need more hidden layers ?

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters_1'], hp['filter_size_1'], hp['filter_size_1'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_1'],hp['pool_size_1'])))
    net.add(Convolution2D(hp['nb_filters_2'], hp['filter_size_2'], hp['filter_size_2'], border_mode='same'))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_2'],hp['pool_size_2'])))
    net.add(Flatten())
    net.add(Dense(output_dim=200))
    net.add(Activation("relu"))
    net.add(Dense(output_dim=200))
    net.add(Activation("relu"))
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net
 
print(unit_test(Classifier,nb_iter=3))

Adding a second hidden layer does not improve classification accuracy. In the following, we therefore stick to a classification step with a single hidden layer, and use grid search to tune its number of hidden units.
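The sketch below counts the trainable parameters of the classification head with zero, one and two hidden layers of 200 units. It assumes the 8x8x64 = 4096 flattened features produced by the retained feature extractor. `head_params` is a hypothetical helper.

```python
def head_params(n_features, hidden_units, n_classes=18):
    """Hypothetical helper: weights + biases of a stack of Dense layers ending in n_classes units."""
    total, fan_in = 0, n_features
    for units in list(hidden_units) + [n_classes]:
        total += fan_in * units + units
        fan_in = units
    return total

n_features = 8 * 8 * 64  # flattened output of the retained feature extractor

print(head_params(n_features, []))          # 73746: softmax layer only
print(head_params(n_features, [200]))       # 823018: one hidden layer of 200 units
print(head_params(n_features, [200, 200]))  # 863218: two hidden layers
```

Most of the added parameters sit in the first hidden layer. The second hidden layer adds only about 5% more.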

### Tuning the number of units in the fully-connected hidden layer

In [None]:
class Classifier(BaseEstimator):  

    def __init__(self, nb_filters_1=64, filter_size_1=3, pool_size_1=2,
                 nb_filters_2=64, filter_size_2=6, pool_size_2=4, nb_hunits=200):
        self.nb_filters_1 = nb_filters_1
        self.filter_size_1 = filter_size_1
        self.pool_size_1 = pool_size_1
        self.nb_filters_2 = nb_filters_2
        self.filter_size_2 = filter_size_2
        self.pool_size_2 = pool_size_2
        self.nb_hunits = nb_hunits
        
    def preprocess(self, X):
        X = X.reshape((X.shape[0],64,64,3))
        X = (X / 255.)
        X = X.astype(np.float32)
        return X
    
    def preprocess_y(self, y):
        return np_utils.to_categorical(y)
    
    def fit(self, X, y):
        X = self.preprocess(X)
        y = self.preprocess_y(y)
        
        hyper_parameters = dict(
        nb_filters_1 = self.nb_filters_1,
        filter_size_1 = self.filter_size_1,
        pool_size_1 = self.pool_size_1,
        nb_filters_2 = self.nb_filters_2,
        filter_size_2 = self.filter_size_2,
        pool_size_2 = self.pool_size_2,
        nb_hunits = self.nb_hunits
        )
        
        print("FIT PARAMS : ")
        print(hyper_parameters)
        
        self.model = build_model(hyper_parameters)
        
        earlyStopping = EarlyStopping(monitor='val_loss', patience=1, verbose=1, mode='auto')
        self.model.fit(X, y, nb_epoch=20, verbose=1, callbacks=[earlyStopping], validation_split=0.1, 
                       validation_data=None, shuffle=True)
        time.sleep(0.1)
        return self

    def predict(self, X):
        print("PREDICT")
        X = self.preprocess(X)
        return self.model.predict_classes(X)

    def predict_proba(self, X):
        X = self.preprocess(X)
        return self.model.predict(X)
    
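    def score(self, X, y):
        print("SCORE")
        # Fixed: the estimator itself was passed to evaluate() and the inputs were not preprocessed.
        # evaluate() returns [loss, accuracy]; scikit-learn expects one number, so return the accuracy.
        loss, acc = self.model.evaluate(self.preprocess(X), np_utils.to_categorical(y, 18),
                                        batch_size=32, verbose=0)
        print(loss, acc)
        return acc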
   

In [None]:
def build_model(hp):
    net = Sequential()
    net.add(Convolution2D(hp['nb_filters_1'], hp['filter_size_1'], hp['filter_size_1'], border_mode='same', 
                          input_shape=(64,64,3)))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_1'],hp['pool_size_1'])))
    net.add(Convolution2D(hp['nb_filters_2'], hp['filter_size_2'], hp['filter_size_2'], border_mode='same'))
    net.add(Activation("relu"))
    net.add(MaxPooling2D(pool_size=(hp['pool_size_2'],hp['pool_size_2'])))
    net.add(Flatten())
    net.add(Dense(output_dim=hp['nb_hunits']))
    net.add(Activation("relu"))
    net.add(Dense(output_dim=18))
    net.add(Activation("softmax"))
    
    net.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    
    return net

params = {
    'nb_hunits': [100,200,300,400,500]
}
clf = hyperparameter_optim(Classifier,params)

print("Detailed classification report:")
print()
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))

The model's average accuracy on the test set has now increased to 55%. Precision and recall are quite heterogeneous across classes: they range from 0% to almost 100%.

Are we satisfied with such performance? Well, it depends on *what* we want to achieve here.

We cannot say that this model is well suited to accurately classify all pictures regardless of the class they belong to. In particular, it performs very poorly on under-represented classes. On the other hand, if the model is meant to efficiently detect all the instances of class 0, we can say that it does the job, as recall for this class is 98%. High recall can also result from over-predicting class 0, though, so its precision is worth checking as well.
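The interplay between recall and precision reported by `classification_report` can be illustrated with a hypothetical toy example in which the dominant class 0 is predicted almost everywhere. `per_class_precision_recall` is an illustrative NumPy helper. It computes per-class precision and recall from a confusion matrix, following the same definitions.

```python
import numpy as np

def per_class_precision_recall(y_true, y_pred, n_classes):
    # Confusion matrix: rows = true class, columns = predicted class
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class truly occurs
    # Classes never predicted (or absent) get 0 instead of a division by zero
    precision = np.divide(tp, predicted, out=np.zeros(n_classes), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros(n_classes), where=actual > 0)
    return precision, recall


# Hypothetical labels: class 0 dominates and the model predicts it almost everywhere
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])
precision, recall = per_class_precision_recall(y_true, y_pred, 3)
print(precision.round(2), recall.round(2))
```

Class 0 gets a recall of 100% but a precision of only about 67%. Class 2, which is never predicted, gets 0% for both.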