# Introduction to deep learning using Keras

In this class we will explore some of the deep learning capabilities of the Keras library. For this, we will use the CIFAR-10 dataset:

<img src="cifar_10.png" />


CIFAR-10 contains 32×32×3 coloured images: if we were to treat each channel of each pixel as an independent input to an MLP, each neuron of the first hidden layer would add ∼ 3000 new parameters to the model! The situation quickly becomes unmanageable as images grow larger, well before reaching the kind of images people usually want to work with in real applications.
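The ∼ 3000 figure is simply 32 × 32 × 3 = 3072 weights, plus one bias, per hidden neuron. The short sketch below does the arithmetic. The 224×224 image size is an illustrative choice, not something used in this notebook, and the 3×3 kernel count previews the convolutional layers introduced next, whose parameter count does not grow with the image size.

```python
def dense_params_per_neuron(height, width, channels):
    """Weights plus bias for one hidden neuron fed every pixel channel as an input."""
    return height * width * channels + 1

def conv_params_per_kernel(k, channels):
    """Weights plus bias for one k x k kernel spanning all input channels."""
    return k * k * channels + 1

print(dense_params_per_neuron(32, 32, 3))    # one CIFAR-10 image
print(dense_params_per_neuron(224, 224, 3))  # a larger image (illustrative size)
print(conv_params_per_kernel(3, 3))          # a 3x3 kernel, whatever the image size
```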


# Convolutional Neural Networks

Let's start by describing what a convolutional neural network is: an architecture that has achieved state-of-the-art performance in image recognition.

## Convolutions

Convolutional neural networks (CNNs) are a specialized kind of neural network 
for processing data with a grid-like topology, i.e. data in which neighboring 
points are spatially correlated, such as the pixels of an image.

<p>Enter the <em>convolution</em> operator. Given a two-dimensional image, $I$, and a small matrix, $K$ of size $h \times w$, (known as a <em>convolution kernel</em>), which we assume encodes a way of extracting an interesting image feature, we compute the convolved image, $I * K$, by overlaying the kernel on top of the image in all possible ways, and recording the sum of elementwise products between the image and the kernel:</p>
$$(I * K)_{xy} = \sum_{i=1}^h \sum_{j=1}^w {K_{ij} \cdot I_{x + i - 1, y + j - 1}}$$
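The formula translates almost literally into a double loop. Below is a minimal NumPy sketch, assuming a single-channel image and keeping only the positions where the kernel fits entirely inside the image; indices are 0-based in code, 1-based in the formula. The toy image and the vertical-edge kernel are illustrative choices, not the kernels used in the figures below.

```python
import numpy as np

def convolve2d(image, kernel):
    """(I * K)_{xy} = sum_{i,j} K_{ij} * I_{x+i-1, y+j-1} at every position where K fits."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            # sum of elementwise products between the kernel and the patch it covers
            out[x, y] = np.sum(kernel * image[x:x + h, y:y + w])
    return out

# Toy image: dark left half, bright right half, i.e. one vertical edge
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Illustrative vertical-edge kernel
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

edges = convolve2d(image, kernel)
print(edges)  # nonzero only in the columns where the kernel straddles the edge
```

Like the formula, this does not flip the kernel, so strictly speaking it computes a cross-correlation; deep learning libraries also implement this operation under the name "convolution".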

<p>The images below show a diagrammatic overview of the above formula and the result of applying convolution (with two separate kernels) to an image so that it acts as an edge detector:</p>

![](convolve.png)

![](lena.jpg)

# Activation

Many different activation functions are possible, but ReLUs (rectified linear units) are by far the most popular ones.
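A ReLU is defined as $\mathrm{ReLU}(z) = \max(0, z)$, applied elementwise to the output of a layer: negative values are set to zero and positive values pass through unchanged. A minimal NumPy sketch, on made-up inputs:

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])  # illustrative pre-activations
print(relu(z))
```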


# Pooling


After the activation function has been applied, we may choose to apply a pooling layer, also referred to as a downsampling layer. There are several kinds of pooling layers, with max pooling being the most popular. It takes a filter (normally of size 2×2) and a stride of the same length, slides the filter over the input volume, and outputs the maximum value in every subregion the filter covers.

![](max_pool.png)

For a real example (note that the z dimension, i.e. the depth or number of channels, remains unchanged by the pooling operation):

![](max_pooling_2.jpeg)
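Max pooling with a 2×2 filter and stride 2 splits each depth slice into non-overlapping 2×2 blocks and keeps the largest value of each block. The NumPy sketch below assumes an input of shape (height, width, depth) and, as an illustrative simplification, drops any trailing rows or columns that do not fill a whole window. It shows both the block maxima on a small example and that the depth is preserved.

```python
import numpy as np

def max_pool(volume, size=2):
    """Max pooling with a size x size window and a stride equal to size.

    volume has shape (height, width, depth); each depth slice is pooled
    independently, so the depth is unchanged.
    """
    h, w, d = volume.shape
    out_h, out_w = h // size, w // size
    # Split rows and columns into (block index, offset within block), then take the block max
    blocks = volume[:out_h * size, :out_w * size, :].reshape(out_h, size, out_w, size, d)
    return blocks.max(axis=(1, 3))

# A single 4x4 depth slice with illustrative values
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)[:, :, np.newaxis]
print(max_pool(x)[:, :, 0])

# Height and width are halved, depth stays the same
print(max_pool(np.zeros((32, 32, 20))).shape)
```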

# Combining them into a neural network

The different convolution and pooling layers are then combined into a single architecture by composition. The last layer is different as it needs to generate a multi-class decision. It is commonly a softmax layer.

![](architecture.png)
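A softmax layer turns a vector of class scores $z$ into probabilities, $\mathrm{softmax}(z)_k = e^{z_k} / \sum_j e^{z_j}$, and the predicted class is the one with the highest probability. A minimal NumPy sketch with made-up scores for three classes; subtracting the maximum score first does not change the result but avoids overflow in the exponential.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # numerical stability; the output is unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative scores
probs = softmax(scores)
predicted_class = int(np.argmax(probs))  # the multi-class decision
print(probs, probs.sum(), predicted_class)
```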

## First steps with Keras

Keras is a high-level library for deep learning with an API modeled after scikit-learn. It uses Theano or TensorFlow behind the scenes for the actual computations.

Keras can be installed from conda-forge. Run the next cell to install it using conda.

```python
!conda install --yes --channel https://conda.anaconda.org/conda-forge keras tensorflow
```

```
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /home/fabian/anaconda3:
#
keras                     2.0.6                    py36_0    conda-forge
tensorflow                1.3.0                    py36_0    conda-forge
```


```python
%matplotlib inline

import keras
from keras.models import Sequential  # basic class for specifying and training a neural network
from keras.datasets import cifar10   # subroutines for fetching the CIFAR-10 dataset
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten, Activation
import numpy as np
```

```
Using TensorFlow backend.
```



The CNN relies on several hyperparameters. For the purposes of this tutorial, we will stick to "sensible" hand-picked values for them, but keep in mind that later on I will introduce a more principled method for tuning them:

 * The batch size, representing the number of training examples being used simultaneously during a single iteration of the gradient descent algorithm.
 * The number of epochs, representing the number of times the training algorithm will iterate over the entire training set before terminating.
 * The kernel sizes in the convolutional layers.
 * The pooling size in the pooling layers.
 * The number of kernels in the convolutional layers.
 * The dropout probability (we will apply dropout after each pooling, and after the fully connected layer).
 * The number of neurons in the fully connected layer of the MLP.




```python
batch_size = 32   # in each iteration, we consider 32 training examples at once
num_epochs = 10   # we iterate 10 times over the entire training set
kernel_size = 3   # we will use 3x3 kernels throughout
pool_size = 2     # we will use 2x2 pooling throughout
conv_depth_1 = 20 # we will initially have 20 kernels per conv. layer...
conv_depth_2 = 50 # ...switching to 50 after the first pooling layer
hidden_size = 512 # the FC layer will have 512 neurons
```
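To see what the batch size and the number of epochs mean in practice, the sketch below counts gradient-descent updates for the 50,000 CIFAR-10 training images. It assumes, as Keras's `fit` does, that the last incomplete batch (here 16 images, since 50000 = 1562 × 32 + 16) is still used.

```python
import math

n_train = 50000   # CIFAR-10 training images
batch_size = 32
num_epochs = 10

updates_per_epoch = math.ceil(n_train / batch_size)  # 1562 full batches plus one of 16 images
total_updates = updates_per_epoch * num_epochs
print(updates_per_epoch, total_updates)
```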

```python
# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_classes = np.unique(y_train).size
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
```

```
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
```
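The "binary class matrices" produced by `to_categorical` are one-hot encodings: row `i` is all zeros except for a 1 in column `y[i]`. The NumPy sketch below builds the same layout by indexing an identity matrix; the label values are made up, and they are given the column shape `(n, 1)` that `cifar10.load_data()` returns, hence the `ravel()`.

```python
import numpy as np

# Illustrative integer labels with shape (n, 1), flattened to shape (n,)
labels = np.array([[6], [9], [9], [4], [1]]).ravel()
num_classes = 10

# Row i of the identity matrix has its single 1 in column i
one_hot = np.eye(num_classes)[labels]
print(one_hot.shape)
print(one_hot[0])
```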


# The LeNet architecture

![](lenet.png)

LeNet is a classic convolutional neural network architecture.

LeNet is small and easy to understand, yet large enough to provide interesting results. Furthermore, it can be trained on a CPU.


```python
model = Sequential()
# First block: conv -> ReLU -> 2x2 max pooling -> dropout
model.add(Conv2D(conv_depth_1, (kernel_size, kernel_size), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
model.add(Dropout(0.25))

# Second block: same structure, more kernels
model.add(Conv2D(conv_depth_2, (kernel_size, kernel_size), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
model.add(Dropout(0.25))

# Classifier: flatten -> fully connected layer -> softmax over the classes
model.add(Flatten())
model.add(Dense(hidden_size))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
```
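It can help to count the parameters of this model by hand. The sketch below assumes the hyperparameter values set above (3×3 kernels, 20 then 50 kernels, 2×2 pooling, 512 hidden units, 10 classes); with `padding='same'` each convolution keeps the height and width, and each pooling halves them. These totals should agree with the "Param #" column of `model.summary()`, and they show that the vast majority of the parameters sit in the first fully connected layer, not in the convolutions.

```python
def conv_params(k, in_depth, n_kernels):
    """Each kernel has k*k*in_depth weights plus one bias."""
    return (k * k * in_depth + 1) * n_kernels

def dense_params(n_in, n_out):
    """A weight per input-output pair plus one bias per output."""
    return (n_in + 1) * n_out

c1 = conv_params(3, 3, 20)          # 32x32x3  -> 32x32x20, then pooling -> 16x16x20
c2 = conv_params(3, 20, 50)         # 16x16x20 -> 16x16x50, then pooling -> 8x8x50
d1 = dense_params(8 * 8 * 50, 512)  # flattened 3200 values -> 512 hidden units
d2 = dense_params(512, 10)          # 512 -> 10 class scores
total = c1 + c2 + d1 + d2
print(c1, c2, d1, d2, total)
```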

We will now train the model on the data.

<img style="float: left; width: 50px; top: -20px" src="https://cdn1.iconfinder.com/data/icons/hawcons/32/700303-icon-61-warning-128.png" /> In Keras, a model needs to be "compiled" before it is fitted, i.e. configured with a loss function, an optimizer and the metrics to report.


```python
# Choose the loss, the optimization method (here RMSprop) and the metrics to report
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.summary()  # prints the layers, their output shapes and parameter counts
```


```python
# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=num_epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
```

```
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

<keras.callbacks.History at 0x7fc46a7e0cf8>
```

```python
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
```

```
Test loss: 1.26771278038
Test accuracy: 0.5694
```


## Scikit-learn <=> Keras dictionary

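```model.fit```  <-> ```model.compile``` followed by ```model.fit```

```model.predict``` <-> ```model.predict``` (Keras returns class probabilities, like scikit-learn's ```predict_proba```; take the ```argmax``` to get labels)

```model.score``` <-> ```model.evaluate(x, y)[1]``` (the accuracy, when the model was compiled with ```metrics=['accuracy']```)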
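Because Keras's `predict` returns one probability per class, getting scikit-learn-style labels and accuracy takes one extra `argmax`. The NumPy sketch below uses hypothetical softmax outputs for four samples and three classes; the accuracy it computes is the fraction of samples whose most probable class matches the one-hot target, which is what `evaluate` reports as accuracy for a model compiled with `categorical_crossentropy` and `metrics=['accuracy']`.

```python
import numpy as np

# Hypothetical softmax outputs, as model.predict would return them
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
y_true_one_hot = np.eye(3)[[0, 1, 1, 0]]  # one-hot targets

y_pred = probs.argmax(axis=1)             # scikit-learn-style predicted labels
y_true = y_true_one_hot.argmax(axis=1)
accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)
```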

---


# Scikit-learn <=> Keras interoperability

It is possible to wrap a function that builds a Keras model into a scikit-learn estimator (```KerasClassifier```), so that it can be used in objects such as ```GridSearchCV``` or ```Pipeline```.

```python
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def make_model(dense_layer_sizes, filters, kernel_size, pool_size):
    '''Creates model comprised of 2 convolutional layers followed by dense layers

    dense_layer_sizes: List of layer sizes.
        This list has one number for each layer
    filters: Number of convolutional filters in each convolutional layer
    kernel_size: Convolutional kernel size
    pool_size: Size of pooling area for max pooling
    '''

    model = Sequential()
    model.add(Conv2D(filters, kernel_size,
                     padding='valid',
                     input_shape=x_train.shape[1:]))
    model.add(Activation('relu'))
    model.add(Conv2D(filters, kernel_size))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=pool_size))
    model.add(Dropout(0.25))

    model.add(Flatten())
    for layer_size in dense_layer_sizes:
        model.add(Dense(layer_size))
        model.add(Activation('relu'))  # one ReLU after every dense layer
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    return model

dense_size_candidates = [[32], [64], [32, 32], [64, 64]]
my_classifier = KerasClassifier(make_model, batch_size=32)
validator = GridSearchCV(my_classifier,
                         param_grid={'dense_layer_sizes': dense_size_candidates,
                                     # epochs is avail for tuning even when not
                                     # an argument to model building function
                                     'epochs': [3, 6],
                                     'filters': [8],
                                     'kernel_size': [3],
                                     'pool_size': [2]},
                         scoring='neg_log_loss',
                         n_jobs=1)
validator.fit(x_train, y_train)

print('The parameters of the best model are: ')
print(validator.best_params_)

# validator.best_estimator_ returns sklearn-wrapped version of best model.
# validator.best_estimator_.model returns the (unwrapped) keras model
best_model = validator.best_estimator_.model
metric_names = best_model.metrics_names
metric_values = best_model.evaluate(x_test, y_test)
for metric, value in zip(metric_names, metric_values):
    print(metric, ': ', value)
```
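The grid search is slow because every combination of values in `param_grid` is trained once per cross-validation fold, plus one final refit of the best combination on the whole training set. The sketch below enumerates the candidates with `itertools.product`; the number of folds is an assumption, since the default `cv` of `GridSearchCV` depends on the scikit-learn version (3 in older releases, 5 in newer ones).

```python
from itertools import product

param_grid = {'dense_layer_sizes': [[32], [64], [32, 32], [64, 64]],
              'epochs': [3, 6],
              'filters': [8],
              'kernel_size': [3],
              'pool_size': [2]}

# Every combination of values is one candidate model
keys = sorted(param_grid)
candidates = [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]

n_folds = 3  # assumption: depends on the scikit-learn version's default cv
n_fits = len(candidates) * n_folds + 1  # one fit per candidate and fold, plus the final refit
print(len(candidates), n_fits)
```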





# Chihuahua vs Muffin

This is one of the most important and challenging problems of our time. For decades, the world's best minds have tried and failed. Will you be able to build a classifier that distinguishes a chihuahua from a muffin?


![](chihuahua_vs_muffin.jpg)

For this, I have prepared a dataset consisting of 100 training images of chihuahuas and muffins, 100 for the test set and 100 for the validation set. Extract it into the current directory (i.e. wherever this notebook lives), so that the `train`, `test` and `validation` folders sit next to the notebook, each with one subfolder per class as ```flow_from_directory``` expects.

Since the images in this dataset are larger, we avoid loading the whole dataset into memory. Instead, we use an ```ImageDataGenerator``` object to create a ```train_generator``` that feeds batches of images to the model, and the model is trained using ```fit_generator``` instead of ```fit```.

Note: the [```ImageDataGenerator```](https://keras.io/preprocessing/image/) is extremely versatile and has many useful features. Get to know it.
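What `fit_generator` consumes is an endless iterator of `(x_batch, y_batch)` tuples. The plain-Python sketch below, with the hypothetical name `batch_generator`, illustrates that protocol; for simplicity its data sits in memory, whereas `flow_from_directory` reads each batch of image files from disk. Because such a generator never stops, `fit_generator` must be told how many batches make up one epoch through its `steps_per_epoch` argument.

```python
import numpy as np

def batch_generator(x, y, batch_size, seed=0):
    """Yield (x_batch, y_batch) tuples forever, reshuffling after each pass over the data."""
    rng = np.random.RandomState(seed)
    n = len(x)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

# Toy stand-ins for 100 images of 150x150 pixels and their binary labels (hypothetical data)
x = np.zeros((100, 150, 150, 3), dtype='float32')
y = np.zeros(100)

gen = batch_generator(x, y, batch_size=16)
xb, yb = next(gen)
print(xb.shape, yb.shape)

steps_per_epoch = int(np.ceil(len(x) / 16))  # 6 full batches plus one batch of 4 images
print(steps_per_epoch)
```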

**Goal of the exercise**. Create a classifier that distinguishes Chihuahua vs Muffin. Compute and report score on validation set.

Hint: you have little data. You might want to do some [data augmentation](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).
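Data augmentation means showing the model randomly perturbed copies of the training images, so that each epoch sees slightly different versions of the same photos. The NumPy sketch below, with the hypothetical name `augment_batch`, illustrates the idea with two perturbations: a random left-right flip and a random translation in which pixels leaving one edge wrap around to the other. In Keras you would not write this by hand: `ImageDataGenerator` accepts arguments such as `horizontal_flip`, `width_shift_range`, `height_shift_range` and `fill_mode`.

```python
import numpy as np

def augment_batch(batch, rng, max_shift=10):
    """Return a randomly perturbed copy of a batch of shape (n, height, width, channels).

    Each image is flipped left-right with probability 0.5 and translated by up to
    max_shift pixels in each direction, with wrap-around at the borders.
    """
    out = np.empty_like(batch)
    for i, img in enumerate(batch):
        if rng.rand() < 0.5:
            img = img[:, ::-1, :]                            # horizontal flip
        dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
        out[i] = np.roll(img, shift=(dy, dx), axis=(0, 1))   # translation with wrap-around
    return out

rng = np.random.RandomState(0)
images = rng.rand(4, 150, 150, 3).astype('float32')  # stand-in for a batch of photos
augmented = augment_batch(images, rng)
print(augmented.shape)
```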

```python
from keras.preprocessing.image import ImageDataGenerator

batch_size = 16

# for now the training images are only rescaled; augmentation options would go here
train_datagen = ImageDataGenerator(
        rescale=1./255)

# this is the configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)

validation_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of 'train', and indefinitely generate
# batches of image data
train_generator = train_datagen.flow_from_directory(
        'train',  # this is the target directory
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=batch_size,
        class_mode='binary')  # since we use binary_crossentropy loss, we need binary labels

# this is a similar generator, for test data
test_generator = test_datagen.flow_from_directory(
        'test',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')

# and one for validation data
validation_generator = validation_datagen.flow_from_directory(
        'validation',  # this is the target directory
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=batch_size,
        class_mode='binary')  # since we use binary_crossentropy loss, we need binary labels
```