# Convolutional Neural Networks

This notebook gives a general overview of convolutional neural networks (CNNs) and what can be done with them.
We will use the MNIST dataset.
As an optional extra at the end of the notebook, you can try to implement something similar on the CIFAR-10 dataset.

Please note that this notebook is not an advanced implementation of CNNs. Its purpose is to show you how to implement a simple CNN from scratch, without using any pre-trained network.

## MNIST Dataset

The MNIST dataset is a large database of handwritten digits. It contains 60,000 training images and 10,000 test images.

### Data Preparation

**Import the packages that you may need.**

In [11]:
from __future__ import absolute_import, division, print_function
import numpy as np
import keras
from keras.datasets import cifar10, mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten, Reshape
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import to_categorical
import pickle
from matplotlib import pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (15, 8)

%matplotlib inline

**Load the MNIST dataset.**

In [12]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

**Pre-process both the inputs and the labels. Hint: reshape each input image to (28, 28, 1).**

In [13]:
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')
train_images /= 255
test_images /= 255
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)
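
To make the pre-processing above concrete, here is a minimal NumPy sketch of the same three steps (adding a channel axis, scaling pixels to [0, 1], one-hot encoding the labels) on a tiny synthetic batch. The arrays `fake_images` and `fake_labels` and the helper `one_hot` are made up for illustration; in the notebook itself, `to_categorical` does the encoding.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 grayscale images of 28x28 uint8 pixels.
fake_images = np.zeros((3, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 255  # one fully white pixel
fake_labels = np.array([5, 0, 9])

# Add the trailing channel axis and scale pixel values from [0, 255] to [0, 1].
x = fake_images.reshape(fake_images.shape[0], 28, 28, 1).astype('float32') / 255


def one_hot(labels, num_classes):
    """Illustrative one-hot encoding of integer class labels."""
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


y = one_hot(fake_labels, 10)

print(x.shape)  # (3, 28, 28, 1)
print(y.shape)  # (3, 10)
print(y[0])     # a single 1.0 at index 5, zeros elsewhere
```

Each label becomes a row of ten values with a single 1 at the index of its class, which is the target format that a 10-way softmax output is compared against.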

**Print the shape of the data and visualize some samples.**

In [14]:
# Print the shapes of the pre-processed data
print('--- THE DATA ---')
print('train_images shape:', train_images.shape)
print(train_images.shape[0], 'train samples')
print(test_images.shape[0], 'test samples')
print('train_labels shape:', train_labels.shape)

--- THE DATA ---
train_images shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
train_labels shape: (60000, 10)


## Vanilla CNN

This is the most basic CNN: build a convolutional neural network composed of 2 convolutional layers and 2 fully connected layers, with appropriate activation functions.

- Convolutional layer: 32 filters, 3x3;
- ReLU activation function;
- Convolutional layer: 32 filters, 3x3;
- ReLU activation function;
- Flatten;
- Fully Connected layer of size 128;
- ReLU activation function;
- Fully Connected layer of size 10;
- Softmax activation function;
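
The layer list above already determines every output shape and parameter count, so it can help to work them out by hand before building the model. The sketch below does that arithmetic in plain Python. It assumes stride-1 convolutions without padding and one bias per filter or unit, which to my knowledge are the Keras defaults. The helper names are hypothetical.

```python
def conv_output_size(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


side = 28                            # MNIST images are 28x28x1
side = conv_output_size(side, 3)     # 26
conv1 = conv_params(3, 1, 32)        # 320
side = conv_output_size(side, 3)     # 24
conv2 = conv_params(3, 32, 32)       # 9248
flat = side * side * 32              # 18432 values after Flatten
fc1 = dense_params(flat, 128)        # 2359424
fc2 = dense_params(128, 10)          # 1290
total = conv1 + conv2 + fc1 + fc2    # 2370282

print(side, flat, total)
```

Almost all of the parameters sit in the first fully connected layer, because it is connected to every one of the 18,432 flattened activations.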

**Set the batch size and the number of epochs. Without a GPU, please keep the number of epochs under 10.**

In [15]:
# Parameters
batch_size = 64
epochs = 5
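
As a rough guide to what these two values mean for training time, the sketch below counts the weight updates they imply for the 60,000 MNIST training images. It assumes that the final, smaller batch of each epoch is still used for an update. The variable names are illustrative.

```python
import math

train_samples = 60000
batch_size = 64
epochs = 5

# One weight update per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(train_samples / batch_size)              # 938
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size  # 32
total_updates = steps_per_epoch * epochs                              # 4690

print(steps_per_epoch, last_batch_size, total_updates)
```

A larger batch size means fewer but more expensive updates per epoch, while more epochs multiply the total number of updates directly.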

**Build the Vanilla CNN model.**

In [16]:
model_vanilla = Sequential()

# 1st Conv Layer: 32 filters with a 3x3 kernel
model_vanilla.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))
model_vanilla.add(Activation('relu'))

# 2nd Conv Layer: 32 filters with a 3x3 kernel
model_vanilla.add(Convolution2D(32, (3, 3)))
model_vanilla.add(Activation('relu'))

# Fully Connected Layer
model_vanilla.add(Flatten())
model_vanilla.add(Dense(128))
model_vanilla.add(Activation('relu'))

# Prediction output Layer
model_vanilla.add(Dense(10))
model_vanilla.add(Activation('softmax'))


**Get a summary of the model.**

In [17]:
model_vanilla.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
activation_8 (Activation)    (None, 26, 26, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 24, 24, 32)        9248      
_________________________________________________________________
activation_9 (Activation)    (None, 24, 24, 32)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 18432)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               2359424   
_________________________________________________________________
activation_10 (Activation)   (None, 128)               0         
__________

**Configure the model with an optimizer and a loss.**

In [18]:
model_vanilla.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
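
To give some intuition for the `categorical_crossentropy` loss chosen above, here is a small plain-Python sketch of the per-sample formula, `-sum(true_k * log(pred_k))`, on a 3-class toy example. The function below is an illustrative re-implementation, not the Keras code, and the `eps` guard against `log(0)` is an assumption of this sketch.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample with a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0, 0, 1]               # the correct class is index 2

confident = [0.05, 0.05, 0.90]   # correct and confident
unsure = [0.30, 0.30, 0.40]      # correct but hesitant
wrong = [0.90, 0.05, 0.05]       # confidently wrong

print(round(categorical_crossentropy(y_true, confident), 3))  # 0.105
print(round(categorical_crossentropy(y_true, unsure), 3))     # 0.916
print(round(categorical_crossentropy(y_true, wrong), 3))      # 2.996
```

With a one-hot target, only the probability assigned to the correct class contributes, so the loss grows quickly as that probability approaches zero.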

**Train the model.**

In [19]:
model_vanilla.fit(train_images, train_labels, batch_size=batch_size, epochs=epochs)



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1bdce4f3b38>

### CNN with Max Pooling and Dropout

Let's implement the same CNN as above, this time adding Max Pooling and Dropout.
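
Before building the network, it may help to see what the two new operations do. The sketch below is a plain-Python illustration, not the Keras implementation. `max_pool_2x2` keeps the largest value of each 2x2 block, and `dropout` is written as inverted dropout (survivors rescaled by `1 / (1 - rate)`), which is how I understand dropout to behave during training. Both helper names are hypothetical.

```python
import random


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def dropout(values, rate, rng):
    """Training-time inverted dropout: zero each value with probability
    `rate` and rescale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool_2x2(fmap)
print(pooled)  # [[4, 2], [2, 8]]

rng = random.Random(0)
dropped = dropout([1.0] * 8, 0.5, rng)
print(dropped)  # every entry is either 0.0 or 2.0

# Effect of pooling on the first fully connected layer (128 units):
flat_without_pool = 24 * 24 * 32                   # 18432
flat_with_pool = 12 * 12 * 32                      # 4608
dense_without_pool = flat_without_pool * 128 + 128  # 2359424
dense_with_pool = flat_with_pool * 128 + 128        # 589952
print(dense_without_pool, dense_with_pool)
```

Pooling after the convolutional layers halves each spatial dimension, so the first fully connected layer needs roughly a quarter of the weights. Dropout is typically placed where overfitting is most likely, for example just before or after that large layer.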

**Build the new network with max pooling and dropout. Think carefully about where the Max Pooling and Dropout layers should be inserted.**

In [20]:
model_vanilla_pooling = Sequential()

# 1st Convolutional Layer: 32 filters with a 3x3 kernel
model_vanilla_pooling.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))
model_vanilla_pooling.add(Activation('relu'))

# 2nd Convolutional Layer: 32 filters with a 3x3 kernel
model_vanilla_pooling.add(Convolution2D(32, (3, 3)))
model_vanilla_pooling.add(Activation('relu'))

# Max Pooling
model_vanilla_pooling.add(MaxPooling2D(pool_size=(2,2)))
    
# Dropout
model_vanilla_pooling.add(Dropout(0.25))

# Fully Connected Layer
model_vanilla_pooling.add(Flatten())
model_vanilla_pooling.add(Dense(128))
model_vanilla_pooling.add(Activation('relu'))
    
# More Dropout
model_vanilla_pooling.add(Dropout(0.5))

# Fully Connected Layer for Prediction
model_vanilla_pooling.add(Dense(10))
model_vanilla_pooling.add(Activation('softmax'))


**Get a summary of the model.**

In [21]:
model_vanilla_pooling.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_9 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
activation_12 (Activation)   (None, 26, 26, 32)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 24, 24, 32)        9248      
_________________________________________________________________
activation_13 (Activation)   (None, 24, 24, 32)        0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 12, 12, 32)        0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 4608)              0         
__________

**Configure the network.**

In [22]:
model_vanilla_pooling.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

**Train the network.**

In [23]:
model_vanilla_pooling.fit(train_images, train_labels, batch_size=batch_size, epochs=epochs)



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1bdcf928f28>

**Evaluate the model on the test data.**

In [24]:
test_loss, test_acc = model_vanilla_pooling.evaluate(test_images, test_labels)



**Print the test accuracy.**

In [25]:
print(test_acc)

0.9902
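
For reference, the accuracy printed above is the fraction of test images whose highest-probability class matches the true label. The plain-Python sketch below shows that computation on four made-up predictions over three classes. The helpers `argmax` and `accuracy` are illustrative, not Keras functions.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of samples whose predicted class equals the true class."""
    hits = sum(argmax(t) == argmax(p)
               for t, p in zip(y_true_one_hot, y_pred_probs))
    return hits / len(y_true_one_hot)


# Made-up one-hot targets and softmax outputs for 4 samples, 3 classes.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.8, 0.1, 0.1],
          [0.2, 0.7, 0.1],
          [0.3, 0.3, 0.4],
          [0.6, 0.3, 0.1]]   # last sample is misclassified as class 0

print(accuracy(y_true, y_pred))  # 0.75

# An accuracy of 0.9902 on 10,000 test images means about 98 errors.
misclassified = round(10000 * (1 - 0.9902))
print(misclassified)  # 98
```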


## Extra: More complex CNN with CIFAR-10

As an extra part, you can also load the CIFAR-10 dataset, pre-process it in a similar way to the MNIST dataset, and implement a suitable CNN. The dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. This task is harder than MNIST, so you will need a slightly deeper network, with 4 convolutional layers.

This part is not guided like the previous one: it is up to you to start from scratch and try out the implementation. However, the procedure is very similar.
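
One way to plan a deeper network before writing any Keras code is to trace how the image shape changes from layer to layer. The plain-Python sketch below does this for one possible 4-convolution layout on 32x32x3 inputs. The layout (`plan`), the choice of padding, and the helper `trace` are assumptions made for illustration, not a prescribed solution.

```python
def trace(input_shape, plan):
    """Return the (height, width, channels) shape after each layer in `plan`.

    Supported entries (stride-1 convolutions, non-overlapping pooling):
      ('conv', filters, kernel, padding)  with padding 'same' or 'valid'
      ('pool', size)
    """
    h, w, c = input_shape
    shapes = []
    for layer in plan:
        kind = layer[0]
        if kind == 'conv':
            _, filters, kernel, padding = layer
            if padding == 'valid':
                h, w = h - kernel + 1, w - kernel + 1
            c = filters
        elif kind == 'pool':
            _, size = layer
            h, w = h // size, w // size
        else:
            raise ValueError('unknown layer kind: %r' % (kind,))
        shapes.append((kind, (h, w, c)))
    return shapes


# Hypothetical layout: two blocks of (conv, conv, pool).
plan = [
    ('conv', 32, 3, 'same'),
    ('conv', 32, 3, 'valid'),
    ('pool', 2),
    ('conv', 64, 3, 'same'),
    ('conv', 64, 3, 'valid'),
    ('pool', 2),
]

shapes = trace((32, 32, 3), plan)
for kind, shape in shapes:
    print(kind, shape)

final_h, final_w, final_c = shapes[-1][1]
flat = final_h * final_w * final_c
print(flat)  # 2304 values would reach the first fully connected layer
```

Changing the entries of `plan` shows quickly whether a layout shrinks the feature maps too fast or leaves too many values for the fully connected layers.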