# Applying Convolutional Neural Networks

We will start out by building a simple CNN to see how the foundations work, then we'll look into improving our methods.

We will be using the Keras library, which provides the building blocks of our neural network, including the layers, optimizer, and dataset loader we will use. Learn more about Keras here: https://keras.io

In [1]:
import keras

Using TensorFlow backend.


Now we will import what we need to build the CNN. Let's start with Sequential, which we will use to initialize our model before adding layers to it.

In [2]:
from keras.models import Sequential

Speaking of layers, why don't we bring them in right now. Here are the 'learning' layers we will use, and just to recap what they are:

Conv2D is our convolutional layer. It slides a small matrix of learned weights (a filter, or kernel) across the input image and, at each position, takes the dot product of the filter with the submatrix of the image underneath it. The window size and the number of filters are ours to define. We will also add an activation to this process so that the otherwise linear operation becomes non-linear.

MaxPooling2D will keep the largest (max) value from each small window of our convolved output and 'pool' these values together into a smaller matrix.

Dropout will give each pooled value a probability of 'dropping out' during training, a.k.a. being set to zero so that it does not reach the next layer. This is key in preventing the network from overfitting the data.
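
To make the three layers above concrete, here is a minimal NumPy sketch of what each one computes on a single tiny image. It is an illustration rather than how Keras implements these layers: the helper names (`conv2d_valid`, `max_pool_2x2`, `dropout`), the toy image, and the hand-picked kernel are all assumptions made for this example, and the convolution uses no padding to keep the loop short.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take dot products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]   # submatrix under the kernel
            out[i, j] = np.sum(window * kernel)  # dot product with the kernel
    return out

def relu(x):
    """Replace negative values with 0 and leave positive values unchanged."""
    return np.maximum(x, 0)

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def dropout(x, rate, rng):
    """Inverted dropout: zero a random share of values and rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

# Toy 5x5 "image" with a vertical edge, and a hand-picked 2x2 edge kernel.
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

feature_map = relu(conv2d_valid(image, kernel))  # shape (4, 4), responds at the edge
pooled = max_pool_2x2(feature_map)               # shape (2, 2)
dropped = dropout(pooled, rate=0.25, rng=np.random.default_rng(0))

print(feature_map)
print(pooled)
```

In a real network the kernel values are not chosen by hand; they are the weights the convolutional layer learns during training.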

In [3]:
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dropout

Our classification layers will combine what the learning layers extracted into a standard fully connected neural network. These layers include:

Flatten will 'flatten' the output of the learning layers into a single vector, which will be the input for the network.

Dense will create a fully connected layer on top of the flattened data: every input is connected to every node, and an activation is applied to the layer's output.


In [4]:
from keras.layers import Flatten, Dense
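
As a rough sketch of what Flatten and Dense do to the data, the NumPy example below flattens a small stack of feature maps and pushes it through one fully connected layer. The `dense` helper, the input values, and the fixed weights are all made up for illustration; in Keras the weights are learned during training.

```python
import numpy as np

def relu(x):
    """Replace negative values with 0 and leave positive values unchanged."""
    return np.maximum(x, 0)

def dense(inputs, weights, bias, activation):
    """One fully connected layer: activation(inputs @ weights + bias)."""
    return activation(inputs @ weights + bias)

# Hypothetical pooled output: 2 feature maps of 2x2 values each.
pooled = np.array([[[1.0, 2.0], [3.0, 4.0]],
                   [[5.0, 6.0], [7.0, 8.0]]])

# Flatten: shape (2, 2, 2) becomes a single vector of shape (8,).
flat = pooled.reshape(-1)

# Illustrative fixed weights: 8 inputs feeding 3 nodes.
weights = np.full((8, 3), 0.1)
weights[:, 2] = -0.1   # the third node only has negative weights
bias = np.zeros(3)

hidden = dense(flat, weights, bias, relu)
print(flat.shape, hidden)
```

The third node ends up at 0 because its weighted sum is negative and ReLU clips it.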

Now that we have all of our layers, let's define the problem we will work on. Convolutional Neural Networks are typically used on data that can be scanned in chunks to identify local features, like images and audio. In our example, we will be looking at one of the most popular applications of CNNs: classifying the MNIST database of handwritten digits.

# Problem: Classifying Handwritten Numbers

<img src="mnist_example.png">

We will be creating a network to read in images like this and assign them a classification of what number they are. Each image is 28x28 pixels, with each pixel corresponding to a greyscale value from 0-255. Our goal is to create a neural network that will work out the features of each image, such as the forks on the number 4 <img src="mnist_four_highlight.png"> 

The network will then use these features to learn how to classify the numbers. We will be training the network with 60,000 labeled handwritten numbers.

Step 1: Getting the data started

Let's bring in the dataset and some tools to help us manage the data.

In [5]:
#dataset containing the training and testing sets
from keras.datasets import mnist

#we will use this to convert the integer labels into a one-hot (binary class) matrix
from keras.utils import to_categorical

#we will use Adam, a widely used adaptive gradient-descent optimizer
from keras.optimizers import Adam

#we will hold data in np arrays
import numpy as np

Let's load in the training and testing data. Our training set contains 60,000 images and the testing set contains 10,000 images. Both sets have corresponding labels to their images as well.

In [10]:
# load data
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

The most important thing to do before passing data into the CNN is making sure the data is in a shape the network accepts, so we must format our data. Let's start with the images. Currently the images in our training set are in an array of shape (60000, 28, 28): 60000 samples, each 28 pixels wide and 28 pixels high. Our Keras Conv2D layer also expects a 'depth' axis for each image, and that depth will be our color channel. Color images typically have 3 color channels (Red, Green, Blue), but since we are using greyscale images, we will just use 1 channel. Let's define the shape of our images, placing the channel axis right after the sample axis.

In [11]:
num_of_training_images = training_images.shape[0]
num_of_test_images = test_images.shape[0]
num_of_color_channels = 1
pixel_width, pixel_height = 28, 28

#image shapes
training_image_shape = (num_of_training_images, num_of_color_channels, pixel_width, pixel_height)
test_image_shape = (num_of_test_images, num_of_color_channels, pixel_width, pixel_height)
print(training_image_shape)

(60000, 1, 28, 28)


We will reshape our image sets and convert the data to a floating-point type so that we can normalize it. float32 is the usual choice for neural network inputs and is more than precise enough for pixel values, so we will go with float32.

In [12]:
training_images = training_images.reshape(training_image_shape).astype('float32')
test_images = test_images.reshape(test_image_shape).astype('float32')

Now that our data are floats, we can normalize them by dividing by the max possible value in our color channel, 255. This will turn our data into values between 0 and 1, a small and consistent range that helps the network train stably.

In [13]:
training_images /= 255
test_images /= 255
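
The reshape and normalization steps can be checked on a small stand-in array without downloading MNIST. The sketch below uses 3 randomly generated images in place of the real data; the name `fake_images` and the random pixel values are assumptions for illustration only.

```python
import numpy as np

# Stand-in for MNIST: 3 fake greyscale images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Add a channel axis right after the sample axis: (3, 28, 28) -> (3, 1, 28, 28).
reshaped = fake_images.reshape(3, 1, 28, 28).astype('float32')

# Scale pixel values from the range [0, 255] to the range [0, 1].
normalized = reshaped / 255

print(normalized.shape, normalized.dtype)
print(normalized.min(), normalized.max())
```

Reshaping only adds the extra axis; the pixel values themselves are untouched until the division.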

We have now prepared our images for the CNN, so let's look at the labels. The labels are a 1D array containing integers 0-9. These integers are class names rather than quantities (a 9 is not 'more' than a 3), and our network will output one probability per class, so we will 'binarize' the labels with one-hot encoding using the to_categorical method. Each label becomes a vector of 10 values with a 1 at the index of its class and 0 everywhere else. To learn more about one-hot: https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/

In [14]:
training_labels = to_categorical(training_labels)
test_labels = to_categorical(test_labels)
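
For intuition, one-hot encoding can be written by hand in a few lines of NumPy. This is a sketch of the idea, not the Keras implementation; the `one_hot` helper and the sample labels are invented for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

sample_labels = np.array([5, 0, 9])
encoded = one_hot(sample_labels, num_classes=10)

# Row 0 has its 1 at index 5, row 1 at index 0, and row 2 at index 9.
print(encoded)
```

Taking the index of the largest value in each row recovers the original integer label.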

We have now prepared our data for the CNN! Let's now apply it to the actual neural network.

Step 2: Form the CNN model

Initialize the model to which we will add layers.

In [15]:
model = Sequential()

Now we will add layers in accordance with the convolution process described earlier:
1. Convolve the data using a 5x5 window that scans the image, producing 32 output filters with a ReLU activation (ReLU turns negative values into 0 and leaves positive values unchanged)
2. Pool the maximum convolved values
3. Drop out some values; in this case we will drop 1/4 of the units

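*We set padding to same so that the convolution output keeps the same width and height as its input. For the pooling layer, same padding pads the edges so that no border values are dropped when the size is halved.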

In [16]:
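#our images are shaped (channels, width, height), so we tell both layers that the channel axis comes first
model.add(Conv2D(32, (5, 5), padding="same", data_format="channels_first", input_shape=(num_of_color_channels, pixel_width, pixel_height), activation='relu'))
model.add(MaxPooling2D(padding="same", data_format="channels_first"))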
model.add(Dropout(0.25))
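
It can help to work out by hand how these layers change the size of the data. The sketch below assumes a 28x28 image, the default stride of 1 for the convolution, and the default 2x2 window with stride 2 for the pooling; `same_padding_output` is a helper written for this example, not a Keras function.

```python
import math

def same_padding_output(size, stride):
    """Output length along one axis with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

height = width = 28
filters = 32

# Convolution with padding="same" and stride 1 keeps the width and height.
conv_h = same_padding_output(height, 1)
conv_w = same_padding_output(width, 1)

# Pooling with a 2x2 window and stride 2 halves the width and height.
pool_h = same_padding_output(conv_h, 2)
pool_w = same_padding_output(conv_w, 2)

# Number of values left once all 32 pooled feature maps are flattened.
flattened_size = filters * pool_h * pool_w

print((conv_h, conv_w), (pool_h, pool_w), flattened_size)
```

Dropout does not change the shape; it only zeroes some of the values during training.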

Now we will add layers for the classification process described earlier:
1. Flatten the data to be fed into the neural network
2. Add a fully connected layer of 128 nodes with a ReLU activation, which is 4 times the number of filters in our convolution layer
3. Produce the final output of 10 nodes, one for each class 0-9, using softmax to turn them into probabilities; the most probable class is the network's classification

In [17]:
model.add(Flatten())
model.add(Dense(128, kernel_initializer="normal", activation='relu'))
model.add(Dense(10, activation='softmax'))
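
To see what the softmax activation on the last layer does, here is a small NumPy sketch. The raw scores are hypothetical values chosen for illustration, not output from our model.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw outputs of the final layer for one image (classes 0-9).
scores = np.array([0.1, 0.2, 0.0, 3.5, 0.3, 0.1, 0.0, 1.2, 0.4, 0.2])

probabilities = softmax(scores)
predicted_digit = int(np.argmax(probabilities))

print(probabilities.round(3))
print(predicted_digit)
```

The class with the highest probability is taken as the prediction, which here is the digit whose raw score was largest.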

Compile all the layers together. Since our output, like our labels, will be categorical, we will use the categorical crossentropy function to compute our loss.

In [18]:
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
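
As a rough illustration of the loss we just selected, the sketch below computes categorical crossentropy by hand for two made-up sets of predictions. The function is a simplified stand-in written for this example, and the labels and probabilities are invented.

```python
import numpy as np

def categorical_crossentropy(true_one_hot, predicted_probs, eps=1e-7):
    """Mean over samples of -sum(true * log(predicted))."""
    clipped = np.clip(predicted_probs, eps, 1.0)   # avoid taking log(0)
    per_sample = -np.sum(true_one_hot * np.log(clipped), axis=1)
    return float(np.mean(per_sample))

# Two hypothetical samples with 3 classes each.
true_labels = np.array([[0, 1, 0],
                        [1, 0, 0]], dtype=float)

# Predictions that put most of the probability on the correct class.
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])

# Predictions that spread the probability almost evenly.
unsure = np.array([[0.40, 0.30, 0.30],
                   [0.30, 0.40, 0.30]])

low_loss = categorical_crossentropy(true_labels, confident)
high_loss = categorical_crossentropy(true_labels, unsure)

print(low_loss, high_loss)
```

The loss only looks at the probability assigned to the correct class, so it shrinks toward 0 as the network becomes more confident in the right answer.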

Now let's get to training! We will train our model using the training data prepared earlier. An epoch is one full pass of the network through the training data. Since we are training on 60,000 images, the first epoch should take around 2 minutes, depending on your hardware.


In [20]:
model.fit(training_images, training_labels, epochs=9, batch_size=150, verbose=1)

Epoch 1/9
Epoch 2/9
Epoch 3/9
Epoch 4/9
Epoch 5/9
Epoch 6/9
Epoch 7/9
Epoch 8/9
Epoch 9/9


<keras.callbacks.History at 0x182ea60ef0>
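
To put the epochs and batch_size arguments in perspective, here is a quick back-of-the-envelope calculation of how many batches the network processes, with one weight update after each batch. The variable names are chosen for this example.

```python
import math

num_training_images = 60000
batch_size = 150
epochs = 9

# Each epoch is split into batches; the weights are updated after every batch.
updates_per_epoch = math.ceil(num_training_images / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)
```

A smaller batch size means more updates per epoch, at the cost of each epoch taking longer to run.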

During training, we reached around 87% accuracy after just 1 epoch, and training for more epochs brought the accuracy up to around 98%!

Now that we have trained, let's test our model.

In [21]:
# Final evaluation of the model
evaluation = model.evaluate(test_images, test_labels, verbose=1)

#evaluate returns [loss, accuracy]; the loss is the categorical crossentropy, not an error percentage
print("We got " + "{:.1%}".format(evaluation[1]) + " accuracy in our testing set.")
print("Our test loss was " + "{:.3f}".format(evaluation[0]) + ".")

We got 98.6% accuracy in our testing set.
Our test loss was 0.039.
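
The accuracy metric itself is simple to reproduce: take the most probable class for each image and count how often it matches the label. The sketch below uses made-up labels and predictions over 3 classes; the `accuracy` helper is written for this example.

```python
import numpy as np

def accuracy(true_one_hot, predicted_probs):
    """Fraction of samples whose most probable class matches the label."""
    true_classes = np.argmax(true_one_hot, axis=1)
    predicted_classes = np.argmax(predicted_probs, axis=1)
    return float(np.mean(true_classes == predicted_classes))

# Four hypothetical samples over 3 classes; the last prediction is wrong.
true_labels = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 1, 0]])
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.2, 0.2, 0.6],
                        [0.5, 0.3, 0.2]])

test_accuracy = accuracy(true_labels, predictions)
print(test_accuracy)
```

Unlike the loss, accuracy ignores how confident each prediction was and only counts whether the top class was right.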


Wow! That is pretty accurate for a CNN with so few layers. Now that you have seen how these Convolutional Neural Networks work, here is a challenge for you.

# Challenge

We've just looked at the MNIST handwriting database. Now let's look at a comparable set, the Fashion MNIST database. <img src="fashion_mnist.png">

This set has the same dimensionality as the MNIST handwriting dataset (60000 training images, 10000 test images, 10 classes, 28x28 pixel images, etc.). What would happen if you ran this dataset through our CNN? Would you expect a higher or lower classification accuracy than the handwriting set, and why? What would you have to change about the CNN to get the results you desire? Go ahead and load fashion_mnist and run it through the network to confirm your findings.

If you would like to review convolutional neural networks further, here are some reference sources I used when building this tutorial:

https://www.youtube.com/watch?v=FTr3n7uBIuE&t=2431s Siraj Raval - Convolutional Neural Networks - The Math of Intelligence
http://colah.github.io/posts/2014-07-Conv-Nets-Modular/ Intuitive explanation of CNNs
https://www.kaggle.com/bugraokcu/cnn-with-keras/code A fantastic application of CNN to this problemset