# Convolutional Neural Networks

In the previous notebook we built and trained a simple model to classify ASL images. The model learned to classify the training dataset with very high accuracy, but it did not perform nearly as well on the validation dataset. This is called overfitting.

*Overfitting* is the failure of a model to generalize to data it was not trained on, such as the validation dataset. Here we will use a popular kind of model called a convolutional neural network (CNN), which is especially well suited to reading and classifying images.

## Loading and Preparing the Data

In [1]:
import tensorflow.keras as keras
import pandas as pd

In [2]:
# Load data from CSV files
train_df = pd.read_csv('Data/mnist/sign_mnist_train.csv')
valid_df = pd.read_csv('Data/mnist/sign_mnist_valid.csv')

In [3]:
# Separate out target values
y_train = train_df['label']
y_valid = valid_df['label']
del train_df['label']
del valid_df['label']

In [4]:
# Separate out image vectors
x_train = train_df.values
x_valid = valid_df.values

In [5]:
# Turn scalar targets into binary categories
num_classes = 24
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)
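
To make the encoding step concrete, here is a minimal NumPy sketch of turning integer labels into one-hot rows. The `one_hot` helper is a hypothetical name written for illustration; it is not the Keras implementation, and it assumes the labels are integers in the range `0` to `num_classes - 1`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Row i gets a 1.0 in the column given by labels[i]
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Three toy labels and four classes (hypothetical values, not from the dataset)
example = one_hot([0, 2, 1], num_classes=4)
print(example)
```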

In [6]:
# Normalize image data
x_train = x_train / 255
x_valid = x_valid / 255

## Reshaping Images for a CNN (Convolutional Neural Network)

In [7]:
x_train.shape, x_valid.shape

((27455, 784), (7172, 784))

Individual pictures in our dataset are in the format of long lists of 784 pixels.

In this format we do not have all the information about which pixels are near each other. Because of this, we can't apply convolutions that will detect features. Let's reshape our dataset so that the images are in a 28x28 pixel format. This will allow our convolutions to associate groups of pixels and detect important features.

Note: In the first convolutional layer of the model, we need to have not only the height and width of the image, but also the number of *color channels*. Our images are grayscale, so we will just have 1 channel.

That means that we need to convert the current shape `(27455, 784)` to `(27455, 28, 28, 1)`. As a convenience, we can pass the `reshape` method a `-1` for one dimension, and its size will be inferred from the remaining dimensions, so the number of images stays the same.

In [8]:
x_train = x_train.reshape(-1, 28, 28, 1)
x_valid = x_valid.reshape(-1, 28, 28, 1)

In [9]:
x_train.shape, x_valid.shape

((27455, 28, 28, 1), (7172, 28, 28, 1))
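
As a small-scale illustration of the same reshape, the sketch below uses a toy array (three made-up "images" of four pixels each, not the ASL data) to show how `-1` lets NumPy infer the number of images and how each flat row becomes a 2D grid with one channel.

```python
import numpy as np

# Three fake "images" of 4 pixels each, stored as flat rows
flat = np.arange(12).reshape(3, 4)

# -1 asks NumPy to infer the number of images from the total element count
images = flat.reshape(-1, 2, 2, 1)

print(images.shape)        # (3, 2, 2, 1)
print(images[0, :, :, 0])  # first image as a 2x2 grid: [[0 1] [2 3]]
```

Pixels that were consecutive in a flat row end up side by side in the grid, which is the neighbourhood information a convolution relies on.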

## Creating a Convolutional Model

These days, many data scientists start their projects by borrowing model properties from a similar project. Assuming the problem is not totally unique, there is a good chance that others have already created models that perform well and posted them in online repositories like `TensorFlow Hub` and the `NGC Catalog`.

In [10]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Conv2D, MaxPool2D, Flatten,
                                     Dropout, BatchNormalization)

In [11]:
model = Sequential()

In [12]:
model.add(Conv2D(75, (3,3), strides=1, padding='same',
                 activation='relu', input_shape=(28,28,1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2), strides=2, padding='same'))
model.add(Conv2D(50, (3,3), strides=1, padding='same', activation='relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2), strides=2, padding='same'))
model.add(Conv2D(25, (3,3), strides=1, padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2), strides=2, padding='same'))
model.add(Flatten())
model.add(Dense(units=512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(units=num_classes, activation='softmax'))

**Conv2D**

These are our 2D convolutional layers. Small kernels will go over the input image and detect features that are important for classification. Earlier convolutions in the model will detect simple features such as lines. Later convolutions will detect more complex features.

Conv2D layer:
```Python
model.add(Conv2D(75, (3,3), strides=1, padding='same', ...))
```
75 refers to the number of filters that will be learned. (3,3) refers to the size of those filters. Strides refers to the step size the filter takes as it passes over the image. Padding controls whether the output image created from the filter matches the size of the input image; with `padding='same'` and a stride of 1, it does.
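
The sliding-kernel behaviour described above can be sketched in plain NumPy for a single filter and a single channel. This is an illustrative sketch only: `conv2d_same` is a hypothetical helper, the hand-written kernel stands in for weights that a real layer would learn, and the toy 5x5 image is made up.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide one square, odd-sized kernel over a 2D image (stride 1, zero padding)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)  # zero padding keeps the output the same size
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + k, j:j + k]
            out[i, j] = np.sum(window * kernel)  # weighted sum of the window
    return out

# Toy image: a vertical line in column 2
image = np.zeros((5, 5))
image[:, 2] = 1.0

# Hand-written kernel that responds to left-to-right brightness changes
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)

response = conv2d_same(image, vertical_edge)
print(response)
```

The response is positive just left of the line, negative just right of it, and zero on the line itself, which is how a filter of this kind marks the location of a simple feature.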

**BatchNormalization**

Like normalizing our inputs, batch normalization scales the values in the hidden layers to improve training.
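
A rough sketch of the training-time computation, assuming channels are on the last axis: each channel is shifted to zero mean and scaled to unit variance over the batch, then multiplied by a scale `gamma` and offset by `beta` (both learned in a real layer, fixed here). The `batch_norm` helper and the random activations are hypothetical; at inference time a real layer uses running averages of the statistics instead.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each channel (last axis) over all other axes, then scale and shift."""
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Fake activations: batch of 8, 4x4 feature maps, 2 channels
activations = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 2))
normalized = batch_norm(activations)

print(normalized.mean(axis=(0, 1, 2)))  # close to 0 for each channel
print(normalized.std(axis=(0, 1, 2)))   # close to 1 for each channel
```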

**MaxPool2D**

Max pooling takes an image and essentially shrinks it to a lower resolution. This helps the model be robust to translation (objects moving side to side) and also makes the model faster.
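
For intuition, here is a minimal NumPy sketch of 2x2 max pooling with a stride of 2 on a single-channel image. `max_pool_2x2` is a hypothetical helper and assumes both sides of the image are even; the toy values are made up.

```python
import numpy as np

def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2D array whose sides are even."""
    h, w = image.shape
    # Split the image into non-overlapping 2x2 blocks, then keep each block's maximum
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 1, 2, 3],
                  [0, 5, 4, 1]])
pooled = max_pool_2x2(image)
print(pooled)  # [[4 8] [9 4]]
```

Each output value summarizes a 2x2 neighbourhood, so shifting a feature by one pixel inside a block leaves the pooled result unchanged.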

**Dropout**

Dropout is a technique for preventing overfitting. Dropout randomly selects a subset of neurons and turns them off, so that they do not participate in forward or backward propagation in that particular pass. This helps to make sure that the network is robust and redundant and does not rely on any one area to come up with answers.
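
The mechanism can be sketched as follows, assuming the common "inverted dropout" formulation in which surviving values are scaled up by `1 / (1 - rate)` during training so the expected activation stays the same. The `dropout` helper, the seed, and the all-ones input are hypothetical.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero a random fraction `rate` of the values and rescale the survivors."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000,))
dropped = dropout(activations, rate=0.2, rng=rng)

print((dropped == 0).mean())  # roughly 0.2 of the units are switched off
print(dropped.mean())         # stays roughly 1.0 thanks to the rescaling
```

Dropout is only applied while training; when predicting, every unit participates.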

**Flatten**

Flatten takes the multidimensional output of one layer and flattens it into a one-dimensional array. The output is called a *feature vector* and will be connected to the final classification layer.
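
As a rough check on the size of the feature vector in this model, the sketch below assumes that each pooling layer with `padding='same'` and a stride of 2 rounds the side length up when halving it. It then flattens a placeholder array of the resulting shape; the zeros stand in for real feature maps.

```python
import math
import numpy as np

side = 28
for _ in range(3):                # three MaxPool2D layers in the model
    side = math.ceil(side / 2)    # 28 -> 14 -> 7 -> 4

# Placeholder for the last pooling output: 1 image, side x side, 25 filters
feature_maps = np.zeros((1, side, side, 25))

# Keep the batch dimension and collapse everything else into one axis
feature_vector = feature_maps.reshape(feature_maps.shape[0], -1)
print(feature_vector.shape)  # (1, 400)
```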

**Dense**

Our first dense layer (512 units) takes the feature vector as input and learns which features will contribute to a particular classification. The second dense layer (24 units) is the final classification layer that outputs our prediction.
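
The final classification step can be sketched in NumPy as a matrix multiplication followed by a softmax, which turns 24 raw scores into probabilities that sum to 1. The weights and features here are random placeholders rather than trained values, and `softmax` is a hand-written helper for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 512))              # output of the 512-unit layer
weights = rng.normal(scale=0.05, size=(512, 24))  # placeholder, untrained
bias = np.zeros(24)

probabilities = softmax(features @ weights + bias)
predicted_class = int(probabilities.argmax(axis=-1)[0])

print(probabilities.sum())  # sums to 1 (up to floating-point rounding)
print(predicted_class)      # index of the most likely class
```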

## Summarizing the Model

Notice: This model has fewer trainable parameters than the model in the previous notebook.

In [13]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 75)        750       
_________________________________________________________________
batch_normalization (BatchNo (None, 28, 28, 75)        300       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 75)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 50)        33800     
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 50)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 14, 14, 50)        200       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 50)          0
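
The parameter counts in the rows shown above can be reproduced with a little arithmetic. The sketch assumes each convolutional filter has one weight per kernel position per input channel plus one bias, and that batch normalization keeps four values per channel (scale, offset, moving mean, and moving variance). The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def batch_norm_params(channels):
    """Scale, offset, moving mean, and moving variance for each channel."""
    return 4 * channels

first_conv = conv2d_params(3, 3, 1, 75)     # grayscale input: 1 channel
first_bn = batch_norm_params(75)
second_conv = conv2d_params(3, 3, 75, 50)   # input is the 75 feature maps
second_bn = batch_norm_params(50)

print(first_conv, first_bn, second_conv, second_bn)  # 750 300 33800 200
```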

## Compiling the Model

In [14]:
model.compile(loss='categorical_crossentropy', metrics=['accuracy'])

## Training the Model

In [15]:
model.fit(x_train, y_train, epochs=20, verbose=1, validation_data=(x_valid, y_valid))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f8b997e5580>

We can see that this model is a significant improvement over the model in the previous notebook. The training accuracy is very high, and the validation accuracy has improved as well. This is a great result, as all we had to do was swap in a new model.

In this section we used several new kinds of layers to implement a CNN, which performed better than the simpler model used in the previous notebook.

In [16]:
# Clear GPU memory
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}