## ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Convolutional Neural Networks I

### LEARNING OBJECTIVES
_By the end of this lesson, students should be able to:_
- Build convolutional neural networks in Keras.

We'll build a neural network very similar to the example at the end of the notes.

In [1]:
# 1. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
 
# 2. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Using TensorFlow backend.



In [2]:
# 3. Preprocess our input data.
# Reshape to (num_samples, 28, 28, 1): keeping each image's 2D layout, plus a
# depth (channel) axis, lets us pass a filter over the image.
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # rescale pixel values from the 0-255 range to the 0-1 range
X_test /= 255  # 255 is the largest value an 8-bit grayscale pixel can take
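
To see what the reshape and rescale steps do without downloading MNIST, here is a minimal sketch on made-up data. The array `fake_images` is a hypothetical stand-in for `X_train`, not part of the lesson's dataset.

```python
import numpy as np

# Hypothetical stand-in for X_train: 4 grayscale images, 28x28, integer pixels in 0-255.
rng = np.random.RandomState(123)
fake_images = rng.randint(0, 256, size=(4, 28, 28)).astype('uint8')

# Add a trailing depth axis so each image is (height, width, depth).
reshaped = fake_images.reshape(fake_images.shape[0], 28, 28, 1)

# Convert to float32 before dividing so the result can hold fractional values.
scaled = reshaped.astype('float32') / 255

print(reshaped.shape)              # (4, 28, 28, 1)
print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # both fall between 0 and 1
```

The cast to `float32` comes first because the raw pixels are stored as unsigned 8-bit integers, which cannot hold values between 0 and 1.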

In [4]:
# 4. Preprocess the class labels.
y_train[0:5]

array([5, 0, 4, 1, 9], dtype=uint8)

In [5]:
# Convert the individual digits in y_train and y_test to one-hot (categorical) vectors.
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

In [6]:
Y_train[0:10]

array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]], dtype=float32)
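
As a rough illustration of what the conversion produces, the sketch below one-hot encodes the first five labels with plain NumPy. The helper `one_hot` is a hypothetical name used only for this example; it is not how Keras implements `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

first_five = one_hot([5, 0, 4, 1, 9], 10)  # the first five labels in y_train
print(first_five)

# Taking the argmax of each row recovers the original digits.
print(first_five.argmax(axis=1))  # [5 0 4 1 9]
```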

In [7]:
# 5. Define model architecture.

model = Sequential() # Instantiate an empty model; layers are added in order.

model.add(Convolution2D(filters = 6,     # number of filters/channels
                        kernel_size = 3, # filters are 3x3
                        activation = 'relu',
                        input_shape = (28, 28, 1))) # (height, width, depth)

model.add(MaxPooling2D(pool_size = (2,2))) # 2x2 pooling window
# by default, MaxPooling2D will pool over non-overlapping regions

model.add(Convolution2D(filters = 16,
                        kernel_size = 3,
                        activation = 'relu'))

model.add(MaxPooling2D(pool_size = (2,2)))

model.add(Flatten()) # converts "box" to vertical array of nodes

model.add(Dense(128, activation = 'relu')) # finally at the FC layer(s)

model.add(Dropout(0.5)) # randomly dropping out 50% of 
                        # nodes during training

model.add(Dense(10, activation = 'softmax'))

# 10 nodes because 10 possible outputs/values for Y {0, 1, ..., 9}
# softmax activation because it ensures my predictions are
# non-negative and they sum to 100% for a given observation
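
It can help to trace how the image shrinks as it moves through the layers defined above. The sketch below assumes the Keras defaults for these layers: no padding and a stride of 1 for the convolutions, and non-overlapping windows for the pooling. The helper names `conv_output_size` and `pool_output_size` are made up for this example.

```python
def conv_output_size(size, kernel_size):
    """Height/width after a stride-1 convolution with no padding."""
    return size - kernel_size + 1

def pool_output_size(size, pool_size):
    """Height/width after non-overlapping pooling; leftover rows/columns are dropped."""
    return size // pool_size

size = 28                             # input images are 28x28
after_conv1 = conv_output_size(size, 3)         # first 3x3 convolution
after_pool1 = pool_output_size(after_conv1, 2)  # first 2x2 max pool
after_conv2 = conv_output_size(after_pool1, 3)  # second 3x3 convolution
after_pool2 = pool_output_size(after_conv2, 2)  # second 2x2 max pool

# Flatten multiplies height x width x number of channels (16 filters in the second conv).
flattened = after_pool2 * after_pool2 * 16

print(after_conv1, after_pool1, after_conv2, after_pool2)  # 26 13 11 5
print(flattened)                                           # 400
```

So the first fully connected layer receives 400 inputs, not 784.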

In [9]:
# 6. Compile model.

model.compile(loss = 'categorical_crossentropy',
              # common choice for unordered discrete predictions
              optimizer = 'adam',
              # more sophisticated version of gradient descent
              metrics = ['accuracy'])
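
The softmax output layer and the `categorical_crossentropy` loss work as a pair: softmax turns raw scores into probabilities, and the loss penalizes low probability on the true class. Below is a minimal NumPy sketch of both, using made-up scores for a 3-class problem. It illustrates the math only and is not Keras's implementation.

```python
import numpy as np

def softmax(scores):
    """Turn each row of raw scores into non-negative values that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Average negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_prob), axis=1).mean())

# Made-up raw scores for two observations and three classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probs = softmax(scores)
loss = categorical_crossentropy(y_true, probs)

print(probs.sum(axis=1))  # each row sums to 1
print(loss)               # smaller is better; 0 only for perfectly confident, correct predictions
```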

In [10]:
# 7. Fit model on the training data.

model.fit(X_train,
          Y_train,
          batch_size = 32, 
          epochs = 10,
          verbose = 1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0xb1f6685f8>
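
To get a feel for how much work this call does, we can count the weight updates. The sketch below assumes the standard MNIST training set of 60,000 images and one update per batch.

```python
import math

num_train = 60000  # size of the standard MNIST training set
batch_size = 32
epochs = 10

# One weight update per batch; ceil covers a final partial batch if there is one.
updates_per_epoch = math.ceil(num_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 1875
print(total_updates)      # 18750
```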

In [11]:
# 8. Evaluate model on test data.

score = model.evaluate(X_test, Y_test, verbose = 1)
labels = model.metrics_names

print(str(labels[0]) + ": " + str(score[0]))
print(str(labels[1]) + ": " + str(score[1]))

loss: 0.032125405359922664
acc: 0.9904
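
An accuracy of 0.9904 on the 10,000 MNIST test images corresponds to 9,904 correct predictions. As a sketch of how that number is computed, the example below takes the most probable class for each observation and compares it to the true label. The probabilities are made up, and `accuracy` is a hypothetical helper rather than a Keras function.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    """Fraction of rows where the most probable class matches the true class."""
    predicted = y_prob.argmax(axis=1)
    actual = y_true_one_hot.argmax(axis=1)
    return float((predicted == actual).mean())

# Made-up example: four observations, three classes, true labels 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.4, 0.3],   # predicts class 1, but the truth is class 2
                   [0.1, 0.6, 0.3]])

print(accuracy(y_true, y_prob))  # 0.75
```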


In [12]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 6)         60        
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 6)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 16)        880       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 400)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               51328     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
__________

60 parameters (in the first Conv2D layer)

Each filter is 3x3 over a single input channel. <-- 9 parameters

6 filters <-- 6 filters x 9 parameters per filter = 54 parameters

Each output channel (a.k.a. the result from one filter) also has 1 bias parameter. <-- 1 parameter per filter x 6 filters = 6 parameters

Answer: 54 parameters (within filters) + 6 parameters (bias) = 60 parameters
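
The same counting extends to the other layers in the summary. In the sketch below, `conv2d_params` and `dense_params` are hypothetical helpers written for this example. The one new wrinkle is that a filter in the second convolutional layer spans all 6 input channels, so it has 3 x 3 x 6 weights instead of 9.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights per filter (kernel x kernel x input channels) plus one bias, times the number of filters."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

conv1 = conv2d_params(kernel_size=3, in_channels=1, filters=6)
conv2 = conv2d_params(kernel_size=3, in_channels=6, filters=16)
dense1 = dense_params(inputs=400, units=128)

print(conv1)   # 60
print(conv2)   # 880
print(dense1)  # 51328
```

These match the `Param #` column for `conv2d_1`, `conv2d_2`, and `dense_1`.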

## Conclusion

<details><summary>Why are neural networks better equipped to handle image data than non-neural networks?
</summary>
```
Neural networks are naturally set up to consider interactions among features.
```
</details>

<details><summary>Why are **convolutional neural networks** better equipped to handle image data than non-CNNs?
</summary>
```
CNNs are naturally set up to consider interactions among "close pixels" only, and parameter sharing drastically cuts down the number of parameters they need to learn.
```
</details>
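
To put a number on the savings from parameter sharing, the sketch below compares our first convolutional layer with a hypothetical fully connected layer that maps the same 28x28x1 input to the same 26x26x6 output. The fully connected layer is an illustration only; it is not part of the model we built.

```python
# Our first Conv2D layer: six 3x3 filters over 1 input channel, plus 6 biases.
conv_params = (3 * 3 * 1 + 1) * 6

# Hypothetical fully connected alternative: every output gets its own weight
# for every input pixel, plus its own bias.
num_inputs = 28 * 28 * 1
num_outputs = 26 * 26 * 6
dense_params = (num_inputs + 1) * num_outputs

print(conv_params)                  # 60
print(dense_params)                 # 3183960
print(dense_params // conv_params)  # 53066
```

The convolutional layer reuses the same 60 parameters at every position in the image, which is why it needs tens of thousands of times fewer of them.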

