# Convolutional Neural Networks

Convolutional neural networks are mainly used for image classification. Because every image contains thousands of pixel values, even a modest number of observations adds up to a large amount of data. Plus, CNNs have several more layers of computation than the neural networks we looked at last week. Only the smallest problems (like the one here) are practical to train on a CPU.

## So what's different about CNNs?

**Quick note:** In machine learning, an nD array is called a tensor. A vector is a specific name for a 1D tensor and a matrix is a specific name for a 2D tensor. Images are usually composed of three channels, typically red, green and blue. So each image is a matrix of red values, a matrix of green values, and a matrix of blue values, stacked together. Thus, images are 3D tensors. That's also where TensorFlow gets its name: it passes tensors like these through a graph of operations.
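Here's a small NumPy sketch of that vocabulary. The image is a hypothetical all-black 32x32 RGB image stored as height x width x channels (the layout Keras uses for the CIFAR-10 data later on); slicing out one channel gives back a plain matrix.

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])              # 1D tensor, shape (3,)
matrix = np.zeros((2, 3))                        # 2D tensor, shape (2, 3)
image = np.zeros((32, 32, 3), dtype=np.uint8)    # 3D tensor: height x width x channels

red = image[:, :, 0]     # the 32x32 matrix of red values
green = image[:, :, 1]   # the matrix of green values
blue = image[:, :, 2]    # the matrix of blue values

# A stack of images (like a whole training set) is a 4D tensor
batch = np.zeros((10, 32, 32, 3), dtype=np.uint8)

print(vector.ndim, matrix.ndim, image.ndim, batch.ndim)  # 1 2 3 4
print(red.shape)                                         # (32, 32)
```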

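CNNs add a step called convolution, which gives the algorithm its name. In convolution, you take a small tensor called a filter (e.g. $5\times5\times n_{\text{channels}}$), drag it across the image like a moving window, and at each position compute the sum of the elementwise product of the filter and the patch of image underneath it:
$$ \text{Conv}_{xy} = \sum_{i} \sum_{j} \sum_{k} \text{Image}_{x+i,\,y+j,\,k}\,\text{Filter}_{ijk} $$
Here $i$ and $j$ run over the filter's height and width, $k$ runs over the channels, and $(x, y)$ is the filter's position. Because the sum also runs over the channels, each filter creates a new 2D tensor, which is smaller than the image unless the image is padded.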
<img src="http://neuralnetworksanddeeplearning.com/images/tikz44.png">
(Image from "<a href="http://neuralnetworksanddeeplearning.com/index.html">Neural Networks and Deep Learning</a>" by Michael Nielsen.)
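Here's a minimal NumPy sketch of the formula, assuming no padding and a stride of 1 (the filter moves one pixel at a time). Strictly speaking it computes a cross-correlation (the filter isn't flipped), which is what deep learning libraries call convolution. The function name `convolve` and the random inputs are just for illustration.

```python
import numpy as np

def convolve(image, filt):
    """Slide `filt` over `image` (no padding, stride 1) and return the 2D map
    of sums of elementwise products, one value per filter position."""
    h, w, c = image.shape
    fh, fw, fc = filt.shape
    assert c == fc, "the filter needs as many channels as the image"
    out = np.zeros((h - fh + 1, w - fw + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            patch = image[x:x + fh, y:y + fw, :]   # the part of the image under the filter
            out[x, y] = np.sum(patch * filt)       # sum over i, j and k
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # a random stand-in for a 32x32 RGB image
filt = rng.random((5, 5, 3))      # a random 5x5x3 filter
feature_map = convolve(image, filt)
print(feature_map.shape)          # (28, 28): smaller than the image, and 2D
```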

So what is this doing? Basically, it's looking for features like lines and curves. If part of the image contains a line or curve that matches the filter, the convolution at that position will be large. If the image and filter don't match, the convolution will be small or zero. Take this example of finding a curve on a drawing of a mouse:
<img src="https://adeshpande3.github.io/assets/Filter.png">
<img src="https://adeshpande3.github.io/assets/OriginalAndFilter.png">
<img src="https://adeshpande3.github.io/assets/FirstPixelMulitiplication.png">
<img src="https://adeshpande3.github.io/assets/SecondMultiplication.png">
Images from <a href="https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/">this blog post</a>. <a href="https://github.com/adeshpande3/adeshpande3.github.io/blob/master/LICENSE">LICENSE</a>
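To see why a matching filter gives a large value, here's a toy sketch with a hypothetical 3x3 single-channel filter that looks for a vertical line, applied to a 5x5 image that contains one.

```python
import numpy as np

# Hypothetical filter that responds to a vertical line through its middle
line_filter = np.array([[0, 1, 0],
                        [0, 1, 0],
                        [0, 1, 0]], dtype=float)

# A 5x5 black image with a bright vertical line in column 2
image = np.zeros((5, 5))
image[:, 2] = 1.0

def convolve2d(image, filt):
    """Single-channel version of the sliding sum of elementwise products."""
    fh, fw = filt.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(image[x:x + fh, y:y + fw] * filt)
    return out

response = convolve2d(image, line_filter)
print(response)  # middle column is 3.0 where the filter sits on the line, 0.0 elsewhere
```

The output is large only where the filter lines up with the line; everywhere else the bright pixels meet the filter's zeros.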

A convolution step can involve many filters; each one produces its own 2D output, and these are stacked into a new 3D tensor. After a convolution step, the output is passed through an activation function, just like the sigmoid in logistic regression (CNNs most often use ReLU, which sets negative values to 0). The activation adds nonlinearity. Convolutions are linear functions, so stringing together several convolutions without activations in between is the same as one big linear function, and the extra layers would add nothing. Often, CNNs end up having several convolution + activation cycles.
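Here's a quick NumPy check of the linearity argument. Plain matrices stand in for convolutions (any convolution can be written as multiplication by a suitably structured matrix), and `relu` is written out by hand for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # a stand-in input
W1 = rng.standard_normal((3, 4))    # first linear layer
W2 = rng.standard_normal((2, 3))    # second linear layer

# Two stacked linear layers...
two_layers = W2 @ (W1 @ x)
# ...are exactly one linear layer whose weights are the product W2 @ W1
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True

def relu(z):
    """ReLU activation: keeps positive values and sets negative values to 0."""
    return np.maximum(z, 0)

# With a ReLU in between, the result is no longer a single matrix product
with_relu = W2 @ relu(W1 @ x)
```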

## Other useful tricks
There are two more kinds of layers that can be added to help increase accuracy. Just trust me when I say they do that. The first is a pooling layer, which splits each feature map into (usually nonoverlapping) windows and reduces each window to a single number. It's used to downsample. A common pooling algorithm is max pooling, which takes the maximum value in each window:
<img src="https://adeshpande3.github.io/assets/MaxPool.png">
(Also from the blog post)  
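A minimal sketch of 2x2 max pooling with a stride of 2 in NumPy, assuming a single 2D feature map; the example numbers are made up. When a side has odd length, the leftover row or column is dropped.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2D feature map (an odd edge is dropped)."""
    h2, w2 = fmap.shape[0] // 2, fmap.shape[1] // 2
    trimmed = fmap[:h2 * 2, :w2 * 2]
    # Group the values into non-overlapping 2x2 windows, then take each window's max
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fmap = np.array([[1, 1, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]])
print(max_pool_2x2(fmap))  # [[6 8]
                           #  [3 4]]
```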

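Another trick is a dropout layer. During training, it randomly sets a fraction of the previous layer's outputs to zero on each update; the training examples themselves are all kept. This stops the network from relying too much on any single feature and helps prevent overfitting. At test time, dropout is turned off.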
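A sketch of dropout in NumPy, assuming the "inverted dropout" variant that Keras documents for its `Dropout` layer: during training the surviving values are scaled up by 1/(1 - rate) so their expected sum stays the same, and at test time the input passes through unchanged. Notice that every row (example) survives; only individual entries are zeroed. The `dropout` function here is a hypothetical illustration, not Keras code.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random `rate` fraction of the values during
    training and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return activations  # at test time dropout does nothing
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
layer_output = np.ones((4, 8))          # a hypothetical batch: 4 examples, 8 units each
dropped = dropout(layer_output, 0.25, rng)
print(dropped)                          # still 4 rows; roughly 25% of the entries are 0
print(dropout(layer_output, 0.25, rng, training=False) is layer_output)  # True
```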

## Some vocab
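* batch: a subset of training images processed together; the weights are updated once per batch
* epoch: one full cycle of forward and back propagation over ALL training examples
* data augmentation: creating a larger dataset by adding transformed images, e.g. flipped and rotated image copies
* fully connected layer: a layer where every input is connected to every output, like the layers in last week's neural networks; this is where the neural network part starts, after the convolution layers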
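To make batch, epoch and data augmentation concrete, here's a small NumPy sketch on random stand-in data. The helper names `iterate_minibatches` and `augment_with_flips` are hypothetical, not Keras functions.

```python
import math
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """One epoch: shuffle the examples, then yield consecutive batches
    (the last batch may be smaller)."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

def augment_with_flips(x, y):
    """Double a dataset of (N, H, W, C) images by adding left-right mirrored copies."""
    flipped = x[:, :, ::-1, :]  # reverse the width axis
    return np.concatenate([x, flipped]), np.concatenate([y, y])

rng = np.random.default_rng(0)
x = rng.random((100, 32, 32, 3))     # hypothetical stand-in for 100 images
y = rng.integers(0, 10, size=100)    # hypothetical labels 0..9

n_batches = sum(1 for _ in iterate_minibatches(x, y, 32, rng))
print(n_batches)               # 4 batches: 32 + 32 + 32 + 4 images
print(math.ceil(50000 / 32))   # 1563 batches per epoch for 50,000 images

x_aug, y_aug = augment_with_flips(x, y)
print(x_aug.shape)             # (200, 32, 32, 3)
```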

# Example time!

```python
# Install commands, assuming Python 3 only
#!pip install tensorflow      # CPU support only
#!pip install tensorflow-gpu  # GPU and CPU support
#!pip install keras
```

I just did this on my laptop, so I didn't install the GPU support.

```python
import keras  # should give you a notification that you're using the TensorFlow backend
```

```
Using TensorFlow backend.
```


We'll use the CIFAR-10 small image classification dataset included in Keras. It has 50,000 32x32 color training images, labeled with 10 categories, and 10,000 test images.

```python
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
```

```python
batch_size = 32  # number of examples per batch; the weights are updated after each batch
num_classes = 10
epochs = 200  # the number of forward and back propagation cycles over ALL training examples
data_augmentation = False  # set to True to augment the data with transformed copies of the images
```

```python
# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()  # downloads the data the first time it's run
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
```

```
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
```


The shape is (number of images, height, width, channels). Notice that the 4th dimension is 3: these images have 3 color channels, namely RGB.

```python
# Convert class vectors to binary class matrices (one-hot encoding).
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
```
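`to_categorical` turns each integer label into a one-hot row: a vector of 10 zeros with a single 1 at the label's index. Here's a NumPy sketch of the same idea on a few hypothetical labels (an illustration, not Keras's own implementation). The labels are shaped (n, 1) because that's how `cifar10.load_data()` returns them.

```python
import numpy as np

labels = np.array([[6], [9], [0]])   # hypothetical labels, shape (n, 1)
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels.ravel()]
print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # 1.0 at index 6, 0.0 everywhere else
```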

```python
model = Sequential()  # we're going to add all the layers one by one
```

```python
# For padding, 'valid' means don't pad; 'same' means pad with zeros so the output keeps the input's height and width
model.add(Conv2D(32, (3, 3), padding='same',      # 32 3x3 filters, with padding
                 input_shape=x_train.shape[1:]))  # the first layer needs to know the input size of the data
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))                     # another 32 3x3 filters, without padding
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))         # max pooling with a 2x2 pool
model.add(Dropout(0.25))                          # randomly set 25% of the previous layer's outputs to 0 during training

model.add(Conv2D(64, (3, 3), padding='same'))     # now 64 3x3 filters, with padding
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))                     # 64 3x3 filters, without padding
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))         # max pooling with a 2x2 pool
model.add(Dropout(0.25))                          # randomly set 25% of the previous layer's outputs to 0 during training

# Add the fully connected layers (called Dense in Keras)
model.add(Flatten())              # we don't need the image shape anymore, so turn each example into a vector
model.add(Dense(512))             # first (and only) hidden layer with 512 nodes
model.add(Activation('relu'))
model.add(Dropout(0.5))           # randomly set 50% of the hidden layer's outputs to 0 during training
model.add(Dense(num_classes))     # output layer, one node per class
model.add(Activation('softmax'))  # turn the outputs into class probabilities
```
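As a sanity check on the architecture, here's the shape and parameter arithmetic in plain Python, assuming 3x3 filters with stride 1 and 2x2 pooling that drops odd edges (matching the layers above). Activation and Dropout layers don't change the shape or add parameters, so they're skipped. In Keras itself, `model.summary()` prints the same kind of table.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Output width of a stride-1 convolution: 'same' keeps the size, 'valid' shrinks it."""
    return size if padding == "same" else size - kernel + 1

def conv_params(in_ch, out_ch, kernel=3):
    """Weights (kernel x kernel x in_ch per filter) plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """One weight per input/output pair plus one bias per output."""
    return n_in * n_out + n_out

layers = [("conv", 32, "same"), ("conv", 32, "valid"), ("pool",),
          ("conv", 64, "same"), ("conv", 64, "valid"), ("pool",)]

size, channels, total = 32, 3, 0
for layer in layers:
    if layer[0] == "conv":
        _, filters, padding = layer
        total += conv_params(channels, filters)
        size, channels = conv_out(size, padding=padding), filters
    else:
        size //= 2  # 2x2 max pooling, odd edge dropped
    print(layer, "->", (size, size, channels))

flat = size * size * channels                              # Flatten
total += dense_params(flat, 512) + dense_params(512, 10)   # Dense(512), Dense(num_classes)
print(flat, total)  # 2304 1250858
```

Most of the parameters live in the `Dense(512)` layer: flattening the final 6x6x64 tensor gives 2,304 inputs, each connected to all 512 nodes.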

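```python
# Initiate the RMSprop optimizer
# (Keras 2.x API; newer Keras versions call the learning rate `learning_rate` instead of `lr`)
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
```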
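For the curious, here's a sketch of the textbook RMSprop update plus the time-based learning rate decay that the `decay` argument applies in older Keras versions (the learning rate shrinks as 1/(1 + decay * iterations)). The defaults `rho=0.9` and `eps=1e-7` are assumptions chosen to resemble Keras's, and the toy problem (minimizing w²) is made up; this is not Keras's actual code.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr, rho=0.9, eps=1e-7):
    """One RMSprop update: keep a running average of squared gradients and
    divide each step by its square root, so every weight gets its own step size."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

def decayed_lr(lr0, decay, iteration):
    """Time-based decay: the learning rate shrinks as 1 / (1 + decay * iteration)."""
    return lr0 / (1 + decay * iteration)

# Toy example: minimize f(w) = w**2, whose gradient is 2w
w, cache = np.array([5.0]), np.zeros(1)
for t in range(2000):
    w, cache = rmsprop_step(w, 2 * w, cache, lr=decayed_lr(0.01, 1e-6, t))
print(w)  # close to the minimum at 0
```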

```python
# Configure the learning process
model.compile(loss='categorical_crossentropy',  # the multiclass version of the logistic regression loss
              optimizer=opt,
              metrics=['accuracy'])
```
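Categorical cross-entropy compares the softmax probabilities with the one-hot labels: for each example it's $-\sum_k y_k \log p_k$, which is just minus the log of the probability given to the true class. With two classes it reduces to the logistic regression loss. A NumPy sketch with made-up predictions (the `eps` clipping is an assumption added to avoid `log(0)`):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1.0, 0.0, 0.0]])            # one-hot label: the true class is 0
confident_right = np.array([[0.7, 0.2, 0.1]])   # most probability on class 0
confident_wrong = np.array([[0.1, 0.2, 0.7]])   # most probability on class 2

print(categorical_crossentropy(y_true, confident_right))  # -log(0.7), about 0.357
print(categorical_crossentropy(y_true, confident_wrong))  # -log(0.1), about 2.303
```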

```python
# The original pixel type is uint8 (integers from 0 to 255).
# To normalize to [0, 1], convert to float first, then divide by 255.
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
```

```python
# This will take a while on a CPU, but it's still doable
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and real-time data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # randomly flip images vertically

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train,
                                     batch_size=batch_size),
                        steps_per_epoch=x_train.shape[0] // batch_size,
                        epochs=epochs,
                        validation_data=(x_test, y_test))
```

```
Not using data augmentation.
Train on 50000 samples, validate on 10000 samples
Epoch 1/200
Epoch 2/200
Epoch 3/200
...
Epoch 70/200
Epoch 71/200
Epoch 7

KeyboardInterrupt: 
```