# Convolutional Neural Networks 

There has been a lot of buzz surrounding autonomous driving and GANs. Modern deep learning has almost become synonymous with the output of a particular class of neural networks. In this notebook, we meet these beasts, which are called "Convolutional Neural Networks". We shall explore what a convolution is and what a convolutional layer does, and then build our very first classifier. 

## Convolutions Demystified 

In order to understand convolutional neural networks, we need to understand what a convolution operation does. Now, if I told you "a convolution is an operation that computes the average of everything in a fixed window", you'd probably groan internally. I'm gonna do better and show you visually what a convolution looks like for an image. 

![ConvGif](imgs/conv_moving.gif "conv")

This gif is a neat little visualization of what a convolution operation is. Essentially, a convolution operation computes the **weighted average** (strictly speaking, a weighted sum) of a fixed (height, width) region in an image. The weights are provided by what is called a _kernel_. In the figure above, the blue square is the image and the red square is the output of the convolution. The green moving square is the kernel. We usually use kernels with _odd_ height and width (the assignment at the end asks you to think about why). For now, let's keep this definition in mind while we study the other elements of convolutional neural networks. 
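To make the sliding-window picture concrete, here is a minimal NumPy sketch of the operation. The function name `convolve2d_valid` and the toy 5x5 image are made up for illustration, and the sketch assumes no padding and a step of one pixel. Like the convolutional layers in deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` and return the weighted sum at each position.

    Assumes no padding and a step of 1, so the output is smaller than the input.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]       # region under the kernel
            output[i, j] = np.sum(window * kernel)   # weighted sum of that region
    return output

# Toy 5x5 "image" (hypothetical data) and a 3x3 averaging kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
mean_kernel = np.ones((3, 3)) / 9.0

result = convolve2d_valid(image, mean_kernel)
print(result.shape)   # (3, 3)
print(result[0, 0])   # approximately 6.0, the mean of the top-left 3x3 window
```

With all weights equal to 1/9 the kernel gives a plain average; a learned kernel simply uses different weights in the same loop.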

## Elements of CNNs

Now that we have a working definition of convolution, we can focus on the other components that make up a convolutional neural network. We'll introduce the convolutional layer, the pooling operation and ReLU before designing our first CNN!

![Image](imgs/depthcol.jpeg)

This figure shows what a convolutional layer does. The red square represents an image. We say an image has a `height`, `width` and `depth`. The height and width are intuitive (in this figure they are both 32). The `depth` parameter is a little harder to understand, so let me explain.


If you consider a color image, it's composed of three colors: R, G and B. A computer stores an image as numbers, so we need one array for red, one for green and one for blue. These are called the _channels_ of an image. A way to visualize this is to imagine the red square as a _stack_ of these channels. What a convolutional layer does is compute the convolution in the (`kernel_height`, `kernel_width`) regions for _all_ the channels. And this is the key thing that makes convolutional layers so effective - unlike fully connected layers, the input neurons are not connected to _all_ output neurons. Rather, they are connected only within a specific region defined by `kernel_height` $H$ and `kernel_width` $W$. 

![Image](imgs/maxpool.jpeg)

This is a `maxpool` layer. It is a very simple layer that returns the maximum value in each ($H$, $W$) region. However, the motivation behind this is powerful. We select the maximum in a given window because we interpret any value that is not only above the activation threshold, but also greater than every other value in its nearby region, as being the _most important_. Thus it makes sense to keep only that value from the region. The stride $S$ of a max pool or a convolutional filter is the _step size_ it takes. This is similar to incrementing a variable in a for loop by a fixed amount.
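Here is a small NumPy sketch of max pooling with a stride. The helper name `max_pool2d` and the 4x4 feature map are hypothetical, and the sketch assumes a single channel and no padding.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    """Return the maximum of every (pool_size, pool_size) window, moving `stride` pixels at a time."""
    out_h = (feature_map.shape[0] - pool_size) // stride + 1
    out_w = (feature_map.shape[1] - pool_size) // stride + 1
    pooled = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride   # top-left corner of the current window
            pooled[i, j] = feature_map[r:r + pool_size, c:c + pool_size].max()
    return pooled

# Hypothetical 4x4 single-channel feature map.
feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])

pooled = max_pool2d(feature_map)
print(pooled.tolist())   # [[6, 8], [3, 4]]
```

With a 2x2 window and a stride of 2 the windows do not overlap, so the height and width are halved. A stride of 1 would make the windows overlap and shrink the output by only one pixel in each direction.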

![Image](imgs/relu.jpeg)

Last bit of theory I swear! 

This is the ReLU nonlinearity that is used. Mathematically it's simple: $\mathrm{relu}(x) = \max(0, x)$. Essentially, this is a thresholding operation that replaces any value below zero with zero. It is typically applied as the nonlinearity between different layers in a CNN. There are other variants of it, but we'll discuss those... sometime.
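The thresholding is a one-liner in NumPy. This is only an illustrative sketch; the sample values are made up.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x): negative values become 0, the rest pass through."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(values).tolist())   # [0.0, 0.0, 0.0, 1.5, 3.0]
```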

## Programming CNNs in Keras 

Finally! We get to coding our very first CNN in Keras. We'll be using the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset to classify objects. The CIFAR-10 dataset has 60000 images in total (50000 in the training set and 10000 in the test set), each having a dimension of $(32,32,3)$. As before, we'll start by arranging components first and then building APIs. 

In [1]:
# necessary imports 
import keras 
import os, time 
import matplotlib.pyplot as plt
from keras import Sequential 

Using TensorFlow backend.


In [2]:
# imports for convolutional layers 
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras import Model 

In [3]:
# We're going to create the vision model in the same way we created our input model. However whenever you create 
# a new convolutional layer you need to keep in mind 3 things 
# 1. The number of output channels of the layer. 
# 2. The kernel size 
# 3. The stride and padding. 

model = Sequential() 
# input layer 
model.add(Conv2D(64, (3,3), padding='same', activation='relu', input_shape=(32,32,3)))
model.add(Conv2D(64, (3,3), padding='same', activation='relu'))
# adding pooling 
model.add(MaxPooling2D(2,2))
model.add(Conv2D(128, (3,3), padding='same', activation='relu'))
model.add(MaxPooling2D(2,2))
model.add(Conv2D(128, (3,3), padding='same', activation='relu'))
model.add(Flatten()) # flatten the input vector into a 1D shape 
model.add(Dense(1024, activation='relu')) # we met these in the last notebook 
model.add(Dense(10, activation='softmax')) # the final output


model.summary()  # summary() prints the table itself and returns None


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 128)         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 128)         147584    
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0         
__________

In this code snippet we create a tiny CNN. The input layer accepts an input shape, which varies according to the images in your dataset. The first parameter in the call to `Conv2D` tells us how many output _activation maps_ we want from this layer. So you can read the call to the first `Conv2D` layer as "_we want to create a convolutional layer which outputs 64 activation maps, by looking in a $(3 \times 3)$ area, with an input shape of $(32,32,3)$_". You can read the other layers similarly, with the exception that the other convolutional layers do not explicitly specify an `input_shape` parameter. Keras automagically handles this by deducing the input shapes for you. Neat!
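The `Param #` column of the summary can be reproduced by hand. Each filter has one weight per kernel position per input channel, plus one bias. The sketch below uses a made-up helper name, `conv2d_param_count`, and assumes square kernels with a bias term, which is the `Conv2D` default.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size):
    """Trainable parameters of a conv layer with square kernels and one bias per filter."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    biases = out_channels
    return weights + biases

# (input channels, output channels, kernel size) for the four conv layers of the model.
conv_layers = [(3, 64, 3), (64, 64, 3), (64, 128, 3), (128, 128, 3)]

counts = [conv2d_param_count(*layer) for layer in conv_layers]
print(counts)   # [1792, 36928, 73856, 147584]
```

These match the four `Conv2D` rows of the summary. The pooling and flatten layers have no trainable parameters, which is why they show 0.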

If you run the cell that builds `model`, it'll work (but it doesn't produce anything yet; it just creates a convolutional model in memory). However, this is terribly repetitive. Wouldn't it be better if we had a function that returned a model with all the layers instantiated? We could then customize the arguments to that function and obtain a different neural network. 

In [5]:
def create_cnn(layer_config, input_shape):
    """
    Creates a convolutional neural network 
    to be used with Keras.
    
    NOTE: All activations are assumed to be Relu units.
    
    Args:
    1. layer_config: A list of tuples specifying the type and size of
    each layer. The first entry is assumed to be the input layer.
    We define the following mapping for creating different layers:
    ('c', 64, 3) -> create a Conv2D with 64 output channels and a 3x3 kernel
    ('m', 2) -> create a MaxPooling2D with a 2x2 pool size
    ('d', 1024) -> create a Dense layer with 1024 units
    ('f', None) -> flatten the convolutional output.
    
    2. input_shape: The shape of one input image, e.g. (32, 32, 3).
    
    Returns:
    A sequential Keras model 
    """
    a_model = Sequential()
    for i, layer in enumerate(layer_config):
        if i == 0 and layer[0] == 'c':
            a_model.add(Conv2D(layer[1], (layer[2],layer[2]), padding='same', activation='relu', input_shape=input_shape))
        elif layer[0] == 'c':
            a_model.add(Conv2D(layer[1], (layer[2],layer[2]), padding='same', activation='relu'))
        
        if layer[0] == 'm':
            a_model.add(MaxPooling2D((layer[1], layer[1])))
        
        if layer[0] == 'f':
            a_model.add(Flatten())
            
        if i == len(layer_config)-1 and layer[0] == 'd':
            a_model.add(Dense(layer[1], activation='softmax'))
        elif layer[0] == 'd':
            a_model.add(Dense(layer[1], activation='relu'))
    
    return a_model


# using the definition we have above 
layer_config=[('c',64,3), ('c',64,3),('m',2), ('c',128, 3), ('m',2), ('c',128,3), ('f',None),('d',1024),('d',10)]

vis_model = create_cnn(layer_config, input_shape=(32,32,3))
vis_model.summary()  # summary() prints the table itself and returns None

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 32, 32, 64)        1792      
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 32, 32, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 16, 16, 128)       73856     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 8, 8, 128)         0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 8, 8, 128)         147584    
_________________________________________________________________
flatten_2 (Flatten)          (None, 8192)              0         
__________

Awesome! Now we have a function that we can use to create a whole family of CNNs just by specifying the `layer_config`. Let's train the model now.
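The output shapes in the summary can also be worked out by hand from `layer_config`, which is a handy check when you start changing the configuration. The helper below, `trace_shapes`, is a hypothetical name and is not part of Keras. It assumes what `create_cnn` uses: `'same'` padding with a stride of 1 for convolutions, and non-overlapping pooling windows.

```python
def trace_shapes(layer_config, input_shape):
    """Return the output shape (without the batch dimension) after each layer_config entry."""
    shape = tuple(input_shape)
    shapes = []
    for layer in layer_config:
        kind = layer[0]
        if kind == 'c':
            # 'same' padding keeps height and width; only the channel count changes.
            height, width, _ = shape
            shape = (height, width, layer[1])
        elif kind == 'm':
            # Non-overlapping pooling divides height and width by the pool size.
            height, width, channels = shape
            shape = (height // layer[1], width // layer[1], channels)
        elif kind == 'f':
            # Flatten multiplies all dimensions together.
            size = 1
            for dim in shape:
                size *= dim
            shape = (size,)
        elif kind == 'd':
            shape = (layer[1],)
        shapes.append(shape)
    return shapes

layer_config = [('c', 64, 3), ('c', 64, 3), ('m', 2), ('c', 128, 3), ('m', 2),
                ('c', 128, 3), ('f', None), ('d', 1024), ('d', 10)]

for entry, shape in zip(layer_config, trace_shapes(layer_config, (32, 32, 3))):
    print(entry, '->', shape)
```

The flatten entry comes out as `(8192,)`, i.e. 8 x 8 x 128, which agrees with the `flatten_2` row of the summary.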

## Training the CNN 

In this section we're going to load the CIFAR-10 data and train for two epochs before evaluating the model on the test set. 

In [6]:
# load the keras cifar-10 loader 
from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In [7]:
# As before we shall convert our labels to one-hot encoding for easier computation 

y_train_cat = keras.utils.to_categorical(y_train, num_classes=10)
y_test_cat = keras.utils.to_categorical(y_test, num_classes=10)

In [8]:
# let's see what the labels look like 
print(y_train_cat)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]]
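To see what the one-hot conversion does under the hood, here is a rough NumPy equivalent. The function name `one_hot` and the three sample labels are made up for illustration; in the notebook itself `keras.utils.to_categorical` does this job.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels).reshape(-1)   # CIFAR-10 labels arrive with shape (N, 1)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical labels, shaped like the y_train array (one label per row).
sample_labels = np.array([[6], [9], [1]])

encoded = one_hot(sample_labels, num_classes=10)
print(encoded.shape)                     # (3, 10)
print(encoded.argmax(axis=1).tolist())   # [6, 9, 1]
```

Each row sums to 1, and taking the `argmax` of a row recovers the original integer label.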


In [9]:
# now let's compile our vision model 
vis_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

In [10]:
# And now let's train our model on the training set. We'll train for two epochs, with a batch size of 100 

vis_model.fit(x=x_train, y=y_train_cat, batch_size=100, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7fc19c38bd68>

In [11]:
# Let's evaluate now

score = vis_model.evaluate(x=x_test, y=y_test_cat)
print("test loss:{}".format(score[0]))
print("test accuracy:{}".format(score[1]))

test loss:14.506285661315918
test accuracy:0.1
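A test accuracy of 0.1 is exactly what random guessing gives on ten equally likely classes, so the model above has not learned anything useful yet. One thing worth checking (this is an assumption about the cause, not something the run confirms) is that the images are fed in as raw pixel values between 0 and 255. A common preprocessing step is to scale them to the range [0, 1] first. The sketch below uses a made-up helper, `scale_pixels`, and a random stand-in batch instead of the real data.

```python
import numpy as np

def scale_pixels(images):
    """Convert uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return images.astype("float32") / 255.0

# Random stand-in for a batch of CIFAR-10 images (hypothetical data, same shape and dtype).
fake_batch = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled = scale_pixels(fake_batch)
print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)   # float32 True True
```

In the notebook, the same function would be applied to both `x_train` and `x_test` before calling `fit` and `evaluate`. Whether that alone is enough to move the accuracy off 0.1 is something to find out by experiment.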


## Assignment: It's Convolutions all over!!!

You just learned a lot of information. Here are some questions to make you think, experiment and learn more deeply about CNNs in general. 


1. Read the page here on convolutional neural network architectures -> [Convolutional Archs](http://cs231n.github.io/convolutional-networks/#architectures). Make sure you understand (and ask us questions about) the different CNN architectures and why deep nets are all the rage. 


2. Use `create_cnn` to create a CNN with:

    i. Different kernel size of the convolutional layer, e.g. (3,3) -> (5,5) or (7,7). Can you have an even number as a kernel size? 
    
    ii. Different number of output filters after the input layer.

3. Train on CIFAR-10 data using:

    i. Bigger batch size
    
    ii. More epochs 
    
    Does any of this have any effect on `score`?

4. Create a convolutional network with 2 convolutional, 1 max-pooling and 2 dense layers for MNIST. Do you observe better accuracy? 