# Convolutional Neural Networks


### A simple fully connected neural network


In [4]:

from keras.models import Sequential
from keras.layers import Dense, Input
from keras.backend import clear_session

# Good Practice Klaxon: Free your memory from previously made models.
clear_session()

# Create a new blank model
model = Sequential()
# Add a hidden layer of 2 units taking inputs of size 4; (4,) is a one-element tuple, and the batch size is left unspecified
model.add(Dense(2, input_shape=(4,)))
# And finally, add an output layer of shape 1
model.add(Dense(1))

# Print out a summary of the model
model.summary()

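To make the shapes in this small network concrete, the following is a minimal NumPy sketch of the same 4 → 2 → 1 architecture. The weights are random stand-ins and `forward` is a hypothetical helper, not what Keras runs internally; it only illustrates why the batch size can vary while the last axis must be 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights with the same shapes as Dense(2) on 4 inputs followed by Dense(1).
W1, b1 = rng.normal(size=(4, 2)), np.zeros(2)   # hidden layer: 4*2 + 2 = 10 parameters
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)   # output layer: 2*1 + 1 = 3 parameters

def forward(x):
    """Apply both layers (no activation, matching Dense's default) to a batch of rows."""
    hidden = x @ W1 + b1
    return hidden @ W2 + b2

n_params = sum(a.size for a in (W1, b1, W2, b2))
print(n_params)                          # 13

# The batch size is free: only the last axis has to be 4.
print(forward(np.ones((1, 4))).shape)    # (1, 1)
print(forward(np.ones((5, 4))).shape)    # (5, 1)
```

The count of 13 should match the total that `model.summary()` reports for the model above.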

## Convolutional Neural Networks

Standard CNNs are composed of three types of layers: convolutional layers, pooling layers and fully-connected layers. When these layers are stacked, a CNN architecture is formed. A simplified CNN architecture for MNIST image classification is illustrated in Figure Two.

![CNN](images/cnn.png)

**Figure Two:** A common form of CNN architecture in which convolutional layers are stacked between ReLUs continuously before being passed through the pooling layer, before going between one or many fully connected ReLUs.


The basic functionality of the example CNN with input images can be broken down into four key areas.

1. As found in other forms of ANN, the **input layer** will hold the pixel values of the image.
2. The **convolutional layer** will determine the output of neurons that are connected to local regions of the input, by calculating the scalar product between their weights and the region connected to the input volume. An activation function (such as rectified linear unit, sigmoid or softmax) is then applied to generate the output activation.
3. The **pooling layer** will then simply perform downsampling along the spatial dimensionality of the given input, further reducing the number of parameters within that activation.
4. The **fully-connected layer(s)** will then perform the same duties found in MLPs and attempt to produce class scores from the activations, to be used for classification.
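The downsampling performed by the pooling layer in step 3 can be illustrated with a small NumPy sketch. `max_pool_2x2` is a hypothetical helper written for this example (it assumes non-overlapping 2x2 windows on a single 2D activation map), not the Keras implementation.

```python
import numpy as np

def max_pool_2x2(activation):
    """Downsample a 2D activation map by taking the max of each non-overlapping 2x2 block."""
    h, w = activation.shape
    # Drop a trailing row/column if the size is odd, then group pixels into 2x2 blocks.
    trimmed = activation[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

activation = np.array([[1, 3, 2, 0],
                       [4, 2, 1, 1],
                       [0, 0, 5, 6],
                       [1, 2, 7, 8]])

pooled = max_pool_2x2(activation)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each output value summarises four input values, so the map shrinks from 4x4 to 2x2 while the strongest responses are kept.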


Through this simple method of transformation, CNNs are able to transform the original input layer by layer using convolutional and downsampling techniques to produce class scores for classification and regression purposes. However, it is important to note that simply understanding the overall architecture of a CNN will not suffice. The creation and optimisation of these models can take quite some time, and can be quite confusing. We will now explore the individual layers in detail, covering their hyperparameters and connectivities.

As we glide through the input, the scalar product is calculated for each value in that kernel (Figure Three). From this the network will learn kernels that 'fire' when they see a specific feature at a given spatial position of the input. These are commonly known as **activations**.

![Convolutional layer](images/convolutional.png)

**Figure Three:** A visual representation of a convolutional layer. The centre element of the kernel is placed over the input vector, which is then calculated and replaced with a weighted sum of itself and any nearby pixels.
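The sliding scalar product described above can be sketched in a few lines of NumPy. This is a naive illustration, assuming no padding and a stride of 1; `convolve2d_valid` and the hand-picked edge kernel are hypothetical, whereas in a real CNN the kernel values are learned. Like most deep learning libraries, it does not flip the kernel.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take the scalar product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)
    return out

# A 5x5 image that is dark on the left and bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

activation_map = convolve2d_valid(image, kernel)
print(activation_map)   # every row is [3. 3. 0.]
```

The kernel 'fires' (value 3) wherever its window covers the edge and stays at 0 over the uniformly bright region.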

Every kernel will have a corresponding activation map, and these maps are stacked along the depth dimension to form the full output volume from the convolutional layer.

These kernels are usually small in spatial dimensionality, but spread along the entirety of the depth of the input. When the data hits a convolutional layer, the layer convolves each filter across the spatial dimensionality of the input to produce a 2D activation map.
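As a rough illustration of how several kernels produce an output volume, the sketch below convolves four random kernels over a three-channel input. The helper `conv_layer` and the filter layout `(n_filters, kh, kw, depth)` are assumptions chosen to keep the loop readable; Keras stores its kernels in a different axis order.

```python
import numpy as np

def conv_layer(volume, filters):
    """Convolve every filter over an (H, W, D) input; returns an (H', W', n_filters) volume.

    `filters` has shape (n_filters, kh, kw, D): small spatially, full input depth.
    """
    n_filters, kh, kw, depth = filters.shape
    assert depth == volume.shape[2], "each kernel must span the full input depth"
    out_h = volume.shape[0] - kh + 1
    out_w = volume.shape[1] - kw + 1
    maps = []
    for f in filters:
        activation_map = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                activation_map[i, j] = np.sum(volume[i:i + kh, j:j + kw, :] * f)
        maps.append(activation_map)          # one 2D map per kernel
    return np.stack(maps, axis=-1)           # stack the maps along the depth axis

rng = np.random.default_rng(0)
volume = rng.normal(size=(8, 8, 3))          # e.g. a small RGB patch
filters = rng.normal(size=(4, 3, 3, 3))      # four 3x3 kernels, each 3 channels deep

output_volume = conv_layer(volume, filters)
print(output_volume.shape)                   # (6, 6, 4)
```

The output depth (4) equals the number of kernels, regardless of the depth of the input.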

One of the key differences compared to the MLP is that the layers within the CNN are composed of neurons organised into three dimensions: the spatial dimensionality of the input (**height** and **width**) and the **depth**. The depth is the third dimension of an activation volume, that is, the number of filters/kernels used. Unlike standard MLPs, the neurons within any given layer will only connect to a small region (the **receptive field**) of the layer preceding it.

We are also able to define the **stride**, which sets how far the receptive field moves across the spatial dimensions of the input at each step. For example, if we were to set the stride to 1, we would have heavily overlapping receptive fields, producing very large activation volumes. Alternatively, setting the stride to a greater number will reduce the amount of overlap and produce an output of lower spatial dimensions.

**Zero-padding** is the simple process of padding the border of the input, and is an effective method to give further control over the dimensionality of the output volumes. It is important to understand that, through the use of these techniques, we will in turn alter the spatial dimensionality of the convolutional layer's output. We can calculate this using the following method:

In [3]:
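def calculate_conv_output(height, width, depth, kernel_size, zero_padding, stride):
    # Receptive field size = kernel size.
    # Output size along one spatial axis: (input - kernel + 2 * padding) / stride + 1.
    # `depth` is kept in the signature but does not affect the spatial size:
    # the output depth is simply the number of filters in the layer.
    out_height = (height - kernel_size + 2 * zero_padding) / stride + 1
    out_width = (width - kernel_size + 2 * zero_padding) / stride + 1

    return out_height, out_width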

If the result of this calculation is not a whole number, the stride has been set incorrectly, as the neurons will be unable to fit neatly across the given input.
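A few worked cases may help here. The sketch below applies the usual per-axis relation, output = (V - R + 2Z) / S + 1, to a 28-pixel-wide input; `conv_output_size` is a hypothetical helper written for this illustration, and raising an error on a non-whole result is a choice made for this example.

```python
def conv_output_size(input_size, kernel_size, zero_padding=0, stride=1):
    """Output size along one spatial axis: (V - R + 2Z) / S + 1."""
    span = input_size - kernel_size + 2 * zero_padding
    if span < 0 or span % stride != 0:
        raise ValueError(
            f"kernel {kernel_size}, padding {zero_padding}, stride {stride} "
            f"do not tile an input of size {input_size}"
        )
    return span // stride + 1

print(conv_output_size(28, 3))                   # 26
print(conv_output_size(28, 3, zero_padding=1))   # 28: padding preserves the size
print(conv_output_size(28, 4, stride=2))         # 13

try:
    conv_output_size(28, 3, stride=2)            # (28 - 3) / 2 is not a whole number
except ValueError as err:
    print("invalid:", err)
```

In practice, frameworks such as Keras tend to round the result down for unpadded convolutions instead of raising an error, so a mismatched stride silently ignores the last few rows or columns.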


See the slides for lecture10-CNN for more information on CNNs.

Or go through this short tutorial on the basic components of a ConvNet:
https://machinelearningmastery.com/crash-course-convolutional-neural-networks/


## Task One: MNIST Classification

Using the slides given last week, build a CNN to classify MNIST digits.

Last week we reduced the data dimensionality with PCA prior to applying a feedforward neural network. This time, we'll train a network on the complete image and use a CNN, a sparsely connected network.

In [None]:
from keras.utils import to_categorical
from keras.datasets import mnist

# input image dimensions
width = 28
height = 28

num_classes = 10

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape for CNN
x_train = x_train.reshape(x_train.shape[0], height, width, 1)
x_test = x_test.reshape(x_test.shape[0], height, width, 1)
input_shape = (height, width, 1)


# Convert to float32 and scale pixel values to [0, 1] (helps training converge).
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
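To see what "binary class matrices" means, here is a small NumPy sketch of one-hot encoding. `one_hot` is a hypothetical stand-in that mimics the effect of `to_categorical` on integer labels; the three labels are made up for the example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])        # hypothetical digit labels
encoded = one_hot(labels, 10)
print(encoded.shape)                # (3, 10)
print(encoded[0])                   # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(encoded.argmax(axis=1))       # [5 0 4]
```

Taking `argmax` along the class axis recovers the original labels, which is also how softmax outputs are turned back into predicted digits.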


Build your network below (you can get some inspiration from this [keras example](https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py)).

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPool2D, Dropout, Flatten

model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25)) # Dropout 25% of the nodes of the previous layer during training
model.add(Flatten())     # Flatten, and add a fully connected layer
model.add(Dense(128, activation='relu')) 
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax')) # Output layer: 10 class nodes (dropout is applied to its input)
model.summary()


Note that we have roughly 600,000 parameters, almost all of them in the first dense layer. With a strong optimizer like Adam, and a big dataset like MNIST, this shouldn't be a problem.
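The parameter count can be checked by hand. The sketch below reproduces the arithmetic for the architecture defined above, assuming unpadded 3x3 convolutions with stride 1; `conv2d_params` and `dense_params` are hypothetical helpers, and pooling, dropout and flatten layers contribute no parameters.

```python
def conv2d_params(kernel_h, kernel_w, in_depth, n_filters):
    # Each filter has kernel_h * kernel_w * in_depth weights plus one bias.
    return (kernel_h * kernel_w * in_depth + 1) * n_filters

def dense_params(n_inputs, n_units):
    # One weight per input per unit, plus one bias per unit.
    return (n_inputs + 1) * n_units

conv1 = conv2d_params(3, 3, 1, 16)      # 28x28x1  -> 26x26x16
conv2 = conv2d_params(3, 3, 16, 32)     # 26x26x16 -> 24x24x32
flat = 12 * 12 * 32                     # after 2x2 max pooling: 12x12x32 = 4608 values
dense1 = dense_params(flat, 128)
dense2 = dense_params(128, 10)

total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2)     # 160 4640 589952 1290
print(total)                            # 596042
```

The two convolutional layers together hold under 5,000 parameters, which illustrates how much cheaper weight sharing is than the dense connection that follows the flatten step.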

In [None]:
from keras.optimizers import Adam

optimizer = Adam()
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model, iterating on the data in batches of 32 samples
model.fit(x_train, y_train, epochs=15, batch_size=32, validation_split=1/6)
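For a sense of scale, the arithmetic behind this call can be sketched as follows. It assumes the 60,000-image MNIST training set and that the split comes out at exactly one sixth of the samples; Keras takes the validation samples from the end of the arrays, before any shuffling.

```python
import math

n_total = 60000                  # MNIST training images
n_val = n_total // 6             # validation_split=1/6
n_train = n_total - n_val
batch_size = 32

steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val)            # 50000 10000
print(steps_per_epoch)           # 1563 (the final batch holds only 16 samples)
```

Over 15 epochs this amounts to roughly 23,000 weight updates.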

Exercise:

1. Try out different network architectures and hyperparameter settings, and observe the effect on performance. If you have time, check out the [LeNet architecture](https://engmrk.com/lenet-5-a-classic-cnn-architecture/).

2. Or try a more difficult dataset, [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist). To load the data from Keras:

from keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

In [None]:
# Your code





## Task Two: Hotdog Classification

Cast your mind back to the first workshop, where you saw the following image:

![Supervised Learning](images/supervised_classification.png)

**Figure Five:** An example of a supervised classification task, in which the training examples in the orange segment are pre-labeled as being "hot dog", and those in the white segment are pre-labeled as being "not a hot dog".

In this week's tutorial, use the above information to build a Convolutional Neural Network to classify images of hot dogs against a random selection of images of "not hot dogs". To load the data, unzip the ```practical7.zip``` archive into the directory of this workshop and run the following code:

In [1]:
import numpy as np
import os
from PIL import Image
from sklearn.utils import shuffle

classes = ["nothotdogs", "hotdogs"]

X = []
y = []

for index, c in enumerate(classes):
    data_dir = os.path.join("./data/", c)
    
    for fn in os.listdir(data_dir):
        try:
            fp = os.path.join(data_dir, fn)

            im = Image.open(fp)

            # Setting mode to L will give greyscale image.
            im = im.convert("L")

            # Resize to 64 x 64
            im = im.resize((64, 64))

            # Convert to numpy array
            im = np.array(im)
            
            X.append(im)
            y.append(index)
        except OSError:
            # Skip files that PIL cannot open as images.
            pass
        
X = np.array(X)
y = np.array(y)

# Shuffle it a bit
X, y = shuffle(X, y, random_state=0)

Now, using the following function, investigate the data.

In [None]:
import matplotlib.pyplot as plt

def visualise_data(X):
    print("Data is of shape", X.shape)
    fig = plt.figure()
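    if len(X.shape) == 3:
        # Greyscale data of shape (samples, height, width): show up to six images.
        for i in range(min(6, len(X))):
            ax = fig.add_subplot(2, 3, i+1)
            ax.imshow(X[i], cmap='gray')
            ax.axis("off")
    else:
        # If you want to use RGB, edit this branch to visualise the data.
        pass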
    plt.tight_layout()
    plt.show()

In [None]:
# Your code here



Now that you understand the data, it's time to build and evaluate your model.
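As a possible starting point, the greyscale arrays need a channel axis and scaling before they can be fed to a `Conv2D` layer. The sketch below shows one way to do this, using randomly generated stand-in data (`X_demo`, `y_demo`) in place of the real `X` and `y`; the 80/20 split is an arbitrary choice for illustration.

```python
import numpy as np

# Hypothetical stand-in for the arrays produced by the loading cell:
# 100 greyscale 64x64 images with binary labels.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(100, 64, 64), dtype=np.uint8)
y_demo = rng.integers(0, 2, size=100)

# Add the channel axis Conv2D expects and scale pixels to [0, 1].
X_prepared = X_demo.astype("float32")[..., np.newaxis] / 255

# Hold out the last 20% for evaluation (the data is assumed to be shuffled already).
split = int(0.8 * len(X_prepared))
X_train, X_test = X_prepared[:split], X_prepared[split:]
y_train, y_test = y_demo[:split], y_demo[split:]

print(X_train.shape, X_test.shape)   # (80, 64, 64, 1) (20, 64, 64, 1)
```

With two classes, a single sigmoid output unit with a binary cross-entropy loss is one reasonable choice for the final layer, though a two-unit softmax as in Task One would also work.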

In [None]:
# Your code here

