# Overview

Our goal here is to feed the data we prepared earlier into a convolutional neural network that classifies each image as a dog or a cat. Before we do this, we should first cover how convolutional neural networks actually work.

***

A convolutional neural network, also known as a CNN or ConvNet, is an artificial neural network most commonly used in image analysis. A ConvNet can be thought of as a neural network specialized in picking out, or detecting, patterns in data and making sense of them. What differentiates ConvNets from simple neural networks is their hidden layers of neurons known as convolutional layers.

Like any other layer, a convolutional layer takes an input and transforms it into an output for the next layer to use as input. For convolutional layers, this transformation is a convolution operation. We will come back to this in a bit, but let's first look at what convolutional layers are doing at a high level.

Convolutional layers detect patterns through filters: each convolutional layer defines some number of filters, and these filters are what actually detect the patterns. Patterns can be edges, circles, corners, or other features in the data; filters that detect edges, for example, are called edge detectors. Near the start of the network the detected patterns are simple and geometric (such as edges and corners), but the deeper we go into the network, the more complex they can become (such as eyes, ears, hair, and fur).

Say we have a CNN that takes in images of handwritten digits (such as those from the MNIST dataset) and predicts which digit each image represents. Assume that the first hidden layer in our model is a convolutional layer. As mentioned above, when we add a convolutional layer to a model, we also have to specify how many filters the layer has and what size they are. A filter can be thought of as a relatively small matrix whose number of rows and columns we choose, and which is initialized with random numbers.

If we give our first layer a single 3x3 filter, the filter slides over every 3x3 block of pixels in the input; this sliding is referred to as convolving. For each 3x3 block, we take the dot product of the filter and the block (multiply the two element-wise and sum the results) and store it. Then we slide to the next 3x3 block, compute the dot product, and store it again. Once the filter has covered the whole input, the matrix of stored dot products becomes a new representation of the input. This matrix is the output of the layer and acts as the input to the next layer, which filters over it in turn, potentially with filters of different shapes and in different numbers. Here is an input of pixels to a convolutional layer (the number in each pixel is its grayscale value) and its filtered output (labeled Conv1):

![image.png](attachment:image.png)
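The sliding-and-summing described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming stride 1 and no padding (the same behavior as Keras's default `padding="valid"`); the function name `convolve2d_valid` and the 5x5 input are hypothetical. Like Keras, the loop does not flip the filter before applying it, so strictly speaking it computes a cross-correlation, which is what deep-learning libraries call convolution.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Hypothetical helper: slide `kernel` over `image` (stride 1, no padding).

    At each position, the kernel is multiplied element-wise with the block
    underneath it and the products are summed (the "dot product" above).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            block = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(block * kernel)
    return out

# Hypothetical 5x5 grayscale input and a randomly initialized 3x3 filter
rng = np.random.default_rng(0)
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = rng.standard_normal((3, 3))

conv1 = convolve2d_valid(image, kernel)
print(conv1.shape)  # (3, 3): a 5x5 input shrinks to 3x3 with a 3x3 filter
```

Note how the output is smaller than the input: a 3x3 filter fits in only `5 - 3 + 1 = 3` positions along each axis.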

These filters act as pattern detectors: each one aggregates the values in a small region of its input, and the result is large where that region matches the filter's pattern. Our example above didn't detect much, since its filter was filled with random numbers, but here is a better example of filters:

![image.png](attachment:image.png)

These filters can be visually represented as the following images, with -1 representing black, 1 representing white, and 0 representing gray:

![image-2.png](attachment:image-2.png)

These filters scan through the image searching for edges. The output is brightest where the image contains the edge a filter is looking for and darkest where it contains the opposite. For example, the first filter searches for a 'top edge', which is an edge at the top of a stroke in the digit: its output would be bright where there is a top edge, dark where there is a bottom edge, and gray (neutral) where there is neither. The second filter detects 'left edges', the third detects 'bottom edges', and the fourth detects 'right edges':

![image.png](attachment:image.png)
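To see why a top-edge filter behaves this way, here is a small sketch that applies a hypothetical top-edge filter to a tiny image of a single horizontal white stroke on a black background. The filter values are an assumption in the spirit of the figure (-1 for dark, 1 for bright, 0 for neutral); they are not copied from it.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    # Stride 1, no padding: element-wise multiply each block by the kernel and sum
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical top-edge filter: dark row above (-1), bright row below (1)
top_edge = np.array([[-1, -1, -1],
                     [ 1,  1,  1],
                     [ 0,  0,  0]])

# 6x6 black image (0) with a white horizontal stroke (1) on rows 2 and 3
image = np.zeros((6, 6))
image[2:4, :] = 1

response = convolve2d_valid(image, top_edge)
print(response[:, 0])  # one column of the output, top to bottom
```

Reading one column from top to bottom, the response is 0 above the stroke, positive (bright) at the stroke's top edge, 0 inside the stroke, and negative (dark) at its bottom edge, matching the description of the first filter.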

These filters are very basic and only detect edges, so they would typically be used near the beginning of the network. Filters deeper in the network can detect more sophisticated patterns. In practice, we don't design the filter values by hand: the weights (values) in the filters start out random and are adjusted as the model learns through gradient descent.

Convolutional layers like these are sometimes followed by pooling layers. A pooling layer also scans over its input with a small window (such as 2x2 or 3x3) and reduces each window to a single value, similar to how convolution works. Instead of taking a dot product, however, it applies a simple operation such as taking the maximum value in the window (known as "max pooling"). This shrinks the data, which reduces the amount of computation the network has to do.
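Here is a minimal NumPy sketch of 2x2 max pooling, assuming non-overlapping windows (a stride equal to the window size, which is the Keras `MaxPooling2D` default) and that a leftover odd row or column is dropped, as with `padding="valid"`. The helper name `max_pool_2x2` and the feature map are hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Hypothetical helper: max over non-overlapping 2x2 windows."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column so the map splits evenly into 2x2 windows
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Axes 1 and 3 index positions inside each window; take the max over them
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 1]])
print(max_pool_2x2(fm))
# [[4 5]
#  [3 7]]
```

Each 2x2 window collapses to its largest value, so a 4x4 map becomes 2x2: a quarter of the data for the next layer to process.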

Overall, convolutional neural networks are used to detect patterns in data, typically images. They do this by having multiple filters per layer, which scan through the data looking for patterns such as the edges shown above. Since each layer scans the output of the previous layer, filters deeper in the network can detect increasingly complex patterns.

***

Now, in order to apply a convolutional neural network to our Kaggle dataset, we need to start by importing our necessary libraries:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import numpy as np
```

Now, we need to load in our data from earlier:

```python
X = np.load("features.npy")
y = np.load("labels.npy")
```
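Before building anything, it can help to confirm that the loaded arrays have the shapes the model will expect. The sketch below assumes, as the model code later in this notebook does, that each image is 50x50 with one grayscale channel and that there is one label per image; `check_dataset` and the stand-in arrays are hypothetical, not the real data.

```python
import numpy as np

def check_dataset(X, y, image_shape=(50, 50, 1)):
    """Hypothetical helper: confirm features and labels line up before training."""
    assert X.ndim == 4, f"expected (samples, height, width, channels), got {X.shape}"
    assert X.shape[1:] == image_shape, f"unexpected image shape {X.shape[1:]}"
    assert len(X) == len(y), "every image needs exactly one label"
    return X.shape[0]

# Stand-in arrays with the expected shapes (NOT the real features.npy/labels.npy)
X_demo = np.zeros((10, 50, 50, 1), dtype=np.uint8)
y_demo = np.array([0, 1] * 5)
print(check_dataset(X_demo, y_demo))  # 10
```

On the real data, calling `check_dataset(X, y)` right after loading would surface a shape mismatch here rather than as a less readable error inside `model.fit`.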

Before feeding this into our neural network, we should normalize our data. The simplest way to do this is to scale it: since the pixel values range from 0 to 255, we can divide everything by 255 so that every value falls between 0 and 1:

```python
X = X/255.0
```
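If `features.npy` stores pixels as `uint8` (an assumption; it depends on how the data was saved earlier), dividing by `255.0` makes NumPy produce a `float64` array. Keras computes in `float32` by default, so casting to `float32` first is a common way to halve the memory used. A small sketch on hypothetical pixel values:

```python
import numpy as np

# Hypothetical uint8 pixel values standing in for the real images
pixels = np.array([0, 51, 128, 255], dtype=np.uint8)

scaled = pixels / 255.0                     # NumPy promotes to float64
scaled32 = pixels.astype(np.float32) / 255  # float32: half the bytes per value

print(scaled.min(), scaled.max())  # 0.0 1.0
print(scaled.dtype, scaled32.dtype)
```

Both versions map 0 to 0.0 and 255 to 1.0; only the precision and memory footprint differ.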

Now, we can begin building our model. We will start by initializing our simple sequential model:

```python
model = Sequential()
```

Now we start adding layers. Our first layer is a convolutional layer, to which we pass the number of filters: 64. The next argument is the size of each filter; we will go with 3x3. The final argument is the input shape. Each image has shape 50x50x1, so we could pass `(50, 50, 1)` directly, but we can also pass `X.shape[1:]`, which is the same thing: `X` has shape `(number of images, 50, 50, 1)`, and slicing off the first dimension leaves the shape of a single image. (The `-1` we used when reshaping `X` earlier just told NumPy to work out the number of images for us.) Here is that addition:

```python
model.add(Conv2D(64, (3,3), input_shape = X.shape[1:]))
```
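With the Keras `Conv2D` defaults (`strides=(1, 1)`, `padding="valid"`), the output size and parameter count of this layer can be worked out by hand. The two helpers below are hypothetical and only do the arithmetic; the numbers they give for the first layer (a 48x48x64 output and 640 parameters) should line up with what `model.summary()` reports.

```python
def conv2d_output_size(size, kernel, stride=1):
    # padding="valid": the filter must fit entirely inside the input
    return (size - kernel) // stride + 1

def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

print(conv2d_output_size(50, 3))         # 48 -> output shape (48, 48, 64)
print(conv2d_param_count(3, 3, 1, 64))   # 640
```

Each of the 64 filters looks at all input channels at once, which is why `in_channels` appears in the weight count; for a second `Conv2D(64, (3,3))` stacked on 64 channels, the same formula gives 36,928 parameters.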

After the convolutional layer, we could add either an activation or a pooling layer. We will add an activation first, using the ReLU function:

```python
model.add(Activation("relu"))
```
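`Conv2D` has no activation unless one is given (its `activation` argument defaults to `None`), so the separate `Activation("relu")` layer is what adds the nonlinearity; passing `activation="relu"` to `Conv2D` would be equivalent. ReLU itself is just `max(0, x)` applied element-wise, as this small sketch on a hypothetical feature map shows:

```python
import numpy as np

def relu(x):
    # Negative values become 0; positive values pass through unchanged
    return np.maximum(0, x)

feature_map = np.array([[-2.0, 0.5],
                        [ 3.0, -0.1]])
print(relu(feature_map))
# [[0.  0.5]
#  [3.  0. ]]
```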

Now, we can add our pooling layer. To add a pooling layer, we have to specify the pool size (the size of input that we want to max pool together at a time) - we will go with 2x2.

```python
model.add(MaxPooling2D(pool_size = (2,2)))
```

That completes our first convolutional block (convolution, activation, pooling). We repeat the same pattern to add a second block:

```python
# No input_shape needed here: we already defined it on the first layer
model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
```

Next, we add a dense layer with 64 nodes. Before adding it, however, we have to flatten our data: the convolutional layers output a 3D block of feature maps (height x width x filters) for each image, but a dense layer expects a flat 1D vector:

```python
model.add(Flatten())
model.add(Dense(64))
```
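Tracing the image size through both blocks shows how long the flattened vector is. This sketch assumes the Keras defaults used above (valid padding and stride 1 for each 3x3 convolution, and a pooling stride equal to the 2x2 pool size); `valid_conv` and `pool` are hypothetical helpers that only do the arithmetic.

```python
def valid_conv(size, kernel=3):
    # 3x3 filter, stride 1, no padding
    return size - kernel + 1

def pool(size, pool_size=2):
    # Non-overlapping windows; a leftover row/column is dropped
    return size // pool_size

filters = 64
size = 50
size = pool(valid_conv(size))  # block 1: 50 -> 48 -> 24
size = pool(valid_conv(size))  # block 2: 24 -> 22 -> 11

flat_len = size * size * filters    # Flatten: (11, 11, 64) -> 7744 values
dense_params = flat_len * 64 + 64   # Dense(64): one weight per input per node, plus biases
print(flat_len, dense_params)       # 7744 495680
```

Most of the model's parameters sit in this first dense layer, which is one reason pooling before flattening matters: without it, the flattened vector (and this layer) would be far larger.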

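Finally, we need the output layer and its activation. Since this is a binary decision, the output is a Dense layer with a single node, followed by a sigmoid activation, which squashes that node's value into the range 0 to 1 so it can be read as the probability of the positive class: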

```python
model.add(Dense(1))
model.add(Activation("sigmoid"))
```
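The sigmoid function is `1 / (1 + e^(-z))`. A sketch of how its output could be turned into a class prediction, assuming a 0.5 threshold; which animal counts as class 1 depends on how `labels.npy` was encoded earlier, and the raw output values below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

raw_outputs = np.array([-3.0, 0.0, 2.5])  # hypothetical values from Dense(1)
probs = sigmoid(raw_outputs)               # probability of class 1
labels = (probs > 0.5).astype(int)         # 1 if p > 0.5, else 0
print(labels)                              # [0 0 1]
```

A raw output of exactly 0 sits on the boundary at p = 0.5; with a strict `>` comparison it is assigned to class 0.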

Now we just need to compile our model. Since we are making a binary decision between cats and dogs, we will use `binary_crossentropy` as our loss, with `adam` as our optimizer and accuracy as our only metric:

```python
model.compile(loss = "binary_crossentropy",
              optimizer = "adam",
              metrics = ["accuracy"])
```
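Binary cross-entropy averages `-(y log p + (1 - y) log(1 - p))` over the samples, where `y` is the true label (0 or 1) and `p` is the predicted probability of class 1. A NumPy sketch on hypothetical labels and predictions; the clipping constant is an assumption used only to avoid `log(0)`:

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    # Clip predictions away from exactly 0 and 1 so the logs stay finite
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1])        # hypothetical true labels
p = np.array([0.9, 0.2, 0.6])       # hypothetical sigmoid outputs
print(binary_crossentropy(y_true, p))  # about 0.28
```

Confident correct predictions (like 0.9 for a true 1) contribute little loss, while hesitant or wrong ones (0.6 for a true 1) contribute more, which is what pushes the filters toward useful patterns during training.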

Now we can train our model. First we specify the batch size: the number of samples passed through the network before each weight update. Very small batches make each update noisy and slow down each epoch, while very large batches need more memory and can generalize worse, so a batch size somewhere between roughly 20 and 200 is a common starting point; we will pick 32. We then set `epochs = 3`, meaning we will go through all the data 3 times. Finally, we set a validation split of 0.1, which holds out 10% of the training data for validation. The model never trains on this held-out data, but its loss and accuracy on it are reported after each epoch, giving a rough preview of how it will do on unseen test data.

```python
model.fit(X, y, batch_size = 32, epochs = 3, validation_split = 0.1)
```

```
Epoch 1/3
Epoch 2/3
Epoch 3/3
```


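Keras takes the validation data from the *last* samples of `X` and `y`, before any shuffling. So if the earlier preparation step saved the images in order (all cats, then all dogs), the validation set could end up containing only one class; shuffling before saving avoids this. The sketch below estimates the split and the number of batches per epoch for a hypothetical dataset size; `split_counts` is a hypothetical helper, and its rounding is an assumption that mirrors a floor of `n * (1 - validation_split)`.

```python
import math

def split_counts(n_samples, validation_split=0.1, batch_size=32):
    # Training data is everything before the split point; validation is the tail
    split_at = int(n_samples * (1.0 - validation_split))
    n_train = split_at
    n_val = n_samples - split_at
    # The last batch may be smaller than batch_size
    steps_per_epoch = math.ceil(n_train / batch_size)
    return n_train, n_val, steps_per_epoch

# Hypothetical dataset of 1000 images (not the real dataset size)
print(split_counts(1000))  # (900, 100, 29)
```

The steps-per-epoch figure corresponds to the `N/N` counter that Keras's progress bar shows for each epoch.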