# Convolutional Neural Network

## Main Task
> Build a CNN model that classifies an image as showing either a dog or a cat.

In [8]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Keras' ImageDataGenerator preprocesses the images and applies augmentation on the fly,
# batch by batch during training, which can help the model generalize to new images.

### Preprocess the data

**Training set**  
Data augmentation is a strategy that enables us to significantly increase the diversity of data available for training models, without actually collecting new data.  
Data augmentation techniques such as cropping, padding, and horizontal flipping are used to train a model with slightly modified versions of the original images.  
This helps to make the model more robust to variations in the input data.

In our case, train_datagen uses data augmentation techniques including:

- shear_range: randomly applies shearing transformations. A shearing transformation slants the shape of the image.
- zoom_range: randomly zooms inside pictures. With zoom_range=0.2, the zoom factor is drawn from the range [0.8, 1.2].
- horizontal_flip: randomly flips half of the images horizontally. This is appropriate when a mirrored image is still a valid example, as is the case for real-world pictures.

train_datagen also sets rescale=1./255, which is not an augmentation: it multiplies every pixel value by 1/255 so that the inputs lie between 0 and 1.
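
To make the rescaling and flipping steps concrete, here is a minimal sketch in plain Python on a tiny made-up grayscale image. The helper names (`rescale`, `horizontal_flip`, `maybe_flip`) and the pixel values are illustrative assumptions; this is not how ImageDataGenerator is implemented internally, only a picture of what the two options do to the pixels.

```python
import random

def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by `factor`, as rescale=1./255 does."""
    return [[pixel * factor for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]

def maybe_flip(image, rng):
    """Flip with probability 0.5, mimicking horizontal_flip=True."""
    return horizontal_flip(image) if rng.random() < 0.5 else image

# Hypothetical 2x3 grayscale image with pixel values in 0..255.
image = [[0, 51, 255],
         [102, 153, 204]]

flipped = horizontal_flip(image)                  # columns in reverse order
scaled = rescale(image)                           # values now in [0, 1]
augmented = maybe_flip(image, random.Random(0))   # either `image` or `flipped`
```

Because the flip is decided at random for each image every time it is drawn, the model can see both the original and the mirrored version of the same picture over the course of training.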

In [10]:
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)

training_set = train_datagen.flow_from_directory('../dataset/training_set',
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary')

Found 8000 images belonging to 2 classes.


**Test set**  
test_datagen applies no augmentation because it only preprocesses the test set images; it is not used to train the model.  
The test data should represent real-world data as closely as possible, so the only transformation is the same 1/255 rescaling that is applied to the training set.

In [11]:
test_datagen = ImageDataGenerator(rescale=1./255)

test_set = test_datagen.flow_from_directory('../dataset/test_set',
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='binary')

Found 2000 images belonging to 2 classes.


### Building a CNN Model

In [None]:
cnn_model = Sequential()

# Add the first convolutional layer
cnn_model.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2, 2)))

# Add the second convolutional layer
cnn_model.add(Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2, 2)))

# Add the third convolutional layer
cnn_model.add(Conv2D(128, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the tensor output from the convolutional layers
cnn_model.add(Flatten())

# Add a fully connected (dense) layer
cnn_model.add(Dense(units=128, activation='relu'))

# Add the output layer
cnn_model.add(Dense(units=1, activation='sigmoid'))

**Convolutional Layers (Conv2D):** Convolutional layers are the main building blocks of convolutional neural networks. A convolution slides a small filter, or kernel (the terms are used interchangeably), over the input; at each position the filter values are multiplied element-wise with the underlying patch and summed, giving one value of the output feature map. We use three convolutional layers to progressively extract higher-level features from the input image. The parameters inside Conv2D are:

1. The first parameter (32, 64, 128) is the number of filters that the convolutional layer will learn. A common convention, followed here, is to use fewer filters in layers close to the input image and more filters in deeper layers close to the output predictions. This allows the network to learn more complex representations as depth increases.

2. (3, 3) is the size of the filters. Each filter will be a 3x3 matrix which is convolved with the image.

3. input_shape=(64, 64, 3) is only needed for the first layer. It specifies the shape of the input image: 64 by 64 pixels, matching target_size=(64, 64) in the data generators, with 3 color channels (RGB).

4. activation='relu' is the activation function to use. ReLU (Rectified Linear Unit) is a common activation function that outputs the input directly if it is positive, otherwise, it outputs zero.
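
The sliding-window operation described above can be sketched in plain Python for a single-channel image and a single filter. This is a simplified illustration under stated assumptions: stride 1, no padding, no bias term, and no channel dimension. The function names and the example image and kernels are made up for the demonstration; a real Conv2D layer learns its filter values during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            total = sum(image[i + di][j + dj] * kernel[di][dj]
                        for di in range(k_h) for dj in range(k_w))
            row.append(total)
        feature_map.append(row)
    return feature_map

def relu(feature_map):
    """Keep positive values and replace negative values with zero."""
    return [[max(0, value) for value in row] for row in feature_map]

# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hand-picked 3x3 filters that respond to vertical edges.
dark_to_bright = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
bright_to_dark = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

edge_map = relu(conv2d_valid(image, dark_to_bright))   # [[3, 3], [3, 3]]
no_match = relu(conv2d_valid(image, bright_to_dark))   # [[0, 0], [0, 0]]
```

A 3x3 filter on a 4x4 input gives a 2x2 feature map, which is the same size rule (input size minus 3 plus 1) that turns the 64x64 input into a 62x62 map in the first layer.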

**MaxPooling Layers (MaxPooling2D):** After each convolutional layer, a pooling layer downsamples the feature maps. With pool_size=(2, 2), each spatial dimension is halved, which reduces the computation in later layers and the number of parameters in the dense layer that follows, and so also helps to control overfitting. Max pooling keeps only the maximum value in each window it is applied to.
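
As a small illustration of 2x2 max pooling, the sketch below pools a made-up 4x4 feature map in plain Python. It assumes non-overlapping windows (stride equal to the pool size) and drops any leftover row or column when the size is odd, which is how MaxPooling2D behaves with its default settings. The function name and the numbers are illustrative.

```python
def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 window."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = (feature_map[i][j], feature_map[i][j + 1],
                      feature_map[i + 1][j], feature_map[i + 1][j + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [7, 0, 3, 3],
               [1, 8, 2, 6]]

pooled = max_pool_2x2(feature_map)   # [[4, 5], [8, 6]]
```

The 4x4 map shrinks to 2x2 while the strongest response in each window is kept.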

**Flatten Layer (Flatten):** This layer converts the final feature maps into a single one-dimensional vector. Flattening is needed because fully connected (Dense) layers expect a vector as input rather than a stack of two-dimensional feature maps. The resulting vector contains all the local features found by the preceding convolutional layers.
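
To see how long the flattened vector is for this model, the shapes can be traced by hand. The sketch below assumes the Keras defaults used in the code above: 3x3 convolutions with stride 1 and no padding, and 2x2 pooling with non-overlapping windows. The helper names are illustrative.

```python
def conv_output(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Spatial size after non-overlapping pooling (leftovers dropped)."""
    return size // pool

height = width = 64
shapes = []
for filters in (32, 64, 128):
    height, width = conv_output(height), conv_output(width)
    shapes.append(("conv", height, width, filters))
    height, width = pool_output(height), pool_output(width)
    shapes.append(("pool", height, width, filters))

# The last pooling layer outputs 6 x 6 x 128, which Flatten unrolls.
flattened_length = height * width * 128   # 4608
```

Under these assumptions the spatial size goes 64, 62, 31, 29, 14, 12, 6, so Flatten hands a vector of 6 * 6 * 128 = 4608 values to the first dense layer.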

**Dense Layer (Dense):** After the flatten layer, you use a dense layer (also called fully connected layer), which performs classification on the features extracted by the convolutional layers and downsampled by the max-pooling layers. In this dense layer, every node in the layer is connected to every node in the preceding layer.
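
The "every node connected to every node" idea can be sketched as a weighted sum per output unit. The example below is a minimal plain-Python illustration with made-up weights, not the Keras implementation. The parameter counts at the end assume the flattened vector has 6 * 6 * 128 = 4608 entries, which is what the three convolution and pooling stages produce from a 64x64 input with Keras' default padding and strides.

```python
def dense_forward(inputs, weights, biases):
    """Compute one weighted sum plus bias per output unit (no activation).

    weights[j][i] is the weight from input i to output unit j.
    """
    return [sum(w * x for w, x in zip(unit_weights, inputs)) + bias
            for unit_weights, bias in zip(weights, biases)]

def dense_param_count(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Hypothetical layer with 3 inputs and 2 units.
outputs = dense_forward(inputs=[1.0, 2.0, 3.0],
                        weights=[[0.5, -1.0, 0.0],
                                 [1.0, 1.0, 1.0]],
                        biases=[0.5, -1.0])        # [-1.0, 5.0]

hidden_params = dense_param_count(4608, 128)       # 589952
output_params = dense_param_count(128, 1)          # 129
```

Most of the model's parameters sit in the first dense layer, which is one reason the pooling layers that shrink the feature maps before Flatten matter so much.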

**Activation Functions:** Activation functions introduce non-linearity into the model. Without them, a network with any number of layers would still behave like a single linear layer, because composing linear functions yields just another linear function. relu is one of the most commonly used activation functions for hidden layers. It lets the model learn complex patterns, and because it outputs zero for negative inputs, only some of the neurons are active for a given input, which makes the activations sparse and cheap to compute.
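
The claim that stacked linear layers collapse into one can be checked with a one-dimensional toy example. The weights and biases below are arbitrary illustrative values, and each "layer" has a single input and a single output to keep the arithmetic visible.

```python
def linear(x, w, b):
    """A one-input, one-output linear layer."""
    return w * x + b

def relu(x):
    """Output x if it is positive, otherwise zero."""
    return max(0.0, x)

# Arbitrary weights and biases for two layers.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

def stacked_linear(x):
    """Two linear layers with no activation in between."""
    return linear(linear(x, w1, b1), w2, b2)

def collapsed(x):
    """The single linear layer that the stack is equivalent to."""
    return linear(x, w1 * w2, w2 * b1 + b2)

def stacked_relu(x):
    """The same two layers with ReLU between them."""
    return linear(relu(linear(x, w1, b1)), w2, b2)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
same_as_one_layer = all(stacked_linear(x) == collapsed(x) for x in xs)   # True
differs_with_relu = stacked_relu(-1.0) != collapsed(-1.0)                # True
```

With ReLU in between, the stack is no longer a single straight line: it is flat where the first layer's output is negative and sloped elsewhere, which is the kind of non-linearity the hidden layers rely on.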

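**Output Layer:** The last layer is the output layer. It uses the sigmoid activation function, which squashes the output into the range between 0 and 1. This suits binary classification because the output can be interpreted as the probability that the input image belongs to class 1. With class_mode='binary', flow_from_directory assigns the labels 0 and 1 in alphanumeric order of the subdirectory names, and training_set.class_indices shows the mapping; for example, dog would be class 1 if the folders are named cats and dogs.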
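
As a minimal sketch of how the sigmoid output becomes a class label, the plain-Python example below computes the sigmoid and applies a cutoff. The 0.5 threshold and the function names are assumptions for illustration; the notebook itself does not specify a decision threshold.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def predict_label(probability, threshold=0.5):
    """Return 1 if the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

# Hypothetical raw outputs of the final unit before the sigmoid.
probabilities = [sigmoid(z) for z in (-3.0, 0.0, 3.0)]
labels = [predict_label(p) for p in probabilities]   # [0, 1, 1]
```

A raw output of 0 maps to a probability of exactly 0.5, so the sign of the pre-activation value decides the predicted class under this threshold.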