# Convolutional Neural Networks: Code

## Dataset Structure

<img src="directories.png" width="500px;" alt="Dataset directory structure." />

The _training_ and _test_ sets are stored in two separate directories so that each set can be loaded independently. Within the _training_ and _test_ set directories, the images of the cats and dogs are also separated into their own subdirectories. Keras's `flow_from_directory` treats each subdirectory as one class, so the images are labeled automatically from this layout and the classifier is trained with those labels.
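
To make the labeling step concrete, here is a minimal sketch of how class labels could be derived from subdirectory names. The subdirectory names `cats` and `dogs` and the helper `infer_class_indices` are assumptions for illustration; Keras performs an equivalent mapping internally, assigning indices to class directories in alphanumeric order.

```python
from pathlib import Path
import tempfile

def infer_class_indices(split_dir):
    """Map each class subdirectory name to an integer label, in sorted order."""
    class_names = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}

# Build a throwaway copy of the assumed layout so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    for split in ("training_set", "test_set"):
        for label in ("cats", "dogs"):  # hypothetical class directory names
            (Path(root) / split / label).mkdir(parents=True)
    class_indices = infer_class_indices(Path(root) / "training_set")

print(class_indices)
```

With real generators, the same mapping should be available as the `class_indices` attribute of the object returned by `flow_from_directory`.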

<hr>

## Fixing Image Specificity

The training and test set images also often show the dogs and cats in specific positions, which may cause the model to overfit. An example of overfitting due to position is a model that classifies images of jumping dogs as dogs but images of sitting dogs as cats. _Image augmentation_ artificially creates different perspectives of dogs and cats, which helps the _CNN_ model generalize. A few ways to perform _image augmentation_ involve flipping images, zooming in and out of them, and shearing them. Images are often also downsampled before training.

* __Shearing:__ Slanting an image by shifting its pixels along one axis in proportion to their position along the other axis.
<img src="shearing.png" width="300px;" alt="Example of an image being sheared." />
* __Image Augmentation:__ Creating artificial variation within a dataset by altering image structure.
* __Downsampling:__ The reduction in the resolution of an image while maintaining its 2D representation.
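
The transformations above can be sketched on a tiny image stored as nested lists. This is only an illustration of the idea, not how Keras implements augmentation: `horizontal_flip` and `shear_rows` are hypothetical helpers, and the shear here shifts rows by whole pixels rather than by an angle.

```python
def horizontal_flip(image):
    """Mirror a 2D image left-to-right by reversing every row."""
    return [row[::-1] for row in image]

def shear_rows(image, fill=0):
    """Crude integer shear: shift row r right by r pixels, padding with `fill`."""
    height, width = len(image), len(image[0])
    out_width = width + height - 1
    sheared = []
    for r, row in enumerate(image):
        right_padding = out_width - width - r
        sheared.append([fill] * r + list(row) + [fill] * right_padding)
    return sheared

image = [[1, 2, 3],
         [4, 5, 6]]

flipped = horizontal_flip(image)
sheared = shear_rows(image)

print(flipped)  # [[3, 2, 1], [6, 5, 4]]
print(sheared)  # [[1, 2, 3, 0], [0, 4, 5, 6]]
```

Each transformed copy still shows the same content, which is why such copies can act as extra training examples.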

<hr>

## Code

__Image Augmentation:__

In [1]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augments images in the training set.
train_datagen = ImageDataGenerator(
        # Multiplies every pixel value by 1/255, mapping the range [0, 255] to [0, 1]. Image size is unchanged.
        rescale = 1./255,
        # Maximum shear angle in degrees. Generator will pick an angle from [-0.2, 0.2] randomly.
        shear_range = 0.2,
        # Range for zoom value. Generator will pick a zoom factor from [0.8, 1.2] randomly.
        zoom_range = 0.2,
        # If set to True, randomly flips images across the vertical axis.
        horizontal_flip = True)

# Rescales pixel values in the test set to [0, 1] to match the training set. No augmentation is applied.
test_datagen = ImageDataGenerator(rescale = 1./255)

# Creates the training set from the training set generator created.
training_set = train_datagen.flow_from_directory (
        # Path to directory containing the training set data.
        'dataset/training_set',
        # Resizes images to 64 x 64 pixels, which is expected by the convolutional layer in the CNN.
        target_size = (64, 64),
        # Images are yielded in batches of 32. The model updates its weights once per batch.
        batch_size = 32,
        # There are two classes (cats and dogs), so each image gets a single 0/1 label.
        class_mode = 'binary'
)

# Creates the test set from the test set generator created.
test_set = test_datagen.flow_from_directory (
        'dataset/test_set',
        target_size = (64, 64),
        batch_size = 32,
        class_mode = 'binary'
)

Using TensorFlow backend.


Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
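
As a rough illustration of two of the generator settings, the sketch below rescales pixel values by 1/255 and samples zoom factors for `zoom_range = 0.2`. The helpers `rescale` and `sample_zoom_factor` are hypothetical stand-ins written for this note, not Keras functions; the zoom interval `[1 - zoom_range, 1 + zoom_range]` follows the documented behavior for a single float value.

```python
import random

def rescale(pixels, factor=1.0 / 255):
    """Multiply every pixel value by `factor`; the image dimensions are unchanged."""
    return [[value * factor for value in row] for row in pixels]

def sample_zoom_factor(zoom_range, rng):
    """Draw a zoom factor uniformly from [1 - zoom_range, 1 + zoom_range]."""
    return rng.uniform(1 - zoom_range, 1 + zoom_range)

pixels = [[0, 51],
          [102, 255]]  # a 2 x 2 grayscale image with 8-bit values
rescaled = rescale(pixels)

rng = random.Random(0)  # seeded so repeated runs draw the same factors
zoom_factors = [sample_zoom_factor(0.2, rng) for _ in range(5)]

print(rescaled)
print(zoom_factors)
```

Note that rescaling changes only the value range of the pixels, while `target_size` is what changes the resolution.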


<hr>

__Creating the CNN:__

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten

# Creating the CNN.
classifier = Sequential()

# The convolutional layer consists of 32 feature detectors which are 3 x 3 pixels.
# The convolutional layer accepts 64 x 64 pixel images. The value of 3 represents the RGB color channels.
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

# The pooling layer downsamples the feature maps, reducing each 2 x 2 block of pixels to its maximum value.
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Flattens the pooled feature maps into a 1D vector, which is the input to the fully connected layers.
classifier.add(Flatten())

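# Hidden layer: fully connected, with 128 units.
classifier.add(Dense(units = 128, activation = 'relu'))

# Output layer: a single sigmoid unit that outputs the probability of the class labeled 1.
classifier.add(Dense(units = 1, activation = 'sigmoid'))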

# Compiles the CNN classifier.
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
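
The layer sizes in this network can be checked by hand. The sketch below works through the arithmetic, assuming the Keras defaults of no padding and a stride of 1 for `Conv2D`, and a stride equal to the pool size for `MaxPooling2D`. The helper `valid_output_side` is a hypothetical name introduced for this note.

```python
def valid_output_side(side, kernel, stride=1):
    """Output width/height of a sliding window applied without padding."""
    return (side - kernel) // stride + 1

filters = 32

# Conv2D(32, (3, 3)) on a 64 x 64 x 3 input.
conv_side = valid_output_side(64, kernel=3)
conv_params = (3 * 3 * 3) * filters + filters  # weights + one bias per filter

# MaxPooling2D((2, 2)) has no trainable parameters.
pool_side = valid_output_side(conv_side, kernel=2, stride=2)

# Flatten turns the pooled feature maps into one long vector.
flat_units = pool_side * pool_side * filters

# Dense(128) followed by Dense(1); each unit has one weight per input plus a bias.
hidden_params = flat_units * 128 + 128
output_params = 128 * 1 + 1

total_params = conv_params + hidden_params + output_params

print(conv_side, pool_side, flat_units)  # 62 31 30752
print(total_params)                      # 3937409
```

Almost all of the parameters sit in the first dense layer. Calling `classifier.summary()` should report the same shapes and counts.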

<hr>

__Fitting the Classifier & Testing Accuracy:__

In [3]:
# Trains the model on the images in the training set.
# Note: in TensorFlow 2.1 and later, `fit` accepts generators directly and `fit_generator` is deprecated.
classifier.fit_generator(
        # Training data for the model.
        training_set,
        # Number of batches to draw in a single epoch. Usually equal to:
        # Dataset_Size / Batch_Size = 8000 / 32 = 250
        steps_per_epoch = 250,
        # Number of epochs to perform when training.
        epochs = 25,

        # Test data used to validate the model after each epoch.
        validation_data = test_set,
        # Number of batches drawn for validation. Covering the 2000 test images once
        # takes ceil(2000 / 32) = 63 batches.
        validation_steps = 63
)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.callbacks.History at 0x26f20762288>
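
Step counts are measured in batches, not images. As a small sketch, the number of batches needed to cover a dataset once can be computed from the dataset size and batch size reported earlier; `batches_per_pass` is a hypothetical helper name.

```python
import math

def batches_per_pass(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)

train_steps = batches_per_pass(8000, 32)
validation_batches = batches_per_pass(2000, 32)

print(train_steps)         # 250
print(validation_batches)  # 63
```

The generators returned by `flow_from_directory` also support `len()`, which should give the same batch counts without hard-coding the dataset sizes.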

__Results:__<br>
The model overfits the training data: it reaches almost 100% accuracy on the training set but only 76% on the test set. Parameter tuning, an extra hidden layer, or an extra convolutional layer may help the model capture finer differences between a cat and a dog. In turn, this may narrow the gap in accuracy between the training and test sets.
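
One side effect of an extra convolutional layer is worth seeing in numbers. The sketch below assumes a second `Conv2D(32, (3, 3))` and `MaxPooling2D((2, 2))` pair is placed before `Flatten`; that second pair is not part of the model above, and the helper names are hypothetical. It compares the size of the flattened vector and of the 128-unit dense layer in both cases.

```python
def valid_output_side(side, kernel, stride=1):
    """Output width/height of a sliding window applied without padding."""
    return (side - kernel) // stride + 1

def flattened_units(input_side, conv_pool_blocks, filters=32):
    """Length of the flattened vector after repeated 3x3 conv + 2x2 max-pool blocks."""
    side = input_side
    for _ in range(conv_pool_blocks):
        side = valid_output_side(side, kernel=3)            # convolution
        side = valid_output_side(side, kernel=2, stride=2)  # max pooling
    return side * side * filters

def dense_parameters(inputs, units=128):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

one_block = flattened_units(64, conv_pool_blocks=1)
two_blocks = flattened_units(64, conv_pool_blocks=2)

print(one_block, dense_parameters(one_block))    # 30752 3936384
print(two_blocks, dense_parameters(two_blocks))  # 6272 802944
```

Under these assumptions the dense layer shrinks to roughly a fifth of its size, which may leave the model less room to memorize the training images. Whether test accuracy actually improves would have to be checked by retraining.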