## Data Preprocessing
Separate the training and test data into two folders, one for each set

Within each of those, separate the images into one folder per class, e.g. cats and dogs

Naming convention - LABEL.NUMBER.EXTENSION, e.g. cat.1.jpg

10,000 images in total, with an 80/20 split between the training set and the test set

Unlike the previous ANN, the data preprocessing is done manually, and some feature selection is done later
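
The folder layout described above can be sketched with the standard library only. This is a minimal sketch, not the procedure used to build this dataset: the function name `split_into_folders`, the `seed` argument, and the choice to name each class folder after the filename label are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def split_into_folders(source_dir, dest_dir, test_fraction=0.2, seed=0):
    """Copy LABEL.NUMBER.EXTENSION files into dest_dir/<subset>/<LABEL>/.

    Hypothetical helper: returns a dict of (subset, label) -> file count.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)

    # Group files by label, i.e. the part of the name before the first dot.
    by_label = {}
    for path in sorted(source_dir.iterdir()):
        if path.is_file():
            label = path.name.split(".")[0]
            by_label.setdefault(label, []).append(path)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    counts = {}
    for label, paths in by_label.items():
        rng.shuffle(paths)
        n_test = int(len(paths) * test_fraction)
        for i, path in enumerate(paths):
            subset = "test_set" if i < n_test else "training_set"
            target = dest_dir / subset / label
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target / path.name)
            counts[(subset, label)] = counts.get((subset, label), 0) + 1
    return counts
```

Splitting per label keeps the 80/20 ratio the same for every class, so neither set ends up dominated by one class.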

## Building the CNN

In [1]:
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D #2D for images
from keras.layers import MaxPooling2D #As above
from keras.layers import Flatten
from keras.layers import Dense


Using TensorFlow backend.


In [2]:
# Initialising the CNN
classifier = Sequential()

In [3]:
# Step 1 - Convolution
#Creates the feature maps
#Conv2D arguments:
#- number of filters = 32 (common practice is to start with 32 and then add more conv layers with, for example, 64 filters)
#- number of rows in each filter = 3
#- number of columns in each filter = 3
#- input shape - shape of the input images. Not all images have the same size or format, so we must convert them all to the same format
# Colour images are converted to 3D arrays, black-and-white images to 2D arrays
#(64, 64, 3) = 64x64 pixels with 3 channels (3 for colour, 1 for BW); kept small as we are running on a CPU
#- activation function - ReLU removes negative values from the feature maps and adds non-linearity to the model
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
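
To make the convolution step concrete, here is a minimal pure-Python sketch of what one filter does to a single-channel image: slide the filter over the image without padding, sum the element-wise products, then apply ReLU. The 4x4 image and the vertical-edge filter are made-up values for illustration; in the real layer the 32 filters are learned and each spans all 3 colour channels.

```python
def conv2d_relu(image, kernel):
    """Stride-1, no-padding sliding-window filter followed by ReLU."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
            row.append(max(0, total))  # ReLU: negative responses become 0
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 single-channel image
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
# Hypothetical 3x3 vertical-edge filter (not a learned one)
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d_relu(image, kernel)
print(feature_map)  # [[0, 2], [0, 0]]
```

A 4x4 input with a 3x3 filter gives a 2x2 feature map, and the two negative responses are clipped to 0 by ReLU.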

In [4]:
# Step 2 - Pooling
#Reduces size of feature map
#Apply to each feature map as result of conv layer
#Reduce nodes needed for next step (flattening)
#Pool size 2x2 - reduces without losing too much feature information
classifier.add(MaxPooling2D(pool_size = (2, 2)))
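
As a rough illustration of the pooling step, the sketch below applies 2x2 max pooling to a single made-up feature map using plain Python lists. The function name `max_pool2d` and the input values are assumptions for the example; rows or columns left over when the size is odd are dropped.

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [feature_map[r * size + i][c * size + j]
                     for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 2, 3, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 5]]
```

The map shrinks from 16 values to 4, while the strongest response in each region is kept.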

In [5]:
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

In [6]:
# Step 3 - Flattening
classifier.add(Flatten())
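
The size of the flattened vector follows from the layers above. A small sketch of the arithmetic, assuming the Keras defaults used in this notebook (no padding, stride 1 for the 3x3 convolutions, non-overlapping 2x2 pooling); the helper names are made up for the example.

```python
def conv_output(size, kernel=3):
    """Side length after a stride-1 convolution with no padding."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Side length after non-overlapping pooling (leftover row/column dropped)."""
    return size // pool


size = 64
size = pool_output(conv_output(size))  # 64 -> 62 -> 31
size = pool_output(conv_output(size))  # 31 -> 29 -> 14

filters = 32
flattened = size * size * filters
print(size, flattened)  # 14 6272
```

So the first fully connected layer receives a vector of 6272 values per image.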

In [7]:
# Step 4 - Full connection
#Classic ANN of fully connected layers
#units = no. nodes in hidden layer.
#Output layer has 1 node with a sigmoid, giving a probability for the binary outcome
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

In [8]:
# Compiling the CNN
#optimizer = stochastic gradient descent algorithm ('adam' is an adaptive variant)
#loss = loss function. Binary cross-entropy is used as our outcome is binary
#metrics = performance metric
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
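
For reference, binary cross-entropy averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, where `y` is the true label (0 or 1) and `p` is the sigmoid output. The sketch below computes it in plain Python; the clipping constant `eps` is an assumption added to avoid `log(0)`, and the example labels and predictions are made up.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a list of labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
good = binary_crossentropy(labels, [0.9, 0.1, 0.8, 0.2])  # confident and right
bad = binary_crossentropy(labels, [0.1, 0.9, 0.2, 0.8])   # confident and wrong
print(good < bad)  # True
```

Confident wrong predictions are penalised much more heavily than confident correct ones, which is what drives the weights towards correct probabilities.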

## Fitting the CNN to the images

In [9]:
#Preprocessing
#Image augmentation from Keras
#Helps reduce overfitting
#ImageDataGenerator - generates batches of images with some random augmentations (rotations, flips, etc.)
from keras.preprocessing.image import ImageDataGenerator

In [10]:
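#Generate augmented data
#rescale - feature scaling (pixel values from 0-255 to 0-1)
#shear_range - random shearing (transvection), a geometric transformation
#zoom_range - random zooming
#horizontal_flip - random horizontal flips
#The test generator only rescales; test images are not augmented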
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)
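
Two of these transformations are simple enough to show by hand. The sketch below applies rescaling and a horizontal flip to a tiny made-up greyscale image stored as nested lists; it only illustrates the idea and is not how `ImageDataGenerator` is implemented, and the helper names are assumptions.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by factor, mapping 0-255 values into 0-1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]


# Hypothetical 2x3 greyscale image
image = [
    [0, 51, 255],
    [102, 204, 153],
]

flipped = horizontal_flip(image)
scaled = rescale(image)
print(flipped)  # [[255, 51, 0], [153, 204, 102]]
```

A flipped cat is still a cat, so each flip gives the network a new but validly labelled training example.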

In [11]:
#Create training and test data from augmented data generator
#target size - size expected by the model
#class mode - binary as we have only two classes
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.


In [None]:
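#Fit model to training set
#steps_per_epoch = number of batches drawn per epoch, not number of images
#With batch_size = 32, 8000 / 32 = 250 steps cover the training set once; 8000 steps cover it 32 times per epoch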
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)

Epoch 1/25
 920/8000 [==>...........................] - ETA: 1156s - loss: 0.5731 - acc: 0.6941
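
Because `steps_per_epoch` and `validation_steps` count batches rather than images, the values needed for exactly one pass over each set can be worked out from the image counts and the batch size. A small sketch of that arithmetic, using the 8000/2000 counts and `batch_size = 32` from this notebook; the helper name is made up.

```python
import math


def steps_for_one_pass(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


train_steps = steps_for_one_pass(8000, 32)
val_steps = steps_for_one_pass(2000, 32)
print(train_steps, val_steps)  # 250 63
```

With these values an epoch would process far fewer batches than the 8000 shown in the progress bar above, which matters a lot when training on a CPU.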

## Homework - Make new predictions

In [None]:
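import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
#Apply the same rescaling as the training generator
test_image = test_image / 255.
#Add a batch dimension: (64, 64, 3) -> (1, 64, 64, 3)
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
#Shows the label-to-index mapping; the branch below assumes dogs map to 1
print(training_set.class_indices)
#Sigmoid output is a probability, so threshold at 0.5 rather than testing for equality with 1
if result[0][0] > 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'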
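
The label names can also be read from the class-index mapping instead of being hard-coded. A minimal sketch, assuming the mapping is `{'cats': 0, 'dogs': 1}` (check the real one with `training_set.class_indices`) and using a hypothetical helper `label_from_probability`:

```python
def label_from_probability(probability, class_indices, threshold=0.5):
    """Turn a sigmoid output into a class label using a label -> index mapping."""
    index_to_label = {index: label for label, index in class_indices.items()}
    predicted_index = 1 if probability > threshold else 0
    return index_to_label[predicted_index]


# Assumed mapping; the real one comes from training_set.class_indices
class_indices = {'cats': 0, 'dogs': 1}

print(label_from_probability(0.93, class_indices))  # dogs
print(label_from_probability(0.12, class_indices))  # cats
```

In the notebook this would be called with `result[0][0]` as the probability.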