In [3]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
%cd /content/drive/My\ Drive/Colab\ Notebooks

/content/drive/My Drive/Colab Notebooks


In [8]:
%cd Keras/

/content/drive/My Drive/Colab Notebooks/Keras


In [9]:
! python -c "from keras import backend; print(backend.backend())"
#! python -c "import keras; print(keras.__version__)"

Using TensorFlow backend.
tensorflow


Initializing the CNN: creating an object of the Sequential class

In [0]:
classifier = Sequential()

# Adding the next layers:

Four steps: 

    1) Convolution
    2) Max pooling
    3) Flattening
    4) Full connection

__Step1 - Convolution__
- Representing the image as an array of pixel values (in the simplified black-and-white illustration, a table of 0s and 1s)
- Applying several feature detectors to the input image
- We slide each feature detector over the whole image. At each position, the detector is multiplied element-wise with the patch of the image underneath and the products are summed; the result goes into a table called a feature map. The highest numbers in the feature map appear where the input matches the feature detector most closely. This is the convolution operation. We repeat it with many feature detectors and get one feature map per detector, which is why we must specify the number of feature detectors. Our convolutional layer is composed of all these feature maps.
    - The number of filters is the number of feature detectors, and therefore the number of feature maps.
    - Kernel size is a tuple containing the number of rows and columns of the feature detector window.
    - Expected format of our input images:
    colored images are converted into 3D arrays (rows, columns, color channels) and black-and-white images into 2D arrays. `input_shape = (64, 64, 3)` means 64-by-64-pixel color images with 3 channels, in the channels-last order used by the TensorFlow backend.
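The convolution operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements `Convolution2D`: it assumes a single-channel image and a single feature detector, uses stride 1 and no padding, and omits the bias and ReLU that the Keras layer adds (the real layer also sums over the 3 color channels). The helper name `convolve2d_valid` and the toy image are made up for the example.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like Keras's Conv2D this computes a cross-correlation (the kernel is not
    flipped): each output cell is the sum of the element-wise product between
    the kernel and the image patch underneath it.
    """
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    feature_map = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            patch = image[i:i + k_rows, j:j + k_cols]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# A toy 5x5 black-and-white image containing a diagonal line of 1s.
image = np.eye(5)
# A toy 3x3 feature detector looking for the same diagonal pattern.
detector = np.eye(3)

feature_map = convolve2d_valid(image, detector)
print(feature_map)
```

The 5-by-5 image gives a 3-by-3 feature map. It peaks (value 3) along its diagonal, exactly where the input matches the detector, and every other position scores 0.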
    


In [11]:
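# 32 feature detectors, each with a 3x3 kernel. The kernel size is passed as one tuple:
# in Keras 2 the third positional argument of Conv2D is `strides`, so (32, 3, 3) would mean stride 3.
classifier.add(Convolution2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))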



__Step2 - Pooling__

Pooling reduces the size of the feature maps:

1) We take a 2-by-2 window that we slide over the feature map, and each time we keep the maximum of the four cells inside the window (max pooling). This time the window moves with a stride of 2, whereas in the previous step we slid the feature detector with a stride of 1. As a result, max pooling roughly halves each dimension of the feature map: a 5-by-5 feature map becomes 3-by-3 if the partial windows at the border are kept (Keras's `padding='same'`), or 2-by-2 with the Keras default `padding='valid'`, which drops them.

2) We apply max pooling to each of our feature maps and obtain the next layer, composed of all these reduced feature maps. This is called the pooling layer.

We apply pooling to reduce the number of nodes used in the next steps, flattening and full connection. In the flattening step, all the cells of the pooled feature maps are flattened into one huge 1D vector. If we didn't reduce the size of the feature maps, this vector would be very large, giving too many input nodes (and weights) in the fully connected layer and making the model computationally expensive. Max pooling reduces this complexity and the execution time without losing much performance. Why? Because by taking the maximum of each 2-by-2 window we keep the important information: we keep track of the parts of the image that produced the high numbers, i.e. where the feature detectors detected specific features. Hence we don't lose the spatial structure, and at the same time we reduce the time complexity and the computational cost.
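A minimal NumPy sketch of 2-by-2 max pooling with stride 2, assuming a single 2D feature map. The helper `max_pool2d` and its `keep_partial` flag are hypothetical; the flag only illustrates the two border conventions that give a 2-by-2 or a 3-by-3 result from a 5-by-5 map, corresponding to `padding='valid'` and `padding='same'` in Keras's `MaxPooling2D`.

```python
import numpy as np

def max_pool2d(feature_map, pool=2, stride=2, keep_partial=False):
    """Max pooling over a single 2D feature map.

    keep_partial=False drops windows that run past the border (like Keras's
    default padding='valid'); keep_partial=True keeps them and takes the max of
    the cells that exist (like padding='same').
    """
    rows, cols = feature_map.shape
    if keep_partial:
        out_rows = -(-rows // stride)  # ceiling division
        out_cols = -(-cols // stride)
    else:
        out_rows = (rows - pool) // stride + 1
        out_cols = (cols - pool) // stride + 1
    pooled = np.empty((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            # NumPy slicing silently truncates windows at the border.
            window = feature_map[i * stride:i * stride + pool,
                                 j * stride:j * stride + pool]
            pooled[i, j] = window.max()
    return pooled

feature_map = np.arange(25, dtype=float).reshape(5, 5)
print(max_pool2d(feature_map))                     # 2x2 result
print(max_pool2d(feature_map, keep_partial=True))  # 3x3 result
```

Each output cell keeps only the strongest response in its window, so the location of high values survives while the number of cells drops by about a factor of 4.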

In [0]:
classifier.add(MaxPooling2D(pool_size = (2,2)))

__Step3 - Flattening__

We take all our pooled feature maps and put them into one single vector, which will be the input of a classic ANN with fully connected layers.
Why do we need feature detection and max pooling at all? Why not flatten the image from the beginning? Because flattening the raw image gives each input node information about a single pixel only, independent of its neighboring pixels. After convolution and pooling, each number in the vector encodes a feature detected in a neighborhood of pixels, so this spatial information is preserved in the flattened vector.
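To see how much pooling shrinks the flattened vector, the shapes and weight counts of this model can be worked out by hand. The sketch below assumes a 3-by-3 kernel applied with stride 1 and the default `'valid'` padding, as described in Step 1, plus the 2-by-2 pooling and the layer sizes used in this notebook; the helper names are made up for the example.

```python
def conv_output_size(size, kernel, stride=1):
    # 'valid' padding (the Conv2D default): the window must fit entirely inside.
    return (size - kernel) // stride + 1

def pool_output_size(size, pool=2, stride=2):
    # MaxPooling2D defaults: strides equal to pool_size, padding='valid'.
    return (size - pool) // stride + 1

height = 64      # input_shape = (64, 64, 3): square 64x64 images
channels = 3
filters = 32     # number of feature detectors
kernel = 3       # 3x3 feature detector window

conv_h = conv_output_size(height, kernel)   # side of each feature map
pool_h = pool_output_size(conv_h)           # side of each pooled feature map
flattened = pool_h * pool_h * filters       # length of the Flatten() output
unpooled = conv_h * conv_h * filters        # length if we flattened without pooling

conv_params = kernel * kernel * channels * filters + filters  # weights + biases
hidden_params = flattened * 128 + 128                         # Dense(units = 128)
output_params = 128 * 1 + 1                                   # Dense(units = 1)

print(conv_h, pool_h, flattened, unpooled)       # 62 31 30752 123008
print(conv_params, hidden_params, output_params) # 896 3936384 129
```

Without the pooling step the flattened vector would hold 123,008 values instead of 30,752, four times as many, and the hidden `Dense` layer would need four times as many weights.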



In [0]:
classifier.add(Flatten())

__Step4 - Fully connected ANN__

A classic ANN can be a great classifier for nonlinear problems. We need to add a hidden layer and an output layer, which here has a single node because the outcome is binary (cat or dog).
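What the two `Dense` layers compute can be sketched in NumPy: a ReLU hidden layer followed by a single sigmoid unit that outputs the probability of class 1, scored with the binary cross-entropy loss passed to `compile`. This is a simplified illustration, not Keras's implementation; the function names are hypothetical, the sizes are toy values, and the weights are random (untrained), so the printed probabilities carry no meaning. With `flow_from_directory`, which class is 0 and which is 1 is given by `training_set.class_indices` (subfolder names in alphanumeric order).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Hidden Dense layer with ReLU, then one Dense unit with sigmoid."""
    hidden = relu(x @ w_hidden + b_hidden)
    return sigmoid(hidden @ w_out + b_out)  # probability of class 1

def binary_crossentropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Toy sizes so the example runs instantly: 4 flattened vectors of length 10 and
# 8 hidden units (the notebook uses 128 hidden units).
x = rng.normal(size=(4, 10))
w_hidden, b_hidden = rng.normal(scale=0.1, size=(10, 8)), np.zeros(8)
w_out, b_out = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)

probs = dense_forward(x, w_hidden, b_hidden, w_out, b_out).ravel()
labels = (probs > 0.5).astype(int)  # predicted class index, 0 or 1
print(probs, labels)
print(binary_crossentropy(np.array([0, 1, 0, 1]), probs))
```

Training adjusts the weights to push the sigmoid output toward 1 for one class and toward 0 for the other, which lowers the binary cross-entropy.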

In [0]:
# Dense creates a fully connected layer: 128 hidden units, then 1 sigmoid output unit
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

In [15]:
# Compiling CNN
classifier.compile(optimizer = "adam", loss = "binary_crossentropy", metrics = ["accuracy"])



__Preprocessing using image augmentation__

Before fitting our CNN to the images, we apply a technique called "image augmentation" to reduce overfitting. It enriches our training set without adding more images. It creates batches of images and applies random transformations, such as rotating, shearing, flipping, or shifting, to a random selection of the images in each batch. This yields many more diverse versions of the images, and therefore much more varied training data. Because the transformations are random, the model is unlikely to see exactly the same picture twice across batches and epochs.
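A minimal NumPy sketch of the idea, assuming a batch of uint8 images shaped `(n, 64, 64, 3)`. It implements only two of the transformations configured in the next cell, the `1./255` rescaling and random horizontal flips; shear and zoom need interpolation and are left to `ImageDataGenerator`. The helper `augment_batch` is hypothetical, not part of Keras.

```python
import numpy as np

def augment_batch(batch, rng, flip_prob=0.5):
    """Hypothetical helper: rescale a batch of uint8 images to [0, 1] and
    flip a random subset of them left-to-right.

    `batch` has shape (n, height, width, channels). New random choices are
    drawn on every call.
    """
    images = batch.astype(np.float32) / 255.0          # like rescale=1./255
    flip = rng.random(len(images)) < flip_prob          # which images to flip
    images[flip] = images[flip][:, :, ::-1, :]          # reverse the width axis
    return images, flip

rng = np.random.default_rng(42)
batch = rng.integers(0, 256, size=(32, 64, 64, 3), dtype=np.uint8)

# The same source images give a differently augmented batch on each pass.
first, flipped_1 = augment_batch(batch, rng)
second, flipped_2 = augment_batch(batch, rng)
print(flipped_1.sum(), flipped_2.sum())  # how many images were flipped each time
```

Because each pass draws new random choices, the network sees slightly different versions of the same photos from epoch to epoch, which makes it harder to memorize exact pictures.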

In [0]:
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
        rescale=1./255, # values between 0 and 1
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        'dataset/training_set',
        target_size=(64, 64),
        batch_size=32,  # images per batch; the weights are updated once per batch
        class_mode='binary')

test_set = test_datagen.flow_from_directory(
        'dataset/test_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

# Fitting the CNN to images
# steps_per_epoch and validation_steps count batches, not images:
# len(training_set) = ceil(8000 / 32) = 250, len(test_set) = ceil(2000 / 32) = 63
classifier.fit_generator(
        training_set,
        steps_per_epoch=len(training_set),
        epochs=25,
        validation_data=test_set,
        validation_steps=len(test_set))

Using TensorFlow backend.


Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25

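Voilà! 99% accuracy in classifying photos of dogs vs. cats (the per-epoch loss and accuracy are not shown in the output above, so it cannot be told from this output alone whether this is training or test set accuracy).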