In [16]:
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dense, Flatten, Dropout
from keras.preprocessing.image import ImageDataGenerator

from keras.models import load_model
from IPython.display import display
from PIL import Image 
import matplotlib.pyplot as plt

import numpy as np

Below we start by creating a sequential model, which is a linear stack of layers and is the usual way to build simple feed-forward networks such as multilayer perceptrons. To create the "convolutional" part of the neural network, we add convolutional blocks, each made of a convolutional layer, a ReLU (Rectified Linear Unit) activation, and a pooling layer. Generally, the first layer picks up basic features such as edges, and each later layer combines them into more complex features. How many layers to use depends on the task, and choosing it is a big part of model engineering in deep learning. The ReLU activation supplies the non-linearity: without it, a stack of convolutions would collapse into a single linear function, which cannot represent the decision boundaries we need. ReLU also helps with the vanishing gradient problem, because its gradient is exactly 1 for positive inputs, so gradients do not shrink on their way back through many layers the way they do with saturating activations such as sigmoid or tanh. The pooling layers, as mentioned earlier, are there mainly for dimensionality reduction: each 2x2 max pooling step halves the width and height of the feature maps, which cuts computation and makes the features less sensitive to small shifts in the image.
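To make the ReLU and pooling steps concrete, here is a minimal pure-Python sketch of what the two operations do to a single feature map. The 4x4 values are made up for illustration, and the helper names `relu` and `max_pool_2x2` are hypothetical, not Keras functions; Keras performs the same operations on whole tensors.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negatives become 0, positives pass through."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window (stride 2)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ])
    return pooled


# A made-up 4x4 feature map, purely for illustration.
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.5, 0.0],
    [0.0, -1.0, -4.0, -0.5],
    [3.0, 1.0, -2.0, 2.5],
]

activated = relu(feature_map)    # negative responses are zeroed out
pooled = max_pool_2x2(activated)  # 4x4 shrinks to 2x2
print(pooled)  # [[4.0, 1.5], [3.0, 2.5]]
```

Note how the pooled map keeps the strongest response in each region while discarding three quarters of the values.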

In [6]:
catdogimageclassifier = Sequential()

# First convolutional block: Conv2D adds a two-dimensional convolutional layer,
# here with 32 filters of size 3x3
catdogimageclassifier.add(Conv2D(32, (3,3), input_shape=(64,64,3))) #feature map
catdogimageclassifier.add(Activation('relu')) #adding in relu activation
catdogimageclassifier.add(MaxPooling2D(pool_size=(2,2)))

# Add three more convolutional blocks with the same structure
catdogimageclassifier.add(Conv2D(32, (3,3)))
catdogimageclassifier.add(Activation('relu'))
catdogimageclassifier.add(MaxPooling2D(pool_size=(2,2)))

catdogimageclassifier.add(Conv2D(32, (3,3)))
catdogimageclassifier.add(Activation('relu'))
catdogimageclassifier.add(MaxPooling2D(pool_size=(2,2)))

catdogimageclassifier.add(Conv2D(32, (3,3)))
catdogimageclassifier.add(Activation('relu'))
catdogimageclassifier.add(MaxPooling2D(pool_size=(2,2)))

# Flatten the pooled feature maps into a single 1-D vector per image
catdogimageclassifier.add(Flatten())

# Add a fully connected (Dense) layer followed by a ReLU activation
catdogimageclassifier.add(Dense(64))
catdogimageclassifier.add(Activation('relu'))

#to deal with overfitting, we will use a dropout layer
catdogimageclassifier.add(Dropout(0.5))

# Add a final fully connected layer with a single unit; the sigmoid turns it
# into a probability for the binary cat/dog decision
catdogimageclassifier.add(Dense(1))
catdogimageclassifier.add(Activation('sigmoid'))
catdogimageclassifier.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_16 (Conv2D)           (None, 62, 62, 32)        896       
_________________________________________________________________
activation_24 (Activation)   (None, 62, 62, 32)        0         
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 31, 31, 32)        0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 29, 29, 32)        9248      
_________________________________________________________________
activation_25 (Activation)   (None, 29, 29, 32)        0         
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 12, 12, 32)       
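
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used in the model definition: `Conv2D` with "valid" padding and stride 1, and `MaxPooling2D` with a stride equal to the pool size. The helper names are hypothetical, and the values for layers past the point where the printed summary is cut off are computed here, not taken from the notebook output.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping pooling (odd sizes round down)."""
    return size // pool


def conv_param_count(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size, channels, filters = 64, 3, 32
rows = []
for block in range(4):
    params = conv_param_count(3, channels, filters)
    size = conv_output_size(size)
    conv_shape = (size, size, filters)
    size = pool_output_size(size)
    pool_shape = (size, size, filters)
    rows.append((conv_shape, pool_shape, params))
    channels = filters  # the next block sees 32 input channels

for conv_shape, pool_shape, params in rows:
    print(conv_shape, pool_shape, params)

# Length of the vector that Flatten() hands to the first Dense layer.
flattened = size * size * filters
print(flattened)
```

The first rows match the summary: 62x62 then 31x31 with 896 parameters, followed by 29x29 then 14x14 with 9248 parameters, and a 12x12 output for the third convolution.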

Below, the model is compiled with a loss function, a metric, and an optimizer. The loss is binary cross-entropy, the standard loss for two-class problems: it takes the negative log of the probability the model assigned to the correct class, so confident wrong predictions are penalized heavily. The metric, accuracy, is only used to report how well the model is doing and plays no part in backpropagation. The optimizer is **RMSprop**, otherwise known as *root mean square propagation*, a variant of gradient descent proposed by Geoffrey Hinton. It keeps a running average of the squared gradients for each parameter and divides each step by the square root of that average, so the effective learning rate adapts per parameter instead of staying constant. Note that the learning rate controls how large a step gradient descent takes toward the minimum: a large learning rate can overshoot the minimum and bounce around it, while a small learning rate needs many more steps to get there.
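
As a rough illustration of the two ideas in this paragraph, here is a minimal sketch of binary cross-entropy and of a single RMSprop update for one scalar weight. It follows the textbook formulas rather than Keras' internal implementation, and the function names, default hyperparameters, and the toy objective `w**2` are assumptions made for the example.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def rmsprop_step(weight, grad, avg_sq_grad, lr=0.001, rho=0.9, eps=1e-7):
    """One textbook RMSprop update for a single scalar weight."""
    # Running average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    # Step is scaled by the root of that average, so it adapts per weight.
    weight = weight - lr * grad / (math.sqrt(avg_sq_grad) + eps)
    return weight, avg_sq_grad


# Confident correct predictions give a small loss, confident wrong ones a large loss.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
print(good < bad)  # True

# Toy objective f(w) = w**2 with gradient 2*w, minimized at w = 0.
w, avg = 1.0, 0.0
for _ in range(50):
    w, avg = rmsprop_step(w, 2 * w, avg, lr=0.01)
print(w)  # closer to 0 than the starting value of 1.0
```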

Let's define **target size** as the size (in pixels) that each image is resized to, so that every input has the shape the Convolutional Neural Network expects. Let's also define the **batch size** as the number of images processed in one step, that is, per weight update. Batching is used because most machines cannot hold the whole dataset in memory at once.

In [8]:

# Compile the model
catdogimageclassifier.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Data augmentation to help with overfitting
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.25, zoom_range=0.25, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

# Loading training data (from the train directory)
training_set = train_datagen.flow_from_directory(
    "/mnt/c/Users/abhi/Documents/Programs/Computer-Vision/Deep-Learning/dogsvscats/train",
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary'
)

# Loading test data (from the test directory), used for validation
test_set = test_datagen.flow_from_directory(
    "/mnt/c/Users/abhi/Documents/Programs/Computer-Vision/Deep-Learning/dogsvscats/test",
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary'
)

Found 21000 images belonging to 2 classes.
Found 4000 images belonging to 2 classes.


And now we begin training the model! We train for 10 epochs. An epoch is one full pass over the training set; after each epoch, Keras evaluates the model on the validation data so we can see how well it generalizes.
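
To connect epochs with batches: `len()` of a Keras directory iterator is the number of batches needed to cover the data once, which is the image count divided by the batch size, rounded up. The sketch below applies that arithmetic to the two image counts reported by `flow_from_directory` above; the helper names are hypothetical.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


def total_steps(num_images, batch_size, epochs):
    """Number of batches processed over the whole run."""
    return steps_per_epoch(num_images, batch_size) * epochs


batch_size, epochs = 32, 10
for num_images in (4000, 21000):
    print(num_images,
          steps_per_epoch(num_images, batch_size),
          total_steps(num_images, batch_size, epochs))
```

For 4000 images this gives 125 steps per epoch, and for 21000 images it gives 657, where the final batch holds only the 8 leftover images.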

In [10]:
#begin training now 
catdogimageclassifier.fit(
    training_set,
    steps_per_epoch=len(training_set),
    epochs=10,
    validation_data=test_set,
    validation_steps=len(test_set)
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f83b436aeb0>

In [20]:
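from keras.preprocessing import image

# Load the previously saved model from disk
catdogimageclassifier = load_model('catdog_cnn_model.h5')

# Load the image and resize it to the 64x64 input the network expects
an_image = image.load_img("/mnt/c/Users/abhi/Documents/Programs/Computer-Vision/Deep-Learning/dogsvscats/MANUAL test/gsd.jpeg", target_size=(64, 64))
ar_image = image.img_to_array(an_image)
ar_image = ar_image / 255.0  # apply the same 1/255 rescaling used during training
ar_image = np.expand_dims(ar_image, axis=0)  # add a batch dimension: (1, 64, 64, 3)

# The sigmoid output is the probability of class 1. This assumes the dog folder
# maps to index 1 (flow_from_directory assigns indices in alphanumeric order).
verdict = catdogimageclassifier.predict(ar_image)
if verdict[0][0] >= .5:
    prediction = 'dog'
else:
    prediction = 'cat'
print(prediction)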

dog


The code above classifies an image of a German Shepherd, which you can find in the folder, and the model correctly labels it as a dog!
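
The thresholding step can be sketched on its own, without the model. The mapping `{"cats": 0, "dogs": 1}` below is an assumption about the folder names; in the notebook the real mapping is available as `training_set.class_indices`, which Keras builds from the subdirectory names in alphanumeric order. The helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def label_from_probability(probability, class_indices, threshold=0.5):
    """Map the probability of class 1 to a label name."""
    index_to_label = {index: label for label, index in class_indices.items()}
    predicted_index = 1 if probability >= threshold else 0
    return index_to_label[predicted_index]


# Hypothetical mapping; check training_set.class_indices for the real one.
class_indices = {"cats": 0, "dogs": 1}

print(label_from_probability(sigmoid(2.0), class_indices))   # dogs
print(label_from_probability(sigmoid(-2.0), class_indices))  # cats
```

Looking up the label through the mapping, rather than hard-coding 'dog' for values above 0.5, avoids silently inverted predictions if the folders are named differently.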