# Dog or Cat?

This notebook trains a simple convolutional neural network to distinguish between images of dogs and cats. It can be run using Google Colab (to speed up training) or locally.


## Import packages

In [None]:
#Import some packages to use
import cv2
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline 
import seaborn as sns
import matplotlib.image as mpimg

#For the model 
from keras import layers
from keras import models
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array, load_img
from keras.callbacks import TensorBoard
from sklearn.model_selection import train_test_split

#To see our directory
import os, datetime
import random
import gc   #Garbage collector for cleaning deleted data from memory
import tensorflow as tf
tf.test.gpu_device_name() #to test whether we're using a gpu

#If running on Google Colab with data in Google Drive, enable the following 
if False:
  from google.colab import drive
  drive.mount('/content/drive')
  root_path = 'drive/My Drive/DogVsCat'
else:
    root_path = './'

## Load the training and testing data

In [None]:
num_im = 500   # Specify the number of images to load of each category

train_dir = os.path.join(root_path,'data/train')
test_dir = os.path.join(root_path,'data/test')

train_dogs = [f'{train_dir}/{i}' for i in os.listdir(train_dir) if 'dog' in i]  #get dog images
train_cats = [f'{train_dir}/{i}' for i in os.listdir(train_dir) if 'cat' in i]  #get cat images

test_imgs = [f'{test_dir}/{i}' for i in os.listdir(test_dir)] #get test images

train_imgs = train_dogs[:num_im] + train_cats[:num_im]  # slice the dataset and use num_im images from each class
random.shuffle(train_imgs)  # shuffle it randomly

#Clear the lists that are no longer needed
del train_dogs
del train_cats
gc.collect()   #collect garbage to save memory

## Look at some of the images

In [None]:
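#Let's view some of the pics
plt.figure(figsize=(20,10))
columns = 5
for i in range(columns):
    plt.subplot(5 // columns + 1, columns, i + 1)  #integer division: subplot needs whole-number grid sizes
    img = cv2.imread(train_imgs[i], cv2.IMREAD_COLOR)  #OpenCV loads images in BGR channel order
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  #convert to RGB so matplotlib shows true colours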

## Resize the images

The images all need to have consistent sizes in order to feed in to the neural network.

This function will resize the images so that they're all 150x150 pixels, and also extract the "dog" or "cat" text from the filename to use as a label for the network.

In [None]:
#Lets declare our image dimensions
#we are using coloured images. 
nrows = 150
ncolumns = 150
channels = 3  #change to 1 if you want to use grayscale image


#A function to read and process the images to an acceptable format for our model
def read_and_process_image(list_of_images):
    """
    Returns two arrays: 
        X is an array of resized images
        y is an array of labels
    """
    X = [] # images
    y = [] # labels
    
    for image in list_of_images:
        img = cv2.imread(image, cv2.IMREAD_COLOR)  #Read the image (OpenCV returns BGR channel order)
        X.append(cv2.resize(img, (ncolumns, nrows), interpolation=cv2.INTER_CUBIC))  #dsize is given as (width, height)
        #get the labels
        if 'dog' in image:
            y.append(1)
        elif 'cat' in image:
            y.append(0)
    
    return X, y
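
The check `'dog' in image` above is applied to the whole path, so a folder name that happens to contain "dog" or "cat" would also match. Below is a minimal sketch of a stricter variant that looks only at the file name. It assumes training files are named like `dog.12.jpg` and `cat.3.jpg` (an assumption, the notebook does not state the naming scheme), and `label_from_path` is a hypothetical helper, not part of the notebook.

```python
import os

def label_from_path(path):
    """Hypothetical helper: return 1 for dog, 0 for cat, None if unknown.

    Only the file name is inspected, so directory names cannot affect the label.
    """
    name = os.path.basename(path).lower()
    if name.startswith('dog'):
        return 1
    if name.startswith('cat'):
        return 0
    return None  # e.g. unlabeled test images

# Illustrative paths only (assumed naming scheme)
print(label_from_path('data/train/dog.12.jpg'))
print(label_from_path('dogs_vs_cats/train/cat.3.jpg'))  # the folder name contains 'dog'
print(label_from_path('data/test/7.jpg'))
```

With the substring check, the second path would be labelled as a dog because of its folder name; checking the base name avoids that.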

In [None]:
#get the train and label data
X, y = read_and_process_image(train_imgs)

In [None]:
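#Let's view some of the pics now that they've been resized
plt.figure(figsize=(20,10))
columns = 5
for i in range(columns):
    plt.subplot(5 // columns + 1, columns, i + 1)  #integer division: subplot needs whole-number grid sizes
    plt.imshow(cv2.cvtColor(X[i], cv2.COLOR_BGR2RGB))  #images were loaded by OpenCV as BGR; convert for display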

## Convert the list of images in to an array

In [None]:
del train_imgs
gc.collect()

#Convert list to numpy array
X = np.array(X)
y = np.array(y)

#Let's plot the labels to be sure we have two balanced classes
sns.countplot(x=y)
plt.title('Labels for Cats and Dogs')

In [None]:
print("Shape of train images is:", X.shape)
print("Shape of labels is:", y.shape)

## Split some images off from the train set to validate the model

In [None]:
#Let's split the data into a train and a validation set
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=2)

print("Shape of train images is:", X_train.shape)
print("Shape of validation images is:", X_val.shape)
print("Shape of train labels is:", y_train.shape)
print("Shape of validation labels is:", y_val.shape)

In [None]:
#clear memory
del X
del y
gc.collect()

#get the length of the train and validation data
ntrain = len(X_train)
nval = len(X_val)


# Setup CNN

Lots of parameters need to be chosen and optimised when training a neural network. Some of the most important ones are the number of epochs, batch size, learning rate and the amount of data augmentation.

**Number of epochs:**
> Number of times the network will see the entire training set. One epoch is an entire pass over the training set. If this is too large, the network will overfit to the training set; if it's too small, the network will not reach optimal performance. Look into "early stopping" if you want to optimise this.
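
As a rough illustration of what "early stopping" means, here is a minimal pure-Python sketch of the usual patience rule: stop once the validation loss has not improved for a set number of epochs. The function name `early_stopping_epoch` and the loss values are made up for illustration. Keras offers a ready-made `EarlyStopping` callback (with `monitor` and `patience` arguments) that applies the same idea during training.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss                      # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # rule never triggered: all epochs were run

# Made-up validation losses, for illustration only
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.61, 0.57]
print(early_stopping_epoch(losses, patience=3))  # → 6
```

The best loss here is reached at epoch 3, and the three epochs after it bring no improvement, so training would stop at epoch 6.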

**Batch size**
> The number of images that will be passed through the network at one time. Powers of 2 (e.g. 4, 8, 16, 32, 64...) are the conventional choice and can make GPU training more efficient. The number of batches that make up one epoch = size of training set / batch size, with any remainder forming a smaller final batch. The batch size must be at least one and at most the number of samples in the training dataset. Too small leads to noisier training; too large can lessen the ability of the network to generalise (and is also a strain on memory). 32 or 64 are good starting sizes.
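
The batch arithmetic can be sketched as follows. The figures assume the defaults used in this notebook (500 images per class, so 1000 in total, with 80% kept for training); `batches_per_epoch` is a hypothetical helper name.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Return (full_batches, total_batches) for one pass over the data."""
    if not 1 <= batch_size <= num_images:
        raise ValueError('batch_size must be between 1 and num_images')
    full_batches = num_images // batch_size             # batches holding exactly batch_size images
    total_batches = math.ceil(num_images / batch_size)  # also counts a smaller final batch, if any
    return full_batches, total_batches

print(batches_per_epoch(800, 32))   # → (25, 25)
print(batches_per_epoch(1000, 32))  # → (31, 32)
```

When the training set does not divide evenly, integer division (as in `ntrain // batch_size`) counts only the full batches.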

**Learning rate**
> Controls how much to change the model in response to the estimated error each time the model weights are updated. Since model weights are updated after each batch, the learning rate should be optimised with the batch size. When starting off, err on the side of smaller. 
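
To get a feel for the learning rate, here is a toy sketch that is not a neural network: plain gradient descent on the one-parameter function f(w) = w². The function `gradient_descent` and the three rates are chosen purely for illustration.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimise f(w) = w**2 starting from w0 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w                  # derivative of w**2
        w = w - learning_rate * grad  # step against the gradient
    return w

for rate in (0.01, 0.1, 1.1):
    print(rate, gradient_descent(rate))
```

In this toy problem the smallest rate is still far from the minimum at w = 0 after 20 steps, the middle rate gets close to it, and the largest rate overshoots further on every step and diverges. Real networks are far less tidy, but the same trade-off applies.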

**Augmentation**
> This is an easy way of artificially increasing the size of your dataset. Since CNNs have HEAPS of parameters that need to be trained, they need a proportional amount of data to train them. If you can't get more data (and labels), augmentation is the next best thing. If you can do it, you should! Examples of image augmentation include: zooming in/out, rotating the image, flipping the image, adding noise, shifting the image.
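
To make the idea concrete, here is a minimal sketch of two of these transformations on a tiny 2 x 3 "image" stored as a list of rows. The helpers `flip_horizontal` and `shift_right` are hypothetical and only illustrate the principle; they are not how Keras implements augmentation.

```python
def flip_horizontal(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0):
    """Shift each row right by `pixels` (0 <= pixels <= width), padding with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

tiny = [[1, 2, 3],
        [4, 5, 6]]
label = 1  # the label is unchanged by augmentation

# One labelled image becomes three labelled training examples
augmented = [(tiny, label),
             (flip_horizontal(tiny), label),
             (shift_right(tiny, 1), label)]

for image, lab in augmented:
    print(image, lab)
```

Each transformed copy keeps the original label, which is why augmentation adds training examples without any extra labelling work.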



In [None]:
#This is where we set these parameters

num_epoch = 200 
batch_size = 32 
learning_rate = 1e-2 
aug_flag=True

## CNN Code
* This is a pretty small CNN: it has 4 x 2D convolutional layers, each with 3 x 3 kernels, with the number of features increasing from layer to layer (32, 64, 128, 128).
* Each convolutional layer is followed by a "max pooling" layer, which halves the width and height of the data at that layer.
* The output is then "flattened" in to essentially a 1 x 6272 feature vector that represents the entire image.
* After dropout (for regularisation), this is followed by 2 normal layers ("dense" layers) that distill the features to a single number.
* This number (between 0 and 1) decides whether the image is a cat or dog: values above 0.5 are read as "dog".
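
The 1 x 6272 figure can be checked by hand. The sketch below assumes unpadded ("valid") convolutions with stride 1 and 2 x 2 pooling that drops any remainder, which are the Keras defaults; the helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling (remainder dropped)."""
    return size // pool

def flattened_length(input_size, filters_per_layer):
    """Length of the flattened vector after a stack of conv + pool blocks."""
    size = input_size
    for filters in filters_per_layer:
        size = pool_output(conv_output(size))
        print(f'{filters} filters -> {size} x {size}')
    return size * size * filters_per_layer[-1]

print(flattened_length(150, [32, 64, 128, 128]))  # → 6272
```

The spatial size goes 150 -> 74 -> 36 -> 17 -> 7, and 7 x 7 x 128 = 6272. This should match the output shape of the flatten layer reported by `model.summary()`.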

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))  #Dropout for regularization
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))  #Sigmoid function at the end because we have just two classes

In [None]:
#Lets see our model
model.summary()


## Set up the loss function

In [None]:
#We'll use the RMSprop optimizer with the learning rate set in the parameters cell above
#We'll use binary_crossentropy loss because it's a binary classification
#Note: newer Keras versions name the optimizer argument learning_rate instead of lr
model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=learning_rate), metrics=['acc'])


## Set up data augmentation

In [None]:
#Lets create the augmentation configuration
#This helps prevent overfitting, since we are using a small dataset
if aug_flag:
  train_datagen = ImageDataGenerator(rescale=1./255,   #Scale the image between 0 and 1
                                    rotation_range=40,
                                    width_shift_range=0.2,
                                    height_shift_range=0.2,
                                    shear_range=0.2,
                                    zoom_range=0.2,
                                    horizontal_flip=True,)
else:
  train_datagen = ImageDataGenerator(rescale=1./255)  #Scale the image between 0 and 1

val_datagen = ImageDataGenerator(rescale=1./255)  #We do not augment validation data. we only perform rescale

#Create the image generators
train_generator = train_datagen.flow(X_train, y_train, batch_size=batch_size)
val_generator = val_datagen.flow(X_val, y_val, batch_size=batch_size)

### Let's look at what augmentation does to an image

In [None]:
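#Show five random augmentations of the same training image
plt.figure(figsize=(20,10))
columns = 5
im_tmp = train_datagen.flow(X_train[:1], y_train[:1], batch_size=1) #setup a temporary generator with just one image
for i in range(columns):
    plt.subplot(5 // columns + 1, columns, i + 1)  #integer division: subplot needs whole-number grid sizes
    plt.imshow(next(im_tmp)[0][0][..., ::-1])  #first image of the batch, channels reversed from BGR to RGB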
    



## Tensorboard

TensorBoard lets us investigate the performance of the network as it trains. It also lets us compare different networks and identify issues with the training. This is done through a "callback". Callbacks are hooks that Keras calls at set points during training (for example, at the end of each batch or epoch), and there are a lot that can be applied quite easily through Keras.

This TensorBoard callback will save logs during each training run, named according to the parameters we set earlier.

In [None]:
#%load_ext tensorboard
log_dir= os.path.join(root_path, 'logs', f'bs{batch_size}_lr{learning_rate}_aug{aug_flag}_sz{ntrain}')
tensorboard_callback = TensorBoard(log_dir=log_dir, batch_size=batch_size)


## Let's Train!

In [None]:
#The training part
history = model.fit_generator(train_generator,
                              steps_per_epoch=ntrain // batch_size,
                              epochs=num_epoch,
                              validation_data=val_generator,
                              validation_steps=nval // batch_size, 
                              callbacks=[tensorboard_callback])

In [None]:
#Save the model
if False:
  model.save_weights(os.path.join(root_path,'model_weights.h5'))
  model.save(os.path.join(root_path,'model_keras.h5'))

## Visualise training

In [None]:
#Let's plot the train and val curves
#get the details from the history object
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

#Train and validation accuracy
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and Validation accuracy')
plt.legend()

plt.figure()
#Train and validation loss
plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and Validation loss')
plt.legend()



## Let's see how it goes on the test set

In [None]:
#Now let's predict on the first 10 images of the test set
X_test, y_test = read_and_process_image(test_imgs[0:10]) #y_test stays empty when the test filenames contain neither 'dog' nor 'cat'
x = np.array(X_test)
test_datagen = ImageDataGenerator(rescale=1./255)

i = 0
text_labels = []
plt.figure(figsize=(30,20))
for batch in test_datagen.flow(x, batch_size=1):
    pred = model.predict(batch)
    if pred > 0.5:
        text_labels.append('dog')
    else:
        text_labels.append('cat')
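    plt.subplot(5 // columns + 1, columns, i + 1)  #integer division: subplot needs whole-number grid sizes
    plt.title('This is a ' + text_labels[i])
    imgplot = plt.imshow(batch[0][..., ::-1])  #reverse the channel order: images were loaded as BGR by OpenCV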
    i += 1
    if i % 10 == 0:
        break


## Let's compare the different models using tensorboard

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

logs_root = os.path.join(root_path, 'logs')  #parent directory of the per-run log folders
%tensorboard --logdir "{logs_root}"
