We will be using the aerial cactus dataset provided by J. Irving Vasquez-Gomez on Kaggle. The dataset has two classes, "cactus" and "no cactus", and it already comes split into a training set and a validation set, so we do not need to split it ourselves. There are 17,000 images in the training set and 4,000 images in the validation set. We'll use some of the images from the validation set to test the predictions of our models.

Please note that you should set your Colab runtime to GPU, as this will speed up model training.


We want to make sure that we mount Google Drive correctly so that we can move into the directory where we want to save our data.

In [None]:
from google.colab import drive 

ROOT = "/content/drive"     
print(ROOT)                 

drive.mount(ROOT)           

In [None]:
#%cd /content/drive/My Drive/
#Make sure to note which directory you'll be saving the code in.
#I recommend saving it in a separate directory for the following steps.

Follow the tutorial [here](https://medium.com/@opalkabert/downloading-kaggle-datasets-into-google-colab-fb9654c94235) on how to download Kaggle data from Colab.

In [None]:
!pip install -U -q kaggle
!mkdir -p ~/.kaggle

In [None]:
from google.colab import files
files.upload()


In [None]:
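!cp kaggle.json ~/.kaggle/
# Keep the API token readable only by you (the kaggle CLI warns otherwise)
!chmod 600 ~/.kaggle/kaggle.json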

In [None]:
!kaggle datasets download -d irvingvasquez/cactus-aerial-photos
!ls

In [None]:
!mkdir -p dataset/
from zipfile import ZipFile

# Unpack the downloaded archive into dataset/
with ZipFile('cactus-aerial-photos.zip', 'r') as cactus_images:
  cactus_images.extractall('dataset/')

In [None]:
import os
from matplotlib import pyplot as plt

We first want to display some of the images so we know what we're dealing with at a high level. I'm using the *matplotlib* library as well as Python's *os* library to do so. Note that the original size of the images is 32x32, but I'm enlarging them for display purposes.

In [None]:
def display_images(path):
  """Show the first 9 images found in the given class folder (e.g. 'cactus/')."""
  folder = os.path.join('dataset/training_set/training_set/', path)
  file_names = sorted(os.listdir(folder))
  print('There are {} images.'.format(len(file_names)))
  plt.figure(figsize=(15, 15))
  # Only read the files we actually plot instead of loading the whole folder
  for index, file_name in enumerate(file_names[:9], start=1):
    image = plt.imread(os.path.join(folder, file_name))
    plt.subplot(3, 3, index)
    plt.imshow(image)
    plt.title(file_name)
    plt.axis("off")
  plt.show()

In [None]:
display_images('cactus/')

In [None]:
display_images('no_cactus/')

Here, we'll set up the data augmentation pipeline that will be applied to the training and validation sets. Keras lets us do this with the *ImageDataGenerator* object, where we specify the augmentations to apply to each image, such as flipping, zooming, and shearing.

We'll then use the *flow_from_directory* function to apply it to each of the sets. For the training set, it is better to shuffle the images than to keep them in their original order. For the validation set (and any test set), however, we must keep the original order: these sets are used for model evaluation and prediction, and the predictions need to line up with the stored labels. Because we're dealing with binary classification, we set the class mode to binary.

Side note on batch sizes: the batch size is worth experimenting with, since it affects both model performance and training efficiency. It determines how many images the model processes before it updates its weights. Smaller batches mean more (and noisier) updates per epoch, which often lets the model converge in fewer epochs, so they are a reasonable starting point. If you find poor performance or very slow training with a small batch size, increase it gradually and observe any changes in performance.
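To make the effect of the batch size concrete, here is a small sketch of how it determines the number of weight updates in one epoch. The helper name `updates_per_epoch` is my own, and the 17,000 figure is simply the training set size quoted earlier.

```python
import math

def updates_per_epoch(num_samples, batch_size, drop_remainder=False):
    """Number of batches (and so weight updates) in one pass over the data."""
    if drop_remainder:
        return num_samples // batch_size        # ignore the last partial batch
    return math.ceil(num_samples / batch_size)  # keep the last partial batch

for bs in (32, 256, 2500):
    print(bs, updates_per_epoch(17_000, bs))
```

A batch size of 32 gives hundreds of updates per epoch, while a batch size of 2500 gives only a handful, which is one reason the two settings can behave so differently.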

In [None]:
from keras.preprocessing.image import ImageDataGenerator
import keras 

dir_train = "dataset/training_set/training_set/"
dir_valid = "dataset/validation_set/validation_set/"

target_w, target_h = 32, 32
batch_size = 32

datagen_train = ImageDataGenerator(rescale=1./255.0,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True, 
        vertical_flip=True)
train_gen = datagen_train.flow_from_directory(
    dir_train, 
    target_size=(target_w, target_h),
    batch_size=batch_size,
    shuffle=True,
    class_mode="binary")
validation_gen = datagen_train.flow_from_directory(
    dir_valid, 
    target_size=(target_w, target_h),
    batch_size=batch_size, 
    shuffle=False,
    class_mode="binary")


Now, we're going to set up our baseline model. It is composed of 2 convolutional layers, 2 batch normalization layers, 2 max pooling layers, 2 dropout layers (each with a 50% drop probability), and 3 fully-connected layers. The last fully-connected layer will be our final output.

In addition to these main layers, we'll also use a "flattening" layer to turn the multi-dimensional output of the previous layers into a single vector. As you may have picked up, dimensionality reduction (handled here by the pooling layers) is a key component of CNNs. It speeds up the computation of our model, and it helps the model generalize to data that wasn't in the training set. It's important not to overfit the model; otherwise, it'll only predict accurately on what it saw in the training set.

In [None]:
base_model = keras.Sequential()
base_model.add(keras.layers.Conv2D(16, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(target_w, target_h, 3)))
base_model.add(keras.layers.BatchNormalization())
base_model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
base_model.add(keras.layers.Conv2D(32, (3, 3), activation='relu'))
base_model.add(keras.layers.BatchNormalization())
base_model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
base_model.add(keras.layers.Dropout(0.5))
base_model.add(keras.layers.Flatten())
base_model.add(keras.layers.Dense(64, activation='relu')) 
base_model.add(keras.layers.Dense(32, activation='relu')) 
base_model.add(keras.layers.Dropout(0.5)) 
base_model.add(keras.layers.Dense(1, activation='sigmoid'))

Let's briefly go over some of the parameters that you see here for each layer.

Before our first actual layer, we have a *Sequential* object. We wrap our layers in this object because it lets us stack everything linearly into one "data structure". You can find similarities between this and other linear data structures such as arrays, lists, and stacks!

Starting from the convolutional layer, the first parameter is the number of filters. It determines the number of channels (feature maps) in the layer's output after the convolutions have been calculated. To read more about how convolutions are applied to the input image, check [this article](https://connect2compute.wordpress.com/2019/02/19/introduction-to-convolutions-in-deep-learning/) out. The second parameter is the kernel size, which is the size of the convolutional window. There is no surefire way to determine the window's size, but it's best to err on the side of smaller windows and apply multiple convolutions to the image. The third parameter is the activation function, which was previously mentioned in the lecture. We're using a [ReLU activation function](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/), which decides which of the responses picked up by the convolution are passed on to the next layer. What sets ReLU apart from other activation functions is that it passes any value greater than 0 through unchanged and sets everything else to 0. Finally, we have the input shape, which is 32x32 with 3 color channels.
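As a rough illustration of what a single filter does, here is a minimal pure-Python sketch of a convolution with no padding and a stride of 1, followed by ReLU. The 4x4 "image" and the 3x3 kernel values are made up for the example; in the real model, Keras learns the kernel values during training.

```python
def relu(x):
    """Pass positive values through unchanged and clamp everything else to 0."""
    return max(0.0, x)

def conv2d_valid(image, kernel):
    """Slide a square `kernel` over a square `image` (no padding, stride 1), then apply ReLU."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(k) for b in range(k))
            row.append(relu(total))
        output.append(row)
    return output

image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
# Hypothetical filter: left column minus right column of each window
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Note how the 4x4 input shrinks to a 2x2 feature map: with no padding, a window of size k reduces each side by k - 1, and the negative responses are zeroed out by ReLU.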

Recall that the batch normalization layer normalizes the outputs of the previous layer within each mini-batch. It stabilizes and speeds up training, and it also has a mild regularizing effect that helps keep the model from overfitting.
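Here is a minimal sketch of the training-time computation behind batch normalization for a single feature. The function name and the sample values are my own; `gamma` and `beta` stand in for the scale and shift that the layer learns, and at prediction time Keras uses moving averages of the mean and variance instead of the current batch's statistics.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

activations = [0.5, 2.0, 3.5, 6.0]   # one feature across a batch of 4 images
normalized = batch_norm(activations)
print(normalized)
```

After normalization the values are centered around 0 with a variance close to 1, regardless of the scale they started with.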

Similarly, the max pooling layer aggregates neighboring values to reduce the spatial size of the feature maps. As you can see, we set the pooling size, which is similar to the kernel size in the convolutional layer. Again, there is no surefire way of determining the right pooling size, but err on the side of small so we don't throw away too much information.
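To see what a 2x2 pooling window does, here is a small sketch on a made-up 4x4 feature map. The helper name `max_pool_2x2` is my own; it assumes a stride of 2, which is what Keras uses by default when the pool size is 2x2.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    return [
        [max(feature_map[2 * i][2 * j], feature_map[2 * i][2 * j + 1],
             feature_map[2 * i + 1][2 * j], feature_map[2 * i + 1][2 * j + 1])
         for j in range(cols)]
        for i in range(rows)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 7]]
```

Each 2x2 block is replaced by its largest value, so the width and height are halved while the strongest responses are kept.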

The dropout layer is another regularization technique: during training, it randomly turns off each unit coming out of the preceding layer with a given probability, so the network cannot rely too heavily on any single unit.
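The sketch below shows the idea on a made-up list of activations. It uses "inverted" dropout, where the surviving values are scaled up by 1 / (1 - rate) so the expected total stays the same; Keras applies this only during training and leaves the values untouched at prediction time. The function name and the seed are my own choices.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
hidden = [0.2, 1.5, 0.7, 3.0, 0.9, 1.1]
dropped = dropout(hidden, rate=0.5, rng=rng)
print(dropped)
```

With a rate of 0.5, roughly half of the values are zeroed on each pass, and a different half is chosen every time.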

As you can see, we have a flatten layer here. It reshapes the output of the layers before it into a single vector. We need this because the final outputs of a CNN are produced by fully-connected layers (called Dense layers in Keras), and those expect a one-dimensional input per image rather than the multi-dimensional feature maps that convolutional layers produce.
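To get a feel for how long the flattened vector is, we can trace the width/height of the feature maps through the baseline model by hand. This is a sketch with helper names of my own; it assumes no padding and a stride of 1 for the convolutions, which are the Keras defaults used in the model cell.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a convolution with no padding."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output width/height of non-overlapping pooling (remainder dropped)."""
    return size // pool

size = 32                       # input images are 32x32x3
size = conv_out(size, 3)        # Conv2D(16, 3x3)  -> 30
size = pool_out(size)           # MaxPooling2D(2)  -> 15
size = conv_out(size, 3)        # Conv2D(32, 3x3)  -> 13
size = pool_out(size)           # MaxPooling2D(2)  -> 6
flat_length = size * size * 32  # Flatten over the 32 feature maps
print(size, flat_length)        # 6 1152
```

You can compare these numbers with the output shapes that `base_model.summary()` reports.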

Next, we have some fully-connected layers. FC layers are still important for a CNN because each of their units receives every output of the previous layer. As in a regular neural network, each unit computes the dot product of those inputs with its weights, adds a bias, and passes the result through the activation function.

Finally, we have our last fully-connected layer. This is an absolute must, as it determines the final output of the CNN. In this case, it has 1 unit, since we're dealing with a binary classification task. That might sound counterintuitive given that we have 2 classes, "cactus" and "no_cactus", but a single sigmoid unit is enough: it outputs the probability of one class, and the probability of the other is simply 1 minus that value. Thresholding the output (typically at 0.5) gives 0 or 1, yes or no, etc. If we were dealing with more than two classes, we would instead use one output unit per class and pick the class with the highest probability.
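Here is a small sketch of how a single sigmoid output becomes a class label. The index-to-label mapping below is an assumption based on Keras sorting the class folders alphabetically; check `train_gen.class_indices` for the real mapping. The raw scores passed in are made up.

```python
import math

def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Assumed mapping; verify it against train_gen.class_indices
index_to_label = {0: "cactus", 1: "no_cactus"}

def predict_label(z, threshold=0.5):
    """Turn the single output unit's raw score into a class label."""
    probability = sigmoid(z)   # probability of class index 1
    return index_to_label[int(probability > threshold)], probability

print(predict_label(-2.0))
print(predict_label(1.5))
```

Note that taking the argmax over a single output value would always return index 0, which is why the sigmoid output is thresholded instead.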

Now let's print out a summary of the model!

In [None]:
base_model.summary()

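As you can see, almost all of the model's parameters are trainable; the few non-trainable ones are the running statistics kept by the batch normalization layers. If you want to decrease the total amount, you can shrink the dense layers or reduce the feature maps further before flattening.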
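If you'd like to check the numbers in the summary by hand, the sketch below recomputes the parameter counts of the baseline model from the layer definitions. The helper names are my own, and the 6x6x32 input to the first dense layer assumes the default "valid" padding (32 -> 30 -> 15 -> 13 -> 6 through the conv and pooling layers).

```python
def conv_params(kernel, in_channels, filters):
    return (kernel * kernel * in_channels + 1) * filters   # +1 for the bias

def dense_params(inputs, units):
    return (inputs + 1) * units                            # +1 for the bias

def batch_norm_params(channels):
    # gamma and beta are trainable; moving mean and variance are not
    return 4 * channels

layers = {
    "conv2d_1": conv_params(3, 3, 16),
    "batch_norm_1": batch_norm_params(16),
    "conv2d_2": conv_params(3, 16, 32),
    "batch_norm_2": batch_norm_params(32),
    "dense_1": dense_params(6 * 6 * 32, 64),
    "dense_2": dense_params(64, 32),
    "dense_3": dense_params(32, 1),
}
total = sum(layers.values())
non_trainable = 2 * 16 + 2 * 32   # moving statistics of the two BN layers
print(layers)
print(total, total - non_trainable, non_trainable)
```

The first dense layer dominates the count, so that is the place to look first if you want a smaller model.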

In the next block, we create our optimizer and compile the model, passing in that optimizer along with a loss function and an accuracy metric. Before explaining what an optimizer does, we must first understand the logic behind a loss function. The loss function measures how far the model's predictions are from the true classes, and training tries to minimize it. Gradient descent is commonly used under the hood: it repeatedly steps in the direction of steepest descent of the loss, moving towards a minimum (ideally, but not necessarily, the global one). For a more in-depth look at gradient descent, check out [this article](https://analyticsindiamag.com/guide-to-tensorflow-keras-optimizers/).
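As a toy illustration of gradient descent, the sketch below minimizes a one-dimensional loss whose minimum we already know. The function, starting point, and learning rate are made up for the example; a real network does the same thing over thousands of weights at once.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient and return the visited points."""
    x = start
    path = [x]
    for _ in range(steps):
        x = x - learning_rate * grad(x)
        path.append(x)
    return path

# Toy loss f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3
path = gradient_descent(lambda x: 2 * (x - 3), start=0.0,
                        learning_rate=0.1, steps=50)
print(path[0], path[-1])
```

Try a learning rate above 1.0 here and the steps overshoot and diverge, which is the kind of behavior a badly chosen learning rate can cause in a real model too.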

It's important to mention that our loss function is binary [cross-entropy](https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e). Cross-entropy is the loss function itself, and the "binary" variant is the one geared towards binary classification.
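To make binary cross-entropy less abstract, here is a minimal sketch of the formula, averaged over a few made-up labels and predicted probabilities. The clipping constant is my own choice to avoid taking the log of 0.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
confident_right = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
confident_wrong = binary_cross_entropy(labels, [0.1, 0.9, 0.2, 0.8])
print(confident_right, confident_wrong)
```

Confident correct predictions give a small loss, while confident wrong ones are penalized heavily, which is exactly the pressure we want during training.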

The optimizer comes in as a way to essentially "optimize" our gradient descent. Depending on the parameters we give it, such as the learning rate, it can help us reach a minimum much faster. As you've probably picked up by now, there is no surefire way of choosing the best parameters, so feel free to experiment.

In [None]:
# `lr` is the deprecated spelling of this argument; `learning_rate` is the current name
optimizer = keras.optimizers.Adam(learning_rate=0.01)
base_model.compile(loss="binary_crossentropy", optimizer=optimizer,
              metrics=['accuracy'])

Here, I am installing a library called livelossplot to quickly plot our accuracy and loss scores per epoch. As an aside, you can think of an epoch as one full pass of the model over the training data.

In [None]:
!pip3 install livelossplot

Here, I have some extra variables that I want to add. These are totally optional, but I recommend using them in your practice, as they help you get the best version of your model.

The *ReduceLROnPlateau* callback monitors a metric (here the validation loss) and reduces the optimizer's learning rate whenever that metric starts to plateau. A smaller learning rate can help the model settle into a minimum instead of bouncing around it. You can set a lower bound on how far the learning rate is allowed to decrease.
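The sketch below simulates a simplified version of this plateau rule so you can see when the learning rate would drop. It is not the Keras implementation (for instance, it ignores the `min_delta` and `cooldown` options), and the validation losses are made up.

```python
def simulate_reduce_lr(val_losses, lr, factor=0.1, patience=2, min_lr=0.001):
    """Cut lr after `patience` epochs in a row without a new best loss."""
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)   # never go below the floor
                wait = 0
        schedule.append(lr)                     # lr in effect after this epoch
    return schedule

# Hypothetical validation losses: improvement stalls after the third epoch
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.39, 0.40, 0.41]
schedule = simulate_reduce_lr(losses, lr=0.01)
print(schedule)
```

With a starting rate of 0.01, a factor of 0.1, and a floor of 0.001, the rate can only be reduced once, so the second plateau in this example changes nothing.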

The *ModelCheckpoint* callback lets you save the best weights for later use. This can be helpful if you'd like to build other image classifiers in the future and already have a model that classifies similar images well. It is set to monitor validation accuracy, so that is the metric it tracks.

The *EarlyStopping* callback lets training stop once the model has stopped improving. Its behavior depends on which metric you monitor, the minimum change that counts as an improvement, and the number of epochs it lets performance plateau before stopping.
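To illustrate the patience logic, here is a simplified simulation of early stopping on made-up validation accuracies. It is a sketch rather than the Keras implementation, and it uses a short patience of 3 so the effect is visible in a handful of epochs.

```python
def early_stopping_epoch(val_accuracies, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), 1-indexed; stop_epoch is None if never triggered."""
    best = float("-inf")
    best_epoch = 0
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc - best > min_delta:        # counts as an improvement
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:          # plateaued for too long
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation accuracies
accs = [0.80, 0.88, 0.91, 0.90, 0.91, 0.89, 0.90]
print(early_stopping_epoch(accs, patience=3))   # (6, 3)
```

Training stops at epoch 6, and with `restore_best_weights=True` the model would go back to the weights from epoch 3, the best epoch seen.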

In [None]:
from livelossplot import PlotLossesKerasTF

epochs = 20
steps_per_epoch = train_gen.n//train_gen.batch_size #length of the training set / batch size
validation_steps = validation_gen.n//validation_gen.batch_size #length of the validation set / batch size 

reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                              patience=2, min_lr=0.001, mode='auto')

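# save_best_only=True keeps only the weights from the epoch with the best val_accuracy
checkpoint = keras.callbacks.ModelCheckpoint("model_weights.h5", monitor='val_accuracy',
                             save_weights_only=True, save_best_only=True,
                             mode='max', verbose=1)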

early_stop = keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, patience=10, verbose=1, mode='auto',
                           restore_best_weights=True)

callbacks = [PlotLossesKerasTF(), checkpoint, reduce_lr, early_stop]

history = base_model.fit(
    x = train_gen,
    steps_per_epoch = steps_per_epoch,
    epochs = epochs,
    validation_data = validation_gen,
    validation_steps = validation_steps,
    callbacks=callbacks
)


Make sure to monitor and jot down your results from the model fitting process. Did the model perform well, or did it overfit (high accuracy in training but low accuracy in validation)?

Also, from these graphs alone, you may start to see the relationship between the accuracy metric and the loss. They tend to move in opposite directions: as the accuracy increases, the loss usually decreases, and vice versa.


We can also display a [confusion matrix](https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62) to see at a high level which labels the model predicted correctly.

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
import seaborn as sns

def display_matrix(model, validation_gen, validation_steps):
  # validation_steps is kept so existing calls still work, but we predict on the
  # whole generator so the predictions line up with validation_gen.classes
  validation_gen.reset()
  valid_prob = model.predict(validation_gen, steps=len(validation_gen))
  # The model has one sigmoid output, so threshold it instead of using argmax
  valid_pred = (valid_prob.ravel() > 0.5).astype(int)
  conf_mat = confusion_matrix(validation_gen.classes, valid_pred)

  plt.figure(figsize=(8, 8))
  sns.heatmap(conf_mat, annot=True, fmt='d', cmap=plt.cm.Blues)
  plt.ylabel('True label')
  plt.xlabel('Predicted label')
  plt.tight_layout()
  plt.show()

display_matrix(base_model, validation_gen, validation_steps)
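If you want to see what goes into each cell of the confusion matrix, here is a tiny pure-Python sketch that builds a 2x2 matrix from true labels and sigmoid outputs. The labels and probabilities are made up purely for illustration.

```python
def confusion_matrix_2x2(y_true, y_prob, threshold=0.5):
    """Rows are true classes, columns are predicted classes (0 then 1)."""
    matrix = [[0, 0], [0, 0]]
    for truth, prob in zip(y_true, y_prob):
        predicted = int(prob > threshold)
        matrix[truth][predicted] += 1
    return matrix

# Made-up labels and sigmoid outputs
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.4, 0.7, 0.8, 0.3, 0.9]
matrix = confusion_matrix_2x2(y_true, y_prob)
accuracy = (matrix[0][0] + matrix[1][1]) / len(y_true)
print(matrix, accuracy)
```

The diagonal holds the correct predictions, so the accuracy is just the sum of the diagonal divided by the number of samples.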

Now let's start building out the customized LeNet-5 model from the [paper](https://jivasquez.files.wordpress.com/2019/03/rp_cactus_recognition_elsa-1.pdf), so we can do side by side comparisons with our baseline.

Note that I'm defining our data augmentation operations from above once again, because I want to tune the parameters to follow the same format that was in the original paper.

In [None]:
from keras.preprocessing.image import ImageDataGenerator
import keras 

dir_train = "dataset/training_set/training_set/"
dir_valid = "dataset/validation_set/validation_set/"

target_w, target_h = 32, 32
batch_size = 2500 #using the original batch_size defined by the paper 

datagen_train = ImageDataGenerator(rescale=1./255.0,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True, 
        vertical_flip=True)
train_gen = datagen_train.flow_from_directory(
    dir_train, 
    target_size=(target_w, target_h),
    batch_size=batch_size,
    shuffle=True,
    class_mode="binary")
validation_gen = datagen_train.flow_from_directory(
    dir_valid, 
    target_size=(target_w, target_h),
    batch_size=batch_size, 
    shuffle=False,
    class_mode="binary")

In [None]:
lenet_model = keras.Sequential()
lenet_model.add(keras.layers.Conv2D(6, kernel_size=(5, 5),activation='relu',input_shape=(target_w, target_h, 3)))
lenet_model.add(keras.layers.BatchNormalization())
lenet_model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
lenet_model.add(keras.layers.Conv2D(16, kernel_size=(5, 5),strides=1,activation='relu'))
lenet_model.add(keras.layers.BatchNormalization())
lenet_model.add(keras.layers.MaxPooling2D(pool_size=(2, 2),strides=(2, 2)))
lenet_model.add(keras.layers.Flatten())
lenet_model.add(keras.layers.Dense(120, activation='relu'))
lenet_model.add(keras.layers.Dense(84, activation='relu'))
lenet_model.add(keras.layers.Dropout(0.5))
lenet_model.add(keras.layers.Dense(1, activation='sigmoid'))
lenet_model.summary()

In [None]:
# `lr` is the deprecated spelling of this argument; `learning_rate` is the current name
optimizer = keras.optimizers.Adam(learning_rate=0.01)
lenet_model.compile(loss="binary_crossentropy", optimizer=optimizer,
              metrics=['accuracy'])

In [None]:
epochs = 150 #as noted by paper 
steps_per_epoch = train_gen.n//train_gen.batch_size #not noted by paper 
validation_steps = validation_gen.n//validation_gen.batch_size #not noted by paper 


callbacks = [PlotLossesKerasTF()]

history = lenet_model.fit(
    x = train_gen,
    steps_per_epoch = steps_per_epoch,
    epochs = epochs,
    validation_data = validation_gen,
    validation_steps = validation_steps,
    callbacks=callbacks
)


Make sure to monitor and jot down your results from the model fitting process. Did the model perform as well as expected, or did it overfit (high accuracy in training but low accuracy in validation)?

We won't be adapting the model to improve its performance here (in case it did underperform), but I encourage you to think of other ways to reduce any overfitting you see with the LeNet-5 model.

Let's also apply the confusion matrix here to see how we're doing from a high-level.

In [None]:
display_matrix(lenet_model, validation_gen, validation_steps)

Now let's make some actual predictions with our models. We first want to make sure that each image is reshaped into the form the predict function expects. Then, we'll choose two random images from the validation set to see if the model predicts the correct label for both.

In [None]:
from keras.preprocessing import image
import numpy as np

def processImg(image_path):
    # Load the image at the size the models were trained on
    img = image.load_img(image_path, target_size=(target_w, target_h))
    img = image.img_to_array(img)
    # Add a batch dimension: (1, height, width, channels)
    img = img.reshape(1, target_w, target_h, 3)
    # Scale pixels to [0, 1], matching rescale=1./255 in the training generators
    # (the ImageNet preprocess_input would apply a different normalization)
    img = img / 255.0
    return img

In [None]:
from keras.preprocessing.image import load_img
import random

cactus_dict = train_gen.class_indices
cactus_dict = {y:x for x, y in cactus_dict.items()} #switching key-value pair around
print(cactus_dict)
cactus_path = 'dataset/validation_set/validation_set/cactus/'
no_cactus_path = 'dataset/validation_set/validation_set/no_cactus/'
path1 = random.choice(os.listdir(cactus_path))
path2 = random.choice(os.listdir(no_cactus_path))

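# The model has a single sigmoid output: the probability of class index 1.
# Threshold it at 0.5 to get the class index (argmax over one value is always 0).
img1_prob_base = base_model.predict(processImg(cactus_path+path1))
img1_classes_base = cactus_dict[int(img1_prob_base[0][0] > 0.5)]
img2_prob_base = base_model.predict(processImg(no_cactus_path+path2))
img2_classes_base = cactus_dict[int(img2_prob_base[0][0] > 0.5)]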


f = plt.figure()
ax1 = f.add_subplot(1, 2, 1)
ax1.title.set_text("(B Model) Predicted: {}".format(img1_classes_base))
plt.imshow(load_img(cactus_path+path1))
plt.xticks([]); plt.yticks([])
ax2 = f.add_subplot(1, 2, 2)
ax2.title.set_text("(B Model) Predicted: {}".format(img2_classes_base))
plt.imshow(load_img(no_cactus_path+path2)) 
plt.xticks([]); plt.yticks([])
plt.show(block=True)

In [None]:
# Threshold the single sigmoid output at 0.5 (argmax over one value is always 0)
img1_prob_lenet = lenet_model.predict(processImg(cactus_path+path1))
img1_classes_lenet = cactus_dict[int(img1_prob_lenet[0][0] > 0.5)]
img2_prob_lenet = lenet_model.predict(processImg(no_cactus_path+path2))
print(img2_prob_lenet[0][0])  # raw sigmoid output for the second image
img2_classes_lenet = cactus_dict[int(img2_prob_lenet[0][0] > 0.5)]

f = plt.figure()
ax1 = f.add_subplot(1, 2, 1)
ax1.title.set_text("(L Model) Predicted: {}".format(img1_classes_lenet))
plt.imshow(load_img(cactus_path+path1))
plt.xticks([]); plt.yticks([])
ax2 = f.add_subplot(1, 2, 2)
ax2.title.set_text("(L Model) Predicted: {}".format(img2_classes_lenet))
plt.imshow(load_img(no_cactus_path+path2)) 
plt.xticks([]); plt.yticks([])
plt.show(block=True)

How much better do you think the LeNet-5 model did overall? Or, if our baseline model performed better, how do you think we can modify the original team's customized version so we can achieve better performance? 

I hope this notebook, as well as the overall workshop, will serve you well in your Machine/Deep Learning journey. Computer Vision is just one of many "sub-genres" of Deep Learning, and there are plenty of areas to explore before you start to settle on one you prefer. It's good to be a jack-of-all-trades when it comes to Machine Learning, but make sure you're highly competent in one area!