# Gap Framework - Computer Vision / CNN

In this tutorial, we will show you how to prepare a dataset for a convolutional neural network. We will do the following:

1. Preprocess a collection of images of fruits from the Kaggle Fruits-360 dataset into Machine Learning ready data.
2. Store the Machine Learning ready data into a repository.
3. Create a batch feeder.
4. Create a CNN.
5. Retrieve the Machine Learning ready data.
6. Train the CNN with our Machine Learning ready data.

In [None]:
# Let's go to the directory of the Gap Framework
import os
os.chdir("../")
!cd
# (!cd prints the current directory on Windows; use !pwd on Linux/macOS)

### Setup

Let's start by importing the Gap <b style='color:saddlebrown'>vision</b> module.

In [None]:
# import the Gap Vision module
from gapml.vision import Image, Images

### Location of Dataset

The Fruits 360 dataset can be downloaded from:

http://www.labs.earth/datasets/fruits360.zip

Let's go to a repository of images for classifying types of fruits. We will use this repository for image preprocessing for computer vision.

The training and test datasets are under the corresponding subfolders Training and Test. Each subfolder under Training (and Test) is named according to the type of fruit (e.g., Apple) and optionally followed by a variety (e.g., Red Delicious).

Let's take a look at the subfolders and see how many different classes of fruits are in our training set (i.e., 76).

In [None]:
!cd

In [None]:
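# Go to the Training folder of the dataset (adjust this path to where you unzipped Fruits-360)
os.chdir("../FruitMaps/fruits/fruits-360/Training")

# Let's get a list of all the subfolders of collections of fruits.
# os.listdir() returns entries in arbitrary order, so sort them to keep the label indices reproducible.
labels = sorted(os.listdir())
print("Number of Labels:", len(labels))
print(labels)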

Let's now look a little closer at the images in the training set. We will dive into the first subfolder (Apple Braeburn).

In [None]:
# Let's get a listing of all the images in the first subfolder.
data = os.listdir(labels[0])
print("Number of Images:", len(data))

We will use OpenCV to get some basic information about the images. From the shape of the pixel data, we see that it's a 100x100 pixel image with three color channels (OpenCV loads them in BGR order rather than RGB).

In [None]:
# Import the openCV module
import cv2

# We will look at the first image in this first collection.
print(data[0])

# Use openCV to read the image into memory as an uncompressed bitmap
pixels = cv2.imread(labels[0] + '/' + data[0])

# Let's look at the shape of the image.
print(pixels.shape)

Let's look at a few more random images in this subfolder and see if they are all the same size and type.

In [None]:
# Our random selection of images
for index in [ 7, 26, 143 ]:
    print(data[index])

    # Use openCV to read the image into memory as an uncompressed bitmap
    pixels = cv2.imread(labels[0] + '/' + data[index])

    # Let's look at the shape of the image.
    print(pixels.shape)

Okay, they are all the same size.

Let's look at a different collection of fruits and see if those images are the same size too. We'll use the 8th subfolder (index 7).

In [None]:
# List the images in this subfolder (file names differ between subfolders)
data = os.listdir(labels[7])

# Our random selection of images
for index in [ 7, 26, 143 ]:
    print(data[index])

    # Use openCV to read the image into memory as an uncompressed bitmap
    pixels = cv2.imread(labels[7] + '/' + data[index])

    # Let's look at the shape of the image.
    print(pixels.shape)

### Practice

Let's do a practice run and preprocess one collection of fruit images.

Note how we specified the subfolder instead of a list for the images parameter. The initializer (constructor) inspects the parameter: if it's a string rather than a list, it presumes the parameter is a path to a folder of images.

In [None]:
images = Images(labels[0], 0)
print("TIME", images.time)

Perhaps our images don't need to be this big to train the CNN. Let's take a shot in the dark and say 50x50 is enough. Halving both the width and the height reduces the size of our data by 75%.

In [None]:
images = Images(labels[0], 0, config=['resize=(50,50)'])
print("TIME", images.time)

os.remove('collection.0_100.h5')
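The 75% figure comes from counting the values stored per image: a 100x100 RGB image holds 100 × 100 × 3 values, while a 50x50 one holds 50 × 50 × 3. A quick check of that arithmetic (the helper name `values_per_image` is just for illustration):

```python
# Number of stored values (pixels x color channels) in one image
def values_per_image(height, width, channels=3):
    return height * width * channels

original = values_per_image(100, 100)   # 30000 values
resized = values_per_image(50, 50)      # 7500 values

# Fraction of the data removed by halving both sides
reduction = 1 - resized / original
print(original, resized, f"{reduction:.0%}")   # 30000 7500 75%
```

The 7500 values per resized image is also the number of input features the CNN will see later.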

### Prepare the Data

The labels of the fruits are names, but we need integer values to train the CNN. Since all the subfolder names (fruit name+variety) are in the list labels, we will use the index of the list as the labels.

For the sake of time, we will only create machine learning ready data for three of the fruit collections (which is why the line that processes the entire set of fruits is commented out).

In [None]:
# Process all the Collections (subfolders) of Fruits
#images = Images(labels, [l for l in range(len(labels))], config=['resize=(50,50)'], name='fruits')

# For brevity, let's just do three of them
images = Images([labels[0], labels[1], labels[2]], [l for l in range(3)], config=['resize=(50,50)'], name='fruits')

print("TIME:", images.time)
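The labeling scheme described above, where a subfolder's position in the list becomes its integer class, can be sketched in plain Python. The folder names below are illustrative stand-ins, not necessarily the dataset's actual listing:

```python
# Illustrative subfolder names (stand-ins for the sorted output of os.listdir())
folder_names = ["Apple Braeburn", "Apple Golden 1", "Apricot"]

# Each folder's position in the list becomes its integer class label
name_to_index = {name: index for index, name in enumerate(folder_names)}
index_to_name = dict(enumerate(folder_names))

print(name_to_index["Apricot"])   # 2
print(index_to_name[0])           # Apple Braeburn
```

Because `os.listdir()` makes no promise about ordering, sorting the folder names first keeps this mapping the same from run to run; keeping `index_to_name` around makes it easy to turn a predicted class number back into a fruit name.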

### Batch Generation

In the full Kaggle Fruits360 dataset, the training and test data are in separate collections. 

Since this code-along uses only a subset for demonstration, we will take all the images from the training set, hold out part of them as our test set, and randomize the order of the training set. We will use a 20% test split and 42 as our random seed.

In [None]:
# Hold out 20% of the images as the test set (random seed 42) and randomize their order
images.split = 0.2, 42

# Let's look at the (internal) _train property and verify that the indices of the images have been randomized.
images._train

Let's now split the data. Note how the usage looks similar to scikit-learn's train_test_split() function, but is simpler.

In [None]:
# When used as a getter, the split property returns the training / test data and labels in the same order as
# scikit-learn's train_test_split()
X_train, X_test, Y_train, Y_test = images.split
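To show what a shuffled 80/20 split with a fixed seed involves, here is a minimal standard-library sketch. It follows the same idea as `images.split = 0.2, 42` and scikit-learn's `train_test_split(..., test_size=0.2, random_state=42)`, but it is a hypothetical helper (`shuffle_split`), not Gap's implementation, and it will not reproduce either library's exact ordering:

```python
import random

def shuffle_split(data, targets, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle indices with a fixed seed, then hold out a test set."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_test = int(len(indices) * test_fraction)    # e.g. 20% held out
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [data[i] for i in train_idx]
    x_test = [data[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

# Ten toy "images" with class labels 0..2
toy_images = [f"img{i}" for i in range(10)]
toy_labels = [i % 3 for i in range(10)]
toy_x_train, toy_x_test, toy_y_train, toy_y_test = shuffle_split(toy_images, toy_labels)
print(len(toy_x_train), len(toy_x_test))   # 8 2
```

Shuffling one list of indices, rather than the images and labels separately, keeps each image paired with its label.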

In [None]:
print("Number of Images", len(X_train))
print("Image Example", X_train[0])
print("Label", Y_train[0])

Let's create our mini-batch generator.

In [None]:
images.minibatch = 32
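Conceptually, a mini-batch feeder walks through the (already shuffled) training data in fixed-size slices. The hypothetical `minibatches` generator below illustrates the idea; it is a sketch, not how Gap implements `images.minibatch`. The training loop at the end of this tutorial slices `X_train` the same way and, like `drop_last=True` here, skips a final partial batch.

```python
def minibatches(data, targets, batch_size, drop_last=True):
    """Hypothetical helper: yield consecutive (data, targets) slices of batch_size items."""
    n = len(data)
    # With drop_last, stop before a trailing batch smaller than batch_size
    stop = n - n % batch_size if drop_last else n
    for begin in range(0, stop, batch_size):
        end = begin + batch_size
        yield data[begin:end], targets[begin:end]

toy_data = list(range(70))
toy_targets = [i % 3 for i in toy_data]
print([len(xb) for xb, _ in minibatches(toy_data, toy_targets, 32)])                   # [32, 32]
print([len(xb) for xb, _ in minibatches(toy_data, toy_targets, 32, drop_last=False)])  # [32, 32, 6]
```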

## Construct the CNN

In [None]:
# Importing TensorFlow
# Note: this notebook uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session, tf.contrib)
import tensorflow as tf
from tensorflow.python.framework import ops

### Input Vector and Output Vector and Hyperparameter Placeholders

For our first TensorFlow step, we will set up TensorFlow placeholders.

We need to declare four placeholders: one for the input (the pixel data of the images), one for the output (the fruit class labels), one for the dropout keep probability, and one for the learning rate.

For our input placeholder (which we call X), each image is 50x50 pixels with 3 color channels, i.e., 50 × 50 × 3 = 7500 values per image, kept in their [50, 50, 3] image shape. For the output placeholder (which we call Y), we have 3 classes (3 different fruits), one-hot encoded. In both cases, we set the first dimension to None, which stands for the number of samples we will feed into the neural network at run time. Our data is floating-point values between 0 and 1, so we set the data type to float32.

We will declare two more placeholders for hyperparameters: the fraction of outputs to keep in the dropout layer (D) and the learning rate of the optimizer (L). Since both are scalar values, we declare them with an empty shape ([]).

In [None]:
# Let's first reset our graph, so our neural network components are all declared within the same graph
ops.reset_default_graph() 

X = tf.placeholder(tf.float32, [None, 50, 50, 3]) # shape = [batch, height, width, channels]
Y = tf.placeholder(tf.float32, [None, 3])  # shape = [batch, number of labels ]
D = tf.placeholder(tf.float32, [])
L = tf.placeholder(tf.float32, [])
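The `Y` placeholder expects one row of 3 values per image, i.e. one-hot encoded labels. If labels arrive as integers (0, 1, 2), a conversion like the following NumPy sketch would produce that shape. Whether Gap's `split` already returns one-hot labels is an assumption worth checking with the `print(Y_train[0])` cell above; `one_hot` is a hypothetical helper.

```python
import numpy as np

def one_hot(class_ids, num_classes):
    """Turn integer class labels into float32 rows of 0.0/1.0, shape [n, num_classes]."""
    class_ids = np.asarray(class_ids, dtype=np.int64)
    encoded = np.zeros((class_ids.size, num_classes), dtype=np.float32)
    encoded[np.arange(class_ids.size), class_ids] = 1.0   # one 1.0 per row, in that label's column
    return encoded

print(one_hot([0, 2, 1], 3))
```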

### INPUT (CONVOLUTION) LAYER

Let's now design our input convolution layer. For our convolutional layer, we will need a set of filters (whose values are the weights) and biases for the outputs. We will use 32 filters. Each filter will be 5 x 5 pixels in size and span all three color channels (RGB) of the input image.

Each filter's values are weights that our model will learn during training. The weights are multiplied against the input values under the filter, which we symbolically represent as Wx.

Each output of the layer will need a bias (which our model will also learn during training). The bias is added to the result of multiplying the weights by the input (Wx + b).

Let's create two TensorFlow variables for our weights and biases. The weights (which we call W1) need to be a 4D tensor. The first two dimensions are the filter size (5 x 5), followed by the number of input channels (3) and the number of outputs (32).

The bias will be a vector of size 32 (one for each output).

We need to give our weights and biases initial values. We will initialize the weights with random values drawn from a truncated normal distribution (standard deviation 0.1) and initialize the biases to 0.1.

In [None]:
tf.set_random_seed(1)   # Set the same seed to get the same initialization as in this demo.

# The weights for the input (convolutional) layer
# 5x5 pixel filter, 3 channels, 32 outputs (filters)
W1 = tf.Variable(tf.truncated_normal([5, 5 , 3, 32], stddev=0.1))

# The bias for the output from the input (convolutional) layer
b1 = tf.Variable(tf.constant(0.1, shape=[32]))

Let's put it together into an input (convolutional) layer. We will use the TensorFlow method tf.nn.conv2d() to apply the filters (our variable W1) to the inputs (our placeholder X), add in the bias (b1), and pass the output through a rectified linear unit (ReLU) activation function.

- No reshape is needed, because the placeholder X is already declared as a batch of 50x50 images with three color channels. If the data were flattened, tf.reshape(X, [-1, 50, 50, 3]) would restore the image shape.
- We will set the stride so the filters slide one pixel at a time in each direction.
- We will set the padding to 'SAME', which zero-pads the edges of the bitmap so the output keeps the 50x50 size.
- Add the bias to the 32 outputs of the convolution.
- Pass the outputs of the input (convolutional) layer through a ReLU activation function.

In [None]:
# The first layer (2D Convolution)

Z1 = tf.nn.conv2d( input=X,            # already shaped [batch, 50, 50, 3]; no reshape needed
                   filter=W1,           
                   strides=[1,1,1,1],
                   padding='SAME') + b1

A1 = tf.nn.relu(Z1)
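To make Wx + b concrete, here is a NumPy sketch of what a stride-1, 'SAME'-padded convolution followed by ReLU computes for a single image. Like `tf.nn.conv2d`, it slides the filter without flipping it (technically a cross-correlation). It assumes odd filter sizes such as 5x5 and is written for clarity, not speed; `conv2d_same` is a hypothetical name, and the random weights use a plain (not truncated) normal distribution.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1, zero-padded ('SAME') 2D convolution of one image.

    x: image, shape (height, width, in_channels)
    w: filters, shape (k_h, k_w, in_channels, out_channels), k_h and k_w odd
    b: biases, shape (out_channels,)
    """
    k_h, k_w, _, out_c = w.shape
    pad_h, pad_w = k_h // 2, k_w // 2
    # Zero-pad the borders so the output keeps the input's height and width
    xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, out_c))
    # Accumulate W·x one filter offset (i, j) at a time
    for i in range(k_h):
        for j in range(k_w):
            patch = xp[i:i + h, j:j + wd, :]   # (h, wd, in_c)
            out += patch @ w[i, j]             # (h, wd, in_c) @ (in_c, out_c)
    return out + b                             # add each filter's bias at every pixel

rng = np.random.default_rng(1)
x = rng.random((50, 50, 3))                    # one 50x50 RGB image
w = rng.normal(0.0, 0.1, size=(5, 5, 3, 32))   # 32 filters of 5x5x3
b = np.full(32, 0.1)                           # bias of 0.1 per filter
a = np.maximum(conv2d_same(x, w, b), 0.0)      # ReLU
print(a.shape)   # (50, 50, 32)
```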

In [None]:
# Let's look at the shape of the output tensor from the activation unit.
# As you can see, it will be 50x50 pixels with 32 channels.
print(A1)

### MAX POOLING LAYER

The max pooling layer takes as input the output of the first layer, a 4D tensor (batch, height, width, channels) with 32 channels. We will use a 2x2 pooling window over each channel, with a stride of 2, which halves the height and width.

In [None]:
# the second layer (max pooling)

Z2 = tf.nn.max_pool(A1, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

In [None]:
# Let's look at the shape of the output tensor from the max pooling layer.
# As you can see, it has been downsampled to 25x25 pixels with 32 channels.
print(Z2)
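A 2x2 max pool with stride 2 keeps the largest value in each non-overlapping 2x2 block of every channel. A NumPy sketch for one image (hypothetical `max_pool_2x2`), assuming even height and width such as 50x50, where 'SAME' and 'VALID' padding produce the same 25x25 result:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2, on an (height, width, channels) array with even height and width."""
    h, w, c = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then take each block's max
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

grid = np.arange(16, dtype=float).reshape(4, 4, 1)   # values 0..15 in one channel
print(max_pool_2x2(grid)[:, :, 0])                   # [[ 5.  7.] [13. 15.]]
```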

### FIRST HIDDEN LAYER

The first hidden layer will take the flattened outputs of the max pooling layer as inputs and produce 256 outputs.

Let's start by flattening the output from the max pooling layer.

In [None]:
F2 = tf.reshape(Z2, [-1, 25*25*32])  # Flatten each 25x25 pixel with 32 channels to single 1D vector
print(F2)
print(F2.get_shape()[1])

Each input will need a weight and each output a bias (which we will train). Each output will be passed through a rectified linear unit (ReLU) activation function.

We will initialize the weights with random values from a truncated normal distribution (standard deviation 0.1) and initialize the biases to 0.1.

In [None]:
# The return value from F2.get_shape()[1] needs to be cast to an int.
W3 = tf.Variable(tf.truncated_normal([int(F2.get_shape()[1]), 256], stddev=0.1))

b3 = tf.Variable(tf.constant(0.1, shape=[256]))

Let's construct the first hidden layer

- Create a node that will multiply the weights (W3) against the outputs of the max pooling layer (F2)
- Create a node that adds the bias (b3) to the above node (F2 * W3).
- Pass the output of the hidden layer through a dropout layer
- Pass the outputs from the dropout layer through a RELU activation function

In [None]:
# The third layer (first hidden layer)
Z3 = tf.add(tf.matmul(F2, W3), b3)

# Let's add the dropout layer to the output signal from the first hidden layer
D3 = tf.nn.dropout(Z3, keep_prob=D)

# Let's add the activation function to the output signal from the dropout layer
A3 = tf.nn.relu(D3)
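`tf.nn.dropout` (TensorFlow 1.x) keeps each element with probability `keep_prob`, scales the survivors by `1 / keep_prob`, and sets the rest to zero, so the expected value of each output is unchanged. That is why evaluation can simply feed `D` a value of 1.0. A NumPy sketch of this "inverted dropout" behaviour (the `dropout` function here is illustrative, not TensorFlow's code):

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: keep each element with probability keep_prob and scale it by 1/keep_prob."""
    if keep_prob >= 1.0:
        return x                               # evaluation: pass through unchanged
    mask = rng.random(x.shape) < keep_prob     # True where the unit is kept
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
z = np.ones((4, 256))
d = dropout(z, 0.9, rng)
print(np.unique(d))        # each value is either 0.0 or 1/0.9
```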

### SECOND HIDDEN LAYER

The second hidden layer will have 256 inputs (outputs from the first hidden layer) and 20 outputs. Each input will need a weight and each output a bias (which we will train). Each output will be passed through a rectified linear unit (ReLU) activation function.

We will initialize the weights using a random value initializer (Xavier) and initialize the biases to zero.

In [None]:
W4 = tf.get_variable("W4", [256, 20], initializer=tf.contrib.layers.xavier_initializer(seed=1))
b4 = tf.get_variable("b4", [1, 20], initializer=tf.zeros_initializer())
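`tf.contrib.layers.xavier_initializer()` defaults to the uniform variant of Glorot/Xavier initialization, drawing each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). A NumPy sketch for a 256 x 20 matrix like `W4` (hypothetical `xavier_uniform`; the random values will differ from TensorFlow's even with the same seed):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=1):
    """Glorot/Xavier uniform initialization for a [fan_in, fan_out] weight matrix."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

w4_demo = xavier_uniform(256, 20)
print(w4_demo.shape, np.sqrt(6.0 / (256 + 20)))   # (256, 20) and a limit of about 0.147
```

Scaling the range by both fan-in and fan-out keeps the variance of the signal roughly constant from layer to layer, which is the motivation for this initializer.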

Let's construct the second hidden layer

- Create a node that will multiply the weights (W4) against the outputs of the first hidden layer (A3).
- Create a node that adds the bias (b4) to the above node (A3 * W4).
- Pass the outputs from the second hidden layer through a ReLU activation function.

In [None]:
# The fourth layer (second hidden layer)
Z4 = tf.add(tf.matmul(A3, W4), b4) 

# Let's add the activation function to the output signal from the second hidden layer
A4 = tf.nn.relu(Z4)

### OUTPUT LAYER

The output layer will have 20 inputs (outputs from the second hidden layer) and 3 outputs (one for each type of fruit). Each input will need a weight and each output a bias (which we will train). The 3 outputs will be passed through a softmax activation function. 

We will initialize the weights using a random value initializer (Xavier) and initialize the biases to zero.

In [None]:
W5 = tf.get_variable("W5", [20, 3], initializer=tf.contrib.layers.xavier_initializer(seed=1))
b5 = tf.get_variable("b5", [1, 3], initializer=tf.zeros_initializer())

Let's construct the output layer

- Create a node that will multiply the weights (W5) against the outputs of the second hidden layer (A4).
- Create a node that adds the bias (b5) to the above node (A4 * W5).
- The softmax is not applied here; it is applied to these outputs (logits) inside the cost function.

In [None]:
# The fifth layer (output layer)
Z5 = tf.add(tf.matmul(A4, W5), b5) 
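Tallying the trainable parameters layer by layer shows where the model's capacity sits. With the shapes defined above, over 99% of the parameters are in the first fully connected layer (`W3`/`b3`), because it connects all 25 × 25 × 32 = 20000 pooled values to 256 outputs. The arithmetic as a small sketch:

```python
import math

# (weight shape, bias size) per layer, copied from the variables defined above
layer_shapes = {
    "conv W1/b1": ((5, 5, 3, 32), 32),
    "fc1  W3/b3": ((25 * 25 * 32, 256), 256),
    "fc2  W4/b4": ((256, 20), 20),
    "out  W5/b5": ((20, 3), 3),
}

total_params = 0
for name, (weight_shape, bias_size) in layer_shapes.items():
    n_params = math.prod(weight_shape) + bias_size
    total_params += n_params
    print(f"{name}: {n_params:,}")
print(f"total: {total_params:,}")   # total: 5,127,891
```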

### OPTIMIZER

Now it's time to design our optimizer. Let's start with the cost function. We will use the mean of the softmax cross-entropy between the predicted labels and the actual labels. This is the value we want to reduce on each batch.

In [None]:
# Softmax is applied inside the loss, so we pass the raw outputs (logits) Z5, not softmax probabilities
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=Z5, labels=Y))
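`softmax_cross_entropy_with_logits_v2` applies softmax to the raw scores (logits) and computes the cross-entropy against the one-hot labels in one numerically stable step, and `reduce_mean` then averages over the batch. A NumPy sketch of the same quantity (hypothetical `softmax_cross_entropy`):

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean over the batch of the cross-entropy between one-hot labels and softmax(logits)."""
    # Subtract each row's max before exponentiating, for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1).mean()

demo_logits = np.array([[0.0, 0.0, 0.0],    # no preference: loss = ln(3), about 1.0986
                        [5.0, 0.0, 0.0]])   # confident and correct: loss close to 0
demo_labels = np.array([[1.0, 0.0, 0.0],
                        [1.0, 0.0, 0.0]])
print(softmax_cross_entropy(demo_logits, demo_labels))
```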

Let's design our optimizer. This is the method that adjusts the values of the weights and biases, based on minimizing the cost value during training.

We also need to set a learning rate, which is multiplied against the gradient to set the size of each update step. Too large a learning rate causes huge swings in the weights, so training can overshoot, oscillate, or diverge; too small a learning rate makes training converge very slowly. We will set the learning rate at run time through the placeholder L.

In [None]:
# The learning rate for the gradient descent algorithm is not hard-coded here;
# it is fed at run time through the placeholder L (see learning_rate below)

optimizer = tf.train.GradientDescentOptimizer(L).minimize(cost)
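`GradientDescentOptimizer` applies the plain update w ← w − learning_rate × gradient to every trainable variable. The toy example below minimizes f(w) = w², whose gradient is 2w, to show why the step size matters: a small rate shrinks w toward the minimum, while a rate above 1.0 overshoots further on every step.

```python
def gradient_descent(learning_rate, w=1.0, steps=20):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of w**2
        w = w - learning_rate * grad   # the gradient descent update
    return w

print(gradient_descent(0.1))   # about 0.0115: converging toward 0
print(gradient_descent(1.1))   # about 38.3: |w| grows every step, diverging
```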

### Run the Graph

We've built our Tensorflow graph for training our data. So, let's start training it.

First, we need to call Tensorflow's global_variables_initializer() method to initialize the variables we've defined. We will create this as another node, which will be the first node we run (evaluate) in our graph.

In [None]:
init = tf.global_variables_initializer()

It's also a good idea to know how long your training takes, so let's import the time library.

In [None]:
import time

Let's set our hyperparameters.

We need to set the number of epochs (how many times we run the training data through the neural network) and the batch size. A batch is a small subset of the training set; in each epoch we feed the training data through one batch at a time. After each batch, the cost is computed and backpropagated through the neural network to update the weights and biases.

In [None]:
import time

epochs = 25                                    # run 25 epochs
batch_size = 32                                # for each epoch, train in batches of 32 images
number_of_images = len(X_train)                # number of images in training data

# Feed Dictionary Parameters
keep_prob = 0.9                                # fraction of outputs to keep in the dropout layer
learning_rate = 0.02                           # the learning rate for gradient descent

In [None]:
def train():
    start = time.time()

    with tf.Session() as sess:
        # Initialize the variables
        sess.run(init)

        # number of batches in an epoch (any final partial batch is skipped)
        batches = number_of_images // batch_size

        # run our training data through the neural network for each epoch
        for epoch in range(epochs):

            epoch_cost = 0

            # Run the training data through the neural network
            for batch in range(batches):

                # Calculate the start and end indices for the next batch
                begin = batch * batch_size
                end = begin + batch_size

                # Get the next sequential batch from the training data
                batch_xs, batch_ys = X_train[begin:end], Y_train[begin:end]

                # Feed this batch through the neural network.
                _, batch_cost = sess.run([optimizer, cost], feed_dict={X: batch_xs, Y: batch_ys, D: keep_prob, L: learning_rate})

                epoch_cost += batch_cost

            # Report the average cost per batch for this epoch
            print("Epoch: ", epoch, epoch_cost / batches)

        print("Training Time:", time.time() - start)

        # Test the Model

        # Take the class with the highest output (logit) per image as the prediction,
        # comparing along axis 1 (the classes), not axis 0 (the batch)
        prediction = tf.equal(tf.argmax(Z5, 1), tf.argmax(Y, 1))

        # Let's create another node for calculating the accuracy
        accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))

        # Now let's run our training images through the model to calculate our accuracy during training
        # Note how we set the keep probability for dropout to 1.0 (no dropout) when we are evaluating the accuracy.
        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train, D: 1.0}))

        # Now let's run our test images through the model to calculate our accuracy on the test data
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test, D: 1.0}))
        
train()
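Accuracy compares, image by image, the index of the largest logit with the index of the 1 in the one-hot label. With a batch of shape [n, 3], `argmax` must run along axis 1 (across the classes), not axis 0 (across the images). A NumPy illustration with made-up scores:

```python
import numpy as np

demo_logits = np.array([[2.0, 0.1, 0.3],    # predicts class 0
                        [0.2, 1.5, 0.1],    # predicts class 1
                        [0.9, 0.1, 0.4]])   # predicts class 0
demo_labels = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1]])

# Per-image predicted class vs. per-image true class (argmax over the class axis)
correct = np.argmax(demo_logits, axis=1) == np.argmax(demo_labels, axis=1)
accuracy = correct.astype(np.float32).mean()
print(correct, accuracy)   # [ True  True False] and about 0.667
```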