# Welcome to my Convolutional Neural Network + Random Forest Classification Notebook
The aim of this notebook is to use the pretrained Convolutional Neural Network 'VGG-16' as a feature extractor, together with a 'Random Forest' classifier, in order to classify images of handwritten digits.

I will be using the Keras library (through TensorFlow) in this notebook.


## Packages used in this notebook

In [None]:
#!pip install tqdm
#!pip install tensorflow-gpu

I will use a GPU to accelerate the VGG-16 forward passes during the training and prediction phases.

In [None]:
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


## Importing the 'MNIST' Dataset

The MNIST dataset contains 70 000 images of handwritten digits ranging from 0 to 9.<br>
The training set contains 60 000 images, while the test set contains 10 000 images.

Importing and preprocessing the images

In [None]:
from tensorflow.keras import datasets
import numpy as np
import cv2

SIZE = 100  # side length, in pixels, of the images fed to VGG-16

(x_train_raw, y_train_raw), (x_test_raw, y_test_raw) = datasets.mnist.load_data()

# Resize each 28x28 grayscale digit to SIZE x SIZE and expand it to three
# channels, since VGG-16 expects three-channel input
x_train = []
for img in x_train_raw:
  img = cv2.resize(img, (SIZE, SIZE))
  img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
  x_train.append(img)

x_train = np.array(x_train)

x_test = []
for img in x_test_raw:
  img = cv2.resize(img, (SIZE, SIZE))
  img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
  x_test.append(img)

x_test = np.array(x_test)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
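VGG-16 expects three-channel input, whereas MNIST digits have a single grayscale channel. As a minimal sketch of what a gray-to-colour conversion does to the array shape, the example below copies the channel three times with NumPy on a synthetic image. `gray_to_three_channels` is a hypothetical helper written for illustration; it is not part of this notebook or of OpenCV.

```python
import numpy as np

def gray_to_three_channels(img):
    """Copy a 2-D grayscale image into three identical channels (H, W, 3)."""
    return np.repeat(img[:, :, np.newaxis], 3, axis=2)

# Synthetic 28x28 image standing in for one MNIST digit (assumption: random pixels)
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

rgb = gray_to_three_channels(gray)
print(gray.shape, "->", rgb.shape)
```

All three channels hold the same values, so the digit still looks gray; only the shape changes to match what the network accepts.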


## Importing the pretrained 'VGG-16' model
The aim of this part is to exploit the ability of the pretrained model to recognize visual features.

I will freeze the layers of this model so that its weights do not change during the training phase.

In [None]:
# Import from tensorflow.keras for consistency with the dataset import above
from tensorflow.keras.applications.vgg16 import VGG16
with tf.device('/device:GPU:0'):
  vgg16_mdl = VGG16(weights='imagenet', include_top=False, input_shape=(SIZE, SIZE, 3))
  for layer in vgg16_mdl.layers:
    layer.trainable = False
vgg16_mdl.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 100, 100, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 100, 100, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 100, 100, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 50, 50, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 50, 50, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 50, 50, 128)      

## Importing and Training Random Forest classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

# scikit-learn estimators run on the CPU, so no TensorFlow device scope is needed
classifier_mdl = RandomForestClassifier()

In [None]:
# Function used to split data into batches
def batch(iterable, size=1):
    l = len(iterable)
    for ndx in range(0, l, size):
        yield iterable[ndx:min(ndx + size, l)]
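To make the behaviour of the batching helper concrete, here is a small self-contained sketch. It repeats the function (with renamed local variables) and applies it to toy data, then counts the batches produced for the split sizes used in this notebook (500 for training, 200 for testing).

```python
def batch(iterable, size=1):
    """Yield consecutive slices of `iterable`, each with at most `size` items."""
    total = len(iterable)
    for start in range(0, total, size):
        yield iterable[start:min(start + size, total)]

# The last batch is shorter when the length is not a multiple of `size`
small = list(batch(list(range(10)), 4))
print(small)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Batch counts for the dataset and batch sizes used in this notebook
n_train_batches = len(list(batch(range(60000), 500)))
n_test_batches = len(list(batch(range(10000), 200)))
print(n_train_batches, n_test_batches)  # 120 50
```

The helper works on anything that supports `len()` and slicing, which includes lists, ranges, and NumPy arrays.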

Training the Random Forest Classifier

In [None]:
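from tqdm import tqdm

with tf.device('/device:GPU:0'):
  x_train_batches = list(batch(x_train, 500))

  # Extract VGG-16 features batch by batch to keep GPU memory bounded
  train_features = []
  for i in tqdm(range(len(x_train_batches))):
      base_output = vgg16_mdl.predict(x_train_batches[i])
      train_features.append(base_output.reshape(base_output.shape[0], -1))

# RandomForestClassifier.fit() rebuilds the forest on every call, so fitting
# once per batch would keep only the last batch; fit once on all features
classifier_mdl.fit(np.concatenate(train_features), y_train_raw)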

100%|██████████| 120/120 [03:52<00:00,  1.94s/it]
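The reshape step flattens each VGG-16 output into one feature vector per image. As a rough sketch of how long that vector is, the helper below assumes the standard VGG-16 layout without its top: five 2x2 max-pooling stages with stride 2, and 512 channels in the last block. `vgg16_feature_length` is a hypothetical name used only for this illustration.

```python
def vgg16_feature_length(size, n_pools=5, channels=512):
    """Length of the flattened VGG-16 (no top) output for a size x size input."""
    side = size
    for _ in range(n_pools):
        side //= 2  # each 2x2 max-pool with stride 2 halves the side, rounding down
    return side * side * channels

# SIZE = 100 in this notebook: 100 -> 50 -> 25 -> 12 -> 6 -> 3
n_features = vgg16_feature_length(100)
print(n_features)  # 4608

# Approximate size of the float32 feature matrix for the 60 000 training images
approx_gb = 60000 * n_features * 4 / 1e9
print(round(approx_gb, 2), "GB")
```

Under these assumptions, each image is described by 4608 values, and the features for the full training set occupy a little over 1 GB as 32-bit floats.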


## Evaluate the model
In this part, we use our stacked model 'VGG-16 + Random Forest Classifier' to predict the labels of the test set.

In [None]:
with tf.device('/device:GPU:0'):
  x_test_batches = list(batch(x_test, 200))
  pred = []
  for i in tqdm(range(len(x_test_batches))):
      base_output = vgg16_mdl.predict(x_test_batches[i])
      inter_output = base_output.reshape(base_output.shape[0], -1)
      pred += classifier_mdl.predict(inter_output).tolist()


100%|██████████| 50/50 [00:28<00:00,  1.75it/s]


Using only the pretrained ImageNet weights of the VGG-16 CNN model and training nothing but the classifier, the recorded run reached roughly 90% accuracy (0.8999) on the test set.

In [None]:
from sklearn import metrics

print("Accuracy = ", metrics.accuracy_score(y_test_raw, pred))

Accuracy =  0.8999
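For reference, accuracy is simply the fraction of test images whose predicted label matches the true label, so a score of 0.8999 on 10 000 images corresponds to 8999 correct predictions. The sketch below computes it by hand on made-up toy labels; the `accuracy` function is a hypothetical stand-in for illustration, not the scikit-learn implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy labels, made up for illustration (one mistake out of ten)
y_true = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y_pred = [3, 1, 4, 1, 5, 9, 2, 6, 5, 8]
print(accuracy(y_true, y_pred))  # 0.9
```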
