## Implementing AlexNet CNN Architecture Using TensorFlow 2.0+ and Keras

Learn how to implement the neural network architecture that kicked off the deep convolutional neural network revolution back in 2012.

https://towardsdatascience.com/implementing-alexnet-cnn-architecture-using-tensorflow-2-0-and-keras-2113e090ad98

In [None]:
!pip install scikit-learn

In [1]:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import os
import time

In [2]:
tf.__version__

'2.7.0'

In [3]:
keras.__version__

'2.7.0'

## 2. Dataset
https://www.cs.toronto.edu/~kriz/cifar.html

The CIFAR-10 dataset contains 60,000 colour images, each with dimensions 32x32px. The content of the images within the dataset is sampled from 10 classes.

CIFAR-10 images were aggregated by some of the creators of the AlexNet network, Alex Krizhevsky and Geoffrey Hinton.
The deep learning Keras library provides direct access to the CIFAR10 dataset with relative ease, through its dataset module. Accessing common datasets such as CIFAR10 or MNIST, becomes a trivial task with Keras.

In [4]:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

In [5]:
CLASS_NAMES= ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

The CIFAR dataset is partitioned into 50,000 training data and 10,000 test data by default. The last partition of the dataset we require is the validation data.
The validation data is obtained by taking the last 5000 images within the training data.

In [6]:
# validation_images, validation_labels = train_images[:1000], train_labels[:1000]
# train_images, train_labels = train_images[1000:20000], train_labels[1000:20000]


from sklearn.model_selection import train_test_split

train_images, validation_images, train_labels, validation_labels = train_test_split(train_images, train_labels,
                                                                                    test_size= 0.3,
                                                                                   shuffle=True,
                                                                                   random_state=42,
                                                                                   stratify=train_labels)

train_images, validation_images, train_labels, validation_labels = train_test_split(train_images, train_labels,
                                                                                    test_size= 0.25,
                                                                                   shuffle=True,
                                                                                   random_state=42,
                                                                                   stratify=train_labels)

train_images.shape, train_labels.shape, validation_images.shape, validation_labels.shape

((26250, 32, 32, 3), (26250, 1), (8750, 32, 32, 3), (8750, 1))

To be able to access these methods and procedures, it is required that we transform our dataset into an efficient data representation TensorFlow is familiar with. This is achieved using the tf.data.Dataset API.
More specifically, tf.data.Dataset.from_tensor_slices method takes the train, test, and validation dataset partitions and returns a corresponding TensorFlow Dataset representation.

In [None]:
train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
validation_ds = tf.data.Dataset.from_tensor_slices((validation_images, validation_labels))

## 3. Preprocessing
Preprocessing within any machine learning is associated with the transformation of data from one form to another.
Usually, preprocessing is conducted to ensure the data utilized is within an appropriate format.

In [None]:
# Excuse the blurriness of the images; the CIFAR-10 images have small dimensions, which makes visualization of the actual pictures a bit difficult.
plt.figure(figsize=(20,20))
for i, (image, label) in enumerate(train_ds.take(5)):
    ax = plt.subplot(5,5,i+1)
    plt.imshow(image)
    plt.title(CLASS_NAMES[label.numpy()[0]])
    plt.axis('off')
plt.show()

The primary preprocessing transformations that will be imposed on the data presented to the network are:
- Normalizing and standardizing the images.
- Resizing of the images from 32x32 to 227x227. The AlexNet network input expects a 227x227 image.

In [None]:
def process_images(image, label):
    # Normalize images to have a mean of 0 and standard deviation of 1
    image = tf.image.per_image_standardization(image)
    # Resize images from 32x32 to 227x227
    image = tf.image.resize(image, (227,227))
    return image, label

## 4. Data/Input Pipeline
An input/data pipeline is described as a series of functions or methods that are called consecutively one after another. 

In [None]:
import tensorflow as tf
tf.test.is_gpu_available()

In [None]:
train_ds_size = tf.data.experimental.cardinality(train_ds).numpy()
test_ds_size = tf.data.experimental.cardinality(test_ds).numpy()
validation_ds_size = tf.data.experimental.cardinality(validation_ds).numpy()
print("Training data size:", train_ds_size)
print("Test data size:", test_ds_size)
print("Validation data size:", validation_ds_size)

For our basic input/data pipeline, we will conduct three primary operations:
- Preprocessing the data within the dataset
- Shuffle the dataset
- Batch data within the dataset

In [None]:
train_ds = (train_ds
                  .map(process_images)
                  .shuffle(buffer_size=train_ds_size)
                  .batch(batch_size=4, drop_remainder=True))
test_ds = (test_ds
                  .map(process_images)
                  .shuffle(buffer_size=train_ds_size)
                  .batch(batch_size=4, drop_remainder=True))
validation_ds = (validation_ds
                  .map(process_images)
                  .shuffle(buffer_size=train_ds_size)
                  .batch(batch_size=4, drop_remainder=True))

## 5. Model Implementation
Here are the types of layers the AlexNet CNN architecture is composed of, along with a brief description:

**Convolutional layer**: A convolution is a mathematical term that describes a dot product multiplication between two sets of elements. Within deep learning the convolution operation acts on the filters/kernels and image data array within the convolutional layer. Therefore a convolutional layer is simply a layer the houses the convolution operation that occurs between the filters and the images passed through a convolutional neural network.

**Batch Normalisation layer**: Batch Normalization is a technique that mitigates the effect of unstable gradients within a neural network through the introduction of an additional layer that performs operations on the inputs from the previous layer. The operations standardize and normalize the input values, after that the input values are transformed through scaling and shifting operations.

**MaxPooling layer**: Max pooling is a variant of sub-sampling where the maximum pixel value of pixels that fall within the receptive field of a unit within a sub-sampling layer is taken as the output. The max-pooling operation below has a window of 2x2 and slides across the input data, outputting an average of the pixels within the receptive field of the kernel.

**Flatten layer**: Takes an input shape and flattens the input image data into a one-dimensional array.
Dense Layer: A dense layer has an embedded number of arbitrary units/neurons within. Each neuron is a perceptron.

### Some other operations and techniques utilized within the AlexNet CNN that are worth mentioning are:

**Activation Function**: A mathematical operation that transforms the result or signals of neurons into a normalized output. The purpose of an activation function as a component of a neural network is to introduce non-linearity within the network. The inclusion of an activation function enables the neural network to have greater representational power and solve complex functions.

**Rectified Linear Unit** Activation Function(ReLU): A type of activation function that transforms the value results of a neuron. The transformation imposed by ReLU on values from a neuron is represented by the formula y=max(0,x). The ReLU activation function clamps down any negative values from the neuron to 0, and positive values remain unchanged. The result of this mathematical transformation is utilized as the output of the current layer and used as input to a consecutive layer within a neural network.

**Softmax Activation Function**: A type of activation function that is utilized to derive the probability distribution of a set of numbers within an input vector. The output of a softmax activation function is a vector in which its set of values represents the probability of an occurrence of a class or event. The values within the vector all add up to 1.

**Dropout**: Dropout technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contributions from connected neurons.

In [None]:
model = keras.models.Sequential([
    keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(227,227,3)),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

## 6. TensorBoard
TensorBoard is a tool that provides a suite of visualization and monitoring mechanisms. For the work in this tutorial, we’ll be utilizing TensorBoard to monitor the progress of the training of the network.

More specifically, we’ll be monitoring the following metrics: training loss, training accuracy, validation loss, validation accuracy.

In the shortcode snippet below we are creating a reference to the directory we would like all TensorBoard files to be stored within. The function get_run_logdir returns the location of the exact directory that is named according to the current time the training phase starts.

To complete this current process, we pass the directory to store TensorBoard related files for a particular training session to the TensorBoard callback.

In [None]:
root_logdir = os.path.join(os.curdir, "logs","fit","")
def get_run_logdir():
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)
run_logdir = get_run_logdir()
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

## 7. Training and Results

To train the network, we have to compile it.

The compilation processes involve specifying the following items:

**Loss function**: A method that quantifies ‘how well’ a machine learning model performs. The quantification is an output(cost) based on a set of inputs, which are referred to as parameter values. The parameter values are used to estimate a prediction, and the ‘loss’ is the difference between the predictions and the actual values.

**Optimization Algorithm**: An optimizer within a neural network is an algorithmic implementation that facilitates the process of gradient descent within a neural network by minimizing the loss values provided via the loss function. To reduce the loss, it is paramount the values of the weights within the network are selected appropriately.

**Learning Rate**: An integral component of a neural network implementation detail as it’s a factor value that determines the level of updates that are made to the values of the weights of the network. Learning rate is a type of hyperparameter.

In Fit method: 

**Epoch**: This is a numeric value that indicates the number of time a network has been exposed to all the data points within a training dataset.

In [None]:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.optimizers.SGD(lr=0.001),
              metrics=['accuracy'])
# model.summary()

In [None]:
model.fit(train_ds,
          epochs=50,          
          validation_data=validation_ds,
          validation_freq = 1,
          workers = 48,
          callbacks=[tensorboard_cb],
          use_multiprocessing=True
         )

In [None]:
model.save('Saved_model/CIFAR10_Alexnet_50_epochs.h5')

## Evaluate Model
After executing the cell block below, we are presented with a score that indicates the performance of the model on unseen data.

In [None]:
model.evaluate(test_ds)

The first element of the returned result contains the evaluation loss: 1.115, the second element indicates is the evaluation accuracy 0.785.

The custom implemented AlexNet network that was trained, validated, and evaluated on the CIFAR-10 dataset to create a model with an evaluation accuracy of 78.5% on a test dataset containing 2500 data points.

In [None]:
# # total_images = 10000
# # batch size = 10
# # steps per epoch = 1000
# # epochs = 1

# total_images :
#     train = 26250
#     val = 8750
# batch size = 4
# epochs = 50
# steps per epoch = 6562


In [None]:
!tensorboard --logdir logs --host 0.0.0.0


NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

TensorBoard 2.7.0 at http://0.0.0.0:6006/ (Press CTRL+C to quit)


## Using Pretrained weights
https://keras.io/api/applications/#available-models
Let’s learn how to classify images with pre-trained Convolutional Neural Networks using the Keras library.

Keras Applications are deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.

Weights are downloaded automatically when instantiating a model. They are stored at ~/.keras/models/.

Upon instantiation, the models will be built according to the image data format set in your Keras configuration file at ~/.keras/keras.json. For instance, if you have set image_data_format=channels_last, then any model loaded from this repository will get built according to the TensorFlow data format convention, "Height-Width-Depth".

### Note: each Keras Application expects a specific kind of input preprocessing. 

In [44]:
# The default input size for this model is 224x224.
# Note: each Keras Application expects a specific kind of input preprocessing. 
# For VGG16, call tf.keras.applications.vgg16.preprocess_input on your inputs before passing them to the model. 
# vgg16.preprocess_input will convert the input images from RGB to BGR, 
# then will zero-center each color channel with respect to the ImageNet dataset, without scaling.

# include_top: whether to include the 3 fully-connected layers at the top of the network.
# weights: one of None (random initialization), 'imagenet' (pre-training on ImageNet), or the path to the weights file to be loaded.
# input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input for the model.
# input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) (with channels_last data format) or (3, 224, 224) (with channels_first data format). It should have exactly 3 input channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be one valid value.
# pooling: Optional pooling mode for feature extraction when include_top is False. - None means that the output of the model will be the 4D tensor output of the last convolutional block. - avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor. - max means that global max pooling will be applied.
# classes: optional number of classes to classify images into, only to be specified if include_top is True, and if no weights argument is specified.
# classifier_activation: A str or callable. The activation function to use on the "top" layer. Ignored unless include_top=True. Set classifier_activation=None to return the logits of the "top" layer. When loading pretrained weights, classifier_activation can only be None or "softmax".

In [45]:
root_logdir = os.path.join(os.curdir, "logs","fit","")
def get_run_logdir():
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)
run_logdir = get_run_logdir()

In [46]:
from tensorflow.keras.applications import VGG16, VGG19
from tensorflow.keras import Sequential 
from tensorflow.keras.layers import Flatten,Dense

def get_model(model_name, input_shape_=(32,32,3), n_class=10, last_act_func="softmax"):
    pretrained_model = model_name(
        include_top=False,
        weights="imagenet",
        input_tensor=None,
        input_shape=input_shape_,
        pooling=None,
        classes=n_class,
#         classifier_activation=last_act_func,
    )

    model = Sequential(layers=pretrained_model.layers)
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(10, activation=last_act_func))
    
    return model

In [25]:
#training pipeline
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing import image
from tensorflow.keras import applications as tkp
import numpy as np

def train_model(preprocess_with,model_name, train_images, train_labels, validation_images, validation_labels):

    X = preprocess_with.preprocess_input(train_images)
    y = train_labels
    # y =  tf.keras.utils.to_categorical(train_labels, num_classes = 10)

    X_val = preprocess_with.preprocess_input(validation_images)
    y_val = validation_labels
    # y_val =  tf.keras.utils.to_categorical(validation_labels, num_classes = 10)
    print(X.shape, y.shape,X_val.shape, y_val.shape)

    model = get_model(model_name, input_shape_=(32,32,3), n_class=10, last_act_func="softmax")
    
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=SGD(lr=0.001),
                  metrics=['accuracy'])

    tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

    # for layer in model.layers:
    #     layer.trainable = True

    model_history = model.fit(X,y,
          epochs=50,          
          validation_data=(X_val,y_val),
          validation_freq = 1,
          workers = 48,
          callbacks=[tensorboard_cb],
          use_multiprocessing=True
         )
    return model, model_history

In [10]:
# # vgg16_model
# model = get_model(VGG19, input_shape_=(32,32,3), n_class=10, last_act_func="softmax")
# from tensorflow.keras.optimizers import SGD
# model.compile(loss='sparse_categorical_crossentropy',
#               optimizer=SGD(lr=0.001),
#               metrics=['accuracy'])

# tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

# # for layer in model.layers:
# #     layer.trainable = True

# model.fit(X,y,
#           epochs=50,          
#           validation_data=(X_val,y_val),
#           validation_freq = 1,
#           workers = 48,
#           callbacks=[tensorboard_cb],
#           use_multiprocessing=True
#          )

# train_model(preprocess_with=tkp.vgg16,model_name=VGG16, train_images, train_labels, validation_images, validation_labels)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50

KeyboardInterrupt: 

In [12]:
model.evaluate(test_images, test_labels)



[5.8077521324157715, 0.6036999821662903]

In [14]:
model.evaluate(preprocess_input(test_images), test_labels)



[1.7145371437072754, 0.819100022315979]

In [26]:
model, history = train_model(preprocess_with=tkp.vgg19,model_name=VGG19, train_images=train_images, train_labels=train_labels, validation_images=validation_images, validation_labels=validation_labels)

(26250, 32, 32, 3) (26250, 1) (8750, 32, 32, 3) (8750, 1)


  super(SGD, self).__init__(name, **kwargs)


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [27]:
model.evaluate(preprocess_input(test_images), test_labels)



[1.7842907905578613, 0.8228999972343445]

In [32]:
model.save('Saved_model/CIFAR10_VGG19_epochs.h5')

history.history.keys()

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

In [40]:
from tensorflow.keras.applications import Xception , MobileNet
# Xception - minimum input size 71 X 71  is req.
# MobileNetV2 - minimum input size 71 X 71  is req.

model, history = train_model(preprocess_with=tkp.mobilenet,
                             model_name=MobileNet,
                             train_images=train_images, train_labels=train_labels, 
                             validation_images=validation_images, validation_labels=validation_labels)

(26250, 32, 32, 3) (26250, 1) (8750, 32, 32, 3) (8750, 1)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf_no_top.h5
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [42]:
model.evaluate(preprocess_input(test_images), test_labels)
model.save('Saved_model/CIFAR10_MobileNet_epochs.h5')



In [43]:
model.evaluate(preprocess_input(test_images), test_labels)



[3.76102614402771, 0.21379999816417694]