# Setup

Ensure that Python version 3.7 or above is installed:

In [None]:
import sys

assert sys.version_info >= (3, 7)

Import TensorFlow ≥ 2.8 and math:

In [None]:
from packaging import version
import tensorflow as tf
import math

assert version.parse(tf.__version__) >= version.parse("2.8.0")
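Comparing version strings directly would be a tempting shortcut for this check, but strings are ordered character by character. The short plain-Python sketch below shows the pitfall that `version.parse` avoids. `as_tuple` is a hypothetical helper, not part of `packaging`:

```python
# Illustrative only: why plain string comparison is not enough for version checks.
print("2.10.0" >= "2.8.0")  # False: characters are compared one by one, and "1" < "8"

def as_tuple(v):
    """Hypothetical helper: turn a simple 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))

print(as_tuple("2.10.0") >= as_tuple("2.8.0"))  # True: (2, 10, 0) >= (2, 8, 0)
```

The tuple trick only handles plain `X.Y.Z` strings. `packaging.version.parse` also understands suffixes such as release candidates, which is presumably why it is used here.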

# Building the Deep Neural Network

I initially built the neural network with 5 hidden layers of 50 neurons each, though I ended up doubling the number of layers twice until I had 20. I also initially employed the swish activation function and the he_normal kernel initializer. However, to ensure that the network self-normalizes, I standardized the input features, changed the activation function to selu, and switched to LeCun normal initialization.

In [None]:
tf.random.set_seed(42)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    model.add(tf.keras.layers.Dense(50,
                                    activation="selu",
                                    kernel_initializer="lecun_normal"))
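To see why selu combined with LeCun normal initialization is expected to keep activations normalized, here is a minimal pure-Python sketch. It rests on several assumptions: 50 standardized input features instead of the 3,072 flattened pixels, plain Gaussian weights with variance 1/fan_in as an approximation of Keras's truncated `lecun_normal`, and ReLU swapped in only as a contrast. The sketch pushes a random batch through 20 dense layers and reports the average per-neuron mean and standard deviation of the last layer's outputs:

```python
import math
import random
from operator import mul

# SELU constants (the values used by the selu activation)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(z):
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * (math.exp(z) - 1))

def relu(z):
    return z if z > 0 else 0.0

def forward(inputs, activation, n_layers=20, width=50, seed=42):
    """Push a batch through n_layers dense layers with LeCun-normal-style weights.

    Assumption: plain Gaussian weights with std sqrt(1 / fan_in) approximate
    Keras's truncated lecun_normal initializer; biases are omitted (zero).
    """
    rng = random.Random(seed)
    std = math.sqrt(1 / width)
    z = inputs
    for _ in range(n_layers):
        # weights[k] holds the incoming weights of output neuron k
        weights = [[rng.gauss(0, std) for _ in range(width)] for _ in range(width)]
        z = [[activation(sum(map(mul, row, w))) for w in weights] for row in z]
    return z

def column_stats(z):
    """Mean and std of each neuron over the batch, averaged over all neurons."""
    n = len(z)
    cols = list(zip(*z))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return sum(means) / len(means), sum(stds) / len(stds)

# Synthetic standardized inputs: 64 samples x 50 features drawn from N(0, 1)
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1) for _ in range(50)] for _ in range(64)]

selu_mean, selu_std = column_stats(forward(X, selu))
relu_mean, relu_std = column_stats(forward(X, relu))
print(f"selu after 20 layers: mean {selu_mean:.2f}, std {selu_std:.2f}")
print(f"relu after 20 layers: mean {relu_mean:.4f}, std {relu_std:.4f}")
```

With selu, the mean and standard deviation should stay close to 0 and 1 after all 20 layers, whereas with ReLU under the same initialization the activations shrink by roughly a factor of √2 per layer.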

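After some lackluster results, I also added regularization in the form of alpha dropout, a variant of dropout that preserves the mean and standard deviation of its inputs and is therefore compatible with selu self-normalization.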

In [None]:
model.add(tf.keras.layers.AlphaDropout(rate = 0.1))
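Alpha dropout does not simply zero out units. It sets dropped units to selu's negative saturation value and then applies an affine correction so that inputs with zero mean and unit variance keep those statistics. The pure-Python sketch below follows the formulas that Keras's `AlphaDropout` is understood to use. This correspondence is an assumption, and the sketch is not the library code. It compares alpha dropout with standard inverted dropout on synthetic N(0, 1) data, which stands in for self-normalized activations:

```python
import random

# SELU constants; dropped units are set to ALPHA_P, selu's negative saturation value
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805
ALPHA_P = -ALPHA * SCALE

def alpha_dropout(xs, rate, rng):
    """Sketch of alpha dropout: replace dropped units by ALPHA_P, then rescale by a, shift by b."""
    keep = 1 - rate
    a = (keep * (1 + rate * ALPHA_P ** 2)) ** -0.5   # restores unit variance
    b = -a * ALPHA_P * rate                           # restores zero mean
    return [a * (x if rng.random() >= rate else ALPHA_P) + b for x in xs]

def inverted_dropout(xs, rate, rng):
    """Standard dropout: zero the unit, rescale survivors by 1 / keep."""
    keep = 1 - rate
    return [x / keep if rng.random() >= rate else 0.0 for x in xs]

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(42)
activations = [rng.gauss(0, 1) for _ in range(100_000)]  # synthetic, not model activations

alpha_mean, alpha_var = mean_var(alpha_dropout(activations, 0.1, rng))
plain_mean, plain_var = mean_var(inverted_dropout(activations, 0.1, rng))
print(f"alpha dropout: mean {alpha_mean:.3f}, variance {alpha_var:.3f}")
print(f"plain dropout: mean {plain_mean:.3f}, variance {plain_var:.3f}")
```

With rate=0.1, alpha dropout should leave the mean near 0 and the variance near 1. Standard dropout, by contrast, inflates the variance to about 1/0.9 ≈ 1.11, which would push a selu network away from its self-normalizing regime.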

# Training the Network on the Dataset

Next, the network was trained on the CIFAR10 dataset, initially with Nadam optimization. I chose CIFAR10 because it is one of the most common datasets for learning machine learning and is already built into Keras. Furthermore, Géron, the author of our course textbook, provides his own approach to developing a model for this dataset in a Google Colab notebook (Géron, n.d.). I frequently referenced this notebook and used it as a baseline for comparison.

In [None]:
cifar10 = tf.keras.datasets.cifar10.load_data()


The dataset consists of 60,000 color images of 32 × 32 pixels. Of these, 50,000 make up the training set and the remaining 10,000 make up the test set. Since there are 10 classes, the output layer should have 10 neurons with the softmax activation function, which converts the layer's raw outputs into one probability per class (the probabilities sum to 1). I added the output layer to the model as follows:

In [None]:
model.add(tf.keras.layers.Dense(10, activation = "softmax"))
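As a small illustration of what the softmax output layer computes, here is a pure-Python sketch of a numerically stable softmax. The logits are made-up values, not outputs of this model. Subtracting the largest logit before exponentiating avoids overflow without changing the result:

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 10 CIFAR10 classes (illustration only)
logits = [2.0, 1.0, 0.1] + [0.0] * 7
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(sum(probs))  # sums to 1 up to floating-point rounding
```

The class with the largest logit always receives the largest probability, so taking the argmax of the softmax output gives the predicted class.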

I compiled the model with the stochastic gradient descent (SGD) optimizer (I initially used Nadam), the sparse categorical crossentropy loss function (the labels are integer class indices rather than one-hot vectors), and the accuracy metric to evaluate the performance of the model. When the model is compiled, "the backend automatically chooses the best way to represent the network for training and making predictions to run on [the] hardware" (Brownlee, 2022).

In [None]:
model.compile(loss = "sparse_categorical_crossentropy",
              optimizer = tf.keras.optimizers.SGD(),
              metrics = ["accuracy"])
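For intuition about the loss and metric chosen above, the sketch below computes sparse categorical crossentropy and accuracy by hand on a toy three-class batch. The probabilities are invented for illustration. "Sparse" means the labels are integer class indices, like the CIFAR10 labels:

```python
import math

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the correct class); labels are integers."""
    return -sum(math.log(probs[label]) for label, probs in zip(y_true, y_prob)) / len(y_true)

def accuracy(y_true, y_prob):
    """Fraction of samples whose highest-probability class matches the label."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in y_prob]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

# Toy batch with 3 classes (hypothetical values, not CIFAR10 outputs)
y_true = [0, 2, 1]
y_prob = [[0.7, 0.2, 0.1],
          [0.1, 0.1, 0.8],
          [0.5, 0.3, 0.2]]

loss = sparse_categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(loss)
print(acc)  # 2 of the 3 samples are classified correctly
```

This sketch skips numerical safeguards, such as guarding against log(0), that a library implementation would need.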

Note: The learning rate of the optimizer can be tuned manually by training the model for a fixed number of epochs with each candidate rate and comparing the resulting learning curves.
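The idea in the note can be sketched on a toy problem. In the pure-Python example below, a made-up one-parameter loss, (w - 3)**2, stands in for the network's loss. The example runs gradient descent for a fixed number of epochs with several candidate learning rates and keeps the rate with the lowest final loss:

```python
def learning_curve(lr, epochs=20, w0=0.0):
    """Losses of gradient descent on the toy loss (w - 3)**2, one entry per epoch."""
    w, losses = w0, []
    for _ in range(epochs):
        w -= lr * 2 * (w - 3)          # the gradient of (w - 3)**2 is 2 * (w - 3)
        losses.append((w - 3) ** 2)
    return losses

candidate_rates = [0.001, 0.01, 0.1, 1.1]
curves = {lr: learning_curve(lr) for lr in candidate_rates}
for lr, losses in curves.items():
    print(f"lr={lr}: final loss {losses[-1]:.4g}")

# Keep the rate whose curve ends lowest after the same number of epochs
best_lr = min(curves, key=lambda lr: curves[lr][-1])
print("best:", best_lr)
```

Rates that are too small (0.001, 0.01) barely make progress, 0.1 converges, and 1.1 diverges. With a real network, the same comparison would use the validation loss curves recorded by model.fit.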

# Fitting the Model

In order to train (fit) the model, the data must first be split into subsets. I initially split the data into only a training set and a test set, but I quickly realized I would also need a validation set to properly evaluate the model's performance. The validation set consists of the first 5,000 images of the original training set, leaving 45,000 images for training. The input features are then standardized using the per-pixel mean and standard deviation of the training set.

In [None]:
(X_train_full, y_train_full), (X_test, y_test) = cifar10

X_train = X_train_full[5000:]
y_train = y_train_full[5000:]
X_valid = X_train_full[:5000]
y_valid = y_train_full[:5000]

X_means = X_train.mean(axis = 0)
X_stds = X_train.std(axis = 0)
X_train_scaled = (X_train - X_means) / X_stds
X_valid_scaled = (X_valid - X_means) / X_stds
X_test_scaled = (X_test - X_means) / X_stds
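The standardization in the cell above computes the per-pixel mean and standard deviation on the training subset only. It then reuses those statistics for the validation and test sets, so no information from them leaks into preprocessing. Below is a pure-Python sketch of the same idea on a hypothetical two-feature dataset. It uses the population standard deviation, matching NumPy's default ddof=0:

```python
import math

def fit_standardizer(rows):
    """Per-feature mean and (population) std computed on the training rows only."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return means, stds

def transform(rows, means, stds):
    """Standardize every row with the given (training) statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical pixel-like data (values 0-255), not real CIFAR10 images
train = [[0, 10], [128, 20], [255, 30], [64, 40]]
valid = [[100, 25]]

means, stds = fit_standardizer(train)
train_scaled = transform(train, means, stds)
valid_scaled = transform(valid, means, stds)  # reuses the training statistics
print(means, stds)
print(valid_scaled)
```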

I decided to add 1cycle scheduling after hearing how effective it can be on this particular dataset. For this purpose, I used Géron's (2023) OneCycleScheduler custom callback class, find_learning_rate function, and ExponentialLearningRate callback class from the Chapter 11 code (Géron, n.d.):

In [None]:
K = tf.keras.backend
# Updates the learning rate at the beginning of each batch: increases it linearly
# from start_lr to max_lr during roughly the first half of training, decreases it
# linearly back to start_lr, and finally drops it close to zero (last_lr) over the
# last ~10% of iterations.
class OneCycleScheduler(tf.keras.callbacks.Callback):
    def __init__(self, iterations, max_lr=1e-3, start_lr=None,
                 last_iterations=None, last_lr=None):
        super().__init__()
        self.iterations = iterations
        self.max_lr = max_lr
        self.start_lr = start_lr or max_lr / 10
        self.last_iterations = last_iterations or iterations // 10 + 1
        self.half_iteration = (iterations - self.last_iterations) // 2
        self.last_lr = last_lr or self.start_lr / 1000
        self.iteration = 0

    def _interpolate(self, iter1, iter2, lr1, lr2):
        return (lr2 - lr1) * (self.iteration - iter1) / (iter2 - iter1) + lr1

    def on_batch_begin(self, batch, logs):
        if self.iteration < self.half_iteration:
            lr = self._interpolate(0, self.half_iteration, self.start_lr,
                                   self.max_lr)
        elif self.iteration < 2 * self.half_iteration:
            lr = self._interpolate(self.half_iteration, 2 * self.half_iteration,
                                   self.max_lr, self.start_lr)
        else:
            lr = self._interpolate(2 * self.half_iteration, self.iterations,
                                   self.start_lr, self.last_lr)
        self.iteration += 1
        K.set_value(self.model.optimizer.learning_rate, lr)

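# Multiplies the learning rate by a constant factor at the end of each batch and
# records the learning rate and batch loss, so the loss can be plotted against the rate.
class ExponentialLearningRate(tf.keras.callbacks.Callback):
    def __init__(self, factor):
        super().__init__()
        self.factor = factor
        self.rates = []
        self.losses = []

    def on_epoch_begin(self, epoch, logs=None):
        self.sum_of_epoch_losses = 0

    def on_batch_end(self, batch, logs=None):
        # logs["loss"] is the epoch's running mean loss; recover this batch's
        # loss from the change in the running sum of losses
        mean_epoch_loss = logs["loss"]
        new_sum_of_epoch_losses = mean_epoch_loss * (batch + 1)
        batch_loss = new_sum_of_epoch_losses - self.sum_of_epoch_losses
        self.sum_of_epoch_losses = new_sum_of_epoch_losses
        lr = K.get_value(self.model.optimizer.learning_rate)
        self.rates.append(lr)
        self.losses.append(batch_loss)
        K.set_value(self.model.optimizer.learning_rate, lr * self.factor)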

# Trains the model using the ExponentialLearningRate callback and returns the
# learning rates & batch losses.
def find_learning_rate(model, X, y, epochs=1, batch_size=32, min_rate=1e-4,
                       max_rate=1):
    init_weights = model.get_weights()
    iterations = math.ceil(len(X) / batch_size) * epochs
    factor = (max_rate / min_rate) ** (1 / iterations)
    init_lr = K.get_value(model.optimizer.learning_rate)
    K.set_value(model.optimizer.learning_rate, min_rate)
    exp_lr = ExponentialLearningRate(factor)
    history = model.fit(X, y, epochs=epochs, batch_size=batch_size,
                        callbacks=[exp_lr])
    K.set_value(model.optimizer.learning_rate, init_lr)
    model.set_weights(init_weights)
    return exp_lr.rates, exp_lr.losses
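To make the shape of the 1cycle schedule concrete, the sketch below re-expresses the logic of OneCycleScheduler.on_batch_begin as a plain function and evaluates it with this notebook's settings. `one_cycle_lr` is a hypothetical name, not part of Keras or Géron's code. With 45,000 training images, a batch size of 128, and 10 epochs, there are 352 batches per epoch and 3,520 iterations in total. The last 353 iterations are reserved for the final phase, so half_iteration is (3520 - 353) // 2 = 1583:

```python
import math

def one_cycle_lr(iteration, iterations, max_lr, start_lr=None,
                 last_iterations=None, last_lr=None):
    """Hypothetical standalone version of OneCycleScheduler's per-batch learning rate."""
    start_lr = start_lr or max_lr / 10
    last_iterations = last_iterations or iterations // 10 + 1
    half = (iterations - last_iterations) // 2
    last_lr = last_lr or start_lr / 1000

    def interp(i1, i2, lr1, lr2):
        return (lr2 - lr1) * (iteration - i1) / (i2 - i1) + lr1

    if iteration < half:
        return interp(0, half, start_lr, max_lr)             # rise linearly to max_lr
    if iteration < 2 * half:
        return interp(half, 2 * half, max_lr, start_lr)      # fall linearly to start_lr
    return interp(2 * half, iterations, start_lr, last_lr)   # final drop toward last_lr

# Settings from this notebook: 45,000 training images, batch size 128, 10 epochs
iterations = math.ceil(45_000 / 128) * 10
for it in (0, 1583, 3166, 3519):
    print(it, one_cycle_lr(it, iterations, max_lr=0.05))
```

The learning rate therefore climbs from 0.005 to max_lr = 0.05 by iteration 1583 and falls back to 0.005 by iteration 3166. It then decays toward last_lr = 5e-6 over the final 354 iterations.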

In [None]:
batch_size = 128
num_epochs = 10
num_iterations = math.ceil(len(X_train_scaled) / batch_size) * num_epochs
onecycle = OneCycleScheduler(num_iterations, max_lr = 0.05)
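# Sweeps the learning rate exponentially from 1e-4 to 1 over one epoch and records
# each batch's loss (the model's weights and learning rate are restored afterward);
# plotting losses against rates helps choose max_lr for the 1cycle schedule
rates, losses = find_learning_rate(model, X_train_scaled, y_train, epochs=1,
                                   batch_size=batch_size)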
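find_learning_rate multiplies the learning rate by a constant factor after every batch, factor = (max_rate / min_rate) ** (1 / iterations). Over one epoch of 352 batches, it therefore sweeps from 1e-4 up to just under 1. The sketch below reproduces that sequence of rates. It also adds one rule of thumb for reading the resulting curve: pick a rate about 10 times lower than the one where the loss bottoms out. The heuristic and the `suggest_max_lr` helper are illustrative assumptions, and the loss curve is synthetic:

```python
import math

def exponential_rates(min_rate, max_rate, iterations):
    """Rate used at each batch when the rate is multiplied by `factor` after every batch."""
    factor = (max_rate / min_rate) ** (1 / iterations)
    return [min_rate * factor ** k for k in range(iterations)]

def suggest_max_lr(rates, losses, divisor=10):
    """Hypothetical heuristic: the rate with the lowest loss, divided by `divisor`."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return rates[best] / divisor

iterations = math.ceil(45_000 / 128)  # one epoch at batch size 128
rates = exponential_rates(1e-4, 1, iterations)
print(f"first rate {rates[0]:.1e}, last rate {rates[-1]:.3f}")

# Synthetic loss curve (NOT real results): lowest near a learning rate of 0.1
losses = [(math.log10(r) + 1) ** 2 for r in rates]
suggested = suggest_max_lr(rates, losses)
print(f"suggested max_lr: {suggested:.3g}")
```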



Now, the model can be trained on the training set for 10 epochs:

In [None]:
model.fit(X_train_scaled, y_train, epochs = num_epochs, batch_size = batch_size,
          validation_data = (X_valid_scaled, y_valid),
          callbacks = [onecycle])


Finally, the model can be evaluated on the validation set. The evaluate() method returns the loss followed by the accuracy:

In [None]:
model.evaluate(X_valid_scaled, y_valid)



[1.5112253427505493, 0.4828000068664551]

After implementing advanced ML techniques such as alpha dropout regularization and 1cycle scheduling, the model's accuracy increased from 32% to 48%. I got to witness how much each of these small changes affected the model's performance, which showcases how advances in ML techniques have improved model performance over time.

# Works Cited (APA7)

Brownlee, J. (2022, August 16). *Your first deep learning project in Python with Keras step-by-step*. MachineLearningMastery.com. https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

Géron, A. (2023). *Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems* (3rd ed.). O’Reilly Media, Inc.

Géron, A. (n.d.). *Chapter 11 – Training deep neural networks* [Google Colab notebook]. https://colab.research.google.com/github/ageron/handson-ml3/blob/main/11_training_deep_neural_networks.ipynb

Google. (n.d.). *Machine Learning Crash Course with TensorFlow APIs*. Google Machine Learning. https://developers.google.com/machine-learning/crash-course
