**Task:**
Practice training a deep neural network on the CIFAR10 image dataset:

a. Build a DNN with 20 hidden layers of 100 neurons each (that’s too many, but
it’s the point of this exercise). Use He initialization and the Swish activation
function.

b. Using Nadam optimization and early stopping, train the network on the
CIFAR10 dataset. You can load it with tf.keras.datasets.cifar10.load_data().
The dataset is composed of 60,000 32 × 32–pixel color images (50,000
for training, 10,000 for testing) with 10 classes, so you’ll need a softmax
output layer with 10 neurons. Remember to search for the right learning rate
each time you change the model’s architecture or hyperparameters.

c. Now try adding batch normalization and compare the learning curves: is it
converging faster than before? Does it produce a better model? How does it
affect training speed?

d. Try replacing batch normalization with SELU, and make the necessary
adjustments to ensure the network self-normalizes (i.e., standardize the input
features, use LeCun normal initialization, make sure the DNN contains only a
sequence of dense layers, etc.).

e. Try regularizing the model with alpha dropout. Then, without retraining your
model, see if you can achieve better accuracy using MC dropout.

f. Retrain your model using 1cycle scheduling and see if it improves training
speed and model accuracy.

a. Building the DNN with 20 hidden layers of 100 neurons each:

In [1]:
import tensorflow as tf
from tensorflow import keras

initializer = tf.keras.initializers.HeNormal()

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    # Pass the activation and initializer by name so each layer gets its own
    # independently drawn He-normal weights.
    model.add(keras.layers.Dense(100,
                                 activation="swish",
                                 kernel_initializer="he_normal"))
model.add(keras.layers.Dense(10, activation="softmax"))
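To make the two choices in this cell concrete, here is a minimal NumPy sketch of what He initialization and Swish compute. It is an illustration under simplifying assumptions, not the Keras implementation: the helper names `he_normal` and `swish` are made up for this example, and Keras draws from a truncated normal while this sketch uses a plain normal distribution.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """Draw a (fan_in, fan_out) weight matrix with variance 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def swish(z):
    """Swish activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# First hidden layer of the model above: 32 * 32 * 3 = 3072 inputs, 100 units.
W = he_normal(3072, 100, rng)
target_std = np.sqrt(2.0 / 3072)

print("target std:", target_std, "empirical std:", W.std())
print("swish(-1), swish(0), swish(1):", swish(np.array([-1.0, 0.0, 1.0])))
```

The weight scale shrinks as the number of inputs grows, which keeps the variance of each layer's pre-activations roughly constant across the 20 layers at the start of training.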



b. Training the network with Nadam optimization and early stopping:

In [2]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train_full = X_train_full.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# `lr` is a deprecated alias; `learning_rate` is the supported argument name.
model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.Nadam(learning_rate=1e-3), metrics=["accuracy"])

early_stopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(X_train_full, y_train_full, epochs=100, validation_split=0.2, callbacks=[early_stopping_cb])

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

Epoch 1/100 ... Epoch 27/100 (training ended after 27 of 100 epochs)
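The stopping rule behind `EarlyStopping(patience=10)` can be sketched in plain Python. This is a simplified illustration, assuming the default settings (monitoring validation loss with no minimum improvement); `early_stopping_epoch` and the loss values are hypothetical and are not taken from the run above.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stopped_epoch, best_epoch), both counted from 1.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical curve: 17 epochs of improvement, then no further progress.
losses = [2.0 - 0.01 * i for i in range(17)] + [2.0] * 83
stopped, best = early_stopping_epoch(losses, patience=10)
print(stopped, best)
```

With `restore_best_weights=True`, the weights kept at the end are those from the best epoch rather than the last one.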


c. Adding batch normalization to the model:

In [3]:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    # No bias needed: batch normalization adds its own offset parameter.
    model.add(keras.layers.Dense(100, use_bias=False, kernel_initializer="he_normal"))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Activation('swish'))
model.add(keras.layers.Dense(10, activation="softmax"))

model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.Nadam(learning_rate=1e-3), metrics=["accuracy"])

early_stopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(X_train_full, y_train_full, epochs=100, validation_split=0.2, callbacks=[early_stopping_cb])



Epoch 1/100 ... Epoch 26/100 (training ended after 26 of 100 epochs)
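The cell above sets `use_bias=False` on each dense layer that feeds a batch normalization layer. A small NumPy sketch of the training-time normalization shows why the bias is redundant. This is a simplified forward pass only (no moving averages, no learned parameters), and `batch_norm_train` is a name chosen for this illustration.

```python
import numpy as np

def batch_norm_train(z, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each feature over the batch, then scale and shift."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 100))   # pre-activations for a batch of 32
bias = rng.normal(size=100)      # a per-unit bias, as a dense layer would add

out_without_bias = batch_norm_train(z)
out_with_bias = batch_norm_train(z + bias)

# Subtracting the batch mean removes any constant offset, so both match.
print(np.allclose(out_without_bias, out_with_bias))
```

Because the offset is removed anyway, the shift parameter of the batch normalization layer takes over the role of the bias.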


d. Replacing batch normalization with SELU:

In [4]:
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam

# SELU only self-normalizes when the inputs are standardized (mean 0 and
# standard deviation 1 per feature). The statistics come from the training
# array and are reused for the test set. Standardizing in place keeps the
# variable names used by the other cells.
pixel_means = X_train_full.mean(axis=0, keepdims=True)
pixel_stds = X_train_full.std(axis=0, keepdims=True)
X_train_full = (X_train_full - pixel_means) / pixel_stds
X_test = (X_test - pixel_means) / pixel_stds

model = Sequential()
model.add(Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    # Built-in SELU with LeCun normal initialization, as self-normalization requires.
    model.add(Dense(100, activation="selu", kernel_initializer="lecun_normal"))
model.add(Dense(10, activation="softmax"))

model.compile(loss="sparse_categorical_crossentropy", optimizer=Nadam(learning_rate=1e-3), metrics=["accuracy"])

early_stopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(X_train_full, y_train_full, epochs=100, validation_split=0.2, callbacks=[early_stopping_cb])



Epoch 1/100 ... Epoch 56/100 (training ended after 56 of 100 epochs)
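The self-normalizing property that part d relies on can be checked numerically without training anything. The sketch below pushes standardized random inputs through 20 dense layers of 100 units with LeCun normal weights and SELU, and reports the mean and standard deviation of the final activations. It is an illustration on synthetic data, not on CIFAR10, and the seed and sample count are arbitrary choices.

```python
import numpy as np

# Constants from the SELU definition.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(z):
    """SELU: scaled ELU with fixed alpha and scale."""
    return SCALE * np.where(z > 0, z, ALPHA * (np.exp(z) - 1.0))

rng = np.random.default_rng(42)
a = rng.normal(size=(1000, 100))  # standardized inputs: mean 0, std 1

for layer in range(20):
    # LeCun normal initialization: variance 1 / fan_in.
    W = rng.normal(0.0, np.sqrt(1.0 / 100), size=(100, 100))
    a = selu(a @ W)

final_mean, final_std = a.mean(), a.std()
print("mean:", final_mean, "std:", final_std)
```

The activations should stay close to mean 0 and standard deviation 1 after all 20 layers, which is what lets the network train without batch normalization. Feeding inputs that are not standardized, or swapping in another initializer, breaks this guarantee.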


e. Regularizing the model with alpha dropout:

In [8]:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    # Alpha dropout is designed for SELU networks: it preserves the mean and
    # variance of its inputs, so it is paired with SELU and LeCun normal here.
    model.add(keras.layers.Dense(100, activation="selu", kernel_initializer="lecun_normal"))
    model.add(keras.layers.AlphaDropout(rate=0.1))
model.add(keras.layers.Dense(10, activation="softmax"))

model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.Nadam(learning_rate=1e-3), metrics=["accuracy"])

early_stopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(X_train_full, y_train_full, epochs=100, validation_split=0.2, callbacks=[early_stopping_cb])



Epoch 1/100 ... Epoch 11/100 (training ended after 11 of 100 epochs)


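In [9]:
import numpy as np

# MC dropout: keep dropout active at inference by calling the model with
# training=True. model.predict() runs in inference mode, so dropout is off
# and all 100 samples would be identical.
y_probas = np.stack([model(X_test, training=True).numpy()
                     for _ in range(100)])

# Mean and standard deviation of the predictions across the 100 stochastic passes
y_mean = y_probas.mean(axis=0)
y_std = y_probas.std(axis=0)

# y_test has shape (n, 1), so flatten it before comparing with the (n,) argmax.
# Without this, NumPy broadcasts the comparison to an (n, n) matrix, and on the
# class-balanced CIFAR10 test set the reported accuracy is 0.1 whatever the
# predictions are.
accuracy = np.mean(y_test.flatten() == np.argmax(y_mean, axis=1))
print("Test accuracy:", accuracy)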
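A shape detail is worth checking whenever accuracy is computed by hand on this dataset: `load_data()` returns the labels as a column of shape `(n, 1)`, while `np.argmax(..., axis=1)` returns shape `(n,)`. The toy example below uses made-up labels with 10 balanced classes and perfect predictions to show how comparing the two shapes directly yields 0.1 instead of 1.0.

```python
import numpy as np

# 50 labels, 5 per class, stored as a column like the CIFAR10 labels.
y_true = np.repeat(np.arange(10), 5).reshape(-1, 1)   # shape (50, 1)
y_pred = y_true.ravel().copy()                         # perfect predictions, shape (50,)

# (50, 1) against (50,) broadcasts to a (50, 50) matrix of pairwise comparisons.
broadcast_acc = np.mean(np.equal(y_true, y_pred))

# Flattening the labels first compares element by element.
correct_acc = np.mean(y_true.ravel() == y_pred)

print(np.equal(y_true, y_pred).shape, broadcast_acc, correct_acc)
```

With balanced classes, the broadcast version always averages to one over the number of classes, so a result of exactly 0.1 on CIFAR10 points to this shape mismatch rather than to the model.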


f. Retraining the model using 1cycle scheduling:

In [41]:
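from tensorflow.keras.callbacks import LearningRateScheduler

n_epochs = 20
max_lr = 0.05  # peak rate from the original run; tune it with a learning rate search

def one_cycle_lr(epoch, lr):
    """Epoch-level 1cycle schedule: ramp up, ramp down, then anneal.

    LearningRateScheduler passes the epoch index (0 to n_epochs - 1), so the
    breakpoints are expressed in epochs, not in training iterations.
    """
    start_lr = max_lr / 10
    ramp_epochs = n_epochs * 45 // 100  # 9 epochs up, 9 epochs down
    if epoch < ramp_epochs:
        # First ramp: start_lr -> max_lr
        return start_lr + (max_lr - start_lr) * epoch / ramp_epochs
    if epoch < 2 * ramp_epochs:
        # Second ramp: max_lr -> start_lr
        return max_lr - (max_lr - start_lr) * (epoch - ramp_epochs) / ramp_epochs
    # Final epochs: decay linearly from start_lr down to start_lr / 100
    remaining = n_epochs - 2 * ramp_epochs
    progress = (epoch - 2 * ramp_epochs + 1) / remaining
    return start_lr * (1 - 0.99 * progress)

lr_scheduler = LearningRateScheduler(one_cycle_lr)

# compile() resets the optimizer but keeps the current weights; rebuild the
# model first to retrain from scratch.
model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.Nadam(learning_rate=1e-3, beta_1=0.95), metrics=["accuracy"])
history = model.fit(X_train_full, y_train_full, epochs=n_epochs, validation_split=0.2, batch_size=128, callbacks=[lr_scheduler])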




Epoch 1/20 ... Epoch 20/20 (all 20 epochs ran)
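1cycle scheduling is usually described per training iteration rather than per epoch, which gives a smoother ramp than an epoch-level callback. The sketch below computes such a schedule in plain Python. It is one possible formulation under stated assumptions: the helper names are hypothetical, the split of 45% up, 45% down and 10% final decay is a common convention rather than something fixed by the exercise, and the iteration count assumes 40,000 training images (what remains after `validation_split=0.2`) with a batch size of 128.

```python
import math

def interpolate(start, end, progress):
    """Linear interpolation between start and end for progress in [0, 1]."""
    return start + (end - start) * progress

def one_cycle_rate(iteration, total_iterations, max_lr):
    """Learning rate at a given iteration of a 1cycle schedule."""
    start_lr = max_lr / 10
    last_lr = start_lr / 1000
    half = total_iterations * 45 // 100  # length of each ramp
    if iteration < half:
        return interpolate(start_lr, max_lr, iteration / half)
    if iteration < 2 * half:
        return interpolate(max_lr, start_lr, (iteration - half) / half)
    tail = total_iterations - 2 * half
    return interpolate(start_lr, last_lr, (iteration - 2 * half) / tail)

steps_per_epoch = math.ceil(40_000 / 128)
total_iterations = 20 * steps_per_epoch
half_point = total_iterations * 45 // 100

for it in (0, half_point, 2 * half_point, total_iterations):
    print(it, one_cycle_rate(it, total_iterations, max_lr=0.05))
```

Applying this per batch in Keras would need a custom callback that sets the optimizer's learning rate at the start of each batch; that part is left out here.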
