# Training Deep Neural Networks

## Practical 1

_**Build a neural network of appropriate depth on the CIFAR10 dataset with He initialization, the Swish activation function, the Nadam optimizer, and early stopping. Compare this model’s learning curve (convergence, performance, and training time) with that of another model with batch normalization layers added.**_

In [36]:
# Imports required packages

import pandas as pd
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from sklearn.model_selection import train_test_split
from pathlib import Path
from time import strftime
import matplotlib.pyplot as plt

In [9]:
# Loads CIFAR10 dataset
# Note that first time download may take several minutes to complete

# A separate name is used so the imported `cifar10` module is not overwritten
cifar10_data = cifar10.load_data()

In [10]:
# The dataset is a pair of (images, labels) tuples, unpacked as follows
(X_train_full, y_train_full), (X_test, y_test) = cifar10_data

In [11]:
# Checks the shape of the training and test dataset
print("Shape of the training dataset:", X_train_full.shape)
print("Shape of the test dataset:", X_test.shape)

Shape of the training dataset: (50000, 32, 32, 3)
Shape of the test dataset: (10000, 32, 32, 3)


In [15]:
# Splits the training dataset further, setting aside 5000 instances as a validation set

X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=5000, random_state=42, stratify=y_train_full)

In [16]:
# Each training and test example is assigned to one of the following labels.
class_names = ["airplanes", "cars", "birds", "cats", "deer", "dogs", "frogs",
               "horses", "ships", "trucks"]

Creates a dense neural network model with specific activation and kernel initializer.

In [20]:
tf.random.set_seed(42)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    model.add(tf.keras.layers.Dense(100, activation="swish", kernel_initializer="he_normal"))


In [53]:
model.add(tf.keras.layers.Dense(10, activation="softmax"))
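The `"swish"` activation and `"he_normal"` initializer named in the layers above can be written out by hand to see what they compute. The following is a minimal sketch in plain Python, not the Keras implementation: the helper names are hypothetical, and the weights are drawn from an ordinary Gaussian, whereas Keras draws `he_normal` weights from a truncated normal distribution with the same target standard deviation `sqrt(2 / fan_in)`.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def he_normal_stddev(fan_in: int) -> float:
    """Target standard deviation for He normal initialization."""
    return math.sqrt(2.0 / fan_in)


def he_normal_weights(fan_in: int, fan_out: int, seed: int = 42):
    """Hypothetical helper: a fan_in x fan_out weight matrix.

    Assumption: a plain Gaussian is used here for simplicity,
    not the truncated normal that Keras uses.
    """
    rng = random.Random(seed)
    std = he_normal_stddev(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


# The first hidden layer sees 32 * 32 * 3 = 3072 inputs, the others see 100
print(he_normal_stddev(32 * 32 * 3))
print(he_normal_stddev(100))
print(swish(-1.0), swish(0.0), swish(1.0))
```

The standard deviation shrinks as the number of inputs grows, which keeps the variance of each layer's pre-activations roughly constant through the 20 hidden layers.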

In [55]:
optimizer = tf.keras.optimizers.Nadam(learning_rate=5e-5)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
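As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the architecture built above (3072 flattened inputs, 20 hidden `Dense` layers of 100 units, a 10-unit output layer); the function names are hypothetical.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


def count_params(n_features: int = 32 * 32 * 3, n_hidden: int = 20,
                 n_units: int = 100, n_classes: int = 10) -> int:
    """Total parameters of a stack of Dense layers ending in a softmax layer."""
    total = 0
    n_inputs = n_features
    for _ in range(n_hidden):
        total += dense_params(n_inputs, n_units)
        n_inputs = n_units
    total += dense_params(n_inputs, n_classes)
    return total


print(count_params())  # 500210
```

More than 60% of the parameters (307,300) sit in the first hidden layer, because it connects every one of the 3072 pixel values to 100 units.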

**Creates callbacks for model checkpoints, early stopping and TensorBoard.**

In [27]:
def get_logdir(logdir="logs"):
    """
    Returns directory path to store all logs into
    """
    return Path(logdir) / strftime("%Y_%m_%d_%H_%M_%S")

In [45]:
logdir = get_logdir()

In [49]:
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint("./model_weights/my_cifar10_model", save_best_only=True)
early_stopping_callback = tf.keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)
#run_index = 1 # increment every time you train the model
#run_logdir =  get_logdir() #Path() / "my_cifar10_logs" / f"run_{run_index:03d}"
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir)
callbacks = [early_stopping_callback, model_checkpoint_callback, tensorboard_callback]
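To make the effect of `patience=20` concrete, here is a simplified sketch of the early stopping rule in plain Python. It is not the Keras source: `early_stopping_run` is a hypothetical helper that only tracks the validation loss and ignores options such as `min_delta`.

```python
def early_stopping_run(val_losses, patience=20):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs have passed
    without a new lowest validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    # The loss list ended before patience ran out
    return best_epoch, len(val_losses)


# Made-up losses for illustration only
losses = [2.0, 1.8, 1.9, 1.7, 1.75, 1.76, 1.8]
print(early_stopping_run(losses, patience=3))  # (4, 7)
```

With `restore_best_weights=True`, the weights from the best epoch (epoch 4 in this toy run) are the ones kept once training stops.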

In [40]:
%load_ext tensorboard
%tensorboard --logdir=./logs

In [57]:
model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), callbacks=callbacks)

Epoch 1/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 2/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 3/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 4/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 5/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 6/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 7/100
Epoch 8/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 9/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 10/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 11/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 12/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 13/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 14/100
Epoch 15/100
Epoch 16/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 17/100
Epoch 18/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 19/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 20/100
Epoch 21/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 22/100
Epoch 23/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 24/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 25/100
Epoch 26/100
Epoch 27/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 28/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_model/assets
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100


<keras.callbacks.History at 0x78a6438a7450>

In [64]:
model.evaluate(X_val, y_val)



[1.5023068189620972, 0.4691999852657318]

**Now, adds a batch normalization layer after every Dense layer (before the activation function), except for the output layer.**

In [67]:
# Resets the global Keras state left over from the previous model
tf.keras.backend.clear_session()

tf.random.set_seed(42)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    model.add(tf.keras.layers.Dense(100, kernel_initializer="he_normal"))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation("swish"))
    
model.add(tf.keras.layers.Dense(10, activation="softmax"))

optimizer = tf.keras.optimizers.Nadam(learning_rate=5e-4)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
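What each `BatchNormalization` layer does to one unit's pre-activations during training can be sketched in plain Python. This is an illustration, not the Keras implementation: the function names are hypothetical, a single feature is handled at a time, and the defaults `epsilon=1e-3` and `momentum=0.99` mirror the Keras layer defaults.

```python
import math


def batch_norm_train(batch, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    normalized = [(x - mean) / math.sqrt(var + epsilon) for x in batch]
    return [gamma * x_hat + beta for x_hat in normalized]


def update_moving(moving, batch_stat, momentum=0.99):
    """Exponential moving average of a batch statistic, used at inference time."""
    return moving * momentum + batch_stat * (1.0 - momentum)


# Made-up pre-activations of one unit for a mini-batch of four images
pre_activations = [10.0, 20.0, 30.0, 40.0]
outputs = batch_norm_train(pre_activations)
print(outputs)
print(update_moving(0.0, 25.0))
```

Because the outputs are re-centered and re-scaled at every layer, the swish activations that follow receive inputs in a stable range, which is what makes the larger learning rate workable.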


In [69]:
logdir = get_logdir()

In [73]:
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint("./model_weights/my_cifar10_bn_model", save_best_only=True)
early_stopping_callback = tf.keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir)
callbacks = [early_stopping_callback, model_checkpoint_callback, tensorboard_callback]

In [77]:
model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), callbacks=callbacks)

Epoch 1/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_bn_model/assets
Epoch 2/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_bn_model/assets
Epoch 3/100
Epoch 4/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_bn_model/assets
Epoch 5/100
Epoch 6/100
Epoch 7/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_bn_model/assets
Epoch 8/100
Epoch 9/100
Epoch 10/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_bn_model/assets
Epoch 11/100
Epoch 12/100
Epoch 13/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_bn_model/assets
Epoch 14/100
Epoch 15/100
INFO:tensorflow:Assets written to: ./model_weights/my_cifar10_bn_model/assets
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100


<keras.callbacks.History at 0x78a5f6b5a190>

In [82]:
model.evaluate(X_val, y_val)



[1.4116411209106445, 0.5009999871253967]

**Observations:**

- The previous model took 28 epochs to reach its lowest validation loss, while the new model reached that same loss in just 10 epochs and kept improving until the 15th epoch. The BN layers stabilized training and allowed a learning rate 10 times larger (5e-4 instead of 5e-5), so convergence was faster.

- The final model is also better, with 50.1% validation accuracy instead of 46.9%.

- Each epoch was slower, about 60s instead of 50s, because of the extra computations required by the BN layers. The model converged in far fewer epochs, though, so the overall training time (wall time) to reach the best model dropped from 28 minutes to 18 minutes.
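The epoch counts in these observations can also be read programmatically from the per-epoch validation losses. The sketch below assumes the value returned by `model.fit(...)` was stored in a variable (the cells above discard it), so that `history.history["val_loss"]` is available as a list. The helper names are hypothetical and the loss values are made up for illustration; they are not the results of this notebook.

```python
def best_epoch(val_losses):
    """1-indexed epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


def first_epoch_reaching(val_losses, target):
    """1-indexed first epoch whose validation loss is <= target, or None."""
    for epoch, loss in enumerate(val_losses, start=1):
        if loss <= target:
            return epoch
    return None


# Made-up validation losses, standing in for history.history["val_loss"]
baseline = [2.0, 1.8, 1.7, 1.65, 1.66, 1.6, 1.62]
with_bn = [1.9, 1.6, 1.55, 1.5, 1.52]

baseline_best = min(baseline)
print(best_epoch(baseline))                          # 6
print(first_epoch_reaching(with_bn, baseline_best))  # 2
print(best_epoch(with_bn))                           # 4
```

The same per-epoch values are what TensorBoard plots from the log directory, so both routes should agree up to whether epochs are counted from 0 or 1.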