### In this exercise, I train a deep neural network (DNN) on the CIFAR10 dataset

#### a) Create a deep network consisting of 20 hidden layers containing 100 neurons each (that is too many, but that is the point of the exercise). Use He initialisation and the ELU activation function.

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

In [7]:
keras.backend.clear_session()
tf.random.set_seed(0)
np.random.seed(0)

network = keras.models.Sequential()
network.add(keras.layers.Flatten(input_shape = [32,32,3]))

for _ in range(20):
    network.add(keras.layers.Dense(100,
                                  kernel_initializer="he_normal", # not the defaults: Dense uses "glorot_uniform" and no activation unless told otherwise
                                  activation="elu"))
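
To make the two ingredients used above more concrete, here is a minimal NumPy sketch of what He initialisation and ELU compute. It is an illustration under stated assumptions, not Keras's implementation: Keras's `he_normal` draws from a truncated normal distribution, whereas this sketch uses a plain normal with the same target standard deviation `sqrt(2 / fan_in)`. The helper names `he_normal_weights` and `elu` are hypothetical.

```python
import numpy as np

def he_normal_weights(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix with std sqrt(2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    # Assumption: plain normal, not the truncated normal Keras uses.
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

def elu(z, alpha=1.0):
    """ELU: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    z = np.asarray(z, dtype=float)
    # np.minimum keeps expm1 from overflowing on large positive inputs.
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

rng = np.random.default_rng(0)
W = he_normal_weights(fan_in=100, fan_out=100, rng=rng)
print(W.std(), np.sqrt(2.0 / 100))  # the two values should be close
print(elu([-2.0, 0.0, 3.0]))
```

With 100 inputs per layer, the target standard deviation is about 0.14, which is the scale the hidden-layer kernels above start from.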

#### b) Exercise: Using Nadam optimization and early stopping, train the network on the CIFAR10 dataset. You can load it with keras.datasets.cifar10.load_data(). The dataset is composed of 60,000 32 × 32–pixel color images (50,000 for training, 10,000 for testing) with 10 classes, so you'll need a softmax output layer with 10 neurons. Remember to search for the right learning rate each time you change the model's architecture or hyperparameters.
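One common way to search for a learning rate is to sweep it exponentially over a few training steps, record the loss at each rate, and then choose a rate somewhat below the point where the loss is lowest. The sketch below only shows the schedule and the selection rule. The function names, the `back_off` factor of 10 and the loss values are all assumptions made for illustration, not results from this notebook.

```python
import numpy as np

def exponential_lr_sweep(min_rate, max_rate, n_steps):
    """Learning rates growing by a constant factor from min_rate to max_rate."""
    factor = (max_rate / min_rate) ** (1.0 / (n_steps - 1))
    return min_rate * factor ** np.arange(n_steps)

def pick_rate(rates, losses, back_off=10.0):
    """Rule of thumb: take the rate with the lowest recorded loss, divided by back_off."""
    best = int(np.argmin(losses))
    return rates[best] / back_off

rates = exponential_lr_sweep(1e-5, 10.0, n_steps=7)

# Made-up losses, one per rate, only to exercise pick_rate.
made_up_losses = np.array([2.30, 2.25, 2.00, 1.80, 2.10, 9.0, 50.0])
print(rates)
print(pick_rate(rates, made_up_losses))
```

In a real sweep the losses would come from training the model for a few hundred batches while a callback raises the learning rate after each batch.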

In [9]:
(X_train_full, y_train_full), (X_test_full, y_test_full) = keras.datasets.cifar10.load_data()

In [12]:
X_train_full.shape, X_test_full.shape

((50000, 32, 32, 3), (10000, 32, 32, 3))

##### As the validation set, I will use the first 5,000 observations of the training set; the remaining 45,000 are used for training. The test set is left untouched.

In [16]:
X_train = X_train_full[5000:]
y_train = y_train_full[5000:]
X_valid = X_train_full[:5000]
y_valid = y_train_full[:5000]

Now, I will add the output layer, build my optimizer, and compile and train the model.

In [17]:
network.add(keras.layers.Dense(10, activation="softmax"))
optimizer = keras.optimizers.Nadam(learning_rate=0.001)
network.compile(loss = 'sparse_categorical_crossentropy',
                optimizer = optimizer,
                metrics = ["accuracy"])


early_stopping_cb = keras.callbacks.EarlyStopping(patience=10) # early stopping monitors val_loss by default, which is why a validation set is needed
checkpoint_cb = keras.callbacks.ModelCheckpoint("GNN_exercise_2.h5", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb]

network.fit(X_train, y_train, epochs = 50,
            validation_data = (X_valid, y_valid),
            callbacks = callbacks)

network = keras.models.load_model("GNN_exercise_2.h5")
network.evaluate(X_valid, y_valid)


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50


[1.6971863508224487, 0.38339999318122864]
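
Training stopped after epoch 21 of 50 because of the `patience=10` setting. The following pure-Python sketch mimics the patience rule to show how the stopping epoch relates to the best epoch. The function `early_stopping_epoch` is hypothetical and the loss values are synthetic; the sketch assumes improvement means a strictly lower validation loss, with no minimum delta.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stopped_epoch, best_epoch), both 1-based, under a patience rule."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Synthetic losses: improve for 11 epochs, then plateau slightly higher.
synthetic = [2.0 - 0.03 * e for e in range(11)] + [1.75] * 39
stopped, best = early_stopping_epoch(synthetic, patience=10)
print(stopped, best)
```

Under this rule, stopping at epoch 21 is consistent with the best validation loss having occurred around epoch 11, which is the model that `save_best_only=True` kept in the checkpoint file.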

#### c) Now try adding Batch Normalization and compare the learning curves: Is it converging faster than before? Does it produce a better model? How does it affect training speed?
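As a reminder of what a Batch Normalization layer computes during training, here is a minimal NumPy sketch of the forward pass. It covers training mode only: at inference time Keras uses moving averages of the batch statistics, which this sketch leaves out. The function name `batch_norm_train` and the synthetic input are assumptions for illustration; `epsilon=1e-3` matches the Keras default.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, epsilon=1e-3):
    """Normalise each feature over the batch, then scale by gamma and shift by beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 100))  # synthetic activations
out = batch_norm_train(batch, gamma=np.ones(100), beta=np.zeros(100))
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])  # roughly 0 and 1 per feature
```

Because `gamma` and `beta` are trainable, the layer can undo the normalisation if that helps, at the cost of extra parameters and extra computation per layer.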

In [19]:
keras.backend.clear_session()
tf.random.set_seed(0)
np.random.seed(0)

network_batched = keras.models.Sequential()
network_batched.add(keras.layers.Flatten(input_shape = [32,32,3]))
network_batched.add(keras.layers.BatchNormalization())

for _ in range(20):
    network_batched.add(keras.layers.Dense(100, kernel_initializer="he_normal"))
    network_batched.add(keras.layers.BatchNormalization())
    network_batched.add(keras.layers.Activation("elu"))
network_batched.add(keras.layers.Dense(10, activation="softmax"))

optimizer = keras.optimizers.Nadam(learning_rate=0.001)
network_batched.compile(loss = 'sparse_categorical_crossentropy',
                        optimizer = optimizer,
                        metrics = ["accuracy"])


early_stopping_cb = keras.callbacks.EarlyStopping(patience=10) # early stopping monitors val_loss by default, which is why a validation set is needed
checkpoint_cb = keras.callbacks.ModelCheckpoint("GNN_exercise_2_1.h5", save_best_only=True)
callbacks = [early_stopping_cb, checkpoint_cb]

network_batched.fit(X_train, y_train, epochs = 50,
                    validation_data = (X_valid, y_valid),
                    callbacks = callbacks)

network_batched = keras.models.load_model("GNN_exercise_2_1.h5")
network_batched.evaluate(X_valid, y_valid)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50


[1.299830675125122, 0.545199990272522]
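
The two `evaluate()` outputs can be compared directly. The short sketch below copies the reported validation metrics and epoch counts from this notebook and computes the differences; the dictionary layout and variable names are just one convenient way to hold them. It says nothing about convergence speed or time per epoch, because the logs above contain no per-epoch metrics or timings.

```python
# [loss, accuracy] values copied from the two evaluate() outputs in this notebook,
# plus the number of epochs each run completed before early stopping.
plain = {"val_loss": 1.6971863508224487, "val_accuracy": 0.38339999318122864, "epochs_run": 21}
batch_norm = {"val_loss": 1.299830675125122, "val_accuracy": 0.545199990272522, "epochs_run": 34}

loss_change = batch_norm["val_loss"] - plain["val_loss"]
accuracy_change = batch_norm["val_accuracy"] - plain["val_accuracy"]
extra_epochs = batch_norm["epochs_run"] - plain["epochs_run"]

print(f"validation loss change:     {loss_change:+.4f}")
print(f"validation accuracy change: {accuracy_change:+.4f}")
print(f"additional epochs run:      {extra_epochs:+d}")
```

For this single run, the batch-normalised network reaches a lower validation loss and a higher validation accuracy, and it trains for more epochs before early stopping triggers. Storing the return value of `fit()` would be needed to compare the actual learning curves.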