# Training Deep Neural Networks

**_Build a neural network of appropriate depth on the CIFAR10 dataset with batch normalization, He initialization, the Swish activation function, the Nadam optimizer, and early stopping._**

In [1]:
# Import required packages

import pickle
import tensorflow as tf
from sklearn.model_selection import train_test_split

## Retrieving & Analysing Dataset

In [4]:
# Load the CIFAR10 dataset from a pickle dump

with open("data/cifar10.pickle", "rb") as f:
    cifar10 = pickle.load(f)

In [6]:
# The dataset is stored as a nested tuple, so its items are unpacked as follows

(X_train_full, y_train_full), (X_test, y_test) = cifar10

In [None]:
# Check the shapes of the training and test datasets

print("Shape of the training dataset:", <code here>)

print("Shape of the test dataset:", <code here>)
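As a hint for the cell above, the `shape` attribute of a NumPy array reports its dimensions. The sketch below uses small zero-filled stand-in arrays (an assumption, so that it runs without the pickle file); the names `X_train_demo` and `X_test_demo` are hypothetical. If the pickle holds the standard CIFAR10 split, the real arrays should contain 50,000 training and 10,000 test images of 32×32×3 pixels.

```python
import numpy as np

# Stand-in arrays (assumption): 8 training and 2 test images, CIFAR10-sized
X_train_demo = np.zeros((8, 32, 32, 3), dtype=np.uint8)
X_test_demo = np.zeros((2, 32, 32, 3), dtype=np.uint8)

# .shape is a tuple: (number of images, height, width, colour channels)
print("Shape of the training dataset:", X_train_demo.shape)
print("Shape of the test dataset:", X_test_demo.shape)
```

In the notebook itself, the same attribute would be read from `X_train_full` and `X_test`.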

## Preparing Datasets

In [18]:
# Split the training dataset further to set aside 5000 instances as a validation set.
# Use stratification when splitting so that class proportions are preserved.

X_train, X_val, y_train, y_val = <code here>
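One possible way to do a stratified split is `train_test_split` with an integer `test_size` and the labels passed as `stratify`. The sketch below runs on small synthetic data (an assumption); the `_demo` names, the `random_state` value, and the demo validation size of 50 are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): 200 tiny "images", 10 balanced classes
X_demo = rng.random((200, 4, 4, 3))
y_demo = np.repeat(np.arange(10), 20)

# An integer test_size is an absolute number of samples;
# stratify keeps the class proportions the same in both parts
X_tr_demo, X_val_demo, y_tr_demo, y_val_demo = train_test_split(
    X_demo, y_demo, test_size=50, stratify=y_demo, random_state=42
)

print(X_tr_demo.shape, X_val_demo.shape)
print(np.bincount(y_val_demo))  # per-class counts in the validation part
```

For the exercise, the same call would take `X_train_full` and `y_train_full` with `test_size=5000`.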

## Modeling with Batch Normalization

**Add a Batch Normalization (BN) layer after every Dense layer (and before the activation function), except for the output layer.**

In [30]:
# Reset all Keras state
tf.keras.backend.clear_session()

tf.random.set_seed(42)

# Initialize a sequential model
<code here>

# Initialize a "Flatten" layer with the input shape and add it to the model
<code here>

# In each iteration of a loop of 20 iterations,
#   1) initialize a "Dense" layer specifying the output shape and "he_normal" as kernel initializer, and add it to the model,
#   2) initialize a "BatchNormalization" layer and add it to the model,
#   3) initialize an "Activation" layer for "swish" activation and add it to the model

<code here>

# Initialize a "Dense" layer specifying the output shape and activation function according to the task, and add it to the model
<code here>
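The steps described in the cell above could be sketched roughly as follows. This is one possible layout, not the reference solution: the width of 100 units per hidden layer is an assumption (the exercise does not fix it), and the sketch uses an explicit `Input` layer instead of passing `input_shape` to `Flatten`, which newer Keras versions discourage. The 10 output units with softmax match the 10 CIFAR10 classes.

```python
import tensorflow as tf

tf.keras.backend.clear_session()
tf.random.set_seed(42)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(32, 32, 3)))  # CIFAR10 image size
model.add(tf.keras.layers.Flatten())

# 20 hidden blocks: Dense -> BatchNormalization -> Swish activation
for _ in range(20):
    # 100 units is an assumed width, chosen only for illustration
    model.add(tf.keras.layers.Dense(100, kernel_initializer="he_normal"))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Activation("swish"))

# Output layer: one unit per class, softmax for class probabilities
model.add(tf.keras.layers.Dense(10, activation="softmax"))

model.summary()
```

Placing BN before the activation means the Dense layer's output is normalized first and only then passed through Swish, which is what the instruction above asks for.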


In [32]:
# Initialize "Nadam" as the model optimizer with a learning rate of 5e-4
<code here>

# Compile the model by specifying sparse categorical crossentropy as the loss function,
# the optimizer initialized above, and "accuracy" as a metric
<code here>
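A minimal sketch of the optimizer and compile step is shown below. To keep it runnable on its own it compiles a tiny stand-in model (an assumption, named `demo_model` here); in the notebook the same two calls would apply to the 20-layer model built earlier.

```python
import tensorflow as tf

# Tiny stand-in model (assumption) so that the snippet runs by itself
demo_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Nadam is Adam with Nesterov momentum; the learning rate comes from the exercise
optimizer = tf.keras.optimizers.Nadam(learning_rate=5e-4)

# The "sparse" loss expects integer class labels rather than one-hot vectors
demo_model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=optimizer,
    metrics=["accuracy"],
)
```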

In [34]:
# Set the path where the model is saved. Instead of saving a checkpoint after every epoch,
# only the best model (by validation loss, the default monitor) is kept.
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint("./model_weights/my_cifar10_bn_model.keras", save_best_only=True)

# Set a callback to stop training when the model does not improve for 20 consecutive epochs
early_stopping_callback = <code here>

# Prepare list of callbacks to be passed into training process
callbacks = [early_stopping_callback, model_checkpoint_callback]
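An early stopping callback along these lines might look like the sketch below. The `patience=20` value follows the exercise; `restore_best_weights=True` is an optional extra that the exercise does not require, and the variable name `early_stopping_demo` is hypothetical.

```python
import tensorflow as tf

# Stops training once the monitored metric (val_loss by default)
# has not improved for `patience` consecutive epochs
early_stopping_demo = tf.keras.callbacks.EarlyStopping(
    patience=20,
    restore_best_weights=True,  # assumption: roll back to the best epoch's weights
)

print(early_stopping_demo.monitor, early_stopping_demo.patience)
```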

In [None]:
# Fit the model by specifying training dataset, 100 epochs,
# validation data (tuple with features and labels) and callbacks

<code here>

In [None]:
# Evaluate model performance against the validation data
<code here>
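The fit and evaluate steps can be sketched together on synthetic data. Everything in this block is a stand-in (random images and labels, a single-layer model, 2 epochs) chosen so that it runs in seconds; the resulting accuracy is meaningless and only the call pattern is the point.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): random images with random labels
X_tr_demo = rng.random((64, 32, 32, 3)).astype("float32")
y_tr_demo = rng.integers(0, 10, size=(64, 1))
X_val_demo = rng.random((16, 32, 32, 3)).astype("float32")
y_val_demo = rng.integers(0, 10, size=(16, 1))

# Tiny stand-in model (assumption), compiled as in the exercise
demo_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
demo_model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=tf.keras.optimizers.Nadam(learning_rate=5e-4),
    metrics=["accuracy"],
)

# validation_data is a (features, labels) tuple; callbacks is a list
history = demo_model.fit(
    X_tr_demo, y_tr_demo,
    epochs=2,
    validation_data=(X_val_demo, y_val_demo),
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=20)],
    verbose=0,
)

# evaluate returns the loss followed by each compiled metric
val_loss, val_accuracy = demo_model.evaluate(X_val_demo, y_val_demo, verbose=0)
print("Validation loss:", val_loss, "accuracy:", val_accuracy)
```

In the notebook, the corresponding calls would use `X_train`, `y_train`, `epochs=100`, `validation_data=(X_val, y_val)`, and the `callbacks` list prepared earlier.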

**Note down if the accuracy of this model is better than that of the previous model.**

**Observations:**

Note down all your observations in the green/blue book.