# Chapter 11: Training Deep Neural Networks

This chapter introduces some common techniques for training deep neural networks, including choosing initializers and activation functions, batch normalization, and using better optimizers. The book covers only the basic concepts and usage of these techniques; for other training techniques, or for the principles behind these methods, please refer to other materials.
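As a minimal NumPy sketch of the first two techniques, the snippet below draws He-initialized weights and applies the ELU activation. It assumes the normal-distribution form of He initialization with standard deviation `sqrt(2 / fan_in)` (TensorFlow's `he_normal` samples from a truncated normal with roughly this standard deviation, so a plain normal is a simplification) and ELU with `alpha = 1`. The functions `he_normal` and `elu` here are illustrative helpers, not library APIs.

```python
import numpy as np


def he_normal(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix with std = sqrt(2 / fan_in)."""
    stddev = np.sqrt(2.0 / fan_in)
    return rng.normal(loc=0.0, scale=stddev, size=(fan_in, fan_out))


def elu(z, alpha=1.0):
    """ELU: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    # np.minimum keeps exp from overflowing in the branch that np.where discards
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))


rng = np.random.default_rng(42)
W = he_normal(784, 100, rng)   # shape of the first hidden layer in the exercise
print(W.std())                 # should be close to sqrt(2 / 784), about 0.0505
print(elu(np.array([-2.0, 0.0, 3.0])))
```

Scaling the variance by the fan-in keeps the variance of the pre-activations roughly constant from layer to layer, which is the motivation for He initialization with ReLU-like activations such as ELU.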

> This Jupyter notebook contains my own solutions to the book's coding exercises. However, the code may not be optimized for performance. For answers to the book's questions, please see the Markdown file in the same directory.

## Exercise 8: Deep Learning

Requirements:
1. Build a DNN with 5 hidden layers of 100 neurons each, using He initialization and the ELU activation function.
2. Use the Adam optimizer and early stopping to train the DNN on MNIST, but only on digits 0-4, because the next exercise uses transfer learning to train a model on digits 5-9. You will need a softmax output layer with 5 neurons. Make sure to save checkpoints regularly so the model can be reused later.
3. Use cross-validation to tune the hyperparameters, and check the precision score.
4. Try adding batch normalization.
5. Does the model overfit the training set? Try adding dropout to every layer.
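Requirement 2 asks for early stopping. One common form keeps the best validation loss seen so far and stops once it has not improved for `patience` consecutive epochs. The pure-Python sketch below shows only that bookkeeping; the `EarlyStoppingMonitor` class and its `patience` and `min_delta` parameters are illustrative assumptions, not part of TensorFlow 1.x (for Keras models, `tf.keras.callbacks.EarlyStopping` implements a similar rule). In a TensorFlow 1.x loop, one would compute the validation loss after each epoch, call `saver.save` whenever `update` returns `True`, break out of the epoch loop once `should_stop` is set, and restore the best checkpoint afterwards.

```python
class EarlyStoppingMonitor:
    """Track the best validation loss and signal when training should stop."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0
        self.should_stop = False

    def update(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if it is a new best."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            return True  # a real training loop would save a checkpoint here
        self.epochs_without_improvement += 1
        if self.epochs_without_improvement >= self.patience:
            self.should_stop = True
        return False


# Usage with made-up validation losses (not results from the network in this notebook)
monitor = EarlyStoppingMonitor(patience=3)
fake_val_losses = [0.90, 0.60, 0.45, 0.47, 0.46, 0.50, 0.40]
improved_epochs = []
for epoch, val_loss in enumerate(fake_val_losses):
    if monitor.update(epoch, val_loss):
        improved_epochs.append(epoch)
    if monitor.should_stop:
        break
print(monitor.best_epoch, monitor.best_loss, epoch)  # prints: 2 0.45 5
```

Training stops at epoch 5 after three epochs without improvement, so the later loss of 0.40 is never reached; a larger `patience` trades extra training time for a lower risk of stopping too early.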

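My solution builds a DNN with five hidden layers, each combining He initialization, batch normalization, ELU and dropout. It trains for a fixed number of epochs and saves a checkpoint after every epoch; early stopping and cross-validation are not implemented.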
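Batch normalization and dropout both behave differently during training and inference, so a layer that uses them has to know which mode it is in. The NumPy sketch below illustrates this under stated assumptions: while training, batch normalization normalizes with the current mini-batch's statistics and updates exponential moving averages (momentum 0.99 and epsilon 0.001, the defaults of `tf.layers.batch_normalization`), then uses those averages at inference. Dropout uses the "inverted" form, which zeroes each unit with probability `rate` during training and scales the survivors by `1 / (1 - rate)`, so nothing has to be rescaled at test time. `BatchNorm1D` and `inverted_dropout` are hypothetical names for illustration.

```python
import numpy as np


class BatchNorm1D:
    """Batch normalization over axis 0 with a learnable scale (gamma) and offset (beta)."""

    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, x, training):
        if training:
            # normalize with the statistics of the current mini-batch ...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ... and update the moving averages that are used at inference time
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta


def inverted_dropout(x, rate, training, rng):
    """Zero each unit with probability `rate` while training; identity at inference."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
bn = BatchNorm1D(num_features=100)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 100))

train_out = bn(batch, training=True)
print(train_out.mean(), train_out.std())   # approximately 0 and 1

dropped = inverted_dropout(train_out, rate=0.3, training=True, rng=rng)
print((dropped == 0).mean())               # roughly 30% of the units are zeroed
print(np.array_equal(inverted_dropout(train_out, 0.3, False, rng), train_out))
```

A layer that always normalizes with batch statistics and always applies dropout, even when predicting, makes its test-time outputs depend on the batch and on random masks, which is why the mode switch matters.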

```python
import numpy as np
import tensorflow as tf  # written against the TensorFlow 1.x API (tf.placeholder, tf.Session)
from sklearn.metrics import accuracy_score, precision_score

# prepare the dataset and scale pixel values from [0, 255] to [0, 1]
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0

# digits 0-4 are used in this exercise
train_mask, test_mask = y_train <= 4, y_test <= 4
X_train_early, y_train_early = X_train[train_mask], y_train[train_mask]
X_test_early, y_test_early = X_test[test_mask], y_test[test_mask]
# digits 5-9 are kept for the transfer learning exercise
X_train_second, y_train_second = X_train[~train_mask], y_train[~train_mask]
X_test_second, y_test_second = X_test[~test_mask], y_test[~test_mask]

# hyperparameters
num_epochs = 35
batch_size = 32
learning_rate = 0.001
dropout_rate = 0.3
num_training_samples = X_train_early.shape[0]
num_batches = int(np.ceil(num_training_samples / batch_size))
checkpoint_path = "/your/path/here"

# build the network
input_tensor = tf.placeholder(tf.float32, shape=(None, 28, 28), name="input")
reshape_input = tf.reshape(input_tensor, [-1, 28 * 28])
label = tf.placeholder(tf.int32, shape=(None,), name="label")
# feed True while training; the default (False) is used for evaluation, so that
# batch normalization uses its moving averages and dropout is switched off
training = tf.placeholder_with_default(False, shape=(), name="training")

def fully_connected_layer_with_elu_bn_dropout(X, num_neurons):
    """
    A custom fully connected layer: Dense (He initialization) -> batch normalization -> ELU -> dropout

    :param X: input tensor
    :param num_neurons: number of neurons in this layer
    :return: output tensor after batch normalization, ELU activation and dropout (to reduce overfitting)
    """
    with tf.name_scope("my_fully_connected_layer"):
        hidden_layer = tf.layers.Dense(num_neurons, kernel_initializer=tf.initializers.he_normal())(X)
        # learns a scale and an offset, and tracks moving averages of the mean and variance for inference
        hidden_layer = tf.layers.batch_normalization(hidden_layer, training=training)
        hidden_layer = tf.nn.elu(hidden_layer)
        # dropout is only applied when training is True
        hidden_layer = tf.layers.dropout(hidden_layer, rate=dropout_rate, training=training)
        return hidden_layer

with tf.name_scope("fully_connected_network"):
    hidden_1 = fully_connected_layer_with_elu_bn_dropout(reshape_input, num_neurons=100)
    hidden_2 = fully_connected_layer_with_elu_bn_dropout(hidden_1, num_neurons=100)
    hidden_3 = fully_connected_layer_with_elu_bn_dropout(hidden_2, num_neurons=100)
    hidden_4 = fully_connected_layer_with_elu_bn_dropout(hidden_3, num_neurons=100)
    hidden_5 = fully_connected_layer_with_elu_bn_dropout(hidden_4, num_neurons=100)

    logits = tf.layers.Dense(5, kernel_initializer=tf.initializers.he_normal(), name="output")(hidden_5)

with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits)
    loss = tf.reduce_mean(xentropy, name='loss')

with tf.name_scope('train'):
    adam_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    # run the batch normalization moving-average updates together with every training step
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = adam_optimizer.minimize(loss)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(num_epochs):
        # reshuffle the training set at the start of every epoch
        random_indexes = np.random.permutation(num_training_samples)
        for j in range(num_batches):
            selected_index = random_indexes[j * batch_size:(j + 1) * batch_size]
            X_train_batch, y_train_batch = X_train_early[selected_index], y_train_early[selected_index]
            sess.run(train_op, feed_dict={input_tensor: X_train_batch, label: y_train_batch, training: True})

        print("---------- Epoch %d ----------" % epoch)
        # loss on the last mini-batch of the epoch, computed in inference mode
        print("Loss:", loss.eval(feed_dict={input_tensor: X_train_batch, label: y_train_batch}))
        # save a checkpoint after every epoch for reuse in the next exercise
        save_path = saver.save(sess, checkpoint_path)

    # check the accuracy and precision of the trained model on the test set
    predictions = np.argmax(logits.eval(feed_dict={input_tensor: X_test_early}), axis=1)
    print("========== Model Performance ==========")
    print("Accuracy:", accuracy_score(y_test_early, predictions))
    print("Precision:", precision_score(y_test_early, predictions, average='macro'))
```
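The training step delegates the parameter update to `tf.train.AdamOptimizer` with `learning_rate=0.001`. For reference, a single Adam step can be sketched in NumPy as follows. It follows the update rule from the Adam paper by Kingma and Ba, with TensorFlow 1.x's default `beta1=0.9`, `beta2=0.999` and `epsilon=1e-8`; TensorFlow folds the bias correction into the learning rate, which places epsilon slightly differently, so results may differ in the last digits. The function `adam_step` and the toy quadratic objective are illustrative assumptions.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Apply one Adam update (t is the 1-based step count); return the new theta, m, v."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # correct the bias from initializing m at zero
    v_hat = v / (1 - beta2 ** t)                # correct the bias from initializing v at zero
    theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta, m, v


# Toy example: minimize f(theta) = sum(theta ** 2), whose gradient is 2 * theta
theta = np.array([1.0, -3.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, learning_rate=0.01)
print(theta)  # both entries end up close to 0
```

Because `m_hat / sqrt(v_hat)` is roughly scale-free, the size of each step is governed mainly by the learning rate rather than by the raw gradient magnitude; the very first step, for example, moves each parameter by about `learning_rate`.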

## Exercise 9: Transfer Learning

Requirements:
1. Reuse the hidden layers from the previous exercise to build a new DNN.
2. Train the new network on digits 5-9.
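A common way to approach this exercise is to freeze the reused hidden layers and train only a new softmax output layer on digits 5-9, optionally unfreezing some of the top hidden layers later. In TensorFlow 1.x this is usually done by passing only the new layer's variables to `optimizer.minimize(loss, var_list=...)`. The NumPy sketch below shows the same idea without TensorFlow, under loud assumptions: a fixed random layer (`W_frozen`, hypothetical) stands in for the pretrained layers restored from the checkpoint, the data are synthetic rather than MNIST, and only the new output layer is trained with plain gradient descent. The printed value is training accuracy on this toy data, not a result for the exercise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the pretrained hidden layers restored from the checkpoint:
# fixed weights that are never updated below
W_frozen = rng.normal(scale=np.sqrt(2.0 / 10), size=(10, 32))


def frozen_hidden(X):
    """Frozen feature extractor: ELU applied to a fixed linear map."""
    z = X @ W_frozen
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Synthetic 5-class data standing in for digits 5-9 (relabelled 0-4)
centers = rng.normal(scale=3.0, size=(5, 10))
y_new = rng.integers(0, 5, size=500)
X_new = centers[y_new] + rng.normal(size=(500, 10))

# The frozen layers are applied once; their outputs are standardized so that
# plain gradient descent on the new layer is well conditioned
features = frozen_hidden(X_new)
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

# New softmax output layer: the only trainable parameters
W_out = np.zeros((32, 5))
b_out = np.zeros(5)
one_hot = np.eye(5)[y_new]
learning_rate = 0.05

for step in range(500):
    probs = softmax(features @ W_out + b_out)
    grad_logits = (probs - one_hot) / len(y_new)  # gradient of the mean cross-entropy w.r.t. the logits
    W_out -= learning_rate * features.T @ grad_logits
    b_out -= learning_rate * grad_logits.sum(axis=0)

train_accuracy = np.mean(softmax(features @ W_out + b_out).argmax(axis=1) == y_new)
print(train_accuracy)
```

Since the frozen layers never change, their outputs can be computed once and cached, which is one reason freezing the lower layers makes transfer learning cheap.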

## Exercise 10: Pretraining on an Auxiliary Task

Requirements: