## Deep Learning
a. Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function.

b. Using Adam optimization and early stopping, try training it on MNIST but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You will need a softmax output layer with five neurons, and as always make sure to save checkpoints at regular intervals and save the final model so you can reuse it later.

c. Tune the hyperparameters using cross-validation and see what precision you can achieve.

d. Now try adding Batch Normalization and compare the learning curves: is it converging faster than before? Does it produce a better model?

e. Is the model overfitting the training set? Try adding dropout to every layer and try again. Does it help?

In [1]:
import numpy as np
import tensorflow as tf

#### Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function.

In [2]:
# Create Inputs and outputs
n_inputs = 28 * 28
n_hidden = 100
n_outputs = 5

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')
bn_params = {
    'is_training': is_training,
    'decay': 0.99,
    'updates_collections': None
}
keep_prob = 0.8

In [3]:
# build layers
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.layers import batch_norm
from tensorflow.contrib.layers import dropout

he_init = tf.contrib.layers.variance_scaling_initializer()
with tf.contrib.framework.arg_scope(
        [fully_connected],
        weights_initializer = he_init,
        normalizer_fn =  batch_norm,
        normalizer_params = bn_params):
    hidden_1 = fully_connected(X, n_hidden, scope='hidden1', activation_fn=tf.nn.relu)
    hidden_drop_1 = dropout(hidden_1, keep_prob, is_training=is_training)
    hidden_2 = fully_connected(hidden_drop_1, n_hidden, scope='hidden2', activation_fn=tf.nn.relu)
    hidden_drop_2 = dropout(hidden_2, keep_prob, is_training=is_training)
    hidden_3 = fully_connected(hidden_drop_2, n_hidden, scope='hidden3', activation_fn=tf.nn.relu)
    hidden_drop_3 = dropout(hidden_3, keep_prob, is_training=is_training)
    hidden_4 = fully_connected(hidden_drop_3, n_hidden, scope='hidden4', activation_fn=tf.nn.relu)
    hidden_drop_4 = dropout(hidden_4, keep_prob, is_training=is_training)
    logits = fully_connected(hidden_drop_4, n_outputs, scope='hidden5', activation_fn=tf.nn.softmax)

In [4]:
# calculate loss
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name='loss')

In [5]:
# Optimizer
with tf.name_scope('train'):
    optimizer = tf.train.AdamOptimizer()
    train_op = optimizer.minimize(loss)

In [6]:
# evalue, 这一步不影响训练流程，只是为了获取一些信息
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

In [7]:
# init
init = tf.global_variables_initializer()
saver = tf.train.Saver()

In [8]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data')

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [9]:

n_epoch = 50
batch_size = 50
minimum_loss = 1.11
# minimum_loss = 0.913
# minimum_loss = 1.49

def run_train(sess, X_train, y_train, train_idx):
    sess.run(init)
#     sess.run(he_init)
    model_path = '/Users/chancezhang/machine_learning/hands_on_machine_learning/tmp/C11/C11_Ex08_' + str(train_idx) + '.ckpt'
    for epoch in range(n_epoch):
        for idx in range(X_train.shape[0] // batch_size):
            X_batch = X_train[batch_size * idx : batch_size * (idx + 1)]
            y_batch = y_train[batch_size * idx : batch_size * (idx + 1)]
            run_ret = sess.run(train_op, feed_dict={X:X_batch, y:y_batch, is_training:True})
        runtime_loss = loss.eval(feed_dict={X:X_train, y:y_train, is_training:False})
        runtime_acc = accuracy.eval(feed_dict={X:X_train, y:y_train, is_training:False})
        print('epoch ', epoch, 'runtime-accuracy:', runtime_acc, 'loss:', runtime_loss)
        if runtime_loss < minimum_loss:
            save_path = saver.save(sess, model_path) # early-stopping
            return;
    save_path = saver.save(sess, model_path) # early-stopping

# with tf.Session() as sess:
    
#     for epoch in range(n_epoch):
#         for mini_batch_idx in range(mnist.train.num_examples // batch_size):
#             X_batch, y_batch = mnist.train.next_batch(batch_size)
#             sess.run(train_op, feed_dict={X:X_batch, y:y_batch})
#         acc_train = accuracy.eval(feed_dict={X:X_batch, y:y_batch})
#         acc_test = accuracy.eval(feed_dict={X:mnist.test.images,
#                                             y:mnist.test.labels})
#         print('epoch ', epoch, 'train_acc:', acc_train, 'test_acc:', acc_test)
#     save_path = saver.save(sess, model_path)

In [10]:
# Cross Validation Implementation
split_size = 5
from sklearn.model_selection import KFold
results = []
with tf.Session() as sess:
    # 只采用0~4数据
    X_train_all = mnist.train.images
    y_train_all = mnist.train.labels
    selective_indices = np.where(y_train_all < 5)
    X_train = X_train_all[selective_indices]
    y_train = y_train_all[selective_indices]
    
    train_id = 0
    kf = KFold(n_splits=split_size)
    for train_idx, val_idx in kf.split(X_train, y_train):
        X_train_cs = X_train[train_idx]
        y_train_cs = y_train[train_idx]
        X_val_cs = X_train[val_idx]
        y_val_cs = y_train[val_idx]
        print('Begin Cross-Validation Train: ', train_id)
        run_train(sess, X_train_cs, y_train_cs, train_id)
        acc = accuracy.eval(feed_dict={X:X_val_cs, y:y_val_cs, is_training:False})
        print('accuracy: ', acc)
        results.append(acc)
        train_id += 1
print(results)

Begin Cross-Validation Train:  0
epoch  0 runtime-accuracy: 0.9749443 loss: 1.1184297
epoch  1 runtime-accuracy: 0.9820776 loss: 1.1162745
epoch  2 runtime-accuracy: 0.9875167 loss: 1.1136063
epoch  3 runtime-accuracy: 0.99032545 loss: 1.1152239
epoch  4 runtime-accuracy: 0.9912617 loss: 1.1091119
accuracy:  0.9803852
Begin Cross-Validation Train:  1
epoch  0 runtime-accuracy: 0.9728489 loss: 1.1209599
epoch  1 runtime-accuracy: 0.98109674 loss: 1.1194181
epoch  2 runtime-accuracy: 0.98555505 loss: 1.1192982
epoch  3 runtime-accuracy: 0.98903257 loss: 1.111756
epoch  4 runtime-accuracy: 0.99090505 loss: 1.115391
epoch  5 runtime-accuracy: 0.99148464 loss: 1.1149554
epoch  6 runtime-accuracy: 0.9937138 loss: 1.1141894
epoch  7 runtime-accuracy: 0.9939813 loss: 1.1158588
epoch  8 runtime-accuracy: 0.9953188 loss: 1.1190138
epoch  9 runtime-accuracy: 0.99505126 loss: 1.1126528
epoch  10 runtime-accuracy: 0.9960321 loss: 1.1163722
epoch  11 runtime-accuracy: 0.9961659 loss: 1.1204696
epoch

epoch  44 runtime-accuracy: 0.9987963 loss: 1.128567
epoch  45 runtime-accuracy: 0.9986626 loss: 1.1225657
epoch  46 runtime-accuracy: 0.99884087 loss: 1.125255
epoch  47 runtime-accuracy: 0.99893004 loss: 1.1256492
epoch  48 runtime-accuracy: 0.998618 loss: 1.1267347
epoch  49 runtime-accuracy: 0.99888545 loss: 1.1263832
accuracy:  0.98983413
Begin Cross-Validation Train:  4
epoch  0 runtime-accuracy: 0.9709331 loss: 1.1192257
epoch  1 runtime-accuracy: 0.9799385 loss: 1.1186392
epoch  2 runtime-accuracy: 0.98568946 loss: 1.113932
epoch  3 runtime-accuracy: 0.98809683 loss: 1.1105347
epoch  4 runtime-accuracy: 0.98996925 loss: 1.1120654
epoch  5 runtime-accuracy: 0.99201995 loss: 1.1147864
epoch  6 runtime-accuracy: 0.99233204 loss: 1.1156058
epoch  7 runtime-accuracy: 0.99420446 loss: 1.1169355
epoch  8 runtime-accuracy: 0.9950069 loss: 1.116377
epoch  9 runtime-accuracy: 0.99563104 loss: 1.1183469
epoch  10 runtime-accuracy: 0.99558645 loss: 1.1189171
epoch  11 runtime-accuracy: 0.9

In [11]:
# Train and save final model
with tf.Session() as sess:
    print('Begin Final Train')
    X_train_all = mnist.train.images
    y_train_all = mnist.train.labels
    selective_indices = np.where(y_train_all < 5)
    X_train = X_train_all[selective_indices]
    y_train = y_train_all[selective_indices]
    
    X_test_all = mnist.test.images
    y_test_all = mnist.test.labels
    selective_indices = np.where(y_test_all < 5)
    X_test = X_test_all[selective_indices]
    y_test = y_test_all[selective_indices]
    
    run_train(sess, X_train, y_train, 100)
    acc = accuracy.eval(feed_dict={X:X_test, y:y_test, is_training:False})
    print('Final Accuracy: ', acc)

Begin Final Train
epoch  0 runtime-accuracy: 0.9755689 loss: 1.124169
epoch  1 runtime-accuracy: 0.9823097 loss: 1.1194228
epoch  2 runtime-accuracy: 0.9883373 loss: 1.1178962
epoch  3 runtime-accuracy: 0.9905842 loss: 1.1171715
epoch  4 runtime-accuracy: 0.99168986 loss: 1.1162269
epoch  5 runtime-accuracy: 0.99304515 loss: 1.1150918
epoch  6 runtime-accuracy: 0.99440044 loss: 1.1161937
epoch  7 runtime-accuracy: 0.99543476 loss: 1.1168222
epoch  8 runtime-accuracy: 0.9958628 loss: 1.1180718
epoch  9 runtime-accuracy: 0.9958628 loss: 1.1178206
epoch  10 runtime-accuracy: 0.9962194 loss: 1.1146296
epoch  11 runtime-accuracy: 0.9966474 loss: 1.1121229
epoch  12 runtime-accuracy: 0.9964691 loss: 1.1181396
epoch  13 runtime-accuracy: 0.9972894 loss: 1.1153132
epoch  14 runtime-accuracy: 0.99682575 loss: 1.1268251
epoch  15 runtime-accuracy: 0.99718237 loss: 1.1176854
epoch  16 runtime-accuracy: 0.9975034 loss: 1.1195031
epoch  17 runtime-accuracy: 0.99703974 loss: 1.1198167
epoch  18 runt