# MNIST master

For this demo we shall solve the same digit recognition problem, but at a different scale:
* images are now 28x28
* 10 different digits
* 50k samples

Before doing this homework, read some code examples written in TensorFlow. A good repository of examples is https://github.com/aymericdamien/TensorFlow-Examples. As we already know, we need many samples to learn :)

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
import tensorflow as tf

In [3]:
from mnist import load_dataset
X_train,y_train,X_val,y_val,X_test,y_test = load_dataset()

print(X_train.shape,y_train.shape)

(50000, 1, 28, 28) (50000,)
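
The printed shape `(50000, 1, 28, 28)` is channels-first: (samples, channels, height, width). `tf.layers.conv2d` defaults to `data_format='channels_last'`, so an array fed in this layout has its last axis (28) read as channels. Below is a minimal sketch of one possible conversion, using a small random array as a stand-in for `X_train`. The names `X_nchw` and `X_nhwc` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Stand-in for X_train: 8 grayscale images in channels-first (NCHW) layout.
X_nchw = np.random.rand(8, 1, 28, 28).astype(np.float32)

# Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C).
X_nhwc = np.transpose(X_nchw, (0, 2, 3, 1))

print(X_nchw.shape)  # (8, 1, 28, 28)
print(X_nhwc.shape)  # (8, 28, 28, 1)

# The pixel values are unchanged; only the axis order differs.
assert np.array_equal(X_nhwc[..., 0], X_nchw[:, 0])
```

Another option would be to keep the data as is and pass `data_format='channels_first'` to the convolution and pooling layers, where the hardware supports it. The results recorded in this notebook were obtained without either change.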


In [4]:
# def add_noise(X):
#     return X*np.random.normal(0.6, 0.4, X.shape)

# X_train = np.append(X_train, add_noise(X_train), axis = 0)
# y_train = np.append(y_train, y_train, axis = 0)

# X_train = np.append(X_train, add_noise(X_train), axis = 0)
# y_train = np.append(y_train, y_train, axis = 0)

In [5]:
# print(X_train.shape,y_train.shape)
# plt.imshow(-X_train[151115, 0], cmap="jet")
# y_train[151115]
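
The two commented-out cells above sketch a simple augmentation: multiply each pixel by Gaussian noise and append the noisy copies to the training set. Here is a self-contained version of that idea on toy data. The seeded `rng` and the stand-in arrays `X` and `y` are assumptions added for illustration only.

```python
import numpy as np

def add_noise(X, rng, mean=0.6, std=0.4):
    # Multiply every pixel by an independent Gaussian factor,
    # mirroring the commented-out add_noise above.
    return X * rng.normal(mean, std, X.shape)

rng = np.random.RandomState(0)           # fixed seed, for reproducibility only
X = rng.rand(10, 1, 28, 28)              # stand-in for X_train
y = rng.randint(0, 10, size=10)          # stand-in for y_train

# Append one noisy copy: the sample count doubles, labels are repeated.
X_aug = np.concatenate([X, add_noise(X, rng)], axis=0)
y_aug = np.concatenate([y, y], axis=0)

print(X_aug.shape, y_aug.shape)  # (20, 1, 28, 28) (20,)
```

Running the append twice, as in the commented-out cell, would grow 50,000 samples to 200,000, with the second pass adding noise to the already noisy copies as well.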

In [26]:
# Placeholders for input and target are defined together with the network in the next cell


Defining network architecture

In [83]:
# Flatten images to a (batch_size x 784) matrix (unused: the convolutional layers take the 4-D input)
# input_X_reshaped = tf.reshape(input_X, shape=[-1, 1*28*28], name="reshape_X")

# Network: two blocks of two ReLU convolutions followed by max pooling,
# then four fully connected ReLU layers and a 10-unit linear output layer.
# You can give a name to each layer (optional)
tf.reset_default_graph()
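# NOTE: the data is channels-first (N, 1, 28, 28), while tf.layers.conv2d defaults to
# data_format='channels_last', so the layers below treat the last axis (28) as channels.
input_X = tf.placeholder(tf.float32, shape=[None, 1, 28, 28], 
                         name="X")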
target_y = tf.placeholder(tf.int32, shape=[None], 
                          name="target_Y_integer")


cv1_0 = tf.layers.conv2d(input_X, filters=128, kernel_size=(5, 5), strides=1, padding='same', activation=tf.nn.relu, name='cv1_0')
cv1_1 = tf.layers.conv2d(cv1_0,   filters=128, kernel_size=(5, 5), strides=1, padding='same', activation=tf.nn.relu, name='cv1_1')
mp1 = tf.layers.max_pooling2d(cv1_1, (2, 2), strides=2, padding='same', name='mp1')

cv2_0 = tf.layers.conv2d(mp1,     filters=256, kernel_size=(3, 3), strides=1, padding='same', activation=tf.nn.relu, name='cv2_0')
cv2_1 = tf.layers.conv2d(cv2_0,   filters=256, kernel_size=(3, 3), strides=1, padding='same', activation=tf.nn.relu, name='cv2_1')
mp2 = tf.layers.max_pooling2d(cv2_1, (2, 2), strides=2, padding='same', name='mp2')


# cv3_0 = tf.layers.conv2d(mp2,     filters=256, kernel_size=(3, 3), strides=1, padding='same', activation=tf.nn.relu, name='cv3_0')
# cv3_1 = tf.layers.conv2d(cv3_0,   filters=256, kernel_size=(3, 3), strides=1, padding='same', activation=tf.nn.relu, name='cv3_1')
# mp3 = tf.layers.max_pooling2d(cv3_1, (2, 2), strides=2, padding='same', name='mp3')

f  = tf.layers.flatten(mp2, name='flatten')

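# NOTE: tf.nn.l2_normalize is a normalization op, not a weight penalty, and `loss` below
# does not add any regularization terms, so kernel_regularizer likely has no effect on training.
l1 = tf.layers.dense(f , units=512, activation=tf.nn.relu, name='l1', kernel_regularizer=tf.nn.l2_normalize)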

l2 = tf.layers.dense(l1 , units=256, activation=tf.nn.relu, name='l2', kernel_regularizer=tf.nn.l2_normalize)

l3 = tf.layers.dense(l2 , units=128, activation=tf.nn.relu, name='l3', kernel_regularizer=tf.nn.l2_normalize)

l4 = tf.layers.dense(l3 , units=32, activation=tf.nn.relu, name='l4', kernel_regularizer=tf.nn.l2_normalize)

l5 = tf.layers.dense(l4 , units=10, activation=None, name='l5')

# We use softmax nonlinearity to make probabilities add up to 1
l_out = tf.nn.softmax(l5, name='l_out')

# Prediction
y_predicted = tf.argmax(l5, axis=-1)

weights = tf.trainable_variables()
weights

[<tf.Variable 'cv1_0/kernel:0' shape=(5, 5, 28, 128) dtype=float32_ref>,
 <tf.Variable 'cv1_0/bias:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'cv1_1/kernel:0' shape=(5, 5, 128, 128) dtype=float32_ref>,
 <tf.Variable 'cv1_1/bias:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'cv2_0/kernel:0' shape=(3, 3, 128, 256) dtype=float32_ref>,
 <tf.Variable 'cv2_0/bias:0' shape=(256,) dtype=float32_ref>,
 <tf.Variable 'cv2_1/kernel:0' shape=(3, 3, 256, 256) dtype=float32_ref>,
 <tf.Variable 'cv2_1/bias:0' shape=(256,) dtype=float32_ref>,
 <tf.Variable 'l1/kernel:0' shape=(1792, 512) dtype=float32_ref>,
 <tf.Variable 'l1/bias:0' shape=(512,) dtype=float32_ref>,
 <tf.Variable 'l2/kernel:0' shape=(512, 256) dtype=float32_ref>,
 <tf.Variable 'l2/bias:0' shape=(256,) dtype=float32_ref>,
 <tf.Variable 'l3/kernel:0' shape=(256, 128) dtype=float32_ref>,
 <tf.Variable 'l3/bias:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'l4/kernel:0' shape=(128, 32) dtype=float32_ref>,
 <tf.Variable 'l4/bi

### Then you could simply
* define loss function manually
* compute error gradient over all weights
* define updates
* But that's a whole lot of work and life's short
  * not to mention life's too short to wait for SGD to converge
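
To give a feeling for the manual route listed above, here is a minimal NumPy sketch for a single linear softmax layer: loss, gradients, and a plain gradient-descent update. It runs on random stand-in data and is not the network from this notebook. The helper names `softmax` and `loss_and_grads` and the step size 0.05 are assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_and_grads(W, b, X, y):
    # Mean cross-entropy of a linear softmax classifier and its gradients.
    n = X.shape[0]
    probs = softmax(X @ W + b)
    loss = -np.log(probs[np.arange(n), y]).mean()
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0      # d(loss)/d(logits) = probs - one_hot(y)
    dlogits /= n
    return loss, X.T @ dlogits, dlogits.sum(axis=0)

rng = np.random.RandomState(0)
X = rng.randn(64, 784)                   # stand-in for flattened images
y = rng.randint(0, 10, size=64)          # stand-in for integer labels
W = np.zeros((784, 10))
b = np.zeros(10)

losses = []
for step in range(20):
    loss, dW, db = loss_and_grads(W, b, X, y)
    losses.append(loss)
    W -= 0.05 * dW                       # plain gradient-descent update
    b -= 0.05 * db

# The first loss equals log(10): all-zero weights give uniform predictions.
print(losses[0], losses[-1])
```

Even for one layer this takes some care (numerical stability, correct gradient shapes), and every extra layer adds more hand-written derivatives.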

Instead, we shall use TensorFlow builtins

In [84]:
# Mean categorical crossentropy as a loss function
# - similar to logistic loss but for multiclass targets
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_y, logits=l5, name="softmax_loss"))

In [85]:
accuracy, update_accuracy = tf.metrics.accuracy(target_y, y_predicted)
# tf.local_variables()
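
`tf.metrics.accuracy` returns two ops: one that reads the current value and one that updates it. They are backed by two local variables (`total` and `count`), which is why the training loop below re-runs `tf.local_variables_initializer()` before every pass. The bookkeeping can be sketched in plain Python as follows. The class `StreamingAccuracy` is an illustrative stand-in, not TensorFlow code.

```python
import numpy as np

class StreamingAccuracy:
    """Running accuracy over batches (illustrative stand-in, not TensorFlow code)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Plays the role of re-running the local-variables initializer.
        self.total = 0.0   # number of correct predictions so far
        self.count = 0.0   # number of examples seen so far

    def update(self, labels, predictions):
        # Plays the role of running update_accuracy on one batch.
        labels = np.asarray(labels)
        predictions = np.asarray(predictions)
        self.total += float((labels == predictions).sum())
        self.count += float(labels.size)
        return self.value()

    def value(self):
        # Plays the role of reading accuracy.
        return self.total / self.count if self.count else 0.0

metric = StreamingAccuracy()
metric.update([1, 2, 3, 4], [1, 2, 0, 4])   # 3 of 4 correct
metric.update([5, 6], [5, 0])               # 1 of 2 correct
print(metric.value())                        # 4 of 6 correct overall
```

Without the reset, the validation accuracy would be mixed with the counts accumulated during the training pass.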

In [86]:
optimizer = tf.train.AdamOptimizer()
train_step = optimizer.minimize(loss)
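
For intuition about what the Adam optimizer does on each step, here is a rough NumPy sketch of the textbook update rule (Kingma and Ba) applied to a toy quadratic. The function `adam_step` is a hypothetical helper written for this example. TensorFlow's own implementation differs in small details, such as how epsilon and the bias correction are arranged. The default hyperparameters in the signature match those of `tf.train.AdamOptimizer`.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One textbook Adam update; t is the 1-based step counter.
    m = beta1 * m + (1.0 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                  # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = (x - 3)^2 starting from x = 0.
x = np.array(0.0)
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 2001):
    grad = 2.0 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.01)

print(float(x))  # ends up close to the minimizer x = 3
```

Note that the very first step has size close to `lr` regardless of the gradient's magnitude, because the update divides by the running estimate of the gradient scale.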

### That's all, now let's train it!
* We have a lot of data, so it's recommended that you use SGD
* So let's implement a function that splits the training sample into minibatches

In [87]:
# An auxiliary function that returns mini-batches for neural network training

# Parameters
# inputs - a tensor of images with shape (many, 1, 28, 28), e.g. X_train
# targets - a vector of answers for the corresponding images, e.g. y_train
# batchsize - a single number, the intended size of each batch

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = indices[start_idx:start_idx + batchsize]
        yield inputs[excerpt], targets[excerpt]
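
A quick way to see how the helper behaves is to run it on toy arrays. The sketch below repeats the function so that it is self-contained. The toy `X` and `y` are made up for the example.

```python
import numpy as np

def iterate_minibatches(inputs, targets, batchsize):
    # Same logic as the helper above: shuffle once, yield full batches only.
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = indices[start_idx:start_idx + batchsize]
        yield inputs[excerpt], targets[excerpt]

X = np.arange(10).reshape(10, 1)   # toy inputs: sample i holds the value i
y = np.arange(10)                  # toy targets: label i for sample i

batches = list(iterate_minibatches(X, y, batchsize=4))
print(len(batches))                          # 2 full batches; 2 samples are dropped
print([len(xb) for xb, _ in batches])        # [4, 4]

# Inputs and targets stay aligned after shuffling.
assert all((xb[:, 0] == yb).all() for xb, yb in batches)
```

Because only full batches are yielded, up to `batchsize - 1` samples are skipped in each pass. Which ones are skipped changes with every shuffle.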

# Training loop

Model saver.
<br>
See more:
http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/

In [88]:
model_path = "./checkpoints/model.ckpt"
saver = tf.train.Saver(max_to_keep=3)
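
With `max_to_keep=3` only the three most recent checkpoints are retained. The plain-Python sketch below imitates that retention for a run of 25 epochs saved with `global_step=epoch`. The `deque` is only an illustration of the bookkeeping, not how the saver is implemented.

```python
from collections import deque

model_path = "./checkpoints/model.ckpt"
max_to_keep = 3

# Illustrative stand-in for the saver's list of retained checkpoints.
kept = deque(maxlen=max_to_keep)
for epoch in range(25):
    kept.append("{}-{}".format(model_path, epoch))  # same naming as global_step=epoch

print(list(kept))
```

If all 25 epochs run, the last retained entry is `./checkpoints/model.ckpt-24`, which is what `saver.last_checkpoints[-1]` points to when the model is restored later.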

In [89]:
import time

num_epochs = 25 # number of passes through the data

batch_size = 100 # number of samples per minibatch (grows over epochs, see the end of the loop)

with tf.Session() as sess:
    # initialize global variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        train_err = 0
        train_batches = 0
        start_time = time.time()

        sess.run(tf.local_variables_initializer())
        for batch in iterate_minibatches(X_train, y_train, batch_size):
            inputs, targets = batch

            _, train_err_batch, _ = sess.run(
                [train_step, loss, update_accuracy], 
                feed_dict={input_X: inputs, target_y:targets}
            )
            train_err += train_err_batch
            train_batches += 1
            if train_batches%100 == 0:
                print(train_batches)
            time.sleep(0.002)
        train_acc = sess.run(accuracy)

        # And a full pass over the validation data:
        sess.run(tf.local_variables_initializer())
        for batch in iterate_minibatches(X_val, y_val, batch_size):
            inputs, targets = batch
            sess.run(update_accuracy, feed_dict={input_X: inputs, 
                                                 target_y:targets})
        val_acc = sess.run(accuracy)


        # Then we print the results for this epoch:
        print("Epoch {} of {} took {:.3f}s".format(epoch + 1, num_epochs, time.time() - start_time))

        print("  training loss (in-iteration):\t\t{:.6f}".format(train_err / train_batches))
        print("  train accuracy:\t\t{:.2f} %".format(train_acc * 100))
        print("  validation accuracy:\t\t{:.2f} %".format(val_acc * 100))
        
        # save model
        save_path = saver.save(sess, model_path, global_step=epoch)
        print("  Model saved in file: %s" % save_path)
        
        batch_size += (epoch*3)//4
        batch_size = min(batch_size, 1000)
        
        if val_acc > 0.995:
            break
        time.sleep(5)

100
200
300
400
500
Epoch 1 of 25 took 174.551s
  training loss (in-iteration):		0.292077
  train accuracy:		90.68 %
  validation accuracy:		97.00 %
  Model saved in file: ./checkpoints/model.ckpt-0
100
200
300
400
500
Epoch 2 of 25 took 174.097s
  training loss (in-iteration):		0.081920
  train accuracy:		97.53 %
  validation accuracy:		98.07 %
  Model saved in file: ./checkpoints/model.ckpt-1
100
200
300
400
500
Epoch 3 of 25 took 175.578s
  training loss (in-iteration):		0.059325
  train accuracy:		98.26 %
  validation accuracy:		97.91 %
  Model saved in file: ./checkpoints/model.ckpt-2
100
200
300
400
Epoch 4 of 25 took 173.715s
  training loss (in-iteration):		0.051032
  train accuracy:		98.44 %
  validation accuracy:		98.46 %
  Model saved in file: ./checkpoints/model.ckpt-3
100
200
300
400
Epoch 5 of 25 took 171.915s
  training loss (in-iteration):		0.037823
  train accuracy:		98.91 %
  validation accuracy:		98.39 %
  Model saved in file: ./checkpoints/model.ckpt-4
100
200
300
4
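
From epoch 4 on, the progress counter in the log above stops at 400 instead of 500. This follows from the batch-size schedule at the end of the training loop: once the batch size exceeds 100, fewer than 500 full batches fit into 50,000 samples. The sketch below replays only that arithmetic, without TensorFlow. `schedule` is a name introduced for the example.

```python
num_samples = 50000   # training-set size printed earlier
num_epochs = 25
batch_size = 100

schedule = []
for epoch in range(num_epochs):
    full_batches = num_samples // batch_size   # partial last batch is dropped
    schedule.append((epoch + 1, batch_size, full_batches))
    # Same update as at the end of each epoch in the training loop.
    batch_size += (epoch * 3) // 4
    batch_size = min(batch_size, 1000)

# Columns: epoch number, batch size, number of full batches.
for row in schedule[:5]:
    print(row)
# (1, 100, 500)
# (2, 100, 500)
# (3, 100, 500)
# (4, 101, 495)
# (5, 103, 485)
```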

Now we can restore saved parameters:

In [90]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    restore_path = saver.last_checkpoints[-1]
    saver.restore(sess, restore_path)  # restore() returns None, so keep the path separately
    print("Model restored from file: %s" % restore_path)
    
    sess.run(tf.local_variables_initializer())
    for batch in iterate_minibatches(X_test, y_test, 500):
        inputs, targets = batch
        sess.run(update_accuracy, feed_dict={input_X: inputs, 
                                                   target_y:targets})
    test_acc = sess.run(accuracy)
    print("Final results:")
    print("  test accuracy:\t\t{:.2f} %".format(
        test_acc* 100))

    if test_acc * 100 > 99.5:
        print("Achievement unlocked: 80lvl Warlock!")
    else:
        print("We need more magic!")

INFO:tensorflow:Restoring parameters from ./checkpoints/model.ckpt-24
Model restored from file: ./checkpoints/model.ckpt-24
Final results:
  test accuracy:		99.13 %
We need more magic!


# Now improve it!

* Moar layers!
* Moar units!
* Different nonlinearities!