# Task 1: Many-to-one model: 3-layer GRU
In this task we classify the MNIST data set using RNNs of different sizes and with different cell types. The binarized images are passed into the RNN one pixel at a time and only the final output is kept. This output is passed through a fully connected layer of 100 units with a ReLU activation function, then through another linear layer of width 10, and a softmax operation gives the probability of each digit. The cross-entropy loss is used as the cost and the Adam optimizer is used to minimize it.
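
The classification head described above can be sketched in NumPy. This is only an illustration under stated assumptions, not the notebook's TensorFlow graph: the random weights and the names `classification_head`, `w1`, `b1`, `w2` and `b2` are made up for the example, and only the layer widths (32, 100, 10) come from the notebook.

```python
import numpy as np

def classification_head(last_output, w1, b1, w2, b2):
    """Map the final RNN output to digit probabilities (hypothetical helper)."""
    hidden = np.maximum(0.0, last_output @ w1 + b1)        # ReLU layer, 100 units
    logits = hidden @ w2 + b2                               # linear layer, 10 units
    shifted = logits - logits.max(axis=1, keepdims=True)    # stabilise the exponentials
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)             # softmax over the 10 digits

rng = np.random.default_rng(0)
rnn_size, units_output, n_classes = 32, 100, 10
last_output = rng.standard_normal((4, rnn_size))            # stand-in for 4 final RNN outputs
w1 = rng.standard_normal((rnn_size, units_output))
b1 = rng.standard_normal(units_output)
w2 = rng.standard_normal((units_output, n_classes))
b2 = rng.standard_normal(n_classes)

probs = classification_head(last_output, w1, b1, w2, b2)
print(probs.shape)
```

Each row of `probs` sums to one, so it can be read as a distribution over the ten digits.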

The LSTM cell was noticeably unstable and did not perform as well as the GRU cell over the same number of epochs. Many of the learning rates tried were either too small to learn anything in the available time, or too large, so that training would go quite well for some time until a step jumped too far and all accuracy was lost.
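
The effect of the step size can be illustrated on a toy problem. The sketch below uses plain gradient descent on f(w) = w², which is an assumption made purely for illustration: the notebook trains a recurrent network with Adam, which behaves differently in detail, and the function name `gradient_descent` is hypothetical.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return the final |w|."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w   # the gradient of w**2 is 2w
    return abs(w)

too_small = gradient_descent(0.001)    # barely moves away from the start
reasonable = gradient_descent(0.1)     # shrinks steadily towards the minimum at 0
too_large = gradient_descent(1.1)      # overshoots on every step and diverges

print(too_small, reasonable, too_large)
```

With the smallest rate the parameter has hardly moved after 20 steps, while the largest rate ends further from the minimum than it started.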

In [3]:
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math

import os
import pdb



In [4]:
# Load the MNIST data to work with
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/MNIST/", one_hot=True)
# one_hot=True returns each label as a vector with a single 1 at the index of its digit

Extracting data/MNIST/train-images-idx3-ubyte.gz
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz
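
As a rough illustration of what `one_hot=True` produces, the following NumPy sketch encodes a few made-up labels. The helper name `one_hot` and the example labels are assumptions for the example, not part of the MNIST loader.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Encode integer class labels as one-hot row vectors (hypothetical helper)."""
    encoded = np.zeros((len(labels), n_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([3, 0, 9])           # made-up digit labels
encoded = one_hot(labels)
print(encoded)
print(encoded.argmax(axis=1))          # argmax recovers the original digits
```

Taking the argmax of each row recovers the digit, which is how the accuracy is computed later in the notebook.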


In [5]:
print('The size of the data that we will be working with:')
print('train set: {} '.format(len(mnist.train.labels)))
print('valid set: {} '.format(len(mnist.validation.labels)))
print('test set: {} '.format(len(mnist.test.labels)))

The size of the data that we will be working with:
train set: 55000 
valid set: 5000 
test set: 10000 


In [6]:
# define parameters
n_classes = 10 # digits
batch_size = 100
chunk_size = 1 # input per timestep
n_chunks = 784 # number of pixels/timesteps
rnn_size = 32
units_output = 100 # output after rnn
learning_rate = 0.001

# placeholders to store the inputs and labels
x = tf.placeholder('float', [None, n_chunks,chunk_size],name='InputData')
y = tf.placeholder('float',name='LabelData')

logs_path = '/tmp/tensorflow_logs/example'
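
With `chunk_size = 1` and `n_chunks = 784`, every image becomes a sequence of 784 timesteps with one pixel each. The sketch below shows the reshape on fake data; the array `batch` is an assumption standing in for a batch of flattened MNIST images.

```python
import numpy as np

n_chunks, chunk_size = 784, 1

# two fake flattened images standing in for a batch of MNIST data
batch = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# same reshape as the training loop: (batch, timesteps, inputs per timestep)
sequence = batch.reshape((-1, n_chunks, chunk_size))
print(sequence.shape)   # (2, 784, 1)

# timestep t of image i holds pixel t of that image
pixel_matches = bool(sequence[1, 5, 0] == batch[1, 5])
```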

In [7]:
layer1 = {'weights':tf.Variable(tf.random_normal([rnn_size,units_output]),name='Weights1'),
             'biases':tf.Variable(tf.random_normal([units_output]),name='Bias')}
layer2 = {'weights':tf.Variable(tf.random_normal([units_output,n_classes]),name='Weights2'),
             'biases':tf.Variable(tf.random_normal([n_classes]),name='Bias2')}

In [8]:
# Define GRU cells of the specified size, one separate cell object per layer
# (reusing a single cell object for every layer can fail or share weights in later TensorFlow releases)
gru_cells = [tf.nn.rnn_cell.GRUCell(rnn_size) for _ in range(3)]

# stack them into a 3-layer RNN
gru_cell = tf.nn.rnn_cell.MultiRNNCell(cells=gru_cells, state_is_tuple=True)

# outputs is a tensor holding the output state at every pixel (timestep);
# we are interested only in the last one
outputs, states = tf.nn.dynamic_rnn(cell=gru_cell, inputs=x, dtype=tf.float32)

# Checking to make sure of the correct shape
print(outputs.get_shape())


(?, 784, 32)
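
The printed shape is (batch, timesteps, units). For a many-to-one model only the slice at the last timestep is needed, which the NumPy sketch below mimics on random data; the batch size of 4 is an arbitrary assumption.

```python
import numpy as np

batch, timesteps, rnn_size = 4, 784, 32

# random stand-in for the RNN outputs at every timestep
outputs = np.random.default_rng(0).standard_normal((batch, timesteps, rnn_size))

# keep only the output after the final pixel has been read
last = outputs[:, -1, :]
print(last.shape)   # (4, 32)
```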


In [9]:
# Many to one model so we need only the last output of the rnn.
outputs = outputs[:, -1, :]

# linear transformation
output_rnn = tf.matmul(outputs,layer1['weights']) + layer1['biases']

# Relu activation
act = tf.nn.relu(output_rnn)

# linear transformation
output = tf.matmul(act,layer2['weights'])+layer2['biases']

# calculate cost of batch (named arguments avoid mixing up logits and labels)
Xent =  tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y)

# calculate the average cost per image and optimize
with tf.name_scope('Loss'):
    cost = tf.reduce_mean( Xent )
with tf.name_scope('Adam'):    
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
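
The cost computed above is the mean softmax cross entropy over the batch. A minimal NumPy sketch of the same quantity follows, assuming one-hot labels; the function name `softmax_cross_entropy` and the example logits are made up for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 0.0, 0.0],     # confident in the correct class
                   [0.0, 0.0, 0.0]])    # uniform over three classes
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

per_example = softmax_cross_entropy(logits, labels)
cost = per_example.mean()               # average cost per example
print(per_example, cost)
```

The uniform prediction costs log(3), and the confident correct prediction costs less, so minimizing the mean pushes the logits towards the true class.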

Calculate the accuracy by comparing the predicted label with the true label of each image.

In [10]:
# create a boolean vector marking correct predictions and take its mean
# to get the fraction of correctly classified images
with tf.name_scope('Accuracy'):
    correct_label = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_label, tf.float32))
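
The same argmax comparison can be sketched in NumPy on a tiny made-up batch. The function name `batch_accuracy` and the logits are assumptions for the example.

```python
import numpy as np

def batch_accuracy(logits, one_hot_labels):
    """Fraction of rows whose largest logit matches the one-hot label."""
    predicted = logits.argmax(axis=1)
    actual = one_hot_labels.argmax(axis=1)
    return float((predicted == actual).mean())

logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 0.9],
                   [0.7, 0.2, 0.1]])
labels = np.eye(3)[[1, 0, 2, 1]]        # true classes 1, 0, 2, 1

acc = batch_accuracy(logits, labels)
print(acc)   # 0.75
```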

### Save the models

In [11]:
# Saver for the final model's weight and bias variables
saver = tf.train.Saver(write_version = tf.train.SaverDef.V2)

# Suggested Directory to use
save_MDir = 'models/3Layer_gru/'


#create the directory if it does not exist already
if not os.path.exists(save_MDir):
    os.makedirs(save_MDir)

save_model = os.path.join(save_MDir,'best_accuracy_3')


In [12]:
# Second saver, used to checkpoint the best-accuracy model during training
saver2 = tf.train.Saver()

# Suggested Directory to use
save2_MDir = 'models/3Layer_gru/best'


#create the directory if it does not exist already
if not os.path.exists(save2_MDir):
    os.makedirs(save2_MDir)

save_model2 = os.path.join(save2_MDir,'best_accuracy_3')
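
The directory handling in the two cells above can be condensed with `os.makedirs(..., exist_ok=True)` on Python 3. The sketch below is one possible way to do it, run inside a temporary directory; `checkpoint_path` is a hypothetical helper, not something defined in the notebook.

```python
import os
import tempfile

def checkpoint_path(directory, name):
    """Create the directory if needed and return the checkpoint path prefix."""
    os.makedirs(directory, exist_ok=True)   # no error if it already exists
    return os.path.join(directory, name)

with tempfile.TemporaryDirectory() as root:
    model_dir = os.path.join(root, "models", "3Layer_gru")
    path = checkpoint_path(model_dir, "best_accuracy_3")
    again = checkpoint_path(model_dir, "best_accuracy_3")   # second call is harmless
    directory_exists = os.path.isdir(model_dir)

print(os.path.basename(path))
```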

### Binarize the images

In [13]:
def binarize(images, threshold=0.1):
    return (threshold < images).astype("float32")
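
A quick check of `binarize` on a few made-up pixel intensities shows that the comparison is strict, so a pixel exactly at the threshold maps to 0.

```python
import numpy as np

def binarize(images, threshold=0.1):
    return (threshold < images).astype("float32")

pixels = np.array([0.0, 0.05, 0.1, 0.11, 1.0])   # made-up intensities in [0, 1]
binary = binarize(pixels)
print(binary)   # [0. 0. 0. 1. 1.]
```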

In [14]:
# Initializing the variables
init = tf.global_variables_initializer()

# Create a summary to monitor cost tensor
tf.summary.scalar("loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("accuracy", accuracy)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

### Optimizer function
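The training loop does the main work: each batch is binarized and passed through the network, progress is printed five times per epoch, and the accumulated cost and test accuracy are reported at the end of each epoch.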

In [15]:
def optimize(hm_epochs):
    with tf.Session() as sess:
        sess.run(init)
        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())        
        start_epoch = time.time()
        freq_epoch = 1
        count = 0
        acc_list = []
        for epoch in range(hm_epochs):
            print("-------Running Epoch:{}-------".format(epoch+1))
            epoch_loss = 0
            
            start = time.time()
            n_batches = int(mnist.train.num_examples/batch_size)
            #n_batches = 10
            freq = int(n_batches/5)
            for i in range(n_batches):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                epoch_x = binarize(epoch_x)
                epoch_x = epoch_x.reshape((batch_size,n_chunks,chunk_size))


                _, c,summary = sess.run([optimizer, cost,merged_summary_op], feed_dict={x: epoch_x, y: epoch_y})
                summary_writer.add_summary(summary, epoch * n_batches + i)
                epoch_loss += c
                if i % freq == 0 or i == n_batches - 1:
                    print("Trained {} batches with current epoch cost: {}".format(i+1,epoch_loss))
                    acc_train = sess.run(accuracy,feed_dict = {x: epoch_x, y: epoch_y})
                    acc_test = accuracy.eval({x: binarize(mnist.test.images[0:batch_size].reshape((-1, 784, 1))), y: mnist.test.labels[0:batch_size]})
                    print("At batch: {0}, the training accuracy is: {1:.1%}".format(i+1, acc_train))
                    print("At batch: {0}, the test accuracy is: {1:.1%}".format(i+1, acc_test))
                    print("Current run time is: {} \n".format(time.time()-start_epoch))
                    
            if epoch % freq_epoch==0:
                print('Epoch', epoch+1, 'completed out of:',hm_epochs,'loss:',epoch_loss, ', time:', time.time()-start,'\n')            
                # reuse the existing accuracy op instead of adding new ops to the graph every epoch
                acc_test = accuracy.eval({x: binarize(mnist.test.images.reshape((-1, 784, 1))), y: mnist.test.labels})
                acc_list.append(acc_test)
                print("At end of epoch: {0}, the training accuracy in batch is: {1:.1%}".format(epoch+1, acc_train))
                print("At end of epoch: {}, the test accuracy is: {:.1%}".format(epoch+1, acc_test))
                # checkpoint whenever this epoch matches the best test accuracy so far
                if acc_list[count] == max(acc_list):
                    saver2.save(sess=sess, save_path=save_model2)
                    print(acc_list)
                        
                        
                count = count+1

                print("Total time taken for current epoch : {:f} \n".format(time.time()-start))
        
        
        Final_acc_test,Final_cost_test = sess.run([accuracy,cost],feed_dict = {x: binarize(mnist.test.images.reshape((-1, 784, 1))), y: mnist.test.labels})
        saver.save(sess= sess, save_path = save_model)        
        print("At final epoch: {}, the test accuracy is: {:.1%}, with cost {}".format(epoch+1, Final_acc_test, Final_cost_test))
    print("Total time taken for run : {:f}".format(time.time()-start_epoch))     
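
The "save on best accuracy" rule inside `optimize` can be isolated as a small pure-Python sketch. The accuracy values below are invented for illustration and `epochs_that_save` is a hypothetical helper; only the comparison against the running maximum mirrors the notebook.

```python
def epochs_that_save(test_accuracies):
    """Return the 1-based epochs at which a best-so-far checkpoint is written."""
    saved, history = [], []
    for epoch, acc in enumerate(test_accuracies, start=1):
        history.append(acc)
        if acc == max(history):          # same test as acc_list[count] == max(acc_list)
            saved.append(epoch)
    return saved

saved_epochs = epochs_that_save([0.40, 0.55, 0.50, 0.55, 0.70])
print(saved_epochs)   # [1, 2, 4, 5]
```

Note that a tie with the previous best (epoch 4 here) also overwrites the checkpoint, because the comparison uses equality with the running maximum.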

In [16]:
#optimize(30)

-------Running Epoch:1-------
Trained 1 batches with current epoch cost: 7.126248836517334
At batch: 1, the training accuracy is: 12.0%
At batch: 1, the test accuracy is: 8.0%
Current run time is: 3.17871356010437 

Trained 111 batches with current epoch cost: 274.37916803359985
At batch: 111, the training accuracy is: 30.0%
At batch: 111, the test accuracy is: 38.0%
Current run time is: 274.9030005931854 

Trained 221 batches with current epoch cost: 473.86110866069794
At batch: 221, the training accuracy is: 37.0%
At batch: 221, the test accuracy is: 40.0%
Current run time is: 510.1500380039215 

Trained 331 batches with current epoch cost: 673.8907819986343
At batch: 331, the training accuracy is: 34.0%
At batch: 331, the test accuracy is: 30.0%
Current run time is: 730.2748701572418 

Trained 441 batches with current epoch cost: 870.9061751365662
At batch: 441, the training accuracy is: 31.0%
At batch: 441, the test accuracy is: 34.0%
Current run time is: 893.6729032993317 

Epoch 

# Restoring model
Here the saved model is restored and re-evaluated; the recovered accuracies match the values given in the report.

In [16]:
def print_acc(rnn_size,epochs):
    
    acc_test_list = []
    acc_train_list = []
    cost_train_list = []
    cost_test_list =[]
    b_size = 1000
    num_train = len(mnist.train.labels)
    num_test = len(mnist.test.labels)
    
    # Comment out the next line to use the whole training set!
    num_train = len(mnist.train.labels[:10000,:])
    n_batches = num_train/b_size
    count = 0
    i = 0
    start = time.time()
    while i < num_train:
        print('Processing batch number {} of {}.'.format(count+1,n_batches))
        # The ending index for the next batch is denoted j.
        j = min(i + b_size, num_train)
        
        if j<= num_test:
            
            # Get the images from the test-set between index i and j.
            images_test = mnist.test.images.reshape((-1, 784, 1))[i:j, :]

            # Get the associated labels.
            labels_test = mnist.test.labels[i:j, :]

            acc_test, cost_test = sess.run([accuracy,cost],feed_dict = {x: binarize(images_test), y: labels_test})
            #print(cost_test)

            acc_test_list.append(acc_test)
            cost_test_list.append(cost_test)
        images_train = mnist.train.images.reshape((-1, 784, 1))[i:j, :]

        # Get the associated labels.
        labels_train = mnist.train.labels[i:j, :]

        acc_train,cost_train = sess.run([accuracy,cost],feed_dict = {x: binarize(images_train), y: labels_train})
        acc_train_list.append(acc_train)
        cost_train_list.append(cost_train)
        i = j
        count +=1
        
    #print(cost)
    #print(time.time()-start)
    print('\n')
    
    total_acc_train = sum(acc_train_list)/len(acc_train_list)
    total_acc_test = sum(acc_test_list)/len(acc_test_list)
    total_cost_train = sum(cost_train_list)/len(cost_train_list)
    total_cost_test = sum(cost_test_list)/len(cost_test_list)
    #print(total_acc)
    #total_cost = sum(cost)/len(cost)
    print(time.time()-start)
    print('The training accuracy for the 3 layer {} unit GRU model is {:.1%} after {} epochs'.format(rnn_size,total_acc_train,epochs))
    print('The training cost for the 3 layer {} unit GRU model is {} after {} epochs \n'.format(rnn_size,total_cost_train,epochs))
    print('The test accuracy for the 3 layer {} unit GRU model is {:.1%} after {} epochs'.format(rnn_size,total_acc_test,epochs))
    print('The test cost for the 3 layer {} unit GRU model is {} after {} epochs \n'.format(rnn_size,total_cost_test,epochs))
    return(total_acc_train,total_acc_test)
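
`print_acc` averages the per-batch accuracies directly. That equals the overall accuracy when every batch has the same size, as is the case with 10000 examples in batches of 1000. If the last batch were smaller, a size-weighted mean would be needed. The sketch below contrasts the two on invented numbers; both function names are hypothetical.

```python
def mean_of_batches(batch_accuracies):
    """Unweighted mean, as used in print_acc."""
    return sum(batch_accuracies) / len(batch_accuracies)

def weighted_mean(batch_accuracies, batch_sizes):
    """Mean weighted by batch size, exact even for unequal batches."""
    total = sum(batch_sizes)
    return sum(a * n for a, n in zip(batch_accuracies, batch_sizes)) / total

accs = [0.9, 0.9, 0.5]            # invented per-batch accuracies
sizes = [1000, 1000, 200]         # last batch is smaller

unweighted = mean_of_batches(accs)
weighted = weighted_mean(accs, sizes)
equal_case = weighted_mean(accs, [1000, 1000, 1000])
print(unweighted, weighted, equal_case)
```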

In [17]:
save_MDir = 'models/3Layer_gru/'
save_model = os.path.join(save_MDir,'best_accuracy_3')
init = tf.global_variables_initializer()

In [18]:
sess= tf.Session()
sess.run(init)
saver2restore = tf.train.Saver()
saver2restore.restore(sess = sess, save_path= save_model)

In [18]:
print_acc(rnn_size=32, epochs = 30)
sess.close()

Processing batch number 1 of 55.0.
Processing batch number 2 of 55.0.
Processing batch number 3 of 55.0.
Processing batch number 4 of 55.0.
Processing batch number 5 of 55.0.
Processing batch number 6 of 55.0.
Processing batch number 7 of 55.0.
Processing batch number 8 of 55.0.
Processing batch number 9 of 55.0.
Processing batch number 10 of 55.0.
Processing batch number 11 of 55.0.
Processing batch number 12 of 55.0.
Processing batch number 13 of 55.0.
Processing batch number 14 of 55.0.
Processing batch number 15 of 55.0.
Processing batch number 16 of 55.0.
Processing batch number 17 of 55.0.
Processing batch number 18 of 55.0.
Processing batch number 19 of 55.0.
Processing batch number 20 of 55.0.
Processing batch number 21 of 55.0.
Processing batch number 22 of 55.0.
Processing batch number 23 of 55.0.
Processing batch number 24 of 55.0.
Processing batch number 25 of 55.0.
Processing batch number 26 of 55.0.
Processing batch number 27 of 55.0.
Processing batch number 28 of 55.0.
P

In [19]:
print_acc(rnn_size=32, epochs = 30)
sess.close()

Processing batch number 1 of 10.0.
Processing batch number 2 of 10.0.
Processing batch number 3 of 10.0.
Processing batch number 4 of 10.0.
Processing batch number 5 of 10.0.
Processing batch number 6 of 10.0.
Processing batch number 7 of 10.0.
Processing batch number 8 of 10.0.
Processing batch number 9 of 10.0.
Processing batch number 10 of 10.0.


46.34651732444763
The training accuracy for the 3 layer 32 unit GRU model is 97.2% after 30 epochs
The training cost for the 3 layer 32 unit GRU model is 0.08718579411506652 after 30 epochs 

The test accuracy for the 3 layer 32 unit GRU model is 97.2% after 30 epochs
The test cost for the 3 layer 32 unit GRU model is 0.08031027801334858 after 30 epochs 

