# Comparison of Pure TensorFlow, Keras and TFLearn

In this notebook we solve the same problem in three different ways, using TensorFlow as the backend in every case. Two high-level APIs are evaluated and compared with each other and with a pure TensorFlow implementation. The problem is the classic MNIST digit classification task. First we implement it in TensorFlow, next in Keras and finally in TFLearn. In the end, a summary of the pros and cons of each is discussed.

## Input Data

## Implementation in TensorFlow

In [3]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf

# Parameters
learning_rate = 0.1
num_steps = 500
batch_size = 128
display_step = 100

# Network Parameters
n_hidden_1 = 128 # 1st layer number of neurons
n_hidden_2 = 512 # 2nd layer number of neurons
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([num_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, num_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}

# Create model
def neural_net(x):
    # Hidden fully connected layer with 128 neurons (n_hidden_1); no activation is applied
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 512 neurons (n_hidden_2); no activation is applied
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
  
# Construct model
logits = neural_net(X)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model: fraction of samples whose predicted class matches the label
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy for MNIST test images
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: mnist.test.images,
                                      Y: mnist.test.labels}))

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

## Implementation in Keras

In [4]:
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
test_accuracy = model.evaluate(x_test, y_test)[1]
print("Test accuracy: {}".format(test_accuracy))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy: 0.9764


## Implementation in TFLearn

In [5]:
!pip install tflearn

import tensorflow as tf
import tflearn
import tflearn.datasets.mnist as mnist

X_train, y_train, X_test, y_test = mnist.load_data(one_hot=True)

net = tflearn.input_data([None, 784])
net = tflearn.fully_connected(net, 128, activation='ReLU')
net = tflearn.fully_connected(net, 512, activation='ReLU')
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net,
                         optimizer='sgd',
                         learning_rate=0.1,
                         loss='categorical_crossentropy')

model = tflearn.DNN(net)
model.fit(X_train, y_train,
          validation_set=0.1, show_metric=True, batch_size=100, n_epoch=5)

test_accuracy = model.evaluate(X_test, y_test)
print("Test accuracy: ", test_accuracy)

Training Step: 2474  | total loss: 0.11723 | time: 4.569s
| SGD | epoch: 005 | loss: 0.11723 - acc: 0.9696 -- iter: 49400/49500
Training Step: 2475  | total loss: 0.11746 | time: 5.596s
| SGD | epoch: 005 | loss: 0.11746 - acc: 0.9697 | val_loss: 0.15667 - val_acc: 0.9551 -- iter: 49500/49500
--
Test accuracy:  [0.9601]
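
The three implementations above feed the labels in different formats: the TensorFlow and TFLearn cells load one-hot labels (`one_hot=True`) and use a categorical cross-entropy, while the Keras cell keeps the integer labels returned by `mnist.load_data()` and uses `'sparse_categorical_crossentropy'`. The sketch below, in plain Python, illustrates why the two formats should give the same loss value. It is not the libraries' implementation; the function names and the toy 3-class scores are assumptions made for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label, num_classes):
    # Vector with 1.0 at the true class and 0.0 elsewhere
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def categorical_crossentropy(probs, target):
    # -sum_k y_k * log(p_k), with a one-hot target vector
    return -sum(y * math.log(p) for y, p in zip(target, probs))

def sparse_categorical_crossentropy(probs, label):
    # -log(p_label), with an integer class id
    return -math.log(probs[label])

logits = [0.5, 2.0, -1.0]   # hypothetical scores for a 3-class toy problem
label = 1                   # integer class id of the true class
probs = softmax(logits)

loss_dense = categorical_crossentropy(probs, one_hot(label, 3))
loss_sparse = sparse_categorical_crossentropy(probs, label)
print(loss_dense, loss_sparse)  # the two values agree
```

With a one-hot target, every term of the sum except the true class is multiplied by zero, so only `-log(p_label)` remains. The label format is therefore a matter of convenience and does not change what is being optimized.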


## Textual Quiz

### Setup code
Please execute the following code so that your answers to the quizzes can be checked:

In [0]:
from base64 import b64encode, b64decode

encoded_answer = {}
encoded_answer[1] = b'eydhJzogJycsICdiJzogJyd9'

def check_answer(students_answer, question, see_correct_answer=False):
  if see_correct_answer:
    expected_answer = b64decode(encoded_answer[question]).decode('utf-8')
    print("The expected answer is {}".format(expected_answer))
  else:
    if b64encode(str(students_answer).lower().strip().encode('utf-8')) == encoded_answer[question]:
      print('You got it right!')
    else:
      print("Please try again!")

In [0]:
def encode_answer(a):
  print(b64encode(str(a).lower().strip().encode('utf-8')))
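
The checker above compares answers only after normalizing them: the answer is converted with `str()`, lower-cased, stripped, and base64-encoded. The following is a minimal sketch of that comparison. The helper name `normalize_and_encode` and the sample answers are hypothetical and are not part of the quiz.

```python
from base64 import b64encode, b64decode

def normalize_and_encode(answer):
    # Same steps as check_answer: str() -> lower() -> strip() -> UTF-8 -> base64
    return b64encode(str(answer).lower().strip().encode('utf-8'))

stored = normalize_and_encode({'a': 'Foo', 'b': 'Bar'})      # hypothetical answer key
same = normalize_and_encode({'a': 'foo', 'b': 'BAR'})        # differs only in letter case
different = normalize_and_encode({'b': 'bar', 'a': 'foo'})   # same content, other key order

print(stored == same)        # True: letter case is ignored
print(stored == different)   # False: key order changes str(dict)
print(b64decode(stored).decode('utf-8'))
```

Because the comparison is made on the string form of the dictionary, the keys should be given in the same order as in the example reply (`'a'` first, then `'b'`).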


### Question 1) Please fill in the gaps in the following definitions:
- Each iteration of a neural network's training is called 'a'
- A sample drawn from a dataset for each iteration of a neural network's training is called 'b'

Please state what 'a' and 'b' stand for below (example: `{'a': 'neural network', 'b': 'True'}`):


In [8]:
# TODO: Please answer Question 1 below
reply = {'a':'', 'b':''}

# Check if your answer is correct. Change see_correct_answer to 'True' to see expected answer 
check_answer(reply, question=1, see_correct_answer=False)

You got it right!


In [0]:
encode_answer(str({'a':'', 'b':''}))

b'eydhJzogJycsICdiJzogJyd9'


### Question 2) When are the weights updated?
Training the network with batches means training with mini-batch gradient descent. There are three variants of the gradient descent algorithm:

- Gradient Descent (also called batch gradient descent, because each update uses the whole dataset)
- Stochastic Gradient Descent
- Mini-batch Gradient Descent (often loosely called batch gradient descent)

The first one passes the whole dataset through the network, computes the error over all samples, computes the gradients with respect to all of them, and updates the weights only after the whole dataset has been seen. That means one update occurs per epoch. Each update follows the true gradient of the training loss.

The second one updates the weights after every single sample. If your dataset has one thousand samples, one thousand updates happen per epoch, whereas the previous method updates the weights once per epoch. Each update is a noisy estimate of the true gradient, but it is much cheaper to compute.

The last one is a trade-off between the two approaches above. You specify a batch size, and the gradients are computed and the weights updated after each batch has passed through the network. Suppose you have one thousand samples and a batch size of one hundred: you will have 10 weight updates per epoch. Each update is less noisy than in the second approach and cheaper than in the first.

Do I back propagate after each batch has been presented to network or after each image?

You are using the last method. Consequently, the weights are updated after each batch has passed through the network, not after each image.
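
The update counts described above can be reproduced with a toy sketch in plain Python. It assumes a one-parameter model `y = w * x` with squared error, 1000 synthetic samples and a learning rate of 0.01; none of these come from the notebook, and `train_one_epoch` is a hypothetical helper. The only difference between the three variants is the batch size.

```python
import random

def train_one_epoch(xs, ys, w, batch_size, lr=0.01):
    """Run one epoch of gradient descent on y = w * x with squared error.

    Returns the updated weight and the number of weight updates performed.
    """
    indices = list(range(len(xs)))
    random.shuffle(indices)
    updates = 0
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # Gradient of the mean squared error over this batch with respect to w
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
        w -= lr * grad   # exactly one weight update per batch
        updates += 1
    return w, updates

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(1000)]   # 1000 toy samples
ys = [3.0 * x for x in xs]                          # hypothetical target weight: 3

for name, batch_size in [("whole dataset", 1000),
                         ("one sample", 1),
                         ("batches of 100", 100)]:
    w, updates = train_one_epoch(xs, ys, w=0.0, batch_size=batch_size)
    print(name, "-> updates per epoch:", updates)
# whole dataset -> 1, one sample -> 1000, batches of 100 -> 10
```

In every case the backward pass runs once per batch; what changes is how many samples contribute to each gradient before the weights move.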

In [0]:
# TODO: Please answer Question 2 below
reply = {'a': '', 'b':''}

# Change see_correct_answer to 'True' to see expected answer 
check_answer(reply, question=2, see_correct_answer=False)

In [0]:
b64encode(str(reply).encode('utf-8'))

b'eydhJzogJycsICdiJzogJyd9'

In [0]:
from base64 import b64encode, b64decode

In [0]:
b64decode(b'ZmVybmFuZG8ud2l0dG1hbm5AZ21haWwuY29t')

b'fernando.wittmann@gmail.com'

In [0]:
b64encode(b'fernando.wittmann@gmail.com')

b'ZmVybmFuZG8ud2l0dG1hbm5AZ21haWwuY29t'