# Introduction To TensorFlow

## Why TensorFlow?

TensorFlow's central idea is to describe a computation as a so-called computational graph, which can then be executed much more efficiently than if the same calculations were performed directly in Python. TensorFlow can be more efficient than NumPy because it knows the entire computational graph that must be executed, whereas NumPy only sees one mathematical operation at a time.




<center><img src="http://adventuresinmachinelearning.com/wp-content/uploads/2017/03/Simple-graph-example-260x300.png" width="320" height="300" align="center"/></center>

This may seem like a silly example, but notice a powerful idea in expressing the equation this way: two of the computations ($d = b + c$ and $e = c + 2$) can be performed in parallel.
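To make the idea concrete, here is a minimal pure-Python sketch of such a graph. It assumes the pictured graph computes $a = (b + c) * (c + 2)$, which matches the TensorFlow code later in this notebook. The names `GRAPH` and `run` are hypothetical and this is not how TensorFlow is implemented; it only illustrates that `d` and `e` do not depend on each other, so a scheduler would be free to evaluate them in parallel.

```python
# A toy dataflow graph for a = (b + c) * (c + 2).
# Each node maps to (function, names of the nodes it depends on).
GRAPH = {
    "d": (lambda b, c: b + c, ("b", "c")),
    "e": (lambda c: c + 2, ("c",)),
    "a": (lambda d, e: d * e, ("d", "e")),
}


def run(fetch, feed):
    """Evaluate node `fetch`, computing only the nodes it depends on."""
    if fetch in feed:
        return feed[fetch]
    func, deps = GRAPH[fetch]
    return func(*(run(dep, feed) for dep in deps))


result = run("a", {"b": 2.0, "c": 1.0})
print(result)  # (2 + 1) * (1 + 2) = 9.0
```

Because the whole graph is known before anything is evaluated, fetching only `e` never touches `b` or `d`.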

---




TensorFlow can also automatically calculate the gradients that are needed to optimize the variables of the graph so as to make the model perform better. This is possible because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain rule for derivatives.
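As a hand-worked illustration of the chain rule on a small graph, the sketch below assumes the graph $a = d \cdot e$ with $d = b + c$ and $e = c + 2$. The function name `forward_and_gradients` is made up for this example; TensorFlow derives the equivalent steps automatically.

```python
def forward_and_gradients(b, c):
    """Return a = (b + c) * (c + 2) and its partial derivatives."""
    # forward pass, keeping the intermediate values
    d = b + c
    e = c + 2
    a = d * e
    # backward pass: apply the chain rule node by node
    da_dd = e  # because a = d * e
    da_de = d
    da_db = da_dd * 1.0                # b only feeds d, with slope 1
    da_dc = da_dd * 1.0 + da_de * 1.0  # c feeds both d and e
    return a, da_db, da_dc


a, da_db, da_dc = forward_and_gradients(2.0, 1.0)
print(a, da_db, da_dc)  # 9.0 3.0 6.0
```

Note how the gradient with respect to `c` sums the contributions from both paths through the graph.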

TensorFlow can also take advantage of multi-core CPUs as well as GPUs. Google has even built special chips for TensorFlow workloads, called TPUs (Tensor Processing Units), which are designed to run these computations faster still.

A TensorFlow graph consists of the following parts which will be detailed below:

* Placeholder variables used to feed input into the graph.
* Model variables that are going to be optimized so as to make the model perform better.
* The model which is essentially just a mathematical function that calculates some output given the input in the placeholder variables and the model variables.
* A cost measure that can be used to guide the optimization of the variables.
* An optimization method which updates the variables of the model.
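The five parts listed above can be mapped onto a deliberately tiny, TensorFlow-free sketch. Everything here is an assumption made for illustration: the toy data follows $y = 2x$, the model has a single variable `w`, and the gradient is written out by hand, whereas TensorFlow would derive it from the graph.

```python
# Inputs that would be fed through placeholders (toy data from y = 2 * x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0  # model variable to be optimised
learning_rate = 0.05


def model(x, w):
    """The model: a function of the input and the model variable."""
    return w * x


def cost(w):
    """Cost measure: mean squared error over the data."""
    return sum((model(x, w) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_gradient(w):
    """Derivative of the cost with respect to w (written by hand here)."""
    return sum(2 * (model(x, w) - y) * x for x, y in zip(xs, ys)) / len(xs)


# Optimisation method: plain gradient descent on the model variable.
for step in range(100):
    w -= learning_rate * cost_gradient(w)

print(round(w, 3))  # close to 2.0
```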




# Let's Start with a Very Simple Example!

In [0]:
import tensorflow as tf
import numpy as np

In [0]:
# first, create a TensorFlow constant
const = tf.constant(2.0, name="const")

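# # alternatively, b could be created as a TensorFlow variable
# b = tf.Variable(2.0, name='b')

# create a TensorFlow placeholder so that values for b can be fed in at run time
b = tf.placeholder(tf.float32, [None, 1], name='b')

# create a TensorFlow variable
c = tf.Variable(1.0, name='c')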
# now create some operations
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')

# setup the variable initialisation
init_op = tf.global_variables_initializer()

# config = tf.ConfigProto()
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    # compute the output of the graph
    # a_out = sess.run(a)
    a_out = sess.run(a, feed_dict={b: np.arange(0, 10)[:, np.newaxis]})
    print("Variable a is {}".format(a_out))

Variable a is [[ 3.]
 [ 6.]
 [ 9.]
 [12.]
 [15.]
 [18.]
 [21.]
 [24.]
 [27.]
 [30.]]


<center><img src="http://adventuresinmachinelearning.com/wp-content/uploads/2017/03/TensorFlow-data-flow-graph.gif" width="400" height="500" align="center"/></center>
  

The animated data flows between different nodes in the graph are tensors which are multi-dimensional data arrays.  For instance, the input data tensor may be 5000 x 64 x 1, which represents a 64 node input layer with 5000 training samples.  After the input layer there is a hidden layer with rectified linear units as the activation function.  There is a final output layer (called a “logit layer” in the above graph) which uses cross entropy as a cost/loss function.  At each point we see the relevant tensors flowing to the “Gradients” block which finally flow to the Stochastic Gradient Descent optimiser which performs the back-propagation and gradient descent.

# Now Let's See a NN Example!

In [0]:
from tensorflow.examples.tutorials.mnist import input_data



<center><img src="https://cdn-images-1.medium.com/max/1600/1*9Mjoc_J0JR294YwHGXwCeg.jpeg" width="400" height="200" align="center"/></center>

In [0]:
# Read Data Set
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Python optimisation variables
learning_rate = 0.5
epochs = 10
batch_size = 100

# declare the training data placeholders
# input x - for 28 x 28 pixels = 784
x = tf.placeholder(tf.float32, [None, 784])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


Notice that the x input layer has 784 nodes, corresponding to the 28 x 28 (= 784) pixels, and the y output layer has 10 nodes, corresponding to the 10 possible digits.
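The two shapes can be illustrated without TensorFlow. The sketch below uses hypothetical helpers `flatten` and `one_hot` and a blank stand-in image; it only shows where 784 and 10 come from, given that the data was loaded with `one_hot=True`.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def flatten(image):
    """Flatten a 2-D list of pixel rows into one long list."""
    return [pixel for row in image for pixel in row]


blank_image = [[0.0] * 28 for _ in range(28)]  # stand-in for one MNIST digit
print(len(flatten(blank_image)))  # 784
print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```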

In [0]:
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')

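First, we declare some variables for W1 and b1, the weights and bias for the connections between the input and hidden layer.  This neural network will have 300 nodes in the hidden layer, so the size of the weight tensor W1 is [784, 300].  W2 and b2 play the same role for the connections between the hidden layer and the 10-node output layer, so W2 has size [300, 10].  We initialise the values of the weights using a random normal distribution with a mean of zero and a standard deviation of 0.03; the biases use the default standard deviation of 1.0.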
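A quick way to sanity-check these sizes is to follow the shapes through the two matrix multiplications. The helper `dense_output_shape` below is hypothetical, and the batch size of 100 is taken from the `batch_size` setting above.

```python
def dense_output_shape(input_shape, weight_shape):
    """Shape of matmul(x, W) for x: [batch, n_in] and W: [n_in, n_out]."""
    batch, n_in = input_shape
    w_in, n_out = weight_shape
    if n_in != w_in:
        raise ValueError("inner dimensions must match")
    return (batch, n_out)


hidden_shape = dense_output_shape((100, 784), (784, 300))
output_shape = dense_output_shape(hidden_shape, (300, 10))
n_params = 784 * 300 + 300 + 300 * 10 + 10  # W1 + b1 + W2 + b2
print(hidden_shape, output_shape, n_params)  # (100, 300) (100, 10) 238510
```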

In [0]:
# calculate the output of the hidden layer
hidden_out = tf.add(tf.matmul(x, W1), b1)
hidden_out = tf.nn.relu(hidden_out)

# now calculate the output layer - in this case, let's use a softmax activated
# output layer
y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))

We finalise the hidden_out operation by applying a rectified linear unit activation function to the matrix multiplication plus bias.  Note that TensorFlow has a rectified linear unit activation already set up for us, tf.nn.relu.

We use a softmax activation for the output layer, for which we can use the included TensorFlow function tf.nn.softmax.
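For reference, here is what the two activation functions compute, written as a plain-Python sketch on lists of numbers. These are illustrative re-implementations, not the TensorFlow kernels; subtracting the maximum before exponentiating is a common stability trick that does not change the result.

```python
import math


def relu(values):
    """Rectified linear unit: negative inputs become zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn arbitrary scores into probabilities that sum to one."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-1.5, 0.0, 2.0]))  # [0.0, 0.0, 2.0]
probs = softmax([1.0, 2.0, 3.0])
print(abs(sum(probs) - 1.0) < 1e-12)  # True
```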

In [0]:
# now let's define the cost function which we are going to train the model on
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped)
                                              + (1 - y) * tf.log(1 - y_clipped), axis=1))

We also have to include a cost or loss function for the optimisation / backpropagation to work on. Here we’ll use the cross entropy cost function. The first line is an operation converting the output y_ to a clipped version, limited to between 1e-10 and 0.9999999.  This is to make sure that we never get a case where a log(0) operation occurs during training. log(0) evaluates to -inf, and multiplying that by a zero label gives NaN, which would break the training process.
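The effect of the clipping can be reproduced for a single sample in plain Python. This is a sketch with hypothetical helper names, using the same bounds as the TensorFlow code above. One difference worth noting: Python's `math.log(0)` raises a `ValueError`, whereas TensorFlow's log returns -inf, so the NaN is demonstrated separately on the last line.

```python
import math


def clip(value, low=1e-10, high=0.9999999):
    """Limit `value` to the range [low, high]."""
    return min(max(value, low), high)


def cross_entropy(labels, predictions):
    """Cross entropy for one sample, summed over the output nodes."""
    total = 0.0
    for t, p in zip(labels, predictions):
        p = clip(p)  # keeps both log(p) and log(1 - p) finite
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total


labels = [0.0, 1.0, 0.0]
confident = [0.0, 1.0, 0.0]  # a perfectly confident, correct prediction
loss = cross_entropy(labels, confident)
print(math.isfinite(loss))  # True
print(math.isnan(0.0 * float("-inf")))  # True: what an unclipped 0 * log(0) gives
```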

In [0]:
# add an optimiser
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

Here we are just using the gradient descent optimiser provided by TensorFlow.

In [0]:
# finally setup the initialisation operator
init_op = tf.global_variables_initializer()

# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# start the session
with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            # run one optimisation step and fetch the cost for this batch
            _, c = sess.run([optimiser, cross_entropy], feed_dict={x: batch_x, y: batch_y})
            # print(sess.run(b2))
            avg_cost += c / total_batch
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))


    print("\nTraining complete!")

    # evaluate the trained model on the test set
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))


Epoch: 1 cost = 0.598
Epoch: 2 cost = 0.218
Epoch: 3 cost = 0.156
Epoch: 4 cost = 0.121
Epoch: 5 cost = 0.101
Epoch: 6 cost = 0.083
Epoch: 7 cost = 0.066
Epoch: 8 cost = 0.053
Epoch: 9 cost = 0.043
Epoch: 10 cost = 0.035

Training complete!
0.9777


# Now Let's Try Some Convolutional NN with TensorFlow!

First, we will write a function that creates a convolutional NN layer for us. To do this, we will call the TensorFlow function *tf.nn.conv2d*:


```
# Conv2d Function Inputs:
tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    name=None
)
```



In [0]:
def create_new_conv_layer(input_data, num_input_channels, num_filters, filter_shape, pool_shape, name):
  
    # setup the filter input shape for tf.nn.conv2d
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_input_channels, num_filters]

    # initialise weights and bias for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03), name=name+'_W')
    bias = tf.Variable(tf.truncated_normal([num_filters]), name=name+'_b')

    # setup the convolutional layer operation
    out_layer = tf.nn.conv2d(input_data, weights, [1, 1, 1, 1], padding='SAME')

    # add the bias
    out_layer += bias

    # apply a ReLU non-linear activation
    out_layer = tf.nn.relu(out_layer)

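    # now perform max pooling
    # ksize is the pooling window, given as [batch, height, width, channels]
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    # strides of 2 in the x and y directions
    strides = [1, 2, 2, 1]
    out_layer = tf.nn.max_pool(out_layer, ksize=ksize, strides=strides, padding='SAME')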

    return out_layer


'ksize' is the argument which defines the size of the max pooling window (i.e. the area over which the maximum is calculated).  It must be 4D to match the convolution - in this case, for each image we want to use a 2 x 2 area applied to each channel.

'strides' defines how the max pooling area moves through the image - a stride of 2 in the x direction will lead to max pooling areas starting at x=0, x=2, x=4 etc. through your image.  If the stride is 1, each max pooling area overlaps the previous one (and there is no reduction in the size of the output).  In this case, we want to do strides of 2 in the x and y directions, which halves the width and height of the output.
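To see what a 2 x 2 window with a stride of 2 does, here is a small plain-Python sketch for a single channel. The helper names are hypothetical, and unlike the TensorFlow call above this version does no padding, so it assumes the image sides are divisible by the stride. The second helper uses the rule that, with 'SAME' padding, the output size is the input size divided by the stride, rounded up.

```python
import math


def max_pool_2d(image, pool=2, stride=2):
    """Max pooling over one channel given as a list of rows (no padding)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r + dr][c + dc] for dr in range(pool) for dc in range(pool))
            for c in range(0, cols - pool + 1, stride)
        ]
        for r in range(0, rows - pool + 1, stride)
    ]


def same_padding_output_size(size, stride):
    """Output width or height of a 'SAME'-padded pooling or convolution."""
    return math.ceil(size / stride)


image = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 8, 1, 0],
    [7, 6, 3, 2],
]
pooled = max_pool_2d(image)
print(pooled)  # [[4, 8], [9, 3]]
print(same_padding_output_size(28, 2), same_padding_output_size(14, 2))  # 14 7
```

Two such pooling steps take a 28 x 28 MNIST image down to 14 x 14 and then 7 x 7.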



In [0]:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [0]:

# Python optimisation variables
learning_rate = 0.0001
epochs = 10
batch_size = 50

# declare the training data placeholders
# input x - for 28 x 28 pixels = 784 - this is the flattened image data that is drawn from mnist.train.next_batch()
x = tf.placeholder(tf.float32, [None, 784])

# reshape the input data so that it is a 4D tensor.  The first value (-1) tells the function to dynamically shape that
# dimension based on the amount of data passed to it.  The two middle dimensions are set to the image size (i.e. 28
# x 28).  The final dimension is 1 as there is only a single colour channel i.e. grayscale.  If this was RGB, this
# dimension would be 3

x_shaped = tf.reshape(x, [-1, 28, 28, 1])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])

# create some convolutional layers
layer1 = create_new_conv_layer(x_shaped, 1, 32, [5, 5], [2, 2], name='layer1')
layer2 = create_new_conv_layer(layer1, 32, 64, [5, 5], [2, 2], name='layer2')

# flatten the output ready for the fully connected output stage - after two layers of stride 2 pooling, we go
# from 28 x 28, to 14 x 14 to 7 x 7 x,y co-ordinates, but with 64 output channels.  To create the fully connected,
# "dense" layer, the new shape needs to be [-1, 7 x 7 x 64]
flattened = tf.reshape(layer2, [-1, 7 * 7 * 64])

# setup some weights and bias values for this layer, then activate with ReLU
wd1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1000], stddev=0.03), name='wd1')
bd1 = tf.Variable(tf.truncated_normal([1000], stddev=0.01), name='bd1')
dense_layer1 = tf.matmul(flattened, wd1) + bd1
dense_layer1 = tf.nn.relu(dense_layer1)

# another layer with softmax activations
wd2 = tf.Variable(tf.truncated_normal([1000, 10], stddev=0.03), name='wd2')
bd2 = tf.Variable(tf.truncated_normal([10], stddev=0.01), name='bd2')
dense_layer2 = tf.matmul(dense_layer1, wd2) + bd2
y_ = tf.nn.softmax(dense_layer2)

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=dense_layer2, labels=y))

# add an optimiser
optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# setup the initialisation operator
init_op = tf.global_variables_initializer()

# # setup recording variables
# # add a summary to store the accuracy
# tf.summary.scalar('accuracy', accuracy)

# merged = tf.summary.merge_all()
# writer = tf.summary.FileWriter('C:\\Users\\ameni\\Desktop\\csu\\Machine_Learning\\TF')
with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy], feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        test_acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost), " test accuracy: {:.3f}".format(test_acc))
#         summary = sess.run(merged, feed_dict={x: mnist.test.images, y: mnist.test.labels})
#         writer.add_summary(summary, epoch)

    print("\nTraining complete!")
#     writer.add_graph(sess.graph)
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))


Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Epoch: 1 cost = 0.661  test accuracy: 0.928
Epoch: 2 cost = 0.167  test accuracy: 0.966
Epoch: 3 cost = 0.103  test accuracy: 0.976
Epoch: 4 cost = 0.075  test accuracy: 0.982
Epoch: 5 cost = 0.060  test accuracy: 0.983
Epoch: 6 cost = 0.051  test accuracy: 0.985
Epoch: 7 cost = 0.042  test accuracy: 0.987
Epoch: 8 cost = 0.036  test accuracy: 0.988
Epoch: 9 cost = 0.031  test accuracy: 0.989
Epoch: 10 cost = 0.027  test accuracy: 0.988

Training complete!
0.988


TensorBoard DEMO !!

# Dataset Example

In [0]:
import tensorflow as tf
import numpy as np

from sklearn.datasets import load_digits

digits = load_digits(return_X_y=True)
print(digits[0].shape)
print(digits[1].shape)

(1797, 64)
(1797,)


In [0]:
# split into train and validation sets (the first 80% for training, the rest for validation)
split = int(len(digits[0]) * 0.8)
train_images = digits[0][:split]
train_labels = digits[1][:split]
valid_images = digits[0][split:]
valid_labels = digits[1][split:]

In [0]:
# create the training datasets
dx_train = tf.data.Dataset.from_tensor_slices(train_images)

# Let's look at the type change of our data!
print(type(train_images))
print(type(dx_train))

# apply a one-hot transformation to each label for use in the neural network
dy_train = tf.data.Dataset.from_tensor_slices(train_labels).map(lambda z: tf.one_hot(z, 10))
# zip the x and y training data together and shuffle, batch etc.
train_dataset = tf.data.Dataset.zip((dx_train, dy_train)).shuffle(500).repeat().batch(30)
# do the same operations for the validation set
dx_valid = tf.data.Dataset.from_tensor_slices(valid_images)
dy_valid = tf.data.Dataset.from_tensor_slices(valid_labels).map(lambda z: tf.one_hot(z, 10))
valid_dataset = tf.data.Dataset.zip((dx_valid, dy_valid)).shuffle(500).repeat().batch(30)

<class 'numpy.ndarray'>
<class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>


The training x and y data is zipped together in the full train_dataset. Chained along together with this zip method is first the *shuffle()* dataset method. This method randomly shuffles the data, using a buffer of data specified in the argument – 500 in this case. Next, the *repeat()* method is used, to allow the iterator to continuously extract data from this dataset. When this method (*repeat()*) is applied to the dataset with no argument, it means that the dataset can be repeated indefinitely without throwing an OutOfRangeError. Finally the data is batched with a batch size of 30.
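The behaviour of the shuffle, repeat and batch chain can be imitated with ordinary Python generators. The sketch below is a conceptual model only, with hypothetical function names and smaller numbers than in the notebook; it is not the tf.data implementation.

```python
import itertools
import random


def shuffle(items, buffer_size, rng):
    """Yield items in random order, holding at most `buffer_size` at a time."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:  # drain what is left once the input is exhausted
        yield buffer.pop(rng.randrange(len(buffer)))


def repeat(make_iterable):
    """Restart the upstream pipeline forever, so it never runs out."""
    while True:
        yield from make_iterable()


def batch(items, batch_size):
    """Group consecutive items into lists of `batch_size`."""
    items = iter(items)
    while True:
        chunk = list(itertools.islice(items, batch_size))
        if not chunk:
            return
        yield chunk


rng = random.Random(0)
data = list(range(100))
pipeline = batch(repeat(lambda: shuffle(data, buffer_size=50, rng=rng)), batch_size=30)
first_batches = list(itertools.islice(pipeline, 5))
print([len(b) for b in first_batches])  # [30, 30, 30, 30, 30]
```

Because the repeat step comes before batching, the fourth batch here mixes the last items of one pass over the data with the first items of the next.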

In [0]:
# create general iterator
iterator = tf.data.Iterator.from_structure(train_dataset.output_types,
                                           train_dataset.output_shapes)
next_element = iterator.get_next()
# make datasets that we can initialize separately, but using the same structure via the common iterator
training_init_op = iterator.make_initializer(train_dataset)
validation_init_op = iterator.make_initializer(valid_dataset)

Now, we want to be able to extract data from either the train_dataset or the valid_dataset seamlessly. This is important, as we don’t want to have to change how data flows through the neural network structure when all we want to do is just change the dataset the model is consuming. To do this, we can use another way of creating the Iterator object – the from_structure() method. This method creates a generic iterator object – all it needs is the data types of the data it will be outputting and the output data size/shape in order to be created.

The second line of the above creates a standard get_next() iterator operation which can be called to extract data from this generic iterator structure.
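The idea of one iterator that can be pointed at different datasets can be sketched in plain Python. The class below is a hypothetical stand-in: it borrows the method names from the TensorFlow code above but skips the type and shape checks that from_structure() sets up, and its initializers are ordinary callables rather than graph operations.

```python
class ReinitializableIterator:
    """Toy stand-in: one iterator object whose data source can be swapped."""

    def __init__(self):
        self._source = None

    def make_initializer(self, dataset):
        """Return a callable that points the iterator at `dataset`."""
        def initialize():
            self._source = iter(dataset)
        return initialize

    def get_next(self):
        """Return the next element from whichever dataset is active."""
        if self._source is None:
            raise RuntimeError("iterator has not been initialised")
        return next(self._source)


iterator = ReinitializableIterator()
training_init = iterator.make_initializer(["train_0", "train_1"])
validation_init = iterator.make_initializer(["valid_0", "valid_1"])

training_init()
print(iterator.get_next())  # train_0
validation_init()
print(iterator.get_next())  # valid_0
```

The code that consumes `get_next()` never changes; only the initializer that was run last decides which data it receives.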

In [0]:
def nn_model(in_data):
    bn = tf.layers.batch_normalization(in_data)
    fc1 = tf.layers.dense(bn, 50)
    fc2 = tf.layers.dense(fc1, 50)
    fc2 = tf.layers.dropout(fc2)
    fc3 = tf.layers.dense(fc2, 10)
    return fc3

In [0]:
logits = nn_model(next_element[0])
# add the optimizer and loss
loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits_v2(labels=next_element[1], logits=logits))
optimizer = tf.train.AdamOptimizer().minimize(loss)
# get accuracy
prediction = tf.argmax(logits, 1)
equality = tf.equal(prediction, tf.argmax(next_element[1], 1))
accuracy = tf.reduce_mean(tf.cast(equality, tf.float32))
init_op = tf.global_variables_initializer()
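# run the training
# note: each iteration below processes one batch of 30 samples, so these are training steps rather than full epochs
epochs = 600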
with tf.Session() as sess:
    sess.run(init_op)
    sess.run(training_init_op)
    for i in range(epochs):

        l, _, acc = sess.run([loss, optimizer, accuracy])
        if i % 50 == 0:
            # print(sess.run(tf.shape(next_element[0])))
            print("Epoch: {}, loss: {:.3f}, training accuracy: {:.2f}%".format(i, l, acc * 100))
    # now setup the validation run
    valid_iters = 100
    # re-initialize the iterator, but this time with validation data
    sess.run(validation_init_op)
    avg_acc = 0
    for i in range(valid_iters):
        acc = sess.run([accuracy])
        avg_acc += acc[0]
    print("Average validation set accuracy over {} iterations is {:.2f}%".format(valid_iters,
                                                                                 (avg_acc / valid_iters) * 100))


Epoch: 0, loss: 475.750, training accuracy: 10.00%
Epoch: 50, loss: 12.298, training accuracy: 90.00%
Epoch: 100, loss: 4.910, training accuracy: 90.00%
Epoch: 150, loss: 12.442, training accuracy: 90.00%
Epoch: 200, loss: 1.859, training accuracy: 100.00%
Epoch: 250, loss: 2.246, training accuracy: 93.33%
Epoch: 300, loss: 3.609, training accuracy: 90.00%
Epoch: 350, loss: 2.346, training accuracy: 96.67%
Epoch: 400, loss: 1.726, training accuracy: 100.00%
Epoch: 450, loss: 4.693, training accuracy: 93.33%
Epoch: 500, loss: 3.151, training accuracy: 96.67%
Epoch: 550, loss: 0.202, training accuracy: 100.00%
Average validation set accuracy over 100 iterations is 90.13%


# Why Is TensorFlow Debugging Difficult?

The concept of a computation graph might be unfamiliar to us.
* The "Inversion of Control"
  * The actual computation (feed-forward, training) of the model runs inside Session.run(), on the computation graph, not on the Python code we wrote
  * What exactly is being done during the execution of a session is hidden behind an abstraction barrier
* Therefore, we cannot retrieve the intermediate values during the computation unless we explicitly fetch them via Session.run()

## Debugging in TensorFlow:

* Basic ways:

  * Explicitly fetch and print (or do whatever you want) with Session.run()
  * TensorBoard: Histogram and Image Summary
  * The tf.Print() operation
* Advanced ways:

  * A step-by-step debugger (ipdb, pdb, pudb ...)
  * tfdbg: The TensorFlow debugger
  * TensorBoard Debugger