# TensorFlow Basic Concepts

In [1]:
import addutils.toc ; addutils.toc.js(ipy_notebook=True)

In [2]:
import warnings
import scipy.io
import numpy as np
import tensorflow as tf
import pandas as pd
from time import time
import sklearn as sk
import sklearn.metrics
from IPython.display import Image
from bokeh.models import ColumnDataSource
from addutils import css_notebook
css_notebook()

In [3]:
import bokeh.plotting as bk
bk.output_notebook()

## Required packages and tools

This notebook requires a few additional packages. Make sure they are installed properly before running it.

The main topic of this notebook is TensorFlow, a library for machine learning and deep learning recently open-sourced by Google. For more information please visit [TensorFlow](http://www.tensorflow.org).

**<font color='red'>WARNING</font>**: this library supports only Linux and Mac OS. At the moment the Windows operating system is not supported.

TensorFlow is conceptually similar to Theano: the computation is formally a graph, with nodes representing operations and edges representing tensors (multidimensional data) communicated between operations. According to the TensorFlow web site, the flow of tensors through the graph is where TensorFlow gets its name. It is not intended to be only a neural network library but to perform any computation that can be expressed as a graph. TensorFlow's automatic differentiation is especially suited to gradient-based machine learning algorithms. The library is written in C++ and has Python bindings. Moreover, it can run on both CPU and GPU.

We suggest following the steps outlined in:

[https://www.tensorflow.org/versions/master/get_started/os_setup.html#pip_install](https://www.tensorflow.org/versions/master/get_started/os_setup.html#pip_install)

We recommend the pip installation process:

- first choose which package to install, with or without GPU support
    
- install TensorFlow for Python 3 (this notebook uses it)
`$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp34-none-linux_x86_64.whl`

If you installed the GPU version of TensorFlow, you must also install the CUDA Toolkit 7.0 and cuDNN 6.5 v2.

## 1 Basic Usage

### 1.1 Overview

TensorFlow represents computations as **graphs**. Nodes in the graph are called **ops** (operations). An op takes zero or more **Tensors** as input and produces zero or more Tensors as output. A Tensor is a multidimensional array with a specified type. The graph is only a description of a computation: to actually execute it, the graph must be launched in a **session**. A session executes a specific graph on one of the available **devices** (which can be either CPUs or GPUs).

### 1.2 Building the Graph

It is possible to build a graph by starting with nodes that do not need any input, such as constant nodes, and then using the output of those nodes as input to other operations. TensorFlow adds operations to a default graph. This is sufficient for most applications, but it is also possible to manage multiple graphs with the `Graph` class.

In [4]:
x = tf.constant([[4.]])
y = tf.constant([[3.]])
product = tf.matmul(x, y)

The code creates three nodes: two constants and one operation (a matrix multiplication) that takes the two constants as inputs and produces an output (`product`). To actually produce a result, it is necessary to run the graph in a session.
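The separation between describing a computation and executing it can be illustrated without TensorFlow. The following is only a toy analogy in plain Python; the classes `Constant` and `Multiply` and the function `run` are hypothetical names invented for this sketch and do not reflect how TensorFlow is implemented.

```python
# Toy analogy only: hypothetical classes, NOT TensorFlow's implementation.

class Constant:
    """A node with no inputs that simply holds a value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Multiply:
    """A node that multiplies the outputs of two input nodes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        # Nothing is computed until evaluate() is called.
        return self.left.evaluate() * self.right.evaluate()


def run(node):
    """Stand-in for a session: walks the graph and returns the result."""
    return node.evaluate()


# Building the graph performs no arithmetic yet.
x_node = Constant(4.0)
y_node = Constant(3.0)
product_node = Multiply(x_node, y_node)

print(run(product_node))  # 12.0
```

As in the cell above, creating `product_node` only records what should be computed; the multiplication happens when `run` is called.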

### 1.3 Launching a Session

Called without arguments, the `Session` constructor uses the default graph. A session must be closed once it is no longer needed; alternatively, use a `with ... as ...` statement, which closes it automatically.

In [5]:
with tf.Session() as sess:
    result = sess.run([product])
    print(result)

[array([[ 12.]], dtype=float32)]


Computation runs on the GPU by default if one is available; otherwise it automatically falls back to the CPU. It is possible to pin operations to a specific device with a `with tf.device("/gpu:1"):` statement, which in this example would place them on the second GPU of the machine. Device placement applies to ops created inside the `with` block, so the block must wrap graph construction rather than the call to `sess.run`. Try changing the string to execute the code on the CPU (or on another GPU of your machine).

In [6]:
with tf.device("/cpu:0"):
    # Ops created inside this block are pinned to the CPU
    product_cpu = tf.matmul(x, y)

with tf.Session() as sess:
    result = sess.run([product_cpu])
    print(result)

[array([[ 12.]], dtype=float32)]


### 1.4 Tensor Variables

TensorFlow uses tensors to represent all data. Only tensors are passed between ops in the graph. A tensor can be thought of as an n-dimensional array or list. **Variables** are used to maintain state across executions of the graph. In this example `state` is initialized to zero and incremented each time `update` is run. Variables must be initialized after launching the graph, that is, after creating a session, by running an initialization op.

In [7]:
state = tf.Variable(0, name="counter")

one = tf.constant(1)
new_value = tf.add(state, one)
update = tf.assign(state, new_value)

init_op = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init_op)
    print('state: ' + str(sess.run(state)))
    for _ in range(3):
        sess.run(update)
        print('state: ' + str(sess.run(state)))

state: 0
state: 1
state: 2
state: 3


Here `assign` is part of the computational graph, just like `add` and other operators, so it has no effect until `run()` executes the expression. Variables are typically used to represent the parameters of a model; in a neural network, for example, they store the weight matrices, which are updated at every training step. It is possible to **fetch** more than one tensor by passing them together to the `run()` command (`session.run([var1, var2])`).

So far we have covered how to store values in constants and how to use variables to hold state that is updated. TensorFlow also provides a **feed** mechanism for passing values into the graph at run time. A feed temporarily replaces the output of an operation with a supplied value. The typical use is to create a `placeholder` and feed it with values through the `feed_dict` argument of `run()`.

In [8]:
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)
output = tf.mul(input1, input2)

with tf.Session() as sess:
    print(sess.run([output], feed_dict={input1:[7.], input2:[2.]}))

[array([ 14.], dtype=float32)]


A tensor declared as a `placeholder` expects a feed and generates an error if one is not supplied.
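The feed mechanism can be pictured as a dictionary lookup performed at run time. The sketch below is a plain-Python analogy under that assumption; `Placeholder` and `run_multiply` are hypothetical names made up for illustration, and the error message is invented rather than copied from TensorFlow.

```python
# Toy analogy only: hypothetical names, NOT TensorFlow's implementation.

class Placeholder:
    """A named slot whose value must be supplied when the graph is run."""
    def __init__(self, name):
        self.name = name


def run_multiply(left, right, feed_dict=None):
    """Multiply two placeholders element-wise using the values in feed_dict."""
    feed_dict = feed_dict or {}
    for node in (left, right):
        if node not in feed_dict:
            # A placeholder without a fed value cannot be evaluated.
            raise ValueError("You must feed a value for placeholder '%s'" % node.name)
    return [a * b for a, b in zip(feed_dict[left], feed_dict[right])]


in1 = Placeholder("input1")
in2 = Placeholder("input2")

print(run_multiply(in1, in2, feed_dict={in1: [7.0], in2: [2.0]}))  # [14.0]

try:
    run_multiply(in1, in2, feed_dict={in1: [7.0]})  # input2 is missing
except ValueError as err:
    print(err)
```

The second call fails because `input2` has no value, which mirrors the behaviour described above for a placeholder that is not fed.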

## 2 Example: classification with a Multi Layer Perceptron

This tutorial covers the simplest neural network: a multilayer perceptron (MLP), also known as a feedforward neural network.

We will learn to classify MNIST handwritten digit images into their correct label (0-9).

First, let's load the data and take a look:

In [9]:
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [10]:
print('train examples: ', mnist.train.num_examples)
print('test examples: ', mnist.test.num_examples)
print('validation examples: ', mnist.validation.num_examples)

train examples:  55000
test examples:  10000
validation examples:  5000


The standard way to use TensorFlow is to create the graph first and then start a session. A more flexible alternative is `InteractiveSession`, which lets you interleave operations that build the graph with operations that run it.

In [11]:
import tensorflow as tf
sess = tf.InteractiveSession()

An MLP looks like this: input -> hidden layer -> output classification

Each stage is just a matrix multiplication with a nonlinear function applied after. Inputs are matrices where rows are examples and columns are pixels.

Defining a `tf.placeholder` with `None` as its first dimension indicates that the first dimension, corresponding to the batch size, can be any size.

The hidden layer has 19 units. It applies a nonlinear transformation, the hyperbolic tangent, to the linear combination of inputs, weights and bias. The weight matrix is initialized to random values, while the bias vector is initialized to a small constant value.

In [12]:
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

W_x = tf.Variable(tf.truncated_normal([784,19], stddev=0.1))
b_h = tf.Variable(tf.constant(0.1, shape=[19]))

h = tf.nn.tanh(tf.matmul(x, W_x) + b_h)
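To make the hidden-layer formula concrete, here is a small plain-Python sketch of the same computation, tanh(x·W + b), on made-up numbers. The function name `dense_tanh` and the toy values are assumptions chosen for illustration; the real layer uses 784 inputs and 19 units.

```python
import math

def dense_tanh(inputs, weights, bias):
    """Compute tanh(inputs . weights + bias) for a batch of examples.

    inputs:  list of rows, one row per example (batch_size x n_in)
    weights: list of rows (n_in x n_out)
    bias:    list of length n_out
    """
    outputs = []
    for row in inputs:
        activations = []
        for j in range(len(bias)):
            # Linear combination of the inputs for hidden unit j
            z = sum(row[i] * weights[i][j] for i in range(len(row))) + bias[j]
            activations.append(math.tanh(z))
        outputs.append(activations)
    return outputs

# Made-up numbers: 2 examples, 3 input features, 2 hidden units.
toy_inputs = [[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]]
toy_weights = [[0.1, -0.2],
               [0.0, 0.3],
               [0.1, 0.2]]
toy_bias = [0.1, 0.1]

toy_hidden = dense_tanh(toy_inputs, toy_weights, toy_bias)
print(len(toy_hidden), len(toy_hidden[0]))  # 2 2
```

The output has one row per example and one column per hidden unit, which is why a `None` batch dimension in the placeholder works with any number of examples.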

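The last stage computes a softmax transformation of the hidden layer output. The softmax is a generalization of the logistic function; it is used to calculate the probability associated with each class and is useful in multiclass classification problems. Here a class is one of the 10 possible digits. The cell below computes only the unnormalized scores (`logits`); the softmax itself is applied inside the loss function and in the accuracy computation.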

In [13]:
W_h = tf.Variable(tf.truncated_normal([19, 10], stddev=0.1))
b_y = tf.Variable(tf.constant(0.1, shape=[10]))

#y = tf.nn.softmax(tf.matmul(h, W_h) + b_y)
logits = tf.matmul(h, W_h) + b_y
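For reference, the softmax of a vector of logits can be written in a few lines of plain Python. This is a minimal sketch of the mathematical definition, not TensorFlow's implementation; the function name and the example logits are chosen for illustration.

```python
import math

def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest logit receives the largest probability
print(sum(probs))  # approximately 1
```

With only two classes and the second logit fixed at zero, `softmax([z, 0.0])[0]` equals the logistic function `1 / (1 + exp(-z))`, which is the sense in which softmax generalizes it.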

After constructing the graph, we need to initialize the variables in the session.

In [14]:
sess.run(tf.initialize_all_variables())

Next we define the cost function (cross entropy in this case). Now it is possible to use TensorFlow automatic differentiation to find the gradients of the cost with respect to each variable. For this example we use basic steepest gradient descent as optimizer. The learning rate is fixed at 0.05.

In [15]:
#cross_entropy = -tf.reduce_sum(y_*tf.log(y))
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                        y_,
                                                        name='xentropy')

loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
#train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
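The loss used above can be sketched in plain Python as the cross entropy between the softmax of the logits and a one-hot label, averaged over the batch. The function names and toy values below are assumptions for illustration; the sketch follows the mathematical definition and is not TensorFlow's code.

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between softmax(logits) and a one-hot label, one example."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    # log(softmax(logits)[k]) = logits[k] - log_sum_exp
    return -sum(t * (v - log_sum_exp) for v, t in zip(logits, one_hot_label))

def mean_loss(batch_logits, batch_labels):
    """Average the per-example cross entropy over a batch."""
    losses = [softmax_cross_entropy(l, t)
              for l, t in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)

# Made-up batch: 2 examples, 3 classes.
toy_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
toy_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(mean_loss(toy_logits, toy_labels))
```

Working from the logits directly, rather than taking the logarithm of already-computed probabilities, avoids evaluating `log(0)` when a probability underflows; this is a common reason to prefer a combined softmax and cross-entropy operation.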

In [16]:
#print('Epochs completed: ', end='')
#for _ in range(25): # epochs
#    for i in range(1100): # number of examples divided by batchsize
#        batch = mnist.train.next_batch(50)
#        train_step.run(feed_dict={x: batch[0], y_: batch[1]})
#    print(mnist.train.epochs_completed, end=' '),

To evaluate the model we first compute the position of the maximum entry in both the predicted and the true output. This corresponds to the class to which the example belongs. Then we compare the two vectors to determine what fraction of the predictions is correct.

In [17]:
#correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(logits),1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
#print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
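The accuracy computation can be reproduced on a handful of made-up predictions in plain Python. The helper names `argmax` and `accuracy_score` and the toy data are assumptions for this sketch.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_score(predicted, one_hot_labels):
    """Fraction of examples whose predicted class matches the labelled class."""
    hits = [argmax(p) == argmax(t) for p, t in zip(predicted, one_hot_labels)]
    return sum(hits) / len(hits)

# Made-up predictions for 4 examples and 3 classes.
toy_pred = [[0.1, 0.7, 0.2],
            [0.8, 0.1, 0.1],
            [0.3, 0.3, 0.4],
            [0.2, 0.5, 0.3]]
toy_true = [[0, 1, 0],
            [1, 0, 0],
            [0, 1, 0],
            [0, 1, 0]]

print(accuracy_score(toy_pred, toy_true))  # 0.75
```

Because softmax preserves the ordering of its inputs, taking the argmax of the logits gives the same predicted class as taking the argmax of the softmax output.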

This cell builds the plots. To actually see any data, first start the training; the plots will then update dynamically.

In [18]:
train_accuracy_vector = []
valid_accuracy_vector = []
test_accuracy_vector = []
cost_vector = []
input_vector = []

s_train = ColumnDataSource(data=dict(x=input_vector, y=train_accuracy_vector))
s_valid = ColumnDataSource(data=dict(x=input_vector, y=valid_accuracy_vector))
s_test = ColumnDataSource(data=dict(x=input_vector, y=test_accuracy_vector))
s_cost = ColumnDataSource(data=dict(x=input_vector, y=cost_vector))

fig1 = bk.figure(plot_width=500, 
                plot_height=300,
                x_axis_label='Epochs',
                y_axis_label='Accuracy (%)')

fig1.line('x', 'y', source=s_train, legend='training')
fig1.line('x', 'y', source=s_valid, color='red', legend='validation')
fig1.line('x', 'y', source=s_test, color='green', legend='test')

fig2 = bk.figure(plot_width=500, 
                plot_height=300,
                x_axis_label='Epochs',
                y_axis_label='Cost')

fig2.line('x', 'y', source=s_cost)

p = bk.vplot(fig1, fig2)

In [19]:
from bokeh.io import push_notebook

def update_c(x_vec, y_vec):
    s_cost.data['x'] = x_vec
    s_cost.data['y'] = y_vec
    push_notebook()
    
def update(x_vec, t_vec, v_vec, e_vec):
    s_train.data['x'] = x_vec
    s_valid.data['x'] = x_vec
    s_test.data['x'] = x_vec
    s_train.data['y'] = t_vec
    s_valid.data['y'] = v_vec
    s_test.data['y'] = e_vec
    push_notebook()

In [20]:
bk.show(p)

<bokeh.io._CommsHandle at 0x7f1b187aa8d0>

Ok, now we are ready to launch the training:

In [21]:
epochs = 25
batch_size = 50
train_batches = mnist.train.num_examples // batch_size

for epoch in range(epochs):
    train_costs = []
    train_accuracy = []
    for i in range(train_batches):
        batch = mnist.train.next_batch(batch_size)
        #train_step.run(feed_dict={x: batch[0], y_: batch[1]})
        #_, loss = sess.run([train_step, cross_entropy], feed_dict={x: batch[0], y_: batch[1]})
        _, loss_value = sess.run([train_step, loss], feed_dict={x: batch[0], y_: batch[1]})
        train_accuracy.append(accuracy.eval(feed_dict={x: batch[0], y_: batch[1]}))
        train_costs.append(loss_value)

    train_accuracy_vector.append(np.mean(train_accuracy) * 100)
    cost_vector.append(np.mean(train_costs))
    # x must have as many points as y: epoch + 1 values have been recorded so far
    update_c(list(range(epoch + 1)), cost_vector)

    valid_accuracy_vector.append(accuracy.eval(feed_dict={x: mnist.validation.images, y_: mnist.validation.labels}) * 100)
    test_accuracy_vector.append(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}) * 100)

    update(list(range(epoch + 1)), train_accuracy_vector, valid_accuracy_vector, test_accuracy_vector)

**REMARK** Epochs are tracked directly by the `mnist` object. Whenever the batches requested run past the end of the training set, it automatically increments the number of completed epochs. The object also shuffles the training set for better learning.
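The bookkeeping described in the remark can be sketched as follows. `ToyDataSet` is a hypothetical stand-in written for this illustration; it mimics the behaviour described above (count an epoch and reshuffle when a request runs past the end of the data) and is an assumption about, not a copy of, the MNIST helper's code.

```python
import random

class ToyDataSet:
    """Hypothetical stand-in for the batching behaviour of the MNIST helper."""
    def __init__(self, examples, seed=0):
        self.examples = list(examples)
        self.epochs_completed = 0
        self._index = 0
        self._rng = random.Random(seed)

    def next_batch(self, batch_size):
        start = self._index
        self._index += batch_size
        if self._index > len(self.examples):
            # Ran past the end: count one epoch, reshuffle, start over.
            self.epochs_completed += 1
            self._rng.shuffle(self.examples)
            start = 0
            self._index = batch_size
        return self.examples[start:self._index]

data = ToyDataSet(range(10))
for _ in range(5):  # request 5 batches of 4 from only 10 examples
    batch = data.next_batch(4)

print(data.epochs_completed)  # 2
```

In this sketch the examples left over at the end of a pass are skipped when the data is reshuffled; with 55000 training examples and a batch size of 50 there is no remainder, so every example is used in each epoch.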

---

Visit [www.add-for.com](<http://www.add-for.com/IT>) for more tutorials and updates.

This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.